WEBVTT Kind: captions Language: en 00:00:00.060 --> 00:00:01.595 There are a couple of different things 00:00:01.595 --> 00:00:04.710 that influence the quality of your quantitative research, 00:00:04.710 --> 00:00:07.807 so what makes your research reliable and valid? 00:00:08.293 --> 00:00:10.230 To understand those things and 00:00:10.230 --> 00:00:13.350 what the quality of your study depends on, 00:00:13.350 --> 00:00:15.550 we first have to go through 00:00:15.550 --> 00:00:21.270 what the concepts of reliability and validity mean in the context of quantitative studies. 00:00:21.675 --> 00:00:24.525 This is one way to understand reliability and validity. 00:00:25.170 --> 00:00:28.593 The concept of reliability is fairly easy to understand. 00:00:28.593 --> 00:00:32.340 It basically asks: if we repeat the same study again, 00:00:32.340 --> 00:00:33.510 using the same sample, 00:00:33.510 --> 00:00:35.100 would we get the same result? 00:00:35.764 --> 00:00:41.121 In quantitative studies, your analysis is done by a computer, 00:00:41.121 --> 00:00:44.400 and the computer will always produce the same result, 00:00:44.400 --> 00:00:45.806 if you give it the same data. 00:00:46.210 --> 00:00:50.430 So reliability in quantitative analysis or quantitative studies 00:00:50.430 --> 00:00:54.090 is mostly about measurement reliability. 00:00:54.090 --> 00:00:56.151 So if you measure the same things again, 00:00:56.265 --> 00:00:57.975 would you get the same result? 00:00:58.347 --> 00:01:00.312 Validity, on the other hand, 00:01:00.571 --> 00:01:02.581 answers the question of, 00:01:03.480 --> 00:01:08.190 does the study correctly answer the question that it is supposed to answer? 00:01:08.190 --> 00:01:10.410 So does it provide a correct answer? 00:01:10.410 --> 00:01:11.730 Reliability is about, 00:01:11.730 --> 00:01:15.390 do we get the same answer if we repeat the same study?
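As a rough illustration of the difference between getting the same answer and getting the correct answer, here is a minimal sketch that simulates measuring the same units twice with an instrument that is consistent but miscalibrated. Everything in it is an assumption made up for illustration: the `measure` and `pearson` helpers, the 5-unit bias, the noise level, and the simulated "true" values do not come from the lecture.

```python
import random
import statistics


def measure(true_values, bias, noise_sd, rng):
    """One measurement round: true value + constant bias + random noise."""
    return [v + bias + rng.gauss(0, noise_sd) for v in true_values]


def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)

# Hypothetical true values for 200 units (unknown to a real researcher).
true_values = [rng.uniform(50, 100) for _ in range(200)]

# Measure the same units twice with a low-noise but biased instrument.
first = measure(true_values, bias=5.0, noise_sd=0.5, rng=rng)
second = measure(true_values, bias=5.0, noise_sd=0.5, rng=rng)

# High test-retest correlation: the two rounds agree with each other.
test_retest_r = pearson(first, second)

# Yet the measurements are systematically off from the true values.
mean_error = statistics.fmean([m - t for m, t in zip(first, true_values)])

print(f"test-retest correlation: {test_retest_r:.3f}")
print(f"mean error vs. true value: {mean_error:.2f}")
```

In this simulation the two rounds agree almost perfectly, yet every measurement is off by roughly the same amount, so repeating the measurement would not reveal the error.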
00:01:15.390 --> 00:01:18.898 Reliability doesn't tell us anything about whether the result is correct. 00:01:19.448 --> 00:01:21.750 Validity tells us whether the result is correct. 00:01:22.430 --> 00:01:27.289 Then, validity can be broken down into four different categories. 00:01:27.646 --> 00:01:29.040 Measurement validity, 00:01:29.040 --> 00:01:31.249 which we will discuss later, 00:01:31.249 --> 00:01:34.170 refers to whether the variables in our data 00:01:34.170 --> 00:01:36.990 measure the concepts that we claim they measure. 00:01:38.010 --> 00:01:40.727 Statistical conclusion validity refers to 00:01:40.727 --> 00:01:43.290 whether our statistical results are correct. 00:01:43.290 --> 00:01:45.903 So if we have identified a trend 00:01:45.903 --> 00:01:48.573 or a difference in the sample, 00:01:49.107 --> 00:01:52.140 have we identified that correctly? 00:01:52.140 --> 00:01:56.310 So, is there really a difference or a trend in the population? 00:01:56.310 --> 00:02:01.290 So it relates to whether our statistical associations, measured from the sample, 00:02:01.290 --> 00:02:03.030 generalize to the population. 00:02:03.969 --> 00:02:06.073 Then we have internal validity, 00:02:06.073 --> 00:02:13.590 which refers to whether the relationships actually correspond to the causal relations 00:02:13.590 --> 00:02:14.710 that we claim. 00:02:14.904 --> 00:02:18.990 So internal validity is about causal inference. 00:02:18.990 --> 00:02:22.500 So have we identified the right controls and 00:02:22.500 --> 00:02:25.745 have we controlled for them appropriately, 00:02:25.745 --> 00:02:33.090 or is our experimental or quasi-experimental design free of any possible selection effects 00:02:33.090 --> 00:02:36.138 that would confound the treatment effect? 00:02:36.591 --> 00:02:38.010 So that's causal inference.
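To make the point about controls concrete, here is a minimal simulation sketch of what a missing control can do to a causal claim. All of it is assumed for illustration: the variable names `x`, `y`, and `z`, the coefficients, and the `slope` and `residuals` helpers are hypothetical and not from the lecture. In the simulated data, `x` has no causal effect on `y`; both are driven by the confounder `z`.

```python
import random


def slope(xs, ys):
    """OLS slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residuals(xs, ys):
    """What is left of ys after removing the linear effect of xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = slope(xs, ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)
n = 5000

# z is the confounder; it drives both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]  # note: no x term at all

# Without the control, x appears to have a clear effect on y.
naive_effect = slope(x, y)

# With the control: partial z out of both x and y, then regress the
# residuals. This equals the coefficient of x in a regression of y on
# x and z (the Frisch-Waugh-Lovell result).
controlled_effect = slope(residuals(z, x), residuals(z, y))

print(f"effect of x without control: {naive_effect:.2f}")
print(f"effect of x controlling z:   {controlled_effect:.2f}")
```

In this setup the uncontrolled estimate is far from zero, while the controlled one is close to zero. The controlled estimate is only available because `z` was measured. If `z` were not in the data, no choice of analysis could recover it.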
00:02:38.010 --> 00:02:41.669 Then external validity simply refers to, 00:02:41.669 --> 00:02:46.680 do our results from one population generalize to other populations? 00:02:47.166 --> 00:02:53.461 So what determines the quality of a research study is an interesting question, 00:02:53.461 --> 00:02:57.267 and it can be examined through the research process 00:02:57.267 --> 00:02:59.366 according to Singleton and Straits' book. 00:02:59.868 --> 00:03:03.900 So Singleton and Straits say that research always starts from 00:03:03.900 --> 00:03:07.134 a research topic and the formulation of a research question. 00:03:07.134 --> 00:03:09.933 Of course, your study is not very valuable 00:03:09.933 --> 00:03:12.900 if the research question is not interesting, 00:03:12.900 --> 00:03:15.240 but we will be focusing on the empirical part. 00:03:15.758 --> 00:03:18.420 So after you have your research question set, 00:03:18.938 --> 00:03:22.000 then you start to prepare your research design. 00:03:22.437 --> 00:03:25.549 And the research design has two main components. 00:03:26.035 --> 00:03:27.280 One is sampling, 00:03:27.280 --> 00:03:29.130 so what are the units: people, 00:03:29.130 --> 00:03:30.903 organizations, projects, 00:03:30.903 --> 00:03:32.880 or whatever the units are that you're studying. 00:03:32.880 --> 00:03:35.730 So which units and how many are you studying. 00:03:36.167 --> 00:03:38.024 And then we have the measurement, 00:03:38.024 --> 00:03:39.900 so which variables we collect. 00:03:40.110 --> 00:03:43.191 So if we think of our data as an Excel sheet, 00:03:43.434 --> 00:03:47.940 sampling concerns what the rows are in that Excel sheet, 00:03:47.940 --> 00:03:53.310 and measurement concerns what the columns are in that Excel sheet.
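The rows-and-columns picture can be sketched as a small table in code. This is only an illustration under made-up assumptions: the firms, the variable names, and the values are hypothetical and not from the lecture.

```python
# Each dict is one row: one sampled unit (here, hypothetical firms).
# Each key is one column: one measured variable.
data = [
    {"firm_id": "A", "employees": 12, "revenue_meur": 1.4},
    {"firm_id": "B", "employees": 250, "revenue_meur": 38.0},
    {"firm_id": "C", "employees": 47, "revenue_meur": 5.2},
]

# Sampling decides which rows exist and how many there are.
n_units = len(data)

# Measurement decides which columns exist.
variables = list(data[0].keys())

print(f"{n_units} units (rows), variables (columns): {variables}")
```

A variable that was never measured is simply a column that does not exist in the table, and a unit that was never sampled is a row that does not exist.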
00:03:53.731 --> 00:03:57.900 Then we do data collection and after the data have been collected, 00:03:57.900 --> 00:04:00.694 we typically process the data somehow, 00:04:00.856 --> 00:04:03.155 we screen it for errors, 00:04:03.155 --> 00:04:05.218 we transform it into a different form, 00:04:05.218 --> 00:04:07.020 and then we do data analysis, 00:04:07.020 --> 00:04:08.393 and we interpret the results, 00:04:08.474 --> 00:04:10.710 and finally, we write an article about it. 00:04:11.196 --> 00:04:16.020 So, which part defines the quality of a study? 00:04:17.105 --> 00:04:19.440 It is this part here, 00:04:19.780 --> 00:04:21.945 so when you have collected your data, 00:04:22.252 --> 00:04:25.643 then you have basically already set an upper limit on the quality. 00:04:25.967 --> 00:04:28.457 If your data are not good, 00:04:28.667 --> 00:04:32.193 then you can't make a good study. 00:04:32.620 --> 00:04:33.939 On the other hand, 00:04:33.939 --> 00:04:35.265 if you have great data, 00:04:35.621 --> 00:04:40.470 even if you mess up your data processing or data analysis or interpretation, 00:04:40.470 --> 00:04:42.398 that is something that you can fix: 00:04:42.398 --> 00:04:45.210 you have the data, you can just analyze it a bit differently. 00:04:46.376 --> 00:04:53.072 It's important to understand that the validity of our causal claims depends crucially on 00:04:53.072 --> 00:04:55.076 whether the sample is appropriate, 00:04:55.529 --> 00:04:58.920 and whether we have collected all relevant controls, 00:04:58.920 --> 00:05:01.680 or whether we have a valid experimental design. 00:05:02.101 --> 00:05:07.200 After that, data processing and data analysis are just mechanics 00:05:07.427 --> 00:05:12.550 that will allow you to document this great study conducted here. 00:05:12.809 --> 00:05:14.629 So this is the important part. 00:05:14.629 --> 00:05:19.410 And you should not rush into data collection, obviously.
00:05:19.410 --> 00:05:22.769 Because if you just go and you collect data right away, 00:05:22.866 --> 00:05:25.146 the odds of you doing it correctly, 00:05:25.146 --> 00:05:28.500 with a good design that includes all relevant controls, 00:05:28.500 --> 00:05:29.398 are pretty low. 00:05:31.195 --> 00:05:34.452 This is highlighted in some of the readings. 00:05:34.938 --> 00:05:38.400 So the problems in rejected manuscripts 00:05:38.400 --> 00:05:40.980 in good journals are rarely about data analysis. 00:05:40.980 --> 00:05:44.753 So when I myself review a paper, 00:05:45.000 --> 00:05:48.060 I typically have lots of things to say about the methods, 00:05:48.060 --> 00:05:49.574 because that's my speciality. 00:05:49.736 --> 00:05:52.440 But if the data are good and the design is good, 00:05:52.440 --> 00:05:55.800 then I will say, okay, do the analysis differently, 00:05:55.800 --> 00:06:00.060 resubmit, and then I will re-evaluate your manuscript 00:06:00.060 --> 00:06:01.793 to see whether it makes sense. 00:06:02.052 --> 00:06:07.860 But if there is a control variable that is very important based on existing theory, 00:06:07.860 --> 00:06:12.424 that provides an alternative explanation for the phenomenon 00:06:12.424 --> 00:06:16.453 that the researchers are studying, and it has not been measured, 00:06:16.453 --> 00:06:19.680 then there is nothing that they can do about it in most cases, 00:06:19.680 --> 00:06:21.981 because it's very difficult to go back, 00:06:21.981 --> 00:06:24.240 particularly if you collected the data with a survey, 00:06:24.240 --> 00:06:26.910 it's very difficult to go and collect additional variables. 00:06:27.590 --> 00:06:31.020 Also, Aguinis and Vandenberg say here 00:06:31.910 --> 00:06:34.950 that data analysis problems are rarely something 00:06:34.950 --> 00:06:37.440 that causes an article to be rejected.
00:06:37.440 --> 00:06:39.480 Because a data analysis problem is something 00:06:39.480 --> 00:06:41.310 where you just re-analyze the data, 00:06:41.310 --> 00:06:43.328 fix the problem and you're going to be fine. 00:06:43.976 --> 00:06:46.320 So the problem is that 00:06:46.320 --> 00:06:49.388 if your design doesn't allow you to make causal claims, 00:06:49.388 --> 00:06:53.850 then you can't make the claim and there's nothing that you can do about it. 00:06:53.850 --> 00:06:57.308 They also say that there is this kind of persistent belief 00:06:57.680 --> 00:07:00.000 that if you have a bad design, 00:07:00.000 --> 00:07:03.720 you can compensate for that by using a fancy method. 00:07:03.720 --> 00:07:06.345 So some people seem to think that because, 00:07:06.507 --> 00:07:11.693 let's say, multilevel generalized structural equation modeling is a new thing, 00:07:12.000 --> 00:07:14.730 using that makes your study better. 00:07:15.183 --> 00:07:16.923 That is not true. 00:07:17.220 --> 00:07:20.820 The quality of the study is determined largely before data collection, 00:07:20.820 --> 00:07:25.080 and then after that, you just have to choose the appropriate analysis, 00:07:25.080 --> 00:07:28.890 instead of going with the one that is the most complex. 00:07:29.457 --> 00:07:32.223 If you have a bad design, you have bad data, 00:07:32.223 --> 00:07:35.211 then using a fancy method for that data 00:07:35.211 --> 00:07:38.070 just means that you spend a lot of time using a fancy 00:07:38.070 --> 00:07:40.650 and complicated method on bad data 00:07:40.650 --> 00:07:43.440 and the outcome is not a good paper anyway. 00:07:44.039 --> 00:07:47.640 So you just end up with a poor study with a fancy method.