WEBVTT

Empirical studies typically need to demonstrate the reliability and validity of their measures. The two commonly used tools for doing so are factor analysis for validation and coefficient alpha for demonstrating reliability. There are also other techniques that can be applied, but these are the most commonly used ones, and they are also the easiest to apply.

Let's take a look at an empirical example. Our example comes from Baron and Tang, who measure the social skills of entrepreneurs. Here we can see that they say they apply alpha for reliability assessment; there is the alpha character. They present some numbers, 0.85 and 0.71, and they also report that they applied factor analysis. So what do these two techniques do, why are they used here, and what is the logic?

Also, what does this table mean? The table shows the factor analysis results, and it also shows the alphas that they calculated for this case.

To understand what this table is about, we first need to understand a bit about measurement and what reliability is. Let's assume that we have this bathroom scale here. It's a bit rusty, and we don't know whether it is reliable or not. To determine if the scale is reliable, that is, if it always gives the same result when you step on it, it is very easy to check: simply step on the scale once and get the reading.
Step off, let the scale reset, then step on the scale again and get the reading; step off, let it reset, step on, get the reading, and step off. Now you have three different readings from the same scale. If all three readings are the same, or very similar to one another, then we conclude that the scale is reliable.

It lacks random error. That of course does not tell us whether it is a valid scale. It might show 10 kilos too much or 10 kilos too little, but that is not the question that reliability addresses. Reliability is about consistency: if we measure the same thing again and again, do we get the same result?

So how do we do that when we measure people? If we want to measure a person's social perception, we can't simply ask the same question again and again, because the person will remember what they answered the previous time and will just repeat that answer. So that doesn't work; people don't reset as easily as bathroom scales do.

In practice we often use multiple different questions, called distinct measures. We have five different questions that are all supposed to measure the same thing, but they are sufficiently distinct that the person doesn't really recognize that they are all asking about the same thing: the person's social perception capabilities.
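The bathroom-scale intuition can be sketched in code. This is a minimal illustration with simulated data (the weights, sample size, and error level are invented for the example): repeated readings of the same people correlate highly when random error is small, and a constant bias leaves that correlation untouched, which is exactly why reliability says nothing about validity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 people, each with a true weight,
# measured twice on the same scale. The scale adds random error.
true_weight = rng.normal(75, 12, size=200)
error_sd = 1.0                         # small random error -> reliable scale
reading_1 = true_weight + rng.normal(0, error_sd, size=200)
reading_2 = true_weight + rng.normal(0, error_sd, size=200)

# Reliability as consistency: repeated readings of the same people
# should correlate highly when random error is small.
r = np.corrcoef(reading_1, reading_2)[0, 1]
print(round(r, 2))

# A constant bias (always 10 kg too much) does not reduce the
# correlation at all: the scale stays reliable but is not valid.
reading_biased = reading_2 + 10
r_biased = np.corrcoef(reading_1, reading_biased)[0, 1]
print(round(r_biased, 2))
```

With a larger `error_sd` the correlation between the two readings drops, which is what an unreliable instrument looks like.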
They are nevertheless sufficiently similar that we can argue they all measure the same thing. It is a fine balance how different and how similar the items can be.

There is also another strategy for assessing reliability, called test-retest reliability, where we actually ask the same question over and over with a time delay. This is a bit problematic, because the delay between asking a person the same question needs to be days or weeks; otherwise the person will remember their past answer and just repeat it without reconsidering the question. So in practice most studies use distinct measures: we ask, for example, three or five different questions that are supposed to measure the same thing but are different enough that the person doesn't realize they are being asked the same thing.

Factor analysis is a tool for validating these multiple-item measures. This is a table of factor analysis results. What do these results, or factor analysis in general, tell us? They tell us which items go together, which items have something in common, and whether there are any underlying dimensions in the data. The idea of factor analysis here is that if we have five measurement scales, then the items should group empirically according to the things they are supposed to measure. These five items are supposed to measure social perception.
So we say that conceptually they have in common that they all measure social perception. Then we look at whether they also have something in common empirically, and factor analysis does that for us. Factor analysis identifies that these five items belong to factor number two.

When we look at this table, we want to see a pattern like this: each item belongs to one factor. These numbers are called factor loadings. Ideally the loading on the main factor would be more than 0.7. This 0.5 is a bit weak; a loading of 0.7 or more is typically considered acceptable for an item.

We also want to see that the items do not load highly on the other factors. If we assume that factor 3, for example, is expressiveness, then those items correlate strongly because they all measure expressiveness. We want to see low values here for the social perception items: the social perception items should not depend on expressiveness. This is a very clean factor solution, because the social perception items load only on the social perception factor and not on the others. We want the cross-loadings to be small and the main loadings to be large. What is small? Less than 0.2, or less than 0.3 depending on the source, is typically considered small.

Not all the items work ideally. For example, this item here, "people tell me that I'm a sensitive and understanding person", loads on factor 4, factor 3, and factor 1.
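To make the idea of loadings concrete, here is a rough sketch with simulated data; the trait names, loading values, and sample size are all invented. It extracts principal-component loadings from the item correlation matrix. Published studies, including the kind discussed here, typically use proper factor extraction with rotation, so this is only a simplified stand-in that shows items grouping by their underlying trait.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

def item(factor, loading):
    # Each item = loading * latent trait + noise, scaled to unit variance.
    noise = rng.normal(0, np.sqrt(1 - loading**2), n)
    return loading * factor + noise

# Hypothetical data: two uncorrelated latent traits, three items each.
perception = rng.normal(size=n)
expressive = rng.normal(size=n)
items = np.column_stack([
    item(perception, 0.85), item(perception, 0.85), item(perception, 0.85),
    item(expressive, 0.65), item(expressive, 0.65), item(expressive, 0.65),
])

# Principal-component loadings from the item correlation matrix:
# loading = eigenvector * sqrt(eigenvalue).
corr = np.corrcoef(items, rowvar=False)
eigval, eigvec = np.linalg.eigh(corr)          # eigenvalues in ascending order
order = np.argsort(eigval)[::-1][:2]           # keep the two largest factors
loadings = eigvec[:, order] * np.sqrt(eigval[order])
print(np.round(loadings, 2))
```

In the printed 6 x 2 table, the first three items load strongly on one column and near zero on the other, and the last three items show the mirror-image pattern: the clean structure the lecture describes.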
So it is not cleanly measuring only factor 5, which is social adaptability; it also depends on, for example, expressiveness. We might consider dropping that kind of item, but it was retained in this study because the scale had been validated before.

This is factor analysis: you look for a pattern where the items that are supposed to measure the same thing load on the same factor and not on any other factors. Then we would typically label the factors: this one would be labeled social perception, this one social adaptability, and so on. This is the tool that we use for validating items empirically.

How do we assess reliability? Once we have established, using factor analysis, that the items measure something in common, we calculate coefficient alpha. How exactly is it calculated? That is not essential to know, but it basically estimates the reliability of the average, or the sum, of the items that belong to the same scale. If we take this scale here, it has four items; we take the sum of those four items, and the alpha tells us what the reliability of that sum would be.

Typically values greater than 0.7 are considered acceptable. Oftentimes we get higher values, sometimes lower, and sometimes a lower reliability can be okay if you are asking a question that has never been asked before.
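For those who do want to see the computation, coefficient alpha for k items is k/(k-1) times (1 minus the sum of the item variances divided by the variance of the item sum). A minimal sketch, using a simulated, hypothetical four-item scale:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents x k_items) array:
    alpha = k/(k-1) * (1 - sum(item variances) / var(sum of items))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale sum
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical data: one latent trait plus independent noise per item.
rng = np.random.default_rng(2)
trait = rng.normal(size=500)
scale_items = np.column_stack(
    [trait + rng.normal(0, 0.7, 500) for _ in range(4)]
)
alpha = cronbach_alpha(scale_items)
print(round(alpha, 2))
```

With this noise level the four-item sum comes out well above the conventional 0.7 threshold; increasing the noise standard deviation pushes alpha down.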
If you are studying something that has very well established scales, then we might require 0.85 or 0.9 reliability, because the baseline is so high already.

Let's summarize measurement. The important concepts of measurement are reliability, the lack of random noise in our measures, and validity, whether the variables actually measure what they are supposed to measure. Reliability is conceptually easy to demonstrate: you basically just take repeated measures using the same instrument, and if they correlate, the measure is reliable. With measuring people and their attitudes and perceptions this is difficult, because the person remembers what they answered in the past. So in practice we have to use multiple questions that are slightly different, and then we calculate coefficient alpha.

Validity is something that we can only demonstrate less directly. In practice, validity is an argument that we have to make on conceptual grounds. For example, if we use a CEO's name as a measure of gender, we have to argue on conceptual grounds that the name is a good measure of gender. It is pretty obvious to most people that that would be a valid measure. Empirically, when we have multiple items, we can apply factor analysis to demonstrate that the indicators that are supposed to measure the same thing also have something in common empirically. Then we assume that what they have in common is that they actually measure the same thing.
In practice, using coefficient alpha and factor analysis is what most articles that use survey data do.