WEBVTT
Kind: captions
Language: en

00:00:00.199 --> 00:00:04.450
In the previous video we had a Heywood case in a factor analysis.

00:00:04.450 --> 00:00:09.769
Let's take a look at what causes a Heywood case and how you would interpret one if you

00:00:09.769 --> 00:00:12.710
get one in your own research.

00:00:12.710 --> 00:00:19.760
So the idea of a Heywood case and admissibility is that all variances must be non-negative because

00:00:19.760 --> 00:00:25.350
variance quantifies the degree of variation and something can't have negative variance.

00:00:25.350 --> 00:00:30.480
It's like you can't have a negative length, for example.

00:00:30.480 --> 00:00:32.660
Then this is our example.

00:00:32.660 --> 00:00:35.250
So we have a three-indicator factor model here.

00:00:35.250 --> 00:00:37.660
We have the model-implied correlation matrix here.

00:00:37.660 --> 00:00:42.610
We have here the empirical correlation matrix and we estimate the factor model.

00:00:42.610 --> 00:00:45.490
So we will get factor loadings, which are standardized.

00:00:45.490 --> 00:00:52.660
So the factor is scaled by setting the variance of the factor to one.

00:00:52.660 --> 00:00:55.380
We can see that we have a correlation that exceeds one.

00:00:55.380 --> 00:00:56.920
That is not possible.

00:00:56.920 --> 00:01:01.100
And we have a variance that is below zero, which is not possible either.

00:01:01.100 --> 00:01:04.790
So this is the Heywood case.

00:01:04.790 --> 00:01:06.899
So it's a negative error variance.

00:01:06.899 --> 00:01:07.899
Variances can't be negative.

00:01:07.899 --> 00:01:13.950
It is inadmissible because it's an impossible solution.

00:01:13.950 --> 00:01:18.810
Now what do we do with it and why does this occur?

00:01:18.810 --> 00:01:24.689
It occurs, in this case, because of sampling error.
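To make the example concrete, here is a rough sketch of how a Heywood case can fall out of a three-indicator model. It assumes a one-factor model with the factor variance fixed to one, so each implied correlation is the product of two loadings, and it assumes all three correlations are positive. The correlation values 0.9, 0.8 and 0.6 and the function name are hypothetical; they are not the numbers shown in the video.

```python
import math

def one_factor_three_indicators(r12, r13, r23):
    """Solve r_ij = l_i * l_j for a standardized one-factor model.

    With three indicators the model is just-identified, so the squared
    loadings follow directly from the three correlations.
    Assumes all three correlations are positive.
    """
    squared = [r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12]
    loadings = [math.sqrt(s) for s in squared]
    # Standardized indicators: error variance = 1 - squared loading.
    error_variances = [1.0 - s for s in squared]
    return loadings, error_variances

# Hypothetical sample correlations, chosen so that r12 * r13 > r23.
loadings, error_variances = one_factor_three_indicators(0.9, 0.8, 0.6)

# A negative error variance (equivalently, a loading above one) is a Heywood case.
heywood = [v < 0 for v in error_variances]
print(loadings, error_variances, heywood)
```

With these made-up correlations the first indicator gets a squared loading of about 1.2, so its standardized loading is above one and its error variance is about -0.2, while the other two indicators are fine.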

00:01:24.689 --> 00:01:33.130
So the correlations here are never at exactly the population values and sometimes it happens

00:01:33.130 --> 00:01:35.119
that we'll get negative estimates.

00:01:35.119 --> 00:01:42.180
The reason for that is that if we repeat - this is a simulated data set - if we repeat the estimation

00:01:42.180 --> 00:01:48.920
of this factor model over and over, the real error variance is 0.19 and the real factor

00:01:48.920 --> 00:01:51.070
loading is 0.9.

00:01:51.070 --> 00:01:57.450
If we estimate this factor loading that has a real value of 0.9 many, many times and we have

00:01:57.450 --> 00:02:03.469
an unbiased estimator, then the estimates are correct on average.

00:02:03.469 --> 00:02:08.599
So the estimates are centered around the correct population value 0.9.

00:02:08.599 --> 00:02:14.530
If our sample size is small then it means that the estimates - any individual estimate

00:02:14.530 --> 00:02:21.040
- are not exactly 0.9 but somewhere around 0.9 here.

00:02:21.040 --> 00:02:26.430
If the estimates are normally distributed then we have this negative tail here and we

00:02:26.430 --> 00:02:29.079
also have this positive tail here.

00:02:29.079 --> 00:02:36.489
And we can see that if we have some estimates that are below 0.8 then, because of unbiasedness

00:02:36.489 --> 00:02:40.950
and normality, some estimates go above 1.

00:02:40.950 --> 00:02:48.310
So it's possible that if you have a very good estimator and the population value is very large,

00:02:48.310 --> 00:02:55.170
or the population error variance is very small, and your sample size is small,

00:02:55.170 --> 00:03:00.519
then because of the normality and unbiasedness of the estimates you will get these

00:03:00.519 --> 00:03:02.590
inadmissible results.

00:03:02.590 --> 00:03:05.409
So what do you do about it?

00:03:05.409 --> 00:03:10.920
Well, there are two things that can cause a Heywood case.

00:03:10.920 --> 00:03:13.329
One thing is a small sample.
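The repeated-estimation argument above can be tried out with a small simulation. This is only a sketch under stated assumptions: all three indicators are given a population loading of 0.9 (error variance 0.19), the sample size of 20, the 2000 replications and the seed are arbitrary choices, and the estimate uses the closed-form solution for a just-identified three-indicator model instead of whatever estimator the video used. The function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_error_variances(n=20, reps=2000, loading=0.9, seed=1):
    """Estimate the first indicator's error variance in many small samples."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - loading ** 2)  # population error variance 0.19
    estimates = []
    for _ in range(reps):
        factor = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [[loading * f + noise_sd * rng.gauss(0.0, 1.0) for f in factor]
             for _ in range(3)]
        r12 = pearson(x[0], x[1])
        r13 = pearson(x[0], x[2])
        r23 = pearson(x[1], x[2])
        if r23 <= 0:
            continue  # closed form is not usable; skip this sample
        # Squared loading of indicator 1 is r12 * r13 / r23.
        estimates.append(1.0 - r12 * r13 / r23)
    return estimates

estimates = simulate_error_variances()
mean_estimate = sum(estimates) / len(estimates)
heywood_share = sum(e < 0 for e in estimates) / len(estimates)
print(f"mean error variance estimate: {mean_estimate:.3f}")
print(f"share of samples with a Heywood case: {heywood_share:.3f}")
```

The estimates should scatter around 0.19, and a minority of the small samples should land below zero even though the population model is correctly specified. Increasing `n` should shrink that share.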

00:03:13.329 --> 00:03:20.900
With a highly reliable indicator and a small sample we could estimate this 0.19 as being negative.

00:03:20.900 --> 00:03:26.950
Another thing that a Heywood case can indicate is that your model is so severely misspecified

00:03:26.950 --> 00:03:32.520
that the factors that you're specifying are not actually the correct factors.

00:03:32.520 --> 00:03:35.590
So you're specifying the factor structure incorrectly.

00:03:35.590 --> 00:03:42.150
And that can cause some of the estimates to become inadmissible as well.

00:03:42.150 --> 00:03:44.239
So how do you know which one is the case?

00:03:44.239 --> 00:03:52.620
Is it a symptom of model misspecification or is it just because you have an unbiased estimator

00:03:52.620 --> 00:03:58.510
that is normally distributed and you have a population value that is close to being

00:03:58.510 --> 00:04:01.480
the maximum or minimum?

00:04:01.480 --> 00:04:08.549
You don't know for sure, but one thing that is sure is that if you have an error variance that

00:04:08.549 --> 00:04:17.690
is, let's say, minus 2 and a factor variance that is 1, then that can't be because of small

00:04:17.690 --> 00:04:26.420
sampling fluctuation. So if your estimated error variances are way below zero then that's

00:04:26.420 --> 00:04:28.030
an indication of a problem.

00:04:28.030 --> 00:04:34.930
If they are slightly below zero then you could say that maybe the population value is actually

00:04:34.930 --> 00:04:39.950
a small positive number and the negative estimate is only a small sampling fluctuation.

00:04:39.950 --> 00:04:47.390
You don't know, but if the negative values are small then I would be OK with you just saying that

00:04:47.390 --> 00:04:48.820
the indicator is highly reliable.