WEBVTT Kind: captions Language: en 00:00:00.370 --> 00:00:05.000 After a statistical analysis you will nearly always have to do some kind of diagnostics 00:00:05.000 --> 00:00:08.410 for the results before you can trust them. 00:00:08.410 --> 00:00:14.589 In confirmatory factor analysis the most important diagnostic information is the chi-square statistic. 00:00:14.589 --> 00:00:22.270 And when you have a chi-square that is significant - it indicates that the model did not reproduce 00:00:22.270 --> 00:00:25.390 the empirical correlation matrix completely. 00:00:25.390 --> 00:00:32.250 It means that the model doesn't explain every part of the data well enough that the 00:00:32.250 --> 00:00:35.660 residuals can be attributed to chance only. 00:00:35.660 --> 00:00:41.579 So in this case I used the same data set as in the empirical example but I specified 00:00:41.579 --> 00:00:48.100 a factor model that had some factor correlations that were constrained to be zero. 00:00:48.100 --> 00:00:53.199 The chi-square detects that the correlations were not actually zero in the population. 00:00:53.199 --> 00:00:55.649 Therefore it rejects the model. 00:00:55.649 --> 00:00:56.820 So what do we do? 00:00:56.820 --> 00:01:02.659 It's actually very common that your chi-square statistic rejects the model. 00:01:02.659 --> 00:01:05.440 So you can't conclude that everything is well. 00:01:05.440 --> 00:01:08.180 You then have to understand why that occurs. 00:01:08.180 --> 00:01:10.550 So you have to do some diagnostics. 00:01:10.550 --> 00:01:17.700 There are two main ways of doing diagnostics for confirmatory factor analysis in an exploratory 00:01:17.700 --> 00:01:18.700 manner. 00:01:18.700 --> 00:01:25.200 So the exploratory manner means that you don't have any prior hypothesis of what is incorrect. 00:01:25.200 --> 00:01:29.090 The first approach is modification indices.
00:01:29.090 --> 00:01:35.970 I said earlier that your software could indicate that if you add a correlation between two 00:01:35.970 --> 00:01:41.640 error terms then that will improve the fit of the model. 00:01:41.640 --> 00:01:47.340 It will make the chi-square smaller and we hope non-significant. 00:01:47.340 --> 00:01:53.810 The idea of modification indices is that the computer calculates things that you can add 00:01:53.810 --> 00:01:55.930 to your model to make it better. 00:01:55.930 --> 00:01:57.890 That should not be done mindlessly. 00:01:57.890 --> 00:02:07.500 Mesquito and Lazzari give a good example of how to report these modification indices. 00:02:07.500 --> 00:02:11.230 First of all they report what the purpose of these indices is. 00:02:11.230 --> 00:02:17.239 So the purpose of these indices is that you can make the model reproduce the correlation 00:02:17.239 --> 00:02:21.360 matrix better by adding something to the model. 00:02:21.360 --> 00:02:25.610 Then you explain what you do. 00:02:25.610 --> 00:02:32.140 So they add some stuff and they add some other stuff. 00:02:32.140 --> 00:02:35.260 So is that justified? 00:02:35.260 --> 00:02:42.409 Well every time you make a change to your model it has to be justified based on your 00:02:42.409 --> 00:02:43.629 theory. 00:02:43.629 --> 00:02:50.400 For example if we have these six indicators and we have a modification index that indicates 00:02:50.400 --> 00:02:55.300 that these error terms should be correlated then we have to explain what the correlation 00:02:55.300 --> 00:02:56.300 means. 00:02:56.300 --> 00:03:04.049 For example if we have indicators of innovativeness and indicators of productivity we could say 00:03:04.049 --> 00:03:10.790 that ok yeah this indicator also measures something about personnel and this measures 00:03:10.790 --> 00:03:12.840 something about personnel as well.
00:03:12.840 --> 00:03:18.590 So these indicators have this personnel dimension and therefore we say that their errors should 00:03:18.590 --> 00:03:20.379 be correlated. 00:03:20.379 --> 00:03:26.800 In the first structural regression model course that I took the instructor told us that when 00:03:26.800 --> 00:03:33.230 you see a modification index then unless it gives you this kind of aha-moment then you 00:03:33.230 --> 00:03:34.969 shouldn't add anything to your model. 00:03:34.969 --> 00:03:39.919 So the modification index is only something that tells you that this is a part that you 00:03:39.919 --> 00:03:40.919 should consider. 00:03:40.919 --> 00:03:44.090 Then it's up to you to decide whether it makes sense. 00:03:44.090 --> 00:03:51.499 The idea of a factor analysis model is not to reproduce the data perfectly - the idea is to 00:03:51.499 --> 00:03:56.000 have a theoretical representation of the process that could have caused your data and it's 00:03:56.000 --> 00:04:01.849 also possible that factor analysis simply says that no your data don't measure the 00:04:01.849 --> 00:04:05.219 things that you say they measure. 00:04:05.219 --> 00:04:06.549 And that's a result. 00:04:06.549 --> 00:04:11.450 So every modification must be done based on theory. 00:04:11.450 --> 00:04:15.590 Another way of doing this is looking at the residuals. 00:04:15.590 --> 00:04:21.459 So we have residual correlations which are the difference between the implied matrix and 00:04:21.459 --> 00:04:24.240 the observed correlation matrix or covariance matrix. 00:04:24.240 --> 00:04:27.590 Here are the residuals for the full model. 00:04:27.590 --> 00:04:30.599 So there are two things that we need to check. 00:04:30.599 --> 00:04:33.710 First is the overall distribution of these residuals.
00:04:33.710 --> 00:04:40.919 Turns out that if the model is correctly specified these residual correlations are normally distributed 00:04:40.919 --> 00:04:45.400 with mean zero and we can see here that we have this bump here on the right hand side 00:04:45.400 --> 00:04:50.010 of the tail so that indicates misspecification. 00:04:50.010 --> 00:04:55.580 And this tail also indicates - because there's a bump in it - it indicates there's local misspecification. 00:04:55.580 --> 00:05:01.780 So there is some part of the model that is incorrectly specified. 00:05:01.780 --> 00:05:02.780 It's mostly ok. 00:05:02.780 --> 00:05:08.759 So most of these correlations are close to zero but there are some parts - this bump here 00:05:08.759 --> 00:05:15.840 - big bump and smaller bump - that indicate that there are parts where the model doesn't 00:05:15.840 --> 00:05:17.440 reproduce the data. 00:05:17.440 --> 00:05:22.690 Then it's up to us to look at the residuals and see where the high values are. 00:05:22.690 --> 00:05:29.850 We can see here that one block of items here - the vertical governance and horizontal governance 00:05:29.850 --> 00:05:34.720 indicators - correlate much more than the model implies. 00:05:34.720 --> 00:05:39.430 Then we have to look at the model and then think ok so we have an implied correlation 00:05:39.430 --> 00:05:44.970 of let's say zero so why is it zero in the implied correlation matrix. 00:05:44.970 --> 00:05:47.370 That relates back to the tracing rules. 00:05:47.370 --> 00:05:50.879 So what in the model predicts the correlation? 00:05:50.879 --> 00:05:57.729 In this case I constrained these two factors to be uncorrelated and that caused these residuals 00:05:57.729 --> 00:06:03.599 to go up and it indicates the model is misspecified because horizontal and vertical are 00:06:03.599 --> 00:06:06.259 actually quite highly correlated.
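The link between a constrained factor correlation and large residuals can be sketched with a small numeric example. This is a minimal illustration, not the lecture's actual model: the loadings, the factor correlation of 0.6, and the two-indicators-per-factor layout are all made-up assumptions, and the loadings are held fixed instead of being re-estimated under the constraint. It applies the tracing rule (loading times factor correlation times loading) to get the implied correlations, then takes observed minus implied.

```python
# Illustrative only: all numbers below are hypothetical, not from the lecture data.

def implied_corr(loadings, factor_of, phi):
    """Implied correlation matrix for a factor model in which each indicator
    loads on exactly one factor (standardized loadings assumed)."""
    p = len(loadings)
    sigma = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                # Tracing rule: loading * factor correlation * loading.
                sigma[i][j] = (loadings[i]
                               * phi[factor_of[i]][factor_of[j]]
                               * loadings[j])
    return sigma


def residual_corr(observed, implied):
    """Residual correlation = observed minus implied."""
    return [[o - m for o, m in zip(obs_row, imp_row)]
            for obs_row, imp_row in zip(observed, implied)]


loadings = [0.8, 0.7, 0.8, 0.7]          # hypothetical standardized loadings
factor_of = [0, 0, 1, 1]                 # indicators 0-1 on factor 0, 2-3 on factor 1
phi_true = [[1.0, 0.6], [0.6, 1.0]]      # hypothetical "true" factor correlation
phi_constrained = [[1.0, 0.0], [0.0, 1.0]]  # factors forced to be uncorrelated

# The "observed" matrix here is just the correlations the true model would imply.
observed = implied_corr(loadings, factor_of, phi_true)
implied = implied_corr(loadings, factor_of, phi_constrained)
resid = residual_corr(observed, implied)

for row in resid:
    print(["%.3f" % r for r in row])
```

In this sketch the within-factor residuals stay at zero while every cross-factor residual is positive, which is the block pattern described above: the indicators correlate more than the constrained model implies.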
00:06:06.259 --> 00:06:13.560 Another thing we can find is that these high values also involve single indicator 00:06:13.560 --> 00:06:18.830 factors - I constrained those to be uncorrelated with other factors as well. 00:06:18.830 --> 00:06:23.639 So that way you can look at the residuals and see which correlations the model doesn't 00:06:23.639 --> 00:06:28.930 explain well and then you think ok so why - what influences that correlation in your 00:06:28.930 --> 00:06:29.930 model? 00:06:29.930 --> 00:06:32.020 Is that part of your model correct? 00:06:32.020 --> 00:06:37.629 This requires a bit more expertise than just doing the modification indices. 00:06:37.629 --> 00:06:42.230 But the problem with the modification indices is that sometimes the modification indices 00:06:42.230 --> 00:06:43.879 don't make any sense at all. 00:06:43.879 --> 00:06:49.979 And it's easier to make a nonsensical decision using the modification indices than it is using 00:06:49.979 --> 00:06:51.500 the residuals. 00:06:51.500 --> 00:06:58.080 So the way I do diagnostics is that I usually take a quick look at the modification indices if my 00:06:58.080 --> 00:07:01.949 model doesn't fit well and then I print out the residuals. 00:07:01.949 --> 00:07:05.150 Also it may make sense to print out a part of these residuals. 00:07:05.150 --> 00:07:10.490 This is a big matrix so going through it one by one is difficult but once you have 00:07:10.490 --> 00:07:16.300 identified the segment of the matrix where you have larger values then you can fit a 00:07:16.300 --> 00:07:17.300 submodel. 00:07:17.300 --> 00:07:21.460 So for example we could fit the model with only horizontal governance, vertical governance 00:07:21.460 --> 00:07:24.860 and then maybe one other factor. 00:07:24.860 --> 00:07:30.780 So the way to do diagnostics is that if the full model doesn't work then you start doing 00:07:30.780 --> 00:07:31.780 submodels.
00:07:31.780 --> 00:07:37.919 So can you get a smaller model to work - drop something from the model and then if it works then you 00:07:37.919 --> 00:07:42.030 know that something that you dropped from the model was the reason why it didn't work. 00:07:42.030 --> 00:07:44.460 Then you can look at the part that you dropped. 00:07:44.460 --> 00:07:48.620 Or split the model into two and then do diagnostics for the first part. 00:07:48.620 --> 00:07:51.379 Once you're happy with that then do it for the second part. 00:07:51.379 --> 00:07:54.550 Once you're happy with that then do it for the full model. 00:07:54.550 --> 00:07:59.800 A good engineering principle is that once you have a big system that doesn't 00:07:59.800 --> 00:08:04.080 work start looking at individual parts and then figure out which of those parts don't 00:08:04.080 --> 00:08:09.230 work and whether they can be fixed and only after verifying all the parts then you look 00:08:09.230 --> 00:08:13.630 at the whole because looking at the big correlation matrix is very difficult to do.
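Finding the segment of a big residual matrix that deserves a submodel can be done mechanically before any reading by eye. The sketch below is one possible way to do it, under stated assumptions: the residual values, the indicator names (`h1`, `h2`, `v1`, `v2`) and the 0.10 cutoff are all hypothetical, the cutoff being a rough rule of thumb and not something the lecture prescribes. It lists the indicator pairs whose absolute residual exceeds the cutoff and counts them per pair of factors.

```python
# Illustrative only: residuals, names and the 0.10 cutoff are assumptions.

def large_residuals(resid, names, threshold=0.10):
    """Return (name_i, name_j, residual) for each pair below the diagonal
    whose absolute residual exceeds the threshold, largest first."""
    flagged = []
    for i in range(len(names)):
        for j in range(i):
            if abs(resid[i][j]) > threshold:
                flagged.append((names[i], names[j], resid[i][j]))
    return sorted(flagged, key=lambda t: -abs(t[2]))


def count_by_block(flagged, factor_of_name):
    """Count flagged pairs per (factor, factor) block of the matrix."""
    counts = {}
    for a, b, _ in flagged:
        block = tuple(sorted((factor_of_name[a], factor_of_name[b])))
        counts[block] = counts.get(block, 0) + 1
    return counts


names = ["h1", "h2", "v1", "v2"]
factor_of_name = {"h1": "horizontal", "h2": "horizontal",
                  "v1": "vertical", "v2": "vertical"}
resid = [
    [0.00, 0.02, 0.30, 0.25],
    [0.02, 0.00, 0.28, 0.22],
    [0.30, 0.28, 0.00, -0.01],
    [0.25, 0.22, -0.01, 0.00],
]

flagged = large_residuals(resid, names)
counts = count_by_block(flagged, factor_of_name)

for a, b, r in flagged:
    print(a, b, "%.2f" % r)
print(counts)
```

A block that collects most of the flagged pairs is a natural candidate for the smaller model to fit first; whether any change to that part of the model is justified still has to be decided from theory.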