WEBVTT
Kind: captions
Language: en

00:00:00.370 --> 00:00:05.000
After a statistical analysis you will nearly
always have to do some kind of diagnostics

00:00:05.000 --> 00:00:08.410
for the results before you can trust them.

00:00:08.410 --> 00:00:14.589
In confirmatory factor analysis the most important
diagnostic information is the chi-square statistic.

00:00:14.589 --> 00:00:22.270
And when you have a chi-square that is significant,
it indicates that the model did not reproduce

00:00:22.270 --> 00:00:25.390
the empirical correlation matrix completely.

00:00:25.390 --> 00:00:32.250
It means that the model doesn't really explain
every part of the data well enough that the

00:00:32.250 --> 00:00:35.660
residuals can be attributed to chance
alone.

00:00:35.660 --> 00:00:41.579
So in this case I used the same data set
as in the empirical example, but I specified

00:00:41.579 --> 00:00:48.100
a factor model that had some factor correlations
that were constrained to be zero.

00:00:48.100 --> 00:00:53.199
The chi-square detects that the correlations
were not actually zero in the population.

00:00:53.199 --> 00:00:55.649
Therefore it rejects the model.

00:00:55.649 --> 00:00:56.820
So what do we do?

00:00:56.820 --> 00:01:02.659
It's actually very common that your chi-square
statistic rejects the model.

00:01:02.659 --> 00:01:05.440
So you can't conclude that everything is well.

00:01:05.440 --> 00:01:08.180
Then you have to understand why that
occurs.

00:01:08.180 --> 00:01:10.550
So you have to do some diagnostics.

00:01:10.550 --> 00:01:17.700
There are two main ways of doing diagnostics
for confirmatory factor analysis in an exploratory

00:01:17.700 --> 00:01:18.700
manner.

00:01:18.700 --> 00:01:25.200
So the exploratory manner means that you don't
have any prior hypothesis of what is incorrect.

00:01:25.200 --> 00:01:29.090
The first approach is modification indices.

00:01:29.090 --> 00:01:35.970
I said earlier that your software could indicate
that if you add a correlation between two

00:01:35.970 --> 00:01:41.640
error terms, that
will improve the fit of the model.

00:01:41.640 --> 00:01:47.340
It will make the chi-square smaller and we
hope non-significant.

00:01:47.340 --> 00:01:53.810
The idea of modification indices is that the
computer calculates things that you can add

00:01:53.810 --> 00:01:55.930
to your model to make it better.

00:01:55.930 --> 00:01:57.890
That should not be done mindlessly.

00:01:57.890 --> 00:02:07.500
Mesquita and Lazzarini give a good example of
how to report these modification indices.

00:02:07.500 --> 00:02:11.230
First of all they report what the purpose
of these indices is.

00:02:11.230 --> 00:02:17.239
So the purpose of these indices is that you
can make the model reproduce the correlation

00:02:17.239 --> 00:02:21.360
matrix better by adding something to the model.

00:02:21.360 --> 00:02:25.610
Then you explain what you
do.

00:02:25.610 --> 00:02:32.140
So they add some stuff and they add some other
stuff.

00:02:32.140 --> 00:02:35.260
So is that justified?

00:02:35.260 --> 00:02:42.409
Well, every time you make a change to your
model, it has to be justified based on your

00:02:42.409 --> 00:02:43.629
theory.

00:02:43.629 --> 00:02:50.400
For example, if we have these six indicators
and we have a modification index that indicates

00:02:50.400 --> 00:02:55.300
that these error terms should be correlated
then we have to explain what the correlation

00:02:55.300 --> 00:02:56.300
means.

00:02:56.300 --> 00:03:04.049
For example, if we have indicators of innovativeness
and indicators of productivity, we could say

00:03:04.049 --> 00:03:10.790
that ok yeah this indicator also measures
something about personnel and this measures

00:03:10.790 --> 00:03:12.840
something about personnel as well.

00:03:12.840 --> 00:03:18.590
So these indicators have this personnel dimension,
and therefore we say that their errors should

00:03:18.590 --> 00:03:20.379
be correlated.

00:03:20.379 --> 00:03:26.800
In the first structural regression model course
that I took, the instructor told us that when

00:03:26.800 --> 00:03:33.230
you see a modification index, then unless it
gives you this kind of aha-moment, you

00:03:33.230 --> 00:03:34.969
shouldn't add anything to your model.

00:03:34.969 --> 00:03:39.919
So the modification index is only something
that tells you that this is a part that you

00:03:39.919 --> 00:03:40.919
should consider.

00:03:40.919 --> 00:03:44.090
Then it's up to you to decide whether it makes
sense.

00:03:44.090 --> 00:03:51.499
The idea of a factor analysis model is not to
reproduce the data perfectly - the idea is to

00:03:51.499 --> 00:03:56.000
have a theoretical representation of the process
that could have caused your data, and it's

00:03:56.000 --> 00:04:01.849
also possible that factor analysis simply
says that no, your data don't measure the

00:04:01.849 --> 00:04:05.219
things you say they measure.

00:04:05.219 --> 00:04:06.549
And that's a result.

00:04:06.549 --> 00:04:11.450
So every modification must be done based on
theory.

00:04:11.450 --> 00:04:15.590
Another way of doing this is looking at the
residuals.

00:04:15.590 --> 00:04:21.459
So we have residual correlations, which are
the differences between the implied matrix and

00:04:21.459 --> 00:04:24.240
the observed correlation matrix or covariance
matrix.
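
NOTE
A minimal sketch of the residual calculation described above, assuming small
made-up observed and model-implied correlation matrices. The indicator names
(h1, h2, v1, v2), the values, and the helper functions are hypothetical and
are not taken from the lecture's data. A residual is the observed value minus
the implied value, and the largest absolute residuals show where the model
fails to reproduce the data.
```python
# Hypothetical 4-indicator example; all values are illustrative only.
names = ["h1", "h2", "v1", "v2"]
observed = [
    [1.00, 0.55, 0.30, 0.22],
    [0.55, 1.00, 0.27, 0.18],
    [0.30, 0.27, 1.00, 0.52],
    [0.22, 0.18, 0.52, 1.00],
]
# Implied by a hypothetical two-factor model with uncorrelated factors.
implied = [
    [1.00, 0.56, 0.00, 0.00],
    [0.56, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.54],
    [0.00, 0.00, 0.54, 1.00],
]
def residual_matrix(obs, imp):
    """Element-wise difference: observed minus model-implied correlation."""
    return [[o - m for o, m in zip(ro, rm)] for ro, rm in zip(obs, imp)]
def largest_residuals(res, labels, top=3):
    """Return the off-diagonal residuals with the largest absolute values."""
    pairs = [
        (labels[i], labels[j], res[i][j])
        for i in range(len(labels))
        for j in range(i)
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:top]
residuals = residual_matrix(observed, implied)
for a, b, r in largest_residuals(residuals, names):
    print(f"{a} - {b}: {r:+.2f}")
```
In this made-up example the largest residuals all sit in the block that pairs
h indicators with v indicators, which is the kind of pattern to look for.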

00:04:24.240 --> 00:04:27.590
Here are the residuals for the full model.

00:04:27.590 --> 00:04:30.599
So there are two things that we need to check.

00:04:30.599 --> 00:04:33.710
First is the overall distribution of these
residuals.

00:04:33.710 --> 00:04:40.919
Turns out that if the model is correctly specified
these residual correlations are normally distributed

00:04:40.919 --> 00:04:45.400
with mean zero, and we can see here that
we have this bump here on the right-hand side

00:04:45.400 --> 00:04:50.010
of the tail so that indicates misspecification.

00:04:50.010 --> 00:04:55.580
And this tail, because there's
a bump in it, indicates there's local misspecification.

00:04:55.580 --> 00:05:01.780
So there is some part of the model that is
incorrectly specified.

00:05:01.780 --> 00:05:02.780
It's mostly ok.

00:05:02.780 --> 00:05:08.759
So most of these correlations are close to
zero, but there are some parts, these bumps here

00:05:08.759 --> 00:05:15.840
- a big bump and a smaller bump - that indicate
that there are parts where the model doesn't

00:05:15.840 --> 00:05:17.440
reproduce the data.

00:05:17.440 --> 00:05:22.690
Then it's up to us to look at the residuals
and see where the high values are.

00:05:22.690 --> 00:05:29.850
We can see here that one block of items here
- the vertical governance and horizontal governance

00:05:29.850 --> 00:05:34.720
indicators correlate much more than the model
implies.

00:05:34.720 --> 00:05:39.430
Then we have to look at the model and
think: ok, so we have an implied correlation

00:05:39.430 --> 00:05:44.970
of, let's say, zero, so why is it zero in the
implied correlation matrix?

00:05:44.970 --> 00:05:47.370
That relates back to the tracing rules.

00:05:47.370 --> 00:05:50.879
So what in the model predicts the correlation?

00:05:50.879 --> 00:05:57.729
In this case I constrained these two factors
to be uncorrelated, and that caused these residuals

00:05:57.729 --> 00:06:03.599
to go up, and it indicates the model is misspecified
because horizontal and vertical are

00:06:03.599 --> 00:06:06.259
actually quite highly correlated.
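
NOTE
The tracing-rule reasoning above can be sketched as follows, assuming a
two-factor model with standardized loadings. The loadings, the observed
correlation, the freed factor correlation, and the function name are all
hypothetical illustrations, not values from the lecture.
```python
def implied_correlation(loading_i, loading_j, factor_corr):
    """Tracing rule for two indicators loading on different factors:
    go from indicator i up to its factor, across the factor correlation,
    and down to indicator j, multiplying the coefficients along the path."""
    return loading_i * factor_corr * loading_j
# Hypothetical standardized loadings, one indicator per factor.
lam_h1, lam_v1 = 0.8, 0.9
observed_r = 0.30  # hypothetical observed correlation between h1 and v1
# Factor correlation constrained to zero: the model implies no correlation,
# so the whole observed correlation ends up in the residual.
constrained = implied_correlation(lam_h1, lam_v1, 0.0)
print(observed_r - constrained)
# Factor correlation freed (hypothetical value 0.4): the path now carries
# correlation and the residual shrinks.
freed = implied_correlation(lam_h1, lam_v1, 0.4)
print(round(observed_r - freed, 3))
```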

00:06:06.259 --> 00:06:13.560
Another thing we can find is that
these high values also involve single-indicator

00:06:13.560 --> 00:06:18.830
factors, which I constrained to be uncorrelated
with other factors as well.

00:06:18.830 --> 00:06:23.639
So that way you can look at the residuals
and see which correlation the model doesn't

00:06:23.639 --> 00:06:28.930
explain well, and then you think: ok,
what influences that correlation in your

00:06:28.930 --> 00:06:29.930
model?

00:06:29.930 --> 00:06:32.020
Is that part of your model correct?

00:06:32.020 --> 00:06:37.629
This requires a bit more expertise than just
doing the modification indices.

00:06:37.629 --> 00:06:42.230
But the problem with the modification indices
is that sometimes the modification indices

00:06:42.230 --> 00:06:43.879
don't make any sense at all.

00:06:43.879 --> 00:06:49.979
And it's easier to make a nonsensical decision
using the modification indices than it is using

00:06:49.979 --> 00:06:51.500
the residuals.

00:06:51.500 --> 00:06:58.080
So the way I do diagnostics is that I usually
quickly check the modification indices if my

00:06:58.080 --> 00:07:01.949
model doesn't fit well and then I print out
the residuals.

00:07:01.949 --> 00:07:05.150
Also it may make sense to print out a part
of these residuals.

00:07:05.150 --> 00:07:10.490
So this is a big matrix, so going through
it one by one is difficult, but once you have

00:07:10.490 --> 00:07:16.300
identified the segment of the matrix where
you have larger values then you can fit a

00:07:16.300 --> 00:07:17.300
submodel.

00:07:17.300 --> 00:07:21.460
So for example we could fit a model with only
horizontal governance, vertical governance,

00:07:21.460 --> 00:07:24.860
and then maybe one other factor.

00:07:24.860 --> 00:07:30.780
So the way to do diagnostics is that if a
full model doesn't work then you start doing

00:07:30.780 --> 00:07:31.780
submodels.

00:07:31.780 --> 00:07:37.919
So can you get a smaller model to work? Drop something
from the model, and if it works then you

00:07:37.919 --> 00:07:42.030
know that something that you dropped from the
model was the reason why it didn't work.

00:07:42.030 --> 00:07:44.460
Then you can look at the part that you dropped.

00:07:44.460 --> 00:07:48.620
Or split the model into two and then do diagnostics
for the first part.

00:07:48.620 --> 00:07:51.379
Once you're happy with that, then do it for the
second part.

00:07:51.379 --> 00:07:54.550
Once you're happy with that, then do it for the
full model.

00:07:54.550 --> 00:07:59.800
A good engineering principle
is that once you have a big system that doesn't

00:07:59.800 --> 00:08:04.080
work, you start looking at individual parts and
then figure out which of those parts don't

00:08:04.080 --> 00:08:09.230
work and whether they can be fixed, and only
after verifying all the parts do you look

00:08:09.230 --> 00:08:13.630
at the whole, because looking at the big correlation
matrix is very difficult to do.