WEBVTT
Kind: captions
Language: en
00:00:00.060 --> 00:00:04.500
This video explains the basic idea of
structural regression models that are
00:00:04.500 --> 00:00:07.380
sometimes referred to as structural
equation models in the literature.
00:00:07.380 --> 00:00:11.130
What is a structural regression
model? This technique is used
00:00:11.130 --> 00:00:14.730
in, for example, Mesquita and
Lazzarini's paper. They explain
00:00:14.730 --> 00:00:19.560
the technique as a combination of
a factor analysis and a path analysis.
00:00:19.560 --> 00:00:23.670
Path analysis is basically the
regression analysis where there
00:00:23.670 --> 00:00:27.870
are multiple equations for example
when you do a mediation model using
00:00:27.870 --> 00:00:31.380
the simultaneous equations approach
that will be called a path analysis.
00:00:31.380 --> 00:00:35.520
So path analysis is regression with
observed variables except that you
00:00:35.520 --> 00:00:40.200
have more than one dependent variable and
factor analysis is the analysis where we
00:00:40.200 --> 00:00:45.450
check what different indicators have
in common and perhaps whether we can
00:00:45.450 --> 00:00:49.560
group those indicators and consider
them as measures of the same concept.
00:00:49.560 --> 00:00:58.740
So SEM, or structural equation modeling, combines
these two analysis approaches. To understand
00:00:58.740 --> 00:01:05.280
what SEM is and what it does we can start
with the basic regression analysis model.
00:01:05.280 --> 00:01:09.540
So the basic regression analysis model
makes the important assumption that the
00:01:09.540 --> 00:01:15.810
X 1 and X 2 here are measured without any
measurement error. So the X 1 and X 2 are
00:01:15.810 --> 00:01:24.000
the quantities of interest themselves instead of
being measures of the quantities of interest.
00:01:24.000 --> 00:01:29.640
So X 1 is of direct interest instead
of being a measure with possibly some
00:01:29.640 --> 00:01:33.600
error in there of some concept that
we can't measure - observe directly.
00:01:33.600 --> 00:01:38.880
So regression analysis makes that assumption.
If that assumption of no measurement error
00:01:38.880 --> 00:01:45.480
fails, these regression coefficients beta 1
and beta 2 will be inconsistent and biased.
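
NOTE
A minimal sketch of why measurement error in a predictor biases a regression coefficient. It is not from the video: it uses a single predictor instead of X 1 and X 2, and the true slope (0.5), the error variance (1.0), the sample size and the helper name ols_slope are all illustrative assumptions.
```python
import numpy as np
def ols_slope(x, y):
    # Slope of a simple least-squares regression of y on x.
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)
rng = np.random.default_rng(0)
n = 200_000                                # assumed sample size, large to reduce noise
true_x = rng.normal(size=n)                # predictor without measurement error
y = 0.5 * true_x + rng.normal(size=n)      # assumed true slope of 0.5
noisy_x = true_x + rng.normal(size=n)      # same predictor observed with random error
slope_clean = ols_slope(true_x, y)         # should land near the true slope
slope_noisy = ols_slope(noisy_x, y)        # shrinks toward zero (attenuation)
print(round(slope_clean, 2), round(slope_noisy, 2))
```
With these assumed values half of the variance of noisy_x is error, so the second slope should come out at roughly half of the first one.
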
00:01:45.480 --> 00:01:51.120
Then we have the factor analysis model.
The idea of a factor analysis model was
00:01:51.120 --> 00:01:54.090
that we have a set of indicators and then we ask
00:01:54.090 --> 00:01:59.160
what these indicators have in common and
what they have in common is one factor.
00:01:59.160 --> 00:02:04.380
In confirmatory factor analysis we
ask do these indicators represent
00:02:04.380 --> 00:02:09.210
one factor or not. The computer gives us
an answer. In an exploratory analysis, which
00:02:09.210 --> 00:02:12.960
is not part of a structural regression
model, the computer finds the factors.
00:02:12.960 --> 00:02:15.660
So we define a factor structure here and then we
00:02:15.660 --> 00:02:18.060
estimate it. So that's part of
a structural regression model.
00:02:18.060 --> 00:02:24.390
The idea of a structural regression model is that we
take these variables - these analysis approaches
00:02:24.390 --> 00:02:29.550
and we combine them. So we have a regression
analysis model here where instead of having
00:02:29.550 --> 00:02:36.690
the indicators that are possibly contaminated
with measurement error we model regression
00:02:36.690 --> 00:02:45.450
between latent variables X 1, X 2 and Y and then
we add the factor analysis directly to the model.
00:02:45.450 --> 00:02:48.660
So we have a combination of factor analysis and
00:02:48.660 --> 00:02:53.460
regression analysis between the
factors in the factor analysis.
00:02:53.460 --> 00:02:56.730
This is clearly a more complicated concept than
00:02:56.730 --> 00:03:00.840
simply applying regression
analysis on scale scores.
00:03:00.840 --> 00:03:07.920
This model has two parts. This inner part here
with the latent variables is referred to as
00:03:07.920 --> 00:03:13.110
the latent variable model. Some people call
this part of the model the structural
00:03:13.110 --> 00:03:18.960
model but that's a bit misleading because
these measurement relationships here are
00:03:18.960 --> 00:03:24.570
also equally structural in terms that they
have theoretical causal interpretations.
00:03:24.570 --> 00:03:31.620
Then the outer part linking the measures to
the factors is called the measurement model and
00:03:31.620 --> 00:03:37.710
this is a uniformly accepted definition.
So whenever anyone speaks about or talks
00:03:37.710 --> 00:03:43.620
about measurement model it means the part that
links the latent variables to their indicators.
00:03:43.620 --> 00:03:49.890
So that's a big model and it's a complicated
model. The question is - this is clearly
00:03:49.890 --> 00:03:54.270
more complicated than taking a sum
of indicators and using regression
00:03:54.270 --> 00:03:59.250
analysis - so why would you want to use
a more complicated analysis approach?
00:03:59.250 --> 00:04:02.580
The structural regression
model approach has a couple
00:04:02.580 --> 00:04:06.900
of advantages over regression
analysis with scale scores.
00:04:06.900 --> 00:04:14.520
Let's take a look at this example. So we have
these concepts A and B represented by these two
00:04:14.520 --> 00:04:20.130
latent variables and then we have indicators
here. The indicators' variances here consist
00:04:20.130 --> 00:04:26.370
of variance due to the concept A and variance due
to the concept B plus all these different sources
00:04:26.370 --> 00:04:31.080
of measurement error variance. So we have random
noise E and then we have some item uniqueness
00:04:31.080 --> 00:04:36.750
here that is not related to the concept B or A
that these indicators are supposed to measure.
00:04:36.750 --> 00:04:42.390
When we take a sum of these indicators
of A and a sum of these indicators of B, then
00:04:42.390 --> 00:04:46.860
all the sources of variation including
the measurement errors will be in the
00:04:46.860 --> 00:04:52.500
sum. So we just take everything together -
we take a sum and we have this combination
00:04:52.500 --> 00:04:57.420
of mostly variation of interest but also
some variation that is not of interest.
00:04:57.420 --> 00:05:02.880
When we estimate this regression coefficient
beta here then the estimate will be too small.
00:05:02.880 --> 00:05:07.200
It will be attenuated and it's
going to be inconsistent and biased.
00:05:07.200 --> 00:05:14.580
So what can SEM bring us that
will help with this problem?
00:05:14.580 --> 00:05:21.030
The idea of SEM or structural regression
model is that instead of taking a sum of the
00:05:21.030 --> 00:05:26.580
indicators we estimate the factor model and
a regression analysis between the factors.
00:05:26.580 --> 00:05:31.230
So the idea of a confirmatory factor analysis
was that you take the variation of these
00:05:31.230 --> 00:05:38.250
indicators apart. So for example the b1, b2
and b3 indicators' variation is modeled as
00:05:38.250 --> 00:05:43.410
being due to the factor here and also due
to these measurement error components here.
00:05:43.410 --> 00:05:49.380
Because we have now these factors that
are presumed to be free of measurement
00:05:49.380 --> 00:05:53.520
error, the correlation between the
factors, the beta, is going to be correct.
00:05:53.520 --> 00:06:00.210
The advantage is that a structural regression
or a structural equation model corrects for
00:06:00.210 --> 00:06:04.380
measurement error. This correction
comes with certain assumptions that
00:06:04.380 --> 00:06:07.020
I will explain a bit later in this video, but that
00:06:07.020 --> 00:06:12.420
is the basic idea: if your model is correct
then measurement error is controlled for.
00:06:12.420 --> 00:06:18.810
The practical outcome is presented here. So this
is a paper - from a paper that I've written - and
00:06:18.810 --> 00:06:26.250
we simulated a data set from two concepts that
we were measuring each with three indicators
00:06:26.250 --> 00:06:33.000
and so we have six indicators in total.
We take a sum of the first three indicators.
00:06:33.000 --> 00:06:39.390
We take a sum of the indicators 4, 5 and 6 and we
calculate the correlation between those two sums.
00:06:39.390 --> 00:06:45.660
We vary how much the concepts correlate in the
population. We vary it from 0.0
00:06:45.660 --> 00:06:52.980
to 0.6 and then we replicate this
analysis 300 times. We estimate the correlation
00:06:52.980 --> 00:07:00.750
using SEM or using sum scales, a sum
of the indicators, and regression analysis.
00:07:00.750 --> 00:07:06.510
We can see here clearly that when we take a sum
of the indicators and when we apply regression
00:07:06.510 --> 00:07:14.730
analysis, regardless of whether we take a sum of
indicators or we use weights that maximize
00:07:14.730 --> 00:07:21.390
the reliability of the indicators, there is not
much difference. These correlations here will
00:07:21.390 --> 00:07:28.830
be too small because there's anyway measurement
error ending up in the sum of those scale items.
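
NOTE
A minimal sketch of the sum score side of this comparison. It is not the code behind the paper shown in the video: it runs one large sample instead of 300 replications, and the standardized loading of 0.7, the population correlation of 0.6, the sample size and the helper name indicators are illustrative assumptions. It does not fit an SEM. It only shows how far the correlation between two sums falls below the correlation between the concepts.
```python
import numpy as np
rng = np.random.default_rng(1)
n = 100_000                                # assumed sample size
rho = 0.6                                  # assumed population correlation of the concepts
loading = 0.7                              # assumed standardized loading of every indicator
cov = np.array([[1.0, rho], [rho, 1.0]])
factors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
def indicators(factor):
    # Three indicators per concept: loading * factor + unique error, unit variance each.
    noise = rng.normal(scale=np.sqrt(1 - loading**2), size=(n, 3))
    return loading * factor[:, None] + noise
sum_a = indicators(factors[:, 0]).sum(axis=1)   # sum of indicators 1, 2 and 3
sum_b = indicators(factors[:, 1]).sum(axis=1)   # sum of indicators 4, 5 and 6
r_sums = np.corrcoef(sum_a, sum_b)[0, 1]
# Reliability of a three-item sum under these assumptions: true variance over total variance.
reliability = 9 * loading**2 / (9 * loading**2 + 3 * (1 - loading**2))
print(round(r_sums, 3), round(rho * reliability, 3))
```
Under these assumptions the correlation between the sums should sit close to rho times the reliability of the sums, which is below rho itself.
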
00:07:28.830 --> 00:07:36.330
In SEM - because we model not a
correlation between two sums but the
00:07:36.330 --> 00:07:42.240
correlation between two factors - this effect
is unbiased. So we can see that the effect
00:07:42.240 --> 00:07:50.610
here - the estimates here are correct so
that's the true value here and it's roughly
00:07:50.610 --> 00:07:55.350
equally - they are roughly normally distributed
around the true value. So SEM provides you this
00:07:55.350 --> 00:08:01.320
small advantage in precision and that's
a good thing if you can apply it well.
00:08:01.320 --> 00:08:06.420
There is also another advantage in SEM that
I have demonstrated in the earlier videos
00:08:06.420 --> 00:08:13.830
and it's testing the model. So we had the
confirmatory factor analysis example model.
00:08:13.830 --> 00:08:18.810
We have the chi-square test that tells whether
the factor model fits the data. If it doesn't,
00:08:18.810 --> 00:08:23.190
you have to do diagnostics. And then we have
the mediation example. We also have the
00:08:23.190 --> 00:08:28.170
chi-square test that tells whether the full
mediation model fits the data well or not.
00:08:28.170 --> 00:08:35.880
The idea of the chi-square test again is to test
if the constraints implied by the model are close
00:08:35.880 --> 00:08:43.860
enough to the correlations in the data so that we
can say that these differences here are due
00:08:43.860 --> 00:08:50.850
to chance only. And we want here to not
reject the null hypothesis because rejecting the
00:08:50.850 --> 00:08:56.610
null hypothesis that these discrepancies between the
implied correlations and observed correlations are
00:08:56.610 --> 00:09:02.760
due to chance only means that we have to
declare or we have to conclude that the
00:09:02.760 --> 00:09:06.630
model is not correctly specified and we need
to do some diagnostics to understand why.
00:09:06.630 --> 00:09:09.090
So this is the second advantage in structural
00:09:09.090 --> 00:09:13.530
regression models. It allows you to
test whether the model fits the data.
00:09:13.530 --> 00:09:18.210
Regression analysis doesn't allow you
to test the model. It only allows you
00:09:18.210 --> 00:09:23.220
to assess how much the model explains
the data. It doesn't allow you to test
00:09:23.220 --> 00:09:25.890
whether the model is correct. So
that's the second big advantage.
00:09:25.890 --> 00:09:35.070
There are also other advantages in SEM, such as that we
can model relationships that go both ways. So
00:09:35.070 --> 00:09:40.800
reciprocal causation for example but that's more
advanced and these are the reasons why people
00:09:40.800 --> 00:09:46.170
typically apply structural regression models
or SEMs instead of regression with sum scales.
00:09:46.170 --> 00:09:53.370
There is this slippery slope to SEM. So
whenever you have a scale with multiple
00:09:53.370 --> 00:09:56.970
items you should apply a factor
analysis. So every time you have
00:09:56.970 --> 00:10:01.830
a survey instrument for example you get
data then you run a factor analysis. That's
00:10:01.830 --> 00:10:07.620
a - you must do that to, for example, calculate
coefficient alpha to assess reliability.
00:10:07.620 --> 00:10:14.100
Then if you do an exploratory factor analysis
then in most cases actually the confirmatory
00:10:14.100 --> 00:10:18.780
factor analysis would be better because it's
a bit more rigorous. It allows you to test
00:10:18.780 --> 00:10:24.450
whether the model is correct and it also -
in cases where exploratory factor analysis
00:10:24.450 --> 00:10:29.430
cannot find your solution then it's possible
that confirmatory factor analysis still works
00:10:29.430 --> 00:10:36.780
because you give the solution and don't
require the computer to find it for you.
00:10:36.780 --> 00:10:42.420
But then if you apply confirmatory factor
analysis then instead of taking the sums
00:10:42.420 --> 00:10:46.410
of indicators and using those in
regression analysis you really should
00:10:46.410 --> 00:10:50.520
be using a structural regression model
because it's again more rigorous and it
00:10:50.520 --> 00:10:54.540
allows you to control for measurement error
and it allows you to do overall more tests.
00:10:54.540 --> 00:11:02.130
So there's - every time when you do a survey or
any other multiple item measurement you must do
00:11:02.130 --> 00:11:08.640
a factor analysis. If you do a factor analysis
then it's better to apply CFA. If you do CFA then
00:11:08.640 --> 00:11:13.290
it's better to apply structural regression models
than to do regression analysis with sum scales.
00:11:13.290 --> 00:11:19.170
So this is all good but there are
reasons why you probably shouldn't
00:11:19.170 --> 00:11:24.720
apply structural regression models as your
first analysis technique. So if structural
00:11:24.720 --> 00:11:28.800
regression models are so much better
than regression with sum scales why
00:11:28.800 --> 00:11:32.310
would I not use it? That's the
question. There are good reasons.
00:11:32.310 --> 00:11:38.640
The reasons not to use structural
regression models - the first reason
00:11:38.640 --> 00:11:45.270
is that it's more complicated to apply.
So that has two implications. The first
00:11:45.270 --> 00:11:50.820
implication is that if you are a beginner
and you want to get your first paper for
00:11:50.820 --> 00:11:56.130
a first conference publication out then
doing that with regression of sum scales
00:11:56.130 --> 00:11:59.850
is easier and you can get more done
with regression analysis than with SEM.
00:11:59.850 --> 00:12:05.730
In SEM it's possible that when you give
the computer data the computer doesn't give
00:12:05.730 --> 00:12:09.840
you any results at all. That doesn't happen
with regression analysis. If it happens
00:12:09.840 --> 00:12:15.600
with SEM then you need some expertise
to be able to get the model to work.
00:12:15.600 --> 00:12:20.760
There is also another reason related to
the complication of application. It is
00:12:20.760 --> 00:12:27.540
that if you know a tool well - like
regression analysis, which is
00:12:27.540 --> 00:12:34.650
slightly suboptimal because regression analysis
can't deal with measurement error the same
00:12:34.650 --> 00:12:40.140
way that structural equation models can - it's
nevertheless better to use that technique than
00:12:40.140 --> 00:12:43.380
a more complicated technique that
you may not understand very well.
00:12:43.380 --> 00:12:50.580
So it's better to have results that you know
are done correctly using a slightly suboptimal
00:12:50.580 --> 00:12:57.930
technique than to have results that are done with
the state of the art technique but you're not sure
00:12:57.930 --> 00:13:04.440
whether they're done correctly. So I would
encourage you to first run - do a regression
00:13:04.440 --> 00:13:09.300
analysis really well and only after you know
that then move to the more complicated ones.
00:13:09.300 --> 00:13:17.130
SEM also has some statistical issues. So
SEM requires that the model is correctly
00:13:17.130 --> 00:13:21.960
specified. The idea of correct model
specification is that if your model is
00:13:21.960 --> 00:13:28.770
not correct the SEM results can be highly
misleading. Model correctness means that
00:13:28.770 --> 00:13:33.390
the measurement model must be correctly
specified so each indicator must belong
00:13:33.390 --> 00:13:38.310
to the factors that you say that they
do and then all these causal relationships
00:13:38.310 --> 00:13:43.320
between the factors must be correctly specified.
Otherwise the results can be very misleading.
00:13:43.320 --> 00:13:51.030
Then what helps you here is the chi-square test.
If your chi-square test rejects the model then
00:13:51.030 --> 00:13:56.820
that means that something is incorrect.
The model is incorrect for the
00:13:56.820 --> 00:14:01.470
data somewhere. You have to understand why
and you have to do diagnostics. That requires
00:14:01.470 --> 00:14:06.630
expertise to do and unless you do that
then the results could be wildly misleading.
00:14:06.630 --> 00:14:09.390
It's probably easier to get misleading results
00:14:09.390 --> 00:14:13.950
with structural regression models than
regression analysis with sum scores.
00:14:13.950 --> 00:14:21.780
My personal take is that if you know how to
use structural regression models well you should
00:14:21.780 --> 00:14:27.840
probably always use that as your main
analysis technique instead of regression analysis.
00:14:27.840 --> 00:14:33.510
Then again I have the impression that most
people who apply structural regression models
00:14:33.510 --> 00:14:37.710
or structural equation models probably
don't understand these techniques well
00:14:37.710 --> 00:14:44.220
enough to use them in a way that we can rely
on the results to be correct and that's a big
00:14:44.220 --> 00:14:48.660
problem and for that reason I recommend that
people start with regression analysis instead.
00:14:48.660 --> 00:14:57.570
Finally if you want to get started with structural regression
models, study a good book. There are so many
00:14:57.570 --> 00:15:05.370
different ways that it can go wrong and my
favorite SEM book is Kline's book Principles
00:15:05.370 --> 00:15:11.160
and Practice of Structural Equation
Modeling. He concludes his book with this nice
00:15:11.160 --> 00:15:19.920
chapter on how to fool yourself with SEM and there
he has at least 52 different things that can go
00:15:19.920 --> 00:15:25.320
wrong and you need to know these things really
before you apply this technique because otherwise
00:15:25.320 --> 00:15:31.200
you will have problems with the technique
and your results may not be trustworthy.
00:15:31.200 --> 00:15:35.640
But it is a technique worth learning
in the long run because it allows you
00:15:35.640 --> 00:15:38.340
to do things that you cannot
do with regression analysis.