WEBVTT Kind: captions Language: en 00:00:00.060 --> 00:00:04.500 This video explains the basic idea of structural regression models, which are 00:00:04.500 --> 00:00:07.380 sometimes referred to as structural equation models in the literature. 00:00:07.380 --> 00:00:11.130 What is a structural regression model? This technique is used, 00:00:11.130 --> 00:00:14.730 for example, in Mesquita and Lazzarini's paper. They explain 00:00:14.730 --> 00:00:19.560 the technique as a combination of factor analysis and path analysis. 00:00:19.560 --> 00:00:23.670 Path analysis is basically regression analysis with 00:00:23.670 --> 00:00:27.870 multiple equations. For example, when you estimate a mediation model using 00:00:27.870 --> 00:00:31.380 the simultaneous equations approach, that is called a path analysis. 00:00:31.380 --> 00:00:35.520 So path analysis is regression with observed variables, except that you 00:00:35.520 --> 00:00:40.200 have more than one dependent variable. Factor analysis is the analysis where we 00:00:40.200 --> 00:00:45.450 check what different indicators have in common and perhaps whether we can 00:00:45.450 --> 00:00:49.560 group those indicators and consider them measures of the same concept. 00:00:49.560 --> 00:00:58.740 SEM, or structural equation modeling, combines these two analysis approaches. To understand 00:00:58.740 --> 00:01:05.280 what SEM is and what it does, we can start with the basic regression analysis model. 00:01:05.280 --> 00:01:09.540 The basic regression analysis model makes the important assumption that 00:01:09.540 --> 00:01:15.810 X1 and X2 are measured without any measurement error. That is, X1 and X2 are 00:01:15.810 --> 00:01:24.000 themselves the quantities of interest, instead of being measures of the quantities of interest.
00:01:24.000 --> 00:01:29.640 So X1 is of direct interest, instead of being a measure, possibly with some 00:01:29.640 --> 00:01:33.600 error in it, of a concept that we can't observe directly. 00:01:33.600 --> 00:01:38.880 Regression analysis makes that assumption, and if the assumption of no measurement error 00:01:38.880 --> 00:01:45.480 fails, the regression coefficients beta 1 and beta 2 will be biased and inconsistent. 00:01:45.480 --> 00:01:51.120 Then we have the factor analysis model. The idea of a factor analysis model is 00:01:51.120 --> 00:01:54.090 that we have a set of indicators and then we ask 00:01:54.090 --> 00:01:59.160 what these indicators have in common, and what they have in common is a factor. 00:01:59.160 --> 00:02:04.380 In confirmatory factor analysis we ask whether these indicators represent 00:02:04.380 --> 00:02:09.210 one factor or not, and the computer gives us an answer. In exploratory analysis, which 00:02:09.210 --> 00:02:12.960 is not part of a structural regression model, the computer finds the factors. 00:02:12.960 --> 00:02:15.660 In confirmatory analysis we define a factor structure and then we 00:02:15.660 --> 00:02:18.060 estimate it, and that is the part that goes into a structural regression model. 00:02:18.060 --> 00:02:24.390 The idea of a structural regression model is that we take these two analysis approaches 00:02:24.390 --> 00:02:29.550 and combine them. We have a regression analysis model where, instead of using 00:02:29.550 --> 00:02:36.690 indicators that are possibly contaminated with measurement error, we model the regression 00:02:36.690 --> 00:02:45.450 between the latent variables X1, X2, and Y, and then we add the factor analysis directly to the model. 00:02:45.450 --> 00:02:48.660 So we have a combination of factor analysis and 00:02:48.660 --> 00:02:53.460 regression analysis between the factors in the factor analysis.
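The two components can be written out in generic notation. This is a sketch, not the notation used in the video: the loadings λ, measurement errors ε, disturbance ζ, the lowercase indicator names, and the index j are labels introduced here; intercepts are omitted, and the errors are assumed uncorrelated with the latent variables and with each other.

```latex
\begin{aligned}
&\text{latent variable model:} && Y = \beta_1 X_1 + \beta_2 X_2 + \zeta \\
&\text{measurement model:} && x_{1j} = \lambda_{1j} X_1 + \varepsilon_{1j}, \quad
  x_{2j} = \lambda_{2j} X_2 + \varepsilon_{2j}, \quad
  y_{j} = \lambda_{Yj} Y + \varepsilon_{Yj}
\end{aligned}
```

Regression with sum scores instead plugs the sums of the x_1j and x_2j indicators in place of X1 and X2, so the error terms ε_1j and ε_2j travel into the predictors.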
00:02:53.460 --> 00:02:56.730 This is clearly a more complicated approach than 00:02:56.730 --> 00:03:00.840 simply applying regression analysis to scale scores. 00:03:00.840 --> 00:03:07.920 This model has two parts. The inner part with the latent variables is referred to as 00:03:07.920 --> 00:03:13.110 the latent variable model. Some people call this part of the model the structural 00:03:13.110 --> 00:03:18.960 model, but that's a bit misleading because the measurement relationships are 00:03:18.960 --> 00:03:24.570 equally structural, in the sense that they also have theoretical causal interpretations. 00:03:24.570 --> 00:03:31.620 The outer part linking the measures to the factors is called the measurement model, and 00:03:31.620 --> 00:03:37.710 this definition is universally accepted. Whenever anyone talks 00:03:37.710 --> 00:03:43.620 about the measurement model, they mean the part that links the latent variables to their indicators. 00:03:43.620 --> 00:03:49.890 So that's a big and complicated model. It is clearly 00:03:49.890 --> 00:03:54.270 more complicated than taking a sum of indicators and using regression 00:03:54.270 --> 00:03:59.250 analysis, so why would you want to use a more complicated analysis approach? 00:03:59.250 --> 00:04:02.580 The structural regression model approach has a couple 00:04:02.580 --> 00:04:06.900 of advantages over regression analysis with scale scores. 00:04:06.900 --> 00:04:14.520 Let's take a look at this example. We have concepts A and B represented by two 00:04:14.520 --> 00:04:20.130 latent variables, and then we have their indicators. The variance of each indicator consists 00:04:20.130 --> 00:04:26.370 of variance due to concept A or concept B, plus variance from different sources 00:04:26.370 --> 00:04:31.080 of measurement error.
So we have random noise E, and then we have some item uniqueness 00:04:31.080 --> 00:04:36.750 that is not related to the concept A or B that these indicators are supposed to measure. 00:04:36.750 --> 00:04:42.390 When we take a sum of the indicators of A and a sum of the indicators of B, 00:04:42.390 --> 00:04:46.860 all the sources of variation, including the measurement errors, end up in the 00:04:46.860 --> 00:04:52.500 sums. We just take everything together, and each sum is a combination 00:04:52.500 --> 00:04:57.420 of mostly variation of interest but also some variation that is not of interest. 00:04:57.420 --> 00:05:02.880 When we estimate the regression coefficient beta, the estimate will be too small, 00:05:02.880 --> 00:05:07.200 or attenuated, and it is going to be biased and inconsistent. 00:05:07.200 --> 00:05:14.580 So what can SEM bring us that will help with this problem? 00:05:14.580 --> 00:05:21.030 The idea of SEM, or a structural regression model, is that instead of taking sums of the 00:05:21.030 --> 00:05:26.580 indicators, we estimate the factor model and a regression analysis between the factors. 00:05:26.580 --> 00:05:31.230 The idea of confirmatory factor analysis was that you take the variation of the 00:05:31.230 --> 00:05:38.250 indicators apart. For example, the variation of indicators b1, b2, and b3 is modeled as 00:05:38.250 --> 00:05:43.410 being due to the factor and also due to the measurement error components. 00:05:43.410 --> 00:05:49.380 Because the factors are now presumed to be free of measurement 00:05:49.380 --> 00:05:53.520 error, the estimated relationship between the factors, beta, is going to be correct. 00:05:53.520 --> 00:06:00.210 The advantage is that a structural regression model, or structural equation model, corrects for 00:06:00.210 --> 00:06:04.380 measurement error.
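Under classical measurement assumptions (one factor per indicator set, errors uncorrelated with each other and with the factors), the attenuation of the sum-score correlation has a closed form. This is the standard correction-for-attenuation result rather than a formula from the video; φ_AB denotes the correlation between concepts A and B, S_A and S_B the unweighted sums of their indicators, λ_Aj the loadings of the A indicators, and ρ_SA, ρ_SB the reliabilities of the two sums.

```latex
\operatorname{corr}(S_A, S_B) = \phi_{AB}\,\sqrt{\rho_{S_A}\,\rho_{S_B}},
\qquad
\rho_{S_A} = \frac{\bigl(\sum_j \lambda_{Aj}\bigr)^2 \operatorname{Var}(A)}{\operatorname{Var}(S_A)}
```

For example, with φ_AB = 0.5 and both reliabilities equal to 0.8, the sum scores correlate at 0.5 × √(0.8 × 0.8) = 0.40. With a single standardized predictor, the regression coefficient equals this correlation, so it shrinks by the same factor.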
This correction comes with certain assumptions that 00:06:04.380 --> 00:06:07.020 I will explain a bit later in this video, but that 00:06:07.020 --> 00:06:12.420 is the basic idea: if your model is correct, then measurement error is controlled for. 00:06:12.420 --> 00:06:18.810 The practical outcome is presented here. This is from a paper that I've written, and 00:06:18.810 --> 00:06:26.250 we simulated data from two concepts, each measured with three indicators, 00:06:26.250 --> 00:06:33.000 so we have six indicators in total. We take a sum of the first three indicators. 00:06:33.000 --> 00:06:39.390 We take a sum of indicators 4, 5, and 6, and we calculate the correlation between those two sums. 00:06:39.390 --> 00:06:45.660 We vary how much the concepts correlate in the population, from 0.0 00:06:45.660 --> 00:06:52.980 to 0.6, and we replicate this analysis 300 times. We estimate the correlation 00:06:52.980 --> 00:07:00.750 either using SEM or using sum scales, that is, sums of the indicators, and regression analysis. 00:07:00.750 --> 00:07:06.510 We can see clearly that when we apply regression analysis to composites of the indicators, 00:07:06.510 --> 00:07:14.730 it makes little difference whether we take a plain sum of the indicators or use weights that maximize 00:07:14.730 --> 00:07:21.390 the reliability of the composite. In both cases the correlations 00:07:21.390 --> 00:07:28.830 are too small, because some measurement error still ends up in the composite of the scale items. 00:07:28.830 --> 00:07:36.330 In SEM, because we model not the correlation between two sums but the 00:07:36.330 --> 00:07:42.240 correlation between two factors, the estimate is unbiased.
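A minimal sketch of a simulation along these lines, assuming NumPy. The loadings (0.8, 0.7, 0.6), the sample size of 500, and all function names are assumptions made here. The factor correlation is estimated with a simple method-of-moments ("triad") estimator rather than the maximum likelihood SEM estimation used in the study, so this only illustrates the mechanism: modeling the factors removes the attenuation that sum scores suffer.

```python
import numpy as np

LOADINGS = np.array([0.8, 0.7, 0.6])  # assumed standardized loadings, same for A and B


def simulate(phi, n, rng):
    """Draw n rows of six standardized indicators: columns 0-2 measure A, 3-5 measure B."""
    factors = rng.multivariate_normal([0.0, 0.0], [[1.0, phi], [phi, 1.0]], size=n)
    err_sd = np.sqrt(1.0 - LOADINGS**2)  # error SD that gives each indicator unit variance
    a = factors[:, [0]] * LOADINGS + rng.standard_normal((n, 3)) * err_sd
    b = factors[:, [1]] * LOADINGS + rng.standard_normal((n, 3)) * err_sd
    return a, b


def sum_score_corr(a, b):
    """Correlation between the unweighted sums of the two indicator sets."""
    return np.corrcoef(a.sum(axis=1), b.sum(axis=1))[0, 1]


def triad_loadings(x):
    """Method-of-moments loadings for a one-factor model with three indicators.

    With unit factor variance, cov(x_i, x_k) = l_i * l_k, so
    l_1**2 = s12 * s13 / s23, and similarly for l_2 and l_3.
    """
    s = np.cov(x, rowvar=False)
    return np.sqrt(np.array([
        s[0, 1] * s[0, 2] / s[1, 2],
        s[0, 1] * s[1, 2] / s[0, 2],
        s[0, 2] * s[1, 2] / s[0, 1],
    ]))


def factor_corr(a, b):
    """Average of cov(a_i, b_j) / (l_ai * l_bj) over all nine cross pairs."""
    cross = np.cov(np.hstack([a, b]), rowvar=False)[:3, 3:]
    return np.mean(cross / np.outer(triad_loadings(a), triad_loadings(b)))


def run(phis=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6), reps=300, n=500, seed=1):
    """Mean sum-score and factor-based correlation across replications, per true phi."""
    rng = np.random.default_rng(seed)
    results = {}
    for phi in phis:
        draws = [simulate(phi, n, rng) for _ in range(reps)]
        results[phi] = (
            np.mean([sum_score_corr(a, b) for a, b in draws]),
            np.mean([factor_corr(a, b) for a, b in draws]),
        )
    return results


if __name__ == "__main__":
    for phi, (sums, factors) in run().items():
        print(f"true {phi:.1f}   sum scores {sums:.3f}   factor model {factors:.3f}")
```

With these assumed loadings, each sum score has reliability 4.41 / 5.92 ≈ 0.745, so the sum-score column should sit near 0.745 × phi, while the factor-based column centers on phi.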
So we can see that the estimates 00:07:42.240 --> 00:07:50.610 here are correct: this is the true value, and the estimates are roughly 00:07:50.610 --> 00:07:55.350 normally distributed around the true value. So SEM gives you this 00:07:55.350 --> 00:08:01.320 advantage in accuracy, and that's a good thing if you can apply it well. 00:08:01.320 --> 00:08:06.420 There is also another advantage in SEM that I have demonstrated in earlier videos, 00:08:06.420 --> 00:08:13.830 and it's testing the model. We had the confirmatory factor analysis example model, 00:08:13.830 --> 00:08:18.810 where the chi-square test tells whether the factor model fits the data; if it doesn't, 00:08:18.810 --> 00:08:23.190 you have to do diagnostics. Then we had the mediation example, where we also have the 00:08:23.190 --> 00:08:28.170 chi-square test that tells whether the full mediation model fits the data well or not. 00:08:28.170 --> 00:08:35.880 The idea of the chi-square test, again, is to test whether the correlations implied by the model are close 00:08:35.880 --> 00:08:43.860 enough to the correlations in the data that we can say the differences are due 00:08:43.860 --> 00:08:50.850 to chance only. Here we want to not reject the null hypothesis, because rejecting the 00:08:50.850 --> 00:08:56.610 null hypothesis that the discrepancies between the implied and observed correlations are 00:08:56.610 --> 00:09:02.760 due to chance only means that we have to conclude that the 00:09:02.760 --> 00:09:06.630 model is not correctly specified, and we need to do some diagnostics to understand why. 00:09:06.630 --> 00:09:09.090 So this is the second advantage of structural 00:09:09.090 --> 00:09:13.530 regression models: they allow you to test whether the model fits the data.
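For the mediation example, the chi-square test can be sketched directly for a full mediation model X → M → Y with observed variables. This assumes NumPy, multivariate normality, and the maximum likelihood discrepancy function; the function name and the simulated coefficients are made up for illustration. The statistic here uses the (N − 1) multiplier, while some SEM programs use N, so reported values can differ slightly.

```python
import math

import numpy as np


def full_mediation_chi2(data):
    """Chi-square test of the full mediation model X -> M -> Y (no direct X -> Y path).

    data: array of shape (N, 3) with columns X, M, Y.
    Returns (T, df, p_value).
    """
    n, p = data.shape
    s = np.cov(data, rowvar=False)  # sample covariance matrix S
    # In this recursive model the ML estimates equal equation-by-equation OLS,
    # so the implied matrix matches S everywhere except cov(X, Y), which full
    # mediation restricts to a * b * var(X) = cov(X, M) * cov(M, Y) / var(M).
    implied = s.copy()
    implied[0, 2] = implied[2, 0] = s[0, 1] * s[1, 2] / s[1, 1]
    # ML discrepancy: F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p
    f_ml = (np.linalg.slogdet(implied)[1]
            + np.trace(s @ np.linalg.inv(implied))
            - np.linalg.slogdet(s)[1]
            - p)
    t = max((n - 1) * f_ml, 0.0)  # guard against tiny negative rounding error
    df = 1  # one restriction: the omitted direct path
    p_value = math.erfc(math.sqrt(t / 2.0))  # chi-square survival function for df = 1
    return t, df, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 500
    x = rng.standard_normal(n)
    m = 0.5 * x + rng.standard_normal(n)
    y_full = 0.5 * m + rng.standard_normal(n)               # full mediation holds
    y_direct = 0.5 * m + 0.4 * x + rng.standard_normal(n)   # a direct effect exists
    for label, y in (("full", y_full), ("direct", y_direct)):
        t, df, p_value = full_mediation_chi2(np.column_stack([x, m, y]))
        print(f"{label}: chi2({df}) = {t:.2f}, p = {p_value:.4g}")
```

For the data with a direct effect, the test should reject clearly, because full mediation wrongly forces the X–Y covariance through M; for the data generated under full mediation, a rejection would happen only by chance.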
00:09:13.530 --> 00:09:18.210 Regression analysis doesn't allow you to test the model. It only allows you 00:09:18.210 --> 00:09:23.220 to assess how much of the variation in the dependent variable the model explains. It doesn't allow you to test 00:09:23.220 --> 00:09:25.890 whether the model is correct. So that's the second big advantage. 00:09:25.890 --> 00:09:35.070 There are also other advantages in SEM; for example, we can model relationships that go both ways, 00:09:35.070 --> 00:09:40.800 such as reciprocal causation, but that's more advanced. These are the reasons why people 00:09:40.800 --> 00:09:46.170 typically apply structural regression models, or SEMs, instead of regression with sum scales. 00:09:46.170 --> 00:09:53.370 There is a slippery slope to SEM. Whenever you have a scale with multiple 00:09:53.370 --> 00:09:56.970 items, you should apply a factor analysis. So every time you 00:09:56.970 --> 00:10:01.830 get data from a survey instrument, for example, you run a factor analysis. 00:10:01.830 --> 00:10:07.620 You must do that, for example, before calculating coefficient alpha to assess reliability. 00:10:07.620 --> 00:10:14.100 Then, if you do an exploratory factor analysis, in most cases a confirmatory 00:10:14.100 --> 00:10:18.780 factor analysis would actually be better, because it's a bit more rigorous: it allows you to test 00:10:18.780 --> 00:10:24.450 whether the model is correct. Also, in cases where exploratory factor analysis 00:10:24.450 --> 00:10:29.430 cannot find a solution, confirmatory factor analysis may still work, 00:10:29.430 --> 00:10:36.780 because you give the solution and don't require the computer to find it for you.
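A sketch of coefficient alpha computed from an item covariance matrix, next to the reliability of the sum score implied by a one-factor model (often called omega), assuming NumPy, standardized items, unit factor variance, and uncorrelated errors. The function names and loadings are illustrative. The comparison shows one reason to check the factor structure first: alpha equals the sum-score reliability only when all loadings are equal, and with unequal loadings it understates it.

```python
import numpy as np


def cronbach_alpha(cov):
    """Coefficient alpha from a k x k covariance matrix of scale items."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    item_variances = np.trace(cov)  # sum of the individual item variances
    total_variance = cov.sum()      # variance of the unweighted sum score
    return k / (k - 1) * (1.0 - item_variances / total_variance)


def one_factor_cov(loadings):
    """Covariance matrix of standardized items implied by a one-factor model."""
    lam = np.asarray(loadings, dtype=float)
    cov = np.outer(lam, lam)
    np.fill_diagonal(cov, 1.0)  # unit item variances; error variance = 1 - loading**2
    return cov


def sum_score_reliability(loadings):
    """Share of sum-score variance due to the factor (standardized items, unit factor variance)."""
    lam = np.asarray(loadings, dtype=float)
    common = lam.sum() ** 2
    return common / (common + np.sum(1.0 - lam**2))


if __name__ == "__main__":
    for loadings in ([0.7, 0.7, 0.7], [0.8, 0.7, 0.6]):
        alpha = cronbach_alpha(one_factor_cov(loadings))
        rel = sum_score_reliability(loadings)
        print(f"loadings {loadings}: alpha {alpha:.3f}, sum-score reliability {rel:.3f}")
    # loadings [0.7, 0.7, 0.7]: alpha 0.742, sum-score reliability 0.742
    # loadings [0.8, 0.7, 0.6]: alpha 0.740, sum-score reliability 0.745
```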
00:10:36.780 --> 00:10:42.420 But if you apply confirmatory factor analysis, then instead of taking the sums 00:10:42.420 --> 00:10:46.410 of indicators and using those in regression analysis, you really should 00:10:46.410 --> 00:10:50.520 be using a structural regression model, because it's again more rigorous: it 00:10:50.520 --> 00:10:54.540 allows you to control for measurement error, and it allows you to do more tests overall. 00:10:54.540 --> 00:11:02.130 So every time you do a survey or any other multiple-item measurement, you must do 00:11:02.130 --> 00:11:08.640 a factor analysis. If you do a factor analysis, then it's better to apply CFA. If you do CFA, then 00:11:08.640 --> 00:11:13.290 it's better to apply structural regression models than to do regression analysis with sum scales. 00:11:13.290 --> 00:11:19.170 This is all good, but there are reasons why you probably shouldn't 00:11:19.170 --> 00:11:24.720 apply structural regression models as your first analysis technique. If structural 00:11:24.720 --> 00:11:28.800 regression models are so much better than regression with sum scales, why 00:11:28.800 --> 00:11:32.310 would I not use them? That's the question, and there are good reasons. 00:11:32.310 --> 00:11:38.640 The first reason not to use structural 00:11:38.640 --> 00:11:45.270 regression models is that they are more complicated to apply. That has two implications. The first 00:11:45.270 --> 00:11:50.820 implication is that if you are a beginner and you want to get your first paper, 00:11:50.820 --> 00:11:56.130 such as a first conference publication, out, then doing that with regression on sum scales 00:11:56.130 --> 00:11:59.850 is easier, and you can get more done with regression analysis than with SEM. 00:11:59.850 --> 00:12:05.730 In SEM it's possible that when you give the computer data, the computer doesn't give 00:12:05.730 --> 00:12:09.840 you any results at all.
That doesn't happen with regression analysis. If it happens 00:12:09.840 --> 00:12:15.600 with SEM, then you need some expertise to get the model to work. 00:12:15.600 --> 00:12:20.760 There is also another reason related to the complexity of application. Even if 00:12:20.760 --> 00:12:27.540 a tool that you know well, such as regression analysis, is 00:12:27.540 --> 00:12:34.650 slightly suboptimal (regression analysis can't deal with measurement error the same 00:12:34.650 --> 00:12:40.140 way that structural equation models can), it's nevertheless better to use that technique than 00:12:40.140 --> 00:12:43.380 a more complicated technique that you may not understand very well. 00:12:43.380 --> 00:12:50.580 It's better to have results that you know are done correctly using a slightly suboptimal 00:12:50.580 --> 00:12:57.930 technique than to have results that are done with a state-of-the-art technique when you're not sure 00:12:57.930 --> 00:13:04.440 whether they're done correctly. So I would encourage you to first do a regression 00:13:04.440 --> 00:13:09.300 analysis really well, and only after you know that, move to the more complicated techniques. 00:13:09.300 --> 00:13:17.130 SEM also has some statistical issues. SEM requires that the model is correctly 00:13:17.130 --> 00:13:21.960 specified. The point of correct model specification is that if your model is 00:13:21.960 --> 00:13:28.770 not correct, the SEM results can be highly misleading. Model correctness means that 00:13:28.770 --> 00:13:33.390 the measurement model must be correctly specified, so each indicator must belong 00:13:33.390 --> 00:13:38.310 to the factors that you say it belongs to, and all the causal relationships 00:13:38.310 --> 00:13:43.320 between the factors must be correctly specified. Otherwise the results can be very misleading.
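A small population-level sketch of how a misspecified measurement model misleads, assuming NumPy. Here indicator b1 is given a hypothetical cross-loading of 0.4 on factor A (all other loadings 0.7, true factor correlation 0.3), and the model that ignores the cross-loading is fitted with a simple method-of-moments ("triad") estimator of the loadings rather than full maximum likelihood. Each (a_i, b_j) pair then implies its own factor correlation; under a correctly specified model all nine would equal 0.3.

```python
import numpy as np


def population_cov(phi=0.3, lam=0.7, cross=0.4):
    """Covariance of a1-a3, b1-b3 when b1 also loads on factor A (hypothetical setup)."""
    # Rows are indicators; columns are the factors A and B.
    loadings = np.array([[lam, 0.0], [lam, 0.0], [lam, 0.0],
                         [cross, lam], [0.0, lam], [0.0, lam]])
    factor_cov = np.array([[1.0, phi], [phi, 1.0]])
    common = loadings @ factor_cov @ loadings.T
    # Error variances chosen so that every indicator has unit variance.
    return common + np.diag(1.0 - np.diag(common))


def triad_loadings(s):
    """Loadings of a one-factor, three-indicator model from its 3 x 3 covariance block."""
    return np.sqrt(np.array([
        s[0, 1] * s[0, 2] / s[1, 2],
        s[0, 1] * s[1, 2] / s[0, 2],
        s[0, 2] * s[1, 2] / s[0, 1],
    ]))


def pairwise_factor_corr(s):
    """Factor correlation implied by each (a_i, b_j) pair under the simple-structure model."""
    la = triad_loadings(s[:3, :3])
    lb = triad_loadings(s[3:, 3:])
    return s[:3, 3:] / np.outer(la, lb)


if __name__ == "__main__":
    ratios = pairwise_factor_corr(population_cov())
    print(ratios.round(3))  # the b1 column is about 0.744; the other columns are 0.3
    print(f"average: {ratios.mean():.3f}")  # about 0.448 instead of the true 0.3
```

The pair-by-pair disagreement is the kind of discrepancy between implied and observed covariances that the chi-square test is designed to detect.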
00:13:43.320 --> 00:13:51.030 What helps you here is the chi-square test. If your chi-square test rejects the model, then 00:13:51.030 --> 00:13:56.820 that means that something is incorrect: the model is incorrect for the 00:13:56.820 --> 00:14:01.470 data somewhere. You have to understand why, and you have to do diagnostics. That requires 00:14:01.470 --> 00:14:06.630 expertise, and unless you do that, the results could be wildly misleading. 00:14:06.630 --> 00:14:09.390 It's probably easier to get misleading results 00:14:09.390 --> 00:14:13.950 with structural regression models than with regression analysis on sum scores. 00:14:13.950 --> 00:14:21.780 My personal take is that if you know how to use structural regression models well, you should 00:14:21.780 --> 00:14:27.840 probably always use them as your main analysis technique instead of regression analysis. 00:14:27.840 --> 00:14:33.510 Then again, I have the impression that most people who apply structural regression models, 00:14:33.510 --> 00:14:37.710 or structural equation models, probably don't understand these techniques well 00:14:37.710 --> 00:14:44.220 enough to use them in a way that lets us rely on the results being correct. That's a big 00:14:44.220 --> 00:14:48.660 problem, and for that reason I recommend that people start with regression analysis instead. 00:14:48.660 --> 00:14:57.570 Finally, if you want to get started with structural regression models, study a good book. There are so many 00:14:57.570 --> 00:15:05.370 different ways that things can go wrong. My favorite SEM book is Kline's 00:15:05.370 --> 00:15:11.160 Principles and Practice of Structural Equation Modeling.
He concludes his book with a nice 00:15:11.160 --> 00:15:19.920 chapter on how to fool yourself with SEM, where he lists at least 52 different things that can go 00:15:19.920 --> 00:15:25.320 wrong. You really need to know these things before you apply this technique, because otherwise 00:15:25.320 --> 00:15:31.200 you will have problems with it and your results may not be trustworthy. 00:15:31.200 --> 00:15:35.640 But it is a technique worth learning in the long run, because it allows you 00:15:35.640 --> 00:15:38.340 to do things that you cannot do with regression analysis.