Multilevel data can be used for estimating different kinds of effects. In this video I will explain the difference between the within effect, the between effect, the contextual effect, and the population average effect. Because these effects have different interpretations, it is important to specify which effect you are interested in. That depends on your research question and, more generally, on the objective of your statistical analysis.

Books on multilevel modeling typically start by explaining the difference between the within effect and the between effect, but I think it's more useful, and easier, to understand the difference between the within effect and the contextual effect first.

The within effect tells us the effect of an individual-level variable on an individual-level outcome: for example, how a person's intelligence influences that person's task behavior, or how a company's innovativeness influences that company's performance. The within effect has a clear policy implication, because it implies that when an individual-level attribute or trait changes, the outcome variable of interest changes as well. So if you want more of Y, do more of X.

The contextual effect, on the other hand, tells us the effect of context.
So how do the actions of others in the same context influence individual-level behavior, or, the other way around, how do an individual's actions influence others in the same context? Typically the contextual effect is explained as the effect of a mean characteristic of the group on an individual-level outcome. The difference between the within effect and the contextual effect is perhaps best understood through examples.

Let's take a look at some examples. The first is vaccination. If you vaccinate yourself, you are less likely to develop a serious disease. So there is a within effect: your vaccination helps you. But there is also a contextual effect, because if others around you vaccinate themselves, that creates herd immunity, which also protects those people who don't vaccinate. So here the within effect is positive and the contextual effect is positive as well.

We can also have examples of a positive within effect and a negative contextual effect. Consider overfishing. If an individual fisherman exceeds their quota, their profits will increase because they get more fish to sell. If everyone else on the same lake overfishes, there is a negative effect on profits, because overfishing leads to smaller catches for everyone. So the within effect is positive but the contextual effect is negative. In some scenarios the two may even cancel each other out.

Here is another example.
Innovation and company performance: if a company innovates a lot, it can develop valuable capabilities that lead to competitive advantage. But if every other company around the focal company innovates a lot, innovations are no longer valuable, because everyone has them, and a firm that doesn't innovate will fall behind and its performance will suffer. So innovation could have a positive within effect and a negative contextual effect.

We could also have a variable that has no within effect but has a contextual effect. For example, gender: how does an individual's gender influence that individual's performance in a team? We can say that an individual's performance doesn't depend on the individual's own gender, so there is no within effect. But we can also say that gender has an effect at the contextual level: for example, teams that are half men and half women work better than all-male or all-female teams.

Okay, so that's the contextual effect and the within effect. How does this relate to the between effect, which is typically introduced in books about multilevel modeling?

The between effect is simply the sum of the within effect and the contextual effect. It tells us the influence of a group's mean characteristic on the group's mean outcome, and it doesn't have as clear policy implications or causal interpretation as the within effect and the contextual effect.
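The statement that the between effect equals the within effect plus the contextual effect can be checked numerically. Below is a minimal Python sketch, assuming a linear model y_ij = b_w * x_ij + b_c * mean_j(x) + u_j + e_ij with equal-sized groups; the data generator, the coefficient values, and the function names (simulate, within_effect, between_effect, contextual_model) are hypothetical and only for illustration. Regressing Y on both X and the group mean of X gives the within effect as the coefficient on X and the contextual effect as the coefficient on the group mean; their sum matches a regression of group-mean Y on group-mean X.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs in a simple regression with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(b_w, b_c, n_groups=200, n_per=10, seed=1):
    """Hypothetical clustered data: y = b_w * x + b_c * group_mean(x) + u_group + noise."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        center = rng.gauss(0, 1)   # groups differ in their typical level of x
        xs = [center + rng.gauss(0, 1) for _ in range(n_per)]
        mean_x = sum(xs) / n_per
        u = rng.gauss(0, 0.5)      # group-level noise, unrelated to x
        ys = [b_w * x + b_c * mean_x + u + rng.gauss(0, 0.5) for x in xs]
        groups.append((xs, ys))
    return groups


def within_effect(groups):
    """Center x and y on their group means, then pool: between-group differences vanish."""
    dx, dy = [], []
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        dx += [x - mx for x in xs]
        dy += [y - my for y in ys]
    return slope(dx, dy)


def between_effect(groups):
    """Regress group-mean y on group-mean x: one observation per group."""
    mean_x = [sum(xs) / len(xs) for xs, _ in groups]
    mean_y = [sum(ys) / len(ys) for _, ys in groups]
    return slope(mean_x, mean_y)


def contextual_model(groups):
    """OLS of y on x and group_mean(x); returns (within, contextual) coefficients."""
    rows = []
    for xs, ys in groups:
        m = sum(xs) / len(xs)
        rows += [(x, m, y) for x, y in zip(xs, ys)]
    n = len(rows)
    # Centering every column at its grand mean absorbs the intercept.
    gx = sum(r[0] for r in rows) / n
    gm = sum(r[1] for r in rows) / n
    gy = sum(r[2] for r in rows) / n
    s11 = sum((x - gx) ** 2 for x, _, _ in rows)
    s22 = sum((m - gm) ** 2 for _, m, _ in rows)
    s12 = sum((x - gx) * (m - gm) for x, m, _ in rows)
    s1y = sum((x - gx) * (y - gy) for x, _, y in rows)
    s2y = sum((m - gm) * (y - gy) for _, m, y in rows)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b_x = (s1y * s22 - s12 * s2y) / det
    b_m = (s11 * s2y - s12 * s1y) / det
    return b_x, b_m


groups = simulate(b_w=0.5, b_c=-0.3)
b_x, b_m = contextual_model(groups)
print(f"within {within_effect(groups):.3f}  contextual {b_m:.3f}  "
      f"within + contextual {b_x + b_m:.3f}  between {between_effect(groups):.3f}")
```

With equal-sized groups the identity holds exactly in the sample, not just on average, because X minus its group mean is uncorrelated with the group mean itself; this example picks a positive within effect and a negative contextual effect, like the overfishing and innovation examples.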
Then the population average effect is simply a regression line over the data that ignores all clustering. It answers the question of what the expected value of Y is for a given value of X, and it is a weighted combination of the within effect and the between effect, or equivalently of the within effect and the contextual effect. It doesn't really have a clear causal interpretation, so the population average effect is more useful for predictive applications than the within effect and the contextual effect.

Let's take a look at an example of these effects from a paper by Enders and Tofighi in Psychological Methods. They have data on individuals' well-being, here on the y-axis, and individuals' work hours, here on the x-axis, for three individuals, one, two, and three, over five weeks. So we have five repeated observations of each individual. These are synthetic data.

The first effect that we have here is the between effect: how does a person's mean well-being depend on that person's mean work hours? We calculate the means of both variables for each of the three individuals, so we have three observations of two variables, and running a regression on those three cluster means gives us the between effect.

Then we have the within effect. The within effect tells how changing your work hours influences your well-being.
So the within effect is the regression line that we get when we ignore the differences between these groups. We basically put these ovals on top of each other, eliminating all between-group differences, and then run a regression line through the ovals, and that tells us the direction of individual-level change.

So this person here, who is relatively well-off, will never drop this low in well-being regardless of how much they work, because only the within effect changes an individual's outcome, and this person is, for some reason, overall a lot higher in well-being than this person. So the between effect reflects the constant, stable differences between people, and the within effect describes how well-being varies within each individual as their work hours change.

Then the population average effect is simply what we get if we run a regression line through the data ignoring all clustering. We get another line, which in this case is closer to the between effect, and that line doesn't really have any clear causal interpretation.

There are a couple of relationships between the within effect and the contextual effect that we need to understand. If the within effect is zero and the contextual effect is zero, then all effects are zero, so the two variables are linearly unrelated.

If the within effect is not zero but the contextual effect is zero, then all effects are equal: the within effect, the between effect, and the population average effect all give you the same value.
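The population average effect can be pinned down more precisely. For ordinary least squares with equal-sized groups, the pooled slope equals w * (within slope) + (1 - w) * (between slope), where w is the share of the total variation in X that lies within groups. That is why, when most of the variation in X is between groups, the population average line sits closer to the between line. The hypothetical Python sketch below (the data generator, coefficient values, and the function name effects are illustrative assumptions) verifies this weighted-average identity and checks the two simple cases: both effects zero, and a zero contextual effect, plus for contrast a case where both effects are nonzero.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs in a simple regression with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(b_w, b_c, n_groups=200, n_per=10, seed=2):
    """Hypothetical clustered data: y = b_w * x + b_c * group_mean(x) + u_group + noise."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        center = rng.gauss(0, 2)   # most of the variation in x is between groups
        xs = [center + rng.gauss(0, 1) for _ in range(n_per)]
        mean_x = sum(xs) / n_per
        u = rng.gauss(0, 0.5)      # group-level noise, unrelated to x
        ys = [b_w * x + b_c * mean_x + u + rng.gauss(0, 0.5) for x in xs]
        groups.append((xs, ys))
    return groups


def effects(groups):
    """Return (within, between, population average, within share of x variation)."""
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    dx, dy, mean_x, mean_y = [], [], [], []
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        dx += [x - mx for x in xs]
        dy += [y - my for y in ys]
        mean_x.append(mx)
        mean_y.append(my)
    grand = sum(all_x) / len(all_x)
    ss_within = sum(d ** 2 for d in dx)
    ss_total = sum((x - grand) ** 2 for x in all_x)
    return slope(dx, dy), slope(mean_x, mean_y), slope(all_x, all_y), ss_within / ss_total


for b_w, b_c in [(0.0, 0.0), (0.5, 0.0), (0.5, -0.3)]:
    w_est, b_est, pa_est, share = effects(simulate(b_w, b_c))
    print(f"b_w={b_w}, b_c={b_c}: within {w_est:.3f}, between {b_est:.3f}, "
          f"population average {pa_est:.3f} "
          f"(weighted average {share * w_est + (1 - share) * b_est:.3f})")
```

In the first two settings the three estimates agree up to sampling noise; in the third they separate, and the population average estimate lands nearer the between estimate because the within share of the variation in X is small here.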
A zero contextual effect is the scenario where all the individuals, all the groups, or all the companies are perfectly comparable. So there are no systematic differences between the groups, companies, or people that we observe, whatever our clusters are.

So these are simple cases. Then, if there is no within effect, so that all effects are contextual, the contextual effect and the between effect are the same, and the population average effect points in the same direction but is generally smaller in magnitude, because it averages the between effect with a within effect of zero. In all three of these cases things are fairly simple: in the first two, we can just run a regression model that ignores the clustering to get the population average effect, and that gives us the effect of interest; in the third, a regression on the cluster means does the same. But if the within effect is not zero and the contextual effect is not zero, then all effects are distinct: the within effect, the between effect, the contextual effect, and the population average effect generally all have different values. Which of those values you should focus on depends on your research question.

This is the most common case, which is unfortunate because it complicates our life. But there are also other interesting things in this table.

In econometrics, when we do multilevel modeling or panel data analysis, we typically want to make something called the random effects assumption.
I'm going to explain the random effects assumption in another video, but the case where the within effect is not zero and the contextual effect is zero is basically what the random effects assumption is about.

Anyway, the key thing to take away from this slide is that you need to be specific about which effect you are interested in. When you write a paper and use the terms within effect, between effect, or contextual effect, let your readers know which effect you want to estimate and why. Your readers are then also better able to judge whether you have estimated the effect of interest correctly. Most commonly, what we want to know is the within effect, which tells how changing individual-level variables affects individual-level outcomes.
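Since the within effect is usually the target, it helps to see one common way of estimating it: center X and Y on each cluster's mean so that only within-cluster variation is used. The Python sketch below is hypothetical (the data generator, the coefficient values, and the function names simulate_people, within_estimate, and pooled_estimate are illustrative assumptions, not Enders and Tofighi's data); it mimics weekly work hours and well-being with a negative within effect. Cluster-mean centering recovers the within effect whether or not there is a contextual effect, whereas a regression that ignores the clustering recovers it only when the contextual effect is zero.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs in a simple regression with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_people(b_w, b_c, n_people=300, n_weeks=5, seed=3):
    """Hypothetical panel: weekly work hours x and well-being y for each person."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        usual_hours = rng.gauss(40, 5)   # stable person-level differences in hours
        hours = [usual_hours + rng.gauss(0, 3) for _ in range(n_weeks)]
        mean_h = sum(hours) / n_weeks
        base = rng.gauss(0, 1)           # person-level noise, unrelated to hours
        wellbeing = [b_w * h + b_c * mean_h + base + rng.gauss(0, 1) for h in hours]
        people.append((hours, wellbeing))
    return people


def within_estimate(people):
    """Cluster-mean centering: uses only week-to-week variation within each person."""
    dx, dy = [], []
    for xs, ys in people:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        dx += [x - mx for x in xs]
        dy += [y - my for y in ys]
    return slope(dx, dy)


def pooled_estimate(people):
    """Regression that ignores the clustering (the population average effect)."""
    xs = [x for hours, _ in people for x in hours]
    ys = [y for _, wb in people for y in wb]
    return slope(xs, ys)


for b_c in (0.0, 0.5):
    people = simulate_people(b_w=-0.2, b_c=b_c)
    print(f"contextual {b_c}: within {within_estimate(people):.3f}, "
          f"pooled {pooled_estimate(people):.3f}")
```

With a contextual effect of 0.5 the between effect is 0.3 per hour, and because most of the variation in hours is between people in this setup, the pooled slope is expected to come out positive even though working more hours lowers a given person's well-being.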