WEBVTT Kind: captions Language: en 00:00:00.000 --> 00:00:04.230 Multilevel data provide interesting research opportunities, but they can also be challenging   00:00:04.230 --> 00:00:08.760 for data analysis. To get started with multilevel data, let's start with an example. 00:00:08.760 --> 00:00:16.020 We start with a simple question: is profitability related to R&D investments? And we have a data   00:00:16.020 --> 00:00:21.690 set of 150 observations of two variables. We run a regression analysis and we can see that   00:00:22.290 --> 00:00:28.140 R&D here is clearly positively related to profitability. So the question is:   00:00:28.140 --> 00:00:33.900 is that really so? Just based on this figure, it would seem that there is a positive   00:00:33.900 --> 00:00:39.300 relationship, but if the data are actually multilevel, the answer may not be as clear. 00:00:39.300 --> 00:00:46.590 So what if I say that these 150 data points are actually three industries   00:00:46.590 --> 00:00:52.110 with 50 firms each? Would that change the conclusion about the direction or the   00:00:52.110 --> 00:00:56.850 existence of the effect? It could, because the data could actually look like this. 00:00:56.850 --> 00:01:04.830 So it is possible that this red industry here is not very profitable and doesn't spend much on R&D, and   00:01:04.830 --> 00:01:12.600 this blue industry here is very R&D heavy and also very profitable, but within an industry the   00:01:12.600 --> 00:01:16.830 effect is actually negative: the more you spend on R&D, the less profitable you are. 00:01:16.830 --> 00:01:22.890 So it's possible that on one level the effect is positive and on another level it's   00:01:22.890 --> 00:01:27.960 negative.
And now, if we want to know the direction and   00:01:27.960 --> 00:01:34.950 magnitude of the effect, we have to specify which level we are interested in, because the   00:01:34.950 --> 00:01:39.630 answer is different depending on whether we want to study how firms perform within   00:01:39.630 --> 00:01:44.550 industries or whether we want to study how industries differ from one another. 00:01:44.550 --> 00:01:51.030 So we have these different levels. Firms exist within industries, so industry is the higher level.   00:01:51.030 --> 00:01:57.450 Industry is a level-two unit, and a firm is a level-one unit within the level-   00:01:57.450 --> 00:02:04.140 two units, the industries. And clearly, which effect we report, the positive effect from   00:02:04.140 --> 00:02:09.000 the previous slide or these negative effects, depends on the purpose of our study. 00:02:09.000 --> 00:02:13.530 Let's take a look at another example. Is profitability   00:02:13.530 --> 00:02:16.410 related to R&D investments? Our data are here. 00:02:16.410 --> 00:02:20.190 So we again have some number of observations, and   00:02:20.190 --> 00:02:27.870 the trend is clearly positive. So R&D and profitability are positively correlated. 00:02:27.870 --> 00:02:32.310 What if these are actually repeated observations of the same set of firms?   00:02:32.310 --> 00:02:38.970 So what if we have 15 companies over 10 years? It could be the same thing here. So   00:02:38.970 --> 00:02:44.450 within a company the effect is negative. If the same company increases its   00:02:44.450 --> 00:02:49.890 R&D spending, its profitability goes down, but there are these between-company   00:02:49.890 --> 00:02:55.800 differences that nevertheless cause the overall regression line to be positive.
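The pattern described above can be reproduced with simulated data. The following is only a minimal sketch: the 15 firms and 10 years come from the example, but every number in the data-generating process (the firm R&D levels, the factor 2.0, the within-firm slope of -1.0, the noise level) and all function names are assumptions made for illustration, not values from the lecture. It compares the slope of a pooled regression with the slope computed after subtracting each firm's own means.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n_firms=15, n_years=10, seed=1):
    """Hypothetical panel: positive between-firm pattern, negative within-firm effect."""
    rng = random.Random(seed)
    rows = []
    for firm in range(n_firms):
        firm_rd = float(firm)        # assumed typical R&D level of this firm
        firm_profit = 2.0 * firm_rd  # assumed: R&D-heavy firms are more profitable on average
        for _ in range(n_years):
            deviation = rng.uniform(-1.0, 1.0)  # year-to-year change in R&D
            rd = firm_rd + deviation
            # assumed: spending above the firm's own average lowers its profit
            profit = firm_profit - 1.0 * deviation + rng.gauss(0.0, 0.1)
            rows.append((firm, rd, profit))
    return rows


def demean_by_firm(rows):
    """Subtract each firm's own mean from its R&D and profit values."""
    by_firm = {}
    for firm, rd, profit in rows:
        by_firm.setdefault(firm, []).append((rd, profit))
    rd_centered, profit_centered = [], []
    for obs in by_firm.values():
        mean_rd = sum(r for r, _ in obs) / len(obs)
        mean_profit = sum(p for _, p in obs) / len(obs)
        rd_centered.extend(r - mean_rd for r, _ in obs)
        profit_centered.extend(p - mean_profit for _, p in obs)
    return rd_centered, profit_centered


rows = simulate_panel()
pooled_slope = ols_slope([rd for _, rd, _ in rows], [p for _, _, p in rows])
within_slope = ols_slope(*demean_by_firm(rows))
print(f"pooled slope: {pooled_slope:.2f}")
print(f"within-firm slope: {within_slope:.2f}")
```

With these assumed settings the pooled slope comes out positive while the within-firm slope comes out negative, which is the situation the example describes. Subtracting firm means is one common way to isolate the within effect; it is shown here as an illustration, not as the method the lecture prescribes.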
00:02:55.800 --> 00:03:02.610 Again, if we want to answer the question of whether there is a positive or negative effect, we need   00:03:02.610 --> 00:03:08.790 to know which level we are interested in. Are we interested in answering the question of what makes   00:03:08.790 --> 00:03:16.080 firms different, or are we interested in answering the question of what a firm can do to increase its   00:03:16.080 --> 00:03:22.290 profitability, that is, can a firm increase its profitability by increasing its R&D spending? 00:03:22.290 --> 00:03:29.910 Normally, at least in strategic management, we're much more focused on the within-firm   00:03:29.910 --> 00:03:36.120 level: what a firm can do to improve itself. From this example it's   00:03:36.120 --> 00:03:41.820 clear that just running a regression on these data and reporting that effect as   00:03:41.820 --> 00:03:46.170 if it were the within effect would lead us to an incorrect conclusion. 00:03:46.170 --> 00:03:53.550 There are two different fallacies related to this example. One is the ecological   00:03:53.550 --> 00:04:00.600 fallacy. This is when we try to generalize from the between-firm effect, which here is clearly positive,   00:04:00.600 --> 00:04:07.770 to the within-firm effects. If the effects are not the same, we are committing the ecological fallacy. 00:04:07.770 --> 00:04:15.450 The opposite is the atomistic fallacy. The idea in the atomistic fallacy is   00:04:15.450 --> 00:04:21.600 that we are generalizing from these within-company trends to between-company   00:04:21.600 --> 00:04:27.390 differences. We could, for example, say that because investing more in R&D causes you   00:04:27.390 --> 00:04:34.020 to be more profitable, all companies that invest more in R&D are more profitable   00:04:34.020 --> 00:04:38.310 than those that do not.
That kind of inference is not logically valid   00:04:38.310 --> 00:04:43.380 because it could be that the between effect and the within effect are not the same. 00:04:43.380 --> 00:04:50.190 So what are the consequences? With clustered data,   00:04:50.190 --> 00:04:53.040 there are two things that you need to understand. 00:04:53.040 --> 00:04:58.620 One is that clustering can be a nuisance. If you want to estimate the within   00:04:58.620 --> 00:05:04.140 effect, that is, how R&D investment within a firm influences that firm's future performance,   00:05:04.140 --> 00:05:11.190 then the existence of these between-company effects is a problem for you, because   00:05:11.190 --> 00:05:15.420 you can't just run a normal regression on your data and get the correct effect. 00:05:15.420 --> 00:05:20.610 Another one is that you're violating the independence of observations assumption in   00:05:20.610 --> 00:05:27.030 regression analysis, but this is a much more trivial problem because you can just apply   00:05:27.030 --> 00:05:33.030 cluster-robust standard errors, and that will deal with the problem in the regression context. 00:05:33.030 --> 00:05:39.120 It's a lot more challenging to get the effect right than to deal with this   00:05:39.120 --> 00:05:43.740 issue of non-independence of observations, which mostly affects the standard errors. 00:05:43.740 --> 00:05:50.610 But clustering also presents interesting opportunities. Your interest could lie   00:05:50.610 --> 00:05:56.340 at multiple levels, so you could study how much context matters. So how much   00:05:56.340 --> 00:06:03.000 does belonging to a particular industry affect company performance, and how much   00:06:03.000 --> 00:06:07.020 do the decisions that the company makes over time influence performance? 00:06:07.020 --> 00:06:14.040 So on which level does performance vary, and what is under the firm's control?
Then we   00:06:14.040 --> 00:06:20.490 can study effects that vary between units. We could study, for example, whether the effect of R&D   00:06:20.490 --> 00:06:25.530 investment on profitability is stronger for some companies than for others. 00:06:25.530 --> 00:06:32.490 We can also study what explains the difference in magnitude in those effects. So for example,   00:06:32.490 --> 00:06:39.270 does being in a high-tech industry moderate the effect of R&D   00:06:39.270 --> 00:06:43.530 investment on profitability? That would be a cross-level interaction. 00:06:43.530 --> 00:06:52.710 So we can study on which level things vary. Does it matter that a company makes some   00:06:52.710 --> 00:06:58.950 decisions, or is the outcome mainly determined by the context, which is beyond the control of   00:06:58.950 --> 00:07:07.080 the company most of the time? And we can study the magnitude of an effect and how it varies between different companies.
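One simple way to look at the question of which level an outcome varies on is to split its total variance into a between-group part and a within-group part. The sketch below is only an illustration: the three industries of 50 firms each come from the first example, but the industry mean profits, the within-industry spread, and the function name `variance_shares` are assumptions chosen for this example. It is a purely descriptive decomposition; it is related to, but not the same as, the intraclass correlation estimated from a multilevel model.

```python
import random


def variance_shares(groups):
    """Split total variance into between-group and within-group parts.

    groups maps a group label to a list of outcome values. Population
    variances (divide by n) are used so that between + within == total.
    """
    values = [v for vals in groups.values() for v in vals]
    n = len(values)
    grand_mean = sum(values) / n
    total = sum((v - grand_mean) ** 2 for v in values) / n
    between = 0.0
    within = 0.0
    for vals in groups.values():
        group_mean = sum(vals) / len(vals)
        # variation of the group mean around the grand mean, weighted by group size
        between += len(vals) * (group_mean - grand_mean) ** 2 / n
        # variation of individual values around their own group mean
        within += sum((v - group_mean) ** 2 for v in vals) / n
    return total, between, within


rng = random.Random(7)
# Hypothetical industry-level mean profits; not taken from the lecture.
industry_means = {"red": 2.0, "green": 5.0, "blue": 8.0}
profit = {
    name: [mean + rng.gauss(0.0, 1.5) for _ in range(50)]
    for name, mean in industry_means.items()
}

total, between, within = variance_shares(profit)
between_share = between / total
print(f"share of profit variance between industries: {between_share:.2f}")
print(f"share of profit variance within industries: {within / total:.2f}")
```

A large between-industry share would suggest that context accounts for much of the variation in performance, while a large within-industry share would suggest that firms differ a lot even in the same context.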