WEBVTT 00:00:00.280 --> 00:00:04.690 Statistical analysis can be used for several different purposes. 00:00:04.690 --> 00:00:09.030 Let's take a look at this example that I'm going to be using in multiple videos. 00:00:09.030 --> 00:00:12.660 There is a Finnish business magazine called Talouselämä. 00:00:12.660 --> 00:00:19.540 And every year they publish the Talouselämä 500 list, which lists the 500 largest Finnish companies, 00:00:19.540 --> 00:00:24.720 and presents all kinds of analyses of those companies and how they did in the previous 00:00:24.720 --> 00:00:25.720 year. 00:00:25.720 --> 00:00:31.390 So it's followed by many reporters and many people who follow the Finnish 00:00:31.390 --> 00:00:33.180 business environment in general. 00:00:33.180 --> 00:00:40.199 In 2005, there was a big headline in one of the most prestigious Finnish newspapers. 00:00:40.199 --> 00:00:48.159 It said that on this list, the women-led companies had a 4.7 percentage point higher return on 00:00:48.159 --> 00:00:53.149 assets than those companies whose CEO was a man. 00:00:53.149 --> 00:00:55.809 So what can we say based on this fact? 00:00:55.809 --> 00:01:04.129 We have a 4.7 percentage point difference, which is a pretty substantial difference on one variable 00:01:04.129 --> 00:01:06.990 between two groups. 00:01:06.990 --> 00:01:15.640 So the most obvious claim that people want to make with this kind of number is that naming 00:01:15.640 --> 00:01:19.450 a woman as CEO causes profitability to increase. 00:01:19.450 --> 00:01:23.130 So we have a claim with all kinds of policy implications. 00:01:23.130 --> 00:01:24.509 But that's not the only claim. 00:01:24.509 --> 00:01:29.219 And it may not be a valid claim that we can make from this fact, this number. 00:01:29.219 --> 00:01:34.310 So to understand what kinds of claims we can make in general, let's take a look at three 00:01:34.310 --> 00:01:37.079 purposes of statistics.
00:01:37.079 --> 00:01:40.240 The first purpose, the simplest one, is description. 00:01:40.240 --> 00:01:46.289 So we can just say that women-led companies are more profitable now, or were in 2005. 00:01:46.289 --> 00:01:48.250 And we don't even try to generalize beyond that. 00:01:48.250 --> 00:01:50.969 So we just state a fact. 00:01:50.969 --> 00:01:54.329 And that kind of description can be useful. 00:01:54.329 --> 00:02:03.509 For example, if one third of the students taking a research methods course fail, then that provides 00:02:03.509 --> 00:02:07.010 an indication that there's either something wrong with the students 00:02:07.010 --> 00:02:13.700 or something wrong with the course, even if we don't try to make any stronger claims. 00:02:13.700 --> 00:02:21.459 Then the second level of sophistication in statistical analysis is prediction. 00:02:21.459 --> 00:02:26.430 So the predictive claim would be that if a company is led by a woman, then it will be 00:02:26.430 --> 00:02:27.430 more profitable. 00:02:27.430 --> 00:02:29.040 So that's not a causal claim. 00:02:29.040 --> 00:02:34.810 So it's not a claim that the woman is actually the cause of the profitability difference. 00:02:34.810 --> 00:02:41.959 It is a claim that if we observe a women-led company, then, for some reason, it is likely 00:02:41.959 --> 00:02:43.500 to be more profitable. 00:02:43.500 --> 00:02:44.500 And prediction is useful. 00:02:44.500 --> 00:02:50.290 For example, suppose we know that if a company is led by a woman, then it will be more profitable. 00:02:50.290 --> 00:02:54.670 If we know that and others don't, we could make investment decisions that are better 00:02:54.670 --> 00:02:57.019 than those of other investors, for example. 00:02:57.019 --> 00:02:59.200 Predictive analytics is very useful.
00:02:59.200 --> 00:03:06.590 We do forecasting and prediction all the time: you watch weather forecasts, banks forecast, 00:03:06.590 --> 00:03:11.209 or predict, who is going to pay their mortgage on time and who is going to be late. 00:03:11.209 --> 00:03:16.209 And investors try to forecast where the stock market goes, and so on. 00:03:16.209 --> 00:03:21.030 So prediction, without any claims about causality, is very useful. 00:03:21.030 --> 00:03:24.730 But that's not very common in quantitative research. 00:03:24.730 --> 00:03:29.920 Then we have the third step, which is causal inference. 00:03:29.920 --> 00:03:34.640 So the claim is that naming a woman as CEO causes the company to be more profitable. 00:03:34.640 --> 00:03:37.079 So here, we attribute the difference. 00:03:37.079 --> 00:03:42.370 We're not saying that this is merely a correlational relationship; we attribute the difference 00:03:42.370 --> 00:03:48.330 in the return on assets to women being CEOs of some companies and not others. 00:03:48.330 --> 00:03:51.010 And this has clear policy implications. 00:03:51.010 --> 00:03:56.530 If a company has a male CEO, then it could increase its profitability by naming a woman as CEO, 00:03:56.530 --> 00:03:58.520 if this claim is true. 00:03:58.520 --> 00:04:02.310 Then we still have a fourth level of claims that we can make, 00:04:02.310 --> 00:04:07.870 which goes beyond statistics, and that is causal explanation. 00:04:07.870 --> 00:04:11.140 So causal explanation differs from causal inference 00:04:11.140 --> 00:04:20.019 in that we don't only make a claim that it's a woman CEO that causes the company to be more 00:04:20.019 --> 00:04:22.840 profitable, but we also explain why that is the case. 00:04:22.840 --> 00:04:25.190 So that's why it's called causal explanation. 00:04:25.190 --> 00:04:32.630 Typically, quantitative analysis can get us to the causal inference part, but the explanation 00:04:32.630 --> 00:04:34.330 needs to come from somewhere else.
00:04:34.330 --> 00:04:38.780 So we don't generally get to make theory from numbers. 00:04:38.780 --> 00:04:40.469 We can only test claims.