Degrees of freedom is an important concept in statistical testing, particularly in model comparisons. Degrees of freedom quantifies the difference between the information that you get from the data and the information that is required for model estimation. For example, in regression analysis, the residual degrees of freedom is the number of observations, minus one for the intercept, minus the number of estimated slope coefficients (one per predictor) in the model.

Let's take a look at the concept. Wikipedia provides formal definitions of the term, but instead of reading them, I'll explain the concept with an example. Let's say that we have example data of five observations, with the values 1, 2, 3, 4 and 5. The mean of these observations is 3. By calculating the mean, we consume one degree of freedom, so this analysis has four degrees of freedom left.

The degrees of freedom here tells us that if we know the mean, we only need to know four more numbers in the series to be able to say exactly what the entire series is. So if we know that the first four numbers are 1, 2, 3 and 4, and that the mean is 3, then the final number must be 5: the five values must sum to 5 × 3 = 15, and the first four already sum to 10. The four known numbers have used up the four remaining degrees of freedom.
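The counting argument above can be checked with a short sketch. It is written in Python only because the transcript uses no programming language, and the helpers `residual_df` and `recover_last_value` are hypothetical names introduced for illustration, not part of any library. The sketch assumes a regression that includes an intercept, and it uses the fact that an intercept-only regression estimates just the mean, so it leaves the same four degrees of freedom as the example.

```python
def residual_df(n_obs: int, n_predictors: int) -> int:
    """Hypothetical helper: residual degrees of freedom of a regression with
    an intercept, i.e. the number of observations, minus one for the
    intercept, minus one for each estimated slope coefficient."""
    return n_obs - 1 - n_predictors


def recover_last_value(known, sample_mean, n_obs):
    """Hypothetical helper: once the mean is known, the last observation is
    forced, because all n values must sum to n * mean."""
    if len(known) != n_obs - 1:
        raise ValueError("need exactly n_obs - 1 known values")
    return n_obs * sample_mean - sum(known)


data = [1, 2, 3, 4, 5]
n = len(data)
m = sum(data) / n  # 3.0; estimating the mean consumes one degree of freedom
print(n - 1)  # degrees of freedom left after estimating the mean: 4
print(recover_last_value([1, 2, 3, 4], m, n))  # 5.0
# An intercept-only regression estimates just the mean, so it has the same df.
print(residual_df(n, 0))  # 4
```

With more predictors the same subtraction applies: for instance, 100 observations and 3 predictors leave 100 - 1 - 3 = 96 residual degrees of freedom.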
So initially we have five units of information; we consumed one unit by estimating the mean, which leaves us four units. If we also estimate the standard deviation or variance, we consume two units of information in total, because to calculate the standard deviation we first have to estimate the mean, and then we calculate the standard deviation from the differences between the observations and their mean. Dividing by n - 1 = 4, the degrees of freedom left after estimating the mean, the standard deviation here is 1.58. Once we know the mean and the standard deviation, we only need to know three more numbers in the series to say what the remaining two are.

So the idea of degrees of freedom is that the more quantities you calculate from the same data, the more information you consume. Once the degrees of freedom reaches zero, you can't estimate anything more from the data: you have used all the information that you have.
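To see why the mean and the standard deviation together pin down two missing values, here is a minimal Python sketch. `recover_last_two` is a hypothetical helper introduced for illustration. It assumes the sample standard deviation with the n - 1 divisor (what `statistics.stdev` computes), so the mean fixes the sum of the values and the standard deviation fixes their sum of squares: two equations for two unknowns. The equations determine the pair of values but not which one came first, so the helper returns them sorted.

```python
import math
import statistics


def recover_last_two(known, sample_mean, sample_sd, n_obs):
    """Hypothetical helper: recover the two missing values from the mean,
    the sample standard deviation (n - 1 divisor) and the other n - 2 values.
    Returns the pair sorted, since their order cannot be recovered."""
    if len(known) != n_obs - 2:
        raise ValueError("need exactly n_obs - 2 known values")
    # Constraint 1 (mean): the n values sum to n * mean.
    s = n_obs * sample_mean - sum(known)
    # Constraint 2 (SD): sum of squares = (n - 1) * sd^2 + n * mean^2.
    total_sq = (n_obs - 1) * sample_sd ** 2 + n_obs * sample_mean ** 2
    q = total_sq - sum(x * x for x in known)
    # a + b = s and a^2 + b^2 = q imply a * b = (s^2 - q) / 2,
    # so a and b are the roots of t^2 - s*t + a*b = 0.
    product = (s * s - q) / 2
    disc = s * s - 4 * product
    if disc < -1e-9:
        raise ValueError("inputs are inconsistent")
    root = math.sqrt(max(disc, 0.0))
    a, b = (s - root) / 2, (s + root) / 2
    return round(a, 10), round(b, 10)  # rounding hides floating-point noise


data = [1, 2, 3, 4, 5]
m = statistics.mean(data)
sd = statistics.stdev(data)  # sample SD: divides by n - 1 = 4
print(round(sd, 2))  # 1.58
print(len(data) - 2)  # degrees of freedom left after the mean and SD: 3
print(recover_last_two([1, 2, 3], m, sd, len(data)))  # (4.0, 5.0)
```

Solving the two constraints as a quadratic keeps the sketch dependency-free; with only the mean known, the same three values would leave infinitely many possible pairs.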