Maximum likelihood estimation is one of the most commonly used estimation principles in research. Understanding this principle, while not strictly necessary for doing your analysis, is very useful, because sometimes when you get to more complicated models the estimation can fail, and to be able to troubleshoot and fix the problem you have to understand what the computer is trying to do when it attempts maximum likelihood estimation. To understand maximum likelihood estimation we first need to understand a couple of concepts from probability. The first two concepts are probability density and cumulative probability.

When we have a p-value, the p-value quantifies the cumulative probability: the area under the curve here. That area quantifies how likely, or how probable, a value of less than minus one is from a normal distribution centered at zero with a standard deviation of 1. The probability density, in turn, tells us how probable a value of minus 1 is compared to any other value of the distribution. So the cumulative probability tells us what the probability is of getting a value that extreme or more extreme from the distribution, and the probability density tells us what the probability of a value is relative to other values.
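As a sketch of these two concepts (the lecture itself shows no code), we can check both quantities for the standard normal distribution with Python's standard library:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

# Cumulative probability: the area under the curve to the left of -1,
# i.e. P(X < -1) for X ~ N(0, 1)
print(round(z.cdf(-1), 4))   # 0.1587

# Probability density at -1: not the probability of that exact value,
# but a measure of how probable -1 is relative to other values
print(round(z.pdf(-1), 4))   # 0.242

# The density at the mean is higher than at -1, so values near 0
# are more likely than values near -1
print(z.pdf(0) > z.pdf(-1))  # True
```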
So this is not the probability of any particular value, because the normal distribution has an unlimited range, so getting any exact value is essentially impossible because there are so many different alternatives. But the density tells us the relative probability of this value compared to, for example, this other value or that value. We are a lot more likely to get a value of minus 1 than a value of minus 2, and we are more likely to get the value of 0, which is the mean, than the value of minus 1.

We also need to understand the probability of independent events. The probability of independent events also highlights why we need to assume independence of observations. The idea of independent events is that if you have two events, here two dice, the first die takes values from 1 to 6 and the second die takes values from 1 to 6 as well. These dice are independent, so throwing the first die doesn't affect the throw of the second die. Therefore all 36 possible combinations are equally likely to occur, because these are fair dice. So getting a value of 1 from the first die is 1 out of 6, and getting a value of 1 from the second die is 1 out of 6 again, and the total probability of two ones is 1/6 times 1/6, which is 1/36, as this figure illustrates.
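The two-dice figure can be reproduced by brute-force enumeration; a minimal sketch confirming that the product rule agrees with counting the 36 combinations:

```python
from fractions import Fraction
from itertools import product

# Each fair die has six equally likely faces
faces = range(1, 7)

# All 36 ordered combinations of two independent dice
combinations = list(product(faces, faces))
print(len(combinations))  # 36

# Probability of two ones, by counting favorable combinations ...
p_counted = Fraction(sum(1 for a, b in combinations if a == b == 1),
                     len(combinations))

# ... equals the product of the individual probabilities,
# because the dice are independent
p_product = Fraction(1, 6) * Fraction(1, 6)
print(p_counted == p_product == Fraction(1, 36))  # True
```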
So the probability of two independent events is the product of the probabilities of the two events. This is an important principle in maximum likelihood estimation.

Let's now go on and explain the actual estimation principle. Assume that we have a normal distribution here, which has a mean of zero and a standard deviation of 1. If we get the values 2, 3 and 4 from this population, the probability density is roughly 0.054 for 2, 0.0044 for 3, and 0.00013 for 4, and the joint probability density is the product of these values. So this value here is the product of those probability densities. We multiply the probability densities together because these are independent observations from the same distribution. So the total probability density is the product of the individual probability densities. For the values 2, 3 and 4, this joint density quantifies the likelihood of those three values compared to any other three values that we could get.

Because these probability densities are very small numbers, it is typically more convenient to work with logarithms. When we work with logarithms, we extend the table with a log column: we have the logarithm of each density and then the cumulative log, which is the sum of these logarithms.
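The table in the slide can be recomputed directly. A minimal sketch, using the same observations 2, 3 and 4 under a standard normal, showing that the joint density is the product of the individual densities and that its logarithm equals the sum of the individual logs:

```python
import math
from statistics import NormalDist

observations = [2, 3, 4]
z = NormalDist(mu=0, sigma=1)

# Individual probability densities (very small numbers)
densities = [z.pdf(x) for x in observations]

# Joint density: the product, because the observations are
# independent draws from the same distribution
joint = math.prod(densities)

# Log of a product is the sum of the logs of its factors,
# which is why the table's cumulative log column is a sum
log_sum = sum(math.log(d) for d in densities)
print(math.isclose(math.log(joint), log_sum))  # True
```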
The reason we take a sum of these logarithms is that the logarithm of a product is the sum of the logarithms of the factors that go into the product. We also call these likelihoods now, because we are trying to estimate the mean of this distribution. We no longer know the mean of the normal distribution; rather, we try to estimate it from the data. So assuming that the mean is zero, the likelihood of the data is this small number here, and the log likelihood, which is the logarithm of the likelihood, is about minus 17.

How maximum likelihood estimation works is that we try different values for the mean. For example, the mean is now 0; we try 0.01 and check whether the log likelihood becomes larger. If it does, then we know that the actual maximum likelihood estimate of the population mean lies in that direction. So we move the normal distribution sideways a bit, to the right this time. We can see that the probability densities of all these observations increased, because the distribution is now closer to the observations, and the log likelihood also became larger. We can still increase the log likelihood by moving the distribution right a bit more, so we move it to the right a bit more and the log likelihood becomes larger, that is, closer to zero. The log likelihood is nearly always a negative number, so increasing a negative number means that the number gets closer to zero.
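This "nudge the mean and see if the log likelihood grows" step can be sketched numerically. Assuming the same data 2, 3, 4 and a fixed standard deviation of 1:

```python
import math
from statistics import NormalDist

observations = [2, 3, 4]

def log_likelihood(mean):
    """Log likelihood of the data under N(mean, 1)."""
    d = NormalDist(mu=mean, sigma=1)
    return sum(math.log(d.pdf(x)) for x in observations)

# The log likelihood at the starting value mean = 0
print(round(log_likelihood(0), 1))  # -17.3

# Nudging the mean slightly to the right increases the log likelihood,
# so the maximum likelihood estimate lies in that direction
print(log_likelihood(0.01) > log_likelihood(0))  # True

# Moving the whole distribution closer to the data keeps improving it
print(log_likelihood(1) > log_likelihood(0))     # True
print(log_likelihood(2) > log_likelihood(1))     # True
```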
We can still move it right a bit to increase the likelihood even more, and then this is the maximum likelihood estimate: the maximum likelihood that we can get by shifting the mean. So with a normal distribution with a standard deviation of 1 we can't get any larger likelihood values. When we set the mean at 3 that is the maximum likelihood; if we shifted it right a bit or left a bit, the likelihood would decrease. The value of the mean that maximizes the likelihood is called the maximum likelihood estimate. The maximum likelihood estimate is found by maximizing the likelihood function or the log likelihood function.

We can also express the likelihood as a function of the mean. So here is the likelihood function. We can see that the likelihood of getting those three observations is pretty much zero here, and it's pretty much zero here, so the likelihood peaks only when we are very close to the correct value. When we take the log likelihood, the curve looks a lot nicer, because it actually slopes in some direction instead of being flat even when we are far from the correct population value. That's one reason why we use logarithms in maximum likelihood estimation.

So in practice we set some starting value, let's say zero, for the mean. Then we look in which direction the likelihood goes up.
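The likelihood-as-a-function-of-the-mean curve can be traced on a grid; a minimal sketch confirming that for these data the peak sits at the sample mean:

```python
import math
from statistics import NormalDist, mean

observations = [2, 3, 4]

def log_likelihood(m):
    """Log likelihood of the data under N(m, 1)."""
    d = NormalDist(mu=m, sigma=1)
    return sum(math.log(d.pdf(x)) for x in observations)

# Evaluate the log likelihood on a grid of candidate means
grid = [m / 10 for m in range(0, 61)]   # 0.0, 0.1, ..., 6.0
best = max(grid, key=log_likelihood)

# The curve peaks at the sample mean of the data
print(best)                        # 3.0
print(best == mean(observations))  # True
```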
We see that it goes up when we go right, so we go right a bit and calculate the likelihood there; we check which direction we should go, calculate again, and then we discover that if we go further the likelihood starts to decrease. Then we declare that this is our maximum likelihood estimate. That's the basic principle of maximum likelihood estimation: estimating the mean of a normal distribution with a standard deviation of 1 from the observations. In other videos I will use this same principle for more complicated examples.
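The search procedure just described can be sketched as a simple hill climb. This is an illustration of the principle, not the actual optimizer a statistics package would use (those typically use gradient-based methods such as Newton-Raphson):

```python
import math
from statistics import NormalDist

observations = [2, 3, 4]

def log_likelihood(m):
    """Log likelihood of the data under N(m, 1)."""
    d = NormalDist(mu=m, sigma=1)
    return sum(math.log(d.pdf(x)) for x in observations)

def hill_climb(start=0.0, step=1.0, tol=1e-6):
    """Move the mean in whichever direction the log likelihood goes up."""
    m = start
    while step > tol:
        if log_likelihood(m + step) > log_likelihood(m):
            m += step                 # likelihood rises to the right: go right
        elif log_likelihood(m - step) > log_likelihood(m):
            m -= step                 # likelihood rises to the left: go left
        else:
            step /= 2                 # both directions decrease: refine the step
    return m

# The search converges to the sample mean of the data
print(round(hill_climb(), 3))  # 3.0
```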