WEBVTT Kind: captions Language: en

Normal regression analysis is a very convenient technique, because it will always give you some results. Maximum likelihood estimation, on the other hand, can sometimes fail. Understanding why it fails will allow you to troubleshoot your models and make informed decisions about how to get a model to work.

In this video, I will show you an example of a logistic regression analysis. The purpose of the video is not to demonstrate the logistic regression feature specifically but, more generally, what can cause a maximum likelihood estimation process to fail.

Let's take a look at this data. We have eight observations: two independent variables, x1 and x2, and a dependent variable y that receives values of 1 and 0. We'll run a logistic regression analysis explaining y with x1 and x2, and see what happens.

The analysis setup is here. We'll be using two different software packages, just to demonstrate software differences: this is the R syntax and this is the Stata syntax for running the model.

And the results are in. So what do we notice first? The first thing we see is that a lot is missing from the Stata output.
We don't have significance tests, we don't have standard errors, and we don't have the overall model test. So results are missing.

Another thing is that the two packages give us different results. R says that the coefficient of x1 is 15, while Stata says it is 33; Stata says the coefficient of x2 is 30, while according to R it is 6. These are substantially different results, because we normally interpret the coefficients as odds ratios by exponentiating them, and on an exponential scale the difference between 15 and 30 is huge.

So what do we do? Do we just pick one of these two sets of estimates and report it as if there were no problem? Well, we need to understand what's going on.

We also see that the log-likelihood is 0 here, which means that the likelihood is 1. That is a very unusual scenario. It means that getting this kind of data from this model would have 100% probability, and getting any other observations, any other values for the y variable, would be impossible under this model.
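To see why a log-likelihood of 0 means a likelihood of 1, here is a minimal sketch in Python (the video uses R and Stata; the outcome and probability values below are illustrative, not the video's exact data). A model that predicts every observation perfectly assigns probability 1 to each observed outcome, so the product of those probabilities, the likelihood, is 1, and its log is 0.

```python
import math

# Illustrative outcomes and perfectly fitted probabilities
# (hypothetical values, not the video's exact data).
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # fitted P(y = 1)

# The likelihood of the sample is the product of the per-observation
# probabilities of the outcome that was actually observed.
likelihood = 1.0
for yi, pi in zip(y, p):
    likelihood *= pi if yi == 1 else (1.0 - pi)

print(likelihood)            # 1.0
print(math.log(likelihood))  # 0.0
```

A real fitted model never reaches this: it would claim the observed data were the only data the model could ever produce.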
And you don't get that kind of perfect model in practice. So what's going on?

Then we also have warnings. Stata tells us that successes and failures are completely determined. R gives a somewhat less user-friendly warning: 'fitted probabilities numerically 0 or 1 occurred'.

The important thing about warnings is that a warning is the software telling you that something is going on that you should pay attention to. Warnings are not minor inconveniences that you can just ignore before reporting whatever results you got. A warning is something you need to spend time understanding: what is the warning telling you, why is it occurring, and what can you do about it? You should not report any analysis that produced a warning unless you know what the warning means and have made an explicit decision not to care about it. Generally, we want these warnings to go away.

So what's the cause? Let's take a look at the data set a bit more closely. In this case, the problem is the variable x1, so we can just take x2 out. What do we see here with x1 and y?
We see that when x1 receives values greater than 4, y is always 1, and when x1 receives values less than 4, y is always 0. So the value of x1 perfectly predicts the value of y.

Why would that be problematic for maximum likelihood estimation? Let's take a look at how maximum likelihood estimation works. This is the R analysis. Maximum likelihood estimation always starts with some kind of initial guess. The computer is fitting an s-curve, because this is logistic regression analysis, and the first guess is an s-curve that is not very steep but goes up — it goes up for x1, rather than going down. The estimation then proceeds by trying different values for the coefficient of x1 so that the curve fits the data better. Originally the curve fit here, predicting this observation to have about a 60% probability. As we make the curve steeper and steeper, we can see that this observation is predicted better and better.

The problem here is that there is no limit on how steep the curve can be. The steeper you make the curve, the better it predicts these observations.
And you can make it indefinitely steep. There is no limit on how much you can increase the x1 coefficient, and each increase makes the curve a bit steeper. We can see that the curve is not straight up yet — we could still make it a few pixels steeper. So the coefficient of x1 just goes to infinity if we allow the process to continue.

What happens to the likelihood, or the log-likelihood in this case? The log-likelihood goes toward zero, and it would reach zero only if every observation were predicted perfectly. So this likelihood has no maximum: the log-likelihood can never be exactly 0, it just gets very, very close, and we can always make the curve a little bit steeper to push it closer still. The maximum of this log-likelihood does not exist.

The consequence is that the maximum likelihood estimates for this model don't exist either. The estimate is indeterminate, because making the x1 coefficient larger and larger will always fit the data a bit better. The increase in fit is marginal, but we can't say that an x1 coefficient of 50 is the correct value, because a coefficient of 51 fits the data better. So the estimates don't exist. So what do you do?
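The "no maximum" argument can be checked numerically. The sketch below is Python rather than the video's R or Stata, with made-up perfectly separated data (y = 0 whenever x < 4, y = 1 whenever x > 4, standing in for the video's eight observations). It fixes the curve's midpoint at 4 and steepens the s-curve by raising the slope: every increase raises the log-likelihood, which creeps toward 0 but never reaches it, so no finite coefficient maximizes it.

```python
import math

def loglik(beta, data):
    # One-predictor logistic model with the midpoint fixed at x = 4:
    # P(y = 1) = 1 / (1 + exp(-beta * (x - 4))).
    ll = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-beta * (x - 4.0)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Hypothetical perfectly separated data: x < 4 gives y = 0, x > 4 gives y = 1.
data = [(1, 0), (2, 0), (3, 0), (3.5, 0), (5, 1), (6, 1), (8, 1), (11, 1)]

# Steeper and steeper curves: the log-likelihood strictly increases
# toward 0 but never attains it.
for beta in (1, 5, 10, 20):
    print(beta, loglik(beta, data))
```

Running this shows the log-likelihood climbing monotonically (roughly -1.4, -0.09, -0.007, ...) while staying below zero, which is exactly why the optimizer would push beta to infinity if allowed.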
This scenario is so well understood that statistical software has checks for it. This is from the Stata user manual. If we run the logistic model without the asis option that I had added before to force it to run, Stata refuses: it reports that x1 predicts the data perfectly and that the estimates don't exist. And the user manual has a couple of pages of explanation about what causes this problem, how Stata deals with it, and what you can do about it.

The problem is that not all possible failure scenarios are programmed into your statistical software. There are scenarios where maximum likelihood estimation can fail, there is no specific check for them, and the estimation just fails with no warning indicating why.

Perfect prediction is well understood, so you can rely on the software catching it. But now let's take a look at another problem, one that the software doesn't catch before estimation. This is another variant of the same analysis: we add one more observation.
We add a ninth observation with x1 = 11 and x2 = 0 — the same values as the eighth observation — but its y value is 0. Now the data cannot be predicted perfectly: the prediction calculated from x1 and x2 is the same for both observations, so if we predict the 1 perfectly, we can't predict the 0, and vice versa. So we can't predict perfectly using this data.

What happens is that the perfect prediction check does not trigger. Stata will try to estimate the model, and R will try as well. And we again get a warning: 'convergence not achieved'. Stata tried to estimate the model but couldn't find a maximum of the likelihood. It went through 1,600 iterations, which is the default limit, and then gave up. You can of course increase the limit and have Stata try 10,000 different sets of estimates; it still cannot find the maximum, because for this model the maximum doesn't exist either.

So Stata tries, then gives up. What do we do about it? We can see that we don't have standard errors for one of the parameters, and that is an indication of a problem we have to deal with.
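This situation is what the statistics literature calls quasi-complete separation. As a sketch of the mechanism — in Python, with a constructed toy data set rather than the video's, in which two tied observations sit exactly on the boundary with opposite y values — the log-likelihood below keeps rising as the slope grows, but it is bounded above by the tied pair's fixed contribution of 2·log(0.5). There is still no finite maximizer, so an iterative optimizer will grind away until it hits its iteration limit.

```python
import math

def loglik(b, data):
    # One-parameter logistic model with the boundary fixed at x = 4:
    # P(y = 1) = 1 / (1 + exp(-b * (x - 4))).
    ll = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-b * (x - 4.0)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Quasi-complete separation (toy data): separated everywhere except two
# tied observations sitting exactly on the boundary, one y = 0, one y = 1.
data = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1), (4, 0), (4, 1)]

# The tied pair is stuck at p = 0.5, contributing 2 * log(0.5); everything
# else is predicted better and better as b grows, so the log-likelihood
# climbs toward 2 * log(0.5) ~ -1.386 but never attains a maximum.
for b in (1, 5, 20, 100):
    print(b, loglik(b, data))
```

This is why raising the iteration limit does not help: each iteration finds a slightly better slope, forever.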
At the very least we want to report the standard error, or failing that at least the p-value, and we have nothing to report. So the missing standard errors indicate some kind of problem. We can also see the message saying that the likelihood is 'not concave', and that gives us information that is useful for troubleshooting. I will not go through the full troubleshooting procedure in this video, but just to demonstrate what's available to you, let's look at what 'not concave' means.

Maximum likelihood estimation works by trial and error. This is from the video where I demonstrated maximum likelihood estimation of a population mean. We can see that when the observed values are 2, 3, and 4, a good estimate for the population mean is 3, and that is in fact the maximum likelihood estimate: if we try any other value in the likelihood function, we get a smaller likelihood. We have the actual likelihood here, and the log-likelihood here. What's important about the log-likelihood is that it is a curve that bends down. We say that this curve is concave.
A concave curve has a second derivative — which quantifies the curvature — that is always negative. So if you have a concave curve, the second derivative is negative, and then we know that there is a peak somewhere, and that peak is our maximum likelihood estimate.

If the curve were flat somewhere, it would not be concave, because it would not be bending down everywhere, and we would not have a unique maximum of the likelihood: multiple different values of the parameter — the estimate of the mean — would be equally good from the maximum likelihood perspective. We could also have a curve that goes down first and then curves up; that would not be concave either. So that's what 'not concave' means: the likelihood does not have the shape that makes it straightforward to maximize.

We can check what the actual problem is by looking at the matrix of second derivatives, which tells us how strongly the log-likelihood curves down. We can see a couple of zeros there, and those zeros indicate that we have a problem with these parameters.
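The concavity claim can be checked numerically for the population-mean example. The sketch below (Python; it assumes a normal likelihood with a known standard deviation of 1, which matches the shape of the example even though the video does not state the sigma it used) approximates the second derivative of the log-likelihood with a central finite difference and confirms it is negative everywhere, so there is a single peak — at the sample mean.

```python
import math

def loglik(mu, xs, sigma=1.0):
    # Normal log-likelihood as a function of the mean, sigma assumed known.
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

xs = [2, 3, 4]  # the data from the population-mean example

def second_deriv(mu, h=1e-4):
    # Central finite difference approximating the curvature at mu.
    return (loglik(mu + h, xs) - 2 * loglik(mu, xs) + loglik(mu - h, xs)) / h ** 2

# Negative curvature at every point checked: the curve is concave,
# so its single peak (at the sample mean, mu = 3) is the MLE.
for mu in (2.0, 3.0, 4.0):
    print(mu, second_deriv(mu))
```

For this model the curvature is constant at -n/sigma² = -3, which is what the finite difference recovers; a zero in the matrix of second derivatives would mean the curve is flat in some direction, exactly the 'not concave' situation.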
The full troubleshooting and exact interpretation of this output is something I will leave for another video.

OK, let's get back to the problem. We have missing standard errors, which is a definite indication that we need to do something, and we have a warning that '3 failures and 2 successes completely determined'. The logical next question is: which two and which three? To find out which observations are predicted perfectly, we can use the model — even though it has not converged — to calculate the actual predictions. We can see that the predicted values for these three zeros are exactly 0, and for these two observations the predicted values are exactly 1; that is what the warning refers to. The predicted value for this observation is very close to zero, so if we allowed Stata to go on with the estimation forever, it would probably predict this one to be 0 as well, and this one to be 1. So we have essentially seven observations that are predicted perfectly and two that are not.

So what's going on here? It is not perfect prediction, because these two observations are not predicted perfectly — they can't be. And for that reason, Stata's check doesn't catch the problem.
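The 'completely determined' diagnosis can be reproduced by hand: compute the fitted probability for each observation from the current (diverging) coefficients and flag those that are numerically 0 or 1. A hypothetical sketch in Python — the data, the coefficients, and the 1e-8 cutoff are all made up for illustration, so the counts differ from the video's three-and-two:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data with a tied pair at x = 4 (one y = 0, one y = 1), and
# hypothetical coefficients from a fit that has been allowed to diverge
# for a while (steep slope, midpoint at x = 4). Illustrative values only.
obs = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1), (4, 0), (4, 1)]
b0, b1 = -120.0, 30.0

for x, y in obs:
    p = sigmoid(b0 + b1 * x)
    determined = p < 1e-8 or p > 1.0 - 1e-8
    print(x, y, p, "completely determined" if determined else "")
```

Here six observations come out numerically determined while the tied pair stays at exactly 0.5 — the same structure as the video's output, where the determined observations trigger the warning and the tied pair blocks perfect prediction.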
A logistic regression model with two variables can be understood as this kind of surface. We have x1 on this axis, x2 here, and y on the vertical axis, and we can see how the observations depend on x1 and x2. The circles up here are the observations whose actual value is 1; the circles down here are the observations whose actual value is 0. The position of each circle indicates the values of x1 and x2 for that observation, and the cross is the predicted value on the surface.

When we do maximum likelihood estimation, we adjust the surface — by adjusting the coefficients of x1 and x2 — so that the predicted values are as close to the observed values as possible. And again, we can make this surface indefinitely steep. But for this one observation, the predicted value will always be in the middle of the surface; it can't be predicted perfectly, because you can't predict 1 and 0 at the same time. The combination x1 = 11, x2 = 0 corresponds to two different values of y, and that's why you can't predict perfectly.
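The surface picture can be sketched numerically as well. Below is an illustrative two-predictor version in Python — the nine observations and the direction of divergence are constructed for this sketch, not taken from the video. Scaling up a separating set of coefficients makes the surface ever steeper: the slope coefficients grow, the intercept heads toward minus infinity, and the log-likelihood keeps rising, while the tied pair stays stuck at probability 0.5.

```python
import math

def loglik2(b0, b1, b2, data):
    # Two-predictor logistic log-likelihood:
    # P(y = 1) = sigmoid(b0 + b1*x1 + b2*x2).
    ll = 0.0
    for x1, x2, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Illustrative data: separated except for a tied pair at (x1, x2) = (11, 0)
# with opposite y values.
data = [(1, 2, 0), (2, 1, 0), (3, 3, 0), (3.5, 2, 0),
        (5, 4, 1), (6, 2, 1), (8, 5, 1),
        (11, 0, 1), (11, 0, 0)]

# Scale one separating direction by t: b0 -> -11t, b1 -> t, b2 -> 2.6t.
# The slopes blow up, the intercept goes to minus infinity, and the
# log-likelihood keeps creeping upward -- bounded by the tied pair's
# 2 * log(0.5), which is never attained.
for t in (1, 5, 25):
    print(t, loglik2(-11.0 * t, 1.0 * t, 2.6 * t, data))
```

The particular direction (-11, 1, 2.6) was chosen so that the tied pair sits exactly on the decision boundary while every other observation lands on its correct side; any such direction produces the same divergence.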
But the problem is the same: the coefficients of x1 and x2 grow large, the intercept goes toward minus infinity, and the log-likelihood keeps increasing without ever reaching a maximum. You can always make the surface a bit steeper so that it fits the data a bit better, and there is no limit on how large x1 and x2 can get, or how small the intercept can be.

So what do you do with this problem? There are four options for how you can actually use the analysis.

Option one: note that if you had used only Stata and not run the R analysis, you wouldn't even have noticed that the two packages give radically different results. So option one is to take whatever results you got, ignore the warning, and present the results in your paper. I can't tell you how common that is, but I'm pretty sure some people do it. Understanding the warning requires effort, and if you have some estimates you could report without going through that extra effort, some people will just report them. That is somewhat unethical: the warning is the software telling you that there is a problem you should pay attention to, and you are ignoring evidence of a problem and reporting the results as if there were none.
The second option is trial and error. This is something I did a lot before I started to think that maybe I should understand what the computer is doing. You just try things — drop cases, drop variables — until the warning disappears. You run the model over and over, without understanding why the error sometimes appears and sometimes doesn't, and then you pick one of the analyses that doesn't produce the error. This is a bit better, because at least you are trying to do something about the problem, but blind trial and error can leave you with a suboptimal model. For example, you might drop a control variable because the model doesn't converge with it included — and then, instead of getting the model to converge with the control, you end up with a model that doesn't control for an explanation you would really like to rule out. So this is not ideal either.

The third option is better. If you use, say, logistic regression a lot, the perfect prediction issue I demonstrated here is a well-known thing: any decent book on logistic regression analysis will cover at least the first case, and probably the second as well.
The Stata user manual explains both cases I demonstrated, what Stata does in those scenarios, and why. So you can learn each special case, and how to deal with it, separately. That works if you only ever use a small number of analyses. The problem is that the special cases differ between analyses — if you do negative binomial regression, for example, there are different special cases — so the number of special cases you would have to learn is quite large.

The fourth option is to understand the estimation principle: what the second derivatives and the likelihood mean, and how they depend on the parameter values the computer is currently trying, so that you can see for yourself what the problem is. This is of course more difficult, but in the long run it will make you a better researcher, because you can diagnose your models in a way that simply memorizing every special case does not allow.

So those are the four options that let you present some results. Option one is unethical — it is unethical to ignore warnings. Trial and error is bad. Learning the special cases is good, but not ideal.
In the ideal case, you understand what the software is doing.

There is also a fifth option, which is to give up on the model and not report it. For example, suppose you are just doing a robustness check. Your main analysis runs without warnings, but the robustness check — a different, less important model — produces a warning. Should you spend a day or a week troubleshooting it? Should you spend a month studying first and then a week troubleshooting? Or should you just leave that analysis out? Leaving a problematic analysis out is a better alternative than ignoring the warning and reporting the results anyway. I do this all the time: when the problem is not important, I don't want to spend my time dealing with it. So it's a perfectly viable option, and always something you should consider.