WEBVTT Kind: captions Language: en

Normal regression analysis is a very convenient technique, because it will always give you some results. Maximum likelihood estimation, on the other hand, can sometimes fail. Understanding why it fails will allow you to troubleshoot your models and make informed decisions about how to get a model to work.

In this video, I will show you an example of a logistic regression analysis. The purpose of the video is not to demonstrate the logistic regression feature specifically but, more generally, what can cause a maximum likelihood estimation process to fail.

Let's take a look at this data. We have eight observations: two independent variables, x1 and x2, and a dependent variable y that receives values of 1 and 0. We'll run a logistic regression analysis explaining y with x1 and x2, and see what happens.

The analysis setup is here. We'll be using two different software packages, just to demonstrate software differences: this is the R syntax and this is the Stata syntax for running the model.

And the results are in. So what do we notice first? The first thing we see is that a lot is missing from the Stata output.
We don't have significance tests, we don't have standard errors, and we don't have the overall model test. So results are missing.

Another thing is that the two packages give us different results. R says that the coefficient of x1 is 15, while Stata says it is 33; Stata says the coefficient of x2 is 30, while according to R it is 6. These are substantially different results, because we normally interpret the coefficients as odds ratios by exponentiating them, and on an exponential scale the difference between 15 and 30 is huge.

So what do we do? Do we just pick one of these two sets of estimates and report it as if there were no problem? Well, we need to understand what's going on.

We also see that the log-likelihood is 0 here, which means that the likelihood is 1. That is a very unusual scenario. It means that getting this kind of data from this model would have 100% probability, and getting any other observations, any other values for the y variable, would be impossible under this model.
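To see why a log-likelihood of 0 means a likelihood of 1, here is a minimal sketch in Python (the video uses R and Stata; the outcome and probability values below are illustrative, not the video's exact data). A model that predicts every observation perfectly assigns probability 1 to each observed outcome, so the product of those probabilities, the likelihood, is 1, and its log is 0.

```python
import math

# Illustrative outcomes and perfectly fitted probabilities
# (hypothetical values, not the video's exact data).
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # fitted P(y = 1)

# The likelihood of the sample is the product of the per-observation
# probabilities of the outcome that was actually observed.
likelihood = 1.0
for yi, pi in zip(y, p):
    likelihood *= pi if yi == 1 else (1.0 - pi)

print(likelihood)            # 1.0
print(math.log(likelihood))  # 0.0
```

A real fitted model never reaches this: it would claim the observed data were the only data the model could ever produce.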
And you don't get that kind of perfect model in practice. So what's going on?

Then we also have warnings. Stata tells us that successes and failures are completely determined. R gives a somewhat less user-friendly warning: 'fitted probabilities numerically 0 or 1 occurred'.

The important thing about warnings is that a warning is the software telling you that something is going on that you should pay attention to. Warnings are not minor inconveniences that you can just ignore before reporting whatever results you got. A warning is something you need to spend time understanding: what is the warning telling you, why is it occurring, and what can you do about it? You should not report any analysis that produced a warning unless you know what the warning means and have made an explicit decision not to care about it. Generally, we want these warnings to go away.

So what's the cause? Let's take a look at the data set a bit more closely. In this case, the problem is the variable x1, so we can just take x2 out. What do we see here with x1 and y?
We see that when x1 receives values greater than 4, y is always 1, and when x1 receives values less than 4, y is always 0. So the value of x1 perfectly predicts the value of y.

Why would that be problematic for maximum likelihood estimation? Let's take a look at how maximum likelihood estimation works. This is the R analysis. Maximum likelihood estimation always starts with some kind of initial guess. The computer is fitting an s-curve, because this is logistic regression analysis, and the first guess is an s-curve that is not very steep but goes up — it goes up for x1, rather than going down. The estimation then proceeds by trying different values for the coefficient of x1 so that the curve fits the data better. Originally the curve fit here, predicting this observation to have about a 60% probability. As we make the curve steeper and steeper, we can see that this observation is predicted better and better.

The problem here is that there is no limit on how steep the curve can be. The steeper you make the curve, the better it predicts these observations.
And you can make it indefinitely steep. There is no limit on how much you can increase the x1 coefficient, and each increase makes the curve a bit steeper. We can see that the curve is not straight up yet — we could still make it a few pixels steeper. So the coefficient of x1 just goes to infinity if we allow the process to continue.

What happens to the likelihood, or the log-likelihood in this case? The log-likelihood goes toward zero, and it would reach zero only if every observation were predicted perfectly. So this likelihood has no maximum: the log-likelihood can never be exactly 0, it just gets very, very close, and we can always make the curve a little bit steeper to push it closer still. The maximum of this log-likelihood does not exist.

The consequence is that the maximum likelihood estimates for this model don't exist either. The estimate is indeterminate, because making the x1 coefficient larger and larger will always fit the data a bit better. The increase in fit is marginal, but we can't say that an x1 coefficient of 50 is the correct value, because a coefficient of 51 fits the data better. So the estimates don't exist. So what do you do?
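The "no maximum" argument can be checked numerically. The sketch below is Python rather than the video's R or Stata, with made-up perfectly separated data (y = 0 whenever x < 4, y = 1 whenever x > 4, standing in for the video's eight observations). It fixes the curve's midpoint at 4 and steepens the s-curve by raising the slope: every increase raises the log-likelihood, which creeps toward 0 but never reaches it, so no finite coefficient maximizes it.

```python
import math

def loglik(beta, data):
    # One-predictor logistic model with the midpoint fixed at x = 4:
    # P(y = 1) = 1 / (1 + exp(-beta * (x - 4))).
    ll = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-beta * (x - 4.0)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Hypothetical perfectly separated data: x < 4 gives y = 0, x > 4 gives y = 1.
data = [(1, 0), (2, 0), (3, 0), (3.5, 0), (5, 1), (6, 1), (8, 1), (11, 1)]

# Steeper and steeper curves: the log-likelihood strictly increases
# toward 0 but never attains it.
for beta in (1, 5, 10, 20):
    print(beta, loglik(beta, data))
```

Running this shows the log-likelihood climbing monotonically (roughly -1.4, -0.09, -0.007, ...) while staying below zero, which is exactly why the optimizer would push beta to infinity if allowed.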
This scenario is so well understood that statistical software has checks for it. This is from the Stata user manual. If we run the logistic model without the asis option that I had added before to force it to run, Stata refuses: it reports that x1 predicts the data perfectly and that the estimates don't exist. And the user manual has a couple of pages of explanation about what causes this problem, how Stata deals with it, and what you can do about it.

The problem is that not all possible failure scenarios are programmed into your statistical software. There are scenarios where maximum likelihood estimation can fail, there is no specific check for them, and the estimation just fails with no warning indicating why.

Perfect prediction is well understood, so you can rely on the software catching it. But now let's take a look at another problem, one that the software doesn't catch before estimation. This is another variant of the same analysis: we add one more observation.
We add a ninth observation with x1 = 11 and x2 = 0 — the same values as the eighth observation — but its y value is 0. Now the data cannot be predicted perfectly: the prediction calculated from x1 and x2 is the same for both observations, so if we predict the 1 perfectly, we can't predict the 0, and vice versa. So we can't predict perfectly using this data.

What happens is that the perfect prediction check does not trigger. Stata will try to estimate the model, and R will try as well. And we again get a warning: 'convergence not achieved'. Stata tried to estimate the model but couldn't find a maximum of the likelihood. It went through 1,600 iterations, which is the default limit, and then gave up. You can of course increase the limit and have Stata try 10,000 different sets of estimates; it still cannot find the maximum, because for this model the maximum doesn't exist either.

So Stata tries, then gives up. What do we do about it? We can see that we don't have standard errors for one of the parameters, and that is an indication of a problem we have to deal with.
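This situation is what the statistics literature calls quasi-complete separation. As a sketch of the mechanism — in Python, with a constructed toy data set rather than the video's, in which two tied observations sit exactly on the boundary with opposite y values — the log-likelihood below keeps rising as the slope grows, but it is bounded above by the tied pair's fixed contribution of 2·log(0.5). There is still no finite maximizer, so an iterative optimizer will grind away until it hits its iteration limit.

```python
import math

def loglik(b, data):
    # One-parameter logistic model with the boundary fixed at x = 4:
    # P(y = 1) = 1 / (1 + exp(-b * (x - 4))).
    ll = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-b * (x - 4.0)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Quasi-complete separation (toy data): separated everywhere except two
# tied observations sitting exactly on the boundary, one y = 0, one y = 1.
data = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1), (4, 0), (4, 1)]

# The tied pair is stuck at p = 0.5, contributing 2 * log(0.5); everything
# else is predicted better and better as b grows, so the log-likelihood
# climbs toward 2 * log(0.5) ~ -1.386 but never attains a maximum.
for b in (1, 5, 20, 100):
    print(b, loglik(b, data))
```

This is why raising the iteration limit does not help: each iteration finds a slightly better slope, forever.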
At the very least we want to report the standard error, or failing that at least the p-value, and we have nothing to report. So the missing standard errors indicate some kind of problem. We can also see the message saying that the likelihood is 'not concave', and that gives us information that is useful for troubleshooting. I will not go through the full troubleshooting procedure in this video, but just to demonstrate what's available to you, let's look at what 'not concave' means.

Maximum likelihood estimation works by trial and error. This is from the video where I demonstrated maximum likelihood estimation of a population mean. We can see that when the observed values are 2, 3, and 4, a good estimate for the population mean is 3, and that is in fact the maximum likelihood estimate: if we try any other value in the likelihood function, we get a smaller likelihood. We have the actual likelihood here, and the log-likelihood here. What's important about the log-likelihood is that it is a curve that bends down. We say that this curve is concave.
A concave curve has a second derivative — which quantifies the curvature — that is always negative. So if you have a concave curve, the second derivative is negative, and then we know that there is a peak somewhere, and that peak is our maximum likelihood estimate.

If the curve were flat somewhere, it would not be concave, because it would not be bending down everywhere, and we would not have a unique maximum of the likelihood: multiple different values of the parameter — the estimate of the mean — would be equally good from the maximum likelihood perspective. We could also have a curve that goes down first and then curves up; that would not be concave either. So that's what 'not concave' means: the likelihood does not have the shape that makes it straightforward to maximize.

We can check what the actual problem is by looking at the matrix of second derivatives, which tells us how strongly the log-likelihood curves down. We can see a couple of zeros there, and those zeros indicate that we have a problem with these parameters.
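The concavity claim can be checked numerically for the population-mean example. The sketch below (Python; it assumes a normal likelihood with a known standard deviation of 1, which matches the shape of the example even though the video does not state the sigma it used) approximates the second derivative of the log-likelihood with a central finite difference and confirms it is negative everywhere, so there is a single peak — at the sample mean.

```python
import math

def loglik(mu, xs, sigma=1.0):
    # Normal log-likelihood as a function of the mean, sigma assumed known.
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

xs = [2, 3, 4]  # the data from the population-mean example

def second_deriv(mu, h=1e-4):
    # Central finite difference approximating the curvature at mu.
    return (loglik(mu + h, xs) - 2 * loglik(mu, xs) + loglik(mu - h, xs)) / h ** 2

# Negative curvature at every point checked: the curve is concave,
# so its single peak (at the sample mean, mu = 3) is the MLE.
for mu in (2.0, 3.0, 4.0):
    print(mu, second_deriv(mu))
```

For this model the curvature is constant at -n/sigma² = -3, which is what the finite difference recovers; a zero in the matrix of second derivatives would mean the curve is flat in some direction, exactly the 'not concave' situation.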
The full troubleshooting and exact interpretation of this output is something I will leave for another video.

OK, let's get back to the problem. We have missing standard errors, which is a definite indication that we need to do something, and we have a warning that '3 failures and 2 successes completely determined'. The logical next question is: which two and which three? To find out which observations are predicted perfectly, we can use the model — even though it has not converged — to calculate the actual predictions. We can see that the predicted values for these three zeros are exactly 0, and for these two observations the predicted values are exactly 1; that is what the warning refers to. The predicted value for this observation is very close to zero, so if we allowed Stata to go on with the estimation forever, it would probably predict this one to be 0 as well, and this one to be 1. So we have essentially seven observations that are predicted perfectly and two that are not.

So what's going on here? It is not perfect prediction, because these two observations are not predicted perfectly — they can't be. And for that reason, Stata's check doesn't catch the problem.
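The 'completely determined' diagnosis can be reproduced by hand: compute the fitted probability for each observation from the current (diverging) coefficients and flag those that are numerically 0 or 1. A hypothetical sketch in Python — the data, the coefficients, and the 1e-8 cutoff are all made up for illustration, so the counts differ from the video's three-and-two:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data with a tied pair at x = 4 (one y = 0, one y = 1), and
# hypothetical coefficients from a fit that has been allowed to diverge
# for a while (steep slope, midpoint at x = 4). Illustrative values only.
obs = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1), (4, 0), (4, 1)]
b0, b1 = -120.0, 30.0

for x, y in obs:
    p = sigmoid(b0 + b1 * x)
    determined = p < 1e-8 or p > 1.0 - 1e-8
    print(x, y, p, "completely determined" if determined else "")
```

Here six observations come out numerically determined while the tied pair stays at exactly 0.5 — the same structure as the video's output, where the determined observations trigger the warning and the tied pair blocks perfect prediction.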
A logistic regression model with two variables can be understood as this kind of surface. We have x1 on this axis, x2 here, and y on the vertical axis, and we can see how the observations depend on x1 and x2. The circles up here are the observations whose actual value is 1; the circles down here are the observations whose actual value is 0. The position of each circle indicates the values of x1 and x2 for that observation, and the cross is the predicted value on the surface.

When we do maximum likelihood estimation, we adjust the surface — by adjusting the coefficients of x1 and x2 — so that the predicted values are as close to the observed values as possible. And again, we can make this surface indefinitely steep. But for this one observation, the predicted value will always be in the middle of the surface; it can't be predicted perfectly, because you can't predict 1 and 0 at the same time. The combination x1 = 11, x2 = 0 corresponds to two different values of y, and that's why you can't predict perfectly.
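The surface picture can be sketched numerically as well. Below is an illustrative two-predictor version in Python — the nine observations and the direction of divergence are constructed for this sketch, not taken from the video. Scaling up a separating set of coefficients makes the surface ever steeper: the slope coefficients grow, the intercept heads toward minus infinity, and the log-likelihood keeps rising, while the tied pair stays stuck at probability 0.5.

```python
import math

def loglik2(b0, b1, b2, data):
    # Two-predictor logistic log-likelihood:
    # P(y = 1) = sigmoid(b0 + b1*x1 + b2*x2).
    ll = 0.0
    for x1, x2, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Illustrative data: separated except for a tied pair at (x1, x2) = (11, 0)
# with opposite y values.
data = [(1, 2, 0), (2, 1, 0), (3, 3, 0), (3.5, 2, 0),
        (5, 4, 1), (6, 2, 1), (8, 5, 1),
        (11, 0, 1), (11, 0, 0)]

# Scale one separating direction by t: b0 -> -11t, b1 -> t, b2 -> 2.6t.
# The slopes blow up, the intercept goes to minus infinity, and the
# log-likelihood keeps creeping upward -- bounded by the tied pair's
# 2 * log(0.5), which is never attained.
for t in (1, 5, 25):
    print(t, loglik2(-11.0 * t, 1.0 * t, 2.6 * t, data))
```

The particular direction (-11, 1, 2.6) was chosen so that the tied pair sits exactly on the decision boundary while every other observation lands on its correct side; any such direction produces the same divergence.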
But the problem is the same: the coefficients of x1 and x2 grow large, the intercept goes toward minus infinity, and the log-likelihood keeps increasing without ever reaching a maximum. You can always make the surface a bit steeper so that it fits the data a bit better, and there is no limit on how large x1 and x2 can get, or how small the intercept can be.

So what do you do with this problem? There are four options for how you can actually use the analysis.

Option one: note that if you had used only Stata and not run the R analysis, you wouldn't even have noticed that the two packages give radically different results. So option one is to take whatever results you got, ignore the warning, and present the results in your paper. I can't tell you how common that is, but I'm pretty sure some people do it. Understanding the warning requires effort, and if you have some estimates you could report without going through that extra effort, some people will just report them. That is somewhat unethical: the warning is the software telling you that there is a problem you should pay attention to, and you are ignoring evidence of a problem and reporting the results as if there were none.
The second option is trial and error. This is something I did a lot before I started to think that maybe I should understand what the computer is doing. You just try things — drop cases, drop variables — until the warning disappears. You run the model over and over, without understanding why the error sometimes appears and sometimes doesn't, and then you pick one of the analyses that doesn't produce the error. This is a bit better, because at least you are trying to do something about the problem, but blind trial and error can leave you with a suboptimal model. For example, you might drop a control variable because the model doesn't converge with it included — and then, instead of getting the model to converge with the control, you end up with a model that doesn't control for an explanation you would really like to rule out. So this is not ideal either.

The third option is better. If you use, say, logistic regression a lot, the perfect prediction issue I demonstrated here is a well-known thing: any decent book on logistic regression analysis will cover at least the first case, and probably the second as well.
The Stata user manual explains both cases I demonstrated, what Stata does in those scenarios, and why. So you can learn each special case, and how to deal with it, separately. That works if you only ever use a small number of analyses. The problem is that the special cases differ between analyses — if you do negative binomial regression, for example, there are different special cases — so the number of special cases you would have to learn is quite large.

The fourth option is to understand the estimation principle: what the second derivatives and the likelihood mean, and how they depend on the parameter values the computer is currently trying, so that you can see for yourself what the problem is. This is of course more difficult, but in the long run it will make you a better researcher, because you can diagnose your models in a way that simply memorizing every special case does not allow.

So those are the four options that let you present some results. Option one is unethical — it is unethical to ignore warnings. Trial and error is bad. Learning the special cases is good, but not ideal.
In the ideal case, you understand what the software is doing.

There is also a fifth option, which is to give up on the model and not report it. For example, suppose you are just doing a robustness check. Your main analysis runs without warnings, but the robustness check — a different, less important model — produces a warning. Should you spend a day or a week troubleshooting it? Should you spend a month studying first and then a week troubleshooting? Or should you just leave that analysis out? Leaving a problematic analysis out is a better alternative than ignoring the warning and reporting the results anyway. I do this all the time: when the problem is not important, I don't want to spend my time dealing with it. So it's a perfectly viable option, and always something you should consider.