WEBVTT
Kind: captions
Language: en
00:00:00.060 --> 00:00:04.440
Normal regression analysis is
a very convenient technique,
00:00:04.440 --> 00:00:06.720
because it will always give you some results.
00:00:06.720 --> 00:00:09.810
Maximum likelihood estimation, on the other hand,
00:00:09.810 --> 00:00:11.250
can sometimes fail.
00:00:11.250 --> 00:00:14.400
And understanding why it fails
00:00:14.400 --> 00:00:16.590
will allow you to troubleshoot your models
00:00:16.590 --> 00:00:19.770
and make informed decisions on
how to get the model to work.
00:00:19.770 --> 00:00:24.570
In this video, I will show you an example
of a logistic regression analysis.
00:00:24.570 --> 00:00:26.670
The purpose of the video is not to demonstrate
00:00:26.670 --> 00:00:31.320
the logistic regression analysis feature
specifically, but more generally,
00:00:31.320 --> 00:00:34.920
what could cause a maximum likelihood
estimation process to fail.
00:00:34.920 --> 00:00:36.810
Let's take a look at this data.
00:00:36.810 --> 00:00:42.870
And we have eight observations,
with x1 and x2 as two independent variables,
00:00:42.870 --> 00:00:46.926
and a dependent variable y that
takes values of 1 and 0.
00:00:46.926 --> 00:00:49.860
And we'll run a logistic regression analysis
00:00:49.860 --> 00:00:52.230
explaining y using x1 and x2,
00:00:52.230 --> 00:00:53.400
and see what happens.
00:00:53.400 --> 00:00:55.440
So the analysis setup is here,
00:00:55.440 --> 00:00:59.040
we'll be using two different software packages
just to demonstrate software differences.
00:00:59.040 --> 00:01:03.000
So this is R and this is Stata
syntax for running this model.
00:01:03.528 --> 00:01:05.508
And the results are in.
00:01:06.270 --> 00:01:08.886
So, what happened? What do we notice first?
00:01:08.886 --> 00:01:13.110
The first thing we notice is that
there are lots of things missing
00:01:13.110 --> 00:01:14.760
from the Stata output.
00:01:14.760 --> 00:01:16.410
We don't have significance tests,
00:01:16.410 --> 00:01:18.660
we don't have standard errors,
00:01:18.660 --> 00:01:22.230
we don't have the overall model test.
00:01:22.230 --> 00:01:24.210
So we have results that are missing.
00:01:24.210 --> 00:01:30.450
Another thing is that the two programs
give us different results.
00:01:30.450 --> 00:01:33.510
So R says that the effect of x1 is 15,
00:01:33.510 --> 00:01:37.680
and Stata says that the effect of x1 is 33,
00:01:37.680 --> 00:01:43.200
and the effect of x2 is 30, while
according to R the effect of x2 is 6.
00:01:43.200 --> 00:01:48.060
So these are substantially
different results,
00:01:48.060 --> 00:01:52.440
because normally if we interpret
the coefficients using odds ratios,
00:01:52.440 --> 00:01:53.820
we exponentiate them.
00:01:53.820 --> 00:01:57.780
So take the difference between 15 and 30:
00:01:57.780 --> 00:01:59.400
on an exponential scale,
00:01:59.400 --> 00:02:00.780
that's a huge difference.
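To see just how large the gap becomes, here is a quick sketch (in Python, as a stand-in for the R and Stata code used in the video) that exponentiates two coefficients of roughly the sizes reported:

```python
import math

# The video reports x1 coefficients of roughly 15 (R) and 33 (Stata).
# Interpreted as odds ratios, coefficients are exponentiated, so even a
# modest-looking gap on the coefficient scale is astronomical as odds.
or_r = math.exp(15)      # on the order of 3 million
or_stata = math.exp(33)  # on the order of 2e14
print(or_r)
print(or_stata)
print(or_stata / or_r)   # the two "answers" differ by a factor of tens of millions
```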
00:02:00.780 --> 00:02:02.190
So what do we do?
00:02:02.190 --> 00:02:06.720
Do we just pick one of
these two sets of estimates,
00:02:06.720 --> 00:02:10.740
and report that set as if there was no problem?
00:02:11.268 --> 00:02:14.688
Well, we need to understand what's going on.
00:02:14.688 --> 00:02:18.246
Also, we know that the log-likelihood is zero here,
00:02:18.246 --> 00:02:21.960
and the log-likelihood is 0 here as well,
00:02:21.960 --> 00:02:23.520
which means that the likelihood is 1.
00:02:23.520 --> 00:02:26.640
That's a very unusual scenario.
00:02:26.640 --> 00:02:30.840
So it means that getting this
kind of data from this model
00:02:30.840 --> 00:02:35.100
would have a probability of 100%.
00:02:35.100 --> 00:02:37.110
So getting any other observations,
00:02:37.110 --> 00:02:39.450
any other values for the Y variable
00:02:39.450 --> 00:02:41.550
would be impossible from this model.
00:02:41.550 --> 00:02:44.670
And you don't get that kind of perfect model in practice.
00:02:44.670 --> 00:02:45.450
So what's going on?
00:02:46.698 --> 00:02:49.590
Then we also have this warning,
00:02:49.590 --> 00:02:53.130
that the successes and failures
are completely determined.
00:02:53.130 --> 00:02:57.960
And R gives a bit less user-friendly warning,
00:02:57.960 --> 00:03:01.890
just 'fitted probabilities
numerically 0 or 1 occurred'.
00:03:01.890 --> 00:03:05.550
The important thing about warnings is that,
00:03:05.550 --> 00:03:10.170
when you get a warning, that is the software
telling you that something is going on
00:03:10.170 --> 00:03:11.670
that you should pay attention to.
00:03:11.670 --> 00:03:15.390
So warnings are not minor inconveniences
that you can just ignore,
00:03:15.390 --> 00:03:18.150
and then report whatever results you got.
00:03:18.150 --> 00:03:23.370
A warning is something that you need
to spend some time understanding:
00:03:23.370 --> 00:03:24.960
what is the warning telling you,
00:03:24.960 --> 00:03:27.480
why is the warning occurring,
00:03:27.480 --> 00:03:30.000
and what you can do about the warning.
00:03:30.000 --> 00:03:34.170
You should not report any
analysis that produced a warning,
00:03:34.170 --> 00:03:36.180
unless you know what the warning means,
00:03:36.180 --> 00:03:39.030
and have made an explicit decision
not to care about the warning.
00:03:39.030 --> 00:03:41.400
Generally, we want these warnings to go away.
00:03:41.784 --> 00:03:43.884
So what's the cause?
00:03:43.884 --> 00:03:46.878
Let's take a look at the
data set a bit more closely.
00:03:46.878 --> 00:03:50.628
And in this case, the problem is the variable x1.
00:03:50.628 --> 00:03:53.058
So we can just take x2 out.
00:03:53.058 --> 00:03:56.730
What do we see here with x1 and why?
00:03:56.730 --> 00:04:02.760
We see that when x1 receives
values greater than 4,
00:04:02.760 --> 00:04:04.950
y is always 1,
00:04:04.950 --> 00:04:07.740
and when x1 receives values less than 4,
00:04:07.740 --> 00:04:10.320
y is always 0.
00:04:10.320 --> 00:04:15.210
So the x value here perfectly
predicts the value of y.
00:04:15.210 --> 00:04:16.470
So that's the thing.
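This kind of complete separation is easy to check directly. A minimal sketch (in Python, with illustrative values in the same pattern as the video's data, since the exact numbers are not reproduced here):

```python
# Hypothetical data in the video's pattern: y = 1 exactly when x1 > 4.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 1)]

# Complete separation on x1: every y = 1 case has a larger x1
# than every y = 0 case, so x1 alone perfectly predicts y.
max_x1_when_0 = max(x for x, y in data if y == 0)
min_x1_when_1 = min(x for x, y in data if y == 1)
print(min_x1_when_1 > max_x1_when_0)  # True: complete separation
```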
00:04:16.470 --> 00:04:21.090
And why would that be problematic
for maximum likelihood estimation?
00:04:21.090 --> 00:04:24.180
Let's take a look at how maximum
likelihood estimation works.
00:04:24.564 --> 00:04:27.474
So this is the R analysis.
00:04:27.780 --> 00:04:30.300
And maximum likelihood estimation always starts
00:04:30.300 --> 00:04:32.250
with some kind of initial guess.
00:04:32.250 --> 00:04:35.040
So the computer is fitting an s-curve,
00:04:35.040 --> 00:04:37.140
because this is logistic regression analysis.
00:04:37.140 --> 00:04:40.800
And the first guess is that the s-curve is quite,
00:04:43.020 --> 00:04:45.870
it is not very steep, but it goes up.
00:04:45.870 --> 00:04:49.230
So it goes up for x1, instead of going down.
00:04:49.230 --> 00:04:54.270
And the estimation then proceeds by
00:04:54.270 --> 00:04:57.930
trying different values for the coefficient of x1,
00:04:57.930 --> 00:05:00.990
so that the curve would fit the data better.
00:05:00.990 --> 00:05:05.370
And in this case, we originally
had the curve fitting here.
00:05:05.370 --> 00:05:10.380
So it predicts this observation
to have about 60% probability.
00:05:10.380 --> 00:05:13.776
And as we make the curve steeper and steeper,
00:05:13.776 --> 00:05:17.280
we can see that this observation is
explained or predicted better and better.
00:05:18.432 --> 00:05:24.300
The problem here is that there is no
limit on how steep the curve can be.
00:05:24.300 --> 00:05:26.970
So the steeper you make the curve,
00:05:26.970 --> 00:05:29.010
the better it predicts these observations.
00:05:29.010 --> 00:05:33.090
And you can make it indefinitely steep.
00:05:33.090 --> 00:05:34.860
So there is no limit on
00:05:34.860 --> 00:05:38.280
how much you can increase the x1 coefficient here,
00:05:38.280 --> 00:05:40.500
and it will always make the curve a bit steeper.
00:05:40.500 --> 00:05:42.990
So we can see that it's not straight up yet,
00:05:42.990 --> 00:05:46.230
we could still make it a few pixels steeper.
00:05:46.230 --> 00:05:51.030
So the coefficient of x1 just goes to infinity,
00:05:51.030 --> 00:05:52.836
if we allow the process to continue.
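The runaway behavior can be sketched numerically. This is a Python illustration with made-up separated data (not the video's actual data set): the steeper we make the curve, the closer the log-likelihood creeps toward 0, and no coefficient value is ever "best".

```python
import math

# Hypothetical data with complete separation: y = 1 exactly when x > 4.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def log_likelihood(beta, intercept):
    """Logistic-regression log-likelihood for a single predictor."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(intercept + beta * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Steeper and steeper s-curves centered between the two groups (at 4.5):
# the log-likelihood keeps climbing toward 0 and never reaches a maximum.
for beta in (1, 5, 10, 20):
    print(beta, log_likelihood(beta, -4.5 * beta))
```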
00:05:52.980 --> 00:05:55.560
What will happen to the likelihood,
00:05:55.560 --> 00:05:58.440
or the log-likelihood in
this case? It goes toward zero,
00:05:58.440 --> 00:06:04.080
and it would reach zero only when every
observation is predicted perfectly.
00:06:05.160 --> 00:06:09.720
So we don't have a maximum for this likelihood,
00:06:09.720 --> 00:06:12.420
because the log-likelihood can never be exactly 0.
00:06:12.420 --> 00:06:15.300
It just gets very, very close to zero,
00:06:15.300 --> 00:06:17.790
but we can always make the
curve a little bit steeper
00:06:17.790 --> 00:06:20.310
to make the log-likelihood closer to 0.
00:06:21.222 --> 00:06:26.610
So the maximum of this
log-likelihood here does not exist.
00:06:26.610 --> 00:06:31.020
The consequence is that the
maximum likelihood estimates,
00:06:31.020 --> 00:06:32.520
for this model, don't exist either.
00:06:32.520 --> 00:06:36.150
So the maximum likelihood
estimate is indeterminate,
00:06:36.150 --> 00:06:41.640
because making x1 larger and larger
will always fit the data a bit better.
00:06:41.640 --> 00:06:43.680
The increase in fit is marginal,
00:06:43.680 --> 00:06:50.400
but we can't say that an x1 coefficient
of 50 would be the correct value,
00:06:50.400 --> 00:06:54.540
because a coefficient of
51 would fit the data better.
00:06:55.284 --> 00:06:58.020
So the estimates don't exist.
00:06:58.020 --> 00:06:59.424
So what do you do?
00:06:59.424 --> 00:07:03.090
This is a scenario that is so well understood that
00:07:03.090 --> 00:07:05.430
statistical software packages have checks for this.
00:07:05.430 --> 00:07:07.860
So this is from the Stata user manual.
00:07:07.860 --> 00:07:10.080
And if we run the logistic model,
00:07:10.080 --> 00:07:14.220
without the 'asis' option that I
used before to force it to run,
00:07:14.220 --> 00:07:17.310
Stata says that no, it can't run it,
00:07:17.310 --> 00:07:20.160
because x1 predicts the data perfectly,
00:07:20.160 --> 00:07:21.810
the estimates don't exist.
00:07:21.810 --> 00:07:25.410
And they have an explanation about it.
00:07:25.410 --> 00:07:28.140
So there are a couple of pages of
explanation in the user manual:
00:07:28.140 --> 00:07:29.820
what causes this problem,
00:07:29.820 --> 00:07:31.620
how Stata deals with it,
00:07:31.620 --> 00:07:32.940
and what you can do about it.
00:07:32.940 --> 00:07:40.320
The problem is that not all possible scenarios
are programmed into your statistical software.
00:07:40.320 --> 00:07:41.970
So there are scenarios,
00:07:41.970 --> 00:07:44.700
where maximum likelihood estimation can fail,
00:07:44.700 --> 00:07:46.800
and there is no specific check,
00:07:46.800 --> 00:07:51.150
and then it will just fail without
any warning indicating
00:07:51.150 --> 00:07:52.830
why it failed.
00:07:54.000 --> 00:07:56.520
Perfect prediction is well understood,
00:07:56.520 --> 00:07:58.200
so you can rely on the software catching it.
00:07:58.200 --> 00:08:01.590
But now let's take a look at another problem,
00:08:01.590 --> 00:08:04.380
where the software doesn't
catch it before estimation.
00:08:05.244 --> 00:08:08.190
So this is another variant of the same analysis.
00:08:08.190 --> 00:08:09.954
We add one more observation.
00:08:09.954 --> 00:08:15.360
So we add a ninth observation with
values of x1 at 11 and x2 at 0,
00:08:15.360 --> 00:08:18.330
which are the same values that we
had for the eighth observation,
00:08:18.330 --> 00:08:23.190
but the y variable receives the value of 0.
00:08:23.190 --> 00:08:26.100
So now we cannot predict perfectly,
00:08:26.100 --> 00:08:30.930
because the prediction calculated
from x1 and x2 is the same for both,
00:08:30.930 --> 00:08:34.050
and if we predict the 1 perfectly,
00:08:34.050 --> 00:08:36.360
then we don't predict the 0, and vice versa.
00:08:36.360 --> 00:08:38.670
So we can't predict perfectly using this data.
00:08:39.534 --> 00:08:44.520
So what will happen is that the perfect
prediction check will not trigger.
00:08:44.520 --> 00:08:47.145
Stata will try to estimate it,
00:08:47.145 --> 00:08:49.110
R will try to estimate it as well.
00:08:49.110 --> 00:08:52.350
And we again get a warning.
00:08:52.350 --> 00:08:55.560
So we have a 'convergence not achieved' warning.
00:08:55.560 --> 00:08:58.020
Stata tried to estimate it,
00:08:58.020 --> 00:09:00.060
couldn't find a maximum of the likelihood.
00:09:00.060 --> 00:09:03.330
It went through 1600 iterations,
00:09:03.330 --> 00:09:04.530
which is the default limit,
00:09:04.530 --> 00:09:06.330
and then it just gave up.
00:09:06.330 --> 00:09:08.670
You can, of course, increase the limits,
00:09:08.670 --> 00:09:12.300
have Stata try 10,000
different sets of estimates,
00:09:12.300 --> 00:09:15.840
it still cannot find the maximum
00:09:15.840 --> 00:09:17.880
because for this model it doesn't exist either.
00:09:19.296 --> 00:09:23.376
So Stata tries, gives up.
00:09:23.700 --> 00:09:24.900
So what do we do about it?
00:09:24.900 --> 00:09:28.080
We see that we don't have standard
errors for one of the parameters,
00:09:28.080 --> 00:09:31.494
that's an indication that we have a
problem that we have to deal with.
00:09:31.494 --> 00:09:35.940
At least because we want to report the
standard error, or if not that,
00:09:35.940 --> 00:09:37.470
then at least the p-value,
00:09:37.470 --> 00:09:38.550
and we have nothing to report.
00:09:39.006 --> 00:09:42.546
So the missing standard errors indicate
that there's some kind of problem.
00:09:42.546 --> 00:09:46.860
We can also see that the iteration log
here says that the likelihood is not concave.
00:09:46.860 --> 00:09:50.280
And that gives us some information
that is useful for troubleshooting.
00:09:50.280 --> 00:09:55.170
I will not go through the
troubleshooting procedure in this video,
00:09:55.170 --> 00:09:57.600
but just to demonstrate what's available to you,
00:09:57.600 --> 00:10:00.330
what does 'not concave' mean?
00:10:00.714 --> 00:10:03.234
When we estimate maximum likelihood,
00:10:03.234 --> 00:10:06.510
then we do trial and error.
00:10:06.510 --> 00:10:07.770
This is from the video where
00:10:07.770 --> 00:10:11.610
I demonstrated the maximum likelihood
estimation of the population mean,
00:10:11.610 --> 00:10:12.960
using this data.
00:10:12.960 --> 00:10:19.080
So we can see that when the values are 2, 3 and 4,
00:10:19.080 --> 00:10:23.190
then a good estimate for the population mean is 3,
00:10:23.190 --> 00:10:25.230
and that's actually the
maximum likelihood estimate.
00:10:25.230 --> 00:10:28.350
If we try any other values in the likelihood function,
00:10:28.350 --> 00:10:29.760
we get smaller likelihoods.
00:10:29.760 --> 00:10:34.680
We have the actual likelihood here and
then we have the log-likelihood here.
00:10:34.680 --> 00:10:38.340
What's important about the log-likelihood is that,
00:10:38.340 --> 00:10:41.250
it's a curve that bends down,
00:10:41.250 --> 00:10:42.510
so it kind of curves down.
00:10:42.510 --> 00:10:45.960
And we say that this is a concave curve,
00:10:45.960 --> 00:10:47.580
that is, a curve that bends down.
00:10:47.580 --> 00:10:53.310
And a concave curve has a second derivative,
00:10:53.310 --> 00:10:57.960
which quantifies the curvature,
and it is always negative here.
00:10:57.960 --> 00:11:00.060
So if you have a curve that is concave,
00:11:00.060 --> 00:11:01.800
then the second derivative is negative.
00:11:01.800 --> 00:11:04.680
If the second derivative is
negative and this is concave,
00:11:04.680 --> 00:11:07.200
then we know that there's a peak somewhere
00:11:07.200 --> 00:11:09.570
and that the peak is our
maximum likelihood estimate.
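A minimal numerical sketch of this in Python (the fixed unit variance is my simplifying assumption, not from the video): for the data 2, 3, 4 the log-likelihood of the mean is a concave parabola, its second derivative is negative, and its peak sits at the sample mean, 3.

```python
import math

# The mean-estimation example: data 2, 3, 4, normal model with
# variance fixed at 1 (an assumption made only for this sketch).
data = [2, 3, 4]

def log_likelihood(mu):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in data)

def second_derivative(f, x, h=1e-4):
    """Numerical second derivative via central differences."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

print(second_derivative(log_likelihood, 3.0))  # negative: the curve is concave
print(max(range(0, 7), key=log_likelihood))    # 3, the sample mean
```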
00:11:09.570 --> 00:11:11.640
What will happen is that,
00:11:11.640 --> 00:11:15.270
if this curve, for example, is flat here,
00:11:15.270 --> 00:11:18.810
then it's not concave because it's
not bending down all the time.
00:11:18.810 --> 00:11:22.170
And we wouldn't have a maximum of likelihood,
00:11:22.170 --> 00:11:25.860
because we have multiple
different values of the parameter,
00:11:25.860 --> 00:11:27.210
the estimate of the mean,
00:11:27.210 --> 00:11:30.000
that are equally good from the
maximum likelihood perspective.
00:11:30.780 --> 00:11:34.020
Also, we could have a curve
that goes down first
00:11:34.020 --> 00:11:35.160
and then curves up,
00:11:35.160 --> 00:11:36.990
so that would not be concave either.
00:11:36.990 --> 00:11:39.690
So that's what 'not concave' means:
00:11:39.690 --> 00:11:42.660
the maximum of the likelihood is not something
that is easy for us to estimate.
00:11:42.660 --> 00:11:46.380
We can check what's actually
the problem by looking at
00:11:46.380 --> 00:11:49.170
the matrix of the second derivatives here,
00:11:49.170 --> 00:11:53.070
which tells us how strongly the curve bends down,
00:11:53.070 --> 00:11:55.830
and we can see that there
are a couple of 0's there.
00:11:55.830 --> 00:11:57.600
So we have these zeros here,
00:11:57.600 --> 00:12:02.568
and that indicates that we have
a problem with these parameters.
00:12:03.504 --> 00:12:06.570
The troubleshooting and an
exact interpretation of this is
00:12:06.570 --> 00:12:08.760
something that I will leave for another video.
00:12:09.912 --> 00:12:13.590
Ok, then let's take a look at the problem.
00:12:13.590 --> 00:12:17.220
We have missing standard errors here,
00:12:17.220 --> 00:12:20.670
which is an indication that we
definitely need to do something.
00:12:20.670 --> 00:12:27.390
And we have a warning that '3 failures
and 2 successes completely determined'.
00:12:28.650 --> 00:12:31.770
The logical thing to do next is to ask,
00:12:31.770 --> 00:12:33.720
which two and which three?
00:12:33.720 --> 00:12:39.480
And to see which observations
are predicted perfectly,
00:12:39.480 --> 00:12:40.800
we can use the model,
00:12:40.800 --> 00:12:42.150
even if it's not converged,
00:12:42.150 --> 00:12:44.760
we can use the model to
calculate the actual predictions.
00:12:44.760 --> 00:12:49.980
So we can see here that predicted values
for these three zeros are exactly 0's,
00:12:49.980 --> 00:12:53.304
for these two, the predicted
values are exactly 1's,
00:12:53.304 --> 00:12:54.960
and that's what the warning refers to.
00:12:54.960 --> 00:12:58.770
And the predicted value for
this one is very close to zero,
00:12:58.770 --> 00:13:03.180
so if we would allow Stata to go
on forever, doing the estimation,
00:13:03.180 --> 00:13:06.480
it will probably estimate or
predict this to be 0 as well,
00:13:06.480 --> 00:13:07.320
and this to be 1.
00:13:07.320 --> 00:13:11.250
So we have basically seven observations
that are perfectly predicted
00:13:11.250 --> 00:13:12.480
and two that are not.
00:13:12.480 --> 00:13:14.640
So what's going on here?
00:13:14.640 --> 00:13:16.410
It's not a perfect prediction,
00:13:16.410 --> 00:13:18.540
because these two are not predicted
perfectly, and they can't be.
00:13:18.540 --> 00:13:22.490
And for that reason, Stata
doesn't catch the problem.
00:13:23.378 --> 00:13:27.140
The logistic regression model
with more than one variable,
00:13:27.140 --> 00:13:30.980
here two variables, can be understood
as this kind of surface.
00:13:30.980 --> 00:13:35.300
So we have x1 here, we have x2 here and then
00:13:35.300 --> 00:13:36.800
we have the y on this axis.
00:13:36.800 --> 00:13:37.604
And we can see,
00:13:37.604 --> 00:13:41.060
how the observations depend on x1 and x2.
00:13:41.060 --> 00:13:46.940
The circles up here are the
observations with actual values of 1.
00:13:46.940 --> 00:13:50.300
The circles down here are the
observations with actual values of 0.
00:13:50.300 --> 00:13:56.330
And the position of the circle indicates
the values of x1 and x2 variables
00:13:56.330 --> 00:13:57.566
for that observation.
00:13:57.566 --> 00:14:00.800
Then the cross here is the
predicted value on the surface.
00:14:00.800 --> 00:14:03.890
When we do maximum likelihood estimation,
00:14:03.890 --> 00:14:06.050
we want to adjust the surface
00:14:06.050 --> 00:14:09.410
by adjusting the coefficients of x1 and x2,
00:14:09.410 --> 00:14:14.720
so that the predicted values are as
close to the observed values as possible.
00:14:14.720 --> 00:14:17.000
And what happens, again, is that
00:14:17.000 --> 00:14:20.180
we can make this surface indefinitely steep.
00:14:20.180 --> 00:14:22.610
So we can make it as steep as possible.
00:14:22.610 --> 00:14:25.760
But for this one observation,
00:14:25.760 --> 00:14:26.900
the predicted value,
00:14:26.900 --> 00:14:28.850
will always be in the middle of the surface,
00:14:28.850 --> 00:14:31.490
and it can't be predicted perfectly,
00:14:31.490 --> 00:14:34.310
because you can't predict
1 and 0 at the same time.
00:14:34.310 --> 00:14:38.780
So this combination, x1 = 11 and x2 = 0,
00:14:38.780 --> 00:14:41.390
corresponds to two different values of y,
00:14:41.390 --> 00:14:43.310
so that's why you can't predict perfectly.
00:14:43.310 --> 00:14:45.080
But the problem is the same,
00:14:45.080 --> 00:14:50.060
the coefficients of x1 and x2 grow large,
00:14:50.060 --> 00:14:54.410
the intercept goes toward minus infinity,
00:14:54.410 --> 00:14:58.670
and the log-likelihood keeps increasing.
00:14:58.670 --> 00:15:01.880
So you can always make the surface a bit steeper
00:15:01.880 --> 00:15:04.130
and then it'll fit the data a bit better,
00:15:04.130 --> 00:15:07.250
but there's no limit on how large x1 and x2,
00:15:07.250 --> 00:15:09.500
and how small the intercept can be.
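The tied pair of observations can be looked at in isolation (a Python sketch; the 0.01 probability grid is just for illustration): the model must give both cases the same probability p, and log(p) + log(1 - p) peaks at p = 0.5, which is exactly why neither one can ever be predicted perfectly.

```python
import math

# Two observations share x1 = 11, x2 = 0 but have opposite y values,
# so the model assigns them a single common probability p.
def pair_log_likelihood(p):
    # joint log-likelihood of one success and one failure at probability p
    return math.log(p) + math.log(1 - p)

# Scan a grid of probabilities: the best we can ever do is p = 0.5.
best_p = max((i / 100 for i in range(1, 100)), key=pair_log_likelihood)
print(best_p)                       # 0.5
print(pair_log_likelihood(best_p))  # 2 * log(0.5), about -1.386
```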
00:15:11.732 --> 00:15:13.340
So what do you do with this problem?
00:15:13.340 --> 00:15:16.760
There are four options,
00:15:16.760 --> 00:15:19.910
for how you can proceed with the analysis.
00:15:19.910 --> 00:15:22.250
And option one,
00:15:22.250 --> 00:15:26.300
if you had just used Stata and
not done the R analysis,
00:15:26.300 --> 00:15:28.880
you wouldn't have noticed that the two
programs give radically different results.
00:15:28.880 --> 00:15:32.330
Choose the results that you got,
00:15:32.330 --> 00:15:35.180
ignore the warning and present
the results in your paper.
00:15:35.700 --> 00:15:38.460
I can't tell you how common that is
00:15:38.460 --> 00:15:40.200
but I'm pretty sure some people will do that.
00:15:40.200 --> 00:15:43.200
So understanding the warning requires effort
00:15:43.200 --> 00:15:45.870
and if you have some estimates
that you could report,
00:15:45.870 --> 00:15:47.460
rather than go through the extra effort,
00:15:47.460 --> 00:15:49.410
some people probably will just do that.
00:15:49.410 --> 00:15:50.520
That's a bit unethical,
00:15:50.520 --> 00:15:52.170
because the software with the warning,
00:15:52.170 --> 00:15:55.890
tells you that there's a problem
that you should pay attention to,
00:15:55.890 --> 00:15:58.410
and then you're ignoring evidence of a problem
00:15:58.410 --> 00:16:01.500
and reporting the results
as if there was no problem.
00:16:01.500 --> 00:16:04.380
The second alternative is trial and error.
00:16:04.380 --> 00:16:06.330
This is something that I have done a lot
00:16:06.330 --> 00:16:09.810
before I started to think that
maybe I should understand,
00:16:09.810 --> 00:16:10.860
what the computer is doing.
00:16:10.860 --> 00:16:15.180
So you just try, you drop
cases, you drop variables,
00:16:15.180 --> 00:16:17.670
until you get the warning to disappear.
00:16:17.670 --> 00:16:18.960
So you run and run,
00:16:18.960 --> 00:16:21.060
doing trial and error without understanding
00:16:21.060 --> 00:16:24.330
why the error sometimes
appears and sometimes doesn't,
00:16:24.330 --> 00:16:28.860
and then you pick one of the analyses
that doesn't produce the error.
00:16:28.860 --> 00:16:32.760
This is a bit better because at least you're
trying to do something about the problem,
00:16:32.760 --> 00:16:35.760
but doing this trial and error blindly
00:16:35.760 --> 00:16:38.580
could leave you with a suboptimal model.
00:16:38.580 --> 00:16:40.740
For example, you're dropping a control variable,
00:16:40.740 --> 00:16:44.460
because the model doesn't converge
with that control variable included,
00:16:44.460 --> 00:16:49.110
and then instead of getting the
model to converge with the control,
00:16:49.110 --> 00:16:52.170
you are fitting a model that doesn't
control for some explanation
00:16:52.170 --> 00:16:53.430
that you would really like to control for.
00:16:53.838 --> 00:16:55.758
So this is not an ideal case.
00:16:56.340 --> 00:16:58.410
The third alternative is a bit better.
00:16:58.410 --> 00:17:01.770
So if you use, for example,
logistic regression a lot,
00:17:01.770 --> 00:17:05.340
this perfect prediction issue that I
demonstrated here is a well-known thing.
00:17:05.340 --> 00:17:10.170
So any decent book on logistic
regression analysis will tell you about
00:17:10.170 --> 00:17:13.650
at least the first case, but
probably also the second case.
00:17:13.650 --> 00:17:17.640
The Stata user manual will explain to
you both cases that I demonstrated,
00:17:17.640 --> 00:17:21.150
and what Stata does in those scenarios and why.
00:17:21.150 --> 00:17:24.720
So you can go and learn each special case,
00:17:24.720 --> 00:17:26.100
and how to deal with them, separately.
00:17:26.100 --> 00:17:32.340
And that works if you only use a
small number of analyses in your life.
00:17:32.340 --> 00:17:35.640
The problem is that the special
cases differ between analyses:
00:17:35.640 --> 00:17:38.400
for example, if you do negative
binomial regression analysis,
00:17:38.400 --> 00:17:39.840
there are different special cases.
00:17:39.840 --> 00:17:43.830
So the number of special cases that
you have to learn is quite large.
00:17:44.622 --> 00:17:49.080
Then the fourth option is to
understand the estimation principle.
00:17:49.080 --> 00:17:53.920
So take the second
derivatives and the likelihood:
00:17:53.920 --> 00:17:54.640
what do they mean,
00:17:54.640 --> 00:17:58.270
how do they depend on the parameter values
that the computer is currently trying,
00:17:58.270 --> 00:18:01.480
and then you can see what the problem is.
00:18:01.480 --> 00:18:03.040
So this is, of course, more difficult,
00:18:03.040 --> 00:18:08.140
but in the long run, it'll
make you a better researcher,
00:18:08.140 --> 00:18:11.950
because you can do diagnostics
for your model, in a way,
00:18:11.950 --> 00:18:16.480
that just memorizing every
special case doesn't allow you to do.
00:18:16.480 --> 00:18:21.640
So these are the four options that
allow you to present some results.
00:18:21.640 --> 00:18:23.050
The first one is unethical,
00:18:23.050 --> 00:18:25.360
it is unethical to ignore warnings.
00:18:25.360 --> 00:18:27.100
Trial and error is bad,
00:18:27.100 --> 00:18:30.610
learning the special cases is good, but not ideal.
00:18:30.610 --> 00:18:32.200
And in the ideal case, you understand
00:18:32.200 --> 00:18:32.980
what the software is doing.
00:18:32.980 --> 00:18:35.110
There's also the fifth option,
00:18:35.110 --> 00:18:37.240
which is to ignore the model,
00:18:37.240 --> 00:18:38.980
so give up,
00:18:38.980 --> 00:18:40.960
don't do the model.
00:18:40.960 --> 00:18:44.650
For example, if you are just
doing a robustness check,
00:18:44.650 --> 00:18:47.320
you have your main analysis results, no warnings,
00:18:47.320 --> 00:18:49.600
and then in the robustness check,
00:18:49.600 --> 00:18:52.990
where you analyze a different
model that's not that important,
00:18:52.990 --> 00:18:54.730
you get a warning.
00:18:54.730 --> 00:19:00.910
So should I spend a day or a
week troubleshooting it,
00:19:00.910 --> 00:19:04.570
should I spend a month studying first
and then a week troubleshooting it,
00:19:04.570 --> 00:19:07.630
or should I just leave the analysis out?
00:19:07.630 --> 00:19:10.330
Leaving a problematic analysis out
00:19:10.330 --> 00:19:15.460
is a better alternative
than ignoring the warning,
00:19:15.460 --> 00:19:17.620
ignoring the problem and
reporting the results anyway.
00:19:17.620 --> 00:19:19.510
I do this all the time.
00:19:19.510 --> 00:19:21.100
When the problem is not important,
00:19:21.100 --> 00:19:23.950
I don't want to spend my
time dealing with problems.
00:19:23.950 --> 00:19:25.570
So it's a perfectly viable option,
00:19:25.570 --> 00:19:26.920
and always something that you should consider.