outsider
A linear combination of the regressors almost perfectly predicts the response variable. This condition causes the maximum likelihood iteration either to fail or to produce extremely large estimates. Complete separation is the special case in which the prediction is perfect. You may need to identify which regressor is responsible and adjust the model accordingly.
hongtianli
Existence of Maximum Likelihood Estimates
The likelihood equation for a logistic regression model does not always have a finite solution. Sometimes there is a nonunique maximum on the boundary of the parameter space, at infinity. The existence, finiteness, and uniqueness of maximum likelihood estimates for the logistic regression model depend on the patterns of data points in the observation space (Albert and Anderson 1984; Santner and Duffy 1986). The existence checks are not performed for conditional logistic regression.
Consider a binary response model. Let Yj be the response of the jth subject and let xj be the vector of explanatory variables (including the constant 1 associated with the intercept). There are three mutually exclusive and exhaustive types of data configurations: complete separation, quasi-complete separation, and overlap.
Complete Separation
There is a complete separation of data points if there exists a vector b that correctly allocates all observations to their response groups; that is (with the two response levels labeled 1 and 2),
    b'xj > 0   if Yj = 1
    b'xj < 0   if Yj = 2
This configuration gives nonunique infinite estimates. If the iterative process of maximizing the likelihood function is allowed to continue, the log likelihood diminishes to zero, and the dispersion matrix becomes unbounded.
Quasi-Complete Separation
The data are not completely separable, but there is a vector b such that
    b'xj >= 0   if Yj = 1
    b'xj <= 0   if Yj = 2
and equality holds for at least one subject in each response group. This configuration also yields nonunique infinite estimates. If the iterative process of maximizing the likelihood function is allowed to continue, the dispersion matrix becomes unbounded and the log likelihood diminishes to a nonzero constant.
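The two limits described above can be checked numerically. The following is a minimal sketch in Python rather than SAS, and it is not what PROC LOGISTIC does internally. It assumes the response is coded 0/1 as in the examples in this thread, that the model describes P(y = 1), and that the coefficient vector is moved along the direction b = (0, t), that is, zero intercept and slope t. The function name log_likelihood and the variable names are hypothetical.

```python
import math

def log_likelihood(b0, b1, data):
    """Binary-logit log likelihood for (y, x) pairs with y in {0, 1}."""
    total = 0.0
    for y, x in data:
        eta = b0 + b1 * x
        # log P(y=1) = -log(1 + exp(-eta)); log P(y=0) = -log(1 + exp(eta))
        z = -eta if y == 1 else eta
        # Numerically stable evaluation of log(1 + exp(z))
        total -= max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total

# The two data sets from this thread, as (y, x) pairs.
complete = [(1, 2), (0, -1)] * 5 + [(1, 2), (0, -2)] + [(1, 2), (0, -1)] * 2
quasi = ([(1, 0), (0, -1)] + [(1, 2), (0, -1)] * 4
         + [(1, 2), (0, -2)] + [(1, 2), (0, 0)] + [(1, 2), (0, -1)])

# Scale the slope upward and watch the log likelihood.
for t in (1, 5, 25, 50):
    print(t, log_likelihood(0.0, t, complete), log_likelihood(0.0, t, quasi))
```

As t grows, the value for the first data set climbs toward 0, while the value for the second levels off near 2*log(0.5), about -1.386, because the two observations with x = 0 (one from each group) lie exactly on the boundary and each contributes log(0.5) no matter how large t becomes.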
Overlap
If neither complete nor quasi-complete separation exists in the sample points, there is an overlap of sample points. In this configuration, the maximum likelihood estimates exist and are unique.
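For a model with an intercept and a single regressor, the three configurations reduce to comparing the range of x in each response group. The sketch below is a Python illustration under that one-regressor assumption only; it is not the check that PROC LOGISTIC performs, and it does not generalize to several regressors. The function name classify_1d is hypothetical, and the response is assumed to be coded 0/1.

```python
def classify_1d(data):
    """Classify (y, x) pairs, y in {0, 1}, for an intercept plus one regressor.

    Assumes x is not constant across the whole sample.
    """
    x0 = [x for y, x in data if y == 0]
    x1 = [x for y, x in data if y == 1]
    if not x0 or not x1:
        raise ValueError("both response groups must be present")
    # A strict gap between the groups: some threshold splits them perfectly.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete separation"
    # The groups meet only at a shared boundary value of x.
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasi-complete separation"
    return "overlap"

# Small hypothetical samples, one per configuration.
print(classify_1d([(1, 2), (0, -1), (1, 2), (0, -2)]))  # strict gap
print(classify_1d([(1, 0), (0, 0), (1, 2), (0, -1)]))   # groups touch at x = 0
print(classify_1d([(1, 0), (0, 1), (1, 2), (0, -1)]))   # ranges interleave
```

Printing the minimum and maximum of x within each level of y gives the same information by eye when there is only one regressor.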
Examples:
1. If you submit:
data a;
input y x;
cards;
1 2
0 -1
1 2
0 -1
1 2
0 -1
1 2
0 -1
1 2
0 -1
1 2
0 -2
1 2
0 -1
1 2
0 -1
;
run;
proc logistic;
model y=x;
run;
then the log shows:
"WARNING: There is a complete separation of data points. The maximum likelihood estimate does not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable."
2. And if you submit:
data a;
input y x;
cards;
1 0
0 -1
1 2
0 -1
1 2
0 -1
1 2
0 -1
1 2
0 -1
1 2
0 -2
1 2
0 0
1 2
0 -1
;
run;
proc print; run;
proc logistic;
model y=x;
run;
then the log shows:
"WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable."
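In summary: when the regression coefficients are estimated iteratively by maximizing the likelihood function, no estimates can be obtained under complete separation or quasi-complete separation; estimates can be obtained only when the data overlap. Linear separability means that a single line clearly divides the two groups with no crossing, so the groups are already separated without any estimation.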