Multinomial logit model

Purpose

Multinomial logit model is used to estimate probability of each categorical outcome from multiple choices. Basic idea is same to binary logit model; set a hidden factor z for each probability and build regression equations on them. Its likelihood is given by a function involving probabilities. Finally, maximizing sum of logarithm of likelihood leads to estimates of coefficients.

Note

Most important feature of multinomial logit model is to set a “base category.” It is needed because multinomial logit estimates probabilities of shift from base to other categories.

In short, it estimates relative probability of outcomes to base outcome.

Calculation

Let’s say here is a case where there can be k outcomes. First, set a latent factor z for each outcome and builid regression equations on them.

$z_i = \alpha_i + \beta_{i1} x_1 + \beta_{i2} x_2 + \epsilon$

Probability of each outcome is given by a function below;

$p_i = \frac{exp(z_i)}{ \Sigma^{k-1}_{i=0} exp( z_i )}$

where z0 = 0, so therefore exp(z0) = 1.

Then, the likelihood is computed as product of all probabilities; L(θ; y)=Πpi

Now, maximizing sum of logarithm of likelihood leads to coefficients estimation.

Example

Here is an example of medicine choices from 3 brands by 735 patients. Brand A, B, and C are denoted here as 1, 2, and 3. Independent variables are gender and age.

Now, hidden variable z for each outcome should be created.

Computing z for 1(z0), 2(z1), and 3(z2). Note β1 has 2 coefficients for dummy variable (female). So, equation is zi = αi + β1(0)(1-0)(female) + β1(1)(female) + β2(age)

Next, logit of each z (estimated probability) is calculated as below. Look into the form of equations.

As mentioned above, again, the equation is pi = exp(zi)/Σexp(zi) where z1 = 0 thus exp(z1) = 1.

Note: By the way, as you can see, z0 here is not used anywhere else. So, actually it isn’t needed.

And product of all probabilities is likelihood. The goal is to maximize sum of its logarithm.

And the predicted category will be naturally the choice of highest probability.

Finally, Excel solver will find coefficients to maximize sum of log-likelihood.

Now, it’s done. See the picture below;

Results. See coefficients of z1 are unchanged, because z1 is not involving calculation of estimated probabilities.

“Goodness” in the picture above is the proportion of correct prediction; this model predicted 315 patients’ choices correctly out of 735 patients.

Appendix: z0 = 0, thus exp(z0) = 1

Here is a short note about why z0 = 0.

• As mentioned earlier, multinomial estimates probability of shift from “base category” to others. So, probability controller z for base category should be both mean and median of all its possible values.
• As mentioned in the post on binary logit model, z ~ (-∞, ∞). Thus, z0 = 0.
• In other words, multinomial logit model estimates shift probabilities by estimating shift of z from 0.