1/20/24, 1:57 PM
FAQ How do I interpret a regression model when some variables are log transformed?
https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqhow-do-i-interpret-a-regression-model-when-some-variables-are-log-transformed/
Introduction
In this page, we will discuss how to interpret a regression model when some variables in the model have been
log transformed. The example data can be downloaded here
(https://stats.idre.ucla.edu/wp-content/uploads/2016/02/lgtrans.csv) (the file is in .csv format). The variables in the data set are writing, reading,
and math scores (write, read and math), the log transformed writing (lgwrite) and log transformed math
scores (lgmath) and female. For these examples, we have taken the natural log (ln). All the examples are done
in Stata, but they can be easily generated in any statistical package. In the examples below, the variable write
or its log transformed version will be used as the outcome variable. The examples are used for illustrative
purposes and are not intended to make substantive sense. Here is a table of different types of means for variable
write.
    Variable |       Type        Obs        Mean       [95% Conf. Interval]
-------------+----------------------------------------------------------
       write | Arithmetic        200      52.775       51.45332   54.09668
             |  Geometric        200     51.8496       50.46854   53.26845
             |   Harmonic        200    50.84403       49.40262   52.37208
------------------------------------------------------------------------
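The three kinds of mean in this table can be reproduced in any language. The sketch below uses Python's standard library on a small made-up list of scores (hypothetical values, not the lgtrans data). It also checks the identity the rest of this page relies on: the geometric mean equals the exponentiated arithmetic mean of the logs.

```python
import math
import statistics

# Hypothetical scores for illustration only; not taken from lgtrans.csv.
scores = [31.0, 44.0, 52.0, 59.0, 65.0]

arithmetic = statistics.mean(scores)
geometric = statistics.geometric_mean(scores)
harmonic = statistics.harmonic_mean(scores)

# The geometric mean is the exponentiated arithmetic mean of the logs.
geometric_via_logs = math.exp(statistics.mean([math.log(s) for s in scores]))

print(f"arithmetic={arithmetic:.3f} geometric={geometric:.3f} harmonic={harmonic:.3f}")
print(f"exp(mean(log)) = {geometric_via_logs:.3f}")
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean. The table above shows the same ordering.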
Outcome variable is log transformed
Very often, a linear relationship is hypothesized between a log transformed outcome variable and a group of
predictor variables. Written mathematically, the relationship follows the equation
$$\log(y_i) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + e_i,$$
where $y$ is the outcome variable and $x_1, \ldots, x_k$ are the predictor variables. In other words, we assume that
$\log(y) - x^T\beta$ is normally distributed (or $y$ is log-normal conditional on all the covariates). Since this is just an
ordinary least squares regression, we can easily interpret a regression coefficient, say $\beta_1$, as the expected
change in log of $y$ with respect to a one-unit increase in $x_1$ holding all other variables at any fixed value,
assuming that $x_1$ enters the model only as a main effect. But what if we want to know what happens to the
outcome variable $y$ itself for a one-unit increase in $x_1$? The natural way to do this is to interpret the
exponentiated regression coefficients, $\exp(\beta)$, since exponentiation is the inverse of the logarithm function.
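A short derivation may help show why exponentiating a coefficient gives a ratio. This is a sketch: the symbol $GM$ is introduced here for illustration and denotes the conditional geometric mean, $GM(y \mid x) = \exp\big(E[\log y \mid x]\big)$.

```latex
\begin{aligned}
E[\log y \mid x_1 + 1, x_2, \ldots, x_k] - E[\log y \mid x_1, x_2, \ldots, x_k] &= \beta_1 \\
\log GM(y \mid x_1 + 1, x_2, \ldots, x_k) - \log GM(y \mid x_1, x_2, \ldots, x_k) &= \beta_1 \\
\frac{GM(y \mid x_1 + 1, x_2, \ldots, x_k)}{GM(y \mid x_1, x_2, \ldots, x_k)} &= \exp(\beta_1)
\end{aligned}
```

A difference on the log scale therefore becomes a ratio of geometric means on the original scale.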
Let’s start with the intercept-only model.
------------------------------------------------------------------------------
lgwrite | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
intercept | 3.948347 .0136905 288.40 0.000 3.92135 3.975344
------------------------------------------------------------------------------
We can say that 3.95 is the unconditional expected mean of log of write. Therefore the exponentiated value is
$\exp(3.948347) = 51.85$. This is the geometric mean of write. The emphasis here is that it is the geometric
mean instead of the arithmetic mean. OLS regression of the original variable is used to estimate the
expected arithmetic mean, and OLS regression of the log transformed outcome variable is used to estimate the
expected geometric mean of the original variable.
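As a quick arithmetic check, the intercept and its confidence limits from the intercept-only output can be exponentiated directly. This is a minimal sketch in Python (the variable names are illustrative); the results should agree with the geometric-mean row of the table of means up to rounding.

```python
import math

# Intercept and 95% confidence limits from the intercept-only output above.
intercept = 3.948347
ci_low, ci_high = 3.92135, 3.975344

# Exponentiating moves from the log scale back to the original scale.
geometric_mean = math.exp(intercept)
gm_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"geometric mean of write: {geometric_mean:.4f}")
print(f"95% interval: ({gm_ci[0]:.4f}, {gm_ci[1]:.4f})")
```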
Now let’s move on to a model with a single binary predictor variable.
------------------------------------------------------------------------------
lgwrite | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .1032614 .0265669 3.89 0.000 .050871 .1556518
intercept | 3.89207 .0196128 198.45 0.000 3.853393 3.930747
------------------------------------------------------------------------------
Before diving into the interpretation of these parameters, let’s get the means of our dependent variable, write,
by gender.
males
    Variable |       Type        Obs        Mean       [95% Conf. Interval]
-------------+----------------------------------------------------------
       write | Arithmetic         91    50.12088       47.97473   52.26703
             |  Geometric         91    49.01222        46.8497   51.27457
             |   Harmonic         91    47.85388        45.6903   50.23255
------------------------------------------------------------------------
females
    Variable |       Type        Obs        Mean       [95% Conf. Interval]
-------------+----------------------------------------------------------
       write | Arithmetic        109    54.99083       53.44658   56.53507
             |  Geometric        109    54.34383       52.73513    56.0016
             |   Harmonic        109    53.64236       51.96389   55.43289
------------------------------------------------------------------------
Now we can map the parameter estimates to the geometric means for the two groups. The intercept of 3.89 is
the log of the geometric mean of write when female = 0, i.e., for males. Therefore, its exponentiated value
is the geometric mean for the male group: $\exp(3.89207) = 49.01$. What can we say about the coefficient for
female? In the log scale, it is the difference in the expected means of the log of write between the
female students and male students. In the original scale of the variable write, it is the ratio of the geometric
mean of write for female students over the geometric mean of write for male students,
$\exp(.1032614) = 54.34383 / 49.01222 = 1.11$. In terms of percent change, we can say that switching from
male students to female students, we expect to see about an 11% increase in the geometric mean of writing
scores.
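The mapping between the coefficients and the group geometric means can be verified numerically. This is a minimal sketch in Python using the coefficients from the single-predictor output above; the variable names are illustrative.

```python
import math

# Coefficients from the single-predictor output above.
b_intercept = 3.89207
b_female = 0.1032614

gm_male = math.exp(b_intercept)               # geometric mean when female = 0
gm_female = math.exp(b_intercept + b_female)  # geometric mean when female = 1
ratio = math.exp(b_female)                    # same as gm_female / gm_male
pct_increase = 100 * (ratio - 1)

print(f"male GM={gm_male:.2f}  female GM={gm_female:.2f}")
print(f"ratio={ratio:.4f}  percent increase={pct_increase:.1f}%")
```

The two geometric means should agree with the by-gender tables up to rounding.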
Last, let’s look at a model with multiple predictor variables.
------------------------------------------------------------------------------
lgwrite | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .114718 .0195341 5.87 0.000 .076194 .153242
read | .0066305 .0012689 5.23 0.000 .0041281 .0091329
math | .0076792 .0013873 5.54 0.000 .0049432 .0104152
intercept | 3.135243 .0598109 52.42 0.000 3.017287 3.253198
------------------------------------------------------------------------------
The exponentiated coefficient $\exp(\beta_1)$ for female is the ratio of the expected geometric mean for the female
students group over the expected geometric mean for the male students group, when read and math are held
at some fixed value. Of course, the expected geometric means for the male and female students group will be
different for different values of read and math. However, their ratio is a constant: $\exp(\beta_1)$. In our example,
$\exp(\beta_1) = \exp(.114718) \approx 1.12$. We can say that writing scores will be 12% higher for the female students
than for the male students. For the variable read, we can say that for a one-unit increase in read, we expect to
see about a 0.7% increase in writing score, since $\exp(.0066305) = 1.006653$. For a ten-unit increase
in read, we expect to see about a 6.9% increase in writing score, since
$\exp(.0066305 \times 10) = 1.0685526$.
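The percent changes quoted for this model all come from the same calculation, sketched below in Python. The helper name `percent_change` is hypothetical and introduced only for illustration; the coefficients are copied from the output above.

```python
import math

def percent_change(coef: float, delta: float = 1.0) -> float:
    """Percent change in the geometric mean of the outcome for a
    delta-unit increase in a predictor with coefficient coef."""
    return 100 * (math.exp(coef * delta) - 1)

female_effect = percent_change(0.114718)         # female vs. male
read_one_unit = percent_change(0.0066305)        # one-unit increase in read
read_ten_units = percent_change(0.0066305, 10)   # ten-unit increase in read

print(f"female: {female_effect:.2f}%")
print(f"read +1: {read_one_unit:.3f}%   read +10: {read_ten_units:.3f}%")
```

The ten-unit effect is slightly larger than ten times the one-unit effect because the changes compound multiplicatively.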
The intercept becomes less interesting when the predictor variables are not centered and are continuous. In this
particular model, the intercept is the expected mean of log(write) for males (female = 0) when read and math
are equal to zero.
In summary, when the outcome variable is log transformed, it is natural to interpret the exponentiated regression
coefficients. These values correspond to changes in the ratio of the expected geometric means of the original
outcome variable.
Some (not all) predictor variables are log transformed
Occasionally, we also have some predictor variables being log transformed. In this section, we will take a look at
an example where some predictor variables are log-transformed, but the outcome variable is in its original scale.
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | 5.388777 .9307948 5.79 0.000 3.553118 7.224436
lgmath | 20.94097 3.430907 6.10 0.000 14.17473 27.7072
lgread | 16.85218 3.063376 5.50 0.000 10.81076 22.89359
intercept | -99.16397 10.80406 -9.18 0.000 -120.4711 -77.85685
------------------------------------------------------------------------------
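Written as an equation, we have
$$\text{write} = \beta_0 + \beta_1 \times \text{female} + \beta_2 \times \log(\text{math}) + \beta_3 \times \log(\text{read}).$$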
Since this is an OLS regression, the interpretation of the regression coefficients for the non-transformed variables
is unchanged from an OLS regression without any transformed variables. For example, the expected mean
difference in writing scores between the female and male students is about 5.4 points, holding the other
predictor variables constant. On the other hand, due to the log transformation, the estimated effects of math
and read are no longer linear, even though the effects of lgmath and lgread are linear. The plot below
shows the curve of predicted values against the reading scores for the female students group holding math
score constant.
[Figure: predicted writing score plotted against reading score for the female students group, math score held constant]
How do we interpret the coefficient of 16.85218 for the variable of log of reading score? Let’s take two values of
reading score, $r_1$ and $r_2$. The expected mean difference in writing score at $r_1$ and $r_2$, holding the other predictor
variables constant, is $\text{write}(r_2) - \text{write}(r_1) = \beta_3 \times [\log(r_2) - \log(r_1)] = \beta_3 \times \log(r_2/r_1)$. This means that
as long as the percent increase in read (the predictor variable) is fixed, we will see the same difference in writing
score, regardless of where the baseline reading score is. For example, we can say that for a 10% increase in
reading score, the difference in the expected mean writing scores will always be
$\beta_3 \times \log(1.10) = 16.85218 \times \log(1.1) \approx 1.61$.
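The claim that only the ratio of the two reading scores matters can be checked numerically. This is a minimal sketch in Python; the function name `write_difference` and the baseline scores 30 and 60 are hypothetical choices for illustration, and the coefficient is the lgread estimate from the output above.

```python
import math

B_LGREAD = 16.85218  # coefficient on log(read) from the output above

def write_difference(r1: float, r2: float, coef: float = B_LGREAD) -> float:
    """Expected difference in writing score when the reading score moves
    from r1 to r2, with the other predictors held constant."""
    return coef * math.log(r2 / r1)

diff_low_baseline = write_difference(30, 33)    # 10% increase from 30
diff_high_baseline = write_difference(60, 66)   # 10% increase from 60

print(f"{diff_low_baseline:.4f} {diff_high_baseline:.4f}")
```

Both calls describe a 10% increase, so they give the same difference in expected writing score even though the absolute change in reading score is twice as large in the second call.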
Note:
Recalling the Taylor expansion of the function $f(x) = \log(1 + x)$ around $x_0 = 0$, we have
$\log(1 + x) = x + O(x^2)$. Therefore, for a small change in the predictor variable we can approximate the
difference in the expected mean of the dependent variable by multiplying the coefficient by the relative change in the
predictor variable. In our example we can say that for a 1% increase in reading score, the difference in the
expected mean writing scores will be approximately $\beta_3 \times 0.01 = 16.85 \times 0.01 = .1685$. If we use the
log, the exact value will be $\beta_3 \times \log(1.01) = 16.85 \times \log(1.01) = .1677$.
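How good the linear approximation is depends on the size of the change. The sketch below, in Python, compares the approximate and exact differences for a small and a large percent increase in reading score. The helper name `approx_and_exact` and the 50% case are assumptions added for illustration.

```python
import math

B_LGREAD = 16.85218  # coefficient on log(read) from the output above

def approx_and_exact(pct_increase: float, coef: float = B_LGREAD):
    """Return (approximate, exact) difference in expected writing score
    for a given percent increase in reading score."""
    p = pct_increase / 100
    return coef * p, coef * math.log(1 + p)

approx_1, exact_1 = approx_and_exact(1)
approx_50, exact_50 = approx_and_exact(50)

print(f"1%:  approx={approx_1:.4f}  exact={exact_1:.4f}")
print(f"50%: approx={approx_50:.4f}  exact={exact_50:.4f}")
```

For a 1% increase the two values agree to about three decimal places. For a 50% increase the approximation overstates the difference noticeably, so the exact log form is the safer choice for large changes.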
Both the outcome variable and some predictor variables are log transformed
What happens when both the outcome variable and predictor variables are log transformed? We can combine
the two previously described situations into one. Here is an example of such a model.
------------------------------------------------------------------------------
lgwrite | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .1142399 .0194712 5.87 0.000 .07584 .1526399
lgmath | .4085369 .0720791 5.67 0.000 .2663866 .5506872
read | .0066086 .0012561 5.26 0.000 .0041313 .0090859
intercept | 1.928101 .2469391 7.81 0.000 1.441102 2.415099
------------------------------------------------------------------------------
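Written as an equation, we can describe the model:
$$\log(\text{write}) = \beta_0 + \beta_1 \times \text{female} + \beta_2 \times \log(\text{math}) + \beta_3 \times \text{read}.$$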
For variables that are not transformed, such as female, the exponentiated coefficient is the ratio of the
geometric mean for the female to the geometric mean for the male students group. In our example,
we can say that the expected percent increase in geometric mean from male student group to female student
group is about 12% holding other variables constant, since $\exp(.1142399) \approx 1.12$. For reading score, we can
say that for a one-unit increase in reading score, we expect to see about a 0.7% increase in the geometric
mean of writing score, since $\exp(.0066086) \approx 1.0066$.
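Now, let’s focus on the effect of math. Take two values of math, $m_1$ and $m_2$, and hold the other predictor
variables at any fixed value. The equation above yields
$$\log(\text{write}(m_2)) - \log(\text{write}(m_1)) = \beta_2 \times [\log(m_2) - \log(m_1)].$$
It can be simplified to $\log\big(\text{write}(m_2)/\text{write}(m_1)\big) = \beta_2 \times \log(m_2/m_1)$, leading to
$$\frac{\text{write}(m_2)}{\text{write}(m_1)} = \left(\frac{m_2}{m_1}\right)^{\beta_2}.$$
This tells us that as long as the ratio of the two math scores, $m_2/m_1$, stays the same, the expected ratio of the
outcome variable, write, stays the same. For example, we can say that for any 10% increase in math score,
the expected ratio of the writing score will be $(1.10)^{\beta_2} = (1.10)^{.4085369} = 1.0397057$. In other words, we
expect about a 4% increase in writing score when math score increases by 10%.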
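When both sides are logged, the expected ratio of the outcome depends only on the ratio of the two predictor values. This is a minimal sketch in Python; the function name `write_ratio` and the baseline scores 40 and 50 are hypothetical, and the coefficient is the lgmath estimate from the output above.

```python
B_LGMATH = 0.4085369  # coefficient on log(math) from the output above

def write_ratio(m1: float, m2: float, coef: float = B_LGMATH) -> float:
    """Expected ratio of writing scores when the math score moves from
    m1 to m2, with the other predictors held constant."""
    return (m2 / m1) ** coef

ratio_from_40 = write_ratio(40, 44)   # 10% increase from 40
ratio_from_50 = write_ratio(50, 55)   # 10% increase from 50

print(f"{ratio_from_40:.7f} {ratio_from_50:.7f}")
print(f"percent increase in writing score: {100 * (ratio_from_50 - 1):.2f}%")
```

Both baselines give the same ratio, roughly a 4% increase in writing score for a 10% increase in math score.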
Note:
Here also we can use an approximation method. Since $(1 + x)^a \approx 1 + ax$ for a small value of $|x|$,
for a small change in the predictor variable we can approximate the expected ratio of the dependent
variable as one plus the coefficient multiplied by the relative change in the predictor variable. For example, we can
say that for any 1% increase in math score, the expected ratio of the writing score is approximately
$1 + .01 \times \beta_2 = 1 + .01 \times .4085369 = 1.004085$. The exact value will be
$(1.01)^{.4085369} = 1.0040733$.
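The approximate and exact ratios can be compared side by side. The sketch below is in Python; the helper name `ratio_approx_and_exact` and the 25% case are assumptions added for illustration, and the coefficient is the lgmath estimate from the output above.

```python
B_LGMATH = 0.4085369  # coefficient on log(math) from the output above

def ratio_approx_and_exact(pct_increase: float, coef: float = B_LGMATH):
    """Return (approximate, exact) expected ratio of writing scores
    for a given percent increase in math score."""
    p = pct_increase / 100
    return 1 + coef * p, (1 + p) ** coef

approx_1, exact_1 = ratio_approx_and_exact(1)
approx_25, exact_25 = ratio_approx_and_exact(25)

print(f"1%:  approx={approx_1:.6f}  exact={exact_1:.6f}")
print(f"25%: approx={approx_25:.6f}  exact={exact_25:.6f}")
```

With a coefficient between 0 and 1, the linear approximation slightly overstates the ratio, and the gap widens as the percent change grows.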
© 2021 UC REGENTS (http://www.ucla.edu/terms-of-use/)