TLDR: If you have an estimate for \(Z\), you can't just take \(e^{\text{estimate}}\) to estimate \(e^Z\).
A bias correction factor of \(e^{\hat\sigma^2/2}\) has to be applied to the "common-sense" estimator \(e^{\hat{E(Z)}}\) to correctly estimate \(Y=e^Z\). The right estimate is \(Y=e^Z\ \hat =\ e^{\hat \sigma^2/2}e^{\hat{E(Z)}}\).
Suppose we model \(Z=\ln(Y)\) instead of \(Y\), so that \(Y=e^Z\). We estimate \(E(Z)=\mu\ \hat=\ \hat \mu= f(X)\) based on independent variables \(X\). (Read the symbol \(\hat =\) as "estimated as".)
Given \(\hat \mu\) estimates \(E(Z)\), a common-sense option to estimate \(E(Y)\) might seem to be \(e^{\hat \mu}\), since \(Y=e^Z\).
But this will not give the best results, simply because \(E(Y)=E(e^Z)\ne e^{E(Z)}\) - exponentiation is convex, so by Jensen's inequality \(e^{E(Z)}\) systematically underestimates \(E(e^Z)\).
If \(Z\) is normally distributed, \(E(Y)=e^{\mu+\sigma^2/2}\), where \(\sigma^2\) is the variance of the error \(Z-\hat Z\) - and hence a good estimate of \(E(Y)\) would be \(E(Y)\ \hat=\ e^{\hat \mu+\hat \sigma^2/2}\).
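To see why this matters numerically, here is a minimal simulation sketch (made-up values of \(\mu\) and \(\sigma\), Python/numpy assumed): the naive \(e^{E(Z)}\) falls well short of \(E(e^Z)\), while \(e^{\mu+\sigma^2/2}\) matches it.

```python
import numpy as np

# Hypothetical mu and sigma, purely for illustration
rng = np.random.default_rng(42)
mu, sigma = 2.0, 0.8
z = rng.normal(mu, sigma, 1_000_000)   # Z ~ N(mu, sigma^2)

print(np.exp(mu))                  # naive e^{E(Z)}         ~  7.4
print(np.exp(z).mean())            # Monte Carlo E(e^Z)     ~ 10.2
print(np.exp(mu + sigma**2 / 2))   # bias-corrected formula ~ 10.2
```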
Estimating \(\sigma^2\)
We are used to estimating \(E(Z)\ \hat=\ \hat \mu\), which is just the familiar regression estimate \(\sum \hat \beta_i X_i\). To get an accurate point estimate of \(Y=e^Z\), we now also need an estimate \(\hat\sigma^2\) of \(\sigma^2\).
OLS
If you are running an Ordinary Least Squares regression, an unbiased estimate for \(\sigma^2\) is \(\frac{SSE}{n-k}\), where \(n\) = #observations and \(k\) = #parameters in the model. Most statistical packages report these - and if not, you can calculate it as \(\sum (Z-\hat Z)^2/(n-k)\). SAS reports all of these if you use PROC REG; in fact, in SAS \(\hat \sigma\) is already reported as "Root MSE", and you can directly take \(\text{Root MSE}^2\) as an estimate of \(\sigma^2\).
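Outside SAS, the same quantities are easy to compute by hand. Here is a minimal sketch on made-up data (statsmodels is assumed as the OLS implementation, but any package that exposes the residuals will do):

```python
import numpy as np
import statsmodels.api as sm

# Made-up positive response Y and a single predictor X
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 2, n)
y = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.4, n))   # so Z = ln(Y) is linear in X

# OLS on the log scale: Z = ln(Y)
X = sm.add_constant(x)
fit = sm.OLS(np.log(y), X).fit()

mu_hat = fit.fittedvalues                  # \hat{E}(Z) per observation
sigma2_hat = fit.ssr / (n - X.shape[1])    # SSE/(n-k); statsmodels also reports this as mse_resid

y_hat_naive = np.exp(mu_hat)               # biased low
y_hat = np.exp(mu_hat + sigma2_hat / 2)    # bias-corrected estimate of E(Y|X)
```

Here \(\sqrt{\hat\sigma^2}\) plays the same role as SAS's Root MSE.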
Other Regression Frameworks (Machine Learning - RandomForest, NN, KNN, etc.)
A generic way of estimating \(\sigma^2\) is to borrow the assumption of homoscedasticity from OLS - i.e. that \(\sigma^2\) does not vary from person to person.
Under this assumption, the law of large numbers can be used to show that \(\sum (Z-\hat Z)^2/n\) converges in probability to \(\sigma^2\), and hence it remains a good estimator - even if it may not be unbiased for small \(n\).
If the number of parameters in the model is known, it is recommended to use \(\sum (Z-\hat Z)^2/(n-k)\), mimicking the OLS estimator - it will correct for the bias to some extent, although for large \(n\) the difference between \(1/(n-k)\) and \(1/n\) will be small.
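As a sketch of the same recipe with a machine-learning model (made-up data; scikit-learn's RandomForestRegressor used purely as an example of a generic regressor fit on \(Z=\ln(Y)\)):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up data: three predictors, positive response Y
rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(0, 2, size=(n, 3))
y = np.exp(1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.4, n))

# Fit any regressor on the log scale
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, np.log(y))
z_hat = model.predict(X)

# Homoscedastic residual variance estimate: sum((Z - Zhat)^2) / n
sigma2_hat = np.mean((np.log(y) - z_hat) ** 2)

y_hat = np.exp(z_hat + sigma2_hat / 2)   # bias-corrected estimate of E(Y|X)
```

One caveat worth noting: with a flexible model, in-sample residuals can be much smaller than the true errors, so it is safer to estimate \(\hat\sigma^2\) from a holdout set (or out-of-bag predictions) than from the training data.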
Proof
If \(Z=\ln(Y)\sim N(\mu,\sigma^2)\), then \(E(Y)=E(e^Z)=e^{\mu+\sigma^2/2}\). Citing the mean of the lognormal distribution in Wikipedia may work as "proof" in most cases. Just for completeness, a full mathematical proof is also given below.
\[E(e^Z)=\int_{-\infty}^{\infty}{e^z\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(z-\mu)^2}{2\sigma^2}}}\,dz=\int_{-\infty}^{\infty}{\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{\color{#3366FF}{(z-\mu)^2-2\sigma^2 z}}{2\sigma^2}}}\,dz\]
\[\begin{array}{rcl}
\color{#3366FF}{(z-\mu)^2-2\sigma^2 z}&=&z^2-2\mu z + \mu^2-2\sigma^2z\\
&=&z^2-2(\mu+\sigma^2) z + \mu^2\\
&=&\left(z-(\mu+\sigma^2)\right)^2 + \mu^2-(\mu+\sigma^2)^2\\
&=&\left(z-(\mu+\sigma^2)\right)^2 - 2\mu\sigma^2-\sigma^4\\
&=&\color{green}{\left(z-(\mu+\sigma^2)\right)^2} - \color{red}{2\sigma^2(\mu+\sigma^2/2)}\\
\end{array}\]
\[\begin{array}{rcl}
E(e^Z)&=&\int_{-\infty}^{\infty}{\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{\color{green}{\left(z-(\mu+\sigma^2)\right)^2} - \color{red}{2\sigma^2(\mu+\sigma^2/2)}}{2\sigma^2}}}\,dz\\
&=&\color{red}{e^{\mu+\sigma^2/2}}\int_{-\infty}^{\infty}{\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{\left(z-(\mu+\sigma^2)\right)^2}{2\sigma^2}}}\,dz\\
&=&\color{red}{e^{\mu+\sigma^2/2}}\\
\end{array}\]
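As an optional sanity check on the algebra above, the same integral can be evaluated symbolically - a sketch using sympy (the raw result may need simplify to collapse into the closed form):

```python
import sympy as sp

z, mu = sp.symbols('z mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# N(mu, sigma^2) density, integrated against e^z over the whole real line
pdf = sp.exp(-(z - mu)**2 / (2 * sigma**2)) / (sp.sqrt(2 * sp.pi) * sigma)
E_expZ = sp.integrate(sp.exp(z) * pdf, (z, -sp.oo, sp.oo))

print(sp.simplify(E_expZ - sp.exp(mu + sigma**2 / 2)))   # 0, i.e. E(e^Z) = e^{mu + sigma^2/2}
```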