\name{lfe-package}
\alias{lfe-package}
\alias{lfe}
\docType{package}
\title{Overview. Linear Group Fixed Effects}
\concept{Method of Alternating Projections}
\concept{Kaczmarz Method}
\concept{Fixed Effect Estimator}
\concept{Within Estimator}
\concept{Multiple Fixed Effects}
\concept{Instrumental Variables}
\concept{Conditional F-statistic}
\concept{Limited Mobility Bias}
\description{
The package uses the Method of Alternating Projections to estimate
linear models with multiple group fixed effects, a generalization
of the within estimator. It supports IV estimation with multiple
endogenous variables via 2SLS, with conditional F statistics for
detecting weak instruments. It is thread-parallelized and intended
for large problems. A method for correcting limited mobility bias is
also included.
}

\details{

This package is intended for linear models with multiple group fixed effects,
i.e. with two or more factors with a large number of levels.
It performs functions similar to those of \code{\link[stats]{lm}},
but it uses a special method for projecting out multiple group fixed
effects from the normal equations, which makes it faster for such models.
It is a generalization of the within estimator.  Such a method may be
required if the groups have high cardinality (many levels), which would
otherwise result in tens or hundreds of thousands of dummy variables.  It
is also useful if one only wants to control for the group effects without
actually estimating them.  The package may optionally compute standard
errors for the group effects by bootstrapping, but this is very time- and
memory-consuming compared to finding the point estimates.
If you only have a single huge factor, the package \pkg{plm} is probably
better suited.

As of version 1.6, projecting out interactions between continuous
covariates and factors is supported, i.e. individual slopes, not
only individual intercepts.  As of version 2.0, multiple left-hand sides
are supported.

The estimation is done in two steps.  First, the coefficients of the
ordinary covariates are estimated with the function \code{\link{felm}} by
centering all variables on the group means, followed by an OLS (similar to
\code{lm}) on the centered data.  Then the group effects are extracted (if
needed) with the function \code{\link{getfe}}.  This method is described in
\cite{Gaure (2013)}, but also appears in \cite{Guimaraes and Portugal (2010)},
disguised as the Gauss-Seidel algorithm.
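The two-step idea can be checked by hand: centering the response and the
covariate on both sets of group means with \code{\link{demeanlist}} and then
running OLS without an intercept should reproduce the slope reported by
\code{\link{felm}}.  The sketch below is only an illustration; the data are
simulated and the names \code{f1}, \code{f2}, \code{dm}, \code{b.within} and
\code{b.felm} are hypothetical.
\preformatted{
library(lfe)
set.seed(42)
n <- 2000
f1 <- factor(sample(50, n, replace = TRUE))
f2 <- factor(sample(30, n, replace = TRUE))
x <- rnorm(n)
y <- 1.5 * x + rnorm(50)[f1] + rnorm(30)[f2] + rnorm(n)
## Step 1: centre y and x on the group means of both factors
dm <- demeanlist(list(y, x), list(f1, f2))
## Step 2: OLS on the centred variables, without an intercept
## (the intercept is absorbed by the fixed effects)
b.within <- as.vector(coef(lm(dm[[1]] ~ dm[[2]] - 1)))
## felm() does the centring internally and should agree
b.felm <- as.vector(coef(felm(y ~ x | f1 + f2)))
all.equal(b.within, b.felm, tolerance = 1e-6)
}
Any remaining difference between the two estimates comes only from the
convergence tolerance of the centering.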

There is also a function \code{\link{demeanlist}}, which just does the
centering on an arbitrary matrix, and a function
\code{\link{compfactor}}, which computes the connected components of the
data.  The connected components are used for interpreting the group effects
when there are only two factors (see the Abowd et al. references); they are
also returned by \code{\link{getfe}}.
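As a small illustration of connected components, the hypothetical toy data
below have workers 1 and 2 sharing firms a and b, while worker 3 only appears
at firm c, so \code{\link{compfactor}} should split the observations into two
components.  The variable names are made up for the example, and the exact
numbering of the components is not assumed.
\preformatted{
library(lfe)
## Hypothetical worker and firm identifiers for six observations
id <- factor(c(1, 1, 2, 2, 3, 3))
firm <- factor(c("a", "b", "a", "b", "c", "c"))
## One component label per observation
comp <- compfactor(list(id = id, firm = firm))
table(comp)
}
Effects can only be compared within a component, which is why
\code{\link{getfe}} reports the component of each estimated effect.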

For those who study the correlation between the fixed effects, as in
\cite{Abowd et al.}, the functions \code{\link{bccorr}} and
\code{\link{fevcov}} compute correlations and variances corrected for
limited mobility bias, as documented in \cite{Gaure (2014b)}.
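A minimal sketch of the bias-corrected statistics on simulated two-factor
data follows.  It passes \code{keepX = TRUE} to \code{\link{felm}} on the
assumption that the correction needs the covariate matrix when there are
ordinary covariates; see \code{\link{bccorr}} for the exact requirements.  The
object names \code{bc} and \code{fv} are hypothetical.
\preformatted{
library(lfe)
set.seed(1)
n <- 5000
id <- factor(sample(400, n, replace = TRUE))
firm <- factor(sample(40, n, replace = TRUE))
x <- rnorm(n)
y <- x + rnorm(400)[id] + rnorm(40)[firm] + rnorm(n)
## Assumption: keep the covariate matrix for the bias correction
est <- felm(y ~ x | id + firm, keepX = TRUE)
bc <- bccorr(est)   # bias-corrected correlation and variances
fv <- fevcov(est)   # bias-corrected covariance matrix of the effects
bc
fv
}
Both functions estimate the correction terms by sampling, so they may take
noticeably longer than the \code{felm} call itself.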

Instrumental variable estimation is supported with 2SLS. Conditional
F statistics for reduced-rank tests of weak instruments, as in
\cite{Sanderson and Windmeijer (2014)}, are available in
\code{\link{condfstat}}.
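The sketch below simulates two endogenous regressors \code{Q} and \code{W},
made endogenous by a shared unobserved term \code{u}, with three excluded
instruments, and estimates the model by 2SLS with one fixed effect.  It
assumes the multi-part formula \code{y ~ exogenous | fixed effects |
(endogenous ~ instruments)}; all variable names are illustrative.
\preformatted{
library(lfe)
set.seed(2)
n <- 3000
id <- factor(sample(100, n, replace = TRUE))
x <- rnorm(n)
z1 <- rnorm(n); z2 <- rnorm(n); z3 <- rnorm(n)
u <- rnorm(n)   # unobserved confounder making Q and W endogenous
Q <- z1 + 0.5 * z2 + u + rnorm(n)
W <- z2 - 0.5 * z3 + u + rnorm(n)
y <- x + Q - W + rnorm(100)[id] + u + rnorm(n)
## exogenous x | fixed effect id | (Q and W instrumented by z1, z2, z3)
est <- felm(y ~ x | id | (Q | W ~ z1 + z2 + z3))
summary(est)
## Conditional F statistics, one per endogenous variable
cf <- condfstat(est)
cf
}
With several endogenous variables, the ordinary first-stage F statistics
can look strong even when one variable is weakly identified; the conditional
statistics are designed to detect this.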

The centering on the means is done with a tolerance which is
set by \code{options(lfe.eps=1e-8)} (the default).  This is a somewhat
conservative tolerance; in many cases \code{1e-6} is probably sufficient,
and using it may speed up the centering.  In the other direction, setting
\code{options(lfe.eps=0)} will provide maximum accuracy at the cost of
computing time and warnings about convergence failure.
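A minimal sketch of loosening the tolerance for a single estimation and
restoring it afterwards.  The data are simulated, and the names
\code{b.loose} and \code{b.default} are hypothetical.
\preformatted{
library(lfe)
set.seed(3)
n <- 5000
f1 <- factor(sample(500, n, replace = TRUE))
f2 <- factor(sample(50, n, replace = TRUE))
x <- rnorm(n)
y <- x + rnorm(500)[f1] + rnorm(50)[f2] + rnorm(n)
getOption("lfe.eps")                 # current tolerance
oldopts <- options(lfe.eps = 1e-6)   # looser tolerance for this fit
b.loose <- coef(felm(y ~ x | f1 + f2))
options(oldopts)                     # restore the previous setting
b.default <- coef(felm(y ~ x | f1 + f2))
b.loose - b.default                  # typically a negligible difference
}
Saving the value returned by \code{options()} and passing it back is the
usual way to make such a change local to a piece of code.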

The package is threaded, that is, it may use more than one CPU.  When the
package is loaded, the number of threads is fetched from the environment
variable \env{LFE_THREADS}, \env{OMP_THREAD_LIMIT}, \env{OMP_NUM_THREADS} or
\env{NUMBER_OF_PROCESSORS} (for Windows), and stored in
\code{options(lfe.threads=n)}.  This option may be changed prior to
calling \code{\link{felm}}, if so desired.  Note that, typically,
\pkg{lfe} is limited by memory bandwidth, not CPU speed, so fast
memory and a large cache are more important than clock frequency.  It is
therefore not always true that running on all available cores is
much faster than running on half of them.
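Because memory bandwidth is often the bottleneck, one might try half of the
available cores.  The sketch below assumes that \code{parallel::detectCores()}
gives a usable core count and falls back to two cores if it returns
\code{NA}.  The variable \code{nc} is hypothetical.
\preformatted{
library(lfe)
getOption("lfe.threads")   # value picked up when lfe was loaded
nc <- parallel::detectCores()
if (is.na(nc)) nc <- 2
## Use half of the detected cores, but at least one
oldopts <- options(lfe.threads = as.integer(max(1, floor(nc / 2))))
getOption("lfe.threads")
## calls to felm() made here use the new thread count
options(oldopts)           # restore the previous setting
}
Setting \env{LFE_THREADS} in the environment before the package is loaded
has the same effect for a whole session.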

Threading is only done for the centering; the extraction of the group
effects is not threaded.  The default method for extracting the
group coefficients is the iterative Kaczmarz method, whose tolerance
is also given by the \code{lfe.eps} option.

For some datasets the Kaczmarz method converges very slowly.  In this
case it may be replaced with the conjugate gradient method of \pkg{Rcgmin}
by setting \code{options(lfe.usecg=TRUE)}.
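A minimal sketch of switching to the conjugate gradient method for one call
to \code{\link{getfe}} and comparing the result with the default.  It assumes
that \pkg{Rcgmin} is installed; the data are simulated and the names
\code{fe.cg} and \code{fe.kacz} are hypothetical.
\preformatted{
library(lfe)
set.seed(4)
n <- 5000
id <- factor(sample(500, n, replace = TRUE))
firm <- factor(sample(50, n, replace = TRUE))
x <- rnorm(n)
y <- x + rnorm(500)[id] + rnorm(50)[firm] + rnorm(n)
est <- felm(y ~ x | id + firm)
## Conjugate gradient extraction (requires the Rcgmin package)
oldopts <- options(lfe.usecg = TRUE)
fe.cg <- getfe(est)
options(oldopts)
## Default Kaczmarz extraction
fe.kacz <- getfe(est)
## Both should give approximately the same normalized effects
summary(fe.cg$effect - fe.kacz$effect)
}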

The package has been tested on datasets with approximately 20,000,000
observations, 15 covariates and two factors with approximately 2,300,000 and
270,000 levels (\code{\link{felm}} took about 50 minutes on 8 CPUs,
\code{\link{getfe}} about 5 minutes).  Note, however, that not only
the size of the dataset matters, but also its structure.

The package works with any positive number of grouping factors, but with
more than two, the interpretation of the group effects is in general not
well understood, so one should make sure that the coefficients are estimable.

In the \file{exec} directory there is a Perl script \code{lfescript}, which
is used at the author's site for creating R scripts from
a simple specification file.  The format is documented in
\code{doc/lfeguide.txt}.

\pkg{lfe} is similar in function, though not in method, to the
Stata modules \code{a2reg} and \code{felsdvreg}.
}

\references{
Abowd, J.M., F. Kramarz and D.N. Margolis (1999) \cite{High Wage Workers and High
Wage Firms}, Econometrica, 67(2):251--333, 1999.
\url{http://dx.doi.org/10.1111/1468-0262.00020}
  
Abowd, J.M., R. Creecy and F. Kramarz (2002) \cite{Computing Person
  and Firm Effects Using Linked Longitudinal Employer-Employee
  Data.} Technical Report TP-2002-06, U.S. Census Bureau.
  \url{http://lehd.did.census.gov/led/library/techpapers/tp-2002-06.pdf}

Andrews, M., L. Gill, T. Schank and R. Upward (2008)
\cite{High wage workers and low wage firms: negative assortative
  matching or limited mobility bias?}
  Journal of the Royal Statistical Society: Series A, 171(3):673--697, 2008.
  \url{http://dx.doi.org/10.1111/j.1467-985X.2007.00533.x}

Cornelissen, T. (2008)
\cite{The stata command felsdvreg to fit a linear model with two
  high-dimensional fixed effects.}
Stata Journal, 8(2):170--189, 2008.
\url{http://econpapers.repec.org/RePEc:tsj:stataj:v:8:y:2008:i:2:p:170-189}

Gaure, S. (2013) \cite{OLS with Multiple High Dimensional Category
  Variables.} Computational Statistics and Data Analysis, 66:8--18, 2013.
  \url{http://dx.doi.org/10.1016/j.csda.2013.03.024}

Gaure, S. (2014a) \cite{lfe: Linear Group Fixed Effects.} The R
  Journal, 5(2):104--117, Dec 2013. \url{http://journal.r-project.org/archive/2013-2/gaure.pdf}

Gaure, S. (2014b), \cite{Correlation bias correction in two-way
  fixed-effects linear regression}, Stat, 3(1):379--390, 2014.
  \url{http://dx.doi.org/10.1002/sta4.68}

Guimaraes, P. and Portugal, P. (2010) \cite{A simple feasible
  procedure to fit models with high-dimensional fixed effects.}
  The Stata Journal, 10(4):629--649, 2010.
  \url{http://www.stata-journal.com/article.html?article=st0212}
  
Ouazad, A. (2008)
\cite{A2REG: Stata module to estimate models with two fixed effects.}
Statistical Software Components S456942, Boston College Department of Economics.
\url{http://ideas.repec.org/c/boc/bocode/s456942.html}

Sanderson, E. and F. Windmeijer (2014)
\cite{A weak instrument F-test in linear iv models with multiple
endogenous variables}, Discussion Paper 14/644, University of Bristol.
\url{http://www.efm.bris.ac.uk/economics/working_papers/pdffiles/dp14644.pdf}

}

\examples{
  oldopts <- options(lfe.threads=1)
  x <- rnorm(1000)
  x2 <- rnorm(length(x))
  id <- factor(sample(10,length(x),replace=TRUE))
  firm <- factor(sample(3,length(x),replace=TRUE,prob=c(2,1.5,1)))
  year <- factor(sample(10,length(x),replace=TRUE,prob=c(2,1.5,rep(1,8))))
  id.eff <- rnorm(nlevels(id))
  firm.eff <- rnorm(nlevels(firm))
  year.eff <- rnorm(nlevels(year))
  y <- x + 0.25*x2 + id.eff[id] + firm.eff[firm] +
         year.eff[year] + rnorm(length(x))
  est <- felm(y ~ x+x2 | id + firm + year)
  summary(est)

  getfe(est,se=TRUE)
  ## compare with an ordinary lm
  summary(lm(y ~ x+x2+id+firm+year-1))
  options(oldopts)
}
\keyword{regression}
\keyword{models}
