% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/seqimpute.R
\name{seqimpute}
\alias{seqimpute}
\title{Imputation of missing data in sequence analysis}
\usage{
seqimpute(
  OD,
  regr = "multinom",
  np = 1,
  nf = 0,
  nfi = 1,
  npt = 1,
  available = TRUE,
  CO = matrix(NA, nrow = 1, ncol = 1),
  COt = matrix(NA, nrow = 1, ncol = 1),
  pastDistrib = FALSE,
  futureDistrib = FALSE,
  mi = 1,
  mice.return = FALSE,
  include = FALSE,
  noise = 0,
  ParExec = FALSE,
  ncores = NULL,
  SetRNGSeed = FALSE,
  num.trees = 10,
  min.node.size = NULL,
  max.depth = NULL,
  verbose = TRUE
)
}
\arguments{
\item{OD}{either a data frame containing sequences of a multinomial variable with missing data (coded as \code{NA}) or
a state sequence object built with the TraMineR package}

\item{regr}{a character specifying the imputation method. If \code{regr="multinom"}, multinomial models are used,
while if \code{regr="rf"}, random forest models are used.}

\item{np}{number of previous observations in the imputation model of the internal gaps.}

\item{nf}{number of future observations in the imputation model of the internal gaps.}

\item{nfi}{number of future observations in the imputation model of the initial gaps.}

\item{npt}{number of previous observations in the imputation model of the terminal gaps.}

\item{available}{a logical value allowing the user to choose whether to consider the already imputed data in the predictive model (\code{available = TRUE}) or not (\code{available = FALSE}).}

\item{CO}{a data frame containing covariates from which the user can choose in order to specify the imputation model more accurately.}

\item{COt}{a data frame containing time-dependent covariates that help specify the predictive model more accurately.}

\item{pastDistrib}{a logical indicating if the past distribution should be used as predictor in the imputation model.}

\item{futureDistrib}{a logical indicating if the future distribution should be used as predictor in the imputation model.}

\item{mi}{number of multiple imputations (default: \code{1}).}

\item{mice.return}{a logical indicating whether an object of class \code{mids}, that can be directly used by the \code{mice} package, should be returned
by the algorithm. By default, a data frame with the imputed datasets stacked vertically is returned.}

\item{include}{logical. If a data frame is returned (\code{mice.return = FALSE}), indicates whether the original
dataset should be included. This parameter does not apply if \code{mice.return = TRUE}.}

\item{noise}{\code{numeric} value adding noise to the predicted variable \code{pred} determined by the multinomial model
(by introducing a variance \code{noise} for each component of the vector \code{pred}). The user can choose any value for \code{noise}, but we recommend a relatively small value in the interval \code{[0.005, 0.03]}.}

\item{ParExec}{logical. If \code{TRUE}, the multiple imputations are run in parallel. This can reduce run time, depending on how many cores the processor has.}

\item{ncores}{integer. Number of cores to be used for the parallel computation. If no value is set for this parameter, the number of cores will be set
to the maximum number of CPU cores minus 1.}

\item{SetRNGSeed}{an integer used to set the seed when the imputations are run in parallel. Note that calling \code{set.seed()} alone before the \code{seqimpute} function is not sufficient in the case
of parallel computation.}

\item{num.trees}{random forest parameter setting the number of trees of each random forest model.}

\item{min.node.size}{random forest parameter setting the minimum node size for each random forest model.}

\item{max.depth}{random forest parameter setting the maximal tree depth for each random forest model.}

\item{verbose}{logical. If \code{TRUE}, seqimpute will print history and warnings to the console. Use \code{verbose=FALSE} for silent computation.}
}
\value{
Returns either an S3 object of class \code{mids} if \code{mice.return = TRUE}
or a data frame in which the imputed datasets are stacked vertically. In the second case,
two columns are added: \code{.imp}, an integer that refers to the imputation number
(0 corresponding to the original dataset if \code{include=TRUE}), and \code{.id}, a character corresponding to
the rownames of the dataset to impute.
}
\description{
Multiple imputation of missing data in a dataset of sequences, with predictions based
on either a multinomial or a random forest regression model.
Covariates and time-dependent covariates can be included in the model.
The prediction of the missing values is based on the theory of Prof. Brendan
Halpin. It uses a varying amount of surrounding available information to
perform the prediction process.
Among other settings, the user can specify \code{np} (the number of past observations
taken into account) and \code{nf} (the number of future observations taken
into account).
}
\details{
The imputation process is divided into several steps. According to the location of the gaps of \code{NA} in the original dataset, six types of gaps are distinguished:

- Internal Gaps (simple usual gaps)

- Initial Gaps (gaps situated at the very beginning of a sequence)

- Terminal Gaps (gaps situated at the very end of a sequence)

- Left-hand side SLG (Specially Located Gaps) (gaps whose starting position lies in the interval \code{[0,np]}
but whose ending position does not lie in the interval \code{[ncol(OD)-nf,ncol(OD)]})

- Right-hand side SLG (Specially Located Gaps) (gaps whose ending position lies in the interval \code{[ncol(OD)-nf,ncol(OD)]}
but whose starting position does not lie in the interval \code{[0,np]})

- Both-hand side SLG (Specially Located Gaps) (gaps whose starting position lies in the interval \code{[0,np]}
and whose ending position lies in the interval \code{[ncol(OD)-nf,ncol(OD)]})

Order of imputation of the gaps types:
    1. Internal Gaps
    2. Initial Gaps
    3. Terminal Gaps
    4. Left-hand side SLG
    5. Right-hand side SLG
    6. Both-hand side SLG
}
\examples{

# Single imputation (mi=1) using one past and one future observation
RESULT <- seqimpute(OD=OD, np=1, nf=1, nfi=1, npt=1, mi=1)

# Seqimpute used with parallelisation
\dontrun{
RESULT <- seqimpute(OD=OD, np=1, nf=1, nfi=1, npt=1, mi=2, ParExec=TRUE, SetRNGSeed=17, ncores=2)
}
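
# The calls below are illustrative sketches rather than part of the original
# examples. They use only arguments listed in the usage section and assume,
# as above, that OD is a data frame of sequences with missing values.
\dontrun{
# Random forest imputation models instead of multinomial models
RESULT_RF <- seqimpute(OD=OD, regr="rf", np=1, nf=1, nfi=1, npt=1, mi=2, num.trees=10)

# Stacked data frame that also contains the original dataset (.imp equal to 0)
RESULT_ALL <- seqimpute(OD=OD, np=1, nf=1, nfi=1, npt=1, mi=2, include=TRUE)
FIRST_IMP <- RESULT_ALL[RESULT_ALL$.imp == 1, ]

# Object of class mids for direct use with the mice package
RESULT_MIDS <- seqimpute(OD=OD, np=1, nf=1, nfi=1, npt=1, mi=2, mice.return=TRUE)
}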

}
\references{
HALPIN, Brendan (2012). Multiple imputation for life-course sequence data. Working Paper WP2012-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3639.

HALPIN, Brendan (2013). Imputing sequence data: Extensions to initial and terminal gaps, Stata's mi. Working Paper WP2013-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3620
}
\author{
Andre Berchtold <andre.berchtold@unil.ch>, Kevin Emery, Anthony Guinchard, Kamyar Taher
}
