% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/semTools.R
\name{transformData}
\alias{transformData}
\title{Transform data methods}
\usage{
transformData(x, method = "npn", ...)
}
\arguments{
\item{x}{A matrix or data.frame (n x p). Rows correspond to subjects, and
columns to graph nodes.}

\item{method}{Data transformation method. It can be one of the following:
\enumerate{
\item "npn" (default), performs the nonparanormal (npn) transformation, i.e., the
semiparametric Gaussian copula model (Liu et al., 2009), estimating the Gaussian copula
by marginally transforming the variables using smooth ECDF functions.
The npn distribution corresponds to the latent underlying multivariate
normal distribution, preserving the conditional independence structure
of the original variables.
\item "spearman", computes a trigonometric transformation of the Spearman
rho correlation to estimate the latent Gaussian correlation
parameters of a nonparanormal distribution (Harris and Drton, 2013),
and generates a data matrix whose sample covariance matrix exactly
matches the estimated one.
\item "kendall", computes a trigonometric transformation of the Kendall
tau correlation to estimate the latent Gaussian correlation
parameters of a nonparanormal distribution (Harris and Drton, 2013),
and generates a data matrix whose sample covariance matrix exactly
matches the estimated one.
\item "polychoric", computes the polychoric correlation matrix and
generates the data matrix with the exact same sample covariance matrix
as the estimated one. The polychoric correlation (Olsson, 1979) is a
measure of association between two ordinal variables. It is based on the
assumption that two latent bivariate normally distributed random variables
generate couples of ordinal scores. Tetrachoric (two binary variables) and
biserial (an ordinal and a numeric variable) correlations are special cases.
\item "lineals", performs optimal scaling in order to achieve linearizing
transformations for each bivariate regression between pairwise variables for
subsequent structural equation models using the resulting correlation
matrix computed on the transformed data (de Leeuw, 1988).
\item "mca", performs optimal scaling of categorical data by Multiple
Correspondence Analysis (MCA, a.k.a. homogeneity analysis) maximizing
the first eigenvalues of the transformed correlation matrix. The estimates
of the corresponding structural parameters are consistent if the underlying
latent space of the observed variables is unidimensional.
}}

\item{...}{Currently ignored.}
}
\value{
A list of 2 objects is returned:
\enumerate{
\item "data", the matrix (n x p) of n observations and p transformed
variables, or the matrix (n x p) of simulated observations based on the
selected correlation matrix.
\item "catscores", the category weights for "lineals" or "mca"
methods or NULL otherwise.
}
}
\description{
Implements various data transformation methods, including
optimal scaling for ordinal or nominal data and methods that help relax
the assumption of normality (Gaussianity) for continuous data.
}
\details{
The nonparanormal transformation is computationally very efficient
and requires only one ECDF pass over the data matrix. The polychoric correlation
matrix is computed with the \code{lavCor()} function of the \code{lavaan}
package. Optimal scaling (lineals and mca) is performed with the
\code{lineals()} and \code{corAspect()} functions of the \code{aspect}
package (Mair and De Leeuw, 2008). Note that SEM fitting of the generated data
(fake data) must be done with a covariance-based method and bootstrap SE,
i.e., with \code{SEMrun(..., algo="ricf", n_rep=1000)}.
}
\examples{

#... with continuous ALS data
graph<- alsData$graph
data<- alsData$exprs; dim(data)
X<- data[, colnames(data) \%in\% V(graph)$name]; dim(X)

npn.data<- transformData(X, method="npn")
sem0.npn<- SEMrun(graph, npn.data$data, algo="cggm")
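
# A minimal base-R sketch of a rank-based normal-score transform, to give an
# idea of what a marginal ECDF-based Gaussianization looks like.
# Assumption: this is illustrative only; `toy` and `npn.sketch` are hypothetical
# names, and the estimator used by transformData(method="npn") may differ
# (for instance in how the ECDF is smoothed or truncated).
set.seed(1)
toy <- matrix(rexp(200), nrow = 100, ncol = 2)   # skewed toy data
npn.sketch <- apply(toy, 2, function(v) qnorm(rank(v) / (length(v) + 1)))
npn.sketch <- scale(npn.sketch)                  # center and rescale columns
dim(npn.sketch); round(colMeans(npn.sketch), 3)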

mvnS.data<- transformData(X, method="spearman")
sem0.mvnS<- SEMrun(graph, mvnS.data$data, algo="cggm")

mvnK.data<- transformData(X, method="kendall")
sem0.mvnK<- SEMrun(graph, mvnK.data$data, algo="cggm")
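
# A minimal base-R sketch of the trigonometric transformations of rank
# correlations mentioned for the "spearman" and "kendall" methods:
# 2*sin(pi/6 * rho) for Spearman rho and sin(pi/2 * tau) for Kendall tau.
# Assumption: `toy2`, `R.spearman` and `R.kendall` are hypothetical names used
# for illustration; the internal steps of transformData() are not shown here.
set.seed(1)
toy2 <- matrix(rnorm(300), nrow = 100, ncol = 3)
R.spearman <- 2 * sin(pi / 6 * cor(toy2, method = "spearman"))
R.kendall <- sin(pi / 2 * cor(toy2, method = "kendall"))
round(R.spearman, 3); round(R.kendall, 3)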

#...with ordinal (K=4 categories) ALS data
Xord <- data.frame(X)
Xord <- as.data.frame(lapply(Xord, cut, 4, labels = FALSE))
colnames(Xord) <- sub("X", "", colnames(Xord))

mvnP.data<- transformData(Xord, method="polychoric")
sem0.mvnP<- SEMrun(graph, mvnP.data$data, algo="cggm")

#...with nominal (K=4 categories) ALS data
mca.data<- transformData(Xord, method="mca")
sem0.mca<- SEMrun(graph, mca.data$data, algo="cggm")
mca.data$catscores
gplot(sem0.mca$graph, l="fdp", main="ALS mca")

# plot colored graphs
#par(mfrow=c(2,2), mar=rep(1,4))
#gplot(sem0.npn$graph, l="fdp", main="ALS npn")
#gplot(sem0.mvnS$graph, l="fdp", main="ALS mvnS")
#gplot(sem0.mvnK$graph, l="fdp", main="ALS mvnK")
#gplot(sem0.mvnP$graph, l="fdp", main="ALS mvnP")

}
\references{
Liu H, Lafferty J, and Wasserman L (2009). The Nonparanormal: Semiparametric Estimation of
High Dimensional Undirected Graphs. Journal of Machine Learning Research, 10(80), 2295-2328.

Harris N, and Drton M (2013). PC Algorithm for Nonparanormal Graphical Models.
Journal of Machine Learning Research, 14(69), 3365-3383.

Olsson U (1979). Maximum likelihood estimation of the polychoric correlation coefficient.
Psychometrika, 44(4), 443-460.

Mair P, and De Leeuw J (2008). Scaling variables by optimizing correlational and
non-correlational aspects in R. Journal of Statistical Software, 32(9), 1-23.

de Leeuw J (1988). Multivariate analysis with linearizable regressions. Psychometrika,
53, 437-454.
}
\author{
Mario Grassi \email{mario.grassi@unipv.it}
}
