% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/bigpca.R
\name{thin}
\alias{thin}
\title{Reduce one dimension of a large matrix in a strategic way}
\usage{
thin(bigMat, keep = 0.05, how = c("uniform", "correlation", "pca",
  "association"), dir = "", rows = TRUE, random = TRUE, hi.cor = TRUE,
  least = TRUE, pref = "thin", verbose = FALSE, ret.obj = TRUE, ...)
}
\arguments{
\item{bigMat}{a big.matrix object, or any argument accepted by get.big.matrix(), which includes
paths to description files or even a standard matrix object.}

\item{keep}{numeric, by default a proportion (decimal) of the original number of rows/columns to choose
for the subset. Alternatively, an integer>2 is taken to be the size of the desired subset,
e.g., for a dataset with 10,000 rows where you want a subset size of 1,000 you could set 'keep' as
either 0.1 or 1000.}

\item{how}{character, the method used to perform subset selection; only the first two characters
 are required and they are not case sensitive. Options are:
'uniform': evenly spaced selection when random=FALSE, or random selection otherwise;
 see uniform.select().
'correlation': most correlated subset when hi.cor=TRUE, least correlated otherwise;
 see subcor.select().
'pca': most representative variables of the principal components of a subset;
 see subpc.select().
'association': subset most correlated with phenotype if least=FALSE, or least correlated otherwise;
 see select.least.assoc().}

\item{dir}{directory containing the filebacked.big.matrix, same as 'dir' for get.big.matrix.}

\item{rows}{logical, whether to choose a subset of rows (TRUE), or columns (FALSE). rows is always
TRUE when using 'association' methods.}

\item{random}{logical, whether to use random selections and subsets (TRUE), or whether to use uniform
selections that should give the same result each time for the same dataset (FALSE)}

\item{hi.cor}{logical, if using 'correlation' methods, then whether to choose the most correlated (TRUE)
or least correlated (FALSE).}

\item{least}{logical, if using 'association' methods, whether to choose the variables least associated
(TRUE) or most associated (FALSE) with phenotype.}

\item{pref}{character, a prefix for big.matrix backing files generated by this selection}

\item{verbose}{logical, whether to display more information about processing}

\item{ret.obj}{logical, whether to return the result as a big.matrix object (TRUE), or as a reference
to the binary file containing the big.matrix.descriptor object [either can be read with get.big.matrix() or
prv.big.matrix()]}

\item{...}{other arguments to be passed to uniform.select, subpc.select, subcor.select, or select.least.assoc}
}
\value{
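A smaller big.matrix with fewer rows or columns than the original matrix; if ret.obj=FALSE,
a reference to the binary file containing the big.matrix.descriptor object is returned instead.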
}
\description{
Thin the rows (or columns) of a large matrix or big.matrix in order to reduce the size of the
dataset while retaining important information. The subset size can be given as a proportion of
the original size or as a new number of rows/columns, and there are four methods to choose the
data subset. Simple uniform or random selection can be specified. Other methods look at the
correlation structure of a subset of the data to derive non-arbitrary selections, using
correlation, PCA, or association with a phenotype or some other categorical variable. Each of
the four methods has a separate function in this package, which you can see for more
information; this function is merely a wrapper to select one of the four.
}
\examples{
bmat <- generate.test.matrix(5,big.matrix=TRUE)
prv.big.matrix(bmat)
# make 5\% random selection:
lmat <- thin(bmat)
prv.big.matrix(lmat)
# make 10\% most orthogonal selection (lowest correlations):
lmat <- thin(bmat,.10,"cor",hi.cor=FALSE)
prv.big.matrix(lmat)
# make 10\% most representative selection:
lmat <- thin(bmat,.10,"PCA",ret.obj=FALSE) # return file name instead of object
print(lmat)
prv.big.matrix(lmat)
# make 25\% selection most correlated to phenotype
# create random phenotype variable
pheno <- rep(1,ncol(bmat)); pheno[which(runif(ncol(bmat))<.5)] <- 2
lmat <- thin(bmat,.25,"assoc",phenotype=pheno,least=FALSE,verbose=TRUE)
prv.big.matrix(lmat)
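# the calls below are an illustrative sketch based only on the arguments documented
# above; they assume 'bmat' has more than 3 rows and at least a handful of columns
# keep a quarter of the columns (rather than rows), evenly spaced so that
# the same selection should be returned each time:
lmat <- thin(bmat,.25,"uniform",rows=FALSE,random=FALSE)
prv.big.matrix(lmat)
# 'keep' given as a subset size (an integer>2) instead of a proportion;
# here roughly a tenth of the rows:
n.keep <- max(3,round(nrow(bmat)/10))
lmat <- thin(bmat,keep=n.keep)
prv.big.matrix(lmat)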
# tidy up temporary files:
unlink(c("thin.bck","thin.dsc","thin.RData"))
}
\author{
Nicholas Cooper
}
\seealso{
\code{\link{uniform.select}}, \code{\link{subpc.select}}, \code{\link{subcor.select}},
\code{\link{select.least.assoc}}, \code{\link{big.select}}, \code{\link{get.big.matrix}}
}

