% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/whichFunctions.R
\name{whichAreIncluded}
\alias{whichAreIncluded}
\title{Identify columns that are included in others}
\usage{
whichAreIncluded(dataSet, keep_cols = NULL, verbose = TRUE)
}
\arguments{
\item{dataSet}{Matrix, data.frame or data.table}

\item{keep_cols}{List of columns not to drop (list of character, defaults to NULL)}

\item{verbose}{Should the algorithm print messages as it runs? (logical, defaults to TRUE)}
}
\value{
A list of indexes of columns that are included in another column of the \code{dataSet}.
}
\description{
Find all the columns that do not contain more information than another column. For example, if 
you have a column with an amount and another with the same amount but rounded, the second 
column is included in the first.
}
\details{
This function performs an exponential search and compares every pair of columns. \cr
Be very careful when using this function: \cr
- if there is an id column, every other column will be reported as included in it; \cr
- the order of the columns influences the result.\cr
\cr
Finally, with some machine learning algorithms it is not always wise to drop 
columns even if they add no information: the extreme case is the id column, where dropping 
every column included in it would leave only the id.
}
\examples{
# Load toy data set
require(data.table)
data(messy_adult)

# Reduce set size to save time (you can run it on full set)
messy_adult <- messy_adult[1:100, ]

# Check for included columns
whichAreIncluded(messy_adult)

# The result also contains columns that are constant, duplicated or bijections
# Add a column that is strictly included in another one
messy_adult$are50OrMore <- messy_adult$age > 50
whichAreIncluded(messy_adult[, .(age, are50OrMore)])

# As one can see, this column, which adds no information beyond age, is spotted.

# But be careful: if there is an id column, every other column is reported as included in it
messy_adult$id <- seq_len(nrow(messy_adult)) # build id
whichAreIncluded(messy_adult)
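
# Illustrative sketch only, NOT the implementation used by whichAreIncluded:
# one way to read the notion of inclusion is that column b is included in
# column a when each value of a maps to exactly one value of b.
# The helper name isIncludedIn is hypothetical and exists only for this example.
isIncludedIn <- function(a, b) {
  # b adds no information to a when the number of distinct (a, b) pairs
  # equals the number of distinct values of a
  nrow(unique(data.frame(a = a, b = b))) == length(unique(a))
}
amount <- c(10.2, 10.4, 11.7, 11.7)
isIncludedIn(amount, round(amount))     # TRUE: the rounded amount is included in the amount
isIncludedIn(round(amount), amount)     # FALSE: the converse does not hold
isIncludedIn(seq_along(amount), amount) # TRUE: under this reading, any column is included in an id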
}
