% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/OneR.R
\name{optbin}
\alias{optbin}
\title{Optimal Binning Function}
\usage{
optbin(data, formula = NULL, method = c("logreg", "naive"),
  na.omit = TRUE)
}
\arguments{
\item{data}{data frame that contains the data. When \code{formula = NULL} (the default), the last column must be the target variable.}

\item{formula}{optional formula, e.g. \code{Species ~ .}, that names the target variable; an alternative to placing the target in the last column of \code{data}.}

\item{method}{a character string specifying the method for optimal binning, see 'Details'; can be abbreviated.}

\item{na.omit}{logical value indicating whether instances with missing values should be removed.}
}
\value{
A data frame with the target variable in the last column.
}
\description{
Discretizes all numerical data in a data frame into categorical bins whose cut points are optimally aligned with the target categories; each numerical column is returned as a factor.
When building a OneR model, this can result in fewer rules with enhanced accuracy.
}
\details{
The cut points are calculated by pairwise logistic regressions (method \code{"logreg"}) or as the means of the expected values of the respective classes (method \code{"naive"}).
The function is likely to give unsatisfactory results when the distributions of the respective classes are not (linearly) separable. Method \code{"naive"} should only be used when the distributions are (approximately) normal.
Even in that case \code{"logreg"} should give comparable results, which is why it is the preferred (and therefore default) method.

Character strings and logical values are coerced into factors. Matrices are coerced into data frames. If the target is numeric, it is turned into a factor with one level per distinct value, and a warning is given.

When \code{na.omit = FALSE}, an additional level \code{"NA"} is added to each factor with missing values.
}
\examples{
data <- iris # without optimal binning
model <- OneR(data, verbose = TRUE)
summary(model)

data_opt <- optbin(iris) # with optimal binning
model_opt <- OneR(data_opt, verbose = TRUE)
summary(model_opt)

## The same with the formula interface:
data_opt <- optbin(formula = Species ~., data = iris)
model_opt <- OneR(data_opt, verbose = TRUE)
summary(model_opt)
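
## A minimal sketch of the other documented options (method = "naive" and
## na.omit = FALSE). The object names data_naive, iris_na and data_na are
## illustrative only, and the NA positions are chosen arbitrarily.
data_naive <- optbin(iris, method = "naive") # cut points from the class means
str(data_naive)

iris_na <- iris
iris_na[c(1, 51, 101), "Sepal.Length"] <- NA # introduce a few missing values
data_na <- optbin(iris_na, na.omit = FALSE) # keep instances with missing values
levels(data_na$Sepal.Length) # per 'Details', an "NA" level should be present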

}
\author{
Holger von Jouanne-Diedrich
}
\references{
\url{http://vonjd.github.io/OneR/}
}
\seealso{
\code{\link{OneR}}, \code{\link{bin}}
}
\keyword{binning}
\keyword{discretization}
\keyword{discretize}

