\name{ahist}
\alias{ahist}
\title{Adaptive Histograms}
\description{
Generate or plot histograms that adapt to patterns in univariate data. The number and widths of histogram bins are automatically calculated based on an optimal \var{k}-means clustering of the input data. As a result, the bins are unlikely to be of equal width.
}

\usage{
ahist(x, k = c(1,9), breaks=NULL, data=NULL, plot = TRUE,
      xlab = deparse(substitute(x)), main = NULL,
      col = NULL, lwd = graphics::par("lwd"),
      col.stick = "gray", lwd.stick = 1, add.sticks=TRUE,
      style = c("discontinuous", "midpoints"),
      skip.empty.bin.color=TRUE,
      \dots)
}

\arguments{
  \item{x}{a numeric vector of data or an object of class \code{"Ckmeans.1d.dp"}.

If \code{x} is a numeric vector, all \code{NA} elements must be removed from \code{x} before calling this function.

If \code{x} is an object of class \code{"Ckmeans.1d.dp"}, the clustering information in \code{x} will be used and the \code{data} argument must contain the numeric vector to be plotted.}

  \item{k}{either an exact integer number of bins/clusters, or a vector of length two specifying the minimum and maximum numbers of bins/clusters to be examined. The default is \code{c(1,9)}. When \code{k} is a range, the actual number of clusters is determined by the Bayesian information criterion. This argument is ignored if \code{x} is an object of class \code{"Ckmeans.1d.dp"}.}

  \item{breaks}{This argument is defined in \code{\link{hist}}. If it is provided, optimal univariate \var{k}-means clustering is not applied; instead, the histogram is generated by the \code{\link{hist}} function in \pkg{\link{graphics}}. Sticks representing the data can still be plotted by specifying \code{add.sticks=TRUE}.}

  \item{data}{a numeric vector. If \code{x} is an object of class \code{"Ckmeans.1d.dp"}, the \code{data} argument must be provided. If \code{x} is a numeric vector, this argument is ignored.}

 \item{plot}{a logical. If \code{TRUE}, the histogram will be plotted.}

 \item{xlab}{a character string. The x-axis label for the plot.}

 \item{main}{a character string. The title for the plot.}

 \item{col}{a character string. The fill color of the histogram bars.}

 \item{lwd}{a numeric value. The line width of the border of the histogram bars.}

 \item{col.stick}{a character string. The color of the sticks above the x-axis. See Details.}

 \item{lwd.stick}{a numeric value. The line width of the sticks above the x-axis. See Details.}

 \item{add.sticks}{a logical. If \code{TRUE} (default), the sticks representing the data will be added to the bottom of the histogram. Otherwise, sticks are not plotted.}

 \item{style}{a character string. The style of the adaptive histogram, either \code{"discontinuous"} (default) or \code{"midpoints"}. See Details.}

 \item{skip.empty.bin.color}{a logical. If \code{TRUE} (default), an empty (invisible) bin will be assigned the same bar color as the next bin. This is useful when all provided colors are to be used for non-empty bins. If \code{FALSE}, each bin will be assigned a bar color from \code{col}. A value of \code{TRUE} will coordinate the bar and stick colors.}

 \item{\dots}{additional arguments to be passed to \code{\link{hist}} or \code{\link{plot.histogram}}.}
}

\author{
	Joe Song
}

\details{
The histogram is by default plotted using the \code{\link{plot.histogram}} method. The plot can be optionally disabled with the \code{plot=FALSE} argument. The original input data are shown as sticks just above the horizontal axis.

If the \code{breaks} argument is not specified, the number of histogram bins is the optimal number of univariate \var{k}-means clusters, estimated using the Bayesian information criterion evaluated on Gaussian mixture models fitted to the input data in \code{x}.

If the \code{breaks} argument is not provided, breaks in the histogram are derived from clusters identified by optimal univariate \var{k}-means (\code{\link{Ckmeans.1d.dp}}) in one of two styles. With the default \code{"discontinuous"} style, the bin width of each bar is determined according to a data-adaptive rule; the \code{"midpoints"} style uses the midpoints of cluster border points to determine the bin width. For clustered data, the \code{"midpoints"} style generates bins that are too wide to capture the cluster patterns. In contrast, the \code{"discontinuous"} style is more adaptive to the data because it allows some bins to be empty, which makes the histogram bars discontinuous.}

\value{
An object of class \code{histogram} defined in \code{\link{hist}}. It has an S3 \code{plot} method \code{\link{plot.histogram}}.
}

\references{
Wang, H. and Song, M. (2011) Ckmeans.1d.dp: optimal \var{k}-means clustering in one dimension by dynamic programming. \emph{The R Journal} \bold{3}(2), 29--33. Retrieved from \url{https://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf}
}

\seealso{
  \code{\link{hist}} in package \pkg{\link{graphics}}.
}

\examples{
# Example 1: plot an adaptive histogram from data generated by
#   a Gaussian mixture model with three components
x <- c(rnorm(40, mean=-2, sd=0.3),
       rnorm(45, mean=1, sd=0.1),
       rnorm(70, mean=3, sd=0.2))
ahist(x, col="lightblue", sub=paste("n =", length(x)),
      col.stick="salmon", lwd=2,
      main="Example 1. Gaussian mixture model with 3 components\n(one bin per component)")


# Example 2: plot an adaptive histogram from data generated by
#   a Gaussian mixture model with three components using a given
#   number of bins
ahist(x, k=9, col="lavender", col.stick="salmon",
      sub=paste("n =", length(x)), lwd=2,
      main="Example 2. Gaussian mixture model with 3 components\n(on average 3 bins per component)")
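

# Example 2b (illustrative sketch): the same data and number of bins
#   as in Example 2, drawn with the "midpoints" style, in which
#   breaks are placed at the midpoints of cluster border points.
#   Compare the bin widths with the default "discontinuous" style
#   used in Example 2.
ahist(x, k=9, style="midpoints", col="lavender", col.stick="salmon",
      sub=paste("n =", length(x)), lwd=2,
      main="Example 2b. Same data with the midpoints style")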


# Example 3: The DNase data frame has 176 rows and 3 columns of
#   data obtained during development of an ELISA assay for the
#   recombinant protein DNase in rat serum.

data(DNase)
res <- Ckmeans.1d.dp(DNase$density)
kopt <- length(res$size)
ahist(res, data=DNase$density, col=rainbow(kopt), col.stick=rainbow(kopt)[res$cluster],
      sub=paste("n =", length(DNase$density)), border="transparent",
      xlab="Optical density of protein DNase",
      main="Example 3. ELISA assay of DNase in rat serum")


# Example 4: Add sticks to a histogram generated by the
#   hist() function provided by R.

ahist(DNase$density, breaks="Sturges", col="palegreen",
      add.sticks=TRUE, col.stick="darkgreen",
      main="Example 4. ELISA assay of DNase in rat serum\n(Equal width bins)",
      xlab="Optical density of protein DNase")
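

# Example 5 (illustrative sketch): compute an adaptive histogram
#   without plotting it, inspect the returned "histogram" object,
#   and plot it later. The variable name h is chosen here only
#   for illustration.

h <- ahist(DNase$density, plot=FALSE)
class(h)
h$breaks   # bin boundaries; the bins are generally not of equal width
h$counts   # number of data points falling in each bin
plot(h, col="wheat",
     xlab="Optical density of protein DNase",
     main="Example 5. Plotting a stored adaptive histogram")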
}
