\name{ccSim}
\Rdversion{1.2}
\alias{ccSim}
\title{
Simulation function for case-control study designs
}
\description{
  Monte Carlo-based evaluation of the operating characteristics of the maximum likelihood estimator (MLE) for the coefficients of a logistic regression model under a case-control design.
}
\usage{
ccSim(B=1000, betaTruth, X, N, nCC, r, refDesign=1, alpha=0.05,
      threshold=c(-Inf, Inf), digits=NULL, betaNames=NULL,
      monitor=NULL,NI=NULL)
}
\arguments{
  \item{B}{
   The number of datasets generated by the simulation.
}
  \item{betaTruth}{
    Regression coefficients from the logistic regression model.
}
  \item{X}{
    Design matrix for the logistic regression model. The first column should correspond to the intercept. For each exposure, the baseline group should be coded as 0, the first level as 1, and so on. The number of levels cannot exceed 100.
}
  \item{N}{
    A numeric vector providing the sample size for each row of the design matrix, \code{X}. 
}
  \item{nCC}{
    A numeric value indicating the total case-control sample size. 
}
  \item{r}{
    A numeric value indicating the control:case ratio in the case-control sample. If a vector is provided, separate simulations are run for each value.
}
  \item{refDesign}{
		A numeric value indicating the control:case ratio for the referent design (for the relative uncertainty calculation).
}
  \item{alpha}{
    Type I error rate assumed for the evaluation of coverage probabilities and power.
}
  \item{threshold}{
		An interval that specifies truncation of the Monte Carlo sampling distribution of the MLE.
}
  \item{digits}{
    Integer indicating the precision to be used for the output.
}
  \item{betaNames}{
    An optional character vector of names for the regression coefficients, \code{betaTruth}.
}
\item{monitor}{
	Numeric value indicating how often \code{ccSim()} reports real-time progress on the simulation, as the \code{B} datasets are generated and evaluated. The default of \code{NULL} indicates no output.
      }
\item{NI}{
    A pair of integers providing the outcome-specific phase I sample
    sizes when the phase I data are drawn as a case-control sample. The
    first element corresponds to the controls and the second to the
    cases.
  }
}
\details{
A simulation study is performed to evaluate the operating
characteristics of the MLE for \code{betaTruth} from a case-control
design (Prentice and Pyke, 1979). The operating characteristics are
evaluated using the Monte Carlo sampling distribution of the
estimator. The latter is generated using the following steps:
\itemize{
  \item{}{(i) Specify the (joint) marginal exposure distribution of the underlying population, using \code{X} and \code{N}.}
  \item{}{(ii) Simulate outcomes for all \code{sum(N)} individuals in the population, based on an underlying logistic regression model specified via \code{betaTruth}.}
  \item{}{(iii) Sample \code{n0} controls and \code{n1} cases, on the basis of \code{nCC} and \code{r}.}
  \item{}{(iv) Evaluate the MLE and its estimated standard error, and store the results.}
  \item{}{(v) Repeat steps (ii)-(iv) \code{B} times.}
}
  All case-control MLEs are computed using the \code{\link{glm}} function.
	
	The correspondence between \code{betaTruth} and \code{X}, specifically the ordering of elements, is based on successive application of \code{\link{factor}} to each column of \code{X}, excluding the intercept.
	
	When evaluating operating characteristics of the MLE, some simulated datasets may result in unusually large or small estimates, particularly when the case-control sample size, \code{nCC}, is small. In some settings, it may be desirable to truncate the Monte Carlo sampling distribution prior to evaluating operating characteristics. The \code{threshold} argument specifies the interval outside of which MLEs are ignored. The default is such that all \code{B} datasets are kept.
}
\value{
  \code{ccSim()} returns an object of class "ccSim", a list containing all the input arguments, as well as the following components:
    \item{mean}{
    	Mean of the Monte Carlo sampling distribution for each regression coefficient estimator.
    }
    \item{bias.mean}{
    	Bias based on the mean, calculated as \code{mean} - \code{betaTruth}. 
    }
    \item{pct.bias.mean}{
    	Percent bias based on mean, calculated as ((\code{mean} - \code{betaTruth}) / \code{betaTruth}) x 100. If a regression coefficient is
      zero, percent bias is not calculated and an NA is returned.
    }
    \item{median}{
    	Median of the Monte Carlo sampling distribution for each regression coefficient estimator.
    }  
    \item{bias.median}{
    	Bias based on the median, calculated as \code{median} - \code{betaTruth}. 
    }
    \item{pct.bias.med}{
    	Percent bias based on median, calculated as ((\code{median} - \code{betaTruth}) / \code{betaTruth}) x 100. If a regression coefficient is
      zero, median percent bias is not calculated and an NA is returned.
    }
    \item{sd}{
    	Standard deviation of the Monte Carlo sampling distribution for each regression coefficient estimator.
    }
    \item{relative.uncertainty}{
    	Relative uncertainty, defined as the ratio of the standard deviation of the Monte Carlo sampling
    	distribution for each estimator to the standard deviation of the Monte Carlo sampling distribution
    	for the estimator corresponding to \code{refDesign}. The ratio is multiplied by 100.
    }
    \item{mean.squared.error}{
      Mean squared error of the Monte Carlo sampling distribution for each regression coefficient
      estimator.
    }
    \item{reported.standard.error}{
    	Mean of the Monte Carlo sampling distribution for the standard error estimates reported by \code{\link{glm}}.
    }
    \item{sd.reported.vs.actual}{
    	Ratio of the mean reported standard error to the standard deviation of the Monte Carlo sampling
    	distribution for each regression coefficient estimator. The ratio is multiplied by 100.
    }
    \item{coverage.probability}{
    	Coverage probability for Wald-based confidence intervals, evaluated on the basis of an \code{alpha} type I error rate.
    }
    \item{power}{
    	Power against the null hypothesis that the regression coefficient is zero for a Wald-based test with an \code{alpha} type I error rate.
    }
    \item{na}{
    	A matrix consisting of the number of datasets excluded from the power calculations (i.e. set to \code{NA}), for each simulation performed. The three reasons are: (1) lack of convergence indicated by \code{NA} point estimates returned by \code{\link{glm}}, (2) lack of convergence indicated by \code{NA} standard error point estimates returned by \code{\link{glm}}, (3) exclusion on the basis of the \code{threshold} argument.
    }
}
\note{
	A generic print method provides formatted output of the results.
}
\references{
  Prentice, R. and Pyke, R. (1979) "Logistic disease incidence models and case-control studies." Biometrika 66:403-411.
}
\author{
  Takumi Saegusa, Sebastien Haneuse
}
\seealso{
  \code{\link{plotPower}}.
}
\examples{
## Load the Ohio data
data(Ohio)

## Design matrix, fitted logistic regression model, and coefficient names
XM   <- cbind(Int=1, Ohio[,1:3])
fitM <- glm(cbind(Death, N-Death) ~ factor(Age) + Sex + Race, data=Ohio,
            family=binomial)
betaNamesM <- c("Int", "Age1", "Age2", "Sex", "Race")

## Single case-control design

\dontrun{
ccResults1 <- ccSim(B=1000, betaTruth=fitM$coef, X=XM, N=Ohio$N,
                    nCC=500, r=1, betaNames=betaNamesM, monitor=100)
ccResults1}
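
## Illustrative sketch (not the internal code of ccSim()): how a total
## case-control sample size, nCC, and a control:case ratio, r, translate
## into numbers of controls (n0) and cases (n1). The helper name splitCC()
## and the rounding rule are assumptions made for this example only.
splitCC <- function(nCC, r) {
  n1 <- round(nCC / (1 + r))  # cases: n0 / n1 = r and n0 + n1 = nCC
  n0 <- nCC - n1              # controls
  c(n0 = n0, n1 = n1)
}
splitCC(nCC = 500, r = 1)
sapply(c(0.25, 1, 4), splitCC, nCC = 500)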

## Examining unbalanced case-control designs

\dontrun{
ccResults2 <- ccSim(B=1000, betaTruth=fitM$coef, X=XM, N=Ohio$N,
                    nCC=500, r=c(0.25, 0.33, 0.5, 1, 2, 3, 4),
                    betaNames=betaNamesM, monitor=100)
ccResults2}
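
## Illustrative sketch of steps (i)-(v) from the Details section for a toy
## population, using base R only. This is not the internal code of ccSim();
## the object names (Xtoy, Ntoy, betaToy, pToy, oneRep) and all numeric
## values are hypothetical and chosen only to keep the example fast.
set.seed(1)
Xtoy    <- cbind(Int = 1, Z = c(0, 1))   # (i) two exposure groups
Ntoy    <- c(60000, 40000)               # (i) population size per row of Xtoy
betaToy <- c(-3, 0.5)                    # assumed true coefficients
pToy    <- plogis(betaToy[1] + betaToy[2] * Xtoy[, "Z"])

oneRep <- function(n0 = 250, n1 = 250) {
  ## (ii) simulate the number of cases in each exposure group
  cases    <- rbinom(length(Ntoy), size = Ntoy, prob = pToy)
  controls <- Ntoy - cases
  ## (iii) sample n0 controls and n1 cases without replacement
  z0 <- sample(rep(Xtoy[, "Z"], controls), n0)
  z1 <- sample(rep(Xtoy[, "Z"], cases), n1)
  ## (iv) fit the logistic model to the case-control sample
  fit <- glm(y ~ z, family = binomial,
             data = data.frame(y = rep(0:1, c(n0, n1)), z = c(z0, z1)))
  coef(summary(fit))["z", c("Estimate", "Std. Error")]
}

## (v) repeat a small number of times and average the slope results;
## only the slope is compared with betaToy[2], since the intercept is not
## directly estimable from case-control data
est <- replicate(20, oneRep())
rowMeans(est)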
}



