\name{cfPermute}
\alias{cfPermute}
\title{
Permutation testing to indicate statistical significance of performance
}
\description{
The \code{cfPermute} function performs permutation testing on a classification ensemble produced by \code{\link{cfBuild}}. In essence, it compares the classification performance achieved for a given dataset with the performance that would be achieved by random chance, and therefore provides an indication of the statistical significance of a classifier's performance.
}
\usage{
cfPermute(inputData, inputClass, bootNum = 100, ensNum = 100, permNum = 100, 
          parallel = TRUE, cpus = NULL, type = "SOCK", socketHosts = NULL, 
          progressBar = TRUE)
}
\arguments{
  \item{inputData}{The input data matrix as provided by the user (mandatory field).}
  \item{inputClass}{The input class vector as provided by the user (mandatory field).}
  \item{bootNum}{The number of bootstrap iterations during the optimisation process. By default, the value is set to 100.}
  \item{ensNum}{The number of classifiers that constitute the ensemble for each permutation. By default, the value is set to 100.}
  \item{permNum}{The number of permutations to be executed. By default, the value is set to 100.}
  \item{parallel}{Boolean value that determines parallel or sequential execution. By default set to \code{TRUE}. For more details, see \code{\link[snowfall]{sfInit}}.}
  \item{cpus}{Numeric value that provides the number of CPUs requested for the cluster. For more details, see \code{\link[snowfall]{sfInit}}.}
  \item{type}{The type of cluster. It can take the values \code{"SOCK"}, \code{"MPI"}, \code{"PVM"} or \code{"NWS"}. By default, \code{type} is equal to \code{"SOCK"}. For more details, see \code{\link[snowfall]{sfInit}}.}
  \item{socketHosts}{Host list for socket clusters. Only needed in socket mode (\code{"SOCK"}) and when using more than one machine; if only the local machine (localhost) is used, no list is needed. For more details, see \code{\link[snowfall]{sfInit}}.}
  \item{progressBar}{Boolean value that determines whether a progress bar should be displayed. By default set to \code{TRUE}.}
}
\value{
The \code{cfPermute} function returns an object in the form of an R list. The attributes of the list can be accessed by executing the \code{\link{attributes}} command. More specifically, the list includes:
  \item{avgAcc}{The average test accuracy across all ensembles within each permutation iteration.}
  \item{totalTime}{The overall execution time of permutation testing.}
  \item{execTime}{The individual execution times for each permutation round.}
  \item{permList}{For each permutation iteration, a new object (list) is generated by the function \code{\link{cfBuild}}, using the initial data and the permuted class as input. This list has as many elements as the value of the \code{permNum} argument passed to \code{cfPermute}. For more information on the contents of each object, see \code{\link{cfBuild}}.}
}
\details{
Permutation testing is a widely applied process that provides an indication of the statistical significance of classification results. In a permutation test, the entries of the original class vector (\code{inputClass}) are randomly shuffled, while the class distribution is preserved. This destroys all sample membership information, since the samples of a permuted dataset correspond to randomly assigned classes. The whole model building process described in \code{\link{cfBuild}} is then repeated for the "false" (permuted) classes. In general, at least 100 permutations (the default value of \code{permNum}) should be performed so that a stable distribution of results is obtained.
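
As a rough illustration of the shuffling step only (a base R sketch, not the internal code of \code{cfPermute}), reordering the labels with \code{sample} changes which class each sample is paired with while leaving the class counts unchanged. The seed and the use of the \code{iris} data are assumptions made for this example:

\preformatted{
set.seed(1)                        # arbitrary seed, for a reproducible shuffle
inputClass <- iris[, 5]            # example class vector (3 classes, 50 samples each)
permClass  <- sample(inputClass)   # random reordering of the class labels

# The class counts are identical before and after shuffling;
# only the assignment of labels to samples has changed
all(table(inputClass) == table(permClass))
}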
}
\references{
  Good, P. I.\cr
  \emph{Permutation, Parametric and Bootstrap Tests of Hypotheses}\cr
  3rd ed, Springer-Verlag New York Inc, Dordrecht, 2006
  
  Hesterberg, T., Moore, D. S., Monaghan, S., Clipson, A. and Epstein, R.\cr
  \emph{Bootstrap methods and permutation tests}\cr
  Introduction to the Practice of Statistics, vol. 5, pp. 1-70, 2005
}
\seealso{
  \code{\link{getPerm5Num}},
  \code{\link{ggPermHist}}
}
\examples{
\dontrun{
data(iris)

irisClass <- iris[,5]
irisData  <- iris[,-5]
            
ens <- cfBuild(irisData, irisClass, bootNum = 100, ensNum = 100, parallel = TRUE, 
               cpus = 4, type = "SOCK")

# Execute 5 permutation rounds; in each permutation test, an ensemble of 20 classifiers 
# is constructed, each running 10 bootstrap iterations during the optimization process.
# The default values for permutation testing are ensNum = bootNum = permNum = 100

permObj <- cfPermute(irisData, irisClass, bootNum = 10, ensNum = 20, permNum = 5, 
                     parallel = TRUE, cpus = 4, type = "SOCK")

# List the attributes of the permutation object
attributes(permObj)

# Get the vector of averaged accuracies, one for each permutation 
# (each permutation is an independent classification ensemble)
permObj$avgAcc

# Get the overall elapsed time for the permutation process 
permObj$totalTime[3]

# Get the vector of individual execution times for each permutation
permObj$execTime

# Access the first ensemble in the permutation list
permObj$permList[[1]]
}
}
\keyword{nonparametric}
\keyword{models}
