\name{test.csv}
\alias{test.csv}
\title{
Cross-validate a Gaussian Process emulator
}
\description{
Tests an emulator using cross-validation. A user-specified number of runs
(up to half of the ensemble) is withheld from the emulator. The emulator is
then used to predict at the withheld parameter settings, and the results can
be plotted. Optionally, the prediction standard deviation can also be
plotted. The results are reproducible if the seed number is specified.
}
\usage{
test.csv(final.emul, num.test, plot.std, theseed = NULL, test.runind =
NULL, make.plot = TRUE)
}
\arguments{
  \item{final.emul}{
A standard emulator object. It can be generated by, for example, the
'emulator' function.
}
  \item{num.test}{
Number of test runs to withhold at random from the emulator. Cannot be
more than half of the ensemble.
}
  \item{plot.std}{
If \code{TRUE}, the prediction standard deviation is plotted around the
emulator prediction.
}
  \item{theseed}{
Seed for the random number generator. Ignored if \code{test.runind} is
specified. Default is \code{NULL}.
}
  \item{test.runind}{
If not \code{NULL}, a monotonically increasing vector of run indices at which
to test the emulator. Must contain exactly \code{num.test} elements.
If \code{NULL}, run indices are generated at random. Default is \code{NULL}.
}
  \item{make.plot}{
If \code{TRUE} (the default), produces a cross-validation plot.
}
}
\details{
The function withholds model runs from the emulator, and then uses the
reduced emulator to predict at the withheld parameter settings. Up to
half of the ensemble can be withheld. If \code{make.plot=TRUE}, a plot of
model output and emulator prediction at the withheld settings is produced.
Setting \code{plot.std=TRUE} adds the prediction standard deviation to the
plot. If \code{plot.std=TRUE} but \code{make.plot=FALSE}, a warning is given
and nothing is plotted. By default, \code{num.test} runs are withheld at
random. If \code{theseed} is specified, the results become reproducible
because the seed passed to the R random number generator is fixed.
Optionally, the withheld run indices can be specified explicitly via the
vector \code{test.runind}. If both \code{theseed} and \code{test.runind}
are specified, \code{theseed} is ignored.

The function prints the excluded run indices and some basic guidance.
Runs at parameter space boundaries are handled by assigning NA to the
relevant rows of the output matrices. If prediction at a particular
withheld run gives an error (most likely because extrapolation is not
allowed and the excluded run is at a corner of the parameter space), that
run is not plotted. Model output is plotted in beige, and emulator output
in brown.
  
'Withholding' runs from an emulator means that the emulator statistical
parameters are kept the same, while the withheld runs are removed from
\code{final.emul$Theta.mat}, \code{final.emul$Y.mat},
\code{final.emul$X.mat}, and \code{final.emul$vecC}. The ensemble size
\code{final.emul$p} is reduced accordingly.
}
\value{
A list with components:
\item{$model.out.test}{Model output at test parameter settings; [row,col] =
[test run index, time index].}
\item{$emul.out.test}{Emulator output at test parameter settings; [row,col] =
[test run index, time index].}
\item{$emul.std.test}{Emulator prediction standard deviation at test parameter
settings; [row,col] = [test run index, time index].}
The time vector for the columns of all components is
\code{final.emul$t.vec}.
}
\references{
  R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
\href{http://www.geosc.psu.edu/~rtonkono}{www.geosc.psu.edu/~rtonkono}.
}
\seealso{
\code{\link{test.all}}, \code{\link{emul.predict}}
}
\examples{
# Test the 1D emulator by withholding the second parameter
# setting, and then predicting at the withheld run
data(emul.1D)
test.1D <- test.csv(final.emul=emul.1D, num.test=1, plot.std=FALSE,
                    test.runind=2, make.plot=FALSE)

# Create a custom plot
plot.default(NA, xlim=range(emul.1D$t.vec), ylim=range(test.1D$model.out.test),
             xlab="Year", ylab="Sample Output", main="Emulator Test at Theta=1",
             cex.lab=1.3, cex.axis=1.3, cex.main=1.3)
lines(emul.1D$t.vec, test.1D$model.out.test, lwd=4, lty=2, col="green")
# Overlay the emulator prediction; it is close to the model output
lines(emul.1D$t.vec, test.1D$emul.out.test, lwd=2, col="yellow")

# Produce a legend
model.max <- max(test.1D$model.out.test)
# We already know that time ranges from 0 to 10 so figuring out x placement
# for the legend is not hard
legend(1, model.max, c("Model Output", "Emulator Predictions"),
       col=c("green", "yellow"), lty=c(2,1), lwd=c(4,2), cex=1.3)
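
# A minimal sketch, not part of the original example: summarize the
# cross-validation error numerically from the list returned above.
# It relies only on the documented components model.out.test and
# emul.out.test; resid.1D, rmse.1D and maxabs.1D are illustrative names.
# na.rm=TRUE skips rows set to NA for runs at parameter space boundaries.
resid.1D <- test.1D$emul.out.test - test.1D$model.out.test
rmse.1D <- sqrt(mean(resid.1D^2, na.rm=TRUE))
maxabs.1D <- max(abs(resid.1D), na.rm=TRUE)
print(c(RMSE=rmse.1D, MaxAbsError=maxabs.1D))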



# Test the SICOPOLIS emulator at three parameter settings drawn at random;
# theseed=1234321 makes the selection reproducible
data(emul.Sicopolis)
# Plot the standard deviation in all three cases
test.csv(final.emul=emul.Sicopolis, num.test=3, plot.std=TRUE, theseed=1234321,
         make.plot=TRUE)
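
# A hedged sketch, not part of the original example: check that fixing
# theseed reproduces the same withheld runs, as described in Details.
# rep.a and rep.b are illustrative names; plotting is switched off.
rep.a <- test.csv(final.emul=emul.Sicopolis, num.test=3, plot.std=FALSE,
                  theseed=1234321, make.plot=FALSE)
rep.b <- test.csv(final.emul=emul.Sicopolis, num.test=3, plot.std=FALSE,
                  theseed=1234321, make.plot=FALSE)
# Expected to be TRUE if the same runs are withheld in both calls
identical(rep.a$model.out.test, rep.b$model.out.test)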
}
