% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/std_index.R
\name{std_index}
\alias{std_index}
\title{Calculate standardised indices}
\usage{
std_index(
  x_new,
  x_ref = x_new,
  timescale = NULL,
  dist = "empirical",
  return_fit = FALSE,
  moving_window = NULL,
  window_scale = NULL,
  agg_period = NULL,
  agg_scale = NULL,
  agg_fun = "sum",
  rescale = NULL,
  rescale_fun = "sum",
  index_type = "normal",
  ignore_na = FALSE
)
}
\arguments{
\item{x_new}{numeric; vector or time series to be converted to standardised indices.}

\item{x_ref}{numeric; vector or time series containing reference data to use when calculating the standardised indices.}

\item{timescale}{string; timescale of the data. Required if the time series is to be aggregated or rescaled.}

\item{dist}{string; distribution used to calculate the indices.}

\item{return_fit}{logical; return parameters and goodness-of-fit statistics for the distribution fit.}

\item{moving_window}{numeric; length of moving window on which to calculate the indices.}

\item{window_scale}{string; timescale of \code{moving_window}, default is the timescale of the data.}

\item{agg_period}{numeric; the number of values to aggregate over.}

\item{agg_scale}{string; timescale of \code{agg_period}, default is the timescale of the data.}

\item{agg_fun}{string; function used to aggregate the data over the aggregation period, default is "sum".}

\item{rescale}{string; the timescale that the time series should be rescaled to.}

\item{rescale_fun}{string; function used to rescale the data, default is "sum".}

\item{index_type}{string; the type of index: "normal" (default), "probability", or "bounded".}

\item{ignore_na}{logical; ignore NAs when rescaling the time series.}
}
\value{
Time series of standardised indices.
}
\description{
Takes a time series of a chosen variable (e.g. precipitation,
energy demand, or residual load) and returns a time series of standardised indices.
Indices can be calculated on any timescale.
}
\details{
Standardised indices are calculated by estimating the cumulative distribution function (CDF)
of the variable of interest, and using this to transform the measurements to
a standardised scale.

\code{std_index()} estimates the CDF using a time series of reference data \code{x_ref},
and applies the resulting transformation to the time series \code{x_new}. The result is
a time series of standardised \code{x_new} values. These standardised indices quantify
how extreme the \code{x_new} values are in reference to \code{x_ref}.
\code{x_new} and \code{x_ref} should therefore contain values of the same variable.
If \code{x_ref} is not specified, then \code{x_new} is also used to estimate the CDF.
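
As a rough illustration of this transformation, the base R sketch below estimates
an empirical CDF from a simulated reference sample and maps new values onto the
standard normal scale. It is not the internal implementation of \code{std_index()}:
the simulated data and the adjustment that keeps the probabilities strictly between
0 and 1 are assumptions made for this example.

\preformatted{
set.seed(1)
x_ref <- rgamma(500, shape = 2, rate = 1)  # simulated reference data (assumption)
x_new <- c(0.5, 2, 6)                      # values to be standardised

# empirical CDF of the reference data evaluated at the new values, adjusted
# so that the probabilities lie strictly inside (0, 1) and qnorm() stays finite
n <- length(x_ref)
p_new <- (1 + n * ecdf(x_ref)(x_new)) / (n + 2)

# map the probabilities onto the standard normal scale
si_new <- qnorm(p_new)
}

Larger values of \code{x_new} receive larger indices, and values that are unusual
relative to \code{x_ref} are mapped far from zero.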

\code{x_new} and \code{x_ref} can either be provided as vectors or xts time series.
In the latter case, the time series can be aggregated across timescales or rescaled.
This is useful, for example, if \code{x_new} contains hourly data, but interest is
in daily accumulations or averages of the hourly data.

The argument \code{rescale} converts the data to a different timescale. The original
timescale of the data can be manually specified using the argument \code{timescale}.
Otherwise, the function will try to automatically determine the timescale of the data.
Manually specifying the timescale of the data is generally more robust. The rescaling
is performed using the function \code{rescale_fun}. By default, this is assumed to be
\code{rescale_fun = "sum"}, so that values are added across the timescale of interest.
This can be changed to any user-specified function.

The argument \code{agg_period} aggregates the data across the timescale of interest.
This differs from \code{rescale} in that the resolution of the data remains the same.
\code{agg_period} is a number specifying how long the data should be aggregated across.
By default, it is assumed that \code{agg_period} is on the same timescale as \code{x_new}
and \code{x_ref}. For example, if the data is hourly and \code{agg_period = 24}, then
this assumes the data is to be aggregated over the past 24 hours. The scale of the
aggregation period can also be specified manually using \code{agg_scale}. For example,
one could also specify \code{agg_period = 1} and \code{agg_scale = "days"}, and this
would also aggregate the data over the past day. \code{agg_fun} specifies how the
data is to be aggregated; the default is \code{agg_fun = "sum"}.
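
The difference between rescaling and aggregating can be sketched with the xts and
zoo packages directly. This is only an illustration on simulated hourly data and
does not claim to reproduce how \code{std_index()} performs these steps internally.

\preformatted{
library(xts)  # also attaches zoo

# 72 hourly values (three days), simulated for illustration
idx <- seq(as.POSIXct("2019-01-01 00:00", tz = "UTC"), by = "hour", length.out = 72)
x <- xts(rep(1, 72), order.by = idx)

# rescaling: one value per day, so the resolution changes from hourly to daily
x_daily <- apply.daily(x, sum)

# aggregating: a rolling sum over the past 24 hours, so the resolution stays hourly
x_roll <- rollapply(x, width = 24, FUN = sum, align = "right", fill = NA)
}

Rescaling reduces the number of values, whereas aggregating keeps one value per
original time step (with missing values until a full window is available).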

\code{timescale}, \code{agg_scale}, and \code{rescale} must each be one of "days",
"weeks", "months", "quarters", or "years". As the examples show, hourly data can
also be declared using \code{timescale = "hours"}.

\code{dist} is the distribution used to estimate the CDF from \code{x_ref}.
Currently, functionality is available to fit one of the following distributions to the data:
Normal ('norm'), Log-normal ('lnorm'), Logistic ('logis'), Log-logistic ('llogis'),
Exponential ('exp'), Gamma ('gamma'), and Weibull ('weibull').
Alternatively, the CDF can be estimated empirically (\code{dist = "empirical"})
based on the values in \code{x_ref}, or using kernel density estimation (\code{dist = "kde"}).

If \code{dist} is a parametric family of distributions, then parameters of the
distribution are estimated using maximum likelihood estimation from \code{x_ref}.
The resulting parameters and corresponding goodness-of-fit statistics can be
returned by specifying \code{return_fit = TRUE}.
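
The parametric case can be sketched in the same spirit. The example below assumes
that the MASS package is available and uses \code{MASS::fitdistr()} for the maximum
likelihood fit and a Kolmogorov-Smirnov test as one possible goodness-of-fit check.
These choices are made for illustration only; \code{std_index()} may use different
fitting routines and statistics.

\preformatted{
set.seed(1)
x_ref <- rgamma(500, shape = 2, rate = 1)  # simulated reference data (assumption)
x_new <- c(0.5, 2, 6)                      # values to be standardised

# maximum likelihood estimates of the gamma shape and rate parameters
fit <- MASS::fitdistr(x_ref, "gamma")
shape <- fit$estimate[["shape"]]
rate <- fit$estimate[["rate"]]

# one possible goodness-of-fit check: a Kolmogorov-Smirnov test
gof <- ks.test(x_ref, "pgamma", shape = shape, rate = rate)

# fitted CDF evaluated at the new values, mapped to the standard normal scale
si_new <- qnorm(pgamma(x_new, shape = shape, rate = rate))
}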

By default, the distribution is estimated over all values in \code{x_ref}. Alternatively,
if \code{x_new} is an xts object, parameters can be estimated sequentially using a
moving window of values. \code{moving_window} is a single value giving the length of
the moving window. By default, it is assumed to be on the same timescale as \code{x_new}.
The timescale of the window can instead be specified manually using \code{window_scale},
which must also be one of "days", "weeks", "months", "quarters", and "years".

The function returns a vector or time series (depending on the format of \code{x_new})
containing the standardised indices corresponding to \code{x_new}. Three different
types of indices are available, which are explained in detail in the vignette.
The index type can be chosen using \code{index_type}, which must be one of
"normal" (default), "probability", and "bounded".
}
\examples{
data(data_supply)
# consider hourly German energy supply data in 2019
supply_de <- subset(data_supply, country == "Germany", select = c("date", "PWS"))
supply_de <- xts::xts(supply_de$PWS, order.by = supply_de$date)
#options(xts_check_TZ = FALSE)

# convert to hourly standardised indices
supply_de_std <- std_index(supply_de, timescale = "hours")
hist(supply_de, main = "Raw values")
hist(supply_de_std, main = "Standardised values")

# convert to daily standardised indices (use rescale = "weeks" for weekly indices)
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days")

# convert to weekly standardised indices calculated on each day
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
                           agg_period = 1, agg_scale = "weeks")

# calculate standardised indices for December, using the preceding months of 2019 as reference
dec <- zoo::index(supply_de) >= as.POSIXct("2019-12-01", tz = "UTC")
supply_de_std_dec <- std_index(x_new = supply_de[dec], x_ref = supply_de[!dec],
                               timescale = "hours")

# calculate standardised indices using a 100 day moving window
supply_de_std_dec <- std_index(supply_de[dec], supply_de, timescale = "hours",
                               rescale = "days", moving_window = 100)

# suppose we are interested in the daily maximum rather than the daily total
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
                           rescale_fun = "max")
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
                           rescale_fun = "mean") # or average

# the default uses the empirical distribution, which requires more data than
# parametric distributions, so it is not ideal for short series, e.g. weekly values
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks") # warning
# instead, we can use a parametric distribution, e.g. a gamma distribution
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks", dist = "gamma")
# we can assess the fit by checking whether the indices resemble a standard normal distribution
hist(supply_de)
hist(supply_de_std)
# we can also look at the properties of the fit
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks",
                           dist = "gamma", return_fit = TRUE)

# we could also use kernel density estimation, which is a flexible compromise
# between the empirical and parametric approaches
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks", dist = "kde")
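
# a further sketch on simulated data (not part of data_supply): as described in
# the details, x_new and x_ref can also be plain numeric vectors, in which case
# no timescale is needed
set.seed(1)
x_ref_sim <- rgamma(1000, shape = 2)  # simulated reference sample (assumption)
x_new_sim <- rgamma(100, shape = 2)   # simulated values to be standardised
si_sim <- std_index(x_new_sim, x_ref_sim, dist = "gamma")

# the type of index can be changed using index_type
pi_sim <- std_index(x_new_sim, x_ref_sim, dist = "gamma", index_type = "probability")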

}
\references{
Allen, S. and N. Otero (2023):
`Standardised indices to monitor energy droughts',
\emph{Renewable Energy} 217, 119206.
\doi{10.1016/j.renene.2023.119206}

McKee, T. B., Doesken, N. J., & Kleist, J. (1993):
`The relationship of drought frequency and duration to time scales',
In \emph{Proceedings of the 8th Conference on Applied Climatology} 17, 179-183.
}
\author{
Sam Allen, Noelia Otero
}
