| Title: | Spatial Sampling Design and Analysis | 
| Version: | 5.6.0 | 
| Maintainer: | Michael Dumelle <Dumelle.Michael@epa.gov> | 
| Description: | A design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. For additional details, see Dumelle et al. (2023) <doi:10.18637/jss.v105.i03>. | 
| Depends: | R (≥ 3.5.0), sf, survey (≥ 4.1-1) | 
| Imports: | boot, crossdes, deldir, graphics, grDevices, lme4, MASS, sampling, stats, units | 
| Suggests: | knitr, testthat, rmarkdown | 
| License: | GPL (≥ 3) | 
| URL: | https://usepa.github.io/spsurvey/, https://github.com/USEPA/spsurvey | 
| BugReports: | https://github.com/USEPA/spsurvey/issues | 
| VignetteBuilder: | knitr | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-09-29 21:33:12 UTC; MDUMELLE | 
| Author: | Michael Dumelle | 
| Repository: | CRAN | 
| Date/Publication: | 2025-09-29 22:20:02 UTC | 
spsurvey: Spatial Sampling Design and Analysis
Description
spsurvey implements a design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. This R package has been reviewed in accordance with U.S. Environmental Protection Agency policy and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
Author(s)
Maintainer: Michael Dumelle Dumelle.Michael@epa.gov
Authors:
- Tom Kincaid 
- Anthony (Tony) R. Olsen 
- Marc Weber 
Other contributors:
- Don Stevens [contributor] 
- Denis White [contributor] 
- Amanda M. Nahlik [contributor] 
- Sarah Lehmann [contributor] 
See Also
Useful links:
- Report bugs at https://github.com/USEPA/spsurvey/issues 
Illinois River data
Description
An sf MULTILINESTRING object of 244 segments of the
Illinois River in Arkansas and Oklahoma.
Usage
Illinois_River
Format
244 rows and 2 variables:
- STATE_NAME
- State name. 
- geometry
- MULTILINESTRING geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070). 
Illinois River legacy data
Description
An sf POINT object of legacy sites for the Illinois
River data.
Usage
Illinois_River_Legacy
Format
5 rows and 2 variables:
- STATE_NAME
- State name. 
- geometry
- POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070). 
Lake Ontario data
Description
An sf MULTIPOLYGON object of 187 polygons consisting
of shore segments in Lake Ontario.
Usage
Lake_Ontario
Format
187 rows and 5 variables:
- COUNTRY
- Country. 
- RSRC_CLASS
- Bay class. 
- PSTL_CODE
- Postal code. 
- AREA_SQKM
- Area in square kilometers. 
- geometry
- MULTIPOLYGON geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070). 
New England Lakes data
Description
An sf POINT object of 195 lakes in the Northeastern
United States.
Usage
NE_Lakes
Format
195 rows and 5 variables:
- AREA
- Lake area in hectares. 
- AREA_CAT
- Lake area categories based on a hectare cutoff. 
- ELEV
- Elevation in meters. 
- ELEV_CAT
- Elevation categories based on a meter cutoff. 
- geometry
- POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070). 
New England Lakes legacy data
Description
An sf POINT object of 5 legacy sites for the NE Lakes data.
Usage
NE_Lakes_Legacy
Format
5 rows and 5 variables:
- AREA
- Lake area in hectares. 
- AREA_CAT
- Lake area categories based on a hectare cutoff. 
- ELEV
- Elevation in meters. 
- ELEV_CAT
- Elevation categories based on a meter cutoff. 
- geometry
- POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070). 
New England Lakes data (as a data frame)
Description
A data frame of 195 lakes in the Northeastern United States.
Usage
NE_Lakes_df
Format
195 rows and 6 variables:
- AREA
- Lake area in hectares. 
- AREA_CAT
- Lake area categories based on a hectare cutoff. 
- ELEV
- Elevation in meters. 
- ELEV_CAT
- Elevation categories based on a meter cutoff. 
- XCOORD
- x-coordinate using the WGS 84 coordinate reference system (EPSG: 4326). 
- YCOORD
- y-coordinate using the WGS 84 coordinate reference system (EPSG: 4326). 
NLA PNW data
Description
An sf POINT object of 96 lakes in the Pacific Northwest Region of the United
States during the year 2017, from a subset of the Environmental
Protection Agency's "National Lakes Assessment."
Usage
NLA_PNW
Format
96 rows and 9 variables:
- SITE_ID
- A unique lake identifier. 
- WEIGHT
- The sampling design weight. 
- URBAN
- Urban category. 
- STATE
- State name. 
- BMMI
- Benthic MMI value. 
- BMMI_COND
- Benthic MMI condition categories. 
- PHOS_COND
- Phosphorus condition categories. 
- NITR_COND
- Nitrogen condition categories. 
- geometry
- POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070). 
NRSA EPA7 data
Description
An sf POINT object of 353 stream segments in the Central
United States during the years 2008 and 2013, from a subset of the Environmental
Protection Agency's "National Rivers and Streams Assessment."
Usage
NRSA_EPA7
Format
353 rows and 10 variables:
- SITE_ID
- A unique site identifier. 
- YEAR
- Year of design cycle. 
- WEIGHT
- Sampling design weights. 
- ECOREGION
- Ecoregion. 
- STATE
- State name. 
- BMMI
- Benthic MMI value. 
- BMMI_COND
- Benthic MMI categories. 
- PHOS_COND
- Phosphorus condition categories. 
- NITR_COND
- Nitrogen condition categories. 
- geometry
- POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070). 
Adjust survey design weights by categories
Description
Adjust initial survey design weights so that the
final weights sum to a desired frame size. Adjusted weights
proportionally scale the initial weights to sum to the desired frame size.
Separate adjustments are applied to each category specified in wgtcat.
Usage
adjwgt(wgt, wgtcat = NULL, framesize, sites = NULL)
Arguments
| wgt | Vector of initial weights for each site. These equal the reciprocal of the site's inclusion probability. | 
| wgtcat | Vector containing each site's weight adjustment category name. The default is NULL. | 
| framesize | Vector containing the known size of the frame for each category name in wgtcat. | 
| sites | Vector indicating site use; TRUE means the site is used and FALSE means it is not. The default is NULL. | 
Value
Vector of adjusted weights, where the adjusted weight is set
to 0 for sites whose value in the sites argument was set to
FALSE.
Author(s)
Tony Olsen olsen.tony@epa.gov
Examples
wgt <- runif(50)
wgtcat <- rep(c("A", "B"), c(30, 20))
framesize <- c(A = 15, B = 10)
sites <- rep(rep(c(TRUE, FALSE), c(9, 1)), 5)
adjwgt(wgt, wgtcat, framesize, sites)
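The proportional scaling described above can be sketched in base R. This is a minimal illustration of the arithmetic only, not the package's implementation; the helper name scale_weights_sketch is hypothetical, and it assumes every category in framesize has at least one used site.

```r
# Hypothetical helper (not part of spsurvey): rescale initial weights so that,
# within each category, the weights of the used sites sum to the frame size.
scale_weights_sketch <- function(wgt, wgtcat, framesize,
                                 sites = rep(TRUE, length(wgt))) {
  adj <- numeric(length(wgt))  # sites that are not used keep a weight of 0
  for (k in names(framesize)) {
    idx <- sites & wgtcat == k
    adj[idx] <- wgt[idx] * framesize[[k]] / sum(wgt[idx])
  }
  adj
}

set.seed(1)
wgt <- runif(50)
wgtcat <- rep(c("A", "B"), c(30, 20))
framesize <- c(A = 15, B = 10)
sites <- rep(rep(c(TRUE, FALSE), c(9, 1)), 5)

adj <- scale_weights_sketch(wgt, wgtcat, framesize, sites)
tapply(adj, wgtcat, sum)  # category sums equal the supplied frame sizes
```

Because the scaling factor is computed separately per category, the relative sizes of the weights inside a category are preserved while each category total is forced to its known frame size.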
Adjust survey design weights for non-response by categories
Description
Adjust weights for target sample units that do not respond and are missing at random within categories. The missing at random assumption implies that their sample weight may be assigned to specific categories of units that have responded (i.e., have been sampled). This is a class-based method for non-response adjustment.
Usage
adjwgtNR(wgt, MARClass = NULL, EvalStatus, TNRClass, TRClass)
Arguments
| wgt | Vector of weights for each sample unit that will be adjusted for non-response. Weights must be weights for the design as implemented. All weights must be greater than zero. | 
| MARClass | Vector that identifies for each sample unit the class (i.e., category, level) that will be used in non-response weight adjustment for sample units that are known to be target. Within each missing at random (MAR) class, the missing sample units that are not sampled are assumed to be missing at random. If MARClass is not specified, all sample units are assumed to be from the same MAR class. | 
| EvalStatus | Vector of the evaluation status for each sample unit. Values must include the values given in TNRClass and TRClass. May include other values not required for the non-response adjustment. | 
| TNRClass | Subset of values in EvalStatus that identify sample units whose target status is known and that do not respond (i.e., are not sampled). | 
| TRClass | Subset of values in EvalStatus that identify sample units whose target status is known and that respond (i.e., are target and sampled). | 
Value
Vector of sample unit weights adjusted for non-response, with the same length as the input weights. Weights for sample units that did not respond but were known to be eligible are set to zero. Weights for sample units whose evaluation status is in neither TNRClass nor TRClass are also set to zero.
Author(s)
Tony Olsen olsen.tony@epa.gov
Examples
set.seed(5)
wgt <- runif(40)
MARClass <- rep(c("A", "B"), rep(20, 2))
EvalStatus <- sample(c("Not_Target", "Target_Sampled", "Target_Not_Sampled"), 40, replace = TRUE)
TNRClass <- "Target_Not_Sampled"
TRClass <- "Target_Sampled"
adjwgtNR(wgt, MARClass, EvalStatus, TNRClass, TRClass)
# function that has an error check
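A class-based non-response adjustment of the kind described above can be sketched in base R. The sketch assumes the standard weighting-class approach, in which responding units in each MAR class absorb the weight of the known-target non-respondents in that class. The helper name nr_adjust_sketch is hypothetical, and the package's own computation may differ in detail.

```r
# Hypothetical helper (not part of spsurvey): weighting-class adjustment.
# Within each MAR class, respondent weights are inflated so that they sum to
# the total weight of all known-target units (respondents + non-respondents).
nr_adjust_sketch <- function(wgt, mar_class, eval_status, tnr, tr) {
  adj <- numeric(length(wgt))  # everything that is not a respondent stays 0
  for (k in unique(mar_class)) {
    resp   <- mar_class == k & eval_status %in% tr
    target <- mar_class == k & eval_status %in% c(tr, tnr)
    adj[resp] <- wgt[resp] * sum(wgt[target]) / sum(wgt[resp])
  }
  adj
}

set.seed(5)
wgt <- runif(40)
mar_class <- rep(c("A", "B"), rep(20, 2))
# Deterministic statuses so that every class contains respondents
eval_status <- rep(c("Not_Target", "Target_Sampled", "Target_Not_Sampled"),
                   length.out = 40)

adj_nr <- nr_adjust_sketch(wgt, mar_class, eval_status,
                           tnr = "Target_Not_Sampled", tr = "Target_Sampled")
```

Under this assumption the adjusted weights in each class sum to the original weight of all known-target units in that class, so the estimated target extent is unchanged.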
Compute the average shifted histogram (ASH) for one-dimensional weighted data
Description
Calculate the average shifted histogram estimate of a density based on one-dimensional data from a survey design with weights.
Usage
ash1_wgt(
  x,
  wgt = rep(1, length(x)),
  m = 5,
  nbin = 50,
  ab = NULL,
  support = "Continuous"
)
Arguments
| x | Vector used to estimate the density.  | 
| wgt | Vector of weights for each observation from a probability sample. The default assigns equal weights (equal probability). | 
| m | Number of empty bins to add to the ends when the range is not completely specified. The default is 5. | 
| nbin | Number of bins for density estimation. The default is 50. | 
| ab | Optional range for support associated with the density. Both
values may be equal to  | 
| support | Type of support.  If equal to  | 
Value
List containing the ASH density estimate. List consists of
- tcen
- x-coordinate for center of bin 
- f
- y-coordinate for density estimate height 
Author(s)
Tony Olsen Olsen.tony@epa.gov
References
Scott, D. W. (1985). "Averaged shifted histograms: effective nonparametric density estimators in several dimensions." The Annals of Statistics 13(3): 1024-1040.
Examples
x <- rnorm(100, 10, sqrt(10))
wgt <- runif(100, 10, 100)
rslt <- ash1_wgt(x, wgt)
plot(rslt)
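For readers unfamiliar with the technique, a weighted one-dimensional ASH in the spirit of Scott (1985) can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind ash1_wgt: the helper name ash_sketch is hypothetical, the bin range is taken from the data, and m is used both as the number of shifted histograms and as the number of empty padding bins at each end.

```r
# Hypothetical helper (not part of spsurvey): weighted 1-D ASH.
ash_sketch <- function(x, wgt = rep(1, length(x)), m = 5, nbin = 50) {
  rng   <- range(x)
  delta <- diff(rng) / nbin          # narrow bin width
  a     <- rng[1] - m * delta        # left edge after padding with m bins
  nb    <- nbin + 2 * m              # total number of narrow bins

  # Weighted bin counts; the maximum of x is placed in the last data bin
  bin <- pmin(floor((x - a) / delta) + 1, nb - m)
  v   <- numeric(nb)
  s   <- tapply(wgt, bin, sum)
  v[as.integer(names(s))] <- s

  # Triangular weights 1 - |i| / m for shifts i = -(m - 1), ..., (m - 1)
  shifts <- -(m - 1):(m - 1)
  kern   <- 1 - abs(shifts) / m
  f <- numeric(nb)
  for (k in seq_len(nb)) {
    j  <- k + shifts
    ok <- j >= 1 & j <= nb
    f[k] <- sum(kern[ok] * v[j[ok]])
  }

  h <- m * delta                     # width of each shifted histogram's bins
  list(tcen = a + (seq_len(nb) - 0.5) * delta, f = f / (sum(wgt) * h))
}

set.seed(1)
x <- rnorm(100, 10, sqrt(10))
wgt <- runif(100, 10, 100)
est <- ash_sketch(x, wgt)
bin_width <- diff(est$tcen[1:2])
```

The padding bins ensure that no weight is lost at the ends of the range, so the estimate integrates to one over the padded support.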
Attributable risk analysis
Description
This function organizes input and output for the analysis of attributable risk (for
categorical variables).  The analysis data,
dframe, can be either a data frame or a simple features (sf) object.  If an
sf object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord and ycoord are assigned values
"xcoord" and "ycoord", respectively, and the geometry column is
dropped from the object.
Usage
attrisk_analysis(
  dframe,
  vars_response,
  vars_stressor,
  response_levels = NULL,
  stressor_levels = NULL,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  conf = 95,
  All_Sites = FALSE
)
Arguments
| dframe | Data to be analyzed (analysis data). A data frame or sf object. | 
| vars_response | Vector composed of character values that identify the names of response variables in dframe. | 
| vars_stressor | Vector composed of character values that identify the names of stressor variables in dframe. | 
| response_levels | List providing the category values (levels) for each element in the vars_response argument. | 
| stressor_levels | List providing the category values (levels) for each element in the vars_stressor argument. | 
| subpops | Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. | 
| siteID | Character value providing the name of the site ID variable in dframe. | 
| weight | Character value providing the name of the design weight variable in dframe. | 
| xcoord | Character value providing the name of the x-coordinate variable in dframe. | 
| ycoord | Character value providing the name of the y-coordinate variable in dframe. | 
| stratumID | Character value providing the name of the stratum ID variable in dframe. | 
| clusterID | Character value providing the name of the cluster (stage one) ID variable in dframe. | 
| weight1 | Character value providing the name of the stage one weight variable in dframe. | 
| xcoord1 | Character value providing the name of the stage one x-coordinate variable in dframe. | 
| ycoord1 | Character value providing the name of the stage one y-coordinate variable in dframe. | 
| sizeweight | Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. | 
| sweight | Character value providing the name of the size weight variable in dframe. | 
| sweight1 | Character value providing the name of the stage one size weight variable in dframe. | 
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design: 
 Example fpc for a single-stage stratified survey design: 
 Example fpc for a two-stage unstratified survey design: 
 Example fpc for a two-stage stratified survey design: 
 | 
| popsize | Object that provides values for the population argument of the
 Example popsize for calibration: 
 Example popsize for post-stratification using a data frame: 
 Example popsize for post-stratification using a table: 
 Example popsize for post-stratification using an xtabs object: 
 | 
| vartype | Character value providing the choice of the variance
estimator, where  | 
| conf | Numeric value providing the Gaussian-based confidence level. The default value is 95. | 
| All_Sites | A logical variable used when  | 
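The four fpc formats described in the fpc argument can be sketched as plain R objects. All numbers and stratum and cluster names below are hypothetical, chosen only to show the shapes the description calls for, and the single numeric value is assumed here to be a population size.

```r
# All values and names are hypothetical illustrations of the described shapes.

# Single-stage, unstratified: a vector holding a single numeric value
fpc_single <- 15000

# Single-stage, stratified: a named list, with names matching the stratum IDs
fpc_single_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage, unstratified: a named vector. The first item is the number of
# clusters in the population (its name is arbitrary); each later item is the
# number of stage two units for a sampled cluster, named by cluster ID.
fpc_two <- c(Ncluster = 200, Cluster1 = 100, Cluster2 = 100, Cluster3 = 50)

# Two-stage, stratified: a named list of such vectors, one per stratum
fpc_two_strat <- list(
  Stratum1 = c(Ncluster = 200, Cluster1 = 100, Cluster2 = 100),
  Stratum2 = c(Ncluster = 150, Cluster3 = 50, Cluster4 = 50)
)
```

In the two-stage case the vector length is one more than the number of sampled clusters, which is a quick check that the object matches the design.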
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and size of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
- Type
- subpopulation (domain) name 
- Subpopulation
- subpopulation name within a domain 
- Response
- response variable 
- Stressor
- stressor variable 
- nResp
- sample size 
- Estimate
- attributable risk estimate 
- StdError_log
- attributable risk standard error (on the log scale) 
- MarginofError_log
- attributable risk margin of error (on the log scale) 
- LCBxxPct
- xx% (default 95%) lower confidence bound 
- UCBxxPct
- xx% (default 95%) upper confidence bound 
- WeightTotal
- sum of design weights 
- Count_RespPoor_StressPoor
- number of observations in the poor response and poor stressor group 
- Count_RespPoor_StressGood
- number of observations in the poor response and good stressor group 
- Count_RespGood_StressPoor
- number of observations in the good response and poor stressor group 
- Count_RespGood_StressGood
- number of observations in the good response and good stressor group 
- Prop_RespPoor_StressPoor
- weighted proportion of observations in the poor response and poor stressor group 
- Prop_RespPoor_StressGood
- weighted proportion of observations in the poor response and good stressor group 
- Prop_RespGood_StressPoor
- weighted proportion of observations in the good response and poor stressor group 
- Prop_RespGood_StressGood
- weighted proportion of observations in the good response and good stressor group 
Details
Attributable risk measures the proportional reduction in the extent of poor condition of a response variable that presumably would result from eliminating a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Attributable risk is defined as one minus the ratio of two probabilities. The numerator of the ratio is the conditional probability that the response variable is in poor condition given that the stressor variable is in good condition. The denominator of the ratio is the probability that the response variable is in poor condition. Attributable risk values close to zero indicate that removing the stressor variable will have little or no impact on the probability that the response variable is in poor condition. Attributable risk values close to one indicate that removing the stressor variable will result in extensive reduction of the probability that the response variable is in poor condition.
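The definition above, one minus the ratio of P(response poor | stressor good) to P(response poor), can be worked through in base R. This is a minimal sketch of the point estimate only, using design weights to estimate both probabilities. The helper name attrisk_sketch is hypothetical, and variance estimation is not shown.

```r
# Hypothetical helper (not part of spsurvey): weighted attributable risk.
attrisk_sketch <- function(resp, stress, wgt, poor = "Poor", good = "Good") {
  # P(response is poor)
  p_poor <- sum(wgt[resp == poor]) / sum(wgt)
  # P(response is poor | stressor is good)
  p_poor_given_good <- sum(wgt[resp == poor & stress == good]) /
    sum(wgt[stress == good])
  1 - p_poor_given_good / p_poor
}

# Toy data: 30 poor/poor, 10 poor/good, 20 good/poor, 40 good/good sites
resp   <- rep(c("Poor", "Poor", "Good", "Good"), c(30, 10, 20, 40))
stress <- rep(c("Poor", "Good", "Poor", "Good"), c(30, 10, 20, 40))
wgt    <- rep(1, 100)

ar <- attrisk_sketch(resp, stress, wgt)
ar  # 1 - (10 / 50) / (40 / 100) = 0.5
```

In this toy case, half of the extent of poor response condition would presumably be eliminated if the stressor were removed.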
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
References
Van Sickle, J., & Paulsen, S. G. (2008). Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. Journal of the North American Benthological Society, 27(4), 920-931.
See Also
- relrisk_analysis
- for relative risk analysis 
- diffrisk_analysis
- for risk difference analysis 
Examples
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
  RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
  StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
attrisk_analysis(dframe,
  vars_response = myresponse,
  vars_stressor = mystressor, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum"
)
Categorical variable analysis
Description
This function organizes input and output for the analysis of categorical variables.  The analysis data,
dframe, can be either a data frame or a simple features (sf) object.  If an
sf object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord and ycoord are assigned values
"xcoord" and "ycoord", respectively, and the geometry column is
dropped from the object.
Usage
cat_analysis(
  dframe,
  vars,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)
Arguments
| dframe | Data to be analyzed (analysis data). A data frame or sf object. | 
| vars | Vector composed of character values that identify the names of response variables in dframe. | 
| subpops | Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. | 
| siteID | Character value providing the name of the site ID variable in dframe. | 
| weight | Character value providing the name of the design weight variable in dframe. | 
| xcoord | Character value providing the name of the x-coordinate variable in dframe. | 
| ycoord | Character value providing the name of the y-coordinate variable in dframe. | 
| stratumID | Character value providing the name of the stratum ID variable in dframe. | 
| clusterID | Character value providing the name of the cluster (stage one) ID variable in dframe. | 
| weight1 | Character value providing the name of the stage one weight variable in dframe. | 
| xcoord1 | Character value providing the name of the stage one x-coordinate variable in dframe. | 
| ycoord1 | Character value providing the name of the stage one y-coordinate variable in dframe. | 
| sizeweight | Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. | 
| sweight | Character value providing the name of the size weight variable in dframe. | 
| sweight1 | Character value providing the name of the stage one size weight variable in dframe. | 
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design: 
 Example fpc for a single-stage stratified survey design: 
 Example fpc for a two-stage unstratified survey design: 
 Example fpc for a two-stage stratified survey design: 
 | 
| popsize | Object that provides values for the population argument of the
 Example popsize for calibration: 
 Example popsize for post-stratification using a data frame: 
 Example popsize for post-stratification using a table: 
 Example popsize for post-stratification using an xtabs object: 
 | 
| vartype | Character value providing the choice of the variance
estimator, where  | 
| jointprob | Character value providing the choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where  | 
| conf | Numeric value providing the Gaussian-based confidence level. The default value is 95. | 
| All_Sites | A logical variable used when  | 
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and total of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
- Type
- subpopulation (domain) name 
- Subpopulation
- subpopulation name within a domain 
- Indicator
- response variable 
- Category
- category of response variable 
- nResp
- sample size 
- Estimate.P
- proportion estimate (in %) 
- StdError.P
- standard error of proportion estimate 
- MarginofError.P
- margin of error of proportion estimate 
- LCBxxPct.P
- xx% (default 95%) lower confidence bound of proportion estimate 
- UCBxxPct.P
- xx% (default 95%) upper confidence bound of proportion estimate 
- Estimate.U
- total estimate 
- StdError.U
- standard error of total estimate 
- MarginofError.U
- margin of error of total estimate 
- LCBxxPct.U
- xx% (default 95%) lower confidence bound of total estimate 
- UCBxxPct.U
- xx% (default 95%) upper confidence bound of total estimate 
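How the proportion and total point estimates relate to the design weights can be sketched in base R. The sketch assumes the usual weighted estimators (category weight sum for the total, its share of the overall weight sum for the proportion), ignores any popsize adjustment, and does not compute standard errors. The helper name cat_est_sketch is hypothetical; only the column names Category, Estimate.P, and Estimate.U are borrowed from the list above.

```r
# Hypothetical helper (not part of spsurvey): weighted category estimates.
cat_est_sketch <- function(x, wgt) {
  total <- tapply(wgt, x, sum)  # sum of design weights per category
  data.frame(
    Category   = names(total),
    Estimate.P = as.vector(100 * total / sum(wgt)),  # proportion, in percent
    Estimate.U = as.vector(total)                    # total
  )
}

x   <- rep(c("north", "south", "east", "west"), 25)
wgt <- rep(c(10, 20, 30, 40), 25)

est_cat <- cat_est_sketch(x, wgt)
est_cat
```

With these weights the "west" category carries 40% of the total weight, so its proportion estimate is 40 and its total estimate is 1000.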
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- cont_analysis
- for continuous variable analysis 
Examples
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  CatVar = rep(c("north", "south", "east", "west"), 25),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
cat_analysis(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize
)
Plot a cumulative distribution function (CDF)
Description
This function creates a CDF plot.  Input data for the plot are provided by a
data frame with the same structure as the "CDF" output from cont_analysis.
Confidence limits for the CDF are also plotted.
Usage
cdf_plot(
  cdfest,
  var = NULL,
  subpop = NULL,
  subpop_level = NULL,
  units_cdf = "Percent",
  type_cdf = "Continuous",
  log = "",
  xlab = NULL,
  ylab = NULL,
  ylab_r = NULL,
  main = NULL,
  legloc = NULL,
  confcut = 0,
  conflev = 95,
  cex.main = 1.2,
  cex.legend = 1,
  ...
)
Arguments
| cdfest | Data frame with the same structure as the "CDF" output from cont_analysis. | 
| var | If  | 
| subpop | If  | 
| subpop_level | If  | 
| units_cdf | Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". | 
| type_cdf | Character string consisting of the value "Continuous" or "Ordinal" that controls the type of CDF plot. The default is "Continuous". | 
| log | Character string consisting of the value "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x"). The default is "". | 
| xlab | Character string providing the x-axis label. If this argument equals NULL, then the indicator name is used as the label. The default is NULL. | 
| ylab | Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". | 
| ylab_r | Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. | 
| main | Character string providing the plot title. The default is NULL. | 
| legloc | Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. | 
| confcut | Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. | 
| conflev | Numeric value of the confidence level used for confidence limits. The default is 95. | 
| cex.main | Expansion factor for the plot title. The default is 1.2. | 
| cex.legend | Expansion factor for the legend title. The default is 1. | 
| ... | Additional arguments passed to the  | 
Value
A plot of a variable's CDF estimates and associated confidence limits.
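To make the plotted quantities concrete, the following base R sketch computes a weighted empirical CDF on the percent scale and applies the confcut rule from the arguments above. It is an illustration only: the helper name wcdf_sketch is hypothetical, the column names Value and Estimate.P are assumptions about the layout of the "CDF" data frame, and the confidence limits themselves are not computed.

```r
# Hypothetical helper (not part of spsurvey): weighted empirical CDF in percent.
wcdf_sketch <- function(x, wgt) {
  o <- order(x)
  data.frame(
    Value      = x[o],
    Estimate.P = 100 * cumsum(wgt[o]) / sum(wgt)
  )
}

set.seed(1)
x   <- rnorm(100, 10, 1)
wgt <- runif(100, 10, 100)
cdf <- wcdf_sketch(x, wgt)

# confcut rule: confidence limits are drawn only where the CDF estimate lies
# between confcut and 100 - confcut (percent scale)
confcut <- 5
show_limits <- cdf$Estimate.P >= confcut & cdf$Estimate.P <= 100 - confcut
```

With confcut equal to 0, every row satisfies the rule, which matches the documented behavior of plotting confidence limits over the complete range of the CDF.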
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- cont_cdfplot
- for creating a PDF file containing CDF plots 
- cont_cdftest
- for CDF hypothesis testing 
Examples
## Not run: 
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
  vars = myvars, subpops = mysubpops,
  siteID = "siteID", weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize
)
keep <- with(myanalysis$CDF, Type == "Resource_Class" &
  Subpopulation == "Good")
par(mfrow = c(2, 1))
cdf_plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length", ylab_r = "Stream Length (km)",
  main = "Estimates for Resource Class: Good"
)
cdf_plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length", ylab_r = "Same",
  main = "Estimates for Resource Class: Good"
)
## End(Not run)
Change analysis
Description
This function organizes input and output for the estimation of change between two
samples (for categorical and continuous variables).  The analysis data,
dframe, can be either a data frame or a simple features (sf) object.  If an
sf object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord and ycoord are assigned values
"xcoord" and "ycoord", respectively, and the geometry column is
dropped from the object.
Usage
change_analysis(
  dframe,
  vars_cat = NULL,
  vars_cont = NULL,
  test = "mean",
  subpops = NULL,
  surveyID = "surveyID",
  survey_names = NULL,
  siteID = "siteID",
  weight = "weight",
  revisitwgt = FALSE,
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)
Arguments
| dframe | Data to be analyzed (analysis data). A data frame or sf object containing survey design variables, response variables, and subpopulation (domain) variables. |
| vars_cat | Vector composed of character values that identify the names of categorical response variables in dframe. The default is NULL. |
| vars_cont | Vector composed of character values that identify the names of continuous response variables in dframe. The default is NULL. |
| test | Character string or character vector providing the location measure(s) to use for change estimation for continuous variables. The choices are "mean", "total", and "median". The default is "mean". |
| subpops | Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. The default is NULL. |
| surveyID | Character value providing name of the survey ID variable in dframe. The default is "surveyID". |
| survey_names | Character vector of length two that provides the survey names contained in the surveyID variable in dframe. The default is NULL. |
| siteID | Character value providing name of the site ID variable in dframe. The default is "siteID". |
| weight | Character value providing name of the design weight variable in dframe. The default is "weight". |
| revisitwgt | Logical value that indicates whether each repeat visit site has the same design weight in the two surveys, where TRUE means the weights are the same and FALSE means they are not. The default is FALSE. |
| xcoord | Character value providing name of the x-coordinate variable in dframe. The default is NULL. |
| ycoord | Character value providing name of the y-coordinate variable in dframe. The default is NULL. |
| stratumID | Character value providing name of the stratum ID variable in dframe. The default is NULL. |
| clusterID | Character value providing the name of the cluster (stage one) ID variable in dframe. The default is NULL. |
| weight1 | Character value providing name of the stage one weight variable in dframe. The default is NULL. |
| xcoord1 | Character value providing the name of the stage one x-coordinate variable in dframe. The default is NULL. |
| ycoord1 | Character value providing the name of the stage one y-coordinate variable in dframe. The default is NULL. |
| sizeweight | Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
| sweight | Character value providing the name of the size weight variable in dframe. The default is NULL. |
| sweight1 | Character value providing name of the stage one size weight variable in dframe. The default is NULL. |
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. For a single-stage unstratified design, the vector is composed of a single numeric value. For a two-stage unstratified design, the object is a named vector whose length is one more than the number of clusters in the sample, where the first item specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the corresponding cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. The default is NULL. |
| popsize | Object that provides values for the population argument of the calibrate or postStratify functions in the survey package. The object can be supplied for calibration or for post-stratification; for post-stratification it can be a data frame, a table, or an xtabs object. The default is NULL. |
| vartype | Character value providing the choice of the variance estimator. The default is "Local", the local mean variance estimator. |
| jointprob | Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators. The default is "overton". |
| conf | Numeric value providing the Gaussian-based confidence level. The default value is 95. |
| All_Sites | A logical variable used when  | 
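The four fpc formats described above can be sketched as follows. This is a minimal illustration in base R: every stratum name, cluster name, and count is hypothetical, and in practice the names must match the stratum and cluster IDs in dframe.

```r
# Hypothetical fpc objects; all names and counts are illustrative only.

# Single-stage, unstratified design: a single numeric value.
fpc_single <- 15000

# Single-stage, stratified design: a named list with one value per stratum.
fpc_single_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage, unstratified design: a named vector. The first item is the number
# of clusters in the population (its name is arbitrary); each later item is the
# number of stage two units for one sampled cluster.
fpc_two <- c(Ncluster = 150, Cluster1 = 150, Cluster2 = 75, Cluster3 = 125)

# Two-stage, stratified design: a named list of vectors in the same format.
fpc_two_strat <- list(
  Stratum1 = c(Ncluster = 100, Cluster1 = 125, Cluster2 = 100),
  Stratum2 = c(Ncluster = 50, Cluster3 = 75, Cluster4 = 50)
)
```

With three sampled clusters, fpc_two has four items, which is the "one more than the number of clusters in the sample" rule from the argument description.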
Value
List of change estimates composed of four items:
(1) catsum contains change estimates for categorical variables,
(2) contsum_mean contains estimates for continuous variables using
the mean, (3) contsum_total contains estimates for continuous
variables using the total, and (4) contsum_median contains estimates for continuous
variables using the median.  The items in the list will contain NULL
for estimates that were not calculated.  Each data frame includes estimates
for all combinations of population Types, subpopulations within types,
response variables, and categories within each response variable (for
categorical variables and continuous variables using the median).  Change
estimates are provided plus standard error estimates and confidence
interval estimates.
The catsum data frame contains the following variables:
- Survey_1: first survey name
- Survey_2: second survey name
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- Category: category of response variable
- DiffEst.P: proportion difference estimate (in %; second survey - first survey)
- StdError.P: standard error of proportion difference estimate
- MarginofError.P: margin of error of proportion difference estimate
- LCBxxPct.P: xx% (default 95%) lower confidence bound of proportion difference estimate
- UCBxxPct.P: xx% (default 95%) upper confidence bound of proportion difference estimate
- Estimate.U: total difference estimate (second survey - first survey)
- StdError.U: standard error of total difference estimate
- MarginofError.U: margin of error of total difference estimate
- LCBxxPct.U: xx% (default 95%) lower confidence bound of total difference estimate
- UCBxxPct.U: xx% (default 95%) upper confidence bound of total difference estimate
- nResp_1: sample size in the first survey
- Estimate.P_1: proportion estimate (in %) from the first survey
- StdError.P_1: standard error of proportion estimate from the first survey
- MarginofError.P_1: margin of error of proportion estimate from the first survey
- LCBxxPct.P_1: xx% (default 95%) lower confidence bound of proportion estimate from the first survey
- UCBxxPct.P_1: xx% (default 95%) upper confidence bound of proportion estimate from the first survey
- nResp_2: sample size in the second survey
- Estimate.U_1: total estimate from the first survey
- StdError.U_1: standard error of total estimate from the first survey
- MarginofError.U_1: margin of error of total estimate from the first survey
- LCBxxPct.U_1: xx% (default 95%) lower confidence bound of total estimate from the first survey
- UCBxxPct.U_1: xx% (default 95%) upper confidence bound of total estimate from the first survey
- Estimate.P_2: proportion estimate (in %) from the second survey
- StdError.P_2: standard error of proportion estimate from the second survey
- MarginofError.P_2: margin of error of proportion estimate from the second survey
- LCBxxPct.P_2: xx% (default 95%) lower confidence bound of proportion estimate from the second survey
- UCBxxPct.P_2: xx% (default 95%) upper confidence bound of proportion estimate from the second survey
- Estimate.U_2: total estimate from the second survey
- StdError.U_2: standard error of total estimate from the second survey
- MarginofError.U_2: margin of error of total estimate from the second survey
- LCBxxPct.U_2: xx% (default 95%) lower confidence bound of total estimate from the second survey
- UCBxxPct.U_2: xx% (default 95%) upper confidence bound of total estimate from the second survey
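The relationship among the estimate, standard error, margin of error, and confidence bound columns can be sketched as follows, assuming the usual two-sided Gaussian interval implied by the conf argument. The numbers and object names are hypothetical, and this is an illustration rather than the package's internal code.

```r
conf <- 95          # confidence level in percent, as in the conf argument
diff_est <- 4.2     # hypothetical difference estimate (for example, DiffEst.P)
std_error <- 1.5    # hypothetical standard error (for example, StdError.P)

# Two-sided standard normal multiplier for the requested confidence level.
mult <- qnorm(0.5 + (conf / 100) / 2)

margin_of_error <- mult * std_error
lcb <- diff_est - margin_of_error   # lower confidence bound
ucb <- diff_est + margin_of_error   # upper confidence bound

round(c(MarginofError = margin_of_error, LCB = lcb, UCB = ucb), 2)
```

Under this assumption, an interval that excludes zero suggests a change between the two surveys at the chosen confidence level.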
The contsum_mean data frame contains the following variables:
- Survey_1: first survey name
- Survey_2: second survey name
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- Statistic: value of percentile
- nResp: sample size at or below Value
- DiffEst: mean difference estimate
- StdError: standard error of mean difference estimate
- MarginofError: margin of error of mean difference estimate
- LCBxxPct: xx% (default 95%) lower confidence bound of mean difference estimate
- UCBxxPct: xx% (default 95%) upper confidence bound of mean difference estimate
- nResp_1: sample size in the first survey
- Estimate_1: mean estimate from the first survey
- StdError_1: standard error of mean estimate from the first survey
- MarginofError_1: margin of error of mean estimate from the first survey
- LCBxxPct_1: xx% (default 95%) lower confidence bound of mean estimate from the first survey
- UCBxxPct_1: xx% (default 95%) upper confidence bound of mean estimate from the first survey
- nResp_2: sample size in the second survey
- Estimate_2: mean estimate from the second survey
- StdError_2: standard error of mean estimate from the second survey
- MarginofError_2: margin of error of mean estimate from the second survey
- LCBxxPct_2: xx% (default 95%) lower confidence bound of mean estimate from the second survey
- UCBxxPct_2: xx% (default 95%) upper confidence bound of mean estimate from the second survey
The contsum_total data frame contains the following variables:
- Survey_1: first survey name
- Survey_2: second survey name
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- Statistic: value of percentile
- nResp: sample size at or below Value
- DiffEst: total difference estimate
- StdError: standard error of total difference estimate
- MarginofError: margin of error of total difference estimate
- LCBxxPct: xx% (default 95%) lower confidence bound of total difference estimate
- UCBxxPct: xx% (default 95%) upper confidence bound of total difference estimate
- nResp_1: sample size in the first survey
- Estimate_1: total estimate from the first survey
- StdError_1: standard error of total estimate from the first survey
- MarginofError_1: margin of error of total estimate from the first survey
- LCBxxPct_1: xx% (default 95%) lower confidence bound of total estimate from the first survey
- UCBxxPct_1: xx% (default 95%) upper confidence bound of total estimate from the first survey
- nResp_2: sample size in the second survey
- Estimate_2: total estimate from the second survey
- StdError_2: standard error of total estimate from the second survey
- MarginofError_2: margin of error of total estimate from the second survey
- LCBxxPct_2: xx% (default 95%) lower confidence bound of total estimate from the second survey
- UCBxxPct_2: xx% (default 95%) upper confidence bound of total estimate from the second survey
The contsum_median data frame contains the following variables:
- Survey_1: first survey name
- Survey_2: second survey name
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- Category: category of response variable
- DiffEst.P: proportion above or below median difference estimate (in %; second survey - first survey)
- StdError.P: standard error of proportion above or below median difference estimate
- MarginofError.P: margin of error of proportion above or below median difference estimate
- LCBxxPct.P: xx% (default 95%) lower confidence bound of proportion above or below median difference estimate
- UCBxxPct.P: xx% (default 95%) upper confidence bound of proportion above or below median difference estimate
- Estimate.U: total above or below median difference estimate (second survey - first survey)
- StdError.U: standard error of total above or below median difference estimate
- MarginofError.U: margin of error of total above or below median difference estimate
- LCBxxPct.U: xx% (default 95%) lower confidence bound of total above or below median difference estimate
- UCBxxPct.U: xx% (default 95%) upper confidence bound of total above or below median difference estimate
- nResp_1: sample size in the first survey
- Estimate.P_1: proportion above or below median estimate (in %) from the first survey
- StdError.P_1: standard error of proportion above or below median estimate from the first survey
- MarginofError.P_1: margin of error of proportion above or below median estimate from the first survey
- LCBxxPct.P_1: xx% (default 95%) lower confidence bound of proportion above or below median estimate from the first survey
- UCBxxPct.P_1: xx% (default 95%) upper confidence bound of proportion above or below median estimate from the first survey
- nResp_2: sample size in the second survey
- Estimate.U_1: total above or below median estimate from the first survey
- StdError.U_1: standard error of total above or below median estimate from the first survey
- MarginofError.U_1: margin of error of total above or below median estimate from the first survey
- LCBxxPct.U_1: xx% (default 95%) lower confidence bound of total above or below median estimate from the first survey
- UCBxxPct.U_1: xx% (default 95%) upper confidence bound of total above or below median estimate from the first survey
- Estimate.P_2: proportion above or below median estimate (in %) from the second survey
- StdError.P_2: standard error of proportion above or below median estimate from the second survey
- MarginofError.P_2: margin of error of proportion above or below median estimate from the second survey
- LCBxxPct.P_2: xx% (default 95%) lower confidence bound of proportion above or below median estimate from the second survey
- UCBxxPct.P_2: xx% (default 95%) upper confidence bound of proportion above or below median estimate from the second survey
- Estimate.U_2: total above or below median estimate from the second survey
- StdError.U_2: standard error of total above or below median estimate from the second survey
- MarginofError.U_2: margin of error of total above or below median estimate from the second survey
- LCBxxPct.U_2: xx% (default 95%) lower confidence bound of total above or below median estimate from the second survey
- UCBxxPct.U_2: xx% (default 95%) upper confidence bound of total above or below median estimate from the second survey
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- trend_analysis: for trend analysis
Examples
# Categorical variable example for three resource classes
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
  siteID = paste0("Site", 1:200),
  wgt = runif(200, 10, 100),
  xcoord = runif(200),
  ycoord = runif(200),
  stratum = rep(rep(c("Stratum 1", "Stratum 2"), c(2, 2)), 50),
  CatVar = rep(c("North", "South"), 100),
  All_Sites = rep("All Sites", 200),
  Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE)
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
change_analysis(dframe,
  vars_cat = myvars, subpops = mysubpops,
  surveyID = "surveyID", siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
Continuous variable analysis
Description
This function organizes input and output for the analysis of continuous
variables. The analysis data, dframe, can be either a data frame or a
simple features (sf) object.  If an sf object is used,
coordinates are extracted from the geometry column in the object, arguments
xcoord and ycoord are assigned values "xcoord" and
"ycoord", respectively, and the geometry column is dropped from the
object.
Usage
cont_analysis(
  dframe,
  vars,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  pctval = c(5, 10, 25, 50, 75, 90, 95),
  statistics = c("CDF", "Pct", "Mean", "Total"),
  All_Sites = FALSE
)
Arguments
| dframe | Data to be analyzed (analysis data). A data frame or sf object containing survey design variables, response variables, and subpopulation (domain) variables. |
| vars | Vector composed of character values that identify the names of response variables in dframe. |
| subpops | Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. The default is NULL. |
| siteID | Character value providing name of the site ID variable in dframe. The default is NULL. |
| weight | Character value providing name of the design weight variable in dframe. The default is "weight". |
| xcoord | Character value providing name of the x-coordinate variable in dframe. The default is NULL. |
| ycoord | Character value providing name of the y-coordinate variable in dframe. The default is NULL. |
| stratumID | Character value providing name of the stratum ID variable in dframe. The default is NULL. |
| clusterID | Character value providing the name of the cluster (stage one) ID variable in dframe. The default is NULL. |
| weight1 | Character value providing name of the stage one weight variable in dframe. The default is NULL. |
| xcoord1 | Character value providing the name of the stage one x-coordinate variable in dframe. The default is NULL. |
| ycoord1 | Character value providing the name of the stage one y-coordinate variable in dframe. The default is NULL. |
| sizeweight | Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
| sweight | Character value providing the name of the size weight variable in dframe. The default is NULL. |
| sweight1 | Character value providing name of the stage one size weight variable in dframe. The default is NULL. |
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. For a single-stage unstratified design, the vector is composed of a single numeric value. For a two-stage unstratified design, the object is a named vector whose length is one more than the number of clusters in the sample, where the first item specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the corresponding cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. The default is NULL. |
| popsize | Object that provides values for the population argument of the calibrate or postStratify functions in the survey package. The object can be supplied for calibration or for post-stratification; for post-stratification it can be a data frame, a table, or an xtabs object. The default is NULL. |
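| vartype | Character value providing the choice of the variance estimator. The default is "Local", the local mean variance estimator. |
| jointprob | Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators. The default is "overton". |
| conf | Numeric value providing the Gaussian-based confidence level. The default value is 95. |
| pctval | Vector of the set of values at which percentiles are estimated. The default set is: c(5, 10, 25, 50, 75, 90, 95). |
| statistics | Character vector specifying desired estimates. The choices are "CDF", "Pct", "Mean", and "Total", which request CDF, percentile, mean, and total estimates, respectively. The default is c("CDF", "Pct", "Mean", "Total"). |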
| All_Sites | A logical variable used when  | 
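The forms that popsize can take (calibration, and post-stratification using a data frame, a table, or an xtabs object) can be sketched as follows. The named-list layout for calibration is an assumption based on how marginal totals are commonly supplied; the variables Ecoregion and Type, the frame my_frame, and all totals are hypothetical.

```r
# Calibration (assumed layout): a named list with one named vector of
# marginal population totals per factor variable.
popsize_calib <- list(
  Ecoregion = c(East = 750, Central = 500, West = 250),
  Type = c(Streams = 1150, Rivers = 350)
)

# Post-stratification with a data frame: one row per combination of factor
# levels, with the population total in the last column.
popsize_df <- data.frame(
  Ecoregion = rep(c("East", "Central", "West"), each = 2),
  Type = rep(c("Streams", "Rivers"), times = 3),
  Total = c(575, 175, 400, 100, 175, 75)
)

# Post-stratification with a table or xtabs object, built here from a
# hypothetical frame that lists every unit in the population.
my_frame <- popsize_df[
  rep(seq_len(nrow(popsize_df)), popsize_df$Total),
  c("Ecoregion", "Type")
]
popsize_table <- with(my_frame, table(Ecoregion, Type))
popsize_xtabs <- xtabs(~ Ecoregion + Type, data = my_frame)
```

The data frame form matches the mypopsize object used in the Examples, which pairs a Resource_Class column with a Total column.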
Value
The analysis results. A list composed of one, two, three, or four
data frames that contain population estimates for all combinations of
subpopulations, categories within each subpopulation, and response
variables, where the number of data frames is determined by argument
statistics.  The possible data frames in the output list are:
- CDF: a data frame containing CDF estimates
- Pct: a data frame containing percentile estimates
- Mean: a data frame containing mean estimates
- Total: a data frame containing total estimates
The CDF data frame contains the following variables:
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- Value: value of response variable
- nResp: sample size at or below Value
- Estimate.P: CDF proportion estimate (in %)
- StdError.P: standard error of CDF proportion estimate
- MarginofError.P: margin of error of CDF proportion estimate
- LCBxxPct.P: xx% (default 95%) lower confidence bound of CDF proportion estimate
- UCBxxPct.P: xx% (default 95%) upper confidence bound of CDF proportion estimate
- Estimate.U: CDF total estimate
- StdError.U: standard error of CDF total estimate
- MarginofError.U: margin of error of CDF total estimate
- LCBxxPct.U: xx% (default 95%) lower confidence bound of CDF total estimate
- UCBxxPct.U: xx% (default 95%) upper confidence bound of CDF total estimate
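To make the Estimate.P and Estimate.U columns concrete, the sketch below computes a design-weighted empirical CDF at a few values. It assumes the standard weighted estimators: the sum of weights at or below a value for the total, and that sum divided by the sum of all weights for the proportion. The package's internal computations, including variance estimation, may differ in detail; the data and object names are hypothetical.

```r
y <- c(8.1, 9.4, 9.9, 10.3, 11.0, 12.2)   # hypothetical responses
w <- c(20, 35, 15, 40, 25, 15)            # hypothetical design weights

values <- c(9, 10, 11)                    # points at which the CDF is evaluated

# Weighted total at or below each value, and the matching percentage.
cdf_total <- sapply(values, function(v) sum(w[y <= v]))
cdf_pct <- 100 * cdf_total / sum(w)

# Number of sampled responses at or below each value.
n_resp <- sapply(values, function(v) sum(y <= v))

data.frame(
  Value = values, nResp = n_resp,
  Estimate.P = cdf_pct, Estimate.U = cdf_total
)
```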
The Pct data frame contains the following variables:
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- Statistic: value of percentile
- nResp: sample size at or below Value
- Estimate: percentile estimate
- StdError: standard error of percentile estimate
- MarginofError: margin of error of percentile estimate
- LCBxxPct: xx% (default 95%) lower confidence bound of percentile estimate
- UCBxxPct: xx% (default 95%) upper confidence bound of percentile estimate
The Mean data frame contains the following variables:
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- nResp: sample size at or below Value
- Estimate: mean estimate
- StdError: standard error of mean estimate
- MarginofError: margin of error of mean estimate
- LCBxxPct: xx% (default 95%) lower confidence bound of mean estimate
- UCBxxPct: xx% (default 95%) upper confidence bound of mean estimate
The Total data frame contains the following variables:
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Indicator: response variable
- nResp: sample size at or below Value
- Estimate: total estimate
- StdError: standard error of total estimate
- MarginofError: margin of error of total estimate
- LCBxxPct: xx% (default 95%) lower confidence bound of total estimate
- UCBxxPct: xx% (default 95%) upper confidence bound of total estimate
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- cat_analysis: for categorical variable analysis
Examples
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
cont_analysis(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize, statistics = "Mean"
)
Create a PDF file containing cumulative distribution function (CDF) plots
Description
This function creates a PDF file containing CDF plots.  Input data for the
plots is provided by a data frame with the same structure as the "CDF"
output from cont_analysis.  Plots are produced for every combination of Type of
population, Subpopulation within Type, and Indicator (every combination
of subpopulations, subpopulation levels, and variables).
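The set of plots implied by this description can be previewed by listing the distinct combinations of the three identifying columns. The sketch below uses a small hypothetical data frame that mimics those columns of the "CDF" output; it does not call cont_cdfplot itself.

```r
# Hypothetical rows carrying the three identifying columns of the "CDF" output.
cdfest <- data.frame(
  Type = c("All_Sites", "All_Sites", "Resource_Class",
           "Resource_Class", "Resource_Class"),
  Subpopulation = c("All Sites", "All Sites", "Good", "Good", "Poor"),
  Indicator = "ContVar"
)

# One plot is expected for each distinct combination.
combos <- unique(cdfest[, c("Type", "Subpopulation", "Indicator")])
nrow(combos)
```

With the default cdf_page = 4, these three combinations would fit on a single page.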
Usage
cont_cdfplot(
  pdffile = "cdf2x2.pdf",
  cdfest,
  units_cdf = "Percent",
  ind_type = rep("Continuous", nind),
  log = rep("", nind),
  xlab = NULL,
  ylab = NULL,
  ylab_r = NULL,
  legloc = NULL,
  cdf_page = 4,
  width = 10,
  height = 8,
  confcut = 0,
  cex.main = 1.2,
  cex.legend = 1,
  ...
)
Arguments
| pdffile | Name of the PDF file. The default is "cdf2x2.pdf". | 
| cdfest | Data frame with the same structure as the "CDF" output from cont_analysis. |
| units_cdf | Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". | 
| ind_type | Character vector consisting of the values "Continuous" or "Ordinal" that controls the type of CDF plot for each indicator. The default is "Continuous" for every indicator. | 
| log | Character vector consisting of the values "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x") for each indicator. The default is "" for every indicator. | 
| xlab | Character vector consisting of the x-axis label for each indicator. If this argument equals NULL, then indicator names are used as the labels. The default is NULL. | 
| ylab | Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". | 
| ylab_r | Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. | 
| legloc | Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. | 
| cdf_page | Number of CDF plots on each page, which must be chosen from the values: 1, 2, 4, or 6. The default is 4. | 
| width | Width of the graphic region in inches. The default is 10. | 
| height | Height of the graphic region in inches. The default is 8. | 
| confcut | Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. | 
| cex.main | Expansion factor for the plot title. The default is 1.2. | 
| cex.legend | Expansion factor for the legend title. The default is 1. | 
| ... | Additional arguments passed to the  | 
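The effect of confcut can be sketched as a logical mask over CDF values on the percent scale, following the rule stated for that argument. The object names and values below are hypothetical.

```r
confcut <- 5
cdf_pct <- c(2, 5, 40, 95, 98)   # hypothetical CDF estimates (percent scale)

# Confidence limits are kept only where the CDF value is neither below
# confcut nor above 100 minus confcut.
plot_limits <- cdf_pct >= confcut & cdf_pct <= 100 - confcut
plot_limits
```

Setting confcut to 0 makes the mask TRUE everywhere, which matches the statement that limits are then plotted over the complete range of the CDF.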
Value
A PDF file containing the CDF plots.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- cdf_plot: for plotting a cumulative distribution function (CDF)
- cont_cdftest: for CDF hypothesis testing
Examples
## Not run: 
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
  vars = myvars, subpops = mysubpops,
  siteID = "siteID", weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize
)
cont_cdfplot("myanalysis.pdf", myanalysis$CDF, ylab_r = "Stream Length (km)")
## End(Not run)
Cumulative distribution function (CDF) inference for a probability survey
Description
This function organizes input and output for conducting inference regarding cumulative distribution functions (CDFs) generated by a probability survey. For every response variable and every subpopulation (domain) variable, differences between CDFs are tested for every pair of subpopulations within the domain. Data input to the function can be either a single survey or multiple surveys (two or more). If the data contain multiple surveys, then the domain variables will reference those surveys and (potentially) subpopulations within those surveys. The inferential procedures divide the CDFs into a discrete set of intervals (classes) and then utilize procedures that have been developed for analysis of categorical data from probability surveys. Choices for inference are the Wald, adjusted Wald, Rao-Scott first order corrected (mean eigenvalue corrected), and Rao-Scott second order corrected (Satterthwaite corrected) test statistics. The default test statistic is the adjusted Wald statistic. The input data argument can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
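The step of dividing a CDF into a discrete set of classes can be sketched in base R. The example below bins a hypothetical response into nclass classes and tabulates weighted proportions by subpopulation, which gives the kind of categorical table the test statistics operate on. The choice of class boundaries (unweighted sample quantiles of the pooled data) is an assumption made for illustration and may not match the boundaries cont_cdftest uses.

```r
set.seed(1)
y <- rnorm(60, 10, 1)                         # hypothetical response
w <- runif(60, 10, 100)                       # hypothetical design weights
subpop <- rep(c("Good", "Poor"), each = 30)   # hypothetical subpopulations
nclass <- 3

# Class boundaries from pooled sample quantiles (illustrative assumption).
breaks <- quantile(y, probs = seq(0, 1, length.out = nclass + 1))
y_class <- cut(y,
  breaks = breaks, include.lowest = TRUE,
  labels = paste0("Class", seq_len(nclass))
)

# Weighted totals by class and subpopulation, then proportions within
# each subpopulation.
totals <- xtabs(w ~ y_class + subpop)
props <- prop.table(totals, margin = 2)
props
```

Comparing the columns of props is the categorical analogue of comparing the two subpopulation CDFs; the Wald and Rao-Scott statistics themselves are not computed here.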
Usage
cont_cdftest(
  dframe,
  vars,
  subpops = NULL,
  surveyID = NULL,
  siteID = "siteID",
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  testname = "adjWald",
  nclass = 3
)
Arguments
| dframe | Data frame containing survey design variables, response variables, and subpopulation (domain) variables. | 
| vars | Vector composed of character values that identify the names of response variables in dframe. |
| subpops | Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. The default is NULL. |
| surveyID | Character value providing name of the survey ID variable in dframe. The default is NULL. |
| siteID | Character value providing name of the site ID variable in dframe. The default is "siteID". |
| weight | Character value providing name of the survey design weight variable in dframe. The default is "weight". |
| xcoord | Character value providing name of the x-coordinate variable in dframe. The default is NULL. |
| ycoord | Character value providing name of the y-coordinate variable in dframe. The default is NULL. |
| stratumID | Character value providing name of the stratum ID variable in dframe. The default is NULL. |
| clusterID | Character value providing the name of the cluster (stage one) ID variable in dframe. The default is NULL. |
| weight1 | Character value providing name of the stage one weight variable in dframe. The default is NULL. |
| xcoord1 | Character value providing the name of the stage one x-coordinate variable in dframe. The default is NULL. |
| ycoord1 | Character value providing the name of the stage one y-coordinate variable in dframe. The default is NULL. |
| sizeweight | Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
| sweight | Character value providing the name of the size weight variable in dframe. The default is NULL. |
| sweight1 | Character value providing name of the stage one size weight variable in dframe. The default is NULL. |
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design: 
 Example fpc for a single-stage stratified survey design: 
 Example fpc for a two-stage unstratified survey design: 
 Example fpc for a two-stage stratified survey design: 
 | 
| popsize | Object that provides values for the population argument of the
 Example popsize for calibration: 
 Example popsize for post-stratification using a data frame: 
 Example popsize for post-stratification using a table: 
 Example popsize for post-stratification using an xtabs object: 
 | 
| vartype | Character value providing the choice of the variance
estimator, where  | 
| jointprob | Character value providing the choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where  | 
| testname | Name of the test statistic to be reported in the output
data frame.  Choices for the name are:  | 
| nclass | Number of classes into which the CDFs will be divided
(binned), which must equal at least  | 
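The four fpc layouts described in the table above can be sketched as follows. All population sizes, stratum names, and cluster names are hypothetical, and the single-stage value is assumed here to be the number of units in the population; in practice the names must match the stratum and cluster IDs in dframe.

```r
# Single-stage unstratified design: a single numeric value
# (assumed here to be the number of units in the population).
fpc_single <- 15000

# Single-stage stratified design: a named list with one vector per stratum.
fpc_single_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage unstratified design: the first item is the number of clusters in
# the population (its name is arbitrary); each later item is the number of
# stage two units for a sampled cluster, named by cluster ID.
fpc_two <- c(Ncluster = 200, Cluster1 = 100, Cluster2 = 120, Cluster3 = 80)

# Two-stage stratified design: a named list of vectors, one per stratum,
# each laid out like the two-stage unstratified vector.
fpc_two_strat <- list(
  Stratum1 = c(Ncluster = 100, Cluster1 = 125, Cluster2 = 100),
  Stratum2 = c(Ncluster = 150, Cluster3 = 90, Cluster4 = 110)
)

# Consistency check against the (hypothetical) sampled cluster IDs: the vector
# has one more item than the number of sampled clusters.
sampled_clusters <- c("Cluster1", "Cluster2", "Cluster3")
stopifnot(length(fpc_two) == length(sampled_clusters) + 1)
stopifnot(identical(names(fpc_two)[-1], sampled_clusters))
```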
Value
Data frame of CDF test results for all pairs of subpopulations
within each population type for every response variable.  The data frame
includes the test statistic specified by argument testname plus its
degrees of freedom and p-value.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- cdf_plot: for visualizing CDF plots
- cont_cdfplot: for making CDF plots output to PDFs
Examples
n <- 200
mysiteID <- paste("Site", 1:n, sep = "")
dframe <- data.frame(
  siteID = mysiteID,
  wgt = runif(n, 10, 100),
  xcoord = runif(n),
  ycoord = runif(n),
  stratum = rep(c("Stratum1", "Stratum2"), n / 2),
  Resource_Class = sample(c("Agr", "Forest", "Urban"), n, replace = TRUE)
)
ContVar <- numeric(n)
tst <- dframe$Resource_Class == "Agr"
ContVar[tst] <- rnorm(sum(tst), 10, 1)
tst <- dframe$Resource_Class == "Forest"
ContVar[tst] <- rnorm(sum(tst), 10.1, 1)
tst <- dframe$Resource_Class == "Urban"
ContVar[tst] <- rnorm(sum(tst), 10.5, 1)
dframe$ContVar <- ContVar
myvars <- c("ContVar")
mysubpops <- c("Resource_Class")
mypopsize <- data.frame(
  Resource_Class = rep(c("Agr", "Forest", "Urban"), rep(2, 3)),
  stratum = rep(c("Stratum1", "Stratum2"), 3),
  Total = c(2500, 1500, 1000, 500, 600, 450)
)
cont_cdftest(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize, testname = "RaoScott_First"
)
Create a covariance matrix for a panel design
Description
The covariance structure accounts for the panel design and four variance components: unit variation, period variation, unit by period interaction variation, and index (or residual) variation. It also includes a provision for unit correlation and period autocorrelation.
Usage
cov_panel_dsgn(
  paneldsgn = matrix(50, 1, 10),
  nrepeats = 1,
  unit_var = NULL,
  period_var = NULL,
  unitperiod_var = NULL,
  index_var = NULL,
  unit_rho = 1,
  period_rho = 0
)
Arguments
| paneldsgn | A matrix (number of panels (rows) by number of periods (columns)) containing the number of units visited for each combination of panel and period. The default is matrix(50, 1, 10), a single panel of 50 units visited 10 times, once in each period. | 
| nrepeats | Either  | 
| unit_var | The variance component estimate for unit. The default is NULL. | 
| period_var | The variance component estimate for period. The default is NULL. | 
| unitperiod_var | The variance component estimate for unit by period interaction. The default is NULL. | 
| index_var | The variance component estimate for index error. The default is NULL. | 
| unit_rho | Unit correlation across periods. The default is 1. | 
| period_rho | Period autocorrelation. The default is 0. | 
Details
The covariance structure accounts for the panel design and the four variance components: unit variation, period variation, unit by period interaction variation, and index (or residual) variation. It uses the model structure defined by Urquhart (2012).
If nrepeats is NULL, no units are sampled more than once in a specific
panel and period combination. In that case, either the unit by period and
index variances are added together, or the user may have estimated only the
unit, period, and unit by period variance components, so that the index
component is zero. The function calculates the covariance matrix for the
simple linear regression. The standard error for a linear trend coefficient
is the square root of its variance.
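The final step, going from a covariance matrix to the standard error of a linear trend, can be illustrated with the standard generalized least squares identity Var(b) = (X' V^-1 X)^-1. This is a sketch under stated assumptions, not the function's internal code: the matrix V below is a toy compound-symmetric matrix with made-up values, not one produced by cov_panel_dsgn().

```r
# Ten periods and a simple linear regression design matrix (intercept, period).
periods <- 1:10
X <- cbind(1, periods)

# Toy covariance matrix of the ten period-level estimates (hypothetical values):
# a variance of `a` that is independent across periods plus a covariance of `b`
# shared by every pair of periods.
a <- 0.4
b <- 0.1
V <- a * diag(length(periods)) + b

# Generalized least squares covariance matrix of the regression coefficients.
cov_beta <- solve(t(X) %*% solve(V) %*% X)

# The standard error of the trend (slope) is the square root of its variance.
se_trend <- sqrt(cov_beta[2, 2])
print(se_trend)
```

For this particular toy structure the shared covariance b affects only the intercept, so the slope variance reduces to a divided by the sum of squared deviations of the periods from their mean. More general covariance structures, such as those with period autocorrelation, do not simplify this way.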
Value
A list containing the covariance matrix (cov) for the panel design,
the input panel design (paneldsgn), the input nrepeats design
(nrepeats.dsgn) and the function call.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
Urquhart, N. S., W. S. Overton, et al. (1993) Comparing sampling designs for monitoring ecological status and trends: impact of temporal patterns. In: Statistics for the Environment. V. Barnett and K. F. Turkman. John Wiley & Sons, New York, pp. 71-86.
Urquhart, N. S. and T. M. Kincaid (1999). Designs for detecting trends from repeated surveys of ecological resources. Journal of Agricultural, Biological, and Environmental Statistics, 4(4), 404-414.
Urquhart, N. S. (2012). The role of monitoring design in detecting trend in long-term ecological monitoring studies. In: Design and Analysis of Long-term Ecological Monitoring Studies. R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.). Cambridge University Press, New York, pp. 151-173.
See Also
- power_dsgn: for power calculations of multiple panel designs
Risk difference analysis
Description
This function organizes input and output for risk difference analysis (of
categorical variables).  The analysis data,
dframe, can be either a data frame or a simple features (sf) object.  If an
sf object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord and ycoord are assigned values
"xcoord" and "ycoord", respectively, and the geometry column is
dropped from the object.
Usage
diffrisk_analysis(
  dframe,
  vars_response,
  vars_stressor,
  response_levels = NULL,
  stressor_levels = NULL,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  conf = 95,
  All_Sites = FALSE
)
Arguments
| dframe | Data to be analyzed (analysis data). A data frame or
 | 
| vars_response | Vector composed of character values that identify the
names of response variables in  | 
| vars_stressor | Vector composed of character values that identify the
names of stressor variables in  | 
| response_levels | List providing the category values (levels) for each
element in the  | 
| stressor_levels | List providing the category values (levels) for each
element in the  | 
| subpops | Vector composed of character values that identify the
names of subpopulation (domain) variables in  | 
| siteID | Character value providing the name of the site ID variable in
 | 
| weight | Character value providing the name of the design weight
variable in  | 
| xcoord | Character value providing name of the x-coordinate variable in
 | 
| ycoord | Character value providing name of the y-coordinate variable in
 | 
| stratumID | Character value providing the name of the stratum ID
variable in  | 
| clusterID | Character value providing the name of the cluster
(stage one) ID variable in  | 
| weight1 | Character value providing the name of the stage one weight
variable in  | 
| xcoord1 | Character value providing the name of the stage one
x-coordinate variable in  | 
| ycoord1 | Character value providing the name of the stage one
y-coordinate variable in  | 
| sizeweight | Logical value that indicates whether size weights should be
used during estimation, where  | 
| sweight | Character value providing the name of the size weight variable
in  | 
| sweight1 | Character value providing the name of the stage one size
weight variable in  | 
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design: 
 Example fpc for a single-stage stratified survey design: 
 Example fpc for a two-stage unstratified survey design: 
 Example fpc for a two-stage stratified survey design: 
 | 
| popsize | Object that provides values for the population argument of the
 Example popsize for calibration: 
 Example popsize for post-stratification using a data frame: 
 Example popsize for post-stratification using a table: 
 Example popsize for post-stratification using an xtabs object: 
 | 
| vartype | Character value providing the choice of the variance
estimator, where  | 
| conf | Numeric value providing the Gaussian-based confidence level. The default value is 95. | 
| All_Sites | A logical variable used when  | 
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and size of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
- Type: subpopulation (domain) name
- Subpopulation: subpopulation name within a domain
- Response: response variable
- Stressor: stressor variable
- nResp: sample size
- Estimate: risk difference estimate
- Estimate_StressPoor: risk estimate for poor condition stressor
- Estimate_StressGood: risk estimate for good condition stressor
- StdError: risk difference standard error
- MarginofError: risk difference margin of error
- LCBxxPct: xx% (default 95%) lower confidence bound
- UCBxxPct: xx% (default 95%) upper confidence bound
- WeightTotal: sum of design weights
- Count_RespPoor_StressPoor: number of observations in the poor response and poor stressor group
- Count_RespPoor_StressGood: number of observations in the poor response and good stressor group
- Count_RespGood_StressPoor: number of observations in the good response and poor stressor group
- Count_RespGood_StressGood: number of observations in the good response and good stressor group
- Prop_RespPoor_StressPoor: weighted proportion of observations in the poor response and poor stressor group
- Prop_RespPoor_StressGood: weighted proportion of observations in the poor response and good stressor group
- Prop_RespGood_StressPoor: weighted proportion of observations in the good response and poor stressor group
- Prop_RespGood_StressGood: weighted proportion of observations in the good response and good stressor group
Details
Risk difference measures the absolute strength of association between conditional probabilities defined for a response variable and a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Risk difference is defined as the difference between two conditional probabilities: the probability that the response variable is in poor condition given that the stressor variable is in poor condition and the probability that the response variable is in poor condition given that the stressor variable is in good condition. Risk difference values close to zero indicate that the stressor variable has little or no impact on the probability that the response variable is in poor condition. Risk difference values much greater than zero indicate that the stressor variable has a significant impact on the probability that the response variable is in poor condition.
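The definition above can be worked through by hand on a small weighted table. The sketch below uses base R only; the toy data and object names are hypothetical, and it computes only the point estimate from weighted totals, not the standard error or confidence bounds that diffrisk_analysis() reports.

```r
# Toy data: response condition, stressor condition, and design weight
# for eight sites (all values are hypothetical).
toy <- data.frame(
  resp   = c("Poor", "Poor", "Good", "Poor", "Good", "Good", "Poor", "Good"),
  stress = c("Poor", "Poor", "Poor", "Good", "Good", "Good", "Good", "Poor"),
  wgt    = c(10, 20, 10, 5, 25, 30, 10, 20)
)

# Weighted 2 x 2 table: rows are response levels, columns are stressor levels.
tot <- xtabs(wgt ~ resp + stress, data = toy)

# Conditional probabilities that the response is poor, given each stressor level.
risk_stress_poor <- tot["Poor", "Poor"] / sum(tot[, "Poor"])
risk_stress_good <- tot["Poor", "Good"] / sum(tot[, "Good"])

# Risk difference: P(response poor | stressor poor) - P(response poor | stressor good).
risk_diff <- risk_stress_poor - risk_stress_good
print(c(
  stress_poor = as.numeric(risk_stress_poor),
  stress_good = as.numeric(risk_stress_good),
  difference = as.numeric(risk_diff)
))
```

Here the weighted risk is 30/60 when the stressor is poor and 15/70 when it is good, so the risk difference is 2/7, or about 0.29.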
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- attrisk_analysis: for attributable risk analysis
- relrisk_analysis: for relative risk analysis
Examples
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
  RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
  StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
diffrisk_analysis(dframe,
  vars_response = myresponse,
  vars_stressor = mystressor, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum"
)
Print errors from analysis functions
Description
This function prints the vector of error messages generated by the analysis functions.
Usage
errorprnt(error_vec = get("error_vec", envir = .GlobalEnv))
Arguments
| error_vec | Data frame that contains error messages. The default is get("error_vec", envir = .GlobalEnv). | 
Value
Printed errors.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Select a generalized random tessellation stratified (GRTS) sample
Description
Select a spatially balanced sample from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites. For technical details, see Stevens and Olsen (2004).
Usage
grts(
  sframe,
  n_base,
  stratum_var = NULL,
  seltype = NULL,
  caty_var = NULL,
  caty_n = NULL,
  aux_var = NULL,
  legacy_var = NULL,
  legacy_sites = NULL,
  legacy_stratum_var = NULL,
  legacy_caty_var = NULL,
  legacy_aux_var = NULL,
  mindis = NULL,
  maxtry = 10,
  n_over = NULL,
  n_near = NULL,
  wgt_units = NULL,
  pt_density = NULL,
  DesignID = "Site",
  SiteBegin = 1,
  sep = "-",
  projcrs_check = TRUE
)
Arguments
| sframe | A sampling frame as an  | 
| n_base | The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by  | 
| stratum_var | A character string containing the name of the column from
 | 
| seltype | A character string or vector indicating the inclusion probability type,
which must be one of following:  | 
| caty_var | A character string containing the name of the column from
 | 
| caty_n | A character vector indicating the expected sample size for each
level of  | 
| aux_var | A character string containing the name of the column from
 | 
| legacy_var | This argument can be used instead of  | 
| legacy_sites | An sf object with a  | 
| legacy_stratum_var | A character string containing the name of the column from
 | 
| legacy_caty_var | A character string containing the name of the column from
 | 
| legacy_aux_var | A character string containing the name of the column from
 | 
| mindis | A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and  | 
| maxtry | The number of maximum attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are  | 
| n_over | The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
 | 
| n_near | The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified,  | 
| wgt_units | The units used to compute the design weights. These
units must be standard units as defined by the  | 
| pt_density | A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sample
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points.  | 
| DesignID | A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with  | 
| SiteBegin | A character string indicating the first number to use to match
with  | 
| sep | A character string that acts as a separator between
 | 
| projcrs_check | A check for whether the coordinates are projected. If  | 
Details
n_base is the number of sites used to calculate
the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base is typically the
number of sites in all panels that will be sampled in the same temporal period –
n_base is not the total number of sites in all panels. The sum of n_base and
n_over is equal to the total number of sites to be visited for all panels plus
any replacement sites that may be required.
Value
The sampling design sites and additional information about the sampling design. Specifically, a list with five elements:
- sites_legacy: An sf object containing legacy sites. This is NULL if legacy sites were not included in the sample.
- sites_base: An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
- sites_over: An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
- sites_near: An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
- design: A list documenting the specifications of this sampling design. This can be checked to verify your sampling design ran as intended.
  - call: The original function call.
  - stratum_var: The name of the stratification variable in sframe. This equals NULL if no stratification is used.
  - stratum: The unique strata. This equals "None" if the sampling design is unstratified.
  - n_base: The base sample size per stratum.
  - seltype: The selection type per stratum.
  - caty_var: The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
  - caty_n: The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
  - aux_var: The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
  - legacy: A logical variable indicating whether legacy sites were included in the sample.
  - legacy_stratum_var: The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
  - legacy_caty_var: The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
  - legacy_aux_var: The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
  - mindis: The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
  - n_over: The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is unequal, this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
  - n_near: The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
When non-NULL, the sites_legacy, sites_base,
sites_over, and sites_near objects contain the original columns
in sframe and include a few additional columns. These additional columns
are
- siteID: A site identifier (as named using the DesignID and SiteBegin arguments to grts()).
- siteuse: Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
- replsite: The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
- lon_WGS84: Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- lat_WGS84: Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- X: Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- Y: Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- stratum: A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
- wgt: The design weight.
- ip: The site's original inclusion probability (the reciprocal of wgt).
- caty: An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
- aux: The auxiliary proportional probability variable. This column is only returned if seltype was proportional in the original sampling design.
If any columns in sframe contain these names, those columns
from sframe will be automatically prefixed with sframe_
in the sites object. When output is printed, a summary of site counts by
the levels in stratum_var and caty_var is shown.
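The relationship between the wgt and ip columns can be illustrated for the simplest case. The sketch below assumes an equal-probability design on a finite (point) frame, where every site's inclusion probability is n_base / N. The frame size N is hypothetical, and no sample is actually drawn with grts().

```r
# Hypothetical finite sampling frame of N points and a base sample of n_base sites.
N <- 500
n_base <- 50

# Under equal-probability selection (an assumption of this sketch), every
# sampled site has the same inclusion probability.
ip <- rep(n_base / N, n_base)

# The design weight is the reciprocal of the inclusion probability.
wgt <- 1 / ip

# The weights of the sampled sites add up to the frame size.
stopifnot(isTRUE(all.equal(sum(wgt), N)))
print(unique(wgt))
```

With unequal or proportional inclusion probabilities the weights differ between sites, but the reciprocal relationship between wgt and ip is the same.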
Author(s)
Tony Olsen olsen.tony@epa.gov
References
Stevens Jr., Don L. and Olsen, Anthony R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465), 262-278.
See Also
- irs: to select a sample that is not spatially balanced
Examples
## Not run: 
samp <- grts(NE_Lakes, n_base = 100)
print(samp)
strata_n <- c(low = 25, high = 30)
samp_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
samp_over <- grts(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)
## End(Not run)
Select an independent random sample (IRS)
Description
Select a sample that is not spatially balanced from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Independent Random Sampling (IRS) algorithm. The IRS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites.
Usage
irs(
  sframe,
  n_base,
  stratum_var = NULL,
  seltype = NULL,
  caty_var = NULL,
  caty_n = NULL,
  aux_var = NULL,
  legacy_var = NULL,
  legacy_sites = NULL,
  legacy_stratum_var = NULL,
  legacy_caty_var = NULL,
  legacy_aux_var = NULL,
  mindis = NULL,
  maxtry = 10,
  n_over = NULL,
  n_near = NULL,
  wgt_units = NULL,
  pt_density = NULL,
  DesignID = "Site",
  SiteBegin = 1,
  sep = "-",
  projcrs_check = TRUE
)
Arguments
| sframe | A sampling frame as an  | 
| n_base | The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by  | 
| stratum_var | A character string containing the name of the column from
 | 
| seltype | A character string or vector indicating the inclusion probability type,
which must be one of following:  | 
| caty_var | A character string containing the name of the column from
 | 
| caty_n | A character vector indicating the expected sample size for each
level of  | 
| aux_var | A character string containing the name of the column from
 | 
| legacy_var | This argument can be used instead of  | 
| legacy_sites | An sf object with a  | 
| legacy_stratum_var | A character string containing the name of the column from
 | 
| legacy_caty_var | A character string containing the name of the column from
 | 
| legacy_aux_var | A character string containing the name of the column from
 | 
| mindis | A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and  | 
| maxtry | The number of maximum attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are  | 
| n_over | The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
 | 
| n_near | The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified,  | 
| wgt_units | The units used to compute the design weights. These
units must be standard units as defined by the  | 
| pt_density | A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sample
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points.  | 
| DesignID | A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with  | 
| SiteBegin | A character string indicating the first number to use to match
with  | 
| sep | A character string that acts as a separator between
 | 
| projcrs_check | A check for whether the coordinates are projected. If  | 
Details
n_base is the number of sites used to calculate
the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base is typically the
number of sites in all panels that will be sampled in the same temporal period –
n_base is not the total number of sites in all panels. The sum of n_base and
n_over is equal to the total number of sites to be visited for all panels plus
any replacement sites that may be required.
Value
The sampling design sites and additional information about the sampling design. Specifically, a list with five elements:
- sites_legacy: An sf object containing legacy sites. This is NULL if legacy sites were not included in the sample.
- sites_base: An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
- sites_over: An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
- sites_near: An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
- design: A list documenting the specifications of this sampling design. This can be checked to verify your sampling design ran as intended.
  - call: The original function call.
  - stratum_var: The name of the stratification variable in sframe. This equals NULL if no stratification is used.
  - stratum: The unique strata. This equals "None" if the sampling design is unstratified.
  - n_base: The base sample size per stratum.
  - seltype: The selection type per stratum.
  - caty_var: The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
  - caty_n: The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
  - aux_var: The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
  - legacy: A logical variable indicating whether legacy sites were included in the sample.
  - legacy_stratum_var: The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
  - legacy_caty_var: The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
  - legacy_aux_var: The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
  - mindis: The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
  - n_over: The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is unequal, this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
  - n_near: The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
When non-NULL, the sites_legacy, sites_base,
sites_over, and sites_near objects contain the original columns
in sframe and include a few additional columns. These additional columns
are
-  siteID: A site identifier (as named using the DesignID and SiteBegin arguments to grts()).
-  siteuse: Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
-  replsite: The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
-  lon_WGS84: Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
-  lat_WGS84: Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
-  X: Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
-  Y: Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
-  stratum: A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
-  wgt: The design weight.
-  ip: The site's original inclusion probability (the reciprocal of wgt).
-  caty: An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
-  aux: The auxiliary proportional probability variable. This column is only returned if seltype was proportional in the original sampling design.
If any columns in sframe contain these names, those columns
from sframe will be automatically prefixed with sframe_
in the sites object. When output is printed, a summary of site counts by
the levels in stratum_var and caty_var is shown.
Author(s)
Tony Olsen olsen.tony@epa.gov
See Also
- grts
- to select a sample that is spatially balanced 
Examples
## Not run: 
samp <- irs(NE_Lakes, n_base = 100)
print(samp)
strata_n <- c(low = 25, high = 30)
samp_strat <- irs(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
samp_over <- irs(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)
## End(Not run)
Internal Function: Variance-Covariance Matrix Based on Local Mean Estimator
Description
This function calculates the variance-covariance matrix using the local mean estimator.
Usage
localmean_cov(zmat, weight_1st)
Arguments
| zmat | Matrix of weighted response values or weighted residual values for the sample points. | 
| weight_1st | List from the local mean weight function containing two elements: a matrix named ij composed of the index values of neighboring points and a vector named gwt composed of weights. | 
Value
The local mean estimator of the variance-covariance matrix.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Internal Function: Local Mean Variance Estimator
Description
This function calculates the local mean variance estimator.
Usage
localmean_var(z, weight_1st)
Arguments
| z | Vector of weighted response values or weighted residual values for the sample points. | 
| weight_1st | List from the local mean weight function containing two elements: a matrix named ij composed of the index values of neighboring points and a vector named gwt composed of weights. | 
Value
The local mean estimator of the variance.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Internal Function: Local Mean Variance Neighbors and Weights
Description
This function calculates the index values of neighboring points and associated weights required by the local mean variance estimator.
Usage
localmean_weight(x, y, prb, nbh = 4)
Arguments
| x | Vector of x-coordinates for location of the sample points. | 
| y | Vector of y-coordinates for location of the sample points. | 
| prb | Vector of inclusion probabilities for the sample points. | 
| nbh | Number of neighboring points to use in the calculations. | 
Value
If ginv fails to return valid output, a NULL object.  Otherwise, a
list containing two elements: a matrix named ij composed of the
index values of neighboring points and a vector named gwt
composed of weights.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Summary characteristics of a panel revisit design
Description
Panel revisit design characteristics are summarized: number of panels, number of time periods, total number of sample events for the revisit design, total number of sample events for each panel, total number of sample events for each time period and cumulative number of unique units sampled by time periods.
Usage
pd_summary(object, visitdsgn = NULL, ...)
Arguments
| object | Two-dimensional array from  | 
| visitdsgn | Two-dimensional array with same dimensions as  | 
| ... | Additional arguments (S3 consistency) | 
Details
The revisit panel design and the visit design (if present) are summarized. Summaries can be useful to know the effort required to complete the survey design. See the values returned for the summaries that are produced.
Value
List of seven elements.
- n_panel
- number of panels in revisit design 
- n_period
- number of time periods in revisit design 
- n_total
- total number of sample events across all panels and all time periods, accounting for visitdsgn, that will be sampled in the revisit design
- n_periodunit
- vector of the number of time periods a unit will be sampled in each panel 
- n_unitpnl
- vector of the number of sample units, accounting for visitdsgn, that will be sampled in each panel
- n_unitperiod
- vector of the number of sample units, accounting for visitdsgn, that will be sampled during each time period
- ncum_unit
- vector of the cumulative number of unique units that will be sampled in time periods up to and including the current time period. 
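The summaries listed above can be illustrated with base R on a small panel-by-period matrix. This is only a sketch, not the code used by pd_summary(): the matrix pnl and all of its values are made up for illustration, no visit design is applied, and each panel is assumed to keep the same units every time it is sampled.

```r
# Hypothetical revisit design: rows are panels, columns are time periods,
# and each entry is the number of units sampled (0 means not sampled).
pnl <- matrix(
  c(60,  0,  0, 60,
     0, 60,  0,  0,
     0,  0, 60,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Panel_", 1:3), paste0("Period_", 1:4))
)

n_panel <- nrow(pnl)               # number of panels
n_period <- ncol(pnl)              # number of time periods
n_unitpnl <- rowSums(pnl)          # sample events in each panel
n_unitperiod <- colSums(pnl)       # sample events in each time period
n_total <- sum(pnl)                # sample events overall
n_periodunit <- rowSums(pnl > 0)   # periods in which each panel is sampled

# Cumulative number of unique units: a panel contributes new units only in
# the first period in which it is sampled.
first_period <- apply(pnl > 0, 1, function(r) match(TRUE, r))
panel_size <- pnl[cbind(seq_len(n_panel), first_period)]
new_units <- sapply(
  seq_len(n_period),
  function(j) sum(panel_size[first_period == j])
)
ncum_unit <- cumsum(new_units)
```

Comparing n_total with the last value of ncum_unit shows how much of the sampling effort goes to revisits rather than new units.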
Author(s)
Tony Olsen Olsen.Tony@epa.gov
Examples
# Serially alternating panel revisit design summary
sa_dsgn <- revisit_dsgn(20, panels = list(SA60N = list(
  n = 60, pnl_dsgn = c(1, 4),
  pnl_n = NA, start_option = "None"
)), begin = 1)
pd_summary(sa_dsgn)
# Add visit design where first panel is sampled twice at every time period
sa_visit <- sa_dsgn
sa_visit[sa_visit > 0] <- 1
sa_visit[1, sa_visit[1, ] > 0] <- 2
pd_summary(sa_dsgn, sa_visit)
Plot sampling frames, design sites, and analysis data.
Description
This function plots sampling frames, design sites, and analysis data.
If the left-hand side of the formula is empty, plots
are of the distributions of the right-hand side variables. If the left-hand side
of the formula contains a variable, plots are of the left-hand side variable
for each level of each right-hand side variable.
This function is largely built on plot.sf(), and all spsurvey plotting
methods can supply additional arguments to plot.sf(). For more information on
plotting in sf, run ?sf::plot.sf(). Equivalent to sp_plot(); both
are currently maintained for backwards compatibility.
Usage
## S3 method for class 'sp_frame'
plot(
  x,
  formula = ~1,
  xcoord,
  ycoord,
  crs,
  var_args = NULL,
  varlevel_args = NULL,
  geom = FALSE,
  onlyshow = NULL,
  fix_bbox = TRUE,
  ...
)
## S3 method for class 'sp_design'
plot(
  x,
  sframe = NULL,
  formula = ~siteuse,
  siteuse = NULL,
  var_args = NULL,
  varlevel_args = NULL,
  geom = FALSE,
  onlyshow = NULL,
  fix_bbox = TRUE,
  ...
)
Arguments
| x | An object to plot. When plotting sampling frames an  | 
| formula | A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of  | 
| xcoord | Name of the x-coordinate (east-west) in  | 
| ycoord | Name of y (north-south)-coordinate in  | 
| crs | Projection code for  | 
| var_args | A named list. The name of each list element corresponds to a
right-hand side variable in  | 
| varlevel_args | A named list. The name of each list element corresponds to a
right-hand side variable in  | 
| geom | Should separate geometries for each level of the right-hand
side  | 
| onlyshow | A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. | 
| fix_bbox | Should the geometry bounding box be fixed across plots?
If a length-four vector with names "xmin", "ymin", "xmax", and "ymax" and values
indicating bounding box edges, the bounding box will be fixed as  | 
| ... | Additional arguments to pass to  | 
| sframe | The sampling frame (an  | 
| siteuse | A character vector of site types to include when plotting design sites.
It can only take on values  | 
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run: 
data("NE_Lakes")
NE_Lakes <- sp_frame(NE_Lakes)
plot(NE_Lakes, formula = ~ELEV_CAT)
sample <- grts(NE_Lakes, 30)
plot(sample, NE_Lakes)
## End(Not run)
Plot a cumulative distribution function (CDF)
Description
This function creates a CDF plot. Input data for the plots are provided by a
data frame from the "CDF" output given by cont_analysis.
Confidence limits for the CDF are also plotted. Equivalent to cdf_plot();
both are currently maintained for backwards compatibility.
Usage
## S3 method for class 'sp_CDF'
plot(
  x,
  var = NULL,
  subpop = NULL,
  subpop_level = NULL,
  units_cdf = "Percent",
  type_cdf = "Continuous",
  log = "",
  xlab = NULL,
  ylab = NULL,
  ylab_r = NULL,
  main = NULL,
  legloc = NULL,
  confcut = 0,
  conflev = 95,
  cex.main = 1.2,
  cex.legend = 1,
  ...
)
Arguments
| x | Data frame from the "CDF" output given by cont_analysis. | 
| var | If  | 
| subpop | If  | 
| subpop_level | If  | 
| units_cdf | Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". | 
| type_cdf | Character string consisting of the value "Continuous" or "Ordinal" that controls the type of CDF plot. The default is "Continuous". | 
| log | Character string consisting of the value "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x"). The default is "". | 
| xlab | Character string providing the x-axis label. If this argument equals NULL, then the indicator name is used as the label. The default is NULL. | 
| ylab | Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". | 
| ylab_r | Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. | 
| main | Character string providing the plot title. The default is NULL. | 
| legloc | Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. | 
| confcut | Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. | 
| conflev | Numeric value of the confidence level used for confidence limits. The default is 95. | 
| cex.main | Expansion factor for the plot title. The default is 1.2. | 
| cex.legend | Expansion factor for the legend title. The default is 1. | 
| ... | Additional arguments passed to the  | 
Value
A plot of a variable's CDF estimates and associated confidence limits.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- cont_cdfplot
- for creating a PDF file containing CDF plots 
- cont_cdftest
- for CDF hypothesis testing 
Examples
## Not run: 
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  ContVar = rnorm(100, 10, 1),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
  vars = myvars, subpops = mysubpops,
  siteID = "siteID", weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize
)
keep <- with(myanalysis$CDF, Type == "Resource_Class" &
  Subpopulation == "Good")
par(mfrow = c(2, 1))
plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length", ylab_r = "Stream Length (km)",
  main = "Estimates for Resource Class: Good"
)
plot(myanalysis$CDF[keep, ],
  xlab = "ContVar",
  ylab = "Percent of Stream Length", ylab_r = "Same",
  main = "Estimates for Resource Class: Good"
)
## End(Not run)
Power calculation for multiple panel designs
Description
Calculates the power for trend detection for one or more variables, for one or more panel designs, for one or more linear trends, and for one or more significance levels. The panel designs create a covariance model where the model includes variance components for units, periods, the interaction of units and periods, and the residual (or index) variance.
Usage
power_dsgn(
  ind_names,
  ind_values,
  unit_var,
  period_var,
  unitperiod_var,
  index_var,
  unit_rho = 1,
  period_rho = 0,
  paneldsgn,
  nrepeats = NULL,
  trend_type = "mean",
  ind_pct = NULL,
  ind_tail = NULL,
  trend = 2,
  alpha = 0.05
)
Arguments
| ind_names | Vector of indicator names | 
| ind_values | Vector of indicator mean values | 
| unit_var | Vector of variance component estimates for unit variability for the indicators | 
| period_var | Vector of variance component estimates for period variability for the indicators | 
| unitperiod_var | Vector of variance component estimates for unit by period interaction variability for the indicators | 
| index_var | Vector of variance component estimates for index (residual) error for the indicators | 
| unit_rho | Correlation across units. Default is  | 
| period_rho | Correlation across periods. Default is  | 
| paneldsgn | A list of panel designs each as a matrix.  Each element of
the list is a matrix with  | 
| nrepeats | Either  | 
| trend_type | Trend type is either  | 
| ind_pct | When  | 
| ind_tail | When trend_type is equal to  | 
| trend | Single value or vector of assumed percent change from
initial value in the indicator for each period. Assumes the trend is
expressed as percent per period. Note that the trend may be either positive
or negative. The default is  | 
| alpha | Single value or vector of significance level for linear
trend test, alpha, Type I error, level.  The default is  | 
Details
Calculates the power for detecting a change in the mean for different panel design structures. The model incorporates unit, period, unit by period, and index variance components as well as correlation across units and across periods. See references for methods.
Value
A list with components trend_type, ind_pct, ind_tail, trend values
across periods, periods (all periods included in one or more panel
designs), significance levels, a five-dimensional array of power
calculations (dimensions: panel design names, periods, indicator names,
trend names, alpha names), an array of indicator mean values for each trend,
and the function call.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
Urquhart, N. S., W. S. Overton, et al. (1993) Comparing sampling designs for monitoring ecological status and trends: impact of temporal patterns. In: Statistics for the Environment. V. Barnett and K. F. Turkman. John Wiley & Sons, New York, pp. 71-86.
Urquhart, N. S. and T. M. Kincaid (1999). Designs for detecting trends from repeated surveys of ecological resources. Journal of Agricultural, Biological, and Environmental Statistics, 4(4), 404-414.
Urquhart, N. S. (2012). The role of monitoring design in detecting trend in long-term ecological monitoring studies. In: Design and Analysis of Long-term Ecological Monitoring Studies. R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.). Cambridge University Press, New York, pp. 151-173.
See Also
- ppd_plot
- to plot power curves for panel designs 
Examples
# Power for rotating panel with sample size 60
power_dsgn("Variable_Name",
  ind_values = 43, unit_var = 280, period_var = 4,
  unitperiod_var = 40, index_var = 90, unit_rho = 1, period_rho = 0,
  paneldsgn = list(NoR60 = revisit_dsgn(20,
    panels = list(NoR60 = list(
      n = 60, pnl_dsgn = c(1, NA),
      pnl_n = NA, start_option = "None"
    )), begin = 1
  )),
  nrepeats = NULL, trend_type = "mean", trend = 1.0, alpha = 0.05
)
Plot power curves for panel designs
Description
Plot power curves and relative power curves for trend detection for a set of panel designs, time periods, indicators, significance levels, and trends. Trend may be based on percent change per period in the mean or percent change in the proportion of the cumulative distribution function above or below a fixed cut point. Types of plots are combinations of standard/relative, mean/percent, period/change, and design/indicator. Input must be of class powerpaneldesign and is normally the output of function power_dsgn.
Usage
ppd_plot(
  object,
  plot_type = "standard",
  trend_type = "mean",
  xaxis_type = "period",
  comp_type = "design",
  dsgns = NULL,
  indicator = NULL,
  trend = NULL,
  period = NULL,
  alpha = NULL,
  ...
)
Arguments
| object | List object of class  | 
| plot_type | Default is  | 
| trend_type | Character value for trend in mean ( | 
| xaxis_type | Character value equal to  | 
| comp_type | Character value equal to  | 
| dsgns | Vector of names of panel designs that are to be plotted.  Names
must be all, or a subset of, names of designs in  | 
| indicator | Vector of indicator names contained in  | 
| trend | 
 | 
| period | 
 | 
| alpha | A single value or vector of significance levels (as proportion,
e.g.  | 
| ... | Additional arguments (S3 consistency) | 
Details
By default, the plot function produces a standard power curve at the end
of each time period on the x-axis with power on the y-axis. When more than one
panel design is in object, the first panel design is used. When more than
one indicator is in object, the first indicator is used. When more than
one trend value is in object, the maximum trend value is used. When more
than one significance level, alpha, is in object, the minimum
significance level is used.
Control of the type of plot produced is governed by plot_type, trend_type,
xaxis_type, and comp_type. The number of plots produced is governed by the
number of panel designs (dsgns) specified, the number of indicators
(indicator) specified, the number of time periods (period) specified, the
number of trend values (trend) specified, and the number of significance
levels (alpha) specified.
When the comparison type (comp_type) is equal to "design", all power
curves specified by dsgns are plotted on the same plot. When comp_type is
equal to "indicator", all power curves specified by indicator are plotted
on the same plot. Typically, no more than 4-5 power curves should be
plotted on the same plot.
Value
One or more power curve plots are created and plotted. User must specify output graphical device if more than one plot is created. See Devices for graphical output options.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
Examples
## Not run: 
# Construct a rotating panel design with sample size of 60
R60N <- revisit_dsgn(20, panels = list(R60N = list(
  n = 60, pnl_dsgn = c(1, NA),
  pnl_n = NA, start_option = "None"
)), begin = 1)
# Construct a fixed panel design with sample size of 60
F60 <- revisit_dsgn(20, panels = list(F60 = list(
  n = 60, pnl_dsgn = c(1, 0),
  pnl_n = NA, start_option = "None"
)), begin = 1)
# Power for the rotating and fixed panel designs, each with sample size 60
Power_tst <- power_dsgn("Variable_Name",
  ind_values = 43, unit_var = 280,
  period_var = 4, unitperiod_var = 40, index_var = 90,
  unit_rho = 1, period_rho = 0, paneldsgn = list(
    R60N = R60N, F60 = F60
  ), nrepeats = NULL,
  trend_type = "mean", trend = c(1.0, 2.0), alpha = 0.05
)
ppd_plot(Power_tst)
ppd_plot(Power_tst, dsgns = c("F60", "R60N"))
ppd_plot(Power_tst, dsgns = c("F60", "R60N"), trend = 1.0)
ppd_plot(Power_tst,
  plot_type = "relative", comp_type = "design",
  trend_type = "mean", trend = c(1, 2), dsgns = c("R60N", "F60"),
  indicator = "Variable_Name"
)
## End(Not run)
Relative risk analysis
Description
This function organizes input and output for relative risk analysis (of
categorical variables).  The analysis data,
dframe, can be either a data frame or a simple features (sf) object.  If an
sf object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord and ycoord are assigned values
"xcoord" and "ycoord", respectively, and the geometry column is
dropped from the object.
Usage
relrisk_analysis(
  dframe,
  vars_response,
  vars_stressor,
  response_levels = NULL,
  stressor_levels = NULL,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  conf = 95,
  All_Sites = FALSE
)
Arguments
| dframe | Data to be analyzed (analysis data). A data frame or
 | 
| vars_response | Vector composed of character values that identify the
names of response variables in  | 
| vars_stressor | Vector composed of character values that identify the
names of stressor variables in  | 
| response_levels | List providing the category values (levels) for each
element in the  | 
| stressor_levels | List providing the category values (levels) for each
element in the  | 
| subpops | Vector composed of character values that identify the
names of subpopulation (domain) variables in  | 
| siteID | Character value providing the name of the site ID variable in
 | 
| weight | Character value providing the name of the design weight
variable in  | 
| xcoord | Character value providing name of the x-coordinate variable in
 | 
| ycoord | Character value providing name of the y-coordinate variable in
 | 
| stratumID | Character value providing the name of the stratum ID
variable in  | 
| clusterID | Character value providing the name of the cluster
(stage one) ID variable in  | 
| weight1 | Character value providing the name of the stage one weight
variable in  | 
| xcoord1 | Character value providing the name of the stage one
x-coordinate variable in  | 
| ycoord1 | Character value providing the name of the stage one
y-coordinate variable in  | 
| sizeweight | Logical value that indicates whether size weights should be
used during estimation, where  | 
| sweight | Character value providing the name of the size weight variable
in  | 
| sweight1 | Character value providing the name of the stage one size
weight variable in  | 
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design: 
 Example fpc for a single-stage stratified survey design: 
 Example fpc for a two-stage unstratified survey design: 
 Example fpc for a two-stage stratified survey design: 
 | 
| popsize | Object that provides values for the population argument of the
 Example popsize for calibration: 
 Example popsize for post-stratification using a data frame: 
 Example popsize for post-stratification using a table: 
 Example popsize for post-stratification using an xtabs object: 
 | 
| vartype | Character value providing the choice of the variance
estimator, where  | 
| conf | Numeric value providing the Gaussian-based confidence level.  The default value
is  | 
| All_Sites | A logical variable used when  | 
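The four fpc formats described in the Arguments table can be sketched as below. This is an illustration built only from the description of fpc. Every number and every stratum or cluster name is hypothetical, and in practice the names must match the strata IDs and cluster IDs in dframe.

```r
# Single-stage unstratified design: a single numeric value.
fpc_single <- 15000

# Single-stage stratified design: a named list with one entry per stratum.
fpc_single_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage unstratified design: a named vector. The first item is the number
# of clusters in the population (its name is arbitrary), and each subsequent
# item is the number of stage two units for a sampled cluster.
fpc_two_stage <- c(Ncluster = 200, Cluster1 = 150, Cluster2 = 75, Cluster3 = 75)

# Two-stage stratified design: a named list of such vectors, one per stratum.
fpc_two_stage_strat <- list(
  Stratum1 = c(Ncluster_1 = 100, Cluster1 = 125, Cluster2 = 100),
  Stratum2 = c(Ncluster_2 = 100, Cluster3 = 75, Cluster4 = 50)
)
```

Note that, as stated for the fpc argument, the finite population correction factor is not used with the local mean variance estimator.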
Value
The analysis results. A data frame of relative risk estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and stressor variables. Standard error, margin of error, and confidence bound estimates are provided, along with counts and weighted proportions for each response-by-stressor group. The data frame contains the following variables:
- Type
- subpopulation (domain) name 
- Subpopulation
- subpopulation name within a domain 
- Response
- response variable 
- Stressor
- stressor variable 
- nResp
- sample size 
- Estimate
- relative risk estimate 
- Estimate_num
- relative risk numerator estimate 
- Estimate_denom
- relative risk denominator estimate 
- StdError
- relative risk standard error 
- MarginofError
- relative risk margin of error 
- LCBxxPct
- xx% (default 95%) lower confidence bound 
- UCBxxPct
- xx% (default 95%) upper confidence bound 
- WeightTotal
- sum of design weights 
- Count_RespPoor_StressPoor
- number of observations in the poor response and poor stressor group 
- Count_RespPoor_StressGood
- number of observations in the poor response and good stressor group 
- Count_RespGood_StressPoor
- number of observations in the good response and poor stressor group 
- Count_RespGood_StressGood
- number of observations in the good response and good stressor group 
- Prop_RespPoor_StressPoor
- weighted proportion of observations in the poor response and poor stressor group 
- Prop_RespPoor_StressGood
- weighted proportion of observations in the poor response and good stressor group 
- Prop_RespGood_StressPoor
- weighted proportion of observations in the good response and poor stressor group 
- Prop_RespGood_StressGood
- weighted proportion of observations in the good response and good stressor group 
Details
Relative risk measures the relative strength of association between conditional probabilities defined for a response variable and a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Relative risk is defined as the ratio of two conditional probabilities. The numerator of the ratio is the probability that the response variable is in poor condition given that the stressor variable is in poor condition. The denominator of the ratio is the probability that the response variable is in poor condition given that the stressor variable is in good condition. A relative risk value equal to one indicates that the response variable is independent of the stressor variable. Relative risk values greater than one measure the extent to which poor condition of the stressor variable is associated with poor condition of the response variable.
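The definition above can be illustrated with a small base R sketch of a weighted point estimate. It is not the code used by relrisk_analysis(), and it omits variance estimation and confidence bounds entirely. The data frame dat, its column names, and all of its values are hypothetical.

```r
# Hypothetical analysis data: design weights and good/poor classifications.
dat <- data.frame(
  wgt      = c(10, 20, 30, 40, 15, 25, 35, 45),
  response = c("Poor", "Poor", "Good", "Good", "Poor", "Good", "Good", "Poor"),
  stressor = c("Poor", "Poor", "Poor", "Good", "Good", "Good", "Good", "Poor")
)

# Weighted totals for each response-by-stressor cell
# (rows are response levels, columns are stressor levels).
cell <- xtabs(wgt ~ response + stressor, data = dat)

# Numerator: P(response is poor | stressor is poor)
p_num <- cell["Poor", "Poor"] / sum(cell[, "Poor"])

# Denominator: P(response is poor | stressor is good)
p_denom <- cell["Poor", "Good"] / sum(cell[, "Good"])

# Relative risk is the ratio of the two conditional probabilities.
rel_risk <- p_num / p_denom
```

In this made-up data set rel_risk is greater than one, which under the definition above means poor stressor condition is associated with poor response condition.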
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- attrisk_analysis
- for attributable risk analysis 
- diffrisk_analysis
- for risk difference analysis 
Examples
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
  RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
  StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
relrisk_analysis(dframe,
  vars_response = myresponse,
  vars_stressor = mystressor, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum"
)
Create a balanced incomplete block panel revisit design
Description
Create a revisit design for panels in a survey that specifies the time periods for the units of each panel to be sampled based on searching for a D-optimal block design that is a member of the class of generalized Youden designs. The resulting design need not be a balanced incomplete block design. Based on an algorithmic idea by Cook and Nachtsheim (1989) and implemented by Robert Wheeler.
Usage
revisit_bibd(
  n_period,
  n_pnl,
  n_visit,
  nsamp,
  panel_name = "BIB",
  begin = 1,
  skip = 1,
  iter = 30
)
Arguments
| n_period | Number of time periods for the survey design. Typically, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. (v, number of varieties/treatments in BIBD terms) | 
| n_pnl | Number of panels (b, number of blocks in BIBD terms) | 
| n_visit | Number of time periods to be visited in a panel (k, block size in BIBD terms) | 
| nsamp | Number of samples in each panel. | 
| panel_name | Prefix for name of each panel | 
| begin | Numeric name of first sampling occasion, e.g. a specific period. | 
| skip | Number of sampling occasions to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if  | 
| iter | Maximum number of iterations in search for D-optimal Generalized Youden Design. | 
Details
The function uses the find.BIB function from the crossdes package to
search for a D-optimal block design. crossdes uses the AlgDesign package
to search for balanced incomplete block designs.
Value
A two-dimensional array of sample sizes to be sampled for each panel and each sampling occasion.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
Cook R. D. and C. Nachtsheim. (1989). Computer-aided blocking of factorial and response-surface designs. Technometrics 31(3), 339-346.
See Also
- revisit_dsgn
- to create a panel revisit design 
- revisit_rand
- to create a panel revisit design with random assignment to panels and time periods 
- pd_summary
- to summarize characteristics of a panel revisit design 
Examples
# Balanced incomplete block design with 20 sample occasions, 20 panels,
# 3 visits to each unit, and 20 units in each panel.
revisit_bibd(n_period = 20, n_pnl = 20, n_visit = 3, nsamp = 20)
Create a panel revisit design
Description
Create a revisit design for panels in a survey that specifies the time periods that members of each panel will be sampled. Three basic panel design structures may be created: always revisit panel, serially alternating panels, or rotating panels.
Usage
revisit_dsgn(n_period, panels, begin = 1, skip = 1)
Arguments
| n_period | Number of time periods for the panel design. For example, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. | 
| panels | List of lists where each list specifies a revisit panel
structure. Each sublist consists of four components:  | 
| begin | Numeric name of first sampling occasion, e.g. a specific period. | 
| skip | Number of time periods to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if  | 
Details
The function creates revisit designs using the concepts in McDonald (2003) to specify the revisit pattern across time periods for each panel. The panel revisit schedule is specified by a vector. Odd positions in the vector specify the number of consecutive time periods when panel units are sampled. Even positions in the vector specify the number of consecutive time periods when panel units are not sampled.
If the last even position is 0, then a single panel follows an always
revisit panel structure.  After satisfying the initial revisit schedule
specified prior to the 0, units in a panel are always visited for the rest of
the time periods. The simplest always revisit panel design is to revisit
every sample unit on every time period, specified as pnl_dsgn = c(1, 0) or
using McDonald's notation [1-0].
If the last even position is NA, the panels follow a rotating panel
structure. For example, pnl_dsgn = c(1, NA) designates that sample units in
a panel will be visited once and then never again, [1-n] in McDonald's
notation. pnl_dsgn = c(1, 4, 1, NA) designates that sample units in a panel
will be visited once, then not sampled on the next four time periods, then
sampled again once at the next time period and then never sampled again,
[1-4-1-n] in McDonald's notation.
If the last even position is > 0, the panels follow a serially alternating
panel structure. For example, pnl_dsgn = c(1, 4) designates that sample
units in a panel will be visited once, then not sampled during the next
four time periods, then sampled once and not sampled for next four time
periods, and that cycle repeated until end of the number of time periods,
[1-4] in McDonald's notation. pnl_dsgn = c(2, 3, 1, 4) designates that the
cycle has sample units in a panel being visited during two consecutive time
periods, not sampled for three consecutive time periods, sampled for one time
period and then not sampled on next four time periods, and the cycle is
repeated until end of the number of time periods, [2-3-1-4] in McDonald's
notation.
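As a rough illustration of the notation above, the following base R sketch expands a pnl_dsgn vector into the sampled time periods of a single panel. The helper expand_schedule() is hypothetical and is not part of spsurvey; it assumes an even-length pnl_dsgn whose last element is 0, NA, or a positive integer.

```r
# Hypothetical helper (not part of spsurvey): expand a McDonald-style revisit
# vector into a sampled / not-sampled schedule for one panel.
expand_schedule <- function(pnl_dsgn, n_period) {
  last <- pnl_dsgn[length(pnl_dsgn)]
  body <- pnl_dsgn[-length(pnl_dsgn)]
  # Odd positions are runs of sampled periods, even positions are runs of
  # periods that are not sampled.
  flags <- rep(seq_along(body) %% 2 == 1, times = body)
  if (is.na(last)) {
    # Rotating panel: never sampled again once the pattern ends.
    sched <- c(flags, rep(FALSE, n_period))
  } else if (last == 0) {
    # Always revisit panel: sampled in every remaining period.
    sched <- c(flags, rep(TRUE, n_period))
  } else {
    # Serially alternating panel: repeat the whole cycle.
    cycle <- c(flags, rep(FALSE, last))
    sched <- rep(cycle, length.out = n_period)
  }
  sched[seq_len(n_period)]
}

which(expand_schedule(c(1, 4, 1, NA), 20)) # rotating [1-4-1-n]
which(expand_schedule(c(2, 3, 1, 4), 20))  # serially alternating [2-3-1-4]
```

The sketch covers only the schedule of the first panel; how revisit_dsgn() lays out additional panels is governed by pnl_n and start_option.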
The number of panels in a single panel design is specified by pnl_n.  For
an always revisit panel structure, a single panel is created and pnl_n is
ignored. For a rotating panel structure, when pnl_n = NA, the number of
panels is equal to n_period. Note that this should only be used when the
rotating panel structure is the only panel design, i.e., no split panel
design (see below for split panel details). If pnl_n = m is specified for a
rotating panel design, then the number of panels will be m.  For example,
pnl_dsgn = c(1, 4, 1, NA) and pnl_n = 5 means that only 5 panels will
be constructed and the last time period to be sampled will be time period
10. In McDonald's notation the panel design structure is [(1-4-1-n)^5].  If
the number of time periods, n_period, is 20 and no other panel design
structure is specified, then the last 10 time periods will not be sampled.
For serially alternating panels, when pnl_n = NA, the number of panels will
be the sum of the elements in pnl_dsgn (ignoring NA). If pnl_n is specified
as m, then m panels will be created.  For example, pnl_dsgn = c(1, 4, 1, 4)
and pnl_n = 3, [(1-4-1-4)^3] in McDonald's notation, will create the first three
panels of the 10 serially alternating panels specified by pnl_dsgn.
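The [(1-4-1-n)^5] example above can be checked with a small base R sketch. It assumes that each successive panel starts one time period after the previous one, which is consistent with the "time period 10" statement but is an assumption about the layout, not spsurvey's own code. The helper stagger_panels() is hypothetical.

```r
# Hypothetical sketch (not spsurvey code): shift a one-panel pattern by one
# period per panel to build a panel-by-period array of sample sizes.
stagger_panels <- function(pattern, pnl_n, n_period, n) {
  out <- matrix(
    0,
    nrow = pnl_n, ncol = n_period,
    dimnames = list(paste0("Panel_", seq_len(pnl_n)), as.character(seq_len(n_period)))
  )
  for (i in seq_len(pnl_n)) {
    periods <- which(pattern) + (i - 1)
    periods <- periods[periods <= n_period] # drop visits past the last period
    out[i, periods] <- n
  }
  out
}

pattern <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE) # [1-4-1-n]
design <- stagger_panels(pattern, pnl_n = 5, n_period = 20, n = 60)
last_sampled <- max(which(colSums(design) > 0))
last_sampled
```

Under this assumption, no units are scheduled in the second half of a 20-period design, matching the description above.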
A serially alternating or rotating panel revisit design may not result in
the same number of units being sampled during each time period,
particularly during the initial start up period.  The default is to not
specify a startup option ("None").  Start up option "Partial_Begin"
initiates the revisit design at the last time period scheduled for sampling
in the first panel. For example, a [2-3-1-4] design starts at time period 6
instead of time period 1 under the Partial_Begin option. For a serially
alternating panel structure, start up option "Partial_End" initiates the
revisit design at the time period that begins the second serially
alternating pattern. For example, a [2-3-1-4] design starts at time period
11 instead of time period 1. For a rotating panel structure design, use of
Partial_End makes the assumption that the number of panels equals the
number of time periods and adds units to the last "m" panels for time
periods 1 to "m" as if number of time periods was extended by "m" where "m"
is one less than the sum of the panel design.  For example, a
[1-4-1-4-1-n] design would result in m = 10.  Note that some designs with
pnl_n not equal to the number of sample occasions can produce unexpected
panel designs.  See examples.
Different types of panel structures can be combined (termed split panels by many authors) by specifying more than one list for the panels parameter. The total number of panels is the sum of the number of panels in each of the panel structures specified by the split panel design.
Value
A two-dimensional array of sample sizes to be sampled at each combination of panel and time period.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
McDonald, T. (2003). Review of environmental monitoring methods: survey designs. Environmental Monitoring and Assessment 85, 277-292.
See Also
- revisit_bibd
- to create a balanced incomplete block panel revisit design 
- revisit_rand
- to create a revisit design with random assignment to panels and time periods 
- pd_summary
- to summarize characteristics of a panel revisit design 
Examples
# One panel of 60 sample units sampled at every time period: [1-0]
revisit_dsgn(20, panels = list(
  Annual = list(
    n = 60, pnl_dsgn = c(1, 0), pnl_n = NA,
    start_option = "None"
  )
), begin = 1)
# Rotating panels of 60 units sampled once and never again: [1-n].  Number
# of panels equal n_period.
revisit_dsgn(20,
  panels = list(
    R60N = list(n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None")
  ),
  begin = 1
)
# Serially alternating panel with three visits to sample unit then skip
# next two time periods: [3-2]
revisit_dsgn(20, panels = list(
  SA60PE = list(
    n = 20, pnl_dsgn = c(3, 2), pnl_n = NA,
    start_option = "Partial_End"
  )
), begin = 1)
# Split panel of sample units combining above two panel designs: [1-0, 1-n]
revisit_dsgn(n_period = 20, begin = 2017, panels = list(
  Annual = list(
    n = 60, pnl_dsgn = c(1, 0), pnl_n = NA,
    start_option = "None"
  ),
  R60N = list(n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None")
))
Create a revisit design with random assignment to panels and time periods
Description
Create a revisit design for a survey that specifies the panels and time
periods that will be sampled by random selection of panels and time periods.
Three options for random assignment are "period" where the number of time
periods to be sampled in a panel is fixed, "panel" where the number of panels to
be sampled in a time period is fixed, and "none" where the number of
panel-period combinations is fixed.
Usage
revisit_rand(
  n_period,
  n_pnl,
  rand_control = "period",
  n_visit,
  nsamp,
  panel_name = "Random",
  begin = 1,
  skip = 1
)
Arguments
| n_period | Number of time periods for the survey design. Typically, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. (v, number of varieties (or treatments) in BIBD terms) | 
| n_pnl | Number of panels | 
| rand_control | Character value must be  | 
| n_visit | If  | 
| nsamp | Number of samples in each panel. | 
| panel_name | Prefix for name of each panel | 
| begin | Numeric name of first sampling occasion, e.g. a specific period. | 
| skip | Number of sampling occasions to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if  | 
Details
The revisit design for a survey is created by random selection of panels and time periods that will have sample events. The number of sample occasions that will be visited by a panel is random.
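A minimal base R sketch of the "period" option described above, in which each panel is sampled in a fixed number of randomly chosen time periods. This is an illustration under stated assumptions, not the internals of revisit_rand(); the object names are hypothetical.

```r
# Hypothetical sketch (not spsurvey code): for each panel, choose n_visit of
# the n_period time periods at random and assign nsamp units to each.
set.seed(1)
n_period <- 20
n_pnl <- 10
n_visit <- 5
nsamp <- 10

design <- matrix(
  0,
  nrow = n_pnl, ncol = n_period,
  dimnames = list(paste0("Random_", seq_len(n_pnl)), as.character(seq_len(n_period)))
)
for (i in seq_len(n_pnl)) {
  design[i, sample(n_period, n_visit)] <- nsamp
}

rowSums(design > 0) # periods visited per panel (fixed at n_visit)
colSums(design > 0) # panels visited per period (varies)
```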
Value
A two-dimensional array of sample sizes to be sampled for each panel and each time period.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
See Also
- revisit_bibd
- to create a balanced incomplete block panel revisit design 
- revisit_dsgn
- to create a panel revisit design 
- pd_summary
- to summarize characteristics of a panel revisit design 
Examples
revisit_rand(
  n_period = 20, n_pnl = 10, rand_control = "none", n_visit = 50,
  nsamp = 20
)
revisit_rand(
  n_period = 20, n_pnl = 10, rand_control = "panel", n_visit = 5,
  nsamp = 10
)
revisit_rand(
  n_period = 20, n_pnl = 10, rand_control = "period",
  n_visit = 5, nsamp = 10
)
Calculate spatial balance metrics
Description
This function measures the spatial balance (with respect to the sampling frame) of design sites using Voronoi polygons (Dirichlet tessellations).
Usage
sp_balance(
  object,
  sframe,
  stratum_var = NULL,
  ip = NULL,
  metrics = "pielou",
  extents = FALSE
)
Arguments
| object | An  | 
| sframe | The sampling frame as an  | 
| stratum_var | The name of the stratum variable in  | 
| ip | Inclusion probabilities associated with each row of  | 
| metrics | A character vector of spatial balance metrics: 
 All spatial balance metrics have a lower bound of zero, which indicates perfect spatial balance. As the metric value increases, the spatial balance decreases. | 
| extents | Should the extent (total units) within each Voronoi polygon
be returned? Defaults to  | 
Value
A data frame with columns providing the stratum (stratum),
spatial balance metric (metric), and spatial balance (value).
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run: 
sample <- grts(NE_Lakes, 30)
sp_balance(sample$sites_base, NE_Lakes)
strata_n <- c(low = 25, high = 30)
sample_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
sp_balance(sample_strat$sites_base, NE_Lakes, stratum_var = "ELEV_CAT", metrics = "rmse")
## End(Not run)
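To make the "zero indicates perfect balance" convention concrete, here is a hedged base R sketch of one common way to turn Pielou's evenness index into such a metric: one minus the evenness of the proportions of the frame that fall in each site's Voronoi polygon. The helper pielou_balance() is hypothetical, and the exact form used by sp_balance() is an assumption here.

```r
# Hypothetical sketch (not spsurvey code): 1 - Pielou's evenness, computed
# from the frame extent captured by each site's Voronoi polygon.
pielou_balance <- function(extent) {
  p <- extent / sum(extent)
  p_pos <- p[p > 0] # treat 0 * log(0) as 0
  1 + sum(p_pos * log(p_pos)) / log(length(extent))
}

even <- pielou_balance(rep(1, 30))            # equal extents: essentially zero
uneven <- pielou_balance(c(rep(1, 29), 50))   # one dominant polygon: larger
c(even = even, uneven = uneven)
```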
sp_frame objects
Description
Turn sampling frames or analysis data into an sp_frame object
or transform sp_frame objects back into their original object.
Usage
sp_frame(frame)
sp_unframe(sp_frame)
Arguments
| frame | A sampling frame or analysis data | 
| sp_frame | An  | 
Details
The sp_frame() function assigns frame the class sp_frame
to be used by summary() and plot(). sp_frame objects
can sometimes clash with other sf and tidyverse generics, so sp_unframe() removes
the class sp_frame, leaving the original classes of frame intact.
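The class handling described above follows a familiar S3 pattern, sketched here in base R. The helpers add_frame_class() and drop_frame_class() are hypothetical stand-ins written for illustration; they are not the spsurvey implementation.

```r
# Hypothetical sketch (not spsurvey code): prepend a class so that custom
# summary() and plot() methods dispatch, then remove it again.
add_frame_class <- function(x) {
  structure(x, class = c("sp_frame", class(x)))
}
drop_frame_class <- function(x) {
  structure(x, class = setdiff(class(x), "sp_frame"))
}

dat <- data.frame(id = 1:3)
framed <- add_frame_class(dat)
class(framed)
unframed <- drop_frame_class(framed)
class(unframed)
```

Because the original classes are kept after the added one, methods for those classes still apply when no sp_frame method exists.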
Value
An sp_frame object.
Examples
NE_Lakes <- sp_frame(NE_Lakes)
class(NE_Lakes)
NE_Lakes <- sp_unframe(NE_Lakes)
class(NE_Lakes)
Plot sampling frames, design sites, and analysis data.
Description
This function plots sampling frames, design sites, and analysis data.
If the left-hand side of the formula is empty, plots
are of the distributions of the right-hand side variables. If the left-hand side
of the formula contains a variable, plots are of the left-hand side variable
for each level of each right-hand side variable.
This function is largely built on plot.sf(), and all spsurvey plotting
methods can supply additional arguments to plot.sf(). For more information on
plotting in sf, run ?sf::plot.sf(). Equivalent to spsurvey::plot(); both
are currently maintained for backwards compatibility.
Usage
sp_plot(object, ...)
## Default S3 method:
sp_plot(
  object,
  formula = ~1,
  xcoord,
  ycoord,
  crs,
  var_args = NULL,
  varlevel_args = NULL,
  geom = FALSE,
  onlyshow = NULL,
  fix_bbox = TRUE,
  ...
)
## S3 method for class 'sp_design'
sp_plot(
  object,
  sframe = NULL,
  formula = ~siteuse,
  siteuse = NULL,
  var_args = NULL,
  varlevel_args = NULL,
  geom = FALSE,
  onlyshow = NULL,
  fix_bbox = TRUE,
  ...
)
Arguments
| object | An object to plot. When plotting sampling frames or analysis data,
a data frame or  | 
| ... | Additional arguments to pass to  | 
| formula | A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of  | 
| xcoord | Name of the x-coordinate (east-west) in  | 
| ycoord | Name of the y-coordinate (north-south) in  | 
| crs | Projection code for  | 
| var_args | A named list. The name of each list element corresponds to a
right-hand side variable in  | 
| varlevel_args | A named list. The name of each list element corresponds to a
right-hand side variable in  | 
| geom | Should separate geometries for each level of the right-hand
side  | 
| onlyshow | A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. | 
| fix_bbox | Should the geometry bounding box be fixed across plots?
If a length-four vector with names "xmin", "ymin", "xmax", and "ymax" and values
indicating bounding box edges, the bounding box will be fixed as  | 
| sframe | The sampling frame (an  | 
| siteuse | A character vector of site types to include when plotting design sites.
It can only take on values  | 
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run: 
data("NE_Lakes")
sp_plot(NE_Lakes, formula = ~ELEV_CAT)
sample <- grts(NE_Lakes, 30)
sp_plot(sample, NE_Lakes)
data("NLA_PNW")
sp_plot(NLA_PNW, formula = ~BMMI)
## End(Not run)
Combine rows from GRTS or IRS samples.
Description
This function row binds the sites_legacy, sites_base,
sites_over, and sites_near objects from a GRTS or IRS sample
into a single sf object. This function is most useful when a single
sf object that contains all design sites is desired
(e.g. writing out a single shapefile using sf::write_sf()).
Usage
sp_rbind(object, siteuse = NULL)
Arguments
| object | The design sites (output from  | 
| siteuse | A character vector of site types to return. Can contain
 | 
Value
A single sf object containing all requested design sites.
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run: 
sample <- grts(NE_Lakes, 50, n_over = 10)
sample <- sp_rbind(sample)
write_sf(sample, "mypath/sample.shp")
## End(Not run)
Summarize sampling frames, design sites, and analysis data.
Description
sp_summary() summarizes sampling frames, design sites, and analysis data. The right-hand side of the
formula specifies the variables (or factors) to
summarize by. If the left-hand side of the formula is empty, the
summary will be of the distributions of the right-hand side variables. If the left-hand side
of the formula contains a variable, the summary will be of the left-hand side variable
for each level of each right-hand side variable. Equivalent to spsurvey::summary(); both
are currently maintained for backwards compatibility.
Usage
sp_summary(object, ...)
## Default S3 method:
sp_summary(object, formula = ~1, onlyshow = NULL, ...)
## S3 method for class 'sp_design'
sp_summary(object, formula = ~siteuse, siteuse = NULL, onlyshow = NULL, ...)
Arguments
| object | An object to summarize. When summarizing sampling frames,
an  | 
| ... | Additional arguments to pass to  | 
| formula | A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of  | 
| onlyshow | A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. | 
| siteuse | A character vector indicating the design sites
for which summaries are requested in  | 
Value
If the left-hand side of the formula is empty, a named list containing summaries of the count distribution for each right-hand side variable is returned. If the left-hand side of the formula contains a variable, a named list containing five number summaries (numeric left-hand side) or tables (categorical or factor left hand side) is returned for each right-hand side variable.
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run: 
data("NE_Lakes")
sp_summary(NE_Lakes, ELEV ~ 1)
sp_summary(NE_Lakes, ~ ELEV_CAT * AREA_CAT)
sample <- grts(NE_Lakes, 100)
sp_summary(sample, ~ ELEV_CAT * AREA_CAT)
## End(Not run)
Print grts() and irs() errors.
Description
This function prints the error messages generated by the grts
and irs functions.
Usage
stopprnt(stop_df = get("stop_df", envir = .GlobalEnv), m = 1:nrow(stop_df))
Arguments
| stop_df | Data frame that contains stop messages.  The default is
 | 
| m | Vector of indices for stop messages that are to be printed. The
default is a vector containing the integers from 1 through the number of
rows in  | 
Value
Printed errors
Author(s)
Tony Olsen Olsen.Tony@epa.gov
Summarize sampling frames, design sites, and analysis data.
Description
summary() summarizes sampling frames, design sites, and analysis data. The right-hand side of the
formula specifies the variables (or factors) to
summarize by. If the left-hand side of the formula is empty, the
summary will be of the distributions of the right-hand side variables. If the left-hand side
of the formula contains a variable, the summary will be of the left-hand side variable
for each level of each right-hand side variable. Equivalent to sp_summary(); both
are currently maintained for backwards compatibility.
Usage
## S3 method for class 'sp_frame'
summary(object, formula = ~1, onlyshow = NULL, ...)
## S3 method for class 'sp_design'
summary(object, formula = ~siteuse, siteuse = NULL, onlyshow = NULL, ...)
Arguments
| object | An object to summarize. When summarizing sampling frames,
an  | 
| formula | A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of  | 
| onlyshow | A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. | 
| ... | Additional arguments to pass to  | 
| siteuse | A character vector indicating the design sites
for which summaries are requested in  | 
Value
If the left-hand side of the formula is empty, a named list containing summaries of the count distribution for each right-hand side variable is returned. If the left-hand side of the formula contains a variable, a named list containing five number summaries (numeric left-hand side) or tables (categorical or factor left hand side) is returned for each right-hand side variable.
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run: 
data("NE_Lakes")
summary(NE_Lakes, ELEV ~ 1)
summary(NE_Lakes, ~ ELEV_CAT * AREA_CAT)
sample <- grts(NE_Lakes, 100)
summary(sample, ~ ELEV_CAT * AREA_CAT)
## End(Not run)
Trend analysis
Description
This function organizes input and output for estimation of trend across time
for a series of samples (for categorical and continuous variables). Trend is estimated using the
analytical procedure identified by the model arguments.  For categorical
variables, the choices for the model_cat argument are: (1) simple linear
regression, (2) weighted linear regression, and (3) generalized linear
mixed-effects model. For continuous variables, the choices for the
model_cont argument are: (1) simple linear regression, (2) weighted
linear regression, and (3)  linear mixed-effects model.  The analysis data,
dframe, can be either a data frame or a simple features (sf) object.  If an
sf object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord and ycoord are assigned values
"xcoord" and "ycoord", respectively, and the geometry column is
dropped from the object.
Usage
trend_analysis(
  dframe,
  vars_cat = NULL,
  vars_cont = NULL,
  subpops = NULL,
  model_cat = "SLR",
  cat_rhs = NULL,
  model_cont = "LMM",
  cont_rhs = NULL,
  siteID = "siteID",
  yearID = "year",
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  invprboot = TRUE,
  nboot = 1000,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)
Arguments
| dframe | Data to be analyzed (analysis data). A data frame or
 | 
| vars_cat | Vector composed of character values that identify the names
of categorical response variables in  | 
| vars_cont | Vector composed of character values that identify the
names of continuous response variables in  | 
| subpops | Vector composed of character values that identify the
names of subpopulation (domain) variables in  | 
| model_cat | Character value identifying the analytical procedure used
for trend estimation for categorical variables.  The choices are:
 | 
| cat_rhs | Character value specifying the right hand side of the formula
for a generalized linear mixed-effects model.  If a value is not provided,
the argument is assigned a value that specifies the Piepho and Ogutu (2002)
model.  The default value is  | 
| model_cont | Character value identifying the analytical procedure used
for trend estimation for continuous variables.  The choices are:
 | 
| cont_rhs | Character value specifying the right hand side of the
formula for a linear mixed-effects model.  If a value is not provided, the
argument is assigned a value that specifies the Piepho and Ogutu (2002)
model.  The default value is  | 
| siteID | Character value providing name of the site ID variable in
 | 
| yearID | Character value providing name of the time period variable in
 | 
| weight | Character value providing name of the design weight
variable in  | 
| xcoord | Character value providing name of the x-coordinate variable in
 | 
| ycoord | Character value providing name of the y-coordinate variable in
 | 
| stratumID | Character value providing name of the stratum ID variable in
 | 
| clusterID | Character value providing name of the cluster (stage one) ID
variable in  | 
| weight1 | Character value providing name of the stage one weight
variable in  | 
| xcoord1 | Character value providing name of the stage one x-coordinate
variable in  | 
| ycoord1 | Character value providing name of the stage one y-coordinate
variable in  | 
| sizeweight | Logical value that indicates whether size weights should be
used during estimation, where  | 
| sweight | Character value providing name of the size weight variable in
 | 
| sweight1 | Character value providing name of the stage one size weight
variable in  | 
| fpc | Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design: 
 Example fpc for a single-stage stratified survey design: 
 Example fpc for a two-stage unstratified survey design: 
 Example fpc for a two-stage stratified survey design: 
 | 
| popsize | Object that provides values for the population argument of the
 Example popsize for calibration: 
 Example popsize for post-stratification using a data frame: 
 Example popsize for post-stratification using a table: 
 Example popsize for post-stratification using an xtabs object: 
 | 
| invprboot | Logical value that indicates whether the inverse probability
bootstrap procedure is used to calculate trend parameter estimates.  This
bootstrap procedure is only available for the "LMM" option for continuous
variables.  Inverse probability references the design weights, which
are the inverse of the sample inclusion probabilities.  The default value
is  | 
| nboot | Numeric value for the number of bootstrap iterations.  The
default is  | 
| vartype | Character value providing choice of the variance estimator,
where  | 
| jointprob | Character value providing choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where  | 
| conf | Numeric value for the Gaussian-based confidence level.  The default is
 | 
| All_Sites | A logical variable used when  | 
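The fpc formats described in the argument table can be sketched as follows. All numbers and names (Ncluster, Cluster_1, Stratum_1, and so on) are illustrative assumptions; in practice the cluster and stratum names must match the IDs in dframe.

```r
# Single-stage unstratified design: a single numeric value.
fpc_single <- 15000

# Single-stage stratified design: a named list with one value per stratum.
fpc_single_strat <- list(Stratum_1 = 9000, Stratum_2 = 6000)

# Two-stage unstratified design: the first item is the number of clusters in
# the population; the remaining items are stage two sizes for each sampled
# cluster.
fpc_two <- c(Ncluster = 200, Cluster_1 = 100, Cluster_2 = 100, Cluster_3 = 75)

# Two-stage stratified design: a named list of such vectors, one per stratum.
fpc_two_strat <- list(
  Stratum_1 = c(Ncluster_1 = 100, Cluster_1 = 125, Cluster_2 = 100),
  Stratum_2 = c(Ncluster_2 = 80, Cluster_3 = 75, Cluster_4 = 150)
)
```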
Value
The analysis results. A list composed of two data frames containing trend estimates for all combinations of population Types, subpopulations within Types, and response variables. For categorical variables, trend estimates are calculated for each category of the variable. The two data frames in the output list are:
- catsum
- data frame containing trend estimates for categorical variables 
- contsum
- data frame containing trend estimates for continuous variables 
For the SLR and WLR model options, the data frame contains the following variables:
- Type
- subpopulation (domain) name 
- Subpopulation
- subpopulation name within a domain 
- Indicator
- response variable 
- Trend_Estimate
- trend estimate 
- Trend_Std_Error
- trend standard error 
- Trend_LCBxxPct
- trend xx% (default 95%) lower confidence bound 
- Trend_UCBxxPct
- trend xx% (default 95%) upper confidence bound 
- Trend_p_Value
- trend p-value 
- Intercept_Estimate
- intercept estimate 
- Intercept_Std_Error
- intercept standard error 
- Intercept_LCBxxPct
- intercept xx% (default 95%) lower confidence bound 
- Intercept_UCBxxPct
- intercept xx% (default 95%) upper confidence bound 
- Intercept_p_Value
- intercept p-value 
- R_Squared
- R-squared value 
- Adj_R_Squared
- adjusted R-squared value 
For the GLMM and LMM model options, contents of the data frames will vary
depending on the model specified by arguments cat_rhs and
cont_rhs.  For the default PO model, the data frame contains the
following variables:
- Type
- subpopulation (domain) name 
- Subpopulation
- subpopulation name within a domain 
- Indicator
- response variable 
- Trend_Estimate
- trend estimate 
- Trend_Std_Error
- trend standard error 
- Trend_LCBxxPct
- trend xx% (default 95%) lower confidence bound 
- Trend_UCBxxPct
- trend xx% (default 95%) upper confidence bound 
- Trend_p_Value
- trend p-value 
- Intercept_Estimate
- intercept estimate 
- Intercept_Std_Error
- intercept standard error 
- Intercept_LCBxxPct
- intercept xx% (default 95%) lower confidence bound 
- Intercept_UCBxxPct
- intercept xx% (default 95%) upper confidence bound 
- Intercept_p_Value
- intercept p-value 
- Var_SiteInt
- variance of the site intercepts 
- Var_SiteTrend
- variance of the site trends 
- Corr_SiteIntSlope
- correlation of site intercepts and site trends 
- Var_Year
- year variance 
- Var_Residual
- residual variance 
- AIC
- generalized Akaike Information Criterion 
Details
For the simple linear regression (SLR) model, a design-based estimate of the
category proportion (categorical variables) or the mean (continuous
variables) is calculated for each time period (year).  Four choices of
variance estimator are available for calculating variance of the design-based
estimates: (1) the local mean estimator, (2) the simple random sampling
estimator, (3) the Horvitz-Thompson estimator, and (4) the Yates-Grundy
estimator.  For the Horvitz-Thompson and Yates-Grundy estimators, there are
three choices for calculating joint inclusion probabilities: (1) the Overton
approximation, (2) the Hartley-Rao approximation, and (3) the Brewer
approximation.  The lm function in the stats package is used to fit a
linear model using a formula argument that specifies the proportion or
mean estimates as the response variable and years as the regressor variable.
For fitting the SLR model, the yearID variable from the dframe
argument is modified by subtracting the minimum value of years from all
values of the variable.  Parameter estimates are extracted from the object
returned by the lm function.  For the weighted linear regression (WLR)
model, the process is the same as the SLR model except that the inverse of
the variances of the proportion or mean estimates is used as the
weights argument in the call to the lm function.  For the LMM
option, the lmer function in the lme4 package is used to fit a linear
mixed-effects model for trend across years.  For both the GLMM and LMM
options, the default Piepho and Ogutu (PO) model includes fixed effects for
intercept and trend (slope) and random effects for intercept and trend for
individual sites, where the siteID variable from the dframe
argument identifies sites.  Correlation between the random effects for site
intercepts and site trends is included in the model. Finally, the PO model
contains random effects for year variance and residual variance. For the GLMM
and LMM options, arguments cat_rhs and cont_rhs, respectively,
can be used to specify the right hand side of the model formula. Internally,
a variable named Wyear is created that is useful for specifying the
cat_rhs and cont_rhs arguments.  The Wyear variable is
created by subtracting the minimum value of the yearID variable from
all values of the variable.  If argument invprboot is FALSE,
parameter estimates are extracted from the object returned by the lmer
function. If argument invprboot is TRUE, the boot
function in the boot package is used to generate bootstrap replicates using a
function named bootfcn as the statistic argument passed to the
boot function.  For each bootstrap replicate, bootfcn calls the
glmer or lmer function, as appropriate, using the specified
model.  Design weights identified by the weight argument for
the trend_analysis function are passed as the weights argument
for the boot function, which specifies importance weights.  Using the
design weights as the weights argument ensures that bootstrap
replicates are representative of the survey population.  Parameter estimates
are calculated using the object returned by the boot function.
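The SLR and WLR steps described above can be sketched in base R. The yearly estimates and standard errors below are made-up placeholders standing in for the design-based estimates that trend_analysis() computes internally; only the shifted year variable and the lm() calls follow the description.

```r
# Placeholder yearly estimates (illustrative values, not real results).
est <- data.frame(
  year = c(2000, 2005, 2010, 2015, 2020),
  estimate = c(10.2, 10.6, 11.1, 11.3, 11.9),
  std_error = c(0.30, 0.25, 0.40, 0.20, 0.35)
)

# Shift years so that the first time period is zero.
est$Wyear <- est$year - min(est$year)

# SLR: ordinary least squares on the yearly estimates.
slr <- lm(estimate ~ Wyear, data = est)

# WLR: same model, weighted by the inverse of the estimated variances.
wlr <- lm(estimate ~ Wyear, data = est, weights = 1 / std_error^2)

summary(slr)$coefficients
summary(wlr)$coefficients
```

With the shifted year variable, the intercept is the fitted value for the first time period rather than for year zero.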
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
- change_analysis
- for change analysis 
Examples
# Example using a categorical variable with three resource classes and a
# continuous variable
mydframe <- data.frame(
  siteID = rep(paste0("Site", 1:40), rep(5, 40)),
  yearID = rep(seq(2000, 2020, by = 5), 40),
  wgt = rep(runif(40, 10, 100), rep(5, 40)),
  xcoord = rep(runif(40), rep(5, 40)),
  ycoord = rep(runif(40), rep(5, 40)),
  All_Sites = rep("All Sites", 200),
  Region = sample(c("North", "South"), 200, replace = TRUE),
  Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE),
  ContVar = rnorm(200, 10, 1)
)
myvars_cat <- c("Resource_Class")
myvars_cont <- c("ContVar")
mysubpops <- c("All_Sites", "Region")
trend_analysis(
  dframe = mydframe,
  vars_cat = myvars_cat,
  vars_cont = myvars_cont,
  subpops = mysubpops,
  model_cat = "WLR",
  model_cont = "SLR",
  siteID = "siteID",
  yearID = "yearID",
  weight = "wgt",
  xcoord = "xcoord",
  ycoord = "ycoord"
)
Print grts(), irs(), and analysis function warnings
Description
This function prints the warning messages from the grts(), irs(),
and analysis functions.
Usage
warnprnt(warn_df = get("warn_df", envir = .GlobalEnv), m = 1:nrow(warn_df))
Arguments
| warn_df | Data frame that contains warning messages.  The default is
 | 
| m | Vector of indices for warning messages that are to be printed. The
default is a vector containing the integers from 1 through the number of
rows in  | 
Value
Printed warnings.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov