| Type: | Package | 
| Version: | 0.40.2 | 
| Date: | 2024-08-25 | 
| Title: | Compositional Data Analysis in Practice | 
| Author: | Michael Greenacre [aut, cre] | 
| Imports: | ca (≥ 0.7), vegan (≥ 2.3), ellipse (≥ 0.4.1) | 
| Maintainer: | Michael Greenacre <michael.greenacre@upf.edu> | 
| Description: | Univariate and multivariate methods for compositional data analysis, based on logratios. The package implements the approach in the book Compositional Data Analysis in Practice by Michael Greenacre (2018), where accent is given to simple pairwise logratios. Selection can be made of logratios that account for a maximum percentage of logratio variance. Various multivariate analyses of logratios are included in the package. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL] | 
| URL: | https://github.com/michaelgreenacre/CODAinPractice/ | 
| NeedsCompilation: | no | 
| Packaged: | 2024-08-27 14:25:12 UTC; rforge | 
| Repository: | CRAN | 
| Repository/R-Forge/Project: | easycoda | 
| Repository/R-Forge/Revision: | 53 | 
| Repository/R-Forge/DateTimeStamp: | 2024-08-27 14:20:20 | 
| Date/Publication: | 2024-08-27 15:00:05 UTC | 
| Depends: | R (≥ 2.10) | 
Compositional Data Analysis in Practice
Description
Univariate and multivariate methods for compositional data analysis, based on logratios. The package implements the approach in the book Compositional Data Analysis in Practice by Michael Greenacre (2018), where accent is given to simple pairwise logratios. Selection can be made of logratios that account for a maximum percentage of logratio variance. Various multivariate analyses of logratios are included in the package.
Details
The DESCRIPTION file:
| Package: | easyCODA | 
| Type: | Package | 
| Version: | 0.40.2 | 
| Date: | 2024-08-25 | 
| Title: | Compositional Data Analysis in Practice | 
| Authors@R: | person(given = "Michael", family = "Greenacre", role = c("aut", "cre"), email = "michael.greenacre@upf.edu") | 
| Author: | Michael Greenacre [aut, cre] | 
| Imports: | ca (>= 0.7), vegan (>= 2.3), ellipse (>= 0.4.1) | 
| Maintainer: | Michael Greenacre <michael.greenacre@upf.edu> | 
| Description: | Univariate and multivariate methods for compositional data analysis, based on logratios. The package implements the approach in the book Compositional Data Analysis in Practice by Michael Greenacre (2018), where accent is given to simple pairwise logratios. Selection can be made of logratios that account for a maximum percentage of logratio variance. Various multivariate analyses of logratios are included in the package. | 
| License: | GPL | 
| URL: | https://github.com/michaelgreenacre/CODAinPractice/ | 
| NeedsCompilation: | no | 
| Packaged: | 2024-08-25 20:08:18 UTC; u16368 | 
| Repository: | R-Forge | 
| Repository/R-Forge/Project: | easycoda | 
| Repository/R-Forge/Revision: | 53 | 
| Repository/R-Forge/DateTimeStamp: | 2024-08-27 14:20:20 | 
| Date/Publication: | 2024-08-27 14:20:20 | 
Index of help topics:
ACLUST                  Amalgamation clustering of the parts of a
                        compositional data matrix
ALR                     Additive logratios
BAR                     Compositional bar plot
CA                      Correspondence analysis
CIplot_biv              Bivariate confidence and data ellipses
CLOSE                   Closure of rows of compositional data matrix
CLR                     Centred logratios
DOT                     Dot plot
DUMMY                   Dummy variable (indicator) coding
FINDALR                 Find the best ALR transformation
ILR                     Isometric logratio
LR                      All pairwise logratios
LR.VAR                  Total logratio variance
LRA                     Logratio analysis
PCA                     Principal component analysis
PLOT.CA                 Plot the results of a correspondence analysis
PLOT.LRA                Plot the results of a logratio analysis
PLOT.PCA                Plot the results of a principal component
                        analysis
PLOT.RDA                Plot the results of a redundancy analysis
PLR                     Pivot logratios
RDA                     Redundancy analysis
SLR                     Amalgamation (summed) logratio
STEP                    Stepwise selection of logratios
STEPR                   Stepwise selection of pairwise logratios for
                        generalized linear modelling
VAR                     Variance of a vector of observations, dividing
                        by n rather than n-1
WARD                    Ward clustering of a compositional data matrix
cups                    Dataset: RomanCups
easyCODA-package        Compositional Data Analysis in Practice
fish                    Dataset: FishMorphology
invALR                  Inverse of additive logratios
invCLR                  Inverse of centred logratios
invSLR                  Inverse of full set of amalgamation balances
time                    Dataset: TimeBudget
veg                     Dataset: Vegetables
Author(s)
Michael Greenacre [aut, cre]
Maintainer: Michael Greenacre <michael.greenacre@upf.edu>
References
Greenacre, Michael (2018) Compositional Data Analysis in Practice. Chapman & Hall / CRC Press
See Also
Examples
# Roman cups glass compositions
data("cups")
# unweighted logratio analysis
cups.ulra <- LRA(cups, weight=FALSE)
PLOT.LRA(cups.ulra)
# weighted logratio analysis
cups.wlra <- LRA(cups)
PLOT.LRA(cups.wlra)
Amalgamation clustering of the parts of a compositional data matrix
Description
This function clusters the parts of a compositional data matrix, using amalgamation of the parts at each step.
Usage
ACLUST(data, weight = TRUE, close = TRUE)
Arguments
| data | Compositional data matrix, with the parts as columns | 
| weight | 
 | 
| close | 
 | 
Details
The function ACLUST performs amalgamation hierarchical clustering on the parts (columns) of a given compositional data matrix, as proposed by Greenacre (2019).
At each step of the clustering two clusters are amalgamated that give the least loss of explained logratio variance.
Value
An object which describes the tree produced by the clustering process on the n objects. The object is a list with components:
| merge | an n-1 by 2 matrix. Row i of  | 
| height | a set of n-1 real values (non-decreasing for ultrametric trees). The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration. | 
| order | a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches | 
| labels | a vector of column labels, the column names of  | 
Author(s)
Michael Greenacre
References
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC. 
Greenacre, M. (2019), Amalgamations are valid in compositional data analysis,
can be used in agglomerative clustering, and their
logratios have an inverse transformation.
Applied Computing and Geosciences, open access.
See Also
hclust, WARD,CLR, LR.VAR,
CLOSE
Examples
data(cups)
# amalgamation clustering    (weighted parts)
cups.aclust <- ACLUST(cups)
plot(cups.aclust)
# reproducing Figure 2(b) of Greenacre (2019) (unweighted parts))
# dataset Aar is in the compositions package
# aar is a subset of Aar
# code given here within the '\dontrun' environment since external package 'compositions' required
## Not run: 
  library(compositions)
  data(Aar)
  aar <- Aar[,c(3:12)]
  aar.aclust <- ACLUST(aar, weight=FALSE)
# the maximum height is the total variance
# convert to percents of variance NOT explained
  aar.aclust$height <- 100 * aar.aclust$height / max(aar.aclust$height)
  plot(aar.aclust, main="Parts of Unexplained Variance", ylab="Variance (percent)")
## End(Not run)
Additive logratios
Description
Computation of additive logratios (ALRs) with respect to a specified part.
Usage
ALR(data, denom=ncol(data), weight=TRUE, stats=FALSE)
Arguments
| data | A compositional data frame or matrix | 
| denom | Number of part used in the denominator | 
| weight | Logical indicating if varying weights are returned(default: | 
| stats | Logical indicating if means, variances and total variance of the ALRs are returned (default: | 
Details
The function ALR computes a set of additive logratios (ALRs) with respect to a specified part (by default, the last part). 
Value
| LR | The additive logratios (ALRs) | 
| LR.wt | The weights assigned to the ALRs | 
| denom | The index of the denominator used in the computation of the ALRs | 
| part.names | The part names in the data, i.e. column names | 
| part.wt | The part weights | 
| means | The means of the ALRs (only returned if  | 
| vars | The variances of the ALRs (only returned if  | 
| totvar | The total variance of the ALRs (only returned if  | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall. 
 
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
invALR, LR, CLR, invCLR, LR.VAR
Examples
data(veg)
ALR(veg, denom=2)
Compositional bar plot
Description
Horizontal bar plot of compositional data
Usage
BAR(data, cols=rainbow(ncol(data)), col.names=colnames(data),
    row.names=rownames(data), order.column=NA, eps=0.5, main="", ylab="",
    ylim=c(0,nrow(data)), xlim=c(0,100), cex=1, truncate=NA)
Arguments
| data | Compositional data matrix or data frame with compositions in rows, parts in columns | 
| cols | Colours of points for each part, default rainbow | 
| col.names | Part names, if modified | 
| row.names | Sample names, if modified | 
| order.column | By default parts are taken in order of columns, but can be re-ordered using this option | 
| eps | Small space between bars, can be modified | 
| main | Heading | 
| ylab | Vertical axis label | 
| ylim | Vertical axis limits (default is the number of rows in data) | 
| xlim | Horizontal axis limits (default c(0,100)) | 
| cex | Character size scaling factor for labels | 
| truncate | Truncate part (column) names to this number of characters for legend | 
Details
The function BAR makes a BAR plot for specified groups of points, which can be in columns of a matrix or data frame. 
Author(s)
Michael Greenacre
References
Greenacre, M. (2016), Data reporting and visualization in ecology, Polar Biology: 39, 2189-2205.
See Also
Examples
# Vegetables data set: order samples by carbohydrates 
data(veg)
BAR(veg, order.column=2)
data(time)
# TimeBudget data set: put domestic work in first column and order by it
BAR(time[,c(2,1,3,4,5,6)], order.column=1, main="Time Budget")
Correspondence analysis
Description
Computation of correspondence analysis on a table of nonnegative data.
Usage
CA(data, nd = 2, suprow = NA, supcol = NA)
Arguments
| data | A data frame or matrix of nonnegative data (no negative values) | 
| nd | Number of dimensions for summary solution if not 2 (default) | 
| suprow | Indices of rows that are supplementary points | 
| supcol | Indices of columns that are supplementary points | 
Details
The function CA is a simple wrapper for the ca function in the ca package (Nenadic and Greenacre, 2007), for compatibility within the easyCODA package.
Supplementary rows and columns can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes.
The function borrows the structure and functions of the ca package, which is required, and produces a ca object, and the same print, summary and plot methods can be used, as for a ca object.     
It additionally exports the principal coordinates of both the rows and columns, not presently found in the ca package.
Value
| sv | Singular values | 
| nd | Number of dimensions in solution results | 
| rownames | Row names | 
| rowmass | Row weights | 
| rowdist | Row logratio distances to centroid | 
| rowinertia | Row inertias | 
| rowcoord | Row standard coordinates | 
| rowpcoord | Row principal coordinates | 
| rowsup | Indices of row supplementary points | 
| colnames | Column names | 
| colmass | Column weights | 
| coldist | Column logratio distances to centroid | 
| colinertia | Column inertias | 
| colcoord | Column standard coordinates | 
| colpcoord | Column principal coordinates | 
| N | The compositional data table | 
Author(s)
Michael Greenacre
References
Nenadic, O. and Greenacre, M. (2007). Correspondence analysis in R, with two- and three-dimensional graphics: The ca package. Journal of Statistical Software, 20 (3), https://www.jstatsoft.org/v20/i03/
See Also
PLOT.CA, plot.ca, summary.ca, print.ca
Examples
# CA of the Roman cups data (symmetric map)
data("cups")
PLOT.CA(CA(cups))
Bivariate confidence and data ellipses
Description
Draws confidence and data ellipses in bivariate scatterplots
Usage
CIplot_biv(x, y, group, wt=rep(1/length(x),length(x)), 
           varnames=c("x","y"), groupnames=sort(unique(group)), 
           groupcols=rainbow(length(unique(group))), 
           shownames=TRUE, xlim=c(NA,NA), ylim=c(NA,NA), 
           lty=1, lwd=1, add=FALSE, alpha=0.95, ellipse=0, 
           shade=FALSE, alpha.f=0.2, frac=0.01, cex=1) 
Arguments
| x | x-variable (horizontal) of scatterplot | 
| y | y-variable (vertical) of scatterplot | 
| group | Grouping variable | 
| wt | Set of weights on the cases (operates when ellipse=1) | 
| varnames | Vector of two labels for the axes (default is x and y) | 
| groupnames | Vector of labels for the groups (default is 1, 2, etc...) | 
| groupcols | Vector of colours for the groups | 
| shownames | Whether to show group names at group centroids or not (default is  | 
| xlim | Possible new x-limits for plot | 
| ylim | Possible new y-limits for plot | 
| lty | Line type for the ellipses (default is 1) | 
| lwd | Line width for the ellipses (default is 1) | 
| add | = | 
| alpha | Confidence level of ellipses (default is 0.95) | 
| ellipse | Type of ellipse (see Details below; default is 0 for normal-based ellipses) | 
| shade | = | 
| alpha.f | Shading fraction (default is 0.2) | 
| frac | Proportional part defining the width of the bars at the edges of confidence intervals (for  | 
| cex | Character expansion factor for group names | 
Details
The function CIplot_biv makes various types of confidence and data ellipses, according to option ellipse. Set ellipse<0 for regular data-covering ellipses. Set ellipse=0 (default) for normal-theory confidence ellipses. Setellipse=1 for bootstrap confidence ellipses. The option ellipse=2 for the delta method is not implemented yet.  Set ellipse=3 for normal-theory confidence error bars lined up with axes. Set ellipse=4 for bootstrap confidence error bars along axes.  The package ellipse is required.
Author(s)
Michael Greenacre
References
Greenacre, M. (2016), Data reporting and visualization in ecology, Polar Biology, 39:2189-2205.
See Also
Examples
# Generate some bivariate normal data in three groups with different means
# Means (1,0), (0,1) and (0,0)
means  <- matrix(c(1,0,0,1,0,0), ncol=3)
data   <- matrix(nrow=300, ncol=2)
groups <- sample(rep(c(1,2,3), 100))
for(i in 1:300) data[i,] <- rnorm(c(1,1), mean=means[,groups[i]])
# Plot confidence ellipses with shading
CIplot_biv(data[,1], data[,2], group=groups, shade=TRUE)
Closure of rows of compositional data matrix
Description
This function closes (or normalizes) the rows of a compositional data matrix, resulting in rows summing to 1.
Usage
CLOSE(x)
Arguments
| x | Compositional data matrix. | 
Details
Compositional data carry relative information. It is sometimes required to close the data so that each row of observations sums to 1.
The function CLOSE performs the closure.
Value
The closed compositional data matrix.
Author(s)
Michael Greenacre
References
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC. 
Examples
data(cups)
apply(cups, 2, sum)
cups <- CLOSE(cups)
apply(cups, 2, sum)
Centred logratios
Description
Computation of centred logratios (CLRs).
Usage
CLR(data, weight=TRUE)
Arguments
| data | A compositional data frame or matrix | 
| weight | Logical indicating if varying weights are returned(default: | 
Details
The function CLR computes the set of centred logratios (CLRs). 
Value
| LR | The centred logratios (CLRs) | 
| LR.wt | The weights assigned to the CLRs | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall. 
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
invCLR, ALR, invALR, LR, LR.VAR
Examples
data(veg)
CLR(veg)
Dot plot
Description
Simple dot plot of original data
Usage
DOT(data, cols=NA, names=NA, groups=NA, pch=NA, horizon=FALSE, jitter=1, 
     xscale=NA, xscalefac=1, yaxis=TRUE, shownames=TRUE, main="", ylab="", 
     xlim=c(NA,NA), ylim=c(NA, NA), cex=1)
Arguments
| data | Matrix or data frame with data groups in columns; alternatively, a single vector but then groups (if any) have to specified with the  | 
| cols | Colours of points for each sample, default rainbow | 
| names | Labels for variables, by default the column names of data, or group names | 
| groups | Group codes to split the data vector into separate plots | 
| pch | Point character | 
| horizon | 
 | 
| jitter | 1 by default, increase or decrease slightly for more jitter | 
| xscale | User-supplied positions of points on horizontal axis | 
| xscalefac | 1 by default, rescale the positions on horizontal axis | 
| yaxis | TRUE by default, FALSE to suppress and optionally add afterwards | 
| shownames | 
 | 
| main | Heading | 
| ylab | Vertical axis label | 
| xlim | Horizontal axis limits | 
| ylim | Vertical axis limits | 
| cex | Character size adjustment for labels | 
Details
The function DOT makes a dot plot for specified groups of points, which can be in columns of a matrix or data frame, or in a single vector with group codes specified separately. 
Author(s)
Michael Greenacre
References
Greenacre, M. (2016), Data reporting and visualization in ecology, Polar Biology, 39:2189-2205.
See Also
Examples
# Dot plot of columns of Vegetables data set
data(veg)
DOT(veg)
# Dot plot of domestic work column of TimeBudget data set, split by sex
data(time)
DOT(time[,2], groups=substr(rownames(time),3,3), cols=c("blue","red"), ylim=c(0,20), 
    jitter=2, main="Percentage of Domestic Work")
Dummy variable (indicator) coding
Description
Convert categorical variable to dummy (0/1) coding
Usage
DUMMY(x, catnames=NA)
Arguments
| x | Variable (vector) of categorical data to be coded | 
| catnames | Category names | 
Details
The function DUMMY takes a categorical variable and converts it to a set of dummy variables (zeros and ones), where the ones indicate the corresponding category. There are as many columns in the result as there are unique categories in the input vector. 
Author(s)
Michael Greenacre
Examples
# Indicator (dummy) coding of sex in FishMorphology data set
data(fish)
sex   <- fish[,1]
sex.Z <- DUMMY(sex, catnames=c("F","M")) 
Find the best ALR transformation
Description
Searching over every possible reference part for choosing an optimal ALR transformation.
Usage
FINDALR(data, weight=FALSE)
Arguments
| data | Compositional data matrix, with the parts as columns | 
| weight | FALSE (default) for equally weighted parts, TRUE when weights are in data list object, or a vector of user-defined part weights | 
Details
The function FINDALR considers every possible set of additive logratio (ALR) transformations, by trying each of the references. For each set the closeness to isometry is measured by the Procrustes correlation. In addition, the variance of the log-transformed reference is also computed. The reference with highest Procrustes correlation and the reference with the lowest variance of its log-transform are identified. The number of ALRs computed is equal to 1 less than the number of rows or columns, whichever is the smallest.
Value
An object which describes the tree produced by the clustering process on the n objects. The object is a list with components:
| totvar | Total logratio variance | 
| procrust.cor | The Procrustes correlations of the ALRs using each reference | 
| procrust.max | The value of the highest Procrustes correlation | 
| procrust.ref | The reference corresponding to the highest correlation | 
| var.log | Variances of the log-transformed references | 
| var.min | The value of the lowest variance | 
| var.ref | The reference corresponding to the lowest variance | 
Author(s)
Michael Greenacre
References
Greenacre, M., Martinez-Alvaro, M. and Blasco, A. (2021), Compositional data analysis of microbiome and any-omics datasets: a validation of the additive logratio transformation, Frontiers in Microbiology 12: 2625 
Gower, J. and Dijksterhuis, G.B. (2004), Procrustes Problems. Oxford University Press   
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC
See Also
Examples
# For the fish morphometric data, first close (normalize) 
# then loop over the 26 possible references
data(fish)
FINDALR(CLOSE(fish[,4:29]))
# Note that for the default option weight=FALSE closing the data is not necessary
Isometric logratio
Description
Computation of a single isometric logratio (ILR)
Usage
ILR(data, numer=NA, denom=NA, weight=TRUE)
Arguments
| data | A compositional data frame or matrix | 
| numer | Vector of parts in the numerator | 
| denom | Vector of parts in the denominator | 
| weight | Logical indicating if a varying weight is returned (default: | 
Details
The function ILR computes a single isometric logratio based on the specified numerator and denominator parts that define the two geometric means in the ratio. 
Value
| LR | The isometric logratio (ILR) | 
| LR.wt | The weight assigned to the ILR | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.  
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
Examples
data(veg)
ILR(veg, numer=1, denom=2:3)
All pairwise logratios
Description
Computation of all pairwise logratios.
Usage
LR(data, ordering=1:ncol(data), weight=TRUE)
Arguments
| data | A compositional data frame or matrix | 
| ordering | A permutation of the columns (default: the original ordering) | 
| weight | Logical indicating if varying weights are returned (default: | 
Details
The function LR computes the complete set of pairwise logratios, in the order [1,2], [1,3], [2,3], [1,4], [2,4], [3,4], etc. 
Value
| LR | The pairwise logratios as columns of a data matrix | 
| LR.wt | The weights assigned to the respective logratios | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall. 
 
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
ALR, invALR, CLR, invCLR, LR.VAR
Examples
data(veg)
LR(veg)
Total logratio variance
Description
Computation of total (weighted)logratio variance.
Usage
LR.VAR(LRdata, row.wt = NA, weight=TRUE, vars=FALSE)
Arguments
| LRdata | Matrix of logratios, either a vector or preferably the logratio object resulting from one of the functions ALR, CLR, PLR or LR | 
| row.wt | Optional set of row weights (default: equal weights) | 
| weight | Logical indicating if varying weights are returned(default: | 
| vars | If  | 
Details
The function LR.VAR computes the sum of the logratio variances provided as input, using the weights in the logratio data object. 
Value
| LRtotvar | The total logratio variance | 
| LRvars | (optional, if vars=TRUE, the individual logratio variances composing the total) | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.  
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
Examples
data(cups)
# These give identical total logratio variances (weighted, by default)
LR.VAR(CLR(cups))
LR.VAR(LR(cups))
# Summing over all sets of ALRs gives twice the variance
totvar <- 0
for(j in 1:ncol(cups)) totvar <- totvar + LR.VAR(ALR(cups, denom=j))
totvar/2
Logratio analysis
Description
Computation of weighted or unweighted logratio analysis of a samples-by-parts compositional data table.
Usage
LRA(data, nd = 2, weight = TRUE, suprow = NA, row.wt = NA, amalg = NA, supamalg = FALSE)
Arguments
| data | A data frame or matrix of compositional data, with no zero values | 
| nd | Number of dimensions for summary solution if not 2 (default) | 
| weight | TRUE (default) for part weighting, FALSE for unweighted analysis, or a vector of user-defined part weights | 
| suprow | Indices of rows that are supplementary points | 
| row.wt | Optional user-defined set of positive weights for the rows (samples) (default: equal weights) | 
| amalg | Optional list of amalgamated parts | 
| supamalg | FALSE (default) when amalgamations are active and their subparts supplementary, TRUE when amalgamations are supplementary and their parts active | 
Details
The function LRA computes a log-ratio analysis of a table of compositional data based on the singular value decomposition. 
By default the weighted log-ratio analysis is computed (Greenacre & Lewi 2009). 
For the unweighted logratio analysis (Aitchison & Greenacre 2002), specify the option weight=FALSE. 
User-specified weights can be supplied, for the rows and/or the columns.
Usually row weights are not specified, and are equal unless intentional weighting of the samples is desired.
Default column weights (if weight = TRUE) are the part means of the true compositional table, thus summing to 1.
User-specified part weights can be provided using the same weight option. 
Supplementary rows can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes.
Amalgamations can be defined and can either replace their constituent parts (default) or be declared supplementary using the supamalg option: supamalg = FALSE (default), = TRUE if all declared amalgamations are supplementary.
The function borrows the structure and functions of the ca package, which is required, and produces a ca object, and the same print, summary and plot methods can be used, as for a ca object.
Value
| sv | Singular values | 
| nd | Number of dimensions in solution results | 
| rownames | Row names | 
| rowmass | Row weights | 
| rowdist | Row logratio distances to centroid | 
| rowinertia | Row inertias | 
| rowcoord | Row standard coordinates | 
| rowpcoord | Row principal coordinates | 
| rowsup | Indices of row supplementary points | 
| colnames | Column names | 
| colmass | Column weights | 
| coldist | Column logratio distances to centroid | 
| colinertia | Column inertias | 
| colcoord | Column standard coordinates | 
| colpcoord | Column principal coordinates | 
| N | The compositional data table | 
Author(s)
Michael Greenacre
References
Aitchison, J. and Greenacre, M. (2002), Biplots of compositional data, Applied Statistics 51, 375-392.
Greenacre, M. and Lewi, P.J. (2009), Distributional equivalence and subcompositional coherence in the analysis of compositional data, contingency tables and ratio scale measurements. Journal of Classification 26, 29-54.
Greenacre, M. (2020), Amalgamations are valid in compositional data analysis, can be used in agglomerative clustering, and their logratios have an inverse transformation. Applied Computing and Geosciences 5, 100017.
See Also
Examples
# (weighted) LRA of the RomanCups data set, showing default symmetric map
data("cups")
PLOT.LRA(LRA(cups))
# (unweighted) LRA of the RomanCups data set, showing default symmetric map
# the solution is completely different
PLOT.LRA(LRA(cups, weight=FALSE))
Principal component analysis
Description
Computation of weighted or unweighted principal component analysis of a matrix of interval-scale data (e.g. a matrix of logratios).
Usage
PCA(data, nd = 2, weight = TRUE, row.wt = NA, suprow = NA)
Arguments
| data | A data frame or matrix of interval-scale data, or logratio object from functions  | 
| nd | Number of dimensions for summary solution if not 2 (default) | 
| weight | TRUE (default) for column weighting, FALSE for unweighted analysis, or a vector of user-defined column weights | 
| row.wt | Optional user-defined set of positive weights for the rows (samples) (default: equal weights) | 
| suprow | Indices of rows that are supplementary points (NOTE: this option is not implemented in this version) | 
Details
The function PCA computes an unstandardized principal component analysis, based on the singular value decomposition, of a matrix of interval-scale data, usually a matrix of logratios in the context of this package (but it can be used for general data as well). 
For general usage the unweighted option weight = FALSE might be preferred, but the default is weighted in the present context of compositional data.
User-specified weights can be supplied, for the rows and/or the columns.
Usually row weights are not specified, and are equal unless intentional weighting of the samples is desired.
User-specified part weights can be provided using the weight option. 
Supplementary rows and columns can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes. Notice that this optyion is not implemented in the present version, but will appear in the next one.
The function borrows the structure and functions of the ca package, which is required, and produces a ca object, and the same print, summary and plot methods can be used, as for a ca object.
Value
| sv | Singular values | 
| nd | Dimenson of the solution | 
| rownames | Row names | 
| rowmass | Row weights | 
| rowdist | Row logratio distances to centroid | 
| rowinertia | Row variances | 
| rowcoord | Row standard coordinates | 
| rowpcoord | Row principal coordinates | 
| rowsup | Indices of row supplementary points | 
| colnames | Column names | 
| colmass | Column weights | 
| coldist | Column logratio distances to centroid | 
| colinertia | Column variances | 
| colcoord | Column standard coordinates | 
| colpcoord | Column principal coordinates | 
| N | The data table | 
Author(s)
Michael Greenacre
References
Aitchison, J. and Greenacre, M. (2002), Biplots of compositional data, Applied Statistics 51, 375-392.
Greenacre, M. (2010), Biplots in Practice, BBVA Foundation, Bilbao. Free download from www.multivariatestatistics.org
See Also
PLOT.PCA, plot.ca, summary.ca, print.ca
Examples
# compute logratios of Vegetables data set
data("veg")
veg.LR <- LR(veg)
# unweighted PCA biplot of the results
veg.pca <- PCA(veg.LR$LR, weight=FALSE)
PLOT.PCA(veg.pca, map="asymmetric")
Plot the results of a correspondence analysis
Description
Various maps and biplots of the results of a correspondence analysis using function CA.
Usage
PLOT.CA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1), main="",
        cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8), fonts=c(2,4))
Arguments
| obj | A CA object created using function  | 
| map | Choice of scaling of rows and columns:  | 
| rescale | A rescaling factor applied to column coordinates (default is 1 for no rescaling) | 
| dim | Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2)) | 
| main | Title for plot | 
| axes.inv | Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing) | 
| cols | Colours for row and column labels (default is c("blue","red")) | 
| colarrows | Colour for arrows in asymmetric and contribution biplots (default is "pink") | 
| cexs | Character expansion factors for row and column labels (default is c(0.8,0.8)) | 
| fonts | Fonts for row and column labels (default is c(2,4)) | 
Details
The function PLOT.CA makes a scatterplot of the results of a correspondence analysis (computed using function CA), with various options for scaling the results and changing the direction of the axes.  By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to variables.
By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance (i.e. inertia) along the two dimensions. The other options are biplots: the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates.  In cases where the row and column displays occupy widely different extents, the column coordinates can be rescaled using the rescale option. 
Author(s)
Michael Greenacre
References
Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics, 22, 107-122.
See Also
Examples
data("cups")
cups.ca <- CA(cups)
PLOT.CA(cups.ca, map="contribution", rescale=0.2)	
# Compare the above plot with that of a weighted LRA -- practically the same
cups.lra <- LRA(cups)
PLOT.LRA(cups.lra, map="contribution", rescale=0.2)
Plot the results of a logratio analysis
Description
Various maps and biplots of the results of a logratio analysis using function LRA.
Usage
PLOT.LRA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1), main="",
         cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8), fonts=c(2,4))
Arguments
| obj | An LRA object created using function  | 
| map | Choice of scaling of rows and columns:  | 
| rescale | A rescaling factor applied to column coordinates (default is 1 for no rescaling) | 
| dim | Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2)) | 
| axes.inv | Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing) | 
| main | Title for plot | 
| cols | Colours for row and column labels (default is c("blue","red")) | 
| colarrows | Colour for arrows in asymmetric and contribution biplots (default is "pink") | 
| cexs | Character expansion factors for row and column labels (default is c(0.8,0.8)) | 
| fonts | Fonts for row and column labels (default is c(2,4)) | 
Details
The function PLOT.LRA makes a scatterplot of the results of a logratio analysis (computed using function LRA), with various options for scaling the results and changing the direction of the axes.  By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to compositional parts.
By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance along the two dimensions. The other options are the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates.  In cases where the row and column displays occupy widely different extents, the column coordinates can be rescaled using the rescale option. 
Author(s)
Michael Greenacre
References
Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics, 22, 107-122.
See Also
Examples
# perform LRA on the vegetable compositions (unweighted form)
data("veg")
veg.lra <- LRA(veg, weight=FALSE)
PLOT.LRA(veg.lra)
# perform LRA on the Roman cups data set and plot the results (weighted form)
data("cups")
cups.lra <- LRA(cups)
PLOT.LRA(cups.lra, map="contribution", rescale=0.2)	
Plot the results of a principal component analysis
Description
Various maps and biplots of the results of a principal component analysis using function PCA.
Usage
PLOT.PCA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1),
         main="", cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8),
         fonts=c(2,4))
Arguments
| obj | An LRA object created using function  | 
| map | Choice of scaling of rows and columns:  | 
| rescale | A rescaling factor applied to column coordinates (default is 1 for no rescaling) | 
| dim | Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2)) | 
| axes.inv | Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing) | 
| main | Title for plot | 
| cols | Colours for row and column labels (default is c("blue","red")) | 
| colarrows | Colour for arrows in asymmetric and contribution biplots (default is "pink") | 
| cexs | Character expansion factors for row and column labels (default is c(0.8,0.8)) | 
| fonts | Fonts for row and column labels (default is c(2,4)) | 
Details
The function PLOT.PCA makes a scatterplot of the results of a logratio analysis (computed using function PCA), with various options for scaling the results and changing the direction of the axes.  By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to variables.
By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance along the two dimensions. The other options are biplots: the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates.  In cases where the row and column displays occupy widely different extents, the column coordinates can be rescaled using the rescale option. 
Author(s)
Michael Greenacre
References
Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics, 22, 107-122.
See Also
Examples
# perform weighted PCA on the ALRs of the Roman cups data set
# where the first oxide silica is chosen as the denominator
data("cups")
cups.alr <- ALR(cups, denom=1)
cups.pca <- PCA(cups.alr)
PLOT.PCA(cups.pca, map="contribution", rescale=0.2, axes.inv=c(1,-1))	
Plot the results of a redundancy analysis
Description
Various maps and biplots/triplots of the results of a redundancy analysis using function RDA.
Usage
PLOT.RDA(obj, map="symmetric", indcat=NA, rescale=1, dim=c(1,2), axes.inv=c(1,1),
         main="", rowstyle=1, cols=c("blue","red","forestgreen"),
         colarrows=c("pink","lightgreen"), colrows=NA, pchrows=NA, colcats=NA,
         cexs=c(0.8,0.8,0.8), fonts=c(2,4,4))
Arguments
| obj | An RDA object created using function  | 
| map | Choice of scaling of rows and columns:  | 
| indcat | A vector indicating which of the covariates are dummy (or fuzzy) variables | 
| rescale | A rescaling factor applied to column coordinates(default is 1 for no rescaling). If rescale is a vector with two values, the first applies to the column coordinates and the second to the covariate coordinates. | 
| dim | Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2)) | 
| axes.inv | Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing) | 
| main | Title for plot | 
| rowstyle | Scaling option for row coordinates, either 1 (SVD coordinates, default) or 2 (as supplementary points) | 
| cols | Colours for row and column and covariate labels (default is c("blue","red","forestgreen")) | 
| colarrows | Colour for arrows in asymmetric or contribution biplots, for columns and covariates (default is c("pink","lightgreen")) | 
| colrows | Optional vector of colours for rows | 
| pchrows | Optional vector of point symbols for rows | 
| colcats | Optional vector of colours for covariate categories (dummy variables) | 
| cexs | Vector of character expansion factors for row and column and covariate labels (default is c(0.8,0.8,0.8)) | 
| fonts | Vector of font styles for row and column and covariate labels (default is c(2,4,4)) | 
Details
The function PLOT.RDA makes a scatterplot of the results of a redundancy analysis (computed using function RDA), with various options for scaling the results and changing the direction of the axes.  By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to compositional parts.   Covariates are plotted according to their regression coefficients with the RDA dimensions, and if they contain dummy (or fuzzy) variables these are indicated by the option indcat, and hence plotted as centroids not arrows.
By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance along the two dimensions. The other options are the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates.  In cases where the row and column displays as well as the covariate positions occupy widely different extents, the column and covariate coordinates can be rescaled using the rescale option. 
Author(s)
Michael Greenacre
See Also
Examples
# see the use of PLOT.RDA in the example of the RDA function
Pivot logratios
Description
Computation of the set of pivot logratios(PLRs) based on the specified ordering of parts
Usage
PLR(data, ordering=1:ncol(data), weight=TRUE)
Arguments
| data | A compositional data frame or matrix | 
| ordering | The ordering of the parts to be used in the PLRs (by default, the original ordering of the columns) | 
| weight | Logical indicating if varying weights are returned (default: | 
Details
The function PLR computes the set of pivot logratios according to the ordering of the parts. 
Value
| LR | The pivot logratios (PLRs) | 
| LR.wt | The weights assigned to the PLRs | 
Author(s)
Michael Greenacre
References
Hron K., Filzmoser P., de Caritat P., Fiserova E., Gardlo A. (2017). Weighted pivot coordinates for copositional data and their application to geochemical mapping. Mathematical Geosciences 49, 777-796.
See Also
Examples
data(veg)
PLR(veg, ordering=c(1,3,2))
Redundancy analysis
Description
Computation of weighted or unweighted redundancy analysis of a samples-by-parts compositional data table, given a set of covariates.
Usage
RDA(data, cov=NA, nd = NA, weight = TRUE, suprow = NA, row.wt = NA) 
Arguments
| data | A data frame or matrix of interval-scale data, e.g. logratios (which are preferably in a list object with weights) | 
| cov | List of covariates for constraining solution | 
| nd | Number of dimensions for summary output, by default the number of constraining dimensions | 
| weight | TRUE (default) when weights are in data list object, FALSE for unweighted analysis, or a vector of user-defined part weights | 
| suprow | Indices of rows that are supplementary (passive) points (NOTE: this option is not implemented in this version) | 
| row.wt | Optional user-defined set of positive weights for the rows (samples) (default: equal weights) | 
Details
The function RDA computes a redundancy analysis of a matrix of interval-scaled data, constrained by a matrix of covariates, using the singular value decomposition. 
By default weights are assumed in the data list object. 
For the unweighted logratio analysis, specify the option weight=FALSE. 
If weight = TRUE (the default) it is assumed that the weights are included in the data object, which comes from one of the logratio functions.
User-specified part weights can be provided using the same weight option. 
Usually row weights are not specified, they are equal unless intentional weighting of the samples is desired. Supplementary rows can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes. This option will be available in the next release of the package.
Value
| sv | Singular values | 
| nd | Number of dimensions in the solution output | 
| rownames | Row names | 
| rowmass | Row weights | 
| rowdist | Row distances to centroid | 
| rowinertia | Row variances | 
| rowcoord | Row standard coordinates | 
| rowpcoord | Row principal coordinates | 
| rowsup | Indices of row supplementary points | 
| colnames | Column names | 
| colmass | Column weights | 
| coldist | Column logratio distances to centroid | 
| colinertia | Column variances | 
| colcoord | Column standard coordinates | 
| colpcoord | Column principal coordinates | 
| covcoord | Regression coordinates of constraining variables | 
| covnames | Names of constraining variables | 
| N | The data table (usually logratios in this package) | 
| cov | The table of covariates | 
Author(s)
Michael Greenacre
References
Van den Wollenbergh, A. (1977), Redundancy analysis. An alternative to canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics 22, 107-122.
See Also
Examples
# Data frame fish has sex, habitat and mass in first columns, 
# then morphometric data in remaining columns
data("fish")
sex     <- fish[,1]
habitat <- fish[,2]
mass    <- fish[,3]
fishm   <- as.matrix(fish[,4:29])
# Convert to compositional data matrix
fishm   <- CLOSE(fishm)
# Compute logarithm of mass and interaction of sex (F/M) and habitat (L/P) categories
logmass <- log(mass)
sexhab  <- 2*(sex-1)+habitat
sexhab.names <- c("FL","FP","ML","MP")
rownames(fishm) <- sexhab.names[sexhab]
# Create dummy variables for sexhab and create matrix of covariates
sexhab.Z <- DUMMY(sexhab, catnames=sexhab.names)
vars     <- cbind(logmass, sexhab.Z)
# Perform RDA on centred logratios
require(ca)
fish.rda <- RDA(CLR(fishm), cov=vars)
# Plot results 
# (for more options see Appendix of Compositional Data Analysis in Practice)
PLOT.RDA(fish.rda, map="contribution", rescale=0.05, indcat=2:5, 
         colrows=rainbow(4, start=0.1, end=0.8)[sexhab], cexs=c(0.8,0.8,1))
Amalgamation (summed) logratio
Description
Computation of a single amalgamation (summed) logratio
Usage
SLR(data, numer=NA, denom=NA, weight=TRUE)
Arguments
| data | A compositional data frame or matrix | 
| numer | Vector of parts in the numerator | 
| denom | Vector of parts in the denominator | 
| weight | Logical indicating if a varying weight is returned (default: | 
Details
The function SLR computes a single amalgamation logratio based on the specified numerator and denominator parts that define the two summations in the ratio. 
Value
| LR | The amalgamation (summed)) logratio (SLR) | 
| LR.wt | The weight assigned to the SLR | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall. 
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
Examples
data(veg)
SLR(veg, numer=1, denom=2:3)
Stepwise selection of logratios
Description
Stepwise selection of pairwise logratios that explain maximum variance in a target matrix.
Usage
STEP(data, datatarget=data, previous=NA, previous.wt=NA, weight=TRUE, 
     random=FALSE, nsteps=min(ncol(data), ncol(datatarget))-1, top=1)
Arguments
| data | A data frame or matrix of compositional data on which pairwise logratios are computed | 
| datatarget | A matrix of interval-scale data, with as many rows as  | 
| previous | A vector or matrix of variables to be forced in before logratios are sought | 
| previous.wt | Possible weights of the variable(s) forced in before logratios are sought (if not specified, weights of 1 are assumed) | 
| weight | 
 | 
| random | 
 | 
| nsteps | Number of steps to take (by default, one less than the number of columns of data and of datatarget, whichever is smaller) | 
| top | Number of top variance-explaining logratios returned after last step (by default, 1, i.e. the best) | 
Details
The function STEP sequentially computes the logratios in a data matrix (usually compositional) that best explain the variance in a second matrix, called the target matrix.  By default, the target matrix is the same matrix, in which case the logratios that best explain the logratio variance in the same matrix are computed.
In this case, weights for the data matrix are assumed by default, proportional to part means of the compositional data matrix.
For the unweighted logratio variance, specify the option weight=FALSE. 
User-specified weights on the columns of the data matrix (usually compositional parts) can be provided using the same weight option. 
If the target matrix is a different matrix, it is the logratio variance of that matrix that is to be explained. An option for the target matrix to be any response matrix will be in the next release.
If nsteps > 1 and top=1 the results are in the form of an optimal set of logratios that sequentially add maximum explained variance at each step.  
If top>1 then at the last step the ordered list of top variance-explaining logratios is returned, which allows users to make an alternative choice of the logratio based on substantive knowledge. Hence, if nsteps=1 and top=10, for example, the procedure will move only one step, but list the top 10 logratios for that step.  If top=1 then all results with extension .top related to the top ratios are omitted because they are already given.
Value
| names | Names of maximizing ratios in stepwise process | 
| ratios | Indices of ratios | 
| logratios | Matrix of logratios | 
| R2max | Sequence of maximum cumulative explained variances | 
| pro.cor | Corresponding sequence of Procrustes correlations | 
| names.top | Names of "top" ratios at last step | 
| ratios.top | Indices of "top" ratios | 
| logratios.top | Matrix of "top" logratios | 
| R2.top | Sequence of "top" cumulative explained variances (in descending order) | 
| pro.cor.top | Corresponding sequence of "top" Procrustes correlations | 
| totvar | Total logratio variance of target matrix | 
Author(s)
Michael Greenacre
References
Van den Wollenbergh, A. (1977), Redundancy analysis. An alternative to canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2018), Variable selection in compositional data analysis using pairwise logratios, Mathematical Geosciences, DOI: 10.1007/s11004-018-9754-x. 
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC
See Also
Examples
# Stepwise selection of ratios for RomanCups data set
data(cups)
# Set seed to obtain same results as in Appendix C of Greenacre (2018)
set.seed(2872)
STEP(cups, random=TRUE)
# Select best ratio, but output "top 5"
STEP(cups, nsteps=1, top=5)
Stepwise selection of pairwise logratios for generalized linear modelling
Description
Three different algorithms for selecting pairwise logratios that best explain/predict a response variable, which could be continuous, binary or count
Usage
STEPR(data, y, method = NA, family = "gaussian", nsteps = ncol(data)-1, 
      top = 1, previous = NA, criterion = "Bonferroni", alpha = 0.05, 
      previousparts=NA, denom=NA)
Arguments
| data | A data frame or matrix of compositional data on which the pairwise logratios will be constructed and selected | 
| y | The response variable: a numeric variable for regression (default)), a binary factor for logistic regression or a numeric count for Poisson regression | 
| method | The selection method: 1 (unrestricted selection of logratios), 2 (restricted to non-overlapping parts), 3 (additive logratios) | 
| family | The distribution used in the generalized linear model family: "gaussian" (default, for multiple regression), "binomial" (for logistic regression of binary response), or "poisson" (for Poisson regression) | 
| nsteps | The maximum number of steps taken, by default one less than the number of parts | 
| top | When one step is taken (nsteps=1), the ordered list of top logratios with the highest improvements in the likelihood function, for selection based on domain knowledge | 
| previous | For specifying variable(s) to be included before stepwise selection takes place; these can be non-compositional variables and/or specific pairwise logratios computed in previous runs of STEPR or by hand; the matrix (or vector for a single variable) of values must be supplied | 
| criterion | Criterion for stopping the stepwise selection: "Bonferroni" (default), "AIC", "BIC", or NA for no stopping until maximum specified or permissible logratios entered | 
| alpha | Overall significance level (default is 0.05) | 
| previousparts | (For method 2) The sequence numbers of the logratios, if any, forced in using the previous option | 
| denom | (For method 3) The sequence number of the part used in denominator; for use when additive logratios are forced in using previous option or to select a set of additive logratios with specific reference from the start | 
Details
The function STEPR performs stepwise selection of pairwise logratios with the objective of explaining/predicting a response variable, in the framework of generalized linear modelling where the response can be numeric continuous (regression analysis), or a binary factor (logistic regression), or a numeric count (Poisson regression). The corresponding family option has to be indicated if the regression is logistic or Poisson. The different method options for the stepwise selection are method = 1 (unrestricted selection of logratios, any logratios can be selected irrespective of the previous ones), method = 2 (restricted to non-overlapping parts, each part participates at most in one logratio, so that parts in previously selected logratios are excluded in subsequent steps; logratio effects can be interpreted as under orthogonality), method = 3 (additive logratios, only logratios with the same denominator as the first selected logratio are c onsidered; the result is an additive logratio transformation on a subcomposition) Three alternative stopping criteria can be specified, otherwise the procedure executes as many steps as the value of nsteps. These are (in increasing strictness), "AIC", "BIC" and "Bonferroni" (the default).
Value
| rationames | Names of the selected logratios | 
| ratios | The sequence numbers of the selected parts in each ratio | 
| logratios | Matrix of selected logratios | 
| logLik | The -2*log-likelihood sequence for the steps | 
| deviance | The deviance sequence for the steps | 
| AIC | The AIC sequence for the steps | 
| BIC | The BIC sequence for the steps | 
| Bonferroni | The Bonferroni sequence for the steps | 
| null.deviance | The null deviance for the regression | 
(Notice that for logLik, AIC, BIC and Bonferroni, the values for one more step are given, so that the stopping point can be confirmed.)  
And the following if top > 1: 
| ratios.top | The top ratios and the sequence numbers of their parts | 
| logratios.top | The matrix of top logratios | 
| logLik.top | The set of top -2*log-likelihoods | 
| deviance.top | The set of top deviances | 
| AIC.top | The set of top AICs | 
| BIC.top | The set of top BICs | 
| Bonferroni.top | The set of top Bonferronis | 
Author(s)
Michael Greenacre
References
Coenders, G. and Greenacre, M. (2021), Three approaches to supervised learning for compositional data with pairwise logratios. aRxiv preprint. URL:https://arxiv.org/abs/2111.08953 
Coenders, G. and Pawlowsky-Glahn, V. (2020), On interpretations of tests and effect sizes in regression models with a compositional predictor. SORT, 44:201-220 
Greenacre, M. (2021), Compositional data analysis, Annual Review of Statistics and its Application, 8: 271-299
See Also
Examples
# For the fish morphometric data, first close (normalize, although not necessary) 
# then loop over the 26*25/2 = 325 possible logratios stepwise
data(fish)
habitat <- fish[,2]
morph <- CLOSE(fish[,4:29])
# predict habitat binary classification from morphometric ratios
fish.step1 <- STEPR(morph, as.factor(habitat), method=1, family="binomial")
# [1] "Criterion increases when 3-th ratio enters"
fish.step1$names
# [1] "Bac/Hg" "Hw/Jl"
# perform logistic regression with selected logratios
fish.glm   <- glm(as.factor(habitat) ~ fish.step1$logratios, family="binomial")
summary(fish.glm)
fish.pred1  <- predict(fish.glm)
table(fish.pred1>0.5, habitat)
#     habitat
#        1  2
# FALSE 56 11
# TRUE   3  5
# (Thus 61/75 correct predictions)
#
# force the sex variable in at the first step before selecting logratios
# and using more strict Bonferroni default
sex <- as.factor(fish[,1]) 
fish.step2 <- STEPR(morph, as.factor(habitat), method=1, previous=sex, family="binomial")
# [1] "Criterion increases when 3-th ratio enters"
fish.step2$names
# [1] "Bac/Hg" "Hw/Jl"
# perform logistic regression with sex and selected logratios
fish.glm   <- glm(as.factor(habitat) ~ sex + fish.step2$logratios, family="binomial")
summary(fish.glm)
# (sex not significant)
#
# check the top 10 ratios at Step 1 to allow domain knowledge to operate
fish.step3 <- STEPR(morph, as.factor(habitat), method=1, nsteps=1, top=10, family="binomial")
cbind(fish.step3$ratios.top, fish.step3$BIC.top)
#        row col         
# Bac/Hg   8  19 67.93744
# Bp/Hg    7  19 69.87134
# Jl/Hg    6  19 70.31554
# Jw/Bp    5   7 71.53671
# Jw/Jl    5   6 71.57122
# Jw/Bac   5   8 71.69294
# Fc/Hg   10  19 72.38560
# Hw/Bac   1   8 73.25325
# Jw/Fc    5  10 73.48882
# Hw/Bp    1   7 73.55621
# Suppose 5th in list, Jw/Jl (Jaw width/Jaw length), preferred at the first step
fish.step4 <- STEPR(morph, as.factor(habitat), method=1, 
                    previous=fish.step3$logratios.top[,5], family="binomial")
# [1] "Criterion increases when 2-th ratio enters"
fish.step4$names
# [1] "Bac/Hg"
# So after Jw/Jl forced in only Bac/Hg enters, the best one originally
Variance of a vector of observations, dividing by n rather than n-1
Description
This function computes the usual variance but divides by n, not by n-1.
Usage
VAR(x)
Arguments
| x | Vector of values for which variance is computed | 
Details
To think of each of n observations weighted by 1/n this function VAR computes squared deviations from the mean and averages them.
Thus, the sum of squared deviations is divided by n rather than by n-1, as for the unbiased estimate of the variance.
Value
The value of the variance.
Author(s)
Michael Greenacre
References
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC. 
See Also
Examples
data(cups)
cups <- CLOSE(cups)
# variances using base R function var
apply(cups, 2, var)
# variances using easyCODA function VAR
apply(cups, 2, VAR)
Ward clustering of a compositional data matrix
Description
This function clusters the rows (or the columns, if the matrix is transformed) of a compositional data matrix, using weighted Ward clustering of the logratios.
Usage
WARD(LRdata, weight=TRUE, row.wt=NA)
Arguments
| LRdata | Matrix of logratios, either a vector or preferably the logratio object resulting from one of the functions ALR, CLR, PLR or LR (usually CLRs will be used)) | 
| weight | 
 | 
| row.wt | Optional set of row weights (default is equal weights when  | 
Details
The function WARD performs a weighted WARD hierarchical clustering on the rows of an input set of logratios, usually CLR-transformed.
(This would be equivalent to performing the clustering on all pairwise logratios).
If the columns of the logratio matrix are unweighted, specify the option weight=FALSE: they will then get equal weights.
The default weight=TRUE option implies that column weights are provided, either in the input list object LRdata, as LRdata$LR.wt, or as
a vector of user-specified weights using the same weight option.
Value
An object which describes the tree produced by the clustering process on the n objects. The object is a list with components:
| merge | an n-1 by 2 matrix. Row i of  | 
| height | a set of n-1 real values (non-decreasing for ultrametric trees). The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration. | 
| order | a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches | 
Author(s)
Michael Greenacre
References
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC. 
See Also
Examples
# clustering steps for unweighted and weighted logratios
# for both row- and column-clustering
data(cups)
cups <- CLOSE(cups)
# unweighted logratios: clustering samples
cups.uclr  <- CLR(cups, weight=FALSE)
cups.uward <- WARD(cups.uclr, weight=FALSE)   # weight=FALSE not needed here,
                                              # as equal weights are in object
plot(cups.uward)
# add up the heights of the nodes
sum(cups.uward$height)
# [1] 0.02100676
# check against the total logratio variance
LR.VAR(cups.uclr, weight=FALSE)
# [1] 0.02100676
# unweighted logratios: clustering parts
tcups <- t(cups)
tcups.uclr  <- CLR(tcups, weight=FALSE)
tcups.uward <- WARD(tcups.uclr, weight=FALSE)  # weight=FALSE not needed here,
                                               # as equal weights are in object
plot(tcups.uward, labels=colnames(cups))
sum(tcups.uward$height)
# [1] 0.02100676
LR.VAR(tcups.uclr, weight=FALSE)
# [1] 0.02100676
# weighted logratios: clustering samples
cups.clr <- CLR(cups)
cups.ward <- WARD(cups.clr)
plot(cups.ward)
sum(cups.ward$height)
# [1] 0.002339335
LR.VAR(cups.clr)
# [1] 0.002339335
# weighted logratios: clustering parts
# weight=FALSE is needed here, since we want equal weights
# for the samples (columns of tcups)
tcups.clr  <- CLR(tcups, weight=FALSE)
tcups.ward <- WARD(tcups.clr, row.wt=colMeans(cups))
plot(tcups.ward, labels=colnames(cups))
   sum(tcups.ward$height)
# [1] 0.002339335
LR.VAR(tcups.clr, row.wt=colMeans(cups))
# [1] 0.002339335
Dataset: RomanCups
Description
This data set consists of the compositions of 11 oxides in 47 Roman cups found at an archaeological site in eastern England. Compositions are expressed as percentages.
Usage
data(cups)Format
Data frame containing the 47 x 11 matrix.
Source
Baxter MJ, Beardah CC, Cool HEM and Jackson CM (2005) Compositional data analysis of some alkaline glasses. Mathematical Geology 37: 183-196.
Dataset: FishMorphology
Description
This data set consists of the sex, habitat, mass and then 26 morphometric measurements on 75 fish (Arctic charr)
Usage
data(fish)Format
Data frame containing the 75 x 29 matrix. Column 1 contains sex (1=female, 2=male). Column 2 contains habitat (1=litoral, 2=pelagic). Column 3 contains the mass in grams. Columns 4 to 29 contain the 26 morphometric measurements.
Source
Greenacre, M and Primicerio, R (2010) Multivariate Analysis of Ecological Data. BBVA Foundation, Bilbao. Free download at www.multivariatestatistics.org
Inverse of additive logratios
Description
Given additive logratios (ALRs) with respect to a specified part, compute the inverse (i.e. original parts)
Usage
invALR(ALRmatrix, part.names=paste("part",1:(ncol(ALRmatrix)+1),sep=""), denom=NA)
Arguments
| ALRmatrix | A matrix of additive logratios (ALRs) with respect to a specified part) | 
| part.names | Part names in the reconstructed compositional data matrix | 
| denom | The index of the denominator used in the computation of the ALRs (default: last part)) | 
Details
The function invALR computes the original parts, given the additive logratios (ALRs)
Value
| parts | The reconstructed parts (they add up to 1) | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall. 
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
Examples
data(veg)
# compute additive logratios with respect to second part
veg.ALR <- ALR(veg, denom=2)$LR
# recover original parts (to get same order, specify the denominator used originally)
invALR(veg.ALR, denom=2)
Inverse of centred logratios
Description
Given centred logratios (CLRs), compute the inverse (i.e. recover the original parts)
Usage
invCLR(CLRmatrix, part.names=colnames(CLRmatrix))
Arguments
| CLRmatrix | A matrix of centred logratios | 
| part.names | Part names in the reconstructed compositional data matrix | 
Details
The function invCLR computes the original parts, given the centred logratios (CLRs)
Value
| parts | The reconstructed parts (they add up to 1) | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.   
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
Examples
data(veg)
# compute centred logratios
veg.CLR <- CLR(veg)$LR
# invert back to original parts (parts closed to sum to 1)
invALR(veg.CLR)
Inverse of full set of amalgamation balances
Description
Given a full set of amalgamation (or summation) balances (SLRs), compute the inverse (i.e. recover the original parts)
Usage
invSLR(SLRmatrix, part.names=NA, ratio.names=colnames(SLRmatrix))
Arguments
| SLRmatrix | A matrix of amalgamation logratios, one less column than the number of parts | 
| part.names | Part names in the reconstructed compositional data matrix | 
| ratio.names | Definition of the amalgamation logratios | 
Details
The function invSLR computes the original parts, given the amalgamation logratios (CLRs).
The amalgamation logratios are specified in ratio.names in the format num/den where num and den are the numerator and denominator amalgamations respectively. 
An amalgamation is specified as "p1&p2&...", where p1, p2, etc. are the parts summed in the amalgamation.
For example, an SLR of the ratio MnO/(CaO+P2O5) would be names as "MnO/CaO&P2O5". 
Value
| parts | The reconstructed parts (they add up to 1) | 
Author(s)
Michael Greenacre
References
Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.   
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.
See Also
Examples
data(veg)
# compute centred logratios
veg.CLR <- CLR(veg)$LR
# invert back to original parts (parts closed to sum to 1)
invALR(veg.CLR)
Dataset: TimeBudget
Description
This data set consists of the average percentage breakdown of time use into six categories, for 16 countries, split by males and females.
Usage
data(time)Format
Data matrix containing the 32 x 6 matrix. Row samples are labelled by the two-character country code and m (male) or f (female).
Source
Greenacre M., Compositional Data Analysis in Practice, Chapman & Hall / CRC, 2018.
Dataset: Vegetables
Description
This data set consists of the protein, carbohydrate and fat compositions of 10 different vegetables. Compositions are expressed as percentages.
Usage
data(veg)Format
Data frame containing the 10 x 3 matrix.
Source
US Department of Agriculture, https://ndb.nal.usda.gov/ndb/nutrients/index