| Type: | Package | 
| Title: | Robust Multivariate Regression | 
| Version: | 0.1.0 | 
| Description: | Robust methods for estimating the parameters of multivariate Gaussian linear models. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Encoding: | UTF-8 | 
| Imports: | Rcpp, foreach, doParallel, mvtnorm, parallel, RSpectra, capushe, KneeArrower, fastmatrix, DescTools | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| NeedsCompilation: | yes | 
| RoxygenNote: | 7.1.2 | 
| Packaged: | 2024-04-22 17:56:54 UTC; pug56 | 
| Author: | Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut], Laure Sansonnet [aut] | 
| Maintainer: | Antoine Godichon-Baggioni <antoine.godichon_baggioni@upmc.fr> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-04-23 09:00:02 UTC | 
Robust Multivariate Regression
Description
This package focuses on robust multivariate Gaussian linear regression.
It provides the function Robust_Mahalanobis_regression, which gives robust estimates of the parameters of multivariate Gaussian linear models with the help of the Mahalanobis distance, using either a stochastic gradient algorithm or a fixed-point algorithm. It relies on the function Robust_Variance, which gives a robust estimate of the variance, including for low-rank matrices (see Godichon-Baggioni and Robin (2024) <doi:10.1007/s11222-023-10362-9>).
Robust methods for estimating the parameters of multivariate Gaussian linear models.
Details
| Package: | RobRegression | 
| Type: | Package | 
| Title: | Robust Multivariate Regression | 
| Version: | 0.1.0 | 
| Authors@R: | c(person("Antoine","Godichon-Baggioni", role = c("aut", "cre","cph"), email = "antoine.godichon_baggioni@upmc.fr"), person("Stéphane","Robin", role = "aut"), person("Laure","Sansonnet", role = "aut")) | 
| Description: | Robust methods for estimating the parameters of multivariate Gaussian linear models. | 
| License: | GPL (>= 2) | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Imports: | Rcpp, foreach, doParallel, mvtnorm, parallel, RSpectra, capushe, KneeArrower, fastmatrix, DescTools | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| NeedsCompilation: | yes | 
| Roxygen: | list(markdown = True) | 
| RoxygenNote: | 7.1.2 | 
| Author: | Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut], Laure Sansonnet [aut] | 
| Maintainer: | Antoine Godichon-Baggioni <antoine.godichon_baggioni@upmc.fr> | 
| Archs: | x64 | 
Index of help topics:
RobRegression-package   Robust Multivariate Regression
Robust_Mahalanobis_regression
                        Robust_Mahalanobis_regression
Robust_Variance         Robust_Variance
Robust_regression       Robust_regression
Author(s)
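Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut], Laure Sansonnet [aut]
Maintainer: Antoine Godichon-Baggioni <antoine.godichon_baggioni@upmc.fr>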
References
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Godichon-Baggioni, A. and Robin, S. (2024). Recursive ridge regression using second-order stochastic algorithms. Computational Statistics & Data Analysis, 190, 107854.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
Robust_Mahalanobis_regression
Description
This function provides a robust estimation of the parameters of multivariate Gaussian linear models of the form Y = X \beta + \epsilon, where \epsilon is a zero-mean Gaussian vector of variance \Sigma. In addition, one can also consider a low-rank variance of the form \Sigma = C + \sigma I, where \sigma is a positive scalar and C is a matrix of rank d. More precisely, the aim is to minimize the functional
G_\lambda(\hat{\beta}) = \mathbb{E}\left(\| Y-X\hat{\beta} \|_{\Sigma^{-1}}\right) + \lambda \|\hat{\beta}\|^{\text{Ridge}}.
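As a rough illustration of the functional above, its empirical counterpart can be evaluated in base R by replacing the expectation with a sample mean. This is a minimal sketch and not the package's implementation: the helper name mahalanobis_criterion is hypothetical, and the penalty is assumed to be the Frobenius norm of \hat{\beta} raised to the power ridge.

```r
# Hypothetical helper (not part of RobRegression): empirical version of
# G_lambda(beta) = mean_i ||Y_i - X_i beta||_{Sigma^{-1}} + lambda * ||beta||^ridge.
# Assumption: ||beta|| is the Frobenius norm.
mahalanobis_criterion <- function(X, Y, beta, Sigma, lambda = 0, ridge = 1) {
  residuals <- Y - X %*% beta
  # stats::mahalanobis() returns squared distances; center = FALSE skips centering.
  dist <- sqrt(mahalanobis(residuals, center = FALSE, cov = Sigma))
  mean(dist) + lambda * sqrt(sum(beta^2))^ridge
}

set.seed(1)
n <- 500; p <- 3; q <- 4
X <- matrix(rnorm(n * p), n, p)
beta_true <- matrix(rnorm(p * q), p, q)
Sigma <- diag(c(2, 1, 0.5, 0.5))
# Gaussian noise with (diagonal) variance Sigma
noise <- matrix(rnorm(n * q), n, q) %*% diag(sqrt(diag(Sigma)))
Y <- X %*% beta_true + noise

crit_true <- mahalanobis_criterion(X, Y, beta_true, Sigma)
crit_zero <- mahalanobis_criterion(X, Y, matrix(0, p, q), Sigma)
crit_eucl <- mahalanobis_criterion(X, Y, beta_true, diag(q))
```

With lambda = 0, the criterion should be smaller at the true parameter than at a zero matrix, and with Sigma equal to the identity it reduces to the mean Euclidean norm of the residuals.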
Usage
Robust_Mahalanobis_regression(X, Y, alphaRM=0.66, alphareg=0.66, w=2, lambda=0,
                              creg='default', K=2:30, par=TRUE, epsilon=10^(-8),
                              method_regression='Offline', niter_regression=50,
                              cRM='default', mc_sample_size='default',
                              method_MCM='Weiszfeld', methodMC='Robbins',
                              niterMC=50, ridge=1, eps_vp=10^(-4), nlambda=50,
                              scale='none', tol=10^(-3))
Arguments
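| X | A (n,p)-matrix whose rows are the explanatory data. | 
| Y | A (n,q)-matrix whose rows are the variables to be explained. | 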
| method_regression | The method used for estimating the parameter. Should be  | 
| niter_regression | The maximum number of regression iterations if the fixed-point algorithm is used, i.e. if  | 
| epsilon | Stopping condition for the fixed-point algorithm if  | 
| scale | If a scaling is used.  | 
| ridge | The power of the penalty: i.e. should be  | 
| lambda | A vector giving the different studied penalizations. If  | 
| par | Is equal to  | 
| nlambda | The number of tested penalizations if  | 
| alphaRM | A scalar between 1/2 and 1 used in the step sequence if the Robbins-Monro algorithm is used, i.e. if  | 
| alphareg | A scalar between 1/2 and 1 used in the step sequence for the stochastic gradient algorithm if  | 
| w | The power for the weighted averaged algorithms if  | 
| creg | The constant in the step sequence if the averaged stochastic gradient algorithm is used, i.e. if  | 
| K | A vector containing the possible values of d. Default is 2:30. | 
| mc_sample_size | The number of data generated for the Monte-Carlo method for estimating robustly the eigenvalues of the variance. | 
| method_MCM | The method chosen to estimate Median Covariation Matrix. Can be  | 
| methodMC | The method chosen to estimate robustly the variance. Can be  | 
| niterMC | The number of iterations for estimating robustly the variance of each class if  | 
| eps_vp | The minimum value that the estimates of the eigenvalues of the variance can take. Default is 10^(-4). | 
| cRM | The constant in the step sequence if the Robbins-Monro algorithm is used to robustly estimate the variance, i.e. if  | 
| tol | A scalar that avoids numerical problems if method_regression='Offline'. Default is 10^(-3). | 
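Several of the arguments above (alphaRM, alphareg, creg, cRM, w) tune the step sequence and the averaging of stochastic gradient algorithms. The sketch below shows how such parameters typically interact on a toy problem, the online estimation of a geometric median in the spirit of Cardot, Cenac and Zitt (2013). It is an illustration under stated assumptions, not the package's code: the step sequence is assumed to be c * n^(-alpha), the averaging weights are assumed to be log(n + 1)^w, and the helper name averaged_sgd_median is hypothetical.

```r
# Hypothetical helper: weighted averaged stochastic gradient estimate of the
# geometric median of the rows of X.
# Assumed step sequence: c_step * n^(-alpha); assumed weights: log(n + 1)^w.
averaged_sgd_median <- function(X, c_step = 1, alpha = 0.66, w = 2) {
  m <- X[1, ]        # current stochastic gradient iterate
  m_bar <- m         # weighted average of the iterates
  sum_weights <- 0
  for (i in seq_len(nrow(X))) {
    diff <- X[i, ] - m
    norm_diff <- sqrt(sum(diff^2))
    if (norm_diff > 0) {
      # Robbins-Monro step along the unit vector pointing to the new datum
      m <- m + c_step * i^(-alpha) * diff / norm_diff
    }
    weight <- log(i + 1)^w
    sum_weights <- sum_weights + weight
    m_bar <- m_bar + (weight / sum_weights) * (m - m_bar)
  }
  m_bar
}

set.seed(1)
n <- 5000
X <- matrix(rnorm(n * 3), ncol = 3)
outliers <- sample(2:n, 250)             # 5% of contaminated rows
X[outliers, ] <- X[outliers, ] + 10
med_hat <- averaged_sgd_median(X)
mean_hat <- colMeans(X)
```

On such contaminated data the averaged estimate should stay close to the origin while the sample mean is pulled towards the outliers. A larger w gives more weight to the latest iterates, and w = 0 gives the usual uniform average.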
Value
A list with:
| beta | A (p,q)-matrix giving the estimation of the parameters. | 
| Residual_Variance | A  | 
| criterion | A vector giving the loss for the different chosen lambda. | 
| all_beta | A list containing the different estimation of the parameters (with respect to the different choices of lambda). | 
| lambda_opt | A scalar giving the selected lambda. | 
| variance_results | A list giving the results on the variance of the noise obtained with the help of the function Robust_Variance. | 
Details of the list variance_results:
| Sigma | The robust estimation of the variance. | 
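| invSigma | The robust estimation of the inverse of the variance. | 
| MCM | The Median Covariation Matrix. | 
| eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the variance, where d+1 is the optimal choice among the values in K. | 
| MCM_eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the Median Covariation Matrix, where d+1 is the optimal choice among the values in K. | 
| cap | The result given by capushe for selecting d if the length of K is larger than 10. | 
| reduction_results | A list containing the results for all possible K. | 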
References
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
See Also
See also Robust_Variance, Robust_regression and RobRegression-package.
Examples
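library(RobRegression)
set.seed(1)  # for reproducibility
p <- 5       # number of explanatory variables
q <- 10      # number of response variables
n <- 2000    # sample size
mu <- rep(0, q)
# Noise variance: one dominant direction plus small isotropic noise
Sigma <- diag(c(q, rep(0.1, q - 1)))
epsilon <- mvtnorm::rmvnorm(n = n, mean = mu, sigma = Sigma)
X <- mvtnorm::rmvnorm(n = n, mean = rep(0, p))
beta <- matrix(rnorm(p * q), ncol = q)
Y <- X %*% beta + epsilon
Res_reg <- Robust_Mahalanobis_regression(X, Y, par = FALSE)
# Squared estimation error of the regression parameters
sum((Res_reg$beta - beta)^2)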
Robust_Variance
Description
The aim is to provide a robust estimation of the variance for Gaussian models with dimension reduction. More precisely, we consider a q-dimensional random vector whose variance can be written as \Sigma = C + \sigma I, where C is a matrix of rank d, with d possibly much smaller than q, \sigma is a positive scalar, and I is the identity matrix.
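To make the structure \Sigma = C + \sigma I concrete, the following base R sketch builds such a matrix and inspects its spectrum. The dimensions and eigenvalues are arbitrary illustrative choices. It shows why d+1 values (the d leading eigenvalues and \sigma) are enough to describe the eigenvalues of \Sigma.

```r
# Build Sigma = C + sigma * I with C symmetric of rank d (illustrative values).
set.seed(1)
q <- 20; d <- 3; sigma <- 0.1
U <- qr.Q(qr(matrix(rnorm(q * d), q, d)))   # q x d matrix with orthonormal columns
C <- U %*% diag(c(5, 3, 2)) %*% t(U)        # rank-d positive semi-definite matrix
Sigma <- C + sigma * diag(q)

ev <- eigen(Sigma, symmetric = TRUE)$values
# The d leading eigenvalues are those of C shifted by sigma;
# the remaining q - d eigenvalues are all equal to sigma.
head(ev, d + 1)
```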
Usage
Robust_Variance(X,K=ncol(X),par=TRUE,alphaRM=0.75,
                c='default',w=2,mc_sample_size='default',
                methodMC='Robbins',niterMC=50,method_MCM='Weiszfeld',
                eps_vp=10^(-6))
Arguments
| X | A matrix whose rows are the vectors whose variance we want to estimate. | 
| K | A vector containing the possible values of d. The 'good' d is chosen with the help of a penalty criterion if the length of K is larger than 10. Default is ncol(X). | 
| par | Is equal to  | 
| mc_sample_size | The number of data generated for the Monte-Carlo method for estimating robustly the eigenvalues of the variance. | 
| methodMC | The method chosen to estimate robustly the variance. Can be  | 
| niterMC | The number of iterations for estimating robustly the variance of each class if  | 
| method_MCM | The method chosen to estimate Median Covariation Matrix. Can be  | 
| alphaRM | A scalar between 1/2 and 1 used in the step sequence for the Robbins-Monro method if  | 
| c | The constant in the step sequence if  | 
| w | The power for the weighted averaged Robbins-Monro algorithm if  | 
| eps_vp | The minimum value that the estimates of the eigenvalues of the variance can take. Default is 10^(-6). | 
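The default method_MCM='Weiszfeld' in the usage above refers to a Weiszfeld-type fixed-point algorithm (see Vardi and Zhang, 2000). The sketch below is a minimal base R version for the geometric median, then applied to the matrices (X_i - m)(X_i - m)^T to obtain a Median Covariation Matrix as defined in Cardot and Godichon-Baggioni (2017). It is an illustration only: the helper name weiszfeld_median, its arguments and its starting point are assumptions, not the package's internals.

```r
# Hypothetical helper: Weiszfeld fixed-point iterations for the geometric
# median of the rows of X (illustration only).
weiszfeld_median <- function(X, niter = 50, epsilon = 1e-8) {
  m <- colMeans(X)                        # start from the sample mean
  for (iter in seq_len(niter)) {
    dist <- sqrt(rowSums(sweep(X, 2, m)^2))
    weights <- 1 / pmax(dist, epsilon)    # guard against division by zero
    m_new <- colSums(X * weights) / sum(weights)
    converged <- sqrt(sum((m_new - m)^2)) < epsilon
    m <- m_new
    if (converged) break
  }
  m
}

set.seed(1)
X <- matrix(rnorm(1000 * 3), ncol = 3)
X[1:50, ] <- 100                          # 5% of gross outliers
m <- weiszfeld_median(X)

# Median Covariation Matrix: geometric median (Frobenius norm) of the
# matrices (X_i - m)(X_i - m)^T, here stored as vectors of length 9.
centered <- sweep(X, 2, m)
Z <- t(apply(centered, 1, function(x) as.vector(tcrossprod(x))))
mcm <- matrix(weiszfeld_median(Z), 3, 3)
```

Both m and mcm should remain of moderate size despite the outliers, whereas colMeans(X) and cov(X) are strongly affected by them.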
Value
A list with:
| Sigma | The robust estimation of the variance. | 
| invSigma | The robust estimation of the inverse of the variance. | 
| MCM | The Median Covariation Matrix. | 
| eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the variance, where d+1 is the optimal choice among the values in K. | 
| MCM_eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the Median Covariation Matrix, where d+1 is the optimal choice among the values in K. | 
| cap | The result given by capushe for selecting d if the length of K is larger than 10. | 
| reduction_results | A list containing the results for all possible K. | 
References
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
See Also
See also Robust_Mahalanobis_regression, Robust_regression and RobRegression-package.
Examples
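library(RobRegression)
set.seed(1)  # for reproducibility
q <- 100     # dimension
d <- 10      # rank of the low-rank part C
n <- 2000    # sample size
# Sigma = C + sigma * I with C = diag(d:1, 0, ..., 0) and sigma = 0.1
Sigma <- diag(c(d:1, rep(0, q - d))) + diag(rep(0.1, q))
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)
RobVar <- Robust_Variance(X, K = q)
# Squared Frobenius error of the estimate, divided by the dimension
sum((RobVar$Sigma - Sigma)^2) / q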
Robust_regression
Description
This function gives robust estimates of the parameters of the multivariate linear regression with the help of the Euclidean distance, or with the help of the Mahalanobis distance for some matrix Sigma. More precisely, the aim is to minimize
G(\hat{\beta}) = \mathbb{E}[ \| Y-X\hat{\beta} \|_{\Sigma}] + \lambda \| \hat{\beta}\|^{\text{ridge}}.
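For the Euclidean case without penalty (lambda = 0), setting the gradient of the empirical criterion to zero gives a fixed-point equation that can be iterated as a reweighted least-squares scheme. The base R sketch below illustrates this idea. It is not the package's implementation: the helper name robust_fixed_point is hypothetical, it starts from the least-squares estimate rather than from a random init, and the way tol bounds the residual norms from below is an assumption.

```r
# Hypothetical helper: fixed-point (Weiszfeld-type) iterations minimizing the
# empirical mean of ||Y_i - X_i beta|| (Euclidean norm, no penalty).
robust_fixed_point <- function(X, Y, niter = 50, epsilon = 1e-8, tol = 1e-3) {
  beta <- solve(crossprod(X), crossprod(X, Y))     # least-squares start
  for (iter in seq_len(niter)) {
    res_norm <- sqrt(rowSums((Y - X %*% beta)^2))
    weights <- 1 / pmax(res_norm, tol)             # assumed role of tol
    # beta = (sum_i w_i X_i X_i^T)^{-1} (sum_i w_i X_i Y_i^T)
    beta_new <- solve(crossprod(X, X * weights), crossprod(X, Y * weights))
    converged <- sum((beta_new - beta)^2) < epsilon
    beta <- beta_new
    if (converged) break
  }
  beta
}

set.seed(1)
n <- 2000; p <- 5; q <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- matrix(rnorm(p * q), p, q)
Y <- X %*% beta + matrix(rnorm(n * q), n, q)
outliers <- 1:100
Y[outliers, ] <- 20 * X[outliers, 1]    # 5% of responses follow a wrong model

beta_ols <- solve(crossprod(X), crossprod(X, Y))
beta_rob <- robust_fixed_point(X, Y)
err_ols <- sum((beta_ols - beta)^2)
err_robust <- sum((beta_rob - beta)^2)
```

On this contaminated sample, the squared error of the fixed-point estimate should be much smaller than that of ordinary least squares, since large residuals receive small weights.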
Usage
Robust_regression(X,Y, Mat_Mahalanobis=diag(rep(1,ncol(Y))),
                  niter=50,lambda=0,c='default',method='Offline',
                  alpha=0.66,w=2,ridge=1,nlambda=50,
                  init=matrix(runif(ncol(X)*ncol(Y))-0.5,nrow=ncol(X),ncol=ncol(Y)),
                  epsilon=10^(-8), Mahalanobis_distance = FALSE,
                  par=TRUE,scale='none',tol=10^(-3))
Arguments
| X | A (n,p)-matrix whose rows are the explanatory data. | 
| Y | A (n,q)-matrix whose rows are the variables to be explained. | 
| method | The method used for estimating the parameter. Should be  | 
| Mat_Mahalanobis | A (q,q)-matrix giving  | 
| Mahalanobis_distance | A logical telling if the Mahalanobis distance is used. Default is FALSE. | 
| scale | If a scaling is used.   | 
| niter | The maximum number of iterations if  | 
| init | A (p,q)-matrix which gives the initialization of the algorithm. | 
| ridge | The power of the penalty: i.e. should be  | 
| lambda | A vector giving the different studied penalizations. If  | 
| nlambda | The number of tested penalizations if  | 
| par | Is equal to  | 
| c | The constant in the step sequence if the averaged stochastic gradient algorithm is used, i.e. if  | 
| alpha | A scalar between 1/2 and 1 used in the step sequence for the stochastic gradient algorithm if  | 
| w | The power for the weighted averaged Robbins-Monro algorithm if  | 
| epsilon | Stopping condition for the fixed-point algorithm if  | 
| tol | A scalar that avoids numerical problems if method='Offline'. Default is 10^(-3). | 
Value
A list with:
| beta | A (p,q)-matrix giving the estimation of the parameters. | 
| criterion | A vector giving the loss for the different chosen lambda. | 
| all_beta | A list containing the different estimation of the parameters (with respect to the different choices of lambda). | 
| lambda_opt | A scalar giving the selected lambda. | 
References
Godichon-Baggioni, A., Robin, S. and Sansonnet, L. (2023). A robust multivariate linear regression based on the Mahalanobis distance.
See Also
See also Robust_Variance, Robust_Mahalanobis_regression and RobRegression-package.
Examples
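library(RobRegression)
set.seed(1)  # for reproducibility
p <- 5       # number of explanatory variables
q <- 10      # number of response variables
n <- 2000    # sample size
mu <- rep(0, q)
epsilon <- mvtnorm::rmvnorm(n = n, mean = mu)   # standard Gaussian noise
X <- mvtnorm::rmvnorm(n = n, mean = rep(0, p))
beta <- matrix(rnorm(p * q), ncol = q)
Y <- X %*% beta + epsilon
Res_reg <- Robust_regression(X, Y)
# Squared estimation error of the regression parameters
sum((Res_reg$beta - beta)^2)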