This document introduces the R package ‘imputeTestbench’. It is a testing workbench for comparing missing data imputation models/methods. It compares imputation methods with reference to the RMSE, MAE or MAPE error measures. It allows newly proposed methods to be added to the test bench and compared with other methods. The function append_method() allows any number of methods to be added to those already available in the test bench.
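For reference, the three error measures named above can be sketched in base R as follows. The helper names rmse, mae and mape are hypothetical and are not functions of imputeTestbench; the package's own formulas may differ in detail, for example in how MAPE is scaled.

```r
# Hypothetical helpers for illustration only; not part of imputeTestbench.
# 'actual' holds the true values, 'imputed' the values produced by a method.
rmse <- function(actual, imputed) sqrt(mean((actual - imputed)^2))
mae  <- function(actual, imputed) mean(abs(actual - imputed))
# MAPE is expressed here as a percentage and is undefined if 'actual' contains zeros.
mape <- function(actual, imputed) mean(abs((actual - imputed) / actual)) * 100

actual  <- c(1, 2, 3, 4, 5)
imputed <- c(1, 2, 4, 4, 3)

rmse(actual, imputed)  # 1
mae(actual, imputed)   # 0.6
mape(actual, imputed)
```

RMSE penalises large individual errors more heavily than MAE, while MAPE expresses the error relative to the size of the true values.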
The following example describes the working of this package:
Consider a sample data set datax as follows:
datax <- c(1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5)
Import the imputeTestbench library as follows:
library(imputeTestbench)
The function impute_errors() is used to compare imputation methods with reference to the RMSE, MAE or MAPE error measures. The syntax of impute_errors() is shown below:
impute_errors(dataIn, missPercentFrom, missPercentTo, interval, repetition, errorParameter, MethodPath, MethodName)
where,
dataIn is the input data for testing
missPercentFrom is the lowest percentage of missing values to be considered
missPercentTo is the highest percentage of missing values to be considered
interval is the interval between consecutive missPercent values
repetition is an integer giving the number of repetitions to be done for each missPercent value
errorParameter is the type of error calculation (RMSE, MAE or MAPE)
MethodPath is the location of the function for the proposed imputation method
MethodName is the name of the function for the proposed imputation method
In its simplest form, the function impute_errors() can be used as:
q <- impute_errors(datax)
q
## $Parameter
## [1] "RMSE Plot"
## 
## $Missing_Percent
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
## 
## $Historic_Mean
## [1] 0.4789879 0.6250889 0.8108440 0.9018024 0.9856108 1.1087825 1.1952286
## [8] 1.2724180
## 
## $Interpolation
## [1] 0.6220167 0.7748639 0.8716673 1.3633658 1.2714936 1.3627703 1.2976507
## [8] 1.8725297
# By default, a bar plot is used to show the comparison
plot_errors(q)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]  1.5  4.5  7.5 10.5 13.5 16.5 19.5 22.5
## [2,]  2.5  5.5  8.5 11.5 14.5 17.5 20.5 23.5
# The user can also plot the comparison as a line plot:
plot_errors(dataIn = q, plotType = 2)
By default, impute_errors() compares two basic imputation methods: historical mean and interpolation. The plot_errors() function plots the comparison between the different methods. The test bench allows a further imputation method to be added and compared with the existing ones. The only requirement is that the new method be written as a function that returns the imputed data as its output. Suppose the following function is the method to be added to the test bench.
===============================
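inter <- function(outs)
{
  # imputeTS provides na.random(); recent imputeTS releases name it na_random()
  library(imputeTS)
  # treat the input as a time series
  outs <- ts(outs)
  # replace each missing value with a random value
  d <- na.random(outs)
  # return the imputed series
  return(d)
}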
===============================
Save the inter() function in a new R script file, note its source location in a form similar to "source('~/imputeTestbench/R/inter.R')", and pass that location to the append_method() function as:
#aa <- append_method(existing_method = q,dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/inter.R')", MethodName = "Random")
#aa
#plot_errors(aa)
The code above is commented out because it depends on an external function file and its location, which are not included in this package.
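Because the commented code cannot be run without the external file, the idea behind it can be sketched in base R alone. The sketch below is an illustration under stated assumptions, not the package's internal code: mean_fill and evaluate_method are hypothetical names, the error is computed only on the removed points, and the fractions 0.1 to 0.8 mirror the missPercentFrom = 10, missPercentTo = 80, interval = 10 settings used above.

```r
# Hypothetical stand-in for a user-supplied method: it takes a series that
# contains NAs and returns the series with every NA filled by the observed mean.
mean_fill <- function(outs) {
  outs[is.na(outs)] <- mean(outs, na.rm = TRUE)
  outs
}

# Hypothetical evaluation loop, for illustration only.
evaluate_method <- function(dataIn, method,
                            fractions = seq(0.1, 0.8, by = 0.1),
                            repetition = 10) {
  sapply(fractions, function(p) {
    errs <- replicate(repetition, {
      n_miss <- round(p * length(dataIn))
      idx <- sample(seq_along(dataIn), n_miss)    # positions to blank out
      damaged <- dataIn
      damaged[idx] <- NA
      imputed <- method(damaged)
      sqrt(mean((dataIn[idx] - imputed[idx])^2))  # RMSE on the removed points
    })
    mean(errs)                                    # average over repetitions
  })
}

set.seed(1)
x_demo <- rep(1:5, 20)                            # demo series of 100 values
res <- evaluate_method(x_demo, mean_fill)
res
```

The only contract a new method has to honour in this sketch is the one stated earlier: accept the data with missing values and return the imputed data.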
If the user wishes to add more than one imputation method to the test bench, the function append_method() is used as:
#bb <- append_method(existing_method = aa, dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/PSFimpute.R')", MethodName = "PSFimpute")
#bb
#plot_errors(bb)
where
existing_method is the output obtained from the impute_errors() function (or from an earlier append_method() call, as with aa above)
dataIn is the input data for testing
missPercentFrom is the lowest percentage of missing values to be considered
missPercentTo is the highest percentage of missing values to be considered
interval is the interval between consecutive missPercent values
repetition is an integer giving the number of repetitions to be done for each missPercent value
errorParameter is the type of error calculation (RMSE, MAE or MAPE)
MethodPath is the location of the function for the proposed imputation method
MethodName is the name of the function for the proposed imputation method
Similarly, the user can remove an imputation method from the test bench with the following function:
#cc <- remove_method(existing_method = bb, method_number = 1)
#cc
#plot_errors(cc)
To introduce missing patches at desired locations, the random parameter is used. When random = 1, the package itself introduces missing values at completely random places, whereas when random = 0, it allows the user to introduce missing patches at desired locations, as shown in the following code.
dd <- impute_errors(random = 0, startPoint = c(10, 20, 30), patchLength = c(3, 4, 5))
dd
## $Parameter
## [1] "RMSE Plot"
## 
## $Missing_Percent
## [1] 0.12
## 
## $Historic_Mean
## [1] 0.5746791
## 
## $Interpolation
## [1] 0.7843964
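The effect of startPoint and patchLength can be illustrated with a short base R sketch. The helper insert_patches is hypothetical and not part of imputeTestbench, and it assumes that each patch begins at its start point and covers patchLength consecutive positions; the package may place patches differently.

```r
# Hypothetical helper for illustration only; not part of imputeTestbench.
# Sets patchLength[i] consecutive values to NA, beginning at startPoint[i].
insert_patches <- function(x, startPoint, patchLength) {
  stopifnot(length(startPoint) == length(patchLength))
  for (i in seq_along(startPoint)) {
    idx <- seq(from = startPoint[i], length.out = patchLength[i])
    idx <- idx[idx <= length(x)]   # ignore positions past the end of the series
    x[idx] <- NA
  }
  x
}

x <- rep(1:5, 20)                  # assumed demo series of 100 observations
damaged <- insert_patches(x, startPoint = c(10, 20, 30), patchLength = c(3, 4, 5))

which(is.na(damaged))              # 10 11 12 20 21 22 23 30 31 32 33 34
mean(is.na(damaged))               # 0.12
```

With three patches of lengths 3, 4 and 5, twelve values are removed, which for a series of 100 observations is a missing fraction of 0.12.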