The goal of this package is to make it easy to apply the same t-tests or basic descriptive statistics across several sub-groups, with the output returned as a nicely arranged data.frame. Multiple-comparison correction and significance symbols are also provided.
This kind of analysis is commonly seen in ROI (Region-of-interest) analysis of brain imaging data. That’s why the package is called roistats.
After data cleaning and wrangling, we obtain a data.frame called color_index. This data.frame contains the neural analysis results: the degree of color memory sensitivity at each brain region for each subject. color_index has three columns:
* subj_id: identifies the subject. This labels the single data point within each roi_id.
* roi_id: the brain sub-region of interest for the analysis. We are interested in eight brain regions.
* color_index: a value indicating how sensitive a given brain region is to the memory of color. For each subj_id and roi_id, we obtained a single color_index value.

head(color_index)
#> subj_id roi_id color_index
#> 1 01 AnG -0.032384500
#> 2 01 dLatIPS -0.042524083
#> 3 01 LO -0.032643250
#> 4 01 pIPS -0.014760833
#> 5 01 V1 -0.001259167
#> 6 01 vIPS -0.023800500
Before we dive into the statistical tests, we want the mean, sd, and se (standard error of the mean) of color_index at each brain region. The df_sem function provided in the package can help us with this.
To use this function, first use group_by from dplyr to group your data.frame into the sub-groups for which you want the summary statistics.
Next, specify the data.frame and the name of the column you want to summarize. In this case, the data.frame is called color_index, and the column is also called color_index (a confusing example, sorry).
Note that the data.frame color_index was already grouped by roi_id.
str(color_index)
#> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 232 obs. of 3 variables:
#> $ subj_id : chr "01" "01" "01" "01" ...
#> $ roi_id : chr "AnG" "dLatIPS" "LO" "pIPS" ...
#> $ color_index: num -0.03238 -0.04252 -0.03264 -0.01476 -0.00126 ...
#> - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame': 8 obs. of 2 variables:
#> ..$ roi_id: chr [1:8] "AnG" "dLatIPS" "LO" "pIPS" ...
#> ..$ .rows :List of 8
#> .. ..$ : int [1:29] 1 9 17 25 33 41 49 57 65 73 ...
#> .. ..$ : int [1:29] 2 10 18 26 34 42 50 58 66 74 ...
#> .. ..$ : int [1:29] 3 11 19 27 35 43 51 59 67 75 ...
#> .. ..$ : int [1:29] 4 12 20 28 36 44 52 60 68 76 ...
#> .. ..$ : int [1:29] 5 13 21 29 37 45 53 61 69 77 ...
#> .. ..$ : int [1:29] 6 14 22 30 38 46 54 62 70 78 ...
#> .. ..$ : int [1:29] 7 15 23 31 39 47 55 63 71 79 ...
#> .. ..$ : int [1:29] 8 16 24 32 40 48 56 64 72 80 ...
#> .. ..- attr(*, "ptype")= int(0)
#> .. ..- attr(*, "class")= chr [1:3] "vctrs_list_of" "vctrs_vctr" "list"
#> ..- attr(*, ".drop")= logi TRUE
df_sem(color_index, color_index) # first arg: the data.frame; second arg: the column to summarize
#> # A tibble: 8 x 5
#> roi_id mean_color_index sd n se
#> <chr> <dbl> <dbl> <int> <dbl>
#> 1 AnG 0.00537 0.0507 29 0.00942
#> 2 dLatIPS 0.0159 0.0510 29 0.00946
#> 3 LO 0.0181 0.0428 29 0.00796
#> 4 pIPS 0.0102 0.0297 29 0.00552
#> 5 V1 0.00955 0.0421 29 0.00782
#> 6 vIPS 0.0162 0.0327 29 0.00607
#> 7 vLatIPS 0.0162 0.0514 29 0.00955
#> 8 VTC 0.00468 0.0218 29 0.00405
You can also achieve this in a typical tidyverse pipeline.
library(magrittr) # No need to load magrittr if you have already loaded tidyverse
color_index_summary <- color_index %>%
  df_sem(color_index)
knitr::kable(color_index_summary, digits = 3)

| roi_id | mean_color_index | sd | n | se |
|---|---|---|---|---|
| AnG | 0.005 | 0.051 | 29 | 0.009 |
| dLatIPS | 0.016 | 0.051 | 29 | 0.009 |
| LO | 0.018 | 0.043 | 29 | 0.008 |
| pIPS | 0.010 | 0.030 | 29 | 0.006 |
| V1 | 0.010 | 0.042 | 29 | 0.008 |
| vIPS | 0.016 | 0.033 | 29 | 0.006 |
| vLatIPS | 0.016 | 0.051 | 29 | 0.010 |
| VTC | 0.005 | 0.022 | 29 | 0.004 |
Yay! We have easily obtained the SEM (commonly used for plotting error bars in psychology and cognitive neuroscience) for each sub-group.
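For reference, the se column is just sd divided by the square root of n; for AnG, 0.0507 / sqrt(29) is about 0.0094. The sketch below reproduces the same kind of summary in base R on made-up data. The `toy` data.frame and its column names are assumptions for illustration, and this is not necessarily how df_sem is implemented.

```r
set.seed(1)

# Hypothetical data: two regions, 29 values each (not the package data).
toy <- data.frame(
  roi_id = rep(c("roi_a", "roi_b"), each = 29),
  value  = rnorm(58, mean = 0.01, sd = 0.05)
)

# Split the values by sub-group, then summarize each group.
by_roi <- split(toy$value, toy$roi_id)
toy_summary <- data.frame(
  roi_id = names(by_roi),
  mean   = sapply(by_roi, mean),
  sd     = sapply(by_roi, sd),
  n      = sapply(by_roi, length),
  row.names = NULL
)

# Standard error of the mean: sd / sqrt(n).
toy_summary$se <- toy_summary$sd / sqrt(toy_summary$n)
toy_summary
```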
Now, we want to test whether color_index is significantly different from 0 in each sub-group (roi_id). That is, for each roi_id sub-group, we test whether the values in the color_index column of the data.frame color_index differ significantly from 0. Here, we have eight sub-groups, so we will get eight one-sample t-test results in total. As a first-pass analysis to figure out which brain regions might be interesting, we don’t care much about the very detailed output of the t.test function from the {stats} package. So, the package provides the t_test_one_sample function, which applies the same t-test to each sub-group, extracts the key results, and wraps everything in a data.frame.
Again, the data.frame color_index was already grouped by roi_id.
t_test_one_sample(color_index, "color_index", mu = 0)
#> # A tibble: 8 x 5
#> # Groups: roi_id [8]
#> roi_id tvalue df p p_bonferroni
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 AnG 0.570 28 0.573 1
#> 2 dLatIPS 1.68 28 0.104 0.835
#> 3 LO 2.27 28 0.0311 0.249
#> 4 pIPS 1.85 28 0.0752 0.601
#> 5 V1 1.22 28 0.232 1
#> 6 vIPS 2.67 28 0.0124 0.0991
#> 7 vLatIPS 1.69 28 0.101 0.811
#> 8 VTC 1.16 28 0.257 1
Here, we see the t-values, dfs, ps, and Bonferroni-corrected ps! Nice, we get the t-stats for each brain region, and multiple-comparison-corrected p-values are even provided.
However, I believe the Bonferroni method is too conservative, and I want to compare it with the FDR method. This time, we write it as a tidyverse pipeline again:
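To make the per-group logic concrete, here is a minimal base R sketch of the same idea: run `t.test` within each sub-group, keep the key numbers, and adjust the p-values across groups. The `toy` data are made up, and this is only an illustration of the technique, not the package's actual implementation.

```r
set.seed(42)

# Hypothetical data: three regions, 29 values each (not the package data).
toy <- data.frame(
  roi_id = rep(c("roi_a", "roi_b", "roi_c"), each = 29),
  value  = rnorm(87, mean = 0.01, sd = 0.04)
)

# One-sample t-test against 0 within each sub-group, keeping t, df, and p.
res <- do.call(rbind, lapply(split(toy$value, toy$roi_id), function(v) {
  tt <- t.test(v, mu = 0)
  data.frame(
    tvalue = unname(tt$statistic),
    df     = unname(tt$parameter),
    p      = tt$p.value
  )
}))

# Move the group labels from the row names into a regular column.
res <- cbind(roi_id = rownames(res), res)
rownames(res) <- NULL

# Correct for multiple comparisons across the sub-groups.
res$p_bonferroni <- p.adjust(res$p, method = "bonferroni")
res
```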
color_index_one_sample_t_res <- color_index %>%
  t_test_one_sample("color_index", mu = 0, p_adjust = c("bonferroni", "fdr"))
knitr::kable(color_index_one_sample_t_res, digits = 3)

| roi_id | tvalue | df | p | p_bonferroni | p_fdr |
|---|---|---|---|---|---|
| AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 |
| dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 |
| LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 |
| pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 |
| V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 |
| vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 |
| vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 |
| VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 |
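To see where the two adjusted columns come from, the unadjusted p-values can be passed to `stats::p.adjust` directly. This is a sketch in base R, and the package may compute them differently internally. The p-values are copied from the rounded output above, so the last digit can differ slightly from the table.

```r
# Unadjusted p-values copied (rounded) from the one-sample output above.
p <- c(AnG = 0.573, dLatIPS = 0.104, LO = 0.0311, pIPS = 0.0752,
       V1 = 0.232, vIPS = 0.0124, vLatIPS = 0.101, VTC = 0.257)

# Bonferroni: multiply each p by the number of tests, capped at 1.
p_bonferroni <- p.adjust(p, method = "bonferroni")

# FDR (Benjamini-Hochberg); "fdr" is an alias for "BH" in p.adjust.
p_fdr <- p.adjust(p, method = "fdr")

round(cbind(p, p_bonferroni, p_fdr), 3)
```

The FDR-adjusted values are never larger than the Bonferroni ones, which is why FDR is the less conservative choice here.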
Usually, we want significance symbols to highlight results in a table or plot. The p_range function creates them:
library(dplyr)
color_index_one_sample_t_with_sig <- color_index_one_sample_t_res %>%
  mutate(sig_origin_p = p_range(p))
knitr::kable(color_index_one_sample_t_with_sig, digits = 3)

| roi_id | tvalue | df | p | p_bonferroni | p_fdr | sig_origin_p |
|---|---|---|---|---|---|---|
| AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 | |
| dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 | |
| LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 | * |
| pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 | |
| V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 | |
| vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 | * |
| vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 | |
| VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 | |
p_range works on a single number too.
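As a rough sketch of how such a mapping from p-values to symbols can work on both a single number and a vector, here is a small base R helper. The name `sig_symbol` and the cutoffs (0.001, 0.01, 0.05) are assumptions based on common convention; they are not taken from p_range's source, so check the package documentation for its actual thresholds.

```r
# Hypothetical helper: map p-values to conventional significance symbols.
# Assumed cutoffs: p < 0.001 "***", p < 0.01 "**", p < 0.05 "*", otherwise "".
sig_symbol <- function(p) {
  symbols <- c("***", "**", "*", "")
  # findInterval() returns how many cutoffs are <= p (0 to 3).
  symbols[findInterval(p, c(0.001, 0.01, 0.05)) + 1]
}

sig_symbol(0.0124)                         # a single number
sig_symbol(c(0.0005, 0.005, 0.0311, 0.573)) # a vector
```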
t_test_two_sample is for applying two-sample t-tests to all sub-groups.
Here we have color_index_two_sample, with four columns:
* subj_id: identifies the subject. This labels the single data point within each roi_id.
* roi_id: the brain sub-region of interest for the analysis. We are interested in eight brain regions.
* group: whether the test condition was Paired or Control.
* color_effect: a value indicating the memory trace of color. For each subj_id, test condition (group), and brain region (roi_id), we obtained a single color_effect value.

Note that the data.frame was already grouped by roi_id.
head(color_index_two_sample)
#> # A tibble: 6 x 4
#> # Groups: roi_id [6]
#> subj_id roi_id group color_effect
#> <chr> <chr> <fct> <dbl>
#> 1 01 AnG Paired -0.0155
#> 2 01 dLatIPS Paired -0.0484
#> 3 01 LO Paired -0.00366
#> 4 01 pIPS Paired -0.0398
#> 5 01 V1 Paired -0.0120
#> 6 01 vIPS Paired -0.0366
str(color_index_two_sample)
#> tibble [464 × 4] (S3: grouped_df/tbl_df/tbl/data.frame)
#> $ subj_id : chr [1:464] "01" "01" "01" "01" ...
#> $ roi_id : chr [1:464] "AnG" "dLatIPS" "LO" "pIPS" ...
#> $ group : Factor w/ 2 levels "Paired","Control": 1 1 1 1 1 1 1 1 1 1 ...
#> $ color_effect: num [1:464] -0.01546 -0.04841 -0.00366 -0.03982 -0.01201 ...
#> - attr(*, "groups")= tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ roi_id: chr [1:8] "AnG" "dLatIPS" "LO" "pIPS" ...
#> ..$ .rows : list<int> [1:8]
#> .. ..$ : int [1:58] 1 9 17 25 33 41 49 57 65 73 ...
#> .. ..$ : int [1:58] 2 10 18 26 34 42 50 58 66 74 ...
#> .. ..$ : int [1:58] 3 11 19 27 35 43 51 59 67 75 ...
#> .. ..$ : int [1:58] 4 12 20 28 36 44 52 60 68 76 ...
#> .. ..$ : int [1:58] 5 13 21 29 37 45 53 61 69 77 ...
#> .. ..$ : int [1:58] 6 14 22 30 38 46 54 62 70 78 ...
#> .. ..$ : int [1:58] 7 15 23 31 39 47 55 63 71 79 ...
#> .. ..$ : int [1:58] 8 16 24 32 40 48 56 64 72 80 ...
#> .. ..@ ptype: int(0)
#> ..- attr(*, ".drop")= logi TRUE
Here is an example of how to run the paired t-test for each sub-group:
t_test_two_sample(color_index_two_sample, x = "color_effect", y = "group", paired = TRUE)
#> # A tibble: 8 x 5
#> # Groups: roi_id [8]
#> roi_id tvalue df p p_bonferroni
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 AnG 0.570 28 0.573 1
#> 2 dLatIPS 1.68 28 0.104 0.835
#> 3 LO 2.27 28 0.0311 0.249
#> 4 pIPS 1.85 28 0.0752 0.601
#> 5 V1 1.22 28 0.232 1
#> 6 vIPS 2.67 28 0.0124 0.0991
#> 7 vLatIPS 1.69 28 0.101 0.811
#> 8 VTC 1.16 28 0.257 1
This can be integrated into a tidyverse pipeline too.
color_index_two_sample_t_res <- color_index_two_sample %>%
  t_test_two_sample(
    x = "color_effect", y = "group", paired = TRUE, p_adjust = c("bonferroni", "fdr")
  )
knitr::kable(color_index_two_sample_t_res, digits = 3)

| roi_id | tvalue | df | p | p_bonferroni | p_fdr |
|---|---|---|---|---|---|
| AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 |
| dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 |
| LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 |
| pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 |
| V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 |
| vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 |
| vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 |
| VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 |
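For readers less familiar with the paired option: a paired t-test is equivalent to a one-sample t-test on the within-subject differences. A minimal base R sketch with made-up data (the vectors `paired_cond` and `control_cond` are hypothetical, not the package data):

```r
set.seed(123)

# Hypothetical values for 29 subjects in one brain region.
control_cond <- rnorm(29, mean = 0, sd = 0.05)
paired_cond  <- control_cond + rnorm(29, mean = 0.01, sd = 0.03)

# Paired two-sample t-test ...
paired_res <- t.test(paired_cond, control_cond, paired = TRUE)

# ... and a one-sample t-test on the differences against 0.
diff_res <- t.test(paired_cond - control_cond, mu = 0)

# Both give the same t-value, degrees of freedom, and p-value.
c(paired = unname(paired_res$statistic), one_sample = unname(diff_res$statistic))
```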