SlimR is an R package designed for annotating single-cell and spatial-transcriptomics (ST) datasets. It supports the creation of a unified marker list, Markers_list, using sources including: the package’s built-in curated species-specific cell type and marker reference databases (e.g., ‘Cellmarker2’, ‘PanglaoDB’, ‘scIBD’, ‘TCellSI’), Seurat objects containing cell label information, or user-provided Excel tables mapping cell types to markers.
Based on the Markers_list, SlimR can calculate the gene expression of different cell types, predict cell type annotations and calculate the corresponding AUC with Celltype_Calculate(), annotate the cells with Celltype_Annotation(), and then verify the annotation with Celltype_Verification(). It can also calculate the expression of each cell type's markers to generate reference plots for manual annotation (e.g., ‘Heatmap’, ‘Features plot’, ‘Combined plot’).
Install the stable version of SlimR from CRAN (recommended when the CRAN version matches the GitHub version):
install.packages("SlimR")
Note: If you encounter a version mismatch during installation, try switching the CRAN mirror to “Global (CDN)” or install with BiocManager::install("SlimR").
Install the development version of SlimR from GitHub (recommended when the GitHub version is newer than the CRAN version):
devtools::install_github("Zhaoqing-wang/SlimR")
Load the package in your R environment:
library(SlimR)
For Seurat objects with multiple layers in an assay, please run Seurat::JoinLayers() first.
# For example, to join the layers of the 'RNA' assay in a multi-layered Seurat object:
sce@assays$RNA <- Seurat::JoinLayers(sce@assays$RNA)
Important: To ensure accurate annotation, make sure the input Seurat object has been processed with the standard workflow and that batch effects have been removed.
Note: It is recommended to use the clustree package to determine the appropriate clustering resolution for the input Seurat object.
SlimR requires R (≥ 3.5) and depends on the following packages: cowplot, dplyr, ggplot2, patchwork, pheatmap, readxl, scales, Seurat, tidyr, tools. If installation fails, please install the missing dependencies using:
# Install dependencies if needed ('tools' ships with base R and does not need to be installed):
install.packages(c("cowplot", "dplyr", "ggplot2", "patchwork",
"pheatmap", "readxl", "scales", "Seurat",
"tidyr"))
SlimR requires a standardized list format for storing cell types, their markers and optional metrics: list names = cell types (required), first column = markers (required), subsequent columns = metrics (optional).
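As an illustration of this format, the base-R sketch below builds a small Markers_list by hand. The cell types, genes and the metric column name `score` are made up for illustration; lists produced by SlimR's own functions may carry different metric columns.

```r
# A hand-built Markers_list: one data frame per cell type.
# List names are the cell types (required), the first column holds the
# markers (required), and any further columns hold metrics (optional).
# The cell types, genes and the "score" column are illustrative only.
Markers_list_toy <- list(
  "T cell" = data.frame(
    marker = c("CD3D", "CD3E", "CD2"),
    score  = c(0.9, 0.8, 0.6),
    stringsAsFactors = FALSE
  ),
  "B cell" = data.frame(
    marker = c("MS4A1", "CD79A", "CD19"),
    score  = c(0.95, 0.85, 0.7),
    stringsAsFactors = FALSE
  ),
  # Metrics may be omitted: a single marker column is also valid.
  "Plasma cell" = data.frame(
    marker = c("JCHAIN", "MZB1"),
    stringsAsFactors = FALSE
  )
)

str(Markers_list_toy)
```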
Cellmarker2: A database of cell types and markers covering different species and tissue types.
Reference: Hu et al. (2023) doi:10.1093/nar/gkac947.
Cellmarker2 <- SlimR::Cellmarker2
Cellmarker2_table <- SlimR::Cellmarker2_table
View(Cellmarker2_table)
Markers_list:
Markers_list_Cellmarker2 <- Markers_filter_Cellmarker2(
Cellmarker2,
species = "Human",
tissue_class = "Intestine",
tissue_type = NULL,
cancer_type = NULL,
cell_type = NULL
)
Important: Select at least the ‘species’ and ‘tissue_class’ parameters to ensure the accuracy of the annotation.
Link: The output ‘Markers_list’ can be used in sections 3.1, 4.1, 4.2, 4.3 and 5.1; see section 3 for the automated annotation workflow.
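Conceptually, building a Markers_list from a reference table amounts to keeping the rows that match the requested species and tissue, then splitting the markers by cell type. The base-R sketch below shows that idea on a toy table. Its column names (species, tissue_class, cell_name, marker) are hypothetical and are not guaranteed to match Cellmarker2_table, and this is not SlimR's implementation of Markers_filter_Cellmarker2().

```r
# Toy long-format reference table (hypothetical columns and contents).
ref_table <- data.frame(
  species      = c("Human", "Human", "Human", "Mouse", "Human"),
  tissue_class = c("Intestine", "Intestine", "Intestine", "Intestine", "Blood"),
  cell_name    = c("Goblet cell", "Goblet cell", "Enterocyte", "Goblet cell", "T cell"),
  marker       = c("MUC2", "TFF3", "FABP2", "Muc2", "CD3E"),
  stringsAsFactors = FALSE
)

# Keep only the rows matching the requested species and tissue class.
keep <- ref_table$species == "Human" & ref_table$tissue_class == "Intestine"
filtered <- ref_table[keep, ]

# Split by cell type: one data frame per cell type, markers in the first column.
Markers_list_sketch <- lapply(
  split(filtered, filtered$cell_name),
  function(df) data.frame(marker = unique(df$marker), stringsAsFactors = FALSE)
)

names(Markers_list_sketch)  # "Enterocyte" "Goblet cell"
```

Without the species and tissue filters, the mouse symbol Muc2 and the unrelated blood T cell markers would end up in the list, which is why at least ‘species’ and ‘tissue_class’ should be set.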
PanglaoDB: A database of cell types and markers covering different species and tissue types.
Reference: Franzén et al. (2019) doi:10.1093/database/baz046.
PanglaoDB <- SlimR::PanglaoDB
PanglaoDB_table <- SlimR::PanglaoDB_table
View(PanglaoDB_table)
Markers_list:
Markers_list_panglaoDB <- Markers_filter_PanglaoDB(
PanglaoDB,
species_input = 'Human',
organ_input = 'GI tract'
)
Important: Select the ‘species_input’ and ‘organ_input’ parameters to ensure the accuracy of the annotation.
Link: The output ‘Markers_list’ can be used in sections 3.1, 4.1, 4.2, 4.3 and 5.2; see section 3 for the automated annotation workflow.
scIBD: A database of human intestine markers.
Reference: Nie et al. (2023) doi:10.1038/s43588-023-00464-9.
Markers_list_scIBD <- SlimR::Markers_list_scIBD
Important: This is for human intestinal annotation only. Ensure that the input Seurat object contains human intestinal cells to guarantee the accuracy of the annotation.
Link: The output ‘Markers_list’ can be used in sections 3.1, 4.1, 4.2, 4.3 and 5.3; see section 3 for the automated annotation workflow.
TCellSI: A database of T cell markers of different subtypes.
Reference: Yang et al. (2024) doi:10.1002/imt2.231.
Markers_list_TCellSI <- SlimR::Markers_list_TCellSI
Important: This is only for T cell subset annotation. Ensure that the input Seurat object is of T cell type to guarantee the accuracy of the annotation.
Link: The output ‘Markers_list’ can be used in sections 3.1, 4.1, 4.2, 4.3 and 5.4; see section 3 for the automated annotation workflow.
Markers_list:
The standard Markers_list can be generated with the built-in Read_seurat_markers() function after obtaining markers with the Seurat::FindAllMarkers() function.
seurat_markers <- Seurat::FindAllMarkers(
object = sce,
group.by = "Cell_type",
only.pos = TRUE)

Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "Seurat",
sort_by = "FSS",
gene_filter = 20
)
Note: We recommend setting sort_by = "FSS" to rank markers by the ‘Feature Significance Score’ (FSS, the product of log2FC and the expression ratio).
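The ranking idea can be sketched in base R on a toy table shaped like Seurat::FindAllMarkers() output (columns cluster, gene, avg_log2FC, pct.1, pct.2). Here the ‘expression ratio’ is assumed to be pct.1 / pct.2 (the fraction of expressing cells inside the cluster over the fraction outside it); SlimR's exact definition may differ, so this is a sketch of the idea rather than the package's code. The hypothetical variable top_n_per_cluster plays the role of gene_filter.

```r
# Toy table mimicking the columns of Seurat::FindAllMarkers() output.
markers <- data.frame(
  cluster    = c("0", "0", "0", "1", "1"),
  gene       = c("CD3D", "IL7R", "LTB", "MS4A1", "CD79A"),
  avg_log2FC = c(2.0, 1.5, 1.0, 3.0, 2.5),
  pct.1      = c(0.90, 0.60, 0.80, 0.95, 0.90),
  pct.2      = c(0.30, 0.30, 0.40, 0.10, 0.15),
  stringsAsFactors = FALSE
)

# Assumed definition: FSS = log2FC * (pct.1 / pct.2).
markers$FSS <- markers$avg_log2FC * (markers$pct.1 / markers$pct.2)

# Keep the top n genes per cluster by descending FSS (n mirrors gene_filter).
top_n_per_cluster <- 2
ranked <- do.call(rbind, lapply(split(markers, markers$cluster), function(df) {
  head(df[order(df$FSS, decreasing = TRUE), ], top_n_per_cluster)
}))

ranked[, c("cluster", "gene", "FSS")]
```

In this toy table LTB is dropped from cluster 0 although its log2FC is positive, because both its fold change and its expression ratio are the lowest in the cluster.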
Using presto to speed up (alternative): For large datasets, the presto::wilcoxauc() function can be used to speed up marker detection, at the cost of some accuracy.
seurat_markers <- dplyr::filter(
presto::wilcoxauc(
X = sce,
group_by = "Cell_type",
seurat_assay = "RNA"
),
padj < 0.05, logFC > 0.5
)
Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "presto",
sort_by = "FSS",
gene_filter = 20
)
Important: This feature depends on the presto package; please run devtools::install_github('immunogenomics/presto') and library(presto) first.
Note: We recommend setting sort_by = "FSS" to rank markers by the ‘Feature Significance Score’ (FSS, the product of log2FC and the expression ratio).
Link: The output ‘Markers_list’ can be used in sections 3.1, 4.1, 4.2, 4.3 and 5.3; see section 3 for the automated annotation workflow.
Format Requirements:
Each sheet name = cell type (necessary)
First row = column headers (necessary)
First column = markers (necessary)
Subsequent columns = metrics (can be omitted)
Markers_list_Excel <- Read_excel_markers("D:/Laboratory/Marker_load.xlsx")
Link: The output ‘Markers_list’ can be used in sections 3.1, 4.1, 4.2, 4.3 and 5.4; see section 3 for the automated annotation workflow.
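Before passing a list read from Excel (or built by other means) to the annotation functions, it can help to check that it follows the format requirements above. The helper check_markers_list() below is hypothetical and not part of SlimR; it is a base-R sketch that only checks the cell type names and the type of the first column.

```r
# Hypothetical helper (not part of SlimR): returns a character vector of
# problems; an empty vector means the list follows the Markers_list format.
check_markers_list <- function(markers_list) {
  if (!is.list(markers_list) || length(markers_list) == 0) {
    return("input must be a non-empty list")
  }
  problems <- character(0)
  cell_types <- names(markers_list)
  # List names = cell types (required).
  if (is.null(cell_types) || any(is.na(cell_types) | cell_types == "")) {
    problems <- c(problems, "every element must be named with a cell type")
  }
  for (i in seq_along(markers_list)) {
    df <- markers_list[[i]]
    label <- if (is.null(cell_types)) paste0("#", i) else cell_types[i]
    if (!is.data.frame(df) || ncol(df) < 1) {
      problems <- c(problems,
                    paste0(label, ": must be a data frame with at least one column"))
    } else if (!is.character(df[[1]])) {
      # First column = markers (required), expected as gene symbols.
      problems <- c(problems,
                    paste0(label, ": first column must contain marker names"))
    }
  }
  problems
}
```

For example, check_markers_list(Markers_list_Excel) should return character(0) for a correctly formatted workbook.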
Uses Markers_list to calculate probabilities, predict cell types, calculate the corresponding AUC (optional), and generate heatmap and ROC plots (optional) for cell annotation.
SlimR_anno_result <- Celltype_Calculate(seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
min_expression = 0.1,
specificity_weight = 3,
threshold = 0.8,
compute_AUC = TRUE,
plot_AUC = TRUE,
AUC_correction = TRUE,
colour_low = "navy",
colour_high = "firebrick3"
)
Important: The parameter cluster_col in the functions Celltype_Calculate() and Celltype_Annotation() must be exactly the same to avoid false matches.
Note: Setting AUC_correction = TRUE takes a little longer to compute, but it is recommended for correcting the predicted cell types and obtaining more accurate predictions. The lower the threshold parameter, the more alternative cell types are checked by AUC, and the longer the run time.
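For intuition about the AUC values reported here: an ROC AUC for one cluster can be computed from any per-cell score (for example, a cell's mean expression of a cell type's markers) as the Mann-Whitney statistic, i.e. the probability that a cell inside the cluster scores higher than a cell outside it. The base-R sketch below applies that rank formula to made-up scores; the function name auc_from_scores() is hypothetical, and this is not necessarily how SlimR computes its AUC.

```r
# ROC AUC via the rank (Mann-Whitney) formula:
# AUC = (sum of ranks of positives - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
# Ties receive average ranks, so tied pairs count as 0.5.
auc_from_scores <- function(scores, in_cluster) {
  stopifnot(length(scores) == length(in_cluster), is.logical(in_cluster))
  n_pos <- sum(in_cluster)
  n_neg <- sum(!in_cluster)
  r <- rank(scores)  # default ties.method = "average"
  (sum(r[in_cluster]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up marker scores for six cells; the first three belong to the cluster.
scores     <- c(3, 4, 5, 1, 2, 3.5)
in_cluster <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

auc_from_scores(scores, in_cluster)  # 8/9, about 0.889
```

An AUC near 0.5 means the score does not separate the cluster from the remaining cells, which is the reasoning behind relabeling low-AUC predictions as `Unknown` in Example 2 further down.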
Check the annotation probability of each cluster in the input cluster_col column for each cell type in Markers_list with the following code.
print(SlimR_anno_result$Heatmap_plot)
Note: If the heatmap is not generated properly, please run library(pheatmap) first.
Cell type information results predicted by SlimR can be viewed with the following code.
View(SlimR_anno_result$Prediction_results)
Furthermore, the ROC curves and AUC values for each cluster in cluster_col and its predicted cell type can be viewed with the following code.
print(SlimR_anno_result$AUC_plot)
Important: This plot requires the parameter plot_AUC = TRUE.
Note: If the ROC plot is not generated properly, please run library(ggplot2) first.
After reviewing the predicted cell types and the corresponding AUC values, you can correct the predicted cell types with the following code.
Example 1:
# For example, cluster `15` in `cluster_col` corresponds to cell type `Intestinal stem cell`.
SlimR_anno_result$Prediction_results$Predicted_cell_type[
SlimR_anno_result$Prediction_results$cluster_col == 15
] <- "Intestinal stem cell"
Example 2:
# For example, a predicted cell type with an AUC of 0.5 or less should be labeled `Unknown`.
SlimR_anno_result$Prediction_results$Predicted_cell_type[
SlimR_anno_result$Prediction_results$AUC <= 0.5
] <- "Unknown"
After modifying the corresponding predicted cell type, the following code is used to view the updated predicted cell type table.
View(SlimR_anno_result$Prediction_results)
Important: If you need to correct a cell type, it is strongly recommended to use one of the cell types listed in SlimR_anno_result$Prediction_results$Alternative_cell_type.
Assigns the SlimR-predicted cell types in SlimR_anno_result$Prediction_results$Predicted_cell_type to the Seurat object based on the cluster annotations, and stores the results in seurat_obj@meta.data$annotation_col.
sce <- Celltype_Annotation(seurat_obj = sce,
cluster_col = "seurat_clusters",
SlimR_anno_result = SlimR_anno_result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_SlimR"
)
Important: The parameter cluster_col in the functions Celltype_Calculate() and Celltype_Annotation() must be exactly the same to avoid false matches. Likewise, the parameter annotation_col in the functions Celltype_Annotation() and Celltype_Verification() must be exactly the same to avoid false matches.
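The cluster-to-cell assignment described above can be pictured as a lookup: each cell inherits the predicted label of its cluster. The base-R sketch below uses toy data frames; the column names cluster_col and Predicted_cell_type mirror the Prediction_results columns used in the correction examples, while the data themselves are made up, and this is not SlimR's own code.

```r
# Toy per-cell metadata, standing in for seurat_obj@meta.data.
meta <- data.frame(
  seurat_clusters = c("0", "1", "0", "2", "1"),
  row.names = paste0("cell", 1:5),
  stringsAsFactors = FALSE
)

# Toy cluster-level predictions (one row per cluster).
predictions <- data.frame(
  cluster_col = c("0", "1", "2"),
  Predicted_cell_type = c("T cell", "B cell", "Unknown"),
  stringsAsFactors = FALSE
)

# Look up each cell's cluster in the prediction table and copy its label.
# A cluster missing from the prediction table would receive NA.
idx <- match(meta$seurat_clusters, predictions$cluster_col)
meta$Cell_type_SlimR <- predictions$Predicted_cell_type[idx]

meta
```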
Verifies the annotation using the cell group identities in seurat_obj@meta.data$annotation_col, ranking markers by the ‘Feature Significance Score’ (FSS, the product of log2FC and the expression ratio).
Celltype_Verification(seurat_obj = sce,
SlimR_anno_result = SlimR_anno_result,
gene_number = 5,
assay = "RNA",
colour_low = "white",
colour_high = "navy",
annotation_col = "Cell_type_SlimR"
)
Important: The parameter annotation_col in the functions Celltype_Annotation() and Celltype_Verification() must be exactly the same to avoid false matches.
Note: Cell types listed in SlimR_anno_result$Prediction_results are verified using the marker information in SlimR_anno_result$Expression_list; cell types not in that list are verified using markers identified by the function FindMarkers().
Generate a heatmap estimating the likelihood that each cell cluster is similar to each reference cell type:
Celltype_Annotation_Heatmap(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
min_expression = 0.1,
specificity_weight = 3,
colour_low = "navy",
colour_high = "firebrick3"
)
Note: This function has been incorporated into Celltype_Calculate(); we recommend using Celltype_Calculate() instead.
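To give a rough idea of what such a cluster-by-cell-type heatmap summarizes, the base-R sketch below averages each marker's expression within each cluster, then averages over each cell type's markers, and scales each cluster's row to the range 0 to 1. It ignores SlimR's min_expression and specificity_weight parameters and is not the package's scoring formula; the expression matrix and markers are made up.

```r
# Toy expression matrix: genes in rows, cells in columns.
expr <- matrix(
  c(5, 4, 0, 0,   # CD3D
    4, 5, 1, 0,   # CD3E
    0, 1, 6, 5,   # MS4A1
    0, 0, 5, 6),  # CD79A
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD3D", "CD3E", "MS4A1", "CD79A"), paste0("cell", 1:4))
)
clusters <- c("0", "0", "1", "1")  # cluster of each cell

# Markers per cell type (the first column of each Markers_list element).
gene_list <- list("T cell" = c("CD3D", "CD3E"), "B cell" = c("MS4A1", "CD79A"))

# Mean expression of every gene within each cluster (genes x clusters).
cluster_means <- sapply(split(seq_along(clusters), clusters),
                        function(cols) rowMeans(expr[, cols, drop = FALSE]))

# Average the cluster means over each cell type's markers (clusters x cell types).
score <- sapply(gene_list,
                function(genes) colMeans(cluster_means[genes, , drop = FALSE]))

# Scale each cluster's row to [0, 1]; the best-matching cell type scores 1.
# (A row whose scores are all identical would give NaN.)
scaled <- t(apply(score, 1, function(x) (x - min(x)) / (max(x) - min(x))))

scaled
```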
Generates per-cell-type expression dot plots with a metric heatmap (when metric information exists):
Celltype_Annotation_Features(
seurat_obj = sce,
gene_list = Markers_list,
gene_list_type = "Cellmarker2",
species = "Human",
save_path = "./SlimR/Celltype_Annotation_Features/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Each resulting combined image consists of a dot plot on top and, if metric information is present, a heatmap below. The dot plot shows the expression level and expression ratio of the markers for the corresponding cell type; the heatmap below it shows the metrics of the same markers.
Generates per-cell-type expression combined plots:
Celltype_Annotation_Combined(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_Annotation_Combined/",
colour_low = "white",
colour_high = "navy"
)
Each generated combined plot shows box plots of the expression levels of the corresponding markers for that cell type, colored by the markers' average expression levels.
The functions in sections 5.1, 5.2, 5.3 and 5.4 have been incorporated into Celltype_Annotation_Features(); we recommend using Celltype_Annotation_Features() with the corresponding parameters (for example, gene_list_type = "Cellmarker2") instead. For more information, please refer to section 4.2.
Celltype_annotation_Cellmarker2(
seurat_obj = sce,
gene_list = Markers_list_Cellmarker2,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_Cellmarkers2/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: The same plots can be generated by setting gene_list_type = "Cellmarker2" in the function Celltype_Annotation_Features().
Celltype_annotation_PanglaoDB(
seurat_obj = sce,
gene_list = Markers_list_panglaoDB,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_PanglaoDB/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: The same plots can be generated by setting gene_list_type = "PanglaoDB" in the function Celltype_Annotation_Features().
Celltype_annotation_Seurat(
seurat_obj = sce,
gene_list = Markers_list_Seurat,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_Seurat/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: The same plots can be generated by setting gene_list_type = "Seurat" in the function Celltype_Annotation_Features().
Celltype_annotation_Excel(
seurat_obj = sce,
gene_list = Markers_list_Excel,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_Excel/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: The same plots can be generated by setting gene_list_type = "Excel" in the function Celltype_Annotation_Features(). This function also works with a Markers_list that has no metric information or whose metric information was generated in other ways.
Thank you for using SlimR. For questions, issues, or suggestions, please submit them in the issue section or discussion section on GitHub (suggested) or send an email (alternative):
Zhaoqing Wang
zhaoqingwang@mail.sdu.edu.cn