Comprehensive Codon Usage Bias Analysis in R
Codon usage bias refers to the non-uniform usage of synonymous codons (codons that encode the same amino acid) across different organisms, genes, and functional categories. cubar is a comprehensive R package for analyzing codon usage bias in coding sequences. It provides a unified framework for calculating established codon usage metrics, conducting sliding-window analyses or differential usage analyses, and optimizing sequences for heterologous expression.
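To make the idea of non-uniform synonymous codon usage concrete, here is a minimal base R sketch that computes relative synonymous codon usage (RSCU) for leucine. The sequence is a made-up toy example, and the code does not use cubar or reproduce its implementation.

```r
# Toy coding sequence (hypothetical, for illustration only)
cds <- "ATGCTGCTGTTACTGGAAGAGCTGTAA"

# Split the sequence into consecutive codons
starts <- seq(1, nchar(cds), by = 3)
codons <- substring(cds, starts, starts + 2)

# The six leucine codons in the standard genetic code
leu_codons <- c("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")

# Count each leucine codon, keeping codons that are never used as zero
leu_counts <- table(factor(codons[codons %in% leu_codons], levels = leu_codons))

# RSCU = observed count / count expected under uniform synonymous usage
rscu <- as.numeric(leu_counts) / (sum(leu_counts) / length(leu_codons))
names(rscu) <- leu_codons
print(round(rscu, 2))
```

An RSCU of 1 means a codon is used as often as expected under uniform usage. In this toy sequence CTG is strongly over-represented and four of the six leucine codons are never used.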
cubar builds on Biostrings and data.table backends.
Install the latest stable version from CRAN:
install.packages("cubar")
Install the latest development version from GitHub:
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install cubar from GitHub
devtools::install_github("mt1022/cubar", dependencies = TRUE)
System Requirements:
- R (≥ 4.1.0)
Required Packages:
- Biostrings (≥ 2.60.0) - Bioconductor package for sequence manipulation
- IRanges (≥ 2.34.0) - Bioconductor infrastructure for range operations
- data.table (≥ 1.14.0) - High-performance data manipulation
- ggplot2 (≥ 3.3.5) - Data visualization
- rlang (≥ 0.4.11) - Language tools
Note: Bioconductor packages will be installed automatically, but you may need to update your R installation if you encounter compatibility issues.
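If you do run into compatibility issues, a quick check of installed versions against the minimums listed above can help narrow them down. The sketch below uses only base R. The helper name check_min_versions is hypothetical and not part of cubar.

```r
# Minimum versions as listed in the requirements above
min_versions <- c(Biostrings = "2.60.0", IRanges = "2.34.0",
                  data.table = "1.14.0", ggplot2 = "3.3.5", rlang = "0.4.11")

# Hypothetical helper: TRUE if a package is installed and recent enough
check_min_versions <- function(min_versions) {
  vapply(names(min_versions), function(pkg) {
    requireNamespace(pkg, quietly = TRUE) &&
      utils::packageVersion(pkg) >= min_versions[[pkg]]
  }, logical(1))
}

status <- check_min_versions(min_versions)
print(status)

# R itself should be at least 4.1.0
print(getRversion() >= "4.1.0")
```

Packages reported as FALSE are either missing or older than the listed minimum. For the Bioconductor packages (Biostrings, IRanges), BiocManager::install() is the usual way to install or update them.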
Complete documentation is available within R (?function_name) and on our package website.
Here's a typical analysis workflow demonstrating key functionality:
library(cubar)
library(ggplot2)
# 1. Load and quality-check sequences
data(yeast_cds)
clean_cds <- check_cds(yeast_cds)
# 2. Calculate codon frequencies
codon_freq <- count_codons(clean_cds)
# 3. Calculate multiple metrics
enc <- get_enc(codon_freq)   # Effective number of codons
gc3s <- get_gc3s(codon_freq) # GC content at 3rd positions
# 4. Analyze highly expressed genes
data(yeast_exp)
yeast_exp <- yeast_exp[yeast_exp$gene_id %in% rownames(codon_freq), ]
high_expr <- head(yeast_exp[order(-yeast_exp$fpkm), ], 500)
rscu_high <- est_rscu(codon_freq[high_expr$gene_id, ])
cai <- get_cai(codon_freq, rscu_high)
# 5. Visualize results
df <- data.frame(ENC = enc, CAI = cai, GC3s = gc3s)
ggplot(df, aes(color = GC3s, x = ENC, y = CAI)) +
  geom_point(alpha = 0.6) +
  scale_color_viridis_c() +
  labs(title = "Codon Usage Bias Relationships",
       x = "Effective Number of Codons", y = "Codon Adaptation Index")
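Step 4 of the workflow derives RSCU from highly expressed genes and then scores every gene against it. The base R sketch below shows the general idea behind the Codon Adaptation Index: convert reference RSCU values into relative adaptiveness weights, then take their geometric mean over a gene's codons. The RSCU values, the two-amino-acid restriction, and the helper toy_cai are all hypothetical. This is a simplified illustration, not cubar's implementation of get_cai.

```r
# Hypothetical RSCU values from a reference set of highly expressed genes,
# restricted to two amino acids for brevity
rscu_ref <- list(
  Lys = c(AAA = 0.6, AAG = 1.4),
  Glu = c(GAA = 1.5, GAG = 0.5)
)

# Relative adaptiveness: each codon's RSCU divided by the maximum RSCU
# among its synonymous codons
w <- unlist(unname(lapply(rscu_ref, function(x) x / max(x))))

# Hypothetical helper: CAI as the geometric mean of w over a gene's codons
toy_cai <- function(codons, w) {
  exp(mean(log(w[codons])))
}

gene_optimal <- c("AAG", "GAA", "AAG", "GAA") # only the preferred codons
gene_mixed   <- c("AAA", "GAA", "AAG", "GAG") # preferred and rare codons
print(toy_cai(gene_optimal, w))
print(toy_cai(gene_mixed, w))
```

A gene built only from the preferred codon of each amino acid scores 1, and any use of less-preferred codons pulls the score below 1.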
Help is also available through the function documentation (?function_name) and online docs.
For complementary analysis, consider these R packages:
This project is licensed under the MIT License - see the LICENSE file for details.