cubar

Comprehensive Codon Usage Bias Analysis in R

CRAN status DOI Lifecycle: stable

Overview

Codon usage bias refers to the non-uniform usage of synonymous codons (codons that encode the same amino acid) across different organisms, genes, and functional categories. cubar is a comprehensive R package for analyzing codon usage bias in coding sequences. It provides a unified framework for calculating established codon usage metrics, running sliding-window and differential usage analyses, and optimizing sequences for heterologous expression.
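
To make the idea of non-uniform synonymous codon usage concrete, here is a minimal base-R sketch that does not use cubar. The toy sequence is made up for illustration, and the calculation follows the textbook definition of relative synonymous codon usage (observed count divided by the count expected under uniform usage). cubar's own functions may differ in details such as pseudocounts.

```r
# Toy coding sequence (hypothetical, not taken from any real gene)
cds <- "ATGCTGCTGTTACTGCTCCTGTAA"

# Split the sequence into consecutive codons, starting at position 1
starts <- seq(1, nchar(cds) - 2, by = 3)
codons <- substring(cds, starts, starts + 2)

# The six synonymous codons for leucine in the standard genetic code
leu_codons <- c("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")

# Count each leucine codon, keeping codons that never occur as zero
leu_counts <- table(factor(codons[codons %in% leu_codons], levels = leu_codons))

# RSCU: observed count relative to the mean count across synonymous codons
rscu <- leu_counts / mean(leu_counts)
print(rscu)
```

A value above 1 marks a codon used more often than expected under uniform usage. In this toy sequence CTG supplies four of the six leucine codons, while three of its synonyms never appear.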

Features

🧬 Codon-Level Analysis

📊 Gene-Level Metrics

🛠️ Utilities & Tools

Why Choose cubar?

Installation

Install the latest stable version from CRAN:

install.packages("cubar")

Development Version

Install the latest development version from GitHub:

# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install cubar from GitHub
devtools::install_github("mt1022/cubar", dependencies = TRUE)

Dependencies

System Requirements:
- R (≥ 4.1.0)

Required Packages:
- Biostrings (≥ 2.60.0) - Bioconductor package for sequence manipulation
- IRanges (≥ 2.34.0) - Bioconductor infrastructure for range operations
- data.table (≥ 1.14.0) - High-performance data manipulation
- ggplot2 (≥ 3.3.5) - Data visualization
- rlang (≥ 0.4.11) - Language tools

Note: Bioconductor packages will be installed automatically, but you may need to update your R installation if you encounter compatibility issues.
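
If the Bioconductor dependencies are not resolved automatically in your setup, one common fallback is to install them explicitly with BiocManager before installing cubar. This is a general Bioconductor convention rather than a step taken from the cubar documentation, so treat it as a sketch:

```r
# Install BiocManager if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install the Bioconductor dependencies listed above
BiocManager::install(c("Biostrings", "IRanges"))
```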

Documentation & Tutorials

📖 Complete documentation is available within R (?function_name) and on our package website.

🎯 Getting Started

📚 Advanced Topics

Example Workflow

Here's a typical analysis workflow demonstrating key functionality:

library(cubar)
library(ggplot2)

# 1. Load and quality-check sequences
data(yeast_cds)
clean_cds <- check_cds(yeast_cds)

# 2. Calculate codon frequencies
codon_freq <- count_codons(clean_cds)

# 3. Calculate multiple metrics
enc <- get_enc(codon_freq)           # Effective number of codons
gc3s <- get_gc3s(codon_freq)         # GC content at 3rd positions

# 4. Analyze highly expressed genes
data(yeast_exp)
yeast_exp <- yeast_exp[yeast_exp$gene_id %in% rownames(codon_freq), ]
high_expr <- head(yeast_exp[order(-yeast_exp$fpkm), ], 500)
rscu_high <- est_rscu(codon_freq[high_expr$gene_id, ])
cai <- get_cai(codon_freq, rscu_high)

# 5. Visualize results
df <- data.frame(ENC = enc, CAI = cai, GC3s = gc3s)
ggplot(df, aes(color = GC3s, x = ENC, y = CAI)) + 
  geom_point(alpha = 0.6) + 
  scale_color_viridis_c() +
  labs(title = "Codon Usage Bias Relationships",
       x = "Effective Number of Codons", y = "Codon Adaptation Index")
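
Step 4 of the workflow derives codon weights from highly expressed genes and then scores every gene against them. The base-R sketch below illustrates the idea behind that score using the classic definition of the Codon Adaptation Index: the geometric mean of each codon's relative adaptiveness. The RSCU values and the toy gene are hypothetical, and cubar's `get_cai` may handle details such as pseudocounts or excluded codons differently.

```r
# Hypothetical RSCU values for two amino acids in a reference gene set
rscu_ref <- list(
  Lys = c(AAA = 0.5, AAG = 1.5),
  Glu = c(GAA = 1.6, GAG = 0.4)
)

# Relative adaptiveness: each codon's RSCU divided by the
# largest RSCU among its synonymous codons
w <- unlist(lapply(unname(rscu_ref), function(x) x / max(x)))

# Codons of a toy gene (hypothetical)
gene_codons <- c("AAG", "GAA", "AAA", "GAG")

# CAI: geometric mean of the weights of the codons in the gene
cai_toy <- exp(mean(log(w[gene_codons])))
print(cai_toy)
```

A gene built only from the most preferred codon of each amino acid would score 1. Each less preferred codon pulls the geometric mean down.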

🆘 Getting Help

For complementary analysis, consider these R packages:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


📚 Documentation • 🐛 Report Bug • 💡 Request Feature