| Title: | Simulate DNA Methylation Dynamics on Different Genomic Structures along Genealogies |
| Version: | 0.2.1 |
| Author: | Sara Castillo Vicente [aut, cre], Dirk Metzler [aut, ths] |
| Maintainer: | Sara Castillo Vicente <castillo@bio.lmu.de> |
| Description: | DNA methylation is an epigenetic modification involved in genomic stability, gene regulation, development and disease. DNA methylation occurs mainly through the addition of a methyl group to cytosines, for example to cytosines in a CpG dinucleotide context (CpG stands for a cytosine followed by a guanine). Tissue-specific methylation patterns lead to genomic regions with different characteristic methylation levels. E.g. in vertebrates CpG islands (regions with high CpG content) that are associated to promoter regions of expressed genes tend to be unmethylated. 'MethEvolSIM' is a model-based simulation software for the generation and modification of cytosine methylation patterns along a given tree, which can be a genealogy of cells within an organism, a coalescent tree of DNA sequences sampled from a population, or a species tree. The simulations are based on an extension of the model of Grosser & Metzler (2020) <doi:10.1186/s12859-020-3438-5> and allows for changes of the methylation states at single cytosine positions as well as simultaneous changes of methylation frequencies in genomic structures like CpG islands. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Imports: | R6, ape |
| Depends: | R (≥ 4.0) |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-04-11 11:08:50 UTC; castillo |
| Repository: | CRAN |
| Date/Publication: | 2025-04-11 22:00:19 UTC |
Compute the Mean Site Frequency of Methylation Changes per Cherry
Description
This function calculates the weighted mean frequency of methylation changes at island and non-island genomic structures for each cherry in a phylogenetic tree. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor.
Usage
MeanSiteFChange_cherry(
data,
categorized_data = FALSE,
tree,
index_islands,
index_nonislands
)
Arguments
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
tree |
A phylogenetic tree in Newick format or a |
index_islands |
A numeric vector specifying the indices of genomic structures corresponding to islands. |
index_nonislands |
A numeric vector specifying the indices of genomic structures corresponding to non-islands. |
Details
The function first validates the tree and the input data structure. It then computes the
per-cherry frequency of sites with different methylation states using get_siteFChange_cherry.
The indices provided for islands and non-islands are checked for validity using validate_structureIndices.
Finally, the function calculates the weighted mean site frequency of methylation changes for each cherry,
separately for islands and non-islands.
Value
A data frame with one row per cherry, containing the following columns:
- tip_names
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
- tip_indices
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
- dist
A numeric value representing the sum of the branch distances between the cherry tips.
- nonisland_meanFChange
A numeric value representing the weighted mean frequency of methylation changes in non-island structures.
- island_meanFChange
A numeric value representing the weighted mean frequency of methylation changes in island structures.
Examples
# Example data setup
data <- list(
list(rep(1,10), rep(0,5), rep(1,8)), # Tip a
list(rep(1,10), rep(0.5,5), rep(0,8)), # Tip b
list(rep(1,10), rep(0.5,5), rep(0,8)), # Tip c
list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))) # Tip d
tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);"
index_islands <- c(1,3)
index_nonislands <- c(2)
MeanSiteFChange_cherry(data = data,
categorized_data = TRUE,
tree = tree,
index_islands = index_islands,
index_nonislands = index_nonislands)
Categorize Global States of CpG Islands
Description
This function categorizes CpG islands into unmethylated, methylated, or partially methylated states based on specified thresholds.
Usage
categorize_islandGlbSt(meanMeth_islands, u_threshold, m_threshold)
Arguments
meanMeth_islands |
A numeric vector containing the mean methylation levels for CpG islands at each tip. |
u_threshold |
A numeric value (0-1) defining the threshold for categorization as unmethylated. |
m_threshold |
A numeric value (0-1) defining the threshold for categorization as methylated. |
Details
The function assigns each island a state:
- "u"
if mean methylation lower or equal than
u_threshold- "m"
if mean methylation greater or equal than
m_threshold- "p"
if mean methylation is in between
Value
A character vector of length equal to meanMeth_islands, containing "u", "p", or "m" for each island.
Examples
meanMeth_islands <- c(0.1, 0.4, 0.8)
categorize_islandGlbSt(meanMeth_islands, 0.2, 0.8)
Categorize Methylation Frequencies Based on Thresholds
Description
This function categorizes the values in data[[tip]][[structure]] into three categories:
-
0for unmethylated sites, where values are smaller or equal tou_threshold. -
0.5for partially methylated sites, where values are betweenu_thresholdandm_threshold. -
1for methylated sites, where values are larger or equal tom_threshold.
Usage
categorize_siteMethSt(data, u_threshold = 0.2, m_threshold = 0.8)
Arguments
data |
A list structured as |
u_threshold |
A numeric value representing the upper bound for values to be classified as unmethylated ( |
m_threshold |
A numeric value representing the lower bound for values to be classified as methylated ( |
Details
If any value in data[[tip]][[structure]] is outside these categories, it is transformed based on the given thresholds.
Value
A transformed version of data where each value is categorized as 0 (unmethylated), 0.5 (partially methylated),
or 1 (methylated).
Examples
data <- list(
list(c(0.1, 0.2, 0.02), c(0.05, 0.25, 0.15)), # tip 1
list(c(0.01, 0.7, 0.85), c(0.3, 0.1, 0.98)) # tip 2
)
transformed_data <- categorize_siteMethSt(data, u_threshold = 0.2, m_threshold = 0.8)
cftpStepGenerator
Description
an R6 class representing the steps for sampling a sequence of methylation states from the equilibrium (SSEi and SSEc considered, IWE neglected) using the CFTP algorithm for a given combiStructureGenerator instance.
It is stored in the private attribute CFTP_info of combiStructureGenerator instances when calling the combiStructureGenerator$cftp() method and can be retrieved with the combiStructureGenerator$get_CFTP_info()
Public fields
singleStr_numberPublic attribute: Number of singleStr instances
singleStr_siteNumberPublic attribute: Number of sites in singleStr instances
CFTP_highest_ratePublic attribute: CFTP highest rate
number_stepsPublic attribute: counter of steps alredy generated
CFTP_chosen_singleStrPublic attribute: list with vectors of equal size with chosen singleStr index at each CFTP step
CFTP_chosen_sitePublic attribute: list with vectors of equal size with chosen site index at each CFTP step
CFTP_eventPublic attribute: list with vectors of equal size with type of CFTP event at each CFTP step.
CFTP_randomPublic attribute: list with vectors of equal size with CFTP threshold at each CFTP step
steps_perVectorPublic attribute: size of vectors in lists CFTP_chosen_singleStr, CFTP_chosen_site, CFTP_event and CFTP_random
Methods
Public methods
Method new()
Create a new instance of class cftpStepGenerator with the info of the corresponding combiStrucutre instance
Usage
cftpStepGenerator$new( singleStr_number, singleStr_siteNumber, CFTP_highest_rate )
Arguments
singleStr_numberNumber of singleStr instances
singleStr_siteNumberNumber of sites in singleStr instances
CFTP_highest_rateCFTP highest rate across all singleStr withing combiStr instance
Returns
A new instance of class cftpStepGenerator
Method generate_events()
1: SSEi to unmethylated, 2: SSEi to partially methylated, 3: SSEi to methylated 4: SSEc copy left state, 5: SSEc copy right state
Public Method. Generates the events to apply for CFTP.
Usage
cftpStepGenerator$generate_events(steps = 10000, testing = FALSE)
Arguments
stepsInteger value >=1
testingdefault FALSE. TRUE for testing output
Details
The function add steps to the existing ones. If called several times the given steps need to be higher than the sum of steps generated before.
Returns
NULL when testing FALSE. Testing output when testing TRUE.
Method clone()
The objects of this class are cloneable with this method.
Usage
cftpStepGenerator$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
combiStructureGenerator
Description
an R6 class representing several genomic structures. Each genomic structure contained is an object of class singleStructureGenerator. Note that default clone(deep=TRUE) fails to clone singleStructureGenerator objects contained, use method $copy() instead.
Public fields
testing_outputPublic attribute: Testing output for initialize
Methods
Public methods
Method new()
Create a new combiStructureGenerator object.
Note that this object can be generated within a treeMultiRegionSimulator object.
Usage
combiStructureGenerator$new(infoStr, params = NULL, testing = FALSE)
Arguments
infoStrA data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If initial equilibrium frequencies are given the dataframe must contain 3 additional columns: 'u_eqFreq', 'p_eqFreq' and 'm_eqFreq'
paramsDefault NULL. When given: data frame containing model parameters.
testingDefault FALSE. TRUE for writing in public field of new instance $testing_output
Returns
A new combiStructureGenerator object.
Method get_singleStr()
Public method: Get one singleStructureGenerator object in $singleStr
Usage
combiStructureGenerator$get_singleStr(i)
Arguments
iindex of the singleStructureGenerator object in $singleStr
Returns
the singleStructureGenerator object in $singleStr with index i
Method get_singleStr_number()
Public method: Get number of singleStructureGenerator objects in $singleStr
Usage
combiStructureGenerator$get_singleStr_number()
Returns
number of singleStructureGenerator object contained in $singleStr
Method get_singleStr_siteNumber()
Public method: Get number of sites in all singleStructureGenerator objects
Usage
combiStructureGenerator$get_singleStr_siteNumber()
Returns
number of sites in all singleStructureGenerator objects
Method get_island_number()
Public method: Get number of singleStructureGenerator objects in $singleStr with $globalState "U" (CpG islands)
Usage
combiStructureGenerator$get_island_number()
Returns
number of singleStructureGenerator in $singleStr objects with $globalState "U" (CpG islands)
Method get_island_index()
Public method: Get index of singleStructureGenerator objects in $singleStr with $globalState "U" (CpG islands)
Usage
combiStructureGenerator$get_island_index()
Returns
index of singleStructureGenerator objects in $singleStr with $globalState "U" (CpG islands)
Method set_IWE_events()
Public method: Set information of the IWE events sampled in a tree branch
Usage
combiStructureGenerator$set_IWE_events(a)
Arguments
avalue to which IWE_events should be set
Returns
NULL
Method get_IWE_events()
Public method: Get information of the IWE events sampled in a tree branch
Usage
combiStructureGenerator$get_IWE_events()
Returns
information of the IWE events sampled in a tree branch
Method set_name()
Public method: Set the name of the leaf if evolutionary process (simulated from class treeMultiRegionSimulator) ends in a tree leaf.
Usage
combiStructureGenerator$set_name(a)
Arguments
avalue to which name should be set
Returns
NULL
Method get_name()
Public method: Get the name of the leaf if evolutionary process (simulated from class treeMultiRegionSimulator) ended in a tree leaf.
Usage
combiStructureGenerator$get_name()
Returns
Name of the leaf if evolutionary process (simulated from class treeMultiRegionSimulator) ended in a tree leaf. For iner tree nodes return NULL
Method get_own_index()
Public method: Set own branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Usage
combiStructureGenerator$get_own_index()
Returns
NULL
Method set_own_index()
Public method: Get own branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Usage
combiStructureGenerator$set_own_index(i)
Arguments
iindex of focal object
Returns
Own branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Method get_parent_index()
Public method: Get parent branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Usage
combiStructureGenerator$get_parent_index()
Returns
Parent branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Method set_parent_index()
Public method: Set parent branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Usage
combiStructureGenerator$set_parent_index(i)
Arguments
iset parent_index to this value
Returns
NULL
Method get_offspring_index()
Public method: Get offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Usage
combiStructureGenerator$get_offspring_index()
Returns
Offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Method set_offspring_index()
Public method: Set offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Usage
combiStructureGenerator$set_offspring_index(i)
Arguments
iset offspring_index to this value
Returns
NULL
Method add_offspring_index()
Public method: Add offspring branch index in the tree along which the evolutionary process is simulated (from class treeMultiRegionSimulator).
Usage
combiStructureGenerator$add_offspring_index(i)
Arguments
iindex to be added
Returns
NULL
Method get_mu()
Public method.
Usage
combiStructureGenerator$get_mu()
Returns
Model parameter for the rate of the IWE evolutionary process (per island and branch length).
Method get_id()
Public method. Get the unique ID of the instance
Usage
combiStructureGenerator$get_id()
Returns
A numeric value representing the unique ID of the instance.
Method set_id()
Public method. Set the unique ID of the instance
Usage
combiStructureGenerator$set_id(id)
Arguments
idinteger value to identificate the combiStructure instance
Returns
A numeric value representing the unique ID of the instance.
Method get_sharedCounter()
Public method. Get the counter value from the shared environment between instances of combiStructureGenerator class
Usage
combiStructureGenerator$get_sharedCounter()
Returns
Numeric counter value.
Method reset_sharedCounter()
Public method. Reset the counter value of the shared environment between instances of combiStructureGenerator class
Usage
combiStructureGenerator$reset_sharedCounter()
Returns
NULL
Method set_singleStr()
Public method: Clone each singleStructureGenerator object in $singleStr
Usage
combiStructureGenerator$set_singleStr(singStrList)
Arguments
singStrListobject to be cloned
Returns
NULL
Method copy()
Public method: Clone combiStructureGenerator object and all singleStructureGenerator objects in it
Usage
combiStructureGenerator$copy()
Returns
cloned combiStructureGenerator object
Method branch_evol()
Simulate CpG dinucleotide methylation state evolution along a tree branch. The function samples the IWE events on the tree branch and simulates the evolution through the SSE and IWE processes.
Usage
combiStructureGenerator$branch_evol(branch_length, dt, testing = FALSE)
Arguments
branch_lengthLength of the branch.
dtLength of SSE time steps.
testingDefault FALSE. TRUE for testing purposes.
Details
It handles both cases where IWE events are sampled or not sampled within the branch.
Returns
Default NULL. If testing = TRUE it returns information for testing purposes.
Method get_highest_rate()
Public Method. Gets the highest rate among all singleStructureGenerator objects for CFTP.
Usage
combiStructureGenerator$get_highest_rate()
Returns
Highest rate value.
Method set_CFTP_info()
Public Method. Sets a cftpStepGenerator instance asthe CFTP info.
Usage
combiStructureGenerator$set_CFTP_info(CFTP_instance)
Arguments
CFTP_instanceCFTP info.
Returns
NULL
Method get_CFTP_info()
Public Method. Gets the CFTP info.
Usage
combiStructureGenerator$get_CFTP_info()
Returns
CFTP info.
Method cftp_apply_events()
Public Method. Applies the CFTP events.
Usage
combiStructureGenerator$cftp_apply_events(testing = FALSE)
Arguments
testingdefault FALSE. TRUE for testing output
Returns
NULL when testing FALSE. Testing output when testing TRUE.
Method cftp()
Public Method. Applies the CFTP algorithm to evolve a structure and checks for convergence by comparing methylation states.
This method generates CFTP steps until the methylation sequences of the current structure and a cloned structure become identical across all singleStr instances or a step limit is reached. If the step limit is exceeded, an approximation method is applied to finalize the sequence.
Usage
combiStructureGenerator$cftp( steps = 10000, step_limit = 327680000, testing = FALSE )
Arguments
stepsminimum number of steps to apply (default 10000).
step_limitmaximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB). If this limit is reached, the algorithm stops and an approximation is applied.
testinglogical. If TRUE, returns additional testing output including the current structure, the cloned structure, the counter value, total steps, and the chosen site for the CFTP events. Default is FALSE.
Returns
NULL when testing is FALSE. If testing is TRUE, returns a list with:
-
self: the current object after applying the CFTP algorithm. -
combi_m: a deep cloned object with applied CFTP events. -
counter: the number of iterations performed. -
total_steps: the number of steps applied by the CFTP algorithm. -
CFTP_chosen_site: the site selected during the CFTP event application.
Method clone()
The objects of this class are cloneable with this method.
Usage
combiStructureGenerator$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
Compare Methylation Frequencies Between Two Tips
Description
Performs a chi-squared test to compare the distribution of methylation states
(unmethylated 0, partially-methylated 0.5, and methylated 1)
between two cherry tips. A cherry is a pair of leaf nodes (also called tips or terminal nodes)
in a phylogenetic tree that share a direct common ancestor.
Usage
compare_CherryFreqs(tip1, tip2, testing = FALSE)
Arguments
tip1 |
A numeric vector representing methylation states ( |
tip2 |
A numeric vector representing methylation states ( |
testing |
Logical; if |
Details
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
Value
If testing = TRUE, returns a list with the contingency table and chi-squared test results.
Otherwise, returns the p-value of the test.
Examples
tip1 <- c(0, 0, 1, 0.5, 1, 0.5)
tip2 <- c(0, 1, 1, 0, 0.5, 0.5)
compare_CherryFreqs(tip1, tip2)
Compute Fitch Parsimony for Global Methylation States at CpG Islands
Description
This function categorizes CpG islands into methylation states and applies Fitch parsimony to estimate the minimum number of state changes in a phylogenetic tree.
Usage
computeFitch_islandGlbSt(
index_islands,
data,
tree,
u_threshold,
m_threshold,
testing = FALSE
)
Arguments
index_islands |
A numeric vector specifying the indices of genomic structures corresponding to islands. |
data |
A list containing methylation states at tree tips, structured as |
tree |
A rooted binary tree in Newick format (character string) or as an |
u_threshold |
A numeric threshold value (0-1) defining the unmethylated category. |
m_threshold |
A numeric threshold value (0-1) defining the methylated category. |
testing |
Logical; if |
Details
The function first validates the input data and categorizes CpG islands using categorize_islandGlbSt.
It then structures the data into a matrix matching tree tip labels and applies compute_fitch
to infer the minimum number of changes.
Value
If testing = TRUE, returns a list containing the categorized data matrix; otherwise,
returns a numeric vector of minimum state changes.
Examples
tree <- "((a:1,b:1):2,(c:2,d:2):1.5);"
data <- list(
list(rep(1,10), rep(0,5), rep(1,8)),
list(rep(1,10), rep(0.5,5), rep(0,8))
)
index_islands <- c(1,3)
computeFitch_islandGlbSt(index_islands, data, tree, 0.2, 0.6)
Compute Fitch Parsimony for Methylation Categories
Description
This function applies Fitch parsimony to determine the minimum number of changes required for methylation categories at tree tips.
Usage
compute_fitch(tree, meth, input_control = TRUE)
Arguments
tree |
A rooted binary tree in Newick format (character string) or as an |
meth |
A matrix of methylation categories at the tree tips, with rows corresponding to tips (names matching tree tip labels) and columns corresponding to sites or structures. |
input_control |
Logical; if |
Value
A list containing:
optStateSetA list of sets of optimal states for the root at each site/structure.
minChange_numberA numeric vector indicating the minimum number of changes for each site.
Examples
tree <- "((a:1,b:1):2,(c:2,d:2):1.5);"
meth <- matrix(c("u", "m", "p", "u", "p", "m", "m", "u"),
nrow=4, byrow=TRUE, dimnames=list(c("a", "b", "c", "d")))
compute_fitch(tree, meth)
Compute the Mean Correlation of Methylation State in Islands
Description
This function calculates the mean correlation of methylation states within island structures, allowing to exclude the shores.
Usage
compute_meanCor_i(
index_islands,
minN_CpG,
shore_length,
data,
sample_n,
categorized_data = FALSE
)
Arguments
index_islands |
A vector containing the structural indices for islands. |
minN_CpG |
The minimum number of central CpGs required for computation. |
shore_length |
The number of CpGs at each side of an island to exclude (shores). |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Details
The function processes only islands with a minimum length equal to 2 * shore_length + minN_CpG.
If none has minimum length, returns NA
Value
A numeric value representing the mean correlation of methylation states in the central CpGs of islands.
Examples
# Example usage:
index_islands <- c(1, 2)
data <- list(
list(c(0, 1, 0.5, 1, 0.5, 0), c(0.5, 0.5, 1, 1, 0, 0)), # tip 1
list(c(1, 0, 1, 1, 0.5, 0), c(1, 1, 0.5, 0.5, 0, 1)) # tip 2
)
minN_CpG <- 2
shore_length <- 1
sample_n <- 2
compute_meanCor_i(index_islands, minN_CpG, shore_length, data, sample_n,
categorized_data = TRUE)
Compute the Mean Correlation of Methylation State in Non-islands
Description
This function calculates the mean correlation of methylation states within non-island structures, allowing to exclude the shores.
Usage
compute_meanCor_ni(
index_nonislands,
minN_CpG,
shore_length,
data,
sample_n,
categorized_data = FALSE
)
Arguments
index_nonislands |
A vector containing the structural indices for non-islands. |
minN_CpG |
The minimum number of central CpGs required for computation. |
shore_length |
The number of CpGs at each side of an non-island to exclude (shores). |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Details
The function processes only non-islands with a minimum length equal to 2 * shore_length + minN_CpG.
If none has minimum length, returns NA
Value
A numeric value representing the mean correlation of methylation states in the central CpGs of non-islands.
Examples
# Example usage:
index_nonislands <- c(1, 2)
data <- list(
list(c(0, 1, 0.5, 1, 0.5, 0), c(0.5, 0.5, 1, 1, 0, 0)), # tip 1
list(c(1, 0, 1, 1, 0.5, 0), c(1, 1, 0.5, 0.5, 0, 1)) # tip 2
)
minN_CpG <- 2
shore_length <- 1
sample_n <- 2
compute_meanCor_ni(index_nonislands, minN_CpG, shore_length, data, sample_n,
categorized_data = TRUE)
Count Methylation Differences Between Cherry Pairs
Description
This function calculates the number of methylation differences between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes that share a direct common ancestor. The function quantifies full and half methylation differences for each genomic structure (e.g., island/non-island) across all sites.
Usage
countSites_cherryMethDiff(
cherryDist,
data,
categorized_data = FALSE,
input_control = TRUE
)
Arguments
cherryDist |
A data frame containing pairwise distances between the tips of a phylogenetic tree that form cherries.
This should be as the output of
|
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
input_control |
A logical value indicating whether to validate the input data.
If |
Details
The function first verifies that cherryDist contains the required columns and has at least one row.
It also ensures that data contains a sufficient number of tips and that all structures have the same number of sites.
The function then iterates over each cherry and genomic structure to compute the number of full and half methylation differences
between the two tips of each cherry.
Value
A data frame with one row per cherry, containing the following columns:
- tip_names
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
- tip_indices
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
- dist
A numeric value representing the sum of the branch distances between the cherry tips.
- One column for each structure named with the structure number followed by _f
An integer count of the sites with a full methylation difference (where one tip is methylated and the other is unmethylated) for the given structure.
- One column for each structure named with the structure number followed by _h
An integer count of the sites with a half methylation difference (where one tip is partially methylated and the other is either fully methylated or unmethylated) for the given structure.
Examples
# Example data setup
data <- list(
list(c(0, 1, 0.5, 0), c(1, 1, 0, 0.5)),
list(c(1, 0, 0.5, 1), c(0, 1, 0.5, 0.5))
)
tree <- "(tip1:0.25, tip2:0.25);"
cherryDist <- get_cherryDist(tree)
countSites_cherryMethDiff(cherryDist, data, categorized_data = TRUE)
Count Methylation States
Description
This internal function counts the number of sites with unmethylated, partially-methylated, and methylated states in a given vector.
Usage
count_upm(data)
Arguments
data |
A numeric vector with methylation values: |
Value
An integer vector of length 3 containing counts of unmethylated, partially-methylated, and methylated sites, respectively.
Compute Methylation Frequency Differences Between Cherry Pairs
Description
This function calculates the frequency of methylation differences between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes that share a direct common ancestor. The function quantifies full and half methylation differences for each genomic structure (e.g., island/non-island) across all sites and normalizes these counts by the number of sites per structure to obtain frequencies.
Usage
freqSites_cherryMethDiff(
tree,
data,
categorized_data = FALSE,
input_control = TRUE
)
Arguments
tree |
A phylogenetic tree object. The function assumes it follows an appropriate format for downstream processing. |
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
input_control |
A logical value indicating whether to validate the input data.
If |
Details
The function first validates the tree structure and extracts pairwise distances between cherry tips.
It then counts methylation differences using countSites_cherryMethDiff and normalizes these counts by the number
of sites per structure to compute frequencies. The resulting data frame provides a per-cherry frequency
of methylation differences (half or full difference) across different genomic structures.
Value
A data frame with one row per cherry, containing the following columns:
- tip_names
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
- tip_indices
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
- dist
A numeric value representing the sum of the branch distances between the cherry tips.
- One column for each structure named with the structure number followed by _f
A numeric value representing the frequency of sites with a full methylation difference (where one tip is methylated and the other is unmethylated) for the given structure.
- One column for each structure named with the structure number followed by _h
A numeric value representing the frequency of sites with a half methylation difference (where one tip is partially methylated and the other is either fully methylated or unmethylated) for the given structure.
Examples
# Example data setup
data <- list(
list(rep(1,10), rep(0,5), rep(1,8)),
list(rep(1,10), rep(0.5,5), rep(0,8)),
list(rep(1,10), rep(0.5,5), rep(0,8)),
list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6))))
tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);"
freqSites_cherryMethDiff(tree, data, categorized_data = TRUE)
Get Cherry Pair Distances from a Phylogenetic Tree
Description
This function computes the pairwise distances between the tips of a phylogenetic tree that are part of cherries. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor. In other words, if two leaves are connected to the same internal node and no other leaves are connected to that internal node, they form a cherry. The distance is calculated as the sum of the branch lengths between the two cherry tips.
Usage
get_cherryDist(tree, input_control = TRUE)
Arguments
tree |
A tree in Newick format (as a character string) or an object of class |
input_control |
A logical value indicating whether to validate the input tree.
If |
Details
The function first checks if the input is either a character string in the Newick format or an object of class phylo,
unless input_control is set to FALSE. It then computes the pairwise distances between the tips in the tree and
identifies the sister pairs (cherries). The distance between each cherry is the sum of the branch lengths leading to the sister tips.
The tips of each cherry are identified by their names and indices.
The tip indices correspond to (a) the index from left to right on the Newick string,
(b) the order of the tip label in the phylo_object$tip.label, and
(c) the index in the methylation data list (data[[tip]][[structure]]) as obtained with the function simulate_evolData() when the given tree has several tips.
If the tree is provided in Newick format, it will be parsed using the ape::read.tree function.
Value
A data frame with five columns:
first_tip_name |
A character string representing the name of the first tip in the cherry. |
second_tip_name |
A character string representing the name of the second tip in the cherry. |
first_tip_index |
An integer representing the index of the first tip in the cherry. |
second_tip_index |
An integer representing the index of the second tip in the cherry. |
dist |
A numeric value representing the sum of the branch lengths between the two tips (i.e., the distance between the cherries). |
Examples
# Example of a tree in Newick format
newick_tree <- "((a:1,b:2):5,c:6);"
get_cherryDist(newick_tree)
# Example of using a phylo object from ape
library(ape)
tree_phylo <- read.tree(text = "((a:1,b:1):5,c:6);")
get_cherryDist(tree_phylo)
Calculate the Mean Frequency of Methylated Sites in Islands
Description
This function computes the mean frequency of methylated sites (with methylation state 1) for a set of structures identified as islands.
Usage
get_islandMeanFreqM(index_islands, data, sample_n, categorized_data = FALSE)
Arguments
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean frequency of methylated sites in the islands.
Examples
# Example usage:
index_islands <- c(1, 3)
data <- list(
list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1
list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2
)
sample_n <- 2
get_islandMeanFreqM(index_islands, data, sample_n, categorized_data = TRUE)
Calculate the Mean Frequency of Partially Methylated Sites in Islands
Description
This function computes the mean frequency of partially methylated sites (with methylation state 0.5) for the set of genomic structures identified as islands.
Usage
get_islandMeanFreqP(index_islands, data, sample_n, categorized_data = FALSE)
Arguments
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean frequency of partially methylated sites in the islands.
Examples
# Example usage:
index_islands <- c(1, 3)
data <- list(
list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1
list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2
)
sample_n <- 2
get_islandMeanFreqP(index_islands, data, sample_n, categorized_data = TRUE)
Calculate the Mean Standard Deviation of Methylated Sites in Islands
Description
This function computes the mean standard deviation of methylated sites (with methylation state 1) for a set of genomic structures identified as islands.
Usage
get_islandSDFreqM(index_islands, data, sample_n, categorized_data = FALSE)
Arguments
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean standard deviation of methylated sites in the islands.
Examples
# Example usage:
index_islands <- c(1, 3)
data <- list(
list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1
list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2
)
sample_n <- 2
get_islandSDFreqM(index_islands, data, sample_n, categorized_data = TRUE)
Calculate the Mean Standard Deviation of Partially Methylated Sites in Islands
Description
This function computes the mean standard deviation of partially methylated sites (with methylation state 0.5) for a set of genomic structures identified as islands.
Usage
get_islandSDFreqP(index_islands, data, sample_n, categorized_data = FALSE)
Arguments
index_islands |
A vector containing the structural indices for islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean standard deviation of partially methylated sites in the islands.
Examples
# Example usage:
index_islands <- c(1, 3)
data <- list(
list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1
list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2
)
sample_n <- 2
get_islandSDFreqP(index_islands, data, sample_n, categorized_data = TRUE)
Compute the Mean Methylation of CpG Islands
Description
This function calculates the mean methylation level for CpG islands across all tree tips.
Usage
get_meanMeth_islands(index_islands, data)
Arguments
index_islands |
A numeric vector specifying the indices of genomic structures corresponding to islands. |
data |
A list containing methylation states at tree tips for each genomic structure
(e.g., island/non-island). The data should be structured as |
Value
A list where each element corresponds to a tree tip and contains a numeric vector representing the mean methylation levels for the indexed CpG islands.
Examples
# Example data setup
data <- list(
# Tip 1
list(rep(1,10), rep(0,5), rep(1,8)),
# Tip 2
list(rep(1,10), rep(0.5,5), rep(0,8))
)
index_islands <- c(1,3)
get_meanMeth_islands(index_islands, data)
Calculate the Mean Frequency of Methylated Sites in Non-Islands
Description
This function computes the mean frequency of methylated sites (with methylation state 1) for a set of structures identified as non-islands.
Usage
get_nonislandMeanFreqM(
index_nonislands,
data,
sample_n,
categorized_data = FALSE
)
Arguments
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean frequency of methylated sites in the non-islands.
Examples
# Example usage:
index_nonislands <- c(1, 3)
data <- list(
list(c(1, 0, 1), c(0.5, 1, 1), c(1, 0, 0.5)), # tip 1
list(c(1, 0.5, 1), c(0.5, 1, 1), c(1, 0.5, 0.5)) # tip 2
)
sample_n <- 2
get_nonislandMeanFreqM(index_nonislands, data, sample_n, categorized_data = TRUE)
Calculate the Mean Frequency of Partially Methylated Sites in Non-Islands
Description
This function computes the mean frequency of partially methylated sites (with methylation state 0.5) for a set of genomic structures identified as non-islands.
Usage
get_nonislandMeanFreqP(
index_nonislands,
data,
sample_n,
categorized_data = FALSE
)
Arguments
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean frequency of partially methylated sites in the non-islands.
Examples
# Example usage:
index_nonislands <- c(1, 3)
data <- list(
list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1
list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2
)
sample_n <- 2
get_nonislandMeanFreqP(index_nonislands, data, sample_n, categorized_data = TRUE)
Calculate the Mean Standard Deviation of Methylated Sites in Non-Islands
Description
This function computes the mean standard deviation of methylated sites (with methylation state 1) for a set of genomic structures identified as non-islands.
Usage
get_nonislandSDFreqM(
index_nonislands,
data,
sample_n,
categorized_data = FALSE
)
Arguments
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of tips (samples) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean standard deviation of methylated sites in the non-islands.
Examples
# Example usage:
index_nonislands <- c(1, 3)
data <- list(
list(c(1, 1, 1), c(0, 1, 0.5), c(1, 0, 1)), # tip 1
list(c(1, 0.5, 0), c(1, 1, 0.5), c(1, 1, 1)) # tip 2
)
sample_n <- 2
get_nonislandSDFreqM(index_nonislands, data, sample_n, categorized_data = TRUE)
Calculate the Mean Standard Deviation of Partially Methylated Sites in Non-Islands
Description
This function computes the mean standard deviation of partially methylated sites (with methylation state 0.5) for a set of genomic structures identified as non-islands.
Usage
get_nonislandSDFreqP(
index_nonislands,
data,
sample_n,
categorized_data = FALSE
)
Arguments
index_nonislands |
A vector containing the structural indices for non-islands. |
data |
A list containing methylation states at tree tips for each genomic structure (island / non-island)
For a single tip: |
sample_n |
The number of samples (tips) to process. |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Value
A numeric value representing the mean standard deviation of partially methylated sites in the non-islands.
Examples
# Example usage:
index_nonislands <- c(1, 3)
data <- list(
list(c(0.5, 1, 0.5), c(0, 0.5, 1), c(1, 0, 0.5)), # tip 1
list(c(0.5, 0.5, 0), c(1, 0.5, 0.5), c(0.5, 0.5, 1)) # tip 2
)
sample_n <- 2
get_nonislandSDFreqP(index_nonislands, data, sample_n, categorized_data = TRUE)
Get Default Parameter Values
Description
This function retrieves parameter values for the DNA methylation simulation.
Usage
get_parameterValues(rootData = NULL)
Arguments
rootData |
NULL to return default parameter values. For data parameter values, provide rootData as the output of simulate_initialData()$data. |
Details
The function called without arguments returns default parameter values. When rootData (as $data output of simulate_initialData()) is given, it returns data parameter values.
Value
A data frame containing default parameter values.
Examples
# Get default parameter values
default_values <- get_parameterValues()
# Get parameter values of simulate_initialData() output
custom_params <- get_parameterValues()
infoStr <- data.frame(n = c(5, 10), globalState = c("M", "U"))
rootData <- simulate_initialData(infoStr = infoStr, params = custom_params)$data
rootData_paramValues <- get_parameterValues(rootData = rootData)
Compute Site Frequency of Methylation Changes per Cherry
Description
This function calculates the total frequency of methylation differences (both full and half changes) for each genomic structure for each cherry in a phylogenetic tree. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor. In other words, if two leaves are connected to the same internal node and no other leaves are connected to that internal node, they form a cherry.
Usage
get_siteFChange_cherry(tree, data, categorized_data = FALSE)
Arguments
tree |
A phylogenetic tree in Newick format or a phylo object from the ape package. The function ensures the tree has a valid structure and at least two tips. |
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
Details
The function first verifies that tree and data have valid structures and the minimum number of tips.
It then extracts per-cherry methylation differences using freqSites_cherryMethDiff, handling potential errors.
Finally, it aggregates the full and half methylation differences for each genomic structure at each cherry.
Value
A data frame with one row per cherry, containing the following columns:
- tip_names
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
- tip_indices
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
- dist
A numeric value representing the sum of the branch distances between the cherry tips.
- One column for each structure named with the structure number
A numeric value representing the total frequency of methylation changes (both full and half) for the given structure.
Examples
# Example data setup
data <- list(
list(rep(1,10), rep(0,5), rep(1,8)),
list(rep(1,10), rep(0.5,5), rep(0,8)),
list(rep(1,10), rep(0.5,5), rep(0,8)),
list(c(rep(0,5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6))))
tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);"
get_siteFChange_cherry(tree, data, categorized_data = TRUE)
Mean Number of Significant Methylation Frequency Changes per Island in Cherries
Description
Computes the mean number of significant changes per island in phylogenetic tree cherries, based on a specified p-value threshold.
Usage
mean_CherryFreqsChange_i(
data,
categorized_data = FALSE,
index_islands,
tree,
pValue_threshold
)
Arguments
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
index_islands |
A numeric vector specifying the indices of islands to analyze. |
tree |
A rooted binary tree in Newick format (character string) or as an |
pValue_threshold |
A numeric value between 0 and 1 that serves as the threshold for statistical significance in the chi-squared test. |
Details
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
Value
A data frame containing the same information as pValue_CherryFreqsChange_i,
but with additional columns indicating whether p-values are below the threshold (significant changes)
and the mean frequency of significant changes per island.
Examples
tree <- "((d:1,e:1):2,a:2);"
data <- list(
#Tip 1
list(c(rep(1,9), rep(0,1)),
c(rep(0,9), 1),
c(rep(0,9), rep(0.5,1))),
#Tip 2
list(c(rep(0,9), rep(0.5,1)),
c(rep(0.5,9), 1),
c(rep(1,9), rep(0,1))),
#Tip 3
list(c(rep(1,9), rep(0.5,1)),
c(rep(0.5,9), 1),
c(rep(0,9), rep(0.5,1))))
index_islands <- c(1,3)
mean_CherryFreqsChange_i(data, categorized_data = TRUE,
index_islands, tree, pValue_threshold = 0.05)
Mean Number of Significant Frequency Changes per Island Across all Tree Tips
Description
This function analyzes the frequency changes of methylation states (unmethylated, partially methylated, methylated) across tree tips for a given set of islands. It performs a chi-squared test for each island to check for significant changes in frequencies across tips and returns the proportion of islands showing significant changes.
Usage
mean_TreeFreqsChange_i(
tree,
data,
categorized_data = FALSE,
index_islands,
pValue_threshold,
testing = FALSE
)
Arguments
tree |
A phylogenetic tree object, typically of class |
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
index_islands |
A vector of indices of genomic structures corresponding to islands in data. |
pValue_threshold |
A numeric value between 0 and 1 that serves as the threshold for statistical significance in the chi-squared test. |
testing |
Logical defaulted to FALSE. TRUE for testing output. |
Details
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
Throws errors if:
The
treeis not valid.-
datais not structured correctly across tips. -
index_islandsis empty. -
pValue_thresholdis not between 0 and 1.
Value
A numeric value representing the mean proportion of islands with significant frequency changes across tips.
Examples
# Example of usage:
tree <- "((d:1,e:1):2,a:2);"
data <- list(
#Tip 1
list(c(rep(1,9), rep(0,1)),
c(rep(0,9), 1),
c(rep(0,9), rep(0.5,1))),
#Tip 2
list(c(rep(1,9), rep(0.5,1)),
c(rep(0.5,9), 1),
c(rep(1,9), rep(0,1))),
#Tip 3
list(c(rep(1,9), rep(0.5,1)),
c(rep(0.5,9), 1),
c(rep(0,9), rep(0.5,1))))
index_islands <- c(1,3)
mean_TreeFreqsChange_i(tree,
data, categorized_data = TRUE,
index_islands,
pValue_threshold = 0.05)
Compute p-Values for Methylation Frequency Changes in Cherries
Description
Calculates p-values for changes in methylation frequency between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes (also called tips or terminal nodes) in a phylogenetic tree that share a direct common ancestor.
Usage
pValue_CherryFreqsChange_i(
data,
categorized_data = FALSE,
index_islands,
tree,
input_control = TRUE
)
Arguments
data |
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as |
categorized_data |
Logical defaulted to FALSE. TRUE to skip redundant categorization when methylation states are represented as 0, 0.5, and 1. |
index_islands |
A numeric vector specifying the indices of islands to analyze. |
tree |
A rooted binary tree in Newick format (character string) or as an |
input_control |
Logical; if |
Details
The function uses simulate.p.value = TRUE in chisq.test
to compute the p-value via Monte Carlo simulation to improve reliability
regardless of whether the expected frequencies meet the assumptions of the chi-squared test
(i.e., expected counts of at least 5 in each category).
Value
A data frame containing tip pair information (first tip name, second tip name, first tip index, second tip index, distance) and one column per island with the p-values from the chi-squared tests.
Examples
# Example with hypothetical tree and data structure
tree <- "((d:1,e:1):2,a:2);"
data <- list(
#Tip 1
list(c(rep(1,9), rep(0,1)),
c(rep(0,9), 1),
c(rep(0,9), rep(0.5,1))),
#Tip 2
list(c(rep(0,9), rep(0.5,1)),
c(rep(0.5,9), 1),
c(rep(1,9), rep(0,1))),
#Tip 3
list(c(rep(1,9), rep(0.5,1)),
c(rep(0.5,9), 1),
c(rep(0,9), rep(0.5,1))))
index_islands <- c(1,3)
pValue_CherryFreqsChange_i(data, categorized_data = TRUE, index_islands, tree)
Simulate Data Evolution along a Tree
Description
This function simulates methylation data evolution along a tree. Either by simulating data at the root of the provided evolutionary tree (if infoStr is given) or by using pre-existing data at the root (if rootData is given) and letting it evolve along the tree.
Usage
simulate_evolData(
infoStr = NULL,
rootData = NULL,
tree = NULL,
params = NULL,
dt = 0.01,
CFTP = FALSE,
CFTP_step_limit = 327680000,
n_rep = 1,
only_tip = TRUE
)
Arguments
infoStr |
A data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If customized initial equilibrium frequencies are given, it also contains columns 'u_eqFreq', 'p_eqFreq', and 'm_eqFreq' with the equilibrium frequency values for unmethylated, partially methylated, and methylated. |
rootData |
The output of the simulate_initialData()$data function. It represents the initial data at the root of the evolutionary tree. |
tree |
A string in Newick format representing the evolutionary tree. |
params |
Optional data frame with specific parameter values. Structure as in get_parameterValues() output. If not provided, default values will be used. |
dt |
Length of time step for Gillespie's Tau-Leap Approximation (default is 0.01). |
CFTP |
Default FALSE. TRUE for calling cftp algorithm to set root state according to model equilibrium (Note that current implementation neglects IWE process). |
CFTP_step_limit |
when CFTP = TRUE, maximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB). |
n_rep |
Number of replicates to simulate (default is 1). |
only_tip |
Logical indicating whether to extract data only for tips (default is TRUE, FALSE to extract the information for all the tree branches). |
Value
A list containing the parameters used ($params), the length of the time step used for the Gillespie's tau-leap approximation ($dt, default 0.01), the tree used ($tree).
simulated data and the simulated data ($data). In $data, each list element corresponds to a simulation replicate.
If only_tip is TRUE: In
$data, each list element corresponds to a simulation replicate. Each replicate includes one list per tree tip, each containing:The name of each tip in the simulated tree (e.g. replicate 2, tip 1:
$data[[2]][[1]]$name).A list with the sequence of methylation states for each tip-specific structure (e.g. replicate 1, tip 2, 3rd structure:
$data[[1]][[2]]$seq[[3]]. The methylation states are encoded as 0 for unmethylated, 0.5 for partially methylated, and 1 for methylated.
If only_tip is FALSE,
$datacontains 2 lists:-
$data$branchInTree: a list in which each element contains the information of the relationship with other branches:Index of the parent branch (e.g. branch 2):
$data$branchInTree[[2]]$parent_index)Index(es) of the offspring branch(es) (e.g. branch 1 (root)):
$data$branchInTree[[1]]$offspring_index)
-
$data$sim_data: A list containing simulated data. Each list element corresponds to a simulation replicate. Each replicate includes one list per tree branch, each containing:The name of each branch in the simulated tree. It's NULL for the tree root and inner nodes, and the name of the tips for the tree tips. (e.g. replicate 2, branch 1:
$data$sim_data[[2]][[1]]$name)Information of IWE events on that branch. It's NULL for the tree root and FALSE for the branches in which no IWE event was sampled, and a list containing
$islandswith the index(ces) of the island structure(s) that went through the IWE event and$timesfor the branch time point(s ) in which the IWE was sampled. (e.g. replicate 1, branch 3:$data$sim_data[[1]][[3]]$IWE)A list with the sequence of methylation states for each structure (the index of the list corresponds to the index of the structures). The methylation states are encoded as 0 for unmethylated, 0.5 for partially methylated, and 1 for methylated. (e.g. replicate 3, branch 2, structure 1:
$data$sim_data[[3]][[2]]$seq[[1]])A list with the methylation equilibrium frequencies for each structure (the index of the list corresponds to the index of the structures). Each structure has a vector with 3 values, the first one corresponding to the frequency of unmethylated, the second one to the frequency of partially methylated, and the third one to the frequency of methylated CpGs. (e.g. replicate 3, branch 2, structure 1:
$data$sim_data[[3]][[2]]$eqFreqs[[1]])
-
Examples
# Example data
infoStr <- data.frame(n = c(10, 100, 10), globalState = c("M", "U", "M"))
# Simulate data evolution along a tree with default parameters
simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);")
# Simulate data evolution along a tree with custom parameters
custom_params <- get_parameterValues()
custom_params$iota <- 0.5
simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);", params = custom_params)
Simulate Initial Data
Description
This function simulates initial data based on the provided information and parameters.
Usage
simulate_initialData(
infoStr,
params = NULL,
CFTP = FALSE,
CFTP_step_limit = 327680000
)
Arguments
infoStr |
A data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If customized equilibrium frequencies are given, it also contains columns 'u_eqFreq', 'p_eqFreq' and 'm_eqFreq' with the equilibrium frequency values for unmethylated, partially methylated and methylated. |
params |
Optional data frame with specific parameter values. |
CFTP |
Default FALSE. TRUE for calling cftp algorithm to set root state according to model equilibrium (Note that current implementation neglects IWE process). Structure as in get_parameterValues() output. If not provided, default values will be used. |
CFTP_step_limit |
when CFTP = TRUE, maximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB). |
Details
The function performs several checks on the input data and parameters to ensure they meet the required criteria and simulates DNA methylation data.
Value
A list containing the simulated data ($data) and parameters ($params).
Examples
# Example data
infoStr <- data.frame(n = c(10, 100, 10), globalState = c("M", "U", "M"))
# Simulate initial data with default parameters
simulate_initialData(infoStr = infoStr)
# Simulate data evolution along a tree with custom parameters
custom_params <- get_parameterValues()
custom_params$iota <- 0.5
simulate_initialData(infoStr = infoStr, params = custom_params)
singleStructureGenerator
Description
an R6 class representing a single genomic structure
Public fields
testing_outputPublic attribute: Testing output for initialize
Methods
Public methods
Method init_neighbSt()
Public method: Initialization of $neighbSt
This fuction initiates each CpG position $neighbSt as encoded in $mapNeighbSt_matrix
It uses $update_neighbSt which updates for each sequence index, the neighbSt of left and right neighbors This means that it updates position 2, then 1 and 3, then 2 and 4.. Therefore, if the combiStructure instance has several singleStr instances within and the first has length 1, the $neighbSt of that position of the first singleStr instance is initialized when the method is called from the second singleStr instance
Positions at the edge of the entire simulated sequence use their only neighbor as both neighbors.
Usage
singleStructureGenerator$init_neighbSt()
Returns
NULL
Method initialize_ratetree()
Public method: Initialization of $ratetree
This function initializes $ratetree
Usage
singleStructureGenerator$initialize_ratetree()
Returns
NULL
Method new()
Create a new singleStructureGenerator object.
Note that this object is typically generated withing a combiStructureGenerator object.
Usage
singleStructureGenerator$new( globalState, n, eqFreqs = NULL, combiStr = NULL, combiStr_index = NULL, params = NULL, testing = FALSE )
Arguments
globalStateCharacter. Structure's favored global state: "M" for methylated (island structures) / "U" for unmethylated (non-island structures).
nNumerical Value. Number of CpG positions
eqFreqsDefault NULL. When given: numerical vector with structure's methylation state equilibrium frequencies (for unmethylated, partially methylated and methylated)
combiStrDefault NULL. When initiated from combiStructureGenerator: object of class combiStructureGenerator containing it
combiStr_indexDefault NULL. When initiated from combiStructureGenerator: index in Object of class combiStructureGenerator
paramsDefault NULL. When given: data frame containing model parameters
testingDefault FALSE. TRUE for writing in public field of new instance $testing_output
Returns
A new singleStructureGenerator object.
Method set_myCombiStructure()
Public method: Set my_combiStructure. Assigns given combi instance to private field my_combiStructure
Usage
singleStructureGenerator$set_myCombiStructure(combi)
Arguments
combiinstance of combiStructureGenerator
Returns
NULL
Method get_seq()
Public method: Get object's methylation state sequence
Encoded with 1 for unmethylated, 2 for partially methylated and 3 for methylated
Usage
singleStructureGenerator$get_seq()
Returns
vector with equilibrium frequencies of unmethylated, partially methylated and methylated
Method get_seqFirstPos()
Public method: Get first sequence position methylation state
Usage
singleStructureGenerator$get_seqFirstPos()
Returns
numerical encoding of first position's methylation state
Method get_seq2ndPos()
Public method: Get second sequence position methylation state
Usage
singleStructureGenerator$get_seq2ndPos()
Returns
numerical encoding of second position's methylation state. NULL if position does not exist
Method get_seqLastPos()
Public method: Get first sequence position methylation state
Usage
singleStructureGenerator$get_seqLastPos()
Returns
numerical encoding of first position's methylation state
Method get_seq2ndButLastPos()
Public method: Get second but last sequence position methylation state
Usage
singleStructureGenerator$get_seq2ndButLastPos()
Returns
numerical encoding of second but last position's methylation state. NULL if position does not exist
Method get_combiStructure_index()
Public method: Get index in object of class combiStructureGenerator
Usage
singleStructureGenerator$get_combiStructure_index()
Returns
index in object of class combiStructureGenerator
Method update_interStr_firstNeighbSt()
Public method: Update neighbSt of next singleStructureGenerator object within combiStructureGenerator object
This function is used when the last $seq position of a singleStructureGenerator object changes methylation state to update the neighbSt position
Usage
singleStructureGenerator$update_interStr_firstNeighbSt( leftNeighb_seqSt, rightNeighb_seqSt )
Arguments
leftNeighb_seqSt$seq state of left neighbor (left neighbor is in previous singleStructureGenerator object)
rightNeighb_seqSt$seq state of right neighbor
Returns
NULL
Method update_interStr_lastNeighbSt()
Public method: Update neighbSt of previous singleStructureGenerator object within combiStructureGenerator object
Usage
singleStructureGenerator$update_interStr_lastNeighbSt( leftNeighb_seqSt, rightNeighb_seqSt )
Arguments
leftNeighb_seqSt$seq state of right neighbor (left neighbor is in next singleStructureGenerator object)
rightNeighb_seqSt$seq state of right neighbor
Returns
NULL
Method get_eqFreqs()
Public method: Get object's equilibrium Frequencies
Usage
singleStructureGenerator$get_eqFreqs()
Returns
vector with equilibrium frequencies of unmethylated, partially methylated and methylated
Method SSE_evol()
Public method. Simulate how CpG dinucleotide methylation state changes due to the SSE process along a time step of length dt
Usage
singleStructureGenerator$SSE_evol(dt, testing = FALSE)
Arguments
dttime step length.
testinglogical value for testing purposes. Default FALSE.
Returns
default NULL. If testing TRUE it returns a list with the debugNov3.outnumber of events sampled and a dataframe with the position(s) affected, new state and old methylation state.
Method get_transMat()
Public Method. Get a transition matrix
Usage
singleStructureGenerator$get_transMat( old_eqFreqs, new_eqFreqs, info, testing = FALSE )
Arguments
old_eqFreqsnumeric vector with 3 frequency values (for old u, p and m)
new_eqFreqsnumeric vector with 3 frequency values (for new u, p and m)
infocharacter string to indicate where the method is being called
testinglogical value for testing purposes. Default FALSE.
Details
Given a tripple of old equilibrium frequencies and new equilibrium frequencies, generates the corresponding transition matrix.
Returns
transMat. The transition matrix. If testing = TRUE it returns a list. If there was a change in the equilibrium frequencies the list contains the following 7 elements, if not it contains the first 3 elements:
transMattransition matrix
caseThe applied case.
Method IWE_evol()
Public Method. Simulate IWE Events
Simulates how CpG Islands' methylation state frequencies change and simultaneous sites change methylation state along a branch of length t according to the SSE-IWE model.
Usage
singleStructureGenerator$IWE_evol(testing = FALSE)
Arguments
testinglogical value for testing purposes. Default FALSE.
Details
The function checks if the methylation equilibrium frequencies (eqFreqs) and sequence observed
frequencies (obsFreqs) change after the IWE event. If there is a change in either
frequencies, the corresponding change flag eqFreqsChange
in the infoIWE list will be set to TRUE.
Returns
If testing = TRUE it returns a list. If there was a change in the equilibrium frequencies the list contains the following 7 elements, if not it contains the first 3 elements:
eqFreqsChangelogical indicating if there was a change in the equilibrium frequencies.
old_eqFreqsOriginal equilibrium frequencies before the IWE event.
new_eqFreqsNew equilibrium frequencies after the IWE event.
old_obsFreqsOriginal observed frequencies before the IWE event.
new_obsFreqsNew observed frequencies after the IWE event.
IWE_caseDescription of the IWE event case.
MkTransition matrix used for the IWE event.
Method get_alpha_pI()
Public Method.
Usage
singleStructureGenerator$get_alpha_pI()
Returns
Model parameter alpha_pI for sampling island equilibrium frequencies
Method get_beta_pI()
Public Method.
Usage
singleStructureGenerator$get_beta_pI()
Returns
Model parameter for sampling island equilibrium frequencies
Method get_alpha_mI()
Public Method.
Usage
singleStructureGenerator$get_alpha_mI()
Returns
Model parameter for sampling island equilibrium frequencies
Method get_beta_mI()
Public Method.
Usage
singleStructureGenerator$get_beta_mI()
Returns
Model parameter for sampling island equilibrium frequencies
Method get_alpha_pNI()
Public Method.
Usage
singleStructureGenerator$get_alpha_pNI()
Returns
Model parameter for sampling non-island equilibrium frequencies
Method get_beta_pNI()
Public Method.
Usage
singleStructureGenerator$get_beta_pNI()
Returns
Model parameter for sampling non-island equilibrium frequencies
Method get_alpha_mNI()
Public Method.
Usage
singleStructureGenerator$get_alpha_mNI()
Returns
Model parameter for sampling non-island equilibrium frequencies
Method get_beta_mNI()
Public Method.
Usage
singleStructureGenerator$get_beta_mNI()
Returns
Model parameter for sampling non-island equilibrium frequencies
Method get_alpha_Ri()
Public Method.
Usage
singleStructureGenerator$get_alpha_Ri()
Returns
Model parameter for gamma distribution shape to initialize the 3 $Ri_values
Method get_iota()
Public Method.
Usage
singleStructureGenerator$get_iota()
Returns
Model parameter for gamma distribution expected value to initialize the 3 $Ri_values
Method get_Ri_values()
Public Method.
Usage
singleStructureGenerator$get_Ri_values()
Returns
The 3 $Ri_values
Method get_Q()
Public Method.
Usage
singleStructureGenerator$get_Q( siteR = NULL, neighbSt = NULL, oldSt = NULL, newSt = NULL )
Arguments
siteRdefault NULL. Numerical value encoding for the sites rate of independent SSE (1, 2 or 3)
neighbStdefault NULL. Numerical value encoding for the sites neighbouring state (as in mapNeighbSt_matrix)
oldStdefault NULL. Numerical value encoding for the sites old methylation state (1, 2 or 3)
newStdefault NULL. Numerical value encoding for the sites new methylation state (1, 2 or 3)
Returns
With NULL arguments, the list of rate matrices. With non NULL arguments, the corresponding rate of change.
Method get_siteR()
Public Method.
Usage
singleStructureGenerator$get_siteR(index = NULL)
Arguments
indexdefault NULL. Numerical value for the index of the CpG position within the singleStr instance
Returns
with NULL arguments, siteR vector. non NULL arguments, the corresponding siteR
Method get_neighbSt()
Public Method.
Usage
singleStructureGenerator$get_neighbSt(index = NULL)
Arguments
indexdefault NULL. Numerical value for the index of the CpG position within the singleStr instance
Returns
with NULL arguments, neighbSt vector. non NULL arguments, the corresponding neighbSt
Method update_ratetree_otherStr()
Public Method. Update ratetree from another singleStructure instance
Usage
singleStructureGenerator$update_ratetree_otherStr(position, rate)
Arguments
positionNumerical value for the index of the CpG position within the singleStr instance
rateRate of change to asign to that position
Returns
NULL
Method get_Qi()
Public Method. Get list of matrices for SSE process
Usage
singleStructureGenerator$get_Qi(siteR = NULL, oldSt = NULL, newSt = NULL)
Arguments
siteRdefault NULL. Numerical value encoding for the sites rate of independent SSE (1, 2 or 3)
oldStdefault NULL. Numerical value encoding for the sites old methylation state (1, 2 or 3)
newStdefault NULL. Numerical value encoding for the sites new methylation state (1, 2 or 3)
Returns
With NULL arguments, the list of SSEi rate matrices. With non NULL arguments, the corresponding rate of change.
Method get_seqSt_leftneighb()
Public Method. Decode methylation state of left neighbor form owns neighbSt
Usage
singleStructureGenerator$get_seqSt_leftneighb(index)
Arguments
indexInteger index value for the CpG position within the singleStr instance
Returns
decoded methylation state ($seq) of left neighbor (1, 2 or 3 for unmethylated, partially methylated or methylated)
Method get_seqSt_rightneighb()
Public Method. Decode methylation state of left neighbor form owns neighbSt
Usage
singleStructureGenerator$get_seqSt_rightneighb(index)
Arguments
indexInteger index value for the CpG position within the singleStr instance
Returns
decoded methylation state ($seq) of right neighbor (1, 2 or 3 for unmethylated, partially methylated or methylated)
Method cftp_all_equal()
Public Method. Make a singleStructure with the same segment lengths and parameters as the focal one but where all states are m or u
Usage
singleStructureGenerator$cftp_all_equal(state, testing = FALSE)
Arguments
stateCharacter value "U" or "M"
testingdefault FALSE. TRUE for testing output
Returns
right neighbSt
Method set_seqSt_update_neighbSt()
Public Method. Set the methylation state of a sequence position and update the neighbor's neighbSt. It does NOT update RATETREE
Usage
singleStructureGenerator$set_seqSt_update_neighbSt( index, newSt, testing = FALSE )
Arguments
indexNumerical value for the index of the CpG position within the singleStr instance
newStNumerical value encoding for the sites new methylation state (1, 2 or 3)
testingdefault FALSE. TRUE for testing output
Returns
NULL when testing FALSE. Testing output when testing TRUE.
Method reset_seq()
Public Method. Resets the sequence states by resampling according to the instance's equilibrium frequencies.
Usage
singleStructureGenerator$reset_seq()
Returns
NULL. The sequence is updated in place.
Method clone()
The objects of this class are cloneable with this method.
Usage
singleStructureGenerator$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
treeMultiRegionSimulator
Description
an R6 class representing the methylation state of GpGs in different genomic structures in the nodes of a tree.
The whole CpG sequence is an object of class combiStructureGenerator. Each genomic structure in it is contained in an object of class singleStructureGenerator.
Public fields
testing_outputPublic attribute: Testing output for initialize
BranchPublic attribute: List containing objects of class combiStructureGenerator
branchLengthPublic attribute: Vector with the corresponding branch lengths of each $Branch element
Methods
Public methods
Method treeEvol()
Simulate CpG dinucleotide methylation state evolution along a tree. The function splits a given tree and simulates evolution along its branches. It recursively simulates evolution in all of the subtrees in the given tree until the tree leafs
Usage
treeMultiRegionSimulator$treeEvol( Tree, dt = 0.01, parent_index = 1, testing = FALSE )
Arguments
TreeString. Tree in Newick format. When called recursivelly it is given the corresponding subtree.
dtLength of SSE time steps.
parent_indexDefault 1. When called recursivelly it is given the corresponding parent branch index.
testingDefault FALSE. TRUE for testing purposes.
Returns
NULL
Method new()
Create a new treeMultiRegionSimulator object. $Branch is a list for the tree branches, its first element represents the tree root.
Note that one of either infoStr or rootData needs to be given. Not both, not neither.
Usage
treeMultiRegionSimulator$new( infoStr = NULL, rootData = NULL, tree = NULL, params = NULL, dt = 0.01, CFTP = FALSE, CFTP_step_limit = 327680000, testing = FALSE )
Arguments
infoStrA data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If initial equilibrium frequencies are given the dataframe must contain 3 additional columns: 'u_eqFreq', 'p_eqFreq' and 'm_eqFreq'
rootDatacombiStructureGenerator object. When given, the simulation uses its parameter values.
treetree
paramsDefault NULL. When given: data frame containing model parameters. Note that if rootData is not null, its parameter values are used.
dtlength of the dt time steps for the SSE evolutionary process
CFTPDefault FALSE. TRUE for calling cftp algorithm to set root state according to model equilibrium (Note that current implementation neglects IWE process).
CFTP_step_limitwhen CFTP = TRUE, maximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB).
testingDefault FALSE. TRUE for testing output.
Returns
A new treeMultiRegionSimulator object.
Method clone()
The objects of this class are cloneable with this method.
Usage
treeMultiRegionSimulator$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
Validate Data Structure Across Tips
Description
This function ensures that data follows the required nested structure
data[[tip]][[structure]], where:
-
datais a list of at least twotipelements. Each
tipis a list ofstructureelements.Each
structurecontains a numeric vector of equal length across all tips.
Usage
validate_dataAcrossTips(data)
Arguments
data |
A list structured as |
Details
Throws errors if:
-
datais not a list. It has fewer than two tips.
Any tip is not a list.
The number of structures is inconsistent across tips.
Any structure has zero-length data at any tip.
Structures have different site lengths across tips.
Validate Structure of Input Data for Cherry Distance Computation
Description
This function checks whether the provided input data has the required structure. It ensures that the number of tips is sufficient and that the data structure is consistent across tips and structures.
Usage
validate_data_cherryDist(cherryDist, data)
Arguments
cherryDist |
A data frame containing cherry pair distances, including tip indices (output from |
data |
A nested list representing structured data for each tip, following the format |
Details
The function performs several validation steps:
Ensures that the number of tips in
datais at least as large as the highest tip index incherryDist.Checks that all tips contain at least one structure and that the number of structures is consistent across tips.
Verifies that within each structure, all tips have the same number of sites and no zero-length structures.
If any of these conditions fail, the function throws an error with a descriptive message.
Validate Structure Indices for Island and Non-Island Data
Description
This function checks whether the provided indices for islands and non-islands
are within the valid range of structures in the dataset. It also warns if
any indices are present in both index_islands and index_nonislands.
Usage
validate_structureIndices(data, index_islands, index_nonislands)
Arguments
data |
A nested list |
index_islands |
An integer vector specifying indices that correspond to island structures. |
index_nonislands |
An integer vector specifying indices that correspond to non-island structures. |
Details
The funct@exportion performs the following checks:
Ensures that all indices in
index_islandsandindex_nonislandsare within the range of available structures.Throws an error if any index is out of bounds.
Issues a warning if the same index appears in both
index_islandsandindex_nonislands.
Value
No return value. The function stops execution if invalid indices are detected.
Validate and Parse a Phylogenetic Tree
Description
This function checks whether the input is a valid phylogenetic tree, either as a character string in Newick format
or as an object of class phylo from the ape package. If the input is a Newick string, it is parsed into
a phylo object. The function also ensures that the tree contains at least two tips.
Usage
validate_tree(tree)
Arguments
tree |
A phylogenetic tree in Newick format (as a character string) or an object of class
|
Details
The function first verifies that the input is either a valid
phyloobject or a character string.If the input is a Newick string, it attempts to parse it into a
phyloobject usingape::read.tree().If parsing fails, an informative error message is returned.
The function also checks that the tree contains at least two tips, as a valid phylogenetic tree should have at least one split.
Value
A phylo object representing the validated and parsed tree.