Introduction

The EmpiricalDynamics package provides a complete workflow for discovering differential equations from time series data. This approach, sometimes called “equation discovery” or “symbolic regression for dynamics,” allows researchers to uncover the mathematical laws governing observed phenomena.

The package implements a six-step workflow:

Preprocessing: Compute numerical derivatives from noisy time series
Exploration: Visual analysis to identify functional relationships
Symbolic Search: Discover equation structure automatically
Residual Analysis: Construct stochastic differential equations (SDEs)
Validation: Cross-validation, trajectory simulation, qualitative checks
Output: Generate publication-ready reports and LaTeX equations

Installation

# Install from local source
install.packages("EmpiricalDynamics", repos = NULL, type = "source")

# Or using devtools
devtools::install_github("your-repo/EmpiricalDynamics")

Quick Start Example

Let’s discover the equation governing logistic population growth.

Step 1: Load Data and Compute Derivatives

library(EmpiricalDynamics)

# Load example data
logistic_growth <- load_example_data("logistic_growth")

# View the data
head(logistic_growth)

# Compute derivative using Total Variation Regularization (recommended)
deriv <- compute_derivative(
  logistic_growth$Z,
  time = logistic_growth$time,
  method = "tvr",
  lambda = "auto"
)

# Add derivative to data
logistic_growth$dZ <- deriv$derivative

# Plot diagnostic
plot(deriv)

Step 2: Explore the Data

# Explore relationships between dZ/dt and Z
exploration <- explore_dynamics(
  data = logistic_growth,
  response = "dZ",
  predictors = "Z"
)

# View suggestions for functional forms
print(exploration$suggestions)

Step 3: Fit the Equation

Based on theory, logistic growth follows: \(\frac{dZ}{dt} = rZ(1 - Z/K)\)

# Fit the theoretical equation
equation <- fit_specified_equation(
  "r * Z * (1 - Z / K)",
  data = logistic_growth,
  derivative_col = "dZ",
  start = list(r = 0.5, K = 100)
)

# View results
summary(equation)
print(coef(equation))

Step 4: Validate the Model

# Run comprehensive validation
validation <- validate_model(
  equation = equation,
  data = logistic_growth,
  derivative_col = "dZ",
  variable = "Z",
  cv_folds = 5
)

print(validation)

Step 5: Generate Output

# Get LaTeX equation
latex_eq <- to_latex(equation, variable = "Z")
cat(latex_eq)

# Generate full report (using tempdir() per CRAN policy)
generate_report(
  results = list(
    data = logistic_growth,
    equation = equation,
    validation = validation
  ),
  output_file = file.path(tempdir(), "logistic_analysis"),
  format = "markdown"
)

Detailed Workflow

A. Numerical Differentiation

Computing reliable derivatives from noisy data is critical. The package offers several methods:

Total Variation Regularization (TVR)

Recommended for most applications, especially economic data with trends and structural breaks.

deriv_tvr <- compute_derivative(
  y = data$Z,
  time = data$time,
  method = "tvr",
  lambda = "auto"  # Automatic regularization selection
)

# Plot diagnostic: shows original, derivative, reconstruction, residuals
plot_tvr_diagnostic(deriv_tvr)

TVR solves the optimization problem: \[\min_{\dot{Z}} \left\| Z - \int \dot{Z} \, dt \right\|_2^2 + \lambda \|\Delta \dot{Z}\|_1\]

Other Methods

# Savitzky-Golay filter (preserves peaks, good for smooth data)
deriv_sg <- compute_derivative(y, time, method = "savgol", 
                                window = 11, order = 3)

# Smoothing spline (good for high noise)
deriv_spline <- compute_derivative(y, time, method = "spline", 
                                    spar = 0.7)

# Finite differences (simple, fast, sensitive to noise)
deriv_fd <- compute_derivative(y, time, method = "fd")

# Spectral (for periodic data ONLY - warns about Gibbs phenomenon otherwise)
deriv_spec <- compute_derivative(y, time, method = "spectral")

Automatic Method Selection

# Get recommendation based on data characteristics
suggestion <- suggest_differentiation_method(data$Z, data$time)
print(suggestion)

# Compare all methods visually
compare_differentiation_methods(data$Z, data$time)

B. Visual Exploration

Before formal fitting, explore the data visually:

# Comprehensive exploration
exploration <- explore_dynamics(
  data = data,
  response = "dZ",
  predictors = c("Z", "X", "Y")
)

# Individual plots
plot_phase_1d(data, x = "Z", y = "dZ")
plot_bivariate(data, x = "Z", y = "dZ")
plot_surface_3d(data, x1 = "Z", x2 = "X", y = "dZ")

C. Equation Discovery

Automated Symbolic Search

# Genetic algorithm search (pure R)
search_result <- symbolic_search(
  data = data,
  response = "dZ",
  predictors = c("Z", "X"),
  backend = "r_genetic",
  max_complexity = 15,
  n_generations = 50,
  population_size = 100,
  n_runs = 3
)

# View Pareto front
plot_pareto_front(search_result)

# Select best equation
best_eq <- select_equation(search_result, criterion = "bic")

Theory-Guided Fitting

# Fit a specific functional form
equation <- fit_specified_equation(
  "alpha + beta * Z + gamma * Z^2 + delta * X",
  data = data,
  derivative_col = "dZ",
  method = "levenberg-marquardt"  # Robust optimization
)

summary(equation)

Julia Backend (Optional)

For large-scale symbolic regression, use the Julia backend:

# Setup Julia (one-time)
setup_julia_backend()

# Run with Julia's SymbolicRegression.jl
search_result <- symbolic_search(
  data = data,
  response = "dZ",
  predictors = c("Z", "X"),
  backend = "julia",
  max_complexity = 25
)

D. Residual Analysis and SDE Construction

Diagnostics

# Compute residuals
resid <- compute_residuals(equation, data, "dZ")

# Run diagnostic tests
diag <- residual_diagnostics(resid, data)
print(diag)
plot(diag)

The diagnostics include: - Ljung-Box test: Autocorrelation in residuals - ARCH-LM test: Conditional heteroscedasticity - Breusch-Pagan test: Heteroscedasticity vs predictors - Jarque-Bera test: Normality - Runs test: Randomness

Modeling the Diffusion

# Model conditional variance
var_model <- model_conditional_variance(
  resid,
  data = data,
  predictors = c("Z"),
  method = "linear"  # or "quadratic", "symbolic"
)

# Construct SDE: dZ = f(Z,X) dt + g(Z) dW
sde <- construct_sde(
  drift = equation,
  diffusion = var_model,
  variable = "Z"
)

print(sde)

Iterative GLS Estimation

When heteroscedasticity is significant, use iterative GLS:

# Iterative estimation for unbiased drift coefficients
sde <- estimate_sde_iterative(
  drift_formula = "a + b * Z + c * Z^2",
  data = data,
  derivative_col = "dZ",
  diffusion_formula = "sigma * (1 + tau * abs(Z))",
  max_iter = 20,
  tolerance = 1e-4
)

# Check convergence
print(sde$converged)
print(sde$iterations)

E. Validation

Cross-Validation

# Block cross-validation (recommended for time series)
cv <- cross_validate(
  equation,
  data = data,
  derivative_col = "dZ",
  k = 5,
  method = "block"
)

print(cv)
plot(cv)

Trajectory Simulation

# Simulate from the SDE
sim <- simulate_trajectory(
  sde,
  initial_conditions = c(Z = data$Z[1]),
  times = data$time,
  n_sims = 100,
  method = "euler"
)

# Plot with observed data
plot(sim, observed_data = data)

# Compare quantitatively
metrics <- compare_trajectories(sim, data, "time", "Z")
print(metrics)

Qualitative Behavior

# Analyze fixed points
fps <- analyze_fixed_points(equation, "Z", range = c(-10, 150))
print(fps)

# Bifurcation analysis (vary a parameter)
bifurc <- analyze_bifurcations(
  equation, 
  variable = "Z",
  parameter = "K",
  param_range = c(50, 200)
)
plot(bifurc)

# Check against expected behavior
qual_check <- check_qualitative_behavior(
  equation, data, "Z",
  expected_features = list(
    n_fixed_points = 2,
    stability_pattern = c("unstable", "stable"),
    bounded = TRUE
  )
)
print(qual_check)

Comprehensive Validation

validation <- validate_model(
  equation = equation,
  sde = sde,
  data = data,
  derivative_col = "dZ",
  variable = "Z",
  time_col = "time",
  cv_folds = 5,
  n_sims = 100,
  expected_features = list(n_fixed_points = 2)
)

print(validation)
plot(validation)

F. Output and Reporting

LaTeX Equations

# Convert to LaTeX
latex <- to_latex(equation, variable = "Z", precision = 4)
cat(latex)

# Full SDE in LaTeX
sde_latex <- to_latex(sde, variable = "Z")
cat(sde_latex)

Tables

# Coefficient table
coef_table <- coefficient_table(equation, format = "latex")
cat(coef_table)

# Model comparison
models <- list(
  "Linear" = fit_specified_equation("a + b*Z", data, "dZ"),
  "Quadratic" = fit_specified_equation("a + b*Z + c*Z^2", data, "dZ"),
  "Logistic" = fit_specified_equation("r*Z*(1-Z/K)", data, "dZ")
)
comparison <- model_comparison_table(models, data, "dZ", format = "markdown")
cat(comparison)

Full Report

# Collect all results
results <- list(
  data = data,
  exploration = exploration,
  equation = equation,
  sde = sde,
  validation = validation
)

# Generate markdown report (using tempdir() per CRAN policy)
generate_report(
  results,
  output_file = file.path(tempdir(), "analysis_report"),
  format = "markdown",
  title = "My Dynamics Analysis"
)

# Export to multiple formats (using tempdir() per CRAN policy)
export_results(results, output_dir = tempdir(), formats = c("rds", "csv", "json"))

# Save plots (using tempdir() per CRAN policy)
save_plots(results, output_dir = tempdir(), format = c("png", "pdf"))

Analysis Summary

# Print summary to console
print_summary(results)

# Get template for new analyses (using tempdir() per CRAN policy)
get_analysis_template(file.path(tempdir(), "my_analysis.R"))

Best Practices

Differentiation

Start with TVR for economic/social science data
Use suggest_differentiation_method() for guidance
Always plot diagnostics to verify derivative quality
Consider data collection frequency via diagnose_sampling_frequency()

Equation Discovery

Use theory when available - fit_specified_equation() is more reliable
For exploratory analysis, use symbolic search but validate thoroughly
Prefer simpler equations (parsimony)
Run multiple searches and compare

Validation

Always use block CV for time series (not random CV)
Check residual diagnostics before trusting results
Verify qualitative behavior matches domain knowledge
Use bootstrap for uncertainty quantification

Reporting

Report uncertainty (standard errors or confidence intervals)
Include model comparison with alternatives
Show validation metrics prominently
Discuss limitations and assumptions

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                   LC_CTYPE=Spanish_Spain.utf8   
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.utf8    
#> 
#> time zone: America/Costa_Rica
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.55        
#>  [5] cachem_1.1.0      knitr_1.51        htmltools_0.5.8.1 rmarkdown_2.29   
#>  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.4.2    rstudioapi_0.17.1 tools_4.4.2       evaluate_1.0.5   
#> [17] bslib_0.9.0       yaml_2.3.12       rlang_1.1.6       jsonlite_2.0.0

Getting Started with EmpiricalDynamics

EmpiricalDynamics Package

2026-01-11