Based on the detailed documentation provided for EmpiricalDynamics, the addition of strategic diagrams would significantly enhance user understanding, particularly for the complex workflows involved in SDE discovery and the hybrid architecture.
Here is the revised documentation with suggested image tags inserted at high-value locations.
High-Performance & Robust Empirical Discovery of Differential Equations from Time Series Data
EmpiricalDynamics is a comprehensive toolkit for discovering differential and difference equations from empirical time series data. It combines the statistical power of R with a high-performance Julia backend (via SymbolicRegression.jl) to offer a robust engine capable of recovering physical laws, economic models, and stochastic differential equations from noisy data.
| Feature | Description |
|---|---|
| Hybrid R/Julia Architecture | Seamlessly leverages Julia’s speed for symbolic regression while keeping R’s statistical ecosystem for analysis |
| Multi-threaded Search | Automatic parallelization across CPU cores via Julia threads |
| SDE Discovery | Full pipeline for Stochastic Differential Equations: drift + diffusion recovery |
| GLS Iterative Refinement | Iterative Generalized Least Squares for heteroscedastic data |
| TVR Differentiation | Total Variation Regularized derivatives for robust estimation in noisy data |
| Physics-Informed | Automatic detection of fundamental constants (π, e, φ) |
| Production Ready | Handles edge cases (NaNs, outliers, single-variable inputs) gracefully |
# 1. Install R dependencies
install.packages(c("CVXR", "minpack.lm", "signal", "lmtest", "tseries",
"ggplot2", "gridExtra", "JuliaCall", "devtools"))
# 2. Install EmpiricalDynamics
devtools::install()
# 3. (Optional) Enable Julia parallelization
# Add to ~/.Renviron:
# JULIA_NUM_THREADS=8library(EmpiricalDynamics)
# Initialize Julia backend (installs required Julia packages)
# This may take a few minutes on first run
julia <- setup_julia_backend()Recovering from noisy observations:
library(EmpiricalDynamics)
# Generate noisy data
set.seed(42)
t <- seq(-3, 3, length.out = 200)
y <- pi * sin(t) + exp(1) * t + rnorm(200, 0, 0.05)
data <- data.frame(y = y, t = t)
# Symbolic search
results <- symbolic_search(
target = data$y,
predictors = data[, "t", drop = FALSE],
complexity_limit = 15,
n_iterations = 500,
parsimony = 0.001
)
# Best equation
best <- select_equation(results, criterion = "knee")
print(best)
# Expression: (sin(t) * 3.14159) + (t * 2.71828)
# R²: 0.9999Recovering the Lorenz equations from trajectory data:
# Assuming you have trajectory data with X, Y, Z and their TVR derivatives
data_lorenz <- data.frame(X = X, Y = Y, Z = Z, dX = dX_tvr, dY = dY_tvr, dZ = dZ_tvr)
# Add auxiliary transformations
data_lorenz$XY <- X * Y
data_lorenz$XZ <- X * Z
data_lorenz$YmX <- Y - X
# Search for dx/dt = σ(y - x)
results_dX <- symbolic_search(
target = data_lorenz$dX,
predictors = data_lorenz[, c("X", "Y", "Z", "YmX")],
complexity_limit = 20,
n_iterations = 1000,
parsimony = 0.0001
)
# Achieves R² > 0.93Recovering Stochastic Differential Equations of the form:
Where is the drift and is the diffusion coefficient.
In SDEs, the observed derivative contains both signal (drift) and noise (diffusion):
Direct symbolic regression on recovers drift poorly when Signal-to-Noise Ratio (SNR) < 1.
library(EmpiricalDynamics)
# ═══════════════════════════════════════════════════════════════════
# STEP 1: Compute derivatives with TVR
# ═══════════════════════════════════════════════════════════════════
data_spec <- specify_variables(data, endogenous = "Z", exogenous = "X", time_col = "time")
tvr_result <- compute_derivative(
Z = data_spec$Z,
t = data_spec$time,
method = "tvr",
lambda = "auto",
solver = "osqp"
)
dZ_tvr <- tvr_result$derivative
# ═══════════════════════════════════════════════════════════════════
# STEP 2: Initial drift search (symbolic)
# ═══════════════════════════════════════════════════════════════════
# Prepare predictors with transformations
data_full <- create_transformations(data, variables = c("X", "Z"))
data_full$dZ <- dZ_tvr
data_full$sinX <- sin(data_full$X)
data_full$absX <- abs(data_full$X)
data_full$Z3 <- data_full$Z^3
drift_initial <- symbolic_search(
target = data_full$dZ,
predictors = data_full[, c("X", "Z", "sinX", "Z3")],
complexity_limit = 20,
n_iterations = 600,
parsimony = 0.0002
)
# ═══════════════════════════════════════════════════════════════════
# STEP 3: GLS iterative refinement
# ═══════════════════════════════════════════════════════════════════
sde_refined <- estimate_sde_iterative(
target = data_full$dZ,
predictors = data_full[, c("X", "Z", "sinX", "Z3", "absX")],
data = data_full,
initial_drift = select_equation(drift_initial, "best"),
max_iter = 8,
tol = 0.001
)
# GLS iteratively:
# 1. Estimates diffusion from residuals
# 2. Calculates weights = 1/σ²
# 3. Re-estimates drift with weighted least squares
# 4. Repeats until convergence
# ═══════════════════════════════════════════════════════════════════
# STEP 4: Final diffusion search (symbolic)
# ═══════════════════════════════════════════════════════════════════
# Use refined residuals
residuals_refined <- sde_refined$final_residuals
resid_magnitude <- abs(residuals_refined) / sqrt(dt)
# Smooth with median filter (more robust than mean)
resid_smoothed <- stats::runmed(resid_magnitude, k = 21)
diffusion_final <- symbolic_search(
target = resid_smoothed,
predictors = data_full[, c("X", "absX")],
complexity_limit = 12,
n_iterations = 400,
parsimony = 0.0005
)
# ═══════════════════════════════════════════════════════════════════
# STEP 5: Construct final SDE model
# ═══════════════════════════════════════════════════════════════════
sde_final <- construct_sde(
drift = sde_refined$drift,
diffusion = select_equation(diffusion_final, "best"),
variable = "Z"
)
print(sde_final)Test system:
| Method | Drift R² | Diffusion R² |
|---|---|---|
| Direct symbolic search | 0.82 | 0.07 |
| GLS + Symbolic (recommended) | 0.86 | 0.59 |
| Scenario | Recommended Method |
|---|---|
| Deterministic ODE (no noise) | Direct symbolic_search() |
| SDE with SNR > 2 | Direct symbolic_search() on dZ, then residuals |
| SDE with SNR < 1 | GLS iterative + symbolic (as above) |
| Unknown noise structure | model_conditional_variance(method = "symbolic") |
Computing derivatives from noisy time series is critical. We offer multiple methods:
# Total Variation Regularization (recommended for noisy data)
deriv_tvr <- compute_derivative(Z, t, method = "tvr", lambda = "auto", solver = "osqp")
# Savitzky-Golay filter (fast fallback)
deriv_sg <- compute_derivative(Z, t, method = "savgol", p = 3, n = 15)
# Cubic spline (smooth data)
deriv_spline <- compute_derivative(Z, t, method = "spline")
# Spectral (periodic data)
deriv_spectral <- compute_derivative(Z, t, method = "spectral")| Method | Noise Robustness | Discontinuities | Speed | Best For |
|---|---|---|---|---|
| TVR | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | Noisy economic/financial data |
| Savgol | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | General purpose, fallback |
| Spline | ⭐⭐ | ⭐ | ⭐⭐⭐⭐ | Smooth physical data |
| Spectral | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ | Periodic signals |
# Direct Julia call with full control
results <- julia_call(
"run_symbolic_search_robust",
data_path,
response_col,
predictor_cols,
max_complexity = 20L, # Tree size limit
n_iterations = 500L, # Genetic algorithm generations
parsimony = 0.001, # Complexity penalty
custom_operators = "" # e.g., "safe_log,safe_sqrt"
)| Value | Use Case |
|---|---|
| 0.1 | Quick exploration, very simple models |
| 0.01 | Standard regression |
| 0.001 | Physics discovery (recommended) |
| 0.0001 | Complex systems (Lorenz, etc.) |
# Check Julia threads
julia_eval("Threads.nthreads()")
# Enable more threads (before julia_setup)
Sys.setenv(JULIA_NUM_THREADS = "8")# After fitting drift
residuals <- data$dZ - predict(drift_model, data)
# Comprehensive diagnostics
diag <- residual_diagnostics(
residuals = residuals,
data = data,
predictors = c("X", "Z"),
max_lag = 20,
plot = TRUE
)
print(diag)
# Ljung-Box (autocorrelation): p=0.XXX
# ARCH-LM (conditional het.): p=0.XXX
# Breusch-Pagan (het.): p=0.XXX
# Jarque-Bera (normality): p=0.XXX# Cross-validation
cv_result <- cross_validate(
equation = best_drift,
data = data,
target_col = "dZ",
k = 5
)
# Trajectory simulation
sim <- simulate_trajectory(
sde_model = sde_final,
initial_state = c(Z = 0),
t_span = c(0, 30),
dt = 0.01,
n_paths = 100
)
plot(sim)Discovering a Solow-Swan type growth equation from GDP data:
library(EmpiricalDynamics)
# Load data
data <- read_empirical_data("gdp_quarterly.csv")
data <- specify_variables(data, endogenous = "GDP", exogenous = c("Capital", "Labor"),
time_col = "Quarter")
# Compute growth rate
deriv <- compute_derivative(data$GDP, data$Quarter, method = "tvr", lambda = "auto")
data$dGDP <- deriv$derivative
# Create transformations
data <- create_transformations(data, variables = c("GDP", "Capital", "Labor"))
# Search for growth equation
results <- symbolic_search(
target = data$dGDP,
predictors = data[, c("GDP", "Capital", "Labor", "log_Capital", "log_Labor")],
complexity_limit = 20,
n_iterations = 800,
parsimony = 0.001
)
# Select and validate
best <- select_equation(results, criterion = "knee")
cv <- cross_validate(best, data, "dGDP", k = 5)
# Export results
export_results(best, cv, format = "latex", file = "growth_model.tex")| Function | Description |
|---|---|
symbolic_search() |
Main symbolic regression interface |
select_equation() |
Select from Pareto front (best, simplest, knee) |
compute_derivative() |
Numerical differentiation (TVR, Savgol, etc.) |
specify_variables() |
Annotate data with variable roles |
create_transformations() |
Auto-generate log, sqrt, square variants |
| Function | Description |
|---|---|
estimate_sde_iterative() |
GLS iterative refinement for SDEs |
model_conditional_variance() |
Estimate diffusion from residuals |
construct_sde() |
Combine drift + diffusion into SDE object |
simulate_trajectory() |
Monte Carlo SDE simulation |
| Function | Description |
|---|---|
residual_diagnostics() |
Ljung-Box, ARCH-LM, Breusch-Pagan, Jarque-Bera |
cross_validate() |
K-fold cross-validation |
validate_model() |
Out-of-sample prediction metrics |
# Check current threads
julia_eval("Threads.nthreads()") # Shows 1?
# Solution: Set BEFORE starting R
# Windows: System Environment Variables → JULIA_NUM_THREADS = 8
# Linux/Mac: Add to ~/.Renviron: JULIA_NUM_THREADS=8
# Then restart RStudio completely# If OSQP fails, try SCS (slower but more robust)
deriv <- compute_derivative(Z, t, method = "tvr", solver = "scs")
# If all solvers fail, use Savitzky-Golay
deriv <- compute_derivative(Z, t, method = "savgol", p = 3, n = 15)This is expected when SNR < 1. The diffusion term is inherently harder to recover because:
Solutions:
If you use EmpiricalDynamics in your research, please cite:
@software{EmpiricalDynamics,
title = {EmpiricalDynamics: Robust Discovery of Differential Equations with R and Julia},
author = {Gómez Julián, José Mauricio},
year = {2025},
url = {https://github.com/IsadoreNabi/EmpiricalDynamics},
version = {0.1.0}
}MIT License. See LICENSE for details.
Built with ❤️ for Science and Economics.