| Type: | Package |
| Title: | Lagrangian Multiplier Smoothing Splines for Smooth Function Estimation |
| Version: | 1.0.1 |
| Description: | Implements Lagrangian multiplier smoothing splines for flexible nonparametric regression and function estimation. Provides tools for fitting, prediction, and inference using a constrained optimization approach to enforce smoothness. Supports generalized linear models, Weibull accelerated failure time (AFT) models, quadratic programming constraints, and customizable working-correlation structures, with options for parallel fitting. The core spline construction builds on Ezhov et al. (2018) <doi:10.1515/jag-2017-0029>. Quadratic-programming and SQP details follow Goldfarb & Idnani (1983) <doi:10.1007/BF02591962> and Nocedal & Wright (2006) <doi:10.1007/978-0-387-40065-5>. For smoothing spline and penalized spline background, see Wahba (1990) <doi:10.1137/1.9781611970128> and Wood (2017) <doi:10.1201/9781315370279>. For variance-component and correlation-parameter estimation, see Searle et al. (2006) <ISBN:978-0470009598>. The default multivariate partitioning step uses k-means clustering as in MacQueen (1967). |
| License: | MIT + file LICENSE |
| Language: | en-US |
| Depends: | R (≥ 3.5.0) |
| Imports: | Rcpp (≥ 1.0.7), RcppArmadillo, FNN, RColorBrewer, plotly, quadprog, methods, stats |
| LinkingTo: | Rcpp, RcppArmadillo |
| Suggests: | testthat (≥ 3.0.0), spelling, knitr, rmarkdown, parallel, survival, MASS, graphics |
| URL: | https://github.com/matthewlouisdavisBioStat/lgspline |
| BugReports: | https://github.com/matthewlouisdavisBioStat/lgspline/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-03-15 06:45:34 UTC; matth |
| Author: | Matthew Davis |
| Maintainer: | Matthew Davis <matthewlouisdavis@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-15 12:10:07 UTC |
Lagrangian Multiplier Smoothing Splines
Description
Implements Lagrangian multiplier smoothing splines for flexible nonparametric regression and function estimation. Provides tools for fitting, prediction, and inference using a constrained optimization approach to enforce smoothness. Supports generalized linear models, Weibull accelerated failure time (AFT) models, quadratic programming constraints, and customizable working-correlation structures, with options for parallel fitting. The core spline construction builds on Ezhov et al. (2018) doi:10.1515/jag-2017-0029. Quadratic-programming and SQP details follow Goldfarb & Idnani (1983) doi:10.1007/BF02591962 and Nocedal & Wright (2006) doi:10.1007/978-0-387-40065-5. For smoothing spline and penalized spline background, see Wahba (1990) doi:10.1137/1.9781611970128 and Wood (2017) doi:10.1201/9781315370279. For variance-component and correlation-parameter estimation, see Searle et al. (2006) <ISBN:978-0470009598>. The default multivariate partitioning step uses k-means clustering as in MacQueen (1967).
Author(s)
Maintainer: Matthew Davis matthewlouisdavis@gmail.com
See Also
Useful links:
https://github.com/matthewlouisdavisBioStat/lgspline
Report bugs at https://github.com/matthewlouisdavisBioStat/lgspline/issues
Efficient Matrix Multiplication Operator
Description
Operator wrapper around C++ efficient_matrix_mult() for matrix multiplication syntax.
This is an internal function meant to provide an improvement over base R's %*% operator for certain large matrix operations, at the cost of a potential slight slowdown for smaller problems.
Usage
x %**% y
Arguments
x |
Left matrix |
y |
Right matrix |
Value
Matrix product of x and y
Examples
M1 <- matrix(1:4, 2, 2)
M2 <- matrix(5:8, 2, 2)
M1 %**% M2
Partition-Wise Active-Set Refinement for Inequality Constraints
Description
Replaces the dense .qp_refine SQP loop with a partition-wise
active-set method that reuses the existing Lagrangian projection
machinery. The key insight is that an active inequality constraint
can be treated as an additional equality constraint and absorbed into
the augmented constraint matrix \mathbf{A}_{\mathrm{aug}},
after which the standard \mathbf{G}^{1/2}\mathbf{r}^* trick
applies without ever forming the full P \times P information
matrix.
Usage
.active_set_refine(
result,
X,
y,
K,
p_expansions,
A,
R_constraints,
constraint_value_vectors,
Lambda,
Ghalf,
GhalfInv,
family,
qp_Amat,
qp_bvec,
qp_meq,
Xy_or_uncon,
is_path3,
parallel_aga,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks,
tol,
max_as_iter = 50
)
Arguments
result |
List of current coefficient column vectors by partition. |
X, y |
Lists of partition-specific design matrices and responses. |
K, p_expansions |
Integer dimensions. |
A |
Original equality constraint matrix. |
R_constraints |
Number of columns of A. |
constraint_value_vectors |
Constraint RHS list. |
Lambda |
Shared penalty matrix. |
Ghalf, GhalfInv |
Lists of per-partition \mathbf{G}^{1/2} matrices and their inverses. |
family |
GLM family object. |
qp_Amat |
Inequality constraint matrix. |
qp_bvec |
Inequality constraint RHS. |
qp_meq |
Number of equality constraints within qp_Amat. |
Xy_or_uncon |
Cross-products (Path 2) or unconstrained estimates (Path 3). |
is_path3 |
Logical. |
parallel_aga, parallel_matmult |
Logical flags. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
tol |
Convergence tolerance. |
max_as_iter |
Maximum active-set iterations (default 50). |
Details
The active-set loop:
1. Solve the equality-constrained subproblem (Lagrangian projection) with the current active set.
2. Check primal feasibility: are any inactive inequalities violated?
3. Check dual feasibility: are any multipliers on active inequalities negative?
4. If both checks pass, return (KKT conditions met).
5. Otherwise, add the most-violated constraint or drop the most-negative multiplier, and repeat.
Falls back to .qp_refine if the active-set method does not
converge within max_as_iter iterations.
Value
A list with components:
- result
List of refined coefficient column vectors by partition.
- qp_info
List with active constraint information, or NULL.
- converged
Logical; TRUE if active-set method converged.
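The equality-absorption idea described above can be illustrated with a toy base-R sketch (not this internal API): a violated inequality is appended to the KKT system as an equality, and the recovered multiplier certifies dual feasibility.

```r
D <- diag(2); d <- c(1, 1)                  # objective: (1/2) b' D b - d' b
A <- matrix(c(1, 0), 2, 1); b0 <- 2         # inequality: b[1] >= 2
b_unc <- solve(D, d)                        # unconstrained optimum (1, 1) violates it
## Treat the violated inequality as an equality inside the KKT system:
KKT <- rbind(cbind(D, -A), cbind(t(A), 0))
sol <- solve(KKT, c(d, b0))
b_act  <- sol[1:2]                          # constrained optimum (2, 1)
lambda <- sol[3]                            # multiplier 1 >= 0: KKT conditions met
```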
Assemble qp_info from a solve.QP Solution
Description
Thin wrapper around .solver_assemble_qp_info().
Keeping the local helper name avoids changing the public return shape
or downstream references inside blockfit_solve().
Usage
.bf_assemble_qp_info(
last_qp_sol,
beta_block,
qp_Amat_combined,
qp_bvec_combined,
qp_meq_combined,
converged,
final_deviance,
info_matrix = NULL
)
Arguments
last_qp_sol |
Output of quadprog::solve.QP (possibly NULL). |
beta_block |
Final coefficient vector. |
qp_Amat_combined |
Combined constraint matrix. |
qp_bvec_combined |
Combined constraint RHS. |
qp_meq_combined |
Number of equalities. |
converged |
Logical. |
final_deviance |
Scalar. |
info_matrix |
Information matrix (non-GEE path only; NULL otherwise). |
Value
A list suitable for the qp_info slot, or NULL if
last_qp_sol is NULL.
Gaussian Identity + GEE: Closed-Form Solve (Case a)
Description
When the response is Gaussian with identity link and a working
correlation structure is present, the whitened system
\mathbf{V}^{-1/2}\mathbf{X} is not block-diagonal, so
partition-wise backfitting does not apply.
This function performs a single closed-form solve that replicates
get_B Path 1a but in the backfitting context.
Usage
.bf_case_gauss_gee(
X,
y,
K,
p_expansions,
order_list,
VhalfInv,
Lambda,
L_partition_list,
unique_penalty_per_partition,
A,
constraint_values,
spline_cols,
flat_cols,
observation_weights
)
Arguments
X |
List of K+1 partition-specific design matrices. |
y |
List of K+1 partition-specific response vectors. |
K |
Integer; number of interior knots. |
p_expansions |
Integer; columns per partition. |
order_list |
List of observation-index vectors by partition. |
VhalfInv |
Inverse square root of the working correlation matrix in the original observation ordering. |
Lambda |
Shared penalty matrix. |
L_partition_list |
Partition-specific penalty matrices. |
unique_penalty_per_partition |
Logical. |
A |
Full constraint matrix. |
constraint_values |
List of constraint RHS vectors. |
spline_cols, flat_cols |
Integer vectors of column indices. |
Value
A named list:
- beta_spline
List of K+1 spline coefficient vectors.
- beta_flat
Numeric vector of flat coefficients.
- result
List of K+1 full per-partition coefficient vectors.
Gaussian Identity Backfitting Without Correlation (Case b)
Description
Standard block-coordinate descent on the convex quadratic objective, alternating between the spline step (Lagrangian projection) and the flat step (pooled penalized regression).
Usage
.bf_case_gauss_no_corr(
X_spline,
X_flat,
y,
K,
Ghalf_sp,
GhalfA_sp,
XfXf_pen_inv,
constraint_values_spline,
nc_spline,
nc_flat,
A_spline,
tol,
max_backfit_iter,
verbose,
split = NULL,
constraint_values = list(),
Lambda_flat = NULL,
p_expansions = NULL
)
Arguments
X_spline, X_flat |
Lists of per-partition submatrices. |
y |
List of response vectors. |
K |
Integer. |
Ghalf_sp |
List of per-partition \mathbf{G}_s^{1/2} matrices for the spline block. |
GhalfA_sp |
Pre-computed \mathbf{G}_s^{1/2}\mathbf{A}_s products. |
XfXf_pen_inv |
Penalised inverse of the pooled flat Gram matrix. |
constraint_values_spline |
Spline-only constraint RHS. |
nc_spline, nc_flat |
Integers. |
A_spline |
Spline-only constraint matrix. |
tol |
Convergence tolerance. |
max_backfit_iter |
Maximum iterations. |
verbose |
Logical. |
split |
Output of .bf_split_components. |
constraint_values |
Full constraint RHS list. |
Lambda_flat |
Flat penalty submatrix. |
p_expansions |
Integer; columns per partition. |
Value
A named list with beta_spline and
beta_flat.
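The spline/flat alternation can be mimicked with a minimal stand-in: plain two-block coordinate descent on an unpenalized least-squares objective (toy data; the real step adds penalties and the Lagrangian projection).

```r
set.seed(1)
Xs <- matrix(rnorm(30), 10, 3)   # "spline" block
Xf <- matrix(rnorm(20), 10, 2)   # "flat" block
y  <- rnorm(10)
bs <- numeric(3); bf <- numeric(2)
for (iter in 1:1000) {
  bs_old <- bs; bf_old <- bf
  ## Spline step given flat, then flat step given spline:
  bs <- solve(crossprod(Xs), crossprod(Xs, y - Xf %*% bf))
  bf <- solve(crossprod(Xf), crossprod(Xf, y - Xs %*% bs))
  if (max(abs(c(bs - bs_old, bf - bf_old))) < 1e-12) break
}
## The fixed point is the joint least-squares solution:
Xall <- cbind(Xs, Xf)
b_joint <- solve(crossprod(Xall), crossprod(Xall, y))
```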
GLM + GEE Two-Stage Estimation (Case c)
Description
Stage 1 runs damped Newton-Raphson backfitting on the unwhitened link-scale working
response to produce a warm start. Stage 2 refines via damped SQP
on the full whitened system, replicating get_B Path 1b.
Usage
.bf_case_glm_gee(
X,
y,
K,
p_expansions,
flat_cols,
split,
family,
order_list,
glm_weight_function,
schur_correction_function,
qp_score_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
iterate,
tol,
max_backfit_iter,
parallel_eigen,
cl,
chunk_size,
num_chunks,
rem_chunks,
schur_zero,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
Lambda,
L_partition_list,
unique_penalty_per_partition,
A,
constraint_values,
Vhalf,
VhalfInv,
verbose,
...
)
Arguments
X |
List of |
y |
List of |
K |
Integer; number of interior knots. |
p_expansions |
Integer; number of coefficients per partition. |
flat_cols |
Integer vector indicating flat columns of each partition's design matrix. |
split |
Output of .bf_split_components. |
family |
GLM family object. |
order_list |
List of observation index vectors by partition. |
glm_weight_function |
GLM weight function. |
schur_correction_function |
Schur complement correction function. |
qp_score_function |
Score function for QP subproblem. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Dispersion estimation function. |
observation_weights |
List of observation weights. |
iterate |
Logical; if FALSE, single pass (no iteration). |
tol |
Convergence tolerance. |
max_backfit_iter |
Integer. |
parallel_eigen, cl, chunk_size, num_chunks, rem_chunks |
Parallel arguments. |
schur_zero |
List of zeros (one per partition). |
quadprog |
Logical; apply inequality constraint refinement if TRUE. |
qp_Amat |
Inequality constraint matrix for quadprog::solve.QP. |
qp_bvec |
Inequality constraint right-hand side. |
qp_meq |
Number of leading equality constraints. |
Lambda |
Shared penalty matrix. |
L_partition_list |
List of partition-specific penalty matrices. |
unique_penalty_per_partition |
Logical. |
A |
Full constraint matrix. |
constraint_values |
List of constraint right-hand sides. |
Vhalf |
Square root of the working correlation matrix in the original observation ordering. |
VhalfInv |
Inverse square root of the working correlation matrix
in the original observation ordering. When both Vhalf and VhalfInv are non-NULL, the GEE path applies. |
verbose |
Logical. |
... |
Additional arguments passed to weight and dispersion functions. |
Value
A named list with result, beta_spline,
beta_flat, and qp_info.
GLM Without GEE: Damped Newton-Raphson + Backfitting (Case d)
Description
Damped Newton-Raphson outer loop wrapping the weighted backfitting inner loop.
Each damped Newton-Raphson step computes weights from the
current linear predictor, then calls .bf_inner_weighted.
Usage
.bf_case_glm_no_corr(
X,
y,
K,
p_expansions,
split,
family,
order_list,
glm_weight_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
iterate,
tol,
max_backfit_iter,
parallel_eigen,
cl,
chunk_size,
num_chunks,
rem_chunks,
schur_zero,
unique_penalty_per_partition,
VhalfInv,
verbose,
constraint_values = list(),
...
)
Arguments
X |
List of K+1 partition-specific design matrices. |
y |
List of K+1 partition-specific response vectors. |
K |
Integer; number of interior knots. |
p_expansions |
Integer; number of coefficients per partition. |
split |
Output of .bf_split_components. |
family |
GLM family object. |
order_list |
List of observation index vectors by partition. |
glm_weight_function |
GLM weight function. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Dispersion estimation function. |
observation_weights |
List of observation weights. |
iterate |
Logical; if FALSE, single pass (no iteration). |
tol |
Convergence tolerance. |
max_backfit_iter |
Integer. |
parallel_eigen, cl, chunk_size, num_chunks, rem_chunks |
Parallel arguments. |
schur_zero |
List of zeros. |
unique_penalty_per_partition |
Logical. |
VhalfInv |
Inverse square root of the working correlation matrix
in the original observation ordering. |
verbose |
Logical. |
constraint_values |
List of constraint right-hand sides. |
... |
Additional arguments passed to weight and dispersion functions. |
Value
A named list with beta_spline and
beta_flat.
Constrained Flat Update Given Current Spline Solution
Description
Solves for flat coefficients subject to the residual equality constraint imposed by the full constraint system, conditional on the current spline block solution.
Usage
.bf_constrained_flat_update(
XfWXf_pen,
Xfr,
A_full,
constraint_values,
beta_spline,
flat_rows_all,
spline_rows_all,
spline_cols,
flat_cols,
p_expansions,
K,
nc_flat
)
Arguments
XfWXf_pen |
Penalized flat Gram matrix
(nc_f \times nc_f). |
Xfr |
Right-hand side cross-product for flat update
(nc_f \times 1). |
A_full |
Full constraint matrix (rows indexed by the full P-space). |
constraint_values |
List of constraint RHS vectors. |
beta_spline |
List of current spline coefficient vectors by partition. |
flat_rows_all |
Integer vector of flat row indices in the full P-space. |
spline_rows_all |
Integer vector of spline row indices in the full P-space. |
spline_cols |
Integer vector of spline column indices within each partition. |
flat_cols |
Integer vector of flat column indices within each partition. |
p_expansions |
Integer; columns per partition. |
K |
Integer; number of interior knots. |
nc_flat |
Integer; number of flat columns. |
Details
When the full constraint system is \mathbf{A}^\top \boldsymbol{\beta} = \mathbf{c},
and we partition \boldsymbol{\beta} into spline and flat blocks, the flat
update must satisfy:
\mathbf{A}_{\mathrm{flat}}^\top \boldsymbol{\beta}_{\mathrm{flat}}
= \mathbf{c} - \mathbf{A}_{\mathrm{spline}}^\top \boldsymbol{\beta}_{\mathrm{spline}}
This function solves the penalized least-squares problem for flat coefficients subject to this residual equality constraint using a Lagrangian approach.
Value
Numeric vector of flat coefficients (nc_f \times 1).
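The residual constraint in Details can be sketched in base R with toy values (the names mirror the documented arguments but the numbers are hypothetical): the flat block must absorb whatever part of \mathbf{c} the spline block leaves behind.

```r
A_spline <- matrix(c(1, 1), 2, 1)   # constraint rows on spline coefficients
A_flat   <- matrix(2, 1, 1)         # constraint rows on the flat coefficient
cvec     <- 4                       # full constraint RHS
beta_spline <- c(1, 0.5)
## Residual RHS left for the flat update: c - A_spline' beta_spline = 2.5
rhs_flat <- cvec - drop(t(A_spline) %*% beta_spline)
## Penalized LS for the flat coefficient subject to A_flat' b = rhs_flat,
## solved via the Lagrangian (KKT) system:
XfWXf_pen <- matrix(3, 1, 1); Xfr <- 1.2
KKT <- rbind(cbind(XfWXf_pen, A_flat), cbind(t(A_flat), 0))
beta_flat <- solve(KKT, c(Xfr, rhs_flat))[1]   # satisfies 2 * beta_flat = 2.5
```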
Compute Deviance for Non-GEE Models
Description
Evaluates the mean deviance (or mean squared error as fallback) at
the current fitted values. Used inside the damped Newton-Raphson outer loop of
blockfit_solve for convergence monitoring.
Usage
.bf_deviance(y_vec, mu_vec, obs_wt, ord, fam, ...)
Arguments
y_vec |
Numeric vector of observed responses. |
mu_vec |
Numeric vector of fitted means. |
obs_wt |
Numeric vector of observation weights. |
ord |
Integer vector of observation indices (passed to custom deviance residual functions). |
fam |
GLM family object. |
... |
Additional arguments forwarded to custom deviance residual functions. |
Value
Scalar; the mean deviance contribution.
Compute Whitened Deviance for GEE Convergence Monitoring
Description
Evaluates deviance in the whitened (decorrelated) space when the
family supplies fam$custom_dev.resids. In that case the raw
deviance residuals are divided by \sqrt{W} and then
pre-multiplied by \mathbf{V}^{-1/2} before squaring and
averaging. If only fam$dev.resids is available, the function
falls back to the usual mean deviance (and otherwise mean squared
error). Used in the GEE refinement loop (Case c) of
blockfit_solve.
Usage
.bf_gee_deviance(y_vec, mu_vec, W_vec, ord, obs_wt, VhInv, fam, ...)
Arguments
y_vec |
Numeric vector of observed responses. |
mu_vec |
Numeric vector of fitted means. |
W_vec |
Numeric vector of damped Newton-Raphson weights at current iterate. |
ord |
Integer vector of observation indices (passed to custom deviance residual functions). |
obs_wt |
Numeric vector of observation weights. |
VhInv |
\mathbf{V}^{-1/2} matrix (inverse square root of the working correlation). |
fam |
GLM family object. |
... |
Additional arguments forwarded to fam$custom_dev.resids. |
Value
Scalar; the mean whitened deviance.
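The whitening step described above amounts to the following (toy values; the identity whitener stands in for a real \mathbf{V}^{-1/2}):

```r
r     <- c(0.6, -0.3, 0.1)   # raw deviance residuals
W     <- c(1, 2, 4)          # Newton-Raphson weights at the current iterate
VhInv <- diag(3)             # V^{-1/2}; identity here for illustration
## Divide by sqrt(W), pre-multiply by V^{-1/2}, then square and average:
rw <- drop(VhInv %*% (r / sqrt(W)))
mean(rw^2)
```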
Numerical Derivative of the Inverse Link Function
Description
Returns \partial\mu / \partial\eta for the supplied family
object, using the family's own mu.eta method when available
and falling back to analytic forms for common links or central
differences otherwise.
Usage
.bf_get_mu_eta(eta, fam)
Arguments
eta |
Numeric vector of linear predictor values. |
fam |
GLM family object. |
Value
Numeric vector of the same length as eta.
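A sketch of the central-difference fallback (assumed form; the helper itself prefers the family's own mu.eta when it is available):

```r
num_mu_eta <- function(eta, fam, h = 1e-6) {
  ## Central difference of the inverse link: (mu(eta+h) - mu(eta-h)) / (2h)
  (fam$linkinv(eta + h) - fam$linkinv(eta - h)) / (2 * h)
}
fam <- binomial()
eta <- c(-1, 0, 1)
## Agrees with the analytic derivative to numerical precision:
max(abs(num_mu_eta(eta, fam) - fam$mu.eta(eta)))
```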
GLM Backfitting Inner Loop
Description
Given damped Newton-Raphson weights and response partitioned into lists, runs the inner backfitting loop that alternates between a weighted spline step and a weighted flat step until convergence. Shared by both Case c (GEE warm start) and Case d (GLM without GEE).
Usage
.bf_inner_weighted(
X_spline,
X_flat,
z_list,
W_list,
K,
Lambda_spline,
Lambda_flat,
L_part_spline,
unique_penalty_per_partition,
A_spline,
constraint_values_spline,
nc_spline,
nc_flat,
beta_spline_init,
beta_flat_init,
tol,
max_backfit_iter,
parallel_eigen,
cl,
chunk_size,
num_chunks,
rem_chunks,
split = NULL,
constraint_values = list(),
p_expansions = NULL
)
Arguments
X_spline, X_flat |
Lists of per-partition submatrices. |
z_list |
List of K+1 working response vectors by partition. |
W_list |
List of K+1 damped Newton-Raphson weight vectors by partition. |
K |
Integer. |
Lambda_spline, Lambda_flat |
Penalty submatrices. |
L_part_spline |
Partition-specific spline penalty list. |
unique_penalty_per_partition |
Logical. |
A_spline |
Spline-only constraint matrix. |
constraint_values_spline |
Constraint RHS list. |
nc_spline, nc_flat |
Integers. |
beta_spline_init, beta_flat_init |
Initial coefficient values. |
tol |
Convergence tolerance. |
max_backfit_iter |
Maximum iterations. |
parallel_eigen, cl, chunk_size, num_chunks, rem_chunks |
Parallel arguments. |
split |
Output of |
constraint_values |
Full constraint RHS list. |
p_expansions |
Integer; columns per partition. |
Value
A named list with beta_spline and
beta_flat.
Lagrangian Projection for the Spline-Only Subproblem
Description
Projects the adjusted cross-product \mathbf{X}_s^{\top}
\tilde{\mathbf{y}} through \mathbf{G}_s^{1/2} and
removes the component along \mathbf{G}_s^{1/2}\mathbf{A}_s
to enforce the smoothness constraints in the spline block.
When cv_spline contains nonzero values (e.g.\ from
no_intercept), the corresponding contribution is added
back after projection.
Usage
.bf_lagrangian_project(
Xy_adj,
Ghalf_cur,
GhalfA_cur,
cv_spline,
K,
nc_spline,
A_spline
)
Arguments
Xy_adj |
List of adjusted cross-products \mathbf{X}_s^{\top}\tilde{\mathbf{y}} by partition. |
Ghalf_cur |
List of current \mathbf{G}_s^{1/2} matrices by partition. |
GhalfA_cur |
Pre-computed \mathbf{G}_s^{1/2}\mathbf{A}_s matrix. |
cv_spline |
List of constraint right-hand-side vectors for the spline subproblem (may be empty). |
K |
Integer; number of interior knots. |
nc_spline |
Integer; number of spline columns. |
A_spline |
Spline-only constraint matrix. |
Value
List of K+1 coefficient vectors
(nc_s \times 1).
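The projection trick can be reproduced on a toy problem (zero constraint RHS; the cv_spline add-back is omitted): minimize \|\mathbf{y} - \mathbf{X}\mathbf{b}\|^2 + \mathbf{b}^{\top}\boldsymbol{\Lambda}\mathbf{b} subject to \mathbf{A}^{\top}\mathbf{b} = \mathbf{0} by removing the component of \mathbf{G}^{1/2}\mathbf{X}^{\top}\mathbf{y} along \mathbf{G}^{1/2}\mathbf{A}.

```r
set.seed(1)
X <- matrix(rnorm(40), 10, 4); y <- rnorm(10)
Lambda <- diag(4)
A <- matrix(c(1, -1, 0, 0), 4, 1)     # constraint: b1 = b2
G <- solve(crossprod(X) + Lambda)
e <- eigen(G, symmetric = TRUE)
Ghalf <- e$vectors %*% (sqrt(e$values) * t(e$vectors))   # symmetric square root
GA <- Ghalf %*% A
z  <- Ghalf %*% crossprod(X, y)
## Project z off span(G^{1/2} A), then map back through G^{1/2}:
b <- Ghalf %*% (z - GA %*% solve(crossprod(GA), crossprod(GA, z)))
drop(t(A) %*% b)                       # ~ 0: constraint satisfied
```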
Construct Full Block-Diagonal Penalty Matrix
Description
Thin wrapper around .solver_build_lambda_block().
The block/backfitting solver keeps its original helper name so the
surrounding code stays unchanged, while the shared implementation now
lives in solver_utils.R.
Usage
.bf_make_Lambda_block(
Lambda,
K,
unique_penalty_per_partition,
L_partition_list
)
Arguments
Lambda |
Shared penalty matrix. |
K |
Integer; number of interior knots. |
unique_penalty_per_partition |
Logical. |
L_partition_list |
List of partition-specific penalty matrices. |
Value
A (p_expansions \cdot (K+1)) \times (p_expansions \cdot (K+1)) matrix.
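When all partitions share the same penalty, the block-diagonal matrix is just K+1 copies of Lambda on the diagonal; a base-R sketch (the shared implementation additionally mixes in L_partition_list when unique_penalty_per_partition is TRUE):

```r
Lambda <- matrix(c(2, 1, 1, 2), 2, 2)   # shared p_expansions x p_expansions penalty
K <- 2
## (K+1) copies of Lambda along the diagonal:
Lambda_block <- kronecker(diag(K + 1), Lambda)
dim(Lambda_block)   # 6 x 6, i.e. (p_expansions * (K+1)) square
```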
Split Design and Penalty into Spline and Flat Components
Description
Partitions the per-partition design matrices, penalty matrices, and constraint matrix into "spline" (columns receiving partition-specific coefficients) and "flat" (columns receiving a single shared coefficient across all partitions) subsets.
Usage
.bf_split_components(
X,
flat_cols,
p_expansions,
K,
Lambda,
L_partition_list,
A,
constraint_values
)
Arguments
X |
List of |
flat_cols |
Integer vector of flat column indices. |
p_expansions |
Integer; total columns per partition. |
K |
Integer; number of interior knots. |
Lambda |
|
L_partition_list |
List of partition-specific penalty matrices. |
A |
Full |
constraint_values |
List of constraint right-hand sides. |
Value
A named list with elements:
- X_spline
List of K+1 matrices, spline columns only.
- X_flat
List of K+1 matrices, flat columns only.
- Lambda_spline
nc_s \times nc_s penalty submatrix.
- Lambda_flat
nc_f \times nc_f penalty submatrix.
- L_part_spline
List of partition-specific penalty submatrices for the spline columns.
- A_spline
Spline-only constraint matrix (columns pruned and rank-reduced via pivoted QR).
- nca_spline
Integer; number of columns in A_spline.
- constraint_values_spline
List of constraint RHS vectors restricted to spline rows.
- spline_cols
Integer vector of spline column indices.
- nc_spline
Integer; number of spline columns.
- nc_flat
Integer; number of flat columns.
- A_flat
Flat-only constraint matrix (rows for flat coefficients, columns pruned and rank-reduced). Used to enforce mixed constraints on the flat update step.
- nca_flat
Integer; number of columns in A_flat.
- A_full
The original full constraint matrix A, retained for mixed-constraint enforcement.
- flat_rows_all
Integer vector of flat-coefficient row indices in the full P-space.
- spline_rows_all
Integer vector of spline-coefficient row indices in the full P-space.
- has_mixed_constraints
Logical; TRUE if any constraint column in A has nonzero entries on both spline and flat rows.
- mixed_constraint_cols
Integer vector of column indices in the original A that are mixed (touch both spline and flat rows).
Damped SQP Refinement on the Full (Optionally Whitened) System
Description
Runs a damped sequential quadratic programming loop using
quadprog::solve.QP at each iteration, enforcing all
smoothness equalities, flat-equality constraints, and any
user-supplied inequality constraints.
Usage
.bf_sqp_loop(
X_design,
y_design,
X_block_raw,
beta_init,
Lambda_block,
qp_Amat_combined,
qp_bvec_combined,
qp_meq_combined,
K,
p_expansions,
family,
order_list,
glm_weight_function,
schur_correction_function,
qp_score_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
iterate,
tol,
VhalfInv,
VhalfInv_perm,
is_gee,
deviance_fun,
X_partitions,
y_partitions,
verbose,
...
)
Arguments
X_design |
Design matrix for the SQP step (whitened on the GEE path). |
y_design |
Response vector conformable with X_design. |
X_block_raw |
Unwhitened block-diagonal design matrix (used
for computing the linear predictor on the original scale).
For the non-GEE path this equals X_design. |
beta_init |
Initial coefficient vector. |
Lambda_block |
Block-diagonal penalty matrix. |
qp_Amat_combined |
Combined constraint matrix (equalities then inequalities). |
qp_bvec_combined |
Combined constraint RHS. |
qp_meq_combined |
Number of leading equality constraints. |
K, p_expansions |
Integers. |
family |
GLM family object. |
order_list |
Observation-index lists. |
glm_weight_function, schur_correction_function, qp_score_function |
Functions. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Function. |
observation_weights |
List or vector of weights. |
iterate |
Logical; if FALSE, takes at most two steps. |
tol |
Convergence tolerance. |
VhalfInv, VhalfInv_perm |
Inverse square roots of the working correlation matrix in the original and partition-permuted orderings (NULL on the non-GEE path). |
is_gee |
Logical; TRUE when called from the GEE refinement path (affects deviance computation). |
deviance_fun |
Function computing deviance; either .bf_deviance or .bf_gee_deviance. |
X_partitions |
List of per-partition design matrices (unwhitened); needed for Schur correction in non-GEE path. |
y_partitions |
List of per-partition response vectors (unwhitened); needed for Schur correction in non-GEE path. |
verbose |
Logical. |
... |
Forwarded to weight, dispersion, and score functions. |
Details
This function serves two roles within blockfit_solve:
- GEE Stage 2 (Case c): operates on the whitened system \tilde{\mathbf{X}} = \mathbf{V}^{-1/2} \mathbf{X}_{\mathrm{block}}, initialized from the damped Newton-Raphson backfitting warm start.
- Non-GEE SQP refinement: operates on the unwhitened block-diagonal design, applying inequality constraints after backfitting convergence.
Value
A named list with elements beta_block (final
coefficient vector), result (per-partition coefficient
list), last_qp_sol (last successful solve.QP
output or NULL), converged (logical), and
final_deviance (scalar).
Build Derivative QP Constraints in Full P-Dimensional Space
Description
Internal helper that constructs the Amat / bvec pair
enforcing derivative sign constraints at every row of a block-diagonal
design matrix.
Given an N_{\mathrm{sub}} \times P block-diagonal design matrix
X_block, where P = p \times (K+1), this function:
- Recovers the per-partition p-column expansion matrix C_qp and records each row's partition assignment.
- Calls make_derivative_matrix on C_qp to obtain first or second derivative matrices with respect to each predictor.
- Optionally selects only derivatives for a subset of predictor variables (target_vars).
- Maps each derivative row into the full P-dimensional coefficient space, yielding one constraint column per (observation, variable) pair.
The result is a constraint pair \mathbf{A}^{\top}\boldsymbol{\beta}
\ge \mathbf{b} (with \mathbf{b} = \mathbf{0}) suitable for
solve.QP.
Usage
.build_deriv_qp(
X_block,
sign_mult,
just_first,
p_expansions,
K,
colnm_expansions,
power1_cols,
power2_cols,
nonspline_cols,
interaction_single_cols,
interaction_quad_cols,
triplet_cols,
include_2way_interactions,
include_3way_interactions,
include_quadratic_interactions,
expansion_scales,
target_vars = NULL,
og_cols = NULL
)
Arguments
X_block |
Numeric matrix; the N_{\mathrm{sub}} \times P block-diagonal design. |
sign_mult |
Numeric scalar, +1 or -1, setting the direction of the derivative sign constraint. |
just_first |
Logical. If TRUE, only first-derivative constraints are constructed. |
p_expansions |
Integer. Number of basis expansions per partition. |
K |
Integer. Number of interior knots (partitions minus 1). |
colnm_expansions |
Character vector of length p_expansions giving expansion column names. |
power1_cols |
Integer vector of linear-term column indices. |
power2_cols |
Integer vector of quadratic-term column indices. |
nonspline_cols |
Integer vector of non-spline linear column indices. |
interaction_single_cols |
Integer vector of linear-by-linear interaction column indices. |
interaction_quad_cols |
Integer vector of linear-by-quadratic interaction column indices. |
triplet_cols |
Integer vector of three-way interaction column indices. |
include_2way_interactions |
Logical switch forwarded to
make_derivative_matrix. |
include_3way_interactions |
Logical switch forwarded to
make_derivative_matrix. |
include_quadratic_interactions |
Logical switch forwarded to
make_derivative_matrix. |
expansion_scales |
Numeric vector of length p_expansions of expansion scaling factors. |
target_vars |
Optional. Integer vector of predictor column indices
or character vector of predictor names identifying which predictors
to constrain. When NULL (the default), all spline predictors are constrained. |
og_cols |
Optional character vector of original predictor column
names, used to resolve character target_vars entries. |
Value
A list with components:
- Amat
P \times M constraint matrix, where M is the number of unique constraint columns (after deduplication).
- bvec
Numeric vector of length M, all zeros.
- meq
Integer, always 0 (all constraints are inequalities).
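A toy illustration of the resulting constraint pair for a single 1-D quadratic expansion c(1, x, x^2) (hypothetical small case, not this helper's full partition mapping):

```r
x <- c(0.2, 0.5, 0.8)
sign_mult <- 1                                   # +1 encodes a non-decreasing fit
## Derivative of the basis c(1, x, x^2) w.r.t. x, one column per observation:
Amat <- sign_mult * vapply(x, function(xi) c(0, 1, 2 * xi), numeric(3))
bvec <- rep(0, length(x)); meq <- 0              # A' b >= 0, no equalities
## solve.QP-style feasibility check for candidate coefficients b:
b <- c(0.5, 1, 0.3)                              # f(x) = 0.5 + x + 0.3 x^2, increasing
all(t(Amat) %*% b >= bvec)                       # TRUE: all derivative constraints hold
```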
Build Block-Diagonal Penalty Matrix
Description
Thin wrapper around .solver_build_lambda_block().
Keeping the local helper name preserves existing call sites in
get_B() while centralizing the shared implementation in
solver_utils.R.
Usage
.build_lambda_block(Lambda, K, unique_penalty_per_partition, L_partition_list)
Arguments
Lambda |
Shared penalty matrix. |
K |
Integer; number of interior knots (partitions minus 1). |
unique_penalty_per_partition |
Logical; if TRUE, partition-specific penalties from L_partition_list are added to each block. |
L_partition_list |
List of partition-specific penalty matrices. |
Value
A P \times P block-diagonal matrix where
P = p \times (K+1).
Build Tuning Environment
Description
Assembles a named list containing all pre-computed objects and configuration needed by the GCV_u evaluation and gradient functions during penalty tuning. This avoids deep nesting of closures and makes the dependencies explicit.
Usage
.build_tuning_env(
y,
X,
X_gram,
Xy,
smoothing_spline_penalty,
A,
R_constraints,
K,
p_expansions,
N_obs,
custom_penalty_mat,
colnm_expansions,
unique_penalty_per_predictor,
unique_penalty_per_partition,
meta_penalty,
family,
delta,
order_list,
observation_weights,
homogenous_weights,
parallel,
parallel_eigen,
parallel_trace,
parallel_aga,
parallel_matmult,
parallel_unconstrained,
cl,
chunk_size,
num_chunks,
rem_chunks,
unconstrained_fit_fxn,
keep_weighted_Lambda,
iterate,
qp_score_function,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
tol,
sd_y,
constraint_value_vectors,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
blockfit,
just_linear_without_interactions,
Vhalf,
VhalfInv,
verbose,
include_warnings,
flat_cols,
use_blockfit
)
Arguments
flat_cols |
Integer vector; pre-computed flat column indices (passed
in from lgspline.fit). |
use_blockfit |
Logical; pre-computed dispatch flag (passed in from
lgspline.fit). |
Details
In addition to the standard tuning arguments, the environment stores two pre-computed blockfit dispatch items:
- use_blockfit
Logical; TRUE when blockfit is enabled, flat_cols is non-empty, and K > 0. Mirrors the dispatch logic in lgspline.fit so that the same fitting path is used during tuning as during the final fit.
- flat_cols
Integer vector; column indices of non-interactive linear terms derived from just_linear_without_interactions and colnm_expansions. Pre-computed once here rather than re-derived at every GCV evaluation.
Value
Named list (the "tuning environment").
Check KKT Conditions for Partition-Wise Active-Set Method
Description
Given current constrained coefficient estimates and a set of active inequality constraints (treated as equalities in the last Lagrangian projection), checks primal feasibility of inactive constraints and dual feasibility (non-negative multipliers) of active constraints.
Usage
.check_kkt_partitionwise(
result,
Ghalf,
GhalfInv,
Xy_or_uncon,
is_path3,
A_aug,
n_eq_orig,
qp_Amat,
qp_bvec,
active_ineq,
K,
p_expansions,
family,
parallel_matmult,
parallel_aga,
cl,
chunk_size,
num_chunks,
rem_chunks,
tol
)
Arguments
result |
List of K+1 current coefficient vectors by partition. |
Ghalf |
List of \mathbf{G}^{1/2} matrices by partition. |
GhalfInv |
List of \mathbf{G}^{-1/2} matrices by partition. |
Xy_or_uncon |
Either the list of cross-products
\mathbf{X}_k^{\top}\mathbf{y}_k (Path 2) or unconstrained estimates (Path 3). |
is_path3 |
Logical; if TRUE, Xy_or_uncon holds unconstrained estimates. |
A_aug |
Augmented constraint matrix (original A plus active inequality columns). |
n_eq_orig |
Integer; number of original equality constraints (columns of A before augmentation). |
qp_Amat |
Full inequality constraint matrix. |
qp_bvec |
Full inequality constraint RHS. |
active_ineq |
Integer vector; indices into columns of
qp_Amat giving the currently active inequality constraints. |
K, p_expansions |
Integer dimensions. |
family |
GLM family object. |
parallel_matmult, parallel_aga |
Logical flags. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
tol |
Numeric tolerance for feasibility and multiplier checks. |
Details
Multipliers for active inequality constraints are recovered from the
OLS fit used in the Lagrangian projection: the fitted coefficients
on \mathbf{X}^* = \mathbf{G}^{1/2}\mathbf{A}_{\mathrm{aug}}
give the Lagrangian multipliers (up to sign and scaling).
Value
A list with components:
- feasible
Logical; TRUE if all inactive inequality constraints are satisfied within tolerance.
- dual_feasible
Logical; TRUE if all active inequality multipliers are non-negative within tolerance.
- violated
Integer vector; indices of violated inactive constraints (into
qp_Amatcolumns).- drop
Integer vector; indices of active constraints with negative multipliers that should be dropped.
- multipliers
Numeric vector of Lagrangian multipliers for the active inequality constraints.
Compute Partitioned GLS Cross-Products
Description
Computes \mathbf{X}_k^{\top}\mathbf{V}^{-1}\mathbf{y} for each
partition, accounting for cross-partition contributions from the
correlation structure.
Usage
.compute_Xy_V(X, y, VhalfInv_perm, K, p_expansions, order_list)
Arguments
X |
List of partition-specific design matrices. |
y |
List of response vectors by partition. |
VhalfInv_perm |
|
K, p_expansions |
Integer dimensions. |
order_list |
Partition-to-data index mapping (unused here but retained for interface consistency). |
Value
A list of K+1 column vectors, each
p\_expansions \times 1.
Compute Euclidean distance matrix for a cluster block
Description
Returns pairwise Euclidean distances within the cluster indexed by
inds. When spacetime has multiple columns, squared distances
are averaged across dimensions before taking the square root.
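The dimension-averaging convention can be sketched in a few lines of NumPy (illustrative only; the helper itself is internal R code). With two 2-D points whose per-dimension squared differences are 9 and 16, the averaged distance is sqrt((9 + 16)/2):

```python
import numpy as np

spacetime = np.array([[0.0, 0.0],
                      [3.0, 4.0]])
diff = spacetime[:, None, :] - spacetime[None, :, :]
D = np.sqrt((diff ** 2).mean(axis=2))    # average across dims, then sqrt
# D[0, 1] = sqrt((9 + 16) / 2) = sqrt(12.5)
```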
Usage
.compute_dist_block(spacetime, inds)
Evaluate GCV_u Criterion at a Given Penalty Configuration
Description
Computes the GCV_u (unbiased generalized cross-validation) criterion for a given set of penalty parameters. This is the objective function minimized during penalty tuning.
Usage
.compute_gcvu(par, log_penalty_vec, env, ...)
Arguments
par |
Numeric vector; log-scale penalty parameters. First two elements are log(wiggle_penalty) and log(flat_ridge_penalty). Remaining elements (if any) are log-scale predictor/partition penalties. |
log_penalty_vec |
Numeric vector; log-scale predictor/partition penalties (passed separately for compatibility with the grid search). |
env |
List; pre-computed objects and tuning configuration. Contains:
|
... |
Additional arguments passed to fitting functions. |
Value
List containing:
- GCV_u
Numeric; GCV_u criterion value including meta-penalty.
- B
List; fitted coefficient vectors by partition.
- GXX
List; \mathbf{G}_{k} \mathbf{X}_{k}^{\top}\mathbf{X}_{k} matrices.
- G_list
List; eigendecomposition results from compute_G_eigen.
- mean_W
Numeric; mean of hat matrix diagonal.
- sum_W
Numeric; trace of hat matrix.
- Lambda
Matrix; combined penalty matrix.
- L1
Matrix; smoothing spline penalty component.
- L2
Matrix; ridge penalty component.
- L_predictor_list
List; predictor-specific penalty matrices.
- L_partition_list
List; partition-specific penalty matrices.
- numerator
Numeric; sum of squared residuals.
- denominator
Numeric; GCV denominator N(1 - \bar{W})^{2}.
- residuals
List; residual vectors by partition.
- denom_sq
Numeric; squared denominator.
- AGAInv
Matrix;
(\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}.
Compute Closed-Form Gradient of GCV_u Criterion
Description
Computes the gradient of the GCV_u criterion with respect to the log-scale penalty parameters using analytical derivatives of the hat matrix trace and residual sum of squares.
Usage
.compute_gcvu_gradient(par, log_penalty_vec, outlist = NULL, env, ...)
Arguments
par |
Numeric vector; log-scale penalty parameters. |
log_penalty_vec |
Numeric vector; log-scale predictor/partition penalties. |
outlist |
List or NULL; pre-computed GCV_u components from
|
env |
List; pre-computed objects and tuning configuration (same
structure as in |
... |
Additional arguments passed to fitting functions. |
Details
The gradient is computed via:
\frac{\partial \mathrm{GCV}_u}{\partial \theta}
= \frac{1}{D^{2}} \left(
\frac{\partial N}{\partial \theta} D
- N \frac{\partial D}{\partial \theta}
\right)
where N = \sum r_{i}^{2} (numerator), D = n(1 - \bar{W})^{2}
(denominator), \theta is the log-scale penalty parameter, and the
chain rule d\lambda / d\theta = \lambda (exp parameterization) is
applied.
For predictor- and partition-specific penalties, a trace-ratio heuristic is used:
\frac{\partial \mathrm{GCV}_u}{\partial \lambda_{j}}
\approx \frac{\mathrm{tr}(\mathbf{L}_{j})}{\mathrm{tr}(\boldsymbol{\Lambda})}
\cdot \frac{\partial \mathrm{GCV}_u}{\partial \lambda_{w}}
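The quotient rule and exp chain rule above can be verified against finite differences on a toy ridge problem. This NumPy sketch is illustrative only (the package's criterion involves partitioned fits and a meta-penalty); here the hat matrix is the plain ridge hat matrix and all names are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 5
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.3 * rng.standard_normal(n)

def gcv(theta):
    lam = np.exp(theta)                          # exp parameterization
    G = np.linalg.inv(X.T @ X + lam * np.eye(p))
    H = X @ G @ X.T                              # hat matrix
    r = y - H @ y
    return (r @ r) / (n * (1.0 - np.trace(H) / n) ** 2)

def gcv_grad(theta):
    lam = np.exp(theta)
    G = np.linalg.inv(X.T @ X + lam * np.eye(p))
    H = X @ G @ X.T
    r = y - H @ y
    N, wbar = r @ r, np.trace(H) / n
    D = n * (1.0 - wbar) ** 2
    dH = -X @ G @ G @ X.T                        # dH/dlambda for ridge
    dN = -2.0 * r @ (dH @ y)                     # d(RSS)/dlambda
    dD = -2.0 * (1.0 - wbar) * np.trace(dH)      # dD/dlambda
    # quotient rule, then chain rule dlambda/dtheta = lambda
    return lam * (dN * D - N * dD) / D ** 2

theta0 = 0.5
h = 1e-5
fd = (gcv(theta0 + h) - gcv(theta0 - h)) / (2.0 * h)
```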
Value
List containing:
- GCV_u
Numeric; GCV_u criterion value including meta-penalty.
- gradient
Numeric vector; gradient on the log penalty scale.
- outlist
List; GCV_u components (for reuse to avoid recomputation).
Compute Regularization (Meta) Penalty on Penalty Parameters
Description
Computes the regularization term that pulls predictor- and partition-specific penalty parameters toward 1 on the raw (positive) scale. This acts as a "meta-penalty" on the penalty magnitudes themselves.
Usage
.compute_meta_penalty(
wiggle_penalty,
penalty_vec,
meta_penalty_coef,
unique_penalty_per_predictor,
unique_penalty_per_partition
)
Arguments
wiggle_penalty |
Numeric; current wiggle penalty on raw scale. |
penalty_vec |
Numeric vector; current predictor/partition penalties
on raw scale. May be empty ( |
meta_penalty_coef |
Numeric; coefficient for the meta-penalty. |
unique_penalty_per_predictor |
Logical; whether predictor-specific penalties are active. |
unique_penalty_per_partition |
Logical; whether partition-specific penalties are active. |
Details
The penalty takes the form:
0.5 \times c_{\mathrm{meta}} \times \sum_{j} (\lambda_{j} - 1)^{2}
+ 0.5 \times 10^{-32} \times (\lambda_{w} - 1)^{2}
where \lambda_{j} are predictor/partition penalties and
\lambda_{w} is the wiggle penalty.
Value
Numeric scalar; the regularization penalty value.
Compute Gradient of Regularization (Meta) Penalty
Description
Computes the gradient of the meta-penalty with respect to the log-scale penalty parameters, incorporating the exp parameterization chain rule.
Usage
.compute_meta_penalty_gradient(
wiggle_penalty,
penalty_vec,
meta_penalty_coef,
unique_penalty_per_predictor,
unique_penalty_per_partition
)
Arguments
wiggle_penalty |
Numeric; current wiggle penalty on raw scale. |
penalty_vec |
Numeric vector; current predictor/partition penalties
on raw scale. May be empty ( |
meta_penalty_coef |
Numeric; coefficient for the meta-penalty. |
unique_penalty_per_predictor |
Logical; whether predictor-specific penalties are active. |
unique_penalty_per_partition |
Logical; whether partition-specific penalties are active. |
Details
Under exp parameterization \lambda = \exp(\theta):
\frac{\partial}{\partial \theta}
\left[ 0.5 c (\exp(\theta) - 1)^{2} \right]
= c (\lambda - 1) \lambda
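The chain-rule formula above is easy to confirm numerically. A minimal sketch (the coefficient value is illustrative, not a package default):

```python
import numpy as np

c = 0.1                                   # illustrative meta_penalty_coef
pen = lambda theta: 0.5 * c * (np.exp(theta) - 1.0) ** 2
grad = lambda theta: c * (np.exp(theta) - 1.0) * np.exp(theta)  # c(lam-1)lam

theta = 0.7
h = 1e-6
fd = (pen(theta + h) - pen(theta - h)) / (2.0 * h)
```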
Value
Numeric vector; gradient of the meta-penalty on the log scale. Length equals 2 + length(penalty_vec).
Compute Partitioned GEE Score Vector
Description
Computes the GEE score vector \mathbf{X}^{\top}\mathrm{diag}
(\mathbf{W})\mathbf{V}^{-1}(\mathbf{y} - \boldsymbol{\mu}) split
into per-partition pieces of dimension p \times 1 each. Uses
the identity \mathbf{V}^{-1} = \mathbf{I} + \boldsymbol{\Delta}_V
where \boldsymbol{\Delta}_V = \mathbf{V}^{-1} - \mathbf{I} is
precomputed and fixed across iterations, avoiding explicit formation
of \mathbf{V}^{-1}.
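The \boldsymbol{\Delta}_V identity above can be sketched directly in NumPy (illustrative; the package's version is partitioned and written in R/C++). The two forms of the score agree while only the second reuses the precomputed perturbation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 3
X = rng.standard_normal((n, p))
resid = rng.standard_normal(n)           # y - mu
W = rng.uniform(0.5, 1.5, n)             # GLM working weights
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)              # SPD working covariance
Vinv = np.linalg.inv(V)

Delta_V = Vinv - np.eye(n)               # precomputed once, fixed across iterations
score_direct = X.T @ (W * (Vinv @ resid))
score_trick = X.T @ (W * (resid + Delta_V @ resid))
```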
Usage
.compute_score_V_partitioned(
X,
X_block,
y,
result,
K,
p_expansions,
family,
W,
Delta_V,
observation_weights
)
Arguments
X |
List of partition-specific design matrices. |
X_block |
Full |
y |
List of response vectors by partition. |
result |
List of current coefficient column vectors by partition. |
K, p_expansions |
Integer dimensions. |
family |
GLM family object. |
W |
Length- |
Delta_V |
Fixed |
observation_weights |
List of observation weights by partition. |
Value
A list of K+1 column vectors, each
p\_expansions \times 1, representing the partition-wise
components of the full GEE score.
Compute Pseudocount Delta for Link Function Stabilization
Description
Determines the pseudocount \delta used to stabilize link function
transformations during GCV penalty tuning. For identity link or when
the response is naturally in the domain of the link function, returns 0.
Otherwise, finds the \delta that makes the transformed response
distribution most closely approximate a t-distribution.
Usage
.compute_tuning_delta(family, unl_y, N_obs, observation_weights, opt)
Arguments
family |
GLM family object. |
unl_y |
Numeric vector; unlisted response values (concatenated across partitions). |
N_obs |
Integer; total sample size. |
observation_weights |
List or NULL; observation weights by partition. |
opt |
Logical; whether optimization is being performed. |
Value
Numeric scalar; the pseudocount \delta \geq 0.
Compute Predictions During Penalty Tuning
Description
Wrapper around matmult_block_diagonal for computing partition-wise
predictions \mathbf{X}_{k} \boldsymbol{\beta}_{k} during GCV
penalty tuning.
Usage
.compute_tuning_predictions(
X,
B,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
X |
List; design matrices by partition. |
B |
List; coefficient vectors by partition. |
K |
Integer; number of interior knots. |
parallel |
Logical; use parallel computation. |
cl |
Parallel cluster object. |
chunk_size, num_chunks, rem_chunks |
Integer; parallel chunking parameters. |
Value
List of prediction vectors, one per partition.
Compute Residuals for GCV Criterion During Penalty Tuning
Description
Computes residuals used in the numerator of the GCV criterion. Handles identity link, general GLM link functions with pseudocount delta, custom deviance residual functions, and observation weights.
Usage
.compute_tuning_residuals(
y,
preds,
delta,
family,
observation_weights,
K,
order_list,
...
)
Arguments
y |
List; response vectors by partition. |
preds |
List; prediction vectors by partition. |
delta |
Numeric; pseudocount for link function stabilization. |
family |
GLM family object. |
observation_weights |
List; observation weights by partition. |
K |
Integer; number of interior knots (partitions - 1). |
order_list |
List; observation indices per partition. |
... |
Additional arguments passed to |
Details
Three computation paths:
- Identity link or no custom deviance: standard g(y) - \hat{\eta} residuals on the link scale, optionally weighted by observation weights for non-Gaussian families. For Gaussian identity link with heterogeneous weights, the weights have already been absorbed into X and y prior to this call.
- Custom deviance residuals: delegates to family$custom_dev.resids(y, mu, order_indices, family, observation_weights, ...).
Value
List of residual vectors, one per partition.
Damped BFGS Optimizer for GCV Penalty Tuning
Description
Custom implementation of damped BFGS quasi-Newton optimization for minimizing the GCV_u criterion. Uses step-size damping with backtracking and Sherman-Morrison-Woodbury inverse Hessian updates.
Usage
.damped_bfgs(
par,
log_penalty_vec,
gcvu_fxn,
gr_fxn,
env,
tol,
max_iter = 100,
...
)
Arguments
par |
Numeric vector; initial log-scale penalty parameters (first two elements are log(wiggle) and log(flat_ridge)). |
log_penalty_vec |
Numeric vector; log-scale predictor/partition penalties appended to the optimization vector. |
gcvu_fxn |
Function; GCV_u evaluation function with signature
|
gr_fxn |
Function; gradient function with signature
|
env |
List; tuning environment (passed through to gcvu_fxn and gr_fxn). |
tol |
Numeric; convergence tolerance for both GCV_u change and parameter change. |
max_iter |
Integer; maximum number of BFGS iterations (default 100). |
... |
Additional arguments passed to fitting functions. |
Details
The optimizer uses the following strategy:
Iterations 1-2: steepest descent with damping.
Iteration 3+: BFGS quasi-Newton with inverse Hessian approximation updated via the standard secant condition. Falls back to identity matrix when the update is numerically unstable.
Step acceptance: Armijo-like criterion (accept if \mathrm{GCV}_{u}^{(\mathrm{new})} \leq \mathrm{GCV}_{u}^{(\mathrm{old})}).
Backtracking: damping factor halved on rejection; terminates when damp < 2^{-10} (early iterations) or 2^{-12} (later iterations).
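The strategy above can be condensed into a short sketch. This is not the package's optimizer (which minimizes GCV_u in R with its own damping schedule); it is a generic damped quasi-Newton loop in NumPy that mirrors the documented structure, demonstrated on an illustrative quadratic:

```python
import numpy as np

def damped_bfgs(f, grad, x0, tol=1e-8, max_iter=100):
    """Steepest descent for the first two iterations, then BFGS
    inverse-Hessian updates; a step is accepted only if it does not
    increase f, with the damping factor halved on rejection."""
    x, Hinv = np.asarray(x0, float), np.eye(len(x0))
    fx, g = f(x), grad(x)
    for it in range(1, max_iter + 1):
        d = -g if it <= 2 else -(Hinv @ g)
        damp = 1.0
        while damp > 2.0 ** -12:
            x_new = x + damp * d
            f_new = f(x_new)
            if f_new <= fx:                      # Armijo-like acceptance
                break
            damp *= 0.5                          # backtracking
        else:
            return x, fx, it                     # no acceptable step found
        g_new = grad(x_new)
        s, yvec = x_new - x, g_new - g
        sy = s @ yvec
        if it > 1 and sy > 1e-12:                # standard secant update
            rho, I = 1.0 / sy, np.eye(len(x))
            Hinv = ((I - rho * np.outer(s, yvec)) @ Hinv
                    @ (I - rho * np.outer(yvec, s)) + rho * np.outer(s, s))
        else:
            Hinv = np.eye(len(x))                # fall back to identity
        if abs(fx - f_new) < tol and np.max(np.abs(x_new - x)) < tol:
            return x_new, f_new, it
        x, fx, g = x_new, f_new, g_new
    return x, fx, max_iter

# illustrative quadratic with minimum at (1, -2)
A = np.array([[3.0, 0.5], [0.5, 2.0]])
xstar = np.array([1.0, -2.0])
f = lambda x: 0.5 * (x - xstar) @ A @ (x - xstar)
g = lambda x: A @ (x - xstar)
xhat, fmin, iters = damped_bfgs(f, g, np.zeros(2))
```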
Value
List containing:
- par
Numeric vector; best log-scale penalty parameters found.
- gcv_u
Numeric; best GCV_u value achieved.
- iterations
Integer; number of iterations performed.
Detect Whether Inequality Constraints Require Global (Dense) QP
Description
Thin wrapper around .solver_detect_qp_global().
This preserves the original helper name inside get_B() while
using the shared detection logic defined in solver_utils.R.
Usage
.detect_qp_global(qp_Amat, p_expansions, K)
Arguments
qp_Amat |
Inequality constraint matrix ( |
p_expansions |
Integer; number of basis terms per partition. |
K |
Integer; number of interior knots. |
Value
Logical; TRUE if dense QP is needed, FALSE if
partition-wise active-set is valid.
Extract Per-Partition Diagonal Blocks from a Full Matrix
Description
Given a P \times P matrix (e.g., the full-system
\mathbf{G} = (\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X} +
\boldsymbol{\Lambda})^{-1}), extracts the p \times p diagonal
block for each of the K+1 partitions.
Usage
.extract_G_diagonal(M, p_expansions, K)
Arguments
M |
A |
p_expansions |
Integer; number of basis terms (columns) per partition. |
K |
Integer; number of interior knots. |
Value
A list of length K+1, each element a
p\_expansions \times p\_expansions matrix corresponding to the
diagonal block for partition k.
Fit Coefficients During GCV Tuning: blockfit_solve or get_B
Description
Dispatches to blockfit_solve when the blockfit conditions are met
(i.e. env$use_blockfit is TRUE), otherwise calls get_B.
On blockfit_solve failure, falls back to get_B automatically.
Usage
.fit_coefficients(G_list, Lambda, L_partition_list, env, return_G_getB, ...)
Arguments
G_list |
List; eigendecomposition results from |
Lambda |
Matrix; current combined penalty matrix. |
L_partition_list |
List; partition-specific penalty matrices. |
env |
List; tuning environment from |
return_G_getB |
Logical; whether to return G inside the fit. Set to
TRUE within |
... |
Additional arguments forwarded to the fitting routine. |
Details
The blockfit condition mirrors lgspline.fit:
blockfit && length(flat_cols) > 0 && K > 0, pre-computed in
tune_Lambda and stored in env$use_blockfit.
return_G_getB is set to TRUE by the callers so that
B_list$G_list contains the updated G matrices (after any GLM
weight iteration inside get_B or blockfit_solve).
These are needed immediately after this call for AGAmult_wrapper,
GXX, and the trace computation.
Value
List; output of blockfit_solve or get_B, containing
at minimum $B (coefficient list) and $G_list.
Call get_B During GCV Tuning
Description
Internal wrapper that calls get_B with all arguments drawn from
the tuning environment env. Separated from .fit_coefficients
so the fallback path in .fit_coefficients is clean and does not
repeat the full argument list.
Usage
.fit_get_B(G_list, Lambda, L_partition_list, env, return_G_getB, ...)
Arguments
G_list |
List; eigendecomposition results from |
Lambda |
Matrix; current combined penalty matrix. |
L_partition_list |
List; partition-specific penalty matrices. |
env |
List; tuning environment from |
return_G_getB |
Logical; whether to return G inside the fit. Set to
TRUE within |
... |
Additional arguments forwarded to the fitting routine. |
Path 2: Gaussian Identity Link, No Correlation
Description
For the canonical Gaussian case without correlation structures, the
unconstrained estimate has the closed form
\hat{\boldsymbol{\beta}}_k = \mathbf{G}_k \mathbf{X}_k^{\top}
\mathbf{y}_k and the constrained estimate follows by a single
Lagrangian projection. No iteration is needed.
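The closed form above exploits the block structure: with no correlation, the full penalized least-squares solve decomposes exactly into independent partition-wise solves. A NumPy sketch with two illustrative partitions (the package's version adds the Lagrangian projection on top of this):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, lam = 20, 3, 0.5
X1, X2 = rng.standard_normal((n, p)), rng.standard_normal((n, p))
y1, y2 = rng.standard_normal(n), rng.standard_normal(n)

# partition-wise closed form: beta_k = G_k X_k' y_k
G1 = np.linalg.inv(X1.T @ X1 + lam * np.eye(p))
G2 = np.linalg.inv(X2.T @ X2 + lam * np.eye(p))
b1, b2 = G1 @ (X1.T @ y1), G2 @ (X2.T @ y2)

# identical to solving the full block-diagonal penalized LS system
Xb = np.block([[X1, np.zeros((n, p))],
               [np.zeros((n, p)), X2]])
yb = np.concatenate([y1, y2])
b_full = np.linalg.solve(Xb.T @ Xb + lam * np.eye(2 * p), Xb.T @ yb)
```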
Usage
.get_B_gaussian_nocorr(
Xy,
Ghalf,
GhalfInv,
A,
K,
p_expansions,
R_constraints,
constraint_value_vectors,
family,
return_G_getB,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
qp_score_function,
parallel_aga,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks,
X,
y,
Lambda,
order_list,
observation_weights,
iterate,
tol,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
unique_penalty_per_partition,
L_partition_list,
VhalfInv,
homogenous_weights,
...
)
Arguments
Xy |
List of cross-products |
Ghalf, GhalfInv |
Lists of |
A |
Constraint matrix |
K, p_expansions, R_constraints |
Integer dimensions. |
constraint_value_vectors |
Constraint RHS list encoding
|
family |
GLM family object. |
return_G_getB |
Logical; return covariance components. |
quadprog |
Logical; apply QP refinement for inequality constraints. |
qp_Amat, qp_bvec, qp_meq |
QP constraint specification. |
qp_score_function |
Score function for QP step. |
parallel_aga, parallel_matmult |
Logical flags. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
X, y, Lambda, order_list, observation_weights |
Standard arguments passed through for optional QP refinement. |
iterate, tol, glm_weight_function, schur_correction_function, need_dispersion_for_estimation, dispersion_function, unique_penalty_per_partition, L_partition_list, VhalfInv, homogenous_weights |
Standard arguments passed through. |
... |
Passed to sub-functions. |
Details
When K = 0 and there are no inequality constraints or nonzero
constraint values, the function returns
\hat{\boldsymbol{\beta}} = \mathbf{G}\mathbf{X}^{\top}\mathbf{y}
directly without forming the full P-dimensional OLS system.
Value
Same structure as get_B.
Path 1a: Gaussian Identity + GEE (Closed-Form Full-System Solve)
Description
Computes the constrained penalized GLS estimate for Gaussian response
with identity link when a correlation structure is present. Because
\mathbf{V}^{-1/2} couples all partitions, fitting must operate on
the full P-dimensional whitened system rather than partition-wise.
Usage
.get_B_gee_gaussian(
X_block,
X_tilde,
y_tilde,
VhalfInv_perm,
Lambda_block,
A,
K,
p_expansions,
constraint_value_vectors,
family,
return_G_getB,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
qp_score_function,
order_list,
observation_weights,
...
)
Arguments
X_block |
Full |
X_tilde |
Whitened design |
y_tilde |
Whitened response |
VhalfInv_perm |
|
Lambda_block |
Full |
A |
Constraint matrix ( |
K, p_expansions |
Integer dimensions. |
constraint_value_vectors |
Constraint RHS list encoding
|
family |
GLM family object. |
return_G_getB |
Logical; return covariance components. |
quadprog |
Logical; apply QP refinement. |
qp_Amat, qp_bvec, qp_meq |
QP constraint specification. |
qp_score_function |
Score function for QP step. |
order_list, observation_weights |
Standard partition arguments. |
... |
Passed to score function. |
Details
The unconstrained GLS estimate is:
\hat{\boldsymbol{\beta}} = \mathbf{G}(\mathbf{X}^{*\top}
\mathbf{y}^{*}), \quad
\mathbf{G} = (\mathbf{X}^{*\top}\mathbf{X}^{*} +
\boldsymbol{\Lambda})^{-1}
where \mathbf{X}^{*} = \mathbf{V}^{-1/2}\mathbf{X} and
\mathbf{y}^{*} = \mathbf{V}^{-1/2}\mathbf{y}. The constrained
estimate is then obtained via the \mathbf{G}^{1/2}\mathbf{r}^*
trick in the full P-dimensional space.
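The equivalence between the whitened solve and direct penalized GLS can be checked in a few lines of NumPy (illustrative only; sizes and names are made up, and the package works partition-wise):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 15, 4, 1.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)              # SPD working covariance

w, Q = np.linalg.eigh(V)                 # symmetric V^{-1/2}
VhalfInv = Q @ np.diag(w ** -0.5) @ Q.T

Xs, ys = VhalfInv @ X, VhalfInv @ y      # whitened design and response
b_white = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ ys)

Vinv = np.linalg.inv(V)                  # direct penalized GLS
b_gls = np.linalg.solve(X.T @ Vinv @ X + lam * np.eye(p),
                        X.T @ Vinv @ y)
```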
Value
If return_G_getB = TRUE: list with B,
G_list, and qp_info. Otherwise: list of B and
qp_info.
Path 1b: Non-Gaussian GEE (Damped SQP with Full Whitened Design)
Description
Estimates constrained coefficients for non-Gaussian GLMs with correlation
structures using damped Sequential Quadratic Programming (SQP) in the
full P-dimensional whitened space. The whitened design
\mathbf{X}^{*} = \mathbf{V}^{-1/2}\mathbf{X} is required
because \mathbf{V}^{-1/2} couples all partitions.
Usage
.get_B_gee_glm(
X_block,
X_tilde,
y_block,
y_tilde,
VhalfInv_perm,
Lambda_block,
A,
K,
p_expansions,
constraint_value_vectors,
family,
return_G_getB,
iterate,
tol,
qp_Amat,
qp_bvec,
qp_meq,
qp_score_function,
order_list,
observation_weights,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
VhalfInv,
...
)
Arguments
X_block |
Full |
X_tilde |
Whitened design |
y_block |
Unwhitened response vector |
y_tilde |
Whitened response |
VhalfInv_perm |
|
Lambda_block |
Full |
A |
Constraint matrix ( |
K, p_expansions |
Integer dimensions. |
constraint_value_vectors |
Constraint RHS list encoding
|
family |
GLM family object. |
return_G_getB |
Logical; return covariance components. |
iterate |
Logical; if |
tol |
Convergence tolerance. |
qp_Amat, qp_bvec, qp_meq |
QP constraint specification. |
qp_score_function |
Score function for QP step. |
order_list, observation_weights |
Standard partition arguments. |
glm_weight_function |
Function computing GLM working weights. |
schur_correction_function |
Function computing Schur corrections. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Dispersion estimation function. |
VhalfInv |
Inverse square root correlation matrix. |
... |
Passed to weight, correction, dispersion, and score functions. |
Value
Same structure as .get_B_gee_gaussian.
Path 1b-Woodbury: Non-Gaussian GEE with Woodbury Acceleration
Description
Replacement for .get_B_gee_glm when the off-diagonal rank
r is low. At each damped Newton iteration, the Woodbury
decomposition is recomputed with updated GLM working weights
\mathbf{W}(\boldsymbol{\beta}^{(s)}) (since the weighted
perturbation \boldsymbol{\Delta}(\mathbf{W}) changes), and
the constrained step is taken via
.lagrangian_project_woodbury.
Usage
.get_B_gee_glm_woodbury(
X,
y,
K,
p_expansions,
VhalfInv_perm,
order_list,
A,
R_constraints,
constraint_value_vectors,
family,
return_G_getB,
iterate,
tol,
qp_Amat,
qp_bvec,
qp_meq,
qp_score_function,
observation_weights,
Lambda,
Lambda_block,
unique_penalty_per_partition,
L_partition_list,
wb_decomp_init,
wb_sqrt_init,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
VhalfInv,
parallel_eigen,
parallel_aga,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks,
...
)
Arguments
X, y |
Lists of partition-specific design matrices and responses. |
K, p_expansions |
Integer dimensions. |
VhalfInv_perm |
|
order_list, observation_weights |
Standard partition arguments. |
A |
Constraint matrix ( |
constraint_value_vectors |
Constraint RHS list encoding
|
family |
GLM family object. |
return_G_getB |
Logical; return covariance components. |
iterate |
Logical; if |
tol |
Convergence tolerance. |
qp_Amat, qp_bvec, qp_meq |
QP constraint specification. |
qp_score_function |
Score function for QP step. |
Lambda, Lambda_block |
Shared and full penalty matrices. |
unique_penalty_per_partition |
Logical. |
L_partition_list |
Partition-specific penalty matrices. |
wb_decomp_init |
Initial Woodbury decomposition from
|
wb_sqrt_init |
Initial half-sqrt components (unused; retained for interface consistency with the Gaussian Woodbury path). |
glm_weight_function |
Function computing GLM working weights. |
schur_correction_function |
Function computing Schur corrections. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Dispersion estimation function. |
VhalfInv |
Inverse square root correlation matrix. |
parallel_eigen, parallel_aga, parallel_matmult |
Logical flags. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
... |
Passed to sub-functions. |
Details
The key precomputation exploited here is that
\boldsymbol{\Delta}_V = \mathbf{V}^{-1} - \mathbf{I} and
\boldsymbol{\Delta}_V \mathbf{X} are fixed across iterations.
At each step, the weighted perturbation
\boldsymbol{\Delta}(\mathbf{W}) = \mathbf{X}^{\top}
\mathrm{diag}(\mathbf{W})(\mathbf{V}^{-1} - \mathbf{I})\mathbf{X}
is computed in O(NP) using the precomputed
\boldsymbol{\Delta}_V \mathbf{X}, then split into
block-diagonal corrections (absorbed into the corrected Gram) and
an off-diagonal low-rank remainder.
Falls back to the dense .get_B_gee_glm if the Woodbury
decomposition becomes invalid (e.g., rank exceeds threshold or
\mathbf{F} is not positive definite) at any iteration.
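The cost saving from the fixed \boldsymbol{\Delta}_V \mathbf{X} precomputation can be sketched directly (illustrative NumPy; only the weight vector changes between "iterations" here):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 3
X = rng.standard_normal((n, p))
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)
Delta_V = np.linalg.inv(V) - np.eye(n)   # fixed across iterations
DX = Delta_V @ X                         # precomputed once: Delta_V X

W = rng.uniform(0.5, 2.0, n)             # working weights, updated each step
direct = X.T @ (W[:, None] * (Delta_V @ X))   # re-forms Delta_V X every time
cheap = X.T @ (W[:, None] * DX)               # reuses the precomputation
```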
Value
Same structure as .get_B_gee_glm.
Path 1a-Woodbury: Gaussian GEE with Woodbury Acceleration
Description
Replacement for .get_B_gee_gaussian when the off-diagonal rank
r of the \mathbf{V}^{-1} perturbation is low. Uses the
conserved \mathbf{G}^{1/2}\mathbf{r}^* OLS trick with
\mathbf{G}_V^{1/2} expressed as block-diagonal
\mathbf{G}_c^{1/2} plus rank-r correction through
\mathbf{U}_Q. The code structure closely mirrors
.get_B_gaussian_nocorr.
Usage
.get_B_gee_woodbury(
X,
y,
K,
p_expansions,
VhalfInv_perm,
order_list,
A,
R_constraints,
constraint_value_vectors,
family,
return_G_getB,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
qp_score_function,
observation_weights,
wb_decomp,
wb_sqrt,
parallel_aga,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks,
qp_global,
...
)
Arguments
X, y |
Lists of partition-specific design matrices and responses. |
K, p_expansions |
Integer dimensions. |
VhalfInv_perm |
|
order_list |
Partition-to-data index mapping. |
A |
Constraint matrix ( |
R_constraints |
Number of columns of A. |
constraint_value_vectors |
Constraint RHS list. |
family |
GLM family object. |
return_G_getB |
Logical; return covariance components. |
quadprog |
Logical; apply QP refinement. |
qp_Amat, qp_bvec, qp_meq |
QP constraint specification. |
qp_score_function |
Score function for QP step. |
observation_weights |
Observation weights. |
wb_decomp |
Output of |
wb_sqrt |
Output of |
parallel_aga, parallel_matmult |
Logical flags. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
qp_global |
Logical; forced TRUE for GEE but passed for interface consistency. |
... |
Passed to sub-functions. |
Details
Cost is O(Kp^3 + Pr^2) compared to O(P^3) for the dense
Path 1a.
Value
Same structure as .get_B_gee_gaussian.
Path 3: Non-Gaussian GLM, No Correlation
Description
For GLMs with non-identity links or non-Gaussian families (without
correlation structures), unconstrained partition-wise estimates are
first obtained via Newton-Raphson (or OLS for Gaussian identity), then
the constrained estimate is computed by a single Lagrangian projection.
For non-canonical links, \mathbf{G} depends on the current
fitted values through the GLM working weights \mathbf{W}; the
projection is therefore iterated, updating \mathbf{G} at the
current constrained estimate until convergence.
Usage
.get_B_glm_nocorr(
X,
y,
X_gram,
Xy,
Lambda,
Ghalf,
GhalfInv,
A,
K,
p_expansions,
R_constraints,
constraint_value_vectors,
family,
return_G_getB,
iterate,
tol,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
qp_score_function,
unconstrained_fit_fxn,
keep_weighted_Lambda,
unique_penalty_per_partition,
L_partition_list,
parallel_eigen,
parallel_aga,
parallel_matmult,
parallel_unconstrained,
cl,
chunk_size,
num_chunks,
rem_chunks,
order_list,
observation_weights,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
VhalfInv,
homogenous_weights,
...
)
Arguments
X, y |
Lists of partition-specific design matrices and responses. |
X_gram |
List of Gram matrices. |
Xy |
List of cross-products. |
Lambda |
Shared penalty matrix. |
Ghalf, GhalfInv |
Lists of matrix square roots by partition. |
A |
Constraint matrix. |
K, p_expansions, R_constraints |
Integer dimensions. |
constraint_value_vectors |
Constraint RHS list encoding
|
family |
GLM family object. |
return_G_getB |
Logical; return covariance components. |
iterate |
Logical; iterate for non-canonical links. |
tol |
Convergence tolerance. |
quadprog |
Logical; apply QP refinement. |
qp_Amat, qp_bvec, qp_meq |
QP constraint specification. |
qp_score_function |
Score function for QP step. |
unconstrained_fit_fxn |
Function for partition-wise unconstrained estimation. |
keep_weighted_Lambda, unique_penalty_per_partition |
Logical flags. |
L_partition_list |
Partition-specific penalty matrices. |
parallel_eigen, parallel_aga, parallel_matmult, parallel_unconstrained |
Logical flags. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
order_list, observation_weights |
Standard partition arguments. |
glm_weight_function, schur_correction_function, need_dispersion_for_estimation, dispersion_function |
GLM customization. |
VhalfInv |
Inverse square root correlation (unused here). |
homogenous_weights |
Logical. |
... |
Passed to fitting, weight, and score functions. |
Value
Same structure as get_B.
Lagrangian Projection via OLS Reformulation
Description
Given unconstrained (or GEE-initialized) coefficient estimates
\hat{\boldsymbol{\beta}}, computes the constrained estimate
\tilde{\boldsymbol{\beta}} = \mathbf{U}\hat{\boldsymbol{\beta}}
where \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}
(\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{\top}.
Usage
.lagrangian_project(
GhalfXy,
Ghalf,
A,
K,
p_expansions,
R_constraints,
constraint_value_vectors,
family,
parallel_aga,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
GhalfXy |
Numeric column vector |
Ghalf |
List of |
A |
Constraint matrix |
K |
Integer; number of interior knots. |
p_expansions |
Integer; number of basis terms per partition. |
R_constraints |
Integer; number of columns of |
constraint_value_vectors |
List of constraint right-hand-side
vectors encoding |
family |
GLM family object (used for |
parallel_aga, parallel_matmult |
Logical flags for parallel computation. |
cl |
Parallel cluster object. |
chunk_size, num_chunks, rem_chunks |
Parallel distribution parameters. |
Details
Rather than forming \mathbf{U} directly, the projection is
reformulated as a residual from an OLS problem (the
\mathbf{G}^{1/2}\mathbf{r}^* trick):
\mathbf{y}^* = \mathbf{G}^{-1/2}\hat{\boldsymbol{\beta}}
\mathbf{X}^* = \mathbf{G}^{1/2}\mathbf{A}
\mathbf{r}^* = (\mathbf{I} - \mathbf{X}^*(\mathbf{X}^{*\top}
\mathbf{X}^*)^{-1}\mathbf{X}^{*\top})\mathbf{y}^*
\tilde{\boldsymbol{\beta}} = \mathbf{G}^{1/2}\mathbf{r}^*
This avoids explicitly forming and inverting
\mathbf{A}^{\top}\mathbf{G}\mathbf{A}; the most expensive step is
the QR decomposition of the R \times R system inside
.lm.fit, which is far cheaper than the full P \times P
solve.
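The \mathbf{G}^{1/2}\mathbf{r}^* trick above can be verified against the direct projection \mathbf{U}\hat{\boldsymbol{\beta}}. A NumPy sketch (illustrative; the package's version is partitioned, uses .lm.fit, and handles nonzero constraint values):

```python
import numpy as np

rng = np.random.default_rng(6)
p, r = 6, 2
M = rng.standard_normal((p, p))
G = M @ M.T + p * np.eye(p)               # SPD G
A = rng.standard_normal((p, r))
beta_hat = rng.standard_normal(p)

# direct projection: U beta_hat
U = np.eye(p) - G @ A @ np.linalg.solve(A.T @ G @ A, A.T)
beta_direct = U @ beta_hat

# OLS reformulation with a square root G = L L'
L = np.linalg.cholesky(G)
ystar = np.linalg.solve(L, beta_hat)      # G^{-1/2} beta_hat
Xstar = L.T @ A                           # G^{1/2} A
coef, *_ = np.linalg.lstsq(Xstar, ystar, rcond=None)
rstar = ystar - Xstar @ coef              # OLS residual
beta_trick = L @ rstar                    # G^{1/2} r*
```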
Value
A list of length K+1, each element a column vector of
constrained coefficients \tilde{\boldsymbol{\beta}}_k.
Woodbury-Corrected Lagrangian Projection
Description
Performs the constrained Lagrangian projection using the conserved
\mathbf{G}^{1/2}\mathbf{r}^* OLS trick, with
\mathbf{G}_V^{1/2} expressed as block-diagonal
\mathbf{G}_c^{1/2} plus a rank-r correction through
\mathbf{U}_Q. The structure is identical to
.lagrangian_project: form \mathbf{y}^*, form
\mathbf{X}^*, run .lm.fit, back-transform. Steps 1, 2,
and 4 each gain an additive rank-r correction at cost
O(Pr), while step 3 is literally unchanged.
Usage
.lagrangian_project_woodbury(
GhalfXy_V,
Ghalf_corrected,
A,
K,
p_expansions,
R_constraints,
constraint_value_vectors,
family,
wb_sqrt,
parallel_aga,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
GhalfXy_V |
Numeric column vector of length |
Ghalf_corrected |
List of |
A |
Constraint matrix ( |
K, p_expansions, R_constraints |
Integer dimensions. |
constraint_value_vectors |
Constraint RHS list encoding
|
family |
GLM family object. |
wb_sqrt |
Output of |
parallel_aga, parallel_matmult |
Logical flags. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
Details
The correction matrix \mathbf{F} (identity plus rank-r
modification) factors as
\mathbf{F}^{1/2} = \mathbf{I}_P -
\mathbf{U}_Q \mathbf{C} \mathbf{U}_Q^{\top} where \mathbf{C}
is r \times r diagonal. The inverse square root is
\mathbf{F}^{-1/2} = \mathbf{I}_P +
\mathbf{U}_Q \mathbf{C}_{\mathrm{inv}} \mathbf{U}_Q^{\top}.
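The pair of low-rank factors above can be checked numerically. In this sketch the diagonal relation \mathbf{C}_{\mathrm{inv}} = \mathbf{C}(\mathbf{I} - \mathbf{C})^{-1} is my own derivation from expanding the product with orthonormal \mathbf{U}_Q (the text does not state \mathbf{C}_{\mathrm{inv}} explicitly), and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
P, r = 8, 2
Q, _ = np.linalg.qr(rng.standard_normal((P, r)))   # orthonormal U_Q
c = np.array([0.3, 0.6])                           # diagonal of C

Fhalf = np.eye(P) - Q @ np.diag(c) @ Q.T           # F^{1/2}
c_inv = c / (1.0 - c)                              # assumed diagonal of C_inv
FhalfInv = np.eye(P) + Q @ np.diag(c_inv) @ Q.T    # F^{-1/2}
```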
Value
A list of K+1 coefficient column vectors.
Quadratic Programming Refinement for Inequality Constraints
Description
After the Lagrangian projection handles smoothness equality constraints,
this function refines the estimate to satisfy additional inequality
constraints (monotonicity, derivative sign, range bounds, or user-supplied
constraints) via quadprog::solve.QP.
Usage
.qp_refine(
result,
X,
y,
K,
p_expansions,
A,
Lambda,
Lambda_block,
family,
iterate,
tol,
qp_Amat,
qp_bvec,
qp_meq,
qp_score_function,
order_list,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
VhalfInv,
...
)
Arguments
result |
List of current coefficient column vectors by partition. |
X |
List of partition-specific design matrices. |
y |
List of response vectors by partition. |
K |
Integer; number of interior knots. |
p_expansions |
Integer; number of basis terms per partition. |
A |
Equality constraint matrix |
Lambda |
Shared penalty matrix |
Lambda_block |
Full block-diagonal penalty matrix. |
family |
GLM family object. |
iterate |
Logical; if |
tol |
Convergence tolerance. |
qp_Amat |
Inequality constraint matrix |
qp_bvec |
Inequality constraint vector |
qp_meq |
Number of equality constraints within |
qp_score_function |
Score function
|
order_list |
List of index vectors mapping partition rows to original data ordering. |
glm_weight_function |
Function computing GLM working weights. |
schur_correction_function |
Function computing Schur corrections. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Dispersion estimation function. |
observation_weights |
List of observation weights by partition. |
VhalfInv |
Inverse square root of the working correlation matrix in
the original observation ordering (or |
... |
Passed to weight, correction, dispersion, and score functions. |
Details
The subproblem at each iteration is a second-order Taylor approximation
of the penalized log-likelihood around the current iterate
\boldsymbol{\beta}^*. Collecting terms, this yields:
\tilde{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}
\left\{-\mathbf{d}^{\top}\boldsymbol{\beta} + \frac{1}{2}
\boldsymbol{\beta}^{\top}\mathbf{G}^{-1}\boldsymbol{\beta}\right\}
\quad \text{s.t.} \quad
\mathbf{A}^{\top}\boldsymbol{\beta} = \mathbf{0}, \quad
\mathbf{C}^{\top}\boldsymbol{\beta} \succeq \mathbf{c}
where \mathbf{d} is the score adjusted by the current iterate
and \mathbf{c} is the constraint value vector. Step acceptance
uses damped updates with deviance monitoring; see Nocedal and Wright
(2006) for the general SQP framework.
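To illustrate the subproblem, the following numpy sketch (synthetic data; only the equality-constrained case, since the inequality handling is delegated to solve.QP's dual active-set method) solves the stationarity/feasibility KKT system for one iterate directly:

```python
import numpy as np

rng = np.random.default_rng(1)
P, r = 5, 2

# Positive-definite curvature G^{-1}, score-adjusted gradient d,
# and homogeneous equality constraints A^T beta = 0
M = rng.standard_normal((P, P))
Ginv = M @ M.T + P * np.eye(P)
d = rng.standard_normal(P)
A = rng.standard_normal((P, r))

# KKT system for: min_beta -d^T beta + 0.5 beta^T Ginv beta  s.t.  A^T beta = 0
KKT = np.block([[Ginv, A], [A.T, np.zeros((r, r))]])
rhs = np.concatenate([d, np.zeros(r)])
sol = np.linalg.solve(KKT, rhs)
beta = sol[:P]

assert np.allclose(A.T @ beta, 0.0)               # feasibility
assert np.allclose(Ginv @ beta + A @ sol[P:], d)  # stationarity
```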
Value
A list with components:
- result
List of refined coefficient column vectors by partition.
- qp_info
List with QP solve metadata including Lagrangian multipliers and the active constraint matrix.
Compute rank-based distance matrix for AR(1) structures
Description
Converts the unique within-block distances returned by
.compute_dist_block() to integer lags 0, 1, 2, ... in
increasing order.
Usage
.rank_dists(spacetime, inds)
Recompute G at Current Coefficient Estimates
Description
Thin wrapper around .solver_recompute_G_at_estimate().
The numerical core is shared with blockfit_solve() so the
final G, Ghalf, and GhalfInv objects are
constructed through one code path.
Usage
.recompute_G_at_estimate(
X,
y,
result,
K,
Lambda,
family,
order_list,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
VhalfInv,
parallel_eigen,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks,
unique_penalty_per_partition,
L_partition_list,
...
)
Arguments
X |
List of partition-specific design matrices
|
y |
List of response vectors |
result |
List of current coefficient column vectors
|
K |
Integer; number of interior knots. |
Lambda |
Shared |
family |
GLM family object. |
order_list |
List of index vectors mapping partition rows to original data ordering. |
glm_weight_function |
Function computing GLM working weights
|
schur_correction_function |
Function computing Schur corrections to the information matrix. |
need_dispersion_for_estimation |
Logical; if |
dispersion_function |
Dispersion estimation function. |
observation_weights |
List of observation weights
|
VhalfInv |
Inverse square root correlation matrix, or
|
parallel_eigen, parallel_matmult |
Logical flags for parallel computation. |
cl |
Parallel cluster object. |
chunk_size, num_chunks, rem_chunks |
Parallel distribution parameters. |
unique_penalty_per_partition |
Logical. |
L_partition_list |
List of partition-specific penalty matrices. |
... |
Passed to weight, correction, and dispersion functions. |
Value
A list with components G, Ghalf, and
GhalfInv, each a list of K+1 matrices.
Assemble qp_info from a solve.QP Solution
Description
Packages the output of quadprog::solve.QP into the
qp_info list expected by downstream code (inference,
generate_posterior, and varcovmat construction).
Usage
.solver_assemble_qp_info(
last_qp_sol,
beta_block,
qp_Amat_combined,
qp_bvec_combined,
qp_meq_combined,
converged,
final_deviance,
info_matrix = NULL
)
Arguments
last_qp_sol |
Output of |
beta_block |
Final |
qp_Amat_combined |
Combined equality + inequality constraint matrix. |
qp_bvec_combined |
Combined constraint right-hand side. |
qp_meq_combined |
Integer; number of leading equality constraints. |
converged |
Logical; whether the outer loop converged. |
final_deviance |
Scalar deviance at convergence. |
info_matrix |
Optional information matrix; included in the
returned list when non- |
Details
Active constraint columns are identified as all equality columns
(1:qp_meq_combined) plus any inequality column whose
Lagrange multiplier exceeds sqrt(.Machine$double.eps).
This matches the convention used in .qp_refine().
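A minimal sketch of this convention, with hypothetical multiplier values:

```python
import numpy as np

# Multipliers from a hypothetical solve.QP call: the first qp_meq entries
# correspond to equality constraints, the rest to inequalities
lagrangian = np.array([2.5, -0.7, 0.0, 1e-12, 0.4])
qp_meq = 2
tol = np.sqrt(np.finfo(float).eps)

# Equality columns are always active; inequality columns only when
# their multiplier exceeds the tolerance
ineq_active = qp_meq + np.flatnonzero(np.abs(lagrangian[qp_meq:]) > tol)
active = np.concatenate([np.arange(qp_meq), ineq_active])

assert active.tolist() == [0, 1, 4]
```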
Value
A list with elements solution, lagrangian,
active_constraints, iact, Amat_active,
bvec_active, meq_active, converged, and
final_deviance, plus info_matrix,
Amat_combined, bvec_combined, and
meq_combined when info_matrix is supplied.
Returns NULL if last_qp_sol is NULL.
Build Block-Diagonal Penalty Matrix
Description
Assembles the full P \times P block-diagonal penalty matrix
\boldsymbol{\Lambda} from a shared per-partition penalty
Lambda and optional partition-specific additive terms.
P = p \times (K+1) where p is the number of basis terms
per partition.
Usage
.solver_build_lambda_block(
Lambda,
K,
unique_penalty_per_partition,
L_partition_list
)
Arguments
Lambda |
Shared |
K |
Integer; number of interior knots ( |
unique_penalty_per_partition |
Logical; if |
L_partition_list |
List of |
Details
When unique_penalty_per_partition = TRUE, the k-th
diagonal block is Lambda + L_partition_list[[k]]; otherwise
every block is Lambda.
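The assembly is a plain block-diagonal stacking; a pure-numpy sketch with stand-in penalty blocks:

```python
import numpy as np

p, K = 3, 2                  # p basis terms per partition, K+1 partitions
P = p * (K + 1)

Lambda = 2.0 * np.eye(p)                                        # shared penalty block
L_partition_list = [0.1 * k * np.eye(p) for k in range(K + 1)]  # per-partition extras

# With unique_penalty_per_partition = TRUE the k-th block is Lambda + L_k
Lambda_block = np.zeros((P, P))
for k in range(K + 1):
    Lambda_block[k*p:(k+1)*p, k*p:(k+1)*p] = Lambda + L_partition_list[k]

assert Lambda_block.shape == (P, P)
assert np.allclose(Lambda_block[p:2*p, p:2*p], Lambda + L_partition_list[1])
assert np.allclose(Lambda_block[:p, p:2*p], 0.0)  # off-diagonal blocks are zero
```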
Value
A P \times P block-diagonal matrix,
P = p \times (K+1).
Detect Whether Inequality Constraints Require a Dense Global QP
Description
Inspects the columns of qp_Amat to determine whether every
inequality constraint is confined to a single partition block
(block-separable) or whether any constraint couples coefficients
across partitions.
Usage
.solver_detect_qp_global(qp_Amat, p_expansions, K)
Arguments
qp_Amat |
Inequality constraint matrix
( |
p_expansions |
Integer; number of basis terms per partition. |
K |
Integer; number of interior knots. |
Details
Returns FALSE (partition-wise active-set is valid) when all
columns have nonzeros in at most one block. Returns TRUE
(dense SQP required) when any column spans multiple blocks.
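The detection logic can be sketched as follows (a simplified re-implementation, not the package's code; block membership of each nonzero row index is row // p):

```python
import numpy as np

p, K = 2, 2          # 2 basis terms, 3 partitions -> P = 6 rows
P = p * (K + 1)

def spans_multiple_blocks(qp_Amat, p):
    """True if any constraint column has nonzeros in more than one partition block."""
    for col in qp_Amat.T:
        blocks = {i // p for i in np.flatnonzero(col)}
        if len(blocks) > 1:
            return True
    return False

# One column confined to block 0, one coupling blocks 0 and 2
A_separable = np.zeros((P, 1)); A_separable[0, 0] = 1.0
A_coupling = np.zeros((P, 1)); A_coupling[0, 0] = 1.0; A_coupling[4, 0] = -1.0

assert not spans_multiple_blocks(A_separable, p)  # partition-wise active-set valid
assert spans_multiple_blocks(A_coupling, p)       # dense SQP required
```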
Value
Logical scalar.
Recompute G, Ghalf, and GhalfInv at a Supplied Coefficient Estimate
Description
Given current constrained coefficient estimates result,
recomputes the penalized information matrix
\mathbf{G}_k = (\mathbf{X}_k^{\top}\mathbf{W}_k
\mathbf{D}_k\mathbf{X}_k + \boldsymbol{\Lambda}_k)^{-1} and its
matrix square roots for each partition.
Usage
.solver_recompute_G_at_estimate(
X,
y,
result,
K,
Lambda,
family,
order_list,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
VhalfInv,
parallel_eigen,
parallel_matmult,
cl,
chunk_size,
num_chunks,
rem_chunks,
unique_penalty_per_partition,
L_partition_list,
...
)
Arguments
X |
List of partition-specific design matrices
|
y |
List of response vectors |
result |
List of current coefficient column vectors
|
K |
Integer; number of interior knots. |
Lambda |
Shared |
family |
GLM family object. |
order_list |
List of index vectors mapping partition rows to original data ordering. |
glm_weight_function |
Function computing GLM working weights
|
schur_correction_function |
Function computing Schur corrections to the information matrix. |
need_dispersion_for_estimation |
Logical; if |
dispersion_function |
Dispersion estimation function. |
observation_weights |
List of observation weights
|
VhalfInv |
Inverse square root correlation matrix, or
|
parallel_eigen, parallel_matmult |
Logical flags for parallel computation. |
cl |
Parallel cluster object. |
chunk_size, num_chunks, rem_chunks |
Parallel distribution parameters. |
unique_penalty_per_partition |
Logical. |
L_partition_list |
List of partition-specific penalty matrices. |
... |
Passed to weight, correction, and dispersion functions. |
Details
This is used after Newton-Raphson convergence in Path 3 of
get_B() and at the final return in blockfit_solve()
when return_G_getB = TRUE. The implementation matches
.recompute_G_at_estimate() exactly; it exists here so both
solvers can call the same numerical core.
Value
A list with components G, Ghalf, and
GhalfInv, each a list of K+1 matrices.
G[[k]] is computed as tcrossprod(Ghalf[[k]]) to
guarantee exact symmetry.
Grid Search Initialization for Penalty Tuning
Description
Evaluates the GCV_u criterion over a grid of initial wiggle and ridge penalty values to find a good starting point for BFGS optimization.
Usage
.tune_grid_search(
log_initial_wiggle,
log_initial_flat,
log_penalty_vec,
gcvu_fxn,
env,
include_warnings,
...
)
Arguments
log_initial_wiggle |
Numeric vector; log-scale candidate values for the wiggle penalty. |
log_initial_flat |
Numeric vector; log-scale candidate values for the flat ridge penalty. |
log_penalty_vec |
Numeric vector; log-scale predictor/partition penalties. |
gcvu_fxn |
Function; GCV_u evaluation function. |
env |
List; tuning environment. |
include_warnings |
Logical; whether to print warnings on failure. |
... |
Additional arguments passed to gcvu_fxn. |
Value
Numeric vector of length 2; the best (log_wiggle, log_flat) found.
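The search itself is an exhaustive evaluation over the two-dimensional grid; a toy sketch with a hypothetical stand-in criterion (the real GCV_u depends on the fitted model):

```python
import numpy as np

# Hypothetical GCV_u surface over (log_wiggle, log_flat); a quadratic bowl
# stands in for the model-dependent criterion
def gcvu_fxn(log_wiggle, log_flat):
    return (log_wiggle - 1.0) ** 2 + (log_flat + 2.0) ** 2

log_initial_wiggle = np.linspace(-4, 4, 9)
log_initial_flat = np.linspace(-4, 4, 9)

# Exhaustive grid evaluation, keeping the best (log_wiggle, log_flat) pair
best, best_val = None, np.inf
for lw in log_initial_wiggle:
    for lf in log_initial_flat:
        val = gcvu_fxn(lw, lf)
        if val < best_val:
            best, best_val = (lw, lf), val

assert best == (1.0, -2.0)
```

The winning pair would then seed the subsequent BFGS refinement.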
Woodbury Decomposition of Correlation Structure
Description
Given \mathbf{V}^{-1/2}, the partition-specific design matrices,
and the penalty, decomposes the GLS information matrix as
\mathbf{G}_V^{-1} = \mathbf{G}_c^{-1} +
\boldsymbol{\Delta}_{\mathrm{off}} where \mathbf{G}_c absorbs
within-partition corrections from \mathbf{V}^{-1} - \mathbf{I}
and \boldsymbol{\Delta}_{\mathrm{off}} captures cross-partition
coupling with effective rank r.
Usage
.woodbury_decompose_V(
VhalfInv_perm,
X,
K,
p_expansions,
Lambda,
Lambda_block,
unique_penalty_per_partition,
L_partition_list,
order_list,
parallel_eigen,
cl,
chunk_size,
num_chunks,
rem_chunks,
rank_threshold_fraction = 1/3,
family
)
Arguments
VhalfInv_perm |
|
X |
List of partition-specific design matrices. |
K, p_expansions |
Integer dimensions. |
Lambda, Lambda_block |
Shared and full block-diagonal penalty matrices. |
unique_penalty_per_partition |
Logical. |
L_partition_list |
Partition-specific penalty matrices. |
order_list |
Partition-to-data index mapping. |
parallel_eigen |
Logical; parallel eigendecomposition. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
rank_threshold_fraction |
Numeric; Woodbury is used only when
|
family |
GLM family object. |
Details
When r = 0 (no cross-partition coupling) or r \ge P/3
(low-rank approximation not worthwhile), returns
use_woodbury = FALSE and the caller should use the dense
Path 1 approach instead.
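The underlying algebra is the standard Woodbury identity; a numpy sketch with synthetic matrices verifies that the documented inner inverse M recovers \mathbf{G}_V exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
P, r = 6, 2

# Block-diagonal part G_c^{-1} (SPD) and a small signed low-rank off-diagonal term
M = rng.standard_normal((P, P))
Gc_inv = M @ M.T + P * np.eye(P)
Gc = np.linalg.inv(Gc_inv)
L = 0.1 * rng.standard_normal((P, r))
S = np.array([1.0, -1.0])                      # S_signs

# G_V^{-1} = G_c^{-1} + Delta_off with Delta_off = L diag(S) L^T
GV_inv = Gc_inv + L @ np.diag(S) @ L.T

# Woodbury inverse using M = (diag(1/S) + L^T G_c L)^{-1}
Minner = np.linalg.inv(np.diag(1.0 / S) + L.T @ Gc @ L)
GV = Gc - Gc @ L @ Minner @ L.T @ Gc

assert np.allclose(GV @ GV_inv, np.eye(P))
```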
Value
A list with components:
- use_woodbury
Logical; FALSE triggers dense Path 1 fallback.
- Ghalf_corrected
List of corrected \mathbf{G}_{c,k}^{1/2}.
- GhalfInv_corrected
List of corrected \mathbf{G}_{c,k}^{-1/2}.
- G_corrected
List of corrected \mathbf{G}_{c,k}.
- L
Off-diagonal low-rank factor (P \times r).
- S_signs
Length-r vector of \pm 1.
- r
Integer effective rank.
- M
Precomputed r \times r Woodbury inner inverse (\mathrm{diag}(1/S) + \mathbf{L}^{\top}\mathbf{G}_c \mathbf{L})^{-1}.
- GL
List of K+1 matrices \mathbf{G}_{c,k}\mathbf{L}[rows_k,], each p \times r.
Compute Woodbury Half-Square-Root Components
Description
Given the low-rank Woodbury components from
.woodbury_decompose_V, precomputes the \mathbf{U}_Q,
\mathbf{C}, and \mathbf{C}_{\mathrm{inv}} matrices
needed to express \mathbf{G}_V^{1/2} and
\mathbf{G}_V^{-1/2} as block-diagonal plus rank-r
correction.
Usage
.woodbury_halfsqrt_components(Ghalf_corrected, L, S_signs, r, K, p_expansions)
Arguments
Ghalf_corrected |
List of corrected |
L |
Off-diagonal low-rank factor ( |
S_signs |
Length- |
r |
Integer effective rank. |
K, p_expansions |
Integer dimensions. |
Details
The correction matrix \mathbf{F} (identity plus rank-r
modification) satisfies \mathbf{G}_V = \mathbf{G}_c^{1/2}
\mathbf{F}\,\mathbf{G}_c^{1/2} and its square root factors as
\mathbf{F}^{1/2} = \mathbf{I}_P - \mathbf{U}_Q \mathbf{C}
\mathbf{U}_Q^{\top}.
Value
A list with components:
- valid
Logical; FALSE if \mathbf{F} is not positive definite (signals fallback to dense).
- U_Q
P \times r left singular vectors of \mathbf{Q} = \mathbf{G}_c^{1/2}\mathbf{L}.
- C
r \times r diagonal matrix \mathbf{I}_r - \mathrm{diag}(\sqrt{1 - d_{F,i}}).
- C_inv
r \times r diagonal matrix \mathrm{diag}(1/\sqrt{1 - d_{F,i}}) - \mathbf{I}_r.
- GchalfUQ
List of K+1 matrices, each p \times r: \mathbf{G}_{c,k}^{1/2}\mathbf{U}_Q[rows_k,].
Per-Iteration Weighted Woodbury Redecomposition
Description
Given the fixed \boldsymbol{\Delta}_V = \mathbf{V}^{-1} -
\mathbf{I} and precomputed \boldsymbol{\Delta}_V \mathbf{X}
(both invariant across Newton iterations), together with the current
iteration's GLM working weights \mathbf{W} and weighted Gram
matrices, computes all Woodbury components needed for the constrained
step.
Usage
.woodbury_redecompose_weighted(
Delta_V,
X_block,
DV_X,
W,
X,
K,
p_expansions,
X_gram_weighted,
Lambda,
schur_corrections,
unique_penalty_per_partition,
L_partition_list,
parallel_eigen,
cl,
chunk_size,
num_chunks,
rem_chunks,
rank_threshold_fraction = 1/3,
family
)
Arguments
Delta_V |
Fixed |
X_block |
Full |
DV_X |
Precomputed |
W |
Length- |
X |
List of partition-specific design matrices. |
K, p_expansions |
Integer dimensions. |
X_gram_weighted |
List of |
Lambda |
Shared |
schur_corrections |
List of |
unique_penalty_per_partition |
Logical. |
L_partition_list |
Partition-specific penalty matrices. |
parallel_eigen |
Logical; parallel eigendecomposition. |
cl, chunk_size, num_chunks, rem_chunks |
Parallel parameters. |
rank_threshold_fraction |
Numeric; Woodbury threshold (default 1/3). |
family |
GLM family object. |
Details
The weighted perturbation is decomposed as
\boldsymbol{\Delta}(\mathbf{W}) = \mathbf{X}^{\top}
\mathrm{diag}(\mathbf{W})(\mathbf{V}^{-1} - \mathbf{I})\mathbf{X},
split into block-diagonal corrections (absorbed into the corrected
Gram) and an off-diagonal remainder factored at low rank.
Value
A list with the same structure as .woodbury_decompose_V:
- use_woodbury
Logical; FALSE if rank exceeds threshold.
- Ghalf_corrected
List of corrected \mathbf{G}_{c,k}^{1/2}.
- GhalfInv_corrected
List of corrected \mathbf{G}_{c,k}^{-1/2}.
- G_corrected
List of corrected \mathbf{G}_{c,k}.
- L
Off-diagonal low-rank factor (P \times r).
- S_signs
Length-r sign vector.
- r
Integer effective rank.
- M
Precomputed r \times r Woodbury inner inverse.
- GL
List of per-partition \mathbf{G}_{c,k}\mathbf{L}[rows_k,].
Efficient Matrix Multiplication for \textbf{A}^{T}\textbf{G}\textbf{A}
Description
Efficient Matrix Multiplication for \textbf{A}^{T}\textbf{G}\textbf{A}
Usage
AGAmult_wrapper(
G,
A,
K,
p_expansions,
R_constraints,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
G |
List of G matrices ( |
A |
Constraint matrix ( |
K |
Number of partitions minus 1 ( |
p_expansions |
Number of columns per partition |
R_constraints |
Number of constraint columns |
parallel |
Use parallel processing |
cl |
Cluster object |
chunk_size |
Chunk size for parallel |
num_chunks |
Number of chunks |
rem_chunks |
Remaining chunks |
Details
Computes \textbf{A}^{T}\textbf{G}\textbf{A} efficiently in parallel chunks using AGAmult_chunk().
Value
Matrix product \textbf{A}^{T}\textbf{G}\textbf{A}
Lagrangian Multiplier Smoothing Splines: Mathematical Details
Description
This document provides the mathematical and implementation details for Lagrangian Multiplier Smoothing Splines as implemented in lgspline.
This package provides exhaustive resources for fitting multivariate smoothing splines with a monomial basis and an analytical-form cubic smoothing spline penalty.
The material is presented such that a programmer or statistician of reasonable experience and background can understand and implement the procedure from scratch, and also potentially critique some of the modelling choices that went into designing this package.
Informally, lgspline answers the following question: how can the useful functionality of basis splines be recovered when smoothness is instead imposed through explicit, external constraints?
The obvious benefit is a much more flexible and interpretable final model: for non-experienced users the coefficients are simply easier to understand without post-hoc processing, while experienced users can customize models more easily.
The drawback is that treating the constraints as external adds a new layer of complexity to each step of the model-fitting process, complications that implicit design-matrix constructions bypass.
While it is true that a B-spline can always be converted back into monomial form, the tensor-product splines that generalize this to multiple dimensions often explode the number and degree of interaction terms, the conversion may not be computationally stable, and it is not available in standard software.
Statistical Problem Formulation
Consider an N \times q matrix of predictors
\mathbf{T} = (\mathbf{t}_1, \dots, \mathbf{t}_N)^{\top} and an
N \times 1 response vector \mathbf{y} = (y_1, \dots, y_N)^{\top}.
We assume the relationship follows a generalized linear model with unknown
smooth function f:
y_i \sim \mathcal{D}(g^{-1}(f(\mathbf{t}_i)),\, \sigma^2)
where \mathcal{D} is a distribution (e.g. exponential family or related) with mean
\mu_i = g^{-1}(f(\mathbf{t}_i)), link function g(\cdot), and
dispersion parameter \sigma^2. For Gaussian response with identity
link, observations are independently distributed as
y_i \mid \mathbf{t}_i, \sigma^2 \sim \mathcal{N}(f(\mathbf{t}_i), \sigma^2).
The objective is to estimate f by:
- Partitioning the predictor space into K+1 mutually exclusive regions.
- Fitting local polynomial models within each partition.
- Enforcing smoothness at partition boundaries via Lagrangian multipliers.
- Penalizing the integrated squared second derivative to discourage roughness.
Unlike other smoothing spline formulations, no post-fitting algebraic rearrangement or disentanglement of a spline basis is needed to obtain interpretable models. The polynomial expansions are homogeneous across partitions, and the relationship between predictor and response is explicit at the coefficient level.
To anchor the notation, in the single-predictor cubic case one would write
\hat{f}(t_i) = \hat{\beta}_{(0)} + \hat{\beta}_{(1)}t_i +
\hat{\beta}_{(2)}t_i^2 + \hat{\beta}_{(3)}t_i^3 =
\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}},
where \mathbf{x}_i = (1, t_i, t_i^2, t_i^3)^{\top}. The LMSS
formulation preserves exactly this kind of polynomial representation, but now
does so within each partition and then forces neighboring pieces to agree in
the smoothness conditions described below.
Core notation used throughout:
- \mathbf{y}_{(N \times 1)}: Response vector.
- \mathbf{T}_{(N \times q)}: Matrix of predictors.
- \mathbf{X}_{(N \times P)}: Block-diagonal matrix of polynomial expansions, with diagonal blocks \mathbf{X}_k of dimension n_k \times p.
- \boldsymbol{\Lambda}_{(P \times P)}: Block-diagonal penalty matrix, with blocks \boldsymbol{\Lambda}_k of dimension p \times p.
- \hat{\boldsymbol{\beta}}_{(P \times 1)}: Unconstrained penalized estimate.
- \tilde{\boldsymbol{\beta}}_{(P \times 1)}: Constrained coefficient estimates.
- \mathbf{G}_{(P \times P)}: Block-diagonal matrix with blocks \mathbf{G}_k = (\mathbf{X}_k^{\top}\mathbf{W}_k\mathbf{D}_k\mathbf{X}_k + \boldsymbol{\Lambda}_k)^{-1}, where \mathbf{W}_k and \mathbf{D}_k are defined below.
- \mathbf{A}_{(P \times r)}: Constraint matrix encoding smoothness conditions. Reduced to linearly independent columns via pivoted QR decomposition.
- \mathbf{U}_{(P \times P)}: \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{\top}.
- \mathbf{D}_{(N \times N)}: Diagonal matrix of user-supplied observation weights (observation_weights or weights). Defaults to the identity. These play the role of prior precision on individual observations: a weight of 2 is equivalent to seeing that observation twice.
- \mathbf{W}_{(N \times N)}: Diagonal matrix of GLM working weights. In the implementation these diagonal entries are whatever is returned by glm_weight_function; by default this is family$variance(mu), optionally multiplied by user-supplied observation weights. For Gaussian response with identity link, \mathbf{W} = \mathbf{I}. For other families, \mathbf{W} depends on the current fitted values and is updated at each Newton-Raphson iteration. For the common canonical families used by default, this matches the familiar Fisher-scoring weighting role.
- \mathbf{V}_{(N \times N)}: Correlation matrix of errors. When no correlation structure is specified, \mathbf{V} = \mathbf{I}. Otherwise supplied via VhalfInv or estimated through VhalfInv_fxn.
In the Gaussian identity case with unit weights and no correlation,
\mathbf{G}_k = (\mathbf{X}_k^{\top}\mathbf{X}_k + \boldsymbol{\Lambda}_k)^{-1}
and most formulas simplify accordingly. When \mathbf{D} or
\mathbf{W} appear in a formula, the product \mathbf{W}\mathbf{D}
means “GLM working weights times observation weights”; whenever one of
them is the identity it drops out.
Before these quantities reach the main fitting stage, the user-facing inputs
are parsed, standardized, and organized by process_input. When
the formula interface is used and auto_encode_factors = TRUE, that
preprocessing step also relies on helpers such as create_onehot
to encode factor levels before the design reaches lgspline.fit().
The notation in the remainder of this document therefore refers to the internal
objects that actually enter lgspline.fit(), not necessarily the raw
objects originally supplied by the user.
From the user side, many of the arguments that control these internal objects
can be supplied either individually or through the grouped lists
penalty_args, tuning_args, expansion_args,
constraint_args, qp_args, parallel_args,
covariance_args, return_args, and glm_args, as
documented in lgspline. These grouped lists are unpacked before
dispatch into the same fitting pipeline, so they are a convenience layer
rather than a separate modeling abstraction. A closely related exploratory
mode is dummy_fit = TRUE in lgspline or
lgspline.fit, which runs the preprocessing, partition
construction, expansion building, and penalty setup without solving for
nonzero coefficients, making it a practical way to inspect objects such as
X, A, the returned make_partition_list from
make_partitions, and the assembled penalties from
compute_Lambda before a full fit.
Model Formulation and Estimation
Piecewise Polynomial Structure
For K knots (one predictor) or K+1 partitions (multiple
predictors) there are K+1 mutually exclusive partitions
\mathcal{P}_0, \dots, \mathcal{P}_{K}. Each observation i
belongs to exactly one partition. Within partition k, the function is
represented as a polynomial of degree p-1 in each predictor:
\hat{f}_k(\mathbf{t}) = \mathbf{x}^{\top}\tilde{\boldsymbol{\beta}}_k
where \mathbf{x} collects the polynomial basis terms (intercept,
linear, quadratic, cubic, and optionally quartic and interaction terms) and
\tilde{\boldsymbol{\beta}}_k are the corresponding coefficients. In
one predictor, the same idea can be written more explicitly as
\hat{f}(t_i) = \sum_{k=0}^{K}
\mathbf{x}_{ik}^{\top}\hat{\boldsymbol{\beta}}_k
\mathbf{1}(t_i \in \mathcal{P}_k),
which highlights that the unconstrained problem is just a collection of
local polynomial regressions. The expansions are homogeneous across
partitions, so coefficients are directly comparable. This is implemented via
get_polynomial_expansions.
The exact contents of \mathbf{x} are controlled by the basis-expansion
arguments documented in lgspline: include_quadratic_terms,
include_cubic_terms, include_quartic_terms,
include_2way_interactions, include_3way_interactions,
include_quadratic_interactions, exclude_interactions_for,
exclude_these_expansions, and custom_basis_fxn. Likewise,
just_linear_with_interactions and
just_linear_without_interactions determine which predictors remain
structurally linear even though they still participate in the same
partition-wise polynomial bookkeeping described here.
Letting p denote the number of basis terms per partition,
P = p(K+1) is the total number of coefficients. The full design matrix
\mathbf{X} and penalty matrix \boldsymbol{\Lambda} are
block-diagonal with K+1 blocks, so unconstrained estimation reduces to
K+1 independent penalized regressions, which appears as follows for the identity link case:
\hat{\boldsymbol{\beta}}_k = \mathbf{G}_k \mathbf{X}_k^{\top} \mathbf{W}_k\mathbf{D}_k\mathbf{y}_k, \quad
\mathbf{G}_k = (\mathbf{X}_k^{\top}\mathbf{W}_k\mathbf{D}_k\mathbf{X}_k + \boldsymbol{\Lambda}_k)^{-1}.
For Gaussian identity with unit weights this reduces to the familiar
\mathbf{G}_k = (\mathbf{X}_k^{\top}\mathbf{X}_k + \boldsymbol{\Lambda}_k)^{-1}.
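A minimal numpy sketch of these independent partition-wise solves (identity link, unit weights; synthetic data, with a scaled identity standing in for the actual spline penalty blocks):

```python
import numpy as np

rng = np.random.default_rng(3)
p, K = 4, 2               # cubic basis (1, t, t^2, t^3), K+1 = 3 partitions
lam = 0.5

beta_hat, G = [], []
for k in range(K + 1):
    n_k = 30
    t = rng.uniform(k, k + 1, n_k)              # partition k covers [k, k+1)
    Xk = np.vander(t, p, increasing=True)       # columns 1, t, t^2, t^3
    yk = np.sin(t) + 0.1 * rng.standard_normal(n_k)
    Lk = lam * np.eye(p)                        # stand-in penalty block
    Gk = np.linalg.inv(Xk.T @ Xk + Lk)          # G_k = (X_k^T X_k + Lambda_k)^{-1}
    beta_hat.append(Gk @ Xk.T @ yk)             # independent penalized solve
    G.append(Gk)

assert len(beta_hat) == K + 1
assert beta_hat[0].shape == (p,)
```

Because each iteration touches only partition k's data, the loop body is exactly the unit of work that can be dispatched to a cluster worker.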
The block-diagonal structure means these can be computed in parallel across
partitions. In the user-facing interface this is realized by supplying a
cluster through cl, optionally controlling work splitting with
chunk_size, and enabling stages such as parallel_eigen
for the eigendecompositions and, in non-Gaussian Path 3, parallel_unconstrained
for the partition-wise unconstrained fits; nearby stages can likewise use
parallel_penalty and parallel_make_constraint. The eigenvalue
decomposition and matrix square roots of each \mathbf{G}_k are
computed by compute_G_eigen, and can be returned in the fitted
object as G and Ghalf when return_G = TRUE and
return_Ghalf = TRUE.
Fitted values for the canonical Gaussian case appear as
\tilde{\mathbf{y}} = \mathbf{X}\tilde{\boldsymbol{\beta}} = \mathbf{H}\mathbf{y}
for \mathbf{H} = \mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^{\top}.
Smoothing Constraints and the Constraint Matrix
Without further intervention the piecewise polynomial will be discontinuous.
The central idea of LMSS is that smoothness is not hidden inside a special
basis, but instead imposed directly where neighboring partitions meet.
At each knot t_{k,k+1} between neighboring partitions k and
k+1, up to three smoothing constraints are imposed:
- Continuity: \mathbf{x}_{k,k+1}^{\top}\boldsymbol{\beta}_k = \mathbf{x}_{k,k+1}^{\top}\boldsymbol{\beta}_{k+1}.
- First-derivative continuity: \mathbf{x}_{k,k+1}^{\prime\top}\boldsymbol{\beta}_k = \mathbf{x}_{k,k+1}^{\prime\top}\boldsymbol{\beta}_{k+1}.
- Second-derivative continuity: \mathbf{x}_{k,k+1}^{\prime\prime\top}\boldsymbol{\beta}_k = \mathbf{x}_{k,k+1}^{\prime\prime\top}\boldsymbol{\beta}_{k+1}.
where \mathbf{x}^{\prime} and \mathbf{x}^{\prime\prime} are
elementwise first and second derivatives of the basis with respect to
\mathbf{t}. For the familiar cubic single-predictor basis
\mathbf{x} = (1, t, t^2, t^3)^{\top}, these derivative vectors are
\mathbf{x}' = (0, 1, 2t, 3t^2)^{\top}, \qquad
\mathbf{x}'' = (0, 0, 2, 6t)^{\top}.
With K knots this yields up to 3K scalar
constraints (for a single predictor; more for multiple predictors with
interactions), collected as linear equations
\mathbf{A}^{\top}\boldsymbol{\beta} = \mathbf{0}
in a P \times r matrix \mathbf{A}. The constraint matrix is
built by make_constraint_matrix and returned in the fitted
object as A.
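For the cubic single-predictor case, one knot contributes three columns of \mathbf{A}; a numpy sketch (two partitions, hypothetical knot location t0):

```python
import numpy as np

# Basis and derivatives for x = (1, t, t^2, t^3) at a knot t0
def basis(t):   return np.array([1.0, t, t**2, t**3])
def dbasis(t):  return np.array([0.0, 1.0, 2*t, 3*t**2])
def ddbasis(t): return np.array([0.0, 0.0, 2.0, 6*t])

p, K = 4, 1            # two partitions, one knot
t0 = 0.5
P = p * (K + 1)

# Each column encodes x(t0)^T beta_0 - x(t0)^T beta_1 = 0 (and derivatives)
A = np.zeros((P, 3))
for j, b in enumerate((basis, dbasis, ddbasis)):
    A[:p, j] = b(t0)
    A[p:, j] = -b(t0)

# A global cubic (both partition blocks equal) satisfies every constraint
beta = np.concatenate([[1.0, -2.0, 0.5, 3.0]] * 2)
assert np.allclose(A.T @ beta, 0.0)
```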
In higher dimensions or with many partitions, the constraints can become
over-specified and force the model toward a single global polynomial. In these
cases it is recommended to drop second-derivative constraints or include quartic
terms, allowing the model to fit a richer surface while maintaining
perceived smoothness at knots. The appropriate constraint level can be
controlled via include_constrain_fitted,
include_constrain_first_deriv, and
include_constrain_second_deriv.
The companion flag include_constrain_interactions determines whether
the analogous mixed-partial constraints are imposed for interaction terms,
and no_intercept adds the special homogeneous equality constraint that
fixes the intercept at zero (the same behavior triggered by using
0 + in the formula interface).
Before computing the projection \mathbf{U}, the constraint matrix is
reduced to a linearly independent subset of columns via pivoted QR
decomposition. This avoids numerical instability from redundant constraints
and ensures \mathbf{A}^{\top}\mathbf{G}\mathbf{A} is invertible.
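The idea can be sketched with a greedy rank-revealing pass (a stand-in for pivoted QR, which selects an independent column subset in the same spirit; the constraint matrix here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
P = 6

# Constraint matrix with a redundant (duplicated) column
A_full = rng.standard_normal((P, 3))
A_red = np.column_stack([A_full, A_full[:, 0]])

# Keep a column only if it increases the rank of the running subset
keep = []
for j in range(A_red.shape[1]):
    trial = keep + [j]
    if np.linalg.matrix_rank(A_red[:, trial]) == len(trial):
        keep.append(j)

A_indep = A_red[:, keep]
assert keep == [0, 1, 2]                    # duplicate column 3 dropped
assert np.linalg.matrix_rank(A_indep) == A_indep.shape[1]
```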
Lagrangian Projection
The constrained estimate is derived via Lagrangian multipliers. Define the
P \times P projection matrix:
\mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{\top}.
Then the constrained estimate is:
\tilde{\boldsymbol{\beta}} = \mathbf{U}\hat{\boldsymbol{\beta}}.
The matrix \mathbf{U} has the property that
\mathbf{U}\mathbf{G}\mathbf{U}^{\top} = \mathbf{U}\mathbf{G}, which is
used extensively in variance estimation and posterior draws. In words, the
unconstrained penalized estimate is projected back into the coefficient space
that satisfies the smoothness restrictions, and all subsequent uncertainty
calculations inherit that same projected geometry.
The projection is computed via get_U and, when requested,
returned in the fitted object as U through return_U = TRUE.
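Both the constraint satisfaction and the identity \mathbf{U}\mathbf{G}\mathbf{U}^{\top} = \mathbf{U}\mathbf{G} are easy to confirm numerically with synthetic matrices:

```python
import numpy as np

rng = np.random.default_rng(5)
P, r = 6, 2

M = rng.standard_normal((P, P))
G = np.linalg.inv(M @ M.T + P * np.eye(P))   # SPD stand-in for the penalized inverse
A = rng.standard_normal((P, r))

U = np.eye(P) - G @ A @ np.linalg.inv(A.T @ G @ A) @ A.T
beta_hat = rng.standard_normal(P)
beta_tilde = U @ beta_hat

assert np.allclose(A.T @ beta_tilde, 0.0)    # smoothness constraints hold exactly
assert np.allclose(U @ G @ U.T, U @ G)       # key identity used in variance estimation
```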
When the constraints are inhomogeneous
(\mathbf{A}^{\top}\boldsymbol{\beta} = \mathbf{c} with
\mathbf{c} \neq \mathbf{0}), a particular solution
\boldsymbol{\beta}_0 satisfying
\mathbf{A}^{\top}\boldsymbol{\beta}_0 = \mathbf{c} is added back
after projection, yielding the full Lagrangian solution
\mathbf{U}\hat{\boldsymbol{\beta}} + (\mathbf{I} - \mathbf{U})\boldsymbol{\beta}_0.
In lgspline and lgspline.fit, users realize
this by supplying extra equality columns in constraint_vectors
together with matching right-hand sides in constraint_values;
null_constraint provides the alternate shorthand documented in
lgspline when constraint_vectors is supplied and
constraint_values is left empty.
In practice \mathbf{U} is never explicitly formed during fitting.
The constrained estimate is obtained from a transformed OLS residual problem (the
\mathbf{G}^{1/2}\mathbf{r}^{*} trick) in four steps:
1. Obtain the partition-wise unconstrained estimate \hat{\boldsymbol{\beta}}.
2. Set \mathbf{y}^{*} = \mathbf{G}^{-1/2}\hat{\boldsymbol{\beta}} and \mathbf{X}^{*} = \mathbf{G}^{1/2}\mathbf{A}.
3. Fit the linear model \mathbb{E}[\mathbf{y}^{*}] = \mathbf{X}^{*}\boldsymbol{\gamma} by OLS using QR decomposition.
4. Compute the residuals \mathbf{r}^{*} = \mathbf{y}^{*} - \mathbf{X}^{*}(\mathbf{X}^{*\top}\mathbf{X}^{*})^{-1}\mathbf{X}^{*\top}\mathbf{y}^{*} from that transformed OLS fit and recover the constrained estimate as \tilde{\boldsymbol{\beta}} = \mathbf{G}^{1/2}\mathbf{r}^{*}.
A scaling factor 1/\sqrt{K+1} is applied to both
\mathbf{X}^{*} and \mathbf{y}^{*} prior to the OLS call
and divided out afterward, improving numerical conditioning when the
constraint matrix has many rows.
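The equivalence of this residual trick with the direct projection \mathbf{U}\hat{\boldsymbol{\beta}} can be checked numerically (synthetic \mathbf{G} and \mathbf{A}; the conditioning rescale is omitted since it cancels):

```python
import numpy as np

rng = np.random.default_rng(6)
P, r = 6, 2

M = rng.standard_normal((P, P))
G = np.linalg.inv(M @ M.T + P * np.eye(P))
A = rng.standard_normal((P, r))
beta_hat = rng.standard_normal(P)

# Matrix square roots of SPD G via its eigendecomposition
w, V = np.linalg.eigh(G)
G_half = V @ np.diag(np.sqrt(w)) @ V.T
G_halfinv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Residual trick: transformed OLS fit, then residuals back-transformed
y_star = G_halfinv @ beta_hat
X_star = G_half @ A
gamma, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
r_star = y_star - X_star @ gamma
beta_tilde = G_half @ r_star

# Direct Lagrangian projection for comparison
U = np.eye(P) - G @ A @ np.linalg.inv(A.T @ G @ A) @ A.T
assert np.allclose(beta_tilde, U @ beta_hat)
```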
The most expensive operation in this approach is the QR decomposition of the
P \times r matrix \mathbf{X}^{*} = \mathbf{G}^{1/2}\mathbf{A},
which is far cheaper than working with the full P \times P system directly.
Without correlation or SQP constraints, \mathbf{G} is stored and operated upon as a
list of K+1 small p \times p matrices rather than the full
P \times P block-diagonal, saving substantial memory when K
is large and allowing for parallelism.
When correlation is present, \mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}
is no longer block-diagonal, so the full-dimensional system must be handled
directly unless the Woodbury acceleration
(see .woodbury_decompose_V) applies.
When additional inequality constraints are present, the code either augments the equality system with a
partition-wise active-set refinement (block-separable case) or falls back to
dense SQP via the Goldfarb-Idnani dual active-set method implemented in
solve.QP.
GLM Extension and Iterative Updates
Working Quantities
For GLM responses with mean \mu_i = g^{-1}(\eta_i) and working
weights w_i = [V(\mu_i)\{g'(\mu_i)\}^2]^{-1}, the penalized
information becomes
\mathbf{G}_k^{-1} = \mathbf{X}_k^{\top}\mathbf{W}_k\mathbf{X}_k
+ \boldsymbol{\Lambda}_k with
\mathbf{W}_k = \mathrm{diag}(w_i : i \in \mathcal{P}_k).
Because \mathbf{W} is diagonal,
\mathbf{X}^{\top}\mathbf{W}\mathbf{X} remains block-diagonal and
the four-step procedure carries through with
\mathbf{G}_k = (\mathbf{X}_k^{\top}\mathbf{W}_k\mathbf{X}_k +
\boldsymbol{\Lambda}_k)^{-1} in place of
(\mathbf{X}_k^{\top}\mathbf{X}_k + \boldsymbol{\Lambda}_k)^{-1}.
Under the default glm_weight_function, the diagonal entries reduce
to family$variance(mu) (optionally scaled by observation weights),
which for canonical families is the usual Fisher scoring weight.
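A small check (not package code) that the textbook working weight w_i = [V(\mu_i)\{g'(\mu_i)\}^2]^{-1} reduces to family$variance(mu) for a canonical family; R's family objects supply mu.eta = 1/g'(\mu):

```r
fam <- poisson()                            # canonical log link
mu <- c(0.5, 2, 10)
eta <- fam$linkfun(mu)                      # eta = g(mu)
w <- fam$mu.eta(eta)^2 / fam$variance(mu)   # [V(mu) g'(mu)^2]^{-1}
all.equal(w, fam$variance(mu))              # both equal mu for Poisson/log
```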
The partition-wise unconstrained estimates are obtained by
unconstrained_fit_fxn, by default
unconstrained_fit_default, which initializes via an augmented
ridge trick (appending \boldsymbol{\Lambda}^{1/2} as
pseudo-observations to glm.fit) then refines using
damped_newton_r with nr_iterate. For
non-canonical log-link gamma regression, the keep_weighted_Lambda = TRUE
option correctly returns the augmented ridge estimate directly.
Three Fitting Paths
Path 1: Correlation structure present.
When a working correlation \mathbf{V} is present, such as for
marginal means models or generalized estimating equation (GEE)-like models,
\mathbf{V}^{-1} couples observations across partitions, so
partition-wise fitting is not directly available. Two sub-paths handle
this.
Path 1a (Gaussian identity + GEE) solves the whitened system
directly via
\tilde{\mathbf{G}} =
(\tilde{\mathbf{X}}^{\top}\tilde{\mathbf{X}} +
\boldsymbol{\Lambda}_{\mathrm{block}})^{-1} where
\tilde{\mathbf{X}} = \mathbf{V}^{-1/2}\mathbf{X}, then applies
the Lagrangian projection in the full P-space.
See .get_B_gee_gaussian.
Path 1b (non-Gaussian GEE) uses a damped SQP iteration on the
whitened system. The first iterate is a constrained Newton step from the
projection matrix \mathbf{U}; subsequent iterates solve the
quadratic subproblem via solve.QP.
See .get_B_gee_glm.
Both sub-paths have Woodbury-accelerated variants described below.
Path 2: Gaussian identity, no correlation.
The constrained estimate is obtained by a single Lagrangian projection
from the per-partition penalized least-squares cross-products (the four
steps in closed form). No outer iteration is needed. When K = 0 and
there are no additional constraints, this reduces to the ordinary
penalized closed form
\hat{\boldsymbol{\beta}} = \mathbf{G}\mathbf{X}^{\top}\mathbf{y}.
Implemented in .get_B_gaussian_nocorr.
Path 3: Non-Gaussian GLM, no correlation.
Unconstrained estimates are obtained separately within each partition,
then projected onto the constraint space. When the link is non-identity,
the information \mathbf{G} depends on the current fitted values
through the working weights, so the projection must be iterated. The
unconstrained anchor \hat{\boldsymbol{\beta}} is held fixed while
\mathbf{G} is recomputed at each iterate's constrained estimate:
\tilde{\boldsymbol{\beta}}^{(s+1)} =
\mathbf{U}^{(s)}\hat{\boldsymbol{\beta}}, \quad
\mathbf{U}^{(s)} = \mathbf{I} - \mathbf{G}^{(s)}\mathbf{A}
(\mathbf{A}^{\top}\mathbf{G}^{(s)}\mathbf{A})^{-1}\mathbf{A}^{\top}.
Iteration stops when the mean absolute coefficient change falls below
tol, or when the update begins increasing (in which case the
previous iterate is restored). The recomputation of weighted Gram
matrices, Schur corrections, and square-root information factors at each
step is handled by .solver_recompute_G_at_estimate. Implemented
in .get_B_glm_nocorr.
Woodbury Acceleration for Structured Correlation
For structured correlation matrices (AR(1), exchangeable, banded), the
perturbation \mathbf{V}^{-1} - \mathbf{I} is sparse. Writing the
GLS information matrix as
\mathbf{G}_V^{-1} = \underbrace{(\mathbf{X}^{\top}\mathbf{X} +
\boldsymbol{\Lambda})}_{\text{block-diagonal}} +
\underbrace{\mathbf{X}^{\top}(\mathbf{V}^{-1} -
\mathbf{I})\mathbf{X}}_{\boldsymbol{\Delta}},
the block-diagonal part of \boldsymbol{\Delta} (within-partition
corrections) is absorbed into per-partition eigendecompositions, yielding a
corrected block-diagonal inverse
\mathbf{G}_c. The off-diagonal remainder
\boldsymbol{\Delta}_{\mathrm{off}} captures cross-partition coupling
and has effective rank r determined numerically by
.woodbury_decompose_V. For example, AR(1) correlation
partitioned by time gives r \approx 2K since only observation pairs
straddling knot boundaries contribute.
Factoring
\boldsymbol{\Delta}_{\mathrm{off}} \approx
\mathbf{L}\mathbf{S}\mathbf{L}^{\top} where \mathbf{L} is
P \times r and \mathbf{S} is r \times r diagonal with
entries \pm 1, the Woodbury identity gives
\mathbf{G}_V = \mathbf{G}_c - \mathbf{G}_c\mathbf{L}
(\mathbf{S}^{-1} + \mathbf{L}^{\top}\mathbf{G}_c\mathbf{L})^{-1}
\mathbf{L}^{\top}\mathbf{G}_c,
where the inner matrix is only r \times r.
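A numeric verification of this identity with hypothetical small matrices: if \mathbf{G}_V^{-1} = \mathbf{G}_c^{-1} + \mathbf{L}\mathbf{S}\mathbf{L}^{\top} with \mathbf{S} a \pm 1 diagonal signature matrix, the dense P x P solve and the Woodbury form with an r x r inner solve agree.

```r
set.seed(2)
P <- 6; r <- 2
Gc_inv <- crossprod(matrix(rnorm(P * P), P)) + diag(P)   # block-diagonal part, p.d.
Gc <- solve(Gc_inv)
L <- 0.1 * matrix(rnorm(P * r), P, r)                    # low-rank factor
S <- diag(c(1, -1))                                      # signature matrix

GV_dense <- solve(Gc_inv + L %*% S %*% t(L))             # full P x P solve
inner <- solve(solve(S) + t(L) %*% Gc %*% L)             # only an r x r solve
GV_wood <- Gc - Gc %*% L %*% inner %*% t(L) %*% Gc       # Woodbury identity

all.equal(GV_dense, GV_wood)
```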
The four-step projection is preserved by expressing
\mathbf{G}_V^{1/2} through \mathbf{G}_c^{1/2}. Define
\mathbf{Q} = \mathbf{G}_c^{1/2}\mathbf{L} (computed
block-diagonally) with thin SVD
\mathbf{Q} = \mathbf{O}_Q\boldsymbol{\Sigma}_Q\mathbf{R}_Q^{\top}.
Then
\mathbf{G}_V^{1/2} = \mathbf{G}_c^{1/2}\mathbf{F}^{1/2}, \quad
\mathbf{F}^{1/2} = \mathbf{I}_P -
\mathbf{O}_Q\mathbf{C}\mathbf{O}_Q^{\top},
where \mathbf{C} is an r \times r diagonal matrix computed
in .woodbury_halfsqrt_components. Every step of the
projection decomposes as a block-diagonal operation through
\mathbf{G}_c^{1/2} plus an additive rank-r correction
through \mathbf{O}_Q:
\mathbf{y}^* = \mathbf{G}_c^{1/2}(\mathbf{X}^{\top}
\mathbf{V}^{-1}\mathbf{y}) -
\mathbf{G}_c^{1/2}\mathbf{O}_Q\mathbf{C}\mathbf{O}_Q^{\top}
\mathbf{G}_c^{1/2}(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{y}),
and similarly for \mathbf{X}^* and the back-transformation. The
OLS residual step itself is unchanged.
For the non-Gaussian Woodbury path
(.get_B_gee_glm_woodbury), the perturbation
\boldsymbol{\Delta}_V = \mathbf{V}^{-1} - \mathbf{I} and the
product \boldsymbol{\Delta}_V\mathbf{X} are precomputed once
and held fixed across Newton iterations. At each step, the weighted
perturbation
\boldsymbol{\Delta}(\mathbf{W}) =
\mathbf{X}^{\top}\mathbf{W}
(\mathbf{V}^{-1} - \mathbf{I})\mathbf{X} is formed using the
precomputed product, then split and re-factored via
.woodbury_redecompose_weighted.
The Woodbury path is used when r < P/3; otherwise the code falls
back to the dense whitened approach. If the correction matrix
\mathbf{F} is not positive definite at any point, the dense path
is also used as fallback.
See .get_B_gee_woodbury (Gaussian) and
.get_B_gee_glm_woodbury (non-Gaussian).
Step Control
Different paths use different step-control strategies.
In the GEE paths (Paths 1a and 1b), the coefficient update is damped:
\boldsymbol{\beta}^{(s+1)} = (1 - \alpha_s)\boldsymbol{\beta}^{(s)}
+ \alpha_s\boldsymbol{\beta}_{\mathrm{cand}}^{(s)},
with \alpha_s = 2^{-d_s} and d_s increased whenever the
candidate gives non-finite or larger deviance. The loop terminates after
10 consecutive rejections or 100 total iterations, and exits early when
both coefficient change and deviance improvement fall below tol
after a burn-in period.
In Path 3 (non-Gaussian, no correlation), the outer projection loop has
no line search. The algorithm recomputes \mathbf{G} at the current
constrained estimate and applies a fresh projection. If the mean absolute
coefficient change begins increasing, the previous iterate is restored
and the loop stops. The partition-wise unconstrained estimates themselves
are obtained by damped Newton-Raphson inside
unconstrained_fit_default, where
damped_newton_r computes a Newton direction once per
iteration and halves the step size until the penalized log-likelihood
improves.
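The shared step-halving rule can be sketched as follows (this is an illustration, not the package's internal code): \alpha = 2^{-d}, with d incremented until the candidate no longer worsens the deviance, and the previous iterate kept if every step size fails.

```r
damped_update <- function(beta, beta_cand, deviance, max_halvings = 10) {
  dev_old <- deviance(beta)
  for (d in 0:max_halvings) {
    alpha <- 2^(-d)
    trial <- (1 - alpha) * beta + alpha * beta_cand
    dev_new <- deviance(trial)
    if (is.finite(dev_new) && dev_new <= dev_old) return(trial)
  }
  beta   # every step size rejected: restore the previous iterate
}

# Toy quadratic deviance: the full step already improves, so alpha = 1 is taken.
dev_fn <- function(b) sum((b - 1)^2)
damped_update(c(5, 5), c(0, 0), dev_fn)   # returns c(0, 0)
```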
Accommodating Correlation Structures
Parametric Correlation Structures
Suppose \mathrm{Cov}(\mathbf{y}) = \sigma^2\,\mathbf{V}(\boldsymbol{\theta})
for a known parametric family indexed by \boldsymbol{\theta} (e.g.,
AR(1) with \theta = \rho, Matern with
\boldsymbol{\theta} = (\ell, \nu), exchangeable with
\theta = \rho). The penalized generalized least-squares problem
becomes
\min_{\boldsymbol{\beta}}\;
(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\top}\mathbf{V}^{-1}
(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})
+ \boldsymbol{\beta}^{\top}\boldsymbol{\Lambda}\boldsymbol{\beta}
\quad \text{s.t.}\; \mathbf{A}^{\top}\boldsymbol{\beta} = \mathbf{0}.
The correlation matrix \mathbf{V} is supplied through the
fitted-object components Vhalf and VhalfInv, either
directly or via user functions Vhalf_fxn and VhalfInv_fxn.
When both are non-NULL, get_B dispatches to the GEE
paths (Path 1a or 1b). For built-in correlation structures, the required
square-root matrices are assembled numerically using
matsqrt, matinvsqrt, and invert.
Whitening and Permutation
Because the data are stored in partition ordering (all observations from
partition 0, then partition 1, etc.) while \mathbf{V} is in the
original observation ordering, a permutation is applied internally:
\mathbf{V}_{\mathrm{perm}}^{-1/2} =
\mathbf{V}^{-1/2}[\boldsymbol{\pi}, \boldsymbol{\pi}], where
\boldsymbol{\pi} = \texttt{unlist(order\_list)} maps original
indices to partition-ordered indices. The whitened design and response are
\tilde{\mathbf{X}} =
\mathbf{V}_{\mathrm{perm}}^{-1/2}\mathbf{X}_{\mathrm{block}} and
\tilde{\mathbf{y}} =
\mathbf{V}_{\mathrm{perm}}^{-1/2}\mathbf{y}.
The \mathbf{X} and \mathbf{y} inputs to
lgspline.fit are preserved in their unwhitened form.
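The permutation step can be illustrated with a hypothetical order_list: because matrix functions commute with symmetric permutation, \mathbf{V}^{-1/2}[\boldsymbol{\pi}, \boldsymbol{\pi}] is itself the inverse square root of \mathbf{V}[\boldsymbol{\pi}, \boldsymbol{\pi}].

```r
order_list <- list(c(2, 5), c(1, 3, 4))       # observation indices per partition
perm <- unlist(order_list)
V <- toeplitz(0.5^(0:4))                      # AR(1)-type correlation, N = 5
ev <- eigen(V, symmetric = TRUE)
VhalfInv <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)

VhalfInv_perm <- VhalfInv[perm, perm]         # permute rows and columns together
V_perm <- V[perm, perm]
all.equal(VhalfInv_perm %*% V_perm %*% VhalfInv_perm, diag(5),
          check.attributes = FALSE)           # still whitens the permuted V
```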
Whitening is applied inside get_B and
blockfit_solve where the full N \times P
block-diagonal design is available, since applying
\mathbf{V}^{-1/2} to only the diagonal blocks of the partitioned
design would silently discard cross-partition contributions and corrupt
the Gram matrix.
Loss of Block-Diagonal Structure
Unlike the independent-errors case,
\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X} is not block-diagonal
because \mathbf{V}^{-1} introduces cross-partition coupling. The
unconstrained estimator
\hat{\boldsymbol{\beta}} =
(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X} +
\boldsymbol{\Lambda})^{-1}
\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{y}
requires a full P \times P solve, and the constraint projection
proceeds with
\mathbf{G} = (\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X} +
\boldsymbol{\Lambda})^{-1}.
For structured correlation matrices (AR(1), exchangeable, banded), the
perturbation \mathbf{V}^{-1} - \mathbf{I} is sparse and the
cross-partition coupling has low effective rank. In these cases the
Woodbury-accelerated paths
(.get_B_gee_woodbury,
.get_B_gee_glm_woodbury) recover the partition-wise
computational structure by decomposing the coupling into a
block-diagonal correction plus a low-rank remainder, as described in the
GLM Extension section. For dense or high-rank
\mathbf{V}^{-1}, the code falls back to the full whitened system
(.get_B_gee_gaussian, .get_B_gee_glm).
GEE Deviance Monitoring
For non-Gaussian models with correlation, the deviance used for
convergence monitoring is computed in the whitened space by
.bf_gee_deviance. When the family supplies
custom_dev.resids, the raw deviance residuals
r_i = \mathrm{sign}(d_i)\sqrt{|d_i|} are divided by
\sqrt{w_i} and premultiplied by
\mathbf{V}_{\mathrm{perm}}^{-1/2} before squaring and averaging:
D_{\mathrm{GEE}} = \frac{1}{N}\left\|
\mathbf{V}_{\mathrm{perm}}^{-1/2}\,
\mathbf{w}^{-1/2}\mathbf{r}\right\|^{2},
where \mathbf{w} is the vector of working weights at the current
iterate, clamped below at \sqrt{\varepsilon_{\mathrm{mach}}}. When
only dev.resids is available, the function falls back to the
standard mean deviance; otherwise it uses mean squared error.
REML Estimation of Correlation Parameters
Correlation parameters \boldsymbol{\theta} are estimated by minimizing
a negative restricted log-likelihood (REML) objective. The criterion
implemented in lgspline is a central-limit-theorem-based working
approximation to a Laplace-style marginal likelihood criterion,
applied here solely to correlation structure estimation rather than
penalty parameter selection.
Let \mathbf{D} = \mathrm{diag}(d_i) be the observation weight matrix,
\tilde{\mathbf{W}} = \mathrm{diag}(\tilde{w}_i) the GLM working weight
matrix at the current fitted values, \mathbf{V} the correlation matrix
parameterized by \boldsymbol{\rho} (a vector on the unconstrained
real line), and \tilde{\sigma}^{2} the dispersion profiled at its
restricted maximum likelihood estimate. The negative REML objective
implemented in lgspline, scaled by 1/N, is
-\ell_{R}(\boldsymbol{\rho}) = \frac{1}{N}\!\left[
-\log|\mathbf{V}^{-1/2}|
+ \frac{N}{2}\log\tilde{\sigma}^{2}
+ \frac{1}{2\tilde{\sigma}^{2}}
(\mathbf{y} - \boldsymbol{\mu})^{\top}
\mathbf{D}\tilde{\mathbf{W}}^{-1}\mathbf{V}^{-1}
(\mathbf{y} - \boldsymbol{\mu})
+ \frac{1}{2}\log|(\tilde{\sigma}^{2}\mathbf{U}\mathbf{G})^{-1}|^{+}
\right],
where |\cdot|^{+} denotes the generalized determinant (product of
nonzero eigenvalues), and
\boldsymbol{\mu} = g^{-1}(\mathbf{X}\tilde{\boldsymbol{\beta}}) are
the fitted values on the response scale.
Gradients with respect to correlation parameters are available in
closed form for all built-in structures except Matern, which uses
finite-difference approximation due to the complexity of differentiating
the modified Bessel function K_{\nu} with respect to \nu. See
reml_grad_from_dV for the full gradient derivation and
notation. Custom analytic gradients can be supplied through REML_grad,
and a fully custom criterion can replace REML through
custom_VhalfInv_loss. The Toeplitz example in lgspline
demonstrates how to supply custom correlation structures with user-defined
gradient functions. Optimization over these working correlation parameters is
then carried out by the same quasi-Newton engine used elsewhere in the
package, namely efficient_bfgs with fallback to
approx_grad when needed. In the user-facing interface, this
machinery is activated through correlation_structure,
correlation_id, spacetime, VhalfInv, Vhalf,
VhalfInv_fxn, Vhalf_fxn, VhalfInv_par_init,
REML_grad, custom_VhalfInv_loss, and
VhalfInv_logdet.
The gradient of the negative REML has three terms per parameter:
- \frac{1}{2}\mathrm{tr}(\mathbf{V}^{-1}\partial\mathbf{V}/\partial\theta_j): the log-determinant contribution.
- -\frac{1}{2\tilde{\sigma}^{2}}\mathbf{r}^{\top}(\partial\mathbf{V}/\partial\theta_j)\mathbf{r}: the residual quadratic form contribution, where \mathbf{r} = \mathrm{diag}(\sqrt{d_i}/\sqrt{\tilde{w}_i})\mathbf{V}^{-1/2}(\mathbf{y} - \boldsymbol{\mu}).
- -\frac{1}{2}\mathrm{tr}\!\left(\mathbf{M}^{+}\mathbf{X}_{*}^{\top}\mathbf{V}^{-1}(\partial\mathbf{V}/\partial\theta_j)\mathbf{V}^{-1}\tilde{\mathbf{W}}\mathbf{D}\mathbf{X}_{*}\right): the REML correction, where \mathbf{X}_{*} = \mathbf{X}\mathbf{U} is the constrained design and \mathbf{M} = \mathbf{X}_{*}^{\top}\mathbf{V}^{-1}\tilde{\mathbf{W}}\mathbf{D}\mathbf{X}_{*} + \mathbf{U}^{\top}\boldsymbol{\Lambda}\mathbf{U} is the projected penalized information.
For each supported correlation family, the derivatives
\partial\mathbf{V}/\partial\theta_j are available in closed form,
enabling analytic gradient computation for use with the quasi-Newton
optimizer.
Connection to Standard Mixed Model REML
The quadratic penalty \boldsymbol{\Lambda} acts as the inverse prior
covariance of a Gaussian random effect on the spline coefficients, with
the smoothing parameter satisfying \lambda = \tau^{2}/\sigma^{2};
this is the same mixed model representation used by mgcv, and
Monte Carlo draws under the resulting Laplace-style approximation are
available via generate_posterior. For Gaussian responses
with identity link and no penalization, the implemented criterion
coincides exactly with classical REML. For non-Gaussian responses, the
criterion substitutes the Fisher information for the full penalized
log-likelihood Hessian, exploiting the CLT approximation that
\tilde{\mathbf{W}}^{-1/2}(\mathbf{y} - \boldsymbol{\mu}) is
approximately Gaussian; this yields a method-of-moments style estimator
for the correlation and variance parameters that depends only on
mean-variance relationships through \mathbf{W}, and therefore
generalizes naturally to quasi-likelihood and other settings where a
fully specified log-likelihood is unavailable. The REML correction term
\log|(\tilde{\sigma}^{2}\mathbf{U}\mathbf{G})^{-1}|^{+} uses a
generalized log-determinant so that only nonzero eigenvalues contribute
when rank deficiency arises from smoothness constraints or
identifiability conditions in \mathbf{A}. When
\boldsymbol{\Lambda} is full rank, the criterion coincides with
the exact marginal likelihood from integrating out
\boldsymbol{\beta} under its Gaussian prior; when
\boldsymbol{\Lambda} is rank-deficient, unpenalized coefficients
are projected out in the REML sense while penalized coefficients are
integrated through their prior. During penalty tuning, the block-diagonal
approximation is retained for GCV criteria and gradients; since GCV is
rotation-invariant the practical effect on automatic selection is expected
to be negligible, though this is not confirmed and the tuned penalties can
always be overridden.
Built-In Correlation Structures
The package provides several built-in correlation structures for modeling
spatial and temporal dependence. These are specified via
correlation_structure with group membership in correlation_id
and spatial or temporal coordinates in spacetime (an N-row
matrix). Exchangeable correlation does not require spacetime.
All positive scale parameters are estimated on the log scale,
with back-transform \exp(\cdot). Parameters constrained to
(0, 1) use a double-exponential back-transform of the form
\exp(-\exp(\eta)), so optimization still occurs on the
unconstrained real line while the correlation remains bounded.
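Evaluating the two back-transforms directly shows how optimization runs on the unconstrained \rho scale while the natural parameters stay in range:

```r
rho <- seq(-3, 3, by = 1.5)       # unconstrained working scale
theta <- exp(rho)                 # positive scale parameters (length scales, rates)
nu <- exp(-exp(rho))              # correlations bounded in (0, 1)
all.equal(log(-log(nu)), rho)     # the (0, 1) transform inverts cleanly
```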
- Exchangeable: Aliases 'exchangeable', 'cs', 'CS', 'compoundsymmetric', 'compound-symmetric'. A constant correlation \nu between any two observations within the same cluster. Parameterization: \nu = \exp(-\exp(\rho)), so \nu \in (0, 1). Only positive within-cluster correlation is supported under this parameterization.
- Spatial Exponential: Aliases 'spatial-exponential', 'spatialexponential', 'exp', 'exponential'. Correlation decays exponentially with distance: \exp(-\omega d), where d is Euclidean distance and \omega > 0. Parameterization: \omega = \exp(\rho). Mathematically equivalent to the power correlation \theta^{d} with \theta = e^{-\omega}, but with better numerical properties during optimization.
- AR(1): Aliases 'ar1', 'ar(1)', 'AR(1)', 'AR1'. Correlation depends on rank difference between observations: \nu^{r}, where r is the rank difference within cluster. Parameterization: \nu = \exp(-\exp(\rho)), so \nu \in (0, 1). Only positive autocorrelation is supported.
- Gaussian / Squared Exponential: Aliases 'gaussian', 'rbf', 'squared-exponential'. Smooth decay with squared distance: \exp(-d^{2}/(2\ell^{2})), where \ell is the length scale. Parameterization: \ell = \exp(\rho).
- Spherical: Aliases 'spherical', 'Spherical', 'cubic', 'sphere'. Polynomial decay with a hard cutoff at range r: 1 - 1.5(d/r) + 0.5(d/r)^{3} for d \le r, and 0 otherwise. Parameterization: r = \exp(\rho).
- Matern: Aliases 'matern', 'Matern'. Flexible correlation with adjustable smoothness: (2^{1-\nu}/\Gamma(\nu))(\sqrt{2\nu}\,d/\ell)^{\nu}K_{\nu}(\sqrt{2\nu}\,d/\ell). Two parameters: length scale \ell = \exp(\rho_1) and smoothness \nu = \exp(\rho_2). No analytical gradient is available for \nu due to the difficulty of differentiating the modified Bessel function K_{\nu} with respect to its order, so finite differences are used; this makes Matern slower and potentially less stable than other structures.
- Gamma-Cosine: Aliases 'gamma-cosine', 'gammacosine', 'GammaCosine'. Oscillatory dependence: (d^{\alpha-1}e^{-\gamma d})/(\Gamma(\alpha)/\gamma^{\alpha})\cdot\cos(\omega d). Three parameters: shape \alpha = \exp(\rho_1), rate \gamma = \exp(\rho_2), and frequency \omega = \exp(\rho_3). Reduces to exponential when \alpha = 1 and \omega \approx 0.
- Gaussian-Cosine: Aliases 'gaussian-cosine', 'gaussiancosine', 'GaussianCosine'. Smooth oscillatory correlation: \exp(-d^{2}/(2\ell^{2}))\cdot\cos(\omega d). Two parameters: length scale \ell = \exp(\rho_1) and frequency \omega = \exp(\rho_2). Reduces to Gaussian when \omega \approx 0.
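As an illustration, the spatial-exponential structure can be assembled from coordinates in a few lines; spatial_exp_corr below is a hypothetical helper, not the package's internal constructor, using the same log-scale parameterization \omega = \exp(\rho):

```r
spatial_exp_corr <- function(coords, rho) {
  omega <- exp(rho)                 # positive decay rate via log parameterization
  d <- as.matrix(dist(coords))      # Euclidean distance matrix
  exp(-omega * d)                   # correlation exp(-omega * d)
}

coords <- cbind(x = c(0, 1, 0), y = c(0, 0, 2))
V <- spatial_exp_corr(coords, rho = 0)   # omega = 1
V[1, 2]                                  # exp(-1): points at unit distance
```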
Interpreting Estimated Correlation Parameters
Correlation parameters are estimated on transformed scales; they must be
back-transformed for interpretation. When confint.lgspline is called and the
inverse Hessian from BFGS is available, confidence intervals are returned
on the untransformed (working) scale and should be back-transformed as
described in the examples for lgspline.
Custom Correlation Structures
Custom correlation structures can be specified through:
- VhalfInv_fxn: Creates \mathbf{V}^{-1/2}.
- Vhalf_fxn: Creates \mathbf{V}^{1/2}. When omitted, the code computes it by explicit inversion of VhalfInv.
- REML_grad: Provides the analytical gradient of the REML objective.
- VhalfInv_logdet: Efficient log-determinant computation.
- custom_VhalfInv_loss: Replaces the REML objective entirely.
These functions enter lgspline through
correlation_structure, VhalfInv_fxn, Vhalf_fxn,
REML_grad, and custom_VhalfInv_loss, and the fitted object
retains the resulting correlation machinery in components such as
VhalfInv_fxn, Vhalf_fxn, and
VhalfInv_params_estimates. When VhalfInv is supplied but
Vhalf is not, Vhalf is computed unconditionally as the inverse
of VhalfInv for all family/link combinations, since both
get_B and blockfit_solve require it for GEE
estimation.
Variance and Dispersion Estimation
Once the constrained estimate has been obtained, the next questions are how much flexibility the fitted model effectively used and how uncertainty should be propagated through the same constrained geometry. The quantities in this section are therefore all built on the projected information matrices from the previous sections.
Effective Degrees of Freedom and Dispersion
The effective degrees of freedom is the trace of the hat matrix. In the
Gaussian identity case with observation weights and correlation, the fitted
linear operator is built from the dense GLS analogue
\mathbf{G}_{\mathrm{correct}} and can be written schematically as
\mathbf{H} =
\mathbf{V}^{-1/2}(\mathbf{W}\mathbf{D})^{1/2}
\mathbf{X}\mathbf{U}\mathbf{G}_{\mathrm{correct}}\mathbf{X}^{\top}
(\mathbf{W}\mathbf{D})^{1/2}\mathbf{V}^{-1/2},
where for Gaussian identity \mathbf{W} = \mathbf{I}. In the
no-correlation Gaussian case this reduces to the familiar
\mathbf{H} = \mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^{\top}\mathbf{D}.
For Gaussian identity fits, the dispersion estimate is computed as a
weighted mean squared residual, optionally scaled by
N/(N - \mathrm{tr}(\mathbf{H})) when
unbias_dispersion = TRUE:
\tilde{\sigma}^{2} =
\frac{1}{N - \mathrm{tr}(\mathbf{H})}\|\mathbf{y} - \tilde{\mathbf{y}}\|^{2}.
More generally with weights, a correlation structure and non-linear link function:
\tilde{\sigma}^{2} =
\frac{1}{N - \mathrm{tr}(\mathbf{H})}\| \mathbf{V}^{-1/2}\mathbf{W}^{-1/2}\mathbf{D}^{1/2}(\mathbf{y} - \tilde{\mathbf{y}})\|^{2}.
This estimated dispersion is returned as sigmasq_tilde, and the
corresponding effective degrees of freedom trace is returned as
trace_XUGX. For non-Gaussian families, the fitting code delegates
dispersion estimation to dispersion_function; thus the package does
not assume a single closed-form Pearson-style formula outside the Gaussian
identity setting. The hat-matrix trace itself is assembled by
compute_trace_H in the dense correlation-aware case and by the
same blockwise products summarized by trace_XUGX in the simpler
no-correlation paths.
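A sketch of the no-correlation effective degrees of freedom, assuming no constraints (\mathbf{U} = \mathbf{I}), unit weights, and a hypothetical penalty that leaves the intercept unpenalized:

```r
set.seed(3)
X <- cbind(1, rnorm(20))                # intercept plus one covariate
Lambda <- diag(c(0, 5))                 # penalize the slope only (hypothetical)
G <- solve(crossprod(X) + Lambda)
edf <- sum(diag(X %*% G %*% t(X)))      # tr(H) with H = X G X'

# Same trace without forming the N x N hat matrix (cyclic property):
all.equal(edf, sum(diag(G %*% crossprod(X))))
edf                                     # between 1 (heavy penalty) and 2 (OLS)
```

The cyclic-trace form is what makes blockwise accumulation of trace_XUGX possible without ever materializing \mathbf{H}.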
A concrete built-in non-Gaussian example is the Weibull AFT path, which pairs
weibull_family with weibull_dispersion_function,
weibull_glm_weight_function, and
weibull_schur_correction.
Users who want these quantities available for downstream inference should
keep estimate_dispersion = TRUE and return_varcovmat = TRUE
(the defaults), since wald_univariate,
confint.lgspline, and the prediction-standard-error path in
predict.lgspline all rely on the post-fit dispersion and
covariance components documented here.
Variance-Covariance Matrix
The variance-covariance matrix of \tilde{\boldsymbol{\beta}} is
estimated as:
\mathrm{Var}(\tilde{\boldsymbol{\beta}}) = \tilde{\sigma}^{2}(\mathbf{U}\mathbf{G}^{1/2})(\mathbf{U}\mathbf{G}^{1/2})^{\top}
using the outer-product form for numerical stability. The result is returned
as varcovmat when return_varcovmat = TRUE. The algebraically
equivalent expression \tilde{\sigma}^{2}\mathbf{U}\mathbf{G} is not
used because \mathbf{G} is only positive semi-definite when the
penalty matrix \boldsymbol{\Lambda} has zero eigenvalues (e.g., the
intercept and linear terms under the smoothing spline penalty when
flat_ridge_penalty = 0), which can introduce negative diagonal
entries in finite precision arithmetic. The outer-product form also
guarantees symmetry.
This is the Bayesian posterior covariance, treating the penalty as a
Gaussian prior on the coefficients. When exact_varcovmat = TRUE,
a frequentist correction is additionally computed:
\mathrm{Var}_{\mathrm{exact}}(\tilde{\boldsymbol{\beta}}) =
\tilde{\sigma}^{2}\mathbf{U}\mathbf{G}^{1/2}(\mathbf{X}^{\top}\mathbf{W}\mathbf{D}\mathbf{V}^{-1}\mathbf{X})\mathbf{G}^{1/2}\mathbf{U}^{\top} =
\tilde{\sigma}^{2}\mathbf{U}\mathbf{G}\mathbf{U}^{\top}
- \tilde{\sigma}^{2}\mathbf{U}\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{U}^{\top}.
The first term is the Bayesian posterior covariance; the second is a frequentist correction such that for Gaussian identity link (with or without correlation), this is the exact variance-covariance matrix of the constrained estimator.
When a correlation structure is present (VhalfInv non-NULL),
the block-diagonal \mathbf{G} is replaced by the full weighted GLS
analogue
\mathbf{G}_{\mathrm{correct}} =
\left(\mathbf{X}^{\top}\mathbf{W}\mathbf{D}\mathbf{V}^{-1}\mathbf{X}
+ \boldsymbol{\Lambda}\right)^{-1},
where \mathbf{W} = \mathbf{I} in the Gaussian identity case.
This dense matrix is what enters the correlation-aware \mathbf{U},
\mathrm{Var}(\tilde{\boldsymbol{\beta}}), and
\mathrm{Var}_{\mathrm{exact}}(\tilde{\boldsymbol{\beta}})
computations.
In user-facing terms, return_varcovmat controls whether this matrix is
stored at all, while exact_varcovmat switches between the default
posterior/Laplace approximation and the exact frequentist correction in the
Gaussian-identity setting. The stored covariance is what powers
wald_univariate, confint.lgspline, and
se.fit = TRUE in predict.lgspline; the
critical_value argument supplied at fit time is carried forward as the
default cutoff for those interval-producing helpers.
Recomputation of G at Convergence
At the final iterate, \mathbf{G} is recomputed to reflect the
converged working weights and Schur corrections. The implementation
computes the weighted design
\mathbf{X}_{w}^{(k)} = \mathbf{X}_k \cdot \mathrm{diag}(\sqrt{\mathbf{w}_k}),
forms the weighted Gram matrix
\mathbf{X}_{w}^{(k)\top}\mathbf{X}_{w}^{(k)}, adds the Schur
correction, and performs eigendecomposition via
compute_G_eigen to obtain \mathbf{G}_k,
\mathbf{G}_k^{1/2}, and \mathbf{G}_k^{-1/2}. The relationship
\mathbf{G}_k = \mathbf{G}_k^{1/2}(\mathbf{G}_k^{1/2})^{\top} is
enforced exactly by construction, and the fitted object can retain these as
G and Ghalf when return_G = TRUE and
return_Ghalf = TRUE. The square-root factors are numerically
stabilized through the helper routines matsqrt and
matinvsqrt, which are also used elsewhere in the package when
dense analogues of \mathbf{G}^{1/2} or \mathbf{G}^{-1/2} are
required.
Bayesian Interpretation
The penalty has a natural Gaussian-prior interpretation, so once the constrained estimator and its covariance are available, Bayesian-style posterior simulation follows almost immediately. This section records the interpretation that is already implicit in the fitted object and in the package's posterior simulation helpers.
A Bayesian interpretation follows from viewing the penalty as a Gaussian prior on the coefficients. Conditional on the fitted smoothing parameters, the code samples on the coefficient scale from
\boldsymbol{\beta}^{(m)} =
\tilde{\boldsymbol{\beta}} +
\sqrt{\tilde{\sigma}^{2}}\,\mathbf{U}\mathbf{G}^{1/2}\mathbf{z}^{(m)},
\qquad \mathbf{z}^{(m)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
The coefficients are then back-transformed to the original response and predictor scales.
When inequality constraints are absent, these draws are i.i.d. Gaussian posterior draws around the fitted mode. The underlying coefficient-draw closure also contains an elliptical slice sampling route for active inequality constraints, using the same covariance factor to keep retained draws in the feasible region.
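The unconstrained draw can be sketched with stand-in ingredients (hypothetical \mathbf{U}, \mathbf{G}^{1/2}, and dispersion; not the package's stored closure):

```r
set.seed(4)
p <- 3
beta_tilde <- c(1, -2, 0.5)          # constrained estimate
sigmasq_tilde <- 0.25                # estimated dispersion
U <- diag(p)                         # no active constraints for illustration
Ghalf <- diag(c(1, 0.5, 0.1))        # stand-in square-root covariance factor

draw_one <- function() {
  z <- rnorm(p)                      # z ~ N(0, I)
  beta_tilde + sqrt(sigmasq_tilde) * c(U %*% Ghalf %*% z)
}
draws <- replicate(5000, draw_one()) # p x 5000 matrix of posterior draws
rowMeans(draws)                      # concentrates around beta_tilde
```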
At the implementation level, standard Gaussian posterior draws may place positive mass on the infeasible region, so the constrained-draw closure instead targets the truncated posterior
\pi(\boldsymbol{\beta} \mid \mathbf{y}) \propto
\exp\!\left(-\frac{1}{2}(\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}})^{\top}
\mathbf{G}^{-1}(\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}})\right)
\mathbf{1}(\mathbf{C}^{\top}\boldsymbol{\beta} \succeq \mathbf{c}),
yielding credible intervals that respect the constraint boundaries.
The public generate_posterior wrapper forwards
enforce_qp_constraints to the stored constrained-draw closure, so
constrained draws can be requested directly from the user-facing interface.
When a working correlation structure is present, the companion helper
generate_posterior_correlation extends this idea by propagating
uncertainty in the fitted correlation parameters through the same
VhalfInv_fxn/Vhalf_fxn machinery described in the correlation
section, rather than conditioning only on fixed covariance parameters. Correlation
parameters are drawn from a multivariate normal distribution centered at their
estimates, with covariance given by the inverse approximate BFGS Hessian of the
REML optimization problem (VhalfInv_params_vcov) by default, or by a custom
alternative supplied to the argument correlation_param_vcov_sc.
Inequality Constraints via Sequential Quadratic Programming
Overview
Inequality constraints of the form
\mathbf{C}^{\top}\boldsymbol{\beta} \succeq \mathbf{c} handle
shape restrictions such as monotonicity, convexity, or boundedness. In
the monomial basis, these are linear in \boldsymbol{\beta}.
Monotonicity at a grid of points requires the first derivative polynomial
to be non-negative there; convexity requires the second derivative to be
non-negative; range constraints bound the fitted values directly. All of
these translate to linear inequality constraints on the coefficient
vector.
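A hypothetical illustration of how a shape restriction becomes linear in \boldsymbol{\beta}: for a single cubic f(x) = b_1 + b_2 x + b_3 x^2 + b_4 x^3, monotonicity on a grid requires f'(x) \ge 0 there, giving rows (0, 1, 2x, 3x^2) of \mathbf{C}^{\top}.

```r
deriv_row <- function(x) c(0, 1, 2 * x, 3 * x^2)   # gradient of f'(x) in beta
grid <- seq(0, 1, length.out = 5)
Ct <- t(sapply(grid, deriv_row))     # one inequality row per grid point

beta_mono <- c(0, 1, 0, 0)           # f(x) = x, increasing
beta_nonmono <- c(0, -1, 0, 0)       # f(x) = -x, decreasing
all(Ct %*% beta_mono >= 0)           # TRUE: satisfies every row
all(Ct %*% beta_nonmono >= 0)        # FALSE: violates every row
```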
The inequality pieces are assembled by process_qp, which
returns the qp_Amat, qp_bvec, and qp_meq objects
passed to solve.QP, together with a
quadprog flag indicating whether any inequality constraints are
active. For derivative-sign constraints, process_qp calls
.build_deriv_qp, which uses
make_derivative_matrix on the expansion-standardized
design and maps the derivative rows into the full P-dimensional
coefficient space partition by partition.
Partition-Wise Active-Set Method
The sparsity pattern of \mathbf{C} is inspected automatically by
.detect_qp_global (equivalently
.solver_detect_qp_global). When every column of
\mathbf{C} has nonzero entries in only a single partition block,
the constraint system is block-separable and an active-set method can
replace the dense QP.
At each iteration, the active set \mathcal{A} (constraints
satisfied at equality) is appended to \mathbf{A} as additional
equality constraints:
\mathbf{A}_{\mathrm{aug}} = [\mathbf{A} \mid
\mathbf{C}_{\mathcal{A}}].
The constrained estimate is obtained by the same OLS projection with
\mathbf{A}_{\mathrm{aug}} in place of \mathbf{A}, and
since \mathbf{A}_{\mathrm{aug}} retains block-diagonal
compatibility, all operations remain partition-wise.
The active set is updated by checking primal feasibility (adding the
most-violated inactive constraint) and dual feasibility (dropping the
active constraint with the most negative Lagrange multiplier) until the
KKT conditions are satisfied. Lagrange multipliers for active
inequalities are recovered from the OLS fit used in the Lagrangian
projection: the fitted coefficients on
\mathbf{X}^* = \mathbf{G}^{1/2}\mathbf{A}_{\mathrm{aug}} give
the multipliers up to sign. Implemented in
.active_set_refine and
.check_kkt_partitionwise.
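A heavily simplified, standalone NumPy sketch of such a primal active-set loop for inequality-constrained least squares (illustrative only; the package's .active_set_refine operates partition-wise on the Lagrangian projection, and all names here are hypothetical):

```python
import numpy as np

def kkt_solve(Q, q, C_act):
    # Solve Q b - C_act^T mu = q with C_act b = 0; with this sign
    # convention, mu >= 0 is required at a KKT point of
    # min 1/2 b'Qb - q'b subject to C b >= 0.
    p, m = Q.shape[0], C_act.shape[0]
    K = np.zeros((p + m, p + m))
    K[:p, :p] = Q
    K[:p, p:] = -C_act.T
    K[p:, :p] = C_act
    sol = np.linalg.solve(K, np.concatenate([q, np.zeros(m)]))
    return sol[:p], sol[p:]

def active_set_ls(X, y, C, maxit=50, tol=1e-10):
    # Primal active-set loop for min ||X b - y||^2 s.t. C b >= 0:
    # add the most-violated constraint, drop the most negative multiplier.
    Q, q = X.T @ X, X.T @ y
    active = []
    for _ in range(maxit):
        C_act = C[active] if active else np.zeros((0, X.shape[1]))
        b, mu = kkt_solve(Q, q, C_act)
        if len(active) and mu.min() < -tol:
            active.pop(int(mu.argmin()))      # dual infeasible: drop
            continue
        slack = C @ b
        j = int(slack.argmin())
        if slack[j] < -tol:
            active.append(j)                  # primal infeasible: add
            continue
        return b, active, mu                  # KKT conditions satisfied
    return b, active, mu

X, y = np.eye(2), np.array([1.0, -1.0])
C = np.array([[0.0, 1.0]])                    # constrain b_2 >= 0
b, act, mu = active_set_ls(X, y, C)           # -> b = (1, 0), mu = (1,)
```

The positive multiplier on the single active constraint mirrors the interpretation described above: it measures the cost of enforcing b_2 >= 0.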
When any column of \mathbf{C} spans multiple partition blocks
(for example, cross-knot monotonicity constraints), the block-diagonal
structure is broken and the dense SQP approach is required. The
selection is made automatically. If the active-set method does not
converge within its iteration limit (default 50), the code falls back
to dense SQP as well.
Dense SQP Iteration
The dense SQP approach, implemented in .qp_refine,
solves a sequence of quadratic subproblems approximating the penalized
log-likelihood. At each iteration s:
- Compute the information matrix
\mathbf{M}^{(s)} = \mathbf{X}^{\top}\mathbf{W}^{(s)}\mathbf{X} + \boldsymbol{\Lambda}_{\mathrm{block}} + \mathbf{S}^{(s)},
where \mathbf{S}^{(s)} is the Schur complement correction.
- Compute the score vector via qp_score_function.
- Solve the QP with solve.QP, where the combined constraint matrix is
[\mathbf{A} \mid \mathbf{C}] with the first R columns treated as equalities.
- Apply a damped update
\boldsymbol{\beta}^{(s+1)} = (1 - \alpha)\boldsymbol{\beta}^{(s)} + \alpha\boldsymbol{\beta}_{\mathrm{QP}},
with \alpha = 2^{-d} and d incremented upon deviance increase.
A rescaling factor
\mathrm{sc} = \sqrt{\mathrm{mean}(|\mathbf{M}^{(s)}|)} is applied
to the Hessian and linear term before calling the QP solver. The
equality-constrained estimate from the Lagrangian projection serves as
a warm start. The qp_score_function defaults to the canonical GLM
score
\mathbf{X}^{\top}(\mathbf{y} - \boldsymbol{\mu}); for custom
models a different score can be supplied.
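The step-halving rule in the damped update can be sketched in isolation (standalone NumPy; damped_update and the toy deviance are hypothetical names, not the package's R code):

```python
import numpy as np

def damped_update(beta_old, beta_qp, deviance, max_halvings=12):
    # beta_new = (1 - alpha) * beta_old + alpha * beta_qp with
    # alpha = 2^-d; d is incremented until the deviance stops increasing.
    dev_old = deviance(beta_old)
    for d in range(max_halvings + 1):
        alpha = 2.0 ** (-d)
        beta_new = (1 - alpha) * beta_old + alpha * beta_qp
        if deviance(beta_new) <= dev_old:
            return beta_new, alpha
    return beta_old, 0.0

# Toy deviance with minimum at zero; the full QP step overshoots,
# so the first halving (alpha = 1/2) is the accepted step.
dev = lambda b: float(b @ b)
beta_new, alpha = damped_update(np.array([1.0]), np.array([-3.0]), dev)
```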
Active Set and Lagrange Multipliers
The active set at the solution identifies binding inequality constraints.
The Lagrange multipliers quantify the cost of each: a multiplier of zero
means the constraint is not binding. The implementation stores active
constraint indices, the corresponding submatrix of the constraint matrix,
and the multiplier vector in the qp_info list returned alongside
coefficient estimates, with components lagrangian, iact,
and Amat_active. When
return_lagrange_multipliers = TRUE, the fitted object also stores
the final multiplier vector directly as lagrange_multipliers.
The original assembled inequality data are retained in the fitted
object's quadprog_list component (containing qp_Amat,
qp_bvec, and qp_meq from process_qp) so
the final active set can be interpreted relative to the full
specification. The final \mathbf{U} used in constructing the
posterior variance-covariance matrix is built from both the equality
constraints and the active inequality constraints at the solution.
Built-In Constraints
Built-in inequality constraints include:
- Monotonicity: qp_monotonic_increase, qp_monotonic_decrease. Enforced by
requiring consecutive fitted values to be non-decreasing (or non-increasing):
(\mathbf{x}_i - \mathbf{x}_{i-1})^{\top}\boldsymbol{\beta} \geq 0. These are
constructed by process_qp from the partition-stacked block design reordered
to observation order.
- Derivative sign: qp_positive_derivative, qp_negative_derivative. Enforced
through the first-derivative design matrix from make_derivative_matrix. May
be TRUE/FALSE or a character/integer vector selecting specific predictors.
- Second-derivative sign: qp_positive_2ndderivative,
qp_negative_2ndderivative. Same construction using the second-derivative
design matrix.
- Response range: qp_range_lower, qp_range_upper. Bounds on the linear
predictor; for non-identity links, the bounds are transformed to the link
scale inside process_qp.
- Custom constraints via qp_Amat_fxn, qp_bvec_fxn, and qp_meq_fxn, which
receive the design matrix structure and return the constraint matrix, bound
vector, and number of equalities. These are commonly paired with a custom
qp_score_function when the quadratic approximation uses a non-default
likelihood. The low-level objects qp_Amat, qp_bvec, and qp_meq remain
documented in lgspline but in the current implementation serve as activation
markers rather than being merged into the constraint set assembled by
process_qp.
All constraints can be thinned to a user-specified subset of rows via
qp_observations, which process_qp applies before
assembly.
Blockfit Backfitting for Linear Non-Interactive Effects
Motivation
When a model contains both spline terms (receiving K+1
partition-specific coefficient vectors constrained to smoothness) and
non-interactive linear terms (“flat” terms, specified via
just_linear_without_interactions, receiving a single shared
coefficient vector \mathbf{v} across all partitions), the standard
solver carries K+1 copies of \mathbf{v} linked by equality
constraints. Backfitting avoids this inflation by solving a
lower-dimensional problem at each step. Write the partition-k
design as
\mathbf{X}_k = [\mathbf{Z}_k \mid
\mathbf{X}_{\mathrm{flat}}^{(k)}], where \mathbf{Z}_k contains
the spline columns and \mathbf{X}_{\mathrm{flat}}^{(k)} the flat
columns. This is invoked when blockfit = TRUE, flat columns are
non-empty, and K > 0.
The design, penalty, and constraint matrices are split into spline and
flat components by .bf_split_components, which extracts
spline rows from \mathbf{A}, drops null columns, rank-reduces via
QR, and detects mixed constraints (columns of \mathbf{A} with
nonzero entries on both spline and flat rows).
Block-Coordinate Descent
Spline step. Holding \mathbf{v} fixed, the code forms
the adjusted response
\mathbf{y}_k - \mathbf{X}_{\mathrm{flat}}^{(k)}\mathbf{v} and
applies the spline-only Lagrangian projection via
.bf_lagrangian_project:
\tilde{\boldsymbol{\beta}}_{\mathrm{spline}}^{(k)} =
\mathbf{U}_{\mathrm{spline}}\mathbf{G}_{\mathrm{spline}}
\mathbf{Z}_k^{\top}(\mathbf{y}_k -
\mathbf{X}_{\mathrm{flat}}^{(k)}\mathbf{v}).
Flat step. Holding spline coefficients fixed, the shared flat vector is updated by pooled penalized regression on residuals:
\mathbf{v} = \left(\sum_{k=0}^{K}
\mathbf{X}_{\mathrm{flat}}^{(k)\top}
\mathbf{X}_{\mathrm{flat}}^{(k)} +
\boldsymbol{\Lambda}_{\mathrm{flat}}\right)^{-1}
\sum_{k=0}^{K}
\mathbf{X}_{\mathrm{flat}}^{(k)\top}
(\mathbf{y}_k - \mathbf{Z}_k
\tilde{\boldsymbol{\beta}}_{\mathrm{spline}}^{(k)}).
When the constraint matrix \mathbf{A} has mixed columns (nonzero
entries on both spline and flat rows), the flat update instead solves a
KKT system enforcing the residual equality constraint
\mathbf{A}_{\mathrm{flat}}^{\top}\mathbf{v} = \mathbf{c} -
\mathbf{A}_{\mathrm{spline}}^{\top}
\tilde{\boldsymbol{\beta}}_{\mathrm{spline}}
via .bf_constrained_flat_update.
Convergence is checked using the maximum absolute change across both spline and flat coefficients. In the weighted inner loop used by the GLM solvers, the flat-block change alone determines stopping.
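The two alternating steps can be illustrated on an unpenalized Gaussian toy problem (standalone NumPy, not the package's R code); with full-rank blocks the iteration converges to the joint least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = rng.uniform(-1, 1, n)
x_flat = rng.normal(size=n)
y = np.sin(np.pi * t) + 0.5 * x_flat + 0.1 * rng.normal(size=n)

Z = np.column_stack([np.ones(n), t, t**2, t**3])   # "spline" block
Xf = x_flat[:, None]                               # shared flat block

beta_s, v = np.zeros(Z.shape[1]), np.zeros(1)
for _ in range(200):
    # Spline step: regress the flat-adjusted response on the spline block.
    beta_new = np.linalg.lstsq(Z, y - Xf @ v, rcond=None)[0]
    # Flat step: pooled regression of the spline residuals on the flat block.
    v_new = np.linalg.lstsq(Xf, y - Z @ beta_new, rcond=None)[0]
    delta = max(np.abs(beta_new - beta_s).max(), np.abs(v_new - v).max())
    beta_s, v = beta_new, v_new
    if delta < 1e-12:   # max absolute change across both blocks
        break
```

The same alternation applies per partition with the Lagrangian projection replacing the plain least-squares spline step.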
Four Estimation Cases
blockfit_solve dispatches to one of four paths.
Case (a): Gaussian identity + GEE
(.bf_case_gauss_gee).
Whitening destroys block-diagonal structure, so the code skips
backfitting and performs the same full-system Gaussian GEE projection
used by get_B Path 1a. The result is split back into
spline and flat components for downstream assembly.
Case (b): Gaussian identity, no correlation
(.bf_case_gauss_no_corr).
Standard block-coordinate descent as described above. The spline-only
\mathbf{G}_{\mathrm{spline}} factors are precomputed once, and the
pooled flat penalized inverse is reused across iterations.
Case (c): GLM + GEE
(.bf_case_glm_gee).
Two stages. Stage 1 forms a warm start by running damped Newton-Raphson
on the unwhitened working response: each outer iteration computes
working responses and weights at the current linear predictor, then
calls .bf_inner_weighted for the inner backfitting loop.
Stage 2 refines the warm start on the full whitened system using the
damped SQP loop (.bf_sqp_loop), replicating the approach
used by .get_B_gee_glm.
Case (d): GLM without GEE
(.bf_case_glm_no_corr).
A damped Newton-Raphson outer loop updates working responses and weights,
while each inner iteration alternates between weighted spline and
weighted flat updates via .bf_inner_weighted. Deviance is
monitored across outer iterations for convergence and damping.
Inequality Constraints and Reassembly
Because flat coefficients are shared by construction, the corresponding
equality constraints are satisfied exactly. Smoothness constraints on the
spline block are enforced by the spline-only Lagrangian projection.
After backfitting convergence, inequality constraints are enforced via
the same partition-wise active-set or dense SQP refinement used by
get_B. The method is selected automatically by
.solver_detect_qp_global: block-separable constraints use the
active-set method through .active_set_refine, while
cross-partition constraints trigger the dense SQP loop through
.bf_sqp_loop. For GEE (Case c), inequality handling
occurs inside Stage 2 on the whitened system.
After convergence, the shared flat vector \mathbf{v} is copied
into each partition's coefficient vector, yielding
\boldsymbol{\beta}_k =
[\tilde{\boldsymbol{\beta}}_{\mathrm{spline}}^{(k)\top},
\mathbf{v}^{\top}]^{\top} for compatibility with downstream inference.
If blockfit_solve throws an error, a warning is issued
and the code falls back to get_B.
Knot Selection and Partitioning
The topic of knot selection is not the main focus of the package, but the partition structure is central because every later design matrix, penalty, and smoothness constraint depends on it. The defaults in lgspline are therefore meant to be practical and transparent rather than theoretically final.
Univariate Case
For a single predictor, the default partitioning is handled by
make_partitions in the same k-means framework used more
generally: K+1 centers are fit on an internally standardized copy
of the predictor, controlled by standardize_predictors_for_knots, and
then returned on the raw scale. Custom knots can still be supplied via
custom_knots, in which case partition assignment is built directly
from those raw-scale breakpoints. The default number of knots K is
chosen adaptively based on N, p, q, and the GLM family.
For multivariate fits, the resulting partition metadata are returned as
make_partition_list and can be re-used in later calls to
lgspline. This is particularly useful when one wants to hold
the partition geometry fixed across repeated fits, for example while varying
penalties, families, or correlation structures.
Multivariate Case
For multiple predictors, K+1 cluster centers are identified by
k-means on an internally standardized predictor matrix via
make_partitions. This is the partitioning mechanism used to
determine the multivariate spline regions; see MacQueen (1967) for the
classical clustering formulation and Kisi et al. (2025) for a recent applied
example of k-means-driven partitioning in a nonlinear prediction
setting. Midpoints between neighboring centers
(those whose midpoint does not fall into a third cluster) serve as knot
locations. Observations are assigned to the nearest cluster center using
get.knnx, and the returned centers and knots are on
the original predictor scale. The resulting partition structure is a type
of Voronoi diagram and is stored in the fitted object as
make_partition_list. The do_not_cluster_on_these argument can
exclude certain predictors from clustering (e.g., a treatment indicator that
should not drive partitioning). The lower-level clustering behavior can be
further controlled by cluster_args and neighbor_tolerance,
while cluster_on_indicators determines whether binary predictors are
allowed to influence the partition geometry at all.
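A standalone NumPy sketch of this partitioning logic (Lloyd's k-means on standardized predictors, centers returned on the raw scale, and, in one dimension, knots as midpoints of adjacent centers). The deterministic quantile-spread initialization below is a simplification for illustration, not necessarily what make_partitions does:

```python
import numpy as np

def kmeans_partition(T, n_centers, iters=50):
    # Lloyd's k-means on a standardized copy of the predictor matrix;
    # centers are mapped back to the raw scale on return.
    mu, sd = T.mean(axis=0), T.std(axis=0)
    Z = (T - mu) / sd
    # Simplified initialization: spread starting centers along the
    # first coordinate.
    order = np.argsort(Z[:, 0])
    centers = Z[order[np.linspace(0, len(Z) - 1, n_centers).astype(int)]]
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)          # nearest-center assignment
        centers = np.array([Z[labels == k].mean(axis=0)
                            for k in range(n_centers)])
    return centers * sd + mu, labels

rng = np.random.default_rng(1)
T = np.sort(np.concatenate([rng.normal(c, 0.1, 20)
                            for c in (-5, 0, 5)]))[:, None]
centers, labels = kmeans_partition(T, 3)
c = np.sort(centers[:, 0])
knots = (c[:-1] + c[1:]) / 2   # 1D knots: midpoints of adjacent centers
```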
Standardizing Predictors
Higher-order polynomial terms can dramatically inflate or deflate the
magnitude of basis expansions, introducing numerical instability. All
polynomial basis expansions are scaled by
q_{0.69} - q_{0.31}, where q_{\zeta} is the \zeta-th
quantile of the expansion. For a standard normal distribution this quantity
is approximately 1, so the scaling is close to one standard deviation for
symmetric distributions. This fitting-stage rescaling is controlled by
standardize_expansions_for_fitting, while knot construction is
controlled separately by standardize_predictors_for_knots. The same
scaling is applied to the constraint matrix to maintain smoothness, and
coefficients are back-transformed to the original scale after fitting.
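The interquantile scale itself is simple to compute; a standalone NumPy sketch (expansion_scale is a hypothetical name):

```python
import numpy as np

def expansion_scale(col):
    # Interquantile scale q_0.69 - q_0.31; for a standard normal column
    # this is about 0.99, i.e. roughly one standard deviation.
    q31, q69 = np.quantile(col, [0.31, 0.69])
    return q69 - q31

rng = np.random.default_rng(2)
s = expansion_scale(rng.normal(size=100_000))   # close to 1
```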
Smoothing Spline Penalty
Penalty Construction
The penalty matrix \boldsymbol{\Lambda}_s penalizes the integrated
squared total curvature of the fitted function over the observed predictor
ranges. This is the step that makes the piecewise polynomial fit genuinely
behave like a smoothing spline rather than merely a constrained regression
spline. The package computes this penalty directly from the monomial
structure of the basis rather than by appealing to a pre-tabulated spline
basis. For a single partition k with basis expansion
\mathbf{x}_k = (\phi_1(\mathbf{t}), \ldots, \phi_p(\mathbf{t}))^{\top}
where each \phi_i(\mathbf{t}) = \prod_{j=1}^{q} t_j^{\alpha_{ij}}
is a multivariate monomial:
\boldsymbol{\beta}_k^{\top}\boldsymbol{\Lambda}_s\boldsymbol{\beta}_k
= \int_{\mathbf{a}}^{\mathbf{b}}
\|\tilde{f}_k''(\mathbf{t})\|^{2}\,d\mathbf{t},
where \mathbf{a} and \mathbf{b} are the observed predictor
minimums and maximums (computed globally from the data, not
partition-specific), and
\tilde{f}_k(\mathbf{t}) = \mathbf{x}_k^{\top}\boldsymbol{\beta}_k
is the fitted function for partition \mathcal{P}_k.
Total curvature operator.
The integrated squared second derivative decomposes into q
curvature operators, one per predictor. For predictor v, the
curvature operator D_v is defined as
D_v = \frac{\partial^{2}}{\partial t_v^{2}}
+ \sum_{s \neq v}\frac{\partial^{2}}{\partial t_v\,\partial t_s}.
That is, D_v captures both the pure second derivative with respect
to t_v and all mixed second partial derivatives involving t_v.
The penalty matrix entries are then
[\boldsymbol{\Lambda}_s]_{ij}
= \sum_{v=1}^{q}\int_{\mathbf{a}}^{\mathbf{b}}
D_v(\phi_i)\,D_v(\phi_j)\,d\mathbf{t}.
Monomial derivative rule.
For a monomial \phi(\mathbf{t}) = \prod_j t_j^{\alpha_j}, the
derivatives entering D_v have closed forms. The pure second
derivative is
\frac{\partial^{2}}{\partial t_v^{2}}\prod_j t_j^{\alpha_j}
= \alpha_v(\alpha_v - 1)\,t_v^{\alpha_v - 2}\prod_{j \neq v} t_j^{\alpha_j},
which is zero when \alpha_v < 2. The mixed second derivative is
\frac{\partial^{2}}{\partial t_v\,\partial t_s}\prod_j t_j^{\alpha_j}
= \alpha_v\alpha_s\,t_v^{\alpha_v - 1}t_s^{\alpha_s - 1}
\prod_{j \neq v,s} t_j^{\alpha_j},
which is zero when \alpha_v < 1 or \alpha_s < 1. Applying
D_v to a monomial \phi_i produces a sum of monomials with
known coefficients and exponent vectors.
Factorized integration.
Because every D_v(\phi_i) is polynomial, the product
D_v(\phi_i)\,D_v(\phi_j) is also polynomial and the multivariate
integral factorizes over predictors:
\int_{\mathbf{a}}^{\mathbf{b}}\prod_{j=1}^{q} t_j^{e_j}\,d\mathbf{t}
= \prod_{j=1}^{q}\frac{b_j^{e_j+1} - a_j^{e_j+1}}{e_j + 1}.
Crucially, this integral runs over all q predictor ranges,
including predictors that do not appear in the integrand (for which
e_j = 0 and the factor reduces to b_j - a_j). This ensures
that the penalty is properly scaled relative to the volume of the
predictor space.
Single-predictor verification.
For q = 1 with expansion
\mathbf{x} = (1, t, t^{2}, t^{3})^{\top} on [a, b], the
curvature operator reduces to D_1 = \partial^{2}/\partial t^{2} (no
mixed partials exist), and the penalty matrix reduces to
\boldsymbol{\Lambda}_s
= \int_a^b \mathbf{x}''\mathbf{x}''^{\top}\,dt
= \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 4(b - a) & 6(b^{2} - a^{2}) \\
0 & 0 & 6(b^{2} - a^{2}) & 12(b^{3} - a^{3})
\end{pmatrix},
equivalent to the classical cubic smoothing spline penalty originally proposed by Reinsch.
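This closed form is easy to verify numerically. The standalone NumPy sketch below applies the monomial derivative rule and the factorized integral to the cubic basis and reproduces the matrix above:

```python
import numpy as np

def cubic_penalty(a, b):
    # Lambda_ij = int_a^b phi_i'' phi_j'' dt for the basis (1, t, t^2, t^3):
    # (t^e)'' = e (e - 1) t^(e - 2), and the monomial integral is
    # int_a^b t^m dt = (b^(m+1) - a^(m+1)) / (m + 1).
    L = np.zeros((4, 4))
    for ei in range(2, 4):          # second derivative vanishes for e < 2
        for ej in range(2, 4):
            coef = ei * (ei - 1) * ej * (ej - 1)
            m = (ei - 2) + (ej - 2)
            L[ei, ej] = coef * (b ** (m + 1) - a ** (m + 1)) / (m + 1)
    return L

a, b = 0.2, 1.7
L = cubic_penalty(a, b)   # matches 4(b-a), 6(b^2-a^2), 12(b^3-a^3) entries
```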
Handling of non-spline predictors.
Predictors specified via just_linear_without_interactions or
just_linear_with_interactions do not receive higher-order
polynomial expansions in the design matrix. To ensure their curvature
contributions are still correctly computed (particularly through
interaction terms), the implementation temporarily appends phantom
higher-order columns (with zero data) for these predictors, computes
the full curvature penalty on the augmented basis, and then subsets the
result back to the original p \times p dimensions. This ensures
that interaction terms involving non-spline predictors receive appropriate
penalty contributions without affecting the rest of the estimation
pipeline.
Parallel computation.
Because the total penalty is an additive sum over predictors
(\boldsymbol{\Lambda}_s = \sum_{v=1}^{q}\boldsymbol{\Lambda}_{s,v}),
the computation can be parallelized by distributing the per-predictor
curvature matrices across workers via parallel::parLapply and
summing the results. This is controlled by the parallel_penalty
argument and is beneficial when q is large.
The penalty is computed by get_2ndDerivPenalty (single
predictor or subset) and get_2ndDerivPenalty_wrapper
(full assembly with optional parallelism and non-spline handling).
Because the smoothing penalty has zero eigenvalues for the intercept and
linear terms (whose second derivatives vanish), an optional ridge penalty on
lower-order terms is added for computational stability. The full penalty
block for partition k is:
\boldsymbol{\Lambda}_k = \lambda_w\bigl(\boldsymbol{\Lambda}_s + \lambda_r\boldsymbol{\Lambda}_r + \sum_{m=1}^{M}\xi_{mk}\mathbf{L}_{mk}\bigr)
where \lambda_w is the global wiggle penalty (wiggle_penalty),
\lambda_r is the ridge penalty on linear and intercept terms
(flat_ridge_penalty), which is multiplied by the wiggle penalty, and
\xi_{mk} and \mathbf{L}_{mk} denote optional additional
penalty multipliers and matrices, including the predictor- and
partition-specific components activated through
unique_penalty_per_predictor, unique_penalty_per_partition,
predictor_penalties, and partition_penalties. This assembly is handled by
compute_Lambda.
The penalty matrix \boldsymbol{\Lambda} is stored as a list
of K+1 p \times p square, symmetric, positive semi-definite
matrices.
Penalty Optimization via Generalized Cross-Validation
After the structural pieces of the model are fixed, the main remaining
question is how much smoothing to apply. In lgspline, that tuning is
performed with generalized cross-validation, but it is carried out using the
same constrained estimator that will be used in the final model fit.
Penalty parameters are estimated on the log scale via exponential
parameterization (\lambda = \exp(\theta),
\theta \in \mathbb{R}), ensuring positivity. The chain rule factor
\partial\exp(\theta)/\partial\theta = \exp(\theta) = \lambda is
applied throughout. User-facing arguments (initial_wiggle,
initial_flat, predictor_penalties,
partition_penalties) accept values on the raw, natural scale;
conversion to log scale is handled internally. The final tuned values and
assembled penalty pieces are returned in the fitted object's
penalties component.
The total penalty matrix \boldsymbol{\Lambda} is constructed as:
\boldsymbol{\Lambda} = \lambda_w\mathbf{L}_{w} + \lambda_r\mathbf{L}_{r}
+ \sum_{j}\nu_j\mathbf{L}_j^{(\mathrm{pred})}
+ \sum_{k}\tau_k\mathbf{L}_k^{(\mathrm{part})},
where \mathbf{L}_{w} is the integrated squared second-derivative
penalty (i.e., \boldsymbol{\Lambda}_s above), \mathbf{L}_{r} is
a ridge penalty on intercept and linear coefficients,
\mathbf{L}_j^{(\mathrm{pred})} are predictor-specific penalties, and
\mathbf{L}_k^{(\mathrm{part})} are partition-specific penalties. The
scalars \lambda_w (wiggle_penalty), \lambda_r
(flat_ridge_penalty), \{\nu_j\}
(predictor_penalties), and \{\tau_k\}
(partition_penalties) are tuned.
The unbiased generalized cross-validation criterion is
\mathrm{GCV}_{u} = \frac{\sum_{i=1}^{N}D_{ii}\,r_i^{2}}{N(1 - \bar{W})^{2}},
where r_i are residuals on the link scale and
\bar{W} = \mathrm{tr}(\mathbf{H})/N is the mean of the hat-matrix
diagonal. For identity link, r_i = y_i - \hat{\eta}_i. For
non-identity links, the residuals are
r_i = g((y_i + \delta)/(1+2\delta)) - (\hat{\eta}_i + \delta)/(1+2\delta),
where \delta \geq 0 is a pseudocount that stabilizes the link
transformation, automatically tuned within tune_Lambda if not supplied
to delta.
Non-Gaussian families with observation weights
\omega_i have their residuals scaled by \omega_i. When the
family provides a custom deviance residual function, that function is used
in place of the link-scale residuals.
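A stripped-down analogue of the criterion for a single Gaussian identity-link block (standalone NumPy; unit weights, no constraints, gcv_u a hypothetical name) illustrates the computation:

```python
import numpy as np

def gcv_u(X, y, lam):
    # GCV_u = sum(r_i^2) / (N * (1 - tr(H)/N)^2) for a ridge-penalized
    # least-squares block with identity link and unit weights.
    N, p = X.shape
    G = np.linalg.inv(X.T @ X + lam * np.eye(p))
    H = X @ G @ X.T                 # hat matrix
    r = y - H @ y                   # link-scale residuals
    W_bar = np.trace(H) / N         # mean hat-matrix diagonal
    return float(r @ r) / (N * (1.0 - W_bar) ** 2)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
```

For data with genuine signal, extreme over-penalization inflates the numerator faster than the denominator shrinks, so the criterion favors a finite amount of smoothing.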
Pseudocount selection. The pseudocount \delta is chosen to
make the transformed response distribution most closely approximate a
t-distribution with N-1 degrees of freedom, in the sense of
minimizing the (optionally weighted) mean absolute deviation between the
sorted standardized transformed responses and the corresponding
t-quantiles. This is solved via Brent's method over
[10^{-64}, 1]. When the link is identity, or when the response
naturally lies in the domain of the link function, \delta = 0.
This behavior is exposed through the delta argument in
lgspline: supplying a fixed numeric value bypasses the internal
search, while leaving it NULL allows the tuning code to choose the
stabilizing pseudocount automatically when needed.
Meta-penalty regularization. A regularization term pulls the predictor- and partition-specific penalty parameters toward 1 on the raw scale:
P_{\mathrm{meta}}(\lambda_w, \nu_j, \tau_k)
= \frac{1}{2}c_{\mathrm{meta}}\sum_j(\nu_j - 1)^{2}
+ \frac{1}{2}\cdot 10^{-32}(\lambda_w - 1)^{2},
where c_{\mathrm{meta}} is a user-specified coefficient
(meta_penalty). The gradient of P_{\mathrm{meta}} on the log
scale, incorporating the exp chain rule, is
\partial P_{\mathrm{meta}}/\partial\theta_j = c_{\mathrm{meta}}(\nu_j - 1)\nu_j
and
\partial P_{\mathrm{meta}}/\partial\theta_1 = 10^{-32}(\lambda_w - 1)\lambda_w.
The total objective is \mathrm{GCV}_{u} + P_{\mathrm{meta}}.
Closed-Form Gradient of GCV
The gradient of \mathrm{GCV}_{u} with respect to
\theta_1 = \log\lambda_w is computed analytically via the quotient
rule:
\frac{\partial\mathrm{GCV}_{u}}{\partial\theta_1}
= \frac{1}{D^{2}}\left(\frac{\partial\mathcal{N}}{\partial\theta_1}D
- \mathcal{N}\frac{\partial D}{\partial\theta_1}\right),
where \mathcal{N} = \sum r_i^{2} (numerator) and
D = N(1 - \bar{W})^{2} (denominator). The key intermediates are:
- \partial\mathbf{G}/\partial\lambda_w, computed from the matrix identity
\partial(\mathbf{X}^{\top}\mathbf{X} + \boldsymbol{\Lambda})^{-1}/\partial\lambda = -\mathbf{G}(\partial\boldsymbol{\Lambda}/\partial\lambda)\mathbf{G}.
- \partial\mathbf{G}^{1/2}/\partial\lambda_w, derived from
\partial\mathbf{G}/\partial\lambda_w via the eigendecomposition chain rule.
- \partial\bar{W}/\partial\lambda_w, the derivative of the trace of the hat
matrix \mathbf{H} = \mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^{\top}, which
depends on both \partial\mathbf{G}/\partial\lambda_w and
\partial\mathbf{G}^{1/2}/\partial\lambda_w.
- \partial\mathcal{N}/\partial\theta_1 = -2\mathbf{r}^{\top}\mathbf{X}(\partial(\mathbf{U}\mathbf{G})/\partial\lambda_w)\mathbf{X}^{\top}\mathbf{y}\cdot\lambda_w,
via the chain rule applied to the residual vector.
- \partial D/\partial\theta_1 = 2(1 - \bar{W})(-\partial\bar{W}/\partial\lambda_w)\cdot\lambda_w.
In the implementation, these quantities are assembled by a small set of
helper routines: compute_dG_dlambda for
\partial\mathbf{G}/\partial\lambda, compute_dGhalf
for \partial\mathbf{G}^{1/2}/\partial\lambda,
compute_dW_dlambda_wrapper for derivatives of the effective
degrees-of-freedom term, compute_trace_UGXX_wrapper for the
trace pieces entering GCV, and compute_dG_u_dlambda_xy for
the derivative of the fitted-value quadratic form.
The full gradient is scaled by N before adding the meta-penalty
gradient.
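The matrix identity underlying the first intermediate can be checked numerically; a standalone NumPy sketch with \Lambda = \lambda\mathbf{I} compares the analytic derivative to a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 5))
dLam = np.eye(5)                  # dLambda/dlambda when Lambda = lam * I

def G(lam):
    return np.linalg.inv(X.T @ X + lam * np.eye(5))

lam, eps = 0.7, 1e-6
dG_analytic = -G(lam) @ dLam @ G(lam)          # -G (dLambda/dlambda) G
dG_fd = (G(lam + eps) - G(lam - eps)) / (2 * eps)
```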
For the ridge penalty and predictor-/partition-specific penalties, a trace-ratio heuristic is used:
\frac{\partial\mathrm{GCV}_{u}}{\partial\lambda_l} \approx
\frac{\mathrm{mean}(\mathrm{diag}(\mathbf{L}_l))}{\mathrm{mean}(\mathrm{diag}(\boldsymbol{\Lambda}))}
\frac{\partial\mathrm{GCV}_{u}}{\partial\lambda_w},
and analogously for predictor- and partition-specific penalties, where
\mathbf{L}_j^{(\mathrm{pred})} or \mathbf{L}_k^{(\mathrm{part})}
replaces \mathbf{L}_l in the numerator. This follows from a
chain-rule argument: by the Leibniz rule and the inverse derivative,
\partial\lambda_w/\partial\lambda_l = (\partial\boldsymbol{\Lambda}/\partial\lambda_l)(\partial\boldsymbol{\Lambda}/\partial\lambda_w)^{-1}.
Since the derivative appears as a matrix rather than a scalar, the
mean-diagonal ratio provides a scalar summary. Once the derivative for
\lambda_w is in hand, the derivatives of other penalties are cheap
to compute. The exp chain rule is then applied:
\partial/\partial\theta = (\partial/\partial\lambda)\cdot\lambda.
Optimization Procedure
Grid search initialization. The \mathrm{GCV}_{u} criterion
is evaluated over a grid of candidate values for
(\lambda_w, \lambda_r) on the log scale. All combinations of
user-supplied candidate vectors (initial_wiggle and
initial_flat) are formed, and the combination yielding
the smallest finite \mathrm{GCV}_{u} is selected as the starting
point for BFGS optimization. Grid points producing non-finite
\mathrm{GCV}_{u} are discarded. If all grid points fail, an error
is raised advising the user to check the data or adjust the grid.
Damped BFGS optimizer. A custom damped BFGS quasi-Newton optimizer,
implemented in efficient_bfgs, minimizes
\mathrm{GCV}_{u} + P_{\mathrm{meta}}. When analytic gradients are not
usable, the fallback finite-difference helper is approx_grad.
Iterations 1-2: steepest descent. The first two iterations use
steepest descent with a damping factor \alpha:
\boldsymbol{\phi}^{(t+1)} = \boldsymbol{\phi}^{(t)} - \alpha\nabla_{\boldsymbol{\phi}}.
Iterations 3+: BFGS. From iteration 3, an inverse Hessian
approximation \mathbf{J}^{(t)} is maintained via the standard secant
update. Let
\mathbf{s}^{(t)} = \boldsymbol{\phi}^{(t)} - \boldsymbol{\phi}^{(t-1)}
and
\mathbf{v}^{(t)} = \nabla^{(t)} - \nabla^{(t-1)}. The BFGS update
is:
\mathbf{J}^{(t+1)}
= (\mathbf{I} - \mathbf{u}\mathbf{s}\mathbf{v}^{\top})
\mathbf{J}^{(t)}
(\mathbf{I} - \mathbf{u}\mathbf{v}\mathbf{s}^{\top})
+ \mathbf{u}\mathbf{s}\mathbf{s}^{\top},
\qquad \mathbf{u} = (\mathbf{v}^{\top}\mathbf{s})^{-1}.
When |\mathbf{v}^{\top}\mathbf{s}| < 10^{-64}, the approximation is
reset to \mathbf{I} and the iteration is flagged for restart. The
search direction is
\mathbf{d}^{(t)} = -\mathbf{J}^{(t)}\nabla^{(t)}.
Step acceptance. A step is accepted if
\mathrm{GCV}_{u}^{(\mathrm{new})} \leq \mathrm{GCV}_{u}^{(\mathrm{old})}.
On rejection, \alpha is halved. If \alpha < 2^{-10} (early
iterations) or \alpha < 2^{-12} (later iterations), the optimizer
terminates with the best solution found.
Convergence. The optimizer terminates when
|\mathrm{GCV}_{u}^{(t)} - \mathrm{GCV}_{u}^{(t-1)}| < \epsilon or
\|\boldsymbol{\phi}^{(t)} - \boldsymbol{\phi}^{(t-1)}\|_{\infty} < \epsilon,
provided at least 10 iterations have elapsed, for the penalty vector
\boldsymbol{\phi} = (\lambda_w, \lambda_r, \nu_j, \tau_k, \ldots).
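The overall scheme can be sketched on a toy quadratic objective (standalone NumPy; a simplification of efficient_bfgs — for instance, the damping factor here is reset each iteration rather than carried over):

```python
import numpy as np

def damped_bfgs(f, grad, phi, maxit=200):
    # First two iterations: damped steepest descent. Afterwards: BFGS with
    # J <- (I - u s v^T) J (I - u v s^T) + u s s^T, u = 1/(v^T s).
    # A step is accepted only if f does not increase; otherwise alpha is
    # halved down to 2^-12.
    n = len(phi)
    J, g = np.eye(n), grad(phi)
    for it in range(maxit):
        d = -g if it < 2 else -J @ g
        alpha, accepted = 1.0, False
        while alpha > 2.0 ** -12:
            phi_new = phi + alpha * d
            if f(phi_new) <= f(phi):
                accepted = True
                break
            alpha *= 0.5
        if not accepted:
            break
        g_new = grad(phi_new)
        s, v = phi_new - phi, g_new - g
        if abs(v @ s) > 1e-64:
            u = 1.0 / (v @ s)
            I = np.eye(n)
            J = (I - u * np.outer(s, v)) @ J @ (I - u * np.outer(v, s)) \
                + u * np.outer(s, s)
        else:
            J = np.eye(n)          # reset on a degenerate secant pair
        phi, g = phi_new, g_new
        if np.linalg.norm(g) < 1e-12:
            break
    return phi

Q = np.diag([2.0, 10.0])
f = lambda p: 0.5 * float(p @ Q @ p)
grad = lambda p: Q @ p
phi_opt = damped_bfgs(f, grad, np.array([1.0, 1.0]))
```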
Alternative. A base-R stats::optim call with method
"BFGS" and finite-difference gradients can be used instead by
setting use_custom_bfgs = FALSE.
Post-optimization inflation. After optimization, the penalty
parameters are inflated by a factor ((N+2)/(N-2))^{2} to counteract
the in-sample bias toward underpenalization inherent in GCV-type criteria.
The tuning loop is implemented in tune_Lambda.
Incorporating Non-Spline Effects
Multiple fixed effects are accommodated naturally in the LMSS framework because spline effects, linear effects, and many interaction terms all live in the same partition-wise polynomial expansion. The distinction is therefore not whether a term is "allowed" by the solver, but whether it receives full spline treatment or remains structurally linear across partitions.
The constrained framework naturally accommodates non-spline terms. If only
linear terms are included for a predictor (via
just_linear_without_interactions or
just_linear_with_interactions), the first-derivative smoothing
constraint forces the linear coefficient to be identical across all
partitions, since the derivative of a linear function is its slope. This
is not an algorithmic modification but a natural consequence of the
constraint structure.
For example, a model with one spline effect and a linear treatment indicator interaction will naturally keep the treatment-time interaction coefficient constant across partitions while allowing the time effect to vary nonlinearly. This conveniently extends to arbitrary combinations of spline and linear terms without requiring special handling.
When blockfit = TRUE is specified alongside
just_linear_without_interactions, the flat-block path provides an
alternative enforcement mechanism. Rather than relying on constraint
projection, flat coefficients are pooled structurally across partitions
during backfitting. The two approaches agree at the point estimate but
differ in their uncertainty quantification; see the Blockfit section above.
Integration
Because the fitted object retains an explicit polynomial representation in each partition, numerical integration can be carried out in a fairly direct way. The package wraps that calculation in a user-facing S3 method so the user does not need to manage knot boundaries or partition membership by hand.
In the user-facing interface, numerical integration is exposed through
integrate.lgspline, which applies Gauss-Legendre quadrature to
predictions from the fitted model produced by predict.lgspline.
Implementation
For a user-supplied rectangular domain, integrate.lgspline constructs a
tensor-product grid of Gauss-Legendre nodes, evaluates the fitted model at
those points, and forms the weighted sum. This works for both univariate and
multivariate models, respects the fitted partition structure automatically,
and avoids requiring the user to keep track of knot boundaries by hand.
The vars argument selects which predictors are integrated over.
Predictors not listed in vars are held fixed at
initial_values when supplied, or otherwise at the midpoint of their
observed training range. The optional B_predict argument makes it
possible to integrate posterior draws or other alternate coefficient sets,
and n_quad controls the number of Gauss-Legendre nodes used per
integrated dimension.
Integration is performed on the response scale by default. Setting
link_scale = TRUE instead integrates the linear predictor
\eta = f(\mathbf{t}). For identity-link Gaussian models the two scales
coincide.
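The tensor-product construction can be sketched as follows (standalone NumPy using its built-in Gauss-Legendre nodes; integrate_rect is a hypothetical name standing in for the model-evaluation step):

```python
import numpy as np

def integrate_rect(f, lower, upper, n_quad=20):
    # Tensor-product Gauss-Legendre quadrature of f over the rectangle
    # [lower_1, upper_1] x ... x [lower_q, upper_q]. f takes an (m, q)
    # array of points and returns m values.
    x, w = np.polynomial.legendre.leggauss(n_quad)   # nodes on [-1, 1]
    axes, wts = [], []
    for a, b in zip(lower, upper):
        axes.append(0.5 * (b - a) * x + 0.5 * (b + a))  # map to [a, b]
        wts.append(0.5 * (b - a) * w)
    grids = np.meshgrid(*axes, indexing="ij")
    W = np.ones_like(grids[0])
    for k, wk in enumerate(wts):
        shape = [1] * len(wts)
        shape[k] = -1
        W = W * wk.reshape(shape)        # outer product of 1D weights
    pts = np.stack([g.ravel() for g in grids], axis=1)
    return float(W.ravel() @ f(pts))

# int over [0,1]^2 of (t1^2 + t2) = 1/3 + 1/2 = 5/6
val = integrate_rect(lambda P: P[:, 0] ** 2 + P[:, 1],
                     lower=(0.0, 0.0), upper=(1.0, 1.0))
```

In the package, f would be the prediction function of the fitted model; the quadrature itself is agnostic to the partition structure because prediction already handles partition membership.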
Lagrange Multipliers
When return_lagrange_multipliers = TRUE, the multiplier vector
\boldsymbol{\lambda} = (\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{\top}\hat{\boldsymbol{\beta}}
is returned. These quantify the sensitivity of the penalized objective to
relaxing each smoothness or user-supplied equality constraint. When
constraint target values are nonzero
(\mathbf{A}^{\top}\boldsymbol{\beta}_0 \neq \mathbf{0}), the modified
formulation is used:
\boldsymbol{\lambda} = (\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{\top}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)
where \mathbf{A}^{\top}\boldsymbol{\beta}_0 is the vector of
constraint target values. Multipliers are NULL when no constraints
are active (\mathbf{A} is NULL or K = 0).
For inequality constraints, multipliers are returned as computed by
solve.QP. The Lagrange multipliers for active
inequality constraints can be used diagnostically to identify which shape
constraints are most costly in terms of goodness of fit.
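The equality-constraint formula can be illustrated directly; the standalone NumPy sketch below computes the multipliers for a single penalized block with zero target values and confirms that subtracting \mathbf{G}\mathbf{A}\boldsymbol{\lambda} enforces the constraints:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
Lam = 0.1 * np.eye(4)
A = rng.normal(size=(4, 2))      # two equality constraints: A^T beta = 0

G = np.linalg.inv(X.T @ X + Lam)
beta_hat = G @ X.T @ y                                   # unconstrained fit
lam_mult = np.linalg.solve(A.T @ G @ A, A.T @ beta_hat)  # multipliers
beta_con = beta_hat - G @ A @ lam_mult                   # constrained fit
```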
S3 Methods
Standard S3 methods are provided for objects of class lgspline:
- print.lgspline and summary.lgspline: Provide concise model summaries, with
print.summary.lgspline formatting coefficient tables in a familiar
regression-style layout.
- logLik.lgspline: Returns a standard logLik object. For Gaussian responses
with identity link, the exact log-likelihood is computed. When a correlation
structure is present via VhalfInv, the log-likelihood includes the
\log|\mathbf{V}^{-1/2}| adjustment and the corresponding whitened quadratic
form. For other families, the method falls back to family$aic or a
deviance-based approximation. An include_prior argument (default TRUE)
optionally adds the Gaussian prior interpretation of the smoothing spline
penalty,
-\frac{1}{2\sigma^{2}}\tilde{\boldsymbol{\beta}}^{\top}\boldsymbol{\Lambda}\tilde{\boldsymbol{\beta}},
to obtain a penalized MAP log-likelihood.
- predict.lgspline: Produces fitted values and related quantities (e.g.,
derivatives and standard errors through se.fit = TRUE), lets new_predictors
override newdata, accepts alternate coefficient lists through B_predict, and
supports prediction on new predictor matrices consistent with the original
spline expansions.
- coef.lgspline: Extracts partition-specific coefficient vectors.
- confint.lgspline: Extracts confidence intervals. When the inverse Hessian
from BFGS optimization is available for correlation parameters, intervals
for those correlation parameters are returned on the working (transformed)
scale and should be back-transformed as described in the correlation
section.
- plot.lgspline: For one-dimensional fits, produces base R graphics showing
the fitted function (with optional partition-wise formulas) and supports
overlay via add = TRUE. For two or more predictors, an interactive
plotly-based visualization is returned. Specific predictors may be selected
via vars.
- integrate.lgspline: Computes definite integrals of the fitted surface over
rectangular domains by Gauss-Legendre quadrature.
Additional user-facing helpers include wald_univariate for
coefficient-wise Wald inference, generate_posterior for
posterior and posterior-predictive sampling,
generate_posterior_correlation for correlation-aware posterior
simulation, equation for closed-form display of the fitted
partition formulas, and
find_extremum for optimizing the fitted surface or a custom
acquisition function built from it.
References
Buse, A. and Lim, L. (1977). Cubic Splines as a Special Case of Restricted Least Squares. Journal of the American Statistical Association, 72, 64-68.
Eilers, P. H. and Marx, B. D. (1996). Flexible Smoothing with B-splines and Penalties. Statistical Science, 11(2), 89-121.
Ezhov, N., Neitzel, F. and Petrovic, S. (2018). Spline Approximation, Part 1: Basic Methodology. Journal of Applied Geodesy, 12(2), 139-155.
Goldfarb, D. and Idnani, A. (1983). A Numerically Stable Dual Method for Solving Strictly Convex Quadratic Programs. Mathematical Programming, 27(1), 1-33.
Harville, D. A. (1977). Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems. Journal of the American Statistical Association, 72(358), 320-338.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman & Hall/CRC.
Kisi, O., Heddam, S., Parmar, K. S., Petroselli, A., Kulls, C. and Zounemat-Kermani, M. (2025). Integration of Gaussian Process Regression and K Means Clustering for Enhanced Short Term Rainfall Runoff Modeling. Scientific Reports, 15, 7444.
MacQueen, J. B. (1967). Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, 281-297. University of California Press.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman & Hall, 2nd edition.
Murray, I., Adams, R. P. and MacKay, D. J. C. (2010). Elliptical Slice Sampling. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 9, 541-548.
Nocedal, J. and Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer.
Patterson, H. D. and Thompson, R. (1971). Recovery of Inter-Block Information When Block Sizes Are Unequal. Biometrika, 58, 545-554.
Pya, N. and Wood, S. N. (2015). Shape Constrained Additive Models. Statistics and Computing, 25(3), 543-559.
Reinsch, C. H. (1967). Smoothing by Spline Functions. Numerische Mathematik, 10, 177-183.
Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press.
Searle, S. R., Casella, G. and McCulloch, C. E. (2006). Variance Components. Wiley.
Wahba, G. (1990). Spline Models for Observational Data. SIAM.
Wood, S. N. (2006). On Confidence Intervals for Generalized Additive Models Based on Penalized Regression Splines. Australian & New Zealand Journal of Statistics, 48(4), 445-464.
Wood, S. N. (2011). Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models. Journal of the Royal Statistical Society: Series B, 73(1), 3-36.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R. CRC Press, 2nd edition.
Efficient Matrix Multiplication of G and A Matrices
Description
Efficient Matrix Multiplication of G and A Matrices
Usage
GAmult_wrapper(
G,
A,
K,
p_expansions,
R_constraints,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
G |
List of G matrices |
A |
Constraint matrix |
K |
Number of partitions minus 1 |
p_expansions |
Number of columns per partition |
R_constraints |
Number of constraint columns |
parallel |
Use parallel processing |
cl |
Cluster object |
chunk_size |
Size of parallel chunks |
num_chunks |
Number of chunks |
rem_chunks |
Remaining chunks |
Details
Computes the product of each partition's \textbf{G} matrix with the corresponding block of the constraint matrix \textbf{A}. Processes in parallel chunks if enabled.
Value
List of matrix products, one per partition
Finite-difference Gradient Computer
Description
Computes a finite-difference approximation derived from the objective in
fn at x, returned in the sign convention currently used by
efficient_bfgs.
Usage
approx_grad(x, fn, eps = sqrt(.Machine$double.eps))
Arguments
x |
Numeric vector of function arguments |
fn |
Function returning list(objective, gradient) |
eps |
Numeric scalar, finite difference tolerance |
Details
Used within efficient_bfgs when fn does not supply a usable
gradient. In the main lgspline() correlation-optimization path,
stats::optim() is used instead when no analytic REML gradient is
available. Internally this helper returns the negated central-difference
approximation rather than the raw derivative.
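For concreteness, a central-difference gradient in the same spirit can be sketched as follows (illustrative names only; unlike the internal helper, this returns the raw, un-negated derivative):

```r
## Sketch: central-difference gradient of fn(x)$objective, one coordinate
## at a time, with a step scaled to the magnitude of each coordinate.
approx_grad_sketch <- function(x, fn, eps = sqrt(.Machine$double.eps)) {
  vapply(seq_along(x), function(j) {
    h <- eps * max(1, abs(x[j]))
    xp <- x; xm <- x
    xp[j] <- xp[j] + h
    xm[j] <- xm[j] - h
    (fn(xp)$objective - fn(xm)$objective) / (2 * h)
  }, numeric(1))
}

## For f(x) = sum(x^2), the gradient is 2x
fn <- function(x) list(objective = sum(x^2), gradient = NULL)
approx_grad_sketch(c(1, -2, 3), fn)
```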
Value
Numeric vector of finite-difference approximated gradient values in the current optimizer sign convention
Matrix Inversion using Armadillo
Description
Computes the inverse of a matrix using Armadillo's inversion method
Usage
armaInv(x)
Arguments
x |
Input matrix to be inverted |
Value
Inverted matrix
Backfitting Solver for Blockfit Models
Description
Fits models with mixed spline and non-interactive linear ("flat") terms
using an iterative backfitting approach. Flat terms receive a single
shared coefficient across partitions (rather than K+1
partition-specific coefficients constrained to equality), reducing the
effective parameter count and improving efficiency when the number of
flat terms is large relative to spline terms.
The backfitting loop alternates between:
- Spline step: Fit spline terms on the response adjusted for the current flat contribution, using a constrained penalized least squares solve. If \mathbf{y}_k is the response vector for partition k, \mathbf{Z}_k the spline design matrix, \mathbf{X}_{\mathrm{flat}}^{(k)} the flat design matrix, and \mathbf{v} the flat coefficient vector, then
\boldsymbol{\beta}_{\mathrm{spline}}^{(k)} = \arg\min_{\boldsymbol{\beta}} \left\| \mathbf{y}_k - \mathbf{Z}_k \boldsymbol{\beta} - \mathbf{X}_{\mathrm{flat}}^{(k)} \mathbf{v} \right\|^2 + \boldsymbol{\beta}^{\top} \mathbf{\Lambda}_{\mathrm{spline}} \boldsymbol{\beta}
subject to \mathbf{A}_{\mathrm{spline}}^{\top} \boldsymbol{\beta} = \mathbf{c}_{\mathrm{spline}}.
- Flat step: Update flat coefficients via pooled penalized regression on residuals, subject to the residual equality constraint from any mixed constraints:
\mathbf{v} = \arg\min_{\mathbf{v}} \sum_{k=0}^{K} \left\| \mathbf{y}_k - \mathbf{Z}_k \boldsymbol{\beta}_{\mathrm{spline}}^{(k)} - \mathbf{X}_{\mathrm{flat}}^{(k)} \mathbf{v} \right\|^2 + \mathbf{v}^{\top} \mathbf{\Lambda}_{\mathrm{flat}} \mathbf{v}
subject to \tilde{\mathbf{A}}_{\mathrm{flat}}^{\top} \mathbf{v} = \mathbf{c} - \mathbf{A}_{\mathrm{spline}}^{\top} \boldsymbol{\beta}_{\mathrm{spline}}.
Four estimation paths are selected automatically based on the model configuration:
- Case (a): Gaussian identity + GEE: closed-form solve via .bf_case_gauss_gee.
- Case (b): Gaussian identity, no correlation: standard backfitting via .bf_case_gauss_no_corr.
- Case (c): GLM + GEE: two-stage (damped Newton-Raphson warm start, then damped SQP) via .bf_case_glm_gee.
- Case (d): GLM without GEE: damped Newton-Raphson + backfitting via .bf_case_glm_no_corr.
When quadprog = TRUE without GEE, inequality constraints are
enforced after backfitting convergence. The constraint-handling
method is selected automatically by inspecting the sparsity pattern
of qp_Amat: if every column has nonzeros in only one
partition block (e.g., derivative sign or range constraints),
a partition-wise active-set method is used via
.active_set_refine, avoiding the dense P \times P
system. If any column spans multiple partition blocks (e.g.,
cross-knot monotonicity), or if the active-set method does not
converge, the dense SQP fallback via .bf_sqp_loop is used.
GEE paths always use the dense system.
After convergence, coefficients are reassembled into the standard per-partition format (flat coefficients replicated across partitions) for compatibility with downstream inference.
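The alternating scheme can be made concrete with a toy version of case (b): Gaussian identity, one partition, no penalty or constraints. The sketch below is not the package's code; it shows that backfitting a "spline" block against a single flat column converges to the joint least-squares fit:

```r
## Toy backfitting: alternate spline and flat steps until the flat
## coefficient stabilizes, then compare with joint OLS on [Z, Xf].
set.seed(42)
n  <- 200
Z  <- cbind(1, rnorm(n))           # "spline" columns (one partition)
Xf <- matrix(rnorm(n), n, 1)       # flat column with a shared coefficient
y  <- Z %*% c(1, 2) + 0.5 * Xf[, 1] + rnorm(n, 0, 0.1)

v <- 0                             # flat coefficient, initialized at zero
for (iter in 1:200) {
  b     <- qr.coef(qr(Z),  y - Xf %*% v)   # spline step
  v_new <- qr.coef(qr(Xf), y - Z  %*% b)   # flat step
  if (abs(v_new - v) < 1e-12) { v <- v_new; break }
  v <- v_new
}

joint <- qr.coef(qr(cbind(Z, Xf)), y)      # joint least-squares reference
max(abs(c(b, v) - joint))                  # ~ 0
```

Because this is a two-block Gauss-Seidel scheme on a least-squares objective, convergence is geometric at a rate governed by the canonical correlation between the two column spaces.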
Usage
blockfit_solve(
X,
y,
flat_cols,
K,
p_expansions,
Lambda,
L_partition_list,
unique_penalty_per_partition,
A,
R_constraints,
constraint_values,
X_gram,
Ghalf_full,
GhalfInv_full,
family,
order_list,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
homogenous_weights = TRUE,
iterate,
tol,
parallel_eigen,
cl,
chunk_size,
num_chunks,
rem_chunks,
return_G_getB,
quadprog = FALSE,
qp_Amat = NULL,
qp_bvec = NULL,
qp_meq = NULL,
qp_score_function = NULL,
keep_weighted_Lambda = FALSE,
max_backfit_iter = 100,
Vhalf = NULL,
VhalfInv = NULL,
include_warnings = TRUE,
verbose = FALSE,
...
)
Arguments
X |
List of |
y |
List of |
flat_cols |
Integer vector indicating flat columns of
|
K |
Integer; number of interior knots. |
p_expansions |
Integer; number of coefficients per partition. |
Lambda |
|
L_partition_list |
List of partition-specific penalty matrices. |
unique_penalty_per_partition |
Logical. |
A |
Full |
R_constraints |
Integer; number of columns of |
constraint_values |
List of constraint right-hand sides. |
X_gram |
List of Gram matrices
|
Ghalf_full, GhalfInv_full |
Lists of
|
family |
GLM family object. |
order_list |
List of observation index vectors by partition. |
glm_weight_function |
GLM weight function. |
schur_correction_function |
Schur complement correction function. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Dispersion estimation function. |
observation_weights |
List of observation weights. |
homogenous_weights |
Logical. |
iterate |
Logical; if FALSE, single pass (no iteration). |
tol |
Convergence tolerance. |
parallel_eigen, cl, chunk_size, num_chunks, rem_chunks |
Parallel arguments. |
return_G_getB |
Logical. |
quadprog |
Logical; apply inequality constraint refinement if TRUE. |
qp_Amat |
Inequality constraint matrix for
|
qp_bvec |
Inequality constraint right-hand side. |
qp_meq |
Number of leading equality constraints. |
qp_score_function |
Score function for QP subproblem. |
keep_weighted_Lambda |
Logical. |
max_backfit_iter |
Integer. |
Vhalf |
Square root of the working correlation matrix in the original observation ordering. |
VhalfInv |
Inverse square root of the working correlation matrix
in the original observation ordering. When both are non- |
include_warnings |
Logical. |
verbose |
Logical. |
... |
Additional arguments passed to weight and dispersion functions. |
Value
A list with elements:
- B: List of K+1 coefficient vectors (p_expansions \times 1), with flat coefficients replicated across partitions.
- G_list: List containing \mathbf{G}, \mathbf{G}^{1/2}, and \mathbf{G}^{-1/2}, each a list of K+1 p_expansions \times p_expansions matrices, or NULL if return_G_getB is FALSE. \mathbf{G} satisfies \mathbf{G} = \mathbf{G}^{1/2} (\mathbf{G}^{1/2})^{\top} exactly. When a correlation structure is present, these matrices are obtained from the dense whitened information matrix and then split back into per-partition blocks. When flat columns are present, the returned matrices also reflect the pooled flat-column information across partitions.
- qp_info: NULL when no inequality-refinement metadata are produced. Otherwise a list of available QP or active-set diagnostics. Dense SQP solves include solution, lagrangian, active_constraints, iact, Amat_active, bvec_active, meq_active, converged, and final_deviance; non-GEE dense solves also carry info_matrix, Amat_combined, bvec_combined, and meq_combined. Partition-wise active-set refinement returns only the corresponding active-constraint summary.
See Also
Examples
## Not run:
## Minimal verification example: Gaussian identity, no correlation,
## 2 partitions, 3 spline columns + 1 flat column per partition.
##
## This confirms that blockfit_solve returns compatible output and
## that flat coefficients are replicated across partitions.
set.seed(1234)
n1 <- 50; n2 <- 50
p_expansions <- 4 # intercept + x + x^2 + z (flat)
K <- 1 # 2 partitions
X1 <- cbind(1, rnorm(n1), rnorm(n1)^2, rnorm(n1))
X2 <- cbind(1, rnorm(n2), rnorm(n2)^2, rnorm(n2))
beta_true <- c(1, 0.5, -0.3, 0.8)
y1 <- X1 %*% beta_true + rnorm(n1, 0, 0.5)
y2 <- X2 %*% beta_true + rnorm(n2, 0, 0.5)
## Constraint: spline coefficients equal across partitions
## (columns 1:3 constrained, column 4 is flat)
A <- matrix(0, nrow = p_expansions * (K + 1), ncol = 3)
for(j in 1:3){
A[j, j] <- 1
A[p_expansions + j, j] <- -1
}
qr_A <- qr(A)
A <- qr.Q(qr_A)[, 1:qr_A$rank, drop = FALSE]
Lambda <- diag(p_expansions) * 1e-4
L_partition_list <- list(0, 0)
X_gram <- list(crossprod(X1), crossprod(X2))
G_list <- compute_G_eigen(
X_gram, Lambda, K,
parallel = FALSE, cl = NULL,
chunk_size = NULL, num_chunks = NULL, rem_chunks = NULL,
family = gaussian(),
unique_penalty_per_partition = FALSE,
L_partition_list = L_partition_list,
keep_G = TRUE,
schur_corrections = list(0, 0))
result <- blockfit_solve(
X = list(X1, X2),
y = list(y1, y2),
flat_cols = 4L,
K = K, p_expansions = p_expansions,
Lambda = Lambda,
L_partition_list = L_partition_list,
unique_penalty_per_partition = FALSE,
A = A, R_constraints = ncol(A),
constraint_values = list(),
X_gram = X_gram,
Ghalf_full = G_list$Ghalf,
GhalfInv_full = G_list$GhalfInv,
family = gaussian(),
order_list = list(1:n1, (n1+1):(n1+n2)),
glm_weight_function = function(mu, y, oi, fam, d, ow, ...) rep(1, length(y)),
schur_correction_function = function(X, y, B, d, ol, K, fam, ow, ...) {
lapply(1:(K + 1), function(k) 0)
},
need_dispersion_for_estimation = FALSE,
dispersion_function = function(...) 1,
observation_weights = list(rep(1, n1), rep(1, n2)),
homogenous_weights = TRUE,
iterate = TRUE, tol = 1e-8,
parallel_eigen = FALSE, cl = NULL,
chunk_size = NULL, num_chunks = NULL, rem_chunks = NULL,
return_G_getB = TRUE,
verbose = FALSE)
## Verify: flat coefficient (col 4) is identical across partitions
stopifnot(abs(result$B[[1]][4] - result$B[[2]][4]) < 1e-10)
## Verify: output structure
stopifnot(length(result$B) == K + 1)
stopifnot(length(result$B[[1]]) == p_expansions)
stopifnot(!is.null(result$G_list))
## End(Not run)
Extract Coefficients from a Fitted lgspline
Description
Returns the per-partition polynomial coefficient lists from a fitted lgspline model.
Usage
## S3 method for class 'lgspline'
coef(object, ...)
Arguments
object |
A fitted lgspline model object. |
... |
Not used. |
Details
Coefficient names reflect the polynomial expansion terms, e.g.:
- intercept: the intercept term
- v: linear term for predictor v
- v^2: quadratic term
- v^3: cubic term
- _v_x_w_: two-way interaction between predictors v and w
Column/variable names replace numeric indices when available.
To get all coefficients as a single matrix, use
Reduce('cbind', coef(model_fit)).
Value
A list of per-partition coefficient vectors. Returns NULL with a
warning if object$B is not found.
See Also
Examples
set.seed(1234)
t <- runif(1000, -10, 10)
y <- 2*sin(t) + -0.06*t^2 + rnorm(length(t))
model_fit <- lgspline(t, y)
coefficients <- coef(model_fit)
print(coefficients[[1]])
print(Reduce('cbind', coefficients))
Extract Coefficients from a wald_lgspline Object
Description
Extract Coefficients from a wald_lgspline Object
Usage
## S3 method for class 'wald_lgspline'
coef(object, ...)
Arguments
object |
A |
... |
Not used. |
Value
Named numeric vector of coefficient estimates, or NULL if no estimate column is available.
Collapse Matrix List into a Single Dense Block-Layout Matrix
Description
Transforms a list of matrices into a single dense block-layout matrix. This is useful especially for quadratic programming problems, where operating on lists of blocks is not convenient.
Usage
collapse_block_diagonal(matlist)
Arguments
matlist |
List of input matrices |
Value
Dense matrix with each input block placed in its own column range
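One way to realize this collapse, assuming a block-diagonal layout (an illustrative sketch with a hypothetical name, not the package's implementation), is:

```r
## Sketch: place a list of matrices on the diagonal of one dense matrix,
## each block occupying its own row and column range.
collapse_sketch <- function(matlist) {
  nr <- vapply(matlist, nrow, integer(1))
  nc <- vapply(matlist, ncol, integer(1))
  out <- matrix(0, sum(nr), sum(nc))
  r0 <- c(0, cumsum(nr))
  c0 <- c(0, cumsum(nc))
  for (k in seq_along(matlist)) {
    out[(r0[k] + 1):r0[k + 1], (c0[k] + 1):c0[k + 1]] <- matlist[[k]]
  }
  out
}

blocks <- list(diag(2), matrix(1:4, 2, 2))
collapse_sketch(blocks)    # 4 x 4 matrix, zeros off the block diagonal
```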
Compute Eigenvalues and Related Matrices for G
Description
Computes partition-wise inverse-information matrices and their matrix square roots from the penalized information matrix via eigendecomposition.
Usage
compute_G_eigen(
X_gram,
Lambda,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks,
family,
unique_penalty_per_partition,
L_partition_list,
keep_G = TRUE,
schur_corrections
)
Arguments
X_gram |
List of Gram matrices |
Lambda |
Penalty matrix |
K |
Integer; number of interior knots (partitions = |
parallel |
Logical; use parallel processing across partitions. |
cl |
Cluster object from |
chunk_size, num_chunks, rem_chunks |
Partition distribution parameters. |
family |
GLM family object. |
unique_penalty_per_partition |
Logical; if |
L_partition_list |
List of partition-specific penalty matrices. |
keep_G |
Logical; if |
schur_corrections |
List of Schur complement correction matrices. |
Value
A list with components G (or NULL blocks when
keep_G = FALSE), Ghalf, and optionally GhalfInv.
Compute Component \textbf{G}^{1/2}\textbf{A}(\textbf{A}^{T}\textbf{G}\textbf{A})^{-1}\textbf{A}^{T}\textbf{G}\textbf{X}^{T}\textbf{y}
Description
Compute Component \textbf{G}^{1/2}\textbf{A}(\textbf{A}^{T}\textbf{G}\textbf{A})^{-1}\textbf{A}^{T}\textbf{G}\textbf{X}^{T}\textbf{y}
Usage
compute_GhalfXy_temp_wrapper(
G,
Ghalf,
A,
AGAInv,
Xy,
nc,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
G |
List of |
Ghalf |
List of |
A |
Constraint matrix |
AGAInv |
Inverse of |
Xy |
List of |
nc |
Number of columns |
K |
Number of partitions minus 1 ( |
parallel |
Use parallel processing |
cl |
Cluster object |
chunk_size |
Size of parallel chunks |
num_chunks |
Number of chunks |
rem_chunks |
Remaining chunks |
Details
Computes the least-squares projection component
\mathbf{G}^{1/2}\mathbf{A}(\mathbf{A}^{T}\mathbf{G}\mathbf{A})^{-1}
\mathbf{A}^{T}\mathbf{G}\mathbf{X}^{T}\mathbf{y} together with the
intermediate product
(\mathbf{A}^{T}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{G}\mathbf{X}^{T}\mathbf{y}
for reuse downstream.
Value
Unnamed two-element list containing the projected result vector and
the intermediate
(\mathbf{A}^{T}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{G}\mathbf{X}^{T}\mathbf{y}
product.
Construct Smoothing Spline Penalty Matrix
Description
Builds penalty matrix combining smoothing spline and ridge penalties with optional predictor/partition-specific components.
Usage
compute_Lambda(
custom_penalty_mat,
L1,
wiggle_penalty,
flat_ridge_penalty,
K,
p_expansions,
unique_penalty_per_predictor,
unique_penalty_per_partition,
penalty_vec,
colnm_expansions,
just_Lambda = TRUE
)
Arguments
custom_penalty_mat |
Matrix; optional custom ridge penalty structure |
L1 |
Matrix; integrated squared second derivative penalty ( |
wiggle_penalty, flat_ridge_penalty |
Numeric; smoothing and ridge penalty parameters |
K |
Integer; number of interior knots ( |
p_expansions |
Integer; number of basis columns per partition |
unique_penalty_per_predictor, unique_penalty_per_partition |
Logical; enable predictor/partition-specific penalties |
penalty_vec |
Named numeric; custom penalty values for predictors/partitions |
colnm_expansions |
Character; column names for linking penalties to predictors |
just_Lambda |
Logical; return only combined penalty matrix ( |
Value
List containing Lambda, L1, L2, L_predictor_list, L_partition_list;
or just Lambda if just_Lambda=TRUE and no partition penalties.
Compute Derivative of Inverse-Information Matrix G with Respect to Lambda
Description
Calculates the derivative of the inverse-information matrix
\textbf{G} with respect to the smoothing parameter \lambda,
supporting both shared and partition-specific penalties.
Usage
compute_dG_dlambda(
G,
L,
K,
lambda,
unique_penalty_per_partition,
L_partition_list,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
G |
A list of inverse-information matrices |
L |
The base penalty matrix |
K |
Number of partitions minus 1 ( |
lambda |
Smoothing parameter value |
unique_penalty_per_partition |
Logical indicating partition-specific penalties |
L_partition_list |
Optional list of partition-specific penalty matrices |
parallel |
Logical to enable parallel processing |
cl |
Cluster object for parallel computation |
chunk_size |
Size of chunks for parallel processing |
num_chunks |
Number of chunks |
rem_chunks |
Remainder chunks |
Value
A list of derivative matrices d\textbf{G}/d\lambda for each partition
Derivative of Constrained Penalized Coefficients with Respect to Lambda
Description
Computes d(\mathbf{U}\mathbf{G}\mathbf{X}^{T}\mathbf{y})/d\lambda,
the sensitivity of the constrained coefficient estimates
\tilde{\boldsymbol{\beta}} = \mathbf{U}\mathbf{G}\mathbf{X}^T\mathbf{y}
to changes in the smoothing parameter \lambda.
Usage
compute_dG_u_dlambda_xy(
AGAInv_AGXy,
AGAInv,
G,
A,
dG_dlambda,
nc,
nca,
K,
Xy,
Ghalf,
dGhalf,
GhalfXy_temp,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
AGAInv_AGXy |
Product of |
AGAInv |
Inverse of |
G |
List of |
A |
Constraint matrix |
dG_dlambda |
List of |
nc |
Number of columns |
nca |
Number of constraint columns |
K |
Number of partitions minus 1 ( |
Xy |
List of |
Ghalf |
List of |
dGhalf |
List of |
GhalfXy_temp |
Temporary storage for |
parallel |
Use parallel processing |
cl |
Cluster object |
chunk_size |
Size of parallel chunks |
num_chunks |
Number of chunks |
rem_chunks |
Remaining chunks |
Details
This function is called during GCV/penalty optimization and computes how the constrained coefficient vector changes as the penalty weight varies. Two implementations are provided depending on problem size.
Derivation
The constrained estimate is
\tilde{\boldsymbol{\beta}} = \mathbf{U}\hat{\boldsymbol{\beta}}
where
\hat{\boldsymbol{\beta}} = \mathbf{G}\mathbf{X}^T\mathbf{y}
and
\mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^T\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^T.
Both \mathbf{G} and \mathbf{U} depend on \lambda
through \mathbf{G} = (\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Lambda})^{-1}.
Differentiating the product \mathbf{U}\mathbf{G}\mathbf{X}^T\mathbf{y}
requires the chain rule applied to three \lambda-dependent
components:
\frac{d}{d\lambda}(\mathbf{U}\mathbf{G}\mathbf{X}^T\mathbf{y})
= \frac{d\mathbf{U}}{d\lambda}\hat{\boldsymbol{\beta}} + \mathbf{U}\frac{d\hat{\boldsymbol{\beta}}}{d\lambda}
The unconstrained part is straightforward:
d\hat{\boldsymbol{\beta}}/d\lambda = (d\mathbf{G}/d\lambda)\mathbf{X}^T\mathbf{y},
computed partition-wise since \mathbf{G} is block-diagonal.
The constraint projection derivative is more involved. Writing
\mathbf{C} = (\mathbf{A}^T\mathbf{G}\mathbf{A})^{-1} and
expanding d\mathbf{U}/d\lambda gives three terms (after
applying the product rule and the identity
d\mathbf{C}/d\lambda = -\mathbf{C}(d(\mathbf{A}^T\mathbf{G}\mathbf{A})/d\lambda)\mathbf{C}):
- term2a
(d\mathbf{G}/d\lambda)\mathbf{A}\mathbf{C}\mathbf{A}^T\hat{\boldsymbol{\beta}}direct effect ofd\mathbf{G}/d\lambdaon the projection- term2b
\mathbf{G}\mathbf{A}\mathbf{C}(d(\mathbf{A}^T\mathbf{G}\mathbf{A})/d\lambda)\mathbf{C}\mathbf{A}^T\hat{\boldsymbol{\beta}}effect through the change in(\mathbf{A}^T\mathbf{G}\mathbf{A})^{-1}- term2c
\mathbf{G}\mathbf{A}\mathbf{C}\mathbf{A}^T(d\hat{\boldsymbol{\beta}}/d\lambda)projection of the unconstrained derivative back through the constraint
Combining with the product rule gives
d\tilde{\boldsymbol{\beta}}/d\lambda = d\hat{\boldsymbol{\beta}}/d\lambda - \text{term2a} + \text{term2b} - \text{term2c},
where term2b enters with opposite sign to the other corrections
because of the minus sign in the d\mathbf{C}/d\lambda identity above.
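The decomposition can be checked numerically with dense toy matrices. The sketch below is illustrative only (a single block, diagonal penalty): it forms the product-rule terms directly, building the d\mathbf{C}/d\lambda contribution from its identity, and compares against a central finite difference of the constrained estimate:

```r
## Finite-difference check of d(beta_tilde)/d(lambda) for
## beta_tilde = U G X'y, G = (X'X + lambda*Lam)^{-1},
## U = I - G A (A'GA)^{-1} A'.
set.seed(7)
n <- 40; P <- 5; r <- 2
X   <- matrix(rnorm(n * P), n, P)
y   <- rnorm(n)
Lam <- diag(P)
A   <- matrix(rnorm(P * r), P, r)

beta_tilde <- function(lambda) {
  G <- solve(crossprod(X) + lambda * Lam)
  U <- diag(P) - G %*% A %*% solve(crossprod(A, G %*% A), t(A))
  U %*% (G %*% crossprod(X, y))
}

lambda <- 0.3
G   <- solve(crossprod(X) + lambda * Lam)
dG  <- -G %*% Lam %*% G                  # d/dlambda of the inverse
C   <- solve(crossprod(A, G %*% A))
bh  <- G %*% crossprod(X, y)             # unconstrained estimate
dbh <- dG %*% crossprod(X, y)            # its derivative

t2a <- dG %*% A %*% C %*% crossprod(A, bh)
dC  <- -C %*% crossprod(A, dG %*% A) %*% C
t2b <- G %*% A %*% dC %*% crossprod(A, bh)
t2c <- G %*% A %*% C %*% crossprod(A, dbh)
analytic <- dbh - (t2a + t2b + t2c)      # product rule, dC sign built in

h <- 1e-6
numeric_d <- (beta_tilde(lambda + h) - beta_tilde(lambda - h)) / (2 * h)
max(abs(analytic - numeric_d))           # small
```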
Large problem path (K >= 10, nc > 4)
Computes the three correction terms explicitly using the block
structure of \mathbf{G} and d\mathbf{G}/d\lambda.
The intermediate quantity
d(\mathbf{A}^T\mathbf{G}\mathbf{A})/d\lambda = \mathbf{A}^T(d\mathbf{G}/d\lambda)\mathbf{A}
is accumulated partition-wise. Shared vectors
\mathbf{A}\mathbf{C}\mathbf{A}^T\hat{\boldsymbol{\beta}}
and related products are precomputed once and reused across
partitions. Parallelism is over chunks of partitions.
Small problem path
Reformulates the derivative using the matrix square root
\mathbf{G}^{1/2} and its derivative
d\mathbf{G}^{1/2}/d\lambda. The constraint
\mathbf{A}^T\boldsymbol{\beta} = 0 is imposed via least
squares projection: the residuals from regressing
\mathbf{G}^{1/2}\mathbf{X}^T\mathbf{y} onto
\mathbf{G}^{1/2}\mathbf{A} give the constrained component,
and differentiating this projection with respect to \lambda
yields the derivative. Uses .lm.fit for speed with a
stabilizing rescaling factor to prevent numerical issues when the
constraint matrix is poorly scaled.
Value
P \times 1 vector of derivatives
d\tilde{\boldsymbol{\beta}}/d\lambda
Compute Eigen-Based Square-Root Factors for d\textbf{G}/d\lambda
Description
Applies an eigendecomposition-based matrix square root to each partition-wise
d\textbf{G}/d\lambda matrix after replacing NA entries with 0
and treating all-0 inputs with an identity fallback.
Usage
compute_dGhalf(
dG_dlambda,
p_expansions,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
dG_dlambda |
List of |
p_expansions |
Integer; number of columns per partition |
K |
Integer; number of interior knots ( |
parallel, cl, chunk_size, num_chunks, rem_chunks |
Parallel computation parameters |
Value
List of p \times p eigen-based square-root factors derived from
the partition-wise d\textbf{G}/d\lambda matrices
Compute Derivative of Inverse-Information Matrix G with Respect to Lambda (Wrapper)
Description
Wrapper for the derivative of the trace term used in GCV / effective-
degrees-of-freedom calculations with respect to \lambda.
Usage
compute_dW_dlambda_wrapper(
G,
A,
GXX,
Ghalf,
dG_dlambda,
dGhalf_dlambda,
AGAInv,
nc,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
G |
A list of inverse-information matrices |
A |
Constraint matrix |
GXX |
List of |
Ghalf |
List of |
dG_dlambda |
List of |
dGhalf_dlambda |
List of |
AGAInv |
Inverse of |
nc |
Number of columns |
K |
Number of partitions minus 1 ( |
parallel |
Logical to enable parallel processing |
cl |
Cluster object for parallel computation |
chunk_size |
Size of chunks for parallel processing |
num_chunks |
Number of chunks |
rem_chunks |
Remainder chunks |
Value
Scalar value representing the trace derivative component.
Compute Gram Matrix for Block Diagonal Structure
Description
Compute Gram Matrix for Block Diagonal Structure
Usage
compute_gram_block_diagonal(
list_in,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
list_in |
List of matrices |
parallel |
Use parallel processing |
cl |
Cluster object |
chunk_size |
Chunk size for parallel |
num_chunks |
Number of chunks |
rem_chunks |
Remaining chunks |
Details
For a list of matrices, will compute the gram matrix of each element of the list.
Value
List of Gram matrices (\textbf{X}^{T}\textbf{X}) for each block
Effective degrees of freedom via trace of the hat matrix
Description
Computes \mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top
\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) without forming
the N \times N hat matrix, by reducing the problem to
P \times P and r \times r operations.
Usage
compute_trace_H(
G,
Lambda,
A,
AGAInv,
nc,
nca,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks,
unique_penalty_per_partition,
L_partition_list
)
Arguments
G |
List of |
Lambda |
Base penalty matrix |
A |
Constraint matrix, |
AGAInv |
Precomputed |
nc |
Number of columns per partition block ( |
nca |
Number of constraint columns ( |
K |
Number of interior knots (so there are |
parallel |
Logical; use parallel processing. |
cl |
Cluster object for parallel computation. |
chunk_size |
Size of parallel chunks. |
num_chunks |
Number of full-size chunks. |
rem_chunks |
Number of remaining partitions in the final chunk. |
unique_penalty_per_partition |
Logical; if |
L_partition_list |
List of partition-specific penalty matrices
|
Details
Derivation
Let \mathbf{G} = (\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} + \boldsymbol{\Lambda})^{-1}
and \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top.
By the cyclic property of the trace:
\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = \mathrm{tr}(\mathbf{U}\mathbf{G}\,\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X})
Substituting \mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} = \mathbf{G}^{-1} - \boldsymbol{\Lambda}:
= \mathrm{tr}(\mathbf{U}\mathbf{G}(\mathbf{G}^{-1} - \boldsymbol{\Lambda})) = \mathrm{tr}(\mathbf{U}) - \mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda})
Since \mathbf{U} is idempotent,
\mathrm{tr}(\mathbf{U}) = \mathrm{rank}(\mathbf{U}) = P - r
where r = \mathrm{rank}(\mathbf{A}). For the second term,
expand \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top:
\mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda}) = \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) - \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)
using the cyclic property on the second term. Both \mathbf{G}
and \boldsymbol{\Lambda} are block-diagonal with K+1
blocks of dimension p \times p, so:
\mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) = \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k)
and \mathbf{G}\boldsymbol{\Lambda}\mathbf{G} is also block-diagonal
with blocks \mathbf{G}_k\boldsymbol{\Lambda}_k\mathbf{G}_k.
Combining:
\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = (P - r) - \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k) + \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)
The correction term
\mathrm{tr}((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A})
is an r \times r trace and captures the degrees of freedom
recovered because the constraints pin certain linear combinations of
penalized coefficients, removing them from estimation. When
\boldsymbol{\Lambda} = \mathbf{0} (no penalty), both
\mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) and the correction
vanish, giving \mathrm{edf} = P - r. When \mathbf{U} = \mathbf{I}
(no constraints), r = 0 and the correction vanishes, giving
\mathrm{edf} = P - \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}).
The final expression involves the correlation structure \mathbf{V}
and GLM weights \mathbf{W} only through
\mathbf{G}, so the same formula applies unchanged for correlated and weighted fits.
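As a sanity check, the cyclic-property step above can be verified numerically with small random matrices standing in for \mathbf{G}, \boldsymbol{\Lambda}, and \mathbf{A} (an illustrative sketch, not package code):

```r
## Numeric check of the trace identity
##   tr(U G Lambda) = tr(G Lambda) - tr((A'GA)^{-1} A'G Lambda G A)
## using small random stand-ins for the model quantities.
set.seed(1)
P <- 6; r <- 2
M <- matrix(rnorm(P^2), P)
G <- crossprod(M) + diag(P)                 # symmetric positive definite G
Lambda <- diag(runif(P))                    # diagonal penalty stand-in
A <- matrix(rnorm(P * r), P, r)             # constraint matrix stand-in
U <- diag(P) - G %*% A %*% solve(t(A) %*% G %*% A) %*% t(A)

lhs <- sum(diag(U %*% G %*% Lambda))
rhs <- sum(diag(G %*% Lambda)) -
  sum(diag(solve(t(A) %*% G %*% A) %*% t(A) %*% G %*% Lambda %*% G %*% A))
all.equal(lhs, rhs)   # TRUE
```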
Partition-specific penalties
When unique_penalty_per_partition = TRUE, each partition k
has an additional penalty matrix \mathbf{L}_k from
L_partition_list, so the effective per-partition penalty becomes
\boldsymbol{\Lambda}_k = \boldsymbol{\Lambda} + \mathbf{L}_k.
All block-wise traces and products use \boldsymbol{\Lambda}_k
in place of the shared \boldsymbol{\Lambda}.
Computational cost
The partition-wise traces \mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k)
cost O(p^2) each and are parallelizable. The correction term
requires forming \mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}
(block-diagonal times sparse, O((K+1)p^2 r)) and a single
r \times r trace (O(r^2)). The total cost is
O((K+1)p^2 + r^2), compared to O(N^2) or O(NP)
for forming and tracing the full hat matrix.
Value
Scalar effective degrees of freedom, clamped to [0, \, (K+1)p].
Calculate Trace of Matrix Product \text{trace}(\textbf{X}\textbf{U}\textbf{G}\textbf{X}^{T})
Description
Calculate Trace of Matrix Product \text{trace}(\textbf{X}\textbf{U}\textbf{G}\textbf{X}^{T})
Usage
compute_trace_UGXX_wrapper(
G,
A,
GXX,
AGAInv,
nc,
nca,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
G |
List of G matrices ( |
A |
Constraint matrix ( |
GXX |
List of |
AGAInv |
Inverse of |
nc |
Number of columns |
nca |
Number of constraint columns |
K |
Number of partitions minus 1 ( |
parallel |
Use parallel processing |
cl |
Cluster object |
chunk_size |
Size of parallel chunks |
num_chunks |
Number of chunks |
rem_chunks |
Remaining chunks |
Details
Computes \text{trace}(\textbf{X}\textbf{U}\textbf{G}\textbf{X}^{T}) where \textbf{U} = \textbf{I} - \textbf{G}\textbf{A}(\textbf{A}^{T}\textbf{G}\textbf{A})^{-1}\textbf{A}^{T}.
Handles parallel computation by splitting into chunks.
Value
Trace value
Confidence Intervals for lgspline Coefficients
Description
Wald-based confidence intervals for regression coefficients and, when available, correlation parameters (on the working scale).
Usage
## S3 method for class 'lgspline'
confint(object, parm, level = 0.95, ...)
Arguments
object |
A fitted lgspline object with |
parm |
Optional vector of parameter indices or names. Default returns all regression parameters; working-scale correlation parameters are appended when available. |
level |
Confidence level. Default 0.95. |
... |
Additional arguments passed to |
Details
For Gaussian identity-link models, t-distribution quantiles are used with
effective degrees of freedom
N - \mathrm{trace}(\mathbf{XUGX}^\top).
All other families use normal quantiles.
Correlation parameter intervals (if VhalfInv_params_estimates and
VhalfInv_params_vcov are present) are computed on the unbounded
working scale via a Wald interval.
Value
A matrix with columns giving lower and upper confidence limits,
named e.g. 2.5 % and 97.5 % for 95% intervals.
When available, rows for working-scale correlation parameters are
appended after the regression coefficients.
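The interval construction described above reduces to the usual Wald formula; a minimal sketch with hypothetical placeholder values for an estimate and its standard error:

```r
## Hand-rolled Wald interval matching the construction in Details
## ('est' and 'se' are hypothetical placeholders for a coefficient
## estimate and its standard error from a fitted model).
est <- 1.75; se <- 0.40; level <- 0.95
z <- qnorm(1 - (1 - level) / 2)   # normal quantile (non-Gaussian families)
c(lower = est - z * se, upper = est + z * se)
## For Gaussian identity-link fits, qt(1 - (1 - level)/2, df = edf)
## replaces qnorm(), with edf = N - trace(XUGX') as described above.
```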
Extract Confidence Intervals from a wald_lgspline Object
Description
Extract Confidence Intervals from a wald_lgspline Object
Usage
## S3 method for class 'wald_lgspline'
confint(object, parm = NULL, level = NULL, ...)
Arguments
object |
A |
parm |
Parameter specification (ignored; all returned). |
level |
Confidence level (ignored; uses the object's critical value). |
... |
Not used. |
Value
Matrix with columns lower and upper, or NULL if confidence limits are not available.
Cox PH Dispersion Function
Description
Returns 1 unconditionally. Cox PH has no dispersion parameter; this
function exists solely for interface compatibility with lgspline's
dispersion_function argument.
Usage
cox_dispersion_function(
mu,
y,
order_indices,
family,
observation_weights,
VhalfInv,
...
)
Arguments
mu |
Predicted values. |
y |
Observed survival times. |
order_indices |
Observation indices. |
family |
Family object. |
observation_weights |
Observation weights. |
VhalfInv |
Inverse square root of correlation matrix. |
... |
Additional arguments (including |
Value
Scalar 1.
Cox Proportional Hazards Family for lgspline
Description
Creates a family-like object for Cox PH models. The link function is
log (the linear predictor is log-relative-hazard), but unlike
standard GLM families there is no dispersion parameter and no closed-form
mean-variance relationship.
The family provides $loglik and $aic methods compatible
with logLik.lgspline.
Usage
cox_family()
Details
Cox PH is semiparametric: the baseline hazard is unspecified. The
partial log-likelihood depends only on the order of event times and the
linear predictor \eta = \mathbf{X}\boldsymbol{\beta}. Consequently:
- No dispersion parameter is estimated (sigmasq_tilde is fixed at 1).
- dev.resids returns martingale-style residuals for GCV tuning compatibility.
The response variable is survival time (positive), and the link is log.
Value
A list with family components used by lgspline.
Examples
fam <- cox_family()
fam$family
fam$link
Cox PH GLM Weight Function
Description
Computes working weights for the Cox PH information matrix, used by
lgspline when updating \mathbf{G} after obtaining constrained
estimates. The weights are a diagonal approximation built from the Breslow
tied-event information contributions.
Usage
cox_glm_weight_function(
mu,
y,
order_indices,
family,
dispersion,
observation_weights,
status
)
Arguments
mu |
Predicted values (exp(eta), i.e., relative hazard). |
y |
Observed survival times. |
order_indices |
Observation indices in partition order. |
family |
Cox family object (unused, for interface compatibility). |
dispersion |
Dispersion parameter (fixed at 1 for Cox PH). |
observation_weights |
Observation weights. |
status |
Event indicator (1 = event, 0 = censored). |
Details
For a tied event-time block g, the diagonal approximation uses
W_{jj}^{(g)} = d_g^{(w)} \frac{h_j}{S_g}
\Bigl(1 - \frac{h_j}{S_g}\Bigr), \qquad j \in R_g
where h_j = w_j \exp(\eta_j),
S_g = \sum_{k \in R_g} h_k, and
d_g^{(w)} = \sum_{i \in D_g} w_i.
When the natural weights are degenerate (all zero or non-finite), the function falls back to a vector of ones.
Value
Numeric vector of working weights, length N.
Examples
## Used internally by lgspline; see cox_helpers examples below.
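For concreteness, the tied-event weight formula from Details can be evaluated directly for a single risk set R_g (illustrative stand-in values, not package code):

```r
## Diagonal weight contributions for one tied-event block, following
## W_jj = d_g (h_j / S_g) (1 - h_j / S_g) with hypothetical inputs.
eta <- c(0.2, -0.1, 0.5)   # linear predictors for the risk set R_g
w   <- c(1, 1, 1)          # observation weights
d_w <- 2                   # weighted event count d_g^(w) for the block
h   <- w * exp(eta)        # h_j = w_j exp(eta_j)
S   <- sum(h)              # S_g = sum over the risk set
W_diag <- d_w * (h / S) * (1 - h / S)
round(W_diag, 4)
```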
Cox Proportional Hazards Helpers for lgspline
Description
Functions for fitting Cox proportional hazards regression models within the lgspline framework. Analogous to the Weibull AFT helpers, these provide the partial log-likelihood, score, information, and all interface functions needed by lgspline's unconstrained fitting, penalty tuning, and inference machinery.
Cox PH Score Function for Quadratic Programming and Blockfit
Description
Computes the score (gradient of partial log-likelihood) in the format
expected by lgspline's qp_score_function interface. The block-
diagonal design matrix \mathbf{X} and response \mathbf{y} are
in partition order; this function internally sorts by event time, computes
the Cox score using the Breslow approximation for tied event times, and
returns the result in the original partition order.
Usage
cox_qp_score_function(
X,
y,
mu,
order_list,
dispersion,
VhalfInv,
observation_weights,
status
)
Arguments
X |
Block-diagonal design matrix (N x P). |
y |
Response vector (survival times, N x 1). |
mu |
Predicted values (N x 1), same order as X and y. |
order_list |
List of observation indices per partition. |
dispersion |
Dispersion (fixed at 1). |
VhalfInv |
Inverse square root correlation matrix (NULL for independent observations). |
observation_weights |
Observation weights. |
status |
Event indicator (1 = event, 0 = censored). |
Value
Numeric column vector of length P (gradient w.r.t. coefficients).
Cox PH Schur Correction
Description
Returns zero corrections for all partitions. Cox PH has no nuisance
dispersion parameter, so no Schur complement correction to the
information matrix is needed. This function exists for interface
compatibility with lgspline's schur_correction_function.
Usage
cox_schur_correction(
X,
y,
B,
dispersion,
order_list,
K,
family,
observation_weights,
...
)
Arguments
X |
List of partition design matrices. |
y |
List of partition response vectors. |
B |
List of partition coefficient vectors. |
dispersion |
Dispersion (fixed at 1). |
order_list |
List of observation indices per partition. |
K |
Number of knots. |
family |
Family object. |
observation_weights |
Observation weights. |
... |
Additional arguments. |
Value
List of K+1 zeros.
Create Block Diagonal Matrix
Description
Create Block Diagonal Matrix
Usage
create_block_diagonal(matrix_list)
Arguments
matrix_list |
List of matrices to arrange diagonally |
Details
Takes a list of matrices and returns a block-diagonal matrix with each element of the list as one block. All off-diagonal elements are 0. Matrices should be square.
Value
Block diagonal matrix with input matrices on diagonal
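A minimal base-R sketch of the same operation (the package performs this in compiled code; the helper name here is hypothetical):

```r
## Assemble square matrices into a block-diagonal matrix, zeros elsewhere.
block_diag_sketch <- function(matrix_list) {
  dims <- vapply(matrix_list, nrow, integer(1))
  out <- matrix(0, sum(dims), sum(dims))
  offset <- 0
  for (m in matrix_list) {
    idx <- offset + seq_len(nrow(m))
    out[idx, idx] <- m                # place block on the diagonal
    offset <- offset + nrow(m)
  }
  out
}
block_diag_sketch(list(diag(2), matrix(1, 2, 2)))
```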
Create One-Hot Encoded Matrix
Description
Converts a categorical vector into a one-hot encoded matrix where each unique value becomes a binary column.
Usage
create_onehot(x, drop_first = FALSE)
Arguments
x |
A vector containing categorical values (factors, character, etc.) |
drop_first |
Logical; if |
Details
The function creates dummy variables for each unique value in the input vector using
model.matrix() with dummy-intercept coding. Column names are cleaned by removing the
'x' prefix added by model.matrix().
Value
A data frame containing the one-hot encoded binary columns with cleaned column names
Examples
## lgspline will not accept this format of "catvar", because inputting data
# this way can cause difficult-to-diagnose issues in formula parsing
# all variables must be numeric
df <- data.frame(numvar = rnorm(100),
catvar = rep(LETTERS[1:4],
25))
print(head(df))
## Instead, replace with dummy-intercept coding by
# 1) applying one-hot encoding
# 2) dropping the first column
# 3) appending to our data
dummy_intercept_coding <- create_onehot(df$catvar)[,-1]
df$catvar <- NULL
df <- cbind(df, dummy_intercept_coding)
print(head(df))
Damped Newton-Raphson Parameter Optimization
Description
Performs iterative parameter estimation with adaptive step-size damping. Internal function for fitting unconstrained model components using damped Newton-Raphson updates.
Usage
damped_newton_r(
parameters,
loglikelihood,
gradient,
neghessian,
tol = 1e-07,
max_cnt = 64,
max_dmp_steps = 16
)
Arguments
parameters |
Initial parameter vector to be optimized |
loglikelihood |
Function computing log-likelihood for current parameters |
gradient |
Function computing parameter gradients |
neghessian |
Function computing negative Hessian matrix |
tol |
Numeric convergence tolerance (default 1e-7) |
max_cnt |
Maximum number of optimization iterations (default 64) |
max_dmp_steps |
Maximum damping step attempts (default 16) |
Details
Implements a robust damped Newton-Raphson optimization algorithm. The Newton direction is computed once per outer iteration and reused across damping half-steps.
Value
Final parameter vector returned at termination.
See Also
- nr_iterate for parameter update computation
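The damping scheme in Details can be sketched in one dimension (illustrative only; the package's implementation handles vector parameters, tolerances, and iteration limits):

```r
## One damped Newton update: the direction is computed once, then the
## step is halved until the log-likelihood improves.
## Toy objective: loglik(x) = -(x - 2)^2, maximized at x = 2.
loglik  <- function(x) -(x - 2)^2
grad    <- function(x) -2 * (x - 2)
neghess <- function(x) 2
x <- 10
direction <- grad(x) / neghess(x)   # Newton direction, computed once
step <- 1
while (loglik(x + step * direction) < loglik(x) && step > 1 / 16) {
  step <- step / 2                  # damping half-steps
}
x_new <- x + step * direction
x_new   # lands at the maximizer, 2
```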
BFGS Implementation for REML Parameter Estimation
Description
BFGS optimizer designed for REML optimization of correlation parameters. Combines function evaluation and gradient computation into a single call to avoid redundant model refitting.
Usage
efficient_bfgs(par, fn, control = list())
Arguments
par |
Numeric vector of initial parameter values. |
fn |
Function returning list(objective, gradient). Must return both objective value and gradient vector matching length(par). |
control |
List of control parameters:
|
Details
Implements BFGS, used internally by lgspline() for optimizing
correlation parameters via REML when an analytic REML gradient is supplied.
This is more efficient than calling optim's native BFGS with separate fn and gr functions when the objective and gradient share expensive intermediate computations, since both are returned from a single evaluation.
Value
List containing:
- par
Parameter vector minimizing objective
- value
Minimum objective value
- counts
Number of iterations
- convergence
TRUE if termination occurred before maxit; this reflects the current stopping rule rather than a separate post-hoc convergence check.
- message
Description of termination status
- vcov
Final approximation of inverse-Hessian, useful for inference
Examples
## Minimize Rosenbrock function
fn <- function(x) {
# Objective
f <- 100*(x[2] - x[1]^2)^2 + (1-x[1])^2
# Gradient
g <- c(-400*x[1]*(x[2] - x[1]^2) - 2*(1-x[1]),
200*(x[2] - x[1]^2))
list(f, g)
}
(res <- efficient_bfgs(c(0.5, 2.5), fn))
## Compare to
(res0 <- stats::optim(c(0.5, 2.5), function(x) fn(x)[[1]], hessian = TRUE))
solve(res0$hessian)
Efficient Matrix Multiplication
Description
Performs matrix multiplication using RcppArmadillo
Usage
efficient_matrix_mult(A, B)
Arguments
A |
First input matrix |
B |
Second input matrix |
Value
Matrix product of A and B
Print Closed-Form Fitted Equation from lgspline Model
Description
Displays the closed-form polynomial equation for each partition of a fitted lgspline model, along with partition boundary or cluster center information. Optionally prints the first derivative, second derivative, or antiderivative of the fitted equation with respect to a single specified variable.
Usage
equation(object, ...)
## S3 method for class 'lgspline'
equation(
object,
digits = 4,
scientific = FALSE,
show_bounds = TRUE,
predictor_names = NULL,
response_name = NULL,
collapse_zero = TRUE,
first_derivative = NULL,
second_derivative = NULL,
antiderivative = NULL,
...
)
## S3 method for class 'equation'
print(x, ...)
Arguments
object |
A fitted lgspline model object. |
... |
Not used. |
digits |
Integer; decimal places for coefficient display. Default 4. |
scientific |
Logical; use scientific notation for coefficients with absolute value < 1e-3 or > 1e4. Default FALSE. |
show_bounds |
Logical; display partition bounds (1D) or knot midpoint boundaries (multi-D). Default TRUE. |
predictor_names |
Character vector; custom names for predictor variables. If NULL (default), uses original column names or "_j_" labels. |
response_name |
Character; label for response. If NULL (default), uses "y" for identity link Gaussian, or "link(E[y])" otherwise. |
collapse_zero |
Logical; omit terms with coefficient exactly 0. Default TRUE. |
first_derivative |
Default: NULL. Character name or integer index of
the predictor variable with respect to which the first derivative
is printed. Only one variable at a time is supported. When non-NULL,
the printed equations show |
second_derivative |
Default: NULL. Character name or integer index of
the predictor variable with respect to which the second derivative
is printed. Only one variable at a time is supported. When non-NULL,
the printed equations show |
antiderivative |
Default: NULL. Character name or integer index of
the predictor variable with respect to which the antiderivative
(indefinite integral) is printed. Only one variable at a time is
supported. When non-NULL, the printed equations show
|
x |
An object returned by |
Details
For 1D models with K knots, partition boundaries are displayed as intervals
on the predictor scale. For multi-predictor models, partition boundaries are
computed as the midpoints between adjacent cluster centers along each
predictor dimension. When the model's make_partition_list contains
knots (midpoint boundaries between clusters), those are used directly.
Otherwise, cluster centers are displayed.
Coefficients are displayed on the original (unstandardized) predictor scale. For GLMs with non-identity link, the left-hand side shows the link function applied to the expected response.
Derivative and antiderivative modes.
Only one of first_derivative, second_derivative, or
antiderivative may be non-NULL. If more than one is supplied, the
priority order is: first derivative, second derivative, antiderivative.
Derivatives and antiderivatives are computed symbolically from the
polynomial coefficients. For a term a x^n, the first derivative is
n a x^{n-1}, the second derivative is n(n-1) a x^{n-2}, and
the antiderivative is a x^{n+1}/(n+1). Cross-terms (interactions)
involving the target variable are differentiated or integrated with respect
to that variable only, treating all other variables as constants.
A warning is emitted if the user attempts to differentiate or integrate with
respect to more than one variable simultaneously. Multi-variable calculus
operations should be performed one variable at a time by calling
equation() repeatedly.
Value
Invisibly returns a list with components:
- formulas
Character vector of equation strings per partition.
- bounds
Matrix or list of partition boundary information.
- link
Character; link function name.
- mode
Character; one of "equation", "first_derivative", "second_derivative", or "antiderivative".
- variable
Character; the variable name for the calculus operation, or NULL if mode is "equation".
See Also
lgspline, plot.lgspline,
coef.lgspline
Examples
## 1D example
set.seed(1234)
t <- runif(500, -5, 5)
y <- 2*sin(t) + 0.1*t^2 + rnorm(length(t), 0, 0.5)
fit <- lgspline(t, y, K = 2)
equation(fit)
equation(fit, digits = 2, predictor_names = "time")
## First derivative with respect to predictor
equation(fit, first_derivative = 1)
## Second derivative
equation(fit, second_derivative = 1)
## Antiderivative
equation(fit, antiderivative = 1)
## 2D example with named predictors
x1 <- runif(300, 0, 10)
x2 <- runif(300, 0, 10)
y <- x1 + 0.5*x2 + 0.1*x1*x2 + rnorm(300)
fit2d <- lgspline(cbind(x1, x2), y, K = 3)
equation(fit2d, predictor_names = c("Length", "Width"))
## Derivative w.r.t. first variable only
equation(fit2d, first_derivative = "Length",
predictor_names = c("Length", "Width"))
## GLM example
y_bin <- rbinom(500, 1, plogis(0.5*t))
fit_glm <- lgspline(t, y_bin, family = binomial(), K = 1)
equation(fit_glm)
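The per-term calculus rules quoted in Details can be expressed as tiny helpers (hypothetical, not part of the package API):

```r
## For a term a x^n: first derivative n a x^{n-1},
## antiderivative a x^{n+1} / (n + 1), as stated in Details.
deriv_term     <- function(a, n) c(coef = n * a, power = n - 1)
antideriv_term <- function(a, n) c(coef = a / (n + 1), power = n + 1)
deriv_term(a = 3, n = 2)       # 3x^2  ->  6x
antideriv_term(a = 3, n = 2)   # 3x^2  ->  x^3
```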
Generate Grid Indices Without expand.grid()
Description
Generate Grid Indices Without expand.grid()
Usage
expgrid(vec_list, indices)
Arguments
vec_list |
List of vectors to combine |
indices |
Indices of combinations to return |
Details
Returns selected combinations from the Cartesian product of vec_list
without constructing the full expand.grid() result, for memory efficiency.
Value
Data frame of selected combinations
Find the Extremum of a Fitted lgspline
Description
Finds the global maximum or minimum of a fitted lgspline using L-BFGS-B, with options for partition-based heuristics, stochastic exploration, and custom objective functions (e.g., acquisition functions for Bayesian optimization).
Usage
find_extremum(
object,
vars = NULL,
quick_heuristic = TRUE,
initial = NULL,
B_predict = NULL,
minimize = FALSE,
stochastic = FALSE,
stochastic_draw = function(mu, sigma, ...) {
N <- length(mu)
rnorm(N, mu, sigma)
},
sigmasq_predict = object$sigmasq_tilde,
custom_objective_function = NULL,
custom_objective_derivative = NULL,
...
)
Arguments
object |
A fitted lgspline model object. |
vars |
Integer or character vector; indices or names of predictors to optimize over. Default NULL optimizes all predictors. |
quick_heuristic |
Logical; if TRUE (default) searches only the best-performing partition. If FALSE, initiates searches from all partition local maxima. |
initial |
Numeric vector; optional starting values. Useful for fixing binary predictors. Default NULL. |
B_predict |
List; optional coefficient list for prediction, e.g.
from |
minimize |
Logical; find minimum instead of maximum. Default FALSE. |
stochastic |
Logical; add noise during optimization for exploration. Default FALSE. |
stochastic_draw |
Function; generates noise for stochastic
optimization. Takes |
sigmasq_predict |
Numeric; variance for stochastic draws.
Default |
custom_objective_function |
Function; optional custom objective.
Takes |
custom_objective_derivative |
Function; optional gradient of
|
... |
Additional arguments passed to internal optimization routines. |
Value
A list with elements:
- t
Numeric vector; predictor values at the extremum.
- y
Numeric; objective value at the extremum.
Examples
set.seed(1234)
t <- runif(1000, -10, 10)
y <- 2*sin(t) - 0.06*t^2 + rnorm(length(t))
model_fit <- lgspline(t, y)
plot(model_fit)
max_point <- find_extremum(model_fit)
min_point <- find_extremum(model_fit, minimize = TRUE)
abline(v = max_point$t, col = 'blue')
abline(v = min_point$t, col = 'red')
## Expected improvement acquisition function
ei_obj <- function(mu, sigma, y_best, ...) {
d <- y_best - mu
d * pnorm(d/sigma) + sigma * dnorm(d/sigma)
}
ei_deriv <- function(mu, sigma, y_best, d_mu, ...) {
d <- y_best - mu
z <- d/sigma
d_z <- -d_mu/sigma
pnorm(z)*d_mu - d*dnorm(z)*d_z + sigma*z*dnorm(z)*d_z
}
post_draw <- generate_posterior(model_fit)
acq <- find_extremum(model_fit,
stochastic = TRUE,
B_predict = post_draw$post_draw_coefficients,
sigmasq_predict = post_draw$post_draw_sigmasq,
custom_objective_function = ei_obj,
custom_objective_derivative = ei_deriv)
abline(v = acq$t, col = 'green')
Find Neighboring Cluster Partitions Using Midpoint Distance Criterion
Description
Identifies neighboring partitions by evaluating whether the midpoint between cluster centers is closer to those centers than to any other center.
Usage
find_neighbors(centers, parallel, cl, neighbor_tolerance)
Arguments
centers |
Matrix; rows are cluster center coordinates |
parallel |
Logical; use parallel processing |
cl |
Cluster object for parallel execution |
neighbor_tolerance |
Numeric; scaling factor for distance comparisons |
Value
List where element i contains indices of centers neighboring center i
Generate Posterior Samples from a Fitted lgspline
Description
Draws from the posterior distribution of model coefficients, with optional dispersion sampling, posterior predictive draws, and propagation of uncertainty in estimated correlation parameters.
Usage
generate_posterior(
object,
new_sigmasq_tilde = object$sigmasq_tilde,
new_predictors = NULL,
theta_1 = 0,
theta_2 = 0,
posterior_predictive_draw = function(N, mean, sqrt_dispersion, ...) {
rnorm(N, mean, sqrt_dispersion)
},
draw_dispersion = TRUE,
include_posterior_predictive = FALSE,
num_draws = 1,
enforce_qp_constraints = TRUE,
draw_correlation = FALSE,
correlation_param_mean = NULL,
correlation_param_vcov = NULL,
correlation_VhalfInv_fxn = NULL,
correlation_Vhalf_fxn = NULL,
correlation_param_vcov_scale = NULL,
include_warnings = TRUE,
...
)
Arguments
object |
A fitted |
new_sigmasq_tilde |
Numeric; dispersion |
new_predictors |
Matrix; predictor matrix for posterior predictive sampling. Default uses in-sample predictors. |
theta_1 |
Numeric; shape increment for the inverse-gamma prior on
|
theta_2 |
Numeric; rate increment for the inverse-gamma prior. Default 0. |
posterior_predictive_draw |
Function; sampler for posterior predictive
realizations. Must accept |
draw_dispersion |
Logical; sample |
include_posterior_predictive |
Logical; generate posterior predictive
draws at |
num_draws |
Positive integer; number of draws. Default 1. |
enforce_qp_constraints |
Logical; if TRUE, enforce active QP inequality constraints during posterior sampling via the stored elliptical-slice constrained sampler. Default TRUE. |
draw_correlation |
Logical; propagate correlation parameter
uncertainty. Requires |
correlation_param_mean |
Numeric vector; mean of the approximate
normal posterior for correlation parameters on the unbounded
(working) scale. Default: |
correlation_param_vcov |
Matrix; variance-covariance for correlation
parameter draws. Default: inverse Hessian from BFGS
( |
correlation_VhalfInv_fxn |
Function; maps correlation parameter
vector to |
correlation_Vhalf_fxn |
Function or NULL; maps to
|
correlation_param_vcov_scale |
NULL or numeric; if supplied,
divides a user-supplied |
include_warnings |
Logical; emit warnings for degenerate draws, constraint violations, etc. Default TRUE. |
... |
Additional arguments forwarded to the GLM weight function,
dispersion function, and |
Details
Uses a Laplace approximation centered at the MAP estimate for non-Gaussian responses.
Dispersion posterior.
When draw_dispersion = TRUE, \sigma^2 is drawn from
\sigma^2 \mid \mathbf{y} \sim
\mathrm{InvGamma}(\alpha_1, \alpha_2),
where
\alpha_1 = \theta_1 + \tfrac{1}{2}(N - s \cdot \mathrm{tr}(\mathbf{H})),
\quad
\alpha_2 = \theta_2 + \tfrac{1}{2}(N - s \cdot \mathrm{tr}(\mathbf{H}))
\tilde{\sigma}^2,
\mathbf{H} = \mathbf{XUGX}^\top is the hat matrix, s = 1
when unbias_dispersion = TRUE (else s = 0), and
\theta_1 = \theta_2 = 0 recovers an improper uniform prior.
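As a sketch of the draw described above (hypothetical values for N, the hat-matrix trace, and the dispersion; the package performs this internally):

```r
## One draw from the inverse-gamma dispersion posterior, via the
## standard reciprocal-of-rgamma() construction.
N <- 200; trace_H <- 12; sigmasq_tilde <- 1.5
theta_1 <- 0; theta_2 <- 0; s <- 1            # unbias_dispersion = TRUE
alpha_1 <- theta_1 + 0.5 * (N - s * trace_H)
alpha_2 <- theta_2 + 0.5 * (N - s * trace_H) * sigmasq_tilde
set.seed(42)
sigmasq_draw <- 1 / rgamma(1, shape = alpha_1, rate = alpha_2)
sigmasq_draw   # concentrates near sigmasq_tilde for large N
```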
Correlation parameter posterior.
When draw_correlation = TRUE and the fitted model contains an
estimated correlation structure, each draw first samples
\boldsymbol{\rho} from
\boldsymbol{\rho}^{(m)} \sim
\mathcal{N}(\hat{\boldsymbol{\rho}}_{\mathrm{REML}},
\mathbf{H}^{-1}_{\mathrm{BFGS}}),
rebuilds the posterior covariance under the drawn correlation structure (reusing all pre-computed design matrices, constraints, and penalty matrices) and then draws coefficients from the updated posterior. Knot placement, partitioning, coefficient re-estimation, and GCV tuning are skipped entirely. Draws producing non-positive-definite correlation matrices are rejected and redrawn (up to 50 attempts).
When draw_correlation = FALSE (default), correlation parameters
are fixed at their estimated values.
Inequality constraints.
Active QP inequalities can be enforced during posterior sampling via
elliptical slice sampling, producing draws from the corresponding
truncated multivariate normal posterior on the coefficient scale.
The public enforce_qp_constraints argument is forwarded to the
stored sampler for both the standard and correlation-aware posterior
paths.
Value
When num_draws = 1, a named list:
- post_draw_coefficients
List of length K+1; per-partition coefficient vectors on the original scale.
- post_draw_sigmasq
Drawn (or fixed) dispersion.
- post_pred_draw
Posterior predictive vector (only when include_posterior_predictive = TRUE).
- post_draw_correlation_params
Drawn correlation parameters on the working scale (only when draw_correlation = TRUE).
When num_draws > 1, each element becomes a list of length
num_draws, and post_pred_draw (if requested) is an
N_{\mathrm{new}} \times M matrix, where M = \mathrm{num\_draws}.
See Also
lgspline,
generate_posterior_correlation,
wald_univariate
Examples
set.seed(1234)
n_blocks <- 100; block_size <- 5; N <- n_blocks * block_size
rho_true <- 0.3
t <- seq(-5, 5, length.out = N)
true_mean <- sin(t)
errors <- Reduce("rbind",
lapply(1:n_blocks, function(i) {
sigma <- diag(block_size) + rho_true *
(matrix(1, block_size, block_size) - diag(block_size))
matsqrt(sigma) %*% rnorm(block_size)
})
)
y <- true_mean + errors * 0.5
model_fit <- lgspline(t, y,
K = 3,
correlation_id = rep(1:n_blocks, each = block_size),
correlation_structure = "exchangeable",
include_warnings = FALSE
)
## Propagate correlation uncertainty across 50 draws
post <- generate_posterior(model_fit,
draw_correlation = TRUE, num_draws = 50,
include_warnings = FALSE
)
## Fixed correlation parameters for comparison
post_fixed <- generate_posterior(model_fit, num_draws = 50)
corr_draws <- unlist(post$post_draw_correlation_params)
rho_draws <- exp(-exp(corr_draws))
print(summary(rho_draws))
Generate Posterior Samples Propagating Correlation Parameter Uncertainty
Description
Called internally by generate_posterior when
draw_correlation = TRUE, but can be used directly for finer control.
For each draw, samples the correlation parameter vector from its approximate
normal posterior, rebuilds the posterior covariance under that drawn
correlation structure without re-solving for a new coefficient mode, then
draws coefficients from the updated posterior.
Usage
generate_posterior_correlation(
object,
new_sigmasq_tilde = object$sigmasq_tilde,
new_predictors = NULL,
theta_1 = 0,
theta_2 = 0,
posterior_predictive_draw = function(N, mean, sqrt_dispersion, ...) {
rnorm(N, mean, sqrt_dispersion)
},
draw_dispersion = TRUE,
include_posterior_predictive = FALSE,
num_draws = 1,
enforce_qp_constraints = TRUE,
correlation_param_mean = NULL,
correlation_param_vcov_sc = NULL,
correlation_VhalfInv_fxn = NULL,
correlation_Vhalf_fxn = NULL,
include_warnings = TRUE,
...
)
Arguments
object |
A fitted |
new_sigmasq_tilde |
Numeric; dispersion starting value when
|
new_predictors |
Matrix or NULL; predictor matrix for posterior predictive sampling. Default uses in-sample predictors. |
theta_1 |
Numeric; shape increment for the inverse-gamma prior. Default 0. |
theta_2 |
Numeric; rate increment for the inverse-gamma prior. Default 0. |
posterior_predictive_draw |
Function; sampler for posterior predictive
realizations. Default |
draw_dispersion |
Logical; sample |
include_posterior_predictive |
Logical; generate posterior predictive draws. Default FALSE. |
num_draws |
Positive integer; number of draws (each requires one correlation parameter sample and one covariance rebuild). Default 1. |
enforce_qp_constraints |
Logical; if TRUE, enforce active QP inequality constraints during each coefficient draw via the stored elliptical-slice constrained sampler. Default TRUE. |
correlation_param_mean |
Numeric vector or NULL; mean of the
approximate normal posterior on the working scale. Default:
|
correlation_param_vcov_sc |
Matrix or NULL; variance-covariance
on the working scale. Default:
|
correlation_VhalfInv_fxn |
Function or NULL; maps parameter vector
to |
correlation_Vhalf_fxn |
Function or NULL; maps to
|
include_warnings |
Logical; emit warnings. Default TRUE. |
... |
Additional arguments forwarded to the GLM weight function,
dispersion function, and |
Details
Each draw proceeds in three steps:
- Draw correlation parameters. \boldsymbol{\rho}^{(m)} \sim \mathcal{N}(\hat{\boldsymbol{\rho}}_{\mathrm{REML}}, \mathbf{H}^{-1}_{\mathrm{BFGS}}) on the unbounded working scale. Draws producing a non-PD correlation matrix are rejected and redrawn (up to 50 attempts); if all fail, the point estimate is used with a warning.
- Rebuild posterior covariance. Using the already-expanded \mathbf{X}_k, \mathbf{A}, and \boldsymbol{\Lambda} from the original fit, recompute only the covariance-side quantities implied by the drawn correlation structure: \mathbf{G}_{\mathrm{correct}}^{(m)} = \left(\mathbf{X}^{\top} \mathbf{W}\mathbf{D} \mathbf{V}^{-1}(\boldsymbol{\rho}^{(m)}) \mathbf{X} + \boldsymbol{\Lambda}\right)^{-1}, and from this the updated constraint projection \mathbf{U}^{(m)} and effective degrees of freedom \mathrm{trace}(\mathbf{H}^{(m)}). The coefficient mode \hat{\boldsymbol{\beta}}_{\mathrm{raw}} and fitted mean \tilde{\mathbf{y}} are held fixed at the original fit values. Knot placement, partitioning, polynomial expansion, penalty tuning, and coefficient re-estimation are all skipped entirely.
- Draw coefficients. Updated quantities (U, Ghalf_correct, VhalfInv, sigmasq_tilde, trace_XUGX) are passed to the stored closure via override_* arguments so that the draw is centered at the original mode but uses the covariance implied by the drawn correlation structure. The stored mode object$B_raw is passed as override_B_raw so it is not recomputed.
Why the mode is held fixed. Re-solving for a new MAP estimate under each drawn correlation structure is expensive, requires iterative solvers, and risks convergence failures on draws far from the REML estimate. The posterior draw is centered at the original mode, which remains a reasonable approximation when the REML surface is not sharply peaked. The covariance update captures the primary effect of correlation uncertainty on posterior width and shape.
BFGS inverse Hessian caveat.
The BFGS inverse Hessian approximation for the correlation parameter
covariance is asymptotically valid but may be poor for small samples,
near-boundary estimates, or multimodal REML surfaces. It is not guaranteed
to converge to the observed information matrix. Users should inspect
object$VhalfInv_params_vcov before relying on these draws.
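The rejection step in the first draw stage can be sketched in plain R. This is illustrative only: build_corr is a hypothetical stand-in for the package's working-scale-to-correlation map, shown here as an AR(1) structure on five observations.

```r
set.seed(1)
## Hypothetical map from the unbounded working scale to a correlation matrix;
## the package's actual map depends on the chosen correlation structure.
build_corr <- function(rho_work, n = 5) {
  rho <- tanh(rho_work)  # unbounded working scale -> (-1, 1)
  outer(1:n, 1:n, function(i, j) rho^abs(i - j))
}
draw_corr_params <- function(rho_hat, H_inv_chol, max_tries = 50) {
  for (m in seq_len(max_tries)) {
    ## Gaussian draw centred at the REML estimate on the working scale
    draw <- c(rho_hat + H_inv_chol %*% rnorm(length(rho_hat)))
    V <- build_corr(draw[1])
    ## accept only if the implied correlation matrix is positive-definite
    if (all(eigen(V, symmetric = TRUE, only.values = TRUE)$values > 0)) {
      return(list(params = draw, V = V, accepted = TRUE))
    }
  }
  ## all attempts failed: fall back to the point estimate with a warning
  warning("all draws produced non-PD correlation matrices; using point estimate")
  list(params = rho_hat, V = build_corr(rho_hat[1]), accepted = FALSE)
}
res <- draw_corr_params(rho_hat = 0.3, H_inv_chol = matrix(0.2))
```

For an AR(1) structure every draw with |rho| < 1 is accepted immediately; the rejection loop matters for structures whose parameter space does not guarantee positive-definiteness.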
Value
When num_draws = 1, a named list:
- post_draw_coefficients
List of length K+1; per-partition coefficient vectors on the original scale.
- post_draw_sigmasq
Drawn dispersion.
- post_pred_draw
Posterior predictive vector (only when include_posterior_predictive = TRUE).
- post_draw_correlation_params
Drawn correlation parameters on the working scale.
When num_draws > 1:
- post_draw_coefficients
List of num_draws lists of K+1 coefficient vectors.
- post_draw_sigmasq
List of num_draws scalars.
- post_pred_draw
N_{\mathrm{new}} \times M matrix, where M = \mathrm{num\_draws} (only when include_posterior_predictive = TRUE).
- post_draw_correlation_params
List of num_draws vectors.
See Also
generate_posterior,
lgspline,
lgspline.fit
Examples
## See ?generate_posterior for a complete worked example.
Compute Integrated Squared Second Derivative Penalty Matrix
Description
Computes the p \times p integrated squared second-derivative penalty matrix
\boldsymbol{\Lambda}_s for one partition of a monomial spline,
such that
\boldsymbol{\beta}_k^\top \boldsymbol{\Lambda}_s
\boldsymbol{\beta}_k
= \int_{\mathbf{a}}^{\mathbf{b}}
\|\tilde{f}_k''(\mathbf{t})\|^2 \, d\mathbf{t},
where \mathbf{a} and \mathbf{b} are the observed
predictor minimums and maximums, and
\tilde{f}_k(\mathbf{t}) = \mathbf{x}_k^\top
\boldsymbol{\beta}_k is the fitted function for partition
\mathcal{P}_k.
The implementation uses a general monomial derivative rule that handles all term types (marginal powers, two-way interactions, quadratic interactions, three-way interactions) in a single unified loop.
Usage
get_2ndDerivPenalty(
colnm_expansions,
C,
power1_cols,
power2_cols,
power3_cols,
power4_cols,
interaction_single_cols,
interaction_quad_cols,
triplet_cols,
p_expansions,
select_cols = NULL
)
Arguments
colnm_expansions |
Character vector of length |
C |
Numeric |
power1_cols |
Integer vector. Column indices of linear terms
|
power2_cols |
Integer vector. Column indices of quadratic terms
|
power3_cols |
Integer vector. Column indices of cubic terms
|
power4_cols |
Integer vector. Column indices of quartic terms
|
interaction_single_cols |
Integer vector. Column indices of
linear-by-linear interaction terms |
interaction_quad_cols |
Integer vector. Column indices of
linear-by-quadratic interaction terms |
triplet_cols |
Integer vector. Column indices of three-way
interaction terms |
select_cols |
Optional integer vector of predictor indices
(positions within |
Details
Mathematical framework
Let the basis expansion for partition \mathcal{P}_k be
\mathbf{x}_k = (\phi_1(\mathbf{t}), \ldots,
\phi_p(\mathbf{t}))^\top where each \phi_i is a
multivariate monomial
\phi_i(\mathbf{t})
= \prod_{j=1}^{q} t_j^{\alpha_{ij}}.
The second derivative of \tilde{f}_k decomposes into
q total curvature operators, one per predictor. For
predictor v:
D_v = \frac{\partial^2}{\partial t_v^2}
+ \sum_{s \neq v} \frac{\partial^2}{\partial t_v \, \partial t_s}.
The monomial derivative rule gives each second partial derivative
in closed form. For the pure second derivative (r = s = v):
\frac{\partial^2}{\partial t_v^2}
\prod_{j} t_j^{\alpha_j}
= \alpha_v(\alpha_v - 1) \;
t_v^{\alpha_v - 2} \prod_{j \neq v} t_j^{\alpha_j}.
For a mixed second derivative (s \neq v):
\frac{\partial^2}{\partial t_v \, \partial t_s}
\prod_{j} t_j^{\alpha_j}
= \alpha_v \alpha_s \;
t_v^{\alpha_v - 1} t_s^{\alpha_s - 1}
\prod_{j \neq v,s} t_j^{\alpha_j}.
In both cases a term is zero when the required exponent would be
negative (e.g., \alpha_v < 2 for the pure case). Applying
D_v to \phi_i produces a sum of monomials with known
coefficients and exponent vectors.
Integration
Because every D_v(\phi_i) is polynomial, the product
D_v(\phi_i) \, D_v(\phi_j) is also polynomial and the
multivariate integral factorises over predictors:
\int_{\mathbf{a}}^{\mathbf{b}}
\prod_{j=1}^{q} t_j^{e_j} \, d\mathbf{t}
= \prod_{j=1}^{q}
\frac{b_j^{e_j+1} - a_j^{e_j+1}}{e_j + 1}.
Crucially, this integral runs over all q predictor
ranges, including predictors that do not appear in the integrand
(for which e_j = 0 and the factor reduces to
b_j - a_j).
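The factorised integral can be checked numerically in a few lines, here for the monomial t_1^2 t_2^3 over the box [1, 4] x [-2, 3]. This is an illustrative check using base R's one-dimensional integrate iterated over the second coordinate, not package code.

```r
a <- c(1, -2); b <- c(4, 3); e <- c(2, 3)   # box bounds and exponents
## closed form: product over predictors of (b^(e+1) - a^(e+1)) / (e+1)
closed_form <- prod((b^(e + 1) - a^(e + 1)) / (e + 1))
## brute force: iterated one-dimensional quadrature
inner <- function(t2) sapply(t2, function(v)
  integrate(function(t1) t1^2 * v^3, a[1], b[1])$value)
numeric_val <- integrate(inner, a[2], b[2])$value
c(closed_form, numeric_val)   # both 341.25
```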
Single-predictor verification
For q = 1 with expansion
\mathbf{x} = (1, t, t^2, t^3)^\top on [a, b], the
penalty matrix reduces to
\boldsymbol{\Lambda}_s
= \int_a^b \mathbf{x}'' \mathbf{x}''^\top \, dt
= \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 4(b - a) & 6(b^2 - a^2) \\
0 & 0 & 6(b^2 - a^2) & 12(b^3 - a^3)
\end{pmatrix},
matching the formula in Section 2.3.
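This 4 x 4 matrix can be reproduced numerically by integrating the outer product of x''(t) = (0, 0, 2, 6t)^T entry by entry; a quick base-R check on [0, 2], not package code.

```r
a <- 0; b <- 2
## closed form from the display above
Lambda_closed <- matrix(0, 4, 4)
Lambda_closed[3, 3] <- 4 * (b - a)
Lambda_closed[3, 4] <- Lambda_closed[4, 3] <- 6 * (b^2 - a^2)
Lambda_closed[4, 4] <- 12 * (b^3 - a^3)
## numeric reconstruction via 1-D quadrature of each entry
Lambda_num <- matrix(0, 4, 4)
for (i in 1:4) for (j in 1:4) {
  f <- function(t) {
    d2 <- cbind(0, 0, 2, 6 * t)   # second derivatives of (1, t, t^2, t^3)
    d2[, i] * d2[, j]
  }
  Lambda_num[i, j] <- integrate(f, a, b)$value
}
all.equal(Lambda_closed, Lambda_num)   # TRUE
```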
Value
A symmetric positive semi-definite p \times p matrix
\boldsymbol{\Lambda}_s with
[\boldsymbol{\Lambda}_s]_{ij}
= \sum_{v=1}^{q} \int_{\mathbf{a}}^{\mathbf{b}}
D_v(\phi_i) \, D_v(\phi_j) \, d\mathbf{t}.
Naming convention
Column names in colnm_expansions must encode each monomial's
predictor content. The parser checks each predictor name (taken
from the linear columns) against the column name:
-
predname^d (literal caret) signals exponent d.
-
predname without ^ signals exponent 1.
-
Absence of predname signals exponent 0.
Multiple factors in an interaction are separated by x.
Predictor names must be unique and must not be substrings of one
another.
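The convention can be illustrated with a small stand-alone function. This is a sketch of the parsing rule, not the package's internal parser, and it assumes predictor names contain no regex metacharacters.

```r
parse_exponent <- function(col_name, pred_name) {
  ## absence of the predictor name signals exponent 0
  if (!grepl(pred_name, col_name, fixed = TRUE)) return(0L)
  ## "predname^d" (literal caret) signals exponent d
  m <- regmatches(col_name,
                  regexpr(paste0(pred_name, "\\^\\d+"), col_name))
  ## presence without a caret signals exponent 1
  if (length(m) == 1L) as.integer(sub(".*\\^", "", m)) else 1L
}
parse_exponent("_1_aa^3", "_1_aa")          # 3
parse_exponent("_1_aax_2_bb^2", "_2_bb")    # 2
parse_exponent("_1_aax_2_bb^2", "_1_aa")    # 1
parse_exponent("_2_bb", "_1_aa")            # 0
```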
References
Reinsch, C. H. (1967). Smoothing by spline functions. Numerische Mathematik, 10(3), 177–183.
See Also
lgspline for the full model fitting
interface.
Examples
## Verification example: 3-predictor model with all term types:
# This example constructs the penalty matrix analytically and then
# verifies selected entries against closed-form hand calculations.
# Users can extend or adapt this example to audit new basis
# expansions.
set.seed(1234)
n <- 2000
t1 <- runif(n, 1, 4) # predictor 1, support [1, 4]
t2 <- runif(n, -2, 3) # predictor 2, support [-2, 3]
t3 <- runif(n, 0.5, 2) # predictor 3, support [0.5, 2]
## Predictor names (must not be substrings of one another)
pn <- c("_1_aa", "_2_bb", "_3_cc")
## Build column names encoding the monomial structure
col_names <- c(
pn, # linear
paste0(pn, "^2"), # quadratic
paste0(pn, "^3"), # cubic
paste0(pn[1], "x", pn[2]), # t1*t2
paste0(pn[1], "x", pn[3]), # t1*t3
paste0(pn[2], "x", pn[3]), # t2*t3
paste0(pn[2], "x", pn[1], "^2"), # t1^2*t2
paste0(pn[1], "x", pn[2], "^2"), # t1*t2^2
paste0(pn[1], "x", pn[2], "x", pn[3]) # t1*t2*t3
)
p_expansions <- length(col_names) # 15 basis functions
## Build the expansion matrix C
C <- cbind(
t1, t2, t3,
t1^2, t2^2, t3^2,
t1^3, t2^3, t3^3,
t1*t2, t1*t3, t2*t3,
t2*t1^2, t1*t2^2,
t1*t2*t3
)
colnames(C) <- col_names
## Compute the penalty matrix
Ls <- get_2ndDerivPenalty(
colnm_expansions = col_names,
C = C,
power1_cols = 1:3,
power2_cols = 4:6,
power3_cols = 7:9,
power4_cols = integer(0),
interaction_single_cols = 10:12,
interaction_quad_cols = 13:14,
triplet_cols = 15,
p_expansions = p_expansions,
select_cols = 1:3
)
## Hand-computed reference values (exact, using true bounds) ---
# Notation: dt1 = 4-1 = 3, dt2 = 3-(-2) = 5, dt3 = 2-0.5 = 1.5
#
# Entry [4,4]: t1^2 diagonal.
# D_1(t1^2) = 2. No other D_v contributes.
# integral (2)^2 dt1 dt2 dt3 = 4 * 3 * 5 * 1.5 = 90
#
# Entry [10,10]: t1*t2 diagonal.
# D_1(t1*t2) = 1 (mixed d^2/dt1 dt2).
# D_2(t1*t2) = 1 (mixed d^2/dt2 dt1).
# integral 1 dt1 dt2 dt3 + integral 1 dt1 dt2 dt3 = 2*3*5*1.5 = 45
#
# Entry [15,15]: t1*t2*t3 diagonal.
# D_1(t1*t2*t3) = t3 + t2 (mixed partials d^2/dt1 dt2 and d^2/dt1 dt3)
# D_2(t1*t2*t3) = t1 + t3 (similarly)
# D_3(t1*t2*t3) = t1 + t2 (similarly)
# Full integral = sum of 3 terms = 120 + 337.5 + 266.25 = 723.75
## Compare (allowing ~0.5% tolerance for data-derived bounds)
cat("Entry [4,4]: analytical =", round(Ls[4,4], 2),
" exact = 90.00\n")
cat("Entry [10,10]: analytical =", round(Ls[10,10], 2),
" exact = 45.00\n")
cat("Entry [15,15]: analytical =", round(Ls[15,15], 2),
" exact = 723.75\n")
cat("Symmetric:", isSymmetric(Ls), "\n")
Wrapper for Integrated Second-Derivative Penalty Computation
Description
Computes the integrated squared second-derivative penalty matrix with optional parallel processing.
Usage
get_2ndDerivPenalty_wrapper(
K,
colnm_expansions,
C,
power1_cols,
power2_cols,
power3_cols,
power4_cols,
interaction_single_cols,
interaction_quad_cols,
triplet_cols,
nonspline_cols,
p_expansions,
parallel,
cl
)
Arguments
K |
Number of partitions ( |
colnm_expansions |
Column names of basis expansions |
C |
Basis expansion matrix used to determine observed predictor ranges |
power1_cols |
Linear term columns |
power2_cols |
Quadratic term columns |
power3_cols |
Cubic term columns |
power4_cols |
Quartic term columns |
interaction_single_cols |
Single interaction columns |
interaction_quad_cols |
Quadratic interaction columns |
triplet_cols |
Triplet interaction columns |
nonspline_cols |
Predictors not treated as spline effects |
p_expansions |
Number of columns in the basis expansion per partition |
parallel |
Logical to enable parallel processing |
cl |
Cluster object for parallel computation |
Value
A p \times p integrated squared second-derivative penalty
matrix.
Compute Constrained GLM Coefficient Estimates via Lagrangian Multipliers
Description
Core estimation function for Lagrangian multiplier smoothing splines. Computes penalized coefficient estimates subject to smoothness constraints (continuity and derivative matching at knots) and optional user-supplied linear equality or inequality constraints. Dispatches to one of three computational paths depending on the model structure:
Path 1. GEE (correlation structure present):
When both Vhalf and VhalfInv are provided, the full
N \times P whitened design
\mathbf{X}^{*} = \mathbf{V}_{\mathrm{perm}}^{-1/2}\mathbf{X} is used
after permuting observations to partition ordering via order_list.
For structured correlation matrices where \mathbf{V}^{-1} -
\mathbf{I} is sparse, a Woodbury-accelerated path is attempted
(cost O(Kp^3 + Pr^2) where r is the off-diagonal rank);
if the rank is too high, the dense O(P^3) path is used as
fallback.
Two sub-paths:
-
Path 1a. Gaussian identity + GEE. See
.get_B_gee_gaussian (dense) and .get_B_gee_woodbury (accelerated).
-
Path 1b. Non-Gaussian GEE. See
.get_B_gee_glm (dense) and .get_B_gee_glm_woodbury (accelerated).
Path 2. Gaussian identity link, no correlation:
Unconstrained estimate \hat{\boldsymbol{\beta}}_k =
\mathbf{G}_k\mathbf{X}_k^{\top}\mathbf{y}_k per partition, then a
single Lagrangian projection. See .get_B_gaussian_nocorr.
Path 3. Non-Gaussian GLM, no correlation:
Partition-wise unconstrained estimates via Newton-Raphson, then
Lagrangian projection. See .get_B_glm_nocorr.
Inequality constraint handling:
When inequality constraints \mathbf{C}^{\top}\boldsymbol{\beta}
\succeq \mathbf{c} are present, the sparsity pattern of
\mathbf{C} is inspected automatically. If every constraint
column has nonzeros in only a single partition block (block-separable),
a partition-wise active-set method is used at cost
O(Kp^3) per iteration. If any constraint spans multiple
partition blocks (e.g., cross-knot monotonicity), the dense
quadprog::solve.QP SQP fallback is invoked.
Usage
get_B(
X,
X_gram,
Lambda,
keep_weighted_Lambda,
unique_penalty_per_partition,
L_partition_list,
A,
Xy,
y,
K,
p_expansions,
R_constraints,
Ghalf,
GhalfInv,
parallel_eigen,
parallel_aga,
parallel_matmult,
parallel_unconstrained,
cl,
chunk_size,
num_chunks,
rem_chunks,
family,
unconstrained_fit_fxn,
iterate,
qp_score_function,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
prevB = NULL,
prevUnconB = NULL,
iter_count = 0,
prev_diff = Inf,
tol,
constraint_value_vectors,
order_list,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
homogenous_weights,
return_G_getB,
blockfit,
just_linear_without_interactions,
Vhalf,
VhalfInv,
...
)
Arguments
X |
List of length |
X_gram |
List of Gram matrices by partition. |
Lambda |
Combined penalty matrix. |
keep_weighted_Lambda |
Logical. |
unique_penalty_per_partition |
Logical. |
L_partition_list |
List of partition-specific penalty matrices. |
A |
Constraint matrix |
Xy |
List of cross-products by partition. |
y |
List of response vectors by partition. |
K |
Integer; number of interior knots. |
p_expansions |
Integer; basis terms per partition. |
R_constraints |
Integer; columns of |
Ghalf |
List of |
GhalfInv |
List of |
parallel_eigen, parallel_aga, parallel_matmult, parallel_unconstrained |
Logical flags. |
cl |
Cluster object. |
chunk_size, num_chunks, rem_chunks |
Parallel distribution parameters. |
family |
GLM family object. |
unconstrained_fit_fxn |
Partition-wise unconstrained estimator. |
iterate |
Logical; iterate for non-canonical links. |
qp_score_function |
Score function for QP steps. |
quadprog |
Logical; use |
qp_Amat, qp_bvec, qp_meq |
Inequality constraint specification
|
prevB, prevUnconB, iter_count, prev_diff |
Retired; ignored. |
tol |
Convergence tolerance. |
constraint_value_vectors |
List encoding nonzero RHS
|
order_list |
Partition-to-data index mapping. |
glm_weight_function |
Function computing working weights. |
schur_correction_function |
Function computing Schur corrections. |
need_dispersion_for_estimation |
Logical. |
dispersion_function |
Dispersion estimator. |
observation_weights |
List of observation weights by partition. |
homogenous_weights |
Logical. |
return_G_getB |
Logical; return |
blockfit, just_linear_without_interactions |
Retired; retained for call-site compatibility. |
Vhalf, VhalfInv |
Square root and inverse square root of the
working correlation matrix in the original observation ordering.
When both are non- |
... |
Passed to fitting, weight, correction, and dispersion functions. |
Value
If return_G_getB = FALSE: a list with B (coefficient
column vectors by partition) and qp_info.
If return_G_getB = TRUE: a list with elements:
- B
List of constrained coefficient column vectors
\tilde{\boldsymbol{\beta}}_k by partition.
- G_list
List with G, Ghalf, GhalfInv, each a list of K+1 matrices.
- qp_info
QP or active-set metadata, including Lagrangian multipliers and active-constraint information when available, or NULL.
Verification Examples for get_B
Description
Simple, self-contained examples that reviewers can run to verify that
get_B produces correct output. These exercise Path 2 (Gaussian,
no correlation) and Path 3 (binomial GLM).
Examples
## Not run:
## Example 1: Path 2 - Gaussian identity, with knots
set.seed(42)
t <- runif(200, -5, 5)
y <- sin(t) + rnorm(200, 0, 0.5)
fit1 <- lgspline(t, y, K = 3, opt = FALSE, wiggle_penalty = 1e-4)
stopifnot(inherits(fit1, "lgspline"))
stopifnot(length(fit1$B) == 4) # K+1 = 4 partitions
cat("Example 1 passed: Gaussian identity, K=3\n")
preds1 <- predict(fit1, new_predictors = rnorm(10))
stopifnot(all(is.finite(preds1)))
cat(" Predictions finite: OK\n")
## Example 2: Path 2 - Gaussian identity, K=0 (no constraints)
fit2 <- lgspline(t, y, K = 0, opt = FALSE, wiggle_penalty = 1e-4)
stopifnot(inherits(fit2, "lgspline"))
stopifnot(length(fit2$B) == 1)
preds2 <- predict(fit2, new_predictors = rnorm(10))
stopifnot(all(is.finite(preds2)))
cat("Example 2 passed: Gaussian identity, K=0\n")
## Example 3: Path 3 - Binomial GLM
y_bin <- rbinom(200, 1, plogis(sin(t)))
fit3 <- lgspline(t, y_bin, K = 2, family = binomial(),
opt = FALSE, wiggle_penalty = 1e-3)
stopifnot(inherits(fit3, "lgspline"))
preds3 <- predict(fit3, new_predictors = rnorm(10))
stopifnot(all(preds3 >= 0 & preds3 <= 1))
cat("Example 3 passed: Binomial GLM, K=2\n")
## Example 4: Path 2 with QP constraints (monotonic increase)
t_sorted <- sort(runif(100, -3, 3))
y_mono <- t_sorted + 0.5 * sin(t_sorted) + rnorm(100, 0, 0.3)
fit4 <- lgspline(t_sorted, y_mono, K = 2,
qp_monotonic_increase = TRUE,
opt = FALSE, wiggle_penalty = 1e-4)
preds4 <- predict(fit4, new_predictors = cbind(sort(rnorm(50))))
stopifnot(all(diff(preds4) >= -sqrt(.Machine$double.eps)))
cat("Example 4 passed: Monotonic increase QP constraint\n")
## Example 5: Path 2 with range constraints
fit5 <- lgspline(t, y, K = 3,
qp_range_lower = -2, qp_range_upper = 2,
opt = FALSE, wiggle_penalty = 1e-4)
stopifnot(all(fit5$ytilde >= -2 - 0.01))
stopifnot(all(fit5$ytilde <= 2 + 0.01))
cat("Example 5 passed: Range constraints\n")
## Example 6: Multi-predictor
set.seed(123)
x1 <- rnorm(300)
x2 <- rnorm(300)
y6 <- sin(x1) + cos(x2) + rnorm(300, 0, 0.5)
fit6 <- lgspline(cbind(x1, x2), y6, K = 4,
opt = FALSE, wiggle_penalty = 1e-5)
stopifnot(inherits(fit6, "lgspline"))
stopifnot(fit6$q_predictors == 2)
cat("Example 6 passed: 2D predictor, K=4\n")
## Example 7: Coefficient consistency (determinism)
set.seed(999)
t7 <- runif(150, -4, 4)
y7 <- 2 * cos(t7) + rnorm(150, 0, 0.4)
fit7a <- lgspline(t7, y7, K = 2, opt = FALSE, wiggle_penalty = 1e-5)
set.seed(999)
fit7b <- lgspline(t7, y7, K = 2, opt = FALSE, wiggle_penalty = 1e-5)
max_diff <- max(abs(unlist(fit7a$B) - unlist(fit7b$B)))
stopifnot(max_diff < 1e-12)
cat("Example 7 passed: Deterministic coefficient reproduction\n")
cat("\nAll verification examples passed.\n")
## End(Not run)
Efficiently Construct U Matrix
Description
Efficiently Construct U Matrix
Usage
get_U(G, A, K, p_expansions, R_constraints)
Arguments
G |
List of G matrices ( |
A |
Constraint matrix ( |
K |
Number of partitions minus 1 ( |
p_expansions |
Number of columns per partition |
R_constraints |
Number of constraint columns |
Details
Computes \textbf{U} = \textbf{I} - \textbf{G}\textbf{A}(\textbf{A}^{T}\textbf{G}\textbf{A})^{-1}\textbf{A}^{T} efficiently, avoiding unnecessary
multiplication of blocks of \textbf{G} with all-0 elements.
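For reference, the formula can be evaluated densely in a few lines. This is illustrative; get_U itself avoids forming the full products by exploiting the block-diagonal structure of G and the sparsity of A.

```r
set.seed(2)
p <- 3; K <- 1; P <- (K + 1) * p   # two partitions, three columns each
## block-diagonal G with positive-definite blocks
G <- matrix(0, P, P)
G[1:p, 1:p]         <- crossprod(matrix(rnorm(20 * p), 20, p)) + diag(p)
G[p + 1:p, p + 1:p] <- crossprod(matrix(rnorm(20 * p), 20, p)) + diag(p)
## one equality constraint tying the first coefficient of each partition
A <- matrix(0, P, 1)
A[1, 1] <- 1; A[p + 1, 1] <- -1
## U = I - G A (A' G A)^{-1} A'
U <- diag(P) - G %*% A %*% solve(crossprod(A, G %*% A), t(A))
max(abs(crossprod(A, U)))   # A' U = 0: projected coefficients satisfy the constraint
max(abs(U %*% U - U))       # U is idempotent
```

The two printed quantities verify the defining properties of the constraint projection: any vector multiplied by U satisfies the equality constraints exactly, and projecting twice changes nothing.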
Value
\textbf{U} matrix for constraints
Get Centers for Partitioning
Description
Get Centers for Partitioning
Usage
get_centers(
data,
K,
cluster_args,
cluster_on_indicators,
data_already_processed = FALSE
)
Arguments
data |
Matrix of predictor data (already processed: binary/excluded
columns zeroed out by the caller when |
K |
Number of partitions minus 1 ( |
cluster_args |
List with custom centers and kmeans args |
cluster_on_indicators |
Include binary predictors in clustering |
data_already_processed |
Logical; when TRUE the caller has already
zeroed out binary / excluded columns, so |
Details
Returns partition centers via:
1. Custom supplied centers if provided as a valid (K+1) \times q matrix
2. kmeans clustering on all non-spline variables if cluster_on_indicators=TRUE
or if data_already_processed=TRUE
3. kmeans clustering excluding binary variables if cluster_on_indicators=FALSE
and data_already_processed=FALSE
Value
Matrix of cluster centers
Generate Interaction Variable Patterns
Description
Generates all possible interaction patterns for 2 or 3 variables. This is used in part for identifying which interactions and expansions to exclude (provided to "exclude_these_expansions" argument of lgspline) based on formulas provided.
Usage
get_interaction_patterns(vars)
Arguments
vars |
Character vector of variable names |
Value
Character vector of interaction pattern strings for 2- or 3-variable
inputs, or NULL otherwise.
Generate Design Matrix with Polynomial and Interaction Terms
Description
Internal function for creating a design matrix containing polynomial expansions and interaction terms for predictor variables. Supports customizable term generation including polynomial degrees up to quartic terms, interaction types, and selective term exclusion.
Column names take on the form "_v_" for linear terms, "_v_^d" for polynomial powers up to d = 4, and "_v_x_w_" for interactions between variables v and w, where v and w are column indices of the input predictor matrix.
The custom_basis_fxn argument, if supplied, requires the same arguments
as this function, in the same order, excluding the argument
"custom_basis_fxn" itself.
Usage
get_polynomial_expansions(
predictors,
numerics,
just_linear_with_interactions,
just_linear_without_interactions,
exclude_interactions_for = NULL,
include_quadratic_terms = TRUE,
include_cubic_terms = TRUE,
include_quartic_terms = FALSE,
include_2way_interactions = TRUE,
include_3way_interactions = TRUE,
include_quadratic_interactions = FALSE,
exclude_these_expansions = NULL,
custom_basis_fxn = NULL,
...
)
Arguments
predictors |
Numeric matrix of predictor variables |
numerics |
Integer vector; column indices for variables to expand as polynomials |
just_linear_with_interactions |
Integer vector; column indices for variables to keep linear but allow interactions |
just_linear_without_interactions |
Integer vector; column indices for variables to keep linear without interactions |
exclude_interactions_for |
Integer vector; column indices to exclude from all interactions |
include_quadratic_terms |
Logical; whether to include squared terms (default TRUE) |
include_cubic_terms |
Logical; whether to include cubic terms (default TRUE) |
include_quartic_terms |
Logical; whether to include 4th degree terms (default FALSE) |
include_2way_interactions |
Logical; whether to include two-way interactions (default TRUE) |
include_3way_interactions |
Logical; whether to include three-way interactions (default TRUE) |
include_quadratic_interactions |
Logical; whether to include interactions with squared terms (default FALSE) |
exclude_these_expansions |
Character vector; names of specific terms to exclude from final matrix |
custom_basis_fxn |
Function; optional custom basis expansion function that accepts all arguments listed here except itself |
... |
Additional arguments passed to |
Value
Matrix with columns for intercept, polynomial terms, and specified interactions
Compute Gram Matrix
Description
Calculates X^T * X for the input matrix X
Usage
gramMatrix(X)
Arguments
X |
Input matrix |
Value
Gram matrix (X^T * X)
Compute Cox Observed Information Matrix
Description
Negative Hessian of the Cox partial log-likelihood under the Breslow approximation for tied event times. Data must be sorted by ascending event time.
Usage
info_cox(X, eta, status, y = NULL, weights = 1)
Arguments
X |
Design matrix (N x p), sorted by ascending event time. |
eta |
Linear predictor vector. |
status |
Event indicator (1 = event, 0 = censored). |
y |
Optional numeric vector of observed event/censor times, same length
and order as |
weights |
Observation weights (default 1). |
Details
Under the Breslow approximation, the observed information is
\sum_g d_g^{(w)}
\Bigl[\frac{S_{2g}}{S_{0g}} -
\Bigl(\frac{S_{1g}}{S_{0g}}\Bigr)
\Bigl(\frac{S_{1g}}{S_{0g}}\Bigr)^{\top}\Bigr]
where
S_{0g} = \sum_{j \in R_g} w_j e^{\eta_j},
S_{1g} = \sum_{j \in R_g} w_j e^{\eta_j}\mathbf{x}_j,
and S_{2g} = \sum_{j \in R_g} w_j e^{\eta_j}\mathbf{x}_j\mathbf{x}_j^{\top}.
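The sum above translates directly into R. The following is a plain reference implementation for checking (hypothetical name info_cox_ref, not the package's compiled routine), assuming rows are sorted by ascending event time.

```r
info_cox_ref <- function(X, eta, status, y, weights = rep(1, nrow(X))) {
  I_mat <- matrix(0, ncol(X), ncol(X))
  w_exp <- weights * exp(eta)
  for (tg in unique(y[status == 1])) {
    risk <- which(y >= tg)                        # risk set R_g
    d_g  <- sum(weights[y == tg & status == 1])   # weighted events at tg
    S0 <- sum(w_exp[risk])
    S1 <- colSums(w_exp[risk] * X[risk, , drop = FALSE])
    S2 <- crossprod(sqrt(w_exp[risk]) * X[risk, , drop = FALSE])
    I_mat <- I_mat + d_g * (S2 / S0 - tcrossprod(S1 / S0))
  }
  I_mat
}
## tiny check: three events, eta = 0, X = (1, 2, 4); by hand the
## risk-set terms are 21/3 - (7/3)^2 = 14/9, 20/2 - (6/2)^2 = 1, and 0
info_cox_ref(X = cbind(c(1, 2, 4)), eta = rep(0, 3),
             status = c(1, 1, 1), y = c(1, 2, 3))   # 14/9 + 1
```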
Value
Symmetric p x p observed information matrix.
Compute Negative Binomial Observed Information Matrix
Description
Negative Hessian of the NB2 log-likelihood with respect to
\boldsymbol{\beta} under the log link.
Usage
info_negbin(X, y, mu, theta, weights = 1)
Arguments
X |
Design matrix (N x p). |
y |
Response vector of observed counts. |
mu |
Mean vector. |
theta |
Shape parameter. |
weights |
Observation weights (default 1). |
Details
The expected information under log link is
\mathbf{I} = \mathbf{X}^{\top}\mathbf{W}\mathbf{X}
where W_{ii} = w_i \mu_i \theta / (\theta + \mu_i).
This is the IRLS weight for NB2 with log link.
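In R, this expected information is a one-line weighted cross-product. The following is an illustrative reference version (hypothetical name info_negbin_ref), not the package's compiled implementation.

```r
info_negbin_ref <- function(X, mu, theta, weights = 1) {
  W <- weights * mu * theta / (theta + mu)   # NB2 IRLS weights under log link
  crossprod(sqrt(W) * X)                     # X' W X
}
set.seed(3)
X <- cbind(1, rnorm(100))
mu <- exp(0.5 + 0.2 * X[, 2])
I_mat <- info_negbin_ref(X, mu, theta = 2)
## as theta -> Inf the weights approach mu, recovering the Poisson information
I_pois <- crossprod(sqrt(mu) * X)
```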
Value
Symmetric p x p observed information matrix.
Generic for Numerical Integration
Description
S3 generic that dispatches to integrate.lgspline for fitted
lgspline objects and falls back to integrate
for ordinary functions.
Usage
integrate(f, ...)
## Default S3 method:
integrate(f, ...)
Arguments
f |
A fitted model object or a function. |
... |
Arguments passed to methods. |
Definite Integral of a Fitted lgspline
Description
Given a fitted lgspline object, computes the definite integral of
the fitted surface over a rectangular domain using Gauss–Legendre
quadrature on predict().
Usage
## S3 method for class 'lgspline'
integrate(
f,
lower,
upper,
vars = NULL,
initial_values = NULL,
B_predict = NULL,
link_scale = FALSE,
n_quad = 50L,
...
)
Arguments
f |
A fitted |
lower |
Numeric vector of lower bounds, one per integration variable. Scalar values are recycled. |
upper |
Numeric vector of upper bounds, one per integration variable. Scalar values are recycled. |
vars |
Default: NULL. Character or integer vector identifying which predictor(s) to integrate over. When NULL all numeric predictors are integrated simultaneously. |
initial_values |
Default: NULL. Numeric vector of length |
B_predict |
Default: NULL. Optional list of coefficient vectors, one per partition. When NULL the fitted coefficients are used. |
link_scale |
Default: FALSE. Logical; when TRUE the integral is
computed on the link (linear predictor) scale |
n_quad |
Default: 50. Number of Gauss–Legendre nodes per integration dimension. |
... |
Additional arguments (currently unused; present for S3 method compatibility). |
Value
A numeric scalar: the estimated definite integral.
Method
The integration domain is discretised into a tensor-product grid of
Gauss–Legendre quadrature nodes. Predicted values at each node come
from the model's predict() method, which correctly handles
partition assignment and piecewise polynomial evaluation. The integral
is the weighted sum
\int_{a_1}^{b_1} \cdots \int_{a_d}^{b_d}
\hat{f}(\mathbf{t})\,\mathrm{d}t_1 \cdots \mathrm{d}t_d
\;\approx\;
\sum_{i=1}^{M} w_i\,\hat{f}(\mathbf{t}_i)
where M = n_{\mathrm{quad}}^d and each weight incorporates the
Jacobian (b_j - a_j)/2 for the affine map from [-1, 1] to
[a_j, b_j]. Nodes and weights on [-1, 1] are computed via
the Golub–Welsch algorithm (eigenvalues of the symmetric tridiagonal
Jacobi matrix).
For smooth polynomials, 30–50 nodes per dimension is typically
sufficient; highly partitioned models (large K) may benefit from
more. Total evaluation points scale as n_{\mathrm{quad}}^d, so
problems with d \ge 4 may require reducing n_quad.
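The Golub–Welsch construction mentioned above fits in a few lines of base R; this is a sketch for reference (the package computes nodes internally).

```r
gauss_legendre <- function(n) {
  k <- seq_len(n - 1)
  off <- k / sqrt(4 * k^2 - 1)   # off-diagonal of the Jacobi matrix for Legendre
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- off
  J[cbind(k + 1, k)] <- off
  e <- eigen(J, symmetric = TRUE)
  ## nodes are the eigenvalues; weights are 2 * (first eigenvector component)^2
  ord <- order(e$values)
  list(nodes = e$values[ord], weights = (2 * e$vectors[1, ]^2)[ord])
}
gl <- gauss_legendre(10)
sum(gl$weights)                # 2, the length of [-1, 1]
sum(gl$weights * gl$nodes^2)   # 2/3, exact since the rule is exact to degree 2n - 1
```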
Integration scale
By default (link_scale = FALSE), integration is on the response
scale \mu = g^{-1}(\eta). Setting link_scale = TRUE
integrates the linear predictor \eta = \mathbf{x}^{\top}\boldsymbol{\beta}
directly, which is useful when the quantity of interest is the area
under the link-transformed surface rather than the response. For the
identity link the two coincide.
Examples
## 1-D: integral of fitted sin(t) over [-pi, pi] should be near 0
set.seed(1234)
t <- seq(-pi, pi, length.out = 1000)
y <- sin(t) + rnorm(length(t), 0, 0.01)
fit <- lgspline(t, y, K = 4, opt = FALSE)
integrate(fit, lower = -pi, upper = pi)
## Base R integrate still works as expected
integrate(sin, lower = -pi, upper = pi)
## 2-D: volume under fitted volcano surface
data(volcano)
vlong <- cbind(
rep(seq_len(nrow(volcano)), ncol(volcano)),
rep(seq_len(ncol(volcano)), each = nrow(volcano)),
as.vector(volcano)
)
colnames(vlong) <- c("Length", "Width", "Height")
fit_v <- lgspline(vlong[, 1:2], vlong[, 3], K = 18,
include_quadratic_interactions = TRUE, opt = FALSE)
integrate(fit_v, lower = c(1, 1), upper = c(87, 61))
Matrix Inversion with Fallback Methods
Description
Attempts matrix inversion using multiple methods, falling back to more robust approaches if standard inversion fails.
Usage
invert(mat, include_warnings = FALSE)
Arguments
mat |
Square matrix to invert |
include_warnings |
Logical; whether to issue a warning when fallback methods are used or all inversions fail (default FALSE). |
Details
Tries methods in order:
1. Cholesky decomposition via chol2inv(chol(...)) for symmetric
positive-definite matrices (fastest for SPD)
2. Direct inversion using armaInv() as first fallback
3. Generalized inverse using eigendecomposition with small ridge
4. Returns identity matrix if all methods fail, with optional warning
For eigendecomposition, uses a small ridge penalty (1e-16) for stability and
zeroes eigenvalues below machine precision.
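The fallback chain can be mimicked in base R. This is a sketch: solve stands in for the package's compiled armaInv, and the last step is the eigendecomposition pseudoinverse described above.

```r
invert_sketch <- function(mat) {
  out <- tryCatch(chol2inv(chol(mat)), error = function(e) NULL)  # SPD fast path
  if (is.null(out)) out <- tryCatch(solve(mat), error = function(e) NULL)
  if (is.null(out)) {
    ## generalized inverse: symmetrize, add tiny ridge, drop near-zero eigenvalues
    e <- eigen((mat + t(mat)) / 2 + diag(1e-16, nrow(mat)), symmetric = TRUE)
    keep <- abs(e$values) > .Machine$double.eps
    V <- e$vectors[, keep, drop = FALSE]
    out <- V %*% (t(V) / e$values[keep])
  }
  out
}
A <- matrix(c(4, 2, 2, 4), 2, 2)
B <- matrix(1, 2, 2)                        # singular: first two paths fail
max(abs(invert_sketch(A) %*% A - diag(2)))  # ~0
max(abs(B %*% invert_sketch(B) %*% B - B))  # ~0 (g-inverse property)
```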
Value
Inverted matrix, or the identity matrix if all inversion attempts fail
Examples
## Well-conditioned matrix
A <- matrix(c(4,2,2,4), 2, 2)
invert(A) %*% A
## Singular matrix falls back to Moore-Penrose generalized inverse
B <- matrix(c(1,1,1,1), 2, 2)
invert(B) %*% B
Test if Vector is Binary
Description
Test if Vector is Binary
Usage
is_binary(x)
Arguments
x |
Vector to test |
Value
Logical indicating if x has at most 2 unique values
Expand Matrix into Partition Lists Based on Knot Boundaries
Description
Takes an input N \times p matrix of polynomial expansions and returns a list of
length K+1, isolating the rows of the input assigned to each partition.
Usage
knot_expand_list(partition_codes, partition_bounds, N, mat, K)
Arguments
partition_codes |
Numeric vector; values determining partition assignment for each row |
partition_bounds |
Numeric vector; ordered knot locations defining partition boundaries |
N |
Integer; number of rows in input matrix |
mat |
Numeric matrix; data to be partitioned |
K |
Integer; number of interior knots (resulting in |
Value
List of length K+1, each element containing the submatrix for that partition
Compute Leave-One-Out Cross-Validated Predictions for Gaussian Response/Identity Link under Constraint
Description
Computes the leave-one-out cross-validated predictions from a model fit, assuming a Gaussian-distributed response with identity link.
The closed-form LOO prediction for observation i is \hat{y}_{(-i)} = y_i -
\frac{1}{1 - H_{ii}}(y_i - \hat{y}_i), where
\mathbf{H} is the effective hat matrix under
smoothing constraints, adjusted for weights and correlation structure if
present.
Observations with leverage at or above leverage_threshold are flagged
in a warning, since extreme hat values can make the shortcut numerically
unreliable. The default leverage_threshold = 100 is intentionally
permissive, so users who want diagnostic warnings for large H_{ii}
should set a smaller threshold explicitly.
For related discussion of prediction-sum-of-squares calculations under linear restrictions, see Tarpey (2000), who studies the PRESS statistic for restricted least squares. That setting is closely related to the constraint-adjusted hat-matrix shortcut used here.
Usage
leave_one_out(model_fit, leverage_threshold = 100)
Arguments
model_fit |
A fitted lgspline model object. |
leverage_threshold |
Numeric scalar. Observations with
|
Value
A vector of leave-one-out cross-validated predictions
References
Tarpey, T. (2000). A note on the prediction sum of squares statistic for restricted least squares. The American Statistician, 54(2), 116–118. doi:10.2307/2686028
Examples
## Basic usage with Gaussian response, computing PRESS
set.seed(1234)
t <- rnorm(50)
y <- sin(t) + rnorm(50, 0, .25)
model_fit <- lgspline(t, y)
loo <- leave_one_out(model_fit)
press <- mean((y - loo)^2, na.rm = TRUE)
plot(loo, y,
main = "LOO Cross-Validation Prediction vs. Observed Response",
xlab = 'Prediction', ylab = 'Response')
abline(0, 1)
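As an additional check, the closed-form shortcut can be verified against brute-force leave-one-out refitting for ordinary least squares, a special case where H = X (X'X)^{-1} X'. This uses lm for the refits and is illustrative, not package code.

```r
set.seed(4)
n <- 30
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x)
H <- X %*% solve(crossprod(X), t(X))        # hat matrix for OLS
fitted_vals <- c(H %*% y)
## shortcut: yhat_(-i) = y_i - (y_i - yhat_i) / (1 - H_ii)
loo_shortcut <- y - (y - fitted_vals) / (1 - diag(H))
## brute force: refit n times, predict the held-out point
loo_brute <- sapply(seq_len(n), function(i) {
  sum(coef(lm(y[-i] ~ x[-i])) * c(1, x[i]))
})
max(abs(loo_shortcut - loo_brute))   # numerically zero
```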
Fit Lagrangian Multiplier Smoothing Splines
Description
A comprehensive software package for fitting a variant of smoothing splines as a constrained optimization problem, avoiding the need to algebraically disentangle a spline basis after fitting, and allowing for interpretable interactions and non-spline effects to be included.
lgspline fits piecewise polynomial regression splines constrained to be smooth where
they meet, penalized by the squared, integrated, second-derivative of the
estimated function with respect to predictors, using a monomial basis.
The method of Lagrangian multipliers is used to derive a polynomial regression spline that enforces the following smoothing constraints:
Equivalent fitted values at knots
Equivalent first derivatives at knots, with respect to predictors
Equivalent second derivatives at knots, with respect to predictors
The coefficients are penalized by a closed-form of the traditional cubic smoothing spline penalty, as well as tunable modifications that allow for unique penalization of multiple predictors and partitions.
This package supports model fitting for multiple spline and non-spline effects, GLM families, Weibull accelerated failure time (AFT) models, Cox proportional hazards models, negative binomial regression, arbitrary correlation structures, shape constraints, and extensive customization for user-defined models and constraints.
In addition, parallel processing capabilities and comprehensive tools for visualization and for both frequentist and Bayesian inference are provided.
Usage
lgspline(
predictors = NULL,
y = NULL,
formula = NULL,
response = NULL,
standardize_response = TRUE,
standardize_predictors_for_knots = TRUE,
standardize_expansions_for_fitting = TRUE,
family = gaussian(),
glm_weight_function = function(mu, y, order_indices, family, dispersion,
observation_weights, ...) {
if (any(!is.null(observation_weights))) {
family$variance(mu) * observation_weights
}
else {
family$variance(mu)
}
},
schur_correction_function = function(X, y, B, dispersion, order_list, K, family,
observation_weights, ...) {
lapply(1:(K + 1), function(k) 0)
},
need_dispersion_for_estimation = FALSE,
dispersion_function = function(mu, y, order_indices, family, observation_weights,
VhalfInv, ...) {
if (!is.null(VhalfInv)) {
VhalfInv <-
VhalfInv[order_indices, order_indices]
c(mean((tcrossprod(VhalfInv, t(y -
mu)))^2/family$variance(mu)))
}
else {
c(mean((y -
mu)^2/family$variance(mu)))
}
},
K = NULL,
custom_knots = NULL,
cluster_on_indicators = FALSE,
make_partition_list = NULL,
previously_tuned_penalties = NULL,
smoothing_spline_penalty = NULL,
opt = TRUE,
use_custom_bfgs = TRUE,
delta = NULL,
tol = 10 * sqrt(.Machine$double.eps),
initial_wiggle = c(1e-10, 1e-05, 0.1),
initial_flat = c(0.1, 10),
wiggle_penalty = 2e-07,
flat_ridge_penalty = 0.5,
unique_penalty_per_partition = TRUE,
unique_penalty_per_predictor = TRUE,
meta_penalty = 1e-08,
predictor_penalties = NULL,
partition_penalties = NULL,
include_quadratic_terms = TRUE,
include_cubic_terms = TRUE,
include_quartic_terms = NULL,
include_2way_interactions = TRUE,
include_3way_interactions = TRUE,
include_quadratic_interactions = FALSE,
offset = c(),
just_linear_with_interactions = NULL,
just_linear_without_interactions = NULL,
exclude_interactions_for = NULL,
exclude_these_expansions = NULL,
custom_basis_fxn = NULL,
include_constrain_fitted = TRUE,
include_constrain_first_deriv = TRUE,
include_constrain_second_deriv = TRUE,
include_constrain_interactions = TRUE,
cl = NULL,
chunk_size = NULL,
parallel_eigen = TRUE,
parallel_trace = FALSE,
parallel_aga = FALSE,
parallel_matmult = FALSE,
parallel_unconstrained = TRUE,
parallel_find_neighbors = FALSE,
parallel_penalty = FALSE,
parallel_make_constraint = FALSE,
unconstrained_fit_fxn = unconstrained_fit_default,
keep_weighted_Lambda = FALSE,
iterate_tune = TRUE,
iterate_final_fit = TRUE,
blockfit = TRUE,
qp_score_function = function(X, y, mu, order_list, dispersion, VhalfInv,
observation_weights, ...) {
if (!is.null(observation_weights)) {
crossprod(X, cbind((y - mu) * observation_weights))
}
else {
crossprod(X, cbind(y - mu))
}
},
qp_observations = NULL,
qp_Amat = NULL,
qp_bvec = NULL,
qp_meq = 0,
qp_positive_derivative = FALSE,
qp_negative_derivative = FALSE,
qp_positive_2ndderivative = FALSE,
qp_negative_2ndderivative = FALSE,
qp_monotonic_increase = FALSE,
qp_monotonic_decrease = FALSE,
qp_range_upper = NULL,
qp_range_lower = NULL,
qp_Amat_fxn = NULL,
qp_bvec_fxn = NULL,
qp_meq_fxn = NULL,
constraint_values = cbind(),
constraint_vectors = cbind(),
return_G = TRUE,
return_Ghalf = TRUE,
return_U = TRUE,
estimate_dispersion = TRUE,
unbias_dispersion = NULL,
return_varcovmat = TRUE,
exact_varcovmat = FALSE,
return_lagrange_multipliers = FALSE,
custom_penalty_mat = NULL,
cluster_args = c(custom_centers = NA, nstart = 10),
dummy_dividor = 1.2345672152894e-22,
dummy_adder = 2.234567210529e-18,
verbose = FALSE,
verbose_tune = FALSE,
dummy_fit = FALSE,
auto_encode_factors = TRUE,
observation_weights = NULL,
do_not_cluster_on_these = c(),
neighbor_tolerance = 1 + 1e-08,
null_constraint = NULL,
critical_value = qnorm(1 - 0.05/2),
data = NULL,
weights = NULL,
no_intercept = FALSE,
correlation_id = NULL,
spacetime = NULL,
correlation_structure = NULL,
VhalfInv = NULL,
Vhalf = NULL,
VhalfInv_fxn = NULL,
Vhalf_fxn = NULL,
VhalfInv_par_init = c(),
REML_grad = NULL,
custom_VhalfInv_loss = NULL,
VhalfInv_logdet = NULL,
include_warnings = TRUE,
penalty_args = NULL,
tuning_args = NULL,
expansion_args = NULL,
constraint_args = NULL,
qp_args = NULL,
parallel_args = NULL,
covariance_args = NULL,
return_args = NULL,
glm_args = NULL,
...
)
Arguments
predictors |
Default: NULL. Numeric matrix or data frame of predictor variables, or a formula when using the formula interface. |
y |
Default: NULL. Numeric response variable vector. |
formula |
Default: NULL. Optional statistical formula for model specification,
supporting |
response |
Default: NULL. Alternative name for response variable. |
standardize_response |
Default: TRUE. Logical indicator controlling whether the response variable should be centered and scaled before model fitting. Only offered for identity link functions. |
standardize_predictors_for_knots |
Default: TRUE. Logical flag controlling
whether predictors are internally standardized for partitioning / knot
placement. The exact transformation is handled inside
|
standardize_expansions_for_fitting |
Default: TRUE. Logical switch to
standardize polynomial basis expansions during model fitting. Design matrices,
variance-covariance matrices, and coefficients are backtransformed after fitting.
|
family |
Default: |
glm_weight_function |
Default: function returning |
schur_correction_function |
Default: function returning list of zeros.
Computes Schur complements |
need_dispersion_for_estimation |
Default: FALSE. Logical indicator specifying whether a dispersion parameter is required for coefficient estimation (e.g. Weibull AFT). |
dispersion_function |
Default: function returning mean squared residuals. Custom function for estimating the exponential dispersion parameter. |
K |
Default: NULL. Integer specifying the number of knot locations. Intuitively, total partitions minus 1. |
custom_knots |
Default: NULL. Optional matrix providing user-specified knot locations in 1-D. |
cluster_on_indicators |
Default: FALSE. Logical flag for whether indicator variables should be used for clustering knot locations. |
make_partition_list |
Default: NULL. Optional list allowing direct specification
of custom partition assignments. The |
previously_tuned_penalties |
Default: NULL. Optional list of pre-computed penalty components from a previous model fit. |
smoothing_spline_penalty |
Default: NULL. Optional custom smoothing spline penalty matrix. |
opt |
Default: TRUE. Logical switch controlling automatic penalty optimization via generalized cross-validation. |
use_custom_bfgs |
Default: TRUE. Selects between a native damped-BFGS implementation with closed-form gradients (TRUE) and base R's BFGS with finite-difference gradients (FALSE). |
delta |
Default: NULL. Numeric pseudocount for stabilizing optimization in non-identity link function scenarios. |
tol |
Default: |
initial_wiggle |
Default: |
initial_flat |
Default: |
wiggle_penalty |
Default: 2e-7. Numeric penalty on the integrated squared second derivative, governing function smoothness. |
flat_ridge_penalty |
Default: 0.5. Numeric flat ridge penalty for intercepts and
linear terms only. Multiplied by |
unique_penalty_per_partition |
Default: TRUE. Logical flag allowing penalty magnitude to differ across partitions. |
unique_penalty_per_predictor |
Default: TRUE. Logical flag allowing penalty magnitude to differ between predictors. |
meta_penalty |
Default: 1e-8. Numeric regularization coefficient for predictor- and partition-specific penalties during tuning. On the raw scale, the implemented meta-penalty shrinks these penalty multipliers toward 1; the wiggle penalty receives only a tiny stabilizing penalty by default. |
predictor_penalties |
Default: NULL. Optional vector of custom penalties per predictor, on the raw (positive) scale. |
partition_penalties |
Default: NULL. Optional vector of custom penalties per partition, on the raw (positive) scale. |
include_quadratic_terms |
Default: TRUE. Logical switch to include squared predictor terms. |
include_cubic_terms |
Default: TRUE. Logical switch to include cubic predictor terms. |
include_quartic_terms |
Default: NULL. Logical switch to include quartic terms; when NULL, set to FALSE for a single predictor and TRUE otherwise. Highly recommended for multi-predictor models to avoid over-specified constraints. |
include_2way_interactions |
Default: TRUE. Logical switch for linear two-way interactions. |
include_3way_interactions |
Default: TRUE. Logical switch for three-way interactions. |
include_quadratic_interactions |
Default: FALSE. Logical switch for linear-quadratic interaction terms. |
offset |
Default: Empty vector. Column indices/names to include as offsets. Coefficients for offset terms are automatically constrained to 1. |
just_linear_with_interactions |
Default: NULL. Integer or character vector specifying predictors to retain as linear terms while still allowing interactions. |
just_linear_without_interactions |
Default: NULL. Integer or character vector specifying predictors to retain only as linear terms without interactions. Eligible for blockfitting. |
exclude_interactions_for |
Default: NULL. Integer or character vector of predictors to exclude from all interaction terms. |
exclude_these_expansions |
Default: NULL. Character vector of basis expansions to
exclude. Named columns of data, or in the form |
custom_basis_fxn |
Default: NULL. Optional user-defined function for custom basis
expansions. See |
include_constrain_fitted |
Default: TRUE. Logical switch to constrain fitted values at knot points. |
include_constrain_first_deriv |
Default: TRUE. Logical switch to constrain first derivatives at knot points. |
include_constrain_second_deriv |
Default: TRUE. Logical switch to constrain second derivatives at knot points. |
include_constrain_interactions |
Default: TRUE. Logical switch to constrain interaction terms at knot points. |
cl |
Default: NULL. Parallel processing cluster object
(use |
chunk_size |
Default: NULL. Integer specifying custom chunk size for parallel processing. |
parallel_eigen |
Default: TRUE. Logical flag for parallel eigenvalue decomposition. |
parallel_trace |
Default: FALSE. Logical flag for parallel trace computation. |
parallel_aga |
Default: FALSE. Logical flag for parallel |
parallel_matmult |
Default: FALSE. Logical flag for parallel block-diagonal matrix multiplication. |
parallel_unconstrained |
Default: TRUE. Logical flag for parallel unconstrained MLE for non-identity-link-Gaussian models. |
parallel_find_neighbors |
Default: FALSE. Logical flag for parallel neighbor identification. |
parallel_penalty |
Default: FALSE. Logical flag for parallel penalty matrix construction. |
parallel_make_constraint |
Default: FALSE. Logical flag for parallel constraint matrix generation. |
unconstrained_fit_fxn |
Default: |
keep_weighted_Lambda |
Default: FALSE. Logical flag to retain GLM weights in penalty constraints using Tikhonov parameterization. Advised for non-canonical GLMs. |
iterate_tune |
Default: TRUE. Logical switch for iterative optimization during penalty tuning. |
iterate_final_fit |
Default: TRUE. Logical switch for iterative optimization in final model fitting. |
blockfit |
Default: TRUE. Logical switch for backfitting with mixed spline and
non-interactive linear terms. Requires flat columns, |
qp_score_function |
Default: |
qp_observations |
Default: NULL. Numeric vector of observation indices at which built-in QP constraints are evaluated. Useful for reducing the size of the constrained system. |
qp_Amat |
Default: NULL. Optional pre-built QP constraint matrix.
In the current pipeline its presence marks QP handling as active, but the
built-in constructor does not merge it into the assembled constraint set;
use |
qp_bvec |
Default: NULL. Optional pre-built QP right-hand side paired
with |
qp_meq |
Default: 0. Optional number of equality constraints paired
with |
qp_positive_derivative |
Default: FALSE. Constrain function to have positive first derivatives. Accepts: |
qp_negative_derivative |
Default: FALSE. Constrain function to have negative first derivatives. Same input types as |
qp_positive_2ndderivative |
Default: FALSE. Constrain function to have positive (convex) second derivatives. Same input types as |
qp_negative_2ndderivative |
Default: FALSE. Constrain function to have negative (concave) second derivatives. Same input types as |
qp_monotonic_increase |
Default: FALSE. Logical only. Constrain fitted values to be monotonically increasing in observation order. |
qp_monotonic_decrease |
Default: FALSE. Logical only. Constrain fitted values to be monotonically decreasing in observation order. |
qp_range_upper |
Default: NULL. Numeric upper bound for constrained fitted values. |
qp_range_lower |
Default: NULL. Numeric lower bound for constrained fitted values. |
qp_Amat_fxn |
Default: NULL. Custom function generating Amat. |
qp_bvec_fxn |
Default: NULL. Custom function generating bvec. |
qp_meq_fxn |
Default: NULL. Custom function generating meq. |
constraint_values |
Default: |
constraint_vectors |
Default: |
return_G |
Default: TRUE. Logical switch to return the unscaled unconstrained
variance-covariance matrix |
return_Ghalf |
Default: TRUE. Logical switch to return
|
return_U |
Default: TRUE. Logical switch to return the constraint projection
matrix |
estimate_dispersion |
Default: TRUE. Logical flag to estimate dispersion after fitting. |
unbias_dispersion |
Default: NULL. Logical switch to multiply dispersion by
|
return_varcovmat |
Default: TRUE. Logical switch to return the variance-covariance matrix of estimated coefficients. Needed for Wald inference. |
exact_varcovmat |
Default: FALSE. Logical switch to replace the default
asymptotic (Bayesian posterior) variance-covariance matrix with the exact
frequentist variance-covariance matrix of the constrained estimator. The
asymptotic version uses the Hessian of
the penalized log-likelihood:
When a correlation structure is present ( |
return_lagrange_multipliers |
Default: FALSE. Logical switch to return the Lagrangian multiplier vector. |
custom_penalty_mat |
Default: NULL. Optional |
cluster_args |
Default: |
dummy_dividor |
Default: 1.2345672152894e-22. Small numeric constant to prevent division by zero. |
dummy_adder |
Default: 2.234567210529e-18. Small numeric constant to prevent division by zero. |
verbose |
Default: FALSE. Logical flag to print general progress messages. |
verbose_tune |
Default: FALSE. Logical flag to print detailed progress during penalty tuning. |
dummy_fit |
Default: FALSE. Runs the full pipeline but sets coefficients to zero,
allowing inspection of design matrix structure, penalty matrices, and partitioning.
Replaces the deprecated |
auto_encode_factors |
Default: TRUE. Logical switch to automatically one-hot encode factor or character variables when using the formula interface. |
observation_weights |
Default: NULL. Numeric vector of observation-specific weights for generalized least squares estimation. |
do_not_cluster_on_these |
Default: |
neighbor_tolerance |
Default: |
null_constraint |
Default: NULL. Alternative parameterization for a
nonzero equality target when |
critical_value |
Default: |
data |
Default: NULL. Optional data frame for formula-based model specification. |
weights |
Default: NULL. Alias for |
no_intercept |
Default: FALSE. Logical flag to constrain intercept to 0.
Formulas with |
correlation_id, spacetime |
Default: NULL. N-length vector and N-row matrix of cluster ids and longitudinal/spatial variables, respectively. |
correlation_structure |
Default: NULL. Native implementations: |
VhalfInv |
Default: NULL. Fixed custom |
Vhalf |
Default: NULL. Fixed custom |
VhalfInv_fxn |
Default: NULL. Parametric function for |
Vhalf_fxn |
Default: NULL. Optional function for efficient computation of
|
VhalfInv_par_init |
Default: |
REML_grad |
Default: NULL. Function for the gradient of the negative REML (or
custom loss) with respect to the parameters of |
custom_VhalfInv_loss |
Default: NULL. Alternative to negative REML for the
correlation parameter objective function. Takes |
VhalfInv_logdet |
Default: NULL. Function for efficient |
include_warnings |
Default: TRUE. Logical switch to control display of warnings. |
penalty_args |
Default: NULL. Optional named list grouping penalty-related arguments. See section "Grouped Argument Lists". |
tuning_args |
Default: NULL. Optional named list grouping tuning-related arguments. |
expansion_args |
Default: NULL. Optional named list grouping basis expansion arguments. |
constraint_args |
Default: NULL. Optional named list grouping constraint arguments. |
qp_args |
Default: NULL. Optional named list grouping quadratic programming arguments. |
parallel_args |
Default: NULL. Optional named list grouping parallel processing arguments. |
covariance_args |
Default: NULL. Optional named list grouping correlation structure arguments. |
return_args |
Default: NULL. Optional named list grouping return-control arguments. |
glm_args |
Default: NULL. Optional named list grouping GLM customization arguments. |
... |
Additional arguments passed to the unconstrained model fitting function. |
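As an illustration of the built-in quadratic-programming flags described above, the following is a hedged sketch of a monotone, bounded fit. The behavior of qp_monotonic_increase and qp_range_lower is as documented in the argument table; the simulated data and convergence are assumptions of this sketch.

```r
## Sketch: shape-constrained fit via quadratic programming.
## Observations are sorted so "increasing in observation order"
## corresponds to increasing in x.
set.seed(42)
x <- sort(runif(200, 0, 10))
y <- log1p(x) + rnorm(200, 0, 0.1)
fit_mono <- lgspline(x, y,
                     qp_monotonic_increase = TRUE, # monotone fitted values
                     qp_range_lower = 0)           # fitted values bounded below by 0
plot(x, y)
plot(fit_mono, add = TRUE)
```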
Details
A flexible and interpretable implementation of smoothing splines including:
Multiple predictors and interaction terms
Various GLM families and link functions
Correlation structures for longitudinal/clustered data
Shape constraints via quadratic programming
Parallel computation for large datasets
Comprehensive inference tools
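The correlation-structure support listed above can be exercised by supplying a fixed matrix through VhalfInv (the \mathbf{V}^{-1/2} used for generalized least squares). A hedged sketch, assuming an exchangeable working correlation with a user-chosen rho; the eigendecomposition construction is illustrative, not the package's internal method:

```r
## Sketch: fixed working correlation supplied via VhalfInv (= V^{-1/2}).
## rho is a hypothetical working value; for estimated correlation
## parameters, see correlation_structure / VhalfInv_fxn instead.
set.seed(99)
n <- 100
rho <- 0.3
V <- matrix(rho, n, n); diag(V) <- 1          # exchangeable correlation
eig <- eigen(V, symmetric = TRUE)
VhalfInv <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
x <- rnorm(n)
y <- sin(x) + rnorm(n)
fit_gls <- lgspline(x, y, VhalfInv = VhalfInv)
```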
Value
A list of class "lgspline" containing model components:
- y
Original response vector.
- ytilde
Fitted/predicted values on the scale of the response.
- X
List of design matrices
\mathbf{X}_{k} for each partition k, containing basis expansions including intercept, linear, quadratic, cubic, and interaction terms as specified. Returned on the unstandardized scale.
- A
Constraint matrix \mathbf{A} encoding smoothness constraints at knot points and any user-specified linear constraints. Only a linearly independent subset of columns is retained (via pivoted QR decomposition).
- B
List of fitted coefficients \boldsymbol{\beta}_{k} for each partition k on the original, unstandardized scale of the predictors and response.
- B_raw
List of fitted coefficients for each partition on the predictor-and-response standardized scale.
- K
Number of interior knots with one predictor (number of partitions minus 1 with > 1 predictor).
- p
Number of basis expansions of predictors per partition.
- q
Number of predictor variables.
- P
Total number of coefficients (p \times (K+1)).
- N
Number of observations.
- penalties
List containing optimized penalty matrices and components:
Lambda: Combined penalty matrix (\boldsymbol{\Lambda}); includes \mathbf{L}_{\mathrm{predictor\_list}} contributions but not \mathbf{L}_{\mathrm{partition\_list}}.
L1: Smoothing spline penalty matrix (\mathbf{L}_{1}).
L2: Ridge penalty matrix (\mathbf{L}_{2}).
L_predictor_list: Predictor-specific penalty matrices (\mathbf{L}_{\mathrm{predictor\_list}}).
L_partition_list: Partition-specific penalty matrices (\mathbf{L}_{\mathrm{partition\_list}}).
- knot_scale_transf
Function for transforming predictors to standardized scale used for knot placement.
- knot_scale_inv_transf
Function for transforming standardized predictors back to original scale.
- knots
Matrix of knot locations on original unstandardized predictor scale for one predictor.
- partition_codes
Vector assigning observations to partitions.
- partition_bounds
Vector or matrix specifying the boundaries between partitions.
- knot_expand_function
Internal function for expanding data according to partition structure.
- predict
Function for generating predictions on new data. For multi-predictor models, take_first_derivatives = TRUE and take_second_derivatives = TRUE return derivatives as a named list of components per predictor variable, rather than a concatenated vector. When new_predictors contains columns not present in the training data, extraneous columns are silently dropped before prediction.
- assign_partition
Function for assigning new observations to partitions.
- family
GLM family object specifying the error distribution and link function.
- estimate_dispersion
Logical indicating whether dispersion parameter was estimated.
- unbias_dispersion
Logical indicating whether dispersion estimates should be unbiased.
- backtransform_coefficients
Function for converting standardized coefficients to original scale.
- forwtransform_coefficients
Function for converting coefficients to standardized scale.
- mean_y, sd_y
Mean and standard deviation of response if standardized.
- og_order
Original ordering of observations before partitioning.
- order_list
List containing observation indices for each partition.
- constraint_values, constraint_vectors
Matrices specifying linear equality constraints if provided.
- make_partition_list
List containing partition information for > 1-D cases.
- expansion_scales
Vector of scaling factors used for standardizing basis expansions.
- take_derivative, take_interaction_2ndderivative
Functions for computing derivatives of basis expansions.
- get_all_derivatives_insample
Function for computing all derivatives on training data.
- numerics
Indices of numeric predictors used in basis expansions.
- power1_cols, power2_cols, power3_cols, power4_cols
Column indices for linear through quartic terms.
- quad_cols
Column indices for all quadratic terms (including interactions).
- interaction_single_cols, interaction_quad_cols
Column indices for linear-linear and linear-quadratic interactions.
- triplet_cols
Column indices for three-way interactions.
- nonspline_cols
Column indices for terms excluded from spline expansion.
- return_varcovmat
Logical indicating whether variance-covariance matrix was computed.
- raw_expansion_names
Names of basis expansion terms.
- std_X, unstd_X
Functions for standardizing/unstandardizing design matrices.
- parallel_cluster_supplied
Logical indicating whether a parallel cluster was supplied.
- weights
Original observation weights on the data scale. When no weights were supplied, this is a vector of ones.
- G
List of unscaled partition-wise information inverses \mathbf{G}_{k} if return_G = TRUE. These are the blockwise quantities stored on the fitting scale; correlation-aware trace, posterior, and variance calculations additionally use dense GLS analogues internally when needed.
- Ghalf
List of \mathbf{G}_{k}^{1/2} matrices if return_Ghalf = TRUE. As with G, dense GLS square-root factors may also be constructed internally for correlation-aware post-fit calculations.
- U
Constraint projection matrix \mathbf{U} if return_U = TRUE. For K = 0 and no constraints, returns identity. Otherwise, returns \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{\top}. Used for computing the variance-covariance matrix \sigma^{2}\mathbf{U}\mathbf{G}.
- sigmasq_tilde
Estimated (or fixed) dispersion parameter \tilde{\sigma}^{2}. For Gaussian identity fits without correlation, this is the weighted mean squared residual with optional bias correction. When VhalfInv is non-NULL, Gaussian-identity residuals are whitened before this calculation.
- trace_XUGX
Effective degrees of freedom (\mathrm{trace}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^{\top})), where \mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^{\top} serves as the "hat" matrix. When VhalfInv is non-NULL, computed as \|\mathbf{V}^{-1/2}\mathbf{X}\mathbf{U}\mathbf{G}_{\mathrm{correct}}^{1/2}\|_{F}^{2} using the full penalized GLS information.
- varcovmat
Variance-covariance matrix of coefficient estimates if return_varcovmat = TRUE. Computed as \sigma^{2}(\mathbf{U}\mathbf{G}^{1/2})(\mathbf{U}\mathbf{G}^{1/2})^{\top} for numerical stability. When VhalfInv is non-NULL, uses the full \mathbf{G}_{\mathrm{correct}}^{1/2} in place of the block-diagonal \mathbf{G}^{1/2}.
- lagrange_multipliers
Vector of Lagrangian multipliers if return_lagrange_multipliers = TRUE. For equality-only fits these correspond to the active columns of \mathbf{A}; when quadratic-programming constraints are active they are taken directly from solve.QP and therefore refer to the combined equality/inequality constraint system. NULL if no constraints are active (\mathbf{A} is NULL or K == 0).
- VhalfInv
The \mathbf{V}^{-1/2} matrix used for implementing correlation structures, if specified.
- VhalfInv_fxn, Vhalf_fxn, VhalfInv_logdet, REML_grad
Functions for generating \mathbf{V}^{-1/2}, \mathbf{V}^{1/2}, \log|\mathbf{V}^{-1/2}|, and the gradient of the REML objective, if provided.
- VhalfInv_params_estimates
Vector of estimated correlation parameters when using VhalfInv_fxn.
- VhalfInv_params_vcov
Approximate variance-covariance matrix of estimated correlation parameters from BFGS optimization.
- wald_univariate
Function for computing univariate Wald statistics and confidence intervals. Returns an S3 object of class "wald_lgspline" with dedicated print, summary, plot, coef, and confint methods. The print method uses printCoefmat() for standard R coefficient table formatting with significance stars.
- critical_value
Critical value used for confidence interval construction.
- generate_posterior
Function for drawing from the posterior distribution of coefficients. When VhalfInv is non-NULL, draws are from the correct joint posterior \mathbf{U}\mathbf{G}_{\mathrm{correct}}^{1/2}\mathbf{z} using the full penalized GLS information, reflecting cross-partition posterior covariance induced by off-diagonal blocks of \mathbf{V}^{-1/2}.
- find_extremum
Function for optimizing the fitted function. Accepts both numeric column indices and character column names for vars. When select_vars_fl = TRUE, L-BFGS-B bounds are correctly subsetted to the optimized variables.
- plot
Function for visualizing fitted curves.
- quadprog_list
List containing quadratic programming components if applicable.
- .fit_call_args
List containing the arguments passed to lgspline.
The returned object has class "lgspline" and provides comprehensive tools for
model interpretation, inference, prediction, and visualization. All
coefficients and predictions can be transformed between standardized and
original scales using the provided transformation functions. The object includes
both frequentist and Bayesian inference capabilities through Wald statistics
and posterior sampling. S3 methods logLik.lgspline and
confint.lgspline are available for standard log-likelihood
extraction and confidence interval computation, respectively.
Advanced customization options are available for
analyzing arbitrarily complex study designs.
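A hedged sketch of common post-fit workflows using the returned components described above. The no-argument calling conventions for wald_univariate() and generate_posterior() are assumptions of this sketch; consult the component documentation for their full signatures.

```r
## Sketch: post-fit inference and prediction from an lgspline fit
set.seed(1)
x <- runif(300, -3, 3)
y <- cos(x) + rnorm(300, 0, 0.2)
fit <- lgspline(x, y)

## Univariate Wald inference (print/summary/confint methods available)
w <- fit$wald_univariate()
print(w)

## Posterior draws of coefficients for Bayesian uncertainty assessment
draws <- fit$generate_posterior()

## Predictions at new predictor values
preds <- predict(fit, new_predictors = cbind(seq(-3, 3, length.out = 50)))
```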
Response and Predictor Setup
These arguments control the primary data inputs and the initial standardization steps applied before knot placement and fitting.
GLM Customization
These options let you override the default GLM working-weight, dispersion, and partition-wise unconstrained fitting behavior.
Knots and Partitioning
These arguments determine how the predictor space is partitioned and how knot locations are chosen or reused.
Penalty
These arguments configure the smoothing penalty itself and the optional generalized cross-validation tuning procedure.
Basis Expansions
These arguments control which polynomial and interaction terms are included in the partition-specific design matrices.
Constraints
These arguments govern the smoothness equalities and any additional user-supplied linear equality constraints.
Quadratic Programming
These arguments activate built-in or custom inequality constraints handled through quadratic programming.
Parallel Processing
These arguments control which computational subroutines may run in parallel and how work is chunked across cluster workers.
Tuning Control
These options control iterative updates during penalty tuning and the final constrained fit.
Return Control
These arguments determine which intermediate matrices and inferential quantities are retained in the returned fit object.
Correlation Structures
These arguments enable built-in or custom working-correlation structures for longitudinal, clustered, or spatially indexed responses.
Grouped Argument Lists
For convenience, related arguments can be bundled into named lists. When a grouped argument is non-NULL, its entries overwrite the corresponding individual arguments. Individual arguments remain available for backward compatibility.
penalty_args: Groups wiggle_penalty, flat_ridge_penalty, unique_penalty_per_partition, unique_penalty_per_predictor, meta_penalty, predictor_penalties, partition_penalties, custom_penalty_mat, previously_tuned_penalties, smoothing_spline_penalty.
tuning_args: Groups opt, use_custom_bfgs, delta, tol, initial_wiggle, initial_flat, iterate_tune, iterate_final_fit.
expansion_args: Groups include_quadratic_terms, include_cubic_terms, include_quartic_terms, include_2way_interactions, include_3way_interactions, include_quadratic_interactions, just_linear_with_interactions, just_linear_without_interactions, exclude_interactions_for, exclude_these_expansions, custom_basis_fxn, offset.
constraint_args: Groups include_constrain_fitted, include_constrain_first_deriv, include_constrain_second_deriv, include_constrain_interactions, constraint_values, constraint_vectors, no_intercept.
qp_args: Groups all qp_* arguments.
parallel_args: Groups cl, chunk_size, and all parallel_* flags.
covariance_args: Groups correlation_id, spacetime, correlation_structure, VhalfInv, Vhalf, VhalfInv_fxn, Vhalf_fxn, VhalfInv_par_init, REML_grad, custom_VhalfInv_loss, VhalfInv_logdet.
return_args: Groups return_G, return_Ghalf, return_U, estimate_dispersion, unbias_dispersion, return_varcovmat, exact_varcovmat, return_lagrange_multipliers.
glm_args: Groups glm_weight_function, schur_correction_function, need_dispersion_for_estimation, dispersion_function, unconstrained_fit_fxn, keep_weighted_Lambda.
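A hedged sketch of the grouped-list interface, equivalent to passing the same individual arguments directly (the simulated data here are illustrative):

```r
## Sketch: bundling related arguments into grouped lists.
## Entries in each list overwrite the matching individual arguments.
set.seed(7)
x <- runif(200, -5, 5)
y <- sin(x) + rnorm(200, 0, 0.3)
fit <- lgspline(x, y,
                penalty_args = list(wiggle_penalty = 2e-7,
                                    flat_ridge_penalty = 0.5),
                expansion_args = list(include_quartic_terms = FALSE),
                return_args = list(return_varcovmat = TRUE))
```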
Miscellaneous
These remaining arguments affect inference defaults, numerical safeguards, verbosity, and developer-oriented diagnostics.
See Also
- lgspline.fit for the low-level fitting interface
- logLik.lgspline for log-likelihood extraction
- confint.lgspline for confidence interval extraction
- leave_one_out for leave-one-out cross-validated predictions
- blockfit_solve for the standalone backfitting solver
- solve.QP for quadratic programming optimization
- plot_ly for interactive plotting
- kmeans for k-means clustering
- optim for general-purpose optimization routines
Examples
## ## ## ## Simple Examples ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
## Simulate some data, fit using default settings without tuning, and plot
set.seed(1234)
t <- runif(2500, -10, 10)
y <- 2*sin(t) + -0.06*t^2 + rnorm(length(t))
model_fit <- lgspline(t, y, opt = FALSE)
plot(t, y, main = 'Observed Data vs. Fitted Function, Colored by Partition',
ylim = c(-10, 10))
plot(model_fit, add = TRUE)
## Repeat using logistic regression, with univariate inference shown
# and alternative function call
y <- rbinom(length(y), 1, 1/(1+exp(-std(y))))
df <- data.frame(t = t, y = y)
model_fit <- lgspline(y ~ spl(t),
df,
family = binomial())
plot(t, y, main = 'Observed Data vs Fitted Function with Formulas and Derivatives',
ylim = c(-0.5, 1.05), cex.main = 0.8)
plot(model_fit,
show_formulas = TRUE,
text_size_formula = 0.65,
legend_pos = 'bottomleft',
legend_args = list(y.intersp = 1.1),
add = TRUE)
## Notice how the coefficients match the formula, and expansions are
# homogeneous across partitions without reparameterization
print(summary(model_fit))
## Overlay first and second derivatives of fitted function respectively
derivs <- predict(model_fit,
new_predictors = sort(t),
take_first_derivatives = TRUE,
take_second_derivatives = TRUE)
points(sort(t), derivs$first_deriv, col = 'gold', type = 'l')
points(sort(t), derivs$second_deriv, col = 'goldenrod', type = 'l')
legend('bottomright',
col = c('gold','goldenrod'),
lty = 1,
legend = c('First Derivative', 'Second Derivative'))
## Simple 2D example - including a non-spline effect
z <- seq(-2, 2, length.out = length(y))
df <- data.frame(Predictor1 = t,
Predictor2 = z,
Response = sin(y)+0.1*z)
model_fit <- lgspline(Response ~ spl(Predictor1) + Predictor1*Predictor2,
df)
## Notice that while spline effects change across partitions,
# interactions and non-spline effects are constrained to remain the same
coefficients <- Reduce('cbind', coef(model_fit))
colnames(coefficients) <- paste0('Partition ', 1:(model_fit$K+1))
print(coefficients)
## One or two variables can be selected for plotting at a time
# even when >= 3 predictors are present
plot(model_fit,
custom_title = 'Marginal Relationship of Predictor 1 and Response',
vars = 'Predictor1',
custom_response_lab = 'Response',
show_formulas = TRUE,
legend_pos = 'bottomright',
digits = 4,
text_size_formula = 0.5)
## 3D plots are implemented as well, retaining closed-form formulas
my_plot <- plot(model_fit,
show_formulas = TRUE,
custom_response_lab = 'Response')
my_plot
## ## ## ## More Detailed 1D Example ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
## 1D data generating functions
t <- seq(-9, 9, length.out = 1000)
slinky <- function(x) {
(50 * cos(x * 2) - 2 * x^2 + (0.25 * x)^4 + 80)
}
coil <- function(x) {
(100 * cos(x * 2) - 1.5 * x^2 + (0.1 * x)^4 +
(0.05 * x^3) + (-0.01 * x^5) +
(0.00002 * x^6) - (0.000001 * x^7) + 100)
}
exponential_log <- function(x) {
unlist(c(sapply(x, function(xx) {
if (xx <= 1) {
100 * (exp(xx) - exp(1))
} else {
100 * (log(xx))
}
})))
}
scaled_abs_gamma <- function(x) {
2*sqrt(gamma(abs(x)))
}
## Composite function
fxn <- function(x)(slinky(x) +
coil(x) +
exponential_log(x) +
scaled_abs_gamma(x))
## Bind together with random noise
dat <- cbind(t, fxn(t) + rnorm(length(t), 0, 50))
colnames(dat) <- c('t', 'y')
x <- dat[,'t']
y <- dat[,'y']
## Fit Model, 4 equivalent ways are shown below
model_fit <- lgspline(t, y, opt = FALSE)
model_fit <- lgspline(y ~ spl(t), as.data.frame(dat), opt = FALSE)
model_fit <- lgspline(response = y, predictors = t, opt = FALSE)
model_fit <- lgspline(data = as.data.frame(dat), formula = y ~ ., opt = FALSE)
# This is not valid: lgspline(y ~ ., t)
# This is not valid: lgspline(y, data = as.data.frame(dat))
# Do not put operations inside formulas; this is not valid: lgspline(y ~ log(t) + spl(t))
## Basic Functionality
predict(model_fit, new_predictors = rnorm(1)) # make prediction on new data
loo_vals <- suppressWarnings(head(leave_one_out(model_fit)))
loo_vals # may contain NA when leverage is too high
coef(model_fit) # extract coefficients
summary(model_fit) # model information and Wald inference
generate_posterior(model_fit) # generate draws of parameters from posterior distribution
find_extremum(model_fit, minimize = TRUE) # find the minimum of the fitted function
## Incorporate range constraints, custom knots, keep penalization identical
# across partitions
model_fit <- lgspline(y ~ spl(t),
unique_penalty_per_partition = FALSE,
custom_knots = cbind(c(-2, -1, 0, 1, 2)),
data = data.frame(t = t, y = y),
qp_range_lower = -150,
qp_range_upper = 150)
## Plotting the constraints and knots
plot(model_fit,
custom_title = 'Fitted Function Constrained to Lie Between (-150, 150)',
cex.main = 0.75)
# knot locations
abline(v = model_fit$knots)
# lower bound from quadratic program
abline(h = -150, lty = 2)
# upper bound from quadratic program
abline(h = 150, lty = 2)
# observed data
points(t, y, cex = 0.24)
## Enforce monotonic increasing constraints on fitted values
# K = 4 => 5 partitions
t <- seq(-10, 10, length.out = 100)
y <- 5*sin(t) + t + 2*rnorm(length(t))
model_fit <- lgspline(t,
y,
K = 4,
qp_monotonic_increase = TRUE)
plot(t, y, main = 'Monotonic Increasing Function with Respect to Fitted Values')
plot(model_fit,
add = TRUE,
show_formulas = TRUE,
legend_pos = 'bottomright',
custom_predictor_lab = 't',
custom_response_lab = 'y')
## Posterior draws under constraint
draw <- generate_posterior(model_fit, enforce_qp_constraints = TRUE)
pr <- predict(model_fit, B_predict = draw$post_draw_coefficients)
points(t, pr, col = 'grey')
## ## ## ## 2D Example using Volcano Dataset ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
## Prep
data('volcano')
volcano_long <-
Reduce('rbind', lapply(1:nrow(volcano), function(i){
t(sapply(1:ncol(volcano), function(j){
c(i, j, volcano[i,j])
}))
}))
colnames(volcano_long) <- c('Length', 'Width', 'Height')
## Fit, with 50 partitions
# When fitting with > 1 predictor and large K, including quartic terms
# is highly recommended, and/or dropping the second-derivative constraint.
# Otherwise, the constraints can force all partitions to be equal, with one
# cubic function fit for all (there are not enough degrees of freedom to fit
# unique cubic functions due to the massive number of constraints).
# Below, quartic terms are included and the constraint of second-derivative
# smoothness at knots is ignored.
model_fit <- lgspline(volcano_long[,c(1, 2)],
volcano_long[,3],
include_quadratic_interactions = TRUE,
K = 49,
opt = FALSE,
return_U = FALSE,
return_varcovmat = FALSE,
estimate_dispersion = TRUE,
return_Ghalf = FALSE,
return_G = FALSE,
include_constrain_second_deriv = FALSE,
unique_penalty_per_predictor = FALSE,
unique_penalty_per_partition = FALSE,
wiggle_penalty = 1e-10, # the fixed wiggle penalty
flat_ridge_penalty = 1e-2) # the ridge penalty / wiggle penalty
## Plotting on new data with interactive visual + formulas
new_input <- expand.grid(seq(min(volcano_long[,1]),
max(volcano_long[,1]),
length.out = 250),
seq(min(volcano_long[,2]),
max(volcano_long[,2]),
length.out = 250))
plot(model_fit,
new_predictors = new_input,
show_formulas = TRUE,
custom_response_lab = "Height",
custom_title = 'Volcano 3-D Map',
digits = 2)
## Get area under the fitted surface
area_under_volcano <- integrate(model_fit,
lower = apply(volcano_long, 2, min)[1:2],
upper = apply(volcano_long, 2, max)[1:2])
## ## ## ## Advanced Techniques using Trees Dataset ## ## ## ## ## ## ## ## ## ## ## ## ##
## Goal here is to introduce how lgspline works with non-canonical GLMs and
# demonstrate some custom features
data('trees')
## L1-regularization constraint function on standardized coefficients
# Bound all coefficients to be less than a certain value (l1_bound) in
# absolute magnitude, such that |B^(j)_k| < l1_bound for all
# j = 1...p coefficients and k = 1...K+1 partitions.
l1_constraint_matrix <- function(p, K) {
## Total number of coefficients
P <- p * (K + 1)
## Create diagonal matrices for L1 constraint
# First matrix: lambda > -bound
# Second matrix: -lambda > -bound
first_diag <- diag(P)
second_diag <- -diag(P)
## Combine matrices
l1_Amat <- cbind(first_diag, second_diag)
return(l1_Amat)
}
## Bounds absolute value of coefficients to be < l1_bound
l1_bound_vector <- function(qp_Amat,
scales,
l1_bound) {
## Combine matrices
l1_bvec <- rep(-l1_bound, ncol(qp_Amat)) * c(1, scales)
return(l1_bvec)
}
## Fit model, using predictor-response formulation, assuming
# Gamma-distributed response, and custom quadratic-programming constraints,
# with qp_score_function/glm_weight_function updated for non-canonical GLMs
# as well as quartic terms, keeping the effect of height constant across
# partitions, and 3 partitions in total. Hence, this is an advanced-usage
# case.
# You can modify this code for performing l1-regularization in general.
# For canonical GLMs, the default qp_score_function/glm_weight_function are
# correct and do not need to be changed
# (custom functionality is not needed for canonical GLMs).
model_fit <- lgspline(
Volume ~ spl(Girth) + Height*Girth,
data = with(trees, cbind(Girth, Height, Volume)),
family = Gamma(link = 'log'),
keep_weighted_Lambda = TRUE,
glm_weight_function = function(
mu,
y,
order_indices,
family,
dispersion,
observation_weights,
...){
rep(1/dispersion, length(y))
},
dispersion_function = function(
mu,
y,
order_indices,
family,
observation_weights,
VhalfInv,
...){
mean(
mu^2/((y-mu)^2)
)
}, # = biased estimate of 1/shape parameter
need_dispersion_for_estimation = TRUE,
unbias_dispersion = TRUE, # multiply dispersion by N/(N-trace(XUGX^{T}))
K = 2, # 3 partitions
opt = FALSE, # keep penalties fixed
unique_penalty_per_partition = FALSE,
unique_penalty_per_predictor = FALSE,
flat_ridge_penalty = 1e-64,
wiggle_penalty = 1e-64,
qp_score_function = function(X, y, mu, order_list, dispersion, VhalfInv,
observation_weights, ...){
t(X) %*% diag(c(1/mu * 1/dispersion)) %*% cbind(y - mu)
}, # updated score for gamma regression with log link
qp_Amat_fxn = function(N, p, K, X, colnm, scales, deriv_fxn, ...) {
l1_constraint_matrix(p, K)
},
qp_bvec_fxn = function(qp_Amat, N, p, K, X, colnm, scales, deriv_fxn, ...) {
l1_bound_vector(qp_Amat, scales, 25)
},
qp_meq_fxn = function(qp_Amat, N, p, K, X, colnm, scales, deriv_fxn, ...) 0
)
## Notice the interaction effect is constant across partitions, as is the
# effect of Height alone
Reduce('cbind', coef(model_fit))
## Many constraints, many coefficients, and a small sample size make inference
# using the asymptotic variance-covariance matrix untrustworthy.
# Confidence intervals are often too wide or too narrow, even for a "good" fit.
# Consider bootstrapping or an alternative.
print(summary(model_fit))
## Plot results
plot(model_fit, custom_predictor_lab1 = 'Girth',
custom_predictor_lab2 = 'Height',
custom_response_lab = 'Volume',
custom_title = 'Girth and Height Predicting Volume of Trees',
show_formulas = TRUE)
## Verify magnitude of unstandardized coefficients does not exceed bound (25)
print(max(abs(unlist(model_fit$B))))
## Find height and girth where tree volume is closest to 42
# Uses custom objective that minimizes MSE discrepancy between predicted
# value and 42.
# The vanilla find_extremum function can be thought of as
# using "function(mu)mu" aka the identity function as the
# objective, where mu = "f(t)", our estimated function. The derivative is then
# d_mu = "df/dt" with respect to predictors t.
# But with more creative objectives, and since we have machinery for
# df/dt already available, we can compute gradients for (and optimize)
# arbitrary differentiable functions of our predictors too.
# For any objective, differentiate w.r.t. mu, then multiply by d_mu to
# satisfy the chain rule.
# Here, we have objective function: 0.5*(mu-42)^2
# and gradient : (mu-42)*d_mu
# and L-BFGS-B will be used to find the height and girth that most closely
# yields a prediction of 42 within the bounds of the observed data.
# The d_mu term automatically accounts for link-function transforms for most
# common link functions; otherwise a warning is returned with instructions
# on how to program the link-function derivatives.
## Custom acquisition functions for Bayesian optimization could be coded here.
find_extremum(
model_fit,
minimize = TRUE,
custom_objective_function = function(mu, sigma, ybest, ...){
0.5*(mu - 42)^2
},
custom_objective_derivative = function(mu, sigma, ybest, d_mu, ...){
(mu - 42) * d_mu
}
)
## ## ## ## How to Use Formulas in lgspline ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
## Demonstrates splines with multiple mixed predictors and interactions
## Generate data
n <- 2500
x <- rnorm(n)
y <- rnorm(n)
z <- sin(x)*mean(abs(y))/2
## Categorical predictors
cat1 <- rbinom(n, 1, 0.5)
cat2 <- rbinom(n, 1, 0.5)
cat3 <- rbinom(n, 1, 0.5)
## Response with mix of effects
response <- y + z + 0.1*(2*cat1 - 1)
## Continuous predictors re-named
continuous1 <- x
continuous2 <- z
## Combine data
dat <- data.frame(
response = response,
continuous1 = continuous1,
continuous2 = continuous2,
cat1 = cat1,
cat2 = cat2,
cat3 = cat3
)
## Example 1: Basic Model with Default Terms, No Intercept
# standardize_response = FALSE often needed when constraining intercepts to 0
fit1 <- lgspline(
formula = response ~ 0 + spl(continuous1, continuous2) +
cat1*cat2*continuous1 + cat3,
K = 2,
standardize_response = FALSE,
data = dat
)
## Examine coefficients included
rownames(fit1$B$partition1)
## Verify intercept term is near 0 up to some numeric tolerance
abs(fit1$B[[1]][1]) < 1e-8
## Example 2: Similar Model with Intercept, Other Terms Excluded
fit2 <- lgspline(
formula = response ~ spl(continuous1, continuous2) +
cat1*cat2*continuous1 + cat3,
K = 1,
standardize_response = FALSE,
include_cubic_terms = FALSE,
exclude_these_expansions = c( # Not all need to actually be present
'_batman_x_robin_',
'_3_x_4_', # no cat1 x cat2 interaction, coded using column indices
'continuous1xcontinuous2', # no continuous1 x continuous2 interaction
'thejoker'
),
data = dat
)
## Examine coefficients included
rownames(Reduce('cbind',coef(fit2)))
# Intercept will likely be present and non-zero now
abs(fit2$B[[1]][1]) < 1e-8
## ## ## ## Compare Inference to survreg for Weibull AFT Model Validation ##
# Only linear predictors, no knots, no penalties, using Weibull AFT Model
# The goal here is to ensure that for the special case of no spline effects
# and no knots, this implementation will be consistent with other model
# implementations.
# Also note that when using models (like the Weibull AFT) where dispersion is
# being estimated and is required for estimating beta coefficients,
# we use a schur complement correction function to adjust (or "correct") our
# variance-covariance matrix for both estimation and inference to account for
# uncertainty in estimating the dispersion.
# Typically the schur_correction_function would return a negative-definite
# matrix, as its output is elementwise added to the information matrix prior
# to inversion.
if (requireNamespace("survival", quietly = TRUE)) {
data("pbc", package = "survival")
df <- data.frame(na.omit(
pbc[, c("time", "trt", "stage", "hepato", "bili", "age", "status")]
))
## Weibull AFT using lgspline, showing how some custom options can be used to
# fit more complicated models
model_fit <- lgspline(time ~ trt + stage + hepato + bili + age,
df,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
schur_correction_function = weibull_schur_correction,
unconstrained_fit_fxn = unconstrained_fit_weibull,
opt = FALSE,
wiggle_penalty = 0,
flat_ridge_penalty = 0,
K = 0,
status = df$status != 0)
print(summary(model_fit))
## Survreg results match closely on estimates and inference for coefficients
survreg_fit <- survival::survreg(
survival::Surv(time, status != 0) ~ trt + stage + hepato + bili + age,
df
)
print(summary(survreg_fit))
## sigmasq_tilde = scale^2 of survreg
print(c(sqrt(model_fit$sigmasq_tilde), survreg_fit$scale))
}
## ## ## ## Modelling Correlation Structures ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
## Setup
n_blocks <- 200 # Number of correlation_ids (subjects)
block_size <- 5 # Size of each block (number of repeated measures per subject)
N <- n_blocks * block_size # total sample size (balanced here)
rho_true <- 0.25 # True correlation
## Generate predictors and mean structure
t <- seq(-9, 9, length.out = N)
true_mean <- sin(t)
## Create block compound symmetric errors: Sigma = (1 - rho)*I + rho*J
errors <- Reduce('rbind',
lapply(1:n_blocks,
function(i){
sigma <- diag(block_size) + rho_true *
(matrix(1, block_size, block_size) -
diag(block_size))
matsqrt(sigma) %*% rnorm(block_size)
}))
## Generate response with correlated errors
y <- true_mean + errors * 0.5
## Fit model with correlation structure
# include_warnings = FALSE is a good idea here, since many proposed
# correlations will not work
model_fit <- lgspline(t,
y,
K = 4,
correlation_id = rep(1:n_blocks, each = block_size),
correlation_structure = 'exchangeable',
include_warnings = FALSE
)
## Assess overall fit
plot(t, y, main = 'Sinusoidal Fit Under Correlation Structure')
plot(model_fit, add = TRUE, show_formulas = TRUE, custom_predictor_lab = 't')
## Compare estimated vs true correlation
# Built-in exchangeable uses rho = exp(-exp(par)), so par in (-Inf, Inf)
# maps to rho in (0, 1). Only positive correlation is supported.
rho_est <- exp(-exp(model_fit$VhalfInv_params_estimates))
print(c("True correlation:" = rho_true,
"Estimated correlation:" = rho_est))
## Quantify uncertainty in correlation estimate with 95% confidence interval
# CI is constructed on the working scale and back-transformed
ci_transformed <- confint(model_fit)['Correlation parameter 1',]
ci_natural <- sort(exp(-exp(ci_transformed)))
print("95% CI for correlation:")
print(ci_natural)
## Also check SD (should be close to 0.5)
print(sqrt(model_fit$sigmasq_tilde))
## Toeplitz Simulation Setup, with demonstration of custom functions
# and boilerplate. Toeplitz correlation is not implemented by default because
# it makes strong assumptions on the study design and missingness that are
# rarely met, with non-obvious workarounds.
# If a GLM were to be fit, you would also supply a function "Vhalf_fxn"
# analogous to VhalfInv_fxn, with the same argument (par) and an N x N matrix
# output that is the inverse of the VhalfInv_fxn output.
n_blocks <- 250 # Number of correlation_ids
block_size <- 8 # Observations per correlation_id
N <- n_blocks * block_size # total sample size
sigma_true <- 2 # Marginal standard deviation
## True Toeplitz components
# This example uses a convex combination of two geometric lag kernels:
# corr(h) = mix * rho_fast^h + (1 - mix) * rho_slow^h
# which is Toeplitz and positive definite for mix in (0, 1) and
# rho_fast, rho_slow in (0, 1).
rho_fast_true <- 0.25
rho_slow_true <- 0.75
mix_true <- 0.40
## Create time and correlation_id variables
time_var <- rep(1:block_size, n_blocks)
correlation_id_var <- rep(1:n_blocks, each = block_size)
## Create nonlinear predictor-response relationship
# Not sinusoidal and not polynomial.
t_base <- seq(-2, 2, length.out = block_size)
t <- rep(t_base, n_blocks) + rnorm(N, sd = 0.10)
f_true <- function(t) {
1.4 + 0.9 * atan(1.8 * t) + 0.8 * exp(-1.2 * (t - 0.4)^2)
}
## Generate mean structure
mu_true <- f_true(t)
## Toeplitz correlation helper
corr_from_components <- function(rho_fast, rho_slow, mix) {
corr <- matrix(0, block_size, block_size)
for(i in 1:block_size) {
for(j in 1:block_size) {
lag <- abs(i - j)
if(lag == 0) {
corr[i, j] <- 1
} else {
corr[i, j] <- mix * rho_fast^lag + (1 - mix) * rho_slow^lag
}
}
}
corr
}
## Toeplitz correlation function
# Custom functions can use any parameterization. Here we map:
# par[1] -> rho_fast = exp(-exp(par[1]))
# par[2] -> rho_slow = exp(-exp(par[2]))
# par[3] -> mix = plogis(par[3])
# so the parameter space is unconstrained, while the resulting Toeplitz
# correlation matrix remains valid.
corr_from_par <- function(par) {
rho_fast <- exp(-exp(par[1]))
rho_slow <- exp(-exp(par[2]))
mix <- plogis(par[3])
corr_from_components(rho_fast, rho_slow, mix)
}
## Create block Toeplitz errors from the same family we will fit
corr_true <- corr_from_components(rho_fast_true, rho_slow_true, mix_true)
errors <- Reduce('c',
lapply(1:n_blocks, function(i) {
c(matsqrt(corr_true) %*% rnorm(block_size))
}))
## Generate response with correlated errors and nonlinear covariate effect
y <- mu_true + sigma_true * errors
VhalfInv_fxn <- function(par) {
corr <- corr_from_par(par)
kronecker(diag(n_blocks), matinvsqrt(corr))
}
Vhalf_fxn <- function(par) {
corr <- corr_from_par(par)
kronecker(diag(n_blocks), matsqrt(corr))
}
## Determinant function (for efficiency)
# This avoids taking determinant of N by N matrix
VhalfInv_logdet <- function(par) {
corr <- corr_from_par(par)
log_det_invsqrt_corr <- -0.5 * determinant(corr, logarithm = TRUE)$modulus[1]
n_blocks * log_det_invsqrt_corr
}
## GLM weights for REML gradient helper
# For Gaussian identity, these are all 1.
glm_weight_function <- function(mu, y, order_indices, family,
dispersion, observation_weights, ...) {
rep(1, length(mu))
}
## REML gradient function
# The helper reml_grad_from_dV computes the three REML terms once dV / dpar
# is supplied. For this parameterization, dV / dpar has closed form.
REML_grad <- function(par, model_fit, ...) {
rho_fast <- exp(-exp(par[1]))
rho_slow <- exp(-exp(par[2]))
mix <- plogis(par[3])
dV1_block <- matrix(0, block_size, block_size)
dV2_block <- matrix(0, block_size, block_size)
dV3_block <- matrix(0, block_size, block_size)
for(i in 1:block_size) {
for(j in 1:block_size) {
lag <- abs(i - j)
if(lag > 0) {
## d/dpar[1] through rho_fast = exp(-exp(par[1]))
dV1_block[i, j] <- -mix * lag * exp(par[1]) * rho_fast^lag
## d/dpar[2] through rho_slow = exp(-exp(par[2]))
dV2_block[i, j] <- -(1 - mix) * lag * exp(par[2]) * rho_slow^lag
## d/dpar[3] through mix = plogis(par[3])
dV3_block[i, j] <- mix * (1 - mix) * (rho_fast^lag - rho_slow^lag)
}
}
}
dV1 <- kronecker(diag(n_blocks), dV1_block)
dV2 <- kronecker(diag(n_blocks), dV2_block)
dV3 <- kronecker(diag(n_blocks), dV3_block)
gradient <- numeric(3)
gradient[1] <- reml_grad_from_dV(dV1, model_fit,
glm_weight_function, ...)
gradient[2] <- reml_grad_from_dV(dV2, model_fit,
glm_weight_function, ...)
gradient[3] <- reml_grad_from_dV(dV3, model_fit,
glm_weight_function, ...)
gradient
}
## Visualization
plot(t, y, col = correlation_id_var,
main = 'Simulated Data with Toeplitz Correlation')
## Fit model with custom Toeplitz
model_fit <- lgspline(
response = y,
predictors = t,
K = 4,
standardize_response = FALSE,
VhalfInv_fxn = VhalfInv_fxn,
Vhalf_fxn = Vhalf_fxn,
VhalfInv_logdet = VhalfInv_logdet,
REML_grad = REML_grad,
VhalfInv_par_init = c(0, -1, 0),
include_warnings = FALSE
)
## Print comparison of true and estimated correlations
lag_values <- 1:(block_size - 1)
corr_true_by_lag <- sapply(lag_values, function(h) {
mix_true * rho_fast_true^h + (1 - mix_true) * rho_slow_true^h
})
rho_fast_est <- exp(-exp(model_fit$VhalfInv_params_estimates[1]))
rho_slow_est <- exp(-exp(model_fit$VhalfInv_params_estimates[2]))
mix_est <- plogis(model_fit$VhalfInv_params_estimates[3])
corr_est_by_lag <- sapply(lag_values, function(h) {
mix_est * rho_fast_est^h + (1 - mix_est) * rho_slow_est^h
})
cat('Toeplitz Correlation Estimates by Lag:\n')
print(data.frame(
Lag = lag_values,
True.Correlation = round(corr_true_by_lag, 4),
Estimated.Correlation = round(corr_est_by_lag, 4)
))
## Should be ~ 2
print(sqrt(model_fit$sigmasq_tilde))
## ## ## ## Parallelism ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
if (requireNamespace("parallel", quietly = TRUE)) {
## Generate data
a <- runif(500000, -9, 9)
b <- runif(500000, -9, 9)
c <- rnorm(500000)
d <- rpois(500000, 1)
y <- sin(a) + cos(b) - 0.2*sqrt(a^2 + b^2) +
abs(a) + b +
0.5*(a^2 + b^2) +
(1/6)*(a^3 + b^3) +
a*b*c -
c +
d +
rnorm(500000, 0, 5)
## Set up cores
cl <- parallel::makeCluster(1)
on.exit(try(parallel::stopCluster(cl), silent = TRUE), add = TRUE)
## This example shows some options for what operations can be parallelized.
# By default, only parallel_eigen and parallel_unconstrained are TRUE.
# parallel_unconstrained applies only to GLMs; for an identity-link Gaussian
# response, use parallel_matmult = TRUE to ensure parallel fitting across
# partitions.
# G, G^{-1/2}, and G^{1/2} are computed in parallel across each of the
# K+1 partitions.
# Note that parallel_unconstrained only affects GLMs without correlation
# components - it does not affect fitting here.
system.time({
parfit <- lgspline(y ~ spl(a, b) + a*b*c + d,
data = data.frame(y = y,
a = a,
b = b,
c = c,
d = d),
cl = cl,
K = 1,
parallel_eigen = TRUE,
parallel_unconstrained = TRUE,
parallel_aga = FALSE,
parallel_find_neighbors = FALSE,
parallel_trace = FALSE,
parallel_matmult = TRUE,
parallel_make_constraint = FALSE,
parallel_penalty = FALSE)
})
print(summary(parfit))
}
lgspline: S3 Methods
Description
S3 methods for lgspline objects: print, summary, coef, plot, predict, confint, logLik, and inference helpers.
Low-Level Fitting for Lagrangian Smoothing Splines
Description
The core engine for fitting Lagrangian smoothing splines, with fewer
user conveniences. Called internally by lgspline after
formula parsing, factor encoding, and correlation-structure setup.
Usage
lgspline.fit(predictors, y = NULL, standardize_response = TRUE,
standardize_predictors_for_knots = TRUE,
standardize_expansions_for_fitting = TRUE, family = gaussian(),
glm_weight_function, schur_correction_function,
need_dispersion_for_estimation = FALSE,
dispersion_function,
K = NULL, custom_knots = NULL, cluster_on_indicators = FALSE,
make_partition_list = NULL, previously_tuned_penalties = NULL,
smoothing_spline_penalty = NULL, opt = TRUE, use_custom_bfgs = TRUE,
delta = NULL, tol = 10*sqrt(.Machine$double.eps),
initial_wiggle = c(1e-10, 1e-5, 1e-1),
initial_flat = c(0.1, 10), wiggle_penalty = 2e-07,
flat_ridge_penalty = 0.5, unique_penalty_per_partition = TRUE,
unique_penalty_per_predictor = TRUE, meta_penalty = 1e-08,
predictor_penalties = NULL, partition_penalties = NULL,
include_quadratic_terms = TRUE, include_cubic_terms = TRUE,
include_quartic_terms = FALSE, include_2way_interactions = TRUE,
include_3way_interactions = TRUE,
include_quadratic_interactions = FALSE,
offset = c(), just_linear_with_interactions = NULL,
just_linear_without_interactions = NULL,
exclude_interactions_for = NULL,
exclude_these_expansions = NULL, custom_basis_fxn = NULL,
include_constrain_fitted = TRUE,
include_constrain_first_deriv = TRUE,
include_constrain_second_deriv = TRUE,
include_constrain_interactions = TRUE, cl = NULL, chunk_size = NULL,
parallel_eigen = TRUE, parallel_trace = FALSE, parallel_aga = FALSE,
parallel_matmult = FALSE, parallel_unconstrained = FALSE,
parallel_find_neighbors = FALSE, parallel_penalty = FALSE,
parallel_make_constraint = FALSE,
unconstrained_fit_fxn = unconstrained_fit_default,
keep_weighted_Lambda = FALSE, iterate_tune = TRUE,
iterate_final_fit = TRUE, blockfit = TRUE,
qp_score_function,
qp_observations = NULL, qp_Amat = NULL, qp_bvec = NULL, qp_meq = 0,
qp_positive_derivative = FALSE, qp_negative_derivative = FALSE,
qp_positive_2ndderivative = FALSE, qp_negative_2ndderivative = FALSE,
qp_monotonic_increase = FALSE, qp_monotonic_decrease = FALSE,
qp_range_upper = NULL, qp_range_lower = NULL, qp_Amat_fxn = NULL,
qp_bvec_fxn = NULL, qp_meq_fxn = NULL, constraint_values = cbind(),
constraint_vectors = cbind(), return_G = TRUE, return_Ghalf = TRUE,
return_U = TRUE, estimate_dispersion = TRUE,
unbias_dispersion = TRUE,
return_varcovmat = TRUE, exact_varcovmat = FALSE,
return_lagrange_multipliers = FALSE,
custom_penalty_mat = NULL,
cluster_args = c(custom_centers = NA, nstart = 10),
dummy_dividor = 1.2345672152894e-22,
dummy_adder = 2.234567210529e-18,
verbose = FALSE, verbose_tune = FALSE,
dummy_fit = FALSE, auto_encode_factors = TRUE,
observation_weights = NULL, do_not_cluster_on_these = c(),
neighbor_tolerance = 1 + 1e-16, no_intercept = FALSE,
VhalfInv = NULL, Vhalf = NULL, include_warnings = TRUE,
og_cols = NULL,
factor_groups = NULL, ...)
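A minimal sketch of calling the low-level fitter directly is given below. This is an assumption-laden illustration based on the Usage signature above (most users should call lgspline instead, since it handles formula parsing and factor encoding):

```r
## Hypothetical minimal call to lgspline.fit; predictors and response are
## assumed to already be numeric, as the low-level interface expects.
x <- seq(-3, 3, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.25)
fit <- lgspline.fit(predictors = cbind(x),
                    y = y,
                    K = 2,       # 3 partitions
                    opt = FALSE) # keep penalties at their defaults
```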
Arguments
predictors |
Numeric matrix or data frame of predictor variables on the
low-level input scale expected by |
y |
Default: NULL. Numeric response variable vector. |
standardize_response |
Default: TRUE. Logical indicator controlling whether the response variable should be centered and scaled before model fitting. Only offered for identity link functions. |
standardize_predictors_for_knots |
Default: TRUE. Logical flag controlling
whether predictors are internally standardized for partitioning / knot
placement. The exact transformation is handled inside
|
standardize_expansions_for_fitting |
Default: TRUE. Logical switch to
standardize polynomial basis expansions during model fitting. Design matrices,
variance-covariance matrices, and coefficients are backtransformed after fitting.
|
family |
Default: gaussian(). GLM family specifying the response distribution and link function. |
glm_weight_function |
Default: function returning |
schur_correction_function |
Default: function returning list of zeros.
Computes Schur complements |
need_dispersion_for_estimation |
Default: FALSE. Logical indicator specifying whether a dispersion parameter is required for coefficient estimation (e.g. Weibull AFT). |
dispersion_function |
Default: function returning mean squared residuals. Custom function for estimating the exponential dispersion parameter. |
K |
Default: NULL. Integer specifying the number of knot locations. Intuitively, total partitions minus 1. |
custom_knots |
Default: NULL. Optional matrix providing user-specified knot locations in 1-D. |
cluster_on_indicators |
Default: FALSE. Logical flag for whether indicator variables should be used for clustering knot locations. |
make_partition_list |
Default: NULL. Optional list allowing direct specification
of custom partition assignments. The |
previously_tuned_penalties |
Default: NULL. Optional list of pre-computed penalty components from a previous model fit. |
smoothing_spline_penalty |
Default: NULL. Optional custom smoothing spline penalty matrix. |
opt |
Default: TRUE. Logical switch controlling automatic penalty optimization via generalized cross-validation. |
use_custom_bfgs |
Default: TRUE. Selects between a native damped-BFGS implementation with closed-form gradients or base R's BFGS with finite-difference gradients. |
delta |
Default: NULL. Numeric pseudocount for stabilizing optimization in non-identity link function scenarios. |
tol |
Default: 10*sqrt(.Machine$double.eps). Numeric convergence tolerance for iterative optimization. |
initial_wiggle |
Default: c(1e-10, 1e-5, 1e-1). Numeric vector of initial wiggle-penalty values tried during tuning. |
initial_flat |
Default: c(0.1, 10). Numeric vector of initial flat-ridge-penalty values tried during tuning. |
wiggle_penalty |
Default: 2e-7. Numeric penalty on the integrated squared second derivative, governing function smoothness. |
flat_ridge_penalty |
Default: 0.5. Numeric flat ridge penalty for intercepts and
linear terms only. Multiplied by |
unique_penalty_per_partition |
Default: TRUE. Logical flag allowing penalty magnitude to differ across partitions. |
unique_penalty_per_predictor |
Default: TRUE. Logical flag allowing penalty magnitude to differ between predictors. |
meta_penalty |
Default: 1e-8. Numeric regularization coefficient for predictor- and partition-specific penalties during tuning. On the raw scale, the implemented meta-penalty shrinks these penalty multipliers toward 1; the wiggle penalty receives only a tiny stabilizing penalty by default. |
predictor_penalties |
Default: NULL. Optional vector of custom penalties per predictor, on the raw (positive) scale. |
partition_penalties |
Default: NULL. Optional vector of custom penalties per partition, on the raw (positive) scale. |
include_quadratic_terms |
Default: TRUE. Logical switch to include squared predictor terms. |
include_cubic_terms |
Default: TRUE. Logical switch to include cubic predictor terms. |
include_quartic_terms |
Default: FALSE. Logical switch to include quartic predictor terms at this low-level interface. |
include_2way_interactions |
Default: TRUE. Logical switch for linear two-way interactions. |
include_3way_interactions |
Default: TRUE. Logical switch for three-way interactions. |
include_quadratic_interactions |
Default: FALSE. Logical switch for linear-quadratic interaction terms. |
offset |
Default: Empty vector. Column indices/names to include as offsets. Coefficients for offset terms are automatically constrained to 1. |
just_linear_with_interactions |
Default: NULL. Integer or character vector specifying predictors to retain as linear terms while still allowing interactions. |
just_linear_without_interactions |
Default: NULL. Integer or character vector specifying predictors to retain only as linear terms without interactions. Eligible for blockfitting. |
exclude_interactions_for |
Default: NULL. Integer or character vector of predictors to exclude from all interaction terms. |
exclude_these_expansions |
Default: NULL. Character vector of basis expansions to
exclude. Named columns of data, or in the form |
custom_basis_fxn |
Default: NULL. Optional user-defined function for custom basis
expansions. See |
include_constrain_fitted |
Default: TRUE. Logical switch to constrain fitted values at knot points. |
include_constrain_first_deriv |
Default: TRUE. Logical switch to constrain first derivatives at knot points. |
include_constrain_second_deriv |
Default: TRUE. Logical switch to constrain second derivatives at knot points. |
include_constrain_interactions |
Default: TRUE. Logical switch to constrain interaction terms at knot points. |
cl |
Default: NULL. Parallel processing cluster object
(use |
chunk_size |
Default: NULL. Integer specifying custom chunk size for parallel processing. |
parallel_eigen |
Default: TRUE. Logical flag for parallel eigenvalue decomposition. |
parallel_trace |
Default: FALSE. Logical flag for parallel trace computation. |
parallel_aga |
Default: FALSE. Logical flag for parallel |
parallel_matmult |
Default: FALSE. Logical flag for parallel block-diagonal matrix multiplication. |
parallel_unconstrained |
Default: FALSE. Logical flag for parallel unconstrained MLE for non-identity-link-Gaussian models. |
parallel_find_neighbors |
Default: FALSE. Logical flag for parallel neighbor identification. |
parallel_penalty |
Default: FALSE. Logical flag for parallel penalty matrix construction. |
parallel_make_constraint |
Default: FALSE. Logical flag for parallel constraint matrix generation. |
unconstrained_fit_fxn |
Default: |
keep_weighted_Lambda |
Default: FALSE. Logical flag to retain GLM weights in penalty constraints using Tikhonov parameterization. Advised for non-canonical GLMs. |
iterate_tune |
Default: TRUE. Logical switch for iterative optimization during penalty tuning. |
iterate_final_fit |
Default: TRUE. Logical switch for iterative optimization in final model fitting. |
blockfit |
Default: TRUE. Logical switch for backfitting with mixed spline and
non-interactive linear terms. Requires flat columns, |
qp_score_function |
Default: |
qp_observations |
Default: NULL. Numeric vector of observation indices at which built-in QP constraints are evaluated. Useful for reducing the size of the constrained system. |
qp_Amat |
Default: NULL. Optional pre-built QP constraint matrix.
In the current pipeline its presence marks QP handling as active, but the
built-in constructor does not merge it into the assembled constraint set;
use |
qp_bvec |
Default: NULL. Optional pre-built QP right-hand side paired
with |
qp_meq |
Default: 0. Optional number of equality constraints paired
with |
qp_positive_derivative |
Default: FALSE. Constrain function to have positive first derivatives. Accepts: |
qp_negative_derivative |
Default: FALSE. Constrain function to have negative first derivatives. Same input types as |
qp_positive_2ndderivative |
Default: FALSE. Constrain function to have positive (convex) second derivatives. Same input types as |
qp_negative_2ndderivative |
Default: FALSE. Constrain function to have negative (concave) second derivatives. Same input types as |
qp_monotonic_increase |
Default: FALSE. Logical only. Constrain fitted values to be monotonically increasing in observation order. |
qp_monotonic_decrease |
Default: FALSE. Logical only. Constrain fitted values to be monotonically decreasing in observation order. |
qp_range_upper |
Default: NULL. Numeric upper bound for constrained fitted values. |
qp_range_lower |
Default: NULL. Numeric lower bound for constrained fitted values. |
qp_Amat_fxn |
Default: NULL. Custom function generating Amat. |
qp_bvec_fxn |
Default: NULL. Custom function generating bvec. |
qp_meq_fxn |
Default: NULL. Custom function generating meq. |
constraint_values |
Default: |
constraint_vectors |
Default: |
return_G |
Default: TRUE. Logical switch to return the unscaled unconstrained
variance-covariance matrix |
return_Ghalf |
Default: TRUE. Logical switch to return
|
return_U |
Default: TRUE. Logical switch to return the constraint projection
matrix |
estimate_dispersion |
Default: TRUE. Logical flag to estimate dispersion after fitting. |
unbias_dispersion |
Default: TRUE. Logical switch to multiply
dispersion by |
return_varcovmat |
Default: TRUE. Logical switch to return the variance-covariance matrix of estimated coefficients. Needed for Wald inference. |
exact_varcovmat |
Default: FALSE. Logical switch to replace the default
asymptotic (Bayesian posterior) variance-covariance matrix with the exact
frequentist variance-covariance matrix of the constrained estimator. The
asymptotic version uses the Hessian of
the penalized log-likelihood.
When a correlation structure is present ( |
return_lagrange_multipliers |
Default: FALSE. Logical switch to return the Lagrangian multiplier vector. |
custom_penalty_mat |
Default: NULL. Optional |
cluster_args |
Default: |
dummy_dividor |
Default: 1.2345672152894e-22. Small numeric constant added to denominators to prevent division by zero. |
dummy_adder |
Default: 2.234567210529e-18. Small numeric constant added to numerators to avoid exact-zero values. |
verbose |
Default: FALSE. Logical flag to print general progress messages. |
verbose_tune |
Default: FALSE. Logical flag to print detailed progress during penalty tuning. |
dummy_fit |
Default: FALSE. Runs the full pipeline but sets coefficients to zero,
allowing inspection of design matrix structure, penalty matrices, and partitioning.
Replaces the deprecated |
auto_encode_factors |
Default: TRUE. Compatibility flag carried through
from higher-level preprocessing. Direct calls to |
observation_weights |
Default: NULL. Numeric vector of observation-specific weights for generalized least squares estimation. |
do_not_cluster_on_these |
Default: |
neighbor_tolerance |
Default: |
no_intercept |
Default: FALSE. Logical flag to constrain intercept to 0.
Formulas with |
VhalfInv |
Default: NULL. Fixed custom |
Vhalf |
Default: NULL. Fixed custom |
include_warnings |
Default: TRUE. Logical switch to control display of warnings. |
og_cols |
Default: NULL. Original predictor names |
factor_groups |
Named list mapping original factor variable names to
integer vectors of their corresponding one-hot indicator column positions
within the predictor matrix. Each element enforces a sum-to-zero equality
constraint on the linear-term coefficients of its indicator columns within
every partition, ensuring identifiability when all factor levels are
included without a reference/dropped level. For a group with indicator
columns at positions |
... |
Additional arguments passed to the unconstrained model fitting function. |
Details
lgspline.fit performs the following steps:

1. Polynomial expansion and predictor standardization.

2. Knot placement and partitioning (k-means or custom).

3. Constraint matrix \mathbf{A} construction. Only a linearly independent subset of columns is retained via pivoted QR decomposition.

4. Penalty tuning via GCV (exponential parameterization) or use of previously tuned penalties.

5. Final coefficient estimation via one of two routes:

- Blockfit option (when blockfit = TRUE, flat columns are non-empty, K > 0, and no correlation structure): routes through blockfit_solve for backfitting with mixed spline and non-interactive linear terms. Falls back to get_B on failure.

- Standard get_B path, with three internal computational paths: GEE (damped SQP with correlation structures), Gaussian identity (closed-form OLS projection), and general GLM (unconstrained fit + Lagrangian projection with optional IRLS loop).

6. Post-fit inference: \mathbf{U}, trace, dispersion, variance-covariance matrix, and optionally Lagrange multipliers. When VhalfInv is non-NULL, these are computed from the whitened Gram matrices \mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X} via the full penalized GLS information \mathbf{G}_{\mathrm{correct}} = (\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X} + \boldsymbol{\Lambda})^{-1}.
Dummy fit. When dummy_fit = TRUE, an early-return path
skips the expensive fitting steps (compute_G_eigen, get_B,
trace computation, variance-covariance matrix) while retaining all
penalty, partitioning, and design matrix information. Coefficients are
set to zero. This replaces the deprecated expansions_only argument.
Value
A list containing the fitted model components, forming the core
structure used internally by lgspline and its associated methods.
This function is primarily intended for internal use or advanced users needing
direct access to fitting components. The returned list contains numerous elements,
typically including:
- y

The original response vector provided.

- ytilde

The fitted values on the original response scale. Set to rep(0, N) when dummy_fit = TRUE.

- X

A list, with each element the design matrix (\mathbf{X}_{k}) for partition k, on the unstandardized expansion scale.

- A

The constraint matrix (\mathbf{A}) encoding smoothness and any other linear equality constraints. Reduced to linearly independent columns via pivoted QR decomposition.

- B

A list of the final fitted coefficient vectors (\boldsymbol{\beta}_{k}) for each partition k, on the original predictor/response scale.

- B_raw

A list of fitted coefficient vectors on the internally standardized scale used during fitting.

- K, p, q, P, N

Key dimensions: number of internal knots (K), basis functions per partition (p), original predictors (q), total coefficients (P), and sample size (N).

- penalties

A list containing the final penalty components used (e.g., Lambda, L1, L2, L_predictor_list, L_partition_list). See compute_Lambda.

- knot_scale_transf, knot_scale_inv_transf

Functions to transform predictors to/from the scale used for knot placement.

- knots

Matrix or vector of knot locations on the original predictor scale (NULL if K = 0 or q > 1).

- partition_codes

Vector assigning each original observation to a partition.

- partition_bounds

Internal representation of partition boundaries.

- make_partition_list

List containing centers, knot midpoints, neighbor info, and assignment function from partitioning (NULL if K = 0 or 1D). See make_partitions.

- knot_expand_function, assign_partition

Internal functions for partitioning data. See knot_expand_list.

- predict

The primary function embedded in the object for generating predictions on new data. For multi-predictor models, take_first_derivatives = TRUE returns derivatives as a named list of per-variable derivative vectors rather than a concatenated vector. See predict.lgspline.

- family

The family object or custom list used.

- estimate_dispersion, unbias_dispersion

Logical flags related to dispersion estimation settings.

- sigmasq_tilde

The estimated (or fixed) dispersion parameter \tilde{\sigma}^{2}. For Gaussian identity fits with VhalfInv non-NULL, this is computed from whitened residuals \mathbf{V}^{-1/2}(\mathbf{y} - \hat{\mathbf{y}}), multiplied by the observation weights and the optional bias-correction factor. When estimate_dispersion = FALSE, set to 1. Omitted when dummy_fit = TRUE.

- backtransform_coefficients, forwtransform_coefficients

Functions to convert coefficients between standardized and original scales.

- mean_y, sd_y

Mean and standard deviation used for standardizing the response.

- og_order, order_list

Information mapping original data order to partitioned order.

- constraint_values, constraint_vectors

User-supplied additional linear equality constraints.

- expansion_scales

Scaling factors applied to basis expansions during fitting (if standardize_expansions_for_fitting = TRUE).

- take_derivative, take_interaction_2ndderivative, get_all_derivatives_insample

Functions related to computing derivatives of the fitted spline.

- numerics, power1_cols, ..., nonspline_cols

Integer vectors storing column indices identifying different types of terms in the basis expansion.

- return_varcovmat

Logical indicating if variance matrix calculation was requested.

- exact_varcovmat

Not returned as a standalone component; this argument only controls whether varcovmat, when requested, is left as the default asymptotic/Laplace version or replaced by the exact frequentist correction available for Gaussian identity fits.

- raw_expansion_names

Original generated names for basis expansion columns (before potential renaming if input predictors had names).

- std_X, unstd_X

Functions to standardize/unstandardize design matrices according to expansion_scales.

- parallel_cluster_supplied

Logical indicating if a parallel cluster was used.

- weights

The original observation weights provided (potentially reformatted).

- VhalfInv

The fixed \mathbf{V}^{-1/2} matrix if supplied.

- quadprog_list

List containing components related to quadratic programming constraints, if used.

- G

List of unscaled variance-covariance matrices \mathbf{G}_{k} per partition, returned if return_G = TRUE. When VhalfInv is non-NULL, recomputed from whitened Gram matrices. Omitted when dummy_fit = TRUE.

- Ghalf

List of \mathbf{G}_{k}^{1/2} matrices, returned if return_Ghalf = TRUE. When VhalfInv is non-NULL, the full \mathbf{G}_{\mathrm{correct}}^{1/2} is used for posterior draws and variance-covariance computation. Omitted when dummy_fit = TRUE.

- U

Constraint projection matrix \mathbf{U}, returned if return_U = TRUE. Omitted when dummy_fit = TRUE.

- trace_XUGX

The effective degrees-of-freedom trace term. When VhalfInv is non-NULL, it is computed from the full penalized GLS information rather than the block-diagonal approximation. Omitted when dummy_fit = TRUE.

- varcovmat

The final variance-covariance matrix of the estimated coefficients. Computed via the outer-product form \sigma^{2}(\mathbf{U}\mathbf{G}^{1/2})(\mathbf{U}\mathbf{G}^{1/2})^{\top} for numerical stability. When VhalfInv is non-NULL, uses the full \mathbf{G}_{\mathrm{correct}}^{1/2} in place of block-diagonal \mathbf{G}^{1/2}. Returned if return_varcovmat = TRUE. By default this is the asymptotic (Laplace/posterior) variance-covariance matrix; when exact_varcovmat = TRUE, it is replaced in-place by the exact frequentist correction available for Gaussian identity fits. Omitted when dummy_fit = TRUE.

- lagrange_multipliers

Vector of Lagrangian multipliers if return_lagrange_multipliers = TRUE. For equality-only fits these follow the formulation (\mathbf{A}^{\top}\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^{\top}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_{0}). When quadratic-programming constraints are active they are taken directly from solve.QP and therefore refer to the combined equality/inequality constraint system. NULL if no constraints are active.
Note that the exact components returned depend heavily on the function
arguments (e.g., values of return_G, return_varcovmat, etc.)
and whether dummy_fit = TRUE.
Fit Cox Proportional Hazards Model via lgspline
Description
Convenience wrapper that calls lgspline with the correct
family, weight, dispersion, score, and unconstrained-fit functions for
Cox proportional hazards regression. All standard lgspline arguments
(knots, penalties, constraints, parallel, etc.) are passed through.
Usage
lgspline_cox(formula, data, status, ...)
Arguments
formula |
Formula specifying the model. The response should be
survival time; |
data |
Data frame. |
status |
Integer vector of event indicators (1 = event,
0 = censored), same length as the number of rows in |
... |
Additional arguments passed to |
Details
Internally sets:
-
family = cox_family() -
unconstrained_fit_fxn = unconstrained_fit_cox -
glm_weight_function = cox_glm_weight_function -
qp_score_function = cox_qp_score_function -
dispersion_function = cox_dispersion_function -
schur_correction_function = cox_schur_correction -
need_dispersion_for_estimation = FALSE -
estimate_dispersion = FALSE -
standardize_response = FALSE
A formula interface is required: unlike ordinary lgspline, the
two-argument call lgspline_cox(t, y, ...) will not work.
Value
An object of class "lgspline".
Examples
## Cox PH with a nonlinear age effect on lung cancer survival
if(requireNamespace("survival", quietly = TRUE)) {
library(survival)
set.seed(1234)
lung <- na.omit(lung[, c("time", "status", "age")])
lung$age_std <- std(lung$age)
## survival codes status as 1 = censored, 2 = dead
event <- as.integer(lung$status == 2)
## Spline on age
fit <- lgspline_cox(
time ~ spl(age_std),
data = lung,
status = event,
K = 1
)
print(summary(fit))
plot(fit,
show_formulas = TRUE,
custom_response_lab = 'HR',
custom_predictor_lab = 'Standardized Age',
ylim = c(0, 5))
}
Fit Negative Binomial Model via lgspline
Description
Convenience wrapper that calls lgspline with the correct
family, weight, dispersion, score, and unconstrained-fit functions for
NB2 regression. All standard lgspline arguments (knots, penalties,
constraints, parallel, correlation structures, etc.) are passed through.
Usage
lgspline_negbin(formula, data, ...)
Arguments
formula |
Formula specifying the model. The response should be non-negative integer counts. |
data |
Data frame. |
... |
Additional arguments passed to |
Details
Internally sets:
-
family = negbin_family() -
unconstrained_fit_fxn = unconstrained_fit_negbin -
glm_weight_function = negbin_glm_weight_function -
qp_score_function = negbin_qp_score_function -
dispersion_function = negbin_dispersion_function -
schur_correction_function = negbin_schur_correction -
need_dispersion_for_estimation = TRUE -
estimate_dispersion = TRUE -
standardize_response = FALSE
A formula interface is required: unlike ordinary lgspline, the
two-argument call lgspline_negbin(t, y) will not work.
When a correlation structure is supplied via Vhalf/VhalfInv,
the model is fitted through the GEE Path 1b machinery in get_B.
The dispersion function uses VhalfInv to whiten Pearson
residuals for a better moment-based initialization of \theta,
which stabilizes the profile MLE under moderate to strong correlation.
The score function handles the whitened design consistently with the
Weibull AFT GEE convention.
Value
An object of class "lgspline".
See Also
lgspline_cox for Cox PH,
lgspline_weibull for Weibull AFT,
negbin_family, unconstrained_fit_negbin
Examples
set.seed(1234)
N <- 300
t <- rnorm(N)
mu <- exp(1 + 0.5 * sin(2 * t))
y <- rnbinom(N, size = 3, mu = mu)
df <- data.frame(response = y, predictor = t)
fit <- lgspline_negbin(
response ~ spl(predictor),
data = df,
K = 2,
opt = FALSE,
wiggle_penalty = 1e-2
)
print(summary(fit))
plot(fit, show_formulas = TRUE,
custom_response_lab = 'Count')
points(t, mu, col = 'grey', cex=0.67)
Fit Weibull Accelerated Failure Time Model via lgspline
Description
Convenience wrapper that calls lgspline with the correct
family, weight, dispersion, score, and unconstrained-fit functions for
Weibull accelerated failure time regression. All standard lgspline
arguments (knots, penalties, constraints, parallel, etc.) are passed
through.
Usage
lgspline_weibull(formula, data, status, ...)
Arguments
formula |
Formula specifying the model. The response should be
survival time; |
data |
Data frame. |
status |
Integer vector of event indicators (1 = event,
0 = censored), same length as the number of rows in |
... |
Additional arguments passed to |
Details
Internally sets:
-
family = weibull_family() -
unconstrained_fit_fxn = unconstrained_fit_weibull -
glm_weight_function = weibull_glm_weight_function -
qp_score_function = weibull_qp_score_function -
dispersion_function = weibull_dispersion_function -
schur_correction_function = weibull_schur_correction -
need_dispersion_for_estimation = TRUE -
estimate_dispersion = TRUE -
standardize_response = FALSE
A formula interface is required: unlike ordinary lgspline, the
two-argument call lgspline_weibull(t, y, ...) will not work.
Value
An object of class "lgspline".
See Also
lgspline_cox for Cox proportional hazards,
weibull_family, unconstrained_fit_weibull
Examples
## Weibull AFT with a nonlinear age effect on lung cancer survival
if(requireNamespace("survival", quietly = TRUE)) {
library(survival)
set.seed(1234)
lung <- na.omit(lung[, c("time", "status", "age")])
lung$age_std <- std(lung$age)
## survival codes status as 1 = censored, 2 = dead
event <- as.integer(lung$status == 2)
## Spline on age
fit <- lgspline_weibull(
time ~ spl(age_std),
data = lung,
status = event,
K = 1,
opt = FALSE,
wiggle_penalty = 1e-4,
flat_ridge_penalty = 1
)
print(summary(fit))
plot(fit,
show_formulas = TRUE,
custom_response_lab = 'Survival Time',
custom_predictor_lab = 'Standardized Age')
}
Extract Log-Likelihood from a Fitted lgspline
Description
Returns the log-likelihood as a "logLik" object for use with
AIC, BIC, and other model
comparison tools.
Usage
## S3 method for class 'lgspline'
logLik(object, include_prior = TRUE, new_weights = NULL, ...)
Arguments
object |
A fitted lgspline model object. |
include_prior |
Logical; add the log-prior penalty term. Default TRUE. |
new_weights |
Numeric scalar or N-vector; optional observation weights
overriding |
... |
Not used. |
Details
Gaussian identity, no correlation.
\ell = -\frac{N}{2}\log(2\pi\tilde{\sigma}^2) -
\frac{1}{2\tilde{\sigma}^2}\sum_{i}(y_i - \hat{y}_i)^2
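The identity-link expression above is simply the sum of independent normal log-densities, which gives an easy sanity check in plain R (a sketch independent of the package; gauss_loglik is a hypothetical helper name):

```r
## Cross-check the Gaussian identity log-likelihood formula against dnorm()
gauss_loglik <- function(y, yhat, sigma2) {
  -length(y) / 2 * log(2 * pi * sigma2) - sum((y - yhat)^2) / (2 * sigma2)
}

set.seed(1)
y <- rnorm(5)
yhat <- rep(0, 5)
gauss_loglik(y, yhat, 2)
sum(dnorm(y, yhat, sqrt(2), log = TRUE))  # identical value
```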
Gaussian identity, with correlation. GLS log-likelihood:
\ell = -\frac{N}{2}\log(2\pi\tilde{\sigma}^2)
+ \log|\mathbf{V}^{-1/2}|
- \frac{1}{2\tilde{\sigma}^2}
\|\mathbf{V}^{-1/2}(\mathbf{y} - \hat{\mathbf{y}})\|^2
\log|\mathbf{V}^{-1/2}| is obtained from VhalfInv_logdet
when available, or computed directly from VhalfInv.
Prior contribution.
When include_prior = TRUE (default), the log-prior
-\frac{1}{2\tilde{\sigma}^2}
\sum_{k}\boldsymbol{\beta}_k^\top\boldsymbol{\Lambda}_k
\boldsymbol{\beta}_k
is added, giving the penalised MAP log-likelihood coherent with the
smoothing spline objective. Set include_prior = FALSE for the
unpenalised marginal likelihood, which is more appropriate when comparing
models with different penalty structures or numbers of knots.
Other GLM families.
Uses family$aic() when available. For correlated models the
whitened residuals and fitted values are passed. When family$aic()
is unavailable, a deviance-based approximation is used (valid for
relative comparisons; a warning is emitted).
This function returns the marginal (full) GLS log-likelihood, not the
REML log-likelihood. This is consistent with REML = FALSE in
lme and gls, and is the conventional choice for AIC/BIC
comparisons of fixed-effects structure.
The df attribute is set to N - \mathrm{trace}(\mathbf{XUGX}^\top).
Value
A "logLik" object with attributes df (effective
degrees of freedom) and nobs (number of observations).
See Also
lgspline, prior_loglik,
logLik, AIC,
BIC
Examples
set.seed(1234)
t <- runif(1000, -10, 10)
y <- 2*sin(t) + -0.06*t^2 + rnorm(length(t))
model_fit <- lgspline(t, y)
logLik(model_fit)
logLik(model_fit, include_prior = FALSE)
AIC(model_fit)
BIC(model_fit)
## Compare models with different K using unpenalized likelihood
fit_k3 <- lgspline(t, y, K = 3)
fit_k7 <- lgspline(t, y, K = 7)
AIC(fit_k3, fit_k7)
Compute Cox Partial Log-Likelihood
Description
Evaluates the Cox partial log-likelihood for a given coefficient vector, using the Breslow approximation for tied event times.
Observations must be sorted in ascending order of survival time before calling this function. The internal helpers handle sorting automatically; this function is exposed for diagnostics and testing.
Usage
loglik_cox(eta, status, y = NULL, weights = 1)
Arguments
eta |
Numeric vector of linear predictors |
status |
Integer/logical vector of event indicators (1 = event,
0 = censored), same length and order as |
y |
Optional numeric vector of observed event/censor times, same length
and order as |
weights |
Optional numeric vector of observation weights (default 1). |
Details
The partial log-likelihood (Breslow) is
\ell(\boldsymbol{\beta}) =
\sum_{g} \Bigl[\sum_{i \in D_g} w_i \eta_i -
d_g^{(w)} \log\Bigl(\sum_{j \in R_g} w_j \exp(\eta_j)\Bigr)\Bigr]
where D_g is the event set at tied event time t_g,
R_g = \{j : t_j \ge t_g\} is the corresponding risk set, and
d_g^{(w)} = \sum_{i \in D_g} w_i.
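The Breslow expression above can be evaluated directly in a few lines of plain R, useful for verifying outputs on small examples (a reference sketch independent of the package's internals; breslow_loglik is a hypothetical helper name):

```r
## Reference implementation of the weighted Breslow partial log-likelihood.
## Assumes eta, status, y are already sorted in ascending order of y.
breslow_loglik <- function(eta, status, y, weights = rep(1, length(eta))) {
  event_times <- unique(y[status == 1])
  ll <- 0
  for (tg in event_times) {
    D <- which(status == 1 & y == tg)   # tied events at time tg
    R <- which(y >= tg)                 # risk set at time tg
    d_w <- sum(weights[D])              # weighted number of tied events
    ll <- ll + sum(weights[D] * eta[D]) -
      d_w * log(sum(weights[R] * exp(eta[R])))
  }
  ll
}

set.seed(1)
y <- sort(rexp(20)); status <- rbinom(20, 1, 0.7); eta <- rnorm(20)
breslow_loglik(eta, status, y)
```

With eta = 0 throughout, each event at time t_g contributes minus the log of the risk-set size, which makes hand-checking small cases straightforward.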
Value
Scalar partial log-likelihood value.
Examples
set.seed(1234)
eta <- rnorm(50)
status <- rbinom(50, 1, 0.6)
y <- rexp(50)
loglik_cox(eta, status, y)
Compute Negative Binomial Log-Likelihood
Description
Evaluates the NB2 log-likelihood for given mean vector and shape parameter.
Usage
loglik_negbin(y, mu, theta, weights = 1)
Arguments
y |
Non-negative integer response vector. |
mu |
Positive mean vector, same length as |
theta |
Positive scalar shape parameter. |
weights |
Optional observation weights (default 1). |
Details
The log-likelihood is
\ell(\mu, \theta) = \sum_i w_i \bigl[
\log\Gamma(y_i + \theta) - \log\Gamma(\theta) - \log\Gamma(y_i + 1)
+ \theta\log\theta - \theta\log(\mu_i + \theta)
+ y_i\log\mu_i - y_i\log(\mu_i + \theta)\bigr]
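The expression above matches R's dnbinom parameterization with size = theta, which gives a one-line cross-check (a sketch independent of loglik_negbin; nb2_loglik is a hypothetical helper name):

```r
## NB2 log-likelihood via the stats::dnbinom (size, mu) parameterization
nb2_loglik <- function(y, mu, theta, weights = 1) {
  sum(weights * dnbinom(y, size = theta, mu = mu, log = TRUE))
}

set.seed(1)
mu <- exp(rnorm(10))
y <- rpois(10, mu)
nb2_loglik(y, mu, theta = 5)
```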
Value
Scalar log-likelihood value.
Examples
set.seed(1234)
mu <- exp(rnorm(50))
y <- rpois(50, mu)
loglik_negbin(y, mu, theta = 5)
Compute Log-Likelihood for Weibull Accelerated Failure Time Model
Description
Calculates the log-likelihood for a Weibull accelerated failure time (AFT) survival model, supporting right-censored survival data.
Usage
loglik_weibull(log_y, log_mu, status, scale, weights = 1)
Arguments
log_y |
Numeric vector of logarithmic response/survival times |
log_mu |
Numeric vector of logarithmic predicted survival times |
status |
Numeric vector of censoring indicators (1 = event, 0 = censored). Indicates whether an event of interest occurred (1) or the observation was right-censored (0). In survival analysis, right-censoring occurs when the full survival time is unknown, typically because the study ended or the subject was lost to follow-up before the event of interest occurred. |
scale |
Numeric scalar representing the Weibull scale parameter
(sigma), equivalent to |
weights |
Optional numeric vector of observation weights (default = 1) |
Details
The function computes log-likelihood contributions for a Weibull AFT model, explicitly accounting for right-censored observations. It supports optional observation weighting to accommodate complex sampling designs.
This provides both a working tool for fitting Weibull AFT models and boilerplate code for users who wish to incorporate Lagrangian multiplier smoothing splines into their own custom models.
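Under the usual extreme-value representation z = (log y − log μ)/σ (as in survreg), the weighted right-censored log-likelihood can be sketched in a few lines. This illustrates the standard AFT formula, not necessarily the package's exact internal parameterization; weibull_aft_loglik is a hypothetical helper name:

```r
## Sketch of a right-censored Weibull AFT log-likelihood on the log-time scale
weibull_aft_loglik <- function(log_y, log_mu, status, scale, weights = 1) {
  z <- (log_y - log_mu) / scale
  ## event: log density of the extreme-value error; censored: log survival
  sum(weights * ifelse(status == 1, z - exp(z) - log(scale), -exp(z)))
}
```

For an uncensored observation this agrees with dweibull(y, shape = 1/scale, scale = mu, log = TRUE) plus the log(y) Jacobian for the log-time transformation; for a censored one it agrees with the log Weibull survival function.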
Value
A numeric scalar representing the weighted total log-likelihood of the model
Examples
## Minimal example of fitting a Weibull Accelerated Failure Time model
# Simulating survival data with right-censoring
set.seed(1234)
x1 <- rnorm(1000)
x2 <- rbinom(1000, 1, 0.5)
yraw <- rexp(1000, exp(0.01*x1 + 0.01*x2))
# status: 1 = event occurred, 0 = right-censored
status <- rbinom(1000, 1, 0.25)
# censored observations are cut off before the true event time
yobs <- ifelse(status, yraw, runif(length(yraw), 0, yraw))
df <- data.frame(
y = yobs,
x1 = x1,
x2 = x2
)
## Fit model using lgspline with Weibull AFT specifics
model_fit <- lgspline(y ~ spl(x1) + x2,
df,
unconstrained_fit_fxn = unconstrained_fit_weibull,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
schur_correction_function = weibull_schur_correction,
status = status,
opt = FALSE,
K = 1)
loglik_weibull(log(model_fit$y), log(model_fit$ytilde), status,
sqrt(model_fit$sigmasq_tilde))
Create Smoothing Spline Constraint Matrix
Description
Constructs constraint matrix \textbf{A} enforcing continuity and smoothness at knot boundaries
by constraining function values, derivatives, and interactions between partitions.
Usage
make_constraint_matrix(
p_expansions,
CKnots,
power1_cols,
power2_cols,
nonspline_cols,
interaction_single_cols,
interaction_quad_cols,
triplet_cols,
K,
include_constrain_fitted,
include_constrain_first_deriv,
include_constrain_second_deriv,
include_constrain_interactions,
include_2way_interactions,
include_3way_interactions,
include_quadratic_interactions,
colnm_expansions,
expansion_scales
)
Arguments
p_expansions |
Integer; number of columns in basis expansion per partition |
CKnots |
Matrix; basis expansions evaluated at knot points |
power1_cols |
Integer vector; indices of linear terms |
power2_cols |
Integer vector; indices of quadratic terms |
nonspline_cols |
Integer vector; indices of non-spline terms |
interaction_single_cols |
Integer vector; indices of linear interaction terms |
interaction_quad_cols |
Integer vector; indices of quadratic interaction terms |
triplet_cols |
Integer vector; indices of three-way interaction terms |
K |
Integer; number of interior knots ( |
include_constrain_fitted |
Logical; constrain function values at knots |
include_constrain_first_deriv |
Logical; constrain first derivatives at knots |
include_constrain_second_deriv |
Logical; constrain second derivatives at knots |
include_constrain_interactions |
Logical; constrain interaction terms at knots |
include_2way_interactions |
Logical; include two-way interactions |
include_3way_interactions |
Logical; include three-way interactions |
include_quadratic_interactions |
Logical; include quadratic interactions |
colnm_expansions |
Character vector; column names for basis expansions |
expansion_scales |
Numeric vector; scaling factors for standardization |
Value
Matrix \textbf{A} of constraint coefficients. Columns correspond to
constraints, rows to coefficients across all K+1 partitions.
Compute First and Second Derivative Matrices
Description
Compute First and Second Derivative Matrices
Usage
make_derivative_matrix(
p_expansions,
Cpredictors,
power1_cols,
power2_cols,
nonspline_cols,
interaction_single_cols,
interaction_quad_cols,
triplet_cols,
K,
include_2way_interactions,
include_3way_interactions,
include_quadratic_interactions,
colnm_expansions,
expansion_scales,
just_first_derivatives = FALSE,
just_spline_effects = TRUE
)
Arguments
p_expansions |
Number of columns in the basis expansion per partition |
Cpredictors |
Predictor matrix |
power1_cols |
Indices of linear terms of spline effects |
power2_cols |
Indices of quadratic terms of spline effects |
nonspline_cols |
Indices of non-spline effects |
interaction_single_cols |
Indices of first-order interactions |
interaction_quad_cols |
Indices of quadratic interactions |
triplet_cols |
Indices of three-way interactions |
K |
Number of partitions minus 1 ( |
include_2way_interactions |
Include 2-way interactions |
include_3way_interactions |
Include 3-way interactions |
include_quadratic_interactions |
Include quadratic interactions |
colnm_expansions |
Column names |
expansion_scales |
Scale factors |
just_first_derivatives |
Only compute first derivatives |
just_spline_effects |
Only compute derivatives for spline effects |
Value
List containing first-derivative matrices and, unless
just_first_derivatives = TRUE, second-derivative matrices
Create Data Partitions Using Clustering
Description
Partitions data support into clusters using Voronoi-like diagrams.
Standardization for clustering is handled internally; centers, knots, and
the returned assign_partition function are all on the raw
(natural) scale of the original predictors.
Usage
make_partitions(
data,
cluster_args,
cluster_on_indicators,
K,
parallel,
cl,
do_not_cluster_on_these,
neighbor_tolerance,
standardize = TRUE,
standardize_mode = "auto",
dummy_adder = 0,
dummy_dividor = 0
)
Arguments
data |
Numeric matrix of predictor variables (raw scale) |
cluster_args |
Parameters for clustering |
cluster_on_indicators |
Logical to include binary predictors |
K |
Number of partitions minus 1 ( |
parallel |
Logical to enable parallel processing |
cl |
Cluster object for parallel computation |
do_not_cluster_on_these |
Columns to exclude from clustering |
neighbor_tolerance |
Scaling factor for neighbor detection |
standardize |
Logical; whether to standardize data internally before
clustering. Should equal |
standardize_mode |
Character; |
dummy_adder |
Small constant added to numerator during standardization
to avoid exact-zero values (matches |
dummy_dividor |
Small constant added to denominator during
standardization to avoid division by zero (matches
|
Value
A list containing:
- centers
Cluster center coordinates on the raw scale.
- knots
Knot points between centers on the raw scale.
- assign_partition
Function that accepts raw-scale new data and returns partition assignments on the half-integer scale (0.5, 1.5, ...).
- neighbors
List of neighboring partition indices.
- standardize_transf
The forward standardization function used internally (for diagnostic use only).
- standardize_inv_transf
The inverse standardization function (for diagnostic use only).
- centers_std
Cluster centers on the standardized scale (for diagnostic use only).
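The Voronoi-like assignment amounts to labeling each new point with its nearest cluster center. A base-R sketch of the idea only (not the package's assign_partition, and the centers below are made up):

```r
## Nearest-center (Voronoi-style) assignment, illustrated in base R.
## Sketch of the idea only, not the package's assign_partition.
centers <- matrix(c(-2, 0, 2), ncol = 1)    # three hypothetical 1-D centers
newx    <- matrix(c(-3, 0.4, 5), ncol = 1)  # raw-scale new data
## Squared distance from each new point to each center
d2 <- outer(newx[, 1], centers[, 1], function(a, b) (a - b)^2)
partition <- apply(d2, 1, which.min)        # index of the nearest center
```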
Calculate Matrix Inverse Square Root for Symmetric Matrices
Description
Calculate Matrix Inverse Square Root for Symmetric Matrices
Usage
matinvsqrt(mat)
Arguments
mat |
A symmetric matrix |
Details
Uses an eigenvalue-decomposition-based approach.
Non-positive eigenvalues are set to 0 before taking inverse fourth roots.
This implementation is particularly useful for whitening procedures in GLMs with correlation structures and for computing variance-covariance matrices under constraints.
You can use this to help construct a custom VhalfInv_fxn for
lgspline. When only VhalfInv is supplied there,
the corresponding Vhalf is reconstructed internally by inversion
for the GEE code paths.
Value
A matrix \textbf{B} such that \textbf{B}\textbf{B}
equals the Moore-Penrose-style inverse on the positive-eigenvalue
subspace, with non-positive components truncated to 0.
Examples
## Identity matrix
m1 <- diag(2)
matinvsqrt(m1) # Returns identity matrix
## Compound symmetry correlation matrix
rho <- 0.5
m2 <- matrix(rho, 3, 3) + diag(1-rho, 3)
B <- matinvsqrt(m2)
# Verify: B %*% B approximately equals solve(m2)
all.equal(B %*% B, solve(m2))
## Example for GLM correlation structure
n_blocks <- 2 # Number of subjects
block_size <- 3 # Measurements per subject
rho <- 0.7 # Within-subject correlation
# Correlation matrix for one subject
R <- matrix(rho, block_size, block_size) +
diag(1-rho, block_size)
## Full correlation matrix for all subjects
V <- kronecker(diag(n_blocks), R)
## Create whitening matrix
VhalfInv <- matinvsqrt(V)
# Example construction of VhalfInv_fxn for lgspline
VhalfInv_fxn <- function(par) {
rho <- tanh(par) # Transform parameter to (-1, 1)
R <- matrix(rho, block_size, block_size) +
diag(1-rho, block_size)
kronecker(diag(n_blocks), matinvsqrt(R))
}
Left-Multiply a List of Block-Diagonal Matrices by U
Description
Useful for computing UG where G is a list of block-diagonal matrices for each partition, and U is a square P by P dense matrix
Usage
matmult_U(U, G, nc, K)
Arguments
U |
Matrix of dimension P by P that projects onto the null space of A transpose |
G |
List of G matrices for each partition |
nc |
Number of basis expansions of predictors per partition |
K |
Number of blocks |
Value
Matrix of UG
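The operation is equivalent to multiplying U against the dense block-diagonal matrix assembled from the list G, so each column block of the product involves only one block of G. A base-R sketch of this equivalence (not the package internals):

```r
## Base-R sketch of U %*% blockdiag(G); not the package internals.
nc <- 2; K <- 1; P <- nc * (K + 1)
G <- list(matrix(1:4, 2, 2), matrix(5:8, 2, 2))
U <- matrix(seq_len(P^2), P, P)
## Assemble the dense block-diagonal matrix from the list of blocks
G_dense <- matrix(0, P, P)
for (k in 0:K) {
  idx <- k * nc + seq_len(nc)
  G_dense[idx, idx] <- G[[k + 1]]
}
UG <- U %*% G_dense
## Column block j of UG equals U[, idx_j] %*% G[[j]]
```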
Multiply Block Diagonal Matrices in Parallel
Description
Multiplies two lists of matrices that form block diagonal structures, with optional parallel processing.
Usage
matmult_block_diagonal(
A,
B,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks
)
Arguments
A |
List of matrices forming first block diagonal matrix |
B |
List of matrices forming second block diagonal matrix |
K |
Number of blocks minus 1 ( |
parallel |
Logical; whether to use parallel processing |
cl |
Cluster object for parallel processing |
chunk_size |
Number of blocks per chunk for parallel processing |
num_chunks |
Number of chunks for parallel processing |
rem_chunks |
Remaining blocks after chunking |
Details
When parallel=TRUE, splits computation into chunks processed in parallel.
Handles remainder chunks separately. Uses matmult_block_diagonal_cpp() for
actual multiplication.
The function expects A and B to contain corresponding blocks that can be matrix multiplied.
Value
List containing products of corresponding blocks
Examples
A <- list(matrix(1:4,2,2), matrix(5:8,2,2))
B <- list(matrix(1:4,2,2), matrix(5:8,2,2))
matmult_block_diagonal(A, B, K=1, parallel=FALSE, cl=NULL,
chunk_size=1, num_chunks=1, rem_chunks=0)
Calculate Matrix Square Root for Symmetric Matrices
Description
Calculate Matrix Square Root for Symmetric Matrices
Usage
matsqrt(mat)
Arguments
mat |
A symmetric matrix |
Details
Uses an eigenvalue-decomposition-based approach.
Non-positive eigenvalues are set to 0 before taking fourth roots.
This implementation is particularly useful for whitening procedures in GLMs with correlation structures and for computing variance-covariance matrices under constraints.
You can use this to help construct a custom Vhalf_fxn, or more
directly to build the \mathbf{V}^{1/2} input supplied to
lgspline for correlation-aware fits.
Value
A matrix \textbf{B} such that \textbf{B}\textbf{B}
equals \textbf{M} on the positive-eigenvalue subspace, with
non-positive components truncated to 0.
Examples
## Identity matrix
m1 <- diag(2)
matsqrt(m1) # Returns identity matrix
## Compound symmetry correlation matrix
rho <- 0.5
m2 <- matrix(rho, 3, 3) + diag(1-rho, 3)
B <- matsqrt(m2)
# Verify: B %*% B approximately equals m2
all.equal(B %*% B, m2)
## Example for correlation structure
n_blocks <- 2 # Number of subjects
block_size <- 3 # Measurements per subject
rho <- 0.7 # Within-subject correlation
# Correlation matrix for one subject
R <- matrix(rho, block_size, block_size) +
diag(1-rho, block_size)
# Full correlation matrix for all subjects
V <- kronecker(diag(n_blocks), R)
Vhalf <- matsqrt(V)
NB Dispersion Function
Description
Estimates the shape parameter \theta from current fitted values.
When a correlation structure is present (VhalfInv is non-NULL),
the Pearson residuals are whitened before computing the moment-based
initial value, giving a better starting point for the profile MLE
under correlated data. The final estimate is always the profile MLE
over \theta.
Usage
negbin_dispersion_function(
mu,
y,
order_indices,
family,
observation_weights,
VhalfInv
)
Arguments
mu |
Predicted values. |
y |
Observed counts. |
order_indices |
Observation indices. |
family |
NB family object. |
observation_weights |
Observation weights. |
VhalfInv |
Inverse square root of the correlation matrix, or NULL
for independent observations. When non-NULL, used to whiten residuals
for the moment-based initialization of |
Details
The profile MLE maximizes \ell(\theta \mid \mu) via Brent's
method. When VhalfInv is provided, the Pearson residuals
r_i = (y_i - \mu_i) / \sqrt{V(\mu_i)} are pre-whitened as
\tilde{r} = V^{-1/2} r before computing the moment estimator
used for initialization. This accounts for the correlation structure
in the variance decomposition and produces a more stable starting
point for the optimizer, particularly when the correlation inflates
the marginal variance beyond what the NB model alone would predict.
The profile MLE itself does not use VhalfInv because the NB
log-likelihood is a marginal quantity; the correlation structure
affects estimation only through the mean model (handled by the GEE
paths in get_B).
Value
Scalar \theta estimate (stored as sigmasq_tilde).
Negative Binomial Family for lgspline
Description
Creates a family-like object for NB2 regression. The link is log, the
variance function is V(\mu) = \mu + \mu^2/\theta, and the
dispersion stored by lgspline (sigmasq_tilde) is the shape
parameter \theta.
Usage
negbin_family()
Details
The NB2 model has a nuisance shape parameter \theta analogous
to the Weibull scale parameter. It is estimated jointly with
\boldsymbol{\beta} and its uncertainty is propagated via the
Schur complement correction.
Value
A list with family components used by lgspline.
Examples
fam <- negbin_family()
fam$family
fam$link
NB GLM Weight Function
Description
Computes working weights for the NB2 information matrix used by
lgspline when updating \mathbf{G} after obtaining constrained
estimates.
Usage
negbin_glm_weight_function(
mu,
y,
order_indices,
family,
dispersion,
observation_weights
)
Arguments
mu |
Predicted values |
y |
Observed counts. |
order_indices |
Observation indices in partition order. |
family |
NB family object (unused, for interface compatibility). |
dispersion |
Shape parameter |
observation_weights |
Observation weights. |
Details
The IRLS weight for NB2 with log link is
W_i = w_i \mu_i \theta / (\theta + \mu_i)
Falls back to observation weights when natural weights are degenerate.
Value
Numeric vector of working weights, length N.
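The displayed weight coincides with the standard IRLS weight (dμ/dη)²/V(μ) under the log link, since dμ/dη = μ and V(μ) = μ + μ²/θ. A quick base-R check of that identity (illustration only, not a call to the package function):

```r
## NB2 IRLS weight identity (illustration only):
## w * mu * theta / (theta + mu) equals (dmu/deta)^2 / V(mu) with log link.
mu    <- c(0.5, 2, 10)
theta <- 3
w     <- rep(1, length(mu))
W_doc  <- w * mu * theta / (theta + mu)
W_irls <- w * mu^2 / (mu + mu^2 / theta)  # (dmu/deta)^2 / V(mu), dmu/deta = mu
all.equal(W_doc, W_irls)
```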
Negative Binomial Regression Helpers for lgspline
Description
Functions for fitting negative binomial (NB2) regression models within the lgspline framework. Analogous to the Weibull AFT and Cox PH helpers, these provide the log-likelihood, score, information, and all interface functions needed by lgspline's unconstrained fitting, penalty tuning, and inference machinery.
Details
The parameterization follows NB2: Y \sim \mathrm{NB}(\mu, \theta)
with \mathrm{Var}(Y) = \mu + \mu^2/\theta, where \theta > 0
is the shape (size) parameter. The canonical link is log:
\eta = \log\mu. The dispersion stored in
lgspline$sigmasq_tilde is \theta itself, not
1/\theta.
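The NB2 variance relationship can be checked by simulation with stats::rnbinom, whose size argument corresponds to θ under this parameterization (illustration only; the values below are made up):

```r
## Var(Y) = mu + mu^2 / theta under NB2; size = theta in rnbinom().
set.seed(1)
theta <- 3; mu <- 5
y <- rnbinom(1e5, size = theta, mu = mu)
c(empirical = var(y), theoretical = mu + mu^2 / theta)
```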
NB Score Function for Quadratic Programming and Blockfit
Description
Computes the score (gradient of NB log-likelihood) in the format
expected by lgspline's qp_score_function interface.
Usage
negbin_qp_score_function(
X,
y,
mu,
order_list,
dispersion,
VhalfInv,
observation_weights
)
Arguments
X |
Block-diagonal design matrix (N x P). |
y |
Response vector (N x 1). |
mu |
Predicted values (N x 1), same order as X and y. |
order_list |
List of observation indices per partition. |
dispersion |
Shape parameter |
VhalfInv |
Inverse square root of correlation matrix; when non-NULL
the score is computed on the whitened scale as
|
observation_weights |
Observation weights. |
Details
Without correlation (VhalfInv = NULL), the score is
\mathbf{X}^{\top}[\mathbf{w} \odot (\mathbf{y} - \boldsymbol{\mu})\theta/(\theta + \boldsymbol{\mu})].
With correlation, the GEE score is
\tilde{\mathbf{X}}^{\top}\mathrm{diag}(\mathbf{W})
\mathbf{V}^{-1}(\mathbf{y} - \boldsymbol{\mu})
where \mathbf{W} contains the NB working weights. The
whitening is absorbed by pre-multiplying both \mathbf{X} and
the residual by \mathbf{V}^{-1/2}.
Value
Numeric column vector of length P.
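The independent-case score can be verified against a numerical gradient of the NB2 log-likelihood. A base-R illustration with a small made-up design (not a call to the package function):

```r
## Verify the independent-case NB2 score against a numerical gradient.
## Illustration only; X, beta, and theta below are made up for the check.
set.seed(1234)
n <- 50; theta <- 2
X <- cbind(1, rnorm(n))
beta <- c(0.5, 0.3)
mu <- drop(exp(X %*% beta))
y <- rnbinom(n, size = theta, mu = mu)
w <- rep(1, n)
## Analytic score: X^T [w * (y - mu) * theta / (theta + mu)]
score <- drop(t(X) %*% (w * (y - mu) * theta / (theta + mu)))
## Central-difference gradient of the NB2 log-likelihood
loglik <- function(b) {
  sum(dnbinom(y, size = theta, mu = drop(exp(X %*% b)), log = TRUE))
}
eps <- 1e-6
num_grad <- sapply(seq_along(beta), function(j) {
  bp <- beta; bp[j] <- bp[j] + eps
  bm <- beta; bm[j] <- bm[j] - eps
  (loglik(bp) - loglik(bm)) / (2 * eps)
})
max(abs(score - num_grad))  # small
```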
NB Schur Correction
Description
Computes the Schur complement correction to account for uncertainty in
estimating \theta. Structure is identical to
weibull_schur_correction: the joint information is
partitioned into (\boldsymbol{\beta}, \theta) blocks and the
correction is -\mathbf{I}_{\beta\theta}
I_{\theta\theta}^{-1}\mathbf{I}_{\beta\theta}^{\top}.
Usage
negbin_schur_correction(
X,
y,
B,
dispersion,
order_list,
K,
family,
observation_weights
)
Arguments
X |
List of partition design matrices. |
y |
List of partition response vectors. |
B |
List of partition coefficient vectors. |
dispersion |
Scalar shape parameter |
order_list |
List of observation indices per partition. |
K |
Number of knots. |
family |
Family object. |
observation_weights |
Observation weights. |
Details
The cross-derivative (score of \eta_i w.r.t. \theta) is
\frac{\partial^2 \ell}{\partial \eta_i \partial \theta}
= \frac{(y_i - \mu_i)\mu_i}{(\theta + \mu_i)^2}
The second derivative of the log-likelihood w.r.t. \theta is
I_{\theta\theta} = -\sum_i w_i \bigl[
\psi'(y_i + \theta) - \psi'(\theta) +
\frac{1}{\theta} - \frac{2}{\mu_i + \theta} +
\frac{y_i + \theta}{(\mu_i + \theta)^2}\bigr]
where \psi' is the trigamma function.
Value
List of K+1 correction matrices, with 0 for empty
partitions.
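The I_θθ expression can be checked against a numerical second derivative of the NB2 log-likelihood in θ (base-R illustration only; the data below are simulated for the check):

```r
## Numerical check of the I_thetatheta expression (illustration only).
set.seed(1)
theta <- 2
mu <- rep(4, 100)
y <- rnbinom(100, size = theta, mu = mu)
w <- rep(1, 100)
## Observed information in theta from the trigamma-based expression
I_tt <- -sum(w * (trigamma(y + theta) - trigamma(theta) +
                  1 / theta - 2 / (mu + theta) +
                  (y + theta) / (mu + theta)^2))
## Central-difference second derivative of the log-likelihood
ll <- function(th) sum(w * dnbinom(y, size = th, mu = mu, log = TRUE))
eps <- 1e-4
I_num <- -(ll(theta + eps) - 2 * ll(theta) + ll(theta - eps)) / eps^2
```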
Estimate Negative Binomial Shape Parameter
Description
Computes the profile MLE of the shape parameter \theta given
current mean estimates \mu.
Usage
negbin_theta(y, mu, weights = 1, init = NULL)
Arguments
y |
Response vector. |
mu |
Mean vector. |
weights |
Observation weights (default 1). |
init |
Optional initial value for |
Details
Maximizes the profile log-likelihood over \theta via Brent's
method on [10^{-4}, 10^4].
Value
Scalar MLE of \theta.
Examples
set.seed(1234)
mu <- rep(5, 200)
y <- rnbinom(200, size = 3, mu = 5)
negbin_theta(y, mu)
Compute Newton-Raphson Parameter Update with Numerical Stabilization
Description
Performs parameter update in iterative optimization.
Called by damped_newton_r in the update step
Usage
nr_iterate(gradient_val, neghessian_val)
Arguments
gradient_val |
Numeric vector of gradient values ( |
neghessian_val |
Negative Hessian matrix ( |
Details
This helper function is a core component of Newton-Raphson optimization.
It provides a computationally-stable approach to computing
\textbf{G}\textbf{u}, for information matrix \textbf{G} and
score vector \textbf{u}, where the Newton-Raphson update can be
expressed as
\boldsymbol{\beta}^{(m+1)} = \boldsymbol{\beta}^{(m)} + \textbf{G}\textbf{u}.
Value
Numeric column vector of parameter updates (\textbf{G}\textbf{u})
See Also
damped_newton_r for the full optimization routine
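The update itself reduces to a linear solve of the negative Hessian against the gradient. A minimal base-R sketch, not the package implementation (the ridge argument is an illustrative stabilization, not necessarily the one used internally):

```r
## Minimal sketch of a Newton-Raphson step (not the package implementation).
## The ridge term is one common stabilization; the actual safeguards differ.
newton_step <- function(gradient_val, neghessian_val, ridge = 0) {
  P <- nrow(neghessian_val)
  solve(neghessian_val + ridge * diag(P), gradient_val)
}
## Maximizing l(b) = -(b - 3)^2 from b = 0: one exact step reaches b = 3.
b    <- 0
grad <- matrix(-2 * (b - 3))  # dl/db   = -2(b - 3)
negH <- matrix(2)             # -d2l/db2
b_new <- b + drop(newton_step(grad, negH))
```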
Plot Method for lgspline Objects
Description
Wrapper for the internal lgspline plot function. Produces a 1D line plot
(base R) or interactive 3D surface plot (plotly) depending on the number
of predictors, with optional formula overlays per partition. When plotting
a subset of variables via vars, non-plotted predictors are
automatically set to zero (or a user-specified value via
fixed_values).
Usage
## S3 method for class 'lgspline'
plot(
x,
show_formulas = FALSE,
digits = 4,
legend_pos = "topright",
custom_response_lab = "y",
custom_predictor_lab = NULL,
custom_predictor_lab1 = NULL,
custom_predictor_lab2 = NULL,
custom_formula_lab = NULL,
custom_title = "Fitted Function",
text_size_formula = NULL,
legend_args = list(),
new_predictors = NULL,
xlim = NULL,
ylim = NULL,
color_function = NULL,
add = FALSE,
vars = c(),
legend_order = NULL,
se.fit = FALSE,
cv = 1,
band_col = "grey80",
band_border = NA,
fixed_values = NULL,
n_grid = 200,
...
)
Arguments
x |
A fitted lgspline model object. |
show_formulas |
Logical; display partition-level polynomial formulas. Default FALSE. |
digits |
Integer; decimal places for formula coefficients. Default 4. |
legend_pos |
Character; legend position for 1D plots. Default
|
custom_response_lab |
Character; response axis label. Default
|
custom_predictor_lab |
Character; predictor axis label (1D). Default NULL uses the column name. |
custom_predictor_lab1 |
Character; first predictor axis label (2D). Default NULL. |
custom_predictor_lab2 |
Character; second predictor axis label (2D). Default NULL. |
custom_formula_lab |
Character; fitted response label on the link scale. Default NULL. |
custom_title |
Character; plot title. Default |
text_size_formula |
Numeric; formula text size. Passed to |
legend_args |
List; additional arguments passed to |
new_predictors |
Matrix; optional predictor values for prediction.
Default NULL. When |
xlim |
Numeric vector; x-axis limits (1D only). Default NULL. |
ylim |
Numeric vector; y-axis limits (1D only). Default NULL. |
color_function |
Function; returns K+1 colors, one per partition.
Default NULL uses |
add |
Logical; add to an existing plot (1D only). Default FALSE. |
vars |
Numeric or character vector; predictor indices or names to
plot. Default |
legend_order |
Numeric; re-ordered partition indices for the legend. |
se.fit |
Logical; if TRUE, plot pointwise confidence bands. Default FALSE. |
cv |
Numeric; critical value for confidence bands. Default 1. |
band_col |
Character; color for confidence band fill. Default
|
band_border |
Character or NA; border color for confidence band polygon. Default NA (no border). |
fixed_values |
Named list; fixed values for non-plotted predictors
when |
n_grid |
Integer; number of grid points for automatic grid
generation when |
... |
Details
Partition boundaries are indicated by color changes. For 1D models, observation points can be overlaid. For 2D models, plotly is used.
When using vars to plot a subset of predictors, the non-plotted
predictors are automatically set to zero. This can be overridden by
passing a named list to fixed_values (e.g.,
fixed_values = list(Height = 75)). The automatic zeroing replaces
the previous behavior where the user had to manually construct
new_predictors with non-plotted variables set to fixed values.
When se.fit = TRUE, pointwise confidence bands are drawn around
the fitted function. These are Wald-type intervals constructed on the
link scale and back-transformed to the response scale; the standard
errors are multiplied by cv (the default of 1 yields bands of plus or
minus one standard error).
The function extracts predictor positions from linear expansion terms.
If linear terms are excluded (e.g., via exclude_these_expansions),
plotting will fail. As a workaround, constrain those terms to zero via
constraint_vectors / constraint_values so they remain in
the expansion but are zeroed out.
Value
For 1D models: invisibly returns NULL (base R plot drawn to device). For 2D models: returns a plotly object.
See Also
Examples
set.seed(1234)
t_data <- runif(1000, -10, 10)
y_data <- 2*sin(t_data) + -0.06*t_data^2 + rnorm(length(t_data))
model_fit <- lgspline(t_data, y_data, K = 9)
## Basic plot
plot(model_fit)
## Plot with confidence bands
plot(model_fit,
se.fit = TRUE,
cv = 1.96,
custom_title = 'Fitted Function with 95% CI')
## Multi-predictor: automatically zeros non-plotted variables
# plot(model_fit_2d, vars = 'x1', se.fit = TRUE)
Plot Method for wald_lgspline Objects
Description
Forest-style plot of coefficient estimates with confidence intervals.
Usage
## S3 method for class 'wald_lgspline'
plot(
x,
parm = NULL,
which = NULL,
main = "Coefficient Estimates and CIs",
xlab = "Estimate",
...
)
Arguments
x |
A |
parm |
Integer vector of coefficient indices or character vector of names to plot. Default NULL plots all. |
which |
Integer vector of coefficient indices to plot (alternative to
|
main |
Plot title. Default |
xlab |
x-axis label. Default |
... |
Additional arguments passed to |
Value
Invisibly returns NULL.
See Also
wald_univariate, confint.lgspline
Predict Method for lgspline Objects
Description
Generates predictions, derivatives, and basis expansions from a fitted lgspline model. Wrapper for the internal predict closure stored in the object.
Usage
## S3 method for class 'lgspline'
predict(
object,
newdata = NULL,
parallel = FALSE,
cl = NULL,
chunk_size = NULL,
num_chunks = NULL,
rem_chunks = NULL,
B_predict = NULL,
take_first_derivatives = FALSE,
take_second_derivatives = FALSE,
expansions_only = FALSE,
new_predictors = NULL,
...
)
Arguments
object |
A fitted lgspline model object. |
newdata |
Matrix or data.frame; new predictor values. Default NULL. |
parallel |
Logical; use parallel processing (experimental). Default FALSE. |
cl |
Cluster object for parallel processing. Default NULL. |
chunk_size |
Integer; chunk size for parallel processing. Default NULL. |
num_chunks |
Integer; number of chunks. Default NULL. |
rem_chunks |
Integer; remainder chunks. Default NULL. |
B_predict |
List; per-partition coefficient list for prediction, e.g.
from |
take_first_derivatives |
Logical; compute first derivatives. Default FALSE. |
take_second_derivatives |
Logical; compute second derivatives. Default FALSE. |
expansions_only |
Logical; return basis expansion matrix only. Default FALSE. |
new_predictors |
Matrix or data.frame; overrides |
... |
Additional arguments passed to the internal predict method. |
Details
new_predictors takes priority over newdata when both are
supplied. When both are NULL, the training data is used.
Fitted values are also accessible directly as model_fit$ytilde or
via model_fit$predict().
The parallel processing feature is experimental.
Additional arguments passed through ... include se.fit
and cv for pointwise interval summaries.
Predictor input should use the original predictor columns. Named extra columns are dropped when they can be identified as irrelevant to the fitted expansions.
Value
A numeric vector of predictions, or a list when derivatives or interval summaries are requested:
- preds
Numeric vector of predictions when derivatives are requested.
- fit
Numeric vector of predictions when
se.fit = TRUE and no derivatives are requested.
- first_deriv
Numeric vector or named list of first derivatives (if requested).
- second_deriv
Numeric vector or named list of second derivatives (if requested).
- se.fit
Pointwise standard errors on the link scale (if requested).
- lower
Pointwise lower interval bound (if requested).
- upper
Pointwise upper interval bound (if requested).
- cv
Critical value returned when
se.fit = TRUE without derivative requests.
If expansions_only = TRUE, returns a list of basis expansions.
See Also
Examples
set.seed(1234)
t <- runif(1000, -10, 10)
y <- 2*sin(t) + -0.06*t^2 + rnorm(length(t))
model_fit <- lgspline(t, y)
newdata <- matrix(sort(rnorm(10000)), ncol = 1)
preds <- predict(model_fit, newdata)
deriv1_res <- predict(model_fit, newdata, take_first_derivatives = TRUE)
deriv2_res <- predict(model_fit, newdata, take_second_derivatives = TRUE)
oldpar <- par(no.readonly = TRUE)
layout(matrix(c(1,1,2,2,3,3), byrow = TRUE, ncol = 2))
plot(newdata[,1], preds, main = 'Fitted Function',
xlab = 't', ylab = "f(t)", type = 'l')
plot(newdata[,1], deriv1_res$first_deriv, main = 'First Derivative',
xlab = 't', ylab = "f'(t)", type = 'l')
plot(newdata[,1], deriv2_res$second_deriv, main = 'Second Derivative',
xlab = 't', ylab = "f''(t)", type = 'l')
par(oldpar)
Print Method for lgspline Objects
Description
Prints a concise summary of the fitted model to the console.
Usage
## S3 method for class 'lgspline'
print(x, ...)
Arguments
x |
An lgspline model object. |
... |
Not used. |
Value
Invisibly returns x.
Print Method for lgspline Summaries
Description
Displays a formatted model summary using printCoefmat
for the coefficient table.
Usage
## S3 method for class 'summary.lgspline'
print(x, ...)
Arguments
x |
A |
... |
Not used. |
Value
Invisibly returns x.
See Also
Print Method for wald_lgspline Objects
Description
Prints the coefficient table using printCoefmat with
significance stars.
Usage
## S3 method for class 'wald_lgspline'
print(
x,
digits = max(3, getOption("digits") - 3),
signif.stars = getOption("show.signif.stars"),
...
)
Arguments
x |
A |
digits |
Number of significant digits. |
signif.stars |
Logical; show significance stars. |
... |
Additional arguments passed to |
Value
Invisibly returns x.
See Also
Log-Prior Distribution Evaluation for lgspline Models
Description
Evaluates the log-prior on the spline coefficients conditional on the dispersion and penalty matrices.
Usage
prior_loglik(model_fit, sigmasq = NULL, include_constant = TRUE)
Arguments
model_fit |
An lgspline model object. |
sigmasq |
Numeric scalar dispersion parameter. If NULL,
|
include_constant |
Logical; if TRUE (default), include the multivariate normal normalizing constant. |
Details
Returns the quadratic form \beta^{T}\Lambda\beta evaluated at the
tuned or fixed penalties, scaled by negative one-half times the inverse dispersion.
Assuming fixed penalties, the prior on \beta is taken to be
\beta | \sigma^2 \sim \mathcal{N}(\textbf{0}, \sigma^2\Lambda^{-1})
so that, up to a normalizing constant C with respect to \beta,
\log P(\beta|\sigma^2) = C-\frac{1}{2\sigma^2}\beta^{T}\Lambda\beta
The value of C is included when include_constant = TRUE, and
omitted when FALSE.
This is useful for computing joint penalized log-likelihoods and related
MAP-style diagnostics for a fitted lgspline object.
Value
A numeric scalar representing the prior log-likelihood.
See Also
Examples
## Data
t <- sort(runif(100, -5, 5))
y <- sin(t) - 0.1*t^2 + rnorm(100)
## Model keeping penalties fixed
model_fit <- lgspline(t, y, opt = FALSE)
## Full joint log-likelihood, conditional upon known sigma^2 = 1
jntloglik <- sum(dnorm(model_fit$y,
model_fit$ytilde,
1,
log = TRUE)) +
prior_loglik(model_fit, sigmasq = 1)
print(jntloglik)
Process Input Arguments for lgspline
Description
Parses formula and data arguments, performs factor encoding,
resolves variable roles (spline, linear-with-interactions,
linear-without-interactions), constructs exclusion patterns,
and validates inputs. Called internally by lgspline.
Users may call this function directly to inspect how their formula and data are interpreted before fitting.
Usage
process_input(
predictors = NULL,
y = NULL,
formula = NULL,
response = NULL,
data = NULL,
weights = NULL,
observation_weights = NULL,
family = gaussian(),
K = NULL,
custom_knots = NULL,
auto_encode_factors = TRUE,
include_2way_interactions = TRUE,
include_3way_interactions = TRUE,
just_linear_with_interactions = NULL,
just_linear_without_interactions = NULL,
exclude_interactions_for = NULL,
exclude_these_expansions = NULL,
offset = c(),
no_intercept = FALSE,
do_not_cluster_on_these = c(),
include_quartic_terms = NULL,
cluster_args = c(custom_centers = NA, nstart = 10),
include_warnings = TRUE,
dummy_fit = FALSE,
include_constrain_second_deriv = TRUE,
standardize_response = TRUE,
...
)
Arguments
predictors |
Default: NULL. Formula or numeric matrix/data frame of predictor variables. |
y |
Default: NULL. Numeric response vector. |
formula |
Default: NULL. Optional formula; alias for predictors when a formula object. |
response |
Default: NULL. Alternative name for |
data |
Default: NULL. Data frame for formula interface. |
weights |
Default: NULL. Alias for |
observation_weights |
Default: NULL. Numeric observation weight vector. |
family |
Default: |
K |
Default: NULL. Number of interior knots. |
custom_knots |
Default: NULL. Custom knot matrix. |
auto_encode_factors |
Default: TRUE. Logical; auto one-hot encode factor and character columns when using the formula interface. |
include_2way_interactions |
Default: TRUE. Logical. |
include_3way_interactions |
Default: TRUE. Logical. |
just_linear_with_interactions |
Default: NULL. Integer vector or character vector of column names. |
just_linear_without_interactions |
Default: NULL. Integer vector or character vector of column names. |
exclude_interactions_for |
Default: NULL. Integer vector or character vector of column names. |
exclude_these_expansions |
Default: NULL. Character vector of expansion names to exclude. |
offset |
Default: |
no_intercept |
Default: FALSE. Logical; remove intercept. |
do_not_cluster_on_these |
Default: |
include_quartic_terms |
Default: NULL. Logical or NULL. |
cluster_args |
Default: |
include_warnings |
Default: TRUE. Logical. |
dummy_fit |
Default: FALSE. Logical; run the full preprocessing path but stop short of fitting nonzero coefficients. |
include_constrain_second_deriv |
Default: TRUE. Logical. |
standardize_response |
Default: TRUE. Logical. |
... |
Additional arguments (checked for deprecated names). |
Value
A named list containing:
- predictors
Numeric matrix of predictor variables with column names stripped for positional indexing.
- y
Numeric response vector.
- og_cols
Character vector of original predictor column names, or NULL if none were available.
- replace_colnames
Logical; TRUE if og_cols is available and column renaming should be applied post-fit.
- just_linear_with_interactions
Integer vector of column indices, or NULL.
- just_linear_without_interactions
Integer vector of column indices, or NULL.
- exclude_interactions_for
Integer vector of column indices, or NULL.
- exclude_these_expansions
Character vector of positional-notation expansion names, or NULL.
- offset
Integer vector of column indices, or
c().
- no_intercept
Logical.
- do_not_cluster_on_these
Numeric vector of column indices, or
c().
- observation_weights
Numeric vector or NULL.
- K
Integer or NULL, possibly updated by cluster_args or all-linear detection.
- include_3way_interactions
Logical, possibly updated by formula parsing.
- include_quartic_terms
Logical or NULL, possibly updated by number of predictors.
- data
Data frame, possibly with factor columns one-hot encoded.
- include_constrain_second_deriv
Logical, possibly set FALSE when no numeric predictors remain.
- factor_groups
Named list mapping original factor column names to integer vectors of their one-hot indicator column positions within the predictor matrix. Used by lgspline.fit to impose sum-to-zero constraints on encoded factor levels. NULL when no factors were encoded, or when the formula interface was not used.
See Also
lgspline for the main fitting interface.
Examples
## Not run:
data("Theoph")
df <- Theoph[, c("Time", "Dose", "conc", "Subject")]
processed <- process_input(
predictors = conc ~ spl(Time) + Time*Dose,
data = df,
auto_encode_factors = TRUE,
include_warnings = TRUE
)
str(processed$predictors)
processed$og_cols
processed$just_linear_without_interactions
processed$factor_groups
## End(Not run)
Prepare Quadratic Programming Constraints for lgspline
Description
Constructs qp_Amat, qp_bvec, and qp_meq from the
built-in quadratic programming constraints handled here: range bounds,
derivative sign constraints (first and second), monotonicity constraints,
and user-supplied custom constraint functions.
This function was refactored from the inline QP setup block in
lgspline.fit to improve readability and testability.
The interface is fully backward-compatible: existing calls that pass
TRUE/FALSE for derivative flags continue to work.
Usage
process_qp(
X,
K,
p_expansions,
order_list,
colnm_expansions,
expansion_scales,
power1_cols,
power2_cols,
nonspline_cols,
interaction_single_cols,
interaction_quad_cols,
triplet_cols,
include_2way_interactions,
include_3way_interactions,
include_quadratic_interactions,
family,
mean_y,
sd_y,
N_obs,
qp_observations = NULL,
qp_positive_derivative = FALSE,
qp_negative_derivative = FALSE,
qp_positive_2ndderivative = FALSE,
qp_negative_2ndderivative = FALSE,
qp_monotonic_increase = FALSE,
qp_monotonic_decrease = FALSE,
qp_range_upper = NULL,
qp_range_lower = NULL,
qp_Amat_fxn = NULL,
qp_bvec_fxn = NULL,
qp_meq_fxn = NULL,
qp_Amat = NULL,
qp_bvec = NULL,
qp_meq = 0,
all_derivatives_fxn = NULL,
og_cols = NULL,
include_warnings = TRUE,
...
)
Arguments
X |
List of per-partition design matrices. |
K |
Integer. Number of interior knots. |
p_expansions |
Integer. Number of basis expansions per partition. |
order_list |
List of per-partition observation index vectors. |
colnm_expansions |
Character vector of expansion column names. |
expansion_scales |
Numeric vector of expansion standardization scales. |
power1_cols |
Integer vector of linear-term column indices. |
power2_cols |
Integer vector of quadratic-term column indices. |
nonspline_cols |
Integer vector of non-spline linear column indices. |
interaction_single_cols |
Integer vector of linear-by-linear interaction column indices. |
interaction_quad_cols |
Integer vector of linear-by-quadratic interaction column indices. |
triplet_cols |
Integer vector of three-way interaction column indices. |
include_2way_interactions |
Logical switch controlling whether two-way interactions are included in derivative construction. |
include_3way_interactions |
Logical switch controlling whether three-way interactions are included in derivative construction. |
include_quadratic_interactions |
Logical switch controlling whether quadratic interactions are included in derivative construction. |
family |
GLM family object. |
mean_y, sd_y |
Numeric scalars for response standardization. |
N_obs |
Integer. Total sample size. |
qp_observations |
Optional integer vector of observation indices. |
qp_positive_derivative, qp_negative_derivative |
Logical scalar, character vector, or integer vector. See section Per-Variable Derivative Constraints. |
qp_positive_2ndderivative, qp_negative_2ndderivative |
Same as above but for second derivatives. |
qp_monotonic_increase, qp_monotonic_decrease |
Logical. Constrain fitted values to be monotonic in observation order. |
qp_range_upper, qp_range_lower |
Optional numeric upper/lower bounds for fitted values. |
qp_Amat_fxn, qp_bvec_fxn, qp_meq_fxn |
Optional user-supplied constraint-generating functions. |
qp_Amat, qp_bvec, qp_meq |
Optional pre-built QP objects. Their presence marks QP handling as active, but this helper does not append them to the built-in constraints it constructs; they are expected to be handled outside this constructor. |
all_derivatives_fxn |
Function to compute derivatives from expansion
matrices (the |
og_cols |
Optional character vector of original predictor column names. |
include_warnings |
Logical. Whether to issue warnings. |
... |
Additional arguments forwarded to custom constraint functions. |
Value
A list with components:
- qp_Amat
P \times M combined constraint matrix.
- qp_bvec
Numeric vector of length M.
- qp_meq
Integer. Number of leading equality constraints.
- quadprog
Logical. TRUE if any QP constraints are active.
Per-Variable Derivative Constraints
The arguments qp_positive_derivative, qp_negative_derivative,
qp_positive_2ndderivative, and qp_negative_2ndderivative
now accept three forms:
- FALSE
No constraint (default).
- TRUE
Constrain all predictor variables (backward compatible with previous behavior).
- Character or integer vector
Constrain only the specified predictor variables. Character entries are resolved via
og_cols; integer entries refer to column positions in the predictor matrix.
This allows, for example, enforcing a positive first derivative for
variable "Dose" and a negative first derivative for variable
"Time" simultaneously:
lgspline(...,
qp_positive_derivative = "Dose",
qp_negative_derivative = "Time")
The arguments qp_monotonic_increase and
qp_monotonic_decrease remain TRUE/FALSE only,
because they constrain fitted values in observation order (not
per-variable).
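As a standalone illustration of how an observation-order monotonic-increase constraint maps onto quadprog's (Amat, bvec, meq) convention, the sketch below builds the constraint for a plain polynomial least-squares fit. This is a hedged sketch, not the constraint matrices lgspline itself constructs; the design matrix and ridge jitter are assumptions for the example.

```r
## Sketch: monotonic-increase constraint for a least-squares fit via quadprog.
## solve.QP minimizes (1/2) b' Dmat b - dvec' b subject to t(Amat) b >= bvec.
library(quadprog)

set.seed(1)
x <- sort(runif(30))
y <- x + rnorm(30, 0, 0.1)
X <- cbind(1, x, x^2)                       # small polynomial design (illustrative)

## Successive differences of fitted values must be nonnegative:
D <- diff(diag(nrow(X))) %*% X              # (n-1) x p difference-of-rows matrix
fit <- solve.QP(Dmat = crossprod(X) + 1e-8 * diag(ncol(X)),  # ridge jitter for PD
                dvec = as.vector(crossprod(X, y)),
                Amat = t(D),
                bvec = rep(0, nrow(D)),
                meq  = 0)                   # all constraints are inequalities
preds <- X %*% fit$solution
all(diff(preds) >= -1e-8)                   # fitted values nondecreasing
```

The same convention carries over to the range constraints: rows of X bounded above or below simply append further rows to Amat.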
Examples
## Not run:
## Standalone verification: simple 1-D monotonic increase
set.seed(42)
t <- seq(-5, 5, length.out = 200)
y <- 3 * sin(t) + t + rnorm(200, 0, 0.5)
## Fit with positive first-derivative constraint on all variables
fit1 <- lgspline(t, y, K = 3,
qp_positive_derivative = TRUE)
## Verify: first derivative should be >= 0 everywhere
derivs1 <- predict(fit1, new_predictors = sort(t),
take_first_derivatives = TRUE)
stopifnot(all(derivs1$first_deriv >= -1e-8))
## Fit with monotonic increase (observation-order)
fit2 <- lgspline(t, y, K = 3,
qp_monotonic_increase = TRUE)
preds2 <- predict(fit2, new_predictors = sort(t))
stopifnot(all(diff(preds2) >= -1e-8))
## Per-variable constraints: 2-D example
x1 <- runif(500, -5, 5)
x2 <- runif(500, -5, 5)
y2 <- x1 + sin(x2) + rnorm(500, 0, 0.5)
dat2 <- data.frame(x1 = x1, x2 = x2, y = y2)
## Constrain x1 to have positive derivative, x2 to have negative
fit3 <- lgspline(y ~ spl(x1, x2), data = dat2, K = 2,
qp_positive_derivative = "x1",
qp_negative_derivative = "x2")
## Verify per-variable derivatives
newdat <- expand.grid(x1 = seq(-4, 4, length.out = 50),
x2 = seq(-4, 4, length.out = 50))
d3 <- predict(fit3, new_predictors = newdat,
take_first_derivatives = TRUE)
## x1 derivative should be >= 0
stopifnot(all(unlist(d3$first_deriv[["_1_"]]) >= -1e-6))
## x2 derivative should be <= 0
stopifnot(all(unlist(d3$first_deriv[["_2_"]]) <= 1e-6))
## Numeric column indices work identically
fit4 <- lgspline(y ~ spl(x1, x2), data = dat2, K = 2,
qp_positive_derivative = 1,
qp_negative_derivative = 2)
## Range + derivative constraints simultaneously
fit5 <- lgspline(t, y, K = 3,
qp_positive_derivative = TRUE,
qp_range_lower = -5,
qp_range_upper = 15)
preds5 <- predict(fit5)
stopifnot(all(preds5 >= -5 - 1e-6))
stopifnot(all(preds5 <= 15 + 1e-6))
## End(Not run)
Evaluate the REML gradient with respect to a single correlation parameter
Description
Computes the derivative of the negative REML objective with respect to a
scalar correlation parameter, given the matrix derivative
\partial \mathbf{V}/\partial \rho.
Usage
reml_grad_from_dV(dV, model_fit, glm_weight_function, ...)
Arguments
dV |
|
model_fit |
A fitted |
glm_weight_function |
The GLM weight function used during model
fitting; see the |
... |
Additional arguments forwarded to |
Details
Notation
\mathbf{D} = \mathrm{diag}(d_1,\ldots,d_N) is the diagonal matrix of
observation weights (observation_weights), and
\mathbf{W} = \mathrm{diag}(w_1,\ldots,w_N) is the diagonal matrix of
GLM working weights evaluated at the current fitted values via
glm_weight_function, with the observation-weight contribution
carried separately by \mathbf{D}. For canonical GLM families,
w_i is the usual IRWLS/Fisher-scoring weight on the mean–variance
scale; for example, logistic regression gives
w_i = \mu_i(1-\mu_i). The combined weighting entering the
information matrix is \mathbf{W}\mathbf{D}. In Gaussian identity
models both reduce to scalar multiples of the identity.
\mathbf{V} is the N \times N correlation matrix implied by
VhalfInv, with
\mathbf{V}^{-1} = (\mathbf{V}^{-1/2})^\top \mathbf{V}^{-1/2}.
The penalized observed information at the current iterate is
\mathbf{M} = (\mathbf{X}^*)^\top
\mathbf{V}^{-1}\mathbf{W}\mathbf{D}\,\mathbf{X}^*
+ \mathbf{U}^\top\boldsymbol{\Lambda}\mathbf{U},
where \mathbf{X}^* = \mathbf{X}\mathbf{U} is the constrained design
(N \times P) and the first term is the quadratic approximation to the
penalized log-likelihood Hessian at the current \boldsymbol{\mu}.
For non-Gaussian families this is a local approximation (the IRWLS/Fisher
scoring Hessian), not an exact GLS information matrix.
The constraint projection is
\mathbf{U} = \mathbf{I} -
\mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}
\mathbf{A}^\top
with \mathbf{G} = \mathbf{M}^{-1}.
\mathbf{U} is idempotent (\mathbf{U}^2 = \mathbf{U}) but
not symmetric, so
\mathbf{U}^\top\boldsymbol{\Lambda}\mathbf{U} \neq
\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top.
The U stored in model_fit$U is on the expansion- and
response-standardised scale. The rescaled version used here is
\tilde{\mathbf{U}} =
\mathbf{U} \cdot \mathrm{diag}(1, s_1, \ldots, s_{p-1},
1, s_1, \ldots) / \hat{\sigma}_y,
where s_j are the expansion scales and \hat{\sigma}_y
standardises the response. All quantities below use
\tilde{\mathbf{U}} in place of \mathbf{U}.
REML objective and its gradient
The REML objective is constructed by integrating out the fixed effects from the penalized log-likelihood, using a Laplace approximation to the marginal likelihood for non-Gaussian families. This approximation is exact for Gaussian identity models and is the standard extension used in restricted maximum likelihood estimation for GLMMs.
The REML correction term is
-\tfrac{1}{2}\log|\mathbf{M}|, where \mathbf{M} is the
penalized observed information defined above. Differentiating with respect
to \rho (noting only \mathbf{V} depends on \rho)
gives the REML correction gradient
-\frac{1}{2}\mathrm{tr}\!\Bigl(
\mathbf{M}^{-1}
(\mathbf{X}^*)^\top\mathbf{V}^{-1}
\frac{\partial\mathbf{V}}{\partial\rho}
\mathbf{V}^{-1}\mathbf{W}\mathbf{D}\,\mathbf{X}^*
\Bigr).
Full gradient
\frac{\partial(-\ell_R)}{\partial\rho}
= \frac{1}{N}\Biggl[
\underbrace{
\frac{1}{2}\mathrm{tr}\!\Bigl(
\mathbf{V}^{-1}\frac{\partial\mathbf{V}}{\partial\rho}
\Bigr)
}_{\text{(i) log-det of }\mathbf{V}}
\underbrace{
-\frac{1}{2\tilde{\sigma}^2}
\mathbf{r}^\top\frac{\partial\mathbf{V}}{\partial\rho}\mathbf{r}
}_{\text{(ii) residual quadratic form}}
\underbrace{
-\frac{1}{2}\mathrm{tr}\!\Bigl(
\mathbf{M}^{-1}(\mathbf{X}^*)^\top\mathbf{V}^{-1}
\frac{\partial\mathbf{V}}{\partial\rho}
\mathbf{V}^{-1}\mathbf{W}\mathbf{D}\,\mathbf{X}^*
\Bigr)
}_{\text{(iii) REML correction}}
\Biggr],
where the whitened residual is
\mathbf{r} =
\mathrm{diag}\!\left(\sqrt{d_i}/\sqrt{w_i}\right)
\mathbf{V}^{-1/2}(\mathbf{y} - \boldsymbol{\mu}),
and r_i =
[\mathbf{V}^{-1/2}(\mathbf{y}-\boldsymbol{\mu})]_i
\sqrt{d_i}/\sqrt{w_i}.
Term (i) does not involve \mathbf{D} or \mathbf{W}; the
log-determinant of \mathbf{V} depends only on the correlation
structure. For non-Gaussian families terms (ii) and (iii) are evaluated
at the current IRWLS iterate and constitute a local approximation.
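For the Gaussian identity case (\mathbf{W} = \mathbf{D} = \mathbf{I}), the three terms above can be transcribed directly. The following is a standalone numeric sketch under assumed inputs (and with the penalty term of \mathbf{M} omitted for brevity), not the package's internal routine:

```r
## Terms (i)-(iii) of the negative-REML gradient, Gaussian identity case.
reml_grad_sketch <- function(dV, Vinv, Xstar, r, sigsq) {
  M <- crossprod(Xstar, Vinv %*% Xstar)          # penalty term omitted for brevity
  Minv <- solve(M)
  t1 <-  0.5 * sum(diag(Vinv %*% dV))            # (i) log-det of V
  t2 <- -0.5 * as.numeric(crossprod(r, dV %*% r)) / sigsq   # (ii) residual form
  ## (iii) REML correction:
  t3 <- -0.5 * sum(diag(Minv %*% crossprod(Xstar, Vinv %*% dV %*% Vinv %*% Xstar)))
  (t1 + t2 + t3) / nrow(Xstar)
}

## Tiny check with V = I, dV = I, one-column design, zero residuals:
g <- reml_grad_sketch(dV = diag(2), Vinv = diag(2),
                      Xstar = matrix(c(1, 1)), r = c(0, 0), sigsq = 1)
## g = (0.5 * 2 + 0 - 0.5 * 1) / 2 = 0.25
```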
Value
A scalar: the gradient of the negative REML objective with respect
to \rho, divided by N.
See Also
Safely Replace Variable Names in Printed Terms
Description
Replaces a variable name only when it appears as a standalone term, avoiding accidental replacements inside other words.
Usage
safe_replace_var(x, old, new)
Arguments
x |
Character vector to modify. |
old |
Character scalar, original variable name. |
new |
Character scalar, replacement variable name. |
Details
Useful for plot formula labels, for example replacing "t" with "that" without changing "intercept" into "inthatercept".
Value
Character vector with safe term-wise replacements.
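The word-boundary behavior can be sketched with a regular expression. This is an illustrative approximation (assuming old contains no regex metacharacters), not necessarily the package's exact implementation:

```r
## Replace a variable name only where it stands alone as a word.
safe_replace_sketch <- function(x, old, new) {
  gsub(paste0("\\b", old, "\\b"), new, x)
}

safe_replace_sketch("t + intercept + t^2", "t", "that")
## "that + intercept + that^2" -- "intercept" is untouched
```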
Compute Cox Partial Log-Likelihood Score Vector
Description
Gradient of the Cox partial log-likelihood with respect to
\boldsymbol{\beta}. Data must be sorted by ascending event time.
Usage
score_cox(X, eta, status, y = NULL, weights = 1)
Arguments
X |
Design matrix (N x p), sorted by ascending event time. |
eta |
Linear predictor vector |
status |
Event indicator (1 = event, 0 = censored). |
y |
Optional numeric vector of observed event/censor times, same length
and order as |
weights |
Observation weights (default 1). |
Details
Under the Breslow approximation, the score is
\mathbf{u} =
\sum_g \Bigl[\sum_{i \in D_g} w_i \mathbf{x}_i -
d_g^{(w)} \frac{\sum_{j \in R_g} w_j e^{\eta_j}\mathbf{x}_j}
{\sum_{j \in R_g} w_j e^{\eta_j}}\Bigr]
where D_g is the tied event set at time t_g,
R_g = \{j : t_j \ge t_g\} is the risk set, and
d_g^{(w)} = \sum_{i \in D_g} w_i.
Value
Numeric column vector of length p (gradient).
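Under the assumption of distinct event times (tied times would need risk-set grouping), the Breslow score can be sketched with reverse cumulative sums; the names here are illustrative, not the package's internals:

```r
## Breslow partial-likelihood score via reverse cumulative sums.
## Rows of X must already be sorted by ascending event time.
score_cox_sketch <- function(X, eta, status, weights = rep(1, length(eta))) {
  w_exp <- weights * exp(eta)
  ## Risk-set totals for position g are sums over rows g..N:
  denom <- rev(cumsum(rev(w_exp)))              # sum_{j in R_g} w_j e^{eta_j}
  num <- apply(X * w_exp, 2, function(col) rev(cumsum(rev(col))))
  u <- numeric(ncol(X))
  for (g in which(status == 1)) {
    u <- u + weights[g] * X[g, ] - weights[g] * num[g, ] / denom[g]
  }
  matrix(u, ncol = 1)
}

## Hand check: X = (1, 2, 3)', eta = 0, events at rows 1 and 3:
## contributions are (1 - 6/3) + (3 - 3/1) = -1
score_cox_sketch(matrix(c(1, 2, 3), ncol = 1), c(0, 0, 0), c(1, 0, 1))
```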
Compute Negative Binomial Score Vector
Description
Gradient of the NB2 log-likelihood with respect to
\boldsymbol{\beta} under the log link.
Usage
score_negbin(X, y, mu, theta, weights = 1)
Arguments
X |
Design matrix (N x p). |
y |
Response vector. |
mu |
Mean vector |
theta |
Shape parameter. |
weights |
Observation weights (default 1). |
Details
Under the log link, d\mu_i/d\eta_i = \mu_i, so the score is
\mathbf{u} = \mathbf{X}^{\top}\mathbf{s}, \qquad
s_i = w_i\,\frac{(y_i - \mu_i)\theta}{\theta + \mu_i},
where w_i is the observation weight and the per-observation contribution
(y_i - \mu_i)\theta/(\theta + \mu_i) is the derivative of the
log-likelihood with respect to \eta_i.
Value
Numeric column vector of length p (gradient).
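The vector form of the score admits a one-line sketch, checked here against a finite-difference gradient of the NB2 log-likelihood (illustrative names, assumed inputs):

```r
## NB2 score under the log link: u = X' s with
## s_i = w_i (y_i - mu_i) theta / (theta + mu_i).
score_negbin_sketch <- function(X, y, mu, theta, weights = 1) {
  s <- weights * (y - mu) * theta / (theta + mu)
  crossprod(X, s)   # p x 1 gradient
}

## Finite-difference check against the NB2 log-likelihood:
X <- cbind(1, c(0.5, -1, 2))
beta <- c(0.1, 0.2)
y <- c(1, 0, 3)
theta <- 2
mu <- as.vector(exp(X %*% beta))
g <- score_negbin_sketch(X, y, mu, theta)
ll <- function(b) sum(dnbinom(y, size = theta, mu = as.vector(exp(X %*% b)), log = TRUE))
eps <- 1e-6
num_grad <- (ll(beta + c(eps, 0)) - ll(beta - c(eps, 0))) / (2 * eps)
## g[1] and num_grad agree to finite-difference accuracy
```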
Compute softplus transform
Description
Computes the softplus transform, equivalent to the cumulant generating function
of a logistic regression model: \log(1+e^x).
Usage
softplus(x)
Arguments
x |
Numeric vector to apply softplus to |
Value
Softplus transformed vector
Examples
x <- runif(5)
softplus(x)
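A note on numerical behavior: the naive log(1 + e^x) overflows for large x. A standard numerically stable rewriting (a sketch; the package's softplus may differ) is:

```r
## Stable softplus: max(x, 0) + log1p(exp(-|x|)) agrees with log(1 + e^x)
## analytically but never evaluates exp() of a large positive argument.
softplus_stable <- function(x) {
  pmax(x, 0) + log1p(exp(-abs(x)))
}

softplus_stable(c(-1000, 0, 1000))   # approximately c(0, log(2), 1000)
```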
Standardize Vector to Z-Scores
Description
Centers a vector by its sample mean, then scales it by its sample standard deviation:
(x - \mathrm{mean}(x))/\mathrm{sd}(x).
Usage
std(x)
Arguments
x |
Numeric vector to standardize |
Value
Standardized vector with sample mean 0 and standard deviation 1
Examples
x <- c(1, 2, 3, 4, 5)
std(x)
print(mean(std(x))) # 0
print(sd(std(x)))   # 1
Summary Method for lgspline Objects
Description
Summary Method for lgspline Objects
Usage
## S3 method for class 'lgspline'
summary(object, ...)
Arguments
object |
An lgspline model object. |
... |
Not used. |
Value
An object of class summary.lgspline, a list containing:
- model_family
The family object.
- observations
Number of observations N.
- predictors
Number of predictor variables q.
- knots
Number of knots K.
- basis_functions
Basis functions per partition p.
- estimate_dispersion
"Yes" or "No".
- cv
Critical value used for confidence intervals.
- coefficients
Coefficient matrix from wald_univariate, or a single-column estimate matrix if return_varcovmat = FALSE.
- sigmasq_tilde
Estimated dispersion \tilde{\sigma}^2.
- trace_XUGX
Trace of the hat matrix, \mathrm{trace}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top).
- N
Number of observations.
Summary Method for wald_lgspline Objects
Description
Prints a header with model info then delegates to
print.wald_lgspline.
Usage
## S3 method for class 'wald_lgspline'
summary(object, ...)
Arguments
object |
A |
... |
Passed to |
Value
Invisibly returns object.
See Also
wald_univariate, print.wald_lgspline
Calculate Derivatives of Polynomial Terms
Description
Computes first or second derivatives of polynomial terms in a design matrix with respect to a specified variable. Handles polynomial terms up to fourth degree.
Usage
take_derivative(dat, var, second = FALSE, scale)
Arguments
dat |
Numeric matrix; design matrix containing polynomial basis expansions |
var |
Character; column name of variable to differentiate with respect to |
second |
Logical; if TRUE compute second derivative, if FALSE compute first derivative (default FALSE) |
scale |
Numeric; scaling factor for normalization |
Value
Numeric matrix containing derivatives of polynomial terms, with same dimensions as input matrix
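The column-wise rule is the usual power rule, d/dx x^k = k x^{k-1}, applied per column. A minimal standalone sketch (illustrative, ignoring the scale normalization and column-name matching):

```r
## Power-rule derivative of pure polynomial columns up to degree 4.
poly_deriv_sketch <- function(x, max_degree = 4) {
  basis <- sapply(seq_len(max_degree), function(k) x^k)        # x, x^2, x^3, x^4
  deriv <- sapply(seq_len(max_degree), function(k) k * x^(k - 1))
  list(basis = basis, deriv = deriv)
}

poly_deriv_sketch(2)$deriv   # 1, 4, 12, 32
```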
Calculate Second Derivatives of Interaction Terms
Description
Computes partial second derivatives for interaction terms including two-way linear, quadratic, and three-way interactions. Handles special cases for each type.
This function is needed to compute total second derivatives as the sum of the
second partial "pure" derivative (d^2/dx^2) plus the second partial "mixed"
derivatives (d^2/dxdz), for a predictor x and all other predictors z.
Usage
take_interaction_2ndderivative(
dat,
var,
interaction_single_cols,
interaction_quad_cols,
triplet_cols,
colnm_expansions,
power1_cols,
power2_cols
)
Arguments
dat |
Numeric matrix; design matrix containing basis expansions |
var |
Character; variable name to differentiate with respect to |
interaction_single_cols |
Integer vector; column indices for linear-linear interactions |
interaction_quad_cols |
Integer vector; column indices for linear-quadratic interactions |
triplet_cols |
Integer vector; column indices for three-way interactions |
colnm_expansions |
Character vector; column names of expansions for each partition |
power1_cols |
Integer vector; column indices of linear terms |
power2_cols |
Integer vector; column indices of quadratic terms |
Value
Numeric matrix of second derivatives, same dimensions as input
Tune Smoothing and Ridge Penalties via Generalized Cross Validation
Description
Optimizes smoothing spline and ridge regression penalties by minimizing the GCV criterion. Uses BFGS optimization with analytical gradients or finite differences. This is the top-level entry point that orchestrates grid search initialization and quasi-Newton optimization via internal subfunctions.
Usage
tune_Lambda(
y,
X,
X_gram,
smoothing_spline_penalty,
A,
K,
p_expansions,
N_obs,
opt,
use_custom_bfgs,
C,
colnm_expansions,
wiggle_penalty,
flat_ridge_penalty,
initial_wiggle,
initial_flat,
unique_penalty_per_predictor,
unique_penalty_per_partition,
penalty_vec,
meta_penalty,
family,
unconstrained_fit_fxn,
keep_weighted_Lambda,
iterate,
qp_score_function,
quadprog,
qp_Amat,
qp_bvec,
qp_meq,
tol,
sd_y,
delta,
constraint_value_vectors,
parallel,
parallel_eigen,
parallel_trace,
parallel_aga,
parallel_matmult,
parallel_unconstrained,
cl,
chunk_size,
num_chunks,
rem_chunks,
shared_env,
custom_penalty_mat,
order_list,
glm_weight_function,
schur_correction_function,
need_dispersion_for_estimation,
dispersion_function,
observation_weights,
homogenous_weights,
blockfit,
just_linear_without_interactions,
Vhalf,
VhalfInv,
verbose,
include_warnings,
...
)
Arguments
y |
List; response vectors by partition. |
X |
List; design matrices by partition. |
X_gram |
List; Gram matrices by partition. |
smoothing_spline_penalty |
Matrix; integrated squared second derivative penalty. |
A |
Matrix; smoothness constraints at knots. |
K |
Integer; number of interior knots in 1-D, number of partitions - 1 in higher dimensions. |
p_expansions |
Integer; columns per partition. |
N_obs |
Integer; total sample size. |
opt |
Logical; TRUE to optimize penalties, FALSE to use initial values. |
use_custom_bfgs |
Logical; TRUE for analytic gradient BFGS as natively
implemented, FALSE for finite differences as implemented by
|
C |
Matrix; polynomial expansion matrix (used for initialization). |
colnm_expansions |
Character vector; column names of the expansion matrix. |
wiggle_penalty, flat_ridge_penalty |
Fixed penalty values if provided. |
initial_wiggle, initial_flat |
Numeric vectors; candidate values for grid search initialization on the raw (non-negative) scale. Converted to log scale internally for optimization. |
unique_penalty_per_predictor, unique_penalty_per_partition |
Logical; allow predictor/partition-specific penalties. |
penalty_vec |
Numeric vector; initial values for predictor/partition
penalties on the raw (non-negative) scale. Converted to log scale
internally for optimization. Use |
meta_penalty |
The "meta" ridge penalty, a regularization for predictor/partition penalties to pull them towards 1 on the raw scale. |
family |
GLM family with optional custom tuning loss. |
unconstrained_fit_fxn |
Function for unconstrained fitting. |
keep_weighted_Lambda, iterate |
Logical controlling GLM fitting. |
qp_score_function, quadprog, qp_Amat, qp_bvec, qp_meq |
Quadratic
programming parameters (see arguments of |
tol |
Numeric; convergence tolerance. |
sd_y, delta |
Response standardization parameters. |
constraint_value_vectors |
List; constraint values. |
parallel |
Logical; enable parallel computation. |
parallel_eigen, parallel_trace, parallel_aga |
Logical; specific parallel flags. |
parallel_matmult, parallel_unconstrained |
Logical; specific parallel flags. |
cl |
Parallel cluster object. |
chunk_size, num_chunks, rem_chunks |
Integer; parallel computation parameters. |
shared_env |
Environment; shared variables exported to cluster workers. |
custom_penalty_mat |
Optional custom penalty matrix. |
order_list |
List; observation ordering by partition. |
glm_weight_function, schur_correction_function |
Functions for GLM weights and corrections. |
need_dispersion_for_estimation, dispersion_function |
Control dispersion estimation. |
observation_weights |
Optional observation weights. |
homogenous_weights |
Logical; TRUE if all weights equal. |
blockfit |
Logical; when TRUE, the backfitting block decomposition
( |
just_linear_without_interactions |
Numeric; vector of columns for non-spline effects without interactions. |
Vhalf, VhalfInv |
Square root and inverse square root correlation structure matrices. |
verbose |
Logical; print progress. |
include_warnings |
Logical; print warnings/try-errors. |
... |
Additional arguments passed to fitting functions. |
Details
The tuning procedure consists of the following steps:
-
Preprocessing: Convert raw-scale penalties to log scale, compute cross-products, determine pseudocount delta, ensure constraint matrix compatibility.
-
Blockfit dispatch: Pre-compute flat_cols and the use_blockfit flag so that every GCV evaluation uses the same coefficient estimator as the final fit. flat_cols are identified by matching column names against paste0("_", just_linear_without_interactions, "_") in colnm_expansions.
-
Grid search: Evaluate GCV_u over a grid of (wiggle, ridge) penalty candidates to find a good starting point (see
.tune_grid_search). -
BFGS optimization: Minimize GCV_u via either the custom damped BFGS with closed-form gradients (see
.damped_bfgs, .compute_gcvu_gradient) or base R's stats::optim with finite-difference gradients.
-
Inflation: Apply a small inflation factor ((N+2)/(N-2))^2 to counteract in-sample bias toward underpenalization.
-
Final Lambda: Compute the final penalty matrix from optimized parameters via
compute_Lambda.
Parameterization: initial penalty values are accepted on the raw (non-negative) scale and converted to natural log-scale internally, i.e. raw_penalty = exp(theta), so that raw penalties are always positive. The chain rule factor d(exp(theta))/d(theta) = exp(theta) = raw_penalty.
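The log-scale parameterization and its chain-rule factor can be verified numerically on a toy objective (all names here are illustrative):

```r
## Optimizing on theta = log(penalty) keeps the raw penalty positive;
## the gradient on the log scale is the raw-scale gradient times exp(theta).
f_raw <- function(pen) (pen - 3)^2            # toy objective on the raw scale
f_log <- function(theta) f_raw(exp(theta))    # same objective, log-scale parameter

theta <- 0.7
grad_log_analytic <- 2 * (exp(theta) - 3) * exp(theta)   # chain rule factor exp(theta)
eps <- 1e-6
grad_log_numeric <- (f_log(theta + eps) - f_log(theta - eps)) / (2 * eps)
abs(grad_log_analytic - grad_log_numeric)     # ~0
```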
Value
List containing:
- Lambda
Final combined penalty matrix.
- flat_ridge_penalty
Optimized ridge penalty.
- wiggle_penalty
Optimized smoothing penalty.
- other_penalties
Optimized predictor/partition penalties.
- L_predictor_list
Predictor-specific penalty matrices.
- L_partition_list
Partition-specific penalty matrices.
See Also
-
optim for Hessian-free optimization
-
compute_Lambda for penalty matrix construction
-
compute_G_eigen for eigendecomposition of penalized Gram matrices
-
get_B for constrained coefficient estimation
-
blockfit_solve for the backfitting block-decomposition estimator used when blockfit = TRUE
Examples
## Not run:
## ## Example 1: Basic usage within lgspline ## ##
## tune_Lambda is called internally by lgspline; direct calls are
## for advanced users. Here we verify that the refactored version
## produces identical output to the original.
set.seed(42)
t <- runif(200, -5, 5)
y <- sin(t) + rnorm(200, 0, 0.5)
## Fit with automatic penalty tuning (calls tune_Lambda internally)
fit1 <- lgspline(t, y, K = 3)
cat("Wiggle penalty:", fit1$penalties$wiggle_penalty, "\n")
cat("Ridge penalty:", fit1$penalties$flat_ridge_penalty, "\n")
cat("Trace (edf):", fit1$trace_XUGX, "\n")
## ## Example 2: Fixed penalties (no tuning) ## ##
fit2 <- lgspline(t, y, K = 3, opt = FALSE,
wiggle_penalty = 1e-4,
flat_ridge_penalty = 0.1)
cat("Fixed wiggle:", fit2$penalties$wiggle_penalty, "\n")
## ## Example 3: blockfit path ## ##
## When blockfit = TRUE and just_linear_without_interactions is non-empty,
## tune_Lambda dispatches to blockfit_solve at each GCV evaluation,
## ensuring penalties are tuned under the same estimator used in the final
## fit. Verify that tuned penalties are consistent across both paths.
set.seed(7)
n <- 300
x1 <- runif(n, 0, 5)
x2 <- rnorm(n)
y2 <- sin(x1) + 0.5 * x2 + rnorm(n, 0, 0.3)
df <- data.frame(x1 = x1, x2 = x2)
## blockfit = TRUE uses blockfit_solve during tuning
fit_bf <- lgspline(df, y2, K = 2, blockfit = TRUE,
just_linear_without_interactions = 2)
## blockfit = FALSE uses get_B during tuning (original path)
fit_std <- lgspline(df, y2, K = 2, blockfit = FALSE,
just_linear_without_interactions = 2)
cat("blockfit wiggle :", fit_bf$penalties$wiggle_penalty, "\n")
cat("standard wiggle :", fit_std$penalties$wiggle_penalty, "\n")
## Penalties may differ slightly; predictions should be close.
cat("Max pred diff:", max(abs(fit_bf$ytilde - fit_std$ytilde)), "\n")
## ## Example 4: Verify refactored subfunctions ## ##
## The internal .compute_meta_penalty should match hand calculation
mp <- lgspline:::.compute_meta_penalty(
wiggle_penalty = 0.5,
penalty_vec = c(predictor1 = 1.2, partition1 = 0.8),
meta_penalty_coef = 1e-8,
unique_penalty_per_predictor = TRUE,
unique_penalty_per_partition = TRUE
)
expected <- 0.5 * 1e-8 * ((1.2 - 1)^2 + (0.8 - 1)^2) +
0.5 * 1e-32 * (0.5 - 1)^2
stopifnot(abs(mp - expected) < 1e-20)
cat("Meta-penalty check passed.\n")
## ## Example 5: Verify gradient of meta-penalty ## ##
gr <- lgspline:::.compute_meta_penalty_gradient(
wiggle_penalty = 2.0,
penalty_vec = c(predictor1 = 1.5),
meta_penalty_coef = 1e-8,
unique_penalty_per_predictor = TRUE,
unique_penalty_per_partition = FALSE
)
## gr[1] should be 1e-32 * (2 - 1) * 2 = 2e-32
## gr[2] should be 0
## gr[3] should be 1e-8 * (1.5 - 1) * 1.5 = 7.5e-9
stopifnot(abs(gr[1] - 2e-32) < 1e-40)
stopifnot(gr[2] == 0)
stopifnot(abs(gr[3] - 7.5e-9) < 1e-17)
cat("Meta-penalty gradient check passed.\n")
## ## Example 6: Residual computation paths ## ##
## Identity link
r1 <- lgspline:::.compute_tuning_residuals(
y = list(c(1, 2, 3)),
preds = list(c(1.1, 1.9, 3.2)),
delta = 0,
family = gaussian(),
observation_weights = list(NULL),
K = 0,
order_list = list(1:3)
)
stopifnot(max(abs(r1[[1]] - c(-0.1, 0.1, -0.2))) < 1e-10)
cat("Residual check passed.\n")
## End(Not run)
Unconstrained Cox PH Estimation for lgspline
Description
Estimates penalized Cox PH coefficients for a single partition (or the full model when K = 0) using damped Newton-Raphson on the penalized partial log-likelihood with the Breslow approximation for tied event times.
This function is passed to lgspline's unconstrained_fit_fxn
argument. It receives the partition-level design matrix and response,
sorts internally by event time, and returns the coefficient vector.
Usage
unconstrained_fit_cox(
X,
y,
LambdaHalf,
Lambda,
keep_weighted_Lambda,
family,
tol = 1e-08,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks,
order_indices,
weights,
status
)
Arguments
X |
Design matrix (N_k x p) for partition k. |
y |
Survival times for partition k. |
LambdaHalf |
Square root of penalty matrix (unused here, retained for interface compatibility). |
Lambda |
Penalty matrix. |
keep_weighted_Lambda |
Logical; if TRUE, return hot-start estimates without refinement (not recommended for Cox). |
family |
Cox family object. |
tol |
Convergence tolerance. |
K |
Number of knots. |
parallel |
Logical for parallel processing. |
cl |
Cluster object. |
chunk_size, num_chunks, rem_chunks |
Parallelism parameters. |
order_indices |
Observation indices mapping partition to full data. |
weights |
Observation weights. |
status |
Event indicator (1 = event, 0 = censored), full-data length. |
Details
The penalized partial log-likelihood is
\ell_p(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta})
- \tfrac{1}{2}\boldsymbol{\beta}^{\top}
\boldsymbol{\Lambda}\boldsymbol{\beta}
Newton-Raphson updates use the Cox score and observed information
(computed via score_cox and info_cox), plus the penalty
term. The step is damped: the step size is halved until the penalized
log-likelihood improves.
Value
Numeric column vector of penalized partial-likelihood coefficient estimates.
Examples
## Used internally by lgspline; see the full-model example below.
Unconstrained Generalized Linear Model Estimation
Description
Fits generalized linear models without smoothing constraints using penalized maximum likelihood estimation. This is applied to each partition to obtain the unconstrained estimates, prior to imposing the smoothing constraints.
Hot-start estimates are obtained by treating the matrix square root of the
penalty as pseudo-observations: its rows are appended to the design matrix,
and glm.fit is called with the pseudo-observations' responses set to the
inverse link function evaluated at eta = XB = 0. For the log link,
exp(0) = 1 is valid for most families. For cases like the inverse link,
where division by 0 is not possible, the pseudo-response is instead set to
1/tol, where tol is the convergence tolerance argument.
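The pseudo-observation hot start can be sketched as follows; this is an illustrative approximation with a Gaussian sanity check, not the package's exact internal code:

```r
## Ridge-as-pseudo-data hot start: append rows of the penalty square root
## to X, with pseudo-responses set to linkinv(0).
hotstart_sketch <- function(X, y, Lambda, family) {
  LambdaHalf <- chol(Lambda)                          # matrix square root
  X_aug <- rbind(X, LambdaHalf)
  y_aug <- c(y, rep(family$linkinv(0), nrow(LambdaHalf)))
  fit <- glm.fit(X_aug, y_aug, family = family)
  matrix(fit$coefficients, ncol = 1)
}

## With a tiny penalty and Gaussian identity, this reproduces least squares:
X <- cbind(1, 1:5)
b <- hotstart_sketch(X, y = 1:5, Lambda = 1e-8 * diag(2), family = gaussian())
## b is approximately c(0, 1)
```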
Usage
unconstrained_fit_default(
X,
y,
LambdaHalf,
Lambda,
keep_weighted_Lambda,
family,
tol,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks,
order_indices,
weights,
...
)
Arguments
X |
Design matrix of predictors |
y |
Response variable vector |
LambdaHalf |
Square root of penalty matrix
( |
Lambda |
Penalty matrix ( |
keep_weighted_Lambda |
Logical flag to control penalty matrix handling:
- |
family |
Distribution family specification |
tol |
Convergence tolerance |
K |
Number of partitions minus one ( |
parallel |
Flag for parallel processing |
cl |
Cluster object for parallel computation |
chunk_size |
Processing chunk size |
num_chunks |
Number of computational chunks |
rem_chunks |
Remaining chunks |
order_indices |
Observation ordering indices |
weights |
Optional observation weights |
... |
Additional arguments passed to |
Value
Numeric column vector of unconstrained coefficient estimates.
For fitting non-canonical GLMs, use keep_weighted_Lambda = TRUE,
since the score and Hessian equations below are no longer valid.
For Gamma(link='log'), the name keep_weighted_Lambda is somewhat
misleading: the information is weighted by a constant (the shape
parameter) rather than a mean-variance relationship, so this constant
flushes into the penalty terms and the formulation of the information
matrix remains valid. Hence keep_weighted_Lambda = TRUE is still
highly recommended for log-link Gamma models.
For other scenarios, like probit regression, diagonal weights are incorporated into the penalty matrix when computing initial MLE estimates, which technically imposes an unintended prior distribution on the beta coefficients.
Heuristically, this should not matter much: the weights are updated to their proper form when estimates are computed under constraint, and lgspline otherwise uses the correct form of the score and information regardless of canonical/non-canonical status, as long as 'glm_weight_function' and 'qp_score_function' are properly specified.
Unconstrained NB Estimation for lgspline
Description
Estimates penalized NB2 coefficients for a single partition (or the
full model when K = 0) using a two-stage optimization: outer loop
profiles over \theta via Brent's method, inner loop estimates
\boldsymbol{\beta} via damped Newton-Raphson on the penalized
log-likelihood.
Usage
unconstrained_fit_negbin(
X,
y,
LambdaHalf,
Lambda,
keep_weighted_Lambda,
family,
tol = 1e-08,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks,
order_indices,
weights
)
Arguments
X |
Design matrix (N_k x p) for partition k. |
y |
Response counts for partition k. |
LambdaHalf |
Square root of penalty matrix. |
Lambda |
Penalty matrix. |
keep_weighted_Lambda |
Logical; if TRUE, return hot-start estimates from augmented Poisson regression without refinement. |
family |
NB family object. |
tol |
Convergence tolerance. |
K |
Number of knots. |
parallel |
Logical for parallel processing. |
cl |
Cluster object. |
chunk_size, num_chunks, rem_chunks |
Parallelism parameters. |
order_indices |
Observation indices mapping partition to full data. |
weights |
Observation weights. |
Details
The penalized log-likelihood is
\ell_p(\boldsymbol{\beta}, \theta) = \ell(\boldsymbol{\beta},
\theta) - \tfrac{1}{2}\boldsymbol{\beta}^{\top}
\boldsymbol{\Lambda}\boldsymbol{\beta}
The outer loop optimizes \theta given the inner-loop-optimal
\boldsymbol{\beta}(\theta). This mirrors the Weibull AFT
approach where scale (\sigma) is profiled out.
Initialization uses Poisson regression coefficients as a hot start
(equivalent to \theta \to \infty).
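The outer-loop profiling can be illustrated with an intercept-only NB2 model, where the inner "fit" reduces to mu = mean(y) (under the log link the intercept MLE does not depend on theta). The names and data below are illustrative only:

```r
## Profile the NB2 shape theta with Brent's method (stats::optimize),
## holding the inner-loop mean estimate at its MLE.
set.seed(3)
y <- rnbinom(500, size = 2, mu = 5)            # true theta = 2

negll_theta <- function(theta) {
  mu <- mean(y)                                # inner "fit": intercept-only MLE
  -sum(dnbinom(y, size = theta, mu = mu, log = TRUE))
}
theta_hat <- optimize(negll_theta, interval = c(0.01, 100))$minimum
## theta_hat should land near the true value of 2
```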
Value
Numeric column vector of penalized coefficient estimates.
Unconstrained Weibull Accelerated Failure Time Model Estimation
Description
Estimates parameters for an unconstrained Weibull accelerated failure time (AFT) model supporting right-censored survival data.
This both provides a tool for actually fitting Weibull AFT Models, and boilerplate code for users who wish to incorporate Lagrangian multiplier smoothing splines into their own custom models.
Usage
unconstrained_fit_weibull(
X,
y,
LambdaHalf,
Lambda,
keep_weighted_Lambda,
family,
tol = 1e-08,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks,
order_indices,
weights,
status
)
Arguments
X |
Design matrix of predictors |
y |
Survival/response times |
LambdaHalf |
Square root of penalty matrix
( |
Lambda |
Penalty matrix ( |
keep_weighted_Lambda |
Flag to retain weighted penalties |
family |
Distribution family specification |
tol |
Convergence tolerance (default 1e-8) |
K |
Number of partitions minus one ( |
parallel |
Flag for parallel processing |
cl |
Cluster object for parallel computation |
chunk_size |
Processing chunk size |
num_chunks |
Number of computational chunks |
rem_chunks |
Remaining chunks |
order_indices |
Observation ordering indices |
weights |
Optional observation weights |
status |
Censoring status indicator (1 = event, 0 = censored) Indicates whether an event of interest occurred (1) or the observation was right-censored (0). In survival analysis, right-censoring occurs when the full survival time is unknown, typically because the study ended or the subject was lost to follow-up before the event of interest occurred. |
Details
Estimation Approach: The function employs a two-stage optimization strategy for fitting accelerated failure time models via maximum likelihood:
1. Outer Loop: Estimate Scale Parameter (sigma) using Brent's method
2. Inner Loop: Estimate Regression Coefficients (given sigma) using damped Newton-Raphson.
Note: the score and information inside the Newton-Raphson are both scaled by \sigma^2 (i.e., they omit the 1/\sigma and 1/\sigma^2 prefactors, respectively). Since the Newton-Raphson step is \mathbf{G}u = (\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{v}, the \sigma^2 factors cancel and the step remains correct.
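As a sketch of the objective being maximized (the standard Weibull AFT log-likelihood for right-censored data, up to additive constants; notation here is ours, not the package's, with \delta_i the status indicator and w_i the observation weights):

```latex
% Weibull AFT log-likelihood with right censoring, on the log-time scale,
% with standardized residual z_i:
z_i = \frac{\log y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}}{\sigma},
\qquad
\ell(\boldsymbol{\beta}, \sigma) =
\sum_{i} w_i \left[ \delta_i \left( z_i - \log \sigma \right) - e^{z_i} \right].
```

The inner Newton-Raphson maximizes \ell in \boldsymbol{\beta} for fixed \sigma; the outer Brent step maximizes the resulting profile over \sigma.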
Value
Numeric column vector of unconstrained coefficient estimates for the Weibull AFT model.
Examples
## Simulate survival data with covariates
set.seed(1234)
n <- 1000
t1 <- rnorm(n)
t2 <- rbinom(n, 1, 0.5)
## Generate survival times with Weibull-like structure
lambda <- exp(0.5 * t1 + 0.3 * t2)
yraw <- rexp(n, rate = 1/lambda)
## Introduce right-censoring
status <- rbinom(n, 1, 0.75)
y <- ifelse(status, yraw, runif(length(yraw), 0, yraw))
df <- data.frame(y = y, t1 = t1, t2 = t2)
## Fit model using lgspline with Weibull AFT unconstrained estimation
model_fit <- lgspline(y ~ spl(t1) + t2,
df,
unconstrained_fit_fxn = unconstrained_fit_weibull,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
schur_correction_function = weibull_schur_correction,
status = status,
opt = FALSE,
K = 1)
## Print model summary
summary(model_fit)
Vector-Matrix Multiplication for Block Diagonal Matrices
Description
Performs vector-matrix multiplication for block diagonal matrices
Usage
vectorproduct_block_diagonal(A, b, K)
Arguments
A |
List of matrices A |
b |
List of vectors b |
K |
Number of blocks |
Value
List of resulting vectors
Univariate Wald Tests and Confidence Intervals for lgspline Coefficients
Description
Computes per-coefficient Wald tests and confidence intervals from a fitted lgspline. For Gaussian identity-link models, t-statistics and t-intervals are used; otherwise z-statistics.
Usage
wald_univariate(object, scale_vcovmat_by = 1, cv, ...)
Arguments
object |
A fitted lgspline object. Must have been fit with return_varcovmat = TRUE.
|
scale_vcovmat_by |
Numeric; scaling factor for the variance-covariance matrix. Default 1. |
cv |
Numeric; critical value for confidence intervals. If missing,
defaults to a quantile appropriate to the statistic in use (t for Gaussian identity-link models, z otherwise). |
... |
Additional arguments passed to the internal Wald method. |
Value
An object of class "wald_lgspline", a list with:
- coefficients
Matrix with columns: Estimate, Std. Error, t value or z value, Pr(>|t|) or Pr(>|z|), CI LB, CI UB.
- critical_value
Critical value used.
- family
GLM family from the fitted model.
- N
Number of observations.
- trace_XUGX
Effective df trace term.
- statistic_name
"t value" or "z value".
- p_value_name
"Pr(>|t|)" or "Pr(>|z|)".
- df.residual
Residual degrees of freedom when supplied by the internal Wald method.
Print, summary, and plot methods are available; see
print.wald_lgspline, summary.wald_lgspline,
plot.wald_lgspline.
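The entries in the coefficients table follow the usual Wald construction (a sketch in our notation, not a verbatim transcription of the package internals):

```latex
t_j \;(\text{or } z_j) = \frac{\hat\beta_j}{\widehat{\mathrm{se}}(\hat\beta_j)},
\qquad
\text{CI}_j = \hat\beta_j \pm \mathrm{cv} \cdot \widehat{\mathrm{se}}(\hat\beta_j),
```

where cv is the critical value and, for Gaussian identity-link fits, the t-distribution uses effective residual degrees of freedom N - trace_XUGX (as in the example below).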
See Also
lgspline, confint.lgspline,
print.wald_lgspline, summary.wald_lgspline,
plot.wald_lgspline
Examples
set.seed(1234)
t <- runif(1000, -10, 10)
y <- 2*sin(t) + -0.06*t^2 + rnorm(length(t))
model_fit <- lgspline(t, y, return_varcovmat = TRUE)
wald_default <- wald_univariate(model_fit)
print(wald_default)
## t-distribution critical value
eff_df <- model_fit$N - model_fit$trace_XUGX
wald_t <- wald_univariate(model_fit, cv = qt(0.975, eff_df))
print(wald_t)
coef_table <- wald_default$coefficients
plot(wald_default)
Estimate Weibull Dispersion for Accelerated Failure Time Model
Description
Computes the dispersion parameter (sigma^2 = scale^2) for a Weibull
accelerated failure time (AFT) model, supporting right-censored survival
data. The returned value is sigma^2, where sigma is the Weibull scale
parameter matching survreg$scale.
This provides both a working tool for fitting Weibull AFT models and boilerplate code for users who wish to incorporate Lagrangian multiplier smoothing splines into their own custom models.
Usage
weibull_dispersion_function(
mu,
y,
order_indices,
family,
observation_weights,
VhalfInv,
status
)
Arguments
mu |
Predicted survival times |
y |
Observed response/survival times |
order_indices |
Indices to align status with response |
family |
Weibull AFT model family specification; unused here and retained for interface compatibility. |
observation_weights |
Optional observation weights |
VhalfInv |
Inverse square root of the correlation matrix; unused here and retained for interface compatibility. |
status |
Censoring indicator (1 = event, 0 = censored). In survival analysis, right-censoring occurs when the full survival time is unknown, typically because the study ended or the subject was lost to follow-up before the event of interest occurred. |
Value
Dispersion estimate (sigma^2) for the Weibull AFT model, i.e., the squared
scale parameter. The Weibull scale (sigma) matching survreg$scale is
the square root of this value.
See Also
weibull_scale for the underlying scale estimation
function
Examples
## Simulate survival data with covariates
set.seed(1234)
n <- 1000
t1 <- rnorm(n)
t2 <- rbinom(n, 1, 0.5)
## Generate survival times with Weibull-like structure
lambda <- exp(0.5 * t1 + 0.3 * t2)
yraw <- rexp(n, rate = 1/lambda)
## Introduce right-censoring
status <- rbinom(n, 1, 0.75)
y <- ifelse(status, yraw, runif(length(yraw), 0, yraw))
## Example of using dispersion function
mu <- mean(y)
order_indices <- seq_along(y)
weights <- rep(1, n)
## Estimate dispersion (= scale^2 = sigma^2)
dispersion_est <- weibull_dispersion_function(
mu = mu,
y = y,
order_indices = order_indices,
family = weibull_family(),
observation_weights = weights,
VhalfInv = NULL,
status = status
)
print(dispersion_est) # sigma^2
print(sqrt(dispersion_est)) # sigma (comparable to survreg$scale)
Weibull Family for Survival Model Specification
Description
Creates a family-like object for Weibull accelerated failure time (AFT) models, including custom log-likelihood, AIC, and deviance helpers.
This provides both a working tool for fitting Weibull AFT models and boilerplate code for users who wish to incorporate Lagrangian multiplier smoothing splines into their own custom models.
Usage
weibull_family()
Details
Provides a comprehensive family specification for Weibull AFT models,
including family name, link function, inverse link function, custom loss
function for model tuning, and methods for AIC and log-likelihood
computation compatible with logLik.lgspline.
Supports right-censored survival data with flexible parameter estimation.
Note on scale vs. dispersion: throughout this package, the lgspline object
stores sigmasq_tilde which equals sigma^2 (dispersion), where
sigma is the Weibull scale parameter matching survreg$scale.
Functions that accept a dispersion argument receive sigma^2;
functions that accept a scale argument receive sigma.
Value
A family-like list containing the link functions and Weibull-specific
methods used by lgspline.
Examples
## Simulate survival data with covariates
set.seed(1234)
n <- 1000
t1 <- rnorm(n)
t2 <- rbinom(n, 1, 0.5)
## Generate survival times with Weibull-like structure
lambda <- exp(0.5 * t1 + 0.3 * t2)
yraw <- rexp(n, rate = 1/lambda)
## Introduce right-censoring
status <- rbinom(n, 1, 0.75)
y <- ifelse(status, yraw, runif(length(yraw), 0, yraw))
## Prepare data
df <- data.frame(y = y, t1 = t1, t2 = t2, status = status)
## Fit model using custom Weibull family
model_fit <- lgspline(y ~ spl(t1) + t2,
df,
unconstrained_fit_fxn = unconstrained_fit_weibull,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
schur_correction_function = weibull_schur_correction,
status = status,
opt = FALSE,
K = 1)
summary(model_fit)
## Log-likelihood now works via logLik.lgspline:
# logLik(model_fit)
Weibull GLM Weight Function for Constructing Information Matrix
Description
Computes the working weights used in the Weibull AFT information matrix, including the observation-weight contribution returned on the vector scale.
Usage
weibull_glm_weight_function(
mu,
y,
order_indices,
family,
dispersion,
observation_weights,
status
)
Arguments
mu |
Predicted survival times |
y |
Observed response/survival times |
order_indices |
Order of observations when partitioned to match "status" to "response" |
family |
Weibull AFT family; unused here and retained for interface compatibility. |
dispersion |
Estimated dispersion parameter (sigma^2 = scale^2). The
lgspline framework stores and passes dispersion (sigma^2); the Weibull
scale (sigma) matching survreg$scale is sqrt(dispersion). |
observation_weights |
Weights of observations submitted to function |
status |
Censoring indicator (1 = event, 0 = censored). In survival analysis, right-censoring occurs when the full survival time is unknown, typically because the study ended or the subject was lost to follow-up before the event of interest occurred. |
Details
This function generates weights used in constructing the information matrix
after unconstrained estimates have been found. Specifically, it is used in
the construction of the \textbf{U} and \textbf{G} matrices
following initial unconstrained parameter estimation.
These weights are analogous to the variance terms in generalized linear
models (GLMs). Just as logistic regression uses \mu(1-\mu), Poisson
regression uses e^{\mu}, and linear regression uses constant weights,
Weibull AFT models use \exp((\log y - \log \mu)/\sigma), where
\sigma = \sqrt{\text{dispersion}} is the scale parameter.
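A sketch of where this weight comes from (our derivation, using the same conventions as above): differentiating the censored Weibull log-likelihood twice with respect to the linear predictor \eta_i = \log \mu_i gives

```latex
% With z_i = (\log y_i - \eta_i)/\sigma and status \delta_i:
\frac{\partial \ell_i}{\partial \eta_i} = \frac{e^{z_i} - \delta_i}{\sigma},
\qquad
-\frac{\partial^2 \ell_i}{\partial \eta_i^2} = \frac{e^{z_i}}{\sigma^2}.
```

Dropping the 1/\sigma^2 prefactor, as this package's convention does throughout, leaves the working weight w_i = e^{z_i} = \exp\{(\log y_i - \log \mu_i)/\sigma\}.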
Value
Numeric vector of working weights for the information matrix, including observation weights when finite and a fallback of 1s when the natural Weibull weights are non-finite.
Examples
## Demonstration of glm weight function in constrained model estimation
set.seed(1234)
n <- 1000
t1 <- rnorm(n)
t2 <- rbinom(n, 1, 0.5)
## Generate survival times
lambda <- exp(0.5 * t1 + 0.3 * t2)
yraw <- rexp(n, rate = 1/lambda)
## Introduce right-censoring
status <- rbinom(n, 1, 0.75)
y <- ifelse(status, yraw, runif(length(yraw), 0, yraw))
## Fit model demonstrating use of custom glm weight function
model_fit <- lgspline(y ~ spl(t1) + t2,
data.frame(y = y, t1 = t1, t2 = t2),
unconstrained_fit_fxn = unconstrained_fit_weibull,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
schur_correction_function = weibull_schur_correction,
status = status,
opt = FALSE,
K = 1)
print(summary(model_fit))
Compute Gradient of Log-Likelihood of Weibull Accelerated Failure Model
Description
Calculates the gradient of log-likelihood for a Weibull accelerated failure time (AFT) survival model, supporting right-censored survival data.
Usage
weibull_qp_score_function(
X,
y,
mu,
order_list,
dispersion,
VhalfInv,
observation_weights,
status
)
Arguments
X |
Design matrix |
y |
Response vector |
mu |
Predicted mean vector |
order_list |
List of observation indices per partition |
dispersion |
Dispersion parameter (sigma^2 = scale^2). The lgspline
framework stores and passes dispersion (sigma^2); the Weibull scale
(sigma) matching survreg$scale is sqrt(dispersion). |
VhalfInv |
Inverse square root of correlation matrix; unused here and retained for interface compatibility. |
observation_weights |
Observation weights |
status |
Censoring indicator (1 = event, 0 = censored) |
Details
Needed if using "blockfit", correlation structures, or quadratic programming with Weibull AFT models.
The gradient is computed on a scale that omits the 1/sigma prefactor.
Specifically, the true score is (1/sigma) * X^T diag(w) (exp(z) - status),
but both this function and the corresponding information matrix used
internally omit 1/sigma and 1/sigma^2 respectively, so the Newton-Raphson
step G*u remains correct. This matches the convention in
unconstrained_fit_weibull.
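The score described above can be written compactly as follows (our notation; \boldsymbol{\delta} = status, z the standardized log-residual, w the observation weights):

```latex
z_i = \frac{\log y_i - \log \mu_i}{\sigma},
\qquad
\nabla_{\boldsymbol{\beta}} \ell =
\frac{1}{\sigma}\,\mathbf{X}^{\top}\,\mathrm{diag}(w)\,(e^{z} - \boldsymbol{\delta}),
```

and this function returns the version with the leading 1/\sigma dropped, matching the information matrix that drops 1/\sigma^2.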
Value
Numeric column vector representing the gradient with respect to coefficients.
Examples
set.seed(1234)
t1 <- rnorm(1000)
t2 <- rbinom(1000, 1, 0.5)
yraw <- rexp(1000, rate = 1/exp(0.01*t1 + 0.01*t2))
status <- rbinom(1000, 1, 0.25)
yobs <- ifelse(status, yraw, runif(length(yraw), 0, yraw))
df <- data.frame(
y = yobs,
t1 = t1,
t2 = t2
)
## Example using blockfit for t2 as a linear term - output does not look
## different, but internal methods used for fitting change
model_fit <- lgspline(y ~ spl(t1) + t2,
df,
unconstrained_fit_fxn = unconstrained_fit_weibull,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
qp_score_function = weibull_qp_score_function,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
schur_correction_function = weibull_schur_correction,
K = 1,
blockfit = TRUE,
opt = FALSE,
status = status,
verbose = TRUE)
print(summary(model_fit))
Estimate Scale for Weibull Accelerated Failure Time Model
Description
Computes maximum log-likelihood scale estimate (sigma) for a Weibull accelerated failure time (AFT) survival model.
This provides both a working tool for fitting Weibull AFT models and boilerplate code for users who wish to incorporate Lagrangian multiplier smoothing splines into their own custom models.
Usage
weibull_scale(log_y, log_mu, status, weights = 1)
Arguments
log_y |
Logarithm of response/survival times |
log_mu |
Logarithm of predicted survival times |
status |
Censoring indicator (1 = event, 0 = censored). In survival analysis, right-censoring occurs when the full survival time is unknown, typically because the study ended or the subject was lost to follow-up before the event of interest occurred. |
weights |
Optional observation weights (default = 1) |
Details
Calculates the maximum likelihood estimate of the scale (sigma) for a Weibull AFT model, accounting for right-censored observations, using Brent's method for optimization.
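A sketch of the one-dimensional objective that Brent's method maximizes (our notation, with r_i = \log y_i - \log \mu_i, status \delta_i, and weights w_i):

```latex
\ell(\sigma) = \sum_i w_i \left[
\delta_i\!\left(\frac{r_i}{\sigma} - \log \sigma\right)
- \exp\!\left(\frac{r_i}{\sigma}\right) \right],
\qquad
\hat\sigma = \arg\max_{\sigma > 0} \ell(\sigma).
```

This is the censored Weibull AFT log-likelihood (up to additive constants) profiled over the scale alone, with the regression coefficients held fixed through log_mu.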
Value
Scalar representing the estimated Weibull scale (sigma), equivalent to
survreg$scale. The dispersion (as stored in
lgspline$sigmasq_tilde) is sigma^2.
Examples
## Simulate exponential data with censoring
set.seed(1234)
mu <- 2 # mean of exponential distribution
n <- 500
y <- rexp(n, rate = 1/mu)
## Introduce censoring (25% of observations)
status <- rbinom(n, 1, 0.75)
y_obs <- ifelse(status, y, runif(n, 0, y))
## Compute scale estimate
scale_est <- weibull_scale(
log_y = log(y_obs),
log_mu = log(mu),
status = status
)
print(scale_est)
Correction for the Variance-Covariance Matrix for Uncertainty in Scale
Description
Computes the Schur complement \textbf{S} such that
\textbf{G}^* = (\textbf{G}^{-1} + \textbf{S})^{-1} properly
accounts for uncertainty in estimating the Weibull scale parameter when
estimating the variance-covariance matrix. Otherwise, the
variance-covariance matrix is optimistic: it assumes the scale is known
when it was in fact estimated. Note that the parameterization adds the
output of this function elementwise (rather than subtracting it), so in
most cases the output will be a negative definite or negative
semi-definite matrix.
Usage
weibull_schur_correction(
X,
y,
B,
dispersion,
order_list,
K,
family,
observation_weights,
status
)
weibull_shur_correction(
X,
y,
B,
dispersion,
order_list,
K,
family,
observation_weights,
status
)
Arguments
X |
Block-diagonal matrices of spline expansions |
y |
Block-vector of response |
B |
Block-vector of coefficient estimates |
dispersion |
Scalar, estimate of dispersion (sigma^2 = scale^2). The
lgspline framework stores and passes dispersion (sigma^2); the Weibull
scale (sigma) matching survreg$scale is sqrt(dispersion). |
order_list |
List of partition orders |
K |
Number of partitions minus 1 (i.e., K+1 partitions total) |
family |
Distribution family |
observation_weights |
Optional observation weights (default = 1) |
status |
Censoring indicator (1 = event, 0 = censored). In survival analysis, right-censoring occurs when the full survival time is unknown, typically because the study ended or the subject was lost to follow-up before the event of interest occurred. |
Details
Adjusts the unscaled variance-covariance matrix of the coefficients to
account for uncertainty in estimating the Weibull scale parameter, which
would otherwise be ignored by simply using
\textbf{G}=(\textbf{X}^{T}\textbf{W}\textbf{X} + \textbf{L})^{-1}.
This is accomplished with a correction based on the Schur complement, so we
avoid constructing the entire variance-covariance matrix or modifying the
lgspline procedure substantially.
This tool will be helpful for any model with nuisance parameters whose
uncertainty must be accounted for.
This provides both a working tool for fitting Weibull accelerated failure time (AFT) models and boilerplate code for users who wish to incorporate Lagrangian multiplier smoothing splines into their own custom models.
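The underlying identity is standard partitioned-information algebra (sketch in our notation): with the joint information for (\boldsymbol{\beta}, \sigma) partitioned as below, the corrected coefficient covariance is the inverse of the Schur complement, which is what \textbf{G}^* = (\textbf{G}^{-1} + \textbf{S})^{-1} implements.

```latex
\mathbf{I}(\boldsymbol{\beta}, \sigma) =
\begin{pmatrix}
\mathbf{I}_{\beta\beta} & \mathbf{I}_{\beta\sigma} \\
\mathbf{I}_{\sigma\beta} & I_{\sigma\sigma}
\end{pmatrix},
\qquad
\operatorname{Cov}(\hat{\boldsymbol{\beta}}) \approx
\left( \mathbf{I}_{\beta\beta}
- \mathbf{I}_{\beta\sigma}\, I_{\sigma\sigma}^{-1}\, \mathbf{I}_{\sigma\beta}
\right)^{-1},
\qquad
\mathbf{S} = -\,\mathbf{I}_{\beta\sigma}\, I_{\sigma\sigma}^{-1}\, \mathbf{I}_{\sigma\beta}.
```

Since I_{\sigma\sigma} > 0, \textbf{S} is negative semi-definite, consistent with the sign convention described above, and adding it inflates the resulting variances.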
Value
List of blockwise Schur-complement corrections \textbf{S}_k to be
elementwise added to each block of the information matrix before inversion,
with 0 returned for empty partitions.
Examples
## Minimal example of fitting a Weibull Accelerated Failure Time model
# Simulating survival data with right-censoring
set.seed(1234)
t1 <- rnorm(1000)
t2 <- rbinom(1000, 1, 0.5)
yraw <- rexp(1000, rate = 1/exp(0.01*t1 + 0.01*t2))
# status: 1 = event occurred, 0 = right-censored
status <- rbinom(1000, 1, 0.25)
yobs <- ifelse(status, yraw, runif(length(yraw), 0, yraw))
df <- data.frame(
y = yobs,
t1 = t1,
t2 = t2
)
## Fit model using lgspline with Weibull Schur correction
model_fit <- lgspline(y ~ spl(t1) + t2,
df,
unconstrained_fit_fxn = unconstrained_fit_weibull,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
schur_correction_function = weibull_schur_correction,
status = status,
opt = FALSE,
K = 1)
print(summary(model_fit))
## Fit model using lgspline without Weibull Schur correction
naive_fit <- lgspline(y ~ spl(t1) + t2,
df,
unconstrained_fit_fxn = unconstrained_fit_weibull,
family = weibull_family(),
need_dispersion_for_estimation = TRUE,
dispersion_function = weibull_dispersion_function,
glm_weight_function = weibull_glm_weight_function,
status = status,
opt = FALSE,
K = 1)
print(summary(naive_fit))