# couplr 1.0.2

## Documentation

- Added Overview section to the algorithms vignette with audience and prerequisites
- Fixed workflow diagram dark-mode text handling in the matching-workflows vignette
- Improved SVG theme-awareness for multi-line text labels
- Removed grid lines from matching-workflows plots for a cleaner appearance
- Added threshold labels to the balance comparison plot
# couplr 1.0.0

## Major New Features (2025-11-19 Update)

### Automatic Preprocessing and Scaling

The package now includes intelligent preprocessing to improve matching quality (a usage sketch follows the list):

- New `auto_scale` parameter in `match_couples()` and `greedy_couples()` enables automatic preprocessing
- Variable health checks detect and handle problematic variables:
  - Constant columns (SD = 0) are automatically excluded with warnings
  - High missingness (>50%) triggers warnings
  - Extreme skewness (|skewness| > 2) is flagged
- Smart scaling method selection analyzes the data and recommends:
  - `"robust"` scaling using median and MAD (resistant to outliers)
  - `"standardize"` for traditional mean-centering and SD scaling
  - `"range"` for min-max normalization
- New `preprocess_matching_vars()` function for manual preprocessing control
- Categorical variable encoding for binary and ordered factors
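A minimal sketch of how `auto_scale` and `preprocess_matching_vars()` might be used. The example data frames (`treated`, `control`) and their columns are hypothetical, and the exact call signature of `preprocess_matching_vars()` is assumed; only the `vars` and `auto_scale` parameter names come from this changelog.

```r
library(couplr)

# Hypothetical example data; sizes and columns are illustrative only
set.seed(1)
treated <- data.frame(id = 1:50, age = rnorm(50, 40, 10),
                      income = rlnorm(50, 10, 1), score = runif(50))
control <- data.frame(id = 1:80, age = rnorm(80, 45, 12),
                      income = rlnorm(80, 10, 1), score = runif(80))

# Let match_couples() run the variable health checks and pick a scaling method
res <- match_couples(treated, control,
                     vars = c("age", "income", "score"),
                     auto_scale = TRUE)

# Or take manual control of preprocessing (call signature assumed)
prep <- preprocess_matching_vars(treated, control,
                                 vars = c("age", "income", "score"))
```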
### Balance Diagnostics

Comprehensive tools to assess matching quality (example after the list):

- New `balance_diagnostics()` function computes multiple balance metrics:
  - Standardized differences: (mean_left - mean_right) / pooled_sd
  - Variance ratios: SD_left / SD_right
  - Kolmogorov-Smirnov tests for distribution comparison
  - Overall balance metrics (mean, max, % large imbalance)
- Quality thresholds with interpretation:
  - |Std Diff| < 0.10: Excellent balance
  - |Std Diff| 0.10-0.25: Good balance
  - |Std Diff| 0.25-0.50: Acceptable balance
  - |Std Diff| > 0.50: Poor balance
- Per-block statistics with quality ratings when blocking is used
- `balance_table()` creates publication-ready formatted tables
- Informative print methods with interpretation guides
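A short sketch of the balance workflow. It assumes `res` is a matching result such as the one from the preprocessing example above and that `balance_diagnostics()` and `balance_table()` accept that result directly; the exact arguments are assumptions.

```r
# Compute balance metrics for a matching result (argument list assumed)
bal <- balance_diagnostics(res)
print(bal)          # metrics plus the interpretation guide

# Publication-ready formatted table for reporting
balance_table(bal)
```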
### Joined Matched Dataset Output

Create analysis-ready datasets directly from matching results (example after the list):

- New `join_matched()` function automates data preparation:
  - Joins matched pairs with the original left and right datasets
  - Eliminates manual data wrangling after matching
  - Select specific variables via the `left_vars` and `right_vars` parameters
  - Customizable suffixes (default: `_left`, `_right`) for overlapping columns
  - Optional metadata: `pair_id`, `distance`, `block_id`
  - Works with both optimal and greedy matching
- Broom-style `augment()` method for tidymodels integration:
  - S3 method following broom package conventions
  - Sensible defaults for quick exploration
  - Supports all `join_matched()` parameters
- Flexible output control:
  - `include_distance` - include/exclude matching distance
  - `include_pair_id` - include/exclude sequential pair IDs
  - `include_block_id` - include/exclude block identifiers
  - Custom ID column support via `left_id` and `right_id`
- Clean column ordering: pair_id → IDs → distance → block → variables
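A sketch of `join_matched()` and the `augment()` method. It assumes the matching result carries (or can locate) the original datasets and that the illustrated column names exist; the parameter names (`left_vars`, `right_vars`, `include_*`) come from this changelog, everything else is assumed.

```r
# Build an analysis-ready data frame from the matching result
matched_df <- join_matched(res,
                           left_vars  = c("age", "score"),
                           right_vars = c("age", "score"),
                           include_pair_id  = TRUE,
                           include_distance = TRUE)

# Broom-style shortcut with sensible defaults
# (load broom or generics first if couplr does not re-export augment())
library(broom)
matched_df2 <- augment(res)
```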
### Precomputed and Reusable Distances

Performance optimization for exploring multiple matching strategies (example after the list):

- New `compute_distances()` function precomputes and caches distance matrices:
  - Compute distances once, reuse across multiple matching operations
  - Store complete metadata: variables, distance metric, scaling method, timestamps
  - Preserve original datasets for seamless integration with `join_matched()`
  - Enable rapid exploration of different matching parameters
  - Performance improvement: ~60% faster when trying multiple matching strategies
- Distance objects (S3 class `distance_object`):
  - Self-contained: cost matrix, IDs, metadata, original data
  - Work with both `match_couples()` and `greedy_couples()`
  - Pass as the first argument instead of datasets: `match_couples(dist_obj, max_distance = 5)`
  - Informative print and summary methods with distance statistics
- Constraint modification via `update_constraints()`:
  - Apply a new `max_distance` or calipers without recomputing distances
  - Creates a new distance object following copy-on-modify semantics
  - Experiment with different constraints efficiently
- Backward compatible integration:
  - Modified function signatures: `match_couples(left, right = NULL, vars = NULL, ...)`
  - Automatically detects distance objects vs. datasets
  - All existing code continues to work unchanged
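A sketch of the caching workflow, reusing the hypothetical `treated` / `control` data frames from the preprocessing example. The `match_couples(dist_obj, max_distance = 5)` call and the `update_constraints()` / `is_distance_object()` names come from this changelog; the argument names of `compute_distances()` beyond the two datasets and `vars` are assumptions.

```r
# Precompute distances once (argument names partly assumed)
d <- compute_distances(treated, control, vars = c("age", "income", "score"))

# Reuse the cached distances across several matching strategies
m1 <- match_couples(d, max_distance = 5)
m2 <- greedy_couples(d)

# Tighten constraints without recomputing distances
d_strict <- update_constraints(d, max_distance = 2)
m3 <- match_couples(d_strict)

is_distance_object(d)   # TRUE
```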
### Parallel Processing

Speed up blocked matching with multi-core processing (example after the list):

- New `parallel` parameter in `match_couples()` and `greedy_couples()`:
  - Enable with `parallel = TRUE` for automatic configuration
  - Specify a plan with `parallel = "multisession"` or another future plan
  - Works with any number of blocks - automatically determines whether parallelism is beneficial
  - Gracefully falls back if the future packages are not installed
- Powered by the future package:
  - Cross-platform support (Windows, Unix/Mac, clusters)
  - Respects user-configured parallel backends
  - Automatic worker management
  - Clean restoration of the original plan after execution
- Performance:
  - Best for 10+ blocks with 50+ units per block
  - Speedup scales with the number of cores and problem complexity
  - Minimal overhead for small problems
- Integration:
  - Works with all blocking methods (exact, fuzzy, clustering)
  - Compatible with precomputed distance objects from `compute_distances()`
  - Supports all matching parameters (constraints, calipers, scaling)
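A sketch of enabling parallel matching, again using the hypothetical example data. The `parallel = TRUE` and `parallel = "multisession"` values come from this changelog; setting a future plan manually is optional, and any blocking setup is omitted here.

```r
library(future)
plan(multisession, workers = 4)   # optional: supply your own future plan

# couplr decides whether parallel execution across blocks is worthwhile
res_par <- match_couples(treated, control,
                         vars = c("age", "income", "score"),
                         parallel = TRUE)

plan(sequential)                  # return to sequential execution when done
```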
### Fun Error Messages and Cost Checking

Like testthat, couplr makes errors light, memorable, and helpful with couple-themed messages (example after the list):

- New `check_costs` parameter (default: `TRUE`) in `match_couples()` and `greedy_couples()`:
  - Automatically checks distance distributions before matching
  - Provides friendly, actionable warnings for common problems
  - Set to `FALSE` to skip checks in production code
- Fun couple-themed error messages throughout the package:
  - 💔 “No matches made - can’t couple without candidates!”
  - 🔍 “Your constraints are too strict. Love can’t bloom in a vacuum!”
  - ✨ Helpful suggestions: “Try increasing max_distance or relaxing calipers”
  - 💖 Success messages: “Excellent balance! These couples are well-matched!”
- Automatic problem detection:
  - Too many zeros: warns about duplicates or identical values (>10% zero distances)
  - Extreme costs: detects skewed distributions (99th percentile > 10x the 95th)
  - Many forbidden pairs: warns when constraints eliminate >50% of valid pairs
  - Constant distances: alerts when all distances are identical
  - Constant variables: detects and excludes variables with no variation
- New diagnostic function `diagnose_distance_matrix()`:
  - Comprehensive analysis of cost distributions
  - Variable-specific problem detection
  - Actionable suggestions for fixes
  - Quality rating (good/fair/poor)
- Emoji control: disable with `options(couplr.emoji = FALSE)` if preferred
- Philosophy: errors should be less intimidating, more memorable, and provide clear guidance
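A sketch of the cost-checking controls, reusing `treated`, `control`, and the distance object `d` from the earlier examples. The `check_costs` parameter, the `couplr.emoji` option, and `diagnose_distance_matrix()` come from this changelog; whether the diagnostic function accepts a distance object (rather than a raw cost matrix) is an assumption.

```r
options(couplr.emoji = FALSE)    # plain-text messages, if preferred

# Skip the pre-matching cost checks in production code
res_fast <- match_couples(treated, control,
                          vars = c("age", "income", "score"),
                          check_costs = FALSE)

# Run the full distance diagnostics on demand (input type assumed)
diag <- diagnose_distance_matrix(d)
print(diag)                      # quality rating plus actionable suggestions
```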
### New Functions

- `preprocess_matching_vars()` - Main preprocessing orchestrator
- `balance_diagnostics()` - Comprehensive balance assessment
- `balance_table()` - Formatted balance tables for reporting
- `join_matched()` - Create analysis-ready datasets from matching results
- `augment.matching_result()` - Broom-style interface for joined data
- `compute_distances()` - Precompute and cache distance matrices
- `update_constraints()` - Modify constraints on distance objects
- `is_distance_object()` - Type checking for distance objects
- `diagnose_distance_matrix()` - Comprehensive distance diagnostics
- `check_cost_distribution()` - Check for distribution problems
- Added robust scaling method using median and MAD
### Documentation & Examples

- `examples/auto_scale_demo.R` - 5 preprocessing demonstrations
- `examples/balance_diagnostics_demo.R` - 6 balance diagnostic examples
- `examples/join_matched_demo.R` - 8 joined dataset demonstrations
- `examples/distance_cache_demo.R` - Distance caching and reuse examples
- `examples/parallel_matching_demo.R` - 7 parallel processing examples
- `examples/error_messages_demo.R` - 10 fun error message demonstrations
- Complete implementation documentation (claude/IMPLEMENTATION_STEP1.md through STEP6.md)
- All functions have full Roxygen documentation
### Tests

- Added 34+ new tests (10 for preprocessing, 11 for balance diagnostics, 13 for joined datasets, plus tests for distance caching)
- All tests passing with full backward compatibility
## Major Changes (Initial 1.0.0 Release)

### Package Renamed: lapr → couplr

The package has been renamed from lapr to couplr to better reflect its purpose as a general pairing and matching toolkit.

couplr = Optimal pairing and matching via linear assignment

### Clean 1.0.0 Release

First official stable release with a clean, well-organized codebase.
### New Organization

#### R Code

- Eliminated 3 redundant files
- Consistent `morph_*` naming prefix
- Two-layer API: `assignment()` (low-level) + `lap_solve()` (tidy)
- 10 well-organized files (down from 13)

#### C++ Code

- Modular subdirectory structure:
  - `src/core/` - Utilities and headers
  - `src/interface/` - Rcpp exports
  - `src/solvers/` - 14 LAP algorithms
  - `src/gabow_tarjan/` - Gabow-Tarjan solver
  - `src/morph/` - Image morphing
### Features

#### Solvers

Hungarian, Jonker-Volgenant, Auction (3 variants), SAP/SSP, SSAP-Bucket, Cost-scaling, Cycle-cancel, Gabow-Tarjan, Hopcroft-Karp, Line-metric, Brute-force, Auto-select

#### High-Level

- ✅ Tidy tibble interface
- ✅ Matrix & data frame inputs
- ✅ Grouped data frames
- ✅ Batch solving + parallelization
- ✅ K-best solutions (Murty, Lawler)
- ✅ Rectangular matrices
- ✅ Forbidden assignments (NA/Inf)
- ✅ Maximize/minimize
- ✅ Pixel morphing visualization
#### API

- `lap_solve()` - Main tidy interface (see the sketch below)
- `lap_solve_batch()` - Batch solving
- `lap_solve_kbest()` - K-best solutions
- `assignment()` - Low-level solver
- Utilities: `get_total_cost()`, `as_assignment_matrix()`, etc.
- Visualization: `pixel_morph()`, `pixel_morph_animate()`
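A minimal sketch of the tidy interface. Matrix input, NA/Inf marking forbidden assignments, and the function names come from the feature list above; the shape of the returned tibble and the exact arguments of `get_total_cost()` are assumptions.

```r
library(couplr)

cost <- matrix(c(4, 2, 8,
                 4, 3, 7,
                 3, NA, 5),   # NA marks a forbidden assignment
               nrow = 3, byrow = TRUE)

sol <- lap_solve(cost)        # tidy tibble of optimal row/column pairs
get_total_cost(sol)           # total cost of the returned assignment
```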
Development history under “lapr” is available in the git log before v1.0.0.