Differential Item Functioning (DIF) occurs when test-takers from different groups who have the same underlying ability have different probabilities of answering a test item correctly. DIF threatens the validity of comparisons across groups and is a central concern in fair assessment.
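To make the definition concrete, the sketch below uses a Rasch-type response function. This is an assumption made for illustration and not necessarily the model iDIFr fits; the names `p_correct`, `theta`, `b`, and `dif_shift` are hypothetical. Two test-takers with the same ability get different success probabilities purely because of group membership:

```r
# Probability of a correct response under a Rasch-type model (illustrative assumption).
p_correct <- function(theta, b, dif_shift = 0) {
  plogis(theta - (b + dif_shift))
}

theta <- 0.5  # same ability for both test-takers
b     <- 0.0  # item difficulty in the reference group

p_ref   <- p_correct(theta, b)                   # reference group
p_focal <- p_correct(theta, b, dif_shift = 0.9)  # item is 0.9 logits harder for the focal group

round(c(reference = p_ref, focal = p_focal), 3)
```

Because `theta` is identical in both calls, any gap between the two probabilities comes from the group-specific shift in difficulty, which is what a DIF analysis tries to detect.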
iDIFr makes DIF analysis accessible, with particular
support for intersectional group designs — where groups
are defined by combinations of demographic variables, such as gender ×
nationality × age band.
We’ll use a small synthetic dataset with known DIF to illustrate the workflow.
Your data should be a data frame in which item responses are scored 0 and 1, with each grouping variable in its own column.
set.seed(42)
dat <- simulate_dif(
n_persons = 600,
n_items = 20,
n_groups = 2,
dif_items = c(3, 7, 12), # these items have DIF
dif_effect = 0.9,
dif_type = "uniform"
)
head(dat[c(1:5, 21)]) # first 5 items + group column
#> item_1 item_2 item_3 item_4 item_5 group
#> 1 1 0 0 0 0 G1
#> 2 0 1 1 0 1 G1
#> 3 1 0 0 0 0 G1
#> 4 1 1 0 1 0 G1
#> 5 1 1 0 1 0 G2
#> 6 1 0 0 0 1 G2
Before running the analysis, use check_groups() to
inspect group cell sizes. This is especially important in intersectional
designs, where small cells can reduce statistical power.
For an intersectional design with multiple demographic variables:
# Add nationality and age band variables for illustration
dat$nationality <- sample(c("UK", "DE", "FR"), 600, replace = TRUE)
dat$age_band <- sample(c("18-30", "31-45", "46+"), 600, replace = TRUE)
check_groups(dat, group = ~ group * nationality * age_band)
If any cells are too small, check_groups() will tell you
and point you to merge_groups().
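For intuition about what a cell-size check involves, here is a minimal base R sketch. It does not use or reproduce check_groups() or merge_groups(); the data frame `demo`, the cut-off `min_cell`, and the idea of a fixed cut-off are all assumptions made for illustration.

```r
set.seed(1)
n <- 600
demo <- data.frame(
  group       = sample(c("G1", "G2"), n, replace = TRUE),
  nationality = sample(c("UK", "DE", "FR"), n, replace = TRUE),
  age_band    = sample(c("18-30", "31-45", "46+"), n, replace = TRUE)
)

# Cross the three variables into one intersectional factor and count each cell.
cell <- interaction(demo$group, demo$nationality, demo$age_band, sep = " x ")
cell_sizes <- table(cell)

min_cell <- 40  # hypothetical cut-off, not an iDIFr default
small_cells <- names(cell_sizes)[cell_sizes < min_cell]

# Collapse two age bands to enlarge the cells (base R level merging).
demo$age_merged <- factor(demo$age_band)
levels(demo$age_merged) <- list("18-45" = c("18-30", "31-45"), "46+" = "46+")
merged_sizes <- table(
  interaction(demo$group, demo$nationality, demo$age_merged, sep = " x ")
)

length(cell_sizes)    # 2 * 3 * 3 = 18 cells
length(merged_sizes)  # 2 * 3 * 2 = 12 cells
```

Crossing variables multiplies the number of cells, so the same 600 respondents are spread more thinly with every variable added. Merging levels trades some demographic resolution for larger cells.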
Supply your data, the item columns, a group formula, and which method(s) to use.
The method argument is required; you must choose explicitly. Options are:
| Method | What it does |
|---|---|
| "LR" | Logistic Regression: flexible, non-IRT, effect size via Nagelkerke ΔR² |
| "LRT" | IRT Likelihood Ratio Test: model-based, effect size via standardised chi |
| "MOB" | Model-based recursive partitioning: non-parametric, detects intersectional instability |
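As a rough illustration of the idea behind the logistic regression approach, the base R sketch below compares a model containing only a matching variable with one that adds group membership, and reports the change in Nagelkerke R². It is not iDIFr's implementation: the simulated data, the use of the total score as the matching variable, and the helper `nagelkerke_r2()` are all assumptions.

```r
set.seed(42)
n     <- 600
group <- rep(c("G1", "G2"), each = n / 2)
theta <- rnorm(n)                         # ability, same distribution in both groups
b     <- seq(-1.5, 1.5, length.out = 10)  # item difficulties

# Simulate 10 Rasch-type items; item 3 is 0.9 logits harder for G2 (uniform DIF).
resp <- sapply(seq_along(b), function(j) {
  shift <- if (j == 3) 0.9 * (group == "G2") else 0
  rbinom(n, 1, plogis(theta - b[j] - shift))
})
total <- rowSums(resp)  # matching variable (assumption: total score)

# Nagelkerke R^2: Cox-Snell R^2 rescaled by its maximum attainable value.
nagelkerke_r2 <- function(fit, null_fit) {
  n_obs <- length(fit$y)
  ll1   <- as.numeric(logLik(fit))
  ll0   <- as.numeric(logLik(null_fit))
  cox_snell <- 1 - exp(2 * (ll0 - ll1) / n_obs)
  cox_snell / (1 - exp(2 * ll0 / n_obs))
}

y  <- resp[, 3]
m0 <- glm(y ~ 1, family = binomial)              # intercept only
m1 <- glm(y ~ total, family = binomial)          # matching variable only
m2 <- glm(y ~ total + group, family = binomial)  # adds a uniform DIF term

delta_r2 <- nagelkerke_r2(m2, m0) - nagelkerke_r2(m1, m0)
lr_test  <- anova(m1, m2, test = "Chisq")  # likelihood ratio test for the group term

delta_r2
lr_test
```

The p-value from the likelihood ratio test says whether the group term improves fit at all; `delta_r2` says by how much, which is why the two are usually read together.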
The key feature of iDIFr is first-class support for
intersectional group structures. Where conventional DIF analysis
examines one demographic variable at a time, intersectional analysis
asks: does DIF appear at the combination of gender × nationality ×
age, even when no individual variable shows DIF?
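A small deterministic example shows how this can happen. The sketch below is a toy construction, not iDIFr output: the four cells, their equal sizes, the fixed ability `theta`, and the 0.9-logit shifts are all assumed values chosen so that the cell-level differences cancel out in each marginal comparison.

```r
# Four equally sized cells from two crossed variables (hypothetical labels).
cells <- expand.grid(
  gender      = c("F", "M"),
  nationality = c("UK", "DE"),
  stringsAsFactors = FALSE
)

theta <- 0  # same ability in every cell
b     <- 0  # baseline item difficulty

# The item is easier in F x UK and M x DE, and harder in the other two cells.
same_side <- (cells$gender == "F") == (cells$nationality == "UK")
cells$shift     <- ifelse(same_side, -0.9, 0.9)
cells$p_correct <- plogis(theta - (b + cells$shift))

by_gender      <- tapply(cells$p_correct, cells$gender, mean)
by_nationality <- tapply(cells$p_correct, cells$nationality, mean)

cells           # probabilities differ across the four cells
by_gender       # identical for F and M
by_nationality  # identical for UK and DE
```

Looking at gender alone or nationality alone, the item appears to behave the same for everyone; only the crossed design reveals that it does not.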
result_intersectional <- idifr(
data = dat,
items = 1:20,
group = ~ group * nationality * age_band, # crossing all three variables
method = c("LR", "LRT")
)
print(result_intersectional)
Intersectional designs often produce small cells. iDIFr
will warn you but will still run the analysis. To merge sparse cells:
grp <- check_groups(dat, group = ~ group * nationality * age_band)
merged_dat <- merge_groups(
grp,
age_band = list("18-45" = c("18-30", "31-45")) # combine two age bands
)
# Re-run with merged groups
result_merged <- idifr(merged_dat, 1:20,
group = ~ group * nationality * age_band,
method = c("LR", "LRT"))
iDIFr leads with effect sizes, not just p-values.
Flagging criteria require both a significant p-value (after
adjustment) and a meaningful effect size.
| Method | Effect size | Classification |
|---|---|---|
| LR | Nagelkerke ΔR² | A: <.035 · B: .035–.070 · C: ≥.070 |
| LRT | Std. chi | Negligible: <.20 · Moderate: .20–.50 · Large: ≥.50 |
| MOB | Std. score difference | Negligible: <.20 · Moderate: .20–.50 · Large: ≥.50 |
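The two-part flagging rule can be sketched in base R as follows. The per-item values are invented for illustration, the Benjamini-Hochberg adjustment is an assumption (the adjustment iDIFr applies is not stated here), and treating class B or above as "meaningful" is likewise an assumption.

```r
# Hypothetical per-item results: raw p-values and Nagelkerke delta R^2 values.
res <- data.frame(
  item     = paste0("item_", 1:5),
  p_value  = c(0.001, 0.030, 0.200, 0.004, 0.0005),
  delta_r2 = c(0.080, 0.010, 0.050, 0.040, 0.020)
)

alpha <- 0.05
res$p_adj <- p.adjust(res$p_value, method = "BH")  # adjustment method is an assumption

# A/B/C classes using the LR cut-offs from the table above.
res$class <- cut(res$delta_r2,
                 breaks = c(-Inf, 0.035, 0.070, Inf),
                 labels = c("A", "B", "C"),
                 right  = FALSE)

# Flag only when the adjusted p-value is significant AND the effect is at least class B.
res$flagged <- res$p_adj < alpha & res$delta_r2 >= 0.035
res
```

In this toy table, item_2 and item_5 are statistically significant but have negligible effects, and item_3 has a moderate effect but is not significant, so none of the three is flagged.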
Set ica = TRUE in idifr() to go one step
further than a single intersectional analysis. It runs one analysis per
demographic variable (single-variable) and one intersectional
analysis, then classifies each item by comparing where it was
flagged.
ica_res <- idifr(
data = dat,
items = 1:20,
group = ~ group * nationality * age_band,
method = "LR",
ica = TRUE
)
print(ica_res) # includes the ICA classification section
tidy(ica_res, table = "ica") # flat ICA classification table
Four item classifications are possible:
| Classification | Meaning |
|---|---|
| amplified | Flagged in single-variable and intersectional runs |
| pure_intersection | Only flagged in the intersectional run |
| obscured | Flagged in a single-variable run but not intersectionally |
| none | Not flagged anywhere |
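The decision logic behind these four labels can be written as a small function. This is a sketch of the table above, not iDIFr's code: `classify_ica()` and the flag columns are hypothetical, and it assumes an item counts as flagged in the single-variable stage if any one of the single-variable runs flagged it.

```r
# Map two logical flags per item onto the four ICA labels.
classify_ica <- function(flag_single, flag_intersectional) {
  ifelse(flag_single & flag_intersectional, "amplified",
  ifelse(!flag_single & flag_intersectional, "pure_intersection",
  ifelse(flag_single & !flag_intersectional, "obscured", "none")))
}

# Hypothetical flags for four items.
flags <- data.frame(
  item                = paste0("item_", 1:4),
  flag_single         = c(TRUE, FALSE, TRUE, FALSE),  # flagged in any single-variable run
  flag_intersectional = c(TRUE, TRUE, FALSE, FALSE)   # flagged in the intersectional run
)
flags$ica <- classify_ica(flags$flag_single, flags$flag_intersectional)
flags
```

Each item lands in exactly one category because the two flags have only four possible combinations.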
Note: ICA runs multiple analyses without cross-analysis p-value correction. The
effect_threshold argument (default 0.035) provides a de facto stricter criterion. Interpret pure_intersection and obscured findings with caution in small samples.