This vignette uses data from the Seattle Police Department, the same
data featured in Ridgeway
(2026). The dataset is included with cosmic. Running
the Stan model fit can take several hours, so the vignette renders from
a small precomputed object that stores diagnostics, officer summaries,
and selected posterior draws from the Seattle fit.
The demo data are distributed as a single data frame d
in the package extdata directory.
path <- system.file("extdata", "dataSPD.RData", package = "cosmic")
if (!nzchar(path)) {
path_candidates <- c(
file.path(vignette_dir, "..", "inst", "extdata", "dataSPD.RData"),
file.path("..", "inst", "extdata", "dataSPD.RData"),
file.path("inst", "extdata", "dataSPD.RData")
)
path <- path_candidates[file.exists(path_candidates)][1]
}
if (is.na(path) || !nzchar(path)) {
stop("Could not find 'dataSPD.RData' in the installed package or source tree.")
}
load(path)Each row is an officer-incident record. id identifies
the incident, idOff identifies the officer, and
y is the officer’s highest observed force level on an
ordinal scale from 1 to 4. Incidents with no within-incident variation
in force have already been removed, since they do not contribute
information to the conditional likelihood. The resulting dataset is
already in the format expected by cosmic(): one row per
officer within incident, with the outcome coded as consecutive integers
starting at 1.
seattle_overview <- data.frame(
quantity = c("Officer-incident rows", "Incidents", "Officers", "Force levels"),
value = c(nrow(d),
length(unique(d$id)),
length(unique(d$idOff)),
length(unique(d$y))))
seattle_overview
#> quantity value
#> 1 Officer-incident rows 37016
#> 2 Incidents 4821
#> 3 Officers 1503
#> 4 Force levels 4force_distribution <- data.frame(
force_level = as.integer(names(table(d$y))),
count = as.integer(table(d$y)))
force_distribution
#> force_level count
#> 1 1 28807
#> 2 2 6243
#> 3 3 1885
#> 4 4 81officers_per_incident <- table(table(d$id))
incident_size_distribution <- data.frame(
officers_in_incident = as.integer(names(officers_per_incident)),
number_of_incidents = as.integer(officers_per_incident)
)
incident_size_distribution
#> officers_in_incident number_of_incidents
#> 1 2 190
#> 2 3 406
#> 3 4 581
#> 4 5 630
#> 5 6 523
#> 6 7 427
#> 7 8 378
#> 8 9 308
#> 9 10 258
#> 10 11 244
#> 11 12 183
#> 12 13 174
#> 13 14 127
#> 14 15 108
#> 15 16 99
#> 16 17 64
#> 17 18 70
#> 18 19 51To fit the conditional ordinal stereotype model, use the
cosmic() function. This compiles the Stan program, prepares
the Seattle data, and draws samples from the posterior distribution for
the officer effects \(\boldsymbol\lambda\) and the ordinal score
parameters \(\mathbf{s}\). The
conditional likelihood calculation is parallelized across incidents, so
threads controls within-chain parallelism. In this example
it is usually better to let chains run sequentially
(cores = 1) and use the available CPU budget inside each
chain.
COSMIC fits models through cmdstanr, which is
distributed through the Stan R-universe rather than CRAN. If
cmdstanr or CmdStan is not already installed, run:
After fitting, start by reviewing convergence and Hamiltonian Monte Carlo diagnostics. The most important questions are whether the chains mixed well, whether effective sample sizes are adequate, and whether the Hamiltonian Monte Carlo sampler encountered pathologies such as divergences.
#> COSMIC fit diagnostics: OK
#> Max R-hat: 1.001
#> Min n_eff: 7172.822
#> Divergences: 0
#> Max treedepth saturations: 0
#> Low E-BFMI chains: none
#> No major diagnostic problems detected.
The next step is to summarize each officer relative to their peer
group. The officer-specific parameters \(\lambda_i\) are not individually
identified; only contrasts between officers are meaningful. For that
reason, officer_summary() reports each officer relative to
a set of peers for whom the data are informative enough to estimate
differences in \(\lambda\) with useful
precision.
The prior standard deviation on the \(\lambda_i\) is set with
priorSD_lambda when calling cosmic(), with the
default equal to 2. The prior variance for the difference of \(\lambda_i\) and \(\lambda_j\) is
2*priorSD_lambda^2. Officers \(i\) and \(j\) will be considered peers if their
posterior variance, \(\mathrm{Var}(\lambda_i-\lambda_j\mid\mathrm{data})
< \mathtt{pct\_threshold} \times 2 \times
\mathtt{priorSD\_lambda}^2\). pct_threshold is the
percent reduction in variance in the difference between two officers’
values of \(\lambda\) needed in order
for them to be considered peers, with the default set to 0.25.
officer_summary() will also compute the probability that
the officer is in the top or bottom flag_officer_tail_pct
of the distribution of \(\lambda\) in
their peer group, with the default set to 5%.
flag_officer_tail_pct <- 0.05
off_summary <- officer_summary(fit,
pct_threshold = 0.25,
pct_tail = flag_officer_tail_pct)
head(off_summary)#> idOffOrig idOff nPeers nInc force1 force2 force3 force4 pRankToppct
#> 1 1 1 426 2 1 1 0 0 0.25785
#> 2 2 2 0 1 1 0 0 0 NA
#> 3 3 3 471 2 1 1 0 0 0.16500
#> 4 4 4 74 5 5 0 0 0 0.00005
#> 5 5 5 846 33 33 0 0 0 0.00000
#> 6 6 6 0 2 2 0 0 0 NA
#> pRankBotpct lamMean lam025 lam975
#> 1 0.10820 0.333570433 -2.377860 2.8710224
#> 2 NA NA NA NA
#> 3 0.14960 -0.003180971 -2.683507 2.5165607
#> 4 0.95380 -3.242193464 -6.172846 -0.7727395
#> 5 0.95355 -3.510387239 -5.999311 -1.6184494
#> 6 NA NA NA NA
The resulting table includes the number of peers for each officer,
incident counts, force-category counts, and posterior summaries of each
officer’s latent propensity relative to peers. lamMean
reports the posterior mean for the difference between \(\lambda_i\) and the mean of the \(\lambda\)s in officer \(i\)’s peer group. lam025 and
lam975 give the endpoints of the 95% credible interval.
Some officers have no reported peers because their force counts are too
sparse for precise estimation of their \(\lambda\), or because they are weakly
connected to the part of the officer network where pairwise contrasts
can be estimated well.
The value of \(\lambda\) can be interpreted as a relative risk of using a more serious force level. \[ \frac{P(Y=y_2|\lambda=\lambda_0)}{P(Y=y_1|\lambda=\lambda_0)} = \frac{P(Y=y_2|\lambda=0)}{P(Y=y_1|\lambda=0)} e^{\left(s_{y_2}-s_{y_1}\right)\lambda_0} \] Since \(s_0=0\) and \(s_1=1\), one way of expressing \(\lambda\) is \[ e^{\lambda_0} = \frac{\frac{P(Y=1|\lambda=\lambda_0)}{P(Y=0|\lambda=\lambda_0)}} {\frac{P(Y=1|\lambda=0)}{P(Y=0|\lambda=0)}} \] The relative risk ratio of using Level 1 versus Level 0 force for an officer with \(\lambda=\lambda_0\) compared to the average of their peer group.
To focus attention on officers with strong posterior evidence of
being unusually high or low relative to peers, filter the officer
summary into an outlier report. outlier_report() keeps only
those officers with posterior probability exceeding
prob_outlier of being in either the top or bottom
pct_tail of the peer-group distribution of \(\lambda\).
#> idOffOrig idOff nPeers nInc force1 force2 force3 force4 pRankToppct
#> 1 28 28 1170 37 12 24 1 0 0.99855
#> 2 836 836 1188 75 33 39 2 1 0.99735
#> 3 1450 1450 864 7 0 6 1 0 0.98625
#> 4 1006 1006 1042 9 1 8 0 0 0.98010
#> 5 182 182 1187 79 25 43 11 0 0.97370
#> 6 1341 1341 1153 29 8 12 9 0 0.96065
#> 7 517 517 104 5 5 0 0 0 0.00040
#> 8 1447 1447 165 7 7 0 0 0 0.00035
#> 9 603 603 179 4 4 0 0 0 0.00035
#> 10 428 428 42 5 5 0 0 0 0.00030
#> pRankBotpct lamMean lam025 lam975
#> 1 0.00000 3.001081 2.278690 3.7535805
#> 2 0.00000 2.634561 2.119185 3.1536983
#> 3 0.00000 3.705152 1.734301 6.0632103
#> 4 0.00000 3.316528 1.781626 5.0991616
#> 5 0.00000 2.445398 1.907371 2.9878141
#> 6 0.00000 2.651643 1.778980 3.5635044
#> 7 0.92700 -2.827792 -5.829419 -0.4328217
#> 8 0.93600 -2.937077 -5.929507 -0.5594303
#> 9 0.95665 -3.219114 -6.088955 -0.7492189
#> 10 0.93415 -2.971142 -5.965520 -0.5442428
outliers_display <- outliers
prob_cols <- names(outliers_display) %in% c("pRankToppct", "pRankBotpct")
tail_pct_label <- formatC(100 * flag_officer_tail_pct,
format = "fg", digits = 3)
names(outliers_display)[names(outliers_display) == "pRankToppct"] <-
paste0("Prob. top ", tail_pct_label, "%")
names(outliers_display)[names(outliers_display) == "pRankBotpct"] <-
paste0("Prob. bottom ", tail_pct_label, "%")
outliers_display[prob_cols] <-
lapply(outliers_display[prob_cols],
function(x) sprintf("%.2f", x))
digits <- rep(0, ncol(outliers_display))
digits[seq.int(length(digits)-4,length(digits))] <- 2L
knitr::kable(
outliers_display,
format = "html",
digits = digits,
caption = "Officers with strong posterior evidence of being in the tails of the peer distribution",
align = rep("r", ncol(outliers_display))) |>
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center") |>
kableExtra::row_spec(0, bold = TRUE, color = "white",
background = "#2C3E50") |>
kableExtra::column_spec(1, bold = TRUE) |>
kableExtra::scroll_box(width = "100%", height = "320px")| idOffOrig | idOff | nPeers | nInc | force1 | force2 | force3 | force4 | Prob. top 5% | Prob. bottom 5% | lamMean | lam025 | lam975 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | 28 | 1170 | 37 | 12 | 24 | 1 | 0 | 1.00 | 0.00 | 3.00 | 2.28 | 3.75 |
| 836 | 836 | 1188 | 75 | 33 | 39 | 2 | 1 | 1.00 | 0.00 | 2.63 | 2.12 | 3.15 |
| 1450 | 1450 | 864 | 7 | 0 | 6 | 1 | 0 | 0.99 | 0.00 | 3.71 | 1.73 | 6.06 |
| 1006 | 1006 | 1042 | 9 | 1 | 8 | 0 | 0 | 0.98 | 0.00 | 3.32 | 1.78 | 5.10 |
| 182 | 182 | 1187 | 79 | 25 | 43 | 11 | 0 | 0.97 | 0.00 | 2.45 | 1.91 | 2.99 |
| 1341 | 1341 | 1153 | 29 | 8 | 12 | 9 | 0 | 0.96 | 0.00 | 2.65 | 1.78 | 3.56 |
| 517 | 517 | 104 | 5 | 5 | 0 | 0 | 0 | 0.00 | 0.93 | -2.83 | -5.83 | -0.43 |
| 1447 | 1447 | 165 | 7 | 7 | 0 | 0 | 0 | 0.00 | 0.94 | -2.94 | -5.93 | -0.56 |
| 603 | 603 | 179 | 4 | 4 | 0 | 0 | 0 | 0.00 | 0.96 | -3.22 | -6.09 | -0.75 |
| 428 | 428 | 42 | 5 | 5 | 0 | 0 | 0 | 0.00 | 0.93 | -2.97 | -5.97 | -0.54 |
| 427 | 427 | 156 | 8 | 8 | 0 | 0 | 0 | 0.00 | 0.92 | -2.83 | -5.78 | -0.47 |
| 1027 | 1027 | 51 | 7 | 7 | 0 | 0 | 0 | 0.00 | 0.93 | -2.94 | -5.94 | -0.52 |
| 321 | 321 | 166 | 5 | 5 | 0 | 0 | 0 | 0.00 | 0.93 | -2.90 | -5.84 | -0.51 |
| 372 | 372 | 112 | 15 | 8 | 4 | 3 | 0 | 0.00 | 0.96 | -3.25 | -6.16 | -0.79 |
| 737 | 737 | 322 | 8 | 8 | 0 | 0 | 0 | 0.00 | 0.91 | -2.83 | -5.71 | -0.50 |
| 265 | 265 | 115 | 8 | 8 | 0 | 0 | 0 | 0.00 | 0.93 | -2.85 | -5.86 | -0.50 |
| 191 | 191 | 39 | 13 | 13 | 0 | 0 | 0 | 0.00 | 0.93 | -2.94 | -5.92 | -0.59 |
| 31 | 31 | 289 | 3 | 3 | 0 | 0 | 0 | 0.00 | 0.95 | -3.25 | -6.07 | -0.77 |
| 1201 | 1201 | 270 | 12 | 12 | 0 | 0 | 0 | 0.00 | 0.91 | -2.79 | -5.71 | -0.48 |
| 19 | 19 | 182 | 10 | 10 | 0 | 0 | 0 | 0.00 | 0.92 | -2.76 | -5.73 | -0.43 |
| 326 | 326 | 507 | 11 | 11 | 0 | 0 | 0 | 0.00 | 0.90 | -2.99 | -5.85 | -0.75 |
| 1356 | 1356 | 185 | 10 | 10 | 0 | 0 | 0 | 0.00 | 0.91 | -2.78 | -5.74 | -0.43 |
| 245 | 245 | 95 | 10 | 10 | 0 | 0 | 0 | 0.00 | 0.92 | -2.82 | -5.85 | -0.47 |
| 528 | 528 | 357 | 6 | 6 | 0 | 0 | 0 | 0.00 | 0.92 | -3.06 | -5.84 | -0.67 |
| 435 | 435 | 82 | 14 | 13 | 1 | 0 | 0 | 0.00 | 0.93 | -2.86 | -5.88 | -0.51 |
| 1178 | 1178 | 238 | 9 | 9 | 0 | 0 | 0 | 0.00 | 0.94 | -2.98 | -5.94 | -0.67 |
| 152 | 152 | 336 | 6 | 6 | 0 | 0 | 0 | 0.00 | 0.95 | -3.24 | -6.05 | -0.86 |
| 4 | 4 | 74 | 5 | 5 | 0 | 0 | 0 | 0.00 | 0.95 | -3.24 | -6.17 | -0.77 |
| 112 | 112 | 484 | 5 | 5 | 0 | 0 | 0 | 0.00 | 0.96 | -3.49 | -6.23 | -1.06 |
| 420 | 420 | 694 | 17 | 17 | 0 | 0 | 0 | 0.00 | 0.90 | -3.12 | -5.80 | -1.00 |
| 841 | 841 | 481 | 16 | 16 | 0 | 0 | 0 | 0.00 | 0.90 | -2.95 | -5.79 | -0.69 |
| 328 | 328 | 739 | 32 | 32 | 0 | 0 | 0 | 0.00 | 0.90 | -3.14 | -5.84 | -1.12 |
| 391 | 391 | 713 | 14 | 14 | 0 | 0 | 0 | 0.00 | 0.91 | -3.18 | -5.86 | -1.09 |
| 21 | 21 | 770 | 19 | 19 | 0 | 0 | 0 | 0.00 | 0.92 | -3.24 | -5.86 | -1.23 |
| 100 | 100 | 798 | 35 | 35 | 0 | 0 | 0 | 0.00 | 0.92 | -3.23 | -5.82 | -1.29 |
| 57 | 57 | 314 | 12 | 12 | 0 | 0 | 0 | 0.00 | 0.92 | -2.88 | -5.78 | -0.57 |
| 1407 | 1407 | 349 | 12 | 12 | 0 | 0 | 0 | 0.00 | 0.92 | -2.95 | -5.86 | -0.66 |
| 429 | 429 | 315 | 11 | 11 | 0 | 0 | 0 | 0.00 | 0.93 | -2.95 | -5.89 | -0.61 |
| 94 | 94 | 795 | 21 | 21 | 0 | 0 | 0 | 0.00 | 0.93 | -3.34 | -5.93 | -1.34 |
| 1461 | 1461 | 22 | 10 | 10 | 0 | 0 | 0 | 0.00 | 0.93 | -3.12 | -6.15 | -0.74 |
| 461 | 461 | 658 | 4 | 4 | 0 | 0 | 0 | 0.00 | 0.93 | -3.37 | -6.03 | -1.13 |
| 92 | 92 | 819 | 21 | 21 | 0 | 0 | 0 | 0.00 | 0.94 | -3.36 | -5.90 | -1.39 |
| 527 | 527 | 812 | 28 | 28 | 0 | 0 | 0 | 0.00 | 0.94 | -3.35 | -5.93 | -1.42 |
| 597 | 597 | 677 | 18 | 18 | 0 | 0 | 0 | 0.00 | 0.94 | -3.36 | -6.02 | -1.22 |
| 409 | 409 | 789 | 38 | 38 | 0 | 0 | 0 | 0.00 | 0.94 | -3.37 | -5.96 | -1.39 |
| 632 | 632 | 193 | 6 | 6 | 0 | 0 | 0 | 0.00 | 0.94 | -3.04 | -5.96 | -0.63 |
| 579 | 579 | 843 | 68 | 68 | 0 | 0 | 0 | 0.00 | 0.95 | -3.42 | -5.89 | -1.53 |
| 373 | 373 | 722 | 11 | 11 | 0 | 0 | 0 | 0.00 | 0.95 | -3.53 | -6.15 | -1.39 |
| 75 | 75 | 362 | 6 | 6 | 0 | 0 | 0 | 0.00 | 0.95 | -3.31 | -6.15 | -0.92 |
| 5 | 5 | 846 | 33 | 33 | 0 | 0 | 0 | 0.00 | 0.95 | -3.51 | -6.00 | -1.62 |
| 342 | 342 | 808 | 33 | 33 | 0 | 0 | 0 | 0.00 | 0.95 | -3.47 | -6.03 | -1.51 |
| 237 | 237 | 829 | 27 | 27 | 0 | 0 | 0 | 0.00 | 0.96 | -3.49 | -6.03 | -1.55 |
| 288 | 288 | 834 | 61 | 61 | 0 | 0 | 0 | 0.00 | 0.96 | -3.52 | -6.01 | -1.61 |
| 287 | 287 | 884 | 61 | 60 | 1 | 0 | 0 | 0.00 | 0.96 | -3.63 | -6.03 | -1.78 |
| 387 | 387 | 822 | 33 | 33 | 0 | 0 | 0 | 0.00 | 0.96 | -3.51 | -6.06 | -1.60 |
| 189 | 189 | 848 | 41 | 41 | 0 | 0 | 0 | 0.00 | 0.96 | -3.57 | -6.10 | -1.70 |
| 411 | 411 | 842 | 26 | 26 | 0 | 0 | 0 | 0.00 | 0.96 | -3.58 | -6.08 | -1.65 |
| 271 | 271 | 855 | 49 | 49 | 0 | 0 | 0 | 0.00 | 0.97 | -3.65 | -6.15 | -1.78 |
| 247 | 247 | 850 | 53 | 53 | 0 | 0 | 0 | 0.00 | 0.97 | -3.68 | -6.17 | -1.80 |
| 629 | 629 | 800 | 24 | 24 | 0 | 0 | 0 | 0.00 | 0.97 | -3.71 | -6.21 | -1.70 |
| 399 | 399 | 806 | 37 | 37 | 0 | 0 | 0 | 0.00 | 0.97 | -3.72 | -6.23 | -1.73 |
| 175 | 175 | 749 | 18 | 18 | 0 | 0 | 0 | 0.00 | 0.97 | -3.80 | -6.37 | -1.66 |
| 457 | 457 | 860 | 65 | 64 | 1 | 0 | 0 | 0.00 | 0.98 | -3.78 | -6.26 | -1.90 |
| 636 | 636 | 838 | 39 | 39 | 0 | 0 | 0 | 0.00 | 0.99 | -3.97 | -6.47 | -2.03 |
If you need custom summaries or plots, extract posterior draws
directly. In this model, the score parameters are anchored so that the
baseline level is 0 and the next score is 1, while Stan samples the
positive increments in sDelta. Under the notation used
below, the remaining free score parameters are reconstructed as
\[ s_2 = 1 + \mathtt{sDelta}_1, \qquad s_3 = 1 + \mathtt{sDelta}_1 + \mathtt{sDelta}_2. \]
score_summary <- data.frame(
parameter = c("s[2]", "s[3]"),
mean = c(
mean(1 + sDelta_draws[, 1]),
mean(1 + sDelta_draws[, 1] + sDelta_draws[, 2])
),
lower_95 = c(
unname(stats::quantile(1 + sDelta_draws[, 1], probs = 0.025)),
unname(stats::quantile(1 + sDelta_draws[, 1] + sDelta_draws[, 2],
probs = 0.025))
),
upper_95 = c(
unname(stats::quantile(1 + sDelta_draws[, 1], probs = 0.975)),
unname(stats::quantile(1 + sDelta_draws[, 1] + sDelta_draws[, 2],
probs = 0.975))
)
)The table below summarizes the posterior distribution of the non-fixed score parameters. The expression \(\frac{1}{s_{y_2}-s_{y_1}}\log 2\) gives the amount that \(\lambda\) must increase to double the relative risk of force level \(y_2\) over \(y_1\).
| parameter | mean | lower_95 | upper_95 |
|---|---|---|---|
| s[2] | 1.009 | 1.000 | 1.031 |
| s[3] | 1.086 | 1.007 | 1.267 |
The next plots show the joint and marginal posterior distributions for these score parameters.
ggplot(data.frame(x = sDelta_draws[,1],
y = sDelta_draws[,2]),
aes(x=x, y=y)) +
geom_point(alpha = 0.05, size = 0.5) +
geom_density_2d(color = "red") +
labs(x = expression(s[2] - 1),
y = expression(s[3] - s[2])) +
theme_minimal()hist(1+sDelta_draws[,1]+sDelta_draws[,2],
xlab=expression(s[3]), main="")
abline(v=1+mean(sDelta_draws[,1])+mean(sDelta_draws[,2]))