Getting started with marimekko

What is a marimekko plot?

A marimekko (or mosaic) plot is a two-dimensional visualization of a contingency table. Each column represents a category of one variable, and the segments within each column represent categories of a second variable:

  - Column widths are proportional to the marginal counts of the x variable.
  - Segment heights within each column are proportional to the conditional counts of the fill variable given x.

The marimekko package provides this as a native ggplot2 layer, so you can combine it with any other ggplot2 functionality (facets, themes, annotations, etc.).
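The two proportions described above can be computed by hand in base R, which helps when checking a plot against the underlying table. This is only an illustrative sketch using R's built-in Titanic table; it does not call the marimekko package, and the variable names are chosen here for the example.

```r
# Build the Class x Survived contingency table from the built-in Titanic data.
titanic <- as.data.frame(Titanic)
tab <- xtabs(Freq ~ Class + Survived, data = titanic)

# Column widths: marginal proportions of the x variable (Class).
col_widths <- prop.table(margin.table(tab, margin = 1))

# Segment heights: conditional proportions of Survived within each Class.
seg_heights <- prop.table(tab, margin = 1)

print(round(col_widths, 3))
print(round(seg_heights, 3))
```

The widths sum to 1 across columns, and the heights sum to 1 within each column, which is why every column in a mosaic fills the full vertical extent.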

Installation

# From CRAN
install.packages("marimekko")

# From GitHub (when published)
devtools::install_github("gogonzo/marimekko")

Your first marimekko plot

The built-in Titanic dataset records survival counts by class, sex, and age. Let’s visualize survival by passenger class.

library(ggplot2)
library(marimekko)

titanic <- as.data.frame(Titanic)

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  labs(title = "Titanic survival by class")

Two components are at work:

  1. geom_marimekko() computes tile positions from your data. The formula defines the variables (columns and segments), fill defines the segment colours, and weight provides the counts. Axis labels are automatically added.
  2. Standard ggplot2 functions (labs(), theme(), etc.) work as usual.

Aesthetics

geom_marimekko() understands these aesthetics and parameters:

Parameter / Aesthetic   Required   Description
formula                 yes        Formula specifying variables, e.g. ~ X | Y
fill                    no         Categorical variable for segment colours (defaults to the last formula variable)
weight                  no         Numeric weight/count (default 1)

If your data already has one row per observation (no aggregation needed), omit weight:

ggplot(mtcars) +
  geom_marimekko(aes(fill = factor(gear)),
    formula = ~ cyl | gear
  )
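To see why weight can be omitted for row-per-observation data, it may help to compare the two data layouts in base R. The sketch below does not use marimekko. It only shows that counting raw rows and summing a Freq column over pre-aggregated rows give the same contingency table.

```r
# One row per observation: each row contributes a count of 1.
raw_counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Pre-aggregated form: one row per cyl/gear combination plus a Freq column.
agg <- as.data.frame(raw_counts)

# Rebuilding the table from the aggregated rows, weighted by Freq,
# recovers the same counts.
from_agg <- xtabs(Freq ~ cyl + gear, data = agg)

print(raw_counts)
print(from_agg)
```

The aggregated layout is the one that needs a weight, as with Freq in the Titanic examples.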

Gap control

The gap parameter sets the spacing between tiles as a fraction of the plot area. The default is 0.01.

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap = 0.03
  ) +
  labs(title = "Wider gaps (gap = 0.03)")

Set gap = 0 for a seamless mosaic:

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap = 0
  ) +
  labs(title = "No gaps")
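One way to picture what a gap does to the layout is to compute column extents on a unit-width axis. The helper below, column_extents(), is hypothetical and is not part of marimekko. It shows one plausible scheme, in which the gaps are reserved first and the columns are rescaled into the remaining width. The package's actual layout may differ.

```r
# Hypothetical helper (not part of marimekko): horizontal extents of columns
# when a fixed gap is left between neighbours on a unit-width axis.
column_extents <- function(widths, gap = 0.01) {
  k <- length(widths)
  usable <- 1 - gap * (k - 1)          # width left after reserving the gaps
  w <- widths / sum(widths) * usable   # rescale columns into the usable width
  xmax <- cumsum(w) + gap * (seq_len(k) - 1)
  data.frame(xmin = xmax - w, xmax = xmax)
}

# Wider gaps shrink every column slightly; gap = 0 makes columns touch.
print(column_extents(c(0.15, 0.13, 0.32, 0.40), gap = 0.03))
print(column_extents(c(0.15, 0.13, 0.32, 0.40), gap = 0))
```

Under this assumed scheme the relative column widths are preserved for any gap, so the visual comparison between categories stays valid.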

Marginal percentages

geom_marimekko() can append marginal percentages to the x-axis labels via the show_percentages parameter:

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived,
    show_percentages = TRUE
  )
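The marginal percentages themselves are straightforward to reproduce in base R, which is useful for checking the axis labels. The sketch below does not call marimekko, and the label format it builds is an assumption for illustration, not necessarily the format the package prints.

```r
titanic <- as.data.frame(Titanic)

# Total passengers per class, then each class's share of all passengers.
class_totals <- xtabs(Freq ~ Class, data = titanic)
class_pct <- round(100 * prop.table(class_totals), 1)

# Assumed label format, for illustration only.
labels <- paste0(names(class_pct), " (", class_pct, "%)")
print(labels)
```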

Adding text labels

Use geom_marimekko_text() (or geom_marimekko_label() for a boxed version) to place labels at tile centers. Tile positions are read automatically from the preceding geom_marimekko() layer, so only the label aesthetic is needed. Computed variables such as weight and cond_prop are referenced via after_stat():

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  geom_marimekko_text(aes(label = after_stat(weight)), colour = "white") +
  labs(title = "Counts inside tiles")

Percentage labels:

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  geom_marimekko_text(aes(
    label = after_stat(paste0(round(cond_prop * 100), "%"))
  ), colour = "white", size = 3)
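The percentage labels can be cross-checked against the raw table. The sketch below assumes that cond_prop is the proportion of each fill category within its column, which matches the definition of segment heights given earlier but should be confirmed against the package documentation. It uses base R only.

```r
titanic <- as.data.frame(Titanic)
tab <- xtabs(Freq ~ Class + Survived, data = titanic)

# Assumed meaning of cond_prop: share of each Survived level within a Class.
cond_prop <- prop.table(tab, margin = 1)

# Same formatting expression as in the after_stat() call above.
labels <- paste0(round(cond_prop * 100), "%")
label_matrix <- matrix(labels, nrow = nrow(tab), dimnames = dimnames(tab))
print(label_matrix)
```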

Theming

theme_marimekko() provides a clean, minimal theme that removes distracting x-axis gridlines:

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  theme_marimekko() +
  labs(title = "With theme_marimekko()")

Since it builds on theme_minimal(), you can override any element:

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  theme_marimekko() +
  theme(legend.position = "bottom")
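For readers who want a similar look without the package theme, the description above can be approximated with plain ggplot2. theme_sketch() below is a hypothetical stand-in written for this example. It is not the definition of theme_marimekko(), which may set other elements as well.

```r
library(ggplot2)

# Hypothetical stand-in, not the package's definition of theme_marimekko():
# start from theme_minimal() and blank out the vertical gridlines.
theme_sketch <- function(...) {
  theme_minimal(...) +
    theme(
      panel.grid.major.x = element_blank(),
      panel.grid.minor.x = element_blank()
    )
}

# Later theme() calls still override individual elements.
p <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar() +
  theme_sketch() +
  theme(legend.position = "bottom")
```

Because themes are added left to right, the final theme() call wins for any element it sets, which is the same mechanism that makes the legend override above work.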

Faceting

geom_marimekko() supports ggplot2 faceting. Each panel gets its own independently proportioned mosaic:

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  facet_wrap(~Sex) +
  labs(title = "Survival by class, faceted by sex")
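To see what independent proportioning means in numbers, the column widths can be computed separately for each panel. This base R sketch assumes that each facet's proportions are taken within that facet's subset of the data, as the description above suggests. It does not call marimekko.

```r
titanic <- as.data.frame(Titanic)

# Class totals split by the faceting variable.
by_sex <- xtabs(Freq ~ Class + Sex, data = titanic)

# Column widths within each panel: proportions of Class given Sex.
widths_by_panel <- prop.table(by_sex, margin = 2)
print(round(widths_by_panel, 3))
```

Each column of this table sums to 1, so the same class can occupy a very different width in the two panels. Widths are therefore comparable within a panel but not directly across panels.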

Next steps

See vignette("advanced-features") for spine plots, Pearson residuals, three-variable mosaics, and programmatic data extraction with fortify_marimekko().