Part 1: A Motivating Example

Walktober

  • Our department had a walkathon in October where we all competed to see how many steps we could walk each day

Data quality angel

Before the competition started, I looked up the most accurate and cost-effective pedometer

Pedometer = bad

  • It turns out, pedometers are wildly inaccurate

Data quality demons

  • It turns out I was the ONLY person concerned about data quality
  • We conducted a survey after Walktober to see if we could quantify the pedometer error
  • Turns out the measurement error was the least of my worries

Survey response

Visualising the data

Our department decided to run a visualisation challenge with the Walktober data. Give up and just use the raw data?

Not an uncommon scenario

Often our data is…

  • Unavailable
    • e.g. anonymised data, measurement error, etc.
  • Non-deterministic
    • e.g. bounded data, estimated values, etc.
  • or Theoretical
    • e.g. estimates based on theory, latent variables, etc.

Part 2: A Teeny Tiny Spatial Example

Choropleth map

Temperatures in Iowa counties

What does transparency look like?

  • Statistical validity should translate to perceptual ease
    • The higher the variance on an estimate, the harder that estimate is to extract from the plot

Choropleth map

The wave should be invisible in the high-variance case

Test this property?

  • Made a fake colour-blindness test using Nick Tierney’s ishihara package

Does it work?

Solution: add an axis for uncertainty

Bivariate Map

Does it work?

Why doesn’t this work?

  • Uncertainty is not just another variable…
    • It presents an interesting perceptual problem
  • Usually do not want variables to interfere with each other
    • In uncertainty visualisation, the opposite is true

Solution: blend the colours together!

Value Suppressing Uncertainty Palette (VSUP) (Correll, 2018)
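
A rough way to see the idea of blending value and uncertainty into one colour is the sketch below. It is a simplification, not the published VSUP construction (which, as far as I understand it, also merges colour bins as uncertainty grows). The function name `suppress_colour()`, the two palette endpoints, and the neutral grey are all made up for illustration, and both inputs are assumed to be rescaled to [0, 1].

```r
# Simplified value suppression: fade the value colour toward a neutral grey
# as uncertainty increases. Illustrative only, not the VSUP algorithm itself.
value_ramp <- colorRamp(c("#FEE8C8", "#E34A33"))  # hypothetical light-to-dark palette

suppress_colour <- function(value, uncertainty, neutral = c(200, 200, 200)) {
  # value and uncertainty are assumed to be rescaled to [0, 1]
  base <- value_ramp(value)  # n x 3 matrix of RGB values in 0-255
  grey <- matrix(neutral, nrow = nrow(base), ncol = 3, byrow = TRUE)
  mixed <- (1 - uncertainty) * base + uncertainty * grey
  rgb(mixed, maxColorValue = 255)
}

certain   <- suppress_colour(c(0.2, 0.8), uncertainty = c(0, 0))
uncertain <- suppress_colour(c(0.2, 0.8), uncertainty = c(1, 1))
```

With no uncertainty the two values get different colours; at full uncertainty both collapse to the same grey, so the value can no longer be read off the plot.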

Does it work?

Solution: simulate a sample

Pixel Map (Lucchesi, 2021)
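
The "simulate a sample" idea can be sketched in base R without any map at all. The regions, estimates, and standard errors below are invented, and a normal sampling distribution is an assumption; a real pixel map would also assign each draw to a pixel inside the region's polygon.

```r
set.seed(2024)

# Hypothetical regional estimates with increasing standard errors
regions <- data.frame(
  id       = c("A", "B", "C"),
  estimate = c(20, 25, 30),
  se       = c(0.5, 3, 8)
)

n_pixels <- 100  # number of pixels drawn per region (illustrative)

# One simulated value per pixel, drawn from that region's sampling distribution
pixels <- do.call(rbind, lapply(seq_len(nrow(regions)), function(i) {
  data.frame(
    id    = regions$id[i],
    pixel = seq_len(n_pixels),
    value = rnorm(n_pixels, mean = regions$estimate[i], sd = regions$se[i])
  )
}))

# The spread of pixel values within a region tracks its standard error
pixel_spread <- tapply(pixels$value, pixels$id, sd)
```

Filling each pixel by `value` means a low-variance region looks like one flat colour, while a high-variance region looks like noise.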

Does it work?

Part 3: The Coding Problem

Code from my mind’s eye

# ggplot
ggplot(map_data) +
  geom_sf(aes(geometry = geometry, fill = deterministic_variable))

# vizumap
ggplot(map_data) +
  geom_sf(aes(geometry = geometry, fill = random_variable))

Vizumap code

Works? Yes. Code from my mind’s eye? No.

# load the package
library(Vizumap)
library(sf)
sf_use_s2(FALSE)

# Step 1: Format data using bespoke data formatting function
data <- read.uv(data = original_data, 
                estimate = "mean", 
                error = "standard_error")

# Step 2: Pixelate the shapefile
pixelation <- pixelate(geoData = geometry_data, 
                       id = "ID", 
                       # improved - set number of pixels
                       pixelSize = 100)


# Step 3: Build pixel map
pixel_map <- build_pmap(data = data, 
                         distribution = "normal", 
                         pixelGeo = pixelation, 
                         id = "ID", 
                         # You can only use a set palette
                         palette = "Oranges",
                         border = geometry_data)

# Step 4: Print pixel map
view(pixel_map)

Input data?

# A tibble: 8 × 3
                steps_dist team    name 
                    <dist> <chr>   <chr>
1 N(23679, 4633687)[0,Inf] iwalk() A    
2 N(18322, 2774223)[0,Inf] iwalk() A    
3   N(24562, 5e+06)[0,Inf] iwalk() A    
4 N(26128, 5642050)[0,Inf] iwalk() A    
5  N(10238, 866202)[0,Inf] iwalk() A    
6 N(16568, 2268638)[0,Inf] iwalk() A    
7 N(12270, 1244340)[0,Inf] iwalk() A    
8 N(17226, 2452356)[0,Inf] iwalk() A    

  • Vectorised random variables with distributional
  • Thank you Mitch, Matt, Alex, and Rob
  • Our Walktober example as truncated normal distributions
  • This is not a talk on distributional
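
For anyone who has not met distributional before, a column like `steps_dist` could be built roughly as follows. The means and standard deviations here are invented for illustration; they are not the Walktober values.

```r
library(distributional)

# Hypothetical daily step counts and pedometer error (illustrative values only)
steps_mean <- c(12000, 8500, 15000)
steps_sd   <- c(1500, 900, 2200)

# One normal distribution per person (dist_normal() takes a standard deviation),
# truncated at zero because step counts cannot be negative
steps_dist <- dist_truncated(dist_normal(steps_mean, steps_sd), lower = 0)

# Draw five simulated step counts from each distribution
draws <- generate(steps_dist, times = 5)
```

`steps_dist` is an ordinary vector, so it can sit in a tibble column next to `team` and `name`, and `generate()` returns one set of draws per element.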

Simple problem

If your problem is simple…

The scale problem

Asking for help

Mania in the GitHub activity

Part 4: ggdibbler

Spatial Plots

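# Packages assumed by the examples in this part
# (theme_map() and theme_few() are assumed to come from ggthemes)
library(ggplot2)
library(ggdibbler)
library(ggthemes)

ggplot(toy_temp_mean) +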
  geom_sf(aes(geometry=county_geometry, fill=temp_mean), linewidth=0) +
  scale_fill_distiller(palette = "OrRd") +
  theme_map() +
  theme(legend.position = "bottom") +
  ggtitle("ggplot2")

ggplot(toy_temp_dist) + 
  geom_sf_sample(aes(geometry = county_geometry, fill=temp_dist), linewidth=0, times=50) + 
  scale_fill_distiller(palette = "OrRd") +
  theme_map() +
  theme(legend.position = "bottom") +
  ggtitle("ggdibbler")

Uncertain EVERYTHING

  • The extension is TRIVIAL

Uncertain data sets!

Every data set used in the ggplot2 documentation has a distributional version in ggdibbler where EVERY variable is a random variable.

  • diamonds = uncertain_diamonds
  • mpg = uncertain_mpg
  • mtcars = uncertain_mtcars
  • faithful = uncertain_faithful
  • economics = uncertain_economics

Any geom!

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  ggtitle("ggplot2") +
  geom_contour() +
  theme(aspect.ratio = 1)

ggplot(uncertain_faithfuld, aes(waiting, eruptions, z = density))+
  ggtitle("ggdibbler") +
  geom_contour_sample(alpha=0.2)+
  theme(aspect.ratio = 1)

Any geom!

ggplot(faithfuld, aes(waiting, eruptions)) + 
  geom_raster(aes(fill = density)) +
  scale_fill_viridis_c() +
  theme(legend.position = "none") +
  ggtitle("ggplot2")

ggplot(uncertain_faithfuld, aes(waiting, eruptions)) + 
  geom_raster_sample(aes(fill = density2)) +
  scale_fill_viridis_c() +
  ggtitle("ggdibbler")+
  theme(legend.position = "none")

Any aesthetic!

ggplot(textdata, aes(x=x, y=y)) +
  geom_text(aes(label = text_const), size=4) +
  theme_few() +
  ggtitle("ggplot2") +
  theme(aspect.ratio = 1, legend.position = "none") 

ggplot(textdata, aes(x=x, y=y, lab = text_dist)) +
  geom_text_sample(aes(label = after_stat(lab)), size=4, alpha=1/30, times=30) +
  ggtitle("ggdibbler") +
  theme_few() +
  theme(aspect.ratio = 1, legend.position = "none") 

Uhh hold on, positions are an issue

Nested positions!

Any Position!

ggplot(mpg, aes(class)) + 
  geom_bar(aes(fill = drv), 
           position = "stack")+
  theme(legend.position="none")+
  ggtitle("ggplot2")

ggplot(uncertain_mpg, aes(class)) + 
  geom_bar_sample(aes(fill = drv),times = 30, alpha=1/30,
                  position = "stack_identity")+
  theme(legend.position="none")+
  ggtitle("ggdibbler")

All the documentation!

Combine all this together to get…

Universal application in ggdibbler

Back to the Walktober example

Part 5: Conclusions

Future Plans

  • Future of the software
    • multivariate distributions
    • build out nested position system
    • expand on the scales to accept more object types

Acknowledgements

  • My supervisors: Di Cook, Susan Vanderplas, and Sarah Goodwin
  • Experiment co-authors: Rachel Rogers and Alison Kleffner
  • Good ideas: Mitch O’Hara-Wild and Cynthia Huang
  • AEMO Zema Energy Scholarship
  • Australian RTP Stipend
  • Numbat Hackathon (for the Walktober data)