class: center, middle, inverse, title-slide

# ASP 460 2.0 Special Topics in Statistics
## Visualizing Distributions
### Thiyanga Talagala
### 2020-05-27

---

# Visualizing a Single Distribution

- Histogram

- Density plot

- Cumulative density

- Quantile-Quantile plot

> Cumulative density and Quantile-Quantile plots are hard to interpret.

---

# Visualizing multiple distributions

.pull-left[

**Visualization of distributions along the X-axis**

- Boxplots

- Violins

- Strip charts

- Sina plots

]

.pull-right[

**Visualization of distributions at the same time**

- Stacked histograms

- Overlapping densities

- Ridgeline plot

]

---

# Histogram - Binwidth

![](lecture8_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

# Histogram - Binwidth (0.1)

**Narrow**

![](lecture8_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

# Histogram - Binwidth (2)

**Wide**

![](lecture8_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

# Add a rug

![](lecture8_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

# Histogram - Example

```r
# One histogram per species, with a rug marking the individual observations
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = .2, fill = "orange", colour = "black") +
  geom_rug() +
  facet_wrap(~ Species)
```

![](lecture8_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---

# Boxplot

**Medium to Large N**

![](lecture8_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

# Boxplot - Example

```r
ggplot(iris, aes(y = Sepal.Length, x = Species)) +
  geom_boxplot()
```

![](lecture8_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

# Add notches

“Notches are used to compare groups; if the notches of two boxes do not overlap, this is strong evidence that the medians differ.” (Chambers et al., 1983, p. 62)

![](lecture8_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

# Boxplot with notch - Example

```r
# Spell out TRUE: T is an ordinary variable and can be reassigned
ggplot(iris, aes(y = Sepal.Length, x = Species)) +
  geom_boxplot(notch = TRUE)
```

![](lecture8_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

Your turn: Perform ANOVA.
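---

# Boxplot with notch - Sketch

A minimal sketch of one way to follow up on the notched boxplot, using base R only. The helper `notch_limits()` is a hypothetical name introduced here. It assumes the notch half-width `1.58 * IQR / sqrt(n)` that the ggplot2 documentation gives for `geom_boxplot()`. The one-way ANOVA with `aov()` is one possible approach to the exercise, not necessarily the intended solution.

```r
# One-way ANOVA: does mean Sepal.Length differ across the three species?
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)

# Hypothetical helper: approximate notch limits around the median,
# assuming the half-width 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width, upper = median(x) + half_width)
}

# One column per species, rows "lower" and "upper"
notches <- sapply(split(iris$Sepal.Length, iris$Species), notch_limits)
notches
```

Columns of `notches` whose intervals do not overlap correspond to boxes whose notches do not overlap in the plot. Note that ANOVA compares means, whereas the notches compare medians.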
---

# Add summary statistics

![](lecture8_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

Green: Mean

---

# Boxplot with summary - Example

```r
# `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
ggplot(iris, aes(y = Sepal.Length, x = Species)) +
  geom_boxplot() +
  stat_summary(fun = mean)
```

![](lecture8_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

---

# Boxplot with summary - Example

Your turn: Add min, max, Q1, Q2, Q3

![](lecture8_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---

# Stripchart

**Small to Medium**

![](lecture8_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

---

# Stripchart - Example

![](lecture8_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

# Boxplot using geom_dotplot

**Small to Medium**

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

Now

]

---

# Boxplot using geom_dotplot - Example

.pull-left[

```r
# Wider bins (binwidth = .1): more points share a bin
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_dotplot(stackdir = "center", binaxis = "y",
               binwidth = .1, binpositions = "all",
               stackratio = 1.5,
               fill = "#7570b3", colour = "#7570b3")
```

![](lecture8_files/figure-html/unnamed-chunk-19-1.png)<!-- -->

]

.pull-right[

```r
# Narrower bins (binwidth = .05): smaller dots, finer detail
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_dotplot(stackdir = "center", binaxis = "y",
               binwidth = .05, binpositions = "all",
               stackratio = 1.5,
               fill = "#7570b3", colour = "#7570b3")
```

![](lecture8_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

]

---

# Bee swarm

![](honeybees.jpg)

---

# Beeswarm

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-21-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-22-1.png)<!-- -->

Now

]

---

# Boxplot with dot points

![](lecture8_files/figure-html/unnamed-chunk-23-1.png)<!-- -->

---

# Boxplot with dot points - Example

```r
# Hide the boxplot's own outlier markers; the dots already show every point
ggplot(iris, aes(y = Sepal.Length, x = Species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_dotplot(binaxis = "y", stackdir = "center",
               fill = "#7570b3", colour = "#7570b3",
               binwidth = .05)
```

![](lecture8_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

---

# Boxplot with dot points

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-26-1.png)<!-- -->

Now with `geom="jitter"`

]

---

# Boxplot with dot points (geom="jitter")

```r
# Jitter horizontally only (height = 0) so the y values stay exact
ggplot(iris, aes(y = Sepal.Length, x = Species)) +
  geom_boxplot(outlier.shape = NA, width = .5) +
  geom_jitter(fill = "#7570b3", colour = "#7570b3",
              position = position_jitter(height = 0, width = .1),
              alpha = .5)
```

![](lecture8_files/figure-html/unnamed-chunk-27-1.png)<!-- -->

---

# Density plots

**Medium to large n**

<!--More recently, as extensive computing power has become available in everyday devices such as laptops and cell phones, we see them increasingly being replaced by density plots.-->

<!--attempt to visualize the underlying probability distribution of the data by drawing an appropriate continuous curve-->

![](lecture8_files/figure-html/unnamed-chunk-28-1.png)<!-- -->

---
background-image: url('kernel1.png')
background-position: center
background-size: contain

---
background-image: url('kernel2.png')
background-position: center
background-size: contain

---

# Density plot

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-29-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-30-1.png)<!-- -->

Now

]

---

# Density plots - Example

.pull-left[

```r
# One panel per species
ggplot(iris, aes(x = Sepal.Length)) +
  geom_density(fill = "#7570b3") +
  facet_wrap(~ Species)
```

![](lecture8_files/figure-html/unnamed-chunk-31-1.png)<!-- -->

Previous

]

.pull-right[

```r
# Overlapping densities in a single panel
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5)
```

![](lecture8_files/figure-html/unnamed-chunk-32-1.png)<!-- -->

Now

]

---

# Density plot and Histogram

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-33-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-34-1.png)<!-- -->

Now

]

---

# Density plot and Histogram - Example

```r
# after_stat(density) replaces ..density.. (deprecated since ggplot2 3.4.0);
# it puts the histogram on the same scale as the density curve
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = .5,
                 colour = "black", fill = "white") +
  geom_density(alpha = .5, fill = "#7570b3") +
  facet_wrap(~ Species)
```

![](lecture8_files/figure-html/unnamed-chunk-35-1.png)<!-- -->

---

# Violin plot

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-36-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-37-1.png)<!-- -->

Now

]

---

# Violin plot - Example

```r
# scale = "count" makes each violin's area proportional to its group size
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin(color = NA, fill = "#7570b3",
              na.rm = TRUE, scale = "count")
```

![](lecture8_files/figure-html/unnamed-chunk-38-1.png)<!-- -->

---

# Violin plot + Boxplot

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-39-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-40-1.png)<!-- -->

Now

]

---

# Violin plot + Boxplot

```r
# Narrow boxplot drawn first, semi-transparent violin layered on top
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.size = 2, colour = "#7570b3", width = .1) +
  geom_violin(alpha = .2, fill = "#7570b3")
```

![](lecture8_files/figure-html/unnamed-chunk-41-1.png)<!-- -->

---

# Ridgeline plots

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-42-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-43-1.png)<!-- -->

Now

]

---

# Ridgeline plots - Example

```r
library(ggridges)

# scale < 1 keeps neighbouring ridges from overlapping
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(scale = 0.9, fill = "#7570b3", alpha = .5)
```

![](lecture8_files/figure-html/unnamed-chunk-44-1.png)<!-- -->

---

# Raincloud plot

.pull-left[

![](lecture8_files/figure-html/unnamed-chunk-45-1.png)<!-- -->

Previous

]

.pull-right[

![](lecture8_files/figure-html/unnamed-chunk-46-1.png)<!-- -->

Now

]

---

# Raincloud plots - Example

```r
library(ggridges)

# position = "raincloud" draws the jittered points below each density
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(scale = 0.9, position = "raincloud",
                      jittered_points = TRUE,
                      fill = "#7570b3", alpha = .5)
```
![](lecture8_files/figure-html/unnamed-chunk-47-1.png)<!-- -->
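
---

# Cumulative density and Quantile-Quantile plot - Sketch

The list of single-distribution plots at the start of the deck names the cumulative density and the Quantile-Quantile plot, but no example of either is shown. The following is a minimal sketch, assuming ggplot2's `stat_ecdf()`, `stat_qq()` and `stat_qq_line()` are suitable for the purpose. The object names `p_ecdf` and `p_qq` are introduced here for illustration only.

```r
library(ggplot2)

# Empirical cumulative distribution of Sepal.Length, one step curve per species
p_ecdf <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
  stat_ecdf(geom = "step") +
  labs(y = "Cumulative proportion")

# Normal quantile-quantile plot with a reference line, one panel per species
p_qq <- ggplot(iris, aes(sample = Sepal.Length)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ Species)

p_ecdf
p_qq
```

In the Q-Q panels, points that follow the reference line suggest that the sample is roughly normally distributed.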