class: center, middle, inverse, title-slide # Data Visualization ### Dr Thiyanga Talagala ### 2020 - 02 - 25 --- # Plotting with R ## Base R - using `plot()` function ## Using ggplot2: grammar of graphics 1. ggplot2 package: `qplot()` function - **q**plot: **quick** plot - very similar to how you graph with `plot()` function 2. ggplot2 package: `ggplot()` function - fully utilize the power of grammar --- `diamonds` data set ```r library(ggplot2) data(diamonds) head(diamonds) ``` ``` # A tibble: 6 x 10 carat cut color clarity depth table price x y z <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 ``` ```r tail(diamonds) ``` ``` # A tibble: 6 x 10 carat cut color clarity depth table price x y z <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> 1 0.72 Premium D SI1 62.7 59 2757 5.69 5.73 3.58 2 0.72 Ideal D SI1 60.8 57 2757 5.75 5.76 3.5 3 0.72 Good D SI1 63.1 55 2757 5.69 5.75 3.61 4 0.7 Very Good D SI1 62.8 60 2757 5.66 5.68 3.56 5 0.86 Premium H SI2 61 58 2757 6.15 6.12 3.74 6 0.75 Ideal D SI2 62.2 55 2757 5.83 5.87 3.64 ``` --- ### Base R: Bar chart ```r color.summary <- table(diamonds$color); color.summary ``` ``` ## ## D E F G H I J ## 6775 9797 9542 11292 8304 5422 2808 ``` ```r barplot(color.summary, xlab="color", ylab = "counts") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-2-1.png)<!-- --> --- ### Base R: Percentage bar chart ```r color.percent <- table(diamonds$color)/length(diamonds$color)*100; color.percent ``` ``` ## ## D E F G H I J ## 12.560252 18.162773 17.690026 20.934372 15.394883 10.051910 5.205784 ``` ```r barplot(color.percent, xlab="color", ylab = "%") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-3-1.png)<!-- --> --- ### Base R: Percentage bar chart ```r twoway.table <- table(diamonds$color, diamonds$cut); twoway.table ``` ``` ## ## Fair Good Very Good Premium Ideal ## D 163 662 1513 1603 2834 ## E 224 933 2400 2337 3903 ## F 312 909 2164 2331 3826 ## G 314 871 2299 2924 4884 ## H 303 702 1824 2360 3115 ## I 175 522 1204 1428 2093 ## J 119 307 678 808 896 ``` ```r barplot(twoway.table, xlab="color", ylab = "%", legend=rownames(twoway.table)) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-4-1.png)<!-- --> --- ## Base R: Stacked bar chart ```r twoway.table <- table(diamonds$color, diamonds$cut); twoway.table ``` ``` ## ## Fair Good Very Good Premium Ideal ## D 163 662 1513 1603 2834 ## E 224 933 2400 2337 3903 ## F 312 909 2164 2331 3826 ## G 314 871 2299 2924 4884 ## H 303 702 1824 2360 3115 ## I 175 522 1204 1428 2093 ## J 119 307 678 808 896 ``` ```r barplot(twoway.table, xlab="color", ylab = "%", col=c("red", "blue", "green", "orange", "yellow", "black", "pink"), legend=rownames(twoway.table), args.legend = list(x = "topleft"), beside = FALSE) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-5-1.png)<!-- --> --- ## Base R: Cluster bar chart ```r twoway.table <- table(diamonds$color, diamonds$cut); twoway.table ``` ``` ## ## Fair Good Very Good Premium Ideal ## D 163 662 1513 1603 2834 ## E 224 933 2400 2337 3903 ## F 312 909 2164 2331 3826 ## G 314 871 2299 2924 4884 ## H 303 702 1824 2360 3115 ## I 175 522 1204 1428 2093 ## J 119 307 678 808 896 ``` ```r barplot(twoway.table, xlab="color", ylab = "%", col=c("red", "blue", "green", "orange", "yellow", "black", "pink"), legend=rownames(twoway.table), args.legend = list(x = "topleft"), beside = TRUE) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-6-1.png)<!-- --> --- # Base R functions: Histograms ```r hist(diamonds$price) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-7-1.png)<!-- --> --- ## Base R functions: Boxplot ```r boxplot(diamonds$price, diamonds$color) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-8-1.png)<!-- --> --- ```r boxplot(price~color, data=diamonds) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-9-1.png)<!-- --> --- ## Base R plot functions: Scatter plot ```r plot(diamonds$carat, diamonds$price) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-10-1.png)<!-- --> --- ## Base R plot functions: Scatter plot ```r plot(diamonds$carat, diamonds$price, col="red") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-11-1.png)<!-- --> --- ## ggplot2 - R package for producing data visualizations. - The most used package for producing graphics in R. ![](wilkinson.png) --- ## Scatterplot ```r library(ggplot2) data(diamonds) ``` ```r qplot(carat, price, data=diamonds) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-13-1.png)<!-- --> --- ```r qplot(log(carat), log(price), data=diamonds) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-14-1.png)<!-- --> --- ```r qplot(log(carat), log(price), data=diamonds, alpha=0.5) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-15-1.png)<!-- --> --- ## Mapping aesthetic attributes - aesthetic attributes: Visual properties that affect the way the observations are displayed. - shape, colour ```r qplot(carat, price, data = diamonds, colour = color) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-16-1.png)<!-- --> --- ```r qplot(carat, price, data = diamonds, colour = table) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-17-1.png)<!-- --> --- ```r qplot(carat, price, data = diamonds, shape = cut) ``` ``` ## Warning: Using shapes for an ordinal variable is not advised ``` ![](lesson3viz_files/figure-html/unnamed-chunk-18-1.png)<!-- --> --- ```r qplot(carat, price, data = diamonds, size = cut) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-19-1.png)<!-- --> --- ### geom = "point" ```r qplot(carat, price, data = diamonds, geom = "point") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-20-1.png)<!-- --> --- ### geom = "point" ```r qplot(cut, price, data = diamonds, geom = "point") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-21-1.png)<!-- --> --- ### geom = "smooth" ```r qplot(carat, price, data = diamonds, geom = "smooth") ``` ``` ## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")' ``` ![](lesson3viz_files/figure-html/unnamed-chunk-22-1.png)<!-- --> --- ### geom = "point" ```r qplot(carat, price, data = diamonds, geom = c("point", "smooth")) ``` ``` ## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")' ``` ![](lesson3viz_files/figure-html/unnamed-chunk-23-1.png)<!-- --> --- ### geom = "path" ```r qplot(carat, price, data = diamonds, geom = "path") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-24-1.png)<!-- --> --- ### geom = "line" ```r qplot(carat, price, data = diamonds, geom = "line") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-25-1.png)<!-- --> --- ```r qplot(color, price, data = diamonds, geom = "boxplot") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-26-1.png)<!-- --> --- ```r qplot(y=price, data = diamonds, geom = "boxplot") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-27-1.png)<!-- --> --- ```r qplot(color, price, data = diamonds, geom = c("boxplot", "jitter")) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-28-1.png)<!-- --> --- ```r qplot(color, price, data = diamonds, geom = c("boxplot", "jitter"), alpha=0.1) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-29-1.png)<!-- --> --- ```r qplot(carat, data = diamonds, geom = "histogram") ``` ``` ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` ![](lesson3viz_files/figure-html/unnamed-chunk-30-1.png)<!-- --> --- ```r qplot(carat, data = diamonds, geom = "histogram", bandwith=0.1) ``` ``` ## Warning: Ignoring unknown parameters: bandwith ``` ``` ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` ![](lesson3viz_files/figure-html/unnamed-chunk-31-1.png)<!-- --> --- ## Aesthetic mapping ```r qplot(carat, data = diamonds, geom = "histogram", bandwith=0.1, colour=color) ``` ``` ## Warning: Ignoring unknown parameters: bandwith ``` ``` ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` ![](lesson3viz_files/figure-html/unnamed-chunk-32-1.png)<!-- --> --- ## Aesthetic mapping ```r qplot(carat, data = diamonds, geom = "histogram", bandwith=0.1, fill=color) ``` ``` ## Warning: Ignoring unknown parameters: bandwith ``` ``` ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` ![](lesson3viz_files/figure-html/unnamed-chunk-33-1.png)<!-- --> --- ```r qplot(carat, data = diamonds, geom = "density") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-34-1.png)<!-- --> --- ## Bar chart ```r qplot(color, data = diamonds, geom = "bar") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-35-1.png)<!-- --> --- ## Bar chart ```r qplot(color, data = diamonds, geom = "bar") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-36-1.png)<!-- --> --- ```r qplot(color, data=diamonds) ``` ![](lesson3viz_files/figure-html/unnamed-chunk-37-1.png)<!-- --> --- ```r qplot(price, data=diamonds) ``` ``` ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` ![](lesson3viz_files/figure-html/unnamed-chunk-38-1.png)<!-- --> --- ## Faceting ```r qplot(carat, ..density.., data = diamonds, facets = color ~ ., geom = "histogram", binwidth = 0.1, xlim = c(0, 3)) ``` ``` ## Warning: Removed 32 rows containing non-finite values (stat_bin). ``` ``` ## Warning: Removed 14 rows containing missing values (geom_bar). ``` ![](lesson3viz_files/figure-html/unnamed-chunk-39-1.png)<!-- --> --- ## Faceting ```r qplot(carat, ..density.., data = diamonds, facets = . ~ color, geom = "histogram", binwidth = 0.1, xlim = c(0, 3)) ``` ``` ## Warning: Removed 32 rows containing non-finite values (stat_bin). ``` ``` ## Warning: Removed 14 rows containing missing values (geom_bar). ``` ![](lesson3viz_files/figure-html/unnamed-chunk-40-1.png)<!-- --> --- ## Faceting ```r qplot(carat, ..density.., data = diamonds, facets = cut ~ color, geom = "histogram", binwidth = 0.1, xlim = c(0, 3)) ``` ``` ## Warning: Removed 32 rows containing non-finite values (stat_bin). ``` ``` ## Warning: Removed 70 rows containing missing values (geom_bar). ``` ![](lesson3viz_files/figure-html/unnamed-chunk-41-1.png)<!-- --> --- ## Other options ```r qplot( carat, price, data = diamonds, xlab = "Price ($)", ylab = "Weight (carats)", main = "Price-weight relationship") ``` ![](lesson3viz_files/figure-html/unnamed-chunk-42-1.png)<!-- --> --- ```r ls(pattern = '^geom_', env = as.environment('package:ggplot2')) ``` ``` ## [1] "geom_abline" "geom_area" "geom_bar" "geom_bin2d" ## [5] "geom_blank" "geom_boxplot" "geom_col" "geom_contour" ## [9] "geom_count" "geom_crossbar" "geom_curve" "geom_density" ## [13] "geom_density_2d" "geom_density2d" "geom_dotplot" "geom_errorbar" ## [17] "geom_errorbarh" "geom_freqpoly" "geom_hex" "geom_histogram" ## [21] "geom_hline" "geom_jitter" "geom_label" "geom_line" ## [25] "geom_linerange" "geom_map" "geom_path" "geom_point" ## [29] "geom_pointrange" "geom_polygon" "geom_qq" "geom_qq_line" ## [33] "geom_quantile" "geom_raster" "geom_rect" "geom_ribbon" ## [37] "geom_rug" "geom_segment" "geom_sf" "geom_sf_label" ## [41] "geom_sf_text" "geom_smooth" "geom_spoke" "geom_step" ## [45] "geom_text" "geom_tile" "geom_violin" "geom_vline" ``` ---