Name | diamonds |
---|---|
Number of rows | 53940 |
Number of columns | 10 |
Column type frequency: | |
factor | 3 |
numeric | 7 |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
cut | 0 | 1 | TRUE | 5 | Ide: 21551, Pre: 13791, Ver: 12082, Goo: 4906 |
color | 0 | 1 | TRUE | 7 | G: 11292, E: 9797, F: 9542, H: 8304 |
clarity | 0 | 1 | TRUE | 8 | SI1: 13065, VS2: 12258, SI2: 9194, VS1: 8171 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
carat | 0 | 1 | 0.80 | 0.47 | 0.2 | 0.40 | 0.70 | 1.04 | 5.01 | ▇▂▁▁▁ |
depth | 0 | 1 | 61.75 | 1.43 | 43.0 | 61.00 | 61.80 | 62.50 | 79.00 | ▁▁▇▁▁ |
table | 0 | 1 | 57.46 | 2.23 | 43.0 | 56.00 | 57.00 | 59.00 | 95.00 | ▁▇▁▁▁ |
price | 0 | 1 | 3932.80 | 3989.44 | 326.0 | 950.00 | 2401.00 | 5324.25 | 18823.00 | ▇▂▁▁▁ |
x | 0 | 1 | 5.73 | 1.12 | 0.0 | 4.71 | 5.70 | 6.54 | 10.74 | ▁▁▇▃▁ |
y | 0 | 1 | 5.73 | 1.14 | 0.0 | 4.72 | 5.71 | 6.54 | 58.90 | ▇▁▁▁▁ |
z | 0 | 1 | 3.54 | 0.71 | 0.0 | 2.91 | 3.53 | 4.04 | 31.80 | ▇▁▁▁▁ |
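The summary above has the layout of skimr's skim() output. Here is a minimal sketch for reproducing it. It assumes the skimr package is installed and falls back to base R's summary() if it is not:

```r
library(ggplot2)  # the diamonds data set ships with ggplot2

# skimr is an assumption here; use base summary() when it is not installed
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(diamonds))
} else {
  print(summary(diamonds))
}
```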
library(ggplot2)  # the diamonds data set ships with ggplot2
p1 <- ggplot(data=diamonds, aes(x=cut))
# Old version
# ..count..: special variable computed by the stat that holds the frequency
#p1 <- ggplot(data=diamonds, aes(x=cut, y=..count..))
#p2 <- ggplot(data=diamonds, aes(x=cut, y=..count../sum(..count..)))
# New version: after_stat() refers to variables computed by the stat
p1 <- ggplot(data=diamonds, aes(x=cut, y=after_stat(count))) + geom_bar()             # frequency
p2 <- ggplot(data=diamonds, aes(x=cut, y=after_stat(count/sum(count)))) + geom_bar()  # relative frequency
cut percent
1 Fair 3.0
2 Good 9.0
3 Very Good 22.4
4 Premium 25.6
5 Ideal 40.0
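The printout does not show how cut.percent was created. One possible sketch assumes it comes from the cut counts in diamonds. table() counts each level, and the counts are then turned into percentages rounded to one decimal. Because of rounding, individual entries may differ slightly from the table above.

```r
library(ggplot2)  # for the diamonds data

# Count diamonds in each level of cut (levels come out in factor order)
counts <- table(diamonds$cut)

# Convert the counts to percentages with one decimal place
cut.percent <- data.frame(
  cut     = names(counts),
  percent = round(100 * as.numeric(counts) / sum(counts), 1)
)
print(cut.percent)
```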
p3 <- ggplot(data=cut.percent, aes(x=cut, y=percent))
# Rerun the full ggplot() call after changing the factor levels of cut:
# p3 keeps its own copy of the old data, so reusing p3 keeps the old order
ggplot(data=cut.percent, aes(x=cut, y=percent))+geom_bar(stat="identity")
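Here is a sketch of the factor-level step mentioned in the comment. It assumes cut.percent stores cut as plain text, which ggplot2 plots in alphabetical order. The data frame is re-created from the printed values so the block runs on its own.

```r
library(ggplot2)

# Hypothetical re-creation of cut.percent with cut stored as text
cut.percent <- data.frame(
  cut     = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  percent = c(3.0, 9.0, 22.4, 25.6, 40.0)
)

# Set the level order explicitly instead of the alphabetical default
cut.percent$cut <- factor(cut.percent$cut,
                          levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"))

# Rebuild the plot so it sees the releveled data
p3 <- ggplot(data = cut.percent, aes(x = cut, y = percent)) +
  geom_bar(stat = "identity")
print(p3)
```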
Hint: use coord_flip() to draw the bars horizontally.
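Here is one possible sketch for the hint, again re-creating cut.percent from the printed values. coord_flip() swaps the x and y axes, so the bars run horizontally. As an alternative, geom_col() is an equivalent shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

lv <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
cut.percent <- data.frame(
  cut     = factor(lv, levels = lv),
  percent = c(3.0, 9.0, 22.4, 25.6, 40.0)
)

# Vertical bar chart turned into horizontal bars by swapping the axes
p_flip <- ggplot(data = cut.percent, aes(x = cut, y = percent)) +
  geom_bar(stat = "identity") +
  coord_flip()
print(p_flip)
```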
cut percent prop
1 Fair 3.0 0.030
2 Good 9.0 0.090
3 Very Good 22.4 0.224
4 Premium 25.6 0.256
5 Ideal 40.0 0.400
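In the table above, the prop column is percent divided by 100. Here is a minimal sketch, assuming cut.prop was derived from cut.percent this way:

```r
cut.percent <- data.frame(
  cut     = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  percent = c(3.0, 9.0, 22.4, 25.6, 40.0)
)

# Add a proportion column alongside the percentages
cut.prop <- transform(cut.percent, prop = percent / 100)
print(cut.prop)
```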
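# A single stacked bar: x="" puts every category in the same column
ggplot(data=cut.prop, aes(x="", y=prop, fill=cut))+geom_bar(stat="identity", width=1)
# position = "dodge" places the categories side by side instead of stacking them
ggplot(data=cut.prop, aes(x="", y=prop, fill=cut))+geom_bar(stat="identity", width=1, position = "dodge")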
Pie charts are controversial in statistics: people compare angles and areas less accurately than lengths or positions.
Some extra work is needed to make a pie chart appealing to the human eye.
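A common first step is converting the single stacked bar into polar coordinates. Here is a sketch that assumes cut.prop is as printed above. coord_polar(theta = "y") maps the stacked heights (prop) to angles, and theme_void() removes the axes and grid lines, which carry little meaning in a pie.

```r
library(ggplot2)

lv <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
cut.prop <- data.frame(
  cut  = factor(lv, levels = lv),
  prop = c(0.030, 0.090, 0.224, 0.256, 0.400)
)

# Stacked bar of width 1, then bend the y axis into a circle
pie <- ggplot(data = cut.prop, aes(x = "", y = prop, fill = cut)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  theme_void()  # drop axes, ticks and grid lines
print(pie)
```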
Encoding by colour
Position: stack
b1 <- ggplot(data=diamonds, aes(x=cut, fill=color))
12: R code:___________
Position: dodge
13: R code:___________
Position: fill
14: R code:___________
15: R code:_______________
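The three position options can be sketched on a different pair of variables: clarity on the x-axis, with bars filled by cut. This is an illustrative choice, and b2, p_stack, p_dodge and p_fill are hypothetical names.

```r
library(ggplot2)

b2 <- ggplot(data = diamonds, aes(x = clarity, fill = cut))

p_stack <- b2 + geom_bar(position = "stack")  # default: segments stacked on top of each other
p_dodge <- b2 + geom_bar(position = "dodge")  # segments placed side by side
p_fill  <- b2 + geom_bar(position = "fill")   # stacked, then rescaled so every bar has height 1
print(p_fill)
```

Stacking is the default. Dodging makes counts within a group easier to compare, and fill shows the proportions within each bar.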
Encoding by position
ggplot(data=diamonds, aes(x=color))+geom_bar()+facet_wrap(~cut)
# A tibble: 5 x 2
cut mean_carat
<ord> <dbl>
1 Fair 1.05
2 Good 0.849
3 Very Good 0.806
4 Premium 0.892
5 Ideal 0.703
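The printout above comes from a grouped summary. Here is a sketch that assumes dplyr was used (mean_carat_by_cut is a hypothetical name):

```r
library(ggplot2)  # for the diamonds data
library(dplyr)

# Mean carat for each level of cut
mean_carat_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(mean_carat = mean(carat))
print(mean_carat_by_cut)
```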
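stat_summary
g1 <- ggplot(diamonds, aes(x = cut, y = carat))
Summary functions that can be passed to the fun.data argument:
mean_se: mean and standard error
mean_cl_normal: 95 per cent confidence interval for the mean, assuming normality (requires library(Hmisc))
mean_cl_boot: bootstrap 95 per cent confidence interval for the mean (requires library(Hmisc))
median_hilow: median with a pair of outer quantiles; with fun.args = list(conf.int = 0.5) these are Q1 and Q3 (requires library(Hmisc))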
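Here is a sketch of stat_summary() on g1 using these functions. mean_se works with ggplot2 alone. The Hmisc-based functions are only used when Hmisc is installed. The geom choices (pointrange, errorbar) and the names g_se, g_boot and g_iqr are illustrative assumptions.

```r
library(ggplot2)

g1 <- ggplot(diamonds, aes(x = cut, y = carat))

# Mean carat +/- one standard error for each cut, drawn as point ranges
g_se <- g1 + stat_summary(fun.data = mean_se, geom = "pointrange")
print(g_se)

# These summary functions rely on Hmisc, so only draw them if it is available
if (requireNamespace("Hmisc", quietly = TRUE)) {
  g_boot <- g1 + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2)
  g_iqr  <- g1 + stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5))
  print(g_boot)
  print(g_iqr)
}
```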
Description
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
Name | ToothGrowth |
---|---|
Number of rows | 60 |
Number of columns | 3 |
Column type frequency: | |
factor | 1 |
numeric | 2 |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
supp | 0 | 1 | FALSE | 2 | OJ: 30, VC: 30 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
len | 0 | 1 | 18.81 | 7.65 | 4.2 | 13.07 | 19.25 | 25.27 | 33.9 | ▅▃▅▇▂ |
dose | 0 | 1 | 1.17 | 0.63 | 0.5 | 0.50 | 1.00 | 2.00 | 2.0 | ▇▇▁▁▇ |
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
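The table above matches head(ToothGrowth). The data set ships with base R. Here is a short sketch for inspecting it. It adds an assumed helper column, dose_f, that stores dose as a factor, which is convenient for discrete axes.

```r
# ToothGrowth is part of base R's datasets package
str(ToothGrowth)
print(head(ToothGrowth))

# dose takes only three values, so a factor copy is handy for plotting
tg <- transform(ToothGrowth, dose_f = factor(dose))
print(table(tg$supp, tg$dose_f))  # animals per supplement/dose combination
```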
Use position_dodge(0.1) to avoid overlapping points and error bars in the last category (dose = 2), where the OJ and VC means are almost equal.
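Here is a sketch of the dodged summary plot. It assumes the means and standard errors are computed with stat_summary() rather than taken from a pre-summarised table (pd and p_tg are hypothetical names). The same position object is passed to every layer so that points, lines and error bars shift together.

```r
library(ggplot2)

pd <- position_dodge(0.1)  # shift the OJ and VC summaries apart horizontally

p_tg <- ggplot(ToothGrowth, aes(x = dose, y = len, colour = supp)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1, position = pd) +
  stat_summary(fun = mean, geom = "line", position = pd) +
  stat_summary(fun = mean, geom = "point", position = pd)
print(p_tg)
```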
Not suitable for this example: Why?