The Gapminder data set contains life expectancy, GDP per capita, and population by country. It has 1704 rows and 6 variables.
Variable | Description |
---|---|
country | factor with 142 levels |
continent | factor with 5 levels |
year | ranges from 1952 to 2007 in increments of 5 years |
lifeExp | life expectancy at birth, in years |
pop | population |
gdpPercap | GDP per capita (US$, inflation-adjusted) |
Quantitative: gdpPercap, pop, lifeExp
Qualitative: continent, country, year
tabyl(gapminder$continent, sort = TRUE)
gapminder$continent n percent
Africa 624 0.36619718
Americas 300 0.17605634
Asia 396 0.23239437
Europe 360 0.21126761
Oceania 24 0.01408451
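Africa accounts for the largest share of observations (36.6%), followed by Asia (23.2%), Europe (21.1%), and the Americas (17.6%). Africa has more than twice as many observations as the Americas (624 versus 300), but less than twice as many as Asia (396). Because each of the 142 countries is recorded in the same 12 survey years, these shares reflect the number of countries in each continent.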
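The ranking of continents by share can be checked directly from the counts in the table above. This is a minimal base R sketch; the object names `continent_n`, `shares`, and `ratios` are chosen here for illustration and are not part of the original analysis.

```r
# Observation counts per continent, copied from the tabyl output above
continent_n <- c(Africa = 624, Americas = 300, Asia = 396,
                 Europe = 360, Oceania = 24)

# Shares of the total, sorted from largest to smallest
shares <- sort(continent_n / sum(continent_n), decreasing = TRUE)
print(round(shares, 3))

# How many times larger the Africa count is than each continent's count
ratios <- continent_n[["Africa"]] / continent_n
print(round(ratios, 2))
```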
gapminderNEw <- gapminder %>%
  mutate(Life_Expectancy = ifelse(lifeExp > 50, "High", "Low"))
tabyl(gapminderNEw$Life_Expectancy , sort = TRUE)
gapminderNEw$Life_Expectancy n percent
High 1213 0.7118545
Low 491 0.2881455
About 71% of the observations (one row per country and year) have high life expectancy, that is, a life expectancy above 50 years.
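The `ifelse()` rule used above labels a value as "High" only when it is strictly greater than 50, so a life expectancy of exactly 50 falls into "Low". A small sketch with made-up values (the vector `example_lifeExp` is hypothetical and not taken from the data):

```r
# Hypothetical life expectancy values, chosen to include the boundary case
example_lifeExp <- c(43.2, 50, 50.1, 78.4)

# Same rule as in the mutate() call: strictly above 50 is "High"
example_label <- ifelse(example_lifeExp > 50, "High", "Low")
print(data.frame(example_lifeExp, example_label))

# Share of each label, comparable to the percent column of tabyl()
print(prop.table(table(example_label)))
```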
CrossTable(gapminderNEw$Life_Expectancy, gapminder$continent)
Cell Contents
|-------------------------|
| N |
| Chi-square contribution |
| N / Row Total |
| N / Col Total |
| N / Table Total |
|-------------------------|
Total Observations in Table: 1704
| gapminder$continent
gapminderNEw$Life_Expectancy | Africa | Americas | Asia | Europe | Oceania | Row Total |
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
High | 251 | 272 | 308 | 358 | 24 | 1213 |
| 84.028 | 15.994 | 2.418 | 40.385 | 2.799 | |
| 0.207 | 0.224 | 0.254 | 0.295 | 0.020 | 0.712 |
| 0.402 | 0.907 | 0.778 | 0.994 | 1.000 | |
| 0.147 | 0.160 | 0.181 | 0.210 | 0.014 | |
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
Low | 373 | 28 | 88 | 2 | 0 | 491 |
| 207.589 | 39.513 | 5.973 | 99.771 | 6.915 | |
| 0.760 | 0.057 | 0.179 | 0.004 | 0.000 | 0.288 |
| 0.598 | 0.093 | 0.222 | 0.006 | 0.000 | |
| 0.219 | 0.016 | 0.052 | 0.001 | 0.000 | |
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
Column Total | 624 | 300 | 396 | 360 | 24 | 1704 |
| 0.366 | 0.176 | 0.232 | 0.211 | 0.014 | |
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
Africa is the only continent where low life expectancy observations (373) outnumber high ones (251). Europe has the most high life expectancy observations (358, or 29.5% of all high observations) and only 2 low ones (0.4% of all low observations). Oceania has no low life expectancy observations at all.
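The cell counts in the cross table can be re-entered as a matrix to reproduce the column proportions and the chi-square contributions. This is a sketch in base R rather than the `CrossTable()` call used above, and the object names are illustrative. The chi-square test treats the 1704 rows as independent observations, which is an assumption here because each country appears once per survey year.

```r
# Counts copied from the CrossTable output (rows: High, Low)
tab <- matrix(c(251, 272, 308, 358, 24,
                373,  28,  88,   2,  0),
              nrow = 2, byrow = TRUE,
              dimnames = list(
                Life_Expectancy = c("High", "Low"),
                continent = c("Africa", "Americas", "Asia", "Europe", "Oceania")
              ))

# Column proportions, matching the "N / Col Total" rows of the table
col_prop <- prop.table(tab, margin = 2)
print(round(col_prop, 3))

# Chi-square test of independence between life expectancy class and continent
test <- chisq.test(tab)
print(test)

# Per-cell contributions, matching the "Chi-square contribution" rows
contrib <- (tab - test$expected)^2 / test$expected
print(round(contrib, 3))
```

The per-cell contributions sum to the test statistic, so the largest cells (Africa and Europe in the Low row) show where the table departs most from independence.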
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot(outlier.size = 3, colour = "black", width = 0.1) +
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("Life Expectancy") +
  ggtitle("Distribution of life expectancy for each continent")
Africa has the lowest life expectancy and Oceania the highest. There are no outliers in Africa or Oceania. The distributions for Africa and Europe are approximately symmetric.
p1 <- ggplot(gapminder, aes(x = continent, y = gdpPercap)) +
  geom_boxplot(outlier.size = 3, colour = "black", width = 0.1) +
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("GDP per capita") +
  ggtitle("Distribution of GDP per capita for each continent")
p2 <- ggplot(gapminder, aes(x = continent, y = log(gdpPercap))) +
  geom_boxplot(outlier.size = 3, colour = "black", width = 0.1) +
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("log(GDP per capita)") +
  ggtitle("Distribution of log GDP per capita for each continent")
# Side-by-side layout; the | operator for ggplot objects comes from the patchwork package
p1 | p2
In the first graph the distributions are compressed and hard to read, so we look at the graph with the log transformation instead. The Americas, Asia, Europe, and Oceania have approximately symmetric distributions, whereas Africa has a positively skewed distribution. After the log transformation, Asia and Oceania have no outliers.
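Why a log transformation makes a right-skewed variable easier to read can be seen on simulated data. The sketch below draws values from a log-normal distribution, an assumption made purely for illustration (it is not claimed that GDP per capita follows this distribution), and compares mean and median before and after taking logs.

```r
set.seed(1)

# Simulated right-skewed values (hypothetical, not the gapminder data)
x <- rlnorm(1000, meanlog = 8, sdlog = 1)

# On the raw scale the long upper tail pulls the mean well above the median
raw_ratio <- mean(x) / median(x)

# On the log scale the mean and median are close, as in a symmetric distribution
log_difference <- mean(log(x)) - median(log(x))

print(c(raw_ratio = raw_ratio, log_difference = log_difference))
```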
p3 <- ggplot(gapminder, aes(x = continent, y = pop)) +  # plot pop, not gdpPercap
  geom_boxplot(outlier.size = 3, colour = "black", width = 0.1) +
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("Population") +
  ggtitle("Distribution of population for each continent")
p4 <- ggplot(gapminder, aes(x = continent, y = log(pop))) +
  geom_boxplot(outlier.size = 3, colour = "black", width = 0.1) +
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("log(Population)") +
  ggtitle("Distribution of log population for each continent")
p3 | p4
Here too the raw distributions are hard to read, so we apply the log transformation to population. Europe and Oceania both show bimodal distributions, which may reflect an external factor.
#####figure 04
ggpairs(gapminder, mapping = aes(color = continent, alpha = 0.2),
        columns = c("gdpPercap", "pop", "lifeExp")) +
  ggtitle("Scatter plot matrix")
There is a strong positive linear relationship between GDP per capita and life expectancy in Oceania. Overall there is a weak negative linear relationship between population and GDP per capita, but the Americas and Oceania show a moderate positive linear relationship. Population and life expectancy have a weak linear relationship overall, but in Oceania the relationship is moderate.
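An overall correlation can differ in sign from the correlations within groups, which is why it helps to look at each continent separately. A toy sketch in base R with invented numbers (the data frame `toy` and its columns are hypothetical):

```r
# Two hypothetical groups: within each, y rises with x,
# but group B sits at higher x and lower y than group A
toy <- data.frame(
  group = rep(c("A", "B"), each = 3),
  x = c(1, 2, 3, 6, 7, 8),
  y = c(10, 11, 12, 1, 2, 3)
)

# Correlation ignoring the groups
overall <- cor(toy$x, toy$y)

# Correlation computed separately inside each group
within <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))

print(overall)
print(within)
```

Here the pooled correlation is negative while both within-group correlations are positive, the same kind of contrast described for population and GDP per capita.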
ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line()
With so many overlapping lines, no pattern is clearly visible. We therefore plot average values instead.
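Averaging within each continent and year reduces the many country lines to one line per continent. A base R sketch of the idea on a small invented data frame (the values and the name `toy` are hypothetical); the pipeline that follows does the same with `group_by()` and `summarise()`.

```r
# Hypothetical data: two "countries" in continent X, one in continent Y
toy <- data.frame(
  continent = c("X", "X", "X", "X", "Y", "Y"),
  year      = c(1952, 1952, 1957, 1957, 1952, 1957),
  lifeExp   = c(40, 50, 44, 56, 60, 62)
)

# One mean per continent and year
avg <- aggregate(lifeExp ~ continent + year, data = toy, FUN = mean)
print(avg)
```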
#####figure 06
gapminder %>%
  group_by(continent, year) %>%
  # use mean() so the summary matches its name and the plot title
  summarise(meanlifeExp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = meanlifeExp, col = continent)) +
  geom_line() +
  geom_point() +
  ggtitle("Time series plot for mean life expectancy")
Mean life expectancy increases over the years. There is no seasonal pattern, which is expected because the data are recorded only every five years.
gapminder %>%
  group_by(continent, year) %>%
  summarise(meangdpPercap = mean(gdpPercap)) %>%
  ggplot(aes(x = year, y = meangdpPercap, col = continent)) +
  geom_line() +
  geom_point() +
  ggtitle("Time series plot for mean GDP per capita")
Oceania and Europe follow almost the same pattern, and both show a sudden drop after 1990. Mean GDP per capita in Asia rises suddenly after 1970. Africa has the lowest mean GDP per capita in every year and shows no considerable increase.
gapminder %>%
  group_by(continent, year) %>%
  summarise(meanPop = mean(pop)) %>%
  ggplot(aes(x = year, y = meanPop, col = continent)) +
  geom_line() +
  geom_point() +
  ggtitle("Time series plot for mean population")
Asia shows a huge increase in population: after 2000 its mean population is almost four times that of the other continents. In Europe, there is a small decrease between 1990 and 2000.
The majority of observations have high life expectancy.
Europe has the highest number of observations with high life expectancy and very few with low life expectancy.
There is a moderate linear relationship between life expectancy and GDP per capita.
Life expectancy, GDP per capita, and population generally increase over the years.
There is a huge increase in the population of Asia.