Introduction

The gapminder data set contains data on life expectancy, GDP per capita, and population by country. It has 1704 rows and 6 variables.

Variable    Description
country     factor with 142 levels
continent   factor with 5 levels
year        ranges from 1952 to 2007 in increments of 5 years
lifeExp     life expectancy at birth, in years
pop         population
gdpPercap   GDP per capita (US$, inflation-adjusted)

Type of variables

Quantitative: gdpPercap, pop, lifeExp
Qualitative: continent, country, year
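
The code in this report calls functions from several packages without showing how they are loaded. The sketch below is one possible setup. The package choices are assumptions inferred from the function names used later: tabyl from janitor, CrossTable from gmodels, ggpairs from GGally, the p1|p2 layout from patchwork, and the data set itself from the gapminder package.

```r
# Packages assumed from the functions used in this report (not stated in the source)
pkgs <- c("gapminder", "dplyr", "ggplot2", "janitor", "gmodels", "GGally", "patchwork")

# Check which of them are installed before trying to attach anything
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(not_installed) > 0) {
  message("Install these packages first: ", paste(not_installed, collapse = ", "))
} else {
  invisible(lapply(pkgs, library, character.only = TRUE))
  # Confirm the size quoted in the introduction: 1704 rows and 6 variables
  print(dim(gapminder))
  # Confirm the number of countries and continents
  print(c(countries = nlevels(gapminder$country),
          continents = nlevels(gapminder$continent)))
}
```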

Data analysis

Composition of the sample

table 01
tabyl(gapminder$continent, sort = TRUE)
 gapminder$continent   n    percent
              Africa 624 0.36619718
            Americas 300 0.17605634
                Asia 396 0.23239437
              Europe 360 0.21126761
             Oceania  24 0.01408451

Africa accounts for the largest share of the observations (36.6%). It has about 1.6 times as many observations as Asia, the second-largest group (396), and slightly more than twice as many as the Americas (300).
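
The comparison between continents can be checked with a few lines of arithmetic. This is a minimal sketch that re-enters the counts printed in table 01 by hand; the object names counts and sorted_counts are illustrative only.

```r
# Observation counts copied from table 01
counts <- c(Africa = 624, Americas = 300, Asia = 396, Europe = 360, Oceania = 24)

# Rank continents from most to fewest observations
sorted_counts <- sort(counts, decreasing = TRUE)
print(sorted_counts)

# Share of all observations per continent
print(round(counts / sum(counts), 3))

# How many times larger Africa is than the second-largest group and than the Americas
ratio_to_second   <- sorted_counts[[1]] / sorted_counts[[2]]
ratio_to_americas <- counts[["Africa"]] / counts[["Americas"]]
print(round(c(second = ratio_to_second, americas = ratio_to_americas), 2))
```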

table 02
gapminderNEw <- gapminder %>% 
  mutate(Life_Expectancy = ifelse(lifeExp > 50, "High", "Low"))

tabyl(gapminderNEw$Life_Expectancy , sort = TRUE)
 gapminderNEw$Life_Expectancy    n   percent
                         High 1213 0.7118545
                          Low  491 0.2881455

About 71% of the country-year observations have a high life expectancy, that is, more than 50 years.

table 03
CrossTable(gapminderNEw$Life_Expectancy, gapminder$continent)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  1704 

 
                             | gapminder$continent 
gapminderNEw$Life_Expectancy |    Africa |  Americas |      Asia |    Europe |   Oceania | Row Total | 
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
                        High |       251 |       272 |       308 |       358 |        24 |      1213 | 
                             |    84.028 |    15.994 |     2.418 |    40.385 |     2.799 |           | 
                             |     0.207 |     0.224 |     0.254 |     0.295 |     0.020 |     0.712 | 
                             |     0.402 |     0.907 |     0.778 |     0.994 |     1.000 |           | 
                             |     0.147 |     0.160 |     0.181 |     0.210 |     0.014 |           | 
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
                         Low |       373 |        28 |        88 |         2 |         0 |       491 | 
                             |   207.589 |    39.513 |     5.973 |    99.771 |     6.915 |           | 
                             |     0.760 |     0.057 |     0.179 |     0.004 |     0.000 |     0.288 | 
                             |     0.598 |     0.093 |     0.222 |     0.006 |     0.000 |           | 
                             |     0.219 |     0.016 |     0.052 |     0.001 |     0.000 |           | 
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
                Column Total |       624 |       300 |       396 |       360 |        24 |      1704 | 
                             |     0.366 |     0.176 |     0.232 |     0.211 |     0.014 |           | 
-----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|

 

Africa is the only continent where observations with low life expectancy (373) outnumber those with high life expectancy (251). Europe has the largest number of high-life-expectancy observations (358, or 29.5% of all high observations) and only 2 low-life-expectancy observations (0.4% of all low observations). Oceania has no low-life-expectancy observations at all.
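
The "Chi-square contribution" rows in table 03 can be summed into a single test statistic. The sketch below is not part of the original analysis; it re-enters the cell counts from table 03 by hand and passes them to chisq.test from base R. Note that the rows are repeated measurements of the same countries over time, so the independence assumption behind the test is doubtful and the result is best read as descriptive.

```r
# Cell counts copied from table 03 (rows: High, Low; columns: continents)
counts <- matrix(
  c(251, 272, 308, 358, 24,
    373,  28,  88,   2,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Life_Expectancy = c("High", "Low"),
    continent = c("Africa", "Americas", "Asia", "Europe", "Oceania")
  )
)

# Pearson chi-square test of association between category and continent
res <- chisq.test(counts)
print(res)

# Per-cell contributions, comparable to the second row of each cell in table 03
contributions <- (res$observed - res$expected)^2 / res$expected
print(round(contributions, 3))
```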

figure 01
ggplot(gapminder, aes(x = continent, y = lifeExp)) + 
  geom_boxplot(outlier.size = 3, colour="black", width=0.1) + 
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("Life Expectancy") +
  ggtitle("Distribution of life expectancy for each continent")

Africa has the lowest life expectancy and Oceania has the highest. There are no outliers in Africa or Oceania. The distributions for Africa and Europe are approximately symmetric.

figure 02
p1 <- ggplot(gapminder, aes(x = continent, y = gdpPercap)) + 
  geom_boxplot(outlier.size = 3, colour="black", width=0.1) + 
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("GDP per capita") +
  ggtitle("Distribution of GDP per capita for each continent")


p2<- ggplot(gapminder, aes(x = continent, y = log(gdpPercap))) + 
  geom_boxplot(outlier.size = 3, colour="black", width=0.1) + 
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("log(GDP per capita)") +
  ggtitle("Distribution of log GDP per capita for each continent")

p1|p2

In the first graph the distributions cannot be seen clearly on the raw scale, so let’s look at the graph with the log transformation. The Americas, Asia, Europe, and Oceania have almost symmetric distributions, but Africa has a positively skewed distribution. After the log transformation, Asia and Oceania don’t have any outliers.

figure 03
p3 <- ggplot(gapminder, aes(x = continent, y = pop)) + 
  geom_boxplot(outlier.size = 3, colour="black", width=0.1) + 
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("Population") +
  ggtitle("Distribution of population for each continent")


p4 <- ggplot(gapminder, aes(x = continent, y = log(pop))) + 
  geom_boxplot(outlier.size = 3, colour="black", width=0.1) + 
  geom_violin(alpha = 0.2, fill = "blue") +
  ylab("log(Population)") +
  ggtitle("Distribution of log population for each continent")

p3|p4

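Here too the distributions cannot be seen on the raw scale, so we use the log transformation for population. Both Europe and Oceania show a bimodal distribution. There might be an external factor that affects them. For Oceania, the 24 observations spread over 12 survey years correspond to only two countries, so two modes are to be expected.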
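
One way to put the Oceania result in context is to work out how many countries sit behind each continent's distribution. The sketch below uses only the counts from table 01 and the year range from the variable description. It assumes that every country is observed in every survey year; the final check against the 142 country levels is consistent with that assumption.

```r
# Observation counts copied from table 01
counts <- c(Africa = 624, Americas = 300, Asia = 396, Europe = 360, Oceania = 24)

# Survey years: 1952 to 2007 in steps of 5
years <- seq(1952, 2007, by = 5)
n_years <- length(years)

# Countries per continent, assuming one row per country per survey year
countries_per_continent <- counts / n_years
print(countries_per_continent)

# Should match the 142 country levels listed in the variable description
print(sum(countries_per_continent))
```

With only two countries in Oceania, each country contributes its own cluster of population values, which is enough to produce two modes in a violin plot.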

figure 04

ggpairs(gapminder, mapping = aes(color = continent, alpha = 0.2),
        columns = c("gdpPercap", "pop", "lifeExp")) +
  ggtitle("Scatter plot matrix")

There is a strong positive linear relationship between GDP per capita and life expectancy in Oceania. There is an overall weak negative linear relationship between population and GDP per capita, but the Americas and Oceania show a moderate positive linear relationship. Population and life expectancy have a weak linear relationship overall, but Oceania shows a moderate linear relationship.
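
The strength of these relationships can also be read as numbers rather than judged by eye. The sketch below defines a small helper, cor_by_group, which is a hypothetical name introduced here and not part of any package. It computes the Pearson correlation between two columns within each level of a grouping column using base R only. The last part applies it to gapminder if that package is installed.

```r
# Hypothetical helper: Pearson correlation of columns x and y within each group
cor_by_group <- function(data, group, x, y) {
  pieces <- split(data, data[[group]])
  sapply(pieces, function(d) cor(d[[x]], d[[y]]))
}

# Small made-up data set to show the behaviour: perfect positive and negative trends
toy <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 3, 2, 1)
)
toy_result <- cor_by_group(toy, "g", "x", "y")
print(toy_result)

# Apply to the report's data when the gapminder package is available
if (requireNamespace("gapminder", quietly = TRUE)) {
  gm <- gapminder::gapminder
  print(round(cor_by_group(gm, "continent", "gdpPercap", "lifeExp"), 2))
  print(round(cor_by_group(gm, "continent", "pop", "gdpPercap"), 2))
  print(round(cor_by_group(gm, "continent", "pop", "lifeExp"), 2))
}
```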

figure 05
ggplot(gapminder, aes(x=year, y=lifeExp, group=country)) +
    geom_line()

Since there are so many overlapping lines, we cannot clearly see any pattern. Therefore we can use average values instead.

figure 06

gapminder %>%
    group_by(continent, year) %>%
    summarise(meanlifeExp = mean(lifeExp)) %>%
    ggplot(aes(x=year, y=meanlifeExp, col = continent)) +
     geom_line() + 
     geom_point() +
     ggtitle("Time series plot for mean Life Expectancy")

We can see that mean life expectancy increases over the years. There is no specific seasonal pattern here.

figure 07
gapminder %>%
    group_by(continent, year) %>%
    summarise(meangdpPercap = mean(gdpPercap)) %>%
    ggplot(aes(x=year, y=meangdpPercap, col = continent)) +
     geom_line() + 
     geom_point() +
     ggtitle("Time series plot for mean GDP per capita")

Here we can see that Oceania and Europe follow almost the same pattern. Both of them faced a sudden drop after 1990. There is a sudden increase in GDP per capita in Asia after 1970. Africa has the lowest GDP per capita in every year, and there is no considerable increase in Africa.

figure 08
gapminder %>%
    group_by(continent, year) %>%
    summarise(meanPop= mean(pop)) %>%
    ggplot(aes(x=year, y=meanPop, col = continent)) +
     geom_line() + 
     geom_point() +
     ggtitle("Time series plot for mean population")

In Asia there is a huge increase in mean population. After 2000 it is almost 4 times that of the other continents. In Europe, there is a small decrease between 1990 and 2000.

Conclusions

  • The majority of observations (about 71%) have high life expectancy, that is, above 50 years.

  • Europe has the highest number of observations with high life expectancy and very few with low life expectancy.

  • There is a bimodal distribution in Europe and Oceania for population.
  • There is a moderate linear relationship between life expectancy and GDP per capita.

  • Life expectancy, GDP per capita, and population increase over the years overall.

  • There is a huge increase in the population in Asia.