Introduction

The food industry is one of the fastest-moving industrial sectors. The glamorous, glittering retail shops and supermarkets are expanding very fast all over the country, and the majority of food is pre-packed and presented to the consumer in a labelled container. The data for this project come from a study that evaluates consumers' attitudes towards food labels and their awareness of the information printed on them. The study was carried out to identify how demographic characteristics, socio-economic characteristics and health-related factors are associated with (1) consumers' attitudes towards food labels and (2) their awareness of the information printed on food labels.

Variable | Description | Type of variable
Gender | Gender | Qualitative
Age | Age | Quantitative
Education | Educational level | Qualitative
Employment | Employment status | Qualitative
Income | Household income | Qualitative
House size | Age groups | Qualitative
Children | Number of children | Quantitative
Marital | Marital status | Qualitative
fshopper | Major food shopper of the household | Qualitative
mplanner | Major meal planner of the household | Qualitative
FA | Having food allergies | Qualitative
Diabetes | Having diabetes | Qualitative
Metabolic syndrome | Having obesity, high blood pressure/cholesterol, heart disease | Qualitative
Other | Having migraine, osteoporosis, other | Qualitative
Specif | Having a specific diet (pregnancy, breast feeding, training for sports, vegetarian) | Qualitative
Job1 | Doctors, nurses, health care workers | Qualitative
Job2 | Legislators related to food items; manufacturers/advertisers related to food items | Qualitative
Exercise | Frequency of exercise | Qualitative
Health | Self-perception of overall health | Qualitative
Place | Place where packaged food is bought | Qualitative
Easy | Easiness of the packaged food | Qualitative
Familiarity | Familiarity with the product | Qualitative
Friends | Recommendation by family and friends | Qualitative
Useful | Usefulness of food label | Qualitative
Easiness | Ease of understanding the information on food labels | Qualitative
Sufficient | Sufficiency of information provided on food label | Qualitative
Truthfulness | Truthfulness of information provided on food label | Qualitative
Clear | Clarity of information printed on food label | Qualitative
Attractive pack | Influence of attractive package | Qualitative
Hc/nufriclaim | Influence of health claims/nutrition claims | Qualitative
Graphical | Influence of graphical and pictorial information | Qualitative
Free Price | Influence of free/prizes/contests | Qualitative
Net quan | Awareness of net quantity | Qualitative
Low in fat | Awareness of low in fat | Qualitative
Low in cho | Awareness of low in cholesterol | Qualitative
Sodium | Awareness of nutrition claim indicating the lowest amount of sodium | Qualitative
elabels | Awareness of Ecode labels | Qualitative

Packages

library(tidyverse)
library(janitor)
library(ggplot2)
library(gmodels)
library(GGally)
library(patchwork)
library(MASS)
library(huxtable)

First, we need to look at the types of the collected data.

summary(food_label)
glimpse(food_label)

Here we can see that the categorical variables were read in as integers, so we need to convert them into factors. In addition, Housesize has only 8 categories, but two observations are recorded as 9 and 10, so we replace those values with missing values.

food_label <- food_label %>% mutate(Housesize = replace(Housesize, which(Housesize > 8), NA))
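
The conversion of integer codes to factors is described in the text but not shown. The following is a minimal sketch of one way to do it, using a small made-up data frame (`toy`) rather than the survey data. The column values and the mapping of code 1 to "female" and 2 to "male" are assumptions made for illustration only.

```r
library(dplyr)

# Hypothetical stand-in for food_label: categorical variables stored as integer codes
toy <- data.frame(
  Gender    = c(1L, 2L, 1L, 2L),
  Housesize = c(3L, 10L, 8L, 9L),
  Age       = c(25, 41, 33, 58)
)

toy <- toy %>%
  # Codes above 8 are not valid Housesize categories, so treat them as missing
  mutate(Housesize = replace(Housesize, which(Housesize > 8), NA)) %>%
  # Convert the coded columns (but not Age) to factors;
  # the code-to-label mapping for Gender is assumed, not taken from the survey
  mutate(
    Gender    = factor(Gender, levels = c(1, 2), labels = c("female", "male")),
    Housesize = factor(Housesize)
  )

glimpse(toy)
```

Doing the replacement before the conversion keeps the invalid codes 9 and 10 from becoming factor levels.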

Composition of the sample

table 01:

tabyl(food_label,Gender)
Gender   n     percent
female   377   0.643
male     209   0.357
tabyl(food_label, marital) 

marital   n     percent
single    155   0.265
married   431   0.735
According to table 01, about 64% of the sample is female, so there are nearly twice as many females as males (377 vs 209). It also shows that nearly 74% of respondents are married, almost three times the number of singles (431 vs 155).

table 02:

tabyl(food_label, Education)

Education        n     percent
Below O/L        66    0.113
Passed GCE O/L   68    0.116
Passed GCE A/L   88    0.15
Diploma          145   0.247
Degree           219   0.374
Table 02 shows that the largest group of respondents (37%) is educated to degree level. No respondent holds a postgraduate degree. There are about two and a half times as many degree holders as respondents whose highest qualification is GCE A/L (219 vs 88).

table 03:

tabyl(food_label, Employment)
Employment           n     percent
Employed full time   232   0.396
Employed part-time   66    0.113
Unemployed           58    0.099
Student              83    0.142
Housewife            99    0.169
Retired              48    0.0819

According to table 03, approximately 40% of respondents are employed full time. Only 8% of the sample is retired.

table 04:

tabyl(food_label, Income)
Income                    n     percent
Less than Rs: 20000       20    0.0341
Rs: 20000 - Rs: 34999     91    0.155
Rs: 35000 - Rs: 49999     204   0.348
Rs: 50000 - Rs: 64999     197   0.336
Over Rs: 64499            74    0.126

Table 04 shows that the Rs: 35000 - Rs: 49999 and Rs: 50000 - Rs: 64999 income groups each account for roughly a third of respondents (35% and 34% respectively).

table 05:

tabyl(food_label, Housesize)%>% filter(!(is.na(Housesize))) 

Housesize       n     percent   valid_percent
0-24 months     31    0.0529    0.0531
2-5 years       134   0.229     0.229
6-10 years      178   0.304     0.305
11-16 years     116   0.198     0.199
17-18 years     75    0.128     0.128
18-30 years     26    0.0444    0.0445
30-55 years     17    0.029     0.0291
over 55 years   7     0.0119    0.012
Table 05 shows that around 30% of households have children aged 6-10 years.
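
The two percentage columns in table 05 differ because two Housesize values were set to missing. As a rough sketch of the arithmetic (base R only, with counts copied from table 05), `percent` divides by all 586 respondents while `valid_percent` divides by the 584 non-missing ones:

```r
# Counts per Housesize category, copied from table 05
n_valid <- c(31, 134, 178, 116, 75, 26, 17, 7)
n_missing <- 2  # the two observations recoded to NA

# percent uses every respondent as the denominator, including missing values
percent <- n_valid / (sum(n_valid) + n_missing)

# valid_percent uses only respondents with a non-missing Housesize
valid_percent <- n_valid / sum(n_valid)

round(cbind(percent, valid_percent), 4)
```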

table 06:

tabyl(food_label, fshopper)
fshopper   n     percent
no         171   0.292
yes        415   0.708
tabyl(food_label, mplanner)
mplanner   n     percent
no         150   0.256
yes        436   0.744

Table 06 reveals that the majority of respondents (71%) are the major food shopper of their household, and that the majority (74%) are the major meal planner.

table 07:

CrossTable(FA, Gender)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
          FA |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
          no |       336 |       180 |       516 | 
             |     0.049 |     0.088 |           | 
             |     0.651 |     0.349 |     0.881 | 
             |     0.891 |     0.861 |           | 
             |     0.573 |     0.307 |           | 
-------------|-----------|-----------|-----------|
         yes |        41 |        29 |        70 | 
             |     0.361 |     0.652 |           | 
             |     0.586 |     0.414 |     0.119 | 
             |     0.109 |     0.139 |           | 
             |     0.070 |     0.049 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

 

Table 07 shows that only about 12% of the sample suffers from food allergies; of these 70 respondents, 41 (59%) are female.

table 08:

CrossTable(Diabetes, Gender)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
    Diabetes |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
          no |       102 |        77 |       179 | 
             |     1.504 |     2.712 |           | 
             |     0.570 |     0.430 |     0.305 | 
             |     0.271 |     0.368 |           | 
             |     0.174 |     0.131 |           | 
-------------|-----------|-----------|-----------|
         yes |       275 |       132 |       407 | 
             |     0.661 |     1.193 |           | 
             |     0.676 |     0.324 |     0.695 | 
             |     0.729 |     0.632 |           | 
             |     0.469 |     0.225 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

 

Table 08 shows that the majority of people (about 70%) suffer from diabetes. Among them, about 68% are female.
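
The "Chi-square contribution" row printed in each cell can be summed into a Pearson chi-square statistic for the whole table. The sketch below is an illustration rather than part of the original analysis. It rebuilds table 08 from its printed counts and uses base R's `chisq.test` without continuity correction, so the statistic matches the sum of the four printed contributions (1.504 + 2.712 + 0.661 + 1.193, about 6.07).

```r
# Counts copied from table 08 (rows: Diabetes no/yes, columns: female/male)
diabetes_tab <- matrix(
  c(102,  77,
    275, 132),
  nrow = 2, byrow = TRUE,
  dimnames = list(Diabetes = c("no", "yes"), Gender = c("female", "male"))
)

# Pearson chi-square test of independence, without Yates' continuity correction
res <- chisq.test(diabetes_tab, correct = FALSE)

# Per-cell contributions: (observed - expected)^2 / expected
contrib <- (res$observed - res$expected)^2 / res$expected
round(contrib, 3)

# The test statistic is the sum of the contributions
res$statistic
res$p.value
```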

table 09:

CrossTable(`Metabolic cyndrents`, Gender)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
                    | Gender 
Metabolic cyndrents |    female |      male | Row Total | 
--------------------|-----------|-----------|-----------|
                 no |       157 |       100 |       257 | 
                    |     0.421 |     0.759 |           | 
                    |     0.611 |     0.389 |     0.439 | 
                    |     0.416 |     0.478 |           | 
                    |     0.268 |     0.171 |           | 
--------------------|-----------|-----------|-----------|
                yes |       220 |       109 |       329 | 
                    |     0.329 |     0.593 |           | 
                    |     0.669 |     0.331 |     0.561 | 
                    |     0.584 |     0.522 |           | 
                    |     0.375 |     0.186 |           | 
--------------------|-----------|-----------|-----------|
       Column Total |       377 |       209 |       586 | 
                    |     0.643 |     0.357 |           | 
--------------------|-----------|-----------|-----------|

 

According to table 09, around 56% of people have a metabolic syndrome condition such as obesity, high blood pressure/cholesterol or heart disease.

table 10:

CrossTable(specific, Gender)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
    specific |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
          no |       228 |       131 |       359 | 
             |     0.038 |     0.068 |           | 
             |     0.635 |     0.365 |     0.613 | 
             |     0.605 |     0.627 |           | 
             |     0.389 |     0.224 |           | 
-------------|-----------|-----------|-----------|
         yes |       149 |        78 |       227 | 
             |     0.060 |     0.108 |           | 
             |     0.656 |     0.344 |     0.387 | 
             |     0.395 |     0.373 |           | 
             |     0.254 |     0.133 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

 

Table 10 shows that a minority of respondents (39%) follow a specific diet due to pregnancy, breast feeding, training for sports or vegetarianism.

table 11:

tabyl(food_label, job1)
job1   n     percent
no     255   0.435
yes    331   0.565
tabyl(food_label, job2)

job2   n     percent
no     315   0.538
yes    271   0.462
Here we can see that around 56% of people are doctors, nurses or health care workers, while 46% are legislators, manufacturers or advertisers related to food items.

table 12:

CrossTable(Exercise, Gender)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
                         | Gender 
                Exercise |    female |      male | Row Total | 
-------------------------|-----------|-----------|-----------|
                   daily |        15 |         6 |        21 | 
                         |     0.164 |     0.296 |           | 
                         |     0.714 |     0.286 |     0.036 | 
                         |     0.040 |     0.029 |           | 
                         |     0.026 |     0.010 |           | 
-------------------------|-----------|-----------|-----------|
at least 2 days per week |        57 |        41 |        98 | 
                         |     0.580 |     1.046 |           | 
                         |     0.582 |     0.418 |     0.167 | 
                         |     0.151 |     0.196 |           | 
                         |     0.097 |     0.070 |           | 
-------------------------|-----------|-----------|-----------|
                  rarely |       165 |        84 |       249 | 
                         |     0.144 |     0.260 |           | 
                         |     0.663 |     0.337 |     0.425 | 
                         |     0.438 |     0.402 |           | 
                         |     0.282 |     0.143 |           | 
-------------------------|-----------|-----------|-----------|
                   never |       140 |        78 |       218 | 
                         |     0.000 |     0.001 |           | 
                         |     0.642 |     0.358 |     0.372 | 
                         |     0.371 |     0.373 |           | 
                         |     0.239 |     0.133 |           | 
-------------------------|-----------|-----------|-----------|
            Column Total |       377 |       209 |       586 | 
                         |     0.643 |     0.357 |           | 
-------------------------|-----------|-----------|-----------|

 

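According to table 12, the largest group of respondents (42%) exercises rarely. More females than males exercise in absolute numbers, but that reflects their larger share of the sample: within each gender, 19% of females and 22% of males exercise at least two days per week.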
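
Because females make up 64% of the sample, raw counts favour them in every row, so genders are easier to compare using within-gender (column) proportions. A minimal sketch in base R, using the counts printed in table 12:

```r
# Counts copied from table 12 (rows: Exercise, columns: Gender)
exercise_tab <- matrix(
  c( 15,  6,
     57, 41,
    165, 84,
    140, 78),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Exercise = c("daily", "at least 2 days per week", "rarely", "never"),
    Gender   = c("female", "male")
  )
)

# Column proportions: share of each exercise level within each gender
col_prop <- prop.table(exercise_tab, margin = 2)
round(col_prop, 3)

# Share of each gender exercising at least two days per week
regular <- colSums(col_prop[c("daily", "at least 2 days per week"), ])
round(regular, 3)
```

These column proportions correspond to the "N / Col Total" row of the CrossTable output.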

table 13:

CrossTable(Health, Gender)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
      Health |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
   excellent |         5 |         4 |         9 | 
             |     0.108 |     0.194 |           | 
             |     0.556 |     0.444 |     0.015 | 
             |     0.013 |     0.019 |           | 
             |     0.009 |     0.007 |           | 
-------------|-----------|-----------|-----------|
        good |        21 |        10 |        31 | 
             |     0.056 |     0.101 |           | 
             |     0.677 |     0.323 |     0.053 | 
             |     0.056 |     0.048 |           | 
             |     0.036 |     0.017 |           | 
-------------|-----------|-----------|-----------|
        fair |        73 |        37 |       110 | 
             |     0.070 |     0.127 |           | 
             |     0.664 |     0.336 |     0.188 | 
             |     0.194 |     0.177 |           | 
             |     0.125 |     0.063 |           | 
-------------|-----------|-----------|-----------|
        poor |       145 |        80 |       225 | 
             |     0.000 |     0.001 |           | 
             |     0.644 |     0.356 |     0.384 | 
             |     0.385 |     0.383 |           | 
             |     0.247 |     0.137 |           | 
-------------|-----------|-----------|-----------|
   can't say |       133 |        78 |       211 | 
             |     0.056 |     0.100 |           | 
             |     0.630 |     0.370 |     0.360 | 
             |     0.353 |     0.373 |           | 
             |     0.227 |     0.133 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

 

Table 13 shows that the largest group of respondents (38%) reports poor health. Approximately 64% of those reporting poor health are female, in line with the female share of the sample. Only 1.5% report excellent health.

table 14:

CrossTable(place, Gender)

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
               | Gender 
         place |    female |      male | Row Total | 
---------------|-----------|-----------|-----------|
  retail shops |        71 |        31 |       102 | 
               |     0.441 |     0.795 |           | 
               |     0.696 |     0.304 |     0.174 | 
               |     0.188 |     0.148 |           | 
               |     0.121 |     0.053 |           | 
---------------|-----------|-----------|-----------|
 super markets |       174 |       102 |       276 | 
               |     0.072 |     0.129 |           | 
               |     0.630 |     0.370 |     0.471 | 
               |     0.462 |     0.488 |           | 
               |     0.297 |     0.174 |           | 
---------------|-----------|-----------|-----------|
  both equally |       132 |        76 |       208 | 
               |     0.025 |     0.044 |           | 
               |     0.635 |     0.365 |     0.355 | 
               |     0.350 |     0.364 |           | 
               |     0.225 |     0.130 |           | 
---------------|-----------|-----------|-----------|
  Column Total |       377 |       209 |       586 | 
               |     0.643 |     0.357 |           | 
---------------|-----------|-----------|-----------|

 

It seems that around 47% of people buy packaged foods from the supermarkets.

Distributions and Relationships of the sample

figure 01:

ggplot(food_label, aes(x = Gender, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   facet_grid(marital~., margins = FALSE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Marital status")

According to figure 01, the age distribution of married females is positively skewed, while that of married males is negatively skewed. For single respondents, the age distribution is roughly symmetric for both males and females.
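
Reading skewness from a boxplot is a visual judgement, so a numeric check can help confirm it. The sketch below defines a simple moment-based skewness function (`skewness` is a helper written here, not from any package used above) and applies it to two made-up age vectors. The data are hypothetical and only illustrate the sign convention.

```r
# Moment-based sample skewness:
# positive = longer right tail, negative = longer left tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Hypothetical ages, not taken from the survey
right_tailed <- c(22, 23, 25, 26, 28, 30, 45, 60)
left_tailed  <- c(20, 35, 50, 52, 54, 55, 57, 58)

skewness(right_tailed)  # positive
skewness(left_tailed)   # negative

# With the survey data, the same helper could be applied per group, e.g.
# food_label %>% group_by(Gender, marital) %>% summarise(skew = skewness(Age))
```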

figure 02:

ggplot(food_label, aes(x = Education, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Education level")

Figure 02 shows that the age distribution of females is negatively skewed only in the Passed GCE A/L group.

figure 03:

ggplot(food_label, aes(x = Employment, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Employment")

It seems that the age distributions of both male and female students are negatively skewed. Retired and unemployed males also show negatively skewed distributions. Males employed full time have a nearly symmetric distribution.

figure 04:

ggplot(food_label, aes(x = Income, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Income")

According to figure 04, the age distribution of males is positively skewed only in the over Rs: 64499 income group.

figure 05:

ggplot(food_label, aes(x = fshopper, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   facet_grid(mplanner~., margins = FALSE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender, meal planner and food shopper")

Figure 05 shows that both males and females who are the major food shopper and the major meal planner have positively skewed age distributions. Females who are neither the major food shopper nor the major meal planner have a negatively skewed age distribution.

figure 06:

p1 <- ggplot(food_label, aes(x=FA, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) + 
  theme(legend.position = "none")+
  xlab("Food allergies") +
  ylab("Age") +
  ggtitle("Distribution of Age by Food allergies")

p2 <- ggplot(food_label, aes(x=Diabetes, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  theme(legend.position = "none")+
  xlab("Diabetes") +
  ylab("Age") +
  ggtitle("Distribution of Age by Diabetes")

p3 <- ggplot(food_label, aes(x=`Metabolic cyndrents`, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  theme(legend.position = "none")+
  xlab("Metabolic cyndrents") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nMetabolic cyndrents")

p4 <-  ggplot(food_label, aes(x=specific, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  xlab("specific diets") +
  ylab("Age") +
  ggtitle("Distribution of Age by \n specific diets")

(p1|p2) / (p3|p4)

Figure 06 shows bimodal age distributions for food allergies, diabetes, metabolic syndrome and specific diets. There may be some external factor that affects food allergies. Males who have food allergies show a positively skewed age distribution.

figure 07:

p1 <- ggplot(food_label, aes(x=job1, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
   theme(legend.position = "none", axis.text.x = element_text(angle = 60, hjust = 1))+ 
  ylab("Age") +
  ggtitle("Distribution of Age by \ndoctors, nurses, \nhealth care workers jobs")

p2 <-  ggplot(food_label, aes(x=job2, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
   theme(axis.text.x = element_text(angle = 60, hjust = 1))+
  ylab("Age") +
  ggtitle("Distribution of Age by \nLegislators related to food items,\nManufactures/advertisers \nrelated to food items")

p1|p2

According to figure 07, the age distribution is bimodal in every category, which suggests that some external factor may be associated with these job types. Apart from males in the second panel (whether or not they work as legislators, manufacturers or advertisers related to food items), the groups show positively skewed age distributions.

figure 08:

ggplot(food_label, aes(x=Exercise, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  xlab(" Exercise") +
  ylab("Age") +
  ggtitle("Distribution of Age by exercise")

Here we can see that males who exercise daily tend to be older, whereas males who never exercise show a positively skewed age distribution, concentrated at younger ages. Only a small number of females exercise daily, and their age distribution is positively skewed.

figure 09:

ggplot(food_label, aes(x=place, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  geom_violin(alpha = 0.2, fill = "pink", width = 1) +
  xlab(" place") +
  ylab("Age") +
  ggtitle("Distribution of Age by place")

Here we can see some bimodal distributions. For females, the age distribution is positively skewed for both retail shops and supermarkets.

figure 10:

ggpairs(food_label, mapping = aes(color=Gender, alpha =0.2),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by gender")

Figure 10 shows a weak positive linear relationship between age and number of children; for females, however, the linear relationship is weak and negative.
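
The correlations displayed in the scatter plot matrix can also be computed directly per group. The following is a minimal sketch with dplyr, using a made-up data frame (`toy`) whose values are hypothetical and chosen only so that the two groups have opposite signs. With the survey data, `food_label` would take the place of `toy`.

```r
library(dplyr)

# Hypothetical data standing in for food_label (values are made up)
toy <- data.frame(
  Gender   = c("female", "female", "female", "female",
               "male", "male", "male", "male"),
  Age      = c(25, 32, 41, 55, 24, 35, 46, 60),
  Children = c(2, 2, 1, 0, 0, 1, 2, 3)
)

# Pearson correlation between Age and Children within each gender
cor_by_gender <- toy %>%
  group_by(Gender) %>%
  summarise(r = cor(Age, Children, use = "complete.obs"), n = n())

cor_by_gender
```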

figure 11:

ggpairs(food_label, mapping = aes(color=marital, alpha =0.6),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by marital status")

Figure 11 shows that there is a weak positive linear relationship between age and children for both single and married people.

figure 12:

ggpairs(food_label, mapping = aes(color=Education, alpha =0.6),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by education")

Here we can see a moderate positive linear relationship between age and number of children among respondents who passed GCE A/L.

figure 13:

ggpairs(food_label, mapping = aes(color=Exercise, alpha =0.6),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by exercise")

Here we can see a moderate positive linear relationship between age and number of children among people who exercise daily.