Introduction

The food industry is one of the fastest-moving industrial sectors, and glamorous, glittering retail shops and supermarkets are expanding rapidly all over the country. Most food is pre-packed and presented to the consumer in a labelled container. The data for this project come from a study that attempted to evaluate consumers' attitudes towards food labels and their awareness of the information printed on them. The study was carried out to identify the association of demographic, socio-economic and health-related factors with (i) consumers' attitudes towards food labels and (ii) their awareness of the information printed on food labels.

Variable	Description	Type of variable
Gender	Gender	Qualitative
Age	Age	Quantitative
Education	Educational level	Qualitative
Employment	Employment status	Qualitative
Income	Household income	Qualitative
House size	Age groups	Qualitative
Children	Number of children	Quantitative
Marital	Marital status	Qualitative
fshopper	Major food shopper of the household	Qualitative
mplanner	Major meal planner of the household	Qualitative
FA	Having food allergies	Qualitative
Diabetes	Having diabetes	Qualitative
Metabolic syndrome	Having obesity, high blood pressure/cholesterol, heart disease	Qualitative
Other	Having migraine, osteoporosis, other	Qualitative
Specific	Having a specific diet (pregnancy, breastfeeding, training for sports, vegetarian)	Qualitative
Job1	Doctors, nurses, health care workers	Qualitative
Job2	Legislators related to food items, manufacturers/advertisers related to food items	Qualitative
Exercise	Frequency of exercise	Qualitative
Health	Self-perception of overall health	Qualitative
Place	Place where packaged food is bought	Qualitative
Easy	Ease of use of the packaged food	Qualitative
Familiarity	Familiarity with the product	Qualitative
Friends	Recommendation by family and friends	Qualitative
Useful	Usefulness of food labels	Qualitative
Easiness	Ease of understanding the information on food labels	Qualitative
Sufficient	Sufficiency of the information provided on food labels	Qualitative
Truthfulness	Truthfulness of the information provided on food labels	Qualitative
Clear	Clarity of the information printed on food labels	Qualitative
Attractive pack	Influence of an attractive package	Qualitative
Hc/nutriclaims	Influence of health claims/nutrition claims	Qualitative
Graphical	Influence of graphical and pictorial information	Qualitative
Free/prize	Influence of free offers/prizes/contests	Qualitative
Net quan	Awareness of net quantity	Qualitative
Low in fat	Awareness of "low in fat"	Qualitative
Low in cho	Awareness of "low in cholesterol"	Qualitative
Sodium	Awareness of the nutrition claim indicating the lowest amount of sodium	Qualitative
elabels	Awareness of E-code labels	Qualitative

Packages

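library(tidyverse)  # dplyr, ggplot2, tibble and friends
library(janitor)    # tabyl() frequency tables
library(ggplot2)    # already attached by tidyverse; harmless to load again
library(gmodels)    # CrossTable() contingency tables
library(GGally)     # ggpairs() scatter plot matrices
library(patchwork)  # combining plots with | and /
library(MASS)       # masks dplyr::select(), so call dplyr::select() explicitly
library(huxtable)   # formatted table of p-values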

First, we need to look at the types of the collected variables.

summary(food_label)
glimpse(food_label)

Here we can see that the categorical variables were read in as integers, so they need to be converted into factors. In addition, Housesize (the eleventh variable in the data) should have 8 categories, but two observations are recorded as 10 and 9. We therefore replace these invalid codes with missing values.

# Housesize codes above 8 are invalid, so set them to NA
food_label <- food_label %>% mutate(Housesize = replace(Housesize, which(Housesize > 8), NA))
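The conversion to factors described above is not shown in the code. A minimal sketch, assuming that every column other than the two quantitative ones (Age and Children) is categorical, as listed in the variable table. The helper `clean_food_label()` and the toy data are hypothetical, and attaching readable labels (for example female/male) would additionally need the survey's codebook, which is not given here:

```r
library(dplyr)

# Hypothetical integer-coded toy data (NOT the study data)
toy <- tibble(
  Gender    = c(1L, 2L, 1L, 2L),
  Age       = c(34L, 51L, 27L, 45L),
  Children  = c(2L, 1L, 0L, 3L),
  Housesize = c(3L, 10L, 5L, 9L)
)

# Hypothetical helper: recode invalid Housesize values, then convert to factors
clean_food_label <- function(df) {
  df %>%
    # Housesize has 8 valid codes, so anything above 8 becomes NA
    # (done before factor conversion, while the column is still numeric)
    mutate(Housesize = replace(Housesize, which(Housesize > 8), NA)) %>%
    # every column except the quantitative Age and Children becomes a factor
    mutate(across(-c(Age, Children), as.factor))
}

toy_clean <- clean_food_label(toy)
glimpse(toy_clean)
```

The order matters: comparing `Housesize > 8` only makes sense while the column is still numeric, so the recoding step runs before `as.factor()`.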

Composition of the sample

table 01:

tabyl(food_label,Gender)

Gender	n	percent
female	377	0.643
male	209	0.357

tabyl(food_label, marital)

marital	n	percent
single	155	0.265
married	431	0.735

According to table 01, about 64% of the sample is female, so there are nearly twice as many females as males (377 vs 209). Nearly 74% of respondents are married, which is almost three times the number of single respondents (431 vs 155).
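The proportions and ratios quoted above can be checked directly from the counts printed in table 01. A minimal base-R sketch using only those counts:

```r
# Counts copied from table 01
gender  <- c(female = 377, male = 209)
marital <- c(single = 155, married = 431)

# Proportions of the sample (the tabyl "percent" column): female 0.643, male 0.357
round(prop.table(gender), 3)

# How many times larger one group is than the other
female_to_male    <- unname(gender["female"] / gender["male"])
married_to_single <- unname(marital["married"] / marital["single"])
round(c(female_to_male, married_to_single), 2)
```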

table 02:

tabyl(food_label, Education)

Education	n	percent
Below O/L	66	0.113
Passed GCE O/L	68	0.116
Passed GCE A/L	88	0.15
Diploma	145	0.247
Degree	219	0.374

Table 02 shows that the largest group of respondents (37%) hold a degree, and no respondent has a postgraduate degree. Degree holders are about 2.5 times as numerous as those who passed GCE A/L (219 vs 88).

table 03:

tabyl(food_label, Employment)

Employment	n	percent
Employed full time	232	0.396
Employed part-time	66	0.113
Unemployed	58	0.099
Student	83	0.142
Housewife	99	0.169
Retired	48	0.0819

According to table 03, approximately 40% of respondents are employed full time, while only 8% are retired.

table 04:

tabyl(food_label, Income)

Income	n	percent
Less than Rs: 20000	20	0.0341
Rs: 20000 - Rs: 34999	91	0.155
Rs: 35000 - Rs: 49999	204	0.348
Rs: 50000 - Rs: 64999	197	0.336
Over Rs: 64499	74	0.126

Table 04 shows that the two middle income groups are the largest: about 35% of households earn Rs: 35000 - Rs: 49999 and about 34% earn Rs: 50000 - Rs: 64999.

table 05:

tabyl(food_label, Housesize)%>% filter(!(is.na(Housesize)))

Housesize	n	percent	valid_percent
0-24 months	31	0.0529	0.0531
2-5 years	134	0.229	0.229
6-10 years	178	0.304	0.305
11-16 years	116	0.198	0.199
17-18 years	75	0.128	0.128
18-30 years	26	0.0444	0.0445
30-55 years	17	0.029	0.0291
over 55 years	7	0.0119	0.012

Table 05 shows that around 30% of households have children aged 6-10 years, the most common category.
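The percent and valid_percent columns in table 05 differ only in their denominators: percent divides by all 586 respondents, including the two values recoded as missing, whereas valid_percent divides by the 584 non-missing responses. A short base-R sketch using the counts from table 05:

```r
# Counts per Housesize category from table 05, plus the 2 values set to NA
n     <- c(31, 134, 178, 116, 75, 26, 17, 7)
n_na  <- 2
total <- sum(n) + n_na   # 586 respondents
valid <- sum(n)          # 584 non-missing responses

percent       <- n / total  # tabyl's "percent" column
valid_percent <- n / valid  # tabyl's "valid_percent" column
round(cbind(n, percent, valid_percent), 4)
```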

table 06:

tabyl(food_label, fshopper)

fshopper	n	percent
no	171	0.292
yes	415	0.708

tabyl(food_label, mplanner)

mplanner	n	percent
no	150	0.256
yes	436	0.744

Table 06 shows that most respondents are the major food shopper of their household (71%) and the major meal planner of their household (74%).

table 07:

with(food_label, CrossTable(FA, Gender))  # CrossTable() takes vectors, so evaluate inside the data


 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
          FA |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
          no |       336 |       180 |       516 | 
             |     0.049 |     0.088 |           | 
             |     0.651 |     0.349 |     0.881 | 
             |     0.891 |     0.861 |           | 
             |     0.573 |     0.307 |           | 
-------------|-----------|-----------|-----------|
         yes |        41 |        29 |        70 | 
             |     0.361 |     0.652 |           | 
             |     0.586 |     0.414 |     0.119 | 
             |     0.109 |     0.139 |           | 
             |     0.070 |     0.049 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

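Table 07 shows that only about 12% of the sample (70 respondents) have food allergies. Of these, 59% are female, although the allergy rate is slightly higher among males (13.9%) than among females (10.9%).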
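Row and column proportions answer different questions: the share of allergy sufferers who are female (row proportion) versus the allergy rate within each gender (column proportion). A minimal base-R sketch that rebuilds table 07 from its printed counts:

```r
# 2 x 2 counts from table 07 (rows: FA, columns: Gender), filled column by column
fa_gender <- matrix(c(336, 41, 180, 29), nrow = 2,
                    dimnames = list(FA = c("no", "yes"),
                                    Gender = c("female", "male")))

# Row proportions: among those with / without allergies, the share by gender
row_prop <- prop.table(fa_gender, margin = 1)
round(row_prop, 3)

# Column proportions: the allergy rate within each gender
col_prop <- prop.table(fa_gender, margin = 2)
round(col_prop, 3)
```

These match the "N / Row Total" and "N / Col Total" lines printed by CrossTable().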

table 08:

with(food_label, CrossTable(Diabetes, Gender))


 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
    Diabetes |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
          no |       102 |        77 |       179 | 
             |     1.504 |     2.712 |           | 
             |     0.570 |     0.430 |     0.305 | 
             |     0.271 |     0.368 |           | 
             |     0.174 |     0.131 |           | 
-------------|-----------|-----------|-----------|
         yes |       275 |       132 |       407 | 
             |     0.661 |     1.193 |           | 
             |     0.676 |     0.324 |     0.695 | 
             |     0.729 |     0.632 |           | 
             |     0.469 |     0.225 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

Table 08 shows that the majority of respondents (69.5%) have diabetes; about 68% of them are female.

table 09:

with(food_label, CrossTable(`Metabolic cyndrents`, Gender))


 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
                    | Gender 
Metabolic cyndrents |    female |      male | Row Total | 
--------------------|-----------|-----------|-----------|
                 no |       157 |       100 |       257 | 
                    |     0.421 |     0.759 |           | 
                    |     0.611 |     0.389 |     0.439 | 
                    |     0.416 |     0.478 |           | 
                    |     0.268 |     0.171 |           | 
--------------------|-----------|-----------|-----------|
                yes |       220 |       109 |       329 | 
                    |     0.329 |     0.593 |           | 
                    |     0.669 |     0.331 |     0.561 | 
                    |     0.584 |     0.522 |           | 
                    |     0.375 |     0.186 |           | 
--------------------|-----------|-----------|-----------|
       Column Total |       377 |       209 |       586 | 
                    |     0.643 |     0.357 |           | 
--------------------|-----------|-----------|-----------|

According to table 09, about 56% of respondents have a metabolic syndrome condition such as obesity, high blood pressure/cholesterol or heart disease.

table 10:

with(food_label, CrossTable(specific, Gender))


 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
    specific |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
          no |       228 |       131 |       359 | 
             |     0.038 |     0.068 |           | 
             |     0.635 |     0.365 |     0.613 | 
             |     0.605 |     0.627 |           | 
             |     0.389 |     0.224 |           | 
-------------|-----------|-----------|-----------|
         yes |       149 |        78 |       227 | 
             |     0.060 |     0.108 |           | 
             |     0.656 |     0.344 |     0.387 | 
             |     0.395 |     0.373 |           | 
             |     0.254 |     0.133 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

Table 10 shows that fewer than half of the respondents (39%) follow a specific diet due to pregnancy, breastfeeding, sports training or vegetarianism.

table 11:

tabyl(food_label, job1)

job1	n	percent
no	255	0.435
yes	331	0.565

tabyl(food_label, job2)

job2	n	percent
no	315	0.538
yes	271	0.462

Table 11 shows that about 56% of respondents are doctors, nurses or health care workers (job1), while about 46% are legislators related to food items or manufacturers/advertisers related to food items (job2).

table 12:

with(food_label, CrossTable(Exercise, Gender))


 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
                         | Gender 
                Exercise |    female |      male | Row Total | 
-------------------------|-----------|-----------|-----------|
                   daily |        15 |         6 |        21 | 
                         |     0.164 |     0.296 |           | 
                         |     0.714 |     0.286 |     0.036 | 
                         |     0.040 |     0.029 |           | 
                         |     0.026 |     0.010 |           | 
-------------------------|-----------|-----------|-----------|
at least 2 days per week |        57 |        41 |        98 | 
                         |     0.580 |     1.046 |           | 
                         |     0.582 |     0.418 |     0.167 | 
                         |     0.151 |     0.196 |           | 
                         |     0.097 |     0.070 |           | 
-------------------------|-----------|-----------|-----------|
                  rarely |       165 |        84 |       249 | 
                         |     0.144 |     0.260 |           | 
                         |     0.663 |     0.337 |     0.425 | 
                         |     0.438 |     0.402 |           | 
                         |     0.282 |     0.143 |           | 
-------------------------|-----------|-----------|-----------|
                   never |       140 |        78 |       218 | 
                         |     0.000 |     0.001 |           | 
                         |     0.642 |     0.358 |     0.372 | 
                         |     0.371 |     0.373 |           | 
                         |     0.239 |     0.133 |           | 
-------------------------|-----------|-----------|-----------|
            Column Total |       377 |       209 |       586 | 
                         |     0.643 |     0.357 |           | 
-------------------------|-----------|-----------|-----------|

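According to table 12, the largest group of respondents (42.5%) exercise rarely, and a further 37% never exercise. Females are slightly more likely than males to exercise daily (4.0% vs 2.9%), but males are more likely to exercise at least two days per week (19.6% vs 15.1%).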

table 13:

with(food_label, CrossTable(Health, Gender))


 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
             | Gender 
      Health |    female |      male | Row Total | 
-------------|-----------|-----------|-----------|
   excellent |         5 |         4 |         9 | 
             |     0.108 |     0.194 |           | 
             |     0.556 |     0.444 |     0.015 | 
             |     0.013 |     0.019 |           | 
             |     0.009 |     0.007 |           | 
-------------|-----------|-----------|-----------|
        good |        21 |        10 |        31 | 
             |     0.056 |     0.101 |           | 
             |     0.677 |     0.323 |     0.053 | 
             |     0.056 |     0.048 |           | 
             |     0.036 |     0.017 |           | 
-------------|-----------|-----------|-----------|
        fair |        73 |        37 |       110 | 
             |     0.070 |     0.127 |           | 
             |     0.664 |     0.336 |     0.188 | 
             |     0.194 |     0.177 |           | 
             |     0.125 |     0.063 |           | 
-------------|-----------|-----------|-----------|
        poor |       145 |        80 |       225 | 
             |     0.000 |     0.001 |           | 
             |     0.644 |     0.356 |     0.384 | 
             |     0.385 |     0.383 |           | 
             |     0.247 |     0.137 |           | 
-------------|-----------|-----------|-----------|
   can't say |       133 |        78 |       211 | 
             |     0.056 |     0.100 |           | 
             |     0.630 |     0.370 |     0.360 | 
             |     0.353 |     0.373 |           | 
             |     0.227 |     0.133 |           | 
-------------|-----------|-----------|-----------|
Column Total |       377 |       209 |       586 | 
             |     0.643 |     0.357 |           | 
-------------|-----------|-----------|-----------|

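Table 13 shows that the most common self-rated health is poor (38.4%), followed by "can't say" (36%). About 64% of those reporting poor health are female, which mirrors the share of females in the sample: 38.5% of females and 38.3% of males rate their health as poor. Only 1.5% rate their health as excellent.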

table 14:

with(food_label, CrossTable(place, Gender))


 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  586 

 
               | Gender 
         place |    female |      male | Row Total | 
---------------|-----------|-----------|-----------|
  retail shops |        71 |        31 |       102 | 
               |     0.441 |     0.795 |           | 
               |     0.696 |     0.304 |     0.174 | 
               |     0.188 |     0.148 |           | 
               |     0.121 |     0.053 |           | 
---------------|-----------|-----------|-----------|
 super markets |       174 |       102 |       276 | 
               |     0.072 |     0.129 |           | 
               |     0.630 |     0.370 |     0.471 | 
               |     0.462 |     0.488 |           | 
               |     0.297 |     0.174 |           | 
---------------|-----------|-----------|-----------|
  both equally |       132 |        76 |       208 | 
               |     0.025 |     0.044 |           | 
               |     0.635 |     0.365 |     0.355 | 
               |     0.350 |     0.364 |           | 
               |     0.225 |     0.130 |           | 
---------------|-----------|-----------|-----------|
  Column Total |       377 |       209 |       586 | 
               |     0.643 |     0.357 |           | 
---------------|-----------|-----------|-----------|

Table 14 shows that about 47% of respondents buy packaged food mainly from supermarkets, 35% from retail shops and supermarkets equally, and 17% from retail shops.

Distributions and Relationships of the sample

figure 01:

ggplot(food_label, aes(x = Gender, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   facet_grid(marital~., margins = FALSE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Marital status")

Figure 01 shows that the age distribution of married females is positively skewed, while that of married males is negatively skewed. For single respondents, the age distributions of both males and females are roughly symmetric.
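The skewness statements are read off the boxplots. A sketch of a numeric check, assuming the moment coefficient of skewness g1 = m3 / m2^(3/2) (an addition, not part of the original analysis); positive values indicate a positive (right) skew. The function name and toy data are hypothetical:

```r
library(dplyr)

# Moment coefficient of skewness: m3 / m2^(3/2)
skewness_g1 <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

# Hypothetical toy ages (NOT the study data), just to show the grouped call
toy <- tibble(
  Gender  = rep(c("female", "male"), each = 5),
  marital = "married",
  Age     = c(25, 27, 28, 30, 55,   # long right tail -> positive skew
              30, 52, 54, 55, 56)   # long left tail  -> negative skew
)

skew_by_group <- toy %>%
  group_by(marital, Gender) %>%
  summarise(skew = skewness_g1(Age), .groups = "drop")
skew_by_group
```

With the real data, the same `group_by(marital, Gender)` call on `food_label` would give one coefficient per boxplot in figure 01.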

figure 02:

ggplot(food_label, aes(x = Education, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Education level")

Figure 02 shows that the age distribution is negatively skewed only for females who passed GCE A/L.

figure 03:

ggplot(food_label, aes(x = Employment, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Employment")

Figure 03 suggests that the age distributions of both male and female students are negatively skewed, as are those of retired and unemployed males. The age distribution of males employed full time is nearly symmetric.

figure 04:

ggplot(food_label, aes(x = Income, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender and Income")

Figure 04 shows that, among males, the age distribution is positively skewed only in the highest income group.

figure 05:

ggplot(food_label, aes(x = fshopper, y = Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   facet_grid(mplanner~., margins = FALSE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  ggtitle("Distribution of Age by Gender, meal planner and food shopper")

Figure 05 shows that, for both males and females who are the major food shopper and meal planner, the age distribution is positively skewed. For females who are neither the major food shopper nor the major meal planner, it is negatively skewed.

figure 06:

p1 <- ggplot(food_label, aes(x=FA, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) + 
  theme(legend.position = "none")+
  xlab("Food allergies") +
  ylab("Age") +
  ggtitle("Distribution of Age by Food allergies")

p2 <- ggplot(food_label, aes(x=Diabetes, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  theme(legend.position = "none")+
  xlab("Diabetes") +
  ylab("Age") +
  ggtitle("Distribution of Age by Diabetes")

p3 <- ggplot(food_label, aes(x=`Metabolic cyndrents`, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  theme(legend.position = "none")+
  xlab("Metabolic syndrome") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nMetabolic syndrome")

p4 <-  ggplot(food_label, aes(x=specific, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  xlab("specific diets") +
  ylab("Age") +
  ggtitle("Distribution of Age by \n specific diets")

(p1|p2) / (p3|p4)

Figure 06 shows bimodal age distributions for food allergies, diabetes, metabolic syndrome and specific diets. Some external factor may affect food allergies. The age distribution of males who have food allergies is positively skewed.

figure 07:

p1 <- ggplot(food_label, aes(x=job1, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
   theme(legend.position = "none", axis.text.x = element_text(angle = 60, hjust = 1))+ 
  ylab("Age") +
  ggtitle("Distribution of Age by \ndoctors, nurses, \nhealth care workers jobs")

p2 <-  ggplot(food_label, aes(x=job2, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
   theme(axis.text.x = element_text(angle = 60, hjust = 1))+
  ylab("Age") +
  ggtitle("Distribution of Age by \nLegislators related to food items,\nManufactures/advertisers \nrelated to food items")

p1|p2

According to figure 07, there are bimodal age distributions in every category, which suggests that some external factor may be associated with these job types. Except for males in the job2 panels (both those who do and do not work as legislators or as manufacturers/advertisers related to food items), all groups show positively skewed age distributions.

figure 08:

ggplot(food_label, aes(x=Exercise, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) +facet_grid(Gender~., margins = FALSE) +
  xlab(" Exercise") +
  ylab("Age") +
  ggtitle("Distribution of Age by exercise")

Figure 08 suggests that men who exercise daily tend to be older, whereas the age distribution of males who never exercise is positively skewed, i.e. concentrated at younger ages. Only a small number of females exercise daily, and their age distribution is positively skewed.

figure 09:

ggplot(food_label, aes(x=place, y=Age, fill = Gender)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, width = 1) + facet_grid(Gender~., margins = FALSE) +
  xlab("Place") +
  ylab("Age") +
  ggtitle("Distribution of Age by place")

Figure 09 shows some bimodal distributions. For females, the age distribution is positively skewed for both retail shops and supermarkets.

figure 10:

ggpairs(food_label, mapping = aes(color=Gender, alpha =0.2),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by gender")

Figure 10 shows a weak positive linear relationship between age and number of children overall, whereas for females the relationship is weak and negative.
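An overall correlation and the within-group correlations can point in different directions, which is what figure 10 reports. A minimal sketch that computes them directly with dplyr, on hypothetical toy data (not the study data); ggpairs() shows the same kind of coefficients in its upper panels:

```r
library(dplyr)

# Hypothetical toy data (NOT the study data)
toy <- tibble(
  Gender   = rep(c("female", "male"), each = 4),
  Age      = c(25, 35, 45, 55, 25, 35, 45, 55),
  Children = c(3, 2, 2, 1, 0, 1, 2, 3)
)

# Pearson correlation between Age and Children, overall and within each gender
overall_r <- cor(toy$Age, toy$Children)
by_gender <- toy %>%
  group_by(Gender) %>%
  summarise(r = cor(Age, Children), n = n(), .groups = "drop")

overall_r
by_gender
```

Here the overall coefficient is positive even though the female subgroup's is negative, because the male subgroup's strong positive trend dominates.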

figure 11:

ggpairs(food_label, mapping = aes(color=marital, alpha =0.6),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by marital status")

Figure 11 shows a weak positive linear relationship between age and number of children for both single and married respondents.

figure 12:

ggpairs(food_label, mapping = aes(color=Education, alpha =0.6),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by education")

Figure 12 shows a moderate positive linear relationship between age and number of children among respondents who passed GCE A/L.

figure 12b:

ggpairs(food_label, mapping = aes(color=Exercise, alpha =0.6),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by exercise")

Here there is a moderate positive linear relationship between age and number of children for respondents who exercise daily.

Association of demographic, socio-economic and health-related factors with attitude towards food labels

figure 13:

p1 <- ggplot(food_label, aes(x=`attractive pack`, y=Age, group=`attractive pack`)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, fill = "pink", width = 1) +
  xlab("attractive pack") +
  ylab("Age") + 
  ggtitle("Distribution of Age by \nattractive pack")


p2 <- ggplot(food_label, aes(x= `hc/nutriclaims`, y=Age, group=`hc/nutriclaims`)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, fill = "pink", width = 1) +
  xlab("hc/nutriclaims") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nhc/nutriclaims")

p3 <- ggplot(food_label, aes(x= graphical, y=Age, group=graphical)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, fill = "pink", width = 1) +
  xlab("graphical label") +
  ylab("Age") +
  ggtitle("Distribution of Age by \ngraphical label")

p4 <- ggplot(food_label, aes(x= `Free/prize`, y=Age, group=`Free/prize`)) +
  geom_boxplot(outlier.size = 1, colour="black", width=0.1 ) + 
  geom_violin(alpha = 0.2, fill = "pink", width = 1) +
  xlab("Free/prize") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nFree/prize")


(p1|p2)/(p3|p4)

Figure 13 shows bimodal age distributions across the levels of attractive package, health/nutrition claims, graphical labels and free prizes. The age distribution of respondents who report a high influence of attractive packaging is positively skewed.

figure 14:

p1 <- ggplot(food_label, aes(x = `attractive pack`, y = Education, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("attractive pack and \neducation by age")

p2 <- ggplot(food_label, aes(x = `hc/nutriclaims`, y = Education, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("hc/nutriclaims and \neducation by age")

p3 <- ggplot(food_label, aes(x = graphical, y = Education, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("graphical label and \neducation by age")

p4 <- ggplot(food_label, aes(x = `Free/prize`, y = Education, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("Free/prize and \neducation by age")

(p1|p2)/(p3|p4)

According to figure 14, older respondents with a diploma or a GCE A/L pass report no influence of attractive packaging, while older degree holders report a high influence of health/nutrition claims, graphical labels and free prizes.
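One caution about figures 14 to 16: each cell of geom_raster() can contain many respondents, and tiles at the same position are drawn on top of one another, so a cell's colour reflects a single respondent's age rather than a summary. A sketch that first summarises each cell (using the mean age, which is an assumption; the median would also work) and then draws one tile per cell with geom_tile(), on hypothetical toy data:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (NOT the study data)
toy <- tibble(
  `attractive pack` = c("high", "high", "little", "little", "none", "none"),
  Education         = c("Degree", "Degree", "Diploma", "Degree", "Diploma", "Diploma"),
  Age               = c(22, 58, 40, 35, 61, 47)
)

# One row per (attractive pack, Education) cell, with mean age and cell size
cell_summary <- toy %>%
  group_by(`attractive pack`, Education) %>%
  summarise(mean_age = mean(Age), n = n(), .groups = "drop")

p <- ggplot(cell_summary,
            aes(x = `attractive pack`, y = Education, fill = mean_age)) +
  geom_tile() +
  geom_text(aes(label = n)) +   # how many respondents each tile summarises
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("attractive pack and \neducation by mean age")
p
```

Printing the cell size on each tile also shows when a colour rests on only one or two respondents.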

figure 15:

p1 <- ggplot(food_label, aes(x = `attractive pack`, y = Employment, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("attractive pack and \nemployment by age")

p2 <- ggplot(food_label, aes(x = `hc/nutriclaims`, y = Employment, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("hc/nutriclaims and \nemployment by age")

p3 <- ggplot(food_label, aes(x = graphical, y = Employment, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("graphical label and \nemployment by age")

p4 <- ggplot(food_label, aes(x = `Free/prize`, y = Employment, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("Free/prize and \nemployment by age")

(p1|p2)/(p3|p4)

Figure 15 shows that students report a high influence of attractive packaging, health/nutrition claims, graphical labels and free prizes. The same holds for part-time employees aged between 50 and 60.

figure 16:

p1 <- ggplot(food_label, aes(x = `attractive pack`, y = Income, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("attractive pack and \nincome by age")

p2 <- ggplot(food_label, aes(x = `hc/nutriclaims`, y = Income, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("hc/nutriclaims and \nincome by age")

p3 <- ggplot(food_label, aes(x = graphical, y = Income, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("graphical label and \nincome by age")

p4 <- ggplot(food_label, aes(x = `Free/prize`, y = Income, fill = Age)) +
  geom_raster() + theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("Free/prize and \nincome by age")

(p1|p2)/(p3|p4)

Figure 16 suggests that respondents aged 20 to 30 who earn less than Rs: 20000 are not influenced by graphical labels or free prizes, and that older respondents earning Rs: 35000 - Rs: 49999 are not influenced by attractive packaging.

figure 17:

ggpairs(food_label, mapping = aes(color=`attractive pack`, alpha =0.2),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by attractive pack")

Figure 17 shows no strong linear relationship between age and number of children in any group; the only notable pattern is a weak negative linear relationship among respondents for whom attractive packaging has little influence.

figure 18:

ggpairs(food_label, mapping = aes(color=`hc/nutriclaims`, alpha =0.2),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by hc/nutriclaims")

Figure 18 shows a weak positive linear relationship between age and number of children in every category.

figure 19:

 ggpairs(food_label, mapping = aes(color=graphical, alpha =0.2),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by graphical label")

Figure 19 shows no strong linear relationships.

figure 20:

 ggpairs(food_label, mapping = aes(color=`Free/prize`, alpha =0.2),
        columns =c ("Age", "Children"))+
  ggtitle("Scatter plot matrix by Free/prize")

Figure 20 shows no strong linear relationships.

Before performing the statistical tests, we omit observations with missing values.

food_label <- na.omit(food_label)

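To identify associations between two categorical variables X and Y, we use Pearson's chi-square test of independence.

H0: There is no association between X and Y
H1: There is an association between X and Y

Level of significance (α) = 0.05

Test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $O_{ij}$ is the observed count in cell $(i, j)$ and $E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$ is the expected count under H0. Under H0 the statistic approximately follows a chi-square distribution with $(r-1)(c-1)$ degrees of freedom.

Decision rule: if p-value ≤ α, we reject H0 at the 5% level of significance.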
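To make the formula concrete, a base-R sketch that computes the statistic by hand for the food allergies by gender counts of table 07 and compares it with `chisq.test()`. Note that `chisq.test()` applies Yates' continuity correction to 2 x 2 tables by default, so `correct = FALSE` is needed for the plain Pearson statistic:

```r
# 2 x 2 counts from table 07 (rows: FA, columns: Gender)
observed <- matrix(c(336, 41, 180, 29), nrow = 2,
                   dimnames = list(FA = c("no", "yes"),
                                   Gender = c("female", "male")))

# Expected counts under H0: (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic and its p-value with (r - 1)(c - 1) degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_val  <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Same test with the built-in function, without the continuity correction
builtin <- chisq.test(observed, correct = FALSE)
c(manual = chi_sq, builtin = unname(builtin$statistic))
```

The statistic, about 1.15, equals the sum of the "Chi-square contribution" entries printed in table 07 (0.049 + 0.088 + 0.361 + 0.652).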

chiTable <- hux(
  Variable = c("Gender", "Education", "Employment", "Income", "Housesize",
               "marital", "fshopper", "mplanner", "FA", "Diabetes",
               "Metabolic_syndrome", "Other", "specific", "job1", "job2",
               "Exercise", "Health", "place"),
  `attractive pack` = c(0.7871, 0.8349, 0.8234, 0.8005, 0.7203, 0.4954, 0.9025, 0.688, 0.02956, 0.878,
                        0.1485, 0.3106, 0.5562, 0.09352, 0.9347, 0.9524, 0.1129, 0.5621),
  `hc/nutriclaims`  = c(0.9223, 0.9987, 0.8875, 0.8188, 0.8461, 0.5314, 0.7948, 0.3653, 0.01645, 0.6785,
                        0.4165, 0.3762, 0.811, 0.7795, 0.9699, 0.5989, 0.1845, 0.4815),
  graphical         = c(0.07883, 0.005332, 0.1171, 0.1084, 0.4267, 0.6012, 0.6359, 0.7032, 0.7903, 0.4309,
                        0.8832, 0.1234, 0.2806, 0.02592, 0.4928, 0.1685, 0.1354, 0.6253),
  `Free/prize`      = c(0.0000684, 0.05121, 0.001036, 0.3894, 0.567, 0.6864, 0.3145, 0.8791, 0.7258,
                        0.3818, 0.8952, 0.4009, 0.4923, 0.05369, 0.2724, 0.3076, 0.1766, 0.626),
  add_colnames = TRUE
)

table 16:

caption(chiTable) <- "Chi square statistics - p values"
bold(chiTable)[1, ] <- TRUE
bottom_border(chiTable)[1, ] <- TRUE

chiTable

Chi square statistics - p values
Variable	attractive pack	hc/nutriclaims	graphical	Free/prize
Gender	0.787	0.922	0.0788	6.84e-05
Education	0.835	0.999	0.00533	0.0512
Employment	0.823	0.887	0.117	0.00104
Income	0.8	0.819	0.108	0.389
Housesize	0.72	0.846	0.427	0.567
marital	0.495	0.531	0.601	0.686
fshopper	0.902	0.795	0.636	0.314
mplanner	0.688	0.365	0.703	0.879
FA	0.0296	0.0164	0.79	0.726
Diabetes	0.878	0.678	0.431	0.382
Metabolic_syndrome	0.148	0.416	0.883	0.895
Other	0.311	0.376	0.123	0.401
specific	0.556	0.811	0.281	0.492
job1	0.0935	0.779	0.0259	0.0537
job2	0.935	0.97	0.493	0.272
Exercise	0.952	0.599	0.169	0.308
Health	0.113	0.184	0.135	0.177
place	0.562	0.481	0.625	0.626
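The p-values in Table 16 are typed into hux() by hand. A sketch of computing them directly from the data instead, assuming each variable is a column of the data frame; chi_pvalues() is a hypothetical helper, and the toy data below are random, so the p-values it prints carry no meaning.

```r
# Hypothetical helper: chi-square p-value for each predictor against one response
chi_pvalues <- function(data, response, predictors) {
  sapply(predictors, function(v) {
    tab <- table(data[[v]], data[[response]])  # contingency table of counts
    chisq.test(tab)$p.value                    # warns if expected counts are small
  })
}

# Toy data (hypothetical), generated at random
set.seed(1)
toy <- data.frame(
  Gender    = sample(c("Male", "Female"), 100, replace = TRUE),
  Education = sample(c("O/L", "A/L", "Degree"), 100, replace = TRUE),
  graphical = sample(c("Yes", "No"), 100, replace = TRUE)
)

p <- chi_pvalues(toy, "graphical", c("Gender", "Education"))
round(p, 4)
```

With the survey data, something like chi_pvalues(food_label, "graphical", c("Gender", "Education")) would return one p-value per predictor, provided those column names exist in food_label. Note that chisq.test() applies Yates' continuity correction to 2 x 2 tables by default.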

According to Table 16:

Food allergies and attractive package-

Since p-value = 0.02956 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Food allergies and attractive package are dependent.

Food allergies and health claims/nutri claims-

Since p-value = 0.01645 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Food allergies and health claims/nutri claims are dependent.

Education level and graphical-

Since p-value = 0.005332 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Education level and graphical labels are dependent.

Gender and Free/Prize-

Since p-value = 0.0000684 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Gender and Free/Prize are dependent.

Employment level and Free/Prize-

Since p-value = 0.001036 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Employment level and Free/Prize are dependent.

Job1 and graphical label-

Since p-value = 0.02592 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Job1 (doctors, nurses, health care workers) and graphical labels are dependent.
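Each statement above applies the decision rule p ≤ α to one cell of Table 16. The same rule can be applied to the whole table at once. This sketch copies the p-values from Table 16 and, as an assumption for convenience, renames the columns to syntactic names (attractive_pack, hc_nutriclaims, free_prize).

```r
alpha <- 0.05
vars <- c("Gender", "Education", "Employment", "Income", "Housesize",
          "marital", "fshopper", "mplanner", "FA", "Diabetes",
          "Metabolic_syndrome", "Other", "specific", "job1", "job2",
          "Exercise", "Health", "place")

# p-values copied from Table 16
p16 <- data.frame(
  attractive_pack = c(0.7871, 0.8349, 0.8234, 0.8005, 0.7203, 0.4954, 0.9025, 0.688, 0.02956,
                      0.878, 0.1485, 0.3106, 0.5562, 0.09352, 0.9347, 0.9524, 0.1129, 0.5621),
  hc_nutriclaims  = c(0.9223, 0.9987, 0.8875, 0.8188, 0.8461, 0.5314, 0.7948, 0.3653, 0.01645,
                      0.6785, 0.4165, 0.3762, 0.811, 0.7795, 0.9699, 0.5989, 0.1845, 0.4815),
  graphical       = c(0.07883, 0.005332, 0.1171, 0.1084, 0.4267, 0.6012, 0.6359, 0.7032, 0.7903,
                      0.4309, 0.8832, 0.1234, 0.2806, 0.02592, 0.4928, 0.1685, 0.1354, 0.6253),
  free_prize      = c(0.0000684, 0.05121, 0.001036, 0.3894, 0.567, 0.6864, 0.3145, 0.8791, 0.7258,
                      0.3818, 0.8952, 0.4009, 0.4923, 0.05369, 0.2724, 0.3076, 0.1766, 0.626),
  row.names = vars
)

# For each label column, list the variables for which H0 is rejected (p <= alpha)
significant <- lapply(p16, function(p) rownames(p16)[p <= alpha])
significant
```

Education (p = 0.05121) and job1 (p = 0.05369) fall just above 0.05 for Free/prize, so H0 is not rejected for those pairs.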

Association between demographic, socio-economic characteristics and health related factors on awareness of information printed in food labels.

figure 21:

ggplot(food_label, aes(x=netquan, y=Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("Net quantity") +
  ylab("Age") + 
  ggtitle("Distribution of Age by \nnet quantity and gender")

According to Figure 21, most young males gave the incorrect answer about net quantity. Males show a negatively skewed age distribution across the net quantity responses.
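The skewness statements read from the box plots can be cross-checked numerically. A minimal sketch of the moment-based sample skewness coefficient, which is positive for a long right tail and negative for a long left tail; skewness() here is a hypothetical helper written for illustration, and the age vectors are toy values.

```r
# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # divide-by-n standard deviation
  mean((x - m)^3) / s^3       # mean cubed deviation, scaled by s^3
}

ages_sym   <- c(20, 30, 40, 50, 60)  # symmetric
ages_right <- c(20, 21, 22, 23, 60)  # long right tail
ages_left  <- c(20, 57, 58, 59, 60)  # long left tail

res <- sapply(list(sym = ages_sym, right = ages_right, left = ages_left), skewness)
res
```

Applied per group, for example tapply(food_label$Age, list(food_label$netquan, food_label$Gender), skewness), it would give one coefficient per box in Figure 21, assuming those columns exist as used in the plotting code.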

figure 22:

ggplot(food_label, aes(x= `low in fat`, y=Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("low in fat") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nlow in fat and gender")

Figure 22 shows that young males do not know about the low in fat label. For all answers, females show a positively skewed age distribution and males show a negatively skewed one.

figure 23:

ggplot(food_label, aes(x = `low in cho`, y = Age, fill = Gender)) +
  geom_boxplot(size = .75) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  xlab("low in cholesterol") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nlow in cholesterol and gender")

Here, females show a positively skewed age distribution for all answers. Both females and males show a positively skewed distribution for the correct answer, which is not more than 0.02 g per 100 g.

figure 24:

ggplot(food_label, aes(x= sodium, y=Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("sodium") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nsodium and gender")

Figure 24 shows that younger females and middle-aged males are aware of the sodium label. They show a positively skewed age distribution for the correct answer, which is low in sodium.

figure 25:

ggplot(food_label, aes(x= `e labels` , y=Age, fill = Gender)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("E code label") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nE code label and gender")

According to Figure 25, middle-aged females show a positively skewed distribution and middle-aged males show a negatively skewed distribution for E code labels.

figure 26:

ggplot(food_label, aes(x=netquan, y=Age, fill = Education)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("Net quantity") +
  ylab("Age") + 
  ggtitle("Distribution of Age by \nnetquan and education")

Figure 26 shows that younger degree-qualified respondents know about the net quantity label, and their age distribution is nearly symmetric. Middle-aged respondents who did not sit the O/L show a positively skewed distribution for the incorrect answer (the second category).

figure 27:

ggplot(food_label, aes(x= `low in fat`, y=Age, fill = Education)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("low in fat") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nlow in fat and education")

According to Figure 27, most respondents know about the low in fat label. Compared with Diploma and Degree holders, very few respondents in the below O/L, passed O/L and passed A/L groups do not know about the low in fat label. Diploma-qualified respondents show an approximately symmetric age distribution.

figure 28:

ggplot(food_label, aes(x = `low in cho`, y = Age, fill = Education)) +
  geom_boxplot(size = .75) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  xlab("low in cholesterol") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nlow in cholesterol and education")

It seems that older respondents who did not sit the O/L are aware of the low in cholesterol label. Most respondents do not know about the low in cholesterol label. Respondents who passed the GCE O/L and those who did not sit it have nearly the same, approximately symmetric, distribution.

figure 29:

ggplot(food_label, aes(x= sodium, y=Age, fill = Education)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("sodium") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nsodium and education")

According to Figure 29, respondents with below O/L or passed O/L qualifications do not know about the sodium label. Diploma and Degree holders show a positively skewed age distribution for the correct answer.

figure 30:

ggplot(food_label, aes(x= `e labels` , y=Age, fill = Education)) +  
  geom_boxplot(size = .75) +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))+
  xlab("E code label") +
  ylab("Age") +
  ggtitle("Distribution of Age by \nE code label and education")

It seems that all education levels show a positively skewed age distribution for awareness of the E code label.

figure 31:

ggpairs(food_label, mapping = aes(color = netquan, alpha = 0.2),
        columns = c("Age", "Children")) +
  ggtitle("Scatter plot matrix by netquan")

There are no strong linear relationships.

figure 32:

ggpairs(food_label, mapping = aes(color = `low in fat`, alpha = 0.2),
        columns = c("Age", "Children")) +
  ggtitle("Scatter plot matrix by low in fat")

There are no strong linear relationships.

figure 33:

ggpairs(food_label, mapping = aes(color = `low in cho`, alpha = 0.2),
        columns = c("Age", "Children")) +
  ggtitle("Scatter plot matrix by low in cholesterol")

figure 34:

ggpairs(food_label, mapping = aes(color = sodium, alpha = 0.2),
        columns = c("Age", "Children")) +
  ggtitle("Scatter plot matrix by sodium")

There are no strong linear relationships.

figure 35:

ggpairs(food_label, mapping = aes(color = `e labels`, alpha = 0.2),
        columns = c("Age", "Children")) +
  ggtitle("Scatter plot matrix by e code labels")

There are no strong linear relationships.

chiTable2 <- hux(
  Variable = c("Gender", "Education", "Employment", "Income", "Housesize",
               "marital", "fshopper", "mplanner", "FA", "Diabetes",
               "Metabolic_syndrome", "Other", "specific", "job1", "job2",
               "Exercise", "Health", "place"),
  netquan      = c(0.0000022, 0.02227, 0.2817, 0.2911, 0.8782, 0.8696, 0.2416, 0.00002215,
                   0.1918, 0.09066, 0.03316, 0.2741, 0.2146, 0.359, 0.9124, 0.3473, 0.2398, 0.2362),
  `low in fat` = c(0.6385, 0.09762, 0.01041, 0.41, 0.171, 0.2026, 0.8417, 0.8827, 0.1397, 0.8424,
                   0.9086, 0.7869, 0.6499, 0.4867, 0.5765, 0.4409, 0.7246, 0.7535),
  `low in cho` = c(0.2424, 0.8388, 0.5437, 0.6919, 0.1802, 0.5666, 0.1249, 0.9328, 0.5077, 0.1007, 0.4358,
                   0.296, 0.5847, 0.7082, 0.7609, 0.9345, 0.1352, 0.557),
  sodium       = c(0.01948, 0.05121, 0.001036, 0.3894, 0.4591, 0.5783, 0.6344, 0.105, 0.08301, 0.6761,
                   0.4012, 0.3476, 0.1286, 0.7007, 0.7472, 0.3076, 0.1766, 0.626),
  `e labels`   = c(0.0318, 0.000000003251, 0.1029, 0.03139, 0.8, 0.05442, 0.3435, 0.7828, 1, 0.4325, 0.1639,
                   0.8746, 0.7086, 0.007165, 0.8449, 0.0003601, 0.901, 0.04643),
  add_colnames = TRUE
)

table 17:

caption(chiTable2) <- "Chi square statistics - p values"
bold(chiTable2)[1, ] <- TRUE
bottom_border(chiTable2)[1, ] <- TRUE

chiTable2

Chi square statistics - p values
Variable	netquan	low in fat	low in cho	sodium	e labels
Gender	2.2e-06	0.638	0.242	0.0195	0.0318
Education	0.0223	0.0976	0.839	0.0512	3.25e-09
Employment	0.282	0.0104	0.544	0.00104	0.103
Income	0.291	0.41	0.692	0.389	0.0314
Housesize	0.878	0.171	0.18	0.459	0.8
marital	0.87	0.203	0.567	0.578	0.0544
fshopper	0.242	0.842	0.125	0.634	0.344
mplanner	2.21e-05	0.883	0.933	0.105	0.783
FA	0.192	0.14	0.508	0.083	1
Diabetes	0.0907	0.842	0.101	0.676	0.432
Metabolic_syndrome	0.0332	0.909	0.436	0.401	0.164
Other	0.274	0.787	0.296	0.348	0.875
specific	0.215	0.65	0.585	0.129	0.709
job1	0.359	0.487	0.708	0.701	0.00717
job2	0.912	0.577	0.761	0.747	0.845
Exercise	0.347	0.441	0.934	0.308	0.00036
Health	0.24	0.725	0.135	0.177	0.901
place	0.236	0.753	0.557	0.626	0.0464
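hux() accepts any number, so a p-value typed without its decimal point (R reads 05666 as 5666) passes silently into the table. A range check before building the table catches such entries. invalid_pvalues() is a hypothetical helper; the two vectors are the first seven low in cho values, with the decimal point of the sixth entry missing and restored.

```r
# Hypothetical helper: positions of entries that are not valid p-values
invalid_pvalues <- function(p) which(is.na(p) | p < 0 | p > 1)

# Sixth value typed without its decimal point; R reads 05666 as 5666
typed <- c(0.2424, 0.8388, 0.5437, 0.6919, 0.1802, 05666, 0.1249)
invalid_pvalues(typed)  # position 6

# With the decimal point restored, no entries are flagged
fixed <- c(0.2424, 0.8388, 0.5437, 0.6919, 0.1802, 0.5666, 0.1249)
invalid_pvalues(fixed)
```

A line such as stopifnot(length(invalid_pvalues(x)) == 0) placed before each hux() call would stop the report from rendering with an impossible p-value.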

According to Table 17:

Net quantity and Gender-

Since p-value = 2.2e-06 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Net quantity and Gender are dependent.

Net quantity and Education-

Since p-value = 0.0223 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Net quantity and Education are dependent.

Net quantity and major meal planner-

Since p-value = 2.21e-05 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Net quantity and major meal planner are dependent.

Net quantity and metabolic syndrome-

Since p-value = 0.0332 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Net quantity and metabolic syndrome are dependent.

Low in fat and Employment level-

Since p-value = 0.0104 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Low in fat and Employment level are dependent.

Sodium and Gender-

Since p-value = 0.01948 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Sodium and Gender are dependent.

Sodium and Employment level-

Since p-value = 0.001036 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. Sodium and Employment level are dependent.

Sodium and food allergies-

Since p-value = 0.08301 > 0.05, we do not have enough evidence to reject H0 at the 5% level of significance. There is no significant association between sodium and food allergies.

E label and Gender-

Since p-value = 0.0318 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. E label and Gender are dependent.

E label and Education-

Since p-value = 3.25e-09 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. E label and Education are dependent.

E label and Income-

Since p-value = 0.03139 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. E label and Income are dependent.

E label and Job 1-

Since p-value = 0.007165 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. E label and Job 1 are dependent.

E label and Exercise-

Since p-value = 0.0003601 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. E label and Exercise are dependent.

E label and place-

Since p-value = 0.04643 < 0.05, we have enough evidence to reject H0 at the 5% level of significance. E label and place are dependent.

Conclusion

Using the chi-square test, we find that educational level and working as a doctor, nurse or health care worker (job1) are associated with attitude towards graphical labels. Food allergies are associated with attitudes towards attractive packaging and towards health/nutrition claims, and gender and employment level are associated with attitude towards free items/prizes. For awareness of label information, the net quantity label is associated with gender, education level, being the major meal planner and metabolic syndrome, and the low in fat label is associated with employment level. Awareness of the sodium label is associated with gender and employment level; its association with food allergies is not significant at the 5% level (p = 0.083). Gender, education, income, job1 (doctors, nurses, health care workers), exercise and place of purchase are associated with awareness of E code labels.


DATA ANALYSIS ON FOOD LABELS
