Flu Analysis: Exploration

Load the data and packages:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.1     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
here() starts at C:/Users/Raquel/GitHub/MADA/RaquelFrancisco-MADA-portfolio
library(ggplot2)

data <- readRDS(here('fluanalysis/data/SypAct_clean.rds'))
tibble(data) # to get a look at the data
# A tibble: 730 × 26
   SwollenLymphNodes ChestCongestion ChillsSweats NasalCongestion Sneeze Fatigue
   <fct>             <fct>           <fct>        <fct>           <fct>  <fct>  
 1 Yes               No              No           No              No     Yes    
 2 Yes               Yes             No           Yes             No     Yes    
 3 Yes               Yes             Yes          Yes             Yes    Yes    
 4 Yes               Yes             Yes          Yes             Yes    Yes    
 5 Yes               No              Yes          No              No     Yes    
 6 No                No              Yes          No              Yes    Yes    
 7 No                No              Yes          No              No     Yes    
 8 No                Yes             Yes          Yes             Yes    Yes    
 9 Yes               Yes             Yes          Yes             No     Yes    
10 No                Yes             No           Yes             No     Yes    
# ℹ 720 more rows
# ℹ 20 more variables: SubjectiveFever <fct>, Headache <fct>, Weakness <fct>,
#   CoughIntensity <fct>, Myalgia <fct>, RunnyNose <fct>, AbPain <fct>,
#   ChestPain <fct>, Diarrhea <fct>, EyePn <fct>, Insomnia <fct>,
#   ItchyEye <fct>, Nausea <fct>, EarPn <fct>, Pharyngitis <fct>,
#   Breathless <fct>, ToothPn <fct>, Vomit <fct>, Wheeze <fct>, BodyTemp <dbl>
str(data)
'data.frame':   730 obs. of  26 variables:
 $ SwollenLymphNodes: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 2 1 ...
  ..- attr(*, "label")= chr "Swollen Lymph Nodes"
 $ ChestCongestion  : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 2 2 2 ...
  ..- attr(*, "label")= chr "Chest Congestion"
 $ ChillsSweats     : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 2 2 2 1 ...
  ..- attr(*, "label")= chr "Chills/Sweats"
 $ NasalCongestion  : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 2 2 2 ...
  ..- attr(*, "label")= chr "Nasal Congestion"
 $ Sneeze           : Factor w/ 2 levels "No","Yes": 1 1 2 2 1 2 1 2 1 1 ...
  ..- attr(*, "label")= chr "Sneeze"
 $ Fatigue          : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "label")= chr "Fatigue"
 $ SubjectiveFever  : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 1 ...
  ..- attr(*, "label")= chr "Subjective Fever"
 $ Headache         : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 1 2 2 2 ...
  ..- attr(*, "label")= chr "Headache"
 $ Weakness         : Factor w/ 4 levels "None","Mild",..: 2 4 4 4 3 3 2 4 3 3 ...
 $ CoughIntensity   : Factor w/ 4 levels "None","Mild",..: 4 4 2 3 1 3 4 3 3 3 ...
  ..- attr(*, "label")= chr "Cough Severity"
 $ Myalgia          : Factor w/ 4 levels "None","Mild",..: 2 4 4 4 2 3 2 4 3 2 ...
 $ RunnyNose        : Factor w/ 2 levels "No","Yes": 1 1 2 2 1 1 2 2 2 2 ...
  ..- attr(*, "label")= chr "Runny Nose"
 $ AbPain           : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
  ..- attr(*, "label")= chr "Abdominal Pain"
 $ ChestPain        : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 2 2 1 1 1 ...
  ..- attr(*, "label")= chr "Chest Pain"
 $ Diarrhea         : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ EyePn            : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 1 1 1 1 ...
  ..- attr(*, "label")= chr "Eye Pain"
 $ Insomnia         : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 1 1 2 2 2 ...
  ..- attr(*, "label")= chr "Sleeplessness"
 $ ItchyEye         : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "label")= chr "Itchy Eyes"
 $ Nausea           : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 1 1 2 2 ...
 $ EarPn            : Factor w/ 2 levels "No","Yes": 1 2 1 2 1 1 1 1 1 1 ...
  ..- attr(*, "label")= chr "Ear Pain"
 $ Pharyngitis      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 1 1 1 ...
  ..- attr(*, "label")= chr "Sore Throat"
 $ Breathless       : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 2 1 1 1 2 ...
  ..- attr(*, "label")= chr "Breathlessness"
 $ ToothPn          : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 2 1 ...
  ..- attr(*, "label")= chr "Tooth Pain"
 $ Vomit            : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 1 ...
  ..- attr(*, "label")= chr "Vomiting"
 $ Wheeze           : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 2 1 1 1 1 ...
  ..- attr(*, "label")= chr "Wheezing"
 $ BodyTemp         : num  98.3 100.4 100.8 98.8 100.5 ...

Minimum requirements:

For each (important) variable, produce and print some numerical output (e.g. a table or some summary statistics):

  • Myalgia
  • Cough Intensity
  • Weakness
  • Chills
  • Fatigue
  • Headache
  • Vision

For each (important) continuous variable, create a histogram or density plot:

  • Body Temperature

Create scatterplots or boxplots or similar plots for the variable you decided is your main outcome of interest and the most important (or all depending on number of variables) independent variables/predictors.

Summary Statistics for Categorical Data

table(data$Myalgia)

    None     Mild Moderate   Severe 
      79      213      325      113 
table(data$CoughIntensity)

    None     Mild Moderate   Severe 
      47      154      357      172 
table(data$Weakness)

    None     Mild Moderate   Severe 
      49      223      338      120 
par(mfrow=c(1,3)) # show the following plots side by side
barplot(table(data$Myalgia), ylab = 'Count', xlab = 'Myalgia', ylim = c(0,400)) # bars show counts; 400 leaves room for the tallest bar (357)
barplot(table(data$CoughIntensity), xlab = 'Cough Intensity', ylim = c(0,400))
barplot(table(data$Weakness), xlab = 'Weakness', ylim = c(0,400))
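
The raw counts above may be easier to compare as percentages. The following is a minimal sketch that re-enters the Myalgia counts printed above by hand; the names `myalgia_counts` and `myalgia_pct` are introduced here for illustration only. On the real data, `prop.table(table(data$Myalgia))` should give the same shares.

```r
# Counts copied from the table(data$Myalgia) output above
myalgia_counts <- c(None = 79, Mild = 213, Moderate = 325, Severe = 113)

# Convert counts to percentages of all 730 observations
myalgia_pct <- round(100 * prop.table(myalgia_counts), 1)
print(myalgia_pct)
```

The same two lines can be repeated for CoughIntensity and Weakness, since all three variables share the None/Mild/Moderate/Severe levels.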

Summary Statistics for Binary Data

table(data$ChillsSweats)

 No Yes 
130 600 
table(data$Fatigue)

 No Yes 
 64 666 
table(data$Headache)

 No Yes 
115 615 
table(data$Vision) # the cleaned data has no Vision column (see str(data) above), so this table is empty
< table of extent 0 >
table(data$Nausea)

 No Yes 
475 255 
par(mfrow=c(1,4)) # show the following four plots side by side
barplot(table(data$ChillsSweats), ylab = 'Count', xlab = 'Chills or Sweats', ylim = c(0,800))
barplot(table(data$Fatigue), xlab = 'Fatigue', ylim = c(0,800))
barplot(table(data$Headache), xlab = 'Headache', ylim = c(0,800))
barplot(table(data$Nausea), xlab = 'Nausea', ylim = c(0,800))
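
Because `table()` on a column that does not exist silently returns an empty table, it can help to check column names before tabulating. The sketch below is one possible way to do that, not part of the original analysis. It uses a small stand-in data frame (`toy`) built from the first five rows shown in the `str(data)` output; `toy`, `wanted`, `missing_cols`, `present_cols` and `yes_counts` are all hypothetical names.

```r
# Toy stand-in for the real data: the first five rows of four binary
# symptoms, decoded from the str(data) output above (1 = "No", 2 = "Yes")
yn <- c("No", "Yes")
toy <- data.frame(
  ChillsSweats = factor(yn[c(1, 1, 2, 2, 2)], levels = yn),
  Fatigue      = factor(yn[c(2, 2, 2, 2, 2)], levels = yn),
  Headache     = factor(yn[c(2, 2, 2, 2, 2)], levels = yn),
  Nausea       = factor(yn[c(1, 1, 2, 2, 2)], levels = yn)
)

# Variables we would like to summarize; "Vision" is not a column
wanted <- c("ChillsSweats", "Fatigue", "Headache", "Vision", "Nausea")
missing_cols <- setdiff(wanted, names(toy))
present_cols <- intersect(wanted, names(toy))

# Report missing columns instead of silently tabulating nothing
if (length(missing_cols) > 0) {
  message("Not in the data: ", paste(missing_cols, collapse = ", "))
}

# Number of "Yes" answers for each column that is present
yes_counts <- vapply(toy[present_cols], function(x) sum(x == "Yes"), integer(1))
print(yes_counts)
```

Replacing `toy` with `data` should apply the same check to the full data set.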

Summary Statistics for Continuous Data

summary(data$BodyTemp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  97.20   98.20   98.50   98.94   99.30  103.10 
hist(data$BodyTemp) #histogram

d <- density(data$BodyTemp)
plot(d, main="Flu Body Temperature") #Density plot
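
The histogram above relies on the default bin choice of `hist()`. As a hedged sketch, explicit half-degree bins spanning the observed range (97.2 to 103.1 in the summary above) make the bin edges easier to read. The values in `temps` are only the first five BodyTemp values printed by `str(data)`; the names `temps`, `breaks`, `h` and `bins`, and the 0.5 bin width, are assumptions made for illustration.

```r
# First five body temperatures from the str(data) output above
temps <- c(98.3, 100.4, 100.8, 98.8, 100.5)

# Half-degree bins covering the observed minimum (97.2) and maximum (103.1)
breaks <- seq(97, 103.5, by = 0.5)

# plot = FALSE returns the bin counts without drawing anything
h <- hist(temps, breaks = breaks, plot = FALSE)

# Show each bin with its count
bins <- data.frame(lower = head(h$breaks, -1), upper = h$breaks[-1], count = h$counts)
print(bins)
```

On the full data, `hist(data$BodyTemp, breaks = seq(97, 103.5, by = 0.5))` would draw the plot with the same bins.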

Visualizing Data Interactions

ggplot(data = data) +
  geom_boxplot(aes(x = Myalgia, y = BodyTemp))

ggplot(data = data) +
  geom_boxplot(aes(x = CoughIntensity, y = BodyTemp))

ggplot(data = data) +
  geom_boxplot(aes(x = Weakness, y = BodyTemp))
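
The three boxplots above share the same outcome and the same severity scale, so they could also be drawn as panels of a single figure. The sketch below shows one way to do this with `tidyr::pivot_longer()` and `facet_wrap()`. It runs on a five-row stand-in (`toy`) decoded from the `str(data)` output; `toy`, `long`, `p`, `Symptom` and `Severity` are names introduced here, not part of the original analysis.

```r
library(ggplot2)
library(tidyr)

# First five rows of the relevant columns, decoded from str(data) above
sev <- c("None", "Mild", "Moderate", "Severe")
toy <- data.frame(
  Myalgia        = factor(sev[c(2, 4, 4, 4, 2)], levels = sev),
  CoughIntensity = factor(sev[c(4, 4, 2, 3, 1)], levels = sev),
  Weakness       = factor(sev[c(2, 4, 4, 4, 3)], levels = sev),
  BodyTemp       = c(98.3, 100.4, 100.8, 98.8, 100.5)
)

# Stack the three symptom columns into one Symptom/Severity pair
long <- pivot_longer(
  toy,
  cols = c(Myalgia, CoughIntensity, Weakness),
  names_to = "Symptom",
  values_to = "Severity"
)

# One boxplot panel per symptom, all sharing the BodyTemp axis
p <- ggplot(long, aes(x = Severity, y = BodyTemp)) +
  geom_boxplot() +
  facet_wrap(~ Symptom)
print(p)
```

With only five rows the boxes are not informative; swapping `toy` for `data` should reproduce the three plots above in a single figure.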