19  :books: Malaria case study

19.1 Introduction

19.1.1 Overview

These pages will demonstrate how to use Quarto to data from Tanzania.

19.1.2 Learning objectives

  • Apply what you have learnt on Day 1 on real data

19.2 Getting started

19.2.1 Access the Quarto template

Download the Quarto template used for this case study (add link) using GitHub.

Please review previous sections on Quarto, data import and manipulation.

19.2.2 Install packages

```{r}
install.packages("ggplot2")
install.packages("ggthemes")
install.packages("networkD3")
install.packages("apyramid")
```
```{r}
library(openxlsx)
library(dplyr)
library(skimr)
library(gtsummary)
library(finalfit)
library(ggplot2)
library(ggthemes)
library(networkD3) # For alluvial/Sankey diagrams
library(tidyverse)
```

19.2.3 Dataset description

We will be using data and examples from a real consultation data which occurred in Tanzania between 2021-07-29 and 2021-12-17 within the Integrated Management of Childhood Illness (TIMCI) project.

Important

Data are made available by the Ifakara Health Institute (IHI) for training purposes only. Please note, that some data has been adapted in order to best achieve training objectives. No personally indentifiable information have been kept in this dataset.

Information about the consultations of 10,308 children [1 day - 59 months] from 18 facilities (dispensaries and health centres) in Kaliua District, Sengerema District and Tanga District, Tanzania.

(a)

(b)

(c)

(d)

Figure 19.1: Study area

19.2.4 Data collection

Data were collected using ODK (ODK Collect, ODK Central) between 2021-07-29 and 2021-12-17. Research assistants recorded the following information from different sources.

Types and sources of information
Information Prefix Source
Context CTX Metadata
Sociodemographics SDC Caregiver
Clinical presentation CLIN Caregiver
Laboratory investigations TEST Child booklet or facility MTUHA book
Diagnoses DX Child booklet or facility MTUHA book
Treatments RX
  • before consultation
Caregiver
  • as a result of the consultation
Child booklet or facility MTUHA book
Referral advice MGMT
Caregiver
Child booklet or facility MTUHA book

19.2.5 Data preparation

Data cleaning and data de-identification

Personally identifiable information (PII) were removed.

19.3 Population characteristics

19.3.1 Codebook

Variable Coding
SDC_age_in_month
SDC_sex 1: male
2: female
98: unknown
CLIN_fever 0: no
1: yes
98: not sure
CLIN_fever_onset
CLIN_cough 0: no
1: yes
98: not sure
CLIN_diarrhoea 0: no
1: yes
98: not sure
RX_preconsult_antibiotics
RX_preconsult_antimalarials
CONSULT_district Kaliua
Sengerema
Tanga
CONSULT_area urban
rural
CONSULT_facility_type dispensary
health centre

How many variables are numerical? How many features are categorical?

Note

A numerical variable is a quantity represented by a real or integer number.

A categorical variable has discrete values, typically represented by string labels (but not only) taken from a finite list of possible choices.

For instance, the variable native-country in our dataset is a categorical variable because it encodes the data using a finite list of possible countries (along with the ? symbol when this information is missing)