24 2022 Tanzania
24.1 Learning objectives
Two complementary aspects of moving into data science are:
- the mindset: how scientists think about data and collaborate around it, and
- the skillset, which is composed of an ecosystem of tools (mostly open-source) and practices.
Upon completing the workshop, participants will have gained:
- exposure to the data science approach, tools and collaborative practices
- hands-on experience of interfacing between Stata and R, of the basics of working with data in R/RStudio, and of incrementally incorporating R into existing data analysis workflows in Stata. The idea is not to replace everything you do in Stata with R, but to enable you to continue learning at your own pace after the workshop.
24.2 Is this workshop for me?
This workshop is relevant for individuals who answer yes to the following questions:
- Do you want to develop data science projects in public health?
- Do you want to learn more about how open and reproducible science approaches can be used in your daily practice?
- Are you a user of Stata (or any other data analysis language) who would like to expand your data analysis skillset with R?
- Do you want to bridge analyses between data analysis tools (Stata, R or Python) and collaborate more easily with researchers who use a different one of these tools?
24.3 Schedule
🗓️ September 26-28, 2022
🕘 09:00 - 17:00
🌇 Dar es Salaam, Tanzania (Protea Hotel by Marriott Dar es Salaam Courtyard)
24.3.1 Before the workshop
- Fill out the online pre-workshop questionnaire
- Install the (free) data science software that will be used during the workshop on your laptop. If you have any difficulties with the installation, support will be available on the first day of the workshop, before the first session or during breaks.
24.3.2 Day 1
Time | Session |
---|---|
08.30 - 09.00 | Welcome; support for software installation |
09.00 - 09.15 | Introduction to data science tools; overview of objectives for Day 1 |
09.15 - 10.30 | Version control with Git |
10.30 - 11.00 | 🍵 ☕ Break |
11.00 - 12.00 | Introduction to dynamic documents and Quarto |
12.00 - 13.00 | Use Quarto with Stata |
13.00 - 14.00 | 🍴 Lunch break |
14.00 - 15.00 | Import and manipulate external data (1) |
15.00 - 15.30 | Import and manipulate external data (2) |
15.30 - 16.00 | 🍵 ☕ Break |
16.00 - 17.00 | Share code and collaborate with Git |
24.3.3 Day 2
Time | Session (all) |
---|---|
08.30 - 09.00 | Welcome |
09.00 - 09.15 | Introduction to Data Science for Public Health; overview of objectives for Day 2 |
09.15 - 10.30 | Discussion on concepts related to health data for decision-making |
10.30 - 11.00 | 🍵 ☕ Break |
11.00 - 11.15 | Malaria use case - Presentation of the data |
11.15 - 11.45 | Malaria use case - Interdisciplinary discussion |
11.45 - 12.30 | Malaria use case - Data practicals by interdisciplinary groups |
12.30 - 13.00 | Malaria use case - Feedback on findings from practicals |
13.00 - 14.00 | 🍴 Lunch break |
14.00 - 14.30 | Malaria use case - Interdisciplinary discussion |
14.00 - 15.30 | Malaria use case - Analysis: data practicals |
15.30 - 16.00 | 🍵 ☕ Break |
16.00 - 17.00 | Malaria use case - Feedback on practicals |
24.3.4 Day 3
Time | Session (all) |
---|---|
08.30 - 09.00 | Welcome |
09.00 - 09.15 | Interdisciplinary introduction to big data and machine learning; overview of objectives for Day 3 |
09.15 - 10.00 | Discussion on secondary data sources (public datasets, e.g. DHS, Facebook, facilities, etc.); benefits and drawbacks of primary versus secondary data sources |
10.00 - 10.30 | 🍵 ☕ Break |
11.00 - 13.00 | Analysis: introduction to machine learning; Interpretation: critically discuss data surveys/reports |
13.00 - 14.00 | 🍴 Lunch break |
14.00 - 14.30 | Speed talks - research presentations |
14.30 - 15.30 | Feedback on findings from practicals |
15.30 - 16.30 | Feedback on workshop - Wrap-up |
24.3.5 After the workshop
- Fill out the online post-workshop questionnaire
24.4 Scope
This workshop aims to support researchers in progressing along the following development axes:
24.4.1 Data science mindset
- Use of reproducible research practices in public health
- Data provenance
- Use of distinct data sources for the development of public health indicators
- Research data vs. real world evidence data
- Ethical data science
- Data papers
24.4.2 Data science skillset
- Programming tools
  - Move from Stata to R (prerequisite: Stata)
  - R programming
    - dplyr
  - Python programming
    - pandas
    - scikit-learn (prerequisite: independent Python user)
- Coding with best practices (R/RStudio/tidyverse)
- Versioning using GitHub (all)
- Using targets (prerequisite: independent R user)
- Reporting and publishing: Dynamic report generation
- Reproducible data
  - Use APIs (prerequisite: IT programming basics)
  - Open access data (all)
- Statistical methods for reproducible research (advanced)
24.4.3 What is not covered
- Reproducible workflows (targets)
- Reproducible environments (Binder, Docker, renv, etc.)
24.5 Conventions
Discussion activity 💬
Reflection activity 💭
Coding activity 💻