```{r}
relative_path <- "./data/mydata.csv"
```14 :blue_book: Import external data
14.1 Introduction
Most of the time you will want to generate Quarto documents using your own data. To this aim, you will have to import data from external sources such as files, URLs, or server data (e.g., ODK Central data). There is a dedicated importing function in R and Python for almost every data format. In this section we show you how to import Stata (.dta), Excel (.xlsx) and comma-separated values (CSV, .csv) data formats from files and from URLs, as well as how to import ODK data directly from an ODK Central server.
As you import data to further process / analyse them, you have to store the imported data in a data frame.
14.2 Import data from files
All Python and R functions only require as input the path where the file you want to import is stored. This path has to be passed as a sequence of characters (character) within double (““) or single (’’) quotes.
The path can be either:
- relative to your Quarto document
- absolute
```{r}
absolute_path <- "C:/Users/myuser/Documents/mydata.csv"
```Functions in general have additional optional arguments.
14.2.1 Import Excel data
Import the Excel data set dataset1.xlsx and store it into a data frame called df1.
- Stata: use the import excel Stata command with the stata function from the
RStatapackage. - R: use the read.xlsx function from the
openxlsxpackage. - Python: use the read_excel function from the
pandaspackage.
```{r}
# Write your code here
```Use the import excel Stata command with the stata function from the RStata package.
```{r}
#| df-print: kable
library(RStata)
df1_stata <- RStata::stata("import excel ./data/dataset1a.xlsx",
data.out = TRUE)
```Warning in readLines(rdl): Zeile 1 scheint ein nul Zeichen zu enthalten
```{r}
#| df-print: kable
df1_stata_sub <- head(df1_stata, 5)
knitr::kable(df1_stata_sub)
```. import excel ./data/dataset1a.xlsx
(7 vars, 10,309 obs)
| A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|
| child_ID | SDC_sex | SDC_age_in_months | CLIN_fever | CLIN_fever_onset | CLIN_cough | CLIN_diarrhoea |
| 1 | 2 | 10 | 0 | 1 | 1 | |
| 2 | 2 | 6 | 0 | 0 | 1 | |
| 3 | 1 | 6 | 0 | 0 | 0 | |
| 4 | 1 | 11 | 1 | 3 | 1 | 1 |
Use the read.xlsx function from the openxlsx package.
```{r}
library(openxlsx)
df1_r <- openxlsx::read.xlsx("./data/dataset1a.xlsx")
df1_r_sub <- head(df1_r, 5)
knitr::kable(df1_r_sub)
```| child_ID | SDC_sex | SDC_age_in_months | CLIN_fever | CLIN_fever_onset | CLIN_cough | CLIN_diarrhoea |
|---|---|---|---|---|---|---|
| 1 | 2 | 10 | 0 | NA | 1 | 1 |
| 2 | 2 | 6 | 0 | NA | 0 | 1 |
| 3 | 1 | 6 | 0 | NA | 0 | 0 |
| 4 | 1 | 11 | 1 | 3 | 1 | 1 |
| 5 | 2 | 21 | 1 | 2 | 0 | 0 |
Use the read_excel function from the pandas package.
```{python}
import pandas
df1_python = pandas.read_excel('./data/dataset1a.xlsx')
df1_python_sub = df1_python.head(5)
```C:\Users\langhe\AppData\Local\R\win-library\4.3\reticulate\python\rpytools\loader.py:117: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).
return _find_and_load(name, import_)
library(reticulate)Warning: Paket 'reticulate' wurde unter R Version 4.3.3 erstellt
knitr::kable(py$df1_python_sub)| child_ID | SDC_sex | SDC_age_in_months | CLIN_fever | CLIN_fever_onset | CLIN_cough | CLIN_diarrhoea |
|---|---|---|---|---|---|---|
| 1 | 2 | 10 | 0 | NaN | 1 | 1 |
| 2 | 2 | 6 | 0 | NaN | 0 | 1 |
| 3 | 1 | 6 | 0 | NaN | 0 | 0 |
| 4 | 1 | 11 | 1 | 3 | 1 | 1 |
| 5 | 2 | 21 | 1 | 2 | 0 | 0 |
14.2.2 Import CSV data
Read the CSV data set dataset1.csv and store it into a data frame called df2.
```{r}
# Write your code here
``````{r}
library(RStata)
df2 <- RStata::stata("import delimited ./data/dataset1b.csv",
data.out = TRUE)
```. import delimited ./data/dataset1b.csv
(encoding automatically selected: ISO-8859-1)
(7 vars, 10,308 obs)
```{r}
df2_r <- read.csv("./data/dataset1b.csv")
df2_r_sub <- head(df2_r, 5)
knitr::kable(df2_r_sub)
```| child_ID | SDC_sex | SDC_age_in_months | CLIN_fever | CLIN_fever_onset | CLIN_cough | CLIN_diarrhoea |
|---|---|---|---|---|---|---|
| 1 | 2 | 10 | 0 | NA | 1 | 1 |
| 2 | 2 | 6 | 0 | NA | 0 | 1 |
| 3 | 1 | 6 | 0 | NA | 0 | 0 |
| 4 | 1 | 11 | 1 | 3 | 1 | 1 |
| 5 | 2 | 21 | 1 | 2 | 0 | 0 |
Use the read_csv function from the pandas package.
```{python}
import pandas
df2 = pandas.read_csv('./data/dataset1b.csv')
```14.2.3 Import Stata data
Read the Stata data set dataset1.dta and store it into a data frame called df3.
- Stata: use the
useStata command with the stata function from theRStatapackage. - R: use the read_dta function from the
havenpackage. This package supports SAS, STATA and SPSS software. - Python: use the read_stata function from the
pandaspackage.
```{r}
# Write your code here
``````{r}
library(RStata)
df3 <- RStata::stata("use ./data/dataset1c.dta",
data.out = TRUE)
```. use ./data/dataset1c.dta
```{r}
library(haven)
df3 <- haven::read_dta("./data/dataset1c.dta")
```Use the read_stata function from the pandas package.
```{python}
import pandas
df3 = pandas.read_stata('./data/dataset1c.dta')
```14.3 Import data from URLs
All functions can accept URLs as well instead of the path to a specific file.
Import the CSV data set that contains a comprehensive spatial inventory of 98,745 public health facilities in Sub Saharan Africa directly from the following url and store it into a data frame called df4.
To learn more about how this data set was assembled, please refer to (1)
- Stata: use the import excel Stata command with the
statafunction from theRStatapackage. - R: use the read.xlsx function from the
openxlsxpackage. - Python: use the read_excel function from the
pandaspackage. In the latest version of pandas (0.19.2) you can directly pass the url
```{r}
# Write your code here
```Here, because quotes are already used for the Stata command, you need to use the other type of quotes for indicating the URL.
```{r}
library(RStata)
#df2 <- RStata::stata('import delimited "https://open.africa/dataset/d7335980-29d5-476c-bf7a-feb4e22cf631/resource/e2432e8a-cf15-4a8b-b8c3-567f443c1459/download/cfa-data-hospitals-in-africa-00-ssa-mfl-130219.xlsx-ssa-mfl.csv"',
# data.out = TRUE)
```Use the read.xlsx function from the openxlsx package.
```{r}
#csv_url <- "https://open.africa/dataset/d7335980-29d5-476c-bf7a-feb4e22cf631/resource/e2432e8a-cf15-4a8b-b8c3-567f443c1459/download/cfa-data-hospitals-in-africa-00-ssa-mfl-130219.xlsx-ssa-mfl.csv"
#df4 <- read.csv(csv_url)
```Use the read_excel function from the pandas package.
```{python}
import pandas
#csv_url = "https://open.africa/dataset/d7335980-29d5-476c-bf7a-feb4e22cf631/resource/e2432e8a-cf15-4a8b-b8c3-567f443c1459/download/cfa-data-hospitals-in-africa-00-ssa-mfl-130219.xlsx-ssa-mfl.csv"
#df1 = pandas.read_csv(csv_url)
```