Exame Nacional do Ensino Medio (ENEM)
The national student aptitude test, used to assess high school completion and university admission.
One table with one row per test-taking student, and a second table with one row per study habit questionnaire respondent.
Updated annually since 1998.
Maintained by Brazil’s Instituto Nacional de Estudos e Pesquisas Educacionais Anisio Teixeira (INEP).
Please skim before you begin: Leia_Me_Enem, included in each annual zipped file.
Download, Import, Preparation
Download and unzip the 2022 file:
library(httr)
library(archive)
tf <- tempfile()
this_url <- "https://download.inep.gov.br/microdados/microdados_enem_2022.zip"
GET( this_url , write_disk( tf ) , progress() )
archive_extract( tf , dir = tempdir() )
Import the 2022 file:
library(readr)
enem_fns <- list.files( tempdir() , recursive = TRUE , full.names = TRUE )
enem_fn <- grep( "MICRODADOS_ENEM_([0-9][0-9][0-9][0-9])\\.csv$" , enem_fns , value = TRUE )
enem_tbl <- read_csv2( enem_fn , locale = locale( encoding = 'latin1' ) )
enem_df <- data.frame( enem_tbl )
names( enem_df ) <- tolower( names( enem_df ) )
Save Locally
Save the object at any point:
# enem_fn <- file.path( path.expand( "~" ) , "ENEM" , "this_file.rds" )
# saveRDS( enem_df , file = enem_fn , compress = FALSE )
Load the same object:
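A minimal sketch of the load step, mirroring the commented-out save step above. The path is the same placeholder used there, and the `file.exists()` guard is an added assumption so the line is safe to run before anything has been saved:

```r
# same placeholder path as the commented-out save step (adjust to your own location)
enem_fn <- file.path( path.expand( "~" ) , "ENEM" , "this_file.rds" )

# readRDS() restores the single object written by saveRDS();
# the guard skips the load when nothing has been saved yet
if ( file.exists( enem_fn ) ) enem_df <- readRDS( enem_fn )
```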
Variable Recoding
Add new columns to the data set:
enem_df <-
    transform(
        enem_df ,
        domestic_worker = as.numeric( q007 %in% c( 'B' , 'C' , 'D' ) ) ,
        administrative_category =
            factor(
                tp_dependencia_adm_esc ,
                levels = 1:4 ,
                labels = c( 'Federal' , 'Estadual' , 'Municipal' , 'Privada' )
            ) ,
        state_name =
            factor(
                co_uf_esc ,
                levels = c( 11:17 , 21:29 , 31:33 , 35 , 41:43 , 50:53 ) ,
                labels = c( "Rondonia" , "Acre" , "Amazonas" ,
                    "Roraima" , "Para" , "Amapa" , "Tocantins" ,
                    "Maranhao" , "Piaui" , "Ceara" , "Rio Grande do Norte" ,
                    "Paraiba" , "Pernambuco" , "Alagoas" , "Sergipe" ,
                    "Bahia" , "Minas Gerais" , "Espirito Santo" ,
                    "Rio de Janeiro" , "Sao Paulo" , "Parana" ,
                    "Santa Catarina" , "Rio Grande do Sul" ,
                    "Mato Grosso do Sul" , "Mato Grosso" , "Goias" ,
                    "Distrito Federal" )
            )
    )
Analysis Examples with base R
Descriptive Statistics
Calculate the mean (average) of a linear variable, overall and by groups:
mean( enem_df[ , "nu_nota_mt" ] , na.rm = TRUE )
tapply(
    enem_df[ , "nu_nota_mt" ] ,
    enem_df[ , "administrative_category" ] ,
    mean ,
    na.rm = TRUE
)
Calculate the distribution of a categorical variable, overall and by groups:
prop.table( table( enem_df[ , "state_name" ] ) )
prop.table(
    table( enem_df[ , c( "state_name" , "administrative_category" ) ] ) ,
    margin = 2
)
Calculate the sum of a linear variable, overall and by groups:
sum( enem_df[ , "nu_nota_mt" ] , na.rm = TRUE )
tapply(
    enem_df[ , "nu_nota_mt" ] ,
    enem_df[ , "administrative_category" ] ,
    sum ,
    na.rm = TRUE
)
Calculate the median (50th percentile) of a linear variable, overall and by groups:
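A minimal sketch of the median step, following the same pattern as the mean and sum examples: `quantile()` with a probability of 0.5, passed through `tapply()` for the grouped version. The toy stand-in at the top is an assumption for illustration only and is skipped whenever the real `enem_df` already exists:

```r
# toy stand-in (illustrative values only) so the sketch runs on its own
if ( !exists( "enem_df" ) ) {
    enem_df <-
        data.frame(
            nu_nota_mt = c( 400 , 500 , 600 , NA , 700 ) ,
            administrative_category =
                factor(
                    c( 'Federal' , 'Estadual' , 'Estadual' , 'Privada' , 'Privada' ) ,
                    levels = c( 'Federal' , 'Estadual' , 'Municipal' , 'Privada' )
                )
        )
}

# overall median: the 50th percentile, ignoring missing scores
overall_median <- quantile( enem_df[ , "nu_nota_mt" ] , 0.5 , na.rm = TRUE )

# median within each administrative category;
# the extra arguments after the function name are passed on to quantile()
median_by_category <-
    tapply(
        enem_df[ , "nu_nota_mt" ] ,
        enem_df[ , "administrative_category" ] ,
        quantile ,
        0.5 ,
        na.rm = TRUE
    )
```

Categories with no rows come back as NA rather than raising an error.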
Subsetting
Limit your data.frame to students whose mother graduated from high school:
Calculate the mean (average) of this subset:
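A minimal sketch of both subsetting steps. It assumes that `q002` records the mother's schooling and that the codes 'E', 'F' and 'G' mean high school completion or more; confirm both against the data dictionary shipped in the zipped file before relying on them. The object name `sub_enem_df` and the toy stand-in are illustrative assumptions as well:

```r
# toy stand-in (illustrative values only) so the sketch runs on its own
if ( !exists( "enem_df" ) ) {
    enem_df <-
        data.frame(
            nu_nota_mt = c( 400 , 500 , 600 , NA , 700 ) ,
            q002 = c( 'A' , 'E' , 'F' , 'G' , 'H' )
        )
}

# keep only the rows with the assumed "high school or more" codes
sub_enem_df <- subset( enem_df , q002 %in% c( 'E' , 'F' , 'G' ) )

# mean math score within the subset, ignoring missing scores
subset_mean <- mean( sub_enem_df[ , "nu_nota_mt" ] , na.rm = TRUE )
```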
Replication Example
This example matches the registration counts in the Sinopse ENEM 2022 Excel table:
Analysis Examples with dplyr
The R dplyr library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. This vignette details the available features. As a starting point for ENEM users, this code replicates previously-presented examples:
Calculate the mean (average) of a linear variable, overall and by groups:
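library(dplyr)
# rebuild the tibble from the recoded data.frame so the lowercase names and new columns are present
enem_tbl <- as_tibble( enem_df )
enem_tbl %>%
    summarize( mean = mean( nu_nota_mt , na.rm = TRUE ) )
enem_tbl %>%
    group_by( administrative_category ) %>%
    summarize( mean = mean( nu_nota_mt , na.rm = TRUE ) )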
Analysis Examples with data.table
The R data.table library provides a high-performance version of base R’s data.frame with syntax and feature enhancements for ease of use, convenience and programming speed. data.table offers concise syntax that is fast to type and fast to read, along with speed, memory efficiency, careful API lifecycle management, an active community, and a rich set of features. This vignette details the available features. As a starting point for ENEM users, this code replicates previously-presented examples:
Calculate the mean (average) of a linear variable, overall and by groups:
library(data.table)
enem_dt <- data.table( enem_df )
enem_dt[ , mean( nu_nota_mt , na.rm = TRUE ) ]
enem_dt[ , mean( nu_nota_mt , na.rm = TRUE ) , by = administrative_category ]
Analysis Examples with duckdb
The R duckdb library provides an embedded analytical data management system with support for the Structured Query Language (SQL). duckdb offers a simple, feature-rich, fast, and free SQL OLAP management system. This vignette details the available features. As a starting point for ENEM users, this code replicates previously-presented examples:
library(duckdb)
con <- dbConnect( duckdb::duckdb() , dbdir = 'my-db.duckdb' )
dbWriteTable( con , 'enem' , enem_df )
Calculate the mean (average) of a linear variable, overall and by groups:
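A minimal sketch of the same two calculations in SQL, run through `dbGetQuery()` against the `enem` table. The column alias `mean_mt` and the in-memory toy table are assumptions for illustration; the stand-in is skipped whenever a connection named `con` already exists:

```r
library(duckdb)

# toy stand-in (illustrative values only): an in-memory database with a small 'enem' table
if ( !exists( "con" ) ) {
    con <- dbConnect( duckdb::duckdb() )
    dbWriteTable(
        con ,
        'enem' ,
        data.frame(
            nu_nota_mt = c( 400 , 500 , 600 , NA , 700 ) ,
            administrative_category =
                c( 'Federal' , 'Estadual' , 'Estadual' , 'Privada' , 'Privada' )
        )
    )
}

# overall mean; AVG() skips NULL values, playing the role of na.rm = TRUE
overall_mean <-
    dbGetQuery( con , 'SELECT AVG( nu_nota_mt ) AS mean_mt FROM enem' )

# mean within each administrative category
mean_by_category <-
    dbGetQuery(
        con ,
        'SELECT administrative_category , AVG( nu_nota_mt ) AS mean_mt
        FROM enem
        GROUP BY administrative_category'
    )
```

When finished, `dbDisconnect( con , shutdown = TRUE )` closes the connection and releases the database file.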