Exame Nacional do Ensino Medio (ENEM)

Build Status Build status

Contributed by Dr. Djalma Pessoa <pessoad@gmail.com>

The Exame Nacional do Ensino Medio (ENEM) contains the standardized test results of most Brazilian high school students.

Simplified Download and Importation

The R lodown package easily downloads and imports all available ENEM microdata by simply specifying "enem" with an output_dir = parameter in the lodown() function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

library(lodown)
lodown( "enem" , output_dir = file.path( path.expand( "~" ) , "ENEM" ) )

lodown also provides a catalog of available microdata extracts with the get_catalog() function. After requesting the ENEM catalog, you could pass a subsetted catalog through the lodown() function in order to download and import specific extracts (rather than all available extracts).

library(lodown)
# examine all available ENEM microdata files
enem_cat <-
    get_catalog( "enem" ,
        output_dir = file.path( path.expand( "~" ) , "ENEM" ) )

# 2015 only
enem_cat <- subset( enem_cat , year == 2015 )
# download the microdata to your local computer
lodown( "enem" , enem_cat )

0.19 Analysis Examples with SQL and MonetDBLite

Connect to a database:

library(DBI)
dbdir <- file.path( path.expand( "~" ) , "ENEM" , "MonetDB" )
db <- dbConnect( MonetDBLite::MonetDBLite() , dbdir )

Variable Recoding

Add new columns to the data set:

dbSendQuery( db , "ALTER TABLE microdados_enem_2015 ADD COLUMN female INTEGER" )

dbSendQuery( db , 
    "UPDATE microdados_enem_2015 
    SET female = 
        CASE WHEN tp_sexo = 2 THEN 1 ELSE 0 END" 
)

dbSendQuery( db , "ALTER TABLE microdados_enem_2015 ADD COLUMN fathers_education INTEGER" )

dbSendQuery( db , 
    "UPDATE microdados_enem_2015 
    SET fathers_education = 
        CASE WHEN q001 = 1 THEN '01 - nao estudou'
            WHEN q001 = 2 THEN '02 - 1 a 4 serie'
            WHEN q001 = 3 THEN '03 - 5 a 8 serie'
            WHEN q001 = 4 THEN '04 - ensino medio incompleto'
            WHEN q001 = 5 THEN '05 - ensino medio'
            WHEN q001 = 6 THEN '06 - ensino superior incompleto'
            WHEN q001 = 7 THEN '07 - ensino superior'
            WHEN q001 = 8 THEN '08 - pos-graduacao'
            WHEN q001 = 9 THEN '09 - nao estudou' ELSE NULL END" 
)

Unweighted Counts

Count the unweighted number of records in the SQL table, overall and by groups:

dbGetQuery( db , "SELECT COUNT(*) FROM microdados_enem_2015" )

dbGetQuery( db ,
    "SELECT
        fathers_education ,
        COUNT(*) 
    FROM microdados_enem_2015
    GROUP BY fathers_education"
)

Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT AVG( nota_mt ) FROM microdados_enem_2015" )

dbGetQuery( db , 
    "SELECT 
        fathers_education , 
        AVG( nota_mt ) AS mean_nota_mt
    FROM microdados_enem_2015 
    GROUP BY fathers_education" 
)

Initiate a function that allows division by zero:

dbSendQuery( db , 
    "CREATE FUNCTION 
        div_noerror(l DOUBLE, r DOUBLE) 
    RETURNS DOUBLE 
    EXTERNAL NAME calc.div_noerror" 
)

Calculate the distribution of a categorical variable:

dbGetQuery( db , 
    "SELECT 
        uf_residencia , 
        div_noerror( 
            COUNT(*) , 
            ( SELECT COUNT(*) FROM microdados_enem_2015 ) 
        ) AS share_uf_residencia
    FROM microdados_enem_2015 
    GROUP BY uf_residencia" 
)

Calculate the sum of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT SUM( nota_mt ) FROM microdados_enem_2015" )

dbGetQuery( db , 
    "SELECT 
        fathers_education , 
        SUM( nota_mt ) AS sum_nota_mt 
    FROM microdados_enem_2015 
    GROUP BY fathers_education" 
)

Calculate the median (50th percentile) of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT QUANTILE( nota_mt , 0.5 ) FROM microdados_enem_2015" )

dbGetQuery( db , 
    "SELECT 
        fathers_education , 
        QUANTILE( nota_mt , 0.5 ) AS median_nota_mt
    FROM microdados_enem_2015 
    GROUP BY fathers_education" 
)

Subsetting

Limit your SQL analysis to took mathematics exam with WHERE:

dbGetQuery( db ,
    "SELECT
        AVG( nota_mt )
    FROM microdados_enem_2015
    WHERE in_presenca_mt = 1"
)

Measures of Uncertainty

Calculate the variance and standard deviation, overall and by groups:

dbGetQuery( db , 
    "SELECT 
        VAR_SAMP( nota_mt ) , 
        STDDEV_SAMP( nota_mt ) 
    FROM microdados_enem_2015" 
)

dbGetQuery( db , 
    "SELECT 
        fathers_education , 
        VAR_SAMP( nota_mt ) AS var_nota_mt ,
        STDDEV_SAMP( nota_mt ) AS stddev_nota_mt
    FROM microdados_enem_2015 
    GROUP BY fathers_education" 
)

Regression Models and Tests of Association

Calculate the correlation between two variables, overall and by groups:

dbGetQuery( db , 
    "SELECT 
        CORR( CAST( female AS DOUBLE ) , CAST( nota_mt AS DOUBLE ) )
    FROM microdados_enem_2015" 
)

dbGetQuery( db , 
    "SELECT 
        fathers_education , 
        CORR( CAST( female AS DOUBLE ) , CAST( nota_mt AS DOUBLE ) )
    FROM microdados_enem_2015 
    GROUP BY fathers_education" 
)

0.20 Analysis Examples with dplyr

The R dplyr library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. This vignette details the available features. As a starting point for ENEM users, this code replicates previously-presented examples:

library(dplyr)
dplyr_db <- MonetDBLite::src_monetdblite( dbdir )
enem_tbl <- tbl( dplyr_db , 'microdados_enem_2015' )

Calculate the mean (average) of a linear variable, overall and by groups:

enem_tbl %>%
    summarize( mean = mean( nota_mt ) )

enem_tbl %>%
    group_by( fathers_education ) %>%
    summarize( mean = mean( nota_mt ) )

Replication Example

dbGetQuery( db , "SELECT COUNT(*) FROM microdados_enem_2015" )

Database Shutdown

dbDisconnect( db , shutdown = TRUE )