Home Mortgage Disclosure Act (HMDA)

Contributed by Max Weselcouch <mweselco@gmail.com>

Responding to discriminatory lending practices, the United States Congress mandated that financial organizations originating home mortgages report some basic operational statistics. The Home Mortgage Disclosure Act (HMDA) increased the transparency of home-lending activity across the country.

Simplified Download and Importation

The R lodown package easily downloads and imports all available HMDA microdata by simply specifying "hmda" with an output_dir = parameter in the lodown() function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

library(lodown)
lodown( "hmda" , output_dir = file.path( path.expand( "~" ) , "HMDA" ) )

Analysis Examples with SQL and MonetDBLite

Connect to a database:

library(DBI)
dbdir <- file.path( path.expand( "~" ) , "HMDA" , "MonetDB" )
db <- dbConnect( MonetDBLite::MonetDBLite() , dbdir )

Variable Recoding

Add new columns to the data set:

dbSendQuery( db , "ALTER TABLE hmda_2015 ADD COLUMN multifamily_home INTEGER" )

dbSendQuery( db , 
    "UPDATE hmda_2015 
    SET multifamily_home = 
        CASE WHEN ( propertytype = 3 ) THEN 1 ELSE 0 END" 
)

Unweighted Counts

Count the unweighted number of records in the SQL table, overall and by groups:

dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )

dbGetQuery( db ,
    "SELECT
        loanpurpose ,
        COUNT(*) 
    FROM hmda_2015
    GROUP BY loanpurpose"
)

Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT AVG( loanamount ) FROM hmda_2015" )

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        AVG( loanamount ) AS mean_loanamount
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Initiate a function that allows division by zero:

dbSendQuery( db , 
    "CREATE FUNCTION 
        div_noerror(l DOUBLE, r DOUBLE) 
    RETURNS DOUBLE 
    EXTERNAL NAME calc.div_noerror" 
)

Calculate the distribution of a categorical variable:

dbGetQuery( db , 
    "SELECT 
        actiontype , 
        div_noerror( 
            COUNT(*) , 
            ( SELECT COUNT(*) FROM hmda_2015 ) 
        ) AS share_actiontype
    FROM hmda_2015 
    GROUP BY actiontype" 
)

Calculate the sum of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT SUM( loanamount ) FROM hmda_2015" )

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        SUM( loanamount ) AS sum_loanamount 
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Calculate the median (50th percentile) of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT QUANTILE( loanamount , 0.5 ) FROM hmda_2015" )

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        QUANTILE( loanamount , 0.5 ) AS median_loanamount
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Subsetting

Limit your SQL analysis to non-Hispanic White persons with WHERE:

dbGetQuery( db ,
    "SELECT
        AVG( loanamount )
    FROM hmda_2015
    WHERE race = 5 AND ethnicity = 2"
)

Measures of Uncertainty

Calculate the variance and standard deviation, overall and by groups:

dbGetQuery( db , 
    "SELECT 
        VAR_SAMP( loanamount ) , 
        STDDEV_SAMP( loanamount ) 
    FROM hmda_2015" 
)

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        VAR_SAMP( loanamount ) AS var_loanamount ,
        STDDEV_SAMP( loanamount ) AS stddev_loanamount
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Regression Models and Tests of Association

Calculate the correlation between two variables, overall and by groups:

dbGetQuery( db , 
    "SELECT 
        CORR( CAST( multifamily_home AS DOUBLE ) , CAST( loanamount AS DOUBLE ) )
    FROM hmda_2015" 
)

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        CORR( CAST( multifamily_home AS DOUBLE ) , CAST( loanamount AS DOUBLE ) )
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Analysis Examples with dplyr

The R dplyr library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. This vignette details the available features. As a starting point for HMDA users, this code replicates previously-presented examples:

library(dplyr)
dplyr_db <- MonetDBLite::src_monetdblite( dbdir )
hmda_tbl <- tbl( dplyr_db , 'hmda_2015' )

Calculate the mean (average) of a linear variable, overall and by groups:

hmda_tbl %>%
    summarize( mean = mean( loanamount ) )

hmda_tbl %>%
    group_by( loanpurpose ) %>%
    summarize( mean = mean( loanamount ) )

Replication Example

dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )

Database Shutdown

dbDisconnect( db , shutdown = TRUE )