Home Mortgage Disclosure Act (HMDA)

Build Status Build status

Contributed by Max Weselcouch <mweselco@gmail.com>

Responding to discriminatory lending practices, the United States Congress mandated that financial organizations originating home mortgages report some basic operational statistics. The Home Mortgage Disclosure Act (HMDA) increased the transparency of home-lending activity across the country.

Simplified Download and Importation

The R lodown package easily downloads and imports all available HMDA microdata by simply specifying "hmda" with an output_dir = parameter in the lodown() function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

library(lodown)
lodown( "hmda" , output_dir = file.path( path.expand( "~" ) , "HMDA" ) )

Analysis Examples with SQL and RSQLite  

Connect to a database:

library(DBI)
dbdir <- file.path( path.expand( "~" ) , "HMDA" , "SQLite.db" )
db <- dbConnect( RSQLite::SQLite() , dbdir )

Variable Recoding

Add new columns to the data set:

dbSendQuery( db , "ALTER TABLE hmda_2015 ADD COLUMN multifamily_home INTEGER" )

dbSendQuery( db , 
    "UPDATE hmda_2015 
    SET multifamily_home = 
        CASE WHEN ( propertytype = 3 ) THEN 1 ELSE 0 END" 
)

Unweighted Counts

Count the unweighted number of records in the SQL table, overall and by groups:

dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )

dbGetQuery( db ,
    "SELECT
        loanpurpose ,
        COUNT(*) 
    FROM hmda_2015
    GROUP BY loanpurpose"
)

Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT AVG( loanamount ) FROM hmda_2015" )

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        AVG( loanamount ) AS mean_loanamount
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Calculate the distribution of a categorical variable:

dbGetQuery( db , 
    "SELECT 
        actiontype , 
        COUNT(*) / ( SELECT COUNT(*) FROM hmda_2015 ) 
            AS share_actiontype
    FROM hmda_2015 
    GROUP BY actiontype" 
)

Calculate the sum of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT SUM( loanamount ) FROM hmda_2015" )

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        SUM( loanamount ) AS sum_loanamount 
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Calculate the 25th, median, and 75th percentiles of a linear variable, overall and by groups:

RSQLite::initExtension( db )

dbGetQuery( db , 
    "SELECT 
        LOWER_QUARTILE( loanamount ) , 
        MEDIAN( loanamount ) , 
        UPPER_QUARTILE( loanamount ) 
    FROM hmda_2015" 
)

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        LOWER_QUARTILE( loanamount ) AS lower_quartile_loanamount , 
        MEDIAN( loanamount ) AS median_loanamount , 
        UPPER_QUARTILE( loanamount ) AS upper_quartile_loanamount
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Subsetting

Limit your SQL analysis to non-Hispanic White persons with WHERE:

dbGetQuery( db ,
    "SELECT
        AVG( loanamount )
    FROM hmda_2015
    WHERE race = 5 AND ethnicity = 2"
)

Measures of Uncertainty

Calculate the variance and standard deviation, overall and by groups:

RSQLite::initExtension( db )

dbGetQuery( db , 
    "SELECT 
        VARIANCE( loanamount ) , 
        STDEV( loanamount ) 
    FROM hmda_2015" 
)

dbGetQuery( db , 
    "SELECT 
        loanpurpose , 
        VARIANCE( loanamount ) AS var_loanamount ,
        STDEV( loanamount ) AS stddev_loanamount
    FROM hmda_2015 
    GROUP BY loanpurpose" 
)

Regression Models and Tests of Association

Perform a t-test:

hmda_slim_df <- 
    dbGetQuery( db , 
        "SELECT 
            loanamount , 
            multifamily_home ,
            actiontype
        FROM hmda_2015" 
    )

t.test( loanamount ~ multifamily_home , hmda_slim_df )

Perform a chi-squared test of association:

this_table <-
    table( hmda_slim_df[ , c( "multifamily_home" , "actiontype" ) ] )

chisq.test( this_table )

Perform a generalized linear model:

glm_result <- 
    glm( 
        loanamount ~ multifamily_home + actiontype , 
        data = hmda_slim_df
    )

summary( glm_result )

Analysis Examples with dplyr  

The R dplyr library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. This vignette details the available features. As a starting point for HMDA users, this code replicates previously-presented examples:

library(dplyr)
library(dbplyr)
dplyr_db <- dplyr::src_sqlite( dbdir )
hmda_tbl <- tbl( dplyr_db , 'hmda_2015' )

Calculate the mean (average) of a linear variable, overall and by groups:

hmda_tbl %>%
    summarize( mean = mean( loanamount ) )

hmda_tbl %>%
    group_by( loanpurpose ) %>%
    summarize( mean = mean( loanamount ) )

Replication Example

dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )