Home Mortgage Disclosure Act (HMDA)
Contributed by Max Weselcouch <mweselco@gmail.com>
Responding to discriminatory lending practices, the United States Congress mandated that financial organizations originating home mortgages report some basic operational statistics. The Home Mortgage Disclosure Act (HMDA) increased the transparency of home-lending activity across the country.
A loan application record (LAR) table with one record per public loan application, with secondary tables containing both private loan applications (PMIC) and one record per institution tables (INS).
A public compilation of more than ninety percent of all Federal Housing Authority (FHA) loans in the United States.
Updated every September with a new year of microdata. Data prior to 2006 require a special order from the United States National Archives.
Maintained by the United States Federal Financial Institutions Examination Council (FEIEC)
Simplified Download and Importation
The R lodown
package easily downloads and imports all available HMDA microdata by simply specifying "hmda"
with an output_dir =
parameter in the lodown()
function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.
library(lodown)
lodown( "hmda" , output_dir = file.path( path.expand( "~" ) , "HMDA" ) )
Analysis Examples with SQL and RSQLite
Connect to a database:
library(DBI)
dbdir <- file.path( path.expand( "~" ) , "HMDA" , "SQLite.db" )
db <- dbConnect( RSQLite::SQLite() , dbdir )
Variable Recoding
Add new columns to the data set:
dbSendQuery( db , "ALTER TABLE hmda_2015 ADD COLUMN multifamily_home INTEGER" )
dbSendQuery( db ,
"UPDATE hmda_2015
SET multifamily_home =
CASE WHEN ( propertytype = 3 ) THEN 1 ELSE 0 END"
)
Unweighted Counts
Count the unweighted number of records in the SQL table, overall and by groups:
dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )
dbGetQuery( db ,
"SELECT
loanpurpose ,
COUNT(*)
FROM hmda_2015
GROUP BY loanpurpose"
)
Descriptive Statistics
Calculate the mean (average) of a linear variable, overall and by groups:
dbGetQuery( db , "SELECT AVG( loanamount ) FROM hmda_2015" )
dbGetQuery( db ,
"SELECT
loanpurpose ,
AVG( loanamount ) AS mean_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
Calculate the distribution of a categorical variable:
dbGetQuery( db ,
"SELECT
actiontype ,
COUNT(*) / ( SELECT COUNT(*) FROM hmda_2015 )
AS share_actiontype
FROM hmda_2015
GROUP BY actiontype"
)
Calculate the sum of a linear variable, overall and by groups:
dbGetQuery( db , "SELECT SUM( loanamount ) FROM hmda_2015" )
dbGetQuery( db ,
"SELECT
loanpurpose ,
SUM( loanamount ) AS sum_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
Calculate the 25th, median, and 75th percentiles of a linear variable, overall and by groups:
RSQLite::initExtension( db )
dbGetQuery( db ,
"SELECT
LOWER_QUARTILE( loanamount ) ,
MEDIAN( loanamount ) ,
UPPER_QUARTILE( loanamount )
FROM hmda_2015"
)
dbGetQuery( db ,
"SELECT
loanpurpose ,
LOWER_QUARTILE( loanamount ) AS lower_quartile_loanamount ,
MEDIAN( loanamount ) AS median_loanamount ,
UPPER_QUARTILE( loanamount ) AS upper_quartile_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
Subsetting
Limit your SQL analysis to non-Hispanic White persons with WHERE
:
dbGetQuery( db ,
"SELECT
AVG( loanamount )
FROM hmda_2015
WHERE race = 5 AND ethnicity = 2"
)
Measures of Uncertainty
Calculate the variance and standard deviation, overall and by groups:
RSQLite::initExtension( db )
dbGetQuery( db ,
"SELECT
VARIANCE( loanamount ) ,
STDEV( loanamount )
FROM hmda_2015"
)
dbGetQuery( db ,
"SELECT
loanpurpose ,
VARIANCE( loanamount ) AS var_loanamount ,
STDEV( loanamount ) AS stddev_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
Regression Models and Tests of Association
Perform a t-test:
hmda_slim_df <-
dbGetQuery( db ,
"SELECT
loanamount ,
multifamily_home ,
actiontype
FROM hmda_2015"
)
t.test( loanamount ~ multifamily_home , hmda_slim_df )
Perform a chi-squared test of association:
this_table <-
table( hmda_slim_df[ , c( "multifamily_home" , "actiontype" ) ] )
chisq.test( this_table )
Perform a generalized linear model:
glm_result <-
glm(
loanamount ~ multifamily_home + actiontype ,
data = hmda_slim_df
)
summary( glm_result )
Analysis Examples with dplyr
The R dplyr
library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize
, group_by
, and mutate
, the convenience of pipe-able functions, and the tidyverse
style of non-standard evaluation. This vignette details the available features. As a starting point for HMDA users, this code replicates previously-presented examples:
library(dplyr)
library(dbplyr)
dplyr_db <- dplyr::src_sqlite( dbdir )
hmda_tbl <- tbl( dplyr_db , 'hmda_2015' )
Calculate the mean (average) of a linear variable, overall and by groups:
hmda_tbl %>%
summarize( mean = mean( loanamount ) )
hmda_tbl %>%
group_by( loanpurpose ) %>%
summarize( mean = mean( loanamount ) )
Replication Example
dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )