# Home Mortgage Disclosure Act (HMDA)

*Contributed by Max Weselcouch <mweselco@gmail.com>*

Responding to discriminatory lending practices, the United States Congress mandated that financial organizations originating home mortgages report some basic operational statistics. The Home Mortgage Disclosure Act (HMDA) increased the transparency of home-lending activity across the country.

A loan application record (LAR) table with one record per public loan application, with secondary tables containing both private loan applications (PMIC) and one record per institution tables (INS).

A public compilation of more than ninety percent of all Federal Housing Authority (FHA) loans in the United States.

Updated every September with a new year of microdata. Data prior to 2006 require a special order from the United States National Archives.

Maintained by the United States Federal Financial Institutions Examination Council (FEIEC)

## Simplified Download and Importation

The R `lodown`

package easily downloads and imports all available HMDA microdata by simply specifying `"hmda"`

with an `output_dir =`

parameter in the `lodown()`

function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

```
library(lodown)
lodown( "hmda" , output_dir = file.path( path.expand( "~" ) , "HMDA" ) )
```

## Analysis Examples with SQL and `RSQLite`

Connect to a database:

```
library(DBI)
dbdir <- file.path( path.expand( "~" ) , "HMDA" , "SQLite.db" )
db <- dbConnect( RSQLite::SQLite() , dbdir )
```

### Variable Recoding

Add new columns to the data set:

```
dbSendQuery( db , "ALTER TABLE hmda_2015 ADD COLUMN multifamily_home INTEGER" )
dbSendQuery( db ,
"UPDATE hmda_2015
SET multifamily_home =
CASE WHEN ( propertytype = 3 ) THEN 1 ELSE 0 END"
)
```

### Unweighted Counts

Count the unweighted number of records in the SQL table, overall and by groups:

```
dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )
dbGetQuery( db ,
"SELECT
loanpurpose ,
COUNT(*)
FROM hmda_2015
GROUP BY loanpurpose"
)
```

### Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

```
dbGetQuery( db , "SELECT AVG( loanamount ) FROM hmda_2015" )
dbGetQuery( db ,
"SELECT
loanpurpose ,
AVG( loanamount ) AS mean_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
```

Calculate the distribution of a categorical variable:

```
dbGetQuery( db ,
"SELECT
actiontype ,
COUNT(*) / ( SELECT COUNT(*) FROM hmda_2015 )
AS share_actiontype
FROM hmda_2015
GROUP BY actiontype"
)
```

Calculate the sum of a linear variable, overall and by groups:

```
dbGetQuery( db , "SELECT SUM( loanamount ) FROM hmda_2015" )
dbGetQuery( db ,
"SELECT
loanpurpose ,
SUM( loanamount ) AS sum_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
```

Calculate the 25th, median, and 75th percentiles of a linear variable, overall and by groups:

```
RSQLite::initExtension( db )
dbGetQuery( db ,
"SELECT
LOWER_QUARTILE( loanamount ) ,
MEDIAN( loanamount ) ,
UPPER_QUARTILE( loanamount )
FROM hmda_2015"
)
dbGetQuery( db ,
"SELECT
loanpurpose ,
LOWER_QUARTILE( loanamount ) AS lower_quartile_loanamount ,
MEDIAN( loanamount ) AS median_loanamount ,
UPPER_QUARTILE( loanamount ) AS upper_quartile_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
```

### Subsetting

Limit your SQL analysis to non-Hispanic White persons with `WHERE`

:

```
dbGetQuery( db ,
"SELECT
AVG( loanamount )
FROM hmda_2015
WHERE race = 5 AND ethnicity = 2"
)
```

### Measures of Uncertainty

Calculate the variance and standard deviation, overall and by groups:

```
RSQLite::initExtension( db )
dbGetQuery( db ,
"SELECT
VARIANCE( loanamount ) ,
STDEV( loanamount )
FROM hmda_2015"
)
dbGetQuery( db ,
"SELECT
loanpurpose ,
VARIANCE( loanamount ) AS var_loanamount ,
STDEV( loanamount ) AS stddev_loanamount
FROM hmda_2015
GROUP BY loanpurpose"
)
```

### Regression Models and Tests of Association

Perform a t-test:

```
hmda_slim_df <-
dbGetQuery( db ,
"SELECT
loanamount ,
multifamily_home ,
actiontype
FROM hmda_2015"
)
t.test( loanamount ~ multifamily_home , hmda_slim_df )
```

Perform a chi-squared test of association:

```
this_table <-
table( hmda_slim_df[ , c( "multifamily_home" , "actiontype" ) ] )
chisq.test( this_table )
```

Perform a generalized linear model:

```
glm_result <-
glm(
loanamount ~ multifamily_home + actiontype ,
data = hmda_slim_df
)
summary( glm_result )
```

## Analysis Examples with `dplyr`

The R `dplyr`

library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as `summarize`

, `group_by`

, and `mutate`

, the convenience of pipe-able functions, and the `tidyverse`

style of non-standard evaluation. This vignette details the available features. As a starting point for HMDA users, this code replicates previously-presented examples:

```
library(dplyr)
library(dbplyr)
dplyr_db <- dplyr::src_sqlite( dbdir )
hmda_tbl <- tbl( dplyr_db , 'hmda_2015' )
```

Calculate the mean (average) of a linear variable, overall and by groups:

```
hmda_tbl %>%
summarize( mean = mean( loanamount ) )
hmda_tbl %>%
group_by( loanpurpose ) %>%
summarize( mean = mean( loanamount ) )
```

## Replication Example

`dbGetQuery( db , "SELECT COUNT(*) FROM hmda_2015" )`