Area Health Resource File (AHRF)


Though not a survey data set itself, the AHRF is useful to merge onto other microdata.
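As a minimal sketch of such a merge, assume a hypothetical person-level file that carries a five-digit county FIPS code. Both data frames, every column name, and every value below are invented for illustration; none are actual AHRF fields.

```r
# hypothetical person-level microdata with a county fips code (invented values)
toy_microdata <-
    data.frame(
        person_id = 1:4 ,
        county_fips = c( "06037" , "06037" , "36061" , "48201" ) ,
        stringsAsFactors = FALSE
    )

# hypothetical county-level table standing in for the AHRF (invented values)
toy_county <-
    data.frame(
        county_fips = c( "06037" , "36061" ) ,
        median_income = c( 55000 , 75000 ) ,
        stringsAsFactors = FALSE
    )

# left join: keep every microdata record, even those without a county match
toy_merged <-
    merge( toy_microdata , toy_county , by = "county_fips" , all.x = TRUE )

# confirm the merge neither added nor dropped microdata records
stopifnot( nrow( toy_merged ) == nrow( toy_microdata ) )

toy_merged
```

Keeping the FIPS code as a character string preserves leading zeroes, and all.x = TRUE leaves unmatched records in place with missing county values rather than silently dropping them.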

Simplified Download and Importation

The R lodown package downloads and imports all available AHRF microdata when you pass "ahrf" and an output_dir = parameter to the lodown() function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

library(lodown)
lodown( "ahrf" , output_dir = file.path( path.expand( "~" ) , "AHRF" ) )

Analysis Examples with base R  

Load a data frame:

ahrf_df <- readRDS( file.path( path.expand( "~" ) , "AHRF" , "county/AHRF_2016-2017.rds" ) )

Variable Recoding

Add new columns to the data set:

ahrf_df <- 
    transform( 
        ahrf_df , 
        
        cbsa_indicator_code = 
            factor( 
                1 + as.numeric( f1406715 ) , 
                labels = c( "not metro" , "metro" , "micro" ) 
            ) ,
            
        mhi_2014 = f1322614 ,
        
        whole_county_hpsa_2016 = as.numeric( f0978716 == 1 ) ,
        
        census_region = 
            factor( 
                as.numeric( f04439 ) , 
                labels = c( "northeast" , "midwest" , "south" , "west" ) 
            )

    )
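The 1 + as.numeric( ... ) idiom shifts zero-based codes so that they line up with the positional labels. A small sketch on toy data, assuming the codes are stored as the character strings "0", "1", and "2" (an assumption about storage, not something confirmed here):

```r
# hypothetical zero-based codes stored as character strings
toy_codes <- c( "0" , "1" , "2" , "1" , NA )

toy_factor <-
    factor(
        1 + as.numeric( toy_codes ) ,
        # explicit levels keep the labels aligned even if a code never appears
        levels = 1:3 ,
        labels = c( "not metro" , "metro" , "micro" )
    )

# missing codes stay missing rather than receiving a label
table( toy_factor , useNA = "always" )
```

Supplying levels = explicitly is a defensive choice in this sketch: without it, factor() numbers the labels by the values actually present, so a subset missing one category would fail or mislabel.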

Unweighted Counts

Count the unweighted number of records in the table, overall and by groups:

nrow( ahrf_df )

table( ahrf_df[ , "cbsa_indicator_code" ] , useNA = "always" )

Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

mean( ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )

tapply(
    ahrf_df[ , "mhi_2014" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    mean ,
    na.rm = TRUE 
)

Calculate the distribution of a categorical variable, overall and by groups:

prop.table( table( ahrf_df[ , "census_region" ] ) )

prop.table(
    table( ahrf_df[ , c( "census_region" , "cbsa_indicator_code" ) ] ) ,
    margin = 2
)
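The margin = argument controls which dimension of the table sums to one. A toy illustration with invented records (the column names here are hypothetical):

```r
# invented records for illustration only
toy_regions <-
    data.frame(
        region = c( "south" , "south" , "west" , "west" , "west" , "south" ) ,
        metro = c( "metro" , "micro" , "metro" , "metro" , "micro" , "metro" )
    )

toy_table <- table( toy_regions[ , c( "region" , "metro" ) ] )

# margin = 1: shares within each row (region)
row_shares <- prop.table( toy_table , margin = 1 )

# margin = 2: shares within each column (metro status)
col_shares <- prop.table( toy_table , margin = 2 )

# each row of row_shares and each column of col_shares sums to one
rowSums( row_shares )
colSums( col_shares )
```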

Calculate the sum of a linear variable, overall and by groups:

sum( ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )

tapply(
    ahrf_df[ , "mhi_2014" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    sum ,
    na.rm = TRUE 
)

Calculate the median (50th percentile) of a linear variable, overall and by groups:

quantile( ahrf_df[ , "mhi_2014" ] , 0.5 , na.rm = TRUE )

tapply(
    ahrf_df[ , "mhi_2014" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    quantile ,
    0.5 ,
    na.rm = TRUE 
)
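In the tapply() call, any arguments that follow the function name are forwarded to that function within each group, which is how 0.5 and na.rm = TRUE reach quantile(). A sketch on invented values:

```r
# invented values and group labels
toy_values <- c( 10 , 20 , 30 , 40 , NA , 60 )
toy_groups <- c( "a" , "a" , "a" , "b" , "b" , "b" )

# 0.5 and na.rm = TRUE are passed through to quantile() for each group
toy_medians <-
    tapply( toy_values , toy_groups , quantile , 0.5 , na.rm = TRUE )

# the same statistic computed with median() directly
toy_medians_alt <-
    tapply( toy_values , toy_groups , median , na.rm = TRUE )

toy_medians
toy_medians_alt
```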

Subsetting

Limit your data.frame to California:

sub_ahrf_df <- subset( ahrf_df , f12424 == "CA" )

Calculate the mean (average) of this subset:

mean( sub_ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )
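One detail worth knowing about subset(): records where the condition evaluates to missing are dropped, whereas plain bracket indexing keeps them as all-NA rows. A toy comparison on invented data:

```r
# invented records, one with a missing state abbreviation
toy_states <-
    data.frame(
        state = c( "CA" , "NY" , NA , "CA" ) ,
        value = 1:4 ,
        stringsAsFactors = FALSE
    )

# subset() treats a missing condition as FALSE
toy_subset <- subset( toy_states , state == "CA" )

# bracket indexing returns an extra all-NA row for the missing condition
toy_bracket <- toy_states[ toy_states$state == "CA" , ]

nrow( toy_subset )
nrow( toy_bracket )
```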

Measures of Uncertainty

Calculate the variance, overall and by groups:

var( ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )

tapply(
    ahrf_df[ , "mhi_2014" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    var ,
    na.rm = TRUE 
)
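The variance can be turned into a standard deviation and, under a simple-random-sample assumption, a standard error of the mean. The sketch below uses invented values; whether a sampling-based standard error is meaningful for a table of county records is a judgment left to the analyst.

```r
# invented income values, one missing
toy_income <- c( 40000 , 52000 , 61000 , NA , 47000 )

# count only the non-missing observations
n_obs <- sum( !is.na( toy_income ) )

toy_var <- var( toy_income , na.rm = TRUE )

# standard deviation is the square root of the variance
toy_sd <- sqrt( toy_var )

# standard error of the mean, assuming independent observations
toy_se <- toy_sd / sqrt( n_obs )

c( variance = toy_var , sd = toy_sd , se = toy_se )
```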

Regression Models and Tests of Association

Perform a t-test:

t.test( mhi_2014 ~ whole_county_hpsa_2016 , ahrf_df )
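With a formula and no other arguments, t.test() runs Welch's test, which does not assume equal variances across the two groups. A sketch using R's built-in mtcars data rather than the AHRF:

```r
# two-group comparison on a built-in data set
toy_welch <- t.test( mpg ~ am , data = mtcars )

# the default method does not pool the group variances
toy_welch$method

# var.equal = TRUE requests the classic pooled-variance test instead
toy_pooled <- t.test( mpg ~ am , data = mtcars , var.equal = TRUE )

# the pooled test uses n - 2 degrees of freedom
toy_pooled$parameter
```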

Perform a chi-squared test of association:

this_table <- table( ahrf_df[ , c( "whole_county_hpsa_2016" , "census_region" ) ] )

chisq.test( this_table )

Fit a generalized linear model:

glm_result <- 
    glm( 
        mhi_2014 ~ whole_county_hpsa_2016 + census_region , 
        data = ahrf_df
    )

summary( glm_result )
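Because no family = argument is supplied, glm() falls back to its default gaussian family with an identity link, so the fit matches ordinary least squares. A sketch confirming this on R's built-in mtcars data rather than the AHRF:

```r
# glm() with its default family (gaussian, identity link)
toy_glm <- glm( mpg ~ wt + factor( cyl ) , data = mtcars )

# the equivalent ordinary least squares fit
toy_lm <- lm( mpg ~ wt + factor( cyl ) , data = mtcars )

# the default family used by glm()
family( toy_glm )$family

# coefficients agree up to numerical tolerance
all.equal( coef( toy_glm ) , coef( toy_lm ) )
```

For a binary outcome, a different family (for example family = binomial()) would be the usual choice.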

Analysis Examples with dplyr  

The R dplyr package offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr provides verbs such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. The dplyr vignette details the available features. As a starting point for AHRF users, this code replicates previously presented examples:

library(dplyr)
ahrf_tbl <- as_tibble( ahrf_df )

Calculate the mean (average) of a linear variable, overall and by groups:

ahrf_tbl %>%
    summarize( mean = mean( mhi_2014 , na.rm = TRUE ) )

ahrf_tbl %>%
    group_by( cbsa_indicator_code ) %>%
    summarize( mean = mean( mhi_2014 , na.rm = TRUE ) )