Area Health Resource File (AHRF)
Though not a survey data set itself, useful to merge onto other microdata.
One table with one row per county and a second table with one row per state.
Replaced annually with the latest available county- and state-level statistics.
Compiled by the United States Health Services and Resources Administration (HRSA).
Simplified Download and Importation
The R lodown
package easily downloads and imports all available AHRF microdata by simply specifying "ahrf"
with an output_dir =
parameter in the lodown()
function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.
library(lodown)
lodown( "ahrf" , output_dir = file.path( path.expand( "~" ) , "AHRF" ) )
Analysis Examples with base R
Load a data frame:
ahrf_df <- readRDS( file.path( path.expand( "~" ) , "AHRF" , "county/AHRF_2016-2017.rds" ) )
Variable Recoding
Add new columns to the data set:
ahrf_df <-
transform(
ahrf_df ,
cbsa_indicator_code =
factor(
1 + as.numeric( f1406715 ) ,
labels = c( "not metro" , "metro" , "micro" )
) ,
mhi_2014 = f1322614 ,
whole_county_hpsa_2016 = as.numeric( f0978716 == 1 ) ,
census_region =
factor(
as.numeric( f04439 ) ,
labels = c( "northeast" , "midwest" , "south" , "west" )
)
)
Unweighted Counts
Count the unweighted number of records in the table, overall and by groups:
nrow( ahrf_df )
table( ahrf_df[ , "cbsa_indicator_code" ] , useNA = "always" )
Descriptive Statistics
Calculate the mean (average) of a linear variable, overall and by groups:
mean( ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )
tapply(
ahrf_df[ , "mhi_2014" ] ,
ahrf_df[ , "cbsa_indicator_code" ] ,
mean ,
na.rm = TRUE
)
Calculate the distribution of a categorical variable, overall and by groups:
prop.table( table( ahrf_df[ , "census_region" ] ) )
prop.table(
table( ahrf_df[ , c( "census_region" , "cbsa_indicator_code" ) ] ) ,
margin = 2
)
Calculate the sum of a linear variable, overall and by groups:
sum( ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )
tapply(
ahrf_df[ , "mhi_2014" ] ,
ahrf_df[ , "cbsa_indicator_code" ] ,
sum ,
na.rm = TRUE
)
Calculate the median (50th percentile) of a linear variable, overall and by groups:
quantile( ahrf_df[ , "mhi_2014" ] , 0.5 , na.rm = TRUE )
tapply(
ahrf_df[ , "mhi_2014" ] ,
ahrf_df[ , "cbsa_indicator_code" ] ,
quantile ,
0.5 ,
na.rm = TRUE
)
Subsetting
Limit your data.frame
to California:
sub_ahrf_df <- subset( ahrf_df , f12424 == "CA" )
Calculate the mean (average) of this subset:
mean( sub_ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )
Measures of Uncertainty
Calculate the variance, overall and by groups:
var( ahrf_df[ , "mhi_2014" ] , na.rm = TRUE )
tapply(
ahrf_df[ , "mhi_2014" ] ,
ahrf_df[ , "cbsa_indicator_code" ] ,
var ,
na.rm = TRUE
)
Regression Models and Tests of Association
Perform a t-test:
t.test( mhi_2014 ~ whole_county_hpsa_2016 , ahrf_df )
Perform a chi-squared test of association:
this_table <- table( ahrf_df[ , c( "whole_county_hpsa_2016" , "census_region" ) ] )
chisq.test( this_table )
Perform a generalized linear model:
glm_result <-
glm(
mhi_2014 ~ whole_county_hpsa_2016 + census_region ,
data = ahrf_df
)
summary( glm_result )
Analysis Examples with dplyr
The R dplyr
library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize
, group_by
, and mutate
, the convenience of pipe-able functions, and the tidyverse
style of non-standard evaluation. This vignette details the available features. As a starting point for AHRF users, this code replicates previously-presented examples:
library(dplyr)
ahrf_tbl <- tbl_df( ahrf_df )
Calculate the mean (average) of a linear variable, overall and by groups:
ahrf_tbl %>%
summarize( mean = mean( mhi_2014 , na.rm = TRUE ) )
ahrf_tbl %>%
group_by( cbsa_indicator_code ) %>%
summarize( mean = mean( mhi_2014 , na.rm = TRUE ) )