Area Health Resources Files (AHRF)
National, state, and county-level data on health care professions, health facilities, population characteristics, health workforce training, hospital utilization and expenditure, and the environment.
One table with one row per county and a second table with one row per state.
Replaced annually with the latest available county- and state-level statistics.
Compiled by the Bureau of Health Workforce at the Health Resources and Services Administration.
Please skim before you begin:
User Documentation for the County Area Health Resources File (AHRF) 2021-2022 Release
A haiku regarding this microdata:
# local aggregates
# to spread merge join spline regress
# like fresh buttered bread
Download, Import, Preparation
Download and import the most current county-level file:
library(haven)

tf <- tempfile()

ahrf_url <- "https://data.hrsa.gov//DataDownload/AHRF/AHRF_2021-2022_SAS.zip"

download.file( ahrf_url , tf , mode = 'wb' )

unzipped_files <- unzip( tf , exdir = tempdir() )

sas_fn <- grep( "\\.sas7bdat$" , unzipped_files , value = TRUE )

ahrf_tbl <- read_sas( sas_fn )

ahrf_df <- data.frame( ahrf_tbl )

names( ahrf_df ) <- tolower( names( ahrf_df ) )
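The file-selection step above depends on a regular expression to pick the SAS data set out of the unzipped archive. A minimal sketch of how that pattern behaves, using hypothetical file names (the real archive contents may differ):

```r
# hypothetical archive contents; these names are assumptions for illustration
example_files <-
    c(
        file.path( tempdir() , "ahrf2022.sas7bdat" ) ,
        file.path( tempdir() , "technical_documentation.xlsx" ) ,
        file.path( tempdir() , "notes.sas7bdat.txt" )
    )

# the trailing `$` anchors the match to the end of the file name,
# so a file that merely contains ".sas7bdat" mid-name is excluded
example_sas_fn <- grep( "\\.sas7bdat$" , example_files , value = TRUE )

# read_sas() takes a single file, so confirm exactly one match before importing
stopifnot( length( example_sas_fn ) == 1 )

basename( example_sas_fn )
```

If the archive ever held more than one `.sas7bdat` file, the length check would stop the script before `read_sas()` received an ambiguous input.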
Save Locally
Save the object at any point:
# ahrf_fn <- file.path( path.expand( "~" ) , "AHRF" , "this_file.rds" )
# saveRDS( ahrf_df , file = ahrf_fn , compress = FALSE )
Load the same object:
# ahrf_df <- readRDS( ahrf_fn )
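The save and load steps can be checked with a quick round trip. This sketch uses a toy data.frame and a temporary file, both hypothetical stand-ins for `ahrf_df` and a permanent path:

```r
# toy data.frame standing in for ahrf_df; column names and values are made up
toy_df <-
    data.frame(
        county_id = c( "a" , "b" , "c" ) ,
        mhi_2020 = c( 50000 , 60000 , NA )
    )

# temporary location for illustration; use a permanent path for real work
toy_fn <- tempfile( fileext = ".rds" )

# write without compression, then read the object back
saveRDS( toy_df , file = toy_fn , compress = FALSE )
toy_reloaded <- readRDS( toy_fn )

# the reloaded object should match the original
stopifnot( isTRUE( all.equal( toy_df , toy_reloaded ) ) )
```

When saving to a permanent location, the target directory must already exist; `dir.create( dirname( ahrf_fn ) , recursive = TRUE )` is one way to create it beforehand.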
Variable Recoding
Add new columns to the data set:
ahrf_df <-
    transform(
        ahrf_df ,

        cbsa_indicator_code =
            factor(
                as.numeric( f1406720 ) ,
                levels = 0:2 ,
                labels = c( "not metro" , "metro" , "micro" )
            ) ,

        mhi_2020 = f1322620 ,

        whole_county_hpsa_2022 = as.numeric( f0978722 ) == 1 ,

        census_region =
            factor(
                as.numeric( f04439 ) ,
                levels = 1:4 ,
                labels = c( "northeast" , "midwest" , "south" , "west" )
            )
    )
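The recodes above pair numeric `levels` with text `labels`. A small sketch of that mapping on made-up codes, assuming the raw column arrives as character strings (an assumption about the import, not something the documentation confirms):

```r
# hypothetical raw codes, including one missing value
raw_codes <- c( "0" , "1" , "2" , "1" , NA )

# convert to numeric, then attach a label to each expected level
recoded <-
    factor(
        as.numeric( raw_codes ) ,
        levels = 0:2 ,
        labels = c( "not metro" , "metro" , "micro" )
    )

# cross-tabulate raw against recoded values to confirm the mapping
table( raw_codes , recoded , useNA = "always" )
```

Any code outside `levels` becomes `NA` in the factor, so a cross-tabulation like this is a quick way to spot unexpected values.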
Analysis Examples with base R
Unweighted Counts
Count the unweighted number of records in the table, overall and by groups:
nrow( ahrf_df )
table( ahrf_df[ , "cbsa_indicator_code" ] , useNA = "always" )
Descriptive Statistics
Calculate the mean (average) of a linear variable, overall and by groups:
mean( ahrf_df[ , "mhi_2020" ] , na.rm = TRUE )
tapply(
    ahrf_df[ , "mhi_2020" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    mean ,
    na.rm = TRUE
)
Calculate the distribution of a categorical variable, overall and by groups:
prop.table( table( ahrf_df[ , "census_region" ] ) )
prop.table(
table( ahrf_df[ , c( "census_region" , "cbsa_indicator_code" ) ] ) ,
margin = 2
)
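The `margin` argument controls which dimension the proportions sum over. A minimal sketch on made-up records shows that `margin = 2` makes each column sum to one:

```r
# toy records; the values are invented for illustration
toy_df <-
    data.frame(
        census_region = c( "south" , "south" , "west" , "west" , "west" ) ,
        cbsa_indicator_code = c( "metro" , "micro" , "metro" , "metro" , "micro" )
    )

# rows are census_region, columns are cbsa_indicator_code
toy_shares <- prop.table( table( toy_df ) , margin = 2 )

# with margin = 2, each column (cbsa_indicator_code group) sums to one
colSums( toy_shares )
```

Using `margin = 1` instead would give the distribution of `cbsa_indicator_code` within each region.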
Calculate the sum of a linear variable, overall and by groups:
sum( ahrf_df[ , "mhi_2020" ] , na.rm = TRUE )
tapply(
    ahrf_df[ , "mhi_2020" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    sum ,
    na.rm = TRUE
)
Calculate the median (50th percentile) of a linear variable, overall and by groups:
quantile( ahrf_df[ , "mhi_2020" ] , 0.5 , na.rm = TRUE )
tapply(
    ahrf_df[ , "mhi_2020" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    quantile ,
    0.5 ,
    na.rm = TRUE
)
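In the grouped calls above, every argument after the function name is forwarded to that function. A short sketch on made-up values, cross-checked against `median()`:

```r
# toy values and grouping variable, invented for illustration
toy_income <- c( 10 , 20 , 30 , 40 , NA , 60 )
toy_group <- c( "a" , "a" , "a" , "b" , "b" , "b" )

# 0.5 and na.rm = TRUE are passed through tapply() to quantile()
toy_medians <- tapply( toy_income , toy_group , quantile , 0.5 , na.rm = TRUE )

# the same statistic from median(), as a cross-check
toy_check <- tapply( toy_income , toy_group , median , na.rm = TRUE )

stopifnot( isTRUE( all.equal( as.numeric( toy_medians ) , as.numeric( toy_check ) ) ) )
```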
Subsetting
Limit your data.frame to California:
sub_ahrf_df <- subset( ahrf_df , f12424 == "CA" )
Calculate the mean (average) of this subset:
mean( sub_ahrf_df[ , "mhi_2020" ] , na.rm = TRUE )
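One detail worth knowing about `subset()` is that rows where the condition evaluates to `NA` are dropped along with rows where it is `FALSE`. A sketch on a toy data.frame, where `state_abbrev` is a hypothetical stand-in for the state abbreviation column:

```r
# toy records; state_abbrev and the income values are invented
toy_df <-
    data.frame(
        state_abbrev = c( "CA" , "CA" , "NV" , NA ) ,
        mhi_2020 = c( 70000 , 80000 , 60000 , 50000 )
    )

# the NA row is excluded because its comparison result is NA, not TRUE
toy_sub <- subset( toy_df , state_abbrev == "CA" )

nrow( toy_sub )

mean( toy_sub[ , "mhi_2020" ] , na.rm = TRUE )
```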
Measures of Uncertainty
Calculate the variance, overall and by groups:
var( ahrf_df[ , "mhi_2020" ] , na.rm = TRUE )
tapply(
    ahrf_df[ , "mhi_2020" ] ,
    ahrf_df[ , "cbsa_indicator_code" ] ,
    var ,
    na.rm = TRUE
)
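A variance can be turned into a standard error of the mean by dividing by the number of non-missing records and taking the square root. This sketch uses made-up values and treats the records as a simple random sample, which is an assumption of the illustration rather than a statement about AHRF:

```r
# toy values with one missing record, invented for illustration
toy_x <- c( 40 , 50 , 60 , NA , 70 )

# count only the records that contribute to the variance
n_obs <- sum( !is.na( toy_x ) )

toy_var <- var( toy_x , na.rm = TRUE )

# standard error of the mean under simple random sampling
toy_se <- sqrt( toy_var / n_obs )

toy_se
```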
Regression Models and Tests of Association
Perform a t-test:
t.test( mhi_2020 ~ whole_county_hpsa_2022 , ahrf_df )
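With its default settings, `t.test()` does not assume equal variances across the two groups and runs the Welch version of the test. A sketch on made-up data confirms which method was applied:

```r
# toy outcome and a two-level grouping variable, invented for illustration
toy_df <-
    data.frame(
        income = c( 40 , 45 , 50 , 55 , 60 , 70 , 80 , 90 ) ,
        shortage = rep( c( TRUE , FALSE ) , each = 4 )
    )

toy_test <- t.test( income ~ shortage , toy_df )

# the method element names the variant that was run
toy_test$method
```

Passing `var.equal = TRUE` would request the pooled-variance test instead.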
Perform a chi-squared test of association:
this_table <- table( ahrf_df[ , c( "whole_county_hpsa_2022" , "census_region" ) ] )
chisq.test( this_table )
Perform a generalized linear model:
glm_result <-
    glm(
        mhi_2020 ~ whole_county_hpsa_2022 + census_region ,
        data = ahrf_df
    )
summary( glm_result )
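Because no `family` is specified, `glm()` falls back to its gaussian default with an identity link, which yields the same coefficients as ordinary least squares. A sketch on simulated data (all names and values here are invented) compares the two fits:

```r
# simulated toy data; x, g, and y are hypothetical stand-ins
set.seed( 1 )

toy_df <-
    data.frame(
        x = rep( c( TRUE , FALSE ) , 10 ) ,
        g = factor( rep( c( "a" , "b" , "c" , "d" ) , each = 5 ) )
    )

toy_df$y <- 50 + 5 * toy_df$x + rnorm( 20 )

# glm() without a family argument uses gaussian()
glm_fit <- glm( y ~ x + g , data = toy_df )

# ordinary least squares fit of the same formula
lm_fit <- lm( y ~ x + g , data = toy_df )

all.equal( coef( glm_fit ) , coef( lm_fit ) )
```

For a binary or count outcome, a different `family` such as `binomial()` or `poisson()` would be the usual choice.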
Replication Example
Match the record count in row number 8,543 of AHRF 2021-2022 Technical Documentation.xlsx:
stopifnot( nrow( ahrf_df ) == 3232 )
Analysis Examples with dplyr
The R dplyr library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. This vignette details the available features. As a starting point for AHRF users, this code replicates previously-presented examples:
library(dplyr)
ahrf_tbl <- as_tibble( ahrf_df )
Calculate the mean (average) of a linear variable, overall and by groups:
ahrf_tbl %>%
    summarize( mean = mean( mhi_2020 , na.rm = TRUE ) )

ahrf_tbl %>%
    group_by( cbsa_indicator_code ) %>%
    summarize( mean = mean( mhi_2020 , na.rm = TRUE ) )
Analysis Examples with data.table
The R data.table library provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience, and programming speed. data.table offers concise syntax that is fast to type and fast to read, along with speed, memory efficiency, careful API lifecycle management, an active community, and a rich set of features. This vignette details the available features. As a starting point for AHRF users, this code replicates previously-presented examples:
library(data.table)
ahrf_dt <- data.table( ahrf_df )
Calculate the mean (average) of a linear variable, overall and by groups:
ahrf_dt[ , mean( mhi_2020 , na.rm = TRUE ) ]

ahrf_dt[ , mean( mhi_2020 , na.rm = TRUE ) , by = cbsa_indicator_code ]
Analysis Examples with duckdb
The R duckdb library provides an embedded analytical data management system with support for the Structured Query Language (SQL). duckdb offers a simple, feature-rich, fast, and free SQL OLAP management system. This vignette details the available features. As a starting point for AHRF users, this code replicates previously-presented examples:
library(duckdb)
con <- dbConnect( duckdb::duckdb() , dbdir = 'my-db.duckdb' )
dbWriteTable( con , 'ahrf' , ahrf_df )
Calculate the mean (average) of a linear variable, overall and by groups:
dbGetQuery( con , 'SELECT AVG( mhi_2020 ) FROM ahrf' )
dbGetQuery(
    con ,
    'SELECT
        cbsa_indicator_code ,
        AVG( mhi_2020 )
    FROM
        ahrf
    GROUP BY
        cbsa_indicator_code'
)