Basic Stand Alone Public Use File (BSAPUF)

The CMS Basic Stand Alone Public Use File (BSAPUF) contains a five percent sample of Medicare beneficiary spending and utilization in the enrolled population.

Simplified Download and Importation

The R lodown package easily downloads and imports all available BSAPUF microdata by simply specifying "bsapuf" with an output_dir = parameter in the lodown() function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

library(lodown)
lodown( "bsapuf" , output_dir = file.path( path.expand( "~" ) , "BSAPUF" ) )

Analysis Examples with SQL and MonetDBLite

Connect to a database:

library(DBI)
dbdir <- file.path( path.expand( "~" ) , "BSAPUF" , "MonetDB" )
db <- dbConnect( MonetDBLite::MonetDBLite() , dbdir )

Variable Recoding

Add new columns to the data set:

dbSendQuery( 
    db , 
    "ALTER TABLE bsa_partd_events_2008 ADD COLUMN brand_name_drug INTEGER" 
)

dbSendQuery( db , 
    "UPDATE bsa_partd_events_2008 
    SET brand_name_drug = 
        CASE 
            WHEN pde_drug_type_cd = 1 THEN 1 
            WHEN pde_drug_type_cd = 2 THEN 0 
            ELSE NULL 
        END" 
)

Unweighted Counts

Count the unweighted number of records in the SQL table, overall and by groups:

dbGetQuery( db , "SELECT COUNT(*) FROM bsa_partd_events_2008" )

dbGetQuery( db ,
    "SELECT
        bene_sex_ident_cd ,
        COUNT(*) 
    FROM bsa_partd_events_2008
    GROUP BY bene_sex_ident_cd"
)

Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT AVG( pde_drug_cost ) FROM bsa_partd_events_2008" )

dbGetQuery( db , 
    "SELECT 
        bene_sex_ident_cd , 
        AVG( pde_drug_cost ) AS mean_pde_drug_cost
    FROM bsa_partd_events_2008 
    GROUP BY bene_sex_ident_cd" 
)

Initiate a function that allows division by zero:

dbSendQuery( db , 
    "CREATE FUNCTION 
        div_noerror(l DOUBLE, r DOUBLE) 
    RETURNS DOUBLE 
    EXTERNAL NAME calc.div_noerror" 
)

Calculate the distribution of a categorical variable:

dbGetQuery( db , 
    "SELECT 
        bene_age_cat_cd , 
        div_noerror( 
            COUNT(*) , 
            ( SELECT COUNT(*) FROM bsa_partd_events_2008 ) 
        ) AS share_bene_age_cat_cd
    FROM bsa_partd_events_2008 
    GROUP BY bene_age_cat_cd" 
)

Calculate the sum of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT SUM( pde_drug_cost ) FROM bsa_partd_events_2008" )

dbGetQuery( db , 
    "SELECT 
        bene_sex_ident_cd , 
        SUM( pde_drug_cost ) AS sum_pde_drug_cost 
    FROM bsa_partd_events_2008 
    GROUP BY bene_sex_ident_cd" 
)

Calculate the median (50th percentile) of a linear variable, overall and by groups:

dbGetQuery( db , "SELECT QUANTILE( pde_drug_cost , 0.5 ) FROM bsa_partd_events_2008" )

dbGetQuery( db , 
    "SELECT 
        bene_sex_ident_cd , 
        QUANTILE( pde_drug_cost , 0.5 ) AS median_pde_drug_cost
    FROM bsa_partd_events_2008 
    GROUP BY bene_sex_ident_cd" 
)

Subsetting

Limit your SQL analysis to events where patient paid 100% of drug’s cost with WHERE:

dbGetQuery( db ,
    "SELECT
        AVG( pde_drug_cost )
    FROM bsa_partd_events_2008
    WHERE pde_drug_pat_pay_cd = 3"
)

Measures of Uncertainty

Calculate the variance and standard deviation, overall and by groups:

dbGetQuery( db , 
    "SELECT 
        VAR_SAMP( pde_drug_cost ) , 
        STDDEV_SAMP( pde_drug_cost ) 
    FROM bsa_partd_events_2008" 
)

dbGetQuery( db , 
    "SELECT 
        bene_sex_ident_cd , 
        VAR_SAMP( pde_drug_cost ) AS var_pde_drug_cost ,
        STDDEV_SAMP( pde_drug_cost ) AS stddev_pde_drug_cost
    FROM bsa_partd_events_2008 
    GROUP BY bene_sex_ident_cd" 
)

Regression Models and Tests of Association

Calculate the correlation between two variables, overall and by groups:

dbGetQuery( db , 
    "SELECT 
        CORR( CAST( brand_name_drug AS DOUBLE ) , CAST( pde_drug_cost AS DOUBLE ) )
    FROM bsa_partd_events_2008" 
)

dbGetQuery( db , 
    "SELECT 
        bene_sex_ident_cd , 
        CORR( CAST( brand_name_drug AS DOUBLE ) , CAST( pde_drug_cost AS DOUBLE ) )
    FROM bsa_partd_events_2008 
    GROUP BY bene_sex_ident_cd" 
)

Analysis Examples with dplyr

The R dplyr library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. This vignette details the available features. As a starting point for BSAPUF users, this code replicates previously-presented examples:

library(dplyr)
dplyr_db <- MonetDBLite::src_monetdblite( dbdir )
bsapuf_tbl <- tbl( dplyr_db , 'bsa_partd_events_2008' )

Calculate the mean (average) of a linear variable, overall and by groups:

bsapuf_tbl %>%
    summarize( mean = mean( pde_drug_cost ) )

bsapuf_tbl %>%
    group_by( bene_sex_ident_cd ) %>%
    summarize( mean = mean( pde_drug_cost ) )

Replication Example

dbGetQuery( db , "SELECT COUNT(*) FROM bsa_partd_events_2008" )

Database Shutdown

dbDisconnect( db , shutdown = TRUE )