Social Security Administration Public Use Microdata (SSA)

Research extracts provided by the Social Security Administration.

  • Tables contain either one record per person or one record per person per year.

  • Samples drawn from the entire population of either Social Security number holders (most of the country) or Social Security recipients (just beneficiaries). To estimate nationwide counts, multiply counts from one-percent samples by 100 and counts from five-percent samples by 20.

  • No expected release timeline.

  • Released by the United States Social Security Administration (SSA).
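The scaling rule for sampled extracts can be sketched as follows. This is a minimal illustration on made-up data: toy_sample, its stat codes, and its fpmt values are hypothetical stand-ins, not actual SSA records.

```r
# Toy stand-in for a one-percent sample extract (all values are made up)
toy_sample <-
    data.frame(
        stat = c( "A" , "A" , "B" , "C" , "C" , "C" ) ,
        fpmt = c( 100 , 250 , 0 , 733 , 400 , 120 )
    )

# A one-percent sample keeps 1 of every 100 records,
# so each sampled record stands in for 100 people
scaling_factor <- 100

# Unweighted record counts within the sample
sample_counts <- table( toy_sample[ , "stat" ] )

# Estimated nationwide counts
national_counts <- sample_counts * scaling_factor

print( national_counts )
```

For a five-percent sample, the same sketch would use a scaling_factor of 20.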

Simplified Download and Importation

The R lodown package easily downloads and imports all available SSA microdata by simply specifying "ssa" with an output_dir = parameter in the lodown() function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

library(lodown)
lodown( "ssa" , output_dir = file.path( path.expand( "~" ) , "SSA" ) )

Analysis Examples with base R  

Load a data frame:

ssa_df <- readRDS( file.path( path.expand( "~" ) , "SSA" , "ssr_data/SSIPUF.rds" ) )

Variable Recoding

Add new columns to the data set:

ssa_df <- 
    transform( 
        ssa_df , 
        
        mental_disorder = as.numeric( diag %in% 1:2 ) ,
        
        program_eligibility =
            factor( 
                prel , 
                
                levels = 0:5 , 
                
                labels =
                    c( "Unspecified" ,
                    "Aged individual" ,
                    "Aged spouse" ,
                    "Disabled or blind individual" ,
                    "Disabled or blind spouse" ,
                    "Disabled or blind child" )
            )
            
    )
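One detail of the factor() recode worth checking: any code that falls outside the levels = range silently becomes NA. A small sketch on a made-up vector (toy_prel is hypothetical, and the code 9 is included only to show the behavior):

```r
# Made-up codes; 9 is deliberately outside the 0:5 range
toy_prel <- c( 0 , 3 , 5 , 1 , 9 )

toy_eligibility <-
    factor(
        toy_prel ,
        levels = 0:5 ,
        labels =
            c( "Unspecified" ,
            "Aged individual" ,
            "Aged spouse" ,
            "Disabled or blind individual" ,
            "Disabled or blind spouse" ,
            "Disabled or blind child" )
    )

# Codes not listed in levels= are converted to NA rather than raising an error
print( toy_eligibility )

# Count how many records were left unlabeled by the recode
sum( is.na( toy_eligibility ) )
```

Running table( ... , useNA = "always" ) on the recoded column is a quick way to confirm that no unexpected codes were dropped.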

Unweighted Counts

Count the unweighted number of records in the table, overall and by groups:

nrow( ssa_df )

table( ssa_df[ , "stat" ] , useNA = "always" )

Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

mean( ssa_df[ , "fpmt" ] )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    mean 
)

Calculate the distribution of a categorical variable, overall and by groups:

prop.table( table( ssa_df[ , "program_eligibility" ] ) )

prop.table(
    table( ssa_df[ , c( "program_eligibility" , "stat" ) ] ) ,
    margin = 2
)
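The margin = argument controls which totals the shares are computed against. A minimal sketch on made-up data (toy_df and its category values are hypothetical) shows that margin = 2 makes each column sum to one, while margin = 1 does the same for each row:

```r
# Made-up two-way classification
toy_df <-
    data.frame(
        elig = c( "Aged" , "Aged" , "Disabled" , "Disabled" , "Disabled" , "Aged" ) ,
        stat = c( "A" , "B" , "A" , "A" , "B" , "B" )
    )

toy_table <- table( toy_df[ , c( "elig" , "stat" ) ] )

# Shares within each column (each stat group sums to 1)
col_shares <- prop.table( toy_table , margin = 2 )

# Shares within each row (each elig group sums to 1)
row_shares <- prop.table( toy_table , margin = 1 )

colSums( col_shares )
rowSums( row_shares )
```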

Calculate the sum of a linear variable, overall and by groups:

sum( ssa_df[ , "fpmt" ] )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    sum 
)

Calculate the median (50th percentile) of a linear variable, overall and by groups:

quantile( ssa_df[ , "fpmt" ] , 0.5 )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    quantile ,
    0.5 
)

Subsetting

Limit your data.frame to females:

sub_ssa_df <- subset( ssa_df , sex == "F" )

Calculate the mean (average) of this subset:

mean( sub_ssa_df[ , "fpmt" ] )

Measures of Uncertainty

Calculate the variance, overall and by groups:

var( ssa_df[ , "fpmt" ] )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    var 
)
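The variance can be turned into a standard error and a confidence interval for the mean. The sketch below assumes independent records from a simple random sample, which is an assumption made for illustration and not a statement about the actual design of the SSA extracts. The toy_fpmt values are made up.

```r
# Made-up payment amounts
toy_fpmt <- c( 100 , 250 , 0 , 733 , 400 , 120 )

n <- length( toy_fpmt )

# Standard error of the mean under simple random sampling (assumed)
se_mean <- sqrt( var( toy_fpmt ) / n )

# 95% confidence interval based on the t distribution
ci_mean <-
    mean( toy_fpmt ) + c( -1 , 1 ) * qt( 0.975 , df = n - 1 ) * se_mean

print( se_mean )
print( ci_mean )
```

The same interval is returned by t.test( toy_fpmt )$conf.int, which can serve as a cross-check.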

Regression Models and Tests of Association

Perform a t-test:

t.test( fpmt ~ mental_disorder , ssa_df )

Perform a chi-squared test of association:

this_table <- table( ssa_df[ , c( "mental_disorder" , "program_eligibility" ) ] )

chisq.test( this_table )

Fit a generalized linear model (with no family = argument, glm() defaults to a Gaussian family with an identity link):

glm_result <- 
    glm( 
        fpmt ~ mental_disorder + program_eligibility , 
        data = ssa_df
    )

summary( glm_result )
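Because glm() uses a Gaussian family when family = is omitted, a call like the one above estimates the same coefficients as ordinary least squares. A small sketch on simulated data (toy_df, x, and y are hypothetical names, and the numbers are randomly generated) illustrates the equivalence:

```r
set.seed( 1 )

# Simulated data: a binary predictor and a continuous outcome
toy_df <- data.frame( x = rep( 0:1 , each = 10 ) )
toy_df$y <- 500 + 50 * toy_df$x + rnorm( 20 , sd = 25 )

# glm() without family= defaults to gaussian with an identity link
glm_fit <- glm( y ~ x , data = toy_df )

# Ordinary least squares on the same formula
lm_fit <- lm( y ~ x , data = toy_df )

# Confirm the default family and compare the coefficients
family( glm_fit )$family
all.equal( coef( glm_fit ) , coef( lm_fit ) )
```

For a binary outcome such as mental_disorder, a logistic model would need family = binomial() passed explicitly.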

Analysis Examples with dplyr  

The R dplyr library offers a grammar of data manipulation as an alternative to base R and SQL syntax. dplyr provides many verbs, such as summarize, group_by, and mutate, along with pipe-able functions and the tidyverse style of non-standard evaluation. The dplyr vignette details the available features. As a starting point for SSA users, this code replicates previously-presented examples:

library(dplyr)
ssa_tbl <- as_tibble( ssa_df )

Calculate the mean (average) of a linear variable, overall and by groups:

ssa_tbl %>%
    summarize( mean = mean( fpmt ) )

ssa_tbl %>%
    group_by( stat ) %>%
    summarize( mean = mean( fpmt ) )
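The sum and median examples from the base R section can be written in the same style. This is a sketch on made-up data: toy_tbl and its values are hypothetical, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Made-up stand-in for ssa_tbl
toy_tbl <-
    as_tibble(
        data.frame(
            stat = c( "A" , "A" , "B" , "C" , "C" , "C" ) ,
            fpmt = c( 100 , 250 , 0 , 733 , 400 , 120 )
        )
    )

# Overall sum and median in a single summarize() call
overall <-
    toy_tbl %>%
    summarize( total = sum( fpmt ) , median = median( fpmt ) )

# The same statistics, one row per group
by_stat <-
    toy_tbl %>%
    group_by( stat ) %>%
    summarize( total = sum( fpmt ) , median = median( fpmt ) )

print( overall )
print( by_stat )
```

Several statistics can share one summarize() call, which avoids repeating the group_by() step for each one.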