# Social Security Administration Public Use Microdata (SSA)

Research extracts provided by the Social Security Administration.

• Tables contain either one record per person or one record per person per year.

• The target population is either all social security number holders (most of the country) or all social security recipients (just beneficiaries). To estimate nationwide counts, multiply counts from one-percent samples by 100 and counts from five-percent samples by 20.

• No expected release timeline.

• Released by the United States Social Security Administration (SSA).
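The scaling rule for sample extracts in the list above can be sketched as follows. This is a minimal illustration on a made-up table: `toy_sample`, its `stat` values, and `scale_factor` are hypothetical names and data, not real SSA records.

```r
# hypothetical one-percent extract (toy data, not real SSA records)
toy_sample <- data.frame( stat = c( "A" , "A" , "B" , "C" , "C" , "C" ) )

# each record in a one-percent sample stands in for 100 people;
# for a five-percent sample, this would be 20 instead
scale_factor <- 100

# unweighted record counts by group
sample_counts <- table( toy_sample[ , "stat" ] )

# estimated nationwide counts by group
estimated_counts <- sample_counts * scale_factor

estimated_counts
```

Only count statistics and sums need this scaling. Means, medians, and proportions are unchanged when every record carries the same constant weight.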

The R `lodown` package downloads and imports all available SSA microdata when you specify `"ssa"` along with an `output_dir =` parameter in the `lodown()` function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

```r
library(lodown)
lodown( "ssa" , output_dir = file.path( path.expand( "~" ) , "SSA" ) )
```

## Analysis Examples with base R

``ssa_df <- readRDS( file.path( path.expand( "~" ) , "SSA" , "ssr_data/SSIPUF.rds" ) )``

### Variable Recoding

Add new columns to the data set:

```r
ssa_df <-
    transform(
        ssa_df ,

        # 1 when diag is coded 1 or 2, otherwise 0
        mental_disorder = as.numeric( diag %in% 1:2 ) ,

        # attach text labels to the numeric prel codes 0 through 5
        program_eligibility =
            factor(
                prel ,

                levels = 0:5 ,

                labels =
                    c( "Unspecified" ,
                    "Aged individual" ,
                    "Aged spouse" ,
                    "Disabled or blind individual" ,
                    "Disabled or blind spouse" ,
                    "Disabled or blind child" )
            )

    )
```
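It can help to check a recode against its source column before relying on it. The sketch below applies the same `transform()` logic to a small made-up table (`toy_df` and its values are hypothetical) and cross-tabulates the result. Note that `%in%` returns `FALSE` for missing values, so a record with a missing `diag` ends up with `mental_disorder` equal to 0 rather than `NA`.

```r
# hypothetical records, including one missing diagnosis code
toy_df <- data.frame( diag = c( 1 , 2 , 3 , NA ) , prel = c( 0 , 3 , 5 , 1 ) )

toy_df <-
    transform(
        toy_df ,
        mental_disorder = as.numeric( diag %in% 1:2 ) ,
        program_eligibility =
            factor(
                prel ,
                levels = 0:5 ,
                labels =
                    c( "Unspecified" ,
                    "Aged individual" ,
                    "Aged spouse" ,
                    "Disabled or blind individual" ,
                    "Disabled or blind spouse" ,
                    "Disabled or blind child" )
            )
    )

# compare the original codes with the recoded flag, showing missing values
table( toy_df[ , c( "diag" , "mental_disorder" ) ] , useNA = "always" )

# confirm each numeric code received the intended label
table( toy_df[ , c( "prel" , "program_eligibility" ) ] )
```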

### Unweighted Counts

Count the unweighted number of records in the table, overall and by groups:

```r
nrow( ssa_df )

table( ssa_df[ , "stat" ] , useNA = "always" )
```

### Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

```r
mean( ssa_df[ , "fpmt" ] )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    mean
)
```

Calculate the distribution of a categorical variable, overall and by groups:

```r
prop.table( table( ssa_df[ , "program_eligibility" ] ) )

prop.table(
    table( ssa_df[ , c( "program_eligibility" , "stat" ) ] ) ,
    margin = 2
)
```

Calculate the sum of a linear variable, overall and by groups:

```r
sum( ssa_df[ , "fpmt" ] )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    sum
)
```

Calculate the median (50th percentile) of a linear variable, overall and by groups:

```r
quantile( ssa_df[ , "fpmt" ] , 0.5 )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    quantile ,
    0.5
)
```
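The calls above assume the analysis variable has no missing values. If it did, `mean()` and `sum()` would return `NA` and `quantile()` would stop with an error. The sketch below, on made-up data (`toy_df` is hypothetical), shows how `na.rm = TRUE` can be passed through `tapply()` as an extra argument to the summary function.

```r
# hypothetical payments with one missing value
toy_df <-
    data.frame(
        stat = c( "A" , "A" , "A" , "B" , "B" ) ,
        fpmt = c( 100 , NA , 300 , 200 , 400 )
    )

# without na.rm, the overall mean is NA
mean_with_na <- mean( toy_df[ , "fpmt" ] )

# arguments after the function name are forwarded to quantile()
group_medians <-
    tapply(
        toy_df[ , "fpmt" ] ,
        toy_df[ , "stat" ] ,
        quantile ,
        0.5 ,
        na.rm = TRUE
    )

group_medians
```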

### Subsetting

Limit your `data.frame` to females:

``sub_ssa_df <- subset( ssa_df , sex == "F" )``

Calculate the mean (average) of this subset:

``mean( sub_ssa_df[ , "fpmt" ] )``

### Measures of Uncertainty

Calculate the variance, overall and by groups:

```r
var( ssa_df[ , "fpmt" ] )

tapply(
    ssa_df[ , "fpmt" ] ,
    ssa_df[ , "stat" ] ,
    var
)
```
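A variance can be turned into a standard error and confidence interval for the mean. The sketch below assumes the records can be treated as a simple random sample, which may not hold for every SSA extract, and uses made-up values (`toy_fpmt` is hypothetical).

```r
# hypothetical payment amounts
toy_fpmt <- c( 100 , 250 , 400 , 550 , 700 )

n <- length( toy_fpmt )

# standard error of the mean under simple random sampling
standard_error <- sqrt( var( toy_fpmt ) / n )

# 95% confidence interval based on the t distribution
confidence_interval <-
    mean( toy_fpmt ) + c( -1 , 1 ) * qt( 0.975 , df = n - 1 ) * standard_error

standard_error

confidence_interval
```

The interval computed this way should match the one reported by `t.test( toy_fpmt )`.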

### Regression Models and Tests of Association

Perform a t-test:

``t.test( fpmt ~ mental_disorder , ssa_df )``

Perform a chi-squared test of association:

```r
this_table <- table( ssa_df[ , c( "mental_disorder" , "program_eligibility" ) ] )

chisq.test( this_table )
```

Fit a generalized linear model:

```r
glm_result <-
    glm(
        fpmt ~ mental_disorder + program_eligibility ,
        data = ssa_df
    )

summary( glm_result )
```
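Because the `glm()` call above does not set `family =`, R uses the default Gaussian family with an identity link, so the fit is an ordinary linear regression. The sketch below illustrates this on made-up data (`toy_df` is hypothetical) by comparing `glm()` with `lm()`.

```r
# hypothetical payments and a binary indicator
toy_df <-
    data.frame(
        fpmt = c( 100 , 200 , 150 , 300 , 250 , 400 ) ,
        mental_disorder = c( 0 , 1 , 0 , 1 , 0 , 1 )
    )

glm_fit <- glm( fpmt ~ mental_disorder , data = toy_df )

lm_fit <- lm( fpmt ~ mental_disorder , data = toy_df )

# the family that glm() used when none was specified
family( glm_fit )$family

# the two sets of coefficients agree
all.equal( coef( glm_fit ) , coef( lm_fit ) )
```

For a binary or count outcome, a different family such as `binomial()` or `poisson()` would need to be passed explicitly.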

## Analysis Examples with `dplyr`

The R `dplyr` package offers a grammar of data manipulation as an alternative to base R and SQL syntax. It provides many verbs, such as `summarize`, `group_by`, and `mutate`, along with pipe-able functions and the `tidyverse` style of non-standard evaluation. The `dplyr` vignette details the available features. As a starting point for SSA users, this code replicates previously-presented examples:

```r
library(dplyr)
# as_tibble() replaces tbl_df(), which is deprecated in current dplyr releases
ssa_tbl <- as_tibble( ssa_df )
```

Calculate the mean (average) of a linear variable, overall and by groups:

```r
ssa_tbl %>%
    summarize( mean = mean( fpmt ) )

ssa_tbl %>%
    group_by( stat ) %>%
    summarize( mean = mean( fpmt ) )
```
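The other base R summaries shown earlier follow the same pattern. As a sketch on made-up data (`toy_tbl` and the output column names are hypothetical), several statistics can be computed by group in a single `summarize()` call:

```r
library(dplyr)

# hypothetical payments for two groups
toy_tbl <-
    as_tibble(
        data.frame(
            stat = c( "A" , "A" , "B" , "B" ) ,
            fpmt = c( 100 , 300 , 200 , 600 )
        )
    )

toy_summary <-
    toy_tbl %>%
    group_by( stat ) %>%
    summarize(
        total = sum( fpmt ) ,
        median = median( fpmt ) ,
        variance = var( fpmt )
    )

toy_summary
```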