General Social Survey (GSS)
A historical record of the concerns, experiences, attitudes, and practices of residents of the United States.
Both cross-sectional and panel tables with one row per sampled respondent.
A complex sample survey generalizing to non-institutionalized adults (18+) in the United States.
Updated biennially since 1972.
Funded by the National Science Foundation, administered by the National Opinion Research Center.
Download, Import, Preparation
Download and import the 1972-2022 cumulative data file:
library(haven)
zip_tf <- tempfile()
zip_url <- "https://gss.norc.org/Documents/sas/GSS_sas.zip"
download.file( zip_url , zip_tf , mode = 'wb' )
unzipped_files <- unzip( zip_tf , exdir = tempdir() )
gss_tbl <- read_sas( grep( '\\.sas7bdat$' , unzipped_files , value = TRUE ) )
gss_df <- data.frame( gss_tbl )
names( gss_df ) <- tolower( names( gss_df ) )
gss_df[ , 'one' ] <- 1
Save Locally
Save the object at any point:
# gss_fn <- file.path( path.expand( "~" ) , "GSS" , "this_file.rds" )
# saveRDS( gss_df , file = gss_fn , compress = FALSE )
Load the same object:
# gss_df <- readRDS( gss_fn )
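The recoding and analysis steps that follow operate on a survey design object named gss_design. A minimal sketch of how that object could be constructed from gss_df with the survey library is shown here. The column names vpsu (variance primary sampling unit), vstrat (variance stratum), and wtssnrps (weight), the restriction to 1975 onward, and the lonely-PSU setting are all assumptions to check against the codebook, not a confirmed specification:

```r
library(survey)

# assumption: strata containing a single PSU are centered at the sample
# grand mean instead of stopping with an error
options( survey.lonely.psu = "adjust" )

# assumption: vpsu, vstrat, and wtssnrps hold the PSU, stratum, and weight,
# and the design variables are only populated from 1975 onward
gss_design <-
    svydesign(
        ids = ~ vpsu ,
        strata = ~ interaction( year , vstrat ) ,
        data = subset( gss_df , year >= 1975 & !is.na( wtssnrps ) ) ,
        weights = ~ wtssnrps ,
        nest = TRUE
    )
```

Crossing year with vstrat keeps strata from different survey years distinct in the pooled file.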
Variable Recoding
Add new columns to the data set:
gss_design <-
    update(
        gss_design ,
        polviews =
            factor( polviews , levels = 1:7 ,
                labels = c( "Extremely liberal" , "Liberal" ,
                "Slightly liberal" , "Moderate, middle of the road" ,
                "Slightly conservative" , "Conservative" ,
                "Extremely conservative" )
            ) ,
        born_in_usa = as.numeric( born == 1 ) ,
        race = factor( race , levels = 1:3 , labels = c( "white" , "black" , "other" ) ) ,
        region =
            factor( region , levels = 1:9 ,
                labels = c( "New England" , "Middle Atlantic" ,
                "East North Central" , "West North Central" ,
                "South Atlantic" , "East South Central" ,
                "West South Central" , "Mountain" , "Pacific" )
            )
    )
Analysis Examples with the survey library
Unweighted Counts
Count the unweighted number of records in the survey sample, overall and by groups:
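One way to obtain these counts, sketched under the assumption that gss_design exists as a survey design object and still carries the column of ones (one) added during import:

```r
library(survey)

# overall: number of records with a nonzero sampling weight
overall_n <- sum( weights( gss_design , "sampling" ) != 0 )
overall_n

# by group: unweighted record counts within each region
region_n <- svyby( ~ one , ~ region , gss_design , unwtd.count )
region_n
```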
Descriptive Statistics
Calculate the mean (average) of a linear variable, overall and by groups:
svymean( ~ age , gss_design , na.rm = TRUE )
svyby( ~ age , ~ region , gss_design , svymean , na.rm = TRUE )
Calculate the distribution of a categorical variable, overall and by groups:
svymean( ~ race , gss_design , na.rm = TRUE )
svyby( ~ race , ~ region , gss_design , svymean , na.rm = TRUE )
Calculate the sum of a linear variable, overall and by groups:
svytotal( ~ age , gss_design , na.rm = TRUE )
svyby( ~ age , ~ region , gss_design , svytotal , na.rm = TRUE )
Calculate the weighted sum of a categorical variable, overall and by groups:
svytotal( ~ race , gss_design , na.rm = TRUE )
svyby( ~ race , ~ region , gss_design , svytotal , na.rm = TRUE )
Calculate the median (50th percentile) of a linear variable, overall and by groups:
svyquantile( ~ age , gss_design , 0.5 , na.rm = TRUE )
svyby(
    ~ age ,
    ~ region ,
    gss_design ,
    svyquantile ,
    0.5 ,
    ci = TRUE , na.rm = TRUE
)
Estimate a ratio:
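A minimal sketch using svyratio from the survey library. The choice of adults (household members aged 18 or older) as numerator and hompop (total household members) as denominator is an illustrative assumption about which GSS columns to use, not a prescribed pairing:

```r
library(survey)

# assumption: adults and hompop are numeric columns in the design;
# the result estimates the adult share of household members
ratio_result <-
    svyratio(
        numerator = ~ adults ,
        denominator = ~ hompop ,
        gss_design ,
        na.rm = TRUE
    )

ratio_result
```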
Measures of Uncertainty
Extract the coefficient, standard error, confidence interval, and coefficient of variation from any descriptive statistics function result, overall and by groups:
this_result <- svymean( ~ age , gss_design , na.rm = TRUE )
coef( this_result )
SE( this_result )
confint( this_result )
cv( this_result )
grouped_result <-
    svyby(
        ~ age ,
        ~ region ,
        gss_design ,
        svymean ,
        na.rm = TRUE
    )
coef( grouped_result )
SE( grouped_result )
confint( grouped_result )
cv( grouped_result )
Calculate the degrees of freedom of any survey design object:
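The survey library provides degf for this. A short sketch, assuming gss_design is the design object used throughout:

```r
library(survey)

# design degrees of freedom (roughly the number of PSUs minus the number of strata)
design_df <- degf( gss_design )
design_df
```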
Calculate the complex sample survey-adjusted variance of any statistic:
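A short sketch with svyvar from the survey library, using age as the example variable (an illustrative choice):

```r
library(survey)

# design-based estimate of the population variance of age
age_variance <- svyvar( ~ age , gss_design , na.rm = TRUE )
age_variance
```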
Include the complex sample design effect in the result for a specific statistic:
# SRS without replacement
svymean( ~ age , gss_design , na.rm = TRUE , deff = TRUE )
# SRS with replacement
svymean( ~ age , gss_design , na.rm = TRUE , deff = "replace" )
Compute confidence intervals for proportions using methods that may be more accurate near 0 and 1. See ?svyciprop for alternatives:
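A minimal sketch, assuming the 0/1 column born_in_usa created in the recoding step is present in gss_design. The "likelihood" method is one illustrative option among those that svyciprop accepts:

```r
library(survey)

# confidence interval for the proportion born in the United States;
# method = "likelihood" is an illustrative choice, see ?svyciprop for others
prop_ci <-
    svyciprop(
        ~ born_in_usa ,
        gss_design ,
        method = "likelihood" ,
        na.rm = TRUE
    )

prop_ci
confint( prop_ci )
```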
Replication Example
Match the unweighted record count totals on PDF page 74 of the Public Use File codebook:
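The published totals are not reproduced here. As a sketch, assuming the codebook table reports records per survey year, the unweighted counts to compare against it can be tabulated from the imported data frame:

```r
# unweighted number of records in each survey year,
# to be compared by eye against the codebook table
records_by_year <- table( gss_df[ , 'year' ] )
records_by_year
```

Once the published figures are in hand, wrapping each comparison in stopifnot() turns the check into an automated test.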
Analysis Examples with srvyr
The R srvyr library calculates summary statistics from survey data, such as the mean, total, or quantile, using dplyr-like syntax. srvyr allows for the use of many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, the tidyverse style of non-standard evaluation, and more consistent return types than the survey package. This vignette details the available features. As a starting point for GSS users, this code replicates previously-presented examples:
Calculate the mean (average) of a linear variable, overall and by groups:
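A minimal sketch of the srvyr equivalent, assuming gss_design is the survey design object from the earlier sections. The name gss_srvyr_design is introduced here for illustration:

```r
library(srvyr)

# convert the existing survey design into a srvyr (tbl_svy) object
gss_srvyr_design <- as_survey( gss_design )

# overall mean of age
overall_mean <-
    gss_srvyr_design %>%
    summarize( mean = survey_mean( age , na.rm = TRUE ) )

# mean of age within each region
region_mean <-
    gss_srvyr_design %>%
    group_by( region ) %>%
    summarize( mean = survey_mean( age , na.rm = TRUE ) )

overall_mean
region_mean
```

survey_mean returns the estimate alongside its standard error (here, columns mean and mean_se), so the grouped result is an ordinary data frame with one row per region.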