# General Social Survey (GSS)

A historical record of the concerns, experiences, attitudes, and practices of residents of the United States.

Both cross-sectional and panel tables with one row per sampled respondent.

A complex sample survey generalizing to non-institutionalized adults (18+) in the United States.

Fielded in most years from 1972 through 1993 and biennially since 1994.

Funded by the National Science Foundation and administered by the National Opinion Research Center.

## Download, Import, Preparation

Download and import the 1972-2022 cumulative data file:

```
library(haven)
zip_tf <- tempfile()
zip_url <- "https://gss.norc.org/Documents/sas/GSS_sas.zip"
download.file( zip_url , zip_tf , mode = 'wb' )
unzipped_files <- unzip( zip_tf , exdir = tempdir() )
gss_tbl <- read_sas( grep( '\\.sas7bdat$' , unzipped_files , value = TRUE ) )
gss_df <- data.frame( gss_tbl )
names( gss_df ) <- tolower( names( gss_df ) )
gss_df[ , 'one' ] <- 1
```

### Save locally

Save the object at any point:

```
# gss_fn <- file.path( path.expand( "~" ) , "GSS" , "this_file.rds" )
# saveRDS( gss_df , file = gss_fn , compress = FALSE )
```

Load the same object:
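The later sections work from a complex sample design object named `gss_design`. A minimal sketch of reloading the saved data and building that object with `survey::svydesign` might look like the block below. The cluster, stratum, and weight columns (`vpsu`, `vstrat`, `wtssnrps`) and the lonely-PSU setting are assumptions, not a confirmed specification, so check them against the codebook before relying on the results.

```r
# reload the data.frame saved above
# gss_df <- readRDS( gss_fn )

library(survey)

# assumption: adjust variance estimation for strata containing a single PSU
options( survey.lonely.psu = "adjust" )

# assumption: `vpsu` (cluster), `vstrat` (stratum), and `wtssnrps` (weight)
# are the design columns in the cumulative file
gss_design <-
	svydesign(
		ids = ~ vpsu ,
		strata = ~ interaction( year , vstrat ) ,
		# svydesign() does not accept missing design variables, so drop those rows
		data = subset( gss_df , !is.na( wtssnrps ) & !is.na( vpsu ) & !is.na( vstrat ) ) ,
		weights = ~ wtssnrps ,
		nest = TRUE
	)
```

Crossing `year` with `vstrat` keeps strata from different survey years separate in the pooled file.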

### Variable Recoding

Add new columns to the data set:

```
gss_design <-
	update(
		gss_design ,
		polviews =
			factor( polviews , levels = 1:7 ,
				labels = c( "Extremely liberal" , "Liberal" ,
				"Slightly liberal" , "Moderate, middle of the road" ,
				"Slightly conservative" , "Conservative" ,
				"Extremely conservative" )
			) ,
		born_in_usa = as.numeric( born == 1 ) ,
		race = factor( race , levels = 1:3 , labels = c( "white" , "black" , "other" ) ) ,
		region =
			factor( region , levels = 1:9 ,
				labels = c( "New England" , "Middle Atlantic" ,
				"East North Central" , "West North Central" ,
				"South Atlantic" , "East South Central" ,
				"West South Central" , "Mountain" , "Pacific" )
			)
	)
```

## Analysis Examples with the `survey` library

### Unweighted Counts

Count the unweighted number of records in the survey sample, overall and by groups:
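One way to obtain these counts, sketched here with `unwtd.count` from the `survey` package, is shown below. It assumes the `gss_design` object and the constant column `one` created during import.

```r
library(survey)

# overall: number of records carrying a nonzero sampling weight
sum( weights( gss_design , "sampling" ) != 0 )

# by group: unweighted record count within each region
svyby( ~ one , ~ region , gss_design , unwtd.count )
```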

### Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

```
svymean( ~ age , gss_design , na.rm = TRUE )
svyby( ~ age , ~ region , gss_design , svymean , na.rm = TRUE )
```

Calculate the distribution of a categorical variable, overall and by groups:

```
svymean( ~ race , gss_design , na.rm = TRUE )
svyby( ~ race , ~ region , gss_design , svymean , na.rm = TRUE )
```

Calculate the sum of a linear variable, overall and by groups:

```
svytotal( ~ age , gss_design , na.rm = TRUE )
svyby( ~ age , ~ region , gss_design , svytotal , na.rm = TRUE )
```

Calculate the weighted sum of a categorical variable, overall and by groups:

```
svytotal( ~ race , gss_design , na.rm = TRUE )
svyby( ~ race , ~ region , gss_design , svytotal , na.rm = TRUE )
```

Calculate the median (50th percentile) of a linear variable, overall and by groups:

```
svyquantile( ~ age , gss_design , 0.5 , na.rm = TRUE )
svyby(
	~ age ,
	~ region ,
	gss_design ,
	svyquantile ,
	0.5 ,
	ci = TRUE , na.rm = TRUE
)
```

Estimate a ratio:
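A ratio of two weighted totals can be estimated with `svyratio`. In this sketch the numerator `adults` (adults in the household) and denominator `hompop` (household size) are assumed column names chosen for illustration; substitute any two numeric variables present in your file.

```r
library(survey)

# assumption: `adults` and `hompop` exist as numeric columns in gss_design
svyratio(
	numerator = ~ adults ,
	denominator = ~ hompop ,
	gss_design ,
	na.rm = TRUE
)
```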

### Measures of Uncertainty

Extract the coefficient, standard error, confidence interval, and coefficient of variation from any descriptive statistics function result, overall and by groups:

```
this_result <- svymean( ~ age , gss_design , na.rm = TRUE )
coef( this_result )
SE( this_result )
confint( this_result )
cv( this_result )
grouped_result <-
	svyby(
		~ age ,
		~ region ,
		gss_design ,
		svymean ,
		na.rm = TRUE
	)
coef( grouped_result )
SE( grouped_result )
confint( grouped_result )
cv( grouped_result )
```

Calculate the degrees of freedom of any survey design object:
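The `survey` package provides `degf` for this; a minimal call on the `gss_design` object would be:

```r
library(survey)

# design degrees of freedom (roughly the number of PSUs minus the number of strata)
degf( gss_design )
```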

Calculate the complex sample survey-adjusted variance of any statistic:
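A sketch using `svyvar`, which estimates the population variance of a variable while accounting for the sample design, again assuming the `gss_design` object:

```r
library(survey)

# design-based estimate of the variance of age
svyvar( ~ age , gss_design , na.rm = TRUE )
```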

Include the complex sample design effect in the result for a specific statistic:

```
# SRS without replacement
svymean( ~ age , gss_design , na.rm = TRUE , deff = TRUE )
# SRS with replacement
svymean( ~ age , gss_design , na.rm = TRUE , deff = "replace" )
```

Compute confidence intervals for proportions using methods that may be more accurate near 0 and 1. See `?svyciprop` for alternatives:
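As one illustration, the call below applies `svyciprop` to the binary `born_in_usa` column created in the recoding step. The choice of `method = "likelihood"` is only an example; other documented methods include `"logit"` and `"beta"`.

```r
library(survey)

# confidence interval for the proportion born in the United States
svyciprop(
	~ born_in_usa ,
	gss_design ,
	method = "likelihood" ,
	na.rm = TRUE
)
```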

## Replication Example

Match the unweighted record count on PDF page 10 of the Public Use File codebook:
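A hedged sketch of such a check follows. The codebook year and count are deliberately left as hypothetical placeholders (`codebook_year`, `codebook_count`) to be filled in from the codebook itself; the comparison only runs once both are supplied.

```r
# hypothetical placeholders: set both from the codebook before running the check
codebook_year <- NA
codebook_count <- NA

# unweighted respondent counts by survey year in the imported file
unweighted_counts <- table( gss_df[ , 'year' ] )
print( unweighted_counts )

# compare the file's count for that year against the published figure
if( !is.na( codebook_year ) && !is.na( codebook_count ) ){
	stopifnot( unweighted_counts[[ as.character( codebook_year ) ]] == codebook_count )
}
```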

## Analysis Examples with `srvyr`

The R `srvyr` library calculates summary statistics from survey data, such as the mean, total, or quantile, using dplyr-like syntax. srvyr allows for the use of many verbs, such as `summarize`, `group_by`, and `mutate`, the convenience of pipe-able functions, the `tidyverse` style of non-standard evaluation, and more consistent return types than the `survey` package. The `srvyr` vignette details the available features. As a starting point for GSS users, this code replicates previously-presented examples:

Calculate the mean (average) of a linear variable, overall and by groups:
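A minimal sketch, assuming the `gss_design` object from the `survey` examples is still in memory: convert it with `as_survey`, then use `summarize` and `group_by` with `survey_mean`. The name `gss_srvyr_design` is illustrative.

```r
library(srvyr)

# convert the existing survey design into a srvyr tbl_svy object
gss_srvyr_design <- as_survey( gss_design )

# overall mean of age
gss_srvyr_design %>%
	summarize( mean = survey_mean( age , na.rm = TRUE ) )

# mean of age within each region
gss_srvyr_design %>%
	group_by( region ) %>%
	summarize( mean = survey_mean( age , na.rm = TRUE ) )
```

The point estimates should agree with the `svymean` and `svyby` results shown earlier, since both approaches use the same underlying design.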