Residential Energy Consumption Survey (RECS)
A periodic study conducted to provide detailed information about energy usage in U.S. homes.
One table with one row per sampled housing unit.
A complex sample survey designed to generalize to U.S. homes occupied as primary residences.
Released approximately every five years since 1979.
Prepared by the Energy Information Administration, with help from IMG-Crown and RTI International.
Download, Import, Preparation
Download and import the most recent SAS file:
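library(haven)
# download the zipped SAS file to a temporary location
sas_tf <- tempfile()
sas_url <- "https://www.eia.gov/consumption/residential/data/2020/sas/recs2020_public_v2.zip"
download.file( sas_url , sas_tf , mode = 'wb' )
# import, convert to a plain data.frame, and lowercase the column names
recs_tbl <- read_sas( sas_tf )
recs_df <- data.frame( recs_tbl )
names( recs_df ) <- tolower( names( recs_df ) )
# add a column of ones, convenient for counting records
recs_df[ , 'one' ] <- 1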
Save Locally
Save the object at any point:
# recs_fn <- file.path( path.expand( "~" ) , "RECS" , "this_file.rds" )
# saveRDS( recs_df , file = recs_fn , compress = FALSE )
Load the same object:
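The load step mirrors the commented save call above. The recoding that follows also expects a replicate-weighted design object named `recs_design`. A minimal sketch of both steps, assuming the current file uses the same jackknife settings as the replication example later on this page (replicate weight columns matching `nweight[1-9]+` and a scale of 59 / 60):

```r
library(survey)

# reload the saved data.frame (commented out, like the save call above)
# recs_df <- readRDS( recs_fn )

# assumption: same JK1 replicate-weight settings as the replication example
recs_design <-
    svrepdesign(
        data = recs_df ,
        weights = ~ nweight ,
        repweights = 'nweight[1-9]+' ,
        type = 'JK1' ,
        combined.weights = TRUE ,
        scale = 59 / 60 ,
        mse = TRUE
    )
```

With `mse = TRUE`, replicate estimates are centered on the full-sample estimate rather than on the mean of the replicates.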
Variable Recoding
Add new columns to the data set:
recs_design <-
    update(
        recs_design ,
        main_heating_fuel =
            factor(
                fuelheat ,
                levels = c( -2 , 5 , 1 , 2 , 3 , 7 , 99 ) ,
                labels =
                    c(
                        'Not applicable' ,
                        'Electricity' ,
                        'Natural gas from underground pipes' ,
                        'Propane (bottled gas)' ,
                        'Fuel oil' ,
                        'Wood or pellets' ,
                        'Other'
                    )
            ) ,
        rooftype =
            factor(
                rooftype ,
                levels = c( -2 , 1:6 , 99 ) ,
                labels =
                    c(
                        'Not applicable' ,
                        'Ceramic or clay tiles' ,
                        'Wood shingles/shakes' ,
                        'Metal' ,
                        'Slate or synthetic slate' ,
                        'Shingles (composition or asphalt)' ,
                        'Concrete tiles' ,
                        'Other'
                    )
            ) ,
        swimpool_binary =
            ifelse( swimpool %in% 0:1 , swimpool , NA )
    )
Analysis Examples with the survey library
Unweighted Counts
Count the unweighted number of records in the survey sample, overall and by groups:
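One way to get these counts, sketched under the assumption that `recs_design` carries the `one` column added at import and the recoded `main_heating_fuel` factor:

```r
library(survey)

# assumes recs_design exists, with the `one` and `main_heating_fuel` columns

# overall: number of records with a nonzero sampling weight
sum( weights( recs_design , "sampling" ) != 0 )

# by group: unweighted record count within each main heating fuel
svyby( ~ one , ~ main_heating_fuel , recs_design , unwtd.count )
```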
Descriptive Statistics
Calculate the mean (average) of a linear variable, overall and by groups:
svymean( ~ totsqft_en , recs_design )
svyby( ~ totsqft_en , ~ main_heating_fuel , recs_design , svymean )
Calculate the distribution of a categorical variable, overall and by groups:
svymean( ~ rooftype , recs_design )
svyby( ~ rooftype , ~ main_heating_fuel , recs_design , svymean )
Calculate the sum of a linear variable, overall and by groups:
svytotal( ~ totsqft_en , recs_design )
svyby( ~ totsqft_en , ~ main_heating_fuel , recs_design , svytotal )
Calculate the weighted sum of a categorical variable, overall and by groups:
svytotal( ~ rooftype , recs_design )
svyby( ~ rooftype , ~ main_heating_fuel , recs_design , svytotal )
Calculate the median (50th percentile) of a linear variable, overall and by groups:
svyquantile( ~ totsqft_en , recs_design , 0.5 )
svyby(
    ~ totsqft_en ,
    ~ main_heating_fuel ,
    recs_design ,
    svyquantile ,
    0.5 ,
    ci = TRUE
)
Estimate a ratio:
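A possible ratio, shown only as an illustration: cooled square footage as a share of total energy-consuming square footage. The numerator column `totcsqft` is an assumption about the public file and should be checked against the codebook before use.

```r
library(survey)

# assumes recs_design exists
# assumption: `totcsqft` (cooled square footage) is present in the data
svyratio(
    numerator = ~ totcsqft ,
    denominator = ~ totsqft_en ,
    recs_design ,
    na.rm = TRUE
)
```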
Subsetting
Restrict the survey design to households that cook three or more hot meals per day:
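A sketch of the restriction, assuming the meal-frequency column is named `nummeal` and that code 1 means three or more hot meals per day. Both are assumptions to confirm in the codebook.

```r
library(survey)

# assumes recs_design exists
# assumption: nummeal == 1 identifies households cooking 3+ hot meals per day
sub_recs_design <- subset( recs_design , nummeal == 1 )
```

Subsetting the design object, rather than the data.frame before the design is built, keeps the replicate weights aligned with the retained records.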
Calculate the mean (average) of this subset:
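The same `svymean` call used on the full design works on a subset. The sketch below rebuilds the subset so it runs on its own. The `nummeal == 1` filter is an assumed coding for three or more hot meals per day.

```r
library(survey)

# assumes recs_design exists
# assumption: nummeal == 1 identifies households cooking 3+ hot meals per day
sub_recs_design <- subset( recs_design , nummeal == 1 )

# mean square footage among the restricted households
svymean( ~ totsqft_en , sub_recs_design )
```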
Measures of Uncertainty
Extract the coefficient, standard error, confidence interval, and coefficient of variation from any descriptive statistics function result, overall and by groups:
this_result <- svymean( ~ totsqft_en , recs_design )
coef( this_result )
SE( this_result )
confint( this_result )
cv( this_result )
grouped_result <-
    svyby(
        ~ totsqft_en ,
        ~ main_heating_fuel ,
        recs_design ,
        svymean
    )
coef( grouped_result )
SE( grouped_result )
confint( grouped_result )
cv( grouped_result )
Calculate the degrees of freedom of any survey design object:
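The survey package exposes this through `degf()`. A one-line sketch, assuming `recs_design` is the replicate-weighted design used throughout this page:

```r
library(survey)

# assumes recs_design exists
degf( recs_design )
```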
Calculate the complex sample survey-adjusted variance of any statistic:
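For a population variance estimate that respects the design, `svyvar()` is the usual entry point. A sketch on the square footage variable, again assuming `recs_design` exists:

```r
library(survey)

# assumes recs_design exists
svyvar( ~ totsqft_en , recs_design )
```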
Include the complex sample design effect in the result for a specific statistic:
# SRS without replacement
svymean( ~ totsqft_en , recs_design , deff = TRUE )
# SRS with replacement
svymean( ~ totsqft_en , recs_design , deff = "replace" )
Compute confidence intervals for proportions using methods that may be more accurate near 0 and 1. See ?svyciprop for alternatives:
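A sketch using the recoded `swimpool_binary` column. The `"likelihood"` method is one of several choices and is picked here only for illustration.

```r
library(survey)

# assumes recs_design exists and carries the recoded swimpool_binary column
svyciprop(
    ~ swimpool_binary ,
    recs_design ,
    method = "likelihood" ,
    na.rm = TRUE
)
```

`na.rm = TRUE` matters here because the recode sets not-applicable households to `NA`.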
Replication Example
This example matches the statistic, standard error, and relative standard error shown on PDF page 8 of "Using the microdata file to compute estimates and relative standard errors (RSEs)":
sas_v1_tf <- tempfile()
sas_v1_url <- "https://www.eia.gov/consumption/residential/data/2020/sas/recs2020_public_v1.zip"
download.file( sas_v1_url , sas_v1_tf , mode = 'wb' )
recs_v1_tbl <- read_sas( sas_v1_tf )
recs_v1_df <- data.frame( recs_v1_tbl )
names( recs_v1_df ) <- tolower( names( recs_v1_df ) )
recs_v1_design <-
    svrepdesign(
        data = recs_v1_df ,
        weights = ~ nweight ,
        repweights = 'nweight[1-9]+' ,
        type = 'JK1' ,
        combined.weights = TRUE ,
        scale = 59 / 60 ,
        mse = TRUE
    )
recs_v1_design <-
    update(
        recs_v1_design ,
        natural_gas_mainspace_heat = as.numeric( fuelheat == 1 )
    )
result <-
    svytotal(
        ~ natural_gas_mainspace_heat ,
        recs_v1_design
    )
stopifnot( round( coef( result ) , 0 ) == 56245389 )
stopifnot( round( SE( result ) , 0 ) == 545591 )
stopifnot( round( 100 * SE( result ) / coef( result ) , 2 ) == 0.97 )
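To make the `type = 'JK1'`, `scale`, and `mse = TRUE` settings concrete, the standard error of a total can also be computed by hand: re-estimate the total under each replicate weight, take squared deviations from the full-sample total, sum them, multiply by the scale, and take the square root. The helper `jk1_total_se` and the toy inputs below are hypothetical, written only to illustrate that arithmetic.

```r
# hypothetical helper: JK1 standard error of a weighted total, with replicate
# estimates centered on the full-sample estimate (the mse = TRUE convention)
jk1_total_se <-
    function( x , full_weight , rep_weights , scale ){
        rep_weights <- as.matrix( rep_weights )
        # full-sample weighted total
        theta_full <- sum( full_weight * x )
        # one weighted total per replicate weight column
        theta_reps <- colSums( rep_weights * x )
        sqrt( scale * sum( ( theta_reps - theta_full )^2 ) )
    }

# toy data: four records, delete-one replicates rescaled by n / ( n - 1 )
toy_x <- c( 1 , 0 , 1 , 1 )
toy_w <- c( 10 , 20 , 30 , 40 )
toy_repw <- sapply( 1:4 , function( r ) ifelse( 1:4 == r , 0 , toy_w * 4 / 3 ) )

toy_se <- jk1_total_se( toy_x , toy_w , toy_repw , scale = 3 / 4 )

# the same call on the replication data would look like this:
# rep_cols <- grep( 'nweight[1-9]+' , names( recs_v1_df ) , value = TRUE )
# jk1_total_se(
#     as.numeric( recs_v1_df$fuelheat == 1 ) ,
#     recs_v1_df$nweight ,
#     recs_v1_df[ rep_cols ] ,
#     scale = 59 / 60
# )
```

If the hand calculation and `SE( result )` disagree on the real data, the replicate weight columns picked up by the regular expression are the first thing to check.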
Analysis Examples with srvyr
The R srvyr library calculates summary statistics from survey data, such as the mean, total, or quantile, using dplyr-like syntax. srvyr allows for the use of many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, the tidyverse style of non-standard evaluation, and more consistent return types than the survey package. The srvyr package vignette details the available features. As a starting point for RECS users, this code replicates previously-presented examples:
Calculate the mean (average) of a linear variable, overall and by groups:
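A sketch of the srvyr equivalents of the earlier `svymean` and `svyby` calls, assuming `recs_design` exists and includes the recoded `main_heating_fuel` factor. The object name `recs_srvyr_design` is chosen here for illustration.

```r
library(survey)
library(srvyr)

# assumes recs_design exists; convert it to a srvyr design object
recs_srvyr_design <- as_survey( recs_design )

# overall mean
recs_srvyr_design %>%
    summarize( mean = survey_mean( totsqft_en ) )

# mean by main heating fuel
recs_srvyr_design %>%
    group_by( main_heating_fuel ) %>%
    summarize( mean = survey_mean( totsqft_en ) )
```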