Public Libraries Survey (PLS)

Build Status Build status

An annual census of public libraries in the United States.

Simplified Download and Importation

The R lodown package easily downloads and imports all available PLS microdata by simply specifying "pls" with an output_dir = parameter in the lodown() function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

library(lodown)
lodown( "pls" , output_dir = file.path( path.expand( "~" ) , "PLS" ) )

Analysis Examples with base R

Load a data frame:

pls_df <- readRDS( file.path( path.expand( "~" ) , "PLS" , "2014/pls_fy_ae_puplda.rds" ) )

Variable Recoding

Add new columns to the data set:

pls_df <- 
    transform( 
        pls_df , 
        
        c_relatn = 
            factor( c_relatn , levels = c( "HQ" , "ME" , "NO" ) ,
                c( "HQ-Headquarters of a federation or cooperative" ,
                "ME-Member of a federation or cooperative" ,
                "NO-Not a member of a federation or cooperative" )
            ) ,
            
        more_than_one_librarian = as.numeric( libraria > 1 )
                
    )   

Unweighted Counts

Count the unweighted number of records in the table, overall and by groups:

nrow( pls_df )

table( pls_df[ , "stabr" ] , useNA = "always" )

Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

mean( pls_df[ , "popu_lsa" ] )

tapply(
    pls_df[ , "popu_lsa" ] ,
    pls_df[ , "stabr" ] ,
    mean 
)

Calculate the distribution of a categorical variable, overall and by groups:

prop.table( table( pls_df[ , "c_relatn" ] ) )

prop.table(
    table( pls_df[ , c( "c_relatn" , "stabr" ) ] ) ,
    margin = 2
)

Calculate the sum of a linear variable, overall and by groups:

sum( pls_df[ , "popu_lsa" ] )

tapply(
    pls_df[ , "popu_lsa" ] ,
    pls_df[ , "stabr" ] ,
    sum 
)

Calculate the median (50th percentile) of a linear variable, overall and by groups:

quantile( pls_df[ , "popu_lsa" ] , 0.5 )

tapply(
    pls_df[ , "popu_lsa" ] ,
    pls_df[ , "stabr" ] ,
    quantile ,
    0.5 
)

Subsetting

Limit your data.frame to more than one million annual visits:

sub_pls_df <- subset( pls_df , visits > 1000000 )

Calculate the mean (average) of this subset:

mean( sub_pls_df[ , "popu_lsa" ] )

Measures of Uncertainty

Calculate the variance, overall and by groups:

var( pls_df[ , "popu_lsa" ] )

tapply(
    pls_df[ , "popu_lsa" ] ,
    pls_df[ , "stabr" ] ,
    var 
)

Regression Models and Tests of Association

Perform a t-test:

t.test( popu_lsa ~ more_than_one_librarian , pls_df )

Perform a chi-squared test of association:

this_table <- table( pls_df[ , c( "more_than_one_librarian" , "c_relatn" ) ] )

chisq.test( this_table )

Perform a generalized linear model:

glm_result <- 
    glm( 
        popu_lsa ~ more_than_one_librarian + c_relatn , 
        data = pls_df
    )

summary( glm_result )

Analysis Examples with dplyr

The R dplyr library offers an alternative grammar of data manipulation to base R and SQL syntax. dplyr offers many verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, and the tidyverse style of non-standard evaluation. This vignette details the available features. As a starting point for PLS users, this code replicates previously-presented examples:

library(dplyr)
pls_tbl <- tbl_df( pls_df )

Calculate the mean (average) of a linear variable, overall and by groups:

pls_tbl %>%
    summarize( mean = mean( popu_lsa ) )

pls_tbl %>%
    group_by( stabr ) %>%
    summarize( mean = mean( popu_lsa ) )