# Public Libraries Survey (PLS)

An annual census of public libraries in the United States.

The R `lodown` package downloads and imports all available PLS microdata when you pass `"pls"` and an `output_dir =` parameter to the `lodown()` function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

```r
library(lodown)
lodown( "pls" , output_dir = file.path( path.expand( "~" ) , "PLS" ) )
```

## Analysis Examples with base R

Load the 2014 file into a `data.frame`:

```r
pls_df <- readRDS( file.path( path.expand( "~" ) , "PLS" , "2014/pls_fy_ae_puplda.rds" ) )
```

### Variable Recoding

Recode an existing column as a labeled factor and add a new indicator column to the data set:

```r
pls_df <-
	transform(
		pls_df ,

		# replace the two-letter codes with descriptive factor labels
		c_relatn =
			factor(
				c_relatn ,
				levels = c( "HQ" , "ME" , "NO" ) ,
				labels =
					c(
						"HQ-Headquarters of a federation or cooperative" ,
						"ME-Member of a federation or cooperative" ,
						"NO-Not a member of a federation or cooperative"
					)
			) ,

		# 1 when `libraria` exceeds one, otherwise 0
		more_than_one_librarian = as.numeric( libraria > 1 )

	)
```
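To see what the recode does without loading the full file, here is a minimal sketch on a toy `data.frame`. The object `toy_df`, its values, and the shortened labels are invented for illustration and are not PLS data:

```r
# toy stand-in for pls_df; every value below is invented for illustration
toy_df <-
	data.frame(
		c_relatn = c( "HQ" , "ME" , "NO" , "ME" ) ,
		libraria = c( 12.5 , 0.5 , 1 , 3 ) ,
		stringsAsFactors = FALSE
	)

toy_df <-
	transform(
		toy_df ,
		c_relatn =
			factor(
				c_relatn ,
				levels = c( "HQ" , "ME" , "NO" ) ,
				labels = c( "Headquarters" , "Member" , "Not a member" )
			) ,
		more_than_one_librarian = as.numeric( libraria > 1 )
	)

# each code maps to its label; the comparison becomes a 0/1 indicator
table( toy_df[ , "c_relatn" ] )
toy_df[ , "more_than_one_librarian" ]
```

A value of exactly one librarian is coded 0, because the comparison is strictly greater than one.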

### Unweighted Counts

Count the unweighted number of records in the table, overall and by groups:

```r
nrow( pls_df )

table( pls_df[ , "stabr" ] , useNA = "always" )
```

### Descriptive Statistics

Calculate the mean (average) of a linear variable, overall and by groups:

```r
mean( pls_df[ , "popu_lsa" ] )

tapply(
	pls_df[ , "popu_lsa" ] ,
	pls_df[ , "stabr" ] ,
	mean
)
```

Calculate the distribution of a categorical variable, overall and by groups:

```r
prop.table( table( pls_df[ , "c_relatn" ] ) )

prop.table(
	table( pls_df[ , c( "c_relatn" , "stabr" ) ] ) ,
	margin = 2
)
```
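The `margin =` argument decides which direction the shares add up to one. A minimal sketch on invented data (`toy_df` and its values are hypothetical) contrasts `margin = 2`, which gives shares within each state, with `margin = 1`, which gives shares within each relationship category:

```r
# toy stand-in for pls_df; values are invented for illustration
toy_df <-
	data.frame(
		c_relatn = c( "HQ" , "ME" , "ME" , "NO" , "NO" , "NO" ) ,
		stabr = c( "AK" , "AK" , "AL" , "AL" , "AL" , "AK" )
	)

toy_table <- table( toy_df[ , c( "c_relatn" , "stabr" ) ] )

# margin = 2: each column (state) sums to one
col_shares <- prop.table( toy_table , margin = 2 )

# margin = 1: each row (relationship category) sums to one
row_shares <- prop.table( toy_table , margin = 1 )

colSums( col_shares )
rowSums( row_shares )
```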

Calculate the sum of a linear variable, overall and by groups:

```r
sum( pls_df[ , "popu_lsa" ] )

tapply(
	pls_df[ , "popu_lsa" ] ,
	pls_df[ , "stabr" ] ,
	sum
)
```

Calculate the median (50th percentile) of a linear variable, overall and by groups:

```r
quantile( pls_df[ , "popu_lsa" ] , 0.5 )

tapply(
	pls_df[ , "popu_lsa" ] ,
	pls_df[ , "stabr" ] ,
	quantile ,
	0.5
)
```
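The grouped mean, sum, and median can also be collected in a single table. This is one possible sketch using `split()` and `sapply()` on invented data; `toy_df` and `by_state` are hypothetical names, not part of the PLS files:

```r
# toy stand-in for pls_df; values are invented for illustration
toy_df <-
	data.frame(
		stabr = c( "AK" , "AK" , "AL" , "AL" , "AL" ) ,
		popu_lsa = c( 1000 , 3000 , 500 , 1500 , 4000 )
	)

# split the variable by state, then compute all three statistics per group
by_state <-
	sapply(
		split( toy_df[ , "popu_lsa" ] , toy_df[ , "stabr" ] ) ,
		function( x ) c( mean = mean( x ) , sum = sum( x ) , median = median( x ) )
	)

# transpose so each row is a state and each column a statistic
t( by_state )
```

With the default settings, `median( x )` and `quantile( x , 0.5 )` return the same value, so either works here.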

### Subsetting

Limit your `data.frame` to libraries with more than one million annual visits:

```r
sub_pls_df <- subset( pls_df , visits > 1000000 )
```

Calculate the mean (average) of this subset:

```r
mean( sub_pls_df[ , "popu_lsa" ] )
```
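One detail worth knowing about `subset()`: it drops rows where the condition evaluates to `NA`, while bracket indexing keeps them as all-`NA` rows. The sketch below uses invented data, and whether `visits` actually contains `NA` values in a given PLS file is an assumption to check against your own data:

```r
# toy stand-in for pls_df; values are invented, including the NA
toy_df <-
	data.frame(
		visits = c( 2500000 , 40000 , NA , 1200000 ) ,
		popu_lsa = c( 900000 , 15000 , 30000 , 600000 )
	)

# subset() silently excludes the record with a missing `visits` value
sub_toy_df <- subset( toy_df , visits > 1000000 )
nrow( sub_toy_df )
mean( sub_toy_df[ , "popu_lsa" ] )

# bracket indexing with the same condition returns an extra all-NA row
bracket_df <- toy_df[ toy_df[ , "visits" ] > 1000000 , ]
nrow( bracket_df )
```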

### Measures of Uncertainty

Calculate the variance, overall and by groups:

```r
var( pls_df[ , "popu_lsa" ] )

tapply(
	pls_df[ , "popu_lsa" ] ,
	pls_df[ , "stabr" ] ,
	var
)
```
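Because the variance is expressed in squared units, the standard deviation is often easier to read. A minimal sketch on invented data (`toy_df` is hypothetical) shows that `sd()` is the square root of `var()` and follows the same `tapply()` pattern:

```r
# toy stand-in for pls_df; values are invented for illustration
toy_df <-
	data.frame(
		stabr = c( "AK" , "AK" , "AL" , "AL" , "AL" ) ,
		popu_lsa = c( 1000 , 3000 , 500 , 1500 , 4000 )
	)

# overall standard deviation, in the same units as popu_lsa
overall_sd <- sd( toy_df[ , "popu_lsa" ] )

# standard deviation within each state
sd_by_state <-
	tapply(
		toy_df[ , "popu_lsa" ] ,
		toy_df[ , "stabr" ] ,
		sd
	)

overall_sd
sd_by_state
```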

### Regression Models and Tests of Association

Perform a t-test:

```r
t.test( popu_lsa ~ more_than_one_librarian , pls_df )
```
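`t.test()` returns an object of class `htest`, so individual results can be extracted by name rather than read off the printed output. The sketch below runs on simulated data (`toy_df` and its values are invented); note that `t.test()` uses the Welch unequal-variance test by default:

```r
# simulated stand-in for pls_df; values are invented for illustration
set.seed( 1 )

toy_df <-
	data.frame(
		more_than_one_librarian = rep( c( 0 , 1 ) , each = 10 ) ,
		popu_lsa =
			c(
				rnorm( 10 , mean = 5000 , sd = 1000 ) ,
				rnorm( 10 , mean = 25000 , sd = 1000 )
			)
	)

toy_test <- t.test( popu_lsa ~ more_than_one_librarian , toy_df )

# group means, confidence interval for their difference, and p-value
toy_test$estimate
toy_test$conf.int
toy_test$p.value
```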

Perform a chi-squared test of association:

```r
this_table <- table( pls_df[ , c( "more_than_one_librarian" , "c_relatn" ) ] )

chisq.test( this_table )
```

Fit a generalized linear model (with the default Gaussian family, this is equivalent to ordinary least squares):

```r
glm_result <-
	glm(
		popu_lsa ~ more_than_one_librarian + c_relatn ,
		data = pls_df
	)

summary( glm_result )
```
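When no `family =` argument is given, `glm()` uses the Gaussian family with an identity link, so its coefficients match those from `lm()`. A minimal sketch on simulated data (`toy_df`, `glm_toy`, and `lm_toy` are hypothetical names and the values are invented) makes the comparison explicit:

```r
# simulated stand-in for pls_df; values are invented for illustration
set.seed( 1 )

toy_df <-
	data.frame(
		more_than_one_librarian = rep( c( 0 , 1 ) , each = 10 ) ,
		c_relatn = factor( rep( c( "HQ" , "ME" , "NO" , "ME" , "NO" ) , times = 4 ) )
	)

toy_df[ , "popu_lsa" ] <-
	5000 + 20000 * toy_df[ , "more_than_one_librarian" ] + rnorm( 20 , sd = 1000 )

glm_toy <- glm( popu_lsa ~ more_than_one_librarian + c_relatn , data = toy_df )
lm_toy <- lm( popu_lsa ~ more_than_one_librarian + c_relatn , data = toy_df )

# the default family, then a comparison of the fitted coefficients
family( glm_toy )$family
all.equal( coef( glm_toy ) , coef( lm_toy ) )
```

For a binary outcome such as `more_than_one_librarian`, you would instead pass an explicit family, for example `family = binomial()`.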

## Analysis Examples with `dplyr`

The R `dplyr` package offers an alternative grammar of data manipulation to base R and SQL syntax. `dplyr` offers many verbs, such as `summarize`, `group_by`, and `mutate`, the convenience of pipe-able functions, and the `tidyverse` style of non-standard evaluation. The `dplyr` vignette details the available features. As a starting point for PLS users, this code replicates previously presented examples:

```r
library(dplyr)

# as_tibble() replaces tbl_df(), which is deprecated in current dplyr releases
pls_tbl <- as_tibble( pls_df )
```

Calculate the mean (average) of a linear variable, overall and by groups:

```r
pls_tbl %>%
	summarize( mean = mean( popu_lsa ) )

pls_tbl %>%
	group_by( stabr ) %>%
	summarize( mean = mean( popu_lsa ) )
```
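The other base R examples translate the same way. The sketch below assumes `dplyr` is installed and uses an invented `toy_tbl` in place of `pls_tbl`. It computes several grouped statistics in one `summarize()` call and uses `filter()` as the counterpart to `subset()`:

```r
library(dplyr)

# toy stand-in for pls_tbl; values are invented for illustration
toy_tbl <-
	tibble(
		stabr = c( "AK" , "AK" , "AL" , "AL" , "AL" ) ,
		popu_lsa = c( 1000 , 3000 , 500 , 1500 , 4000 ) ,
		visits = c( 2500000 , 40000 , 1200000 , 90000 , 3000000 )
	)

# mean, sum, and median by state in a single call
toy_summary <-
	toy_tbl %>%
	group_by( stabr ) %>%
	summarize(
		mean = mean( popu_lsa ) ,
		sum = sum( popu_lsa ) ,
		median = median( popu_lsa )
	)

# mean among records with more than one million visits
toy_big <-
	toy_tbl %>%
	filter( visits > 1000000 ) %>%
	summarize( mean = mean( popu_lsa ) )

toy_summary
toy_big
```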