Accessing the Census Bureau API with tidycensus and pytidycensus

Author

Corey S. Sparks, Ph.D.

Introduction

This document briefly demonstrates how to access the U.S. Census Bureau’s API using the tidycensus package for R and the pytidycensus package for Python.

The examples focus on retrieving median household income from the American Community Survey (ACS), but the same approach can be extended to other datasets and variables.
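Both packages are wrappers around HTTP requests to the Census Bureau's data API, so it can help to see roughly what such a request looks like. The sketch below only builds the query URL for the state-level median household income example; it does not send the request. The helper name `build_acs_url` and its parameters are hypothetical and are not part of tidycensus or pytidycensus. In the raw API, the `E` suffix denotes an estimate and the `M` suffix its margin of error.

```python
from urllib.parse import urlencode

# Root of the Census Bureau data API.
CENSUS_API_ROOT = "https://api.census.gov/data"


def build_acs_url(year, variables, geography="state", survey="acs5", key=None):
    """Build an ACS query URL (hypothetical helper, not part of either package)."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": f"{geography}:*",  # "*" requests every unit of this geography
    }
    if key is not None:
        params["key"] = key  # personal API key, if you have one
    # Keep ":", "*" and "," readable instead of percent-encoding them.
    query = urlencode(params, safe=":*,")
    return f"{CENSUS_API_ROOT}/{year}/acs/{survey}?{query}"


url = build_acs_url(2022, ["B19013_001E", "B19013_001M"])
print(url)
# https://api.census.gov/data/2022/acs/acs5?get=NAME,B19013_001E,B19013_001M&for=state:*
```

Changing `year`, `survey`, `geography`, or the variable list is the raw-API equivalent of changing the corresponding arguments to `get_acs()`.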

To make the workflow reproducible, the document also shows how to look up the variables available in the ACS, so that you can find the correct codes for the measures you are interested in.

Both R and Python examples are presented in tabbed code blocks for easy comparison.

Querying data from the Census API for the American Community Survey (ACS)

# Install if needed:
# install.packages("tidycensus")

library(tidycensus)
library(dplyr)

# Set your Census API key (replace with your own)
# census_api_key("YOUR_KEY_HERE", install = TRUE)

# Get median household income from ACS 5-year (2022) for all states
income_data <- get_acs(
  geography = "state",
  variables = "B19013_001",  # Median household income
  year = 2022,
  survey = "acs5"  # 5-year estimates (the default, stated here for clarity)
)

head(income_data, n=20)
# A tibble: 20 × 5
   GEOID NAME                 variable   estimate   moe
   <chr> <chr>                <chr>         <dbl> <dbl>
 1 01    Alabama              B19013_001    59609   377
 2 02    Alaska               B19013_001    86370  1083
 3 04    Arizona              B19013_001    72581   450
 4 05    Arkansas             B19013_001    56335   422
 5 06    California           B19013_001    91905   277
 6 08    Colorado             B19013_001    87598   508
 7 09    Connecticut          B19013_001    90213   730
 8 10    Delaware             B19013_001    79325  1227
 9 11    District of Columbia B19013_001   101722  1569
10 12    Florida              B19013_001    67917   259
11 13    Georgia              B19013_001    71355   353
12 15    Hawaii               B19013_001    94814   994
13 16    Idaho                B19013_001    70214   715
14 17    Illinois             B19013_001    78433   297
15 18    Indiana              B19013_001    67173   383
16 19    Iowa                 B19013_001    70571   395
17 20    Kansas               B19013_001    69747   479
18 21    Kentucky             B19013_001    60183   443
19 22    Louisiana            B19013_001    57852   468
20 23    Maine                B19013_001    68251   611
# Import modules
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import pytidycensus as tc
import os
# Set API key for pytidycensus (replace with your own)
tc.set_census_api_key("YOUR_API_KEY")
# Get median household income from ACS 5-year (2022) for all states
state_income = tc.get_acs(
    geography="state",
    variables=["B19013_001E"],
    year=2022,
    output="wide"
)
Getting data from the 2018-2022 5-year ACS
print(state_income.head(10))
  GEOID  B19013_001E state         NAME  B19013_001_moe
0    01        59609    01      Alabama           377.0
1    02        86370    02       Alaska          1083.0
2    04        72581    04      Arizona           450.0
3    05        56335    05     Arkansas           422.0
4    06        91905    06   California           277.0
5    08        87598    08     Colorado           508.0
6    09        90213    09  Connecticut           730.0
7    10        79325    10     Delaware          1227.0
8    11       101722    11           DC          1569.0
9    12        67917    12      Florida           259.0
# 52 rows: the 50 states plus the District of Columbia and Puerto Rico
print('Shape: ', state_income.shape)
Shape:  (52, 5)
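
Each estimate is returned with a margin of error (the `moe` column in R, `B19013_001_moe` in Python). ACS margins of error are published at the 90 percent confidence level, so an interval and an approximate standard error can be derived from them. The following is a minimal sketch using the Alabama values from the output above; the function names are illustrative and do not come from either package.

```python
Z_90 = 1.645  # z-score corresponding to the ACS 90 percent confidence level


def moe_to_interval(estimate, moe):
    """Return the (lower, upper) bounds of the 90 percent confidence interval."""
    return estimate - moe, estimate + moe


def moe_to_standard_error(moe):
    """Convert a 90 percent margin of error to an approximate standard error."""
    return moe / Z_90


# Alabama: estimate 59609, margin of error 377
lower, upper = moe_to_interval(59609, 377)
print(lower, upper)                          # 59232 59986
print(round(moe_to_standard_error(377), 1))  # 229.2
```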

Looking Up Variables

# Look up available variables for ACS 5-year 2022
vars <- load_variables(2022, "acs5", cache = TRUE)

# Search for income-related variables
dplyr::filter(vars, grepl("income", label, ignore.case = TRUE)) |> head(10)
# A tibble: 10 × 4
   name         label                                          concept geography
   <chr>        <chr>                                          <chr>   <chr>    
 1 B06010PR_002 Estimate!!Total:!!No income                    Place … <NA>     
 2 B06010PR_003 Estimate!!Total:!!With income:                 Place … <NA>     
 3 B06010PR_004 Estimate!!Total:!!With income:!!$1 to $9,999 … Place … <NA>     
 4 B06010PR_005 Estimate!!Total:!!With income:!!$10,000 to $1… Place … <NA>     
 5 B06010PR_006 Estimate!!Total:!!With income:!!$15,000 to $2… Place … <NA>     
 6 B06010PR_007 Estimate!!Total:!!With income:!!$25,000 to $3… Place … <NA>     
 7 B06010PR_008 Estimate!!Total:!!With income:!!$35,000 to $4… Place … <NA>     
 8 B06010PR_009 Estimate!!Total:!!With income:!!$50,000 to $6… Place … <NA>     
 9 B06010PR_010 Estimate!!Total:!!With income:!!$65,000 to $7… Place … <NA>     
10 B06010PR_011 Estimate!!Total:!!With income:!!$75,000 or mo… Place … <NA>     
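
The `label` column encodes a hierarchy: levels are separated by `!!`, and parent levels end with a colon. As a small sketch (the helper name `split_label` is hypothetical), a label can be broken into its levels like this:

```python
def split_label(label):
    """Split an ACS variable label into its hierarchy levels."""
    # Levels are separated by "!!"; parent levels carry a trailing ":".
    return [part.rstrip(":") for part in label.split("!!")]


levels = split_label("Estimate!!Total:!!With income:")
print(levels)  # ['Estimate', 'Total', 'With income']
```

Splitting labels this way can make it easier to filter on one level, for example keeping only variables whose last level mentions a particular income band.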
# Search for income-related variables
vars = tc.load_variables(2022, "acs", "acs5")
Loaded cached variables for 2022 acs acs5
income_vars = vars[vars["label"].str.contains("income", case=False, na=False)]

# Show first 100
print(income_vars.head(100))
               name  ...     table
2312  B06010PR_002E  ...  B06010PR
2313  B06010PR_003E  ...  B06010PR
2314  B06010PR_004E  ...  B06010PR
2315  B06010PR_005E  ...  B06010PR
2316  B06010PR_006E  ...  B06010PR
...             ...  ...       ...
2416    B06010_051E  ...    B06010
2417    B06010_052E  ...    B06010
2418    B06010_053E  ...    B06010
2419    B06010_054E  ...    B06010
2420    B06010_055E  ...    B06010

[100 rows x 7 columns]
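
The two lookups name the same variables slightly differently: the R output lists `B06010PR_002`, while the Python output lists `B06010PR_002E` alongside a separate `table` column. The sketch below shows one way to take such a code apart. The function `parse_variable` is a hypothetical helper, and treating `M` as the margin-of-error suffix is an assumption based on the raw Census API naming convention rather than on anything shown in the output above.

```python
import re

# Table ID (e.g. B19013 or B06010PR), a three-digit line number,
# and an optional E (estimate) or M (margin of error) suffix.
VARIABLE_PATTERN = re.compile(
    r"^(?P<table>[A-Z]+\d+[A-Z]*)_(?P<line>\d{3})(?P<kind>[EM])?$"
)


def parse_variable(name):
    """Return (table, line, kind) for an ACS variable code; kind may be None."""
    match = VARIABLE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized ACS variable code: {name!r}")
    return match.group("table"), match.group("line"), match.group("kind")


print(parse_variable("B06010PR_002E"))  # ('B06010PR', '002', 'E')
print(parse_variable("B19013_001"))     # ('B19013', '001', None)
```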