Data and Democracy Project

We build data tools for election protection: cleaning, analyzing, and visualizing public election datasets so that journalists, advocates, and election administrators can act on the findings.

Why This Matters

Elections in the United States generate enormous amounts of public data, but that data is scattered, inconsistent, and hard to use. Patterns that matter (administrative issues, data quality problems, or practices that warrant a closer look) stay buried in spreadsheets and codebooks that few people have the time or tooling to work through. Our work makes public election data easier to clean, compare, and analyze across jurisdictions and over time, so that the people protecting elections (journalists, advocates, legal teams, and election administrators) can find those patterns and act on them.

Our Impact

By making election data easier to clean, analyze, and compare across time and geography, this project supports election officials, researchers, advocacy organizations, legal teams, journalists, and other public-interest groups in identifying patterns that may warrant further investigation. The work aims to lower the barriers to using election data and to enable more consistent, transparent, and scalable analysis.

What We're Building

Our flagship effort is a reproducible, open workflow built around the U.S. Election Administration and Voting Survey (EAVS), the most detailed national dataset on how elections are actually run, covering voter registration, mail ballots, provisional ballots, voter list maintenance, and more. Current work includes:

A Python-based data pipeline to clean and standardize EAVS datasets
Cleaned datasets for 2020, 2022, and 2024, along with combined multi-year outputs
Time-series data to support analysis across election cycles
Data enrichment with demographic and jurisdictional information (e.g., Census-based data)
Analysis-ready datasets designed for dashboards and further investigation

The goal is to reduce the time and effort required to work with election data and to make analyses more transparent, reproducible, and scalable.
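As a rough illustration of the cleaning and time-series steps listed above, here is a minimal sketch in plain Python. It is not the project's actual pipeline. The column names (`fips`, `jurisdiction`, `mail_sent`, `mail_returned`), the 10-character jurisdiction code width, and the set of sentinel codes treated as missing are all assumptions made for the example and should be checked against each year's EAVS codebook.

```python
# Hypothetical sketch: standardize per-year survey rows and stack them
# into one multi-year list. Names and codes below are illustrative only.

# Assumption: negative sentinel codes mark missing or not-applicable answers.
MISSING_CODES = {"-77", "-88", "-99"}


def clean_count(raw):
    """Convert a raw survey cell to a non-negative int, or None if missing."""
    if raw is None:
        return None
    text = str(raw).strip().replace(",", "")
    if text == "" or text in MISSING_CODES:
        return None
    try:
        value = int(float(text))
    except (ValueError, OverflowError):
        return None
    # Any remaining negative number is treated as an unknown missing code.
    return value if value >= 0 else None


def standardize_row(row, year):
    """Return one row with consistent keys, types, and formatting."""
    return {
        "year": year,
        # Assumption: jurisdiction codes are zero-padded to 10 characters.
        "fips": str(row["fips"]).strip().zfill(10),
        "jurisdiction": str(row["jurisdiction"]).strip().upper(),
        "mail_sent": clean_count(row.get("mail_sent")),
        "mail_returned": clean_count(row.get("mail_returned")),
    }


def mail_return_rate(record):
    """Share of mail ballots returned, or None when it cannot be computed."""
    sent, returned = record["mail_sent"], record["mail_returned"]
    if not sent or returned is None:
        return None
    return returned / sent


def build_time_series(rows_by_year):
    """Stack cleaned rows from every year, ordered by election year."""
    combined = []
    for year in sorted(rows_by_year):
        for row in rows_by_year[year]:
            record = standardize_row(row, year)
            record["mail_return_rate"] = mail_return_rate(record)
            combined.append(record)
    return combined
```

Passing a dictionary that maps each election year to its list of raw rows (for example, rows read with `csv.DictReader`) returns one flat list in which the same jurisdiction can be compared across years by its `fips` value. Missing values stay as `None` rather than zero so that downstream rates are not silently distorted.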

Current Project Status

The project is in a mid-to-late stage of development: the core datasets and pipeline are in place, and work continues on enrichment, dashboards, and outreach.

Core datasets for 2020, 2022, and 2024 have been cleaned and standardized
Combined multi-year datasets and time-series outputs have been created
Demographic enrichment has been partially integrated
Dashboard development and analysis concepts are in progress
The team is actively engaging in outreach and user discovery to ensure the work aligns with real-world needs

The focus now is on refining the pipeline, expanding analysis and visualization, and working with potential users to guide further development.

Our Story

The project is led by a small core team (Michael, Yashin, and Cameron) alongside Civic Tech DC volunteers. It began with a simple observation: the data that could help protect elections is rich but not user-friendly. Working with it often requires identifying relevant variables across multiple files and codebooks, cleaning and standardizing inconsistent formats, calculating key metrics, and comparing results across years and jurisdictions. In practice, this has often meant manual, spreadsheet-based workflows that are time-consuming, difficult to reproduce, and prone to error.

Early work focused on understanding how the Campaign Legal Center was working with EAVS data and where the biggest bottlenecks existed. The project has since evolved into building a reusable, multi-year data pipeline and supporting tools to enable more reliable and scalable analysis, and into a broader effort to make public election data usable for everyone working to protect the vote.

How the Project Is Organized

The project is organized into five areas:

Data Cleaning / Pipeline (Python) – cleaning and standardizing datasets, building reproducible workflows
Data Visualization (Tableau, Plotly, etc.) – developing dashboards and user-facing tools
Analytics / Modeling – exploring data, identifying patterns, and generating findings
Outreach – connecting with potential users and partners to understand needs and use cases
Project Management – coordinating tasks, documentation, and team processes

Volunteers are welcome to contribute to one or more areas depending on their interests and experience.

Come Join Us

We're looking for volunteers with skills in Python data pipelines, data visualization, analytics and modeling, and outreach, but what matters most is curiosity about how elections actually work and who gets to participate in them. Current needs include:

Data Cleaning / Pipeline Building (Python) – cleaning and standardizing datasets, improving and extending the pipeline
Data Visualization (Tableau, Plotly, etc.) – building and refining dashboards and visualizations
Data Analytics / Modeling – conducting quality checks, exploratory analysis, modeling, and identifying meaningful patterns
Outreach – identifying and connecting with potential users of the work
Project Management and Coordination – supporting organization, onboarding, and communication

The project is coordinated through the Civic Tech DC Slack workspace and at in-person project nights on the 2nd and 4th Wednesdays of each month. Work between meetings is largely asynchronous.

To get involved, join the Slack workspace. Once there, look for the #eavs_clc channel, where we share updates, tasks, questions, and resources.

If you might want to contribute, now or later, please also fill out the Volunteer Matchmaker Survey. Completing the survey is not a commitment to volunteer; it simply helps us understand your interests, skills, and availability as the project evolves.

Slack (#eavs_clc)
GitHub Repo