Data and Democracy Project
We build data tools for election protection: cleaning, analyzing, and visualizing public election datasets so that journalists, advocates, and election administrators can act on the findings.
Why This Matters
Elections in the United States generate enormous amounts of public data, but that data is scattered, inconsistent, and hard to use. Patterns that matter (administrative issues, data quality problems, or practices that warrant a closer look) stay buried in spreadsheets and codebooks that few people have the time or tooling to work through. Our work makes public election data easier to clean, compare, and analyze across jurisdictions and over time, so that the people protecting elections (journalists, advocates, legal teams, and election administrators) can find those patterns and act on them.
Our Impact
The project supports election officials, researchers, advocacy organizations, legal teams, journalists, and other public-interest groups in identifying patterns that may warrant further investigation. It aims to lower the barriers to using election data and to enable more consistent, transparent, and scalable analysis.
What We're Building
Our flagship effort is a reproducible, open workflow built around the U.S. Election Administration and Voting Survey (EAVS), the most detailed national dataset on how elections are actually run, covering voter registration, mail ballots, provisional ballots, voter list maintenance, and more. Current work includes:
- A Python-based data pipeline to clean and standardize EAVS datasets
- Cleaned datasets for 2020, 2022, and 2024, along with combined multi-year outputs
- Time-series data to support analysis across election cycles
- Data enrichment with demographic and jurisdictional information (e.g., Census-based data)
- Analysis-ready datasets designed for dashboards and further investigation
The goal is to reduce the time and effort required to work with election data and to make analyses more transparent, reproducible, and scalable.
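To make the cleaning, standardizing, and time-series steps above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the project's actual pipeline: the field names (jurisdiction_id, mail_returned, mail_rejected), the missing-value codes, the 10-digit identifier width, and the sample numbers are all hypothetical and would need to be checked against each year's EAVS codebook. It uses only the standard library to stay self-contained; a real pipeline might well use pandas or a similar library instead.

```python
from collections import defaultdict

# Assumption: these codes mark missing or not-applicable answers.
# Verify the actual codes against the codebook for each survey year.
MISSING_CODES = {"-99", "-88", ""}

# Assumption: jurisdiction identifiers are zero-padded to this width.
ID_WIDTH = 10


def clean_count(raw):
    """Return a non-negative int, or None if the value is missing or unparseable."""
    text = str(raw).strip().replace(",", "")
    if text in MISSING_CODES:
        return None
    try:
        value = int(float(text))
    except (ValueError, OverflowError):
        return None
    return value if value >= 0 else None


def standardize_row(row, year):
    """Map one raw survey row onto a small, consistent schema (hypothetical fields)."""
    return {
        # Restore leading zeros that spreadsheets often strip from identifiers.
        "jurisdiction_id": str(row["jurisdiction_id"]).strip().zfill(ID_WIDTH),
        "year": year,
        "mail_returned": clean_count(row.get("mail_returned")),
        "mail_rejected": clean_count(row.get("mail_rejected")),
    }


def rejection_rate(record):
    """Rejected / returned mail ballots, or None when either input is unusable."""
    returned = record["mail_returned"]
    rejected = record["mail_rejected"]
    if not returned or rejected is None:  # also guards against division by zero
        return None
    return rejected / returned


def build_time_series(raw_by_year):
    """Return {jurisdiction_id: {year: rate}} across all supplied survey years."""
    series = defaultdict(dict)
    for year, rows in sorted(raw_by_year.items()):
        for row in rows:
            record = standardize_row(row, year)
            series[record["jurisdiction_id"]][year] = rejection_rate(record)
    return dict(series)


# Made-up sample rows, for illustration only (not real EAVS values).
raw_by_year = {
    2020: [{"jurisdiction_id": "100100000", "mail_returned": "1,000", "mail_rejected": "20"}],
    2022: [{"jurisdiction_id": 100100000, "mail_returned": "800", "mail_rejected": "-99"}],
}
series = build_time_series(raw_by_year)
```

Keeping missing values as None, rather than treating them as zero, means a jurisdiction that did not report a figure is never mistaken for one that reported no rejections. That distinction matters when comparing rates across years and jurisdictions.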
Current Project Status
The project is in a mid-to-late stage of development: the core data foundations are in place, and work continues in several areas.
- Core datasets for 2020, 2022, and 2024 have been cleaned and standardized
- Combined multi-year datasets and time-series outputs have been created
- Demographic enrichment has been partially integrated
- Dashboard development and analysis concepts are in progress
- The team is actively engaging in outreach and user discovery to ensure the work aligns with real-world needs
The focus now is on refining the pipeline, expanding analysis and visualization, and working with potential users to guide further development.
Our Story
The project is led by a small core team (Michael, Yashin, and Cameron) alongside Civic Tech DC volunteers. It began with a simple observation: the data that could help protect elections is rich but not user-friendly. Working with it often requires identifying relevant variables across multiple files and codebooks, cleaning and standardizing inconsistent formats, calculating key metrics, and comparing results across years and jurisdictions. In practice, this has often meant manual, spreadsheet-based workflows that are time-consuming, difficult to reproduce, and prone to error.
Early work focused on understanding how the Campaign Legal Center was working with EAVS data and where the biggest bottlenecks were. The project has since grown into a reusable, multi-year data pipeline with supporting tools for more reliable and scalable analysis, and into a broader effort to make public election data usable for everyone working to protect the vote.
How the Project Is Organized
The project is organized into five areas:
- Data Cleaning / Pipeline (Python) – cleaning and standardizing datasets, building reproducible workflows
- Data Visualization (Tableau, Plotly, etc.) – developing dashboards and user-facing tools
- Analytics / Modeling – exploring data, identifying patterns, and generating findings
- Outreach – connecting with potential users and partners to understand needs and use cases
- Project Management – coordinating tasks, documentation, and team processes
Volunteers are welcome to contribute to one or more areas depending on their interests and experience.
Come Join Us
We're looking for volunteers with skills in Python data pipelines, data visualization, analytics and modeling, outreach, and project coordination. Most important, though, is curiosity about how elections actually work and who gets to participate in them. Current needs include:
- Data Cleaning / Pipeline Building (Python) – cleaning and standardizing datasets, improving and extending the pipeline
- Data Visualization (Tableau, Plotly, etc.) – building and refining dashboards and visualizations
- Data Analytics / Modeling – conducting quality checks, exploratory analysis, modeling, and identifying meaningful patterns
- Outreach – identifying and connecting with potential users of the work
- Project Management and Coordination – supporting organization, onboarding, and communication
The project is coordinated through the Civic Tech DC Slack workspace and during in-person project nights on the 2nd and 4th Wednesdays of each month. Work is largely asynchronous between meetings.
To get involved, join the Civic Tech DC Slack workspace. Once there, look for the #eavs_clc channel, where we share updates, tasks, questions, and resources.
If you might want to contribute, now or later, please also fill out the Volunteer Matchmaker Survey. Completing the survey is not a commitment to volunteer; it simply helps us understand your interests, skills, and availability as the project evolves.