A nationwide municipal legislative corpus for AI training, grounding, and evaluation.
GovData Legislative Dataset is a normalized, relational dataset extracted from Legistar and CivicPlus deployments across thousands of U.S. jurisdictions.
It captures decisions, provenance, and time (meetings, agendas, motions, votes, sponsors, and attachments), enabling high-quality supervision and reliable grounding for models that need to answer civic and regulatory questions.
Problem solved: without structured municipal ground truth, civic AI systems hallucinate.
Built for grounding, not guessing. The dataset preserves source provenance (meeting → agenda → item → motion → vote → attachment) to reduce hallucinations and improve auditability.
How AI teams use the dataset
Designed for model training, grounding (RAG), evaluation, and civic intelligence products that require trustworthy, time-aware sources.
Structured supervision for civic reasoning
Use relational objects and outcomes to train models that understand how municipal policy changes over time.
- Proposal → amendment → adoption lifecycle
- Sponsors, committees, departments, and roles
- Vote outcomes and decision provenance
- Topic taxonomies and jurisdiction metadata
Reliable citations and traceable sources
Power retrieval systems that can cite meeting items, attachments, and vote records with stable identifiers.
- Deep links to agenda items and files
- Attachment provenance and document metadata
- Time-aware retrieval (as-of queries)
- De-duplication across jurisdictions
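A retrieval system can turn these stable identifiers into reproducible citations. The sketch below shows one way to carry provenance alongside a retrieved record; the field names (`jurisdiction_id`, `meeting_id`, `agenda_item_id`, and so on) are illustrative, not the shipped schema, and the URL is a placeholder.

```python
from dataclasses import dataclass

# Hypothetical field names -- request the data dictionary for the real schema.
@dataclass(frozen=True)
class Citation:
    jurisdiction_id: str
    meeting_id: str
    agenda_item_id: str
    source_url: str   # deep link to the agenda item or attached file
    as_of: str        # ISO date the record reflects, for time-aware retrieval

    def render(self) -> str:
        # A stable, human-auditable citation string for model outputs.
        return (f"[{self.jurisdiction_id}/{self.meeting_id}/"
                f"{self.agenda_item_id} as of {self.as_of}] {self.source_url}")

cite = Citation("chicago-il", "M-2024-0117", "AI-042",
                "https://example.gov/agenda/AI-042", "2024-01-17")
print(cite.render())
```

Because the identifiers are stable across deliveries, the same citation resolves to the same record after reloads and delta merges.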
Benchmarks for grounded factuality
Create test sets for temporal reasoning, citation accuracy, and “what changed” questions.
- Before/after policy changes
- Outcome verification (did it pass?)
- Cross-city comparison tasks
- Hallucination resistance scoring
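An outcome-verification benchmark ("did it pass?") can be scored directly against recorded vote counts. The sketch below assumes illustrative field names (`yes_count`, `no_count`) and a simple majority rule; real chambers have quorum and threshold rules this does not model.

```python
# Hypothetical record shape; not the shipped schema.
def passed(vote_record: dict) -> bool:
    """Simple-majority outcome from recorded vote counts."""
    return vote_record["yes_count"] > vote_record["no_count"]

def score_predictions(records, predictions):
    """Fraction of model predictions matching the recorded outcome."""
    correct = sum(passed(r) == p for r, p in zip(records, predictions))
    return correct / len(records)

records = [{"yes_count": 7, "no_count": 2},   # passed
           {"yes_count": 3, "no_count": 6}]   # failed
print(score_predictions(records, [True, True]))  # model got one of two right
```

The same pattern extends to before/after and "what changed" questions by comparing model answers against versioned records.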
Civic copilots and intelligence tooling
Enable public-sector assistants, compliance tools, and policy monitoring systems with reliable sources.
- Public portal copilots with citations
- Policy alerting and subscriptions
- Jurisdiction-specific ordinance tracking
- Procurement and public safety tech monitoring
Event extraction and topic pipelines
Layer on event extraction and topic tagging to build structured signals from upstream legislative actions.
- Zoning, housing, short-term rental (STR), ESG, infrastructure topics
- Configurable vocabularies and taxonomies
- Metro-specific or theme-specific bundles
- Joint iteration with your research team
Designed for data platforms
Delivered in formats and patterns that fit modern lake, warehouse, and streaming workflows.
- Parquet/CSV/JSON exports for bulk ingest
- Daily deltas and late-arriving corrections
- Stable IDs for joins and lineage
- Coverage slices for quick pilots
Relational civic ground truth with time and provenance
Extracted and normalized from municipal legislative systems across thousands of jurisdictions, built to preserve relationships that flat scrapes lose.
What’s included
The dataset is organized around entities + relationships so you can ground answers and compute outcomes reliably.
- Meetings: bodies, dates, locations, minutes, attendance
- Agendas: items, ordering, status, timestamps
- Files / Matters: legislation objects, lifecycle, departments
- Motions + Votes: outcomes, roll-calls, vote counts
- Sponsors + People: roles, affiliations where available
- Attachments: metadata, linkage to agenda items and files
- Jurisdictions: normalized naming and identifiers
- Topics: configurable tags for common civic domains
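The relational chain is what makes grounded answers computable. A minimal sketch of joining meeting → agenda item → vote by stable IDs, with illustrative entity shapes and made-up field names:

```python
# Toy records standing in for the real tables; keys are stable IDs.
meetings = {"M1": {"body": "City Council", "date": "2023-06-05"}}
agenda_items = {"A1": {"meeting_id": "M1", "title": "Zoning amendment 42-B"}}
votes = [{"agenda_item_id": "A1", "outcome": "adopted", "yes": 8, "no": 3}]

summaries = []
for v in votes:
    item = agenda_items[v["agenda_item_id"]]     # vote -> agenda item
    meeting = meetings[item["meeting_id"]]       # agenda item -> meeting
    summaries.append(f'{meeting["date"]} {meeting["body"]}: '
                     f'{item["title"]} -> {v["outcome"]} ({v["yes"]}-{v["no"]})')
print(summaries[0])
```

A flat scrape of the meeting page loses exactly these foreign keys, which is why outcome questions become guesswork without them.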
Historical depth: typically 5–10+ years per jurisdiction, sometimes 20+.
Quality advantages
Most public datasets lose critical relationships. We preserve them.
- Relational truth: meeting → agenda → item → motion → vote
- Stable identifiers for joins and lineage tracking
- Normalized schema across jurisdictions
- Late-arriving updates included in deltas (corrections happen)
- Extraction approach avoids broken OCR links and missing votes
Ask for a schema overview and a small sample slice to validate fit with your ingestion and modeling workflow.
Bulk history + daily deltas, designed for AI pipelines
Choose the delivery pattern that matches your data platform: lakes, warehouses, streaming, or hybrid.
Backfills for training and backtests
Large historical deliveries for pretraining, fine-tuning, and long-horizon analyses.
- Parquet/CSV/JSON export
- Coverage and time-window slicing
- Optional topic tags
- Schema + data dictionary
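Time-window slicing of a bulk export is a one-pass filter. The sketch below runs the filter over an in-memory CSV stand-in (column names are illustrative); the same logic applies per file to a Parquet or CSV backfill on disk.

```python
import csv
import io
from datetime import date

# In-memory stand-in for one file of a bulk CSV export.
export = io.StringIO(
    "meeting_id,jurisdiction,date\n"
    "M1,chicago-il,2019-03-04\n"
    "M2,chicago-il,2022-11-14\n"
)

# Half-open training window: [start, end)
window = (date(2020, 1, 1), date(2023, 1, 1))
rows = [r for r in csv.DictReader(export)
        if window[0] <= date.fromisoformat(r["date"]) < window[1]]
print([r["meeting_id"] for r in rows])
```

For backtests, slicing on decision dates rather than ingestion dates avoids leaking future records into the training window.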
Ongoing updates with corrections
Production feeds for grounded retrieval, monitoring, and model-refresh pipelines.
- Daily or intraday drops
- Late-arriving updates handled
- Change logs by entity type
- Stable IDs for incremental joins
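Because IDs are stable, applying a daily delta reduces to an upsert keyed on the entity ID; a late-arriving correction simply overwrites the stale row. A minimal sketch, with a hypothetical `vote_id` key:

```python
# Upsert a delta batch into a keyed store; later records win,
# so late-arriving corrections replace the stale version.
def apply_delta(store: dict, delta: list[dict]) -> dict:
    for record in delta:
        store[record["vote_id"]] = record  # stable ID as the upsert key
    return store

store = {"V1": {"vote_id": "V1", "outcome": "pending"}}
delta = [{"vote_id": "V1", "outcome": "adopted"},   # correction to V1
         {"vote_id": "V2", "outcome": "failed"}]    # brand-new record
apply_delta(store, delta)
print(store["V1"]["outcome"], len(store))
```

In a warehouse this is the same `MERGE`-on-ID pattern; the change logs by entity type tell you which tables each drop touches.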
Query and retrieval workflows
For RAG systems and internal tooling that need entity-level retrieval and “as-of” queries.
- Entity endpoints (meetings, items, votes, attachments)
- Filters by jurisdiction, topic, date
- As-of time windows for temporal grounding
- Integration support for evaluation harnesses
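An "as-of" query answers what the record looked like on a given date: take the latest version whose effective date is on or before the query date. A sketch over versioned records with illustrative field names:

```python
from datetime import date

# Three versions of one agenda item as its status changed over time.
versions = [
    {"item_id": "A1", "status": "introduced", "effective": date(2023, 1, 10)},
    {"item_id": "A1", "status": "amended",    "effective": date(2023, 2, 7)},
    {"item_id": "A1", "status": "adopted",    "effective": date(2023, 3, 21)},
]

def as_of(versions, when: date):
    """Latest version effective on or before `when`, else None."""
    eligible = [v for v in versions if v["effective"] <= when]
    return max(eligible, key=lambda v: v["effective"]) if eligible else None

print(as_of(versions, date(2023, 2, 15))["status"])
```

This is the query shape that keeps temporal benchmarks honest: the retriever sees only what was true at the question's timestamp.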
Need to integrate through existing data vendor channels (Neudata, Eagle Alpha, BattleFin, etc.)? We can support brokered delivery paths as well.
Clear licensing posture for model development
AI teams need clarity: what you can do with the data, where it comes from, and how provenance is preserved.
Intended uses
- Model training and fine-tuning (internal)
- Grounding and retrieval (RAG) in production systems
- Evaluation and benchmarking datasets
- Civic intelligence and policy monitoring products
We can structure licenses for research-only pilots, production grounding, and enterprise model development.
Source and compliance notes
- Public-record source: derived from municipal legislative systems
- Normalized dataset: independent structuring and entity linkage
- Provenance preserved: trace from derived objects back to source items
- Low MNPI posture: civic decisions are public proceedings
Ask us for a one-page licensing summary tailored for model training and grounding review workflows.
Request schema, samples, and a pilot slice
Tell us your intended use (training, grounding, evaluation), and we’ll propose a small but meaningful dataset slice.
What you’ll receive
We can start with a lightweight package designed to fit AI data review workflows.
- Schema overview + data dictionary (high level)
- Sample records across key entity types
- Coverage summary (jurisdictions, years, entity counts)
- Optional topic tags aligned with your use case
- Short technical call for ingestion + semantics
Prefer asynchronous? Email your team name, use case, and what format you want (Parquet/CSV/JSON), and we’ll respond with a proposed pilot slice.
Principal: Darius Tajanko, Original Legistar Architect
Email: info@govdataconsulting.com
Phone: (312) 767-4004
If you work with data procurement platforms or brokers, tell us which one and we can align to your preferred contracting path.