A nationwide municipal legislative corpus for AI training, grounding, and evaluation.
GovData Legislative Dataset is a normalized, relational dataset extracted from Legistar and CivicPlus deployments across thousands of U.S. jurisdictions.
It captures decisions, provenance, and time (meetings, agendas, motions, votes, sponsors, and attachments), enabling high-quality supervision and reliable grounding for models that need to answer civic and regulatory questions.
Problem solved: without structured municipal ground truth, civic AI systems hallucinate.
Built for grounding, not guessing. The dataset preserves source provenance (meeting → agenda → item → motion → vote → attachment) to reduce hallucinations and improve auditability.
How AI teams use the dataset
Designed for model training, grounding (RAG), evaluation, and civic intelligence products that require trustworthy, time-aware sources.
Structured supervision for civic reasoning
Use relational objects and outcomes to train models that understand how municipal policy changes over time.
- Proposal → amendment → adoption lifecycle
- Sponsors, committees, departments, and roles
- Vote outcomes and decision provenance
- Topic taxonomies and jurisdiction metadata
Reliable citations and traceable sources
Power retrieval systems that can cite meeting items, attachments, and vote records with stable identifiers.
- Deep links to agenda items and files
- Attachment provenance and document metadata
- Time-aware retrieval (as-of queries)
- De-duplication across jurisdictions
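A retrieval system can turn these stable identifiers into reproducible citations. The sketch below shows one way to carry provenance alongside a retrieved record; the field names (`jurisdiction_id`, `meeting_id`, `agenda_item_id`, and so on) are illustrative, not the shipped schema, and the URL is a placeholder.

```python
from dataclasses import dataclass

# Hypothetical field names -- request the data dictionary for the real schema.
@dataclass(frozen=True)
class Citation:
    jurisdiction_id: str
    meeting_id: str
    agenda_item_id: str
    source_url: str   # deep link to the agenda item or attached file
    as_of: str        # ISO date the record reflects, for time-aware retrieval

    def render(self) -> str:
        # A stable, human-auditable citation string for model outputs.
        return (f"[{self.jurisdiction_id}/{self.meeting_id}/"
                f"{self.agenda_item_id} as of {self.as_of}] {self.source_url}")

cite = Citation("chicago-il", "M-2024-0117", "AI-042",
                "https://example.gov/agenda/AI-042", "2024-01-17")
print(cite.render())
```

Because the identifiers are stable across deliveries, the same citation resolves to the same record after reloads and delta merges.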
Benchmarks for grounded factuality
Create test sets for temporal reasoning, citation accuracy, and “what changed” questions.
- Before/after policy changes
- Outcome verification (did it pass?)
- Cross-city comparison tasks
- Hallucination resistance scoring
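An outcome-verification benchmark ("did it pass?") can be scored directly against recorded vote counts. The sketch below assumes illustrative field names (`yes_count`, `no_count`) and a simple majority rule; real chambers have quorum and threshold rules this does not model.

```python
# Hypothetical record shape; not the shipped schema.
def passed(vote_record: dict) -> bool:
    """Simple-majority outcome from recorded vote counts."""
    return vote_record["yes_count"] > vote_record["no_count"]

def score_predictions(records, predictions):
    """Fraction of model predictions matching the recorded outcome."""
    correct = sum(passed(r) == p for r, p in zip(records, predictions))
    return correct / len(records)

records = [{"yes_count": 7, "no_count": 2},   # passed
           {"yes_count": 3, "no_count": 6}]   # failed
print(score_predictions(records, [True, True]))  # model got one of two right
```

The same pattern extends to before/after and "what changed" questions by comparing model answers against versioned records.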
Civic copilots and intelligence tooling
Enable public-sector assistants, compliance tools, and policy monitoring systems with reliable sources.
- Public portal copilots with citations
- Policy alerting and subscriptions
- Jurisdiction-specific ordinance tracking
- Procurement and public safety tech monitoring
Event extraction and topic pipelines
Layer on event extraction and topic tagging to build structured signals from upstream legislative actions.
- Zoning, housing, short-term rental (STR), ESG, infrastructure topics
- Configurable vocabularies and taxonomies
- Metro-specific or theme-specific bundles
- Joint iteration with your research team
Designed for data platforms
Delivered in formats and patterns that fit modern lake, warehouse, and streaming workflows.
- Parquet/CSV/JSON exports for bulk ingest
- Daily deltas and late-arriving corrections
- Stable IDs for joins and lineage
- Coverage slices for quick pilots
Relational civic ground truth with time and provenance
Extracted and normalized from municipal legislative systems across thousands of jurisdictions, built to preserve relationships that flat scrapes lose.
What’s included
The dataset is organized around entities + relationships so you can ground answers and compute outcomes reliably.
- Meetings: bodies, dates, locations, minutes, attendance
- Agendas: items, ordering, status, timestamps
- Files / Matters: legislation objects, lifecycle, departments
- Motions + Votes: outcomes, roll-calls, vote counts
- Sponsors + People: roles, affiliations where available
- Attachments: metadata, linkage to agenda items and files
- Jurisdictions: normalized naming and identifiers
- Topics: configurable tags for common civic domains
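The relational chain is what makes grounded answers computable. A minimal sketch of joining meeting → agenda item → vote by stable IDs, with illustrative entity shapes and made-up field names:

```python
# Toy records standing in for the real tables; keys are stable IDs.
meetings = {"M1": {"body": "City Council", "date": "2023-06-05"}}
agenda_items = {"A1": {"meeting_id": "M1", "title": "Zoning amendment 42-B"}}
votes = [{"agenda_item_id": "A1", "outcome": "adopted", "yes": 8, "no": 3}]

summaries = []
for v in votes:
    item = agenda_items[v["agenda_item_id"]]     # vote -> agenda item
    meeting = meetings[item["meeting_id"]]       # agenda item -> meeting
    summaries.append(f'{meeting["date"]} {meeting["body"]}: '
                     f'{item["title"]} -> {v["outcome"]} ({v["yes"]}-{v["no"]})')
print(summaries[0])
```

A flat scrape of the meeting page loses exactly these foreign keys, which is why outcome questions become guesswork without them.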
Historical depth: typically 5–10+ years per jurisdiction, sometimes 20+.
Quality advantages
Most public datasets lose critical relationships. We preserve them.
- Relational truth: meeting → agenda → item → motion → vote
- Stable identifiers for joins and lineage tracking
- Normalized schema across jurisdictions
- Late-arriving updates included in deltas (corrections happen)
- Extraction approach avoids broken OCR links and missing votes
Ask for a schema overview and a small sample slice to validate fit with your ingestion and modeling workflow.
Bulk history + daily deltas, designed for AI pipelines
Choose the delivery pattern that matches your data platform: lakes, warehouses, streaming, or hybrid.
Backfills for training and backtests
Large historical deliveries for pretraining, fine-tuning, and long-horizon analyses.
- Parquet/CSV/JSON export
- Coverage and time-window slicing
- Optional topic tags
- Schema + data dictionary
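Time-window slicing of a bulk export is a one-pass filter. The sketch below runs the filter over an in-memory CSV stand-in (column names are illustrative); the same logic applies per file to a Parquet or CSV backfill on disk.

```python
import csv
import io
from datetime import date

# In-memory stand-in for one file of a bulk CSV export.
export = io.StringIO(
    "meeting_id,jurisdiction,date\n"
    "M1,chicago-il,2019-03-04\n"
    "M2,chicago-il,2022-11-14\n"
)

# Half-open training window: [start, end)
window = (date(2020, 1, 1), date(2023, 1, 1))
rows = [r for r in csv.DictReader(export)
        if window[0] <= date.fromisoformat(r["date"]) < window[1]]
print([r["meeting_id"] for r in rows])
```

For backtests, slicing on decision dates rather than ingestion dates avoids leaking future records into the training window.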
Ongoing updates with corrections
Production feeds for grounded retrieval, monitoring, and model-refresh pipelines.
- Daily or intraday drops
- Late-arriving updates handled
- Change logs by entity type
- Stable IDs for incremental joins
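Because IDs are stable, applying a daily delta reduces to an upsert keyed on the entity ID; a late-arriving correction simply overwrites the stale row. A minimal sketch, with a hypothetical `vote_id` key:

```python
# Upsert a delta batch into a keyed store; later records win,
# so late-arriving corrections replace the stale version.
def apply_delta(store: dict, delta: list[dict]) -> dict:
    for record in delta:
        store[record["vote_id"]] = record  # stable ID as the upsert key
    return store

store = {"V1": {"vote_id": "V1", "outcome": "pending"}}
delta = [{"vote_id": "V1", "outcome": "adopted"},   # correction to V1
         {"vote_id": "V2", "outcome": "failed"}]    # brand-new record
apply_delta(store, delta)
print(store["V1"]["outcome"], len(store))
```

In a warehouse this is the same `MERGE`-on-ID pattern; the change logs by entity type tell you which tables each drop touches.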
Query and retrieval workflows
For RAG systems and internal tooling that need entity-level retrieval and “as-of” queries.
- Entity endpoints (meetings, items, votes, attachments)
- Filters by jurisdiction, topic, date
- As-of time windows for temporal grounding
- Integration support for evaluation harnesses
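An "as-of" query answers what the record looked like on a given date: take the latest version whose effective date is on or before the query date. A sketch over versioned records with illustrative field names:

```python
from datetime import date

# Three versions of one agenda item as its status changed over time.
versions = [
    {"item_id": "A1", "status": "introduced", "effective": date(2023, 1, 10)},
    {"item_id": "A1", "status": "amended",    "effective": date(2023, 2, 7)},
    {"item_id": "A1", "status": "adopted",    "effective": date(2023, 3, 21)},
]

def as_of(versions, when: date):
    """Latest version effective on or before `when`, else None."""
    eligible = [v for v in versions if v["effective"] <= when]
    return max(eligible, key=lambda v: v["effective"]) if eligible else None

print(as_of(versions, date(2023, 2, 15))["status"])
```

This is the query shape that keeps temporal benchmarks honest: the retriever sees only what was true at the question's timestamp.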
Need to integrate through existing data vendor channels (Neudata, Eagle Alpha, BattleFin, etc.)? We can support brokered delivery paths as well.
Clear licensing posture for model development
AI teams need clarity: what you can do with the data, where it comes from, and how provenance is preserved.
Intended uses
- Model training and fine-tuning (internal)
- Grounding and retrieval (RAG) in production systems
- Evaluation and benchmarking datasets
- Civic intelligence and policy monitoring products
We can structure licenses for research-only pilots, production grounding, and enterprise model development.
Source and compliance notes
- Public-record source: derived from municipal legislative systems
- Normalized dataset: independent structuring and entity linkage
- Provenance preserved: trace from derived objects back to source items
- Low MNPI posture: civic decisions are public proceedings
Ask us for a one-page licensing summary tailored for model training and grounding review workflows.
Request schema, samples, and a pilot slice
Tell us your intended use (training, grounding, evaluation), and we’ll propose a small but meaningful dataset slice.
What you’ll receive
We can start with a lightweight package designed to fit AI data review workflows.
- Schema overview + data dictionary (high level)
- Sample records across key entity types
- Coverage summary (jurisdictions, years, entity counts)
- Optional topic tags aligned with your use case
- Short technical call for ingestion + semantics
Prefer asynchronous? Email your team name, use case, and what format you want (Parquet/CSV/JSON), and we’ll respond with a proposed pilot slice.
Principal: Darius Tajanko, Original Legistar Architect
Email: info@govdataconsulting.com
Phone: (312) 767-4004
If you work with data procurement platforms or brokers, tell us which one and we can align to your preferred contracting path.