For Foundation Model Teams, RAG Platforms & Civic Intelligence

A nationwide municipal legislative corpus for AI training, grounding, and evaluation.

GovData Legislative Dataset is a normalized, relational dataset extracted from Legistar and CivicPlus deployments across thousands of U.S. jurisdictions.

It captures decisions, provenance, and time - meetings, agendas, motions, votes, sponsors, and attachments - enabling high-quality supervision and reliable grounding for models that need to answer civic and regulatory questions.

Problem solved: without structured municipal ground truth, civic AI systems hallucinate.

Relational entities (not flat text) Time-stamped decision outcomes Cross-jurisdiction normalization Daily deltas + late-arriving updates

Built for grounding, not guessing. Preserve source provenance (meeting → agenda → item → motion → vote → attachment) to reduce hallucinations and improve auditability.

Use Cases

How AI teams use the dataset

Designed for model training, grounding (RAG), evaluation, and civic intelligence products that require trustworthy, time-aware sources.

Training

Structured supervision for civic reasoning

Use relational objects and outcomes to train models that understand how municipal policy changes over time.

  • Proposal → amendment → adoption lifecycle
  • Sponsors, committees, departments, and roles
  • Vote outcomes and decision provenance
  • Topic taxonomies and jurisdiction metadata
Grounding (RAG)

Reliable citations and traceable sources

Power retrieval systems that can cite meeting items, attachments, and vote records with stable identifiers.

  • Deep links to agenda items and files
  • Attachment provenance and document metadata
  • Time-aware retrieval (as-of queries)
  • De-duplication across jurisdictions
Evaluation

Benchmarks for grounded factuality

Create test sets for temporal reasoning, citation accuracy, and “what changed” questions.

  • Before/after policy changes
  • Outcome verification (did it pass?)
  • Cross-city comparison tasks
  • Hallucination resistance scoring
Products

Civic copilots and intelligence tooling

Enable public-sector assistants, compliance tools, and policy monitoring systems with reliable sources.

  • Public portal copilots with citations
  • Policy alerting and subscriptions
  • Jurisdiction-specific ordinance tracking
  • Procurement and public safety tech monitoring
Signals (Optional)

Event extraction and topic pipelines

Layer on event extraction and topic tagging to build structured signals from upstream legislative actions.

  • Zoning, housing, STR, ESG, infrastructure topics
  • Configurable vocabularies and taxonomies
  • Metro-specific or theme-specific bundles
  • Joint iteration with your research team
Integrations

Designed for data platforms

Delivered in formats and patterns that fit modern lake, warehouse, and streaming workflows.

  • Parquet/CSV/JSON exports for bulk ingest
  • Daily deltas and late-arriving corrections
  • Stable IDs for joins and lineage
  • Coverage slices for quick pilots
Dataset

Relational civic ground truth with time and provenance

Extracted and normalized from municipal legislative systems across thousands of jurisdictions, built to preserve relationships that flat scrapes lose.

What’s included

The dataset is organized around entities + relationships so you can ground answers and compute outcomes reliably.

  • Meetings: bodies, dates, locations, minutes, attendance
  • Agendas: items, ordering, status, timestamps
  • Files / Matters: legislation objects, lifecycle, departments
  • Motions + Votes: outcomes, roll-calls, vote counts
  • Sponsors + People: roles, affiliations where available
  • Attachments: metadata, linkage to agenda items and files
  • Jurisdictions: normalized naming and identifiers
  • Topics: configurable tags for common civic domains

Historical depth: typically 5 - 10+ years per jurisdiction, sometimes 20+.

Quality advantages

Most public datasets lose critical relationships. We preserve them.

  • Relational truth: meeting → agenda → item → motion → vote
  • Stable identifiers for joins and lineage tracking
  • Normalized schema across jurisdictions
  • Late-arriving updates included in deltas (corrections happen)
  • Extraction approach avoids broken OCR links and missing votes

Ask for a schema overview and a small sample slice to validate fit with your ingestion and modeling workflow.

Delivery

Bulk history + daily deltas, designed for AI pipelines

Choose the delivery pattern that matches your data platform - lakes, warehouses, streaming, or hybrid.

Bulk History

Backfills for training and backtests

Large historical deliveries for pretraining, fine-tuning, and long-horizon analyses.

  • Parquet/CSV/JSON export
  • Coverage and time-window slicing
  • Optional topic tags
  • Schema + data dictionary
Daily Deltas

Ongoing updates with corrections

Production feeds for grounded retrieval, monitoring, and model-refresh pipelines.

  • Daily or intraday drops
  • Late-arriving updates handled
  • Change logs by entity type
  • Stable IDs for incremental joins
API

Query and retrieval workflows

For RAG systems and internal tooling that needs entity-level retrieval and “as-of” queries.

  • Entity endpoints (meetings, items, votes, attachments)
  • Filters by jurisdiction, topic, date
  • As-of time windows for temporal grounding
  • Integration support for evaluation harnesses

Need to integrate through existing data vendor channels (Neudata, Eagle Alpha, BattleFin, etc.)? We can support brokered delivery paths as well.

Licensing

Clear licensing posture for model development

AI teams need clarity - what you can do with the data, where it comes from, and how provenance is preserved.

Intended uses

  • Model training and fine-tuning (internal)
  • Grounding and retrieval (RAG) in production systems
  • Evaluation and benchmarking datasets
  • Civic intelligence and policy monitoring products

We can structure licenses for research-only pilots, production grounding, and enterprise model development.

Source and compliance notes

  • Public-record source: derived from municipal legislative systems
  • Normalized dataset: independent structuring and entity linkage
  • Provenance preserved: trace from derived objects back to source items
  • Low MNPI posture: civic decisions are public proceedings

Ask us for a one-page licensing summary tailored for model training and grounding review workflows.

Samples

Request schema, samples, and a pilot slice

Tell us your intended use (training, grounding, evaluation), and we’ll propose a small but meaningful dataset slice.

What you’ll receive

We can start with a lightweight package designed to fit AI data review workflows.

  • Schema overview + data dictionary (high level)
  • Sample records across key entity types
  • Coverage summary (jurisdictions, years, entity counts)
  • Optional topic tags aligned with your use case
  • Short technical call for ingestion + semantics

Prefer asynchronous? Email your team name, use case, and what format you want (Parquet/CSV/JSON), and we’ll respond with a proposed pilot slice.

Contact
Organization: GovData Consulting
Principal: Darius Tajanko, Original Legistar Architect
Email: info@govdataconsulting.com
Phone: (312) 767-4004

If you work with data procurement platforms or brokers, tell us which one and we can align to your preferred contracting path.