Methodology
Coverage
We monitor 1,800+ municipalities across 50 states. The United States has 19,500 municipalities. Our coverage is concentrated in New England, the Midwest, and the Mountain West, expanding weekly through automated onboarding.
What we cover well today:
- Municipal meeting documents (agendas, minutes, packets) from 1,800+ municipalities
- Check registers and accounts payable data from 200+ municipalities
- Building permits from 400+ municipalities (43,000+ structured records)
- Tower and infrastructure lease data from assessor records (3,800+ sites screened)
- FOAA/FOIA response data from 100+ jurisdictions with active records requests
Known gaps:
- County-level data (recorder, probate, assessor) is early stage -- we have depth in Maine and New Hampshire, limited coverage elsewhere
- Western and Southern states are underrepresented relative to New England and Midwest
- Insurance rate filings and mineral rights data are in pilot phase, not production coverage
- Real-time check register feeds exist for a minority of covered municipalities -- most are periodic via records requests
We are transparent about where our coverage is strong and where it is thin. If you need data from a specific jurisdiction, ask -- we can often onboard a new municipality in days.
Data Freshness
The pipeline runs daily at 10 PM ET. Every covered municipality is crawled, new documents are classified, and entities are resolved against the knowledge graph.
- Document ingestion: Daily for web-published documents. Varies for FOAA/FOIA responses (depends on jurisdiction response times, typically 5-10 business days)
- Signal classification: Same-day. Documents ingested in the evening crawl are classified and available by morning
- Entity resolution: Same-day. New vendor names are matched against the ticker database during classification
- Tower/infrastructure data: Weekly census. FCC and FAA cross-referencing runs on a weekly cycle
Source Types
| Source | Method | Typical Latency |
|---|---|---|
| Municipal agendas and minutes | Automated web scraping (CivicPlus, Granicus, BoardDocs, custom) | Same-day |
| Check registers / AP data | FOAA/FOIA records requests + web scraping where available | 5-30 days (records request) or same-day (web) |
| Building permits | Web scraping + records requests | Same-day to 30 days |
| Assessor records | Records requests + GIS portal scraping | 5-30 days |
| FCC/FAA tower data | Federal database cross-reference | Weekly |
| Insurance rate filings | State DOI portal scraping | Same-day (pilot) |
Classification
Documents are classified by an LLM-based classifier into document types (agenda, minutes, check_register, permit, ordinance, etc.) and assigned a financial priority score (HIGH, MEDIUM, LOW, NONE). The classifier is validated against a golden set of manually labeled documents and retrained weekly.
What "HIGH priority" means: The document contains information that maps directly to a public company ticker, infrastructure asset, or fiscal health indicator. Examples: a check register payment to a Grainger subsidiary, a tower lease renewal on a planning board agenda, a budget amendment showing a revenue shortfall.
What the classifier does not do: It does not make investment recommendations, predict price movements, or assign sentiment. It structures facts from public documents and resolves entities. The signal is the structured data itself.
Entity Resolution
Vendor names in municipal documents are messy. "Waste Management of Maine," "WM," "Waste Mgmt," and "WASTE MANAGEMENT INC" are the same company (ticker: WM). Our entity resolution system maps these variations to canonical company names and public tickers.
Current resolution covers 100+ public company tickers across 1,800+ municipalities. Resolution accuracy varies by entity -- large, frequently-appearing companies (AT&T, Verizon, Waste Management) resolve at high accuracy. Smaller or regional companies may not resolve on first appearance and are queued for manual review.
Methodology for Data Stories
Each data story (PFAS, housing density, company footprint, lead pipes) is built from classified signals in the pipeline, not from desk research or news aggregation. The counts, geographic breakdowns, and entity sightings are drawn directly from the knowledge graph.
When we say "120+ PFAS events across 72 municipalities," that means 120+ documents classified as PFAS-related by the pipeline, sourced from 72 distinct municipal websites. The events are verifiable -- every signal traces back to a specific document URL on a specific municipality's website.
What We Don't Do
- We don't scrape paywalled or subscription-only data sources
- We don't use social media, news, or web traffic data
- We don't make investment recommendations or predictions
- We don't sell data that isn't derived from public records
- We don't access non-public government systems -- everything we structure is published by governments for public consumption or obtained through formal records requests
Questions About Our Data
If you want to understand our coverage for a specific jurisdiction, entity, or data type, reach out. We'll tell you exactly what we have and what we don't.
Email: matt@municipalalpha.com