How we collect and verify the data behind every company profile
AI-Buzz tracks signals that traditional company databases miss: package downloads, open-source activity, developer community sentiment, and search interest. All signals sync daily at 6 AM UTC. Funding data is manually verified before publication.
| Signal | Source | What's Measured | Frequency |
|---|---|---|---|
| Package Downloads | npm, PyPI | 30-day download totals with month-over-month trends | Daily |
| GitHub Activity | GitHub API | Stars, forks, commit velocity (latest week vs. 4-week average) | Daily |
| GitHub Contributors | GitHub API | Unique commit authors in last 30 days across all company repos | Daily |
| npm Dependents | Libraries.io | Count of packages that depend on company npm packages | Daily |
| HN Mentions & Sentiment | HN API | 30-day mention counts, discussion share by category, sentiment classification (positive / neutral / negative) | Daily |
| Docker Hub Pulls | Docker Hub | Image pull counts as a proxy for container adoption | Daily |
| Hugging Face Downloads | Hugging Face | Model download counts as a proxy for ML model adoption | Daily |
| Hugging Face Models | Hugging Face | Published model count, reflecting ML portfolio breadth | Daily |
| Stack Overflow Questions | Stack Exchange | Question volume as a proxy for developer support demand | Daily |
| Funding | TechCrunch, VentureBeat, Wikipedia | Round size, type, date, lead investors | As announced, manually verified |
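As a concrete illustration, the 30-day npm download totals above can be pulled from npm's public downloads API. This is a minimal sketch: the function names are ours, and the real pipeline would batch requests and handle rate limits and errors.

```python
# Sketch of the daily package-download pull using npm's public
# downloads API (api.npmjs.org). Error handling is simplified.
import json
import urllib.request

def npm_downloads_last_30_days(package: str) -> int:
    """Return total downloads for `package` over the last month."""
    url = f"https://api.npmjs.org/downloads/point/last-month/{package}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["downloads"]

def month_over_month_trend(current: int, previous: int) -> float:
    """Percentage change versus the prior 30-day window."""
    if previous == 0:
        return 0.0  # avoid dividing by zero for brand-new packages
    return (current - previous) / previous * 100
```

PyPI totals come from a different source, but the trend calculation is the same.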
Not all data points carry equal weight. We apply minimum thresholds to flag low-confidence signals:
| Signal | High Confidence | Low Confidence | Why It Matters |
|---|---|---|---|
| HN Sentiment | ≥30 comments (1.0 confidence) | <10 comments (0.2 confidence) | Sentiment classification requires enough comments for statistical relevance. Below 10, a single outlier can skew the ratio. |
| GitHub Velocity | ≥2 weeks history | <2 weeks history | Commit velocity compares the latest week to a 4-week average. With less than 2 weeks of data, the trend is unreliable. |
| Downloads | ≥100 / 30 days | <100 / 30 days | Very low download counts are often noise from CI bots or one-time installs. Below 100/month, the signal is too weak to draw conclusions about adoption. |
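The HN sentiment thresholds above can be expressed as a simple mapping from comment volume to a confidence weight. The endpoints (≥30 comments → 1.0, <10 → 0.2) come from the table; the linear ramp between them is our assumption for illustration.

```python
def hn_sentiment_confidence(n_comments: int) -> float:
    """Map HN comment volume to a sentiment-confidence weight.

    Endpoints follow the thresholds table; the ramp between 10 and
    30 comments is an assumed linear interpolation.
    """
    if n_comments >= 30:
        return 1.0
    if n_comments < 10:
        return 0.2
    # linear ramp: 10 comments -> 0.2, 30 comments -> 1.0
    return 0.2 + (n_comments - 10) / 20 * 0.8
```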
Each company receives a 0–100 data confidence score reflecting profile completeness, identifier verification, metric freshness, and signal coverage.
| Score Range | Rating | Meaning |
|---|---|---|
| 80–100 | Excellent | Fully verified identifiers, fresh metrics across all signals, complete profile. |
| 60–79 | Good | Most signals present and recently updated. Minor gaps in coverage or verification. |
| 40–59 | Fair | Some signals missing or stale. Profile may lack key identifiers like GitHub repos or package names. |
| <40 | Needs Review | Significant data gaps. Metrics may be outdated or unverified. Treat conclusions with caution. |
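Mapping a confidence score to its rating tier is a straightforward threshold lookup; this sketch mirrors the ranges in the table above (function name is illustrative):

```python
def confidence_rating(score: int) -> str:
    """Translate a 0-100 data confidence score into its rating tier."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Needs Review"
```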
Funding rounds are detected from news feeds (TechCrunch, VentureBeat) and cross-referenced against company announcements and Wikipedia. Every round is manually verified before appearing on a company profile. We'd rather miss a round than publish incorrect data.
Every data point goes through a four-layer validation pipeline before publication.
Found an error? Use the "Report an error" button on any company profile, and we'll review it promptly.
The Developer Momentum Index combines all the signals described above into a single composite score (0–100) for each company. It weights developer adoption, community discussion, momentum trends, and search demand to rank AI companies by real developer traction.
For a detailed explanation of how the DMI is calculated, score tiers, and limitations, see the Developer Momentum Index page.
The momentum score (0–100) measures how quickly a company is gaining or losing developer traction. It combines four directional components:
| Component | Weight | Source |
|---|---|---|
| HN mentions trend | 30% | 30-day mention count change |
| GitHub stars trend | 25% | Star count growth rate |
| Funding recency | 20% | Days since last funding round (linear decay over 365 days) |
| Download trend | 25% | npm/PyPI download growth rate |
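The funding-recency component is the only one with an explicit formula in the table (linear decay over 365 days). A minimal sketch, assuming the component is scaled 0–100 and clamped at zero after a year:

```python
def funding_recency(days_since_round: int) -> float:
    """Funding-recency component on a 0-100 scale.

    100 on the day a round is announced, decaying linearly to 0 at
    365 days; the clamp at zero beyond that is an assumption.
    """
    return max(0.0, 1.0 - days_since_round / 365) * 100
```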
Not every company has data for all four components. When components are missing, the available weights redistribute proportionally so the score still uses the full 0–100 range. A confidence value (0.25–1.0) records the fraction of components that were available.
When the momentum score feeds into the signal score, it is multiplied by this confidence value, so companies with fewer data points receive proportionally less credit.
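The redistribution and confidence bookkeeping described above can be sketched as follows. The weights come from the components table; the dict-based interface and component names are illustrative, not the actual implementation.

```python
# Component weights from the momentum table above.
WEIGHTS = {"hn_trend": 0.30, "stars_trend": 0.25,
           "funding_recency": 0.20, "download_trend": 0.25}

def momentum_score(components: dict) -> tuple:
    """Combine available components (each 0-100) into a 0-100 score.

    Missing components (None) drop out and the remaining weights are
    rescaled to sum to 1, so the score keeps the full range. Returns
    (score, confidence), where confidence is the fraction of the
    four components that were present.
    """
    present = {k: v for k, v in components.items() if v is not None}
    if not present:
        return 0.0, 0.0  # fallback when no data exists at all
    total_weight = sum(WEIGHTS[k] for k in present)
    score = sum(WEIGHTS[k] / total_weight * v for k, v in present.items())
    confidence = len(present) / len(WEIGHTS)
    return score, confidence
```

With only one component present, that component's rescaled weight becomes 1.0, so a strong single signal can still produce a high score, but its confidence of 0.25 discounts it when it feeds into the signal score.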
The signal score (used for ranking and prioritization) includes demand-side metrics that capture real user interest. These account for 40% of the total signal score:
| Signal | Weight | Source |
|---|---|---|
| Page views | 15% | 30-day company page views (GA4) |
| Search appearances | 10% | 30-day on-site search queries (GA4) |
| GSC impressions | 15% | Google Search Console impressions for company pages |
The remaining 60% of the signal score comes from supply-side signals (HN mentions, GitHub stars, momentum, funding recency — 40%) and data quality (profile coverage, package downloads — 20%).
When demand data is unavailable (e.g., a newly added company with no page views yet), the demand components gracefully degrade to 0 rather than penalizing the company. This means the score is driven entirely by supply-side and data quality signals until demand data becomes available.
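The overall 40/40/20 blend and the graceful degradation of demand metrics can be sketched like this. Sub-scores are assumed to be normalized to 0–100 before blending; the function signature and metric keys are illustrative.

```python
# Demand-side weights from the table above (sum to 0.40).
DEMAND_WEIGHTS = {"page_views": 0.15, "search_appearances": 0.10,
                  "gsc_impressions": 0.15}

def signal_score(supply: float, quality: float, demand: dict) -> float:
    """Blend supply-side (40%), data-quality (20%), and demand-side
    (40%) sub-scores into the overall signal score.

    Missing demand metrics contribute 0 rather than redistributing
    weight, so newly added companies are scored on supply and
    quality alone.
    """
    demand_part = sum(w * demand.get(k, 0.0)
                      for k, w in DEMAND_WEIGHTS.items())
    return 0.40 * supply + 0.20 * quality + demand_part
```

Note the asymmetry with the momentum score: momentum redistributes missing weights, while the signal score lets missing demand components fall to zero, capping a demand-less company at 60 points.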
See which AI companies are gaining developer traction right now.
When scoring formulas or weights change, we document them here. Weight changes are also automatically logged to Data Updates.
- Rebalanced to 40% supply-side, 40% demand-side, 20% data quality. Added GSC impressions (15%) and search appearances (10%) as demand signals. Reduced HN weight from 20% to 15%.
- Original signal score with equal weighting across HN mentions, GitHub stars, funding, and download metrics. No demand-side signals.
Which AI companies are developers actually adopting? We track npm and PyPI downloads for hundreds of companies. Get the biggest shifts weekly, before they show up in the news.