How we measure the completeness and reliability of each company's data
Every company on AI-Buzz receives a data confidence score from 0 to 100. This score reflects how complete, fresh, and internally consistent the data is — not how good the company is. A score of 85 means we have reliable, up-to-date data. A score of 35 means there are gaps that may affect accuracy.
The score is a weighted composite of 8 components, each measuring a different aspect of data quality. It is recomputed daily.
Scores fall into four bands, from highest to lowest:

- All core fields present, metrics recently synced, no cache drift, good snapshot depth. Data is reliable for analysis.
- Most data is present and fresh. The company may be missing one or two optional fields or have slightly stale metrics.
- Gaps in data coverage. The company may lack package identifiers, have stale metrics, or be missing profile fields. Use with caution.
- Significant data gaps. The company may be newly added, lack trackable identifiers, or have a sync failure. Metrics may not reflect reality.
| Component | Weight | What It Measures |
|---|---|---|
| Required Fields | 20% | Checks that slug, name, website, description, and primary domain are all present. Missing any of these reduces the score proportionally. |
| Profile Completeness | 15% | Checks for founding year, company status (Private/Public/Acquired), and employee size range. |
| Package Identifier | 10% | Whether the company has at least one trackable package identifier: npm package, PyPI package, Docker image, Hugging Face org, or GitHub repo. |
| Metric Freshness | 15% | How recently the company's metrics were last synced. Stale data means the pipeline may have missed this company. |
| Cache Drift | 10% | Verifies that cached values on the company record (e.g., GitHub stars, HN mentions) match the latest metric snapshots. Drift indicates a sync bug. |
| Scores Computed | 10% | Whether the company has computed signal and momentum scores. These are derived scores that power rankings and the Developer Momentum Index. |
| Snapshot Depth | 10% | Checks whether the company has at least 4 metric snapshots per tracked metric type in the last 14 days. More snapshots mean more reliable trend data. |
| Description Quality | 10% | Evaluates the company description for appropriate length (60-160 characters ideal) and absence of marketing cliches. |
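The weights above can be captured in a small mapping, with a sanity check that they cover exactly 100% of the composite. This is a sketch only; the key names are illustrative labels, not AI-Buzz's internal field names:

```python
# Component weights from the table above, as fractions of the final score.
# Keys are illustrative labels, not the pipeline's actual identifiers.
WEIGHTS = {
    "required_fields": 0.20,
    "profile_completeness": 0.15,
    "package_identifier": 0.10,
    "metric_freshness": 0.15,
    "cache_drift": 0.10,
    "scores_computed": 0.10,
    "snapshot_depth": 0.10,
    "description_quality": 0.10,
}

# The eight weights must sum to 100% of the composite.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
```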
Scoring rules for each component:

- **Required Fields:** Each of the 5 fields contributes equally. 5/5 = 100, 4/5 = 80, and so on.
- **Profile Completeness:** Each of the 3 fields contributes equally. 3/3 = 100, 2/3 = 67, 1/3 = 33, 0/3 = 0.
- **Package Identifier:** 0 identifiers = 0, 1 = 30, 2 = 60, 3+ = 100.
- **Metric Freshness:** Updated within 7 days = 100, within 14 days = 50, older = 0.
- **Cache Drift:** No drift = 100. Each drifted metric deducts 50 points.
- **Scores Computed:** Both computed = 100, one = 50, neither = 0.
- **Snapshot Depth:** Percentage of metric types with 4+ snapshots in the last 14 days.
- **Description Quality:** Starts at 100. Short (<60 chars) or long (>300 chars) descriptions lose 25 points. Marketing cliches (e.g., "industry-leading", "revolutionary") deduct 25-50 points.
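The simpler rules above can be sketched as small scoring functions. Function names and signatures are illustrative, not the pipeline's actual code:

```python
def required_fields_score(fields_present: int) -> int:
    """Each of the 5 required fields contributes equally (5/5 = 100, 4/5 = 80)."""
    return round(fields_present / 5 * 100)

def profile_completeness_score(fields_present: int) -> int:
    """Each of the 3 profile fields contributes equally (2/3 = 67, 1/3 = 33)."""
    return round(fields_present / 3 * 100)

def package_identifier_score(identifier_count: int) -> int:
    """0 identifiers = 0, 1 = 30, 2 = 60, 3+ = 100."""
    return {0: 0, 1: 30, 2: 60}.get(identifier_count, 100)

def metric_freshness_score(days_since_sync: int) -> int:
    """Synced within 7 days = 100, within 14 days = 50, older = 0."""
    if days_since_sync <= 7:
        return 100
    if days_since_sync <= 14:
        return 50
    return 0
```

For example, a company with 4 of 5 required fields scores 80 on that component, and one with two package identifiers scores 60.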
Consider a company whose data is mostly present, but which is missing its employee size range and has one marketing cliche in its description:
| Component | Score | Weight | Detail |
|---|---|---|---|
| Required Fields | 100 | 20% | All 5 fields present |
| Profile Completeness | 67 | 15% | Has founding year and status, missing size |
| Package Identifier | 60 | 10% | npm + GitHub repo (2 identifiers) |
| Metric Freshness | 100 | 15% | Updated 2 days ago |
| Cache Drift | 100 | 10% | No drift detected |
| Scores Computed | 100 | 10% | Both signal and momentum scores computed |
| Snapshot Depth | 75 | 10% | 3 of 4 metric types have 4+ snapshots |
| Description Quality | 75 | 10% | Good length, 1 marketing cliche detected |
Final score: 86 / 100 (Good)
Computed as: (100 × 0.20) + (67 × 0.15) + (60 × 0.10) + (100 × 0.15) + (100 × 0.10) + (100 × 0.10) + (75 × 0.10) + (75 × 0.10) = 86.05, rounded to 86.
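The worked example above reduces to a weighted sum, sketched here with illustrative component labels:

```python
# Component scores and weights from the worked example: (score, weight).
components = {
    "required_fields":      (100, 0.20),
    "profile_completeness": (67,  0.15),
    "package_identifier":   (60,  0.10),
    "metric_freshness":     (100, 0.15),
    "cache_drift":          (100, 0.10),
    "scores_computed":      (100, 0.10),
    "snapshot_depth":       (75,  0.10),
    "description_quality":  (75,  0.10),
}

# Weighted sum: 20 + 10.05 + 6 + 15 + 10 + 10 + 7.5 + 7.5 = 86.05
final = sum(score * weight for score, weight in components.values())
print(round(final))  # 86
```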
**What is the data confidence score?**
A 0-100 score reflecting how complete, fresh, and verified a company's data is on AI-Buzz. It is computed from 8 weighted components covering required fields, profile completeness, package identifiers, metric freshness, cache consistency, score computation, snapshot depth, and description quality.

**How often is the score updated?**
The confidence score is recomputed daily as part of the data pipeline. Changes to any underlying component (e.g., a new sync updating metric freshness) are reflected the next day.

**Can a company improve its score?**
Yes. Adding missing identifiers (an npm package, a GitHub repo), updating the company description, and ensuring regular metric syncs all improve the score. The score reflects data quality, not company quality.

**Does a low score mean the data is wrong?**
Not necessarily. A low score means the data is incomplete or stale, not that the existing data is incorrect. A newly added company with few identifiers will have a low score until more data sources are connected.

**Why does description quality affect the score?**
Descriptions with marketing cliches or extreme lengths suggest the profile hasn't been properly curated. Factual, concise descriptions correlate with better-maintained company profiles overall.
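The description-quality heuristic can be sketched as below. The cliche list beyond the two examples given above is illustrative, and a flat 25-point deduction per cliche is assumed here (the rules allow 25-50 depending on severity):

```python
# Illustrative cliche list; only the first two phrases appear in the rules above.
CLICHES = ("industry-leading", "revolutionary", "cutting-edge", "game-changing")

def description_quality_score(description: str) -> int:
    """Hedged sketch of the description-quality component."""
    score = 100
    # Short (<60 chars) or long (>300 chars) descriptions lose 25 points.
    if len(description) < 60 or len(description) > 300:
        score -= 25
    # Assume a flat 25-point deduction per detected cliche.
    lowered = description.lower()
    score -= 25 * sum(1 for cliche in CLICHES if cliche in lowered)
    return max(score, 0)
```

A factual description of reasonable length keeps the full 100 points, while a short, cliche-heavy one loses points on both counts.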