How we collect and verify the data behind every company profile
AI-Buzz tracks signals that traditional company databases miss: package downloads, open-source activity, developer community sentiment, and search interest. All signals sync daily at 6 AM UTC. Funding data is manually verified before publication.
| Signal | Source | What's Measured | Frequency |
|---|---|---|---|
| Package Downloads | npm, PyPI | 30-day download totals with month-over-month trends | Daily |
| GitHub Activity | GitHub API | Stars, forks, commit velocity (latest week vs. 4-week average) | Daily |
| GitHub Contributors | GitHub API | Unique commit authors in last 30 days across all company repos | Daily |
| npm Dependents | Libraries.io | Count of packages that depend on company npm packages | Daily |
| HN Mentions & Sentiment | HN API | 30-day mention counts, discussion share by category, sentiment classification (positive / neutral / negative) | Daily |
| Docker Hub Pulls | Docker Hub | Image pull counts as a proxy for container adoption | Daily |
| Hugging Face Downloads | Hugging Face | Model download counts as a proxy for ML model adoption | Daily |
| Hugging Face Models | Hugging Face | Published model count, reflecting ML portfolio breadth | Daily |
| Stack Overflow Questions | Stack Exchange | Question volume as a proxy for developer support demand | Daily |
| Funding | TechCrunch, VentureBeat, Wikipedia | Round size, type, date, lead investors | As announced, manually verified |
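As a concrete illustration, the 30-day npm download totals above can be pulled from npm's public downloads API. This is a minimal sketch: the function names are ours, and the real pipeline would batch requests and handle rate limits and errors.

```python
# Sketch of the daily package-download pull using npm's public
# downloads API (api.npmjs.org). Error handling is simplified.
import json
import urllib.request

def npm_downloads_last_30_days(package: str) -> int:
    """Return total downloads for `package` over the last month."""
    url = f"https://api.npmjs.org/downloads/point/last-month/{package}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["downloads"]

def month_over_month_trend(current: int, previous: int) -> float:
    """Percentage change versus the prior 30-day window."""
    if previous == 0:
        return 0.0  # avoid dividing by zero for brand-new packages
    return (current - previous) / previous * 100
```

PyPI totals come from a different source, but the trend calculation is the same.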
Not all data points carry equal weight. We apply minimum thresholds to flag low-confidence signals:
| Signal | High Confidence | Low Confidence | Why It Matters |
|---|---|---|---|
| HN Sentiment | ≥30 comments (1.0 confidence) | <10 comments (0.2 confidence) | Sentiment classification requires enough comments for statistical relevance. Below 10, a single outlier can skew the ratio. |
| GitHub Velocity | ≥2 weeks history | <2 weeks history | Commit velocity compares the latest week to a 4-week average. With less than 2 weeks of data, the trend is unreliable. |
| Downloads | ≥100 / 30 days | <100 / 30 days | Very low download counts are often noise from CI bots or one-time installs. Below 100/month, the signal is too weak to draw conclusions about adoption. |
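The HN sentiment thresholds above can be expressed as a simple mapping from comment volume to a confidence weight. The endpoints (≥30 comments → 1.0, <10 → 0.2) come from the table; the linear ramp between them is our assumption for illustration.

```python
def hn_sentiment_confidence(n_comments: int) -> float:
    """Map HN comment volume to a sentiment-confidence weight.

    Endpoints follow the thresholds table; the ramp between 10 and
    30 comments is an assumed linear interpolation.
    """
    if n_comments >= 30:
        return 1.0
    if n_comments < 10:
        return 0.2
    # linear ramp: 10 comments -> 0.2, 30 comments -> 1.0
    return 0.2 + (n_comments - 10) / 20 * 0.8
```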
Each company receives a 0–100 data confidence score reflecting profile completeness, identifier verification, metric freshness, and signal coverage.
| Score Range | Rating | Meaning |
|---|---|---|
| 80–100 | Excellent | Fully verified identifiers, fresh metrics across all signals, complete profile. |
| 60–79 | Good | Most signals present and recently updated. Minor gaps in coverage or verification. |
| 40–59 | Fair | Some signals missing or stale. Profile may lack key identifiers like GitHub repos or package names. |
| <40 | Needs Review | Significant data gaps. Metrics may be outdated or unverified. Treat conclusions with caution. |
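Mapping a confidence score to its rating tier is a straightforward threshold lookup; this sketch mirrors the ranges in the table above (function name is illustrative):

```python
def confidence_rating(score: int) -> str:
    """Translate a 0-100 data confidence score into its rating tier."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Needs Review"
```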
Funding rounds are detected from news feeds (TechCrunch, VentureBeat) and cross-referenced against company announcements and Wikipedia. Every round is manually verified before appearing on a company profile. We'd rather miss a round than publish incorrect data.
Every data point goes through a four-layer validation pipeline before publication.
Found an error? Use the "Report an error" button on any company profile, and we'll review it promptly.
The Developer Momentum Index combines all the signals described above into a single composite score (0–100) for each company. It weights developer adoption, community discussion, momentum trends, and search demand to rank AI companies by real developer traction.
For a detailed explanation of how the DMI is calculated, score tiers, and limitations, see the Developer Momentum Index page.
The momentum score (0–100) measures how quickly a company is gaining or losing developer traction. It combines four directional components:
| Component | Weight | Source |
|---|---|---|
| HN mentions trend | 30% | 30-day mention count change |
| GitHub stars trend | 25% | Star count growth rate |
| Funding recency | 20% | Days since last funding round (linear decay over 365 days) |
| Download trend | 25% | npm/PyPI download growth rate |
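The funding-recency component is the only one with an explicit formula in the table (linear decay over 365 days). A minimal sketch, assuming the component is scaled 0–100 and clamped at zero after a year:

```python
def funding_recency(days_since_round: int) -> float:
    """Funding-recency component on a 0-100 scale.

    100 on the day a round is announced, decaying linearly to 0 at
    365 days; the clamp at zero beyond that is an assumption.
    """
    return max(0.0, 1.0 - days_since_round / 365) * 100
```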
Not every company has data for all four components. When components are missing, the available weights redistribute proportionally so the score still uses the full 0–100 range. A confidence value (0.25–1.0) records the fraction of components that were available.
When the momentum score feeds into the signal score, it is multiplied by this confidence value, so companies with fewer data points receive proportionally less credit.
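The redistribution and confidence bookkeeping described above can be sketched as follows. The weights come from the components table; the dict-based interface and component names are illustrative, not the actual implementation.

```python
# Component weights from the momentum table above.
WEIGHTS = {"hn_trend": 0.30, "stars_trend": 0.25,
           "funding_recency": 0.20, "download_trend": 0.25}

def momentum_score(components: dict) -> tuple:
    """Combine available components (each 0-100) into a 0-100 score.

    Missing components (None) drop out and the remaining weights are
    rescaled to sum to 1, so the score keeps the full range. Returns
    (score, confidence), where confidence is the fraction of the
    four components that were present.
    """
    present = {k: v for k, v in components.items() if v is not None}
    if not present:
        return 0.0, 0.0  # fallback when no data exists at all
    total_weight = sum(WEIGHTS[k] for k in present)
    score = sum(WEIGHTS[k] / total_weight * v for k, v in present.items())
    confidence = len(present) / len(WEIGHTS)
    return score, confidence
```

With only one component present, that component's rescaled weight becomes 1.0, so a strong single signal can still produce a high score, but its confidence of 0.25 discounts it when it feeds into the signal score.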
The signal score (used for ranking and prioritization) includes demand-side metrics that capture real user interest. These account for 40% of the total signal score:
| Signal | Weight | Source |
|---|---|---|
| Page views | 15% | 30-day company page views (GA4) |
| Search appearances | 10% | 30-day on-site search queries (GA4) |
| GSC impressions | 15% | Google Search Console impressions for company pages |
The remaining 60% of the signal score comes from supply-side signals (HN mentions, GitHub stars, momentum, funding recency — 40%) and data quality (profile coverage, package downloads — 20%).
When demand data is unavailable (e.g., a newly added company with no page views yet), the demand components gracefully degrade to 0 rather than penalizing the company. This means the score is driven entirely by supply-side and data quality signals until demand data becomes available.
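The overall 40/40/20 blend and the graceful degradation of demand metrics can be sketched like this. Sub-scores are assumed to be normalized to 0–100 before blending; the function signature and metric keys are illustrative.

```python
# Demand-side weights from the table above (sum to 0.40).
DEMAND_WEIGHTS = {"page_views": 0.15, "search_appearances": 0.10,
                  "gsc_impressions": 0.15}

def signal_score(supply: float, quality: float, demand: dict) -> float:
    """Blend supply-side (40%), data-quality (20%), and demand-side
    (40%) sub-scores into the overall signal score.

    Missing demand metrics contribute 0 rather than redistributing
    weight, so newly added companies are scored on supply and
    quality alone.
    """
    demand_part = sum(w * demand.get(k, 0.0)
                      for k, w in DEMAND_WEIGHTS.items())
    return 0.40 * supply + 0.20 * quality + demand_part
```

Note the asymmetry with the momentum score: momentum redistributes missing weights, while the signal score lets missing demand components fall to zero, capping a demand-less company at 60 points.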
See which AI companies are gaining developer traction right now.
When scoring formulas or weights change, we document them here. Weight changes are also automatically logged to Data Updates.
- Rebalanced to 40% supply-side, 40% demand-side, 20% data quality. Added GSC impressions (15%) and search appearances (10%) as demand signals. Reduced HN weight from 20% to 15%.
- Original signal score with equal weighting across HN mentions, GitHub stars, funding, and download metrics. No demand-side signals.
Which AI companies are developers actually adopting? We track npm and PyPI downloads for hundreds of companies. Get the biggest shifts weekly, before they show up in the news.