Unstructured

ETL for unstructured data preprocessing

AI Infrastructure · Founded 2022 · #16 of 55 in AI Infrastructure

Updated May 29, 2026

Company profile

What is Unstructured?

Unstructured builds tools that extract unstructured data from sources such as PDFs, images, and emails and transform it into clean, structured formats. The output serves as input data for large language models (LLMs) and other AI applications.

Company research

As of April 29, 2026
  1. Hacker News mentions decreased 34.1% over the last 30 days. Mention counts measure discussion volume, not adoption.

Source check
Standard. Use as a directional reading from retained source data.
Sources
5 measured data points from 5 sources, current through 2026-04-29.

Latest company data

Primary data point

Downloads

5.3M/30d

Tracked package: PyPI unstructured

▲ +7% · Updated 1d ago

Other data points

Dependent projects

289

Projects depending on tracked package: PyPI unstructured

Updated 9h ago

GitHub stars

14.8K

Main repository stars

Updated 9h ago

Hacker News

60/30d

Position #7 in category discussion

Updated 9h ago

Repository health

Maintenance data from the main open-source repository.

Key-person risk: 2
External contributors: 40% of recent contributors outside the core team
Releases (30d): 0

Repository usage

Public repositories and source files importing packages tied to Unstructured.

Repos importing
3.4K · 0%

About Unstructured

Founders: Brian Raymond, Matthew Harrison

Funding

$105M · 3 rounds

Raised $105M total. Category position #16 of 55 in AI Infrastructure.

Series B · 2024 · Menlo Ventures · $40M
Series A · 2024 · Menlo Ventures · $40M
Seed · 2023 · Madrona · $25M

Investors

Madrona, Menlo Ventures

Tracked packages (1)

1 PyPI

unstructured · PyPI · Main PyPI package

ETL for unstructured data preprocessing