Crunchbase Scraper: Extract Funding Rounds, Companies, and Investors

1. Crunchbase fields deal teams and sales teams extract

VC sourcing teams prioritize funding announcements: round type, date, amount, and lead investors. Sales enrichment pipelines care about employee range, category tags, HQ location, and the company website for domain matching. Journalists and analysts want acquisition events and IPO history. Understanding which fields are freely visible versus paywalled determines what your scraper can realistically return.

Fields marked as paywalled below will render as blurred or empty DOM nodes on free views — an empty CSS extraction result for those selectors does not mean your selector is wrong. It means Crunchbase is intentionally hiding the value.

Organization permalink slug and UUID (embedded in page source JSON)
Company name, short description, and logo URL
Founded date, operating status, and closed date if applicable
Headquarters city, region, and country
Company website URL
Category and industry tags
Employee count range (e.g., 1001–5000)
Total funding amount — paywalled on most free views
Number of funding rounds and last funding date
Last funding type and lead investor names — partially paywalled
Acquisition targets and acquirer (when public)
IPO date and stock exchange (when applicable)
Founder and key people profile links

2. Crunchbase URL patterns and permalink stability

Crunchbase uses human-readable slugs for organization and person permalinks. These slugs are stable enough to use as primary keys in most pipelines — companies rarely change their Crunchbase slug even after rebranding, though it does happen. The UUID embedded in the page source JSON is more durable if you can extract it.

Funding round URLs include a hash suffix that acts as a stable identifier for that specific round event. Store these URLs when you want to track a specific raise over time rather than re-scraping the parent organization page.

Organization: https://www.crunchbase.com/organization/stripe
Person: https://www.crunchbase.com/person/patrick-collison
Funding round: https://www.crunchbase.com/funding_round/stripe-series-h--abc123
Acquisition: https://www.crunchbase.com/acquisition/company-acquires-target
Discover search (login-gated): https://www.crunchbase.com/discover/organization.companies
Category hub: https://www.crunchbase.com/hub/fintech-companies
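
The URL patterns above can be turned into a small parser that maps any Crunchbase link to an entity type and slug. This is a minimal sketch: the names `CrunchbaseRef` and `parse_crunchbase_url` are illustrative, and the set of recognized path prefixes is taken only from the patterns listed here, so other Crunchbase URL shapes would return `None`.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse

# Path prefixes taken from the URL patterns listed above (not an exhaustive list).
ENTITY_TYPES = {"organization", "person", "funding_round", "acquisition", "hub"}


class CrunchbaseRef(NamedTuple):
    entity_type: str
    slug: str


def parse_crunchbase_url(url: str) -> Optional[CrunchbaseRef]:
    """Return (entity_type, slug) for a recognized Crunchbase URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ("crunchbase.com", "www.crunchbase.com"):
        return None
    # Drop empty segments so trailing slashes do not matter.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] not in ENTITY_TYPES:
        return None
    return CrunchbaseRef(parts[0], parts[1])
```

Login-gated discover URLs deliberately fall through to `None`, which keeps them out of a seed list built from parsed links.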

3. Organization page DOM structure

Crunchbase organization pages are Angular single-page applications. The server renders an initial HTML shell with some metadata, but most content is hydrated client-side. This means fast HTTP-only requests will capture the shell and any JSON-LD embedded in the document, but funding sections and people cards require JavaScript execution to appear in the DOM.

Key selectors on a fully rendered organization page: company name in `h1.profile-name`; short description in `span.description`; headquarters in `span.field-type-address`; website in `a.component--field-formatter.field-type-link`; employee range in `a.field-type-enum`; category chips in `span.chip`; founded date in `span.field-type-date`. Funding round rows render inside `section#funding-rounds` as table rows once the Angular component loads.

Paywalled fields are wrapped in elements with class `cb-paywall`. These nodes exist in the DOM but their text content is replaced with a blur overlay and a prompt to upgrade. Your CSS extractor will return empty strings for those selectors — not an error, just a paywall signal. Crunchbase also embeds JSON-LD `Organization` schema on some pages with `name` and `url` properties, but funding detail is almost never included in the structured data.
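
When a page does carry JSON-LD, the `Organization` block can be read from the raw HTML without rendering. The sketch below uses only the Python standard library; `JsonLdCollector` and `extract_organization_jsonld` are illustrative names, and it assumes the block sits in a `<script type="application/ld+json">` tag, which is the usual JSON-LD convention rather than something confirmed for every Crunchbase page.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the whole page.


def extract_organization_jsonld(html):
    """Return {'name', 'url'} from the first Organization block, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        candidates = block if isinstance(block, list) else [block]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Organization":
                return {"name": item.get("name"), "url": item.get("url")}
    return None
```

A `None` result is expected on pages that omit the structured data, so treat it as "not present" rather than as an extraction failure.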

4. Paywalls, anti-bot detection, and rate limits

Crunchbase runs layered defenses. At the network layer, datacenter IP ranges are rate-limited aggressively on organization page requests — you will see 429s or silent redirects to the homepage within a small number of sequential requests from a single datacenter IP. Residential proxies reduce this friction significantly.

At the application layer, the Pro paywall blurs funding amounts and investor lists for unauthenticated or free-tier sessions. This is not bot detection — it is deliberate content gating. Attempting to bypass it by injecting session cookies from a paid account violates Crunchbase Terms and potentially computer fraud statutes depending on jurisdiction.

The discover search and export features require an active login session and are heavily rate-limited even for Pro users. Do not attempt to automate discover exports — scrape known organization permalinks from a seed list instead.

`cb-paywall` class overlays on funding amounts and investor details
Login required for discover search and CSV exports
Aggressive rate limits on organization pages from datacenter IPs
Angular SPA hydration required for most content sections
Frequent component class name changes breaking CSS selectors
Legal terms explicitly prohibiting scraping and automated collection
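
The two network-layer signals described above (429 responses and silent redirects to the homepage) can be detected and paired with a simple backoff. This is a sketch under stated assumptions: `classify_response` and `backoff_delay` are hypothetical helpers, the caller is assumed to know the final URL after redirects, and the 5-second base and 300-second cap are arbitrary illustrative values, not Crunchbase thresholds.

```python
from urllib.parse import urlparse


def classify_response(status_code, requested_url, final_url):
    """Label a response as 'ok', 'rate_limited', or 'redirected_home'."""
    if status_code == 429:
        return "rate_limited"
    requested_path = urlparse(requested_url).path.rstrip("/")
    final_path = urlparse(final_url).path.rstrip("/")
    # A non-root request that ends at the site root was silently redirected.
    if requested_path and final_path == "":
        return "redirected_home"
    return "ok"


def backoff_delay(attempt, base_seconds=5.0, cap_seconds=300.0):
    """Exponential backoff in seconds for a zero-based retry attempt."""
    return min(cap_seconds, base_seconds * (2 ** attempt))
```

Treating a homepage redirect the same way as a 429 avoids storing an empty extraction as if it were a valid organization record.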

5. Scraping public organization fields with OmniScrape

Use `mode: "auto"` with a residential US proxy for organization pages. OmniScrape will attempt a fast HTTP request first and escalate to a headless browser if the page requires JavaScript rendering. For basic firmographic fields — name, description, location, categories, employee range — the initial HTTP response often contains enough rendered HTML to extract values without full JS execution.

Target only the fields that are freely visible on public pages. If a selector returns an empty string, check whether the field is behind a `cb-paywall` overlay before debugging your selector. The `enable_solver` flag activates OmniScrape's Web Unlocker to handle bot challenges that may appear on high-volume scraping sessions.

Crunchbase organization — public fields request

json

{
  "url": "https://www.crunchbase.com/organization/openai",
  "mode": "auto",
  "output_format": "css_extractor",
  "enable_solver": true,
  "proxy": "residential:us",
  "css_selectors": {
    "name": "h1.profile-name",
    "description": "span.description",
    "location": "span.field-type-address",
    "website": "a.component--field-formatter.field-type-link",
    "employees": "a.field-type-enum",
    "categories": "span.chip",
    "founded": "span.field-type-date",
    "operating_status": "a.field-type-enum[href*='operating_status']"
  }
}
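
For a seed list of permalinks, the request body above can be generated per slug rather than written by hand. The sketch below only builds the payload and inspects a result; it does not call OmniScrape, since the endpoint and response envelope are not described here. `build_public_fields_request` and `empty_fields` are illustrative names, the selector map is a subset of the request above, and the result is assumed to be a flat dict of field name to extracted value.

```python
import json

# Subset of the public-field selectors from the request above.
PUBLIC_FIELD_SELECTORS = {
    "name": "h1.profile-name",
    "description": "span.description",
    "location": "span.field-type-address",
    "website": "a.component--field-formatter.field-type-link",
    "employees": "a.field-type-enum",
    "categories": "span.chip",
    "founded": "span.field-type-date",
}


def build_public_fields_request(slug: str) -> dict:
    """Build the request body for one organization permalink slug."""
    if not slug or "/" in slug:
        raise ValueError(f"not a bare permalink slug: {slug!r}")
    return {
        "url": f"https://www.crunchbase.com/organization/{slug}",
        "mode": "auto",
        "output_format": "css_extractor",
        "enable_solver": True,
        "proxy": "residential:us",
        "css_selectors": dict(PUBLIC_FIELD_SELECTORS),
    }


def empty_fields(extracted: dict) -> list:
    """Fields that came back empty: paywalled, not rendered, or a stale selector."""
    return sorted(k for k in PUBLIC_FIELD_SELECTORS if not extracted.get(k))


if __name__ == "__main__":
    print(json.dumps(build_public_fields_request("openai"), indent=2))
```

Logging the output of `empty_fields` per request gives a quick signal for whether to check for a `cb-paywall` overlay before touching the selectors.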

6. Extracting the funding rounds section

The funding rounds section is rendered by an Angular component that loads asynchronously after the initial page shell. Use `mode: "js_rendering"` with `js_wait_selector` pointing to `section#funding-rounds` so OmniScrape waits for the component to hydrate before extracting. Set `js_wait_timeout` to roughly 10 to 12 seconds (the request below passes `12000`, which reads as milliseconds), because Crunchbase's Angular bootstrap is slow on cold loads.

Round rows that are not paywalled will contain the funding date, round type label, and a link to the round detail page. The amount and lead investor name may be empty strings if the session is not authenticated to Pro. Extract investor links by targeting anchor tags with `href` containing `/organization/` inside the funding section — these point to investor organization pages you can follow-scrape.

Store the round detail URL (e.g., `/funding_round/stripe-series-h--abc123`) as a stable foreign key. Re-scraping the parent organization page will re-surface the same round — the round URL is your deduplication handle.

Crunchbase funding rounds section request

json

{
  "url": "https://www.crunchbase.com/organization/anthropic",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "enable_solver": true,
  "proxy": "residential:us",
  "js_wait_selector": "section#funding-rounds",
  "js_wait_timeout": 12000,
  "css_selectors": {
    "round_dates": "section#funding-rounds span.field-type-date",
    "round_types": "section#funding-rounds a[href*='funding_round']",
    "round_amounts": "section#funding-rounds span.field-type-money",
    "investors": "section#funding-rounds a[href*='/organization/']",
    "total_funding": "span[data-test='funding-total']"
  }
}
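
Using the round URL as the deduplication handle might look like the sketch below. It assumes each scraped row has already been shaped into a dict with an `href` key holding the round detail link, plus whatever other fields were extracted; that row shape, and the helper names `round_key` and `merge_rounds`, are illustrative rather than part of any OmniScrape response format.

```python
from urllib.parse import urlparse


def round_key(href: str):
    """Return the round slug from a /funding_round/<slug> link, else None."""
    parts = [p for p in urlparse(href).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "funding_round":
        return parts[1]
    return None


def merge_rounds(existing: dict, scraped_rows: list, org_slug: str) -> dict:
    """Merge freshly scraped rows into records keyed by round slug."""
    merged = dict(existing)
    for row in scraped_rows:
        key = round_key(row.get("href", ""))
        if key is None:
            continue  # Not a funding round link (for example an investor link).
        previous = merged.get(key, {})
        # Empty values (possibly paywalled) never overwrite earlier non-empty ones.
        record = {**previous, **{k: v for k, v in row.items() if v}}
        record["organization"] = org_slug
        merged[key] = record
    return merged
```

Because empty strings are dropped before merging, a later scrape from a free-tier session does not erase values captured earlier.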

7. Crunchbase Enterprise API and licensed data access

Crunchbase sells licensed API access and bulk data exports through its Enterprise tier. If you are building a product that surfaces Crunchbase funding data to end users — a CRM enrichment tool, an investor intelligence platform, a sales prospecting product — you almost certainly need a license rather than a scraper. Scraping free public fields and reselling compiled funding datasets competes directly with Crunchbase's core business and carries significant legal exposure.

The Enterprise API returns structured JSON with full funding detail, investor relationships, and historical round data. It is rate-limited but documented, and the data model is stable compared to CSS selectors that break whenever Crunchbase ships an Angular component update. For internal research use cases — a VC analyst running one-off lookups, a journalist verifying a funding claim — scraping publicly visible fields with counsel sign-off is a different risk profile than a commercial data product.

Evaluate the build-versus-buy decision honestly: the engineering cost of maintaining Crunchbase CSS selectors against frequent DOM changes, plus residential proxy costs, plus legal review, often exceeds the Enterprise API cost for production workloads.

8. Using permalinks and UUIDs as primary keys

Store the organization permalink slug — the human-readable portion of the URL like `openai` or `anthropic` — as your primary key for company records. This slug is stable across most rebrands and is the canonical identifier Crunchbase uses in all cross-links between organizations, funding rounds, and people.

For higher durability, extract the UUID from the embedded JSON in the page source. Crunchbase embeds a JSON blob in a `<script>` tag containing the organization's UUID, which persists even if the slug changes after an acquisition or rebrand. Parse this with a regex or JSON path extractor from the raw HTML response (`body.data.content`) before running CSS extraction.
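
A regex-based version of that extraction could look like the following sketch. The key name `"uuid"` is an assumption about the embedded JSON, not something documented here, and the function simply returns the first match, so verify against a real page that the first `uuid` in the blob belongs to the organization rather than to a related entity.

```python
import re
import uuid
from typing import Optional

# Assumed key name; accepts both dashed and undashed 32-hex-digit forms.
UUID_PATTERN = re.compile(
    r'"uuid"\s*:\s*"('
    r"[0-9a-fA-F]{8}-?[0-9a-fA-F]{4}-?[0-9a-fA-F]{4}-?"
    r"[0-9a-fA-F]{4}-?[0-9a-fA-F]{12}"
    r')"'
)


def extract_org_uuid(raw_html: str) -> Optional[str]:
    """Return the first UUID found in the page source, normalized, or None."""
    match = UUID_PATTERN.search(raw_html)
    if match is None:
        return None
    # uuid.UUID accepts both forms and str() yields the canonical dashed form.
    return str(uuid.UUID(match.group(1)))
```

Normalizing through `uuid.UUID` means the stored value has one canonical form regardless of how the page serializes it.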

Model funding events as separate rows keyed by the round URL slug. A single organization scrape may surface multiple rounds — store each as an independent record with the parent organization permalink as a foreign key. This lets you incrementally update round records without re-processing the full organization history on every scrape cycle.
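
One way to express this model is a two-table SQLite schema with upserts, sketched below with the standard `sqlite3` module. Table and column names are illustrative choices, and the `ON CONFLICT ... DO UPDATE` syntax assumes SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS organizations (
    permalink  TEXT PRIMARY KEY,
    uuid       TEXT,
    name       TEXT,
    scraped_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS funding_rounds (
    round_slug    TEXT PRIMARY KEY,
    org_permalink TEXT NOT NULL REFERENCES organizations(permalink),
    round_type    TEXT,
    announced_on  TEXT,
    amount        TEXT,
    scraped_at    TEXT NOT NULL
);
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # Enforce the parent organization link.
    conn.executescript(SCHEMA)
    return conn


def upsert_org(conn, permalink, uuid=None, name=None):
    conn.execute(
        """
        INSERT INTO organizations (permalink, uuid, name, scraped_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(permalink) DO UPDATE SET
            uuid = COALESCE(excluded.uuid, organizations.uuid),
            name = COALESCE(excluded.name, organizations.name),
            scraped_at = excluded.scraped_at
        """,
        (permalink, uuid, name, _now()),
    )


def upsert_round(conn, round_slug, org_permalink,
                 round_type=None, announced_on=None, amount=None):
    # COALESCE keeps stored values when a later scrape returns nothing for a field.
    conn.execute(
        """
        INSERT INTO funding_rounds
            (round_slug, org_permalink, round_type, announced_on, amount, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(round_slug) DO UPDATE SET
            round_type = COALESCE(excluded.round_type, funding_rounds.round_type),
            announced_on = COALESCE(excluded.announced_on, funding_rounds.announced_on),
            amount = COALESCE(excluded.amount, funding_rounds.amount),
            scraped_at = excluded.scraped_at
        """,
        (round_slug, org_permalink, round_type, announced_on, amount, _now()),
    )
```

Pass `None` rather than an empty string for fields that came back empty, so that `COALESCE` preserves any value stored by an earlier scrape.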

9. Crunchbase Terms of Service and legal considerations

Crunchbase's Terms of Service explicitly prohibit automated scraping, crawling, and data collection. Section 4 of their Terms restricts use of robots, spiders, or automated tools to access the service. Paywall bypass — whether by injecting Pro session cookies, intercepting API calls, or circumventing the `cb-paywall` overlay — constitutes unauthorized access to paid content and may violate the Computer Fraud and Abuse Act in the US and equivalent statutes in other jurisdictions.

OmniScrape provides the technical capability to make HTTP and browser-rendered requests to publicly accessible URLs. It does not grant any rights to the data returned by those requests. The legality of collecting, storing, and using Crunchbase data depends on your jurisdiction, your use case, and whether the data is publicly visible without authentication. Get legal counsel before building a commercial product on scraped Crunchbase data.

For publicly visible fields collected at low volume for internal research — verifying a funding claim, enriching a small prospect list — the risk profile is different from bulk collection and redistribution. Document your use case, respect robots.txt, use rate limiting, and do not attempt to access paywalled content.

Frequently asked questions

Why does my Crunchbase scraper return empty funding amounts?

Funding amounts on Crunchbase are paywalled behind Pro for most organizations on free views. The DOM node exists but its text content is replaced with a blur overlay, so your CSS selector is correct and the value is intentionally hidden. You will see the same empty result whether you load the page in a regular browser or a headless one. The only legitimate way to access the full amount is through a Pro account or the Enterprise API.

Do I need js_rendering mode for Crunchbase organization pages?

It depends on which fields you need. Basic firmographic fields (name, description, location, categories, employee range) are often present in the initial server-rendered HTML shell and can be extracted with `mode: "auto"` without full JS execution. The funding rounds section, people cards, and acquisition history require Angular hydration and need `mode: "js_rendering"` with `js_wait_selector` set to the relevant section ID. Use `auto` first and check what comes back before defaulting to `js_rendering` for every request.

Can I scrape Crunchbase discover search results?

Discover search requires an active login session and is heavily rate-limited even for authenticated Pro users. Automated access to discover search and CSV exports is explicitly restricted by Crunchbase Terms. The practical alternative is to build a seed list of organization permalinks from external sources — press releases, news mentions, LinkedIn company pages — and scrape each permalink directly rather than trying to replicate discover search programmatically.

How often do Crunchbase CSS selectors break?

Frequently. Crunchbase ships Angular component updates that change class names and DOM structure without notice. Selectors like `h1.profile-name` and `section#funding-rounds` have been relatively stable, but attribute-based selectors and deeply nested class chains break regularly. Build your pipeline with fallback selectors, monitor extraction success rates, and alert on empty results that were previously populated. Expect to update selectors several times per year.
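
Fallback selection and regression alerts can both run on the extraction results themselves. The sketch below assumes you request each candidate selector under its own key (the `name_primary` / `name_fallback` naming is hypothetical) and keep the previous scrape's values for comparison; `first_populated` and `regressions` are illustrative helper names.

```python
def first_populated(extracted: dict, candidate_keys: list):
    """Return (key, value) for the first candidate with a non-empty value."""
    for key in candidate_keys:
        value = extracted.get(key)
        if value:
            return key, value
    return None, None


def regressions(previous: dict, current: dict) -> list:
    """Fields that were populated in the previous scrape but are empty now."""
    return sorted(
        field for field, value in previous.items()
        if value and not current.get(field)
    )


if __name__ == "__main__":
    result = {"name_primary": "", "name_fallback": "Example Co"}
    print(first_populated(result, ["name_primary", "name_fallback"]))
    print(regressions({"name": "Example Co", "founded": "2015"}, {"name": "Example Co"}))
```

A field that regresses on many organizations at once usually points to a changed selector, while a regression on a single organization is more likely a paywall or a data change.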

Is Crunchbase data public domain?

No. Crunchbase aggregates, cleans, and licenses funding data. Individual data points such as a funding announcement may be public facts, but Crunchbase's compiled database can still be protected, for example as a compilation under copyright law or under database rights in some jurisdictions, and access to it is governed by Crunchbase's Terms of Service. Scraping and redistributing Crunchbase data commercially (as part of a data product, API, or enrichment service) carries high legal risk regardless of whether the underlying facts are public.

What proxy type should I use for Crunchbase?

Residential US proxies. Crunchbase rate-limits datacenter IP ranges aggressively on organization page requests: you will see 429 responses or silent redirects within a small number of sequential requests from a datacenter IP. Residential proxies rotate through real ISP addresses and significantly reduce rate-limiting friction. Set `proxy: "residential:us"` in your OmniScrape request and keep request cadence low, sending one organization request every few seconds rather than parallel bursts.

How should I model Crunchbase data in my database?

Use the organization permalink slug as the primary key for company records. Store funding rounds as separate rows keyed by the round URL slug with the organization permalink as a foreign key. Extract and store the UUID from the embedded page JSON as a secondary identifier, since it survives slug changes after acquisitions or rebrands. Track a `scraped_at` timestamp on every record so you can identify stale data and prioritize re-scrape cycles for high-value organizations.

