OmniScrape

Crunchbase Scraper: Extract Funding Rounds, Companies, and Investors

Crunchbase is the canonical reference for startup funding history, but what you can actually collect without a paid account is narrower than most people assume. Funding amounts are blurred behind a Pro paywall on free organization views; investor lists are truncated; discover search requires login. What remains publicly visible — company name, description, headquarters, categories, employee range, and partial funding round metadata — is still valuable for firmographic enrichment, VC sourcing workflows, and competitive research.

Before building a Crunchbase scraping pipeline at scale, read this guide alongside the Lead Generation Web Scraping guide, and consider whether Crunchbase's Enterprise API or a licensed data vendor is the right foundation for a production product. Scraping public fields for internal research carries different risk than reselling compiled funding data.

On this page

1. Crunchbase fields deal teams and sales teams extract
2. Crunchbase URL patterns and permalink stability
3. Organization page DOM structure
4. Paywalls, anti-bot detection, and rate limits
5. Scraping public organization fields with OmniScrape
6. Extracting the funding rounds section
7. Crunchbase Enterprise API and licensed data access
8. Using permalinks and UUIDs as primary keys
9. Crunchbase Terms of Service and legal considerations
10. FAQ

1. Crunchbase fields deal teams and sales teams extract

VC sourcing teams prioritize funding announcements: round type, date, amount, and lead investors. Sales enrichment pipelines care about employee range, category tags, HQ location, and the company website for domain matching. Journalists and analysts want acquisition events and IPO history. Understanding which fields are freely visible versus paywalled determines what your scraper can realistically return.

Fields marked as paywalled below will render as blurred or empty DOM nodes on free views — an empty CSS extraction result for those selectors does not mean your selector is wrong. It means Crunchbase is intentionally hiding the value.

  • Organization permalink slug and UUID (embedded in page source JSON)
  • Company name, short description, and logo URL
  • Founded date, operating status, and closed date if applicable
  • Headquarters city, region, and country
  • Company website URL
  • Category and industry tags
  • Employee count range (e.g., 1001–5000)
  • Total funding amount — paywalled on most free views
  • Number of funding rounds and last funding date
  • Last funding type and lead investor names — partially paywalled
  • Acquisition targets and acquirer (when public)
  • IPO date and stock exchange (when applicable)
  • Founder and key people profile links
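
The visible-versus-paywalled split above can be carried into a record type so that "hidden" is never confused with "zero". This is a minimal sketch, assuming the extraction result arrives as a flat dict of strings (plus a list for `categories`); the `OrganizationRecord` class, the `from_extraction` helper, and every field name are illustrative, not a Crunchbase or OmniScrape schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrganizationRecord:
    """One scraped organization. Field names are illustrative only."""
    permalink: str                      # slug from the URL, e.g. "stripe"
    name: str = ""
    description: str = ""
    website: str = ""
    employee_range: str = ""
    categories: list[str] = field(default_factory=list)
    # Paywalled on most free views: None means "hidden", not "no funding".
    total_funding: Optional[str] = None
    lead_investors: Optional[list[str]] = None


def from_extraction(permalink: str, raw: dict) -> OrganizationRecord:
    """Map a flat extraction result onto the record (assumed input shape)."""
    def text(key: str) -> str:
        value = raw.get(key) or ""
        return value.strip() if isinstance(value, str) else ""

    cats = raw.get("categories") or []
    if isinstance(cats, str):
        cats = [cats]
    return OrganizationRecord(
        permalink=permalink,
        name=text("name"),
        description=text("description"),
        website=text("website"),
        employee_range=text("employees"),
        categories=[c.strip() for c in cats if c.strip()],
        # An empty string from a paywalled node is stored as None.
        total_funding=text("total_funding") or None,
    )
```

Keeping paywalled fields as `None` lets downstream code distinguish "not visible on this view" from a genuinely empty value.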

2. Crunchbase URL patterns and permalink stability

Crunchbase uses human-readable slugs for organization and person permalinks. These slugs are stable enough to use as primary keys in most pipelines — companies rarely change their Crunchbase slug even after rebranding, though it does happen. The UUID embedded in the page source JSON is more durable if you can extract it.

Funding round URLs include a hash suffix that acts as a stable identifier for that specific round event. Bookmark these when you want to track a specific raise over time rather than re-scraping the parent organization page.

  • Organization: https://www.crunchbase.com/organization/stripe
  • Person: https://www.crunchbase.com/person/patrick-collison
  • Funding round: https://www.crunchbase.com/funding_round/stripe-series-h--abc123
  • Acquisition: https://www.crunchbase.com/acquisition/company-acquires-target
  • Discover search (login-gated): https://www.crunchbase.com/discover/organization.companies
  • Category hub: https://www.crunchbase.com/hub/fintech-companies
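
The URL patterns above can be turned into a small parser that yields an entity type and slug for use as a key. This is a sketch under the assumption that the first path segment names the entity and the second is the slug, as in the examples listed; `parse_crunchbase_url` and `CrunchbaseRef` are hypothetical names.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse

# Path prefixes taken from the URL patterns listed above.
ENTITY_PREFIXES = ("organization", "person", "funding_round", "acquisition", "hub")


class CrunchbaseRef(NamedTuple):
    entity_type: str
    slug: str


def parse_crunchbase_url(url: str) -> Optional[CrunchbaseRef]:
    """Return (entity_type, slug) for a recognised permalink, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ("crunchbase.com", "www.crunchbase.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Login-gated paths such as /discover/... are deliberately not recognised.
    if len(parts) < 2 or parts[0] not in ENTITY_PREFIXES:
        return None
    return CrunchbaseRef(parts[0], parts[1])
```

Rejecting unknown prefixes keeps login-gated discover URLs out of a seed list by construction.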

3. Organization page DOM structure

Crunchbase organization pages are Angular single-page applications. The server renders an initial HTML shell with some metadata, but most content is hydrated client-side. This means fast HTTP-only requests will capture the shell and any JSON-LD embedded in the document, but funding sections and people cards require JavaScript execution to appear in the DOM.

Key selectors on a fully rendered organization page: company name in `h1.profile-name`; short description in `span.description`; headquarters in `span.field-type-address`; website in `a.component--field-formatter.field-type-link`; employee range in `a.field-type-enum`; category chips in `span.chip`; founded date in `span.field-type-date`. Funding round rows render inside `section#funding-rounds` as table rows once the Angular component loads.

Paywalled fields are wrapped in elements with class `cb-paywall`. These nodes exist in the DOM but their text content is replaced with a blur overlay and a prompt to upgrade. Your CSS extractor will return empty strings for those selectors — not an error, just a paywall signal. Crunchbase also embeds JSON-LD `Organization` schema on some pages with `name` and `url` properties, but funding detail is almost never included in the structured data.
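
Because the JSON-LD block is part of the initial document, it can be read from the raw HTML without rendering. The following is a minimal sketch using only the Python standard library; it assumes the block is a single JSON object whose `@type` is `Organization`, and the class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_organization_jsonld(html: str) -> dict:
    """Return name and url from the first Organization JSON-LD block, or {}."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the scrape
        if isinstance(data, dict) and data.get("@type") == "Organization":
            return {"name": data.get("name"), "url": data.get("url")}
    return {}
```

An empty dict here only means the page carried no usable structured data; it says nothing about whether the CSS selectors will succeed.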

4. Paywalls, anti-bot detection, and rate limits

Crunchbase runs layered defenses. At the network layer, datacenter IP ranges are rate-limited aggressively on organization page requests — you will see 429s or silent redirects to the homepage within a small number of sequential requests from a single datacenter IP. Residential proxies reduce this friction significantly.

At the application layer, the Pro paywall blurs funding amounts and investor lists for unauthenticated or free-tier sessions. This is not bot detection — it is deliberate content gating. Attempting to bypass it by injecting session cookies from a paid account violates Crunchbase Terms and potentially computer fraud statutes depending on jurisdiction.

The discover search and export features require an active login session and are heavily rate-limited even for Pro users. Do not attempt to automate discover exports — scrape known organization permalinks from a seed list instead.

  • `cb-paywall` class overlays on funding amounts and investor details
  • Login required for discover search and CSV exports
  • Aggressive rate limits on organization pages from datacenter IPs
  • Angular SPA hydration required for most content sections
  • Frequent component class name changes breaking CSS selectors
  • Legal terms explicitly prohibiting scraping and automated collection
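
The two network-layer signals described above (429 responses and silent redirects to the homepage) can be detected before any parsing happens. This sketch assumes your client exposes the final URL after redirects and the status code; the function names, the 5-second base delay, and the 300-second cap are hypothetical choices, not documented limits.

```python
from urllib.parse import urlparse


def classify_response(requested_url: str, final_url: str, status: int) -> str:
    """Label a fetch as 'ok', 'rate_limited', or 'error'."""
    if status == 429:
        return "rate_limited"
    requested_path = urlparse(requested_url).path.rstrip("/")
    final_path = urlparse(final_url).path.rstrip("/")
    # Silent redirect: a deep page was requested but the homepage came back.
    if requested_path and final_path == "":
        return "rate_limited"
    if 200 <= status < 300:
        return "ok"
    return "error"


def backoff_seconds(consecutive_blocks: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * (2 ** max(0, consecutive_blocks - 1)))
```

Treating a homepage redirect as a rate-limit signal avoids storing homepage content as if it were an organization record.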

5. Scraping public organization fields with OmniScrape

Use `mode: "auto"` with a residential US proxy for organization pages. OmniScrape will attempt a fast HTTP request first and escalate to a headless browser if the page requires JavaScript rendering. For basic firmographic fields — name, description, location, categories, employee range — the initial HTTP response often contains enough rendered HTML to extract values without full JS execution.

Target only the fields that are freely visible on public pages. If a selector returns an empty string, check whether the field is behind a `cb-paywall` overlay before debugging your selector. The `enable_solver` flag activates OmniScrape's Web Unlocker to handle bot challenges that may appear on high-volume scraping sessions.

Crunchbase organization — public fields request
json
{
  "url": "https://www.crunchbase.com/organization/openai",
  "mode": "auto",
  "output_format": "css_extractor",
  "enable_solver": true,
  "proxy": "residential:us",
  "css_selectors": {
    "name": "h1.profile-name",
    "description": "span.description",
    "location": "span.field-type-address",
    "website": "a.component--field-formatter.field-type-link",
    "employees": "a.field-type-enum",
    "categories": "span.chip",
    "founded": "span.field-type-date",
    "operating_status": "a.field-type-enum[href*='operating_status']"
  }
}
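
When scraping from a seed list, the request body above can be generated per slug rather than hand-edited. This sketch only builds the payload; it deliberately does not show how to send it, because the endpoint and authentication are not covered in this guide. `build_org_request` and `PUBLIC_SELECTORS` are hypothetical names, and the selectors are a subset of those in the request above.

```python
# A subset of the selectors from the request above; expect them to drift.
PUBLIC_SELECTORS = {
    "name": "h1.profile-name",
    "description": "span.description",
    "location": "span.field-type-address",
    "website": "a.component--field-formatter.field-type-link",
    "employees": "a.field-type-enum",
    "categories": "span.chip",
    "founded": "span.field-type-date",
}


def build_org_request(slug: str) -> dict:
    """Build the public-fields request body for one organization slug."""
    if not slug or "/" in slug:
        raise ValueError(f"not a bare organization slug: {slug!r}")
    return {
        "url": f"https://www.crunchbase.com/organization/{slug}",
        "mode": "auto",
        "output_format": "css_extractor",
        "enable_solver": True,
        "proxy": "residential:us",
        # Copy so callers can add selectors without mutating the shared dict.
        "css_selectors": dict(PUBLIC_SELECTORS),
    }
```

Serializing the result with `json.dumps(build_org_request("openai"))` yields a body in the same shape as the example request.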

6. Extracting the funding rounds section

The funding rounds section is rendered by an Angular component that loads asynchronously after the initial page shell. Use `mode: "js_rendering"` with `js_wait_selector` pointing to `section#funding-rounds` so OmniScrape waits for the component to hydrate before extracting. Set `js_wait_timeout` to at least 10 seconds, since Crunchbase's Angular bootstrap is slow on cold loads; the request below passes `12000`, which suggests the value is given in milliseconds.

Round rows that are not paywalled will contain the funding date, round type label, and a link to the round detail page. The amount and lead investor name may be empty strings if the session is not authenticated to Pro. Extract investor links by targeting anchor tags with `href` containing `/organization/` inside the funding section — these point to investor organization pages you can follow-scrape.

Store the round detail URL (e.g., `/funding_round/stripe-series-h--abc123`) as a stable foreign key. Re-scraping the parent organization page will re-surface the same round — the round URL is your deduplication handle.

Crunchbase funding rounds section request
json
{
  "url": "https://www.crunchbase.com/organization/anthropic",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "enable_solver": true,
  "proxy": "residential:us",
  "js_wait_selector": "section#funding-rounds",
  "js_wait_timeout": 12000,
  "css_selectors": {
    "round_dates": "section#funding-rounds span.field-type-date",
    "round_types": "section#funding-rounds a[href*='funding_round']",
    "round_amounts": "section#funding-rounds span.field-type-money",
    "investors": "section#funding-rounds a[href*='/organization/']",
    "total_funding": "span[data-test='funding-total']"
  }
}
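
Using the round URL as a deduplication handle can be sketched as follows. The code assumes the `round_types` selector returns link hrefs, either relative or absolute; `round_key` and `merge_rounds` are hypothetical helpers, and the in-memory dict stands in for whatever store you use.

```python
from urllib.parse import urlparse


def round_key(href: str) -> str:
    """Reduce a round link to its slug, e.g. 'stripe-series-h--abc123'."""
    parts = [p for p in urlparse(href).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "funding_round":
        return parts[1]
    return ""  # not a funding round link (e.g. an investor organization link)


def merge_rounds(known: dict, scraped_hrefs: list, scraped_at: str) -> list:
    """Add unseen rounds to `known` (keyed by slug) and return the new slugs."""
    new_keys = []
    for href in scraped_hrefs:
        key = round_key(href)
        if not key or key in known:
            continue
        known[key] = {"round_url": f"/funding_round/{key}", "first_seen": scraped_at}
        new_keys.append(key)
    return new_keys
```

Re-scraping the same organization then yields an empty list of new slugs unless a new round has appeared.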

7. Crunchbase Enterprise API and licensed data access

Crunchbase sells licensed API access and bulk data exports through its Enterprise tier. If you are building a product that surfaces Crunchbase funding data to end users — a CRM enrichment tool, an investor intelligence platform, a sales prospecting product — you almost certainly need a license rather than a scraper. Scraping free public fields and reselling compiled funding datasets competes directly with Crunchbase's core business and carries significant legal exposure.

The Enterprise API returns structured JSON with full funding detail, investor relationships, and historical round data. It is rate-limited but documented, and the data model is stable compared to CSS selectors that break whenever Crunchbase ships an Angular component update. For internal research use cases — a VC analyst running one-off lookups, a journalist verifying a funding claim — scraping publicly visible fields with counsel sign-off is a different risk profile than a commercial data product.

Evaluate the build-versus-buy decision honestly: the engineering cost of maintaining Crunchbase CSS selectors against frequent DOM changes, plus residential proxy costs, plus legal review, often exceeds the Enterprise API cost for production workloads.

8. Using permalinks and UUIDs as primary keys

Store the organization permalink slug — the human-readable portion of the URL like `openai` or `anthropic` — as your primary key for company records. This slug is stable across most rebrands and is the canonical identifier Crunchbase uses in all cross-links between organizations, funding rounds, and people.

For higher durability, extract the UUID from the embedded JSON in the page source. Crunchbase embeds a JSON blob in a `<script>` tag containing the organization's UUID, which persists even if the slug changes after an acquisition or rebrand. Parse this with a regex or JSON path extractor from the raw HTML response (`body.data.content`) before running CSS extraction.
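
The regex route mentioned above might look like the sketch below. The exact layout of the embedded JSON is not documented here, so the code assumes the value sits under a key literally named `"uuid"` inside some `<script>` tag; adjust that key after inspecting a real page. `extract_org_uuid` is a hypothetical helper.

```python
import re
from typing import Optional

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# ASSUMPTION: the blob stores the identifier under a key named "uuid".
UUID_FIELD_RE = re.compile(r'"uuid"\s*:\s*"(' + UUID_PATTERN + r')"', re.IGNORECASE)
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)


def extract_org_uuid(raw_html: str) -> Optional[str]:
    """Return the first UUID found under a "uuid" key in any script tag."""
    for script_body in SCRIPT_RE.findall(raw_html):
        match = UUID_FIELD_RE.search(script_body)
        if match:
            return match.group(1).lower()
    return None
```

Taking the first match is a simplification: if the blob also lists UUIDs for related entities, a JSON path extractor that targets the organization node would be more reliable.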

Model funding events as separate rows keyed by the round URL slug. A single organization scrape may surface multiple rounds — store each as an independent record with the parent organization permalink as a foreign key. This lets you incrementally update round records without re-processing the full organization history on every scrape cycle.
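
The two-table model described above can be sketched with SQLite from the Python standard library. Table and column names are illustrative, and the upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS organizations (
    permalink  TEXT PRIMARY KEY,   -- slug, e.g. 'openai'
    uuid       TEXT UNIQUE,        -- survives slug changes when available
    name       TEXT,
    scraped_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS funding_rounds (
    round_slug    TEXT PRIMARY KEY,
    org_permalink TEXT NOT NULL REFERENCES organizations(permalink),
    round_type    TEXT,
    announced_on  TEXT,
    amount        TEXT,            -- NULL when paywalled
    scraped_at    TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def upsert_round(conn, round_slug, org_permalink, round_type, announced_on, amount, scraped_at):
    """Insert a round, or refresh it if the slug was seen on an earlier scrape."""
    conn.execute(
        """
        INSERT INTO funding_rounds
            (round_slug, org_permalink, round_type, announced_on, amount, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(round_slug) DO UPDATE SET
            round_type   = excluded.round_type,
            announced_on = excluded.announced_on,
            amount       = COALESCE(excluded.amount, funding_rounds.amount),
            scraped_at   = excluded.scraped_at
        """,
        (round_slug, org_permalink, round_type, announced_on, amount, scraped_at),
    )
```

The `COALESCE` keeps a previously stored amount when a later scrape returns a paywalled (NULL) value, so an incremental update never erases data you already hold.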

9. Crunchbase Terms of Service and legal considerations

Crunchbase's Terms of Service explicitly prohibit automated scraping, crawling, and data collection. Section 4 of their Terms restricts use of robots, spiders, or automated tools to access the service. Paywall bypass — whether by injecting Pro session cookies, intercepting API calls, or circumventing the `cb-paywall` overlay — constitutes unauthorized access to paid content and may violate the Computer Fraud and Abuse Act in the US and equivalent statutes in other jurisdictions.

OmniScrape provides the technical capability to make HTTP and browser-rendered requests to publicly accessible URLs. It does not grant any rights to the data returned by those requests. The legality of collecting, storing, and using Crunchbase data depends on your jurisdiction, your use case, and whether the data is publicly visible without authentication. Get legal counsel before building a commercial product on scraped Crunchbase data.

For publicly visible fields collected at low volume for internal research — verifying a funding claim, enriching a small prospect list — the risk profile is different from bulk collection and redistribution. Document your use case, respect robots.txt, use rate limiting, and do not attempt to access paywalled content.

Frequently asked questions

Why does my Crunchbase scraper return empty funding amounts?

Funding amounts on Crunchbase are paywalled behind Pro for most organizations on free views. The DOM node exists but its text content is replaced with a blur overlay: your CSS selector is correct, but the value is intentionally hidden. You will see the same empty result whether you view the page in a regular browser or render it in a headless one. The only legitimate way to access the full amount is through a Pro account or the Enterprise API.

Do I need js_rendering mode for Crunchbase organization pages?

It depends on which fields you need. Basic firmographic fields (name, description, location, categories, employee range) are often present in the initial server-rendered HTML shell and can be extracted with `mode: "auto"` without full JS execution. The funding rounds section, people cards, and acquisition history require Angular hydration and need `js_rendering` with `js_wait_selector` set to the relevant section ID. Use `auto` first and check what comes back before defaulting to `js_rendering` for every request.

Can I scrape Crunchbase discover search results?

Discover search requires an active login session and is heavily rate-limited even for authenticated Pro users. Automated access to discover search and CSV exports is explicitly restricted by Crunchbase Terms. The practical alternative is to build a seed list of organization permalinks from external sources — press releases, news mentions, LinkedIn company pages — and scrape each permalink directly rather than trying to replicate discover search programmatically.

How often do Crunchbase CSS selectors break?

Frequently. Crunchbase ships Angular component updates that change class names and DOM structure without notice. Selectors like `h1.profile-name` and `section#funding-rounds` have been relatively stable, but attribute-based selectors and deeply nested class chains break regularly. Build your pipeline with fallback selectors, monitor extraction success rates, and alert on empty results that were previously populated. Expect to update selectors several times per year.
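
Fallback selectors and "previously populated, now empty" alerts can be sketched as two small helpers. The code assumes you request each field under several result keys (for example `name` and a hypothetical `name_alt`) and keep the last successful result for comparison; both function names are illustrative.

```python
def first_non_empty(extracted: dict, candidates: list) -> str:
    """Return the first non-blank value among candidate keys, in priority order."""
    for key in candidates:
        value = extracted.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""


def regressed_fields(previous: dict, current: dict) -> list:
    """Fields that had a value on the last scrape but came back empty this time."""
    return sorted(k for k, v in previous.items() if v and not current.get(k))
```

A field that was always empty (such as a paywalled amount) never appears in `regressed_fields`, so the alert fires only on likely selector breakage.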

Is Crunchbase data public domain?

No. Crunchbase aggregates, cleans, and licenses funding data. Even if individual data points like a funding announcement are public facts, Crunchbase's compiled database may be protected as a compilation under copyright or database-rights law, depending on the jurisdiction. Scraping and redistributing Crunchbase data commercially (as part of a data product, API, or enrichment service) carries high legal risk regardless of whether the underlying facts are public.

What proxy type should I use for Crunchbase?

Residential US proxies. Crunchbase rate-limits datacenter IP ranges aggressively on organization page requests: you will see 429 responses or silent redirects within a small number of sequential requests from a datacenter IP. Residential proxies rotate through real ISP addresses and significantly reduce rate-limiting friction. Set `proxy: "residential:us"` in your OmniScrape request and keep request cadence low: one request per organization every few seconds rather than parallel bursts.
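
A low, steady cadence can be enforced with a small throttle placed in front of each request. This is a sketch only: the 3-second default is a hypothetical stand-in for "every few seconds", and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap between calls to wait()."""

    def __init__(self, min_interval: float = 3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self) -> float:
        """Block until the interval has elapsed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Call `throttle.wait()` immediately before each organization request in a sequential loop; running several loops in parallel would defeat the purpose.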

How should I model Crunchbase data in my database?

Use the organization permalink slug as the primary key for company records. Store funding rounds as separate rows keyed by the round URL slug with the organization permalink as a foreign key. Extract and store the UUID from the embedded page JSON as a secondary identifier, since it survives slug changes after acquisitions or rebrands. Track a `scraped_at` timestamp on every record so you can identify stale data and prioritize re-scrape cycles for high-value organizations.
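
Using `scraped_at` to prioritize re-scrapes might look like the sketch below. It assumes records are dicts with an ISO 8601 `scraped_at` that includes a UTC offset; the `high_value` flag and the 30-day and 7-day thresholds are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def rescrape_queue(records, now,
                   default_max_age=timedelta(days=30),
                   priority_max_age=timedelta(days=7)):
    """Return permalinks whose data is past its allowed age, most overdue first."""
    due = []
    for rec in records:
        max_age = priority_max_age if rec.get("high_value") else default_max_age
        age = now - datetime.fromisoformat(rec["scraped_at"])
        if age > max_age:
            due.append((age - max_age, rec["permalink"]))
    return [permalink for _, permalink in sorted(due, reverse=True)]
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`, so the subtraction against stored timestamps is well defined.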

Related guides

  • Lead Generation Web Scraping: Compliant Inbound Enrichment for Sales Teams
  • Market Research Web Scraping: Multi-Geo Data Collection for Research Firms
  • LinkedIn Scraper: Companies, Jobs, and Public Profiles
  • Web Scraping Without Getting Blocked

© 2026 OmniScrape. All rights reserved.
