
Cheerio Web Scraping: A Practical Guide

Cheerio brings jQuery-style selector syntax to Node.js. It is fast, familiar to frontend developers, and far lighter than JSDOM. It parses static HTML strings and does not execute client-side JavaScript, which is fine when OmniScrape has already returned the post-challenge, post-render HTML.

This guide covers two patterns. Pattern A: fetch HTML from OmniScrape, then call cheerio.load(html). Pattern B: connect to the BaaS with puppeteer.connect, then call cheerio.load on page.content(). See the Puppeteer guide for BaaS details and OmniScrape vs Apify for Crawlee integration.

On this page

  1. When to use Cheerio
  2. Where Cheerio breaks
  3. Pattern A: fetch + Cheerio
  4. Validate with Zod
  5. Skip Cheerio when possible
  6. Parsing repeating lists
  7. Pattern B: BaaS then Cheerio
  8. Worker threads for CPU parse
  9. Next.js API route note
  10. Checklist
  11. FAQ

1. When to use Cheerio

Cheerio fits Next.js API routes, Express microservices, and Lambda functions that need to parse HTML without paying a headless Chrome cold start. It also suits frontend engineers who want to write scrape code without learning XPath.

  • jQuery-style .find() and .each() traversal
  • Medium-sized documents, where it parses faster than JSDOM
  • CPU-heavy parse batches offloaded to worker threads
  • Apify/Crawlee handlers that run after an OmniScrape fetch
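
As a small illustration of the jQuery-style traversal listed above, the sketch below loads a hard-coded HTML string and walks it with .find() and .each(). The markup, the class names, and the readSpecs helper are invented for this example and are not OmniScrape output.

```typescript
import * as cheerio from 'cheerio';

// Hypothetical markup standing in for HTML returned by the fetch layer.
const sampleHtml = `
  <ul id="specs">
    <li class="spec"><span class="k">Color</span><span class="v">Red</span></li>
    <li class="spec"><span class="k">Size</span><span class="v">M</span></li>
  </ul>`;

// Collects key/value pairs from each .spec row into a plain object.
function readSpecs(html: string): Record<string, string> {
  const $ = cheerio.load(html);
  const specs: Record<string, string> = {};
  $('#specs').find('.spec').each((_, el) => {
    const row = $(el);
    specs[row.find('.k').text().trim()] = row.find('.v').text().trim();
  });
  return specs;
}

console.log(readSpecs(sampleHtml)); // { Color: 'Red', Size: 'M' }
```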

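2. Where Cheerio breaks

Calling cheerio.load on an empty SPA shell gives you an empty .price. The fetch layer must render JavaScript first, using OmniScrape js_rendering or js_wait_selector.

Cheerio's .text() does not follow the browser's innerText semantics: it concatenates text nodes much like textContent and ignores CSS, so hidden text is included and whitespace is not normalized. Test against real OmniScrape HTML samples. Cheerio also has no form submission or click simulation.

  • No client-side JavaScript execution
  • Malformed HTML can parse into a different tree than Chrome's DOM
  • Selector misses are silent until QA catches them
  • A direct fetch to a protected site fails before Cheerio ever runs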
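
To make the silent-miss problem concrete, here is a minimal sketch run against a hypothetical unrendered SPA shell. The requireText helper is an assumption of this example, not part of Cheerio; it only wraps the standard .first(), .length, and .text() calls so that a missing node raises an error instead of returning an empty string.

```typescript
import * as cheerio from 'cheerio';

// Hypothetical unrendered SPA shell: the price node does not exist yet.
const shellHtml = '<html><body><div id="root"></div></body></html>';

// Returns trimmed text for the first match, or throws when the selector
// matched nothing or matched only whitespace.
function requireText($: cheerio.CheerioAPI, selector: string): string {
  const match = $(selector).first();
  if (match.length === 0) {
    throw new Error(`selector matched nothing: ${selector}`);
  }
  const text = match.text().trim();
  if (!text) {
    throw new Error(`selector matched empty text: ${selector}`);
  }
  return text;
}

const $ = cheerio.load(shellHtml);
// A plain lookup on a missing node yields an empty string and no error.
console.log(JSON.stringify($('.price').text())); // ""
```

Calling requireText($, '.price') on the same shell throws, which turns a quiet data-quality problem into a visible failure at scrape time.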

3. Pattern A: fetch + Cheerio

Check json.success and confirm the extracted fields are non-empty before returning, because Cheerio does not raise an error when a selector matches nothing.

pattern-a.ts
typescript
import fetch from 'node-fetch';
import * as cheerio from 'cheerio';

const API_KEY = process.env.OMNISCRAPE_KEY!;

async function scrapePdp(url: string) {
  // Ask OmniScrape for rendered HTML, waiting until the price node exists.
  const res = await fetch('https://api.omniscrape.io/v1/scrape', {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url,
      mode: 'auto',
      output_format: 'html',
      js_wait_selector: '.price',
      js_wait_timeout: 8000,
    }),
  });
  // Typed loosely here; tighten this to your actual response shape.
  const json = (await res.json()) as any;
  if (!json.success) throw new Error(JSON.stringify(json));

  // Parse the returned HTML and read the first match for each field.
  const $ = cheerio.load(json.data.content);
  const price = $('.price').first().text().trim();
  const title = $('h1').first().text().trim();
  // A missed selector yields an empty string, so fail loudly here.
  if (!price || !title) {
    throw new Error(`empty fields for ${url}`);
  }
  return {
    title,
    price,
    mode: json.metadata.method_used,
    cost: json.billing.charged,
  };
}

4. Validate with Zod

Runtime validation catches price-format drift before bad rows reach your database.

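validate.ts
typescript
import { z } from 'zod';

// Schema for one product row; the regex expects a dollar-prefixed amount.
const Product = z.object({
  title: z.string().min(1),
  price: z.string().regex(/\$[\d,.]+/),
});

// title and price are the strings extracted with Cheerio in Pattern A.
const raw = { title, price };
// parse() throws a ZodError if either field fails validation.
const parsed = Product.parse(raw);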
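
When validating a batch, throwing on the first bad row may not be what you want. The sketch below reuses the same schema with Zod's safeParse to separate valid rows from rejected ones. The partitionRows helper and the sample rows are assumptions made for illustration.

```typescript
import { z } from 'zod';

const Product = z.object({
  title: z.string().min(1),
  price: z.string().regex(/\$[\d,.]+/),
});

type ProductRow = z.infer<typeof Product>;

// Splits rows into valid and rejected instead of throwing on the first bad one.
function partitionRows(rows: unknown[]): { valid: ProductRow[]; rejected: unknown[] } {
  const valid: ProductRow[] = [];
  const rejected: unknown[] = [];
  for (const row of rows) {
    const result = Product.safeParse(row);
    if (result.success) valid.push(result.data);
    else rejected.push(row);
  }
  return { valid, rejected };
}

const { valid, rejected } = partitionRows([
  { title: 'Mug', price: '$12.50' },
  { title: 'Mug', price: '12,50 EUR' }, // hypothetical drifted format
]);
console.log(valid.length, rejected.length); // 1 1
```

Rejected rows can then be logged or archived for review while the valid ones continue to the database.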

5. Skip Cheerio when possible

Flat product detail page (PDP) fields are cheaper to maintain as css_extractor JSON than as Cheerio selectors.

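css_extractor path
javascript
// Same endpoint as Pattern A; url and API_KEY are defined as in that example.
const API = 'https://api.omniscrape.io/v1/scrape';

const res = await fetch(API, {
  method: 'POST',
  headers: { 'X-API-Key': API_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url,
    mode: 'auto',
    output_format: 'css_extractor',
    // Map each output field name to the CSS selector that locates it.
    css_selectors: { title: 'h1', price: '.price' },
  }),
});
// The extracted fields come back under data.css_extracted.
const { css_extracted } = (await res.json()).data;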

6. Parsing repeating lists

Use .each on repeating cards, because css_extractor does not map cleanly to unbounded lists.

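list extract
typescript
// $ is the result of cheerio.load(html), as in Pattern A.
const items: { title: string; price: string }[] = [];
$('.product-card').each((_, el) => {
  // Scope lookups to the current card so fields from other cards do not leak in.
  const card = $(el);
  items.push({
    title: card.find('h2').text().trim(),
    price: card.find('.price').text().trim(),
  });
});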
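
A self-contained variant of the same loop, run against invented listing markup, shows one way to handle cards with missing fields. The parseCards helper, the sample HTML, and the choice to count incomplete cards rather than store blanks are all assumptions of this sketch.

```typescript
import * as cheerio from 'cheerio';

interface Item {
  title: string;
  price: string;
}

// Hypothetical listing markup; the third card has no price.
const listHtml = `
  <div class="product-card"><h2>Kettle</h2><span class="price">$30</span></div>
  <div class="product-card"><h2>Toaster</h2><span class="price">$45</span></div>
  <div class="product-card"><h2>Blender</h2></div>`;

function parseCards(html: string): { items: Item[]; skipped: number } {
  const $ = cheerio.load(html);
  const items: Item[] = [];
  let skipped = 0;
  $('.product-card').each((_, el) => {
    const card = $(el);
    const title = card.find('h2').first().text().trim();
    const price = card.find('.price').first().text().trim();
    if (title && price) items.push({ title, price });
    else skipped += 1; // count incomplete cards instead of storing blanks
  });
  return { items, skipped };
}

const result = parseCards(listHtml);
console.log(result.items.length, result.skipped); // 2 1
```

Tracking the skipped count gives an early signal when a layout change starts dropping fields across the whole list.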

7. Pattern B: BaaS then Cheerio

After paging through an infinite-scroll or load-more listing in the BaaS browser, call cheerio.load on page.content() and reuse the same selectors as Pattern A.

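infinite scroll → cheerio
javascript
import puppeteer from 'puppeteer';
import * as cheerio from 'cheerio';

// Connect to the remote BaaS browser instead of launching Chrome locally.
const browser = await puppeteer.connect({
  browserWSEndpoint:
    `wss://browser.omniscrape.io?apikey=${process.env.OMNISCRAPE_KEY}&render_media=false`,
});
let html;
try {
  const page = await browser.newPage();
  await page.goto('https://protected.example/search');
  for (let i = 0; i < 5; i++) {
    const before = await page.$$eval('.product-card', (els) => els.length);
    await page.click('#load-more');
    // Wait for new cards to be appended. A bare waitForSelector would
    // resolve immediately because cards are already on the page.
    await page.waitForFunction(
      (n) => document.querySelectorAll('.product-card').length > n,
      {},
      before,
    );
  }
  html = await page.content();
} finally {
  // Release the remote session even if a wait times out.
  await browser.disconnect();
}

// Hand the final DOM snapshot to Cheerio for extraction.
const $ = cheerio.load(html);
const titles = $('.product-card h2')
  .map((_, el) => $(el).text().trim())
  .get();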

8. Worker threads for CPU parse

For large HTML batches, keep fetching asynchronous on the main thread and run cheerio.load in a worker_threads pool so that parsing does not block the event loop.
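
The idea can be sketched with Node's built-in worker_threads module. To stay short, this version spawns one worker per document rather than maintaining a true pool, so treat it as a starting point: a reusable pool would amortize worker startup cost. The parseInWorker helper, the inline worker source, and the h1 and .price selectors are assumptions of this example.

```typescript
import { Worker } from 'node:worker_threads';

// Worker body as a CommonJS string so the sketch fits in one file.
// In a real project this would normally live in its own module.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const cheerio = require('cheerio');
  const $ = cheerio.load(workerData.html);
  parentPort.postMessage({
    title: $('h1').first().text().trim(),
    price: $('.price').first().text().trim(),
  });
`;

interface ParsedPage {
  title: string;
  price: string;
}

// Parses one HTML document off the main thread and resolves with its fields.
function parseInWorker(html: string): Promise<ParsedPage> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { html } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Usage: the main thread stays free to keep fetching while this parses.
parseInWorker('<h1>Kettle</h1><span class="price">$30</span>').then((page) => {
  console.log(page);
});
```

Passing the HTML through workerData copies the string into the worker, which is usually acceptable for page-sized documents but worth measuring on very large batches.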

9. Next.js API route note

Run the scrape server-side only, and never expose OMNISCRAPE_KEY to client components. The API route calls OmniScrape and returns JSON to the frontend.
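
A rough sketch of such a route follows, assuming the Next.js App Router and a hypothetical file at app/api/scrape/route.ts. It uses only the standard Request and Response objects. The endpoint, headers, and response fields mirror Pattern A, while the input validation, status codes, and the jsonResponse and isHttpUrl helpers are assumptions of this example.

```typescript
import * as cheerio from 'cheerio';

// Assumed file location: app/api/scrape/route.ts (Next.js App Router).
const OMNISCRAPE_URL = 'https://api.omniscrape.io/v1/scrape';

function jsonResponse(payload: unknown, status = 200): Response {
  return new Response(JSON.stringify(payload), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}

function isHttpUrl(value: unknown): value is string {
  if (typeof value !== 'string') return false;
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false;
  }
}

export async function POST(request: Request): Promise<Response> {
  const body = (await request.json().catch(() => null)) as { url?: unknown } | null;
  const url = body?.url;
  if (!isHttpUrl(url)) {
    return jsonResponse({ error: 'url must be an http(s) URL' }, 400);
  }
  // The key is read on the server and never sent to the browser.
  const apiKey = process.env.OMNISCRAPE_KEY;
  if (!apiKey) {
    return jsonResponse({ error: 'server is missing OMNISCRAPE_KEY' }, 500);
  }
  const res = await fetch(OMNISCRAPE_URL, {
    method: 'POST',
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, mode: 'auto', output_format: 'html' }),
  });
  const json = (await res.json()) as { success?: boolean; data?: { content: string } };
  if (!json.success || !json.data) {
    return jsonResponse({ error: 'scrape failed' }, 502);
  }
  const $ = cheerio.load(json.data.content);
  // Only the extracted fields go back to the frontend.
  return jsonResponse({
    title: $('h1').first().text().trim(),
    price: $('.price').first().text().trim(),
  });
}
```

The frontend then calls this route with a JSON body containing the target url and never sees the API key.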

10. Checklist

Cheerio is not a crawler, so bring your own URL queue.

  • Verify json.success before cheerio.load
  • Trim text and validate non-empty
  • Log metadata.method_used for cost
  • Archive HTML on selector failures
  • Prefer css_extractor for flat PDP fields
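
The "archive HTML on selector failures" item can be as simple as writing the raw document to disk. The sketch below uses only Node built-ins. The archiveHtml helper, the failed-html directory name, and the file naming scheme are assumptions chosen for illustration.

```typescript
import { createHash } from 'node:crypto';
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Writes the raw HTML to disk under a name derived from the URL and a
// timestamp, so a failed selector can be replayed against the exact document.
function archiveHtml(url: string, html: string, dir = 'failed-html'): string {
  mkdirSync(dir, { recursive: true });
  const digest = createHash('sha256').update(url).digest('hex').slice(0, 16);
  const file = join(dir, `${digest}-${Date.now()}.html`);
  writeFileSync(file, html, 'utf8');
  return file;
}

// Usage: archive into the OS temp directory and read the file back.
const saved = archiveHtml(
  'https://example.com/p/1',
  '<html></html>',
  join(tmpdir(), 'failed-html'),
);
console.log(readFileSync(saved, 'utf8').length); // 13
```

Hashing the URL keeps file names short and filesystem-safe while still grouping repeated failures for the same page.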

Frequently asked questions

Cheerio vs JSDOM?

Cheerio is faster and smaller. JSDOM is closer to a browser DOM, which is usually unnecessary once OmniScrape has rendered the page.

Cheerio in Apify Actor?

Yes. Call OmniScrape inside requestHandler, then pass the returned HTML to cheerio.load. See the Apify comparison guide.

TypeScript types for cheerio?

Cheerio 1.x ships its own type declarations, while older 0.x releases need the separate @types/cheerio package. Type your extracted row interfaces explicitly.

Why .text() vs .attr()?

Prices stored in data-price attributes need .attr('data-price') rather than .text(). Inspect an OmniScrape HTML sample in DevTools to see where the value lives.
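
A short sketch of that choice, using invented markup where the clean value sits in a data-price attribute and the visible text is decorated. The readPrice helper and its fallback order are assumptions of this example.

```typescript
import * as cheerio from 'cheerio';

// Hypothetical markup: the machine-readable price lives in an attribute.
const priceHtml = '<span class="price" data-price="1299.99">From $1,299.99*</span>';

function readPrice($: cheerio.CheerioAPI, selector: string): string | undefined {
  const el = $(selector).first();
  // Prefer the attribute when present; fall back to the visible text.
  return el.attr('data-price') ?? (el.text().trim() || undefined);
}

const $ = cheerio.load(priceHtml);
console.log(readPrice($, '.price')); // 1299.99
```

When the attribute is absent the helper returns the trimmed text, and when the selector matches nothing it returns undefined so the caller can treat that as a failure.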

ESM import cheerio?

Use import * as cheerio from 'cheerio' on cheerio 1.x, and match the module setting in your package.json.

Related guides

  • Puppeteer Web Scraping: Patterns, Anti-Bot Limits, and BaaS Integration
  • OmniScrape vs Apify
  • Web Scraping API: Endpoint, Modes, Output Formats & Integration Patterns
