1. When to use Cheerio
Cheerio fits Next.js API routes, Express microservices, and Lambda functions that parse HTML and cannot afford a headless Chrome cold start. It also lets frontend engineers write scraping code with familiar CSS selectors instead of learning XPath.
- jQuery-style .find() and .each()
- Medium-sized documents, where it parses faster than JSDOM
- Worker threads for CPU-heavy parse batches
- Apify/Crawlee handlers, after an OmniScrape fetch
2. Where Cheerio breaks
Calling cheerio.load on an empty SPA shell yields an empty .price, because Cheerio never runs the page's JavaScript. The fetch layer must render JS first, via OmniScrape js_rendering or js_wait_selector.
Cheerio's .text() does not follow browser innerText semantics, so test selectors against real OmniScrape HTML samples. Cheerio also cannot submit forms or simulate clicks.
- No client JS execution
- Malformed HTML may parse into a different tree than Chrome's DOM
- Selector misses stay silent until QA catches them
- Direct fetch to protected sites fails before Cheerio runs
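The silent-miss and text-semantics points above can be reproduced in a few lines. This is a minimal sketch, not OmniScrape output: the HTML strings are made up for illustration, and requireText is a hypothetical helper name.

```typescript
import * as cheerio from 'cheerio';

// Un-rendered SPA shell: the price would be injected later by client-side JS.
const shellHtml = '<html><body><div id="root"></div></body></html>';
// Illustrative markup with a visually hidden child inside the price element.
const hiddenHtml =
  '<p class="price">$24.99<span style="display:none"> was $30</span></p>';
// Illustrative markup for the same page after rendering.
const renderedHtml =
  '<div id="root"><h1>Desk Lamp</h1><span class="price"> $24.99 </span></div>';

// Hypothetical helper: turn a silent selector miss into a loud error.
function requireText($: cheerio.CheerioAPI, selector: string): string {
  const text = $(selector).first().text().trim();
  if (!text) throw new Error(`selector "${selector}" matched no text`);
  return text;
}

// 1. A miss returns an empty string and raises nothing.
const shell = cheerio.load(shellHtml);
const shellPrice = shell('.price').text();

// 2. .text() ignores CSS, so hidden text is included in the result.
const hiddenText = cheerio.load(hiddenHtml)('.price').text();

// 3. With rendered HTML the helper returns the trimmed value.
const price = requireText(cheerio.load(renderedHtml), '.price');
```

Calling requireText(shell, '.price') throws, which is the behavior you want in a pipeline: the failure surfaces at extraction time instead of as blank rows downstream.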
3. Pattern A: fetch + Cheerio
Check json.success and confirm the extracted fields are non-empty before returning, because Cheerio does not raise an error when a selector matches nothing.
import fetch from 'node-fetch';
import * as cheerio from 'cheerio';

// Read the key from the environment; never hard-code it.
const API_KEY = process.env.OMNISCRAPE_KEY!;

async function scrapePdp(url: string) {
  // Request HTML from OmniScrape, waiting for the .price selector.
  const res = await fetch('https://api.omniscrape.io/v1/scrape', {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url,
      mode: 'auto',
      output_format: 'html',
      js_wait_selector: '.price',
      js_wait_timeout: 8000,
    }),
  });

  const json = await res.json();
  // Fail loudly on an API-level error before parsing anything.
  if (!json.success) throw new Error(JSON.stringify(json));

  const $ = cheerio.load(json.data.content);
  const price = $('.price').first().text().trim();
  const title = $('h1').first().text().trim();

  // A bad selector yields '' rather than an exception, so check explicitly.
  if (!price || !title) {
    throw new Error(`empty fields for ${url}`);
  }

  return {
    title,
    price,
    mode: json.metadata.method_used,
    cost: json.billing.charged,
  };
}
4. Validate with Zod
Runtime validation catches price-format drift before bad rows reach your database.
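import { z } from 'zod';

const Product = z.object({
  title: z.string().min(1),
  // Unanchored: matches a dollar amount anywhere in the string.
  price: z.string().regex(/\$[\d,.]+/),
});

// title and price are the strings extracted with Cheerio in Pattern A.
const raw = { title, price };
// parse() throws a ZodError if either field fails validation.
const parsed = Product.parse(raw);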
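Product.parse throws on the first bad row, which is awkward when validating a batch. The sketch below uses Zod's safeParse to separate valid rows from rejected ones so that drift can be logged rather than crash the run. The sample rows and the splitRows helper are illustrative assumptions, not part of the original pipeline.

```typescript
import { z } from 'zod';

const Product = z.object({
  title: z.string().min(1),
  price: z.string().regex(/\$[\d,.]+/),
});
type Product = z.infer<typeof Product>;

// Hypothetical helper: validate every row and keep the reasons for each rejection.
function splitRows(rows: unknown[]) {
  const valid: Product[] = [];
  const rejected: { row: unknown; issues: string[] }[] = [];
  for (const row of rows) {
    const result = Product.safeParse(row);
    if (result.success) {
      valid.push(result.data);
    } else {
      rejected.push({
        row,
        issues: result.error.issues.map((i) => `${i.path.join('.')}: ${i.message}`),
      });
    }
  }
  return { valid, rejected };
}

// Made-up sample rows: one good, one with a drifted price format, one with an empty title.
const { valid, rejected } = splitRows([
  { title: 'Desk Lamp', price: '$24.99' },
  { title: 'Desk Lamp', price: '24,99 EUR' },
  { title: '', price: '$5.00' },
]);
```

Only the rows in valid should be written to the database; rejected carries enough detail to spot which selector or format changed.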
5. Skip Cheerio when possible
For flat PDP (product detail page) fields, a css_extractor JSON request is cheaper to maintain than a set of Cheerio selectors.
// Same endpoint and API_KEY as in Pattern A.
const API = 'https://api.omniscrape.io/v1/scrape';

const res = await fetch(API, {
  method: 'POST',
  headers: { 'X-API-Key': API_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url,
    mode: 'auto',
    output_format: 'css_extractor',
    // Map each output field to the CSS selector that locates it.
    css_selectors: { title: 'h1', price: '.price' },
  }),
});

const { css_extracted } = (await res.json()).data;
6. Parsing repeating lists
Use .each() on repeating cards, since css_extractor does not map cleanly onto lists of unbounded length.
const items: { title: string; price: string }[] = [];

// Scope each lookup to its own card so fields from different cards never mix.
$('.product-card').each((_, el) => {
  const card = $(el);
  items.push({
    title: card.find('h2').text().trim(),
    price: card.find('.price').text().trim(),
  });
});
7. Pattern B: BaaS then Cheerio
After driving infinite scroll through the BaaS (browser-as-a-service) endpoint, pass page.content() to cheerio.load and reuse the same selectors as Pattern A.
import puppeteer from 'puppeteer';
import * as cheerio from 'cheerio';

const browser = await puppeteer.connect({
  browserWSEndpoint:
    `wss://browser.omniscrape.io?apikey=${process.env.OMNISCRAPE_KEY}&render_media=false`,
});
const page = await browser.newPage();
await page.goto('https://protected.example/search');

for (let i = 0; i < 5; i++) {
  // Cards already on the page would satisfy waitForSelector immediately,
  // so wait for the card count to grow after each click instead.
  const before = await page.$$eval('.product-card', (els) => els.length);
  await page.click('#load-more');
  await page.waitForFunction(
    (n) => document.querySelectorAll('.product-card').length > n,
    {},
    before,
  );
}

const html = await page.content();
await browser.disconnect();

// From here on, parsing is identical to Pattern A.
const $ = cheerio.load(html);
const titles = $('.product-card h2')
  .map((_, el) => $(el).text().trim())
  .get();
8. Worker threads for CPU parse
For large HTML batches, keep the fetches asynchronous on the main thread and run cheerio.load in a worker_threads pool so that parsing does not block the event loop.
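One way to sketch this with Node's built-in worker_threads module is shown below. It is a simplified fan-out, not a persistent pool: each call splits the batch into contiguous slices and starts one short-lived worker per slice. The worker body is an inline CommonJS string so the example fits in one file, which assumes cheerio can be loaded with require. The names chunk, parseInWorker, and parseBatch are hypothetical, and the h1 and .price selectors are borrowed from Pattern A.

```typescript
import { Worker } from 'node:worker_threads';
import { cpus } from 'node:os';

type Row = { title: string; price: string };

// Worker body (CommonJS). It parses every HTML string it receives and posts the rows back.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const cheerio = require('cheerio');
  const rows = workerData.map((html) => {
    const $ = cheerio.load(html);
    return {
      title: $('h1').first().text().trim(),
      price: $('.price').first().text().trim(),
    };
  });
  parentPort.postMessage(rows);
`;

// Split a batch into at most `parts` contiguous slices, preserving order.
function chunk<T>(items: T[], parts: number): T[][] {
  const size = Math.max(1, Math.ceil(items.length / Math.max(1, parts)));
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Run one slice in its own worker thread.
function parseInWorker(htmlDocs: string[]): Promise<Row[]> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: htmlDocs });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Parse a whole batch off the main thread; results keep the input order.
async function parseBatch(htmlDocs: string[], workers = cpus().length): Promise<Row[]> {
  const results = await Promise.all(chunk(htmlDocs, workers).map(parseInWorker));
  return ([] as Row[]).concat(...results);
}
```

Starting a worker has a fixed cost, so this only pays off when each slice holds enough HTML to keep a core busy. For a long-running service, a reusable pool would avoid paying the startup cost on every batch.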
9. Next.js API route note
Run the scrape server-side only, and never expose OMNISCRAPE_KEY to client components. The API route calls OmniScrape and returns JSON to the frontend.
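As a rough illustration, a server-side route could look like the sketch below. It assumes the App Router layout (a file such as app/api/scrape/route.ts) and reuses the css_extractor request from section 5. The handleScrape, jsonResponse, and fetchImpl names, the status codes, and the error messages are all hypothetical choices for this example.

```typescript
const OMNISCRAPE_URL = 'https://api.omniscrape.io/v1/scrape';

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

function jsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}

// fetchImpl is injectable so the handler can be exercised without network access.
async function handleScrape(request: Request, fetchImpl: FetchLike = fetch): Promise<Response> {
  // The key is read on the server and never included in any response.
  const apiKey = process.env.OMNISCRAPE_KEY;
  if (!apiKey) return jsonResponse({ error: 'server is missing OMNISCRAPE_KEY' }, 500);

  const body = (await request.json().catch(() => null)) as { url?: unknown } | null;
  if (!body || typeof body.url !== 'string') {
    return jsonResponse({ error: 'expected a JSON body with a string "url"' }, 400);
  }

  const upstream = await fetchImpl(OMNISCRAPE_URL, {
    method: 'POST',
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: body.url,
      mode: 'auto',
      output_format: 'css_extractor',
      css_selectors: { title: 'h1', price: '.price' },
    }),
  });
  const json = (await upstream.json()) as any;
  if (!json.success) return jsonResponse({ error: 'scrape failed' }, 502);

  // Return only the extracted fields; the raw upstream payload stays on the server.
  return jsonResponse(json.data.css_extracted);
}

// Next.js picks up the exported POST function as the route handler.
export const POST = (request: Request) => handleScrape(request);
```

A client component would then call fetch('/api/scrape', { method: 'POST', body: JSON.stringify({ url }) }) and never see the key.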
10. Checklist
Cheerio is not a crawler, so bring your own URL queue.
- Verify json.success before cheerio.load
- Trim text and validate non-empty
- Log metadata.method_used for cost
- Archive HTML on selector failures
- Prefer css_extractor for flat PDP fields
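The "archive HTML on selector failures" item can be sketched with Node's standard library alone. The emptyFields and archiveOnFailure names, the failed-html directory, and the comment header written into the file are assumptions made for this example.

```typescript
import { mkdirSync, writeFileSync } from 'node:fs';
import { createHash } from 'node:crypto';
import { join } from 'node:path';

// Returns the names of fields whose trimmed value is empty.
function emptyFields(fields: Record<string, string>): string[] {
  return Object.entries(fields)
    .filter(([, value]) => value.trim() === '')
    .map(([name]) => name);
}

// Writes the HTML to disk when any field is empty and returns the file path,
// or returns null when every field has a value.
function archiveOnFailure(
  url: string,
  html: string,
  fields: Record<string, string>,
  dir = 'failed-html',
): string | null {
  const missing = emptyFields(fields);
  if (missing.length === 0) return null;

  mkdirSync(dir, { recursive: true });
  // Hash the URL so the file name is stable and filesystem-safe.
  const name = createHash('sha256').update(url).digest('hex').slice(0, 16);
  const file = join(dir, `${name}.html`);
  writeFileSync(file, `<!-- url: ${url} | empty: ${missing.join(', ')} -->\n${html}`);
  return file;
}
```

Calling archiveOnFailure(url, json.data.content, { title, price }) just before throwing the "empty fields" error in Pattern A keeps the exact HTML that broke the selector, which makes the fix a matter of opening one file.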
Frequently asked questions
Cheerio vs JSDOM?
Cheerio is faster and smaller. JSDOM is closer to a browser DOM, but that is usually unnecessary once OmniScrape has rendered the page.
Cheerio in Apify Actor?
Yes. Call OmniScrape from inside requestHandler, then pass the returned HTML to cheerio.load. See the Apify comparison guide.
TypeScript types for cheerio?
Use the types bundled with the cheerio package (1.x ships its own declarations), and type your extracted row interfaces explicitly.
Why .text() vs .attr()?
Use .attr() when the value lives in an attribute rather than in text: a price stored in data-price needs .attr('data-price'). Inspect an OmniScrape HTML sample in DevTools to see which one applies.
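A small sketch of the difference, using made-up markup in which the machine-readable price sits in an attribute and the visible text carries extra words:

```typescript
import * as cheerio from 'cheerio';

// Illustrative markup, not taken from a real page.
const html = '<div class="price" data-price="24.99">From $24.99 / month</div>';
const $ = cheerio.load(html);

const el = $('.price').first();
// .attr() returns the attribute value, or undefined when the attribute is absent.
const fromAttr = el.attr('data-price');
// .text() returns the text content, including the surrounding words.
const fromText = el.text().trim();

// Prefer the attribute when present and fall back to the text otherwise.
const price = fromAttr ?? fromText;
```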
ESM import cheerio?
Use import * as cheerio from 'cheerio' on cheerio 1.x, and make sure the import style matches the module setting in your package.json.