OmniScrape

Yelp Scraper: Extract Business Listings, Ratings, and Reviews

Yelp is the canonical source of local business reviews in the US, and it actively defends that data. The /biz/ page is the atomic unit: a stable, predictable URL containing name, address, phone, hours, price range, categories, and the first page of reviews. Deeper review pagination is JavaScript-rendered and protected by CAPTCHA on aggressive crawl patterns.

Local SEO agencies scrape Yelp for citation consistency audits — checking that NAP (Name, Address, Phone) matches across directories. Reputation platforms track rating trends and sentiment over time. Hedge funds use review velocity as a foot-traffic proxy. Yelp's terms of service explicitly prohibit scraping reviews to build competing directory products, so understand your use case before you build.

This guide walks through Yelp's URL structure, DOM layout, bot-detection behavior, and the exact OmniScrape API requests needed to extract business data and paginated reviews. It also covers merging Yelp data with Google Maps scraper output for NAP reconciliation across sources.

On this page

  1. Data fields local SEO and reputation teams extract from Yelp
  2. Yelp URL patterns and pagination mechanics
  3. Yelp biz page DOM structure and CSS selectors
  4. Yelp bot detection and anti-scraping measures
  5. Scrape a Yelp business page with CSS extraction
  6. Paginate and extract Yelp reviews
  7. NAP normalization, deduplication, and cross-source merging
  8. Yelp Fusion API — when to use it instead of scraping
  9. Yelp Terms of Service and legal considerations
  10. FAQ

1. Data fields local SEO and reputation teams extract from Yelp

Citation monitoring is fundamentally a consistency problem: does the business name, address, and phone number on Yelp match what appears on Google Business Profile, Apple Maps, and a dozen other directories? Any mismatch is a local SEO signal worth flagging. Beyond NAP, reputation tools care about rating trends — not just the current average, but how it has moved over the last 90 days and whether negative reviews cluster around a specific product or location.

The fields below represent the full extraction target for a typical Yelp biz page. Not all fields are visible in the initial HTML — hours and amenities sometimes require interacting with expandable sections, and review text beyond the first page requires paginated requests.

  • Business name, Yelp biz ID, and /biz/ slug (stable identifier)
  • Star rating (aggregate average, 1–5 in 0.5 increments) and total review count
  • Street address, neighborhood label, city, state, ZIP code
  • Phone number (formatted and raw) and external website URL
  • Hours of operation for each day of the week, including holiday hours if present
  • Price range indicator ($ through $$$$)
  • Primary and secondary categories (e.g., 'Bakeries', 'Coffee & Tea')
  • Amenities and attributes (outdoor seating, reservations, wheelchair accessible, etc.)
  • Individual reviews: full text, star rating, date posted, reviewer username and profile URL
  • Owner response: presence, text, and response date
  • Claimed vs. unclaimed business status
  • Photos count and first-page photo URLs

2. Yelp URL patterns and pagination mechanics

Yelp's URL structure is stable and human-readable, which makes it predictable for crawlers. The primary biz slug is derived from the business name and city, lowercased and hyphenated. When two businesses share the same derived slug, Yelp appends a numeric suffix (-2, -3, etc.). The slug does not change when the business updates its name in the Yelp dashboard — the original slug persists, which is useful for long-term tracking.

Review pagination uses a simple offset query parameter rather than cursor tokens, which makes it easy to construct page URLs without first fetching a previous page. Each page returns 10 reviews. To paginate, increment start by 10 until the review container returns empty. The sort_by parameter controls ordering: date_desc gives reverse chronological order (newest first), which is most useful for incremental scraping where you only want reviews newer than your last crawl.

The biz ID — a numeric or alphanumeric identifier used in Yelp's internal systems — is embedded as a data attribute on the page and is useful when cross-referencing with the Yelp Fusion API.

  • Biz page: https://www.yelp.com/biz/dumpling-home-san-francisco
  • Duplicate slug: https://www.yelp.com/biz/dumpling-home-san-francisco-2
  • Reviews sorted by date: https://www.yelp.com/biz/dumpling-home-san-francisco?sort_by=date_desc
  • Review page 2 (offset 10): https://www.yelp.com/biz/dumpling-home-san-francisco?sort_by=date_desc&start=10
  • Review page 3 (offset 20): https://www.yelp.com/biz/dumpling-home-san-francisco?sort_by=date_desc&start=20
  • Local search: https://www.yelp.com/search?find_desc=pizza&find_loc=Brooklyn%2C+NY
  • Search with category filter: https://www.yelp.com/search?find_desc=Bakeries&find_loc=San+Francisco%2C+CA&cflt=bakeries
  • Biz ID exposed in data-biz-id attribute on the root biz container element
  • UK catalog: https://www.yelp.co.uk/biz/... (separate index, different review pools)
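
The offset scheme above can be sketched in a few lines of Python. This is a minimal illustration, not part of any SDK: review_page_url is a hypothetical helper name, and the page size of 10 and the sort_by and start parameters are taken from the URL patterns listed above.

```python
from urllib.parse import urlencode


def review_page_url(slug: str, page: int, sort_by: str = "date_desc",
                    host: str = "www.yelp.com") -> str:
    """Return the URL for a zero-based review page of a biz slug.

    Hypothetical helper: assumes 10 reviews per page and the
    sort_by/start query parameters described in this guide.
    """
    if page < 0:
        raise ValueError("page must be >= 0")
    params = {"sort_by": sort_by}
    if page > 0:
        params["start"] = page * 10  # offset grows by 10 per page
    return f"https://{host}/biz/{slug}?{urlencode(params)}"


for page in range(3):
    print(review_page_url("dumpling-home-san-francisco", page))
```

Because the URL depends only on the slug and the page index, pages can be scheduled independently instead of being chained from a "next" link.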

3. Yelp biz page DOM structure and CSS selectors

Yelp's frontend is a React application. The initial server-rendered HTML includes the business name, aggregate rating, address, phone, and the first 10 reviews — enough for basic extraction without JavaScript execution. However, Yelp periodically renames its CSS classes and data-testid attributes during frontend deploys, so selectors that work today may break within weeks. Anchoring selectors to data-testid attributes is more stable than class-based selectors, since testid values tend to change less frequently than generated class names.

The aggregate star rating is rendered as a div with data-testid="rating" containing an aria-label like '4 star rating' — parse the numeric value from the aria-label rather than trying to count star SVG elements. The review count appears in an anchor tag whose href contains the fragment #reviews. Address fields are split across multiple elements inside a container with data-testid="address"; concatenate the child text nodes to reconstruct the full address string.
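
As a rough sketch of the aria-label parsing described above, the following Python helper pulls the numeric value out of a label such as '4 star rating'. The function name parse_rating is hypothetical, and the assumed label wording ("N star rating", with an optional half-star decimal) is inferred from the example in this guide; adjust the pattern if the live label differs.

```python
import re
from typing import Optional

# Assumed label shape: "4 star rating" or "4.5 star rating".
_RATING_RE = re.compile(r"(\d(?:\.\d)?)\s*star rating", re.IGNORECASE)


def parse_rating(aria_label: str) -> Optional[float]:
    """Return the rating as a float, or None if the label does not match."""
    match = _RATING_RE.search(aria_label or "")
    if not match:
        return None
    value = float(match.group(1))
    # Ratings outside 1-5 are treated as a parsing failure.
    return value if 1.0 <= value <= 5.0 else None
```

Returning None instead of raising keeps a changed label format visible as a missing field, which fits the selector health checks discussed later in this guide.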

Individual reviews are rendered as li elements inside an ordered list. Each review li contains a div with data-testid="review". Within that container: review text is in a p element with a lang attribute (e.g., lang="en"); the star rating is in a div with aria-label containing 'star rating'; the date is in a span with a generated class — look for a span whose text matches a date pattern rather than relying on class names. The reviewer's profile link is an anchor with href containing /user_details.
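
Matching the review date by text pattern rather than by class name could look like the sketch below. It assumes the extractor hands you the text of each span inside a review, and that dates appear either as "Mar 3, 2023" or as "3/3/2023"; both formats are assumptions to verify against the live page, and find_review_date is a hypothetical name.

```python
import re
from datetime import date, datetime
from typing import Iterable, Optional

# Assumed date renderings; extend this list if the live DOM uses others.
_DATE_PATTERNS = [
    (re.compile(r"\b[A-Z][a-z]{2} \d{1,2}, \d{4}\b"), "%b %d, %Y"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
]


def find_review_date(span_texts: Iterable[str]) -> Optional[date]:
    """Return the first parseable date found in the given span texts."""
    for text in span_texts:
        for pattern, fmt in _DATE_PATTERNS:
            match = pattern.search(text)
            if not match:
                continue
            try:
                return datetime.strptime(match.group(0), fmt).date()
            except ValueError:
                continue  # looked like a date but was not one
    return None
```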

Pagination controls render as anchor tags. The 'next' page link carries rel="next", or you can construct the URL directly from the start= offset. When the start= value reaches or exceeds the total review count, the review list renders empty; use that as your termination condition.

4. Yelp bot detection and anti-scraping measures

Yelp runs a multi-layer bot detection stack. Datacenter IP ranges — AWS, GCP, Azure, DigitalOcean — are blocked or served degraded responses almost immediately. Search result pages are the most aggressively protected; even moderate request rates from residential IPs trigger CAPTCHA challenges on search. Biz pages are somewhat more permissive, but burst patterns (many requests in a short window) still trigger 403 responses or CAPTCHA interstitials.

Review pagination beyond the first page requires JavaScript execution in many cases — the review list container is present in the initial HTML but populated via an XHR call triggered after page load. If your css_extractor request returns an empty review list, switch to js_rendering mode with js_wait_selector targeting the review container. Set js_wait_timeout to at least 10–12 seconds to account for Yelp's API response latency.

Yelp's data-testid attribute names change with frontend deploys, typically every few weeks. Build selector validation into your pipeline: if the expected selector returns null for a known business, trigger an alert and re-inspect the DOM rather than silently writing empty fields to your database.
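
One way to sketch that validation step in Python is shown below. It assumes you crawl a small set of known businesses on every run and keep each business's extracted field map keyed by slug; the function name stale_selectors and the choice of sentinel fields (name, rating, review_count) are illustrative, not a fixed part of any API.

```python
from typing import Dict, List, Mapping, Optional, Sequence

# Fields that should never be empty for a known, live business.
SENTINEL_FIELDS = ("name", "rating", "review_count")


def stale_selectors(
    results: Mapping[str, Mapping[str, Optional[str]]],
    fields: Sequence[str] = SENTINEL_FIELDS,
) -> Dict[str, List[str]]:
    """Map each sentinel field to the slugs where it came back empty."""
    failures: Dict[str, List[str]] = {}
    for slug, extracted in results.items():
        for field in fields:
            value = extracted.get(field)
            if value is None or not str(value).strip():
                failures.setdefault(field, []).append(slug)
    return failures
```

An empty result means the sentinel set extracted cleanly. A field that fails for every known business points to a renamed selector rather than a problem with one listing.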

Geographically, Yelp operates separate catalogs for different countries (yelp.com, yelp.co.uk, yelp.com.au, etc.). A business listed on yelp.co.uk will not appear in yelp.com search results. Match your proxy geography to the catalog you are targeting.

  • Datacenter IPs blocked or rate-limited on most page types
  • CAPTCHA on search pages and high-volume biz page requests — use enable_solver: true
  • JS-rendered review pagination requires js_rendering mode for pages beyond the first
  • data-testid attribute names change with frontend deploys — monitor selector health
  • Geo-partitioned catalogs: yelp.com, yelp.co.uk, yelp.com.au are separate indexes
  • TLS fingerprinting and browser behavior signals used for bot classification
  • ToS Section 7 explicitly prohibits scraping reviews for competing directory products

5. Scrape a Yelp business page with CSS extraction

For a single biz page, mode 'auto' with a residential US proxy is the right starting point. The initial server-rendered HTML contains the business name, rating, address, phone, website, price range, and categories — all extractable without JavaScript execution. OmniScrape's auto mode tries a fast HTTP request first and escalates to headless browser rendering only if the response indicates a challenge or missing content, which keeps costs low for pages that serve full HTML.

Use css_extractor output format with explicit selectors for each field. The response will include a css_extracted map with your field names as keys. Check body.data.css_extracted in the response — if a field is null or empty, the selector may have changed and needs updating. The metadata.method_used field tells you whether the request was served via fast HTTP or js_rendering, which helps you understand Yelp's current response behavior for that URL.

Residential proxy geo-matching matters: a US proxy in the same metro as the business tends to get more complete results, particularly for hours and attributes that Yelp may personalize by region.

Yelp biz page — CSS extraction request
json
{
  "url": "https://www.yelp.com/biz/tartine-bakery-san-francisco",
  "mode": "auto",
  "output_format": "css_extractor",
  "enable_solver": true,
  "proxy": "residential:us",
  "css_selectors": {
    "name": "h1",
    "rating": "[data-testid=\"rating\"]",
    "review_count": "a[href*=\"#reviews\"]",
    "address": "[data-testid=\"address\"]",
    "phone": "[data-testid=\"phone-number\"]",
    "website": "[data-testid=\"biz-website-link\"]",
    "price_range": "[data-testid=\"price-range\"]",
    "categories": "span[class*=\"category-str-list\"]",
    "hours_table": "table[class*=\"hours-table\"]",
    "claimed_status": "[data-testid=\"claimed-status\"]"
  }
}

6. Paginate and extract Yelp reviews

Review pagination uses the start= query parameter with increments of 10. For incremental scraping (only new reviews since last run), use sort_by=date_desc and stop pagination when you encounter a review date older than your last crawl timestamp — this avoids fetching the full review history on every run.

Reviews beyond the first page generally require JavaScript execution. Use js_rendering mode with js_wait_selector set to the review container. If the selector does not appear within js_wait_timeout milliseconds, OmniScrape returns whatever HTML was available, so check that css_extracted.review_text is non-empty before writing to your store.

Space requests at least 3–5 seconds apart per business. For bulk crawls across many businesses, distribute requests across multiple sessions using session_id to avoid pattern detection from a single IP making sequential requests to the same domain.

Yelp reviews — paginated JS-rendered request
json
{
  "url": "https://www.yelp.com/biz/tartine-bakery-san-francisco?sort_by=date_desc&start=10",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "enable_solver": true,
  "proxy": "residential:us",
  "js_wait_selector": "[data-testid=\"review\"]",
  "js_wait_timeout": 12000,
  "css_selectors": {
    "review_text": "[data-testid=\"review\"] p[lang]",
    "review_rating": "[data-testid=\"review\"] div[aria-label*=\"star rating\"]",
    "review_date": "[data-testid=\"review\"] span[class*=\"date\"]",
    "reviewer_name": "[data-testid=\"review\"] a[href*=\"/user_details\"]",
    "owner_response": "[data-testid=\"owner-response\"]"
  }
}
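
The incremental crawl logic described in this section (walk pages in date_desc order, stop at the first review older than the last crawl, stop on an empty page) can be sketched as follows. The fetch_page callable is a placeholder you would supply: it is assumed to issue a request like the one above for a given start offset and return a list of review dicts whose "date" value is already a datetime.date. The 4-second default delay sits inside the 3–5 second spacing suggested above.

```python
import time
from datetime import date
from typing import Callable, Dict, List

Review = Dict[str, object]


def collect_new_reviews(
    fetch_page: Callable[[int], List[Review]],
    last_crawl: date,
    page_size: int = 10,
    max_pages: int = 500,
    delay_seconds: float = 4.0,
) -> List[Review]:
    """Collect reviews dated on or after last_crawl, newest first."""
    new_reviews: List[Review] = []
    for page in range(max_pages):
        if page > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # space out requests per business
        reviews = fetch_page(page * page_size)
        if not reviews:
            break  # empty container: offset is past the last review
        for review in reviews:
            if review["date"] < last_crawl:
                # date_desc ordering: everything after this is older too
                return new_reviews
            new_reviews.append(review)
    return new_reviews
```

Reviews posted on the same day as the last crawl are returned again, so deduplicate on a stable review identifier before writing to your store.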

7. NAP normalization, deduplication, and cross-source merging

NAP (Name, Address, Phone) reconciliation is the core use case for multi-source local data pipelines. Raw Yelp data arrives inconsistently formatted: phone numbers may be '(415) 555-0100', '+14155550100', or '415.555.0100' depending on what the business owner entered. Normalize all phone numbers to E.164 format (+14155550100) before storage and comparison. For addresses, use a parser like libpostal to decompose free-text addresses into structured components (street number, street name, city, state, postal code) — this enables reliable matching even when abbreviations differ ('St' vs 'Street', 'Ave' vs 'Avenue').
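
For US listings, the phone normalization step can be sketched with the standard library alone. The helper below is a simplified illustration limited to North American numbers (country code +1); to_e164_us is a hypothetical name, and a production pipeline handling international catalogs would more likely rely on a dedicated phone-number library.

```python
import re
from typing import Optional


def to_e164_us(raw: str) -> Optional[str]:
    """Normalize a US phone string to E.164, or return None if invalid."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    # NANP area codes and exchange codes never start with 0 or 1.
    if digits[0] in "01" or digits[3] in "01":
        return None
    return "+1" + digits


for raw in ("(415) 555-0100", "+14155550100", "415.555.0100"):
    print(raw, "->", to_e164_us(raw))
```

All three input formats from the paragraph above reduce to the same stored value, which is what makes exact phone matching across sources possible.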

Store the Yelp biz slug as the primary key for Yelp records, not the business name — names change, slugs persist. When merging Yelp records with Google Maps data, use a two-stage match: first attempt an exact match on normalized phone + ZIP code, then fall back to fuzzy name matching (Jaro-Winkler or trigram similarity) within the same ZIP code. Require a similarity threshold of at least 0.85 before auto-merging; queue lower-confidence matches for manual review.
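
A minimal sketch of the two-stage match follows. Note one substitution: the standard library has no Jaro-Winkler implementation, so this example uses difflib.SequenceMatcher as a stand-in similarity measure. Its scores are not interchangeable with Jaro-Winkler scores, so the 0.85 threshold should be re-tuned for whichever measure you adopt. The record shape (name, phone, zip keys) and the function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]  # assumed keys: "name", "phone" (E.164), "zip"


def name_similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; not Jaro-Winkler."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def match_record(yelp: Record, candidates: List[Record],
                 threshold: float = 0.85) -> Tuple[Optional[Record], str]:
    """Return (match, decision); decision is exact, auto, review, or none."""
    same_zip = [c for c in candidates if c["zip"] == yelp["zip"]]
    # Stage 1: exact match on normalized phone + ZIP code.
    for candidate in same_zip:
        if yelp["phone"] and candidate["phone"] == yelp["phone"]:
            return candidate, "exact"
    # Stage 2: best fuzzy name match within the same ZIP code.
    best, best_score = None, 0.0
    for candidate in same_zip:
        score = name_similarity(yelp["name"], candidate["name"])
        if score > best_score:
            best, best_score = candidate, score
    if best is None:
        return None, "none"
    return best, ("auto" if best_score >= threshold else "review")
```

Records returned with the "review" decision belong in the manual queue; only "exact" and "auto" should be merged without a human check.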

Track NAP discrepancies as structured diffs: {field: 'phone', yelp: '+14155550100', google: '+14155550199', business_id: '...'}. This format makes it easy to generate citation audit reports and to detect when a business has updated its information on one platform but not others.
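
Producing those diffs from two already-normalized records is a short loop. In this sketch the function name nap_diffs and the flat name/address/phone record shape are assumptions; the output dicts follow the structure shown above.

```python
from typing import Dict, List, Mapping

NAP_FIELDS = ("name", "address", "phone")


def nap_diffs(business_id: str, yelp: Mapping[str, str],
              google: Mapping[str, str]) -> List[Dict[str, str]]:
    """Return one structured diff per NAP field that disagrees."""
    diffs: List[Dict[str, str]] = []
    for field in NAP_FIELDS:
        yelp_value = yelp.get(field, "")
        google_value = google.get(field, "")
        if yelp_value != google_value:
            diffs.append({
                "field": field,
                "yelp": yelp_value,
                "google": google_value,
                "business_id": business_id,
            })
    return diffs
```

Run the comparison only after normalization. Otherwise formatting differences such as 'St' versus 'Street' show up as false discrepancies.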

See lead generation web scraping for broader enrichment pipeline patterns that apply to local business data.

8. Yelp Fusion API — when to use it instead of scraping

Yelp operates a first-party API called Fusion API, available at api.yelp.com. The free tier provides access to business search, business details, and reviews (capped at 3 reviews per business on the public tier). Fusion is the correct choice when you are building a consumer-facing product that displays Yelp data with attribution, or when your use case falls within Yelp's developer terms — the API provides structured JSON responses with stable field names and no bot-detection friction.

Fusion's limitations are meaningful for data-intensive use cases: the free tier has rate limits that make bulk extraction impractical, review text is truncated, and access to full review history is not available. The API also does not expose some fields visible on the biz page, such as detailed amenity attributes and owner responses. For internal analytics, citation monitoring of your own business listings, or research use cases that do not involve republishing Yelp content, scraping the biz page gives you more complete data.

The two approaches are not mutually exclusive. A common pattern is to use Fusion for initial business discovery and structured metadata (categories, coordinates, Yelp rating), then scrape the biz page for fields Fusion does not expose (full review text, owner responses, amenity details). Always check the current Fusion API terms before combining approaches — Yelp updates its developer policies periodically.
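
For the Fusion side of that pattern, the sketch below builds (but does not send) a business details request using only the standard library. It assumes the v3 business details endpoint at api.yelp.com/v3/businesses/{id}, Bearer-token authentication, and that the endpoint accepts a biz alias (the /biz/ slug) as well as an ID; confirm all three against the current Fusion documentation. The helper name fusion_business_request is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

FUSION_BASE = "https://api.yelp.com/v3"  # assumed base URL for Fusion v3


def fusion_business_request(id_or_alias: str, api_key: str) -> Request:
    """Build a GET request for Fusion business details (not sent here)."""
    url = f"{FUSION_BASE}/businesses/{quote(id_or_alias, safe='')}"
    return Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = fusion_business_request("tartine-bakery-san-francisco", "YOUR_API_KEY")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call. Keeping construction separate from sending makes the URL and headers easy to test without network access.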

9. Yelp Terms of Service and legal considerations

Yelp's Terms of Service, Section 7 (Prohibited Activities), explicitly prohibit using automated means to access the site, scraping content, or using Yelp data to build competing products or populate other directories. Yelp has a history of litigation against scrapers. hiQ Labs v. LinkedIn is often cited in this context, but that case involved LinkedIn rather than Yelp, and its outcome does not provide blanket protection for scraping Yelp.

The compliance picture depends heavily on use case. Monitoring your own business's Yelp listing for citation accuracy or review alerts is common practice and carries low legal risk, though it is technically still outside the letter of the ToS. Building a competing local directory populated with Yelp reviews is the highest-risk use case and the one Yelp has historically pursued legally. Academic research and journalism occupy a grayer middle ground.

Do not republish Yelp review text verbatim in customer-facing products without explicit permission. If you are building a product that displays local business data, evaluate the Yelp Fusion API with proper attribution as the compliant path. For internal analytics where data is not republished, assess your specific use case with legal counsel familiar with the CFAA and relevant state computer fraud statutes.

Frequently asked questions

How do I paginate through all Yelp reviews for a business?

Use the start= query parameter in increments of 10: ?sort_by=date_desc&start=0, then start=10, start=20, and so on. Stop when the review container in the response is empty — this means you have exceeded the total review count. For incremental scraping, sort by date_desc and halt pagination when you encounter a review older than your last crawl timestamp, rather than always fetching from the beginning. Space requests at least 3–5 seconds apart to avoid triggering rate limits.

Why are reviews missing from my Yelp scrape response?

Review content beyond the first page is loaded via JavaScript after initial page render. If you are using mode 'auto' or 'fast' and the review list is empty, switch to mode 'js_rendering' with js_wait_selector set to '[data-testid="review"]' and js_wait_timeout of at least 10000–12000ms. Also verify that your css_selectors are current — Yelp's data-testid attribute names change with frontend deploys, so selectors that worked last month may be stale.

What proxy type should I use for Yelp?

Residential proxies are required for reliable Yelp access. Datacenter IP ranges (AWS, GCP, DigitalOcean) are blocked or served CAPTCHA challenges almost immediately. Use proxy: 'residential:us' for US Yelp listings. For geo-specific catalogs (yelp.co.uk, yelp.com.au), match the proxy country to the catalog — 'residential:gb' for UK, 'residential:au' for Australia. Metro-level geo-matching (e.g., a California residential IP for San Francisco businesses) can improve response completeness for localized content.
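
Matching the proxy to the catalog can be automated from the target URL's host. The mapping below uses the three proxy strings named in this answer; the helper name proxy_for is hypothetical, and any other country catalogs would need their own entries added.

```python
from urllib.parse import urlsplit

# Catalog host -> proxy value, using the values named in this guide.
CATALOG_PROXY = {
    "www.yelp.com": "residential:us",
    "www.yelp.co.uk": "residential:gb",
    "www.yelp.com.au": "residential:au",
}


def proxy_for(url: str) -> str:
    """Return the proxy setting that matches the URL's Yelp catalog."""
    host = (urlsplit(url).hostname or "").lower()
    if not host.startswith("www."):
        host = "www." + host
    try:
        return CATALOG_PROXY[host]
    except KeyError:
        raise ValueError(f"no proxy mapping for catalog host {host!r}") from None
```

Raising on an unknown host is deliberate: silently falling back to a US proxy for a non-US catalog would return results from the wrong index.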

How do I handle Yelp CAPTCHA challenges?

Set enable_solver: true in your OmniScrape request. OmniScrape's Web Unlocker handles CAPTCHA solving automatically. Check metadata.solver_used and metadata.challenge_solved in the response to confirm the challenge was resolved. If you are seeing persistent CAPTCHA on search pages, consider targeting biz URLs directly rather than search — search pages have significantly more aggressive bot detection than individual biz pages.

How do I merge Yelp data with Google Maps data for NAP reconciliation?

Normalize both sources before merging: convert phone numbers to E.164 format and parse addresses with libpostal into structured components. Use a two-stage match: exact match on normalized phone + ZIP code first, then fall back to fuzzy name similarity (Jaro-Winkler, threshold 0.85+) within the same ZIP. Store the Yelp biz slug and Google Place ID as separate keys — do not use business name as a primary key since names change. See Google Maps scraper for the Maps-side extraction patterns.

Can I use Yelp Fusion API instead of scraping?

Fusion API is the right choice for consumer-facing products that display Yelp data with attribution, and for use cases within Yelp's developer terms. Its limitations: the free tier caps reviews at 3 per business, review text is truncated, and bulk extraction is rate-limited. Fusion does not expose amenity attributes, owner responses, or full review history. For internal analytics or citation monitoring of your own listings, scraping the biz page gives more complete data — but evaluate your use case against Yelp's developer terms before combining approaches.

How often do Yelp's CSS selectors and data-testid attributes change?

Yelp's frontend deploys frequently — data-testid attribute names and generated CSS classes can change every few weeks. Build selector health monitoring into your pipeline: after each crawl run, validate that key fields (name, rating, review_count) are non-null for a set of known test businesses. If any sentinel field returns null, trigger an alert and re-inspect the live DOM before the next production run. Anchoring to data-testid attributes is more stable than generated class names, but neither is immune to changes.

Related guides

  • Google Maps Scraper: Extract Business Listings and Place Data
  • Lead Generation Web Scraping: Compliant Inbound Enrichment for Sales Teams
  • Sentiment Analysis Web Scraping: Build a Production Review Pipeline
  • Web Scraping with Python

© 2026 OmniScrape. All rights reserved.
