OmniScrape

Amazon Scraper: Product Data, Buy Box, Reviews, and Multi-Marketplace

Amazon is the benchmark target for retail intelligence — and one of the hardest to scrape reliably. A single product detail page (PDP) packs dozens of fields into a layout that shifts by category, marketplace TLD, fulfillment type, and whether the item is sold by Amazon directly, an FBA seller, or an FBM merchant. Teams scrape Amazon for buy box tracking, MAP policy enforcement, review sentiment analysis, catalog gap analysis, and competitor pricing intelligence.

This guide covers Amazon specifically: which URLs to target, where fields live in the DOM, how Amazon's bot detection behaves, and how to structure requests through OmniScrape to get consistent data across storefronts. For the broader ecommerce data pipeline — scheduling, deduplication, change alerting — see ecommerce web scraping. For the Python fetch layer and parsing patterns, see web scraping with Python.

On this page

  1. What teams extract from Amazon PDPs
  2. Amazon URL patterns that survive redesigns
  3. Where the data lives in Amazon HTML
  4. How Amazon blocks scrapers
  5. Scrape a product detail page with OmniScrape
  6. Pull review histogram and individual review text
  7. Fallback: parse JSON-LD when CSS selectors break
  8. Multi-marketplace scraping across Amazon TLDs
  9. Amazon Terms of Service and legal considerations
  10. FAQ

1. What teams extract from Amazon PDPs

Start from the business question before writing a single selector. Price monitors need the buy box winner, list price, deal badge, coupon text, and fulfillment type. Catalog teams want title, brand, bullet points, category breadcrumb, main image URL, and variant ASIN relationships. Review analysts pull the star histogram, total review count, and individual review text with verified-purchase flags and reviewer metadata.

Some fields are straightforward — title and brand rarely move. Others are volatile: buy box price can shift every few minutes, BSR updates hourly, and availability text is locale-dependent. Design your schema around the fields your use case actually needs, and instrument alerts when high-value fields come back empty rather than silently storing nulls.

  • ASIN and parent ASIN (for variant families — color, size, style)
  • Buy box price, currency symbol, and buy box seller name (Amazon Retail vs FBA vs FBM)
  • List price (struck-through reference price), savings amount, savings percentage
  • Deal badges: Lightning Deal countdown, Prime Exclusive Discount, coupon clip amount
  • Title, brand, bullet point features (up to 5), and long-form product description
  • Star rating (aggregate), review count, and rating breakdown histogram (5★ through 1★)
  • Best Sellers Rank (BSR) per category node — products can rank in multiple nodes
  • Availability text, fastest delivery promise, and fulfillment type (Prime, FBA, FBM)
  • A+ content presence flag, brand story section, and embedded video count
  • Main image URL and alternate image gallery URLs
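
A minimal Python sketch of alerting on empty high-value fields instead of silently storing nulls. The `PdpRecord` shape and the `HIGH_VALUE_FIELDS` tuple are hypothetical; choose the fields that match your own use case.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record shape; keep only the fields your use case needs.
@dataclass
class PdpRecord:
    asin: str
    marketplace: str
    title: Optional[str] = None
    buy_box_price: Optional[str] = None
    buy_box_seller: Optional[str] = None
    availability: Optional[str] = None
    bsr: Optional[str] = None


# Fields whose absence should raise an alert rather than be stored as null.
HIGH_VALUE_FIELDS = ("title", "buy_box_price", "availability")


def missing_high_value_fields(record: PdpRecord) -> list[str]:
    """Return the high-value fields that are None or whitespace-only."""
    missing = []
    for name in HIGH_VALUE_FIELDS:
        value = getattr(record, name)
        if value is None or not value.strip():
            missing.append(name)
    return missing


record = PdpRecord(asin="B08N5WRWNW", marketplace="amazon.com",
                   title="Echo Dot", buy_box_price="  ")
print(missing_high_value_fields(record))  # ['buy_box_price', 'availability']
```

A non-empty result is the signal to raise an alert and skip the write, so a soft block or a drifted selector does not end up in the price history as a legitimate null.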

2. Amazon URL patterns that survive redesigns

Amazon's search and autocomplete endpoints change frequently and trigger bot scoring the fastest, so avoid them for bulk data collection. The most stable entry points are PDP URLs keyed by ASIN. Amazon resolves any slug variation to the same product page, so you can always use the bare /dp/ASIN form and ignore the human-readable slug entirely.

Build your ASIN list from brand feeds, licensed catalog data, or low-volume category browse — then refresh individual PDPs on a schedule. This keeps your scraping surface predictable and avoids the high-detection-risk search surface. For the reviews endpoint, the ref parameter is optional; the ASIN is what matters.

  • PDP (bare): https://www.amazon.com/dp/B08N5WRWNW
  • PDP (with slug, redirects to same page): https://www.amazon.com/Echo-Dot-4th-Gen/dp/B08N5WRWNW
  • Reviews page: https://www.amazon.com/product-reviews/B08N5WRWNW
  • Reviews paginated: https://www.amazon.com/product-reviews/B08N5WRWNW?pageNumber=2
  • All offers / seller listing: https://www.amazon.com/gp/offer-listing/B08N5WRWNW
  • Category search (higher detection risk): https://www.amazon.com/s?k=wireless+earbuds&page=2
  • Marketplace TLD variants: amazon.co.uk, amazon.de, amazon.fr, amazon.co.jp, amazon.ca — each is a fully separate catalog with independent pricing and seller pools
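
The URL patterns above can be reduced to a few helpers. This is a sketch, not an official URL grammar: the function names are hypothetical, and the 10-character uppercase alphanumeric ASIN pattern is an assumption based on the example ASIN used in this guide.

```python
import re
from typing import Optional
from urllib.parse import urlencode, urlparse

# Assumption: ASINs are 10 uppercase alphanumeric characters.
ASIN_RE = re.compile(
    r"/(?:dp|product-reviews|gp/offer-listing)/([A-Z0-9]{10})(?:/|$)"
)


def extract_asin(url: str) -> Optional[str]:
    """Pull the ASIN out of a PDP, reviews, or offer-listing URL."""
    match = ASIN_RE.search(urlparse(url).path)
    return match.group(1) if match else None


def pdp_url(asin: str, domain: str = "www.amazon.com") -> str:
    """Bare /dp/ASIN form; the human-readable slug is not needed."""
    return f"https://{domain}/dp/{asin}"


def reviews_url(asin: str, page: int = 1, domain: str = "www.amazon.com") -> str:
    """Reviews endpoint, with pageNumber added from page 2 onward."""
    base = f"https://{domain}/product-reviews/{asin}"
    return base if page <= 1 else f"{base}?{urlencode({'pageNumber': page})}"


slug_url = "https://www.amazon.com/Echo-Dot-4th-Gen/dp/B08N5WRWNW"
print(extract_asin(slug_url))          # B08N5WRWNW
print(reviews_url("B08N5WRWNW", 2))
```

Normalizing every incoming URL to an ASIN first, then rebuilding the bare form, keeps slug changes and tracking parameters out of your deduplication keys.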

3. Where the data lives in Amazon HTML

Amazon embeds structured data when it helps their own SEO. Always check application/ld+json script blocks for Product schema first — name, image, offers.price, and aggregateRating are frequently present even when the visible DOM is heavily obfuscated or A/B-tested. This makes JSON-LD a reliable cross-check for price selectors.

The buy box price typically renders inside #corePrice_feature_div. Amazon duplicates price text for screen-reader accessibility: the visible formatted price uses span.a-price-whole and span.a-price-fraction, while span.a-price .a-offscreen holds the clean combined number (e.g., "$29.99") — always target .a-offscreen for machine parsing. The list price (struck-through) sits in .basisPrice .a-offscreen or .a-text-price .a-offscreen depending on the category template.

Review histogram bars are anchored by #histogramTable or by data-hook attributes on the reviews page. BSR appears in table rows inside the product details section — the exact wrapper varies by category template (electronics uses a different detail table structure than books or grocery). When in doubt, search for the literal text 'Best Sellers Rank' in the HTML and walk up to the containing row.
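
Once you have the text of the row that contains 'Best Sellers Rank', the individual ranks can be split out with a regular expression. The sketch below assumes the English-storefront wording "#1,234 in Category"; the sample string is illustrative rather than captured from a live page, and other locales will need their own pattern.

```python
import re

# Assumed format: "#<rank> in <category>", optionally followed by a
# parenthesised "See Top 100" link and further "#<rank> in ..." entries.
BSR_ENTRY_RE = re.compile(r"#(\d[\d,]*)\s+in\s+([^#(]+)")


def parse_bsr(text: str) -> list[tuple[int, str]]:
    """Return (rank, category) pairs found in a Best Sellers Rank cell."""
    entries = []
    for rank, category in BSR_ENTRY_RE.findall(text):
        entries.append((int(rank.replace(",", "")), category.strip()))
    return entries


sample = "#1,234 in Electronics (See Top 100 in Electronics) #5 in Smart Speakers"
print(parse_bsr(sample))  # [(1234, 'Electronics'), (5, 'Smart Speakers')]
```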

Variant selectors (color swatches, size tiles) are driven by JavaScript and inline JSON embedded in a script tag containing 'twister-plus-js-init-data' or similar. If you need variant ASIN mapping, request mode js_rendering and parse that JSON blob rather than trying to click through swatches.

4. How Amazon blocks scrapers

Amazon does not publicly name its bot management vendor, but the observed behavior matches enterprise-grade fingerprinting: TLS/JA3 fingerprint analysis, HTTP/2 settings inspection, behavioral scoring across request sequences, and CAPTCHA challenges on search and high-velocity IPs. Critically, Amazon frequently returns HTTP 200 with a 'dog page' (the error page that shows a dog) or a stripped buy box instead of a hard block, so your pipeline must validate that actual product content is present, not just that the status code was 200.

Regional storefronts serve different HTML structures and pricing. Requesting a marketplace through an IP in another country, or with an Accept-Language header that does not match the storefront (for example, amazon.com through a German IP), produces wrong or incomplete data: Amazon may serve a redirect, a localized interstitial, or simply omit the buy box. Dynamic pricing and seller rotation for some categories load via AJAX after first paint, so a pure HTTP fetch can return stale or absent buy box data. Use mode auto and verify with js_rendering when prices come back empty.

  • CAPTCHA interstitials ('Enter the characters you see below') on search and high-frequency PDP access
  • 'Sorry, we just need to make sure you're not a robot' pages that return HTTP 200
  • IP reputation scoring — datacenter IP ranges fail first; residential proxies matched to marketplace country are required
  • Zip-code and delivery-context-dependent pricing on grocery, pantry, and some electronics categories
  • A/B layout tests that relocate #corePrice_feature_div or replace it with new wrapper IDs without notice
  • Login prompts on review pagination at scale — Amazon gates deep review pagination behind account sessions
  • Session-linked URL tokens in some search and offer-listing URLs that expire within minutes
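
Because several of these blocks arrive with HTTP 200, a content check is worth running on every response. The following is a rough sketch: the marker strings are taken from the interstitial text quoted above, the `id="productTitle"` check assumes the title element used elsewhere in this guide, and the status labels are hypothetical.

```python
# Substrings from the interstitials quoted above. The second marker is
# cut before the apostrophe, which may be HTML-encoded in the page.
BLOCK_MARKERS = (
    "enter the characters you see below",
    "we just need to make sure you",
)


def classify_response(status_code: int, html: str) -> str:
    """Classify a fetched PDP as ok, soft_block, hard_block, or no_product_content."""
    if status_code != 200:
        return "hard_block"
    lowered = html.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return "soft_block"
    # A 200 without the title element is treated as missing content,
    # which covers error pages and stripped layouts.
    if 'id="producttitle"' not in lowered:
        return "no_product_content"
    return "ok"


print(classify_response(200, "<p>Enter the characters you see below</p>"))
print(classify_response(200, '<span id="productTitle">Echo Dot</span>'))
```

Counting `soft_block` and `no_product_content` separately from real failures makes it easier to tell a proxy reputation problem from a layout change.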

5. Scrape a product detail page with OmniScrape

Send the PDP URL to POST https://api.omniscrape.io/v1/scrape with mode auto and a US residential proxy so Amazon serves the same buy box a shopper in that region sees. Setting output_format to css_extractor lets OmniScrape evaluate your selectors server-side and return only the extracted values, so your application code does no HTML parsing. If buy_box_price comes back empty, cross-check against the JSON-LD in an output_format html response for the same URL; that combination catches both selector drift and soft blocks.

The response will contain body.data.css_extracted with your named fields. Check body.success and inspect body.metadata.method_used to understand whether OmniScrape escalated to a full browser render. If method_used is 'fast' and buy_box_price is empty, retry with mode js_rendering and js_wait_selector set to #corePrice_feature_div.

Amazon PDP — css_extractor request
json
{
  "url": "https://www.amazon.com/dp/B08N5WRWNW",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "css_selectors": {
    "title": "#productTitle",
    "buy_box_price": "#corePrice_feature_div .a-price .a-offscreen",
    "list_price": ".basisPrice .a-offscreen",
    "deal_badge": "#dealBadge_feature_div .a-badge-label",
    "rating": "#acrPopover span.a-size-base",
    "review_count": "#acrCustomerReviewText",
    "bsr": "#productDetails_detailBullets_sections1 tr:has(th:contains('Best Sellers Rank')) td",
    "availability": "#availability span",
    "seller": "#sellerProfileTriggerId",
    "brand": "#bylineInfo",
    "main_image": "#landingImage"
  }
}
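
A sketch of the escalation logic described above, using only the Python standard library. The endpoint, request fields, and response keys (`success`, `data.css_extracted`, `metadata.method_used`) come from this guide; the `Authorization: Bearer` header is an assumption, so check the API reference for the actual authentication scheme. The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.omniscrape.io/v1/scrape"


def build_payload(asin: str, mode: str = "auto") -> dict:
    """Build a css_extractor request for a US PDP."""
    payload = {
        "url": f"https://www.amazon.com/dp/{asin}",
        "mode": mode,
        "output_format": "css_extractor",
        "proxy": "residential:us",
        "css_selectors": {
            "title": "#productTitle",
            "buy_box_price": "#corePrice_feature_div .a-price .a-offscreen",
        },
    }
    if mode == "js_rendering":
        payload["js_wait_selector"] = "#corePrice_feature_div"
    return payload


def needs_js_retry(body: dict) -> bool:
    """True when the fast path was used and the buy box price is empty."""
    extracted = (body.get("data") or {}).get("css_extracted") or {}
    method = (body.get("metadata") or {}).get("method_used")
    return method == "fast" and not extracted.get("buy_box_price")


def post_scrape(payload: dict, api_key: str) -> dict:
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def scrape_pdp(asin: str, api_key: str) -> dict:
    """Try mode auto first, then retry once with a full browser render."""
    body = post_scrape(build_payload(asin), api_key)
    if body.get("success") and needs_js_retry(body):
        body = post_scrape(build_payload(asin, mode="js_rendering"), api_key)
    return body
```

Keeping `needs_js_retry` as a pure function means the retry decision can be unit-tested against recorded responses without making network calls.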

6. Pull review histogram and individual review text

The star histogram is available on the PDP itself, but individual review text requires the dedicated reviews endpoint. Paginate with ?pageNumber=N — Amazon typically shows 10 reviews per page. Keep concurrency low and introduce per-request delays; Amazon ties review scraping detection to both IP reputation and request cadence. Do not parallelize review pagination aggressively.

The data-hook attributes on the reviews page are more stable than class-based selectors — Amazon has kept data-hook='review-title', data-hook='review-body', and data-hook='avp-badge' consistent across redesigns. The histogram percentage bars on the reviews page use aria-label attributes that include the percentage as text, which is more reliable than trying to measure bar width.

Amazon reviews page — css_extractor request
json
{
  "url": "https://www.amazon.com/product-reviews/B08N5WRWNW",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "css_selectors": {
    "overall_rating": "[data-hook=rating-out-of-text]",
    "total_reviews": "[data-hook=total-review-count]",
    "histogram_5star": "[data-hook=histogram-row-5-star] [aria-label]",
    "histogram_4star": "[data-hook=histogram-row-4-star] [aria-label]",
    "histogram_3star": "[data-hook=histogram-row-3-star] [aria-label]",
    "histogram_2star": "[data-hook=histogram-row-2-star] [aria-label]",
    "histogram_1star": "[data-hook=histogram-row-1-star] [aria-label]",
    "review_titles": "[data-hook=review-title]",
    "review_bodies": "[data-hook=review-body] span",
    "review_dates": "[data-hook=review-date]",
    "verified_badges": "[data-hook=avp-badge]",
    "reviewer_names": ".a-profile-name"
  }
}
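
Two small helpers cover the post-processing for this request: reading the percentage out of a histogram aria-label, and pacing page requests so pagination stays sequential. This is a sketch under assumptions. The exact aria-label wording is not documented here, so the pattern accepts either "73%" or "73 percent", and the five-second default delay is an illustrative value rather than a tested threshold.

```python
import re
import time
from typing import Callable, Iterator, Optional

PERCENT_RE = re.compile(r"(\d{1,3})\s*(?:%|percent)")


def parse_histogram_percent(aria_label: str) -> Optional[int]:
    """Extract the integer percentage from a histogram aria-label."""
    match = PERCENT_RE.search(aria_label)
    return int(match.group(1)) if match else None


def paced_review_urls(
    asin: str,
    first_page: int,
    last_page: int,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield review page URLs one at a time, sleeping between pages."""
    for page in range(first_page, last_page + 1):
        if page > first_page:
            sleep(delay_seconds)
        yield f"https://www.amazon.com/product-reviews/{asin}?pageNumber={page}"


print(parse_histogram_percent("5 stars represent 73% of rating"))  # 73
```

Because `paced_review_urls` takes `first_page`, a job interrupted by a login prompt can resume from the last successful page instead of starting over.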

7. Fallback: parse JSON-LD when CSS selectors break

When Amazon ships an A/B layout test that moves or renames price divs, the JSON-LD Product schema embedded for Google Shopping often still validates correctly. Request output_format html, locate all script[type="application/ld+json"] blocks in body.data.content, filter for @type === 'Product', and parse offers.price, offers.priceCurrency, offers.availability, and aggregateRating.ratingValue from the structured data.

This approach is slower than css_extractor because you receive and parse the full HTML, but it survives layout churn far better. The recommended production pattern is to combine both: use css_extractor as the primary path for speed, and fall back to JSON-LD parsing when css_extracted.buy_box_price is empty or null. An empty css_extractor result with a populated JSON-LD price is a reliable signal that a selector needs updating; log it as a selector drift alert rather than a data gap.
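
A minimal sketch of the JSON-LD path using Python's standard-library HTML parser. It assumes you already have the page HTML as a string (for example from `body.data.content`) and that the Product block, when present, follows the schema.org shape named above. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_product_jsonld(html: str) -> Optional[dict]:
    """Return price, currency, availability, and rating from the first Product block."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the page
        for item in doc if isinstance(doc, list) else [doc]:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            rating = item.get("aggregateRating") or {}
            return {
                "price": offers.get("price"),
                "currency": offers.get("priceCurrency"),
                "availability": offers.get("availability"),
                "rating": rating.get("ratingValue"),
            }
    return None
```

A `None` result means no Product block was found. If the CSS path is also empty, that combination points to a soft block rather than selector drift.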

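For the offers page (/gp/offer-listing/ASIN), JSON-LD is less useful because it only reflects the buy box winner. You need to parse the HTML offer rows to get all competing seller prices, conditions, and shipping costs. On some storefronts this URL may redirect to the PDP with the offers shown in a side panel, so verify what your target marketplace returns before committing to selectors.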

8. Multi-marketplace scraping across Amazon TLDs

Each Amazon TLD is a fully independent catalog. The same ASIN may be listed on amazon.com, amazon.co.uk, amazon.de, and amazon.co.jp with different prices, different sellers, different review counts, and different availability — treat them as separate products in your data model. Always match the proxy country to the marketplace TLD: use residential:de for amazon.de, residential:gb for amazon.co.uk, residential:jp for amazon.co.jp. Mismatched proxies produce incorrect pricing, missing buy boxes, and sometimes marketplace redirects.
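
One way to enforce the proxy rule is to derive the proxy from the URL rather than passing it separately. The sketch below includes only the proxy strings named in this guide; the mapping and function names are hypothetical, and unknown marketplaces raise an error instead of defaulting to a mismatched country.

```python
from urllib.parse import urlparse

# Proxy values as named in this guide; extend for other marketplaces.
PROXY_BY_DOMAIN = {
    "amazon.com": "residential:us",
    "amazon.co.uk": "residential:gb",
    "amazon.de": "residential:de",
    "amazon.co.jp": "residential:jp",
}


def proxy_for_url(url: str) -> str:
    """Return the proxy string whose country matches the marketplace TLD."""
    host = (urlparse(url).hostname or "").lower()
    domain = host[4:] if host.startswith("www.") else host
    try:
        return PROXY_BY_DOMAIN[domain]
    except KeyError:
        raise ValueError(f"No proxy mapping for marketplace {domain!r}") from None


print(proxy_for_url("https://www.amazon.de/dp/B08N5WRWNW"))  # residential:de
```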

Currency and VAT display rules differ by marketplace. German and French storefronts display VAT-inclusive prices by default; UK prices include VAT for consumer-facing listings. Store the raw price string and currency code from each marketplace independently, and normalize to a base currency in your ETL layer — do not rely on Amazon to expose USD equivalents on non-US storefronts.
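
A sketch of turning raw price strings into decimals while keeping marketplace and currency explicit. The decimal-separator table is an assumption about each storefront's display convention, not something confirmed by Amazon, so validate it against real samples. Currency conversion is deliberately left to the ETL layer.

```python
import re
from decimal import Decimal

# marketplace -> (currency code, decimal separator); assumed conventions.
MARKETPLACE_FORMATS = {
    "amazon.com": ("USD", "."),
    "amazon.co.uk": ("GBP", "."),
    "amazon.de": ("EUR", ","),
    "amazon.fr": ("EUR", ","),
    "amazon.co.jp": ("JPY", "."),
}


def parse_price(raw: str, marketplace: str) -> tuple[Decimal, str]:
    """Convert a raw price string to (amount, currency code)."""
    currency, decimal_sep = MARKETPLACE_FORMATS[marketplace]
    # Drop currency symbols and every kind of whitespace.
    digits = re.sub(r"[^\d.,]", "", raw)
    if not digits:
        raise ValueError(f"No numeric content in price string {raw!r}")
    if decimal_sep == ",":
        # "1.299,00" -> "1299.00"
        digits = digits.replace(".", "").replace(",", ".")
    else:
        # "1,299.00" -> "1299.00"
        digits = digits.replace(",", "")
    return Decimal(digits), currency


print(parse_price("$1,299.00", "amazon.com"))
print(parse_price("29,99 €", "amazon.de"))
```

Storing the raw string next to the parsed amount keeps the original evidence available if a separator assumption later turns out to be wrong.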

For price monitoring across regions, include marketplace as a first-class dimension on every row in your data store. Index on (asin, marketplace, scraped_at) to support time-series queries and cross-market price differential analysis.
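
As an illustration, the indexing advice can be expressed with SQLite from the Python standard library. The table and column names are hypothetical; here the composite primary key supplies the (asin, marketplace, scraped_at) index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE price_observations (
        asin        TEXT NOT NULL,
        marketplace TEXT NOT NULL,
        scraped_at  TEXT NOT NULL,  -- ISO 8601 UTC timestamp
        raw_price   TEXT,
        currency    TEXT,
        PRIMARY KEY (asin, marketplace, scraped_at)
    )
    """
)

observations = [
    ("B08N5WRWNW", "amazon.de", "2026-01-01T00:00:00Z", "29,99 €", "EUR"),
    ("B08N5WRWNW", "amazon.com", "2026-01-01T00:00:00Z", "$29.99", "USD"),
]
conn.executemany(
    "INSERT INTO price_observations VALUES (?, ?, ?, ?, ?)", observations
)

# Time series for one ASIN, grouped by marketplace for side-by-side comparison.
rows = conn.execute(
    """
    SELECT marketplace, raw_price
    FROM price_observations
    WHERE asin = ?
    ORDER BY marketplace, scraped_at
    """,
    ("B08N5WRWNW",),
).fetchall()
print(rows)
```

The sample rows are invented for the demonstration and are not observed prices.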

9. Amazon Terms of Service and legal considerations

Amazon's Conditions of Use explicitly restrict automated access to their platform without prior written permission. The legal landscape for web scraping public product data continues to evolve — hiQ v. LinkedIn established that scraping publicly accessible data is not automatically a CFAA violation, but it does not provide blanket authorization for commercial scraping of any platform, and Amazon has pursued legal action against scrapers independently of CFAA arguments.

Many teams operate MAP monitoring programs under contractual relationships with brands or Amazon's own Brand Registry tooling — this is a different legal posture than unilateral competitive scraping. Before deploying a production Amazon scraper at scale, confirm the use case with legal counsel. Scope your data collection to public product and review text on PDPs; do not collect buyer names, shipping addresses, order histories, or any account-linked identities. OmniScrape provides the technical infrastructure for making HTTP requests; determining whether a specific use case is legally permissible in your jurisdiction is your responsibility.

Frequently asked questions

Should I scrape Amazon search results or go directly to PDP URLs?

Go directly to PDP URLs keyed by ASIN whenever possible. Search result pages trigger CAPTCHA and bot scoring much faster than PDPs, encode session state in URLs that expire, and have less stable HTML structure. Build your ASIN list from brand feeds, licensed catalog data, or low-volume category browse — then refresh individual PDPs on a schedule. This keeps detection risk low and your data model clean.

Why is the buy box price empty in my scrape?

There are four common causes: (1) selector drift from an A/B layout test, where the price moved out of #corePrice_feature_div; (2) the price loads via JavaScript after first paint and a fast HTTP fetch returned the pre-render HTML; (3) the proxy country does not match the marketplace TLD, causing Amazon to suppress the buy box; (4) a soft block returned a dog page with HTTP 200. To diagnose, check body.metadata.method_used: if it is 'fast', retry with mode js_rendering and js_wait_selector set to '#corePrice_feature_div'. Also request the same URL with output_format html and parse the JSON-LD; if JSON-LD has a price but css_extracted does not, you have selector drift.

Does OmniScrape solve Amazon CAPTCHAs automatically?

Yes — mode auto escalates to a full headless browser session when Amazon serves a CAPTCHA or challenge page, and OmniScrape's Web Unlocker handles the solve. Success rate is high for PDP URLs with residential proxies but is not guaranteed at extreme concurrency on search endpoints. Keep per-IP request rates modest, prefer PDP URLs over search, and stagger requests rather than bursting. See web scraping without getting blocked for rate discipline patterns.

How do I track all sellers on a single ASIN, not just the buy box winner?

Scrape the offer-listing URL (/gp/offer-listing/ASIN) which lists all competing offers in a table. Parse each row for seller name, price, condition (New/Used), fulfillment type (FBA/FBM), and shipping cost. The buy box winner is marked separately. You can also monitor buy box rotation on the PDP by polling #sellerProfileTriggerId and alerting when the seller name changes — this is lighter weight than parsing the full offers page on every cycle.

How do I scrape Amazon review pagination without getting blocked?

Use the /product-reviews/ASIN endpoint with ?pageNumber=N. Keep concurrency at 1 request per ASIN at a time, add a delay of several seconds between page requests, and use residential proxies. Amazon gates deep pagination (beyond page 5–10) more aggressively than early pages. If you hit a login prompt, that session or IP has been flagged — rotate proxy and resume from the last successful page. Avoid scraping reviews for hundreds of ASINs simultaneously from the same IP pool.

Can I scrape Amazon product data for price comparison or MAP monitoring?

Technically yes, but the legal permissibility depends on your use case and jurisdiction. MAP monitoring on behalf of brands you represent or have contracts with is a common and generally lower-risk use case. Building a public price comparison engine that republishes Amazon data at scale is higher risk — Amazon's ToS prohibit this and they actively enforce it. Confirm your specific use case with legal counsel before deploying at production scale.

How do I handle Amazon's different category templates in my selectors?

The safest approach is to maintain a selector map per category type (electronics, books, grocery, apparel) and detect which template you received by checking for landmark elements. Alternatively, use the JSON-LD fallback as your primary extraction path for fields like price and rating — it's template-agnostic. For fields that only exist in the DOM (BSR, availability, seller name), write defensive selectors that try multiple candidates in order and log which one matched, so you can track template drift over time.
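
The "try multiple candidates in order and log which one matched" idea can be sketched independently of any parsing library. In the example below, `lookup` is a hypothetical callable that returns the text for a selector (or None), and the simulated dictionary stands in for a real DOM query. The two list-price selectors are the ones mentioned earlier in this guide.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("selector_drift")


def first_match(
    field: str,
    candidates: Sequence[str],
    lookup: Callable[[str], Optional[str]],
) -> Optional[tuple[str, str]]:
    """Return (selector, value) for the first candidate that yields text."""
    for position, selector in enumerate(candidates):
        value = lookup(selector)
        if value and value.strip():
            if position > 0:
                # A fallback matched, so the primary selector may have drifted.
                logger.warning(
                    "field=%s matched fallback #%d: %s", field, position, selector
                )
            return selector, value.strip()
    logger.error("field=%s matched none of %d selectors", field, len(candidates))
    return None


# Simulated page where only the second list-price selector matches.
simulated_page = {".a-text-price .a-offscreen": " $39.99 "}
result = first_match(
    "list_price",
    [".basisPrice .a-offscreen", ".a-text-price .a-offscreen"],
    simulated_page.get,
)
print(result)  # ('.a-text-price .a-offscreen', '$39.99')
```

Aggregating the warning logs by field and selector over time shows which category templates are drifting and which candidates can be retired.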

Related guides

  • E-commerce Web Scraping: Catalog Intelligence at Production Scale
  • Price Monitoring with Web Scraping: A Practical Developer Guide
  • Web Scraping with Python
  • How to Bypass PerimeterX (HUMAN Security) When Web Scraping
