OmniScrape

Shopify Scraper: Products, Variants, and JSON Endpoints

Shopify's predictable URL conventions make it one of the more approachable e-commerce targets — until you hit password gates, per-merchant Cloudflare, or themes that hide prices behind JavaScript variant pickers. The /products.json endpoint is a genuine shortcut for structured data on cooperative stores, but it is far from universal. Many high-traffic DTC brands rate-limit or disable it entirely.

This guide covers the full extraction path: start with the JSON API, fall back to DOM scraping with CSS selectors, and escalate to JavaScript rendering only when prices or inventory are injected after variant selection. For broader pipeline design, see ecommerce web scraping. For JavaScript-heavy pages, see scrape JavaScript rendered pages.

On this page

  1. DTC catalog fields from Shopify
  2. Shopify URL patterns
  3. products.json: try this first
  4. Theme HTML when JSON fails
  5. Shopify bot protection layers
  6. Fetch products.json via OmniScrape
  7. Scrape a product page with CSS selectors
  8. Variant selection and JavaScript-rendered prices
  9. Competitive monitoring ethics and legal boundaries
  10. FAQ

1. DTC catalog fields from Shopify

Competitive catalog monitoring typically needs variant-level granularity — not just the product title, but the specific SKU, option combination (size + color), current price, and compare-at price that signals a markdown. Inventory signals are valuable even when exact counts are hidden; a sold-out variant tells you something about demand.

The fields below are available from /products.json on stores that expose it, and from PDP HTML on stores that do not. The JSON API is authoritative for structured fields like created_at and tags; HTML scraping is required for fields rendered client-side.

  • Product handle, title, vendor, and product_type
  • Variant ID, SKU, price (a decimal string such as "29.99"), compare_at_price (a string, or null when unset)
  • Option names and values per variant (e.g., Size: M, Color: Navy)
  • available boolean and inventory_quantity when exposed in JSON
  • Product tags and collection membership for category inference
  • Primary and gallery image URLs with alt text
  • Product description as raw HTML (body_html in JSON)
  • created_at and updated_at timestamps from the JSON API
  • Metafields if exposed via theme liquid (rarely in JSON, sometimes in HTML)
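As a rough illustration of how these fields fit together, the sketch below flattens one product object into per-variant records. It assumes the common /products.json shape (an options list with name keys, variants carrying option1 to option3, and prices as decimal strings such as "49.00"); the function name and the on_sale flag are illustrative, not part of any API.

```python
from decimal import Decimal


def flatten_variants(product):
    """Turn one product object into a list of flat, per-variant records."""
    option_names = [opt["name"] for opt in product.get("options", [])]
    rows = []
    for variant in product.get("variants", []):
        # Variants carry option1/option2/option3; pair them with option names.
        values = [variant.get(f"option{i}") for i in range(1, len(option_names) + 1)]
        price = Decimal(variant["price"])  # assumes a decimal string
        compare_raw = variant.get("compare_at_price")
        compare_at = Decimal(compare_raw) if compare_raw else None
        rows.append({
            "handle": product["handle"],
            "variant_id": variant["id"],
            "sku": variant.get("sku"),
            "options": dict(zip(option_names, values)),
            "price": price,
            "compare_at_price": compare_at,
            # A compare-at price above the current price signals a markdown.
            "on_sale": compare_at is not None and compare_at > price,
            "available": variant.get("available"),
        })
    return rows
```

Keeping prices as Decimal rather than float avoids rounding drift when you later compute discounts or diff prices across runs.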

2. Shopify URL patterns

Every Shopify storefront — whether on a myshopify.com subdomain or a custom domain — follows the same routing conventions. This means discovery logic written for one store transfers directly to another. The key paths to know are the product JSON endpoint, the collection JSON endpoint, and the XML sitemap.

Pagination on /products.json uses page and limit query parameters. The maximum limit is 250. When a page returns fewer results than the limit you requested, you have reached the end of the catalog. Always check the products array length rather than relying on a total count field, because the endpoint does not return one.

  • Product page: https://brand.com/products/wireless-earbuds
  • Product JSON: https://brand.com/products/wireless-earbuds.json
  • All products JSON: https://brand.com/products.json?limit=250&page=1
  • Collection page: https://brand.com/collections/new-arrivals
  • Collection products JSON: https://brand.com/collections/new-arrivals/products.json?limit=250
  • Product sitemap: https://brand.com/sitemap_products_1.xml (additional files such as _2 and _3 are listed in /sitemap.xml)
  • Password gate: https://brand.com/password — do not attempt to bypass; stop here
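Because the routing is uniform, URL construction can live in a couple of small helpers. The following is a minimal sketch using only the paths listed above; both function names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit


def products_json_url(base_url, page=1, limit=250, collection=None):
    """Build the catalog JSON URL, optionally scoped to one collection handle."""
    path = f"/collections/{collection}/products.json" if collection else "/products.json"
    query = urlencode({"limit": limit, "page": page})
    return f"{base_url.rstrip('/')}{path}?{query}"


def handle_from_product_url(url):
    """Return the product handle from a /products/<handle> URL, else None."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if "products" in parts:
        idx = parts.index("products")
        if idx + 1 < len(parts):
            handle = parts[idx + 1]
            # Strip the .json suffix used by the per-product JSON endpoint.
            return handle[:-5] if handle.endswith(".json") else handle
    return None
```

For example, products_json_url("https://brand.com", collection="new-arrivals") produces the collection-scoped JSON URL, and handle_from_product_url accepts either the page URL or its .json variant.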

3. products.json: try this first

Before rendering any page in a browser, issue a GET to /products.json?limit=250. On a cooperative store, Shopify returns a JSON object with a products array — each element contains the full variant list, option definitions, image URLs, tags, and timestamps. No CSS selectors, no DOM parsing, no JavaScript execution required.

Pagination is straightforward: increment the page parameter from 1 upward until the returned products array has fewer than limit items. For a 600-product catalog at limit=250, you need three requests: pages 1, 2, and 3.
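The loop described above can be sketched as follows. The fetch_page callable is a placeholder you supply (it is not an OmniScrape or Shopify function); it should return the parsed products list for a given page and limit. The max_pages guard is an added safety assumption.

```python
def fetch_all_products(fetch_page, limit=250, max_pages=1000):
    """Collect products page by page until a short page signals the end.

    fetch_page(page, limit) is a caller-supplied function (hypothetical here)
    that returns the parsed "products" list for that page.
    """
    products = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, limit)
        products.extend(batch)
        # A page shorter than the limit (including an empty one) is the last.
        if len(batch) < limit:
            break
    return products
```

Note that a catalog whose size is an exact multiple of the limit costs one extra request, because the final full page is followed by an empty one.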

Watch for these failure modes: a 404 means the store has disabled the endpoint at the theme level; a 403 or redirect to /password means the store is gated; an empty products array on page 1 means the endpoint is live but the store has explicitly hidden all products from it. In all three cases, escalate to HTML scraping of individual PDPs discovered via the sitemap.
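One way to encode these failure modes is a small classifier that runs on the first page's response. This is a sketch under the assumption that your HTTP layer gives you the status code, the final URL after redirects, and the body text; the label strings are arbitrary.

```python
import json


def classify_products_response(status, final_url, body):
    """Label a page-1 /products.json response.

    Returns one of: 'ok', 'disabled', 'gated', 'rate_limited',
    'empty', or 'unparseable'.
    """
    if status == 404:
        return "disabled"
    if status == 403 or final_url.rstrip("/").endswith("/password"):
        return "gated"
    if status == 429:
        return "rate_limited"
    try:
        products = json.loads(body).get("products", [])
    except (ValueError, TypeError, AttributeError):
        # Not JSON, or JSON that is not an object (for example an HTML page).
        return "unparseable"
    return "ok" if products else "empty"
```

Anything other than 'ok' or 'rate_limited' is a reasonable trigger for the sitemap-plus-PDP fallback.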

Rate limiting on /products.json is real. Shopify's platform applies 429 responses on burst traffic, especially on smaller stores on shared infrastructure. Space requests by at least one second per page when paginating, and implement exponential backoff on 429.
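A minimal backoff sketch is shown below. The send and sleep callables are injected so the logic stays independent of any particular HTTP client; the base delay, cap, and retry count are illustrative defaults rather than values Shopify documents.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=60.0, jitter=0.0, rng=random.random):
    """Yield wait times in seconds: base, 2*base, 4*base, ... capped at cap."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay + jitter * rng()


def get_with_backoff(send, sleep, max_retries=5):
    """Call send() until it returns a non-429 status or retries run out.

    send() returns (status, body); sleep(seconds) pauses. Both are supplied
    by the caller.
    """
    status, body = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

In production you would typically pass time.sleep as sleep and set jitter above zero so that parallel workers do not retry in lockstep.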

4. Theme HTML when JSON fails

When /products.json is unavailable, scrape individual product detail pages (PDPs). Shopify's default Dawn theme uses a consistent set of CSS classes that you can generally rely on across Dawn-based stores. The current price lives in span.price-item--regular, the crossed-out compare-at price in span.price-item--compare, the product title in h1.product__title, and the vendor in an a or span element with class product__vendor. Class names can shift between Dawn versions, so confirm each selector against the target store's markup before relying on it.

Variant pickers in Dawn use either a native select element or a set of radio inputs, both carrying data-option-name attributes that identify which option dimension they control (Size, Color, etc.). The selected variant's ID is written to an input[name='id'] hidden field when a variant is chosen.

Custom themes break all of this. A heavily customized store may use completely arbitrary class names, React or Vue components that render no static HTML, or a headless frontend that fetches product data from a separate API. When you encounter a blank or near-empty DOM, check the page source for an embedded JSON blob — many Shopify themes inject the full product object into a script tag as window.ShopifyAnalytics.meta or a JSON-LD block. Parsing that blob is faster and more reliable than scraping rendered HTML.

To find the embedded JSON, look for a script tag containing 'product' and 'variants' in the raw HTML response. A simple regex or a JSON-LD parser on application/ld+json script tags will surface structured product data even when the visible DOM is sparse.
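The JSON-LD route can be sketched with the standard library alone. The example assumes the page carries a schema.org Product object in an application/ld+json script tag; the class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False


def find_product_json_ld(html):
    """Return the first JSON-LD object whose @type is Product, or None."""
    parser = JsonLdCollector()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except ValueError:
            continue
        # A block may hold a single object or a list of objects.
        for item in doc if isinstance(doc, list) else [doc]:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return item
    return None
```

JSON-LD usually exposes name, offers, and sku, but not the full variant matrix, so treat it as a fallback for headline fields rather than a replacement for the product JSON.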

5. Shopify bot protection layers

Shopify's platform applies its own bot mitigation at the infrastructure level, separate from any app a merchant installs. This manifests as 429 rate limits on JSON endpoints, JavaScript challenges on storefront pages under heavy load, and occasional CAPTCHA injection on checkout flows. For catalog scraping (not checkout), the platform-level protection is manageable with residential proxies and reasonable request rates.

Merchant-installed apps add a second layer. Locksmith and similar access-control apps can gate entire collections or individual products behind login or password prompts. These are application-level gates, not network-level blocks — the page loads but renders a form instead of product content. Detect them by checking for a password input or a login redirect in the response.
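A simple heuristic for that check is sketched below. It assumes you have the final URL after redirects and the response HTML; the /account/login path is Shopify's usual customer login route, and the regex is deliberately loose, so expect occasional false positives on pages that embed a login form alongside product content.

```python
import re

PASSWORD_INPUT = re.compile(r"<input[^>]*type=[\"']password[\"']", re.IGNORECASE)


def looks_gated(final_url, html):
    """Heuristic: True when the response is a password or login form."""
    path = final_url.split("?", 1)[0].rstrip("/")
    if path.endswith("/password") or path.endswith("/account/login"):
        return True
    return bool(PASSWORD_INPUT.search(html))
```

When this returns True, record the URL as gated and move on rather than retrying.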

Merchants who route their custom domain through Cloudflare introduce a third layer. Cloudflare's bot score system can block or challenge requests that look automated, even on stores where /products.json would otherwise be open. Use mode auto with enable_solver: true and a residential proxy to handle Cloudflare challenges transparently. See Cloudflare bypass for a full breakdown.

  • 429 rate limits on /products.json at sustained high request rates
  • Password-protected pre-launch or wholesale stores — do not bypass
  • Per-merchant Cloudflare bot scoring on custom domains
  • JavaScript-only price rendering after variant selection in some themes
  • Inventory counts hidden until a specific variant is selected via theme JS
  • Locksmith and similar apps gating collections behind login
  • IP-based geo-restrictions on certain regional DTC stores

6. Fetch products.json via OmniScrape

Use output_format html to retrieve the raw JSON response body as a string in data.content — then parse it as JSON in your application. The mode auto setting handles both plain HTTP responses and any lightweight JavaScript challenges Shopify may inject. A residential US proxy reduces the likelihood of geo-based blocks and mimics the traffic pattern of a real shopper.

The response body in data.content will be a JSON string. Parse it with JSON.parse(body.data.content) to access the products array. Check that the array is non-empty before paginating: an empty array on page 1 means the endpoint is live but exposes no products for this store, so fall back to PDP scraping.

Shopify products.json request
json
{
  "url": "https://example-brand.com/products.json?limit=250&page=1",
  "mode": "auto",
  "output_format": "html",
  "proxy": "residential:us"
}
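On the receiving side, unpacking the response might look like the sketch below. It assumes only the data.content field described above and that you have already decoded the API response into a dict; the helper name is hypothetical and any other envelope fields are ignored.

```python
import json


def products_from_api_response(api_body):
    """Extract the products list from a decoded OmniScrape response dict.

    Returns [] when content is missing or is not the expected JSON shape.
    """
    content = (api_body.get("data") or {}).get("content")
    if not content:
        return []
    try:
        parsed = json.loads(content)
    except ValueError:
        # The store returned HTML (for example a challenge or password page).
        return []
    products = parsed.get("products") if isinstance(parsed, dict) else None
    return products if isinstance(products, list) else []
```

Returning an empty list for every failure keeps the caller's pagination check simple, at the cost of hiding why the page was unusable; log the raw content if you need to tell the cases apart.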

7. Scrape a product page with CSS selectors

When /products.json returns a 404, 403, or empty array, fall back to scraping individual PDPs. Use output_format css_extractor with a css_selectors map targeting Dawn theme classes. The OmniScrape API evaluates the selectors server-side and returns extracted values in data.css_extracted — no HTML parsing in your application code required.

The selectors below work reliably on Dawn-based stores. For custom themes, inspect the target store once and update the selector map accordingly. If a selector returns an empty string, the theme likely renders that field via JavaScript after page load — in that case, switch to mode js_rendering with a js_wait_selector to ensure the element is present before extraction.

For stores that embed product data in a JSON-LD script tag, you can also fetch with output_format html and extract the ld+json block from data.content — this avoids selector fragility entirely.

Shopify PDP CSS extraction request
json
{
  "url": "https://example-brand.com/products/classic-hoodie",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "css_selectors": {
    "title": "h1.product__title",
    "price": "span.price-item--regular",
    "compare_at": "span.price-item--compare",
    "description": "div.product__description",
    "vendor": "span.product__vendor",
    "sku": "span.product__sku",
    "availability": "button[name='add']"
  }
}
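Extracted values arrive as display text, so a little post-processing is usually needed. The sketch below assumes data.css_extracted is a flat mapping of selector names to strings and that prices use a dot as the decimal separator with optional comma thousands separators (stores formatted as "49,00 €" need different handling). Both helper names are illustrative.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Pull a decimal amount out of display text like '$1,049.00 USD'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return Decimal(match.group(0).replace(",", "")) if match else None


def missing_fields(extracted, required=("title", "price")):
    """List required keys whose extracted value is absent or blank."""
    return [key for key in required if not (extracted.get(key) or "").strip()]
```

A non-empty result from missing_fields is a practical signal to retry the same URL with JavaScript rendering.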

8. Variant selection and JavaScript-rendered prices

Some Shopify themes, particularly heavily customized ones and those using React-based headless frontends, do not render the current variant's price in the initial HTML. The price element is either present but empty, or populated only after a client-side state update such as variant selection. If your CSS extractor returns an empty price field, this is the likely cause.

The cleanest solution is to parse the embedded product JSON that most Shopify themes inject into the page source. Look for a script tag containing a JSON object with a variants key — it will include price and compare_at_price for every variant without requiring any JavaScript execution. Fetch the page with output_format html, retrieve the raw HTML from data.content, and extract the JSON blob with a regex or an HTML parser targeting script[type='application/json'] or script[id='ProductJson'].
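A regex-based version of that extraction is sketched here. It assumes the theme emits the product object inside a script tag with type application/json; the function name is hypothetical. Price values are returned untouched because theme-embedded JSON may express them in integer cents rather than the decimal strings used by /products.json, so check the units on each store.

```python
import json
import re

SCRIPT_JSON = re.compile(
    r"<script[^>]*type=[\"']application/json[\"'][^>]*>(.*?)</script>",
    re.IGNORECASE | re.DOTALL,
)


def embedded_variant_prices(html):
    """Map variant id -> (price, compare_at_price) from the first embedded
    JSON object that has a "variants" list, or return {} if none is found."""
    for raw in SCRIPT_JSON.findall(html):
        try:
            data = json.loads(raw)
        except ValueError:
            continue
        if isinstance(data, dict) and isinstance(data.get("variants"), list):
            return {
                v["id"]: (v.get("price"), v.get("compare_at_price"))
                for v in data["variants"]
                if isinstance(v, dict) and "id" in v
            }
    return {}
```

An empty result means the theme does not embed the object in this form, which is the cue to try JSON-LD or JavaScript rendering instead.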

When the embedded JSON approach is not viable — for example, on a fully headless store that loads product data via a client-side API call — use mode js_rendering with a js_wait_selector targeting the price element. This tells OmniScrape to hold the headless browser open until the element appears in the DOM before returning the rendered HTML.

See scrape JavaScript rendered pages for a full treatment of js_wait_selector patterns and timeout configuration.

JS-rendered variant price extraction
json
{
  "url": "https://example-brand.com/products/classic-hoodie",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "js_wait_selector": "span.price-item--regular",
  "js_wait_timeout": 5000,
  "css_selectors": {
    "title": "h1.product__title",
    "price": "span.price-item--regular",
    "compare_at": "span.price-item--compare",
    "sku": "span.product__sku"
  }
}
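If you build requests programmatically, escalation can be a small transformation of the original payload. This sketch uses only the request fields shown in the examples above; the helper name and the default timeout (copied from the example request) are illustrative.

```python
def escalate_to_js(request, wait_selector, timeout=5000):
    """Return a copy of a request payload switched to js_rendering."""
    upgraded = dict(request)  # shallow copy; the original stays unchanged
    upgraded.update(
        mode="js_rendering",
        js_wait_selector=wait_selector,
        js_wait_timeout=timeout,
    )
    return upgraded
```

Keeping the first attempt on mode auto and escalating only on empty fields avoids paying browser overhead for stores that render prices server-side.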

9. Competitive monitoring ethics and legal boundaries

Scraping publicly accessible product pages for competitive price intelligence is a common use case. Courts in multiple jurisdictions have found that scraping publicly available data does not inherently constitute unauthorized access, but this is not a blanket permission: store terms of service, regional data protection laws, and the nature of the data all matter.

Password-protected stores are a hard boundary. A /password gate signals that the merchant has restricted access. Attempting to bypass it, whether by replaying session tokens, brute-forcing credentials, or exploiting application logic, can constitute unauthorized access under computer fraud statutes in many jurisdictions. Do not do it.

Checkout flows, customer account pages, and order history are out of scope for competitive monitoring. These contain personal data and are typically restricted by a store's terms of service. Stick to public catalog pages: PDPs, collection pages, and the /products.json endpoint.

Rate-limit your requests to avoid service degradation for the store's actual customers. A reasonable ceiling for catalog monitoring is one request per second per domain. Implement backoff on 429 responses and do not retry aggressively. Being a considerate scraper reduces the likelihood of IP blocks and keeps your monitoring sustainable.
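A per-domain throttle that enforces this ceiling can be sketched as follows. The clock and sleep functions are injectable purely so the class can be tested without real waiting; the class name and the one-second default mirror the guidance above and are not an OmniScrape feature.

```python
import time


class DomainThrottle:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # domain -> time of the most recent request

    def wait(self, domain):
        """Block until min_interval has passed since the last call for domain."""
        now = self._clock()
        last = self._last.get(domain)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last[domain] = now
```

Call throttle.wait(domain) immediately before each request; requests to different domains do not delay one another.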

Frequently asked questions

Should I use products.json or HTML scraping for a Shopify store?

Always try /products.json?limit=250&page=1 first. It returns structured variant-level data with no DOM parsing. Fall back to PDP HTML scraping only when the JSON endpoint returns a 404, 403, redirects to /password, or returns an empty products array. HTML scraping is slower, more fragile to theme changes, and requires selector maintenance.

How do I paginate through a full Shopify catalog?

Increment the page parameter starting from 1, keeping limit at 250. Stop when the returned products array contains fewer items than the limit value — Shopify does not return a total count field, so array length is your termination signal. For a 600-product store you need three requests: page=1, page=2, page=3 (the third returns 100 items, signaling the end).

How do I discover all product handles without paginating products.json?

Fetch the XML sitemap at /sitemap_products_1.xml. Shopify generates one sitemap file per 5,000 products and lists additional files in /sitemap.xml. Each products sitemap contains the canonical URL for every product, from which you can extract the handle. This approach works even when /products.json is disabled, and it gives you the full URL list for PDP scraping.
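Extracting handles from a products sitemap might look like the sketch below. It assumes the standard sitemaps.org namespace and that product URLs end in /products/<handle>, optionally behind a locale prefix; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def handles_from_sitemap(xml_text):
    """Return product handles found in <url><loc> entries of a sitemap."""
    root = ET.fromstring(xml_text)
    handles = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        path = urlsplit((loc.text or "").strip()).path
        parts = [p for p in path.split("/") if p]
        # Keep only .../products/<handle>; other entries are skipped.
        if len(parts) >= 2 and parts[-2] == "products":
            handles.append(parts[-1])
    return handles
```

Entries that are not product URLs, such as a store homepage, are ignored rather than raising an error.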

Why does Cloudflare appear on a Shopify store?

Shopify's infrastructure does not include Cloudflare by default, but merchants can route their custom domain through Cloudflare independently. When they do, Cloudflare's bot score system evaluates every request. Use mode auto with enable_solver: true and a residential proxy — OmniScrape's Web Unlocker handles the challenge automatically. See Cloudflare bypass for detailed configuration.

How do I get prices for all variants, not just the default?

The /products.json endpoint includes price and compare_at_price for every variant in the variants array — this is the most reliable method. For PDP HTML scraping, look for an embedded JSON blob in a script tag (often script[id='ProductJson'] or a script containing Shopify.product). It contains the full variant price matrix without requiring JavaScript execution. Only resort to js_rendering with variant click simulation when neither of these approaches is available.

Can I get real-time inventory counts from Shopify?

Only if the store exposes them. The /products.json endpoint includes an available boolean per variant on all stores, and inventory_quantity when the merchant has not hidden it (Shopify allows merchants to hide exact counts). PDP HTML sometimes shows 'Only 3 left' or similar text, but this is theme-dependent. For monitoring purposes, treat inventory as a boolean in_stock signal derived from the available field — exact counts are unreliable across stores.
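Treating stock as a boolean makes change detection between two monitoring runs straightforward. The sketch below assumes each product carries a variants list with id and available keys, as in /products.json; both function names are illustrative.

```python
def in_stock_map(products):
    """Map variant id -> bool from the per-variant available field."""
    return {
        variant["id"]: bool(variant.get("available"))
        for product in products
        for variant in product.get("variants", [])
    }


def stock_changes(previous, current):
    """Return (sold_out, restocked) variant-id lists between two snapshots."""
    sold_out = sorted(
        vid for vid, ok in current.items() if not ok and previous.get(vid) is True
    )
    restocked = sorted(
        vid for vid, ok in current.items() if ok and previous.get(vid) is False
    )
    return sold_out, restocked
```

Variants that appear only in the newer snapshot are reported in neither list, since there is no earlier state to compare against.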

What is the difference between mode auto and mode js_rendering for Shopify?

Mode auto tries a fast HTTP request first and escalates to a headless browser only if the response indicates a JavaScript challenge or the page is empty. For /products.json and most static PDPs, auto resolves via HTTP without browser overhead. Use mode js_rendering explicitly when you know the price or inventory element is injected by client-side JavaScript after page load — pair it with js_wait_selector targeting the element you need to ensure it is present before the page is returned.

Related guides

  • E-commerce Web Scraping: Catalog Intelligence at Production Scale
  • Price Monitoring with Web Scraping: A Practical Developer Guide
  • How to Bypass Cloudflare When Web Scraping
  • Scrape JavaScript-Rendered Pages: SPAs, Hydration, and Hidden APIs
