OmniScrape

Zillow Scraper: Extract Listings, Zestimates, and Price History

Zillow is the dominant public-facing real estate portal in the US, and one of the most aggressively defended consumer sites on the web. Individual property pages (homedetails) are JavaScript-rendered, search endpoints are geo-fenced and rate-limited, and MLS licensing rules restrict how scraped listing data can be redistributed — even when the HTML is technically accessible in a browser.

This guide focuses on individual property pages keyed by zpid: how the DOM is structured, which selectors are stable, how to handle lazy-loaded modules like price history, and how to build a maintainable refresh pipeline. For broader market-level pipelines and compliance framing, read Real Estate Web Scraping: Listings, Comps, and Market Data first.

Every request example uses OmniScrape's Web Unlocker with residential US proxies and `js_rendering`, the minimum viable configuration for Zillow homedetails pages at any meaningful scale.

On this page

  1. Property data fields available on Zillow
  2. Zillow URL patterns and zpid extraction
  3. Zillow homedetails page DOM structure
  4. Zillow anti-bot protection and MLS constraints
  5. Scrape a Zillow property page by zpid
  6. Extracting the price history module
  7. Why not scrape Zillow search results
  8. Rental vs. for-sale listing pipelines
  9. MLS licensing, copyright, and Zillow terms of service
  10. FAQ

1. Property data fields available on Zillow

Zillow surfaces a wide range of structured fields on each homedetails page, drawn from MLS feeds, public records, and Zillow's own valuation models. Understanding which fields exist — and which are reliable — is the first step before writing any selectors.

Investors and analysts most commonly track price cuts, days on market, and Zestimate trends over time. PropTech products enrich address databases with Zestimate, tax history, and school ratings. Macro analysts model zip- and metro-level trends from listing snapshots. Each use case has a different refresh cadence and field priority.

  • zpid — Zillow Property ID, the stable primary key for every listing
  • Address components: street, city, state, zip, county
  • List price, price per square foot, and price change delta
  • Zestimate (Zillow's automated valuation) and Zestimate range
  • Bedrooms, full bathrooms, half bathrooms
  • Interior square footage and lot size
  • Property type: single-family, condo, townhouse, multi-family, land
  • Days on Zillow and cumulative days on market
  • Price history events: listed, sold, price reduced, relisted
  • Tax history: assessed value and annual tax amount by year
  • HOA fees (monthly), year built, garage/parking details
  • Heating, cooling, and utility fields from public records
  • Agent name, brokerage, and MLS listing ID
  • Rental Zestimate on dual for-sale/rental listings
  • School district and individual school ratings
  • Walk Score, Transit Score, and Bike Score

2. Zillow URL patterns and zpid extraction

Zillow's homedetails URLs follow a predictable pattern, and the zpid they contain is stable over time: it does not change even if the address slug portion of the URL does. This makes zpid the correct primary key for any pipeline. Store the canonical URL together with its zpid rather than reconstructing URLs from address strings.

Search result pages and map tile endpoints are structurally different from homedetails pages. They trigger Zillow's bot detection fastest, require JS execution to paginate, and are the most likely to change without notice. Build your inventory once, from one-time discovery crawls or from county records resolved to zpids, then operate your refresh pipeline exclusively against homedetails URLs.

To extract the zpid from a URL programmatically, match the numeric segment immediately before `_zpid` in the path. A simple regex like `/\/(\d+)_zpid/` is sufficient for the homedetails URL formats listed below.

  • For-sale property: https://www.zillow.com/homedetails/123-Main-St-Anytown-CA-90210/12345678_zpid/
  • Rental listing: same homedetails path, listing_sub_type indicates rental status
  • Recently sold: https://www.zillow.com/homedetails/..._zpid/ with sold badge in DOM
  • Search results (avoid at scale): https://www.zillow.com/homes/for_sale/San-Francisco-CA/
  • Zestimate history: embedded in the homedetails page, not a separate URL
  • zpid regex: /\/(\d+)_zpid/ on the URL pathname
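The same extraction can be sketched in Python with only the standard library. The pattern `r"/(\d+)_zpid"` is the Python form of the regex above; `extract_zpid` and `dedupe_by_zpid` are hypothetical helper names for illustration, and the dedupe step assumes you want to keep the first URL seen when several slugs point at one zpid.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Matches the numeric segment immediately before "_zpid" in a URL path.
ZPID_RE = re.compile(r"/(\d+)_zpid")


def extract_zpid(url: str) -> str | None:
    """Return the zpid from a Zillow URL, or None if the path contains none."""
    match = ZPID_RE.search(urlparse(url).path)
    return match.group(1) if match else None


def dedupe_by_zpid(urls: list[str]) -> dict[str, str]:
    """Map each zpid to the first URL seen for it; slug variants collapse to one entry."""
    inventory: dict[str, str] = {}
    for url in urls:
        zpid = extract_zpid(url)
        if zpid is not None:
            inventory.setdefault(zpid, url)
    return inventory
```

For example, `extract_zpid("https://www.zillow.com/homedetails/123-Main-St-Anytown-CA-90210/12345678_zpid/")` returns `"12345678"`, while the search-results URL above returns `None`.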

3. Zillow homedetails page DOM structure

Zillow uses React with Next.js, and the rendered DOM uses `data-testid` attributes as the primary hook for UI components. These are more stable than class names (which are hashed) but still subject to change when Zillow ships template updates. Treat your selectors as configuration that needs periodic review, not permanent infrastructure.

Key `data-testid` values on the current template: `price` for the list price span, `address` for the h1 address block, `bed-bath-sqft-fact-container` for the summary fact row, `zestimate` for the Zestimate figure, `days-on-zillow` for the market age badge, and `price-history` for the history table section.

Zillow also embeds a large JSON blob in a `<script id="__NEXT_DATA__">` tag. This blob contains the full listing object, including priceHistoryInfo events, tax history, and school data, in a structured format that is often more reliable than CSS extraction when Zillow A/B tests the visual layout. After fetching full HTML with `output_format: "html"`, parse the contents of `__NEXT_DATA__` as JSON for the most complete field set.
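A minimal sketch of pulling that blob out of the HTML with Python's standard-library `html.parser`. `extract_next_data` is a hypothetical helper; it assumes the page contains a single `__NEXT_DATA__` script tag and stops at returning the parsed top-level JSON, because where the listing object sits inside the blob is not documented here and may change.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


class _NextDataParser(HTMLParser):
    """Collects the raw text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies are passed through unparsed, so the JSON text stays intact.
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> dict | None:
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    parser = _NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None
    return json.loads("".join(parser.chunks))
```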

The price history chart and tax history table are lazy-loaded modules. Their rendered markup is not present in the initial HTML payload, so extracting them from the DOM requires JavaScript execution plus a wait for the relevant `data-testid` to appear.

4. Zillow anti-bot protection and MLS constraints

Zillow operates one of the more sophisticated bot-detection stacks among consumer real estate sites. Detection operates at multiple layers: IP reputation scoring (datacenter ranges are blocked almost immediately), TLS fingerprint analysis, browser behavior heuristics on JS-rendered pages, and request rate and pattern analysis across sessions. A plain HTTP request from a datacenter IP to a homedetails URL will typically return a CAPTCHA challenge or redirect rather than listing HTML.

Residential US proxies are required for reliable access. Even with residential proxies, aggressive crawl rates will trigger session-level blocks. A sustainable homedetails refresh pipeline operates at low concurrency with randomized delays — not a bulk parallel crawler.
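A sketch of that pacing in Python, assuming a caller-supplied `fetch` function (hypothetical; it stands in for whatever sends the Web Unlocker request). The 5 to 15 second delay bounds are illustrative placeholders, not measured Zillow thresholds; `sleep` and `rng` are injectable so the schedule can be tested without waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Iterable


def refresh_sequentially(
    urls: Iterable[str],
    fetch: Callable[[str], dict],
    min_delay: float = 5.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: random.Random | None = None,
) -> dict[str, dict]:
    """Fetch URLs one at a time (concurrency 1), sleeping a random interval between them."""
    rng = rng or random.Random()
    results: dict[str, dict] = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Uniform jitter avoids a fixed, easily recognizable request cadence.
            sleep(rng.uniform(min_delay, max_delay))
        results[url] = fetch(url)
    return results
```

Keeping the loop strictly sequential is the simplest way to guarantee low concurrency; if you need more throughput, a small number of such workers is closer to the guidance above than a bulk parallel crawler.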

Zillow has historically pursued legal action against scrapers operating at commercial scale, citing the Computer Fraud and Abuse Act and breach of terms. MLS data displayed on Zillow is licensed — scraping and redistributing listing photos, agent remarks, or MLS listing IDs may violate MLS rules independently of Zillow's own terms.

  • Datacenter IPs blocked at connection or CAPTCHA-challenged immediately
  • TLS fingerprint analysis — headless browser fingerprints detected without spoofing
  • JS-required rendering for price, Zestimate, and fact panel modules
  • Lazy-loaded price history and tax history sections require explicit wait selectors
  • Frequent `data-testid` attribute changes during template A/B tests
  • Geo restrictions on some listing types and rental markets
  • MLS copyright on photos, agent remarks, and MLS listing IDs
  • Active legal enforcement history against high-volume commercial scrapers

5. Scrape a Zillow property page by zpid

Use `js_rendering` mode for homedetails pages. Zillow's fact panel — price, beds, baths, Zestimate — does not render in a plain HTTP response. Set `js_wait_selector` to `[data-testid="price"]` so the request waits until the price module has mounted before extraction runs.

Set `proxy` to `residential:us`. Zillow's geo-detection may surface different content to, or block, non-US residential IPs. Enable the solver with `enable_solver: true` to handle any CAPTCHA challenges that appear during the session.

The `css_selectors` map below extracts the primary listing fields in a single request. If Zillow has updated a `data-testid` value, the corresponding key will return null — build null-checks into your pipeline and alert on unexpected null rates rather than silently dropping records.

Zillow homedetails CSS extraction request
json
{
  "url": "https://www.zillow.com/homedetails/456-Oak-Ave-Seattle-WA-98101/2080998900_zpid/",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "[data-testid=\"price\"]",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "price": "[data-testid=\"price\"]",
    "address": "[data-testid=\"address\"]",
    "beds_baths_sqft": "[data-testid=\"bed-bath-sqft-fact-container\"]",
    "zestimate": "[data-testid=\"zestimate\"]",
    "days_on_market": "[data-testid=\"days-on-zillow\"]",
    "property_type": "[data-testid=\"property-type-badge\"]",
    "description": "[data-testid=\"description\"]",
    "hoa_fee": "[data-testid=\"hoa-fee\"]",
    "year_built": "[data-testid=\"year-built\"]"
  }
}
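One way to act on those null-checks, sketched in Python. It assumes the `css_extractor` result arrives as a flat mapping from each `css_selectors` key to its text (or null), and that price text looks like `$1,250,000` or `$2,500/mo`; those formats are assumptions, so anything else parses to `None` and is reported as missing rather than stored as a wrong number. `REQUIRED_FIELDS`, `parse_price`, and `normalize_record` are hypothetical names.

```python
from __future__ import annotations

import re

# A null on any of these usually means a data-testid changed, not a sparse listing.
REQUIRED_FIELDS = ("price", "address", "beds_baths_sqft")

# Accepts whole-dollar figures such as "$1,250,000", "$950000", or "$2,500/mo" only.
_PRICE_RE = re.compile(r"\$(\d{1,3}(?:,\d{3})*|\d+)(?:/mo)?")


def parse_price(text: str | None) -> int | None:
    """Convert displayed price text to an int; None if absent or in an unexpected format."""
    if not text:
        return None
    match = _PRICE_RE.fullmatch(text.strip())
    return int(match.group(1).replace(",", "")) if match else None


def normalize_record(raw: dict) -> tuple[dict, list[str]]:
    """Return (record with numeric prices, required fields that are null or unparseable)."""
    record = dict(raw)
    record["price"] = parse_price(raw.get("price"))
    record["zestimate"] = parse_price(raw.get("zestimate"))
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    return record, missing
```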

6. Extracting the price history module

Price history sits in a chart section that lazy-loads after the main fact panel. It is not present in the initial DOM and requires a separate wait. Use `js_wait_selector` targeting `[data-testid="price-history"]` with a longer timeout — this module loads after several secondary network requests complete.

Fetch full HTML with `output_format: "html"` for this request. After receiving the response, parse the `<script id="__NEXT_DATA__">` tag from `body.data.content` and extract `priceHistoryInfo.priceHistory` from the JSON. This array contains structured event objects with `date`, `price`, `priceChangeRate`, `event`, and `source` fields — far cleaner than scraping the rendered table rows.

Tax history follows the same pattern: look for `taxHistory` in `__NEXT_DATA__` rather than waiting for the tax table module to render.

Zillow price history — full HTML fetch for __NEXT_DATA__ parsing
json
{
  "url": "https://www.zillow.com/homedetails/456-Oak-Ave-Seattle-WA-98101/2080998900_zpid/",
  "mode": "js_rendering",
  "output_format": "html",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "[data-testid=\"price-history\"]",
  "js_wait_timeout": 20000
}
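A sketch of the extraction step in Python, taking the `__NEXT_DATA__` contents already decoded with `json.loads`. Rather than hard-coding the full path to `priceHistoryInfo.priceHistory`, which may move between template versions, it searches the structure recursively for the first `priceHistory` list. As a precaution it also descends into string values that hold JSON-encoded objects; whether Zillow nests data that way is an assumption, not something this guide confirms. `find_key` and `price_history` are hypothetical helper names.

```python
from __future__ import annotations

import json
from typing import Any, Iterator

HISTORY_FIELDS = ("date", "price", "priceChangeRate", "event", "source")


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)
    elif isinstance(node, str) and node.lstrip().startswith("{"):
        # Precaution: descend into string values that hold JSON-encoded objects.
        try:
            decoded = json.loads(node)
        except ValueError:
            return
        yield from find_key(decoded, key)


def price_history(next_data: dict) -> list[dict]:
    """Return the first priceHistory array found, trimmed to the documented fields."""
    for value in find_key(next_data, "priceHistory"):
        if isinstance(value, list):
            return [
                {f: event.get(f) for f in HISTORY_FIELDS}
                for event in value
                if isinstance(event, dict)
            ]
    return []
```

Tax history can reuse the same search, for example `next(find_key(next_data, "taxHistory"), None)`.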

7. Why not scrape Zillow search results

Zillow's search and map endpoints are the highest-risk surface on the site. Map tile requests, pagination tokens, and the underlying GraphQL-style search API all change frequently and are monitored closely for anomalous access patterns. A crawler hitting search pagination at any meaningful rate will be blocked within minutes on datacenter IPs and within hours on residential IPs.

The practical alternative is to start from sources that are designed for bulk access: county assessor parcel data (most counties publish downloadable CSV or GIS files), USPS address databases, or licensed feeds from ATTOM, First American, or similar data providers. These sources give you the address universe rather than Zillow's identifiers, so each address still needs a one-time lookup to its zpid. Once you have a zpid list, you operate your pipeline exclusively against homedetails URLs, which are stable, predictable, and lower-risk than search endpoints.

For one-time discovery of zpids in a specific geography, a low-volume search crawl with randomized delays and residential proxies is feasible, but treat it as a bootstrapping step, not an ongoing data collection mechanism. Read Scrape JavaScript-Rendered Pages: SPAs, Hydration, and Hidden APIs for a deeper look at managing `js_rendering` costs across large URL lists.

8. Rental vs. for-sale listing pipelines

The same zpid can appear in both rental and for-sale contexts on Zillow — a property listed for sale may simultaneously show a Rental Zestimate, and a property that transitions from rental to for-sale retains its zpid. Store `listing_type` (for_sale, for_rent, recently_sold) as a dimension in your data model rather than assuming it is fixed.
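A minimal Python data model along those lines, assuming one snapshot row per observation. `ListingType`, `ListingSnapshot`, and `type_transitions` are illustrative names; the point of the sketch is that `listing_type` lives on each snapshot rather than on the property.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ListingType(str, Enum):
    FOR_SALE = "for_sale"
    FOR_RENT = "for_rent"
    RECENTLY_SOLD = "recently_sold"


@dataclass(frozen=True)
class ListingSnapshot:
    """One observation of a property; listing_type is recorded per snapshot."""
    zpid: str
    listing_type: ListingType
    price: int | None
    observed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def type_transitions(snapshots: list[ListingSnapshot]) -> list[tuple[ListingType, ListingType]]:
    """For snapshots of one zpid, return (previous, current) pairs where the type changed."""
    ordered = sorted(snapshots, key=lambda s: s.observed_at)
    return [
        (a.listing_type, b.listing_type)
        for a, b in zip(ordered, ordered[1:])
        if a.listing_type != b.listing_type
    ]
```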

Rental listings use largely the same homedetails template but surface different fields: monthly rent price instead of list price, lease term, pet policy, and laundry/parking details that may not appear on for-sale listings. The Rental Zestimate appears in a separate module from the sale Zestimate. Check `data-testid="rental-price"` and `data-testid="rental-zestimate"` for rental-specific fields.

If your pipeline covers both rental and for-sale inventory, parameterize your CSS selector map by listing type rather than using a single universal selector set. Null rates on type-specific selectors are a useful signal for detecting listing type transitions.
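A sketch of a selector map parameterized by listing type, reusing the `data-testid` values named in this guide and producing a request body in the same shape as the CSS extraction request in section 5. Waiting on `rental-price` for rentals is an assumption (this guide only documents waiting on `price`), and `recently_sold` is left out because no sold-specific selectors are documented here. `build_request` is a hypothetical helper.

```python
# Selectors shared by both listing types (values from the homedetails map in section 5).
COMMON = {
    "address": '[data-testid="address"]',
    "beds_baths_sqft": '[data-testid="bed-bath-sqft-fact-container"]',
    "days_on_market": '[data-testid="days-on-zillow"]',
}

# Type-specific selectors; a null on these can hint that the listing type changed.
BY_TYPE = {
    "for_sale": {
        "price": '[data-testid="price"]',
        "zestimate": '[data-testid="zestimate"]',
        "hoa_fee": '[data-testid="hoa-fee"]',
    },
    "for_rent": {
        "rental_price": '[data-testid="rental-price"]',
        "rental_zestimate": '[data-testid="rental-zestimate"]',
    },
}


def build_request(url: str, listing_type: str) -> dict:
    """Build a js_rendering request body whose selector map matches the listing type."""
    selectors = {**COMMON, **BY_TYPE[listing_type]}
    # Assumption: wait on the type's primary price element before extraction runs.
    wait_key = "price" if listing_type == "for_sale" else "rental_price"
    return {
        "url": url,
        "mode": "js_rendering",
        "output_format": "css_extractor",
        "proxy": "residential:us",
        "enable_solver": True,
        "js_wait_selector": selectors[wait_key],
        "js_wait_timeout": 15000,
        "css_selectors": selectors,
    }
```

Serialize the returned dict with `json.dumps` to get the request body.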

9. MLS licensing, copyright, and Zillow terms of service

Zillow's Terms of Use explicitly prohibit scraping, crawling, or automated data collection. Beyond Zillow's own terms, MLS data displayed on Zillow is subject to MLS licensing agreements that restrict downstream use of listing photos, agent remarks, and MLS listing IDs — regardless of whether the HTML is technically accessible in a browser.

The legal risk profile varies significantly by use case. Internal market research on a small number of properties for non-commercial analysis is a materially different situation from building a commercial product that redistributes scraped Zillow data at scale. Zillow has pursued litigation against scrapers in the latter category.

Commercial PropTech products that need comprehensive listing data typically license it from MLS aggregators (RETS/RESO feeds via Spark API, Bridge Interactive, or similar), or from public records data providers like ATTOM, CoreLogic, or First American. These sources are designed for programmatic access, have clear licensing terms, and do not carry the legal exposure of scraping a consumer portal.

If you are building anything beyond internal tooling, consult real estate counsel familiar with MLS licensing before deploying a Zillow scraper in production.

Frequently asked questions

What is a zpid and how do I extract it from a Zillow URL?

The zpid (Zillow Property ID) is the stable numeric identifier for every property on Zillow. It appears in homedetails URLs as the number immediately before `_zpid` in the path — for example, `.../2080998900_zpid/` → zpid `2080998900`. Use the regex `/\/(\d+)_zpid/` on the URL pathname. Store zpid as your primary key; it persists across address changes, re-listings, and template updates.

Why does Zillow return an empty or missing price field?

Zillow's price module is JavaScript-rendered and does not appear in the initial HTML payload. Use `mode: "js_rendering"` with `js_wait_selector: '[data-testid="price"]'` and a `js_wait_timeout` of 12,000–15,000 ms or more. If the selector still returns null, fall back to parsing the `__NEXT_DATA__` JSON blob in the full HTML response and look for `listingPrice` or `price` in the listing object.

Can I scrape all active listings in a zip code or city?

Zillow's search and map endpoints block bulk crawlers quickly. For zip-level inventory, start from county assessor parcel files (most counties publish free CSV/GIS downloads) or licensed feeds from ATTOM or CoreLogic, resolve those addresses to zpids once, then refresh homedetails URLs individually. For small one-time discovery (a few hundred properties), a low-volume search crawl with residential proxies and randomized delays is feasible as a bootstrapping step.

How do I get price history data for a property?

Price history lazy-loads in a separate module. Send a `js_rendering` request with `js_wait_selector: '[data-testid="price-history"]'` and `output_format: "html"`. In the response (`body.data.content`), locate the `<script id="__NEXT_DATA__">` tag and parse the JSON. The `priceHistoryInfo.priceHistory` array contains structured event objects with `date`, `price`, `priceChangeRate`, `event`, and `source` fields.

Is scraping Zillow legal?

Zillow's Terms of Use prohibit automated data collection, and MLS licensing agreements and copyright further restrict use of listing photos and agent remarks. Zillow has pursued litigation against commercial scrapers under the CFAA and breach of contract theories. The legal risk is generally lower for small-scale internal research and highest for commercial redistribution. Consult real estate counsel before building any production system that scrapes and redistributes Zillow data.

Why do I need a residential US proxy for Zillow?

Zillow blocks datacenter IP ranges at the connection level or returns CAPTCHA challenges rather than listing HTML. Residential US proxies present IP addresses associated with real ISP subscribers, bypassing the first layer of IP reputation filtering. Non-US residential IPs may trigger geo-restrictions on certain listing types. Use `proxy: "residential:us"` in all Zillow requests.

How do I handle Zillow template changes breaking my selectors?

Build null-rate monitoring into your pipeline. When a `data-testid` attribute changes, the affected CSS selector returns null rather than throwing an error — silent data gaps are the failure mode. Alert when null rates on critical fields (price, address, beds) exceed a threshold over a rolling window. As a fallback, parse `__NEXT_DATA__` JSON for the same fields — the JSON schema changes less frequently than the visual template.
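A minimal rolling-window monitor in Python. The window size, threshold, and `min_samples` defaults are illustrative assumptions to tune against your own baseline null rates; `NullRateMonitor` is a hypothetical class, and wiring its return value into an alerting system is left out.

```python
from __future__ import annotations

from collections import deque


class NullRateMonitor:
    """Tracks per-field null rates over a rolling window of recent extractions."""

    def __init__(self, fields: tuple[str, ...], window: int = 500,
                 threshold: float = 0.2, min_samples: int = 50) -> None:
        self.fields = fields
        self.threshold = threshold
        self.min_samples = min_samples
        # deque(maxlen=...) drops the oldest record once the window is full.
        self._history: deque[dict[str, bool]] = deque(maxlen=window)

    def observe(self, record: dict) -> list[str]:
        """Record one extraction result; return fields whose null rate exceeds the threshold."""
        self._history.append({f: record.get(f) in (None, "") for f in self.fields})
        if len(self._history) < self.min_samples:
            return []  # too few samples for a meaningful rate
        return [f for f in self.fields if self.null_rate(f) > self.threshold]

    def null_rate(self, name: str) -> float:
        """Share of records in the current window where `name` was null or empty."""
        if not self._history:
            return 0.0
        return sum(row[name] for row in self._history) / len(self._history)
```

Create one monitor with the critical fields, for example `NullRateMonitor(("price", "address", "beds_baths_sqft"))`, and call `observe` on each extraction result; a non-empty return list is the signal to review selectors or fall back to `__NEXT_DATA__`.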

Related guides

  • Real Estate Web Scraping: Listings, Comps, and Market Data
  • Scrape JavaScript-Rendered Pages: SPAs, Hydration, and Hidden APIs
  • Web Scraping Without Getting Blocked
  • Web Scraping with Python

Ready to scrape without blocks?

Get your API key in minutes. Test protected URLs from the dashboard — no credit card required to start.

© 2026 OmniScrape. All rights reserved.
