1. Property data fields available on Zillow
Zillow surfaces a wide range of structured fields on each homedetails page, drawn from MLS feeds, public records, and Zillow's own valuation models. Understanding which fields exist — and which are reliable — is the first step before writing any selectors.
Investors and analysts most commonly track price cuts, days on market, and Zestimate trends over time. PropTech products enrich address databases with Zestimate, tax history, and school ratings. Macro analysts model zip- and metro-level trends from listing snapshots. Each use case has a different refresh cadence and field priority.
- zpid — Zillow Property ID, the stable primary key for every listing
- Address components: street, city, state, zip, county
- List price, price per square foot, and price change delta
- Zestimate (Zillow's automated valuation) and Zestimate range
- Bedrooms, full bathrooms, half bathrooms
- Interior square footage and lot size
- Property type: single-family, condo, townhouse, multi-family, land
- Days on Zillow and cumulative days on market
- Price history events: listed, sold, price reduced, relisted
- Tax history: assessed value and annual tax amount by year
- HOA fees (monthly), year built, garage/parking details
- Heating, cooling, and utility fields from public records
- Agent name, brokerage, and MLS listing ID
- Rental Zestimate on dual for-sale/rental listings
- School district and individual school ratings
- Walk Score, Transit Score, and Bike Score
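One way to carry these fields through a pipeline is a typed record keyed on zpid. The sketch below is illustrative only: the class names `ListingSnapshot` and `PriceEvent` and all attribute names are assumptions chosen for this example, not Zillow's own field names, and only a subset of the fields listed above is modeled.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PriceEvent:
    """One price history event (listed, sold, price reduced, relisted)."""
    event_date: date
    event: str
    price: Optional[int] = None


@dataclass
class ListingSnapshot:
    """A single observation of one property. Attribute names are hypothetical."""
    zpid: str                          # stable primary key
    captured_on: date                  # date this snapshot was taken
    street: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None
    list_price: Optional[int] = None
    zestimate: Optional[int] = None
    bedrooms: Optional[int] = None
    full_baths: Optional[int] = None
    half_baths: Optional[int] = None
    living_area_sqft: Optional[int] = None
    days_on_zillow: Optional[int] = None
    price_history: List[PriceEvent] = field(default_factory=list)

    def price_per_sqft(self) -> Optional[float]:
        """Derived value; None when either input is missing or zero."""
        if not self.list_price or not self.living_area_sqft:
            return None
        return round(self.list_price / self.living_area_sqft, 2)
```

Every field other than the key is optional because any of them can be absent on a given page, and storing one snapshot per capture date keeps price cuts and Zestimate movement visible over time.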
2. Zillow URL patterns and zpid extraction
Zillow's homedetails URLs follow a predictable pattern and are stable over time — the zpid does not change even if the address slug portion changes. This makes zpid the correct primary key for any pipeline. Bookmark or store the canonical URL with zpid rather than reconstructing from address strings.
Search result pages and map tile endpoints are structurally different from homedetails pages. They trigger Zillow's bot detection fastest, require JS execution to paginate, and are the most likely to change without notice. Build your zpid inventory from county records or one-time discovery crawls, then operate your refresh pipeline exclusively against homedetails URLs.
To extract the zpid from a URL programmatically, match the numeric segment immediately before `_zpid` in the path. A simple regex like `/\/(\d+)_zpid/` is sufficient and handles all current URL formats.
- For-sale property: https://www.zillow.com/homedetails/123-Main-St-Anytown-CA-90210/12345678_zpid/
- Rental listing: same homedetails path, listing_sub_type indicates rental status
- Recently sold: https://www.zillow.com/homedetails/..._zpid/ with sold badge in DOM
- Search results (avoid at scale): https://www.zillow.com/homes/for_sale/San-Francisco-CA/
- Zestimate history: embedded in the homedetails page, not a separate URL
- zpid regex: /\/(\d+)_zpid/ on the URL pathname
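The regex above can be applied in Python as a minimal sketch. The function name `extract_zpid` is an assumption for this example; the pattern is the same one given in the text, applied to the URL pathname only so that query strings cannot produce a false match.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Same pattern as in the text: digits immediately before "_zpid" in the path.
ZPID_RE = re.compile(r"/(\d+)_zpid")


def extract_zpid(url: str) -> Optional[str]:
    """Return the zpid as a string, or None if the URL has no zpid segment."""
    path = urlparse(url).path
    match = ZPID_RE.search(path)
    return match.group(1) if match else None
```

Keeping the zpid as a string avoids accidental numeric formatting when it is later used as a key in CSV files or databases.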
3. Zillow homedetails page DOM structure
Zillow uses React with Next.js, and the rendered DOM uses `data-testid` attributes as the primary hook for UI components. These are more stable than class names (which are hashed) but still subject to change when Zillow ships template updates. Treat your selectors as configuration that needs periodic review, not permanent infrastructure.
Key `data-testid` values on the current template: `price` for the list price span, `address` for the h1 address block, `bed-bath-sqft-fact-container` for the summary fact row, `zestimate` for the Zestimate figure, `days-on-zillow` for the market age badge, and `price-history` for the history table section.
Zillow also embeds a large JSON blob in a `<script id="__NEXT_DATA__">` tag. This blob contains the full listing object — including priceHistoryInfo events, tax history, and school data — in a structured format that is often more reliable than CSS extraction when Zillow A/B tests the visual layout. After fetching full HTML with `output_format: "html"`, parse `__NEXT_DATA__` with a JSON extractor for the most complete field set.
The price history chart and tax history table are lazy-loaded modules. They are not present in the initial HTML payload and require JavaScript execution plus a wait for the relevant `data-testid` to appear before extraction.
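Pulling the `__NEXT_DATA__` blob out of fetched HTML might look like the sketch below. It uses only the standard library and a regular expression, which is adequate for a single well-known script tag but is not a general HTML parser; the function name `parse_next_data` is an assumption for this example.

```python
import json
import re
from typing import Any, Optional

# Match the script tag by id and capture its body (non-greedy, across lines).
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def parse_next_data(html: str) -> Optional[Any]:
    """Return the decoded __NEXT_DATA__ JSON, or None if absent or invalid."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # A truncated or challenge page can contain a broken blob.
        return None
```

Returning `None` for both the missing-tag and bad-JSON cases lets the caller treat them the same way as a null CSS selector result.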
4. Zillow anti-bot protection and MLS constraints
Zillow operates one of the more sophisticated bot-detection stacks among consumer real estate sites. Detection operates at multiple layers: IP reputation scoring (datacenter ranges are blocked almost immediately), TLS fingerprint analysis, browser behavior heuristics on JS-rendered pages, and request rate and pattern analysis across sessions. A plain HTTP request from a datacenter IP to a homedetails URL will typically return a CAPTCHA challenge or redirect rather than listing HTML.
Residential US proxies are required for reliable access. Even with residential proxies, aggressive crawl rates will trigger session-level blocks. A sustainable homedetails refresh pipeline operates at low concurrency with randomized delays — not a bulk parallel crawler.
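A low-concurrency loop with randomized delays can be sketched as follows. The delay values are placeholders rather than recommended thresholds, and `fetch` stands in for whatever function actually issues the request; both are assumptions for this example.

```python
import random
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def jittered_delay(base_seconds: float, jitter_seconds: float,
                   rng: Optional[random.Random] = None) -> float:
    """Return base_seconds plus a uniform random amount up to jitter_seconds."""
    source = rng if rng is not None else random
    return base_seconds + source.uniform(0.0, jitter_seconds)


def refresh_sequentially(urls: Iterable[str],
                         fetch: Callable[[str], T],
                         base_seconds: float = 8.0,     # placeholder value
                         jitter_seconds: float = 7.0,   # placeholder value
                         sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Fetch URLs one at a time, pausing a randomized interval between requests."""
    results: List[T] = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(jittered_delay(base_seconds, jitter_seconds))
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter makes the pacing testable without real waits; in production the default `time.sleep` applies.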
Zillow has historically pursued legal action against scrapers operating at commercial scale, citing the Computer Fraud and Abuse Act and breach of terms. MLS data displayed on Zillow is licensed — scraping and redistributing listing photos, agent remarks, or MLS listing IDs may violate MLS rules independently of Zillow's own terms.
- Datacenter IPs blocked at connection or CAPTCHA-challenged immediately
- TLS fingerprint analysis: default headless browser fingerprints are detected unless spoofed
- JS-required rendering for price, Zestimate, and fact panel modules
- Lazy-loaded price history and tax history sections require explicit wait selectors
- Frequent `data-testid` attribute changes during template A/B tests
- Geo restrictions on some listing types and rental markets
- MLS copyright on photos, agent remarks, and MLS listing IDs
- Active legal enforcement history against high-volume commercial scrapers
5. Scrape a Zillow property page by zpid
Use `js_rendering` mode for homedetails pages. Zillow's fact panel — price, beds, baths, Zestimate — does not render in a plain HTTP response. Set `js_wait_selector` to `[data-testid="price"]` so the request waits until the price module has mounted before extraction runs.
Set `proxy` to `residential:us`. Zillow's geo-detection may serve different content to non-US residential IPs or block them outright. Set `enable_solver: true` to handle any CAPTCHA challenges that appear during the session.
The `css_selectors` map below extracts the primary listing fields in a single request. If Zillow has updated a `data-testid` value, the corresponding key will return null — build null-checks into your pipeline and alert on unexpected null rates rather than silently dropping records.
{
  "url": "https://www.zillow.com/homedetails/456-Oak-Ave-Seattle-WA-98101/2080998900_zpid/",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "[data-testid=\"price\"]",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "price": "[data-testid=\"price\"]",
    "address": "[data-testid=\"address\"]",
    "beds_baths_sqft": "[data-testid=\"bed-bath-sqft-fact-container\"]",
    "zestimate": "[data-testid=\"zestimate\"]",
    "days_on_market": "[data-testid=\"days-on-zillow\"]",
    "property_type": "[data-testid=\"property-type-badge\"]",
    "description": "[data-testid=\"description\"]",
    "hoa_fee": "[data-testid=\"hoa-fee\"]",
    "year_built": "[data-testid=\"year-built\"]"
  }
}
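The null-rate alerting mentioned above could be tracked with a small rolling-window counter. This is a sketch under assumptions: the class name `NullRateMonitor`, the window size, and the threshold are all illustrative, and an empty string is treated the same as a null result.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Mapping, Optional


class NullRateMonitor:
    """Track the share of null results per field over the last `window` records."""

    def __init__(self, fields: Iterable[str], window: int = 200,
                 threshold: float = 0.2) -> None:
        self.threshold = threshold
        self._history: Dict[str, Deque[bool]] = {
            name: deque(maxlen=window) for name in fields
        }

    def record(self, extracted: Mapping[str, Optional[str]]) -> None:
        """Register one extraction result; missing keys count as null."""
        for name, history in self._history.items():
            value = extracted.get(name)
            history.append(value is None or value == "")

    def null_rates(self) -> Dict[str, float]:
        """Return the current null fraction for each tracked field."""
        return {
            name: (sum(history) / len(history) if history else 0.0)
            for name, history in self._history.items()
        }

    def breaches(self) -> List[str]:
        """Return the fields whose null rate exceeds the threshold."""
        return [name for name, rate in self.null_rates().items()
                if rate > self.threshold]
```

Calling `record()` on each response and checking `breaches()` periodically surfaces a changed `data-testid` as a rising null rate on one field, rather than as silently missing records.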
6. Extracting the price history module
Price history sits in a chart section that lazy-loads after the main fact panel. It is not present in the initial DOM and requires a separate wait. Use `js_wait_selector` targeting `[data-testid="price-history"]` with a longer timeout — this module loads after several secondary network requests complete.
Fetch full HTML with `output_format: "html"` for this request. After receiving the response, parse the `<script id="__NEXT_DATA__">` tag from `body.data.content` and extract `priceHistoryInfo.priceHistory` from the JSON. This array contains structured event objects with `date`, `price`, `priceChangeRate`, `event`, and `source` fields — far cleaner than scraping the rendered table rows.
Tax history follows the same pattern: look for `taxHistory` in `__NEXT_DATA__` rather than waiting for the tax table module to render.
{
  "url": "https://www.zillow.com/homedetails/456-Oak-Ave-Seattle-WA-98101/2080998900_zpid/",
  "mode": "js_rendering",
  "output_format": "html",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "[data-testid=\"price-history\"]",
  "js_wait_timeout": 20000
}
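Once the `__NEXT_DATA__` JSON has been decoded, the price history events can be pulled out as sketched below. The text does not state where `priceHistoryInfo` sits inside the blob, so this example searches the decoded structure recursively instead of hard-coding a path; the helper names `find_key` and `extract_price_history` are assumptions.

```python
from typing import Any, Dict, Iterator, List


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def extract_price_history(next_data: Any) -> List[Dict[str, Any]]:
    """Return the first priceHistoryInfo.priceHistory list found, else []."""
    for info in find_key(next_data, "priceHistoryInfo"):
        if isinstance(info, dict) and isinstance(info.get("priceHistory"), list):
            return [event for event in info["priceHistory"]
                    if isinstance(event, dict)]
    return []
```

The same `find_key` helper can be pointed at `taxHistory`. A recursive search is slower than a fixed path but keeps working if the listing object moves within the blob.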
7. Why not scrape Zillow search results
Zillow's search and map endpoints are the highest-risk surface on the site. Map tile requests, pagination tokens, and the underlying GraphQL-style search API all change frequently and are monitored closely for anomalous access patterns. A crawler hitting search pagination at any meaningful rate will be blocked within minutes on datacenter IPs and within hours on residential IPs.
The practical alternative is to build your property inventory from sources that are designed for bulk access: county assessor parcel data (most counties publish downloadable CSV or GIS files), USPS address databases, or licensed feeds from ATTOM, First American, or similar data providers. These sources generally supply addresses and parcel numbers rather than zpids, so each address still has to be matched to its homedetails URL once, after which the zpid can be stored. Once you have a zpid list, you operate your pipeline exclusively against homedetails URLs, which are stable, predictable, and lower-risk than search endpoints.
For one-time discovery of zpids in a specific geography, a low-volume search crawl with randomized delays and residential proxies is feasible, but treat it as a bootstrapping step, not an ongoing data collection mechanism. For a deeper look at managing `js_rendering` costs across large URL lists, see the guide on scraping JavaScript-rendered pages.
8. Rental vs. for-sale listing pipelines
The same zpid can appear in both rental and for-sale contexts on Zillow — a property listed for sale may simultaneously show a Rental Zestimate, and a property that transitions from rental to for-sale retains its zpid. Store `listing_type` (for_sale, for_rent, recently_sold) as a dimension in your data model rather than assuming it is fixed.
Rental listings use largely the same homedetails template but surface different fields: monthly rent price instead of list price, lease term, pet policy, and laundry/parking details that may not appear on for-sale listings. The Rental Zestimate appears in a separate module from the sale Zestimate. Check `data-testid="rental-price"` and `data-testid="rental-zestimate"` for rental-specific fields.
If your pipeline covers both rental and for-sale inventory, parameterize your CSS selector map by listing type rather than using a single universal selector set. Null rates on type-specific selectors are a useful signal for detecting listing type transitions.
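Parameterizing the selector map by listing type might be sketched as follows. The request keys and `data-testid` values are the ones used elsewhere in this guide; the function name `build_payload`, the split into common and type-specific selectors, and the choice to wait on `rental-price` for rentals are assumptions for this example. Only `for_sale` and `for_rent` are covered.

```python
from typing import Any, Dict

# Selectors shared by both listing types.
COMMON_SELECTORS: Dict[str, str] = {
    "address": '[data-testid="address"]',
    "beds_baths_sqft": '[data-testid="bed-bath-sqft-fact-container"]',
}

# Type-specific selectors; the first key of each map is used as the wait target.
SELECTORS_BY_TYPE: Dict[str, Dict[str, str]] = {
    "for_sale": {
        "price": '[data-testid="price"]',
        "zestimate": '[data-testid="zestimate"]',
        "days_on_market": '[data-testid="days-on-zillow"]',
    },
    "for_rent": {
        "rental_price": '[data-testid="rental-price"]',
        "rental_zestimate": '[data-testid="rental-zestimate"]',
    },
}


def build_payload(url: str, listing_type: str) -> Dict[str, Any]:
    """Build a css_extractor request body for the given listing type."""
    if listing_type not in SELECTORS_BY_TYPE:
        raise ValueError(f"unsupported listing_type: {listing_type!r}")
    specific = SELECTORS_BY_TYPE[listing_type]
    wait_selector = next(iter(specific.values()))  # first type-specific selector
    return {
        "url": url,
        "mode": "js_rendering",
        "output_format": "css_extractor",
        "proxy": "residential:us",
        "enable_solver": True,
        "js_wait_selector": wait_selector,
        "js_wait_timeout": 15000,
        "css_selectors": {**COMMON_SELECTORS, **specific},
    }
```

Because each payload requests only the selectors that apply to its listing type, a sudden run of nulls on `price` for a zpid tracked as `for_sale` points to a listing type transition rather than a selector that was never expected to match.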
9. MLS licensing, copyright, and Zillow terms of service
Zillow's Terms of Use explicitly prohibit scraping, crawling, or automated data collection. Beyond Zillow's own terms, MLS data displayed on Zillow is subject to MLS licensing agreements that restrict downstream use of listing photos, agent remarks, and MLS listing IDs — regardless of whether the HTML is technically accessible in a browser.
The legal risk profile varies significantly by use case. Internal market research on a small number of properties for non-commercial analysis is a materially different situation from building a commercial product that redistributes scraped Zillow data at scale. Zillow has pursued litigation against scrapers in the latter category.
Commercial PropTech products that need comprehensive listing data typically license it from MLS aggregators (RETS/RESO feeds via Spark API, Bridge Interactive, or similar), or from public records data providers like ATTOM, CoreLogic, or First American. These sources are designed for programmatic access, have clear licensing terms, and do not carry the legal exposure of scraping a consumer portal.
If you are building anything beyond internal tooling, consult real estate counsel familiar with MLS licensing before deploying a Zillow scraper in production.
Frequently asked questions
What is a zpid and how do I extract it from a Zillow URL?
The zpid (Zillow Property ID) is the stable numeric identifier for every property on Zillow. It appears in homedetails URLs as the number immediately before `_zpid` in the path — for example, `.../2080998900_zpid/` → zpid `2080998900`. Use the regex `/\/(\d+)_zpid/` on the URL pathname. Store zpid as your primary key; it persists across address changes, re-listings, and template updates.
Why does Zillow return an empty or missing price field?
Zillow's price module is JavaScript-rendered and does not appear in the initial HTML payload. Use `mode: "js_rendering"` with `js_wait_selector: '[data-testid="price"]'` and a `js_wait_timeout` of 12,000 to 15,000 ms. If the selector still returns null, fall back to parsing the `__NEXT_DATA__` JSON blob in the full HTML response and look for `listingPrice` or `price` in the listing object.
Can I scrape all active listings in a zip code or city?
Zillow's search and map endpoints block bulk crawlers quickly. For zip-level inventory, use county assessor parcel files (most counties publish free CSV/GIS downloads) or licensed feeds from ATTOM or CoreLogic to build your zpid list, then refresh homedetails URLs individually. For small one-time discovery (a few hundred properties), a low-volume search crawl with residential proxies and randomized delays is feasible as a bootstrapping step.
How do I get price history data for a property?
Price history lazy-loads in a separate module. Send a `js_rendering` request with `js_wait_selector: '[data-testid="price-history"]'` and `output_format: "html"`. In the response (`body.data.content`), locate the `<script id="__NEXT_DATA__">` tag and parse the JSON. The `priceHistoryInfo.priceHistory` array contains structured event objects with `date`, `price`, `priceChangeRate`, `event`, and `source` fields.
Is scraping Zillow legal?
Zillow's Terms of Use prohibit automated data collection. MLS licensing agreements separately restrict reuse of listing photos, agent remarks, and MLS listing IDs. Zillow has pursued litigation against commercial scrapers under the CFAA and breach of contract theories. The legal risk is low for small-scale internal research and high for commercial redistribution. Consult real estate counsel before building any production system that scrapes and redistributes Zillow data.
Why do I need a residential US proxy for Zillow?
Zillow blocks datacenter IP ranges at the connection level or returns CAPTCHA challenges rather than listing HTML. Residential US proxies present IP addresses associated with real ISP subscribers, bypassing the first layer of IP reputation filtering. Non-US residential IPs may trigger geo-restrictions on certain listing types. Use `proxy: "residential:us"` in all Zillow requests.
How do I handle Zillow template changes breaking my selectors?
Build null-rate monitoring into your pipeline. When a `data-testid` attribute changes, the affected CSS selector returns null rather than throwing an error — silent data gaps are the failure mode. Alert when null rates on critical fields (price, address, beds) exceed a threshold over a rolling window. As a fallback, parse `__NEXT_DATA__` JSON for the same fields — the JSON schema changes less frequently than the visual template.