1. STR data fields available from Airbnb listings
Revenue managers model comp nightly rates across date ranges. Investors screen markets by review velocity, occupancy signals, and fee structure. The fields below are extractable from a standard listing detail page when date parameters are included in the request URL.
Cleaning fees and service fee estimates only appear in the booking panel when check-in and check-out dates are present. Without date params, you get the base nightly rate only — useful for some use cases but incomplete for total-cost comparisons. Calendar availability is a separate extraction that requires the inline calendar grid to render fully.
- Listing ID (numeric room ID from URL path)
- Title, property type, and room type (entire place / private room / shared room)
- Nightly base price and per-night breakdown
- Cleaning fee, service fee estimate, and total price for date range
- Review average score (0–5), total review count, and category ratings (cleanliness, accuracy, communication, location, check-in, value)
- Superhost badge status and host response rate
- Amenities list (WiFi, kitchen, parking, pool, etc.) and house rules
- Calendar availability — blocked vs. open dates for the next 12 months
- Bedrooms, beds, bathrooms, and maximum guest count
- Approximate latitude/longitude (rounded for privacy by Airbnb)
- Host ID and host listing count (proxy for professional operator vs. individual host)
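The field list above maps naturally onto a single record type. The following is a minimal sketch of one way to model it; the class name `ListingSnapshot`, its attribute names, and the `has_full_cost` helper are illustrative assumptions, not a schema Airbnb publishes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ListingSnapshot:
    """One extraction of a listing for one date range (hypothetical schema)."""

    room_id: str                            # kept as a string, not an int
    title: Optional[str] = None
    room_type: Optional[str] = None         # entire place / private room / shared room
    nightly_price: Optional[float] = None
    cleaning_fee: Optional[float] = None    # only present when dates were sent
    service_fee: Optional[float] = None     # only present when dates were sent
    total_price: Optional[float] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    is_superhost: Optional[bool] = None
    bedrooms: Optional[int] = None
    max_guests: Optional[int] = None
    host_id: Optional[str] = None
    host_listing_count: Optional[int] = None
    amenities: list[str] = field(default_factory=list)

    def has_full_cost(self) -> bool:
        """True when the values needed for a total-cost comparison are present."""
        return None not in (self.nightly_price, self.cleaning_fee, self.total_price)
```

Keeping every field optional reflects the point made above: a request without dates yields a valid but partial record, and `has_full_cost()` lets downstream code separate the two cases.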
2. Airbnb URL patterns and query parameters
All listing detail pages follow a predictable path structure rooted at /rooms/ followed by a numeric ID. The room ID is the stable primary key for a listing: it persists through title changes, price updates, and host transfers. Store it as a string rather than an integer, because newer room IDs can be long enough to exceed JavaScript's safe integer range (2^53 - 1) and lose digits when parsed as a Number.
Date context is injected via query parameters. Including check_in and check_out causes the booking panel to calculate the full trip cost including cleaning and service fees. The adults parameter affects occupancy-based pricing on listings that charge per additional guest. Omitting dates returns only the base nightly rate displayed in the hero section.
Search result URLs follow a different pattern and are discussed separately in the search section — they are significantly harder to paginate at scale than individual listing URLs.
- Listing (no dates): https://www.airbnb.com/rooms/12345678
- Listing with dates: https://www.airbnb.com/rooms/12345678?check_in=2025-07-15&check_out=2025-07-20&adults=2
- Search results: https://www.airbnb.com/s/Honolulu--HI/homes
- Search with filters: https://www.airbnb.com/s/Honolulu--HI/homes?price_min=100&price_max=300&room_types[]=Entire+home%2Fapt
- Experiences (different page template): https://www.airbnb.com/experiences/12345
- Room ID extraction: numeric path segment immediately after /rooms/
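The URL patterns above can be handled with the standard library alone. This is a small sketch; the function names `extract_room_id` and `listing_url` are made up for illustration, and the query parameter names are the ones listed in this section.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlencode, urlsplit

ROOM_PATH = re.compile(r"^/rooms/(\d+)")


def extract_room_id(url: str) -> Optional[str]:
    """Return the numeric segment after /rooms/ as a string, or None."""
    match = ROOM_PATH.match(urlsplit(url).path)
    return match.group(1) if match else None


def listing_url(room_id: str, check_in: date, check_out: date, adults: int = 2) -> str:
    """Build a listing URL that carries date context for the booking panel."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    query = urlencode({
        "check_in": check_in.isoformat(),
        "check_out": check_out.isoformat(),
        "adults": adults,
    })
    return f"https://www.airbnb.com/rooms/{room_id}?{query}"
```

The ID never passes through `int()`, so it stays intact regardless of its length.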
3. Listing page DOM structure and JSON extraction
The listing title renders in h1[data-section-id='HERO_DEFAULT']. The booking panel price appears in div[data-testid='book-it-default'] and contains both the nightly rate and, when dates are present, the fee breakdown. Airbnb runs continuous A/B tests on layout, which means CSS class names like _1y74zjx change frequently — data-testid attributes and aria-labels are more stable anchors for selectors.
Reviews render in a button element whose aria-label contains the word 'reviews' and the numeric count. The star rating appears in a span with data-testid='review-score'. Amenities are listed in a section with data-section-id='AMENITIES_DEFAULT', each item in a div with an icon and label.
For production pipelines, parsing the embedded JSON is more reliable than CSS selectors alone. Airbnb injects two JSON payloads into the page: a JSON-LD block of type LodgingBusiness (structured data for SEO) and a larger dehydratedState blob that contains the full Niobe GraphQL response. The dehydratedState includes pricing, availability, host details, and amenities in a single parse. Extract it from the script tag with type='application/json' and id containing 'data-deferred-state' or similar. The structure changes with Airbnb deployments, so build a schema version check into your parser.
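As a rough illustration of the JSON extraction described above, the sketch below collects JSON script tags with Python's built-in HTML parser. It assumes the deferred-state tag can be recognised by an id containing `data-deferred-state`, as the text suggests; the class and function names are hypothetical, and the real payload layout has to be checked against a captured page.

```python
import json
from html.parser import HTMLParser


class JsonScriptCollector(HTMLParser):
    """Collect the bodies of <script> tags whose type marks them as JSON."""

    JSON_TYPES = {"application/json", "application/ld+json"}

    def __init__(self) -> None:
        super().__init__()
        self.scripts: list[tuple[dict, str]] = []  # (attributes, raw text)
        self._attrs = None
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("type") in self.JSON_TYPES:
            self._attrs, self._chunks = attr_map, []

    def handle_data(self, data):
        if self._attrs is not None:
            self._chunks.append(data)  # script text may arrive in several pieces

    def handle_endtag(self, tag):
        if tag == "script" and self._attrs is not None:
            self.scripts.append((self._attrs, "".join(self._chunks)))
            self._attrs = None


def extract_payloads(html: str) -> dict:
    """Return the parsed JSON-LD blocks and the deferred-state blob, if found."""
    collector = JsonScriptCollector()
    collector.feed(html)
    result = {"json_ld": [], "deferred_state": None}
    for attrs, raw in collector.scripts:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed or truncated blobs
        if attrs.get("type") == "application/ld+json":
            result["json_ld"].append(parsed)
        elif "data-deferred-state" in (attrs.get("id") or ""):
            result["deferred_state"] = parsed
    return result
```

A `deferred_state` of `None` after a 200 response is itself a useful signal that the page template changed or the request was served a reduced page.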
4. GraphQL responses, bot detection, and rate limits
Airbnb's web client fetches listing detail data via an internal GraphQL layer called Niobe. When bot detection triggers, the server returns an incomplete dehydratedState — the page HTML looks valid but critical fields like pricing and availability are missing or replaced with placeholder values. This is the most common failure mode: the scraper sees a 200 response with HTML but no price data.
Datacenter IP ranges are blocked aggressively. Residential proxies with US geolocation are required for consistent access. Geo-restricted listings in jurisdictions where Airbnb has local compliance requirements may return different content or redirect based on the request's apparent origin.
Search pagination is rate-limited more aggressively than individual listing pages. GraphQL query hashes rotate periodically, breaking any approach that intercepts network requests directly. A/B layout experiments mean a selector that works today may return empty results after a deployment. Build selector fallbacks and monitor extraction success rates per field.
- Datacenter IP blocking — residential proxies required
- Incomplete dehydratedState on bot detection (200 response, missing data)
- Login modal injection on deep search pagination
- GraphQL query hash rotation breaking direct API interception
- A/B layout experiments causing selector drift
- Geo-restricted listings returning different content by request origin
- CAPTCHA challenges on high-frequency access patterns
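Because the most common failure is a 200 response with missing data, it can help to validate the parsed payload before trusting it. The sketch below shows one possible check; the dotted key paths in `REQUIRED_PATHS` are placeholders invented for this example and do not describe the real dehydratedState layout.

```python
from typing import Any

# Hypothetical key paths; read the real ones from a captured payload.
REQUIRED_PATHS = ("listing.price.nightly", "listing.availability")

_MISSING = object()


def lookup(payload: Any, dotted_path: str) -> Any:
    """Walk a dotted path through nested dicts; return _MISSING if a step fails."""
    node = payload
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def missing_fields(payload: Any, required=REQUIRED_PATHS) -> list[str]:
    """List required paths that are absent, null, or empty."""
    absent = []
    for path in required:
        value = lookup(payload, path)
        if value is _MISSING or value in (None, "", [], {}):
            absent.append(path)
    return absent


def classify_response(status: int, payload: Any) -> str:
    """Separate a real success from a 200 that carries a skeleton payload."""
    if status != 200:
        return "http_error"
    return "suspected_block" if missing_fields(payload) else "ok"
```

Counting `suspected_block` results per proxy or per hour gives an early warning before extraction quality degrades across the whole pipeline.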
5. Scrape a listing with dates and fee breakdown
Include check_in, check_out, and adults as query parameters to trigger full price calculation in the booking panel. Use mode js_rendering to execute the JavaScript that populates the price block — the booking panel does not render in a fast HTTP-only request. The js_wait_selector targets the booking panel container; once it appears in the DOM, extraction proceeds.
The css_selectors map below targets stable data-testid attributes where possible. The description selector uses data-section-id, which is more durable than class-based selectors across Airbnb's A/B experiments. If the nightly_price selector returns empty, check whether the listing requires date selection before showing a price: some listings hide the rate until dates are chosen.
Response HTML is in body.data.content. The css_extracted object contains keyed results matching your css_selectors map. Parse the dehydratedState script tag from body.data.content for more complete structured data when CSS extraction is insufficient.
{
  "url": "https://www.airbnb.com/rooms/12345678?check_in=2025-08-01&check_out=2025-08-05&adults=2",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "js_wait_selector": "[data-testid=\"book-it-default\"]",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "title": "h1",
    "nightly_price": "[data-testid=\"book-it-default\"] span[aria-hidden=\"false\"]",
    "total_price": "[data-testid=\"price-summary\"]",
    "rating": "[data-testid=\"review-score\"]",
    "review_count": "button[aria-label*=\"reviews\"]",
    "bedrooms": "[data-section-id=\"OVERVIEW_DEFAULT_V2\"] li",
    "description": "[data-section-id=\"DESCRIPTION_DEFAULT\"]",
    "amenities": "[data-section-id=\"AMENITIES_DEFAULT\"] div[aria-label]",
    "host_name": "[data-testid=\"host-profile-name\"]",
    "superhost": "[aria-label*=\"Superhost\"]"
  }
}
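When refreshing many listings, the request body above is easier to generate than to hand-edit. The following sketch builds it from a room ID and dates and flags selectors that came back empty. The function names are illustrative, the selector subset is copied from the request body above, and `empty_fields` assumes you pass it the css_extracted object as an ordinary dictionary.

```python
import json

# A subset of the selectors from the request body above.
LISTING_SELECTORS = {
    "title": "h1",
    "nightly_price": '[data-testid="book-it-default"] span[aria-hidden="false"]',
    "total_price": '[data-testid="price-summary"]',
    "rating": '[data-testid="review-score"]',
    "review_count": 'button[aria-label*="reviews"]',
}


def build_listing_request(room_id: str, check_in: str, check_out: str, adults: int = 2) -> dict:
    """Return a request body for one listing and one date range (ISO date strings)."""
    url = (
        f"https://www.airbnb.com/rooms/{room_id}"
        f"?check_in={check_in}&check_out={check_out}&adults={adults}"
    )
    return {
        "url": url,
        "mode": "js_rendering",
        "output_format": "css_extractor",
        "proxy": "residential:us",
        "js_wait_selector": '[data-testid="book-it-default"]',
        "js_wait_timeout": 15000,
        "css_selectors": dict(LISTING_SELECTORS),
    }


def empty_fields(css_extracted: dict) -> list[str]:
    """Names of selectors that matched nothing (None, empty string, or empty list)."""
    return [name for name, value in css_extracted.items() if not value]


if __name__ == "__main__":
    print(json.dumps(build_listing_request("12345678", "2025-08-01", "2025-08-05"), indent=2))
```

An empty `nightly_price` in the `empty_fields` result is the cue to fall back to the dehydratedState parse or to retry with a different date range.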
6. Calendar availability extraction
The availability calendar renders in a div with data-testid='inline-availability-calendar'. Each day is a td element — blocked dates carry aria-disabled='true', available dates have aria-disabled='false'. The aria-label on each td includes the date string, making it straightforward to build a structured availability map from the extracted elements.
The calendar grid requires JavaScript execution to render. Pass any single-night date range to force the calendar into view. For multi-month availability, Airbnb loads subsequent months on user interaction — navigating forward months programmatically requires browser automation beyond what a single request can achieve. For most STR analytics use cases, the default visible month range (typically two months) is sufficient per request; schedule daily refreshes to maintain a rolling availability window.
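The css_extracted response for available_days and blocked_days returns arrays of matched element text content. Text content alone holds only what the cell displays, so pair each value with the calendar_month heading, or read the aria-label date strings from the raw HTML in body.data.content, to build a complete day-by-day availability map for the listing.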
{
  "url": "https://www.airbnb.com/rooms/12345678?check_in=2025-09-01&check_out=2025-09-02",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "js_wait_selector": "[data-testid=\"inline-availability-calendar\"]",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "available_days": "td[aria-disabled=\"false\"][aria-label]",
    "blocked_days": "td[aria-disabled=\"true\"][aria-label]",
    "calendar_month": "[data-testid=\"inline-availability-calendar\"] h3"
  }
}
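Turning the extracted arrays into a dated map might look like the sketch below. It rests on two assumptions that need checking against a real response: that each extracted cell's text is the day number, and that the calendar_month heading reads like "September 2025". It also handles a single month only; when two months are visible, the day numbers have to be split per month first.

```python
from datetime import date, datetime


def availability_map(
    month_label: str, available_days: list[str], blocked_days: list[str]
) -> dict[str, bool]:
    """Map ISO dates to True (open) or False (blocked) for one calendar month."""
    # Assumed heading format, e.g. "September 2025" (English month names).
    first = datetime.strptime(month_label.strip(), "%B %Y")
    result: dict[str, bool] = {}
    for is_open, days in ((True, available_days), (False, blocked_days)):
        for text in days:
            text = text.strip()
            if not text.isdigit():
                continue  # skip padding cells or unexpected content
            try:
                day = date(first.year, first.month, int(text))
            except ValueError:
                continue  # day number that does not exist in this month
            result[day.isoformat()] = is_open
    return dict(sorted(result.items()))
```

Storing the result keyed by ISO date makes daily snapshots easy to diff, which is how a newly blocked date shows up as a probable booking.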
7. Why search scraping fails at scale
Map-based search on /s/City/homes fires concurrent GraphQL requests for listing tiles, map bounds, and filter facets simultaneously. Airbnb monitors this traffic pattern closely and applies aggressive rate limits and bot scoring to IPs that paginate search results. Attempting to build a room ID list by crawling search pagination is the highest-friction approach and the most likely to result in IP blocks.
A more sustainable architecture separates discovery from refresh. Perform low-volume, infrequent discovery passes to build your initial room ID list — or source room IDs from licensed STR data providers who aggregate them through authorized channels. Once you have a room ID list, individual listing refreshes are cheaper and more reliable than repeated search crawls. Schedule listing refreshes on a cadence appropriate to your use case: nightly rate monitoring may need daily refreshes, while amenity data changes rarely and can be refreshed weekly.
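The refresh side of that architecture reduces to a per-field cadence check. This is a minimal sketch under assumed names: the field groups and intervals in `REFRESH_INTERVALS` are examples drawn from the cadences mentioned above, and `last_refreshed` is a hypothetical in-memory stand-in for whatever store your pipeline uses.

```python
from datetime import datetime, timedelta

# Illustrative cadences; tune them to your own use case.
REFRESH_INTERVALS = {
    "pricing": timedelta(days=1),
    "availability": timedelta(days=1),
    "amenities": timedelta(days=7),
}


def due_refreshes(last_refreshed: dict, now: datetime) -> list:
    """Return (room_id, field_group) pairs whose data is older than its interval.

    last_refreshed maps (room_id, field_group) to the datetime of the last
    successful extraction.
    """
    due = []
    for (room_id, group), refreshed_at in last_refreshed.items():
        if now - refreshed_at >= REFRESH_INTERVALS[group]:
            due.append((room_id, group))
    return sorted(due)
```

Driving requests from this list, rather than re-crawling search, keeps volume proportional to the number of listings you track.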
See the guide "Scrape JavaScript rendered pages" for js_rendering cost planning when running listing refreshes at scale. See "Web scraping without getting blocked" for session and rate management strategies.
8. Do not scrape guest PII
Individual reviews on Airbnb listings may contain guest first names, profile photos, trip dates, and narrative descriptions of their stay. Under GDPR, CCPA, and similar frameworks, this constitutes personal data. STR analytics use cases typically require aggregates — overall rating, review count, nightly rate distribution — not individual reviewer identity.
Minimize what you store. If your pipeline extracts review text for sentiment analysis, strip or hash any name or identifying detail before persistence. Do not build reviewer profiles across listings. If you are operating in the EU or serving EU customers, assess whether your use case requires a legal basis under GDPR Article 6 before collecting review-level data at all.
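One way to apply the strip-or-hash advice before persistence is sketched below. The input keys (`reviewer_name`, `text`, `rating`) are assumed field names for illustration. The hash is keyed and includes the room ID, so the same guest gets a different key on each listing and reviewer profiles cannot be joined across listings.

```python
import hashlib
import hmac
import re
from typing import Optional


def pseudonymize(name: str, room_id: str, secret_key: bytes) -> str:
    """Keyed hash of a reviewer name, scoped to a single listing."""
    message = f"{room_id}:{name.strip().lower()}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


def scrub_review(review: dict, room_id: str, secret_key: bytes) -> dict:
    """Return a reduced copy of a review that omits direct identifiers."""
    name = review.get("reviewer_name", "")
    text = review.get("text", "")
    reviewer_key: Optional[str] = None
    if name:
        # Remove the reviewer's own name where it appears in the text.
        text = re.sub(rf"\b{re.escape(name)}\b", "[name]", text, flags=re.IGNORECASE)
        reviewer_key = pseudonymize(name, room_id, secret_key)
    # Only allow-listed keys are copied; photo URLs and trip dates are dropped.
    return {"reviewer_key": reviewer_key, "text": text, "rating": review.get("rating")}
```

First names are low-entropy, so an unkeyed hash would be easy to reverse by guessing; the secret key is what makes the pseudonym meaningful, and it should live outside the dataset. Names of other people mentioned in the text are not caught by this simple pass.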
Host information — name, photo, response rate, listing count — is less sensitive but still personally identifiable. Apply the same minimization principle: collect what your analytics model requires, discard the rest.
9. Airbnb Terms of Service and authorized data access
Airbnb's Terms of Service explicitly prohibit scraping, crawling, and automated access to the platform outside of authorized APIs. This is not a grey area — the prohibition is unambiguous in the ToS and has been the basis for legal action against third-party data aggregators.
Authorized alternatives exist. Inside Airbnb (insideairbnb.com) publishes periodic datasets of listing-level data for research purposes under open data terms. Airbnb has partner API programs for authorized property management software — these cover operational data for properties you manage, not market-wide competitor data. Licensed STR data providers such as AirDNA and Transparent aggregate market data through authorized channels and resell it with appropriate licensing.
Commercial STR analytics built on unauthorized scraping carries legal and operational risk: account bans, IP blocks, and potential litigation. Evaluate licensed data sources against your budget before building a scraping pipeline. If you proceed with scraping for internal research or authorized use cases, keep request rates low, respect robots.txt, and do not redistribute the data.
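For the narrow cases where you do proceed, the two mechanical safeguards mentioned above (checking robots.txt and keeping request rates low) can be sketched with the standard library. The robots.txt text is passed in as a string so the example makes no network call; the helper names and the interval are assumptions for illustration.

```python
import time
from typing import Callable
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    """Return a function that says whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self._min = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Call `limiter.wait()` before every request and skip any URL the checker rejects. Neither safeguard changes what the Terms of Service permit.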
Frequently asked questions
How do I get the Airbnb total price including cleaning fee, not just the nightly rate?
Pass check_in, check_out, and adults as query parameters in the listing URL. The booking panel (data-testid='book-it-default') calculates and displays the full trip cost including cleaning fee and service fee estimate when dates are present. Without date params, only the base nightly rate appears. Parse the price summary section for the itemized breakdown — the cleaning fee is listed as a separate line item below the nightly subtotal.
Why is the Airbnb price field empty in my scrape response?
Three common causes: (1) Missing date parameters — add check_in and check_out to the URL. (2) Bot detection returning a skeleton page — the HTML loads but dehydratedState is incomplete, so the booking panel never populates. Switch to residential:us proxy and verify js_wait_selector is matching. (3) The listing is unavailable for the requested dates and Airbnb hides the price. Try a different date range or check the blocked_days calendar extraction first.
What is the difference between room ID and listing ID on Airbnb?
Room ID and listing ID are the same value. Airbnb uses the numeric segment after /rooms/ as the primary listing identifier, commonly called the room ID. It is the stable key that persists across title changes, price updates, and host changes. Store it as a string: older 8–9 digit IDs fit comfortably in JavaScript's Number type, but the safe integer limit is 2^53 - 1 (about 9.0 × 10^15, 16 digits), and newer, longer IDs can exceed it.
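The precision issue is easy to reproduce. Python's float is the same 64-bit IEEE 754 double that backs JavaScript's Number, so the sketch below uses it as a stand-in. The long ID is a made-up example, not a real listing.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def is_safe_as_number(room_id: str) -> bool:
    """True if the ID is within the range a double represents exactly."""
    return int(room_id) <= MAX_SAFE_INTEGER


def survives_double(room_id: str) -> bool:
    """True if this particular ID round-trips through a 64-bit float unchanged."""
    return int(float(room_id)) == int(room_id)


if __name__ == "__main__":
    for candidate in ("12345678", "1234567890123456789"):  # second ID is hypothetical
        print(candidate, is_safe_as_number(candidate), survives_double(candidate))
```

Some values above the limit happen to round-trip, which is why the range check, not the round-trip check, is the rule to enforce. Keeping the ID as a string avoids the question entirely.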
Does Airbnb have an official API for market data?
No open market data API exists. Airbnb offers partner APIs for authorized property management platforms. These cover operational access to properties you manage (calendar sync, reservation management), not competitor or market-wide pricing data. For market analytics, evaluate licensed data providers (AirDNA, Transparent) or the open datasets published by Inside Airbnb before building a scraping pipeline.
How do I extract availability for multiple months, not just the current month?
A single js_rendering request renders the calendar for the currently visible month range (typically two months). To retrieve subsequent months, you would need to trigger the forward-navigation button click, which requires browser automation beyond a single-request approach. A practical alternative: issue one request per target month with the check_in date set to the first day of that month, and repeat the batch daily if you need a rolling window. Each request then returns the calendar starting at that month without requiring click interaction.
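Generating the per-month requests is simple date arithmetic. The sketch below produces one listing URL per target month with a single-night date range; the function names are illustrative, and it assumes the calendar opens at the check_in month as described above.

```python
from datetime import date, timedelta


def month_starts(start: date, months: int) -> list[date]:
    """First day of each of the next `months` months, beginning on or after start."""
    year, month = start.year, start.month
    if start.day > 1:  # the first of this month is already in the past
        month += 1
    out = []
    for _ in range(months):
        if month > 12:
            year, month = year + 1, 1
        out.append(date(year, month, 1))
        month += 1
    return out


def calendar_urls(room_id: str, start: date, months: int) -> list[str]:
    """One listing URL per target month, each with a single-night date range."""
    urls = []
    for first in month_starts(start, months):
        check_out = first + timedelta(days=1)
        urls.append(
            f"https://www.airbnb.com/rooms/{room_id}"
            f"?check_in={first.isoformat()}&check_out={check_out.isoformat()}"
        )
    return urls
```

Six URLs per listing cover roughly half a year, so budget js_rendering requests as listings × months × refresh frequency.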
How should I handle A/B selector drift on Airbnb?
Build selector fallbacks into your extraction layer. For each critical field, define two or three alternative selectors in priority order — prefer data-testid and aria-label anchors over class names, since Airbnb's class names are generated and change frequently. Monitor extraction success rates per field in your pipeline. When a field drops below a success threshold, trigger an alert to review and update the selector. Parsing the dehydratedState JSON embedded in the page is the most resilient approach for fields like price and amenities, since it is less affected by layout experiments.
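A fallback layer with per-field monitoring could be sketched as follows. The `select` argument is a caller-supplied function that runs one selector against the page and returns the matched text (or None), so the sketch stays independent of any HTML library. In `FALLBACKS`, the first selector of each field comes from this guide; the second is a hypothetical placeholder.

```python
from collections import defaultdict
from typing import Callable, Optional

FALLBACKS = {
    "nightly_price": [
        '[data-testid="book-it-default"] span[aria-hidden="false"]',
        '[data-testid="price-summary"] span',  # hypothetical fallback
    ],
    "rating": [
        '[data-testid="review-score"]',
        'span[aria-label*="rating"]',  # hypothetical fallback
    ],
}


class FieldMonitor:
    """Track extraction success per field and report fields below a threshold."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, field_name: str, ok: bool) -> None:
        self.attempts[field_name] += 1
        self.successes[field_name] += int(ok)

    def failing(self) -> list[str]:
        return sorted(
            name for name, total in self.attempts.items()
            if self.successes[name] / total < self.threshold
        )


def extract_with_fallbacks(
    select: Callable[[str], Optional[str]], fallbacks: dict, monitor: FieldMonitor
) -> dict:
    """Try each field's selectors in priority order and record the outcome."""
    results = {}
    for field_name, selectors in fallbacks.items():
        value = None
        for selector in selectors:
            value = select(selector)
            if value:
                break  # the first selector that yields text wins
        results[field_name] = value or None
        monitor.record(field_name, bool(value))
    return results
```

Reusing one `FieldMonitor` across a batch and checking `failing()` at the end turns silent selector drift into an explicit alert.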
Can I use mode auto instead of js_rendering for Airbnb listings?
Mode auto attempts a fast HTTP request first and escalates to js_rendering if the response signals that JavaScript execution is needed. For Airbnb listing pages, the booking panel and calendar grid require JavaScript to populate, so escalation to js_rendering will occur in most cases. Using mode js_rendering directly skips the fast-lane attempt and reduces latency for pages you know require browser rendering. Either mode is valid — js_rendering is more predictable for Airbnb specifically.