OmniScrape

Google Search Scraper: Extract SERP Rankings and Features

SERP tracking looks straightforward until your datacenter IP hits a CAPTCHA on the third keyword. Google Search is not an API: the HTML layout shifts quarterly, EU users encounter consent walls before results render, and mobile SERPs are structurally different from desktop. Treating it like a stable data source leads to brittle pipelines.

This guide focuses specifically on Google Search: how to construct locale-correct URLs, which SERP features to capture as structured fields, how Google detects automated queries, and how to use OmniScrape's residential proxy layer and solver to get clean result HTML at scale. For the broader rank-monitoring architecture — multi-engine coverage, scheduling, and data normalization — read SERP web scraping.

On this page

1. SERP data teams actually store
2. Google Search URL parameters
3. Parsing Google result HTML
4. How Google detects automated search
5. Scrape a SERP with OmniScrape
6. Mobile SERP variant
7. Scaling keyword tracking without bans
8. When not to scrape Google
9. Legal and ToS considerations
10. FAQ

1. SERP data teams actually store

A rank integer alone is misleading. A keyword ranking #3 organically may appear below a featured snippet, a local pack, and two ad blocks — meaning the organic result is actually the seventh visible element. Track every SERP feature that consumes above-the-fold real estate, not just blue-link positions.

Structure your schema around feature type, not just position. A result that moves from organic position 2 to position 2 inside a local pack is a fundamentally different signal — one that a flat rank integer cannot express.

  • Organic position, title, display URL, destination URL, and snippet for each result
  • Featured snippet: extracted text, source URL, snippet type (paragraph, list, table)
  • People Also Ask: question text, expanded answer, and source URL per card
  • Local pack: business name, star rating, review count, address, phone, and map pack position
  • Paid ads: headline, display URL, ad label, sitelinks, and ad position (top vs bottom)
  • Knowledge panel: entity name, type, description, and linked properties for brand queries
  • Image pack, video carousel, and Top Stories blocks with their source domains
  • Related searches footer links (useful for keyword expansion)
  • Sitelinks beneath organic results for navigational queries
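One way to express a feature-first schema is an ordered list of typed blocks, from which the visible position of any organic result can be derived. The sketch below is illustrative only: the class and field names (`SerpBlock`, `feature_type`, `organic_rank`) are assumptions made for this example, not an OmniScrape schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SerpBlock:
    """One visible element on the page, stored in top-to-bottom order."""
    feature_type: str                   # e.g. "ad", "featured_snippet", "local_pack", "organic"
    url: Optional[str] = None
    organic_rank: Optional[int] = None  # only set for organic results


def visible_position(blocks: List[SerpBlock], organic_rank: int) -> Optional[int]:
    """Return the 1-based position of an organic result among all visible blocks."""
    for index, block in enumerate(blocks, start=1):
        if block.feature_type == "organic" and block.organic_rank == organic_rank:
            return index
    return None


# The case described above: two ad blocks, a featured snippet, and a local pack
# all sit above the organic results.
page = [
    SerpBlock("ad"),
    SerpBlock("ad"),
    SerpBlock("featured_snippet", url="https://example.com/answer"),
    SerpBlock("local_pack"),
    SerpBlock("organic", url="https://example.com/a", organic_rank=1),
    SerpBlock("organic", url="https://example.com/b", organic_rank=2),
    SerpBlock("organic", url="https://example.com/c", organic_rank=3),
]

print(visible_position(page, organic_rank=3))  # 7
```

Storing the block list rather than a single integer keeps both numbers available: the organic rank for trend lines and the visible position for above-the-fold analysis.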

2. Google Search URL parameters

Construct search URLs explicitly rather than relying on redirects or autocomplete. Hard-coding every parameter reduces variance between crawl runs and makes result differences attributable to actual SERP changes rather than request inconsistency.

The most important parameters for rank tracking are hl (interface language), gl (country), and num (results per page). Keep these fixed across runs for the same keyword set. Pagination uses the start offset — Google returns 10 results per page by default, so start=10 fetches page 2, start=20 fetches page 3.

  • Base URL: https://www.google.com/search?q=best+crm+software
  • Language and country: &hl=en&gl=us — always set both; omitting them yields geo-shifted results based on proxy IP alone
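  • Results per page: &num=10 (default) or &num=100 for bulk extraction in a single request; Google does not always honor larger values, so confirm the returned result count before relying on it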
  • Pagination: &start=10 for page 2, &start=20 for page 3
  • News tab: &tbm=nws — returns news articles instead of web results
  • Image tab: &tbm=isch
  • Verbatim mode: &tbs=li:1 — disables spelling corrections and synonym expansion
  • Safe search off: &safe=off — relevant for adult content research
  • Reduce personalization: &pws=0 historically suppressed signed-in personalization; combine with clean residential IPs for most consistent results
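The parameters above can be assembled with a small helper so that every request for a keyword set carries the same explicit values. This is a minimal sketch using only the Python standard library; the function name `build_search_url` and its defaults are choices made for this example.

```python
from urllib.parse import urlencode


def build_search_url(query: str, hl: str = "en", gl: str = "us",
                     num: int = 10, start: int = 0, verbatim: bool = False) -> str:
    """Build a Google Search URL with every locale parameter set explicitly."""
    params = {"q": query, "hl": hl, "gl": gl, "num": num, "pws": 0}
    if start:
        params["start"] = start   # result offset, not page number: page 2 is start=10
    if verbatim:
        params["tbs"] = "li:1"    # urlencode percent-encodes the colon
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("best crm software"))
# https://www.google.com/search?q=best+crm+software&hl=en&gl=us&num=10&pws=0
print(build_search_url("best crm software", start=10))
# https://www.google.com/search?q=best+crm+software&hl=en&gl=us&num=10&pws=0&start=10
```

Generating URLs from one function, rather than string-concatenating them at each call site, makes it harder for a stray missing parameter to show up later as an apparent ranking change.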

3. Parsing Google result HTML

Google's HTML mixes stable semantic landmarks with short, machine-generated class names that rotate with layout updates. Organic results are wrapped in div.g or div[data-hveid] containers depending on the current layout generation. Within each card, the title is an h3 element, the destination URL is the href of the a element wrapping that title (the cite element holds only the display URL), and the snippet lives in div.VwiC3b or, in newer layouts, div[data-sncf].

Prefer structural selectors over hashed class names where possible. For example, targeting h3 inside a result card is more durable than targeting a class like LC20lb that may change. When Google does rotate classes, your archived raw HTML snapshots let you diff the old and new layouts to update selectors without re-crawling.

People Also Ask blocks use div.related-question-pair with jsname attributes on the expand trigger. Featured snippets typically sit in div.xpdopen, with span.hgKElc or div.LGOjhe holding the extracted text. Local pack results appear in div.VkpGBb or div[data-cid] depending on the map integration version.
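To illustrate the structural approach, the sketch below extracts a title and destination URL from each result card by relying only on the card container and the h3 and a elements inside it, never on a rotating class such as LC20lb. It uses Python's built-in `html.parser` to stay dependency-free; production code would more likely use BeautifulSoup or Cheerio. The `div.g` container is taken from the text above, and the sample markup is a simplified stand-in rather than real Google HTML.

```python
from html.parser import HTMLParser
from typing import List, Optional


class OrganicResultParser(HTMLParser):
    """Collect title/URL pairs from result cards using structure, not hashed classes."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[dict] = []
        self._card: Optional[dict] = None
        self._div_depth = 0   # nesting depth of <div> elements inside the current card
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._card is not None:
                self._div_depth += 1
            elif "g" in (attrs.get("class") or "").split():
                self._card = {"title": "", "url": None}
                self._div_depth = 1
        elif self._card is not None:
            if tag == "a" and self._card["url"] is None and attrs.get("href"):
                self._card["url"] = attrs["href"]   # first link in the card
            elif tag == "h3":
                self._in_h3 = True

    def handle_endtag(self, tag):
        if self._card is None:
            return
        if tag == "h3":
            self._in_h3 = False
        elif tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:          # the card container just closed
                if self._card["title"]:
                    self.results.append(self._card)
                self._card = None

    def handle_data(self, data):
        if self._in_h3 and self._card is not None:
            self._card["title"] += data


sample = """
<div id="search">
  <div class="g"><div><a href="https://example.com/a"><h3>First result</h3></a></div>
    <div class="VwiC3b">Snippet one</div></div>
  <div class="g"><div><a href="https://example.com/b"><h3>Second result</h3></a></div></div>
</div>
"""
parser = OrganicResultParser()
parser.feed(sample)
for rank, result in enumerate(parser.results, start=1):
    print(rank, result["title"], result["url"])
```

Because the parser depends only on the card container, a class rotation inside the card leaves it working; only a change to the container itself requires an update.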

4. How Google detects automated search

Google's bot detection is primarily IP-reputation and behavioral, not JavaScript-challenge-based. Unlike PerimeterX or Cloudflare Bot Management — which inject JS fingerprinting on every page load — Google's primary gate is recognizing datacenter ASNs and high query velocity from a single IP. Once a threshold is crossed, requests are redirected to /sorry/index with a CAPTCHA.

EU and UK users encounter a GDPR consent interstitial before results render. This consent wall alters the DOM significantly — the results container is absent until the user accepts or rejects. When scraping from a European residential proxy, this banner must be handled before your CSS selectors will match anything meaningful. OmniScrape's solver handles this automatically when mode is set to auto.

Mobile and desktop SERPs also differ in HTML structure, not just in rankings. A selector set built against the desktop layout will miss or misparse mobile result cards.

  • CAPTCHA redirect to /sorry/index after burst queries from datacenter IPs
  • GDPR consent interstitial in EU/UK regions blocking result DOM
  • Quarterly HTML template changes that silently break brittle class-name selectors
  • Personalization drift: results shift by search history, signed-in account, and IP reputation (mitigate with clean residential IPs and &pws=0)
  • Different mobile and desktop layouts requiring separate selector sets
  • Geo-variance: same query from different country IPs returns different result sets even with matching gl parameter

5. Scrape a SERP with OmniScrape

Use a residential proxy matching your gl parameter. If gl=us, use proxy residential:us so the IP's geolocation corroborates the locale parameter — mismatches can produce blended results. Mode auto tries fast HTTP first and escalates to a headless browser only if Google returns a consent wall or CAPTCHA, keeping costs low for clean IPs.

The css_extractor output format runs selector matching server-side and returns structured arrays in body.data.css_extracted — no HTML parsing in your application code. Each selector key maps to an ordered array of matched text values, preserving visual rank order.

Google SERP — structured extraction
json
{
  "url": "https://www.google.com/search?q=omniscrape+web+unlocker&hl=en&gl=us&num=10&pws=0",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "css_selectors": {
    "organic_titles": "div#search div.g h3",
    "organic_urls": "div#search div.g a[jsname]",
    "organic_snippets": "div#search div.g div.VwiC3b",
    "paa_questions": "div.related-question-pair span[jsname]",
    "featured_snippet_text": "div.xpdopen span.hgKElc",
    "featured_snippet_source": "div.xpdopen a[href]",
    "related_searches": "div#botstuff a[href*='search']"
  }
}
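Since each selector key comes back as its own ordered array, downstream code typically has to stitch the arrays into one row per result. The sketch below shows one possible way, assuming you have already pulled the `css_extracted` object out of the response; the sample values are made up, and whether `organic_urls` holds href values or anchor text depends on how css_extractor treats a elements, which this guide does not specify.

```python
from itertools import zip_longest
from typing import Dict, List

ORGANIC_KEYS = ("organic_titles", "organic_urls", "organic_snippets")


def arrays_aligned(css_extracted: Dict[str, List[str]]) -> bool:
    """True when the organic arrays all have the same length."""
    return len({len(css_extracted.get(key, [])) for key in ORGANIC_KEYS}) == 1


def to_ranked_rows(css_extracted: Dict[str, List[str]]) -> List[dict]:
    """Combine the parallel arrays into one dict per result, ranked from 1."""
    columns = [css_extracted.get(key, []) for key in ORGANIC_KEYS]
    rows = []
    for rank, (title, url, snippet) in enumerate(zip_longest(*columns), start=1):
        rows.append({"rank": rank, "title": title, "url": url, "snippet": snippet})
    return rows


# Hypothetical extraction result, shaped like the selector keys in the request above.
extracted = {
    "organic_titles": ["Result A", "Result B"],
    "organic_urls": ["https://example.com/a", "https://example.com/b"],
    "organic_snippets": ["Snippet A", "Snippet B"],
}

if arrays_aligned(extracted):
    for row in to_ranked_rows(extracted):
        print(row["rank"], row["title"], row["url"])
```

The alignment check matters because parallel arrays drift apart as soon as one card lacks a snippet. When the lengths differ, treating the rows as unreliable (or falling back to output_format html and parsing per card) is safer than pairing values by index.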

6. Mobile SERP variant

Mobile rankings differ from desktop in both content and HTML structure. Google serves mobile and desktop results from the same index, but ranking and layout vary by device, and features like local pack and featured snippets render with different container elements on mobile. If your product tracks mobile rank specifically, run separate crawl jobs with js_rendering mode, which uses a headless browser with a mobile viewport by default.

The js_wait_selector parameter holds the request open until div#search is present in the DOM, ensuring the results container has rendered before extraction. Set js_wait_timeout conservatively — 8000ms covers most cases, but slow consent-wall flows may need more. For desktop-only rank tracking, the fast or auto mode without JS rendering is sufficient and cheaper per request.

Mobile SERP — JS rendering with wait
json
{
  "url": "https://www.google.com/search?q=best+pizza+nyc&hl=en&gl=us&num=10&pws=0",
  "mode": "js_rendering",
  "output_format": "html",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "div#search",
  "js_wait_timeout": 8000
}

7. Scaling keyword tracking without bans

The single most effective scaling practice is distributing keywords across IPs and time: send one keyword per request, leave 5–15 seconds between queries from the same IP, and rotate residential proxies across a large pool. Never parallelize hundreds of Google queries from a single worker or IP; the query burst pattern is the clearest automated-traffic signal Google acts on.
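The spacing rule can be sketched as a planner that assigns keywords round-robin to proxy slots and gives each slot a random 5–15 second gap between its own requests. The notion of a named "slot" (a sticky session or a specific IP) is an assumption for this example; how you pin requests to an IP depends on your proxy setup.

```python
import random
from typing import List, Optional, Sequence, Tuple


def plan_schedule(keywords: Sequence[str], proxy_slots: Sequence[str],
                  min_gap: float = 5.0, max_gap: float = 15.0,
                  rng: Optional[random.Random] = None) -> List[Tuple[float, str, str]]:
    """Return (send_at_seconds, slot, keyword) tuples sorted by send time.

    Each slot waits a random min_gap..max_gap seconds between its own requests,
    so no single IP ever produces a burst.
    """
    rng = rng or random.Random()
    next_free = {slot: 0.0 for slot in proxy_slots}
    plan = []
    for i, keyword in enumerate(keywords):
        slot = proxy_slots[i % len(proxy_slots)]        # round-robin assignment
        send_at = next_free[slot]
        plan.append((send_at, slot, keyword))
        next_free[slot] = send_at + rng.uniform(min_gap, max_gap)
    return sorted(plan)


schedule = plan_schedule(
    ["best crm software", "crm pricing", "crm for startups", "open source crm"],
    ["slot-a", "slot-b"],
)
for send_at, slot, keyword in schedule:
    print(f"{send_at:6.1f}s  {slot}  {keyword}")
```

Randomizing the gap instead of using a fixed interval avoids a perfectly periodic request pattern, and adding slots increases throughput without shortening the per-IP spacing.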

Implement a sorry-page detector in your response handler. Check whether body.data.content contains /sorry/index or the CAPTCHA challenge text — if it does, back off that IP for at least 30 minutes, retry the keyword from a different proxy, and flag the result as unreliable rather than storing it as a real rank.
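A detector along these lines might look like the following sketch. The `/sorry/index` marker comes from the text above; the "unusual traffic" phrase is the wording commonly seen on Google's block page and is included here as an assumption, as are the `ProxyBackoff` helper and the `proxy_id` label, which stand in for however your pipeline identifies an IP or session.

```python
import time
from typing import Callable, Dict

# Markers are compared against lowercased content.
SORRY_MARKERS = ("/sorry/index", "unusual traffic from your computer network")
BACKOFF_SECONDS = 30 * 60   # "at least 30 minutes"


def is_sorry_page(content: str) -> bool:
    """True when the returned HTML looks like Google's CAPTCHA block page."""
    lowered = content.lower()
    return any(marker in lowered for marker in SORRY_MARKERS)


class ProxyBackoff:
    """Remember which proxies are cooling down and until when."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._blocked_until: Dict[str, float] = {}

    def block(self, proxy_id: str) -> None:
        self._blocked_until[proxy_id] = self._clock() + BACKOFF_SECONDS

    def is_available(self, proxy_id: str) -> bool:
        return self._clock() >= self._blocked_until.get(proxy_id, float("-inf"))


def handle_response(content: str, proxy_id: str, backoff: ProxyBackoff) -> dict:
    """Flag blocked responses instead of letting them be stored as real ranks."""
    if is_sorry_page(content):
        backoff.block(proxy_id)
        return {"status": "blocked", "reliable": False, "retry_with_other_proxy": True}
    return {"status": "ok", "reliable": True, "retry_with_other_proxy": False}


backoff = ProxyBackoff()
outcome = handle_response("<html>redirected to /sorry/index</html>", "proxy-a", backoff)
print(outcome["status"], backoff.is_available("proxy-a"))
```

Returning an explicit `reliable` flag lets the storage layer skip or quarantine the row, so a blocked request never shows up in reports as a ranking drop.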

Archive raw HTML snapshots at least weekly. When Google rotates class names and your selectors break, saved SERPs let you diff the old and new layouts to update selectors without re-crawling the entire keyword set. Store snapshots keyed by keyword, locale, device type, and crawl timestamp.
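As a rough illustration of that keying scheme, the helper below derives a storage path from the four dimensions. The `serp-snapshots` prefix, the path layout, and the use of a truncated SHA-256 digest for the keyword are all choices made for this sketch; any scheme works as long as the same inputs always map to the same location.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import PurePosixPath


def snapshot_key(keyword: str, hl: str, gl: str, device: str, crawled_at: datetime) -> str:
    """Build a storage key from keyword, locale, device type, and crawl timestamp."""
    # Hash the keyword so spaces, slashes, and non-ASCII text never break the path.
    digest = hashlib.sha256(keyword.encode("utf-8")).hexdigest()[:16]
    stamp = crawled_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return str(PurePosixPath("serp-snapshots") / f"{hl}-{gl}" / device / digest / f"{stamp}.html")


print(snapshot_key("best crm software", "en", "us", "desktop", datetime.now(timezone.utc)))
```

Grouping by locale and device before keyword keeps all snapshots for one selector set together, which is convenient when diffing layouts after a breakage.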

For production rank-tracking products serving customers, many teams layer OmniScrape for freshness on a subset of keywords while using Search Console API for owned-property data and licensed SERP APIs for high-volume commercial use. Understand your volume needs before committing to a pure scraping architecture.

8. When not to scrape Google

Google Search Console provides average position, impressions, clicks, and CTR for properties you own — with zero scraping required and no ToS exposure. For owned domains, Search Console data is more accurate than scraped rank because it reflects actual impression-weighted position across all queries triggering your pages, not a single point-in-time crawl from one geo.

Google Ads Keyword Planner provides search volume estimates for keyword research. Third-party rank tracking APIs (licensed SERP data providers) exist specifically for commercial rank monitoring at scale. These are the appropriate tools when you need to track rankings for client domains commercially.

Sending automated queries to google.com violates Google's Terms of Service. Technical feasibility is not the same as permission. Before building a scraping pipeline, read web scraping without getting blocked and get a legal review of your specific use case.

9. Legal and ToS considerations

Google's Terms of Service explicitly prohibit automated queries against google.com without express written permission. This is why licensed SERP API providers exist — they have negotiated data agreements or operate under separate terms. If you scrape Google for internal research, minimize request volume, do not store personal data surfaced in results (names, contact details from local pack listings), and document your legal basis under applicable law.

GDPR and similar regulations add a second layer: result pages may contain personal data about individuals (author names, business owners, contact information). Storing and processing this data at scale may trigger data controller obligations. Consult legal counsel before building any product that stores Google SERP data about identifiable individuals at volume.

Frequently asked questions

How many Google searches can I run per day without getting blocked?

Google does not publish a threshold, and it varies by IP type, query pattern, and velocity. Datacenter IPs may fail after a few dozen queries. Residential IPs with 5–15 second spacing between requests from the same IP can sustain higher volumes. Monitor your sorry-page rate — when it rises above a few percent, slow down and rotate IPs more aggressively. There is no universally safe number; treat it as a dial you tune based on observed block rate.
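Tuning by observed block rate can be sketched as a rolling-window monitor. The window size of 200 requests and the 3% threshold are illustrative assumptions standing in for "a few percent"; pick values that match your own traffic.

```python
from collections import deque


class BlockRateMonitor:
    """Track the share of recent requests that ended on a sorry page."""

    def __init__(self, window: int = 200, threshold: float = 0.03) -> None:
        self._outcomes = deque(maxlen=window)   # True = blocked, False = clean
        self.threshold = threshold

    def record(self, blocked: bool) -> None:
        self._outcomes.append(blocked)

    @property
    def rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_slow_down(self) -> bool:
        return self.rate > self.threshold


monitor = BlockRateMonitor()
for blocked in [False] * 98 + [True] * 2:
    monitor.record(blocked)
print(f"{monitor.rate:.2%}", monitor.should_slow_down())
```

A rolling window reacts to recent conditions rather than the whole day's average, so a sudden run of blocks triggers a slowdown quickly and the signal clears once clean responses return.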

Why do my scraped ranks differ from what I see in the browser?

Personalization is the primary culprit: signed-in Google accounts, search history, and location all shift results. Fix hl and gl parameters on every request, match your proxy country to your gl value, use &pws=0, and crawl at consistent times of day. For owned properties, compare scraped ranks against Search Console average position — if they diverge significantly, your crawl setup has a personalization or geo leak.

Can css_extractor return all ten organic result URLs in rank order?

Yes. When multiple DOM nodes match a selector, css_extractor returns an ordered array in body.data.css_extracted that preserves document order, which corresponds to rank order. Verify this by spot-checking the first and last entries against the rendered page. Some teams prefer output_format html and parse with a library like Cheerio or BeautifulSoup for more control over edge cases like sitelinks or result cards with multiple URLs.

Does Google use Cloudflare or PerimeterX bot protection?

No. Google runs its own infrastructure and bot detection. Cloudflare bypass techniques do not apply. Google's primary defense is IP-reputation scoring and query-rate analysis — it identifies datacenter ASNs and burst patterns. Focus on residential proxies, rate spacing, and sorry-page monitoring rather than JS fingerprint evasion. See web scraping without getting blocked for general anti-bot evasion principles.

How do I handle the EU consent banner that appears before results?

Set enable_solver: true and mode: auto. OmniScrape's solver detects the GDPR consent interstitial and dismisses it before returning the response. Without the solver, the response HTML will contain the consent wall DOM rather than result cards, and your CSS selectors will return empty arrays. If you are routing through a non-EU residential proxy, you may not encounter the banner at all — but do not rely on this; proxy IP geolocation is not always precise.

How do I track rankings for multiple locales and languages?

Run separate crawl jobs per locale combination — one job per (keyword, hl, gl, device) tuple. Store results with all four dimensions as part of the primary key. Do not reuse the same request across locales and try to infer rank differences; Google's results differ enough between locales that cross-contamination will corrupt your dataset. Use matching proxy regions for each gl value to ensure IP geolocation corroborates the locale parameter.
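Expanding a keyword list into one job per tuple could look like the sketch below. The `residential:<country>` proxy string extrapolates from the `residential:us` value used earlier in this guide and is an assumption for other countries, as is the job dictionary layout; the mode choice follows the mobile and desktop advice above.

```python
from itertools import product
from typing import Iterable, List, Tuple


def build_jobs(keywords: Iterable[str], locales: Iterable[Tuple[str, str]],
               devices: Iterable[str]) -> List[dict]:
    """Create one crawl job per (keyword, hl, gl, device) combination."""
    jobs = []
    for keyword, (hl, gl), device in product(keywords, locales, devices):
        jobs.append({
            "key": (keyword, hl, gl, device),   # also the primary key for stored results
            "hl": hl,
            "gl": gl,
            "proxy": f"residential:{gl}",       # proxy country matches gl
            "mode": "js_rendering" if device == "mobile" else "auto",
        })
    return jobs


jobs = build_jobs(
    keywords=["best crm software", "crm pricing"],
    locales=[("en", "us"), ("de", "de")],
    devices=["desktop", "mobile"],
)
print(len(jobs))  # 8
```

Carrying the full tuple on each job means the result writer never has to guess which locale or device a response belongs to.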

What is the best way to detect when Google's HTML layout has changed and my selectors are broken?

Archive raw HTML snapshots keyed by keyword, locale, and crawl timestamp. After each crawl run, check whether your extracted arrays are unexpectedly empty or shorter than the previous run for the same keyword. An empty organic_titles array on a query that previously returned 10 results is a reliable signal of a selector break, not a genuine SERP change. Diff the new HTML against the archived snapshot to identify which class names or container elements changed, then update selectors without re-crawling.
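The empty-or-shorter check can be sketched as a comparison between the previous and current extraction for the same keyword. The function name and the 50% shrink threshold are assumptions for illustration; a stricter or looser ratio may suit volatile SERP features better.

```python
from typing import Dict, List


def find_selector_breaks(previous: Dict[str, List[str]], current: Dict[str, List[str]],
                         min_ratio: float = 0.5) -> List[str]:
    """Return selector keys whose result count collapsed since the previous run."""
    broken = []
    for name, old_values in previous.items():
        if not old_values:
            continue                      # nothing to compare against
        new_values = current.get(name, [])
        if len(new_values) < len(old_values) * min_ratio:
            broken.append(name)
    return broken


previous_run = {
    "organic_titles": [f"Title {i}" for i in range(10)],
    "paa_questions": ["Q1", "Q2", "Q3", "Q4"],
}
current_run = {
    "organic_titles": [],                 # selector no longer matches anything
    "paa_questions": ["Q1", "Q2", "Q3", "Q4"],
}

print(find_selector_breaks(previous_run, current_run))  # ['organic_titles']
```

A flagged key is a prompt to diff the archived snapshot against the new HTML, not proof of a break on its own: features such as People Also Ask can legitimately disappear for a query.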

Related guides

  • SERP Web Scraping: Agency Rank Tracking Workflow
  • Web Scraping Without Getting Blocked
  • Web Scraping with Python
  • Price Monitoring with Web Scraping: A Practical Developer Guide

© 2026 OmniScrape. All rights reserved.
