Google Search Scraper: Extract SERP Rankings and Features

1. SERP data teams actually store

A rank integer alone is misleading. A keyword ranking #3 organically may appear below a featured snippet, a local pack, and two ad blocks — meaning the organic result is actually the seventh visible element. Track every SERP feature that consumes above-the-fold real estate, not just blue-link positions.

Structure your schema around feature type, not just position. A result that moves from organic position 2 to position 2 inside a local pack is a fundamentally different signal — one that a flat rank integer cannot express.

Organic position, title, display URL, destination URL, and snippet for each result
Featured snippet: extracted text, source URL, snippet type (paragraph, list, table)
People Also Ask: question text, expanded answer, and source URL per card
Local pack: business name, star rating, review count, address, phone, and map pack position
Paid ads: headline, display URL, ad label, sitelinks, and ad position (top vs bottom)
Knowledge panel: entity name, type, description, and linked properties for brand queries
Image pack, video carousel, and Top Stories blocks with their source domains
Related searches footer links (useful for keyword expansion)
Sitelinks beneath organic results for navigational queries
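
A minimal sketch of a feature-aware record, assuming a hypothetical `SerpElement` schema (the field names are illustrative, not an OmniScrape format). It stores both the rank inside a feature type and the order among all visible blocks, and reproduces the example above in which organic #3 is the seventh visible element.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerpElement:
    """One visible block on a results page (hypothetical schema)."""
    feature_type: str                 # "organic", "featured_snippet", "local_pack", "ad", ...
    visible_index: int                # 1-based order among all blocks on the page
    feature_position: int             # 1-based rank inside its own feature type
    title: Optional[str] = None
    url: Optional[str] = None


def visible_index_of(elements: list, feature_type: str, feature_position: int) -> Optional[int]:
    """Return where a given feature-level rank actually sits on the page."""
    for el in elements:
        if el.feature_type == feature_type and el.feature_position == feature_position:
            return el.visible_index
    return None


# Layout from the text: a featured snippet, a local pack, two ad blocks, then organic results.
layout = ["featured_snippet", "local_pack", "ad", "ad", "organic", "organic", "organic"]
counters = {}
page = []
for index, ftype in enumerate(layout, start=1):
    counters[ftype] = counters.get(ftype, 0) + 1
    page.append(SerpElement(ftype, index, counters[ftype]))

print(visible_index_of(page, "organic", 3))  # prints 7
```

Keeping `feature_type` as its own column means a move from organic position 2 to local pack position 2 shows up as a change of type rather than as "no change".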

2. Google Search URL parameters

Construct search URLs explicitly rather than relying on redirects or autocomplete. Hard-coding every parameter reduces variance between crawl runs and makes result differences attributable to actual SERP changes rather than request inconsistency.

The most important parameters for rank tracking are hl (interface language), gl (country), and num (results per page). Keep these fixed across runs for the same keyword set. Pagination uses the start offset — Google returns 10 results per page by default, so start=10 fetches page 2, start=20 fetches page 3.

Base URL: https://www.google.com/search?q=best+crm+software
Language and country: &hl=en&gl=us — always set both; omitting them yields geo-shifted results based on proxy IP alone
Results per page: &num=10 (default) or &num=100 for bulk extraction in a single request; Google does not always honor larger values, so check how many results actually come back
Pagination: &start=10 for page 2, &start=20 for page 3
News tab: &tbm=nws — returns news articles instead of web results
Image tab: &tbm=isch
Verbatim mode: &tbs=li:1 — disables spelling corrections and synonym expansion
Safe search off: &safe=off — relevant for adult content research
Reduce personalization: &pws=0 historically suppressed signed-in personalization; combine with clean residential IPs for most consistent results
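
The parameters above can be assembled with the standard library so that every run builds its URL the same way. This is a sketch: `build_search_url` and its defaults are hypothetical helpers, and only the query parameters listed above are used.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(query: str, hl: str = "en", gl: str = "us", num: int = 10,
                     start: int = 0, pws: int = 0, tbm: Optional[str] = None,
                     verbatim: bool = False) -> str:
    """Build an explicit Google search URL with fixed locale parameters."""
    params = {"q": query, "hl": hl, "gl": gl, "num": num, "pws": pws}
    if start:
        params["start"] = start        # offset: 10 = page 2, 20 = page 3
    if tbm:
        params["tbm"] = tbm            # "nws" for news, "isch" for images
    if verbatim:
        params["tbs"] = "li:1"         # verbatim mode
    # safe=":" keeps the colon in "li:1" readable instead of percent-encoding it
    return "https://www.google.com/search?" + urlencode(params, safe=":")


print(build_search_url("best crm software"))
# https://www.google.com/search?q=best+crm+software&hl=en&gl=us&num=10&pws=0
print(build_search_url("best crm software", start=10))
# https://www.google.com/search?q=best+crm+software&hl=en&gl=us&num=10&pws=0&start=10
```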

3. Parsing Google result HTML

Google's HTML mixes stable semantic landmarks with obfuscated, machine-generated class names that rotate with layout updates. Organic results are wrapped in div.g or div[data-hveid] containers depending on the current layout generation. Within each card, the title is an h3 element, the destination URL is the href of the a element that wraps the title (the cite element holds only the display URL), and the snippet lives in div.VwiC3b or, in newer layouts, div[data-sncf].

Prefer structural selectors over hashed class names where possible. For example, targeting h3 inside a result card is more durable than targeting a class like LC20lb that may change. When Google does rotate classes, your archived raw HTML snapshots let you diff the old and new layouts to update selectors without re-crawling.

People Also Ask blocks use div.related-question-pair with jsname attributes on the expand trigger. Featured snippets typically sit in div.xpdopen, with span.hgKElc or div.LGOjhe holding the extracted text. Local pack results appear in div.VkpGBb or div[data-cid] depending on the map integration version.
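
As an illustration of structural selectors, the sketch below uses only Python's built-in `html.parser` to pull the h3 title and the first link out of each div.g card. The sample markup is invented for the example and is far simpler than a real SERP. The `OrganicResultParser` class is a hypothetical helper; a production parser would more likely use a CSS-selector library.

```python
from html.parser import HTMLParser


class OrganicResultParser(HTMLParser):
    """Collect the h3 title and first a[href] from each div.g result card."""

    def __init__(self):
        super().__init__()
        self.results = []       # one {"title", "url"} dict per card, in document order
        self._card = None       # card currently being filled, or None outside a card
        self._div_depth = 0     # div nesting depth inside the current card
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._card is None:
            if tag == "div" and "g" in classes:
                self._card = {"title": "", "url": None}
                self._div_depth = 1
            return
        if tag == "div":
            self._div_depth += 1
        elif tag == "a" and self._card["url"] is None and attrs.get("href"):
            self._card["url"] = attrs["href"]
        elif tag == "h3":
            self._in_h3 = True

    def handle_endtag(self, tag):
        if self._card is None:
            return
        if tag == "h3":
            self._in_h3 = False
        elif tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:          # the card's own closing tag
                self._card["title"] = self._card["title"].strip()
                self.results.append(self._card)
                self._card = None

    def handle_data(self, data):
        if self._card is not None and self._in_h3:
            self._card["title"] += data


sample = """
<div id="search">
  <div class="g"><div><a href="https://example.com/a"><h3>First result</h3></a></div>
    <div class="VwiC3b">Snippet one</div></div>
  <div class="g"><a href="https://example.org/b"><h3>Second <em>result</em></h3></a></div>
</div>
"""
parser = OrganicResultParser()
parser.feed(sample)
for rank, result in enumerate(parser.results, start=1):
    print(rank, result["title"], result["url"])
```

Nothing here depends on a class like LC20lb, so only a change to the card container itself would break it.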

4. How Google detects automated search

Google's bot detection is primarily IP-reputation and behavioral, not JavaScript-challenge-based. Unlike PerimeterX or Cloudflare Bot Management — which inject JS fingerprinting on every page load — Google's primary gate is recognizing datacenter ASNs and high query velocity from a single IP. Once a threshold is crossed, requests are redirected to /sorry/index with a CAPTCHA.

EU and UK users encounter a GDPR consent interstitial before results render. This consent wall alters the DOM significantly — the results container is absent until the user accepts or rejects. When scraping from a European residential proxy, this banner must be handled before your CSS selectors will match anything meaningful. OmniScrape's solver handles this automatically when mode is set to auto.

Mobile and desktop SERPs also differ in HTML structure, not just in rankings. Google serves both from a single, mobile-first index, but it renders device-specific layouts, so a selector set built against the desktop layout will miss or misparse mobile result cards.

CAPTCHA redirect to /sorry/index after burst queries from datacenter IPs
GDPR consent interstitial in EU/UK regions blocking result DOM
Frequent, unannounced HTML template changes that silently break brittle class-name selectors
Personalization drift: results shift by search history, signed-in account, and IP reputation (mitigate with clean residential IPs and &pws=0)
Device-specific mobile vs desktop layouts requiring different selector sets
Geo-variance: same query from different country IPs returns different result sets even with matching gl parameter

5. Scrape a SERP with OmniScrape

Use a residential proxy matching your gl parameter. If gl=us, use proxy residential:us so the IP's geolocation corroborates the locale parameter — mismatches can produce blended results. Mode auto tries fast HTTP first and escalates to a headless browser only if Google returns a consent wall or CAPTCHA, keeping costs low for clean IPs.

The css_extractor output format runs selector matching server-side and returns structured arrays in body.data.css_extracted — no HTML parsing in your application code. Each selector key maps to an ordered array of matched text values, preserving visual rank order.

Google SERP — structured extraction

json

{
  "url": "https://www.google.com/search?q=omniscrape+web+unlocker&hl=en&gl=us&num=10&pws=0",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "css_selectors": {
    "organic_titles": "div#search div.g h3",
    "organic_urls": "div#search div.g a[jsname]",
    "organic_snippets": "div#search div.g div.VwiC3b",
    "paa_questions": "div.related-question-pair span[jsname]",
    "featured_snippet_text": "div.xpdopen span.hgKElc",
    "featured_snippet_source": "div.xpdopen a[href]",
    "related_searches": "div#botstuff a[href*='search']"
  }
}
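
On the client side, the per-selector arrays can be joined into one row per rank. The sketch below assumes the response body has already been decoded into a Python dict with the body.data.css_extracted shape described above. The `ranked_rows` helper and its `aligned` flag are hypothetical additions. The flag matters because a card that lacks a snippet makes the arrays differ in length, and position-based joining is then unreliable.

```python
from itertools import zip_longest


def ranked_rows(body: dict) -> list:
    """Join the per-selector arrays into one row per organic rank."""
    extracted = body.get("data", {}).get("css_extracted", {})
    titles = extracted.get("organic_titles", [])
    urls = extracted.get("organic_urls", [])
    snippets = extracted.get("organic_snippets", [])
    # If the lengths differ, at least one card is missing a field and rows may be shifted.
    aligned = len(titles) == len(urls) == len(snippets)
    rows = []
    for rank, (title, url, snippet) in enumerate(zip_longest(titles, urls, snippets), start=1):
        rows.append({"rank": rank, "title": title, "url": url,
                     "snippet": snippet, "aligned": aligned})
    return rows


# Invented response fragment for illustration only.
example_body = {
    "data": {
        "css_extracted": {
            "organic_titles": ["Result A", "Result B"],
            "organic_urls": ["https://example.com/a", "https://example.org/b"],
            "organic_snippets": ["Snippet for A"],
        }
    }
}
for row in ranked_rows(example_body):
    print(row)
```

When `aligned` is false, falling back to output_format html and parsing card by card is the safer route.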

6. Mobile SERP variant

Mobile rankings differ from desktop in both content and HTML structure. Google serves both devices from a single, mobile-first index, but ranking can vary by device, and features like the local pack and featured snippets render with different container elements on mobile. If your product tracks mobile rank specifically, run separate crawl jobs with js_rendering mode, which uses a headless browser with a mobile viewport by default.

The js_wait_selector parameter holds the request open until div#search is present in the DOM, ensuring the results container has rendered before extraction. Set js_wait_timeout conservatively — 8000ms covers most cases, but slow consent-wall flows may need more. For desktop-only rank tracking, the fast or auto mode without JS rendering is sufficient and cheaper per request.

Mobile SERP — JS rendering with wait

json

{
  "url": "https://www.google.com/search?q=best+pizza+nyc&hl=en&gl=us&num=10&pws=0",
  "mode": "js_rendering",
  "output_format": "html",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "div#search",
  "js_wait_timeout": 8000
}

7. Scaling keyword tracking without bans

The single most effective scaling practice is distributing keywords across IPs and time. One keyword per request, 5–15 seconds between queries from the same IP, residential proxy rotation across a large pool. Never parallelize hundreds of Google queries from a single worker or IP — the query burst pattern is the clearest automated-traffic signal Google acts on.
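
One way to plan that distribution is to compute a schedule up front: keywords are dealt round-robin across proxies, and each proxy's next slot is pushed back by a random 5 to 15 seconds. This is a sketch only. `plan_schedule` and the proxy identifiers are hypothetical, and the function returns planned offsets rather than sending any requests.

```python
import random
from itertools import cycle


def plan_schedule(keywords, proxies, min_gap=5.0, max_gap=15.0, seed=None):
    """Return (start_offset_seconds, proxy, keyword) tuples sorted by start time."""
    rng = random.Random(seed)
    next_free = {proxy: 0.0 for proxy in proxies}   # earliest next start per proxy
    plan = []
    for keyword, proxy in zip(keywords, cycle(proxies)):
        start = next_free[proxy]
        plan.append((start, proxy, keyword))
        # Jittered spacing so queries from one IP never arrive at a fixed interval.
        next_free[proxy] = start + rng.uniform(min_gap, max_gap)
    return sorted(plan)


keywords = ["best crm software", "crm pricing", "crm for startups", "free crm", "crm reviews"]
for start, proxy, keyword in plan_schedule(keywords, ["proxy-a", "proxy-b"], seed=1):
    print(f"{start:6.1f}s  {proxy}  {keyword}")
```

Different proxies may run at the same moment; only requests sharing an IP are spaced apart.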

Implement a sorry-page detector in your response handler. Check whether body.data.content contains /sorry/index or the CAPTCHA challenge text — if it does, back off that IP for at least 30 minutes, retry the keyword from a different proxy, and flag the result as unreliable rather than storing it as a real rank.
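
A minimal sketch of such a detector, assuming the response has been decoded into a dict with the HTML at body.data.content. The names `is_sorry_page`, `ProxyCooldown`, and `handle_response` are hypothetical. The "unusual traffic" marker is the phrase Google's block page has used, but treat the marker list as something to verify against your own blocked responses.

```python
import time

SORRY_MARKERS = ("/sorry/index", "unusual traffic")   # verify against real blocked responses
BACKOFF_SECONDS = 30 * 60                              # back off an IP for at least 30 minutes


def is_sorry_page(body: dict) -> bool:
    """True if the returned HTML looks like Google's CAPTCHA interstitial."""
    content = (body.get("data", {}).get("content") or "").lower()
    return any(marker in content for marker in SORRY_MARKERS)


class ProxyCooldown:
    """Remember which proxies are resting and until when."""

    def __init__(self, backoff=BACKOFF_SECONDS, clock=time.monotonic):
        self._backoff = backoff
        self._clock = clock
        self._blocked_until = {}

    def block(self, proxy):
        self._blocked_until[proxy] = self._clock() + self._backoff

    def available(self, proxy):
        return self._clock() >= self._blocked_until.get(proxy, float("-inf"))


def handle_response(body: dict, proxy: str, cooldown: ProxyCooldown) -> dict:
    """Flag blocked responses as unreliable instead of storing them as ranks."""
    if is_sorry_page(body):
        cooldown.block(proxy)
        return {"reliable": False, "retry_with_other_proxy": True}
    return {"reliable": True, "retry_with_other_proxy": False}
```

A plain substring check can misfire on a genuine results page that happens to mention "unusual traffic", so pairing it with an empty-results check reduces false positives.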

Archive raw HTML snapshots at least weekly. When Google rotates class names and your selectors break, saved SERPs let you diff the old and new layouts to update selectors without re-crawling the entire keyword set. Store snapshots keyed by keyword, locale, device type, and crawl timestamp.
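
The snapshot key could be built as follows. The path layout and the `snapshot_key` helper are assumptions for illustration. The keyword is hashed so that arbitrary query text is safe to use in a file or object-store path, which means the original keyword should also be stored in an index alongside the snapshot.

```python
import hashlib
from datetime import datetime, timezone


def snapshot_key(keyword: str, hl: str, gl: str, device: str, crawled_at: datetime) -> str:
    """Build a storage path keyed by locale, device, keyword hash, and UTC crawl time."""
    if crawled_at.tzinfo is None:
        raise ValueError("crawled_at must be timezone-aware")
    digest = hashlib.sha256(keyword.encode("utf-8")).hexdigest()[:16]
    stamp = crawled_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{gl}/{hl}/{device}/{digest}/{stamp}.html"


print(snapshot_key("best pizza nyc", "en", "us", "mobile", datetime.now(timezone.utc)))
```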

For production rank-tracking products serving customers, many teams layer OmniScrape for freshness on a subset of keywords while using Search Console API for owned-property data and licensed SERP APIs for high-volume commercial use. Understand your volume needs before committing to a pure scraping architecture.

8. When not to scrape Google

Google Search Console provides average position, impressions, clicks, and CTR for properties you own — with zero scraping required and no ToS exposure. For owned domains, Search Console data is more accurate than scraped rank because it reflects actual impression-weighted position across all queries triggering your pages, not a single point-in-time crawl from one geo.

Google Ads Keyword Planner provides search volume estimates for keyword research. Third-party rank tracking APIs (licensed SERP data providers) exist specifically for commercial rank monitoring at scale. These are the appropriate tools when you need to track rankings for client domains commercially.

Sending automated queries to google.com violates Google's Terms of Service. Technical feasibility is not the same as permission. Before building a scraping pipeline, read the guide to web scraping without getting blocked and get a legal review of your specific use case.

9. Legal and ToS considerations

Google's Terms of Service explicitly prohibit automated queries against google.com without express written permission. This is why licensed SERP API providers exist — they have negotiated data agreements or operate under separate terms. If you scrape Google for internal research, minimize request volume, do not store personal data surfaced in results (names, contact details from local pack listings), and document your legal basis under applicable law.

GDPR and similar regulations add a second layer: result pages may contain personal data about individuals (author names, business owners, contact information). Storing and processing this data at scale may trigger data controller obligations. Consult legal counsel before building any product that stores Google SERP data about identifiable individuals at volume.

Frequently asked questions

How many Google searches can I run per day without getting blocked?

Google does not publish a threshold, and it varies by IP type, query pattern, and velocity. Datacenter IPs may fail after a few dozen queries. Residential IPs with 5–15 second spacing between requests from the same IP can sustain higher volumes. Monitor your sorry-page rate — when it rises above a few percent, slow down and rotate IPs more aggressively. There is no universally safe number; treat it as a dial you tune based on observed block rate.
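
Treating the block rate as a dial can be as simple as a rolling window over recent outcomes. In this sketch the `BlockRateMonitor` class, the 200-request window, and the 3% threshold are all assumptions. "A few percent" is not a documented limit, so tune both values against your own traffic.

```python
from collections import deque


class BlockRateMonitor:
    """Track the share of recent requests that hit the sorry page."""

    def __init__(self, window: int = 200, threshold: float = 0.03):
        self._outcomes = deque(maxlen=window)   # True = blocked, False = clean
        self.threshold = threshold

    def record(self, blocked: bool) -> None:
        self._outcomes.append(blocked)

    @property
    def rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_slow_down(self) -> bool:
        return self.rate > self.threshold


monitor = BlockRateMonitor(window=100, threshold=0.03)
for i in range(100):
    monitor.record(blocked=(i % 20 == 0))       # 5 blocked out of 100
print(monitor.rate, monitor.should_slow_down())
```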

Why do my scraped ranks differ from what I see in the browser?

Personalization is the primary culprit: signed-in Google accounts, search history, and location all shift results. Fix hl and gl parameters on every request, match your proxy country to your gl value, use &pws=0, and crawl at consistent times of day. For owned properties, compare scraped ranks against Search Console average position — if they diverge significantly, your crawl setup has a personalization or geo leak.

Can css_extractor return all ten organic result URLs in rank order?

Yes. When multiple DOM nodes match a selector, css_extractor returns an ordered array in body.data.css_extracted, preserving visual document order which corresponds to rank order. Verify this by spot-checking the first and last entries against the rendered page. Some teams prefer output_format html and parse with a library like Cheerio or BeautifulSoup for more control over edge cases like sitelinks or result cards with multiple URLs.

Does Google use Cloudflare or PerimeterX bot protection?

No. Google runs its own infrastructure and bot detection, so Cloudflare bypass techniques do not apply. Google's primary defense is IP-reputation scoring and query-rate analysis: it identifies datacenter ASNs and burst patterns. Focus on residential proxies, rate spacing, and sorry-page monitoring rather than JS fingerprint evasion. See the guide to web scraping without getting blocked for general anti-bot evasion principles.

How do I handle the EU consent banner that appears before results?

Set enable_solver: true and mode: auto. OmniScrape's solver detects the GDPR consent interstitial and dismisses it before returning the response. Without the solver, the response HTML will contain the consent wall DOM rather than result cards, and your CSS selectors will return empty arrays. If you are routing through a non-EU residential proxy, you may not encounter the banner at all — but do not rely on this; proxy IP geolocation is not always precise.

How do I track rankings for multiple locales and languages?

Run separate crawl jobs per locale combination — one job per (keyword, hl, gl, device) tuple. Store results with all four dimensions as part of the primary key. Do not reuse the same request across locales and try to infer rank differences; Google's results differ enough between locales that cross-contamination will corrupt your dataset. Use matching proxy regions for each gl value to ensure IP geolocation corroborates the locale parameter.
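
Expanding a keyword list into one job per tuple is a small cross product. The sketch below uses a hypothetical `CrawlJob` type whose four fields double as the primary key. The proxy string follows the residential:us form used in the request examples.

```python
from itertools import product
from typing import NamedTuple


class CrawlJob(NamedTuple):
    keyword: str
    hl: str
    gl: str
    device: str

    @property
    def proxy(self) -> str:
        # Match the proxy region to gl so IP geolocation corroborates the locale.
        return f"residential:{self.gl}"


def expand_jobs(keywords, locales, devices):
    """locales is a list of (hl, gl) pairs; returns one CrawlJob per combination."""
    return [CrawlJob(keyword, hl, gl, device)
            for keyword, (hl, gl), device in product(keywords, locales, devices)]


jobs = expand_jobs(["best crm software", "crm pricing"],
                   [("en", "us"), ("de", "de")],
                   ["desktop", "mobile"])
print(len(jobs))          # prints 8
print(jobs[0], jobs[0].proxy)
```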

What is the best way to detect when Google's HTML layout has changed and my selectors are broken?

Archive raw HTML snapshots keyed by keyword, locale, and crawl timestamp. After each crawl run, check whether your extracted arrays are unexpectedly empty or shorter than the previous run for the same keyword. An empty organic_titles array on a query that previously returned 10 results is a reliable signal of a selector break, not a genuine SERP change. Diff the new HTML against the archived snapshot to identify which class names or container elements changed, then update selectors without re-crawling.
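
The post-crawl check can be a comparison of array lengths per selector key. This is a sketch: `selector_break_suspects` is a hypothetical helper, and the 50% shrink threshold is an assumed starting point rather than a recommended value.

```python
def selector_break_suspects(previous: dict, current: dict, min_ratio: float = 0.5) -> list:
    """Return selector keys whose match count collapsed compared with the last run."""
    suspects = []
    for key, old_matches in previous.items():
        new_matches = current.get(key, [])
        # Only flag keys that used to match something and now match far fewer nodes.
        if old_matches and len(new_matches) < len(old_matches) * min_ratio:
            suspects.append(key)
    return suspects


last_run = {"organic_titles": ["t"] * 10, "paa_questions": ["q"] * 4, "featured_snippet_text": []}
this_run = {"organic_titles": [], "paa_questions": ["q"] * 3, "featured_snippet_text": []}
print(selector_break_suspects(last_run, this_run))   # prints ['organic_titles']
```

Keys that were already empty in the previous run are ignored, since a feature that never appeared for a keyword says nothing about selector health.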

