OmniScrape

Glassdoor Scraper: Employer Ratings, Salaries, and Review Data

Glassdoor aggregates crowdsourced compensation data, culture ratings, interview questions, and management feedback that HR analytics teams want for benchmarking. The catch: precise salary bands are blurred behind a login prompt, full review text is truncated for unauthenticated sessions, and Glassdoor's bot detection is aggressive against datacenter traffic. Understanding exactly what is visible without authentication — and what is not — saves you from building a pipeline that returns unlock prompts instead of numbers.

This guide covers Glassdoor's URL structure, the DOM selectors that actually resolve on public pages, how OmniScrape handles the bot protection layer, and the hard limits imposed by the paywall and Terms of Service. For complementary data sources, see job board web scraping for job posting pipelines and Indeed scraper for job-linked salary disclosures.

On this page

1. Glassdoor data fields HR analytics teams want
2. Glassdoor URL patterns and employer ID extraction
3. Glassdoor page structure and CSS selectors
4. Login walls, paywalls, and anti-bot measures
5. Scraping the employer overview page (public fields)
6. Scraping review listings (truncated for unauthenticated sessions)
7. Salary data reality check: what you actually get without authentication
8. Anonymizing employee review data before use
9. Glassdoor Terms of Service and legal considerations
10. FAQ

1. Glassdoor data fields HR analytics teams want

Compensation teams benchmark role salaries by metro area and job family. Employer brand teams track rating trends over quarters to correlate with hiring events or layoffs. Recruiters scan interview question snippets and difficulty ratings before outreach. The fields below represent what Glassdoor surfaces across its employer profile, reviews, salaries, and interview tabs — some publicly, some only after authentication.

Fields that are publicly visible (no login) on the overview page include aggregate ratings and review counts. Fields marked as login-required return blurred or missing values for unauthenticated requests, regardless of how you send them.

  • Employer ID and canonical company name
  • Overall rating (0–5 scale) and sub-ratings: culture, work-life balance, compensation & benefits, senior management
  • CEO name and approval percentage (thumbs-up/down vote tally)
  • Recommend to a friend percentage
  • Total review count and total salary report count
  • Salary ranges by job title and location (login-required for precise medians)
  • Individual review snippets: headline, pros, cons, reviewer job title and city
  • Interview difficulty score (Easy / Medium / Hard) and sample question text
  • Benefits ratings summary and category breakdown
  • Competitor employers listed on the overview sidebar
  • Business outlook rating and positive business outlook percentage

2. Glassdoor URL patterns and employer ID extraction

Glassdoor organizes all employer content under a single numeric employer ID embedded in every URL. The pattern is consistent across tabs, which makes it straightforward to construct target URLs programmatically once you have the ID. The employer ID appears in two forms: the full slug form (EI_IE9079.11,17) on overview pages, and the short form (E9079) on reviews, salaries, jobs, and interview pages.

To extract the numeric ID from a known company URL, match /E(\d+)/ against the URL path and take the last match, since a company slug can itself contain an E followed by digits. Use that integer as your primary key for refresh jobs: it is stable across URL restructuring and regional sites. Regional Glassdoor sites (glassdoor.co.uk, glassdoor.de, glassdoor.fr) use the same employer IDs on locale-specific domains.

  • Overview: https://www.glassdoor.com/Overview/Working-at-Google-EI_IE9079.11,17.htm
  • Reviews: https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm
  • Salaries: https://www.glassdoor.com/Salary/Google-Salaries-E9079.htm
  • Jobs: https://www.glassdoor.com/Jobs/Google-Jobs-E9079.htm
  • Interview questions: https://www.glassdoor.com/Interview/Google-Interview-Questions-E9079.htm
  • Benefits: https://www.glassdoor.com/Benefits/Google-Benefits-E9079.htm
  • Employer ID extraction: match /E(\d+)/ — numeric portion is the stable key
  • Pagination on reviews: append ?sort.sortType=RD&sort.ascending=false&filter.iso3Language=eng&filter.employmentStatus=REGULAR&start=10
  • UK locale: https://www.glassdoor.co.uk/Reviews/Google-Reviews-E9079.htm
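The patterns above can be turned into two small helpers. This is a minimal sketch, not an official client: the function names and the `TAB_TEMPLATES` mapping are illustrative, the templates are copied from the example URLs in this section, and taking the last `E(\d+)` match in the path is an assumption made to avoid false hits inside company slugs.

```python
import re
from urllib.parse import urlsplit

# Path templates copied from the example URLs in this guide. {slug} is the
# company name as it appears in the URL (e.g. "Google"); {eid} is the numeric
# employer ID. The overview tab is left out because its slug carries extra
# offsets (".11,17") that this sketch does not try to reproduce.
TAB_TEMPLATES = {
    "reviews": "/Reviews/{slug}-Reviews-E{eid}.htm",
    "salaries": "/Salary/{slug}-Salaries-E{eid}.htm",
    "jobs": "/Jobs/{slug}-Jobs-E{eid}.htm",
    "interviews": "/Interview/{slug}-Interview-Questions-E{eid}.htm",
    "benefits": "/Benefits/{slug}-Benefits-E{eid}.htm",
}


def extract_employer_id(url: str) -> int:
    """Return the numeric employer ID embedded in a Glassdoor URL path."""
    path = urlsplit(url).path
    matches = re.findall(r"E(\d+)", path)
    if not matches:
        raise ValueError(f"no employer ID found in {url!r}")
    # Assumption: the ID is the last E<digits> token in the path, so a slug
    # such as "E3-Corp" does not shadow the real ID.
    return int(matches[-1])


def build_tab_url(slug: str, employer_id: int, tab: str,
                  host: str = "www.glassdoor.com") -> str:
    """Build the URL for one employer tab from the slug and employer ID."""
    return f"https://{host}" + TAB_TEMPLATES[tab].format(slug=slug, eid=employer_id)
```

Passing `host="www.glassdoor.co.uk"` yields the UK variant of the same tab, which matches the regional example in the list above.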

3. Glassdoor page structure and CSS selectors

Glassdoor's frontend has been rebuilt multiple times. The selectors below reflect the current structure, but class names are frequently obfuscated or changed in A/B tests. Where possible, prefer data-test attributes and structural selectors over class-name-only selectors, as data attributes are more stable across deploys.

The overall rating appears in a span with class rating-number or inside a div with class rating-headline. Sub-ratings for culture, work-life balance, compensation, and management render in li.rating-item elements with a label span and a value span. CEO approval lives in a div.ceoApproval block containing the CEO name and a percentage span.

Salary rows on the public salary tab render in table rows (tr.cdm-module-table-row or similar), with the salary range in a span.range child. For unauthenticated sessions, the range cell is replaced with an 'Unlock' CTA or a deliberately wide range (e.g., '$60K–$200K') that is not useful for benchmarking. Individual review cards are li elements whose id attribute begins with empReview — for example, li[id^='empReview'].

Glassdoor also embeds structured employer data in inline script tags. Look for JSON blobs assigned to window.appCache, window.__INITIAL_STATE__, or ApplicationSettings. Parsing these can yield cleaner data than DOM extraction, but the schema changes without notice and the blobs may be absent on bot-detected sessions.
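One way to pull those inline blobs out of raw HTML is sketched below. It assumes each blob is assigned as strict JSON (for example `window.appCache = {...};`); if the page uses a JavaScript object literal instead, the decode fails and the name is skipped. The function name is illustrative, and the variable names are the ones listed above.

```python
import json
import re

# Variable names mentioned in this guide; any of them may be missing on a
# given page or on a bot-detected session.
STATE_NAMES = ("window.appCache", "window.__INITIAL_STATE__", "ApplicationSettings")


def extract_state_blobs(html: str) -> dict:
    """Return {variable_name: parsed_json} for every blob found in the HTML."""
    decoder = json.JSONDecoder()
    found = {}
    for name in STATE_NAMES:
        # Locate "<name> = " and start decoding right after the equals sign.
        match = re.search(re.escape(name) + r"\s*=\s*", html)
        if match is None:
            continue
        try:
            # raw_decode parses one JSON value and ignores trailing text
            # such as ";</script>".
            value, _end = decoder.raw_decode(html, match.end())
        except json.JSONDecodeError:
            continue  # not strict JSON; fall back to DOM selectors
        found[name] = value
    return found
```

An empty result is a useful signal in itself: treat it as "blob absent or schema changed" and fall back to the DOM selectors rather than failing the job.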

4. Login walls, paywalls, and anti-bot measures

Glassdoor deploys layered access controls. The first layer is the login prompt: salary detail pages redirect unauthenticated users to a sign-in modal or return blurred cell values. The second layer is the paywall overlay on free accounts — even logged-in free users see a limited number of salary unlocks per month. The third layer is bot detection: Glassdoor uses fingerprinting and behavioral analysis that reliably blocks datacenter IP ranges, headless browser signatures, and high-frequency request patterns.

Regional sites (glassdoor.co.uk, glassdoor.de) apply the same protection stack with locale-specific CAPTCHA flows. Employer search (/employer/search) is particularly aggressive — expect CAPTCHA challenges on the second or third paginated request from a fresh session. OmniScrape's residential proxy pool and Web Unlocker solve the bot detection layer for public pages; they do not bypass the login wall or the salary paywall, which are intentional access controls tied to account state.

  • Login required for precise salary medians and full salary report detail
  • Blurred or wide-range salary cells on unauthenticated salary tab
  • CAPTCHA on employer search and paginated review requests
  • Headless browser fingerprint detection — use residential proxies
  • Datacenter IP blocks on repeated requests
  • Obfuscated JSON state blobs that may be absent on detected bot sessions
  • ToS Section 6 explicitly prohibits automated scraping and data collection
  • Session-based rate limiting on review pagination

5. Scraping the employer overview page (public fields)

The employer overview page exposes aggregate ratings, review counts, and company metadata without requiring authentication. This is the safest and most reliable Glassdoor endpoint to scrape. Use mode 'auto' with a residential US proxy — Glassdoor's CDN serves different content based on geolocation, and a US IP returns the most complete public data for US-listed employers.

The css_extractor output format lets OmniScrape run the CSS selectors server-side and return only the extracted values, reducing payload size and parsing work on your end. The selectors below pair data-test attributes, which tend to survive deploys, with class-name fallbacks for the current overview layout; verify them against a live page before relying on them. Adjust the employer URL and ID for your target company.

Parsed HTML is available in body.data.content if you need the full page for selector debugging. The css_extracted object in the response contains the mapped values directly.

Glassdoor employer overview — OmniScrape request
json
{
  "url": "https://www.glassdoor.com/Overview/Working-at-Microsoft-EI_IE1651.11,20.htm",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "css_selectors": {
    "company_name": "h1.employerName, h1[data-test='employer-name']",
    "overall_rating": "span.rating-number, div.rating-headline span",
    "review_count": "a.reviewCount, a[data-test='review-count']",
    "recommend_pct": "span.recommendRating, span[data-test='recommend-pct']",
    "ceo_name": "div.ceoApproval span.ceoName",
    "ceo_approval": "div.ceoApproval span.ceoApprovalRating, span[data-test='ceo-approval']",
    "business_outlook": "span[data-test='business-outlook-pct']",
    "company_description": "div.employerDescription, div[data-test='employer-description']",
    "headquarters": "div[data-test='headquarters']",
    "industry": "div[data-test='industry']",
    "employee_count": "div[data-test='size']"
  }
}
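The extracted values usually need light cleanup before they go into a database. The sketch below assumes each key in css_extracted maps to a single text string (or is missing), which this guide does not confirm; the helper names are illustrative, and the keys mirror the css_selectors in the request above.

```python
import re
from typing import Optional


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Pull the first decimal number out of a rating string such as '4.3'."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_percent(text: Optional[str]) -> Optional[int]:
    """Pull an integer percentage out of a string such as '92% approve'."""
    match = re.search(r"(\d+)\s*%", text or "")
    return int(match.group(1)) if match else None


def parse_count(text: Optional[str]) -> Optional[int]:
    """Parse counts written with thousands separators or a K/M suffix."""
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)([KkMm]\b)?", text or "")
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    multiplier = {"k": 1_000, "m": 1_000_000}.get((match.group(2) or "").lower(), 1)
    return int(round(number * multiplier))


def normalize_overview(extracted: dict) -> dict:
    """Convert raw extracted strings into typed values; missing fields become None."""
    return {
        "company_name": (extracted.get("company_name") or "").strip() or None,
        "overall_rating": parse_rating(extracted.get("overall_rating")),
        "review_count": parse_count(extracted.get("review_count")),
        "recommend_pct": parse_percent(extracted.get("recommend_pct")),
        "ceo_approval": parse_percent(extracted.get("ceo_approval")),
        "business_outlook": parse_percent(extracted.get("business_outlook")),
    }
```

Returning None for anything that fails to parse keeps a selector that silently stopped matching visible in downstream data instead of masking it with a default.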

6. Scraping review listings (truncated for unauthenticated sessions)

The reviews tab renders review cards client-side via JavaScript, so mode 'js_rendering' is required. Use js_wait_selector to wait for the review list to appear in the DOM before extraction. Without this, you will receive the initial HTML shell with empty review containers.

Unauthenticated sessions return truncated review text — typically the headline and a short excerpt of pros and cons, with the full text behind a login prompt. The data you can reliably extract without authentication includes: review headline, star rating, reviewer job title, reviewer city, review date, and the visible snippet of pros and cons text.

Pagination requires incrementing the start query parameter (start=0, start=10, start=20). Each paginated request should use a fresh residential proxy session to avoid session-based rate limiting. The js_wait_selector "li[id^='empReview']" confirms that review cards have rendered before extraction runs; a plain class selector such as li.empReview will not match, because the cards carry empReview as an id prefix rather than a class.

Glassdoor reviews tab — OmniScrape request
json
{
  "url": "https://www.glassdoor.com/Reviews/Microsoft-Reviews-E1651.htm",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "li[id^='empReview']",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "review_headlines": "h2.review-summary, h2[data-test='review-title']",
    "pros": "span[data-test='pros'], p.pros",
    "cons": "span[data-test='cons'], p.cons",
    "star_ratings": "span.ratingNumber, span[data-test='rating']",
    "reviewer_job_title": "span.authorJobTitle, span[data-test='author-job-title']",
    "reviewer_location": "span.authorLocation, span[data-test='author-location']",
    "review_dates": "span.review-date, time[data-test='review-date']",
    "helpful_count": "span[data-test='helpful-count']"
  }
}
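To paginate, the same request body can be cloned once per page with a new start offset. The following is a sketch under assumptions: session_id is the parameter named in the FAQ below, but its expected format is not documented here, so a random hex string is used as a placeholder; the function names are illustrative.

```python
import copy
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start(url: str, start: int) -> str:
    """Return the URL with its start query parameter set to the given offset."""
    parts = urlsplit(url)
    # Keep any existing filters or sort options, but drop a previous start value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "start"]
    query.append(("start", str(start)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_page_requests(base_request: dict, pages: int, page_size: int = 10) -> list:
    """Clone the base request once per page with its own offset and session."""
    requests = []
    for page in range(pages):
        request = copy.deepcopy(base_request)  # leave the caller's dict untouched
        request["url"] = with_start(base_request["url"], page * page_size)
        # Assumption: any unique string is accepted as a session identifier.
        request["session_id"] = uuid.uuid4().hex
        requests.append(request)
    return requests
```

Sending the resulting bodies sequentially with a short delay, rather than in a burst, is the more conservative choice given the session-based rate limiting described above.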

7. Salary data reality check: what you actually get without authentication

Without an authenticated Glassdoor session, the salary tab returns either a blurred cell with an 'Unlock' CTA or a deliberately wide salary range that is statistically useless for benchmarking (e.g., '$55,000–$210,000' for a Software Engineer). This is not a scraping limitation — it is intentional product design to drive account creation and engagement.
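A pipeline can flag these cases up front instead of storing them as if they were real bands. The sketch below classifies the text of a salary cell; the labels, the function names, and the max_ratio threshold of 2.0 are illustrative assumptions, not values taken from Glassdoor.

```python
import re
from typing import Optional, Tuple

# Matches "$55,000" as well as "$60K"; the suffix must directly follow the number.
_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)([KkMm]\b)?")


def parse_range(text: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) if the text contains exactly two dollar amounts."""
    amounts = []
    for number, suffix in _AMOUNT.findall(text):
        value = float(number.replace(",", ""))
        value *= {"k": 1_000, "m": 1_000_000}.get(suffix.lower(), 1)
        amounts.append(value)
    if len(amounts) != 2:
        return None
    low, high = sorted(amounts)
    return low, high


def classify_salary_cell(text: str, max_ratio: float = 2.0) -> str:
    """Label a salary cell as 'locked', 'unparsed', 'too_wide', or 'usable'."""
    if "unlock" in text.lower():
        return "locked"
    bounds = parse_range(text)
    if bounds is None:
        return "unparsed"
    low, high = bounds
    # A range whose top is several times its bottom carries little signal.
    if low <= 0 or high / low > max_ratio:
        return "too_wide"
    return "usable"
```

With the two example ranges quoted in this guide, `$55,000–$210,000` and `$60K–$200K`, the high end is more than three times the low end, so both are labeled `too_wide` under the default threshold.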

Even with an authenticated account, Glassdoor's Terms of Service prohibit automated collection of salary data. Scraping salary data at scale and republishing it in a competing HR product has been the subject of litigation in the industry. Internal compensation research reviewed by legal counsel is a different use case from commercial data resale, but both require careful ToS review.

For production compensation benchmarking, the standard approach is licensed data: Radford (Aon), Mercer, Willis Towers Watson, or Levels.fyi's API for tech roles. These datasets provide statistically valid sample sizes, job-level granularity, and legal data use rights. Glassdoor's public salary ranges are useful for directional sense-checking, not for building compensation bands.

Do not automate Glassdoor login using credentials obtained from any source other than your own account. Credential stuffing and bulk account creation for scraping purposes can expose you to liability under the Computer Fraud and Abuse Act (US), the Computer Misuse Act (UK), and analogous statutes in other jurisdictions.

8. Anonymizing employee review data before use

Glassdoor reviews are pseudonymous, not anonymous. A review that says 'Senior Software Engineer in Seattle, WA' at a company with three senior engineers in Seattle effectively identifies the reviewer. Before publishing aggregated review sentiment or feeding review data into internal dashboards, strip job titles, locations, and any other quasi-identifying fields.

GDPR Article 4 defines personal data broadly: if a review is reasonably linkable to an identifiable natural person, it is personal data regardless of whether the reviewer used their real name. Reviews written by people in the EU can still fall under GDPR when a non-EU company collects and processes them. Aggregate sentiment scores (e.g., average rating by department) are generally lower risk; individual review text with metadata is not.

If you are building an employer brand analytics product, implement k-anonymity thresholds: suppress any cohort (job title × location × time period) with fewer than a configurable minimum number of reviews (commonly 5 or 10) before surfacing data to end users.
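The suppression rule can be sketched in a few lines. The record keys used here (job_title, location, period, rating) are hypothetical field names for already-collected reviews, and k defaults to 5 only because that is one of the common thresholds mentioned above.

```python
from collections import defaultdict
from statistics import mean


def strip_quasi_identifiers(review: dict,
                            fields: tuple = ("job_title", "location")) -> dict:
    """Return a copy of the review without the listed quasi-identifying fields."""
    return {key: value for key, value in review.items() if key not in fields}


def aggregate_by_cohort(reviews: list, k: int = 5) -> dict:
    """Average ratings per (job_title, location, period) cohort.

    Cohorts with fewer than k reviews are suppressed entirely, so small
    groups never reach end users.
    """
    cohorts = defaultdict(list)
    for review in reviews:
        key = (review["job_title"], review["location"], review["period"])
        cohorts[key].append(review["rating"])
    return {
        key: {"n": len(ratings), "avg_rating": round(mean(ratings), 2)}
        for key, ratings in cohorts.items()
        if len(ratings) >= k
    }
```

Suppressing the whole cohort, rather than publishing it with a "low sample" flag, is the stricter choice: a flagged row still reveals that a very small group exists and what it said.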

9. Glassdoor Terms of Service and legal considerations

Glassdoor's User Agreement (Section 6) explicitly prohibits scraping, crawling, and automated data collection. The prohibition covers both authenticated and unauthenticated access. Glassdoor has pursued legal action against companies that scraped and republished its data in competing HR analytics products — the hiQ Labs v. LinkedIn precedent on public data does not straightforwardly apply to Glassdoor because much of its high-value data (precise salaries, full review text) sits behind authentication.

The practical compliance boundary most legal teams draw: scraping publicly visible aggregate ratings (overall score, review count) for internal research is lower risk than scraping salary data or full review text for commercial redistribution. Neither is explicitly permitted by the ToS. Any production use case should involve legal review of the specific data fields, volumes, and downstream use.

If your use case is workforce analytics or employer brand monitoring, evaluate Glassdoor's official data licensing program before building a scraper. Licensed access provides structured data, refresh SLAs, and legal indemnification that a scraper cannot.

Frequently asked questions

Why are Glassdoor salary numbers blurred even when I scrape the salary tab?

Blurred salary cells are an intentional product feature, not a technical limitation. Glassdoor replaces precise salary values with an 'Unlock' CTA or a wide range for unauthenticated sessions to drive account sign-ups. Even with a logged-in session, the Terms of Service prohibit automated collection of salary data. What you can reliably extract without authentication is the job title label and a broad range — not the median or percentile breakdowns that make the data useful for benchmarking.

How do I extract the Glassdoor employer ID from a URL?

Match the regular expression /E(\d+)/ against the URL path and take the last match. In the overview URL EI_IE9079.11,17.htm, the numeric ID is 9079. In the reviews URL E9079.htm, it is the same. Use this integer as your primary key for all employer-related requests: it is stable across URL restructuring, regional domains, and company name changes.

Which OmniScrape mode should I use for Glassdoor?

Use mode 'auto' with enable_solver: true and a residential US proxy for the overview page — it handles the bot detection layer and escalates to a headless browser automatically if needed. For the reviews tab, use mode 'js_rendering' explicitly with js_wait_selector set to 'li[id^="empReview"]', because review cards are rendered client-side and will not appear in a fast HTTP response.

Can OmniScrape log into Glassdoor on my behalf?

OmniScrape can execute login flows for accounts you own and are contractually permitted to automate. However, bulk scraping Glassdoor via automated accounts — whether your own or third-party credentials — violates Glassdoor's Terms of Service and may constitute unauthorized computer access under applicable law. The Web Unlocker solves bot detection on public pages; it does not bypass authentication requirements or paywall controls.

How do I handle Glassdoor's pagination for review scraping?

Increment the start query parameter in multiples of 10: ?start=0, ?start=10, ?start=20, and so on. Each paginated request should use a fresh residential proxy session (set a new session_id per request or omit session_id entirely) to avoid session-based rate limiting. Set js_wait_selector to 'li[id^="empReview"]' on each request to confirm that the new page of reviews has rendered before extraction runs. Expect CAPTCHA challenges on deeper pagination — enable_solver: true handles these automatically.

What is the difference between scraping Glassdoor and scraping Indeed for salary data?

Indeed surfaces salary data directly on job postings when employers disclose it — that data is tied to an active job listing and is generally more current. Glassdoor salary data is crowdsourced by employees and covers historical compensation across roles, not just open positions. Both platforms restrict automated collection in their Terms of Service. For job-posting salaries, see the Indeed scraper guide. For crowdsourced comp benchmarks, Glassdoor is the source — but expect the paywall to limit what you can extract without authentication.

Is Glassdoor review data subject to GDPR?

Yes, potentially. GDPR applies to personal data of EU residents regardless of where the processing company is located. A Glassdoor review that includes a job title, city, and approximate tenure at a small company can be reasonably linked to a specific individual — making it personal data under GDPR Article 4. Before storing or processing review-level data, strip quasi-identifying fields (job title, location, time period) or apply k-anonymity thresholds. Aggregate sentiment scores derived from reviews are generally lower risk than individual review records with metadata.

Related guides

  • Job Board Web Scraping: HR Tech Pipeline for Labor Market Intelligence
  • Indeed Scraper: Extract Job Listings, Salaries, and Company Data
  • Market Research Web Scraping: Multi-Geo Data Collection for Research Firms
  • Web Scraping Without Getting Blocked
