1. Glassdoor data fields HR analytics teams want
Compensation teams benchmark role salaries by metro area and job family. Employer brand teams track rating trends over quarters to correlate with hiring events or layoffs. Recruiters scan interview question snippets and difficulty ratings before outreach. The fields below represent what Glassdoor surfaces across its employer profile, reviews, salaries, and interview tabs — some publicly, some only after authentication.
Fields that are publicly visible (no login) on the overview page include aggregate ratings and review counts. Fields marked as login-required return blurred or missing values for unauthenticated requests, regardless of how you send them.
- Employer ID and canonical company name
- Overall rating (1–5 scale) and sub-ratings: culture, work-life balance, compensation & benefits, senior management
- CEO name and approval percentage (thumbs-up/down vote tally)
- Recommend to a friend percentage
- Total review count and total salary report count
- Salary ranges by job title and location (login-required for precise medians)
- Individual review snippets: headline, pros, cons, reviewer job title and city
- Interview difficulty score (Easy / Medium / Hard) and sample question text
- Benefits ratings summary and category breakdown
- Competitor employers listed on the overview sidebar
- Business outlook rating and positive business outlook percentage
2. Glassdoor URL patterns and employer ID extraction
Glassdoor organizes all employer content under a single numeric employer ID embedded in every URL. The pattern is consistent across tabs, which makes it straightforward to construct target URLs programmatically once you have the ID. The employer ID appears in two forms: the full slug form (EI_IE9079.11,17) on overview pages, and the short form (E9079) on reviews, salaries, jobs, and interview pages.
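To extract the numeric ID from a known company URL, match /E(\d+)/ against the final path segment and take the digits that sit directly before .htm (or before the .11,17-style offset on overview pages). A bare /E(\d+)/ can also hit an E-plus-digits run inside a company name. Use that integer as your primary key for refresh jobs: it is stable across URL restructuring and regional domains. Regional Glassdoor sites (glassdoor.co.uk, glassdoor.de, glassdoor.fr) use the same employer IDs on locale-specific domains.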
- Overview: https://www.glassdoor.com/Overview/Working-at-Google-EI_IE9079.11,17.htm
- Reviews: https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm
- Salaries: https://www.glassdoor.com/Salary/Google-Salaries-E9079.htm
- Jobs: https://www.glassdoor.com/Jobs/Google-Jobs-E9079.htm
- Interview questions: https://www.glassdoor.com/Interview/Google-Interview-Questions-E9079.htm
- Benefits: https://www.glassdoor.com/Benefits/Google-Benefits-E9079.htm
- Employer ID extraction: match /E(\d+)/ — numeric portion is the stable key
- Pagination on reviews: append ?sort.sortType=RD&sort.ascending=false&filter.iso3Language=eng&filter.employmentStatus=REGULAR&start=10
- UK locale: https://www.glassdoor.co.uk/Reviews/Google-Reviews-E9079.htm
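The extraction and URL construction described above can be sketched in Python. This is a minimal illustration, not an official client: the helper names (`extract_employer_id`, `tab_urls`) are hypothetical, the templates are copied from the URL list above, and the regex is anchored on the `.htm` suffix as a defensive assumption so that an E-plus-digits run inside a company slug is not mistaken for the ID.

```python
import re
from typing import Dict, Optional

# Anchor on ".htm" (optionally preceded by the ".11,17"-style offset used on
# overview pages) so only the trailing E<digits> token is captured.
EMPLOYER_ID_RE = re.compile(r"E(\d+)(?:\.\d+,\d+)?\.htm")

# Templates taken from the URL list above; {slug} is the company name as it
# appears in the URL, {eid} is the numeric employer ID.
TAB_TEMPLATES = {
    "reviews": "https://www.glassdoor.com/Reviews/{slug}-Reviews-E{eid}.htm",
    "salaries": "https://www.glassdoor.com/Salary/{slug}-Salaries-E{eid}.htm",
    "jobs": "https://www.glassdoor.com/Jobs/{slug}-Jobs-E{eid}.htm",
    "interviews": "https://www.glassdoor.com/Interview/{slug}-Interview-Questions-E{eid}.htm",
    "benefits": "https://www.glassdoor.com/Benefits/{slug}-Benefits-E{eid}.htm",
}


def extract_employer_id(url: str) -> Optional[int]:
    """Return the numeric employer ID from a Glassdoor URL, or None."""
    match = EMPLOYER_ID_RE.search(url)
    return int(match.group(1)) if match else None


def tab_urls(slug: str, employer_id: int) -> Dict[str, str]:
    """Build the per-tab URLs for one employer."""
    return {
        tab: template.format(slug=slug, eid=employer_id)
        for tab, template in TAB_TEMPLATES.items()
    }
```

Calling `extract_employer_id` on either the overview or the reviews URL for the same company should return the same integer, which can then be passed to `tab_urls` together with the slug.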
3. Glassdoor page structure and CSS selectors
Glassdoor's frontend has been rebuilt multiple times. The selectors below reflect the current structure, but class names are frequently obfuscated or changed in A/B tests. Where possible, prefer data-test attributes and structural selectors over class-name-only selectors, as data attributes are more stable across deploys.
The overall rating appears in a span with class rating-number or inside a div with class rating-headline. Sub-ratings for culture, work-life balance, compensation, and management render in li.rating-item elements with a label span and a value span. CEO approval lives in a div.ceoApproval block containing the CEO name and a percentage span.
Salary rows on the public salary tab render in table rows (tr.cdm-module-table-row or similar), with the salary range in a span.range child. For unauthenticated sessions, the range cell is replaced with an 'Unlock' CTA or a deliberately wide range (e.g., '$60K–$200K') that is not useful for benchmarking. Individual review cards are li elements whose id attribute begins with empReview — for example, li[id^='empReview'].
Glassdoor also embeds structured employer data in inline script tags. Look for JSON blobs assigned to window.appCache, window.__INITIAL_STATE__, or ApplicationSettings. Parsing these can yield cleaner data than DOM extraction, but the schema changes without notice and the blobs may be absent on bot-detected sessions.
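One way to pull such a blob out of raw HTML is sketched below, using only the Python standard library. It assumes the state is written as a plain `name = { ... }` assignment holding valid JSON, which may not hold on every page. The function name `extract_state_blob` is hypothetical, and `ApplicationSettings` is left out because its assignment form is not specified here.

```python
import json
import re
from typing import Optional

# Variable names mentioned above; the exact assignment syntax is an assumption.
STATE_NAMES = ("window.appCache", "window.__INITIAL_STATE__")


def extract_state_blob(html: str) -> Optional[dict]:
    """Return the first inline JSON state object found in the HTML, or None."""
    decoder = json.JSONDecoder()
    for name in STATE_NAMES:
        # Match "<name> = " and stop right before the opening brace.
        match = re.search(re.escape(name) + r"\s*=\s*(?=\{)", html)
        if not match:
            continue
        try:
            # raw_decode parses one JSON value starting at the given index and
            # ignores whatever follows it (semicolon, closing script tag).
            blob, _end = decoder.raw_decode(html, match.end())
        except json.JSONDecodeError:
            continue  # not valid JSON; try the next candidate name
        if isinstance(blob, dict):
            return blob
    return None
```

A `None` result should be treated as a signal to fall back to DOM selectors, since the blob may be missing on sessions flagged as bots.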
4. Login walls, paywalls, and anti-bot measures
Glassdoor deploys layered access controls. The first layer is the login prompt: salary detail pages redirect unauthenticated users to a sign-in modal or return blurred cell values. The second layer is the paywall overlay on free accounts — even logged-in free users see a limited number of salary unlocks per month. The third layer is bot detection: Glassdoor uses fingerprinting and behavioral analysis that reliably blocks datacenter IP ranges, headless browser signatures, and high-frequency request patterns.
Regional sites (glassdoor.co.uk, glassdoor.de) apply the same protection stack with locale-specific CAPTCHA flows. Employer search (/employer/search) is particularly aggressive — expect CAPTCHA challenges on the second or third paginated request from a fresh session. OmniScrape's residential proxy pool and Web Unlocker solve the bot detection layer for public pages; they do not bypass the login wall or the salary paywall, which are intentional access controls tied to account state.
- Login required for precise salary medians and full salary report detail
- Blurred or wide-range salary cells on unauthenticated salary tab
- CAPTCHA on employer search and paginated review requests
- Headless browser fingerprint detection — use residential proxies
- Datacenter IP blocks on repeated requests
- Obfuscated JSON state blobs that may be absent on detected bot sessions
- ToS Section 6 explicitly prohibits automated scraping and data collection
- Session-based rate limiting on review pagination
5. Scraping the employer overview page (public fields)
The employer overview page exposes aggregate ratings, review counts, and company metadata without requiring authentication. This is the safest and most reliable Glassdoor endpoint to scrape. Use mode 'auto' with a residential US proxy — Glassdoor's CDN serves different content based on geolocation, and a US IP returns the most complete public data for US-listed employers.
The css_extractor output format lets OmniScrape run the CSS selectors server-side and return only the extracted values, reducing payload size and parsing work on your end. The selectors below pair class-name selectors with data-test fallbacks for the current overview layout; because class names change between deploys, verify them against a live page before relying on them. Adjust the employer URL and ID for your target company.
Parsed HTML is available in body.data.content if you need the full page for selector debugging. The css_extracted object in the response contains the mapped values directly.
{
  "url": "https://www.glassdoor.com/Overview/Working-at-Microsoft-EI_IE1651.11,20.htm",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "css_selectors": {
    "company_name": "h1.employerName, h1[data-test='employer-name']",
    "overall_rating": "span.rating-number, div.rating-headline span",
    "review_count": "a.reviewCount, a[data-test='review-count']",
    "recommend_pct": "span.recommendRating, span[data-test='recommend-pct']",
    "ceo_name": "div.ceoApproval span.ceoName",
    "ceo_approval": "div.ceoApproval span.ceoApprovalRating, span[data-test='ceo-approval']",
    "business_outlook": "span[data-test='business-outlook-pct']",
    "company_description": "div.employerDescription, div[data-test='employer-description']",
    "headquarters": "div[data-test='headquarters']",
    "industry": "div[data-test='industry']",
    "employee_count": "div[data-test='size']"
  }
}
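The extracted values arrive as display text, so a normalization pass is usually needed before they can be stored or compared. The sketch below is one possible approach under stated assumptions: the exact shape of `css_extracted` (string or list per key) and the display formats ("4.3", "87%", "12,345 Reviews", "45.2K") are guesses, and all function names are hypothetical.

```python
import re
from typing import Any, Dict, Optional


def first_text(value: Any) -> str:
    """Coerce a str, list, or None from the extractor into a stripped string."""
    if isinstance(value, list):
        value = value[0] if value else None
    return str(value).strip() if value is not None else ""


def parse_rating(value: Any) -> Optional[float]:
    """Parse a star rating such as "4.3"; reject values above 5."""
    match = re.search(r"\d+(?:\.\d+)?", first_text(value))
    if not match:
        return None
    rating = float(match.group())
    return rating if rating <= 5.0 else None


def parse_percent(value: Any) -> Optional[int]:
    """Parse a percentage such as "87%" or "91% approve"."""
    match = re.search(r"(\d{1,3})\s*%", first_text(value))
    return int(match.group(1)) if match else None


def parse_count(value: Any) -> Optional[int]:
    """Parse counts such as "12,345 Reviews" or the abbreviated "45.2K"."""
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)([Kk])?\b", first_text(value))
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    return round(number * (1000 if match.group(2) else 1))


def normalize_overview(css_extracted: Dict[str, Any]) -> Dict[str, Any]:
    """Map the numeric overview fields from the request above to typed values."""
    return {
        "overall_rating": parse_rating(css_extracted.get("overall_rating")),
        "review_count": parse_count(css_extracted.get("review_count")),
        "recommend_pct": parse_percent(css_extracted.get("recommend_pct")),
        "ceo_approval": parse_percent(css_extracted.get("ceo_approval")),
        "business_outlook": parse_percent(css_extracted.get("business_outlook")),
    }
```

Fields that fail to parse come back as `None` rather than raising, which makes selector drift visible as missing values instead of crashed jobs.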
6. Scraping review listings (truncated for unauthenticated sessions)
The reviews tab renders review cards client-side via JavaScript, so mode 'js_rendering' is required. Use js_wait_selector to wait for the review list to appear in the DOM before extraction. Without this, you will receive the initial HTML shell with empty review containers.
Unauthenticated sessions return truncated review text — typically the headline and a short excerpt of pros and cons, with the full text behind a login prompt. The data you can reliably extract without authentication includes: review headline, star rating, reviewer job title, reviewer city, review date, and the visible snippet of pros and cons text.
Pagination requires incrementing the start query parameter (start=0, start=10, start=20). Each paginated request should use a fresh residential proxy session to avoid session-based rate limiting. The js_wait_selector li[id^='empReview'] confirms that review cards have rendered before extraction runs.
{
  "url": "https://www.glassdoor.com/Reviews/Microsoft-Reviews-E1651.htm",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "li[id^='empReview']",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "review_headlines": "h2.review-summary, h2[data-test='review-title']",
    "pros": "span[data-test='pros'], p.pros",
    "cons": "span[data-test='cons'], p.cons",
    "star_ratings": "span.ratingNumber, span[data-test='rating']",
    "reviewer_job_title": "span.authorJobTitle, span[data-test='author-job-title']",
    "reviewer_location": "span.authorLocation, span[data-test='author-location']",
    "review_dates": "span.review-date, time[data-test='review-date']",
    "helpful_count": "span[data-test='helpful-count']"
  }
}
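To paginate, the same request body can be generated once per page with a different start offset and a fresh session. The sketch below only builds the payloads; it does not send them, because the API endpoint and authentication are not covered here. The query parameters come from the pagination pattern in section 2, `session_id` is assumed to accept any unique string, and `review_page_payloads` is a hypothetical helper name.

```python
import uuid
from typing import Dict, List
from urllib.parse import urlencode

REVIEWS_URL = "https://www.glassdoor.com/Reviews/Microsoft-Reviews-E1651.htm"
PAGE_SIZE = 10  # reviews per page, matching the start=0, 10, 20 increments


def review_page_payloads(
    base_url: str, pages: int, css_selectors: Dict[str, str]
) -> List[dict]:
    """Build one request body per review page, each with its own session."""
    payloads = []
    for page in range(pages):
        query = urlencode({
            "sort.sortType": "RD",
            "sort.ascending": "false",
            "filter.iso3Language": "eng",
            "filter.employmentStatus": "REGULAR",
            "start": page * PAGE_SIZE,
        })
        payloads.append({
            "url": f"{base_url}?{query}",
            "mode": "js_rendering",
            "output_format": "css_extractor",
            "proxy": "residential:us",
            "enable_solver": True,
            # Assumption: a new random ID per request yields a fresh session.
            "session_id": uuid.uuid4().hex,
            "js_wait_selector": "li[id^='empReview']",
            "js_wait_timeout": 15000,
            "css_selectors": css_selectors,
        })
    return payloads
```

Generating the payloads up front keeps the offsets and session IDs easy to log, which helps when a deeper page returns a CAPTCHA and needs to be retried on its own.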
7. Salary data reality check: what you actually get without authentication
Without an authenticated Glassdoor session, the salary tab returns either a blurred cell with an 'Unlock' CTA or a deliberately wide salary range that is statistically useless for benchmarking (e.g., '$55,000–$210,000' for a Software Engineer). This is not a scraping limitation — it is intentional product design to drive account creation and engagement.
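If salary cells are collected anyway for directional sense-checking, it can help to flag placeholder ranges so they never reach a benchmark. The following is a rough sketch under assumptions: the range formats are taken from the examples in this guide ('$60K–$200K', '$55,000–$210,000'), the `max_ratio` cutoff of 2.0 is an arbitrary illustrative threshold, and both function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches "$60K", "$55,000", "$120.5K"; the K suffix means thousands.
_AMOUNT_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)([Kk])?")


def parse_salary_range(text: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) in dollars, or None if two amounts are not found."""
    amounts = []
    for number, suffix in _AMOUNT_RE.findall(text):
        value = float(number.replace(",", ""))
        amounts.append(value * 1000 if suffix else value)
    if len(amounts) != 2:
        return None
    low, high = sorted(amounts)
    return low, high


def is_usable_range(text: str, max_ratio: float = 2.0) -> bool:
    """Reject 'Unlock' placeholders and ranges wider than max_ratio."""
    parsed = parse_salary_range(text)
    if parsed is None:
        return False  # CTA text, blurred cell, or unparseable value
    low, high = parsed
    return low > 0 and high / low <= max_ratio
```

Both example ranges quoted in this guide fail the check at the default threshold, while a tighter range such as '$120K–$150K' passes.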
Even with an authenticated account, Glassdoor's Terms of Service prohibit automated collection of salary data. Scraping salary data at scale and republishing it in a competing HR product has been the subject of litigation in the industry. Internal compensation research reviewed by legal counsel is a different use case from commercial data resale, but both require careful ToS review.
For production compensation benchmarking, the standard approach is licensed data: Radford (Aon), Mercer, Willis Towers Watson, or Levels.fyi's API for tech roles. These datasets provide statistically valid sample sizes, job-level granularity, and legal data use rights. Glassdoor's public salary ranges are useful for directional sense-checking, not for building compensation bands.
Do not automate Glassdoor login using credentials obtained from any source other than your own account. Credential stuffing and bulk account creation for scraping purposes can violate the Computer Fraud and Abuse Act (US), the Computer Misuse Act (UK), and analogous statutes in other jurisdictions.
8. Anonymizing employee review data before use
Glassdoor reviews are pseudonymous, not anonymous. A review that says 'Senior Software Engineer in Seattle, WA' at a company with three senior engineers in Seattle effectively identifies the reviewer. Before publishing aggregated review sentiment or feeding review data into internal dashboards, strip job titles, locations, and any other quasi-identifying fields.
GDPR Article 4 defines personal data broadly — if a review is reasonably linkable to an identifiable natural person, it is personal data regardless of whether the reviewer used their real name. EU employee reviews collected and processed by a non-EU company still fall under GDPR if the reviewer is in the EU. Aggregate sentiment scores (e.g., average rating by department) are generally safe; individual review text with metadata is not.
If you are building an employer brand analytics product, implement k-anonymity thresholds: suppress any cohort (job title × location × time period) with fewer than a configurable minimum number of reviews (commonly 5 or 10) before surfacing data to end users.
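The suppression rule can be sketched as a small aggregation step. This is an illustration only: the record keys (`job_title`, `location`, `period`, `rating`) and the function name are hypothetical, and the default of 5 is just one of the commonly used minimums mentioned above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Cohort = Tuple[str, str, str]  # (job title, location, time period)


def aggregate_with_suppression(
    reviews: Iterable[dict], k: int = 5
) -> Dict[Cohort, dict]:
    """Average ratings per cohort, dropping cohorts with fewer than k reviews."""
    ratings_by_cohort = defaultdict(list)
    for review in reviews:
        cohort = (review["job_title"], review["location"], review["period"])
        ratings_by_cohort[cohort].append(review["rating"])

    return {
        cohort: {"n": len(ratings), "avg_rating": round(mean(ratings), 2)}
        for cohort, ratings in ratings_by_cohort.items()
        if len(ratings) >= k  # suppress small cohorts entirely
    }
```

Only the cohort key, count, and average leave this function; individual review text and per-review metadata are never returned, which keeps the output at the aggregate level.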
9. Glassdoor Terms of Service and legal considerations
Glassdoor's User Agreement (Section 6) explicitly prohibits scraping, crawling, and automated data collection. The prohibition covers both authenticated and unauthenticated access. Glassdoor has pursued legal action against companies that scraped and republished its data in competing HR analytics products — the hiQ Labs v. LinkedIn precedent on public data does not straightforwardly apply to Glassdoor because much of its high-value data (precise salaries, full review text) sits behind authentication.
The practical compliance boundary most legal teams draw: scraping publicly visible aggregate ratings (overall score, review count) for internal research is lower risk than scraping salary data or full review text for commercial redistribution. Neither is explicitly permitted by the ToS. Any production use case should involve legal review of the specific data fields, volumes, and downstream use.
If your use case is workforce analytics or employer brand monitoring, evaluate Glassdoor's official data licensing program before building a scraper. Licensed access provides structured data, refresh SLAs, and legal indemnification that a scraper cannot.
Frequently asked questions
Why are Glassdoor salary numbers blurred even when I scrape the salary tab?
Blurred salary cells are an intentional product feature, not a technical limitation. Glassdoor replaces precise salary values with an 'Unlock' CTA or a wide range for unauthenticated sessions to drive account sign-ups. Even with a logged-in session, the Terms of Service prohibit automated collection of salary data. What you can reliably extract without authentication is the job title label and a broad range — not the median or percentile breakdowns that make the data useful for benchmarking.
How do I extract the Glassdoor employer ID from a URL?
Match the regular expression /E(\d+)/ against the URL path. In the overview URL EI_IE9079.11,17.htm, the numeric ID is 9079. In the reviews URL E9079.htm, it is the same. Use this integer as your primary key for all employer-related requests: it is stable across URL restructuring, regional domains, and company name changes.
Which OmniScrape mode should I use for Glassdoor?
Use mode 'auto' with enable_solver: true and a residential US proxy for the overview page — it handles the bot detection layer and escalates to a headless browser automatically if needed. For the reviews tab, use mode 'js_rendering' explicitly with js_wait_selector set to 'li[id^="empReview"]', because review cards are rendered client-side and will not appear in a fast HTTP response.
Can OmniScrape log into Glassdoor on my behalf?
OmniScrape can execute login flows for accounts you own and are contractually permitted to automate. However, bulk scraping Glassdoor via automated accounts — whether your own or third-party credentials — violates Glassdoor's Terms of Service and may constitute unauthorized computer access under applicable law. The Web Unlocker solves bot detection on public pages; it does not bypass authentication requirements or paywall controls.
How do I handle Glassdoor's pagination for review scraping?
Increment the start query parameter in multiples of 10: ?start=0, ?start=10, ?start=20, and so on. Each paginated request should use a fresh residential proxy session (set a new session_id per request or omit session_id entirely) to avoid session-based rate limiting. Set js_wait_selector to 'li[id^="empReview"]' on each request to confirm that the new page of reviews has rendered before extraction runs. Expect CAPTCHA challenges on deeper pagination — enable_solver: true handles these automatically.
What is the difference between scraping Glassdoor and scraping Indeed for salary data?
Indeed surfaces salary data directly on job postings when employers disclose it — that data is tied to an active job listing and is generally more current. Glassdoor salary data is crowdsourced by employees and covers historical compensation across roles, not just open positions. Both platforms restrict automated collection in their Terms of Service. For job-posting salaries, see the Indeed scraper guide. For crowdsourced comp benchmarks, Glassdoor is the source — but expect the paywall to limit what you can extract without authentication.
Is Glassdoor review data subject to GDPR?
Yes, potentially. GDPR applies to personal data of EU residents regardless of where the processing company is located. A Glassdoor review that includes a job title, city, and approximate tenure at a small company can be reasonably linked to a specific individual — making it personal data under GDPR Article 4. Before storing or processing review-level data, strip quasi-identifying fields (job title, location, time period) or apply k-anonymity thresholds. Aggregate sentiment scores derived from reviews are generally lower risk than individual review records with metadata.