OmniScrape

LinkedIn Scraper: Companies, Jobs, and Public Profiles

LinkedIn is the default source for B2B firmographics, job market signals, and professional identity data, and one of the few platforms that have successfully litigated against unauthorized scraping. The hiQ Labs v. LinkedIn case established a narrow CFAA precedent for public data, but LinkedIn's User Agreement still contractually prohibits automated access, and Microsoft enforces it actively. Read the User Agreement and legal risk section (section 9) before you write any code.

The technical patterns below apply exclusively to pages LinkedIn renders to logged-out browsers — public company pages, individual job postings, and the thin slice of profile data visible without authentication. This is not a guide for bypassing login walls or automating LinkedIn accounts. If you need sustainable B2B enrichment at scale, evaluate LinkedIn's official APIs and licensed data providers (Crunchbase, People Data Labs, PitchBook) before building a scraper. For contexts where you have lawful access to public pages, this guide documents exactly what the HTML looks like, what breaks scrapers, and where headless browser scraping becomes necessary.

On this page

1. Data fields worth extracting from LinkedIn
2. LinkedIn URL patterns for public pages
3. Parsing LinkedIn's obfuscated markup
4. LinkedIn's anti-bot enforcement stack
5. Scrape a public company page
6. Scrape an individual job posting
7. Job board aggregation at scale
8. Official APIs and licensed data alternatives
9. User Agreement and legal risk: read before writing code
10. FAQ

1. Data fields worth extracting from LinkedIn

Before writing selectors, map what is actually available to a logged-out browser versus what requires authentication. LinkedIn aggressively gates high-value fields behind login — email addresses, full work histories, connection graphs, and InMail content are never accessible without an authenticated session, and automating that session violates the User Agreement.

The fields below are those visible on public pages to an unauthenticated request. Even these can disappear behind a login prompt if LinkedIn's bot detection flags your traffic — residential proxies and low request frequency reduce but do not eliminate that risk.

Sales intelligence teams typically want company size, industry, headquarters, and recent post activity. Recruiting and HR products want job title, location, seniority level, and posting date to feed freshness signals. Enrichment pipelines want public profile headlines and current employer to cross-reference against CRM records — not contact details.

  • Company: name, tagline, industry vertical, employee count range (e.g. '10,001+ employees'), HQ city and country, follower count, LinkedIn URL slug
  • Company about: founded year, company type (public/private/nonprofit), website URL, specialties list, description text
  • Company posts: post text, reaction counts, comment counts, post URL, author name — only on public post pages
  • Jobs: title, company name, location string, workplace type (Remote / Hybrid / On-site), time since posted, easy-apply flag
  • Job description: responsibilities and requirements text, salary range when listed, applicant count when visible (e.g. 'Over 200 applicants')
  • Public profiles (logged-out): headline, current role title and employer, location, connection count label ('500+ connections') — full work history is gated
  • School and showcase pages: page name, follower count, description — same structure as company pages

2. LinkedIn URL patterns for public pages

Use canonical URLs directly. Avoid constructing URLs from search result fragments — LinkedIn's search pages are heavily rate-limited and often redirect to login for non-human traffic. Company slugs and job IDs are stable identifiers you can store and revisit.

Job search URLs (/jobs/search/) work without login for the first few pages but paginate via &start=25 increments and block aggressive crawlers quickly. Treat them as a discovery mechanism, not a bulk harvest endpoint. Individual job view URLs are far more reliable for extraction.

People search (/search/results/people/) and Sales Navigator are fully login-gated. Do not attempt to automate them.

  • Company overview: https://www.linkedin.com/company/stripe/
  • Company about tab: https://www.linkedin.com/company/stripe/about/
  • Company posts tab: https://www.linkedin.com/company/stripe/posts/
  • Job view (canonical): https://www.linkedin.com/jobs/view/3847291847/
  • Job search (logged-out, limited depth): https://www.linkedin.com/jobs/search/?keywords=data+engineer&location=Berlin
  • Public profile: https://www.linkedin.com/in/williamhgates/ — only headline and partial experience visible without login
  • School page: https://www.linkedin.com/school/mit/
  • Showcase page: https://www.linkedin.com/showcase/microsoft-azure/
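
Because company slugs and job IDs are the stable identifiers, it helps to pull them out of URLs in one place. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the path patterns cover only the URL shapes listed above.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Path shapes taken from the public URL patterns listed above.
_JOB_PATH = re.compile(r"^/jobs/view/(\d+)/?$")
_COMPANY_PATH = re.compile(r"^/company/([^/]+)(?:/(?:about|posts))?/?$")


def _is_linkedin(host: Optional[str]) -> bool:
    """Accept linkedin.com and its subdomains, nothing else."""
    return host is not None and (host == "linkedin.com" or host.endswith(".linkedin.com"))


def job_id_from_url(url: str) -> Optional[str]:
    """Return the numeric job ID from a /jobs/view/<id>/ URL, else None."""
    parsed = urlparse(url)
    if not _is_linkedin(parsed.hostname):
        return None
    match = _JOB_PATH.match(parsed.path)
    return match.group(1) if match else None


def company_slug_from_url(url: str) -> Optional[str]:
    """Return the company slug from an overview, about, or posts URL."""
    parsed = urlparse(url)
    if not _is_linkedin(parsed.hostname):
        return None
    match = _COMPANY_PATH.match(parsed.path)
    return match.group(1) if match else None


def job_view_url(job_id: str) -> str:
    """Rebuild the canonical job view URL from a stored job ID."""
    return f"https://www.linkedin.com/jobs/view/{job_id}/"
```

Storing only the slug or ID and rebuilding the URL on each run keeps query strings and tracking parameters out of your keys.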

3. Parsing LinkedIn's obfuscated markup

LinkedIn randomizes CSS class names on every deployment using CSS module hashing. A selector that works today — org-top-card-summary__title, for example — may be replaced with a meaningless hash within weeks. This is the primary reason LinkedIn scrapers require constant maintenance. Build monitoring into any production pipeline: alert on empty extracts rather than silently storing null values.
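
The "alert on empty extracts" advice can be sketched as a small guard that runs before anything is written to storage. This is an illustration only: the function names are hypothetical, and it assumes the extract arrives as a flat mapping of field name to string.

```python
from collections.abc import Mapping, Sequence
from typing import Optional


def empty_required_fields(
    extract: Mapping[str, Optional[str]],
    required: Sequence[str],
) -> list[str]:
    """Return required field names that are missing, None, or whitespace-only."""
    return [name for name in required if not (extract.get(name) or "").strip()]


def check_extract(
    extract: Mapping[str, Optional[str]],
    required: Sequence[str],
    url: str,
) -> None:
    """Raise instead of letting an empty extract reach storage."""
    missing = empty_required_fields(extract, required)
    if missing:
        raise ValueError(f"selector drift suspected on {url}: empty {missing}")
```

Raising here is one design choice; routing the same list of missing fields to a metrics counter or an alerting hook would serve the same purpose.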

Stable anchors that survive deploys more reliably include: JSON-LD structured data embedded in script tags, data-test-id attributes on some elements, semantic HTML structure (h1 as the first heading in the top card, the first anchor inside a company span), and ARIA labels on interactive elements. JSON-LD is the most reliable: LinkedIn embeds schema.org JobPosting data on job view pages and Organization data on some company pages.

For job postings, parse the JSON-LD block first. It typically contains title, datePosted, hiringOrganization.name, jobLocation, and description. Fall back to CSS selectors only for fields not in structured data — applicant count and workplace type are rarely in JSON-LD.
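
As an illustration of the JSON-LD-first approach, the sketch below collects every script block of type application/ld+json with the standard-library HTML parser and returns the first object typed JobPosting. The function name is hypothetical, and it assumes the block is either a single object or a list of objects; nested @graph layouts are not handled.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class _JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_job_posting(html: str) -> Optional[dict[str, Any]]:
    """Return the first JSON-LD object whose @type is JobPosting, else None."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than fail the whole page
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "JobPosting":
                return item
    return None
```

A None result is worth logging separately from a parse error, since it usually means the page was a login wall or the structured data was removed.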

Company employee counts appear as text near the top card — look for a span or anchor containing the string 'employees' and extract the preceding number range. The exact wrapper element changes; a text-content search is more durable than a class-name selector.
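
A text-content search of this kind might look like the sketch below. It assumes you already have the visible text of the top card as a string, and that ranges are written either open-ended ('10,001+ employees') or with a hyphen or en dash ('501-1,000 employees'); the second form is an assumption, not something this guide confirms.

```python
import re
from typing import Optional

# Group 1: lower bound. Group 2: "+" for open-ended. Group 3: upper bound.
_EMPLOYEES = re.compile(
    r"(\d[\d,]*)\s*(?:(\+)|[-\u2013]\s*(\d[\d,]*))?\s+employees",
    re.IGNORECASE,
)


def parse_employee_range(text: str) -> Optional[tuple[int, Optional[int]]]:
    """Return (low, high) from text containing an employee count.

    high is None for open-ended ranges such as '10,001+ employees'.
    A single number is returned as (n, n). Returns None if nothing matches.
    """
    match = _EMPLOYEES.search(text)
    if match is None:
        return None
    low = int(match.group(1).replace(",", ""))
    if match.group(3):
        return low, int(match.group(3).replace(",", ""))
    if match.group(2):
        return low, None
    return low, low
```

Keeping the raw matched string alongside the parsed tuple makes it easier to spot format changes later.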

Job descriptions often render truncated with a 'Show more' button. The full text is inside a container that requires JavaScript interaction to expand — this is why js_rendering mode is necessary for complete description extraction.

4. LinkedIn's anti-bot enforcement stack

LinkedIn operates one of the more sophisticated bot-detection systems among public websites. It combines IP reputation scoring (datacenter ranges are blocked almost immediately), behavioral fingerprinting, TLS fingerprint analysis, rate limiting at the IP and session level, and authentication walls that appear selectively based on traffic patterns. A request that returns full HTML on the first hit may return a login redirect on the tenth from the same IP.

The legal layer compounds the technical one. hiQ Labs v. LinkedIn (Ninth Circuit, 2022, on remand from the Supreme Court) held that scraping publicly accessible data may not violate the CFAA, but that ruling does not override LinkedIn's User Agreement, which explicitly prohibits scraping, crawling, and bot use. LinkedIn has sought and obtained injunctions against scrapers. Commercial products reselling LinkedIn-derived datasets face the highest exposure.

For any access pattern you do pursue on public pages: use residential proxies, keep request rates well below human browsing speed, do not attempt to log in programmatically, and do not scrape fields that are only visible after authentication. The OmniScrape Web Unlocker returns what a logged-out browser sees — it does not bypass authentication.

  • Immediate blocks on datacenter IP ranges — residential proxies are not optional
  • Login redirect walls on most profile fields beyond headline and current role
  • Rate limits and CAPTCHA challenges on job search pagination beyond page 3–4
  • CSS class name randomization breaking static selectors within weeks
  • JavaScript-rendered content requiring headless browser execution for full text
  • TLS and browser fingerprint checks that flag non-browser HTTP clients
  • Contractual prohibition in LinkedIn User Agreement, actively enforced by Microsoft
  • Legal precedent: LinkedIn has obtained injunctions against commercial scrapers

5. Scrape a public company page

Company overview and about pages are the lowest-risk LinkedIn targets — they are public marketing content LinkedIn explicitly wants indexed by search engines. Use mode 'auto' with a residential proxy. The css_extractor output format lets OmniScrape run selector matching server-side and return only the fields you need.

The selectors below target the logged-out company page structure as of the current guide revision. Treat them as starting points — validate against live HTML and update when extracts go empty. The about section selector targets the description paragraph inside the about module; website and employee count are in the info list near the top card.

If the page returns a login redirect instead of company content, LinkedIn has flagged the request. Reduce frequency, rotate residential proxy endpoints, and avoid hitting the same company URL more than once per session.

LinkedIn company about page — css_extractor request
json
{
  "url": "https://www.linkedin.com/company/stripe/about/",
  "mode": "auto",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "css_selectors": {
    "name": "h1",
    "tagline": "p.org-top-card-summary__tagline",
    "industry": "div[data-test-id='about-us__industry'] dd",
    "employee_count": "div[data-test-id='about-us__size'] dd",
    "headquarters": "div[data-test-id='about-us__headquarters'] dd",
    "founded": "div[data-test-id='about-us__foundedOn'] dd",
    "company_type": "div[data-test-id='about-us__organizationType'] dd",
    "website": "div[data-test-id='about-us__website'] a",
    "description": "p[data-test-id='about-us__description']",
    "followers": "span[data-test-id='org-top-card-followers-count']"
  }
}
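
To act on a login redirect you first have to detect it. The sketch below is a heuristic, not a documented behavior: it assumes your client exposes the final URL after redirects, and the path prefixes are assumptions to verify against the redirects you actually observe.

```python
from urllib.parse import urlparse

# Assumed path prefixes for login and auth-wall pages; adjust to what you observe.
LOGIN_PATH_PREFIXES = ("/login", "/authwall", "/uas/login", "/checkpoint")


def is_login_wall(final_url: str) -> bool:
    """True when the final URL after redirects is a login-style page."""
    return urlparse(final_url).path.startswith(LOGIN_PATH_PREFIXES)
```

When this returns True, treat the response as a block signal for that proxy endpoint rather than as an empty company page.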

6. Scrape an individual job posting

Individual job view URLs (/jobs/view/JOB_ID/) are the most reliable LinkedIn extraction target. The page structure is more consistent than search results, and LinkedIn embeds schema.org JobPosting JSON-LD structured data that survives class name randomization. Use js_rendering mode because the full job description requires JavaScript to expand: the 'Show more' handler must run before the complete markup is present in the DOM.

The js_wait_selector targets the job title heading, which appears once the top card has rendered. Set js_wait_timeout to at least 10 seconds (the example below uses 12000, which reads as milliseconds). LinkedIn's JS bundle is large, and cold-start render times on residential proxies can be slow.

After extraction, also parse the JSON-LD block from the raw HTML and prefer its values for the fields it carries, such as title, datePosted, and hiringOrganization.name. The css_extractor fields below cover what is not in structured data: applicant count, workplace type badge, and the expanded description container.

Applicant count (e.g. 'Over 200 applicants') and salary range appear inconsistently — they are present on some postings and absent on others. Handle missing fields gracefully in your parser.

LinkedIn job posting — js_rendering with css_extractor
json
{
  "url": "https://www.linkedin.com/jobs/view/3847291847/",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "h1.top-card-layout__title",
  "js_wait_timeout": 12000,
  "css_selectors": {
    "title": "h1.top-card-layout__title",
    "company": "a.topcard__org-name-link",
    "location": "span.topcard__flavor--bullet",
    "workplace_type": "span.workplace-type",
    "posted": "span.posted-time-ago__text",
    "applicants": "span.num-applicants__caption",
    "salary": "div.salary.compensation__salary",
    "description": "div.show-more-less-html__markup",
    "seniority_level": "li.description__job-criteria-item:nth-of-type(1) span.description__job-criteria-text",
    "employment_type": "li.description__job-criteria-item:nth-of-type(2) span.description__job-criteria-text"
  }
}
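
Combining the two sources and tolerating absent fields could look like the sketch below. It assumes the css_extractor result arrives as a flat mapping from the keys in the request above to strings, which is an assumption about the response shape; the function names are illustrative.

```python
import re
from collections.abc import Mapping
from typing import Any, Optional


def _clean(value: Any) -> Optional[str]:
    """Collapse whitespace; map empty or non-string values to None."""
    if not isinstance(value, str):
        return None
    text = " ".join(value.split())
    return text or None


def parse_applicant_count(caption: Optional[str]) -> Optional[int]:
    """Pull the number out of captions like 'Over 200 applicants'."""
    match = re.search(r"(\d[\d,]*)", caption or "")
    return int(match.group(1).replace(",", "")) if match else None


def merge_job_fields(
    jsonld: Optional[Mapping[str, Any]],
    css: Mapping[str, Any],
) -> dict[str, Any]:
    """Prefer JSON-LD where it carries a field; fall back to CSS output."""
    ld = jsonld or {}
    org = ld.get("hiringOrganization")
    org_name = org.get("name") if isinstance(org, Mapping) else None
    return {
        "title": _clean(ld.get("title")) or _clean(css.get("title")),
        "company": _clean(org_name) or _clean(css.get("company")),
        "date_posted": _clean(ld.get("datePosted")),
        "posted_text": _clean(css.get("posted")),
        "workplace_type": _clean(css.get("workplace_type")),
        "salary": _clean(css.get("salary")),
        "applicants": parse_applicant_count(_clean(css.get("applicants"))),
        # The expanded container is tried first; JSON-LD is the fallback.
        "description": _clean(css.get("description")) or _clean(ld.get("description")),
    }
```

Every key is always present in the result, with None marking a field that was absent on the posting, so downstream code never has to distinguish a missing key from an empty value.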

7. Job board aggregation at scale

For systematic job harvesting across multiple sources, see the dedicated job board web scraping guide. LinkedIn-specific considerations at scale are more restrictive than most job boards.

Discovery: use the job search URL with the keywords and location query parameters to find job IDs, then scrape individual /jobs/view/JOB_ID/ pages. Do not rely on search result HTML for description text, because it is always truncated. Extract the job ID from the URL (the numeric segment) and use it as your deduplication key. Title-based deduplication fails because the same employer often posts the same role several times under different job IDs, and a title key collapses those postings into one record.

Pagination: job search paginates via &start=25, &start=50, etc. LinkedIn blocks crawlers that paginate aggressively. Limit to the first 3–4 pages per query, rotate queries rather than paginating deep, and introduce delays between requests that reflect human browsing cadence — several seconds minimum, not milliseconds.
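
A shallow pagination plan along these lines can be generated up front. This is a sketch: the three-page default follows the limit suggested above, while the 4 to 12 second delay window is an illustrative assumption, not a published threshold.

```python
import random
from urllib.parse import urlencode

SEARCH_BASE = "https://www.linkedin.com/jobs/search/"
PAGE_SIZE = 25  # offset step implied by &start=25, &start=50


def search_page_urls(keywords: str, location: str, max_pages: int = 3) -> list[str]:
    """Build the first few search result URLs for one query."""
    urls = []
    for page in range(max_pages):
        params = {"keywords": keywords, "location": location}
        if page > 0:
            params["start"] = str(page * PAGE_SIZE)
        urls.append(f"{SEARCH_BASE}?{urlencode(params)}")
    return urls


def human_delay(min_seconds: float = 4.0, max_seconds: float = 12.0) -> float:
    """Pick a randomized pause, in seconds, to sleep between requests."""
    return random.uniform(min_seconds, max_seconds)
```

To widen coverage, call search_page_urls with more keyword and location combinations rather than raising max_pages.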

Freshness: employers edit job descriptions in place without changing the job ID or URL. Archive the full description text with a scraped_at timestamp on every fetch. Compare description hashes across runs to detect silent edits. The datePosted field in JSON-LD reflects original posting date, not last edit.
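
Detecting silent edits by hash might be sketched as follows. The record layout and function names are illustrative; whitespace is collapsed before hashing so that formatting differences between fetches do not register as edits, which is a design assumption you may want to relax.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def description_hash(text: str) -> str:
    """SHA-256 of the description with runs of whitespace collapsed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def snapshot(job_id: str, description: str) -> dict[str, str]:
    """Build the record to archive on every fetch."""
    return {
        "job_id": job_id,
        "description": description,
        "description_hash": description_hash(description),
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }


def was_edited(previous: Optional[dict[str, str]], current: dict[str, str]) -> bool:
    """True when a prior snapshot exists and its hash differs."""
    return previous is not None and previous["description_hash"] != current["description_hash"]
```

Archiving the full text alongside the hash means a detected edit can be diffed later instead of only flagged.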

Deduplication across sources: if you aggregate from multiple job boards, the same posting often appears on LinkedIn, Indeed, and the company's own careers page. Normalize on (company_name, job_title, location, posted_date) as a composite key, not on URL.
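
One way to build that composite key is sketched below. The normalization rules (lowercasing, stripping accents and punctuation) are assumptions about what varies between boards, and posted_date is assumed to be an ISO 8601 string so that its first ten characters are the calendar date.

```python
import re
import unicodedata


def _norm(value: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def cross_source_key(
    company_name: str,
    job_title: str,
    location: str,
    posted_date: str,
) -> tuple[str, str, str, str]:
    """Composite key for matching one posting across job boards."""
    return (_norm(company_name), _norm(job_title), _norm(location), posted_date[:10])
```

Exact-match keys like this will still miss pairs where one board abbreviates the title or location, so treat the key as a first pass rather than a guarantee.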

Volume limits: there is no published rate limit from LinkedIn. Treat any 429 or redirect-to-login response as a signal to back off for that IP endpoint. Residential proxy rotation across a large pool is the primary mitigation — do not retry immediately on block.
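
The back-off rule can be sketched as a per-endpoint cooldown tracker. The class name, the five-minute base, and the one-hour cap are illustrative assumptions; the caller supplies the current time so the logic stays easy to test.

```python
from dataclasses import dataclass, field


def is_block_signal(status_code: int, redirected_to_login: bool) -> bool:
    """A 429 or a redirect to login both mean: stop using this endpoint for now."""
    return status_code == 429 or redirected_to_login


@dataclass
class EndpointCooldown:
    """Tracks when each proxy endpoint may be used again."""

    base_seconds: float = 300.0
    max_seconds: float = 3600.0
    _strikes: dict[str, int] = field(default_factory=dict)
    _blocked_until: dict[str, float] = field(default_factory=dict)

    def record_block(self, endpoint: str, now: float) -> float:
        """Register a block and return the cooldown, doubling per repeat."""
        strikes = self._strikes.get(endpoint, 0) + 1
        self._strikes[endpoint] = strikes
        cooldown = min(self.base_seconds * 2 ** (strikes - 1), self.max_seconds)
        self._blocked_until[endpoint] = now + cooldown
        return cooldown

    def record_success(self, endpoint: str) -> None:
        """Clear the strike history after a clean response."""
        self._strikes.pop(endpoint, None)
        self._blocked_until.pop(endpoint, None)

    def is_available(self, endpoint: str, now: float) -> bool:
        """True once the endpoint's cooldown has elapsed."""
        return now >= self._blocked_until.get(endpoint, 0.0)
```

In a scheduler, pass time.monotonic() as now and skip any endpoint for which is_available returns False instead of retrying on it.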

8. Official APIs and licensed data alternatives

LinkedIn's official data access programs exist for specific use cases and require approval. The LinkedIn Marketing API covers ad targeting and company page analytics for authorized partners. Talent Solutions and Recruiter contracts provide job posting data and applicant tracking integrations. Sales Navigator has a limited API for authorized CRM vendors. None of these are self-serve for arbitrary data extraction, but they provide legally clean access to the fields you actually need.

For company firmographics without scraping LinkedIn directly, Crunchbase and PitchBook offer structured company graphs with funding, headcount, and industry data via licensed API. People Data Labs aggregates professional identity data (current employer, title, location) from public sources and provides it via API with clear data licensing terms. Clearbit (now part of HubSpot) offers company enrichment on domain lookup. These are the right starting point for B2B enrichment pipelines that need to scale.

For job market intelligence, the Bureau of Labor Statistics JOLTS data, Indeed Hiring Lab, and Lightcast (formerly EMSI Burning Glass) provide aggregated labor market signals without the legal exposure of scraping LinkedIn job postings.

If you have a legitimate need to automate LinkedIn flows for accounts you own and control — for example, managing your own company page programmatically — a Browser-as-a-Service approach with your authenticated session is the technical path. It is still subject to LinkedIn's User Agreement limits on automation, so review those terms with counsel before proceeding.

9. User Agreement and legal risk: read before writing code

LinkedIn's User Agreement (Section 8.2) explicitly prohibits scraping, crawling, spidering, and using bots or other automated means to access the service without LinkedIn's express written permission. This applies regardless of whether the data is publicly visible. Microsoft enforces this: LinkedIn has obtained temporary restraining orders and injunctions against commercial scrapers, and has pursued damages claims.

The hiQ Labs v. LinkedIn litigation (Ninth Circuit opinions in 2019 and 2022) established that scraping publicly accessible data may not constitute unauthorized access under the Computer Fraud and Abuse Act — a narrow technical holding about one federal statute. It does not override the User Agreement as a contract. It does not apply in jurisdictions outside the US. It does not protect resellers of LinkedIn-derived datasets. It does not address state computer crime laws. Do not treat hiQ as a green light.

The practical risk gradient: scraping individual public company pages for internal research at low volume is lower risk than bulk job harvesting, which is lower risk than profile scraping, which is lower risk than automating logged-in sessions, which is lower risk than reselling LinkedIn-derived data commercially. Every step up that gradient increases legal exposure.

OmniScrape provides technical infrastructure for web data extraction. Whether to use that infrastructure on LinkedIn, and for what purpose, is a business and legal decision your organization must make explicitly — ideally with counsel who has reviewed LinkedIn's current User Agreement and the applicable law in your jurisdiction. This guide is a technical reference, not legal advice, and not an endorsement of scraping LinkedIn.

Frequently asked questions

Can I scrape LinkedIn profiles without logging in?

A logged-out browser sees only the profile headline, current role title and employer, general location, and a truncated connection count label. Full work history, education details, skills, recommendations, contact information, and connection graphs are all gated behind authentication. LinkedIn's User Agreement prohibits automating logged-in sessions, so in practice the logged-out view is the only compliant target — and it contains limited data for most enrichment use cases.

Why do my LinkedIn CSS selectors break every few weeks?

LinkedIn uses CSS module hashing in its frontend build pipeline, which generates randomized class names on each deployment. A class like 'org-top-card-summary__title' may be replaced with an opaque hash string within weeks. Build your selectors around stable anchors that survive deploys: JSON-LD structured data in script tags (most reliable), data-test-id attributes, ARIA labels, semantic HTML structure (h1 as the first heading in the top card), and text-content matching for labels like 'employees'. Monitor extracts in production and alert on empty results rather than silently storing null values.

Does OmniScrape bypass LinkedIn's login walls?

No. The OmniScrape Web Unlocker returns what a logged-out browser sees after resolving bot challenges on public pages. It does not authenticate with LinkedIn or access content that requires a user session. If LinkedIn returns a login redirect for a given URL, that content is not accessible without authentication and OmniScrape cannot retrieve it. Bypassing authentication to access gated content is outside compliant use of the API.

What is the lowest-risk LinkedIn data to collect?

Individual job postings on /jobs/view/ URLs and company about pages are the lowest-risk targets: they are public marketing content that LinkedIn allows search engines to index. Collect at low volume, with residential proxies, with delays between requests, and with legal review of your specific use case. The highest-risk activities are scraping profile data at scale, automating logged-in sessions, and reselling LinkedIn-derived datasets commercially. For any production use case, evaluate official LinkedIn APIs and licensed data providers first.

How do I extract the full job description when it is truncated?

LinkedIn renders job descriptions with a 'Show more' button that requires JavaScript execution to expand the full text. Use mode 'js_rendering' with a js_wait_selector targeting the job title heading (h1.top-card-layout__title) and a js_wait_timeout of at least 10–12 seconds. The full description text appears in div.show-more-less-html__markup after the expand interaction completes. Additionally, parse the JSON-LD JobPosting block in the page's script tags, which sometimes contains the complete description text without requiring JS interaction.

How should I deduplicate LinkedIn job postings?

Use the numeric job ID extracted from the /jobs/view/JOB_ID/ URL as your primary deduplication key — not the job title or description text. The same role is frequently posted multiple times by the same employer with different job IDs, and employers edit descriptions in place without changing the ID. Store a scraped_at timestamp and a hash of the description text on every fetch so you can detect silent edits. For cross-source deduplication (LinkedIn plus Indeed plus company careers page), normalize on a composite key of (company_name, job_title, location, posted_date).

What does the hiQ v. LinkedIn ruling actually mean for scraping?

The Ninth Circuit held that LinkedIn could not invoke the Computer Fraud and Abuse Act to block hiQ from scraping publicly accessible profile pages, because accessing public data does not constitute 'unauthorized access' under that specific statute. This is a narrow holding about one US federal law. It does not override LinkedIn's User Agreement as a contract. It does not apply outside the US. It does not protect commercial resale of LinkedIn data. It does not address state computer crime statutes. Courts in subsequent cases have distinguished hiQ on various grounds. Treat it as a data point for your legal team's analysis, not as a general permission to scrape LinkedIn.

Related guides

  • Job Board Web Scraping: HR Tech Pipeline for Labor Market Intelligence
  • Lead Generation Web Scraping: Compliant Inbound Enrichment for Sales Teams
  • Headless Browser Scraping: When to Use It and How to Do It Right
  • Web Scraping Without Getting Blocked

© 2026 OmniScrape. All rights reserved.
