OmniScrape

Instagram Scraper: Posts, Reels, and Profile Metrics

Instagram is a GraphQL application wearing a thin HTML shell. Nearly every endpoint that returns meaningful data — follower counts, post grids, reel play counts, stories — requires session cookies, CSRF tokens, and an authenticated login. Meta's Terms of Service explicitly prohibit automated scraping, and the enforcement stack behind that prohibition is real: datacenter IP blocks, rotating query hashes, and active litigation against scrapers.

This guide documents what is technically observable on logged-out public post pages, how the oEmbed endpoint works for embed-compliant use cases, and why most production pipelines for influencer analytics or brand monitoring end up on the official Instagram Graph API or a licensed data partner. Read this alongside social media web scraping for broader governance context. Nothing here authorizes scraping private accounts or bypassing authentication you are not permitted to use.

On this page

1. Instagram metrics brands and developers track
2. Instagram URL patterns and endpoint anatomy
3. Instagram markup reality for logged-out requests
4. Meta's bot detection and enforcement stack
5. Scraping a public post permalink (logged-out attempt)
6. oEmbed endpoint for embed-compliant use cases
7. What actually works in production
8. Reels: video metrics and logged-out access
9. Meta Terms of Service and legal constraints
10. FAQ

1. Instagram metrics brands and developers track

Campaign managers need to verify influencer deliverables — post went live, caption includes required hashtags, engagement is within expected range. Competitor analysts watch hashtag volume and posting cadence. Brand safety teams monitor mentions. None of this is straightforward without authorized API access, and Meta has progressively closed the gaps that previously made logged-out scraping viable.

Below is the full set of fields teams typically want. The realistic subset obtainable without login in 2024 is small: og meta tags on individual post permalinks, oEmbed author and thumbnail, and occasionally a caption snippet in the page title.

  • Post shortcode, full caption text, hashtags, @mentions
  • Like count and comment count on public posts
  • Post type: image, carousel, reel, or video
  • Timestamp (ISO 8601) and location tag name
  • Profile: username, display name, bio, follower count, following count, post count
  • Reel play count when visible to logged-out viewers (increasingly rare)
  • Tagged products, paid partnership labels, collaborator usernames
  • oEmbed: thumbnail URL, author name, embed HTML snippet
  • Media CDN URLs for images and video (expire via signed query parameters)

2. Instagram URL patterns and endpoint anatomy

Post and reel permalink URLs are stable and predictable. The shortcode — the base64url-like string in /p/SHORTCODE/ — encodes the internal media ID and is the canonical identifier for a piece of content. Profile grids and hashtag feeds paginate through GraphQL calls that require authentication; there is no public cursor-based pagination available to unauthenticated clients.

The oEmbed endpoint is the only officially documented, publicly accessible API Instagram exposes without a Business account. It accepts a post URL and returns a limited JSON payload suitable for embed rendering — not bulk data extraction.

  • Post permalink: https://www.instagram.com/p/CxYzAbCdEfG/
  • Reel permalink: https://www.instagram.com/reel/CxYzAbCdEfG/
  • Profile grid: https://www.instagram.com/natgeo/
  • Tagged posts: https://www.instagram.com/natgeo/tagged/
  • Hashtag feed: https://www.instagram.com/explore/tags/wildlife/ (login-gated)
  • oEmbed endpoint: https://api.instagram.com/oembed?url=POST_URL&omitscript=true
  • Shortcode decode: base64url(media_id) — useful for deduplication, not for API calls
  • Embed iframe: https://www.instagram.com/p/SHORTCODE/embed/ (renders post without login for display only)
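
The shortcode handling in the list above can be sketched in a few lines of Python. This is a minimal sketch, not an official algorithm: it assumes the shortcode is a base-64 positional encoding of the numeric media ID using the URL-safe base64 alphabet, which matches the "base64url-like" description above but is not documented by Meta. The function names are illustrative.

```python
import re

# URL-safe base64 alphabet (RFC 4648 section 5). Assumption: Instagram
# shortcodes use this alphabet as digits of a base-64 number.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

# Matches the /p/SHORTCODE/ and /reel/SHORTCODE/ permalink patterns listed above.
PERMALINK_RE = re.compile(r"instagram\.com/(?:p|reel)/([A-Za-z0-9_-]+)")


def extract_shortcode(url):
    """Return the shortcode from a post or reel permalink, or None."""
    match = PERMALINK_RE.search(url)
    return match.group(1) if match else None


def shortcode_to_media_id(shortcode):
    """Interpret the shortcode as a base-64 number (most significant digit first)."""
    media_id = 0
    for ch in shortcode:
        media_id = media_id * 64 + ALPHABET.index(ch)
    return media_id


def media_id_to_shortcode(media_id):
    """Inverse of shortcode_to_media_id for non-negative integers."""
    if media_id == 0:
        return ALPHABET[0]
    chars = []
    while media_id > 0:
        media_id, remainder = divmod(media_id, 64)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))
```

Using the decoded integer as a deduplication key means the same content reached through a /p/ URL, a /reel/ URL, or an /embed/ URL collapses to one record.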

3. Instagram markup reality for logged-out requests

When Instagram serves a logged-out view of a public post permalink, the most reliable data surface is Open Graph meta tags in the <head>. The og:description tag typically contains a truncated caption snippet, like and comment counts in a formatted string, and the author handle. The og:image tag points to the post's thumbnail CDN URL. These are rendered server-side and survive bot detection more consistently than body content.

Historically, Instagram injected a window._sharedData JSON blob and later a series of <script type="application/json"> tags into the page body. These contained the full GraphQL response for the post — captions, media nodes, owner data. Meta has progressively stripped these from logged-out responses. As of mid-2024, most logged-out post pages return empty or stub JSON blobs. Do not build a pipeline that depends on them.

Visible DOM elements — like counts in <span> tags inside <section> elements, captions in <h1>, timestamps in <time datetime="..."> — exist on some logged-out views but use hashed CSS class names generated by Meta's CSS-in-JS system. These class names change on every deploy, which happens multiple times per week. Selectors targeting class names break silently and frequently. Target semantic HTML elements and attributes (tagName, datetime attribute, itemprop) rather than class names wherever possible.
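
To illustrate the "semantic elements, not class names" advice, here is a minimal sketch using only Python's standard-library `html.parser`. It collects `og:*` meta tags and `<time datetime>` values and never looks at `class` attributes. The sample HTML is invented for illustration, including the shape of the `og:description` string; real logged-out responses may differ or may be a login page.

```python
from html.parser import HTMLParser


class SemanticSignalParser(HTMLParser):
    """Collects og:* meta tags and <time datetime> values, ignoring class names."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.timestamps = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            prop = attributes.get("property") or ""
            content = attributes.get("content")
            if prop.startswith("og:") and content is not None:
                self.og[prop] = content
        elif tag == "time" and attributes.get("datetime"):
            self.timestamps.append(attributes["datetime"])


def extract_signals(html):
    """Return the og tags and datetime attributes found in an HTML string."""
    parser = SemanticSignalParser()
    parser.feed(html)
    parser.close()
    return {"og": parser.og, "timestamps": parser.timestamps}


# Hypothetical sample markup. The hashed class name is deliberately unused.
SAMPLE_HTML = """<html><head>
<meta property="og:title" content="Example on Instagram">
<meta property="og:description" content="12 likes, 3 comments - example">
<meta property="og:image" content="https://example.invalid/thumb.jpg"/>
</head><body>
<div class="_aagw"><time datetime="2024-05-01T12:00:00.000Z">May 1</time></div>
</body></html>"""

signals = extract_signals(SAMPLE_HTML)
```

Because the parser keys on tag names and attributes only, a deploy that renames `_aagw` to something else leaves the output unchanged.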

4. Meta's bot detection and enforcement stack

Instagram's protection is layered and actively maintained. At the network level, datacenter IP ranges are blocked outright — residential proxies are the minimum viable option for any logged-out request that returns real content rather than a redirect to the login page. Rate limits are applied per IP and per session, and thresholds are low enough that bulk crawling from a single IP triggers blocks within minutes.

At the application level, GraphQL queries are identified by a doc_id hash that Meta rotates. Unofficial clients that hardcode these hashes — the approach used by most open-source Instagram scrapers — break on rotation, sometimes within days of a release. CSRF tokens are required on state-changing requests. Session cookies are fingerprinted and invalidated when behavioral signals look automated.

Legal enforcement is real. Meta has filed and won cases against scraper operators. The hiQ v. LinkedIn ruling on CFAA does not extend to Instagram because Meta's terms are explicit and the technical access controls are real, not merely contractual. If your use case requires Instagram data at scale, the correct path is the Instagram Graph API with a Business account, or a licensed data provider operating under a data access agreement.

  • Datacenter IP instant blocks — residential proxies required for any logged-out access
  • Per-IP and per-session rate limits with low thresholds
  • Login wall on profile grids, hashtag feeds, stories, and most search results
  • GraphQL doc_id hash rotation breaking unofficial API clients
  • CSRF token and session cookie requirements on all authenticated endpoints
  • Behavioral fingerprinting: mouse movement, scroll velocity, request timing
  • CDN media URL expiry via signed query parameters (URLs are not permanently usable)
  • Active legal enforcement — Meta ToS Section 3 explicitly prohibits scraping

5. Scraping a public post permalink (logged-out attempt)

This request targets the Open Graph meta tags and semantic HTML elements that Instagram occasionally serves to logged-out visitors on individual post permalinks. Success rate is low and declining — Instagram increasingly redirects unauthenticated requests to the login page, especially from non-residential IPs. When it does serve content, og:description and og:image are the most reliable extraction targets.

Use js_rendering mode because Instagram's post pages load some content asynchronously and the login-wall redirect is sometimes JavaScript-driven. The js_wait_selector targets the article element that wraps post content; if Instagram redirects to login instead, this selector will time out and the response will contain the login page HTML. Check body.data.css_extracted for null or empty values before treating the response as a successful extract.

Do not use this pattern for bulk collection. A single exploratory fetch to verify what a specific post's og tags contain is a reasonable use case. Crawling a profile's post grid this way is not viable — profile grids require authenticated GraphQL pagination.

Instagram public post — CSS extractor request
json
{
  "url": "https://www.instagram.com/p/CxYzAbCdEfG/",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "article",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "og_description": "meta[property='og:description']",
    "og_image": "meta[property='og:image']",
    "og_title": "meta[property='og:title']",
    "caption": "h1",
    "author": "header a[href]",
    "timestamp": "time[datetime]",
    "likes": "section span[aria-label]",
    "comments": "ul[class] li span"
  }
}
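
The null-or-empty check on body.data.css_extracted described above might look like the following. This is a sketch under assumptions: the response is taken to be a parsed JSON object whose nesting follows the body.data.css_extracted path named in the text, with keys matching the css_selectors in the request. The exact value types (string, list, or null) are not specified here, so the helper tolerates all three.

```python
def get_css_extracted(response):
    """Walk body -> data -> css_extracted, returning {} if any level is missing."""
    node = response
    for key in ("body", "data", "css_extracted"):
        if not isinstance(node, dict):
            return {}
        node = node.get(key)
    return node if isinstance(node, dict) else {}


def is_empty(value):
    """True for None, blank strings, and containers holding only empty values."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, dict):
        return all(is_empty(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return all(is_empty(v) for v in value)
    return False


def check_extract(response, required=("og_description", "og_image")):
    """Return (ok, missing_keys). ok is False when any required field is empty."""
    extracted = get_css_extracted(response)
    missing = [key for key in required if is_empty(extracted.get(key))]
    return len(missing) == 0, missing
```

Treating a failed check as "login wall or block" rather than "post has no caption" keeps empty rows out of downstream tables.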

6. oEmbed endpoint for embed-compliant use cases

Instagram's oEmbed endpoint is the only officially documented, publicly accessible data surface that does not require a Business account. It accepts a post URL and returns a JSON object containing the author name, author URL, thumbnail URL and dimensions, and an HTML snippet for embedding the post. It is designed for CMSes and publishing tools that need to render Instagram embeds — not for bulk data extraction.

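The endpoint is rate-limited and requires the requesting application to have agreed to Instagram's Platform Policy. Meta retired unauthenticated access to the legacy api.instagram.com/oembed URL in 2020, and the current oEmbed endpoint is served through the Graph API and expects an app access token, so confirm the URL and token requirements against Meta's current documentation before relying on it. For a single embed render or a low-volume editorial workflow, it is the correct tool. For anything resembling analytics at scale, it is not.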

Note that this request sends the oEmbed API URL to OmniScrape, which fetches it and returns the response body. The output_format is html here because we want the raw response body — the oEmbed JSON — returned in body.data.content. Parse that string as JSON in your application code to extract the fields you need.

Instagram oEmbed — fetch via OmniScrape
json
{
  "url": "https://api.instagram.com/oembed?url=https://www.instagram.com/p/CxYzAbCdEfG/&omitscript=true",
  "mode": "auto",
  "output_format": "html",
  "proxy": "residential:us"
}
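
A minimal sketch of both halves of this workflow: building the request payload with the nested post URL percent-encoded, and parsing the oEmbed JSON out of body.data.content. The endpoint URL and payload keys are copied from the request above. The response nesting is assumed from the body.data.content path mentioned in the text, and the function names are illustrative.

```python
import json
from urllib.parse import urlencode

# Endpoint URL as used in the request above. Verify it against Meta's
# current oEmbed documentation before use.
OEMBED_BASE = "https://api.instagram.com/oembed"

# Fields this guide describes the oEmbed payload as carrying.
OEMBED_FIELDS = ("author_name", "author_url", "thumbnail_url", "html")


def build_oembed_request(post_url):
    """Build the request payload, percent-encoding the nested post URL."""
    query = urlencode({"url": post_url, "omitscript": "true"})
    return {
        "url": f"{OEMBED_BASE}?{query}",
        "mode": "auto",
        "output_format": "html",
        "proxy": "residential:us",
    }


def parse_oembed(response):
    """Return the selected oEmbed fields, or None if the body is not JSON."""
    node = response
    for key in ("body", "data", "content"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    if not isinstance(node, str) or not node.strip():
        return None
    try:
        payload = json.loads(node)
    except json.JSONDecodeError:
        return None  # for example, an HTML login or error page
    if not isinstance(payload, dict):
        return None
    return {field: payload.get(field) for field in OEMBED_FIELDS}
```

Encoding the inner URL avoids ambiguity about which query parameters belong to the oEmbed endpoint and which belong to the post URL.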

7. What actually works in production

The Instagram Graph API is the correct tool for accounts you manage or have been granted access to by their owners. Through a Meta Business account and app review, you can retrieve post metrics, audience demographics, story insights, and media objects for connected accounts. This is the path for agencies managing client accounts, brands tracking their own presence, and tools built on top of creator partnerships.
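
For orientation, a Graph API media listing request for a connected account can be assembled as below. This is a hedged sketch: the API version string, the account ID, and the token are placeholders, and the field list is one plausible selection of IG Media fields rather than a recommendation. The function only builds the URL and does not send it.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"

# One plausible selection of IG Media fields. Check the current Graph API
# reference for availability and required permissions.
DEFAULT_FIELDS = (
    "id",
    "caption",
    "media_type",
    "permalink",
    "timestamp",
    "like_count",
    "comments_count",
)


def media_list_url(ig_user_id, access_token, api_version="v19.0",
                   fields=DEFAULT_FIELDS, limit=25):
    """Build the URL for listing media on a connected Instagram account.

    api_version is a placeholder; use whichever version your app targets.
    """
    query = urlencode({
        "fields": ",".join(fields),
        "limit": limit,
        "access_token": access_token,
    })
    return f"{GRAPH_BASE}/{api_version}/{ig_user_id}/media?{query}"
```

Unlike the logged-out approach, the fields requested here are named and versioned, so a change on Meta's side shows up as a documented deprecation rather than a silent selector break.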

For third-party mention data — tracking what other accounts post about your brand, monitoring hashtags, benchmarking competitors — licensed social listening vendors are the realistic option. Providers like Brandwatch, Sprout Social, and Meltwater operate under data access agreements with Meta and can surface aggregated metrics that are not available through the public Graph API.

Technical scraping without authorization is a maintenance trap with compounding costs. Budget 20 or more hours per month for selector breakage, IP block management, and login-wall workarounds — and that assumes Meta does not take legal action. For any use case that needs reliable, ongoing Instagram data, the cost of Graph API access or a licensed data provider is almost always lower than the engineering cost of maintaining an unauthorized scraper.

8. Reels: video metrics and logged-out access

Reel permalinks use the same /reel/SHORTCODE/ URL pattern and are subject to the same logged-out access constraints as regular posts. On some logged-out views, the og:description tag includes a play count in the format '1.2M plays' alongside the caption snippet — but this is inconsistent and Meta has been progressively removing it.
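
When the play count is present, it arrives as display text rather than a number. A small sketch of normalizing strings such as '1.2M plays' to integers follows. The suffix set (K, M, B) and the surrounding text format are assumptions based on the example above, and the function returns None when no play count is found, so absence is never confused with zero.

```python
import re
from decimal import Decimal

# Matches "1.2M plays", "850K plays", "12,345 plays" (case-insensitive).
PLAYS_RE = re.compile(r"(\d+(?:[.,]\d+)*)\s*([KMB])?\s+plays\b", re.IGNORECASE)

MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_play_count(og_description):
    """Return the play count as an int, or None if no count is present."""
    match = PLAYS_RE.search(og_description or "")
    if not match:
        return None
    # Assumption: commas are thousands separators, dots are decimal points.
    number = match.group(1).replace(",", "")
    suffix = (match.group(2) or "").upper()
    # Decimal avoids float rounding when scaling values like 1.2.
    return int(Decimal(number) * MULTIPLIERS[suffix])
```

Abbreviated counts are rounded by Instagram before display, so values parsed this way are approximate and should be stored with that caveat.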

Video CDN URLs embedded in the page source are signed and expire, typically within hours. They are not suitable for archival or redistribution. Reel audio metadata, remix counts, and the full engagement breakdown (saves, shares) are not available on logged-out pages under any circumstances — they require authenticated Graph API access.
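
One way to avoid storing dead links is to read the expiry out of the signed URL before persisting it. The sketch below assumes the expiry is carried in an `oe` query parameter as a hexadecimal Unix timestamp. That convention is commonly observed on Meta CDN URLs but is undocumented and may change, so treat the parameter name and encoding as assumptions.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def cdn_url_expiry(url, param="oe"):
    """Return the expiry as an aware UTC datetime, or None if it cannot be read.

    Assumes `param` holds a hex-encoded Unix timestamp (undocumented convention).
    """
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return None
    try:
        return datetime.fromtimestamp(int(values[0], 16), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        return None


def is_expired(url, now=None):
    """Treat URLs with no readable expiry as unusable rather than permanent."""
    expiry = cdn_url_expiry(url)
    if expiry is None:
        return True
    current = now or datetime.now(timezone.utc)
    return current >= expiry
```

Storing the shortcode and re-resolving media on demand is more durable than storing the CDN URL itself.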

If your use case is embedding a reel in a CMS or rendering a preview, the oEmbed endpoint works for reels as well as regular posts — pass the /reel/ URL as the url parameter. For analytics on reels you own, use the Instagram Graph API's media insights endpoint.

9. Meta Terms of Service and legal constraints

Meta's Terms of Service, Instagram's Platform Policy, and Instagram's Community Guidelines all contain explicit prohibitions on automated data collection without authorization. Section 3 of the Instagram Terms states that users may not 'do anything unlawful, misleading, or fraudulent' and specifically prohibits 'collect[ing] users' content or information' without consent. These are not ambiguous.

The hiQ v. LinkedIn line of cases established that scraping publicly accessible data from a website does not automatically violate the Computer Fraud and Abuse Act — but that reasoning applies narrowly to data that is genuinely public and where no technical access controls exist. Instagram's login walls, CSRF requirements, and IP blocks are technical access controls. Circumventing them to access data that Instagram has chosen to gate behind authentication is a different legal question, and Meta has successfully litigated on this basis.

EU GDPR applies to any personal data collected from Instagram posts — usernames, profile photos, captions that identify individuals. Storing or processing this data without a lawful basis is a compliance exposure independent of the scraping question. California CCPA and other state privacy laws create similar obligations for US-based operators.

OmniScrape documents the technical mechanics of what is possible on logged-out public pages. Using these techniques on Instagram without Meta's authorization is a legal decision that your organization's counsel should make, not an engineering one. We do not represent that any of the techniques described here are permitted under Meta's Terms.

Frequently asked questions

Can I scrape Instagram without an account?

Occasionally, for individual public post permalinks, Instagram serves a logged-out HTML response that includes Open Graph meta tags with a caption snippet, thumbnail, and author handle. This works inconsistently and is declining in reliability as Meta tightens logged-out access. Profile grids, hashtag feeds, stories, and search results all require authentication. For any production use case, plan on the official Graph API or a licensed data provider.

Does OmniScrape bypass Instagram's login wall?

No. OmniScrape's Web Unlocker and js_rendering mode fetch the HTML that Instagram serves to an unauthenticated browser — which is increasingly a redirect to the login page rather than post content. OmniScrape does not log into Instagram on your behalf, manage Instagram sessions, or bypass authentication controls. Using session cookies from accounts you own is technically possible but must comply with Meta's Terms and applicable law.

Why do my Instagram CSS selectors break every few days?

Meta uses a CSS-in-JS system that generates hashed class names at build time. Every time Instagram deploys — which happens multiple times per week — the class names change. Selectors like div._aagw or span._aacl will stop working without warning. The mitigation is to target semantic HTML attributes instead: meta[property='og:description'], time[datetime], a[href*='/p/'], and header elements. These are more stable because they are driven by HTML semantics rather than styling.

What data does the Instagram oEmbed endpoint return?

The oEmbed endpoint returns: author_name, author_url, provider_name, provider_url, thumbnail_url, thumbnail_width, thumbnail_height, html (the embed iframe snippet), width, and version. It does not return like counts, comment counts, full captions, follower counts, or any engagement metrics. It is designed for embed rendering, not analytics.

Is scraping Instagram legal?

Meta explicitly prohibits it in their Terms of Service. Circumventing technical access controls — login walls, IP blocks, CSRF tokens — to access gated data raises Computer Fraud and Abuse Act exposure in the US, and Meta has filed and won cases on this basis. Storing personal data from Instagram posts without a lawful basis creates GDPR and CCPA exposure. Commercial use cases should use the Instagram Graph API with proper app review or a licensed data provider. This is a question for your legal counsel, not your engineering team.

Can I scrape Instagram Reels for play counts and video URLs?

Play counts appear in og:description on some logged-out reel pages but are inconsistently present and increasingly absent. Video CDN URLs embedded in page source are signed and expire within hours — they cannot be used for archival or redistribution. Full reel metrics (saves, shares, reach, impressions) require authenticated Graph API access to the account that owns the reel.

What is the Instagram Graph API and how do I get access?

The Instagram Graph API is Meta's official programmatic interface for Instagram data. Access requires a Meta Business account, a Facebook App with Instagram Graph API permissions, and app review for most advanced permissions. It provides post metrics, audience insights, story data, and media objects for accounts that have connected to your app. Basic Display API (now deprecated) provided read access for personal accounts. Start at developers.facebook.com/docs/instagram-api for current documentation.

Related guides

  • Social Media Web Scraping: Brand Mention Monitoring from Public Pages
  • Headless Browser Scraping: When to Use It and How to Do It Right
  • Web Scraping Without Getting Blocked
  • TikTok Scraper: Extract Videos, Hashtags, and Trend Data

© 2026 OmniScrape. All rights reserved.
