Instagram Scraper: Posts, Reels, and Profile Metrics

1. Instagram metrics brands and developers track

Campaign managers need to verify influencer deliverables — post went live, caption includes required hashtags, engagement is within expected range. Competitor analysts watch hashtag volume and posting cadence. Brand safety teams monitor mentions. None of this is straightforward without authorized API access, and Meta has progressively closed the gaps that previously made logged-out scraping viable.

Below is the full set of fields teams typically want. The realistic subset obtainable without login in 2024 is small: og meta tags on individual post permalinks, oEmbed author and thumbnail, and occasionally a caption snippet in the page title.

Post shortcode, full caption text, hashtags, @mentions
Like count and comment count on public posts
Post type: image, carousel, reel, or video
Timestamp (ISO 8601) and location tag name
Profile: username, display name, bio, follower count, following count, post count
Reel play count when visible to logged-out viewers (increasingly rare)
Tagged products, paid partnership labels, collaborator usernames
oEmbed: thumbnail URL, author name, embed HTML snippet
Media CDN URLs for images and video (expire via signed query parameters)
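
One way to keep the gap between "wanted" and "obtainable" visible in code is a record type where every metric is optional. The sketch below is illustrative only: the class name PostRecord, its field names, and the extract_tags helper are assumptions made for this example and do not correspond to any Instagram or OmniScrape schema.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PostRecord:
    """One post. Metrics default to None because most are login-gated."""
    shortcode: str
    post_type: Optional[str] = None       # "image", "carousel", "reel", or "video"
    caption: Optional[str] = None
    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)
    like_count: Optional[int] = None
    comment_count: Optional[int] = None
    play_count: Optional[int] = None      # reels only, rarely visible logged out
    timestamp: Optional[str] = None       # ISO 8601 string
    location_name: Optional[str] = None
    thumbnail_url: Optional[str] = None   # signed CDN URL, expires

    def missing_fields(self) -> list[str]:
        """Names of fields that were not obtained (still None)."""
        return [name for name, value in vars(self).items() if value is None]


def extract_tags(caption: str) -> tuple[list[str], list[str]]:
    """Pull hashtags and @mentions out of caption text with simple regexes."""
    hashtags = re.findall(r"#(\w+)", caption)
    # Handles may contain dots but should not end with one (sentence punctuation).
    mentions = re.findall(r"@([A-Za-z0-9_](?:[A-Za-z0-9_.]*[A-Za-z0-9_])?)", caption)
    return hashtags, mentions
```

Calling missing_fields() after each fetch gives a quick measure of how much of the wish list a logged-out response actually delivered.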

2. Instagram URL patterns and endpoint anatomy

Post and reel permalink URLs are stable and predictable. The shortcode — the base64url-like string in /p/SHORTCODE/ — encodes the internal media ID and is the canonical identifier for a piece of content. Profile grids and hashtag feeds paginate through GraphQL calls that require authentication; there is no public cursor-based pagination available to unauthenticated clients.

The oEmbed endpoint is the only officially documented, publicly accessible API Instagram exposes without a Business account. It accepts a post URL and returns a limited JSON payload suitable for embed rendering — not bulk data extraction.

Post permalink: https://www.instagram.com/p/CxYzAbCdEfG/
Reel permalink: https://www.instagram.com/reel/CxYzAbCdEfG/
Profile grid: https://www.instagram.com/natgeo/
Tagged posts: https://www.instagram.com/natgeo/tagged/
Hashtag feed: https://www.instagram.com/explore/tags/wildlife/ (login-gated)
oEmbed endpoint: https://api.instagram.com/oembed?url=POST_URL&omitscript=true
Shortcode encoding: the numeric media ID written in the base64url alphabet; decoding it is useful for deduplication, not for API calls
Embed iframe: https://www.instagram.com/p/SHORTCODE/embed/ (renders post without login for display only)
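
The shortcode handling described above can be sketched as follows. This assumes the widely reported scheme in which a shortcode is the numeric media ID written positionally in the URL-safe base64 alphabet; Meta does not document this, so treat the decoded value as a deduplication key only. The function names are made up for this example.

```python
import re
from typing import Optional

# URL-safe base64 alphabet. ASSUMPTION: shortcodes use it as base-64 digits.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

# Matches both /p/SHORTCODE/ and /reel/SHORTCODE/ permalinks.
PERMALINK_RE = re.compile(r"instagram\.com/(?:p|reel)/([A-Za-z0-9_-]+)")


def extract_shortcode(url: str) -> Optional[str]:
    """Return the shortcode from a post or reel permalink, else None."""
    match = PERMALINK_RE.search(url)
    return match.group(1) if match else None


def shortcode_to_media_id(shortcode: str) -> int:
    """Interpret the shortcode as a base-64 number, most significant digit first."""
    media_id = 0
    for char in shortcode:
        media_id = media_id * 64 + ALPHABET.index(char)
    return media_id


def media_id_to_shortcode(media_id: int) -> str:
    """Inverse of shortcode_to_media_id for non-negative IDs."""
    if media_id == 0:
        return ALPHABET[0]
    digits = []
    while media_id > 0:
        media_id, remainder = divmod(media_id, 64)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))
```

Because /p/ and /reel/ URLs for the same media share a shortcode, keying stored records on the shortcode (or the decoded integer) avoids counting one piece of content twice.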

3. Instagram markup reality for logged-out requests

When Instagram serves a logged-out view of a public post permalink, the most reliable data surface is Open Graph meta tags in the <head>. The og:description tag typically contains a truncated caption snippet, like and comment counts in a formatted string, and the author handle. The og:image tag points to the post's thumbnail CDN URL. These are rendered server-side and survive bot detection more consistently than body content.
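
As a rough illustration, the Open Graph tags can be pulled from fetched HTML with the Python standard library alone. The og:description layout assumed by DESCRIPTION_RE below ("N likes, M comments - handle on date: caption") is an assumption based on commonly observed responses, not a documented contract; when the pattern does not match, the function returns an empty dict rather than guessing.

```python
import re
from html.parser import HTMLParser


class OGTagParser(HTMLParser):
    """Collects <meta property="og:..."> content values, first occurrence wins."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            self.tags.setdefault(prop, content)


def parse_og_tags(html: str) -> dict:
    """Return a dict such as {"og:image": "...", "og:description": "..."}."""
    parser = OGTagParser()
    parser.feed(html)
    return parser.tags


# ASSUMPTION: description looks like "1,234 likes, 56 comments - handle on <date>: ..."
DESCRIPTION_RE = re.compile(
    r"(?P<likes>[\d.,]+[KMB]?) likes?, "
    r"(?P<comments>[\d.,]+[KMB]?) comments? - "
    r"(?P<handle>[\w.]+)"
)


def parse_description(description: str) -> dict:
    """Split the formatted og:description string; empty dict if the layout differs."""
    match = DESCRIPTION_RE.search(description)
    return match.groupdict() if match else {}
```

The counts are returned as the strings Instagram rendered (for example "1,234" or "12K"), since abbreviated values cannot be converted back to exact integers.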

Historically, Instagram injected a window._sharedData JSON blob and later a series of <script type="application/json"> tags into the page body. These contained the full GraphQL response for the post — captions, media nodes, owner data. Meta has progressively stripped these from logged-out responses. As of mid-2024, most logged-out post pages return empty or stub JSON blobs. Do not build a pipeline that depends on them.

Visible DOM elements — like counts in <span> tags inside <section> elements, captions in <h1>, timestamps in <time datetime="..."> — exist on some logged-out views but use hashed CSS class names generated by Meta's CSS-in-JS system. These class names change on every deploy, which happens multiple times per week. Selectors targeting class names break silently and frequently. Target semantic HTML elements and attributes (tagName, datetime attribute, itemprop) rather than class names wherever possible.
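
A minimal sketch of the semantic-targeting idea: read the datetime attribute from any time element and ignore class names entirely. The helper names are invented for this example, and it assumes the attribute holds an ISO 8601 value, as the markup described above suggests.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional


class TimeTagParser(HTMLParser):
    """Collects datetime attributes from <time> elements, whatever their class."""

    def __init__(self):
        super().__init__()
        self.datetimes = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                self.datetimes.append(value)


def first_post_timestamp(html: str) -> Optional[datetime]:
    """Return the first parseable <time datetime="..."> value, or None."""
    parser = TimeTagParser()
    parser.feed(html)
    for raw in parser.datetimes:
        try:
            # Older Python versions do not accept a trailing "Z" in fromisoformat.
            return datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            continue
    return None
```

The hashed class names in the input never appear in the code, so a redeploy that renames them leaves this extraction untouched.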

4. Meta's bot detection and enforcement stack

Instagram's protection is layered and actively maintained. At the network level, datacenter IP ranges are blocked outright — residential proxies are the minimum viable option for any logged-out request that returns real content rather than a redirect to the login page. Rate limits are applied per IP and per session, and thresholds are low enough that bulk crawling from a single IP triggers blocks within minutes.

At the application level, GraphQL queries are identified by a doc_id hash that Meta rotates. Unofficial clients that hardcode these hashes — the approach used by most open-source Instagram scrapers — break on rotation, sometimes within days of a release. CSRF tokens are required on state-changing requests. Session cookies are fingerprinted and invalidated when behavioral signals look automated.

Legal enforcement is real. Meta has filed and won cases against scraper operators. The hiQ v. LinkedIn ruling on CFAA does not extend to Instagram because Meta's terms are explicit and the technical access controls are real, not merely contractual. If your use case requires Instagram data at scale, the correct path is the Instagram Graph API with a Business account, or a licensed data provider operating under a data access agreement.

Datacenter IP instant blocks — residential proxies required for any logged-out access
Per-IP and per-session rate limits with low thresholds
Login wall on profile grids, hashtag feeds, stories, and most search results
GraphQL doc_id hash rotation breaking unofficial API clients
CSRF token and session cookie requirements on all authenticated endpoints
Behavioral fingerprinting: mouse movement, scroll velocity, request timing
CDN media URL expiry via signed query parameters (URLs are not permanently usable)
Active legal enforcement — Meta ToS Section 3 explicitly prohibits scraping

5. Scraping a public post permalink (logged-out attempt)

This request targets the Open Graph meta tags and semantic HTML elements that Instagram occasionally serves to logged-out visitors on individual post permalinks. Success rate is low and declining — Instagram increasingly redirects unauthenticated requests to the login page, especially from non-residential IPs. When it does serve content, og:description and og:image are the most reliable extraction targets.

Use js_rendering mode because Instagram's post pages load some content asynchronously and the login-wall redirect is sometimes JavaScript-driven. The js_wait_selector targets the article element that wraps post content; if Instagram redirects to login instead, this selector will time out and the response will contain the login page HTML. Check body.data.css_extracted for null or empty values before treating the response as a successful extract.

Do not use this pattern for bulk collection. A single exploratory fetch to verify what a specific post's og tags contain is a reasonable use case. Crawling a profile's post grid this way is not viable — profile grids require authenticated GraphQL pagination.

Instagram public post — CSS extractor request

json

{
  "url": "https://www.instagram.com/p/CxYzAbCdEfG/",
  "mode": "js_rendering",
  "output_format": "css_extractor",
  "proxy": "residential:us",
  "enable_solver": true,
  "js_wait_selector": "article",
  "js_wait_timeout": 15000,
  "css_selectors": {
    "og_description": "meta[property='og:description']",
    "og_image": "meta[property='og:image']",
    "og_title": "meta[property='og:title']",
    "caption": "h1",
    "author": "header a[href]",
    "timestamp": "time[datetime]",
    "likes": "section span[aria-label]",
    "comments": "ul[class] li span"
  }
}
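
The null-or-empty check on body.data.css_extracted might look like the sketch below. The response shape is an assumption: it supposes css_extracted is an object keyed by the names given in css_selectors, with each value being a string, a list of strings, or null. Adjust the traversal to whatever your responses actually contain.

```python
def validate_extract(response: dict) -> dict:
    """Drop empty fields and fail loudly when no Open Graph data came back.

    ASSUMPTION: response["body"]["data"]["css_extracted"] maps selector names
    to a string, a list of strings, or None.
    """
    data = (response.get("body") or {}).get("data") or {}
    extracted = data.get("css_extracted") or {}

    cleaned = {}
    for name, value in extracted.items():
        if isinstance(value, list):
            value = [item for item in value if item]  # discard empty matches
        if value:
            cleaned[name] = value

    # The og_* selectors are the most reliable targets; if none matched,
    # the page was most likely the login wall rather than post content.
    if not any(name.startswith("og_") for name in cleaned):
        raise ValueError("no Open Graph fields extracted; likely a login-wall response")
    return cleaned
```

Raising instead of returning an empty dict keeps a login-page response from being stored as if it were a post with no metrics.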

6. oEmbed endpoint for embed-compliant use cases

Instagram's oEmbed endpoint is the only officially documented, publicly accessible data surface that does not require a Business account. It accepts a post URL and returns a JSON object containing the author name, author URL, thumbnail URL and dimensions, and an HTML snippet for embedding the post. It is designed for CMSes and publishing tools that need to render Instagram embeds — not for bulk data extraction.

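The endpoint is rate-limited and requires the requesting application to have agreed to Instagram's Platform Policy. Meta retired unauthenticated access to the legacy api.instagram.com/oembed URL in late 2020; the current instagram_oembed endpoint on graph.facebook.com expects an access token from a registered Meta app, so confirm the endpoint URL and authentication requirements against Meta's current documentation before relying on it. For a single embed render or a low-volume editorial workflow, it is the correct tool. For anything resembling analytics at scale, it is not.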

Note that this request sends the oEmbed API URL to OmniScrape, which fetches it and returns the response body. The output_format is html here because we want the raw response body — the oEmbed JSON — returned in body.data.content. Parse that string as JSON in your application code to extract the fields you need.

Instagram oEmbed — fetch via OmniScrape

json

{
  "url": "https://api.instagram.com/oembed?url=https://www.instagram.com/p/CxYzAbCdEfG/&omitscript=true",
  "mode": "auto",
  "output_format": "html",
  "proxy": "residential:us"
}
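
Parsing the returned string could be done along these lines. The path body.data.content comes from the description above; the function name and the choice of which oEmbed fields to keep are illustrative assumptions.

```python
import json


def parse_oembed(response: dict) -> dict:
    """Decode the oEmbed JSON string carried in body.data.content."""
    content = response["body"]["data"]["content"]
    try:
        payload = json.loads(content)
    except json.JSONDecodeError as exc:
        # An HTML error or login page arrives here instead of JSON.
        raise ValueError("oEmbed response body is not JSON") from exc

    # .get() is used throughout because fields may be absent.
    return {
        "author_name": payload.get("author_name"),
        "author_url": payload.get("author_url"),
        "thumbnail_url": payload.get("thumbnail_url"),
        "embed_html": payload.get("html"),
    }
```

Keeping the embed HTML as an opaque string, rather than parsing it for captions or counts, stays within what the endpoint is intended for.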

7. What actually works in production

The Instagram Graph API is the correct tool for accounts you manage or have been granted access to by their owners. Through a Meta Business account and app review, you can retrieve post metrics, audience demographics, story insights, and media objects for connected accounts. This is the path for agencies managing client accounts, brands tracking their own presence, and tools built on top of creator partnerships.
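
For orientation, a request for a connected account's recent media could be assembled as in this sketch. The host, the /{ig-user-id}/media edge, and the listed fields follow Meta's published Graph API reference as best understood here, but the API version string, the account ID, and the token are placeholders; check the current documentation before using any of them.

```python
from urllib.parse import urlencode

GRAPH_HOST = "https://graph.facebook.com"

# Fields requested for each media object on a connected account.
MEDIA_FIELDS = (
    "id",
    "caption",
    "media_type",
    "permalink",
    "timestamp",
    "like_count",
    "comments_count",
)


def build_media_url(ig_user_id: str, access_token: str,
                    api_version: str = "v19.0", limit: int = 25) -> str:
    """Build the URL for listing an account's media.

    ASSUMPTION: api_version is a placeholder; use the version your app targets.
    """
    query = urlencode({
        "fields": ",".join(MEDIA_FIELDS),
        "limit": limit,
        "access_token": access_token,
    })
    return f"{GRAPH_HOST}/{api_version}/{ig_user_id}/media?{query}"
```

The function only builds the URL; sending it with an HTTP client and following the paging cursors in the response is left to the application.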

For third-party mention data — tracking what other accounts post about your brand, monitoring hashtags, benchmarking competitors — licensed social listening vendors are the realistic option. Providers like Brandwatch, Sprout Social, and Meltwater operate under data access agreements with Meta and can surface aggregated metrics that are not available through the public Graph API.

Technical scraping without authorization is a maintenance trap with compounding costs. Budget 20 or more hours per month for selector breakage, IP block management, and login-wall workarounds — and that assumes Meta does not take legal action. For any use case that needs reliable, ongoing Instagram data, the cost of Graph API access or a licensed data provider is almost always lower than the engineering cost of maintaining an unauthorized scraper.

8. Reels: video metrics and logged-out access

Reel permalinks use the same /reel/SHORTCODE/ URL pattern and are subject to the same logged-out access constraints as regular posts. On some logged-out views, the og:description tag includes a play count in the format '1.2M plays' alongside the caption snippet — but this is inconsistent and Meta has been progressively removing it.
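
When the play count is present, the abbreviated value can be converted to an approximate integer as sketched below. The "1.2M plays" layout is taken from the example above; the K/M/B suffix set and the function name are assumptions, and the result is only as precise as the rounded string Instagram rendered.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# A number starting with a digit, an optional K/M/B suffix, then the word "plays".
PLAYS_RE = re.compile(r"(\d[\d.,]*)\s*([KMB]?)\s+plays\b", re.IGNORECASE)


def parse_play_count(description: str) -> Optional[int]:
    """Return the play count found in an og:description string, or None."""
    match = PLAYS_RE.search(description)
    if not match:
        return None
    number = match.group(1).replace(",", "").rstrip(".")
    suffix = match.group(2).upper()
    try:
        # Decimal avoids binary float rounding on values such as 1.2.
        return int(Decimal(number) * MULTIPLIERS[suffix])
    except InvalidOperation:
        return None
```

Returning None for a missing or malformed count, rather than zero, keeps "not shown" distinct from "no plays" in downstream reporting.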

Video CDN URLs embedded in the page source are signed and expire, typically within hours. They are not suitable for archival or redistribution. Reel audio metadata, remix counts, and the full engagement breakdown (saves, shares) are not available on logged-out pages under any circumstances — they require authenticated Graph API access.
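
If you do store a CDN URL briefly, it helps to check its expiry before reuse. The sketch below assumes the signed URL carries an oe query parameter holding the expiry as a hexadecimal Unix timestamp; that matches commonly observed URLs but is not documented by Meta, so URLs without a readable value are treated as unusable.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def cdn_url_expiry(url: str) -> Optional[datetime]:
    """Read the expiry time from the URL's query string.

    ASSUMPTION: the "oe" parameter is a hex-encoded Unix timestamp.
    """
    values = parse_qs(urlparse(url).query).get("oe")
    if not values:
        return None
    try:
        return datetime.fromtimestamp(int(values[0], 16), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        return None


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True when the URL has expired or carries no readable expiry."""
    expiry = cdn_url_expiry(url)
    if expiry is None:
        return True  # unknown expiry: do not rely on the URL
    current = now or datetime.now(timezone.utc)
    return current >= expiry
```

Treating an unreadable expiry as expired is the conservative choice here: the worst case is an extra fetch of the permalink, not a broken media link.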

If your use case is embedding a reel in a CMS or rendering a preview, the oEmbed endpoint works for reels as well as regular posts — pass the /reel/ URL as the url parameter. For analytics on reels you own, use the Instagram Graph API's media insights endpoint.

9. Meta Terms of Service and legal constraints

Meta's Terms of Service, Instagram's Platform Policy, and Instagram's Community Guidelines all contain explicit prohibitions on automated data collection without authorization. Section 3 of the Instagram Terms states that users may not 'do anything unlawful, misleading, or fraudulent' and specifically prohibits 'collect[ing] users' content or information' without consent. These are not ambiguous.

The hiQ v. LinkedIn line of cases established that scraping publicly accessible data from a website does not automatically violate the Computer Fraud and Abuse Act — but that reasoning applies narrowly to data that is genuinely public and where no technical access controls exist. Instagram's login walls, CSRF requirements, and IP blocks are technical access controls. Circumventing them to access data that Instagram has chosen to gate behind authentication is a different legal question, and Meta has successfully litigated on this basis.

EU GDPR applies to any personal data collected from Instagram posts — usernames, profile photos, captions that identify individuals. Storing or processing this data without a lawful basis is a compliance exposure independent of the scraping question. California CCPA and other state privacy laws create similar obligations for US-based operators.

OmniScrape documents the technical mechanics of what is possible on logged-out public pages. Using these techniques on Instagram without Meta's authorization is a legal decision that your organization's counsel should make, not an engineering one. We do not represent that any of the techniques described here are permitted under Meta's Terms.

Frequently asked questions

Can I scrape Instagram without an account?

Occasionally, for individual public post permalinks, Instagram serves a logged-out HTML response that includes Open Graph meta tags with a caption snippet, thumbnail, and author handle. This works inconsistently and is declining in reliability as Meta tightens logged-out access. Profile grids, hashtag feeds, stories, and search results all require authentication. For any production use case, plan on the official Graph API or a licensed data provider.

Does OmniScrape bypass Instagram's login wall?

No. OmniScrape's Web Unlocker and js_rendering mode fetch the HTML that Instagram serves to an unauthenticated browser — which is increasingly a redirect to the login page rather than post content. OmniScrape does not log into Instagram on your behalf, manage Instagram sessions, or bypass authentication controls. Using session cookies from accounts you own is technically possible but must comply with Meta's Terms and applicable law.

Why do my Instagram CSS selectors break every few days?

Meta uses a CSS-in-JS system that generates hashed class names at build time. Every time Instagram deploys — which happens multiple times per week — the class names change. Selectors like div._aagw or span._aacl will stop working without warning. The mitigation is to target semantic HTML attributes instead: meta[property='og:description'], time[datetime], a[href*='/p/'], and header elements. These are more stable because they are driven by HTML semantics rather than styling.

What data does the Instagram oEmbed endpoint return?

The oEmbed endpoint returns: author_name, author_url, provider_name, provider_url, thumbnail_url, thumbnail_width, thumbnail_height, html (the embed iframe snippet), width, and version. It does not return like counts, comment counts, full captions, follower counts, or any engagement metrics. It is designed for embed rendering, not analytics.

Is scraping Instagram legal?

Meta explicitly prohibits it in their Terms of Service. Circumventing technical access controls — login walls, IP blocks, CSRF tokens — to access gated data raises Computer Fraud and Abuse Act exposure in the US, and Meta has filed and won cases on this basis. Storing personal data from Instagram posts without a lawful basis creates GDPR and CCPA exposure. Commercial use cases should use the Instagram Graph API with proper app review or a licensed data provider. This is a question for your legal counsel, not your engineering team.

Can I scrape Instagram Reels for play counts and video URLs?

Play counts appear in og:description on some logged-out reel pages but are inconsistently present and increasingly absent. Video CDN URLs embedded in page source are signed and expire within hours — they cannot be used for archival or redistribution. Full reel metrics (saves, shares, reach, impressions) require authenticated Graph API access to the account that owns the reel.

What is the Instagram Graph API and how do I get access?

The Instagram Graph API is Meta's official programmatic interface for Instagram data. Access requires a Meta Business account, a Facebook App with Instagram Graph API permissions, and app review for most advanced permissions. It provides post metrics, audience insights, story data, and media objects for accounts that have connected to your app. The Basic Display API, which provided read access for personal accounts, is now deprecated. Start at developers.facebook.com/docs/instagram-api for current documentation.

Ready to scrape without blocks?

Get your API key in minutes. Test protected URLs from the dashboard — no credit card required to start.
