Use Cases

What People Build with OmniScrape

Real workflows from price monitoring to AI dataset collection. Each one includes working code you can copy and adapt.

🏷️ E-commerce

Price and Product Monitoring

Track competitor prices across Amazon, Shopify stores, and retailer websites. Many product pages sit behind Cloudflare or render prices with JavaScript, which basic HTTP scrapers miss. Our Slow Lane handles both reliably.

  • Scrape JavaScript-rendered prices with Slow Lane
  • Extract structured product data with CSS selectors
  • Schedule runs every hour without infrastructure overhead
  • Get output as clean JSON your database can ingest directly
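e-commerce.py
```python
import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://competitor.com/product/123",
        "mode": "auto",  # let auto-routing choose how to fetch the page
        "output_format": "autoparse",
        # Output field name -> CSS selector to extract it from
        "css_selectors": {
            "price":       ".product-price",
            "title":       "h1.product-name",
            "stock":       ".availability-status",
            "sku":         "[data-sku]"
        }
    },
    timeout=120,  # seconds; rendered pages can take a while
)
response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
product = response.json()["data"]["css_extracted"]
print(product)
```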
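Tracking prices over time means turning the scraped text into numbers and comparing each run with the previous one. The sketch below does that under some assumptions. It assumes `css_extracted["price"]` is a plain string such as `"$1,299.00"`. It also assumes you persist the last-seen prices yourself, shown here as a dict. `parse_price` and `detect_change` are hypothetical helpers and are not part of the OmniScrape API.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional, Tuple


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Turn scraped text such as "$1,299.00" into Decimal("1299.00").

    Keeps only digits and ".", so it assumes "." is the decimal separator;
    formats like "1.299,00 EUR" would be misread and need locale-aware parsing.
    """
    cleaned = re.sub(r"[^\d.]", "", text or "")
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:  # e.g. "1.299.00" after cleaning
        return None


def detect_change(
    key: str, price_text: Optional[str], last_seen: Dict[str, Decimal]
) -> Optional[Tuple[Decimal, Decimal]]:
    """Return (old, new) if the price for `key` moved since the last run.

    Updates `last_seen` in place; unparseable prices are ignored and
    leave the stored value untouched.
    """
    new = parse_price(price_text)
    if new is None:
        return None
    old = last_seen.get(key)
    last_seen[key] = new
    if old is not None and old != new:
        return old, new
    return None
```

In an hourly job (cron or any scheduler), you would call `detect_change(product["sku"], product["price"], last_seen)` after each scrape and alert whenever it returns a pair. `Decimal` avoids float rounding surprises when comparing prices for equality.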

🧠 AI / ML

Training Data Collection

Large language models and computer vision pipelines need massive, diverse datasets. News sites, research portals, and forums often sit behind rate limiters or bot detection. We route around those so your data pipeline keeps running.

  • Get Markdown output for clean LLM training text
  • Auto-parse extracts article bodies, headings, and metadata
  • Run thousands of URLs in parallel, up to your concurrent request limit
  • Screenshot capture for computer vision and layout datasets
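ai_ml.py
```python
import hashlib
import os

import requests

urls = [
    "https://news-site.com/article/1",
    "https://research-portal.org/paper/abc",
    "https://forum.example.com/thread/456",
]

os.makedirs("corpus", exist_ok=True)  # open() fails if the folder is missing

for url in urls:
    r = requests.post(
        "https://api.omniscrape.io/v1/scrape",
        headers={"X-API-Key": "YOUR_KEY"},
        json={
            "url": url,
            "mode": "auto",
            "output_format": "markdown",
            "screenshot": False
        },
        timeout=120,
    )
    data = r.json()
    if data.get("success"):
        # sha256 gives the same filename on every run; built-in hash() of a str
        # is randomized per process, so it cannot identify files across runs
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        with open(f"corpus/{name}.md", "w", encoding="utf-8") as f:
            f.write(data["data"]["content"])
```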
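The loop above sends one request at a time. Because these calls are I/O-bound HTTP requests, a thread pool is a simple way to run them in parallel. This is a minimal sketch with some assumptions. `collect_corpus` and `corpus_path` are hypothetical helpers. `fetch` is a function you supply, for example one that wraps the `requests.post` call above and returns `data["data"]["content"]` when `success` is true. `max_workers=8` is an arbitrary default; keep it at or below your concurrent request limit.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def corpus_path(out_dir: str, url: str) -> str:
    """Stable filename per URL (sha256 does not change between runs)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(out_dir, f"{digest}.md")


def collect_corpus(
    urls: Sequence[str],
    fetch: Callable[[str], Optional[str]],
    out_dir: str = "corpus",
    max_workers: int = 8,
) -> List[str]:
    """Fetch URLs on a thread pool and write each Markdown body to out_dir.

    `fetch` should return the Markdown text, or None on failure; an exception
    raised inside `fetch` propagates out of this function.
    Returns the paths written, in the same order as `urls`.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL correctly
        for url, markdown in zip(urls, pool.map(fetch, urls)):
            if markdown is None:
                continue
            path = corpus_path(out_dir, url)
            with open(path, "w", encoding="utf-8") as f:
                f.write(markdown)
            written.append(path)
    return written
```

Files are written on the main thread as results arrive, so only the network calls run concurrently. Have `fetch` catch its own errors and return `None`; otherwise a single failed URL stops the whole batch.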

🏠 Real Estate

Property Listing Aggregation

Real estate portals rotate anti-bot measures constantly. Zillow, Rightmove, and local listing sites all use varying levels of Cloudflare protection. Our auto-routing handles the detection; you just process the listings.

  • Extract price, location, bedrooms, bathrooms with CSS selectors
  • Handle pagination automatically with session management
  • Grab listing photos via template extraction
  • Run geo-targeted requests through specific country proxies
real_estate.py
```python
import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape/advanced",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://listing-portal.com/search?city=jakarta",
        "mode": "js_rendering",  # render the page's JavaScript before extracting
        "output_format": "autoparse",
        "css_selectors": {
            "listings":  ".listing-card",
            "price":     ".listing-price",
            "location":  ".listing-address",
            "bedrooms":  ".bed-count"
        },
        # Replace USER:PASS with your proxy credentials
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080"
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["data"])
```
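The request above fetches a single results page. The sketch below shows one client-side way to walk through pagination, assuming the portal puts the page number in a query parameter. `with_page`, `iter_listings`, and the `"page"` parameter name are hypothetical. `fetch_listings` is a function you supply that sends the request above for a given URL and returns the extracted listings as a list. This approach does not use OmniScrape's session management, whose parameters are not shown on this page.

```python
from typing import Callable, Dict, Iterator, List
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def with_page(url: str, page: int, param: str = "page") -> str:
    """Return `url` with its page query parameter set to `page`.

    The parameter name differs between portals, so "page" is only a default.
    Existing parameters keep their order; a new one is appended at the end.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[param] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def iter_listings(
    start_url: str,
    fetch_listings: Callable[[str], List[Dict]],
    max_pages: int = 20,
) -> Iterator[Dict]:
    """Yield listings page by page until a page comes back empty or max_pages is hit."""
    for page in range(1, max_pages + 1):
        listings = fetch_listings(with_page(start_url, page))
        if not listings:
            break
        yield from listings
```

`max_pages` caps request spend if the stop condition never triggers. That can happen, for example, when a portal keeps returning its last page for out-of-range page numbers.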

📊 Finance

Market Data and Financial Research

Financial data sites protect their content aggressively. Bloomberg, Yahoo Finance alternatives, and broker portals sit behind multiple bot detection layers. We bypass those so your quant pipeline gets the numbers it needs.

  • Get stock prices, ratios, and earnings data as structured JSON
  • Handle login-gated pages through BaaS (Browser-as-a-Service) with persistent sessions
  • Capture tabular financial data with auto-parse table extraction
  • Monitor for page updates with scheduled requests
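finance.py
```python
import asyncio

from playwright.async_api import async_playwright

# OmniScrape's hosted browser, reached over the Chrome DevTools Protocol (CDP)
CDP_URL = (
    "wss://browser.omniscrape.io"
    "?apikey=YOUR_KEY&render_media=false"
)


async def get_financials(ticker: str) -> list[str]:
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(CDP_URL)
        try:
            page = await browser.new_page()
            await page.goto(f"https://finance-site.com/{ticker}")
            # Wait until the table exists before reading it
            await page.wait_for_selector(".financials-table")
            # One string per row; innerText separates a row's cells with tabs
            return await page.evaluate("""
                () => Array.from(
                    document.querySelectorAll(".financials-table tr")
                ).map(r => r.innerText)
            """)
        finally:
            await browser.close()  # release the remote browser even on errors


if __name__ == "__main__":
    print(asyncio.run(get_financials("AAPL")))
```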
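`get_financials` returns one string per table row. The HTML `innerText` algorithm separates the cells of a table row with tab characters, so each row can be split and its cells converted to numbers. This is a minimal sketch: `parse_number` and `rows_to_table` are hypothetical helpers. The first row is assumed to be a header containing the metric label followed by period labels.

```python
import re
from typing import Dict, List, Optional


def parse_number(cell: str) -> Optional[float]:
    """Parse a financial cell: "394,328" -> 394328.0, "(1,200)" -> -1200.0.

    Placeholders such as "-" or "N/A" return None. Unit suffixes ("B", "M")
    are dropped, not applied, so scale values yourself if the table uses them.
    """
    text = cell.strip()
    # Accounting convention: parentheses mark a negative number
    negative = text.startswith("(") and text.endswith(")")
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return -value if negative else value


def rows_to_table(rows: List[str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Turn tab-separated row strings into {metric: {column: value}}.

    The first non-empty row is treated as the header row.
    """
    split_rows = [row.split("\t") for row in rows if row.strip()]
    if not split_rows:
        return {}
    header, *body = split_rows
    columns = [c.strip() for c in header[1:]]
    return {
        cells[0].strip(): dict(zip(columns, (parse_number(c) for c in cells[1:])))
        for cells in body
    }
```

Usage: `rows_to_table(asyncio.run(get_financials("AAPL")))` returns a nested dict you can load into a DataFrame or database.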

📬 B2B Sales

Lead Generation and Enrichment

Sales teams pull contact data, job titles, and company info from LinkedIn, directories, and company websites. These sites are among the hardest to scrape. Our stealth browser with residential proxies is built exactly for this.

  • Extract company data, headcount, and tech stack info
  • Use residential proxies to avoid geo-based blocking
  • Rotate sessions to avoid per-account rate limits
  • Export to JSON and push directly to your CRM via webhook
b2b_sales.py
```python
import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://company-directory.com/company/acme",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "css_selectors": {
            "name":       "h1.company-name",
            "employees":  ".employee-count",
            "industry":   ".industry-tag",
            "website":    "a.website-link",
            "founded":    ".founded-year"
        },
        # Replace USER:PASS with your proxy credentials
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080",
        "enable_solver": True
    },
    timeout=120,
)
response.raise_for_status()
company = response.json()["data"]["css_extracted"]
print(company)
```
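The bullets mention pushing results to a CRM via webhook, but the request above stops at the scraped fields. This minimal sketch covers that last step, under two assumptions. First, the `css_extracted` values are plain strings. Second, your CRM accepts a JSON POST at a webhook URL you configure. `first_int`, `clean`, `build_crm_record`, `push_to_crm`, and the record's field names are all hypothetical. The sketch uses the standard library's `urllib.request`, so it needs no extra dependency.

```python
import json
import re
import urllib.request
from typing import Dict, Optional


def first_int(text: Optional[str]) -> Optional[int]:
    """First integer in the text: "1,001-5,000 employees" -> 1001, "Founded 1998" -> 1998."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


def clean(text: Optional[str]) -> Optional[str]:
    """Strip whitespace; empty strings become None."""
    text = (text or "").strip()
    return text or None


def build_crm_record(company: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Map the css_selectors fields onto a flat record (field names are illustrative)."""
    return {
        "name": clean(company.get("name")),
        "industry": clean(company.get("industry")),
        "website": clean(company.get("website")),
        "employees_min": first_int(company.get("employees")),  # lower bound of a range
        "founded_year": first_int(company.get("founded")),
    }


def push_to_crm(record: Dict[str, object], webhook_url: str, timeout: float = 10.0) -> int:
    """POST the record as JSON to a webhook and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Usage: `push_to_crm(build_crm_record(company), "https://crm.example.com/hooks/companies")`, where the webhook URL is a placeholder for whatever your CRM provides.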

🔍 SEO / Content

SERP Scraping and Content Research

Google search result pages, keyword tools, and SEO platforms lock down their data behind aggressive anti-bot measures. Getting clean SERP data for rank tracking or competitor content analysis requires stealth-level access.

  • Scrape SERP results including titles, descriptions, and URLs
  • Use Markdown output for direct content analysis
  • Capture XHR requests to catch API-loaded results
  • Run searches from specific countries using proxy routing
seo_content.py
```python
import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape/advanced",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://www.google.com/search?q=web+scraping+api",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "templates": ["links", "headings"],  # results land in data.template_extracted
        "capture_xhr": True,  # also record XHR traffic for API-loaded results
        # Replace USER:PASS with your proxy credentials
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080",
        "custom_headers": {
            "Accept-Language": "en-US,en;q=0.9"  # request English results
        }
    },
    timeout=120,
)
response.raise_for_status()
results = response.json()["data"]["template_extracted"]
for link in results["links"][:10]:
    print(link)
```
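For rank tracking, the extracted links still need filtering and a position lookup. A localized search can also be expressed in the Google URL itself: the `hl` parameter sets the interface language and `gl` sets the country, alongside the proxy's exit location. This is a minimal sketch, assuming `template_extracted["links"]` is a list of URL strings; the example above only prints the links, so the real shape may differ. `build_serp_url`, `organic_links`, and `rank_of` are hypothetical helpers. Dropping every Google-owned host is a rough heuristic, so the resulting positions are approximate.

```python
from typing import Iterable, Iterator, Optional
from urllib.parse import urlencode, urlparse


def build_serp_url(query: str, hl: str = "en", gl: str = "us") -> str:
    """Google search URL; hl is the interface language, gl the country to localize for."""
    return "https://www.google.com/search?" + urlencode({"q": query, "hl": hl, "gl": gl})


def organic_links(links: Iterable[str]) -> Iterator[str]:
    """Drop relative links, links on google.* hosts, and duplicates, keeping order."""
    seen = set()
    for link in links:
        host = (urlparse(link).hostname or "").lower()
        # "google" as a whole host label matches www.google.com, google.co.uk, etc.
        if not host or "google" in host.split(".") or link in seen:
            continue
        seen.add(link)
        yield link


def rank_of(domain: str, links: Iterable[str]) -> Optional[int]:
    """1-based position of the first remaining link on `domain` or one of its subdomains."""
    domain = domain.lower()
    for position, link in enumerate(organic_links(links), start=1):
        host = (urlparse(link).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None
```

Usage: pass `build_serp_url("web scraping api", gl="de")` as the request's `"url"`, then call `rank_of("example.com", results["links"])`.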

Your use case not listed here?

If a page exists on the public web, we can likely scrape it. Try it yourself — start your free trial today.