Use Cases

What People Build with OmniScrape

Real workflows from price monitoring to AI dataset collection. Each one includes working code you can copy and adapt.


🏷️E-commerce

Price and Product Monitoring

Track competitor prices across Amazon, Shopify stores, and retailer websites. Many product pages sit behind Cloudflare or render prices with JavaScript, which basic scrapers miss. Our Slow Lane handles both reliably.

Scrape JavaScript-rendered prices with Slow Lane
Extract structured product data with CSS selectors
Schedule runs every hour without infrastructure overhead
Get output as clean JSON your database can ingest directly

e-commerce.py

python

import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://competitor.com/product/123",
        "mode": "auto",
        "output_format": "autoparse",
        "css_selectors": {
            "price":       ".product-price",
            "title":       "h1.product-name",
            "stock":       ".availability-status",
            "sku":         "[data-sku]"
        }
    }
)
product = response.json()["data"]["css_extracted"]
print(product)
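
One way to get the extracted fields into a database is sketched below, using SQLite from the Python standard library. It assumes each selector comes back as a single string, as the `css_extracted` lookup above suggests. The table name, the `parse_price` and `save_snapshot` helpers, and the sample values are illustrative and not part of the OmniScrape API. The price parser also assumes US-style formatting with a comma as the thousands separator.

```python
import re
import sqlite3
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Pull the first number out of a price string such as "$1,299.00"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if not match:
        return None
    try:
        return Decimal(match.group().replace(",", ""))
    except InvalidOperation:
        return None


def save_snapshot(conn, url, product):
    """Append one price observation to a history table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS price_history (
               url TEXT, sku TEXT, title TEXT,
               price TEXT, stock TEXT, fetched_at TEXT
           )"""
    )
    price = parse_price(product.get("price"))
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?, ?, ?, ?)",
        (
            url,
            product.get("sku"),
            product.get("title"),
            str(price) if price is not None else None,
            product.get("stock"),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()


# Sample data standing in for response.json()["data"]["css_extracted"].
sample = {
    "price": "$1,299.00",
    "title": "Example Widget",
    "stock": "In stock",
    "sku": "W-123",
}
conn = sqlite3.connect(":memory:")
save_snapshot(conn, "https://competitor.com/product/123", sample)
rows = conn.execute("SELECT sku, price FROM price_history").fetchall()
```

Storing the price as text keeps the exact decimal value; swap in a file path instead of `:memory:` to keep history between runs.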

🧠AI / ML

Training Data Collection

Large language models and computer vision pipelines need massive, diverse datasets. News sites, research portals, and forums often sit behind rate limiters or bot detection. We route around those so your data pipeline keeps running.

Get Markdown output for clean LLM training text
Auto-parse extracts article bodies, headings, and metadata
Run thousands of URLs in parallel, up to your concurrent request limit
Screenshot capture for computer vision and layout datasets

ai_ml.py

python

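import hashlib
import os

import requests

urls = [
    "https://news-site.com/article/1",
    "https://research-portal.org/paper/abc",
    "https://forum.example.com/thread/456",
]

# Make sure the output directory exists before writing to it.
os.makedirs("corpus", exist_ok=True)

for url in urls:
    r = requests.post(
        "https://api.omniscrape.io/v1/scrape",
        headers={"X-API-Key": "YOUR_KEY"},
        json={
            "url": url,
            "mode": "auto",
            "output_format": "markdown",
            "screenshot": False
        }
    )
    data = r.json()
    if data["success"]:
        # The built-in hash() is salted per process for strings, so use a
        # stable digest to get the same filename for the same URL every run.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
        with open(f"corpus/{name}.md", "w", encoding="utf-8") as f:
            f.write(data["data"]["content"])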
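
The loop above fetches one URL at a time. A minimal sketch of running requests in parallel with a bounded thread pool follows. The `scrape_all` helper and the `fake_fetch` stand-in are hypothetical, and `max_workers` is a placeholder: the actual concurrent request limit depends on your plan and is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_all(urls, fetch, max_workers=5):
    """Run fetch(url) for every URL with at most max_workers in flight.

    Returns (results, errors), two dicts keyed by URL.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return results, errors


# Stand-in for a function that POSTs one URL to the scrape endpoint
# and returns the Markdown content.
def fake_fetch(url):
    if "bad" in url:
        raise ValueError("simulated failure")
    return f"# Markdown for {url}"


results, errors = scrape_all(
    ["https://example.com/a", "https://example.com/bad", "https://example.com/b"],
    fake_fetch,
    max_workers=2,
)
```

In practice, `fetch` would wrap the `requests.post` call from the example and return `data["data"]["content"]`; failed URLs end up in `errors` so they can be retried separately.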

🏠Real Estate

Property Listing Aggregation

Real estate portals rotate anti-bot measures constantly. Zillow, Rightmove, and local listing sites all use varying levels of Cloudflare protection. Our auto-routing handles the detection; you just process the listings.

Extract price, location, bedrooms, bathrooms with CSS selectors
Handle pagination automatically with session management
Grab listing photos via template extraction
Run geo-targeted requests through specific country proxies

real_estate.py

python

import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape/advanced",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://listing-portal.com/search?city=jakarta",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "css_selectors": {
            "listings":  ".listing-card",
            "price":     ".listing-price",
            "location":  ".listing-address",
            "bedrooms":  ".bed-count"
        },
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080"
    }
)
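
Search results usually span several pages. The sketch below shows one client-side way to generate the page URLs to submit, separate from any pagination handling the service itself provides. It assumes the target site paginates with a `page` query parameter, which varies by portal; `page_urls` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_urls(base_url, pages, param="page"):
    """Return base_url with param set to 1..pages, keeping existing query params."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for n in range(1, pages + 1):
        query[param] = str(n)
        urls.append(urlunparse(parts._replace(query=urlencode(query))))
    return urls


urls = page_urls("https://listing-portal.com/search?city=jakarta", pages=3)
```

Each generated URL can then be sent as the `url` field of a request like the one above.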

📊Finance

Market Data and Financial Research

Financial data sites protect their content aggressively. Bloomberg, Yahoo Finance alternatives, and broker portals sit behind multiple bot detection layers. We bypass those so your quant pipeline gets the numbers it needs.

Get stock prices, ratios, and earnings data as structured JSON
Handle login-gated pages via BaaS with persistent sessions
Capture tabular financial data with auto-parse table extraction
Monitor for page updates with scheduled requests

finance.py

python

from playwright.async_api import async_playwright

async def get_financials(ticker: str):
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(
            "wss://browser.omniscrape.io"
            "?apikey=YOUR_KEY&render_media=false"
        )
        page = await browser.new_page()
        await page.goto(f"https://finance-site.com/{ticker}")
        await page.wait_for_selector(".financials-table")
        data = await page.evaluate("""
            () => Array.from(
                document.querySelectorAll(".financials-table tr")
            ).map(r => r.innerText)
        """)
        await browser.close()
        return data
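
`get_financials` returns one string per table row. Browsers separate table cells with tab characters in `innerText`, so the rows can be split into records as sketched below. This assumes the first row is the header; `rows_to_records` and the sample figures are made up for illustration.

```python
def rows_to_records(rows):
    """Turn tab-separated row strings into dicts keyed by the header row."""
    cells = [[c.strip() for c in row.split("\t")] for row in rows if row.strip()]
    if not cells:
        return []
    header, body = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in body]


# Sample rows standing in for the list returned by get_financials().
sample_rows = [
    "Metric\tFY2023\tFY2024",
    "Revenue\t1,200\t1,450",
    "Net income\t150\t210",
]
records = rows_to_records(sample_rows)
```

To get real rows, call the coroutine with `asyncio.run(get_financials("TICKER"))` and pass the result to `rows_to_records`.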

📬B2B Sales

Lead Generation and Enrichment

Sales teams pull contact data, job titles, and company info from LinkedIn, directories, and company websites. These sites are among the hardest to scrape. Our stealth browser with residential proxies is built exactly for this.

Extract company data, headcount, and tech stack info
Use residential proxies to avoid geo-based blocking
Rotate sessions to avoid per-account rate limits
Export to JSON and push directly to your CRM via webhook

b2b_sales.py

python

import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://company-directory.com/company/acme",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "css_selectors": {
            "name":       "h1.company-name",
            "employees":  ".employee-count",
            "industry":   ".industry-tag",
            "website":    "a.website-link",
            "founded":    ".founded-year"
        },
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080",
        "enable_solver": True
    }
)
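
Before pushing a record to a CRM, the extracted strings usually need light cleanup. The sketch below maps the selector names used above to a flat JSON payload. It assumes the extracted fields arrive as a dict of strings; the CRM-side field names and the `to_crm_payload` helper are hypothetical, and the headcount parser keeps only the first number it finds (so a range like "50-200" becomes 50).

```python
import json
import re


def to_crm_payload(company, source_url):
    """Map extracted fields to a flat record with made-up CRM field names."""
    match = re.search(r"\d[\d,]*", company.get("employees") or "")
    headcount = int(match.group().replace(",", "")) if match else None
    return {
        "company_name": (company.get("name") or "").strip(),
        "employee_count": headcount,
        "industry": company.get("industry"),
        "website": company.get("website"),
        "founded_year": company.get("founded"),
        "source_url": source_url,
    }


# Sample data standing in for the extracted selector results.
sample = {
    "name": " Acme Corp ",
    "employees": "1,200 employees",
    "industry": "Manufacturing",
    "website": "https://acme.example",
    "founded": "1999",
}
payload = to_crm_payload(sample, "https://company-directory.com/company/acme")
body = json.dumps(payload)
```

The resulting `body` can be sent with `requests.post` to whatever webhook URL your CRM exposes, with a `Content-Type: application/json` header.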

🔍SEO / Content

SERP Scraping and Content Research

Google search result pages, keyword tools, and SEO platforms lock down their data behind aggressive anti-bot measures. Getting clean SERP data for rank tracking or competitor content analysis requires stealth-level access.

Scrape SERP results including titles, descriptions, and URLs
Use Markdown output for direct content analysis
Capture XHR requests to catch API-loaded results
Run searches from specific countries using proxy routing

seo_content.py

python

import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape/advanced",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://www.google.com/search?q=web+scraping+api",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "templates": ["links", "headings"],
        "capture_xhr": True,
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080",
        "custom_headers": {
            "Accept-Language": "en-US,en;q=0.9"
        }
    }
)
results = response.json()["data"]["template_extracted"]
for link in results["links"][:10]:
    print(link)
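
When tracking many keywords, it helps to build the search URL from its parts rather than by hand. The sketch below URL-encodes the query and adds the commonly used `hl` (interface language) and `gl` (country) query parameters. Treat those two as assumptions: how they interact with proxy-based country routing is not covered here. `search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def search_url(query, language="en", country="us"):
    """Build a Google search URL with an encoded query and locale hints."""
    params = {"q": query, "hl": language, "gl": country}
    return "https://www.google.com/search?" + urlencode(params)


url = search_url("web scraping api")
```

The result can be passed as the `url` field in the request above, once per keyword.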

Your use case not listed here?

If a page exists on the public web, we can likely scrape it. Try it yourself: start your free trial today.