What People Build with OmniScrape
Real workflows from price monitoring to AI dataset collection. Each one includes working code you can copy and adapt.
Price and Product Monitoring
Track competitor prices across Amazon, Shopify stores, and retailer websites. Many product pages sit behind Cloudflare or render prices with JavaScript, which basic HTTP scrapers miss. Our Slow Lane handles those pages reliably.
- Scrape JavaScript-rendered prices with Slow Lane
- Extract structured product data with CSS selectors
- Schedule runs every hour without infrastructure overhead
- Get output as clean JSON your database can ingest directly
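```python
import requests

# Scrape one product page and pull specific fields out with CSS selectors
response = requests.post(
    "https://api.omniscrape.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://competitor.com/product/123",
        "mode": "auto",
        "output_format": "autoparse",
        "css_selectors": {  # output field name -> CSS selector
            "price": ".product-price",
            "title": "h1.product-name",
            "stock": ".availability-status",
            "sku": "[data-sku]",
        },
    },
    timeout=120,  # rendered pages can be slow; adjust as needed
)
response.raise_for_status()  # fail loudly on HTTP errors

product = response.json()["data"]["css_extracted"]
print(product)
```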
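The last two bullets suggest appending each run's result straight into a database. A minimal sketch, assuming `css_extracted` is a JSON object keyed by the selector names you sent: it stores every snapshot as a timestamped row in SQLite (standing in for your own database), so hourly runs accumulate a price history. The table name `price_snapshots` and the helpers `save_snapshot` and `price_history` are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone


def save_snapshot(conn: sqlite3.Connection, url: str, product: dict) -> None:
    """Append one scraped product record as a timestamped JSON row (hypothetical schema)."""
    # Create the table on first use
    conn.execute(
        "CREATE TABLE IF NOT EXISTS price_snapshots ("
        "url TEXT NOT NULL, scraped_at TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO price_snapshots (url, scraped_at, payload) VALUES (?, ?, ?)",
        (url, datetime.now(timezone.utc).isoformat(), json.dumps(product)),
    )
    conn.commit()


def price_history(conn: sqlite3.Connection, url: str) -> list[tuple[str, dict]]:
    """Return (timestamp, record) pairs for one URL, oldest first."""
    rows = conn.execute(
        "SELECT scraped_at, payload FROM price_snapshots "
        "WHERE url = ? ORDER BY scraped_at, rowid",  # rowid breaks timestamp ties
        (url,),
    ).fetchall()
    return [(ts, json.loads(payload)) for ts, payload in rows]
```

With a connection such as `conn = sqlite3.connect("prices.db")`, call `save_snapshot(conn, url, product)` after each run. Keeping the raw JSON in one column means adding a selector needs no schema change.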
Training Data Collection
Large language models and computer vision pipelines need massive, diverse datasets. News sites, research portals, and forums often sit behind rate limiters or bot detection. We route around those so your data pipeline keeps running.
- Get Markdown output for clean LLM training text
- Auto-parse extracts article bodies, headings, and metadata
- Run thousands of URLs in parallel, within your concurrent request limit
- Screenshot capture for computer vision and layout datasets
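```python
import hashlib
import os

import requests

urls = [
    "https://news-site.com/article/1",
    "https://research-portal.org/paper/abc",
    "https://forum.example.com/thread/456",
]

os.makedirs("corpus", exist_ok=True)  # make sure the output folder exists

for url in urls:
    r = requests.post(
        "https://api.omniscrape.io/v1/scrape",
        headers={"X-API-Key": "YOUR_KEY"},
        json={
            "url": url,
            "mode": "auto",
            "output_format": "markdown",  # clean text for LLM corpora
            "screenshot": False,
        },
        timeout=120,
    )
    data = r.json()
    if data["success"]:
        # Built-in hash() is randomized per process for strings, so file names
        # would change between runs; a SHA-256 digest is stable.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        with open(f"corpus/{name}.md", "w", encoding="utf-8") as f:
            f.write(data["data"]["content"])
```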
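The loop above sends one request at a time. To run URLs in parallel, a thread pool is a simple option; this sketch caps in-flight requests with `max_workers`, which you would set at or below your concurrent request limit (its actual value is not stated here). `scrape_all` is a hypothetical helper, and `fetch` is a placeholder for the `requests.post` call above, passed in so the helper can be tested without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional


def scrape_all(
    urls: list[str],
    fetch: Callable[[str], Optional[str]],
    max_workers: int = 10,
) -> dict[str, Optional[str]]:
    """Run fetch(url) concurrently and map each URL to its result.

    fetch should return the page content, or None if the scrape failed.
    At most max_workers calls run at the same time.
    """
    results: dict[str, Optional[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # An error on one URL should not stop the rest of the batch
                results[url] = None
    return results
```

In practice `fetch` would post `{"url": url, "mode": "auto", "output_format": "markdown"}` and return `data["data"]["content"]` when `data["success"]` is true; writing the files stays exactly as in the loop above.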
Property Listing Aggregation
Real estate portals rotate anti-bot measures constantly. Zillow, Rightmove, and local listing sites use varying levels of bot protection. Our auto-routing handles the detection; you just process the listings.
- Extract price, location, bedrooms, bathrooms with CSS selectors
- Handle pagination automatically with session management
- Grab listing photos via template extraction
- Run geo-targeted requests through specific country proxies
```python
import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape/advanced",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://listing-portal.com/search?city=jakarta",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "css_selectors": {
            "listings": ".listing-card",
            "price": ".listing-price",
            "location": ".listing-address",
            "bedrooms": ".bed-count",
        },
        # Route through the OmniScrape proxy (replace USER:PASS with your credentials)
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080",
    },
    timeout=120,
)
response.raise_for_status()
```
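When search results span several pages, one approach is to generate each page's URL and send one request per page. This sketch assumes the target site paginates with a `page` query parameter, which is hypothetical: check the real site's URLs and adjust `param`. It keeps the rest of the query string (here `city=jakarta`) intact; `page_urls` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(search_url: str, pages: int, param: str = "page") -> list[str]:
    """Return URLs for pages 1..pages, keeping the other query parameters."""
    parts = urlsplit(search_url)
    # Drop any existing page parameter so it is not duplicated
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    urls = []
    for n in range(1, pages + 1):
        new_query = urlencode(query + [(param, str(n))])
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls
```

Each generated URL then goes into the `"url"` field of the request above.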
Market Data and Financial Research
Financial data sites protect their content aggressively. Bloomberg, Yahoo Finance alternatives, and broker portals sit behind multiple bot detection layers. We bypass those so your quant pipeline gets the numbers it needs.
- Get stock prices, ratios, and earnings data as structured JSON
- Handle login-gated pages with persistent sessions through BaaS (Browser-as-a-Service)
- Capture tabular financial data with auto-parse table extraction
- Monitor for page updates with scheduled requests
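Monitoring for updates can also be done on the client side: fingerprint whatever each scheduled request returns and compare it with the previous fingerprint. This is a sketch under that assumption, not a documented OmniScrape feature. `fingerprint` and `has_changed` are hypothetical helpers; text is whitespace-normalized so a reflowed page does not count as a change, and structured data is serialized with sorted keys so dict ordering does not matter.

```python
import hashlib
import json
from typing import Any, Optional


def fingerprint(content: Any) -> str:
    """Stable SHA-256 digest of scraped content (text, list, or dict)."""
    if isinstance(content, str):
        # Collapse runs of whitespace so reflowed text hashes the same
        normalized = " ".join(content.split())
    else:
        # Serialize structured data deterministically
        normalized = json.dumps(content, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous: Optional[str], content: Any) -> tuple[bool, str]:
    """Compare content with the previous fingerprint (None on the first run).

    Returns (changed, current_fingerprint); store the fingerprint for the next run.
    """
    current = fingerprint(content)
    return previous != current, current
```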
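```python
import asyncio

from playwright.async_api import async_playwright


async def get_financials(ticker: str) -> list[str]:
    async with async_playwright() as p:
        # Connect to the remote OmniScrape browser over the Chrome DevTools Protocol
        browser = await p.chromium.connect_over_cdp(
            "wss://browser.omniscrape.io"
            "?apikey=YOUR_KEY&render_media=false"
        )
        try:
            page = await browser.new_page()
            await page.goto(f"https://finance-site.com/{ticker}")
            # Wait until the table has rendered before reading it
            await page.wait_for_selector(".financials-table")
            # Return the visible text of every table row
            return await page.evaluate("""
                () => Array.from(
                    document.querySelectorAll(".financials-table tr")
                ).map(r => r.innerText)
            """)
        finally:
            await browser.close()  # release the remote session even on errors


if __name__ == "__main__":
    print(asyncio.run(get_financials("AAPL")))
```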
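`get_financials` returns one string per table row. Under the HTML `innerText` algorithm, browsers separate the cells of a table row with tab characters, so each row can be split back into cells. This sketch assumes the first row holds the column headers and that no cell itself contains a tab; `rows_to_records` is a hypothetical helper.

```python
def rows_to_records(rows: list[str]) -> list[dict[str, str]]:
    """Turn tab-separated row strings into dicts keyed by the header row."""
    if not rows:
        return []
    header = [cell.strip() for cell in rows[0].split("\t")]
    records = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row.split("\t")]
        if not any(cells):
            continue  # skip empty spacer rows
        # zip stops at the shorter list, so extra cells in ragged rows are dropped
        records.append(dict(zip(header, cells)))
    return records
```

For example, `rows_to_records(asyncio.run(get_financials("AAPL")))` yields a list of dicts that serializes directly to JSON.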
Lead Generation and Enrichment
Sales teams pull contact data, job titles, and company info from LinkedIn, directories, and company websites. These sites are among the hardest to scrape. Our stealth browser with residential proxies is built exactly for this.
- Extract company data, headcount, and tech stack info
- Use residential proxies to avoid geo-based blocking
- Rotate sessions to avoid per-account rate limits
- Export to JSON and push directly to your CRM via webhook
```python
import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://company-directory.com/company/acme",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "css_selectors": {
            "name": "h1.company-name",
            "employees": ".employee-count",
            "industry": ".industry-tag",
            "website": "a.website-link",
            "founded": ".founded-year",
        },
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080",
        "enable_solver": True,
    },
    timeout=120,
)
response.raise_for_status()

company = response.json()["data"]["css_extracted"]
print(company)
```
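Many CRMs and automation tools can receive records through an incoming webhook that accepts a JSON POST. This standard-library sketch wraps one `css_extracted` record in a JSON body and sends it. The webhook URL, the `source_url` and `company` field names, and the helpers `build_webhook_request` and `push_to_crm` are hypothetical; map them to whatever your CRM expects.

```python
import json
import urllib.request


def build_webhook_request(
    webhook_url: str, source_url: str, company: dict
) -> urllib.request.Request:
    """Wrap one scraped company record in a JSON POST request (hypothetical payload shape)."""
    body = json.dumps({"source_url": source_url, "company": company}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_to_crm(webhook_url: str, source_url: str, company: dict) -> int:
    """Send the record and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = build_webhook_request(webhook_url, source_url, company)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the payload easy to inspect or test before anything leaves your machine.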
SERP Scraping and Content Research
Google search result pages, keyword tools, and SEO platforms lock down their data behind aggressive anti-bot measures. Getting clean SERP data for rank tracking or competitor content analysis requires stealth-level access.
- Scrape SERP results including titles, descriptions, and URLs
- Use Markdown output for direct content analysis
- Capture XHR requests to catch API-loaded results
- Run searches from specific countries using proxy routing
```python
import requests

response = requests.post(
    "https://api.omniscrape.io/v1/scrape/advanced",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://www.google.com/search?q=web+scraping+api",
        "mode": "js_rendering",
        "output_format": "autoparse",
        "templates": ["links", "headings"],
        "capture_xhr": True,  # also capture XHR requests
        "proxy": "http://USER:PASS@proxy.omniscrape.io:8080",
        "custom_headers": {
            "Accept-Language": "en-US,en;q=0.9"  # ask for English results
        },
    },
    timeout=120,
)
response.raise_for_status()

results = response.json()["data"]["template_extracted"]
# Print the first ten extracted links
for link in results["links"][:10]:
    print(link)
```
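For rank tracking, a common follow-up is finding where a given domain first appears in the results. The structure of each item in `results["links"]` is not documented here, so this sketch assumes you have already reduced the results to a list of URL strings in page order; on a live results page the extracted links may also include navigation links worth filtering out first. `rank_of` is a hypothetical helper, and subdomains such as `www.` count as a match.

```python
from typing import Optional
from urllib.parse import urlsplit


def rank_of(domain: str, urls: list[str]) -> Optional[int]:
    """Return the 1-based position of the first URL on domain (or a subdomain), else None."""
    domain = domain.lower().lstrip(".")
    for position, url in enumerate(urls, start=1):
        host = (urlsplit(url).hostname or "").lower()
        # Exact host match, or a true subdomain (not just a shared suffix)
        if host == domain or host.endswith("." + domain):
            return position
    return None
```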