Web Scraping with C#: HttpClient, AngleSharp, and OmniScrape API

.NET teams scrape for the same reasons everyone else does — price feeds, compliance snapshots, competitor catalogs — but they want the scraper embedded inside services that already use dependency injection, structured logging, and Key Vault for secrets. HttpClient paired with AngleSharp is the standard approach: HttpClient handles transport, AngleSharp provides a standards-compliant DOM with LINQ-friendly query methods.

The most common production mistake is constructing a new HttpClient per request. That exhausts the socket pool under load and produces intermittent connection failures that are hard to reproduce locally. This guide covers the IHttpClientFactory pattern for ASP.NET Core and Worker Services, AngleSharp parsing, async batch concurrency with SemaphoreSlim, and routing bot-protected URLs through the OmniScrape API. Teams working across languages can compare with web scraping with Python — the JSON request body is identical across SDKs.

On this page

1. NuGet Packages and Project Setup
2. Fetching Pages with HttpClient
3. Parsing HTML with AngleSharp
4. IHttpClientFactory in ASP.NET Core
5. Handling Bot-Protected Pages
6. Server-Side CSS Extraction with css_extractor
7. Async Batch Scraping with SemaphoreSlim
8. JavaScript-Rendered Pages and SPAs
9. HTTP Status and Error Handling
10. FAQ

1. NuGet Packages and Project Setup

AngleSharp is the recommended HTML parser for modern .NET. It implements the WHATWG HTML5 parsing specification, so it handles malformed markup the same way browsers do. System.Text.Json has shipped with the framework since .NET Core 3.0 and is sufficient for deserializing OmniScrape responses; there is no need to add Newtonsoft.Json unless your project already depends on it.

For projects targeting .NET 6 or later, both HttpClient and System.Text.Json are available without additional packages. Add only AngleSharp explicitly.

terminal
bash
dotnet add package AngleSharp
# HttpClient and System.Text.Json are included in the BCL (.NET 6+)
# Optional: add Polly for retry policies
dotnet add package Microsoft.Extensions.Http.Polly

2. Fetching Pages with HttpClient

For throwaway console tools or one-off scripts, a single static HttpClient instance shared across the process lifetime is acceptable. Set a realistic Timeout rather than relying on the 100-second default: for batch jobs that is far too long to wait on one hung host (the example below uses 30 seconds), while unusually slow pages may need more. Add a User-Agent header; many servers return 403 to requests that omit it.

For ASP.NET Core services, hosted workers, or any other long-running process, skip this pattern and use IHttpClientFactory, covered in section 4. The factory manages handler lifetimes and avoids DNS staleness.

FetchPage.cs
csharp
using System.Net.Http;

// Declare once at the class or program level — never inside a loop
private static readonly HttpClient _client = new HttpClient
{
    Timeout = TimeSpan.FromSeconds(30),
    DefaultRequestHeaders =
    {
        { "User-Agent", "Mozilla/5.0 (compatible; MyBot/1.0)" },
    },
};

var html = await _client.GetStringAsync(
    "https://books.toscrape.com/catalogue/page-1.html");

Console.WriteLine($"Fetched {html.Length:N0} characters");
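
HttpClient.Timeout applies to every request sent through the instance. Where one endpoint needs a tighter budget than the rest, a linked CancellationTokenSource is one way to set a per-call deadline. The sketch below is illustrative only: FetchWithDeadlineAsync and NeverRespondsHandler are hypothetical names, and the stub handler exists purely so the example runs offline.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Offline demo: the stub handler never answers, so the deadline must fire.
using var client = new HttpClient(new NeverRespondsHandler());
var result = await Fetching.FetchWithDeadlineAsync(
    client, "https://example.com/slow", TimeSpan.FromMilliseconds(200));
Console.WriteLine(result ?? "timed out");

public static class Fetching
{
    // Hypothetical helper: returns the body, or null when the deadline passes.
    public static async Task<string?> FetchWithDeadlineAsync(
        HttpClient client, string url, TimeSpan deadline,
        CancellationToken ct = default)
    {
        // Cancels when either the caller's token or the deadline fires.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(deadline);
        try
        {
            return await client.GetStringAsync(url, cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            return null; // the deadline elapsed; the caller did not cancel
        }
    }
}

// Test double standing in for a host that hangs indefinitely.
public sealed class NeverRespondsHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(Timeout.InfiniteTimeSpan, cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }
}
```

Returning null keeps the sketch short; production code would more likely surface a typed result so callers can distinguish a timeout from an empty page.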

3. Parsing HTML with AngleSharp

BrowsingContext.New creates a parsing environment. Pass the raw HTML string via req.Content() inside OpenAsync — this avoids a second network call. QuerySelectorAll accepts any CSS selector string and returns an IHtmlCollection you can project with LINQ. Always call .Trim() on TextContent before persisting; whitespace around prices and titles is common.

GetAttribute returns null when the attribute is absent, so use the null-conditional operator throughout. If a selector changes on the target site, you get null values rather than an exception — log nulls explicitly so silent data gaps surface in monitoring.

ParseBooks.cs
csharp
using AngleSharp;
using AngleSharp.Dom;

var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(html));

var books = document.QuerySelectorAll("article.product_pod")
    .Select(card => new
    {
        Title = card.QuerySelector("h3 a")?.GetAttribute("title")?.Trim(),
        Price = card.QuerySelector(".price_color")?.TextContent.Trim(),
        Rating = card.QuerySelector("p.star-rating")?.ClassName
                     ?.Replace("star-rating", "").Trim(),
    })
    .Where(b => b.Title is not null)
    .ToList();

Console.WriteLine($"Parsed {books.Count} books");
foreach (var book in books.Take(3))
    Console.WriteLine($"{book.Title} — {book.Price} ({book.Rating} stars)");
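
The advice to log nulls explicitly could look something like the sketch below. The inline HTML is a made-up two-card sample in the shape of books.toscrape.com, where the second card has no price element; ParseReport and BookParser are illustrative names, not part of AngleSharp.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

// Made-up sample: the second card has no .price_color element.
const string html = @"
<article class='product_pod'>
  <h3><a title='Book A' href='a.html'>Book A</a></h3>
  <p class='price_color'>10.00</p>
</article>
<article class='product_pod'>
  <h3><a title='Book B' href='b.html'>Book B</a></h3>
</article>";

var report = await BookParser.ParseAsync(html);
Console.WriteLine($"{report.Total} cards, {report.MissingPrice} missing price");

public record ParseReport(int Total, int MissingTitle, int MissingPrice);

public static class BookParser
{
    public static async Task<ParseReport> ParseAsync(string html)
    {
        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(req => req.Content(html));

        var cards = document.QuerySelectorAll("article.product_pod")
            .Select(card => new
            {
                Title = card.QuerySelector("h3 a")?.GetAttribute("title")?.Trim(),
                Price = card.QuerySelector(".price_color")?.TextContent.Trim(),
            })
            .ToList();

        // Count gaps instead of silently dropping them, so monitoring can
        // alert when a selector stops matching.
        return new ParseReport(
            cards.Count,
            cards.Count(c => string.IsNullOrEmpty(c.Title)),
            cards.Count(c => string.IsNullOrEmpty(c.Price)));
    }
}
```

Emitting the counts as structured log fields or metrics makes a sudden jump in missing values visible long before anyone inspects the stored data.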

4. IHttpClientFactory in ASP.NET Core

Register a typed client in Program.cs. The factory pools HttpMessageHandler instances and rotates them on a configurable interval (default two minutes), preventing both socket exhaustion and DNS staleness. Read the API key from configuration: in development use dotnet user-secrets, in production use Azure Key Vault via the Azure.Extensions.AspNetCore.Configuration.Secrets provider (the older Microsoft.Extensions.Configuration.AzureKeyVault package is deprecated).

Inject OmniScrapeClient into a BackgroundService or a Hangfire job, not directly into MVC controllers handling user requests. Scraping is slow and should never block a request thread. The typed client below reads body.Data.Content from the success response, which is where OmniScrape returns the fetched HTML.

OmniScrapeClient.cs
csharp
// Program.cs
builder.Services.AddHttpClient<OmniScrapeClient>(client =>
{
    client.BaseAddress = new Uri("https://api.omniscrape.io/");
    client.Timeout = TimeSpan.FromMinutes(2);
    client.DefaultRequestHeaders.Add(
        "X-API-Key",
        builder.Configuration["OmniScrape:ApiKey"]);
});

// OmniScrapeClient.cs
public class OmniScrapeClient(HttpClient http)
{
    public async Task<string> FetchHtmlAsync(
        string url,
        string mode = "auto",
        CancellationToken ct = default)
    {
        var payload = new
        {
            url,
            mode,
            output_format = "html",
            enable_solver = true,
        };

        using var response = await http.PostAsJsonAsync("v1/scrape", payload, ct);
        response.EnsureSuccessStatusCode();

        var body = await response.Content
            .ReadFromJsonAsync<ScrapeResponse>(cancellationToken: ct)
            ?? throw new InvalidOperationException("Null response body");

        if (!body.Success)
            throw new ScrapeFailedException(url, body.Error);

        // HTML content is in data.content
        return body.Data.Content;
    }
}

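// ScrapeResponse.cs (records for System.Text.Json)
// Requires: using System.Text.Json.Serialization;
// ReadFromJsonAsync matches names case-insensitively but does not translate
// snake_case, so multi-word fields need explicit names. The JSON names below
// are assumed to follow the API's snake_case convention.
public record ScrapeResponse(
    bool Success,
    ScrapeData Data,
    ScrapeMetadata Metadata,
    string? Error);

public record ScrapeData(
    string Content,
    [property: JsonPropertyName("css_extracted")]
    Dictionary<string, string>? CssExtracted);

public record ScrapeMetadata(
    [property: JsonPropertyName("method_used")] string MethodUsed,
    [property: JsonPropertyName("solver_used")] bool SolverUsed,
    [property: JsonPropertyName("challenge_solved")] bool ChallengeSolved);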

5. Handling Bot-Protected Pages

Finance portals, retail sites, and travel aggregators commonly return 403 responses or serve JavaScript challenge pages to datacenter IP ranges. AngleSharp will parse the challenge page without error — your selectors simply return null, and you silently collect no data. Detect this by checking for known challenge fingerprints ("Just a moment", "cf-browser-verification") in the returned HTML before parsing.

For domains that consistently block direct HTTP, route requests through OmniScrapeClient with enable_solver: true. The OmniScrape Web Unlocker handles TLS fingerprinting, JavaScript challenge execution, and cookie management transparently. Read Cloudflare bypass for a detailed breakdown of the protection stack. The mode "auto" will attempt fast HTTP first and escalate to a headless browser automatically when a challenge is detected — no code change needed per domain.

ChallengeDetection.cs
csharp
// Detect challenge pages before parsing
private static bool IsChallengeResponse(string html) =>
    html.Contains("cf-browser-verification", StringComparison.OrdinalIgnoreCase) ||
    html.Contains("Just a moment", StringComparison.OrdinalIgnoreCase) ||
    html.Length < 5_000; // suspiciously small for a product page

// GetStringAsync throws on a 403, so fetch the response and inspect it directly
using var response = await _client.GetAsync(targetUrl, ct);
var html = await response.Content.ReadAsStringAsync(ct);

if (!response.IsSuccessStatusCode || IsChallengeResponse(html))
{
    // Fall back to OmniScrape with solver enabled
    html = await omni.FetchHtmlAsync(targetUrl, mode: "auto", ct: ct);
}

6. Server-Side CSS Extraction with css_extractor

When you need a small set of fields from a page, use output_format: css_extractor and pass a css_selectors dictionary. OmniScrape evaluates the selectors server-side and returns a flat key-value map in data.css_extracted. This eliminates the AngleSharp parsing step entirely for simple cases and reduces the amount of HTML you need to transfer and process.

Map the extracted dictionary directly to a DTO or record. If a selector fails to match, the key is absent from the dictionary, so read values with TryGetValue or GetValueOrDefault rather than direct indexing; that avoids a KeyNotFoundException when a site changes its markup.

StructuredScrape.cs
csharp
var payload = new
{
    url = "https://protected-shop.com/sku/441",
    mode = "auto",
    output_format = "css_extractor",
    enable_solver = true,
    css_selectors = new Dictionary<string, string>
    {
        ["title"]         = "h1.product-name",
        ["price"]         = "span.price-current",
        ["availability"]  = ".stock-status",
        ["sku"]           = "meta[name='sku']@content",
    },
};

using var response = await http.PostAsJsonAsync("v1/scrape", payload, ct);
response.EnsureSuccessStatusCode();

var json = await response.Content
    .ReadFromJsonAsync<ScrapeResponse>(cancellationToken: ct);

var extracted = json!.Data.CssExtracted ?? new Dictionary<string, string>();

var product = new ProductDto(
    Title:        extracted.GetValueOrDefault("title", ""),
    Price:        extracted.GetValueOrDefault("price", ""),
    Availability: extracted.GetValueOrDefault("availability", "unknown"),
    Sku:          extracted.GetValueOrDefault("sku", "")
);
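
GetValueOrDefault hides which selectors stopped matching. One possible variation, sketched below, uses TryGetValue and returns the list of absent keys alongside the DTO so the caller can alert on them. ProductDto's shape and the ProductMapper helper are assumptions made for illustration; the sample dictionary is invented.

```csharp
using System;
using System.Collections.Generic;

// Invented extraction result where the "price" and "sku" selectors missed.
var extracted = new Dictionary<string, string>
{
    ["title"] = "Widget",
    ["availability"] = "In stock",
};

var (product, missing) = ProductMapper.Map(extracted);
Console.WriteLine(
    $"{product.Title}: missing=[{string.Join(", ", missing)}]");

public record ProductDto(string Title, string Price, string Availability, string Sku);

public static class ProductMapper
{
    // Returns the DTO plus the keys that were absent or blank.
    public static (ProductDto Product, List<string> Missing) Map(
        IReadOnlyDictionary<string, string> extracted)
    {
        var missing = new List<string>();

        string Get(string key, string fallback)
        {
            if (extracted.TryGetValue(key, out var value) &&
                !string.IsNullOrWhiteSpace(value))
                return value.Trim();

            missing.Add(key); // remember the gap for alerting
            return fallback;
        }

        var dto = new ProductDto(
            Title: Get("title", ""),
            Price: Get("price", ""),
            Availability: Get("availability", "unknown"),
            Sku: Get("sku", ""));

        return (dto, missing);
    }
}
```

A non-empty missing list is a natural trigger for a warning log and, if it persists, for falling back to full HTML parsing.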

7. Async Batch Scraping with SemaphoreSlim

Task.WhenAll only awaits the tasks it is given; it does nothing to limit how many run at once. Without a concurrency cap, a list of 500 URLs will open 500 simultaneous connections, saturating your API quota and triggering rate limiting. SemaphoreSlim(n) limits in-flight requests to n at a time. Five is a reasonable starting point for OmniScrape; increase it based on your plan's rate limits.

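Always await all the way through the call chain. Calling .Result or .GetAwaiter().GetResult() blocks a thread: on classic ASP.NET it can deadlock on the synchronization context, and on ASP.NET Core it leads to thread-pool starvation under load. Pass the host's shutdown token (the stoppingToken given to BackgroundService.ExecuteAsync, or IHostApplicationLifetime.ApplicationStopping) so in-progress batches wind down cleanly on shutdown.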

BatchScrape.cs
csharp
var urls = new[]
{
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/product/3",
    // ...
};

using var sem = new SemaphoreSlim(initialCount: 5, maxCount: 5);

var tasks = urls.Select(async url =>
{
    await sem.WaitAsync(ct);
    try
    {
        var html = await omni.FetchHtmlAsync(url, ct: ct);
        return (url, html, error: (string?)null);
    }
    catch (Exception ex)
    {
        logger.LogWarning(ex, "Failed to fetch {Url}", url);
        return (url, html: (string?)null, error: ex.Message);
    }
    finally
    {
        sem.Release();
    }
});

var results = await Task.WhenAll(tasks);

var succeeded = results.Where(r => r.html is not null).ToList();
var failed    = results.Where(r => r.error is not null).ToList();

logger.LogInformation(
    "Batch complete: {Succeeded} succeeded, {Failed} failed",
    succeeded.Count, failed.Count);
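
To check that the cap actually holds, the same pattern can be exercised offline against a fake fetcher that records peak concurrency. This is a sketch under stated assumptions: Throttle.RunAsync and ConcurrencyProbe are hypothetical helpers, and the 50 ms delay merely simulates network latency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var probe = new ConcurrencyProbe();
var urls = Enumerable.Range(1, 20)
    .Select(i => $"https://example.com/product/{i}");

var pages = await Throttle.RunAsync(
    urls, maxConcurrency: 5, (url, token) => probe.FetchAsync(url, token));

Console.WriteLine($"{pages.Length} pages, peak concurrency {probe.Peak}");

public static class Throttle
{
    // Runs `work` over every item with at most `maxConcurrency` in flight.
    public static async Task<TResult[]> RunAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrency,
        Func<T, CancellationToken, Task<TResult>> work,
        CancellationToken ct = default)
    {
        using var sem = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        var tasks = items.Select(async item =>
        {
            await sem.WaitAsync(ct);
            try { return await work(item, ct); }
            finally { sem.Release(); }
        }).ToList(); // materialise so each task is created exactly once

        return await Task.WhenAll(tasks);
    }
}

// Fake fetcher that tracks how many calls overlap.
public sealed class ConcurrencyProbe
{
    private readonly object _gate = new();
    private int _inFlight;
    public int Peak { get; private set; }

    public async Task<string> FetchAsync(string url, CancellationToken ct)
    {
        lock (_gate) { _inFlight++; Peak = Math.Max(Peak, _inFlight); }
        try
        {
            await Task.Delay(50, ct); // simulated latency
            return $"<html>{url}</html>";
        }
        finally
        {
            lock (_gate) { _inFlight--; }
        }
    }
}
```

Task.WhenAll returns results in the order of the input sequence, so pages[i] corresponds to the i-th URL regardless of completion order.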

8. JavaScript-Rendered Pages and SPAs

AngleSharp parses static HTML — it does not execute JavaScript. React, Vue, and Blazor WebAssembly storefronts render their content client-side, so the raw HTML response contains only a shell div and script tags. Selectors against that HTML return null for every field.

Use mode: js_rendering to instruct OmniScrape to load the page in a headless Chromium instance. Set js_wait_selector to a CSS selector that appears only after the target content has rendered — this is more reliable than a fixed delay. js_wait_timeout is in milliseconds; 10 000 is a safe ceiling for most SPAs. See scraping JavaScript-rendered pages for detailed guidance on selector choice and session reuse.

JsRenderingScrape.cs
csharp
var payload = new
{
    url = "https://spa-store.com/category/laptops",
    mode = "js_rendering",
    output_format = "html",
    js_wait_selector = ".product-card",
    js_wait_timeout = 10_000,
    proxy = "residential:us",
};

using var response = await http.PostAsJsonAsync("v1/scrape", payload, ct);
response.EnsureSuccessStatusCode();

var body = await response.Content
    .ReadFromJsonAsync<ScrapeResponse>(cancellationToken: ct);

// Log which rendering path was actually used
logger.LogInformation(
    "method_used={Method} solver_used={Solver}",
    body!.Metadata.MethodUsed,
    body.Metadata.SolverUsed);

// HTML content is in data.content
var html = body.Data.Content;

9. HTTP Status and Error Handling

Distinguish transport-level HTTP errors from application-level scrape failures. EnsureSuccessStatusCode throws HttpRequestException for 4xx/5xx responses, but you also need to check body.Success for cases where the API returns 200 with a failure payload (e.g., the target site was unreachable).

Use Polly (via Microsoft.Extensions.Http.Polly) for retry logic on transient errors. Do not retry on 401 or 402 — those require operator intervention, not automatic retries.

  • 401 Unauthorized: API key missing or invalid; fix the Key Vault secret, do not retry automatically
  • 402 Payment Required: account balance exhausted; pause the hosted service and alert ops
  • 429 Too Many Requests: rate limit exceeded; apply Polly exponential backoff with jitter, respect the Retry-After header
  • 502 Bad Gateway: transient upstream error; retry up to three times with a short delay
  • body.Success == false with HTTP 200: the target URL was unreachable or returned an error; log and skip, do not feed into Polly
  • TaskCanceledException caused by a timeout: increase client.Timeout for js_rendering requests; they take longer than fast HTTP fetches
  • KeyNotFoundException on css_extracted: a selector stopped matching; alert and fall back to full HTML parsing

RetryPolicy.cs
csharp
// Polly retry policy registered at startup
// Requires: using Polly; using Polly.Extensions.Http; using System.Net;
builder.Services.AddHttpClient<OmniScrapeClient>()
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError() // 5xx, 408, and HttpRequestException
        .OrResult(r => r.StatusCode == HttpStatusCode.TooManyRequests)
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: (attempt, outcome, _) =>
            {
                // Honour Retry-After if present, else exponential backoff with jitter
                if (outcome.Result?.Headers.RetryAfter?.Delta is { } delta)
                    return delta;
                return TimeSpan.FromSeconds(Math.Pow(2, attempt))
                     + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 500));
            },
            // This overload expects an async callback, so return a completed task
            onRetryAsync: (outcome, delay, attempt, _) =>
            {
                // Log is assumed to be Serilog's static logger
                Log.Warning(
                    "Retry {Attempt} after {Delay}s: {Reason}",
                    attempt, delay.TotalSeconds,
                    outcome.Exception?.Message ?? outcome.Result?.StatusCode.ToString());
                return Task.CompletedTask;
            }));
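
For failures that reach application code after the retry policy has run, the status guidance in this section can be condensed into a single decision function. The sketch below is one possible encoding; FailureAction and FailurePolicy are hypothetical names, and treating every 5xx as retryable mirrors what HandleTransientHttpError does rather than anything OmniScrape specifies.

```csharp
using System;

var samples = new[] { (200, true), (200, false), (401, false), (429, false), (502, false) };
foreach (var (status, ok) in samples)
    Console.WriteLine($"{status} success={ok} -> {FailurePolicy.Classify(status, ok)}");

public enum FailureAction { None, Retry, AlertAndStop, LogAndSkip }

public static class FailurePolicy
{
    // statusCode: HTTP status of the API response.
    // bodySuccess: the `success` flag from the JSON body (false if unparsed).
    public static FailureAction Classify(int statusCode, bool bodySuccess) =>
        statusCode switch
        {
            401 or 402 => FailureAction.AlertAndStop, // needs operator action
            408 or 429 => FailureAction.Retry,        // back off, then retry
            >= 500 => FailureAction.Retry,            // transient upstream error
            >= 200 and < 300 when bodySuccess => FailureAction.None,
            _ => FailureAction.LogAndSkip,            // e.g. 200 with success=false
        };
}
```

Keeping the mapping in one place makes it easy to unit test and to adjust if a plan's rate-limit behaviour differs from these assumptions.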

Frequently asked questions

AngleSharp or Html Agility Pack — which should I use?

AngleSharp for any greenfield .NET 6+ project. It implements the WHATWG HTML5 specification, supports CSS selectors natively, and integrates well with LINQ. Html Agility Pack is more tolerant of severely broken HTML and has a longer history in .NET, making it a reasonable choice when you're parsing legacy intranet pages or documents that predate modern HTML standards. For public web scraping, AngleSharp's spec compliance is an advantage — it parses pages the same way Chrome does.

Why not use Playwright or Puppeteer Sharp instead of OmniScrape?

Playwright is the right tool for authenticated workflows you control — filling forms, clicking through multi-step checkouts, or testing your own application. For scraping public bot-protected pages at scale, maintaining a headless browser fleet against adaptive bot vendors is a significant operational burden: fingerprint rotation, proxy management, CAPTCHA solving, and browser version updates all require ongoing work. OmniScrape handles that infrastructure. Use Playwright for flows that require session state you manage; use OmniScrape for public pages that block datacenter IPs.

Is Azure Functions a good host for a scraping workload?

Yes for scheduled jobs and event-driven triggers. Use the isolated worker model (not in-process) and register IHttpClientFactory via dependency injection in Program.cs. Set the function timeout in host.json above the worst-case js_rendering response time — allow at least 90 seconds. For high-volume continuous scraping, a Worker Service on Container Apps or AKS gives more control over concurrency and scaling than the Consumption plan.

Where should I store the OmniScrape API key in a .NET project?

In development: dotnet user-secrets set OmniScrape:ApiKey your-key-here. In CI/CD: environment variables injected at build time, not committed to source. In production on Azure: Azure Key Vault referenced via managed identity — no credentials in appsettings.json or Dockerfile. Never commit API keys to git; rotate immediately if one is exposed.

When should I use mode fast versus mode auto?

Default to auto in production. The auto mode tries fast HTTP-only fetching first and escalates to a headless browser only when it detects a challenge or JavaScript requirement. This keeps costs low for pages that don't need rendering while handling protected pages transparently. Use fast explicitly only when you have confirmed the target never requires JavaScript and you want to enforce the lower-cost path. Inspect metadata.method_used in API responses to understand what each domain actually requires.

How do I handle pagination across hundreds of pages efficiently?

Build the URL list upfront if the pagination pattern is predictable (e.g., ?page=1 through ?page=200), then run the SemaphoreSlim batch pattern. For sites with cursor-based or next-link pagination, chain requests sequentially: parse the next-page link from each response before queuing the next request. Use session_id in the OmniScrape request body to reuse a browser session across paginated requests on JavaScript-heavy sites — this avoids repeated challenge solving and reduces latency.
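
For the next-link case, a sequential crawl might be sketched as follows. The li.next a selector matches books.toscrape.com and is an assumption for any other site; Paginator is a hypothetical helper, and the in-memory dictionary stands in for real fetches so the example runs offline.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using AngleSharp;

// In-memory stand-in for a three-page site.
var site = new Dictionary<string, string>
{
    ["https://example.com/catalogue/page-1.html"] =
        "<ul><li class='next'><a href='page-2.html'>next</a></li></ul>",
    ["https://example.com/catalogue/page-2.html"] =
        "<ul><li class='next'><a href='page-3.html'>next</a></li></ul>",
    ["https://example.com/catalogue/page-3.html"] = "<p>last page</p>",
};

var visited = await Paginator.FollowAsync(
    "https://example.com/catalogue/page-1.html",
    url => Task.FromResult(site[url]),
    maxPages: 50);

Console.WriteLine(string.Join(Environment.NewLine, visited));

public static class Paginator
{
    // Follows next-page links one at a time until none is found,
    // a URL repeats, or maxPages is reached.
    public static async Task<List<string>> FollowAsync(
        string startUrl, Func<string, Task<string>> fetch, int maxPages)
    {
        var context = BrowsingContext.New(Configuration.Default);
        var visited = new List<string>();
        var seen = new HashSet<string>();
        string? current = startUrl;

        while (current is not null && visited.Count < maxPages && seen.Add(current))
        {
            visited.Add(current);
            var html = await fetch(current);
            var document = await context.OpenAsync(req => req.Content(html));

            // Resolve a relative href against the page it was found on.
            var href = document.QuerySelector("li.next a")?.GetAttribute("href");
            current = href is null
                ? null
                : new Uri(new Uri(current), href).AbsoluteUri;
        }

        return visited;
    }
}
```

The seen set and the maxPages ceiling guard against pagination loops, which are more common than they should be on sites with malformed next links.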

How do I deserialize OmniScrape responses with System.Text.Json?

Use JsonPropertyName attributes or configure JsonSerializerOptions with JsonNamingPolicy.SnakeCaseLower (.NET 8+) to map snake_case JSON fields to PascalCase C# properties. The response shape is: body.success (bool), body.data.content (HTML string), body.data.css_extracted (Dictionary<string,string> when using css_extractor), and body.metadata.method_used (string). The correct field for HTML content is data.content — there is no data.html field in the OmniScrape response.
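
A minimal sketch of the naming-policy approach on .NET 8 or later follows. The JSON payload is a hand-written sample assembled from the field names listed above, plus solver_used and challenge_solved, which are assumed names; OmniJson is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hand-written sample payload, not captured from the live API.
const string json = """
    {
      "success": true,
      "data": {
        "content": "<html></html>",
        "css_extracted": { "title": "Widget" }
      },
      "metadata": {
        "method_used": "fast",
        "solver_used": false,
        "challenge_solved": false
      }
    }
    """;

var body = OmniJson.Parse(json);
Console.WriteLine(
    $"{body.Success} {body.Metadata?.MethodUsed} {body.Data?.CssExtracted?["title"]}");

public record ScrapeResponse(
    bool Success, ScrapeData? Data, ScrapeMetadata? Metadata, string? Error);
public record ScrapeData(string? Content, Dictionary<string, string>? CssExtracted);
public record ScrapeMetadata(string? MethodUsed, bool SolverUsed, bool ChallengeSolved);

public static class OmniJson
{
    // Reuse one options instance; the serializer caches type metadata on it.
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower,
    };

    public static ScrapeResponse Parse(string json) =>
        JsonSerializer.Deserialize<ScrapeResponse>(json, Options)
        ?? throw new JsonException("Response body was null");
}
```

The same options instance can be passed to ReadFromJsonAsync<ScrapeResponse>(options, ct) so HTTP responses are mapped the same way.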

Related guides

  • Web Scraping with Python
  • How to Bypass Cloudflare When Web Scraping
  • Scrape JavaScript-Rendered Pages: SPAs, Hydration, and Hidden APIs
  • Web Scraping API: Endpoint, Modes, Output Formats & Integration Patterns
