Web Scraping with Go (Golang)

Go scrapers compile to a single static binary — no virtualenv, no classpath, no gem conflicts. The deployment story is a single `COPY` in a Dockerfile, which teams running workers on Kubernetes or a $5 VPS find refreshing. `net/http` ships in the standard library; `goquery` adds jQuery-style CSS selectors on top of the `golang.org/x/net/html` parser. Together they cover the majority of open, static sites with minimal overhead.

What Go does not give you is a free pass past Cloudflare, Akamai, or DataDome. The same 403 you see in Python shows up here — bot-detection fingerprints the TLS handshake and HTTP/2 frame ordering, not the language. This guide covers idiomatic Go: `context` everywhere, `goquery` parsing, worker pools with goroutines, and POSTing to the OmniScrape API when direct HTTP fails. The Python guide mirrors the API integration side if your team splits stacks.

On this page

1. Module setup and dependencies
2. Fetch pages with net/http
3. Parse HTML with goquery
4. Concurrent scraping with a goroutine worker pool
5. OmniScrape API integration with net/http
6. OmniScrape call with resty
7. Colly for link discovery, OmniScrape for protected fetches
8. JavaScript-rendered pages with js_rendering mode
9. Error handling, retries, and observability
10. FAQ

1. Module setup and dependencies

Initialise a module, then pull in `goquery`. It transitively brings in `golang.org/x/net/html`, which is the actual HTML5 parser. `resty` is optional — it reduces JSON-API boilerplate but adds a dependency. Decide once per project and stay consistent.

Pin versions in `go.mod` and commit `go.sum` alongside it; `go.sum` records checksums, not the versions you build against. Resolving `@latest` in CI causes silent breakage when upstream tags a new release. Run `go mod tidy` after every dependency change to keep the graph clean.

terminal
bash
go mod init example.com/scraper
go get github.com/PuerkitoBio/goquery
# optional HTTP client sugar:
go get github.com/go-resty/resty/v2

2. Fetch pages with net/http

Always thread `context.Context` through every HTTP call. A scraper without cancellation leaks goroutines when a slow target stalls — the goroutine blocks on `resp.Body.Read` indefinitely. `context.WithTimeout` is the cheapest insurance you have.

Set a realistic `User-Agent`. Many CDNs reject requests carrying Go's default `Go-http-client/1.1` string (`Go-http-client/2.0` over HTTP/2) before even evaluating the page. A plausible browser UA does not defeat bot detection on its own, but it avoids trivial rejections on lightly-protected sites.

Read the full body with `io.ReadAll` and close the response body in a `defer`. Failing to drain and close the body prevents connection reuse in `http.DefaultClient`'s transport pool, which degrades throughput on high-volume crawls.

fetch.go
go
package main

import (
    "context"
    "io"
    "log"
    "net/http"
    "time"
)

func fetchPage(ctx context.Context, target string) ([]byte, error) {
    ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    req, err := http.NewRequestWithContext(ctx, "GET", target, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("User-Agent",
        "Mozilla/5.0 (compatible; GoScraper/1.0)")
    req.Header.Set("Accept-Language", "en-US,en;q=0.9")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }
    log.Printf("GET %s -> %d (%d bytes)", target, resp.StatusCode, len(body))
    return body, nil
}
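
`fetchPage` returns the body whatever the status code, so a 403 challenge page looks like success to the caller. The following is a minimal sketch of one way to tighten that, assuming you want non-2xx responses surfaced as errors and bodies capped at a fixed size. The names `statusError`, `checkStatus`, `readCapped`, and the 10 MiB cap are illustrative choices, not part of any library.

```go
package main

import (
	"fmt"
	"io"
)

// maxBodyBytes is an assumed cap; tune it to the largest page you expect.
const maxBodyBytes = 10 << 20 // 10 MiB

// statusError reports a non-2xx HTTP response.
type statusError struct {
	Code int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("unexpected HTTP status %d", e.Code)
}

// checkStatus returns nil for 2xx codes and a *statusError otherwise.
func checkStatus(code int) error {
	if code >= 200 && code < 300 {
		return nil
	}
	return &statusError{Code: code}
}

// readCapped reads at most limit bytes from r and fails if more remain,
// so one oversized response cannot exhaust memory.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	body, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) > limit {
		return nil, fmt.Errorf("body exceeds %d bytes", limit)
	}
	return body, nil
}
```

To use it in `fetchPage`, call `checkStatus(resp.StatusCode)` once `Do` returns and swap `io.ReadAll(resp.Body)` for `readCapped(resp.Body, maxBodyBytes)`.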

3. Parse HTML with goquery

`goquery.NewDocumentFromReader` accepts any `io.Reader`. Wrap the raw response bytes in `bytes.NewReader` and you have a parsed document ready for CSS selectors. `Find()` returns a `*Selection`; call `Each()` to iterate matched nodes.

`Text()` returns the concatenated text content of the node and all descendants — trim whitespace with `strings.TrimSpace`. `Attr()` returns the attribute value and a boolean indicating presence; always check the boolean when the attribute is optional.

Avoid deeply nested `Find` chains inside `Each` callbacks. Prefer a flat selector that targets the leaf element directly — it is both faster and easier to maintain when the site updates its markup.

parse.go
go
package main

import (
    "bytes"
    "log"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

type Book struct {
    Title string
    Price string
    URL   string
}

func parseBooks(body []byte) []Book {
    doc, err := goquery.NewDocumentFromReader(bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }

    var books []Book
    doc.Find("article.product_pod").Each(func(_ int, s *goquery.Selection) {
        title, _ := s.Find("h3 a").Attr("title")
        price := strings.TrimSpace(s.Find(".price_color").Text())
        href, _ := s.Find("h3 a").Attr("href")
        books = append(books, Book{
            Title: title,
            Price: price,
            URL:   "https://books.toscrape.com/catalogue/" + href,
        })
    })

    log.Printf("parsed %d books", len(books))
    return books
}
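
`parseBooks` builds each URL by prepending a fixed prefix to `href`, which only works while every link is relative to that one directory. A hedged alternative is to resolve each `href` against the URL of the page it came from, the way a browser does. The sketch below uses only `net/url`; `resolveHref` is a hypothetical helper name, and the page paths in the usage note are made-up examples.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveHref resolves href against the URL of the page it was found on,
// following the same rules a browser applies to relative links.
func resolveHref(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", fmt.Errorf("parse page URL: %w", err)
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", fmt.Errorf("parse href: %w", err)
	}
	return base.ResolveReference(ref).String(), nil
}
```

For example, `resolveHref("https://books.toscrape.com/catalogue/page-2.html", "some-book_1/index.html")` resolves inside `/catalogue/`, while the same call with an `href` of `/index.html` resolves to the site root. Absolute `href` values pass through unchanged.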

4. Concurrent scraping with a goroutine worker pool

Fan out requests across a fixed number of goroutines using buffered channels. The `jobs` channel carries URLs; the `results` channel carries structured output. A `sync.WaitGroup` is not needed here because the result count equals the job count — reading `len(urls)` results from the channel is sufficient synchronisation.

Cap workers at 5–10 for external targets unless you have confirmed rate-limit headroom. More goroutines do not mean more throughput when the bottleneck is the remote server or your egress bandwidth. For OmniScrape calls the practical limit is your account's concurrent-request allowance.

Pass the API key via environment variable — never hardcode credentials. `os.Getenv` at startup, fail fast if empty.

pool.go
go
package main

import (
    "context"
    "log"
    "os"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

type Result struct {
    URL  string
    HTML string
    Err  error
}

func worker(
    ctx context.Context,
    jobs <-chan string,
    results chan<- Result,
    apiKey string,
) {
    for url := range jobs {
        html, err := fetchOmniScrape(ctx, apiKey, url)
        results <- Result{URL: url, HTML: html, Err: err}
    }
}

func main() {
    apiKey := os.Getenv("OMNISCRAPE_KEY")
    if apiKey == "" {
        log.Fatal("OMNISCRAPE_KEY not set")
    }

    urls := []string{
        "https://example.com/product/1",
        "https://example.com/product/2",
        "https://example.com/product/3",
    }

    jobs := make(chan string, len(urls))
    results := make(chan Result, len(urls))

    ctx := context.Background()
    const workers = 5
    for w := 0; w < workers; w++ {
        go worker(ctx, jobs, results, apiKey)
    }

    for _, u := range urls {
        jobs <- u
    }
    close(jobs)

    for range urls {
        r := <-results
        if r.Err != nil {
            log.Printf("FAIL %s: %v", r.URL, r.Err)
            continue
        }
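        doc, err := goquery.NewDocumentFromReader(strings.NewReader(r.HTML))
        if err != nil {
            // Without this check a nil doc would panic on doc.Find below.
            log.Printf("FAIL %s: parse: %v", r.URL, err)
            continue
        }
        log.Printf("OK   %s | h1: %q", r.URL, strings.TrimSpace(doc.Find("h1").First().Text()))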
    }
}
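
The pool above relies on every job producing exactly one result. Once workers are allowed to stop early on context cancellation, that invariant no longer holds, and a `sync.WaitGroup` that closes the `results` channel becomes the simpler way to end the consumer loop. The following is a sketch of that variant under those assumptions. `PageResult`, `fetchFunc`, `runPool`, and `countOK` are illustrative names, and the stub fetcher stands in for a real fetch such as `fetchOmniScrape`.

```go
package main

import (
	"context"
	"sync"
)

// PageResult mirrors the Result struct used in pool.go.
type PageResult struct {
	URL  string
	HTML string
	Err  error
}

// fetchFunc is whatever performs the fetch for one URL.
type fetchFunc func(ctx context.Context, url string) (string, error)

// runPool fans urls out to a fixed number of workers and returns a channel
// that is closed once every worker has exited.
func runPool(ctx context.Context, workers int, urls []string, fetch fetchFunc) <-chan PageResult {
	jobs := make(chan string)
	results := make(chan PageResult)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				html, err := fetch(ctx, u)
				select {
				case results <- PageResult{URL: u, HTML: html, Err: err}:
				case <-ctx.Done():
					return // consumer may be gone; do not block forever
				}
			}
		}()
	}

	// Feed jobs until they run out or the context is cancelled.
	go func() {
		defer close(jobs)
		for _, u := range urls {
			select {
			case jobs <- u:
			case <-ctx.Done():
				return
			}
		}
	}()

	// Close results after all workers finish so a range loop over it ends.
	go func() {
		wg.Wait()
		close(results)
	}()

	return results
}

// countOK is a usage example: it runs the pool with a stub fetcher and
// counts the results that came back without an error.
func countOK(urls []string) int {
	stub := func(_ context.Context, u string) (string, error) {
		return "<h1>" + u + "</h1>", nil
	}
	n := 0
	for r := range runPool(context.Background(), 3, urls, stub) {
		if r.Err == nil {
			n++
		}
	}
	return n
}
```

Because the consumer ranges over a channel that closes itself, each result can be processed and discarded as it arrives, and a cancelled context drains the pool instead of leaving goroutines blocked on a send.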

5. OmniScrape API integration with net/http

Marshal a request struct to JSON, POST it to `https://api.omniscrape.io/v1/scrape` with your `X-API-Key` header, then decode the response. On bot-protected retail, news, or travel sites this replaces your direct `GET` entirely — the API handles TLS fingerprinting, browser emulation, and CAPTCHA solving upstream.

The response HTML lives in `data.content`. Check `success` before accessing `data`: the API can return a structured error body even on HTTP 200, so a response that decodes cleanly does not imply the scrape succeeded. Log `metadata.method_used` to understand whether the API escalated to a headless browser; that affects your billing and latency expectations.

For deeper context on what happens when the API encounters a Cloudflare challenge, see Cloudflare bypass.

omniscrape.go
go
package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
)

type ScrapeRequest struct {
    URL          string `json:"url"`
    Mode         string `json:"mode"`
    OutputFormat string `json:"output_format"`
    EnableSolver bool   `json:"enable_solver,omitempty"`
}

type ScrapeResponse struct {
    Success bool `json:"success"`
    Data    struct {
        Content string `json:"content"`
    } `json:"data"`
    Metadata struct {
        MethodUsed      string `json:"method_used"`
        SolverUsed      bool   `json:"solver_used"`
        ChallengeSolved bool   `json:"challenge_solved"`
    } `json:"metadata"`
    Billing struct {
        Charged      float64 `json:"charged"`
        BalanceAfter float64 `json:"balance_after"`
    } `json:"billing"`
    Error string `json:"error,omitempty"`
}

func fetchOmniScrape(ctx context.Context, apiKey, target string) (string, error) {
    payload, err := json.Marshal(ScrapeRequest{
        URL:          target,
        Mode:         "auto",
        OutputFormat: "html",
        EnableSolver: true,
    })
    if err != nil {
        return "", err
    }

    req, err := http.NewRequestWithContext(ctx, "POST",
        "https://api.omniscrape.io/v1/scrape", bytes.NewReader(payload))
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-API-Key", apiKey)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    var out ScrapeResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return "", fmt.Errorf("decode error: %w", err)
    }
    if !out.Success {
        return "", fmt.Errorf("scrape failed for %s: %s", target, out.Error)
    }

    // HTML is in out.Data.Content
    return out.Data.Content, nil
}
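
`fetchOmniScrape` reports a decode failure and a `success:false` body through the same plain error, which makes it hard for callers to treat them differently. A small sketch of how the two cases could be separated, assuming the response fields shown above. `scrapeEnvelope`, `errScrapeFailed`, and `decodeScrape` are hypothetical names, and the JSON bodies in the usage note are hand-written samples rather than captured API output.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// scrapeEnvelope declares only the response fields this sketch reads.
type scrapeEnvelope struct {
	Success bool `json:"success"`
	Data    struct {
		Content string `json:"content"`
	} `json:"data"`
	Metadata struct {
		MethodUsed string `json:"method_used"`
	} `json:"metadata"`
	Error string `json:"error"`
}

// errScrapeFailed marks a body that decoded cleanly but said success:false.
var errScrapeFailed = errors.New("scrape reported failure")

// decodeScrape returns the HTML and the method used. A success:false body
// yields an error wrapping errScrapeFailed; malformed JSON yields a decode
// error that does not wrap it.
func decodeScrape(raw []byte) (html, method string, err error) {
	var env scrapeEnvelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", "", fmt.Errorf("decode error: %w", err)
	}
	if !env.Success {
		return "", "", fmt.Errorf("%w: %s", errScrapeFailed, env.Error)
	}
	return env.Data.Content, env.Metadata.MethodUsed, nil
}
```

A caller can then use `errors.Is(err, errScrapeFailed)` to send `success:false` URLs to a dead-letter queue while treating decode errors as a transport-level problem. For instance, a sample body of `{"success":false,"error":"blocked"}` produces an error for which `errors.Is` returns true.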

6. OmniScrape call with resty

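resty reduces JSON API boilerplate: set the base URL, default headers, and timeout once on the client, then reuse it across all requests. `SetResult` unmarshals the response body directly into your struct, so no manual `json.NewDecoder` call is needed. It does so only for 2xx responses; on any other status the struct stays zero-valued, so check the response status before trusting `Success`.

The `css_extractor` output format lets the API extract structured data server-side, returning a `css_extracted` map instead of raw HTML. This is more efficient than fetching full HTML and parsing locally when you only need a handful of fields. The `ScrapeResponse` struct defined in omniscrape.go has no field for that map, so add one that matches the response shape in the API reference before reading it.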

resty.go
go
package main

import (
    "fmt"
    "log"
    "os"
    "time"

    "github.com/go-resty/resty/v2"
)

// omniClient is built once and reused so every call shares one connection pool.
var omniClient = resty.New().
    SetBaseURL("https://api.omniscrape.io").
    SetHeader("X-API-Key", os.Getenv("OMNISCRAPE_KEY")).
    SetTimeout(2 * time.Minute)

func fetchWithResty(target string) (*ScrapeResponse, error) {
    var result ScrapeResponse
    resp, err := omniClient.R().
        SetBody(map[string]any{
            "url":           target,
            "mode":          "auto",
            "output_format": "css_extractor",
            "enable_solver": true,
            "proxy":         "residential:us",
            "css_selectors": map[string]string{
                "title":       "h1",
                "price":       "[data-price]",
                "description": ".product-description p",
            },
        }).
        SetResult(&result).
        Post("/v1/scrape")
    if err != nil {
        return nil, err
    }
    // SetResult only fills result on a 2xx response, so check the status first.
    if resp.IsError() {
        return nil, fmt.Errorf("scrape failed: HTTP %d", resp.StatusCode())
    }
    if !result.Success {
        return nil, fmt.Errorf("scrape failed: %s", result.Error)
    }
    log.Printf("method_used=%s charged=%.4f",
        result.Metadata.MethodUsed, result.Billing.Charged)
    return &result, nil
}

7. Colly for link discovery, OmniScrape for protected fetches

Colly is well-suited for crawling link graphs on open sites — its `OnHTML` callbacks, politeness delays, and revisit tracking save real implementation time. Use it to discover product, article, or listing URLs from sitemaps and paginated indexes.

On protected detail pages, do not expect Colly's default HTTP client to survive Akamai, PerimeterX, or Cloudflare Bot Management. The fingerprint is wrong at the TLS layer before any application-level header is evaluated. The right pattern: use Colly to collect URLs, then feed those URLs to `fetchOmniScrape` — either directly or via the worker pool above.

You can hook OmniScrape into Colly's custom transport by implementing `http.RoundTripper`, but it is simpler to keep the two concerns separate: Colly owns crawl state and URL deduplication; OmniScrape owns the actual fetch for protected targets.

8. JavaScript-rendered pages with js_rendering mode

goquery operates on the raw HTML returned by the server. Single-page applications that render content client-side via React, Vue, or similar frameworks return a near-empty HTML shell, so goquery will find nothing useful. For these targets, use `mode: js_rendering`, which runs a headless browser upstream.

Pair `js_rendering` with `js_wait_selector` to tell the browser to wait until a specific element is present in the DOM before capturing the page. Without it, the snapshot may be taken before the async data fetch completes. `js_wait_timeout` sets the maximum wait in milliseconds before the API gives up and returns whatever is rendered so far.

Full walkthrough with pagination and infinite scroll: scraping JavaScript-rendered pages.

js_rendering.go
go
package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

func fetchSPA(ctx context.Context, target string) (string, error) {
    payload, err := json.Marshal(map[string]any{
        "url":              target,
        "mode":             "js_rendering",
        "output_format":    "html",
        "js_wait_selector": ".product-card",
        "js_wait_timeout":  10000,
        "enable_solver":    true,
    })
    if err != nil {
        return "", err
    }

    // A nil req would panic on Header.Set, so do not discard this error.
    req, err := http.NewRequestWithContext(ctx, "POST",
        "https://api.omniscrape.io/v1/scrape", bytes.NewReader(payload))
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-API-Key", os.Getenv("OMNISCRAPE_KEY"))

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    var out ScrapeResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return "", fmt.Errorf("decode error: %w", err)
    }
    if !out.Success {
        return "", fmt.Errorf("js_rendering failed: %s", out.Error)
    }
    // HTML content is in out.Data.Content
    return out.Data.Content, nil
}

9. Error handling, retries, and observability

A production scraper needs more than a single retry loop. Structure your error handling around the response characteristics so you spend credits only on requests that have a realistic chance of succeeding on retry.

Key rules for OmniScrape error handling:

  • Propagate context cancellation immediately — if the parent context is done, stop retrying and return the context error
  • Retry HTTP 502/503/504 with exponential backoff and ±20% jitter; cap at 3 attempts
  • Never retry 401 (bad API key) or 402 (insufficient credits) — these require operator intervention
  • Check success:false in the response body even on HTTP 200 — the API returns structured errors this way
  • Log metadata.method_used and billing.charged per request; aggregate in your metrics system to track cost per domain
  • Route success:false URLs to a dead-letter file or queue for offline inspection and manual replay
  • Set a custom http.Transport with MaxIdleConnsPerHost tuned to your worker count to avoid connection exhaustion on high-concurrency runs
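
The retry rules above can be sketched roughly as follows. This is one possible shape, not the API's prescribed client: `retryable`, `backoff`, `withRetry`, `errGiveUp`, and `demoRetry` are illustrative names, the `do` callback stands in for your real request function, and transport errors are deliberately left unretried to keep the example short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errGiveUp is returned when every attempt ended in a retryable status.
var errGiveUp = errors.New("retries exhausted")

// retryable reports whether a status is worth another attempt: 502, 503 and
// 504 are transient, while 401 and 402 need an operator and are never retried.
func retryable(status int) bool {
	switch status {
	case 502, 503, 504:
		return true
	}
	return false
}

// backoff returns base * 2^attempt scaled by a random factor in [0.8, 1.2),
// i.e. exponential backoff with ±20% jitter. attempt starts at 0.
func backoff(base time.Duration, attempt int) time.Duration {
	d := base << uint(attempt)
	return time.Duration(float64(d) * (0.8 + 0.4*rand.Float64()))
}

// withRetry calls do up to maxAttempts times. do reports the HTTP status it
// received, or a non-nil error if the request never completed.
func withRetry(ctx context.Context, maxAttempts int, base time.Duration,
	do func(context.Context) (int, error)) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		status, err := do(ctx)
		if err != nil {
			return err // includes context cancellation; not retried in this sketch
		}
		if status >= 200 && status < 300 {
			return nil
		}
		if !retryable(status) {
			return fmt.Errorf("HTTP %d: not retryable", status)
		}
		if attempt == maxAttempts-1 {
			break // skip the sleep after the final attempt
		}
		timer := time.NewTimer(backoff(base, attempt))
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err() // parent is done: stop retrying immediately
		case <-timer.C:
		}
	}
	return errGiveUp
}

// demoRetry is a usage example: a stub that returns 503 twice, then 200.
func demoRetry() (calls int, err error) {
	err = withRetry(context.Background(), 3, time.Millisecond,
		func(context.Context) (int, error) {
			calls++
			if calls < 3 {
				return 503, nil
			}
			return 200, nil
		})
	return calls, err
}
```

In a real scraper the base delay would more plausibly be a few hundred milliseconds, and a `success:false` body on HTTP 200 would still need its own check after `withRetry` returns nil.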

Frequently asked questions

Should I use net/http or resty for OmniScrape calls?

net/http keeps your dependency graph minimal and is the right choice for dedicated scraper binaries or CLI tools. resty is ergonomic when you are already using it elsewhere in the codebase and want to avoid manual JSON encode/decode boilerplate. Both work identically against the OmniScrape API — the choice is a style preference, not a correctness issue.

Should I use Colly for everything?

Use Colly for crawling open sites where you need link graph traversal, politeness delays, and URL deduplication out of the box. For protected pages — anything behind Cloudflare, Akamai, or PerimeterX — route fetches through OmniScrape regardless of which crawl framework you use. Colly's HTTP client cannot survive modern bot management at the TLS fingerprint level.

How do I limit memory when scraping large pages?

If you only need structured fields, use output_format: css_extractor instead of html. The API extracts data server-side and returns a small JSON map — you never allocate the full HTML string in your process. For HTML output, avoid accumulating all pages in a slice; process and discard each result as it arrives from the results channel.

When should I use http.DefaultClient vs a custom Transport?

http.DefaultClient is fine for low-concurrency scrapers. When running 20+ concurrent OmniScrape calls from one process, create a custom http.Transport with MaxIdleConnsPerHost set to your worker count. Without this, the default limit of 2 idle connections per host causes excessive TCP connection churn and adds measurable latency.
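
A minimal sketch of such a client, assuming the default transport has not been replaced elsewhere in the process. `newScrapeTransport` and `newScrapeClient` are hypothetical helper names, and the two-minute timeout is an assumed value borrowed from the resty example.

```go
package main

import (
	"net/http"
	"time"
)

// newScrapeTransport clones the default transport and sizes its idle pool so
// each worker can keep one warm connection to the API host.
func newScrapeTransport(workers int) *http.Transport {
	// Clone keeps DefaultTransport's proxy, dial, and TLS settings.
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.MaxIdleConnsPerHost = workers
	// MaxIdleConns of 0 means unlimited; otherwise the global cap must not
	// undercut the per-host cap.
	if tr.MaxIdleConns != 0 && tr.MaxIdleConns < workers {
		tr.MaxIdleConns = workers
	}
	return tr
}

// newScrapeClient wraps the tuned transport in a client with an overall timeout.
func newScrapeClient(workers int) *http.Client {
	return &http.Client{
		Transport: newScrapeTransport(workers),
		Timeout:   2 * time.Minute, // assumed; pick what fits your slowest mode
	}
}
```

Build the client once at startup and pass it to every worker; creating a client per request would defeat the connection pool it is meant to tune.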

What is the difference between mode auto and mode fast?

mode auto is the default and preferred choice — it tries a lightweight HTTP fetch first and escalates to a headless browser only if the response indicates a challenge or empty render. mode fast skips the escalation logic entirely and returns whatever the HTTP response contains. Use fast only when you have confirmed the target is static and unprotected, and you want to minimise latency and cost.

How do I handle pagination in a Go scraper?

For simple numeric pagination, generate URLs in a loop and push them into the jobs channel. For cursor- or token-based pagination, process each result synchronously — extract the next-page token from the HTML using goquery, then enqueue the next URL. Avoid recursion; use an explicit queue (a slice or channel) to track pending pages and a visited map to prevent cycles.
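
The explicit queue and visited map can be sketched as below. `crawlPages` is an illustrative name, and the `next` callback is a stand-in for whatever fetches a page and extracts its follow-up links (for example with goquery); neither comes from a library.

```go
package main

// crawlPages walks pages breadth-first starting at start. next is called once
// per visited page and returns the URLs that page links to. The walk stops
// when the queue is empty or maxPages pages have been visited, and returns
// the pages in visit order.
func crawlPages(start string, maxPages int, next func(pageURL string) []string) []string {
	queue := []string{start}
	visited := map[string]bool{start: true}
	var order []string

	for len(queue) > 0 && len(order) < maxPages {
		page := queue[0]
		queue = queue[1:]
		order = append(order, page)

		for _, u := range next(page) {
			if !visited[u] {
				visited[u] = true // mark on enqueue so a URL is queued only once
				queue = append(queue, u)
			}
		}
	}
	return order
}
```

Marking URLs as visited when they are enqueued, rather than when they are processed, is what keeps a page that links back to an earlier one from looping. The `maxPages` cap is a second safety net for sites that generate unbounded pagination.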

Can I use session_id to maintain state across requests?

Yes. Pass a session_id string in the request body. The OmniScrape API will reuse the same browser context for subsequent requests with the same session ID, preserving cookies and local storage. This is useful for sites that require a login flow before reaching the target page. Generate a unique session ID per scrape job, not per request.
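
As a rough illustration of one session ID per job, the sketch below generates a random ID once and reuses it for every request body in that job. It assumes the API accepts any opaque string as `session_id`; `sessionRequest`, `newSessionID`, and `jobPayloads` are hypothetical names, and the other fields mirror the request struct used earlier in this guide.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
)

// sessionRequest adds session_id to the request fields used earlier.
type sessionRequest struct {
	URL          string `json:"url"`
	Mode         string `json:"mode"`
	OutputFormat string `json:"output_format"`
	SessionID    string `json:"session_id"`
}

// newSessionID returns 16 random bytes encoded as 32 hex characters.
func newSessionID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// jobPayloads builds one JSON body per URL, all sharing a single session ID
// so the requests of one job can reuse the same upstream state.
func jobPayloads(urls []string) ([][]byte, error) {
	sid, err := newSessionID()
	if err != nil {
		return nil, err
	}
	payloads := make([][]byte, 0, len(urls))
	for _, u := range urls {
		p, err := json.Marshal(sessionRequest{
			URL:          u,
			Mode:         "auto",
			OutputFormat: "html",
			SessionID:    sid,
		})
		if err != nil {
			return nil, err
		}
		payloads = append(payloads, p)
	}
	return payloads, nil
}
```

Requests that share a session generally need to run in order (login first, target page second), so send a job's payloads sequentially rather than through the concurrent pool.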

Related guides

  • Web Scraping with Python
  • How to Bypass Cloudflare When Web Scraping
  • Scrape JavaScript-Rendered Pages: SPAs, Hydration, and Hidden APIs
  • Web Scraping API: Endpoint, Modes, Output Formats & Integration Patterns
