cURL Integration

Use ScrapeHub directly from the command line

Basic Scrape Request

Terminal
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "format": "json"
  }'

Advanced Configuration

With JavaScript Rendering

Terminal
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "render_js": true,
    "wait_for": ".product-list",
    "format": "json"
  }'

With Custom Headers

Terminal
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "headers": {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
      "Accept-Language": "en-US,en;q=0.9",
      "Referer": "https://example.com"
    }
  }'

With Pagination

Terminal
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "pagination": {
      "enabled": true,
      "max_pages": 10,
      "selector": "a.next-page"
    }
  }'

Creating Async Jobs

Terminal
# Create job
curl -X POST https://api.scrapehub.io/v4/jobs \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/large-dataset",
    "engine": "neural-x1"
  }'

# Response:
# {
#   "job_id": "job_abc123xyz",
#   "status": "pending",
#   "created_at": "2026-01-27T10:30:00Z"
# }
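For scripting, the job_id from the response can be captured with jq so it can be reused in the status and export calls below. The RESPONSE variable here stands in for the output of the curl call above; in practice you would pipe the curl output directly into jq.

```shell
# Sample response inlined in place of the live curl output above
RESPONSE='{"job_id":"job_abc123xyz","status":"pending","created_at":"2026-01-27T10:30:00Z"}'

# Pull out the job_id field for later use
JOB_ID=$(echo "$RESPONSE" | jq -r '.job_id')
echo "Created job: $JOB_ID"
```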

Checking Job Status

Terminal
curl -X GET https://api.scrapehub.io/v4/jobs/job_abc123xyz \
  -H "X-API-KEY: sk_live_xxxx_449x"
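A simple polling loop can wait for an async job to finish. This is a sketch: it assumes the status endpoint returns the same "status" field shown in the create-job response, and that a finished job reports something other than "pending" (e.g. "completed" or "failed" — those values are assumptions, not documented here).

```shell
# Poll a job's status every 10 seconds until it is no longer "pending".
# Assumes SCRAPEHUB_API_KEY is set in the environment.
poll_job() {
  local job_id=$1
  local status="pending"
  while [ "$status" = "pending" ]; do
    status=$(curl -s "https://api.scrapehub.io/v4/jobs/$job_id" \
      -H "X-API-KEY: $SCRAPEHUB_API_KEY" | jq -r '.status')
    if [ "$status" = "pending" ]; then
      sleep 10
    fi
  done
  echo "$status"
}

# Usage:
# poll_job job_abc123xyz
```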

Exporting Results

Export as JSON

Terminal
curl -X GET "https://api.scrapehub.io/v4/jobs/job_abc123xyz/export?format=json" \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -o results.json

Export as CSV

Terminal
curl -X GET "https://api.scrapehub.io/v4/jobs/job_abc123xyz/export?format=csv" \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -o results.csv

Export as Compressed JSON

Terminal
curl -X GET "https://api.scrapehub.io/v4/jobs/job_abc123xyz/export?format=json&compress=gzip" \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -o results.json.gz

Pretty Print JSON

Terminal
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1"
  }' | jq '.'

Using Variables

Terminal
# Store API key as environment variable
export SCRAPEHUB_API_KEY="sk_live_xxxx_449x"

# Use in requests
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: $SCRAPEHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1"
  }'

Batch Processing

batch_scrape.sh
Terminal
#!/bin/bash
API_KEY="sk_live_xxxx_449x"

URLS=(
  "https://example.com/category/1"
  "https://example.com/category/2"
  "https://example.com/category/3"
)

for url in "${URLS[@]}"; do
  echo "Scraping: $url"
  curl -X POST https://api.scrapehub.io/v4/scrape \
    -H "X-API-KEY: $API_KEY" \
    -H "Content-Type: application/json" \
    -d "{ \"target\": \"$url\", \"engine\": \"neural-x1\" }" \
    > "results_$(basename "$url").json"
  echo "Saved results for $url"
done

Error Handling

Terminal
# Capture HTTP status code
HTTP_STATUS=$(curl -s -o response.json -w "%{http_code}" \
  -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1"
  }')

if [ "$HTTP_STATUS" -eq 200 ]; then
  echo "Success! Results saved to response.json"
  jq '.' response.json
elif [ "$HTTP_STATUS" -eq 401 ]; then
  echo "Error: Invalid API key"
elif [ "$HTTP_STATUS" -eq 429 ]; then
  echo "Error: Rate limit exceeded"
else
  echo "Error: HTTP $HTTP_STATUS"
  cat response.json
fi
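For transient 429 responses, the status check above can be extended into a retry loop with exponential backoff. This is a sketch, not part of the ScrapeHub API itself: the function name, attempt count, and delays are arbitrary choices, and the endpoint and headers mirror the examples above.

```shell
# Retry a scrape request with exponential backoff when rate-limited (HTTP 429).
# Assumes SCRAPEHUB_API_KEY is set in the environment.
scrape_with_retry() {
  local url=$1
  local attempt=1 max_attempts=5 delay=2
  local status
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(curl -s -o response.json -w "%{http_code}" \
      -X POST https://api.scrapehub.io/v4/scrape \
      -H "X-API-KEY: $SCRAPEHUB_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{ \"target\": \"$url\", \"engine\": \"neural-x1\" }")
    if [ "$status" -ne 429 ]; then
      echo "$status"
      return 0
    fi
    echo "Rate limited (attempt $attempt); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  echo "$status"
  return 1
}

# Usage:
# scrape_with_retry "https://example.com/products"
```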

Webhook Integration

Terminal
curl -X POST https://api.scrapehub.io/v4/jobs \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "webhook_url": "https://your-server.com/webhook"
  }'

Pro Tips

  • Use the -s flag for silent mode (no progress bar)
  • Use the -v flag for verbose output when debugging
  • Pipe results through jq for readable JSON formatting
  • Store API keys in environment variables rather than hard-coding them
  • Use -o filename to save the response to a file