Webhooks

Receive real-time notifications when scraping jobs complete

About Webhooks

Webhooks allow you to receive HTTP callbacks when events occur in ScrapeHub. Instead of polling for job completion, ScrapeHub notifies your server automatically, which cuts unnecessary API traffic and delivers results as soon as they are ready.

Setting Up Webhooks

1. Configure Webhook URL

Set up your webhook endpoint in the ScrapeHub dashboard:

  1. Navigate to Settings → Webhooks
  2. Click "Add Webhook"
  3. Enter your endpoint URL (must be HTTPS in production)
  4. Select which events to subscribe to
  5. Save your webhook configuration

2. Verify Webhook Signature

ScrapeHub signs all webhook requests with a secret key. Always verify the signature to ensure requests are authentic:

webhook_verification.py
python
import hashlib
import hmac
import os

from flask import Flask, request, abort

app = Flask(__name__)
# Load the secret from the environment; never hard-code it (see Important Notes)
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Get signature from header
    signature = request.headers.get('X-ScrapeHub-Signature')
    if not signature:
        abort(401)

    # Compute expected signature over the raw request body
    payload = request.get_data()
    expected_signature = hmac.new(
        WEBHOOK_SECRET.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison guards against timing attacks
    if not hmac.compare_digest(signature, expected_signature):
        abort(401)

    # Process webhook
    event = request.json
    handle_event(event)

    return {'status': 'success'}, 200

def handle_event(event):
    event_type = event['type']

    if event_type == 'scrape.completed':
        handle_scrape_completed(event['data'])
    elif event_type == 'scrape.failed':
        handle_scrape_failed(event['data'])
    elif event_type == 'batch.completed':
        handle_batch_completed(event['data'])

def handle_scrape_completed(data):
    job_id = data['job_id']
    result = data['result']
    print(f"Job {job_id} completed successfully")
    # Store result in database, notify users, etc.

def handle_scrape_failed(data):
    print(f"Job {data['job_id']} failed: {data['error']['message']}")

def handle_batch_completed(data):
    print(f"Batch {data['batch_id']} completed: {data['successful']} succeeded")

if __name__ == '__main__':
    app.run(port=8080)

Node.js/Express Example

webhook_server.js
javascript
const express = require('express');
const crypto = require('crypto');

const app = express();
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;

// Raw body needed for signature verification
app.use(express.json({
  verify: (req, res, buf) => {
    req.rawBody = buf.toString('utf8');
  }
}));

app.post('/webhook', (req, res) => {
  const signature = req.headers['x-scrapehub-signature'];

  if (!signature) {
    return res.status(401).send('Missing signature');
  }

  // Verify signature with a constant-time comparison
  const expectedSignature = crypto
    .createHmac('sha256', WEBHOOK_SECRET)
    .update(req.rawBody)
    .digest('hex');

  // timingSafeEqual throws if the buffers differ in length,
  // so check lengths first to avoid crashing on malformed input
  const sigBuf = Buffer.from(signature);
  const expectedBuf = Buffer.from(expectedSignature);
  if (
    sigBuf.length !== expectedBuf.length ||
    !crypto.timingSafeEqual(sigBuf, expectedBuf)
  ) {
    return res.status(401).send('Invalid signature');
  }

  // Process webhook
  const event = req.body;
  handleEvent(event);

  res.json({ status: 'success' });
});

function handleEvent(event) {
  switch (event.type) {
    case 'scrape.completed':
      handleScrapeCompleted(event.data);
      break;
    case 'scrape.failed':
      handleScrapeFailed(event.data);
      break;
    case 'batch.completed':
      handleBatchCompleted(event.data);
      break;
  }
}

function handleScrapeCompleted(data) {
  console.log(`Job ${data.job_id} completed successfully`);
  // Process the result
}

function handleScrapeFailed(data) {
  console.error(`Job ${data.job_id} failed: ${data.error.message}`);
}

function handleBatchCompleted(data) {
  console.log(`Batch ${data.batch_id} completed: ${data.successful} succeeded`);
}

app.listen(8080, () => {
  console.log('Webhook server listening on port 8080');
});

Webhook Events

scrape.completed

Triggered when a scraping job completes successfully

{
  "type": "scrape.completed",
  "id": "evt_1234567890",
  "created_at": "2026-01-27T10:30:00Z",
  "data": {
    "job_id": "job_abc123",
    "url": "https://example.com",
    "engine": "neural-x1",
    "status": "completed",
    "result": {
      "data": { ... },
      "metadata": {
        "response_time": 1250,
        "status_code": 200
      }
    }
  }
}

scrape.failed

Triggered when a scraping job fails

{
  "type": "scrape.failed",
  "id": "evt_1234567891",
  "created_at": "2026-01-27T10:35:00Z",
  "data": {
    "job_id": "job_abc124",
    "url": "https://example.com",
    "engine": "neural-x1",
    "status": "failed",
    "error": {
      "code": "timeout",
      "message": "Request timed out after 30 seconds"
    }
  }
}
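
A scrape.failed handler will often want to resubmit transient failures such as timeouts. Here is a minimal sketch that resubmits through the documented scrape_async method; the RETRYABLE_CODES set, MAX_ATTEMPTS limit, and in-memory attempts counter are illustrative assumptions, not part of the API:

retry_failed_jobs.py
python
from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Illustrative in-memory attempt counter; use persistent storage in production
attempts = {}
MAX_ATTEMPTS = 3
RETRYABLE_CODES = {"timeout"}  # assumption: only timeouts are worth retrying

def handle_scrape_failed(data):
    job_id = data["job_id"]
    error = data["error"]

    # Only resubmit errors that are likely transient
    if error["code"] not in RETRYABLE_CODES:
        print(f"Job {job_id} failed permanently: {error['message']}")
        return

    attempts[job_id] = attempts.get(job_id, 0) + 1
    if attempts[job_id] >= MAX_ATTEMPTS:
        print(f"Giving up on job {job_id} after {MAX_ATTEMPTS} attempts")
        return

    # Resubmit with the same parameters and the same webhook
    client.scrape_async(
        url=data["url"],
        engine=data["engine"],
        webhook_url="https://your-server.com/webhook"
    )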

batch.completed

Triggered when a batch scraping job completes (all URLs processed)

{
  "type": "batch.completed",
  "id": "evt_1234567892",
  "created_at": "2026-01-27T10:40:00Z",
  "data": {
    "batch_id": "batch_xyz789",
    "total_jobs": 100,
    "successful": 95,
    "failed": 5,
    "results_url": "https://api.scrapehub.io/v4/batches/batch_xyz789/results"
  }
}
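
The results_url in a batch.completed event points at the batch's aggregated results. A minimal sketch of fetching them; it assumes the endpoint accepts the same API-key authentication as the rest of the API (the Bearer header shown here is an assumption):

fetch_batch_results.py
python
import os

import requests

API_KEY = os.environ["SCRAPEHUB_API_KEY"]

def handle_batch_completed(data):
    # Assumption: results_url accepts Bearer auth like other API endpoints
    response = requests.get(
        data["results_url"],
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()

    results = response.json()
    print(f"Batch {data['batch_id']}: "
          f"{data['successful']}/{data['total_jobs']} succeeded")
    # Persist or post-process the per-URL results here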

Using Webhooks with the API

Specify a webhook URL when creating a scraping job:

async_scraping.py
python
from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Submit async job with webhook
job = client.scrape_async(
    url="https://example.com",
    engine="neural-x1",
    webhook_url="https://your-server.com/webhook"
)

print(f"Job submitted: {job.id}")
# Your webhook will be notified when the job completes

async_scraping.js
javascript
const { ScrapeHubClient } = require('@scrapehub/node');

const client = new ScrapeHubClient({
  apiKey: process.env.SCRAPEHUB_API_KEY
});

// Submit async job with webhook
async function submitJob() {
  const job = await client.scrapeAsync({
    url: 'https://example.com',
    engine: 'neural-x1',
    webhookUrl: 'https://your-server.com/webhook'
  });

  console.log(`Job submitted: ${job.id}`);
  // Your webhook will be notified when the job completes
}

submitJob();

Webhook Best Practices

Security & Reliability

  • Always verify webhook signatures before processing
  • Use HTTPS endpoints in production
  • Respond with a 200 status code quickly (within 5 seconds)
  • Process webhook events asynchronously to avoid timeouts (see the sketch after this list)
  • Implement idempotency: webhooks may be delivered multiple times
  • Log webhook events for debugging and monitoring
  • Monitor failed deliveries; ScrapeHub retries them automatically (see Retry Policy below)
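
A minimal sketch of this quick-ack pattern, combining a fast 200 response, background processing, and idempotency keyed on the event id. The in-memory queue and seen-event set are illustrative; use a real task queue and a persistent store in production:

async_processing.py
python
import queue
import threading

from flask import Flask, request

app = Flask(__name__)

work_queue = queue.Queue()
seen_events = set()  # illustrative; use a persistent store in production
seen_lock = threading.Lock()

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Signature verification omitted here; see the verification example above
    event = request.json

    # Idempotency: skip events we have already accepted
    with seen_lock:
        if event['id'] in seen_events:
            return {'status': 'duplicate'}, 200
        seen_events.add(event['id'])

    # Enqueue and acknowledge immediately, well within the 5-second window
    work_queue.put(event)
    return {'status': 'accepted'}, 200

def worker():
    # Background thread does the slow work after the 200 is sent
    while True:
        event = work_queue.get()
        process_event(event)
        work_queue.task_done()

def process_event(event):
    # Dispatch to your handlers from the examples above
    print(f"Processing {event['type']} ({event['id']})")

threading.Thread(target=worker, daemon=True).start()

if __name__ == '__main__':
    app.run(port=8080)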

Retry Policy

If your webhook endpoint fails to respond or returns an error, ScrapeHub will retry the delivery:

  • Retries occur at increasing intervals: 1min, 5min, 15min, 1hr, 6hr, 24hr
  • Maximum of 6 retry attempts
  • Webhooks are considered failed after all retries are exhausted
  • You can manually replay failed webhooks from the dashboard

Testing Webhooks

Local Development

Use a tool like ngrok to expose your local server for testing:

Terminal
# Start your webhook server locally
python webhook_server.py

# In another terminal, start ngrok
ngrok http 8080

# Use the ngrok URL in the ScrapeHub dashboard:
# https://abc123.ngrok.io/webhook

Test Events

Send a test webhook from the dashboard to verify your endpoint:

  1. Go to Settings → Webhooks
  2. Select your webhook configuration
  3. Click "Send Test Event"
  4. Choose event type and click "Send"
  5. Verify your server receives and processes the test event
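
You can also simulate a signed delivery locally before wiring up the dashboard. A minimal sketch that signs a sample payload with the HMAC-SHA256 scheme documented above and posts it to your endpoint; the payload values are illustrative:

send_test_event.py
python
import hashlib
import hmac
import json

import requests

WEBHOOK_SECRET = "your_webhook_secret"
ENDPOINT = "http://localhost:8080/webhook"

# Illustrative test payload mirroring the scrape.completed event shape
payload = json.dumps({
    "type": "scrape.completed",
    "id": "evt_test_0001",
    "created_at": "2026-01-27T10:30:00Z",
    "data": {"job_id": "job_test", "url": "https://example.com",
             "engine": "neural-x1", "status": "completed",
             "result": {"data": {}, "metadata": {"response_time": 1250,
                                                 "status_code": 200}}},
}).encode()

# Sign the raw body exactly as ScrapeHub does
signature = hmac.new(WEBHOOK_SECRET.encode(), payload,
                     hashlib.sha256).hexdigest()

response = requests.post(
    ENDPOINT,
    data=payload,
    headers={"Content-Type": "application/json",
             "X-ScrapeHub-Signature": signature},
    timeout=10,
)
print(response.status_code, response.text)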

Important Notes

  • Webhook URLs must be publicly accessible
  • HTTP URLs are only allowed for development (use HTTPS in production)
  • Keep your webhook secret secure: never commit it to version control
  • Webhook delivery is not guaranteed: implement fallback polling for critical jobs (see the sketch below)
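
A minimal fallback-polling sketch for jobs whose webhook has not arrived within a deadline. The client.get_job method and the job.status values used here are assumptions for illustration; substitute whatever status-lookup call your SDK version provides:

fallback_polling.py
python
import time

from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

POLL_INTERVAL = 30   # seconds between status checks
DEADLINE = 600       # give up after 10 minutes

def poll_until_done(job_id):
    """Fallback for a job whose webhook never arrived."""
    waited = 0
    while waited < DEADLINE:
        # Assumption: a get_job status-lookup method exists on the client
        job = client.get_job(job_id)
        if job.status in ("completed", "failed"):
            return job
        time.sleep(POLL_INTERVAL)
        waited += POLL_INTERVAL
    raise TimeoutError(f"Job {job_id} still pending after {DEADLINE}s")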