Webhooks

Receive real-time notifications when scraping jobs complete

About Webhooks

Webhooks allow you to receive HTTP callbacks when events occur in ScrapeHub. Instead of polling for job completion, ScrapeHub notifies your server automatically, which cuts unnecessary API traffic and delivers results as soon as they are ready.

Setting Up Webhooks

1. Configure Webhook URL

Set up your webhook endpoint in the ScrapeHub dashboard:

  1. Navigate to Settings → Webhooks
  2. Click "Add Webhook"
  3. Enter your endpoint URL (must be HTTPS in production)
  4. Select which events to subscribe to
  5. Save your webhook configuration

2. Verify Webhook Signature

ScrapeHub signs all webhook requests with a secret key. Always verify the signature to ensure requests are authentic:

webhook_verification.py
python
import hashlib
import hmac
import os

from flask import Flask, request, abort

app = Flask(__name__)
# Load the secret from the environment; never hard-code it (see Important Notes)
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Get signature from header
    signature = request.headers.get('X-ScrapeHub-Signature')
    if not signature:
        abort(401)

    # Compute expected signature over the raw request body
    payload = request.get_data()
    expected_signature = hmac.new(
        WEBHOOK_SECRET.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison guards against timing attacks
    if not hmac.compare_digest(signature, expected_signature):
        abort(401)

    # Process webhook
    event = request.json
    handle_event(event)

    return {'status': 'success'}, 200

def handle_event(event):
    event_type = event['type']

    if event_type == 'scrape.completed':
        handle_scrape_completed(event['data'])
    elif event_type == 'scrape.failed':
        handle_scrape_failed(event['data'])
    elif event_type == 'batch.completed':
        handle_batch_completed(event['data'])

def handle_scrape_completed(data):
    job_id = data['job_id']
    result = data['result']
    print(f"Job {job_id} completed successfully")
    # Store result in database, notify users, etc.

def handle_scrape_failed(data):
    print(f"Job {data['job_id']} failed: {data['error']['message']}")

def handle_batch_completed(data):
    print(f"Batch {data['batch_id']} completed: {data['successful']} succeeded")

if __name__ == '__main__':
    app.run(port=8080)

Node.js/Express Example

webhook_server.js
javascript
const express = require('express');
const crypto = require('crypto');

const app = express();
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;

// Raw body needed for signature verification
app.use(express.json({
  verify: (req, res, buf) => {
    req.rawBody = buf.toString('utf8');
  }
}));

app.post('/webhook', (req, res) => {
  const signature = req.headers['x-scrapehub-signature'];

  if (!signature) {
    return res.status(401).send('Missing signature');
  }

  // Verify signature with a constant-time comparison
  const expectedSignature = crypto
    .createHmac('sha256', WEBHOOK_SECRET)
    .update(req.rawBody)
    .digest('hex');

  // timingSafeEqual throws if the buffers differ in length,
  // so check lengths first to avoid crashing on malformed input
  const sigBuf = Buffer.from(signature);
  const expectedBuf = Buffer.from(expectedSignature);
  if (
    sigBuf.length !== expectedBuf.length ||
    !crypto.timingSafeEqual(sigBuf, expectedBuf)
  ) {
    return res.status(401).send('Invalid signature');
  }

  // Process webhook
  const event = req.body;
  handleEvent(event);

  res.json({ status: 'success' });
});

function handleEvent(event) {
  switch (event.type) {
    case 'scrape.completed':
      handleScrapeCompleted(event.data);
      break;
    case 'scrape.failed':
      handleScrapeFailed(event.data);
      break;
    case 'batch.completed':
      handleBatchCompleted(event.data);
      break;
  }
}

function handleScrapeCompleted(data) {
  console.log(`Job ${data.job_id} completed successfully`);
  // Process the result
}

function handleScrapeFailed(data) {
  console.error(`Job ${data.job_id} failed: ${data.error.message}`);
}

function handleBatchCompleted(data) {
  console.log(`Batch ${data.batch_id} completed: ${data.successful} succeeded`);
}

app.listen(8080, () => {
  console.log('Webhook server listening on port 8080');
});

Webhook Events

scrape.completed

Triggered when a scraping job completes successfully

{
  "type": "scrape.completed",
  "id": "evt_1234567890",
  "created_at": "2026-01-27T10:30:00Z",
  "data": {
    "job_id": "job_abc123",
    "url": "https://example.com",
    "engine": "neural-x1",
    "status": "completed",
    "result": {
      "data": { ... },
      "metadata": {
        "response_time": 1250,
        "status_code": 200
      }
    }
  }
}

scrape.failed

Triggered when a scraping job fails

{
  "type": "scrape.failed",
  "id": "evt_1234567891",
  "created_at": "2026-01-27T10:35:00Z",
  "data": {
    "job_id": "job_abc124",
    "url": "https://example.com",
    "engine": "neural-x1",
    "status": "failed",
    "error": {
      "code": "timeout",
      "message": "Request timed out after 30 seconds"
    }
  }
}
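
A scrape.failed handler will often want to resubmit transient failures such as timeouts. Here is a minimal sketch that resubmits through the documented scrape_async method; the RETRYABLE_CODES set, MAX_ATTEMPTS limit, and in-memory attempts counter are illustrative assumptions, not part of the API:

retry_failed_jobs.py
python
from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Illustrative in-memory attempt counter; use persistent storage in production
attempts = {}
MAX_ATTEMPTS = 3
RETRYABLE_CODES = {"timeout"}  # assumption: only timeouts are worth retrying

def handle_scrape_failed(data):
    job_id = data["job_id"]
    error = data["error"]

    # Only resubmit errors that are likely transient
    if error["code"] not in RETRYABLE_CODES:
        print(f"Job {job_id} failed permanently: {error['message']}")
        return

    attempts[job_id] = attempts.get(job_id, 0) + 1
    if attempts[job_id] >= MAX_ATTEMPTS:
        print(f"Giving up on job {job_id} after {MAX_ATTEMPTS} attempts")
        return

    # Resubmit with the same parameters and the same webhook
    client.scrape_async(
        url=data["url"],
        engine=data["engine"],
        webhook_url="https://your-server.com/webhook"
    )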

batch.completed

Triggered when a batch scraping job completes (all URLs processed)

{
  "type": "batch.completed",
  "id": "evt_1234567892",
  "created_at": "2026-01-27T10:40:00Z",
  "data": {
    "batch_id": "batch_xyz789",
    "total_jobs": 100,
    "successful": 95,
    "failed": 5,
    "results_url": "https://api.scrapehub.io/v4/batches/batch_xyz789/results"
  }
}
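
The results_url in a batch.completed event points at the batch's aggregated results. A minimal sketch of fetching them; it assumes the endpoint accepts the same API-key authentication as the rest of the API (the Bearer header shown here is an assumption):

fetch_batch_results.py
python
import os

import requests

API_KEY = os.environ["SCRAPEHUB_API_KEY"]

def handle_batch_completed(data):
    # Assumption: results_url accepts Bearer auth like other API endpoints
    response = requests.get(
        data["results_url"],
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()

    results = response.json()
    print(f"Batch {data['batch_id']}: "
          f"{data['successful']}/{data['total_jobs']} succeeded")
    # Persist or post-process the per-URL results here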

Using Webhooks with the API

Specify a webhook URL when creating a scraping job:

async_scraping.py
python
from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Submit async job with webhook
job = client.scrape_async(
    url="https://example.com",
    engine="neural-x1",
    webhook_url="https://your-server.com/webhook"
)

print(f"Job submitted: {job.id}")
# Your webhook will be notified when the job completes

async_scraping.js
javascript
const { ScrapeHubClient } = require('@scrapehub/node');

const client = new ScrapeHubClient({
  apiKey: process.env.SCRAPEHUB_API_KEY
});

// Submit async job with webhook
async function submitJob() {
  const job = await client.scrapeAsync({
    url: 'https://example.com',
    engine: 'neural-x1',
    webhookUrl: 'https://your-server.com/webhook'
  });

  console.log(`Job submitted: ${job.id}`);
  // Your webhook will be notified when the job completes
}

submitJob();

Webhook Best Practices

Security & Reliability

  • Always verify webhook signatures before processing
  • Use HTTPS endpoints in production
  • Respond with a 200 status code quickly (within 5 seconds)
  • Process webhook events asynchronously to avoid timeouts (see the sketch after this list)
  • Implement idempotency: webhooks may be delivered multiple times
  • Log webhook events for debugging and monitoring
  • Monitor failed deliveries; ScrapeHub retries them automatically (see Retry Policy below)
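
A minimal sketch of this quick-ack pattern, combining a fast 200 response, background processing, and idempotency keyed on the event id. The in-memory queue and seen-event set are illustrative; use a real task queue and a persistent store in production:

async_processing.py
python
import queue
import threading

from flask import Flask, request

app = Flask(__name__)

work_queue = queue.Queue()
seen_events = set()  # illustrative; use a persistent store in production
seen_lock = threading.Lock()

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Signature verification omitted here; see the verification example above
    event = request.json

    # Idempotency: skip events we have already accepted
    with seen_lock:
        if event['id'] in seen_events:
            return {'status': 'duplicate'}, 200
        seen_events.add(event['id'])

    # Enqueue and acknowledge immediately, well within the 5-second window
    work_queue.put(event)
    return {'status': 'accepted'}, 200

def worker():
    # Background thread does the slow work after the 200 is sent
    while True:
        event = work_queue.get()
        process_event(event)
        work_queue.task_done()

def process_event(event):
    # Dispatch to your handlers from the examples above
    print(f"Processing {event['type']} ({event['id']})")

threading.Thread(target=worker, daemon=True).start()

if __name__ == '__main__':
    app.run(port=8080)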

Retry Policy

If your webhook endpoint fails to respond or returns an error, ScrapeHub will retry the delivery:

  • Retries occur at increasing intervals: 1min, 5min, 15min, 1hr, 6hr, 24hr
  • Maximum of 6 retry attempts
  • Webhooks are considered failed after all retries are exhausted
  • You can manually replay failed webhooks from the dashboard

Testing Webhooks

Local Development

Use a tool like ngrok to expose your local server for testing:

Terminal
# Start your webhook server locally
python webhook_server.py

# In another terminal, start ngrok
ngrok http 8080

# Use the ngrok URL in the ScrapeHub dashboard:
# https://abc123.ngrok.io/webhook

Test Events

Send a test webhook from the dashboard to verify your endpoint:

  1. Go to Settings → Webhooks
  2. Select your webhook configuration
  3. Click "Send Test Event"
  4. Choose event type and click "Send"
  5. Verify your server receives and processes the test event
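
You can also simulate a signed delivery locally before wiring up the dashboard. A minimal sketch that signs a sample payload with the HMAC-SHA256 scheme documented above and posts it to your endpoint; the payload values are illustrative:

send_test_event.py
python
import hashlib
import hmac
import json

import requests

WEBHOOK_SECRET = "your_webhook_secret"
ENDPOINT = "http://localhost:8080/webhook"

# Illustrative test payload mirroring the scrape.completed event shape
payload = json.dumps({
    "type": "scrape.completed",
    "id": "evt_test_0001",
    "created_at": "2026-01-27T10:30:00Z",
    "data": {"job_id": "job_test", "url": "https://example.com",
             "engine": "neural-x1", "status": "completed",
             "result": {"data": {}, "metadata": {"response_time": 1250,
                                                 "status_code": 200}}},
}).encode()

# Sign the raw body exactly as ScrapeHub does
signature = hmac.new(WEBHOOK_SECRET.encode(), payload,
                     hashlib.sha256).hexdigest()

response = requests.post(
    ENDPOINT,
    data=payload,
    headers={"Content-Type": "application/json",
             "X-ScrapeHub-Signature": signature},
    timeout=10,
)
print(response.status_code, response.text)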

Important Notes

  • Webhook URLs must be publicly accessible
  • HTTP URLs are only allowed for development (use HTTPS in production)
  • Keep your webhook secret secure: never commit it to version control
  • Webhook delivery is not guaranteed: implement fallback polling for critical jobs (see the sketch below)
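
A minimal fallback-polling sketch for jobs whose webhook has not arrived within a deadline. The client.get_job method and the job.status values used here are assumptions for illustration; substitute whatever status-lookup call your SDK version provides:

fallback_polling.py
python
import time

from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

POLL_INTERVAL = 30   # seconds between status checks
DEADLINE = 600       # give up after 10 minutes

def poll_until_done(job_id):
    """Fallback for a job whose webhook never arrived."""
    waited = 0
    while waited < DEADLINE:
        # Assumption: a get_job status-lookup method exists on the client
        job = client.get_job(job_id)
        if job.status in ("completed", "failed"):
            return job
        time.sleep(POLL_INTERVAL)
        waited += POLL_INTERVAL
    raise TimeoutError(f"Job {job_id} still pending after {DEADLINE}s")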