Neural Engine

AI-powered web scraping with automatic adaptation

About Neural Engine

Neural Engine leverages advanced AI and machine learning models to intelligently extract data from websites. It automatically adapts to page structure changes, handles dynamic content, and understands context without requiring CSS selectors or XPath expressions.

Key Features

AI-Powered

Uses machine learning to understand page structure and extract relevant data automatically

Self-Adapting

Automatically adjusts to website changes without needing selector updates

Dynamic Content

Handles JavaScript-rendered content, infinite scrolling, and lazy loading

Context-Aware

Understands semantic meaning and relationships between data elements

Basic Usage

Simple Extraction

Extract data without specifying selectors; Neural Engine infers the page structure on its own:

neural_basic.py
from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Simple extraction - AI automatically detects content
result = client.scrape(
    url="https://example.com/product",
    engine="neural-x1"
)

print(result.data)
# {
#   "title": "Product Name",
#   "price": "$99.99",
#   "description": "Product description...",
#   "images": ["url1.jpg", "url2.jpg"],
#   "availability": "In Stock"
# }

Guided Extraction

Provide hints about what data you want to extract:

neural_guided.py
result = client.scrape(
    url="https://example.com/article",
    engine="neural-x1",
    schema={
        "title": "article headline",
        "author": "author name",
        "published_date": "publication date",
        "content": "main article text",
        "tags": "article tags or categories"
    }
)

print(result.data)
# {
#   "title": "Article Headline Here",
#   "author": "John Doe",
#   "published_date": "2026-01-27",
#   "content": "Full article text...",
#   "tags": ["technology", "AI", "web scraping"]
# }

Node.js Example

neural_example.js
const { ScrapeHubClient } = require('@scrapehub/node');

const client = new ScrapeHubClient({
  apiKey: process.env.SCRAPEHUB_API_KEY
});

async function scrapeProduct() {
  const result = await client.scrape({
    url: 'https://example.com/product',
    engine: 'neural-x1',
    schema: {
      name: 'product name',
      price: 'product price',
      rating: 'customer rating',
      reviews_count: 'number of reviews',
      specifications: 'product specs and features'
    }
  });

  console.log(result.data);
}

scrapeProduct();

Advanced Features

List Extraction

Extract multiple items from listing pages:

neural_lists.py
result = client.scrape(
    url="https://example.com/products",
    engine="neural-x1",
    extract_lists=True,
    schema={
        "products": {
            "type": "list",
            "item": {
                "name": "product name",
                "price": "price",
                "rating": "rating",
                "url": "product link"
            }
        }
    }
)

for product in result.data['products']:
    print(f"{product['name']}: {product['price']}")

Pagination

Automatically follow pagination to scrape multiple pages:

neural_pagination.py
result = client.scrape(
    url="https://example.com/search?q=laptops",
    engine="neural-x1",
    pagination={
        "enabled": True,
        "max_pages": 5,
        "wait_time": 2  # seconds between pages
    },
    schema={
        "products": {
            "type": "list",
            "item": {
                "title": "product title",
                "price": "price"
            }
        }
    }
)

print(f"Total products found: {len(result.data['products'])}")

Dynamic Content Handling

Wait for dynamic content to load before extraction:

neural_dynamic.py
result = client.scrape(
    url="https://example.com/dashboard",
    engine="neural-x1",
    wait_for={
        "type": "content",  # or "selector", "network_idle"
        "value": "data-loaded",  # wait for specific indicator
        "timeout": 10  # seconds
    },
    schema={
        "statistics": "dashboard statistics",
        "recent_activity": "recent user activity"
    }
)

Neural Engine Versions

neural-x1

Latest

Our most advanced AI model, with the highest accuracy and the best handling of complex pages

  • Best for: E-commerce, complex SPAs, dynamic content
  • Speed: Moderate (2-5 seconds per page)
  • Accuracy: 95%+

neural-lite

Fast

Lightweight model optimized for speed with good accuracy

  • Best for: Simple pages, high-volume scraping
  • Speed: Fast (0.5-2 seconds per page)
  • Accuracy: 85%+

neural-ultra

Premium

Maximum accuracy for highly complex or unusual page structures

  • Best for: Difficult sites, maximum accuracy needed
  • Speed: Slower (5-10 seconds per page)
  • Accuracy: 98%+
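A common pattern implied by the trade-offs above is to try the fastest engine first and escalate to a heavier one only when the extraction looks weak. The helper below is an illustrative sketch, not part of the ScrapeHub SDK; `scrape_fn` stands in for `client.scrape`, and it relies on the `confidence` score shown in the error-handling example.

```python
# Hypothetical escalation helper: try engines from fastest to most
# accurate, stopping as soon as confidence clears a threshold.
ENGINES = ("neural-lite", "neural-x1", "neural-ultra")

def scrape_with_escalation(scrape_fn, url, schema, threshold=0.8):
    """scrape_fn stands in for client.scrape; it must return an object
    with .confidence and .data attributes."""
    result = None
    for engine in ENGINES:
        result = scrape_fn(url=url, engine=engine, schema=schema)
        if result.confidence >= threshold:
            break  # good enough; no need for a slower engine
    return result
```

This keeps per-page cost low on simple pages while still getting high-accuracy results on difficult ones, at the price of an extra request whenever a lighter engine falls short.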

Data Validation

Neural Engine can validate extracted data against expected formats:

neural_validation.py
result = client.scrape(
    url="https://example.com/contact",
    engine="neural-x1",
    schema={
        "email": {
            "description": "contact email",
            "type": "email"  # validates email format
        },
        "phone": {
            "description": "phone number",
            "type": "phone"  # validates phone format
        },
        "price": {
            "description": "product price",
            "type": "number",  # ensures numeric value
            "format": "currency"  # extracts numeric price
        },
        "date": {
            "description": "publication date",
            "type": "date"  # normalizes to ISO format
        }
    }
)

# Data is validated and normalized
print(result.data)
# {
#   "email": "contact@example.com",
#   "phone": "+1-555-123-4567",
#   "price": 99.99,
#   "date": "2026-01-27"
# }

Best Practices

Neural Engine Tips

  • Provide clear, descriptive schema hints for better accuracy
  • Start with neural-x1, then optimize for speed if needed
  • Use data validation to ensure consistent output formats
  • Enable pagination for comprehensive data collection
  • Set appropriate wait times for dynamic content
  • Test extractions with a few pages before large-scale scraping
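The last tip above can be sketched as a small pre-flight check: scrape a handful of sample URLs, collect the confidence scores, and only launch the full crawl if the average clears a bar. The helper and the 0.85 threshold are illustrative assumptions, not part of the SDK; `scrape_fn` again stands in for `client.scrape`.

```python
# Illustrative pre-flight check: extract a small sample and report
# whether the average confidence justifies a full-scale run.
def preflight(scrape_fn, sample_urls, schema, min_avg_confidence=0.85):
    scores = []
    for url in sample_urls:
        result = scrape_fn(url=url, engine="neural-x1", schema=schema)
        scores.append(result.confidence)
    avg = sum(scores) / len(scores)
    return avg >= min_avg_confidence, avg
```

If the check fails, tightening the schema hints or switching to neural-ultra on the sample is usually cheaper than discovering bad data after a large run.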

Error Handling

neural_errors.py
from scrapehub import ScrapeHubClient
from scrapehub.exceptions import ExtractionError

client = ScrapeHubClient(api_key="your_api_key")

try:
    result = client.scrape(
        url="https://example.com",
        engine="neural-x1",
        schema={"title": "page title"}
    )

    # Check confidence scores
    if result.confidence < 0.8:
        print("Warning: Low confidence extraction")

    print(result.data)

except ExtractionError as e:
    print(f"Extraction failed: {e}")
    print(f"Partial data: {e.partial_data}")  # May contain partial results