Neural Engine
AI-powered intelligent web scraping with automatic adaptation
About Neural Engine
Neural Engine leverages advanced AI and machine learning models to intelligently extract data from websites. It automatically adapts to page structure changes, handles dynamic content, and understands context without requiring CSS selectors or XPath expressions.
Key Features
AI-Powered
Uses machine learning to understand page structure and extract relevant data automatically
Self-Adapting
Automatically adjusts to website changes without needing selector updates
Dynamic Content
Handles JavaScript-rendered content, infinite scrolling, and lazy loading
Context-Aware
Understands semantic meaning and relationships between data elements
Basic Usage
Simple Extraction
Extract data without specifying selectors; Neural Engine figures it out:
from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Simple extraction - AI automatically detects content
result = client.scrape(
    url="https://example.com/product",
    engine="neural-x1"
)

print(result.data)
# {
#     "title": "Product Name",
#     "price": "$99.99",
#     "description": "Product description...",
#     "images": ["url1.jpg", "url2.jpg"],
#     "availability": "In Stock"
# }

Guided Extraction
Provide hints about what data you want to extract:
result = client.scrape(
    url="https://example.com/article",
    engine="neural-x1",
    schema={
        "title": "article headline",
        "author": "author name",
        "published_date": "publication date",
        "content": "main article text",
        "tags": "article tags or categories"
    }
)

print(result.data)
# {
#     "title": "Article Headline Here",
#     "author": "John Doe",
#     "published_date": "2026-01-27",
#     "content": "Full article text...",
#     "tags": ["technology", "AI", "web scraping"]
# }

Node.js Example
const { ScrapeHubClient } = require('@scrapehub/node');

const client = new ScrapeHubClient({
  apiKey: process.env.SCRAPEHUB_API_KEY
});

async function scrapeProduct() {
  const result = await client.scrape({
    url: 'https://example.com/product',
    engine: 'neural-x1',
    schema: {
      name: 'product name',
      price: 'product price',
      rating: 'customer rating',
      reviews_count: 'number of reviews',
      specifications: 'product specs and features'
    }
  });
  console.log(result.data);
}

scrapeProduct();

Advanced Features
List Extraction
Extract multiple items from listing pages:
result = client.scrape(
    url="https://example.com/products",
    engine="neural-x1",
    extract_lists=True,
    schema={
        "products": {
            "type": "list",
            "item": {
                "name": "product name",
                "price": "price",
                "rating": "rating",
                "url": "product link"
            }
        }
    }
)

for product in result.data['products']:
    print(f"{product['name']}: {product['price']}")

Pagination
Automatically follow pagination to scrape multiple pages:
result = client.scrape(
    url="https://example.com/search?q=laptops",
    engine="neural-x1",
    pagination={
        "enabled": True,
        "max_pages": 5,
        "wait_time": 2  # seconds between pages
    },
    schema={
        "products": {
            "type": "list",
            "item": {
                "title": "product title",
                "price": "price"
            }
        }
    }
)

print(f"Total products found: {len(result.data['products'])}")

Dynamic Content Handling
Wait for dynamic content to load before extraction:
result = client.scrape(
    url="https://example.com/dashboard",
    engine="neural-x1",
    wait_for={
        "type": "content",  # or "selector", "network_idle"
        "value": "data-loaded",  # wait for a specific indicator
        "timeout": 10  # seconds
    },
    schema={
        "statistics": "dashboard statistics",
        "recent_activity": "recent user activity"
    }
)

Neural Engine Versions
neural-x1 (Latest)
Our most advanced AI model, with the highest accuracy and the best handling of complex pages
- Best for: E-commerce, complex SPAs, dynamic content
- Speed: Moderate (2-5 seconds per page)
- Accuracy: 95%+
neural-lite (Fast)
Lightweight model optimized for speed, with good accuracy
- Best for: Simple pages, high-volume scraping
- Speed: Fast (0.5-2 seconds per page)
- Accuracy: 85%+
neural-ultra (Premium)
Maximum accuracy for highly complex or unusual page structures
- Best for: Difficult sites where maximum accuracy is needed
- Speed: Slower (5-10 seconds per page)
- Accuracy: 98%+
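Because the client API is the same across versions, switching engines is a one-line change. Here is a minimal sketch of choosing an engine per job, assuming the same client.scrape API shown above; the URLs are placeholders:

from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Simple, high-volume pages: neural-lite trades a little accuracy for speed
listing = client.scrape(
    url="https://example.com/category/laptops",  # placeholder URL
    engine="neural-lite",
    schema={"title": "page title"}
)

# Complex or unusual layouts where accuracy matters most: neural-ultra
catalog = client.scrape(
    url="https://example.com/legacy-catalog",  # placeholder URL
    engine="neural-ultra",
    schema={"name": "product name", "price": "product price"}
)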
Data Validation
Neural Engine can validate extracted data against expected formats:
result = client.scrape(
    url="https://example.com/contact",
    engine="neural-x1",
    schema={
        "email": {
            "description": "contact email",
            "type": "email"  # validates email format
        },
        "phone": {
            "description": "phone number",
            "type": "phone"  # validates phone format
        },
        "price": {
            "description": "product price",
            "type": "number",  # ensures a numeric value
            "format": "currency"  # extracts the numeric price
        },
        "date": {
            "description": "publication date",
            "type": "date"  # normalizes to ISO format
        }
    }
)

# Data is validated and normalized
print(result.data)
# {
#     "email": "contact@example.com",
#     "phone": "+1-555-123-4567",
#     "price": 99.99,
#     "date": "2026-01-27"
# }

Best Practices
Neural Engine Tips
- Provide clear, descriptive schema hints for better accuracy
- Start with neural-x1, then optimize for speed if needed
- Use data validation to ensure consistent output formats
- Enable pagination for comprehensive data collection
- Set appropriate wait times for dynamic content
- Test extractions with a few pages before large-scale scraping (see the sketch after this list)
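The last tip can be wired up as a quick smoke test: scrape a handful of representative URLs, check the confidence scores, and only scale up once they look healthy. A minimal sketch; the sample URLs and the 0.8 threshold are illustrative:

from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# A small, representative sample of the pages you plan to scrape at scale
sample_urls = [
    "https://example.com/products/1",
    "https://example.com/products/2",
    "https://example.com/products/3",
]

schema = {"name": "product name", "price": "product price"}

# Flag low-confidence extractions before committing to a full crawl
for url in sample_urls:
    result = client.scrape(url=url, engine="neural-x1", schema=schema)
    status = "ok" if result.confidence >= 0.8 else "LOW CONFIDENCE"
    print(f"[{status}] {url} -> {result.data}")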
Error Handling
from scrapehub import ScrapeHubClient
from scrapehub.exceptions import ExtractionError

client = ScrapeHubClient(api_key="your_api_key")

try:
    result = client.scrape(
        url="https://example.com",
        engine="neural-x1",
        schema={"title": "page title"}
    )
    # Check confidence scores
    if result.confidence < 0.8:
        print("Warning: Low confidence extraction")
    print(result.data)
except ExtractionError as e:
    print(f"Extraction failed: {e}")
    print(f"Partial data: {e.partial_data}")  # May contain partial results