Neural Engine
AI-powered intelligent web scraping with automatic adaptation
About Neural Engine
Neural Engine leverages advanced AI and machine learning models to intelligently extract data from websites. It automatically adapts to page structure changes, handles dynamic content, and understands context without requiring CSS selectors or XPath expressions.
Key Features
AI-Powered
Uses machine learning to understand page structure and extract relevant data automatically
Self-Adapting
Automatically adjusts to website changes without needing selector updates
Dynamic Content
Handles JavaScript-rendered content, infinite scrolling, and lazy loading
Context-Aware
Understands semantic meaning and relationships between data elements
Basic Usage
Simple Extraction
Extract data without specifying selectors; Neural Engine figures it out:
from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Simple extraction - AI automatically detects content
result = client.scrape(
    url="https://example.com/product",
    engine="neural-x1"
)

print(result.data)
# {
#     "title": "Product Name",
#     "price": "$99.99",
#     "description": "Product description...",
#     "images": ["url1.jpg", "url2.jpg"],
#     "availability": "In Stock"
# }

Guided Extraction
Provide hints about what data you want to extract:
result = client.scrape(
    url="https://example.com/article",
    engine="neural-x1",
    schema={
        "title": "article headline",
        "author": "author name",
        "published_date": "publication date",
        "content": "main article text",
        "tags": "article tags or categories"
    }
)

print(result.data)
# {
#     "title": "Article Headline Here",
#     "author": "John Doe",
#     "published_date": "2026-01-27",
#     "content": "Full article text...",
#     "tags": ["technology", "AI", "web scraping"]
# }

Node.js Example
const { ScrapeHubClient } = require('@scrapehub/node');

const client = new ScrapeHubClient({
  apiKey: process.env.SCRAPEHUB_API_KEY
});

async function scrapeProduct() {
  const result = await client.scrape({
    url: 'https://example.com/product',
    engine: 'neural-x1',
    schema: {
      name: 'product name',
      price: 'product price',
      rating: 'customer rating',
      reviews_count: 'number of reviews',
      specifications: 'product specs and features'
    }
  });
  console.log(result.data);
}

scrapeProduct();

Advanced Features
List Extraction
Extract multiple items from listing pages:
result = client.scrape(
    url="https://example.com/products",
    engine="neural-x1",
    extract_lists=True,
    schema={
        "products": {
            "type": "list",
            "item": {
                "name": "product name",
                "price": "price",
                "rating": "rating",
                "url": "product link"
            }
        }
    }
)

for product in result.data['products']:
    print(f"{product['name']}: {product['price']}")

Pagination
Automatically follow pagination to scrape multiple pages:
result = client.scrape(
    url="https://example.com/search?q=laptops",
    engine="neural-x1",
    pagination={
        "enabled": True,
        "max_pages": 5,
        "wait_time": 2  # seconds between pages
    },
    schema={
        "products": {
            "type": "list",
            "item": {
                "title": "product title",
                "price": "price"
            }
        }
    }
)

print(f"Total products found: {len(result.data['products'])}")

Dynamic Content Handling
Wait for dynamic content to load before extraction:
result = client.scrape(
    url="https://example.com/dashboard",
    engine="neural-x1",
    wait_for={
        "type": "content",  # or "selector", "network_idle"
        "value": "data-loaded",  # wait for a specific indicator
        "timeout": 10  # seconds
    },
    schema={
        "statistics": "dashboard statistics",
        "recent_activity": "recent user activity"
    }
)

Neural Engine Versions
neural-x1 (Latest)
Our most advanced AI model, with the highest accuracy and the best handling of complex pages
- Best for: E-commerce, complex SPAs, dynamic content
- Speed: Moderate (2-5 seconds per page)
- Accuracy: 95%+
neural-lite (Fast)
Lightweight model optimized for speed, with good accuracy
- Best for: Simple pages, high-volume scraping
- Speed: Fast (0.5-2 seconds per page)
- Accuracy: 85%+
neural-ultra (Premium)
Maximum accuracy for highly complex or unusual page structures
- Best for: Difficult sites where maximum accuracy is needed
- Speed: Slower (5-10 seconds per page)
- Accuracy: 98%+
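Because the client API is the same across versions, switching engines is a one-line change. Here is a minimal sketch of choosing an engine per job, assuming the same client.scrape API shown above; the URLs are placeholders:

from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# Simple, high-volume pages: neural-lite trades a little accuracy for speed
listing = client.scrape(
    url="https://example.com/category/laptops",  # placeholder URL
    engine="neural-lite",
    schema={"title": "page title"}
)

# Complex or unusual layouts where accuracy matters most: neural-ultra
catalog = client.scrape(
    url="https://example.com/legacy-catalog",  # placeholder URL
    engine="neural-ultra",
    schema={"name": "product name", "price": "product price"}
)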
Data Validation
Neural Engine can validate extracted data against expected formats:
result = client.scrape(
    url="https://example.com/contact",
    engine="neural-x1",
    schema={
        "email": {
            "description": "contact email",
            "type": "email"  # validates email format
        },
        "phone": {
            "description": "phone number",
            "type": "phone"  # validates phone format
        },
        "price": {
            "description": "product price",
            "type": "number",  # ensures a numeric value
            "format": "currency"  # extracts the numeric price
        },
        "date": {
            "description": "publication date",
            "type": "date"  # normalizes to ISO format
        }
    }
)

# Data is validated and normalized
print(result.data)
# {
#     "email": "contact@example.com",
#     "phone": "+1-555-123-4567",
#     "price": 99.99,
#     "date": "2026-01-27"
# }

Best Practices
Neural Engine Tips
- Provide clear, descriptive schema hints for better accuracy
- Start with neural-x1, then optimize for speed if needed
- Use data validation to ensure consistent output formats
- Enable pagination for comprehensive data collection
- Set appropriate wait times for dynamic content
- Test extractions with a few pages before large-scale scraping (see the sketch after this list)
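The last tip can be wired up as a quick smoke test: scrape a handful of representative URLs, check the confidence scores, and only scale up once they look healthy. A minimal sketch; the sample URLs and the 0.8 threshold are illustrative:

from scrapehub import ScrapeHubClient

client = ScrapeHubClient(api_key="your_api_key")

# A small, representative sample of the pages you plan to scrape at scale
sample_urls = [
    "https://example.com/products/1",
    "https://example.com/products/2",
    "https://example.com/products/3",
]

schema = {"name": "product name", "price": "product price"}

# Flag low-confidence extractions before committing to a full crawl
for url in sample_urls:
    result = client.scrape(url=url, engine="neural-x1", schema=schema)
    status = "ok" if result.confidence >= 0.8 else "LOW CONFIDENCE"
    print(f"[{status}] {url} -> {result.data}")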
Error Handling
from scrapehub import ScrapeHubClient
from scrapehub.exceptions import ExtractionError

client = ScrapeHubClient(api_key="your_api_key")

try:
    result = client.scrape(
        url="https://example.com",
        engine="neural-x1",
        schema={"title": "page title"}
    )
    # Check confidence scores
    if result.confidence < 0.8:
        print("Warning: Low confidence extraction")
    print(result.data)
except ExtractionError as e:
    print(f"Extraction failed: {e}")
    print(f"Partial data: {e.partial_data}")  # May contain partial results