Crawling

Extract data from multiple pages at scale

The crawl endpoint extracts structured data from multiple pages by following links from a starting URL, applying the same schema to each page it visits.

Basic Crawl

curl -X POST https://api.refyne.uk/api/v1/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "schema": {
      "name": "string",
      "price": "number"
    },
    "options": {
      "follow_selector": "a.product-link",
      "max_pages": 10
    }
  }'
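The same request can be issued from code. Here is a minimal Python sketch using only the standard library; the endpoint, schema, and options are taken from the curl example above, and the API key is a placeholder:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; substitute your real key

# Same payload as the curl example above
payload = {
    "url": "https://example.com/products",
    "schema": {"name": "string", "price": "number"},
    "options": {"follow_selector": "a.product-link", "max_pages": 10},
}

req = urllib.request.Request(
    "https://api.refyne.uk/api/v1/crawl",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment to send the request
```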

Crawl Options

Option            Type     Description
follow_selector   string   CSS selector for links to follow
follow_pattern    string   Regex pattern for URLs to follow
next_selector     string   CSS selector for the "next page" link (see Pagination below)
max_pages         number   Maximum pages to extract (default: 50)
max_depth         number   Maximum link depth to follow (default: 2)
same_domain_only  boolean  Only follow links on the same domain (default: true)
delay             string   Delay between requests (e.g., "1s")
concurrency       number   Parallel requests (default: 3)
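These options can be combined in a single request. For example, an illustrative payload (not taken verbatim from the API) for a polite, same-domain crawl:

```json
{
  "options": {
    "follow_pattern": "/products/[0-9]+",
    "max_depth": 3,
    "same_domain_only": true,
    "delay": "1s",
    "concurrency": 2
  }
}
```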

By CSS Selector

Follow specific links on the page:

{
  "options": {
    "follow_selector": "a.product-card",
    "max_pages": 20
  }
}

By URL Pattern

Follow links matching a pattern:

{
  "options": {
    "follow_pattern": "/products/[0-9]+",
    "max_pages": 20
  }
}
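Because follow_pattern is a regular expression, you can check it locally before launching a crawl. A quick sketch in Python, using the pattern from the example above:

```python
import re

# The follow_pattern from the example above
pattern = re.compile(r"/products/[0-9]+")

urls = [
    "https://example.com/products/42",   # matches: numeric product id
    "https://example.com/products/",     # no id, not followed
    "https://example.com/blog/post",     # different path, not followed
]

followed = [u for u in urls if pattern.search(u)]
print(followed)  # only the first URL matches
```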

Pagination

Walk through paginated listings by following the "next page" link:

{
  "options": {
    "next_selector": "a.next-page",
    "max_pages": 100
  }
}

Job Status

Crawl jobs run asynchronously. Use the job ID returned by the crawl request to check status:

curl https://api.refyne.uk/api/v1/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_API_KEY"
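A client typically polls this endpoint until the job leaves its running state. A minimal sketch; the get_job_status helper and the status strings ("pending", "running", "completed") are illustrative assumptions, not documented API behavior:

```python
import time

def poll_job(get_job_status, job_id, interval=2.0, timeout=300.0):
    """Poll until the job leaves a running state or the timeout expires.

    get_job_status is a caller-supplied function (e.g. wrapping the
    curl call above) that returns the job's status string.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_job_status(job_id)
        if status not in ("pending", "running"):  # assumed status values
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")

# Example with a stubbed status function that completes on the third poll
responses = iter(["pending", "running", "completed"])
final = poll_job(lambda _jid: next(responses), "JOB_ID", interval=0.0)
print(final)
```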

Getting Results

Once the job has completed, retrieve the crawl results either per page or merged:

# Individual results
curl https://api.refyne.uk/api/v1/jobs/JOB_ID/results \
  -H "Authorization: Bearer YOUR_API_KEY"

# Merged results
curl "https://api.refyne.uk/api/v1/jobs/JOB_ID/results?merge=true" \
  -H "Authorization: Bearer YOUR_API_KEY"
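The response shapes are not shown here; assuming the individual-results endpoint returns one list of records per crawled page, merge=true is effectively a server-side flatten. A client-side equivalent might look like this (illustrative only, not the API's documented response format):

```python
# Illustrative only: assumes individual results arrive as a list of
# per-page record lists, which merge=true would concatenate.
per_page_results = [
    [{"name": "Widget", "price": 9.99}],
    [{"name": "Gadget", "price": 19.99}, {"name": "Gizmo", "price": 4.99}],
]

merged = [record for page in per_page_results for record in page]
print(len(merged))  # one flat list of all extracted records
```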