Start crawl job

POST /api/v1/crawl

Authorization

bearerAuth
Authorization: Bearer <token>

API key authentication. Include your API key in the Authorization header as Bearer rf_your_key.

In: header
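
As an illustration of the header described above, here is a minimal Python sketch. The helper name build_headers is hypothetical and not part of the API; the only details taken from this page are the Authorization header and the Bearer rf_your_key format.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the API key as a bearer token.

    build_headers is an illustrative helper, not part of the API.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # Format documented above: "Bearer <token>"
        "Authorization": f"Bearer {api_key}",
        # The request body for this endpoint is application/json
        "Content-Type": "application/json",
    }


headers = build_headers("rf_your_key")
```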

Query Parameters

wait? boolean

Block until the job completes and return the results directly. The maximum wait time is 2 minutes; if the timeout is exceeded, the endpoint returns 202.

Default: false

timeout? integer

Maximum number of seconds to wait when wait=true (default 120, maximum 120, i.e. 2 minutes). For longer jobs, use async mode.

Default: 120
Format: int64
Range: 10 <= value <= 120
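
The two query parameters can be combined as in the following sketch. The function name crawl_url and the choice to reject out-of-range timeouts on the client side are assumptions made for illustration; the 10 to 120 second range and the wait=true spelling come from the parameter descriptions above.

```python
from urllib.parse import urlencode

# Bounds taken from the documented range for `timeout`.
MIN_TIMEOUT_S = 10
MAX_TIMEOUT_S = 120


def crawl_url(base_url: str, wait: bool = False, timeout: int = 120) -> str:
    """Build the crawl endpoint URL with the documented query parameters.

    crawl_url is a hypothetical helper; validating the range locally is a
    design choice of this sketch, not documented server behaviour.
    """
    url = base_url.rstrip("/") + "/api/v1/crawl"
    if not wait:
        # wait defaults to false, so no query string is needed.
        return url
    if not MIN_TIMEOUT_S <= timeout <= MAX_TIMEOUT_S:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT_S} and {MAX_TIMEOUT_S} seconds"
        )
    return url + "?" + urlencode({"wait": "true", "timeout": timeout})
```

For example, crawl_url("http://localhost:8080", wait=True, timeout=30) yields a URL ending in /api/v1/crawl?wait=true&timeout=30.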

Request Body

application/json

llm_config?

Optional LLM configuration override (BYOK)

options?

Crawl configuration options

schema* unknown

JSON Schema defining the data structure to extract. Example: {"name":"string","price":"number","description":"string"}

url* string

Seed URL to start crawling from

Length: 1 <= length

webhook?

Inline ephemeral webhook configuration

webhook_id? string

ID of a saved webhook to call on job events

webhook_url? string

Simple webhook URL (backward compatible)

Format: uri

[key: string]? never
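
The request body rules above can be checked before sending, as in this sketch. The helper build_crawl_body is hypothetical. It assumes that the "[key: string]? never" entry means fields other than the seven listed are not accepted, which is an interpretation of the listing rather than confirmed behaviour.

```python
import json

# Field names as listed in the request body section above.
ALLOWED_FIELDS = {
    "llm_config", "options", "schema", "url",
    "webhook", "webhook_id", "webhook_url",
}
# Fields marked with * are required.
REQUIRED_FIELDS = {"schema", "url"}


def build_crawl_body(**fields) -> str:
    """Validate the fields locally and return the JSON request body.

    Hypothetical helper: the server remains the authority on validation.
    """
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        # Assumption: unlisted keys are not accepted by the endpoint.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = REQUIRED_FIELDS - set(fields)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    url = fields["url"]
    if not isinstance(url, str) or len(url) < 1:
        # Documented constraint: 1 <= length
        raise ValueError("url must be a non-empty string")
    return json.dumps(fields)


body = build_crawl_body(
    url="https://example.com/products",
    schema={"name": "string", "price": "number", "description": "string"},
)
```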

Response Body

application/json

application/problem+json

curl -X POST "http://localhost:8080/api/v1/crawl" \
  -H "Authorization: Bearer rf_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": {"name": "string", "price": "number", "description": "string"},
    "url": "https://example.com/products"
  }'
{
  "cost_usd": 0.15,
  "data": {
    "property1": null,
    "property2": null
  },
  "duration_ms": 12500,
  "error_message": "string",
  "job_id": "01HXYZ123ABC456DEF789",
  "page_count": 5,
  "status": "completed",
  "status_url": "https://api.refyne.uk/api/v1/jobs/01HXYZ123ABC456DEF789",
  "token_usage": {
    "input": 8500,
    "output": 2100
  }
}
{
  "detail": "Property foo is required but is missing.",
  "errors": [
    {
      "location": "string",
      "message": "string",
      "value": null
    }
  ],
  "instance": "https://example.com/error-log/abc123",
  "status": 400,
  "title": "Bad Request",
  "type": "https://example.com/errors/example"
}
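
A client has to tell the two response shapes above apart. The sketch below shows one way to do that. The function interpret_response is hypothetical, and two parts of it are assumptions: that a 202 response carries job_id and status_url like the success example, and that any status other than "completed" should be treated as still pending.

```python
import json


def interpret_response(status_code: int, content_type: str, body: str) -> dict:
    """Classify a crawl response as completed, pending, or error.

    Hypothetical helper based only on the example payloads shown above.
    """
    payload = json.loads(body)
    if content_type.startswith("application/problem+json"):
        # Error shape: title, detail, status, and an errors list.
        return {
            "outcome": "error",
            "title": payload.get("title"),
            "detail": payload.get("detail"),
        }
    if status_code == 202 or payload.get("status") != "completed":
        # Assumption: the job is still running and can be polled later.
        return {
            "outcome": "pending",
            "job_id": payload.get("job_id"),
            "status_url": payload.get("status_url"),
        }
    return {
        "outcome": "completed",
        "job_id": payload.get("job_id"),
        "data": payload.get("data"),
    }
```

Checking the content type before the status code keeps error handling independent of which HTTP error status the server chose.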