Extraction

Extract structured data from any webpage

The extraction endpoint is the core of Refyne. It takes a URL and a schema, and returns structured data.

Basic Extraction

Send a POST request containing the target URL and a schema that describes the fields to extract:

curl -X POST https://api.refyne.uk/api/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "schema": {
      "name": "string",
      "price": "number",
      "in_stock": "boolean"
    }
  }'
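
The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_extract_request` and `extract` are hypothetical, and the 30-second timeout is an arbitrary assumption. The endpoint, headers, and body shape are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.refyne.uk/api/v1/extract"  # endpoint from the curl example


def build_extract_request(url, schema, api_key):
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps({"url": url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract(url, schema, api_key, timeout=30):
    """Send the request and decode the JSON response body.

    The timeout value is an assumption made for this sketch.
    """
    request = build_extract_request(url, schema, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log the payload without touching the network.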

Schema Types

Schemas support the following types:

Type      Description       Example
string    Text content      "name": "string"
number    Numeric values    "price": "number"
boolean   True/false        "in_stock": "boolean"
array     List of items     "tags": ["string"]
object    Nested structure  "author": { "name": "string" }
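
A schema can be checked locally before it is sent. The sketch below is a hypothetical client-side helper (`validate_schema` is not part of Refyne) and it assumes, based only on the examples in the table, that an array is written as a one-element list holding the item schema and that an object is written as a nested mapping.

```python
SCALAR_TYPES = {"string", "number", "boolean"}


def validate_schema(schema, path="schema"):
    """Return a list of problems found in a schema; an empty list means none were found."""
    problems = []
    if isinstance(schema, str):
        if schema not in SCALAR_TYPES:
            problems.append(f"{path}: unknown type {schema!r}")
    elif isinstance(schema, list):
        # Assumption: arrays are a one-element list holding the item schema.
        if len(schema) != 1:
            problems.append(f"{path}: array must contain exactly one item schema")
        else:
            problems.extend(validate_schema(schema[0], f"{path}[]"))
    elif isinstance(schema, dict):
        # Objects are validated field by field, recursively.
        for key, value in schema.items():
            problems.extend(validate_schema(value, f"{path}.{key}"))
    else:
        problems.append(f"{path}: unsupported schema value {schema!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every mistake in a schema at once.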

Nested Schemas

Extract complex nested data:

{
  "url": "https://example.com/article",
  "schema": {
    "title": "string",
    "author": {
      "name": "string",
      "bio": "string"
    },
    "comments": [{
      "author": "string",
      "text": "string",
      "likes": "number"
    }]
  }
}
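
To check that returned data has the shape a nested schema asks for, a recursive comparison is one option. The sketch below is an illustration under assumptions: `matches_schema` is a hypothetical helper, and it assumes the extracted data mirrors the schema exactly, with every field present and non-null, which the API may not guarantee.

```python
def matches_schema(data, schema):
    """Return True if data has the shape described by schema."""
    if schema == "string":
        return isinstance(data, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(data, (int, float)) and not isinstance(data, bool)
    if schema == "boolean":
        return isinstance(data, bool)
    if isinstance(schema, list):
        # Assumption: a one-element list describes the schema of every array item.
        return (
            len(schema) == 1
            and isinstance(data, list)
            and all(matches_schema(item, schema[0]) for item in data)
        )
    if isinstance(schema, dict):
        # Every field named in the schema must be present and match recursively.
        return isinstance(data, dict) and all(
            key in data and matches_schema(data[key], sub)
            for key, sub in schema.items()
        )
    return False
```

For the article schema above, a result with a `title` string, an `author` object, and a list of comment objects would pass, while a comment whose `likes` is a string would fail.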

LLM Configuration

Specify which LLM provider and model to use:

{
  "url": "https://example.com",
  "schema": { ... },
  "llm_config": {
    "provider": "openrouter",
    "model": "anthropic/claude-3.5-sonnet"
  }
}
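
A request body with an optional `llm_config` could be assembled as follows. This is a sketch with a hypothetical helper name (`build_payload`). It omits `llm_config` when no provider or model is given, as in the basic example, and it assumes that `provider` and `model` must be supplied together, which the documentation above does not state.

```python
def build_payload(url, schema, provider=None, model=None):
    """Build the JSON-serializable request body, adding llm_config only when requested."""
    payload = {"url": url, "schema": schema}
    if provider is None and model is None:
        return payload  # no llm_config key, as in the basic example
    if provider is None or model is None:
        # Assumption for this sketch: the two fields are always set together.
        raise ValueError("provider and model must be given together")
    payload["llm_config"] = {"provider": provider, "model": model}
    return payload
```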

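Response Format

The response contains the extracted data along with the source URL, the fetch timestamp, token usage, and timing metadata: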

{
  "data": { ... },
  "url": "https://example.com/product/123",
  "fetched_at": "2024-01-15T10:30:00Z",
  "usage": {
    "input_tokens": 1523,
    "output_tokens": 234,
    "cost_usd": 0.0045
  },
  "metadata": {
    "fetch_duration_ms": 342,
    "extract_duration_ms": 1205,
    "model": "claude-3.5-sonnet",
    "provider": "anthropic"
  }
}
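
The usage and timing fields can be condensed into a small summary on the client side. The sketch below is illustrative: `ExtractionSummary` and `summarize` are hypothetical names, and adding `fetch_duration_ms` to `extract_duration_ms` assumes the two phases run one after the other without overlap.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExtractionSummary:
    url: str
    fetched_at: datetime
    total_tokens: int
    cost_usd: float
    total_duration_ms: int
    model: str


def summarize(response):
    """Condense a decoded extraction response (a dict) into an ExtractionSummary."""
    usage = response["usage"]
    metadata = response["metadata"]
    return ExtractionSummary(
        url=response["url"],
        # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
        fetched_at=datetime.fromisoformat(response["fetched_at"].replace("Z", "+00:00")),
        total_tokens=usage["input_tokens"] + usage["output_tokens"],
        cost_usd=usage["cost_usd"],
        # Assumption: fetch and extract run sequentially, so durations add up.
        total_duration_ms=metadata["fetch_duration_ms"] + metadata["extract_duration_ms"],
        model=f'{metadata["provider"]}/{metadata["model"]}',
    )
```

With the sample values shown above, this gives 1757 total tokens and a combined duration of 1547 ms.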