Reference for every option across Firecrawl’s scrape, crawl, map, and agent endpoints.Documentation Index
Fetch the complete documentation index at: https://firecrawl-mog-search-exclude-include-domains.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Basic scraping
To scrape a single page and get clean markdown content, use the/scrape endpoint.
Scraping PDFs
Firecrawl supports PDFs. Use theparsers option (e.g., parsers: ["pdf"]) when you want to ensure PDF parsing. You can control the parsing strategy with the mode option:
auto(default) — attempts fast text-based extraction first, then falls back to OCR if needed.fast— text-based parsing only (embedded text). Fastest, but skips scanned/image-heavy pages.ocr— forces OCR parsing on every page. Use for scanned documents or whenautomisclassifies a page.
{ type: "pdf" } and "pdf" both default to mode: "auto".
Scrape options
When using the/scrape endpoint, you can customize the request with the following options.
Formats (formats)
The formats array controls which output types the scraper returns. Default: ["markdown"].
String formats: pass the name directly (e.g. "markdown").
| Format | Description |
|---|---|
markdown | Page content converted to clean Markdown. |
html | Processed HTML with unnecessary elements removed. |
rawHtml | Original HTML exactly as returned by the server. |
links | All links found on the page. |
images | All images found on the page. |
summary | An LLM-generated summary of the page content. |
branding | Extracts brand identity (colors, fonts, typography, spacing, UI components). |
type and additional options.
| Format | Options | Description |
|---|---|---|
json | prompt?: string, schema?: object | Extract structured data using an LLM. Provide a JSON schema and/or a natural-language prompt (max 10,000 characters). |
screenshot | fullPage?: boolean, quality?: number, viewport?: { width, height } | Capture a screenshot. Max one per request. Viewport max resolution is 7680×4320. Screenshot URLs expire after 24 hours. |
changeTracking | modes?: ("json" | "git-diff")[], tag?: string, schema?: object, prompt?: string | Track changes between scrapes. Requires "markdown" to also be in the formats array. |
attributes | selectors: [{ selector: string, attribute: string }] | Extract specific HTML attributes from elements matching CSS selectors. |
Content filtering
These parameters control which parts of the page appear in the output. WhenonlyMainContent is true (the default), boilerplate (nav, footer, etc.) is stripped. includeTags and excludeTags are applied against the original page DOM, not the post-filtered result, so your selectors should target elements as they appear in the source HTML. Set onlyMainContent: false to use the full page as the starting point for tag filtering.
| Parameter | Type | Default | Description |
|---|---|---|---|
onlyMainContent | boolean | true | Return only the main content. Set false for the full page. |
includeTags | array | — | CSS selectors to include — tags, classes, IDs, or attribute selectors (e.g. ["h1", "p", ".main-content", "[data-testid=\"main\"]"]). |
excludeTags | array | — | CSS selectors to exclude — tags, classes, IDs, or attribute selectors (e.g. ["#ad", "#footer", "[role=\"banner\"]"]). |
Timing and cache
| Parameter | Type | Default | Description |
|---|---|---|---|
waitFor | integer (ms) | 0 | Extra wait time before scraping, on top of smart-wait. Use sparingly. |
maxAge | integer (ms) | 172800000 | Return a cached version if fresher than this value (default is 2 days). Set 0 to always fetch fresh. |
timeout | integer (ms) | 60000 | Max request duration before aborting (default is 60 seconds). Minimum is 1000 (1 second). |
PDF parsing
| Parameter | Type | Default | Description |
|---|---|---|---|
parsers | array | ["pdf"] | Controls PDF processing. [] to skip parsing and return base64 (1 credit flat). |
| Property | Type | Default | Description |
|---|---|---|---|
type | "pdf" | (required) | Parser type. |
mode | "fast" | "auto" | "ocr" | "auto" | fast: text-based extraction only. auto: fast with OCR fallback. ocr: force OCR. |
maxPages | integer | — | Cap the number of pages to parse. |
Actions
Run browser actions before scraping. This is useful for dynamic content, navigation, or user-gated pages. You can include up to 50 actions per request, and the combined wait time across allwait actions and waitFor must not exceed 60 seconds.
| Action | Parameters | Description |
|---|---|---|
wait | milliseconds?: number, selector?: string | Wait for a fixed duration or until an element is visible (provide one, not both). When using selector, times out after 30 seconds. |
click | selector: string, all?: boolean | Click an element matching the CSS selector. Set all: true to click every match. |
write | text: string | Type text into the currently focused field. You must focus the element with a click action first. |
press | key: string | Press a keyboard key (e.g. "Enter", "Tab", "Escape"). |
scroll | direction?: "up" | "down", selector?: string | Scroll the page or a specific element. Direction defaults to "down". |
screenshot | fullPage?: boolean, quality?: number, viewport?: { width, height } | Capture a screenshot. Max viewport resolution is 7680×4320. |
scrape | (none) | Capture the current page HTML at this point in the action sequence. |
executeJavascript | script: string | Run JavaScript code in the page. Return values are available in the actions.javascriptReturns array of the response. |
pdf | format?: string, landscape?: boolean, scale?: number | Generate a PDF. Supported formats: "A0" through "A6", "Letter", "Legal", "Tabloid", "Ledger". Defaults to "Letter". |
Action execution notes
- Write requires a preceding
clickto focus the target element. - Scroll accepts an optional
selectorto scroll a specific element instead of the page. - Wait accepts either
milliseconds(fixed delay) orselector(wait until visible). - Actions run sequentially: each step completes before the next begins.
- Actions are not supported for PDFs. If the URL resolves to a PDF the request will fail.
Advanced action examples
Taking a screenshot:cURL
cURL
cURL
cURL
executeJavascript action is captured in the actions.javascriptReturns array of the response.
Full scrape example
The following request combines multiple scrape options:cURL
<h1>, <p>, <a>, and .main-content while excluding #ad and #footer, waits 1 second before scraping, sets a 15 second timeout, and enables PDF parsing.
See the full Scrape API reference for details.
JSON extraction via formats
Use the JSON format object informats to extract structured data in one pass:
Agent endpoint
Use the/v2/agent endpoint for autonomous, multi-page data extraction. The agent runs asynchronously: you start a job, then poll for results.
Agent options
| Parameter | Type | Default | Description |
|---|---|---|---|
prompt | string | (required) | Natural-language instructions describing what data to extract (max 10,000 characters). |
urls | array | — | URLs to constrain the agent to. |
schema | object | — | JSON schema to structure the extracted data. |
maxCredits | number | 2500 | Maximum credits the agent can spend. The dashboard supports up to 2,500; for higher limits, set this via the API (values above 2,500 are always billed as paid requests). |
strictConstrainToURLs | boolean | false | When true, the agent only visits the provided URLs. |
model | string | "spark-1-mini" | AI model to use. "spark-1-mini" (default, 60% cheaper) or "spark-1-pro" (higher accuracy). |
Check agent status
PollGET /v2/agent/{jobId} to check progress. The response status field will be "processing", "completed", or "failed".
cURL
firecrawl.agent()) that starts the job and polls automatically until completion.
Crawling multiple pages
To crawl multiple pages, use the/v2/crawl endpoint. The crawl runs asynchronously and returns a job ID. Use the limit parameter to control how many pages are crawled. If omitted, the crawl will process up to 10,000 pages.
cURL
Response
Check crawl job
Use the job ID to check the status of a crawl and retrieve its results.cURL
next parameter, a URL to the next page of results.
Crawl prompt and params preview
You can provide a natural-languageprompt to let Firecrawl derive crawl settings. Preview them first:
cURL
Crawler options
When using the/v2/crawl endpoint, you can customize crawling behavior with the following options.
Path filtering
| Parameter | Type | Default | Description |
|---|---|---|---|
includePaths | array | — | Regex patterns for URLs to include (pathname only by default). |
excludePaths | array | — | Regex patterns for URLs to exclude (pathname only by default). |
regexOnFullURL | boolean | false | Match patterns against the full URL instead of just the pathname. |
Crawl scope
| Parameter | Type | Default | Description |
|---|---|---|---|
maxDiscoveryDepth | integer | — | Max link-depth for discovering new URLs. |
limit | integer | 10000 | Max pages to crawl. |
crawlEntireDomain | boolean | false | Explore siblings and parents to cover the entire domain. |
allowExternalLinks | boolean | false | Follow links to external domains. |
allowSubdomains | boolean | false | Follow subdomains of the main domain. |
delay | number (s) | — | Delay between scrapes. Setting this forces concurrency to 1. |
Sitemap and deduplication
| Parameter | Type | Default | Description |
|---|---|---|---|
sitemap | string | "include" | "include": use sitemap + link discovery. "skip": ignore sitemap. "only": crawl only sitemap URLs. |
deduplicateSimilarURLs | boolean | true | Normalize URL variants (www., https, trailing slashes, index.html) as duplicates. |
ignoreQueryParameters | boolean | false | Strip query strings before deduplication (e.g. /page?a=1 and /page?a=2 become one URL). |
Scrape options for crawl
| Parameter | Type | Default | Description |
|---|---|---|---|
scrapeOptions | object | { formats: ["markdown"] } | Per-page scrape config. Accepts all scrape options above. |
Crawl example
cURL
Mapping website links
The/v2/map endpoint identifies URLs related to a given website.
cURL
Map options
| Parameter | Type | Default | Description |
|---|---|---|---|
search | string | — | Filter links by text match. |
limit | integer | 100 | Max links to return. |
sitemap | string | "include" | "include", "skip", or "only". |
includeSubdomains | boolean | true | Include subdomains. |
Whitelisting Firecrawl
Allowing Firecrawl to scrape your website
- User Agent: Allow
FirecrawlAgentin your firewall or security rules. - IP addresses: Firecrawl does not use a fixed set of outbound IPs.
Allowing your application to call the Firecrawl API
If your firewall blocks outbound requests from your application to external services, you need to whitelist Firecrawl’s API server IP address so your application can reach the Firecrawl API (api.firecrawl.dev):
- IP Address:
35.245.250.27

