Batch Scrape

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

urls

string<uri>[]

required

The URL to scrape

webhook

object

A webhook specification object.

Show child attributes

maxConcurrency

integer

Maximum number of concurrent scrapes. This parameter allows you to set a concurrency limit for this batch scrape. If not specified, the batch scrape adheres to your team's concurrency limit.

ignoreInvalidURLs

boolean

default:true

If invalid URLs are specified in the urls array, they will be ignored. Instead of them failing the entire request, a batch scrape using the remaining valid URLs will be created, and the invalid URLs will be returned in the invalidURLs field of the response.

formats

Output formats to include in the response. You can specify one or more formats, either as strings (e.g., 'markdown') or as objects with additional options (e.g., { type: 'json', schema: {...} }). Some formats require specific options to be set. Example: ['markdown', { type: 'json', schema: {...} }].

Show child attributes

onlyMainContent

boolean

default:true

Only return the main content of the page excluding headers, navs, footers, etc. This is a deterministic HTML-level filter applied before markdown is generated; no LLM is involved.

onlyCleanContent

boolean

default:false

Beta. Run an additional LLM-based pass over the generated markdown to remove residual boilerplate that onlyMainContent can miss (cookie banners, ad blocks, social share widgets, breadcrumbs, newsletter signups, comment sections, related-article lists). Headings, lists, tables, code blocks, image references, and inline links are preserved. Can be combined with onlyMainContent (the most common setup) or used on its own. Skipped with a warning when the markdown exceeds the cleaning model's output token limit (the original markdown is preserved). Not supported on zero-data-retention requests.

includeTags

string[]

Tags to include in the output.

excludeTags

string[]

Tags to exclude from the output.

maxAge

integer

default:172800000

Returns a cached version of the page if it is younger than this age in milliseconds. If a cached version of the page is older than this value, the page will be scraped. If you do not need extremely fresh data, enabling this can speed up your scrapes by 500%. Defaults to 2 days.

minAge

integer

When set, the request only checks the cache and never triggers a fresh scrape. The value is in milliseconds and specifies the minimum age the cached data must be. If matching cached data exists, it is returned instantly. If no cached data is found, a 404 with error code SCRAPE_NO_CACHED_DATA is returned. Set to 1 to accept any cached data regardless of age.

headers

object

Headers to send with the request. Can be used to send cookies, user-agent, etc.

waitFor

integer

default:0

Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load. This waiting time is in addition to Firecrawl's smart wait feature.

mobile

boolean

default:false

Set to true if you want to emulate scraping from a mobile device. Useful for testing responsive pages and taking mobile screenshots.

skipTlsVerification

boolean

default:true

Skip TLS certificate verification when making requests.

timeout

integer

default:60000

Timeout in milliseconds for the request. Minimum is 1000 (1 second). Default is 60000 (60 seconds). Maximum is 300000 (300 seconds).

Required range: 1000 <= x <= 300000

parsers

object[]

Controls how files are processed during scraping. When "pdf" is included (default), the PDF content is extracted and converted to markdown format, with billing based on the number of pages (1 credit per page). When an empty array is passed, the PDF file is returned in base64 encoding with a flat rate of 1 credit for the entire PDF.

Show child attributes

actions

Actions to perform on the page before grabbing the content

Show child attributes

location

object

Location settings for the request. When specified, this will use an appropriate proxy if available and emulate the corresponding language and timezone settings. Defaults to 'US' if not specified.

Show child attributes

removeBase64Images

boolean

default:true

Removes all base 64 images from the markdown output, which may be overwhelmingly long. This does not affect html or rawHtml formats. The image's alt text remains in the output, but the URL is replaced with a placeholder.

blockAds

boolean

default:true

Enables ad-blocking and cookie popup blocking.

proxy

enum<string>

default:auto

Specifies the type of proxy to use.

basic: Proxies for scraping sites with none to basic anti-bot solutions. Fast and usually works.
enhanced: Enhanced proxies for scraping sites with advanced anti-bot solutions. Slower, but more reliable on certain sites. Costs up to 5 credits per request.
auto: Firecrawl will automatically retry scraping with enhanced proxies if the basic proxy fails. If the retry with enhanced is successful, 5 credits will be billed for the scrape. If the first attempt with basic is successful, only the regular cost will be billed.

Available options:

basic,

enhanced,

auto

storeInCache

boolean

default:true

If true, the page will be stored in the Firecrawl index and cache. Setting this to false is useful if your scraping activity may have data protection concerns. Using some parameters associated with sensitive scraping (e.g. actions, headers) will force this parameter to be false.

lockdown

boolean

default:false

If true, serves the request from Firecrawl's cache only and never makes an outbound request to the target URL. Designed for compliance-constrained or air-gapped environments where the scrape request itself could leak sensitive information. On cache miss, returns a 404 with error code SCRAPE_LOCKDOWN_CACHE_MISS (the URL is never logged on miss). Lockdown requests are treated as zero data retention. Default maxAge is extended to 2 years so existing cached pages remain eligible. Billed at 5 credits on hit, 1 credit on cache miss.

profile

object

Enable persistent browser storage across scrape and interact sessions. Pass a profile when scraping to preserve cookies, localStorage, and session data. Sessions with the same profile name share browser state.

Show child attributes

zeroDataRetention

boolean

default:false

If true, this will enable zero data retention for this batch scrape. To enable this feature, please contact help@firecrawl.dev

Response

Successful response

success

boolean

string

url

string<uri>

invalidURLs

string[] | null

If ignoreInvalidURLs is true, this is an array containing the invalid URLs that were specified in the request. If there were no invalid URLs, this will be an empty array. If ignoreInvalidURLs is false, this field will be undefined.

Using the API

Search Endpoints

Scrape Endpoints

Interact Endpoints

Map Endpoints

Parse Endpoints

Crawl Endpoints

Agent Endpoints

Extract Endpoints

Account Endpoints

Webhook Payloads

Authorizations

Body

Response

Using the API

Search Endpoints

Scrape Endpoints

Interact Endpoints

Map Endpoints

Parse Endpoints

Crawl Endpoints

Agent Endpoints

Extract Endpoints

Account Endpoints

Webhook Payloads

Documentation Index

Authorizations

Body

Response