Firecrawl provides powerful document parsing capabilities, allowing you to extract structured content from various document formats. This feature is particularly useful for processing files like spreadsheets, Word documents, and more.
Supported Document Formats
Firecrawl currently supports the following document formats:

- Excel Spreadsheets (.xlsx, .xls)
  - Each worksheet is converted to an HTML table
  - Worksheets are separated by H2 headings with the sheet name
  - Preserves cell formatting and data types
- Word Documents (.docx, .doc, .odt, .rtf)
  - Extracts text content while preserving document structure
  - Maintains headings, paragraphs, lists, and tables
  - Preserves basic formatting and styling
- PDF Documents (.pdf)
  - Extracts text content with layout information
  - Preserves document structure including sections and paragraphs
  - Handles both text-based and scanned PDFs (with OCR support)
  - Supports a mode option to control parsing strategy: fast (text-only), auto (text with OCR fallback, default), or ocr (force OCR)
  - Priced at 1 credit per page. See Pricing for details.
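The extension list above can be mirrored in a small client-side check before a request is sent. This is only a sketch: the FORMATS table and the documentTypeFor helper are hypothetical names introduced here, and the check is a guess based on the file extension alone, so it says nothing about how Firecrawl itself detects document types.

```javascript
// Extensions taken from the list of supported formats above.
const FORMATS = {
  excel: [".xlsx", ".xls"],
  word: [".docx", ".doc", ".odt", ".rtf"],
  pdf: [".pdf"],
};

// Hypothetical helper: returns "excel", "word", "pdf", or null.
function documentTypeFor(urlOrPath) {
  // Drop any query string or fragment before checking the extension.
  const clean = urlOrPath.split(/[?#]/)[0].toLowerCase();
  for (const [type, extensions] of Object.entries(FORMATS)) {
    if (extensions.some((ext) => clean.endsWith(ext))) return type;
  }
  return null;
}

console.log(documentTypeFor("https://example.com/Report.XLSX?dl=1")); // excel
```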
PDF Parsing Modes
Use the parsers option to control how PDFs are processed:
| Mode | Description |
|---|---|
| auto | Attempts fast text-based extraction first, falls back to OCR if needed. This is the default. |
| fast | Text-based parsing only (embedded text). Fastest option, but will not extract text from scanned or image-heavy pages. |
| ocr | Forces OCR parsing on every page. Use for scanned documents or when auto misclassifies a page. |
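As a rough sketch of how a mode might be chosen per request, the snippet below builds a /v2/scrape request body in Node and validates the mode against the three values in the table. Several details are assumptions not confirmed by this page: the shape of the parsers entry ({ type: "pdf", mode }), the formats field, the api.firecrawl.dev host, Bearer-token auth, and the helper names. Check the API reference before relying on any of them.

```javascript
// Modes listed in the table above.
const PDF_MODES = ["auto", "fast", "ocr"];

// Hypothetical helper: builds a /v2/scrape body for a PDF URL.
// ASSUMPTION: the parsers option takes entries shaped like { type: "pdf", mode }.
function buildPdfScrapeBody(url, mode = "auto") {
  if (!PDF_MODES.includes(mode)) {
    throw new Error(`Unknown PDF mode "${mode}"; expected one of ${PDF_MODES.join(", ")}`);
  }
  return {
    url,
    formats: ["markdown"], // ASSUMPTION: request markdown output
    parsers: [{ type: "pdf", mode }],
  };
}

// Sends the request. Requires Node 18+ for the global fetch.
// ASSUMPTION: base URL and auth scheme; FIRECRAWL_API_KEY is a placeholder env var.
async function scrapePdf(url, mode) {
  const response = await fetch("https://api.firecrawl.dev/v2/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildPdfScrapeBody(url, mode)),
  });
  if (!response.ok) throw new Error(`Scrape failed with HTTP ${response.status}`);
  return response.json();
}

// Usage (placeholder URL): force OCR for a scanned document.
// scrapePdf("https://example.com/scanned.pdf", "ocr").then(console.log);
```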
How to Use Document Parsing
Document parsing in Firecrawl works in two ways:

- URL-based parsing (/v2/scrape): provide a URL that points to a supported document type.
- File upload parsing (/v2/parse): upload file bytes directly with multipart/form-data.
Upload documents with /v2/parse
Use /v2/parse when the source document is local or not publicly accessible by URL.
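A minimal sketch of such an upload in Node 18+ follows, using the built-in fetch, FormData, and Blob. This page does not name the multipart field, so the "file" field name is an assumption, as are the api.firecrawl.dev host, the Bearer-token header, and the helper names buildParseForm and parseLocalFile.

```javascript
// Hypothetical helper: wraps file bytes in a multipart/form-data payload.
// ASSUMPTION: the upload field is named "file"; this page does not name it.
function buildParseForm(bytes, filename) {
  const form = new FormData();
  form.append("file", new Blob([bytes]), filename);
  return form;
}

// Reads a local document and uploads it to /v2/parse.
// ASSUMPTION: base URL and auth scheme; FIRECRAWL_API_KEY is a placeholder env var.
async function parseLocalFile(path) {
  const { readFile } = await import("node:fs/promises");
  const { basename } = await import("node:path");
  const bytes = await readFile(path);
  const response = await fetch("https://api.firecrawl.dev/v2/parse", {
    method: "POST",
    // Content-Type is left unset so fetch can add the multipart boundary.
    headers: { Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}` },
    body: buildParseForm(bytes, basename(path)),
  });
  if (!response.ok) throw new Error(`Parse failed with HTTP ${response.status}`);
  return response.json();
}

// Usage (placeholder path):
// parseLocalFile("./reports/q3.docx").then(console.log);
```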
Example: Scraping an Excel File
Node
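The original snippet for this example is not present here, so the following is a stand-in sketch rather than the official sample. It posts a spreadsheet URL to /v2/scrape with the built-in fetch from Node 18+. The base URL, the request body fields (url, formats), the auth header, and the helper names are assumptions, and the spreadsheet URL is a placeholder.

```javascript
// ASSUMPTION: base URL; confirm against the API reference.
const FIRECRAWL_BASE_URL = "https://api.firecrawl.dev";

// Hypothetical helper: describes the HTTP request without sending it.
function buildScrapeRequest(documentUrl, apiKey) {
  return {
    endpoint: `${FIRECRAWL_BASE_URL}/v2/scrape`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // ASSUMPTION: the body carries the document URL and the desired formats.
      body: JSON.stringify({ url: documentUrl, formats: ["markdown"] }),
    },
  };
}

// Sends the request and returns the parsed JSON response.
async function scrapeSpreadsheet(documentUrl) {
  const { endpoint, options } = buildScrapeRequest(documentUrl, process.env.FIRECRAWL_API_KEY);
  const response = await fetch(endpoint, options);
  if (!response.ok) throw new Error(`Scrape failed with HTTP ${response.status}`);
  return response.json();
}

// Usage (placeholder URL):
// scrapeSpreadsheet("https://example.com/data.xlsx").then(console.log);
```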
Example: Scraping a Word Document
Node
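The original snippet for this example is also missing, so this is a hedged stand-in that focuses on reading the result. It scrapes a Word document URL and pulls the markdown out of the response. The response shape ({ success: true, data: { markdown } }) is an assumption, along with the base URL, auth header, and the helper names extractMarkdown and scrapeWordDocument.

```javascript
// ASSUMPTION: a successful response looks like { success: true, data: { markdown: "..." } }.
function extractMarkdown(result) {
  if (!result || result.success !== true || typeof result.data?.markdown !== "string") {
    throw new Error("Unexpected scrape response shape");
  }
  return result.data.markdown;
}

// Scrapes a Word document by URL and returns its markdown. Requires Node 18+.
// ASSUMPTION: base URL, auth scheme, and body fields; FIRECRAWL_API_KEY is a placeholder.
async function scrapeWordDocument(documentUrl) {
  const response = await fetch("https://api.firecrawl.dev/v2/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: documentUrl, formats: ["markdown"] }),
  });
  if (!response.ok) throw new Error(`Scrape failed with HTTP ${response.status}`);
  return extractMarkdown(await response.json());
}

// Usage (placeholder URL):
// scrapeWordDocument("https://example.com/contract.docx").then(console.log);
```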
Output Format
All supported document types are converted to clean, structured markdown. For example, an Excel file with multiple sheets is converted to one section per sheet, each introduced by an H2 heading with the sheet name.
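To illustrate that per-sheet structure, the sketch below splits converted output on its H2 headings. The sample string is hypothetical and only mimics the layout described earlier (an H2 heading with the sheet name, followed by an HTML table); real output depends on the workbook. The splitBySheet helper is a name introduced here, not part of Firecrawl.

```javascript
// Splits converted markdown into one entry per H2 section.
// Any text before the first H2 heading is ignored.
function splitBySheet(markdown) {
  const sheets = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^##\s+(.+)$/);
    if (heading) {
      current = { name: heading[1].trim(), lines: [] };
      sheets.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }
  return sheets.map(({ name, lines }) => ({ name, content: lines.join("\n").trim() }));
}

// Hypothetical sample output for a two-sheet workbook.
const sample = [
  "## Sales",
  "",
  "<table><tr><th>Region</th><th>Total</th></tr><tr><td>North</td><td>120</td></tr></table>",
  "",
  "## Inventory",
  "",
  "<table><tr><th>Item</th><th>Count</th></tr><tr><td>Widget</td><td>8</td></tr></table>",
].join("\n");

console.log(splitBySheet(sample).map((sheet) => sheet.name)); // [ 'Sales', 'Inventory' ]
```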

