The official .NET SDK is maintained in the Firecrawl monorepo under apps/dot-net-sdk.
To install the Firecrawl .NET SDK, add the following NuGet package:
.NET CLI:

dotnet add package firecrawl-sdk

Package Manager:

Install-Package firecrawl-sdk

PackageReference:

<PackageReference Include="firecrawl-sdk" Version="1.0.0" />
- Get an API key from firecrawl.dev
- Set the API key as an environment variable named FIRECRAWL_API_KEY, or pass it to the FirecrawlClient constructor
Here is a quick example based on the current SDK API:
using Firecrawl;
using Firecrawl.Models;

var client = new FirecrawlClient("fc-your-api-key");

// Scrape a single page
var doc = await client.ScrapeAsync("https://firecrawl.dev",
    new ScrapeOptions { Formats = new List<object> { "markdown" } });

// Crawl a website
var job = await client.CrawlAsync("https://firecrawl.dev",
    new CrawlOptions { Limit = 5 });

Console.WriteLine(doc.Markdown);
Console.WriteLine($"Crawled pages: {job.Data?.Count ?? 0}");
To scrape a single URL, use the ScrapeAsync method.
using Firecrawl.Models;

var doc = await client.ScrapeAsync("https://firecrawl.dev",
    new ScrapeOptions
    {
        Formats = new List<object> { "markdown", "html" },
        OnlyMainContent = true,
        WaitFor = 5000
    });

Console.WriteLine(doc.Markdown);
Console.WriteLine(doc.Metadata?["title"]);
Use JsonFormat with the scrape endpoint to extract structured JSON:
using Firecrawl.Models;

var jsonFmt = new JsonFormat
{
    Prompt = "Extract the product name and price",
    Schema = new Dictionary<string, object>
    {
        ["type"] = "object",
        ["properties"] = new Dictionary<string, object>
        {
            ["name"] = new Dictionary<string, object> { ["type"] = "string" },
            ["price"] = new Dictionary<string, object> { ["type"] = "number" }
        }
    }
};

var doc = await client.ScrapeAsync("https://example.com/product",
    new ScrapeOptions
    {
        Formats = new List<object> { jsonFmt }
    });

Console.WriteLine(doc.Json);
To crawl a website and wait for the crawl to finish, use CrawlAsync. The method handles polling and pagination automatically.
using Firecrawl.Models;

var job = await client.CrawlAsync("https://firecrawl.dev",
    new CrawlOptions
    {
        Limit = 50,
        MaxDiscoveryDepth = 3,
        ScrapeOptions = new ScrapeOptions
        {
            Formats = new List<object> { "markdown" }
        }
    });

Console.WriteLine($"Status: {job.Status}");
Console.WriteLine($"Progress: {job.Completed}/{job.Total}");

if (job.Data != null)
{
    foreach (var page in job.Data)
    {
        Console.WriteLine(page.Metadata?["sourceURL"]);
    }
}
Use StartCrawlAsync to start a crawl job without waiting for it to finish.
using Firecrawl.Models;

var start = await client.StartCrawlAsync("https://firecrawl.dev",
    new CrawlOptions { Limit = 100 });

Console.WriteLine($"Job ID: {start.Id}");
Use GetCrawlStatusAsync to check the progress of a crawl.
var status = await client.GetCrawlStatusAsync(start.Id!);

Console.WriteLine($"Status: {status.Status}");
Console.WriteLine($"Progress: {status.Completed}/{status.Total}");
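When a job is started without waiting, the caller usually ends up polling its status. The following is a minimal sketch of such a loop, not the SDK's own implementation. PollUntilAsync is a hypothetical helper, and the status source here is a stand-in counter so that the snippet runs without an API key.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the SDK): fetch a status repeatedly
// until isDone returns true, waiting `interval` between attempts.
static async Task<T> PollUntilAsync<T>(
    Func<CancellationToken, Task<T>> fetch,
    Func<T, bool> isDone,
    TimeSpan interval,
    CancellationToken cancellationToken = default)
{
    while (true)
    {
        var current = await fetch(cancellationToken);
        if (isDone(current)) return current;
        // Task.Delay observes the token, so a cancelled caller stops waiting promptly.
        await Task.Delay(interval, cancellationToken);
    }
}

// Stand-in status source: reports "done" on the third call.
var calls = 0;
var final = await PollUntilAsync(
    _ => Task.FromResult(++calls),
    n => n >= 3,
    TimeSpan.FromMilliseconds(10));

Console.WriteLine($"Finished after {final} polls");
```

With the SDK, fetch would presumably wrap client.GetCrawlStatusAsync(start.Id!) and isDone would inspect status.Status. This page does not list the terminal status values, so check them against the SDK before relying on a specific string.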
Use CancelCrawlAsync to cancel a running crawl job.
var result = await client.CancelCrawlAsync(start.Id!);
Console.WriteLine(result);
Use MapAsync to discover the links on a website.
using Firecrawl.Models;

var data = await client.MapAsync("https://firecrawl.dev",
    new MapOptions
    {
        Limit = 100,
        Search = "blog"
    });

if (data.Links != null)
{
    foreach (var link in data.Links)
    {
        Console.WriteLine(link);
    }
}
Use SearchAsync to run a search, optionally with search settings.
using Firecrawl.Models;

var results = await client.SearchAsync("firecrawl web scraping",
    new SearchOptions
    {
        Limit = 10,
        Location = "US"
    });

if (results.Web != null)
{
    foreach (var hit in results.Web)
    {
        Console.WriteLine($"{hit.Title} - {hit.Url}");
    }
}
Use BatchScrapeAsync to scrape multiple URLs in parallel. The method handles polling and pagination automatically.
using Firecrawl.Models;

var urls = new List<string>
{
    "https://firecrawl.dev",
    "https://firecrawl.dev/blog"
};

var job = await client.BatchScrapeAsync(urls,
    new BatchScrapeOptions
    {
        Options = new ScrapeOptions
        {
            Formats = new List<object> { "markdown" }
        }
    });

if (job.Data != null)
{
    foreach (var doc in job.Data)
    {
        Console.WriteLine(doc.Markdown);
    }
}
To make sure a duplicate request is not processed twice, pass an IdempotencyKey:
var job = await client.BatchScrapeAsync(urls,
    new BatchScrapeOptions
    {
        IdempotencyKey = "my-unique-key",
        Options = new ScrapeOptions
        {
            Formats = new List<object> { "markdown" }
        }
    });
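One possible way to choose the key is to derive it from the batch itself, so that a retry of the same batch reuses the same value. The sketch below assumes .NET 5 or later. BuildIdempotencyKey is a hypothetical helper, not part of the SDK, and the page does not say what key format the API expects beyond a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not part of the SDK): the same set of URLs yields the same key.
static string BuildIdempotencyKey(IEnumerable<string> urls)
{
    // Sort first so the key does not depend on the order of the URLs.
    var canonical = string.Join("\n", urls.OrderBy(u => u, StringComparer.Ordinal));
    var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
    return Convert.ToHexString(hash).ToLowerInvariant();
}

var batch = new List<string> { "https://firecrawl.dev", "https://firecrawl.dev/blog" };
Console.WriteLine(BuildIdempotencyKey(batch));
```

A content-derived key also deduplicates a deliberate re-run of the same batch later on. If that is not wanted, generating one Guid.NewGuid().ToString() per logical request and reusing it across retries is a simpler alternative.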
Check concurrency and remaining credits:
using Firecrawl.Models;

var concurrency = await client.GetConcurrencyAsync();
Console.WriteLine($"Concurrency: {concurrency.Current}/{concurrency.MaxConcurrency}");

var credits = await client.GetCreditUsageAsync();
Console.WriteLine($"Remaining credits: {credits.RemainingCredits}");
All methods in the .NET SDK are asynchronous and return Task<T>. They support cooperative cancellation through a CancellationToken.
using Firecrawl.Models;

// Cancel the request automatically if it takes longer than 30 seconds
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

var doc = await client.ScrapeAsync("https://example.com",
    new ScrapeOptions
    {
        Formats = new List<object> { "markdown" }
    },
    cancellationToken: cts.Token);

Console.WriteLine(doc.Markdown);
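The example above does not show what happens when the token fires. The sketch below illustrates the general .NET pattern with a stand-in operation (SlowOperationAsync is hypothetical). Whether the SDK surfaces OperationCanceledException directly or wraps it in one of its own exceptions is not stated here, so treat the catch clause as an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for a slow SDK call; it honors the token the way an awaited HTTP request typically does.
static async Task<string> SlowOperationAsync(CancellationToken cancellationToken)
{
    await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
    return "done";
}

// The source cancels its token automatically after 100 ms.
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));

string outcome;
try
{
    outcome = await SlowOperationAsync(timeoutCts.Token);
}
catch (OperationCanceledException)
{
    // Task.Delay throws TaskCanceledException, which derives from OperationCanceledException.
    outcome = "cancelled";
}

Console.WriteLine(outcome);
```

Catching the base OperationCanceledException covers both cancellation exception types that the framework throws.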
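The FirecrawlClient constructor supports the following options:
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string? | FIRECRAWL_API_KEY environment variable | Your Firecrawl API key |
| apiUrl | string? | https://api.firecrawl.dev (or FIRECRAWL_API_URL) | Base URL of the API |
| timeout | TimeSpan? | 5 minutes | HTTP request timeout |
| maxRetries | int | 3 | Number of automatic retries on transient failures |
| backoffFactor | double | 0.5 | Exponential backoff factor, in seconds |
| httpClient | HttpClient? | Built from timeout | Preconfigured HttpClient instance |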
using Firecrawl;

var client = new FirecrawlClient(
    apiKey: "fc-your-api-key",
    apiUrl: "https://api.firecrawl.dev",
    timeout: TimeSpan.FromMinutes(5),
    maxRetries: 3,
    backoffFactor: 0.5);
You can pass in a preconfigured HttpClient to control connection pooling, proxies, message handlers, and other HttpClient features. When one is provided, the timeout option is ignored and the client's own configuration is used instead.
using System.Net; // WebProxy
using Firecrawl;

var handler = new HttpClientHandler
{
    Proxy = new WebProxy("http://proxy.example.com:8080"),
    UseProxy = true
};

var httpClient = new HttpClient(handler)
{
    Timeout = TimeSpan.FromSeconds(60)
};

var client = new FirecrawlClient(
    apiKey: "fc-your-api-key",
    httpClient: httpClient);
When constructor arguments are omitted, the SDK reads its configuration from environment variables:
// Uses the FIRECRAWL_API_KEY and FIRECRAWL_API_URL environment variables
var client = new FirecrawlClient();
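Because the parameterless constructor depends on the environment, it can help to check at startup which variables are actually present. This is a small sketch using only the .NET base library. IsSet is a hypothetical helper, and it does not reproduce whatever validation the SDK itself performs.

```csharp
using System;

// Hypothetical startup check (not part of the SDK): true when the variable has a non-blank value.
static bool IsSet(string name) =>
    !string.IsNullOrWhiteSpace(Environment.GetEnvironmentVariable(name));

foreach (var name in new[] { "FIRECRAWL_API_KEY", "FIRECRAWL_API_URL" })
{
    Console.WriteLine($"{name}: {(IsSet(name) ? "set" : "not set")}");
}
```

According to the options table, FIRECRAWL_API_URL is optional and defaults to https://api.firecrawl.dev, so only a missing FIRECRAWL_API_KEY would normally need attention.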
The SDK throws specific exceptions defined in the Firecrawl.Exceptions namespace. Catch the more specific types before FirecrawlException.
using Firecrawl.Exceptions;
using Firecrawl.Models;

try
{
    var doc = await client.ScrapeAsync("https://example.com");
}
catch (AuthenticationException ex)
{
    Console.Error.WriteLine($"Auth failed: {ex.Message}");
}
catch (RateLimitException ex)
{
    Console.Error.WriteLine($"Rate limited: {ex.Message}");
}
catch (JobTimeoutException ex)
{
    Console.Error.WriteLine($"Job {ex.JobId} timed out after {ex.TimeoutSeconds}s");
}
catch (FirecrawlException ex)
{
    Console.Error.WriteLine($"Error {ex.StatusCode}: {ex.Message}");
}
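Exception hierarchy:
| Exception | HTTP Code | When it is thrown |
|---|---|---|
| AuthenticationException | 401 | The API key is invalid or missing |
| RateLimitException | 429 | Too many requests |
| JobTimeoutException | n/a | An asynchronous job (crawl or batch scrape) did not finish within the allowed time |
| FirecrawlException | varies | Base exception for all other API errors |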
Before an exception is thrown, transient failures (408, 409, 502, and other 5xx errors) are retried automatically with exponential backoff.
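To give a feel for what the default settings imply, the sketch below computes a retry schedule under an assumed formula, delay = backoffFactor * 2^attempt seconds. This page does not state the SDK's exact formula or whether it adds jitter, so the numbers are illustrative only. BackoffSchedule is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Assumed formula (not confirmed by this page): delay = backoffFactor * 2^attempt seconds.
static IReadOnlyList<TimeSpan> BackoffSchedule(int maxRetries, double backoffFactor)
{
    var delays = new List<TimeSpan>();
    for (var attempt = 0; attempt < maxRetries; attempt++)
    {
        delays.Add(TimeSpan.FromSeconds(backoffFactor * Math.Pow(2, attempt)));
    }
    return delays;
}

// Defaults from the options table: maxRetries = 3, backoffFactor = 0.5.
foreach (var delay in BackoffSchedule(maxRetries: 3, backoffFactor: 0.5))
{
    Console.WriteLine($"{delay.TotalSeconds}s");
}
```

Under this assumption the three retries wait 0.5, 1, and 2 seconds, which adds up to 3.5 seconds of waiting before the exception reaches the caller.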
Are you an AI agent that needs a Firecrawl API key? See firecrawl.dev/agent-onboarding/SKILL.md for automated onboarding instructions.