Learn how to use Firecrawl's core features to scrape GitHub repositories, issues, and documentation.
npm install @mendable/firecrawl-js zod
Extract structured data from repositories using Zod schemas.
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', {
  formats: [{
    type: 'json',
    schema: z.object({
      name: z.string(),
      description: z.string(),
      stars: z.number(),
      forks: z.number(),
      language: z.string(),
      topics: z.array(z.string())
    })
  }]
});

console.log(result.json);
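Extracted fields can be missing or mistyped, so it may help to check them before passing them on. The following is a minimal sketch, assuming `result.json` follows the schema requested above; `summarizeRepo` and the sample values are hypothetical and not part of the Firecrawl SDK.

```javascript
// Hypothetical helper: formats the extracted fields into one line.
// Assumes `data` follows the schema above
// (name, description, stars, forks, language, topics).
function summarizeRepo(data) {
  if (!data || typeof data.name !== 'string') {
    throw new Error('Extraction returned no usable repository data');
  }
  // Fall back to neutral defaults when a field is missing or mistyped.
  const stars = Number.isFinite(data.stars) ? data.stars : 0;
  const forks = Number.isFinite(data.forks) ? data.forks : 0;
  const topics = Array.isArray(data.topics) ? data.topics.join(', ') : '';
  const language = data.language || 'unknown';
  return `${data.name} [${language}] stars=${stars} forks=${forks} topics=${topics}`;
}

// Made-up sample values, for illustration only.
const sampleRepo = {
  name: 'example-repo',
  description: 'Demo',
  stars: 12,
  forks: 3,
  language: 'TypeScript',
  topics: ['scraping', 'api']
};
console.log(summarizeRepo(sampleRepo));
```

In real use, `summarizeRepo(result.json)` would take the place of the sample call.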
Search GitHub for repositories, issues, or documentation.
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const searchResult = await firecrawl.search('machine learning site:github.com', {
  limit: 10,
  sources: [{ type: 'web' }], // other sources: { type: 'news' }, { type: 'images' }
  scrapeOptions: {
    formats: ['markdown']
  }
});

console.log(searchResult);
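The `site:github.com` operator in the query above can be assembled programmatically. This is a minimal sketch; `buildGithubQuery` is a hypothetical helper, and whether a path-scoped `site:` filter (owner or repository) is honored depends on the search backend, so treat that part as an assumption.

```javascript
// Hypothetical helper: scopes search terms to GitHub with the `site:` operator.
// `owner` and `repo` are optional; `repo` is only used when `owner` is given.
function buildGithubQuery(terms, { owner, repo } = {}) {
  let scope = 'github.com';
  if (owner) scope += `/${owner}`;
  if (owner && repo) scope += `/${repo}`;
  return `${terms.trim()} site:${scope}`;
}

console.log(buildGithubQuery('machine learning'));
console.log(buildGithubQuery('app router', { owner: 'vercel', repo: 'next.js' }));
```

The returned string can be passed as the first argument to `firecrawl.search`.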
Scrape a single GitHub page: a repository, an issue, or a file.
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', {
  formats: ['markdown'] // e.g. html, links, etc.
});

console.log(result);
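Since a repository, an issue, and a file each live at a different GitHub URL pattern, a small builder can keep the target URLs consistent. This is a sketch only: `githubUrl` is a hypothetical helper, and the `main` default for `ref` is an assumption (many repositories use another default branch, such as `canary`).

```javascript
// Hypothetical helper: builds the URL of a repository, an issue, or a file.
// Assumption: `ref` defaults to 'main' when a file path is given without one.
function githubUrl({ owner, repo, issue, ref, path }) {
  const base = `https://github.com/${owner}/${repo}`;
  if (issue !== undefined) return `${base}/issues/${issue}`;
  if (path !== undefined) return `${base}/blob/${ref ?? 'main'}/${path}`;
  return base;
}

console.log(githubUrl({ owner: 'firecrawl', repo: 'firecrawl' }));
console.log(githubUrl({ owner: 'firecrawl', repo: 'firecrawl', issue: 1 }));
console.log(githubUrl({ owner: 'firecrawl', repo: 'firecrawl', path: 'README.md' }));
```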
Discover all available URLs in a repository or documentation site. Note: Map returns only the URLs, not their content.
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const mapResult = await firecrawl.map('https://github.com/vercel/next.js/tree/canary/docs');

console.log(mapResult.links);
// Returns an array of URLs without content
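Because Map returns only URLs, a common next step is to narrow the list before scraping anything. The sketch below keeps links under a given prefix that end in a documentation extension. `filterLinks` is a hypothetical helper, and it accepts both plain strings and objects with a `url` field, on the assumption that the exact shape of `mapResult.links` depends on the SDK version.

```javascript
// Hypothetical helper: keeps only links under `prefix` whose path ends with
// one of `extensions`. Accepts URL strings or objects with a `url` field.
function filterLinks(links, prefix, extensions = ['.md', '.mdx']) {
  return links
    .map((link) => (typeof link === 'string' ? link : link.url))
    .filter((url) => typeof url === 'string' && url.startsWith(prefix))
    .filter((url) => extensions.some((ext) => new URL(url).pathname.endsWith(ext)));
}

// Made-up sample links, for illustration only.
const sampleLinks = [
  'https://github.com/vercel/next.js/blob/canary/docs/index.mdx',
  { url: 'https://github.com/vercel/next.js/blob/canary/docs/api.md' },
  'https://github.com/vercel/next.js/issues',
  { title: 'entry without a url' }
];
console.log(filterLinks(sampleLinks, 'https://github.com/vercel/next.js/blob/canary/docs/'));
```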
Crawl multiple pages of a repository or its documentation.
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const crawlResult = await firecrawl.crawl('https://github.com/facebook/react/wiki', {
  limit: 10,
  scrapeOptions: {
    formats: ['markdown']
  }
});

console.log(crawlResult.data);
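The crawled pages can be merged into a single Markdown document, for example to feed a search index. This is a minimal sketch, assuming each entry in `crawlResult.data` looks like `{ markdown, metadata: { sourceURL } }`; `mergePages` is a hypothetical helper and the sample pages are made up.

```javascript
// Hypothetical helper: joins crawled pages into one Markdown string,
// prefixing each page with an HTML comment naming its source URL.
// Assumes each page looks like { markdown, metadata: { sourceURL } }.
function mergePages(pages) {
  return pages
    // Skip entries with no usable Markdown content.
    .filter((page) => page && typeof page.markdown === 'string' && page.markdown.trim() !== '')
    .map((page) => {
      const source = page.metadata?.sourceURL ?? 'unknown source';
      return `<!-- ${source} -->\n${page.markdown.trim()}`;
    })
    .join('\n\n');
}

// Made-up sample pages, for illustration only.
const samplePages = [
  { markdown: '# A\n', metadata: { sourceURL: 'https://example.com/a' } },
  { markdown: '   ' },
  { markdown: '# B' }
];
console.log(mergePages(samplePages));
```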
Scrape multiple GitHub URLs simultaneously.
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// Wait for the job to finish
const job = await firecrawl.batchScrape(
  [
    'https://github.com/vercel/next.js',
    'https://github.com/facebook/react',
    'https://github.com/microsoft/typescript'
  ],
  {
    options: {
      formats: ['markdown']
    },
    pollInterval: 2,
    timeout: 120
  }
);

console.log(job.status, job.completed, job.total);
console.log(job);
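When the URL list comes from user input or an earlier Map call, it may contain duplicates that differ only by a trailing slash, a fragment, or letter case. The sketch below removes them before a batch is submitted. `dedupeUrls` is a hypothetical helper, and comparing paths case-insensitively is an assumption that suits owner and repository names but not case-sensitive file paths.

```javascript
// Hypothetical helper: drops URLs that point at the same page, ignoring
// fragments, trailing slashes, and letter case. Keeps the first occurrence.
function dedupeUrls(urls) {
  const seen = new Set();
  const unique = [];
  for (const raw of urls) {
    const trimmed = raw.trim();
    const url = new URL(trimmed);
    url.hash = ''; // fragments do not change which page is fetched
    const path = url.pathname.replace(/\/+$/, '');
    const key = (url.origin + path + url.search).toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(trimmed);
    }
  }
  return unique;
}

console.log(dedupeUrls([
  'https://github.com/vercel/next.js',
  'https://github.com/vercel/next.js/',
  'https://github.com/Facebook/React',
  'https://github.com/facebook/react#readme'
]));
```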
Batch scrape with JSON mode
Extract structured data from multiple repositories at once.
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// Wait for the job to finish
const job = await firecrawl.batchScrape(
  [
    'https://github.com/vercel/next.js',
    'https://github.com/facebook/react'
  ],
  {
    options: {
      formats: [{
        type: 'json',
        schema: z.object({
          name: z.string(),
          description: z.string(),
          stars: z.number(),
          language: z.string()
        })
      }]
    },
    pollInterval: 2,
    timeout: 120
  }
);

console.log(job.status, job.completed, job.total);
console.log(job);
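Once the batch has finished, the structured results can be compared across repositories. This is a minimal sketch, assuming `job.data` is an array of documents that each carry a `json` field shaped like the schema above; `rankByStars` is a hypothetical helper and the sample documents are made up.

```javascript
// Hypothetical helper: lists repositories from highest to lowest star count.
// Assumes each document has a `json` field with `name` and `stars`.
function rankByStars(documents) {
  return documents
    .map((doc) => doc?.json)
    // Skip documents whose extraction failed or lacks a numeric star count.
    .filter((repo) => repo && typeof repo.stars === 'number')
    .sort((a, b) => b.stars - a.stars)
    .map((repo) => `${repo.name}: ${repo.stars}`);
}

// Made-up sample documents, for illustration only.
const sampleDocs = [
  { json: { name: 'a', stars: 5 } },
  { json: null },
  {},
  { json: { name: 'b', stars: 9 } }
];
console.log(rankByStars(sampleDocs));
```

In real use, `rankByStars(job.data)` would take the place of the sample call.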