Crawl4AI - Open-source LLM Friendly Web Crawler & Scraper

🚀🤖 Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper. Transform the web into clean, LLM-ready Markdown for RAG, agents, and data pipelines.

66,906 Stars · 6,852 Forks · 372 Subscribers

Ready to Transform Your Web Data?

Join the 50k+ developer community using the most starred web crawler on GitHub


Why Developers Choose Crawl4AI

📝 LLM Ready Output

Clean Markdown with headings, tables, code
Citation hints and references
BM25-based content filtering
Customizable generation strategies
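To make the BM25-based filtering idea concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring over text chunks. It is not Crawl4AI's implementation: the function names, the tokenizer, and the `threshold` default are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every chunk against the query with Okapi BM25."""
    docs = [tokenize(c) for c in chunks]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: in how many chunks each term appears.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long chunks need more hits to score the same.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def filter_chunks(query: str, chunks: list[str], threshold: float = 0.5) -> list[str]:
    """Keep chunks whose score meets the threshold (hypothetical default)."""
    return [c for c, s in zip(chunks, bm25_scores(query, chunks)) if s >= threshold]
```

Calling `filter_chunks("product prices", chunks)` keeps only the chunks that mention the query terms, so navigation text and cookie banners score zero and drop out.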

🚀 Fast & Efficient

Async browser pool
Intelligent caching
Minimal network hops
Optimized for large-scale
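The combination of bounded concurrency and caching can be sketched with plain `asyncio`. This is an illustration of the general pattern only, not Crawl4AI's browser pool: `fetch_page`, `CachedFetcher`, and `crawl_all` are hypothetical names, and the fetch is simulated with a short sleep.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real browser fetch."""
    await asyncio.sleep(0.01)  # simulate network and render latency
    return f"<html>{url}</html>"


class CachedFetcher:
    """Limit concurrent fetches and reuse one in-flight task per URL."""

    def __init__(self, max_concurrent: int = 3):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._tasks: dict[str, asyncio.Task] = {}
        self.fetch_count = 0  # number of real fetches performed

    async def _fetch(self, url: str) -> str:
        async with self._sem:  # at most max_concurrent fetches run at once
            self.fetch_count += 1
            return await fetch_page(url)

    async def get(self, url: str) -> str:
        # Caching the task (not the result) also deduplicates requests
        # that arrive while the first fetch is still running.
        if url not in self._tasks:
            self._tasks[url] = asyncio.create_task(self._fetch(url))
        return await self._tasks[url]


async def crawl_all(urls: list[str], max_concurrent: int = 3) -> tuple[list[str], int]:
    """Fetch all URLs concurrently; return the pages and the real fetch count."""
    fetcher = CachedFetcher(max_concurrent)
    pages = await asyncio.gather(*(fetcher.get(u) for u in urls))
    return list(pages), fetcher.fetch_count
```

Running `asyncio.run(crawl_all(urls))` with a repeated URL in the list triggers a single fetch for it, because the second request awaits the task created by the first.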

🎯 Full Control

Session management
Proxy & cookie support
User scripts & hooks
Custom headers & user agents

🧠 Adaptive Intelligence

Site pattern learning
Smart content extraction
Anti-bot detection
Shadow DOM support

🌐 Deploy Anywhere

Zero API keys required
CLI and Docker support
Cloud friendly
Self-hostable

🚀 Quick Start

1. Install Crawl4AI

# Install the package
pip install -U crawl4ai

# Run post-installation setup
crawl4ai-setup

# Verify your installation
crawl4ai-doctor

2. Run a Simple Crawl

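import asyncio

from crawl4ai import AsyncWebCrawler


async def main():
    # The context manager starts the crawler and cleans it up on exit.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
        )
        # Print the page content converted to Markdown.
        print(result.markdown)


if __name__ == "__main__":
    asyncio.run(main())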

3. Use Command Line

# Basic crawl
crwl https://www.nbcnews.com/business -o markdown

# Deep crawl
crwl https://docs.crawl4ai.com --deep-crawl bfs --max-pages 10

# LLM extraction
crwl https://www.example.com/products -q "Extract all product prices"
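For readers unfamiliar with what a breadth-first deep crawl capped at a page limit does, the following sketch shows the traversal order on its own. It is a generic BFS, not the code behind `--deep-crawl bfs`; `bfs_crawl` and the `get_links` callback are hypothetical, and `max_pages` mirrors the CLI flag in name only.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 10,
) -> list[str]:
    """Visit pages breadth-first from start and stop after max_pages visits."""
    visited: list[str] = []
    seen = {start}          # URLs already queued, to avoid revisiting
    queue = deque([start])  # FIFO queue gives level-by-level order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing a `get_links` that reads from an in-memory dictionary of links is enough to try it out; in a real crawler that callback would fetch the page and extract its links.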

✨ Recent Updates

Version 0.8.6 — Security Hotfix

Replaced `litellm` with `unclecode-litellm` due to a PyPI supply chain compromise. If you're on v0.8.5 or earlier, upgrade immediately with `pip install -U crawl4ai`.

Version 0.8.5 — Anti-Bot Detection & Shadow DOM

3-tier anti-bot detection with proxy escalation, Shadow DOM flattening, deep crawl cancellation, and 60+ bug fixes. Major release focused on bot detection capabilities.
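The release note does not spell out the three tiers, so the sketch below only illustrates the general idea of escalating through proxies when a response looks blocked. Everything in it is an assumption: the block markers, the status codes, the `fetch` callback signature, and the function names are hypothetical and are not Crawl4AI's API.

```python
from typing import Callable, Optional, Sequence

# Assumed challenge-page markers, for illustration only.
BLOCK_MARKERS = ("captcha", "access denied", "verify you are human")


def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: treat 403/429 or known challenge text as a bot block."""
    if status in (403, 429):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def fetch_with_escalation(
    url: str,
    fetch: Callable[[str, Optional[str]], tuple[int, str]],
    proxies: Sequence[Optional[str]],
) -> tuple[Optional[str], str]:
    """Try each proxy tier in order; return (proxy_used, body) on the first unblocked response."""
    for proxy in proxies:
        status, body = fetch(url, proxy)
        if not looks_blocked(status, body):
            return proxy, body
    raise RuntimeError(f"all {len(proxies)} tiers were blocked for {url}")
```

A tier list such as `[None, "datacenter-proxy", "residential-proxy"]` would try a direct request first and only move to costlier proxies when the cheaper attempt is blocked.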

Version 0.8.0 — Crash Recovery & Prefetch Mode

Deep crawl crash recovery with `resume_state` and `on_state_change` callbacks. New `prefetch=True` mode for 5-10x faster URL discovery.
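The mechanics of crash recovery can be sketched as a crawl that reports a serializable snapshot after every page and can restart from one. The parameter names `resume_state` and `on_state_change` are borrowed from the release note, but the function, the callback signature, and the snapshot shape below are assumptions, not the library's actual interface.

```python
import json
from collections import deque
from typing import Callable, Iterable, Optional


def resumable_bfs(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 10,
    resume_state: Optional[dict] = None,
    on_state_change: Optional[Callable[[dict], None]] = None,
) -> list[str]:
    """Breadth-first crawl that can continue from a saved snapshot."""
    if resume_state:
        # Rebuild progress from the snapshot instead of starting over.
        visited = list(resume_state["visited"])
        queue = deque(resume_state["queue"])
        seen = set(visited) | set(queue)
    else:
        visited, queue, seen = [], deque([start]), {start}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if on_state_change:
            # Plain lists keep the snapshot JSON-serializable.
            on_state_change({"visited": list(visited), "queue": list(queue)})
    return visited


def save_to(path: str) -> Callable[[dict], None]:
    """Return a callback that writes each snapshot to a JSON file."""
    def _save(state: dict) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
    return _save
```

If the process dies mid-crawl, loading the last saved snapshot and passing it as `resume_state` continues from the pending queue. A page that was in progress at crash time is still in that queue, so it is retried and not lost.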

Version 0.7.8 — Stability Release

11 bug fixes addressing Docker API issues, LLM extraction improvements, URL handling fixes, and dependency updates.