🚀🤖 Crawl4AI

Open-source, LLM-friendly web crawler and scraper. Transform the web into clean, LLM-ready Markdown for RAG, agents, and data pipelines.

66,906 stars · 6,852 forks · 97 issues · 372 subscribers

Ready to Transform Your Web Data?

Join the 50k+ developer community using the most-starred web crawler on GitHub.


Why Developers Choose Crawl4AI

📝 LLM-Ready Output

  • Clean Markdown with headings, tables, code
  • Citation hints and references
  • BM25-based content filtering
  • Customizable generation strategies
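
To make "BM25-based content filtering" concrete, here is a minimal standard-library sketch of the general technique: score each text chunk against a query with BM25 and keep the chunks above a threshold. This is not Crawl4AI's implementation; the function names, the tokenizer, and the `threshold` default are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Return one BM25 score per chunk for the given query."""
    docs = [tokenize(chunk) for chunk in chunks]
    n = len(docs)
    if n == 0:
        return []
    # Average chunk length; fall back to 1.0 if every chunk is empty.
    avg_len = sum(len(doc) for doc in docs) / n or 1.0

    # Document frequency: in how many chunks each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    query_terms = set(tokenize(query))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def filter_chunks(query, chunks, threshold=0.5):
    """Keep chunks whose score meets the (hypothetical) threshold."""
    scores = bm25_scores(query, chunks)
    return [chunk for chunk, score in zip(chunks, scores) if score >= threshold]


chunks = [
    "Subscribe to our newsletter",
    "Async crawling with a browser pool",
    "Cookie policy and terms",
]
relevant = filter_chunks("browser pool crawling", chunks)
```

In this toy input only the second chunk shares terms with the query, so it is the only one kept; boilerplate such as newsletter prompts scores zero and is dropped.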

🚀 Fast & Efficient

  • Async browser pool
  • Intelligent caching
  • Minimal network hops
  • Optimized for large-scale crawls

🎯 Full Control

  • Session management
  • Proxy & cookie support
  • User scripts & hooks
  • Custom headers & user agents

🧠 Adaptive Intelligence

  • Site pattern learning
  • Smart content extraction
  • Anti-bot detection
  • Shadow DOM support

🌐 Deploy Anywhere

  • Zero API keys required
  • CLI and Docker support
  • Cloud friendly
  • Self-hostable

🚀 Quick Start

1. Install Crawl4AI

# Install the package
pip install -U crawl4ai

# Run post-installation setup
crawl4ai-setup

# Verify your installation
crawl4ai-doctor

2. Run a Simple Crawl

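import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # The context manager starts the browser and closes it on exit.
    async with AsyncWebCrawler() as crawler:
        # Crawl a single page and wait for the result.
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
        )
        # Print the page content converted to Markdown.
        print(result.markdown)

if __name__ == "__main__":
    asyncio.run(main())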
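
In practice a crawl can fail, so it may help to inspect the result before using its Markdown. The sketch below assumes the result object exposes `success` and `error_message` alongside `markdown`; `summarize` and `crawl_and_summarize` are hypothetical helper names, and `crawl4ai` is imported lazily so the helper can be tried on a stand-in object without a browser.

```python
from types import SimpleNamespace


def summarize(result):
    """Return a one-line summary of a crawl result (assumed fields)."""
    if not getattr(result, "success", False):
        reason = getattr(result, "error_message", None) or "unknown error"
        return f"failed: {reason}"
    # Convert to str in case the markdown field is a string-like object.
    markdown = str(getattr(result, "markdown", "") or "")
    return f"ok: {len(markdown)} characters of Markdown"


async def crawl_and_summarize(url):
    """Crawl one URL and summarize the outcome."""
    # Lazy import: only needed when a real crawl is run.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
    return summarize(result)


# Stand-in objects that mimic the assumed result fields.
fake_ok = SimpleNamespace(success=True, markdown="# Title", error_message=None)
fake_failed = SimpleNamespace(success=False, markdown=None, error_message="timeout")
```

To run it against a real page, call `asyncio.run(crawl_and_summarize("https://www.nbcnews.com/business"))` after importing `asyncio`.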

3. Use Command Line

# Basic crawl
crwl https://www.nbcnews.com/business -o markdown

# Deep crawl
crwl https://docs.crawl4ai.com --deep-crawl bfs --max-pages 10

# LLM extraction
crwl https://www.example.com/products -q "Extract all product prices"
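
The same commands can be driven from a script. The sketch below builds the argument list using only the flags shown above (`-o`, `--deep-crawl`, `--max-pages`, `-q`) and runs it with `subprocess`; the helper names `build_crwl_command` and `run_crwl` and the timeout value are assumptions, not part of the CLI.

```python
import subprocess


def build_crwl_command(url, output=None, deep_crawl=None, max_pages=None, question=None):
    """Assemble a crwl argument list from the flags shown in the examples above."""
    cmd = ["crwl", url]
    if output:
        cmd += ["-o", output]
    if deep_crawl:
        cmd += ["--deep-crawl", deep_crawl]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if question:
        cmd += ["-q", question]
    return cmd


def run_crwl(cmd, timeout=120):
    """Run the command and return its standard output as text.

    Raises CalledProcessError on a non-zero exit status and
    TimeoutExpired if the (hypothetical) timeout is exceeded.
    """
    completed = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return completed.stdout


basic = build_crwl_command("https://www.nbcnews.com/business", output="markdown")
deep = build_crwl_command("https://docs.crawl4ai.com", deep_crawl="bfs", max_pages=10)
```

Passing the arguments as a list avoids shell quoting problems with queries such as "Extract all product prices". Calling `run_crwl(basic)` requires `crwl` to be installed and on the `PATH`.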

✨ Recent Updates

Version 0.8.6 — Security Hotfix

Replaced `litellm` with `unclecode-litellm` due to a PyPI supply chain compromise. If you're on v0.8.5 or earlier, upgrade immediately with `pip install -U crawl4ai`.
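
To check whether an environment is still on an affected release, a small script along these lines may help. It is a sketch using only the standard library: the helper names are hypothetical, and the version parser is deliberately simple (it compares the first three numeric components and ignores pre-release or post-release suffixes).

```python
import re
from importlib import metadata

# First release that includes the hotfix, per the note above.
PATCHED = (0, 8, 6)


def parse_version(text):
    """Parse 'X.Y.Z...' into a 3-tuple of ints, ignoring any suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed):
    """True if the given version string is older than the patched release."""
    return parse_version(installed) < PATCHED


def installed_crawl4ai_version():
    """Return the installed crawl4ai version string, or None if absent."""
    try:
        return metadata.version("crawl4ai")
    except metadata.PackageNotFoundError:
        return None


version = installed_crawl4ai_version()
if version is None:
    print("crawl4ai is not installed")
elif needs_upgrade(version):
    print(f"crawl4ai {version} is affected; run: pip install -U crawl4ai")
else:
    print(f"crawl4ai {version} includes the hotfix")
```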

Version 0.8.5 — Anti-Bot Detection & Shadow DOM

3-tier anti-bot detection with proxy escalation, Shadow DOM flattening, deep crawl cancellation, and 60+ bug fixes. Major release focused on bot detection capabilities.

Version 0.8.0 — Crash Recovery & Prefetch Mode

Deep crawl crash recovery with `resume_state` and `on_state_change` callbacks. New `prefetch=True` mode for 5-10x faster URL discovery.
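
The general checkpoint-and-resume pattern behind this kind of feature can be sketched with a toy breadth-first crawl over an in-memory link graph. This is not Crawl4AI's API or state format: the parameter names `resume_state` and `on_state_change` are borrowed from the note above only for readability, and the `{"visited": ..., "queue": ...}` state shape, the `bfs_crawl` function, and the use of `max_pages` to simulate an interruption are all assumptions.

```python
import json
from collections import deque


def bfs_crawl(links, start, max_pages, resume_state=None, on_state_change=None):
    """Breadth-first traversal that reports its state after every page."""
    if resume_state:
        # Continue from a previously saved snapshot.
        visited = list(resume_state["visited"])
        queue = deque(resume_state["queue"])
    else:
        visited = []
        queue = deque([start])
    # Everything already visited or queued must not be queued again.
    seen = set(visited) | set(queue)

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
        if on_state_change:
            # Hand the caller a JSON-serializable snapshot to persist.
            on_state_change({"visited": list(visited), "queue": list(queue)})
    return visited


# Toy link graph standing in for real pages.
links = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

# First run stops early (simulating a crash) while recording snapshots.
snapshots = []
partial = bfs_crawl(links, "a", max_pages=2, on_state_change=snapshots.append)

# Round-trip through JSON as if the snapshot had been written to disk.
saved = json.loads(json.dumps(snapshots[-1]))
resumed = bfs_crawl(links, "a", max_pages=10, resume_state=saved)
```

Because the snapshot holds both the visited list and the pending queue, the resumed run picks up where the first one stopped and does not revisit earlier pages.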

Version 0.7.8 — Stability Release

11 bug fixes addressing Docker API issues, LLM extraction improvements, URL handling fixes, and dependency updates.