🚀🤖 Crawl4AI
Open-source, LLM-friendly web crawler and scraper. Transform the web into clean, LLM-ready Markdown for RAG, agents, and data pipelines.
Ready to Transform Your Web Data?
Join the 50k+ developer community using the most starred web crawler on GitHub
Why Developers Choose Crawl4AI
📝 LLM Ready Output
- Clean Markdown with headings, tables, code
- Citation hints and references
- BM25-based content filtering
- Customizable generation strategies
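To make "BM25-based content filtering" concrete, here is a minimal standard-library sketch of the idea: score each text chunk of a page against a query with BM25 and keep only the chunks above a threshold. This is not Crawl4AI's implementation. The function names (`tokenize`, `bm25_scores`, `filter_chunks`), the parameter defaults, and the threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, chunks: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 relevance score per chunk (illustrative, not Crawl4AI's code)."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0

    # Document frequency: in how many chunks does each term appear?
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long chunks are penalized relative to the average.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def filter_chunks(query: str, chunks: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only chunks whose BM25 score reaches the (hypothetical) threshold."""
    return [c for c, s in zip(chunks, bm25_scores(query, chunks)) if s >= threshold]
```

Calling `filter_chunks("product prices", chunks)` on a page split into paragraphs would drop navigation and newsletter text that shares no terms with the query, which is the effect a query-driven filter aims for before the Markdown reaches an LLM.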
🚀 Fast & Efficient
- Async browser pool
- Intelligent caching
- Minimal network hops
- Optimized for large-scale crawling
🎯 Full Control
- Session management
- Proxy & cookie support
- User scripts & hooks
- Custom headers & user agents
🧠 Adaptive Intelligence
- Site pattern learning
- Smart content extraction
- Anti-bot detection
- Shadow DOM support
🌐 Deploy Anywhere
- Zero API keys required
- CLI and Docker support
- Cloud friendly
- Self-hostable
🚀 Quick Start
1. Install Crawl4AI
pip install -U crawl4ai
# Run post-installation setup
crawl4ai-setup
# Verify your installation
crawl4ai-doctor
2. Run a Simple Crawl
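import asyncio

from crawl4ai import AsyncWebCrawler


async def main():
    # The context manager starts the browser and closes it on exit.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
        )
        # The page content converted to Markdown.
        print(result.markdown)


if __name__ == "__main__":
    asyncio.run(main())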
3. Use Command Line
crwl https://www.nbcnews.com/business -o markdown
# Deep crawl
crwl https://docs.crawl4ai.com --deep-crawl bfs --max-pages 10
# LLM extraction
crwl https://www.example.com/products -q "Extract all product prices"
✨ Recent Updates
Version 0.8.6 — Security Hotfix
Replaced `litellm` with `unclecode-litellm` due to a PyPI supply chain compromise. If you're on v0.8.5 or earlier, upgrade immediately with `pip install -U crawl4ai`.
Version 0.8.5 — Anti-Bot Detection & Shadow DOM
3-tier anti-bot detection with proxy escalation, Shadow DOM flattening, deep crawl cancellation, and 60+ bug fixes. Major release focused on bot detection capabilities.
Version 0.8.0 — Crash Recovery & Prefetch Mode
Deep crawl crash recovery with `resume_state` and `on_state_change` callbacks. New `prefetch=True` mode for 5-10x faster URL discovery.
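The crash-recovery idea can be sketched without a browser: a breadth-first crawl that reports its state after every page, so an interrupted run can be resumed from the last snapshot. The sketch below is a conceptual illustration only. It borrows the names `resume_state` and `on_state_change` from the release note, but the `bfs_crawl` function, its signature, the `fail_at` switch, the shape of the state dictionary, and the in-memory `SITE` graph are all assumptions, not Crawl4AI's actual API.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical in-memory "site": each page maps to its outgoing links.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": ["/"],
}


def bfs_crawl(
    start: str,
    max_pages: int,
    resume_state: Optional[dict] = None,
    on_state_change: Optional[Callable[[dict], None]] = None,
    fail_at: Optional[str] = None,
) -> list[str]:
    """Breadth-first crawl that can resume from a saved snapshot (illustrative)."""
    if resume_state:
        queue = deque(resume_state["queue"])
        visited = list(resume_state["visited"])
    else:
        queue = deque([start])
        visited = []
    seen = set(visited) | set(queue)

    while queue and len(visited) < max_pages:
        url = queue[0]
        if url == fail_at:
            # Simulated crash: the URL stays in the queue, so a resume retries it.
            raise RuntimeError(f"simulated crash at {url}")
        queue.popleft()
        visited.append(url)
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if on_state_change:
            # Hand the caller a snapshot it can persist (to disk, a database, etc.).
            on_state_change({"queue": list(queue), "visited": list(visited)})
    return visited
```

In this sketch, a caller would persist each snapshot passed to `on_state_change` and, after a crash, pass the last one back as `resume_state`. The resumed run then visits the remaining pages in the same order an uninterrupted run would have.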
Version 0.7.8 — Stability Release
11 bug fixes addressing Docker API issues, LLM extraction improvements, URL handling fixes, and dependency updates.
💖 Support Crawl4AI
Crawl4AI is the #1 trending open-source web crawler on GitHub. Your support keeps it independent, innovative, and free for the community.