Scrapy 0.14 documentation

This documentation contains everything you need to know about Scrapy.

Getting help

Having trouble? We’d like to help!

First steps

Scrapy at a glance
Understand what Scrapy is and how it can help you.
Installation guide
Get Scrapy installed on your computer.
Scrapy Tutorial
Write your first Scrapy project.
Examples
Learn more by playing with a pre-made Scrapy project.

Basic concepts

Command line tool
Learn about the command-line tool used to manage your Scrapy project.
Items
Define the data you want to scrape.
Spiders
Write the rules to crawl your websites.
XPath Selectors
Extract the data from web pages.
Scrapy shell
Test your extraction code in an interactive environment.
Item Loaders
Populate your items with the extracted data.
Item Pipeline
Post-process and store your scraped data.
Feed exports
Output your scraped data using different formats and storages.
Link Extractors
Convenient classes to extract links to follow from pages.
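
The extraction entries above can be illustrated with a minimal, self-contained sketch. Scrapy's own XPath selectors are built for real HTML with full XPath support; the standard library's ElementTree, used here only so the example runs anywhere, supports just a small XPath subset, but the idea is the same:

```python
# A self-contained sketch of XPath-style extraction. Scrapy provides
# HTML-aware selectors with full XPath support; the standard library's
# ElementTree (used here so the example needs no dependencies) supports
# only a small XPath subset, but the concept is identical.
from xml.etree import ElementTree

page = (
    "<html><body>"
    "<h1>Example Domain</h1>"
    "<a href='/page/1'>first</a>"
    "<a href='/page/2'>second</a>"
    "</body></html>"
)
root = ElementTree.fromstring(page)

# Extract the heading text and every link's href attribute.
title = root.find(".//h1").text
hrefs = [a.get("href") for a in root.findall(".//a")]

print(title)   # Example Domain
print(hrefs)   # ['/page/1', '/page/2']
```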

Built-in services

Logging
Understand the simple logging facility provided by Scrapy.
Stats Collection
Collect statistics about your scraping crawler.
Sending e-mail
Send email notifications when certain events occur.
Telnet Console
Inspect a running crawler using a built-in Python console.
Web Service
Monitor and control a crawler using a web service.
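
As a rough sketch of what the stats collection service does, consider a minimal key/value collector. The method names `set_value`, `inc_value` and `get_value` mirror Scrapy's stats API, but the class below is a simplified stand-in, not the real implementation:

```python
# A simplified stand-in for Scrapy's stats collector: a key/value store
# with helpers to set and increment counters. The real collector is
# richer (per-spider stats, hooks on crawl open/close); this only shows
# the idea behind the service.
class SimpleStatsCollector(object):
    def __init__(self):
        self._stats = {}

    def set_value(self, key, value):
        self._stats[key] = value

    def inc_value(self, key, count=1, start=0):
        self._stats[key] = self._stats.setdefault(key, start) + count

    def get_value(self, key, default=None):
        return self._stats.get(key, default)

stats = SimpleStatsCollector()
stats.set_value("start_time", "2011-01-01 00:00:00")
for _ in range(3):
    stats.inc_value("item_scraped_count")

print(stats.get_value("item_scraped_count"))  # 3
```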

Solving specific problems

Frequently Asked Questions
Get answers to the most frequently asked questions.
Using Firefox for scraping
Learn how to scrape with Firefox and some useful add-ons.
Using Firebug for scraping
Learn how to scrape efficiently using Firebug.
Debugging memory leaks
Learn how to find and get rid of memory leaks in your crawler.
Downloading Item Images
Download static images associated with your scraped items.
Ubuntu packages
Install the latest Scrapy packages easily on Ubuntu.
Scrapy Service (scrapyd)
Deploy your Scrapy project in production.
Jobs: pausing and resuming crawls
Learn how to pause and resume crawls for large spiders.

Extending Scrapy

Architecture overview
Understand the Scrapy architecture.
Downloader Middleware
Customize how pages get requested and downloaded.
Spider Middleware
Customize the input and output of your spiders.
Extensions
Add any custom functionality using signals and the Scrapy API.
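
The middleware entries above share one pattern: a chain of components, each of which gets a chance to modify a request on the way out and the response on the way back in. A toy version of that chain follows; the hook names echo Scrapy's `process_request`/`process_response`, but the signatures and return-value semantics here are deliberately simplified:

```python
# A toy middleware chain illustrating the pattern behind Scrapy's
# downloader and spider middleware. Hook names echo Scrapy's
# process_request/process_response, but signatures and return-value
# semantics are simplified for the sketch.
class DefaultHeadersMiddleware(object):
    def process_request(self, request):
        # Fill in a User-Agent header unless one is already set.
        request.setdefault("headers", {}).setdefault("User-Agent", "mybot/0.1")
        return request

    def process_response(self, request, response):
        return response

class StatsMiddleware(object):
    def __init__(self):
        self.requests = 0

    def process_request(self, request):
        self.requests += 1
        return request

    def process_response(self, request, response):
        return response

def fetch(request, middlewares):
    for mw in middlewares:            # outward pass: request hooks
        request = mw.process_request(request)
    response = {"url": request["url"], "status": 200}  # pretend download
    for mw in reversed(middlewares):  # inward pass: response hooks
        response = mw.process_response(request, response)
    return response

stats = StatsMiddleware()
chain = [DefaultHeadersMiddleware(), stats]
resp = fetch({"url": "http://example.com/"}, chain)
print(resp["status"], stats.requests)  # 200 1
```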


Reference

Command line tool
Learn about the command-line tool and see all available commands.
Requests and Responses
Understand the classes used to represent HTTP requests and responses.
Settings
Learn how to configure Scrapy and see all available settings.
Signals
See all available signals and how to work with them.
Exceptions
See all available exceptions and their meaning.
Item Exporters
Quickly export your scraped items to a file (XML, CSV, etc).
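
To make the last entry concrete: exporting a list of scraped items to CSV can be sketched with the standard library alone. Scrapy's `CsvItemExporter` wraps the same idea behind a common exporter interface; the item dicts below are invented for illustration:

```python
# Exporting scraped items to CSV using only the standard library.
# Scrapy's CsvItemExporter provides the same capability behind a common
# exporter interface; these items are made up for the example.
import csv
import io

items = [
    {"name": "Chair", "price": "19.99"},
    {"name": "Table", "price": "89.50"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "price"], lineterminator="\n")
writer.writeheader()
writer.writerows(items)

print(buf.getvalue())
# name,price
# Chair,19.99
# Table,89.50
```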

All the rest

Contributing to Scrapy
Learn how to contribute to the Scrapy project.
Versioning and API Stability
Understand Scrapy versioning and API stability.
Experimental features
Learn about bleeding-edge features.