Scrapy 0.24 documentation

This documentation contains everything you need to know about Scrapy.

Getting help

Having trouble? We’d like to help!

Try the FAQ – it’s got answers to some common questions.
Looking for specific information? Try the Index or Module Index.
Search for information in the archives of the scrapy-users mailing list, or post a question.
Ask a question in the #scrapy IRC channel.
Report bugs with Scrapy in our issue tracker.

Command line tool: Learn about the command-line tool used to manage your Scrapy project.
Items: Define the data you want to scrape.
Spiders: Write the rules to crawl your websites.
Selectors: Extract the data from web pages using XPath.
Scrapy shell: Test your extraction code in an interactive environment.
Item Loaders: Populate your items with the extracted data.
Item Pipeline: Post-process and store your scraped data.
Feed exports: Output your scraped data using different formats and storages.
Link Extractors: Convenient classes to extract links to follow from pages.

Logging: Understand the simple logging facility provided by Scrapy.
Stats Collection: Collect statistics about your scraping crawler.
Sending e-mail: Send email notifications when certain events occur.
Telnet Console: Inspect a running crawler using a built-in Python console.
Web Service: Monitor and control a crawler using a web service.

Frequently Asked Questions: Get answers to the most frequently asked questions.
Debugging Spiders: Learn how to debug common problems of your Scrapy spider.
Spiders Contracts: Learn how to use contracts for testing your spiders.
Common Practices: Get familiar with some Scrapy common practices.
Broad Crawls: Tune Scrapy for crawling a lot of domains in parallel.
Using Firefox for scraping: Learn how to scrape with Firefox and some useful add-ons.
Using Firebug for scraping: Learn how to scrape efficiently using Firebug.
Debugging memory leaks: Learn how to find and get rid of memory leaks in your crawler.
Downloading Item Images: Download static images associated with your scraped items.
Ubuntu packages: Install the latest Scrapy packages easily on Ubuntu.
Scrapyd: Deploy your Scrapy project in production.
AutoThrottle extension: Adjust crawl rate dynamically based on load.
Benchmarking: Check how Scrapy performs on your hardware.
Jobs: pausing and resuming crawls: Learn how to pause and resume crawls for large spiders.
DjangoItem: Write scraped items using Django models.

Architecture overview: Understand the Scrapy architecture.
Downloader Middleware: Customize how pages get requested and downloaded.
Spider Middleware: Customize the input and output of your spiders.
Extensions: Extend Scrapy with your custom functionality.
Core API: Use it in extensions and middlewares to extend Scrapy functionality.

Command line tool: Learn about the command-line tool and see all available commands.
Requests and Responses: Understand the classes used to represent HTTP requests and responses.
Settings: Learn how to configure Scrapy and see all available settings.
Signals: See all available signals and how to work with them.
Exceptions: See all available exceptions and their meaning.
Item Exporters: Quickly export your scraped items to a file (XML, CSV, etc).

Release notes: See what has changed in recent Scrapy versions.
Contributing to Scrapy: Learn how to contribute to the Scrapy project.
Versioning and API Stability: Understand Scrapy versioning and API stability.
Experimental features: Learn about bleeding-edge features.