Logging

Scrapy uses Python’s built-in logging module for event logging. We’ll provide some simple examples to get you started, but for more advanced use cases it’s strongly recommended that you read its documentation thoroughly.

Logging works out of the box, and can be configured to some extent with the Scrapy settings listed in Logging settings.

When running commands, Scrapy calls scrapy.utils.log.configure_logging() to set some reasonable defaults and to apply the settings listed in Logging settings. If you’re running Scrapy from a script, as described in Run Scrapy from a script, it’s recommended that you call it manually.

Log levels

Python’s builtin logging defines 5 different levels to indicate the severity of a given log message. Here are the standard ones, listed in decreasing order:

logging.CRITICAL - for critical errors (highest severity)
logging.ERROR - for regular errors
logging.WARNING - for warning messages
logging.INFO - for informational messages
logging.DEBUG - for debugging messages (lowest severity)
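
The level constants are plain integers, so their ordering can be checked directly. The following is a small sketch that uses only the standard library; the variable name levels_by_severity is made up for illustration:

```python
import logging

# The five standard levels, from highest to lowest severity.
levels_by_severity = [
    logging.CRITICAL,
    logging.ERROR,
    logging.WARNING,
    logging.INFO,
    logging.DEBUG,
]

# Each constant is an integer; a higher number means higher severity.
for level in levels_by_severity:
    print(logging.getLevelName(level), level)
```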

How to log messages

Here’s a quick example of how to log a message using the logging.WARNING level:

import logging

logging.warning("This is a warning")

There are shortcuts for issuing log messages on any of the standard 5 levels, and there’s also a general logging.log method which takes a given level as argument. If needed, the last example could be rewritten as:

import logging

logging.log(logging.WARNING, "This is a warning")

On top of that, you can create different “loggers” to encapsulate messages (for example, a common practice is to create a separate logger for every module). These loggers can be configured independently, and they can be arranged in a hierarchy.
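
As a rough illustration of independent configuration and hierarchy, the sketch below creates two loggers with hypothetical names (myproject and myproject.pipelines). Only the parent gets a level, and the child picks it up through the hierarchy:

```python
import logging

# Hypothetical logger names; dots in a name define the hierarchy.
project_logger = logging.getLogger("myproject")
pipeline_logger = logging.getLogger("myproject.pipelines")

# Configure only the parent. The child has no level of its own,
# so it falls back to the nearest ancestor that has one.
project_logger.setLevel(logging.ERROR)

effective = pipeline_logger.getEffectiveLevel()
print(logging.getLevelName(effective))
```

Setting a level on pipeline_logger itself would override the inherited one for that logger only.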

The previous examples use the root logger behind the scenes, which is a top-level logger to which all messages are propagated (unless otherwise specified). The logging helpers are merely shortcuts for getting the root logger explicitly, so this is equivalent to the previous snippets:

import logging

logger = logging.getLogger()
logger.warning("This is a warning")

You can use a different logger by passing its name to the logging.getLogger function:

import logging

logger = logging.getLogger("mycustomlogger")
logger.warning("This is a warning")

Finally, you can make sure every module you’re working on has its own logger by using the __name__ variable, which holds the current module’s dotted name:

import logging

logger = logging.getLogger(__name__)
logger.warning("This is a warning")

Logging from Spiders

Scrapy provides a logger within each Spider instance, which can be accessed and used like this:

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["https://scrapy.org"]

    def parse(self, response):
        self.logger.info("Parse function called on %s", response.url)

That logger is created using the Spider’s name, but you can use any custom Python logger you want. For example:

import logging
import scrapy

logger = logging.getLogger("mycustomlogger")


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["https://scrapy.org"]

    def parse(self, response):
        logger.info("Parse function called on %s", response.url)

Logging configuration

Loggers on their own don’t manage how messages sent through them are displayed. For this task, different “handlers” can be attached to any logger instance and they will redirect those messages to appropriate destinations, such as the standard output, files, emails, etc.
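
To make the logger/handler split concrete, here is a minimal standard-library sketch. The logger name handler_demo and the in-memory buffer are assumptions chosen so the example is self-contained; they are not something Scrapy sets up:

```python
import io
import logging

# A hypothetical logger; propagation is turned off so the message
# only goes to the handler attached here.
logger = logging.getLogger("handler_demo")
logger.setLevel(logging.INFO)
logger.propagate = False

# The handler decides where records go (an in-memory buffer here),
# and its formatter decides how they look.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.info("Spider opened")
output = buffer.getvalue()
print(output, end="")
```

Swapping the StreamHandler for a logging.FileHandler would send the same records to a file without touching the code that logs them.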

By default, Scrapy sets and configures a handler for the root logger, based on the settings below.

Logging settings

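These settings can be used to configure the logging:

LOG_FILE
LOG_FILE_APPEND
LOG_ENABLED
LOG_ENCODING
LOG_LEVEL
LOG_FORMAT
LOG_DATEFORMAT
LOG_STDOUT
LOG_SHORT_NAMES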

The first few settings define a destination for log messages. If LOG_FILE is set, messages sent through the root logger are redirected to a file named LOG_FILE with encoding LOG_ENCODING. If LOG_FILE is unset and LOG_ENABLED is True, log messages are written to standard error. If LOG_FILE is set and LOG_FILE_APPEND is False, the file is overwritten (discarding the output from previous runs, if any). Lastly, if LOG_ENABLED is False, there won’t be any visible log output.

LOG_LEVEL determines the minimum level of severity to display; messages with lower severity are filtered out. It accepts any of the levels listed in Log levels.

LOG_FORMAT and LOG_DATEFORMAT specify formatting strings used as layouts for all messages. Those strings can contain any placeholders listed in logging’s logrecord attributes docs and datetime’s strftime and strptime directives respectively.
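
The effect of these two strings can be previewed with the standard library alone. In the sketch below, the format strings are example values of the kind these settings accept, and the log record is built by hand for illustration:

```python
import logging

# Example values of the kind LOG_FORMAT and LOG_DATEFORMAT accept.
log_format = "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
log_dateformat = "%Y-%m-%d %H:%M:%S"

formatter = logging.Formatter(log_format, log_dateformat)

# Build a record by hand so the example does not depend on any handler.
record = logging.LogRecord(
    name="myspider",
    level=logging.INFO,
    pathname="example.py",
    lineno=1,
    msg="Parse function called on %s",
    args=("https://scrapy.org",),
    exc_info=None,
)

line = formatter.format(record)
print(line)
```

The timestamp at the start of the line reflects the moment the record was created, rendered with the date format string.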

If LOG_SHORT_NAMES is set, then the logs will not display the Scrapy component that prints the log. It is unset by default, hence logs contain the Scrapy component responsible for that log output.
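
Put together, a project’s settings module might contain something like the following. The values are illustrative only, not recommendations, and the file name crawl.log is hypothetical:

```python
# settings.py (illustrative values)
LOG_ENABLED = True
LOG_FILE = "crawl.log"  # hypothetical file name
LOG_FILE_APPEND = False  # start each run with an empty file
LOG_ENCODING = "utf-8"
LOG_LEVEL = "INFO"  # hide DEBUG messages
LOG_SHORT_NAMES = False  # keep the full component name in each line
```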

Rotating log files

Scrapy’s LOG_FILE setting writes logs to a single file. It does not rotate log files automatically, but you can use Python’s standard logging.handlers module when running Scrapy from a script.

For example, to rotate the log file every day:

import logging
from logging.handlers import TimedRotatingFileHandler

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myproject.spiders.myspider import MySpider

settings = get_project_settings()
process = CrawlerProcess(settings, install_root_handler=False)

handler = TimedRotatingFileHandler(
    "scrapy.log",
    when="midnight",
    backupCount=7,
    encoding=settings.get("LOG_ENCODING"),
)
handler.setFormatter(
    logging.Formatter(settings.get("LOG_FORMAT"), settings.get("LOG_DATEFORMAT"))
)

root_logger = logging.getLogger()
root_logger.setLevel(settings.get("LOG_LEVEL"))
root_logger.addHandler(handler)

process.crawl(MySpider)
process.start()

Command-line options

There are command-line arguments, available for all commands, that you can use to override some of the Scrapy settings regarding logging.

--logfile FILE
Overrides LOG_FILE
--loglevel/-L LEVEL
Overrides LOG_LEVEL
--nolog
Sets LOG_ENABLED to False

Custom Log Formats

A custom log format can be set for different actions by extending the LogFormatter class and making LOG_FORMATTER point to your new class.

class scrapy.logformatter.LogFormatter

Class for generating log messages for different actions.

All methods must return a dictionary listing the parameters level, msg and args which are going to be used for constructing the log message when calling logging.log.

Dictionary keys for the method outputs:

level is the log level for that action. You can use those from the Python logging library: logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR and logging.CRITICAL.
msg should be a string that can contain different formatting placeholders. This string, formatted with the provided args, is going to be the log message for that action.
args should be a tuple or dict with the formatting placeholders for msg. The final log message is computed as msg % args.
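
The msg % args step can be tried without Scrapy. The dictionary below is written by hand in the shape described above; the exception text and the item are made-up values:

```python
import logging
import os

# A hand-written dictionary in the shape described above;
# the exception text and item are made up.
result = {
    "level": logging.INFO,
    "msg": "Dropped: %(exception)s" + os.linesep + "%(item)s",
    "args": {"exception": "Missing price", "item": {"name": "book"}},
}

# The final text is produced with ordinary %-formatting.
final_message = result["msg"] % result["args"]
print(final_message)
```

Because args is a dict here, the placeholders in msg are named ones such as %(item)s; with a tuple, positional placeholders such as %s would be used instead.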

Users can define their own LogFormatter class if they want to customize how each action is logged or if they want to omit it entirely. In order to omit logging an action the method must return None.

Here is an example of how to create a custom log formatter that lowers the severity level of the log message when an item is dropped from the pipeline:

import logging
import os

from scrapy import logformatter


class PoliteLogFormatter(logformatter.LogFormatter):
    def dropped(self, item, exception, response, spider):
        return {
            "level": logging.INFO,  # lowering the level from logging.WARNING
            "msg": "Dropped: %(exception)s" + os.linesep + "%(item)s",
            "args": {
                "exception": exception,
                "item": item,
            },
        }

crawled(request: Request, response: Response, spider: Spider) → LogFormatterResult: Logs a message when the crawler finds a webpage.

download_error(failure: Failure, request: Request, spider: Spider, errmsg: str | None = None) → LogFormatterResult: Logs a download error message from a spider (typically coming from the engine).

dropped(item: Any, exception: BaseException, response: Response | Failure | None, spider: Spider) → LogFormatterResult: Logs a message when an item is dropped while it is passing through the item pipeline.

item_error(item: Any, exception: BaseException, response: Response | Failure | None, spider: Spider) → LogFormatterResult: Logs a message when an item causes an error while it is passing through the item pipeline.

scraped(item: Any, response: Response | Failure | None, spider: Spider) → LogFormatterResult: Logs a message when an item is scraped by a spider.

spider_error(failure: Failure, request: Request, response: Response | Failure, spider: Spider) → LogFormatterResult: Logs an error message from a spider.

Advanced customization

Because Scrapy uses the stdlib logging module, you can customize logging using all of its features.

For example, let’s say you’re scraping a website which returns many HTTP 404 and 500 responses, and you want to hide all messages like this:

2016-12-16 22:00:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring
response <500 https://quotes.toscrape.com/page/1-34/>: HTTP status code
is not handled or not allowed

The first thing to note is the logger name, which appears in brackets: [scrapy.spidermiddlewares.httperror]. If you get just [scrapy], then LOG_SHORT_NAMES is likely set to True; set it to False and re-run the crawl.

Next, we can see that the message has INFO level. To hide it, we should set the logging level for scrapy.spidermiddlewares.httperror higher than INFO; the next level after INFO is WARNING. This can be done, for example, in the spider’s __init__ method:

import logging
import scrapy


class MySpider(scrapy.Spider):
    # ...
    def __init__(self, *args, **kwargs):
        logger = logging.getLogger("scrapy.spidermiddlewares.httperror")
        logger.setLevel(logging.WARNING)
        super().__init__(*args, **kwargs)

If you run this spider again, INFO messages from the scrapy.spidermiddlewares.httperror logger will be gone.
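
The same effect can be observed outside a crawl. This sketch uses only the standard library: it attaches a throwaway in-memory handler to the logger with that name (an assumption made so the output can be captured) and sends it two hand-written messages:

```python
import io
import logging

# Capture output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("scrapy.spidermiddlewares.httperror")
logger.addHandler(handler)
logger.propagate = False  # keep this demo away from the root logger

# Raising the logger's level hides everything below WARNING.
logger.setLevel(logging.WARNING)

logger.info("Ignoring response <500>: HTTP status code is not handled")
logger.warning("Something worth seeing")

captured = stream.getvalue()
print(captured, end="")
```

Only the WARNING line reaches the handler; the INFO record is discarded by the logger before any handler sees it.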

You can also filter log records by LogRecord data. For example, you can filter log records by message content using a substring or a regular expression. Create a logging.Filter subclass and equip it with a regular expression pattern to filter out unwanted messages:

import logging
import re


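class ContentFilter(logging.Filter):
    def filter(self, record):
        # Use getMessage(): record.message is only set once a formatter has run.
        match = re.search(r"\d{3} [Ee]rror, retrying", record.getMessage())
        # Return False to drop matching records and True to keep all others.
        return match is None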

A project-level filter may be attached to the root handler created by Scrapy. This is a convenient way to filter all loggers in different parts of the project (middlewares, spider, etc.):

import logging
import scrapy


class MySpider(scrapy.Spider):
    # ...
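    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Attach the filter to every handler installed on the root logger.
        for handler in logging.root.handlers:
            handler.addFilter(ContentFilter())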

Alternatively, you may choose a specific logger and hide it without affecting other loggers:

import logging
import scrapy


class MySpider(scrapy.Spider):
    # ...
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        logger = logging.getLogger("my_logger")
        logger.addFilter(ContentFilter())
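
To check that a filter of this kind behaves as intended before using it in a crawl, it can be exercised with the standard library alone. In this sketch the class name RetryNoiseFilter, the logger name filter_demo and the two messages are all hypothetical:

```python
import io
import logging
import re


class RetryNoiseFilter(logging.Filter):
    """Hypothetical filter that drops messages like '503 Error, retrying'."""

    pattern = re.compile(r"\d{3} [Ee]rror, retrying")

    def filter(self, record):
        # getMessage() merges msg and args, so it works before formatting.
        return self.pattern.search(record.getMessage()) is None


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(message)s"))
handler.addFilter(RetryNoiseFilter())

logger = logging.getLogger("filter_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.warning("%d Error, retrying", 503)  # dropped by the filter
logger.info("Parsed 10 items")  # kept

output = buffer.getvalue()
print(output, end="")
```

A filter attached to a handler applies to every record that handler receives, whichever logger produced it, whereas a filter attached to a logger only applies to records logged directly through that logger.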

scrapy.utils.log module

scrapy.utils.log.configure_logging(settings: Settings | dict[str, Any] | None = None, install_root_handler: bool = True) → None

Initialize logging defaults for Scrapy.

Parameters:

settings (dict, Settings object or None) – settings used to create and configure a handler for the root logger (default: None).
install_root_handler (bool) – whether to install root logging handler (default: True)

This function does:

Route warnings and twisted logging through Python standard logging
Assign DEBUG and ERROR level to Scrapy and Twisted loggers respectively
Route stdout to log if LOG_STDOUT setting is True

When install_root_handler is True (default), this function also creates a handler for the root logger according to given settings (see Logging settings). You can override default options using settings argument. When settings is empty or None, defaults are used.

configure_logging is automatically called when using Scrapy commands or CrawlerProcess. When running custom scripts with CrawlerRunner it is not called for you; calling it explicitly is not required in that case, but it is recommended.

Another option when running custom scripts is to manually configure the logging. To do this you can use logging.basicConfig() to set a basic root handler.

Note that CrawlerProcess automatically calls configure_logging, so it is recommended to only use logging.basicConfig() together with CrawlerRunner.

This is an example of how to redirect INFO or higher messages to a file:

import logging

logging.basicConfig(
    filename="log.txt", format="%(levelname)s: %(message)s", level=logging.INFO
)

Refer to Run Scrapy from a script for more details about using Scrapy this way.