In large systems, logs can be terrifying: they are huge in volume, and hard to understand.This note lists some suggestions and common misuse of Python's logging
module,with the aim of:
Loggers are globally identified by a dot-separated name given to logging.getLogger(name)
, such as library.module.submodule
.The logger named by an empty string is the "root logger".
Libraries must not call logging.basicConfig
or configure the root logger in any way, unless requested by users.
Configuration of the root logger affects all logs, which is beyond the responsibility of anysingle library. Only application developers, i.e. those who create the program that interacts withusers, should determine how to configure the root logger.
Never call functions like logging.{info,error}
from within a library, because they write to theroot logger. I've added this advice into CPython's official documentation.
When a library writes to the root logger, applications that use the library lose control over thelibrary's logging behavior: they cannot turn on/off the logs from the library, apply custom filter/formatter, or redirect thelogs from the library.
Instead, A library should write to a logger with easily and uniquely identifiable name, using
|
This way, caller of the library will be able to reconfigure its logger using the same name.
__name__
, i.e. the current module name, is often a good logger name.Occasionally, __name__
is not good enough:
__name__
may beuninformative and can be removed, e.g. my_lib.submodule._internal._impl
.Note that there is a trade-off between the name simplicity and the granularity of control.__name__
, e.g. company.company3.organization
.Removing such a common prefix can simplify the logger names while still keeping them unique.The "current function/class name" is often a bad logger name because:
I wrote a simple scriptthat processes Python source files to automatically replace all logging.xxx
by logging.getLogger(__name__).xxx
.This script has created PRs in a few projects that misuse the root logger, such aspytorch/72649,tensorboardX/662.
I hope someone could create a linter that performs this check.
Handlercan be attached to loggers to decide where/how to log a record.
Unless requested by users,a library should not add a handler anywhere (even not to its own logger),if the handler has a visible effect on users.This is because the application developer should make the final call onhow each library's logs are processed.Pre-existing handlers may cause issues such as duplicated logs.This suggestion is present in CPython's documentation here.
Examples of invisible handlers that libraries may add to their loggers are:
Libraries should try to be good citizens in reducing the amount of duplicate/unwanted/useless logs theyprinted. Some tips include:
INFO
for debugging.__init__
.if valid(): log(...)
.log_first_n
:log only for the first times. It limits the total number of certain logs.log_every_n_seconds
:limit the frequency of certain logs to be less than once every seconds.log_every_n
:log once every calls. It reduces the total number of certain logs by a factor of .logging
module directly.Logs are not only strings. logging.LogRecord
is a rich structure with useful attributes, and users can even tag logs with custom attributes through the extra=
argument.
Large, distributed systems should not rely on printing as the sole method of logging.Whenever logs are printed (to terminal or files), they have to be converted to strings.A lot of useful attributes, such as stack trace and line number, are often lost. The lack of structure also makes it difficult toparse and analyze the logs.
In addition to printing, we can also use an additional Handler
to send structured logs to a logging service,such as Google Cloud Logging or Humio.The advantages include:
In an MPI-like distributed job (e.g. many data-parallel deep learning training), workers often print almost identical logs.We should avoid printing them all to the terminal.
A good strategy could be:
Detectron2's setup_logger
implements (1) and (2).
When logs are printed to terminal, they are more readable ifseverity is represented by colors rather than strings.I often use this formatter:
|
Attach this formatter only when the handler writes to terminals (check sys.stdout.isatty
),and we'll get outputs like:
logging
module is not enoughBe aware that it's insufficient to only rely on logging
module.A Python program may produce useful logs bypassing the logging
module, e.g.:
print
statements: they should be avoided, but may still exist.To not miss important logs, a comprehensive logging solution needs to integrateboth the structured logs from Python and the less common unstructured logs from the above sources.