Choosing Log Levels

There are many tools available that will allow a programmer to create log entries.  I originally worked with log4j, but have recently been working with logback. I am also working with Jython and am looking at Python’s logging framework. Used well they can make log analysis simple and provide a rich tool for resolving issues. Used poorly they can generate a mass of relatively useless information.

While these loggers have a good hierarchy of levels, the documentation tends to be lacking in guidance on when to use which level. The following document contains the recommendations I have gathered over the years.

The log levels provided by log4j, logback and other frameworks provide a framework for generating logs of increasing detail and decreasing severiry. Log records containing the log level can be readily filtered to include or exclude particular types of records.

Custom loggers can be used to for separate logging requirements such as security, or user interface tracking. Custom logs typically have fewer levels than operational logs, usually no more than Error, Warn, and Info.

The guidelines for access logs are different from the logs discussed here. They are typically generated with a consistent format, and on a one to one basis with each request by the application or its container.

The guidelines below apply to operational logs and are based on multiple references and years of reviewing logs. They may provide a starting point to create guidelines for custom logs. They are ordered in level of decreasing severity (increasing detail and volume), not in the order of use.

FATAL

The Fatal level designates very severe error events that should lead the application to shutting down. It may not be possible to create these log entries as the event may be running out of memory or another required resource. They may also result from stack overflows, memory corruption, or other non-recoverable condition.  The application or thread should shut down following attempting to log the event. A common use would be to log initialization failures, that prevent the application from starting successfully. If the application is able to run properly after the event, then a different log level should be used. These events should happen extremely rarely.

ERROR

The Error level designates error events that might still allow the application to continue running. This is the level for code and temporary resource errors which can be worked around. They will likely the current request to fail. Events logged at this level should be resolvable by correcting the code, or providing the required resource. Error events should be rare. An application generating more than a few a day, likely needs to be fixed.

Consider logging a Fatal error if the application will be unable to resume normal operations when the error condition is resolved. This may not be appropriate for minor functionality. However, logging a fatal condition may be appropriate for core functionality where the application needs to be restarted to recover.

Application frameworks will typically log un-handled errors at this level. It is usually possible, to capture the error before the framework receives the error, or filter the error before the framework logs it. If it is a common condition that requires a standard response, then it is likely better to handle it and log at a more appropriate level. Consider implementing guard code to prevent such errors. Web applications can set the HTTP status code to an appropriate value after handling the error. (If the HTTP header has been written, neither the contain nor the application can change the HTTP status.)

This is not the level to log user errors such as invalid 9. Nor should it be used to log common conditions that the application should be handling. If you do want to log these events at this level, use a custom logger and filter the messages from the main logs. This will retain the integrity of the operational logs and provide a clean log for analysis.

WARN

The Warn level designates potentially harmful situations. This is an appropriate level for events indicating an attempted intrusion or poor performance. This includes security messages for multiple password failures, or account lockouts.  If the application is monitoring its performance, it would be appropriate to log poor or declining performance at this level. This could include messages indicating processing being deferred due to resource overloading.

INFO

The Info level designates informational messages that highlight the progress of the application at coarse-grained level. This is the lowest level for non-debug events logging. Any non-debug logging that has not been covered by the levels above should be logged at this level.

Depending on the requirements it is common for transactions to occur without logging anything. Successful  transactions should generate few log messages. Logging high volumes of messages can cause scaling issues.

Under normal operation, Info level messages are may not not logged to the console, which normally only gets messages at or above the  Info, Warn or Error level. Not all environments will capture the console log, so a log file logging messages at the Info level and above should be configured. Logging to a file can be made conditional so that disk space is not used when the logs won’t be examined.

It is common to log application configuration messages at this level. This may be limited to logging non-default values, and possibly significant default values. Some applications provide flags to trigger additional configuration logging.

DEBUG

The Debug level designates fine-grained informational events that are most useful to debug an application. This level is used to log debug details. It is appropriate to use this level instead of a print statement when debugging issues.

Some logging frameworks can ignore Debug and Trace level messages with little overhead.   If not, there is usually a Boolean function that can be called and used to conditionally bypass the logging call. Research the recommended approach for your logger.

TRACE

The Trace level designates fine-grained informational events than the Debug level. Typically, these would be used to trace entry into and return from methods. Trace messages can be used to manually verify the expected result is generated from the method inputs.

The entry message could contain the incoming parameters. The entry logger statement would typically be the first line of the method.

The return value log message may be in the called method, the calling method, or both. Logging in the called method is simpler when there should be one return point and the return value is in a method variable.  The returned value needs to be in a variable to be logged. This may disable some code optimizations. The logger call should be immediately before the return in a called method, or immediately after the call in a calling method.

Notes

Levels from WARN up are usually written to the console. Many systems will also log INFO level messages to the console, although if this is a busy level it may be omitted. Console output may also be logged to the container log.

Application logs generally will contain all enabled log levels. For production systems this is normally INFO level and up. Debug and trace level may be enabled selectively for short periods of times. The logging framework contains components which must be synchronized, so excessive logging can be a bottleneck.

Some environments may have a debug log. This may contain only DEBUG and TRACE level logs. However, it is common to log all messages to this log. Debug logs are often frequently rotated and purged quickly.

References