Debugging Techniques

Be it when initially setting up mod_wsgi for the first time, or later during development or use of your WSGI application, you are bound to get some sort of unexpected Python error. By default all you are usually going to see as evidence of this is a HTTP 500 “Internal Server Error” being displayed in the browser window and little else.

The purpose of this document is to explain where to go look for more details on what caused the error, describe techniques one can use to have your WSGI application generate more useful debug information, as well as mechanisms that can be employed to interactively debug your application.

Note that although this document is intended to deal with techniques which can be used when using mod_wsgi, many of the techniques are also directly transferable or adaptable to other web hosting mechanisms for WSGI applications.

Apache Error Log Files

When using mod_wsgi, unless you or the web framework you are using takes specific action to catch exceptions and present the details in an alternate manner, the only place that details of uncaught exceptions will be recorded is in the Apache error log files. The Apache error log files are therefore your prime source of information when things go wrong.

Do note though that log messages generated by mod_wsgi are logged with various severity levels and which ones will be output to the Apache error log files will depend on how Apache has been configured. The standard configuration for Apache has the !LogLevel directive being set to ‘warn’. With this setting any important error messages will be output, but informational messages generated by mod_wsgi which can assist in working out what it is doing are not. Thus, if new to mod_wsgi or trying to debug a problem, it is worthwhile setting the Apache configuration to use ‘info’ log level instead:

LogLevel info

If your Apache web server is only providing services for one host, it is likely that you will only have one error log file. If however the Apache web server is configured for multiple virtual hosts, then it is possible that there will be multiple error log files, one corresponding to the main server host and an additional error log file for each virtual host. Such a virtual host specific error log if one is being used, would have been configured through the placing of the Apache CustomLog directive within the context of the VirtualHost container.

Although your WSGI application may be hosted within a particular virtual host and that virtual host has its own error log file, some error and informational messages will still go to the main server host error log file. Thus you may still need to consult both error log files when using virtual hosts.

Messages of note that will end up in the main server host error log file include notifications in regard to initialisation of Python and the creation and destruction of Python sub interpreters, plus any errors which occur when doing this.

Messages of note that would end up in the virtual host error log file, if it exists, include details of uncaught Python exceptions which occur when the WSGI application script is being loaded, or when the WSGI application callable object is being executed.

Messages that are logged by a WSGI application via the ‘wsgi.errors’ object passed through to the application in the WSGI environment are also logged. These will go to the virtual host error log file if it exists, or the main error log file if the virtual host is not setup with its own error log file. Thus, if you want to add debugging messages to your WSGI application code, you can use ‘wsgi.errors’ in conjunction with the ‘print’ statement as shown below:

def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'

    print("application debug #1", file=environ['wsgi.errors'])

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    print("application debug #2", file=environ['wsgi.errors'])

    return [output]

Note

If you are using Python 2, you will need to enable the print function at the beginning of the file:

from __future__ import print_function

Alternatively, always use print as a statement rather than a function:

print >> environ['wsgi.errors'], "application debug #N"

If ‘wsgi.errors’ is not available to the code which needs to output log messages, then it should explicitly direct output from ‘print’ to ‘sys.stderr’:

import sys

def function():
    print("application debug #3", file=sys.stderr)
    ...

If sys.stderr or sys.stdout is used directly then these messages will end up in the main server host error log file and not that for the virtual host unless the WSGI application is running in a daemon process specifically associated with a virtual host.

Do be aware though that writing to sys.stdout is by default restricted in versions of mod_wsgi prior to 3.0 and will result in an exception occurring of the form:

IOError: sys.stdout access restricted by mod_wsgi

This is because portable WSGI applications should not write to sys.stdout or use ‘print’ without specifying an alternate file object besides sys.stdout as the target. This restriction can be disabled for the whole server using the WSGIRestrictStdout directive, or by mapping sys.stdout to sys.stderr at global scope within in the WSGI application script file:

import sys
sys.stdout = sys.stderr

In general, a WSGI application should always endeavour to only log messages via the ‘wsgi.errors’ object that is passed through to a WSGI application in the WSGI environment. This is because this is the only way of logging messages for which there is some guarantee that they will end up in a log file that you might have access to if using a shared server.

An application shouldn’t however cache ‘wsgi.errors’ and try to use it outside of the context of a request. If this is done an exception will be raised indicating that the request has expired and the error log object is now invalid.

That messages output via sys.stderr and sys.stdout end up in the Apache error logs at all is provided as a convenience but there is no requirement in the WSGI specification that they are valid means of a WSGI application logging messages.

Displaying Request Environment

When a WSGI application is invoked, the request headers are passed as CGI variables in the WSGI request environment. The dictionary used for this also holds information about the WSGI execution environment and mod_wsgi. This includes mod_wsgi specific variables indicating the name of the process and application groups within which the WSGI application is executing.

Knowing the values of the process and application group variables can be important when needing to validate that your Apache configuration is doing what you intended as far as ensuring your WSGI application is running in daemon mode or otherwise.

A simple way of validating such details or getting access to any of the other WSGI request environment variables is to substitute your existing WSGI application with one which echos back the details to your browser. Such a task can be achieved with the following test application. The application could be extended as necessary to display other information as well, with process ID, user ID and group ID being shown as examples:

import cStringIO
import os

def application(environ, start_response):
    headers = []
    headers.append(('Content-Type', 'text/plain'))
    write = start_response('200 OK', headers)

    input = environ['wsgi.input']
    output = cStringIO.StringIO()

    print("PID: %s" % os.getpid(), file=output)
    print("UID: %s" % os.getuid(), file=output)
    print("GID: %s" % os.getgid(), file=output)
    print(file=output)

    keys = environ.keys()
    keys.sort()
    for key in keys:
        print('%s: %s' % (key, repr(environ[key])), file=output)
    print(file=output)

    output.write(input.read(int(environ.get('CONTENT_LENGTH', '0'))))

    return [output.getvalue()]

For the case of the process group as recorded by the ‘mod_wsgi.process_group’ variable in the WSGI request environment, if the value is an empty string then the WSGI application is running in embedded mode. For any other value it will be running in daemon mode with the process group named by the variables value.

Note that by default WSGI applications run in embedded mode, which means within the Apache server child processes which accept the original requests. Daemon mode processes would only be used through appropriate use of the WSGIDaemonProcess and WSGIProcessGroup directives to delegate the WSGI application to a named daemon process group.

For the case of the application group as recorded by the ‘mod_wsgi.application_group’ variable in the WSGI request environment, if the value is an empty string then the WSGI application is running in the main Python interpreter. That is, the very first interpreter created when Python was initialised. For any other value it indicates it is running in the named Python sub interpreter.

Note that by default WSGI applications would always run in a sub interpreter rather than the main interpreter. The name of this sub interpreter would be automatically constructed from the name of the server or virtual host, the URL mount point of the WSGI application and the number of the listener port when it is other than ports 80 or 443.

To delegate a WSGI application to run in main Python interpreter, the WSGIApplicationGroup directive would need to have been used with the value ‘%{GLOBAL}’. Although the value is ‘%{GLOBAL}’, this translates to the empty string seen for the value of ‘mod_wsgi.application_group’ within the WSGI request environment.

The WSGIApplicationGroup directive could also be used to designate a specific named sub interpreter rather than that selected automatically.

For newcomers this can all be a bit confusing, which is where the test application comes in as you can use it to validate where your WSGI application is running is where you intended it to run.

The set of WSGI request environment variables will also show the WSGI variables indicating whether process is multithreaded and whether the process group is multiprocess or not. For a more complete explanation of what that means see documentation of Processes And Threading.

Tracking Request and Response

Although one can use above test application to display the request environment, it is replacing your original WSGI application. Rather than replace your existing application you can use a WSGI middleware wrapper application which logs the details to the Apache error log instead:

# Original WSGI application.

def application(environ, start_response):
    ...

# Logging WSGI middleware.

import pprint

class LoggingMiddleware:

    def __init__(self, application):
        self.__application = application

    def __call__(self, environ, start_response):
        errors = environ['wsgi.errors']
        pprint.pprint(('REQUEST', environ), stream=errors)

        def _start_response(status, headers, *args):
            pprint.pprint(('RESPONSE', status, headers), stream=errors)
            return start_response(status, headers, *args)

        return self.__application(environ, _start_response)

application = LoggingMiddleware(application)

The output from the middleware would end up in the Apache error log for the virtual host, or if no virtual host specific error log file, in the main Apache error log file.

For more complicated problems it may also be necessary to track both the request and response content as well. A more complicated middleware which can log these as well as header information to the file system is as follows:

# Original WSGI application.

def application(environ, start_response):
    ...

# Logging WSGI middleware.

import threading
import pprint
import time
import os

class LoggingInstance:
    def __init__(self, start_response, oheaders, ocontent):
        self.__start_response = start_response
        self.__oheaders = oheaders
        self.__ocontent = ocontent

    def __call__(self, status, headers, *args):
        pprint.pprint((status, headers)+args, stream=self.__oheaders)
        self.__oheaders.close()

        self.__write = self.__start_response(status, headers, *args)
        return self.write

    def __iter__(self):
        return self

    def write(self, data):
        self.__ocontent.write(data)
        self.__ocontent.flush()
        return self.__write(data)

    def next(self):
        data = self.__iterable.next()
        self.__ocontent.write(data)
        self.__ocontent.flush()
        return data

    def close(self):
        if hasattr(self.__iterable, 'close'):
            self.__iterable.close()
        self.__ocontent.close()

    def link(self, iterable):
        self.__iterable = iter(iterable)

class LoggingMiddleware:

    def __init__(self, application, savedir):
        self.__application = application
        self.__savedir = savedir
        self.__lock = threading.Lock()
        self.__pid = os.getpid()
        self.__count = 0

    def __call__(self, environ, start_response):
        self.__lock.acquire()
        self.__count += 1
        count = self.__count
        self.__lock.release()

        key = "%s-%s-%s" % (time.time(), self.__pid, count)

        iheaders = os.path.join(self.__savedir, key + ".iheaders")
        iheaders_fp = open(iheaders, 'w')

        icontent = os.path.join(self.__savedir, key + ".icontent")
        icontent_fp = open(icontent, 'w+b')

        oheaders = os.path.join(self.__savedir, key + ".oheaders")
        oheaders_fp = open(oheaders, 'w')

        ocontent = os.path.join(self.__savedir, key + ".ocontent")
        ocontent_fp = open(ocontent, 'w+b')

        errors = environ['wsgi.errors']
        pprint.pprint(environ, stream=iheaders_fp)
        iheaders_fp.close()

        length = int(environ.get('CONTENT_LENGTH', '0'))
        input = environ['wsgi.input']
        while length != 0:
            data = input.read(min(4096, length))
            if data:
                icontent_fp.write(data)
                length -= len(data)
            else:
                length = 0
        icontent_fp.flush()
        icontent_fp.seek(0, os.SEEK_SET)
        environ['wsgi.input'] = icontent_fp

        iterable = LoggingInstance(start_response, oheaders_fp, ocontent_fp)
        iterable.link(self.__application(environ, iterable))
        return iterable

application = LoggingMiddleware(application, '/tmp/wsgi')

For this middleware, the second argument to the constructor should be a preexisting directory. For each request four files will be saved. These correspond to input headers, input content, response status and headers, and request content.

Poorly Performing Code

The WSGI specification allows any iterable object to be returned as the response, so long as the iterable yields byte-strings (bytes, or str on Python 2). That this is the case means that one can too easily return an object which satisfies this requirement but has some sort of performance related issue.

The worst case of this is where instead of returning a list containing byte-strings, a single byte-string value is returned. The problem is that when a byte-string is iterated over, a single byte is yielded each time. In other words, a single byte is written back to the client on each loop, with a flush occurring in between to ensure that the byte has actually been written and isn’t just being buffered.

Although for small byte-strings a performance impact may not be noticed, if returning more data the effect on request throughput could be quite significant.

Another case which can cause problems is to return a file like object. For iteration over a file like object, typically what can occur is that a single line within the file is returned each time. If the file is a line oriented text file where each line is a of a reasonable length, this may be okay, but if the file is a binary file there may not actually be line breaks within the file.

For the case where file contains many short lines, throughput would be affected much like in the case where a byte-string is returned. For the case where the file is just binary data, the result can be that the complete file may be read in on the first loop. If the file is large, this could cause a large transient spike in memory usage. Once that memory is allocated, it will then usually be retained by the process, albeit that it may be reused by the process at a later point.

Because of the performance impacts in terms of throughput and memory usage, both these cases should be avoided. For the case of returning a byte-string, it should be returned with a single element list. For the case of a file like object, the ‘wsgi.file_wrapper’ extension should be used, or a wrapper which suitably breaks the response into chunks.

In order to identify where code may be inadvertently returning such iterable types, the following code can be used:

import io
import socket
import sys

BAD_ITERABLES = [
  bytes,
  socket.socket,
  io.IOBase,
]
if sys.version_info < (3, 0):
    # Python 2
    import types
    import cStringIO
    import StringIO
    BAD_ITERABLES.extend([
        types.FileType,
        cStringIO.InputType,
        StringIO.StringIO,
    ])

class ValidatingMiddleware:

    def __init__(self, application):
        self.__application = application

    def __call__(self, environ, start_response):
        errors = environ['wsgi.errors']

        result = self.__application(environ, start_response)

        value = type(result)
        if value in BAD_ITERABLES:
            print('BAD ITERABLE RETURNED: ', file=errors, end='')
            print('URL=%s ' % environ['REQUEST_URI'], file=errors, end='')
            print('TYPE=%s' % value, file=errors)

        return result

def application(environ, start_response):
    ...

application = ValidatingMiddleware(application)

Error Catching Middleware

Because mod_wsgi only logs details of uncaught exceptions to the Apache error log and returns a generic HTTP 500 “Internal Server Error” response, if you want the details of any exception to be displayed in the error page and be visible from the browser, you will need to use a WSGI error catching middleware component.

One example of WSGI error catching middleware is the ErrorMiddleware class from Paste.

This class can be configured not only to catch exceptions and present the details to the browser in an error page, it can also be configured to send the details of any errors in email to a designated recipient, or log the details to an alternate log file.

Being able to have error details sent by email would be useful in a production environment or where your application is running on a web hosting environment and the Apache error logs would not necessarily be closely monitored on a day to day basis. Enabling of that particular feature though should possibly only be done when you have some confidence in the application else you might end up getting inundated with emails.

To use the error catching middleware from Paste you simply need to wrap your existing application with it such that it then becomes the top level application entry point:

def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!\n'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

from paste.exceptions.errormiddleware import ErrorMiddleware
application = ErrorMiddleware(application, debug=True)

In addition to displaying information about the Python exception that has occurred and the stack traceback, this middleware component will also output information about the WSGI environment such that you can see what was being passed to the WSGI application. This can be useful if the cause of any problem was unexpected values passed in the headers of the HTTP request.

Note that error catching middleware is of absolutely no use for trying to capture and display in the browser any errors that occur at global scope within the WSGI application script when it is being imported. Details of any such errors occurring at this point will only be captured in the Apache error log files. As much as possible you should avoid performing complicated tasks when the WSGI application script file is being imported, instead you should only trigger such actions the first time a request is received. By doing this you will be able to capture errors in such initialisation code with the error catching middleware.

Also note that the debug mode whereby details are displayed in the browser should only be used during development and not in a production system. This is because details which are displayed may be of use to anyone who may wish to compromise your site.

Python Interactive Debugger

Python debuggers such as implemented by the ‘pdb’ module can sometimes be useful in debugging Python applications, especially where there is a need to single step through code and analyse application state at each point. Use of such debuggers in web applications can be a bit more tricky than normal applications though and especially so with mod_wsgi.

The problem with mod_wsgi is that the Apache web server can create multiple child processes to respond to requests. Partly because of this, but also just to prevent problems in general, Apache closes off standard input at startup. Thus there is no actual way to interact with the Python debugger module if it were used.

To get around this requires having complete control of the Apache web server that you are using to host your WSGI application. In particular, it will be necessary to shutdown the web server and then startup the ‘httpd’ process explicitly in single process debug mode, avoiding the ‘apachectl’ management application altogether:

$ apachectl stop
$ httpd -X

If Apache is normally started as the ‘root’ user, this also will need to be run as the ‘root’ user otherwise the Apache web server will not have the required permissions to write to its log directories etc.

The result of starting the ‘httpd’ process in this way will be that the Apache web server will run everything in one process rather than using multiple processes. Further, it will not close off standard input thus allowing the Python debugger to be used.

Do note though that one cannot be using the ability of mod_wsgi to run your application in a daemon process when doing this. The WSGI application must be running within the main Apache process.

To trigger the Python debugger for any call within your code, the following customised wrapper for the ‘Pdb’ class should be used:

class Debugger:

    def __init__(self, object):
        self.__object = object

    def __call__(self, *args, **kwargs):
        import pdb, sys
        debugger = pdb.Pdb()
        debugger.use_rawinput = 0
        debugger.reset()
        sys.settrace(debugger.trace_dispatch)

        try:
            return self.__object(*args, **kwargs)
        finally:
            debugger.quitting = 1
            sys.settrace(None)

This might for example be used to wrap the actual WSGI application callable object:

def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!\n'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

application = Debugger(application)

When a request is now received, the Python debugger will be triggered and you can interactively debug your application from the window you ran the ‘httpd’ process. For example:

> /usr/local/wsgi/scripts/hello.py(21)application()
-> status = '200 OK'

(Pdb) list
 16             finally:
 17                 debugger.quitting = 1
 18                 sys.settrace(None)
 19
 20     def application(environ, start_response):
 21  ->     status = '200 OK'
 22         output = b'Hello World!\n'
 23
 24         response_headers = [('Content-type', 'text/plain'),
 25                             ('Content-Length', str(len(output)))]
 26         start_response(status, response_headers)

(Pdb) print start_response
<built-in method start_response of mod_wsgi.Adapter object at 0x1160180>

cont

When wishing to allow the request to complete, issue the ‘cont’ command. If wishing to cause the request to abort, issue the ‘quit’ command. This will result in a ‘BdbQuit’ exception being raised and would result in a HTTP 500 “Internal Server Error” response being returned to the client. To kill off the whole ‘httpd’ process, after having issued ‘cont’ or ‘quit’ to exit the debugger, interrupt the process using ‘CTRL-C’.

To see what commands the Python debugger accepts, issue the ‘help’ command and also consult the documentation for the ‘pdb’ module on the Python web site.

Note that the Python debugger expects to be able to write to sys.stdout to display information to the terminal. Thus if using using a Python web framework which replaces sys.stdout such as web.py, you will not be able to use the Python debugger.

Browser Based Debugger

In order to use the Python debugger modules you need to have direct access to the host and the Apache web server that is running your WSGI application. If your only access to the system is via your web browser this makes the use of the full Python debugger impractical.

An alternative to the Python debugger modules which is available is an extension of the WSGI error catching middleware previously described. This is the EvalException class from Paste. It embodies the error catching attributes of the ErrorMiddleware class, but also allows some measure of interactive debugging and introspection through the web browser.

As with any WSGI middleware component, to use the class entails creating a wrapper around the application you wish to debug:

def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!\n'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

from paste.evalexception.middleware import EvalException
application = EvalException(application)

Like ErrorMiddleware when an unexpected exception occurs a web page is presented which shows the location of the error along with the contents of the WSGI application environment. Where EvalException is different however is that it is possible to inspect the local variables residing within each stack frame down to where the error occurred. Further, it is possible to enter Python code which can be evaluated within the context of the selected stack frame in order to access data or call functions or methods of objects.

In order for this to all work requires that subsequent requests back to the WSGI application always end up with the same process where the error originally occurred. With mod_wsgi this does however present a bit of a problem as Apache can create and use multiple child processes to handle requests.

Because of this requirement, if you want to be able to use this browser based interactive debugger, if running your application in embedded mode of mod_wsgi, you will need to configure Apache such that it only starts up one child process to handle requests and that it never creates any additional processes. The Apache configuration directives required to achieve this are as follows:

StartServers 1
ServerLimit 1

The directives must be placed at global scope within the main Apache configuration files and will affect the whole Apache web server.

If you are using the worker MPM on a UNIX system, restricting Apache to just a single process may not be an issue, at least during development. If however you are using the prefork MPM on a UNIX system, you may see issues if you are using an AJAX intensive page that relies on being able to execute parallel requests, as only one request at a time will be able to be handled by the Apache web server.

If using Apache 2.X on a UNIX system, a better approach is to use daemon mode of mod_wsgi and delegate your application to run in a single daemon process. This process may be single or multithreaded as per any threading requirements of your application.

Which ever configuration is used, if the browser based interactive debugger is used it should only be used on a development system and should never be deployed on a production system or in a web hosting environment. This is because the debugger will allow one to execute arbitrary Python code within the context of your application from a remote client.

Debugging Crashes With GDB

In cases where Apache itself crashes for no apparent reason, the above techniques are not always particularly useful. This is especially the case where the crash occurs in non Python code outside of your WSGI application.

The most common cause of Apache crashing, besides any still latent bugs that may exist in mod_wsgi, of which hopefully there aren’t any, are shared library version mismatches. Another major cause of crashes is third party C extension modules for Python which are not compatible with being used in a Python sub interpreter which isn’t the first interpreter created when Python is initialised, or modules which are not compatible with Python sub interpreters being destroyed and the module then being used in a new Python sub interpreter.

Examples of where shared library version mismatches are known to occur are between the version of the ‘expat’ library used by Apache and that embedded within the Python ‘pyexpat’ module. Another is between the version of the MySQL client libraries used by PHP and the Python MySQL module.

Both these can be a cause of crashes where the different components are compiled and linked against different versions of the shared library for the packages in question. It is vitally important that all packages making use of a shared library were compiled against and use the same version of a shared library.

Another problematic package is Subversion. In this case there can be conflicts between the version of Subversion libraries used by mod_dav_svn and the Python Subversion bindings. Certain versions of the Python Subversion modules also cause problems because they appear to be incompatible with use in a Python sub interpreter which isn’t the first interpreter created when Python is initialised.

In this latter issue, the sub interpreter problems can often be solved by forcing the WSGI application using the Python Subversion modules to run in the ‘%{GLOBAL}’ application group. This solution often also resolves issues with SWIG generated bindings, especially where the -thread option was supplied to ‘swig’ when the bindings were generated.

Whatever the reason, in some cases the only way to determine why Apache or Python is crashing is to use a C code debugger such as ‘gdb’. Now although it is possible to attach ‘gdb’ to a running process, the preferred method for using ‘gdb’ in conjunction with Apache is to run Apache in single process debug mode from within ‘gdb’.

To do this it is necessary to first shutdown Apache. The ‘gdb’ debugger can then be started against the ‘httpd’ executable and then the process started up from inside of ‘gdb’ with the -X flag to select single-process debug mode:

$ /usr/local/apache/bin/apachectl stop
$ sudo gdb /usr/local/apache/bin/httpd
GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
libraries ........ done

(gdb) run -X
Starting program: /usr/local/apache/bin/httpd -X
Reading symbols for shared libraries .+++ done
Reading symbols for shared libraries ..................... done

If Apache is normally started as the ‘root’ user, this also will need to be run as the ‘root’ user otherwise the Apache web server will not have the required permissions to write to its log directories etc.

If Apache was crashing on startup, you should immediately encounter the error, otherwise use your web browser to access the URL which is causing the crash to occur. You can then commence trying to debug why the crash is occurring.

Note that you should ensure that you have not assigned your WSGI application to run in a mod_wsgi daemon process using the WSGIDaemonProcess and WSGIProcessGroup directives. This is because the above procedure will only catch crashes which occur when the application is running in embedded mode. If it turns out that the application only crashes when run in mod_wsgi daemon mode, an alternate method of using ‘gdb’ will be required.

In this circumstance you should run Apache as normal, but ensure that you only create one mod_wsgi daemon process and have it use only a single thread:

WSGIDaemonProcess debug threads=1
WSGIProcessGroup debug

If not running the daemon process as a distinct user where you can tell which process it is, then you will also need to ensure that Apache !LogLevel directive has been set to ‘info’. This is to ensure that information about daemon processes created by mod_wsgi are logged to the Apache error log. This is necessary, as you will need to consult the Apache error logs to determine the process ID of the daemon process that has been created for that daemon process group:

mod_wsgi (pid=666): Starting process 'debug' with threads=1.

Knowing the process ID, you should then run ‘gdb’, telling it to attach directly to the daemon process:

$ sudo gdb /usr/local/apache/bin/httpd 666
GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
libraries ........ done

/Users/grahamd/666: No such file or directory.
Attaching to program: `/usr/local/apache/bin/httpd', process 666.
Reading symbols for shared libraries .+++..................... done
0x900c7060 in sigwait ()
(gdb) cont
Continuing.

Once ‘gdb’ has been started and attached to the process, then initiate the request with the URL that causes the application to crash.

Attaching to the running daemon process can also be useful where a single request or the whole process is appearing to hang. In this case one can force a stack trace to be output for all running threads to try and determine what code is getting stuck. The appropriate gdb command in this instance is ‘thread apply all bt’:

sudo gdb /usr/local/apache-2.2/bin/httpd 666
GNU gdb 6.3.50-20050815 (Apple version gdb-477) (Sun Apr 30 20:06:22 GMT 2006)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin"...Reading symbols
for shared libraries ....... done

/Users/grahamd/666: No such file or directory.
Attaching to program: `/usr/local/apache/bin/httpd', process 666.
Reading symbols for shared libraries .+++..................... done
0x900c7060 in sigwait ()
(gdb) thread apply all bt

Thread 4 (process 666 thread 0xd03):
#0  0x9001f7ac in select ()
#1  0x004189b4 in apr_pollset_poll (pollset=0x1894650,
    timeout=-1146117585187099488, num=0xf0182d98, descriptors=0xf0182d9c)
    at poll/unix/select.c:363
#2  0x002a57f0 in wsgi_daemon_thread (thd=0x1889660, data=0x18895e8)
    at mod_wsgi.c:6980
#3  0x9002bc28 in _pthread_body ()

Thread 3 (process 666 thread 0xc03):
#0  0x9001f7ac in select ()
#1  0x0041d224 in apr_sleep (t=1000000) at time/unix/time.c:246
#2  0x002a2b10 in wsgi_deadlock_thread (thd=0x0, data=0x2aee68) at
    mod_wsgi.c:7119
#3  0x9002bc28 in _pthread_body ()

Thread 2 (process 666 thread 0xb03):
#0  0x9001f7ac in select ()
#1  0x0041d224 in apr_sleep (t=299970002) at time/unix/time.c:246
#2  0x002a2dec in wsgi_monitor_thread (thd=0x0, data=0x18890e8) at
    mod_wsgi.c:7197
#3  0x9002bc28 in _pthread_body ()

Thread 1 (process 666 thread 0x203):
#0  0x900c7060 in sigwait ()
#1  0x0041ba9c in apr_signal_thread (signal_handler=0x2a29a0
    <wsgi_check_signal>) at threadproc/unix/signals.c:383
#2  0x002a3728 in wsgi_start_process (p=0x1806418, daemon=0x18890e8)
    at mod_wsgi.c:7311
#3  0x002a6a4c in wsgi_hook_init (pconf=0x1806418, ptemp=0x0,
    plog=0xc8, s=0x18be8d4) at mod_wsgi.c:7716
#4  0x0000a5b0 in ap_run_post_config (pconf=0x1806418, plog=0x1844418,
    ptemp=0x180e418, s=0x180da78) at config.c:91
#5  0x000033d4 in main (argc=3, argv=0xbffffa8c) at main.c:706

It is suggested when trying to debug such issues that the daemon process be made to run with only a single thread. This will reduce how many stack traces one needs to analyse.

If you are running with multiple processes within the daemon process group and all requests are hanging, you will need to get a snapshot of what is happening in all processes in the daemon process group. Because doing this by hand will be tedious, it is better to automate it.

To automate capturing the stack traces, first create a file called ‘gdb.cmds’ which contains the following:

set pagination 0
thread apply all bt
detach
quit

This can then be used in conjunction with ‘gdb’ to avoid needing to enter the commands manually. For example:

sudo gdb /usr/local/apache-2.2/bin/httpd -x gdb.cmds -p 666

To be able to automate this further and apply it to all processes in a daemon process group, then first off ensure that daemon processes are named in ‘ps’ output by using the ‘display-name’ option to WSGIDaemonProcess directive.

For example, to apply default naming strategy as implemented by mod_wsgi, use:

WSGIDaemonProcess xxx display-name=%{GLOBAL}

In the output of a BSD derived ‘ps’ command, this will now show the process as being named ‘(wsgi:xxx)’:

$ ps -cxo command,pid | grep wsgi
(wsgi:xxx)        666

Note that the name may be truncated as the resultant name can be no longer than what was the length of the original executable path for Apache. You may therefore like to name it explicitly:

WSGIDaemonProcess xxx display-name=(wsgi:xxx)

Having named the processes in the daemon process group, we can now parse the output of ‘ps’ to identify the process and apply the ‘gdb’ command script to each:

for pid in `ps -cxo command,pid | awk '{ if ($0 ~ /wsgi:xxx/ && $1 !~ /grep/) print $NF }'`; do sudo gdb -x gdb.cmds -p $pid; done

The actual name given to the daemon process group using the ‘display-name’ option should be replaced in this command line. That is, change ‘wsgi:xxx’ appropriately.

If you are having problems with process in daemon process groups hanging, you might consider implementing a monitoring system which automatically detects somehow when the processes are no longer responding to requests and automatically trigger this dump of the stack traces before restarting the daemon process group or Apache.

Extracting Python Stack Traces

Using gdb to get stack traces as described above only gives you information about what is happening at the C code level. This will not tell where in the actual Python code execution was at. Your only clue is going to be where a call out was being made to some distinct C function in a C extension module for Python.

One can get stack traces for Python code by using:

def _stacktraces():
   code = []
   for threadId, stack in sys._current_frames().items():
       code.append("\n# ThreadID: %s" % threadId)
       for filename, lineno, name, line in traceback.extract_stack(stack):
           code.append('File: "%s", line %d, in %s' % (filename,
                   lineno, name))
           if line:
               code.append("  %s" % (line.strip()))

   for line in code:
       print >> sys.stderr, line

The caveat here obviously is that the process has to still be running. There is also the issue of how you trigger that function to dump stack traces for executing Python threads.

If the problem you have is that some request handler threads are stuck, either blocked, or stuck in an infinite loop, and you want to know what they are doing, then so long as there are still some handler threads left and the application is still responding to requests, then you could trigger it from a request handler triggered by making a request against a specific URL.

This though depends on you only running your application within a single process because as soon as you have multiple processes you have no guarantee that a request will go to the process you want to debug.

A better method therefore is to have a perpetually running background thread which monitors for a specific file in the file system. When that file is created or the modification time changes, then the background thread would dump the stack traces for the process.

Sample code which takes this approach is included below. This code could be placed temporarily at the end of your WSGI script file if you know you are going to need it because of a recurring problem:

from __future__ import print_function

import os
import sys
import time
import signal
import threading
import atexit
import traceback

try:
    from Queue import Queue  # Python 2
except ImportError:
    from queue import Queue  # Python 3

FILE = '/tmp/dump-stack-traces.txt'

_interval = 1.0

_running = False
_queue = Queue()
_lock = threading.Lock()

def _stacktraces():
    code = []
    for threadId, stack in sys._current_frames().items():
        code.append("\n# ProcessId: %s" % os.getpid())
        code.append("# ThreadID: %s" % threadId)
        for filename, lineno, name, line in traceback.extract_stack(stack):
            code.append('File: "%s", line %d, in %s' % (filename,
                    lineno, name))
            if line:
                code.append("  %s" % (line.strip()))

    for line in code:
        print(line, file=sys.stderr)

try:
    mtime = os.path.getmtime(FILE)
except:
    mtime = None

def _monitor():
    while True:
        global mtime

        try:
            current = os.path.getmtime(FILE)
        except:
            current = None

        if current != mtime:
            mtime = current
            _stacktraces()

        # Go to sleep for specified interval.

        try:
            return _queue.get(timeout=_interval)
        except:
            pass

_thread = threading.Thread(target=_monitor)
_thread.setDaemon(True)

def _exiting():
    try:
        _queue.put(True)
    except:
        pass
    _thread.join()

atexit.register(_exiting)

def _start(interval=1.0):
    global _interval
    if interval < _interval:
        _interval = interval

    global _running
    _lock.acquire()
    if not _running:
        prefix = 'monitor (pid=%d):' % os.getpid()
        print('%s Starting stack trace monitor.' % prefix, file=sys.stderr)
        _running = True
        _thread.start()
    _lock.release()

_start()

Once your WSGI script file has been loaded, then touching the file ‘/tmp/dump-stack-traces.txt’ will cause stack traces for active Python threads to be output to the Apache error log.

Note that the sample code doesn’t deal with possibility that with multiple processes for same application, that all processes may attempt to dump information at the same time. As such, you may get interleaving of output from multiple processes in Apache error logs at the same time.

What you may want to do is modify this code to dump out to some special directory, distinct files containing the trace where the names of the file include the process ID and a date/time. That way each will be separate.

An example of what one might expect to see from the above code is as follows:

# ProcessId: 666
# ThreadID: 4352905216
File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 497, in __bootstrap
  self.__bootstrap_inner()
File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
  self.run()
File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 72, in _monitor
  _stacktraces()
File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 47, in _stacktraces
  for filename, lineno, name, line in traceback.extract_stack(stack):

# ThreadID: 4322832384
File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 21, in application
  return _application(environ, start_response)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 245, in __call__
  response = middleware_method(request, response)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/middleware.py", line 36, in process_response
  request.session.save()
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/backends/db.py", line 63, in save
  obj.save(force_insert=must_create, using=using)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 434, in save
  self.save_base(using=using, force_insert=force_insert, force_update=force_update)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 527, in save_base
  result = manager._insert(values, return_id=update_pk, using=using)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/manager.py", line 195, in _insert
  return insert_query(self.model, values, **kwargs)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/query.py", line 1479, in insert_query
  return query.get_compiler(using=using).execute_sql(return_id)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 783, in execute_sql
  cursor = super(SQLInsertCompiler, self).execute_sql(None)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 727, in execute_sql
  cursor.execute(sql, params)
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 95, in execute
  stacktrace = tidy_stacktrace(traceback.extract_stack())
File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 40, in tidy_stacktrace
  s_path = os.path.realpath(s[0])
File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 355, in realpath
  if islink(component):
File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 132, in islink
  st = os.lstat(path)

Note that one of the displayed threads will be that for the thread which is dumping the stack traces. That stack trace can obviously be ignored.

One could extend the above recipe in more elaborate ways by using a WSGI middleware that capture details of each request from the WSGI environment and also dumping out from that the URL for the request being handled by any threads. This may assist in working out whether problems are related to a specific URL.