Wiki Example

author:

Ian Bicking <ianb@colorstudy.com>

Introduction

This is an example of how to write a WSGI application using WebOb. WebOb isn't itself intended to write applications -- it is not a web framework on its own -- but it is possible to write applications using just WebOb.

The file serving example is a better example of advanced HTTP usage. The comment middleware example is a better example of using middleware. This example provides some completeness by showing an application-focused end point.

This example implements a very simple wiki.

Code

The finished code for this is available in docs/wiki-example-code/example.py -- you can run that file as a script to try it out.

Creating an Application

A common pattern for creating small WSGI applications is to have a class which is instantiated with the configuration. For our application we'll be storing the pages under a directory.

class WikiApp(object):

    def __init__(self, storage_dir):
        self.storage_dir = os.path.abspath(os.path.normpath(storage_dir))

WSGI applications are callables like wsgi_app(environ, start_response). Instances of WikiApp are WSGI applications, so we'll implement a __call__ method:

class WikiApp(object):
    ...
    def __call__(self, environ, start_response):
        # what we'll fill in

To make the script runnable we'll create a simple command-line interface:

if __name__ == '__main__':
    import optparse
    parser = optparse.OptionParser(
        usage='%prog --port=PORT'
        )
    parser.add_option(
        '-p', '--port',
        default='8080',
        dest='port',
        type='int',
        help='Port to serve on (default 8080)')
    parser.add_option(
        '--wiki-data',
        default='./wiki',
        dest='wiki_data',
        help='Place to put wiki data into (default ./wiki/)')
    options, args = parser.parse_args()
    print 'Writing wiki pages to %s' % options.wiki_data
    app = WikiApp(options.wiki_data)
    from wsgiref.simple_server import make_server
    httpd = make_server('localhost', options.port, app)
    print 'Serving on http://localhost:%s' % options.port
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        print '^C'

There's not much to talk about in this code block. The application is instantiated and served with the built-in module wsgiref.simple_server.

The WSGI Application

Of course all the interesting stuff is in that __call__ method. WebOb lets you ignore some of the details of WSGI, like what start_response really is. environ is a CGI-like dictionary, but webob.Request gives an object interface to it. webob.Response represents a response, and is itself a WSGI application. Here's kind of the hello world of WSGI applications using these objects:

from webob import Request, Response

class WikiApp(object):
    ...

    def __call__(self, environ, start_response):
        req = Request(environ)
        resp = Response(
            'Hello %s!' % req.params.get('name', 'World'))
        return resp(environ, start_response)

req.params.get('name', 'World') gets any query string parameter (like ?name=Bob), or if it's a POST form request it will look for a form parameter name. We instantiate the response with the body of the response. You could also give keyword arguments like content_type='text/plain' (text/html is the default content type and 200 OK is the default status).

For the wiki application we'll support a couple different kinds of screens, and we'll make our __call__ method dispatch to different methods depending on the request. We'll support an action parameter like ?action=edit, and also dispatch on the method (GET, POST, etc, in req.method). We'll pass in the request and expect a response object back.

Also, WebOb has a series of exceptions in webob.exc, like webob.exc.HTTPNotFound, webob.exc.HTTPTemporaryRedirect, etc. We'll also let the method raise one of these exceptions and turn it into a response.

One last thing we'll do in our __call__ method is create our Page object, which represents a wiki page.

All this together makes:

from webob import Request, Response
from webob import exc

class WikiApp(object):
    ...

    def __call__(self, environ, start_response):
        req = Request(environ)
        action = req.params.get('action', 'view')
        # Here's where we get the Page domain object:
        page = self.get_page(req.path_info)
        try:
            try:
                # The method name is action_{action_param}_{request_method}:
                meth = getattr(self, 'action_%s_%s' % (action, req.method))
            except AttributeError:
                # If the method wasn't found there must be
                # something wrong with the request:
                raise exc.HTTPBadRequest('No such action %r' % action)
            resp = meth(req, page)
        except exc.HTTPException, e:
            # The exception object itself is a WSGI application/response:
            resp = e
        return resp(environ, start_response)

The Domain Object

The Page domain object isn't really related to the web, but it is important to implementing this. Each Page is just a file on the filesystem. Our get_page method figures out the filename given the path (the path is in req.path_info, which is all the path after the base path). The Page class handles getting and setting the title and content.

Here's the method to figure out the filename:

import os

class WikiApp(object):
    ...

    def get_page(self, path):
        path = path.lstrip('/')
        if not path:
            # The path was '/', the home page
            path = 'index'
        path = os.path.join(self.storage_dir)
        path = os.path.normpath(path)
        if path.endswith('/'):
            path += 'index'
        if not path.startswith(self.storage_dir):
            raise exc.HTTPBadRequest("Bad path")
        path += '.html'
        return Page(path)

Mostly this is just the kind of careful path construction you have to do when mapping a URL to a filename. While the server may normalize the path (so that a path like /../../ can't be requested), you can never really be sure. By using os.path.normpath we eliminate these, and then we make absolutely sure that the resulting path is under our self.storage_dir with if not path.startswith(self.storage_dir): raise exc.HTTPBadRequest("Bad path").

Here's the actual domain object:

class Page(object):
    def __init__(self, filename):
        self.filename = filename

    @property
    def exists(self):
        return os.path.exists(self.filename)

    @property
    def title(self):
        if not self.exists:
            # we need to guess the title
            basename = os.path.splitext(os.path.basename(self.filename))[0]
            basename = re.sub(r'[_-]', ' ', basename)
            return basename.capitalize()
        content = self.full_content
        match = re.search(r'<title>(.*?)</title>', content, re.I|re.S)
        return match.group(1)

    @property
    def full_content(self):
        f = open(self.filename, 'rb')
        try:
            return f.read()
        finally:
            f.close()

    @property
    def content(self):
        if not self.exists:
            return ''
        content = self.full_content
        match = re.search(r'<body[^>]*>(.*?)</body>', content, re.I|re.S)
        return match.group(1)

    @property
    def mtime(self):
        if not self.exists:
            return None
        else:
            return int(os.stat(self.filename).st_mtime)

    def set(self, title, content):
        dir = os.path.dirname(self.filename)
        if not os.path.exists(dir):
            os.makedirs(dir)
        new_content = """<html><head><title>%s</title></head><body>%s</body></html>""" % (
            title, content)
        f = open(self.filename, 'wb')
        f.write(new_content)
        f.close()

Basically it provides a .title attribute, a .content attribute, the .mtime (last modified time), and the page can exist or not (giving appropriate guesses for title and content when the page does not exist). It encodes these on the filesystem as a simple HTML page that is parsed by some regular expressions.

None of this really applies much to the web or WebOb, so I'll leave it to you to figure out the details of this.

URLs, PATH_INFO, and SCRIPT_NAME

This is an aside for the tutorial, but an important concept. In WSGI, and accordingly with WebOb, the URL is split up into several pieces. Some of these are obvious and some not.

An example:

http://example.com:8080/wiki/article/12?version=10

There are several components here:

  • req.scheme: http

  • req.host: example.com:8080

  • req.server_name: example.com

  • req.server_port: 8080

  • req.script_name: /wiki

  • req.path_info: /article/12

  • req.query_string: version=10

One non-obvious part is req.script_name and req.path_info. These correspond to the CGI environmental variables SCRIPT_NAME and PATH_INFO. req.script_name points to the application. You might have several applications in your site at different paths: one at /wiki, one at /blog, one at /. Each application doesn't necessarily know about the others, but it has to construct its URLs properly -- so any internal links to the wiki application should start with /wiki.

Just as there are pieces to the URL, there are several properties in WebOb to construct URLs based on these:

  • req.host_url: http://example.com:8080

  • req.application_url: http://example.com:8080/wiki

  • req.path_url: http://example.com:8080/wiki/article/12

  • req.path: /wiki/article/12

  • req.path_qs: /wiki/article/12?version=10

  • req.url: http://example.com:8080/wiki/article/12?version10

You can also create URLs with req.relative_url('some/other/page'). In this example that would resolve to http://example.com:8080/wiki/article/some/other/page. You can also create a relative URL to the application URL (SCRIPT_NAME) like req.relative_url('some/other/page', True) which would be http://example.com:8080/wiki/some/other/page.

Back to the Application

We have a dispatching function with __call__ and we have a domain object with Page, but we aren't actually doing anything.

The dispatching goes to action_ACTION_METHOD, where ACTION defaults to view. So a simple page view will be action_view_GET. Let's implement that:

class WikiApp(object):
    ...

    def action_view_GET(self, req, page):
        if not page.exists:
            return exc.HTTPTemporaryRedirect(
                location=req.url + '?action=edit')
        text = self.view_template.substitute(
            page=page, req=req)
        resp = Response(text)
        resp.last_modified = page.mtime
        resp.conditional_response = True
        return resp

The first thing we do is redirect the user to the edit screen if the page doesn't exist. exc.HTTPTemporaryRedirect is a response that gives a 307 Temporary Redirect response with the given location.

Otherwise we fill in a template. The template language we're going to use in this example is Tempita, a very simple template language with a similar interface to string.Template.

The template actually looks like this:

from tempita import HTMLTemplate

VIEW_TEMPLATE = HTMLTemplate("""\
<html>
 <head>
  <title>{{page.title}}</title>
 </head>
 <body>
<h1>{{page.title}}</h1>

<div>{{page.content|html}}</div>

<hr>
<a href="{{req.url}}?action=edit">Edit</a>
 </body>
</html>
""")

class WikiApp(object):
    view_template = VIEW_TEMPLATE
    ...

As you can see it's a simple template using the title and the body, and a link to the edit screen. We copy the template object into a class method (view_template = VIEW_TEMPLATE) so that potentially a subclass could override these templates.

tempita.HTMLTemplate is a template that does automatic HTML escaping. Our wiki will just be written in plain HTML, so we disable escaping of the content with {{page.content|html}}.

So let's look at the action_view_GET method again:

def action_view_GET(self, req, page):
    if not page.exists:
        return exc.HTTPTemporaryRedirect(
            location=req.url + '?action=edit')
    text = self.view_template.substitute(
        page=page, req=req)
    resp = Response(text)
    resp.last_modified = page.mtime
    resp.conditional_response = True
    return resp

The template should be pretty obvious now. We create a response with Response(text), which already has a default Content-Type of text/html.

To allow conditional responses we set resp.last_modified. You can set this attribute to a date, None (effectively removing the header), a time tuple (like produced by time.localtime()), or as in this case to an integer timestamp. If you get the value back it will always be a datetime object (or None). With this header we can process requests with If-Modified-Since headers, and return 304 Not Modified if appropriate. It won't actually do that unless you set resp.conditional_response to True.

Note

If you subclass webob.Response you can set the class attribute default_conditional_response = True and this setting will be on by default. You can also set other defaults, like the default_charset ("utf8"), or default_content_type ("text/html").

The Edit Screen

The edit screen will be implemented in the method action_edit_GET. There's a template and a very simple method:

EDIT_TEMPLATE = HTMLTemplate("""\
<html>
 <head>
  <title>Edit: {{page.title}}</title>
 </head>
 <body>
{{if page.exists}}
<h1>Edit: {{page.title}}</h1>
{{else}}
<h1>Create: {{page.title}}</h1>
{{endif}}

<form action="{{req.path_url}}" method="POST">
 <input type="hidden" name="mtime" value="{{page.mtime}}">
 Title: <input type="text" name="title" style="width: 70%" value="{{page.title}}"><br>
 Content: <input type="submit" value="Save">
 <a href="{{req.path_url}}">Cancel</a>
   <br>
 <textarea name="content" style="width: 100%; height: 75%" rows="40">{{page.content}}</textarea>
   <br>
 <input type="submit" value="Save">
 <a href="{{req.path_url}}">Cancel</a>
</form>
</body></html>
""")

class WikiApp(object):
    ...

    edit_template = EDIT_TEMPLATE

    def action_edit_GET(self, req, page):
        text = self.edit_template.substitute(
            page=page, req=req)
        return Response(text)

As you can see, all the action here is in the template.

In <form action="{{req.path_url}}" method="POST"> we submit to req.path_url; that's everything but ?action=edit. So we are POSTing right over the view page. This has the nice side effect of automatically invalidating any caches of the original page. It also is vaguely RESTful.

We save the last modified time in a hidden mtime field. This way we can detect concurrent updates. If start editing the page who's mtime is 100000, and someone else edits and saves a revision changing the mtime to 100010, we can use this hidden field to detect that conflict. Actually resolving the conflict is a little tricky and outside the scope of this particular tutorial, we'll just note the conflict to the user in an error.

From there we just have a very straight-forward HTML form. Note that we don't quote the values because that is done automatically by HTMLTemplate; if you are using something like string.Template or a templating language that doesn't do automatic quoting, you have to be careful to quote all the field values.

We don't have any error conditions in our application, but if there were error conditions we might have to re-display this form with the input values the user already gave. In that case we'd do something like:

<input type="text" name="title"
 value="{{req.params.get('title', page.title)}}">

This way we use the value in the request (req.params is both the query string parameters and any variables in a POST response), but if there is no value (e.g., first request) then we use the page values.

Processing the Form

The form submits to action_view_POST (view is the default action). So we have to implement that method:

class WikiApp(object):
    ...

    def action_view_POST(self, req, page):
        submit_mtime = int(req.params.get('mtime') or '0') or None
        if page.mtime != submit_mtime:
            return exc.HTTPPreconditionFailed(
                "The page has been updated since you started editing it")
        page.set(
            title=req.params['title'],
            content=req.params['content'])
        resp = exc.HTTPSeeOther(
            location=req.path_url)
        return resp

The first thing we do is check the mtime value. It can be an empty string (when there's no mtime, like when you are creating a page) or an integer. int(req.params.get('time') or '0') or None basically makes sure we don't pass "" to int() (which is an error) then turns 0 into None (0 or None will evaluate to None in Python -- false_value or other_value in Python resolves to other_value). If it fails we just give a not-very-helpful error message, using 412 Precondition Failed (typically preconditions are HTTP headers like If-Unmodified-Since, but we can't really get the browser to send requests like that, so we use the hidden field instead).

Note

Error statuses in HTTP are often under-used because people think they need to either return an error (useful for machines) or an error message or interface (useful for humans). In fact you can do both: you can give any human readable error message with your error response.

One problem is that Internet Explorer will replace error messages with its own incredibly unhelpful error messages. However, it will only do this if the error message is short. If it's fairly large (4Kb is large enough) it will show the error message it was given. You can load your error with a big HTML comment to accomplish this, like "<!-- %s -->" % ('x'*4000).

You can change the status of any response with resp.status_int = 412, or you can change the body of an exc.HTTPSomething with resp.body = new_body. The primary advantage of using the classes in webob.exc is giving the response a clear name and a boilerplate error message.

After we check the mtime we get the form parameters from req.params and issue a redirect back to the original view page. 303 See Other is a good response to give after accepting a POST form submission, as it gets rid of the POST (no warning messages for the user if they try to go back).

In this example we've used req.params for all the form values. If we wanted to be specific about where we get the values from, they could come from req.GET (the query string, a misnomer since the query string is present even in POST requests) or req.POST (a POST form body). While sometimes it's nice to distinguish between these two locations, for the most part it doesn't matter. If you want to check the request method (e.g., make sure you can't change a page with a GET request) there's no reason to do it by accessing these method-specific getters. It's better to just handle the method specifically. We do it here by including the request method in our dispatcher (dispatching to action_view_GET or action_view_POST).

Cookies

One last little improvement we can do is show the user a message when they update the page, so it's not quite so mysteriously just another page view.

A simple way to do this is to set a cookie after the save, then display it in the page view. To set it on save, we add a little to action_view_POST:

def action_view_POST(self, req, page):
    ...
    resp = exc.HTTPSeeOther(
        location=req.path_url)
    resp.set_cookie('message', 'Page updated')
    return resp

And then in action_view_GET:

VIEW_TEMPLATE = HTMLTemplate("""\
...
{{if message}}
<div style="background-color: #99f">{{message}}</div>
{{endif}}
...""")

class WikiApp(object):
    ...

    def action_view_GET(self, req, page):
        ...
        if req.cookies.get('message'):
            message = req.cookies['message']
        else:
            message = None
        text = self.view_template.substitute(
            page=page, req=req, message=message)
        resp = Response(text)
        if message:
            resp.delete_cookie('message')
        else:
            resp.last_modified = page.mtime
            resp.conditional_response = True
        return resp

req.cookies is just a dictionary, and we also delete the cookie if it is present (so the message doesn't keep getting set). The conditional response stuff only applies when there isn't any message, as messages are private. Another alternative would be to display the message with Javascript, like:

<script type="text/javascript">
function readCookie(name) {
    var nameEQ = name + "=";
    var ca = document.cookie.split(';');
    for (var i=0; i < ca.length; i++) {
        var c = ca[i];
        while (c.charAt(0) == ' ') c = c.substring(1,c.length);
        if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length,c.length);
    }
    return null;
}

function createCookie(name, value, days) {
    if (days) {
        var date = new Date();
        date.setTime(date.getTime()+(days*24*60*60*1000));
        var expires = "; expires="+date.toGMTString();
    } else {
        var expires = "";
    }
    document.cookie = name+"="+value+expires+"; path=/";
}

function eraseCookie(name) {
    createCookie(name, "", -1);
}

function showMessage() {
    var message = readCookie('message');
    if (message) {
        var el = document.getElementById('message');
        el.innerHTML = message;
        el.style.display = '';
        eraseCookie('message');
    }
}
</script>

Then put <div id="messaage" style="display: none"></div> in the page somewhere. This has the advantage of being very cacheable and simple on the server side.

Conclusion

We're done, hurrah!