ZEO
How ZEO Works
The ZODB, as I’ve described it so far, can only be used within a single Python process (though perhaps with multiple threads). ZEO, Zope Enterprise Objects, extends the ZODB machinery to provide access to objects over a network. The name “Zope Enterprise Objects” is a bit misleading; ZEO can be used to store Python objects and access them in a distributed fashion without Zope ever entering the picture. The combination of ZEO and ZODB is essentially a Python- specific object database.
ZEO consists of about 12,000 lines of Python code, excluding tests. The code is
relatively small because it contains only code for a TCP/IP server, and for a
new type of Storage, ClientStorage
. ClientStorage
simply makes
remote procedure calls to the server, which then passes them on a regular
Storage
class such as FileStorage
. The following diagram lays
out the system:
XXX insert diagram here later
Any number of processes can create a ClientStorage
instance, and any
number of threads in each process can be using that instance.
ClientStorage
aggressively caches objects locally, so in order to avoid
using stale data the ZEO server sends an invalidation message to all the
connected ClientStorage
instances on every write operation. The
invalidation message contains the object ID for each object that’s been
modified, letting the ClientStorage
instances delete the old data for
the given object from their caches.
This design decision has some consequences you should be aware of. First, while
ZEO isn’t tied to Zope, it was first written for use with Zope, which stores
HTML, images, and program code in the database. As a result, reads from the
database are far more frequent than writes, and ZEO is therefore better suited
for read-intensive applications. If every ClientStorage
is writing to
the database all the time, this will result in a storm of invalidate messages
being sent, and this might take up more processing time than the actual database
operations themselves. These messages are small and sent in batches, so there
would need to be a lot of writes before it became a problem.
On the other hand, for applications that have few writes in comparison to the
number of read accesses, this aggressive caching can be a major win. Consider a
Slashdot-like discussion forum that divides the load among several Web servers.
If news items and postings are represented by objects and accessed through ZEO,
then the most heavily accessed objects – the most recent or most popular
postings – will very quickly wind up in the caches of the
ClientStorage
instances on the front-end servers. The back-end ZEO
server will do relatively little work, only being called upon to return the
occasional older posting that’s requested, and to send the occasional invalidate
message when a new posting is added. The ZEO server isn’t going to be contacted
for every single request, so its workload will remain manageable.
Installing ZEO
This section covers how to install the ZEO package, and how to configure and run a ZEO Storage Server on a machine.
Requirements
The ZEO server software is included in ZODB3. As with the rest of ZODB3, you’ll need Python 2.3 or higher.
Running a server
The runzeo.py script in the ZEO directory can be used to start a server. Run it
with the -h option to see the various values. If you’re just experimenting, a
good choise is to use python ZEO/runzeo.py -a /tmp/zeosocket -f
/tmp/test.fs
to run ZEO with a Unix domain socket and a FileStorage
.
Testing the ZEO Installation
Once a ZEO server is up and running, using it is just like using ZODB with a
more conventional disk-based storage; no new programming details are introduced
by using a remote server. The only difference is that programs must create a
ClientStorage
instance instead of a FileStorage
instance.
From that point onward, ZODB-based code is happily unaware that objects are
being retrieved from a ZEO server, and not from the local disk.
As an example, and to test whether ZEO is working correctly, try running the following lines of code, which will connect to the server, add some bits of data to the root of the ZODB, and commits the transaction:
from ZEO import ClientStorage
from ZODB import DB
import transaction
# Change next line to connect to your ZEO server
addr = 'kronos.example.com', 1975
storage = ClientStorage.ClientStorage(addr)
db = DB(storage)
conn = db.open()
root = conn.root()
# Store some things in the root
root['list'] = ['a', 'b', 1.0, 3]
root['dict'] = {'a':1, 'b':4}
# Commit the transaction
transaction.commit()
If this code runs properly, then your ZEO server is working correctly.
You can also use a configuration file.
<zodb>
<zeoclient>
server localhost:9100
</zeoclient>
</zodb>
One nice feature of the configuration file is that you don’t need to specify imports for a specific storage. That makes the code a little shorter and allows you to change storages without changing the code.
import ZODB.config
db = ZODB.config.databaseFromURL('/tmp/zeo.conf')
ZEO Programming Notes
ZEO is written using asyncore
, from the Python standard library. It
assumes that some part of the user application is running an asyncore
mainloop. For example, Zope run the loop in a separate thread and ZEO uses
that. If your application does not have a mainloop, ZEO will not process
incoming invalidation messages until you make some call into ZEO. The
Connection.sync()
method can be used to process pending invalidation
messages. You can call it when you want to make sure the Connection
has the most recent version of every object, but you don’t have any other work
for ZEO to do.
Sample Application: chatter.py
For an example application, we’ll build a little chat application. What’s
interesting is that none of the application’s code deals with network
programming at all; instead, an object will hold chat messages, and be
magically shared between all the clients through ZEO. I won’t present the
complete script here; you can download it from chatter.py
. Only the interesting portions of the code will be covered here.
The basic data structure is the ChatSession
object, which provides an
add_message()
method that adds a message, and a new_messages()
method that returns a list of new messages that have accumulated since the last
call to new_messages()
. Internally, ChatSession
maintains a
B-tree that uses the time as the key, and stores the message as the
corresponding value.
The constructor for ChatSession
is pretty simple; it simply creates an
attribute containing a B-tree:
class ChatSession(Persistent):
def __init__(self, name):
self.name = name
# Internal attribute: _messages holds all the chat messages.
self._messages = BTrees.OOBTree.OOBTree()
add_message()
has to add a message to the _messages
B-tree. A
complication is that it’s possible that some other client is trying to add a
message at the same time; when this happens, the client that commits first wins,
and the second client will get a ConflictError
exception when it tries to
commit. For this application, ConflictError
isn’t serious but simply
means that the operation has to be retried; other applications might treat it as
a fatal error. The code uses try...except...else
inside a while
loop,
breaking out of the loop when the commit works without raising an exception.
def add_message(self, message):
"""Add a message to the channel.
message -- text of the message to be added
"""
while 1:
try:
now = time.time()
self._messages[now] = message
get_transaction().commit()
except ConflictError:
# Conflict occurred; this process should abort,
# wait for a little bit, then try again.
transaction.abort()
time.sleep(.2)
else:
# No ConflictError exception raised, so break
# out of the enclosing while loop.
break
# end while
new_messages()
introduces the use of volatile attributes. Attributes of
a persistent object that begin with _v_
are considered volatile and are
never stored in the database. new_messages()
needs to store the last time
the method was called, but if the time was stored as a regular attribute, its
value would be committed to the database and shared with all the other clients.
new_messages()
would then return the new messages accumulated since any
other client called new_messages()
, which isn’t what we want.
def new_messages(self):
"Return new messages."
# self._v_last_time is the time of the most recent message
# returned to the user of this class.
if not hasattr(self, '_v_last_time'):
self._v_last_time = 0
new = []
T = self._v_last_time
for T2, message in self._messages.items():
if T2 > T:
new.append(message)
self._v_last_time = T2
return new
This application is interesting because it uses ZEO to easily share a data structure; ZEO and ZODB are being used for their networking ability, not primarily for their data storage ability. I can foresee many interesting applications using ZEO in this way:
With a Tkinter front-end, and a cleverer, more scalable data structure, you could build a shared whiteboard using the same technique.
A shared chessboard object would make writing a networked chess game easy.
You could create a Python class containing a CD’s title and track information. To make a CD database, a read-only ZEO server could be opened to the world, or an HTTP or XML-RPC interface could be written on top of the ZODB.
A program like Quicken could use a ZODB on the local disk to store its data. This avoids the need to write and maintain specialized I/O code that reads in your objects and writes them out; instead you can concentrate on the problem domain, writing objects that represent cheques, stock portfolios, or whatever.