PYPY(1) | PyPy | PYPY(1) |
pypy - fast, compliant alternative implementation of the Python language
pypy [options] [-c cmd|-m mod|file.py|-] [arg...]
PYPYLOG=jit-log-opt,jit-backend:logfile will generate a log suitable for jitviewer, a tool for debugging performance issues under PyPy.
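The variable's value has the form categories:logfile. As a minimal sketch, the helper below (pypylog_env is a hypothetical name, not part of PyPy) builds an environment with PYPYLOG set, which could then be passed to a child pypy process.

```python
import os

def pypylog_env(logfile, categories=("jit-log-opt", "jit-backend"), base_env=None):
    """Return a copy of the environment with PYPYLOG set.

    The value has the form "<categories>:<logfile>", as in the text above.
    pypylog_env is a hypothetical helper, not a PyPy API.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYPYLOG"] = "%s:%s" % (",".join(categories), logfile)
    return env

# Usage sketch (needs a pypy executable on PATH; the script name is made up):
#   import subprocess
#   subprocess.call(["pypy", "myscript.py"], env=pypylog_env("logfile"))
```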
PyPy's default garbage collector is called incminimark - it's an incremental, generational moving collector. This section explains how it works and how it can be tuned to suit your workload.
Incminimark first allocates objects in the so-called nursery, a region for young objects where allocation is very cheap: it is just a pointer bump. The nursery size is a crucial variable: depending on your workload (one or many processes) and cache sizes, you might want to experiment with it via the PYPY_GC_NURSERY environment variable. When the nursery is full, a minor collection is performed. Objects that are no longer referenced simply die; objects found to be still alive are copied from the nursery to the old generation, either into arenas, which are collections of objects of the same size, or, if they are larger, into memory allocated directly with malloc. (A third category, the very large objects, is initially allocated outside the nursery and never moves.)
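The nursery mechanism can be illustrated with a toy model. This is only a sketch of the idea (bump-pointer allocation, survivors copied out on a minor collection), not PyPy's implementation; the ToyNursery class, its byte sizes and the is_alive callback are all invented for illustration.

```python
class ToyNursery(object):
    """Toy model of a bump-pointer nursery; not PyPy's real implementation."""

    def __init__(self, size):
        self.size = size              # nursery capacity in bytes
        self.top = 0                  # bump pointer: next free offset
        self.young = []               # (name, nbytes) pairs in the nursery
        self.old = []                 # names promoted to the old generation
        self.minor_collections = 0

    def allocate(self, name, nbytes, is_alive):
        """Allocate nbytes; is_alive(name) tells whether an object is still referenced."""
        if nbytes > self.size:
            # The real GC allocates very large objects outside the nursery.
            raise ValueError("object does not fit in the toy nursery")
        if self.top + nbytes > self.size:
            self.minor_collect(is_alive)
        self.top += nbytes            # allocation is just a pointer bump
        self.young.append((name, nbytes))

    def minor_collect(self, is_alive):
        # Survivors are copied to the old generation; dead objects cost
        # nothing, since resetting the bump pointer reclaims them all at once.
        self.old.extend(name for name, _ in self.young if is_alive(name))
        self.young = []
        self.top = 0
        self.minor_collections += 1


nursery = ToyNursery(size=64)
live = {"a", "c"}                     # pretend only these are still referenced
for name in "abcde":
    nursery.allocate(name, 16, live.__contains__)
```

In this run the fifth allocation does not fit, so one minor collection happens: "a" and "c" are promoted, while "b" and "d" disappear without any per-object work.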
Since incminimark is an incremental GC, major collections are performed in steps: the goal is to have no pause longer than 1ms, but in practice this depends on the size and characteristics of the heap, and occasional pauses of 10-100ms can occur.
If some parts of the program require low latency, you might want to control precisely when the GC runs, to avoid unexpected pauses. Note that this affects only major collections; minor collections continue to work as usual.
As explained above, a full major collection consists of N steps, where N depends on the size of the heap; generally speaking, it is not possible to predict how many steps will be needed to complete a collection.
gc.enable() and gc.disable() control whether the GC runs collection steps automatically. When the GC is disabled, memory usage will grow indefinitely unless you manually call gc.collect() or gc.collect_step().
gc.collect() runs a full major collection.
gc.collect_step() runs a single collection step. It returns an object of type GcCollectStepStats, the same type that is passed to the corresponding GC hooks. The following code is roughly equivalent to a gc.collect():
    while True:
        if gc.collect_step().major_is_done:
            break
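A variation on the loop above is to spend only a bounded amount of time on GC work, for example during idle moments of a latency-sensitive program that has called gc.disable(). The following is a sketch under assumptions: collect_within_budget is a hypothetical helper, and the fallback branch assumes that gc.collect_step() is missing on interpreters other than PyPy.

```python
import gc
import time

def collect_within_budget(budget_seconds):
    """Run GC steps until a major collection finishes or the budget is spent.

    Returns True if a major collection completed, False otherwise.
    Hypothetical helper, not part of PyPy.
    """
    collect_step = getattr(gc, "collect_step", None)
    if collect_step is None:
        # Assumption: no incremental API here, so do a full collection.
        gc.collect()
        return True
    deadline = time.time() + budget_seconds
    while time.time() < deadline:
        if collect_step().major_is_done:
            return True
    return False

# Usage sketch:
#   gc.disable()                      # no automatic major collection steps
#   ... latency-critical work ...
#   collect_within_budget(0.001)      # catch up while idle
#   gc.enable()
```

A single step can still exceed the budget, because the deadline is only checked between steps.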
For a real-world example of usage of this API, you can look at the 3rd-party module pypytools.gc.custom, which also provides a with customgc.nogc() context manager to mark sections where the GC is forbidden.
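The idea of such a context manager can be sketched with the standard gc functions alone. The nogc below is a hypothetical stand-in written for illustration; it is not the pypytools implementation, which may behave differently.

```python
import gc
from contextlib import contextmanager

@contextmanager
def nogc():
    """Disable automatic GC steps inside the block, then restore the old state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Re-enable only if the GC was enabled on entry, so nesting works.
        if was_enabled:
            gc.enable()

# Usage sketch:
#   with nogc():
#       handle_request()              # hypothetical latency-critical call
```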
Before we discuss issues of "fragmentation", we need a bit of precision. There are two kinds of related but distinct issues:
There is a special function in the gc module called get_stats(memory_pressure=False).
memory_pressure controls whether or not to report memory pressure from objects allocated outside of the GC, which requires walking the entire heap, so it's disabled by default due to its cost. Enable it when debugging mysterious memory disappearance.
An example call looks like this:
    >>> gc.get_stats(True)
    Total memory consumed:
    GC used:            4.2MB (peak: 4.2MB)
       in arenas:            763.7kB
       rawmalloced:          383.1kB
       nursery:              3.1MB
    raw assembler used: 0.0kB
    memory pressure:    0.0kB
    -----------------------------
    Total:              4.2MB

    Total memory allocated:
    GC allocated:            4.5MB (peak: 4.5MB)
       in arenas:            763.7kB
       rawmalloced:          383.1kB
       nursery:              3.1MB
    raw assembler allocated: 0.0kB
    memory pressure:    0.0kB
    -----------------------------
    Total:                   4.5MB
In this particular case, which is just at startup, the GC consumes relatively little memory, and there is even less memory that is allocated but unused. If there is a lot of unreturned memory or actual fragmentation, "allocated" can be much higher than "used". Generally speaking, "peak" will more closely resemble the actual memory consumed as reported by RSS. Indeed, returning memory to the OS is a hard, unsolved problem. In PyPy, it occurs only if an arena is entirely free, that is, a contiguous block of 64 pages of 4 or 8 KB each. It is also rare for the "rawmalloced" category, at least for common system implementations of malloc().
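To watch the gap between "allocated" and "used" over time, the report text can be parsed. The sketch below works on the textual report from the example; it assumes that repr(gc.get_stats()) yields that text (as the interactive session suggests) and that kB/MB/GB are multiples of 1024. The helper names are hypothetical.

```python
import re

# Assumption: sizes are printed with these suffixes, as multiples of 1024.
_UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def parse_size(text):
    """Convert a string such as '763.7kB' into a number of bytes."""
    match = re.match(r"^([0-9.]+)(kB|MB|GB)$", text.strip())
    if match is None:
        raise ValueError("unrecognized size: %r" % (text,))
    return float(match.group(1)) * _UNITS[match.group(2)]

def gc_used_and_allocated(report):
    """Return (used, allocated) in bytes from the 'GC used'/'GC allocated' lines."""
    used = allocated = None
    for line in report.splitlines():
        label, _, rest = line.partition(":")
        label = label.strip()
        if label == "GC used":
            used = parse_size(rest.split()[0])
        elif label == "GC allocated":
            allocated = parse_size(rest.split()[0])
    return used, allocated

# Excerpt of the report shown above.
SAMPLE = """\
Total memory consumed:
GC used:            4.2MB (peak: 4.2MB)
Total memory allocated:
GC allocated:            4.5MB (peak: 4.5MB)
"""

used, allocated = gc_used_and_allocated(SAMPLE)
unused = allocated - used   # allocated but not currently used by the GC
```

On PyPy, the report argument would be something like repr(gc.get_stats()); a difference that keeps growing hints at unreturned memory or fragmentation.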
The details of various fields:
GC hooks are user-defined functions which are called whenever a specific GC event occurs, and can be used to monitor GC activity and pauses. You can install the hooks by setting the following attributes:
To uninstall a hook, simply set the corresponding attribute to None. To install all hooks at once, you can call gc.hooks.set(obj), which will look for methods on_gc_* on obj. To uninstall all the hooks at once, you can call gc.hooks.reset().
The functions called by the hooks receive a single stats argument, which contains various statistics about the event.
Note that PyPy cannot call the hooks immediately after a GC event: it has to wait until it reaches a point at which the interpreter is in a known state and calling user-defined code is harmless. Multiple events might occur before the hook is invoked: in this case, you can inspect the value of stats.count to know how many times the event occurred since the last time the hook was called. Similarly, stats.duration contains the total time spent by the GC for this specific event since the last time the hook was called.
On the other hand, all the other fields of the stats object are relative only to the last event of the series.
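Because one hook call may stand for several events, a monitor should add stats.count and stats.duration rather than count calls. A minimal sketch follows; GcTotals and FakeMinorStats are hypothetical names, and the fake stats object exists only so the example can run without PyPy.

```python
from collections import namedtuple

class GcTotals(object):
    """Accumulate minor-collection counts and durations across hook calls."""

    def __init__(self):
        self.minor_events = 0
        self.minor_duration = 0       # in PyPy's timestamp units

    def on_gc_minor(self, stats):
        # count and duration already cover every event since the last call.
        self.minor_events += stats.count
        self.minor_duration += stats.duration

# Stand-in for the real stats object (hypothetical, for demonstration only).
FakeMinorStats = namedtuple("FakeMinorStats", ["count", "duration"])

totals = GcTotals()
totals.on_gc_minor(FakeMinorStats(count=3, duration=1200))  # three events merged
totals.on_gc_minor(FakeMinorStats(count=1, duration=300))
```

On PyPy the object would presumably be installed with gc.hooks.set(totals) instead of being called by hand.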
The attributes for GcMinorStats are:
The attributes for GcCollectStepStats are:
The values of oldstate and newstate are each one of these constants, defined inside gc.GcCollectStepStats: STATE_SCANNING, STATE_MARKING, STATE_SWEEPING, STATE_FINALIZING, STATE_USERDEL. A string representation can be obtained by indexing the GC_STATES tuple with the value.
The attributes for GcCollectStats are:
Note that GcCollectStats does not have a duration field. This is because all the GC work is done inside gc-collect-step: gc-collect-done is used only to give additional stats, but doesn't do any actual work.
A note about the duration field: depending on the architecture and operating system, PyPy uses different ways to read timestamps, so duration is expressed in varying units. It is possible to know which by calling __pypy__.debug_get_timestamp_unit(), which can be one of the following values:
Unfortunately, there does not seem to be a reliable standard way for converting tsc ticks into nanoseconds, although in practice on modern CPUs it is enough to divide the ticks by the maximum nominal frequency of the CPU. For this reason, PyPy gives the raw value, and leaves the job of doing the conversion to external libraries.
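The approximate conversion described above amounts to one division. In this sketch, tsc_ticks_to_ns is a hypothetical helper and the nominal frequency must be supplied by the caller, since PyPy does not provide it.

```python
def tsc_ticks_to_ns(ticks, nominal_hz):
    """Approximate conversion of TSC ticks to nanoseconds.

    nominal_hz is the CPU's maximum nominal frequency in Hz. The result is
    only an estimate, for the reasons given in the text above.
    """
    return ticks * 1e9 / nominal_hz

# Example: on a CPU with a nominal frequency of 3.0 GHz,
# 3000 ticks correspond to about 1000 ns.
example_ns = tsc_ticks_to_ns(3000, 3.0e9)
```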
Here is an example of GC hooks in use:
    import gc

    class MyHooks(object):
        done = False

        def on_gc_minor(self, stats):
            print('gc-minor: count = %02d, duration = %d' % (stats.count,
                                                             stats.duration))

        def on_gc_collect_step(self, stats):
            # translate the numeric states into their names
            old = gc.GcCollectStepStats.GC_STATES[stats.oldstate]
            new = gc.GcCollectStepStats.GC_STATES[stats.newstate]
            print('gc-collect-step: %s --> %s' % (old, new))
            print(' count = %02d, duration = %d' % (stats.count,
                                                    stats.duration))

        def on_gc_collect(self, stats):
            print('gc-collect-done: count = %02d' % stats.count)
            self.done = True

    hooks = MyHooks()
    gc.hooks.set(hooks)

    # simulate some GC activity
    lst = []
    while not hooks.done:
        lst = [lst, 1, 2, 3]
PyPy's default incminimark garbage collector is configurable through several environment variables:
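One way to experiment with such a variable is to run the same workload under several settings and compare timings. The sketch below does this for PYPY_GC_NURSERY. The helper name is hypothetical, and the value syntax ("1MB", "4MB", ...) is an assumption that should be checked against the variable's description. The workload here is a trivial command run by the current interpreter; in practice it would be a pypy command line.

```python
import os
import subprocess
import sys
import time

def time_with_nursery(argv, nursery_size):
    """Run argv with PYPY_GC_NURSERY set and return the wall-clock time in seconds."""
    env = dict(os.environ, PYPY_GC_NURSERY=nursery_size)
    start = time.time()
    subprocess.check_call(argv, env=env)
    return time.time() - start

if __name__ == "__main__":
    # Hypothetical workload; replace with e.g. ["pypy", "myscript.py"].
    workload = [sys.executable, "-c", "pass"]
    for size in ("1MB", "4MB", "16MB"):
        print("%s: %.3fs" % (size, time_with_nursery(workload, size)))
```

A single run per setting is noisy; repeating each measurement and comparing medians gives a fairer picture.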
The PyPy Project
2019, The PyPy Project
March 24, 2019 | 7.0 |