MAKEPP_BUILD_CACHE(1) | Makepp | MAKEPP_BUILD_CACHE(1) |
makepp_build_cache -- How to set up and use build caches
A build cache is a directory containing copies of previous targets that makepp already built. When makepp is asked to build a new target, it sees if it has already built it somewhere else under the same conditions, and if so, simply links or copies it instead of rebuilding it.
A build cache can be useful in the following circumstances:
A similar situation is if you normally work on one architecture but briefly switch to a different architecture, and then you switch back. If the old files are still in the build cache, makepp will not have to recompile anything.
A build cache can help if all of the following are true:
You may find, for example, that using a build cache isn't worth it for compiling very small modules. It's almost certainly not worth it for commands to make a static library (an archive file, libxyz.a), except if you use links to save disk space.
Using a build cache requires a little bit of setup and maintenance work. Please do not try using a build cache until you understand how build caches work, how to create them, and how to keep them from continually growing and eating up all of the available disk space on your system.
If you enable a build cache, every time a file is built, makepp stores a copy away in a build cache. The name of the file is a key that is a hash of the checksums of all the inputs and the build command and the architecture. The next time makepp wants to rebuild the file, it sees if there is a file with the same checksums already in the build cache. If so, the file is copied out of the build cache.
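As a rough illustration of such a key, the following Python sketch hashes the input checksums, the build command and the architecture into one short string. The exact data makepp hashes, the separators, and the URL-safe base64 alphabet used here are assumptions made for the example; only the general idea (one hash over inputs, command and architecture) comes from the description above.

```python
import base64
import hashlib


def cache_key(input_checksums, command, architecture):
    """Return a 22-character key derived from everything that affects the output."""
    digest = hashlib.md5()
    # Hypothetical layout: one line per input checksum, then command and architecture.
    for checksum in input_checksums:
        digest.update(f"input={checksum}\n".encode())
    digest.update(f"command={command}\n".encode())
    digest.update(f"arch={architecture}\n".encode())
    # 16 digest bytes encode to 24 base64 characters, the last two being padding.
    return base64.urlsafe_b64encode(digest.digest()).decode().rstrip("=")


checksums = ["aaa111", "bbb222"]  # made-up checksums of main.c and util.h
key_opt = cache_key(checksums, "cc -O2 -c main.c -o main.o", "x86_64-linux")
key_dbg = cache_key(checksums, "cc -g -c main.c -o main.o", "x86_64-linux")
print(key_opt)
print(key_dbg)
```

Because the command is part of the key, the optimized and the debugging compilation of the same source get different keys and can coexist in the cache.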
For efficiency, if the build cache is located on the same file system as the build, makepp will not actually copy the file; instead, it will make a hard link. This is faster and doesn't use up any extra disk space. Similarly, when makepp wants to pull a file out of the build cache, it will use a hard link if possible, or copy it if necessary.
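The link-if-possible, copy-if-necessary behaviour can be sketched in Python as follows. This is only an illustration of the technique, not makepp's actual code; the function name and the file names are made up for the example.

```python
import os
import shutil
import tempfile


def link_or_copy(source, destination):
    """Hard-link source to destination, copying only if linking fails."""
    try:
        os.link(source, destination)
        return "linked"
    except OSError:
        # Typically the two paths are on different file systems,
        # or the file system does not support hard links.
        shutil.copy2(source, destination)
        return "copied"


with tempfile.TemporaryDirectory() as tmp:
    cached = os.path.join(tmp, "cached_module.o")
    target = os.path.join(tmp, "module.o")
    with open(cached, "wb") as handle:
        handle.write(b"object code")
    how = link_or_copy(cached, target)
    # After a successful hard link both names share one inode (link count 2).
    print(how, os.stat(cached).st_nlink)
```

A hard link adds a second name for the same data, which is why no extra disk space is used and why the link count of a cached file tells you whether some build tree still uses it.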
WARNING: Makepp never deletes files from a build cache unless it is explicitly asked. This means that your build caches will continue to grow without bounds unless you clean them up periodically (see below for details).
Build caches and repositories
Build caches and repositories (see makepp_repositories) can solve similar problems. For some situations, a repository is more appropriate, while for others, a build cache is more appropriate.
You can also combine the two. If you have a huge directory structure with lots of sources, which you don't want every developer to have a copy of, then you can provide them as a repository. The produced files, with varying debug options and so forth, can then be managed more flexibly through a build cache.
The key differences between a build cache and a repository are:
In general, a repository is more useful if you have a single central build that you want all developers to take files from. A build cache is what you want if you have a decentralized system where one developer should borrow compiled files from any other developer.
Both build caches and repositories can help with variant builds. For example, if you want to compile all your sources optimized, then again with debugging, then again optimized, you can avoid recompiling all the optimized files again by using either a repository or a build cache. To do this with a repository, you have to think ahead and explicitly tell makepp to use a repository for the debugging compilation, or else it will wipe out your initial optimized compilation. With a build cache, makepp goes ahead and wipes out the initial optimized compilation but can get it back quickly.
A group is a loose coupling of build caches. It is loose in the sense that makepp doesn't deal with it, so as to not slow down its build cache management. To benefit from this you have to use the offline utility. Notably the "clean" command also performs the replication. If you give an unrealistic cleaning criterion, like "--mtime=+1000", no cleaning occurs, only replication.
Grouping allows sharing files with more people, especially if you have your build caches on the developers' disks, to benefit from hard linking, which saves submission time and disk space. Hard linking alone, however, is restricted to per disk benefits.
With grouping the file will get replicated at some time after makepp submitted it to the build cache. This means that the file will get created only once for all disks together.
On file systems which allow hard linking to symbolic links -- which seems restricted to Linux and Solaris -- the file will additionally be physically present on one disk only. Additionally it remains on each disk it got created on before you replicated, but only as long as it is in use on those disks. In this scenario with symlinks you may choose one or more file systems on which you prefer your files to be physically. Be aware that successfully built files may become unavailable, if the disk they are on physically goes offline. Rebuilding will remedy this, and the impact can be lessened by spreading the files over several preferred disks.
Replication has several interesting uses:
How to tell makepp to use the build cache
Once the build cache has been created, it is now available to makepp. There are several options you can specify during creation; see "How to manage a build cache" for details.
A build cache is specified with the --build-cache command line option, with the build_cache statement within a makefile, or with the :build_cache rule modifier.
The most useful ways that I have found so far to work with build caches are:
export MAKEPPFLAGS=--build-cache=/path/to/build/cache
setenv MAKEPPFLAGS --build-cache=/path/to/build/cache
Now every build that you run will always use this build cache, and you don't need to modify anything else.
BUILD_CACHE := /path/to/build_cache
build_cache $(BUILD_CACHE)
You have to put this in all makefiles that use a build cache (or in a common include file that all the makefiles use). Or put this into your RootMakeppfile:
BUILD_CACHE := /path/to/build_cache
global build_cache $(BUILD_CACHE)
On a multiuser machine you might set up one build cache per home disk to take advantage of links. You might find it more convenient to use a statement like this:
build_cache $(find_upwards our_build_cache)
which searches upwards from the current directory in the current file system until it finds a directory called our_build_cache. This can be the same statement for all users and still individually point to the cache on their disk.
Solaris 10 can do some fancy remounting of home directories. Your home will apparently be a mount point of its own, called /home/$LOGNAME, when in fact it is on one of the /export/home* disks alongside those of other users. Because it's not really a separate filesystem, links still work. But you can't search upwards. Instead you can do:
BUILD_CACHE := ${makeperl </export/home*/$(LOGNAME)/../makepp_bc>}
Build caches and signatures
Makepp looks up files in the build cache according to their signatures. If you are using the default signature method (file date + size), makepp will only pull files out of the build cache if the file date of the input files is identical. Depending on how your build works, the file dates may never be identical. For example, if you check files out into two different directory hierarchies, the file dates are likely to be the time you checked the files out, not the time the files were checked in (depending, of course, on your version control software).
What you probably want is to pull files out of the build cache if the file contents are identical, regardless of the date. If this is the case, you should be using some sort of a content-based signature. Makepp does this by default for C and C++ compilations, but it uses file dates for any other kinds of files (e.g., object files, or any other files in the build process not specifically recognized as a C source or include file). If you want other kinds of files to work with the build cache (i.e., if you want it to work with anything other than C/C++ compilation commands), then you could put a statement like this somewhere near the top of your makefile:
signature md5
to force makepp to use signatures based on the content of files rather than their date.
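The difference between the two kinds of signature can be shown with a small Python sketch. The two functions below only imitate the idea of a date-and-size signature and a content signature; they are not makepp's implementation, and the file names and timestamps are made up.

```python
import hashlib
import os
import tempfile


def date_size_signature(path):
    """Signature built from modification time and size."""
    info = os.stat(path)
    return f"{int(info.st_mtime)},{info.st_size}"


def content_signature(path):
    """Signature built from the file contents only."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "checkout1.txt")
    second = os.path.join(tmp, "checkout2.txt")
    for path in (first, second):
        with open(path, "w") as handle:
            handle.write("same contents\n")
    # Simulate two checkouts of the same file made an hour apart.
    os.utime(first, (1_600_000_000, 1_600_000_000))
    os.utime(second, (1_600_003_600, 1_600_003_600))
    same_date_size = date_size_signature(first) == date_size_signature(second)
    same_content = content_signature(first) == content_signature(second)
    print(same_date_size, same_content)  # prints "False True"
```

With a date-based signature the second checkout would miss the cache even though nothing changed; with a content-based signature it would hit.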
How not to cache certain files
There may be certain files that you know you will never want to cache. For example, if you embed a datestamp into a file, you know that you will never under any circumstances want to fetch a previous copy of the file out of the build cache, because the date stamp is different. In this case, it is just a waste of time and disk space to copy it into the build cache.
Or, you may think it is highly unlikely that you will want to cache the final executable. You might want to cache individual objects or shared objects that go into making the executable, but it's often pretty unlikely that you will build an exactly identical executable from identical inputs. Again, in this case, using a build cache is a waste of disk space and time, so it makes sense to disable it.
Sometimes a file may be extremely quick to generate, and it is just a waste to put it into the build cache since it can be generated as quickly as copied. You may want to selectively disable caching of these files.
You can turn off the build cache for specific rules by specifying ": build_cache none" in a rule, like this:
our_executable: dateStamp.o main.o */*.so
    : build_cache none
    $(CC) $(LDFLAGS) $(inputs) -o $(output)
This flag means that any outputs from this particular rule will never be put into the build cache, and makepp will never try to pull them out of the build cache either.
makepp_build_cache_control, or mppbcc, is a utility that administers build caches for makepp. What makepp_build_cache_control does is determined by the first word of its arguments.
In fact this little script is a wrapper to the following command, which you might want to call directly in your cron jobs, where the path to "makeppbuiltin" might be needed:
makeppbuiltin -MMpp::BuildCacheControl command ...
You can also use these commands from a makefile after loading them, with a "&"-prefix as follows for the example of "create":
perl { use Mpp::BuildCacheControl } # It's a Perl module, so use instead of include.

my_cache:
    &create $(CACHE_OPTIONS) $(output) # Call a loaded builtin.

build_cache $(prebuild my_cache)
The valid commands, which also take a few of the standard options described in makepp_builtins, are:
Standard options: "-A, --args-file, --arguments-file=filename, -v, --verbose"
Files in the build cache are named using MD5 hashes of data that makepp uses, so each filename is 22 base64 digits plus the original filename. If a build cache file name is 0123456789abcdef012345_module.o, it is actually stored in the build cache as 01/23/456789abcdef012345_module.o if you specify "--subdir-chars 2,4". In fact, "--subdir-chars 2,4" is the default, which is for a gigantic build cache of at most 4096 dirs with 16777216 subdirs in total (64 possible digits give 64^2 = 4096 and 64^4 = 16777216). Even "--subdir-chars 1,2" or "--subdir-chars 1" will get you quite far. On a file system optimized for huge directories you might even say "-s ''" or "--subdir-chars=" to store all files at the top level.
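The mapping from a cache file name to its location can be sketched in Python as below. The sketch assumes that the numbers given to "--subdir-chars" are character positions at which the name is cut, which is what the 2,4 example above suggests; the results for other settings are inferred from that assumption, and the function name is made up.

```python
def cache_path(name, subdir_chars=(2, 4)):
    """Split a cache file name into directories at the given character positions."""
    parts = []
    start = 0
    for end in subdir_chars:
        parts.append(name[start:end])
        start = end
    # Whatever follows the last cut position is the file name proper.
    parts.append(name[start:])
    return "/".join(parts)


name = "0123456789abcdef012345_module.o"
print(cache_path(name))          # 01/23/456789abcdef012345_module.o
print(cache_path(name, (1, 2)))  # 0/1/23456789abcdef012345_module.o
print(cache_path(name, ()))      # 0123456789abcdef012345_module.o
```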
As these are directory permissions, if you grant any access, you must also grant execute access, or you will get a bunch of weird failures. I.e. 0700 means that only this user may have access to this build cache. 0770 means that this user and anyone in the group may have write access to the build cache. 0777 means that anyone may have access to the build cache. The sensible octal digits are 7 (write), 5 (read) or 0 (none). 3 (write) or 1 (read) is also possible, allowing the cache to be used, but not to be browsed, i.e. it would be harder for a malicious user to find file names to manipulate.
In a group of build caches each one has its own value for this, so you can enforce different write permissions on different disks.
If you don't specify the permissions, your umask permissions at creation time apply throughout the lifetime of the build cache.
Only files with a link count of 1 are deleted (because otherwise, the file doesn't get physically deleted anyway -- you'd just uncache a file which someone is apparently still interested in, so somebody else might be too). The criteria you give pertain to the actual cached files. Each build info file will be deleted when its main file is. No empty directories will be left. Irrespective of the link count and the options you give, any file that does not match its build info file will be deleted, if it is older than a safety margin of 10 minutes.
The following options take a time specification as an argument. A time spec may start with a "+", meaning longer ago than the given time, or with a "-", meaning more recently than the given time. Without a sign it means between the number you give and one more of the same unit. Numbers, which may be fractional, are by default days. But they may be followed by one of the letters "w" (weeks), "d" (days, the default), "h" (hours), "m" (minutes) or "s" (seconds). Note that days are simply 24 real hours, ignoring any change between summer and winter time. Examples:
1       between 24 and 48 hours ago
24h     between 24 and 25 hours ago
0.5d    between 12 and 36 hours ago
1w      between 7 and 14 times 24 hours ago
-2      less than 48 hours ago
+30m    more than 30 minutes ago
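To make the rules concrete, here is a small Python sketch that turns a time spec into an age range in seconds. It follows the description and examples above but is not the parser makepp uses; the accepted number syntax and the function name are assumptions, and whether the bounds are inclusive is left open.

```python
import re

UNIT_SECONDS = {"w": 7 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}
SPEC = re.compile(r"([+-]?)(\d+(?:\.\d*)?|\.\d+)([wdhms]?)")


def age_range(spec):
    """Return (minimum_age, maximum_age) in seconds; None marks an open end."""
    match = SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"bad time spec: {spec!r}")
    sign, number, unit = match.groups()
    step = UNIT_SECONDS[unit or "d"]  # days are the default unit
    age = float(number) * step
    if sign == "+":
        return (age, None)   # longer ago than the given time
    if sign == "-":
        return (None, age)   # more recently than the given time
    return (age, age + step)  # between the number and one more unit


for spec in ["1", "24h", "0.5d", "1w", "-2", "+30m"]:
    print(spec, age_range(spec))
```

For instance, "0.5d" yields the range from 43200 to 129600 seconds, that is, between 12 and 36 hours ago.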
All the following options are combined with "and". If you want several sets of combinations with "or", you must call this command repeatedly with different sets of options. Do the ones where you expect the most deletions first, then the others can be faster.
Standard options: "-A, --args-file, --arguments-file=filename, -v, --verbose"
Some file systems do not support the atime field, and even if the file system does, sometimes people turn off access time on their file systems because it adds a lot of extra disk I/O which can be harmful on battery powered notebooks, or in disk speed optimization. (But this is potentially fixable -- see the UTIME_ON_IMPORT comment in Mpp/BuildCache.pm.)
Doing this for clean may have unwanted effects if you can hard link to symlinks, because it may migrate members from one group to another. Subsequent non-blended cleans may then clean them from the original group prematurely.
This option is named after the equivalent utility "newgrp" which alas can't easily be used in "cron" jobs or similar setups.
The timespec for "--incoming-modification-time" must begin with "+", and defaults to "+2h" (files at least 2 hours old are assumed to have been orphaned).
This is the Swiss officer's knife. The perlcode is called in scalar context once for every cache entry (i.e. excluding directories and metainfo files). It is called in a "File::Find" "wanted" function, so see there for the variables you can use. An "lstat" has been performed, so you can use the "_" filehandle.
If perlcode returns "undef" it is as if it weren't there, that is the other options decide. If it returns true the file is deleted. If it returns false, the file is retained.
This strategy only works if you can trust your users not to subvert the build cache for storing arbitrary (i.e. non-development) files beyond their disk quota. The ownership of the associated metadata file is retained, so you can always see who cached a file. If you need this option, it might need to be given several times during the daytime.
There are different possible strategies, depending on how much space you have and on whether the build cache contains linked files or whether users only have copies. Several strategies can be combined, by calling them one after another or at different times. The "show" command is meant to help you find an appropriate strategy.
A nightly (from Tuesday through Saturday) run might specify "--atime +2" (or "--mtime" if you don't have atime), deleting all files no one has read for two days.
If you use links, you can also prevent the fast, useless growth that occurs when successive header changes, which never get version controlled, lead to lots of objects being rapidly created. Something like an hourly run with "--mtime=-2h --ctime=+1h" during the daytime will catch the files that their creator deleted within less than an hour and that nobody else has wanted since.
The fields are, in the short standard and the long verbose form:
With "-v, --verbose" the information shown for each command gives you an impression of which options to give to the "clean" command. The times are shown in readable form, as well as the number of days, hours or minutes the age of this file has just exceeded. If you double the option, you additionally get the info for each group member.
Standard options: "-A, --args-file, --arguments-file=filename, -f, --force, -o, --output=filename, -O, --outfail, -v, --verbose"
If you have a huge cache for which sorting takes intolerably long, or needs more memory than your processes are allowed, you can skip sorting by giving an empty list.
Each of the latter two groups consists of three column pairs, one column with a value, and one for the percentage of the total that value represents. The first pair shows either the size of files or the number of files. The other two pairs show the CUMULation, once from smallest to biggest and once the other way round.
The first three tables, with a first column of AD, CD or MD show access times, inode change times or modification times grouped by days. Days are actually 24 hour blocks counting backwards from the start time of the stats command. The row "0" of the first table will thus show the sum of sizes and the number of files accessed less than a day ago. If no files were accessed then, there will be no row "0". Row "1" in the third table will show the files modified (i.e. written to the build cache) between 24 and 48 hours ago.
The next table, EL, shows external links, i.e. how many build trees share a file from the build cache. This is a measure of the usefulness of the build cache. Alas, it only works when developers have a build cache on their own disk; otherwise they have to copy, which leaves no global trail. The more content has high external link counts, the greater the benefit of the build cache.
The next table, again EL, shows the same information as the previous one, but weighted by the number of external links. Each byte or file with an external link count of one counts as one. But if the count is ten, the values are counted ten times. That's why the headings change to *SIZE and *FILES. This is a hypothetical value, showing how much disk usage or how many files there would be if the same build trees had all used no build cache.
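The weighting can be illustrated with a few lines of Python. The entries below are made-up (size, external link count) pairs, and the function only computes overall totals rather than reproducing the per-row layout of the stats output.

```python
def external_link_totals(entries):
    """Sum sizes and file counts, plain and weighted by external link count."""
    size = sum(entry_size for entry_size, _ in entries)
    files = len(entries)
    # Weighted: a file used by ten build trees counts ten times.
    weighted_size = sum(entry_size * links for entry_size, links in entries)
    weighted_files = sum(links for _, links in entries)
    return size, files, weighted_size, weighted_files


# (size in bytes, external link count) for three hypothetical cache entries.
entries = [(1000, 1), (2000, 10), (500, 3)]
print(external_link_totals(entries))  # (3500, 3, 22500, 14)
```

In this example 3500 bytes in the cache stand in for 22500 bytes that the build trees would otherwise have occupied.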
One more table, C:S (copies to symlinks), pertains to grouped caches only. Ideally every member exists as one copy plus one fewer symlink than there are caches in the group. Symlinks remain "0" until cleaning has replicated. There may be more than one copy, if either several people created the identical file before it was replicated, or if replication migrated the file to a preferred disk, but the original file was still in use. Superfluous copies become symlinks when cleaning finds they have no more external links.
Standard options: "-A, --args-file, --arguments-file=filename, -v, --verbose"
Build caches will not work well under the following circumstances:
&echo prog_path=$(PWD) -o $(output)
because then the command line will be different and makepp won't incorrectly pull the file out of the build cache. But if the command line is not different, then there could be a problem. For example,
echo prog_path=`pwd` > $(output)
will not work properly.
Imagine Chang is the first to do a full build. Along comes Ching and gets links to all those files. Chang makes some fundamental changes that lead to most things being rebuilt. He checks them in; Chong checks them out and gets links to the build cache. Chang makes changes again, leading to a third set of files.
In this scenario, no matter what cleaning strategy you use, no files will get deleted, because they are all still in use. The problem is that they all belong to Chang, which can make him reach his disk quota, and there is nothing he can do about it on most systems. See the "clean --set-user" command under "How to manage a build cache" for how the system administrator could change the files to a quota-less cache owner.
Build caches need to support concurrent access, which implies that the implementation must be tolerant of races. In particular, a file might get aged (deleted) between the time makepp decides to import a target and the time the import completes.
Furthermore, some people use build caches over NFS, which is not necessarily coherent. In other words, the order of file creation and deletion by the writer on one host will not necessarily match the order seen by a reader on another host, and therefore races cannot be resolved by paying particular attention to the order of file operations. (But there is usually an NFS cache timeout of about 1 minute which guarantees that writes will take no longer than that amount of time to propagate to all readers. Furthermore, typically in practice at least 99% of writes are visible everywhere within 1 second.) Because of this, we must tolerate the case in which the cached target and its build info file appear not to correspond. Furthermore, there is a peculiar race that can occur when a file is simultaneously aged and replaced, in which the files don't correspond even after the NFS cache flushes. This appears to be unavoidable.
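One way to tolerate the simplest of these races, a cache entry vanishing between lookup and import, is to treat the failure as an ordinary cache miss. The Python sketch below illustrates only that idea; it is not makepp's implementation, it does not address the NFS coherence problems described above, and the function and file names are made up.

```python
import os
import tempfile


def import_from_cache(cached, target):
    """Try to link a cached file into the build tree; report a miss if it vanished."""
    try:
        os.link(cached, target)
    except FileNotFoundError:
        # The entry was aged out after the lookup: the caller should rebuild.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    cached = os.path.join(tmp, "cached_module.o")
    target = os.path.join(tmp, "module.o")
    # Nothing has been cached yet, so this import is reported as a miss.
    miss = import_from_cache(cached, target)
    with open(cached, "wb") as handle:
        handle.write(b"object code")
    hit = import_from_cache(cached, target)
    print(miss, hit)
```

The point of the design is that a failed import never aborts the build; the worst case is an unnecessary rebuild.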
2021-01-06 | perl v5.32.0 |