REPOCUTTER(1)
repocutter - surgical and filtering operations on Subversion dump files
repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'
This program does surgical and filtering operations on Subversion dump files. While it is not as flexible as reposurgeon(1), it can perform Subversion-specific transformations that reposurgeon cannot, and can be useful for processing Subversion repositories into a form suitable for conversion. Also, it supports the version 3 dumpfile format, which reposurgeon does not.
In most commands, the -r (or --range) option limits the selection of revisions over which an operation will be performed. Usually other revisions will be passed through unaltered, except in the select and deselect commands for which the option controls which revisions will be passed through. A selection consists of one or more comma-separated ranges. A range may consist of an integer revision number or the special name HEAD for the head revision. Or it may be a colon-separated pair of integers, or an integer followed by a colon followed by HEAD.
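The selection syntax described above can be illustrated with a small parser. This is a sketch of the grammar as stated in the text, not repocutter's own code; the function names `parse_selection` and `selected` and the `head` parameter are hypothetical.

```python
def parse_selection(selection, head):
    """Expand a selection like '3,7:9,12:HEAD' into (low, high) pairs.

    `head` is the number of the head revision, supplied by the caller.
    """
    def number(token):
        token = token.strip()
        if not token.isdecimal():
            raise ValueError(f"bad revision number {token!r}")
        return int(token)

    def endpoint(token):
        # HEAD is accepted alone or as the upper end of a range.
        return head if token.strip() == "HEAD" else number(token)

    ranges = []
    for part in selection.split(","):
        if ":" in part:
            low, high = part.split(":", 1)
            ranges.append((number(low), endpoint(high)))
        else:
            rev = endpoint(part)
            ranges.append((rev, rev))
    return ranges


def selected(revision, ranges):
    """True if the revision falls inside any of the parsed ranges."""
    return any(low <= revision <= high for low, high in ranges)
```

With a head revision of 20, the selection `3,7:9,12:HEAD` covers revision 3, revisions 7 through 9, and revisions 12 through 20.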
If the output stream contains copyfrom references to missing revisions, repocutter silently patches each copy source by stepping it backwards to the most recent previous revision that exists.
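The backward step can be pictured as a search in the sorted list of revisions that survive into the output. The following is a minimal sketch under that assumption; `patch_copyfrom` and `present` are hypothetical names, and returning `None` when no earlier revision exists is a choice made for the illustration only.

```python
import bisect


def patch_copyfrom(copyfrom_rev, present):
    """Return the most recent revision <= copyfrom_rev found in `present`.

    `present` is a sorted list of the revision numbers that exist in the
    output stream. Returns None if no such revision exists (an assumption
    of this sketch, not documented behavior).
    """
    idx = bisect.bisect_right(present, copyfrom_rev)
    return present[idx - 1] if idx else None
```

For example, if only revisions 1, 2, 5 and 9 remain, a copyfrom reference to revision 7 would be stepped back to revision 5.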
(Older versions of this tool, before 4.30, treated -r as an implied selection filter rather than passing through unselected revisions unaltered. If you have old scripts using repocutter they may need modification.)
Normally, each subcommand produces a progress spinner on standard error; each turn means another revision has been filtered. The -q (or --quiet) option suppresses this. Quiet mode is set when output is redirected to a file or pipe.
The -d option enables debug messages on standard error. It takes an integer debug level. These messages are probably only of interest to repocutter developers.
The -i option sets the input source to a specified filename. This is primarily useful when running the program under a debugger. When this option is not present the program expects to read a stream from standard input.
Generally, if you need to use this program at all, you will find that you need to pipe your dump file through multiple instances of it doing one kind of operation each. This is not as expensive as it sounds; with the exception of the reduce subcommand, the working set of this program is bounded by the size of the largest single blob plus its metadata. It does not need to hold the entire repo metadata in memory.
The -f/-fixed option disables regexp compilation of PATTERN arguments, treating them as literal strings.
The -t option sets a tag to be included in error and warning messages. This will be useful for determining which stage of a multistage repocutter pipeline failed.
There are a few other command-specific options described under individual commands.
In the command descriptions, PATTERN arguments are regular expressions to match pathnames, constrained so that each match must be a path segment or a sequence of path segments; that is, the left end must be either at the start of path or immediately following a /, and the right end must precede a / or be at end of string. With a leading ^ the match is constrained to be a leading sequence of the pathname; with a trailing $, a trailing one.
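The segment constraint can be expressed by wrapping a pattern in boundary assertions. The sketch below implements the rule exactly as worded above, using Python's `re` module rather than the regular expression engine repocutter itself uses; it makes no claim about how repocutter treats patterns that end in a slash. The function name `segment_pattern` and its `fixed` parameter (mirroring the literal-string behavior of -f) are hypothetical.

```python
import re


def segment_pattern(pattern, fixed=False):
    """Compile `pattern` so that matches cover whole path segments only.

    The left end must be at the start of the path or just after a '/';
    the right end must be just before a '/' or at the end of the path.
    With fixed=True the pattern is treated as a literal string.
    """
    body = re.escape(pattern) if fixed else pattern
    return re.compile(r"(?:^|(?<=/))(?:" + body + r")(?=/|\Z)")


trunk = segment_pattern("trunk")
print(bool(trunk.search("foo/trunk/bar")))   # whole segment
print(bool(trunk.search("foo/trunks/bar")))  # partial segment
```

A leading ^ or trailing $ inside the pattern keeps its usual anchoring meaning, so `^foo` can only match a leading segment and `bar$` only a trailing one.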
The following subcommands are available:
select
Warning: the output is not guaranteed to be a valid dump that can be read by reposurgeon. In particular, this subcommand may delete a revision that is referenced in a later copy-from operation, which will crash reposurgeon.
deselect
Warning: the output is not guaranteed to be a valid dump that can be read by reposurgeon. In particular, this subcommand may delete a revision that is referenced in a later copy-from operation, which will crash reposurgeon.
see
renumber
count
log
setlog
propdel
proprename
propset
May be restricted by a revision selection. Note that specifying only a revision will cause the property to be set on the revision properties and on all nodes in the revision; you’ll probably want to specify a node index.
You may specify multiple property settings.
propclean
expunge
Warning: the output is not guaranteed to be a valid dump that can be read by reposurgeon. In particular, this subcommand may delete a revision that is referenced in a later copy-from operation, which will crash reposurgeon.
sift
This transform can be restricted by a selection set.
Warning: the output is not guaranteed to be a valid dump that can be read by reposurgeon. In particular, this subcommand may delete a revision that is referenced in a later copy-from operation, which will crash reposurgeon.
closure
pathlist
pathrename
Matches are constrained so that each match must be a path segment or a sequence of path segments; that is, the left end must be either at the start of path or immediately following a /, and the right end must precede a / or be at end of string. With a leading ^ the match is constrained to be a leading sequence of the pathname; with a trailing $, a trailing one.
Multiple FROM/TO pairs may be specified and are applied in order. This transform can be restricted by a selection set.
All mergeinfo properties are updated in accordance with the path renames.
setpath
setcopyfrom
pop
May be useful after a sift command to turn a dump of a subproject, extracted from the dump of a multiple-project repository, into the normal form with trunk/tags/branches at the top level.
This transform cannot be restricted by a selection set, as it is not possible to guarantee that copyfrom paths and mergeinfo properties will be modified consistently in the presence of that kind of restriction.
Mergeinfo properties in all revisions are updated, as well as path and copyfrom parts.
push
This transform cannot be restricted by a selection set, as it is not possible to guarantee that copyfrom paths and mergeinfo properties will be modified consistently in the presence of that kind of restriction.
Mergeinfo properties in all revisions are updated to refer to the new pathnames.
filecopy
You can use this operation to sever links from obsolete branches or non-conformable directories in a multiproject repository so the unwanted content can be expunged without changing the content of later revisions.
If a PATTERN argument is provided, only replace copies with an explicit add/change when the source node path matches PATTERN.
With the -n flag, only the basename is required to match PATTERN if it is provided. Otherwise, with -n and no PATTERN, require a match of source to target on basename only rather than the full path. This may be required in order to extract filecopies from branches.
Restricting the range holds down the memory requirement of this tool, which in the worst (and default) 1:$ case will keep a copy of every blob in the repository until it’s done processing the stream.
skipcopy
swap
swapsvn
Fires when the second component of a matching path is "trunk", "branches", or "tags", or the path consists of a single segment that is a top-level project directory; all paths for which this is not so are passed through unaltered.
Top-level project directories with properties or comments make this command die (return status 1) with an error message on stderr; otherwise these directories are silently discarded.
Otherwise, swaps "trunk" and the top-level (project) directory straight up. For tags and branches, the following two components are swapped to the top. Thus, "foo/branches/release23" becomes "branches/release23/foo", putting the project directory beneath the branch.
Also fires when an entire project directory is copied; this is transformed into a copy of trunk and copies of each subbranch and tag that exists.
After the swap, there are attempts to recognize spans of copies into branch directories, and copies into tag subdirectories that are parallel in all top-level (project) directories. These are coalesced into single copies in the inverted structure. No attempt is made to coalesce deletes; the user must manually trim unneeded branches.
Accordingly, copies with three-segment sources and three-segment targets are transformed; for tags/ and branches/ paths the last segment (the subdirectory below the branch name) is dropped. Following copies are skipped.
This has two minor negative consequences. One is that metadata belonging to all deletes or copies after the first one in a coalesced span is lost. The other is that branches and tags local to individual project directories are promoted to global branches and tags across the entire transformed repository; no content is lost this way.
Parallel rename sequences are also coalesced.
If a PATTERN argument is given, only paths matching the pattern are swapped.
Note that the result of swapping does not have initial trunk/branches/tags directory creations and thus cannot be fed directly to svnload. reposurgeon copes with this, but Subversion will not.
Mergeinfo properties are updated to use the swapped path names.
This transform can be restricted by a selection set.
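The basic path swap performed by swapsvn can be sketched as follows. This covers only the rewriting of a single pathname; copy coalescing, project-directory copies, and mergeinfo updates are not modeled. The function name `swap_path` is hypothetical, and the handling of paths deeper than the documented "foo/branches/release23" example is an assumption extrapolated from it.

```python
STANDARD = ("trunk", "branches", "tags")


def swap_path(path):
    """Move the project directory beneath trunk or beneath the branch/tag name."""
    parts = path.split("/")
    if len(parts) < 2 or parts[1] not in STANDARD:
        # The rule does not fire; single-segment project directories
        # are not modeled in this sketch.
        return path
    project, kind, rest = parts[0], parts[1], parts[2:]
    if kind == "trunk":
        return "/".join(["trunk", project] + rest)
    if not rest:
        # Bare "project/branches" or "project/tags": not described in the
        # text, so this sketch leaves the path alone.
        return path
    # branches/NAME or tags/NAME moves to the top, project goes beneath it.
    return "/".join([kind, rest[0], project] + rest[1:])


print(swap_path("foo/branches/release23"))
```

Under these assumptions "foo/trunk/src/main.c" would become "trunk/foo/src/main.c".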
swapcheck
Each report line has two fields; the first is the earliest revision containing a path with the prefix given, and the second is the prefix.
If feeding a Subversion dump to this subcommand doesn’t produce an empty report, you can expect swapsvn to produce an invalid dump that will confuse and possibly crash reposurgeon. The remedy for this is a set of pathrenames and/or deselections that yields paths conformable to being swapped into a regular Subversion structure.
Note, replace
strip
This command is useful for reducing the bulk of a stream without touching its metadata, so you can do test conversions more quickly.
obscure
reduce
testify
count
For the meaning of the debug levels, see the source code. This option is probably only of interest to repocutter developers.
version
Under the name "svncutter", an ancestor of this program traveled in the 'contrib/' directory of the Subversion distribution. It had functional overlap with reposurgeon(1) because it was directly ancestral to that code. It was moved to the reposurgeon(1) distribution in January 2016. This program was ported from Python to Go in August 2018, at which time the obsolete "squash" command was retired. The syntax of regular expressions in the pathrename command changed at that time.
The reason for the partial functional overlap between repocutter and reposurgeon is that repocutter was first written earlier and became a testbed for some of the design concepts in reposurgeon. After reposurgeon was written, the author learned that it could not naturally support some useful operations very specific to Subversion, and enhanced repocutter to do those.
The exit status is normally 0. It can be 1 if repocutter sees an ill-formed dump, or if the output stream contains any copyfrom references to missing revisions.
There is one regression since the Python version: repocutter no longer recognizes Macintosh-style line endings consisting of a carriage return only. This may be addressed in a future version.
Suppose you have a Subversion repository with the following semi-pathological structure:
Directory1/ (with unrelated content)
Directory2/ (with unrelated content)
TheDirIWantToMigrate/
    branches/
        crazy-feature/
            UnrelatedApp1/
            TheAppIWantToMigrate/
    tags/
        v1.001/
            UnrelatedApp1/
            UnrelatedApp2/
            TheAppIWantToMigrate/
    trunk/
        UnrelatedApp1/
        UnrelatedApp2/
        TheAppIWantToMigrate/
You want to transform the dump file so that TheAppIWantToMigrate can be subject to a regular branchy lift. A way to dissect out the code of interest would be with the following series of filters applied:
repocutter expunge '^Directory1' '^Directory2'
repocutter pathrename '^TheDirIWantToMigrate/' ''
repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
repocutter expunge '^tags/v1.001/UnrelatedApp1/'
repocutter expunge '^tags/v1.001/UnrelatedApp2/'
repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
repocutter expunge '^trunk/UnrelatedApp1/'
repocutter expunge '^trunk/UnrelatedApp2/'
repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'
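Since each filter reads a dump on standard input and writes one on standard output, a series like this is normally chained with pipes. The helper below is a sketch that assembles such a pipeline as a shell command string; it does not run anything. The name `build_pipeline` and the file names are hypothetical, and the `-t TAG` argument form (used here to label each stage in error messages) is an assumption based on the description of the -t option.

```python
import shlex


def build_pipeline(stages, infile, outfile):
    """Join repocutter invocations into one shell pipeline string.

    `stages` is a list of argument lists, one per filter, each starting
    with the subcommand name.
    """
    commands = [
        shlex.join(["repocutter", "-t", f"stage{number}", *args])
        for number, args in enumerate(stages, start=1)
    ]
    # Input feeds the first stage; output is taken from the last.
    commands[0] += f" <{shlex.quote(infile)}"
    commands[-1] += f" >{shlex.quote(outfile)}"
    return " | ".join(commands)


stages = [
    ["expunge", "^Directory1", "^Directory2"],
    ["pathrename", "^TheDirIWantToMigrate/", ""],
]
print(build_pipeline(stages, "in.svn", "out.svn"))
```

Quoting is left to `shlex` so that patterns containing `^` or an empty replacement string survive the shell intact.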
The sift and expunge operations can produce output dumps that are invalid. The problem is copyfrom operations (Subversion branch and tag creations). If an included revision includes a copyfrom reference to an excluded one, the reference target won’t be in the emitted dump; it won’t load correctly in Subversion, and while reposurgeon has fallback logic that backs down to the latest existing revision before the missing one, this expedient is fragile. The revision number in a copyfrom header pointing to a missing revision will be zero. Attempts to be clever about this won’t work; the problem is inherent in the data model of Subversion.
Eric S. Raymond <esr@thyrsus.com>. This tool is distributed with reposurgeon; see the project page <http://www.catb.org/~esr/reposurgeon>.
2023-04-09