GIT-PACK-OBJECTS(1) | Git Manual | GIT-PACK-OBJECTS(1) |
git-pack-objects - Create a packed archive of objects
git pack-objects [-q | --progress | --all-progress] [--all-progress-implied]
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=<n>] [--depth=<n>]
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
[--stdout [--filter=<filter-spec>] | base-name]
[--shallow] [--keep-true-parents] < object-list
Reads a list of objects from the standard input, and writes either one or more packed archives with the specified base-name to disk, or a packed archive to the standard output.
A packed archive is an efficient way to transfer a set of objects between two repositories as well as an access-efficient archival format. In a packed archive, an object is either stored as a compressed whole or as a difference from some other object. The latter is often called a delta.
The packed archive format (.pack) is designed to be self-contained so that it can be unpacked without any further information. Therefore, each object that a delta depends upon must be present within the pack.
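The self-containment rule can be illustrated with a small Python model. This is only a sketch: the dictionary layout, the function names and the shortened object IDs are assumptions made for illustration and do not reflect the on-disk pack format.

```python
def missing_delta_bases(pack):
    """Return delta bases that the pack refers to but does not contain.

    `pack` maps each object ID to the ID of its delta base, or to None
    when the object is stored as a compressed whole.
    """
    return {base for base in pack.values()
            if base is not None and base not in pack}


def is_self_contained(pack):
    """A pack is self-contained when every delta base is also in the pack."""
    return not missing_delta_bases(pack)


# Hypothetical, shortened object IDs.
complete_pack = {"a1": None, "b2": "a1", "c3": "b2"}
incomplete_pack = {"b2": "a1", "c3": "b2"}  # base "a1" is absent

print(is_self_contained(complete_pack))      # True
print(missing_delta_bases(incomplete_pack))  # {'a1'}
```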
A pack index file (.idx) is generated for fast, random access to the objects in the pack. Placing both the index file (.idx) and the packed archive (.pack) in the pack/ subdirectory of $GIT_OBJECT_DIRECTORY (or any of the directories on $GIT_ALTERNATE_OBJECT_DIRECTORIES) enables Git to read from the pack archive.
The git unpack-objects command can read the packed archive and expand the objects contained in the pack into "one-file one-object" format; this is typically done by the smart-pull commands when a pack is created on-the-fly for efficient network transport by their peers.
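As a rough illustration of the invocation in the synopsis, the following Python sketch feeds an object list to git pack-objects on standard input. The helper names are hypothetical, and the sketch assumes that git is on the PATH and that the process runs inside a repository containing the listed objects.

```python
import subprocess


def build_pack_command(base_name, window=10, depth=50):
    """Build the argv for `git pack-objects` writing packs under base_name."""
    return ["git", "pack-objects",
            f"--window={window}", f"--depth={depth}", base_name]


def write_pack(base_name, object_ids, **options):
    """Pass object IDs, one per line, on stdin and return the command's stdout."""
    object_list = "".join(f"{oid}\n" for oid in object_ids)
    result = subprocess.run(
        build_pack_command(base_name, **options),
        input=object_list,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    return result.stdout
```

Separating the argv construction from the call keeps the option handling easy to inspect without running git.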
base-name
--stdout
--revs
--unpacked
--all
--include-tag
--window=<n>, --depth=<n>
The default value for --window is 10 and --depth is 50. The maximum depth is 4095.
--window-memory=<n>
--max-pack-size=<n>
--honor-pack-keep
--keep-pack=<pack-name>
--incremental
--local
--non-empty
--progress
--all-progress
--all-progress-implied
-q
--no-reuse-delta
--no-reuse-object
--compression=<n>
--thin
Note: A thin pack violates the packed archive format by omitting required objects and is thus unusable by Git without making it self-contained. Use git index-pack --fix-thin (see git-index-pack(1)) to restore the self-contained property.
--shallow
--delta-base-offset
Note: Porcelain commands such as git gc (see git-gc(1)) and git repack (see git-repack(1)) pass this option by default in modern Git when they put objects in your repository into pack files. So does git bundle (see git-bundle(1)) when it creates a bundle.
--threads=<n>
--index-version=<version>[,<offset>]
--keep-true-parents
--filter=<filter-spec>
--no-filter
--missing=<missing-action>
The form --missing=error requests that pack-objects stop with an error if a missing object is encountered. This is the default action.
The form --missing=allow-any will allow object traversal to continue if a missing object is encountered. Missing objects will silently be omitted from the results.
The form --missing=allow-promisor is like allow-any, but will only allow object traversal to continue for EXPECTED promisor missing objects. Unexpected missing objects will raise an error.
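The three actions can be summarized as a small decision function. This Python sketch is only a model of the behavior described above: the function name, the exception class and the is_expected_promisor flag are assumptions, not part of Git.

```python
class MissingObjectError(Exception):
    """Raised in this model when traversal must stop on a missing object."""


def handle_missing(action, is_expected_promisor):
    """Return True to omit the missing object and continue the traversal."""
    if action == "error":
        raise MissingObjectError("missing object encountered")
    if action == "allow-any":
        return True
    if action == "allow-promisor":
        if is_expected_promisor:
            return True
        raise MissingObjectError("unexpected missing object")
    raise ValueError(f"unknown missing action: {action}")
```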
--exclude-promisor-objects
--keep-unreachable
--pack-loose-unreachable
--unpack-unreachable
--delta-islands
When possible, pack-objects tries to reuse existing on-disk deltas to avoid having to search for new ones on the fly. This is an important optimization for serving fetches, because it means the server can avoid inflating most objects at all and just send the bytes directly from disk. This optimization can’t work when an object is stored as a delta against a base which the receiver does not have (and which we are not already sending). In that case the server "breaks" the delta and has to find a new one, which has a high CPU cost. Therefore it’s important for performance that the set of objects in on-disk delta relationships match what a client would fetch.
In a normal repository, this tends to work automatically. The objects are mostly reachable from the branches and tags, and that’s what clients fetch. Any deltas we find on the server are likely to be between objects the client has or will have.
But in some repository setups, you may have several related but separate groups of ref tips, with clients tending to fetch those groups independently. For example, imagine that you are hosting several "forks" of a repository in a single shared object store, and letting clients view them as separate repositories through GIT_NAMESPACE or separate repos using the alternates mechanism. A naive repack may find that the optimal delta for an object is against a base that is only found in another fork. But when a client fetches, they will not have the base object, and we’ll have to find a new delta on the fly.
A similar situation may exist if you have many refs outside of refs/heads/ and refs/tags/ that point to related objects (e.g., refs/pull or refs/changes used by some hosting providers). By default, clients fetch only heads and tags, and deltas against objects found only in those other groups cannot be sent as-is.
Delta islands solve this problem by allowing you to group your refs into distinct "islands". Pack-objects computes which objects are reachable from which islands, and refuses to make a delta from an object A against a base which is not present in all of A's islands. This results in slightly larger packs (because we miss some delta opportunities), but guarantees that a fetch of one island will not have to recompute deltas on the fly due to crossing island boundaries.
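The rule for choosing a delta base amounts to a subset test. The following Python sketch models it with sets of island names; the function name and the fork names are hypothetical, and the representation Git actually uses may differ.

```python
def may_use_as_base(object_islands, base_islands):
    """Allow the base only if it is present in every island of the object."""
    return set(object_islands) <= set(base_islands)


# The base is reachable from both forks, so an object in fork-1 may use it.
print(may_use_as_base({"fork-1"}, {"fork-1", "fork-2"}))  # True

# The base is missing from fork-2, where the object also lives.
print(may_use_as_base({"fork-1", "fork-2"}, {"fork-1"}))  # False
```

The second case is the lost delta opportunity mentioned above: the delta is rejected even though both objects share an island, because a fetch of fork-2 alone would not include the base.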
When repacking with delta islands, the delta window tends to get clogged with candidates that are forbidden by the config. Repacking with a big --window helps (and doesn’t take as long as it otherwise might because we can reject some object pairs based on islands before doing any computation on the content).
Islands are configured via the pack.island option, which can be specified multiple times. Each value is a left-anchored regular expression matching refnames. For example:
    [pack]
    island = refs/heads/
    island = refs/tags/
puts heads and tags into an island (whose name is the empty string; see below for more on naming). Any ref which does not match those regular expressions (e.g., refs/pull/123) is not in any island. Any object which is reachable only from refs/pull/ (but not heads or tags) is therefore not a candidate to be used as a base for refs/heads/.
Refs are grouped into islands based on their "names", and two regexes that produce the same name are considered to be in the same island. The names are computed from the regexes by concatenating any capture groups from the regex, with a - dash in between. (And if there are no capture groups, then the name is the empty string, as in the above example.) This allows you to create arbitrary numbers of islands. Only up to 14 such capture groups are supported though.
For example, imagine you store the refs for each fork in refs/virtual/ID, where ID is a numeric identifier. You might then configure:
    [pack]
    island = refs/virtual/([0-9]+)/heads/
    island = refs/virtual/([0-9]+)/tags/
    island = refs/virtual/([0-9]+)/(pull)/
That puts the heads and tags for each fork in their own island (named "1234" or similar), and the pull refs for each go into their own "1234-pull".
Note that we pick a single island for each regex to go into, using "last one wins" ordering (which allows repo-specific config to take precedence over user-wide config, and so forth).
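The naming and ordering rules can be sketched in Python. This is a model under stated assumptions: it uses Python's re module, whose syntax is not identical to the regular expressions Git accepts, and the function name and sample refnames are hypothetical.

```python
import re


def island_for_ref(patterns, refname):
    """Return the island name for refname, or None if no pattern matches.

    Patterns are tried from last to first so that the last configured
    match wins. re.match anchors at the start of the string, which models
    the left-anchored matching described above.
    """
    for pattern in reversed(patterns):
        match = re.match(pattern, refname)
        if match:
            # Join the capture groups with "-"; no groups gives "".
            return "-".join(g for g in match.groups() if g is not None)
    return None


fork_patterns = [
    r"refs/virtual/([0-9]+)/heads/",
    r"refs/virtual/([0-9]+)/tags/",
    r"refs/virtual/([0-9]+)/(pull)/",
]

print(island_for_ref(fork_patterns, "refs/virtual/1234/heads/main"))   # 1234
print(island_for_ref(fork_patterns, "refs/virtual/1234/pull/7/head"))  # 1234-pull
print(island_for_ref(["refs/heads/", "refs/tags/"], "refs/pull/123"))  # None
```

Walking the pattern list in reverse is one simple way to obtain the "last one wins" ordering, since later entries correspond to configuration that was read later.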
Part of the git(1) suite
04/20/2020 | Git 2.20.1 |