From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
Christian Couder <chriscool@tuxfamily.org>
Subject: [RFC PATCH 3/5] pack-objects: add delta-islands support
Date: Sun, 22 Jul 2018 07:48:34 +0200 [thread overview]
Message-ID: <20180722054836.28935-4-chriscool@tuxfamily.org> (raw)
In-Reply-To: <20180722054836.28935-1-chriscool@tuxfamily.org>
From: Jeff King <peff@peff.net>
Implement support for delta islands in git pack-objects
and document how delta islands work in
"Documentation/git-pack-objects.txt".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
Documentation/git-pack-objects.txt | 88 +++++++++++++++++++
builtin/pack-objects.c | 130 ++++++++++++++++++++---------
2 files changed, 177 insertions(+), 41 deletions(-)
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index d95b472d16..7b7a36056f 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -289,6 +289,94 @@ Unexpected missing object will raise an error.
--unpack-unreachable::
Keep unreachable objects in loose form. This implies `--revs`.
+--delta-islands::
+ Restrict delta matches based on "islands". See DELTA ISLANDS
+ below.
+
+
+DELTA ISLANDS
+-------------
+
+When possible, `pack-objects` tries to reuse existing on-disk deltas to
+avoid having to search for new ones on the fly. This is an important
+optimization for serving fetches, because it means the server can avoid
+inflating most objects at all and just send the bytes directly from
+disk. This optimization can't work when an object is stored as a delta
+against a base which the receiver does not have (and which we are not
+already sending). In that case the server "breaks" the delta and has to
+find a new one, which has a high CPU cost. Therefore it's important for
+performance that the set of objects in on-disk delta relationships match
+what a client would fetch.
+
+In a normal repository, this tends to work automatically. The objects
+are mostly reachable from the branches and tags, and that's what clients
+fetch. Any deltas we find on the server are likely to be between objects
+the client has or will have.
+
+But in some repository setups, you may have several related but separate
+groups of ref tips, with clients tending to fetch those groups
+independently. For example, imagine that you are hosting several "forks"
+of a repository in a single shared object store, and letting clients
+view them as separate repositories through `GIT_NAMESPACE` or separate
+repos using the alternates mechanism. A naive repack may find that the
+optimal delta for an object is against a base that is only found in
+another fork. But when a client fetches, they will not have the base
+object, and we'll have to find a new delta on the fly.
+
+A similar situation may exist if you have many refs outside of
+`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
+`refs/pull` or `refs/changes` used by some hosting providers). By
+default, clients fetch only heads and tags, and deltas against objects
+found only in those other groups cannot be sent as-is.
+
+Delta islands solve this problem by allowing you to group your refs into
+distinct "islands". Pack-objects computes which objects are reachable
+from which islands, and refuses to make a delta from an object `A`
+against a base which is not present in all of `A`'s islands. This
+results in slightly larger packs (because we miss some delta
+opportunities), but guarantees that a fetch of one island will not have
+to recompute deltas on the fly due to crossing island boundaries.
+
+Islands are configured via the `pack.island` option, which can be
+specified multiple times. Each value is a left-anchored regular
+expressions matching refnames. For example:
+
+-------------------------------------------
+[pack]
+island = refs/heads/
+island = refs/tags/
+-------------------------------------------
+
+puts heads and tags into an island (whose name is the empty string; see
+below for more on naming). Any refs which do not match those regular
+expressions (e.g., `refs/pull/123`) is not in any island. Any object
+which is reachable only from `refs/pull/` (but not heads or tags) is
+therefore not a candidate to be used as a base for `refs/heads/`.
+
+Refs are grouped into islands based on their "names", and two regexes
+that produce the same name are considered to be in the same island. The
+names are computed from the regexes by concatenating any capture groups
+from the regex (and if there are none, then the name is the empty
+string, as in the above example). This allows you to create arbitrary
+numbers of islands. For example, imagine you store the refs for each
+fork in `refs/virtual/ID`, where `ID` is a numeric identifier. You might
+then configure:
+
+-------------------------------------------
+[pack]
+island = refs/virtual/([0-9]+)/heads/
+island = refs/virtual/([0-9]+)/tags/
+island = refs/virtual/([0-9]+)/(pull)/
+-------------------------------------------
+
+That puts the heads and tags for each fork in their own island (named
+"1234" or similar), and the pull refs for each go into their own
+"1234-pull".
+
+Note that we pick a single island for each regex to go into, using "last
+one wins" ordering (which allows repo-specific config to take precedence
+over user-wide config, and so forth).
+
SEE ALSO
--------
linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebc8cefb53..3bd41d8b05 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -24,6 +24,7 @@
#include "streaming.h"
#include "thread-utils.h"
#include "pack-bitmap.h"
+#include "delta-islands.h"
#include "reachable.h"
#include "sha1-array.h"
#include "argv-array.h"
@@ -59,6 +60,7 @@ static struct packing_data to_pack;
static struct pack_idx_entry **written_list;
static uint32_t nr_result, nr_written, nr_seen;
+static uint32_t write_layer;
static int non_empty;
static int reuse_delta = 1, reuse_object = 1;
@@ -93,6 +95,8 @@ static uint16_t write_bitmap_options;
static int exclude_promisor_objects;
+static int use_delta_islands;
+
static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
static unsigned long cache_max_small_delta_size = 1000;
@@ -607,7 +611,7 @@ static inline void add_to_write_order(struct object_entry **wo,
unsigned int *endp,
struct object_entry *e)
{
- if (e->filled)
+ if (e->filled || e->layer != write_layer)
return;
wo[(*endp)++] = e;
e->filled = 1;
@@ -669,6 +673,7 @@ static void add_family_to_write_order(struct object_entry **wo,
static struct object_entry **compute_write_order(void)
{
+ uint32_t max_layers = 1;
unsigned int i, wo_end, last_untagged;
struct object_entry **wo;
@@ -700,51 +705,58 @@ static struct object_entry **compute_write_order(void)
*/
for_each_tag_ref(mark_tagged, NULL);
- /*
- * Give the objects in the original recency order until
- * we see a tagged tip.
- */
+ if (use_delta_islands)
+ max_layers = compute_pack_layers(&to_pack);
+
ALLOC_ARRAY(wo, to_pack.nr_objects);
- for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
- if (objects[i].tagged)
- break;
- add_to_write_order(wo, &wo_end, &objects[i]);
- }
- last_untagged = i;
+ wo_end = 0;
- /*
- * Then fill all the tagged tips.
- */
- for (; i < to_pack.nr_objects; i++) {
- if (objects[i].tagged)
+ for (; write_layer < max_layers; ++write_layer) {
+ /*
+ * Give the objects in the original recency order until
+ * we see a tagged tip.
+ */
+ for (i = 0; i < to_pack.nr_objects; i++) {
+ if (objects[i].tagged)
+ break;
add_to_write_order(wo, &wo_end, &objects[i]);
- }
+ }
+ last_untagged = i;
- /*
- * And then all remaining commits and tags.
- */
- for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (oe_type(&objects[i]) != OBJ_COMMIT &&
- oe_type(&objects[i]) != OBJ_TAG)
- continue;
- add_to_write_order(wo, &wo_end, &objects[i]);
- }
+ /*
+ * Then fill all the tagged tips.
+ */
+ for (; i < to_pack.nr_objects; i++) {
+ if (objects[i].tagged)
+ add_to_write_order(wo, &wo_end, &objects[i]);
+ }
- /*
- * And then all the trees.
- */
- for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (oe_type(&objects[i]) != OBJ_TREE)
- continue;
- add_to_write_order(wo, &wo_end, &objects[i]);
- }
+ /*
+ * And then all remaining commits and tags.
+ */
+ for (i = last_untagged; i < to_pack.nr_objects; i++) {
+ if (oe_type(&objects[i]) != OBJ_COMMIT &&
+ oe_type(&objects[i]) != OBJ_TAG)
+ continue;
+ add_to_write_order(wo, &wo_end, &objects[i]);
+ }
- /*
- * Finally all the rest in really tight order
- */
- for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (!objects[i].filled)
- add_family_to_write_order(wo, &wo_end, &objects[i]);
+ /*
+ * And then all the trees.
+ */
+ for (i = last_untagged; i < to_pack.nr_objects; i++) {
+ if (oe_type(&objects[i]) != OBJ_TREE)
+ continue;
+ add_to_write_order(wo, &wo_end, &objects[i]);
+ }
+
+ /*
+ * Finally all the rest in really tight order
+ */
+ for (i = last_untagged; i < to_pack.nr_objects; i++) {
+ if (!objects[i].filled && objects[i].layer == write_layer)
+ add_family_to_write_order(wo, &wo_end, &objects[i]);
+ }
}
if (wo_end != to_pack.nr_objects)
@@ -1504,7 +1516,8 @@ static void check_object(struct object_entry *entry)
break;
}
- if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) {
+ if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL)) &&
+ in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {
/*
* If base_ref was set above that means we wish to
* reuse delta data, and we even found that base
@@ -1820,6 +1833,11 @@ static int type_size_sort(const void *_a, const void *_b)
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
+ if (use_delta_islands) {
+ int island_cmp = island_delta_cmp(&a->idx.oid, &b->idx.oid);
+ if (island_cmp)
+ return island_cmp;
+ }
if (a_size > b_size)
return -1;
if (a_size < b_size)
@@ -1968,6 +1986,9 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (trg_size < src_size / 32)
return 0;
+ if (!in_same_island(&trg->entry->idx.oid, &src->entry->idx.oid))
+ return 0;
+
/* Load data if not already done */
if (!trg->data) {
read_lock();
@@ -2506,6 +2527,9 @@ static void prepare_pack(int window, int depth)
uint32_t i, nr_deltas;
unsigned n;
+ if (use_delta_islands)
+ resolve_tree_islands(progress, &to_pack);
+
get_object_details();
/*
@@ -2669,6 +2693,9 @@ static void show_commit(struct commit *commit, void *data)
if (write_bitmap_index)
index_commit_for_bitmap(commit);
+
+ if (use_delta_islands)
+ propagate_island_marks(commit);
}
static void show_object(struct object *obj, const char *name, void *data)
@@ -2676,6 +2703,19 @@ static void show_object(struct object *obj, const char *name, void *data)
add_preferred_base_object(name);
add_object_entry(&obj->oid, obj->type, name, 0);
obj->flags |= OBJECT_ADDED;
+
+ if (use_delta_islands) {
+ const char *p;
+ unsigned depth = 0;
+ struct object_entry *ent;
+
+ for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
+ depth++;
+
+ ent = packlist_find(&to_pack, obj->oid.hash, NULL);
+ if (ent && depth > ent->tree_depth)
+ ent->tree_depth = depth;
+ }
}
static void show_object__ma_allow_any(struct object *obj, const char *name, void *data)
@@ -3003,6 +3043,9 @@ static void get_object_list(int ac, const char **av)
if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
return;
+ if (use_delta_islands)
+ load_delta_islands();
+
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(&revs, show_edge);
@@ -3182,6 +3225,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
option_parse_missing_action },
OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
N_("do not pack objects in promisor packfiles")),
+ OPT_BOOL(0, "delta-islands", &use_delta_islands,
+ N_("enable islands for delta compression")),
OPT_END(),
};
@@ -3308,6 +3353,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout || !rev_list_all)
write_bitmap_index = 0;
+ if (use_delta_islands)
+ argv_array_push(&rp, "--topo-order");
+
if (progress && all_progress_implied)
progress = 2;
--
2.18.0.237.gffdb1dbdaa
next prev parent reply other threads:[~2018-07-22 5:49 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-22 5:48 [RFC PATCH 0/5] Add delta islands support Christian Couder
2018-07-22 5:48 ` [RFC PATCH 1/5] packfile: make get_delta_base() non static Christian Couder
2018-07-24 16:19 ` Junio C Hamano
2018-07-27 11:29 ` Jeff King
2018-07-22 5:48 ` [RFC PATCH 2/5] Add delta-islands.{c,h} Christian Couder
2018-07-22 8:50 ` Duy Nguyen
2018-07-22 13:57 ` Christian Couder
2018-08-05 18:53 ` Christian Couder
2018-08-06 14:17 ` Jeff King
2018-08-06 15:53 ` Duy Nguyen
2018-08-06 18:54 ` Christian Couder
2018-08-06 19:21 ` Duy Nguyen
2018-07-24 16:47 ` Junio C Hamano
2018-07-27 13:02 ` Jeff King
2018-07-27 9:40 ` Jeff King
2018-07-22 5:48 ` Christian Couder [this message]
2018-07-22 8:55 ` [RFC PATCH 3/5] pack-objects: add delta-islands support Duy Nguyen
2018-08-05 17:28 ` Christian Couder
2018-07-23 18:52 ` Stefan Beller
2018-07-24 9:58 ` Jeff King
2018-07-24 17:20 ` Stefan Beller
2018-07-27 13:13 ` Jeff King
2018-07-27 17:22 ` Stefan Beller
2018-07-28 9:00 ` Jeff King
2018-07-28 12:12 ` Christian Couder
2018-07-24 17:03 ` Junio C Hamano
2018-07-24 17:11 ` Junio C Hamano
2018-08-05 17:40 ` Christian Couder
2018-08-06 8:44 ` Christian Couder
2018-08-06 13:58 ` Jeff King
2018-07-22 5:48 ` [RFC PATCH 4/5] repack: " Christian Couder
2018-07-22 5:48 ` [RFC PATCH 5/5] t: add t9930-delta-islands.sh Christian Couder
2018-07-24 10:24 ` Jeff King
2018-07-24 10:16 ` [RFC PATCH 0/5] Add delta islands support Jeff King
2018-07-24 17:18 ` Junio C Hamano
2018-07-24 21:14 ` Jeff King
2018-07-26 13:34 ` Johannes Schindelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180722054836.28935-4-chriscool@tuxfamily.org \
--to=christian.couder@gmail.com \
--cc=chriscool@tuxfamily.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).