From: "Arijit Banerjee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Derrick Stolee" <stolee@gmail.com>, "Jeff King" <peff@peff.net>,
"Arijit Banerjee" <arijit91@gmail.com>,
"Arijit Banerjee" <arijit@effectiveailabs.com>
Subject: [PATCH v3] index-pack: retain child bases in delta cache
Date: Wed, 03 Jun 2026 00:05:17 +0000 [thread overview]
Message-ID: <pull.2131.v3.git.1780445118653.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2131.v2.git.1780330402264.gitgitgadget@gmail.com>
From: Arijit Banerjee <arijit@effectiveailabs.com>
When resolving a delta whose result has children of its own,
index-pack adds the result to work_head, accounts its data in
base_cache_used, and calls prune_base_data(). It then immediately frees
that same data.
This bypasses the existing delta base cache policy and can force later
descendants to reconstruct the queued base again. Let the existing
delta_base_cache_limit pruning policy decide whether to keep or evict
the data instead.
This does not add a new cache or increase the cache limit. The object
data is already accounted in base_cache_used before prune_base_data()
runs. Once all direct children of a base have been dispatched, and no
thread is actively retaining that base for patch_delta(), release the
cached bytes. The base_data struct itself remains alive until the
existing children_remaining bookkeeping says the whole subtree is done.
On a quiet Ubuntu 24.04 VM with 16 vCPUs, 32 GiB RAM, and local SSD,
standard p5302-pack-index.sh runs improved as follows:
libgit2: 3.17(11.49+0.60) -> 2.69(10.52+0.28), 15.1% faster
redis: 5.84(15.56+0.63) -> 4.95(14.05+0.32), 15.2% faster
git.git: 11.17(38.04+1.29) -> 9.67(35.29+0.60), 13.4% faster
cpython: 32.69(117.85+4.37) -> 28.60(109.25+1.91), 12.5% faster
linux: 279.22(797.69+40.86) -> 236.34(723.13+19.02), 15.4% faster
The linux p5302 number is from a single repeat; the others are from the
default three repeats.
End-to-end local full-clone spot checks also improved:
libgit2: 5.00s -> 4.54s, 9.2% faster
redis: 8.75s -> 7.92s, 9.5% faster
git.git: 25.04s -> 23.71s, 5.3% faster
cpython: 56.72s -> 55.94s, 1.4% faster
linux: 556.17s -> 523.83s, 5.8% faster
t/t5302-pack-index.sh passed, and GitGitGadget's linux-leaks CI also
exercised that test under SANITIZE=leak.
Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
---
index-pack: retain child bases in delta cache
Speed up the local index-pack phase used by clone/fetch for large
delta-compressed packs.
When index-pack reconstructs a child base and queues it for resolving
descendant deltas, it currently frees that data immediately. This can
force the same base to be reconstructed again. This patch keeps the data
in the existing delta base cache instead of immediately freeing it.
This does not add a new cache or increase the cache limit. The object
data is already accounted in base_cache_used, and prune_base_data() is
already called at this point.
To keep the retained data lifetime precise, the patch also releases
cached bytes once all direct children of a base have been dispatched and
no thread is actively retaining that base for patch_delta(). The
base_data struct itself still stays alive until the existing
children_remaining bookkeeping says the whole subtree is done.
Changes since v2:
* Addressed Jeff King's review question by releasing cached base data
after all direct children have been dispatched, while keeping the
existing subtree bookkeeping intact.
* Re-ran t/t5302-pack-index.sh, p5302-pack-index.sh, and end-to-end
full clone spot checks with the precise-release version.
Changes since v1:
* Added benchmark results and leak-safety context to the commit
message.
* Included standard p5302-pack-index.sh perf-suite results.
Correctness:
* t/t5302-pack-index.sh passed.
* GitGitGadget's linux-leaks CI exercises t5302-pack-index.sh under
SANITIZE=leak.
Benchmarks on a quiet Ubuntu 24.04 VM, 16 vCPU, 32 GiB RAM, local SSD:
Standard p5302-pack-index.sh perf-suite results using
GIT_PERF_LARGE_REPO=<repo> ./run HEAD~1 HEAD -- p5302-pack-index.sh:
repo HEAD~1 HEAD wall-time change libgit2 3.17(11.49+0.60)
2.69(10.52+0.28) 15.1% faster redis 5.84(15.56+0.63) 4.95(14.05+0.32)
15.2% faster git.git 11.17(38.04+1.29) 9.67(35.29+0.60) 13.4% faster
cpython 32.69(117.85+4.37) 28.60(109.25+1.91) 12.5% faster linux
279.22(797.69+40.86) 236.34(723.13+19.02) 15.4% faster
The linux p5302 row is from a single repeat; the others use the default
three repeats. These timings isolate the index-pack phase affected by
this patch.
End-to-end local full-clone spot checks also showed wins, though these
timings include both server-side pack-objects and client-side index-pack
running concurrently over a local file:// transport, so they are not
isolated index-pack timings.
These runs used git clone --bare --no-local, dropped page cache before
each clone, and used the matching build's bin-wrappers/git as the client
plus the matching bin-wrappers/git-upload-pack via --upload-pack.
repo baseline patched wall-time change libgit2 5.00s 4.54s 9.2% faster
redis 8.75s 7.92s 9.5% faster git.git 25.04s 23.71s 5.3% faster cpython
56.72s 55.94s 1.4% faster linux 556.17s 523.83s 5.8% faster
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2131%2Farijit91%2Findex-pack-retain-child-base-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2131/arijit91/index-pack-retain-child-base-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/2131
Range-diff vs v2:
1: 42eca38f51 ! 1: 51967f9edf index-pack: retain child bases in delta cache
@@ Commit message
This does not add a new cache or increase the cache limit. The object
data is already accounted in base_cache_used before prune_base_data()
- runs, and the existing pruning and base cleanup paths still release it.
+ runs. Once all direct children of a base have been dispatched, and no
+ thread is actively retaining that base for patch_delta(), release the
+ cached bytes. The base_data struct itself remains alive until the
+ existing children_remaining bookkeeping says the whole subtree is done.
On a quiet Ubuntu 24.04 VM with 16 vCPUs, 32 GiB RAM, and local SSD,
- direct index-pack timings on single-pack Linux fixtures improved as
- follows:
+ standard p5302-pack-index.sh runs improved as follows:
- linux blobless: 69.17s -> 57.98s (16.2% faster), RSS flat
- linux full: 280.72s -> 236.32s (15.8% faster), RSS +1.9%
+ libgit2: 3.17(11.49+0.60) -> 2.69(10.52+0.28), 15.1% faster
+ redis: 5.84(15.56+0.63) -> 4.95(14.05+0.32), 15.2% faster
+ git.git: 11.17(38.04+1.29) -> 9.67(35.29+0.60), 13.4% faster
+ cpython: 32.69(117.85+4.37) -> 28.60(109.25+1.91), 12.5% faster
+ linux: 279.22(797.69+40.86) -> 236.34(723.13+19.02), 15.4% faster
- Five-repeat medians on public repositories also improved:
+ The linux p5302 number is from a single repeat; the others are from the
+ default three repeats.
- git.git: 12.31s -> 10.70s (13.1% faster)
- libgit2: 3.35s -> 2.88s (14.0% faster)
- redis: 6.52s -> 5.64s (13.5% faster)
- cpython: 33.02s -> 31.44s (4.8% faster)
+ End-to-end local full-clone spot checks also improved:
- The standard p5302 perf test on a smaller git.git fixture was neutral:
-
- 5302.9 index-pack default threads:
- 11.21(38.07+1.33) -> 11.16(37.90+1.31), -0.4%
+ libgit2: 5.00s -> 4.54s, 9.2% faster
+ redis: 8.75s -> 7.92s, 9.5% faster
+ git.git: 25.04s -> 23.71s, 5.3% faster
+ cpython: 56.72s -> 55.94s, 1.4% faster
+ linux: 556.17s -> 523.83s, 5.8% faster
t/t5302-pack-index.sh passed, and GitGitGadget's linux-leaks CI also
exercised that test under SANITIZE=leak.
@@ Commit message
Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
## builtin/index-pack.c ##
+@@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
+ }
+ }
+
++static int base_data_has_remaining_direct_children(struct base_data *c)
++{
++ return c->ref_first <= c->ref_last ||
++ c->ofs_first <= c->ofs_last;
++}
++
+ static void prune_base_data(struct base_data *retain)
+ {
+ struct list_head *pos;
+@@ builtin/index-pack.c: static void *threaded_second_pass(void *data)
+ }
+
+ work_lock();
+- if (parent)
++ if (parent) {
+ parent->retain_data--;
++ if (!parent->retain_data &&
++ !base_data_has_remaining_direct_children(parent))
++ free_base_data(parent);
++ }
+
+ if (child && child->data) {
+ /*
@@ builtin/index-pack.c: static void *threaded_second_pass(void *data)
list_add(&child->list, &work_head);
base_cache_used += child->size;
builtin/index-pack.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cf0bd8280d..00b4dff419 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -433,6 +433,12 @@ static void free_base_data(struct base_data *c)
}
}
+static int base_data_has_remaining_direct_children(struct base_data *c)
+{
+ return c->ref_first <= c->ref_last ||
+ c->ofs_first <= c->ofs_last;
+}
+
static void prune_base_data(struct base_data *retain)
{
struct list_head *pos;
@@ -1201,8 +1207,12 @@ static void *threaded_second_pass(void *data)
}
work_lock();
- if (parent)
+ if (parent) {
parent->retain_data--;
+ if (!parent->retain_data &&
+ !base_data_has_remaining_direct_children(parent))
+ free_base_data(parent);
+ }
if (child && child->data) {
/*
@@ -1212,7 +1222,6 @@ static void *threaded_second_pass(void *data)
list_add(&child->list, &work_head);
base_cache_used += child->size;
prune_base_data(NULL);
- free_base_data(child);
} else if (child) {
/*
* This child does not have its own children. It may be
base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
--
gitgitgadget
next prev parent reply other threads:[~2026-06-03 0:05 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-29 16:06 [PATCH] index-pack: retain child bases in delta cache Arijit Banerjee via GitGitGadget
2026-06-01 12:50 ` Derrick Stolee
2026-06-01 16:13 ` [PATCH v2] " Arijit Banerjee via GitGitGadget
2026-06-02 6:45 ` Jeff King
2026-06-03 0:05 ` Arijit Banerjee via GitGitGadget [this message]
2026-06-03 12:24 ` [PATCH v3] " Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.2131.v3.git.1780445118653.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=arijit91@gmail.com \
--cc=arijit@effectiveailabs.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox