From: Derrick Stolee <stolee@gmail.com>
To: Arijit Banerjee via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Arijit Banerjee" <arijit91@gmail.com>,
"Arijit Banerjee" <arijit@effectiveailabs.com>
Subject: Re: [PATCH] index-pack: retain child bases in delta cache
Date: Mon, 1 Jun 2026 08:50:44 -0400 [thread overview]
Message-ID: <4882be43-9bc5-48cf-b74c-4a05453b2fef@gmail.com> (raw)
In-Reply-To: <pull.2131.git.1780070763044.gitgitgadget@gmail.com>
On 5/29/2026 12:06 PM, Arijit Banerjee via GitGitGadget wrote:
> From: Arijit Banerjee <arijit@effectiveailabs.com>
>
> When resolving a delta whose result has children of its own,
> index-pack adds the result to work_head, accounts its data in
> base_cache_used, and calls prune_base_data(). It then immediately
> frees that same data.
>
> This bypasses the existing delta base cache policy and can force later
> descendants to reconstruct the queued base again. Let the existing
> delta_base_cache_limit pruning policy decide whether to keep or evict
> the data instead.
>
> Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
> ---
> index-pack: retain child bases in delta cache
>
> Speed up the local pack indexing phase of clone/fetch for large
> delta-compressed packs by keeping reconstructed delta bases available
> for reuse when they are queued for later delta resolution.
>
> When index-pack reconstructs a child base and queues it for resolving
> descendant deltas, it currently frees that data immediately. This can
> force the same base to be reconstructed again. Instead, keep it in the
> existing delta base cache and let the existing delta_base_cache_limit
> policy decide whether to retain or evict it.
>
> This does not add a new cache or increase the cache limit. The object
> data is already accounted in base_cache_used, and prune_base_data() is
> already called at this point.
>
> Correctness:
>
> * t/t5302-pack-index.sh passed all 36 tests.
Is there any chance that you ran this also with SANITIZE=leak to make
sure that we aren't introducing a memory leak? (It's hard to tell just
from the patch context, though your description is convincing.)
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2131%2Farijit91%2Findex-pack-retain-child-base-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2131/arijit91/index-pack-retain-child-base-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2131
Indeed, this PR has a passing linux-leaks build that exercises this
test script. [1]
[1] https://github.com/gitgitgadget/git/actions/runs/26605615549/job/78399938323?pr=2131#step:9:1405
> Benchmarks on a quiet Ubuntu 24.04 VM, 16 vCPU, 32 GiB RAM, local SSD:
>
> pack baseline patched wall-time change RSS change linux blobless 69.17s
> 57.98s 16.2% faster -0.0% linux full 280.72s 236.32s 15.8% faster +1.9%
>
> Five-repeat public-repo medians also improved: git.git 13.1%, libgit2
> 14.0%, redis 13.5%, cpython 4.8%.
>
> Perf on the linux blobless pack showed the same direction under
> profiling: 76.64s baseline vs 61.09s patched, with similar RSS.
A lot of this information that is in your cover letter would be helpful
to include in your commit message, for posterity.
Also, I prefer to see performance numbers for these repos reflected in
results from our performance test suite. We have a test for this purpose,
so you could try running this from t/perf/ for your local copies of these
repos:
GIT_PERF_LARGE_REPO=<path> ./run HEAD~1 HEAD -- p5302-pack-index.sh
And this should result in a standard comparison table that will help
present your results in a way that is familiar to Git contributors.
> @@ -1212,7 +1212,6 @@ static void *threaded_second_pass(void *data)
> list_add(&child->list, &work_head);
> base_cache_used += child->size;
> prune_base_data(NULL);
> - free_base_data(child);
> } else if (child) {
A nice and simple change. Good find!
Thanks,
-Stolee
next prev parent reply other threads:[~2026-06-01 12:50 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-29 16:06 [PATCH] index-pack: retain child bases in delta cache Arijit Banerjee via GitGitGadget
2026-06-01 12:50 ` Derrick Stolee [this message]
2026-06-01 16:13 ` [PATCH v2] " Arijit Banerjee via GitGitGadget
2026-06-02 6:45 ` Jeff King
2026-06-03 0:05 ` [PATCH v3] " Arijit Banerjee via GitGitGadget
2026-06-03 12:24 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4882be43-9bc5-48cf-b74c-4a05453b2fef@gmail.com \
--to=stolee@gmail.com \
--cc=arijit91@gmail.com \
--cc=arijit@effectiveailabs.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox