Git development
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Arijit Banerjee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Arijit Banerjee" <arijit91@gmail.com>,
	"Arijit Banerjee" <arijit@effectiveailabs.com>
Subject: Re: [PATCH] index-pack: retain child bases in delta cache
Date: Mon, 1 Jun 2026 08:50:44 -0400	[thread overview]
Message-ID: <4882be43-9bc5-48cf-b74c-4a05453b2fef@gmail.com> (raw)
In-Reply-To: <pull.2131.git.1780070763044.gitgitgadget@gmail.com>

On 5/29/2026 12:06 PM, Arijit Banerjee via GitGitGadget wrote:
> From: Arijit Banerjee <arijit@effectiveailabs.com>
> 
> When resolving a delta whose result has children of its own,
> index-pack adds the result to work_head, accounts its data in
> base_cache_used, and calls prune_base_data(). It then immediately
> frees that same data.
> 
> This bypasses the existing delta base cache policy and can force later
> descendants to reconstruct the queued base again. Let the existing
> delta_base_cache_limit pruning policy decide whether to keep or evict
> the data instead.
> 
> Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
> ---
>     index-pack: retain child bases in delta cache
>     
>     Speed up the local pack indexing phase of clone/fetch for large
>     delta-compressed packs by keeping reconstructed delta bases available
>     for reuse when they are queued for later delta resolution.
>     
>     When index-pack reconstructs a child base and queues it for resolving
>     descendant deltas, it currently frees that data immediately. This can
>     force the same base to be reconstructed again. Instead, keep it in the
>     existing delta base cache and let the existing delta_base_cache_limit
>     policy decide whether to retain or evict it.
>     
>     This does not add a new cache or increase the cache limit. The object
>     data is already accounted in base_cache_used, and prune_base_data() is
>     already called at this point.
>     
>     Correctness:
>     
>      * t/t5302-pack-index.sh passed all 36 tests.

Is there any chance that you ran this also with SANITIZE=leak to make
sure that we aren't introducing a memory leak? (It's hard to tell just
from the patch context, though your description is convincing.)

> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2131%2Farijit91%2Findex-pack-retain-child-base-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2131/arijit91/index-pack-retain-child-base-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2131

Indeed, this PR has a passing linux-leaks build that exercises this
test script. [1]

[1] https://github.com/gitgitgadget/git/actions/runs/26605615549/job/78399938323?pr=2131#step:9:1405

>     Benchmarks on a quiet Ubuntu 24.04 VM, 16 vCPU, 32 GiB RAM, local SSD:
>     
>     pack baseline patched wall-time change RSS change linux blobless 69.17s
>     57.98s 16.2% faster -0.0% linux full 280.72s 236.32s 15.8% faster +1.9%
>     
>     Five-repeat public-repo medians also improved: git.git 13.1%, libgit2
>     14.0%, redis 13.5%, cpython 4.8%.
>     
>     Perf on the linux blobless pack showed the same direction under
>     profiling: 76.64s baseline vs 61.09s patched, with similar RSS.
A lot of this information that is in your cover letter would be helpful
to include in your commit message, for posterity.

Also, I prefer to see performance numbers for these repos reflected in
results from our performance test suite. We have a test for this purpose,
so you could try running this from t/perf/ for your local copies of these
repos:

  GIT_PERF_LARGE_REPO=<path> ./run HEAD~1 HEAD -- p5302-pack-index.sh

And this should result in a standard comparison table that will help
present your results in a way that is familiar to Git contributors.

> @@ -1212,7 +1212,6 @@ static void *threaded_second_pass(void *data)
>  			list_add(&child->list, &work_head);
>  			base_cache_used += child->size;
>  			prune_base_data(NULL);
> -			free_base_data(child);
>  		} else if (child) {
A nice and simple change. Good find!

Thanks,
-Stolee


  reply	other threads:[~2026-06-01 12:50 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-29 16:06 [PATCH] index-pack: retain child bases in delta cache Arijit Banerjee via GitGitGadget
2026-06-01 12:50 ` Derrick Stolee [this message]
2026-06-01 16:13 ` [PATCH v2] " Arijit Banerjee via GitGitGadget
2026-06-02  6:45   ` Jeff King
2026-06-03  0:05   ` [PATCH v3] " Arijit Banerjee via GitGitGadget
2026-06-03 12:24     ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4882be43-9bc5-48cf-b74c-4a05453b2fef@gmail.com \
    --to=stolee@gmail.com \
    --cc=arijit91@gmail.com \
    --cc=arijit@effectiveailabs.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox