From: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git mailing list <git@vger.kernel.org>, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH] builtin/history: unuse the commit buffer after use
Date: Tue, 30 Jun 2026 09:15:49 +0530 [thread overview]
Message-ID: <317d0f7b-469f-4456-8808-506e17de264d@gmail.com> (raw)
In-Reply-To: <20260615172946.GD91269@coredump.intra.peff.net>
Hi Peff,
On 15/06/26 22:59, Jeff King wrote:
> On Mon, Jun 15, 2026 at 11:48:10AM +0200, Patrick Steinhardt wrote:
>> Huh, curious. That seems to hint that we're missing test coverage for
>> this specific scenario, as our test suite doesn't detect this leak.
>
> I think it will only leak when the commit object has an "encoding"
> header. See below.
>
I'm quite sure this is not about the commit with the encoding header.
More below.
> The first paragraph is accurate here. We'd generally just get a pointer
> to the buffer cached in the slab, because no re-encoding occurs. And in
> that case you _don't_ need to call unuse_commit_buffer(), because you
> have a read-only copy, and the slab cache will hold it forever[1].
> Calling the unuse function will be a noop.
>
> But when we _do_ re-encode, then you get a new buffer which must be
> freed. And that is when you have to call the unuse function. And the
> reason it is "unuse" and not just "free" is that you don't necessarily
> know which you have, but that function figures it out (and frees it only
> if necessary).
>
> So what the patch is doing is correct, but the explanation is a little
> confused. We see the leak only when re-encoding, so we'd probably want a
> test case that triggers that. Which I assume implies rewriting a commit
> that was previously generated with an encoding header.
>
Thank you very much for these insights! It has been helpful but on
further digging I think this is not about reencoding. On testing and
digging further, the leak appears to be happening when the commit that
is being reworded we get is a freshly allocated buffer from
repo_get_commit_buffer. I'm still trying to figure out how specific
commits get cached in the slab while other commits don't. I'll update
this thread shortly once I get an idea about the same.
Meanwhile, if anyone knows offhand about this, kindly chime in.
> Now back to that [1] note. Even if we didn't re-encode, we'll still hold
> onto that buffer forever. It's not a "leak" in the traditional sense
> because it's still referenced in the commit slab cache. But if you are
> going to walk over a million commits (like git-log does), you probably
> don't want to hold a million commit messages in memory at once.
>
> For that you'd want to call free_commit_buffer() when you know you're
> totally done with it (again, like git-log does after it finishes showing
> the commit). That might be the case here in commit_tree_ext(), or it
> might happen later (I'm not familiar with the git-history code).
>
> But note that you need to do _both_ the unuse and free calls. If we did
> re-encode, the former is needed to free the newly allocated buffer. The
> latter only drops the original buffer in the cache.
>
From my understanding, I think we may not need free_commit_buffer for
the following reasons:
- The leak was only being reported when the commit did not come
from the commit slab.
- We are not going to be reading too many commit objects into memory in
this code path. Hence freeing the commit in the slab isn't strictly
necessary.
Kindly correct me if I missed something, though.
To conclude, I think the change that the patch proposes if fine but the
commit message definitely needs updation.
--
Sivaraam
next prev parent reply other threads:[~2026-06-30 3:45 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-14 14:15 [PATCH] builtin/history: unuse the commit buffer after use Kaartic Sivaraam
2026-06-15 9:48 ` Patrick Steinhardt
2026-06-15 17:29 ` Jeff King
2026-06-16 5:45 ` Patrick Steinhardt
2026-06-30 3:45 ` Kaartic Sivaraam [this message]
2026-06-30 5:26 ` Jeff King
2026-06-30 3:43 ` Kaartic Sivaraam
2026-06-30 5:38 ` Jeff King
2026-06-30 5:50 ` Jeff King
2026-06-30 6:44 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=317d0f7b-469f-4456-8808-506e17de264d@gmail.com \
--to=kaartic.sivaraam@gmail.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox