Git development
 help / color / mirror / Atom feed
From: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git mailing list <git@vger.kernel.org>, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH] builtin/history: unuse the commit buffer after use
Date: Tue, 30 Jun 2026 09:15:49 +0530	[thread overview]
Message-ID: <317d0f7b-469f-4456-8808-506e17de264d@gmail.com> (raw)
In-Reply-To: <20260615172946.GD91269@coredump.intra.peff.net>

Hi Peff,

On 15/06/26 22:59, Jeff King wrote:
> On Mon, Jun 15, 2026 at 11:48:10AM +0200, Patrick Steinhardt wrote:
>> Huh, curious. That seems to hint that we're missing test coverage for
>> this specific scenario, as our test suite doesn't detect this leak.
> 
> I think it will only leak when the commit object has an "encoding"
> header. See below.
> 

I'm quite sure this is not about the commit with the encoding header. 
More below.

> The first paragraph is accurate here. We'd generally just get a pointer
> to the buffer cached in the slab, because no re-encoding occurs. And in
> that case you _don't_ need to call unuse_commit_buffer(), because you
> have a read-only copy, and the slab cache will hold it forever[1].
> Calling the unuse function will be a noop.
> 
> But when we _do_ re-encode, then you get a new buffer which must be
> freed. And that is when you have to call the unuse function. And the
> reason it is "unuse" and not just "free" is that you don't necessarily
> know which you have, but that function figures it out (and frees it only
> if necessary).
> 
> So what the patch is doing is correct, but the explanation is a little
> confused. We see the leak only when re-encoding, so we'd probably want a
> test case that triggers that. Which I assume implies rewriting a commit
> that was previously generated with an encoding header.
> 

Thank you very much for these insights! It has been helpful but on 
further digging I think this is not about reencoding. On testing and 
digging further, the leak appears to be happening when the commit that 
is being reworded we get is a freshly allocated buffer from 
repo_get_commit_buffer. I'm still trying to figure out how specific 
commits get cached in the slab while other commits don't. I'll update 
this thread shortly once I get an idea about the same.

Meanwhile, if anyone knows offhand about this, kindly chime in.

> Now back to that [1] note. Even if we didn't re-encode, we'll still hold
> onto that buffer forever. It's not a "leak" in the traditional sense
> because it's still referenced in the commit slab cache. But if you are
> going to walk over a million commits (like git-log does), you probably
> don't want to hold a million commit messages in memory at once.
> 
> For that you'd want to call free_commit_buffer() when you know you're
> totally done with it (again, like git-log does after it finishes showing
> the commit). That might be the case here in commit_tree_ext(), or it
> might happen later (I'm not familiar with the git-history code).
>
> But note that you need to do _both_ the unuse and free calls. If we did
> re-encode, the former is needed to free the newly allocated buffer. The
> latter only drops the original buffer in the cache.
>

 From my understanding, I think we may not need free_commit_buffer for 
the following reasons:

- The leak was only being reported when the commit did not come
   from the commit slab.
- We are not going to be reading too many commit objects into memory in
   this code path. Hence freeing the commit in the slab isn't strictly
   necessary.

Kindly correct me if I missed something, though.

To conclude, I think the change that the patch proposes if fine but the 
commit message definitely needs updation.

--
Sivaraam


  parent reply	other threads:[~2026-06-30  3:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-14 14:15 [PATCH] builtin/history: unuse the commit buffer after use Kaartic Sivaraam
2026-06-15  9:48 ` Patrick Steinhardt
2026-06-15 17:29   ` Jeff King
2026-06-16  5:45     ` Patrick Steinhardt
2026-06-30  3:45     ` Kaartic Sivaraam [this message]
2026-06-30  5:26       ` Jeff King
2026-06-30  3:43   ` Kaartic Sivaraam
2026-06-30  5:38     ` Jeff King
2026-06-30  5:50       ` Jeff King
2026-06-30  6:44         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=317d0f7b-469f-4456-8808-506e17de264d@gmail.com \
    --to=kaartic.sivaraam@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox