All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>,
	Git mailing list <git@vger.kernel.org>
Subject: Re: [PATCH] builtin/history: unuse the commit buffer after use
Date: Tue, 16 Jun 2026 07:45:43 +0200	[thread overview]
Message-ID: <ajDjB67HvbdgjYSa@pks.im> (raw)
In-Reply-To: <20260615172946.GD91269@coredump.intra.peff.net>

On Mon, Jun 15, 2026 at 01:29:46PM -0400, Jeff King wrote:
> On Mon, Jun 15, 2026 at 11:48:10AM +0200, Patrick Steinhardt wrote:
> 
> > On Sun, Jun 14, 2026 at 02:15:40PM +0000, Kaartic Sivaraam wrote:
> > > While running `git history reword` using a Git built with `SANITIZE` flag set
> > > to `address,leak`, we could observe the following leak being reported:
> > 
> > Huh, curious. That seems to hint that we're missing test coverage for
> > this specific scenario, as our test suite doesn't detect this leak.
> 
> I think it will only leak when the commit object has an "encoding"
> header. See below.
> 
> > > As part of rewording a commit in `git history`, we get the commit message
> > > buffer in the `commit_tree_ext` function. This in turn obtains the buffer
> > > from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
> > > function with the last two parameters as NULL, we are clearly not expecting
> > > a reencode to happen. In this case, the buffer that we receive from
> > > `repo_logmsg_reencode` ends up always being obtained from a call to
> > > `repo_get_commit_buffer`.
> > > 
> > > This buffer is expected to be released with an accompanying call to
> > > `repo_unuse_commit_buffer` which takes care of freeing it. This call
> > > is missing in the `commit_tree_ext` flow thus resulting in the leak.
> > 
> > So this doesn't really read specific at all, and I would have expected
> > us to hit this leak. Puzzling.
> 
> The first paragraph is accurate here. We'd generally just get a pointer
> to the buffer cached in the slab, because no re-encoding occurs. And in
> that case you _don't_ need to call unuse_commit_buffer(), because you
> have a read-only copy, and the slab cache will hold it forever[1].
> Calling the unuse function will be a noop.
> 
> But when we _do_ re-encode, then you get a new buffer which must be
> freed. And that is when you have to call the unuse function. And the
> reason it is "unuse" and not just "free" is that you don't necessarily
> know which you have, but that function figures it out (and frees it only
> if necessary).
> 
> So what the patch is doing is correct, but the explanation is a little
> confused. We see the leak only when re-encoding, so we'd probably want a
> test case that triggers that. Which I assume implies rewriting a commit
> that was previously generated with an encoding header.
> 
> Now back to that [1] note. Even if we didn't re-encode, we'll still hold
> onto that buffer forever. It's not a "leak" in the traditional sense
> because it's still referenced in the commit slab cache. But if you are
> going to walk over a million commits (like git-log does), you probably
> don't want to hold a million commit messages in memory at once.
> 
> For that you'd want to call free_commit_buffer() when you know you're
> totally done with it (again, like git-log does after it finishes showing
> the commit). That might be the case here in commit_tree_ext(), or it
> might happen later (I'm not familiar with the git-history code).
> 
> But note that you need to do _both_ the unuse and free calls. If we did
> re-encode, the former is needed to free the newly allocated buffer. The
> latter only drops the original buffer in the cache.

Thanks for the explanation!

Patrick

      reply	other threads:[~2026-06-16  5:45 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-14 14:15 [PATCH] builtin/history: unuse the commit buffer after use Kaartic Sivaraam
2026-06-15  9:48 ` Patrick Steinhardt
2026-06-15 17:29   ` Jeff King
2026-06-16  5:45     ` Patrick Steinhardt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajDjB67HvbdgjYSa@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=kaartic.sivaraam@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.