From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>,
Git mailing list <git@vger.kernel.org>
Subject: Re: [PATCH] builtin/history: unuse the commit buffer after use
Date: Tue, 16 Jun 2026 07:45:43 +0200 [thread overview]
Message-ID: <ajDjB67HvbdgjYSa@pks.im> (raw)
In-Reply-To: <20260615172946.GD91269@coredump.intra.peff.net>
On Mon, Jun 15, 2026 at 01:29:46PM -0400, Jeff King wrote:
> On Mon, Jun 15, 2026 at 11:48:10AM +0200, Patrick Steinhardt wrote:
>
> > On Sun, Jun 14, 2026 at 02:15:40PM +0000, Kaartic Sivaraam wrote:
> > > While running `git history reword` using a Git built with `SANITIZE` flag set
> > > to `address,leak`, we could observe the following leak being reported:
> >
> > Huh, curious. That seems to hint that we're missing test coverage for
> > this specific scenario, as our test suite doesn't detect this leak.
>
> I think it will only leak when the commit object has an "encoding"
> header. See below.
>
> > > As part of rewording a commit in `git history`, we get the commit message
> > > buffer in the `commit_tree_ext` function. This in turn obtains the buffer
> > > from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
> > > function with the last two parameters as NULL, we are clearly not expecting
> > > a reencode to happen. In this case, the buffer that we receive from
> > > `repo_logmsg_reencode` ends up always being obtained from a call to
> > > `repo_get_commit_buffer`.
> > >
> > > This buffer is expected to be released with an accompanying call to
> > > `repo_unuse_commit_buffer` which takes care of freeing it. This call
> > > is missing in the `commit_tree_ext` flow thus resulting in the leak.
> >
> > So this doesn't really read specific at all, and I would have expected
> > us to hit this leak. Puzzling.
>
> The first paragraph is accurate here. We'd generally just get a pointer
> to the buffer cached in the slab, because no re-encoding occurs. And in
> that case you _don't_ need to call unuse_commit_buffer(), because you
> have a read-only copy, and the slab cache will hold it forever[1].
> Calling the unuse function will be a noop.
>
> But when we _do_ re-encode, then you get a new buffer which must be
> freed. And that is when you have to call the unuse function. And the
> reason it is "unuse" and not just "free" is that you don't necessarily
> know which you have, but that function figures it out (and frees it only
> if necessary).
>
> So what the patch is doing is correct, but the explanation is a little
> confused. We see the leak only when re-encoding, so we'd probably want a
> test case that triggers that. Which I assume implies rewriting a commit
> that was previously generated with an encoding header.
>
> Now back to that [1] note. Even if we didn't re-encode, we'll still hold
> onto that buffer forever. It's not a "leak" in the traditional sense
> because it's still referenced in the commit slab cache. But if you are
> going to walk over a million commits (like git-log does), you probably
> don't want to hold a million commit messages in memory at once.
>
> For that you'd want to call free_commit_buffer() when you know you're
> totally done with it (again, like git-log does after it finishes showing
> the commit). That might be the case here in commit_tree_ext(), or it
> might happen later (I'm not familiar with the git-history code).
>
> But note that you need to do _both_ the unuse and free calls. If we did
> re-encode, the former is needed to free the newly allocated buffer. The
> latter only drops the original buffer in the cache.
Thanks for the explanation!
Patrick
prev parent reply other threads:[~2026-06-16 5:45 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-14 14:15 [PATCH] builtin/history: unuse the commit buffer after use Kaartic Sivaraam
2026-06-15 9:48 ` Patrick Steinhardt
2026-06-15 17:29 ` Jeff King
2026-06-16 5:45 ` Patrick Steinhardt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajDjB67HvbdgjYSa@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=kaartic.sivaraam@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox