Git development
 help / color / mirror / Atom feed
* [PATCH] builtin/history: unuse the commit buffer after use
@ 2026-06-14 14:15 Kaartic Sivaraam
  2026-06-15  9:48 ` Patrick Steinhardt
  0 siblings, 1 reply; 4+ messages in thread
From: Kaartic Sivaraam @ 2026-06-14 14:15 UTC (permalink / raw)
  To: Git mailing list; +Cc: Patrick Steinhardt

While running `git history reword` using a Git built with `SANITIZE` flag set
to `address,leak`, we could observe the following leak being reported:

-- 8< --

=================================================================
==7156==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1813 byte(s) in 1 object(s) allocated from:
    #0 0x79a3d1d2b60f in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x612e2bb8c2e9 in do_xmalloc /path/to/git/wrapper.c:55
    #2 0x612e2bb8c3f7 in do_xmallocz /path/to/git/wrapper.c:89
    #3 0x612e2bb8c48f in xmallocz_gently /path/to/git/wrapper.c:102
    #4 0x612e2b8dc28e in unpack_compressed_entry /path/to/git/packfile.c:1744
    #5 0x612e2b8dd12a in unpack_entry /path/to/git/packfile.c:1897
    #6 0x612e2b8daae2 in cache_or_unpack_entry /path/to/git/packfile.c:1535
    #7 0x612e2b8db1f6 in packed_object_info_with_index_pos /path/to/git/packfile.c:1617
    #8 0x612e2b8dc1c4 in packed_object_info /path/to/git/packfile.c:1732
    #9 0x612e2b8def05 in packfile_store_read_object_info /path/to/git/packfile.c:2228
    #10 0x612e2b889cb0 in odb_source_files_read_object_info odb/source-files.c:58
    #11 0x612e2b8805e9 in odb_source_read_object_info odb/source.h:326
    #12 0x612e2b885cf0 in do_oid_object_info_extended /path/to/git/odb.c:572
    #13 0x612e2b886fe4 in odb_read_object_info_extended /path/to/git/odb.c:710
    #14 0x612e2b887584 in odb_read_object /path/to/git/odb.c:756
    #15 0x612e2b68d6e4 in repo_get_commit_buffer /path/to/git/commit.c:399
    #16 0x612e2b91a7d4 in repo_logmsg_reencode /path/to/git/pretty.c:716
    #17 0x612e2b3de7da in commit_tree_ext builtin/history.c:127
    #18 0x612e2b3dee9f in commit_tree_with_edited_message builtin/history.c:183
    #19 0x612e2b3e2c4d in cmd_history_reword builtin/history.c:717
    #20 0x612e2b3e53b6 in cmd_history builtin/history.c:998
    #21 0x612e2b27ae97 in run_builtin /path/to/git/git.c:506
    #22 0x612e2b27b9ae in handle_builtin /path/to/git/git.c:782
    #23 0x612e2b27c240 in run_argv /path/to/git/git.c:865
    #24 0x612e2b27cd94 in cmd_main /path/to/git/git.c:986
    #25 0x612e2b5c4267 in main /path/to/git/common-main.c:9
    #26 0x79a3d182a600 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:59
    #27 0x79a3d182a717 in __libc_start_main_impl ../csu/libc-start.c:360
    #28 0x612e2b276124 in _start (/path/to.local/bin/git+0x211124) (BuildId: 8da3d640a944e21b895fc4802d7942b1505be663)

SUMMARY: AddressSanitizer: 1813 byte(s) leaked in 1 allocation(s).

-- >8 --

A deeper investigation on this reveals the following as the root cause.

As part of rewording a commit in `git history`, we get the commit message
buffer in the `commit_tree_ext` function. This in turn obtains the buffer
from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
function with the last two parameters as NULL, we are clearly not expecting
a reencode to happen. In this case, the buffer that we receive from
`repo_logmsg_reencode` ends up always being obtained from a call to
`repo_get_commit_buffer`.

This buffer is expected to be released with an accompanying call to
`repo_unuse_commit_buffer` which takes care of freeing it. This call
is missing in the `commit_tree_ext` flow thus resulting in the leak.

Fix this by ensuring we call `repo_unuse_commit_buffer` on the
original_message buffer.

Signed-off-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
---
I must mention that I also noticed the following comment in `commit_tree_ext`:

»       /* We retain authorship of the original commit. */
»       original_message = repo_logmsg_reencode(repo, commit_with_message, NULL, NULL);

... but I'm not quite sure why we don't unuse the buffer after its purpose is
done. Kindly englighten me in case I missed something.


 builtin/history.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/history.c b/builtin/history.c
index 091465a59e..0e9259b5d7 100644
--- a/builtin/history.c
+++ b/builtin/history.c
@@ -154,6 +154,7 @@ static int commit_tree_ext(struct repository *repo,
 	free_commit_extra_headers(original_extra_headers);
 	strbuf_release(&commit_message);
 	free(original_author);
+	repo_unuse_commit_buffer(repo, commit_with_message, original_message);
 	return ret;
 }
 
-- 
2.55.0.rc0.738.g0c8ab3ebcc.dirty


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] builtin/history: unuse the commit buffer after use
  2026-06-14 14:15 [PATCH] builtin/history: unuse the commit buffer after use Kaartic Sivaraam
@ 2026-06-15  9:48 ` Patrick Steinhardt
  2026-06-15 17:29   ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: Patrick Steinhardt @ 2026-06-15  9:48 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Git mailing list

On Sun, Jun 14, 2026 at 02:15:40PM +0000, Kaartic Sivaraam wrote:
> While running `git history reword` using a Git built with `SANITIZE` flag set
> to `address,leak`, we could observe the following leak being reported:

Huh, curious. That seems to hint that we're missing test coverage for
this specific scenario, as our test suite doesn't detect this leak.

[snip]
> A deeper investigation on this reveals the following as the root cause.
> 
> As part of rewording a commit in `git history`, we get the commit message
> buffer in the `commit_tree_ext` function. This in turn obtains the buffer
> from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
> function with the last two parameters as NULL, we are clearly not expecting
> a reencode to happen. In this case, the buffer that we receive from
> `repo_logmsg_reencode` ends up always being obtained from a call to
> `repo_get_commit_buffer`.
> 
> This buffer is expected to be released with an accompanying call to
> `repo_unuse_commit_buffer` which takes care of freeing it. This call
> is missing in the `commit_tree_ext` flow thus resulting in the leak.

So this doesn't really read specific at all, and I would have expected
us to hit this leak. Puzzling.

> Fix this by ensuring we call `repo_unuse_commit_buffer` on the
> original_message buffer.
> 
> Signed-off-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> ---
> I must mention that I also noticed the following comment in `commit_tree_ext`:
> 
> »       /* We retain authorship of the original commit. */
> »       original_message = repo_logmsg_reencode(repo, commit_with_message, NULL, NULL);
> 
> ... but I'm not quite sure why we don't unuse the buffer after its purpose is
> done. Kindly englighten me in case I missed something.

Did you maybe confuse "authorship" with "ownership" while reading the
comment? The comment only mentions that we retain the original "Author"
commit metadata, it doesn't refer to ownership of the underlying
objects.

> diff --git a/builtin/history.c b/builtin/history.c
> index 091465a59e..0e9259b5d7 100644
> --- a/builtin/history.c
> +++ b/builtin/history.c
> @@ -154,6 +154,7 @@ static int commit_tree_ext(struct repository *repo,
>  	free_commit_extra_headers(original_extra_headers);
>  	strbuf_release(&commit_message);
>  	free(original_author);
> +	repo_unuse_commit_buffer(repo, commit_with_message, original_message);
>  	return ret;
>  }

Yup, this makes sense to me.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] builtin/history: unuse the commit buffer after use
  2026-06-15  9:48 ` Patrick Steinhardt
@ 2026-06-15 17:29   ` Jeff King
  2026-06-16  5:45     ` Patrick Steinhardt
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2026-06-15 17:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Kaartic Sivaraam, Git mailing list

On Mon, Jun 15, 2026 at 11:48:10AM +0200, Patrick Steinhardt wrote:

> On Sun, Jun 14, 2026 at 02:15:40PM +0000, Kaartic Sivaraam wrote:
> > While running `git history reword` using a Git built with `SANITIZE` flag set
> > to `address,leak`, we could observe the following leak being reported:
> 
> Huh, curious. That seems to hint that we're missing test coverage for
> this specific scenario, as our test suite doesn't detect this leak.

I think it will only leak when the commit object has an "encoding"
header. See below.

> > As part of rewording a commit in `git history`, we get the commit message
> > buffer in the `commit_tree_ext` function. This in turn obtains the buffer
> > from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
> > function with the last two parameters as NULL, we are clearly not expecting
> > a reencode to happen. In this case, the buffer that we receive from
> > `repo_logmsg_reencode` ends up always being obtained from a call to
> > `repo_get_commit_buffer`.
> > 
> > This buffer is expected to be released with an accompanying call to
> > `repo_unuse_commit_buffer` which takes care of freeing it. This call
> > is missing in the `commit_tree_ext` flow thus resulting in the leak.
> 
> So this doesn't really read specific at all, and I would have expected
> us to hit this leak. Puzzling.

The first paragraph is accurate here. We'd generally just get a pointer
to the buffer cached in the slab, because no re-encoding occurs. And in
that case you _don't_ need to call unuse_commit_buffer(), because you
have a read-only copy, and the slab cache will hold it forever[1].
Calling the unuse function will be a noop.

But when we _do_ re-encode, then you get a new buffer which must be
freed. And that is when you have to call the unuse function. And the
reason it is "unuse" and not just "free" is that you don't necessarily
know which you have, but that function figures it out (and frees it only
if necessary).

So what the patch is doing is correct, but the explanation is a little
confused. We see the leak only when re-encoding, so we'd probably want a
test case that triggers that. Which I assume implies rewriting a commit
that was previously generated with an encoding header.

Now back to that [1] note. Even if we didn't re-encode, we'll still hold
onto that buffer forever. It's not a "leak" in the traditional sense
because it's still referenced in the commit slab cache. But if you are
going to walk over a million commits (like git-log does), you probably
don't want to hold a million commit messages in memory at once.

For that you'd want to call free_commit_buffer() when you know you're
totally done with it (again, like git-log does after it finishes showing
the commit). That might be the case here in commit_tree_ext(), or it
might happen later (I'm not familiar with the git-history code).

But note that you need to do _both_ the unuse and free calls. If we did
re-encode, the former is needed to free the newly allocated buffer. The
latter only drops the original buffer in the cache.

-Peff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] builtin/history: unuse the commit buffer after use
  2026-06-15 17:29   ` Jeff King
@ 2026-06-16  5:45     ` Patrick Steinhardt
  0 siblings, 0 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-06-16  5:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Kaartic Sivaraam, Git mailing list

On Mon, Jun 15, 2026 at 01:29:46PM -0400, Jeff King wrote:
> On Mon, Jun 15, 2026 at 11:48:10AM +0200, Patrick Steinhardt wrote:
> 
> > On Sun, Jun 14, 2026 at 02:15:40PM +0000, Kaartic Sivaraam wrote:
> > > While running `git history reword` using a Git built with `SANITIZE` flag set
> > > to `address,leak`, we could observe the following leak being reported:
> > 
> > Huh, curious. That seems to hint that we're missing test coverage for
> > this specific scenario, as our test suite doesn't detect this leak.
> 
> I think it will only leak when the commit object has an "encoding"
> header. See below.
> 
> > > As part of rewording a commit in `git history`, we get the commit message
> > > buffer in the `commit_tree_ext` function. This in turn obtains the buffer
> > > from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
> > > function with the last two parameters as NULL, we are clearly not expecting
> > > a reencode to happen. In this case, the buffer that we receive from
> > > `repo_logmsg_reencode` ends up always being obtained from a call to
> > > `repo_get_commit_buffer`.
> > > 
> > > This buffer is expected to be released with an accompanying call to
> > > `repo_unuse_commit_buffer` which takes care of freeing it. This call
> > > is missing in the `commit_tree_ext` flow thus resulting in the leak.
> > 
> > So this doesn't really read specific at all, and I would have expected
> > us to hit this leak. Puzzling.
> 
> The first paragraph is accurate here. We'd generally just get a pointer
> to the buffer cached in the slab, because no re-encoding occurs. And in
> that case you _don't_ need to call unuse_commit_buffer(), because you
> have a read-only copy, and the slab cache will hold it forever[1].
> Calling the unuse function will be a noop.
> 
> But when we _do_ re-encode, then you get a new buffer which must be
> freed. And that is when you have to call the unuse function. And the
> reason it is "unuse" and not just "free" is that you don't necessarily
> know which you have, but that function figures it out (and frees it only
> if necessary).
> 
> So what the patch is doing is correct, but the explanation is a little
> confused. We see the leak only when re-encoding, so we'd probably want a
> test case that triggers that. Which I assume implies rewriting a commit
> that was previously generated with an encoding header.
> 
> Now back to that [1] note. Even if we didn't re-encode, we'll still hold
> onto that buffer forever. It's not a "leak" in the traditional sense
> because it's still referenced in the commit slab cache. But if you are
> going to walk over a million commits (like git-log does), you probably
> don't want to hold a million commit messages in memory at once.
> 
> For that you'd want to call free_commit_buffer() when you know you're
> totally done with it (again, like git-log does after it finishes showing
> the commit). That might be the case here in commit_tree_ext(), or it
> might happen later (I'm not familiar with the git-history code).
> 
> But note that you need to do _both_ the unuse and free calls. If we did
> re-encode, the former is needed to free the newly allocated buffer. The
> latter only drops the original buffer in the cache.

Thanks for the explanation!

Patrick

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-16  5:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-14 14:15 [PATCH] builtin/history: unuse the commit buffer after use Kaartic Sivaraam
2026-06-15  9:48 ` Patrick Steinhardt
2026-06-15 17:29   ` Jeff King
2026-06-16  5:45     ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox