From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: lidongyan <502024330056@smail.nju.edu.cn>,
Alex via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Alex <alexguo1023@gmail.com>,
jinyaoguo <guo846@purdue.edu>
Subject: Re: [PATCH] Allocate msg only after fatal checks to avoid leaks
Date: Sat, 14 Jun 2025 19:01:58 -0400 [thread overview]
Message-ID: <20250614230158.GA2568638@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqcyb672mc.fsf@gitster.g>
On Sat, Jun 14, 2025 at 08:40:43AM -0700, Junio C Hamano wrote:
> > A memory leak on the die() path shouldn't be considered a real leak,
> > right? Since the OS will clean up all memory once the process
> > terminates, explicitly freeing msg isn't necessary in this case.
>
> It may not matter in practice, but I think the leak checking
> machinery like sanitizers would still complain, so I view efforts on
> plugging such leaks in the error code paths more about decluttering
> the leak checker output to help us spot the real leaks.
I disagree here. These are not really leaks, and a leak-checker that
complains about them is bad.
When we call die(), the pointer to the buffer is still on the stack, and
thus the memory is still reachable and not leaked. Some tools like
valgrind may still report these as "still reachable", but because they
categorize them properly we can ignore them[1].
The one exception we've seen is that an optimizing compiler may reorder
instructions to obliterate the stack (because it knows die() is marked
with NORETURN), causing a false positive. We dealt with that via
d3775de074 (Makefile: force -O0 when compiling with SANITIZE=leak,
2022-10-18). I don't think we've seen any recurrence since then.
And while it may be tempting to say "well, it does not hurt to free them
on the die() path", in my opinion that way madness lies. You may have
access to some local variables that can be freed, but there will be many
other heap allocations that you don't even know about! Here's a toy
example from a similar discussion a few years ago:
https://lore.kernel.org/git/YNypPeoZTRiOxPPQ@coredump.intra.peff.net/
So I'd really prefer not to go down this route. And I think the existing
code in this patch's pre-image that calls free() before die() only on
one path should be simplified, so that all die() paths consistently do
not worry about this.
I.e., this:
diff --git a/builtin/notes.c b/builtin/notes.c
index cc1163242f..f3d5eda104 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -321,12 +321,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
die(_("failed to resolve '%s' as a valid ref."), arg);
if (!(value = odb_read_object(the_repository->objects, &object, &type, &len)))
die(_("failed to read object '%s'."), arg);
- if (type != OBJ_BLOB) {
- strbuf_release(&msg->buf);
- free(value);
- free(msg);
+ if (type != OBJ_BLOB)
die(_("cannot read note data from non-blob object '%s'."), arg);
- }
strbuf_add(&msg->buf, value, len);
free(value);
-Peff
[1] "still reachable" leaks _can_ be useful when returning from main,
because they may show memory held in global structures that we might
have been able to free at a more timely spot. But there are a lot of
these in Git, most of which are not interesting (e.g., is freeing
the_repository really worth caring about?), and I don't think there
is a good way to tell the difference.
More pontificating from that earlier discussion:
https://lore.kernel.org/git/YN3iIaovvG7XgLQP@coredump.intra.peff.net/
next prev parent reply other threads:[~2025-06-14 23:02 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-13 19:32 [PATCH] Allocate msg only after fatal checks to avoid leaks Alex via GitGitGadget
2025-06-13 20:37 ` Junio C Hamano
2025-06-14 8:26 ` lidongyan
2025-06-14 15:40 ` Junio C Hamano
2025-06-14 15:50 ` lidongyan
2025-06-14 23:01 ` Jeff King [this message]
2025-06-15 0:45 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250614230158.GA2568638@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=502024330056@smail.nju.edu.cn \
--cc=alexguo1023@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=guo846@purdue.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox