git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "René Scharfe" <l.s.r@web.de>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
Date: Mon, 19 Dec 2022 10:20:00 +0100	[thread overview]
Message-ID: <221219.867cyn20zj.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <Y53AXmfabIvdkfZz@coredump.intra.peff.net>


On Sat, Dec 17 2022, Jeff King wrote:

> On Thu, Dec 15, 2022 at 10:11:06AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This is an alternative to René's [1], his already fixes a leak in "git
>> am", and this could be done later, so I'm submitting it as RFC, but it
>> could also replace it.
>> 
>> I think as this series shows extending the "strvec" API to get a
>> feature that works like the existing "strdup_strings" that the "struct
>> string_list" has can make memory management much simpler.
>
> I know this is kind of a surface level review, but...please don't do
> this. We have chased so many bugs over the years due to string-list's
> "maybe this is allocated and maybe not", in both directions (accidental
> leaks and double-frees).
>
> One of the reasons I advocated for strvec in the first place is so that
> it would have consistent memory management semantics, at the minor cost
> of sometimes duplicating them when we don't need to.
>
> And having a nodup form doesn't even save you from having to call
> strvec_clear(); you still need to do so to avoid leaking the array
> itself. It only helps in the weird parse-options case, where we don't
> handle ownership of the array very well (the strvec owns it, but
> parse-options wants to modify it).

Yes, just like "struct string_list" in the "nodup" mode.

I hear you, but I also think you're implicitly conflating two things
here.

There's the question of whether we should in general optimize for safety
over more optimila memory use. I.e. if we simply have every strvec,
string_list etc. own its memory fully we don't need to think as much
about allocation or ownership.

I think we should do that in general, but we also have cases where we'd
like to not do that, e.g. where we're adding thousands of strings to a
string_list, which are all borrewed from elsewhere, except for a few
we'd like to xstrdup().

Such API use *is* tricky, but I think formalizing it as the
"string_list" does is making it better, not worse. In particular...:

>> This does make the API slightly more dangerous to use, as it's no
>> longer guaranteed that it owns all the members it points to. But as
>> the "struct string_list" has shown this isn't an issue in practice,
>> and e.g. SANITIZE=address et al are good about finding double-frees,
>> or frees of fixed strings.
>
> I would disagree that this hasn't been an issue in practice. A few
> recent examples:
>
>   - 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
>     2022-11-17)
>   - 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
>     strings, 2022-09-08)
>   - 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)

...it's funny that those are the examples I would have dug up to argue
that this is a good idea, and to add some:

	- 4a0479086a9 (commit-graph: fix memory leak in misused
          string_list API, 2022-03-04)
	- b202e51b154 (grep: fix a "path_list" memory leak, 2021-10-22)

I.e. above you note "in both directions [...] leaks [...] and double
frees", but these (and the ones I added) are all in the second category.

Which is why I don't think it's an issue in practice. The leaks have
been a non-issue, and to the extent that we care the SANITIZE=leak
testing is closing that gap.

The dangerous issue is the double-free, but (and over the years I have
looked at pretty much every caller) I can't imagine a string_list
use-case where we:

 a) Actually still want to keep that memory optimization, i.e. it
    wouldn't be better by just saying "screw it, let's dup it".
 b) Given "a", we'd be better off with some bespoke custom pattern over
    the facility to do this with the "string_list".

So I really think we're in agreement about 99% of this, I just don't see
how *if* we want to do this why we're better of re-inventing this
particular wheel for every caller.

  reply	other threads:[~2022-12-19  9:31 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13  6:47 [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-13  8:37 ` Ævar Arnfjörð Bjarmason
2022-12-13 18:31   ` René Scharfe
2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-17 12:45         ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
2022-12-17 13:13         ` Jeff King
2022-12-19  9:20           ` Ævar Arnfjörð Bjarmason [this message]
2023-01-07 13:21             ` Jeff King
2022-12-17 12:46       ` [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-17 13:24     ` Jeff King
2022-12-17 16:07       ` René Scharfe
2022-12-17 21:53         ` Jeff King
2022-12-18  2:42           ` Junio C Hamano
2022-12-20  1:29         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221219.867cyn20zj.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).