From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] am: don't pass strvec to apply_parse_options()
Date: Wed, 14 Dec 2022 09:44:12 +0100 [thread overview]
Message-ID: <221214.86ilie48cv.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <2d0d77a4-f6ac-1fa7-bddb-9083579d8dd7@web.de>
On Tue, Dec 13 2022, René Scharfe wrote:
> Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Tue, Dec 13 2022, René Scharfe wrote:
>>
>>> apply_parse_options() passes the array of argument strings to
>>> parse_options(), which removes recognized options. The removed strings
>>> are not freed, though.
>>>
>>> Make a copy of the strvec to pass to the function to retain the pointers
>>> of its strings, so we release them all at the end.
>>>
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>> builtin/am.c | 12 +++++++++++-
>>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/builtin/am.c b/builtin/am.c
>>> index 30c9b3a9cd..dddf1b9af0 100644
>>> --- a/builtin/am.c
>>> +++ b/builtin/am.c
>>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>> int res, opts_left;
>>> int force_apply = 0;
>>> int options = 0;
>>> + const char **apply_argv;
>>>
>>> if (init_apply_state(&apply_state, the_repository, NULL))
>>> BUG("init_apply_state() failed");
>>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>> strvec_push(&apply_opts, "apply");
>>> strvec_pushv(&apply_opts, state->git_apply_opts.v);
>>>
>>> - opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
>>> + /*
>>> + * Build a copy that apply_parse_options() can rearrange.
>>> + * apply_opts.v keeps referencing the allocated strings for
>>> + * strvec_clear() to release.
>>> + */
>>> + ALLOC_ARRAY(apply_argv, apply_opts.nr);
>>> + COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr);
>>> +
>>> + opts_left = apply_parse_options(apply_opts.nr, apply_argv,
>>> &apply_state, &force_apply, &options,
>>> NULL);
>>>
>>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>> strvec_clear(&apply_paths);
>>> strvec_clear(&apply_opts);
>>> clear_apply_state(&apply_state);
>>> + free(apply_argv);
>>>
>>> if (res)
>>> return res;
>>
>> I don't mind this going in, but it really feels like a bit of a dirty
>> hack.
>>
>> We have widespread leaks all over the place due to this
>> idiom. I.e. parse_options() and a couple of other APIs expect that they
>> can munge the "argv", which is fine if it arrives via main(), but not if
>> we're editing our own strvecs.
>
> Where? A quick "git grep 'parse_options.*nr'" turns up only this place
> as one that passes a strvec to parse_options.
Sorry about the vagueness, this was from memory and I didn't have a full
list handy.
First, that grep isn't going to give you what you want, since it only
catches cases where we're passing the "strvec" to the "parse_options()"
directly.
But if you look at e.g. how cmd_stash() invokes push_stash() or
cmd_describe() invoking cmd_name_rev() you'll see callers where we're
potentially losing the strvec due to parse_options().
"Potentially" because in both those cases we call that API, but in the
latter case we've got a missing strvec_clear(), so maybe that's all
that's needed (it doesn't always munge the argv).
>> I think less of a hack is to teach the eventual parse_options() that
>> when it munges it it should free() it. I did that for the revisions API
>> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
>> need free()-ing, 2022-08-02).
>>
>> What do you think?
>
> Generating string lists and then parsing them is weird. When calls have
> to cross a process boundary then we have no choice, but in-process we
> shouldn't have to lower our request to an intermediate text format. git
> am does it anyway because it writes its options to a file and reads them
> back when it resumes with --continue, IIUC.
I agree, but making all those use a nicer API would be a much bigger
change.
> I hope that is and will be the only place that uses parse_options() with
> a strvec -- and then we don't have to change that function.
>
> If this pattern is used more widely then we could package the copying
> done by this patch somehow, e.g. by adding a strvec_parse_options()
> that wraps the real thing.
The reason I'm encouraging you to look at some of the other strvec leaks
is to see if you can think of a pattern that fits these various leaky
callers.
So e.g. a strvec_parse_options() won't work for the ones noted above
where we've lost track of it being a "struct strvec" in the first place.
I have a local version somewhere where I did (pseudocode):
struct strvec vec = STRVEC_INIT;
struct string_list to_free = STRING_LIST_INIT_DUP;
// populate vec...
for (size_t i = 0; i < vec.nr; i++)
string_list_append_nodup(&to_free, v[i]):
// call parse_options(), or otherwise munge "vec"...
free(strvec_detach(&vec));
string_list_clear(&to_free, 0);
I.e. you snapshot the members of the "vec" before it's munged, and
free() those via a "struct string_list".
Then you don't strvec_clear(), but instead free() the container, its
members being free'd by the string_list.
Here's a (not yet in-tree) patch that uses that:
https://lore.kernel.org/git/patch-10.10-81f138e460c-20221017T115544Z-avarab@gmail.com/
I think that's more reliable & general than the pattern you're adding
here.
In your version the only reason we're not segfaulting is because the
parse_options() consumes all the arguments, but that's not something you
can generally rely on with parse_options().
Also, and maybe I'm missing something, but wouldn't this be functionally
the same as your implementation:
diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd7..e65262cbb21 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1476,11 +1476,13 @@ static int run_apply(const struct am_state *state, const char *index_file)
int res, opts_left;
int force_apply = 0;
int options = 0;
+ char *to_free;
if (init_apply_state(&apply_state, the_repository, NULL))
BUG("init_apply_state() failed");
strvec_push(&apply_opts, "apply");
+ to_free = (char *)apply_opts.v[0];
strvec_pushv(&apply_opts, state->git_apply_opts.v);
opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
@@ -1489,6 +1491,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
if (opts_left != 0)
die("unknown option passed through to git apply");
+ free(to_free);
if (index_file) {
apply_state.index_file = index_file;
I.e. we leak the strvec_push() of the "apply" literal, but for the rest
we append the "state->git_apply_opts.v", which we already free()
elsewhere.
So actually, aren't we really just fumbling our way towards the "nodup"
interface that the "struct string_list" has, and we should add the same
to the "struct strvec".
This seems to also fix all the leaks you fixed, but without any of the
copying:
diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd7..691ec1d152d 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
static int run_apply(const struct am_state *state, const char *index_file)
{
struct strvec apply_paths = STRVEC_INIT;
- struct strvec apply_opts = STRVEC_INIT;
+ struct strvec apply_opts = STRVEC_INIT_NODUP;
struct apply_state apply_state;
int res, opts_left;
int force_apply = 0;
diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..efdc47a5854 100644
--- a/strvec.c
+++ b/strvec.c
@@ -22,7 +22,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
const char *strvec_push(struct strvec *array, const char *value)
{
- strvec_push_nodup(array, xstrdup(value));
+ const char *to_push = array->nodup_strings ? value : xstrdup(value);
+
+ strvec_push_nodup(array, to_push);
return array->v[array->nr - 1];
}
@@ -31,6 +33,9 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
va_list ap;
struct strbuf v = STRBUF_INIT;
+ if (array->nodup_strings)
+ BUG("cannot strvec_pushf() on a 'nodup' strvec");
+
va_start(ap, fmt);
strbuf_vaddf(&v, fmt, ap);
va_end(ap);
@@ -67,6 +72,9 @@ void strvec_pop(struct strvec *array)
void strvec_split(struct strvec *array, const char *to_split)
{
+ if (array->nodup_strings)
+ BUG("cannot strvec_split() on a 'nodup' strvec");
+
while (isspace(*to_split))
to_split++;
for (;;) {
@@ -90,7 +98,7 @@ void strvec_clear(struct strvec *array)
if (array->v != empty_strvec) {
int i;
for (i = 0; i < array->nr; i++)
- free((char *)array->v[i]);
+ free(array->nodup_strings ? NULL : (char *)array->v[i]);
free(array->v);
}
strvec_init(array);
diff --git a/strvec.h b/strvec.h
index 9f55c8766ba..ac6e2c04cea 100644
--- a/strvec.h
+++ b/strvec.h
@@ -31,12 +31,18 @@ struct strvec {
const char **v;
size_t nr;
size_t alloc;
+ unsigned int nodup_strings:1;
};
#define STRVEC_INIT { \
.v = empty_strvec, \
}
+#define STRVEC_INIT_NODUP { \
+ .v = empty_strvec, \
+ .nodup_strings = 1, \
+}
+
/**
* Initialize an array. This is no different than assigning from
* `STRVEC_INIT`.
In any case, for this change (and leak fixes in general), could you
please also mark the now-passing leak-free tests. This fails after your
patch, but passes before.
The failure is a "good" one, as they're now leak-free, but should be
marked as such:
GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true \
make SANITIZE=leak test T="t4256-am-format-flowed.sh t0023-crlf-am.sh t4254-am-corrupt.sh t4257-am-interactive.sh t5403-post-checkout-hook.sh t4152-am-subjects.sh"
> If we have to change parse_options() at all then I'd prefer it to not
> free() anything (to keep it usable with main()'s parameters),...
I'm suggesting passing a flag to it to have it free() things it
NULL's.
Maybe that's a bad idea, but I don't see the incompatibility with
main(), for those we just wouldn't pass the flag.
It does have the inherent limitation that you couldn't mix strvec and
non-strvec parameters, i.e. if some of your elements are dup'd but
others aren't you'd need to arrange to have them all dup'd.
But I don't think that's an issue in practice, either we pass the fully
dup'd version, or main() parameters.
> but to
> reorder in a non-destructive way. That would mean keeping the NULL
> sentinel where it is, and making sure all callers use only the returned
> argc to determine which arguments parse_options() didn't recognize.
I think this would work if parse_options() wasn't clobbering those
elements in some cases, and replacing them with new ones, but doesn't it
do that e.g. in parse_options_step()?
It also seems like a much more fragile change to try to ensure that
nothing uses the "argv" again, but only looks at the updated
"argc".
Both that & adding a "const" to it would be a huge change across the
codebase, so I'd like to avoid them, but at least the "const" method
would be helped along by the compiler.
next prev parent reply other threads:[~2022-12-14 9:56 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-13 6:47 [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-13 8:37 ` Ævar Arnfjörð Bjarmason
2022-12-13 18:31 ` René Scharfe
2022-12-14 8:44 ` Ævar Arnfjörð Bjarmason [this message]
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
2022-12-15 9:11 ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-15 9:11 ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
2022-12-15 9:11 ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-15 9:11 ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-15 9:11 ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-17 12:45 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
2022-12-17 13:13 ` Jeff King
2022-12-19 9:20 ` Ævar Arnfjörð Bjarmason
2023-01-07 13:21 ` Jeff King
2022-12-17 12:46 ` [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-17 13:24 ` Jeff King
2022-12-17 16:07 ` René Scharfe
2022-12-17 21:53 ` Jeff King
2022-12-18 2:42 ` Junio C Hamano
2022-12-20 1:29 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=221214.86ilie48cv.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=l.s.r@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).