All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] am: don't pass strvec to apply_parse_options()
Date: Wed, 14 Dec 2022 09:44:12 +0100	[thread overview]
Message-ID: <221214.86ilie48cv.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <2d0d77a4-f6ac-1fa7-bddb-9083579d8dd7@web.de>


On Tue, Dec 13 2022, René Scharfe wrote:

> Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Tue, Dec 13 2022, René Scharfe wrote:
>>
>>> apply_parse_options() passes the array of argument strings to
>>> parse_options(), which removes recognized options.  The removed strings
>>> are not freed, though.
>>>
>>> Make a copy of the strvec to pass to the function to retain the pointers
>>> of its strings, so we release them all at the end.
>>>
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>>  builtin/am.c | 12 +++++++++++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/builtin/am.c b/builtin/am.c
>>> index 30c9b3a9cd..dddf1b9af0 100644
>>> --- a/builtin/am.c
>>> +++ b/builtin/am.c
>>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>  	int res, opts_left;
>>>  	int force_apply = 0;
>>>  	int options = 0;
>>> +	const char **apply_argv;
>>>
>>>  	if (init_apply_state(&apply_state, the_repository, NULL))
>>>  		BUG("init_apply_state() failed");
>>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>  	strvec_push(&apply_opts, "apply");
>>>  	strvec_pushv(&apply_opts, state->git_apply_opts.v);
>>>
>>> -	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
>>> +	/*
>>> +	 * Build a copy that apply_parse_options() can rearrange.
>>> +	 * apply_opts.v keeps referencing the allocated strings for
>>> +	 * strvec_clear() to release.
>>> +	 */
>>> +	ALLOC_ARRAY(apply_argv, apply_opts.nr);
>>> +	COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr);
>>> +
>>> +	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
>>>  					&apply_state, &force_apply, &options,
>>>  					NULL);
>>>
>>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>  	strvec_clear(&apply_paths);
>>>  	strvec_clear(&apply_opts);
>>>  	clear_apply_state(&apply_state);
>>> +	free(apply_argv);
>>>
>>>  	if (res)
>>>  		return res;
>>
>> I don't mind this going in, but it really feels like a bit of a dirty
>> hack.
>>
>> We have widespread leaks all over the place due to this
>> idiom. I.e. parse_options() and a couple of other APIs expect that they
>> can munge the "argv", which is fine if it arrives via main(), but not if
>> we're editing our own strvecs.
>
> Where?  A quick "git grep 'parse_options.*nr'" turns up only this place
> as one that passes a strvec to parse_options.

Sorry about the vagueness, this was from memory and I didn't have a full
list handy.

First, that grep isn't going to give you what you want, since it only
catches cases where we're passing the "strvec" to the "parse_options()"
directly.

But if you look at e.g. how cmd_stash() invokes push_stash() or
cmd_describe() invoking cmd_name_rev() you'll see callers where we're
potentially losing the strvec due to parse_options().

"Potentially" because in both those cases we call that API, but in the
latter case we've got a missing strvec_clear(), so maybe that's all
that's needed (it doesn't always munge the argv).

>> I think less of a hack is to teach the eventual parse_options() that
>> when it munges it it should free() it. I did that for the revisions API
>> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
>> need free()-ing, 2022-08-02).
>>
>> What do you think?
>
> Generating string lists and then parsing them is weird.  When calls have
> to cross a process boundary then we have no choice, but in-process we
> shouldn't have to lower our request to an intermediate text format.  git
> am does it anyway because it writes its options to a file and reads them
> back when it resumes with --continue, IIUC.

I agree, but making all those use a nicer API would be a much bigger
change.

> I hope that is and will be the only place that uses parse_options() with
> a strvec -- and then we don't have to change that function.
>
> If this pattern is used more widely then we could package the copying
> done by this patch somehow, e.g. by adding a strvec_parse_options()
> that wraps the real thing.

The reason I'm encouraging you to look at some of the other strvec leaks
is to see if you can think of a pattern that fits these various leaky
callers.

So e.g. a strvec_parse_options() won't work for the ones noted above
where we've lost track of it being a "struct strvec" in the first place.

I have a local version somewhere where I did (pseudocode):

	struct strvec vec = STRVEC_INIT;
	struct string_list to_free = STRING_LIST_INIT_DUP;

        // populate vec...
        for (size_t i = 0; i < vec.nr; i++)
		string_list_append_nodup(&to_free, v[i]):
	// call parse_options(), or otherwise munge "vec"...
	free(strvec_detach(&vec));
        string_list_clear(&to_free, 0);

I.e. you snapshot the members of the "vec" before it's munged, and
free() those via a "struct string_list".

Then you don't strvec_clear(), but instead free() the container, its
members being free'd by the string_list.

Here's a (not yet in-tree) patch that uses that:
https://lore.kernel.org/git/patch-10.10-81f138e460c-20221017T115544Z-avarab@gmail.com/

I think that's more reliable & general than the pattern you're adding
here.

In your version the only reason we're not segfaulting is because the
parse_options() consumes all the arguments, but that's not something you
can generally rely on with parse_options().

Also, and maybe I'm missing something, but wouldn't this be functionally
the same as your implementation:
	
	diff --git a/builtin/am.c b/builtin/am.c
	index 30c9b3a9cd7..e65262cbb21 100644
	--- a/builtin/am.c
	+++ b/builtin/am.c
	@@ -1476,11 +1476,13 @@ static int run_apply(const struct am_state *state, const char *index_file)
	 	int res, opts_left;
	 	int force_apply = 0;
	 	int options = 0;
	+	char *to_free;
	 
	 	if (init_apply_state(&apply_state, the_repository, NULL))
	 		BUG("init_apply_state() failed");
	 
	 	strvec_push(&apply_opts, "apply");
	+	to_free = (char *)apply_opts.v[0];
	 	strvec_pushv(&apply_opts, state->git_apply_opts.v);
	 
	 	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
	@@ -1489,6 +1491,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
	 
	 	if (opts_left != 0)
	 		die("unknown option passed through to git apply");
	+	free(to_free);
	 
	 	if (index_file) {
	 		apply_state.index_file = index_file;

I.e. we leak the strvec_push() of the "apply" literal, but for the rest
we append the "state->git_apply_opts.v", which we already free()
elsewhere.

So actually, aren't we really just fumbling our way towards the "nodup"
interface that the "struct string_list" has, and we should add the same
to the "struct strvec".

This seems to also fix all the leaks you fixed, but without any of the
copying:

	diff --git a/builtin/am.c b/builtin/am.c
	index 30c9b3a9cd7..691ec1d152d 100644
	--- a/builtin/am.c
	+++ b/builtin/am.c
	@@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
	 static int run_apply(const struct am_state *state, const char *index_file)
	 {
	 	struct strvec apply_paths = STRVEC_INIT;
	-	struct strvec apply_opts = STRVEC_INIT;
	+	struct strvec apply_opts = STRVEC_INIT_NODUP;
	 	struct apply_state apply_state;
	 	int res, opts_left;
	 	int force_apply = 0;
	diff --git a/strvec.c b/strvec.c
	index 61a76ce6cb9..efdc47a5854 100644
	--- a/strvec.c
	+++ b/strvec.c
	@@ -22,7 +22,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
	 
	 const char *strvec_push(struct strvec *array, const char *value)
	 {
	-	strvec_push_nodup(array, xstrdup(value));
	+	const char *to_push = array->nodup_strings ? value : xstrdup(value);
	+
	+	strvec_push_nodup(array, to_push);
	 	return array->v[array->nr - 1];
	 }
	 
	@@ -31,6 +33,9 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
	 	va_list ap;
	 	struct strbuf v = STRBUF_INIT;
	 
	+	if (array->nodup_strings)
	+		BUG("cannot strvec_pushf() on a 'nodup' strvec");
	+
	 	va_start(ap, fmt);
	 	strbuf_vaddf(&v, fmt, ap);
	 	va_end(ap);
	@@ -67,6 +72,9 @@ void strvec_pop(struct strvec *array)
	 
	 void strvec_split(struct strvec *array, const char *to_split)
	 {
	+	if (array->nodup_strings)
	+		BUG("cannot strvec_split() on a 'nodup' strvec");
	+
	 	while (isspace(*to_split))
	 		to_split++;
	 	for (;;) {
	@@ -90,7 +98,7 @@ void strvec_clear(struct strvec *array)
	 	if (array->v != empty_strvec) {
	 		int i;
	 		for (i = 0; i < array->nr; i++)
	-			free((char *)array->v[i]);
	+			free(array->nodup_strings ? NULL : (char *)array->v[i]);
	 		free(array->v);
	 	}
	 	strvec_init(array);
	diff --git a/strvec.h b/strvec.h
	index 9f55c8766ba..ac6e2c04cea 100644
	--- a/strvec.h
	+++ b/strvec.h
	@@ -31,12 +31,18 @@ struct strvec {
	 	const char **v;
	 	size_t nr;
	 	size_t alloc;
	+	unsigned int nodup_strings:1;
	 };
	 
	 #define STRVEC_INIT { \
	 	.v = empty_strvec, \
	 }
	 
	+#define STRVEC_INIT_NODUP { \
	+	.v = empty_strvec, \
	+	.nodup_strings = 1, \
	+}
	+
	 /**
	  * Initialize an array. This is no different than assigning from
	  * `STRVEC_INIT`.

In any case, for this change (and leak fixes in general), could you
please also mark the now-passing leak-free tests. This fails after your
patch, but passes before.

The failure is a "good" one, as they're now leak-free, but should be
marked as such:

	GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true \
	make SANITIZE=leak test T="t4256-am-format-flowed.sh t0023-crlf-am.sh t4254-am-corrupt.sh t4257-am-interactive.sh t5403-post-checkout-hook.sh t4152-am-subjects.sh"

> If we have to change parse_options() at all then I'd prefer it to not
> free() anything (to keep it usable with main()'s parameters),...

I'm suggesting passing a flag to it to have it free() things it
NULL's.

Maybe that's a bad idea, but I don't see the incompatibility with
main(), for those we just wouldn't pass the flag.

It does have the inherent limitation that you couldn't mix strvec and
non-strvec parameters, i.e. if some of your elements are dup'd but
others aren't you'd need to arrange to have them all dup'd.

But I don't think that's an issue in practice, either we pass the fully
dup'd version, or main() parameters.

> but to
> reorder in a non-destructive way.  That would mean keeping the NULL
> sentinel where it is, and making sure all callers use only the returned
> argc to determine which arguments parse_options() didn't recognize.

I think this would work if parse_options() wasn't clobbering those
elements in some cases, and replacing them with new ones, but doesn't it
do that e.g. in parse_options_step()?

It also seems like a much more fragile change to try to ensure that
nothing uses the "argv" again, but only looks at the updated
"argc".

Both that & adding a "const" to it would be a huge change across the
codebase, so I'd like to avoid them, but at least the "const" method
would be helped along by the compiler.

  reply	other threads:[~2022-12-14  9:56 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13  6:47 [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-13  8:37 ` Ævar Arnfjörð Bjarmason
2022-12-13 18:31   ` René Scharfe
2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason [this message]
2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-17 12:45         ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
2022-12-17 13:13         ` Jeff King
2022-12-19  9:20           ` Ævar Arnfjörð Bjarmason
2023-01-07 13:21             ` Jeff King
2022-12-17 12:46       ` [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-17 13:24     ` Jeff King
2022-12-17 16:07       ` René Scharfe
2022-12-17 21:53         ` Jeff King
2022-12-18  2:42           ` Junio C Hamano
2022-12-20  1:29         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221214.86ilie48cv.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.