All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Enzo Matsumiya <ematsumiya@suse.de>
Subject: Re: [PATCH 5/5] run-command API: remove "argv" member, always use "args"
Date: Mon, 22 Nov 2021 19:19:14 +0100	[thread overview]
Message-ID: <211122.86wnl0xd24.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YZvUN0kuTpmf9Q7P@coredump.intra.peff.net>


On Mon, Nov 22 2021, Jeff King wrote:

> On Mon, Nov 22, 2021 at 05:04:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This change is larger than I'd like, but there's no easy way to avoid
>> it that wouldn't involve even more verbose intermediate steps. We use
>> the "argv" as the source of truth over the "args", so we need to
>> change all parts of run-command.[ch] itself, as well as the trace2
>> logging at the same time.
>
> Yeah, agreed most of this needs to come in a bundle. But at least few spots
> could be split out into the earlier patches:
>
>>  builtin/worktree.c          |  2 --
>>  t/helper/test-run-command.c | 10 +++++---
>
> as they are just users of the API.

Will split these out, I had test-run-command.c split initially, but
squashed it locally, which did away with having to explain leaving this
area untested as an intermediate step. But will split again.

>> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
>> index 3c4fb862234..b5519f92a19 100644
>> --- a/t/helper/test-run-command.c
>> +++ b/t/helper/test-run-command.c
>> [...]
>> @@ -396,7 +396,8 @@ int cmd__run_command(int argc, const char **argv)
>>  	}
>>  	if (argc < 3)
>>  		return 1;
>> -	proc.argv = (const char **)argv + 2;
>> +	strvec_clear(&proc.args);
>> +	strvec_pushv(&proc.args, (const char **)argv + 2);
>>  
>>  	if (!strcmp(argv[1], "start-command-ENOENT")) {
>>  		if (start_command(&proc) < 0 && errno == ENOENT)
>> @@ -408,7 +409,8 @@ int cmd__run_command(int argc, const char **argv)
>>  		exit(run_command(&proc));
>>  
>>  	jobs = atoi(argv[2]);
>> -	proc.argv = (const char **)argv + 3;
>> +	strvec_clear(&proc.args);
>> +	strvec_pushv(&proc.args, (const char **)argv + 3);
>>  
>>  	if (!strcmp(argv[1], "run-command-parallel"))
>>  		exit(run_processes_parallel(jobs, parallel_next,
>
> These two in particular are tricky. This second strvec_clear() is
> necessary because we are overwriting what was put into proc.args by the
> earlier strvec_pushv(). I think this is one of two interesting ways
> these kinds of trivial conversions can fail:
>
>   - somebody is relying on the fact that "argv = whatever" is an
>     assignment, not an append (which is the case here)
>
>   - somebody is relying on the fact that assignment of the pointer is
>     shallow, whereas strvec_push() is doing a deep copy. So converting:
>
>       cp.argv = argv;
>       argv[1] = "foo";
>
>     to:
>
>       strvec_pushv(&cp.args, argv);
>       argv[1] = "foo";
>
>     is not correct. We wouldn't use the updated "foo". I didn't notice
>     any cases like this during my rough own rough conversion, but they
>     could be lurking.

Yes, I tried to eyeball all the code involved & see if there were any. I
could amend this to start renaming variables, but that'll be a much
larger change.

> The strvec_clear() in the first hunk above is also not necessary (nobody
> has written anything before it), but I don't mind it for consistency /
> being defensive.

*nod*, will remove.

>> @@ -902,8 +900,9 @@ int start_command(struct child_process *cmd)
>>  #else
>>  {
>>  	int fhin = 0, fhout = 1, fherr = 2;
>> -	const char **sargv = cmd->argv;
>> +	const char **sargv = strvec_detach(&cmd->args);
>>  	struct strvec nargv = STRVEC_INIT;
>> +	const char **temp_argv = NULL;
>
> I wondered about detaching here, because other parts of the code
> (finish_command(), tracing, etc) will be looking at cmd->args.v[0] for
> the command name.
>
> But we eventually overwrite it with munged results:
>
>> @@ -929,20 +928,26 @@ int start_command(struct child_process *cmd)
>>  		fhout = dup(cmd->out);
>>  
>>  	if (cmd->git_cmd)
>> -		cmd->argv = prepare_git_cmd(&nargv, cmd->argv);
>> +		temp_argv = prepare_git_cmd(&nargv, sargv);
>>  	else if (cmd->use_shell)
>> -		cmd->argv = prepare_shell_cmd(&nargv, cmd->argv);
>> +		temp_argv = prepare_shell_cmd(&nargv, sargv);
>> +	else
>> +		temp_argv = sargv;
>> +	if (!temp_argv)
>> +		BUG("should have some cmd->args to set");
>> +	strvec_pushv(&cmd->args, temp_argv);
>> +	strvec_clear(&nargv);
>
> So we have to do some replacement. I think the memory management gets
> confusing here, though.
>
> At this point "temp_argv" might point to nargv.v (in which case it is a
> dangling pointer after we clear nargv) or to the original sargv (in
> which case it is memory owned by us that must be freed). The former is
> OK, because we never look at it again. And the latter we eventually
> reattach into &cmd->args, but:
>
>> -	strvec_clear(&nargv);
>> -	cmd->argv = sargv;
>> +	strvec_pushv(&cmd->args, sargv);;
>> +	free(sargv);
>
> I think we still have all the entries we pushed into cmd->args from
> temp_argv earlier. So we'd need to strvec_clear() it before pushing
> sargv again.
>
> And then the free(sargv) is too shallow. It's just freeing the outer
> array, but sargv[0], etc, are leaked.

I'll try to fix that, but updates to this part are very painful, since
I've needed to wait 30m for each change in the Windows CI.

> I think what you really want is the equivalent of strvec_attach(). We
> don't have that, but it's basically just:
>
>   void strvec_attach(struct strvec *s, const char **v)
>   {
> 	/*
> 	 * this is convenient for our caller, but if we wanted to find
> 	 * potential misuses, we could also BUG() if s.nr > 0
> 	 */
> 	strvec_clear(&s);
>
>         /* take over ownership */
> 	s->v = v;
>
> 	/* v must be NULL-terminated; count up to set number */
> 	s->nr = 0;
> 	for (; *v; v++)
> 		s->nr++;
> 	/*
> 	 * we don't know how big the original allocation was, so we can
> 	 * assume only nr. But add 1 to account for the NULL.
> 	 */
> 	s->alloc = s->nr + 1;
>   }
>
> I do think this whole thing would be easier to read if we left cmd->argv
> intact, and just operated on a separate argv strvec. That's what the
> non-Windows side does. But that distinction is nothing new in your
> patch, and I'm not sure if there is a reason that the Windows code does
> it differently.

I did have this with a strvec_attach() locally, but figured I'd get
comments about growing scope & with just one caller.

This version seems to be duplicating things in the existing API though,
I just had the below, which I think works just as well without the
duplication. Perhaps you missed strvec_push_nodup()?

diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..c10008d792f 100644
--- a/strvec.c
+++ b/strvec.c
@@ -106,3 +106,9 @@ const char **strvec_detach(struct strvec *array)
                return ret;
        }
 }
+
+void strvec_attach(struct strvec *array, const char **items)
+{
+       for (; *items; items++)
+               strvec_push_nodup(array, *items);
+}

  reply	other threads:[~2021-11-22 18:25 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-20 19:40 [PATCH v2] pager: fix crash when pager program doesn't exist Enzo Matsumiya
2021-11-21 18:37 ` Jeff King
2021-11-22  2:10   ` Junio C Hamano
2021-11-22  4:35     ` Jeff King
2021-11-22 14:52       ` Enzo Matsumiya
2021-11-22 17:05         ` Junio C Hamano
2021-11-23 16:40           ` Enzo Matsumiya
2021-11-24  1:55             ` Ævar Arnfjörð Bjarmason
2021-11-24 15:51               ` Jeff King
2021-11-22 16:04       ` [PATCH 0/5] run-command API: get rid of "argv" Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 1/5] archive-tar: use our own cmd.buf in error message Ævar Arnfjörð Bjarmason
2021-11-22 21:04           ` Junio C Hamano
2021-11-22 16:04         ` [PATCH 2/5] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-22 17:02           ` Jeff King
2021-11-22 20:53           ` Ævar Arnfjörð Bjarmason
2021-11-22 21:10             ` Jeff King
2021-11-22 21:36               ` Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 3/5] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-22 21:19           ` Junio C Hamano
2021-11-22 21:30             ` Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 4/5] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 5/5] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-22 17:32           ` Jeff King
2021-11-22 18:19             ` Ævar Arnfjörð Bjarmason [this message]
2021-11-22 18:47               ` Jeff King
2021-11-22 17:52         ` [PATCH 0/5] run-command API: get rid of "argv" Jeff King
2021-11-22 18:11           ` Junio C Hamano
2021-11-22 18:33             ` Ævar Arnfjörð Bjarmason
2021-11-22 18:49               ` Jeff King
2021-11-22 18:26           ` Ævar Arnfjörð Bjarmason
2021-11-23 12:06         ` [PATCH v2 0/9] run-command API: get rid of "argv" and "env" Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 1/9] worktree: remove redundant NULL-ing of "cp.argv Ævar Arnfjörð Bjarmason
2021-11-23 15:26             ` Eric Sunshine
2021-11-24  1:54               ` Junio C Hamano
2021-11-24  6:00                 ` Eric Sunshine
2021-11-24  6:12                   ` Eric Sunshine
2021-11-24  5:44               ` Eric Sunshine
2021-11-23 12:06           ` [PATCH v2 2/9] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 3/9] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 4/9] run-command tests: " Ævar Arnfjörð Bjarmason
2021-11-24  1:33             ` Eric Sunshine
2021-11-23 12:06           ` [PATCH v2 5/9] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 6/9] run-command API users: use strvec_push(), " Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 7/9] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 8/9] difftool: use "env_array" to simplify memory management Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 9/9] run-command API: remove "env" member, always use "env_array" Ævar Arnfjörð Bjarmason
2021-11-25 22:52           ` [PATCH v3 0/9] run-command API: get rid of "argv" and "env" Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 1/9] worktree: stop being overly intimate with run_command() internals Ævar Arnfjörð Bjarmason
2021-11-26  9:48               ` Eric Sunshine
2021-11-25 22:52             ` [PATCH v3 2/9] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 3/9] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 4/9] run-command tests: " Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 5/9] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 6/9] run-command API users: use strvec_push(), " Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 7/9] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 8/9] difftool: use "env_array" to simplify memory management Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 9/9] run-command API: remove "env" member, always use "env_array" Ævar Arnfjörð Bjarmason
2021-11-22 15:31     ` [PATCH v2] pager: fix crash when pager program doesn't exist Enzo Matsumiya
2021-11-22 16:22       ` Ævar Arnfjörð Bjarmason
2021-11-22 16:46         ` Enzo Matsumiya
2021-11-22 17:10           ` Ævar Arnfjörð Bjarmason
2021-11-22 17:41             ` Jeff King
2021-11-22 18:00             ` Junio C Hamano
2021-11-22 18:26               ` Jeff King
2021-11-22 17:55           ` Junio C Hamano
2021-11-22 18:19             ` Junio C Hamano
2021-11-22 18:37               ` Jeff King
2021-11-22 20:39                 ` Junio C Hamano
2021-11-22 17:08         ` Junio C Hamano
2021-11-22 18:35           ` Ævar Arnfjörð Bjarmason
2021-11-22 16:30       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211122.86wnl0xd24.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=ematsumiya@suse.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.