git.vger.kernel.org archive mirror
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, emilyshaffer@google.com, phillip.wood123@gmail.com
Subject: Re: [PATCH v3 2/6] run-command: add hide_output to run_processes_parallel_opts
Date: Tue, 25 Oct 2022 21:32:13 +0200
Message-ID: <221025.86fsfbd64l.gmgdl@evledraar.gmail.com>
In-Reply-To: <CAFySSZCFrfhdKuOT=kxqPPBGBD0T2FtD4vJHfa9M3cMAPCSBtA@mail.gmail.com>


On Mon, Oct 24 2022, Calvin Wan wrote:

>> I may just be missing something, but doesn't "struct child_process"
>> already have e.g. "no_stderr", "no_stdout" etc. that we can use?
>> I.e. isn't this thing equivalent to running:
>>
>>         your-command >/dev/null 2>/dev/null
>>
>> Which is what the non-parallel API already supports.
>>
>> Now, IIRC if you just set that in the "get_next_task" callback it won't
>> work in the parallel API, or you'll block waiting for I/O that'll never
>> come or whatever.
>>
>> But that'll be because the parallel interface currently only supports a
>> subset of the full "child_process" combination of options, and maybe it
>> doesn't grok this.
>>
>> But if that's the case we should just extend the API to support
>> "no_stdout", "no_stderr" etc., no?
>>
>> I.e. hypothetically the parallel one could support 100% of the "struct
>> child_process" combination of options, we just haven't bothered yet.
>>
>> But I don't see why the parallel API should grow options that we already
>> have in "struct child_process", instead we should set them there, and it
>> should gradually learn to deal with them.
>>
>> I think it's also fine to have some basic sanity checks there, e.g. I
>> could see how for something like this we don't want to support piping
>> only some children to /dev/null but not others, and that it should be
>> all or nothing (maybe it makes state management when we loop over them
>> easier).
>>
>> Or again, maybe I'm missing something...
>
> Shouldn't the options that are set in "child_process" be abstracted away
> from "parallel_processes"?

In general yes, and no :)

Our main interface should probably be "just set these in the 'struct
child_process' we hand you", but the parallel API might want to assert
certain things about those settings, as some of them may conflict with
its assumptions.
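
As a rough illustration of that kind of sanity check, here is a sketch in
plain shell rather than anything that exists in run-command.[ch]. The
check_all_or_nothing name and the 1/0 "quiet" flags are made up for this
example. The idea is that a supervisor could refuse a batch in which only
some children ask for their output to be discarded:

```shell
#!/bin/sh
# Hypothetical helper, not part of git: each argument is the "quiet"
# setting of one child (1 = discard output, 0 = keep output).
# Succeeds only if every child uses the same setting.
check_all_or_nothing () {
	first=$1
	for flag in "$@"
	do
		if [ "$flag" != "$first" ]
		then
			echo "error: mixing quiet and non-quiet children" >&2
			return 1
		fi
	done
	return 0
}

# A uniform batch is accepted, a mixed one is rejected.
check_all_or_nothing 1 1 1 && echo "uniform batch accepted"
check_all_or_nothing 1 0 1 2>/dev/null || echo "mixed batch rejected"
```

Whether the real API would want to be that strict is an open question. The
point is only that such a check can live next to the per-child settings
instead of replacing them.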

> Setting "no_stdout", "no_stderr", etc. in a
> "child_process" shouldn't imply that we still pass the stdout and stderr to
>  "parallel_processes" and then we send the output to "/dev/null".

Sure, but if they're not producing any output because it's being piped
to /dev/null how worthwhile is it to optimize that?

We still can optimize it, but I still think the interface should just be
the equivalent of:

	parallel -k -j100% 'sleep 0.0$RANDOM && echo {} >/dev/null' ::: {1..100}

Whereas what you seem to be trying to implement is the equivalent of a:

	parallel -u -j100% 'sleep 0.0$RANDOM && echo {}' ::: {1..100} >/dev/null

Except as an option to the parallel API, but the end result seems to be
equivalent.
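
For anyone without GNU parallel at hand, the two placements of the
redirect can be sketched with plain background jobs. This is only an
illustration under assumptions: worker, run_worker_level and
run_supervisor_level are hypothetical names, and nothing here is taken
from run-command.[ch].

```shell
#!/bin/sh
# Hypothetical worker: writes one line to stdout and one to stderr.
worker () {
	echo "out $1"
	echo "err $1" >&2
}

# Worker level: every child discards its own output, so the
# supervisor never has anything to collect.
run_worker_level () {
	for i in 1 2 3
	do
		worker "$i" >/dev/null 2>/dev/null &
	done
	wait
}

# Supervisor level: children write normally, and a single redirect
# is applied around the whole batch.
run_supervisor_level () {
	{
		for i in 1 2 3
		do
			worker "$i" &
		done
		wait
	} >/dev/null 2>/dev/null
}

run_worker_level
run_supervisor_level
```

Neither call prints anything, which is the sense in which the end result
is the same. The difference is only where the "discard output" decision
lives.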

> That being said, I can understand the aversion to adding an option like
> this that doesn't also add support for stdout and stderr. I can remove this
> patch and instead reset the buffer inside of pipe_output and task_finished
> in a later patch

I'm not necessarily opposed to it, just puzzled about it, maybe I don't
have the full picture.

In general I highly recommend looking at whatever GNU parallel is doing,
and seeing if new features in run-command.[ch] can map to that mental
model.

Our API is basically a small subset of its feature set, and I've found it
useful both to steal ideas from there, and to test
assumptions. E.g. "ungroup" is just a straight rip-off of its
"--ungroup" option, and GNU parallel has also had to think about combining
various options we don't have yet (but might want).

In the "ungroup" case the supervisor API/parallel(1) needs to do something
special, but for "I don't want output" it seems best to just do that at
the worker level, i.e. equivalent to piping to /dev/null.

Having a bias towards that approach also makes it easier to convert
things to running in parallel, i.e. you just (mostly) keep your current
"struct child_process", and don't need to find the equivalents in the
parallel API.
