All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, emilyshaffer@google.com, phillip.wood123@gmail.com
Subject: Re: [PATCH v3 2/6] run-command: add hide_output to run_processes_parallel_opts
Date: Tue, 25 Oct 2022 21:32:13 +0200	[thread overview]
Message-ID: <221025.86fsfbd64l.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <CAFySSZCFrfhdKuOT=kxqPPBGBD0T2FtD4vJHfa9M3cMAPCSBtA@mail.gmail.com>


On Mon, Oct 24 2022, Calvin Wan wrote:

>> I may just be missing something, but doesn't "struct child_process"
>> already have e.g. "no_stderr", "no_stdout" etc. that we can use?
>> I.e. isn't this thing equivalent to running:
>>
>>         your-command >/dev/null 2>/dev/null
>>
>> Which is what the non-parallel API already supports.
>>
>> Now, IIRC if you just set that in the "get_next_task" callback it won't
>> work in the parallel API, or you'll block waiting for I/O that'll never
>> come or whatever.
>>
>> But that'll be because the parallel interface currently only suppors a
>> subset of the full "child_process" combination of options, and maybe it
>> doesn't grok this.
>>
>> But if that's the case we should just extend the API to support
>> "no_stdout", "no_stderr" etc., no?
>>
>> I.e. hypothetically the parallel one could support 100% of the "struct
>> child_process" combination of options, we just haven't bothered yet.
>>
>> But I don't see why the parallel API should grow options that we already
>> have in "struct child_process", instead we should set them there, and it
>> should gradually learn to deal with them.
>>
>> I think it's also fine to have some basic sanity checks there, e.g. I
>> could see how for something like this we don't want to support piping
>> only some children to /dev/null but not others, and that it should be
>> all or nothing (maybe it makes state management when we loop over them
>> easier).
>>
>> Or again, maybe I'm missing something...
>
> Shouldn't the options that are set in "child_process" be abstracted away
> from "parallel_processes"?

In general yes, and no :)

Our main interafce should probably be "just set
these in the 'struct child_process' we hand you", but the parallel API
might want to assert certain things about those settings, as some of
them may conflict with its assumptions.

> Setting "no_stdout", "no_stderr", etc. in a
> "child_process" shouldn't imply that we still pass the stdout and stderr to
>  "parallel_processes" and then we send the output to "/dev/null".

Sure, but if they're not producing any output because it's being piped
to /dev/null how worthwhile is it to optimize that?

We still can optimize it, but I still think the interface should just be
the equivalent of:

	parallel -k -j100% 'sleep 0.0$RANDOM && echo {} >/dev/null' ::: {1..100}

Whereas what you seem to be trying to implement is the equivalent of a:

	parallel -u -j100% 'sleep 0.0$RANDOM && echo {} ::: {1..100} >/dev/null

Except as an option to the parallel API, but the end result seems to be
equivalent.

> That being said, I can understand the aversion to adding an option like
> this that doesn't also add support for stdout and stderr. I can remove this
> patch and instead reset the buffer inside of pipe_output and task_finished
> in a later patch

I'm not necessarily opposed to it, just puzzled about it, maybe I don't
have the full picture.

In general I highly recomend looking at whatever GNU parallel is doing,
and seeing if new features in run-command.[ch] can map to that mental
model.

Our API is basically a small subset of its featureset, and I've found it
useful both to steal ideas from there, and to test
assumptions. E.g. "ungroup" is just a straight rip-off of the
"--ungroup" option, it's also had to think about combining various
options we don't have yet (but might want).

In that case the supervisor API/parallel(1) needs to do something
special, but for "I don't want output" it seems best to just do that at
the worker level, i.e. equivalent to piping to /dev/null.

Having a bias towards that approach also makes it easier to convert
things to running in parallel, i.e. you just (mostly) keep your current
"struct child_process", and don't need to find the equivalents in the
parallel API.









  reply	other threads:[~2022-10-25 20:02 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/>
2022-10-20 23:25 ` [PATCH v3 0/6] submodule: parallelize diff Calvin Wan
2022-10-20 23:25 ` [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
2022-10-21  3:11   ` Ævar Arnfjörð Bjarmason
2022-10-24 17:13     ` Calvin Wan
2022-10-21  5:46   ` Junio C Hamano
2022-10-24 17:00     ` Calvin Wan
2022-10-24 19:04       ` Junio C Hamano
2022-10-25 18:51         ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 2/6] run-command: add hide_output " Calvin Wan
2022-10-21  2:54   ` Ævar Arnfjörð Bjarmason
2022-10-24 19:24     ` Calvin Wan
2022-10-25 19:32       ` Ævar Arnfjörð Bjarmason [this message]
2022-10-25 21:22         ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 3/6] submodule: strbuf variable rename Calvin Wan
2022-10-20 23:25 ` [PATCH v3 4/6] submodule: move status parsing into function Calvin Wan
2022-10-20 23:25 ` [PATCH v3 5/6] diff-lib: refactor match_stat_with_submodule Calvin Wan
2022-10-20 23:25 ` [PATCH v3 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2022-10-21  1:13   ` Ævar Arnfjörð Bjarmason
2022-11-03 21:16     ` Calvin Wan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221025.86fsfbd64l.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=calvinwan@google.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.