Re: [PATCHv3 06/13] run-command: add an asynchronous parallel child processor

From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Jacob Keller <jacob.keller@gmail.com>, Jeff King <peff@peff.net>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Johannes Schindelin <johannes.schindelin@gmail.com>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Vitali Lovich <vlovich@gmail.com>
Subject: Re: [PATCHv3 06/13] run-command: add an asynchronous parallel child processor
Date: Tue, 22 Sep 2015 11:28:31 -0700
Message-ID: <CAGZ79kbUkUSAP+muhYxTwHZdD+ojJYXjogZfRXs0PemEdcqfbA@mail.gmail.com>
In-Reply-To: <xmqqfv276z1q.fsf@gitster.mtv.corp.google.com>

On Mon, Sep 21, 2015 at 6:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +void default_start_failure(void *data,
>> +                        struct child_process *cp,
>> +                        struct strbuf *err)
>> +{
>> +     int i;
>> +     struct strbuf sb = STRBUF_INIT;
>> +
>> +     for (i = 0; cp->argv[i]; i++)
>> +             strbuf_addf(&sb, "%s ", cp->argv[i]);
>> +     die_errno("Starting a child failed:\n%s", sb.buf);
>
> Do we want that trailing SP after the last element of argv[]?
> Same question applies to the one in "return-value".

done
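
A minimal sketch of one way to drop the trailing space: emit the separator
before every element except the first. This is plain C with a fixed buffer,
not git's strbuf API; join_argv() and its return convention are hypothetical
names used only for illustration, not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Join a NULL-terminated argv into buf, separated by single spaces,
 * with no trailing space. Returns 0 on success, -1 if buf is too small.
 * Hypothetical helper; a stand-in for the strbuf_addf() loop.
 */
static int join_argv(char *buf, size_t size, const char **argv)
{
	size_t len = 0;
	int i;

	if (!size)
		return -1;
	buf[0] = '\0';

	for (i = 0; argv[i]; i++) {
		size_t n = strlen(argv[i]);
		size_t sep = i ? 1 : 0; /* no separator before the first element */

		/* Leave room for the terminating NUL. */
		if (len + sep + n >= size)
			return -1;
		if (sep)
			buf[len++] = ' ';
		memcpy(buf + len, argv[i], n);
		len += n;
		buf[len] = '\0';
	}
	return 0;
}
```

With strbuf, the same idea would be to add the space conditionally before each
element rather than unconditionally after it.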

>
>> +static void run_processes_parallel_init(struct parallel_processes *pp,
>> +                                     int n, void *data,
>> +                                     get_next_task_fn get_next_task,
>> +                                     start_failure_fn start_failure,
>> +                                     return_value_fn return_value)
>> +{
>> +     int i;
>> +
>> +     if (n < 1)
>> +             n = online_cpus();
>> +
>> +     pp->max_processes = n;
>> +     pp->data = data;
>> +     if (!get_next_task)
>> +             die("BUG: you need to specify a get_next_task function");
>> +     pp->get_next_task = get_next_task;
>> +
>> +     pp->start_failure = start_failure ? start_failure : default_start_failure;
>> +     pp->return_value = return_value ? return_value : default_return_value;
>
> I would actually have expected that leaving these to NULL will just
> skip pp->fn calls, instead of a "default implementation", but a pair
> of very simple default implementation would not hurt.

Ok, I think the default implementation provided is a reasonable default, as
it provides enough information in case of an error.

>
>> +static void run_processes_parallel_cleanup(struct parallel_processes *pp)
>> +{
>> +     int i;
>
> Have a blank between the decl block and the first stmt here (and
> elsewhere, too---which you got correct in the function above)?

done

>
>> +     for (i = 0; i < pp->max_processes; i++)
>> +             strbuf_release(&pp->children[i].err);
>
>> +static void run_processes_parallel_start_one(struct parallel_processes *pp)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < pp->max_processes; i++)
>> +             if (!pp->children[i].in_use)
>> +                     break;
>> +     if (i == pp->max_processes)
>> +             die("BUG: bookkeeping is hard");
>
> Mental note: the caller is responsible for not calling this when all
> slots are taken.
>
>> +     if (!pp->get_next_task(pp->data,
>> +                            &pp->children[i].process,
>> +                            &pp->children[i].err)) {
>> +             pp->all_tasks_started = 1;
>> +             return;
>> +     }
>
> Mental note: but it is OK to call this if get_next_task() previously
> said "no more task".
>
> The above two show a slight discrepancy (nothing earth-breaking).

I see. Maybe this can be improved by having the
run_processes_parallel_start_as_needed call get_next_task
and pass the information into the run_processes_parallel_start_one
or as we had it before, combine these two functions again.

>
> I have this suspicion that the all-tasks-started bit may turn out to
> be a big mistake that we may later regret.  Don't we want to allow
> pp->more_task() to say "no more task to run at this moment" implying
> "but please do ask me later, because I may have found more to do by
> the time you ask me again"?

And these tasks would arise because the currently running children produce
more work to be done?
So you would have a more_tasks() query, and if that returns true,
get_next_task() must provide that next task?

In case we had more work to do, which is based on the outcome of the
children, we could just wait in get_next_task for a semaphore/condition
variable from the return_value. Though that would stop progress reporting
and maybe lock up the whole program due to pipe clogging.

Your suggestion seems to be the better design, as we return quickly to the
main loop, which does the polling. I feel it is over-engineered for now, though.

So how would you find out when we are done?
* more_tasks() could have different return values in an enum
  (YES_THERE_ARE, NO_BUT_ASK_LATER, NO_NEVER_ASK_AGAIN)
* There could be yet another callback more_tasks_available() and
   parallel_processing_should_stop()
* Hand back a callback ourselves [Call signal_parallel_processing_done(void*)
  when more_tasks will never return true again, with a void* we provide to
  more_tasks()]
* ...
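
To make the first option concrete, here is a rough sketch of a three-valued
more_tasks() driving a simulated main loop. Only the enum value names come
from the list above; struct demo_source, demo_more_tasks(), demo_run() and
the "one child finishes per iteration" step are hypothetical stand-ins, not
code from the patch, and no real processes are started.

```c
#include <stddef.h>

enum more_tasks_result {
	YES_THERE_ARE,
	NO_BUT_ASK_LATER,
	NO_NEVER_ASK_AGAIN
};

typedef enum more_tasks_result (*more_tasks_fn)(void *data);

/* Hypothetical work source: finishing a task may unlock a pending one. */
struct demo_source {
	int ready;   /* tasks that can be started right now */
	int pending; /* tasks that become ready once a running task finishes */
	int running; /* tasks currently "running" */
};

static enum more_tasks_result demo_more_tasks(void *data)
{
	struct demo_source *s = data;

	if (s->ready > 0)
		return YES_THERE_ARE;
	if (s->pending > 0)
		return NO_BUT_ASK_LATER;
	return NO_NEVER_ASK_AGAIN;
}

/* Simulated main loop; returns the number of tasks started. */
static int demo_run(struct demo_source *s, int max_processes,
		    more_tasks_fn more_tasks)
{
	int started = 0;
	int done = 0;

	while (!done || s->running > 0) {
		/* Fill free slots while the source has work right now. */
		while (!done && s->running < max_processes) {
			enum more_tasks_result r = more_tasks(s);

			/*
			 * Sketch-only assumption: "ask later" with nothing
			 * running cannot make progress, so treat it as done.
			 */
			if (r == NO_NEVER_ASK_AGAIN ||
			    (r == NO_BUT_ASK_LATER && s->running == 0))
				done = 1;
			if (r != YES_THERE_ARE)
				break;
			s->ready--;
			s->running++;
			started++;
		}

		/* Stand-in for poll()/waitpid(): one child finishes. */
		if (s->running > 0) {
			s->running--;
			if (s->pending > 0) {
				s->pending--;
				s->ready++;
			}
		}
	}
	return started;
}
```

With a single all_tasks_started bit, the first "no" would end scheduling; here
the loop keeps asking as long as a running child may still produce work.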

>
> That is one of the reasons why I do not think the "very top level is
> a bulleted list" organization is a good idea in general.  A good
> scheduling decision can seldom be made in isolation without taking
> global picture into account.
>
>> +static void run_processes_parallel_collect_finished(struct parallel_processes *pp)
>> +{
>> +     int i = 0;
>> +     pid_t pid;
>> +     int wait_status, code;
>> +     int n = pp->max_processes;
>> +
>> +     while (pp->nr_processes > 0) {
>> +             pid = waitpid(-1, &wait_status, WNOHANG);
>> +             if (pid == 0)
>> +                     return;
>> +
>> +             if (pid < 0)
>> +                     die_errno("wait");
>> +
>> +             for (i = 0; i < pp->max_processes; i++)
>> +                     if (pp->children[i].in_use &&
>> +                         pid == pp->children[i].process.pid)
>> +                             break;
>> +             if (i == pp->max_processes)
>> +                     /*
>> +                      * waitpid returned another process id
>> +                      * which we are not waiting for.
>> +                      */
>> +                     return;
>
> If we culled a child process that this machinery is not in charge
> of, waitpid() in other places that wants to see that child will not
> see it.  Perhaps such a situation might even warrant an error() or
> BUG()?  Do we want a "NEEDSWORK: Is this a bug?" comment here at
> least?
>
>> +             if (strbuf_read_once(&pp->children[i].err,
>> +                                  pp->children[i].process.err, 0) < 0 &&
>> +                 errno != EAGAIN)
>> +                     die_errno("strbuf_read_once");
>
> Don't we want to read thru to the end here?  The reason read_once()
> did not read thru to the end may not have anything to do with
> NONBLOCK (e.g. xread_nonblock() caps len, and it does not loop).

right.

>
