Re: run-command: output owner picking strategy

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Stefan Beller <sbeller@google.com>
To: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	simon rabourg <simon.rabourg@ensimag.grenoble-inp.fr>,
	francois beutin <francois.beutin@ensimag.grenoble-inp.fr>,
	antoine queru <antoine.queru@ensimag.grenoble-inp.fr>,
	Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Subject: Re: run-command: output owner picking strategy
Date: Fri, 20 May 2016 10:00:24 -0700	[thread overview]
Message-ID: <CAGZ79kZ1LTF4h_WVV9mMCAQCVtH667GU011LiRNtkTmi9ygUYw@mail.gmail.com> (raw)
In-Reply-To: <401208017.5792558.1463749910970.JavaMail.zimbra@ensimag.grenoble-inp.fr>

On Fri, May 20, 2016 at 6:11 AM, William Duclot
<william.duclot@ensimag.grenoble-inp.fr> wrote:
> Hi,
> I stumbled upon this piece of code (run-command.c:pp_collect_finish()), picking the owner
> of the output amongst parallel processes (introduced by Stephan Beller in commit c553c72eed64b5f7316ce227f6d5d783eae6f2ed)
>
> /*
>  * Pick next process to output live.
>  * NEEDSWORK:
>  * For now we pick it randomly by doing a round
>  * robin. Later we may want to pick the one with
>  * the most output or the longest or shortest
>  * running process time.
>  */
> for (i = 0; i < n; i++)
>    if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
>       break;
> pp->output_owner = (pp->output_owner + i) % n;
>
>
> Would it be useful to improve this round-robin into something smarter (as stated by the NEEDSWORK)? It seems to be only used for submodules fetch/clone.
>
> The options would be (as said in the comment):
> 1 - pick the process with the longest running process time
> 2 - pick the process with the shortest running process time
> 3 - pick the process for which the output buffer is the longest
>
> But with one of those strategies, wouldn't we lose the advantage of having the same output order as a non-parallelized version? Cf the commit message.
>
> What do you think ?

When running in parallel we already may be out of order
(relative to serial processing). See the second example in the
commit message to produce a different order.

Consider we scheduled tasks to be run in 3 parallel processes:
(As we NEEDSWORK comment only addresses the ouput selection,
let's assume this is a fixes schedule, which we cannot alter.
Which is true if we only change the code you quoted. That picks
the process to output.)


    process 1: |---A---||-E----------|

    process 2: |-B-||-----------D----|

    process 3: |-------C-------||-F-|

output order:    A, B, D, C, F, E
timing:        |---A---|{B}|------D----|{C,F,E}

All outputs with {braces} are instant from buffers,
and output surrounded by |--dashes--| are live.

The output is produced by the current algorithm:
(1) Start with process 1 (A) whose output will be live
(2) Once A is done, flush all other done things, (B)
(3) live output will be round robin, so process 2 (D)
(4) Once D is done, flush all other done things (C, F, E)
    in order of who finshed first


(1) is uncontroversial. We have no information about tasks A,B,C,
    so pick a random candidate. We hardcoded process 1 for now.

(2) also uncontroversial IMHO. There is not much we can do different.

(3) is what this NEEDSWORK comment is about. Instead of outputting D
    we might have choosen C. (for $REASONS, e.g.: C is running longer than
    D already, so we expect it to finish sooner, by assuming
    any task takes the same expected time to finish. And as C
    is expected to finish earlier than D, we may have smoother
    output. "Less buffered bursts")

Then the output looks like this:

output order: A B C D F E
timing:        |---A---|{B}|-C-||-D-|{F,E}
(Same notation as above)

This seems to be better than the current behavior as we have more
different tasks with "live" output, i.e. you see stuff moving.
I made up the data to make the point though. We would need to use
live data and experiment with different strategies to find a
good/better solution.

Another way to do output would be to not round robin but keep outputting
process 1 live (different than current behavior, and not optimal IMHO)

output order: A B E C D F
timing:        |---A---|{B}|-----E----|{C,D,F}
(Same notation as above)

Thanks,
Stefan

next prev parent reply	other threads:[~2016-05-20 17:00 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <906110267.5791524.1463748828183.JavaMail.zimbra@ensimag.grenoble-inp.fr>
2016-05-20 13:11 ` run-command: output owner picking strategy William Duclot
2016-05-20 17:00   ` Stefan Beller [this message]
2016-05-20 18:29     ` William Duclot
2016-05-20 18:39       ` Stefan Beller
2016-05-20 18:54         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kZ1LTF4h_WVV9mMCAQCVtH667GU011LiRNtkTmi9ygUYw@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=matthieu.moy@grenoble-inp.fr \
    --cc=simon.rabourg@ensimag.grenoble-inp.fr \
    --cc=william.duclot@ensimag.grenoble-inp.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).