From: Junio C Hamano <gitster@pobox.com>
To: Stefan Beller <sbeller@google.com>
Cc: Jeff King <peff@peff.net>,
	"git\@vger.kernel.org" <git@vger.kernel.org>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [PATCH 3/5] submodule: helper to run foreach in parallel
Date: Tue, 25 Aug 2015 15:23:18 -0700
Message-ID: <xmqqy4gzvwh5.fsf@gitster.dls.corp.google.com>
In-Reply-To: <CAGZ79kb2N_5_tJv-GURL9_ESFs=pHp=L-Mujn3Df_+-T74_9Dg@mail.gmail.com> (Stefan Beller's message of "Tue, 25 Aug 2015 14:42:25 -0700")

Stefan Beller <sbeller@google.com> writes:

>>> +     while (1) {
>>> +             ssize_t len = xread(cp->err, buf, sizeof(buf));
>>> +             if (len < 0)
>>> +                     die("Read from child failed");
>>> +             else if (len == 0)
>>> +                     break;
>>> +             else {
>>> +                     strbuf_add(&out, buf, len);
>>> +             }
>>
>> ... and the whole thing is accumulated in core???
>
> The pipes have a limit, so we need to empty them to prevent back-pressure?

Of course.  But that does not lead to "we hold everything in core".
This side could choose to emit (under protection of args->mutex)
early, e.g. after reading a line, emit it to our standard output (or
our standard error).
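
A minimal sketch of that "emit early" idea, under stated assumptions:
relay_lines() is a hypothetical helper, not something in the patch or in
git; it uses a pthread mutex where the patch uses sem_wait()/sem_post()
on args->mutex, and plain read() where the patch uses xread() (so EINTR
is not retried here).  Only one pending line is held in core at a time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write one chunk to 'out' while holding the shared output lock. */
void emit_locked(FILE *out, pthread_mutex_t *lock, const char *p, size_t n)
{
	pthread_mutex_lock(lock);
	fwrite(p, 1, n, out);
	fflush(out);
	pthread_mutex_unlock(lock);
}

/*
 * Hypothetical helper: relay everything readable from 'fd' to 'out',
 * one line at a time.  Returns the number of chunks emitted, or -1 if
 * read() fails.
 */
int relay_lines(int fd, FILE *out, pthread_mutex_t *lock)
{
	char line[4096];	/* the pending, not yet terminated line */
	size_t used = 0;
	char buf[512];
	ssize_t len;
	int emitted = 0;

	assert(out && lock);
	while ((len = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < len; i++) {
			line[used++] = buf[i];
			/* Flush at a newline, or when the line buffer is full. */
			if (buf[i] == '\n' || used == sizeof(line)) {
				emit_locked(out, lock, line, used);
				used = 0;
				emitted++;
			}
		}
	}
	if (len < 0)
		return -1;
	if (used) {	/* trailing output without a final newline */
		emit_locked(out, lock, line, used);
		emitted++;
	}
	return emitted;
}
```

Each child would get its own relay_lines() call on its pipe, all sharing
one lock, so lines from different children interleave but no line is
torn in the middle.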

> And because we want to have the output of one task at a time, we need to
> save it up until we can put out the whole output, no?

I do not necessarily agree, and I think I said that already:

  http://thread.gmane.org/gmane.comp.version-control.git/276273/focus=276321

>>> +     }
>>> +     if (finish_command(cp))
>>> +             die("command died with error");
>>> +
>>> +     sem_wait(args->mutex);
>>> +     fputs(out.buf, stderr);
>>> +     sem_post(args->mutex);
>>
>> ... and emitted to standard error?
>>
>> I would have expected that the standard error would be left alone
>
> `git fetch` which may be a good candidate for such an operation
> provides progress on stderr, and we don't want to intermingle
> 2 different submodule fetch progress displays
> ("I need to work offline for a bit, so let me get all of the latest stuff,
> so I'll run `git submodule foreach -j 16 -- git fetch --all`" though ideally
> we want to have `git fetch --recurse-submodules -j16` instead )
>
>> (i.e. letting warnings from multiple jobs to be mixed together
>> simply because everybody writes to the same file descriptor), while
>> the standard output would be line-buffered, perhaps captured by the
>> above loop and then emitted under mutex, or something.
>
>>
>> I think I said this earlier, but latency to the first output counts
>
> "to the first stderr"
> in this case?

I didn't mean "output==the standard output stream".  As I said in
$gmane/276321, an early output, as an indication that we are doing
something, is important.

> Why would we want to unplug the task queue from somewhere else?

When you have a dispatcher more intelligent than a stupid FIFO, I
would imagine that you would want to be able to do this pattern,
especially when coming up with a task (not performing a task) takes
a non-trivial amount of work:

	prepare a task queue and have N threads waiting on it;

	plug the queue, i.e. tell the threads not to start picking
	tasks out of it yet;

	run a large enough loop to fill the queue to a reasonable size
	while keeping the threads waiting;

	unplug the queue.  Now the threads can pick tasks from the
	queue, but they have many to choose from, and a dispatcher
	that can do better than a simple FIFO can take advantage of
	that;

	keep filling the queue with more tasks, if necessary;

	and finally, wait for everything to finish.
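
The pattern above could be sketched roughly as follows.  This is an
illustration under stated assumptions, not the interface from patch
2/5: every name here (struct task_queue, tq_set_plugged(), tq_add(),
tq_close(), tq_worker()) is hypothetical, a task is reduced to a bare
priority number, and "smarter than FIFO" simply means "pick the highest
priority pending task".

```c
#include <assert.h>
#include <pthread.h>

#define MAX_TASKS 64

struct task_queue {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	int plugged;		/* workers must not pick tasks while set */
	int closed;		/* no more tasks will be added */
	int prio[MAX_TASKS];	/* pending tasks, reduced to a priority */
	int nr;
	int order[MAX_TASKS];	/* priorities in the order they were run */
	int nr_done;
};

void tq_init(struct task_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->wake, NULL);
	q->plugged = q->closed = q->nr = q->nr_done = 0;
}

/* Plug (1) or unplug (0) the queue. */
void tq_set_plugged(struct task_queue *q, int plugged)
{
	pthread_mutex_lock(&q->lock);
	q->plugged = plugged;
	pthread_cond_broadcast(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

void tq_add(struct task_queue *q, int prio)
{
	pthread_mutex_lock(&q->lock);
	assert(!q->closed && q->nr + q->nr_done < MAX_TASKS);
	q->prio[q->nr++] = prio;
	pthread_cond_signal(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

/* Announce that no more tasks are coming; this overrides a plug. */
void tq_close(struct task_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->closed = 1;
	pthread_cond_broadcast(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

void *tq_worker(void *arg)
{
	struct task_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		int best = 0;

		/* Sleep while plugged or empty, unless the queue is closed. */
		while (!q->closed && (q->plugged || !q->nr))
			pthread_cond_wait(&q->wake, &q->lock);
		if (!q->nr)
			break;	/* closed and drained */
		/* The "dispatcher": choose among everything pending. */
		for (int i = 1; i < q->nr; i++)
			if (q->prio[i] > q->prio[best])
				best = i;
		/* A real worker would drop the lock to run the task. */
		q->order[q->nr_done++] = q->prio[best];
		q->prio[best] = q->prio[--q->nr];
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}
```

A caller would start the workers, call tq_set_plugged(&q, 1), add tasks
with tq_add(), call tq_set_plugged(&q, 0), keep adding if necessary,
then tq_close() and join the threads.  Because nothing is picked while
the queue is plugged, the first pick already sees the whole batch.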

Without a "plug/unplug" interface, you _could_ do the above by doing
something stupid like

	prepare a task queue and have N threads waiting on it;

	loop to find enough tasks, but do not put them in the
	task queue, as FIFO will eat them one-by-one; instead hold
	onto them in a custom data structure that is outside the
	task queue system;

	tight and quick loop to move them to the task queue;

	keep finding more tasks and feed them to the task queue;

	and finally, wait for everything to finish.
