Re: [PATCHv14 5/7] git submodule update: have a dedicated helper for cloning

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Jens Lehmann <Jens.Lehmann@web.de>, Jeff King <peff@peff.net>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCHv14 5/7] git submodule update: have a dedicated helper for cloning
Date: Fri, 19 Feb 2016 16:32:24 -0800	[thread overview]
Message-ID: <20160220003224.GN28749@google.com> (raw)
In-Reply-To: <CAGZ79kZjD_hRBFEVpj-80NpaYR2qRvnLbBb+9kR4x7MKoRX+UQ@mail.gmail.com>

Stefan Beller wrote:
> On Fri, Feb 19, 2016 at 3:07 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:

[...]
>>> +             strbuf_addf(err, "Skipping submodule '%s'\n",
>>> +                         displaypath);
>>
>> Does the caller expect a newline at the end of err?
>>
>> In the refs code that uses an err strbuf, the convention is to
>> not end the message with a newline.  That way, a function like
>> die() can insert a newline and messages are guaranteed to be
>> newline-terminated even if someone is sloppy and does the wrong thing
>> when generating an error.
>
> Oh! So we need to add new lines ourselves here.

Now I'm confused.  The code in this patch is inconsistent.  I was
recommending consistently leaving out the \n.

It looks like pp_start_one takes the content of err without adding
a \n.  That's a bug in pp_start_one and pp_collect_finished IMHO.

For example, default_task_finished and default_start_failure don't
put a newline don't put a newline at the end of the message.  I
don't think that's a bug --- they're doing the right thing, but
pp_start_one and pp_collect_finished is just not handling it
correctly.

>>> +             if (pp->reference)
>>> +                     argv_array_push(&cp->args, pp->reference);
>>> +             if (pp->depth)
>>> +                     argv_array_push(&cp->args, pp->depth);
>>
>> What does 'cp' stand for mean?  Would a name like 'child' work?
>
> child process ? (any code which had a struct child_process referred to
> it as *cp)

Output from 'git grep --F -e "child_process *"' finds variables with
various names, corresponding to what kind of child it is.

[...]
>> Is this the 'list' subcommand?
>
> no. :(

No problem --- that's what review is for.

[...]
>>> +     if (pp.print_unmatched) {
>>> +             printf("#unmatched\n");
>>
>> I'm still confused about this.  I think a comment where
>> 'print_unmatched' is declared would probably clear it up.
>
> Probably this is a too literal translation from shell to C.
> By printing #unmatched the shell on the other end of the pipe
> of this helper program knows to just stop and error out.
>
> So this is telling the downstream program to stop.
>
> Maybe we can name the variable 'quickstop' ?
> 'abort_and_exit' ?

Interesting.

Would it work for the helper to exit at that point with nonzero status?

Lacking "set -o pipefail", it's a little messy to handle in the shell,
but it's possible:

	{
		git submodule--helper \
			--foo \
			--bar \
			--baz ||
		... handle error, e.g. by printing something
		that the other end of the pipe wants to see ...
	} |
	...

[...]
>> (just curious) why are these saved up and printed all at once instead
>> of being printed as it goes?
>
> To have a clean abort path, i.e. we need to be done with all the parallel stuff
> before we start on doing local on-disk stuff.

Interesting.

That's subtle.  Probably worth a comment so people know not to break
it.  (E.g.

	/*
	 * We saved the output and splat it all at once now.
	 * That means:
	 * - the listener does not have to interleave their (checkout)
	 *   work with our fetching.  The writes involved in a
	 *   checkout involve more straightforward sequential I/O.
	 * - the listener can avoid doing any work if fetching failed.
	 */

).

Thinking out loud: the other side could write their output to a
temporary file (e.g.  under .git) and save us the trouble of buffering
it.  But the list of submodules and statuses is not likely to be huge
--- buffering doesn't hurt much.

[...]
> In a next step, we can do the checkout in parallel if there is nothing to assume
> to lead to trouble. i.e. update strategy is checkout and the submodule is
> in a clean state at the tip of a branch.

Somebody tried parallelizing file output during checkout and found it
sped up the checkout on some systems by a small amount.  But then
later operations to read back the files were much slower.  It seems
that the output pattern affected the layout of the files on disk in a
bad way.

I'm not too afraid.  Just saying that the benefit of parallel checkout
would be something to measure.

A bigger worry is that checkout might be I/O bound and not benefit
much from parallelism.

> All problems which need to be solved by the user should be presented in
> a sequential way, i.e. present one problem and then full stop.
> That seems to be more encouraging as if we'd throw a "Here are a dozen
> broken submodule repositories which we expect the user fixes up".

It depends on the problem --- sometimes people want to see a list of
errors and fix them all (hence tools like "make -k").  I agree that
stop-on-first-error is a good default and that it's not worth worrying
about introducing --keep-going until someone asks for it.

Thanks for your thoughtfulness,
Jonathan

next prev parent reply	other threads:[~2016-02-20  0:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-19 18:17 [PATCHv14 0/7] Expose submodule parallelism to the user Stefan Beller
2016-02-19 18:17 ` [PATCHv14 1/7] submodule-config: keep update strategy around Stefan Beller
2016-02-19 18:17 ` [PATCHv14 2/7] submodule-config: drop check against NULL Stefan Beller
2016-02-19 18:17 ` [PATCHv14 3/7] fetching submodules: respect `submodule.fetchJobs` config option Stefan Beller
2016-02-19 18:17 ` [PATCHv14 4/7] submodule update: direct error message to stderr Stefan Beller
2016-02-19 18:17 ` [PATCHv14 5/7] git submodule update: have a dedicated helper for cloning Stefan Beller
2016-02-19 23:07   ` Jonathan Nieder
2016-02-19 23:49     ` Stefan Beller
2016-02-20  0:32       ` Jonathan Nieder [this message]
2016-02-19 18:17 ` [PATCHv14 6/7] submodule update: expose parallelism to the user Stefan Beller
2016-02-19 18:17 ` [PATCHv14 7/7] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2016-02-19 21:04 ` [PATCHv14 0/7] Expose submodule parallelism to the user Junio C Hamano
2016-02-23 23:33   ` Junio C Hamano
2016-02-23 23:42     ` Stefan Beller
2016-02-23 23:44       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160220003224.GN28749@google.com \
    --to=jrnieder@gmail.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.