* [PATCH] Enable parallelism in git submodule update.
@ 2012-07-27 18:37 Stefan Zager
2012-07-27 21:38 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Stefan Zager @ 2012-07-27 18:37 UTC
To: git; +Cc: gitster, jens.lehmann, hvoigt
The --jobs parameter may be used to set the degree of per-submodule
parallel execution.
Signed-off-by: Stefan Zager <szager@google.com>
---
Documentation/git-submodule.txt | 8 +++++++-
git-submodule.sh | 23 ++++++++++++++++++++++-
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index fbbbcb2..34f81fb 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
'git submodule' [--quiet] init [--] [<path>...]
'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
- [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+ [--reference <repository>] [--merge] [--recursive]
+ [-j|--jobs [jobs]] [--] [<path>...]
'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
[commit] [--] [<path>...]
'git submodule' [--quiet] foreach [--recursive] <command>
@@ -147,6 +148,11 @@ If the submodule is not yet initialized, and you just want to use the
setting as stored in .gitmodules, you can automatically initialize the
submodule with the `--init` option.
+
+By default, each submodule is treated serially. You may specify a degree of
+parallel execution with the --jobs flag. If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs are
+run in parallel.
++
If `--recursive` is specified, this command will recurse into the
registered submodules, and update any nested submodules within.
diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..761420a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
or: $dashless [--quiet] init [--] [<path>...]
- or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+ or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
or: $dashless [--quiet] foreach [--recursive] <command>
or: $dashless [--quiet] sync [--] [<path>...]"
@@ -473,6 +473,7 @@ cmd_update()
{
# parse $args after "submodule ... update".
orig_flags=
+ jobs="1"
while test $# -ne 0
do
case "$1" in
@@ -491,6 +492,20 @@ cmd_update()
-r|--rebase)
update="rebase"
;;
+ -j|--jobs)
+ case "$2" in
+ ''|-*)
+ jobs="0"
+ ;;
+ *)
+ jobs="$2"
+ shift
+ ;;
+ esac
+ # Don't preserve this arg.
+ shift
+ continue
+ ;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
@@ -529,6 +544,12 @@ cmd_update()
cmd_init "--" "$@" || return
fi
+ if test "$jobs" != "1"
+ then
+ module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args
+ return
+ fi
+
cloned_modules=
module_list "$@" | {
err=
--
1.7.11.rc2
* Re: [PATCH] Enable parallelism in git submodule update.
2012-07-27 18:37 [PATCH] Enable parallelism in git submodule update Stefan Zager
@ 2012-07-27 21:38 ` Junio C Hamano
[not found] ` <CAHOQ7J_jYAe7r1q6Cg9OJb8f+79UfS=JfRk9NrS4R4a+oLM8LA@mail.gmail.com>
2012-07-28 10:22 ` Heiko Voigt
2012-07-29 15:37 ` [PATCH] Enable parallelism in git submodule update Jens Lehmann
2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2012-07-27 21:38 UTC
To: Stefan Zager; +Cc: git, jens.lehmann, hvoigt
Stefan Zager <szager@google.com> writes:
> + module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args
Capital-P option to xargs is not even in POSIX, no?
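For reference, one way to bound parallelism without `xargs -P` is to start jobs with `&` and collect them with `wait`, both of which POSIX sh provides. The following is only a minimal sketch under stated assumptions: `run_batched` is a hypothetical helper (nothing by that name exists in git-submodule.sh), the job count is assumed to be at least 1, and each input line is assumed to hold exactly one path.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-submodule.sh.
# Usage: <producer> | run_batched <max_jobs> <command> [<args>...]
# Runs "<command> <args> <line>" once per input line, with at most
# <max_jobs> children in flight, using only "&" and "wait".
run_batched () {
	max_jobs=$1
	shift
	running=0
	status=0
	pids=
	while IFS= read -r item
	do
		# Detach stdin so the child cannot consume the remaining lines.
		"$@" "$item" </dev/null &
		pids="$pids $!"
		running=$((running + 1))
		if test "$running" -ge "$max_jobs"
		then
			# The batch is full: wait for every job in it.
			for pid in $pids
			do
				wait "$pid" || status=1
			done
			pids=
			running=0
		fi
	done
	# Collect whatever is left from the last, partial batch.
	for pid in $pids
	do
		wait "$pid" || status=1
	done
	return $status
}
```

If something like this were wired in, the pipeline in the patch could end in `| run_batched "$jobs" git submodule update` instead of the `xargs` call. Batching is cruder than `xargs -P`, since one slow job delays the start of the next batch, and the sketch does not implement the "0 means unlimited" case.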
* Re: [PATCH] Enable parallelism in git submodule update.
2012-07-27 18:37 [PATCH] Enable parallelism in git submodule update Stefan Zager
2012-07-27 21:38 ` Junio C Hamano
@ 2012-07-28 10:22 ` Heiko Voigt
2012-07-28 12:19 ` [PATCH] cleanup argument passing in submodule status command Heiko Voigt
2012-07-29 15:37 ` [PATCH] Enable parallelism in git submodule update Jens Lehmann
2 siblings, 1 reply; 12+ messages in thread
From: Heiko Voigt @ 2012-07-28 10:22 UTC
To: Stefan Zager; +Cc: git, gitster, jens.lehmann
Hi Stefan,
neat patch. See below for a few notes.
On Fri, Jul 27, 2012 at 11:37:34AM -0700, Stefan Zager wrote:
> diff --git a/git-submodule.sh b/git-submodule.sh
> index dba4d39..761420a 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -491,6 +492,20 @@ cmd_update()
> -r|--rebase)
> update="rebase"
> ;;
> + -j|--jobs)
> + case "$2" in
> + ''|-*)
> + jobs="0"
> + ;;
> + *)
> + jobs="$2"
> + shift
> + ;;
> + esac
> + # Don't preserve this arg.
> + shift
> + continue
> + ;;
> --reference)
> case "$2" in '') usage ;; esac
> reference="--reference=$2"
> @@ -529,6 +544,12 @@ cmd_update()
> cmd_init "--" "$@" || return
> fi
>
> + if test "$jobs" != "1"
> + then
> + module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args
I do not see orig_args set anywhere in git-submodule.sh. It seems the
existing use of it in cmd_status() is a leftover from commit
98dbe63, when this variable was renamed to orig_flags.
I will follow up with a patch for that location.
Another problem here is the passing of arguments. Have a look at
a7eff1a8 to see how this was solved for other locations.
The next thing I noticed is that the parallelism is not recursive. You
drop the option and only execute the first level in parallel. How about
using the number of modules named by the arguments left in $@ as an
indicator of whether you need to fork parallel execution or not? If there
is exactly one, you do the update; if there are more, you do the parallel
thing. That way you can just keep passing the --jobs flag to the
subprocesses.
The next question to solve is UI: Since the output lines of the parallel
update jobs will be mixed, we need some way to distinguish them. Imagine
one of the updates fails somewhere: how do we find out which one it was?
Two possible solutions come to my mind:
1. Prefix each line with a job number. This way you can distinguish
which process outputted what and still have immediate feedback.
2. Cache the output (to stderr and stdout) of each job and output it
once one job is done. I imagine this needs some infrastructure which
we need to implement. We already have some ideas how to collect such
output in C here[1].
I would prefer solution 2, since the output of 1 will be hard to read, but
I guess we could start with 1 and then move over to 2 later on.
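The two options above can be sketched in plain shell. This is only an illustration under stated assumptions: `run_prefixed` and `run_buffered` are hypothetical names, not existing helpers in git-submodule.sh, and `mktemp` is assumed to be available even though POSIX does not specify it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git-submodule.sh.

# Option 1: tag every output line of a job with a label as it appears.
# The job's own exit status is lost in the pipeline in this sketch.
run_prefixed () {
	label=$1
	shift
	"$@" 2>&1 |
	while IFS= read -r line
	do
		printf '[%s] %s\n' "$label" "$line"
	done
}

# Option 2: collect stdout and stderr of a job in a temporary file and
# print it in one piece once the job has finished.
run_buffered () {
	label=$1
	shift
	buf=$(mktemp) || return 1
	"$@" >"$buf" 2>&1
	rc=$?
	echo "== $label (exit $rc) =="
	cat "$buf"
	rm -f "$buf"
	return $rc
}
```

Each helper would be started once per submodule, for example as `run_buffered "$sm_path" git submodule update "$sm_path" &`. With option 2 the output of two jobs can still interleave if both finish at the same moment, so a real implementation would probably need to serialize the final print.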
Cheers Heiko
[1] http://article.gmane.org/gmane.comp.version-control.git/197747
* [PATCH] cleanup argument passing in submodule status command
2012-07-28 10:22 ` Heiko Voigt
@ 2012-07-28 12:19 ` Heiko Voigt
2012-07-29 6:22 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Heiko Voigt @ 2012-07-28 12:19 UTC
To: gitster; +Cc: git, jens.lehmann, Stefan Zager
In commit 98dbe63 the variable $orig_args was renamed to $orig_flags.
One location in cmd_status() was missed.
Note: This is a code cleanup and does not fix any bugs. As a side effect
the variables containing the parsed flags to "git submodule status" are
passed down recursively. So everything was already behaving as expected.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
git-submodule.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..3a3f0a4 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -961,7 +961,7 @@ cmd_status()
prefix="$displaypath/"
clear_local_git_env
cd "$sm_path" &&
- eval cmd_status "$orig_args"
+ eval cmd_status "$orig_flags"
) ||
die "$(eval_gettext "Failed to recurse into submodule path '\$sm_path'")"
fi
--
1.7.12.rc0.23.g3c7cae0
* Re: [PATCH] cleanup argument passing in submodule status command
2012-07-28 12:19 ` [PATCH] cleanup argument passing in submodule status command Heiko Voigt
@ 2012-07-29 6:22 ` Junio C Hamano
2012-07-29 15:29 ` Jens Lehmann
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2012-07-29 6:22 UTC
To: Heiko Voigt; +Cc: git, jens.lehmann, Stefan Zager
Heiko Voigt <hvoigt@hvoigt.net> writes:
> Note: This is a code cleanup and does not fix any bugs. As a side effect
> the variables containing the parsed flags to "git submodule status" are
> passed down recursively. So everything was already behaving as expected.
If that is the case, shouldn't we stop passing anything down, if we
want it to be a "clean-up only, no behaviour changes" patch? While
at it, we may want to kill that code to accumulate the original
options in orig_flags because we haven't been using the variable.
We _know_ $orig_args has been empty, i.e. the code has been working
fine with only cmd_status there. Nobody has tried what happens when
we pass the original arguments to cmd_status on that line. The
patch changes the behaviour of the code; it makes the command line
parsing "while" loop to run again, and if the code that accumulates
original options in orig_flags have been buggy, now that bug will be
exposed.
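The difference can be seen in isolation with a small stand-in. This is a sketch, not the real code: `show_args` is a hypothetical replacement for cmd_status, and the value given to `orig_flags` is only an assumed shape for the accumulated, shell-quoted flags.

```shell
#!/bin/sh
# Stand-in for cmd_status: report how many arguments were received.
show_args () {
	echo "argc=$#"
}

orig_args=                          # never filled in, as before the patch
orig_flags="'--cached' '--quiet'"   # assumed shape of the accumulated flags

eval show_args "$orig_args"         # same as a bare "show_args"; prints argc=0
eval show_args "$orig_flags"        # re-parses the quoted flags; prints argc=2
```

With the empty variable the recursive call gets no arguments at all, so the option-parsing loop has nothing to do. With `orig_flags` the loop runs again on whatever was accumulated.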
> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
> ---
> git-submodule.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index dba4d39..3a3f0a4 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -961,7 +961,7 @@ cmd_status()
> prefix="$displaypath/"
> clear_local_git_env
> cd "$sm_path" &&
> - eval cmd_status "$orig_args"
> + eval cmd_status "$orig_flags"
> ) ||
> die "$(eval_gettext "Failed to recurse into submodule path '\$sm_path'")"
> fi
* Re: [PATCH] cleanup argument passing in submodule status command
2012-07-29 6:22 ` Junio C Hamano
@ 2012-07-29 15:29 ` Jens Lehmann
2012-07-29 21:57 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jens Lehmann @ 2012-07-29 15:29 UTC
To: Junio C Hamano; +Cc: Heiko Voigt, git, Stefan Zager
Am 29.07.2012 08:22, schrieb Junio C Hamano:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
>
>> Note: This is a code cleanup and does not fix any bugs. As a side effect
>> the variables containing the parsed flags to "git submodule status" are
>> passed down recursively. So everything was already behaving as expected.
>
> If that is the case, shouldn't we stop passing anything down, if we
> want it to be a "clean-up only, no behaviour changes" patch? While
> at it, we may want to kill that code to accumulate the original
> options in orig_flags because we haven't been using the variable.
>
> We _know_ $orig_args has been empty, i.e. the code has been working
> fine with only cmd_status there. Nobody has tried what happens when
> we pass the original arguments to cmd_status on that line.
I tried today. Before this change no arguments got passed down and
afterwards they are (but just the arguments, no submodule paths
were passed on in either case; which is what Kevin fixed in the
commit Heiko referenced). Three arguments are allowed for "git
submodule status":
--recursive:
It doesn't matter if we pass that on or not because $recursive is
reused when "eval cmd_status" is executed.
--quiet:
Same as recursive, GIT_QUIET is set the first time and then reused
in the recursion.
--cached:
This was dropped when recursing into submodules but isn't anymore
with Heiko's change, so we do have a change in behavior here.
> The
> patch changes the behaviour of the code; it makes the command line
> parsing "while" loop to run again, and if the code that accumulates
> original options in orig_flags have been buggy, now that bug will be
> exposed.
Hmm, when --cached is used together with --recursive, I would expect
it to show the commit stored in the index for the deeper submodules
too (and not magically switch to show their HEAD again after the
first level of submodules). To me this looks like a bug which Kevin
accidentally introduced and nobody noticed and/or reported until now.
So I'd vote for making this a bugfix patch for "git submodule status
--cached --recursive" (and would love to see a test for it ;-).
>> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
>> ---
>> git-submodule.sh | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index dba4d39..3a3f0a4 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -961,7 +961,7 @@ cmd_status()
>> prefix="$displaypath/"
>> clear_local_git_env
>> cd "$sm_path" &&
>> - eval cmd_status "$orig_args"
>> + eval cmd_status "$orig_flags"
>> ) ||
>> die "$(eval_gettext "Failed to recurse into submodule path '\$sm_path'")"
>> fi
* Re: [PATCH] cleanup argument passing in submodule status command
2012-07-29 15:29 ` Jens Lehmann
@ 2012-07-29 21:57 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2012-07-29 21:57 UTC
To: Jens Lehmann; +Cc: Heiko Voigt, git, Stefan Zager
Jens Lehmann <Jens.Lehmann@web.de> writes:
> I tried today. Before this change no arguments got passed down and
> afterwards they are (but just the arguments, no submodule paths
> were passed on in either case; which is what Kevin fixed in the
> commit Heiko referenced). Three arguments are allowed for "git
> submodule status":
>
> --recursive:
> It doesn't matter if we pass that on or not because $recursive is
> reused when "eval cmd_status" is executed.
>
> --quiet:
> Same as recursive, GIT_QUIET is set the first time and then reused
> in the recursion.
>
> --cached:
> This was dropped when recursing into submodules but isn't anymore
> with Heiko's change, so we do have a change in behavior here.
> ...
> Hmm, when --cached is used together with --recursive, I would expect
> it to show the commit stored in the index for the deeper submodules
> too (and not magically switch to show their HEAD again after the
> first level of submodules). To me this looks like a bug which Kevin
> accidentally introduced and nobody noticed and/or reported until now.
>
> So I'd vote for making this a bugfix patch for "git submodule status
> --cached --recursive" (and would love to see a test for it ;-).
Yeah, I am not opposed to a "fix". I just wanted it to be labelled
as such, and analysed correctly.
And with test ;-)
Thanks.
* Re: [PATCH] Enable parallelism in git submodule update.
2012-07-27 18:37 [PATCH] Enable parallelism in git submodule update Stefan Zager
2012-07-27 21:38 ` Junio C Hamano
2012-07-28 10:22 ` Heiko Voigt
@ 2012-07-29 15:37 ` Jens Lehmann
2012-11-03 19:07 ` Jens Lehmann
2 siblings, 1 reply; 12+ messages in thread
From: Jens Lehmann @ 2012-07-29 15:37 UTC
To: Stefan Zager; +Cc: git, gitster, hvoigt
Am 27.07.2012 20:37, schrieb Stefan Zager:
> The --jobs parameter may be used to set the degree of per-submodule
> parallel execution.
I think this is a sound idea, but it would be good to see some
actual measurements. What are the performance numbers with and
without this change? Which cases do benefit and are there some
which run slower when run in parallel?
* Re: [PATCH] Enable parallelism in git submodule update.
2012-07-29 15:37 ` [PATCH] Enable parallelism in git submodule update Jens Lehmann
@ 2012-11-03 19:07 ` Jens Lehmann
0 siblings, 0 replies; 12+ messages in thread
From: Jens Lehmann @ 2012-11-03 19:07 UTC
To: Stefan Zager; +Cc: git, gitster, hvoigt
Am 29.07.2012 17:37, schrieb Jens Lehmann:
> Am 27.07.2012 20:37, schrieb Stefan Zager:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
>
> I think this is a sound idea, but it would be good to see some
> actual measurements. What are the performance numbers with and
> without this change? Which cases do benefit and are there some
> which run slower when run in parallel?
ping?