git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Disable useless check_for_new_submodule_commits() by default
@ 2011-09-09  1:56 Junio C Hamano
  2011-09-09 16:29 ` Jens Lehmann
  0 siblings, 1 reply; 2+ messages in thread
From: Junio C Hamano @ 2011-09-09  1:56 UTC (permalink / raw)
  To: git; +Cc: Jens Lehmann, Heiko Voigt

Imagine a project with 20k paths with a history 250k commits deep, without
any submodule.

Imagine another project with the same depth of history but with 1k paths,
among which there are 200 submodules.

Further, imagine fetching from one of the above repositories into a
repository that is very behind, and updating many remote tracking
branches.  Now, think about what check_for_new_submodule_commits() in
submodule.c that is called from update_local_ref() in builtin/fetch.c
would do.

For each updated remote tracking branch (or anything that is not a tag),
the problem function will run the equivalent of:

	git log --raw $new --not --all

which would mean 250k rounds of diff-tree to enumerate the submodules that
may have been updated, and it does it for each and every refs outside the
refs/tags hierarchy.

Presumably, this is so that it only has to actually fetch in the submodule
"on demand", but even if in a project _with_ submodules (i.e. the latter
example above), this is simply not acceptable. You could just enumerate
those 200 submodules at the tip that _might_ matter and that would be
million times cheaper. To add insult to injury, this "on demand" behaviour
is on by default, which hurts projects without any submodules (i.e. the
former example above) a lot.

In short, if "on demand" check is million times more expensive than
actually doing it, the check does not have any value.

In the longer term, people who want to have the on-demand behaviour need
to come up with a cheaper way to determine if it is necessary to recurse
into submodules by fixing check_for_new_submodule_commits(), but until
that happens, we should disable submodule recursion by default unless the
user explicitly asks for it from the command line or from the configuration.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This is meant for 1.7.6.X maintenance track to fix the huge regression
   caused by 88a2197 (fetch/pull: recurse into submodules when necessary,
   2011-03-06).  I would very much appreciate a simpler and trivially
   correct patch as a counter-proposal that makes sure that the expensive
   check_for_new_submodule_commits() and fetch_populated_submodules()
   functions are never called when the user does not have any command line
   option or configuration variable to tell otherwise. I suspect that this
   patch may be disabling more things than absolutely necessary.

   I also suspect that it might be just a matter of finding any and all
   comparison of the form:

	if (recurse != ON && recurse != OFF)

   that was present before the on-demand stuff and replace that with
        
	if (recurse == ON_DEMAND)

   but I didn't look very closely. At least this patch is much less error
   prone for the purpose of not triggering the expensive and pointless
   check ;-)

 builtin/fetch.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 93c9938..71232c1 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,7 +29,7 @@ enum {
 };
 
 static int all, append, dry_run, force, keep, multiple, prune, update_head_ok, verbosity;
-static int progress, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
+static int progress, recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int tags = TAGS_DEFAULT;
 static const char *depth;
 static const char *upload_pack;

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] Disable useless check_for_new_submodule_commits() by default
  2011-09-09  1:56 [PATCH] Disable useless check_for_new_submodule_commits() by default Junio C Hamano
@ 2011-09-09 16:29 ` Jens Lehmann
  0 siblings, 0 replies; 2+ messages in thread
From: Jens Lehmann @ 2011-09-09 16:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Heiko Voigt

Am 09.09.2011 03:56, schrieb Junio C Hamano:
> Imagine a project with 20k paths with a history 250k commits deep, without
> any submodule.
> 
> Imagine another project with the same depth of history but with 1k paths,
> among which there are 200 submodules.
> 
> Further, imagine fetching from one of the above repositories into a
> repository that is very behind, and updating many remote tracking
> branches.  Now, think about what check_for_new_submodule_commits() in
> submodule.c that is called from update_local_ref() in builtin/fetch.c
> would do.
> 
> For each updated remote tracking branch (or anything that is not a tag),
> the problem function will run the equivalent of:
> 
> 	git log --raw $new --not --all
> 
> which would mean 250k rounds of diff-tree to enumerate the submodules that
> may have been updated, and it does it for each and every refs outside the
> refs/tags hierarchy.
>
> Presumably, this is so that it only has to actually fetch in the submodule
> "on demand", but even if in a project _with_ submodules (i.e. the latter
> example above), this is simply not acceptable.

Definitely.

> You could just enumerate
> those 200 submodules at the tip that _might_ matter and that would be
> million times cheaper.

But that trick won't do it. I thought about that too when I implemented this
feature, but we need to look in all new commits (e.g. a submodule is not
present anymore in the new tip but is changed in the preceding but not yet
fetched commits, then it still has to be checked if it has the necessary
commits locally and they need to be fetched if that is not the case).

> To add insult to injury, this "on demand" behaviour
> is on by default, which hurts projects without any submodules (i.e. the
> former example above) a lot.

Right, I'll post a patch disabling that check completely when a project has
no submodules.

> In short, if "on demand" check is million times more expensive than
> actually doing it, the check does not have any value.
> 
> In the longer term, people who want to have the on-demand behaviour need
> to come up with a cheaper way to determine if it is necessary to recurse
> into submodules by fixing check_for_new_submodule_commits(), but until
> that happens, we should disable submodule recursion by default unless the
> user explicitly asks for it from the command line or from the configuration.

Ok, I'll see if I can come up with a solution for the performance regression.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-09-09 16:30 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-09  1:56 [PATCH] Disable useless check_for_new_submodule_commits() by default Junio C Hamano
2011-09-09 16:29 ` Jens Lehmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).