* [PATCH] fetch: avoid quadratic loop checking for updated submodules
@ 2011-09-12 19:56 Jeff King
2011-09-12 20:21 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Jeff King @ 2011-09-12 19:56 UTC (permalink / raw)
To: Jens Lehmann; +Cc: Junio C Hamano, git, git-dev
Recent versions of git can be slow to fetch repositories with a
large number of refs (or when they already have a large
number of refs). For example, GitHub makes pull-requests
available as refs, which can lead to a large number of
available refs. This slowness goes away when submodule
recursion is turned off:
$ git ls-remote git://github.com/rails/rails.git | wc -l
3034
[this takes ~10 seconds of CPU time to complete]
git fetch --recurse-submodules=no \
git://github.com/rails/rails.git "refs/*:refs/*"
[this still isn't done after 10 _minutes_ of pegging the CPU]
git fetch \
git://github.com/rails/rails.git "refs/*:refs/*"
You can produce a quicker and simpler test case like this:
doit() {
head=`git rev-parse HEAD`
for i in `seq 1 $1`; do
echo $head refs/heads/ref$i
done >.git/packed-refs
echo "==> $1"
rm -rf dest
git init -q --bare dest &&
(cd dest && time git.compile fetch -q .. refs/*:refs/*)
}
rm -rf repo
git init -q repo && cd repo &&
>file && git add file && git commit -q -m one
doit 100
doit 200
doit 400
doit 800
doit 1600
doit 3200
Which yields timings like:
# refs seconds of CPU
100 0.06
200 0.24
400 0.95
800 3.39
1600 13.66
3200 54.09
Notice that although the number of refs doubles in each
trial, the CPU time spent quadruples.
The problem is that the submodule recursion code works
something like:
- for each ref we fetch
- for each commit in git rev-list $new_sha1 --not --all
- add modified submodules to list
- fetch any newly referenced submodules
But that means if we fetch N refs, we start N revision
walks. Worse, because we use "--all", the number of refs we
must process that constitute "--all" keeps growing, too. And
you end up doing O(N^2) ref resolutions.
Instead, this patch structures the code like this:
- for each sha1 we already have
- add $old_sha1 to list $old
- for each ref we fetch
- add $new_sha1 to list $new
- for each commit in git rev-list $new --not $old
- add modified submodules to list
- fetch any newly referenced submodules
This yields timings like:
# refs seconds of CPU
100 0.00
200 0.04
400 0.04
800 0.10
1600 0.21
3200 0.39
Note that the amount of effort doubles as the number of refs
doubles. Similarly, the fetch of rails.git takes about as
much time as it does with --recurse-submodules=no.
Signed-off-by: Jeff King <peff@peff.net>
---
submodule.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 71 insertions(+), 5 deletions(-)
diff --git a/submodule.c b/submodule.c
index 7a76edf..00aeb71 100644
--- a/submodule.c
+++ b/submodule.c
@@ -8,12 +8,17 @@
#include "diffcore.h"
#include "refs.h"
#include "string-list.h"
+#include "sha1-array.h"
static struct string_list config_name_for_path;
static struct string_list config_fetch_recurse_submodules_for_name;
static struct string_list config_ignore_for_name;
static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
static struct string_list changed_submodule_paths;
+static int initialized_fetch_ref_tips;
+static struct sha1_array ref_tips_before_fetch;
+static struct sha1_array ref_tips_after_fetch;
+
/*
* The following flag is set if the .gitmodules file is unmerged. We then
* disable recursion for all submodules where .git/config doesn't have a
@@ -474,16 +479,71 @@ static void submodule_collect_changed_cb(struct diff_queue_struct *q,
}
}
+static int add_sha1_to_array(const char *ref, const unsigned char *sha1,
+ int flags, void *data)
+{
+ sha1_array_append(data, sha1);
+ return 0;
+}
+
void check_for_new_submodule_commits(unsigned char new_sha1[20])
{
+ if (!initialized_fetch_ref_tips) {
+ for_each_ref(add_sha1_to_array, &ref_tips_before_fetch);
+ initialized_fetch_ref_tips = 1;
+ }
+
+ sha1_array_append(&ref_tips_after_fetch, new_sha1);
+}
+
+struct argv_array {
+ const char **argv;
+ unsigned int argc;
+ unsigned int alloc;
+};
+
+static void init_argv(struct argv_array *array)
+{
+ array->argv = NULL;
+ array->argc = 0;
+ array->alloc = 0;
+}
+
+static void push_argv(struct argv_array *array, const char *value)
+{
+ ALLOC_GROW(array->argv, array->argc + 2, array->alloc);
+ array->argv[array->argc++] = xstrdup(value);
+ array->argv[array->argc] = NULL;
+}
+
+static void clear_argv(struct argv_array *array)
+{
+ int i;
+ for (i = 0; i < array->argc; i++)
+ free((char **)array->argv[i]);
+ free(array->argv);
+ init_argv(array);
+}
+
+static void add_sha1_to_argv(const unsigned char sha1[20], void *data)
+{
+ push_argv(data, sha1_to_hex(sha1));
+}
+
+static void calculate_changed_submodule_paths() {
struct rev_info rev;
struct commit *commit;
- const char *argv[] = {NULL, NULL, "--not", "--all", NULL};
- int argc = ARRAY_SIZE(argv) - 1;
+ struct argv_array argv;
init_revisions(&rev, NULL);
- argv[1] = xstrdup(sha1_to_hex(new_sha1));
- setup_revisions(argc, argv, &rev, NULL);
+ init_argv(&argv);
+ push_argv(&argv, "--"); /* argv[0] program name */
+ sha1_array_for_each_unique(&ref_tips_after_fetch,
+ add_sha1_to_argv, &argv);
+ push_argv(&argv, "--not");
+ sha1_array_for_each_unique(&ref_tips_before_fetch,
+ add_sha1_to_argv, &argv);
+ setup_revisions(argv.argc, argv.argv, &rev, NULL);
if (prepare_revision_walk(&rev))
die("revision walk setup failed");
@@ -507,7 +567,11 @@ void check_for_new_submodule_commits(unsigned char new_sha1[20])
parent = parent->next;
}
}
- free((char *)argv[1]);
+
+ clear_argv(&argv);
+ sha1_array_clear(&ref_tips_before_fetch);
+ sha1_array_clear(&ref_tips_after_fetch);
+ initialized_fetch_ref_tips = 0;
}
int fetch_populated_submodules(int num_options, const char **options,
@@ -541,6 +605,8 @@ int fetch_populated_submodules(int num_options, const char **options,
cp.git_cmd = 1;
cp.no_stdin = 1;
+ calculate_changed_submodule_paths();
+
for (i = 0; i < active_nr; i++) {
struct strbuf submodule_path = STRBUF_INIT;
struct strbuf submodule_git_dir = STRBUF_INIT;
--
1.7.6.10.g62f04
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 19:56 [PATCH] fetch: avoid quadratic loop checking for updated submodules Jeff King
@ 2011-09-12 20:21 ` Junio C Hamano
2011-09-13 19:34 ` Jens Lehmann
2011-09-12 21:25 ` Junio C Hamano
2011-09-13 16:40 ` Christian Couder
2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2011-09-12 20:21 UTC (permalink / raw)
To: Jeff King; +Cc: Jens Lehmann, git, git-dev
Jeff King <peff@peff.net> writes:
> Instead, this patch structures the code like this:
Yup, I agree that's the right way to do the other half of the issue.
Will queue for 1.7.6.X series, but I've already pushed out 1.7.6.3
so... ;-)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 20:21 ` Junio C Hamano
@ 2011-09-13 19:34 ` Jens Lehmann
2011-09-14 18:26 ` Jens Lehmann
0 siblings, 1 reply; 17+ messages in thread
From: Jens Lehmann @ 2011-09-13 19:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, git, git-dev
Am 12.09.2011 22:21, schrieb Junio C Hamano:
> Jeff King <peff@peff.net> writes:
>
>> Instead, this patch structures the code like this:
>
> Yup, I agree that's the right way to do the other half of the issue.
Ack from me too! I tested it on the repo with 3k refs and the time went
down from 142s to 1s (when applied to 3793ac56b4, as later versions of
master contain my other half which would skip Peff's code).
On current master including my other half this takes 0.90s, while running
with Peff's code on top of 3793ac56b4 it takes .96s. That is 6 hundreds
of a second (7%) extra for not having to worry if one must run "git fetch
--recurse-submodules" or not.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-13 19:34 ` Jens Lehmann
@ 2011-09-14 18:26 ` Jens Lehmann
0 siblings, 0 replies; 17+ messages in thread
From: Jens Lehmann @ 2011-09-14 18:26 UTC (permalink / raw)
To: git; +Cc: Jens Lehmann, Junio C Hamano, Jeff King, git-dev, Martin Fick
Am 13.09.2011 21:34, schrieb Jens Lehmann:
> Am 12.09.2011 22:21, schrieb Junio C Hamano:
>> Jeff King <peff@peff.net> writes:
>>
>>> Instead, this patch structures the code like this:
>>
>> Yup, I agree that's the right way to do the other half of the issue.
>
> Ack from me too! I tested it on the repo with 3k refs and the time went
> down from 142s to 1s (when applied to 3793ac56b4, as later versions of
> master contain my other half which would skip Peff's code).
>
> On current master including my other half this takes 0.90s, while running
> with Peff's code on top of 3793ac56b4 it takes .96s. That is 6 hundreds
> of a second (7%) extra for not having to worry if one must run "git fetch
> --recurse-submodules" or not.
Just for the record: Martin Fick was so kind to run Peff's fix (without my
half) on his 100k refs repo that showed this regression. Here are the results
of some test runs:
1.7.7.rc0.189.gab72a
10m8.133s 9m7.971s 8m16.600s 13m57.821s
8m34.974s 8m41.527s
1.7.7.rc0.189.gab72a --no-recurse-submodules
10m43.833s 7m41.283s 8m17.889s 8m4.549s
8m12.668s 7m59.180s
The fastest runs are 8% apart, which is pretty much the same slowdown as in
the 3k ref repo. Looks like we do have O(n) now :-)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 19:56 [PATCH] fetch: avoid quadratic loop checking for updated submodules Jeff King
2011-09-12 20:21 ` Junio C Hamano
@ 2011-09-12 21:25 ` Junio C Hamano
2011-09-12 22:49 ` Jeff King
2011-09-13 16:40 ` Christian Couder
2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2011-09-12 21:25 UTC (permalink / raw)
To: Jeff King; +Cc: Jens Lehmann, git, git-dev
Jeff King <peff@peff.net> writes:
> +struct argv_array {
> + const char **argv;
> + unsigned int argc;
> + unsigned int alloc;
> +};
> +
> ...
> +static void calculate_changed_submodule_paths() {
Locally fixed while queueing; no need to resend:
static void calculate_changed_submodule_paths(void)
{
> struct rev_info rev;
> struct commit *commit;
> - const char *argv[] = {NULL, NULL, "--not", "--all", NULL};
> - int argc = ARRAY_SIZE(argv) - 1;
> + struct argv_array argv;
>
> init_revisions(&rev, NULL);
> - argv[1] = xstrdup(sha1_to_hex(new_sha1));
> - setup_revisions(argc, argv, &rev, NULL);
> + init_argv(&argv);
> + push_argv(&argv, "--"); /* argv[0] program name */
> + sha1_array_for_each_unique(&ref_tips_after_fetch,
> + add_sha1_to_argv, &argv);
> + push_argv(&argv, "--not");
> + sha1_array_for_each_unique(&ref_tips_before_fetch,
> + add_sha1_to_argv, &argv);
> + setup_revisions(argv.argc, argv.argv, &rev, NULL);
I initially thought "eh, what if we have very many refs that may not fit
on the command line", but this is running the internal rev-list, so there
is no such issue.
I however have to wonder if the use of object flags done by this revision
walking need to be cleared to prevent them from interfering with the
ancestry walking done in find_common(). Shouldn't this be done in a
subprocess "rev-list" that is fed these positive and negative refs from
its standard input?
The patch does not make things worse in any sense, so I'll queue it but I
have this suspicion that fetching in a superproject that uses submodules
may be broken in the versions of Git with these internal revision walking.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 21:25 ` Junio C Hamano
@ 2011-09-12 22:49 ` Jeff King
2011-09-12 23:06 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Jeff King @ 2011-09-12 22:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jens Lehmann, git, git-dev
On Mon, Sep 12, 2011 at 02:25:38PM -0700, Junio C Hamano wrote:
> > +static void calculate_changed_submodule_paths() {
>
> Locally fixed while queueing; no need to resend:
>
> static void calculate_changed_submodule_paths(void)
> {
Oops, thanks. Old habits die hard, I guess. :)
> I initially thought "eh, what if we have very many refs that may not fit
> on the command line", but this is running the internal rev-list, so there
> is no such issue.
>
> I however have to wonder if the use of object flags done by this revision
> walking need to be cleared to prevent them from interfering with the
> ancestry walking done in find_common(). Shouldn't this be done in a
> subprocess "rev-list" that is fed these positive and negative refs from
> its standard input?
Maybe. I haven't noticed any such breakage. I don't know if it's because
it's not there, or we don't have sufficient test cases. But if it were a
problem, it would be pretty severe, wouldn't it? The traversal is done
whether you actually have submodules or not, so lots of people should be
seeing it.
I tried to keep the patch as minimal as possible. If I were writing
from scratch, I think my strategy would be to actually start a
traversal, and mark the commit objects themselves as we find them. I.e.,
the two steps would be:
- start a traversal
- for each existing ref
- lookup_commit_reference, mark as SEEN
- for each new commit we fetched
- walk commits from $new_sha1 until we hit SEEN, marking each as
SEEN as we go.
IOW, instead of keeping "before" and "after" lists, just mark the
commits. But that perhaps exacerbates the protential problem you
mentioned (though we could maybe just use a different flag than SEEN).
To be honest, the whole submodule recursion thing seems a bit confusing
to me. We walk every commit we just fetched looking for it to introduce
a change in a submodule. But we don't actually remember the sha1 it
changed to. And even if we did, we can't just ask a remote for a sha1,
anyway. So with a set of changes like:
[assume submodule at commit A, superproject at commit B]
1. Make commit C in submodule repo.
2. Make commit D in superproject repo.
3. Make commit E in submodule repo.
4. Make commit F in superproject repo.
what does it buy us to find out that the submodule changed from "A" to
"C"? We can't actually fetch it. We can only fetch the tips of the
submodule and hope that they include everything we wanted (i.e., both C
and E; which might not be the case of E rewound and is not a descendant
of C).
So since we must accept that we can't necessarily get every intermediate
step, I wonder if we are simply better off diffing the "before" and
"after" state of a particular ref, rather than traversing. It's way
cheaper, and is just as likely to give us the same information (i.e.,
which submodule paths had changed commits). It can be fooled by rewinds,
too, of course:
1. Make commit C in submodule repo.
2. Make commit D in superproject repo.
3. Rewind submodule repo back to A.
4. Make commit E in superproject repo.
But in that case, you probably can't get commit "C" from the submodule
anyway, so knowing about it is unlikely to help.
I dunno. Maybe there is more going on. This is the first time I've
looked at the problem.
Also also. I was a little turned off by the fact that every fetch is
going to do the equivalent of "git log --raw -m $new --not $old",
whether you have submodules or not. If you fetch a lot of history, this
is a non-trivial price to pay. Is there some way we can detect whether
submodules are being used at all, and not even bother with this
traversal otherwise? Like checking whether .gitmodules exists? I know
virtually nothing about how submodules work.
-Peff
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 22:49 ` Jeff King
@ 2011-09-12 23:06 ` Junio C Hamano
2011-09-12 23:25 ` Jeff King
2011-09-12 23:33 ` Junio C Hamano
2011-09-13 19:13 ` Jens Lehmann
2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2011-09-12 23:06 UTC (permalink / raw)
To: Jeff King; +Cc: Jens Lehmann, git, git-dev
Jeff King <peff@peff.net> writes:
> Also also. I was a little turned off by the fact that every fetch is
> going to do the equivalent of "git log --raw -m $new --not $old",
> whether you have submodules or not.
Notice I called your patch solving "the other half"?
See http://thread.gmane.org/gmane.comp.version-control.git/181101
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 23:06 ` Junio C Hamano
@ 2011-09-12 23:25 ` Jeff King
0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2011-09-12 23:25 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jens Lehmann, git, git-dev
On Mon, Sep 12, 2011 at 04:06:10PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Also also. I was a little turned off by the fact that every fetch is
> > going to do the equivalent of "git log --raw -m $new --not $old",
> > whether you have submodules or not.
>
> Notice I called your patch solving "the other half"?
> See http://thread.gmane.org/gmane.comp.version-control.git/181101
Oh, heh. I totally missed that thread. I don't usually pay attention to
submodule patches, and came at this completely from the "why is git
fetch so slow" angle. Glad to see Jens and I are meeting in the middle.
:)
-Peff
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 22:49 ` Jeff King
2011-09-12 23:06 ` Junio C Hamano
@ 2011-09-12 23:33 ` Junio C Hamano
2011-09-13 19:13 ` Jens Lehmann
2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2011-09-12 23:33 UTC (permalink / raw)
To: Jeff King; +Cc: Jens Lehmann, git, git-dev
Jeff King <peff@peff.net> writes:
> To be honest, the whole submodule recursion thing seems a bit confusing
> to me....
> So since we must accept that we can't necessarily get every intermediate
> step, I wonder if we are simply better off diffing the "before" and
> "after" state of a particular ref, rather than traversing.
Yes, exactly my thought.
It would be far cheaper to look at a single final tree and enumerate all
the submodules, which would give you pessimistically the maximum set that
could possibly be relevant, than running millions of pairwise diff trees.
By the way, I think the submodule traversal won't hurt the correctness of
the primary traversal because that happens _after_ we fetch the object
stream.
The submodule traversal _may_ be getting an incorrect result, though, for
the same reason.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 22:49 ` Jeff King
2011-09-12 23:06 ` Junio C Hamano
2011-09-12 23:33 ` Junio C Hamano
@ 2011-09-13 19:13 ` Jens Lehmann
2011-09-13 22:17 ` Jeff King
2 siblings, 1 reply; 17+ messages in thread
From: Jens Lehmann @ 2011-09-13 19:13 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, git-dev
Am 13.09.2011 00:49, schrieb Jeff King:
> So with a set of changes like:
>
> [assume submodule at commit A, superproject at commit B]
>
> 1. Make commit C in submodule repo.
>
> 2. Make commit D in superproject repo.
>
> 3. Make commit E in submodule repo.
>
> 4. Make commit F in superproject repo.
>
> what does it buy us to find out that the submodule changed from "A" to
> "C"? We can't actually fetch it. We can only fetch the tips of the
> submodule and hope that they include everything we wanted (i.e., both C
> and E; which might not be the case of E rewound and is not a descendant
> of C).
Yes. But working with submodules in my experience only then works well
when you never drop a submodule commit recorded in any superproject. At
my dayjob we have the convention: You may only record commits that are
on the submodule's master - or another never to be rewound integration
branch - in the superproject. That gives us all needed commits in a
simple fetch.
> So since we must accept that we can't necessarily get every intermediate
> step, I wonder if we are simply better off diffing the "before" and
> "after" state of a particular ref, rather than traversing. It's way
> cheaper, and is just as likely to give us the same information (i.e.,
> which submodule paths had changed commits).
The real world use case I have for that is that when a bug introduced by
a new submodule commit is detected later on, the superproject's fix
records an older submodule commit to remove the problematic change from
the superproject. But the submodule's branch isn't rewound (as that is
most probably master) but a fix is applied on top of it. Then that one
gets tested and - if it was found ok - recorded in the superproject.
Changes like this could be overlooked if you only compare "before" and
"after" instead of traversing, leading to not fetching a submodule which
does have new commits that are used in the newly fetched superproject's
commits. I'd like to have on-demand fetch keep the correct solution of
traversing unless we have good reasons against it, especially as teaching
checkout & friends to recursively update submodules too depends on all
needed commits being present.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-13 19:13 ` Jens Lehmann
@ 2011-09-13 22:17 ` Jeff King
2011-09-14 13:20 ` Marc Branchaud
2011-09-14 18:27 ` Jens Lehmann
0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2011-09-13 22:17 UTC (permalink / raw)
To: Jens Lehmann; +Cc: Junio C Hamano, git
On Tue, Sep 13, 2011 at 09:13:10PM +0200, Jens Lehmann wrote:
> The real world use case I have for that is that when a bug introduced by
> a new submodule commit is detected later on, the superproject's fix
> records an older submodule commit to remove the problematic change from
> the superproject. But the submodule's branch isn't rewound (as that is
> most probably master) but a fix is applied on top of it. Then that one
> gets tested and - if it was found ok - recorded in the superproject.
OK, that makes sense. It is a "rewind" from the perspective of the
superproject, but there is never a fork; because the submodule didn't
rewind, when we do get a new submodule state in the superproject, it
will be a fast forward from the old state.
That is a reasonable use case.
It's still probably a minority case, and we have to pay for the full
traversal each time, which is annoying. But now that it's turned off by
default if you don't have any submodules, I'm less concerned about the
performance impact. And superproject repositories are probably not going
to have the same number of commits as submodule repositories, so it may
be less of an issue.
One thing that could make it slightly less expensive (but I wouldn't
worry about implementing until it becomes an issue): you do a full diff
and collect any changed GITLINK entries, and then compare the paths we
get with the submodule config. Couldn't you instead do something like
this:
- let S = set of submodule paths that we care about, from the config
- start the "git rev-list $new --not $old" traversal, as we do now
- for each commit
- diff using a pathspec of S
- for each changed entry
- add it to the set of changed submodules
- remove it from S
- if S is empty, break
That has two advantages:
1. We limit the diff by pathspec, which means we can avoid looking at
irrelevant parts of the tree (we don't do this yet, but hopefully
we will in the future).
2. You can break out of the traversal early if you already know you
have changes in each submodule of interest.
> Changes like this could be overlooked if you only compare "before" and
> "after" instead of traversing, leading to not fetching a submodule which
> does have new commits that are used in the newly fetched superproject's
> commits. I'd like to have on-demand fetch keep the correct solution of
> traversing unless we have good reasons against it, especially as teaching
> checkout & friends to recursively update submodules too depends on all
> needed commits being present.
Out of curiosity, what happens if we don't have such a commit? I know
you said that your policy is never to rewind a submodule's commit that
has been published in a superproject, and I think that's the only sane
thing to do. But it's not enforced by git itself, and I wonder how badly
we break if it does happen (i.e., I'm hoping the answer is "you can't
diff or checkout superproject revisions that reference the missing bit"
and not "git log crashes when it gets to that point in history").
-Peff
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-13 22:17 ` Jeff King
@ 2011-09-14 13:20 ` Marc Branchaud
2011-09-14 18:27 ` Jens Lehmann
1 sibling, 0 replies; 17+ messages in thread
From: Marc Branchaud @ 2011-09-14 13:20 UTC (permalink / raw)
To: Jeff King; +Cc: Jens Lehmann, Junio C Hamano, git
On 11-09-13 06:17 PM, Jeff King wrote:
>
> And superproject repositories are probably not going
> to have the same number of commits as submodule repositories, so it may
> be less of an issue.
Just a side note: I don't think this is a safe assumption. It's certainly
not the case in our repo, where the submodules are infrequently updated.
M.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-13 22:17 ` Jeff King
2011-09-14 13:20 ` Marc Branchaud
@ 2011-09-14 18:27 ` Jens Lehmann
2011-09-14 18:29 ` Jeff King
1 sibling, 1 reply; 17+ messages in thread
From: Jens Lehmann @ 2011-09-14 18:27 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
Am 14.09.2011 00:17, schrieb Jeff King:
> One thing that could make it slightly less expensive (but I wouldn't
> worry about implementing until it becomes an issue): you do a full diff
> and collect any changed GITLINK entries, and then compare the paths we
> get with the submodule config. Couldn't you instead do something like
> this:
>
> - let S = set of submodule paths that we care about, from the config
> - start the "git rev-list $new --not $old" traversal, as we do now
> - for each commit
> - diff using a pathspec of S
> - for each changed entry
> - add it to the set of changed submodules
> - remove it from S
> - if S is empty, break
>
> That has two advantages:
>
> 1. We limit the diff by pathspec, which means we can avoid looking at
> irrelevant parts of the tree (we don't do this yet, but hopefully
> we will in the future).
>
> 2. You can break out of the traversal early if you already know you
> have changes in each submodule of interest.
I think this would work for the functionality which is implemented right
now. But with Frederik's gitfile work I hope to enable git to support
submodules being moved around in the work tree rather soonish. Then we
would be blind for any changes in the new location. Until it hurts us
I'd prefer to stay with the correct version, even if it is a bit slower.
> Out of curiosity, what happens if we don't have such a commit? I know
> you said that your policy is never to rewind a submodule's commit that
> has been published in a superproject, and I think that's the only sane
> thing to do. But it's not enforced by git itself, and I wonder how badly
> we break if it does happen (i.e., I'm hoping the answer is "you can't
> diff or checkout superproject revisions that reference the missing bit"
> and not "git log crashes when it gets to that point in history").
No worries, nothing bad happens in that case.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-14 18:27 ` Jens Lehmann
@ 2011-09-14 18:29 ` Jeff King
0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2011-09-14 18:29 UTC (permalink / raw)
To: Jens Lehmann; +Cc: Junio C Hamano, git
On Wed, Sep 14, 2011 at 08:27:48PM +0200, Jens Lehmann wrote:
> Am 14.09.2011 00:17, schrieb Jeff King:
> > One thing that could make it slightly less expensive (but I wouldn't
> > worry about implementing until it becomes an issue): you do a full diff
> [...]
> I think this would work for the functionality which is implemented right
> now. But with Frederik's gitfile work I hope to enable git to support
> submodules being moved around in the work tree rather soonish. Then we
> would be blind for any changes in the new location. Until it hurts us
> I'd prefer to stay with the correct version, even if it is a bit slower.
Agreed. Let's worry about it if and when it becomes a performance issue.
Thanks for putting up my with my total cluelessness regarding what's
happening in submodules. ;)
-Peff
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-12 19:56 [PATCH] fetch: avoid quadratic loop checking for updated submodules Jeff King
2011-09-12 20:21 ` Junio C Hamano
2011-09-12 21:25 ` Junio C Hamano
@ 2011-09-13 16:40 ` Christian Couder
2011-09-13 17:15 ` Jeff King
2 siblings, 1 reply; 17+ messages in thread
From: Christian Couder @ 2011-09-13 16:40 UTC (permalink / raw)
To: Jeff King; +Cc: Jens Lehmann, Junio C Hamano, git, git-dev
Hi Peff,
On Mon, Sep 12, 2011 at 9:56 PM, Jeff King <peff@peff.net> wrote:
> diff --git a/submodule.c b/submodule.c
> index 7a76edf..00aeb71 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -8,12 +8,17 @@
> #include "diffcore.h"
> #include "refs.h"
> #include "string-list.h"
> +#include "sha1-array.h"
Nice to see you reuse sha1-array!
[...]
> +struct argv_array {
> + const char **argv;
> + unsigned int argc;
> + unsigned int alloc;
> +};
But there is already such a struct in bisect.c!
> +static void init_argv(struct argv_array *array)
> +{
> + array->argv = NULL;
> + array->argc = 0;
> + array->alloc = 0;
> +}
> +
> +static void push_argv(struct argv_array *array, const char *value)
> +{
> + ALLOC_GROW(array->argv, array->argc + 2, array->alloc);
> + array->argv[array->argc++] = xstrdup(value);
> + array->argv[array->argc] = NULL;
> +}
> +
> +static void clear_argv(struct argv_array *array)
> +{
> + int i;
> + for (i = 0; i < array->argc; i++)
> + free((char **)array->argv[i]);
> + free(array->argv);
> + init_argv(array);
> +}
> +
> +static void add_sha1_to_argv(const unsigned char sha1[20], void *data)
> +{
> + push_argv(data, sha1_to_hex(sha1));
> +}
So it would be nice if you could refactor this and the argv_array
functions in bisect.c in the same way you refactored sha1-array.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules
2011-09-13 16:40 ` Christian Couder
@ 2011-09-13 17:15 ` Jeff King
2011-09-13 17:34 ` Junio C Hamano
0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2011-09-13 17:15 UTC (permalink / raw)
To: Christian Couder; +Cc: Jens Lehmann, Junio C Hamano, git, git-dev
On Tue, Sep 13, 2011 at 06:40:07PM +0200, Christian Couder wrote:
> > +struct argv_array {
> > + const char **argv;
> > + unsigned int argc;
> > + unsigned int alloc;
> > +};
>
> But there is already such a struct in bisect.c!
Heh. I completely missed that. As I was writing it, I realized it would
be a good thing to factor out, but most of the argv builders I checked
weren't dynamic at all (they knew up front how big argv would need to be
because they were copying).
As it turns out, our implementations are remarkably similar considering
I hadn't read yours. It must mean they're both obviously correct. :)
> So it would be nice if you could refactor this and the argv_array
> functions in bisect.c in the same way you refactored sha1-array.
Will do. Junio, do you want me to re-roll the quadratic fix, or just
build the refactoring on top?
-Peff
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2011-09-14 18:29 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-12 19:56 [PATCH] fetch: avoid quadratic loop checking for updated submodules Jeff King
2011-09-12 20:21 ` Junio C Hamano
2011-09-13 19:34 ` Jens Lehmann
2011-09-14 18:26 ` Jens Lehmann
2011-09-12 21:25 ` Junio C Hamano
2011-09-12 22:49 ` Jeff King
2011-09-12 23:06 ` Junio C Hamano
2011-09-12 23:25 ` Jeff King
2011-09-12 23:33 ` Junio C Hamano
2011-09-13 19:13 ` Jens Lehmann
2011-09-13 22:17 ` Jeff King
2011-09-14 13:20 ` Marc Branchaud
2011-09-14 18:27 ` Jens Lehmann
2011-09-14 18:29 ` Jeff King
2011-09-13 16:40 ` Christian Couder
2011-09-13 17:15 ` Jeff King
2011-09-13 17:34 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).