* [PATCHv2] push: change submodule default to check @ 2016-08-24 17:30 Stefan Beller 2016-08-24 18:38 ` Junio C Hamano [not found] ` <20160824183112.ceekegpzavnbybxp@sigill.intra.peff.net> 0 siblings, 2 replies; 25+ messages in thread From: Stefan Beller @ 2016-08-24 17:30 UTC (permalink / raw) Cc: git, hvoigt, Jens.Lehmann, iveqy, leandro.lucarella, gitster, Stefan Beller When working with submodules, it is easy to forget to push the submodules. The setting 'check', which checks if any existing submodule is present on at least one remote of the submodule remotes, is designed to prevent this mistake. Flipping the default to check for submodules is safer than the current default of ignoring submodules while pushing. Signed-off-by: Stefan Beller <sbeller@google.com> --- Slightly reworded commit message than in v1, (https://public-inbox.org/git/20160817204848.8983-1-sbeller@google.com/) The patch itself is however the same. I just push it out now with a new commit message, such that we can easier pick it up later for Git 3.0, when changes that change default make more sense. As said in an earlier message, you could however also argue that this is fixing a bug in your workflow, so it might be worth fixing before 3.0 as well. I dunno. Thanks, Stefan builtin/push.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/push.c b/builtin/push.c index 3bb9d6b..479150a 100644 --- a/builtin/push.c +++ b/builtin/push.c @@ -22,7 +22,7 @@ static int deleterefs; static const char *receivepack; static int verbosity; static int progress = -1; -static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT; +static int recurse_submodules = RECURSE_SUBMODULES_CHECK; static enum transport_family family; static struct push_cas_option cas; -- 2.10.0.rc1.1.g1ceb01a ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCHv2] push: change submodule default to check 2016-08-24 17:30 [PATCHv2] push: change submodule default to check Stefan Beller @ 2016-08-24 18:38 ` Junio C Hamano [not found] ` <20160824183112.ceekegpzavnbybxp@sigill.intra.peff.net> 1 sibling, 0 replies; 25+ messages in thread From: Junio C Hamano @ 2016-08-24 18:38 UTC (permalink / raw) To: Stefan Beller; +Cc: git, hvoigt, Jens.Lehmann, iveqy, leandro.lucarella Stefan Beller <sbeller@google.com> writes: > When working with submodules, it is easy to forget to push the submodules. > The setting 'check', which checks if any existing submodule is present on > at least one remote of the submodule remotes, is designed to prevent this > mistake. > > Flipping the default to check for submodules is safer than the current > default of ignoring submodules while pushing. > > Signed-off-by: Stefan Beller <sbeller@google.com> > --- > > Slightly reworded commit message than in v1, > (https://public-inbox.org/git/20160817204848.8983-1-sbeller@google.com/) > The patch itself is however the same. > > I just push it out now with a new commit message, such that we can easier > pick it up later for Git 3.0, when changes that change default make more sense. > > As said in an earlier message, you could however also argue that this is > fixing a bug in your workflow, so it might be worth fixing before 3.0 > as well. I dunno. With this change suddenly landing on the version of Git a user just updated, the only possible negative effect would be that somebody who did not configure push.recurseSubmodules suddenly starts getting an error. What would the error message say? What I want you to think about and explain in the justification is if it gives the user enough information to decide the best course of action to adjust to the new world order, when the user's expectation has been that the superproject gets pushed independent from its submodules. If the existing error message gives enough information, explains why Git suddenly started refusing the push is generally a good thing for the user, tells some minority users (in whose workflow it is perfectly safe to push out the toplevel independently from the submodule) how to turn it back to the old behaviour clearly enough, then this single-liner may be good enough. Otherwise we may need a new logic to allow us to see if recurse_submodules that is set to RECURSE_SUBMODULES_CHECK is because the user explicitly configured, or because the user did not have any configuration, and issue a different message in the latter case. > builtin/push.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/builtin/push.c b/builtin/push.c > index 3bb9d6b..479150a 100644 > --- a/builtin/push.c > +++ b/builtin/push.c > @@ -22,7 +22,7 @@ static int deleterefs; > static const char *receivepack; > static int verbosity; > static int progress = -1; > -static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT; > +static int recurse_submodules = RECURSE_SUBMODULES_CHECK; > static enum transport_family family; > > static struct push_cas_option cas; ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20160824183112.ceekegpzavnbybxp@sigill.intra.peff.net>]
* Re: [PATCHv2] push: change submodule default to check [not found] ` <20160824183112.ceekegpzavnbybxp@sigill.intra.peff.net> @ 2016-08-24 19:37 ` Junio C Hamano 2016-08-24 21:26 ` Junio C Hamano 2016-08-24 22:37 ` Stefan Beller 0 siblings, 2 replies; 25+ messages in thread From: Junio C Hamano @ 2016-08-24 19:37 UTC (permalink / raw) To: Jeff King Cc: Stefan Beller, git, hvoigt, Jens.Lehmann, iveqy, leandro.lucarella Jeff King <peff@peff.net> writes: This seems to be dropped from the list, probably due to no "To:" header in the original, which led to "no", "To-header" "on" and "input <" on YOUR recipient list, so I am quoting it in full without trimming. > On Wed, Aug 24, 2016 at 10:30:17AM -0700, Stefan Beller wrote: > >> When working with submodules, it is easy to forget to push the submodules. >> The setting 'check', which checks if any existing submodule is present on >> at least one remote of the submodule remotes, is designed to prevent this >> mistake. >> >> Flipping the default to check for submodules is safer than the current >> default of ignoring submodules while pushing. > > It is safer, and that's good. But it's also slower, because it requires > an extra traversal of all of the pushed commits. And now people will > have to pay the price even if they are not using submodules at all. > > For instance, try this from a checkout of linux.git: > > for i in no check; do > rm -rf dst.git > git init --bare dst.git > echo "==> Pushing with submodules=$i" > time git push --recurse-submodules=$i dst.git HEAD > done > > The second case takes 30-40 seconds longer. This is a full push of > history, so it's an extreme case[1], but it's still rather unfortunate. > > Can we tie this default to some sign that submodules are actually in > use? I don't think the presence of .gitmodules is perfect (because you > might be in a bare repo, for example, and have just fetched some other > history you are relaying), but it may be a good compromise. I'm > envisioning something like "--recurse-submodules=auto-check" which > auto-detects common situations (e.g., presence of .gitmodules or > .git/modules checkouts) and enables "check", and then setting the > default to that in the long run. > > -Peff > > [1] Actually, there is another much worse case lurking there. Try: > > git push --recurse-submodules=check --mirror dst.git > > from the kernel. I didn't let it finish, but I'd estimate it would > take on the order of 5 hours. The problem is that push feeds each > updated ref tip to find_unpushed_submodules(), so we end up walking > over the same history over and over. > > I think it should feed all of the "before" and "after" ref tips it > proposes to update to a _single_ revision traversal. That sounds massively ... broken. So before even thinking about flipping it to default, this needs to be fixed first. > I also notice that it uses "--remote=...", which is weird, because > push knows exactly what it proposes to update, which may be ahead of > where our refs/remotes/* cache is. Not to mention that we may be > pushing to a remote for which we do not keep tracking refs at all! > > So I'd actually suspect that with your patch, a bare URL like: > > git push https://github.com/peff/linux.git > > would do the full 40-second walk, even if I was only pushing up one > or two objects. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCHv2] push: change submodule default to check 2016-08-24 19:37 ` Junio C Hamano @ 2016-08-24 21:26 ` Junio C Hamano 2016-08-24 22:37 ` Stefan Beller 1 sibling, 0 replies; 25+ messages in thread From: Junio C Hamano @ 2016-08-24 21:26 UTC (permalink / raw) To: Jeff King Cc: Stefan Beller, git, hvoigt, Jens.Lehmann, iveqy, leandro.lucarella Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > >> For instance, try this from a checkout of linux.git: >> >> for i in no check; do >> rm -rf dst.git >> git init --bare dst.git >> echo "==> Pushing with submodules=$i" >> time git push --recurse-submodules=$i dst.git HEAD >> done >> >> The second case takes 30-40 seconds longer. This is a full push of >> history, so it's an extreme case[1], but it's still rather unfortunate. This actually bit me just now. github.com/gitster/git.git is mirror pushed, and it would seem to take forever, so I killed it, and flipped the configuration variable off. Which means the feature won't ever get any testing from me in real life. People, git.git is not a large project in any sense of the word. Why wasn't it discovered that "push.recursesubmodules = check" is UNUSABLE since it was introduced is simply beyond me. I am mad (which is unusual for me, isn't it? -- I've seen somebody else saying "I am mad", but I do not think I ever said it here). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCHv2] push: change submodule default to check 2016-08-24 19:37 ` Junio C Hamano 2016-08-24 21:26 ` Junio C Hamano @ 2016-08-24 22:37 ` Stefan Beller 2016-08-24 23:01 ` Jeff King 1 sibling, 1 reply; 25+ messages in thread From: Stefan Beller @ 2016-08-24 22:37 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, git@vger.kernel.org, Heiko Voigt, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Wed, Aug 24, 2016 at 12:37 PM, Junio C Hamano <gitster@pobox.com> wrote: > Jeff King <peff@peff.net> writes: > > This seems to be dropped from the list, probably due to no "To:" > header in the original, which led to "no", "To-header" "on" and > "input <" on YOUR recipient list, so I am quoting it in full without > trimming. > >> On Wed, Aug 24, 2016 at 10:30:17AM -0700, Stefan Beller wrote: >> >>> When working with submodules, it is easy to forget to push the submodules. >>> The setting 'check', which checks if any existing submodule is present on >>> at least one remote of the submodule remotes, is designed to prevent this >>> mistake. >>> >>> Flipping the default to check for submodules is safer than the current >>> default of ignoring submodules while pushing. >> >> It is safer, and that's good. But it's also slower, because it requires >> an extra traversal of all of the pushed commits. And now people will >> have to pay the price even if they are not using submodules at all. >> >> For instance, try this from a checkout of linux.git: >> >> for i in no check; do >> rm -rf dst.git >> git init --bare dst.git >> echo "==> Pushing with submodules=$i" >> time git push --recurse-submodules=$i dst.git HEAD >> done >> >> The second case takes 30-40 seconds longer. This is a full push of >> history, so it's an extreme case[1], but it's still rather unfortunate. >> >> Can we tie this default to some sign that submodules are actually in >> use? I don't think the presence of .gitmodules is perfect (because you >> might be in a bare repo, for example, and have just fetched some other >> history you are relaying), but it may be a good compromise. I'm >> envisioning something like "--recurse-submodules=auto-check" which >> auto-detects common situations (e.g., presence of .gitmodules or >> .git/modules checkouts) and enables "check", and then setting the >> default to that in the long run. >> >> -Peff >> >> [1] Actually, there is another much worse case lurking there. Try: >> >> git push --recurse-submodules=check --mirror dst.git >> >> from the kernel. I didn't let it finish, but I'd estimate it would >> take on the order of 5 hours. The problem is that push feeds each >> updated ref tip to find_unpushed_submodules(), so we end up walking >> over the same history over and over. >> >> I think it should feed all of the "before" and "after" ref tips it >> proposes to update to a _single_ revision traversal. > > That sounds massively ... broken. So before even thinking about > flipping it to default, this needs to be fixed first. > I agree. That sounds bad. However having the --auto-check feels like papering over the actual problem which to me sounds like a design problem. However this may be a viable short term solution. About the design issue: We need to answer the question: Which submodule commits are referenced by a given set of superproject commits. This question is advancing a very similar question that we'd have to ask in git-gc. In gc we would end up without having to worry about a specific set, but rather the all reachable commits of the superproject are in the given set. So we could solve two issues at the same time if we had a quick way to answer this question quickly. And for that I considered introducing a ref in the submodule (e.g.) refs/superproject/gc_protector, which records both the superprojects commit as well as the submodule commit in case the superproject changed the submodule pointer. I have some rough patches for this, but I did not consider sending that to the list as the patches are still rough and not quite fully thought out on the design level. -- Actually screw this. After an office discussion with Jonathan (cc'd): 1) We need to have an "alternate refs namespace", which contains secondary data (as this can be derived from the primary data with lots of compute). So maybe we need a file similar to the packed refs file, "superproject-refs" that behaves as if it is storing refs, but that file is safe to delete as we know all its contents can be recreated. 2) In this new alternate refs space, we can have refs/superproject/<sha1> in the submodule with the sha1 of the superproject that points to a commit which is a dirty merge of all submodule commits, that have no other parents. e.g. In the superproject we might have a history of: A <- B <- C with A being origin/master and C our local master. In the submodule we have: D <- E <- F \ G F is the master branch in the submodule, G is a dangling commit. Now we could have the following git links: A -> D B -> G C -> F Then the refs/superproject/<C> would be a merge of F and G. When pushing in the superproject (A..C) we would need to make sure the submodule is updated as well, i.e. we'd look at refs/superproject/<C> to know we need to advance the remote submodule history to contain at least G and F. Then we can do a standard rev walk starting at G and F downwards until we hit a commit that is contained in remote/refs/*. > Why wasn't it discovered that "push.recursesubmodules = check" is > UNUSABLE since it was introduced is simply beyond me. Maybe it is not compatible with your workflow as you push only once a day? In a centralized server model you rarely exceed a few commits on push time, but $ git rev-list a829490...c08db5a |wc -l 23 $ git rev-list f5df0f2...f3c0fa5 |wc -l 61 $ git rev-list 8f73021...de36215 |wc -l 95 and some new branches, so you pushed around 200 commits. I think this misbehaves strongly for typical maintainer work flows. > I am mad (which is unusual for me, isn't it? -- I've seen somebody > else saying "I am mad", but I do not think I ever said it here). Please direct your anger at the design of submodules. The assumption of the submodule being sort of 'independant' doesn't allow for efficient data structures I would think. So maybe we need to teach the submodules about the existence and state of the superproject? /me ducks ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCHv2] push: change submodule default to check 2016-08-24 22:37 ` Stefan Beller @ 2016-08-24 23:01 ` Jeff King 2016-09-14 17:31 ` [PATCH 1/2] serialize collection of changed submodules Heiko Voigt 2016-09-14 17:51 ` [PATCH 2/2] serialize collection of refs that contain submodule changes Heiko Voigt 0 siblings, 2 replies; 25+ messages in thread From: Jeff King @ 2016-08-24 23:01 UTC (permalink / raw) To: Stefan Beller Cc: Junio C Hamano, git@vger.kernel.org, Heiko Voigt, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Wed, Aug 24, 2016 at 03:37:29PM -0700, Stefan Beller wrote: > > That sounds massively ... broken. So before even thinking about > > flipping it to default, this needs to be fixed first. > > I agree. That sounds bad. > > However having the --auto-check feels like papering over the > actual problem which to me sounds like a design problem. > However this may be a viable short term solution. Sort of... I may not have been clear, but there are really a few things going on. One is that the design of find_unpushed_submodules() is just brain-dead. It does one traversal per updated ref, which means a from-scratch mirror is O(nr_of_refs * nr_of_commits). That's just silly, and can easily be fixed behind the scenes to be O(nr_of_commits). And I _suspect_ it is what made Junio's earlier push so awful; he probably pushed up the same commits as part of many different branches, so he did the same diffs over and over. So clearing that up seems like an obvious first step, and dulls the pain to "if submodule recursion is on, the worst case is that you walk all the new objects you are sending". That's still _a_ traversal, but it's one we have to do anyway in pack-objects, so it's the same order of magnitude as the rest of the push[1]. Then you've got two cases: the repo is using submodules at all, or they are not. The former is an easy case, if we can identify it; we can avoid the traversal at all, and people who do not use submodules are not regressed at all. That leaves people who _are_ using submodules with paying the extra traversal cost. Not great, but only really a pain when you have a really big chunk of history to push. It may be lost in the noise for such a push in more normal circumstances (where bandwidth to push up the objects dominates, though it is unfortunate that we do not even start utilizing the bandwidth, the critical resource, until we are done with the submodule check). [1] Of course with reachability bitmaps the pack-objects traversal goes away, but the same cannot be accomplished here (because they do not store the gitlink sha1s at all, because they do not imply reachability). > We need to answer the question: Which submodule commits > are referenced by a given set of superproject commits. > > This question is advancing a very similar question that we'd > have to ask in git-gc. In gc we would end up without having to > worry about a specific set, but rather the all reachable commits > of the superproject are in the given set. > > So we could solve two issues at the same time if we had a quick > way to answer this question quickly. > [...] I snipped here because your solutions sound complicated (which isn't to say they're wrong, but that I am not willing to give them a lot of thought at this time in the evening ;) ). One opposite approach which appeals to me is not to remove the need for the traversal, but to make it much faster. E.g., by storing commits in a form that can be traversed more quickly, and possibly keeping a bitmap cache of which paths are touched in each commit (I have posted patches to the list for the former, but have only been considering experimenting with the latter). That's _also_ complicated, but it applies to way more things. Including normal log traversals, path-limiting for diffs, the "counting" traversal done by pack-object, etc. And while it is complicated in some ways, it's conceptually simple at the git data model layer. It's returning the same old answers about commits and trees, just faster. Anyway, food for thought. -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] serialize collection of changed submodules 2016-08-24 23:01 ` Jeff King @ 2016-09-14 17:31 ` Heiko Voigt 2016-09-14 22:30 ` Junio C Hamano 2016-09-16 17:27 ` [PATCH 1/2] serialize collection of changed submodules Junio C Hamano 2016-09-14 17:51 ` [PATCH 2/2] serialize collection of refs that contain submodule changes Heiko Voigt 1 sibling, 2 replies; 25+ messages in thread From: Heiko Voigt @ 2016-09-14 17:31 UTC (permalink / raw) To: Jeff King Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella To check whether a submodule needs to be pushed we need to collect all changed submodules. Lets collect them first and then execute the possibly expensive test whether certain revisions are already pushed only once per submodule. There is further potential for optimization since we can assemble one command and only issued that instead of one call for each remote ref in the submodule. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> --- Sorry about the late reply. I was not able to process emails until now. Here are two patches that should help to improve the situation and batch up some processing. This one is for repositories with submodules, so that they do not iterate over the same submodule twice with the same hash. The second one will be the one people without submodules are interested in. Cheers Heiko submodule.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 62 insertions(+), 5 deletions(-) diff --git a/submodule.c b/submodule.c index 0ef2ff4..b04c066 100644 --- a/submodule.c +++ b/submodule.c @@ -554,19 +554,38 @@ static int submodule_needs_pushing(const char *path, const unsigned char sha1[20 return 0; } +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules, + const char *path) +{ + struct string_list_item *item; + struct sha1_array *hashes; + + item = string_list_insert(submodules, path); + if (item->util) + return (struct sha1_array *) item->util; + + hashes = (struct sha1_array *) xmalloc(sizeof(struct sha1_array)); + /* NEEDSWORK: should we add an initializer function for + * sha1_array ? */ + memset(hashes, 0, sizeof(struct sha1_array)); + item->util = hashes; + return hashes; +} + static void collect_submodules_from_diff(struct diff_queue_struct *q, struct diff_options *options, void *data) { int i; - struct string_list *needs_pushing = data; + struct string_list *submodules = data; for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; + struct sha1_array *hashes; if (!S_ISGITLINK(p->two->mode)) continue; - if (submodule_needs_pushing(p->two->path, p->two->oid.hash)) - string_list_insert(needs_pushing, p->two->path); + hashes = get_sha1s_from_list(submodules, p->two->path); + sha1_array_append(hashes, p->two->oid.hash); } } @@ -582,14 +601,41 @@ static void find_unpushed_submodule_commits(struct commit *commit, diff_tree_combined_merge(commit, 1, &rev); } +struct collect_submodule_from_sha1s_data { + char *submodule_path; + struct string_list *needs_pushing; +}; + +static void collect_submodules_from_sha1s(const unsigned char sha1[20], + void *data) +{ + struct collect_submodule_from_sha1s_data *me = + (struct collect_submodule_from_sha1s_data *) data; + + if (submodule_needs_pushing(me->submodule_path, sha1)) + string_list_insert(me->needs_pushing, me->submodule_path); +} + +static void free_submodules_sha1s(struct string_list *submodules) +{ + int i; + for (i = 0; i < submodules->nr; i++) { + struct string_list_item *item = &submodules->items[i]; + struct sha1_array *hashes = (struct sha1_array *) item->util; + sha1_array_clear(hashes); + } + string_list_clear(submodules, 1); +} + int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name, struct string_list *needs_pushing) { struct rev_info rev; struct commit *commit; const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; - int argc = ARRAY_SIZE(argv) - 1; + int argc = ARRAY_SIZE(argv) - 1, i; char *sha1_copy; + struct string_list submodules = STRING_LIST_INIT_DUP; struct strbuf remotes_arg = STRBUF_INIT; @@ -603,12 +649,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20], die("revision walk setup failed"); while ((commit = get_revision(&rev)) != NULL) - find_unpushed_submodule_commits(commit, needs_pushing); + find_unpushed_submodule_commits(commit, &submodules); reset_revision_walk(); free(sha1_copy); strbuf_release(&remotes_arg); + for (i = 0; i < submodules.nr; i++) { + struct string_list_item *item = &submodules.items[i]; + struct collect_submodule_from_sha1s_data data; + data.submodule_path = item->string; + data.needs_pushing = needs_pushing; + sha1_array_for_each_unique((struct sha1_array *) item->util, + collect_submodules_from_sha1s, + &data); + } + free_submodules_sha1s(&submodules); + return needs_pushing->nr; } -- 2.0.2.832.g083c931 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] serialize collection of changed submodules 2016-09-14 17:31 ` [PATCH 1/2] serialize collection of changed submodules Heiko Voigt @ 2016-09-14 22:30 ` Junio C Hamano 2016-09-15 12:10 ` [PATCH 3/2] batch check whether submodule needs pushing into one call Heiko Voigt 2016-09-15 12:18 ` [PATCH 4/2] use actual start hashes for submodule push check instead of local refs Heiko Voigt 2016-09-16 17:27 ` [PATCH 1/2] serialize collection of changed submodules Junio C Hamano 1 sibling, 2 replies; 25+ messages in thread From: Junio C Hamano @ 2016-09-14 22:30 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Heiko Voigt <hvoigt@hvoigt.net> writes: > Sorry about the late reply. I was not able to process emails until now. > Here are two patches that should help to improve the situation and batch > up some processing. This one is for repositories with submodules, so > that they do not iterate over the same submodule twice with the same > hash. > > The second one will be the one people without submodules are interested > in. Thanks. Will take a look at later as I'm already deep in today's integration cycle. Very much appreciated. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-14 22:30 ` Junio C Hamano @ 2016-09-15 12:10 ` Heiko Voigt 2016-09-15 21:08 ` Junio C Hamano 2016-09-16 17:59 ` Junio C Hamano 2016-09-15 12:18 ` [PATCH 4/2] use actual start hashes for submodule push check instead of local refs Heiko Voigt 1 sibling, 2 replies; 25+ messages in thread From: Heiko Voigt @ 2016-09-15 12:10 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella We run a command for each sha1 change in a submodule. This is unnecessary since we can simply batch all sha1's we want to check into one command. Lets do it so we can speedup the check when many submodule changes are in need of checking. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> --- On Wed, Sep 14, 2016 at 03:30:53PM -0700, Junio C Hamano wrote: > Heiko Voigt <hvoigt@hvoigt.net> writes: > > > Sorry about the late reply. I was not able to process emails until now. > > Here are two patches that should help to improve the situation and batch > > up some processing. This one is for repositories with submodules, so > > that they do not iterate over the same submodule twice with the same > > hash. > > > > The second one will be the one people without submodules are interested > > in. > > Thanks. Will take a look at later as I'm already deep in today's > integration cycle. Very much appreciated. No problem. While I am at it: Here are actually another two patches that should make life of submodule users easier (push times of big pushes). In Numbers with the qt5[1] superproject and all submodules initialized. The same --mirror test as before with the git repository: # Without patch: book:qt5 hvoigt (5.6)$ rm -rf ~/Downloads/git-test && mkdir ~/Downloads/git-test && (cd ~/Downloads/git-test && git init) && time git push --mirror --recurse-submodules=check ~/Downloads/git-test real 4m0.881s user 3m30.139s sys 0m22.329s Without --recurse-submodules=check real 0m0.251s user 0m0.218s sys 0m0.082s # With patch: real 0m1.167s user 0m0.846s sys 0m0.262s real 0m1.110s user 0m0.815s sys 0m0.247s real 0m1.111s user 0m0.818s sys 0m0.251s Without --recurse-submodules=check real 0m0.294s user 0m0.221s sys 0m0.104s real 0m0.248s user 0m0.216s sys 0m0.080s real 0m0.247s user 0m0.212s sys 0m0.082s [1] git://code.qt.io/qt/qt5.git submodule.c | 75 ++++++++++++++++++++++++++++++++----------------------------- 1 file changed, 39 insertions(+), 36 deletions(-) diff --git a/submodule.c b/submodule.c index a15e346..28bb74e 100644 --- a/submodule.c +++ b/submodule.c @@ -522,27 +522,54 @@ static int has_remote(const char *refname, const struct object_id *oid, return 1; } -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20]) +static void append_hash_to_argv(const unsigned char sha1[20], void *data) { - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) + struct argv_array *argv = (struct argv_array *) data; + argv_array_push(argv, sha1_to_hex(sha1)); +} + +static void check_has_hash(const unsigned char sha1[20], void *data) +{ + int *has_hash = (int *) data; + + if (!lookup_commit_reference(sha1)) + *has_hash = 0; +} + +static int submodule_has_hashes(const char *path, struct sha1_array *hashes) +{ + int has_hash = 1; + + if (add_submodule_odb(path)) + return 0; + + sha1_array_for_each_unique(hashes, check_has_hash, &has_hash); + return has_hash; +} + +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes) +{ + if (!submodule_has_hashes(path, hashes)) return 0; if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; + + argv_array_push(&cp.args, "rev-list"); + sha1_array_for_each_unique(hashes, append_hash_to_argv, &cp.args); + argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL); + struct strbuf buf = STRBUF_INIT; int needs_pushing = 0; - argv[1] = sha1_to_hex(sha1); - cp.argv = argv; prepare_submodule_repo_env(&cp.env_array); cp.git_cmd = 1; cp.no_stdin = 1; cp.out = -1; cp.dir = path; if (start_command(&cp)) - die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s", - sha1_to_hex(sha1), path); + die("Could not run 'git rev-list <hashes> --not --remotes -n 1' command in submodule %s", + path); if (strbuf_read(&buf, cp.out, 41)) needs_pushing = 1; finish_command(&cp); @@ -601,21 +628,6 @@ static void find_unpushed_submodule_commits(struct commit *commit, diff_tree_combined_merge(commit, 1, &rev); } -struct collect_submodule_from_sha1s_data { - char *submodule_path; - struct string_list *needs_pushing; -}; - -static void collect_submodules_from_sha1s(const unsigned char sha1[20], - void *data) -{ - struct collect_submodule_from_sha1s_data *me = - (struct collect_submodule_from_sha1s_data *) data; - - if (submodule_needs_pushing(me->submodule_path, sha1)) - string_list_insert(me->needs_pushing, me->submodule_path); -} - static void free_submodules_sha1s(struct string_list *submodules) { int i; @@ -627,13 +639,6 @@ static void free_submodules_sha1s(struct string_list *submodules) string_list_clear(submodules, 1); } -static void append_hash_to_argv(const unsigned char sha1[20], - void *data) -{ - struct argv_array *argv = (struct argv_array *) data; - argv_array_push(argv, sha1_to_hex(sha1)); -} - int find_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name, struct string_list *needs_pushing) { @@ -662,13 +667,11 @@ int find_unpushed_submodules(struct sha1_array *hashes, argv_array_clear(&argv); for (i = 0; i < submodules.nr; i++) { - struct string_list_item *item = &submodules.items[i]; - struct collect_submodule_from_sha1s_data data; - data.submodule_path = item->string; - data.needs_pushing = needs_pushing; - sha1_array_for_each_unique((struct sha1_array *) item->util, - collect_submodules_from_sha1s, - &data); + struct string_list_item *submodule = &submodules.items[i]; + struct sha1_array *hashes = (struct sha1_array *) submodule->util; + + if (submodule_needs_pushing(submodule->string, hashes)) + string_list_insert(needs_pushing, submodule->string); } free_submodules_sha1s(&submodules); -- 2.10.0.133.g13017a3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-15 12:10 ` [PATCH 3/2] batch check whether submodule needs pushing into one call Heiko Voigt @ 2016-09-15 21:08 ` Junio C Hamano 2016-09-16 9:40 ` Heiko Voigt 2016-09-16 17:59 ` Junio C Hamano 1 sibling, 1 reply; 25+ messages in thread From: Junio C Hamano @ 2016-09-15 21:08 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Heiko Voigt <hvoigt@hvoigt.net> writes: > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > struct child_process cp = CHILD_PROCESS_INIT; > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; > + > + argv_array_push(&cp.args, "rev-list"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, &cp.args); > + argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL); > + > struct strbuf buf = STRBUF_INIT; > int needs_pushing = 0; These two become decl-after-stmt; move your new lines a bit lower, perhaps? > - argv[1] = sha1_to_hex(sha1); > - cp.argv = argv; > prepare_submodule_repo_env(&cp.env_array); By the way, with the two new patches, 'pu' seems to start failing some tests, e.g. 5533 5404 5405. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-15 21:08 ` Junio C Hamano @ 2016-09-16 9:40 ` Heiko Voigt 2016-09-16 12:31 ` Heiko Voigt 0 siblings, 1 reply; 25+ messages in thread From: Heiko Voigt @ 2016-09-16 9:40 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Thu, Sep 15, 2016 at 02:08:58PM -0700, Junio C Hamano wrote: > Heiko Voigt <hvoigt@hvoigt.net> writes: > > > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > > struct child_process cp = CHILD_PROCESS_INIT; > > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; > > + > > + argv_array_push(&cp.args, "rev-list"); > > + sha1_array_for_each_unique(hashes, append_hash_to_argv, &cp.args); > > + argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL); > > + > > struct strbuf buf = STRBUF_INIT; > > int needs_pushing = 0; > > These two become decl-after-stmt; move your new lines a bit lower, > perhaps? Thanks, missed those. Will do. > > - argv[1] = sha1_to_hex(sha1); > > - cp.argv = argv; > > prepare_submodule_repo_env(&cp.env_array); > > By the way, with the two new patches, 'pu' seems to start failing > some tests, e.g. 5533 5404 5405. Ah ok I did only test on master, will look into those. Cheers Heiko ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-16 9:40 ` Heiko Voigt @ 2016-09-16 12:31 ` Heiko Voigt 2016-09-16 18:13 ` Junio C Hamano 0 siblings, 1 reply; 25+ messages in thread From: Heiko Voigt @ 2016-09-16 12:31 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote: > > By the way, with the two new patches, 'pu' seems to start failing > > some tests, e.g. 5533 5404 5405. > > Ah ok I did only test on master, will look into those. Ok I had a look into these and the reason t5533 fails is because on pu --recurse-submodules is enabled by default and I missed the case when overwriting a ref. In that case we get the sha1 from the remote side as old. So we could catch that and fall back to all revisions there, but... ... tl;dr: The solution to use the old revisions from the remote side was too simple and does not make matters better but actually worse for some typical usecases. Its only in the last patch. ... that lead me to further thinking about the previous solution (using the locally cached remote refs) which might actually be a good default for the non-fastforward cases like creating new ref or overwriting a ref. My current patch would handle the --mirror case nicer, since it gets a lot of old revs to reduce the revisions to check. For the typical one branch push it would most likely be worse. Except when the user is updating (fast-forwarding) the remote ref we would scan all revs of a ref until the root because we do not get enough valid revs that already exist on the remote. The most exact solution would be to use all actual remote refs available (not sure if we have them at this point in the process?) another solution would be to still append the --remotes=<remotename> option as a fallback to reduce the revisions. What do others think? Will leave this for a little bit further thinking. Its just the last patch ("use actual start hashes for submodule push check instead of local refs") that needs to go back to the drawing board. Cheers Heiko ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-16 12:31 ` Heiko Voigt @ 2016-09-16 18:13 ` Junio C Hamano 2016-09-19 20:08 ` Heiko Voigt 0 siblings, 1 reply; 25+ messages in thread From: Junio C Hamano @ 2016-09-16 18:13 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Heiko Voigt <hvoigt@hvoigt.net> writes: > On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote: >> > By the way, with the two new patches, 'pu' seems to start failing >> > some tests, e.g. 5533 5404 5405. >> >> Ah ok I did only test on master, will look into those. > > Ok I had a look into these and the reason t5533 fails is because on pu > --recurse-submodules is enabled by default and I missed the case when > overwriting a ref. In that case we get the sha1 from the remote side as > old. So we could catch that and fall back to all revisions there, but... > > ... tl;dr: The solution to use the old revisions from the remote side > was too simple and does not make matters better but actually worse for > some typical usecases. Its only in the last patch. You may not even have the old one in your copy of the remote repository if you haven't fetched from them and you are forcing your push. "rev-list <new ones> --not <old ones>" may fail in such a case, not producing the list of new commits. You'd need to exclude old ones you learned over the wire that you do not yet have locally. > The most exact solution would be to use all actual remote refs available > (not sure if we have them at this point in the process?) another > solution would be to still append the --remotes=<remotename> option as a > fallback to reduce the revisions. I'd say --remotes=<remotename> is the least problematic thing to do. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-16 18:13 ` Junio C Hamano @ 2016-09-19 20:08 ` Heiko Voigt 0 siblings, 0 replies; 25+ messages in thread From: Heiko Voigt @ 2016-09-19 20:08 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Fri, Sep 16, 2016 at 11:13:09AM -0700, Junio C Hamano wrote: > Heiko Voigt <hvoigt@hvoigt.net> writes: > > The most exact solution would be to use all actual remote refs available > > (not sure if we have them at this point in the process?) another > > solution would be to still append the --remotes=<remotename> option as a > > fallback to reduce the revisions. > > I'd say --remotes=<remotename> is the least problematic thing to do. Ok then lets drop my last patch and keep it the way it was. Because if the remote sha1 differs we probably do not have it locally anyway. The only case this does not catch is when the user specifies a remote URL. But that just means we will iterate over all revisions instead of a reduced set, which makes the check slower but still correct. As one can see from my measurements that should not be that bad anymore. Cheers Heiko ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-15 12:10 ` [PATCH 3/2] batch check whether submodule needs pushing into one call Heiko Voigt 2016-09-15 21:08 ` Junio C Hamano @ 2016-09-16 17:59 ` Junio C Hamano 2016-09-19 19:58 ` Heiko Voigt 1 sibling, 1 reply; 25+ messages in thread From: Junio C Hamano @ 2016-09-16 17:59 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Heiko Voigt <hvoigt@hvoigt.net> writes: > +static void append_hash_to_argv(const unsigned char sha1[20], void *data) > { > - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) > + struct argv_array *argv = (struct argv_array *) data; > + argv_array_push(argv, sha1_to_hex(sha1)); > +} Hmph, why do I think I've seen this before in the previous patch? ... scans through this patch and finds that a similar one is removed ;-) OK. This makes sense. > +static void check_has_hash(const unsigned char sha1[20], void *data) > +{ > + int *has_hash = (int *) data; > + > + if (!lookup_commit_reference(sha1)) > + *has_hash = 0; > +} > + > +static int submodule_has_hashes(const char *path, struct sha1_array *hashes) > +{ > + int has_hash = 1; > + > + if (add_submodule_odb(path)) > + return 0; > + > + sha1_array_for_each_unique(hashes, check_has_hash, &has_hash); > + return has_hash; > +} > + > +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes) > +{ > + if (!submodule_has_hashes(path, hashes)) > return 0; I think you meant well, but this optimization is wrong. A mere presence of an object does not mean that the current tip can reach that object. Imagine you pushed commit A earlier to them at the tip, then pushed commit A~ to them at the tip, which is the current state of the remote of the submodule, and since them they may have GC'ed. They no longer have the commit A. For that matter, because you are doing this check by pretending as if all the submodule objects are in the object store of the current superproject you are working in, and saying "it exists there in the submodule repository" when the only thing you know is it exists in an object store of either the submodule repository, the superproject repository, or any of the other submodule repositories, you really cannot tell much from a mere presence of an object. Not just the remote of the submodule repository you are interested in, but the submodule repository you are interested in itself, may not have that object. Drop the previous two helper functions and this short-cut. > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > struct child_process cp = CHILD_PROCESS_INIT; > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; > + > + argv_array_push(&cp.args, "rev-list"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, &cp.args); > + argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL); > + > struct strbuf buf = STRBUF_INIT; > int needs_pushing = 0; > > - argv[1] = sha1_to_hex(sha1); > - cp.argv = argv; > prepare_submodule_repo_env(&cp.env_array); > cp.git_cmd = 1; > cp.no_stdin = 1; > cp.out = -1; > cp.dir = path; > if (start_command(&cp)) > - die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s", > - sha1_to_hex(sha1), path); > + die("Could not run 'git rev-list <hashes> --not --remotes -n 1' command in submodule %s", > + path); > if (strbuf_read(&buf, cp.out, 41)) > needs_pushing = 1; > finish_command(&cp); > @@ -601,21 +628,6 @@ static void find_unpushed_submodule_commits(struct commit *commit, > diff_tree_combined_merge(commit, 1, &rev); > } Good. This is the optimization I alluded to in the review of the first one in the series. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/2] batch check whether submodule needs pushing into one call 2016-09-16 17:59 ` Junio C Hamano @ 2016-09-19 19:58 ` Heiko Voigt 0 siblings, 0 replies; 25+ messages in thread From: Heiko Voigt @ 2016-09-19 19:58 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Fri, Sep 16, 2016 at 10:59:37AM -0700, Junio C Hamano wrote: > Heiko Voigt <hvoigt@hvoigt.net> writes: > > +static void check_has_hash(const unsigned char sha1[20], void *data) > > +{ > > + int *has_hash = (int *) data; > > + > > + if (!lookup_commit_reference(sha1)) > > + *has_hash = 0; > > +} > > + > > +static int submodule_has_hashes(const char *path, struct sha1_array *hashes) > > +{ > > + int has_hash = 1; > > + > > + if (add_submodule_odb(path)) > > + return 0; > > + > > + sha1_array_for_each_unique(hashes, check_has_hash, &has_hash); > > + return has_hash; > > +} > > + > > +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes) > > +{ > > + if (!submodule_has_hashes(path, hashes)) > > return 0; > > I think you meant well, but this optimization is wrong. A mere > presence of an object does not mean that the current tip can reach > that object. Imagine you pushed commit A earlier to them at the > tip, then pushed commit A~ to them at the tip, which is the current > state of the remote of the submodule, and since them they may have > GC'ed. They no longer have the commit A. > > For that matter, because you are doing this check by pretending as > if all the submodule objects are in the object store of the current > superproject you are working in, and saying "it exists there in the > submodule repository" when the only thing you know is it exists in > an object store of either the submodule repository, the superproject > repository, or any of the other submodule repositories, you really > cannot tell much from a mere presence of an object. Not just the > remote of the submodule repository you are interested in, but the > submodule repository you are interested in itself, may not have that > object. > > Drop the previous two helper functions and this short-cut. This is nothing I added. This came from the original code which I simply extended to check for all sha1's. The diff is a little bit misleading. This would be the local diff: - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) + if (!submodule_has_hashes(path, hashes)) return 0; I think that this is more a shortcut for: If we can not find the sha1, we do not need to bother spawning a process for the real check. We default to the answer no in that case. It looks like the reason for the sha1 check is to avoid error output from the spawned 'rev-list' process. The error output might be confusing to the user in case the sha1 does not exist in the submodule. We should probably die here since we were not able to check a submodule that has a diff in the revisions we would push. After thinking about this further: I think it is actually bad to proceed here, as the current revisions (just in the superproject) would still be pushed and the user possibly gets an inconsistent state on the remote side which he tried to avoid with check/on-demand-push enabled. So in short this deserves some further love. Will have a look into adding another patch for this if we agree on my suggestion to die above. Cheers Heiko ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 4/2] use actual start hashes for submodule push check instead of local refs 2016-09-14 22:30 ` Junio C Hamano 2016-09-15 12:10 ` [PATCH 3/2] batch check whether submodule needs pushing into one call Heiko Voigt @ 2016-09-15 12:18 ` Heiko Voigt 1 sibling, 0 replies; 25+ messages in thread From: Heiko Voigt @ 2016-09-15 12:18 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Push knows the actual revision range it is actually pushing to a remote. Let's use the start revisions to reduce the amount of checked revisions instead of the locally cached remote refs which might be out of date. This actually changes behavior as it now can also properly handle pushes with URLs. That is also the reason why we change some tests. When passing the remote as URL the push is made unconditionally in on-demand. This is wrong since on-demand should only push the submodule if its SHA1 reference got changed in the superproject history that is getting pushed. The reason behind that is that the exclusion list for revisions was reduced by using the parameters "--not --remotes=<remote name>" to reduce the amount of revisions for the revision walk. In case of an URL this does not match anything and thus we would always do a full revision walk until the root commit. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> --- And here is another one which makes the check more correct. For push --recurse-submodules=check|on-demand with URLs this is also a performance improvement since at the moment we iterate over all revisions unconditionally as explained above. This patch changes the behavior (now its how its supposed to be) so there may be fallout from users complaining that their submodules do not get pushed with on-demand anymore. This is only for users using URL for pushing though. Cheers Heiko BTW: Peff, thanks for the good diagnosis of all these problems. submodule.c | 12 ++++++------ submodule.h | 6 +++--- t/t5531-deep-submodule-push.sh | 24 ++++++++++++++++++------ transport.c | 40 ++++++++++++++++++++++++++++++---------- 4 files changed, 57 insertions(+), 25 deletions(-) diff --git a/submodule.c b/submodule.c index 28bb74e..1f82974 100644 --- a/submodule.c +++ b/submodule.c @@ -639,8 +639,8 @@ static void free_submodules_sha1s(struct string_list *submodules) string_list_clear(submodules, 1); } -int find_unpushed_submodules(struct sha1_array *hashes, - const char *remotes_name, struct string_list *needs_pushing) +int find_unpushed_submodules(struct sha1_array *old_hashes, + struct sha1_array *new_hashes, struct string_list *needs_pushing) { struct rev_info rev; struct commit *commit; @@ -652,9 +652,9 @@ int find_unpushed_submodules(struct sha1_array *hashes, /* argv.argv[0] will be ignored by setup_revisions */ argv_array_push(&argv, "find_unpushed_submodules"); - sha1_array_for_each_unique(hashes, append_hash_to_argv, &argv); + sha1_array_for_each_unique(new_hashes, append_hash_to_argv, &argv); argv_array_push(&argv, "--not"); - argv_array_pushf(&argv, "--remotes=%s", remotes_name); + sha1_array_for_each_unique(old_hashes, append_hash_to_argv, &argv); setup_revisions(argv.argc, argv.argv, &rev, NULL); if (prepare_revision_walk(&rev)) @@ -700,12 +700,12 @@ static int push_submodule(const char *path) return 1; } -int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name) +int push_unpushed_submodules(struct sha1_array *old_hashes, struct sha1_array *new_hashes) { int i, ret = 1; struct string_list needs_pushing = STRING_LIST_INIT_DUP; - if (!find_unpushed_submodules(hashes, remotes_name, &needs_pushing)) + if (!find_unpushed_submodules(old_hashes, new_hashes, &needs_pushing)) return 1; for (i = 0; i < needs_pushing.nr; i++) { diff --git a/submodule.h b/submodule.h index 065b2f0..55c6c91 100644 --- a/submodule.h +++ b/submodule.h @@ -63,9 +63,9 @@ int submodule_uses_gitfile(const char *path); int ok_to_remove_submodule(const char *path); int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20], const unsigned char a[20], const unsigned char b[20], int search); -int find_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name, - struct string_list *needs_pushing); -int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name); +int find_unpushed_submodules(struct sha1_array *old_hashes, + struct sha1_array *new_hashes, struct string_list *needs_pushing); +int push_unpushed_submodules(struct sha1_array *old_hashes, struct sha1_array *new_hashes); void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir); int parallel_submodules(void); diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh index 198ce84..a5b8ef6 100755 --- a/t/t5531-deep-submodule-push.sh +++ b/t/t5531-deep-submodule-push.sh @@ -154,6 +154,8 @@ test_expect_success 'push recurse-submodules on command line overrides config' ' git fetch ../pub.git && git diff --quiet FETCH_HEAD master && (cd gar/bage && git diff --quiet origin/master master^) && + # Since this push was executed reset to previous state + (git push -f ../pub.git master^:master) && # Ensure that we can override check in the config to # disable submodule recursion entirely (alternative form) @@ -161,6 +163,8 @@ test_expect_success 'push recurse-submodules on command line overrides config' ' git fetch ../pub.git && git diff --quiet FETCH_HEAD master && (cd gar/bage && git diff --quiet origin/master master^) && + # Since this push was executed reset to previous state + (git push -f ../pub.git master^:master) && # Ensure that we can override check in the config to # push the submodule too @@ -198,11 +202,15 @@ test_expect_success 'push recurse-submodules last one wins on command line' ' git diff --quiet FETCH_HEAD master && # Check that the submodule commit did not get there (cd gar/bage && git diff --quiet origin/master master^) && + # Since this push was executed reset to previous state + (git push -f ../pub.git master^:master) && # should result in "no" git push --recurse-submodules=on-demand --no-recurse-submodules ../pub.git master && # Check that the submodule commit did not get there (cd gar/bage && git diff --quiet origin/master master^) && + # Since this push was executed reset to previous state + (git push -f ../pub.git master^:master) && # But the options in the other order should push the submodule git push --recurse-submodules=check --recurse-submodules=on-demand ../pub.git master && @@ -249,9 +257,11 @@ test_expect_success 'push succeeds if submodule commit disabling recursion from git fetch ../pub.git && git diff --quiet FETCH_HEAD master && # But that the submodule commit did not - ( cd gar/bage && git diff --quiet origin/master master^ ) && - # Now push it to avoid confusing future tests - git push --recurse-submodules=on-demand ../pub.git master + (cd gar/bage && + git diff --quiet origin/master master^ && + # Now push it to avoid confusing future tests + git push + ) ) ' @@ -271,9 +281,11 @@ test_expect_success 'push succeeds if submodule commit disabling recursion from git fetch ../pub.git && git diff --quiet FETCH_HEAD master && # But that the submodule commit did not - ( cd gar/bage && git diff --quiet origin/master master^ ) && - # Now push it to avoid confusing future tests - git push --recurse-submodules=on-demand ../pub.git master + (cd gar/bage && + git diff --quiet origin/master master^ && + # Now push it to avoid confusing future tests + git push + ) ) ' diff --git a/transport.c b/transport.c index 76e1daf..475faaf 100644 --- a/transport.c +++ b/transport.c @@ -845,6 +845,28 @@ static int run_pre_push_hook(struct transport *transport, return ret; } +static void collect_new_old_hashes(struct ref *ref, struct sha1_array *new_hashes, + struct sha1_array *old_hashes) +{ + for (; ref; ref = ref->next) { + if (!is_null_oid(&ref->new_oid)) { + /* Just need to handle non-null oid's. If there + * is no new we are handling a deletion and do + * not need to check + */ + sha1_array_append(new_hashes, ref->new_oid.hash); + + /* if the old id is null we are handling an new + * ref and can simply skip appending the oid + * since they are used to reduce the amount of + * checked revisions. + */ + if (!is_null_oid(&ref->old_oid)) + sha1_array_append(old_hashes, ref->old_oid.hash); + } + } +} + int transport_push(struct transport *transport, int refspec_nr, const char **refspec, int flags, unsigned int *reject_reasons) @@ -903,13 +925,12 @@ int transport_push(struct transport *transport, if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && !is_bare_repository()) { struct ref *ref = remote_refs; - struct sha1_array hashes = SHA1_ARRAY_INIT; + struct sha1_array new_hashes = SHA1_ARRAY_INIT; + struct sha1_array old_hashes = SHA1_ARRAY_INIT; - for (; ref; ref = ref->next) - if (!is_null_oid(&ref->new_oid)) - sha1_array_append(&hashes, ref->new_oid.hash); + collect_new_old_hashes(ref, &new_hashes, &old_hashes); - if (!push_unpushed_submodules(&hashes, transport->remote->name)) + if (!push_unpushed_submodules(&old_hashes, &new_hashes)) die ("Failed to push all needed submodules!"); } @@ -917,13 +938,12 @@ int transport_push(struct transport *transport, TRANSPORT_RECURSE_SUBMODULES_CHECK)) && !is_bare_repository()) { struct ref *ref = remote_refs; struct string_list needs_pushing = STRING_LIST_INIT_DUP; - struct sha1_array hashes = SHA1_ARRAY_INIT; + struct sha1_array new_hashes = SHA1_ARRAY_INIT; + struct sha1_array old_hashes = SHA1_ARRAY_INIT; - for (; ref; ref = ref->next) - if (!is_null_oid(&ref->new_oid)) - sha1_array_append(&hashes, ref->new_oid.hash); + collect_new_old_hashes(ref, &new_hashes, &old_hashes); - if (find_unpushed_submodules(&hashes, transport->remote->name, + if (find_unpushed_submodules(&old_hashes, &new_hashes, &needs_pushing)) die_with_unpushed_submodules(&needs_pushing); } -- 2.10.0.133.g13017a3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] serialize collection of changed submodules 2016-09-14 17:31 ` [PATCH 1/2] serialize collection of changed submodules Heiko Voigt 2016-09-14 22:30 ` Junio C Hamano @ 2016-09-16 17:27 ` Junio C Hamano 2016-09-19 19:44 ` Heiko Voigt 1 sibling, 1 reply; 25+ messages in thread From: Junio C Hamano @ 2016-09-16 17:27 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Heiko Voigt <hvoigt@hvoigt.net> writes: > +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules, > + const char *path) > +{ > + struct string_list_item *item; > + struct sha1_array *hashes; > + > + item = string_list_insert(submodules, path); > + if (item->util) > + return (struct sha1_array *) item->util; > + > + hashes = (struct sha1_array *) xmalloc(sizeof(struct sha1_array)); > + /* NEEDSWORK: should we add an initializer function for > + * sha1_array ? */ > + memset(hashes, 0, sizeof(struct sha1_array)); > + item->util = hashes; /* NEEDSWORK: should we have SHA1_ARRAY_INIT etc.? */ item->util = xcalloc(1, sizeof(struct sha1_array)); > static void collect_submodules_from_diff(struct diff_queue_struct *q, > struct diff_options *options, > void *data) > { > int i; > - struct string_list *needs_pushing = data; > + struct string_list *submodules = data; > > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > + struct sha1_array *hashes; > if (!S_ISGITLINK(p->two->mode)) > continue; > - if (submodule_needs_pushing(p->two->path, p->two->oid.hash)) > - string_list_insert(needs_pushing, p->two->path); > + hashes = get_sha1s_from_list(submodules, p->two->path); > + sha1_array_append(hashes, p->two->oid.hash); > } > } So the idea at this step is still let each commit in the top-level history inspected for any submodule change, but the result is collected in a mapping (submodule -> [ list of submodule commits ]). As we do not expect too many "oops, the old commit was better, so let's revert and rebind the old one from the submodule" in the history of the top-level, appending and then running for-each-unique is an efficient way, instead of first checking if we already have it and then inserting new ones to maintain the uniqueness. Makes sense. > @@ -582,14 +601,41 @@ static void find_unpushed_submodule_commits(struct commit *commit, > diff_tree_combined_merge(commit, 1, &rev); > } > > +struct collect_submodule_from_sha1s_data { > + char *submodule_path; > + struct string_list *needs_pushing; > +}; > + > +static void collect_submodules_from_sha1s(const unsigned char sha1[20], > + void *data) > +{ > + struct collect_submodule_from_sha1s_data *me = > + (struct collect_submodule_from_sha1s_data *) data; > + > + if (submodule_needs_pushing(me->submodule_path, sha1)) > + string_list_insert(me->needs_pushing, me->submodule_path); > +} This is called from sha1_array_for_each_unique() that iterates over the submodule commit object names for one submodule and then ends up calling submodule_needs_pushing() number of times, which smells less efficient than it could be. You can ask rev-list <all the submodule commits to be pushed> --not --remotes just once in the submodule repository. I imagine that is what you'll do in the next patch. An obvious but much less efficient way to optimize this part would be to see if me->needs_pushing already has me->submodule_path and skip the check for submodule_needs_pushing(), but if you drop the call by find_unpushed_submodule to sha1_array_for_each_unique() to walk new submodule commits one by one, that would become irrelevant. > +static void free_submodules_sha1s(struct string_list *submodules) > +{ > + int i; > + for (i = 0; i < submodules->nr; i++) { > + struct string_list_item *item = &submodules->items[i]; > + struct sha1_array *hashes = (struct sha1_array *) item->util; > + sha1_array_clear(hashes); > + } > + string_list_clear(submodules, 1); > +} > + > int find_unpushed_submodules(unsigned char new_sha1[20], > const char *remotes_name, struct string_list *needs_pushing) > { > struct rev_info rev; > struct commit *commit; > const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; > - int argc = ARRAY_SIZE(argv) - 1; > + int argc = ARRAY_SIZE(argv) - 1, i; > char *sha1_copy; > + struct string_list submodules = STRING_LIST_INIT_DUP; > > struct strbuf remotes_arg = STRBUF_INIT; > > @@ -603,12 +649,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20], > die("revision walk setup failed"); > > while ((commit = get_revision(&rev)) != NULL) > - find_unpushed_submodule_commits(commit, needs_pushing); > + find_unpushed_submodule_commits(commit, &submodules); > > reset_revision_walk(); > free(sha1_copy); > strbuf_release(&remotes_arg); > > + for (i = 0; i < submodules.nr; i++) { > + struct string_list_item *item = &submodules.items[i]; > + struct collect_submodule_from_sha1s_data data; > + data.submodule_path = item->string; > + data.needs_pushing = needs_pushing; > + sha1_array_for_each_unique((struct sha1_array *) item->util, > + collect_submodules_from_sha1s, > + &data); > + } > + free_submodules_sha1s(&submodules); > + > return needs_pushing->nr; > } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] serialize collection of changed submodules 2016-09-16 17:27 ` [PATCH 1/2] serialize collection of changed submodules Junio C Hamano @ 2016-09-19 19:44 ` Heiko Voigt 0 siblings, 0 replies; 25+ messages in thread From: Heiko Voigt @ 2016-09-19 19:44 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Fri, Sep 16, 2016 at 10:27:04AM -0700, Junio C Hamano wrote: > Heiko Voigt <hvoigt@hvoigt.net> writes: > > > +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules, > > + const char *path) > > +{ > > + struct string_list_item *item; > > + struct sha1_array *hashes; > > + > > + item = string_list_insert(submodules, path); > > + if (item->util) > > + return (struct sha1_array *) item->util; > > + > > + hashes = (struct sha1_array *) xmalloc(sizeof(struct sha1_array)); > > + /* NEEDSWORK: should we add an initializer function for > > + * sha1_array ? */ > > + memset(hashes, 0, sizeof(struct sha1_array)); > > + item->util = hashes; > > > /* NEEDSWORK: should we have SHA1_ARRAY_INIT etc.? */ > item->util = xcalloc(1, sizeof(struct sha1_array)); Ok will do. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/2] serialize collection of refs that contain submodule changes 2016-08-24 23:01 ` Jeff King 2016-09-14 17:31 ` [PATCH 1/2] serialize collection of changed submodules Heiko Voigt @ 2016-09-14 17:51 ` Heiko Voigt 2016-09-14 19:46 ` Heiko Voigt 2016-09-16 17:47 ` Junio C Hamano 1 sibling, 2 replies; 25+ messages in thread From: Heiko Voigt @ 2016-09-14 17:51 UTC (permalink / raw) To: Jeff King Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella We are iterating over each pushed ref and want to check whether it contains changes to submodules. Instead of immediately checking each ref lets first collect them and then do the check for all of them in one revision walk. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> --- Sorry this was not catched earlier. This was implemented as part of summer of code and it seems we never tested with --mirror. This is the one which does only one revision walk instead of one for each ref. Here are some numbers (using the my development clone of git itself) from my local machine: rm -rf <test-git> && mkdir <test-git> && (cd <test-git> && git init) && time git push --mirror <test-git> real 0m16.056s user 0m24.424s sys 0m1.380s real 0m15.885s user 0m24.204s sys 0m1.296s real 0m16.731s user 0m24.176s sys 0m1.244s rm -rf <test-git> && mkdir <test-git> && (cd <test-git> && git init) && time git push --mirror --recurse-submodules=check <test-git> real 0m21.441s user 0m29.560s sys 0m1.480s real 0m21.319s user 0m29.484s sys 0m1.464s real 0m21.261s user 0m29.252s sys 0m1.592s Without my patches and --recurse-submodules=check the numbers are basically the same. I stopped the test with --recurse-submodules=check after ~ 9 minutes. Cheers Heiko submodule.c | 36 +++++++++++++++++++++--------------- submodule.h | 5 +++-- transport.c | 22 ++++++++++++++-------- 3 files changed, 38 insertions(+), 25 deletions(-) diff --git a/submodule.c b/submodule.c index b04c066..a15e346 100644 --- a/submodule.c +++ b/submodule.c @@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list *submodules) string_list_clear(submodules, 1); } -int find_unpushed_submodules(unsigned char new_sha1[20], +static void append_hash_to_argv(const unsigned char sha1[20], + void *data) +{ + struct argv_array *argv = (struct argv_array *) data; + argv_array_push(argv, sha1_to_hex(sha1)); +} + +int find_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name, struct string_list *needs_pushing) { struct rev_info rev; struct commit *commit; - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; - int argc = ARRAY_SIZE(argv) - 1, i; - char *sha1_copy; + int i; struct string_list submodules = STRING_LIST_INIT_DUP; + struct argv_array argv = ARGV_ARRAY_INIT; - struct strbuf remotes_arg = STRBUF_INIT; - - strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name); init_revisions(&rev, NULL); - sha1_copy = xstrdup(sha1_to_hex(new_sha1)); - argv[1] = sha1_copy; - argv[3] = remotes_arg.buf; - setup_revisions(argc, argv, &rev, NULL); + + /* argv.argv[0] will be ignored by setup_revisions */ + argv_array_push(&argv, "find_unpushed_submodules"); + sha1_array_for_each_unique(hashes, append_hash_to_argv, &argv); + argv_array_push(&argv, "--not"); + argv_array_pushf(&argv, "--remotes=%s", remotes_name); + + setup_revisions(argv.argc, argv.argv, &rev, NULL); if (prepare_revision_walk(&rev)) die("revision walk setup failed"); @@ -652,8 +659,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20], find_unpushed_submodule_commits(commit, &submodules); reset_revision_walk(); - free(sha1_copy); - strbuf_release(&remotes_arg); + argv_array_clear(&argv); for (i = 0; i < submodules.nr; i++) { struct string_list_item *item = &submodules.items[i]; @@ -691,12 +697,12 @@ static int push_submodule(const char *path) return 1; } -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name) +int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name) { int i, ret = 1; struct string_list needs_pushing = STRING_LIST_INIT_DUP; - if (!find_unpushed_submodules(new_sha1, remotes_name, &needs_pushing)) + if (!find_unpushed_submodules(hashes, remotes_name, &needs_pushing)) return 1; for (i = 0; i < needs_pushing.nr; i++) { diff --git a/submodule.h b/submodule.h index d9e197a..065b2f0 100644 --- a/submodule.h +++ b/submodule.h @@ -3,6 +3,7 @@ struct diff_options; struct argv_array; +struct sha1_array; enum { RECURSE_SUBMODULES_CHECK = -4, @@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path); int ok_to_remove_submodule(const char *path); int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20], const unsigned char a[20], const unsigned char b[20], int search); -int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name, +int find_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name, struct string_list *needs_pushing); -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name); +int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name); void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir); int parallel_submodules(void); diff --git a/transport.c b/transport.c index 94d6dc3..76e1daf 100644 --- a/transport.c +++ b/transport.c @@ -903,23 +903,29 @@ int transport_push(struct transport *transport, if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && !is_bare_repository()) { struct ref *ref = remote_refs; + struct sha1_array hashes = SHA1_ARRAY_INIT; + for (; ref; ref = ref->next) - if (!is_null_oid(&ref->new_oid) && - !push_unpushed_submodules(ref->new_oid.hash, - transport->remote->name)) - die ("Failed to push all needed submodules!"); + if (!is_null_oid(&ref->new_oid)) + sha1_array_append(&hashes, ref->new_oid.hash); + + if (!push_unpushed_submodules(&hashes, transport->remote->name)) + die ("Failed to push all needed submodules!"); } if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND | TRANSPORT_RECURSE_SUBMODULES_CHECK)) && !is_bare_repository()) { struct ref *ref = remote_refs; struct string_list needs_pushing = STRING_LIST_INIT_DUP; + struct sha1_array hashes = SHA1_ARRAY_INIT; for (; ref; ref = ref->next) - if (!is_null_oid(&ref->new_oid) && - find_unpushed_submodules(ref->new_oid.hash, - transport->remote->name, &needs_pushing)) - die_with_unpushed_submodules(&needs_pushing); + if (!is_null_oid(&ref->new_oid)) + sha1_array_append(&hashes, ref->new_oid.hash); + + if (find_unpushed_submodules(&hashes, transport->remote->name, + &needs_pushing)) + die_with_unpushed_submodules(&needs_pushing); } push_ret = transport->push_refs(transport, remote_refs, flags); -- 2.0.2.832.g083c931 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] serialize collection of refs that contain submodule changes 2016-09-14 17:51 ` [PATCH 2/2] serialize collection of refs that contain submodule changes Heiko Voigt @ 2016-09-14 19:46 ` Heiko Voigt 2016-09-14 20:04 ` Stefan Beller 2016-09-16 17:47 ` Junio C Hamano 1 sibling, 1 reply; 25+ messages in thread From: Heiko Voigt @ 2016-09-14 19:46 UTC (permalink / raw) To: Jeff King Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Wed, Sep 14, 2016 at 07:51:30PM +0200, Heiko Voigt wrote: > Here are some numbers (using the my development clone of git > itself) from my local machine: > > rm -rf <test-git> && mkdir <test-git> && > (cd <test-git> && git init) && > time git push --mirror <test-git> > > real 0m16.056s > user 0m24.424s > sys 0m1.380s > > real 0m15.885s > user 0m24.204s > sys 0m1.296s > > real 0m16.731s > user 0m24.176s > sys 0m1.244s > > rm -rf <test-git> && mkdir <test-git> && > (cd <test-git> && git init) && > time git push --mirror --recurse-submodules=check <test-git> > > real 0m21.441s > user 0m29.560s > sys 0m1.480s > > real 0m21.319s > user 0m29.484s > sys 0m1.464s > > real 0m21.261s > user 0m29.252s > sys 0m1.592s > > Without my patches and --recurse-submodules=check the numbers are > basically the same. I stopped the test with --recurse-submodules=check > after ~ 9 minutes. Fun fact, I let the push without my patch and with --recurse-submodules=check finish: real 27m7.962s user 27m15.568s sys 0m2.016s Thats quite some time... Cheers Heiko ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] serialize collection of refs that contain submodule changes 2016-09-14 19:46 ` Heiko Voigt @ 2016-09-14 20:04 ` Stefan Beller 0 siblings, 0 replies; 25+ messages in thread From: Stefan Beller @ 2016-09-14 20:04 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Junio C Hamano, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella On Wed, Sep 14, 2016 at 12:46 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote: > On Wed, Sep 14, 2016 at 07:51:30PM +0200, Heiko Voigt wrote: >> Here are some numbers (using the my development clone of git >> itself) from my local machine: >> >> rm -rf <test-git> && mkdir <test-git> && >> (cd <test-git> && git init) && >> time git push --mirror <test-git> >> >> real 0m16.056s >> user 0m24.424s >> sys 0m1.380s >> >> real 0m15.885s >> user 0m24.204s >> sys 0m1.296s >> >> real 0m16.731s >> user 0m24.176s >> sys 0m1.244s >> >> rm -rf <test-git> && mkdir <test-git> && >> (cd <test-git> && git init) && >> time git push --mirror --recurse-submodules=check <test-git> >> >> real 0m21.441s >> user 0m29.560s >> sys 0m1.480s >> >> real 0m21.319s >> user 0m29.484s >> sys 0m1.464s >> >> real 0m21.261s >> user 0m29.252s >> sys 0m1.592s >> >> Without my patches and --recurse-submodules=check the numbers are >> basically the same. I stopped the test with --recurse-submodules=check >> after ~ 9 minutes. > > Fun fact, I let the push without my patch and with > --recurse-submodules=check finish: Thanks for the numbers, one of the major push backs for origin/sb/push-make-submodule-check-the-default was that it introduced slowness; this patch might help a bit there. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] serialize collection of refs that contain submodule changes 2016-09-14 17:51 ` [PATCH 2/2] serialize collection of refs that contain submodule changes Heiko Voigt 2016-09-14 19:46 ` Heiko Voigt @ 2016-09-16 17:47 ` Junio C Hamano 2016-09-19 19:51 ` Heiko Voigt 1 sibling, 1 reply; 25+ messages in thread From: Junio C Hamano @ 2016-09-16 17:47 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Heiko Voigt <hvoigt@hvoigt.net> writes: > diff --git a/submodule.c b/submodule.c > index b04c066..a15e346 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list *submodules) > string_list_clear(submodules, 1); > } > > -int find_unpushed_submodules(unsigned char new_sha1[20], > +static void append_hash_to_argv(const unsigned char sha1[20], > + void *data) > +{ > + struct argv_array *argv = (struct argv_array *) data; > + argv_array_push(argv, sha1_to_hex(sha1)); > +} > + > +int find_unpushed_submodules(struct sha1_array *hashes, > const char *remotes_name, struct string_list *needs_pushing) > { > struct rev_info rev; > struct commit *commit; > - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; > - int argc = ARRAY_SIZE(argv) - 1, i; > - char *sha1_copy; > + int i; > struct string_list submodules = STRING_LIST_INIT_DUP; > + struct argv_array argv = ARGV_ARRAY_INIT; > > - struct strbuf remotes_arg = STRBUF_INIT; > - > - strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name); > init_revisions(&rev, NULL); > - sha1_copy = xstrdup(sha1_to_hex(new_sha1)); > - argv[1] = sha1_copy; > - argv[3] = remotes_arg.buf; > - setup_revisions(argc, argv, &rev, NULL); > + > + /* argv.argv[0] will be ignored by setup_revisions */ > + argv_array_push(&argv, "find_unpushed_submodules"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, &argv); > + argv_array_push(&argv, "--not"); > + argv_array_pushf(&argv, "--remotes=%s", remotes_name); > + > + setup_revisions(argv.argc, argv.argv, &rev, NULL); Yes, its about time to for us to lose that fixed-size argv[] and replace it with an argv-array ;-). > if (prepare_revision_walk(&rev)) > die("revision walk setup failed"); So this one used to get a single commit at the tip of what we pushed in the superproject and was asked "Look at the history we just pushed leading to the tip commit, and tell me if any of the ones new to the remote requires submodule commits the remote does not yet have". Now the caller collects all the tip commits and asks us once: "Here are the new tips we just pushed; in the history leading to them, is there a commit that the remote did not have that requires submodule history the remote does not yet have?". Makes sort-of sense. I speculated that you would be doing the same kind of optimization to feed all positive commits to rev-list at once in each submodule repository in the review of the previous one, but you didn't do it here. You did the same for superproject in this step. Perhaps 3 or 4 would do so in the submodule repository. One thing that makes me worried is how the ref cache layer interacts with this. I see you first call push_unpushed_submodules() when ON_DEMAND is set, which would result in pushes in submodule repositories, updating their remote tracking branches. At that point, before you make another call to find_unpushed_submodules(), is our cached ref layer knows that the remote tracking branches are now up to date (otherwise, we would incorrectly judge that these submodules need pushing based on stale information)? > diff --git a/transport.c b/transport.c > index 94d6dc3..76e1daf 100644 > --- a/transport.c > +++ b/transport.c > @@ -903,23 +903,29 @@ int transport_push(struct transport *transport, > > if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && !is_bare_repository()) { > struct ref *ref = remote_refs; > + struct sha1_array hashes = SHA1_ARRAY_INIT; > + > for (; ref; ref = ref->next) > - if (!is_null_oid(&ref->new_oid) && > - !push_unpushed_submodules(ref->new_oid.hash, > - transport->remote->name)) > - die ("Failed to push all needed submodules!"); > + if (!is_null_oid(&ref->new_oid)) > + sha1_array_append(&hashes, ref->new_oid.hash); > + > + if (!push_unpushed_submodules(&hashes, transport->remote->name)) > + die ("Failed to push all needed submodules!"); Do we leak the contents of hashes here? > } > > if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND | > TRANSPORT_RECURSE_SUBMODULES_CHECK)) && !is_bare_repository()) { > struct ref *ref = remote_refs; > struct string_list needs_pushing = STRING_LIST_INIT_DUP; > + struct sha1_array hashes = SHA1_ARRAY_INIT; > > for (; ref; ref = ref->next) > - if (!is_null_oid(&ref->new_oid) && > - find_unpushed_submodules(ref->new_oid.hash, > - transport->remote->name, &needs_pushing)) > - die_with_unpushed_submodules(&needs_pushing); > + if (!is_null_oid(&ref->new_oid)) > + sha1_array_append(&hashes, ref->new_oid.hash); > + > + if (find_unpushed_submodules(&hashes, transport->remote->name, > + &needs_pushing)) > + die_with_unpushed_submodules(&needs_pushing); Do we leak the contents of hashes here? I do not think we need to worry about needs_pushing leaking, as we will always die if it is not empty, but it might be a better code hygiene to clear it as well. > } > > push_ret = transport->push_refs(transport, remote_refs, flags); Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] serialize collection of refs that contain submodule changes 2016-09-16 17:47 ` Junio C Hamano @ 2016-09-19 19:51 ` Heiko Voigt 2016-09-19 20:09 ` Junio C Hamano 0 siblings, 1 reply; 25+ messages in thread From: Heiko Voigt @ 2016-09-19 19:51 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Hi, On Fri, Sep 16, 2016 at 10:47:46AM -0700, Junio C Hamano wrote: > One thing that makes me worried is how the ref cache layer interacts > with this. I see you first call push_unpushed_submodules() when > ON_DEMAND is set, which would result in pushes in submodule > repositories, updating their remote tracking branches. At that > point, before you make another call to find_unpushed_submodules(), > is our cached ref layer knows that the remote tracking branches > are now up to date (otherwise, we would incorrectly judge that these > submodules need pushing based on stale information)? I am not sure if I understand you correctly here. With the "ref cache layer" you are referring to add_submodule_odb() which is called indirectly from submodule_needs_pushing()? Those revs are only used to check whether the hash we need on the remote side exists in the local submodule. That should not change due to a push. The actual check whether the commit(s) exist on the remote side is done using a 'rev-list' in a subprocess later. > > diff --git a/transport.c b/transport.c > > index 94d6dc3..76e1daf 100644 > > --- a/transport.c > > +++ b/transport.c > > @@ -903,23 +903,29 @@ int transport_push(struct transport *transport, > > > > if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && !is_bare_repository()) { > > struct ref *ref = remote_refs; > > + struct sha1_array hashes = SHA1_ARRAY_INIT; > > + > > for (; ref; ref = ref->next) > > - if (!is_null_oid(&ref->new_oid) && > > - !push_unpushed_submodules(ref->new_oid.hash, > > - transport->remote->name)) > > - die ("Failed to push all needed submodules!"); > > + if (!is_null_oid(&ref->new_oid)) > > + sha1_array_append(&hashes, ref->new_oid.hash); > > + > > + if (!push_unpushed_submodules(&hashes, transport->remote->name)) > > + die ("Failed to push all needed submodules!"); > > Do we leak the contents of hashes here? Good catch, sorry about that. Will clear the hashes array. > > } > > > > if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND | > > TRANSPORT_RECURSE_SUBMODULES_CHECK)) && !is_bare_repository()) { > > struct ref *ref = remote_refs; > > struct string_list needs_pushing = STRING_LIST_INIT_DUP; > > + struct sha1_array hashes = SHA1_ARRAY_INIT; > > > > for (; ref; ref = ref->next) > > - if (!is_null_oid(&ref->new_oid) && > > - find_unpushed_submodules(ref->new_oid.hash, > > - transport->remote->name, &needs_pushing)) > > - die_with_unpushed_submodules(&needs_pushing); > > + if (!is_null_oid(&ref->new_oid)) > > + sha1_array_append(&hashes, ref->new_oid.hash); > > + > > + if (find_unpushed_submodules(&hashes, transport->remote->name, > > + &needs_pushing)) > > + die_with_unpushed_submodules(&needs_pushing); > > Do we leak the contents of hashes here? I do not think we need to > worry about needs_pushing leaking, as we will always die if it is > not empty, but it might be a better code hygiene to clear it as > well. As above, will fix. Cheers Heiko ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] serialize collection of refs that contain submodule changes 2016-09-19 19:51 ` Heiko Voigt @ 2016-09-19 20:09 ` Junio C Hamano 0 siblings, 0 replies; 25+ messages in thread From: Junio C Hamano @ 2016-09-19 20:09 UTC (permalink / raw) To: Heiko Voigt Cc: Jeff King, Stefan Beller, git@vger.kernel.org, Jens Lehmann, Fredrik Gustafsson, Leandro Lucarella Heiko Voigt <hvoigt@hvoigt.net> writes: > I am not sure if I understand you correctly here. With the "ref cache layer" > you are referring to add_submodule_odb() which is called indirectly from > submodule_needs_pushing()? Those revs are only used to check whether the hash > we need on the remote side exists in the local submodule. That should not > change due to a push. The actual check whether the commit(s) exist on the > remote side is done using a 'rev-list' in a subprocess later. I was wondering what would happen in this scenario: * You have ON_DEMAND set, which causes "git -C sub push origin" to push out what are new, updating the remote tracking branches in the submodule, sub/.git/refs/remotes/origin/*. * Then you check again. If you used for-each-ref-in-submodule, the updated refs/remotes/origin/* may not have been re-read. But you check by spawning "rev-list ... --not --remotes" as a separate process in submodule_needs_pushing(), and that will force the new process to read the updated state, so it turns out that I was overly worried without good reason ;-) It may matter once somebody tries to internalize the external rev-list call done via start_command() interface to an internal call, though. But we are not there yet. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2016-09-19 20:11 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-08-24 17:30 [PATCHv2] push: change submodule default to check Stefan Beller 2016-08-24 18:38 ` Junio C Hamano [not found] ` <20160824183112.ceekegpzavnbybxp@sigill.intra.peff.net> 2016-08-24 19:37 ` Junio C Hamano 2016-08-24 21:26 ` Junio C Hamano 2016-08-24 22:37 ` Stefan Beller 2016-08-24 23:01 ` Jeff King 2016-09-14 17:31 ` [PATCH 1/2] serialize collection of changed submodules Heiko Voigt 2016-09-14 22:30 ` Junio C Hamano 2016-09-15 12:10 ` [PATCH 3/2] batch check whether submodule needs pushing into one call Heiko Voigt 2016-09-15 21:08 ` Junio C Hamano 2016-09-16 9:40 ` Heiko Voigt 2016-09-16 12:31 ` Heiko Voigt 2016-09-16 18:13 ` Junio C Hamano 2016-09-19 20:08 ` Heiko Voigt 2016-09-16 17:59 ` Junio C Hamano 2016-09-19 19:58 ` Heiko Voigt 2016-09-15 12:18 ` [PATCH 4/2] use actual start hashes for submodule push check instead of local refs Heiko Voigt 2016-09-16 17:27 ` [PATCH 1/2] serialize collection of changed submodules Junio C Hamano 2016-09-19 19:44 ` Heiko Voigt 2016-09-14 17:51 ` [PATCH 2/2] serialize collection of refs that contain submodule changes Heiko Voigt 2016-09-14 19:46 ` Heiko Voigt 2016-09-14 20:04 ` Stefan Beller 2016-09-16 17:47 ` Junio C Hamano 2016-09-19 19:51 ` Heiko Voigt 2016-09-19 20:09 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).