* Poor push performance with large number of refs @ 2014-12-10 0:37 brian m. carlson 2014-12-10 5:41 ` Shawn Pearce 2014-12-10 9:17 ` Poor push performance with large number of refs Duy Nguyen 0 siblings, 2 replies; 14+ messages in thread From: brian m. carlson @ 2014-12-10 0:37 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1647 bytes --] I have a repository that's just under 2 GiB in size and contains over 20000 refs, with a copy of it on a server. Both sides are using Git 2.1.2. If I push a branch that contains a single commit, it takes about 15 seconds to push. However, if everything is up-to-date, it completes within 2 seconds. Notably, HTTPS performs the same as SSH. Most of the time is spent between the "Pushing to remote machine" and "Counting objects", running git pack-objects: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress Unfortunately, -vvv doesn't provide any helpful output. I have some suspicions what's going on here, but no hard data. Where should I be looking to determine the bottleneck? brianc no # time git push -vvv remote-machine:/tmp/testcase +HEAD Pushing to remote-machine:/tmp/testcase Counting objects: 1, done. Writing objects: 100% (1/1), 184 bytes | 0 bytes/s, done. Total 1 (delta 0), reused 0 (delta 0) To remote-machine:/tmp/testcase + 5059893...2db9804 HEAD -> test (forced update) git push -vvv +HEAD 12.35s user 1.61s system 91% cpu 15.337 total brianc ok # time git push -vvv remote-machine:/tmp/testcase +HEAD Pushing to remote-machine:/tmp/testcase To remote-machine:/tmp/testcase = [up to date] HEAD -> test Everything up-to-date git push -vvv +HEAD 0.50s user 0.22s system 51% cpu 1.408 total -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Poor push performance with large number of refs 2014-12-10 0:37 Poor push performance with large number of refs brian m. carlson @ 2014-12-10 5:41 ` Shawn Pearce 2014-12-10 23:34 ` brian m. carlson 2014-12-10 9:17 ` Poor push performance with large number of refs Duy Nguyen 1 sibling, 1 reply; 14+ messages in thread From: Shawn Pearce @ 2014-12-10 5:41 UTC (permalink / raw) To: git On Tue, Dec 9, 2014 at 4:37 PM, brian m. carlson <sandals@crustytoothpaste.net> wrote: > I have a repository that's just under 2 GiB in size and contains over > 20000 refs, with a copy of it on a server. Both sides are using Git > 2.1.2. If I push a branch that contains a single commit, it takes about > 15 seconds to push. However, if everything is up-to-date, it completes > within 2 seconds. Notably, HTTPS performs the same as SSH. > > Most of the time is spent between the "Pushing to remote machine" and > "Counting objects", running git pack-objects: > > git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress > > Unfortunately, -vvv doesn't provide any helpful output. I have some > suspicions what's going on here, but no hard data. Where should I > be looking to determine the bottleneck? My guess is the revision queue is struggling to insert 20,000 commits that the remote side "has", are uninteresting, and should not be transmitted. This queue insertion usually requires parsing the commit object out of the local object store to get the commit timestamp, then bubble sort inserting that commit into the queue. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Poor push performance with large number of refs 2014-12-10 5:41 ` Shawn Pearce @ 2014-12-10 23:34 ` brian m. carlson 2014-12-10 23:49 ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson 2014-12-11 1:41 ` Poor push performance with large number of refs Duy Nguyen 0 siblings, 2 replies; 14+ messages in thread From: brian m. carlson @ 2014-12-10 23:34 UTC (permalink / raw) To: Shawn Pearce; +Cc: git [-- Attachment #1: Type: text/plain, Size: 1918 bytes --] On Tue, Dec 09, 2014 at 09:41:28PM -0800, Shawn Pearce wrote: > On Tue, Dec 9, 2014 at 4:37 PM, brian m. carlson > <sandals@crustytoothpaste.net> wrote: > > Most of the time is spent between the "Pushing to remote machine" and > > "Counting objects", running git pack-objects: > > > > git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress > > > > Unfortunately, -vvv doesn't provide any helpful output. I have some > > suspicions what's going on here, but no hard data. Where should I > > be looking to determine the bottleneck? > > My guess is the revision queue is struggling to insert 20,000 commits > that the remote side "has", are uninteresting, and should not be > transmitted. This queue insertion usually requires parsing the commit > object out of the local object store to get the commit timestamp, then > bubble sort inserting that commit into the queue. I looked at this more in depth today and I found that the bottleneck is --thin. I tried git send-pack, which does not use --thin by default, which led me to further testing. A particular push went from 24 seconds with --thin to 4 seconds without. I agree that the large number of refs is at least part of the problem, because reducing the number of refs has a slight but noticeable impact. It's also the factor I can least control. I have a patch which allows per-remote configuration of whether to use thin packs (which I will send shortly), but I'm wondering if we can do better, especially since --thin is the default. It looks like --thin forces pack-objects to do its own lookup (essentially a rev-list) instead of using the values provided on stdin. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] push: add remote.pushThin to control thin pack generation 2014-12-10 23:34 ` brian m. carlson @ 2014-12-10 23:49 ` brian m. carlson 2014-12-11 0:43 ` brian m. carlson 2014-12-11 1:41 ` Poor push performance with large number of refs Duy Nguyen 1 sibling, 1 reply; 14+ messages in thread From: brian m. carlson @ 2014-12-10 23:49 UTC (permalink / raw) To: git From: "brian m. carlson" <brian.carlson@cpanel.net> Thin packs are enabled when pushing by default, but with a large number of refs and a fast network, git may spend more time trying to find a good delta than it would spend simply sending data over the network. Add a per-remote option, pushThin, that controls the default for pushes on that remote. Signed-off-by: brian m. carlson <brian.carlson@cpanel.net> --- SOURCES/git/Documentation/config.txt | 6 ++++++ SOURCES/git/builtin/push.c | 6 ++++-- SOURCES/git/remote.c | 3 +++ SOURCES/git/remote.h | 1 + SOURCES/git/transport.c | 2 ++ 5 files changed, 16 insertions(+), 2 deletions(-) diff --git a/SOURCES/git/Documentation/config.txt b/SOURCES/git/Documentation/config.txt index 9220725..7fededd 100644 --- a/SOURCES/git/Documentation/config.txt +++ b/SOURCES/git/Documentation/config.txt @@ -2178,6 +2178,12 @@ remote.<name>.push:: The default set of "refspec" for linkgit:git-push[1]. See linkgit:git-push[1]. +remote.<name>.pushThin:: + If true (the default), pass the \--thin option to + linkgit:git-pack-objects[1] during push. This results in a smaller + pack being sent and can improve push time over slow networks. Over + fast networks, setting this value to false may improve performance. + remote.<name>.mirror:: If true, pushing to this remote will automatically behave as if the `--mirror` option was given on the command line. diff --git a/SOURCES/git/builtin/push.c b/SOURCES/git/builtin/push.c index a076b19..ae39677 100644 --- a/SOURCES/git/builtin/push.c +++ b/SOURCES/git/builtin/push.c @@ -15,7 +15,7 @@ static const char * const push_usage[] = { NULL, }; -static int thin = 1; +static int thin = -1; static int deleterefs; static const char *receivepack; static int verbosity; @@ -347,7 +347,9 @@ static int push_with_options(struct transport *transport, int flags) if (receivepack) transport_set_option(transport, TRANS_OPT_RECEIVEPACK, receivepack); - transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL); + + if (thin != -1) + transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL); if (!is_empty_cas(&cas)) { if (!transport->smart_options) diff --git a/SOURCES/git/remote.c b/SOURCES/git/remote.c index f624217..54777cb 100644 --- a/SOURCES/git/remote.c +++ b/SOURCES/git/remote.c @@ -175,6 +175,7 @@ static struct remote *make_remote(const char *name, int len) ret = xcalloc(1, sizeof(struct remote)); ret->prune = -1; /* unspecified */ + ret->push_thin = 1; /* default on */ ALLOC_GROW(remotes, remotes_nr + 1, remotes_alloc); remotes[remotes_nr++] = ret; ret->name = xstrndup(name, len); @@ -445,6 +446,8 @@ static int handle_config(const char *key, const char *value, void *cb) if (git_config_string(&v, key, value)) return -1; add_push_refspec(remote, v); + } else if (!strcmp(subkey, ".pushthin")) { + remote->push_thin = git_config_bool(key, value); } else if (!strcmp(subkey, ".fetch")) { const char *v; if (git_config_string(&v, key, value)) diff --git a/SOURCES/git/remote.h b/SOURCES/git/remote.h index 8b62efd..badf266 100644 --- a/SOURCES/git/remote.h +++ b/SOURCES/git/remote.h @@ -46,6 +46,7 @@ struct remote { int skip_default_update; int mirror; int prune; + int push_thin; const char *receivepack; const char *uploadpack; diff --git a/SOURCES/git/transport.c b/SOURCES/git/transport.c index 70d38e4..2f495fa 100644 --- a/SOURCES/git/transport.c +++ b/SOURCES/git/transport.c @@ -987,6 +987,8 @@ struct transport *transport_get(struct remote *remote, const char *url) ret->smart_options->receivepack = remote->receivepack; } + transport_set_option(ret, TRANS_OPT_THIN, remote->push_thin ? "yes" : NULL); + return ret; } -- 2.2.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] push: add remote.pushThin to control thin pack generation 2014-12-10 23:49 ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson @ 2014-12-11 0:43 ` brian m. carlson 0 siblings, 0 replies; 14+ messages in thread From: brian m. carlson @ 2014-12-11 0:43 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 4833 bytes --] On Wed, Dec 10, 2014 at 11:49:49PM +0000, brian m. carlson wrote: > From: "brian m. carlson" <brian.carlson@cpanel.net> > > Thin packs are enabled when pushing by default, but with a large number > of refs and a fast network, git may spend more time trying to find a > good delta than it would spend simply sending data over the network. > Add a per-remote option, pushThin, that controls the default for pushes > on that remote. I just realized this patch doesn't apply outside of the particular repo I was using. Consider this more of an RFC, since I'd like to avoid going this route if possible, in favor of a more robust approach. > Signed-off-by: brian m. carlson <brian.carlson@cpanel.net> > --- > SOURCES/git/Documentation/config.txt | 6 ++++++ > SOURCES/git/builtin/push.c | 6 ++++-- > SOURCES/git/remote.c | 3 +++ > SOURCES/git/remote.h | 1 + > SOURCES/git/transport.c | 2 ++ > 5 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/SOURCES/git/Documentation/config.txt b/SOURCES/git/Documentation/config.txt > index 9220725..7fededd 100644 > --- a/SOURCES/git/Documentation/config.txt > +++ b/SOURCES/git/Documentation/config.txt > @@ -2178,6 +2178,12 @@ remote.<name>.push:: > The default set of "refspec" for linkgit:git-push[1]. See > linkgit:git-push[1]. > > +remote.<name>.pushThin:: > + If true (the default), pass the \--thin option to > + linkgit:git-pack-objects[1] during push. This results in a smaller > + pack being sent and can improve push time over slow networks. Over > + fast networks, setting this value to false may improve performance. > + > remote.<name>.mirror:: > If true, pushing to this remote will automatically behave > as if the `--mirror` option was given on the command line. > diff --git a/SOURCES/git/builtin/push.c b/SOURCES/git/builtin/push.c > index a076b19..ae39677 100644 > --- a/SOURCES/git/builtin/push.c > +++ b/SOURCES/git/builtin/push.c > @@ -15,7 +15,7 @@ static const char * const push_usage[] = { > NULL, > }; > > -static int thin = 1; > +static int thin = -1; > static int deleterefs; > static const char *receivepack; > static int verbosity; > @@ -347,7 +347,9 @@ static int push_with_options(struct transport *transport, int flags) > if (receivepack) > transport_set_option(transport, > TRANS_OPT_RECEIVEPACK, receivepack); > - transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL); > + > + if (thin != -1) > + transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL); > > if (!is_empty_cas(&cas)) { > if (!transport->smart_options) > diff --git a/SOURCES/git/remote.c b/SOURCES/git/remote.c > index f624217..54777cb 100644 > --- a/SOURCES/git/remote.c > +++ b/SOURCES/git/remote.c > @@ -175,6 +175,7 @@ static struct remote *make_remote(const char *name, int len) > > ret = xcalloc(1, sizeof(struct remote)); > ret->prune = -1; /* unspecified */ > + ret->push_thin = 1; /* default on */ > ALLOC_GROW(remotes, remotes_nr + 1, remotes_alloc); > remotes[remotes_nr++] = ret; > ret->name = xstrndup(name, len); > @@ -445,6 +446,8 @@ static int handle_config(const char *key, const char *value, void *cb) > if (git_config_string(&v, key, value)) > return -1; > add_push_refspec(remote, v); > + } else if (!strcmp(subkey, ".pushthin")) { > + remote->push_thin = git_config_bool(key, value); > } else if (!strcmp(subkey, ".fetch")) { > const char *v; > if (git_config_string(&v, key, value)) > diff --git a/SOURCES/git/remote.h b/SOURCES/git/remote.h > index 8b62efd..badf266 100644 > --- a/SOURCES/git/remote.h > +++ b/SOURCES/git/remote.h > @@ -46,6 +46,7 @@ struct remote { > int skip_default_update; > int mirror; > int prune; > + int push_thin; > > const char *receivepack; > const char *uploadpack; > diff --git a/SOURCES/git/transport.c b/SOURCES/git/transport.c > index 70d38e4..2f495fa 100644 > --- a/SOURCES/git/transport.c > +++ b/SOURCES/git/transport.c > @@ -987,6 +987,8 @@ struct transport *transport_get(struct remote *remote, const char *url) > ret->smart_options->receivepack = remote->receivepack; > } > > + transport_set_option(ret, TRANS_OPT_THIN, remote->push_thin ? "yes" : NULL); > + > return ret; > } > > -- > 2.2.0 > > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Poor push performance with large number of refs 2014-12-10 23:34 ` brian m. carlson 2014-12-10 23:49 ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson @ 2014-12-11 1:41 ` Duy Nguyen 2014-12-11 3:09 ` brian m. carlson 1 sibling, 1 reply; 14+ messages in thread From: Duy Nguyen @ 2014-12-11 1:41 UTC (permalink / raw) To: Shawn Pearce, git On Thu, Dec 11, 2014 at 6:34 AM, brian m. carlson <sandals@crustytoothpaste.net> wrote: > I looked at this more in depth today and I found that the bottleneck is > --thin. I tried git send-pack, which does not use --thin by default, > which led me to further testing. A particular push went from 24 seconds > with --thin to 4 seconds without. > > I agree that the large number of refs is at least part of the problem, > because reducing the number of refs has a slight but noticeable impact. > It's also the factor I can least control. > > I have a patch which allows per-remote configuration of whether to use > thin packs (which I will send shortly), but I'm wondering if we can do > better, especially since --thin is the default. It looks like --thin > forces pack-objects to do its own lookup (essentially a rev-list) > instead of using the values provided on stdin. It could be a regression by fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16). That commit makes --thin a lot more agressive (reading lots of trees). You can try to revert that commit (or use a git version without that commit) and see if it improves performance. If so, we probably want to enable that code for shallow repos only. -- Duy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Poor push performance with large number of refs 2014-12-11 1:41 ` Poor push performance with large number of refs Duy Nguyen @ 2014-12-11 3:09 ` brian m. carlson 2014-12-11 3:46 ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson 0 siblings, 1 reply; 14+ messages in thread From: brian m. carlson @ 2014-12-11 3:09 UTC (permalink / raw) To: Duy Nguyen; +Cc: Shawn Pearce, git [-- Attachment #1: Type: text/plain, Size: 850 bytes --] On Thu, Dec 11, 2014 at 08:41:07AM +0700, Duy Nguyen wrote: > It could be a regression by fbd4a70 (list-objects: mark more commits > as edges in mark_edges_uninteresting - 2013-08-16). That commit makes > --thin a lot more agressive (reading lots of trees). You can try to > revert that commit (or use a git version without that commit) and see > if it improves performance. If so, we probably want to enable that > code for shallow repos only. That's exactly it. With Git 2.2.0, --no-thin was 2.295s, --thin was 8.769s, and --thin with the patch reverted was 3.645s. I'll come up with a patch. Thanks for the suggestion. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] list-objects: mark fewer commits as edges for non-shallow clones 2014-12-11 3:09 ` brian m. carlson @ 2014-12-11 3:46 ` brian m. carlson 2014-12-11 10:51 ` Duy Nguyen 2014-12-11 19:13 ` Junio C Hamano 0 siblings, 2 replies; 14+ messages in thread From: brian m. carlson @ 2014-12-11 3:46 UTC (permalink / raw) To: git; +Cc: pclouds, gitster In commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), we made --thin much more aggressive by reading lots more trees. This produces much smaller packs for shallow clones; however, it causes a significant performance regression for non-shallow clones with lots of refs (23.322 seconds vs. 4.785 seconds with 22400 refs). Limit this extra edge-marking to shallow clones to avoid poor performance. Suggested-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- list-objects.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/list-objects.c b/list-objects.c index 2910bec..274ebb4 100644 --- a/list-objects.c +++ b/list-objects.c @@ -151,13 +151,14 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge) { struct commit_list *list; int i; + int shallow = is_repository_shallow(); for (list = revs->commits; list; list = list->next) { struct commit *commit = list->item; if (commit->object.flags & UNINTERESTING) { mark_tree_uninteresting(commit->tree); - if (revs->edge_hint && !(commit->object.flags & SHOWN)) { + if (shallow && revs->edge_hint && !(commit->object.flags & SHOWN)) { commit->object.flags |= SHOWN; show_edge(commit); } @@ -165,6 +166,8 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge) } mark_edge_parents_uninteresting(commit, revs, show_edge); } + if (!shallow) + return; if (revs->edge_hint) { for (i = 0; i < revs->cmdline.nr; i++) { struct object *obj = revs->cmdline.rev[i].item; -- 2.2.0.rc0.207.ga3a616c ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones 2014-12-11 3:46 ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson @ 2014-12-11 10:51 ` Duy Nguyen 2014-12-20 22:18 ` brian m. carlson 2014-12-11 19:13 ` Junio C Hamano 1 sibling, 1 reply; 14+ messages in thread From: Duy Nguyen @ 2014-12-11 10:51 UTC (permalink / raw) To: brian m. carlson; +Cc: Git Mailing List, Junio C Hamano On Thu, Dec 11, 2014 at 10:46 AM, brian m. carlson <sandals@crustytoothpaste.net> wrote: > In commit fbd4a70 (list-objects: mark more commits as edges in > mark_edges_uninteresting - 2013-08-16), we made --thin much more > aggressive by reading lots more trees. This produces much smaller packs > for shallow clones; however, it causes a significant performance > regression for non-shallow clones with lots of refs (23.322 seconds vs. > 4.785 seconds with 22400 refs). Limit this extra edge-marking to > shallow clones to avoid poor performance. I'm glad it's now working better for you. Out of curiosity, have you enabled pack bitmap on this repo? I would expect it to reduce time some (or a lot) more, or we would find something to optimize further. -- Duy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones 2014-12-11 10:51 ` Duy Nguyen @ 2014-12-20 22:18 ` brian m. carlson 0 siblings, 0 replies; 14+ messages in thread From: brian m. carlson @ 2014-12-20 22:18 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 644 bytes --] On Thu, Dec 11, 2014 at 05:51:54PM +0700, Duy Nguyen wrote: > I'm glad it's now working better for you. Out of curiosity, have you > enabled pack bitmap on this repo? I would expect it to reduce time > some (or a lot) more, or we would find something to optimize further. The client performs much, much worse with pack bitmaps. The server has bitmaps enabled, but it doesn't have any patches for this issue applied to it. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones 2014-12-11 3:46 ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson 2014-12-11 10:51 ` Duy Nguyen @ 2014-12-11 19:13 ` Junio C Hamano 2014-12-11 19:26 ` Junio C Hamano 1 sibling, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2014-12-11 19:13 UTC (permalink / raw) To: brian m. carlson; +Cc: git, pclouds "brian m. carlson" <sandals@crustytoothpaste.net> writes: > In commit fbd4a70 (list-objects: mark more commits as edges in > mark_edges_uninteresting - 2013-08-16), we made --thin much more > aggressive by reading lots more trees. This produces much smaller packs > for shallow clones; however, it causes a significant performance > regression for non-shallow clones with lots of refs (23.322 seconds vs. > 4.785 seconds with 22400 refs). Limit this extra edge-marking to > shallow clones to avoid poor performance. This change affects non-clone/fetch uses of object listing depending on the shallowness of the repository, and does not even care if it is driven as part of the pack-object codepath, if I am reading it correctly. It smells wrong. The problematic fbd4a70 already had unintended fallout that needed to be corrected with 200abe74 (list-objects: only look at cmdline trees with edge_hint, 2014-01-20). The current code with the fix, the decision to use the more expensive marking is tied to "edge_hint". I notice that edge_hint is turned on only if the caller of rev-list passes the "--objects-edge" option, and currently that only happens in the pack-objects codepath when "thin" is given. Perhaps that part should decide if it really wants to do edge_hint depending on the shallowness of the repository perhaps? That is, something like this instead? builtin/pack-objects.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 3f9f5c7..a9ebf56 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2709,7 +2709,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) usage_with_options(pack_usage, pack_objects_options); argv_array_push(&rp, "pack-objects"); - if (thin) { + if (thin && is_repository_shallow()) { use_internal_rev_list = 1; argv_array_push(&rp, "--objects-edge"); } else ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones 2014-12-11 19:13 ` Junio C Hamano @ 2014-12-11 19:26 ` Junio C Hamano 2014-12-20 19:28 ` brian m. carlson 0 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2014-12-11 19:26 UTC (permalink / raw) To: brian m. carlson; +Cc: git, pclouds Junio C Hamano <gitster@pobox.com> writes: > This change affects non-clone/fetch uses of object listing depending > on the shallowness of the repository, and does not even care if it > is driven as part of the pack-object codepath, if I am reading it > correctly. It smells wrong. > > The problematic fbd4a70 already had unintended fallout that needed > to be corrected with 200abe74 (list-objects: only look at cmdline > trees with edge_hint, 2014-01-20). The current code with the fix, > the decision to use the more expensive marking is tied to > "edge_hint". I notice that edge_hint is turned on only if the caller > of rev-list passes the "--objects-edge" option, and currently that > only happens in the pack-objects codepath when "thin" is given. > Perhaps that part should decide if it really wants to do edge_hint > depending on the shallowness of the repository perhaps? > > That is, something like this instead? Eh, perhaps not like that, as that would disable milder use of "thin" when fetching into non-shallow repository. The right approach would be more like allocating one more bit in struct rev_info (call that edge_hint_aggressive), give a new option "--objects-edge-aggressive", and do something like if (thin) { use_internal_rev_list = 1; argv_array_push(&rp, is_repository_shallow() ? "--objects-edge-aggressive" : "--objects-edge"); } in this codepath? I'd actually suggest is_repository_shallow() detection to happen one level even higher (i.e. make decision at the caller of pack-objects) and decide to pass either "--thin" or "--thin-aggressive", so that we can make sure that the damage caused by fbd4a70 to be limited only to fetches into shallow repository with stronger confidence. > > builtin/pack-objects.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 3f9f5c7..a9ebf56 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -2709,7 +2709,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > usage_with_options(pack_usage, pack_objects_options); > > argv_array_push(&rp, "pack-objects"); > - if (thin) { > + if (thin && is_repository_shallow()) { > use_internal_rev_list = 1; > argv_array_push(&rp, "--objects-edge"); > } else ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones 2014-12-11 19:26 ` Junio C Hamano @ 2014-12-20 19:28 ` brian m. carlson 0 siblings, 0 replies; 14+ messages in thread From: brian m. carlson @ 2014-12-20 19:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds [-- Attachment #1: Type: text/plain, Size: 1533 bytes --] On Thu, Dec 11, 2014 at 11:26:47AM -0800, Junio C Hamano wrote: > The right approach would be more like allocating one more bit in > struct rev_info (call that edge_hint_aggressive), give a new option > "--objects-edge-aggressive", and do something like > > if (thin) { > use_internal_rev_list = 1; > argv_array_push(&rp, is_repository_shallow() > ? "--objects-edge-aggressive" > : "--objects-edge"); > } > > in this codepath? I'd actually suggest is_repository_shallow() > detection to happen one level even higher (i.e. make decision at the > caller of pack-objects) and decide to pass either "--thin" or > "--thin-aggressive", so that we can make sure that the damage caused > by fbd4a70 to be limited only to fetches into shallow repository > with stronger confidence. Sorry it's taken me so long to get back to this. Real life keeps getting in the way. I think adding --objects-edge-aggressive is the best way forward here and then applying the patch above. If we add --thin-aggressive, we push the problem to a higher level, which will require more code changes and make the performance regression continue to affect external remote helpers, which we don't want. We know what the right decision is here, so let's just do it. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Poor push performance with large number of refs 2014-12-10 0:37 Poor push performance with large number of refs brian m. carlson 2014-12-10 5:41 ` Shawn Pearce @ 2014-12-10 9:17 ` Duy Nguyen 1 sibling, 0 replies; 14+ messages in thread From: Duy Nguyen @ 2014-12-10 9:17 UTC (permalink / raw) To: Git Mailing List On Wed, Dec 10, 2014 at 7:37 AM, brian m. carlson <sandals@crustytoothpaste.net> wrote: > I have a repository that's just under 2 GiB in size and contains over > 20000 refs, with a copy of it on a server. Both sides are using Git > 2.1.2. If I push a branch that contains a single commit, it takes about > 15 seconds to push. However, if everything is up-to-date, it completes > within 2 seconds. Notably, HTTPS performs the same as SSH. > > Most of the time is spent between the "Pushing to remote machine" and > "Counting objects", running git pack-objects: > > git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress > > Unfortunately, -vvv doesn't provide any helpful output. I have some > suspicions what's going on here, but no hard data. Where should I > be looking to determine the bottleneck? Start with "perf record", if this is on linux? -- Duy ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-12-20 22:18 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-12-10 0:37 Poor push performance with large number of refs brian m. carlson 2014-12-10 5:41 ` Shawn Pearce 2014-12-10 23:34 ` brian m. carlson 2014-12-10 23:49 ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson 2014-12-11 0:43 ` brian m. carlson 2014-12-11 1:41 ` Poor push performance with large number of refs Duy Nguyen 2014-12-11 3:09 ` brian m. carlson 2014-12-11 3:46 ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson 2014-12-11 10:51 ` Duy Nguyen 2014-12-20 22:18 ` brian m. carlson 2014-12-11 19:13 ` Junio C Hamano 2014-12-11 19:26 ` Junio C Hamano 2014-12-20 19:28 ` brian m. carlson 2014-12-10 9:17 ` Poor push performance with large number of refs Duy Nguyen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).