git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Poor push performance with large number of refs
@ 2014-12-10  0:37 brian m. carlson
  2014-12-10  5:41 ` Shawn Pearce
  2014-12-10  9:17 ` Poor push performance with large number of refs Duy Nguyen
  0 siblings, 2 replies; 14+ messages in thread
From: brian m. carlson @ 2014-12-10  0:37 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]

I have a repository that's just under 2 GiB in size and contains over
20000 refs, with a copy of it on a server.  Both sides are using Git
2.1.2.  If I push a branch that contains a single commit, it takes about
15 seconds to push.  However, if everything is up-to-date, it completes
within 2 seconds.  Notably, HTTPS performs the same as SSH.

Most of the time is spent between the "Pushing to remote machine" and
"Counting objects", running git pack-objects:

  git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress

Unfortunately, -vvv doesn't provide any helpful output.  I have some
suspicions what's going on here, but no hard data.  Where should I
be looking to determine the bottleneck?

  brianc no # time git push -vvv remote-machine:/tmp/testcase +HEAD
  Pushing to remote-machine:/tmp/testcase
  Counting objects: 1, done.
  Writing objects: 100% (1/1), 184 bytes | 0 bytes/s, done.
  Total 1 (delta 0), reused 0 (delta 0)
  To remote-machine:/tmp/testcase
   + 5059893...2db9804 HEAD -> test (forced update)
  git push -vvv +HEAD  12.35s user 1.61s system 91% cpu 15.337 total
  
  brianc ok # time git push -vvv remote-machine:/tmp/testcase +HEAD
  Pushing to remote-machine:/tmp/testcase
  To remote-machine:/tmp/testcase
   = [up to date]      HEAD -> test
  Everything up-to-date
  git push -vvv  +HEAD  0.50s user 0.22s system 51% cpu 1.408 total
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Poor push performance with large number of refs
  2014-12-10  0:37 Poor push performance with large number of refs brian m. carlson
@ 2014-12-10  5:41 ` Shawn Pearce
  2014-12-10 23:34   ` brian m. carlson
  2014-12-10  9:17 ` Poor push performance with large number of refs Duy Nguyen
  1 sibling, 1 reply; 14+ messages in thread
From: Shawn Pearce @ 2014-12-10  5:41 UTC (permalink / raw)
  To: git

On Tue, Dec 9, 2014 at 4:37 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> I have a repository that's just under 2 GiB in size and contains over
> 20000 refs, with a copy of it on a server.  Both sides are using Git
> 2.1.2.  If I push a branch that contains a single commit, it takes about
> 15 seconds to push.  However, if everything is up-to-date, it completes
> within 2 seconds.  Notably, HTTPS performs the same as SSH.
>
> Most of the time is spent between the "Pushing to remote machine" and
> "Counting objects", running git pack-objects:
>
>   git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
>
> Unfortunately, -vvv doesn't provide any helpful output.  I have some
> suspicions what's going on here, but no hard data.  Where should I
> be looking to determine the bottleneck?

My guess is the revision queue is struggling to insert 20,000 commits
that the remote side "has", are uninteresting, and should not be
transmitted. This queue insertion usually requires parsing the commit
object out of the local object store to get the commit timestamp, then
bubble sort inserting that commit into the queue.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Poor push performance with large number of refs
  2014-12-10  0:37 Poor push performance with large number of refs brian m. carlson
  2014-12-10  5:41 ` Shawn Pearce
@ 2014-12-10  9:17 ` Duy Nguyen
  1 sibling, 0 replies; 14+ messages in thread
From: Duy Nguyen @ 2014-12-10  9:17 UTC (permalink / raw)
  To: Git Mailing List

On Wed, Dec 10, 2014 at 7:37 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> I have a repository that's just under 2 GiB in size and contains over
> 20000 refs, with a copy of it on a server.  Both sides are using Git
> 2.1.2.  If I push a branch that contains a single commit, it takes about
> 15 seconds to push.  However, if everything is up-to-date, it completes
> within 2 seconds.  Notably, HTTPS performs the same as SSH.
>
> Most of the time is spent between the "Pushing to remote machine" and
> "Counting objects", running git pack-objects:
>
>   git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
>
> Unfortunately, -vvv doesn't provide any helpful output.  I have some
> suspicions what's going on here, but no hard data.  Where should I
> be looking to determine the bottleneck?

Start with "perf record", if this is on linux?
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Poor push performance with large number of refs
  2014-12-10  5:41 ` Shawn Pearce
@ 2014-12-10 23:34   ` brian m. carlson
  2014-12-10 23:49     ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson
  2014-12-11  1:41     ` Poor push performance with large number of refs Duy Nguyen
  0 siblings, 2 replies; 14+ messages in thread
From: brian m. carlson @ 2014-12-10 23:34 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1918 bytes --]

On Tue, Dec 09, 2014 at 09:41:28PM -0800, Shawn Pearce wrote:
> On Tue, Dec 9, 2014 at 4:37 PM, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > Most of the time is spent between the "Pushing to remote machine" and
> > "Counting objects", running git pack-objects:
> >
> >   git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
> >
> > Unfortunately, -vvv doesn't provide any helpful output.  I have some
> > suspicions what's going on here, but no hard data.  Where should I
> > be looking to determine the bottleneck?
> 
> My guess is the revision queue is struggling to insert 20,000 commits
> that the remote side "has", are uninteresting, and should not be
> transmitted. This queue insertion usually requires parsing the commit
> object out of the local object store to get the commit timestamp, then
> bubble sort inserting that commit into the queue.

I looked at this more in depth today and I found that the bottleneck is
--thin.  I tried git send-pack, which does not use --thin by default,
which led me to further testing.  A particular push went from 24 seconds
with --thin to 4 seconds without.

I agree that the large number of refs is at least part of the problem,
because reducing the number of refs has a slight but noticeable impact.
It's also the factor I can least control.

I have a patch which allows per-remote configuration of whether to use
thin packs (which I will send shortly), but I'm wondering if we can do
better, especially since --thin is the default.  It looks like --thin
forces pack-objects to do its own lookup (essentially a rev-list)
instead of using the values provided on stdin.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] push: add remote.pushThin to control thin pack generation
  2014-12-10 23:34   ` brian m. carlson
@ 2014-12-10 23:49     ` brian m. carlson
  2014-12-11  0:43       ` brian m. carlson
  2014-12-11  1:41     ` Poor push performance with large number of refs Duy Nguyen
  1 sibling, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2014-12-10 23:49 UTC (permalink / raw)
  To: git

From: "brian m. carlson" <brian.carlson@cpanel.net>

Thin packs are enabled when pushing by default, but with a large number
of refs and a fast network, git may spend more time trying to find a
good delta than it would spend simply sending data over the network.
Add a per-remote option, pushThin, that controls the default for pushes
on that remote.

Signed-off-by: brian m. carlson <brian.carlson@cpanel.net>
---
 SOURCES/git/Documentation/config.txt | 6 ++++++
 SOURCES/git/builtin/push.c           | 6 ++++--
 SOURCES/git/remote.c                 | 3 +++
 SOURCES/git/remote.h                 | 1 +
 SOURCES/git/transport.c              | 2 ++
 5 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/SOURCES/git/Documentation/config.txt b/SOURCES/git/Documentation/config.txt
index 9220725..7fededd 100644
--- a/SOURCES/git/Documentation/config.txt
+++ b/SOURCES/git/Documentation/config.txt
@@ -2178,6 +2178,12 @@ remote.<name>.push::
 	The default set of "refspec" for linkgit:git-push[1]. See
 	linkgit:git-push[1].
 
+remote.<name>.pushThin::
+	If true (the default), pass the \--thin option to
+	linkgit:git-pack-objects[1] during push.  This results in a smaller
+	pack being sent and can improve push time over slow networks.  Over
+	fast networks, setting this value to false may improve performance.
+
 remote.<name>.mirror::
 	If true, pushing to this remote will automatically behave
 	as if the `--mirror` option was given on the command line.
diff --git a/SOURCES/git/builtin/push.c b/SOURCES/git/builtin/push.c
index a076b19..ae39677 100644
--- a/SOURCES/git/builtin/push.c
+++ b/SOURCES/git/builtin/push.c
@@ -15,7 +15,7 @@ static const char * const push_usage[] = {
 	NULL,
 };
 
-static int thin = 1;
+static int thin = -1;
 static int deleterefs;
 static const char *receivepack;
 static int verbosity;
@@ -347,7 +347,9 @@ static int push_with_options(struct transport *transport, int flags)
 	if (receivepack)
 		transport_set_option(transport,
 				     TRANS_OPT_RECEIVEPACK, receivepack);
-	transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL);
+
+	if (thin != -1)
+		transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL);
 
 	if (!is_empty_cas(&cas)) {
 		if (!transport->smart_options)
diff --git a/SOURCES/git/remote.c b/SOURCES/git/remote.c
index f624217..54777cb 100644
--- a/SOURCES/git/remote.c
+++ b/SOURCES/git/remote.c
@@ -175,6 +175,7 @@ static struct remote *make_remote(const char *name, int len)
 
 	ret = xcalloc(1, sizeof(struct remote));
 	ret->prune = -1;  /* unspecified */
+	ret->push_thin = 1; /* default on */
 	ALLOC_GROW(remotes, remotes_nr + 1, remotes_alloc);
 	remotes[remotes_nr++] = ret;
 	ret->name = xstrndup(name, len);
@@ -445,6 +446,8 @@ static int handle_config(const char *key, const char *value, void *cb)
 		if (git_config_string(&v, key, value))
 			return -1;
 		add_push_refspec(remote, v);
+	} else if (!strcmp(subkey, ".pushthin")) {
+		remote->push_thin = git_config_bool(key, value);
 	} else if (!strcmp(subkey, ".fetch")) {
 		const char *v;
 		if (git_config_string(&v, key, value))
diff --git a/SOURCES/git/remote.h b/SOURCES/git/remote.h
index 8b62efd..badf266 100644
--- a/SOURCES/git/remote.h
+++ b/SOURCES/git/remote.h
@@ -46,6 +46,7 @@ struct remote {
 	int skip_default_update;
 	int mirror;
 	int prune;
+	int push_thin;
 
 	const char *receivepack;
 	const char *uploadpack;
diff --git a/SOURCES/git/transport.c b/SOURCES/git/transport.c
index 70d38e4..2f495fa 100644
--- a/SOURCES/git/transport.c
+++ b/SOURCES/git/transport.c
@@ -987,6 +987,8 @@ struct transport *transport_get(struct remote *remote, const char *url)
 			ret->smart_options->receivepack = remote->receivepack;
 	}
 
+	transport_set_option(ret, TRANS_OPT_THIN, remote->push_thin ? "yes" : NULL);
+
 	return ret;
 }
 
-- 
2.2.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] push: add remote.pushThin to control thin pack generation
  2014-12-10 23:49     ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson
@ 2014-12-11  0:43       ` brian m. carlson
  0 siblings, 0 replies; 14+ messages in thread
From: brian m. carlson @ 2014-12-11  0:43 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4833 bytes --]

On Wed, Dec 10, 2014 at 11:49:49PM +0000, brian m. carlson wrote:
> From: "brian m. carlson" <brian.carlson@cpanel.net>
> 
> Thin packs are enabled when pushing by default, but with a large number
> of refs and a fast network, git may spend more time trying to find a
> good delta than it would spend simply sending data over the network.
> Add a per-remote option, pushThin, that controls the default for pushes
> on that remote.

I just realized this patch doesn't apply outside of the particular repo
I was using.  Consider this more of an RFC, since I'd like to avoid
going this route if possible, in favor of a more robust approach.

> Signed-off-by: brian m. carlson <brian.carlson@cpanel.net>
> ---
>  SOURCES/git/Documentation/config.txt | 6 ++++++
>  SOURCES/git/builtin/push.c           | 6 ++++--
>  SOURCES/git/remote.c                 | 3 +++
>  SOURCES/git/remote.h                 | 1 +
>  SOURCES/git/transport.c              | 2 ++
>  5 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/SOURCES/git/Documentation/config.txt b/SOURCES/git/Documentation/config.txt
> index 9220725..7fededd 100644
> --- a/SOURCES/git/Documentation/config.txt
> +++ b/SOURCES/git/Documentation/config.txt
> @@ -2178,6 +2178,12 @@ remote.<name>.push::
>  	The default set of "refspec" for linkgit:git-push[1]. See
>  	linkgit:git-push[1].
>  
> +remote.<name>.pushThin::
> +	If true (the default), pass the \--thin option to
> +	linkgit:git-pack-objects[1] during push.  This results in a smaller
> +	pack being sent and can improve push time over slow networks.  Over
> +	fast networks, setting this value to false may improve performance.
> +
>  remote.<name>.mirror::
>  	If true, pushing to this remote will automatically behave
>  	as if the `--mirror` option was given on the command line.
> diff --git a/SOURCES/git/builtin/push.c b/SOURCES/git/builtin/push.c
> index a076b19..ae39677 100644
> --- a/SOURCES/git/builtin/push.c
> +++ b/SOURCES/git/builtin/push.c
> @@ -15,7 +15,7 @@ static const char * const push_usage[] = {
>  	NULL,
>  };
>  
> -static int thin = 1;
> +static int thin = -1;
>  static int deleterefs;
>  static const char *receivepack;
>  static int verbosity;
> @@ -347,7 +347,9 @@ static int push_with_options(struct transport *transport, int flags)
>  	if (receivepack)
>  		transport_set_option(transport,
>  				     TRANS_OPT_RECEIVEPACK, receivepack);
> -	transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL);
> +
> +	if (thin != -1)
> +		transport_set_option(transport, TRANS_OPT_THIN, thin ? "yes" : NULL);
>  
>  	if (!is_empty_cas(&cas)) {
>  		if (!transport->smart_options)
> diff --git a/SOURCES/git/remote.c b/SOURCES/git/remote.c
> index f624217..54777cb 100644
> --- a/SOURCES/git/remote.c
> +++ b/SOURCES/git/remote.c
> @@ -175,6 +175,7 @@ static struct remote *make_remote(const char *name, int len)
>  
>  	ret = xcalloc(1, sizeof(struct remote));
>  	ret->prune = -1;  /* unspecified */
> +	ret->push_thin = 1; /* default on */
>  	ALLOC_GROW(remotes, remotes_nr + 1, remotes_alloc);
>  	remotes[remotes_nr++] = ret;
>  	ret->name = xstrndup(name, len);
> @@ -445,6 +446,8 @@ static int handle_config(const char *key, const char *value, void *cb)
>  		if (git_config_string(&v, key, value))
>  			return -1;
>  		add_push_refspec(remote, v);
> +	} else if (!strcmp(subkey, ".pushthin")) {
> +		remote->push_thin = git_config_bool(key, value);
>  	} else if (!strcmp(subkey, ".fetch")) {
>  		const char *v;
>  		if (git_config_string(&v, key, value))
> diff --git a/SOURCES/git/remote.h b/SOURCES/git/remote.h
> index 8b62efd..badf266 100644
> --- a/SOURCES/git/remote.h
> +++ b/SOURCES/git/remote.h
> @@ -46,6 +46,7 @@ struct remote {
>  	int skip_default_update;
>  	int mirror;
>  	int prune;
> +	int push_thin;
>  
>  	const char *receivepack;
>  	const char *uploadpack;
> diff --git a/SOURCES/git/transport.c b/SOURCES/git/transport.c
> index 70d38e4..2f495fa 100644
> --- a/SOURCES/git/transport.c
> +++ b/SOURCES/git/transport.c
> @@ -987,6 +987,8 @@ struct transport *transport_get(struct remote *remote, const char *url)
>  			ret->smart_options->receivepack = remote->receivepack;
>  	}
>  
> +	transport_set_option(ret, TRANS_OPT_THIN, remote->push_thin ? "yes" : NULL);
> +
>  	return ret;
>  }
>  
> -- 
> 2.2.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Poor push performance with large number of refs
  2014-12-10 23:34   ` brian m. carlson
  2014-12-10 23:49     ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson
@ 2014-12-11  1:41     ` Duy Nguyen
  2014-12-11  3:09       ` brian m. carlson
  1 sibling, 1 reply; 14+ messages in thread
From: Duy Nguyen @ 2014-12-11  1:41 UTC (permalink / raw)
  To: Shawn Pearce, git

On Thu, Dec 11, 2014 at 6:34 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> I looked at this more in depth today and I found that the bottleneck is
> --thin.  I tried git send-pack, which does not use --thin by default,
> which led me to further testing.  A particular push went from 24 seconds
> with --thin to 4 seconds without.
>
> I agree that the large number of refs is at least part of the problem,
> because reducing the number of refs has a slight but noticeable impact.
> It's also the factor I can least control.
>
> I have a patch which allows per-remote configuration of whether to use
> thin packs (which I will send shortly), but I'm wondering if we can do
> better, especially since --thin is the default.  It looks like --thin
> forces pack-objects to do its own lookup (essentially a rev-list)
> instead of using the values provided on stdin.

It could be a regression by fbd4a70 (list-objects: mark more commits
as edges in mark_edges_uninteresting - 2013-08-16). That commit makes
--thin a lot more agressive (reading lots of trees). You can try to
revert that commit (or use a git version without that commit) and see
if it improves performance. If so, we probably want to enable that
code for shallow repos only.
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Poor push performance with large number of refs
  2014-12-11  1:41     ` Poor push performance with large number of refs Duy Nguyen
@ 2014-12-11  3:09       ` brian m. carlson
  2014-12-11  3:46         ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson
  0 siblings, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2014-12-11  3:09 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Shawn Pearce, git

[-- Attachment #1: Type: text/plain, Size: 850 bytes --]

On Thu, Dec 11, 2014 at 08:41:07AM +0700, Duy Nguyen wrote:
> It could be a regression by fbd4a70 (list-objects: mark more commits
> as edges in mark_edges_uninteresting - 2013-08-16). That commit makes
> --thin a lot more agressive (reading lots of trees). You can try to
> revert that commit (or use a git version without that commit) and see
> if it improves performance. If so, we probably want to enable that
> code for shallow repos only.

That's exactly it.  With Git 2.2.0, --no-thin was 2.295s, --thin was
8.769s, and --thin with the patch reverted was 3.645s.

I'll come up with a patch.  Thanks for the suggestion.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] list-objects: mark fewer commits as edges for non-shallow clones
  2014-12-11  3:09       ` brian m. carlson
@ 2014-12-11  3:46         ` brian m. carlson
  2014-12-11 10:51           ` Duy Nguyen
  2014-12-11 19:13           ` Junio C Hamano
  0 siblings, 2 replies; 14+ messages in thread
From: brian m. carlson @ 2014-12-11  3:46 UTC (permalink / raw)
  To: git; +Cc: pclouds, gitster

In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we made --thin much more
aggressive by reading lots more trees.  This produces much smaller packs
for shallow clones; however, it causes a significant performance
regression for non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs).  Limit this extra edge-marking to
shallow clones to avoid poor performance.

Suggested-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 list-objects.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index 2910bec..274ebb4 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -151,13 +151,14 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
 {
 	struct commit_list *list;
 	int i;
+	int shallow = is_repository_shallow();
 
 	for (list = revs->commits; list; list = list->next) {
 		struct commit *commit = list->item;
 
 		if (commit->object.flags & UNINTERESTING) {
 			mark_tree_uninteresting(commit->tree);
-			if (revs->edge_hint && !(commit->object.flags & SHOWN)) {
+			if (shallow && revs->edge_hint && !(commit->object.flags & SHOWN)) {
 				commit->object.flags |= SHOWN;
 				show_edge(commit);
 			}
@@ -165,6 +166,8 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
 		}
 		mark_edge_parents_uninteresting(commit, revs, show_edge);
 	}
+	if (!shallow)
+		return;
 	if (revs->edge_hint) {
 		for (i = 0; i < revs->cmdline.nr; i++) {
 			struct object *obj = revs->cmdline.rev[i].item;
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones
  2014-12-11  3:46         ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson
@ 2014-12-11 10:51           ` Duy Nguyen
  2014-12-20 22:18             ` brian m. carlson
  2014-12-11 19:13           ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: Duy Nguyen @ 2014-12-11 10:51 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Junio C Hamano

On Thu, Dec 11, 2014 at 10:46 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> In commit fbd4a70 (list-objects: mark more commits as edges in
> mark_edges_uninteresting - 2013-08-16), we made --thin much more
> aggressive by reading lots more trees.  This produces much smaller packs
> for shallow clones; however, it causes a significant performance
> regression for non-shallow clones with lots of refs (23.322 seconds vs.
> 4.785 seconds with 22400 refs).  Limit this extra edge-marking to
> shallow clones to avoid poor performance.

I'm glad it's now working better for you. Out of curiosity, have you
enabled pack bitmap on this repo? I would expect it to reduce time
some (or a lot) more, or we would find something to optimize further.
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones
  2014-12-11  3:46         ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson
  2014-12-11 10:51           ` Duy Nguyen
@ 2014-12-11 19:13           ` Junio C Hamano
  2014-12-11 19:26             ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2014-12-11 19:13 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, pclouds

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> In commit fbd4a70 (list-objects: mark more commits as edges in
> mark_edges_uninteresting - 2013-08-16), we made --thin much more
> aggressive by reading lots more trees.  This produces much smaller packs
> for shallow clones; however, it causes a significant performance
> regression for non-shallow clones with lots of refs (23.322 seconds vs.
> 4.785 seconds with 22400 refs).  Limit this extra edge-marking to
> shallow clones to avoid poor performance.

This change affects non-clone/fetch uses of object listing depending
on the shallowness of the repository, and does not even care if it
is driven as part of the pack-object codepath, if I am reading it
correctly.  It smells wrong.

The problematic fbd4a70 already had unintended fallout that needed
to be corrected with 200abe74 (list-objects: only look at cmdline
trees with edge_hint, 2014-01-20).  The current code with the fix,
the decision to use the more expensive marking is tied to
"edge_hint". I notice that edge_hint is turned on only if the caller
of rev-list passes the "--objects-edge" option, and currently that
only happens in the pack-objects codepath when "thin" is given.
Perhaps that part should decide if it really wants to do edge_hint
depending on the shallowness of the repository perhaps?

That is, something like this instead?

 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3f9f5c7..a9ebf56 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2709,7 +2709,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		usage_with_options(pack_usage, pack_objects_options);
 
 	argv_array_push(&rp, "pack-objects");
-	if (thin) {
+	if (thin && is_repository_shallow()) {
 		use_internal_rev_list = 1;
 		argv_array_push(&rp, "--objects-edge");
 	} else

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones
  2014-12-11 19:13           ` Junio C Hamano
@ 2014-12-11 19:26             ` Junio C Hamano
  2014-12-20 19:28               ` brian m. carlson
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2014-12-11 19:26 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, pclouds

Junio C Hamano <gitster@pobox.com> writes:

> This change affects non-clone/fetch uses of object listing depending
> on the shallowness of the repository, and does not even care if it
> is driven as part of the pack-object codepath, if I am reading it
> correctly.  It smells wrong.
>
> The problematic fbd4a70 already had unintended fallout that needed
> to be corrected with 200abe74 (list-objects: only look at cmdline
> trees with edge_hint, 2014-01-20).  The current code with the fix,
> the decision to use the more expensive marking is tied to
> "edge_hint". I notice that edge_hint is turned on only if the caller
> of rev-list passes the "--objects-edge" option, and currently that
> only happens in the pack-objects codepath when "thin" is given.
> Perhaps that part should decide if it really wants to do edge_hint
> depending on the shallowness of the repository perhaps?
>
> That is, something like this instead?

Eh, perhaps not like that, as that would disable milder use of
"thin" when fetching into non-shallow repository.

The right approach would be more like allocating one more bit in
struct rev_info (call that edge_hint_aggressive), give a new option
"--objects-edge-aggressive", and do something like

	if (thin) {
        	use_internal_rev_list = 1;
		argv_array_push(&rp, is_repository_shallow()
                	? "--objects-edge-aggressive"
                        : "--objects-edge");
	}

in this codepath?  I'd actually suggest is_repository_shallow()
detection to happen one level even higher (i.e. make decision at the
caller of pack-objects) and decide to pass either "--thin" or
"--thin-aggressive", so that we can make sure that the damage caused
by fbd4a70 to be limited only to fetches into shallow repository
with stronger confidence.


>
>  builtin/pack-objects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 3f9f5c7..a9ebf56 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2709,7 +2709,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  		usage_with_options(pack_usage, pack_objects_options);
>  
>  	argv_array_push(&rp, "pack-objects");
> -	if (thin) {
> +	if (thin && is_repository_shallow()) {
>  		use_internal_rev_list = 1;
>  		argv_array_push(&rp, "--objects-edge");
>  	} else

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones
  2014-12-11 19:26             ` Junio C Hamano
@ 2014-12-20 19:28               ` brian m. carlson
  0 siblings, 0 replies; 14+ messages in thread
From: brian m. carlson @ 2014-12-20 19:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pclouds

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

On Thu, Dec 11, 2014 at 11:26:47AM -0800, Junio C Hamano wrote:
> The right approach would be more like allocating one more bit in
> struct rev_info (call that edge_hint_aggressive), give a new option
> "--objects-edge-aggressive", and do something like
> 
> 	if (thin) {
>         	use_internal_rev_list = 1;
> 		argv_array_push(&rp, is_repository_shallow()
>                 	? "--objects-edge-aggressive"
>                         : "--objects-edge");
> 	}
> 
> in this codepath?  I'd actually suggest is_repository_shallow()
> detection to happen one level even higher (i.e. make decision at the
> caller of pack-objects) and decide to pass either "--thin" or
> "--thin-aggressive", so that we can make sure that the damage caused
> by fbd4a70 to be limited only to fetches into shallow repository
> with stronger confidence.

Sorry it's taken me so long to get back to this.  Real life keeps
getting in the way.

I think adding --objects-edge-aggressive is the best way forward here
and then applying the patch above.  If we add --thin-aggressive, we push
the problem to a higher level, which will require more code changes and
make the performance regression continue to affect external remote
helpers, which we don't want.  We know what the right decision is here,
so let's just do it.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] list-objects: mark fewer commits as edges for non-shallow clones
  2014-12-11 10:51           ` Duy Nguyen
@ 2014-12-20 22:18             ` brian m. carlson
  0 siblings, 0 replies; 14+ messages in thread
From: brian m. carlson @ 2014-12-20 22:18 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 644 bytes --]

On Thu, Dec 11, 2014 at 05:51:54PM +0700, Duy Nguyen wrote:
> I'm glad it's now working better for you. Out of curiosity, have you
> enabled pack bitmap on this repo? I would expect it to reduce time
> some (or a lot) more, or we would find something to optimize further.

The client performs much, much worse with pack bitmaps.  The server has
bitmaps enabled, but it doesn't have any patches for this issue applied
to it.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-12-20 22:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-10  0:37 Poor push performance with large number of refs brian m. carlson
2014-12-10  5:41 ` Shawn Pearce
2014-12-10 23:34   ` brian m. carlson
2014-12-10 23:49     ` [PATCH] push: add remote.pushThin to control thin pack generation brian m. carlson
2014-12-11  0:43       ` brian m. carlson
2014-12-11  1:41     ` Poor push performance with large number of refs Duy Nguyen
2014-12-11  3:09       ` brian m. carlson
2014-12-11  3:46         ` [PATCH] list-objects: mark fewer commits as edges for non-shallow clones brian m. carlson
2014-12-11 10:51           ` Duy Nguyen
2014-12-20 22:18             ` brian m. carlson
2014-12-11 19:13           ` Junio C Hamano
2014-12-11 19:26             ` Junio C Hamano
2014-12-20 19:28               ` brian m. carlson
2014-12-10  9:17 ` Poor push performance with large number of refs Duy Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).