git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Cache stat_tracking_info() for faster status and branch -v
@ 2012-10-16 17:36 Nguyễn Thái Ngọc Duy
  2012-10-19 19:50 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-16 17:36 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

stat_tracking_info() is used to calculated how many commits ahead or
behind for a branch. Rev walking can be slow especially when the
branch is way behind its remote end. By caching the results, we won't
have to rev walk every time we need these information.
stat_tracking_info() cost can be greatly reduced this way.

This makes sure "git status" instant no matter how far behind HEAD
is, except the first time after HEAD changes. This also makes
"branch -v" usable (for me) as it's now also instant versus 3.5
seconds in non-cache case on my machine.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 I wanted guaranteed-fast status for another reason, but it turns out
 "branch -v" benefits even more. Recent commit walking is not
 efficiently optimized even with Shawn's pack bitmaps. This may be
 useful some people, I guess.

 remote.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/remote.c b/remote.c
index 04fd9ea..db825b8 100644
--- a/remote.c
+++ b/remote.c
@@ -1549,6 +1549,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs)
 	struct rev_info revs;
 	const char *rev_argv[10], *base;
 	int rev_argc;
+	int fd;
 
 	/*
 	 * Nothing to report unless we are marked to build on top of
@@ -1579,6 +1580,33 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs)
 	if (theirs == ours)
 		return 0;
 
+	if (!access(git_path("stat_tracking_cache/%s",
+			     branch->refname), R_OK)) {
+		struct strbuf sb = STRBUF_INIT;
+		unsigned char sha1_ours[20], sha1_theirs[20];
+		int n1, n2;
+		if ((fd = open(git_path("stat_tracking_cache/%s",
+					branch->refname),
+			       O_RDONLY)) != -1 &&
+		    strbuf_read(&sb, fd, 0) != -1 &&
+		    sb.len > (40 + 1) * 2 &&
+		    !get_sha1_hex(sb.buf, sha1_ours) &&
+		    sb.buf[40] == '\n' &&
+		    !get_sha1_hex(sb.buf + 41, sha1_theirs) &&
+		    sb.buf[81] == '\n' &&
+		    !hashcmp(sha1_ours, ours->object.sha1) &&
+		    !hashcmp(sha1_theirs, theirs->object.sha1) &&
+		    sscanf(sb.buf + 82, "%d\n%d\n", &n1, &n2) == 2) {
+			*num_ours = n1;
+			*num_theirs = n2;
+			close(fd);
+			strbuf_release(&sb);
+			return 1;
+		}
+		close(fd);
+		strbuf_release(&sb);
+	}
+
 	/* Run "rev-list --left-right ours...theirs" internally... */
 	rev_argc = 0;
 	rev_argv[rev_argc++] = NULL;
@@ -1608,6 +1636,20 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs)
 			(*num_theirs)++;
 	}
 
+	if (!safe_create_leading_directories(git_path("stat_tracking_cache/%s",
+						      branch->refname)) &&
+	    (fd = open(git_path("stat_tracking_cache/%s",
+				branch->refname),
+		       O_CREAT | O_TRUNC | O_RDWR, 0644)) != -1) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addf(&sb, "%s\n", sha1_to_hex(ours->object.sha1));
+		strbuf_addf(&sb, "%s\n", sha1_to_hex(theirs->object.sha1));
+		strbuf_addf(&sb, "%d\n%d\n", *num_ours, *num_theirs);
+		write(fd, sb.buf, sb.len);
+		strbuf_release(&sb);
+		close(fd);
+	}
+
 	/* clear object flags smudged by the above traversal */
 	clear_commit_marks(ours, ALL_REV_FLAGS);
 	clear_commit_marks(theirs, ALL_REV_FLAGS);
-- 
1.8.0.rc2.21.g0695653

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] Cache stat_tracking_info() for faster status and branch -v
  2012-10-16 17:36 [PATCH] Cache stat_tracking_info() for faster status and branch -v Nguyễn Thái Ngọc Duy
@ 2012-10-19 19:50 ` Junio C Hamano
  2012-10-20  9:02   ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2012-10-19 19:50 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> stat_tracking_info() is used to calculated how many commits ahead or
> behind for a branch. Rev walking can be slow especially when the
> branch is way behind its remote end. By caching the results, we won't
> have to rev walk every time we need these information.
> stat_tracking_info() cost can be greatly reduced this way.
>
> This makes sure "git status" instant no matter how far behind HEAD
> is, except the first time after HEAD changes. This also makes
> "branch -v" usable (for me) as it's now also instant versus 3.5
> seconds in non-cache case on my machine.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  I wanted guaranteed-fast status for another reason, but it turns out
>  "branch -v" benefits even more. Recent commit walking is not
>  efficiently optimized even with Shawn's pack bitmaps. This may be
>  useful some people, I guess.

Not particularly interested in the cause, but not so strongly
against it to veto it.

The design looks questionable.

You can fork one or more branches off of a single branch.  You may
fork your 'frotz' branch off of remotes/origin/master but also
another 'xyzzy' branch may be forked from the same.

I understand that you are trying to optimize, given two commit
object names X and Y, the cost to learn the symmetric distances
between them.

Doesn't it make more sense to use a notes-cache that is keyed off of
the commit object name X of the remote?  You will have a single note
that stores a blob for the commit object remotes/origin/master, and
the blob tells you how far the commit at the tip of 'frotz' is from
it, and the same for 'xyzzy'.

You would obviouly need to run "gc" on such a notes-cache tree from
time to time, removing notes for commits that are not tip of any
branch that could be a fork point, and from the remaining notes
blobs, entries that describe commits that are not tip of any branch,
if you go that route.

>
>  remote.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/remote.c b/remote.c
> index 04fd9ea..db825b8 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -1549,6 +1549,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs)
>  	struct rev_info revs;
>  	const char *rev_argv[10], *base;
>  	int rev_argc;
> +	int fd;
>  
>  	/*
>  	 * Nothing to report unless we are marked to build on top of
> @@ -1579,6 +1580,33 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs)
>  	if (theirs == ours)
>  		return 0;
>  
> +	if (!access(git_path("stat_tracking_cache/%s",
> +			     branch->refname), R_OK)) {
> +		struct strbuf sb = STRBUF_INIT;
> +		unsigned char sha1_ours[20], sha1_theirs[20];
> +		int n1, n2;
> +		if ((fd = open(git_path("stat_tracking_cache/%s",
> +					branch->refname),
> +			       O_RDONLY)) != -1 &&
> +		    strbuf_read(&sb, fd, 0) != -1 &&
> +		    sb.len > (40 + 1) * 2 &&
> +		    !get_sha1_hex(sb.buf, sha1_ours) &&
> +		    sb.buf[40] == '\n' &&
> +		    !get_sha1_hex(sb.buf + 41, sha1_theirs) &&
> +		    sb.buf[81] == '\n' &&
> +		    !hashcmp(sha1_ours, ours->object.sha1) &&
> +		    !hashcmp(sha1_theirs, theirs->object.sha1) &&
> +		    sscanf(sb.buf + 82, "%d\n%d\n", &n1, &n2) == 2) {
> +			*num_ours = n1;
> +			*num_theirs = n2;
> +			close(fd);
> +			strbuf_release(&sb);
> +			return 1;
> +		}
> +		close(fd);
> +		strbuf_release(&sb);
> +	}
> +
>  	/* Run "rev-list --left-right ours...theirs" internally... */
>  	rev_argc = 0;
>  	rev_argv[rev_argc++] = NULL;
> @@ -1608,6 +1636,20 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs)
>  			(*num_theirs)++;
>  	}
>  
> +	if (!safe_create_leading_directories(git_path("stat_tracking_cache/%s",
> +						      branch->refname)) &&
> +	    (fd = open(git_path("stat_tracking_cache/%s",
> +				branch->refname),
> +		       O_CREAT | O_TRUNC | O_RDWR, 0644)) != -1) {
> +		struct strbuf sb = STRBUF_INIT;
> +		strbuf_addf(&sb, "%s\n", sha1_to_hex(ours->object.sha1));
> +		strbuf_addf(&sb, "%s\n", sha1_to_hex(theirs->object.sha1));
> +		strbuf_addf(&sb, "%d\n%d\n", *num_ours, *num_theirs);
> +		write(fd, sb.buf, sb.len);
> +		strbuf_release(&sb);
> +		close(fd);
> +	}
> +
>  	/* clear object flags smudged by the above traversal */
>  	clear_commit_marks(ours, ALL_REV_FLAGS);
>  	clear_commit_marks(theirs, ALL_REV_FLAGS);

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Cache stat_tracking_info() for faster status and branch -v
  2012-10-19 19:50 ` Junio C Hamano
@ 2012-10-20  9:02   ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 3+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-20  9:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Oct 20, 2012 at 2:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Not particularly interested in the cause, but not so strongly
> against it to veto it.

I wonder how many people keep old branches like I do, which are
usually far from remotes.

> Doesn't it make more sense to use a notes-cache that is keyed off of
> the commit object name X of the remote?  You will have a single note
> that stores a blob for the commit object remotes/origin/master, and
> the blob tells you how far the commit at the tip of 'frotz' is from
> it, and the same for 'xyzzy'.
>
> You would obviouly need to run "gc" on such a notes-cache tree from
> time to time, removing notes for commits that are not tip of any
> branch that could be a fork point, and from the remaining notes
> blobs, entries that describe commits that are not tip of any branch,
> if you go that route.

The notes-cache route looks much nicer. Thanks. We can also use Jeff's
persistent hash table from his rename-cache series.
-- 
Duy

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-10-20  9:03 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-16 17:36 [PATCH] Cache stat_tracking_info() for faster status and branch -v Nguyễn Thái Ngọc Duy
2012-10-19 19:50 ` Junio C Hamano
2012-10-20  9:02   ` Nguyen Thai Ngoc Duy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).