git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] git-fetch: Avoid reading packed refs over and over again
@ 2006-12-17 19:52 Johannes Schindelin
  2006-12-18  1:54 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2006-12-17 19:52 UTC (permalink / raw)
  To: git, junkio


When checking which tags to fetch, the old code used to call
git-show-ref --verify for _each_ remote tag. Since reading even
packed refs is not a cheap operation when there are a lot of
local refs, the code became quite slow.

This fixes it by teaching git-show-ref to filter out valid
(i.e. locally stored) refs from stdin, when passing the parameter
--filter-invalid to git-show-ref, and feeding it lines in the
form 'sha1 refname'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
	Since this option is purely for use in git-fetch, I did not even
	bother documenting it.

	This patch would have been so much cleaner if git-fetch was written
	in C... But since it accumulated so many functions by now, I see
	not much chance for that (at least in the near future).

	In very unscientific tests, a single read_packed_refs() in the 
	lilypond repo took 0.1 seconds. Yep, that's 1/10th second. So, the 
	while loop in git-fetch took more than 10 seconds for 107 tags.

 builtin-show-ref.c |   28 +++++++++++++++++++++++++++-
 git-fetch.sh       |    2 +-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/builtin-show-ref.c b/builtin-show-ref.c
index f6929d9..c0b55c1 100644
--- a/builtin-show-ref.c
+++ b/builtin-show-ref.c
@@ -2,8 +2,9 @@
 #include "refs.h"
 #include "object.h"
 #include "tag.h"
+#include "path-list.h"
 
-static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*]";
+static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*] | --filter-invalid < ref-list";
 
 static int deref_tags = 0, show_head = 0, tags_only = 0, heads_only = 0,
 	found_match = 0, verify = 0, quiet = 0, hash_only = 0, abbrev = 0;
@@ -86,6 +87,29 @@ match:
 	return 0;
 }
 
+static int add_valid(const char *refname, const unsigned char *sha1, int flag, void *cbdata)
+{
+	struct path_list *list = (struct path_list *)cbdata;
+	path_list_insert(refname, list);
+	return 0;
+}
+
+static int filter_invalid()
+{
+	static struct path_list valid_refs = { NULL, 0, 0, 0 };
+	char buf[1024];
+
+	for_each_ref(add_valid, &valid_refs);
+	while (fgets(buf, sizeof(buf), stdin)) {
+		int len = strlen(buf);
+		if (len > 0 && buf[len - 1] == '\n')
+			buf[--len] = '\0';
+		if (len < 41 || !path_list_has_path(&valid_refs, buf + 41))
+			printf("%s\n", buf);
+	}
+	return 0;
+}
+
 int cmd_show_ref(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -153,6 +177,8 @@ int cmd_show_ref(int argc, const char **argv, const char *prefix)
 			heads_only = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--filter-invalid"))
+			return filter_invalid();
 		usage(show_ref_usage);
 	}
 	if (verify) {
diff --git a/git-fetch.sh b/git-fetch.sh
index 3feba32..d1c00db 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -474,9 +474,9 @@ case "$no_tags$tags" in
 		echo "$ls_remote_result" |
 		sed -n	-e 's|^\('"$_x40"'\)	\(refs/tags/.*\)^{}$|\1 \2|p' \
 			-e 's|^\('"$_x40"'\)	\(refs/tags/.*\)$|\1 \2|p' |
+		git-show-ref --filter-invalid |
 		while read sha1 name
 		do
-			git-show-ref --verify --quiet -- "$name" && continue
 			git-check-ref-format "$name" || {
 				echo >&2 "warning: tag ${name} ignored"
 				continue
-- 
1.4.4.1.g3c2a-dirty

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
  2006-12-17 19:52 [PATCH] git-fetch: Avoid reading packed refs over and over again Johannes Schindelin
@ 2006-12-18  1:54 ` Junio C Hamano
  2006-12-18 12:33   ` Johannes Schindelin
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2006-12-18  1:54 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> 	Since this option is purely for use in git-fetch, I did not even
> 	bother documenting it.

I think the name --filter-blah makes it unclear if it is a
filter that picks up items that match the criteria "blah" out of
its input, or if it filters out the ones that do not match the
criteria.

In either case, you are not filtering "invalid vs valid" but
"existing vs missing".

If you are making a git-fetch specific extension, it would make
sense to include what the upstream sed does as well.  That is,
the user would become:

	echo "$ls_remote_result" |
-       sed -n	-e 's|^\('"$_x40"'\)	\(refs/tags/.*\)^{}$|\1 \2|p' \
-               -e 's|^\('"$_x40"'\)	\(refs/tags/.*\)$|\1 \2|p' |
+	git show-ref --exclude-existing=refs/tags/ |
        while read sha1 name
        do
-		git-show-ref --verify --quiet -- "$name" && continue
-		git-check-ref-format "$name" || {
-			echo >&2 "warning: tag ${name} ignored"
-			continue

The 'exclude-existing' flag would filter out the ones that do not
head-match the specified hierarchy, the ones that we have, and
the ones that do not conform to a valid refname pattern (which
would filter out ^{} entries).

How about this as a replacement?

---

 builtin-show-ref.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 git-fetch.sh       |   12 ++-------
 2 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/builtin-show-ref.c b/builtin-show-ref.c
index 0739798..b36f15e 100644
--- a/builtin-show-ref.c
+++ b/builtin-show-ref.c
@@ -2,8 +2,9 @@
 #include "refs.h"
 #include "object.h"
 #include "tag.h"
+#include "path-list.h"
 
-static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*]";
+static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*] | --filter-invalid < ref-list";
 
 static int deref_tags = 0, show_head = 0, tags_only = 0, heads_only = 0,
 	found_match = 0, verify = 0, quiet = 0, hash_only = 0, abbrev = 0;
@@ -86,6 +87,59 @@ match:
 	return 0;
 }
 
+static int add_existing(const char *refname, const unsigned char *sha1, int flag, void *cbdata)
+{
+	struct path_list *list = (struct path_list *)cbdata;
+	path_list_insert(refname, list);
+	return 0;
+}
+
+/*
+ * read "^(?:<anything>\s)?<refname>(?:\^\{\})?$" from the standard input,
+ * and
+ * (1) strip "^{}" at the end of line if any;
+ * (2) ignore if match is provided and does not head-match refname;
+ * (3) warn if refname is not a well-formed refname and skip;
+ * (4) ignore if refname is a ref that exists in the local repository;
+ * (5) otherwise output the line.
+ */
+static int exclude_existing(const char *match)
+{
+	static struct path_list existing_refs = { NULL, 0, 0, 0 };
+	char buf[1024];
+	int matchlen = match ? strlen(match) : 0;
+
+	for_each_ref(add_existing, &existing_refs);
+	while (fgets(buf, sizeof(buf), stdin)) {
+		int len = strlen(buf);
+		char *ref;
+		if (len > 0 && buf[len - 1] == '\n')
+			buf[--len] = '\0';
+		if (!strcmp(buf + len - 3, "^{}")) {
+			len -= 3;
+			buf[len] = '\0';
+		}
+		for (ref = buf + len; buf < ref; ref--)
+			if (isspace(ref[-1]))
+				break;
+		if (match) {
+			int reflen = buf + len - ref;
+			if (reflen < matchlen)
+				continue;
+			if (strncmp(ref, match, matchlen))
+				continue;
+		}
+		if (check_ref_format(ref)) {
+			fprintf(stderr, "warning: ref '%s' ignored\n", ref);
+			continue;
+		}
+		if (!path_list_has_path(&existing_refs, ref)) {
+			printf("%s\n", buf);
+		}
+	}
+	return 0;
+}
+
 int cmd_show_ref(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -153,6 +207,10 @@ int cmd_show_ref(int argc, const char **argv, const char *prefix)
 			heads_only = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--exclude-existing"))
+			return exclude_existing(NULL);
+		if (!strncmp(arg, "--exclude-existing=", 19))
+			return exclude_existing(arg + 19);
 		usage(show_ref_usage);
 	}
 	if (show_head)
diff --git a/git-fetch.sh b/git-fetch.sh
index fb35815..38101a6 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -242,7 +242,7 @@ esac
 reflist=$(get_remote_refs_for_fetch "$@")
 if test "$tags"
 then
-	taglist=`IFS="	" &&
+	taglist=`IFS='	' &&
 		  echo "$ls_remote_result" |
 	          while read sha1 name
 		  do
@@ -438,17 +438,11 @@ case "$no_tags$tags" in
 	*:refs/*)
 		# effective only when we are following remote branch
 		# using local tracking branch.
-		taglist=$(IFS=" " &&
+		taglist=$(IFS='	' &&
 		echo "$ls_remote_result" |
-		sed -n	-e 's|^\('"$_x40"'\)	\(refs/tags/.*\)^{}$|\1 \2|p' \
-			-e 's|^\('"$_x40"'\)	\(refs/tags/.*\)$|\1 \2|p' |
+		git-show-ref --exclude-existing=refs/tags/ |
 		while read sha1 name
 		do
-			git-show-ref --verify --quiet -- "$name" && continue
-			git-check-ref-format "$name" || {
-				echo >&2 "warning: tag ${name} ignored"
-				continue
-			}
 			git-cat-file -t "$sha1" >/dev/null 2>&1 || continue
 			echo >&2 "Auto-following $name"
 			echo ".${name}:${name}"

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
  2006-12-18  1:54 ` Junio C Hamano
@ 2006-12-18 12:33   ` Johannes Schindelin
  2006-12-30 19:04     ` Han-Wen Nienhuys
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2006-12-18 12:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Sun, 17 Dec 2006, Junio C Hamano wrote:

> How about this as a replacement?

Makes sense, except for:

> --- a/builtin-show-ref.c
> +++ b/builtin-show-ref.c
> @@ -2,8 +2,9 @@
>  #include "refs.h"
>  #include "object.h"
>  #include "tag.h"
> +#include "path-list.h"
>  
> -static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*]";
> +static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*] | --filter-invalid < ref-list";

Here, you want to either not mention it at all, or add 
"--exclude-existing[=<pattern>]" instead of "--filter-invalid".

> +		if (!strcmp(buf + len - 3, "^{}")) {

Here you have to check first, if len > 3. Strictly speaking, there should 
not be any line coming in which is shorter than 42 bytes. But I was 
recentely bitten by such an assuption...

Overall, I like it. I even have the impression that this could actually 
open a way to build in fetch instead of relying on a POSIX conformant and 
fast shell for such a central part of git.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
  2006-12-18 12:33   ` Johannes Schindelin
@ 2006-12-30 19:04     ` Han-Wen Nienhuys
  2006-12-30 19:24       ` Johannes Schindelin
  0 siblings, 1 reply; 6+ messages in thread
From: Han-Wen Nienhuys @ 2006-12-30 19:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin escreveu:
> 
> Here you have to check first, if len > 3. Strictly speaking, there should 
> not be any line coming in which is shorter than 42 bytes. But I was 
> recentely bitten by such an assuption...
> 
> Overall, I like it. I even have the impression that this could actually 
> open a way to build in fetch instead of relying on a POSIX conformant and 
> fast shell for such a central part of git.

is there any chance of this going in GIT 1.5.0 ? It's not in the rc0 release.


-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
  2006-12-30 19:04     ` Han-Wen Nienhuys
@ 2006-12-30 19:24       ` Johannes Schindelin
  2006-12-30 20:02         ` Han-Wen Nienhuys
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2006-12-30 19:24 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git

Hi,

On Sat, 30 Dec 2006, Han-Wen Nienhuys wrote:

> Johannes Schindelin escreveu:
> > 
> > Here you have to check first, if len > 3. Strictly speaking, there should 
> > not be any line coming in which is shorter than 42 bytes. But I was 
> > recentely bitten by such an assuption...
> > 
> > Overall, I like it. I even have the impression that this could actually 
> > open a way to build in fetch instead of relying on a POSIX conformant and 
> > fast shell for such a central part of git.
> 
> is there any chance of this going in GIT 1.5.0 ? It's not in the rc0 
> release.

Um.

$ git grep -e exclude-existing v1.5.0-rc0
v1.5.0-rc0:builtin-show-ref.c:          if (!strcmp(arg, "--exclude-existing"))
v1.5.0-rc0:builtin-show-ref.c:          if (!strncmp(arg, "--exclude-existing=", 19))
v1.5.0-rc0:git-fetch.sh:                git-show-ref --exclude-existing=refs/tags/ |

The last line means that it _is_ in v1.5.0-rc0. (BTW it is the commit 
tags/v1.5.0-rc0~84, which I found by "git log v1.5.0-rc0 git-fetch.sh | 
git name-rev --tags --stdin | less".)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
  2006-12-30 19:24       ` Johannes Schindelin
@ 2006-12-30 20:02         ` Han-Wen Nienhuys
  0 siblings, 0 replies; 6+ messages in thread
From: Han-Wen Nienhuys @ 2006-12-30 20:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin escreveu:

> $ git grep -e exclude-existing v1.5.0-rc0
> v1.5.0-rc0:builtin-show-ref.c:          if (!strcmp(arg, "--exclude-existing"))
> v1.5.0-rc0:builtin-show-ref.c:          if (!strncmp(arg, "--exclude-existing=", 19))
> v1.5.0-rc0:git-fetch.sh:                git-show-ref --exclude-existing=refs/tags/ |
> 
> The last line means that it _is_ in v1.5.0-rc0. (BTW it is the commit 
> tags/v1.5.0-rc0~84, which I found by "git log v1.5.0-rc0 git-fetch.sh | 
> git name-rev --tags --stdin | less".)

oh, oops. I was looking for the commit  message of the patch you originally 
sent me. Cool.

-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-12-30 20:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-17 19:52 [PATCH] git-fetch: Avoid reading packed refs over and over again Johannes Schindelin
2006-12-18  1:54 ` Junio C Hamano
2006-12-18 12:33   ` Johannes Schindelin
2006-12-30 19:04     ` Han-Wen Nienhuys
2006-12-30 19:24       ` Johannes Schindelin
2006-12-30 20:02         ` Han-Wen Nienhuys

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).