* [PATCH] git-fetch: Avoid reading packed refs over and over again
@ 2006-12-17 19:52 Johannes Schindelin
2006-12-18 1:54 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2006-12-17 19:52 UTC (permalink / raw)
To: git, junkio
When checking which tags to fetch, the old code used to call
git-show-ref --verify for _each_ remote tag. Since reading even
packed refs is not a cheap operation when there are a lot of
local refs, the code became quite slow.
This fixes it by teaching git-show-ref to filter out valid
(i.e. locally stored) refs from stdin, when passing the parameter
--filter-invalid to git-show-ref, and feeding it lines in the
form 'sha1 refname'.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Since this option is purely for use in git-fetch, I did not even
bother documenting it.
This patch would have been so much cleaner if git-fetch was written
in C... But since it accumulated so many functions by now, I see
not much chance for that (at least in the near future).
In very unscientific tests, a single read_packed_refs() in the
lilypond repo took 0.1 seconds. Yep, that's 1/10th second. So, the
while loop in git-fetch took more than 10 seconds for 107 tags.
builtin-show-ref.c | 28 +++++++++++++++++++++++++++-
git-fetch.sh | 2 +-
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/builtin-show-ref.c b/builtin-show-ref.c
index f6929d9..c0b55c1 100644
--- a/builtin-show-ref.c
+++ b/builtin-show-ref.c
@@ -2,8 +2,9 @@
#include "refs.h"
#include "object.h"
#include "tag.h"
+#include "path-list.h"
-static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*]";
+static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*] | --filter-invalid < ref-list";
static int deref_tags = 0, show_head = 0, tags_only = 0, heads_only = 0,
found_match = 0, verify = 0, quiet = 0, hash_only = 0, abbrev = 0;
@@ -86,6 +87,29 @@ match:
return 0;
}
+static int add_valid(const char *refname, const unsigned char *sha1, int flag, void *cbdata)
+{
+ struct path_list *list = (struct path_list *)cbdata;
+ path_list_insert(refname, list);
+ return 0;
+}
+
+static int filter_invalid()
+{
+ static struct path_list valid_refs = { NULL, 0, 0, 0 };
+ char buf[1024];
+
+ for_each_ref(add_valid, &valid_refs);
+ while (fgets(buf, sizeof(buf), stdin)) {
+ int len = strlen(buf);
+ if (len > 0 && buf[len - 1] == '\n')
+ buf[--len] = '\0';
+ if (len < 41 || !path_list_has_path(&valid_refs, buf + 41))
+ printf("%s\n", buf);
+ }
+ return 0;
+}
+
int cmd_show_ref(int argc, const char **argv, const char *prefix)
{
int i;
@@ -153,6 +177,8 @@ int cmd_show_ref(int argc, const char **argv, const char *prefix)
heads_only = 1;
continue;
}
+ if (!strcmp(arg, "--filter-invalid"))
+ return filter_invalid();
usage(show_ref_usage);
}
if (verify) {
diff --git a/git-fetch.sh b/git-fetch.sh
index 3feba32..d1c00db 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -474,9 +474,9 @@ case "$no_tags$tags" in
echo "$ls_remote_result" |
sed -n -e 's|^\('"$_x40"'\) \(refs/tags/.*\)^{}$|\1 \2|p' \
-e 's|^\('"$_x40"'\) \(refs/tags/.*\)$|\1 \2|p' |
+ git-show-ref --filter-invalid |
while read sha1 name
do
- git-show-ref --verify --quiet -- "$name" && continue
git-check-ref-format "$name" || {
echo >&2 "warning: tag ${name} ignored"
continue
--
1.4.4.1.g3c2a-dirty
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
2006-12-17 19:52 [PATCH] git-fetch: Avoid reading packed refs over and over again Johannes Schindelin
@ 2006-12-18 1:54 ` Junio C Hamano
2006-12-18 12:33 ` Johannes Schindelin
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2006-12-18 1:54 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Since this option is purely for use in git-fetch, I did not even
> bother documenting it.
I think the name --filter-blah makes it unclear if it is a
filter that picks up items that match the criteria "blah" out of
its input, or if it filters out the ones that do not match the
criteria.
In either case, you are not filtering "invalid vs valid" but
"existing vs missing".
If you are making a git-fetch specific extension, it would make
sense to include what the upstream sed does as well. That is,
the user would become:
echo "$ls_remote_result" |
- sed -n -e 's|^\('"$_x40"'\) \(refs/tags/.*\)^{}$|\1 \2|p' \
- -e 's|^\('"$_x40"'\) \(refs/tags/.*\)$|\1 \2|p' |
+ git show-ref --exclude-existing=refs/tags/ |
while read sha1 name
do
- git-show-ref --verify --quiet -- "$name" && continue
- git-check-ref-format "$name" || {
- echo >&2 "warning: tag ${name} ignored"
- continue
The 'exclude-existing' flag would filter out the ones that do not
head-match the specified hierarchy, the ones that we have, and
the ones that do not conform to a valid refname pattern (which
would filter out ^{} entries).
How about this as a replacement?
---
builtin-show-ref.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++-
git-fetch.sh | 12 ++-------
2 files changed, 62 insertions(+), 10 deletions(-)
diff --git a/builtin-show-ref.c b/builtin-show-ref.c
index 0739798..b36f15e 100644
--- a/builtin-show-ref.c
+++ b/builtin-show-ref.c
@@ -2,8 +2,9 @@
#include "refs.h"
#include "object.h"
#include "tag.h"
+#include "path-list.h"
-static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*]";
+static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*] | --filter-invalid < ref-list";
static int deref_tags = 0, show_head = 0, tags_only = 0, heads_only = 0,
found_match = 0, verify = 0, quiet = 0, hash_only = 0, abbrev = 0;
@@ -86,6 +87,59 @@ match:
return 0;
}
+static int add_existing(const char *refname, const unsigned char *sha1, int flag, void *cbdata)
+{
+ struct path_list *list = (struct path_list *)cbdata;
+ path_list_insert(refname, list);
+ return 0;
+}
+
+/*
+ * read "^(?:<anything>\s)?<refname>(?:\^\{\})?$" from the standard input,
+ * and
+ * (1) strip "^{}" at the end of line if any;
+ * (2) ignore if match is provided and does not head-match refname;
+ * (3) warn if refname is not a well-formed refname and skip;
+ * (4) ignore if refname is a ref that exists in the local repository;
+ * (5) otherwise output the line.
+ */
+static int exclude_existing(const char *match)
+{
+ static struct path_list existing_refs = { NULL, 0, 0, 0 };
+ char buf[1024];
+ int matchlen = match ? strlen(match) : 0;
+
+ for_each_ref(add_existing, &existing_refs);
+ while (fgets(buf, sizeof(buf), stdin)) {
+ int len = strlen(buf);
+ char *ref;
+ if (len > 0 && buf[len - 1] == '\n')
+ buf[--len] = '\0';
+ if (!strcmp(buf + len - 3, "^{}")) {
+ len -= 3;
+ buf[len] = '\0';
+ }
+ for (ref = buf + len; buf < ref; ref--)
+ if (isspace(ref[-1]))
+ break;
+ if (match) {
+ int reflen = buf + len - ref;
+ if (reflen < matchlen)
+ continue;
+ if (strncmp(ref, match, matchlen))
+ continue;
+ }
+ if (check_ref_format(ref)) {
+ fprintf(stderr, "warning: ref '%s' ignored\n", ref);
+ continue;
+ }
+ if (!path_list_has_path(&existing_refs, ref)) {
+ printf("%s\n", buf);
+ }
+ }
+ return 0;
+}
+
int cmd_show_ref(int argc, const char **argv, const char *prefix)
{
int i;
@@ -153,6 +207,10 @@ int cmd_show_ref(int argc, const char **argv, const char *prefix)
heads_only = 1;
continue;
}
+ if (!strcmp(arg, "--exclude-existing"))
+ return exclude_existing(NULL);
+ if (!strncmp(arg, "--exclude-existing=", 19))
+ return exclude_existing(arg + 19);
usage(show_ref_usage);
}
if (show_head)
diff --git a/git-fetch.sh b/git-fetch.sh
index fb35815..38101a6 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -242,7 +242,7 @@ esac
reflist=$(get_remote_refs_for_fetch "$@")
if test "$tags"
then
- taglist=`IFS=" " &&
+ taglist=`IFS=' ' &&
echo "$ls_remote_result" |
while read sha1 name
do
@@ -438,17 +438,11 @@ case "$no_tags$tags" in
*:refs/*)
# effective only when we are following remote branch
# using local tracking branch.
- taglist=$(IFS=" " &&
+ taglist=$(IFS=' ' &&
echo "$ls_remote_result" |
- sed -n -e 's|^\('"$_x40"'\) \(refs/tags/.*\)^{}$|\1 \2|p' \
- -e 's|^\('"$_x40"'\) \(refs/tags/.*\)$|\1 \2|p' |
+ git-show-ref --exclude-existing=refs/tags/ |
while read sha1 name
do
- git-show-ref --verify --quiet -- "$name" && continue
- git-check-ref-format "$name" || {
- echo >&2 "warning: tag ${name} ignored"
- continue
- }
git-cat-file -t "$sha1" >/dev/null 2>&1 || continue
echo >&2 "Auto-following $name"
echo ".${name}:${name}"
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
2006-12-18 1:54 ` Junio C Hamano
@ 2006-12-18 12:33 ` Johannes Schindelin
2006-12-30 19:04 ` Han-Wen Nienhuys
0 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2006-12-18 12:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Hi,
On Sun, 17 Dec 2006, Junio C Hamano wrote:
> How about this as a replacement?
Makes sense, except for:
> --- a/builtin-show-ref.c
> +++ b/builtin-show-ref.c
> @@ -2,8 +2,9 @@
> #include "refs.h"
> #include "object.h"
> #include "tag.h"
> +#include "path-list.h"
>
> -static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*]";
> +static const char show_ref_usage[] = "git show-ref [-q|--quiet] [--verify] [-h|--head] [-d|--dereference] [-s|--hash[=<length>]] [--abbrev[=<length>]] [--tags] [--heads] [--] [pattern*] | --filter-invalid < ref-list";
Here, you want to either not mention it at all, or add
"--exclude-existing[=<pattern>]" instead of "--filter-invalid".
> + if (!strcmp(buf + len - 3, "^{}")) {
Here you have to check first, if len > 3. Strictly speaking, there should
not be any line coming in which is shorter than 42 bytes. But I was
recentely bitten by such an assuption...
Overall, I like it. I even have the impression that this could actually
open a way to build in fetch instead of relying on a POSIX conformant and
fast shell for such a central part of git.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
2006-12-18 12:33 ` Johannes Schindelin
@ 2006-12-30 19:04 ` Han-Wen Nienhuys
2006-12-30 19:24 ` Johannes Schindelin
0 siblings, 1 reply; 6+ messages in thread
From: Han-Wen Nienhuys @ 2006-12-30 19:04 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Johannes Schindelin escreveu:
>
> Here you have to check first, if len > 3. Strictly speaking, there should
> not be any line coming in which is shorter than 42 bytes. But I was
> recentely bitten by such an assuption...
>
> Overall, I like it. I even have the impression that this could actually
> open a way to build in fetch instead of relying on a POSIX conformant and
> fast shell for such a central part of git.
is there any chance of this going in GIT 1.5.0 ? It's not in the rc0 release.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
2006-12-30 19:04 ` Han-Wen Nienhuys
@ 2006-12-30 19:24 ` Johannes Schindelin
2006-12-30 20:02 ` Han-Wen Nienhuys
0 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2006-12-30 19:24 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: git
Hi,
On Sat, 30 Dec 2006, Han-Wen Nienhuys wrote:
> Johannes Schindelin escreveu:
> >
> > Here you have to check first, if len > 3. Strictly speaking, there should
> > not be any line coming in which is shorter than 42 bytes. But I was
> > recentely bitten by such an assuption...
> >
> > Overall, I like it. I even have the impression that this could actually
> > open a way to build in fetch instead of relying on a POSIX conformant and
> > fast shell for such a central part of git.
>
> is there any chance of this going in GIT 1.5.0 ? It's not in the rc0
> release.
Um.
$ git grep -e exclude-existing v1.5.0-rc0
v1.5.0-rc0:builtin-show-ref.c: if (!strcmp(arg, "--exclude-existing"))
v1.5.0-rc0:builtin-show-ref.c: if (!strncmp(arg, "--exclude-existing=", 19))
v1.5.0-rc0:git-fetch.sh: git-show-ref --exclude-existing=refs/tags/ |
The last line means that it _is_ in v1.5.0-rc0. (BTW it is the commit
tags/v1.5.0-rc0~84, which I found by "git log v1.5.0-rc0 git-fetch.sh |
git name-rev --tags --stdin | less".)
Ciao,
Dscho
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] git-fetch: Avoid reading packed refs over and over again
2006-12-30 19:24 ` Johannes Schindelin
@ 2006-12-30 20:02 ` Han-Wen Nienhuys
0 siblings, 0 replies; 6+ messages in thread
From: Han-Wen Nienhuys @ 2006-12-30 20:02 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Johannes Schindelin escreveu:
> $ git grep -e exclude-existing v1.5.0-rc0
> v1.5.0-rc0:builtin-show-ref.c: if (!strcmp(arg, "--exclude-existing"))
> v1.5.0-rc0:builtin-show-ref.c: if (!strncmp(arg, "--exclude-existing=", 19))
> v1.5.0-rc0:git-fetch.sh: git-show-ref --exclude-existing=refs/tags/ |
>
> The last line means that it _is_ in v1.5.0-rc0. (BTW it is the commit
> tags/v1.5.0-rc0~84, which I found by "git log v1.5.0-rc0 git-fetch.sh |
> git name-rev --tags --stdin | less".)
oh, oops. I was looking for the commit message of the patch you originally
sent me. Cool.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-12-30 20:02 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-17 19:52 [PATCH] git-fetch: Avoid reading packed refs over and over again Johannes Schindelin
2006-12-18 1:54 ` Junio C Hamano
2006-12-18 12:33 ` Johannes Schindelin
2006-12-30 19:04 ` Han-Wen Nienhuys
2006-12-30 19:24 ` Johannes Schindelin
2006-12-30 20:02 ` Han-Wen Nienhuys
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).