git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gitweb: Do not parse refs by hand, use git-peek-remote instead
@ 2006-09-14 21:27 Jakub Narebski
  2006-09-14 23:09 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2006-09-14 21:27 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano

This is response to Linus work on packed refs. Additionally it
makes gitweb work with symrefs, too.

Do not parse refs by hand, using File::Find and reading individual
heads to get hash of reference, but use git-peek-remote output
instead. Assume that the hash for deref (with ^{}) always follows hash
for ref, and that we hav derefs only for tags objects; this removes
call to git_get_type (and git-cat-file -t invocation) for tags, which
speeds "summary" and "tags" views generation, but might slow
generation of "heads" view a bit. As of now we do not save and use the
deref hash.

Remove git_get_hash_by_ref while at it, as git_get_refs_list was the
only place it was used.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
By the way, the previous patch
  "gitweb: Use File::Find::find in git_get_projects_list"
was added during working on this feature.

 gitweb/gitweb.perl |   39 +++++++++++++++++----------------------
 1 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 25383bc..4e58a6b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -676,19 +676,6 @@ sub git_get_hash_by_path {
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
-# assumes that PATH is not symref
-sub git_get_hash_by_ref {
-	my $path = shift;
-
-	open my $fd, "$projectroot/$path" or return undef;
-	my $head = <$fd>;
-	close $fd;
-	chomp $head;
-	if ($head =~ m/^[0-9a-fA-F]{40}$/) {
-		return $head;
-	}
-}
-
 sub git_get_project_description {
 	my $path = shift;
 
@@ -1098,17 +1085,25 @@ sub git_get_refs_list {
 	my @reflist;
 
 	my @refs;
-	my $pfxlen = length("$projectroot/$project/$ref_dir");
-	File::Find::find(sub {
-		return if (/^\./);
-		if (-f $_) {
-			push @refs, substr($File::Find::name, $pfxlen + 1);
+	open my $fd, "-|", $GIT, "peek-remote", "$projectroot/$project/"
+		or return;
+	while (my $line = <$fd>) {
+		chomp $line;
+		if ($line =~ m/^([0-9a-fA-F]{40})\t$ref_dir\/?([^\^]+)$/) {
+			push @refs, { hash => $1, name => $2 };
+		} elsif ($line =~ m/^[0-9a-fA-F]{40}\t$ref_dir\/?.*\^\{\}$/) {
+			# assume that "peeled" ref is always after ref,
+			# and that you "peel" (deref) tags only
+			$refs[$#refs]->{'type'} = "tag";
 		}
-	}, "$projectroot/$project/$ref_dir");
+	}
+	close $fd;
+
+	foreach my $ref (@refs) {
+		my $ref_file = $ref->{'name'};
+		my $ref_id   = $ref->{'hash'};
 
-	foreach my $ref_file (@refs) {
-		my $ref_id = git_get_hash_by_ref("$project/$ref_dir/$ref_file");
-		my $type = git_get_type($ref_id) || next;
+		my $type = $ref->{'type'} || git_get_type($ref_id) || next;
 		my %ref_item = parse_ref($ref_file, $ref_id, $type);
 
 		push @reflist, \%ref_item;
-- 
1.4.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-14 21:27 [PATCH] gitweb: Do not parse refs by hand, use git-peek-remote instead Jakub Narebski
@ 2006-09-14 23:09 ` Junio C Hamano
  2006-09-15  1:43   ` [PATCH (take 2)] " Jakub Narebski
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2006-09-14 23:09 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> Remove git_get_hash_by_ref while at it, as git_get_refs_list was the
> only place it was used.

That's a very good news.

> +	open my $fd, "-|", $GIT, "peek-remote", "$projectroot/$project/"
> +		or return;
> +	while (my $line = <$fd>) {
> +		chomp $line;
> +		if ($line =~ m/^([0-9a-fA-F]{40})\t$ref_dir\/?([^\^]+)$/) {
> +			push @refs, { hash => $1, name => $2 };
> +		} elsif ($line =~ m/^[0-9a-fA-F]{40}\t$ref_dir\/?.*\^\{\}$/) {
> +			# assume that "peeled" ref is always after ref,
> +			# and that you "peel" (deref) tags only
> +			$refs[$#refs]->{'type'} = "tag";
> 		}

The assumption is good; we never show a ref^{} before showing
ref itself.  But you probably would want to safeguard yourself
from future changes by not depending on "immediately after".
At least you can check $refs[-1]{'name'} is the same as the
unpeeled ref you are looking at, like this:

	if ($line =~ m/^([0-9a-fA-F]{40})\t$ref_dir\/?([^\^]+)$/) {
		push @refs, { hash => $1, name => $2 };
	} elsif ($line =~ m/^[0-9a-fA-F]{40}\t$ref_dir\/?(.*)\^\{\}$/ &&
                 $1 eq $refs[-1]{'name'}) {
		# most likely a tag is followed by its peeled
		# one, and when that happens we know the
		# previous one was of type 'tag'.
		$refs[$#refs]->{'type'} = "tag";
	} ...

> +	foreach my $ref (@refs) {
> +		my $ref_file = $ref->{'name'};
> +		my $ref_id   = $ref->{'hash'};
> -		my $type = git_get_type($ref_id) || next;
> +		my $type = $ref->{'type'} || git_get_type($ref_id) || next;

And this is a good incremental change to reduce number of
external calls.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH (take 2)] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-14 23:09 ` Junio C Hamano
@ 2006-09-15  1:43   ` Jakub Narebski
  2006-09-15  6:15     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2006-09-15  1:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

This is in response to Linus work on packed refs. Additionally it
makes gitweb work with symrefs, too.

Do not parse refs by hand, using File::Find and reading individual
heads to get hash of reference, but use git-peek-remote output
instead. Assume that the hash for deref (with ^{}) always follows hash
for ref, and that we hav derefs only for tags objects; this removes
call to git_get_type (and git-cat-file -t invocation) for tags, which
speeds "summary" and "tags" views generation, but might slow
generation of "heads" view a bit. As of now we do not save and use the
deref hash.

Remove git_get_hash_by_ref while at it, as git_get_refs_list was the
only place it was used.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl |   41 +++++++++++++++++++----------------------
 1 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 25383bc..b4a890b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -676,19 +676,6 @@ sub git_get_hash_by_path {
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
-# assumes that PATH is not symref
-sub git_get_hash_by_ref {
-	my $path = shift;
-
-	open my $fd, "$projectroot/$path" or return undef;
-	my $head = <$fd>;
-	close $fd;
-	chomp $head;
-	if ($head =~ m/^[0-9a-fA-F]{40}$/) {
-		return $head;
-	}
-}
-
 sub git_get_project_description {
 	my $path = shift;
 
@@ -1098,17 +1085,27 @@ sub git_get_refs_list {
 	my @reflist;
 
 	my @refs;
-	my $pfxlen = length("$projectroot/$project/$ref_dir");
-	File::Find::find(sub {
-		return if (/^\./);
-		if (-f $_) {
-			push @refs, substr($File::Find::name, $pfxlen + 1);
+	open my $fd, "-|", $GIT, "peek-remote", "$projectroot/$project/"
+		or return;
+	while (my $line = <$fd>) {
+		chomp $line;
+		if ($line =~ m/^([0-9a-fA-F]{40})\t$ref_dir\/?([^\^]+)$/) {
+			push @refs, { hash => $1, name => $2 };
+		} elsif ($line =~ m/^[0-9a-fA-F]{40}\t$ref_dir\/?(.*)\^\{\}$/ &&
+		         $1 eq $refs[-1]{'name'}) {
+			# most likely a tag is followed by its peeled
+			# (deref) one, and when that happens we know the
+			# previous one was of type 'tag'.
+			$refs[-1]{'type'} = "tag";
 		}
-	}, "$projectroot/$project/$ref_dir");
+	}
+	close $fd;
+
+	foreach my $ref (@refs) {
+		my $ref_file = $ref->{'name'};
+		my $ref_id   = $ref->{'hash'};
 
-	foreach my $ref_file (@refs) {
-		my $ref_id = git_get_hash_by_ref("$project/$ref_dir/$ref_file");
-		my $type = git_get_type($ref_id) || next;
+		my $type = $ref->{'type'} || git_get_type($ref_id) || next;
 		my %ref_item = parse_ref($ref_file, $ref_id, $type);
 
 		push @reflist, \%ref_item;
-- 
1.4.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH (take 2)] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-15  1:43   ` [PATCH (take 2)] " Jakub Narebski
@ 2006-09-15  6:15     ` Junio C Hamano
  2006-09-15  7:14       ` Jakub Narebski
  2006-09-15 10:49       ` Jakub Narebski
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2006-09-15  6:15 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> This is in response to Linus work on packed refs. Additionally it
> makes gitweb work with symrefs, too.
>
> Do not parse refs by hand, using File::Find and reading individual
> heads to get hash of reference, but use git-peek-remote output
> instead.

Looks nicer.  Will apply.

Now, once we start doing this, it may make sense to rethink how
this function and git_get_references functions are used.  I
think

	git grep -n -e '^sub ' \
        	-e git_get_references \
                -e git_get_refs_list gitweb/gitweb.perl

would be instructive how wasteful the current code is.

get_refs_list is called _TWICE_ in git_summary and worse yet
very late in the function, after calling git_get_references that
could already have done what the function does (by the way,
git_get_references already knows how to use peek-remote output
but for some reason it uses ls-remote -- I think you can safely
rewrite it to use peek-remote).  So you end up doing peek-remote
three times to draw the summary page.

git_get_references are called from almost everywhere that shows
the list of commits, which is understandable because we would
want to see those markers in the list.

I very much suspect that you can use git_get_refs_list to return
a hash and a sorted list at the same time from the same input
and make git_summary to do with just a single call to it, and
get rid of git_get_references with a little bit of restructuring
of the caller.

Hmm?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH (take 2)] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-15  6:15     ` Junio C Hamano
@ 2006-09-15  7:14       ` Jakub Narebski
  2006-09-15  7:48         ` Junio C Hamano
  2006-09-15 10:49       ` Jakub Narebski
  1 sibling, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2006-09-15  7:14 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> This is in response to Linus work on packed refs. Additionally it
>> makes gitweb work with symrefs, too.
>>
>> Do not parse refs by hand, using File::Find and reading individual
>> heads to get hash of reference, but use git-peek-remote output
>> instead.
> 
> Looks nicer.  Will apply.
> 
> Now, once we start doing this, it may make sense to rethink how
> this function and git_get_references functions are used.
[...]
> I very much suspect that you can use git_get_refs_list to return
> a hash and a sorted list at the same time from the same input
> and make git_summary to do with just a single call to it, and
> get rid of git_get_references with a little bit of restructuring
> of the caller.

It can be done. Well, we could also collapse git_get_references and
git_get_refs_list into one subroutine, but it would be slightly slower
than git_get_references.

But, if we change git_get_refs_list to do also work of git_get_references,
we should also change git_get_references to not use info/refs file at all
(it can, and usually for unknown reasons is stale) but always use
git-peek-remote, for consistency.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH (take 2)] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-15  7:14       ` Jakub Narebski
@ 2006-09-15  7:48         ` Junio C Hamano
  2006-09-15  8:43           ` Jakub Narebski
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2006-09-15  7:48 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> But, if we change git_get_refs_list to do also work of git_get_references,
> we should also change git_get_references to not use info/refs file at all
> (it can, and usually for unknown reasons is stale) but always use
> git-peek-remote, for consistency.

Yes that would make sense.  A repository served by gitweb does
not necessarily has to serve objects over http transport, so it
is nicer not to require info/refs to even exist or up to date.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH (take 2)] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-15  7:48         ` Junio C Hamano
@ 2006-09-15  8:43           ` Jakub Narebski
  2006-09-15  9:05             ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2006-09-15  8:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > But, if we change git_get_refs_list to do also work of git_get_references,
> > we should also change git_get_references to not use info/refs file at all
> > (it can, and usually for unknown reasons is stale) but always use
> > git-peek-remote, for consistency.
> 
> Yes that would make sense.  A repository served by gitweb does
> not necessarily has to serve objects over http transport, so it
> is nicer not to require info/refs to even exist or up to date.

We do not require info/refs, as currently git_get_references falls
back to git-ls-remotes (should be git-peek-remote as it is faster)
if info/refs does not exist. But info/refs is usually stale; I guess
it is updated on pull/fetch/push, but not on commit.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH (take 2)] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-15  8:43           ` Jakub Narebski
@ 2006-09-15  9:05             ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2006-09-15  9:05 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano wrote:
>> Jakub Narebski <jnareb@gmail.com> writes:
>> 
>> > But, if we change git_get_refs_list to do also work of git_get_references,
>> > we should also change git_get_references to not use info/refs file at all
>> > (it can, and usually for unknown reasons is stale) but always use
>> > git-peek-remote, for consistency.
>> 
>> Yes that would make sense.  A repository served by gitweb does
>> not necessarily has to serve objects over http transport, so it
>> is nicer not to require info/refs to even exist or up to date.
>
> We do not require info/refs, as currently git_get_references falls
> back to git-ls-remotes (should be git-peek-remote as it is faster)
> if info/refs does not exist. But info/refs is usually stale; I guess
> it is updated on pull/fetch/push, but not on commit.

Yes, we are in agreement and are saying the same thing.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH (take 2)] gitweb: Do not parse refs by hand, use git-peek-remote instead
  2006-09-15  6:15     ` Junio C Hamano
  2006-09-15  7:14       ` Jakub Narebski
@ 2006-09-15 10:49       ` Jakub Narebski
  1 sibling, 0 replies; 9+ messages in thread
From: Jakub Narebski @ 2006-09-15 10:49 UTC (permalink / raw)
  To: Junio C Hamano, git

Junio C Hamano wrote:
> Now, once we start doing this, it may make sense to rethink how
> this function and git_get_references functions are used.  I
> think
> 
> 	git grep -n -e '^sub ' \
>         	-e git_get_references \
>                 -e git_get_refs_list gitweb/gitweb.perl
> 
> would be instructive how wasteful the current code is.
> 
> get_refs_list is called _TWICE_ in git_summary and worse yet
> very late in the function, after calling git_get_references that
> could already have done what the function does (by the way,
> git_get_references already knows how to use peek-remote output
> but for some reason it uses ls-remote -- I think you can safely
> rewrite it to use peek-remote).  So you end up doing peek-remote
> three times to draw the summary page.
> 
> git_get_references are called from almost everywhere that shows
> the list of commits, which is understandable because we would
> want to see those markers in the list.
> 
> I very much suspect that you can use git_get_refs_list to return
> a hash and a sorted list at the same time from the same input
> and make git_summary to do with just a single call to it, and
> get rid of git_get_references with a little bit of restructuring
> of the caller.

We can easily collapse two calls for git_get_refs_list in gi_summary, 
one for tags and one for heads into one call plus some filtering. 
Changing git_get_refs_list to do also the job of git_get_references 
means that in git_tags and git_heads we do extra the job of 
git_get_references. Neither git_tags, nor git_heads use references and 
refs marker; using the heads references in git_heads and tags 
references in git_tags is repeating the same information twice, 
cluttering the output. Unless we want to add yet another subroutine...

But as call to git-peek-remote is what takes most time, we can waste 
some time processing references which we would not use for the sake of 
clarity. Well, we can get rid of git_get_references too, if we don't 
mind slight decrease in performance.


We would then use:
git_summary:
  my ($refs, $reflist) = git_get_refs_list();
  my @taglist = map { s!^tags/!! } grep { m!^tags/! } @$reflist;
  my @headlist = map { s!^heads/!! } grep { m!^heads/! } @$reflist;

git_heads, git_tags:
  my (undef, $taglist)  = git_get_refs_list("tags");
  my (undef, $headlist) = git_get_refs_list("heads");

everywhere else
  my ($refs, undef) = git_get_refs_list($type);

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2006-09-15 10:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-14 21:27 [PATCH] gitweb: Do not parse refs by hand, use git-peek-remote instead Jakub Narebski
2006-09-14 23:09 ` Junio C Hamano
2006-09-15  1:43   ` [PATCH (take 2)] " Jakub Narebski
2006-09-15  6:15     ` Junio C Hamano
2006-09-15  7:14       ` Jakub Narebski
2006-09-15  7:48         ` Junio C Hamano
2006-09-15  8:43           ` Jakub Narebski
2006-09-15  9:05             ` Junio C Hamano
2006-09-15 10:49       ` Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).