From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: Luben Tuikov <ltuikov@yahoo.com>, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH 2/3] gitweb: Use "previous" header of git-blame -p in 'blame' view
Date: Fri, 10 Jul 2009 23:57:42 +0200 [thread overview]
Message-ID: <200907102357.43475.jnareb@gmail.com> (raw)
In-Reply-To: <200907102354.43232.jnareb@gmail.com>
Luben Tuikov changed 'lineno' link (line number link) from pointing to
'blame' view at given line at blamed commit, to the one at parent of
blamed commit in
244a70e (Blame "linenr" link jumps to previous state at
"orig_lineno", 2007-01-04).
This made it possible to do data mining using 'blame' view, by going
through history of a line using mentioned line number link.
Original implementation called "git rev-parse <commit>^" to find SHA-1
of a parent of a given commit once per each blamed line. In
39c19ce (gitweb: cache $parent_commit info in git_blame(),
2008-12-11)
this was improved so rev-parse was called once per each unique commit
in git-blame output. Alternate solution would be to relax validation
for 'hb' parameter by allowing extended SHA-1 syntax of the form
<rev>^ (perhaps redirecting to gitweb URL with <rev>^ resolved, in
practice moving call to rev-parse to 'the other side of link').
This solution had a bug that it didn't work for boundary commits,
which did not have parents, so "git rev-parse <commit>^" returned
literal "<commit>^" (which didn't exists), which gitweb passed
as 'hb' parameter in 'linenr' link... following which gave
400 - Invalid hash base parameter
error. This bug could have been fixed by checking if commit is
boundary commit, or check if rev-parse result is unchanged (still
ends in '^' prefix).
The solution employing rev-parse to find parent of commit had inherent
problem if blamed commit renamed file; then name of file would be
different in its parent. Solving this outside git-blame would be
difficult and costly (at least cost of additional fork for extra git
command).
Currently gitweb uses information in "previous" header, which was
introduced by Junio C Hamano in
96e1170 (blame: show "previous" information in
--porcelain/--incremental format, 2008-06-04)
This (currently undocumented) header has the following format:
"previous <sha1 of parent commit> <filename at parent>"
Using "previous" header solves both problem of performance and the
problem that blamed commit could have renaming blamed file.
Because "previous" header can be repeated for the same commit when
blamed commit is merge (has more than one parent), and we are
interested usually in _first_ parent, currently we store only first
value if blame header repeats. Using first parent (first "previous"
line) was what gitweb did before; without this change gitweb would use
last parent instead.
While at it introduce helper subroutine unquote_maybe(), which
unquotes filename if it is needed, which is marked by filename being
surrounded in doublequotes (which are not part of name, and which are
stripped by unquote_maybe() - which makes this function idempotent).
Currently unquote_maybe() us used only in git_blame.
If there is no previous commit 'linenr' link points to blamed commit
and blamed filename, making it work correctly for boundary commits.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
IMHO more important is that result is MORE CORRECT, not the better
performance.
In the table below you can see simple benchmark comparing gitweb
performance before and after this patch. Operating system used was
Linux 2.6.14, on 1 GHz AMD Athlon processor (2002.43 BogoMIPS).
As there is one "git rev-parse <rev>^" per each individual commit
in blame output, it would be much worse 'before' for operating
systems with costly fork.
File | C[1] || Time0[2] | Before[3] | After[3]
=================================================================
revision.c | 121 || 2.820s | 5.548s | 5.172s
gitweb/gitweb.perl[4] | 428 || 11.749s | 19.797s | 17.293s
Table footnotes:
~~~~~~~~~~~~~~~~
[1] Individual commits in blame output:
$ git blame -p <file> | grep author-time | wc -l
[2] Time for running "git blame -p" (user time, single run):
$ time git blame -p <file> >/dev/null
[3] Time to run gitweb as Perl script from command line:
$ time gitweb-run.sh "p=.git;a=blame;f=<file>" >/dev/null 2>&1
[4] Starting at 'origin', which is v1.6.3.3-412-gf581de1
> git blame -p origin -- gitweb/gitweb.perl
> "p=.git;a=blame;hb=origin;f=gitweb/gitweb.perl"
For comparison there is similar table for my 39c19ce (gitweb: cache
$parent_commit info in git_blame(), 2008-12-11):
File | L[1] | C[2] || Time0[3] | Before[4] | After[4]
====================================================================
blob.h | 18 | 4 || 0m1.727s | 0m2.545s | 0m2.474s
GIT-VERSION-GEN | 42 | 13 || 0m2.165s | 0m2.448s | 0m2.071s
README | 46 | 6 || 0m1.593s | 0m2.727s | 0m2.242s
revision.c | 1923 | 121 || 0m2.357s | 0m30.365s | 0m7.028s
gitweb/gitweb.perl | 6291 | 428 || 0m8.080s | 1m37.244s | 0m20.627s
File | L/C | Before/After
=========================================
blob.h | 4.5 | 1.03
GIT-VERSION-GEN | 3.2 | 1.18
README | 7.7 | 1.22
revision.c | 15.9 | 4.32
gitweb/gitweb.perl | 14.7 | 4.71
As you can see the greater ratio of lines in file to unique commits
in blame output, the greater gain from the new implementation.
Legend:
[1] Number of lines:
$ wc -l <file>
[2] Number of unique commits in the blame output:
$ git blame -p <file> | grep author-time | wc -l
[3] Time for running "git blame -p" (user time, single run):
$ time git blame -p <file> >/dev/null
[4] Time to run gitweb as Perl script from command line:
$ gitweb-run.sh "p=.git;a=blame;f=<file>" > /dev/null 2>&1
gitweb/gitweb.perl | 37 ++++++++++++++++++++++++-------------
1 files changed, 24 insertions(+), 13 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fe73c2c..36b1ce5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1187,6 +1187,16 @@ sub unquote {
return $str;
}
+# if filename is surrounded in double quotes, it need to be unquoted
+sub unquote_maybe {
+ my $str = shift;
+
+ if ($str =~ /^"(.*)"$/) {
+ return unquote($1);
+ }
+ return $str;
+}
+
# escape tabs (convert tabs to spaces)
sub untabify {
my $line = shift;
@@ -4827,7 +4837,7 @@ HTML
chomp $data;
last if ($data =~ s/^\t//); # contents of line
if ($data =~ /^(\S+)(?: (.*))?$/) {
- $meta->{$1} = $2;
+ $meta->{$1} = $2 unless exists $meta->{$1};
}
}
my $short_rev = substr($full_rev, 0, 8);
@@ -4852,20 +4862,21 @@ HTML
esc_html($short_rev));
print "</td>\n";
}
- my $parent_commit;
- if (!exists $meta->{'parent'}) {
- open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^")
- or die_error(500, "Open git-rev-parse failed");
- $parent_commit = <$dd>;
- close $dd;
- chomp($parent_commit);
- $meta->{'parent'} = $parent_commit;
- } else {
- $parent_commit = $meta->{'parent'};
- }
+ # 'previous' <sha1 of parent commit> <filename at commit>
+ if (exists $meta->{'previous'} &&
+ $meta->{'previous'} =~ /^([a-fA-F0-9]{40}) (.*)$/) {
+ $meta->{'parent'} = $1;
+ $meta->{'file_parent'} = unquote_maybe($2);
+ }
+ my $linenr_commit =
+ exists($meta->{'parent'}) ?
+ $meta->{'parent'} : $full_rev;
+ my $linenr_filename =
+ exists($meta->{'file_parent'}) ?
+ $meta->{'file_parent'} : unquote_maybe($meta->{'filename'});
my $blamed = href(action => 'blame',
- file_name => $meta->{'filename'},
- hash_base => $parent_commit);
+ file_name => $linenr_filename,
+ hash_base => $linenr_commit);
print "<td class=\"linenr\">";
print $cgi->a({ -href => "$blamed#l$orig_lineno",
-class => "linenr" },
--
1.6.3.3
next prev parent reply other threads:[~2009-07-10 22:01 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-10 21:54 [PATCH 0/3] gitweb: 'blame' view improvements Jakub Narebski
2009-07-10 21:55 ` [PATCH 1/3] gitweb: Mark boundary commits in 'blame' view Jakub Narebski
2009-07-10 21:57 ` Jakub Narebski [this message]
2009-07-10 22:21 ` [PATCH 2/3] gitweb: Use "previous" header of git-blame -p " Junio C Hamano
2009-07-11 9:17 ` Jakub Narebski
2009-07-12 17:21 ` Luben Tuikov
2009-07-14 19:21 ` Jakub Narebski
2009-07-10 22:01 ` [PATCH 3/3] gitweb: Add author initials in 'blame' view, a la "git gui blame" Jakub Narebski
2009-07-11 16:56 ` [PATCH 0/3] gitweb: 'blame' view improvements Jakub Narebski
2009-07-13 19:08 ` [RFC PATCH 5/3] gitweb: Incremental blame (proof of concept) Jakub Narebski
2009-07-12 22:08 ` [PATCH 4/3] gitweb: Use light/dark class also in 'blame' view Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200907102357.43475.jnareb@gmail.com \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=ltuikov@yahoo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).