From: Petr Baudis <pasky@suse.cz>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Nanako Shiraishi <nanako3@lavabit.com>,
git@vger.kernel.org, Luben Tuikov <ltuikov@yahoo.com>
Subject: Re: [PATCH 2/3 (edit v2)] gitweb: Cache $parent_commit info in git_blame()
Date: Wed, 17 Dec 2008 09:19:35 +0100
Message-ID: <20081217081935.GC3640@machine.or.cz>
In-Reply-To: <200812110133.33124.jnareb@gmail.com>
On Thu, Dec 11, 2008 at 01:33:29AM +0100, Jakub Narebski wrote:
> In 244a70e (Blame "linenr" link jumps to previous state at
> "orig_lineno"), Luben Tuikov changed the 'lineno' link so that it
> leads to the parent of the blamed commit, instead of to the commit
> which gave the current version of the given block of lines. This made
> data mining possible using the 'blame' view.
>
> The current implementation calls git-rev-parse once per blamed line
> to find the parent revision of the blamed commit, even when the same
> commit appears more than once, which is inefficient.
>
> This patch mitigates this issue by storing (caching) $parent_commit
> info in %metainfo, which makes gitweb call git-rev-parse only once
> per unique commit in the blame output.
>
>
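
A minimal sketch of the caching idea described above, not the code from
the patch. The lookup_parent() helper and the sample commit ids are
hypothetical; in gitweb the expensive step would be the git-rev-parse
call, and %metainfo holds more than just the parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the expensive lookup. In gitweb this would
# be a call to "git rev-parse $full_rev^".
my $calls = 0;
sub lookup_parent {
    my ($full_rev) = @_;
    $calls++;
    return "parent-of-$full_rev";
}

# Per-commit data, keyed by full commit id.
my %metainfo;

sub parent_commit {
    my ($full_rev) = @_;
    my $meta = $metainfo{$full_rev} ||= {};
    # Run the lookup only the first time this commit is seen.
    $meta->{'parent'} = lookup_parent($full_rev)
        unless exists $meta->{'parent'};
    return $meta->{'parent'};
}

# Five blamed lines attributed to two unique commits.
my @blamed  = qw(aaa bbb aaa aaa bbb);
my @parents = map { parent_commit($_) } @blamed;

print "lookups: $calls for ", scalar(@blamed), " lines\n";
```

With this input the lookup runs once per unique commit (twice) rather
than once per line (five times), which is where the gain for files with
a high lines-to-commits ratio comes from.
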
> In the tables below you can see a simple benchmark comparing gitweb
> performance before and after this patch:
>
> File               | L[1] | C[2] || Time0[3] | Before[4] | After[4]
> ====================================================================
> blob.h             |   18 |    4 || 0m1.727s |  0m2.545s |  0m2.474s
> GIT-VERSION-GEN    |   42 |   13 || 0m2.165s |  0m2.448s |  0m2.071s
> README             |   46 |    6 || 0m1.593s |  0m2.727s |  0m2.242s
> revision.c         | 1923 |  121 || 0m2.357s | 0m30.365s |  0m7.028s
> gitweb/gitweb.perl | 6291 |  428 || 0m8.080s | 1m37.244s | 0m20.627s
>
> File               | L/C  | Before/After
> =========================================
> blob.h             |  4.5 | 1.03
> GIT-VERSION-GEN    |  3.2 | 1.18
> README             |  7.7 | 1.22
> revision.c         | 15.9 | 4.32
> gitweb/gitweb.perl | 14.7 | 4.71
>
> As you can see, the greater the ratio of lines in the file to unique
> commits in the blame output, the greater the gain from the new
> implementation.
>
> Footnotes:
> ~~~~~~~~~~
> [1] Lines:
>     $ wc -l <file>
> [2] Individual commits in blame output:
>     $ git blame -p <file> | grep author-time | wc -l
> [3] Time for running "git blame -p" (user time, single run):
>     $ time git blame -p <file> >/dev/null
> [4] Time to run gitweb as Perl script from command line:
>     $ gitweb-run.sh "p=.git;a=blame;f=<file>" > /dev/null 2>&1
>
> The gitweb-run.sh script includes slightly modified code (with
> adjusted pathnames) from the gitweb_run() function in the test script
> t/t9500-gitweb-standalone-no-errors.sh; the contents of the gitweb
> config file gitweb_config.perl (again up to adjusting pathnames; in
> particular, the $projectroot variable should point to the top
> directory of the git repository) can be found in the same place.
>
>
> Alternate solutions:
> ~~~~~~~~~~~~~~~~~~~~
> An alternate solution would be to open a bidirectional pipe to "git
> cat-file --batch-check" (as in Git::Repo in the gitweb caching work by
> Lea Wiemann), feed $long_rev^ to it, and parse its output, which has
> the following form:
>
>   926b07e694599d86cec668475071b32147c95034 commit 637
>
> This would mean one call to git-cat-file for the whole 'blame' view,
> instead of one call to git-rev-parse per each unique commit in blame
> output.
>
>
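
A rough sketch of the bidirectional-pipe alternative described above,
under stated assumptions: it is not taken from Git::Repo, the function
names parse_batch_check_line() and resolve_parents() are made up for
illustration, and resolve_parents() must be run inside a git repository.
It relies on open2() from the core IPC::Open2 module and on cat-file
answering each request line as it arrives.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

# Parse one "<id> <type> <size>" line of --batch-check output.
# Returns a hash ref, or nothing for lines such as "<name> missing".
sub parse_batch_check_line {
    my ($line) = @_;
    return unless defined $line;
    chomp $line;
    if ($line =~ /^([0-9a-f]{40,64}) (\S+) (\d+)$/) {
        return { id => $1, type => $2, size => $3 };
    }
    return;
}

# Resolve the first parent of every given commit with a single
# long-running "git cat-file --batch-check" process.
sub resolve_parents {
    my (@revs) = @_;
    my $pid = open2(my $out, my $in, qw(git cat-file --batch-check));
    my %parent;
    for my $rev (@revs) {
        print {$in} "$rev^\n";
        my $info = parse_batch_check_line(scalar <$out>);
        # A root commit has no parent; its line ends in "missing".
        $parent{$rev} = $info ? $info->{id} : undef;
    }
    close $in;
    waitpid $pid, 0;
    return \%parent;
}

# Parsing the sample line quoted in the message above:
my $sample = parse_batch_check_line(
    "926b07e694599d86cec668475071b32147c95034 commit 637\n");
print "$sample->{id} is a $sample->{type} of $sample->{size} bytes\n";
```

One process then serves the whole 'blame' view; the cost is keeping the
pipe open and handling the "missing" case for root commits.
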
> Yet another solution would be to use validate_revision() instead of
> validate_refname() when checking script parameters (CGI query or
> path_info), with validate_revision() being something like the
> following:
>
>   sub validate_revision {
>           my $rev = shift;
>           return validate_refname(strip_rev_suffixes($rev));
>   }
>
> so we don't need to calculate $long_rev^, but can pass "$long_rev^" as
> the 'hb' parameter.
>
> This solution has the advantage that it can be easily adapted to
> future incremental blame output.
>
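
A small sketch of how the validate_revision() idea could look, purely as
an illustration. strip_rev_suffixes() is not defined anywhere in the
message, so the version here is a guess that strips trailing ^, ^N, ~
and ~N; validate_refname() is a simplified stand-in, not gitweb's real
check. Unlike the snippet quoted above, this variant returns the
original string (with its suffix) so it can be passed on as 'hb'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for gitweb's validate_refname(): rejects empty
# names, path components starting with a dot, "..", and characters
# outside a small whitelist.
sub validate_refname {
    my ($input) = @_;
    return undef unless defined $input && length $input;
    return undef if $input =~ m!(^|/)\.|\.\.|[^a-zA-Z0-9_./-]!;
    return $input;
}

# Hypothetical helper: drop trailing ^, ^N, ~ and ~N suffixes.
sub strip_rev_suffixes {
    my ($rev) = @_;
    1 while $rev =~ s/(?:\^\d*|~\d*)$//;
    return $rev;
}

# Accept a revision if its base name is a valid refname.
sub validate_revision {
    my ($rev) = @_;
    return undef unless defined $rev;
    return defined validate_refname(strip_rev_suffixes($rev))
        ? $rev
        : undef;
}

for my $rev ('926b07e694599d86cec668475071b32147c95034^',
             'HEAD~3^2', 'foo..bar^') {
    my $ok = validate_revision($rev);
    print "$rev: ", (defined $ok ? "accepted" : "rejected"), "\n";
}
```
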
> Acked-by: Luben Tuikov <ltuikov@yahoo.com>
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Acked-by: Petr Baudis <pasky@suse.cz>
(though I think the commit message is total overkill for such an obvious
change ;-)