git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Luben Tuikov <ltuikov@yahoo.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/3] gitweb: Cache $parent_commit info in git_blame()
Date: Wed, 10 Dec 2008 22:03:28 +0100	[thread overview]
Message-ID: <200812102203.30486.jnareb@gmail.com> (raw)
In-Reply-To: <710873.22344.qm@web31806.mail.mud.yahoo.com>

On Wed, 10 Dec 2008, Luben Tuikov wrote:
> --- On Wed, 12/10/08, Jakub Narebski <jnareb@gmail.com> wrote:

Side note: is it Yahoo web mail interface that removes attributions?

> > > Have you tested this patch that it gives the same commit chain
> > > as before it?
> > 
> > The only difference between precious version and this patch
> > is that now, if you calculate sha-1 of $long_rev^, it is stored in 
> > $metainfo{$long_rev}{'parent'} and not calculated second time.
> 
> Yes, I agree a patch to this effect would improve performance
> proportionally to the history of the lines of a file.

Or rather proportionally to the ratio of number of lines of a file
to number of unique commits (not groups of lines) which are blamed
for given contents of a file.

> So it's a good thing, as most commits change a contiguous block
> of size more than one line of a file.
> 
> "$parent_commit" depends on "$full_rev^" which depends on "$full_rev".
> So as soon as "$full_rev" != "$old_full_rev", you'd know that you need
> to update "$parent_commit".  "$old_full_rev" needs to exist outside
> the  scope of "while (1)".  I didn't see this in the code or in the
> patch. 

You don't need $old_full_rev. We have to save data about commit in
%metainfo hash because information about commit appears in 
"git blame --porcelain" output _once_ per commit. So we use the same
cache to store $full_rev^ in $meta{'parent'}, which means storing
it in $metainfo{$full_rev}{'parent'}.

Now if the commit we saved this info about appears again in git-blame
output, be it in group of lines for which the same commit is blamed,
or later in unrelated chunk, we use saved info.

Let me try to explain it using the following diagram:

  Commit N Line Original code      This patch
  ------------------------------------------------------
  3a4046 1 xxx  rev-parse 3a4046^  rev-parse 3a4046^
         2 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}
         3 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}
  f47c19 5 xxx  rev-parse f47c19^  rev-parse f47c19^
         6 xxx  rev-parse f47c19^  $mi{f47c19}{'parent'}
  3a4046 7 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}  <--
         8 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}

where "rev-parse 3a4046^" means call to git-rev-parse, and $mi{<rev>}
accessing $metainfo{$full_rev} (via $meta).
 
In place marked by arrow '<--' you don't need to calculate 3a4046^
again...

> > But I have checked that (at least for single example file)
> > the blame output is identical for before and after this patch.
> 
> I've always tested it on something like "gitweb.perl", etc.

I've checked it on blob.h.  Other good example is README (with boundary
commits) and GIT-VERSION-GEN (with different output between git-blame
--porcelain and --incremental), both of which take much less time than
gitweb/gitweb.perl (see benchmarks in other post).

-- 
Jakub Narebski
Poland

  reply	other threads:[~2008-12-10 21:05 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-09 22:43 [PATCH 0/3] gitweb: Improve git_blame in preparation for incremental blame Jakub Narebski
2008-12-09 22:46 ` [PATCH 1/3] gitweb: Move 'lineno' id from link to row element in git_blame Jakub Narebski
2008-12-10  5:55   ` Luben Tuikov
2008-12-17  8:13   ` Petr Baudis
2008-12-09 22:48 ` [PATCH 2/3] gitweb: Cache $parent_commit info in git_blame() Jakub Narebski
2008-12-10  3:49   ` Nanako Shiraishi
2008-12-10 13:39     ` Jakub Narebski
2008-12-10 20:27       ` Junio C Hamano
2008-12-11  0:33         ` [PATCH 2/3 (edit v2)] " Jakub Narebski
2008-12-11  4:08           ` Luben Tuikov
2008-12-11  4:18             ` Junio C Hamano
2008-12-12  3:05           ` Junio C Hamano
2008-12-12 17:20             ` Jakub Narebski
2008-12-17  8:19           ` Petr Baudis
2008-12-17  8:34             ` Junio C Hamano
2008-12-10  6:20   ` [PATCH 2/3] " Luben Tuikov
2008-12-10 15:15     ` Jakub Narebski
2008-12-10 20:05       ` Luben Tuikov
2008-12-10 21:03         ` Jakub Narebski [this message]
2008-12-10 21:15           ` Luben Tuikov
2008-12-09 22:48 ` [PATCH 3/3] gitweb: A bit of code cleanup " Jakub Narebski
2008-12-10  2:13   ` Jakub Narebski
2008-12-10  8:35     ` Junio C Hamano
2008-12-10  6:24   ` Luben Tuikov
2008-12-10 20:11 ` [RFC/PATCH 4/3] gitweb: Incremental blame (proof of concept) Jakub Narebski
2008-12-11  0:47   ` Junio C Hamano
2008-12-11  1:22     ` Jakub Narebski
2008-12-11 17:28   ` Jakub Narebski
2008-12-11 22:34     ` Jakub Narebski
2008-12-14  0:17   ` [RFC/PATCH v2] " Jakub Narebski
2008-12-14 16:11     ` [RFC] gitweb: Incremental blame - suggestions for improvements Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200812102203.30486.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=ltuikov@yahoo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).