All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Grimm <koreth@midwinter.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Jon Smirl <jonsmirl@gmail.com>,
	Julian Phillips <julian@quantumfyre.co.uk>,
	Andreas Ericsson <ae@op5.se>, Theodore Tso <tytso@mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Git's database structure
Date: Thu, 06 Sep 2007 11:14:06 -0700	[thread overview]
Message-ID: <46E0436E.9030504@midwinter.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0709061354180.28586@racer.site>

Johannes Schindelin wrote:
> But you can add _yet another_ index to it, which can be generated on the 
> fly, so that Git only has to generate the information once, and then reuse 
> it later.  As a benefit of this method, the underlying well-tested 
> structure needs no change at all.
>   

And in fact, you can do this today, without modifying git-blame at all, 
by (ab)using its "-S" option (which lets you specify a custom ancestry 
chain to search). By coincidence, I was just showing some people at my 
office how to do this yesterday. I'll cut-and-paste from the email I 
sent them. I am not claiming this is nearly as desirable as a built-in, 
auto-updated secondary index, but it proves the concept, anyway.

Fast-to-generate version:

git-rev-list HEAD -- main.c | awk '{if (last) print last " " $0; 
last=$0;}' > /tmp/revlist

This speeds things up a lot, because git blame doesn't have to examine 
other revisions:

time git blame main.c
   1.56s user 0.30s system 99% cpu 1.868 total
time git blame -S /tmp/revlist main.c
   0.21s user 0.03s system 96% cpu 0.249 total

The bad news is that generating that revision list is a bit slow, and if 
you do it the naive way I suggested above, you can't use the rev list 
with the -M option (to follow renames). The good news is that it's 
possible to have that too if you generate a list of revisions that 
includes the renames:

# Generate a list of all revisions in the right order (only need to do 
this once, not once per file)
git rev-list HEAD > /tmp/all-revs
# Generate a list of the revisions that touched this file, following 
copies/renames.
# Could do this in fewer commands but this is hopefully easier to follow.
git blame --porcelain -M main.c | \
   egrep '^[0-9a-f]{40}' | \
   cut -d' ' -f1 | \
   fgrep -f - /tmp/all-revs | \
   awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

Then -M is fast too:

time git blame -M main.c
   1.72s user 0.27s system 89% cpu 2.219 total
time git blame -M -S /tmp/revlist main.c
   0.29s user 0.03s system 93% cpu 0.341 total

Oddly, if you use the -S option, "git blame -C" actually gets 
significantly *slower*. I am not sure why.

-Steve

  reply	other threads:[~2007-09-06 18:14 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-04 15:23 Git's database structure Jon Smirl
2007-09-04 15:55 ` Andreas Ericsson
2007-09-04 16:07   ` Mike Hommey
2007-09-04 16:10     ` Andreas Ericsson
2007-09-04 16:19   ` Jon Smirl
2007-09-04 16:29     ` Andreas Ericsson
2007-09-04 17:09     ` Jeff King
2007-09-04 20:17     ` David Tweed
2007-09-04 17:21   ` Junio C Hamano
2007-09-04 16:28 ` Jon Smirl
2007-09-04 16:31   ` Andreas Ericsson
2007-09-04 16:47     ` Jon Smirl
2007-09-04 16:51       ` Andreas Ericsson
2007-09-04 17:25   ` Junio C Hamano
2007-09-04 17:44     ` Jon Smirl
2007-09-04 18:04       ` Mike Hommey
2007-09-04 19:44         ` Reece Dunn
2007-09-04 18:06       ` Junio C Hamano
2007-09-04 21:25       ` Theodore Tso
2007-09-04 21:54         ` Jon Smirl
2007-09-05  7:18           ` Andreas Ericsson
2007-09-05 13:41             ` Jon Smirl
2007-09-05 14:51               ` Andreas Ericsson
2007-09-05 15:37                 ` Jon Smirl
2007-09-05 15:54                   ` Julian Phillips
2007-09-05 16:12                     ` Jon Smirl
2007-09-05 17:31                       ` Julian Phillips
2007-09-06  1:27                         ` Kyle Moffett
2007-09-05 17:39                       ` Mike Hommey
2007-09-06  8:49                       ` Andreas Ericsson
2007-09-06  9:09                         ` Junio C Hamano
2007-09-06 11:03                           ` Wincent Colaiuta
2007-09-06 12:56                       ` Johannes Schindelin
2007-09-06 18:14                         ` Steven Grimm [this message]
2007-09-07  0:33                       ` Martin Langhoff
2007-09-05 19:52               ` Andy Parkins
2007-09-04 17:19 ` Julian Phillips
2007-09-04 17:30   ` Jon Smirl
2007-09-04 18:51     ` Andreas Ericsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46E0436E.9030504@midwinter.com \
    --to=koreth@midwinter.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonsmirl@gmail.com \
    --cc=julian@quantumfyre.co.uk \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.