From: Steven Grimm <koreth@midwinter.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Jon Smirl <jonsmirl@gmail.com>,
Julian Phillips <julian@quantumfyre.co.uk>,
Andreas Ericsson <ae@op5.se>, Theodore Tso <tytso@mit.edu>,
Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Git's database structure
Date: Thu, 06 Sep 2007 11:14:06 -0700
Message-ID: <46E0436E.9030504@midwinter.com>
In-Reply-To: <Pine.LNX.4.64.0709061354180.28586@racer.site>

Johannes Schindelin wrote:
> But you can add _yet another_ index to it, which can be generated on the
> fly, so that Git only has to generate the information once, and then reuse
> it later. As a benefit of this method, the underlying well-tested
> structure needs no change at all.
>
And in fact, you can do this today, without modifying git-blame at all,
by (ab)using its "-S" option (which lets you specify a custom ancestry
chain to search). By coincidence, I was just showing some people at my
office how to do this yesterday. I'll cut-and-paste from the email I
sent them. I am not claiming this is nearly as desirable as a built-in,
auto-updated secondary index, but it proves the concept, anyway.
Fast-to-generate version:

    git rev-list HEAD -- main.c |
        awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

This speeds things up a lot, because git blame doesn't have to examine
other revisions:

    time git blame main.c
    1.56s user 0.30s system 99% cpu 1.868 total

    time git blame -S /tmp/revlist main.c
    0.21s user 0.03s system 96% cpu 0.249 total

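To make the awk step above less opaque, here is a small sketch on made-up data. It shows the pairing of each line with the line that follows it, so every revision is listed next to the next-older one, which is the ancestry chain that gets handed to -S. The IDs c3, c2, c1 and the function name pair_revs are placeholders for illustration only, not real hashes or anything git provides.

```shell
# pair_revs: hypothetical wrapper around the awk one-liner from the command above.
# Reads one ID per line (newest first) and prints "newer older" pairs.
pair_revs() {
    awk '{if (last) print last " " $0; last=$0;}'
}

# c3, c2, c1 are made-up stand-ins for commit IDs, newest first.
printf 'c3\nc2\nc1\n' | pair_revs
# prints:
# c3 c2
# c2 c1
```

The oldest entry never appears on the left-hand side, so it ends up with no parent listed.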
The bad news is that generating that revision list is a bit slow, and if
you do it the naive way I suggested above, you can't use the rev list
with the -M option (to follow renames). The good news is that it's
possible to have that too if you generate a list of revisions that
includes the renames:

    # Generate a list of all revisions in the right order (only need to do
    # this once, not once per file).
    git rev-list HEAD > /tmp/all-revs

    # Generate a list of the revisions that touched this file, following
    # copies/renames.
    # Could do this in fewer commands but this is hopefully easier to follow.
    git blame --porcelain -M main.c | \
        egrep '^[0-9a-f]{40}' | \
        cut -d' ' -f1 | \
        fgrep -f - /tmp/all-revs | \
        awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

Then -M is fast too:

    time git blame -M main.c
    1.72s user 0.27s system 89% cpu 2.219 total

    time git blame -M -S /tmp/revlist main.c
    0.29s user 0.03s system 93% cpu 0.341 total

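The filtering stages of the rename-aware pipeline can also be tried without a repository. The sketch below is only an illustration under stated assumptions: the 40-character IDs are generated placeholders, the porcelain function fakes a few lines shaped like "git blame --porcelain" output, and grep -E / grep -F are used as the equivalents of egrep / fgrep. It shows that only header lines survive the first filter, and that the final list comes out in the order of the all-revisions file.

```shell
# Hypothetical 40-character IDs standing in for real commit hashes.
r1=$(printf '%040d' 1)
r2=$(printf '%040d' 2)
r3=$(printf '%040d' 3)

# Stand-in for /tmp/all-revs: every revision, newest first.
all_revs=$(mktemp)
printf '%s\n' "$r3" "$r2" "$r1" > "$all_revs"

# Fake porcelain-style output: header lines start with an ID,
# metadata and tab-prefixed content lines do not.
porcelain() {
    printf '%s 1 1 1\nauthor A\n\tline one\n' "$r1"
    printf '%s 2 2 1\nauthor B\n\tline two\n' "$r3"
}

# Keep header lines, take the ID column, then select matching
# lines from the full list so the full list's order is preserved.
filtered=$(porcelain |
    grep -E '^[0-9a-f]{40}' |
    cut -d' ' -f1 |
    grep -F -f - "$all_revs")

printf '%s\n' "$filtered"
rm -f "$all_revs"
```

Here r2 never shows up in the fake blame output, so it is dropped, and r3 is listed before r1 even though the blame output mentioned r1 first.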
Oddly, if you use the -S option, "git blame -C" actually gets
significantly *slower*. I am not sure why.
-Steve