From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jan Smets <jan.smets@alcatel-lucent.com>, git@vger.kernel.org
Subject: Re: git blame performance
Date: Fri, 06 Nov 2015 15:52:56 +0100
Message-ID: <563CBEC8.7070209@alum.mit.edu>
In-Reply-To: <563CAD30.6040608@alcatel-lucent.com>

On 11/06/2015 02:37 PM, Jan Smets wrote:
> I have recently migrated a fairly large project from CVS to Git.
> One of the issues we're having is the blame/annotate performance.
> [...]
> cvs annotate of the same file (over the network) is ready in 0.8 seconds.
> blame/annotate is a frequently used operation, ranging between 5 to 20
> usages a day per developer.

cvs annotate and git blame both have to follow history back until they
find the commit that introduced the oldest line that is still in the
current version of the file. So for a really old file, a lot of history
has to be walked through.

The reason that cvs annotate is so much faster than git blame is that
CVS stores revisions filewise, with all of the modifications to file
$FILE being stored in a single $FILE,v file. So in the worst case, CVS
only has to read this one file.

Git, on the other hand, stores revisions treewise. It has no way of
knowing, ab initio, which revisions touched a given file. (In fact, this
concept is not even well-defined because the answer depends on things
like whether copy (-C) and move (-M) detection are turned on and what
parameters they were given.) This means that git blame has to traverse
most of history to find the commits that touched $FILE.
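To make the filewise/treewise difference concrete, here is a toy model
in Python. It is only a sketch under loud assumptions: history is
linear, each "commit" is a full snapshot held as a dict, and renames
and merges are ignored. The names commits_touching and filewise_index
are made up for illustration and are not Git or CVS APIs.

```python
from typing import Dict, List, Tuple

# Toy model, not Git's real object store: one dict (path -> content)
# per commit, oldest commit first.
Snapshot = Dict[str, str]


def commits_touching(history: List[Snapshot], path: str) -> Tuple[List[int], int]:
    """Treewise lookup: walk every commit and compare with its parent.

    Returns (indices of commits that changed `path`, commits examined).
    """
    touched: List[int] = []
    examined = 0
    previous: Snapshot = {}
    for index, snapshot in enumerate(history):
        examined += 1
        if snapshot.get(path) != previous.get(path):
            touched.append(index)
        previous = snapshot
    return touched, examined


def filewise_index(history: List[Snapshot]) -> Dict[str, List[int]]:
    """Filewise layout: a per-file revision list, roughly the information
    that a $FILE,v file hands to CVS without any history walk."""
    index: Dict[str, List[int]] = {}
    previous: Snapshot = {}
    for number, snapshot in enumerate(history):
        for path in set(snapshot) | set(previous):
            if snapshot.get(path) != previous.get(path):
                index.setdefault(path, []).append(number)
        previous = snapshot
    return index


history = [
    {"a.c": "v1"},
    {"a.c": "v1", "b.c": "v1"},
    {"a.c": "v1", "b.c": "v2"},
    {"a.c": "v2", "b.c": "v2"},
]
touched, examined = commits_touching(history, "a.c")
print(touched, examined)                 # [0, 3] 4
print(filewise_index(history)["a.c"])    # [0, 3]
```

Both lookups give the same answer for a.c, but the treewise one had to
examine all four commits to find the two that matter, while the
filewise layout already has the list on hand.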

Slow git blame is thus a largely unavoidable consequence of Git's
data model. That's not to say that it can't be sped up somewhat, but it
will never reach CVS speeds.

That said, git blame does have some options that can reduce the work:

-L <start>,<end>, -L :<funcname> -- Annotate only the given line range.
This option can speed things up in two ways: (1) if the range does not
include the file's oldest lines, the history walk can stop sooner, and
(2) fewer parents of merge commits have to be followed.
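If you drive git blame from a script, the two -L forms can be built
like this. A minimal sketch: blame_command is a hypothetical helper
name, and it only assembles the argument list (it does not run Git).

```python
from typing import List, Optional


def blame_command(path: str,
                  start: Optional[int] = None,
                  end: Optional[int] = None,
                  funcname: Optional[str] = None) -> List[str]:
    """Build an argv for `git blame`, optionally restricted with -L.

    Hypothetical helper for illustration; pass the result to
    subprocess.run() to actually execute it.
    """
    command = ["git", "blame"]
    if funcname is not None:
        # -L :<funcname> form: annotate the function matching the name.
        command += ["-L", f":{funcname}"]
    elif start is not None and end is not None:
        # -L <start>,<end> form: annotate an explicit line range.
        command += ["-L", f"{start},{end}"]
    # "--" separates options and revisions from the path.
    command += ["--", path]
    return command


print(blame_command("foo.c", start=40, end=60))
print(blame_command("foo.c", funcname="main"))
```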

--incremental -- If you are building tooling on top of git blame, this
option lets partial results be returned early, which reduces the wait
until the user sees something.
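For tooling, the incremental stream can be consumed entry by entry. The
sketch below follows the format described under INCREMENTAL OUTPUT in
the manpage: each entry starts with "<sha1> <sourceline> <resultline>
<num_lines>" and ends with a "filename" line, and the commit details
appear only the first time a commit shows up. Assumptions: 40-character
SHA-1 object names, no unquoting of unusual filenames, and the sample
lines are invented. parse_incremental is a hypothetical name.

```python
import re
from typing import Dict, Iterable, Iterator

# Entry header: <40-hex sha1> <sourceline> <resultline> <num_lines>
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+) (\d+)$")


def parse_incremental(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per blame entry as soon as the entry is complete."""
    entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if entry is None and match:
            entry = {
                "commit": match.group(1),
                "orig_line": int(match.group(2)),
                "final_line": int(match.group(3)),
                "num_lines": int(match.group(4)),
            }
        elif entry is not None:
            key, _, value = line.partition(" ")
            entry[key] = value
            if key == "filename":
                # The filename line always terminates an entry.
                yield entry
                entry = None


# Invented sample stream; real output carries more tags per commit.
sample = [
    "a" * 40 + " 1 1 2\n",
    "author Example Author\n",
    "summary Initial import\n",
    "filename foo.c\n",
    "b" * 40 + " 5 3 1\n",
    "author Another Author\n",
    "summary Fix typo\n",
    "filename foo.c\n",
    "a" * 40 + " 4 4 1\n",   # commit seen before: no details repeated
    "filename foo.c\n",
]
entries = list(parse_incremental(sample))
for e in entries:
    print(e["commit"][:8], e["final_line"], e["num_lines"])
```

Because the function is a generator, a caller can pass it the stdout of
a running "git blame --incremental" process and update its display
after each entry instead of waiting for the whole blame to finish.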

If you are not interested in changes older than a certain date or
revision, you can limit the amount of history that git blame traverses.
See SPECIFYING RANGES in the manpage.
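In script form, the two ways of limiting history look like this. Again
only a sketch: blame_limited is a hypothetical helper that builds the
argument list, mirroring the "git blame v2.6.18.. -- foo" and
"git blame --since=3.weeks -- foo" examples from the manpage.

```python
from typing import List, Optional


def blame_limited(path: str,
                  oldest_rev: Optional[str] = None,
                  since: Optional[str] = None) -> List[str]:
    """Build an argv for `git blame` that stops at a revision or a date.

    Hypothetical helper for illustration; it does not run Git itself.
    """
    command = ["git", "blame"]
    if since is not None:
        # Ignore history older than the given date, e.g. "3.weeks".
        command.append(f"--since={since}")
    if oldest_rev is not None:
        # "<rev>.." means: do not look past <rev>.
        command.append(f"{oldest_rev}..")
    command += ["--", path]
    return command


print(blame_limited("foo", oldest_rev="v2.6.18"))
print(blame_limited("foo", since="3.weeks"))
```

Note that lines which have not changed since the range boundary are
blamed on the boundary commit itself, so the output for those lines
does not name the commit that really introduced them.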

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
