git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Daniel Berlin <dberlin@dberlin.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git annotate runs out of memory
Date: Tue, 11 Dec 2007 11:42:01 -0800 (PST)
Message-ID: <alpine.LFD.0.9999.0712111122400.25032@woody.linux-foundation.org>
In-Reply-To: <4aca3dc20712111109y5d74a292rf29be6308932393c@mail.gmail.com>



On Tue, 11 Dec 2007, Daniel Berlin wrote:
>
> I understand this, and completely agree with you.
> However, I cannot force GCC people to adopt a completely new workflow in
> this regard.

Oh, I agree. It's why we do have "git blame" these days, and it's why I've 
tried to make people use the nicer incremental mode, which is not at all 
faster, but it's a hell of a lot more pleasant to use because you get some 
output immediately.

In other words,

	git blame gcc/ChangeLog

is virtually useless because it's too expensive, but try doing

	git gui blame gcc/ChangeLog

instead, and doesn't that just seem nicer? (*)

The difference is that the GUI one does it incrementally, and doesn't have 
to get _all_ the results before it can start reporting blame.
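
To make the incremental-versus-batch distinction concrete, here is a minimal Python sketch. It is not git's code: the history is a hypothetical list of (commit, lines-introduced) pairs, newest first, which a real blame would have to discover by diffing. blame_incremental yields each attribution as soon as it is known, so a viewer can start drawing immediately; blame_batch produces the same answer, but only after the whole walk has finished. (On the command line, git blame --incremental emits its results in this found-as-you-go order.)

```python
from typing import Iterator, List, Set, Tuple

# Hypothetical model: each entry is (commit_id, line numbers that commit is
# known to have introduced), ordered newest first. A real blame finds these
# by diffing each version against its parent.
History = List[Tuple[str, Set[int]]]

def blame_incremental(history: History, num_lines: int) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, commit_id) as soon as each line is attributed."""
    unassigned = set(range(num_lines))
    for commit_id, introduced in history:          # newest to oldest
        for line in sorted(introduced & unassigned):
            unassigned.discard(line)
            yield line, commit_id                  # the caller can display this now
        if not unassigned:
            break                                  # every line has an owner: stop early

def blame_batch(history: History, num_lines: int) -> List[str]:
    """Same attributions, but nothing is returned until all are known."""
    result = [""] * num_lines
    for line, commit_id in blame_incremental(history, num_lines):
        result[line] = commit_id
    return result

if __name__ == "__main__":
    history = [("c3", {2}), ("c2", {0, 2}), ("c1", {0, 1, 2})]
    for line, commit_id in blame_incremental(history, 3):
        print(line, commit_id)                     # 2 c3, then 0 c2, then 1 c1
    print(blame_batch(history, 3))                 # ['c2', 'c1', 'c3']
```

Both functions do the same amount of work; the only difference is when the caller gets to see the first result.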

Not that I claim that the gui blame is perfect either (I dunno why it 
delays the nice coloring so long, for example), but it was something I 
pushed - and others made the gui for - exactly to help people with the 
fact that git internally really does it that incremental way.

> SVN had the same problem (the file retrieval was the most expensive op
> on FSFS). One of the things I did to speed it up tremendously was to
> do the annotate from newest to oldest (i.e. in reverse), and stop
> annotating when we had come up with annotate info for all the lines.

We do that. The expense for git is that we don't do the revisions as a 
single file at all. We'll look through each commit, check whether the 
"gcc" directory changed, if it did, we'll go into it, and check whether 
the "ChangeLog" file changed - and if it did, we'll actually diff it 
against the previous version.
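
The per-commit walk described above can be sketched as follows. This is a simplified model under stated assumptions, not git's implementation: the object store is a plain dict, each commit has a single parent, object ids are made-up strings standing in for SHA-1 names, and Python's difflib stands in for git's own diff engine. What it illustrates is that an unchanged directory or file is detected just by comparing object ids, so only commits that actually touched gcc/ChangeLog pay for a diff, and the walk stops as soon as every line has an owner.

```python
import difflib
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Hypothetical object store: a tree is {name: object_id}; a file ("blob") is
# a list of lines. Equal ids mean identical content, as with git object names.
Objects = Dict[str, Union[Dict[str, str], List[str]]]

@dataclass
class Commit:
    id: str
    tree: str                       # id of the root tree
    parent: Optional["Commit"]      # single parent, for simplicity

def lookup(objects: Objects, tree_id: str, path: str) -> Optional[str]:
    """Resolve a path like 'gcc/ChangeLog' one component at a time."""
    obj_id = tree_id
    for name in path.split("/"):
        tree = objects[obj_id]
        if not isinstance(tree, dict) or name not in tree:
            return None
        obj_id = tree[name]
    return obj_id

def path_changed(objects: Objects, new_root: str, old_root: str, path: str) -> bool:
    """Descend both trees in step; equal ids mean the whole subtree is identical."""
    new_id: Optional[str] = new_root
    old_id: Optional[str] = old_root
    for name in path.split("/"):
        if new_id == old_id:
            return False            # e.g. "gcc" unchanged: never look at ChangeLog
        new_id = objects[new_id].get(name) if new_id else None
        old_id = objects[old_id].get(name) if old_id else None
    return new_id != old_id

def blame(objects: Objects, head: Commit, path: str) -> List[Optional[str]]:
    lines = objects[lookup(objects, head.tree, path)]
    owner: List[Optional[str]] = [None] * len(lines)
    # pending maps a line index in the version being examined -> index in HEAD
    pending = {i: i for i in range(len(lines))}
    commit: Optional[Commit] = head
    while commit is not None and pending:    # stop once every line has an owner
        parent = commit.parent
        if parent is not None and not path_changed(objects, commit.tree, parent.tree, path):
            commit = parent                  # file untouched: skip without diffing
            continue
        cur = objects[lookup(objects, commit.tree, path)]
        old_id = lookup(objects, parent.tree, path) if parent is not None else None
        old = objects[old_id] if old_id is not None else []
        next_pending = {}
        matcher = difflib.SequenceMatcher(a=old, b=cur, autojunk=False)
        for a0, b0, size in matcher.get_matching_blocks():
            for k in range(size):
                if b0 + k in pending:        # line also exists in parent: pass blame back
                    next_pending[a0 + k] = pending[b0 + k]
        passed = set(next_pending.values())
        for head_idx in pending.values():
            if head_idx not in passed:       # line first appears in this commit
                owner[head_idx] = commit.id
        pending = next_pending
        commit = parent
    return owner

if __name__ == "__main__":
    objects = {
        "b1": ["a", "b"], "b3": ["a", "x", "b"], "rd": ["readme"],
        "g1": {"ChangeLog": "b1"}, "g3": {"ChangeLog": "b3"},
        "r1": {"gcc": "g1"}, "r2": {"gcc": "g1", "README": "rd"},
        "r3": {"gcc": "g3", "README": "rd"},
    }
    c1 = Commit("c1", "r1", None)
    c2 = Commit("c2", "r2", c1)      # only README changed: skipped, no diff
    c3 = Commit("c3", "r3", c2)      # inserted line "x"
    print(blame(objects, c3, "gcc/ChangeLog"))   # ['c1', 'c3', 'c1']
```

The early stop is the newest-to-oldest cutoff Daniel describes: a commit that rewrites the whole file claims every remaining line, pending becomes empty, and older history is never visited. For a file like gcc/ChangeLog that only ever grows, though, the oldest lines keep the walk going almost to the beginning.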

> In GCC history, it is likely you will be able to cut off at least 30%
> of the time if you do this, because files often have changed entirely
> multiple times.

Not gcc/ChangeLog, though (apart from the renames that happen 
occasionally).

Btw, an example of something git *should* do right, but is just too damn 
expensive, is doing

	git gui blame gcc/ChangeLog-2000

and have it actually be able to track the original source of each of those 
annotations across that "ChangeLog split from hell". 

I bet it would eventually get it right, but that's a large file, way back 
in history, and it will try to do a whitespace-insensitive blame with copy 
detection.

That's *expensive*, although it is an amusing thing to try to do ;)
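
The kind of matching that makes copy detection across the ChangeLog split costly can be illustrated with a deliberately simplified, line-level Python sketch. It is a toy built on assumptions, not how git blame's copy detection works (git works on blocks of lines and applies a minimum-size threshold before it accepts a copy): each line of the new file is looked up, ignoring all whitespace, among the lines of candidate files such as the pre-split gcc/ChangeLog. Even with a hash index, a blame would have to repeat a search like this at every commit that touched the file, across every candidate file, which is roughly where the cost comes from.

```python
from typing import Dict, List, Optional, Tuple

def normalize(line: str) -> str:
    """Drop all whitespace, so lines that differ only in spacing compare equal."""
    return "".join(line.split())

def find_copied_lines(
    target: List[str], candidates: Dict[str, List[str]]
) -> List[Optional[Tuple[str, int]]]:
    """For each target line, return (file, index) of a whitespace-insensitive
    match in some candidate file, or None. Blank lines are never matched."""
    index: Dict[str, Tuple[str, int]] = {}
    for name, lines in candidates.items():
        for i, line in enumerate(lines):
            key = normalize(line)
            if key:
                index.setdefault(key, (name, i))   # keep the first occurrence
    result: List[Optional[Tuple[str, int]]] = []
    for line in target:
        key = normalize(line)
        result.append(index.get(key) if key else None)
    return result

if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    old = {"gcc/ChangeLog": ["2000-01-01  Foo", "\tfix  bar", "older entry"]}
    new = ["2000-01-01 Foo", "  fix bar", "unrelated", ""]
    print(find_copied_lines(new, old))
    # [('gcc/ChangeLog', 0), ('gcc/ChangeLog', 1), None, None]
```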

			Linus

PS. I also do agree that we seem to use an excessive amount of memory 
there. As to whether it's the same issue or not, I'd not go as far as Nico 
and say "yes" yet. But it's interesting.

It's not entirely surprising that we see multiple issues with the gcc 
repo, simply because it's not the kind of repo that people have ever 
really worked on. So I don't think it's necessarily related at all, except 
in the sense of it being a different load and showing issues.
