From: Linus Torvalds <torvalds@linux-foundation.org>
To: Yakov Lerner <iler.ml@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: 'git log FILE' slow
Date: Wed, 11 Jul 2007 14:03:07 -0700 (PDT)
Message-ID: <alpine.LFD.0.999.0707111354150.20061@woody.linux-foundation.org>
In-Reply-To: <f36b08ee0707111333q38004cb5x152f25e2055e2796@mail.gmail.com>



On Wed, 11 Jul 2007, Yakov Lerner wrote:
> 
> 'git-log FILE' takes 10-13 sec.  What can I do to identify
> the reason ? 'git log >/dev/null' takes 0.1 sec (cached).

"git log FILE" is simply *fundamentally* much more expensive than "git 
log".

There's nothing to "identify". Both go through the whole log of the 
project, but "git log file" has to look at every tree, and see where the 
file actually changed.

However, "fundamentally more expensive" doesn't actually mean that it 
should be that slow. I suspect that your archive is not packed, so you 
have probably thousands of individual objects in the filesystem, and are 
slowing down your git usage totally needlessly.

So do

	git gc

on the archive, and you'll probably be happy.

That said, 10-13 seconds *can* be valid for a really big archive, ie 
that's the kind of time you might eventually expect for something like 
the full KDE archive (if they don't split the subprojects up).

I doubt that's it.

> On the cloned copy, the times are approximately same.

This is a big clue. Cloning will generate a new pack.

> The 'git-count-objects -v' shows:
> 
> count: 9830
> size: 241412
> in-pack: 12080
> packs: 18
> prune-packable: 188
> garbage: 0

Tons of packs, and lots of unpacked objects.

Just get used to doing "git gc" once a week (or maybe once a month - I 
guess you've not done it at all?)
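
As a rough illustration of that habit, the check could be scripted. This 
is only a sketch: the helper name needs_gc and both thresholds are 
assumptions made up for the example, not git defaults. It reads the 
"count:" and "packs:" lines from "git count-objects -v" on stdin and 
succeeds when either looks high.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): reads `git count-objects -v`
# output on stdin and exits 0 when the repository looks worth repacking.
# Both thresholds are illustrative assumptions, not git defaults.
needs_gc() {
    max_loose=${1:-1000}   # assumed limit on loose (unpacked) objects
    max_packs=${2:-10}     # assumed limit on number of pack files
    awk -v max_loose="$max_loose" -v max_packs="$max_packs" '
        $1 == "count:" { loose = $2 }
        $1 == "packs:" { packs = $2 }
        # exit status 0 means "yes, repack"
        END { exit !(loose > max_loose || packs > max_packs) }
    '
}

# Usage inside a repository:
#   git count-objects -v | needs_gc && git gc
```

With the numbers quoted above (9830 loose objects, 18 packs) this would 
say "yes"; for a freshly packed repository it would say "no".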

> The strace shows only thousands of sbrk during the 10-13 sec time
> (after some initial I/O). Ltrace, I was not able to complete, takes too much.

Hmm. I'd have expected to see some "stat()/open()" calls if it was really 
just about packing, so I'm a bit surprised, but I really do think you 
should just garbage collect your packs. Having 12k objects in 18 packs is 
ridiculous - each pack must be pitifully small.

Here's my kernel archive:

	[torvalds@woody linux]$ git count-objects -v
	count: 364
	size: 2328
	in-pack: 506495
	packs: 12
	prune-packable: 5
	garbage: 0

ie I have forty times the objects, in fewer packs than you do (and most of 
it is in one big one). After a "git gc", it looks like

	[torvalds@woody linux]$ git count-objects -v
	count: 0
	size: 0
	in-pack: 506090
	packs: 1
	prune-packable: 0
	garbage: 0

and everything is happier (not that it was unhappy before either, but 
mine was much better packed than yours was).
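
To put a number on "pitifully small", the average objects per pack can be 
computed from the same output. A minimal sketch, with avg_per_pack being a 
hypothetical helper name used only for this example; it divides the 
"in-pack:" value by the "packs:" value and rounds down.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): reads `git count-objects -v`
# output on stdin and prints the average number of objects per pack,
# rounded down. Prints 0 when there are no packs.
avg_per_pack() {
    awk '
        $1 == "in-pack:" { objs = $2 }
        $1 == "packs:"   { packs = $2 }
        END { if (packs > 0) print int(objs / packs); else print 0 }
    '
}

# Usage inside a repository:
#   git count-objects -v | avg_per_pack
```

For the figures in this message that works out to roughly 671 objects per 
pack (12080 in 18 packs) against roughly 42207 (506495 in 12 packs) before 
the "git gc", and all 506090 in a single pack afterwards.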

		Linus
