git.vger.kernel.org archive mirror
* Some git performance measurements..
From: Linus Torvalds @ 2007-11-29  2:49 UTC (permalink / raw)
  To: Git Mailing List


So, today, I finally started looking a bit at one of the only remaining 
performance issues that I'm aware of: git behaviour with cold caches, and 
particularly with a slow laptop harddisk, isn't as nice as I wish it 
were.

Sadly, one big reason for performance downsides is actually hard to 
measure: since we mmap() all the really critical data structures (the 
pack-file index and data), it doesn't show up really well in any of the 
otherwise really useful performance tools (eg strace and ltrace), because 
the bulk of the time is actually spent not in libraries or in system 
calls, but simply on regular instructions that take a page fault.
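
To illustrate why mmap()ed reads are invisible to syscall tracers, here is a minimal Python sketch (not git code). It maps a file, reads one byte per page with ordinary memory accesses, and reports the process's page-fault counters before and after. The helper name touch_mapped_pages is hypothetical, and the sketch assumes a Unix system where the resource module is available.

```python
import mmap
import os
import resource


def touch_mapped_pages(path):
    """Read one byte from every page of a memory-mapped, non-empty file.

    Returns (pages_touched, minor_faults, major_faults), where the fault
    counts are deltas of this process's rusage counters.
    """
    page = mmap.PAGESIZE
    before = resource.getrusage(resource.RUSAGE_SELF)
    touched = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            for offset in range(0, size, page):
                m[offset]  # plain memory read: no read() call for strace to show
                touched += 1
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (touched,
            after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)
```

With a cold cache the major-fault count is where the disk waits would show up; a syscall trace of the loop itself shows nothing, since the loop makes no system calls.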

Not a lot that I see we can do about that, unless we can make the 
pack-files even denser.

But one very interesting thing I did notice: some loads open the 
".gitignore" files *way* too much. Even in cases where we really don't 
care. And when the caches are cold, that's actually very expensive, even 
if - and perhaps _especially_when_ - the file doesn't exist at all (ie 
some filesystems that don't use hashes will look through the whole 
directory before they see that the file isn't there).

An example of totally unnecessary .gitignore accesses is what a plain "git 
checkout" with no arguments ends up doing:

	git read-tree -m -u --exclude-per-directory=.gitignore HEAD HEAD

which is *really* quite expensive, and a lot of the cost is trying to open 
.gitignore files in each subdirectory that are never even used.
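
As a rough way to see how many such probes a tree implies, the following Python sketch tries to open a .gitignore in every directory and counts hits and misses. It is an illustration only: probe_gitignores is a hypothetical helper, not what read-tree does internally, and os.walk reads each directory as well, so it is not a faithful I/O model.

```python
import os


def probe_gitignores(root, name=".gitignore"):
    """Try to open `name` in every directory under root.

    Returns (found, missing): how many probes hit an existing file and
    how many failed. Every probe, hit or miss, is a separate path lookup.
    """
    found = missing = 0
    for dirpath, dirnames, _filenames in os.walk(root):
        # Do not descend into the repository metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        try:
            with open(os.path.join(dirpath, name), "rb"):
                found += 1
        except FileNotFoundError:
            missing += 1
    return found, missing
```

Running it on a checkout gives the number of lookups that a per-directory exclude scan would have to pay for, most of which are likely to be misses.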

Just to give a feel for *how* expensive that stupid .gitignore thing is, 
here's a pretty telling comparison of using --exclude-per-directory and 
not using it:

With totally pointless --exclude-per-directory (aka "git checkout"):

	[torvalds@woody linux]$ time git read-tree -m -u --exclude-per-directory=.gitignore HEAD HEAD
	real    0m13.475s
	user    0m0.108s
	sys     0m0.228s

Without:

	[torvalds@woody linux]$ time git read-tree -m -u HEAD HEAD
	real    0m5.923s
	user    0m0.100s
	sys     0m0.044s

Now, I'm not all that happy about that latter six-second time either, but 
both of the above numbers were done with completely cold caches (ie after 
having done a "echo 3 > /proc/sys/vm/drop_caches" as root).

With hot caches, both of the numbers are under a tenth of a second (in 
fact, they are very close: 0.092s and 0.096s respectively), but the 
cold-cache case really shows just how horrible it is to (try to) open many 
files.
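
The cold-cache timings above can be read as "how much of the wall-clock time was not CPU time". A small sketch of that arithmetic, assuming a single-threaded process so that real minus user minus sys approximates time spent blocked (the function name is hypothetical):

```python
def waiting_fraction(real, user, sys):
    """Fraction of wall-clock time not accounted for by CPU time."""
    return (real - user - sys) / real


# Cold-cache numbers quoted above, in seconds.
with_exclude = waiting_fraction(13.475, 0.108, 0.228)
without_exclude = waiting_fraction(5.923, 0.100, 0.044)
print(f"with: {with_exclude:.1%}  without: {without_exclude:.1%}")
```

In both runs roughly 97 to 98 percent of the elapsed time is spent outside user and system CPU time, which is consistent with the process mostly waiting on the disk.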

Doing an open (or an lstat) on individual files will be a totally 
synchronous operation, with no room for read-ahead etc, so even if your 
disk in *theory* gets 80MB/s off the platter, when you do an open() or 
lstat(), you're basically doing three or four small data-dependent IO 
operations, and as a result even a fast disk will take almost a hundredth 
of a second per open/lstat operation.

Less than a hundredth of a second may not sound much, but when we have 
1700+ directories in the kernel trees, doing that for each possible 
.gitignore file is really really expensive!
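
A back-of-the-envelope check of that claim against the measurements above, as a sketch. It assumes that the whole difference between the two cold-cache runs is due to .gitignore probes, and that there is one probe per directory with exactly 1700 directories (the text only says "1700+", so the per-directory figure is an upper estimate):

```python
# Cold-cache wall-clock times quoted above, in seconds.
real_with_exclude = 13.475
real_without_exclude = 5.923

directories = 1700  # assumption: the "1700+" figure taken at face value

extra_seconds = real_with_exclude - real_without_exclude
per_dir_ms = extra_seconds / directories * 1000

print(f"extra: {extra_seconds:.3f}s, about {per_dir_ms:.2f} ms per directory")
```

That works out to roughly 7.5 extra seconds, or a bit over 4 ms per directory, which fits the "less than a hundredth of a second" estimate.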

(Doing an "lstat()" of each file is much cheaper in comparison, because at 
least you'll get several directory entries and probably a few related 
inodes with each IO. But opening just _one_ file per directory like the 
.gitignore code does, really kills your IO throughput)

Now, timings like these are why I'm looking forward to SSDs. They may 
have the same throughput as a disk, but they can do thousands of dependent 
IOPS, and help latency-bound cases like this by an order of magnitude. But 
when we're doing those .gitignore file reads totally unnecessarily, that 
just hurts..
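
One way to avoid unnecessary reads is to defer opening a directory's .gitignore until something actually asks about a path in that directory. The Python sketch below illustrates the idea only; it is not git's implementation. LazyExcludes is a hypothetical class, it consults only the .gitignore in the path's own directory, and it uses fnmatch as a stand-in for git's real ignore-pattern rules.

```python
import fnmatch
import os


class LazyExcludes:
    """Hypothetical helper: read a directory's .gitignore only on demand."""

    def __init__(self, root, name=".gitignore"):
        self.root = root
        self.name = name
        self.opens = 0     # number of open() attempts made so far
        self._cache = {}   # directory (relative to root) -> list of patterns

    def _patterns(self, reldir):
        if reldir not in self._cache:
            self.opens += 1
            path = os.path.join(self.root, reldir, self.name)
            try:
                with open(path, encoding="utf-8") as f:
                    lines = [line.strip() for line in f]
            except FileNotFoundError:
                lines = []
            # Drop blank lines and comments; cache misses too, so a
            # missing file is only probed once.
            self._cache[reldir] = [
                p for p in lines if p and not p.startswith("#")
            ]
        return self._cache[reldir]

    def is_excluded(self, relpath):
        """Check relpath against the .gitignore in its own directory only."""
        reldir, base = os.path.split(relpath)
        return any(fnmatch.fnmatch(base, pat)
                   for pat in self._patterns(reldir))
```

With this shape, an operation that never asks whether a path is excluded performs no .gitignore opens at all, and repeated queries in one directory cost a single open.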

			Linus


Thread overview: 28+ messages
2007-11-29  2:49 Some git performance measurements Linus Torvalds
2007-11-29  3:14 ` Linus Torvalds
2007-11-29  3:59   ` Nicolas Pitre
2007-11-29  4:32     ` Linus Torvalds
2007-11-29 17:25       ` Nicolas Pitre
2007-11-29 17:48         ` Linus Torvalds
2007-11-29 18:52           ` Nicolas Pitre
2007-11-30  5:00           ` Junio C Hamano
2007-11-30  6:03             ` Linus Torvalds
2007-11-30  0:54         ` Jakub Narebski
2007-11-30  2:21           ` Linus Torvalds
2007-11-30  2:39             ` Jakub Narebski
2007-11-30  2:40             ` Nicolas Pitre
2007-11-30  6:11               ` Steffen Prohaska
2007-12-07 13:35                 ` Mike Ralphson
2007-12-07 13:49                   ` Johannes Schindelin
2007-12-07 16:07                     ` Linus Torvalds
2007-12-07 16:09                     ` Mike Ralphson
2007-12-07 18:37                       ` Johannes Schindelin
2007-12-07 19:15                         ` Mike Ralphson
2007-12-08 11:05                           ` Johannes Schindelin
2007-12-08 23:04                             ` Brian Downing
2007-11-30  2:54             ` Linus Torvalds
2007-12-05  1:04               ` Federico Mena Quintero
2007-12-01 11:36   ` Joachim B Haga
2007-12-01 17:19     ` Linus Torvalds
2007-11-29  5:17 ` Junio C Hamano
2007-11-29 10:17   ` [PATCH] per-directory-exclude: lazily read .gitignore files Junio C Hamano
