* update-index --assume-unchanged doesn't make things go fast
From: Avery Pennarun @ 2008-06-25 16:44 UTC
  To: Git Mailing List

Hi all,

Using git 1.5.6.64.g85fe, but this applies to various other versions I've tried.

I have a git repo with 17000+ files in 1000+ directories.  In
Linux, "git status" runs in under a second, which is perfectly fine.
But on Windows, which can apparently only stat() about 1000 files per
second, "git status" takes at least 17 seconds to run, even with a hot
cache.  (I've confirmed that stat() is so slow on Windows by writing a
simple program that just runs stat() in a tight loop.  The slowness
may be cygwin-related, as I found some direct Win32 calls that seem to
go more than twice as fast... which is still too slow.)

"git status" is not so important, since I can choose not to run it.
But it turns out that every git checkout and git commit does all the
same stuff, which is really not so great.  Even worse if you consider
that "git status" is almost always what I do by hand anyway to check
things before I commit.

So anyway, I read about the git-update-index --assume-unchanged
option, and thought that might be just what I want.  So I did this
(back in Linux, where things are easier to debug):

$ strace -fe lstat64 git status 2>&1 | wc -l
17869

$ git ls-files | xargs -d '\n' git update-index --assume-unchanged

$ strace -fe lstat64 git status 2>&1 | wc -l
33
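For anyone repeating the marking step above, here is a minimal sketch of the same operation using NUL-delimited paths, plus a way to list and clear the flag afterwards. The function names are made up for illustration; the sketch assumes a Git whose "update-index" accepts "-z --stdin" and whose "ls-files -v" prints a lowercase status letter for assume-unchanged entries.

```shell
#!/bin/sh
# Hypothetical helpers; run them from inside a work tree.

# Flag every tracked file as assume-unchanged. Paths are passed
# NUL-delimited so filenames with spaces or newlines survive the pipe.
mark_all_assume_unchanged() {
    git ls-files -z | git update-index --assume-unchanged -z --stdin
}

# List the files that currently carry the flag. "git ls-files -v"
# prefixes such entries with a lowercase status letter.
list_assume_unchanged() {
    git ls-files -v | grep '^[[:lower:]] ' | cut -c3-
}

# Clear the flag again on every tracked file.
unmark_all_assume_unchanged() {
    git ls-files -z | git update-index --no-assume-unchanged -z --stdin
}
```

Running list_assume_unchanged after the marking step is a quick check that the flag actually landed on every file before measuring anything.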

So far, so good, and "git status" is now noticeably faster on my Linux
system (maybe twice as fast).  It's also noticeably faster on my
Windows system, but not as fast as I would have hoped.  I've tracked
it down to this:

$ strace -fe getdents64 git status 2>&1 | wc -l
2729

"git status" still checks all the *directories* to see if there are
any new files.  Of course!  --assume-unchanged can't be applied to a
directory, so there's no way to tell it not to do so.
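The directory scan can be reproduced in a scratch repository with the sketch below. It is only an illustration: the function name and file names are invented, and the "--untracked-files" and "--porcelain" options of "git status" are an assumption about the Git being used (they have not been checked against the 1.5.6 build mentioned above).

```shell
#!/bin/sh
# Hypothetical demo: build a throwaway repo, flag its only tracked file
# as assume-unchanged, then drop a brand-new file next to it.
demo_untracked_scan() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        printf 'tracked\n' > tracked.txt
        git add tracked.txt
        git update-index --assume-unchanged tracked.txt
        printf 'new\n' > stray.txt
        # With the normal untracked scan, the directory is still walked
        # and the new file is reported.
        echo "untracked-files=normal:"
        git status --porcelain --untracked-files=normal
        # With the untracked scan switched off, the new file is not listed.
        echo "untracked-files=no:"
        git status --porcelain --untracked-files=no
    )
    rm -rf "$repo"
}
```

Switching the scan off applies to the whole work tree, so it is a blunter tool than a per-directory flag would be.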

Also, "git diff" is still as slow as ever:

$ strace -fe lstat64 git diff 2>&1 | wc -l
23199

It seems to be stat()ing the files even though they are
--assume-unchanged, which is probably a simple bug.

And while we're here, "git checkout" seems to be working a lot harder
than it should be:

$ strace -fe lstat64 git checkout -b boo 2>&1 | wc -l
23227

Note that I'm just creating a new branch name here, not even checking
out any new files, so I can't think of any situation where the
checkout would fail.  Is there one?

Even if I check out a totally different branch, presumably it should
only need to stat() the files that changed between the old and new
versions, right?  And that would normally be very fast.

I don't mind doing some of the work to improve things here, as long as
people can give me some advice.  Specifically:

1) What's a sensible way to tell git to *not* opendir() specific
directories to look for unexpected files in "git status"?  (I don't
think I know enough to implement this myself.)

2) Do you think git-diff should honour --assume-unchanged?  If not, why not?

3) Do you think git-checkout can be optimized here?  I can see why it
might want to disregard --assume-unchanged (for safety reasons), but
presumably it only needs to look at the files that it's planning to
change, right?

4) My idea is to eventually --assume-unchanged my whole repository,
then write a cheesy daemon that uses the Win32 dnotify-equivalent to
watch for files that get updated and then selectively
--no-assume-unchanged files that it gets notified about.  That would
avoid the need to ever synchronously scan the whole repo for changes,
thus making my git-Win32 experience much faster and more enjoyable.
(This daemon ought to be possible to run on Linux as well, for similar
improvements on gigantic repositories.  Also note that TortoiseSVN for
Windows does something similar to track file status updates, so this
isn't *just* me being crazy.)
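
A rough sketch of the daemon in idea 4, for the Linux case only. Everything here is an assumption made for illustration: the function names are invented, the watcher relies on inotifywait from the inotify-tools package being installed, and it is expected to be started from the top of the work tree. A Win32 version would need a different notification source.

```shell
#!/bin/sh
# Hypothetical helper: read changed paths (one per line) on stdin and
# clear the assume-unchanged flag on each path that Git tracks.
unmark_changed_paths() {
    while IFS= read -r path; do
        # Skip paths Git does not track (editor temp files and so on).
        if git ls-files --error-unmatch -- "$path" > /dev/null 2>&1; then
            git update-index --no-assume-unchanged -- "$path"
        fi
    done
}

# Hypothetical watcher loop: print one path per filesystem event and
# feed it to the helper above. Events under .git/ are ignored.
watch_worktree() {
    inotifywait -m -r -q -e modify,create,delete,move \
        --exclude '/\.git/' --format '%w%f' . |
        unmark_changed_paths
}
```

The helper is kept separate from the watcher so it can be tried by hand, for example with "printf '%s\n' some/file.c | unmark_changed_paths". The sketch does not handle new directories full of untracked files, which is the part question 1 is about.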

Thoughts?

Thanks,

Avery

