git.vger.kernel.org archive mirror
* update-index --assume-unchanged doesn't make things go fast
@ 2008-06-25 16:44 Avery Pennarun
  2008-06-25 17:38 ` Michael J Gruber
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Avery Pennarun @ 2008-06-25 16:44 UTC (permalink / raw)
  To: Git Mailing List

Hi all,

Using git 1.5.6.64.g85fe, but this applies to various other versions I've tried.

I have a git repo with about 17000+ files in 1000+ directories.  In
Linux, "git status" runs in under a second, which is perfectly fine.
But on Windows, which can apparently only stat() about 1000 files per
second, "git status" takes at least 17 seconds to run, even with a hot
cache.  (I've confirmed that stat() is so slow on Windows by writing a
simple program that just runs stat() in a tight loop.  The slowness
may be cygwin-related, as I found some direct Win32 calls that seem to
go more than twice as fast... which is still too slow.)

"git status" is not so important, since I can choose not to run it.
But it turns out that every git checkout and git commit does all the
same stuff, which is really not so great.  Even worse if you consider
that "git status" is almost always what I do by hand anyway to check
things before I commit.

So anyway, I read about the git-update-index --assume-unchanged
option, and thought that might be just what I want.  So I did this
(back in Linux, where things are easier to debug):

$ strace -fe lstat64 git status 2>&1 | wc -l
17869

$ git ls-files | xargs -d '\n' git update-index --assume-unchanged

$ strace -fe lstat64 git status 2>&1 | wc -l
33

So far, so good, and "git status" is now noticeably faster on my Linux
system (maybe twice as fast).  It's also noticeably faster on my
Windows system, but not as fast as I would have hoped.  I've tracked
it down to this:

$ strace -fe getdents64 git status 2>&1 | wc -l
2729

"git status" still checks all the *directories* to see if there are
any new files.  Of course!  --assume-unchanged can't be applied to a
directory, so there's no way to tell it not to do so.

Also, "git diff" is still as slow as ever:

$ strace -fe lstat64 git diff 2>&1 | wc -l
23199

It seems to be stat()ing the files even though they are
--assume-unchanged, which is probably a simple bug.
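
A rough sketch for checking which entries actually carry the flag: "git ls-files -v" prints a one-letter tag before each path and uses a lowercase letter for entries marked assume-unchanged. The helper name assume_unchanged_paths is made up for illustration, and paths that git prints in quoted form (unusual characters) are not handled.

```shell
# Hypothetical helper: read "git ls-files -v" output on stdin and print only
# the paths whose tag is lowercase, i.e. the assume-unchanged entries.
assume_unchanged_paths() {
    grep '^[[:lower:]] ' | cut -c3-
}

# Usage inside a repository:
#   git ls-files -v | assume_unchanged_paths
# Clearing the flag everywhere again (GNU xargs, as in the commands above):
#   git ls-files | xargs -d '\n' git update-index --no-assume-unchanged
```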

And while we're here, "git checkout" seems to be working a lot harder
than it should be:

$ strace -fe lstat64 git checkout -b boo 2>&1 | wc -l
23227

Note that I'm just creating a new branch name here, not even checking
out any new files, so I can't think of any situation where the
checkout would fail.  Is there one?

Even if I checkout a totally different branch, presumably it should
only need to stat() the files that changed between the old and new
versions, right?  And that would normally be very fast.
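
To get a feel for how small that set usually is, the paths that differ between two commits can be listed directly. This is only a sketch: the helper name paths_changed_between is hypothetical, and it says nothing about what git-checkout does internally.

```shell
# Hypothetical helper: list the paths that differ between two commits.
# With two commits given, "git diff" compares the two trees.
paths_changed_between() {
    git diff --name-only "$1" "$2"
}

# Usage: paths that switching from the current commit to "other" would touch
#   paths_changed_between HEAD other
```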

I don't mind doing some of the work to improve things here, as long as
people can give me some advice.  Specifically:

1) What's a sensible way to tell git to *not* opendir() specific
directories to look for unexpected files in "git status"?  (I don't
think I know enough to implement this myself.)

2) Do you think git-diff should honour --assume-unchanged?  If not, why not?

3) Do you think git-checkout can be optimized here?  I can see why it
might want to disregard --assume-unchanged (for safety reasons), but
presumably it only needs to look at the files that it's planning to
change, right?

4) My idea is to eventually --assume-unchanged my whole repository,
then write a cheesy daemon that uses the Win32 dnotify-equivalent to
watch for files that get updated and then selectively
--no-assume-unchanged files that it gets notified about.  That would
avoid the need to ever synchronously scan the whole repo for changes,
thus making my git-Win32 experience much faster and more enjoyable.
(This daemon ought to be possible to run on Linux as well, for similar
improvements on gigantic repositories.  Also note that TortoiseSVN for
Windows does something similar to track file status updates, so this
isn't *just* me being crazy.)
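
A minimal sketch of the daemon idea in item 4, for Linux only. It assumes inotifywait from the inotify-tools package as the event source; the function name unflag_changed_paths is made up for illustration, and a Win32 version would need a different watcher feeding the same loop.

```shell
# Hypothetical helper: read changed paths on stdin (one per line) and clear
# the assume-unchanged flag for those that are tracked in the index.
unflag_changed_paths() {
    while IFS= read -r path; do
        # "ls-files --error-unmatch" fails for paths that are not in the index.
        if git ls-files --error-unmatch -- "$path" >/dev/null 2>&1; then
            git update-index --no-assume-unchanged -- "$path"
        fi
    done
}

# Usage from the top of the worktree (assumes inotify-tools is installed):
#   inotifywait -m -r -q --format '%w%f' \
#       -e modify -e create -e delete -e move \
#       --exclude '/\.git/' . | unflag_changed_paths
```

Untracked files are silently skipped here; reporting them would need separate bookkeeping.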

Thoughts?

Thanks,

Avery

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 16:44 update-index --assume-unchanged doesn't make things go fast Avery Pennarun
@ 2008-06-25 17:38 ` Michael J Gruber
  2008-06-25 18:02   ` Avery Pennarun
  2008-06-25 19:30 ` Jakub Narebski
  2008-06-26 11:22 ` Stephen R. van den Berg
  2 siblings, 1 reply; 16+ messages in thread
From: Michael J Gruber @ 2008-06-25 17:38 UTC (permalink / raw)
  To: git

Avery Pennarun venit, vidit, dixit 25.06.2008 18:44:
...
> 4) My idea is to eventually --assume-unchanged my whole repository,
> then write a cheesy daemon that uses the Win32 dnotify-equivalent to
> watch for files that get updated and then selectively
> --no-assume-unchanged files that it gets notified about.  That would
> avoid the need to ever synchronously scan the whole repo for changes,
> thus making my git-Win32 experience much faster and more enjoyable.
> (This daemon ought to be possible to run on Linux as well, for similar
> improvements on gigantic repositories.  Also note that TortoiseSVN for
> Windows does something similar to track file status updates, so this
> isn't *just* me being crazy.)

Looks like users on slow NFS would profit, too. Hate to say it, but hg 
feels faster on (slow) NFS than git. Yet I use git, for other reasons ;)

Michael

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 17:38 ` Michael J Gruber
@ 2008-06-25 18:02   ` Avery Pennarun
  2008-06-26  8:47     ` Michael J Gruber
  0 siblings, 1 reply; 16+ messages in thread
From: Avery Pennarun @ 2008-06-25 18:02 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git

On 6/25/08, Michael J Gruber <michaeljgruber+gmane@fastmail.fm> wrote:
> > 4) My idea is to eventually --assume-unchanged my whole repository,
> > then write a cheesy daemon that uses the Win32 dnotify-equivalent to
> > watch for files that get updated and then selectively
> > --no-assume-unchanged files that it gets notified about.  That would
> > avoid the need to ever synchronously scan the whole repo for changes,
> > thus making my git-Win32 experience much faster and more enjoyable.
> > (This daemon ought to be possible to run on Linux as well, for similar
> > improvements on gigantic repositories.  Also note that TortoiseSVN for
> > Windows does something similar to track file status updates, so this
> > isn't *just* me being crazy.)
>
>  Looks like users on slow NFS would profit, too. Hate to say it, but hg
> feels faster on (slow) NFS than git. Yet I use git, for other reasons ;)

Hmm, can you do dnotify over NFS?

I'd like to know how hg goes any faster.  As far as I can see, git is
going as fast as can be without some kind of daemon or other magic.
(Except for my point #3, which seems relatively minor.)

Thanks,

Avery

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 16:44 update-index --assume-unchanged doesn't make things go fast Avery Pennarun
  2008-06-25 17:38 ` Michael J Gruber
@ 2008-06-25 19:30 ` Jakub Narebski
  2008-06-25 19:41   ` Junio C Hamano
  2008-06-25 19:53   ` Avery Pennarun
  2008-06-26 11:22 ` Stephen R. van den Berg
  2 siblings, 2 replies; 16+ messages in thread
From: Jakub Narebski @ 2008-06-25 19:30 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Git Mailing List

"Avery Pennarun" <apenwarr@gmail.com> writes:

> Hi all,
> 
> Using git 1.5.6.64.g85fe, but this applies to various other versions
> I've tried.
> 
> I have a git repo with about 17000+ files in 1000+ directories.  In
> Linux, "git status" runs in under a second, which is perfectly fine.
> But on Windows, which can apparently only stat() about 1000 files per
> second, "git status" takes at least 17 seconds to run, even with a hot
> cache.  (I've confirmed that stat() is so slow on Windows by writing a
> simple program that just runs stat() in a tight loop.  The slowness
> may be cygwin-related, as I found some direct Win32 calls that seem to
> go more than twice as fast... which is still too slow.)

Which git version do you use? Does it have the following configuration
variable (also available as command option):

  status.showUntrackedFiles::
        By default, linkgit:git-status[1] and linkgit:git-commit[1] show
        files which are not currently tracked by Git. Directories which
        contain only untracked files, are shown with the directory name
        only. Showing untracked files means that Git needs to lstat() all
        the files in the whole repository, which might be slow on some
        systems. So, this variable controls how the commands display
        the untracked files. Possible values are:

        - 'no'     - Show no untracked files
        - 'normal' - Shows untracked files and directories
        - 'all'    - Shows also individual files in untracked directories.
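
A short sketch of the two ways to apply this setting, assuming a git new enough to have it. The wrapper name status_tracked_only is hypothetical; the option and the configuration variable are the ones described above.

```shell
# Hypothetical wrapper: run "git status" without scanning for untracked files.
status_tracked_only() {
    git status --untracked-files=no "$@"
}

# To make this the default for one repository instead:
#   git config status.showUntrackedFiles no
```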

HTH.
-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 19:30 ` Jakub Narebski
@ 2008-06-25 19:41   ` Junio C Hamano
  2008-06-25 19:53   ` Avery Pennarun
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2008-06-25 19:41 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Avery Pennarun, Git Mailing List

Jakub Narebski <jnareb@gmail.com> writes:

> "Avery Pennarun" <apenwarr@gmail.com> writes:
>
>> Hi all,
>> 
>> Using git 1.5.6.64.g85fe, but this applies to various other versions
>> I've tried.
>> 
>> I have a git repo with about 17000+ files in 1000+ directories.  In
>> Linux, "git status" runs in under a second, which is perfectly fine.
>> But on Windows, which can apparently only stat() about 1000 files per
>> second, "git status" takes at least 17 seconds to run, even with a hot
>> cache.  (I've confirmed that stat() is so slow on Windows by writing a
>> simple program that just runs stat() in a tight loop.  The slowness
>> may be cygwin-related, as I found some direct Win32 calls that seem to
>> go more than twice as fast... which is still too slow.)
>
> Which git version do you use? Does it have the following configuration
> variable (also available as command option):
>
>   status.showUntrackedFiles::
>         By default, linkgit:git-status[1] and linkgit:git-commit[1] show
>         files which are not currently tracked by Git. Directories which
>         contain only untracked files, are shown with the directory name
>         only. Showing untracked files means that Git needs to lstat() all
>         the files in the whole repository, which might be slow on some
>         systems. So, this variable controls how the commands display
>         the untracked files. Possible values are:
>
>         - 'no'     - Show no untracked files
>         - 'normal' - Shows untracked files and directories
>         - 'all'    - Shows also individual files in untracked directories.

That's on 'master' progressing forward to eventually become 1.6.0.

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 19:30 ` Jakub Narebski
  2008-06-25 19:41   ` Junio C Hamano
@ 2008-06-25 19:53   ` Avery Pennarun
  2008-06-25 21:35     ` Jakub Narebski
  1 sibling, 1 reply; 16+ messages in thread
From: Avery Pennarun @ 2008-06-25 19:53 UTC (permalink / raw)
  To: Jakub Narebski, Junio C Hamano; +Cc: Git Mailing List

On 6/25/08, Jakub Narebski <jnareb@gmail.com> wrote:
> Which git version do you use? Does it have the following configuration
>  variable (also available as command option):
>
>   status.showUntrackedFiles::
> [...]

Thanks, I didn't know about that one.  Using that definitely makes
"git status" go much faster (pretty much instantaneous if I've also
used --assume-unchanged on everything).

Now the catch is, if I want to implement the daemon I was talking
about earlier, I'd like to be able to notice untracked files (or
directories with untracked files) individually.  Ideally, I guess the
best way would be to just keep a separate list of all existing files
that aren't in the index, and have git status look at that rather than
at the actual filesystem.

Are there any suggestions for how best to do this?

Thanks,

Avery

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 19:53   ` Avery Pennarun
@ 2008-06-25 21:35     ` Jakub Narebski
  2008-06-26  1:30       ` Avery Pennarun
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2008-06-25 21:35 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Junio C Hamano, Git Mailing List

On Wed, 25 Jun 2008, Avery Pennarun wrote:
> On 6/25/08, Jakub Narebski <jnareb@gmail.com> wrote:
> >
> > Which git version do you use? Does it have the following configuration
> > variable (also available as command option):
> >
> >   status.showUntrackedFiles::
> > [...]
> 
> Thanks, I didn't know about that one.  Using that definitely makes
> "git status" go much faster (pretty much instantaneous if I've also
> used --assume-unchanged on everything).
> 
> Now the catch is, if I want to implement the daemon I was talking
> about earlier, I'd like to be able to notice untracked files (or
> directories with untracked files) individually.  Ideally, I guess the
> best way would be to just keep a separate list of all existing files
> that aren't in the index, and have git status look at that rather than
> at the actual filesystem.
> 
> Are there any suggestions for how best to do this?

You can try to take a look at how the (third-party, Linux-only) inotify
extension for Mercurial works.  AFAIK it uses some kind of daemon
which watches for inotify notices and updates Mercurial's equivalent
of the index.

-- 
Jakub Narebski
Poland

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 21:35     ` Jakub Narebski
@ 2008-06-26  1:30       ` Avery Pennarun
  0 siblings, 0 replies; 16+ messages in thread
From: Avery Pennarun @ 2008-06-26  1:30 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, Git Mailing List

On 6/25/08, Jakub Narebski <jnareb@gmail.com> wrote:
> On Wed, 25 Jun 2008, Avery Pennarun wrote:
>  > Now the catch is, if I want to implement the daemon I was talking
>  > about earlier, I'd like to be able to notice untracked files (or
>  > directories with untracked files) individually.  Ideally, I guess the
>  > best way would be to just keep a separate list of all existing files
>  > that aren't in the index, and have git status look at that rather than
>  > at the actual filesystem.
>  >
>  > Are there any suggestions for how best to do this?
>
> You can try to take a look at how the (third-party, Linux-only) inotify
>  extension for Mercurial works.  AFAIK it uses some kind of daemon
>  which watches for inotify notices and updates Mercurial's equivalent
>  of the index.

Sorry, I asked the wrong question.  I wasn't asking how to implement
the daemon, which I think I can do without much trouble.  I actually
need to know how to represent the information.

I was thinking of handling updated files by doing update-index
--no-assume-unchanged on files that change.  But where should I store
information about *untracked* files that have changed, so that
git-status can still report them but not have to scan them all?
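
One possible representation, sketched under loud assumptions: a plain text file with one untracked path per line that the daemon keeps up to date. The file format and the helper names record_untracked and forget_untracked are invented here; git itself does not read such a file, so git-status would still need to be taught about it.

```shell
# Hypothetical side list: one untracked path per line in a plain text file.
# $1 is the list file, $2 is the path reported by the watcher.

# Add a path unless it is already listed.
record_untracked() {
    grep -Fxq -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Drop a path, e.g. after it was deleted or added to the index.
forget_untracked() {
    [ -f "$1" ] || return 0
    grep -Fxv -- "$2" "$1" > "$1.tmp" || true
    mv "$1.tmp" "$1"
}
```

A flat list keeps lookups trivial to script; a real implementation would probably want something sorted or indexed for very large trees.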

Thanks,

Avery

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 18:02   ` Avery Pennarun
@ 2008-06-26  8:47     ` Michael J Gruber
  0 siblings, 0 replies; 16+ messages in thread
From: Michael J Gruber @ 2008-06-26  8:47 UTC (permalink / raw)
  To: git

Avery Pennarun venit, vidit, dixit 25.06.2008 20:02:
> On 6/25/08, Michael J Gruber <michaeljgruber+gmane@fastmail.fm> wrote:
>>> 4) My idea is to eventually --assume-unchanged my whole repository,
>>> then write a cheesy daemon that uses the Win32 dnotify-equivalent to
>>> watch for files that get updated and then selectively
>>> --no-assume-unchanged files that it gets notified about.  That would
>>> avoid the need to ever synchronously scan the whole repo for changes,
>>> thus making my git-Win32 experience much faster and more enjoyable.
>>> (This daemon ought to be possible to run on Linux as well, for similar
>>> improvements on gigantic repositories.  Also note that TortoiseSVN for
>>> Windows does something similar to track file status updates, so this
>>> isn't *just* me being crazy.)
>>  Looks like users on slow NFS would profit, too. Hate to say it, but hg
>> feels faster on (slow) NFS than git. Yet I use git, for other reasons ;)
> 
> Hmm, can you do dnotify over NFS?
> 
> I'd like to know how hg goes any faster.  As far as I can see, git is
> going as fast as can be without some kind of daemon or other magic.
> (Except for my point #3, which seems relatively minor.)

I haven't done any measurements, maybe I should; getting consistent 
results would require setting up an isolated NFS environment, though.

The thing is that hg is very careful about serializing and minimizing 
disk I/O, whereas git is very clever about delegating stuff to the 
kernel and processing data efficiently. In my work environment I have to 
keep my repos on NFS. For heavy history rewriting I resort to /tmp or 
/dev/shm temporarily. But git status is kinda slow on NFS. I don't know 
about [di]?notify over NFS.

Michael

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-25 16:44 update-index --assume-unchanged doesn't make things go fast Avery Pennarun
  2008-06-25 17:38 ` Michael J Gruber
  2008-06-25 19:30 ` Jakub Narebski
@ 2008-06-26 11:22 ` Stephen R. van den Berg
  2008-06-27 17:01   ` Avery Pennarun
  2 siblings, 1 reply; 16+ messages in thread
From: Stephen R. van den Berg @ 2008-06-26 11:22 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Git Mailing List

Avery Pennarun wrote:
>1) What's a sensible way to tell git to *not* opendir() specific
>directories to look for unexpected files in "git status"?  (I don't
>think I know enough to implement this myself.)

Would checking the mtime on the directory itself help?
-- 
Sincerely,
           Stephen R. van den Berg.

If mind over matter is a matter of course, does it matter if nobody minds?

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-26 11:22 ` Stephen R. van den Berg
@ 2008-06-27 17:01   ` Avery Pennarun
  2008-06-27 17:31     ` Jakub Narebski
  0 siblings, 1 reply; 16+ messages in thread
From: Avery Pennarun @ 2008-06-27 17:01 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Git Mailing List

On 6/26/08, Stephen R. van den Berg <srb@cuci.nl> wrote:
> Avery Pennarun wrote:
>  >1) What's a sensible way to tell git to *not* opendir() specific
>  >directories to look for unexpected files in "git status"?  (I don't
>  >think I know enough to implement this myself.)
>
> Would checking the mtime on the directory itself help?

I'm guessing it would help somewhat (although not as much as not
checking anything at all).  However, we'd still have to check the
mtime *against* something, and I don't think the index stores
information about directories themselves.
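
As a sketch of what "against something" could look like without touching the index: compare every directory's mtime with a stamp file written after the last full scan. The helper name dirs_changed_since and the stamp file are assumptions for illustration. Note that a directory's mtime only changes when an entry directly inside it is added, removed or renamed, so each directory still has to be visited.

```shell
# Hypothetical helper: print the directories under $1 whose mtime is newer
# than the stamp file $2, skipping anything named .git.
dirs_changed_since() {
    find "$1" -name .git -prune -o -type d -newer "$2" -print
}

# Usage from the top of the worktree, with a made-up stamp location:
#   dirs_changed_since . .git/last-scan-stamp
#   touch .git/last-scan-stamp      # after rescanning the reported dirs
```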

Thanks,

Avery

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-27 17:01   ` Avery Pennarun
@ 2008-06-27 17:31     ` Jakub Narebski
  2008-06-27 17:56       ` Avery Pennarun
  2008-06-28  2:03       ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Jakub Narebski @ 2008-06-27 17:31 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Stephen R. van den Berg, Git Mailing List

"Avery Pennarun" <apenwarr@gmail.com> writes:
> On 6/26/08, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> Avery Pennarun wrote:
>>>
>>> 1) What's a sensible way to tell git to *not* opendir() specific
>>> directories to look for unexpected files in "git status"?  (I don't
>>> think I know enough to implement this myself.)
>>
>> Would checking the mtime on the directory itself help?
> 
> I'm guessing it would help somewhat (although not as much as not
> checking anything at all).  However, we'd still have to check the
> mtime *against* something, and I don't think the index stores
> information about directories themselves.

By the way, from time to time the idea comes up on this mailing list
to add entries for directories to the index.  This could help in situations
like yours, and with tracking empty directories, faster operations when some
trees are unchanged, and subtree <-> subproject changes.

But it always comes back to: 1.) no proposed implementation, 2.) "git
tracks contents"...

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-27 17:31     ` Jakub Narebski
@ 2008-06-27 17:56       ` Avery Pennarun
  2008-06-27 18:09         ` Dana How
  2008-06-28  2:03       ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Avery Pennarun @ 2008-06-27 17:56 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Stephen R. van den Berg, Git Mailing List

On 6/27/08, Jakub Narebski <jnareb@gmail.com> wrote:
> "Avery Pennarun" <apenwarr@gmail.com> writes:
> > On 6/26/08, Stephen R. van den Berg <srb@cuci.nl> wrote:
>  >> Avery Pennarun wrote:
>  >>> 1) What's a sensible way to tell git to *not* opendir() specific
>  >>> directories to look for unexpected files in "git status"?  (I don't
>  >>> think I know enough to implement this myself.)
>  >>
>  >> Would checking the mtime on the directory itself help?
>  >
>  > I'm guessing it would help somewhat (although not as much as not
>  > checking anything at all).  However, we'd still have to check the
>  > mtime *against* something, and I don't think the index stores
>  > information about directories themselves.
>
> By the way, from time to time the idea comes up on this mailing list
>  to add entries for directories to the index.  This could help in situations
>  like yours, and with tracking empty directories, faster operations when some
>  trees are unchanged, and subtree <-> subproject changes.
>
>  But it always comes back to: 1.) no proposed implementation, 2.) "git
>  tracks contents"...

Yes, I've seen the occasional discussions about this.

I might volunteer to help solve (1) except that I have a feeling that
changing the index format would mangle all sorts of things beyond my
current understanding.  Attaining that understanding might not be so
bad, except for (2), which suggests that any proposed changes will
probably be rejected anyhow.

So naturally I was hoping for a magical alternative suggestion for my
current problem instead :)  One option I'm thinking about is to have
my proposed daemon keep its own "index", which tracks *all* the files
on the filesystem, not just the ones that have been
git-update-index'd.  Then anything that needs to compare against the
filesystem can choose to compare against the contents of this file
instead if it exists (and/or the right option is set, etc).  Does that
sound sane?

Have fun,

Avery

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-27 17:56       ` Avery Pennarun
@ 2008-06-27 18:09         ` Dana How
  2008-06-27 18:51           ` Avery Pennarun
  0 siblings, 1 reply; 16+ messages in thread
From: Dana How @ 2008-06-27 18:09 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Jakub Narebski, Stephen R. van den Berg, Git Mailing List,
	danahow

On Fri, Jun 27, 2008 at 10:56 AM, Avery Pennarun <apenwarr@gmail.com> wrote:
> On 6/27/08, Jakub Narebski <jnareb@gmail.com> wrote:
>> "Avery Pennarun" <apenwarr@gmail.com> writes:
>> > On 6/26/08, Stephen R. van den Berg <srb@cuci.nl> wrote:
>>  >> Avery Pennarun wrote:
>>  >>> 1) What's a sensible way to tell git to *not* opendir() specific
>>  >>> directories to look for unexpected files in "git status"?  (I don't
>>  >>> think I know enough to implement this myself.)
>>  >>
>>  >> Would checking the mtime on the directory itself help?
>>  >
>>  > I'm guessing it would help somewhat (although not as much as not
>>  > checking anything at all).  However, we'd still have to check the
>>  > mtime *against* something, and I don't think the index stores
>>  > information about directories themselves.
>>
>> By the way, from time to time the idea comes up on this mailing list
>>  to add entries for directories to the index.  This could help in situations
>>  like yours, and with tracking empty directories, faster operations when some
>>  trees are unchanged, and subtree <-> subproject changes.
>>
>>  But it always comes back to: 1.) no proposed implementation, 2.) "git
>>  tracks contents"...
>
> Yes, I've seen the occasional discussions about this.
>
> I might volunteer to help solve (1) except that I have a feeling that
> changing the index format would mangle all sorts of things beyond my
> current understanding.  Attaining that understanding might not be so
> bad, except for (2), which suggests that any proposed changes will
> probably be rejected anyhow.
>
> So naturally I was hoping for a magical alternative suggestion for my
> current problem instead :)  One option I'm thinking about is to have
> my proposed daemon keep its own "index", which tracks *all* the files
> on the filesystem, not just the ones that have been
> git-update-index'd.  Then anything that needs to compare against the
> filesystem can choose to compare against the contents of this file
> instead if it exists (and/or the right option is set, etc).  Does that
> sound sane?
It sounds sane to me b/c I had the same reaction to this discussion.
You mean "all the files in the _worktree_" ?
You would use e.g. inotify on all the directories except .git?
This would be very helpful with an extremely large number of files.

Thanks,
-- 
Dana L. How danahow@gmail.com +1 650 804 5991 cell

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-27 18:09         ` Dana How
@ 2008-06-27 18:51           ` Avery Pennarun
  0 siblings, 0 replies; 16+ messages in thread
From: Avery Pennarun @ 2008-06-27 18:51 UTC (permalink / raw)
  To: Dana How; +Cc: Jakub Narebski, Stephen R. van den Berg, Git Mailing List

On 6/27/08, Dana How <danahow@gmail.com> wrote:
> It sounds sane to me b/c I had the same reaction to this discussion.
>  You mean "all the files in the _worktree_" ?
>  You would use e.g. inotify on all the directories except .git?
>  This would be very helpful with an extremely large number of files.

Yes, that's the idea.  In Win32, it would use something other than
inotify, but otherwise it should work about the same.

Avery

* Re: update-index --assume-unchanged doesn't make things go fast
  2008-06-27 17:31     ` Jakub Narebski
  2008-06-27 17:56       ` Avery Pennarun
@ 2008-06-28  2:03       ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2008-06-28  2:03 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Avery Pennarun, Stephen R. van den Berg, Git Mailing List

Jakub Narebski <jnareb@gmail.com> writes:

> By the way, from time to time the idea comes up on this mailing list
> to add entries for directories to the index.  This could help in situations
> like yours, and with tracking empty directories, faster operations when some
> trees are unchanged, and subtree <-> subproject changes.

Tracking empty directories might be helped by having an explicit entry in
the index (even though it may not be the only possible implementation).  I
however suspect you are overvaluing it for "some trees are unchanged"
case:

        $ mkdir -p a/b
        $ stat a | grep Modify
        Modify: 2008-06-27 11:38:13.000000000 -0700
        $ >a/b/c
        $ stat a | grep Modify
        Modify: 2008-06-27 11:38:13.000000000 -0700
        $ >a/d
        $ stat a | grep Modify
        Modify: 2008-06-27 11:38:32.000000000 -0700

You have to descend into the leaf level anyway and directory mtime does
not allow you to check that much.
