Git development
* Sort of a feature proposal
@ 2008-05-07 14:48 David Kastrup
  2008-05-07 15:41 ` Nicolas Pitre
  2008-05-07 16:25 ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: David Kastrup @ 2008-05-07 14:48 UTC (permalink / raw)
  To: git


Hi, I have some large git repositories on a USB drive (ext3 file
system).  That means that when replugging the drive, the recorded st_dev
data in the index is off, meaning that the whole repo directory
structure gets reread as the stat data of all directories has changed.

That's a nuisance.  Can't we have some heuristic or configuration option
where we, say, record the st_dev of the _index_ file, and if that has
changed, we propagate that change to the st_dev of its contents?  I'd
like to see something that works more efficiently than rescanning the
whole disk every time I hibernate my computer.

Thanks,

-- 
David Kastrup
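A minimal C sketch of the heuristic proposed above, under stated assumptions. The structs `entry_stat` and `index_state`, the recorded `index_dev` field and `propagate_dev_change()` are all hypothetical and do not correspond to git's real index format. They only show the idea: if the index file's own st_dev has changed, rewrite the per-entry device numbers that carried the old value. As Linus notes later in the thread, default git builds do not compare st_dev at all, so this would only matter for builds that do.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified index entry: only the stat fields this sketch needs. */
struct entry_stat {
    dev_t dev;
    ino_t ino;
    off_t size;
};

/* Hypothetical index: index_dev is the st_dev the index file had when written. */
struct index_state {
    dev_t index_dev;
    struct entry_stat *entries;
    size_t nr;
};

/*
 * If the index file now sits on a different st_dev than the one recorded,
 * assume the whole work tree moved with it (e.g. a replugged USB drive) and
 * rewrite every entry that carried the old device number.  Entries recorded
 * on some other device are left alone.  Returns the number of entries rewritten.
 */
size_t propagate_dev_change(struct index_state *istate, dev_t current_index_dev)
{
    size_t rewritten = 0;

    if (istate->index_dev == current_index_dev)
        return 0;
    for (size_t i = 0; i < istate->nr; i++) {
        if (istate->entries[i].dev == istate->index_dev) {
            istate->entries[i].dev = current_index_dev;
            rewritten++;
        }
    }
    istate->index_dev = current_index_dev;
    return rewritten;
}
```

In this sketch the caller would fstat() the index file when loading it and pass that st_dev as `current_index_dev`, before any per-file stat comparison happens.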


* Re: Sort of a feature proposal
  2008-05-07 14:48 Sort of a feature proposal David Kastrup
@ 2008-05-07 15:41 ` Nicolas Pitre
  2008-05-07 16:00   ` Stephen R. van den Berg
  2008-05-07 16:03   ` Avery Pennarun
  2008-05-07 16:25 ` Linus Torvalds
  1 sibling, 2 replies; 8+ messages in thread
From: Nicolas Pitre @ 2008-05-07 15:41 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

On Wed, 7 May 2008, David Kastrup wrote:

> 
> Hi, I have some large git repositories on a USB drive (ext3 file
> system).  That means that when replugging the drive, the recorded st_dev
> data in the index is off, meaning that the whole repo directory
> structure gets reread as the stat data of all directories has changed.
> 
> That's a nuisance.  Can't we have some heuristic or configuration option
> where we, say, record the st_dev of the _index_ file, and if that has
> changed, we propagate that change to the st_dev of its contents?  I'd
> like to see something that works more efficiently than rescanning the
> whole disk every time I hibernate my computer.

Maybe simply ignoring st_dev is the solution?  I hardly can see what 
value it had to the other stat fields.


Nicolas


* Re: Sort of a feature proposal
  2008-05-07 15:41 ` Nicolas Pitre
@ 2008-05-07 16:00   ` Stephen R. van den Berg
  2008-05-07 16:03   ` Avery Pennarun
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen R. van den Berg @ 2008-05-07 16:00 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: David Kastrup, git

Nicolas Pitre wrote:
>Maybe simply ignoring st_dev is the solution?  I hardly can see what 
>value it had to the other stat fields.

It determines the scope of st_ino.
-- 
Sincerely,                                                          srb@cuci.nl
           Stephen R. van den Berg.
Lady Astor: "Winston, if you were my husband, I'd put poison in your coffee."
 Churchill: "Nancy, if you were my wife, I'd drink it."
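Stephen's point can be illustrated with a small C helper: an inode number is only unique within one filesystem, so "same file" means the same (st_dev, st_ino) pair. The name `same_file()` is hypothetical, chosen for illustration.

```c
#include <sys/stat.h>

/*
 * Returns 1 if both paths name the same file, 0 if they do not, and -1 if
 * either stat() fails.  st_ino alone is not enough, because two different
 * filesystems can hand out the same inode number; st_dev scopes it.
 */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, the root directory of every ext2/ext3/ext4 filesystem is inode 2, so comparing st_ino alone would conflate the roots of two mounted ext3 volumes.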


* Re: Sort of a feature proposal
  2008-05-07 15:41 ` Nicolas Pitre
  2008-05-07 16:00   ` Stephen R. van den Berg
@ 2008-05-07 16:03   ` Avery Pennarun
  1 sibling, 0 replies; 8+ messages in thread
From: Avery Pennarun @ 2008-05-07 16:03 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: David Kastrup, git

On 5/7/08, Nicolas Pitre <nico@cam.org> wrote:
> On Wed, 7 May 2008, David Kastrup wrote:
>  > Hi, I have some large git repositories on a USB drive (ext3 file
>  > system).  That means that when replugging the drive, the recorded st_dev
>  > data in the index is off, meaning that the whole repo directory
>  > structure gets reread as the stat data of all directories has changed.
>  >
>  > That's a nuisance.  Can't we have some heuristic or configuration option
>  > where we, say, record the st_dev of the _index_ file, and if that has
>  > changed, we propagate that change to the st_dev of its contents?  I'd
>  > like to see something that works more efficiently than rescanning the
>  > whole disk every time I hibernate my computer.
>
> Maybe simply ignoring st_dev is the solution?  I hardly can see what
>  value it had to the other stat fields.

If I understand correctly, you can be sure a file hasn't changed if it
has exactly the same (dev,inode,ctime,length) attributes.  If you
don't track the dev, you can't tell whether a file whose attributes
look identical was actually on another disk, and might therefore have
different content after all.

It's obviously a pretty rare case, but nobody likes a version control
system that works properly "almost" all the time :)

Have fun,

Avery
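Avery's argument can be sketched in C. The `cached_stat` struct, `cache_from_stat()` and `looks_unchanged()` are hypothetical names for illustration. git's real index records and compares more fields (mtime among them), so this only isolates the role of the device number: with `check_dev` off, a different file on another disk that happens to share inode, ctime and size is reported as unchanged.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical cached copy of the (dev, inode, ctime, length) tuple. */
struct cached_stat {
    dev_t dev;
    ino_t ino;
    time_t ctime_sec;
    off_t size;
};

/* Record the tuple from a fresh stat() result. */
struct cached_stat cache_from_stat(const struct stat *st)
{
    struct cached_stat c = { st->st_dev, st->st_ino, st->st_ctime, st->st_size };
    return c;
}

/*
 * True when the cached tuple still matches, i.e. the file is assumed
 * unchanged without reading its contents.  With check_dev == false the
 * device number is ignored, which opens the false match described above.
 */
bool looks_unchanged(const struct cached_stat *c, const struct stat *st,
                     bool check_dev)
{
    if (check_dev && c->dev != st->st_dev)
        return false;
    return c->ino == st->st_ino &&
           c->ctime_sec == st->st_ctime &&
           c->size == st->st_size;
}
```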


* Re: Sort of a feature proposal
  2008-05-07 14:48 Sort of a feature proposal David Kastrup
  2008-05-07 15:41 ` Nicolas Pitre
@ 2008-05-07 16:25 ` Linus Torvalds
  2008-05-07 17:39   ` David Kastrup
  1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2008-05-07 16:25 UTC (permalink / raw)
  To: David Kastrup; +Cc: git



On Wed, 7 May 2008, David Kastrup wrote:
> 
> Hi, I have some large git repositories on a USB drive (ext3 file
> system).  That means that when replugging the drive, the recorded st_dev
> data in the index is off, meaning that the whole repo directory
> structure gets reread as the stat data of all directories has changed.
> 
> That's a nuisance.  Can't we have some heuristic or configuration option
> where we, say, record the st_dev of the _index_ file, and if that has
> changed, we propagate that change to the st_dev of its contents?  I'd
> like to see something that works more efficiently than rescanning the
> whole disk every time I hibernate my computer.

Hmm. We shouldn't even be using st_dev any more.

How did you compile your git version? By default USE_STDEV should be off, 
and it's been that way for a long time (because st_dev is also not 
reliable on NFS etc).

		Linus
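A sketch of what a build-time switch like USE_STDEV amounts to, assuming it ends up as a `-DUSE_STDEV` preprocessor define. `cached_entry` and `stat_matches()` are hypothetical and much simpler than git's own comparison code; the point is only that without the define, a changed device number (replugged drive, NFS remount) never enters the comparison.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical cached entry, storing stat fields as 32-bit values. */
struct cached_entry {
    unsigned int dev;
    unsigned int ino;
    unsigned int size;
};

bool stat_matches(const struct cached_entry *ce, const struct stat *st)
{
#ifdef USE_STDEV
    /* Only compiled in when building with -DUSE_STDEV. */
    if (ce->dev != (unsigned int)st->st_dev)
        return false;
#endif
    /* Without USE_STDEV, a different st_dev alone never marks a file dirty. */
    return ce->ino == (unsigned int)st->st_ino &&
           ce->size == (unsigned int)st->st_size;
}
```

Building the same file with `cc -DUSE_STDEV` makes a device-number change count as a mismatch again.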


* Re: Sort of a feature proposal
  2008-05-07 16:25 ` Linus Torvalds
@ 2008-05-07 17:39   ` David Kastrup
  2008-05-07 17:50     ` Dmitry Potapov
  0 siblings, 1 reply; 8+ messages in thread
From: David Kastrup @ 2008-05-07 17:39 UTC (permalink / raw)
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 7 May 2008, David Kastrup wrote:
>> 
>> Hi, I have some large git repositories on a USB drive (ext3 file
>> system).  That means that when replugging the drive, the recorded st_dev
>> data in the index is off, meaning that the whole repo directory
>> structure gets reread as the stat data of all directories has changed.
>> 
>> That's a nuisance.  Can't we have some heuristic or configuration option
>> where we, say, record the st_dev of the _index_ file, and if that has
>> changed, we propagate that change to the st_dev of its contents?  I'd
>> like to see something that works more efficiently than rescanning the
>> whole disk every time I hibernate my computer.
>
> Hmm. We shouldn't even be using st_dev any more.
>
> How did you compile your git version? By default USE_STDEV should be off, 
> and it's been that way for a long time (because st_dev is also not 
> reliable on NFS etc).

Looks that way in my Makefile.  Maybe I am confused: I just did some
timings (this is ext3 on a USB drive) and got

    git svn rebase
    Current branch master is up to date.
    dak@lisa:/lisa/texlive$ time git svn rebase
    Current branch master is up to date.

    real	0m4.581s
    user	0m2.244s
    sys	0m1.492s
    dak@lisa:/lisa/texlive$ cd
    dak@lisa:~$ sudo umount /lisa;sudo mount /dev/mapper/Medion-reps /lisa;cd /lisa/texlive;time git svn rebase
    Current branch master is up to date.

    real	0m53.588s
    user	0m2.248s
    sys	0m2.388s
    dak@lisa:/lisa/texlive$ cd;sudo umount /lisa;sudo dmsetup remove /dev/mapper/Medion-reps
[Unplug and replug the USB drive]
    dak@lisa:~$ sudo mount /dev/mapper/Medion-reps /lisa;cd /lisa/texlive;time git svn rebase
    Current branch master is up to date.

    real	0m53.101s
    user	0m2.324s
    sys	0m2.380s
    dak@lisa:/lisa/texlive$ 

If my guess is correct that the LVM device number does not change when
merely unmounting and remounting, but does change when unplugging and
replugging, then my idea of where the time went was wrong: the device
number has nothing whatsoever to do with the large number of lookups
(this is a USB 2.0 device at High Speed).

Is there a way to completely invalidate the disk cache without
unmounting?  How do I verify device numbers?

Thanks,

-- 
David Kastrup
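One way to answer the device-number question is to stat() a path and split st_dev into its major and minor numbers; running it on the repository before and after a remount or replug shows whether st_dev changed. This sketch assumes Linux with glibc, where major() and minor() come from <sys/sysmacros.h>; the helper name `device_of()` is made up for illustration.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper: store the major and minor numbers of the device
 * holding PATH.  Returns 0 on success, -1 if stat() fails.
 */
int device_of(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *maj = major(st.st_dev);
    *min = minor(st.st_dev);
    return 0;
}
```

Printing the two values as `%u:%u` gives the familiar major:minor form. From the shell, GNU coreutils `stat -c %d PATH` prints the same device number in decimal.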


* Re: Sort of a feature proposal
  2008-05-07 17:39   ` David Kastrup
@ 2008-05-07 17:50     ` Dmitry Potapov
  2008-05-07 18:05       ` David Kastrup
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Potapov @ 2008-05-07 17:50 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

>  Is there a way to completely invalidate the disk cache without
>  unmounting?

echo 3 > /proc/sys/vm/drop_caches

Dmitry
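The same cache flush expressed in C, as a sketch. Writing 3 asks the kernel to drop the page cache as well as dentries and inodes; it does not free dirty data, which is why sync() is called first. The function name `drop_caches()` is hypothetical, and the path is a parameter only so the function can be exercised against a scratch file; in real use it is `/proc/sys/vm/drop_caches` and the caller must be root.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Flush dirty data, then write "3" to the drop_caches control file at PATH.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int drop_caches(const char *path)
{
    FILE *f;
    int ok;

    sync();                          /* dirty pages are not dropped, so write them back first */
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs("3\n", f) != EOF;     /* 3 = page cache + dentries and inodes */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```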


* Re: Sort of a feature proposal
  2008-05-07 17:50     ` Dmitry Potapov
@ 2008-05-07 18:05       ` David Kastrup
  0 siblings, 0 replies; 8+ messages in thread
From: David Kastrup @ 2008-05-07 18:05 UTC (permalink / raw)
  To: git

"Dmitry Potapov" <dpotapov@gmail.com> writes:

>>  Is there a way to completely invalidate the disk cache without
>>  unmounting?
>
> echo 3 > /proc/sys/vm/drop_caches

Sigh.  It is the disk cache after all.  It looks like "git svn rebase"
can't work from the index alone, since it has to check the working tree
for unstashed modifications.  And indeed, flushing the cache makes much
less of a difference for "git svn fetch" than for rebase.

Sorry for the noise.

-- 
David Kastrup
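Why a working-tree check is sensitive to the cache can be sketched as a loop that lstat()s every tracked path and compares the result with recorded values. `tracked_file` and `count_dirty()` are hypothetical, and git's own check compares more fields, but the cost is roughly the same shape: one lstat() per tracked file. With warm caches each call is answered from memory; after a remount or cache drop each one may need disk I/O, which is consistent with the timings in this thread.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical index entry: a path plus the stat data recorded for it. */
struct tracked_file {
    const char *path;
    off_t size;
    time_t mtime;
};

/*
 * lstat() every tracked path and count entries that are missing or whose
 * size or mtime differ from the recorded values.
 */
size_t count_dirty(const struct tracked_file *files, size_t n)
{
    size_t dirty = 0;

    for (size_t i = 0; i < n; i++) {
        struct stat st;

        if (lstat(files[i].path, &st) != 0 ||
            st.st_size != files[i].size ||
            st.st_mtime != files[i].mtime)
            dirty++;
    }
    return dirty;
}
```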



Thread overview: 8+ messages
2008-05-07 14:48 Sort of a feature proposal David Kastrup
2008-05-07 15:41 ` Nicolas Pitre
2008-05-07 16:00   ` Stephen R. van den Berg
2008-05-07 16:03   ` Avery Pennarun
2008-05-07 16:25 ` Linus Torvalds
2008-05-07 17:39   ` David Kastrup
2008-05-07 17:50     ` Dmitry Potapov
2008-05-07 18:05       ` David Kastrup
