* git reset and ctime
From: ghazel @ 2010-12-03 21:36 UTC
To: git
Hi,
I've encountered a strange issue where "git reset --hard" insists on
"Checking out files ..." when all that is changed is the ctime on
these files. My deploy process (capistrano) maintains a cached copy of
a git repo, which it fetches, resets, and then hardlinks files from
when a deploy occurs ( https://github.com/37signals/fast_remote_cache
). The hardlinking step is meant to save the time of copying the file,
but hardlinking changes the ctime of the source files. That causes git
reset to re-check out the files when the next deploy occurs, which is
quite time-consuming. Some helpful people on #git showed that "git
update-index --refresh" before the git reset prevents this behavior,
but I wonder why that is needed at all.
Should git reset be performing whatever "git update-index --refresh"
is doing? Certainly in this case it would result in a vast speed
improvement.
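The ctime effect described above can be reproduced without git at all. This is a minimal sketch, assuming GNU-style `stat` (the `-c %Z` and `-c %Y` format options; BSD and macOS `stat` use different flags) and a filesystem that supports hard links. The file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: show that creating a hard link changes the ctime of the
# source file but leaves its mtime alone.
# Assumes GNU-style stat (-c %Z = ctime, -c %Y = mtime, epoch seconds).
set -eu

workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > source.txt
ctime_before=$(stat -c %Z source.txt)
mtime_before=$(stat -c %Y source.txt)

# Wait so the change is visible at one-second resolution.
sleep 2

# Adding a link updates the inode's link count, which bumps ctime.
ln source.txt linked.txt

ctime_after=$(stat -c %Z source.txt)
mtime_after=$(stat -c %Y source.txt)

echo "ctime: $ctime_before -> $ctime_after"
echo "mtime: $mtime_before -> $mtime_after"
```

Since the content and mtime are untouched, only the stat data cached in git's index goes stale, which is what "git update-index --refresh" repairs.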
-Greg
* Re: git reset and ctime
From: Jonathan Nieder @ 2010-12-04 0:51 UTC
To: ghazel; +Cc: git
Hi Greg,
ghazel@gmail.com wrote:
> I've encountered a strange issue where "git reset --hard" insists on
> "Checking out files ..." when all that is changed is the ctime
There is a performance trade-off. Refreshing the index requires
reading+hashing the existing file if the stat information changed;
this could be faster or slower than blindly overwriting depending on
the situation.
That said, I have no strong objection to an implicit refresh in "git
reset" (performance-sensitive scripts should be using read-tree
directly anyway). Have you tried making that change to
builtin/reset.c? How does it perform in practice?
> My deploy process (capistrano) maintains a cached copy of
> a git repo, which it fetches, resets, and then hardlinks files from
> when a deploy occurs ( https://github.com/37signals/fast_remote_cache
> ). The hardlinking step is meant to save the time of copying the file.
> but hardlinking changes the ctime of the source files.
Interesting. Setting "[core] trustctime = false" in the repository
configuration could be a good solution (no performance downside I can
think of).
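For reference, the setting can be applied per repository with `git config`. A minimal sketch, run here in a throwaway repository created only for illustration; in practice the `git config` line would be run inside the cached deploy repository.

```shell
#!/bin/sh
# Sketch: turn off ctime comparison for one repository.
set -eu

repo=$(mktemp -d)      # throwaway repository, for illustration only
cd "$repo"
git init -q

# Writes "trustctime = false" under [core] in .git/config.
git config core.trustctime false

# Read the value back to confirm it took effect.
trustctime=$(git config --get core.trustctime)
echo "core.trustctime = $trustctime"
```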
Hope that helps,
Jonathan
* Re: git reset and ctime
From: ghazel @ 2010-12-04 1:39 UTC
To: Jonathan Nieder; +Cc: git
On Fri, Dec 3, 2010 at 4:51 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> ghazel@gmail.com wrote:
>
>> I've encountered a strange issue where "git reset --hard" insists on
>> "Checking out files ..." when all that is changed is the ctime
>
> There is a performance trade-off. Refreshing the index requires
> reading+hashing the existing file if the stat information changed;
> this could be faster or slower than blindly overwriting depending on
> the situation.
>
> That said, I have no strong objection to an implicit refresh in "git
> reset" (performance-sensitive scripts should be using read-tree
> directly anyway). Have you tried making that change to
> builtin/reset.c? How does it perform in practice?
I did not make the modifications to reset.c; I just ran the refresh
before reset:
So originally:
$ time git reset --hard <rev>
Checking out files: 100% (2772/2772), done.
real 0m5.328s
user 0m2.539s
sys 0m2.542s
as opposed to:
$ time git update-index --refresh
real 0m1.236s
user 0m1.024s
sys 0m0.201s
$ time git reset --hard <rev>
real 0m0.055s
user 0m0.011s
sys 0m0.041s
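The two-step sequence timed above could be wrapped in a small helper for a deploy script. This is only a sketch: the function name `refresh_then_reset` is made up, and the `|| true` reflects that `git update-index --refresh` can exit non-zero (for example when the index has unmerged entries), which should not stop the reset that follows.

```shell
# Sketch: refresh the index's cached stat data before a hard reset, so
# files whose content is unchanged are not checked out again.
# "refresh_then_reset" is a hypothetical helper name.
refresh_then_reset() {
    rev=${1:-HEAD}

    # -q must come before --refresh; it lets the refresh continue
    # quietly past entries that need updating. A non-zero exit status
    # is ignored because the reset below handles real changes.
    git update-index -q --refresh || true

    git reset --hard "$rev"
}
```

It would be called as `refresh_then_reset <rev>` from inside the cached repository, in place of the bare `git reset --hard <rev>`.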
>> My deploy process (capistrano) maintains a cached copy of
>> a git repo, which it fetches, resets, and then hardlinks files from
>> when a deploy occurs ( https://github.com/37signals/fast_remote_cache
>> ). The hardlinking step is meant to save the time of copying the file.
>> but hardlinking changes the ctime of the source files.
>
> Interesting. Setting "[core] trustctime = false" in the repository
> configuration could be a good solution (no performance downside I can
> think of).
This is a very useful suggestion. I do not see a case where ctime
would be valuable to me. Is it really valuable to other people? What
is the trade-off?
-Greg
* Re: git reset and ctime
From: Jonathan Nieder @ 2010-12-04 1:47 UTC
To: ghazel; +Cc: git
ghazel@gmail.com wrote:
> On Fri, Dec 3, 2010 at 4:51 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> ghazel@gmail.com wrote:
>>> My deploy process (capistrano) maintains a cached copy of
>>> a git repo, which it fetches, resets, and then hardlinks files from
>>> when a deploy occurs ( https://github.com/37signals/fast_remote_cache
>>> ). The hardlinking step is meant to save the time of copying the file.
>>> but hardlinking changes the ctime of the source files.
>>
>> Interesting. Setting "[core] trustctime = false" in the repository
>> configuration could be a good solution (no performance downside I can
>> think of).
>
> This is a very useful suggestion. I do not see a case where ctime
> would be valuable to me. Is it really valuable to other people? What
> is the trade-off?
Some reading for a rainy day :): [1] and surrounding discussion.
Short answer: I think the main purpose is catching worktree corruption
(e.g., if rsync screws up).
[1] http://thread.gmane.org/gmane.comp.version-control.git/89370/focus=89993
* Re: git reset and ctime
From: Junio C Hamano @ 2010-12-04 2:28 UTC
To: Jonathan Nieder; +Cc: ghazel, git
Jonathan Nieder <jrnieder@gmail.com> writes:
> That said, I have no strong objection to an implicit refresh in "git
> reset" (performance-sensitive scripts should be using read-tree
> directly anyway). Have you tried making that change to
> builtin/reset.c? How does it perform in practice?
I would be more worried about correctness impact such a patch may make
when the index contains unmerged entries.
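On the script side, one cautious approach is to check for unmerged entries before doing anything that touches the index. This is a sketch of a guard for a deploy script, not of the change to builtin/reset.c under discussion; the function name `has_unmerged_entries` is made up.

```shell
# Sketch: report whether the index contains unmerged (conflicted) entries.
# "has_unmerged_entries" is a hypothetical helper name.
has_unmerged_entries() {
    # "git ls-files --unmerged" prints one line per conflicted stage
    # and prints nothing when the index is fully merged.
    [ -n "$(git ls-files --unmerged)" ]
}
```

A deploy script could call `has_unmerged_entries` first and stop with an error if it succeeds, rather than refreshing or resetting an index in that state.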
* Re: git reset and ctime
From: Drew Northup @ 2010-12-06 17:37 UTC
To: Jonathan Nieder; +Cc: ghazel, git, Junio C Hamano
On Fri, 2010-12-03 at 18:51 -0600, Jonathan Nieder wrote:
> Hi Greg,
>
> ghazel@gmail.com wrote:
>
> > I've encountered a strange issue where "git reset --hard" insists on
> > "Checking out files ..." when all that is changed is the ctime
>
> There is a performance trade-off. Refreshing the index requires
> reading+hashing the existing file if the stat information changed;
> this could be faster or slower than blindly overwriting depending on
> the situation.
> > My deploy process (capistrano) maintains a cached copy of
> > a git repo, which it fetches, resets, and then hardlinks files from
> > when a deploy occurs ( https://github.com/37signals/fast_remote_cache
> > ). The hardlinking step is meant to save the time of copying the file.
> > but hardlinking changes the ctime of the source files.
>
> Interesting. Setting "[core] trustctime = false" in the repository
> configuration could be a good solution (no performance downside I can
> think of).
It is worth noting that many file-based backup systems which do "online"
backups (such as in use where I work) restore the atime by default at
the expense of the ctime (logic being that the atime may have had value
and the ctime changes either way--which may or may not be true) on unix
style filesystems. While many of the git command-line things I have run
seem to figure this out ok, it drives gitk nuts. As far as I am
concerned this is a small price to pay for a solid daily-updated backup
of my machine(s) to be available. I haven't yet put "git reset" of any
sort to use (obviously I just haven't been breaking enough things yet),
but I suspect that it would react in a similar way.
--
-Drew Northup N1XIM
AKA RvnPhnx on OPN
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
* Re: git reset and ctime
From: Jonathan Nieder @ 2010-12-06 17:51 UTC
To: Drew Northup; +Cc: ghazel, git, Junio C Hamano
Drew Northup wrote:
> On Fri, 2010-12-03 at 18:51 -0600, Jonathan Nieder wrote:
>> Interesting. Setting "[core] trustctime = false" in the repository
>> configuration could be a good solution (no performance downside I can
>> think of).
>
> It is worth noting that many file-based backup systems which do "online"
> backups (such as in use where I work) restore the atime by default at
> the expense of the ctime (logic being that the atime may have had value
> and the ctime changes either way--which may or may not be true) on unix
> style filesystems.
So have you tried putting "[core] trustctime = false" in /etc/gitconfig?
This is exactly what the setting is for, after all.
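For a machine-wide setting, `git config --system` writes to the system configuration file, which is usually /etc/gitconfig but depends on how git was built, and normally needs root. The sketch below uses `git config --file` on a scratch file instead, purely to show what the resulting fragment looks like without touching the real system file.

```shell
#!/bin/sh
# Sketch: show the config fragment that disables ctime checks.
# The real system-wide command (normally run as root) would be:
#   git config --system core.trustctime false
set -eu

scratch=$(mktemp)      # stand-in for /etc/gitconfig, for illustration

git config --file "$scratch" core.trustctime false

# Print the generated fragment: a [core] section containing
# "trustctime = false".
cat "$scratch"
```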
Ideas for making this easier to find (FAQ on the git wiki? advice from
porcelain when ctime-only changes happen?) would be welcome, of course.
* Re: git reset and ctime
From: Drew Northup @ 2010-12-07 15:14 UTC
To: Jonathan Nieder; +Cc: ghazel, git, Junio C Hamano
On Mon, 2010-12-06 at 11:51 -0600, Jonathan Nieder wrote:
> Drew Northup wrote:
> > On Fri, 2010-12-03 at 18:51 -0600, Jonathan Nieder wrote:
>
> >> Interesting. Setting "[core] trustctime = false" in the repository
> >> configuration could be a good solution (no performance downside I can
> >> think of).
> >
> > It is worth noting that many file-based backup systems which do "online"
> > backups (such as in use where I work) restore the atime by default at
> > the expense of the ctime (logic being that the atime may have had value
> > and the ctime changes either way--which may or may not be true) on unix
> > style filesystems.
>
> So have you tried putting "[core] trustctime = false" in /etc/gitconfig?
> This is exactly what the setting is for, after all.
I hadn't yet, but it works like a charm.
> Ideas for making this easier to find (FAQ on the git wiki? advice from
> porcelain when ctime-only changes happen?) would be welcome, of course.
I'll have a look over that way a bit later.
I'm also going to have to have a look at the src.rpm for this particular
packaging of git and find out why it didn't create a
skeleton /etc/gitconfig (without much in it) in the postinstall script.
(I'm using the Dag Wieers / rpmforge one on my desktop.) It makes a lot
more sense to send along a patch than randomly demand that he change
it--he may have had a decent reason for not doing so.
--
-Drew Northup N1XIM
AKA RvnPhnx on OPN
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59