* Re: Nanosecond fs timestamp support: sad
2011-07-22 22:59 ` NeilBrown
@ 2011-07-22 23:06 ` J. Bruce Fields
2011-07-22 23:49 ` J. Bruce Fields
2011-07-23 0:07 ` Matt Mackall
` (2 subsequent siblings)
3 siblings, 1 reply; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-22 23:06 UTC (permalink / raw)
To: NeilBrown
Cc: Andi Kleen, Matt Mackall, linux-fsdevel,
Linux Kernel Mailing List
On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
>
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > >
> > > > > > Not sure what you mean? It's in stat(2), just like the timestamps.
> > > > >
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > >
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused
> > > > fields.
> > >
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > >
> > > My notes on what needs to be done there:
> > >
> > > - collect data to determine whether turning on i_version causes
> > > any significant performance regressions.
> > > - Last I talked to him, Ted Tso recommended running
> > > Bonnie on a local disk, since it does a lot of little
> > > writes, which is somewhat of a worst case, as it will
> > > generate extra metadata updates for each write.
> > > Compare total wall-clock time, number of iops, and
> > > number of bytes (using some kind of block tracing).
> > > - If there aren't any problems, turn it on by default, and we're
> > > done.
> >
> > (Well,and talk the other filesystem implementors into doing it.)
> >
>
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
It *seems* like a generally useful idea, but I don't know of any other
users.
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
Well, sure, I couldn't complain about that if that proved possible.
--b.
>
> The timestamp used doesn't need to update every nanosecond. I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality. I wonder how that would be achieved.. I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?
>
> NeilBrown
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Nanosecond fs timestamp support: sad
2011-07-22 23:06 ` J. Bruce Fields
@ 2011-07-22 23:49 ` J. Bruce Fields
2011-07-23 0:07 ` NeilBrown
0 siblings, 1 reply; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-22 23:49 UTC (permalink / raw)
To: NeilBrown
Cc: Andi Kleen, Matt Mackall, linux-fsdevel,
Linux Kernel Mailing List
On Fri, Jul 22, 2011 at 07:06:12PM -0400, J. Bruce Fields wrote:
> On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > more-generally-useful precise timestamps?
>
> It *seems* like a generally useful idea, but I don't know of any other
> users.
(Out of curiosity: what actually *needs* real timestamps?
- They're generally useful to people, of course ("what did I
change last Tuesday?").
- Make uses them, though in theory perhaps it could do the same
job by caching records like "object X was built from
versions a, b, and c of objects A, B, and C respectively".
But a lot of uses are probably just to answer the question "did this
file change since the last time I looked at it"?
Of course, however theoretically useful, there's always the argument
that linux-specific interfaces are unlikely to be used by anyone except
Lennart Poettering.)
--b.
* Re: Nanosecond fs timestamp support: sad
2011-07-22 23:49 ` J. Bruce Fields
@ 2011-07-23 0:07 ` NeilBrown
0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2011-07-23 0:07 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Andi Kleen, Matt Mackall, linux-fsdevel,
Linux Kernel Mailing List
On Fri, 22 Jul 2011 19:49:21 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:
> On Fri, Jul 22, 2011 at 07:06:12PM -0400, J. Bruce Fields wrote:
> > On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> > > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > > more-generally-useful precise timestamps?
> >
> > It *seems* like a generally useful idea, but I don't know of any other
> > users.
>
> (Out of curiosity: what actually *needs* real timestamps?
> - They're generally useful to people, of course ("what did I
> change last Tuesday?").
In the same vein they are useful for archiving. "what has changed since I
last started an archive?"
NFSv3 caching obviously uses them too.
> - Make uses them, though in theory perhaps it could do the same
> job by caching records like "object X was built from
> versions a, b, and c of objects A, B, and C respectively".
In theory....
>
> But a lot of uses are probably just to answer the question "did this
> file change since the last time I looked at it"?
I think everything could fall into one of two categories.
a/ did this file change since the last time I looked at it?
b/ did this file change since the last time that file changed?
The former can be achieved with versions or timestamps.
The latter requires globally coherent high precision timestamps... or
something like dependency tracking which would probably be even more
expensive and - as you say - non-standard.
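The distinction between the two categories can be sketched in a small userspace model. This is only an illustration, not kernel code: `FileState`, `Clock`, `write`, `changed_since_seen` and `changed_after` are hypothetical names, and the shared clock is assumed to tick once per recorded change, which is exactly the "globally coherent" property that is hard to provide in practice.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    version: int = 0    # per-file change counter, in the spirit of i_version
    mtime_ns: int = 0   # timestamp drawn from one shared clock


class Clock:
    """Hypothetical globally coherent clock: one tick per recorded change."""

    def __init__(self):
        self.now_ns = 0

    def tick(self):
        self.now_ns += 1
        return self.now_ns


def write(f, clock):
    # Every change bumps the per-file counter and takes a fresh timestamp.
    f.version += 1
    f.mtime_ns = clock.tick()


def changed_since_seen(f, seen_version):
    # Category a/: the observer only remembers a value for this one file.
    return f.version != seen_version


def changed_after(target, source):
    # Category b/: needs values that are comparable across files.
    return target.mtime_ns > source.mtime_ns
```

Per-file counters say nothing about ordering between files: a file written three times has a larger version than a file written once afterwards, even though the latter changed more recently.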
NeilBrown
>
> Of course, however theoretically useful, there's always the argument
> that linux-specific interfaces are unlikely to be used by anyone except
> Lennart Poettering.)
>
> --b.
* Re: Nanosecond fs timestamp support: sad
2011-07-22 22:59 ` NeilBrown
2011-07-22 23:06 ` J. Bruce Fields
@ 2011-07-23 0:07 ` Matt Mackall
2011-07-23 1:38 ` J. Bruce Fields
2011-07-29 19:49 ` Pavel Machek
2011-07-23 1:13 ` Andreas Dilger
2011-07-25 15:09 ` Paul E. McKenney
3 siblings, 2 replies; 21+ messages in thread
From: Matt Mackall @ 2011-07-23 0:07 UTC (permalink / raw)
To: NeilBrown
Cc: J. Bruce Fields, Andi Kleen, linux-fsdevel,
Linux Kernel Mailing List
On Sat, 2011-07-23 at 08:59 +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
>
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > >
> > > > > > Not sure what you mean? It's in stat(2), just like the timestamps.
> > > > >
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > >
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused
> > > > fields.
> > >
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > >
> > > My notes on what needs to be done there:
> > >
> > > - collect data to determine whether turning on i_version causes
> > > any significant performance regressions.
> > > - Last I talked to him, Ted Tso recommended running
> > > Bonnie on a local disk, since it does a lot of little
> > > writes, which is somewhat of a worst case, as it will
> > > generate extra metadata updates for each write.
> > > Compare total wall-clock time, number of iops, and
> > > number of bytes (using some kind of block tracing).
> > > - If there aren't any problems, turn it on by default, and we're
> > > done.
> >
> > (Well,and talk the other filesystem implementors into doing it.)
> >
>
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
In theory, a microsecond timestamp (ie gtod) may already not be good
enough for all applications. But i_version also doesn't allow comparing
across files.
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
>
> The timestamp used doesn't need to update every nanosecond. I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality. I wonder how that would be achieved.. I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?
Sort of.
Some observations:
- we only need to go to higher resolution when two events happen in the
same time quantum
- this applies at both the level of seconds and jiffies
- if the only file touched in a given quantum gets touched ago, we don't
need to update its timestamp if stat wasn't also called on it in this
quantum
- we never need to use a higher resolution than the global
min(s_time_gran)
For instance, if a machine is idle, except for writing to a single file
once a second, 1s resolution suffices.
If a machine is idle, except for writing to the same file 1000 times per
second, and no one is watching it, 1s still suffices (inode is dirtied
once per second).
Any time two files are touched in the same second, the second one (and
later files) needs jiffies resolution. Similarly, any time two files are
touched in the same jiffy, the second one should use gtod().
The global status bits needed to track this could be managed fairly
efficiently with cmpxchg.
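A rough single-threaded model of the tiered scheme described above, as a sketch only. The class name `TieredStamper`, the choice of HZ=250 and the field names are assumptions made for illustration. Python has no cmpxchg, so the shared state is updated with plain assignments here, and the "was stat called in this quantum" refinement is left out.

```python
NSEC_PER_SEC = 1_000_000_000
JIFFY_NS = NSEC_PER_SEC // 250   # assumes HZ=250, purely illustrative


class TieredStamper:
    """Hands out the coarsest timestamp that still orders touches."""

    def __init__(self, s_time_gran=1):
        self.gran = s_time_gran      # never go finer than the fs granularity
        self.sec = -1                # second of the most recent touch
        self.sec_owner = None        # first inode touched in that second
        self.sec_shared = False      # has another inode been touched since?
        self.jiffy = -1
        self.jiffy_owner = None
        self.jiffy_shared = False

    def stamp(self, ino, now_ns):
        sec, jiffy = now_ns // NSEC_PER_SEC, now_ns // JIFFY_NS
        if sec != self.sec:
            self.sec, self.sec_owner, self.sec_shared = sec, ino, False
        elif ino != self.sec_owner:
            self.sec_shared = True
        if jiffy != self.jiffy:
            self.jiffy, self.jiffy_owner, self.jiffy_shared = jiffy, ino, False
        elif ino != self.jiffy_owner:
            self.jiffy_shared = True

        if not self.sec_shared:
            unit = NSEC_PER_SEC      # only one file touched in this second
        elif not self.jiffy_shared:
            unit = JIFFY_NS          # second is shared, jiffy is not
        else:
            unit = 1                 # shared jiffy: full clock resolution
        unit = max(unit, self.gran)
        return now_ns - now_ns % unit
```

In this model a lone writer keeps getting the same whole-second stamp, a second file in the same second gets a jiffy-truncated stamp, and a second file in the same jiffy gets the full-resolution value.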
(Arguably, we should supply > 1s resolution whether they're strictly
needed or not on filesystems with nanosecond support, so that people
casually inspecting timestamps don't wonder where their nanoseconds
went.)
--
Mathematics is the supreme nostalgia of our time.
* Re: Nanosecond fs timestamp support: sad
2011-07-23 0:07 ` Matt Mackall
@ 2011-07-23 1:38 ` J. Bruce Fields
2011-07-23 2:10 ` Trond Myklebust
2011-07-24 1:56 ` Andi Kleen
2011-07-29 19:49 ` Pavel Machek
1 sibling, 2 replies; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-23 1:38 UTC (permalink / raw)
To: Matt Mackall
Cc: NeilBrown, Andi Kleen, linux-fsdevel, Linux Kernel Mailing List
On Fri, Jul 22, 2011 at 07:07:41PM -0500, Matt Mackall wrote:
> On Sat, 2011-07-23 at 08:59 +1000, NeilBrown wrote:
> > On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> >
> > > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > > >
> > > > > > > Not sure what you mean? It's in stat(2), just like the timestamps.
> > > > > >
> > > > > > I don't see anything that looks like a version or generation number in
> > > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > > Pointer?
> > > > >
> > > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > > I think it should be added there though. We still have some unused
> > > > > fields.
> > > >
> > > > But last I checked I thought it was only ext4 that actually incremented
> > > > the i_version on IO, and even then only when given a (non-default) mount
> > > > option.
> > > >
> > > > My notes on what needs to be done there:
> > > >
> > > > - collect data to determine whether turning on i_version causes
> > > > any significant performance regressions.
> > > > - Last I talked to him, Ted Tso recommended running
> > > > Bonnie on a local disk, since it does a lot of little
> > > > writes, which is somewhat of a worst case, as it will
> > > > generate extra metadata updates for each write.
> > > > Compare total wall-clock time, number of iops, and
> > > > number of bytes (using some kind of block tracing).
> > > > - If there aren't any problems, turn it on by default, and we're
> > > > done.
> > >
> > > (Well,and talk the other filesystem implementors into doing it.)
> > >
> >
> > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > more-generally-useful precise timestamps?
>
> In theory, a microsecond timestamp (ie gtod) may already not be good
> enough for all applications. But i_version also doesn't allow comparing
> across files.
>
> > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > them work well.
> > ??
> >
> > The timestamp used doesn't need to update every nanosecond. I think if it
> > were just updated on every userspace->kernel transition (or effective
> > equivalents inside kernel threads) that would be enough to capture all
> > causality. I wonder how that would be achieved.. I wonder if RCU machinery
> > could help - doesn't it keep track of when threads schedule ... or something?
>
> Sort of.
>
> Some observations:
>
> - we only need to go to higher resolution when two events happen in the
> same time quantum
> - this applies at both the level of seconds and jiffies
> - if the only file touched in a given quantum gets touched ago, we don't
> need to update its timestamp if stat wasn't also called on it in this
> quantum
> - we never need to use a higher resolution than the global
> min(s_time_gran)
Right, so there was a rough algorithm hashed out somewhere around here:
http://thread.gmane.org/gmane.linux.kernel/1022866/focus=1024624
that depended on those observations.
NFS presents a worst-case as the standard NFSv3 read and write
operations include timestamps in the result. So every single IO comes
with a stat. So either you have a clock good enough to give a distinct
timestamp for all of those, or you fall back on a global counter that
ends up serializing all IO. I think. I admit I'm not sure I understand
your proposal below.
--b.
>
>
> For instance, if a machine is idle, except for writing to a single file
> once a second, 1s resolution suffices.
>
> If a machine is idle, except for writing to the same file 1000 times per
> second, and no one is watching it, 1s still suffices (inode is dirtied
> once per second).
>
> Any time two files are touched in the same second, the second one (and
> later files) needs jiffies resolution. Similarly, any time two files are
> touched in the same jiffy, the second one should use gtod().
>
> The global status bits needed to track this could be managed fairly
> efficiently with cmpxchg.
>
> (Arguably, we should supply > 1s resolution whether they're strictly
> needed or not on filesystems with nanosecond support, so that people
> casually inspecting timestamps don't wonder where their nanoseconds
> went.)
>
> --
> Mathematics is the supreme nostalgia of our time.
>
>
* Re: Nanosecond fs timestamp support: sad
2011-07-23 1:38 ` J. Bruce Fields
@ 2011-07-23 2:10 ` Trond Myklebust
2011-07-24 1:56 ` Andi Kleen
1 sibling, 0 replies; 21+ messages in thread
From: Trond Myklebust @ 2011-07-23 2:10 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Matt Mackall, NeilBrown, Andi Kleen, linux-fsdevel,
Linux Kernel Mailing List
On Fri, 2011-07-22 at 21:38 -0400, J. Bruce Fields wrote:
> On Fri, Jul 22, 2011 at 07:07:41PM -0500, Matt Mackall wrote:
> > On Sat, 2011-07-23 at 08:59 +1000, NeilBrown wrote:
> > > On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > > wrote:
> > >
> > > > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > > > >
> > > > > > > > Not sure what you mean? It's in stat(2), just like the timestamps.
> > > > > > >
> > > > > > > I don't see anything that looks like a version or generation number in
> > > > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > > > Pointer?
> > > > > >
> > > > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > > > I think it should be added there though. We still have some unused
> > > > > > fields.
> > > > >
> > > > > But last I checked I thought it was only ext4 that actually incremented
> > > > > the i_version on IO, and even then only when given a (non-default) mount
> > > > > option.
> > > > >
> > > > > My notes on what needs to be done there:
> > > > >
> > > > > - collect data to determine whether turning on i_version causes
> > > > > any significant performance regressions.
> > > > > - Last I talked to him, Ted Tso recommended running
> > > > > Bonnie on a local disk, since it does a lot of little
> > > > > writes, which is somewhat of a worst case, as it will
> > > > > generate extra metadata updates for each write.
> > > > > Compare total wall-clock time, number of iops, and
> > > > > number of bytes (using some kind of block tracing).
> > > > > - If there aren't any problems, turn it on by default, and we're
> > > > > done.
> > > >
> > > > (Well,and talk the other filesystem implementors into doing it.)
> > > >
> > >
> > > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > > more-generally-useful precise timestamps?
> >
> > In theory, a microsecond timestamp (ie gtod) may already not be good
> > enough for all applications. But i_version also doesn't allow comparing
> > across files.
> >
> > > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > > them work well.
> > > ??
> > >
> > > The timestamp used doesn't need to update every nanosecond. I think if it
> > > were just updated on every userspace->kernel transition (or effective
> > > equivalents inside kernel threads) that would be enough to capture all
> > > causality. I wonder how that would be achieved.. I wonder if RCU machinery
> > > could help - doesn't it keep track of when threads schedule ... or something?
> >
> > Sort of.
> >
> > Some observations:
> >
> > - we only need to go to higher resolution when two events happen in the
> > same time quantum
> > - this applies at both the level of seconds and jiffies
> > - if the only file touched in a given quantum gets touched ago, we don't
> > need to update its timestamp if stat wasn't also called on it in this
> > quantum
> > - we never need to use a higher resolution than the global
> > min(s_time_gran)
>
> Right, so there was a rough algorithm hashed out somewhere around here:
>
> http://thread.gmane.org/gmane.linux.kernel/1022866/focus=1024624
>
> that depended on those observations.
>
> NFS presents a worst-case as the standard NFSv3 read and write
> operations include timestamps in the result. So every single IO comes
> with a stat. So either you have a clock good enough to give a distinct
> timestamp for all of those, or you fall back on a global counter that
> ends up serializing all IO. I think. I admit I'm not sure I understand
> your proposal below.
...or you admit that NFSv3 is no longer able to keep up with modern
processing speeds and storage, and you ditch it in favour of NFSv4.
Time-stamps are _not_ the optimal way to label changes in a clustered
environment (or even a multi-cpu/multi-core environment): aside from the
various issues involving absolute time vs. wall clock time, you also
have to deal with clock synchronisation across those nodes/cpus/cores at
the < microsecond resolution level. Have fun doing that...
Trond
* Re: Nanosecond fs timestamp support: sad
2011-07-23 1:38 ` J. Bruce Fields
2011-07-23 2:10 ` Trond Myklebust
@ 2011-07-24 1:56 ` Andi Kleen
1 sibling, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2011-07-24 1:56 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Matt Mackall, NeilBrown, Andi Kleen, linux-fsdevel,
Linux Kernel Mailing List
> with a stat. So either you have a clock good enough to give a distinct
> timestamp for all of those, or you fall back on a global counter that
> ends up serializing all IO. I think. I admit I'm not sure I understand
Not global counter, but per inode. That's very reasonable because there's
already locking on the inode level.
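A minimal sketch of the per-inode idea, as a userspace model rather than kernel code. `CountedInode` and its fields are hypothetical names, and a `threading.Lock` stands in for the per-inode lock that a write is assumed to hold already; the point is only that the counter is bumped under a lock that is not shared between files.

```python
import threading


class CountedInode:
    def __init__(self):
        self.lock = threading.Lock()   # stands in for the per-inode lock
        self.change = 0                # per-inode change counter
        self.data = bytearray()

    def write(self, buf):
        # The counter is bumped under the lock the write takes anyway,
        # so writers to different inodes never serialize on each other.
        with self.lock:
            self.data += buf
            self.change += 1
            return self.change
```

Two threads writing to the same inode serialize on its lock, while a writer to a different inode is unaffected, so nothing here behaves like a global counter.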
-Andi
* Re: Nanosecond fs timestamp support: sad
2011-07-23 0:07 ` Matt Mackall
2011-07-23 1:38 ` J. Bruce Fields
@ 2011-07-29 19:49 ` Pavel Machek
2011-07-29 21:37 ` Matt Mackall
1 sibling, 1 reply; 21+ messages in thread
From: Pavel Machek @ 2011-07-29 19:49 UTC (permalink / raw)
To: Matt Mackall
Cc: NeilBrown, J. Bruce Fields, Andi Kleen, linux-fsdevel,
Linux Kernel Mailing List
Hi!
> > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > them work well.
> > ??
> >
> > The timestamp used doesn't need to update every nanosecond. I think if it
> > were just updated on every userspace->kernel transition (or effective
> > equivalents inside kernel threads) that would be enough to capture all
> > causality. I wonder how that would be achieved.. I wonder if RCU machinery
> > could help - doesn't it keep track of when threads schedule ... or something?
>
> Sort of.
>
> Some observations:
>
> - we only need to go to higher resolution when two events happen in the
> same time quantum
> - this applies at both the level of seconds and jiffies
> - if the only file touched in a given quantum gets touched ago, we don't
> need to update its timestamp if stat wasn't also called on it in this
> quantum
parse error around 'ago'.
> - we never need to use a higher resolution than the global
> min(s_time_gran)
>
>
> For instance, if a machine is idle, except for writing to a single file
> once a second, 1s resolution suffices.
Are you sure? As soon as you get network communication...
> Any time two files are touched in the same second, the second one (and
> later files) needs jiffies resolution. Similarly, any time two files are
> touched in the same jiffy, the second one should use gtod().
For make. I don't see how this is globally true.
I do
( date; > stamp; date ) | ( sleep 5; cat > counterexample )
I know timestamp should be between two dates, but it is not.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Nanosecond fs timestamp support: sad
2011-07-29 19:49 ` Pavel Machek
@ 2011-07-29 21:37 ` Matt Mackall
0 siblings, 0 replies; 21+ messages in thread
From: Matt Mackall @ 2011-07-29 21:37 UTC (permalink / raw)
To: Pavel Machek
Cc: NeilBrown, J. Bruce Fields, Andi Kleen, linux-fsdevel,
Linux Kernel Mailing List
On Fri, 2011-07-29 at 21:49 +0200, Pavel Machek wrote:
> Hi!
>
> > > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > > them work well.
> > > ??
> > >
> > > The timestamp used doesn't need to update every nanosecond. I think if it
> > > were just updated on every userspace->kernel transition (or effective
> > > equivalents inside kernel threads) that would be enough to capture all
> > > causality. I wonder how that would be achieved.. I wonder if RCU machinery
> > > could help - doesn't it keep track of when threads schedule ... or something?
> >
> > Sort of.
> >
> > Some observations:
> >
> > - we only need to go to higher resolution when two events happen in the
> > same time quantum
> > - this applies at both the level of seconds and jiffies
> > - if the only file touched in a given quantum gets touched ago, we don't
> > need to update its timestamp if stat wasn't also called on it in this
> > quantum
>
> parse error around 'ago'.
This should read:
- if only one file is touched in a given quantum, we don't need to
update its timestamp if stat wasn't called on it in the same quantum
> > - we never need to use a higher resolution than the global
> > min(s_time_gran)
> >
> >
> > For instance, if a machine is idle, except for writing to a single file
> > once a second, 1s resolution suffices.
>
> Are you sure? As soon as you get network communication...
I don't think you can generally compare filesystem timestamps to other
time sources reliably. For instance, network filesystems might have
their own notions of current time.
> > Any time two files are touched in the same second, the second one (and
> > later files) needs jiffies resolution. Similarly, any time two files are
> > touched in the same jiffy, the second one should use gtod().
>
> For make. I don't see how this is globally true.
>
> I do
>
> ( date; > stamp; date ) | ( sleep 5; cat > counterexample )
>
> I know timestamp should be between two dates, but it is not.
You're claiming the timestamp on 'stamp' should be strictly between the
two dates reported?
This is true today if and only if you measure in seconds (and your
filesystem's clock is synced with your local clock). If you measure at a
resolution finer than the filesystem's (currently limited to jiffies),
it will be wrong, even on a local filesystem.
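The experiment can be repeated with nanosecond readings using a short script. This is a sketch under assumptions: `stamp_between_dates` and `ordered` are hypothetical helper names, and `time.time_ns()` is used in place of `date`. Whether the file's mtime falls between the two clock readings depends on the kernel and filesystem in use, so no particular outcome is claimed here.

```python
import os
import tempfile
import time


def stamp_between_dates(directory):
    """Read the clock, create a file, read the clock again."""
    path = os.path.join(directory, "stamp")
    before_ns = time.time_ns()
    with open(path, "w"):
        pass                      # creating the file sets its mtime
    after_ns = time.time_ns()
    mtime_ns = os.stat(path).st_mtime_ns
    return before_ns, mtime_ns, after_ns


def ordered(before_ns, mtime_ns, after_ns, unit_ns):
    """True if before <= mtime <= after when all are truncated to unit_ns."""
    return before_ns // unit_ns <= mtime_ns // unit_ns <= after_ns // unit_ns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        b, m, a = stamp_between_dates(d)
    print("ordered at 1s granularity: ", ordered(b, m, a, 1_000_000_000))
    print("ordered at 1ns granularity:", ordered(b, m, a, 1))
```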
--
Mathematics is the supreme nostalgia of our time.
* Re: Nanosecond fs timestamp support: sad
2011-07-22 22:59 ` NeilBrown
2011-07-22 23:06 ` J. Bruce Fields
2011-07-23 0:07 ` Matt Mackall
@ 2011-07-23 1:13 ` Andreas Dilger
2011-07-25 15:09 ` Paul E. McKenney
3 siblings, 0 replies; 21+ messages in thread
From: Andreas Dilger @ 2011-07-23 1:13 UTC (permalink / raw)
To: NeilBrown
Cc: J. Bruce Fields, Andi Kleen, Matt Mackall, linux-fsdevel,
Linux Kernel Mailing List
As an FYI, Lustre uses i_version to store the transaction number in which a file changed. It sets the i_version itself. If NFSv4 were to set i_version when it needs to transition the state of a file then it wouldn't cause overhead on filesystems that are not being used for NFS export.
I don't think timestamps can ever be completely safe for distributed state management, unless the kernel bends the rules on what a timestamp IS, e.g. by never reverting the ctime when the clock moves backward and such.
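One way to picture "never reverting the ctime" is a clamp applied whenever a new ctime is chosen. This is a sketch of the idea only; `next_ctime` is a hypothetical helper, not an existing kernel function.

```python
def next_ctime(prev_ctime_ns, wall_clock_ns):
    """Pick the next ctime so that it never moves backward."""
    # If the wall clock was stepped back, keep the previous ctime instead.
    return max(prev_ctime_ns, wall_clock_ns)
```

With this rule the value is no longer strictly "the time of the change", which is the rule-bending referred to above.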
Cheers, Andreas
On 2011-07-22, at 4:59 PM, NeilBrown <neilb@suse.de> wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
>
>> On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
>>> On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
>>>> On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
>>>>> On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
>>>>>>> Indeed. Only usefully exists on ext4 and requires extra system calls.
>>>>>>
>>>>>> Not sure what you mean? It's in stat(2), just like the timestamps.
>>>>>
>>>>> I don't see anything that looks like a version or generation number in
>>>>> either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
>>>>> Pointer?
>>>>
>>>> Hmm you're right. I thought it was in there, but apparently not.
>>>> I think it should be added there though. We still have some unused
>>>> fields.
>>>
>>> But last I checked I thought it was only ext4 that actually incremented
>>> the i_version on IO, and even then only when given a (non-default) mount
>>> option.
>>>
>>> My notes on what needs to be done there:
>>>
>>> - collect data to determine whether turning on i_version causes
>>> any significant performance regressions.
>>> - Last I talked to him, Ted Tso recommended running
>>> Bonnie on a local disk, since it does a lot of little
>>> writes, which is somewhat of a worst case, as it will
>>> generate extra metadata updates for each write.
>>> Compare total wall-clock time, number of iops, and
>>> number of bytes (using some kind of block tracing).
>>> - If there aren't any problems, turn it on by default, and we're
>>> done.
>>
>> (Well,and talk the other filesystem implementors into doing it.)
>>
>
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
>
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
>
> The timestamp used doesn't need to update every nanosecond. I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality. I wonder how that would be achieved.. I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?
>
> NeilBrown
* Re: Nanosecond fs timestamp support: sad
2011-07-22 22:59 ` NeilBrown
` (2 preceding siblings ...)
2011-07-23 1:13 ` Andreas Dilger
@ 2011-07-25 15:09 ` Paul E. McKenney
3 siblings, 0 replies; 21+ messages in thread
From: Paul E. McKenney @ 2011-07-25 15:09 UTC (permalink / raw)
To: NeilBrown
Cc: J. Bruce Fields, Andi Kleen, Matt Mackall, linux-fsdevel,
Linux Kernel Mailing List
On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
>
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > >
> > > > > > Not sure what you mean? It's in stat(2), just like the timestamps.
> > > > >
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > >
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused
> > > > fields.
> > >
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > >
> > > My notes on what needs to be done there:
> > >
> > > - collect data to determine whether turning on i_version causes
> > > any significant performance regressions.
> > > - Last I talked to him, Ted Tso recommended running
> > > Bonnie on a local disk, since it does a lot of little
> > > writes, which is somewhat of a worst case, as it will
> > > generate extra metadata updates for each write.
> > > Compare total wall-clock time, number of iops, and
> > > number of bytes (using some kind of block tracing).
> > > - If there aren't any problems, turn it on by default, and we're
> > > done.
> >
> > (Well,and talk the other filesystem implementors into doing it.)
> >
>
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
>
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
>
> The timestamp used doesn't need to update every nanosecond. I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality. I wonder how that would be achieved.. I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?
RCU does track thread scheduling, but currently only pays attention to
it if there is an RCU grace period in progress. It would be easy to
make it track more precisely, though, if that would help something.
That said, I suspect that Peter Zijlstra would be extremely unhappy with
any proposed change that (say) acquired a global lock on every thread
schedule. And I don't believe that he would be all that happy even with a
change that added a non-global lock acquisition to each context switch...
Thanx, Paul