linux-fsdevel.vger.kernel.org archive mirror
* Nanosecond fs timestamp support: sad
@ 2011-07-21 18:07 Matt Mackall
  2011-07-22  6:01 ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Matt Mackall @ 2011-07-21 18:07 UTC (permalink / raw)
  To: linux-fsdevel, Linux Kernel Mailing List

So it turns out that the resolution on filesystem timestamps is tied to
HZ rather than gettimeofday or similar, which means the resolution
improvement over seconds is.. not much. And not nearly as much as
advertised!

This means I can touch a file something like 70k times per second and
get only 300 distinct timestamps on my laptop. And only 100 distinct
timestamps on a typical distro server kernel.

Meanwhile, I can call gettimeofday 35M times per second and get ~1M
distinct responses.

Given that we can do gettimeofday three orders of magnitude faster than
we can do file transactions and it has four orders of magnitude better
resolution, shouldn't we be using it for filesystem time when
sb->s_time_gran is less than 1/HZ?
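
A rough way to reproduce the comparison above from userspace, as a sketch
only (this is not the script behind the numbers quoted here). It assumes
Python 3.7+ on Linux; the function names and loop counts are made up for the
example. Calling os.utime() without times asks the kernel to stamp the file
with its own idea of "now", much as touch(1) does.

```python
import os
import tempfile
import time


def distinct_mtimes(path, touches=20000):
    """Touch `path` repeatedly and count the distinct mtimes observed."""
    seen = set()
    for _ in range(touches):
        os.utime(path)                       # kernel picks the timestamp
        seen.add(os.stat(path).st_mtime_ns)  # read it back at ns precision
    return len(seen)


def distinct_clock_reads(reads=20000):
    """Read the wall clock repeatedly and count the distinct values."""
    return len({time.time_ns() for _ in range(reads)})


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print("distinct file timestamps:", distinct_mtimes(f.name))
    print("distinct clock readings:", distinct_clock_reads())
```

The interesting figure is the ratio between the two counts rather than the
absolute values; kernels other than the 2011-era ones discussed here may
behave differently.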

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Nanosecond fs timestamp support: sad
  2011-07-21 18:07 Nanosecond fs timestamp support: sad Matt Mackall
@ 2011-07-22  6:01 ` Andi Kleen
  2011-07-22  6:33   ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2011-07-22  6:01 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-fsdevel, Linux Kernel Mailing List

Matt Mackall <mpm@selenic.com> writes:


> This means I can touch a file something like 70k times per second and
> get only 300 distinct timestamps on my laptop. And only 100 distinct
> timestamps on a typical distro server kernel.

You should use the inode generation number if you really want
to see every update.

> Meanwhile, I can call gettimeofday 35M times per second and get ~1M
> distinct responses.

The key word here is "I".

> Given that we can do gettimeofday three orders of magnitude faster than
> we can do file transactions and it has four orders of magnitude better
> resolution, shouldn't we be using it for filesystem time when
> sb->s_time_gran is less than 1/HZ?

Some systems have a quite slow gettimeofday().
That was the primary motivation for using jiffies.

Also adding more granularity makes it more expensive,
because there's additional work every time it changes.
Even jiffies already caused regressions.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


* Re: Nanosecond fs timestamp support: sad
  2011-07-22  6:01 ` Andi Kleen
@ 2011-07-22  6:33   ` NeilBrown
  2011-07-22 19:34     ` Matt Mackall
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2011-07-22  6:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Matt Mackall, linux-fsdevel, Linux Kernel Mailing List

On Thu, 21 Jul 2011 23:01:24 -0700 Andi Kleen <andi@firstfloor.org> wrote:

> Matt Mackall <mpm@selenic.com> writes:
> 
> 
> > This means I can touch a file something like 70k times per second and
> > get only 300 distinct timestamps on my laptop. And only 100 distinct
> > timestamps on a typical distro server kernel.
> 
> You should use the inode generation number if you really want
> to see every update.

I assume you mean i_version which gets incremented (under a spinlock) if the
filesystem asks for it.

This doesn't let you compare the ages of two files.  I wonder if that is
important.  Is it important to you Matt?


> 
> > Meanwhile, I can call gettimeofday 35M times per second and get ~1M
> > distinct responses.
> 
> The key word here is "I".
> 
> > Given that we can do gettimeofday three orders of magnitude faster than
> > we can do file transactions and it has four orders of magnitude better
> > resolution, shouldn't we be using it for filesystem time when
> > sb->s_time_gran is less than 1/HZ?
> 
> Some systems have a quite slow gettimeofday().
> That was the primary motivation for using jiffies.
> 
> Also adding more granularity makes it more expensive,
> because there's additional work every time it changes.
> Even jiffies already caused regressions.
> 
> -Andi

I imagine a scheme where 'stat' would set a flag if it wasn't set, and
file_update_time would:
  - if the flag is set, use gettimeofday and clear the flag
  - if the flag is not set, use jiffies

so if you are looking, you will see i_mtime changing precisely but if not,
you don't pay the price.
This wouldn't allow precise ordering of distinct files either of course.
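
The flag scheme above can be modelled in a few lines of userspace Python.
This is a toy, not kernel code: ToyInode, the `watched` flag, and the HZ=100
jiffy length are all assumptions made for illustration, and the caller passes
the clock value in so the behaviour is easy to follow. The max() clamp is an
addition that the description above does not mention; without it a coarse
update that follows a precise one inside the same jiffy would move the
timestamp backwards.

```python
JIFFY_NS = 10_000_000  # assumes HZ=100, i.e. one jiffy is 10 ms


class ToyInode:
    """Userspace toy model of the stat-arms-a-flag idea; not kernel code."""

    def __init__(self):
        self.mtime_ns = 0
        self.watched = False  # hypothetical "someone called stat" flag

    def stat(self):
        # Looking at the timestamp arms the flag for the next update.
        self.watched = True
        return self.mtime_ns

    def update_time(self, now_ns):
        if self.watched:
            # Somebody has seen the old value: pay for a precise clock read.
            self.mtime_ns = now_ns
            self.watched = False
        else:
            # Nobody is looking: the cheap jiffy-granular value will do.
            coarse = now_ns - now_ns % JIFFY_NS
            # Clamp (my addition) so the timestamp never goes backwards.
            self.mtime_ns = max(self.mtime_ns, coarse)
```

Only the first write after each stat() pays for the precise clock; a burst of
unobserved writes keeps using the jiffy-granular value.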

NeilBrown


* Re: Nanosecond fs timestamp support: sad
  2011-07-22  6:33   ` NeilBrown
@ 2011-07-22 19:34     ` Matt Mackall
  2011-07-22 20:59       ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Matt Mackall @ 2011-07-22 19:34 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andi Kleen, linux-fsdevel, Linux Kernel Mailing List

On Fri, 2011-07-22 at 16:33 +1000, NeilBrown wrote:
> On Thu, 21 Jul 2011 23:01:24 -0700 Andi Kleen <andi@firstfloor.org> wrote:
> 
> > Matt Mackall <mpm@selenic.com> writes:
> > 
> > 
> > > This means I can touch a file something like 70k times per second and
> > > get only 300 distinct timestamps on my laptop. And only 100 distinct
> > > timestamps on a typical distro server kernel.
> > 
> > You should use the inode generation number if you really want
> > to see every update.
> 
> I assume you mean i_version which gets incremented (under a spinlock) if the
> filesystem asks for it.

Indeed. Only usefully exists on ext4 and requires extra system calls.

> This doesn't let you compare the ages of two files.  I wonder if that is
> important.  Is it important to you Matt?

Sort of. We track a 'latest seen timestamp' so we can consider files
before that time unchanged and we need only concern ourselves with
looking for invisible changes that occur inside that quantum.
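
One possible reading of the 'latest seen timestamp' rule, as a sketch. The
function and parameter names are hypothetical and this is not taken from any
real tool's source; it only shows why files stamped inside the most recent
clock quantum cannot be trusted on their timestamp alone.

```python
def needs_content_check(recorded_mtime_ns, current_mtime_ns, latest_seen_ns):
    """Decide whether a tracked file must be re-read to detect a change.

    recorded_mtime_ns: mtime stored when the file was last examined
    current_mtime_ns:  mtime reported by stat() now
    latest_seen_ns:    newest mtime observed during the previous scan
    """
    if current_mtime_ns != recorded_mtime_ns:
        return True  # the timestamp moved, so the file was certainly touched
    # Same timestamp. A later write in the same clock tick as the last scan
    # would be invisible, so only strictly older files can be trusted.
    return current_mtime_ns >= latest_seen_ns
```

The coarser the filesystem clock, the more files land in the ambiguous
bucket and have to be compared by content.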

> I imagine a scheme where 'stat' would set a flag if it wasn't set, and
> file_update_time would:
>   - if the flag is set, use gettimeofday and clear the flag
>   - if the flag is not set, use jiffies
> 
> so if you are looking, you will see i_mtime changing precisely but if not,
> you don't pay the price.

Hmm, interesting.

> This wouldn't allow precise ordering of distinct files either of course.

Yeah, I don't think we want to introduce observable non-causality in
filesystem time. There might be something clever we can do here, but it
would require some Deep Thought. But if successful, we could mitigate
some of the repeated inode dirtying caused by jiffies-resolution
timestamping.

-- 
Mathematics is the supreme nostalgia of our time.




* Re: Nanosecond fs timestamp support: sad
  2011-07-22 19:34     ` Matt Mackall
@ 2011-07-22 20:59       ` Andi Kleen
  2011-07-22 21:11         ` Matt Mackall
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2011-07-22 20:59 UTC (permalink / raw)
  To: Matt Mackall
  Cc: NeilBrown, Andi Kleen, linux-fsdevel, Linux Kernel Mailing List

> Indeed. Only usefully exists on ext4 and requires extra system calls.

Not sure what you mean?  It's in stat(2), just like the timestamps.

As for XFS, btrfs etc. I guess it could be added there.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 20:59       ` Andi Kleen
@ 2011-07-22 21:11         ` Matt Mackall
  2011-07-22 21:47           ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Matt Mackall @ 2011-07-22 21:11 UTC (permalink / raw)
  To: Andi Kleen; +Cc: NeilBrown, linux-fsdevel, Linux Kernel Mailing List

On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > Indeed. Only usefully exists on ext4 and requires extra system calls.
> 
> Not sure what you mean?  It's in stat(2), just like the timestamps.

I don't see anything that looks like a version or generation number in
either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
Pointer?

The only interface I'm aware of is the EXT?_IOC_GETVERSION interface.
Looks like that is supported by BTRFS.
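
For reference, a sketch of calling that ioctl from Python. FS_IOC_GETVERSION
is defined in linux/fs.h as _IOR('v', 1, long); the request number below is
rebuilt by hand using the generic ioctl encoding (x86, ARM and similar), which
is an assumption and will be wrong on architectures that encode the direction
bits differently. The helper name inode_generation is made up. As far as I
can tell, on ext4 this ioctl reports the inode's generation field rather than
the per-update i_version counter, so treat it as a demonstration of the
interface and not as a change detector.

```python
import ctypes
import fcntl
import os
import struct

# _IOR('v', 1, long) using the generic Linux ioctl encoding (x86, ARM, ...).
# Some architectures (e.g. PowerPC, MIPS) encode the direction differently.
_IOC_READ = 2
FS_IOC_GETVERSION = (
    (_IOC_READ << 30) | (ctypes.sizeof(ctypes.c_long) << 16) | (ord("v") << 8) | 1
)


def inode_generation(path):
    """Return the number reported by FS_IOC_GETVERSION, or None if unsupported."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(ctypes.sizeof(ctypes.c_long))
        fcntl.ioctl(fd, FS_IOC_GETVERSION, buf)
        # To my knowledge the kernel fills in a C int here, even though the
        # ioctl number is declared with long, so only unpack an int.
        return struct.unpack_from("i", buf)[0]
    except OSError:
        return None  # filesystem (or platform) does not implement the ioctl
    finally:
        os.close(fd)
```

This costs an open() plus an ioctl() per file on top of the stat(), which is
the "extra system calls" objection in concrete form.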

-- 
Mathematics is the supreme nostalgia of our time.




* Re: Nanosecond fs timestamp support: sad
  2011-07-22 21:11         ` Matt Mackall
@ 2011-07-22 21:47           ` Andi Kleen
  2011-07-22 22:10             ` J. Bruce Fields
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2011-07-22 21:47 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Andi Kleen, NeilBrown, linux-fsdevel, Linux Kernel Mailing List

On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > 
> > Not sure what you mean?  It's in stat(2), just like the timestamps.
> 
> I don't see anything that looks like a version or generation number in
> either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> Pointer?

Hmm you're right. I thought it was in there, but apparently not.
I think it should be added there though. We still have some unused 
fields.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 21:47           ` Andi Kleen
@ 2011-07-22 22:10             ` J. Bruce Fields
  2011-07-22 22:31               ` J. Bruce Fields
  0 siblings, 1 reply; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-22 22:10 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Matt Mackall, NeilBrown, linux-fsdevel, Linux Kernel Mailing List

On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > 
> > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > 
> > I don't see anything that looks like a version or generation number in
> > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > Pointer?
> 
> Hmm you're right. I thought it was in there, but apparently not.
> I think it should be added there though. We still have some unused 
> fields.

But last I checked I thought it was only ext4 that actually incremented
the i_version on IO, and even then only when given a (non-default) mount
option.

My notes on what needs to be done there:

	- collect data to determine whether turning on i_version causes
	  any significant performance regressions.
		- Last I talked to him, Ted Tso recommended running
		  Bonnie on a local disk, since it does a lot of little
		  writes, which is somewhat of a worst case, as it will
		  generate extra metadata updates for each write.
		  Compare total wall-clock time, number of iops, and
		  number of bytes (using some kind of block tracing).
	- If there aren't any problems, turn it on by default, and we're
	  done.  If there are unfixable problems, consider something
	  more complicated (like turning on i_version automatically when
	  someone asks for it).
	- We need to check that i_version is also doing something
	  sensible on directory as well as on file inodes.
	- We also need to think about what it does after reboots.  (E.g.
	  what is an nfs server to do if clients see the i_version go
	  backwards (and hence possibly repeat old values) after a
	  reboot?)
	- Double-check the order that data updates and i_version updates
	  are done in.  (Ideal would be if they were atomic, but for
	  nfsd's purposes at least it should be adequate if the
	  i_version comes after, and no later than the next commit.)

--b.


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 22:10             ` J. Bruce Fields
@ 2011-07-22 22:31               ` J. Bruce Fields
  2011-07-22 22:59                 ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-22 22:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Matt Mackall, NeilBrown, linux-fsdevel, Linux Kernel Mailing List

On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > 
> > > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > > 
> > > I don't see anything that looks like a version or generation number in
> > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > Pointer?
> > 
> > Hmm you're right. I thought it was in there, but apparently not.
> > I think it should be added there though. We still have some unused 
> > fields.
> 
> But last I checked I thought it was only ext4 that actually incremented
> the i_version on IO, and even then only when given a (non-default) mount
> option.
> 
> My notes on what needs to be done there:
> 
> 	- collect data to determine whether turning on i_version causes
> 	  any significant performance regressions.
> 		- Last I talked to him, Ted Tso recommended running
> 		  Bonnie on a local disk, since it does a lot of little
> 		  writes, which is somewhat of a worst case, as it will
> 		  generate extra metadata updates for each write.
> 		  Compare total wall-clock time, number of iops, and
> 		  number of bytes (using some kind of block tracing).
> 	- If there aren't any problems, turn it on by default, and we're
> 	  done.

(Well, and talk the other filesystem implementors into doing it.)

--b.

>	  If there are unfixable problems, consider something
> 	  more complicated (like turning on i_version automatically when
> 	  someone asks for it).
> 	- We need to check that i_version is also doing something
> 	  sensible on directory as well as on file inodes.
> 	- We also need to think about what it does after reboots.  (E.g.
> 	  what is an nfs server to do if clients see the i_version go
> 	  backwards (and hence possibly repeat old values) after a
> 	  reboot?)
> 	- Double-check the order that data updates and i_version updates
> 	  are done in.  (Ideal would be if they were atomic, but for
> 	  nfsd's purposes at least it should be adequate if the
> 	  i_version comes after, and no later than the next commit.)
> 
> --b.


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 22:31               ` J. Bruce Fields
@ 2011-07-22 22:59                 ` NeilBrown
  2011-07-22 23:06                   ` J. Bruce Fields
                                     ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: NeilBrown @ 2011-07-22 22:59 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Andi Kleen, Matt Mackall, linux-fsdevel,
	Linux Kernel Mailing List

On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > 
> > > > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > > > 
> > > > I don't see anything that looks like a version or generation number in
> > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > Pointer?
> > > 
> > > Hmm you're right. I thought it was in there, but apparently not.
> > > I think it should be added there though. We still have some unused 
> > > fields.
> > 
> > But last I checked I thought it was only ext4 that actually incremented
> > the i_version on IO, and even then only when given a (non-default) mount
> > option.
> > 
> > My notes on what needs to be done there:
> > 
> > 	- collect data to determine whether turning on i_version causes
> > 	  any significant performance regressions.
> > 		- Last I talked to him, Ted Tso recommended running
> > 		  Bonnie on a local disk, since it does a lot of little
> > 		  writes, which is somewhat of a worst case, as it will
> > 		  generate extra metadata updates for each write.
> > 		  Compare total wall-clock time, number of iops, and
> > 		  number of bytes (using some kind of block tracing).
> > 	- If there aren't any problems, turn it on by default, and we're
> > 	  done.
> 
> (Well, and talk the other filesystem implementors into doing it.)
> 

But does anyone apart from NFSv4 actually *want* i_version as opposed to the
more-generally-useful precise timestamps?

If not, we probably should tell NFSv4 to use timestamps and focus on making
them work well.
??

The timestamp used doesn't need to update every nanosecond.  I think if it
were just updated on every userspace->kernel transition (or effective
equivalents inside kernel threads) that would be enough to capture all
causality.  I wonder how that would be achieved..  I wonder if RCU machinery
could help - doesn't it keep track of when threads schedule ... or something?

NeilBrown


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 22:59                 ` NeilBrown
@ 2011-07-22 23:06                   ` J. Bruce Fields
  2011-07-22 23:49                     ` J. Bruce Fields
  2011-07-23  0:07                   ` Matt Mackall
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-22 23:06 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andi Kleen, Matt Mackall, linux-fsdevel,
	Linux Kernel Mailing List

On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > > 
> > > > > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > > > > 
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > > 
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused 
> > > > fields.
> > > 
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > > 
> > > My notes on what needs to be done there:
> > > 
> > > 	- collect data to determine whether turning on i_version causes
> > > 	  any significant performance regressions.
> > > 		- Last I talked to him, Ted Tso recommended running
> > > 		  Bonnie on a local disk, since it does a lot of little
> > > 		  writes, which is somewhat of a worst case, as it will
> > > 		  generate extra metadata updates for each write.
> > > 		  Compare total wall-clock time, number of iops, and
> > > 		  number of bytes (using some kind of block tracing).
> > > 	- If there aren't any problems, turn it on by default, and we're
> > > 	  done.
> > 
> > (Well, and talk the other filesystem implementors into doing it.)
> > 
> 
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?

It *seems* like a generally useful idea, but I don't know of any other
users.

> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??

Well, sure, I couldn't complain about that if that proved possible.

--b.

> 
> The timestamp used doesn't need to update every nanosecond.  I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?
> 
> NeilBrown


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 23:06                   ` J. Bruce Fields
@ 2011-07-22 23:49                     ` J. Bruce Fields
  2011-07-23  0:07                       ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-22 23:49 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andi Kleen, Matt Mackall, linux-fsdevel,
	Linux Kernel Mailing List

On Fri, Jul 22, 2011 at 07:06:12PM -0400, J. Bruce Fields wrote:
> On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > more-generally-useful precise timestamps?
> 
> It *seems* like a generally useful idea, but I don't know of any other
> users.

(Out of curiosity: what actually *needs* real timestamps?:
	- They're generally useful to people, of course; ("what did I
	  change last Tuesday?")
	- Make uses them, though in theory perhaps it could do the same
	  job by caching records like "object X was built from
	  versions a, b, and c of objects A, B, and C respectively".

But a lot of uses are probably just to answer the question "did this
file change since the last time I looked at it"?

Of course, however theoretically useful, there's always the argument
that linux-specific interfaces are unlikely to be used by anyone except
Lennart Poettering.)

--b.


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 22:59                 ` NeilBrown
  2011-07-22 23:06                   ` J. Bruce Fields
@ 2011-07-23  0:07                   ` Matt Mackall
  2011-07-23  1:38                     ` J. Bruce Fields
  2011-07-29 19:49                     ` Pavel Machek
  2011-07-23  1:13                   ` Andreas Dilger
  2011-07-25 15:09                   ` Paul E. McKenney
  3 siblings, 2 replies; 21+ messages in thread
From: Matt Mackall @ 2011-07-23  0:07 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, Andi Kleen, linux-fsdevel,
	Linux Kernel Mailing List

On Sat, 2011-07-23 at 08:59 +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > > 
> > > > > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > > > > 
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > > 
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused 
> > > > fields.
> > > 
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > > 
> > > My notes on what needs to be done there:
> > > 
> > > 	- collect data to determine whether turning on i_version causes
> > > 	  any significant performance regressions.
> > > 		- Last I talked to him, Ted Tso recommended running
> > > 		  Bonnie on a local disk, since it does a lot of little
> > > 		  writes, which is somewhat of a worst case, as it will
> > > 		  generate extra metadata updates for each write.
> > > 		  Compare total wall-clock time, number of iops, and
> > > 		  number of bytes (using some kind of block tracing).
> > > 	- If there aren't any problems, turn it on by default, and we're
> > > 	  done.
> > 
> > (Well, and talk the other filesystem implementors into doing it.)
> > 
> 
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?

In theory, a microsecond timestamp (ie gtod) may already not be good
enough for all applications. But i_version also doesn't allow comparing
across files.

> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
> 
> The timestamp used doesn't need to update every nanosecond.  I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?

Sort of.

Some observations:

- we only need to go to higher resolution when two events happen in the
same time quantum
- this applies at both the level of seconds and jiffies
- if the only file touched in a given quantum gets touched again, we don't
need to update its timestamp if stat wasn't also called on it in this
quantum
- we never need to use a higher resolution than the global
min(s_time_gran)


For instance, if a machine is idle, except for writing to a single file
once a second, 1s resolution suffices.

If a machine is idle, except for writing to the same file 1000 times per
second, and no one is watching it, 1s still suffices (inode is dirtied
once per second).

Any time two files are touched in the same second, the second one (and
later files) needs jiffies resolution. Similarly, any time two files are
touched in the same jiffy, the second one should use gtod().

The global status bits needed to track this could be managed fairly
efficiently with cmpxchg.
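
The escalation rule described above (1s, then jiffies, then gtod) can be
sketched as a toy userspace model. EscalatingClock and the HZ=100 jiffy
length are assumptions for illustration. The model is single-threaded and
keeps its state in plain attributes, whereas a kernel version would need the
atomic update alluded to above. It also ignores the same-file and
stat-tracking refinements, and expects non-decreasing clock values.

```python
NS_PER_SEC = 1_000_000_000
JIFFY_NS = 10_000_000  # assumes HZ=100


class EscalatingClock:
    """Toy model: hand out the coarsest timestamp that still orders events."""

    def __init__(self):
        self.last_sec = None    # second of the most recent stamp
        self.last_jiffy = None  # jiffy of the most recent stamp

    def stamp(self, now_ns):
        sec, jiffy = now_ns // NS_PER_SEC, now_ns // JIFFY_NS
        if sec != self.last_sec:
            result = sec * NS_PER_SEC  # first event this second: 1 s suffices
        elif jiffy != self.last_jiffy:
            result = jiffy * JIFFY_NS  # second is taken: fall back to jiffies
        else:
            result = now_ns            # jiffy is taken too: precise clock
        self.last_sec, self.last_jiffy = sec, jiffy
        return result
```

For example, events at 5.700 s, 5.800 s and 5.803 s are stamped 5.000 s,
5.800 s and 5.803 s respectively, so the expensive clock is only consulted
for the third one.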

(Arguably, we should supply > 1s resolution whether they're strictly
needed or not on filesystems with nanosecond support, so that people
casually inspecting timestamps don't wonder where their nanoseconds
went.)

-- 
Mathematics is the supreme nostalgia of our time.




* Re: Nanosecond fs timestamp support: sad
  2011-07-22 23:49                     ` J. Bruce Fields
@ 2011-07-23  0:07                       ` NeilBrown
  0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2011-07-23  0:07 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Andi Kleen, Matt Mackall, linux-fsdevel,
	Linux Kernel Mailing List

On Fri, 22 Jul 2011 19:49:21 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Fri, Jul 22, 2011 at 07:06:12PM -0400, J. Bruce Fields wrote:
> > On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> > > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > > more-generally-useful precise timestamps?
> > 
> > It *seems* like a generally useful idea, but I don't know of any other
> > users.
> 
> (Out of curiosity: what actually *needs* real timestamps?:
> 	- They're generally useful to people, of course; ("what did I
> 	  change last Tuesday?")

In the same vein they are useful for archiving.  "what has changed since I
last started an archive?"

NFSv3 caching obviously uses them too.

> 	- Make uses them, though in theory perhaps it could do the same
> 	  job by caching records like "object X was built from
> 	  versions a, b, and c of objects A, B, and C respectively".

In theory....

> 
> But a lot of uses are probably just to answer the question "did this
> file change since the last time I looked at it"?

I think everything could fall into one of two categories.
 a/ did this file change since the last time I looked at it?
 b/ did this file change since the last time that file changed?

The former can be achieved with versions or timestamps.
The latter requires globally coherent high precision timestamps... or
something like dependency tracking which would probably be even more
expensive and - as you say - non-standard.
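
The two categories look like this from userspace, as a minimal sketch. The
function names are made up for the example; only os.stat() and its
st_mtime_ns field are real.

```python
import os


def changed_since_snapshot(path, snapshot_mtime_ns):
    """Category (a): has this file changed since I last looked at it?

    Only equality matters, so an opaque version counter would work as well.
    """
    return os.stat(path).st_mtime_ns != snapshot_mtime_ns


def is_out_of_date(target, source):
    """Category (b): was `source` modified after `target` was, as make asks?

    This compares timestamps of two different files, so it is only as
    reliable as the ordering guarantee of the clock behind them.
    """
    return os.stat(source).st_mtime_ns > os.stat(target).st_mtime_ns
```

With jiffy-granular timestamps, is_out_of_date() returns False whenever both
files were written in the same tick, even if the source was written last.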

NeilBrown


> 
> Of course, however theoretically useful, there's always the argument
> that linux-specific interfaces are unlikely to be used by anyone except
> Lennart Poettering.)
> 
> --b.


* Re: Nanosecond fs timestamp support: sad
  2011-07-22 22:59                 ` NeilBrown
  2011-07-22 23:06                   ` J. Bruce Fields
  2011-07-23  0:07                   ` Matt Mackall
@ 2011-07-23  1:13                   ` Andreas Dilger
  2011-07-25 15:09                   ` Paul E. McKenney
  3 siblings, 0 replies; 21+ messages in thread
From: Andreas Dilger @ 2011-07-23  1:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, Andi Kleen, Matt Mackall, linux-fsdevel,
	Linux Kernel Mailing List

As an FYI, Lustre uses i_version to store the transaction number in which a file changed. It sets the i_version itself. If NFSv4 were to set i_version when it needs to transition the state of a file then it wouldn't cause overhead on filesystems that are not being used for NFS export.

I don't think timestamps can ever be completely safe for distributed state management, unless the kernel bends the rules on what a timestamp IS, e.g. by never reverting the ctime when the clock moves backward and such. 

Cheers, Andreas

On 2011-07-22, at 4:59 PM, NeilBrown <neilb@suse.de> wrote:

> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
>> On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
>>> On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
>>>> On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
>>>>> On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
>>>>>>> Indeed. Only usefully exists on ext4 and requires extra system calls.
>>>>>> 
>>>>>> Not sure what you mean?  It's in stat(2), just like the timestamps.
>>>>> 
>>>>> I don't see anything that looks like a version or generation number in
>>>>> either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
>>>>> Pointer?
>>>> 
>>>> Hmm you're right. I thought it was in there, but apparently not.
>>>> I think it should be added there though. We still have some unused 
>>>> fields.
>>> 
>>> But last I checked I thought it was only ext4 that actually incremented
>>> the i_version on IO, and even then only when given a (non-default) mount
>>> option.
>>> 
>>> My notes on what needs to be done there:
>>> 
>>>    - collect data to determine whether turning on i_version causes
>>>      any significant performance regressions.
>>>        - Last I talked to him, Ted Tso recommended running
>>>          Bonnie on a local disk, since it does a lot of little
>>>          writes, which is somewhat of a worst case, as it will
>>>          generate extra metadata updates for each write.
>>>          Compare total wall-clock time, number of iops, and
>>>          number of bytes (using some kind of block tracing).
>>>    - If there aren't any problems, turn it on by default, and we're
>>>      done.
>> 
>> (Well, and talk the other filesystem implementors into doing it.)
>> 
> 
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
> 
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
> 
> The timestamp used doesn't need to update every nanosecond.  I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?
> 
> NeilBrown

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nanosecond fs timestamp support: sad
  2011-07-23  0:07                   ` Matt Mackall
@ 2011-07-23  1:38                     ` J. Bruce Fields
  2011-07-23  2:10                       ` Trond Myklebust
  2011-07-24  1:56                       ` Andi Kleen
  2011-07-29 19:49                     ` Pavel Machek
  1 sibling, 2 replies; 21+ messages in thread
From: J. Bruce Fields @ 2011-07-23  1:38 UTC (permalink / raw)
  To: Matt Mackall
  Cc: NeilBrown, Andi Kleen, linux-fsdevel, Linux Kernel Mailing List

On Fri, Jul 22, 2011 at 07:07:41PM -0500, Matt Mackall wrote:
> On Sat, 2011-07-23 at 08:59 +1000, NeilBrown wrote:
> > On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> > 
> > > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > > > 
> > > > > > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > > > > > 
> > > > > > I don't see anything that looks like a version or generation number in
> > > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > > Pointer?
> > > > > 
> > > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > > I think it should be added there though. We still have some unused 
> > > > > fields.
> > > > 
> > > > But last I checked I thought it was only ext4 that actually incremented
> > > > the i_version on IO, and even then only when given a (non-default) mount
> > > > option.
> > > > 
> > > > My notes on what needs to be done there:
> > > > 
> > > > 	- collect data to determine whether turning on i_version causes
> > > > 	  any significant performance regressions.
> > > > 		- Last I talked to him, Ted Tso recommended running
> > > > 		  Bonnie on a local disk, since it does a lot of little
> > > > 		  writes, which is somewhat of a worst case, as it will
> > > > 		  generate extra metadata updates for each write.
> > > > 		  Compare total wall-clock time, number of iops, and
> > > > 		  number of bytes (using some kind of block tracing).
> > > > 	- If there aren't any problems, turn it on by default, and we're
> > > > 	  done.
> > > 
> > > > (Well, and talk the other filesystem implementors into doing it.)
> > > 
> > 
> > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > more-generally-useful precise timestamps?
> 
> In theory, a microsecond timestamp (ie gtod) may already not be good
> enough for all applications. But i_version also doesn't allow comparing
> across files.
> 
> > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > them work well.
> > ??
> > 
> > The timestamp used doesn't need to update every nanosecond.  I think if it
> > were just updated on every userspace->kernel transition  (or effective
> > equivalents inside kernel threads) that would be enough to capture all
> > causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> > could help - doesn't it keep track of when threads schedule ... or something?
> 
> Sort of.
> 
> Some observations:
> 
> - we only need to go to higher resolution when two events happen in the
> same time quantum
> - this applies at both the level of seconds and jiffies
> - if the only file touched in a given quantum gets touched ago, we don't
> need to update its timestamp if stat wasn't also called on it in this
> quantum
> - we never need to use a higher resolution than the global
> min(s_time_gran)

Right, so there was a rough algorithm hashed out somewhere around here:

	http://thread.gmane.org/gmane.linux.kernel/1022866/focus=1024624

that depended on those observations.

NFS presents a worst case, as the standard NFSv3 read and write
operations include timestamps in the result, so every single IO comes
with a stat.  Either you have a clock good enough to give a distinct
timestamp for all of those, or you fall back on a global counter that
ends up serializing all IO.  I think.  I admit I'm not sure I understand
your proposal below.

--b.

> 
> 
> For instance, if a machine is idle, except for writing to a single file
> once a second, 1s resolution suffices.
> 
> If a machine is idle, except for writing to the same file 1000 times per
> second, and no one is watching it, 1s still suffices (inode is dirtied
> once per second).
> 
> Any time two files are touched in the same second, the second one (and
> later files) needs jiffies resolution. Similarly, any time two files are
> touched in the same jiffy, the second one should use gtod().
> 
> The global status bits needed to track this could be managed fairly
> efficiently with cmpxchg.
> 
> (Arguably, we should supply > 1s resolution whether they're strictly
> needed or not on filesystems with nanosecond support, so that people
> casually inspecting timestamps don't wonder where their nanoseconds
> went.)
> 
> -- 
> Mathematics is the supreme nostalgia of our time.
> 
> 
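The escalation rule quoted above (seconds, then jiffies, then gtod) can be
modelled roughly as follows. This is a single-threaded sketch, not kernel
code. HZ = 100, the microsecond gtod granularity, and the names Quantum and
AdaptiveClock are assumptions made for illustration. The cmpxchg-managed
status bits are replaced by plain attributes, and the "skip the update if
nobody called stat" optimisation is left out.

```python
NSEC_PER_SEC = 1_000_000_000
HZ = 100                          # assumed tick rate
JIFFY_NS = NSEC_PER_SEC // HZ     # width of one jiffy in nanoseconds
GTOD_NS = 1_000                   # assumed gettimeofday granularity (1 us)


class Quantum:
    """Tracks whether more than one file was touched in the current quantum."""

    def __init__(self, width_ns):
        self.width_ns = width_ns
        self.index = None         # which quantum we are currently in
        self.first_file = None    # first file touched in that quantum
        self.contended = False    # True once a second file shows up

    def touch(self, file_id, now_ns):
        index = now_ns // self.width_ns
        if index != self.index:
            # New quantum: this file is the first one touched in it.
            self.index, self.first_file, self.contended = index, file_id, False
        elif file_id != self.first_file:
            # A different file in the same quantum: stay contended until
            # the quantum rolls over.
            self.contended = True
        return self.contended


class AdaptiveClock:
    """Hands out timestamps no finer than the observed contention requires."""

    def __init__(self, min_gran_ns=1):
        self.second = Quantum(NSEC_PER_SEC)
        self.jiffy = Quantum(JIFFY_NS)
        self.min_gran_ns = min_gran_ns   # stands in for min(s_time_gran)

    def stamp(self, file_id, now_ns):
        sec_contended = self.second.touch(file_id, now_ns)
        jiffy_contended = self.jiffy.touch(file_id, now_ns)
        if not sec_contended:
            gran = NSEC_PER_SEC
        elif not jiffy_contended:
            gran = JIFFY_NS
        else:
            gran = GTOD_NS
        gran = max(gran, self.min_gran_ns)
        return now_ns - now_ns % gran
```

In this model the first file touched in a second gets a whole-second stamp,
a second file in that second gets a jiffy-truncated stamp, and a further file
inside the same jiffy gets a microsecond-truncated stamp. A real
implementation would need the shared state to be updated atomically, which is
where the cmpxchg suggestion comes in.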

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nanosecond fs timestamp support: sad
  2011-07-23  1:38                     ` J. Bruce Fields
@ 2011-07-23  2:10                       ` Trond Myklebust
  2011-07-24  1:56                       ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: Trond Myklebust @ 2011-07-23  2:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Matt Mackall, NeilBrown, Andi Kleen, linux-fsdevel,
	Linux Kernel Mailing List

On Fri, 2011-07-22 at 21:38 -0400, J. Bruce Fields wrote: 
> On Fri, Jul 22, 2011 at 07:07:41PM -0500, Matt Mackall wrote:
> > On Sat, 2011-07-23 at 08:59 +1000, NeilBrown wrote:
> > > On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > > wrote:
> > > 
> > > > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > > > > 
> > > > > > > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > > > > > > 
> > > > > > > I don't see anything that looks like a version or generation number in
> > > > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > > > Pointer?
> > > > > > 
> > > > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > > > I think it should be added there though. We still have some unused 
> > > > > > fields.
> > > > > 
> > > > > But last I checked I thought it was only ext4 that actually incremented
> > > > > the i_version on IO, and even then only when given a (non-default) mount
> > > > > option.
> > > > > 
> > > > > My notes on what needs to be done there:
> > > > > 
> > > > > 	- collect data to determine whether turning on i_version causes
> > > > > 	  any significant performance regressions.
> > > > > 		- Last I talked to him, Ted Tso recommended running
> > > > > 		  Bonnie on a local disk, since it does a lot of little
> > > > > 		  writes, which is somewhat of a worst case, as it will
> > > > > 		  generate extra metadata updates for each write.
> > > > > 		  Compare total wall-clock time, number of iops, and
> > > > > 		  number of bytes (using some kind of block tracing).
> > > > > 	- If there aren't any problems, turn it on by default, and we're
> > > > > 	  done.
> > > > 
> > > > > (Well, and talk the other filesystem implementors into doing it.)
> > > > 
> > > 
> > > But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> > > more-generally-useful precise timestamps?
> > 
> > In theory, a microsecond timestamp (ie gtod) may already not be good
> > enough for all applications. But i_version also doesn't allow comparing
> > across files.
> > 
> > > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > > them work well.
> > > ??
> > > 
> > > The timestamp used doesn't need to update every nanosecond.  I think if it
> > > were just updated on every userspace->kernel transition  (or effective
> > > equivalents inside kernel threads) that would be enough to capture all
> > > causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> > > could help - doesn't it keep track of when threads schedule ... or something?
> > 
> > Sort of.
> > 
> > Some observations:
> > 
> > - we only need to go to higher resolution when two events happen in the
> > same time quantum
> > - this applies at both the level of seconds and jiffies
> > - if the only file touched in a given quantum gets touched ago, we don't
> > need to update its timestamp if stat wasn't also called on it in this
> > quantum
> > - we never need to use a higher resolution than the global
> > min(s_time_gran)
> 
> Right, so there was a rough algorithm hashed out somewhere around here:
> 
> 	http://thread.gmane.org/gmane.linux.kernel/1022866/focus=1024624
> 
> that depended on those observations.
> 
> NFS presents a worst-case as the standard NFSv3 read and write
> operations include timestamps in the result.  So every single IO comes
> with a stat.  So either you have a clock good enough to give a distinct
> timestamp for all of those, or you fall back on a global counter that
> ends up serializing all IO.  I think.  I admit I'm not sure I understand
> your proposal below.

...or you admit that NFSv3 is no longer able to keep up with modern
processing speeds and storage, and you ditch it in favour of NFSv4.

Time-stamps are _not_ the optimal way to label changes in a clustered
environment (or even a multi-cpu/multi-core environment): aside from the
various issues involving absolute time vs. wall clock time, you also
have to deal with clock synchronisation across those nodes/cpus/cores at
the < microsecond resolution level. Have fun doing that...

   Trond


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nanosecond fs timestamp support: sad
  2011-07-23  1:38                     ` J. Bruce Fields
  2011-07-23  2:10                       ` Trond Myklebust
@ 2011-07-24  1:56                       ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2011-07-24  1:56 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Matt Mackall, NeilBrown, Andi Kleen, linux-fsdevel,
	Linux Kernel Mailing List

> with a stat.  So either you have a clock good enough to give a distinct
> timestamp for all of those, or you fall back on a global counter that
> ends up serializing all IO.  I think.  I admit I'm not sure I understand

Not a global counter, but a per-inode one. That's very reasonable because
there's already locking at the inode level.
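
As a rough userspace illustration of that point (not kernel code): if every
write already takes a per-inode lock, bumping a per-inode counter under that
same lock adds no new cross-inode serialization. The Inode class and its
field names below are hypothetical; threading.Lock stands in for whatever
lock the filesystem already holds on the inode.

```python
import threading


class Inode:
    """Toy inode: one lock, one data buffer, one change counter."""

    def __init__(self):
        self.lock = threading.Lock()   # stand-in for the existing inode lock
        self.data = bytearray()
        self.version = 0               # per-inode change counter

    def write(self, payload: bytes) -> int:
        # The counter is bumped inside the critical section the write
        # needs anyway, so writers to different inodes never contend.
        with self.lock:
            self.data += payload
            self.version += 1
            return self.version
```

Each write to a given inode then returns a distinct, increasing version,
while writes to other inodes proceed independently. As noted earlier in the
thread, such a counter cannot be compared across files.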

-Andi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nanosecond fs timestamp support: sad
  2011-07-22 22:59                 ` NeilBrown
                                     ` (2 preceding siblings ...)
  2011-07-23  1:13                   ` Andreas Dilger
@ 2011-07-25 15:09                   ` Paul E. McKenney
  3 siblings, 0 replies; 21+ messages in thread
From: Paul E. McKenney @ 2011-07-25 15:09 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, Andi Kleen, Matt Mackall, linux-fsdevel,
	Linux Kernel Mailing List

On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > > 
> > > > > > Not sure what you mean?  It's in stat(2), just like the timestamps.
> > > > > 
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > > 
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused 
> > > > fields.
> > > 
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > > 
> > > My notes on what needs to be done there:
> > > 
> > > 	- collect data to determine whether turning on i_version causes
> > > 	  any significant performance regressions.
> > > 		- Last I talked to him, Ted Tso recommended running
> > > 		  Bonnie on a local disk, since it does a lot of little
> > > 		  writes, which is somewhat of a worst case, as it will
> > > 		  generate extra metadata updates for each write.
> > > 		  Compare total wall-clock time, number of iops, and
> > > 		  number of bytes (using some kind of block tracing).
> > > 	- If there aren't any problems, turn it on by default, and we're
> > > 	  done.
> > 
> > > (Well, and talk the other filesystem implementors into doing it.)
> > 
> 
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
> 
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
> 
> The timestamp used doesn't need to update every nanosecond.  I think if it
> were just updated on every userspace->kernel transition  (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?

RCU does track thread scheduling, but currently only pays attention to
it if there is an RCU grace period in progress.  It would be easy to
make it track more precisely, though, if that would help something.

That said, I suspect that Peter Zijlstra would be extremely unhappy with
any proposed change that (say) acquired a global lock on every thread
schedule.  And I don't believe that he would be all that happy even with a
change that added a non-global lock acquisition to each context switch...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nanosecond fs timestamp support: sad
  2011-07-23  0:07                   ` Matt Mackall
  2011-07-23  1:38                     ` J. Bruce Fields
@ 2011-07-29 19:49                     ` Pavel Machek
  2011-07-29 21:37                       ` Matt Mackall
  1 sibling, 1 reply; 21+ messages in thread
From: Pavel Machek @ 2011-07-29 19:49 UTC (permalink / raw)
  To: Matt Mackall
  Cc: NeilBrown, J. Bruce Fields, Andi Kleen, linux-fsdevel,
	Linux Kernel Mailing List

Hi!

> > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > them work well.
> > ??
> > 
> > The timestamp used doesn't need to update every nanosecond.  I think if it
> > were just updated on every userspace->kernel transition  (or effective
> > equivalents inside kernel threads) that would be enough to capture all
> > causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> > could help - doesn't it keep track of when threads schedule ... or something?
> 
> Sort of.
> 
> Some observations:
> 
> - we only need to go to higher resolution when two events happen in the
> same time quantum
> - this applies at both the level of seconds and jiffies
> - if the only file touched in a given quantum gets touched ago, we don't
> need to update its timestamp if stat wasn't also called on it in this
> quantum

parse error around 'ago'.

> - we never need to use a higher resolution than the global
> min(s_time_gran)
> 
> 
> For instance, if a machine is idle, except for writing to a single file
> once a second, 1s resolution suffices.

Are you sure? As soon as you get network communication...

> Any time two files are touched in the same second, the second one (and
> later files) needs jiffies resolution. Similarly, any time two files are
> touched in the same jiffy, the second one should use gtod().

That holds for make, but I don't see how it is globally true.

If I run

( date; > stamp; date ) | ( sleep 5; cat > counterexample )

I know the timestamp on 'stamp' should be between the two dates, but it is not.

								Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nanosecond fs timestamp support: sad
  2011-07-29 19:49                     ` Pavel Machek
@ 2011-07-29 21:37                       ` Matt Mackall
  0 siblings, 0 replies; 21+ messages in thread
From: Matt Mackall @ 2011-07-29 21:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: NeilBrown, J. Bruce Fields, Andi Kleen, linux-fsdevel,
	Linux Kernel Mailing List

On Fri, 2011-07-29 at 21:49 +0200, Pavel Machek wrote:
> Hi!
> 
> > > If not, we probably should tell NFSv4 to use timestamps and focus on making
> > > them work well.
> > > ??
> > > 
> > > The timestamp used doesn't need to update every nanosecond.  I think if it
> > > were just updated on every userspace->kernel transition  (or effective
> > > equivalents inside kernel threads) that would be enough to capture all
> > > causality.  I wonder how that would be achieved..  I wonder if RCU machinery
> > > could help - doesn't it keep track of when threads schedule ... or something?
> > 
> > Sort of.
> > 
> > Some observations:
> > 
> > - we only need to go to higher resolution when two events happen in the
> > same time quantum
> > - this applies at both the level of seconds and jiffies
> > - if the only file touched in a given quantum gets touched ago, we don't
> > need to update its timestamp if stat wasn't also called on it in this
> > quantum
> 
> parse error aroound 'ago'.

This should read:

- if only one file is touched in a given quantum, we don't need to
update its timestamp if stat wasn't called on it in the same quantum

> > - we never need to use a higher resolution than the global
> > min(s_time_gran)
> > 
> > 
> > For instance, if a machine is idle, except for writing to a single file
> > once a second, 1s resolution suffices.
> 
> Are you sure? As soon as you get network communication...

I don't think you can generally compare filesystem timestamps to other
time sources reliably. For instance, network filesystems might have
their own notions of current time.

> > Any time two files are touched in the same second, the second one (and
> > later files) needs jiffies resolution. Similarly, any time two files are
> > touched in the same jiffy, the second one should use gtod().
> 
> For make. I don't see how this is globally true.
> 
> I do
> 
> ( date; > stamp; date ) | ( sleep 5; cat > counterexample )
> 
> I know timestamp should be between two dates, but it is not.

You're claiming the timestamp on 'stamp' should be strictly between the
two dates reported?

This is true today if and only if you measure in seconds (and your
filesystem's clock is synced with your local clock). If you measure at
a resolution finer than the filesystem's (currently limited to
jiffies), it will be wrong even on a local filesystem.
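
A small userspace experiment along these lines can show the effect. The
sketch below touches a temporary file repeatedly, counts the distinct mtimes
it sees, and counts how often the recorded mtime is earlier than a clock
reading taken just before the touch. The function name sample_mtimes and the
iteration count are arbitrary choices, and the numbers will depend on the
kernel, the filesystem's timestamp granularity, and where the temporary file
lives, so no particular output is claimed.

```python
import os
import tempfile
import time


def sample_mtimes(updates=2000):
    """Touch one file repeatedly.

    Returns (distinct, stale): the number of distinct mtimes observed, and
    the number of touches whose mtime is earlier than the wall-clock
    reading taken immediately before the touch.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        seen = set()
        stale = 0
        for _ in range(updates):
            before = time.time_ns()          # fine-grained clock reading
            os.utime(path)                   # set atime/mtime to "now"
            mtime = os.stat(path).st_mtime_ns
            seen.add(mtime)
            if mtime < before:
                stale += 1                   # fs time lags the precise clock
        return len(seen), stale
    finally:
        os.unlink(path)


if __name__ == "__main__":
    distinct, stale = sample_mtimes()
    print(f"distinct mtimes: {distinct}, mtimes older than 'before': {stale}")
```

A nonzero stale count is the situation in the counterexample: the file's
timestamp falls before a time that was read earlier, because the two clocks
do not share a resolution.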

-- 
Mathematics is the supreme nostalgia of our time.



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-07-29 21:37 UTC | newest]

Thread overview: 21+ messages
2011-07-21 18:07 Nanosecond fs timestamp support: sad Matt Mackall
2011-07-22  6:01 ` Andi Kleen
2011-07-22  6:33   ` NeilBrown
2011-07-22 19:34     ` Matt Mackall
2011-07-22 20:59       ` Andi Kleen
2011-07-22 21:11         ` Matt Mackall
2011-07-22 21:47           ` Andi Kleen
2011-07-22 22:10             ` J. Bruce Fields
2011-07-22 22:31               ` J. Bruce Fields
2011-07-22 22:59                 ` NeilBrown
2011-07-22 23:06                   ` J. Bruce Fields
2011-07-22 23:49                     ` J. Bruce Fields
2011-07-23  0:07                       ` NeilBrown
2011-07-23  0:07                   ` Matt Mackall
2011-07-23  1:38                     ` J. Bruce Fields
2011-07-23  2:10                       ` Trond Myklebust
2011-07-24  1:56                       ` Andi Kleen
2011-07-29 19:49                     ` Pavel Machek
2011-07-29 21:37                       ` Matt Mackall
2011-07-23  1:13                   ` Andreas Dilger
2011-07-25 15:09                   ` Paul E. McKenney
