From: Neil Brown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
	"Patrick J. LoPresti" <lopresti@gmail.com>,
	Andi Kleen <andi@firstfloor.org>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: Proposal: Use hi-res clock for file timestamps
Date: Wed, 18 Aug 2010 15:53:59 +1000
Message-ID: <20100818155359.66b9ddb6@notabene>
In-Reply-To: <20100817192937.GD26609@fieldses.org>

On Tue, 17 Aug 2010 15:29:38 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Aug 17, 2010 at 08:39:41PM +0100, Alan Cox wrote:
> > > The problem with "increment mtime by a nanosecond when necessary" is
> > > that timestamps can wind up out of order.  As in:
> > 
> > Surely that depends on your implementation ?
> > 
> > > 1) Do a bunch of operations on file A
> > > 2) Do one operation on file B
> > > 
> > > Imagine each operation on A incrementing its timestamp by a nanosecond
> > > "just because".  If all of these operations happen in less than 4 ms,
> > > you can wind up with the timestamp on B being EARLIER than the
> > > timestamp on A.  That is a big no-no (think "make" or anything else
> > > relying on timestamps for relative times).
> > 
> > 
> > [time resolution bits of data][value incremented value for that time]
> > 
> > 
> > 	if (time_now == time_last)
> > 		return { time_last , ++ct };
> > 	else {
> > 		ct = 0;
> > 		time_last = time_now;
> > 		return { time_last , 0 };
> > 	}
> > 
> > providing it is done with the same 'ct' across the fs and you can't do
> > enough ops/second to wrap the nanosecs - which should be fine for now,
> > your ordering is still safe is it not ?
> 
> Right, so if I understand correctly, you're proposing a time source
> that's global to the filesystem and that guarantees it will always
> return a unique value by incrementing the nanoseconds field if jiffies
> haven't changed since the last time it was called.
> 
> (Does it really need to be global across all filesystems?  Or is it
> unreasonable to expect your unbelievably-fast make's to behave well when
> sources and targets live on different filesystems?)
>
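
For reference, the scheme quoted above can be sketched in plain userspace C
roughly as follows.  This is only an illustration, not code from any tree:
the names fs_stamp, next_stamp and stamp_cmp are made up here, a pthread
mutex stands in for whatever atomic primitive a real implementation would
use, and the coarse clock is assumed never to step backwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stamp: a coarse clock value plus a per-tick counter. */
struct fs_stamp {
	uint64_t tick;	/* coarse clock value, e.g. jiffies */
	uint32_t ct;	/* stamps already handed out in this tick */
};

/* Shared state; the mutex is the "atomic penalty" paid on every stamp. */
static pthread_mutex_t stamp_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t time_last;
static uint32_t ct;

/* Return a stamp that orders after every stamp returned earlier. */
struct fs_stamp next_stamp(uint64_t time_now)
{
	struct fs_stamp s;

	pthread_mutex_lock(&stamp_lock);
	/* Assumption of this sketch: the coarse clock is monotonic. */
	assert(time_now >= time_last);
	if (time_now == time_last) {
		ct++;
	} else {
		ct = 0;
		time_last = time_now;
	}
	s.tick = time_last;
	s.ct = ct;
	pthread_mutex_unlock(&stamp_lock);
	return s;
}

/* Compare by tick first, then by counter within the tick. */
int stamp_cmp(struct fs_stamp a, struct fs_stamp b)
{
	if (a.tick != b.tick)
		return a.tick < b.tick ? -1 : 1;
	if (a.ct != b.ct)
		return a.ct < b.ct ? -1 : 1;
	return 0;
}
```

Every caller takes the same lock, which is the cost discussed below.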

I'm not sure you even want to pay for a per-filesystem atomic access when
updating mtime.  mnt_want_write - called at the same time - seems to go to
some lengths to avoid an atomic operation.

I think that nfsd should be the only place that has to pay the atomic
penalty, as it is where the need is.

I imagine something like this:
 - Create a global struct timespec which is protected by a seqlock.
   Call it current_nfsd_time or similar.
 - file_update_time reads this and uses it if it is newer than
   current_fs_time.
 - nfsd updates it whenever it reads an mtime out of an inode that matches
   current_fs_time to the granularity of 1/HZ.
   If the current value is before current_kernel_time, it is set to
   current_kernel_time; otherwise tv_nsec is incremented, unless that
   would push it more than jiffies_to_usecs(1)*1000 nanoseconds beyond
   current_kernel_time.
 - the global 'struct timespec' is zeroed whenever system time is set
   backwards.

Then - providing the fs stores nanosecond timestamps - we should have stable,
globally ordered, precise (if not entirely accurate) time stamps, and a
penalty would only be paid when nfsd actually needs the information.
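
A rough userspace sketch of the steps above, to make the intended behaviour
concrete.  It is not kernel code: a pthread mutex stands in for the seqlock,
the current time and the tick length are passed in as arguments, and the
function names pick_mtime, nfsd_bump_time and nfsd_time_reset are invented
for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* The mutex is a stand-in for the seqlock; zero means "unset". */
static pthread_mutex_t nfsd_time_lock = PTHREAD_MUTEX_INITIALIZER;
static struct timespec current_nfsd_time;

static int ts_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec ? -1 : 1;
	if (a.tv_nsec != b.tv_nsec)
		return a.tv_nsec < b.tv_nsec ? -1 : 1;
	return 0;
}

/* Add less than one second's worth of nanoseconds, carrying into tv_sec. */
static struct timespec ts_add_ns(struct timespec t, long ns)
{
	assert(ns >= 0 && ns < NSEC_PER_SEC);
	t.tv_nsec += ns;
	if (t.tv_nsec >= NSEC_PER_SEC) {
		t.tv_nsec -= NSEC_PER_SEC;
		t.tv_sec++;
	}
	return t;
}

/* file_update_time side: use the nfsd time only if it is newer. */
struct timespec pick_mtime(struct timespec fs_now)
{
	struct timespec t;

	pthread_mutex_lock(&nfsd_time_lock);
	t = current_nfsd_time;
	pthread_mutex_unlock(&nfsd_time_lock);
	return ts_cmp(t, fs_now) > 0 ? t : fs_now;
}

/*
 * nfsd side: meant to be called when an mtime read from an inode falls in
 * the current tick.  tick_ns is the length of one jiffy in nanoseconds.
 */
void nfsd_bump_time(struct timespec kernel_now, long tick_ns)
{
	struct timespec next, limit;

	pthread_mutex_lock(&nfsd_time_lock);
	if (ts_cmp(current_nfsd_time, kernel_now) < 0) {
		current_nfsd_time = kernel_now;
	} else {
		next = ts_add_ns(current_nfsd_time, 1);
		limit = ts_add_ns(kernel_now, tick_ns);
		/* Never run more than one tick ahead of the real clock. */
		if (ts_cmp(next, limit) <= 0)
			current_nfsd_time = next;
	}
	pthread_mutex_unlock(&nfsd_time_lock);
}

/* Called when the system time is set backwards. */
void nfsd_time_reset(void)
{
	pthread_mutex_lock(&nfsd_time_lock);
	current_nfsd_time.tv_sec = 0;
	current_nfsd_time.tv_nsec = 0;
	pthread_mutex_unlock(&nfsd_time_lock);
}
```

In this sketch only nfsd_bump_time writes the shared value; with a real
seqlock the read in pick_mtime would normally need no atomic operation.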


[[You could probably make ext3 work reasonably well by adding a mount option
  which:
    - advertises s_time_gran as 1
    - when storing: rounds timestamps up to the next second if tv_nsec != 0
    - when loading: sets the timestamp to the current time if the stored
      number matches current_kernel_time().tv_sec+1
  You would get occasional forward jumps in mtime, but usually when you
  aren't looking, and at least you would not get real changes that are not
  reflected in mtime.
]]
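
The store/load rule in the aside above could look something like this.
Again only a sketch: ext3_store_sec and ext3_load_ts are hypothetical names,
not existing ext3 functions, and the current time is passed in rather than
read from current_kernel_time().

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Store path: only whole seconds fit on disk, so round up, never down. */
time_t ext3_store_sec(struct timespec ts)
{
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
	return ts.tv_nsec != 0 ? ts.tv_sec + 1 : ts.tv_sec;
}

/*
 * Load path: a stored value one second ahead of "now" was rounded up
 * during the current second, so report the current time instead.
 */
struct timespec ext3_load_ts(time_t stored_sec, struct timespec now)
{
	struct timespec ts = { .tv_sec = stored_sec, .tv_nsec = 0 };

	if (stored_sec == now.tv_sec + 1)
		return now;
	return ts;
}
```

A file written at 10.000000500 is stored as 11; a load at 10.000000900
reports 10.000000900, and a load at 12.0 reports 11.0, which is the
occasional forward jump mentioned above.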

NeilBrown


