From: Neil Brown <neilb@suse.de>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	"Patrick J. LoPresti" <lopresti@gmail.com>,
	Andi Kleen <andi@firstfloor.org>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: Proposal: Use hi-res clock for file timestamps
Date: Thu, 19 Aug 2010 09:41:36 +1000
Message-ID: <20100819094136.24fef59b@notabene>
In-Reply-To: <0F91AB9D-0E14-4384-ADD6-0A467C3ABFAC@oracle.com>

On Wed, 18 Aug 2010 14:15:51 -0400
Chuck Lever <chuck.lever@oracle.com> wrote:

> 
> On Aug 18, 2010, at 1:32 PM, J. Bruce Fields wrote:
> 
> > On Wed, Aug 18, 2010 at 03:53:59PM +1000, Neil Brown wrote:
> >> I'm not sure you even want to pay for a per-filesystem atomic access when
> >> updating mtime.  mnt_want_write - called at the same time - seems to go to
> >> some lengths to avoid an atomic operation.
> >> 
> >> I think that nfsd should be the only place that has to pay the atomic
> >> penalty, as it is where the need is.
> >> 
> >> I imagine something like this:
> >> - Create a global struct timespec which is protected by a seqlock
> >>   Call it current_nfsd_time or similar.
> >> - file_update_time reads this and uses it if it is newer than
> >>   current_fs_time.
> >> - nfsd updates it whenever it reads an mtime out of an inode that matches
> >>   current_fs_time to the granularity of 1/HZ.
> > 
> > We can also skip the update whenever current_nfsd_time is greater than
> > the inode's mtime--that's enough to ensure that the next
> > file_update_time() call will get a time different from the inode's
> > current mtime.
> 
> Would it help if we only did this for directories, for now?
> 
> Files have close-to-open.  Directories... don't.  So we have the problem
> where directory changes (i.e. file creation and deletion) take a long time
> (sometimes an infinitely long time) to propagate to clients.  Plus:
> directories don't change very often, so using fine-grained time stamps
> only on directories wouldn't impact heavy I/O workloads.

I don't quite see how close-to-open really affects this issue - it still
relies on the timestamps and so can cache old data if a file update didn't
change the timestamp.

In my mind the difference is that near-concurrent access to files usually
involves file locking which flushes caches (and if it doesn't then you have
bigger problems) while near-concurrent access to directories relies on the
natural atomicity of dir operations so no locking or flushing occurs.

So I agree that this is probably more of an issue for directories than for
files, and that implementing it just for directories would be a sensible
first step with lower expected overhead - just my reasoning seems to be a bit
different.
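
For reference, the scheme quoted above could be modelled in user-space C
roughly as follows.  This is only a sketch under stated assumptions, not
kernel code: pick_mtime() and nfsd_observe_mtime() are hypothetical names,
TICK_NSEC stands in for 1/HZ, locking is omitted (a kernel version would
take the seqlock), and bumping the global to "mtime + 1ns" is an assumed
choice, since the proposal does not say what value nfsd should store.

```c
#include <time.h>

/* Assumed tick length standing in for 1/HZ (HZ = 250 chosen arbitrarily). */
#define TICK_NSEC 4000000L

/*
 * Stand-in for the proposed global.  A kernel version would guard it with
 * a seqlock; this single-threaded model omits locking entirely.
 */
static struct timespec current_nfsd_time;

/* Three-way compare: <0, 0, >0 for a before, equal to, after b. */
static int ts_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec ? -1 : 1;
	if (a.tv_nsec != b.tv_nsec)
		return a.tv_nsec < b.tv_nsec ? -1 : 1;
	return 0;
}

/* Model of current_fs_time: truncate a timestamp to tick granularity. */
static struct timespec coarse(struct timespec t)
{
	t.tv_nsec -= t.tv_nsec % TICK_NSEC;
	return t;
}

/* Write side (models file_update_time): use the newer of the two clocks. */
static struct timespec pick_mtime(struct timespec now)
{
	struct timespec fs = coarse(now);

	return ts_cmp(current_nfsd_time, fs) > 0 ? current_nfsd_time : fs;
}

/* Read side (models nfsd reading an mtime out of an inode). */
static void nfsd_observe_mtime(struct timespec mtime, struct timespec now)
{
	/* mtime is from an earlier tick: the next write differs anyway. */
	if (ts_cmp(coarse(mtime), coarse(now)) != 0)
		return;
	/* The global is already past this mtime, so the update can be skipped. */
	if (ts_cmp(current_nfsd_time, mtime) > 0)
		return;
	/* Assumed policy: move the global 1ns past the mtime just exposed. */
	current_nfsd_time = mtime;
	if (++current_nfsd_time.tv_nsec >= 1000000000L) {
		current_nfsd_time.tv_sec++;
		current_nfsd_time.tv_nsec = 0;
	}
}
```

In this model, a write that lands in the same tick as an mtime nfsd has
already handed out gets a timestamp 1ns later, while writers that nfsd
never looks at only ever pay for a read of the global.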

Thanks,
NeilBrown
