[RFC] F_SETLEASE mess

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] F_SETLEASE mess
@ 2013-07-05  9:04 Al Viro
  2013-07-05 10:51 ` Jeff Layton
  0 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2013-07-05  9:04 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-fsdevel, Linus Torvalds

	generic_add_lease() with F_WRLCK checks for other openers
in a very crude way - it wants no extra references to dentry (thus
excluding other struct file pointing to it) *and* no extra references
to in-core inode, excluding openers of other links.  It fails with
EAGAIN if those conditions are not met.

	The way it deals with another open(2) racing with it (i.e.
managing to squeeze between the check and locks_insert_lock()) is
theoretically racy; do_dentry_open() would spin on ->i_lock, all
right, but... only if there already is something in inode->i_flock.
If this is the first lease/lock being set, break_lease() will do
nothing, rather than call __break_lease() and spin there.

	It's _very_ hard to hit; we are holding ->i_lock and thus can't
be preempted, so open(2) would have to get *everything* (pathname
lookup, etc.) done in a very narrow window.  So I don't believe it's
exploitable, but it really smells bad.  The check is extremely crude
and if nothing else it's a DoS fodder - a luser that keeps hitting that
file with stat(2) can prevent F_SETLEASE from succeeding, even though
he wouldn't be able to open the damn thing at all...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-05  9:04 [RFC] F_SETLEASE mess Al Viro
@ 2013-07-05 10:51 ` Jeff Layton
  2013-07-05 12:08   ` Jeff Layton
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2013-07-05 10:51 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Bruce Fields

On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote:
> 	generic_add_lease() with F_WRLCK checks for other openers
> in a very crude way - it wants no extra references to dentry (thus
> excluding other struct file pointing to it) *and* no extra references
> to in-core inode, excluding openers of other links.  It fails with
> EAGAIN if those conditions are not met.
> 
> 	The way it deals with another open(2) racing with it (i.e.
> managing to squeeze between the check and locks_insert_lock()) is
> theoretically racy; do_dentry_open() would spin on ->i_lock, all
> right, but... only if there already is something in inode->i_flock.
> If this is the first lease/lock being set, break_lease() will do
> nothing, rather than call __break_lease() and spin there.
> 
> 	It's _very_ hard to hit; we are holding ->i_lock and thus can't
> be preempted, so open(2) would have to get *everything* (pathname
> lookup, etc.) done in a very narrow window.  So I don't believe it's
> exploitable, but it really smells bad.  The check is extremely crude
> and if nothing else it's a DoS fodder - a luser that keeps hitting that
> file with stat(2) can prevent F_SETLEASE from succeeding, even though
> he wouldn't be able to open the damn thing at all...

(cc'ing Bruce since he's been poking around in this area recently and
might have ideas...)

I see what you mean. I haven't looked closely at this yet, and I'm on
holiday today, but I'll plan to give this more scrutiny when I get back
next week. Giving it an initial look though...

We already hold an extra reference to the dentry via the path_get in
do_dentry_open. So is this race possible if the two tasks are working on
the same dentry? Or does it require a hardlinked inode?

If it's not possible to race on the same dentry, then one possible fix
would be to go ahead and do an extra igrab(inode) in do_dentry_open
before calling break_lease. I'm not particularly fond of that since it
means taking the i_lock an extra time, but it looks like it would close
the race.

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-05 10:51 ` Jeff Layton
@ 2013-07-05 12:08   ` Jeff Layton
  2013-07-05 16:25     ` Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2013-07-05 12:08 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Bruce Fields

On Fri, 2013-07-05 at 06:51 -0400, Jeff Layton wrote:
> On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote:
> > 	generic_add_lease() with F_WRLCK checks for other openers
> > in a very crude way - it wants no extra references to dentry (thus
> > excluding other struct file pointing to it) *and* no extra references
> > to in-core inode, excluding openers of other links.  It fails with
> > EAGAIN if those conditions are not met.
> > 
> > 	The way it deals with another open(2) racing with it (i.e.
> > managing to squeeze between the check and locks_insert_lock()) is
> > theoretically racy; do_dentry_open() would spin on ->i_lock, all
> > right, but... only if there already is something in inode->i_flock.
> > If this is the first lease/lock being set, break_lease() will do
> > nothing, rather than call __break_lease() and spin there.
> > 
> > 	It's _very_ hard to hit; we are holding ->i_lock and thus can't
> > be preempted, so open(2) would have to get *everything* (pathname
> > lookup, etc.) done in a very narrow window.  So I don't believe it's
> > exploitable, but it really smells bad.  The check is extremely crude
> > and if nothing else it's a DoS fodder - a luser that keeps hitting that
> > file with stat(2) can prevent F_SETLEASE from succeeding, even though
> > he wouldn't be able to open the damn thing at all...
> 
> (cc'ing Bruce since he's been poking around in this area recently and
> might have ideas...)
> 
> I see what you mean. I haven't looked closely at this yet, and I'm on
> holiday today, but I'll plan to give this more scrutiny when I get back
> next week. Giving it an initial look though...
> 
> We already hold an extra reference to the dentry via the path_get in
> do_dentry_open. So is this race possible if the two tasks are working on
> the same dentry? Or does it require a hardlinked inode?
> 
> If it's not possible to race on the same dentry, then one possible fix
> would be to go ahead and do an extra igrab(inode) in do_dentry_open
> before calling break_lease. I'm not particularly fond of that since it
> means taking the i_lock an extra time, but it looks like it would close
> the race.
> 


Hrm. I think we'd also need to couple that with an extra check for a
high refcount after doing locks_insert_lock in generic_add_lease, and
then call locks_delete_lock and return -EAGAIN if the counts have
changed.

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-05 12:08   ` Jeff Layton
@ 2013-07-05 16:25     ` Bruce Fields
  2013-07-05 21:46       ` Jeff Layton
  0 siblings, 1 reply; 15+ messages in thread
From: Bruce Fields @ 2013-07-05 16:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Al Viro, linux-fsdevel, Linus Torvalds

> > On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote:
> > > 	generic_add_lease() with F_WRLCK checks for other openers
> > > in a very crude way - it wants no extra references to dentry (thus
> > > excluding other struct file pointing to it) *and* no extra references
> > > to in-core inode, excluding openers of other links.  It fails with
> > > EAGAIN if those conditions are not met.
> > > 
> > > 	The way it deals with another open(2) racing with it (i.e.
> > > managing to squeeze between the check and locks_insert_lock()) is
> > > theoretically racy; do_dentry_open() would spin on ->i_lock, all
> > > right, but... only if there already is something in inode->i_flock.
> > > If this is the first lease/lock being set, break_lease() will do
> > > nothing, rather than call __break_lease() and spin there.
> > > 
> > > 	It's _very_ hard to hit; we are holding ->i_lock and thus can't
> > > be preempted, so open(2) would have to get *everything* (pathname
> > > lookup, etc.) done in a very narrow window.  So I don't believe it's
> > > exploitable, but it really smells bad.  The check is extremely crude
> > > and if nothing else it's a DoS fodder - a luser that keeps hitting that
> > > file with stat(2) can prevent F_SETLEASE from succeeding, even though
> > > he wouldn't be able to open the damn thing at all...

nfsd isn't using write leases yet (I want to get read delegations sorted
out first), and I don't understand Samba's requirements for write
leases.

In the future, when nfsd does write delegations: they're an optional
optimization, and if we're concerned about such a DOS one solution might
be just to change everything to a trylock when acquiring delegations:
it's always acceptable to just fail the delegation.

On Fri, Jul 05, 2013 at 08:08:44AM -0400, Jeff Layton wrote:
> On Fri, 2013-07-05 at 06:51 -0400, Jeff Layton wrote:
> > We already hold an extra reference to the dentry via the path_get in
> > do_dentry_open. So is this race possible if the two tasks are working on
> > the same dentry?

Well, as Al says, the race is roughly:

	take i_lock
	generic_add_lease checks d_count, i_count

		Conflicting opener does "*everything* (pathname lookup,
		etc.)" (including that path_get, and breake_lease()
		(which sees no lock, so doesn't try to get i_lock.))

	...
	locks_insert_lock() adds new lock.

> > Or does it require a hardlinked inode?
> > 
> > If it's not possible to race on the same dentry, then one possible fix
> > would be to go ahead and do an extra igrab(inode)

I think the only reason for this race is the attempt to optimize out an
i_lock acquisition in break_lease.  If we're willing to call igrab
(which also takes the i_lock), then we may as well just take the i_lock.

> > in do_dentry_open
> > before calling break_lease. I'm not particularly fond of that since it
> > means taking the i_lock an extra time, but it looks like it would close
> > the race.
> 
> Hrm. I think we'd also need to couple that with an extra check for a
> high refcount after doing locks_insert_lock in generic_add_lease, and
> then call locks_delete_lock and return -EAGAIN if the counts have
> changed.

... but checking the counts again afterwards might work.  (Dumb
question: in the absence of a lock on the opener's side, are the memory
accesses ordered such that a lease-setter is guaranteed to see the new
counts from an opener that didn't see the new i_flock value?)

How important is the optimization that skips the i_lock in break_lease?

--b.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-05 16:25     ` Bruce Fields
@ 2013-07-05 21:46       ` Jeff Layton
  2013-07-08 14:17         ` Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2013-07-05 21:46 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Al Viro, linux-fsdevel, Linus Torvalds

On Fri, 2013-07-05 at 12:25 -0400, Bruce Fields wrote:
> > > On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote:
> > > > 	generic_add_lease() with F_WRLCK checks for other openers
> > > > in a very crude way - it wants no extra references to dentry (thus
> > > > excluding other struct file pointing to it) *and* no extra references
> > > > to in-core inode, excluding openers of other links.  It fails with
> > > > EAGAIN if those conditions are not met.
> > > > 
> > > > 	The way it deals with another open(2) racing with it (i.e.
> > > > managing to squeeze between the check and locks_insert_lock()) is
> > > > theoretically racy; do_dentry_open() would spin on ->i_lock, all
> > > > right, but... only if there already is something in inode->i_flock.
> > > > If this is the first lease/lock being set, break_lease() will do
> > > > nothing, rather than call __break_lease() and spin there.
> > > > 
> > > > 	It's _very_ hard to hit; we are holding ->i_lock and thus can't
> > > > be preempted, so open(2) would have to get *everything* (pathname
> > > > lookup, etc.) done in a very narrow window.  So I don't believe it's
> > > > exploitable, but it really smells bad.  The check is extremely crude
> > > > and if nothing else it's a DoS fodder - a luser that keeps hitting that
> > > > file with stat(2) can prevent F_SETLEASE from succeeding, even though
> > > > he wouldn't be able to open the damn thing at all...
> 
> nfsd isn't using write leases yet (I want to get read delegations sorted
> out first), and I don't understand Samba's requirements for write
> leases.
> 
> In the future, when nfsd does write delegations: they're an optional
> optimization, and if we're concerned about such a DOS one solution might
> be just to change everything to a trylock when acquiring delegations:
> it's always acceptable to just fail the delegation.
> 

I think that's the case for Samba too. Oplocks are very similar to
delegations, though there are of course some semantic differences.

> On Fri, Jul 05, 2013 at 08:08:44AM -0400, Jeff Layton wrote:
> > On Fri, 2013-07-05 at 06:51 -0400, Jeff Layton wrote:
> > > We already hold an extra reference to the dentry via the path_get in
> > > do_dentry_open. So is this race possible if the two tasks are working on
> > > the same dentry?
> 
> Well, as Al says, the race is roughly:
> 
> 	take i_lock
> 	generic_add_lease checks d_count, i_count
> 
> 		Conflicting opener does "*everything* (pathname lookup,
> 		etc.)" (including that path_get, and breake_lease()
> 		(which sees no lock, so doesn't try to get i_lock.))
> 
> 	...
> 	locks_insert_lock() adds new lock.
> 

Right, I realized that while I was driving earlier. This race is
definitely possible in either case...

> > > Or does it require a hardlinked inode?
> > > 
> > > If it's not possible to race on the same dentry, then one possible fix
> > > would be to go ahead and do an extra igrab(inode)
> 
> I think the only reason for this race is the attempt to optimize out an
> i_lock acquisition in break_lease.  If we're willing to call igrab
> (which also takes the i_lock), then we may as well just take the i_lock.
> 

I had misremembered the function name -- I had meant ihold/iput (which
would commonly be lockless).

The point is moot though since we have a similar problem in the "same
dentry" case anyway. Also the path_get on the other dentry would ensure
that you had a i_count > 1...

So...disregard my silly rumbling about it. :)

> > > in do_dentry_open
> > > before calling break_lease. I'm not particularly fond of that since it
> > > means taking the i_lock an extra time, but it looks like it would close
> > > the race.
> > 
> > Hrm. I think we'd also need to couple that with an extra check for a
> > high refcount after doing locks_insert_lock in generic_add_lease, and
> > then call locks_delete_lock and return -EAGAIN if the counts have
> > changed.
> 
> ... but checking the counts again afterwards might work.  (Dumb
> question: in the absence of a lock on the opener's side, are the memory
> accesses ordered such that a lease-setter is guaranteed to see the new
> counts from an opener that didn't see the new i_flock value?)
> 

I'm not sure. We'd certainly need to audit that and ensure that the
lease setter does see them before we could rely on such a scheme.

> How important is the optimization that skips the i_lock in break_lease?

Again, not sure...but I do shy away a bit from anything that would slow
down opens in the case where leases aren't involved. I'd prefer any
penalty be paid by the setlease caller. Opens are much more common than
leases in almost every workload I can think of.

I think the bigger issue though is that looking at refcounts in order to
determine when we have a conflicting open is just plain wrong. There are
all sorts of reasons one might see a raised refcount that don't involve
conflicting opens (Al's stat() example for instance). It seems like we
ought to shoot for a solution that doesn't rely (solely) on inode and
dentry refcounts.

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-05 21:46       ` Jeff Layton
@ 2013-07-08 14:17         ` Bruce Fields
  2013-07-08 14:33           ` Jeff Layton
  2013-07-08 18:10           ` Myklebust, Trond
  0 siblings, 2 replies; 15+ messages in thread
From: Bruce Fields @ 2013-07-08 14:17 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Al Viro, linux-fsdevel, Linus Torvalds

On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> I think the bigger issue though is that looking at refcounts in order to
> determine when we have a conflicting open is just plain wrong. There are
> all sorts of reasons one might see a raised refcount that don't involve
> conflicting opens (Al's stat() example for instance). It seems like we
> ought to shoot for a solution that doesn't rely (solely) on inode and
> dentry refcounts.

Note that NFSv4 write delegations will need to affect stat as well.
(Once you let a client perform writes locally, that client becomes the
authority on the attributes, so we have to call back to it on stat.)

--b.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 14:17         ` Bruce Fields
@ 2013-07-08 14:33           ` Jeff Layton
  2013-07-08 18:10           ` Myklebust, Trond
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2013-07-08 14:33 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Al Viro, linux-fsdevel, Linus Torvalds

On Mon, 8 Jul 2013 10:17:01 -0400
Bruce Fields <bfields@fieldses.org> wrote:

> On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > I think the bigger issue though is that looking at refcounts in order to
> > determine when we have a conflicting open is just plain wrong. There are
> > all sorts of reasons one might see a raised refcount that don't involve
> > conflicting opens (Al's stat() example for instance). It seems like we
> > ought to shoot for a solution that doesn't rely (solely) on inode and
> > dentry refcounts.
> 
> Note that NFSv4 write delegations will need to affect stat as well.
> (Once you let a client perform writes locally, that client becomes the
> authority on the attributes, so we have to call back to it on stat.)
> 
> --b.

Good point...

I'm a little leery of changing how that check is done anyway. While it
seems intuitive to (re)implement the i_readcount in a similar way to the
i_writecount, I'd be concerned about callers relying on the existing
behavior. So, I'm inclined to try and fix the race that Al has
identified without changing the logic too much.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 14:17         ` Bruce Fields
  2013-07-08 14:33           ` Jeff Layton
@ 2013-07-08 18:10           ` Myklebust, Trond
  2013-07-08 18:53             ` Bruce Fields
  1 sibling, 1 reply; 15+ messages in thread
From: Myklebust, Trond @ 2013-07-08 18:10 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote:
> On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > I think the bigger issue though is that looking at refcounts in order to
> > determine when we have a conflicting open is just plain wrong. There are
> > all sorts of reasons one might see a raised refcount that don't involve
> > conflicting opens (Al's stat() example for instance). It seems like we
> > ought to shoot for a solution that doesn't rely (solely) on inode and
> > dentry refcounts.
> 
> Note that NFSv4 write delegations will need to affect stat as well.
> (Once you let a client perform writes locally, that client becomes the
> authority on the attributes, so we have to call back to it on stat.)

Umm... Yes, but are we ever really going to want to implement that part
of the spec? All the client can tell you is 'this file is dirty' and/or
it can tell you that a size change has occurred.

It's cute that the protocol allows you to do this, but it's not
particularly practical.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 18:10           ` Myklebust, Trond
@ 2013-07-08 18:53             ` Bruce Fields
  2013-07-08 19:21               ` Myklebust, Trond
  0 siblings, 1 reply; 15+ messages in thread
From: Bruce Fields @ 2013-07-08 18:53 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote:
> On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote:
> > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > > I think the bigger issue though is that looking at refcounts in order to
> > > determine when we have a conflicting open is just plain wrong. There are
> > > all sorts of reasons one might see a raised refcount that don't involve
> > > conflicting opens (Al's stat() example for instance). It seems like we
> > > ought to shoot for a solution that doesn't rely (solely) on inode and
> > > dentry refcounts.
> > 
> > Note that NFSv4 write delegations will need to affect stat as well.
> > (Once you let a client perform writes locally, that client becomes the
> > authority on the attributes, so we have to call back to it on stat.)
> 
> Umm... Yes, but are we ever really going to want to implement that part
> of the spec?  All the client can tell you is 'this file is dirty' and/or
> it can tell you that a size change has occurred.
> 
> It's cute that the protocol allows you to do this, but it's not
> particularly practical.

If we don't take some sort of action on stat then I don't see how to
avoid e.g. breaking "make".

(That action doesn't necessarily have to be a CB_GETATTR to the client
and a blind return of the results.)

--b.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 18:53             ` Bruce Fields
@ 2013-07-08 19:21               ` Myklebust, Trond
  2013-07-08 19:34                 ` Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Myklebust, Trond @ 2013-07-08 19:21 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote:
> On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote:
> > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote:
> > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > > > I think the bigger issue though is that looking at refcounts in order to
> > > > determine when we have a conflicting open is just plain wrong. There are
> > > > all sorts of reasons one might see a raised refcount that don't involve
> > > > conflicting opens (Al's stat() example for instance). It seems like we
> > > > ought to shoot for a solution that doesn't rely (solely) on inode and
> > > > dentry refcounts.
> > > 
> > > Note that NFSv4 write delegations will need to affect stat as well.
> > > (Once you let a client perform writes locally, that client becomes the
> > > authority on the attributes, so we have to call back to it on stat.)
> > 
> > Umm... Yes, but are we ever really going to want to implement that part
> > of the spec?  All the client can tell you is 'this file is dirty' and/or
> > it can tell you that a size change has occurred.
> > 
> > It's cute that the protocol allows you to do this, but it's not
> > particularly practical.
> 
> If we don't take some sort of action on stat then I don't see how to
> avoid e.g. breaking "make".

We already cache writes without breaking "make". Why do you think the
presence of a delegation must necessarily change everything?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 19:21               ` Myklebust, Trond
@ 2013-07-08 19:34                 ` Bruce Fields
  2013-07-08 20:14                   ` Myklebust, Trond
  0 siblings, 1 reply; 15+ messages in thread
From: Bruce Fields @ 2013-07-08 19:34 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote:
> On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote:
> > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote:
> > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote:
> > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > > > > I think the bigger issue though is that looking at refcounts in order to
> > > > > determine when we have a conflicting open is just plain wrong. There are
> > > > > all sorts of reasons one might see a raised refcount that don't involve
> > > > > conflicting opens (Al's stat() example for instance). It seems like we
> > > > > ought to shoot for a solution that doesn't rely (solely) on inode and
> > > > > dentry refcounts.
> > > > 
> > > > Note that NFSv4 write delegations will need to affect stat as well.
> > > > (Once you let a client perform writes locally, that client becomes the
> > > > authority on the attributes, so we have to call back to it on stat.)
> > > 
> > > Umm... Yes, but are we ever really going to want to implement that part
> > > of the spec?  All the client can tell you is 'this file is dirty' and/or
> > > it can tell you that a size change has occurred.
> > > 
> > > It's cute that the protocol allows you to do this, but it's not
> > > particularly practical.
> > 
> > If we don't take some sort of action on stat then I don't see how to
> > avoid e.g. breaking "make".
> 
> We already cache writes without breaking "make". Why do you think the
> presence of a delegation must necessarily change everything?

As long as it holds a write delegation a client can delay updating data
or attributes arbitrarily--so if the server continues to return the old
attributes then e.g. "make" could fail to notice an update even long
after no application on any client is accessing the file any more.

Could you explain what I'm missing?

--b.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 19:34                 ` Bruce Fields
@ 2013-07-08 20:14                   ` Myklebust, Trond
  2013-07-08 21:17                     ` Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Myklebust, Trond @ 2013-07-08 20:14 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, 2013-07-08 at 15:34 -0400, Bruce Fields wrote:
> On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote:
> > On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote:
> > > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote:
> > > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote:
> > > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > > > > > I think the bigger issue though is that looking at refcounts in order to
> > > > > > determine when we have a conflicting open is just plain wrong. There are
> > > > > > all sorts of reasons one might see a raised refcount that don't involve
> > > > > > conflicting opens (Al's stat() example for instance). It seems like we
> > > > > > ought to shoot for a solution that doesn't rely (solely) on inode and
> > > > > > dentry refcounts.
> > > > > 
> > > > > Note that NFSv4 write delegations will need to affect stat as well.
> > > > > (Once you let a client perform writes locally, that client becomes the
> > > > > authority on the attributes, so we have to call back to it on stat.)
> > > > 
> > > > Umm... Yes, but are we ever really going to want to implement that part
> > > > of the spec?  All the client can tell you is 'this file is dirty' and/or
> > > > it can tell you that a size change has occurred.
> > > > 
> > > > It's cute that the protocol allows you to do this, but it's not
> > > > particularly practical.
> > > 
> > > If we don't take some sort of action on stat then I don't see how to
> > > avoid e.g. breaking "make".
> > 
> > We already cache writes without breaking "make". Why do you think the
> > presence of a delegation must necessarily change everything?
> 
> As long as it holds a write delegation a client can delay updating data
> or attributes arbitrarily--so if the server continues to return the old
> attributes then e.g. "make" could fail to notice an update even long
> after no application on any client is accessing the file any more.
> 
> Could you explain what I'm missing?

An explanation for why the client would want to do this.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 20:14                   ` Myklebust, Trond
@ 2013-07-08 21:17                     ` Bruce Fields
  2013-07-08 22:25                       ` Myklebust, Trond
  0 siblings, 1 reply; 15+ messages in thread
From: Bruce Fields @ 2013-07-08 21:17 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, Jul 08, 2013 at 08:14:05PM +0000, Myklebust, Trond wrote:
> On Mon, 2013-07-08 at 15:34 -0400, Bruce Fields wrote:
> > On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote:
> > > On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote:
> > > > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote:
> > > > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote:
> > > > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > > > > > > I think the bigger issue though is that looking at refcounts in order to
> > > > > > > determine when we have a conflicting open is just plain wrong. There are
> > > > > > > all sorts of reasons one might see a raised refcount that don't involve
> > > > > > > conflicting opens (Al's stat() example for instance). It seems like we
> > > > > > > ought to shoot for a solution that doesn't rely (solely) on inode and
> > > > > > > dentry refcounts.
> > > > > > 
> > > > > > Note that NFSv4 write delegations will need to affect stat as well.
> > > > > > (Once you let a client perform writes locally, that client becomes the
> > > > > > authority on the attributes, so we have to call back to it on stat.)
> > > > > 
> > > > > Umm... Yes, but are we ever really going to want to implement that part
> > > > > of the spec?  All the client can tell you is 'this file is dirty' and/or
> > > > > it can tell you that a size change has occurred.
> > > > > 
> > > > > It's cute that the protocol allows you to do this, but it's not
> > > > > particularly practical.
> > > > 
> > > > If we don't take some sort of action on stat then I don't see how to
> > > > avoid e.g. breaking "make".
> > > 
> > > We already cache writes without breaking "make". Why do you think the
> > > presence of a delegation must necessarily change everything?
> > 
> > As long as it holds a write delegation a client can delay updating data
> > or attributes arbitrarily--so if the server continues to return the old
> > attributes then e.g. "make" could fail to notice an update even long
> > after no application on any client is accessing the file any more.
> > 
> > Could you explain what I'm missing?
> 
> An explanation for why the client would want to do this.

To reduce the latency of the close operation?  The RFCs clearly permit
it.

But I know you know that, so I think you're saying that the mechanisms
described in the rfc don't work as the authors seemed to intend, and any
careful client will be required to do something on close anyway.

Is that correct?  Could you explain your thinking?

I believe you--and I'll be delighted to get out of having to hook
stat--but I don't see it yet.

It looks to me like the server can implement write delegations by either
1) breaking delegations on every stat or 2) doing a cb_getattr on stat
and then recalling only if the return indicates the client has dirty
data.

Is your argument that this will require breaking delegations too
frequently to make write delegations practical, hence that clients and
server will still have to depend on clients doing some minimal update of
the file on close?

--b.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 21:17                     ` Bruce Fields
@ 2013-07-08 22:25                       ` Myklebust, Trond
  2013-07-08 23:19                         ` Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Myklebust, Trond @ 2013-07-08 22:25 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, 2013-07-08 at 17:17 -0400, Bruce Fields wrote:
> On Mon, Jul 08, 2013 at 08:14:05PM +0000, Myklebust, Trond wrote:
> > On Mon, 2013-07-08 at 15:34 -0400, Bruce Fields wrote:
> > > On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote:
> > > > On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote:
> > > > > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote:
> > > > > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote:
> > > > > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote:
> > > > > > > > I think the bigger issue though is that looking at refcounts in order to
> > > > > > > > determine when we have a conflicting open is just plain wrong. There are
> > > > > > > > all sorts of reasons one might see a raised refcount that don't involve
> > > > > > > > conflicting opens (Al's stat() example for instance). It seems like we
> > > > > > > > ought to shoot for a solution that doesn't rely (solely) on inode and
> > > > > > > > dentry refcounts.
> > > > > > > 
> > > > > > > Note that NFSv4 write delegations will need to affect stat as well.
> > > > > > > (Once you let a client perform writes locally, that client becomes the
> > > > > > > authority on the attributes, so we have to call back to it on stat.)
> > > > > > 
> > > > > > Umm... Yes, but are we ever really going to want to implement that part
> > > > > > of the spec?  All the client can tell you is 'this file is dirty' and/or
> > > > > > it can tell you that a size change has occurred.
> > > > > > 
> > > > > > It's cute that the protocol allows you to do this, but it's not
> > > > > > particularly practical.
> > > > > 
> > > > > If we don't take some sort of action on stat then I don't see how to
> > > > > avoid e.g. breaking "make".
> > > > 
> > > > We already cache writes without breaking "make". Why do you think the
> > > > presence of a delegation must necessarily change everything?
> > > 
> > > As long as it holds a write delegation a client can delay updating data
> > > or attributes arbitrarily--so if the server continues to return the old
> > > attributes then e.g. "make" could fail to notice an update even long
> > > after no application on any client is accessing the file any more.
> > > 
> > > Could you explain what I'm missing?
> > 
> > An explanation for why the client would want to do this.
> 
> To reduce the latency of the close operation?  The RFCs clearly permit
> it.
> 
> But I know you know that, so I think you're saying that the mechanisms
> described in the rfc don't work as the authors seemed to intend, and any
> careful client will be required to do something on close anyway.
> 
> Is that correct?  Could you explain your thinking?
> 
> I believe you--and I'll be delighted to get out of having to hook
> stat--but I don't see it yet.
> 
> It looks to me like the server can implement write delegations by either
> 1) breaking delegations on every stat or 2) doing a cb_getattr on stat
> and then recalling only if the return indicates the client has dirty
> data.

Why must you recall if you the cb_getattr shows dirty data? Just bumping
the change attribute should suffice to keep 'make' and friends happy.
They can then trigger the recall by deciding to open the file.

> Is your argument that this will require breaking delegations too
> frequently to make write delegations practical, hence that clients and
> server will still have to depend on clients doing some minimal update of
> the file on close?

Yes. Right now, our client does relax the rules to not send the COMMIT;
the write delegation ensures that we can recover safely in case of a
server reboot. However we do still actively flush out the WRITEs to
reduce the burden of delegation recalls, and also to ensure that we
don't have to worry about the whole delegation writeback quota
business...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] F_SETLEASE mess
  2013-07-08 22:25                       ` Myklebust, Trond
@ 2013-07-08 23:19                         ` Bruce Fields
  0 siblings, 0 replies; 15+ messages in thread
From: Bruce Fields @ 2013-07-08 23:19 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org,
	Linus Torvalds

On Mon, Jul 08, 2013 at 10:25:21PM +0000, Myklebust, Trond wrote:
> On Mon, 2013-07-08 at 17:17 -0400, Bruce Fields wrote:
> > Is your argument that this will require breaking delegations too
> > frequently to make write delegations practical, hence that clients and
> > server will still have to depend on clients doing some minimal update of
> > the file on close?
> 
> Yes. Right now, our client does relax the rules to not send the COMMIT;
> the write delegation ensures that we can recover safely in case of a
> server reboot. However we do still actively flush out the WRITEs to
> reduce the burden of delegation recalls, and also to ensure that we
> don't have to worry about the whole delegation writeback quota
> business...

OK, so: we should assume that a client will do what's necessary to
update the server attributes on close, such that the server won't need
to do something new on stat to prevent a regression in consistency
semantics when we implement write delegations.

That makes sense to me.

But (unless I'm totally misreading) the RFC authors clearly expected
clients to delay that stuff on close.

Specs aren't manuals for how to write a good implementation, I
understand that, but--here you're asking me to assume client
implementors will all do something that the spec goes out of its way to
tell them they don't have to do.

So I'd be more comfortable if I'd seen some ietf discussion of that.
I'll go start it if need be.  (Or maybe there already was some and I
missed it.)

--b.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-07-08 23:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-05  9:04 [RFC] F_SETLEASE mess Al Viro
2013-07-05 10:51 ` Jeff Layton
2013-07-05 12:08   ` Jeff Layton
2013-07-05 16:25     ` Bruce Fields
2013-07-05 21:46       ` Jeff Layton
2013-07-08 14:17         ` Bruce Fields
2013-07-08 14:33           ` Jeff Layton
2013-07-08 18:10           ` Myklebust, Trond
2013-07-08 18:53             ` Bruce Fields
2013-07-08 19:21               ` Myklebust, Trond
2013-07-08 19:34                 ` Bruce Fields
2013-07-08 20:14                   ` Myklebust, Trond
2013-07-08 21:17                     ` Bruce Fields
2013-07-08 22:25                       ` Myklebust, Trond
2013-07-08 23:19                         ` Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).