* [RFC] F_SETLEASE mess @ 2013-07-05 9:04 Al Viro 2013-07-05 10:51 ` Jeff Layton 0 siblings, 1 reply; 15+ messages in thread From: Al Viro @ 2013-07-05 9:04 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-fsdevel, Linus Torvalds generic_add_lease() with F_WRLCK checks for other openers in a very crude way - it wants no extra references to dentry (thus excluding other struct file pointing to it) *and* no extra references to in-core inode, excluding openers of other links. It fails with EAGAIN if those conditions are not met. The way it deals with another open(2) racing with it (i.e. managing to squeeze between the check and locks_insert_lock()) is theoretically racy; do_dentry_open() would spin on ->i_lock, all right, but... only if there already is something in inode->i_flock. If this is the first lease/lock being set, break_lease() will do nothing, rather than call __break_lease() and spin there. It's _very_ hard to hit; we are holding ->i_lock and thus can't be preempted, so open(2) would have to get *everything* (pathname lookup, etc.) done in a very narrow window. So I don't believe it's exploitable, but it really smells bad. The check is extremely crude and if nothing else it's a DoS fodder - a luser that keeps hitting that file with stat(2) can prevent F_SETLEASE from succeeding, even though he wouldn't be able to open the damn thing at all... ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-05 9:04 [RFC] F_SETLEASE mess Al Viro @ 2013-07-05 10:51 ` Jeff Layton 2013-07-05 12:08 ` Jeff Layton 0 siblings, 1 reply; 15+ messages in thread From: Jeff Layton @ 2013-07-05 10:51 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Bruce Fields On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote: > generic_add_lease() with F_WRLCK checks for other openers > in a very crude way - it wants no extra references to dentry (thus > excluding other struct file pointing to it) *and* no extra references > to in-core inode, excluding openers of other links. It fails with > EAGAIN if those conditions are not met. > > The way it deals with another open(2) racing with it (i.e. > managing to squeeze between the check and locks_insert_lock()) is > theoretically racy; do_dentry_open() would spin on ->i_lock, all > right, but... only if there already is something in inode->i_flock. > If this is the first lease/lock being set, break_lease() will do > nothing, rather than call __break_lease() and spin there. > > It's _very_ hard to hit; we are holding ->i_lock and thus can't > be preempted, so open(2) would have to get *everything* (pathname > lookup, etc.) done in a very narrow window. So I don't believe it's > exploitable, but it really smells bad. The check is extremely crude > and if nothing else it's a DoS fodder - a luser that keeps hitting that > file with stat(2) can prevent F_SETLEASE from succeeding, even though > he wouldn't be able to open the damn thing at all... (cc'ing Bruce since he's been poking around in this area recently and might have ideas...) I see what you mean. I haven't looked closely at this yet, and I'm on holiday today, but I'll plan to give this more scrutiny when I get back next week. Giving it an initial look though... We already hold an extra reference to the dentry via the path_get in do_dentry_open. So is this race possible if the two tasks are working on the same dentry? Or does it require a hardlinked inode? If it's not possible to race on the same dentry, then one possible fix would be to go ahead and do an extra igrab(inode) in do_dentry_open before calling break_lease. I'm not particularly fond of that since it means taking the i_lock an extra time, but it looks like it would close the race. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-05 10:51 ` Jeff Layton @ 2013-07-05 12:08 ` Jeff Layton 2013-07-05 16:25 ` Bruce Fields 0 siblings, 1 reply; 15+ messages in thread From: Jeff Layton @ 2013-07-05 12:08 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Bruce Fields On Fri, 2013-07-05 at 06:51 -0400, Jeff Layton wrote: > On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote: > > generic_add_lease() with F_WRLCK checks for other openers > > in a very crude way - it wants no extra references to dentry (thus > > excluding other struct file pointing to it) *and* no extra references > > to in-core inode, excluding openers of other links. It fails with > > EAGAIN if those conditions are not met. > > > > The way it deals with another open(2) racing with it (i.e. > > managing to squeeze between the check and locks_insert_lock()) is > > theoretically racy; do_dentry_open() would spin on ->i_lock, all > > right, but... only if there already is something in inode->i_flock. > > If this is the first lease/lock being set, break_lease() will do > > nothing, rather than call __break_lease() and spin there. > > > > It's _very_ hard to hit; we are holding ->i_lock and thus can't > > be preempted, so open(2) would have to get *everything* (pathname > > lookup, etc.) done in a very narrow window. So I don't believe it's > > exploitable, but it really smells bad. The check is extremely crude > > and if nothing else it's a DoS fodder - a luser that keeps hitting that > > file with stat(2) can prevent F_SETLEASE from succeeding, even though > > he wouldn't be able to open the damn thing at all... > > (cc'ing Bruce since he's been poking around in this area recently and > might have ideas...) > > I see what you mean. I haven't looked closely at this yet, and I'm on > holiday today, but I'll plan to give this more scrutiny when I get back > next week. Giving it an initial look though... > > We already hold an extra reference to the dentry via the path_get in > do_dentry_open. So is this race possible if the two tasks are working on > the same dentry? Or does it require a hardlinked inode? > > If it's not possible to race on the same dentry, then one possible fix > would be to go ahead and do an extra igrab(inode) in do_dentry_open > before calling break_lease. I'm not particularly fond of that since it > means taking the i_lock an extra time, but it looks like it would close > the race. > Hrm. I think we'd also need to couple that with an extra check for a high refcount after doing locks_insert_lock in generic_add_lease, and then call locks_delete_lock and return -EAGAIN if the counts have changed. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-05 12:08 ` Jeff Layton @ 2013-07-05 16:25 ` Bruce Fields 2013-07-05 21:46 ` Jeff Layton 0 siblings, 1 reply; 15+ messages in thread From: Bruce Fields @ 2013-07-05 16:25 UTC (permalink / raw) To: Jeff Layton; +Cc: Al Viro, linux-fsdevel, Linus Torvalds > > On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote: > > > generic_add_lease() with F_WRLCK checks for other openers > > > in a very crude way - it wants no extra references to dentry (thus > > > excluding other struct file pointing to it) *and* no extra references > > > to in-core inode, excluding openers of other links. It fails with > > > EAGAIN if those conditions are not met. > > > > > > The way it deals with another open(2) racing with it (i.e. > > > managing to squeeze between the check and locks_insert_lock()) is > > > theoretically racy; do_dentry_open() would spin on ->i_lock, all > > > right, but... only if there already is something in inode->i_flock. > > > If this is the first lease/lock being set, break_lease() will do > > > nothing, rather than call __break_lease() and spin there. > > > > > > It's _very_ hard to hit; we are holding ->i_lock and thus can't > > > be preempted, so open(2) would have to get *everything* (pathname > > > lookup, etc.) done in a very narrow window. So I don't believe it's > > > exploitable, but it really smells bad. The check is extremely crude > > > and if nothing else it's a DoS fodder - a luser that keeps hitting that > > > file with stat(2) can prevent F_SETLEASE from succeeding, even though > > > he wouldn't be able to open the damn thing at all... nfsd isn't using write leases yet (I want to get read delegations sorted out first), and I don't understand Samba's requirements for write leases. In the future, when nfsd does write delegations: they're an optional optimization, and if we're concerned about such a DOS one solution might be just to change everything to a trylock when acquiring delegations: it's always acceptable to just fail the delegation. On Fri, Jul 05, 2013 at 08:08:44AM -0400, Jeff Layton wrote: > On Fri, 2013-07-05 at 06:51 -0400, Jeff Layton wrote: > > We already hold an extra reference to the dentry via the path_get in > > do_dentry_open. So is this race possible if the two tasks are working on > > the same dentry? Well, as Al says, the race is roughly: take i_lock generic_add_lease checks d_count, i_count Conflicting opener does "*everything* (pathname lookup, etc.)" (including that path_get, and breake_lease() (which sees no lock, so doesn't try to get i_lock.)) ... locks_insert_lock() adds new lock. > > Or does it require a hardlinked inode? > > > > If it's not possible to race on the same dentry, then one possible fix > > would be to go ahead and do an extra igrab(inode) I think the only reason for this race is the attempt to optimize out an i_lock acquisition in break_lease. If we're willing to call igrab (which also takes the i_lock), then we may as well just take the i_lock. > > in do_dentry_open > > before calling break_lease. I'm not particularly fond of that since it > > means taking the i_lock an extra time, but it looks like it would close > > the race. > > Hrm. I think we'd also need to couple that with an extra check for a > high refcount after doing locks_insert_lock in generic_add_lease, and > then call locks_delete_lock and return -EAGAIN if the counts have > changed. ... but checking the counts again afterwards might work. (Dumb question: in the absence of a lock on the opener's side, are the memory accesses ordered such that a lease-setter is guaranteed to see the new counts from an opener that didn't see the new i_flock value?) How important is the optimization that skips the i_lock in break_lease? --b. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-05 16:25 ` Bruce Fields @ 2013-07-05 21:46 ` Jeff Layton 2013-07-08 14:17 ` Bruce Fields 0 siblings, 1 reply; 15+ messages in thread From: Jeff Layton @ 2013-07-05 21:46 UTC (permalink / raw) To: Bruce Fields; +Cc: Al Viro, linux-fsdevel, Linus Torvalds On Fri, 2013-07-05 at 12:25 -0400, Bruce Fields wrote: > > > On Fri, 2013-07-05 at 10:04 +0100, Al Viro wrote: > > > > generic_add_lease() with F_WRLCK checks for other openers > > > > in a very crude way - it wants no extra references to dentry (thus > > > > excluding other struct file pointing to it) *and* no extra references > > > > to in-core inode, excluding openers of other links. It fails with > > > > EAGAIN if those conditions are not met. > > > > > > > > The way it deals with another open(2) racing with it (i.e. > > > > managing to squeeze between the check and locks_insert_lock()) is > > > > theoretically racy; do_dentry_open() would spin on ->i_lock, all > > > > right, but... only if there already is something in inode->i_flock. > > > > If this is the first lease/lock being set, break_lease() will do > > > > nothing, rather than call __break_lease() and spin there. > > > > > > > > It's _very_ hard to hit; we are holding ->i_lock and thus can't > > > > be preempted, so open(2) would have to get *everything* (pathname > > > > lookup, etc.) done in a very narrow window. So I don't believe it's > > > > exploitable, but it really smells bad. The check is extremely crude > > > > and if nothing else it's a DoS fodder - a luser that keeps hitting that > > > > file with stat(2) can prevent F_SETLEASE from succeeding, even though > > > > he wouldn't be able to open the damn thing at all... > > nfsd isn't using write leases yet (I want to get read delegations sorted > out first), and I don't understand Samba's requirements for write > leases. > > In the future, when nfsd does write delegations: they're an optional > optimization, and if we're concerned about such a DOS one solution might > be just to change everything to a trylock when acquiring delegations: > it's always acceptable to just fail the delegation. > I think that's the case for Samba too. Oplocks are very similar to delegations, though there are of course some semantic differences. > On Fri, Jul 05, 2013 at 08:08:44AM -0400, Jeff Layton wrote: > > On Fri, 2013-07-05 at 06:51 -0400, Jeff Layton wrote: > > > We already hold an extra reference to the dentry via the path_get in > > > do_dentry_open. So is this race possible if the two tasks are working on > > > the same dentry? > > Well, as Al says, the race is roughly: > > take i_lock > generic_add_lease checks d_count, i_count > > Conflicting opener does "*everything* (pathname lookup, > etc.)" (including that path_get, and breake_lease() > (which sees no lock, so doesn't try to get i_lock.)) > > ... > locks_insert_lock() adds new lock. > Right, I realized that while I was driving earlier. This race is definitely possible in either case... > > > Or does it require a hardlinked inode? > > > > > > If it's not possible to race on the same dentry, then one possible fix > > > would be to go ahead and do an extra igrab(inode) > > I think the only reason for this race is the attempt to optimize out an > i_lock acquisition in break_lease. If we're willing to call igrab > (which also takes the i_lock), then we may as well just take the i_lock. > I had misremembered the function name -- I had meant ihold/iput (which would commonly be lockless). The point is moot though since we have a similar problem in the "same dentry" case anyway. Also the path_get on the other dentry would ensure that you had a i_count > 1... So...disregard my silly rumbling about it. :) > > > in do_dentry_open > > > before calling break_lease. I'm not particularly fond of that since it > > > means taking the i_lock an extra time, but it looks like it would close > > > the race. > > > > Hrm. I think we'd also need to couple that with an extra check for a > > high refcount after doing locks_insert_lock in generic_add_lease, and > > then call locks_delete_lock and return -EAGAIN if the counts have > > changed. > > ... but checking the counts again afterwards might work. (Dumb > question: in the absence of a lock on the opener's side, are the memory > accesses ordered such that a lease-setter is guaranteed to see the new > counts from an opener that didn't see the new i_flock value?) > I'm not sure. We'd certainly need to audit that and ensure that the lease setter does see them before we could rely on such a scheme. > How important is the optimization that skips the i_lock in break_lease? Again, not sure...but I do shy away a bit from anything that would slow down opens in the case where leases aren't involved. I'd prefer any penalty be paid by the setlease caller. Opens are much more common than leases in almost every workload I can think of. I think the bigger issue though is that looking at refcounts in order to determine when we have a conflicting open is just plain wrong. There are all sorts of reasons one might see a raised refcount that don't involve conflicting opens (Al's stat() example for instance). It seems like we ought to shoot for a solution that doesn't rely (solely) on inode and dentry refcounts. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-05 21:46 ` Jeff Layton @ 2013-07-08 14:17 ` Bruce Fields 2013-07-08 14:33 ` Jeff Layton 2013-07-08 18:10 ` Myklebust, Trond 0 siblings, 2 replies; 15+ messages in thread From: Bruce Fields @ 2013-07-08 14:17 UTC (permalink / raw) To: Jeff Layton; +Cc: Al Viro, linux-fsdevel, Linus Torvalds On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > I think the bigger issue though is that looking at refcounts in order to > determine when we have a conflicting open is just plain wrong. There are > all sorts of reasons one might see a raised refcount that don't involve > conflicting opens (Al's stat() example for instance). It seems like we > ought to shoot for a solution that doesn't rely (solely) on inode and > dentry refcounts. Note that NFSv4 write delegations will need to affect stat as well. (Once you let a client perform writes locally, that client becomes the authority on the attributes, so we have to call back to it on stat.) --b. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 14:17 ` Bruce Fields @ 2013-07-08 14:33 ` Jeff Layton 2013-07-08 18:10 ` Myklebust, Trond 1 sibling, 0 replies; 15+ messages in thread From: Jeff Layton @ 2013-07-08 14:33 UTC (permalink / raw) To: Bruce Fields; +Cc: Al Viro, linux-fsdevel, Linus Torvalds On Mon, 8 Jul 2013 10:17:01 -0400 Bruce Fields <bfields@fieldses.org> wrote: > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > I think the bigger issue though is that looking at refcounts in order to > > determine when we have a conflicting open is just plain wrong. There are > > all sorts of reasons one might see a raised refcount that don't involve > > conflicting opens (Al's stat() example for instance). It seems like we > > ought to shoot for a solution that doesn't rely (solely) on inode and > > dentry refcounts. > > Note that NFSv4 write delegations will need to affect stat as well. > (Once you let a client perform writes locally, that client becomes the > authority on the attributes, so we have to call back to it on stat.) > > --b. Good point... I'm a little leery of changing how that check is done anyway. While it seems intuitive to (re)implement the i_readcount in a similar way to the i_writecount, I'd be concerned about callers relying on the existing behavior. So, I'm inclined to try and fix the race that Al has identified without changing the logic too much. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 14:17 ` Bruce Fields 2013-07-08 14:33 ` Jeff Layton @ 2013-07-08 18:10 ` Myklebust, Trond 2013-07-08 18:53 ` Bruce Fields 1 sibling, 1 reply; 15+ messages in thread From: Myklebust, Trond @ 2013-07-08 18:10 UTC (permalink / raw) To: Bruce Fields Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote: > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > I think the bigger issue though is that looking at refcounts in order to > > determine when we have a conflicting open is just plain wrong. There are > > all sorts of reasons one might see a raised refcount that don't involve > > conflicting opens (Al's stat() example for instance). It seems like we > > ought to shoot for a solution that doesn't rely (solely) on inode and > > dentry refcounts. > > Note that NFSv4 write delegations will need to affect stat as well. > (Once you let a client perform writes locally, that client becomes the > authority on the attributes, so we have to call back to it on stat.) Umm... Yes, but are we ever really going to want to implement that part of the spec? All the client can tell you is 'this file is dirty' and/or it can tell you that a size change has occurred. It's cute that the protocol allows you to do this, but it's not particularly practical. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 18:10 ` Myklebust, Trond @ 2013-07-08 18:53 ` Bruce Fields 2013-07-08 19:21 ` Myklebust, Trond 0 siblings, 1 reply; 15+ messages in thread From: Bruce Fields @ 2013-07-08 18:53 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote: > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote: > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > > I think the bigger issue though is that looking at refcounts in order to > > > determine when we have a conflicting open is just plain wrong. There are > > > all sorts of reasons one might see a raised refcount that don't involve > > > conflicting opens (Al's stat() example for instance). It seems like we > > > ought to shoot for a solution that doesn't rely (solely) on inode and > > > dentry refcounts. > > > > Note that NFSv4 write delegations will need to affect stat as well. > > (Once you let a client perform writes locally, that client becomes the > > authority on the attributes, so we have to call back to it on stat.) > > Umm... Yes, but are we ever really going to want to implement that part > of the spec? All the client can tell you is 'this file is dirty' and/or > it can tell you that a size change has occurred. > > It's cute that the protocol allows you to do this, but it's not > particularly practical. If we don't take some sort of action on stat then I don't see how to avoid e.g. breaking "make". (That action doesn't necessarily have to be a CB_GETATTR to the client and a blind return of the results.) --b. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 18:53 ` Bruce Fields @ 2013-07-08 19:21 ` Myklebust, Trond 2013-07-08 19:34 ` Bruce Fields 0 siblings, 1 reply; 15+ messages in thread From: Myklebust, Trond @ 2013-07-08 19:21 UTC (permalink / raw) To: Bruce Fields Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote: > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote: > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote: > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > > > I think the bigger issue though is that looking at refcounts in order to > > > > determine when we have a conflicting open is just plain wrong. There are > > > > all sorts of reasons one might see a raised refcount that don't involve > > > > conflicting opens (Al's stat() example for instance). It seems like we > > > > ought to shoot for a solution that doesn't rely (solely) on inode and > > > > dentry refcounts. > > > > > > Note that NFSv4 write delegations will need to affect stat as well. > > > (Once you let a client perform writes locally, that client becomes the > > > authority on the attributes, so we have to call back to it on stat.) > > > > Umm... Yes, but are we ever really going to want to implement that part > > of the spec? All the client can tell you is 'this file is dirty' and/or > > it can tell you that a size change has occurred. > > > > It's cute that the protocol allows you to do this, but it's not > > particularly practical. > > If we don't take some sort of action on stat then I don't see how to > avoid e.g. breaking "make". We already cache writes without breaking "make". Why do you think the presence of a delegation must necessarily change everything? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 19:21 ` Myklebust, Trond @ 2013-07-08 19:34 ` Bruce Fields 2013-07-08 20:14 ` Myklebust, Trond 0 siblings, 1 reply; 15+ messages in thread From: Bruce Fields @ 2013-07-08 19:34 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote: > On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote: > > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote: > > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote: > > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > > > > I think the bigger issue though is that looking at refcounts in order to > > > > > determine when we have a conflicting open is just plain wrong. There are > > > > > all sorts of reasons one might see a raised refcount that don't involve > > > > > conflicting opens (Al's stat() example for instance). It seems like we > > > > > ought to shoot for a solution that doesn't rely (solely) on inode and > > > > > dentry refcounts. > > > > > > > > Note that NFSv4 write delegations will need to affect stat as well. > > > > (Once you let a client perform writes locally, that client becomes the > > > > authority on the attributes, so we have to call back to it on stat.) > > > > > > Umm... Yes, but are we ever really going to want to implement that part > > > of the spec? All the client can tell you is 'this file is dirty' and/or > > > it can tell you that a size change has occurred. > > > > > > It's cute that the protocol allows you to do this, but it's not > > > particularly practical. > > > > If we don't take some sort of action on stat then I don't see how to > > avoid e.g. breaking "make". > > We already cache writes without breaking "make". Why do you think the > presence of a delegation must necessarily change everything? As long as it holds a write delegation a client can delay updating data or attributes arbitrarily--so if the server continues to return the old attributes then e.g. "make" could fail to notice an update even long after no application on any client is accessing the file any more. Could you explain what I'm missing? --b. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 19:34 ` Bruce Fields @ 2013-07-08 20:14 ` Myklebust, Trond 2013-07-08 21:17 ` Bruce Fields 0 siblings, 1 reply; 15+ messages in thread From: Myklebust, Trond @ 2013-07-08 20:14 UTC (permalink / raw) To: Bruce Fields Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, 2013-07-08 at 15:34 -0400, Bruce Fields wrote: > On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote: > > On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote: > > > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote: > > > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote: > > > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > > > > > I think the bigger issue though is that looking at refcounts in order to > > > > > > determine when we have a conflicting open is just plain wrong. There are > > > > > > all sorts of reasons one might see a raised refcount that don't involve > > > > > > conflicting opens (Al's stat() example for instance). It seems like we > > > > > > ought to shoot for a solution that doesn't rely (solely) on inode and > > > > > > dentry refcounts. > > > > > > > > > > Note that NFSv4 write delegations will need to affect stat as well. > > > > > (Once you let a client perform writes locally, that client becomes the > > > > > authority on the attributes, so we have to call back to it on stat.) > > > > > > > > Umm... Yes, but are we ever really going to want to implement that part > > > > of the spec? All the client can tell you is 'this file is dirty' and/or > > > > it can tell you that a size change has occurred. > > > > > > > > It's cute that the protocol allows you to do this, but it's not > > > > particularly practical. > > > > > > If we don't take some sort of action on stat then I don't see how to > > > avoid e.g. breaking "make". > > > > We already cache writes without breaking "make". Why do you think the > > presence of a delegation must necessarily change everything? > > As long as it holds a write delegation a client can delay updating data > or attributes arbitrarily--so if the server continues to return the old > attributes then e.g. "make" could fail to notice an update even long > after no application on any client is accessing the file any more. > > Could you explain what I'm missing? An explanation for why the client would want to do this. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 20:14 ` Myklebust, Trond @ 2013-07-08 21:17 ` Bruce Fields 2013-07-08 22:25 ` Myklebust, Trond 0 siblings, 1 reply; 15+ messages in thread From: Bruce Fields @ 2013-07-08 21:17 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, Jul 08, 2013 at 08:14:05PM +0000, Myklebust, Trond wrote: > On Mon, 2013-07-08 at 15:34 -0400, Bruce Fields wrote: > > On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote: > > > On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote: > > > > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote: > > > > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote: > > > > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > > > > > > I think the bigger issue though is that looking at refcounts in order to > > > > > > > determine when we have a conflicting open is just plain wrong. There are > > > > > > > all sorts of reasons one might see a raised refcount that don't involve > > > > > > > conflicting opens (Al's stat() example for instance). It seems like we > > > > > > > ought to shoot for a solution that doesn't rely (solely) on inode and > > > > > > > dentry refcounts. > > > > > > > > > > > > Note that NFSv4 write delegations will need to affect stat as well. > > > > > > (Once you let a client perform writes locally, that client becomes the > > > > > > authority on the attributes, so we have to call back to it on stat.) > > > > > > > > > > Umm... Yes, but are we ever really going to want to implement that part > > > > > of the spec? All the client can tell you is 'this file is dirty' and/or > > > > > it can tell you that a size change has occurred. > > > > > > > > > > It's cute that the protocol allows you to do this, but it's not > > > > > particularly practical. > > > > > > > > If we don't take some sort of action on stat then I don't see how to > > > > avoid e.g. breaking "make". > > > > > > We already cache writes without breaking "make". Why do you think the > > > presence of a delegation must necessarily change everything? > > > > As long as it holds a write delegation a client can delay updating data > > or attributes arbitrarily--so if the server continues to return the old > > attributes then e.g. "make" could fail to notice an update even long > > after no application on any client is accessing the file any more. > > > > Could you explain what I'm missing? > > An explanation for why the client would want to do this. To reduce the latency of the close operation? The RFCs clearly permit it. But I know you know that, so I think you're saying that the mechanisms described in the rfc don't work as the authors seemed to intend, and any careful client will be required to do something on close anyway. Is that correct? Could you explain your thinking? I believe you--and I'll be delighted to get out of having to hook stat--but I don't see it yet. It looks to me like the server can implement write delegations by either 1) breaking delegations on every stat or 2) doing a cb_getattr on stat and then recalling only if the return indicates the client has dirty data. Is your argument that this will require breaking delegations too frequently to make write delegations practical, hence that clients and server will still have to depend on clients doing some minimal update of the file on close? --b. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 21:17 ` Bruce Fields @ 2013-07-08 22:25 ` Myklebust, Trond 2013-07-08 23:19 ` Bruce Fields 0 siblings, 1 reply; 15+ messages in thread From: Myklebust, Trond @ 2013-07-08 22:25 UTC (permalink / raw) To: Bruce Fields Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, 2013-07-08 at 17:17 -0400, Bruce Fields wrote: > On Mon, Jul 08, 2013 at 08:14:05PM +0000, Myklebust, Trond wrote: > > On Mon, 2013-07-08 at 15:34 -0400, Bruce Fields wrote: > > > On Mon, Jul 08, 2013 at 07:21:37PM +0000, Myklebust, Trond wrote: > > > > On Mon, 2013-07-08 at 14:53 -0400, Bruce Fields wrote: > > > > > On Mon, Jul 08, 2013 at 06:10:40PM +0000, Myklebust, Trond wrote: > > > > > > On Mon, 2013-07-08 at 10:17 -0400, Bruce Fields wrote: > > > > > > > On Fri, Jul 05, 2013 at 05:46:19PM -0400, Jeff Layton wrote: > > > > > > > > I think the bigger issue though is that looking at refcounts in order to > > > > > > > > determine when we have a conflicting open is just plain wrong. There are > > > > > > > > all sorts of reasons one might see a raised refcount that don't involve > > > > > > > > conflicting opens (Al's stat() example for instance). It seems like we > > > > > > > > ought to shoot for a solution that doesn't rely (solely) on inode and > > > > > > > > dentry refcounts. > > > > > > > > > > > > > > Note that NFSv4 write delegations will need to affect stat as well. > > > > > > > (Once you let a client perform writes locally, that client becomes the > > > > > > > authority on the attributes, so we have to call back to it on stat.) > > > > > > > > > > > > Umm... Yes, but are we ever really going to want to implement that part > > > > > > of the spec? All the client can tell you is 'this file is dirty' and/or > > > > > > it can tell you that a size change has occurred. > > > > > > > > > > > > It's cute that the protocol allows you to do this, but it's not > > > > > > particularly practical. > > > > > > > > > > If we don't take some sort of action on stat then I don't see how to > > > > > avoid e.g. breaking "make". > > > > > > > > We already cache writes without breaking "make". Why do you think the > > > > presence of a delegation must necessarily change everything? > > > > > > As long as it holds a write delegation a client can delay updating data > > > or attributes arbitrarily--so if the server continues to return the old > > > attributes then e.g. "make" could fail to notice an update even long > > > after no application on any client is accessing the file any more. > > > > > > Could you explain what I'm missing? > > > > An explanation for why the client would want to do this. > > To reduce the latency of the close operation? The RFCs clearly permit > it. > > But I know you know that, so I think you're saying that the mechanisms > described in the rfc don't work as the authors seemed to intend, and any > careful client will be required to do something on close anyway. > > Is that correct? Could you explain your thinking? > > I believe you--and I'll be delighted to get out of having to hook > stat--but I don't see it yet. > > It looks to me like the server can implement write delegations by either > 1) breaking delegations on every stat or 2) doing a cb_getattr on stat > and then recalling only if the return indicates the client has dirty > data. Why must you recall if you the cb_getattr shows dirty data? Just bumping the change attribute should suffice to keep 'make' and friends happy. They can then trigger the recall by deciding to open the file. > Is your argument that this will require breaking delegations too > frequently to make write delegations practical, hence that clients and > server will still have to depend on clients doing some minimal update of > the file on close? Yes. Right now, our client does relax the rules to not send the COMMIT; the write delegation ensures that we can recover safely in case of a server reboot. However we do still actively flush out the WRITEs to reduce the burden of delegation recalls, and also to ensure that we don't have to worry about the whole delegation writeback quota business... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] F_SETLEASE mess 2013-07-08 22:25 ` Myklebust, Trond @ 2013-07-08 23:19 ` Bruce Fields 0 siblings, 0 replies; 15+ messages in thread From: Bruce Fields @ 2013-07-08 23:19 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, Al Viro, linux-fsdevel@vger.kernel.org, Linus Torvalds On Mon, Jul 08, 2013 at 10:25:21PM +0000, Myklebust, Trond wrote: > On Mon, 2013-07-08 at 17:17 -0400, Bruce Fields wrote: > > Is your argument that this will require breaking delegations too > > frequently to make write delegations practical, hence that clients and > > server will still have to depend on clients doing some minimal update of > > the file on close? > > Yes. Right now, our client does relax the rules to not send the COMMIT; > the write delegation ensures that we can recover safely in case of a > server reboot. However we do still actively flush out the WRITEs to > reduce the burden of delegation recalls, and also to ensure that we > don't have to worry about the whole delegation writeback quota > business... OK, so: we should assume that a client will do what's necessary to update the server attributes on close, such that the server won't need to do something new on stat to prevent a regression in consistency semantics when we implement write delegations. That makes sense to me. But (unless I'm totally misreading) the RFC authors clearly expected clients to delay that stuff on close. Specs aren't manuals for how to write a good implementation, I understand that, but--here you're asking me to assume client implementors will all do something that the spec goes out of its way to tell them they don't have to do. So I'd be more comfortable if I'd seen some ietf discussion of that. I'll go start it if need be. (Or maybe there already was some and I missed it.) --b. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-07-08 23:19 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-07-05 9:04 [RFC] F_SETLEASE mess Al Viro 2013-07-05 10:51 ` Jeff Layton 2013-07-05 12:08 ` Jeff Layton 2013-07-05 16:25 ` Bruce Fields 2013-07-05 21:46 ` Jeff Layton 2013-07-08 14:17 ` Bruce Fields 2013-07-08 14:33 ` Jeff Layton 2013-07-08 18:10 ` Myklebust, Trond 2013-07-08 18:53 ` Bruce Fields 2013-07-08 19:21 ` Myklebust, Trond 2013-07-08 19:34 ` Bruce Fields 2013-07-08 20:14 ` Myklebust, Trond 2013-07-08 21:17 ` Bruce Fields 2013-07-08 22:25 ` Myklebust, Trond 2013-07-08 23:19 ` Bruce Fields
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).