* [PATCH] locks: close potential race between setlease and open @ 2013-07-08 19:07 Jeff Layton 2013-08-14 12:11 ` [PATCH v3] " Jeff Layton 2013-08-16 14:49 ` [PATCH v4] locks: close potential race between setlease and open Jeff Layton 0 siblings, 2 replies; 11+ messages in thread From: Jeff Layton @ 2013-07-08 19:07 UTC (permalink / raw) To: Al Viro; +Cc: Bruce Fields, linux-fsdevel, linux-kernel v2: - fix potential double-free of lease if second check finds conflict - add smp_mb's to ensure that other CPUs see i_flock changes (Full disclosure: I don't have a great feel for memory barriers. Some guidance here would be appreciated) As Al Viro points out, there is an unlikely, but possible race between opening a file and setting a lease on it. generic_add_lease is done with the i_lock held, but the inode->i_flock check in break_lease is lockless. It's possible for another task doing an open to do the entire pathwalk and call break_lease between the point where generic_add_lease checks for a conflicting open and adds the lease to the list. If this occurs, we can end up with a lease set on the file with a conflicting open. To guard against that, check again for a conflicting open after adding the lease to the i_flock list. If the above race occurs, then we can simply unwind the lease setting and return -EAGAIN. Finally, as Bruce points out, we must also ensure that the effects of locks_insert_lock are seen by any break_lease callers prior to doing the recheck for conflicting opens. So, add a smp_mb pairing in break_lease and generic_add_lease to ensure that. Cc: Bruce Fields <bfields@fieldses.org> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/locks.c | 65 ++++++++++++++++++++++++++++++++++++++++++++---------- include/linux/fs.h | 2 ++ 2 files changed, 55 insertions(+), 12 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index b27a300..c870a85 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -653,14 +653,13 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) } /* - * Delete a lock and then free it. - * Wake up processes that are blocked waiting for this lock, - * notify the FS that the lock has been cleared and - * finally free the lock. + * Unlink a lock from all lists and free the namespace reference, but don't + * free it yet. Wake up processes that are blocked waiting for this lock, + * notify the FS that the lock has been cleared and finally free the lock. * * Must be called with the i_lock held! */ -static void locks_delete_lock(struct file_lock **thisfl_p) +static void locks_unlink_lock(struct file_lock **thisfl_p) { struct file_lock *fl = *thisfl_p; @@ -675,6 +674,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) } locks_wake_up_blocks(fl); +} + +/* + * Unlink a lock from all lists and free it. + * + * Must be called with i_lock held! + */ +static void locks_delete_lock(struct file_lock **thisfl_p) +{ + struct file_lock *fl = *thisfl_p; + + locks_unlink_lock(thisfl_p); locks_free_lock(fl); } @@ -1455,6 +1466,29 @@ int fcntl_getlease(struct file *filp) return type; } +/** + * check_conflicting_open - see if the given dentry points to a file that has + * an existing open that would conflict with the desired lease. + * + * @dentry: dentry to check + * @arg: type of lease that we're trying to acquire + * + * Check to see if there's an existing open fd on this file that would + * conflict with the lease we're trying to set. 
+ */ +static int +check_conflicting_open(struct dentry *dentry, long arg) +{ + struct inode *inode = dentry->d_inode; + + if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) + return -EAGAIN; + if ((arg == F_WRLCK) && ((d_count(dentry) > 1) || + (atomic_read(&inode->i_count) > 1))) + return -EAGAIN; + return 0; +} + static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) { struct file_lock *fl, **before, **my_before = NULL, *lease; @@ -1464,12 +1498,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp lease = *flp; - error = -EAGAIN; - if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) - goto out; - if ((arg == F_WRLCK) - && ((d_count(dentry) > 1) - || (atomic_read(&inode->i_count) > 1))) + error = check_conflicting_open(dentry, arg); + if (error) goto out; /* @@ -1514,8 +1544,19 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp goto out; locks_insert_lock(before, lease); - return 0; + /* ensure that break_lease sees change to i_flock list */ + smp_mb(); + + /* + * The check in break_lease() is lockless. It's possible for another + * open to race in after we did the earlier check for a conflicting + * open but before the lease was inserted. Check again for a + * conflicting open and cancel the lease if there is one. + */ + error = check_conflicting_open(dentry, arg); + if (error) + locks_unlink_lock(flp); out: return error; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 834c9e5..735f8b5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1953,6 +1953,8 @@ static inline int locks_verify_truncate(struct inode *inode, static inline int break_lease(struct inode *inode, unsigned int mode) { + /* make sure we see any change to i_flock list */ + smp_mb(); if (inode->i_flock) return __break_lease(inode, mode); return 0; -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 11+ messages in thread
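[The core of the fix is a check/publish/re-check pattern against a lockless reader. Below is a minimal userspace sketch of that pattern; the types and helper names (demo_inode, conflicting_open, add_lease_sketch) are illustrative stand-ins, not the kernel's code.]

#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel's types. */
struct demo_inode {
	atomic_int i_writecount;	/* writers that currently have the file open */
	void *i_flock;			/* non-NULL once a lease is published */
};

/* A read lease conflicts with any existing open for write. */
static int conflicting_open(struct demo_inode *inode)
{
	return atomic_load(&inode->i_writecount) > 0;
}

/* Mirrors the structure of generic_add_lease() after the patch. */
static int add_lease_sketch(struct demo_inode *inode, void *lease)
{
	if (conflicting_open(inode))		/* optimistic first check */
		return -EAGAIN;

	inode->i_flock = lease;			/* publish: break_lease() can now see it */

	/*
	 * A racing open either sees i_flock set (and breaks the lease) or has
	 * already bumped i_writecount, in which case this re-check fails and
	 * we unwind, as locks_unlink_lock() does in the real code.
	 */
	if (conflicting_open(inode)) {
		inode->i_flock = NULL;
		return -EAGAIN;
	}
	return 0;
}

[The rest of the thread is about what ordering is required between the "publish" store and the re-check here, and between the refcount bump and the i_flock check on the open side.]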
* [PATCH v3] locks: close potential race between setlease and open 2013-07-08 19:07 [PATCH] locks: close potential race between setlease and open Jeff Layton @ 2013-08-14 12:11 ` Jeff Layton 2013-08-15 19:32 ` Bruce Fields 2013-08-15 20:44 ` memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) David Howells 2013-08-16 14:49 ` [PATCH v4] locks: close potential race between setlease and open Jeff Layton 1 sibling, 2 replies; 11+ messages in thread From: Jeff Layton @ 2013-08-14 12:11 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, linux-kernel, Matthew Wilcox, Bruce Fields, David Howells v2: - fix potential double-free of lease if second check finds conflict - add smp_mb's to ensure that other CPUs see i_flock changes v3: - remove smp_mb calls. Partial ordering is unlikely to help here. As Al Viro points out, there is an unlikely, but possible race between opening a file and setting a lease on it. generic_add_lease is done with the i_lock held, but the inode->i_flock check in break_lease is lockless. It's possible for another task doing an open to do the entire pathwalk and call break_lease between the point where generic_add_lease checks for a conflicting open and adds the lease to the list. If this occurs, we can end up with a lease set on the file with a conflicting open. To guard against that, check again for a conflicting open after adding the lease to the i_flock list. If the above race occurs, then we can simply unwind the lease setting and return -EAGAIN. Because we take dentry references and acquire write access on the file before calling break_lease, we know that if the i_flock list is empty when the open caller goes to check it then the necessary refcounts have already been incremented. Thus the additional check for a conflicting open will see that there is one and the setlease call will fail. Cc: Bruce Fields <bfields@fieldses.org> Cc: David Howells <dhowells@redhat.com> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/locks.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 58 insertions(+), 13 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index b27a300..a99adec 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) locks_insert_global_locks(fl); } -/* - * Delete a lock and then free it. - * Wake up processes that are blocked waiting for this lock, - * notify the FS that the lock has been cleared and - * finally free the lock. +/** + * locks_delete_lock - Delete a lock and then free it. + * @thisfl_p: pointer that points to the fl_next field of the previous + * inode->i_flock list entry + * + * Unlink a lock from all lists and free the namespace reference, but don't + * free it yet. Wake up processes that are blocked waiting for this lock and + * notify the FS that the lock has been cleared. * * Must be called with the i_lock held! */ -static void locks_delete_lock(struct file_lock **thisfl_p) +static void locks_unlink_lock(struct file_lock **thisfl_p) { struct file_lock *fl = *thisfl_p; @@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) } locks_wake_up_blocks(fl); +} + +/* + * Unlink a lock from all lists and free it. + * + * Must be called with i_lock held! 
+ */ +static void locks_delete_lock(struct file_lock **thisfl_p) +{ + struct file_lock *fl = *thisfl_p; + + locks_unlink_lock(thisfl_p); locks_free_lock(fl); } @@ -1455,6 +1470,32 @@ int fcntl_getlease(struct file *filp) return type; } +/** + * check_conflicting_open - see if the given dentry points to a file that has + * an existing open that would conflict with the desired lease. + * + * @dentry: dentry to check + * @arg: type of lease that we're trying to acquire + * + * Check to see if there's an existing open fd on this file that would + * conflict with the lease we're trying to set. + */ +static int +check_conflicting_open(const struct dentry *dentry, const long arg) +{ + int ret = 0; + struct inode *inode = dentry->d_inode; + + if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) + return -EAGAIN; + + if ((arg == F_WRLCK) && ((d_count(dentry) > 1) || + (atomic_read(&inode->i_count) > 1))) + ret = -EAGAIN; + + return ret; +} + static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) { struct file_lock *fl, **before, **my_before = NULL, *lease; @@ -1464,12 +1505,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp lease = *flp; - error = -EAGAIN; - if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) - goto out; - if ((arg == F_WRLCK) - && ((d_count(dentry) > 1) - || (atomic_read(&inode->i_count) > 1))) + error = check_conflicting_open(dentry, arg); + if (error) goto out; /* @@ -1514,8 +1551,16 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp goto out; locks_insert_lock(before, lease); - return 0; + /* + * The check in break_lease() is lockless. It's possible for another + * open to race in after we did the earlier check for a conflicting + * open but before the lease was inserted. Check again for a + * conflicting open and cancel the lease if there is one. + */ + error = check_conflicting_open(dentry, arg); + if (error) + locks_unlink_lock(flp); out: return error; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3] locks: close potential race between setlease and open 2013-08-14 12:11 ` [PATCH v3] " Jeff Layton @ 2013-08-15 19:32 ` Bruce Fields 2013-08-15 19:43 ` Jeff Layton 2013-08-15 20:44 ` memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) David Howells 1 sibling, 1 reply; 11+ messages in thread From: Bruce Fields @ 2013-08-15 19:32 UTC (permalink / raw) To: Jeff Layton Cc: Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox, David Howells On Wed, Aug 14, 2013 at 08:11:50AM -0400, Jeff Layton wrote: > v2: > - fix potential double-free of lease if second check finds conflict > - add smp_mb's to ensure that other CPUs see i_flock changes > > v3: > - remove smp_mb calls. Partial ordering is unlikely to help here. Forgive me here, I still don't understand. So to simplify massively, the situation looks like: setlease open ------------ ------ atomic_read atomic_inc write i_flock read i_flock atomic_read And we want to be sure that either the setlease caller sees the result of the atomic_inc, or the opener sees the result of the write to i_flock. As an example, suppose the above steps happen in the order: atomic_read write i_flock atomic_read atomic_inc read i_flock How do we know that the read of i_flock at the last step reflects the latest value of i_flock? For example, couldn't the write still be stuck in first CPU's cache? --b. > > As Al Viro points out, there is an unlikely, but possible race between > opening a file and setting a lease on it. generic_add_lease is done with > the i_lock held, but the inode->i_flock check in break_lease is > lockless. It's possible for another task doing an open to do the entire > pathwalk and call break_lease between the point where generic_add_lease > checks for a conflicting open and adds the lease to the list. If this > occurs, we can end up with a lease set on the file with a conflicting > open. > > To guard against that, check again for a conflicting open after adding > the lease to the i_flock list. If the above race occurs, then we can > simply unwind the lease setting and return -EAGAIN. > > Because we take dentry references and acquire write access on the file > before calling break_lease, we know that if the i_flock list is empty > when the open caller goes to check it then the necessary refcounts have > already been incremented. Thus the additional check for a conflicting > open will see that there is one and the setlease call will fail. > > Cc: Bruce Fields <bfields@fieldses.org> > Cc: David Howells <dhowells@redhat.com> > Reported-by: Al Viro <viro@ZenIV.linux.org.uk> > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/locks.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++------------ > 1 file changed, 58 insertions(+), 13 deletions(-) > > diff --git a/fs/locks.c b/fs/locks.c > index b27a300..a99adec 100644 > --- a/fs/locks.c > +++ b/fs/locks.c > @@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) > locks_insert_global_locks(fl); > } > > -/* > - * Delete a lock and then free it. > - * Wake up processes that are blocked waiting for this lock, > - * notify the FS that the lock has been cleared and > - * finally free the lock. > +/** > + * locks_delete_lock - Delete a lock and then free it. > + * @thisfl_p: pointer that points to the fl_next field of the previous > + * inode->i_flock list entry > + * > + * Unlink a lock from all lists and free the namespace reference, but don't > + * free it yet. 
Wake up processes that are blocked waiting for this lock and > + * notify the FS that the lock has been cleared. > * > * Must be called with the i_lock held! > */ > -static void locks_delete_lock(struct file_lock **thisfl_p) > +static void locks_unlink_lock(struct file_lock **thisfl_p) > { > struct file_lock *fl = *thisfl_p; > > @@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) > } > > locks_wake_up_blocks(fl); > +} > + > +/* > + * Unlink a lock from all lists and free it. > + * > + * Must be called with i_lock held! > + */ > +static void locks_delete_lock(struct file_lock **thisfl_p) > +{ > + struct file_lock *fl = *thisfl_p; > + > + locks_unlink_lock(thisfl_p); > locks_free_lock(fl); > } > > @@ -1455,6 +1470,32 @@ int fcntl_getlease(struct file *filp) > return type; > } > > +/** > + * check_conflicting_open - see if the given dentry points to a file that has > + * an existing open that would conflict with the desired lease. > + * > + * @dentry: dentry to check > + * @arg: type of lease that we're trying to acquire > + * > + * Check to see if there's an existing open fd on this file that would > + * conflict with the lease we're trying to set. > + */ > +static int > +check_conflicting_open(const struct dentry *dentry, const long arg) > +{ > + int ret = 0; > + struct inode *inode = dentry->d_inode; > + > + if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > + return -EAGAIN; > + > + if ((arg == F_WRLCK) && ((d_count(dentry) > 1) || > + (atomic_read(&inode->i_count) > 1))) > + ret = -EAGAIN; > + > + return ret; > +} > + > static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) > { > struct file_lock *fl, **before, **my_before = NULL, *lease; > @@ -1464,12 +1505,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > > lease = *flp; > > - error = -EAGAIN; > - if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > - goto out; > - if ((arg == F_WRLCK) > - && ((d_count(dentry) > 1) > - || (atomic_read(&inode->i_count) > 1))) > + error = check_conflicting_open(dentry, arg); > + if (error) > goto out; > > /* > @@ -1514,8 +1551,16 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > goto out; > > locks_insert_lock(before, lease); > - return 0; > > + /* > + * The check in break_lease() is lockless. It's possible for another > + * open to race in after we did the earlier check for a conflicting > + * open but before the lease was inserted. Check again for a > + * conflicting open and cancel the lease if there is one. > + */ > + error = check_conflicting_open(dentry, arg); > + if (error) > + locks_unlink_lock(flp); > out: > return error; > } > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3] locks: close potential race between setlease and open 2013-08-15 19:32 ` Bruce Fields @ 2013-08-15 19:43 ` Jeff Layton 2013-08-15 20:30 ` Bruce Fields 0 siblings, 1 reply; 11+ messages in thread From: Jeff Layton @ 2013-08-15 19:43 UTC (permalink / raw) To: Bruce Fields Cc: Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox, David Howells On Thu, 15 Aug 2013 15:32:03 -0400 Bruce Fields <bfields@fieldses.org> wrote: > On Wed, Aug 14, 2013 at 08:11:50AM -0400, Jeff Layton wrote: > > v2: > > - fix potential double-free of lease if second check finds conflict > > - add smp_mb's to ensure that other CPUs see i_flock changes > > > > v3: > > - remove smp_mb calls. Partial ordering is unlikely to help here. > > Forgive me here, I still don't understand. So to simplify massively, > the situation looks like: > > setlease open > ------------ ------ > > atomic_read atomic_inc > write i_flock read i_flock > atomic_read > > And we want to be sure that either the setlease caller sees the result > of the atomic_inc, or the opener sees the result of the write to > i_flock. > > As an example, suppose the above steps happen in the order: > > atomic_read > write i_flock > atomic_read > atomic_inc > read i_flock > > How do we know that the read of i_flock at the last step reflects the > latest value of i_flock? For example, couldn't the write still be stuck > in first CPU's cache? > > --b. > AIUI, all Linux architectures are cache-coherent. So if you do a write to a memory location on one CPU then other CPUs will see it after that event. The thing that you have to be careful about of course is ordering, as the CPU and compiler are allowed to reorder operations in the name of efficiency. I don't think ordering is a concern here since, as you point out, the refcounting operations are atomic or done under spinlock. Now, all that said...I'm pretty clueless when it comes to this sort of thing, so if I'm wrong then please do tell me so. Thanks, > > > > As Al Viro points out, there is an unlikely, but possible race between > > opening a file and setting a lease on it. generic_add_lease is done with > > the i_lock held, but the inode->i_flock check in break_lease is > > lockless. It's possible for another task doing an open to do the entire > > pathwalk and call break_lease between the point where generic_add_lease > > checks for a conflicting open and adds the lease to the list. If this > > occurs, we can end up with a lease set on the file with a conflicting > > open. > > > > To guard against that, check again for a conflicting open after adding > > the lease to the i_flock list. If the above race occurs, then we can > > simply unwind the lease setting and return -EAGAIN. > > > > Because we take dentry references and acquire write access on the file > > before calling break_lease, we know that if the i_flock list is empty > > when the open caller goes to check it then the necessary refcounts have > > already been incremented. Thus the additional check for a conflicting > > open will see that there is one and the setlease call will fail. 
> > > > Cc: Bruce Fields <bfields@fieldses.org> > > Cc: David Howells <dhowells@redhat.com> > > Reported-by: Al Viro <viro@ZenIV.linux.org.uk> > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > > --- > > fs/locks.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++------------ > > 1 file changed, 58 insertions(+), 13 deletions(-) > > > > diff --git a/fs/locks.c b/fs/locks.c > > index b27a300..a99adec 100644 > > --- a/fs/locks.c > > +++ b/fs/locks.c > > @@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) > > locks_insert_global_locks(fl); > > } > > > > -/* > > - * Delete a lock and then free it. > > - * Wake up processes that are blocked waiting for this lock, > > - * notify the FS that the lock has been cleared and > > - * finally free the lock. > > +/** > > + * locks_delete_lock - Delete a lock and then free it. > > + * @thisfl_p: pointer that points to the fl_next field of the previous > > + * inode->i_flock list entry > > + * > > + * Unlink a lock from all lists and free the namespace reference, but don't > > + * free it yet. Wake up processes that are blocked waiting for this lock and > > + * notify the FS that the lock has been cleared. > > * > > * Must be called with the i_lock held! > > */ > > -static void locks_delete_lock(struct file_lock **thisfl_p) > > +static void locks_unlink_lock(struct file_lock **thisfl_p) > > { > > struct file_lock *fl = *thisfl_p; > > > > @@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) > > } > > > > locks_wake_up_blocks(fl); > > +} > > + > > +/* > > + * Unlink a lock from all lists and free it. > > + * > > + * Must be called with i_lock held! > > + */ > > +static void locks_delete_lock(struct file_lock **thisfl_p) > > +{ > > + struct file_lock *fl = *thisfl_p; > > + > > + locks_unlink_lock(thisfl_p); > > locks_free_lock(fl); > > } > > > > @@ -1455,6 +1470,32 @@ int fcntl_getlease(struct file *filp) > > return type; > > } > > > > +/** > > + * check_conflicting_open - see if the given dentry points to a file that has > > + * an existing open that would conflict with the desired lease. > > + * > > + * @dentry: dentry to check > > + * @arg: type of lease that we're trying to acquire > > + * > > + * Check to see if there's an existing open fd on this file that would > > + * conflict with the lease we're trying to set. 
> > + */ > > +static int > > +check_conflicting_open(const struct dentry *dentry, const long arg) > > +{ > > + int ret = 0; > > + struct inode *inode = dentry->d_inode; > > + > > + if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > > + return -EAGAIN; > > + > > + if ((arg == F_WRLCK) && ((d_count(dentry) > 1) || > > + (atomic_read(&inode->i_count) > 1))) > > + ret = -EAGAIN; > > + > > + return ret; > > +} > > + > > static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) > > { > > struct file_lock *fl, **before, **my_before = NULL, *lease; > > @@ -1464,12 +1505,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > > > > lease = *flp; > > > > - error = -EAGAIN; > > - if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > > - goto out; > > - if ((arg == F_WRLCK) > > - && ((d_count(dentry) > 1) > > - || (atomic_read(&inode->i_count) > 1))) > > + error = check_conflicting_open(dentry, arg); > > + if (error) > > goto out; > > > > /* > > @@ -1514,8 +1551,16 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > > goto out; > > > > locks_insert_lock(before, lease); > > - return 0; > > > > + /* > > + * The check in break_lease() is lockless. It's possible for another > > + * open to race in after we did the earlier check for a conflicting > > + * open but before the lease was inserted. Check again for a > > + * conflicting open and cancel the lease if there is one. > > + */ > > + error = check_conflicting_open(dentry, arg); > > + if (error) > > + locks_unlink_lock(flp); > > out: > > return error; > > } > > -- > > 1.8.3.1 > > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3] locks: close potential race between setlease and open 2013-08-15 19:43 ` Jeff Layton @ 2013-08-15 20:30 ` Bruce Fields 0 siblings, 0 replies; 11+ messages in thread From: Bruce Fields @ 2013-08-15 20:30 UTC (permalink / raw) To: Jeff Layton Cc: Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox, David Howells On Thu, Aug 15, 2013 at 03:43:25PM -0400, Jeff Layton wrote: > On Thu, 15 Aug 2013 15:32:03 -0400 > Bruce Fields <bfields@fieldses.org> wrote: > > > On Wed, Aug 14, 2013 at 08:11:50AM -0400, Jeff Layton wrote: > > > v2: > > > - fix potential double-free of lease if second check finds conflict > > > - add smp_mb's to ensure that other CPUs see i_flock changes > > > > > > v3: > > > - remove smp_mb calls. Partial ordering is unlikely to help here. > > > > Forgive me here, I still don't understand. So to simplify massively, > > the situation looks like: > > > > setlease open > > ------------ ------ > > > > atomic_read atomic_inc > > write i_flock read i_flock > > atomic_read > > > > And we want to be sure that either the setlease caller sees the result > > of the atomic_inc, or the opener sees the result of the write to > > i_flock. > > > > As an example, suppose the above steps happen in the order: > > > > atomic_read > > write i_flock > > atomic_read > > atomic_inc > > read i_flock > > > > How do we know that the read of i_flock at the last step reflects the > > latest value of i_flock? For example, couldn't the write still be stuck > > in first CPU's cache? > > > > --b. > > > > AIUI, all Linux architectures are cache-coherent. So if you do a > write to a memory location on one CPU then other CPUs will see it > after that event. > > The thing that you have to be careful about of course is ordering, as > the CPU and compiler are allowed to reorder operations in the name of > efficiency. I don't think ordering is a concern here since, as you > point out, the refcounting operations are atomic or done under spinlock. > > Now, all that said...I'm pretty clueless when it comes to this sort of > thing, so if I'm wrong then please do tell me so. Yeah, well, I'm beyond clueless--what's the difference between cache-coherent and ordered? Whether we say a write's still in one CPU cache or say the write was ordered after later operations--the upshot sounds the same. And I'm unclear what effect the atomic operations have. And the spinlock can't matter as it's only taken before and after all the operations on the left hand side? Oh wait, I guess there is another spinlock--the lglock taken when we modify i_flock. Hm. I'll go bury my nose in memory-barriers.txt and assume this is just my confusion.... --b. > > Thanks, > > > > > > > As Al Viro points out, there is an unlikely, but possible race between > > > opening a file and setting a lease on it. generic_add_lease is done with > > > the i_lock held, but the inode->i_flock check in break_lease is > > > lockless. It's possible for another task doing an open to do the entire > > > pathwalk and call break_lease between the point where generic_add_lease > > > checks for a conflicting open and adds the lease to the list. If this > > > occurs, we can end up with a lease set on the file with a conflicting > > > open. > > > > > > To guard against that, check again for a conflicting open after adding > > > the lease to the i_flock list. If the above race occurs, then we can > > > simply unwind the lease setting and return -EAGAIN. 
> > > > > > Because we take dentry references and acquire write access on the file > > > before calling break_lease, we know that if the i_flock list is empty > > > when the open caller goes to check it then the necessary refcounts have > > > already been incremented. Thus the additional check for a conflicting > > > open will see that there is one and the setlease call will fail. > > > > > > Cc: Bruce Fields <bfields@fieldses.org> > > > Cc: David Howells <dhowells@redhat.com> > > > Reported-by: Al Viro <viro@ZenIV.linux.org.uk> > > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > > > --- > > > fs/locks.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++------------ > > > 1 file changed, 58 insertions(+), 13 deletions(-) > > > > > > diff --git a/fs/locks.c b/fs/locks.c > > > index b27a300..a99adec 100644 > > > --- a/fs/locks.c > > > +++ b/fs/locks.c > > > @@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) > > > locks_insert_global_locks(fl); > > > } > > > > > > -/* > > > - * Delete a lock and then free it. > > > - * Wake up processes that are blocked waiting for this lock, > > > - * notify the FS that the lock has been cleared and > > > - * finally free the lock. > > > +/** > > > + * locks_delete_lock - Delete a lock and then free it. > > > + * @thisfl_p: pointer that points to the fl_next field of the previous > > > + * inode->i_flock list entry > > > + * > > > + * Unlink a lock from all lists and free the namespace reference, but don't > > > + * free it yet. Wake up processes that are blocked waiting for this lock and > > > + * notify the FS that the lock has been cleared. > > > * > > > * Must be called with the i_lock held! > > > */ > > > -static void locks_delete_lock(struct file_lock **thisfl_p) > > > +static void locks_unlink_lock(struct file_lock **thisfl_p) > > > { > > > struct file_lock *fl = *thisfl_p; > > > > > > @@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) > > > } > > > > > > locks_wake_up_blocks(fl); > > > +} > > > + > > > +/* > > > + * Unlink a lock from all lists and free it. > > > + * > > > + * Must be called with i_lock held! > > > + */ > > > +static void locks_delete_lock(struct file_lock **thisfl_p) > > > +{ > > > + struct file_lock *fl = *thisfl_p; > > > + > > > + locks_unlink_lock(thisfl_p); > > > locks_free_lock(fl); > > > } > > > > > > @@ -1455,6 +1470,32 @@ int fcntl_getlease(struct file *filp) > > > return type; > > > } > > > > > > +/** > > > + * check_conflicting_open - see if the given dentry points to a file that has > > > + * an existing open that would conflict with the desired lease. > > > + * > > > + * @dentry: dentry to check > > > + * @arg: type of lease that we're trying to acquire > > > + * > > > + * Check to see if there's an existing open fd on this file that would > > > + * conflict with the lease we're trying to set. 
> > > + */ > > > +static int > > > +check_conflicting_open(const struct dentry *dentry, const long arg) > > > +{ > > > + int ret = 0; > > > + struct inode *inode = dentry->d_inode; > > > + > > > + if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > > > + return -EAGAIN; > > > + > > > + if ((arg == F_WRLCK) && ((d_count(dentry) > 1) || > > > + (atomic_read(&inode->i_count) > 1))) > > > + ret = -EAGAIN; > > > + > > > + return ret; > > > +} > > > + > > > static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) > > > { > > > struct file_lock *fl, **before, **my_before = NULL, *lease; > > > @@ -1464,12 +1505,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > > > > > > lease = *flp; > > > > > > - error = -EAGAIN; > > > - if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > > > - goto out; > > > - if ((arg == F_WRLCK) > > > - && ((d_count(dentry) > 1) > > > - || (atomic_read(&inode->i_count) > 1))) > > > + error = check_conflicting_open(dentry, arg); > > > + if (error) > > > goto out; > > > > > > /* > > > @@ -1514,8 +1551,16 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > > > goto out; > > > > > > locks_insert_lock(before, lease); > > > - return 0; > > > > > > + /* > > > + * The check in break_lease() is lockless. It's possible for another > > > + * open to race in after we did the earlier check for a conflicting > > > + * open but before the lease was inserted. Check again for a > > > + * conflicting open and cancel the lease if there is one. > > > + */ > > > + error = check_conflicting_open(dentry, arg); > > > + if (error) > > > + locks_unlink_lock(flp); > > > out: > > > return error; > > > } > > > -- > > > 1.8.3.1 > > > > > > -- > Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) 2013-08-14 12:11 ` [PATCH v3] " Jeff Layton 2013-08-15 19:32 ` Bruce Fields @ 2013-08-15 20:44 ` David Howells 2013-08-15 21:31 ` Paul E. McKenney 1 sibling, 1 reply; 11+ messages in thread From: David Howells @ 2013-08-15 20:44 UTC (permalink / raw) To: Bruce Fields, paulmck Cc: dhowells, Jeff Layton, Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox Bruce Fields <bfields@fieldses.org> wrote: (Adding Paul McKenney who's good at this stuff) > > v2: > > - fix potential double-free of lease if second check finds conflict > > - add smp_mb's to ensure that other CPUs see i_flock changes > > > > v3: > > - remove smp_mb calls. Partial ordering is unlikely to help here. > > Forgive me here, I still don't understand. So to simplify massively, > the situation looks like: > > setlease open > ------------ ------ > > atomic_read atomic_inc > write i_flock read i_flock > atomic_read Are the three atomic ops reading the same value? If so, you can have smp_mb() calls here: atomic_read atomic_inc smp_mb() write i_flock read i_flock smp_mb() atomic_read I *think* that memory accesses in one place need to be reverse-ordered wrt to those in the other place, so: atomic_read atomic_inc smp_mb() smp_mb() write i_flock read i_flock atomic_read doesn't achieve anything. > And we want to be sure that either the setlease caller sees the result > of the atomic_inc, or the opener sees the result of the write to > i_flock. > > As an example, suppose the above steps happen in the order: > > atomic_read [A] > write i_flock [B] > atomic_read [C] > atomic_inc [X] > read i_flock [Y] (I've labelled the operations for convenience) > How do we know that the read of i_flock [Y] at the last step reflects the > latest value of i_flock? For example, couldn't the write still be stuck in > first CPU's cache? Putting in memory barriers doesn't help here. If A, B and C are all performed and committed to memory by the time X happens, then Y will see B, but C will not see X. The thing to remember is that smp_mb() is not a commit operation: it doesn't cause a modification to be committed to memory. The reason you use it is to make sure the CPU actually does preceding memory ops - eg. makes the modification in question - before it goes and does any following memory ops. Linux requires the system to be cache-coherent, so if the write is actually performed by a CPU then the result will be obtained by any other CPU, no matter whether it's still lingering in the first CPU's caches or whether it's been passed on. -*- However, I could be wrong. Memory barriers are mind-bending to try and think through, especially when it's the operations being ordered are R-W vs R-W rather than having W-W on at least one side. Hopefully Paul will be able to chime in David ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) 2013-08-15 20:44 ` memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) David Howells @ 2013-08-15 21:31 ` Paul E. McKenney 2013-08-16 12:09 ` Jeff Layton 2013-08-19 13:31 ` Bruce Fields 0 siblings, 2 replies; 11+ messages in thread From: Paul E. McKenney @ 2013-08-15 21:31 UTC (permalink / raw) To: David Howells Cc: Bruce Fields, Jeff Layton, Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox On Thu, Aug 15, 2013 at 09:44:25PM +0100, David Howells wrote: > Bruce Fields <bfields@fieldses.org> wrote: > > (Adding Paul McKenney who's good at this stuff) Well, I should be able to provide a more refined form of confusion... > > > v2: > > > - fix potential double-free of lease if second check finds conflict > > > - add smp_mb's to ensure that other CPUs see i_flock changes > > > > > > v3: > > > - remove smp_mb calls. Partial ordering is unlikely to help here. > > > > Forgive me here, I still don't understand. So to simplify massively, > > the situation looks like: > > > > setlease open > > ------------ ------ > > > > atomic_read atomic_inc > > write i_flock read i_flock > > atomic_read > > Are the three atomic ops reading the same value? If so, you can have smp_mb() > calls here: > > atomic_read atomic_inc > smp_mb() > write i_flock read i_flock > smp_mb() > atomic_read Yes, you need memory barrier of some form or another. You can use smp_mb__after_atomic_inc() after the atomic_inc, which is lighter weight on x86. Note that atomic_inc() does not return a value, and thus does not guarantee ordering of any sort. > I *think* that memory accesses in one place need to be reverse-ordered wrt to > those in the other place, so: > > atomic_read atomic_inc > smp_mb() smp_mb() > write i_flock read i_flock > atomic_read > > doesn't achieve anything. The only thing that this arrangement could achive would be if the atomic_ operations are all accessing the same variable, in which case if the first CPU's last atomic_read preceded the atomic_inc, then we would know that the first CPU's first atomic_inc also preceded that same atomic_inc. Let's add values and look at it a slightly different way: CPU 0 CPU 1 r1 = atomic_read(&x); atomic_inc(&x); smp_mb(); smp_mb__after_atomic_inc() i_flock = 1; r3 = i_flock; r2 = atomic_read(&x); Assume that the initial values of x and i_flock are zero. (This might not make sense in the code, but you can substitute different values without changing the way this works.) Then if r2==0, we know that r1==0 as well. Of course, if there were other atomic_inc(&x) operations in flight, it might be that r1!=r2, but we would know that r1 got an earlier value of x than r2. If x is 64 bits, then we know that r1<r2. But as Dave points out, the placement of the memory barriers prevents us from drawing any similar conclusions about the accesses to i_flock. So let's take a similar look at the initial arrangement: CPU 0 CPU 1 r1 = atomic_read(&x); atomic_inc(&x); i_flock = 1; smp_mb__after_atomic_inc() smp_mb(); r3 = i_flock; r2 = atomic_read(&x); Again, assume that the initial values of x and of i_flock are zero. Assume also that this is the only code that executes. Then if r3==0, we know that r2>=1. In fact, if r3==0, we know that r2==1. The load from i_flock() had to have come before the store, and so the memory barriers guarantee that the load into r2 must see the results of the atomic_inc(). 
Similarly, if r2==0, we know that CPU 0's load into r2 came before CPU 1's atomic_inc(). The memory barriers then guarantee that CPU 0's store into i_flock precedes CPU 1's load from i_flock, so that r3==1. Cache coherence gives us another fact. Both of CPU 0's atomic_read()s do volatile loads from the same variable, so they must happen in order (if they were not volatile, the compiler would be within its rights to interchange them). Therefore, because CPU 0's atomic_read() into r2 precedes CPU 1's atomic_inc() -- recall that r2==0 -- we know that CPU 0's atomic_read() into r1 must also precede CPU 1's atomic_inc(), meaning that we know r1==0. Reasoning about memory barriers takes this same form. If something after memory barrier A can be proven to have happened before something that came before memory barrier B, then everything before memory barrier A must happen before everything after memory barrier B. You carry out the proof by looking at loads and stores to a given variable. This is a key point. Memory barriers do nothing except for conditionally enforcing ordering. They are not guaranteed to commit values to memory, to caches, or much of anything else. Again, if you know that something following memory barrier A happened before something preceding memory barrier B, then you know that memory access preceding memory barrier A will be seen by a corresponding memory access following memory barrier B. > > And we want to be sure that either the setlease caller sees the result > > of the atomic_inc, or the opener sees the result of the write to > > i_flock. > > > > As an example, suppose the above steps happen in the order: > > > > atomic_read [A] > > write i_flock [B] > > atomic_read [C] > > atomic_inc [X] > > read i_flock [Y] > > (I've labelled the operations for convenience) > > > How do we know that the read of i_flock [Y] at the last step reflects the > > latest value of i_flock? For example, couldn't the write still be stuck in > > first CPU's cache? > > Putting in memory barriers doesn't help here. If A, B and C are all performed > and committed to memory by the time X happens, then Y will see B, but C will > not see X. Indeed, you don't get both of these at once, except by accident. > The thing to remember is that smp_mb() is not a commit operation: it doesn't > cause a modification to be committed to memory. The reason you use it is to > make sure the CPU actually does preceding memory ops - eg. makes the > modification in question - before it goes and does any following memory ops. > > Linux requires the system to be cache-coherent, so if the write is actually > performed by a CPU then the result will be obtained by any other CPU, no > matter whether it's still lingering in the first CPU's caches or whether it's > been passed on. Or some later result, but yes. Again, these accesses must be volatile (as in either atomic_read() or ACCESS_ONCE()), or the compiler is within its rights to do all sorts of mischief. > -*- > > However, I could be wrong. Memory barriers are mind-bending to try and think > through, especially when it's the operations being ordered are R-W vs R-W > rather than having W-W on at least one side. There are 16 combinations, all of which do something. Some cases require an external store to prove anything, though. 
For example: CPU 0: r1 = ACCESS_ONCE(x); smp_mb(); r2 = ACCESS_ONCE(y); CPU 1: r3 = ACCESS_ONCE(y); smp_mb(); r4 = ACCESS_ONCE(x); CPU 2: ACCESS_ONCE(x) = 1; CPU 3: ACCESS_ONCE(y) = 1; Here, we know that if r2==0&&r3==1, then it is impossible to also have r4==0&&r1==1. Similarly, if r4==0&&r1==1, then it is impossible to also have r2==0&&r3==1. If there were no stores, then of course the reads could not tell you anything about the ordering. > Hopefully Paul will be able to chime in Careful what you wish for! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 11+ messages in thread
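[Paul's pairing argument is the classic store-buffering litmus test. A hedged illustration in C11 atomics follows; the kernel of course uses smp_mb()/smp_mb__after_atomic_inc() rather than C11 fences, and the variable names are made up.]

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int writecount;	/* stands in for inode->i_writecount */
static atomic_int lease;	/* stands in for "inode->i_flock is non-empty" */

/* setlease side: publish the lease, then re-check for a conflicting open. */
static bool setlease_side(void)
{
	atomic_store_explicit(&lease, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* plays the role of smp_mb() */
	return atomic_load_explicit(&writecount, memory_order_relaxed) == 0;
}

/* open side: bump the refcount, then look for a lease to break. */
static bool open_side(void)
{
	atomic_fetch_add_explicit(&writecount, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb__after_atomic_inc() */
	return atomic_load_explicit(&lease, memory_order_relaxed) == 0;
}

[With both fences in place, setlease_side() and open_side() cannot both return true: at least one of them observes the other's store, which is exactly the guarantee the lease code needs. Drop either fence and the "both return true" outcome becomes possible.]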
* Re: memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) 2013-08-15 21:31 ` Paul E. McKenney @ 2013-08-16 12:09 ` Jeff Layton 2013-08-19 13:31 ` Bruce Fields 1 sibling, 0 replies; 11+ messages in thread From: Jeff Layton @ 2013-08-16 12:09 UTC (permalink / raw) To: paulmck Cc: David Howells, Bruce Fields, Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox On Thu, 15 Aug 2013 14:31:06 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > On Thu, Aug 15, 2013 at 09:44:25PM +0100, David Howells wrote: > > Bruce Fields <bfields@fieldses.org> wrote: > > > > (Adding Paul McKenney who's good at this stuff) > > Well, I should be able to provide a more refined form of confusion... > > > > > v2: > > > > - fix potential double-free of lease if second check finds conflict > > > > - add smp_mb's to ensure that other CPUs see i_flock changes > > > > > > > > v3: > > > > - remove smp_mb calls. Partial ordering is unlikely to help here. > > > > > > Forgive me here, I still don't understand. So to simplify massively, > > > the situation looks like: > > > > > > setlease open > > > ------------ ------ > > > > > > atomic_read atomic_inc > > > write i_flock read i_flock > > > atomic_read > > > > Are the three atomic ops reading the same value? If so, you can have smp_mb() > > calls here: > > > > atomic_read atomic_inc > > smp_mb() > > write i_flock read i_flock > > smp_mb() > > atomic_read > > Yes, you need memory barrier of some form or another. You can use > smp_mb__after_atomic_inc() after the atomic_inc, which is lighter > weight on x86. Note that atomic_inc() does not return a value, and > thus does not guarantee ordering of any sort. > > > I *think* that memory accesses in one place need to be reverse-ordered wrt to > > those in the other place, so: > > > > atomic_read atomic_inc > > smp_mb() smp_mb() > > write i_flock read i_flock > > atomic_read > > > > doesn't achieve anything. > > The only thing that this arrangement could achive would be if the > atomic_ operations are all accessing the same variable, in which case > if the first CPU's last atomic_read preceded the atomic_inc, then > we would know that the first CPU's first atomic_inc also preceded > that same atomic_inc. > > Let's add values and look at it a slightly different way: > > CPU 0 CPU 1 > > r1 = atomic_read(&x); atomic_inc(&x); > smp_mb(); smp_mb__after_atomic_inc() > i_flock = 1; r3 = i_flock; > r2 = atomic_read(&x); > > Assume that the initial values of x and i_flock are zero. (This might not > make sense in the code, but you can substitute different values without > changing the way this works.) > > Then if r2==0, we know that r1==0 as well. Of course, if there were other > atomic_inc(&x) operations in flight, it might be that r1!=r2, but we > would know that r1 got an earlier value of x than r2. If x is 64 bits, > then we know that r1<r2. > > But as Dave points out, the placement of the memory barriers prevents > us from drawing any similar conclusions about the accesses to i_flock. > > So let's take a similar look at the initial arrangement: > > CPU 0 CPU 1 > > r1 = atomic_read(&x); atomic_inc(&x); > i_flock = 1; smp_mb__after_atomic_inc() > smp_mb(); r3 = i_flock; > r2 = atomic_read(&x); > > Again, assume that the initial values of x and of i_flock are zero. > Assume also that this is the only code that executes. Then if r3==0, we > know that r2>=1. In fact, if r3==0, we know that r2==1. 
The load from > i_flock() had to have come before the store, and so the memory barriers > guarantee that the load into r2 must see the results of the atomic_inc(). > > Similarly, if r2==0, we know that CPU 0's load into r2 came before > CPU 1's atomic_inc(). The memory barriers then guarantee that CPU 0's > store into i_flock precedes CPU 1's load from i_flock, so that r3==1. > Cache coherence gives us another fact. Both of CPU 0's atomic_read()s > do volatile loads from the same variable, so they must happen in order > (if they were not volatile, the compiler would be within its rights > to interchange them). Therefore, because CPU 0's atomic_read() into > r2 precedes CPU 1's atomic_inc() -- recall that r2==0 -- we know that > CPU 0's atomic_read() into r1 must also precede CPU 1's atomic_inc(), > meaning that we know r1==0. > > Reasoning about memory barriers takes this same form. If something after > memory barrier A can be proven to have happened before something that > came before memory barrier B, then everything before memory barrier A > must happen before everything after memory barrier B. You carry out > the proof by looking at loads and stores to a given variable. > > This is a key point. Memory barriers do nothing except for conditionally > enforcing ordering. They are not guaranteed to commit values to memory, > to caches, or much of anything else. Again, if you know that something > following memory barrier A happened before something preceding memory > barrier B, then you know that memory access preceding memory barrier A > will be seen by a corresponding memory access following memory barrier B. > > > > And we want to be sure that either the setlease caller sees the result > > > of the atomic_inc, or the opener sees the result of the write to > > > i_flock. > > > > > > As an example, suppose the above steps happen in the order: > > > > > > atomic_read [A] > > > write i_flock [B] > > > atomic_read [C] > > > atomic_inc [X] > > > read i_flock [Y] > > > > (I've labelled the operations for convenience) > > > > > How do we know that the read of i_flock [Y] at the last step reflects the > > > latest value of i_flock? For example, couldn't the write still be stuck in > > > first CPU's cache? > > > > Putting in memory barriers doesn't help here. If A, B and C are all performed > > and committed to memory by the time X happens, then Y will see B, but C will > > not see X. > > Indeed, you don't get both of these at once, except by accident. > > > The thing to remember is that smp_mb() is not a commit operation: it doesn't > > cause a modification to be committed to memory. The reason you use it is to > > make sure the CPU actually does preceding memory ops - eg. makes the > > modification in question - before it goes and does any following memory ops. > > > > Linux requires the system to be cache-coherent, so if the write is actually > > performed by a CPU then the result will be obtained by any other CPU, no > > matter whether it's still lingering in the first CPU's caches or whether it's > > been passed on. > > Or some later result, but yes. Again, these accesses must be volatile > (as in either atomic_read() or ACCESS_ONCE()), or the compiler is within > its rights to do all sorts of mischief. > > > -*- > > > > However, I could be wrong. Memory barriers are mind-bending to try and think > > through, especially when it's the operations being ordered are R-W vs R-W > > rather than having W-W on at least one side. > > There are 16 combinations, all of which do something. 
Some cases require > an external store to prove anything, though. For example: > > CPU 0: r1 = ACCESS_ONCE(x); smp_mb(); r2 = ACCESS_ONCE(y); > CPU 1: r3 = ACCESS_ONCE(y); smp_mb(); r4 = ACCESS_ONCE(x); > CPU 2: ACCESS_ONCE(x) = 1; > CPU 3: ACCESS_ONCE(y) = 1; > > Here, we know that if r2==0&&r3==1, then it is impossible to also have > r4==0&&r1==1. Similarly, if r4==0&&r1==1, then it is impossible to also > have r2==0&&r3==1. If there were no stores, then of course the reads > could not tell you anything about the ordering. > > > Hopefully Paul will be able to chime in > > Careful what you wish for! ;-) > > Thanx, Paul > Wow! Thanks for the excellent writeup. I think I'm starting to get it... In this case the issue is the manipulation of the inode->i_flock list (which is a simple single-pointer linked list) vs. manipulation of various refcounts on dentries and inodes. The most problematic refcount is probably the inode->i_writecount, which is an atomic_t. The others involve incrementing the dentry->d_count somewhere, and that's always done under a spinlock which should imply a memory barrier. In the open codepath, the i_writecount is incremented by __get_file_write_access. We need to ensure that that increment is done prior to checking !inode->i_flock. do_dentry_open does take at least one spinlock in between those points in file_sb_list_add(). That should be enough to enforce ordering there. In the setlease codepath, we need to ensure that the inode->i_flock store is done before checking the refcounts a second time. That store is done prior to locks_insert_global_locks, which takes a spinlock too, so that should enforce the ordering there. So, I think this patch should close the race. That said, it might not hurt to add back in some explicit memory barriers in the event that later code changes remove or move around the implicit ones. I'll spin up a v4 patch that does that, but it'll probably look very similar to the v2 one. Thanks for all the help! -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) 2013-08-15 21:31 ` Paul E. McKenney 2013-08-16 12:09 ` Jeff Layton @ 2013-08-19 13:31 ` Bruce Fields 1 sibling, 0 replies; 11+ messages in thread From: Bruce Fields @ 2013-08-19 13:31 UTC (permalink / raw) To: Paul E. McKenney Cc: David Howells, Jeff Layton, Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox On Thu, Aug 15, 2013 at 02:31:06PM -0700, Paul E. McKenney wrote: > On Thu, Aug 15, 2013 at 09:44:25PM +0100, David Howells wrote: > > Bruce Fields <bfields@fieldses.org> wrote: > > > > (Adding Paul McKenney who's good at this stuff) > > Well, I should be able to provide a more refined form of confusion... > > > > > v2: > > > > - fix potential double-free of lease if second check finds conflict > > > > - add smp_mb's to ensure that other CPUs see i_flock changes > > > > > > > > v3: > > > > - remove smp_mb calls. Partial ordering is unlikely to help here. > > > > > > Forgive me here, I still don't understand. So to simplify massively, > > > the situation looks like: > > > > > > setlease open > > > ------------ ------ > > > > > > atomic_read atomic_inc > > > write i_flock read i_flock > > > atomic_read > > > > Are the three atomic ops reading the same value? If so, you can have smp_mb() > > calls here: > > > > atomic_read atomic_inc > > smp_mb() > > write i_flock read i_flock > > smp_mb() > > atomic_read > > Yes, you need memory barrier of some form or another. You can use > smp_mb__after_atomic_inc() after the atomic_inc, which is lighter > weight on x86. Note that atomic_inc() does not return a value, and > thus does not guarantee ordering of any sort. > > > I *think* that memory accesses in one place need to be reverse-ordered wrt to > > those in the other place, so: > > > > atomic_read atomic_inc > > smp_mb() smp_mb() > > write i_flock read i_flock > > atomic_read > > > > doesn't achieve anything. > > The only thing that this arrangement could achive would be if the > atomic_ operations are all accessing the same variable, in which case > if the first CPU's last atomic_read preceded the atomic_inc, then > we would know that the first CPU's first atomic_inc also preceded > that same atomic_inc. > > Let's add values and look at it a slightly different way: > > CPU 0 CPU 1 > > r1 = atomic_read(&x); atomic_inc(&x); > smp_mb(); smp_mb__after_atomic_inc() > i_flock = 1; r3 = i_flock; > r2 = atomic_read(&x); > > Assume that the initial values of x and i_flock are zero. (This might not > make sense in the code, but you can substitute different values without > changing the way this works.) > > Then if r2==0, we know that r1==0 as well. Of course, if there were other > atomic_inc(&x) operations in flight, it might be that r1!=r2, but we > would know that r1 got an earlier value of x than r2. If x is 64 bits, > then we know that r1<r2. > > But as Dave points out, the placement of the memory barriers prevents > us from drawing any similar conclusions about the accesses to i_flock. > > So let's take a similar look at the initial arrangement: > > CPU 0 CPU 1 > > r1 = atomic_read(&x); atomic_inc(&x); > i_flock = 1; smp_mb__after_atomic_inc() > smp_mb(); r3 = i_flock; > r2 = atomic_read(&x); > > Again, assume that the initial values of x and of i_flock are zero. > Assume also that this is the only code that executes. Then if r3==0, we > know that r2>=1. In fact, if r3==0, we know that r2==1. 
The load from > i_flock() had to have come before the store, and so the memory barriers > guarantee that the load into r2 must see the results of the atomic_inc(). > > Similarly, if r2==0, we know that CPU 0's load into r2 came before > CPU 1's atomic_inc(). The memory barriers then guarantee that CPU 0's > store into i_flock precedes CPU 1's load from i_flock, so that r3==1. > Cache coherence gives us another fact. Both of CPU 0's atomic_read()s > do volatile loads from the same variable, so they must happen in order > (if they were not volatile, the compiler would be within its rights > to interchange them). OK, but the smp_mb() already guaranteed this, right? (How would our results be different if we replaced the above by r1 = x; x++; i_flock=1; smp_mb(); smb_mb(); r3=i_flock; r2 = x; ? Could the compiler could for example assume that x doesn't change at all between the two assignments and not do the second read at all? But if it's allowed to do that then I'm not sure I understand what a compiler barrier is.) > Therefore, because CPU 0's atomic_read() into > r2 precedes CPU 1's atomic_inc() -- recall that r2==0 -- we know that > CPU 0's atomic_read() into r1 must also precede CPU 1's atomic_inc(), > meaning that we know r1==0. > > Reasoning about memory barriers takes this same form. If something after > memory barrier A can be proven to have happened before something that > came before memory barrier B, then everything before memory barrier A > must happen before everything after memory barrier B. You carry out > the proof by looking at loads and stores to a given variable. > > This is a key point. Memory barriers do nothing except for conditionally > enforcing ordering. They are not guaranteed to commit values to memory, > to caches, or much of anything else. Again, if you know that something > following memory barrier A happened before something preceding memory > barrier B, then you know that memory access preceding memory barrier A > will be seen by a corresponding memory access following memory barrier B. > > > > And we want to be sure that either the setlease caller sees the result > > > of the atomic_inc, or the opener sees the result of the write to > > > i_flock. > > > > > > As an example, suppose the above steps happen in the order: > > > > > > atomic_read [A] > > > write i_flock [B] > > > atomic_read [C] > > > atomic_inc [X] > > > read i_flock [Y] > > > > (I've labelled the operations for convenience) > > > > > How do we know that the read of i_flock [Y] at the last step reflects the > > > latest value of i_flock? For example, couldn't the write still be stuck in > > > first CPU's cache? > > > > Putting in memory barriers doesn't help here. If A, B and C are all performed > > and committed to memory by the time X happens, then Y will see B, but C will > > not see X. > > Indeed, you don't get both of these at once, except by accident. > > > The thing to remember is that smp_mb() is not a commit operation: it doesn't > > cause a modification to be committed to memory. The reason you use it is to > > make sure the CPU actually does preceding memory ops - eg. makes the > > modification in question - before it goes and does any following memory ops. > > > > Linux requires the system to be cache-coherent, so if the write is actually > > performed by a CPU then the result will be obtained by any other CPU, no > > matter whether it's still lingering in the first CPU's caches or whether it's > > been passed on. > > Or some later result, but yes. 
Again, these accesses must be volatile > (as in either atomic_read() or ACCESS_ONCE()), or the compiler is within > its rights to do all sorts of mischief. Again I'm confused here by the statement in memory-barriers.txt that "All memory barriers except the data dependency barriers imply a compiler barrier." Shouldn't that mean that the compiler is forbidden any mischief that would make accesses appear to be misordered with respect to the barriers? If a compiler barrier doesn't mean at least that, then I can't figure out what's left that it could mean. > > -*- > > > > However, I could be wrong. Memory barriers are mind-bending to try and think > > through, especially when the operations being ordered are R-W vs R-W > > rather than having W-W on at least one side. > > There are 16 combinations, all of which do something. Some cases require > an external store to prove anything, though. For example: > > CPU 0: r1 = ACCESS_ONCE(x); smp_mb(); r2 = ACCESS_ONCE(y); > CPU 1: r3 = ACCESS_ONCE(y); smp_mb(); r4 = ACCESS_ONCE(x); > CPU 2: ACCESS_ONCE(x) = 1; > CPU 3: ACCESS_ONCE(y) = 1; > > Here, we know that if r2==0&&r3==1, then it is impossible to also have > r4==0&&r1==1. Similarly, if r4==0&&r1==1, then it is impossible to also > have r2==0&&r3==1. If there were no stores, then of course the reads > could not tell you anything about the ordering. > > > Hopefully Paul will be able to chime in > > Careful what you wish for! ;-) Thanks so much for the explanations! --b. ^ permalink raw reply [flat|nested] 11+ messages in thread
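For readers who want to experiment with the ordering argument above, the two-CPU example can be modelled from userspace with C11 atomics. This is only a sketch, not kernel code: seq_cst fences stand in for smp_mb()/smp_mb__after_atomic_inc(), the variable names follow Paul's example, and the initial refcount check (r1) is omitted because it does not affect the conclusion. The forbidden outcome is r2==0 && r3==0: at least one side must observe the other's store, which is exactly what lets either the recheck or break_lease() catch the race.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int x;       /* stands in for the conflicting-open refcount */
static atomic_int i_flock; /* stands in for the inode->i_flock pointer    */

static int r2, r3;

static void *setlease_side(void *arg)
{
	(void)arg;
	/* "insert the lease", full barrier, then recheck the refcount */
	atomic_store_explicit(&i_flock, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);       /* models smp_mb() */
	r2 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

static void *open_side(void *arg)
{
	(void)arg;
	/* "take the reference", full barrier, then check for a lease */
	atomic_fetch_add_explicit(&x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* models smp_mb__after_atomic_inc() */
	r3 = atomic_load_explicit(&i_flock, memory_order_relaxed);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, setlease_side, NULL);
	pthread_create(&b, NULL, open_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* r2 == 0 && r3 == 0 cannot happen: one side must see the other. */
	printf("r2=%d r3=%d\n", r2, r3);
	return 0;
}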
* [PATCH v4] locks: close potential race between setlease and open 2013-07-08 19:07 [PATCH] locks: close potential race between setlease and open Jeff Layton 2013-08-14 12:11 ` [PATCH v3] " Jeff Layton @ 2013-08-16 14:49 ` Jeff Layton 2013-08-19 13:34 ` Bruce Fields 1 sibling, 1 reply; 11+ messages in thread From: Jeff Layton @ 2013-08-16 14:49 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, linux-kernel, Matthew Wilcox, Bruce Fields, David Howells, Paul E. McKenney v2: - fix potential double-free of lease if second check finds conflict - add smp_mb's to ensure that other CPUs see i_flock changes v3: - remove smp_mb calls. Partial ordering is unlikely to help here. v4: - add back smp_mb calls. While we have implicit barriers in place that enforce this today, it's best to be explicit about it as a defensive coding measure. As Al Viro points out, there is an unlikely, but possible race between opening a file and setting a lease on it. generic_add_lease is done with the i_lock held, but the inode->i_flock check in break_lease is lockless. It's possible for another task doing an open to do the entire pathwalk and call break_lease between the point where generic_add_lease checks for a conflicting open and adds the lease to the list. If this occurs, we can end up with a lease set on the file with a conflicting open. To guard against that, check again for a conflicting open after adding the lease to the i_flock list. If the above race occurs, then we can simply unwind the lease setting and return -EAGAIN. Because we take dentry references and acquire write access on the file before calling break_lease, we know that if the i_flock list is empty when the open caller goes to check it then the necessary refcounts have already been incremented. Thus the additional check for a conflicting open will see that there is one and the setlease call will fail. Cc: Bruce Fields <bfields@fieldses.org> Cc: David Howells <dhowells@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/locks.c | 75 ++++++++++++++++++++++++++++++++++++++++++++---------- include/linux/fs.h | 6 +++++ 2 files changed, 68 insertions(+), 13 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index b27a300..8dddcb5 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) locks_insert_global_locks(fl); } -/* - * Delete a lock and then free it. - * Wake up processes that are blocked waiting for this lock, - * notify the FS that the lock has been cleared and - * finally free the lock. +/** + * locks_delete_lock - Delete a lock and then free it. + * @thisfl_p: pointer that points to the fl_next field of the previous + * inode->i_flock list entry + * + * Unlink a lock from all lists and free the namespace reference, but don't + * free it yet. Wake up processes that are blocked waiting for this lock and + * notify the FS that the lock has been cleared. * * Must be called with the i_lock held! */ -static void locks_delete_lock(struct file_lock **thisfl_p) +static void locks_unlink_lock(struct file_lock **thisfl_p) { struct file_lock *fl = *thisfl_p; @@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) } locks_wake_up_blocks(fl); +} + +/* + * Unlink a lock from all lists and free it. + * + * Must be called with i_lock held! 
+ */ +static void locks_delete_lock(struct file_lock **thisfl_p) +{ + struct file_lock *fl = *thisfl_p; + + locks_unlink_lock(thisfl_p); locks_free_lock(fl); } @@ -1455,6 +1470,32 @@ int fcntl_getlease(struct file *filp) return type; } +/** + * check_conflicting_open - see if the given dentry points to a file that has + * an existing open that would conflict with the + * desired lease. + * @dentry: dentry to check + * @arg: type of lease that we're trying to acquire + * + * Check to see if there's an existing open fd on this file that would + * conflict with the lease we're trying to set. + */ +static int +check_conflicting_open(const struct dentry *dentry, const long arg) +{ + int ret = 0; + struct inode *inode = dentry->d_inode; + + if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) + return -EAGAIN; + + if ((arg == F_WRLCK) && ((d_count(dentry) > 1) || + (atomic_read(&inode->i_count) > 1))) + ret = -EAGAIN; + + return ret; +} + static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) { struct file_lock *fl, **before, **my_before = NULL, *lease; @@ -1464,12 +1505,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp lease = *flp; - error = -EAGAIN; - if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) - goto out; - if ((arg == F_WRLCK) - && ((d_count(dentry) > 1) - || (atomic_read(&inode->i_count) > 1))) + error = check_conflicting_open(dentry, arg); + if (error) goto out; /* @@ -1514,8 +1551,20 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp goto out; locks_insert_lock(before, lease); - return 0; + /* + * The check in break_lease() is lockless. It's possible for another + * open to race in after we did the earlier check for a conflicting + * open but before the lease was inserted. Check again for a + * conflicting open and cancel the lease if there is one. + * + * We also add a barrier here to ensure that the insertion of the lock + * precedes these checks. + */ + smp_mb(); + error = check_conflicting_open(dentry, arg); + if (error) + locks_unlink_lock(flp); out: return error; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 9818747..165bf41 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1955,6 +1955,12 @@ static inline int locks_verify_truncate(struct inode *inode, static inline int break_lease(struct inode *inode, unsigned int mode) { + /* + * Since this check is lockless, we must ensure that any refcounts + * taken are done before checking inode->i_flock. Otherwise, we could + * end up racing with tasks trying to set a new lease on this file. + */ + smp_mb(); if (inode->i_flock) return __break_lease(inode, mode); return 0; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
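The user-visible behaviour that the recheck defends can also be sketched from userspace. The program below is illustrative only (the path and the minimal error handling are arbitrary): with a second descriptor already open on the file, fcntl(F_SETLEASE, F_WRLCK) is expected to fail with EAGAIN, and the race described above is precisely the window in which an open and a write lease could otherwise both slip through.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/tmp/lease-demo";	/* arbitrary demo file */
	int fd1, fd2;

	fd1 = open(path, O_RDWR | O_CREAT, 0600);
	fd2 = open(path, O_RDWR);		/* the "conflicting open" */
	if (fd1 < 0 || fd2 < 0) {
		perror("open");
		return 1;
	}

	/*
	 * With another open file description on the inode, an exclusive
	 * lease must not be granted; the conflicting-open check (and the
	 * recheck added above) should make this fail with EAGAIN.
	 */
	if (fcntl(fd1, F_SETLEASE, F_WRLCK) < 0)
		printf("F_SETLEASE(F_WRLCK): %s (expected)\n", strerror(errno));
	else
		printf("write lease granted despite a second open?\n");

	close(fd2);
	close(fd1);
	unlink(path);
	return 0;
}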
* Re: [PATCH v4] locks: close potential race between setlease and open 2013-08-16 14:49 ` [PATCH v4] locks: close potential race between setlease and open Jeff Layton @ 2013-08-19 13:34 ` Bruce Fields 0 siblings, 0 replies; 11+ messages in thread From: Bruce Fields @ 2013-08-19 13:34 UTC (permalink / raw) To: Jeff Layton Cc: Al Viro, linux-fsdevel, linux-kernel, Matthew Wilcox, David Howells, Paul E. McKenney On Fri, Aug 16, 2013 at 10:49:32AM -0400, Jeff Layton wrote: > v2: > - fix potential double-free of lease if second check finds conflict > - add smp_mb's to ensure that other CPUs see i_flock changes > > v3: > - remove smp_mb calls. Partial ordering is unlikely to help here. > > v4: > - add back smp_mb calls. While we have implicit barriers in place > that enforce this today, it's best to be explicit about it as a > defensive coding measure. > > As Al Viro points out, there is an unlikely, but possible race between > opening a file and setting a lease on it. generic_add_lease is done with > the i_lock held, but the inode->i_flock check in break_lease is > lockless. It's possible for another task doing an open to do the entire > pathwalk and call break_lease between the point where generic_add_lease > checks for a conflicting open and adds the lease to the list. If this > occurs, we can end up with a lease set on the file with a conflicting > open. > > To guard against that, check again for a conflicting open after adding > the lease to the i_flock list. If the above race occurs, then we can > simply unwind the lease setting and return -EAGAIN. > > Because we take dentry references and acquire write access on the file > before calling break_lease, we know that if the i_flock list is empty > when the open caller goes to check it then the necessary refcounts have > already been incremented. Thus the additional check for a conflicting > open will see that there is one and the setlease call will fail. ACK--thanks for your persistence! --b. > > Cc: Bruce Fields <bfields@fieldses.org> > Cc: David Howells <dhowells@redhat.com> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > Reported-by: Al Viro <viro@ZenIV.linux.org.uk> > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/locks.c | 75 ++++++++++++++++++++++++++++++++++++++++++++---------- > include/linux/fs.h | 6 +++++ > 2 files changed, 68 insertions(+), 13 deletions(-) > > diff --git a/fs/locks.c b/fs/locks.c > index b27a300..8dddcb5 100644 > --- a/fs/locks.c > +++ b/fs/locks.c > @@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) > locks_insert_global_locks(fl); > } > > -/* > - * Delete a lock and then free it. > - * Wake up processes that are blocked waiting for this lock, > - * notify the FS that the lock has been cleared and > - * finally free the lock. > +/** > + * locks_delete_lock - Delete a lock and then free it. > + * @thisfl_p: pointer that points to the fl_next field of the previous > + * inode->i_flock list entry > + * > + * Unlink a lock from all lists and free the namespace reference, but don't > + * free it yet. Wake up processes that are blocked waiting for this lock and > + * notify the FS that the lock has been cleared. > * > * Must be called with the i_lock held! 
> */ > -static void locks_delete_lock(struct file_lock **thisfl_p) > +static void locks_unlink_lock(struct file_lock **thisfl_p) > { > struct file_lock *fl = *thisfl_p; > > @@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) > } > > locks_wake_up_blocks(fl); > +} > + > +/* > + * Unlink a lock from all lists and free it. > + * > + * Must be called with i_lock held! > + */ > +static void locks_delete_lock(struct file_lock **thisfl_p) > +{ > + struct file_lock *fl = *thisfl_p; > + > + locks_unlink_lock(thisfl_p); > locks_free_lock(fl); > } > > @@ -1455,6 +1470,32 @@ int fcntl_getlease(struct file *filp) > return type; > } > > +/** > + * check_conflicting_open - see if the given dentry points to a file that has > + * an existing open that would conflict with the > + * desired lease. > + * @dentry: dentry to check > + * @arg: type of lease that we're trying to acquire > + * > + * Check to see if there's an existing open fd on this file that would > + * conflict with the lease we're trying to set. > + */ > +static int > +check_conflicting_open(const struct dentry *dentry, const long arg) > +{ > + int ret = 0; > + struct inode *inode = dentry->d_inode; > + > + if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > + return -EAGAIN; > + > + if ((arg == F_WRLCK) && ((d_count(dentry) > 1) || > + (atomic_read(&inode->i_count) > 1))) > + ret = -EAGAIN; > + > + return ret; > +} > + > static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) > { > struct file_lock *fl, **before, **my_before = NULL, *lease; > @@ -1464,12 +1505,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > > lease = *flp; > > - error = -EAGAIN; > - if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) > - goto out; > - if ((arg == F_WRLCK) > - && ((d_count(dentry) > 1) > - || (atomic_read(&inode->i_count) > 1))) > + error = check_conflicting_open(dentry, arg); > + if (error) > goto out; > > /* > @@ -1514,8 +1551,20 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp > goto out; > > locks_insert_lock(before, lease); > - return 0; > > + /* > + * The check in break_lease() is lockless. It's possible for another > + * open to race in after we did the earlier check for a conflicting > + * open but before the lease was inserted. Check again for a > + * conflicting open and cancel the lease if there is one. > + * > + * We also add a barrier here to ensure that the insertion of the lock > + * precedes these checks. > + */ > + smp_mb(); > + error = check_conflicting_open(dentry, arg); > + if (error) > + locks_unlink_lock(flp); > out: > return error; > } > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 9818747..165bf41 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1955,6 +1955,12 @@ static inline int locks_verify_truncate(struct inode *inode, > > static inline int break_lease(struct inode *inode, unsigned int mode) > { > + /* > + * Since this check is lockless, we must ensure that any refcounts > + * taken are done before checking inode->i_flock. Otherwise, we could > + * end up racing with tasks trying to set a new lease on this file. > + */ > + smp_mb(); > if (inode->i_flock) > return __break_lease(inode, mode); > return 0; > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 11+ messages in thread
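One design point worth noting in the ACKed patch is the split of locks_unlink_lock() out of locks_delete_lock(): the error path must take the just-inserted lease back off the i_flock list without freeing it, because the file_lock is still owned by the setlease caller (this is the potential double free that v2 of the patch addressed). The following is a generic sketch of that unlink-versus-delete split, with purely illustrative names and a plain singly linked list rather than the kernel structures.

#include <stdio.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
	int data;
};

/* Unlink *pos from the list but leave the object allocated; the caller
 * still owns it.  This mirrors the role of locks_unlink_lock(). */
static void entry_unlink(struct entry **pos)
{
	struct entry *e = *pos;

	*pos = e->next;
	e->next = NULL;
}

/* Unlink and free: the ordinary teardown, mirroring locks_delete_lock(). */
static void entry_delete(struct entry **pos)
{
	struct entry *e = *pos;

	entry_unlink(pos);
	free(e);
}

int main(void)
{
	struct entry *head = NULL;
	struct entry *e = calloc(1, sizeof(*e));
	int conflict = 1;	/* stand-in for the post-insert recheck failing */

	if (!e)
		return 1;

	e->next = head;		/* insert at the head of the list */
	head = e;

	if (conflict) {
		entry_unlink(&head);	/* unwind only; no free here */
		free(e);		/* the owner disposes of it exactly once */
	} else {
		entry_delete(&head);	/* normal path unlinks and frees */
	}

	printf("list is %s\n", head ? "non-empty" : "empty");
	return 0;
}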
end of thread, other threads:[~2013-08-19 13:34 UTC | newest]

Thread overview: 11+ messages:
2013-07-08 19:07 [PATCH] locks: close potential race between setlease and open Jeff Layton
2013-08-14 12:11 ` [PATCH v3] " Jeff Layton
2013-08-15 19:32 ` Bruce Fields
2013-08-15 19:43 ` Jeff Layton
2013-08-15 20:30 ` Bruce Fields
2013-08-15 20:44 ` memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open) David Howells
2013-08-15 21:31 ` Paul E. McKenney
2013-08-16 12:09 ` Jeff Layton
2013-08-19 13:31 ` Bruce Fields
2013-08-16 14:49 ` [PATCH v4] locks: close potential race between setlease and open Jeff Layton
2013-08-19 13:34 ` Bruce Fields