* [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
@ 2016-06-30 16:12 Chuck Lever
2016-06-30 16:17 ` Jeff Layton
0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2016-06-30 16:12 UTC (permalink / raw)
To: jlayton; +Cc: linux-nfs
nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.
During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.
Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state has been unhashed.
Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4state.c | 40 +++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 23 deletions(-)
Hey Jeff-
Wondering what your thoughts about this are. I noticed a possible
race while looking at another bug. It's untested.
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 70d0b9b..b921123 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1200,27 +1200,6 @@ free_ol_stateid_reaplist(struct list_head *reaplist)
}
}
-static void release_lockowner(struct nfs4_lockowner *lo)
-{
- struct nfs4_client *clp = lo->lo_owner.so_client;
- struct nfs4_ol_stateid *stp;
- struct list_head reaplist;
-
- INIT_LIST_HEAD(&reaplist);
-
- spin_lock(&clp->cl_lock);
- unhash_lockowner_locked(lo);
- while (!list_empty(&lo->lo_owner.so_stateids)) {
- stp = list_first_entry(&lo->lo_owner.so_stateids,
- struct nfs4_ol_stateid, st_perstateowner);
- WARN_ON(!unhash_lock_stateid(stp));
- put_ol_stateid_locked(stp, &reaplist);
- }
- spin_unlock(&clp->cl_lock);
- free_ol_stateid_reaplist(&reaplist);
- nfs4_put_stateowner(&lo->lo_owner);
-}
-
static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
struct list_head *reaplist)
{
@@ -5945,6 +5924,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
__be32 status;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
struct nfs4_client *clp;
+ LIST_HEAD (reaplist);
dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n",
clid->cl_boot, clid->cl_id);
@@ -5975,9 +5955,23 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
nfs4_get_stateowner(sop);
break;
}
+ if (!lo) {
+ spin_unlock(&clp->cl_lock);
+ return status;
+ }
+
+ unhash_lockowner_locked(lo);
+ while (!list_empty(&lo->lo_owner.so_stateids)) {
+ stp = list_first_entry(&lo->lo_owner.so_stateids,
+ struct nfs4_ol_stateid,
+ st_perstateowner);
+ WARN_ON(!unhash_lock_stateid(stp));
+ put_ol_stateid_locked(stp, &reaplist);
+ }
spin_unlock(&clp->cl_lock);
- if (lo)
- release_lockowner(lo);
+ free_ol_stateid_reaplist(&reaplist);
+ nfs4_put_stateowner(&lo->lo_owner);
+
return status;
}
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
2016-06-30 16:12 [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock Chuck Lever
@ 2016-06-30 16:17 ` Jeff Layton
2016-06-30 16:20 ` Chuck Lever
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2016-06-30 16:17 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-nfs
On Thu, 2016-06-30 at 12:12 -0400, Chuck Lever wrote:
> nfsd4_release_lockowner finds a lock owner that has no lock state,
> and drops cl_lock. Then release_lockowner picks up cl_lock and
> unhashes the lock owner.
>
> During the window where cl_lock is dropped, I don't see anything
> preventing a concurrent nfsd4_lock from finding that same lock owner
> and adding lock state to it.
>
> Move release_lockowner() into nfsd4_release_lockowner and hang onto
> the cl_lock until after the lock owner's state has been unhashed.
>
> Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4state.c | 40 +++++++++++++++++-----------------------
> 1 file changed, 17 insertions(+), 23 deletions(-)
>
> Hey Jeff-
>
> Wondering what your thoughts about this are. I noticed a possible
> race while looking at another bug. It's untested.
>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 70d0b9b..b921123 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1200,27 +1200,6 @@ free_ol_stateid_reaplist(struct list_head *reaplist)
> }
> }
>
> -static void release_lockowner(struct nfs4_lockowner *lo)
> -{
> - struct nfs4_client *clp = lo->lo_owner.so_client;
> - struct nfs4_ol_stateid *stp;
> - struct list_head reaplist;
> -
> - INIT_LIST_HEAD(&reaplist);
> -
> - spin_lock(&clp->cl_lock);
> - unhash_lockowner_locked(lo);
> - while (!list_empty(&lo->lo_owner.so_stateids)) {
> - stp = list_first_entry(&lo->lo_owner.so_stateids,
> - struct nfs4_ol_stateid, st_perstateowner);
> - WARN_ON(!unhash_lock_stateid(stp));
> - put_ol_stateid_locked(stp, &reaplist);
> - }
> - spin_unlock(&clp->cl_lock);
> - free_ol_stateid_reaplist(&reaplist);
> - nfs4_put_stateowner(&lo->lo_owner);
> -}
> -
> static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
> struct list_head *reaplist)
> {
> @@ -5945,6 +5924,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> __be32 status;
> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> struct nfs4_client *clp;
> + LIST_HEAD (reaplist);
>
> dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n",
> clid->cl_boot, clid->cl_id);
> @@ -5975,9 +5955,23 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> nfs4_get_stateowner(sop);
> break;
> }
> + if (!lo) {
> + spin_unlock(&clp->cl_lock);
> + return status;
> + }
> +
> + unhash_lockowner_locked(lo);
> + while (!list_empty(&lo->lo_owner.so_stateids)) {
> + stp = list_first_entry(&lo->lo_owner.so_stateids,
> + struct nfs4_ol_stateid,
> + st_perstateowner);
> + WARN_ON(!unhash_lock_stateid(stp));
> + put_ol_stateid_locked(stp, &reaplist);
> + }
> spin_unlock(&clp->cl_lock);
> - if (lo)
> - release_lockowner(lo);
> + free_ol_stateid_reaplist(&reaplist);
> + nfs4_put_stateowner(&lo->lo_owner);
> +
> return status;
> }
>
>
Your patch looks correct to me. Even if there is something else that
prevents that race (and I don't see anything that does either), then
still reduces the spinlock thrashing further. So...
Reviewed-by: Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
2016-06-30 16:17 ` Jeff Layton
@ 2016-06-30 16:20 ` Chuck Lever
2016-06-30 16:22 ` Jeff Layton
0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2016-06-30 16:20 UTC (permalink / raw)
To: Jeff Layton; +Cc: Linux NFS Mailing List
> On Jun 30, 2016, at 12:17 PM, Jeff Layton <jlayton@redhat.com> wrote:
>
> On Thu, 2016-06-30 at 12:12 -0400, Chuck Lever wrote:
>> nfsd4_release_lockowner finds a lock owner that has no lock state,
>> and drops cl_lock. Then release_lockowner picks up cl_lock and
>> unhashes the lock owner.
>>
>> During the window where cl_lock is dropped, I don't see anything
>> preventing a concurrent nfsd4_lock from finding that same lock owner
>> and adding lock state to it.
>>
>> Move release_lockowner() into nfsd4_release_lockowner and hang onto
>> the cl_lock until after the lock owner's state has been unhashed.
>>
>> Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/nfs4state.c | 40 +++++++++++++++++-----------------------
>> 1 file changed, 17 insertions(+), 23 deletions(-)
>>
>> Hey Jeff-
>>
>> Wondering what your thoughts about this are. I noticed a possible
>> race while looking at another bug. It's untested.
>>
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 70d0b9b..b921123 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1200,27 +1200,6 @@ free_ol_stateid_reaplist(struct list_head *reaplist)
>> }
>> }
>>
>> -static void release_lockowner(struct nfs4_lockowner *lo)
>> -{
>> - struct nfs4_client *clp = lo->lo_owner.so_client;
>> - struct nfs4_ol_stateid *stp;
>> - struct list_head reaplist;
>> -
>> - INIT_LIST_HEAD(&reaplist);
>> -
>> - spin_lock(&clp->cl_lock);
>> - unhash_lockowner_locked(lo);
>> - while (!list_empty(&lo->lo_owner.so_stateids)) {
>> - stp = list_first_entry(&lo->lo_owner.so_stateids,
>> - struct nfs4_ol_stateid, st_perstateowner);
>> - WARN_ON(!unhash_lock_stateid(stp));
>> - put_ol_stateid_locked(stp, &reaplist);
>> - }
>> - spin_unlock(&clp->cl_lock);
>> - free_ol_stateid_reaplist(&reaplist);
>> - nfs4_put_stateowner(&lo->lo_owner);
>> -}
>> -
>> static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
>> struct list_head *reaplist)
>> {
>> @@ -5945,6 +5924,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
>> __be32 status;
>> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
>> struct nfs4_client *clp;
>> + LIST_HEAD (reaplist);
>>
>> dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n",
>> clid->cl_boot, clid->cl_id);
>> @@ -5975,9 +5955,23 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
>> nfs4_get_stateowner(sop);
>> break;
>> }
>> + if (!lo) {
>> + spin_unlock(&clp->cl_lock);
>> + return status;
>> + }
>> +
>> + unhash_lockowner_locked(lo);
>> + while (!list_empty(&lo->lo_owner.so_stateids)) {
>> + stp = list_first_entry(&lo->lo_owner.so_stateids,
>> + struct nfs4_ol_stateid,
>> + st_perstateowner);
>> + WARN_ON(!unhash_lock_stateid(stp));
>> + put_ol_stateid_locked(stp, &reaplist);
>> + }
>> spin_unlock(&clp->cl_lock);
>> - if (lo)
>> - release_lockowner(lo);
>> + free_ol_stateid_reaplist(&reaplist);
>> + nfs4_put_stateowner(&lo->lo_owner);
>> +
>> return status;
>> }
>>
>>
>
>
> Your patch looks correct to me. Even if there is something else that
> prevents that race (and I don't see anything that does either), then
> still reduces the spinlock thrashing further. So...
>
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
Thanks, I'll add your tag and put this through some testing.
Do you want to take this, or should it go through Bruce?
--
Chuck Lever
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
2016-06-30 16:20 ` Chuck Lever
@ 2016-06-30 16:22 ` Jeff Layton
0 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2016-06-30 16:22 UTC (permalink / raw)
To: Chuck Lever; +Cc: Linux NFS Mailing List
On Thu, 2016-06-30 at 12:20 -0400, Chuck Lever wrote:
>
> >
> > On Jun 30, 2016, at 12:17 PM, Jeff Layton <jlayton@redhat.com> wrote:
> >
> > On Thu, 2016-06-30 at 12:12 -0400, Chuck Lever wrote:
> > > nfsd4_release_lockowner finds a lock owner that has no lock state,
> > > and drops cl_lock. Then release_lockowner picks up cl_lock and
> > > unhashes the lock owner.
> > >
> > > During the window where cl_lock is dropped, I don't see anything
> > > preventing a concurrent nfsd4_lock from finding that same lock owner
> > > and adding lock state to it.
> > >
> > > Move release_lockowner() into nfsd4_release_lockowner and hang onto
> > > the cl_lock until after the lock owner's state has been unhashed.
> > >
> > > Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > > fs/nfsd/nfs4state.c | 40 +++++++++++++++++-----------------------
> > > 1 file changed, 17 insertions(+), 23 deletions(-)
> > >
> > > Hey Jeff-
> > >
> > > Wondering what your thoughts about this are. I noticed a possible
> > > race while looking at another bug. It's untested.
> > >
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 70d0b9b..b921123 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1200,27 +1200,6 @@ free_ol_stateid_reaplist(struct list_head *reaplist)
> > > }
> > > }
> > >
> > > -static void release_lockowner(struct nfs4_lockowner *lo)
> > > -{
> > > - struct nfs4_client *clp = lo->lo_owner.so_client;
> > > - struct nfs4_ol_stateid *stp;
> > > - struct list_head reaplist;
> > > -
> > > - INIT_LIST_HEAD(&reaplist);
> > > -
> > > - spin_lock(&clp->cl_lock);
> > > - unhash_lockowner_locked(lo);
> > > - while (!list_empty(&lo->lo_owner.so_stateids)) {
> > > - stp = list_first_entry(&lo->lo_owner.so_stateids,
> > > - struct nfs4_ol_stateid, st_perstateowner);
> > > - WARN_ON(!unhash_lock_stateid(stp));
> > > - put_ol_stateid_locked(stp, &reaplist);
> > > - }
> > > - spin_unlock(&clp->cl_lock);
> > > - free_ol_stateid_reaplist(&reaplist);
> > > - nfs4_put_stateowner(&lo->lo_owner);
> > > -}
> > > -
> > > static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
> > > struct list_head *reaplist)
> > > {
> > > @@ -5945,6 +5924,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> > > __be32 status;
> > > struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > > struct nfs4_client *clp;
> > > + LIST_HEAD (reaplist);
> > >
> > > dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n",
> > > clid->cl_boot, clid->cl_id);
> > > @@ -5975,9 +5955,23 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> > > nfs4_get_stateowner(sop);
> > > break;
> > > }
> > > + if (!lo) {
> > > + spin_unlock(&clp->cl_lock);
> > > + return status;
> > > + }
> > > +
> > > + unhash_lockowner_locked(lo);
> > > + while (!list_empty(&lo->lo_owner.so_stateids)) {
> > > + stp = list_first_entry(&lo->lo_owner.so_stateids,
> > > + struct nfs4_ol_stateid,
> > > + st_perstateowner);
> > > + WARN_ON(!unhash_lock_stateid(stp));
> > > + put_ol_stateid_locked(stp, &reaplist);
> > > + }
> > > spin_unlock(&clp->cl_lock);
> > > - if (lo)
> > > - release_lockowner(lo);
> > > + free_ol_stateid_reaplist(&reaplist);
> > > + nfs4_put_stateowner(&lo->lo_owner);
> > > +
> > > return status;
> > > }
> > >
> > >
> >
> >
> > Your patch looks correct to me. Even if there is something else that
> > prevents that race (and I don't see anything that does either), then
> > still reduces the spinlock thrashing further. So...
> >
> > Reviewed-by: Jeff Layton <jlayton@redhat.com>
>
> Thanks, I'll add your tag and put this through some testing.
> Do you want to take this, or should it go through Bruce?
>
Great. Please send it to Bruce...
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
@ 2016-07-13 20:40 Chuck Lever
2016-07-14 19:55 ` J. Bruce Fields
0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2016-07-13 20:40 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.
During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.
Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state cannot be found
again.
Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Hi Bruce-
Noticed this recently. I've been running with this patch on my test
NFS server for over a week. Haven't noticed any issues, but I wonder
if my clients or tests actually exercise this code in parallel.
The reason I was looking at this area is that one of our internal
testers encountered a related problem with NFSv4.1. LOCK and
FREE_STATEID are racing: LOCK returns an existing lock stateid, then
FREE_STATEID frees that stateid (NFS4_OK). FREE_STATEID should
return NFS4ERR_LOCKS_HELD in this case?
I have not been able to reproduce this, but our tester is able to
hit it fairly reliably with Oracle's v4.1-based kernel running on
his server. Recent upstream kernels make the issue rare, but it is
still encountered on occasion.
fs/nfsd/nfs4state.c | 40 +++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f5f82e1..31c993f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1200,27 +1200,6 @@ free_ol_stateid_reaplist(struct list_head *reaplist)
}
}
-static void release_lockowner(struct nfs4_lockowner *lo)
-{
- struct nfs4_client *clp = lo->lo_owner.so_client;
- struct nfs4_ol_stateid *stp;
- struct list_head reaplist;
-
- INIT_LIST_HEAD(&reaplist);
-
- spin_lock(&clp->cl_lock);
- unhash_lockowner_locked(lo);
- while (!list_empty(&lo->lo_owner.so_stateids)) {
- stp = list_first_entry(&lo->lo_owner.so_stateids,
- struct nfs4_ol_stateid, st_perstateowner);
- WARN_ON(!unhash_lock_stateid(stp));
- put_ol_stateid_locked(stp, &reaplist);
- }
- spin_unlock(&clp->cl_lock);
- free_ol_stateid_reaplist(&reaplist);
- nfs4_put_stateowner(&lo->lo_owner);
-}
-
static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
struct list_head *reaplist)
{
@@ -5938,6 +5917,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
__be32 status;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
struct nfs4_client *clp;
+ LIST_HEAD (reaplist);
dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n",
clid->cl_boot, clid->cl_id);
@@ -5968,9 +5948,23 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
nfs4_get_stateowner(sop);
break;
}
+ if (!lo) {
+ spin_unlock(&clp->cl_lock);
+ return status;
+ }
+
+ unhash_lockowner_locked(lo);
+ while (!list_empty(&lo->lo_owner.so_stateids)) {
+ stp = list_first_entry(&lo->lo_owner.so_stateids,
+ struct nfs4_ol_stateid,
+ st_perstateowner);
+ WARN_ON(!unhash_lock_stateid(stp));
+ put_ol_stateid_locked(stp, &reaplist);
+ }
spin_unlock(&clp->cl_lock);
- if (lo)
- release_lockowner(lo);
+ free_ol_stateid_reaplist(&reaplist);
+ nfs4_put_stateowner(&lo->lo_owner);
+
return status;
}
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
2016-07-13 20:40 Chuck Lever
@ 2016-07-14 19:55 ` J. Bruce Fields
2016-07-14 20:25 ` Chuck Lever
0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2016-07-14 19:55 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-nfs
On Wed, Jul 13, 2016 at 04:40:14PM -0400, Chuck Lever wrote:
> nfsd4_release_lockowner finds a lock owner that has no lock state,
> and drops cl_lock. Then release_lockowner picks up cl_lock and
> unhashes the lock owner.
>
> During the window where cl_lock is dropped, I don't see anything
> preventing a concurrent nfsd4_lock from finding that same lock owner
> and adding lock state to it.
>
> Move release_lockowner() into nfsd4_release_lockowner and hang onto
> the cl_lock until after the lock owner's state cannot be found
> again.
>
> Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Makes sense. Applying with just one line added to the changelog, in
case it's useful to someone: "Found by inspection, we don't currently
have a reproducer."
> ---
> Hi Bruce-
>
> Noticed this recently. I've been running with this patch on my test
> NFS server for over a week. Haven't noticed any issues, but I wonder
> if my clients or tests actually exercise this code in parallel.
>
> The reason I was looking at this area is that one of our internal
> testers encountered a related problem with NFSv4.1. LOCK and
> FREE_STATEID are racing: LOCK returns an existing lock stateid, then
> FREE_STATEID frees that stateid (NFS4_OK). FREE_STATEID should
> return NFS4ERR_LOCKS_HELD in this case?
Looks like free_stateid is deciding whether to return LOCKS_HELD by
inspecting the list of vfs locks, but nfsd4_lock creates the lock
stateid before the vfs lock, so maybe there's a race like:
create lock stateid
free_stateid
vfs_lock_file
but I haven't checked that in detail.
I haven't thought about the protocol side--does hitting this race
require an incorrect client?
--b.
>
> I have not been able to reproduce this, but our tester is able to
> hit it fairly reliably with Oracle's v4.1-based kernel running on
> his server. Recent upstream kernels make the issue rare, but it is
> still encountered on occasion.
>
>
> fs/nfsd/nfs4state.c | 40 +++++++++++++++++-----------------------
> 1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index f5f82e1..31c993f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1200,27 +1200,6 @@ free_ol_stateid_reaplist(struct list_head *reaplist)
> }
> }
>
> -static void release_lockowner(struct nfs4_lockowner *lo)
> -{
> - struct nfs4_client *clp = lo->lo_owner.so_client;
> - struct nfs4_ol_stateid *stp;
> - struct list_head reaplist;
> -
> - INIT_LIST_HEAD(&reaplist);
> -
> - spin_lock(&clp->cl_lock);
> - unhash_lockowner_locked(lo);
> - while (!list_empty(&lo->lo_owner.so_stateids)) {
> - stp = list_first_entry(&lo->lo_owner.so_stateids,
> - struct nfs4_ol_stateid, st_perstateowner);
> - WARN_ON(!unhash_lock_stateid(stp));
> - put_ol_stateid_locked(stp, &reaplist);
> - }
> - spin_unlock(&clp->cl_lock);
> - free_ol_stateid_reaplist(&reaplist);
> - nfs4_put_stateowner(&lo->lo_owner);
> -}
> -
> static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
> struct list_head *reaplist)
> {
> @@ -5938,6 +5917,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> __be32 status;
> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> struct nfs4_client *clp;
> + LIST_HEAD (reaplist);
>
> dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n",
> clid->cl_boot, clid->cl_id);
> @@ -5968,9 +5948,23 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> nfs4_get_stateowner(sop);
> break;
> }
> + if (!lo) {
> + spin_unlock(&clp->cl_lock);
> + return status;
> + }
> +
> + unhash_lockowner_locked(lo);
> + while (!list_empty(&lo->lo_owner.so_stateids)) {
> + stp = list_first_entry(&lo->lo_owner.so_stateids,
> + struct nfs4_ol_stateid,
> + st_perstateowner);
> + WARN_ON(!unhash_lock_stateid(stp));
> + put_ol_stateid_locked(stp, &reaplist);
> + }
> spin_unlock(&clp->cl_lock);
> - if (lo)
> - release_lockowner(lo);
> + free_ol_stateid_reaplist(&reaplist);
> + nfs4_put_stateowner(&lo->lo_owner);
> +
> return status;
> }
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
2016-07-14 19:55 ` J. Bruce Fields
@ 2016-07-14 20:25 ` Chuck Lever
2016-07-26 21:22 ` Chuck Lever
0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2016-07-14 20:25 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Linux NFS Mailing List
> On Jul 14, 2016, at 3:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Wed, Jul 13, 2016 at 04:40:14PM -0400, Chuck Lever wrote:
>>
>> The reason I was looking at this area is that one of our internal
>> testers encountered a related problem with NFSv4.1. LOCK and
>> FREE_STATEID are racing: LOCK returns an existing lock stateid, then
>> FREE_STATEID frees that stateid (NFS4_OK). FREE_STATEID should
>> return NFS4ERR_LOCKS_HELD in this case?
>
> Looks like free_stateid is deciding whether to return LOCKS_HELD by
> inspecting the list of vfs locks, but nfsd4_lock creates the lock
> stateid before the vfs lock, so maybe there's a race like:
>
> create lock stateid
> free_stateid
> vfs_lock_file
>
> but I haven't checked that in detail.
> I haven't thought about the protocol side--does hitting this race
> require an incorrect client?
I don't believe so, but there may be some subtlety I'm
missing.
I have a network capture. The test is nfslock01 from LTP.
There are two user processes that lock, write, and unlock
alternating 64-byte pieces of the same file.
(stateid here is represented as "seqno / other")
On the wire, I see:
Frame 115004 C LOCK offset 672000 len 64
Frame 115008 R LOCK NFS4_OK stateid 1/A
Frame 115012 C WRITE stateid 0/A, offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid 1/A 672000 len 64
Frame 115022 R LOCKU NFS4_OK stateid 2/A
Frame 115025 C FREE_STATEID stateid 2/A
Frame 115026 C LOCK offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid 3/A
Frame 115034 C WRITE stateid 0/A offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID
Frame 115043 C TEST_STATEID stateid 3/A
Frame 115044 R TEST_STATEID NFS4_OK
The returned stateid list has 3/A marked NFS4ERR_BAD_STATEID
Since the LOCKU and LOCK are accessing data segments
128 bytes apart, I assume these requests come from
the same process on the client. The FREE_STATEID
might be done in the background? which might be why
it is sent concurrently with the LOCK.
IIUC, either the FREE_STATEID should have returned
LOCKS_HELD, or the second LOCK should have returned
a fresh lock state ID.
On my client I can't seem to get FREE_STATEID and
LOCK to be sent concurrently, so the operations
always come out in an order that guarantees the
race is avoided.
> --b.
>
>>
>> I have not been able to reproduce this, but our tester is able to
>> hit it fairly reliably with Oracle's v4.1-based kernel running on
>> his server. Recent upstream kernels make the issue rare, but it is
>> still encountered on occasion.
--
Chuck Lever
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
2016-07-14 20:25 ` Chuck Lever
@ 2016-07-26 21:22 ` Chuck Lever
0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2016-07-26 21:22 UTC (permalink / raw)
To: J. Bruce Fields, Jeff Layton; +Cc: Linux NFS Mailing List
> On Jul 14, 2016, at 4:25 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>
>> On Jul 14, 2016, at 3:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>
>> On Wed, Jul 13, 2016 at 04:40:14PM -0400, Chuck Lever wrote:
>>>
>>> The reason I was looking at this area is that one of our internal
>>> testers encountered a related problem with NFSv4.1. LOCK and
>>> FREE_STATEID are racing: LOCK returns an existing lock stateid, then
>>> FREE_STATEID frees that stateid (NFS4_OK). FREE_STATEID should
>>> return NFS4ERR_LOCKS_HELD in this case?
>>
>> Looks like free_stateid is deciding whether to return LOCKS_HELD by
>> inspecting the list of vfs locks, but nfsd4_lock creates the lock
>> stateid before the vfs lock, so maybe there's a race like:
>>
>> create lock stateid
>> free_stateid
>> vfs_lock_file
>>
>> but I haven't checked that in detail.
>
>> I haven't thought about the protocol side--does hitting this race
>> require an incorrect client?
>
> I don't believe so, but there may be some subtlety I'm
> missing.
>
> I have a network capture. The test is nfslock01 from LTP.
> There are two user processes that lock, write, and unlock
> alternating 64-byte pieces of the same file.
>
> (stateid here is represented as "seqno / other")
>
> On the wire, I see:
>
> Frame 115004 C LOCK offset 672000 len 64
> Frame 115008 R LOCK NFS4_OK stateid 1/A
> Frame 115012 C WRITE stateid 0/A, offset 672000 len 64
> Frame 115016 R WRITE NFS4_OK
> Frame 115019 C LOCKU stateid 1/A 672000 len 64
> Frame 115022 R LOCKU NFS4_OK stateid 2/A
> Frame 115025 C FREE_STATEID stateid 2/A
> Frame 115026 C LOCK offset 672128 len 64
> Frame 115029 R FREE_STATEID NFS4_OK
> Frame 115030 R LOCK stateid 3/A
> Frame 115034 C WRITE stateid 0/A offset 672128 len 64
> Frame 115038 R WRITE NFS4ERR_BAD_STATEID
> Frame 115043 C TEST_STATEID stateid 3/A
> Frame 115044 R TEST_STATEID NFS4_OK
> The returned stateid list has 3/A marked NFS4ERR_BAD_STATEID
>
> Since the LOCKU and LOCK are accessing data segments
> 128 bytes apart, I assume these requests come from
> the same process on the client. The FREE_STATEID
> might be done in the background? which might be why
> it is sent concurrently with the LOCK.
>
> IIUC, either the FREE_STATEID should have returned
> LOCKS_HELD, or the second LOCK should have returned
> a fresh lock state ID.
>
> On my client I can't seem to get FREE_STATEID and
> LOCK to be sent concurrently, so the operations
> always come out in an order that guarantees the
> race is avoided.
Here's one possible scenario:
LOCK and FREE_STATEID are called simultaneously. The stateid passed
to FREE_STATEID has the same lock owner that is passed to the LOCK
operation.
nfsd4_lock is called. It gets to lookup_or_create_lock_stateid, which
takes cl_lock, finds the same stateid as was passed to FREE_STATEID,
drops cl_lock and returns. For some reason this thread is scheduled out.
This allows nfsd4_free_stateid to grab cl_lock. find_stateid_locked
finds the stateid, finds no lock state associated with the inode,
and decides to unhash the lock stateid. It drops cl_lock and
returns NFS4_OK.
nfsd4_lock resumes processing with the unhashed stateid. It completes
its processing, locks the file, and returns NFS4_OK and the dead
stateid.
Later the client uses that stateid in a WRITE. nfs4_preprocess_stateid_op
invokes nfsd4_lookup_stateid and then find_stateid_by_type. That might
find the stateid, but unhashing it earlier set sc_type to zero, and
now find_stateid_by_type returns NULL. WRITE returns BAD_STATEID.
--
Chuck Lever
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-07-26 21:22 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-30 16:12 [PATCH] nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock Chuck Lever
2016-06-30 16:17 ` Jeff Layton
2016-06-30 16:20 ` Chuck Lever
2016-06-30 16:22 ` Jeff Layton
-- strict thread matches above, loose matches on Subject: below --
2016-07-13 20:40 Chuck Lever
2016-07-14 19:55 ` J. Bruce Fields
2016-07-14 20:25 ` Chuck Lever
2016-07-26 21:22 ` Chuck Lever
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.