NFS4 clients cannot reclaim locks


* NFS4 clients cannot reclaim locks
       [not found] <8181361.84.1285932468389.JavaMail.sprabhu@dhcp-1-233.fab.redhat.com>
@ 2010-10-01 11:30 ` Sachin Prabhu
  2010-10-01 20:46   ` Trond Myklebust
  2010-10-05 15:03   ` Timo Aaltonen
  0 siblings, 2 replies; 9+ messages in thread
From: Sachin Prabhu @ 2010-10-01 11:30 UTC (permalink / raw)
  To: linux-nfs

NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.

The problem appears to happen in cases where, after a reboot, a WRITE call is made just before the RENEW call. In that case, NFS4ERR_STALE_STATEID is returned for the WRITE call, which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However, the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
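
The sequence above can be sketched as a small, hypothetical model. Everything in it (struct model_state, MODEL_RECLAIM_REBOOT, the model_* functions) is an illustrative stand-in rather than the kernel's data structures. It only mirrors the ordering described in this report: the WRITE error path marks the state for reboot recovery, and handling the later RENEW error through nfs4_state_end_reclaim_reboot() demotes that state to no-grace recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the reported ordering; not kernel code. */
#define MODEL_RECLAIM_REBOOT  (1u << 0) /* stands in for NFS_STATE_RECLAIM_REBOOT */
#define MODEL_RECLAIM_NOGRACE (1u << 1) /* stands in for NFS_STATE_RECLAIM_NOGRACE */

struct model_client {
    bool reboot_recovery_pending; /* stands in for a per-client "reclaim reboot" bit */
};

struct model_state {
    unsigned flags;
};

/* As described for nfs4_state_mark_reclaim_nograce(): set NOGRACE, clear REBOOT. */
static void model_mark_nograce(struct model_state *st)
{
    st->flags |= MODEL_RECLAIM_NOGRACE;
    st->flags &= ~MODEL_RECLAIM_REBOOT;
}

/* WRITE gets NFS4ERR_STALE_STATEID: the state is marked for reboot recovery. */
static void model_write_stale_stateid(struct model_client *clp, struct model_state *st)
{
    st->flags |= MODEL_RECLAIM_REBOOT;
    clp->reboot_recovery_pending = true;
}

/* RENEW gets NFS4ERR_STALE_CLIENTID: any reboot recovery that looks "in
 * progress" is ended first, demoting states still marked REBOOT. */
static void model_renew_stale_clientid(struct model_client *clp, struct model_state *st)
{
    if (!clp->reboot_recovery_pending)
        return;
    clp->reboot_recovery_pending = false;
    if (st->flags & MODEL_RECLAIM_REBOOT)
        model_mark_nograce(st);
}
```

With this ordering the state ends up in no-grace recovery even though the server has rebooted only once.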

The process of reclaiming the locks then seems to hit another roadblock in nfs4_open_expired(), where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().

By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.

Has anyone else seen this issue?

Sachin Prabhu


* Re: NFS4 clients cannot reclaim locks
  2010-10-01 11:30 ` NFS4 clients cannot reclaim locks Sachin Prabhu
@ 2010-10-01 20:46   ` Trond Myklebust
  2010-10-05 15:03   ` Timo Aaltonen
  1 sibling, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2010-10-01 20:46 UTC (permalink / raw)
  To: Sachin Prabhu; +Cc: linux-nfs

On Fri, 2010-10-01 at 07:30 -0400, Sachin Prabhu wrote:
> NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system. 
> 
> The problem appears to happen in cases where, after a reboot, a WRITE call is made just before the RENEW call. In that case, NFS4ERR_STALE_STATEID is returned for the WRITE call, which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However, the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);  
> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce(). 

Yup. I don't think we should call nfs4_state_mark_reclaim_reboot() here.

> The process of reclaiming the locks then seems to hit another roadblock in nfs4_open_expired(), where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().

Any idea how nfs4_open_expired() is failing? It seems that if it does,
we should see an error, which would cause the lock reclaim to fail.

Also, why is the call to nfs4_reclaim_locks() looping? That too should
exit in case of an error.

> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.

We do need to keep the nfs4_state_end_reclaim_reboot() there. Otherwise,
we have a problem if the server reboots again while we're in the middle
of reclaiming state.
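
A hypothetical sketch of the edge case described here, using a simplified model. The model_* names are assumptions, not kernel code. Reboot recovery marks each open state and clears the mark as each state is reclaimed. If a further reboot is detected first, the end-of-reboot-recovery step demotes whatever is still marked to no-grace recovery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of reboot recovery; not kernel code. */
#define MODEL_RECLAIM_REBOOT  (1u << 0)
#define MODEL_RECLAIM_NOGRACE (1u << 1)

struct model_state {
    unsigned flags;
};

/* Start of reboot recovery: every state not already scheduled for
 * no-grace recovery is marked for reboot reclaim. */
static void model_start_reclaim_reboot(struct model_state *states, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!(states[i].flags & MODEL_RECLAIM_NOGRACE))
            states[i].flags |= MODEL_RECLAIM_REBOOT;
}

/* A successful reboot reclaim clears the mark for that state. */
static void model_reclaim_one(struct model_state *st)
{
    st->flags &= ~MODEL_RECLAIM_REBOOT;
}

/* End of reboot recovery (also run first when a new reboot is detected):
 * anything still marked REBOOT was not reclaimed in time and is demoted. */
static void model_end_reclaim_reboot(struct model_state *states, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (states[i].flags & MODEL_RECLAIM_REBOOT) {
            states[i].flags &= ~MODEL_RECLAIM_REBOOT;
            states[i].flags |= MODEL_RECLAIM_NOGRACE;
        }
    }
}
```

In the test scenario, the first state is reclaimed before the second reboot and is marked for reboot reclaim again afterwards. The second state is demoted and stays on the no-grace path.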

Cheers
  Trond


* Re: NFS4 clients cannot reclaim locks
       [not found] <18163799.104.1286186355944.JavaMail.sprabhu@dhcp-1-233.fab.redhat.com>
@ 2010-10-04 10:03 ` Sachin Prabhu
  2010-10-05 13:37   ` Trond Myklebust
  2010-10-05 13:38   ` Trond Myklebust
  0 siblings, 2 replies; 9+ messages in thread
From: Sachin Prabhu @ 2010-10-04 10:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> On Fri, 2010-10-01 at 07:30 -0400, Sachin Prabhu wrote:
> > NFS4 clients appear to have problems reclaiming locks after a server
> > reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a
> > Fedora system.
> > 
> > The problem appears to happen in cases where, after a reboot, a WRITE
> > call is made just before the RENEW call. In that case,
> > NFS4ERR_STALE_STATEID is returned for the WRITE call, which results in
> > NFS_STATE_RECLAIM_REBOOT being set in the state flags. However, the
> > NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is
> > handled by
> > nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> > which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE
> > and clearing NFS_STATE_RECLAIM_REBOOT in
> > nfs4_state_mark_reclaim_nograce().
> 
> Yup. I don't think we should call nfs4_state_mark_reclaim_reboot()
> here.
> 
> > The process of reclaiming the locks then seems to hit another
> > roadblock in nfs4_open_expired(), where it fails to open the file and
> > reset the state. It ends up calling nfs4_reclaim_locks() in a loop
> > with the old stateid in nfs4_reclaim_open_state().
> 
> Any idea how nfs4_open_expired() is failing? It seems that if it
> does, we should see an error, which would cause the lock reclaim to
> fail.
> 
> Also, why is the call to nfs4_reclaim_locks() looping? That too
> should exit in case of an error.
> 

From instrumentation, the problem appears to happen in nfs4_open_prepare():

static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
{
..
        /*
         * Check if we still need to send an OPEN call, or if we can use
         * a delegation instead.
         */

        if (data->state != NULL) {
                struct nfs_delegation *delegation;

                if (can_open_cached(data->state, data->o_arg.fmode, data->o_arg.open_flags))
                        goto out_no_action;
..
out_no_action:
        task->tk_action = NULL;

}

Here, can_open_cached() returns true, so the OPEN call is never made and the old state is used.

static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs4_state_recovery_ops *ops)
{
..
restart:
..
                status = ops->recover_open(sp, state); /* attempts to use the cached state; status is set to 0 */
                if (status >= 0) {
                        status = nfs4_reclaim_locks(state, ops); /* attempts to reclaim locks using the old stateid */
                        /* here, status is set to -NFS4ERR_BAD_STATEID */
                 ..
                }
                switch (status) {
..
                        case -NFS4ERR_BAD_STATEID:
                        case -NFS4ERR_RECLAIM_BAD:
                        case -NFS4ERR_RECLAIM_CONFLICT:
                                nfs4_state_mark_reclaim_nograce(sp->so_client, state);
                                break;
..
                }
                nfs4_put_open_state(state);
                goto restart;
..
}

The call to ops->recover_open() calls nfs4_open_expired(). While preparing the OPEN RPC call in nfs4_open_prepare(), it decides that the cached copy is valid and uses it instead, so nfs4_open_expired() returns 0. The subsequent attempt to reclaim locks with nfs4_reclaim_locks() fails with -NFS4ERR_BAD_STATEID. The goto statement in nfs4_reclaim_open_state() then makes it loop, with the same result each time.
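
The failure mode above can be reproduced in a small, hypothetical model. The names are illustrative. model_can_open_cached() reduces the real can_open_cached() check to "local users exist and the mode bit is still set", and the pass cap exists only so the model terminates. Because the stale mode bit short-circuits the OPEN, the stateid never changes and every pass fails the same way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the reclaim loop; not kernel code. */
#define MODEL_O_RDWR_STATE (1u << 0) /* stands in for NFS_O_RDWR_STATE */

enum { MODEL_OK = 0, MODEL_BAD_STATEID = -1 };

struct model_state {
    unsigned flags;
    int n_rdwr;       /* local read/write opens */
    unsigned stateid; /* stateid the client currently holds */
};

/* Reduced cached-open check: local users exist and the mode bit is set. */
static bool model_can_open_cached(const struct model_state *st)
{
    return st->n_rdwr != 0 && (st->flags & MODEL_O_RDWR_STATE) != 0;
}

/* Models recover_open(): an OPEN (and a new stateid) happens only when the
 * cached check fails, but 0 is returned either way. */
static int model_recover_open(struct model_state *st, unsigned server_stateid)
{
    if (!model_can_open_cached(st))
        st->stateid = server_stateid;
    return MODEL_OK;
}

/* Lock reclaim succeeds only with the stateid the server currently knows. */
static int model_reclaim_locks(const struct model_state *st, unsigned server_stateid)
{
    return st->stateid == server_stateid ? MODEL_OK : MODEL_BAD_STATEID;
}

/* Returns the 1-based pass on which locks were reclaimed, or -1 once
 * max_passes (a cap added only so the model terminates) is exhausted. */
static int model_reclaim_open_state(struct model_state *st, unsigned server_stateid,
                                    int max_passes)
{
    for (int pass = 1; pass <= max_passes; pass++) {
        int status = model_recover_open(st, server_stateid);
        if (status >= 0)
            status = model_reclaim_locks(st, server_stateid);
        if (status == MODEL_OK)
            return pass;
        /* BAD_STATEID: state is marked for no-grace recovery; "goto restart" */
    }
    return -1;
}
```

Clearing the mode bit before recovery, which is what Trond's first patch in this thread does, lets the first pass obtain the new stateid.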

Sachin Prabhu


* Re: NFS4 clients cannot reclaim locks
  2010-10-04 10:03 ` Sachin Prabhu
@ 2010-10-05 13:37   ` Trond Myklebust
  2010-10-06 15:59     ` Sachin Prabhu
  2010-10-05 13:38   ` Trond Myklebust
  1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2010-10-05 13:37 UTC (permalink / raw)
  To: Sachin Prabhu; +Cc: linux-nfs

On Mon, 2010-10-04 at 06:03 -0400, Sachin Prabhu wrote:
> From instrumentation, the problem appears to happen at nfs4_open_prepare
> 
> static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
> {
> ..
>         /*
>          * Check if we still need to send an OPEN call, or if we can use
>          * a delegation instead.
>          */
> 
>         if (data->state != NULL) {
>                 struct nfs_delegation *delegation;
> 
>                 if (can_open_cached(data->state, data->o_arg.fmode, data->o_arg.open_flags))
>                         goto out_no_action;
> ..
> out_no_action:
>         task->tk_action = NULL;
> 
> }
> 
> Here, can_open_cached() returns true, so the OPEN call is never made and the old state is used.
> 
> static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs4_state_recovery_ops *ops)
> {
> ..
> restart:
> ..
>                 status = ops->recover_open(sp, state); /* attempts to use the cached state; status is set to 0 */
>                 if (status >= 0) {
>                         status = nfs4_reclaim_locks(state, ops); /* attempts to reclaim locks using the old stateid */
>                         /* here, status is set to -NFS4ERR_BAD_STATEID */
>                  ..
>                 }
>                 switch (status) {
> ..
>                         case -NFS4ERR_BAD_STATEID:
>                         case -NFS4ERR_RECLAIM_BAD:
>                         case -NFS4ERR_RECLAIM_CONFLICT:
>                                 nfs4_state_mark_reclaim_nograce(sp->so_client, state);
>                                 break;
> ..
>                 }
>                 nfs4_put_open_state(state);
>                 goto restart;
> ..
> }
> 
> The call to ops->recover_open() calls nfs4_open_expired(). While preparing the OPEN RPC call in nfs4_open_prepare(), it decides that the cached copy is valid and uses it instead, so nfs4_open_expired() returns 0. The subsequent attempt to reclaim locks with nfs4_reclaim_locks() fails with -NFS4ERR_BAD_STATEID. The goto statement in nfs4_reclaim_open_state() then makes it loop, with the same result each time.

Yup. That makes sense. Does the following patch help?

Cheers
  Trond
--------------------------------------------------------------------------------------------------------
NFSv4: Fix open recovery

From: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4 open recovery is currently broken: since we do not clear the
state->flags states before attempting recovery, we end up with the
'can_open_cached()' function triggering. This again leads to no OPEN call
being put on the wire.

Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 089da5b..01b4817 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1120,6 +1120,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
 	clear_bit(NFS_DELEGATED_STATE, &state->flags);
 	smp_rmb();
 	if (state->n_rdwr != 0) {
+		clear_bit(NFS_O_RDWR_STATE, &state->flags);
 		ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE, &newstate);
 		if (ret != 0)
 			return ret;
@@ -1127,6 +1128,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
 			return -ESTALE;
 	}
 	if (state->n_wronly != 0) {
+		clear_bit(NFS_O_WRONLY_STATE, &state->flags);
 		ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
 		if (ret != 0)
 			return ret;
@@ -1134,6 +1136,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
 			return -ESTALE;
 	}
 	if (state->n_rdonly != 0) {
+		clear_bit(NFS_O_RDONLY_STATE, &state->flags);
 		ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
 		if (ret != 0)
 			return ret;
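
A hypothetical sketch of what the patch changes, in simplified terms. For each open mode that has local users, the corresponding mode bit is cleared before the recovery helper runs, so the cached-open check can no longer short-circuit the OPEN. The names are illustrative and not kernel code. opens_sent just counts how many OPENs would go on the wire, and the helper re-setting the bit on success is a simplification of how a successful OPEN updates the state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of nfs4_open_recover(); not kernel code. */
#define MODEL_O_RDONLY_STATE (1u << 0)
#define MODEL_O_WRONLY_STATE (1u << 1)
#define MODEL_O_RDWR_STATE   (1u << 2)

struct model_state {
    unsigned flags;
    int n_rdonly, n_wronly, n_rdwr; /* local open counts per mode */
};

/* Stand-in for the recovery helper: an OPEN goes on the wire only if the
 * mode bit is clear, i.e. the cached-open check fails. */
static void model_recover_helper(struct model_state *st, unsigned mode_bit, int *opens_sent)
{
    if (!(st->flags & mode_bit)) {
        (*opens_sent)++;
        st->flags |= mode_bit; /* a successful OPEN re-establishes the mode */
    }
}

/* Returns the number of OPEN calls that would be sent. */
static int model_open_recover(struct model_state *st, bool apply_fix)
{
    int opens_sent = 0;
    const struct { int count; unsigned bit; } modes[] = {
        { st->n_rdwr,   MODEL_O_RDWR_STATE },
        { st->n_wronly, MODEL_O_WRONLY_STATE },
        { st->n_rdonly, MODEL_O_RDONLY_STATE },
    };
    for (unsigned i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        if (modes[i].count == 0)
            continue;
        if (apply_fix)
            st->flags &= ~modes[i].bit; /* models the clear_bit() the patch adds */
        model_recover_helper(st, modes[i].bit, &opens_sent);
    }
    return opens_sent;
}
```

Without the fix, a state whose mode bits survived the reboot sends no OPENs at all. With it, one OPEN is sent per mode that has local users.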



* Re: NFS4 clients cannot reclaim locks
  2010-10-04 10:03 ` Sachin Prabhu
  2010-10-05 13:37   ` Trond Myklebust
@ 2010-10-05 13:38   ` Trond Myklebust
  1 sibling, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2010-10-05 13:38 UTC (permalink / raw)
  To: Sachin Prabhu; +Cc: linux-nfs

On Mon, 2010-10-04 at 06:03 -0400, Sachin Prabhu wrote:
> ----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> > On Fri, 2010-10-01 at 07:30 -0400, Sachin Prabhu wrote:
> > > NFS4 clients appear to have problems reclaiming locks after a server
> > > reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a
> > > Fedora system.
> > > 
> > > The problem appears to happen in cases where, after a reboot, a WRITE
> > > call is made just before the RENEW call. In that case,
> > > NFS4ERR_STALE_STATEID is returned for the WRITE call, which results in
> > > NFS_STATE_RECLAIM_REBOOT being set in the state flags. However, the
> > > NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is
> > > handled by
> > > nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> > > which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE
> > > and clearing NFS_STATE_RECLAIM_REBOOT in
> > > nfs4_state_mark_reclaim_nograce().
> > 
> > Yup. I don't think we should call nfs4_state_mark_reclaim_reboot()
> > here.

...Here is the second patch.

Cheers
  Trond
------------------------------------------------------------------------------------------------------
NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers

From: Trond Myklebust <Trond.Myklebust@netapp.com>

In the case of a server reboot, the state recovery thread starts by calling
nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
the server reboots while the client is in the middle of recovery.

However, if the client has already marked the nfs4_state as requiring
reboot recovery, then the above behaviour will cause the recovery thread to
treat the open as if it was part of such an edge condition: the open will
be recovered as if it was part of a lease expiration (and all the locks
will be lost).
Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it
to the recovery thread to do this for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)


diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 01b4817..74aa54e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -255,9 +255,6 @@ static int nfs4_handle_exception(const struct nfs_server *server, int errorcode,
 			nfs4_state_mark_reclaim_nograce(clp, state);
 			goto do_state_recovery;
 		case -NFS4ERR_STALE_STATEID:
-			if (state == NULL)
-				break;
-			nfs4_state_mark_reclaim_reboot(clp, state);
 		case -NFS4ERR_STALE_CLIENTID:
 		case -NFS4ERR_EXPIRED:
 			goto do_state_recovery;
@@ -3493,9 +3490,6 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 			nfs4_state_mark_reclaim_nograce(clp, state);
 			goto do_state_recovery;
 		case -NFS4ERR_STALE_STATEID:
-			if (state == NULL)
-				break;
-			nfs4_state_mark_reclaim_reboot(clp, state);
 		case -NFS4ERR_STALE_CLIENTID:
 		case -NFS4ERR_EXPIRED:
 			goto do_state_recovery;
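
In the same hypothetical, simplified terms, the second patch changes who sets the reboot mark. The model_* names and the single-state recovery thread are assumptions, not kernel code. Before the patch, the error handler marked the state, and the recovery thread's initial nfs4_state_end_reclaim_reboot() step then demoted it. After the patch, only the recovery thread marks it, so it stays eligible for reboot reclaim.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not kernel code. */
#define MODEL_RECLAIM_REBOOT  (1u << 0)
#define MODEL_RECLAIM_NOGRACE (1u << 1)

struct model_state {
    unsigned flags;
};

/* Error handler for NFS4ERR_STALE_STATEID. Before the patch it marked the
 * state for reboot recovery; after it, recovery is only scheduled. */
static void model_handle_stale_stateid(struct model_state *st, bool patched)
{
    if (!patched)
        st->flags |= MODEL_RECLAIM_REBOOT;
    /* in both cases the real handler goes on to do_state_recovery */
}

/* Recovery thread: first end any earlier reboot recovery (demoting states
 * still marked REBOOT), then start a fresh one. */
static void model_recovery_thread(struct model_state *st)
{
    if (st->flags & MODEL_RECLAIM_REBOOT) {
        st->flags &= ~MODEL_RECLAIM_REBOOT;
        st->flags |= MODEL_RECLAIM_NOGRACE;
    }
    if (!(st->flags & MODEL_RECLAIM_NOGRACE))
        st->flags |= MODEL_RECLAIM_REBOOT;
}

/* Reboot-marked states are reclaimed within the grace period (locks kept);
 * no-grace states are recovered as after a lease expiration (locks lost). */
static bool model_locks_survive(const struct model_state *st)
{
    return (st->flags & MODEL_RECLAIM_REBOOT) != 0;
}
```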



* Re: NFS4 clients cannot reclaim locks
  2010-10-01 11:30 ` NFS4 clients cannot reclaim locks Sachin Prabhu
  2010-10-01 20:46   ` Trond Myklebust
@ 2010-10-05 15:03   ` Timo Aaltonen
  2010-11-22 16:02     ` Timo Aaltonen
  1 sibling, 1 reply; 9+ messages in thread
From: Timo Aaltonen @ 2010-10-05 15:03 UTC (permalink / raw)
  To: Sachin Prabhu; +Cc: linux-nfs

On Fri, 1 Oct 2010, Sachin Prabhu wrote:

> NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.
>
> The problem appears to happen in cases where, after a reboot, a WRITE call is made just before the RENEW call. In that case, NFS4ERR_STALE_STATEID is returned for the WRITE call, which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However, the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
>
> The process of reclaiming the locks then seems to hit another roadblock in nfs4_open_expired(), where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().
>
> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.
>
> Has anyone else seen this issue?

Could this be related to the bug I was seeing with NFSv4 (I am now using
v3 successfully)?

https://bugzilla.kernel.org/show_bug.cgi?id=15973

In my case, though, the error returned by the server is NFS4ERR_BAD_STATEID.


-- 
Timo Aaltonen
Systems Specialist, Aalto IT


* Re: NFS4 clients cannot reclaim locks
  2010-10-05 13:37   ` Trond Myklebust
@ 2010-10-06 15:59     ` Sachin Prabhu
  0 siblings, 0 replies; 9+ messages in thread
From: Sachin Prabhu @ 2010-10-06 15:59 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:

> Yup. That makes sense. Does the following patch help?
> 
> Cheers
>   Trond
> --------------------------------------------------------------------------------------------------------
> NFSv4: Fix open recovery
> 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> NFSv4 open recovery is currently broken: since we do not clear the
> state->flags states before attempting recovery, we end up with the
> 'can_open_cached()' function triggering. This again leads to no OPEN
> call being put on the wire.
> 
> Reported-by: Sachin Prabhu <sprabhu@redhat.com>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> 
>  fs/nfs/nfs4proc.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 089da5b..01b4817 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1120,6 +1120,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
>  	clear_bit(NFS_DELEGATED_STATE, &state->flags);
>  	smp_rmb();
>  	if (state->n_rdwr != 0) {
> +		clear_bit(NFS_O_RDWR_STATE, &state->flags);
>  		ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE, &newstate);
>  		if (ret != 0)
>  			return ret;
> @@ -1127,6 +1128,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
>  			return -ESTALE;
>  	}
>  	if (state->n_wronly != 0) {
> +		clear_bit(NFS_O_WRONLY_STATE, &state->flags);
>  		ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
>  		if (ret != 0)
>  			return ret;
> @@ -1134,6 +1136,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
>  			return -ESTALE;
>  	}
>  	if (state->n_rdonly != 0) {
> +		clear_bit(NFS_O_RDONLY_STATE, &state->flags);
>  		ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
>  		if (ret != 0)
>  			return ret;
> 


Yes. The patch works. 

As expected, repeated OPEN calls are made with the claim type set to CLAIM_NULL. For each of these calls, the server returns NFS4ERR_GRACE for as long as it is in its grace period. Once the grace period has ended, the OPEN call succeeds, a new stateid is set, and the write operation continues.
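
The behaviour observed here can be sketched as a retry loop against a hypothetical server. The server answers NFS4ERR_GRACE to CLAIM_NULL opens until its grace period ends, and answers NFS4ERR_NO_GRACE to CLAIM_PREVIOUS after it ends. The server model, the tick-based grace period and the function names are assumptions for illustration. The tight loop says nothing about how the kernel actually schedules its retries.

```c
#include <assert.h>

/* Hypothetical client/server model; not kernel or nfsd code. */
enum model_claim { MODEL_CLAIM_NULL, MODEL_CLAIM_PREVIOUS };
enum model_err { MODEL_NFS4_OK, MODEL_NFS4ERR_GRACE, MODEL_NFS4ERR_NO_GRACE };

struct model_server {
    int grace_ticks_left; /* > 0 while the server is in its grace period */
};

/* During grace only reclaims (CLAIM_PREVIOUS) are accepted; afterwards
 * only ordinary opens (CLAIM_NULL) are. Each call lets one tick pass. */
static enum model_err model_server_open(struct model_server *srv, enum model_claim claim)
{
    enum model_err err;
    if (srv->grace_ticks_left > 0) {
        err = (claim == MODEL_CLAIM_PREVIOUS) ? MODEL_NFS4_OK : MODEL_NFS4ERR_GRACE;
        srv->grace_ticks_left--;
    } else {
        err = (claim == MODEL_CLAIM_NULL) ? MODEL_NFS4_OK : MODEL_NFS4ERR_NO_GRACE;
    }
    return err;
}

/* Keep retrying a CLAIM_NULL OPEN while the server reports NFS4ERR_GRACE.
 * Returns the number of OPEN calls sent. */
static int model_open_nograce(struct model_server *srv)
{
    int calls = 0;
    enum model_err err;
    do {
        err = model_server_open(srv, MODEL_CLAIM_NULL);
        calls++;
    } while (err == MODEL_NFS4ERR_GRACE);
    return calls;
}
```

With three ticks of grace left, three OPENs are refused and the fourth succeeds. The open is recovered, but only after the grace period, which is why the locks are not reclaimed.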

Thank You
Sachin Prabhu


* Re: NFS4 clients cannot reclaim locks
       [not found] <18697573.14.1286380841649.JavaMail.sprabhu@dhcp-1-233.fab.redhat.com>
@ 2010-10-06 16:01 ` Sachin Prabhu
  0 siblings, 0 replies; 9+ messages in thread
From: Sachin Prabhu @ 2010-10-06 16:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> ...Here is the second patch.
> 
> Cheers
>   Trond
> ------------------------------------------------------------------------------------------------------
> NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers
> 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> In the case of a server reboot, the state recovery thread starts by
> calling nfs4_state_end_reclaim_reboot() in order to avoid edge
> conditions when the server reboots while the client is in the middle
> of recovery.
> 
> However, if the client has already marked the nfs4_state as requiring
> reboot recovery, then the above behaviour will cause the recovery
> thread to treat the open as if it was part of such an edge condition:
> the open will be recovered as if it was part of a lease expiration
> (and all the locks will be lost).
> Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
> nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we
> leave it to the recovery thread to do this for us.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> 
>  fs/nfs/nfs4proc.c |    6 ------
>  1 files changed, 0 insertions(+), 6 deletions(-)
> 
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 01b4817..74aa54e 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -255,9 +255,6 @@ static int nfs4_handle_exception(const struct nfs_server *server, int errorcode,
>  			nfs4_state_mark_reclaim_nograce(clp, state);
>  			goto do_state_recovery;
>  		case -NFS4ERR_STALE_STATEID:
> -			if (state == NULL)
> -				break;
> -			nfs4_state_mark_reclaim_reboot(clp, state);
>  		case -NFS4ERR_STALE_CLIENTID:
>  		case -NFS4ERR_EXPIRED:
>  			goto do_state_recovery;
> @@ -3493,9 +3490,6 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
>  			nfs4_state_mark_reclaim_nograce(clp, state);
>  			goto do_state_recovery;
>  		case -NFS4ERR_STALE_STATEID:
> -			if (state == NULL)
> -				break;
> -			nfs4_state_mark_reclaim_reboot(clp, state);
>  		case -NFS4ERR_STALE_CLIENTID:
>  		case -NFS4ERR_EXPIRED:
>  			goto do_state_recovery;

Yes. The patch works for me.

An OPEN call is made to the server with the claim type set to CLAIM_PREVIOUS. This resets the stateid, and the write operation can continue.
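
A minimal, hypothetical sketch of why this result differs from the one seen with the first patch alone. States marked for reboot recovery are reclaimed with CLAIM_PREVIOUS, while states on the no-grace path are re-opened with CLAIM_NULL, and a server in its grace period accepts only the former. The function names are illustrative assumptions, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not kernel code. */
enum model_claim { MODEL_CLAIM_NULL, MODEL_CLAIM_PREVIOUS };

/* Reboot-marked states are reclaimed with CLAIM_PREVIOUS; no-grace states
 * are re-opened with CLAIM_NULL. */
static enum model_claim model_pick_claim(bool marked_reclaim_reboot)
{
    return marked_reclaim_reboot ? MODEL_CLAIM_PREVIOUS : MODEL_CLAIM_NULL;
}

/* A server in its grace period accepts CLAIM_PREVIOUS and refuses
 * CLAIM_NULL with NFS4ERR_GRACE. Returns true if the OPEN would succeed. */
static bool model_open_ok_in_grace(enum model_claim claim)
{
    return claim == MODEL_CLAIM_PREVIOUS;
}
```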

Thank You

Sachin Prabhu



* Re: NFS4 clients cannot reclaim locks
  2010-10-05 15:03   ` Timo Aaltonen
@ 2010-11-22 16:02     ` Timo Aaltonen
  0 siblings, 0 replies; 9+ messages in thread
From: Timo Aaltonen @ 2010-11-22 16:02 UTC (permalink / raw)
  To: linux-nfs

On Tue, 5 Oct 2010, Timo Aaltonen wrote:

> On Fri, 1 Oct 2010, Sachin Prabhu wrote:
>
>> NFS4 clients appear to have problems reclaiming locks after a server 
>> reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora 
>> system.
>> 
>> The problem appears to happen in cases where after a reboot, a WRITE call 
>> is made just before the RENEW call. In that case, the NFS4ERR_STALE_STATEID 
>> is returned for the WRITE call which results in NFS_STATE_RECLAIM_REBOOT 
>> being set in the state flags. However the NFS4ERR_STALE_CLIENTID returned 
>> for the subsequent RENEW call is handled by
>> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
>> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and 
>> clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
>> 
>> The process of reclaiming the locks then seems to hit another roadblock in 
>> nfs4_open_expired() where it fails to open the file and reset the state. It 
>> ends up calling nfs4_reclaim_locks() in a loop with the old stateid in 
>> nfs4_reclaim_open_state().
>> 
>> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in 
>> nfs4_recovery_handle_error(), the client was able to handle this particular 
>> scenario properly.
>> 
>> Has anyone else seen this issue?
>
> could this be related to the bug I was seeing with nfsv4 (now using v3 with 
> success):
>
> https://bugzilla.kernel.org/show_bug.cgi?id=15973
>
> though the error returned by the server is BAD_STATEID..

So far, testing with .37rc2 has been positive, suggesting that the bug
is fixed there.


-- 
Timo Aaltonen
Systems Specialist, Aalto IT


