* NFS4 clients cannot reclaim locks
From: Sachin Prabhu @ 2010-10-01 11:30 UTC
To: linux-nfs
NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.
The problem appears when, after a server reboot, a WRITE call is made just before the RENEW call. The WRITE fails with NFS4ERR_STALE_STATEID, which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However, the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
which ends up setting NFS_STATE_RECLAIM_NOGRACE and clearing NFS_STATE_RECLAIM_REBOOT in the state flags, via nfs4_state_mark_reclaim_nograce().
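The flag sequence described above can be sketched in user space. This is only a model of the behaviour as reported here, not kernel code: the `rf_` names are hypothetical stand-ins, and the real flags live in struct nfs4_state and are manipulated with atomic bit operations.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NFS_STATE_RECLAIM_REBOOT / NFS_STATE_RECLAIM_NOGRACE. */
enum {
	RF_RECLAIM_REBOOT  = 1 << 0,
	RF_RECLAIM_NOGRACE = 1 << 1,
};

struct rf_state {
	unsigned int flags;
};

/* Models the WRITE error path: NFS4ERR_STALE_STATEID marks the state for reboot reclaim. */
static void rf_mark_reclaim_reboot(struct rf_state *st)
{
	st->flags |= RF_RECLAIM_REBOOT;
}

/*
 * Models the RENEW error path: a state that is still marked for reboot
 * reclaim is treated as a leftover and demoted to no-grace recovery.
 */
static void rf_end_reclaim_reboot(struct rf_state *st)
{
	if (st->flags & RF_RECLAIM_REBOOT) {
		st->flags &= ~(unsigned int)RF_RECLAIM_REBOOT;
		st->flags |= RF_RECLAIM_NOGRACE;
	}
}

/* In this model, locks can only be reclaimed while the reboot flag is set. */
static bool rf_can_reclaim_locks(const struct rf_state *st)
{
	return (st->flags & RF_RECLAIM_REBOOT) != 0;
}
```

Calling rf_mark_reclaim_reboot() followed by rf_end_reclaim_reboot() on a fresh state leaves only the no-grace flag set, which is the ordering problem reported here: the state is recovered as if the lease had expired.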
The process of reclaiming the locks then seems to hit another roadblock in nfs4_open_expired(), which fails to open the file and reset the state. As a result, nfs4_reclaim_open_state() ends up calling nfs4_reclaim_locks() in a loop with the old stateid.
With the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error() commented out, the client was able to handle this particular scenario properly.
Has anyone else seen this issue?
Sachin Prabhu
* Re: NFS4 clients cannot reclaim locks
From: Trond Myklebust @ 2010-10-01 20:46 UTC
To: Sachin Prabhu; +Cc: linux-nfs
On Fri, 2010-10-01 at 07:30 -0400, Sachin Prabhu wrote:
> NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.
>
> The problem appears to happen in cases where after a reboot, a WRITE call is made just before the RENEW call. In that case, the NFS4ERR_STALE_STATEID is returned for the WRITE call which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
Yup. I don't think we should call nfs4_state_mark_reclaim_reboot() here.
> The process of reclaiming the locks then seem to hit another roadblock in nfs4_open_expired() where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().
Any idea how nfs4_open_expired() is failing? It seems that if it does,
we should see an error, which would cause the lock reclaim to fail.
Also, why is the call to nfs4_reclaim_locks() looping? That too should
exit in case of an error.
> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.
We do need to keep the nfs4_state_end_reclaim_reboot() there. Otherwise,
we have a problem if the server reboots again while we're in the middle
of reclaiming state.
Cheers
Trond
* Re: NFS4 clients cannot reclaim locks
From: Sachin Prabhu @ 2010-10-04 10:03 UTC
To: Trond Myklebust; +Cc: linux-nfs
----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> On Fri, 2010-10-01 at 07:30 -0400, Sachin Prabhu wrote:
> > NFS4 clients appear to have problems reclaiming locks after a server
> reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a
> Fedora system.
> >
> > The problem appears to happen in cases where after a reboot, a WRITE
> call is made just before the RENEW call. In that case, the
> NFS4ERR_STALE_STATEID is returned for the WRITE call which results in
> NFS_STATE_RECLAIM_REBOOT being set in the state flags. However the
> NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is
> handled by
> > nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
>
> > which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE
> and clearing the NFS_STATE_RECLAIM_REBOOT in
> nfs4_state_mark_reclaim_nograce().
>
> Yup. I don't think we should call nfs4_state_mark_reclaim_reboot()
> here.
>
> > The process of reclaiming the locks then seem to hit another
> roadblock in nfs4_open_expired() where it fails to open the file and
> reset the state. It ends up calling nfs4_reclaim_locks() in a loop
> with the old stateid in nfs4_reclaim_open_state().
>
> Any idea how nfs4_open_expired() is failing? It seems that if it
> does,
> we should see an error, which would cause the lock reclaim to fail.
>
> Also, why is the call to nfs4_reclaim_locks() looping? That too
> should
> exit in case of an error.
>
From instrumentation, the problem appears to happen in nfs4_open_prepare():
static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
{
	..
	/*
	 * Check if we still need to send an OPEN call, or if we can use
	 * a delegation instead.
	 */
	if (data->state != NULL) {
		struct nfs_delegation *delegation;
		if (can_open_cached(data->state, data->o_arg.fmode, data->o_arg.open_flags))
			goto out_no_action;
	..
out_no_action:
	task->tk_action = NULL;
}
Here, can_open_cached() returns true, so the OPEN call is never made and the old state is reused.
static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs4_state_recovery_ops *ops)
{
	..
restart:
	..
		status = ops->recover_open(sp, state);   <-- This call attempts to use cached state and status is set to 0
		if (status >= 0) {
			status = nfs4_reclaim_locks(state, ops);   <-- Attempts to reclaim locks using old stateid
			-- Here status is set to -NFS4ERR_BAD_STATEID --
			..
		}
		switch (status) {
			..
			case -NFS4ERR_BAD_STATEID:
			case -NFS4ERR_RECLAIM_BAD:
			case -NFS4ERR_RECLAIM_CONFLICT:
				nfs4_state_mark_reclaim_nograce(sp->so_client, state);
				break;
			..
		}
		nfs4_put_open_state(state);
		goto restart;
	..
}
The call to ops->recover_open() calls nfs4_open_expired(). While preparing the OPEN RPC call in nfs4_open_prepare(), it decides that the cached copy is valid and attempts to use it, so nfs4_open_expired() returns 0. The subsequent attempt to reclaim locks with nfs4_reclaim_locks() fails with -NFS4ERR_BAD_STATEID. The goto statement in nfs4_reclaim_open_state() then makes it loop with the same results as before.
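The loop described above can be reproduced in a small user-space model. This is a sketch under stated assumptions rather than the kernel logic: all `loop_` names and the error value are hypothetical, a "generation" counter stands in for the server boot instance that a stateid belongs to, and the pass limit exists only so that the model terminates.

```c
#include <stdbool.h>

#define LOOP_ERR_BAD_STATEID (-1)  /* placeholder, not the real NFS4ERR_BAD_STATEID value */

struct loop_server {
	int generation;  /* incremented by every (modelled) server reboot */
	int opens_seen;  /* OPEN calls that actually reached the server */
};

struct loop_state {
	bool open_cached;        /* stands in for the NFS_O_*_STATE bits */
	int stateid_generation;  /* server generation the stateid was issued in */
};

/* Models recover_open(): if the open looks cached, no OPEN is sent at all. */
static int loop_recover_open(struct loop_state *st, struct loop_server *srv)
{
	if (st->open_cached)
		return 0;  /* like the out_no_action path */
	srv->opens_seen++;
	st->stateid_generation = srv->generation;
	st->open_cached = true;
	return 0;
}

/* Models lock reclaim: a stateid from before the reboot is rejected. */
static int loop_reclaim_locks(const struct loop_state *st, const struct loop_server *srv)
{
	return st->stateid_generation == srv->generation ? 0 : LOOP_ERR_BAD_STATEID;
}

/* Returns the number of passes needed, or -1 if max_passes were used without progress. */
static int loop_reclaim_open_state(struct loop_state *st, struct loop_server *srv, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		int status = loop_recover_open(st, srv);

		if (status >= 0)
			status = loop_reclaim_locks(st, srv);
		if (status == 0)
			return pass;
		/* the code quoted above marks the state and does "goto restart" here */
	}
	return -1;
}
```

With open_cached left set and a stateid from the previous generation, every pass fails in the same way and the server never sees an OPEN, which matches the behaviour seen in the instrumentation.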
Sachin Prabhu
* Re: NFS4 clients cannot reclaim locks
From: Trond Myklebust @ 2010-10-05 13:37 UTC
To: Sachin Prabhu; +Cc: linux-nfs
On Mon, 2010-10-04 at 06:03 -0400, Sachin Prabhu wrote:
> From instrumentation, the problem appears to happen at nfs4_open_prepare
>
> static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
> {
> ..
> /*
> * Check if we still need to send an OPEN call, or if we can use
> * a delegation instead.
> */
>
> if (data->state != NULL) {
> struct nfs_delegation *delegation;
>
> if (can_open_cached(data->state, data->o_arg.fmode, data->o_arg.open_flags))
> goto out_no_action;
> ..
> out_no_action:
> task->tk_action = NULL;
>
> }
>
> Here, can_open_cached returns true. The open call is never made and the old state is used.
> static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs4_state_recovery_ops *ops)
> {
> ..
> restart:
> ..
> status = ops->recover_open(sp, state); <-- This call attempts to use cached state and status is set to 0
> if (status >= 0) {
> status = nfs4_reclaim_locks(state, ops); <-- Attempts to reclaim locks using old stateid
> -- Here status is set to -NFS4ERR_BAD_STATEID --
> ..
> }
> switch (status) {
> ..
> case -NFS4ERR_BAD_STATEID:
> case -NFS4ERR_RECLAIM_BAD:
> case -NFS4ERR_RECLAIM_CONFLICT:
> nfs4_state_mark_reclaim_nograce(sp->so_client, state);
> break;
> ..
> }
> nfs4_put_open_state(state);
> goto restart;
> ..
> }
>
> The call to ops->recover_open() calls nfs4_open_expired(). While preparing the OPEN RPC call in nfs4_open_prepare(), it decides that the cached copy is valid and attempts to use it, so nfs4_open_expired() returns 0. The subsequent attempt to reclaim locks with nfs4_reclaim_locks() fails with -NFS4ERR_BAD_STATEID. The goto statement in nfs4_reclaim_open_state() then makes it loop with the same results as before.
Yup. That makes sense. Does the following patch help?
Cheers
Trond
--------------------------------------------------------------------------------------------------------
NFSv4: Fix open recovery
From: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4 open recovery is currently broken: since we do not clear the
state->flags states before attempting recovery, we end up with the
'can_open_cached()' function triggering. This again leads to no OPEN call
being put on the wire.
Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/nfs/nfs4proc.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 089da5b..01b4817 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1120,6 +1120,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
 	clear_bit(NFS_DELEGATED_STATE, &state->flags);
 	smp_rmb();
 	if (state->n_rdwr != 0) {
+		clear_bit(NFS_O_RDWR_STATE, &state->flags);
 		ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE, &newstate);
 		if (ret != 0)
 			return ret;
@@ -1127,6 +1128,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
 			return -ESTALE;
 	}
 	if (state->n_wronly != 0) {
+		clear_bit(NFS_O_WRONLY_STATE, &state->flags);
 		ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
 		if (ret != 0)
 			return ret;
@@ -1134,6 +1136,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
 			return -ESTALE;
 	}
 	if (state->n_rdonly != 0) {
+		clear_bit(NFS_O_RDONLY_STATE, &state->flags);
 		ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
 		if (ret != 0)
 			return ret;
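The effect of the patch above can be illustrated with a simplified user-space model. It is a sketch only: the `fix_` names are hypothetical, plain bit masks replace the kernel's clear_bit(), and "sending an OPEN" is reduced to a counter. The clear_first parameter switches between the behaviour before the patch (false) and after it (true).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NFS_O_RDONLY_STATE, NFS_O_WRONLY_STATE and NFS_O_RDWR_STATE. */
enum {
	FIX_O_RDONLY = 1 << 0,
	FIX_O_WRONLY = 1 << 1,
	FIX_O_RDWR   = 1 << 2,
};

struct fix_state {
	unsigned int flags;              /* open modes currently believed to be valid */
	int n_rdonly, n_wronly, n_rdwr;  /* number of opens per mode */
};

/* Returns true if an OPEN had to be sent, false if the cached-open check short-circuited it. */
static bool fix_recover_helper(struct fix_state *st, unsigned int mode_bit)
{
	if (st->flags & mode_bit)
		return false;        /* models can_open_cached() returning true */
	st->flags |= mode_bit;   /* models a successful OPEN for this mode */
	return true;
}

/* Returns how many OPEN calls were sent while recovering the state. */
static int fix_open_recover(struct fix_state *st, bool clear_first)
{
	int opens_sent = 0;

	if (st->n_rdwr != 0) {
		if (clear_first)
			st->flags &= ~(unsigned int)FIX_O_RDWR;
		opens_sent += fix_recover_helper(st, FIX_O_RDWR);
	}
	if (st->n_wronly != 0) {
		if (clear_first)
			st->flags &= ~(unsigned int)FIX_O_WRONLY;
		opens_sent += fix_recover_helper(st, FIX_O_WRONLY);
	}
	if (st->n_rdonly != 0) {
		if (clear_first)
			st->flags &= ~(unsigned int)FIX_O_RDONLY;
		opens_sent += fix_recover_helper(st, FIX_O_RDONLY);
	}
	return opens_sent;
}
```

In this model, a state with stale mode bits sends no OPEN at all when clear_first is false, and one OPEN per mode in use when it is true.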
* Re: NFS4 clients cannot reclaim locks
From: Trond Myklebust @ 2010-10-05 13:38 UTC
To: Sachin Prabhu; +Cc: linux-nfs
On Mon, 2010-10-04 at 06:03 -0400, Sachin Prabhu wrote:
> ----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> > On Fri, 2010-10-01 at 07:30 -0400, Sachin Prabhu wrote:
> > > NFS4 clients appear to have problems reclaiming locks after a server
> > reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a
> > Fedora system.
> > >
> > > The problem appears to happen in cases where after a reboot, a WRITE
> > call is made just before the RENEW call. In that case, the
> > NFS4ERR_STALE_STATEID is returned for the WRITE call which results in
> > NFS_STATE_RECLAIM_REBOOT being set in the state flags. However the
> > NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is
> > handled by
> > > nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> >
> > > which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE
> > and clearing the NFS_STATE_RECLAIM_REBOOT in
> > nfs4_state_mark_reclaim_nograce().
> >
> > Yup. I don't think we should call nfs4_state_mark_reclaim_reboot()
> > here.
...Here is the second patch.
Cheers
Trond
------------------------------------------------------------------------------------------------------
NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers
From: Trond Myklebust <Trond.Myklebust@netapp.com>
In the case of a server reboot, the state recovery thread starts by calling
nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
the server reboots while the client is in the middle of recovery.
However, if the client has already marked the nfs4_state as requiring
reboot recovery, then the above behaviour will cause the recovery thread to
treat the open as if it was part of such an edge condition: the open will
be recovered as if it was part of a lease expiration (and all the locks
will be lost).
The fix is to remove the call to nfs4_state_mark_reclaim_reboot() from
nfs4_async_handle_error() and nfs4_handle_exception(). Instead we leave it
to the recovery thread to do this for us.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/nfs/nfs4proc.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 01b4817..74aa54e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -255,9 +255,6 @@ static int nfs4_handle_exception(const struct nfs_server *server, int errorcode,
 			nfs4_state_mark_reclaim_nograce(clp, state);
 			goto do_state_recovery;
 		case -NFS4ERR_STALE_STATEID:
-			if (state == NULL)
-				break;
-			nfs4_state_mark_reclaim_reboot(clp, state);
 		case -NFS4ERR_STALE_CLIENTID:
 		case -NFS4ERR_EXPIRED:
 			goto do_state_recovery;
@@ -3493,9 +3490,6 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 			nfs4_state_mark_reclaim_nograce(clp, state);
 			goto do_state_recovery;
 		case -NFS4ERR_STALE_STATEID:
-			if (state == NULL)
-				break;
-			nfs4_state_mark_reclaim_reboot(clp, state);
 		case -NFS4ERR_STALE_CLIENTID:
 		case -NFS4ERR_EXPIRED:
 			goto do_state_recovery;
* Re: NFS4 clients cannot reclaim locks
From: Timo Aaltonen @ 2010-10-05 15:03 UTC
To: Sachin Prabhu; +Cc: linux-nfs
On Fri, 1 Oct 2010, Sachin Prabhu wrote:
> NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.
>
> The problem appears to happen in cases where after a reboot, a WRITE call is made just before the RENEW call. In that case, the NFS4ERR_STALE_STATEID is returned for the WRITE call which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
>
> The process of reclaiming the locks then seem to hit another roadblock in nfs4_open_expired() where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().
>
> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.
>
> Has any one else seen this issue?
Could this be related to the bug I was seeing with NFSv4 (I am now using v3
successfully)?
https://bugzilla.kernel.org/show_bug.cgi?id=15973
In my case, though, the error returned by the server is BAD_STATEID.
--
Timo Aaltonen
Systems Specialist, Aalto IT
* Re: NFS4 clients cannot reclaim locks
From: Sachin Prabhu @ 2010-10-06 15:59 UTC
To: Trond Myklebust; +Cc: linux-nfs
----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> Yup. That makes sense. Does the following patch help?
>
> Cheers
> Trond
> --------------------------------------------------------------------------------------------------------
> NFSv4: Fix open recovery
>
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> NFSv4 open recovery is currently broken: since we do not clear the
> state->flags states before attempting recovery, we end up with the
> 'can_open_cached()' function triggering. This again leads to no OPEN
> call
> being put on the wire.
>
> Reported-by: Sachin Prabhu <sprabhu@redhat.com>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> fs/nfs/nfs4proc.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 089da5b..01b4817 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1120,6 +1120,7 @@ static int nfs4_open_recover(struct
> nfs4_opendata *opendata, struct nfs4_state *
> clear_bit(NFS_DELEGATED_STATE, &state->flags);
> smp_rmb();
> if (state->n_rdwr != 0) {
> + clear_bit(NFS_O_RDWR_STATE, &state->flags);
> ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE,
> &newstate);
> if (ret != 0)
> return ret;
> @@ -1127,6 +1128,7 @@ static int nfs4_open_recover(struct
> nfs4_opendata *opendata, struct nfs4_state *
> return -ESTALE;
> }
> if (state->n_wronly != 0) {
> + clear_bit(NFS_O_WRONLY_STATE, &state->flags);
> ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
> if (ret != 0)
> return ret;
> @@ -1134,6 +1136,7 @@ static int nfs4_open_recover(struct
> nfs4_opendata *opendata, struct nfs4_state *
> return -ESTALE;
> }
> if (state->n_rdonly != 0) {
> + clear_bit(NFS_O_RDONLY_STATE, &state->flags);
> ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
> if (ret != 0)
> return ret;
>
Yes. The patch works.
As expected, repeated OPEN calls are made with the claim type set to CLAIM_NULL. For each of these calls, the server returns NFS4ERR_GRACE for as long as it is in its grace period. Once the grace period has ended, the OPEN call succeeds, a new stateid is set and the write operation continues.
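The retry behaviour observed in this test can be modelled roughly as follows. This is a hedged sketch, not the client's actual retry logic: the `grace_` names and error value are hypothetical, the grace period is reduced to a countdown that advances once per refused attempt, and the kernel's delay between retries is not modelled.

```c
#define GRACE_OK         0
#define GRACE_ERR_GRACE (-1)  /* placeholder, not the real NFS4ERR_GRACE value */

struct grace_server {
	int grace_ticks_left;  /* remaining grace period, in retry intervals */
	int next_stateid;      /* stateid handed out by the next successful OPEN */
};

/* Models a non-reclaim OPEN: it is refused while the server is in grace. */
static int grace_open_claim_null(struct grace_server *srv, int *stateid)
{
	if (srv->grace_ticks_left > 0) {
		srv->grace_ticks_left--;  /* time passes before the next retry */
		return GRACE_ERR_GRACE;
	}
	*stateid = srv->next_stateid++;
	return GRACE_OK;
}

/* Retries the OPEN; returns the number of attempts used, or -1 if max_attempts was not enough. */
static int grace_open_until_done(struct grace_server *srv, int max_attempts, int *stateid)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (grace_open_claim_null(srv, stateid) == GRACE_OK)
			return attempt;
	}
	return -1;
}
```

With three grace intervals remaining, the fourth attempt succeeds and replaces the old stateid, after which the pending write can be retried with the new one.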
Thank You
Sachin Prabhu
* Re: NFS4 clients cannot reclaim locks
From: Sachin Prabhu @ 2010-10-06 16:01 UTC
To: Trond Myklebust; +Cc: linux-nfs
----- "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> ...Here is the second patch.
>
> Cheers
> Trond
> ------------------------------------------------------------------------------------------------------
> NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error
> handlers
>
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> In the case of a server reboot, the state recovery thread starts by
> calling
> nfs4_state_end_reclaim_reboot() in order to avoid edge conditions
> when
> the server reboots while the client is in the middle of recovery.
>
> However, if the client has already marked the nfs4_state as requiring
> reboot recovery, then the above behaviour will cause the recovery
> thread to
> treat the open as if it was part of such an edge condition: the open
> will
> be recovered as if it was part of a lease expiration (and all the
> locks
> will be lost).
> Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
> nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we
> leave it
> to the recovery thread to do this for us.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> fs/nfs/nfs4proc.c | 6 ------
> 1 files changed, 0 insertions(+), 6 deletions(-)
>
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 01b4817..74aa54e 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -255,9 +255,6 @@ static int nfs4_handle_exception(const struct
> nfs_server *server, int errorcode,
> nfs4_state_mark_reclaim_nograce(clp, state);
> goto do_state_recovery;
> case -NFS4ERR_STALE_STATEID:
> - if (state == NULL)
> - break;
> - nfs4_state_mark_reclaim_reboot(clp, state);
> case -NFS4ERR_STALE_CLIENTID:
> case -NFS4ERR_EXPIRED:
> goto do_state_recovery;
> @@ -3493,9 +3490,6 @@ nfs4_async_handle_error(struct rpc_task *task,
> const struct nfs_server *server,
> nfs4_state_mark_reclaim_nograce(clp, state);
> goto do_state_recovery;
> case -NFS4ERR_STALE_STATEID:
> - if (state == NULL)
> - break;
> - nfs4_state_mark_reclaim_reboot(clp, state);
> case -NFS4ERR_STALE_CLIENTID:
> case -NFS4ERR_EXPIRED:
> goto do_state_recovery;
Yes. The patch works for me.
An OPEN call is made to the server with the claim type set to CLAIM_PREVIOUS. This resets the stateid, and the write operation can continue.
Thank You
Sachin Prabhu
* Re: NFS4 clients cannot reclaim locks
From: Timo Aaltonen @ 2010-11-22 16:02 UTC
To: linux-nfs
On Tue, 5 Oct 2010, Timo Aaltonen wrote:
> On Fri, 1 Oct 2010, Sachin Prabhu wrote:
>
>> NFS4 clients appear to have problems reclaiming locks after a server
>> reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora
>> system.
>>
>> The problem appears to happen in cases where after a reboot, a WRITE call
>> is made just before the RENEW call. In that case, the NFS4ERR_STALE_STATEID
>> is returned for the WRITE call which results in NFS_STATE_RECLAIM_REBOOT
>> being set in the state flags. However the NFS4ERR_STALE_CLIENTID returned
>> for the subsequent RENEW call is handled by
>> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
>> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and
>> clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
>>
>> The process of reclaiming the locks then seem to hit another roadblock in
>> nfs4_open_expired() where it fails to open the file and reset the state. It
>> ends up calling nfs4_reclaim_locks() in a loop with the old stateid in
>> nfs4_reclaim_open_state().
>>
>> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in
>> nfs4_recovery_handle_error(), the client was able to handle this particular
>> scenario properly.
>>
>> Has any one else seen this issue?
>
> could this be related to the bug I was seeing with nfsv4 (now using v3 with
> success):
>
> https://bugzilla.kernel.org/show_bug.cgi?id=15973
>
> though the error returned by the server is BAD_STATEID..
Testing 2.6.37-rc2 has been positive so far, which suggests that the bug
is fixed there.
--
Timo Aaltonen
Systems Specialist, Aalto IT