* [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
@ 2023-06-27 18:31 Benjamin Coddington
2023-06-27 18:31 ` [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return Benjamin Coddington
2023-10-05 20:39 ` [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" Anna Schumaker
0 siblings, 2 replies; 6+ messages in thread
From: Benjamin Coddington @ 2023-06-27 18:31 UTC (permalink / raw)
To: trond.myklebust, anna; +Cc: Olga.Kornievskaia, linux-nfs
Olga Kornievskaia reports that this patch breaks NFSv4.0 state recovery.
It also introduces additional complexity in the error paths for cases not
related to the original problem. Let's revert it for now, and address the
original problem in another manner.
This reverts commit f5ea16137a3fa2858620dc9084466491c128535f.
Fixes: f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return")
Reported-by: Kornievskaia, Olga <Olga.Kornievskaia@netapp.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
fs/nfs/nfs4proc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d3665390c4cb..6bb14f6cfbc0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7159,7 +7159,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
{
struct nfs4_lockdata *data = calldata;
struct nfs4_lock_state *lsp = data->lsp;
- struct nfs_server *server = NFS_SERVER(d_inode(data->ctx->dentry));
if (!nfs4_sequence_done(task, &data->res.seq_res))
return;
@@ -7167,7 +7166,8 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
data->rpc_status = task->tk_status;
switch (task->tk_status) {
case 0:
- renew_lease(server, data->timestamp);
+ renew_lease(NFS_SERVER(d_inode(data->ctx->dentry)),
+ data->timestamp);
if (data->arg.new_lock && !data->cancelled) {
data->fl.fl_flags &= ~(FL_SLEEP | FL_ACCESS);
if (locks_lock_inode_wait(lsp->ls_state->inode, &data->fl) < 0)
@@ -7188,8 +7188,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
if (!nfs4_stateid_match(&data->arg.open_stateid,
&lsp->ls_state->open_stateid))
goto out_restart;
- else if (nfs4_async_handle_error(task, server, lsp->ls_state, NULL) == -EAGAIN)
- goto out_restart;
} else if (!nfs4_stateid_match(&data->arg.lock_stateid,
&lsp->ls_stateid))
goto out_restart;
--
2.40.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return
2023-06-27 18:31 [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" Benjamin Coddington
@ 2023-06-27 18:31 ` Benjamin Coddington
2023-06-29 18:33 ` Trond Myklebust
2023-10-05 20:39 ` [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" Anna Schumaker
1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Coddington @ 2023-06-27 18:31 UTC (permalink / raw)
To: trond.myklebust, anna; +Cc: Olga.Kornievskaia, linux-nfs
Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
return") attempted to solve this problem by using nfs4's generic async error
handling, but introduced a regression where v4.0 lock recovery would hang.
The additional complexity introduced by overloading that error handling is
not necessary for this case.
The problem as originally explained in the above commit is:
There's a small window where a LOCK sent during a delegation return can
race with another OPEN on client, but the open stateid has not yet been
updated. In this case, the client doesn't handle the OLD_STATEID error
from the server and will lose this lock, emitting:
"NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
We want a fix that is much more focused to the original problem. Fix this
issue by returning -EAGAIN from the nfs4_handle_delegation_recall_error() on
OLD_STATEID, and use that as a signal for the delegation return code to
retry the LOCK operation. We should at this point be able to send along
the updated stateid.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
fs/nfs/delegation.c | 4 +++-
fs/nfs/nfs4proc.c | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index cf7365581031..23aeb02319a5 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct nfs4_state *state, const nfs4_state
if (nfs_file_open_context(fl->fl_file)->state != state)
continue;
spin_unlock(&flctx->flc_lock);
- status = nfs4_lock_delegation_recall(fl, state, stateid);
+ do {
+ status = nfs4_lock_delegation_recall(fl, state, stateid);
+ } while (status == -EAGAIN);
if (status < 0)
goto out;
spin_lock(&flctx->flc_lock);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6bb14f6cfbc0..399db73a57f4 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2262,6 +2262,7 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
case -NFS4ERR_BAD_HIGH_SLOT:
case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
case -NFS4ERR_DEADSESSION:
+ case -NFS4ERR_OLD_STATEID:
return -EAGAIN;
case -NFS4ERR_STALE_CLIENTID:
case -NFS4ERR_STALE_STATEID:
--
2.40.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return
2023-06-27 18:31 ` [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return Benjamin Coddington
@ 2023-06-29 18:33 ` Trond Myklebust
2023-06-29 19:01 ` Benjamin Coddington
0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2023-06-29 18:33 UTC (permalink / raw)
To: anna@kernel.org, bcodding@redhat.com
Cc: linux-nfs@vger.kernel.org, Olga.Kornievskaia@netapp.com
On Tue, 2023-06-27 at 14:31 -0400, Benjamin Coddington wrote:
> Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during
> delegation
> return") attempted to solve this problem by using nfs4's generic
> async error
> handling, but introduced a regression where v4.0 lock recovery would
> hang.
> The additional complexity introduced by overloading that error
> handling is
> not necessary for this case.
>
> The problem as originally explained in the above commit is:
>
> There's a small window where a LOCK sent during a delegation
> return can
> race with another OPEN on client, but the open stateid has not
> yet been
> updated. In this case, the client doesn't handle the OLD_STATEID
> error
> from the server and will lose this lock, emitting:
> "NFS: nfs4_handle_delegation_recall_error: unhandled error -
> 10024".
>
> We want a fix that is much more focused to the original problem. Fix
> this
> issue by returning -EAGAIN from the
> nfs4_handle_delegation_recall_error() on
> OLD_STATEID, and use that as a signal for the delegation return code
> to
> retry the LOCK operation. We should at this point be able to send
> along
> the updated stateid.
>
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
> fs/nfs/delegation.c | 4 +++-
> fs/nfs/nfs4proc.c | 1 +
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> index cf7365581031..23aeb02319a5 100644
> --- a/fs/nfs/delegation.c
> +++ b/fs/nfs/delegation.c
> @@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct
> nfs4_state *state, const nfs4_state
> if (nfs_file_open_context(fl->fl_file)->state !=
> state)
> continue;
> spin_unlock(&flctx->flc_lock);
> - status = nfs4_lock_delegation_recall(fl, state,
> stateid);
> + do {
> + status = nfs4_lock_delegation_recall(fl,
> state, stateid);
> + } while (status == -EAGAIN);
> if (status < 0)
> goto out;
> spin_lock(&flctx->flc_lock);
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6bb14f6cfbc0..399db73a57f4 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2262,6 +2262,7 @@ static int
> nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
> case -NFS4ERR_BAD_HIGH_SLOT:
> case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
> case -NFS4ERR_DEADSESSION:
> + case -NFS4ERR_OLD_STATEID:
> return -EAGAIN;
Hmm... Rather than issuing a blanket EAGAIN, we really should be
looking at using either nfs4_refresh_lock_old_stateid() or
nfs4_refresh_open_old_stateid(), depending on whether the stateid that
saw the NFS4ERR_OLD_STATEID was a lock stateid or an open stateid.
> case -NFS4ERR_STALE_CLIENTID:
> case -NFS4ERR_STALE_STATEID:
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return
2023-06-29 18:33 ` Trond Myklebust
@ 2023-06-29 19:01 ` Benjamin Coddington
0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Coddington @ 2023-06-29 19:01 UTC (permalink / raw)
To: Trond Myklebust; +Cc: anna, linux-nfs, Olga.Kornievskaia
On 29 Jun 2023, at 14:33, Trond Myklebust wrote:
> On Tue, 2023-06-27 at 14:31 -0400, Benjamin Coddington wrote:
>> Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during
>> delegation
>> return") attempted to solve this problem by using nfs4's generic
>> async error
>> handling, but introduced a regression where v4.0 lock recovery would
>> hang.
>> The additional complexity introduced by overloading that error
>> handling is
>> not necessary for this case.
>>
>> The problem as originally explained in the above commit is:
>>
>> There's a small window where a LOCK sent during a delegation
>> return can
>> race with another OPEN on client, but the open stateid has not
>> yet been
>> updated. In this case, the client doesn't handle the OLD_STATEID
>> error
>> from the server and will lose this lock, emitting:
>> "NFS: nfs4_handle_delegation_recall_error: unhandled error -
>> 10024".
>>
>> We want a fix that is much more focused to the original problem. Fix
>> this
>> issue by returning -EAGAIN from the
>> nfs4_handle_delegation_recall_error() on
>> OLD_STATEID, and use that as a signal for the delegation return code
>> to
>> retry the LOCK operation. We should at this point be able to send
>> along
>> the updated stateid.
>>
>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>> ---
>> fs/nfs/delegation.c | 4 +++-
>> fs/nfs/nfs4proc.c | 1 +
>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>> index cf7365581031..23aeb02319a5 100644
>> --- a/fs/nfs/delegation.c
>> +++ b/fs/nfs/delegation.c
>> @@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct
>> nfs4_state *state, const nfs4_state
>> if (nfs_file_open_context(fl->fl_file)->state !=
>> state)
>> continue;
>> spin_unlock(&flctx->flc_lock);
>> - status = nfs4_lock_delegation_recall(fl, state,
>> stateid);
>> + do {
>> + status = nfs4_lock_delegation_recall(fl,
>> state, stateid);
>> + } while (status == -EAGAIN);
>> if (status < 0)
>> goto out;
>> spin_lock(&flctx->flc_lock);
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 6bb14f6cfbc0..399db73a57f4 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -2262,6 +2262,7 @@ static int
>> nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
>> case -NFS4ERR_BAD_HIGH_SLOT:
>> case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
>> case -NFS4ERR_DEADSESSION:
>> + case -NFS4ERR_OLD_STATEID:
>> return -EAGAIN;
>
> Hmm... Rather than issuing a blanket EAGAIN, we really should be
> looking at using either nfs4_refresh_lock_old_stateid() or
> nfs4_refresh_open_old_stateid(), depending on whether the stateid that
> saw the NFS4ERR_OLD_STATEID was a lock stateid or an open stateid.
I figured if there was client race that would trigger the OLD_STATEID on
open, we'd have heard from the "unhandled error" printk by now, so I rushed
this out.. But I also don't like the overloading for open error handling
here. I'll work it up as you suggest.
The revert can go without a fix (IMO). The fix is worse than original bug
which was really hard to hit. I couldn't reproduce it without artificial
delays.
Ben
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
2023-06-27 18:31 [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" Benjamin Coddington
2023-06-27 18:31 ` [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return Benjamin Coddington
@ 2023-10-05 20:39 ` Anna Schumaker
2023-10-06 13:44 ` Sasha Levin
1 sibling, 1 reply; 6+ messages in thread
From: Anna Schumaker @ 2023-10-05 20:39 UTC (permalink / raw)
To: stable; +Cc: trond.myklebust, Olga.Kornievskaia, linux-nfs,
Benjamin Coddington
Hi Stable Kernel Team,
Can this patch be backported to v6.1+ kernels? The commit id is
5b4a82a0724a and has been upstream since v6.5. As was mentioned in the
original patch description (below), the commit being reverted by this
patch breaks state recovery in a way that is worse than the initial
bug that it was attempting to fix.
Thanks,
Anna
On Tue, Jun 27, 2023 at 2:31 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> Olga Kornievskaia reports that this patch breaks NFSv4.0 state recovery.
> It also introduces additional complexity in the error paths for cases not
> related to the original problem. Let's revert it for now, and address the
> original problem in another manner.
>
> This reverts commit f5ea16137a3fa2858620dc9084466491c128535f.
>
> Fixes: f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return")
> Reported-by: Kornievskaia, Olga <Olga.Kornievskaia@netapp.com>
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
> fs/nfs/nfs4proc.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index d3665390c4cb..6bb14f6cfbc0 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7159,7 +7159,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
> {
> struct nfs4_lockdata *data = calldata;
> struct nfs4_lock_state *lsp = data->lsp;
> - struct nfs_server *server = NFS_SERVER(d_inode(data->ctx->dentry));
>
> if (!nfs4_sequence_done(task, &data->res.seq_res))
> return;
> @@ -7167,7 +7166,8 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
> data->rpc_status = task->tk_status;
> switch (task->tk_status) {
> case 0:
> - renew_lease(server, data->timestamp);
> + renew_lease(NFS_SERVER(d_inode(data->ctx->dentry)),
> + data->timestamp);
> if (data->arg.new_lock && !data->cancelled) {
> data->fl.fl_flags &= ~(FL_SLEEP | FL_ACCESS);
> if (locks_lock_inode_wait(lsp->ls_state->inode, &data->fl) < 0)
> @@ -7188,8 +7188,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
> if (!nfs4_stateid_match(&data->arg.open_stateid,
> &lsp->ls_state->open_stateid))
> goto out_restart;
> - else if (nfs4_async_handle_error(task, server, lsp->ls_state, NULL) == -EAGAIN)
> - goto out_restart;
> } else if (!nfs4_stateid_match(&data->arg.lock_stateid,
> &lsp->ls_stateid))
> goto out_restart;
> --
> 2.40.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
2023-10-05 20:39 ` [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" Anna Schumaker
@ 2023-10-06 13:44 ` Sasha Levin
0 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2023-10-06 13:44 UTC (permalink / raw)
To: Anna Schumaker
Cc: stable, trond.myklebust, Olga.Kornievskaia, linux-nfs,
Benjamin Coddington
On Thu, Oct 05, 2023 at 04:39:58PM -0400, Anna Schumaker wrote:
>Hi Stable Kernel Team,
>
>Can this patch be backported to v6.1+ kernels? The commit id is
>5b4a82a0724a and has been upstream since v6.5. As was mentioned in the
>original patch description (below), the commit being reverted by this
>patch breaks state recovery in a way that is worse than the initial
>bug that it was attempting to fix.
Queued up, thanks!
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-10-06 13:44 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-27 18:31 [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" Benjamin Coddington
2023-06-27 18:31 ` [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return Benjamin Coddington
2023-06-29 18:33 ` Trond Myklebust
2023-06-29 19:01 ` Benjamin Coddington
2023-10-05 20:39 ` [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" Anna Schumaker
2023-10-06 13:44 ` Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).