[PATCH] nfsd: remove long-standing revoked delegations by force

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] nfsd: remove long-standing revoked delegations by force
@ 2025-09-02  2:22 Li Lingfeng
  2025-09-02 10:21 ` Jeff Layton
  0 siblings, 1 reply; 13+ messages in thread
From: Li Lingfeng @ 2025-09-02  2:22 UTC (permalink / raw)
  To: chuck.lever, jlayton, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel
  Cc: yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng, lilingfeng3,
	zhangjian496

When file access conflicts occur between clients, the server recalls
delegations. If the client holding delegation fails to return it after
a recall, nfs4_laundromat adds the delegation to cl_revoked list.
This causes subsequent SEQUENCE operations to set the
SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
validate all delegations and return the revoked one.

However, if the client fails to return the delegation due to a timeout
after receiving the recall or a server bug, the delegation remains in the
server's cl_revoked list. The client marks it revoked and won't find it
upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
client repeatedly tests all delegations, severely impacting performance
when numerous delegations exist.

Since abnormal delegations are removed from flc_lease via nfs4_laundromat
--> revoke_delegation --> destroy_unhashed_deleg -->
nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
requests indefinitely, retaining such a delegation on the server is
unnecessary.

Reported-by: Zhang Jian <zhangjian496@huawei.com>
Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
---
 fs/nfsd/nfs4state.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 88c347957da5..aa65a685dbb9 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	int buflen;
 	struct net *net = SVC_NET(rqstp);
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	struct list_head *pos, *next;
+	struct nfs4_delegation *dp;
 
 	if (resp->opcnt != 1)
 		return nfserr_sequence_pos;
@@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	default:
 		seq->status_flags = 0;
 	}
+	if (!list_empty(&clp->cl_revoked)) {
+		list_for_each_safe(pos, next, &clp->cl_revoked) {
+			dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
+			if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
+				list_del_init(&dp->dl_recall_lru);
+				nfs4_put_stid(&dp->dl_stid);
+			}
+		}
+	}
 	if (!list_empty(&clp->cl_revoked))
 		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
 	if (atomic_read(&clp->cl_admin_revoked))
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02  2:22 [PATCH] nfsd: remove long-standing revoked delegations by force Li Lingfeng
@ 2025-09-02 10:21 ` Jeff Layton
  2025-09-02 12:10   ` Li Lingfeng
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2025-09-02 10:21 UTC (permalink / raw)
  To: Li Lingfeng, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel
  Cc: yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng, zhangjian496

On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote:
> When file access conflicts occur between clients, the server recalls
> delegations. If the client holding delegation fails to return it after
> a recall, nfs4_laundromat adds the delegation to cl_revoked list.
> This causes subsequent SEQUENCE operations to set the
> SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
> validate all delegations and return the revoked one.
> 
> However, if the client fails to return the delegation due to a timeout
> after receiving the recall or a server bug, the delegation remains in the
> server's cl_revoked list. The client marks it revoked and won't find it
> upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
> the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
> client repeatedly tests all delegations, severely impacting performance
> when numerous delegations exist.
> 

It is a performance impact, but I don't get the "loop" here. Are you
saying that this problem compounds itself? That testing all delegations
causes others to be revoked?

> Since abnormal delegations are removed from flc_lease via nfs4_laundromat
> --> revoke_delegation --> destroy_unhashed_deleg -->
> nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
> requests indefinitely, retaining such a delegation on the server is
> unnecessary.
> 
> Reported-by: Zhang Jian <zhangjian496@huawei.com>
> Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
> Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> ---
>  fs/nfsd/nfs4state.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 88c347957da5..aa65a685dbb9 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	int buflen;
>  	struct net *net = SVC_NET(rqstp);
>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +	struct list_head *pos, *next;
> +	struct nfs4_delegation *dp;
>  
>  	if (resp->opcnt != 1)
>  		return nfserr_sequence_pos;
> @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	default:
>  		seq->status_flags = 0;
>  	}
> +	if (!list_empty(&clp->cl_revoked)) {
> +		list_for_each_safe(pos, next, &clp->cl_revoked) {
> +			dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
> +			if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
> +				list_del_init(&dp->dl_recall_lru);
> +				nfs4_put_stid(&dp->dl_stid);
> +			}
> +		}
> +	}
>  	if (!list_empty(&clp->cl_revoked))
>  		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
>  	if (atomic_read(&clp->cl_admin_revoked))

This seems like a violation of the spec. AIUI, the server is required
to hang onto a record of the delegation until the client does the
TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like
this seems wrong.

Should we instead just administratively evict the client since it's
clearly not behaving right in this case?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 10:21 ` Jeff Layton
@ 2025-09-02 12:10   ` Li Lingfeng
  2025-09-02 12:43     ` Benjamin Coddington
  2025-09-02 13:40     ` Jeff Layton
  0 siblings, 2 replies; 13+ messages in thread
From: Li Lingfeng @ 2025-09-02 12:10 UTC (permalink / raw)
  To: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel
  Cc: yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng, zhangjian496

Hi,

在 2025/9/2 18:21, Jeff Layton 写道:
> On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote:
>> When file access conflicts occur between clients, the server recalls
>> delegations. If the client holding delegation fails to return it after
>> a recall, nfs4_laundromat adds the delegation to cl_revoked list.
>> This causes subsequent SEQUENCE operations to set the
>> SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
>> validate all delegations and return the revoked one.
>>
>> However, if the client fails to return the delegation due to a timeout
>> after receiving the recall or a server bug, the delegation remains in the
>> server's cl_revoked list. The client marks it revoked and won't find it
>> upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
>> the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
>> client repeatedly tests all delegations, severely impacting performance
>> when numerous delegations exist.
>>
> It is a performance impact, but I don't get the "loop" here. Are you
> saying that this problem compounds itself? That testing all delegations
> causes others to be revoked?
The delegation will be removed from server->delegations in client after
NFSPROC4_CLNT_DELEGRETURN is performed.
nfs4_delegreturn_done
  nfs_delegation_mark_returned
   nfs_detach_delegation
    nfs_detach_delegation_locked
     list_del_rcu // remove delegation from server->delegations

 From the client's perspective, the delegation has been returned, but on
the server side, it is left in the cl_revoked list.[1].

Subsequently, every sequence from the client will be flagged with
SEQ4_STATUS_RECALLABLE_STATE_REVOKED as long as cl_revoked remains 
non-empty.
nfsd4_sequence
  seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED

When the client detects SEQ4_STATUS_RECALLABLE_STATE_REVOKED while
processing a sequence result, it sets NFS_DELEGATION_TEST_EXPIRED for all
delegations and wakes up the state manager for handling.
nfs41_sequence_done
  nfs41_sequence_process
   nfs41_handle_sequence_flag_errors
    nfs41_handle_recallable_state_revoked
     nfs_test_expired_all_delegations
      nfs_mark_test_expired_all_delegations
       nfs_delegation_mark_test_expired_server
        // set NFS_DELEGATION_TEST_EXPIRED for delegations in 
server->delegations
      nfs4_schedule_state_manager

The state manager tests all delegations except the one that was returned,
as it is no longer in server->delegations.
nfs4_state_manager
  nfs4_begin_drain_session
  nfs_reap_expired_delegations
   nfs_server_reap_expired_delegations
    // test delegations in server->delegations

There may be a loop:
1) send a sequence(client)
2) return SEQ4_STATUS_RECALLABLE_STATE_REVOKED(server)
3) set NFS_DELEGATION_TEST_EXPIRED for all delegations(client)
4) test all delegations by state manager(client)
5) send another sequence(client)

The state manager's traversal of delegations occurs between
nfs4_begin_drain_session and nfs4_end_drain_session. Non-privileged requests
will be blocked because the NFS4_SLOT_TBL_DRAINING flag is set. If there are
many delegations to traverse, this blocking time can be relatively long.
>> Since abnormal delegations are removed from flc_lease via nfs4_laundromat
>> --> revoke_delegation --> destroy_unhashed_deleg -->
>> nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
>> requests indefinitely, retaining such a delegation on the server is
>> unnecessary.
>>
>> Reported-by: Zhang Jian <zhangjian496@huawei.com>
>> Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
>> Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>> ---
>>   fs/nfsd/nfs4state.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 88c347957da5..aa65a685dbb9 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>   	int buflen;
>>   	struct net *net = SVC_NET(rqstp);
>>   	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>> +	struct list_head *pos, *next;
>> +	struct nfs4_delegation *dp;
>>   
>>   	if (resp->opcnt != 1)
>>   		return nfserr_sequence_pos;
>> @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>   	default:
>>   		seq->status_flags = 0;
>>   	}
>> +	if (!list_empty(&clp->cl_revoked)) {
>> +		list_for_each_safe(pos, next, &clp->cl_revoked) {
>> +			dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
>> +			if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
>> +				list_del_init(&dp->dl_recall_lru);
>> +				nfs4_put_stid(&dp->dl_stid);
>> +			}
>> +		}
>> +	}
>>   	if (!list_empty(&clp->cl_revoked))
>>   		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
>>   	if (atomic_read(&clp->cl_admin_revoked))
> This seems like a violation of the spec. AIUI, the server is required
> to hang onto a record of the delegation until the client does the
> TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like
> this seems wrong.
Our expected outcome was that the client would release the abnormal
delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
However, this problematic delegation is no longer present in the
client's server->delegations list—whether due to client-side timeouts or
the server-side bug [1].
>
> Should we instead just administratively evict the client since it's
> clearly not behaving right in this case?
Thanks for the suggestion. While administratively evicting the client would
certainly resolve the immediate delegation issue, I'm concerned that 
approach
might be a bit heavy-handed.
The problematic behavior seems isolated to a single delegation. Meanwhile,
the client itself likely has numerous other open files and active state on
the server. Forcing a complete client reconnect would tear down all that
state, which could cause significant application disruption and be perceived
as a service outage from the client's perspective.

[1] 
https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/

Thanks,
Lingfeng


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 12:10   ` Li Lingfeng
@ 2025-09-02 12:43     ` Benjamin Coddington
  2025-09-02 13:08       ` Li Lingfeng
  2025-09-03  3:46       ` zhangjian (CG)
  2025-09-02 13:40     ` Jeff Layton
  1 sibling, 2 replies; 13+ messages in thread
From: Benjamin Coddington @ 2025-09-02 12:43 UTC (permalink / raw)
  To: Li Lingfeng
  Cc: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng,
	zhangjian496

On 2 Sep 2025, at 8:10, Li Lingfeng wrote:

> Our expected outcome was that the client would release the abnormal
> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
> However, this problematic delegation is no longer present in the
> client's server->delegations list—whether due to client-side timeouts or
> the server-side bug [1].

How does the client timeout TEST_STATEID - are you mounting with 'soft'?

We should find the server-side bug and fix it rather than write code to
paper over it.  I do think the synchronization of state here is a bit
fragile and wish the protocol had a generation, sequence, or marker for
setting SEQ4_STATUS_ bits..

>>
>> Should we instead just administratively evict the client since it's
>> clearly not behaving right in this case?
> Thanks for the suggestion. While administratively evicting the client would
> certainly resolve the immediate delegation issue, I'm concerned that approach
> might be a bit heavy-handed.
> The problematic behavior seems isolated to a single delegation. Meanwhile,
> the client itself likely has numerous other open files and active state on
> the server. Forcing a complete client reconnect would tear down all that
> state, which could cause significant application disruption and be perceived
> as a service outage from the client's perspective.
>
> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/

^^ in this thread you reference v5.10 - there was a knfsd fix for a
cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes to fix
problems and optimize the client walk of delegations since then.  Jeff
pointed out that there have been fixes in these areas.  Are you finding this
problem still with all those fixes included?

Ben


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 12:43     ` Benjamin Coddington
@ 2025-09-02 13:08       ` Li Lingfeng
  2025-09-03  3:46       ` zhangjian (CG)
  1 sibling, 0 replies; 13+ messages in thread
From: Li Lingfeng @ 2025-09-02 13:08 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng,
	zhangjian496

Hi, Ben

在 2025/9/2 20:43, Benjamin Coddington 写道:
> On 2 Sep 2025, at 8:10, Li Lingfeng wrote:
>
>> Our expected outcome was that the client would release the abnormal
>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>> However, this problematic delegation is no longer present in the
>> client's server->delegations list—whether due to client-side timeouts or
>> the server-side bug [1].
> How does the client timeout TEST_STATEID - are you mounting with 'soft'?
I have never actually encountered a timeout; on 5.10, I triggered it by
forcibly injecting a timeout error.

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6509,6 +6509,10 @@ static void nfs4_delegreturn_prepare(struct 
rpc_task *task, void *data)
                         &d_data->args.seq_args,
                         &d_data->res.seq_res,
                         task);
+
+       printk("%s force inject err\n", __func__);
+       task->tk_rpc_status = -ETIMEDOUT;
+       rpc_exit(task, -ETIMEDOUT);
  }
> We should find the server-side bug and fix it rather than write code to
> paper over it.  I do think the synchronization of state here is a bit
> fragile and wish the protocol had a generation, sequence, or marker for
> setting SEQ4_STATUS_ bits..
I was able to reproduce a server-side bug by adding delays (without using
fault injection). The server-side bug is detailed in reference [1].
I would appreciate it if you could provide any suggestions for 
modifications.
>>> Should we instead just administratively evict the client since it's
>>> clearly not behaving right in this case?
>> Thanks for the suggestion. While administratively evicting the client would
>> certainly resolve the immediate delegation issue, I'm concerned that approach
>> might be a bit heavy-handed.
>> The problematic behavior seems isolated to a single delegation. Meanwhile,
>> the client itself likely has numerous other open files and active state on
>> the server. Forcing a complete client reconnect would tear down all that
>> state, which could cause significant application disruption and be perceived
>> as a service outage from the client's perspective.
>>
>> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
> ^^ in this thread you reference v5.10 - there was a knfsd fix for a
> cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes to fix
> problems and optimize the client walk of delegations since then.  Jeff
> pointed out that there have been fixes in these areas.  Are you finding this
> problem still with all those fixes included?
As shown in [1], the problem can be reproduced at master(commit 
b320789d6883),
I think all those fixes are included.

Thanks,
Lingfeng

>
> Ben
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 12:10   ` Li Lingfeng
  2025-09-02 12:43     ` Benjamin Coddington
@ 2025-09-02 13:40     ` Jeff Layton
  2025-09-02 14:21       ` Li Lingfeng
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2025-09-02 13:40 UTC (permalink / raw)
  To: Li Lingfeng, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel
  Cc: yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng, zhangjian496

On Tue, 2025-09-02 at 20:10 +0800, Li Lingfeng wrote:
> Hi,
> 
> 在 2025/9/2 18:21, Jeff Layton 写道:
> > On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote:
> > > When file access conflicts occur between clients, the server recalls
> > > delegations. If the client holding delegation fails to return it after
> > > a recall, nfs4_laundromat adds the delegation to cl_revoked list.
> > > This causes subsequent SEQUENCE operations to set the
> > > SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
> > > validate all delegations and return the revoked one.
> > > 
> > > However, if the client fails to return the delegation due to a timeout
> > > after receiving the recall or a server bug, the delegation remains in the
> > > server's cl_revoked list. The client marks it revoked and won't find it
> > > upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
> > > the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
> > > client repeatedly tests all delegations, severely impacting performance
> > > when numerous delegations exist.
> > > 
> > It is a performance impact, but I don't get the "loop" here. Are you
> > saying that this problem compounds itself? That testing all delegations
> > causes others to be revoked?
> The delegation will be removed from server->delegations in client after
> NFSPROC4_CLNT_DELEGRETURN is performed.
> nfs4_delegreturn_done
>   nfs_delegation_mark_returned
>    nfs_detach_delegation
>     nfs_detach_delegation_locked
>      list_del_rcu // remove delegation from server->delegations
> 
>  From the client's perspective, the delegation has been returned, but on
> the server side, it is left in the cl_revoked list.[1].
> 
> Subsequently, every sequence from the client will be flagged with
> SEQ4_STATUS_RECALLABLE_STATE_REVOKED as long as cl_revoked remains 
> non-empty.
> nfsd4_sequence
>   seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED
> 
> When the client detects SEQ4_STATUS_RECALLABLE_STATE_REVOKED while
> processing a sequence result, it sets NFS_DELEGATION_TEST_EXPIRED for all
> delegations and wakes up the state manager for handling.
> nfs41_sequence_done
>   nfs41_sequence_process
>    nfs41_handle_sequence_flag_errors
>     nfs41_handle_recallable_state_revoked
>      nfs_test_expired_all_delegations
>       nfs_mark_test_expired_all_delegations
>        nfs_delegation_mark_test_expired_server
>         // set NFS_DELEGATION_TEST_EXPIRED for delegations in 
> server->delegations
>       nfs4_schedule_state_manager
> 
> The state manager tests all delegations except the one that was returned,
> as it is no longer in server->delegations.
> nfs4_state_manager
>   nfs4_begin_drain_session
>   nfs_reap_expired_delegations
>    nfs_server_reap_expired_delegations
>     // test delegations in server->delegations
> 
> There may be a loop:
> 1) send a sequence(client)
> 2) return SEQ4_STATUS_RECALLABLE_STATE_REVOKED(server)
> 3) set NFS_DELEGATION_TEST_EXPIRED for all delegations(client)
> 4) test all delegations by state manager(client)
> 5) send another sequence(client)
> 
> The state manager's traversal of delegations occurs between
> nfs4_begin_drain_session and nfs4_end_drain_session. Non-privileged requests
> will be blocked because the NFS4_SLOT_TBL_DRAINING flag is set. If there are
> many delegations to traverse, this blocking time can be relatively long.
> > > Since abnormal delegations are removed from flc_lease via nfs4_laundromat
> > > --> revoke_delegation --> destroy_unhashed_deleg -->
> > > nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
> > > requests indefinitely, retaining such a delegation on the server is
> > > unnecessary.
> > > 
> > > Reported-by: Zhang Jian <zhangjian496@huawei.com>
> > > Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
> > > Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
> > > Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> > > ---
> > >   fs/nfsd/nfs4state.c | 11 +++++++++++
> > >   1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 88c347957da5..aa65a685dbb9 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >   	int buflen;
> > >   	struct net *net = SVC_NET(rqstp);
> > >   	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > +	struct list_head *pos, *next;
> > > +	struct nfs4_delegation *dp;
> > >   
> > >   	if (resp->opcnt != 1)
> > >   		return nfserr_sequence_pos;
> > > @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >   	default:
> > >   		seq->status_flags = 0;
> > >   	}
> > > +	if (!list_empty(&clp->cl_revoked)) {
> > > +		list_for_each_safe(pos, next, &clp->cl_revoked) {
> > > +			dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
> > > +			if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
> > > +				list_del_init(&dp->dl_recall_lru);
> > > +				nfs4_put_stid(&dp->dl_stid);
> > > +			}
> > > +		}

FYI: this list is protected by the clp->cl_lock. You need to hold it to
do this list walk.

> > > +	}
> > >   	if (!list_empty(&clp->cl_revoked))
> > >   		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
> > >   	if (atomic_read(&clp->cl_admin_revoked))
> > This seems like a violation of the spec. AIUI, the server is required
> > to hang onto a record of the delegation until the client does the
> > TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like
> > this seems wrong.
> Our expected outcome was that the client would release the abnormal
> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
> However, this problematic delegation is no longer present in the
> client's server->delegations list—whether due to client-side timeouts or
> the server-side bug [1].
> > 
> > Should we instead just administratively evict the client since it's
> > clearly not behaving right in this case?
> Thanks for the suggestion. While administratively evicting the client would
> certainly resolve the immediate delegation issue, I'm concerned that 
> approach
> might be a bit heavy-handed.
> The problematic behavior seems isolated to a single delegation. Meanwhile,
> the client itself likely has numerous other open files and active state on
> the server. Forcing a complete client reconnect would tear down all that
> state, which could cause significant application disruption and be perceived
> as a service outage from the client's perspective.
> 
> [1] 
> https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
> 
> Thanks,
> Lingfeng

Ok, I get the problem, but I still disagree with the solution. I don't
think we can just time these things out. Ideally we'd close the race
window, but the sc_status field is protected by the global state_lock
and I don't think we want to take it in revoke_delegation.

The best solution I can see is to have destroy_delegation()
unconditionally set SC_STATUS_CLOSED, and then you can do the list walk
above, but checking for that flag instead of testing for a timeout.

I'm still not thrilled with this solution though. It makes SEQUENCE a
bit more heavyweight than I'd like. I'm starting to think that we need
to rework the overall delegation locking, but that's an ugly problem to
tackle. 
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 13:40     ` Jeff Layton
@ 2025-09-02 14:21       ` Li Lingfeng
  2025-09-02 14:29         ` Jeff Layton
  0 siblings, 1 reply; 13+ messages in thread
From: Li Lingfeng @ 2025-09-02 14:21 UTC (permalink / raw)
  To: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel
  Cc: yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng, zhangjian496


在 2025/9/2 21:40, Jeff Layton 写道:
> On Tue, 2025-09-02 at 20:10 +0800, Li Lingfeng wrote:
>> Hi,
>>
>> 在 2025/9/2 18:21, Jeff Layton 写道:
>>> On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote:
>>>> When file access conflicts occur between clients, the server recalls
>>>> delegations. If the client holding delegation fails to return it after
>>>> a recall, nfs4_laundromat adds the delegation to cl_revoked list.
>>>> This causes subsequent SEQUENCE operations to set the
>>>> SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
>>>> validate all delegations and return the revoked one.
>>>>
>>>> However, if the client fails to return the delegation due to a timeout
>>>> after receiving the recall or a server bug, the delegation remains in the
>>>> server's cl_revoked list. The client marks it revoked and won't find it
>>>> upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
>>>> the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
>>>> client repeatedly tests all delegations, severely impacting performance
>>>> when numerous delegations exist.
>>>>
>>> It is a performance impact, but I don't get the "loop" here. Are you
>>> saying that this problem compounds itself? That testing all delegations
>>> causes others to be revoked?
>> The delegation will be removed from server->delegations in client after
>> NFSPROC4_CLNT_DELEGRETURN is performed.
>> nfs4_delegreturn_done
>>    nfs_delegation_mark_returned
>>     nfs_detach_delegation
>>      nfs_detach_delegation_locked
>>       list_del_rcu // remove delegation from server->delegations
>>
>>   From the client's perspective, the delegation has been returned, but on
>> the server side, it is left in the cl_revoked list.[1].
>>
>> Subsequently, every sequence from the client will be flagged with
>> SEQ4_STATUS_RECALLABLE_STATE_REVOKED as long as cl_revoked remains
>> non-empty.
>> nfsd4_sequence
>>    seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED
>>
>> When the client detects SEQ4_STATUS_RECALLABLE_STATE_REVOKED while
>> processing a sequence result, it sets NFS_DELEGATION_TEST_EXPIRED for all
>> delegations and wakes up the state manager for handling.
>> nfs41_sequence_done
>>    nfs41_sequence_process
>>     nfs41_handle_sequence_flag_errors
>>      nfs41_handle_recallable_state_revoked
>>       nfs_test_expired_all_delegations
>>        nfs_mark_test_expired_all_delegations
>>         nfs_delegation_mark_test_expired_server
>>          // set NFS_DELEGATION_TEST_EXPIRED for delegations in
>> server->delegations
>>        nfs4_schedule_state_manager
>>
>> The state manager tests all delegations except the one that was returned,
>> as it is no longer in server->delegations.
>> nfs4_state_manager
>>    nfs4_begin_drain_session
>>    nfs_reap_expired_delegations
>>     nfs_server_reap_expired_delegations
>>      // test delegations in server->delegations
>>
>> There may be a loop:
>> 1) send a sequence(client)
>> 2) return SEQ4_STATUS_RECALLABLE_STATE_REVOKED(server)
>> 3) set NFS_DELEGATION_TEST_EXPIRED for all delegations(client)
>> 4) test all delegations by state manager(client)
>> 5) send another sequence(client)
>>
>> The state manager's traversal of delegations occurs between
>> nfs4_begin_drain_session and nfs4_end_drain_session. Non-privileged requests
>> will be blocked because the NFS4_SLOT_TBL_DRAINING flag is set. If there are
>> many delegations to traverse, this blocking time can be relatively long.
>>>> Since abnormal delegations are removed from flc_lease via nfs4_laundromat
>>>> --> revoke_delegation --> destroy_unhashed_deleg -->
>>>> nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
>>>> requests indefinitely, retaining such a delegation on the server is
>>>> unnecessary.
>>>>
>>>> Reported-by: Zhang Jian <zhangjian496@huawei.com>
>>>> Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
>>>> Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
>>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>> ---
>>>>    fs/nfsd/nfs4state.c | 11 +++++++++++
>>>>    1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index 88c347957da5..aa65a685dbb9 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>    	int buflen;
>>>>    	struct net *net = SVC_NET(rqstp);
>>>>    	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>> +	struct list_head *pos, *next;
>>>> +	struct nfs4_delegation *dp;
>>>>    
>>>>    	if (resp->opcnt != 1)
>>>>    		return nfserr_sequence_pos;
>>>> @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>    	default:
>>>>    		seq->status_flags = 0;
>>>>    	}
>>>> +	if (!list_empty(&clp->cl_revoked)) {
>>>> +		list_for_each_safe(pos, next, &clp->cl_revoked) {
>>>> +			dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
>>>> +			if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
>>>> +				list_del_init(&dp->dl_recall_lru);
>>>> +				nfs4_put_stid(&dp->dl_stid);
>>>> +			}
>>>> +		}
> FYI: this list is protected by the clp->cl_lock. You need to hold it to
> do this list walk.
>
>>>> +	}
>>>>    	if (!list_empty(&clp->cl_revoked))
>>>>    		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
>>>>    	if (atomic_read(&clp->cl_admin_revoked))
>>> This seems like a violation of the spec. AIUI, the server is required
>>> to hang onto a record of the delegation until the client does the
>>> TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like
>>> this seems wrong.
>> Our expected outcome was that the client would release the abnormal
>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>> However, this problematic delegation is no longer present in the
>> client's server->delegations list—whether due to client-side timeouts or
>> the server-side bug [1].
>>> Should we instead just administratively evict the client since it's
>>> clearly not behaving right in this case?
>> Thanks for the suggestion. While administratively evicting the client would
>> certainly resolve the immediate delegation issue, I'm concerned that
>> approach
>> might be a bit heavy-handed.
>> The problematic behavior seems isolated to a single delegation. Meanwhile,
>> the client itself likely has numerous other open files and active state on
>> the server. Forcing a complete client reconnect would tear down all that
>> state, which could cause significant application disruption and be perceived
>> as a service outage from the client's perspective.
>>
>> [1]
>> https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
>>
>> Thanks,
>> Lingfeng
> Ok, I get the problem, but I still disagree with the solution. I don't
> think we can just time these things out. Ideally we'd close the race
> window, but the sc_status field is protected by the global state_lock
> and I don't think we want to take it in revoke_delegation.
>
> The best solution I can see is to have destroy_delegation()
> unconditionally set SC_STATUS_CLOSED, and then you can do the list walk
> above, but checking for that flag instead of testing for a timeout.
This might potentially affect the normal TEST_STATEID/FREE_STATEID flow,
as nfsd4_free_stateid() branches differently based on whether
SC_STATUS_CLOSED is set. Alternatively, I was wondering if you could
suggest a workaround to avoid this issue?

Thanks,
Lingfeng

>
> I'm still not thrilled with this solution though. It makes SEQUENCE a
> bit more heavyweight than I'd like. I'm starting to think that we need
> to rework the overall delegation locking, but that's an ugly problem to
> tackle.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 14:21       ` Li Lingfeng
@ 2025-09-02 14:29         ` Jeff Layton
  2025-09-03  1:34           ` Li Lingfeng
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2025-09-02 14:29 UTC (permalink / raw)
  To: Li Lingfeng, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel
  Cc: yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng, zhangjian496

On Tue, 2025-09-02 at 22:21 +0800, Li Lingfeng wrote:
> 在 2025/9/2 21:40, Jeff Layton 写道:
> > On Tue, 2025-09-02 at 20:10 +0800, Li Lingfeng wrote:
> > > Hi,
> > > 
> > > 在 2025/9/2 18:21, Jeff Layton 写道:
> > > > On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote:
> > > > > When file access conflicts occur between clients, the server recalls
> > > > > delegations. If the client holding delegation fails to return it after
> > > > > a recall, nfs4_laundromat adds the delegation to cl_revoked list.
> > > > > This causes subsequent SEQUENCE operations to set the
> > > > > SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
> > > > > validate all delegations and return the revoked one.
> > > > > 
> > > > > However, if the client fails to return the delegation due to a timeout
> > > > > after receiving the recall or a server bug, the delegation remains in the
> > > > > server's cl_revoked list. The client marks it revoked and won't find it
> > > > > upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
> > > > > the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
> > > > > client repeatedly tests all delegations, severely impacting performance
> > > > > when numerous delegations exist.
> > > > > 
> > > > It is a performance impact, but I don't get the "loop" here. Are you
> > > > saying that this problem compounds itself? That testing all delegations
> > > > causes others to be revoked?
> > > The delegation will be removed from server->delegations in client after
> > > NFSPROC4_CLNT_DELEGRETURN is performed.
> > > nfs4_delegreturn_done
> > >    nfs_delegation_mark_returned
> > >     nfs_detach_delegation
> > >      nfs_detach_delegation_locked
> > >       list_del_rcu // remove delegation from server->delegations
> > > 
> > >   From the client's perspective, the delegation has been returned, but on
> > > the server side, it is left in the cl_revoked list.[1].
> > > 
> > > Subsequently, every sequence from the client will be flagged with
> > > SEQ4_STATUS_RECALLABLE_STATE_REVOKED as long as cl_revoked remains
> > > non-empty.
> > > nfsd4_sequence
> > >    seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED
> > > 
> > > When the client detects SEQ4_STATUS_RECALLABLE_STATE_REVOKED while
> > > processing a sequence result, it sets NFS_DELEGATION_TEST_EXPIRED for all
> > > delegations and wakes up the state manager for handling.
> > > nfs41_sequence_done
> > >    nfs41_sequence_process
> > >     nfs41_handle_sequence_flag_errors
> > >      nfs41_handle_recallable_state_revoked
> > >       nfs_test_expired_all_delegations
> > >        nfs_mark_test_expired_all_delegations
> > >         nfs_delegation_mark_test_expired_server
> > >          // set NFS_DELEGATION_TEST_EXPIRED for delegations in
> > > server->delegations
> > >        nfs4_schedule_state_manager
> > > 
> > > The state manager tests all delegations except the one that was returned,
> > > as it is no longer in server->delegations.
> > > nfs4_state_manager
> > >    nfs4_begin_drain_session
> > >    nfs_reap_expired_delegations
> > >     nfs_server_reap_expired_delegations
> > >      // test delegations in server->delegations
> > > 
> > > There may be a loop:
> > > 1) send a sequence(client)
> > > 2) return SEQ4_STATUS_RECALLABLE_STATE_REVOKED(server)
> > > 3) set NFS_DELEGATION_TEST_EXPIRED for all delegations(client)
> > > 4) test all delegations by state manager(client)
> > > 5) send another sequence(client)
> > > 
> > > The state manager's traversal of delegations occurs between
> > > nfs4_begin_drain_session and nfs4_end_drain_session. Non-privileged requests
> > > will be blocked because the NFS4_SLOT_TBL_DRAINING flag is set. If there are
> > > many delegations to traverse, this blocking time can be relatively long.
> > > > > Since abnormal delegations are removed from flc_lease via nfs4_laundromat
> > > > > --> revoke_delegation --> destroy_unhashed_deleg -->
> > > > > nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
> > > > > requests indefinitely, retaining such a delegation on the server is
> > > > > unnecessary.
> > > > > 
> > > > > Reported-by: Zhang Jian <zhangjian496@huawei.com>
> > > > > Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
> > > > > Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
> > > > > Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> > > > > ---
> > > > >    fs/nfsd/nfs4state.c | 11 +++++++++++
> > > > >    1 file changed, 11 insertions(+)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index 88c347957da5..aa65a685dbb9 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >    	int buflen;
> > > > >    	struct net *net = SVC_NET(rqstp);
> > > > >    	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > > > +	struct list_head *pos, *next;
> > > > > +	struct nfs4_delegation *dp;
> > > > >    
> > > > >    	if (resp->opcnt != 1)
> > > > >    		return nfserr_sequence_pos;
> > > > > @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >    	default:
> > > > >    		seq->status_flags = 0;
> > > > >    	}
> > > > > +	if (!list_empty(&clp->cl_revoked)) {
> > > > > +		list_for_each_safe(pos, next, &clp->cl_revoked) {
> > > > > +			dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
> > > > > +			if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
> > > > > +				list_del_init(&dp->dl_recall_lru);
> > > > > +				nfs4_put_stid(&dp->dl_stid);
> > > > > +			}
> > > > > +		}
> > FYI: this list is protected by the clp->cl_lock. You need to hold it to
> > do this list walk.
> > 
> > > > > +	}
> > > > >    	if (!list_empty(&clp->cl_revoked))
> > > > >    		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
> > > > >    	if (atomic_read(&clp->cl_admin_revoked))
> > > > This seems like a violation of the spec. AIUI, the server is required
> > > > to hang onto a record of the delegation until the client does the
> > > > TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like
> > > > this seems wrong.
> > > Our expected outcome was that the client would release the abnormal
> > > delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
> > > However, this problematic delegation is no longer present in the
> > > client's server->delegations list—whether due to client-side timeouts or
> > > the server-side bug [1].
> > > > Should we instead just administratively evict the client since it's
> > > > clearly not behaving right in this case?
> > > Thanks for the suggestion. While administratively evicting the client would
> > > certainly resolve the immediate delegation issue, I'm concerned that
> > > approach
> > > might be a bit heavy-handed.
> > > The problematic behavior seems isolated to a single delegation. Meanwhile,
> > > the client itself likely has numerous other open files and active state on
> > > the server. Forcing a complete client reconnect would tear down all that
> > > state, which could cause significant application disruption and be perceived
> > > as a service outage from the client's perspective.
> > > 
> > > [1]
> > > https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
> > > 
> > > Thanks,
> > > Lingfeng
> > Ok, I get the problem, but I still disagree with the solution. I don't
> > think we can just time these things out. Ideally we'd close the race
> > window, but the sc_status field is protected by the global state_lock
> > and I don't think we want to take it in revoke_delegation.
> > 
> > The best solution I can see is to have destroy_delegation()
> > unconditionally set SC_STATUS_CLOSED, and then you can do the list walk
> > above, but checking for that flag instead of testing for a timeout.
> This might potentially affect the normal TEST_STATEID/FREE_STATEID flow,
> as nfsd4_free_stateid() branches differently based on whether
> SC_STATUS_CLOSED is set. Alternatively, I was wondering if you could
> suggest a workaround to avoid this issue?
> 

I can't think of any workarounds other than turning off delegations
altogether.

I guess your concern is that TEST_STATEID and FREE_STATEID would return
BAD_STATEID in this case, even though the entry was still (technically)
on the cl_revoked list? That seems like correct behavior. The client
did send DELEGRETURN, after all.

> > 
> > I'm still not thrilled with this solution though. It makes SEQUENCE a
> > bit more heavyweight than I'd like. I'm starting to think that we need
> > to rework the overall delegation locking, but that's an ugly problem to
> > tackle.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 14:29         ` Jeff Layton
@ 2025-09-03  1:34           ` Li Lingfeng
  0 siblings, 0 replies; 13+ messages in thread
From: Li Lingfeng @ 2025-09-03  1:34 UTC (permalink / raw)
  To: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel
  Cc: yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng, zhangjian496


在 2025/9/2 22:29, Jeff Layton 写道:
> On Tue, 2025-09-02 at 22:21 +0800, Li Lingfeng wrote:
>> 在 2025/9/2 21:40, Jeff Layton 写道:
>>> On Tue, 2025-09-02 at 20:10 +0800, Li Lingfeng wrote:
>>>> Hi,
>>>>
>>>> 在 2025/9/2 18:21, Jeff Layton 写道:
>>>>> On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote:
>>>>>> When file access conflicts occur between clients, the server recalls
>>>>>> delegations. If the client holding delegation fails to return it after
>>>>>> a recall, nfs4_laundromat adds the delegation to cl_revoked list.
>>>>>> This causes subsequent SEQUENCE operations to set the
>>>>>> SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
>>>>>> validate all delegations and return the revoked one.
>>>>>>
>>>>>> However, if the client fails to return the delegation due to a timeout
>>>>>> after receiving the recall or a server bug, the delegation remains in the
>>>>>> server's cl_revoked list. The client marks it revoked and won't find it
>>>>>> upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
>>>>>> the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
>>>>>> client repeatedly tests all delegations, severely impacting performance
>>>>>> when numerous delegations exist.
>>>>>>
>>>>> It is a performance impact, but I don't get the "loop" here. Are you
>>>>> saying that this problem compounds itself? That testing all delegations
>>>>> causes others to be revoked?
>>>> The delegation will be removed from server->delegations in client after
>>>> NFSPROC4_CLNT_DELEGRETURN is performed.
>>>> nfs4_delegreturn_done
>>>>     nfs_delegation_mark_returned
>>>>      nfs_detach_delegation
>>>>       nfs_detach_delegation_locked
>>>>        list_del_rcu // remove delegation from server->delegations
>>>>
>>>>    From the client's perspective, the delegation has been returned, but on
>>>> the server side, it is left in the cl_revoked list.[1].
>>>>
>>>> Subsequently, every sequence from the client will be flagged with
>>>> SEQ4_STATUS_RECALLABLE_STATE_REVOKED as long as cl_revoked remains
>>>> non-empty.
>>>> nfsd4_sequence
>>>>     seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED
>>>>
>>>> When the client detects SEQ4_STATUS_RECALLABLE_STATE_REVOKED while
>>>> processing a sequence result, it sets NFS_DELEGATION_TEST_EXPIRED for all
>>>> delegations and wakes up the state manager for handling.
>>>> nfs41_sequence_done
>>>>     nfs41_sequence_process
>>>>      nfs41_handle_sequence_flag_errors
>>>>       nfs41_handle_recallable_state_revoked
>>>>        nfs_test_expired_all_delegations
>>>>         nfs_mark_test_expired_all_delegations
>>>>          nfs_delegation_mark_test_expired_server
>>>>           // set NFS_DELEGATION_TEST_EXPIRED for delegations in
>>>> server->delegations
>>>>         nfs4_schedule_state_manager
>>>>
>>>> The state manager tests all delegations except the one that was returned,
>>>> as it is no longer in server->delegations.
>>>> nfs4_state_manager
>>>>     nfs4_begin_drain_session
>>>>     nfs_reap_expired_delegations
>>>>      nfs_server_reap_expired_delegations
>>>>       // test delegations in server->delegations
>>>>
>>>> There may be a loop:
>>>> 1) send a sequence(client)
>>>> 2) return SEQ4_STATUS_RECALLABLE_STATE_REVOKED(server)
>>>> 3) set NFS_DELEGATION_TEST_EXPIRED for all delegations(client)
>>>> 4) test all delegations by state manager(client)
>>>> 5) send another sequence(client)
>>>>
>>>> The state manager's traversal of delegations occurs between
>>>> nfs4_begin_drain_session and nfs4_end_drain_session. Non-privileged requests
>>>> will be blocked because the NFS4_SLOT_TBL_DRAINING flag is set. If there are
>>>> many delegations to traverse, this blocking time can be relatively long.
>>>>>> Since abnormal delegations are removed from flc_lease via nfs4_laundromat
>>>>>> --> revoke_delegation --> destroy_unhashed_deleg -->
>>>>>> nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
>>>>>> requests indefinitely, retaining such a delegation on the server is
>>>>>> unnecessary.
>>>>>>
>>>>>> Reported-by: Zhang Jian <zhangjian496@huawei.com>
>>>>>> Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
>>>>>> Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
>>>>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>>>> ---
>>>>>>     fs/nfsd/nfs4state.c | 11 +++++++++++
>>>>>>     1 file changed, 11 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>>>> index 88c347957da5..aa65a685dbb9 100644
>>>>>> --- a/fs/nfsd/nfs4state.c
>>>>>> +++ b/fs/nfsd/nfs4state.c
>>>>>> @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>     	int buflen;
>>>>>>     	struct net *net = SVC_NET(rqstp);
>>>>>>     	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>>>> +	struct list_head *pos, *next;
>>>>>> +	struct nfs4_delegation *dp;
>>>>>>     
>>>>>>     	if (resp->opcnt != 1)
>>>>>>     		return nfserr_sequence_pos;
>>>>>> @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>     	default:
>>>>>>     		seq->status_flags = 0;
>>>>>>     	}
>>>>>> +	if (!list_empty(&clp->cl_revoked)) {
>>>>>> +		list_for_each_safe(pos, next, &clp->cl_revoked) {
>>>>>> +			dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
>>>>>> +			if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
>>>>>> +				list_del_init(&dp->dl_recall_lru);
>>>>>> +				nfs4_put_stid(&dp->dl_stid);
>>>>>> +			}
>>>>>> +		}
>>> FYI: this list is protected by the clp->cl_lock. You need to hold it to
>>> do this list walk.
>>>
>>>>>> +	}
>>>>>>     	if (!list_empty(&clp->cl_revoked))
>>>>>>     		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
>>>>>>     	if (atomic_read(&clp->cl_admin_revoked))
>>>>> This seems like a violation of the spec. AIUI, the server is required
>>>>> to hang onto a record of the delegation until the client does the
>>>>> TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like
>>>>> this seems wrong.
>>>> Our expected outcome was that the client would release the abnormal
>>>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>>>> However, this problematic delegation is no longer present in the
>>>> client's server->delegations list—whether due to client-side timeouts or
>>>> the server-side bug [1].
>>>>> Should we instead just administratively evict the client since it's
>>>>> clearly not behaving right in this case?
>>>> Thanks for the suggestion. While administratively evicting the client would
>>>> certainly resolve the immediate delegation issue, I'm concerned that
>>>> approach
>>>> might be a bit heavy-handed.
>>>> The problematic behavior seems isolated to a single delegation. Meanwhile,
>>>> the client itself likely has numerous other open files and active state on
>>>> the server. Forcing a complete client reconnect would tear down all that
>>>> state, which could cause significant application disruption and be perceived
>>>> as a service outage from the client's perspective.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
>>>>
>>>> Thanks,
>>>> Lingfeng
>>> Ok, I get the problem, but I still disagree with the solution. I don't
>>> think we can just time these things out. Ideally we'd close the race
>>> window, but the sc_status field is protected by the global state_lock
>>> and I don't think we want to take it in revoke_delegation.
>>>
>>> The best solution I can see is to have destroy_delegation()
>>> unconditionally set SC_STATUS_CLOSED, and then you can do the list walk
>>> above, but checking for that flag instead of testing for a timeout.
>> This might potentially affect the normal TEST_STATEID/FREE_STATEID flow,
>> as nfsd4_free_stateid() branches differently based on whether
>> SC_STATUS_CLOSED is set. Alternatively, I was wondering if you could
>> suggest a workaround to avoid this issue?
>>
> I can't think of any workarounds other than turning off delegations
> altogether.
>
> I guess your concern is that TEST_STATEID and FREE_STATEID would return
> BAD_STATEID in this case, even though the entry was still (technically)
> on the cl_revoked list? That seems like correct behavior. The client
> did send DELEGRETURN, after all.
Sorry for the noice. I was mistaken.
I was originally concerned that after setting SC_STATUS_CLOSED via
nfsd4_delegreturn --> destroy_delegation, if the client initiated a
FREE_STATEID, the server would skip the SC_TYPE_DELEG branch in
nfsd4_free_stateid() that would normally be executed.
However, in reality, after the client completes DELEGRETURN, it will not
perform FREE_STATEID for this delegation again since the delegation is
no longer in server->delegations.

Thanks,
Lingfeng

>
>>> I'm still not thrilled with this solution though. It makes SEQUENCE a
>>> bit more heavyweight than I'd like. I'm starting to think that we need
>>> to rework the overall delegation locking, but that's an ugly problem to
>>> tackle.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-02 12:43     ` Benjamin Coddington
  2025-09-02 13:08       ` Li Lingfeng
@ 2025-09-03  3:46       ` zhangjian (CG)
  2025-09-03  6:45         ` Li Lingfeng
  1 sibling, 1 reply; 13+ messages in thread
From: zhangjian (CG) @ 2025-09-03  3:46 UTC (permalink / raw)
  To: Benjamin Coddington, Li Lingfeng
  Cc: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng

Hello every experts.

If we can see all delegations on hard-mounted nfs client, which are also
on server cl_revoked list, changed from
NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED|NFS_DELEGATION_TEST_EXPIRED
to NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED, can we give
some hypothesis on this problem ?

By the way, this problem can be cover over by decreasing file count on
server.

Thanks,
zhangjian

On 2025/9/2 20:43, Benjamin Coddington wrote:
> On 2 Sep 2025, at 8:10, Li Lingfeng wrote:
> 
>> Our expected outcome was that the client would release the abnormal
>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>> However, this problematic delegation is no longer present in the
>> client's server->delegations list—whether due to client-side timeouts or
>> the server-side bug [1].
> 
> How does the client timeout TEST_STATEID - are you mounting with 'soft'?
> 
> We should find the server-side bug and fix it rather than write code to
> paper over it.  I do think the synchronization of state here is a bit
> fragile and wish the protocol had a generation, sequence, or marker for
> setting SEQ4_STATUS_ bits..
> 
>>>
>>> Should we instead just administratively evict the client since it's
>>> clearly not behaving right in this case?
>> Thanks for the suggestion. While administratively evicting the client would
>> certainly resolve the immediate delegation issue, I'm concerned that approach
>> might be a bit heavy-handed.
>> The problematic behavior seems isolated to a single delegation. Meanwhile,
>> the client itself likely has numerous other open files and active state on
>> the server. Forcing a complete client reconnect would tear down all that
>> state, which could cause significant application disruption and be perceived
>> as a service outage from the client's perspective.
>>
>> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
> 
> ^^ in this thread you reference v5.10 - there was a knfsd fix for a
> cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes to fix
> problems and optimize the client walk of delegations since then.  Jeff
> pointed out that there have been fixes in these areas.  Are you finding this
> problem still with all those fixes included?
> 
> Ben
> 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-03  3:46       ` zhangjian (CG)
@ 2025-09-03  6:45         ` Li Lingfeng
  2025-09-03 10:06           ` zhangjian (CG)
  0 siblings, 1 reply; 13+ messages in thread
From: Li Lingfeng @ 2025-09-03  6:45 UTC (permalink / raw)
  To: zhangjian (CG), Benjamin Coddington, Li Lingfeng
  Cc: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun

Hi,

在 2025/9/3 11:46, zhangjian (CG) 写道:
> Hello every experts.
>
> If we can see all delegations on hard-mounted nfs client, which are also
> on server cl_revoked list, changed from
> NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED|NFS_DELEGATION_TEST_EXPIRED
> to NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED, can we give
> some hypothesis on this problem ?
>
> By the way, this problem can be cover over by decreasing file count on
> server.
>
> Thanks,
> zhangjian
I think NFS_DELEGATION_TEST_EXPIRED is cleared as follows:
nfs4_state_manager
  nfs4_do_reclaim
   nfs4_reclaim_open_state
    __nfs4_reclaim_open_state // get nfs4_state from sp->so_states
     nfs41_open_expired // status = ops->recover_open
      nfs41_check_delegation_stateid
       test_and_clear_bit // NFS_DELEGATION_TEST_EXPIRED
After the bug in [1] is triggered, although the delegation is no longer on
server->delegations, it can still be obtained by traversing sp->so_states.
However, I cannot find the connection between the number of files on the
server and this issue.

Thanks,
Lingfeng

>
> On 2025/9/2 20:43, Benjamin Coddington wrote:
>> On 2 Sep 2025, at 8:10, Li Lingfeng wrote:
>>
>>> Our expected outcome was that the client would release the abnormal
>>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>>> However, this problematic delegation is no longer present in the
>>> client's server->delegations list—whether due to client-side timeouts or
>>> the server-side bug [1].
>> How does the client timeout TEST_STATEID - are you mounting with 'soft'?
>>
>> We should find the server-side bug and fix it rather than write code to
>> paper over it.  I do think the synchronization of state here is a bit
>> fragile and wish the protocol had a generation, sequence, or marker for
>> setting SEQ4_STATUS_ bits..
>>
>>>> Should we instead just administratively evict the client since it's
>>>> clearly not behaving right in this case?
>>> Thanks for the suggestion. While administratively evicting the client would
>>> certainly resolve the immediate delegation issue, I'm concerned that approach
>>> might be a bit heavy-handed.
>>> The problematic behavior seems isolated to a single delegation. Meanwhile,
>>> the client itself likely has numerous other open files and active state on
>>> the server. Forcing a complete client reconnect would tear down all that
>>> state, which could cause significant application disruption and be perceived
>>> as a service outage from the client's perspective.
>>>
>>> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
>> ^^ in this thread you reference v5.10 - there was a knfsd fix for a
>> cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes to fix
>> problems and optimize the client walk of delegations since then.  Jeff
>> pointed out that there have been fixes in these areas.  Are you finding this
>> problem still with all those fixes included?
>>
>> Ben
>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-03  6:45         ` Li Lingfeng
@ 2025-09-03 10:06           ` zhangjian (CG)
  2025-09-03 11:40             ` Li Lingfeng
  0 siblings, 1 reply; 13+ messages in thread
From: zhangjian (CG) @ 2025-09-03 10:06 UTC (permalink / raw)
  To: Li Lingfeng, Benjamin Coddington
  Cc: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun



On 2025/9/3 14:45, Li Lingfeng wrote:
> Hi,
> 
> 在 2025/9/3 11:46, zhangjian (CG) 写道:
>> Hello every experts.
>>
>> If we can see all delegations on hard-mounted nfs client, which are also
>> on server cl_revoked list, changed from
>> NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED|
>> NFS_DELEGATION_TEST_EXPIRED
>> to NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED, can we give
>> some hypothesis on this problem ?
>>
>> By the way, this problem can be cover over by decreasing file count on
>> server.
>>
>> Thanks,
>> zhangjian
> I think NFS_DELEGATION_TEST_EXPIRED is cleared as follows:
> nfs4_state_manager
>  nfs4_do_reclaim
>   nfs4_reclaim_open_state
>    __nfs4_reclaim_open_state // get nfs4_state from sp->so_states
>     nfs41_open_expired // status = ops->recover_open
>      nfs41_check_delegation_stateid
>       test_and_clear_bit // NFS_DELEGATION_TEST_EXPIRED
> After the bug in [1] is triggered, although the delegation is no longer on
> server->delegations, it can still be obtained by traversing sp->so_states.
> However, I cannot find the connection between the number of files on the
> server and this issue.
> 
> Thanks,
> Lingfeng
> 
Thanks a lot.

NFS_DELEGATION_TEST_EXPIRED can only be set when
delegation->stateid.type != NFS4_INVALID_STATEID_TYPE. But when
NFS_DELEGATION_REVOKED is set, delegation->stateid.type will be
NFS4_INVALID_STATEID_TYPE in nfs_mark_delegation_revoked.
This implies the order could be like:
1. Deleg A is in server cl_revoked list
2. Deleg B is marked as NFS_DELEGATION_TEST_EXPIRED in client
3. Deleg B is revoked by server callback procedure and server meet [1].
deleg B is added to cl_revoked list
4. Deleg B is marked as NFS_DELEGATION_REVOKED in client

Why the first deleg A is in server cl_revoked list? Is [1] only
condition? Why this can only happen when file count is large.
I used to see 700 delegations in server but 40w+ delegations in client.
May this give some clue on the problem?
>>
>> On 2025/9/2 20:43, Benjamin Coddington wrote:
>>> On 2 Sep 2025, at 8:10, Li Lingfeng wrote:
>>>
>>>> Our expected outcome was that the client would release the abnormal
>>>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>>>> However, this problematic delegation is no longer present in the
>>>> client's server->delegations list—whether due to client-side
>>>> timeouts or
>>>> the server-side bug [1].
>>> How does the client timeout TEST_STATEID - are you mounting with 'soft'?
>>>
>>> We should find the server-side bug and fix it rather than write code to
>>> paper over it.  I do think the synchronization of state here is a bit
>>> fragile and wish the protocol had a generation, sequence, or marker for
>>> setting SEQ4_STATUS_ bits..
>>>
>>>>> Should we instead just administratively evict the client since it's
>>>>> clearly not behaving right in this case?
>>>> Thanks for the suggestion. While administratively evicting the
>>>> client would
>>>> certainly resolve the immediate delegation issue, I'm concerned that
>>>> approach
>>>> might be a bit heavy-handed.
>>>> The problematic behavior seems isolated to a single delegation.
>>>> Meanwhile,
>>>> the client itself likely has numerous other open files and active
>>>> state on
>>>> the server. Forcing a complete client reconnect would tear down all
>>>> that
>>>> state, which could cause significant application disruption and be
>>>> perceived
>>>> as a service outage from the client's perspective.
>>>>
>>>> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-
>>>> bda9e67d34a2@huawei.com/
>>> ^^ in this thread you reference v5.10 - there was a knfsd fix for a
>>> cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes to fix
>>> problems and optimize the client walk of delegations since then.  Jeff
>>> pointed out that there have been fixes in these areas.  Are you
>>> finding this
>>> problem still with all those fixes included?
>>>
>>> Ben
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] nfsd: remove long-standing revoked delegations by force
  2025-09-03 10:06           ` zhangjian (CG)
@ 2025-09-03 11:40             ` Li Lingfeng
  0 siblings, 0 replies; 13+ messages in thread
From: Li Lingfeng @ 2025-09-03 11:40 UTC (permalink / raw)
  To: zhangjian (CG), Li Lingfeng, Benjamin Coddington
  Cc: Jeff Layton, chuck.lever, neil, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun

Hi,

在 2025/9/3 18:06, zhangjian (CG) 写道:
>
> On 2025/9/3 14:45, Li Lingfeng wrote:
>> Hi,
>>
>> 在 2025/9/3 11:46, zhangjian (CG) 写道:
>>> Hello every experts.
>>>
>>> If we can see all delegations on hard-mounted nfs client, which are also
>>> on server cl_revoked list, changed from
>>> NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED|
>>> NFS_DELEGATION_TEST_EXPIRED
>>> to NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED, can we give
>>> some hypothesis on this problem ?
>>>
>>> By the way, this problem can be cover over by decreasing file count on
>>> server.
>>>
>>> Thanks,
>>> zhangjian
>> I think NFS_DELEGATION_TEST_EXPIRED is cleared as follows:
>> nfs4_state_manager
>>   nfs4_do_reclaim
>>    nfs4_reclaim_open_state
>>     __nfs4_reclaim_open_state // get nfs4_state from sp->so_states
>>      nfs41_open_expired // status = ops->recover_open
>>       nfs41_check_delegation_stateid
>>        test_and_clear_bit // NFS_DELEGATION_TEST_EXPIRED
>> After the bug in [1] is triggered, although the delegation is no longer on
>> server->delegations, it can still be obtained by traversing sp->so_states.
>> However, I cannot find the connection between the number of files on the
>> server and this issue.
>>
>> Thanks,
>> Lingfeng
>>
> Thanks a lot.
>
> NFS_DELEGATION_TEST_EXPIRED can only be set when
> delegation->stateid.type != NFS4_INVALID_STATEID_TYPE. But when
> NFS_DELEGATION_REVOKED is set, delegation->stateid.type will be
> NFS4_INVALID_STATEID_TYPE in nfs_mark_delegation_revoked.
> This implies the order could be like:
> 1. Deleg A is in server cl_revoked list
> 2. Deleg B is marked as NFS_DELEGATION_TEST_EXPIRED in client
> 3. Deleg B is revoked by server callback procedure and server meet [1].
> deleg B is added to cl_revoked list
> 4. Deleg B is marked as NFS_DELEGATION_REVOKED in client
I think Deleg A was added to the server's cl_revoked list due to [1]. For
the file corresponding to Deleg B, no access conflict occurred, which
means no deleg return was triggered. Therefore, unlike Deleg A, it would
not go through the process of nfs4_delegreturn_done -->
nfs_delegation_mark_returned --> nfs_mark_delegation_revoked to be set
with NFS4_INVALID_STATEID_TYPE, and thus could be flagged with
NFS_DELEGATION_TEST_EXPIRED.
> Why the first deleg A is in server cl_revoked list? Is [1] only
> condition? Why this can only happen when file count is large.
> I used to see 700 delegations in server but 40w+ delegations in client.
> May this give some clue on the problem?
I'm afraid I cannot explain why there is such a significant discrepancy in
the number of delegations between the client and the server. I truly don't
know what is happening.

Thanks,
Lingfeng

>>> On 2025/9/2 20:43, Benjamin Coddington wrote:
>>>> On 2 Sep 2025, at 8:10, Li Lingfeng wrote:
>>>>
>>>>> Our expected outcome was that the client would release the abnormal
>>>>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>>>>> However, this problematic delegation is no longer present in the
>>>>> client's server->delegations list—whether due to client-side
>>>>> timeouts or
>>>>> the server-side bug [1].
>>>> How does the client timeout TEST_STATEID - are you mounting with 'soft'?
>>>>
>>>> We should find the server-side bug and fix it rather than write code to
>>>> paper over it.  I do think the synchronization of state here is a bit
>>>> fragile and wish the protocol had a generation, sequence, or marker for
>>>> setting SEQ4_STATUS_ bits..
>>>>
>>>>>> Should we instead just administratively evict the client since it's
>>>>>> clearly not behaving right in this case?
>>>>> Thanks for the suggestion. While administratively evicting the
>>>>> client would
>>>>> certainly resolve the immediate delegation issue, I'm concerned that
>>>>> approach
>>>>> might be a bit heavy-handed.
>>>>> The problematic behavior seems isolated to a single delegation.
>>>>> Meanwhile,
>>>>> the client itself likely has numerous other open files and active
>>>>> state on
>>>>> the server. Forcing a complete client reconnect would tear down all
>>>>> that
>>>>> state, which could cause significant application disruption and be
>>>>> perceived
>>>>> as a service outage from the client's perspective.
>>>>>
>>>>> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-
>>>>> bda9e67d34a2@huawei.com/
>>>> ^^ in this thread you reference v5.10 - there was a knfsd fix for a
>>>> cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes to fix
>>>> problems and optimize the client walk of delegations since then.  Jeff
>>>> pointed out that there have been fixes in these areas.  Are you
>>>> finding this
>>>> problem still with all those fixes included?
>>>>
>>>> Ben
>>>>
>>>>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-09-03 11:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-02  2:22 [PATCH] nfsd: remove long-standing revoked delegations by force Li Lingfeng
2025-09-02 10:21 ` Jeff Layton
2025-09-02 12:10   ` Li Lingfeng
2025-09-02 12:43     ` Benjamin Coddington
2025-09-02 13:08       ` Li Lingfeng
2025-09-03  3:46       ` zhangjian (CG)
2025-09-03  6:45         ` Li Lingfeng
2025-09-03 10:06           ` zhangjian (CG)
2025-09-03 11:40             ` Li Lingfeng
2025-09-02 13:40     ` Jeff Layton
2025-09-02 14:21       ` Li Lingfeng
2025-09-02 14:29         ` Jeff Layton
2025-09-03  1:34           ` Li Lingfeng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).