[Question]nfs: never returned delegation

public inbox for linux-nfs@vger.kernel.org

* [Question]nfs: never returned delegation
@ 2025-08-11 12:48 zhangjian (CG)
  2025-08-11 13:03 ` Trond Myklebust
  2025-08-11 13:03 ` Jeff Layton
  0 siblings, 2 replies; 15+ messages in thread
From: zhangjian (CG) @ 2025-08-11 12:48 UTC (permalink / raw)
  To: Trond Myklebust, anna; +Cc: linux-nfs, linux-kernel

We recently hit an NFS problem on 5.10. The tcpdump capture shows a very large number of TEST_STATEID requests after a non-privileged request. The client holds 400,000+ delegations (I read the delegation list from /proc/kcore).
At first I suspected that the state manager was spending most of its time in nfs_server_reap_expired_delegations. But all of the delegations are in the NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (also read from /proc/kcore).
Analyzing the NFS code, I found that if the NFSPROC4_CLNT_DELEGRETURN procedure hits ETIMEDOUT, the delegation is marked as NFS4ERR_DELEG_REVOKED and the client never tries to return it again. The NFS server then keeps the revoked delegation in clp->cl_revoked forever, so every subsequent SEQUENCE response carries the RECALLABLE_STATE_REVOKED flag, and the client sends TEST_STATEID requests for all of its non-revoked delegations.
The only way to clear this is to restart the NFS server.
I suspect that ETIMEDOUT in the NFSPROC4_CLNT_DELEGRETURN procedure is not the only case that can cause this endless stream of TEST_STATEID requests after any non-privileged request.
I would appreciate any advice from the NFS experts on this problem.


* Re: [Question]nfs: never returned delegation
  2025-08-11 12:48 [Question]nfs: never returned delegation zhangjian (CG)
@ 2025-08-11 13:03 ` Trond Myklebust
  2025-08-12  2:51   ` zhangjian (CG)
  2025-09-01  9:07   ` Li Lingfeng
  2025-08-11 13:03 ` Jeff Layton
  1 sibling, 2 replies; 15+ messages in thread
From: Trond Myklebust @ 2025-08-11 13:03 UTC (permalink / raw)
  To: zhangjian (CG), anna; +Cc: linux-nfs, linux-kernel

On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
> Recently, we meet a NFS problem in 5.10. There are so many
> test_state_id request after a non-privilaged request in tcpdump
> result. There are 40w+ delegations in client (I read the delegation
> list from /proc/kcore).
> Firstly, I think state manager cost a lot in
> nfs_server_reap_expired_delegations. But I see they are all in
> NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
> read this from /proc/kcore too). 
> I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
> meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
> never return it again. NFS server will keep the revoked delegation in
> clp->cl_revoked forever. This will result in following sequence
> response with RECALLABLE_STATE_REVOKED flag. Client will send
> test_state_id request for all non-revoked delegation.
> This can only be solved by restarting NFS server.
> I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
> the only case that cause lots of non-terminable test_state_id
> requests after any non-privilaged request. 
> Wish NFS experts give some advices on this problem.
> 

You have the following options:

   1. Don't ever use "soft" or "softerr" on the NFS client.
   2. Reboot your server every now and again.
   3. Change the server code to not bother caching revoked state. Doing
      so is rather pointless, since there is nothing a client can do
      differently when presented with NFS4ERR_DELEG_REVOKED vs.
      NFS4ERR_BAD_STATEID.
   4. Change the server code to garbage collect revoked stateids after
      a while.
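
To illustrate option 4, here is a minimal user-space sketch of age-based
garbage collection. It is not nfsd code: `struct demo_revoked`,
`demo_gc_revoked()` and the `max_age` threshold are hypothetical names
introduced only for this example, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one record on a revoked list (not the nfsd type). */
struct demo_revoked {
	struct demo_revoked *next;
	long revoked_at;	/* time of revocation, in seconds */
};

/*
 * Unlink every entry older than max_age seconds from the singly linked
 * list at *head and return how many entries were dropped. The caller
 * still owns the memory of the unlinked entries.
 */
static size_t demo_gc_revoked(struct demo_revoked **head, long now, long max_age)
{
	size_t dropped = 0;

	assert(head != NULL);
	while (*head != NULL) {
		struct demo_revoked *e = *head;

		if (now - e->revoked_at > max_age) {
			*head = e->next;	/* unlink the stale entry */
			e->next = NULL;
			dropped++;
		} else {
			head = &e->next;	/* keep it, move on */
		}
	}
	return dropped;
}
```

A periodic worker could call such a helper under the appropriate lock; how
long a revoked stateid must be kept before it may be dropped is a policy
question this sketch does not answer.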


-- 
Trond Myklebust Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


* Re: [Question]nfs: never returned delegation
  2025-08-11 12:48 [Question]nfs: never returned delegation zhangjian (CG)
  2025-08-11 13:03 ` Trond Myklebust
@ 2025-08-11 13:03 ` Jeff Layton
  2025-08-11 13:06   ` Trond Myklebust
  2025-08-12  2:45   ` zhangjian (CG)
  1 sibling, 2 replies; 15+ messages in thread
From: Jeff Layton @ 2025-08-11 13:03 UTC (permalink / raw)
  To: zhangjian (CG), Trond Myklebust, anna; +Cc: linux-nfs, linux-kernel

On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
> Recently, we meet a NFS problem in 5.10. There are so many test_state_id request after a non-privilaged request in tcpdump result. There are 40w+ delegations in client (I read the delegation list from /proc/kcore).
> Firstly, I think state manager cost a lot in nfs_server_reap_expired_delegations. But I see they are all in NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I read this from /proc/kcore too). 
> I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and never return it again. NFS server will keep the revoked delegation in clp->cl_revoked forever. This will result in following sequence response with RECALLABLE_STATE_REVOKED flag. Client will send test_state_id request for all non-revoked delegation.
> This can only be solved by restarting NFS server.
> I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not the only case that cause lots of non-terminable test_state_id requests after any non-privilaged request. 
> Wish NFS experts give some advices on this problem.
> 

What should happen is that the client should issue a TEST_STATEID and
then follow up with a FREE_STATEID once it's clear that it has been
revoked. Alternately, if the client expires then the server will purge
any state it held at that point. The server is required to keep a
record of these objects until one of those events occurs.
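
The exchange described above can be modelled with a small user-space
sketch. This is a simplified illustration, not kernel code: the
`demo_*` names and the three-state enum are assumptions made for the
example, and a real FREE_STATEID has more outcomes than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of what the server remembers about one stateid. */
enum demo_status { DEMO_VALID, DEMO_REVOKED, DEMO_GONE };

struct demo_server {
	enum demo_status status;
};

/* TEST_STATEID only reports the status; the server keeps its record. */
static enum demo_status demo_test_stateid(const struct demo_server *srv)
{
	assert(srv != NULL);
	return srv->status;
}

/* FREE_STATEID is what allows the server to forget a revoked stateid. */
static bool demo_free_stateid(struct demo_server *srv)
{
	assert(srv != NULL);
	if (srv->status != DEMO_REVOKED)
		return false;
	srv->status = DEMO_GONE;
	return true;
}

/* Client side: test first, free only once revocation is confirmed. */
static bool demo_client_reap(struct demo_server *srv)
{
	if (demo_test_stateid(srv) != DEMO_REVOKED)
		return false;
	return demo_free_stateid(srv);
}
```

In this model, repeating TEST_STATEID alone never clears the record; only
the follow-up FREE_STATEID does.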

v5.10 is pretty old, and there have been a number of fixes in this area
in both the client and server over the last several years. You may want
to try a newer kernel (or look at doing some backporting).

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [Question]nfs: never returned delegation
  2025-08-11 13:03 ` Jeff Layton
@ 2025-08-11 13:06   ` Trond Myklebust
  2025-08-12  2:45   ` zhangjian (CG)
  1 sibling, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2025-08-11 13:06 UTC (permalink / raw)
  To: Jeff Layton, zhangjian (CG), anna; +Cc: linux-nfs, linux-kernel

On Mon, 2025-08-11 at 09:03 -0400, Jeff Layton wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
> > Recently, we meet a NFS problem in 5.10. There are so many
> > test_state_id request after a non-privilaged request in tcpdump
> > result. There are 40w+ delegations in client (I read the delegation
> > list from /proc/kcore).
> > Firstly, I think state manager cost a lot in
> > nfs_server_reap_expired_delegations. But I see they are all in
> > NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED
> > (I read this from /proc/kcore too). 
> > I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
> > meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED
> > and never return it again. NFS server will keep the revoked
> > delegation in clp->cl_revoked forever. This will result in
> > following sequence response with RECALLABLE_STATE_REVOKED flag.
> > Client will send test_state_id request for all non-revoked
> > delegation.
> > This can only be solved by restarting NFS server.
> > I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
> > the only case that cause lots of non-terminable test_state_id
> > requests after any non-privilaged request. 
> > Wish NFS experts give some advices on this problem.
> > 
> 
> What should happen is that the client should issue a TEST_STATEID and
> then follow up with a FREE_STATEID once it's clear that it has been
> revoked. Alternately, if the client expires then the server will
> purge
> any state it held at that point. The server is required to keep a
> record of these objects until one of those events occurs.
> 
> v5.10 is pretty old, and there have been a number of fixes in this
> area
> in both the client and server over the last several years. You may
> want
> to try a newer kernel (or look at doing some backporting).
> 
> Cheers,

No. If you get an ETIMEDOUT, then it means you are doing soft mounts or
softerr. The client will not follow up with TEST_STATEID or
FREE_STATEID.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


* Re: [Question]nfs: never returned delegation
  2025-08-11 13:03 ` Jeff Layton
  2025-08-11 13:06   ` Trond Myklebust
@ 2025-08-12  2:45   ` zhangjian (CG)
  2026-03-06  2:46     ` zhangjian (CG)
  1 sibling, 1 reply; 15+ messages in thread
From: zhangjian (CG) @ 2025-08-12  2:45 UTC (permalink / raw)
  To: Jeff Layton, Trond Myklebust, anna; +Cc: linux-nfs, linux-kernel

Thanks a lot for the reply.

When a delegation is marked NFS4ERR_DELEG_REVOKED, its stateid is set to
NFS4_INVALID_STATEID_TYPE. After that, nfs_mark_test_expired_delegation
will not mark the delegation as NFS_DELEGATION_TEST_EXPIRED again, so
TEST_STATEID and FREE_STATEID are never sent to the server for it.
This means that if the delegation return hits ETIMEDOUT, the delegation
stays on the server's clp->cl_revoked list forever.
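
A small user-space model of the check described above, as I understand it.
The `demo_*` types, flags and functions are hypothetical stand-ins for
this example, not the kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the stateid type and delegation flags. */
enum demo_stateid_type { DEMO_INVALID_TYPE, DEMO_DELEGATION_TYPE };

#define DEMO_FLAG_REVOKED	(1u << 0)
#define DEMO_FLAG_TEST_EXPIRED	(1u << 1)

struct demo_delegation {
	enum demo_stateid_type type;
	unsigned int flags;
};

/* Marking the delegation revoked also invalidates its stateid type. */
static void demo_mark_revoked(struct demo_delegation *d)
{
	assert(d != NULL);
	d->type = DEMO_INVALID_TYPE;
	d->flags |= DEMO_FLAG_REVOKED;
}

/*
 * Queue the delegation for TEST_STATEID. A delegation whose stateid type
 * is already invalid is skipped, so it is never tested or freed.
 */
static bool demo_mark_test_expired(struct demo_delegation *d)
{
	assert(d != NULL);
	if (d->type == DEMO_INVALID_TYPE)
		return false;
	d->flags |= DEMO_FLAG_TEST_EXPIRED;
	return true;
}
```

Once `demo_mark_revoked()` has run, `demo_mark_test_expired()` always
returns false, which mirrors why no TEST_STATEID or FREE_STATEID follows.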

On 2025/8/11 21:03, Jeff Layton wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>> Recently, we meet a NFS problem in 5.10. There are so many test_state_id request after a non-privilaged request in tcpdump result. There are 40w+ delegations in client (I read the delegation list from /proc/kcore).
>> Firstly, I think state manager cost a lot in nfs_server_reap_expired_delegations. But I see they are all in NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I read this from /proc/kcore too). 
>> I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and never return it again. NFS server will keep the revoked delegation in clp->cl_revoked forever. This will result in following sequence response with RECALLABLE_STATE_REVOKED flag. Client will send test_state_id request for all non-revoked delegation.
>> This can only be solved by restarting NFS server.
>> I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not the only case that cause lots of non-terminable test_state_id requests after any non-privilaged request. 
>> Wish NFS experts give some advices on this problem.
>>
> 
> What should happen is that the client should issue a TEST_STATEID and
> then follow up with a FREE_STATEID once it's clear that it has been
> revoked. Alternately, if the client expires then the server will purge
> any state it held at that point. The server is required to keep a
> record of these objects until one of those events occurs.
> 
> v5.10 is pretty old, and there have been a number of fixes in this area
> in both the client and server over the last several years. You may want
> to try a newer kernel (or look at doing some backporting).
> 
> Cheers,



* Re: [Question]nfs: never returned delegation
  2025-08-11 13:03 ` Trond Myklebust
@ 2025-08-12  2:51   ` zhangjian (CG)
  2025-09-01  9:07   ` Li Lingfeng
  1 sibling, 0 replies; 15+ messages in thread
From: zhangjian (CG) @ 2025-08-12  2:51 UTC (permalink / raw)
  To: Trond Myklebust, anna; +Cc: linux-nfs, linux-kernel

On 2025/8/11 21:03, Trond Myklebust wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>> Recently, we meet a NFS problem in 5.10. There are so many
>> test_state_id request after a non-privilaged request in tcpdump
>> result. There are 40w+ delegations in client (I read the delegation
>> list from /proc/kcore).
>> Firstly, I think state manager cost a lot in
>> nfs_server_reap_expired_delegations. But I see they are all in
>> NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
>> read this from /proc/kcore too). 
>> I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
>> meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
>> never return it again. NFS server will keep the revoked delegation in
>> clp->cl_revoked forever. This will result in following sequence
>> response with RECALLABLE_STATE_REVOKED flag. Client will send
>> test_state_id request for all non-revoked delegation.
>> This can only be solved by restarting NFS server.
>> I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
>> the only case that cause lots of non-terminable test_state_id
>> requests after any non-privilaged request. 
>> Wish NFS experts give some advices on this problem.
>>
> 
> You have the following options:
> 
>    1. Don't ever use "soft" or "softerr" on the NFS client.
>    2. Reboot your server every now and again.
>    3. Change the server code to not bother caching revoked state. Doing
>       so is rather pointless, since there is nothing a client can do
>       differently when presented with NFS4ERR_DELEG_REVOKED vs.
>       NFS4ERR_BAD_STATEID.
>    4. Change the server code to garbage collect revoked stateids after
>       a while.
> 
>

Thanks a lot for the reply.

A timeout in the client's delegation return may not be the only case in
which the server keeps a delegation on the clp->cl_revoked list forever.
I think garbage collecting revoked stateids after a while (option 4) is
the more reasonable way to avoid this problem.



* Re: [Question]nfs: never returned delegation
  2025-08-11 13:03 ` Trond Myklebust
  2025-08-12  2:51   ` zhangjian (CG)
@ 2025-09-01  9:07   ` Li Lingfeng
  2025-09-01 11:40     ` Jeff Layton
  1 sibling, 1 reply; 15+ messages in thread
From: Li Lingfeng @ 2025-09-01  9:07 UTC (permalink / raw)
  To: Trond Myklebust, zhangjian (CG), anna
  Cc: linux-nfs, linux-kernel, Chuck Lever, Jeff Layton, NeilBrown,
	yangerkun, zhangyi (F), Hou Tao, chengzhihao1@huawei.com,
	Li Lingfeng

Hi,

On 2025/8/11 21:03, Trond Myklebust wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>> Recently, we meet a NFS problem in 5.10. There are so many
>> test_state_id request after a non-privilaged request in tcpdump
>> result. There are 40w+ delegations in client (I read the delegation
>> list from /proc/kcore).
>> Firstly, I think state manager cost a lot in
>> nfs_server_reap_expired_delegations. But I see they are all in
>> NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
>> read this from /proc/kcore too).
>> I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
>> meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
>> never return it again. NFS server will keep the revoked delegation in
>> clp->cl_revoked forever. This will result in following sequence
>> response with RECALLABLE_STATE_REVOKED flag. Client will send
>> test_state_id request for all non-revoked delegation.
>> This can only be solved by restarting NFS server.
>> I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
>> the only case that cause lots of non-terminable test_state_id
>> requests after any non-privilaged request.
>> Wish NFS experts give some advices on this problem.
>>
> You have the following options:
>
>     1. Don't ever use "soft" or "softerr" on the NFS client.
>     2. Reboot your server every now and again.
>     3. Change the server code to not bother caching revoked state. Doing
>        so is rather pointless, since there is nothing a client can do
>        differently when presented with NFS4ERR_DELEG_REVOKED vs.
>        NFS4ERR_BAD_STATEID.
>     4. Change the server code to garbage collect revoked stateids after
>        a while.
>
I found that a server-side bug could also cause such behavior, and I've
reproduced the issue on master (commit b320789d6883):
nfs4_laundromat                       nfsd4_delegreturn
  list_add // add dp to reaplist
           // by dl_recall_lru
  list_del_init // delete dp from
                // reaplist
                                        destroy_delegation
                                         unhash_delegation_locked
                                          list_del_init
                                          // dp was not added to any list
                                          // via dl_recall_lru
  revoke_delegation
  list_add // add dp to cl_revoked
           // by dl_recall_lru

The delegation will be left in cl_revoked.

I agree with Trond's suggestion to change the server code to fix it.

Thanks,
Lingfeng


* Re: [Question]nfs: never returned delegation
  2025-09-01  9:07   ` Li Lingfeng
@ 2025-09-01 11:40     ` Jeff Layton
  2025-09-01 14:12       ` Li Lingfeng
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2025-09-01 11:40 UTC (permalink / raw)
  To: Li Lingfeng, Trond Myklebust, zhangjian (CG), anna
  Cc: linux-nfs, linux-kernel, Chuck Lever, NeilBrown, yangerkun,
	zhangyi (F), Hou Tao, chengzhihao1@huawei.com, Li Lingfeng

On Mon, 2025-09-01 at 17:07 +0800, Li Lingfeng wrote:
> Hi,
> 
> On 2025/8/11 21:03, Trond Myklebust wrote:
> > On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
> > > Recently, we meet a NFS problem in 5.10. There are so many
> > > test_state_id request after a non-privilaged request in tcpdump
> > > result. There are 40w+ delegations in client (I read the delegation
> > > list from /proc/kcore).
> > > Firstly, I think state manager cost a lot in
> > > nfs_server_reap_expired_delegations. But I see they are all in
> > > NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
> > > read this from /proc/kcore too).
> > > I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
> > > meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
> > > never return it again. NFS server will keep the revoked delegation in
> > > clp->cl_revoked forever. This will result in following sequence
> > > response with RECALLABLE_STATE_REVOKED flag. Client will send
> > > test_state_id request for all non-revoked delegation.
> > > This can only be solved by restarting NFS server.
> > > I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
> > > the only case that cause lots of non-terminable test_state_id
> > > requests after any non-privilaged request.
> > > Wish NFS experts give some advices on this problem.
> > > 
> > You have the following options:
> > 
> >     1. Don't ever use "soft" or "softerr" on the NFS client.
> >     2. Reboot your server every now and again.
> >     3. Change the server code to not bother caching revoked state. Doing
> >        so is rather pointless, since there is nothing a client can do
> >        differently when presented with NFS4ERR_DELEG_REVOKED vs.
> >        NFS4ERR_BAD_STATEID.
> >     4. Change the server code to garbage collect revoked stateids after
> >        a while.
> > 
> I found that a server-side bug could also cause such behavior, and I've
> reproduced the issue based on the master (commit b320789d6883).
> nfs4_laundromat                       nfsd4_delegreturn

I think you may be right about the race. The details are a little off
though. The important bit here is that the laundromat also calls this
unhash_delegation_locked before doing the list_add/del.

>   list_add // add dp to reaplist
>            // by dl_recall_lru
>   list_del_init // delete dp from
>                 // reaplist
>                                         destroy_delegation
>                                          unhash_delegation_locked

...which _should_ make the above unhash_delegation_locked return false,
so that list_del_init never happens.

>                                           list_del_init
>                                           // dp was not added to any list
>                                           // via dl_recall_lru
>   revoke_delegation
>   list_add // add dp to cl_revoked
>            // by dl_recall_lru
> 
> The delegation will be left in cl_revoked.
> 
> I agree with Trond's suggestion to change the server code to fix it.
> 
> 

...but there is at least one variation on what you wrote above where it
could get stuck back on the cl_revoked list after the delegreturn. The
delegreturn does set the SC_STATUS_CLOSED bit on the stateid, so
something like this (untested) patch, perhaps?

------------8<----------

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d2d5e8e397a4..e594ded49e60 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1506,7 +1506,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
        trace_nfsd_stid_revoke(&dp->dl_stid);
 
        spin_lock(&clp->cl_lock);
-       if (dp->dl_stid.sc_status & SC_STATUS_FREED) {
+       if (dp->dl_stid.sc_status & (SC_STATUS_FREED | SC_STATUS_CLOSED)) {
                list_del_init(&dp->dl_recall_lru);
                goto out;
        }

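The effect of the widened check can be tried in isolation with a tiny
user-space model. This is only a sketch: the `DEMO_*` bit values and
`demo_revoke()` are made up for the example and are not the kernel
definitions, and it assumes the CLOSED bit is already set by the time
revocation runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status bits; the values are not the kernel's. */
#define DEMO_STATUS_CLOSED	(1u << 0)
#define DEMO_STATUS_REVOKED	(1u << 1)
#define DEMO_STATUS_FREED	(1u << 2)

/*
 * Model of the revoke step: if any bit in skip_mask is already set, the
 * stateid is left off the revoked list. Returns true when the stateid
 * ends up parked on the revoked list.
 */
static bool demo_revoke(unsigned int *status, unsigned int skip_mask)
{
	assert(status != NULL);
	if (*status & skip_mask)
		return false;
	*status |= DEMO_STATUS_REVOKED;
	return true;
}
```

With `skip_mask` set to `DEMO_STATUS_FREED` alone, a stateid that was
already closed still gets parked; adding `DEMO_STATUS_CLOSED` to the mask
keeps it off the list.
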

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [Question]nfs: never returned delegation
  2025-09-01 11:40     ` Jeff Layton
@ 2025-09-01 14:12       ` Li Lingfeng
  0 siblings, 0 replies; 15+ messages in thread
From: Li Lingfeng @ 2025-09-01 14:12 UTC (permalink / raw)
  To: Jeff Layton, Trond Myklebust, zhangjian (CG), anna
  Cc: linux-nfs, linux-kernel, Chuck Lever, NeilBrown, yangerkun,
	zhangyi (F), Hou Tao, chengzhihao1@huawei.com, Li Lingfeng

Hi,

On 2025/9/1 19:40, Jeff Layton wrote:
> On Mon, 2025-09-01 at 17:07 +0800, Li Lingfeng wrote:
>> Hi,
>>
>> On 2025/8/11 21:03, Trond Myklebust wrote:
>>> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>>>> Recently, we meet a NFS problem in 5.10. There are so many
>>>> test_state_id request after a non-privilaged request in tcpdump
>>>> result. There are 40w+ delegations in client (I read the delegation
>>>> list from /proc/kcore).
>>>> Firstly, I think state manager cost a lot in
>>>> nfs_server_reap_expired_delegations. But I see they are all in
>>>> NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
>>>> read this from /proc/kcore too).
>>>> I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
>>>> meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
>>>> never return it again. NFS server will keep the revoked delegation in
>>>> clp->cl_revoked forever. This will result in following sequence
>>>> response with RECALLABLE_STATE_REVOKED flag. Client will send
>>>> test_state_id request for all non-revoked delegation.
>>>> This can only be solved by restarting NFS server.
>>>> I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
>>>> the only case that cause lots of non-terminable test_state_id
>>>> requests after any non-privilaged request.
>>>> Wish NFS experts give some advices on this problem.
>>>>
>>> You have the following options:
>>>
>>>      1. Don't ever use "soft" or "softerr" on the NFS client.
>>>      2. Reboot your server every now and again.
>>>      3. Change the server code to not bother caching revoked state. Doing
>>>         so is rather pointless, since there is nothing a client can do
>>>         differently when presented with NFS4ERR_DELEG_REVOKED vs.
>>>         NFS4ERR_BAD_STATEID.
>>>      4. Change the server code to garbage collect revoked stateids after
>>>         a while.
>>>
>> I found that a server-side bug could also cause such behavior, and I've
>> reproduced the issue based on the master (commit b320789d6883).
>> nfs4_laundromat                       nfsd4_delegreturn
> I think you may be right about the race. The details are a little off
> though. The important bit here is that the laundromat also calls this
> unhash_delegation_locked before doing the list_add/del.
>>    list_add // add dp to reaplist
>>             // by dl_recall_lru
>>    list_del_init // delete dp from
>>                  // reaplist
>>                                          destroy_delegation
>>                                           unhash_delegation_locked
> ...which _should_ make the above unhash_delegation_locked return false,
> so that list_del_init never happens.
Thank you for your correction. The delegreturn indeed does not perform
list_del_init in such concurrent scenarios.
>
>>                                            list_del_init
>>                                            // dp was not added to any list
>>                                            // via dl_recall_lru
>>    revoke_delegation
>>    list_add // add dp to cl_revoked
>>             // by dl_recall_lru
>>
>> The delegation will be left in cl_revoked.
>>
>> I agree with Trond's suggestion to change the server code to fix it.
>>
>>
> ...but there is at least one variation on what you wrote above where it
> could get stuck back on the cl_revoked list after the delegreturn. The
> delegreturn does set the SC_STATUS_CLOSED bit on the stateid, so
> something like this (untested) patch, perhaps?
However, as you noted, since laundromat calls unhash_delegation_locked
first, I think the delegreturn will skip setting SC_STATUS_CLOSED due to
delegation_hashed returning false in unhash_delegation_locked.
>
> ------------8<----------
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index d2d5e8e397a4..e594ded49e60 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1506,7 +1506,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>          trace_nfsd_stid_revoke(&dp->dl_stid);
>   
>          spin_lock(&clp->cl_lock);
> -       if (dp->dl_stid.sc_status & SC_STATUS_FREED) {
> +       if (dp->dl_stid.sc_status & (SC_STATUS_FREED | SC_STATUS_CLOSED)) {
>                  list_del_init(&dp->dl_recall_lru);
>                  goto out;
>          }
>
Thanks,
Lingfeng



* Re: [Question]nfs: never returned delegation
  2025-08-12  2:45   ` zhangjian (CG)
@ 2026-03-06  2:46     ` zhangjian (CG)
  2026-03-06  4:49       ` Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: zhangjian (CG) @ 2026-03-06  2:46 UTC (permalink / raw)
  To: trond.myklebust, anna, Jeff Layton; +Cc: linux-nfs, linux-kernel

Hi NFS experts,

We recently hit the following problem:
1. NFS waits on sunrpc.
2. Sunrpc sends an OPEN request and puts the rpc task on the sunrpc
pending queue.
3. The server never replies, and since NFS_CS_NO_RETRANS_TIMEOUT is
forced and the connection is ESTABLISHED, the task is never retransmitted.
As a result, processes waiting on this file hang forever.
I know that "umount -f" kills the rpc task, and that the root cause most
likely lies in the network layer. But shouldn't NFS retransmit the
request after waiting for so long?

I look forward to your reply. Thanks.

Zhangjian


* Re: [Question]nfs: never returned delegation
  2026-03-06  2:46     ` zhangjian (CG)
@ 2026-03-06  4:49       ` Trond Myklebust
  2026-03-12  4:19         ` [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ? zhangjian (CG)
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2026-03-06  4:49 UTC (permalink / raw)
  To: zhangjian (CG), anna, Jeff Layton; +Cc: linux-nfs, linux-kernel

On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
> Hi experts on NFS:
> 
> Recently we meet an error:
> 1.Nfs wait for sunrpc
> 2.Sunrpc send OPEN message and hang the rpc task onto sunrpc pending
> queue.
> 3.Server never reply, and since NFS_CS_NO_RETRANS_TIMEOUT is forced
> and
> connection is ESTABLISHED, task will never be retransmitted.
> This cause procedures waiting on this file hang forever.
> I know using "umount -f " to kill rpc task works. And the key to the
> problem most likely lies in the network layer. But should nfs
> retransmit
> it after waiting for so long?
> 
> Wish for reply. Thanks
> 
> Zhangjian
> 
Please read the NFSv4 spec. It very clearly states that the client
should never retransmit unless the connection breaks.

IOW: the problem here is your broken server, not the client.
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


* Re: [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ?
  2026-03-06  4:49       ` Trond Myklebust
@ 2026-03-12  4:19         ` zhangjian (CG)
  2026-03-12 13:09           ` Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: zhangjian (CG) @ 2026-03-12  4:19 UTC (permalink / raw)
  To: Trond Myklebust, anna, Jeff Layton; +Cc: linux-nfs, linux-kernel



On 3/6/2026 12:49 PM, Trond Myklebust wrote:
> On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
>> Hi experts on NFS:
>>
>> Recently we meet an error:
>> 1.Nfs wait for sunrpc
>> 2.Sunrpc send OPEN message and hang the rpc task onto sunrpc pending
>> queue.
>> 3.Server never reply, and since NFS_CS_NO_RETRANS_TIMEOUT is forced
>> and
>> connection is ESTABLISHED, task will never be retransmitted.
>> This cause procedures waiting on this file hang forever.
>> I know using "umount -f " to kill rpc task works. And the key to the
>> problem most likely lies in the network layer. But should nfs
>> retransmit
>> it after waiting for so long?
>>
>> Wish for reply. Thanks
>>
>> Zhangjian
>>
> Please read the NFSv4 spec. It very clearly states that the client
> should never retransmit unless the connection breaks.
> 

The NFSv4 spec says the client should never retransmit, but it does not
say the client has to wait forever. Maybe sunrpc should report -ETIMEDOUT
to NFS, and NFS should return an error rather than retransmit.

> IOW: the problem here is your broken server, not the client.



* Re: [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ?
  2026-03-12  4:19         ` [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ? zhangjian (CG)
@ 2026-03-12 13:09           ` Trond Myklebust
  2026-03-13  3:22             ` zhangjian (CG)
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2026-03-12 13:09 UTC (permalink / raw)
  To: zhangjian (CG), anna, Jeff Layton; +Cc: linux-nfs, linux-kernel

On Thu, 2026-03-12 at 12:19 +0800, zhangjian (CG) wrote:
> 
> 
> On 3/6/2026 12:49 PM, Trond Myklebust wrote:
> > On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
> > > Hi experts on NFS:
> > > 
> > > Recently we meet an error:
> > > 1.Nfs wait for sunrpc
> > > 2.Sunrpc send OPEN message and hang the rpc task onto sunrpc
> > > pending
> > > queue.
> > > 3.Server never reply, and since NFS_CS_NO_RETRANS_TIMEOUT is
> > > forced
> > > and
> > > connection is ESTABLISHED, task will never be retransmitted.
> > > This cause procedures waiting on this file hang forever.
> > > I know using "umount -f " to kill rpc task works. And the key to
> > > the
> > > problem most likely lies in the network layer. But should nfs
> > > retransmit
> > > it after waiting for so long?
> > > 
> > > Wish for reply. Thanks
> > > 
> > > Zhangjian
> > > 
> > Please read the NFSv4 spec. It very clearly states that the client
> > should never retransmit unless the connection breaks.
> > 
> 
> NFSv4 spec said client should never retransmit, but not said client
> need
> to wait forever. Maybe sunrpc should tell nfs -ETIMEOUT and nfs
> return
> ERROR rather than retransmit.

You are 100% free to use the existing 'soft' or 'softerr' mount options
if you have applications that can parse those (non-POSIX) errors.
Note however that there is no way to tell the server that you are
'cancelling' an RPC call, so it will hold onto that slot until it is
done executing the call (see RFC8881, Section 2.10.6.1.). So you are
eventually going to run out of usable slots, and the system will gum up
anyway.

The default mount option is 'hard', because those are the only
semantics that are compatible with POSIX and NFSv4.x.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


* Re: [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ?
  2026-03-12 13:09           ` Trond Myklebust
@ 2026-03-13  3:22             ` zhangjian (CG)
  2026-03-13 15:18               ` Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: zhangjian (CG) @ 2026-03-13  3:22 UTC (permalink / raw)
  To: Trond Myklebust, anna, Jeff Layton; +Cc: linux-nfs, linux-kernel


On 3/12/2026 9:09 PM, Trond Myklebust wrote:
> On Thu, 2026-03-12 at 12:19 +0800, zhangjian (CG) wrote:
>>
>>
>> On 3/6/2026 12:49 PM, Trond Myklebust wrote:
>>> On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
>>>> Hi experts on NFS:
>>>>
>>>> Recently we meet an error:
>>>> 1.Nfs wait for sunrpc
>>>> 2.Sunrpc send OPEN message and hang the rpc task onto sunrpc
>>>> pending
>>>> queue.
>>>> 3.Server never reply, and since NFS_CS_NO_RETRANS_TIMEOUT is
>>>> forced
>>>> and
>>>> connection is ESTABLISHED, task will never be retransmitted.
>>>> This cause procedures waiting on this file hang forever.
>>>> I know using "umount -f " to kill rpc task works. And the key to
>>>> the
>>>> problem most likely lies in the network layer. But should nfs
>>>> retransmit
>>>> it after waiting for so long?
>>>>
>>>> Wish for reply. Thanks
>>>>
>>>> Zhangjian
>>>>
>>> Please read the NFSv4 spec. It very clearly states that the client
>>> should never retransmit unless the connection breaks.
>>>
>>
>> NFSv4 spec said client should never retransmit, but not said client
>> need
>> to wait forever. Maybe sunrpc should tell nfs -ETIMEOUT and nfs
>> return
>> ERROR rather than retransmit.
> 
> You are 100% free to use the existing 'soft' or 'softerr' mount options
> if you have applications that can parse those (non-POSIX) errors.

I have already mounted with the soft, retrans and timeo options. The
connection is in the ESTABLISHED state, but because
NFS_CS_NO_RETRANS_TIMEOUT is set, the OPEN RPC task never returns
-ETIMEDOUT, and any operation waiting for the seqid hangs. The soft
option does not work while the connection is good.

> Note however that there is no way to tell the server that you are
> 'cancelling' an RPC call, so it will hold onto that slot until it is
> done executing the call (see RFC8881, Section 2.10.6.1.). So you are
> eventually going to run out of usable slots, and the system will gum up
> anyway.

A client hanging for this long may be more serious than running out of
client slots. Even reconnecting automatically would be better than this.

> 
> The default mount option is 'hard', because those are the only
> semantics that are compatible with POSIX and NFSv4.x.
> 



* Re: [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ?
  2026-03-13  3:22             ` zhangjian (CG)
@ 2026-03-13 15:18               ` Trond Myklebust
  0 siblings, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2026-03-13 15:18 UTC (permalink / raw)
  To: zhangjian (CG), anna, Jeff Layton; +Cc: linux-nfs, linux-kernel

On Fri, 2026-03-13 at 11:22 +0800, zhangjian (CG) wrote:
> 
> On 3/12/2026 9:09 PM, Trond Myklebust wrote:
> > On Thu, 2026-03-12 at 12:19 +0800, zhangjian (CG) wrote:
> > > 
> > > 
> > > On 3/6/2026 12:49 PM, Trond Myklebust wrote:
> > > > On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
> > > > > Hi experts on NFS:
> > > > > 
> > > > > Recently we meet an error:
> > > > > 1.Nfs wait for sunrpc
> > > > > 2.Sunrpc send OPEN message and hang the rpc task onto sunrpc
> > > > > pending
> > > > > queue.
> > > > > 3.Server never reply, and since NFS_CS_NO_RETRANS_TIMEOUT is
> > > > > forced
> > > > > and
> > > > > connection is ESTABLISHED, task will never be retransmitted.
> > > > > This cause procedures waiting on this file hang forever.
> > > > > I know using "umount -f " to kill rpc task works. And the key
> > > > > to
> > > > > the
> > > > > problem most likely lies in the network layer. But should nfs
> > > > > retransmit
> > > > > it after waiting for so long?
> > > > > 
> > > > > Wish for reply. Thanks
> > > > > 
> > > > > Zhangjian
> > > > > 
> > > > Please read the NFSv4 spec. It very clearly states that the
> > > > client
> > > > should never retransmit unless the connection breaks.
> > > > 
> > > 
> > > NFSv4 spec said client should never retransmit, but not said
> > > client
> > > need
> > > to wait forever. Maybe sunrpc should tell nfs -ETIMEOUT and nfs
> > > return
> > > ERROR rather than retransmit.
> > 
> > You are 100% free to use the existing 'soft' or 'softerr' mount
> > options
> > if you have applications that can parse those (non-POSIX) errors.
> 
> I have already mounted with the soft, retrans and timeo options. The
> connection is in the ESTABLISHED state, but because
> NFS_CS_NO_RETRANS_TIMEOUT is set, the OPEN RPC task never returns
> -ETIMEDOUT, and any operation waiting for the seqid hangs. The soft
> option does not work while the connection is good.
> 
> > Note however that there is no way to tell the server that you are
> > 'cancelling' an RPC call, so it will hold onto that slot until it
> > is
> > done executing the call (see RFC8881, Section 2.10.6.1.). So you
> > are
> > eventually going to run out of usable slots, and the system will
> > gum up
> > anyway.
> 
> A client hanging for this long may be more serious than running out of
> client slots. Even reconnecting automatically would be better than this.

We do not ever "fix" broken servers by hacking the client.

I suggest that either you fix your server, or that you replace it with
one that isn't broken.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


end of thread, other threads:[~2026-03-13 15:18 UTC | newest]

Thread overview: 15+ messages
2025-08-11 12:48 [Question]nfs: never returned delegation zhangjian (CG)
2025-08-11 13:03 ` Trond Myklebust
2025-08-12  2:51   ` zhangjian (CG)
2025-09-01  9:07   ` Li Lingfeng
2025-09-01 11:40     ` Jeff Layton
2025-09-01 14:12       ` Li Lingfeng
2025-08-11 13:03 ` Jeff Layton
2025-08-11 13:06   ` Trond Myklebust
2025-08-12  2:45   ` zhangjian (CG)
2026-03-06  2:46     ` zhangjian (CG)
2026-03-06  4:49       ` Trond Myklebust
2026-03-12  4:19         ` [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ? zhangjian (CG)
2026-03-12 13:09           ` Trond Myklebust
2026-03-13  3:22             ` zhangjian (CG)
2026-03-13 15:18               ` Trond Myklebust
