From: Jeff Layton <jlayton@kernel.org>
To: Li Lingfeng <lilingfeng3@huawei.com>,
chuck.lever@oracle.com, neil@brown.name, okorniev@redhat.com,
Dai.Ngo@oracle.com, tom@talpey.com, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: yukuai1@huaweicloud.com, houtao1@huawei.com, yi.zhang@huawei.com,
yangerkun@huawei.com, lilingfeng@huaweicloud.com,
zhangjian496@huawei.com
Subject: Re: [PATCH] nfsd: remove long-standing revoked delegations by force
Date: Tue, 02 Sep 2025 10:29:37 -0400 [thread overview]
Message-ID: <7f0c2dfbcebcacef8afa0b8d7cbdd6d84296cbcb.camel@kernel.org> (raw)
In-Reply-To: <c5869e23-5294-4757-a757-868748b3bc65@huawei.com>
On Tue, 2025-09-02 at 22:21 +0800, Li Lingfeng wrote:
> 在 2025/9/2 21:40, Jeff Layton 写道:
> > On Tue, 2025-09-02 at 20:10 +0800, Li Lingfeng wrote:
> > > Hi,
> > >
> > > 在 2025/9/2 18:21, Jeff Layton 写道:
> > > > On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote:
> > > > > When file access conflicts occur between clients, the server recalls
> > > > > delegations. If the client holding delegation fails to return it after
> > > > > a recall, nfs4_laundromat adds the delegation to cl_revoked list.
> > > > > This causes subsequent SEQUENCE operations to set the
> > > > > SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to
> > > > > validate all delegations and return the revoked one.
> > > > >
> > > > > However, if the client fails to return the delegation due to a timeout
> > > > > after receiving the recall or a server bug, the delegation remains in the
> > > > > server's cl_revoked list. The client marks it revoked and won't find it
> > > > > upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop:
> > > > > the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the
> > > > > client repeatedly tests all delegations, severely impacting performance
> > > > > when numerous delegations exist.
> > > > >
> > > > It is a performance impact, but I don't get the "loop" here. Are you
> > > > saying that this problem compounds itself? That testing all delegations
> > > > causes others to be revoked?
> > > The delegation will be removed from server->delegations in client after
> > > NFSPROC4_CLNT_DELEGRETURN is performed.
> > > nfs4_delegreturn_done
> > > nfs_delegation_mark_returned
> > > nfs_detach_delegation
> > > nfs_detach_delegation_locked
> > > list_del_rcu // remove delegation from server->delegations
> > >
> > > From the client's perspective, the delegation has been returned, but on
> > > the server side, it is left in the cl_revoked list.[1].
> > >
> > > Subsequently, every sequence from the client will be flagged with
> > > SEQ4_STATUS_RECALLABLE_STATE_REVOKED as long as cl_revoked remains
> > > non-empty.
> > > nfsd4_sequence
> > > seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED
> > >
> > > When the client detects SEQ4_STATUS_RECALLABLE_STATE_REVOKED while
> > > processing a sequence result, it sets NFS_DELEGATION_TEST_EXPIRED for all
> > > delegations and wakes up the state manager for handling.
> > > nfs41_sequence_done
> > > nfs41_sequence_process
> > > nfs41_handle_sequence_flag_errors
> > > nfs41_handle_recallable_state_revoked
> > > nfs_test_expired_all_delegations
> > > nfs_mark_test_expired_all_delegations
> > > nfs_delegation_mark_test_expired_server
> > > // set NFS_DELEGATION_TEST_EXPIRED for delegations in
> > > server->delegations
> > > nfs4_schedule_state_manager
> > >
> > > The state manager tests all delegations except the one that was returned,
> > > as it is no longer in server->delegations.
> > > nfs4_state_manager
> > > nfs4_begin_drain_session
> > > nfs_reap_expired_delegations
> > > nfs_server_reap_expired_delegations
> > > // test delegations in server->delegations
> > >
> > > There may be a loop:
> > > 1) send a sequence(client)
> > > 2) return SEQ4_STATUS_RECALLABLE_STATE_REVOKED(server)
> > > 3) set NFS_DELEGATION_TEST_EXPIRED for all delegations(client)
> > > 4) test all delegations by state manager(client)
> > > 5) send another sequence(client)
> > >
> > > The state manager's traversal of delegations occurs between
> > > nfs4_begin_drain_session and nfs4_end_drain_session. Non-privileged requests
> > > will be blocked because the NFS4_SLOT_TBL_DRAINING flag is set. If there are
> > > many delegations to traverse, this blocking time can be relatively long.
> > > > > Since abnormal delegations are removed from flc_lease via nfs4_laundromat
> > > > > --> revoke_delegation --> destroy_unhashed_deleg -->
> > > > > nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open
> > > > > requests indefinitely, retaining such a delegation on the server is
> > > > > unnecessary.
> > > > >
> > > > > Reported-by: Zhang Jian <zhangjian496@huawei.com>
> > > > > Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED")
> > > > > Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@huawei.com/
> > > > > Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> > > > > ---
> > > > > fs/nfsd/nfs4state.c | 11 +++++++++++
> > > > > 1 file changed, 11 insertions(+)
> > > > >
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index 88c347957da5..aa65a685dbb9 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > int buflen;
> > > > > struct net *net = SVC_NET(rqstp);
> > > > > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > > > + struct list_head *pos, *next;
> > > > > + struct nfs4_delegation *dp;
> > > > >
> > > > > if (resp->opcnt != 1)
> > > > > return nfserr_sequence_pos;
> > > > > @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > default:
> > > > > seq->status_flags = 0;
> > > > > }
> > > > > + if (!list_empty(&clp->cl_revoked)) {
> > > > > + list_for_each_safe(pos, next, &clp->cl_revoked) {
> > > > > + dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
> > > > > + if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) {
> > > > > + list_del_init(&dp->dl_recall_lru);
> > > > > + nfs4_put_stid(&dp->dl_stid);
> > > > > + }
> > > > > + }
> > FYI: this list is protected by the clp->cl_lock. You need to hold it to
> > do this list walk.
> >
> > > > > + }
> > > > > if (!list_empty(&clp->cl_revoked))
> > > > > seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
> > > > > if (atomic_read(&clp->cl_admin_revoked))
> > > > This seems like a violation of the spec. AIUI, the server is required
> > > > to hang onto a record of the delegation until the client does the
> > > > TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like
> > > > this seems wrong.
> > > Our expected outcome was that the client would release the abnormal
> > > delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
> > > However, this problematic delegation is no longer present in the
> > > client's server->delegations list—whether due to client-side timeouts or
> > > the server-side bug [1].
> > > > Should we instead just administratively evict the client since it's
> > > > clearly not behaving right in this case?
> > > Thanks for the suggestion. While administratively evicting the client would
> > > certainly resolve the immediate delegation issue, I'm concerned that
> > > approach
> > > might be a bit heavy-handed.
> > > The problematic behavior seems isolated to a single delegation. Meanwhile,
> > > the client itself likely has numerous other open files and active state on
> > > the server. Forcing a complete client reconnect would tear down all that
> > > state, which could cause significant application disruption and be perceived
> > > as a service outage from the client's perspective.
> > >
> > > [1]
> > > https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
> > >
> > > Thanks,
> > > Lingfeng
> > Ok, I get the problem, but I still disagree with the solution. I don't
> > think we can just time these things out. Ideally we'd close the race
> > window, but the sc_status field is protected by the global state_lock
> > and I don't think we want to take it in revoke_delegation.
> >
> > The best solution I can see is to have destroy_delegation()
> > unconditionally set SC_STATUS_CLOSED, and then you can do the list walk
> > above, but checking for that flag instead of testing for a timeout.
> This might potentially affect the normal TEST_STATEID/FREE_STATEID flow,
> as nfsd4_free_stateid() branches differently based on whether
> SC_STATUS_CLOSED is set. Alternatively, I was wondering if you could
> suggest a workaround to avoid this issue?
>
I can't think of any workarounds other than turning off delegations
altogether.
I guess your concern is that TEST_STATEID and FREE_STATEID would return
BAD_STATEID in this case, even though the entry was still (technically)
on the cl_revoked list? That seems like correct behavior. The client
did send DELEGRETURN, after all.
> >
> > I'm still not thrilled with this solution though. It makes SEQUENCE a
> > bit more heavyweight than I'd like. I'm starting to think that we need
> > to rework the overall delegation locking, but that's an ugly problem to
> > tackle.
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2025-09-02 14:29 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-02 2:22 [PATCH] nfsd: remove long-standing revoked delegations by force Li Lingfeng
2025-09-02 10:21 ` Jeff Layton
2025-09-02 12:10 ` Li Lingfeng
2025-09-02 12:43 ` Benjamin Coddington
2025-09-02 13:08 ` Li Lingfeng
2025-09-03 3:46 ` zhangjian (CG)
2025-09-03 6:45 ` Li Lingfeng
2025-09-03 10:06 ` zhangjian (CG)
2025-09-03 11:40 ` Li Lingfeng
2025-09-02 13:40 ` Jeff Layton
2025-09-02 14:21 ` Li Lingfeng
2025-09-02 14:29 ` Jeff Layton [this message]
2025-09-03 1:34 ` Li Lingfeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7f0c2dfbcebcacef8afa0b8d7cbdd6d84296cbcb.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=Dai.Ngo@oracle.com \
--cc=chuck.lever@oracle.com \
--cc=houtao1@huawei.com \
--cc=lilingfeng3@huawei.com \
--cc=lilingfeng@huaweicloud.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
--cc=yangerkun@huawei.com \
--cc=yi.zhang@huawei.com \
--cc=yukuai1@huaweicloud.com \
--cc=zhangjian496@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.