From: Dai Ngo <dai.ngo@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
chuck.lever@oracle.com, neil@brown.name, okorniev@redhat.com,
tom@talpey.com, hch@lst.de, alex.aring@gmail.com,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v5 1/1] NFSD: Enforce timeout on layout recall and integrate lease manager fencing
Date: Fri, 6 Feb 2026 15:33:16 -0800 [thread overview]
Message-ID: <623f06a7-4e41-4703-95ef-b3476fc4549a@oracle.com> (raw)
In-Reply-To: <3cb09bd01df3d43293f2f443ebb6b4a10ea50dee.camel@kernel.org>
On 2/6/26 11:40 AM, Jeff Layton wrote:
> On Fri, 2026-02-06 at 10:17 -0800, Dai Ngo wrote:
>> On 2/6/26 6:28 AM, Jeff Layton wrote:
>>> On Thu, 2026-02-05 at 12:29 -0800, Dai Ngo wrote:
>>>> When a layout conflict triggers a recall, enforcing a timeout is
>>>> necessary to prevent excessive nfsd threads from being blocked in
>>>> __break_lease ensuring the server continues servicing incoming
>>>> requests efficiently.
>>>>
>>>> This patch introduces a new function to lease_manager_operations:
>>>>
>>>> lm_breaker_timedout: Invoked when a lease recall times out and is
>>>> about to be disposed of. This function enables the lease manager
>>>> to inform the caller whether the file_lease should remain on the
>>>> flc_list or be disposed of.
>>>>
>>>> For the NFSD lease manager, this function now handles layout recall
>>>> timeouts. If the layout type supports fencing and the client has not
>>>> been fenced, a fence operation is triggered to prevent the client
>>>> from accessing the block device.
>>>>
>>>> While the fencing operation is in progress, the conflicting file_lease
>>>> remains on the flc_list until fencing is complete. This guarantees
>>>> that no other clients can access the file, and the client with
>>>> exclusive access is properly blocked before disposal.
>>>>
>>> Fair point. However...
>>>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>> Documentation/filesystems/locking.rst | 2 +
>>>> fs/locks.c | 15 +++-
>>>> fs/nfsd/blocklayout.c | 41 ++++++++--
>>>> fs/nfsd/nfs4layouts.c | 113 +++++++++++++++++++++++++-
>>>> fs/nfsd/nfs4state.c | 1 +
>>>> fs/nfsd/pnfs.h | 2 +-
>>>> fs/nfsd/state.h | 8 ++
>>>> include/linux/filelock.h | 1 +
>>>> 8 files changed, 169 insertions(+), 14 deletions(-)
>>>>
>>>> v2:
>>>> . Update Subject line to include fencing operation.
>>>> . Allow conflicting lease to remain on flc_list until fencing
>>>> is complete.
>>>> . Use system worker to perform fencing operation asynchronously.
>>>> . Use nfs4_stid.sc_count to ensure layout stateid remains
>>>> valid before starting the fencing operation, nfs4_stid.sc_count
>>>> is released after fencing operation is complete.
>>>> . Rework nfsd4_scsi_fence_client to:
>>>> . wait until fencing to complete before exiting.
>>>> . wait until fencing in progress to complete before
>>>> checking the NFSD_MDS_PR_FENCED flag.
>>>> . Remove lm_need_to_retry from lease_manager_operations.
>>>> v3:
>>>> . correct locking requirement in locking.rst.
>>>> . add max retry count to fencing operation.
>>>> . add missing nfs4_put_stid in nfsd4_layout_fence_worker.
>>>> . remove special-casing of FL_LAYOUT in lease_modify.
>>>> . remove lease_want_dispose.
>>>> . move lm_breaker_timedout call to time_out_leases.
>>>> v4:
>>>> . only increment ls_fence_retry_cnt after successfully
>>>> schedule new work in nfsd4_layout_lm_breaker_timedout.
>>>> v5:
>>>> . take reference count on layout stateid before starting
>>>> fence worker.
>>>> . restore comments in nfsd4_scsi_fence_client and the
>>>> code that check for specific errors.
>>>> . cancel fence worker before freeing layout stateid.
>>>> . increase fence retry from 5 to 20.
>>>>
>>>> NOTE:
>>>> I experimented with having the fence worker handle lease
>>>> disposal after fencing the client. However, this requires
>>>> the lease code to export the lease_dispose_list function,
>>>> and for the fence worker to acquire the flc_lock in order
>>>> to perform the disposal. This approach adds unnecessary
>>>> complexity and reduces code clarity, as it exposes internal
>>>> lease code details to the nfsd worker, which should not
>>>> be the case.
>>>>
>>>> Instead, the lm_breaker_timedout operation should simply
>>>> notify the lease code about how to handle a lease that
>>>> times out during a lease break, rather than directly
>>>> manipulating the lease list.
>>>>
>>> Ok, fair point.
>>>
>>>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>>>> index 04c7691e50e0..79bee9ae8bc3 100644
>>>> --- a/Documentation/filesystems/locking.rst
>>>> +++ b/Documentation/filesystems/locking.rst
>>>> @@ -403,6 +403,7 @@ prototypes::
>>>> bool (*lm_breaker_owns_lease)(struct file_lock *);
>>>> bool (*lm_lock_expirable)(struct file_lock *);
>>>> void (*lm_expire_lock)(void);
>>>> + bool (*lm_breaker_timedout)(struct file_lease *);
>>>>
>>>> locking rules:
>>>>
>>>> @@ -417,6 +418,7 @@ lm_breaker_owns_lease: yes no no
>>>> lm_lock_expirable yes no no
>>>> lm_expire_lock no no yes
>>>> lm_open_conflict yes no no
>>>> +lm_breaker_timedout yes no no
>>>> ====================== ============= ================= =========
>>>>
>>>> buffer_head
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index 46f229f740c8..0e77423cf000 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -1524,6 +1524,7 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>>>> {
>>>> struct file_lock_context *ctx = inode->i_flctx;
>>>> struct file_lease *fl, *tmp;
>>>> + bool remove = true;
>>>>
>>>> lockdep_assert_held(&ctx->flc_lock);
>>>>
>>>> @@ -1531,8 +1532,18 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>>>> trace_time_out_leases(inode, fl);
>>>> if (past_time(fl->fl_downgrade_time))
>>>> lease_modify(fl, F_RDLCK, dispose);
>>>> - if (past_time(fl->fl_break_time))
>>>> - lease_modify(fl, F_UNLCK, dispose);
>>>> +
>>>> + if (past_time(fl->fl_break_time)) {
>>>> + /*
>>>> + * Consult the lease manager when a lease break times
>>>> + * out to determine whether the lease should be disposed
>>>> + * of.
>>>> + */
>>>> + if (fl->fl_lmops && fl->fl_lmops->lm_breaker_timedout)
>>>> + remove = fl->fl_lmops->lm_breaker_timedout(fl);
>>>> + if (remove)
>>>> + lease_modify(fl, F_UNLCK, dispose);
>>> When remove is false, and lease_modify() doesn't happen (i.e., the
>>> common case where we queue the wq job), when do you actually remove the
>>> lease?
>> The lease is removed when the fence worker completes the fencing operation
>> and set ls_fenced to true. When __break_lease/time_out_leases calls
>> lm_breaker_timedout again, nfsd4_layout_lm_breaker_timedout returns true
>> since ls_fenced is now set.
>>
>>> Are you just assuming that after the client is fenced, that the layout
>>> stateid's refcount will go to zero? I'm curious what drives that
>>> process, if so.
>> No, after completing the fence operation, the fenced worker drops the
>> reference count on the layout stateid by calling nfs4_put_stid(). If
>> the reference drops to 0 then the layout stateid is freed at this
>> point, otherwise it will be freed when the CB_RECALL callback times
>> out.
>>
> In principle the stateid could stick around for a while after the fence
> has occurred. It would be better to unlock the lease as soon as the
> fencing is done, so that tasks waiting on it can proceed (a'la
> kernel_setlease() with F_UNLCK).
Good suggestion, fix in v6.
Thanks,
-Dai
prev parent reply other threads:[~2026-02-06 23:33 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-05 20:29 [PATCH v5 1/1] NFSD: Enforce timeout on layout recall and integrate lease manager fencing Dai Ngo
2026-02-06 6:16 ` Christoph Hellwig
2026-02-06 18:28 ` Dai Ngo
2026-02-06 14:28 ` Jeff Layton
2026-02-06 18:17 ` Dai Ngo
2026-02-06 19:40 ` Jeff Layton
2026-02-06 23:33 ` Dai Ngo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=623f06a7-4e41-4703-95ef-b3476fc4549a@oracle.com \
--to=dai.ngo@oracle.com \
--cc=alex.aring@gmail.com \
--cc=brauner@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox