From: Dai Ngo <dai.ngo@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
chuck.lever@oracle.com, neil@brown.name, okorniev@redhat.com,
tom@talpey.com, hch@lst.de, alex.aring@gmail.com,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v5 1/1] NFSD: Enforce timeout on layout recall and integrate lease manager fencing
Date: Fri, 6 Feb 2026 10:17:33 -0800 [thread overview]
Message-ID: <6a28e81b-1e2e-4457-8bec-4312e6d3246f@oracle.com> (raw)
In-Reply-To: <9194ce4db4391c0e6428f97b05fcee53706fb485.camel@kernel.org>
On 2/6/26 6:28 AM, Jeff Layton wrote:
> On Thu, 2026-02-05 at 12:29 -0800, Dai Ngo wrote:
>> When a layout conflict triggers a recall, enforcing a timeout is
>> necessary to prevent excessive nfsd threads from being blocked in
>> __break_lease ensuring the server continues servicing incoming
>> requests efficiently.
>>
>> This patch introduces a new function to lease_manager_operations:
>>
>> lm_breaker_timedout: Invoked when a lease recall times out and is
>> about to be disposed of. This function enables the lease manager
>> to inform the caller whether the file_lease should remain on the
>> flc_list or be disposed of.
>>
>> For the NFSD lease manager, this function now handles layout recall
>> timeouts. If the layout type supports fencing and the client has not
>> been fenced, a fence operation is triggered to prevent the client
>> from accessing the block device.
>>
>> While the fencing operation is in progress, the conflicting file_lease
>> remains on the flc_list until fencing is complete. This guarantees
>> that no other clients can access the file, and the client with
>> exclusive access is properly blocked before disposal.
>>
> Fair point. However...
>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>> Documentation/filesystems/locking.rst | 2 +
>> fs/locks.c | 15 +++-
>> fs/nfsd/blocklayout.c | 41 ++++++++--
>> fs/nfsd/nfs4layouts.c | 113 +++++++++++++++++++++++++-
>> fs/nfsd/nfs4state.c | 1 +
>> fs/nfsd/pnfs.h | 2 +-
>> fs/nfsd/state.h | 8 ++
>> include/linux/filelock.h | 1 +
>> 8 files changed, 169 insertions(+), 14 deletions(-)
>>
>> v2:
>> . Update Subject line to include fencing operation.
>> . Allow conflicting lease to remain on flc_list until fencing
>> is complete.
>> . Use system worker to perform fencing operation asynchronously.
>> . Use nfs4_stid.sc_count to ensure layout stateid remains
>> valid before starting the fencing operation, nfs4_stid.sc_count
>> is released after fencing operation is complete.
>> . Rework nfsd4_scsi_fence_client to:
>> . wait until fencing to complete before exiting.
>> . wait until fencing in progress to complete before
>> checking the NFSD_MDS_PR_FENCED flag.
>> . Remove lm_need_to_retry from lease_manager_operations.
>> v3:
>> . correct locking requirement in locking.rst.
>> . add max retry count to fencing operation.
>> . add missing nfs4_put_stid in nfsd4_layout_fence_worker.
>> . remove special-casing of FL_LAYOUT in lease_modify.
>> . remove lease_want_dispose.
>> . move lm_breaker_timedout call to time_out_leases.
>> v4:
>> . only increment ls_fence_retry_cnt after successfully
>> schedule new work in nfsd4_layout_lm_breaker_timedout.
>> v5:
>> . take reference count on layout stateid before starting
>> fence worker.
>> . restore comments in nfsd4_scsi_fence_client and the
>> code that check for specific errors.
>> . cancel fence worker before freeing layout stateid.
>> . increase fence retry from 5 to 20.
>>
>> NOTE:
>> I experimented with having the fence worker handle lease
>> disposal after fencing the client. However, this requires
>> the lease code to export the lease_dispose_list function,
>> and for the fence worker to acquire the flc_lock in order
>> to perform the disposal. This approach adds unnecessary
>> complexity and reduces code clarity, as it exposes internal
>> lease code details to the nfsd worker, which should not
>> be the case.
>>
>> Instead, the lm_breaker_timedout operation should simply
>> notify the lease code about how to handle a lease that
>> times out during a lease break, rather than directly
>> manipulating the lease list.
>>
> Ok, fair point.
>
>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>> index 04c7691e50e0..79bee9ae8bc3 100644
>> --- a/Documentation/filesystems/locking.rst
>> +++ b/Documentation/filesystems/locking.rst
>> @@ -403,6 +403,7 @@ prototypes::
>> bool (*lm_breaker_owns_lease)(struct file_lock *);
>> bool (*lm_lock_expirable)(struct file_lock *);
>> void (*lm_expire_lock)(void);
>> + bool (*lm_breaker_timedout)(struct file_lease *);
>>
>> locking rules:
>>
>> @@ -417,6 +418,7 @@ lm_breaker_owns_lease: yes no no
>> lm_lock_expirable yes no no
>> lm_expire_lock no no yes
>> lm_open_conflict yes no no
>> +lm_breaker_timedout yes no no
>> ====================== ============= ================= =========
>>
>> buffer_head
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 46f229f740c8..0e77423cf000 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -1524,6 +1524,7 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>> {
>> struct file_lock_context *ctx = inode->i_flctx;
>> struct file_lease *fl, *tmp;
>> + bool remove = true;
>>
>> lockdep_assert_held(&ctx->flc_lock);
>>
>> @@ -1531,8 +1532,18 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>> trace_time_out_leases(inode, fl);
>> if (past_time(fl->fl_downgrade_time))
>> lease_modify(fl, F_RDLCK, dispose);
>> - if (past_time(fl->fl_break_time))
>> - lease_modify(fl, F_UNLCK, dispose);
>> +
>> + if (past_time(fl->fl_break_time)) {
>> + /*
>> + * Consult the lease manager when a lease break times
>> + * out to determine whether the lease should be disposed
>> + * of.
>> + */
>> + if (fl->fl_lmops && fl->fl_lmops->lm_breaker_timedout)
>> + remove = fl->fl_lmops->lm_breaker_timedout(fl);
>> + if (remove)
>> + lease_modify(fl, F_UNLCK, dispose);
> When remove is false, and lease_modify() doesn't happen (i.e., the
> common case where we queue the wq job), when do you actually remove the
> lease?
The lease is removed when the fence worker completes the fencing operation
and set ls_fenced to true. When __break_lease/time_out_leases calls
lm_breaker_timedout again, nfsd4_layout_lm_breaker_timedout returns true
since ls_fenced is now set.
>
> Are you just assuming that after the client is fenced, that the layout
> stateid's refcount will go to zero? I'm curious what drives that
> process, if so.
No, after completing the fence operation, the fenced worker drops the
reference count on the layout stateid by calling nfs4_put_stid(). If
the reference drops to 0 then the layout stateid is freed at this
point, otherwise it will be freed when the CB_RECALL callback times
out.
-Dai
>
>
>
next prev parent reply other threads:[~2026-02-06 18:18 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-05 20:29 [PATCH v5 1/1] NFSD: Enforce timeout on layout recall and integrate lease manager fencing Dai Ngo
2026-02-06 6:16 ` Christoph Hellwig
2026-02-06 18:28 ` Dai Ngo
2026-02-06 14:28 ` Jeff Layton
2026-02-06 18:17 ` Dai Ngo [this message]
2026-02-06 19:40 ` Jeff Layton
2026-02-06 23:33 ` Dai Ngo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6a28e81b-1e2e-4457-8bec-4312e6d3246f@oracle.com \
--to=dai.ngo@oracle.com \
--cc=alex.aring@gmail.com \
--cc=brauner@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox