Linux NFS development
From: Dai Ngo <dai.ngo@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
	chuck.lever@oracle.com, neil@brown.name, okorniev@redhat.com,
	tom@talpey.com, hch@lst.de, alex.aring@gmail.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v5 1/1] NFSD: Enforce timeout on layout recall and integrate lease manager fencing
Date: Fri, 6 Feb 2026 15:33:16 -0800
Message-ID: <623f06a7-4e41-4703-95ef-b3476fc4549a@oracle.com>
In-Reply-To: <3cb09bd01df3d43293f2f443ebb6b4a10ea50dee.camel@kernel.org>


On 2/6/26 11:40 AM, Jeff Layton wrote:
> On Fri, 2026-02-06 at 10:17 -0800, Dai Ngo wrote:
>> On 2/6/26 6:28 AM, Jeff Layton wrote:
>>> On Thu, 2026-02-05 at 12:29 -0800, Dai Ngo wrote:
>>>> When a layout conflict triggers a recall, enforcing a timeout is
>>>> necessary to prevent too many nfsd threads from being blocked in
>>>> __break_lease, ensuring the server continues servicing incoming
>>>> requests efficiently.
>>>>
>>>> This patch introduces a new function to lease_manager_operations:
>>>>
>>>> lm_breaker_timedout: Invoked when a lease recall times out and is
>>>> about to be disposed of. This function enables the lease manager
>>>> to inform the caller whether the file_lease should remain on the
>>>> flc_list or be disposed of.
>>>>
>>>> For the NFSD lease manager, this function now handles layout recall
>>>> timeouts. If the layout type supports fencing and the client has not
>>>> been fenced, a fence operation is triggered to prevent the client
>>>> from accessing the block device.
>>>>
>>>> While the fencing operation is in progress, the conflicting file_lease
>>>> remains on the flc_list until fencing is complete. This guarantees
>>>> that no other clients can access the file, and the client with
>>>> exclusive access is properly blocked before disposal.
>>>>
>>> Fair point. However...
>>>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>>    Documentation/filesystems/locking.rst |   2 +
>>>>    fs/locks.c                            |  15 +++-
>>>>    fs/nfsd/blocklayout.c                 |  41 ++++++++--
>>>>    fs/nfsd/nfs4layouts.c                 | 113 +++++++++++++++++++++++++-
>>>>    fs/nfsd/nfs4state.c                   |   1 +
>>>>    fs/nfsd/pnfs.h                        |   2 +-
>>>>    fs/nfsd/state.h                       |   8 ++
>>>>    include/linux/filelock.h              |   1 +
>>>>    8 files changed, 169 insertions(+), 14 deletions(-)
>>>>
>>>> v2:
>>>>       . Update Subject line to include fencing operation.
>>>>       . Allow conflicting lease to remain on flc_list until fencing
>>>>         is complete.
>>>>       . Use system worker to perform fencing operation asynchronously.
>>>>       . Use nfs4_stid.sc_count to ensure layout stateid remains
>>>>         valid before starting the fencing operation, nfs4_stid.sc_count
>>>>         is released after fencing operation is complete.
>>>>       . Rework nfsd4_scsi_fence_client to:
>>>>            . wait for fencing to complete before exiting.
>>>>            . wait for any fencing in progress to complete before
>>>>              checking the NFSD_MDS_PR_FENCED flag.
>>>>       . Remove lm_need_to_retry from lease_manager_operations.
>>>> v3:
>>>>       . correct locking requirement in locking.rst.
>>>>       . add max retry count to fencing operation.
>>>>       . add missing nfs4_put_stid in nfsd4_layout_fence_worker.
>>>>       . remove special-casing of FL_LAYOUT in lease_modify.
>>>>       . remove lease_want_dispose.
>>>>       . move lm_breaker_timedout call to time_out_leases.
>>>> v4:
>>>>       . only increment ls_fence_retry_cnt after successfully
>>>>         scheduling new work in nfsd4_layout_lm_breaker_timedout.
>>>> v5:
>>>>       . take reference count on layout stateid before starting
>>>>         fence worker.
>>>>       . restore comments in nfsd4_scsi_fence_client and the
>>>>         code that checks for specific errors.
>>>>       . cancel fence worker before freeing layout stateid.
>>>>       . increase fence retry from 5 to 20.
>>>>
>>>> NOTE:
>>>>       I experimented with having the fence worker handle lease
>>>>       disposal after fencing the client. However, this requires
>>>>       the lease code to export the lease_dispose_list function,
>>>>       and for the fence worker to acquire the flc_lock in order
>>>>       to perform the disposal. This approach adds unnecessary
>>>>       complexity and reduces code clarity, as it exposes internal
>>>>       lease code details to the nfsd worker, which should not
>>>>       be the case.
>>>>
>>>>       Instead, the lm_breaker_timedout operation should simply
>>>>       notify the lease code about how to handle a lease that
>>>>       times out during a lease break, rather than directly
>>>>       manipulating the lease list.
>>>>
>>> Ok, fair point.
>>>
>>>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>>>> index 04c7691e50e0..79bee9ae8bc3 100644
>>>> --- a/Documentation/filesystems/locking.rst
>>>> +++ b/Documentation/filesystems/locking.rst
>>>> @@ -403,6 +403,7 @@ prototypes::
>>>>    	bool (*lm_breaker_owns_lease)(struct file_lock *);
>>>>            bool (*lm_lock_expirable)(struct file_lock *);
>>>>            void (*lm_expire_lock)(void);
>>>> +        bool (*lm_breaker_timedout)(struct file_lease *);
>>>>    
>>>>    locking rules:
>>>>    
>>>> @@ -417,6 +418,7 @@ lm_breaker_owns_lease:	yes     	no			no
>>>>    lm_lock_expirable	yes		no			no
>>>>    lm_expire_lock		no		no			yes
>>>>    lm_open_conflict	yes		no			no
>>>> +lm_breaker_timedout     yes             no                      no
>>>>    ======================	=============	=================	=========
>>>>    
>>>>    buffer_head
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index 46f229f740c8..0e77423cf000 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -1524,6 +1524,7 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>>>>    {
>>>>    	struct file_lock_context *ctx = inode->i_flctx;
>>>>    	struct file_lease *fl, *tmp;
>>>> +	bool remove = true;
>>>>    
>>>>    	lockdep_assert_held(&ctx->flc_lock);
>>>>    
>>>> @@ -1531,8 +1532,18 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
>>>>    		trace_time_out_leases(inode, fl);
>>>>    		if (past_time(fl->fl_downgrade_time))
>>>>    			lease_modify(fl, F_RDLCK, dispose);
>>>> -		if (past_time(fl->fl_break_time))
>>>> -			lease_modify(fl, F_UNLCK, dispose);
>>>> +
>>>> +		if (past_time(fl->fl_break_time)) {
>>>> +			/*
>>>> +			 * Consult the lease manager when a lease break times
>>>> +			 * out to determine whether the lease should be disposed
>>>> +			 * of.
>>>> +			 */
>>>> +			if (fl->fl_lmops && fl->fl_lmops->lm_breaker_timedout)
>>>> +				remove = fl->fl_lmops->lm_breaker_timedout(fl);
>>>> +			if (remove)
>>>> +				lease_modify(fl, F_UNLCK, dispose);
>>> When remove is false, and lease_modify() doesn't happen (i.e., the
>>> common case where we queue the wq job), when do you actually remove the
>>> lease?
>> The lease is removed after the fence worker completes the fencing
>> operation and sets ls_fenced to true. The next time
>> __break_lease/time_out_leases calls lm_breaker_timedout,
>> nfsd4_layout_lm_breaker_timedout returns true since ls_fenced is now set.
>>
>>> Are you just assuming that after the client is fenced, that the layout
>>> stateid's refcount will go to zero? I'm curious what drives that
>>> process, if so.
>> No, after completing the fence operation, the fence worker drops its
>> reference on the layout stateid by calling nfs4_put_stid(). If
>> the reference count drops to 0, the layout stateid is freed at that
>> point; otherwise it is freed when the CB_RECALL callback times
>> out.
>>
> In principle the stateid could stick around for a while after the fence
> has occurred. It would be better to unlock the lease as soon as the
> fencing is done, so that tasks waiting on it can proceed (a'la
> kernel_setlease() with F_UNLCK).

Good suggestion, will fix in v6.
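
For reference, here is a minimal user-space sketch of the v5 flow
discussed above: the break-time branch of time_out_leases() consults the
lease manager, which keeps the lease on the list until fencing has
completed. This is only a model under stated assumptions, not the patch
itself. Every model_* name is hypothetical, the fence worker is reduced to
a plain function call, and locking, retry counting and stateid reference
counting are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct model_lease;

struct model_lmops {
	/* Return true if the timed-out lease may be disposed of. */
	bool (*breaker_timedout)(struct model_lease *fl);
};

struct model_lease {
	const struct model_lmops *lmops;
	bool fenced;       /* models ls_fenced */
	bool fence_queued; /* models a pending fence worker */
	bool on_list;      /* models membership on flc_list */
};

/* Models the NFSD callback: queue fencing, keep the lease until fenced. */
static bool model_layout_breaker_timedout(struct model_lease *fl)
{
	if (fl->fenced)
		return true;
	fl->fence_queued = true;
	return false;
}

/* Models the fence worker finishing; it only runs if it was queued. */
static void model_fence_worker_done(struct model_lease *fl)
{
	assert(fl->fence_queued);
	fl->fence_queued = false;
	fl->fenced = true;
}

/* Models the break-time branch of time_out_leases() for a single lease. */
static void model_time_out_lease(struct model_lease *fl)
{
	bool remove = true; /* default when no callback is provided */

	if (fl->lmops && fl->lmops->breaker_timedout)
		remove = fl->lmops->breaker_timedout(fl);
	if (remove)
		fl->on_list = false; /* stands in for lease_modify(F_UNLCK) */
}

static const struct model_lmops model_nfsd_lmops = {
	.breaker_timedout = model_layout_breaker_timedout,
};
```

In this model the first timeout pass leaves the lease in place and queues
the fence; only a later pass, after the worker has finished, removes it.
The sketch scopes `remove` to a single lease so that one lease's answer
cannot carry over to the next; that is a choice made for the sketch.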

Thanks,
-Dai


