From: Jeff Layton <jlayton@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>,
chuck.lever@oracle.com, neil@brown.name, okorniev@redhat.com,
tom@talpey.com, hch@lst.de, alex.aring@gmail.com,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v5 1/1] NFSD: Enforce timeout on layout recall and integrate lease manager fencing
Date: Fri, 06 Feb 2026 14:40:03 -0500 [thread overview]
Message-ID: <3cb09bd01df3d43293f2f443ebb6b4a10ea50dee.camel@kernel.org> (raw)
In-Reply-To: <6a28e81b-1e2e-4457-8bec-4312e6d3246f@oracle.com>
On Fri, 2026-02-06 at 10:17 -0800, Dai Ngo wrote:
> On 2/6/26 6:28 AM, Jeff Layton wrote:
> > On Thu, 2026-02-05 at 12:29 -0800, Dai Ngo wrote:
> > > When a layout conflict triggers a recall, enforcing a timeout is
> > > necessary to prevent excessive nfsd threads from being blocked in
> > > __break_lease ensuring the server continues servicing incoming
> > > requests efficiently.
> > >
> > > This patch introduces a new function to lease_manager_operations:
> > >
> > > lm_breaker_timedout: Invoked when a lease recall times out and is
> > > about to be disposed of. This function enables the lease manager
> > > to inform the caller whether the file_lease should remain on the
> > > flc_list or be disposed of.
> > >
> > > For the NFSD lease manager, this function now handles layout recall
> > > timeouts. If the layout type supports fencing and the client has not
> > > been fenced, a fence operation is triggered to prevent the client
> > > from accessing the block device.
> > >
> > > While the fencing operation is in progress, the conflicting file_lease
> > > remains on the flc_list until fencing is complete. This guarantees
> > > that no other clients can access the file, and the client with
> > > exclusive access is properly blocked before disposal.
> > >
> > Fair point. However...
> >
> > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > ---
> > > Documentation/filesystems/locking.rst | 2 +
> > > fs/locks.c | 15 +++-
> > > fs/nfsd/blocklayout.c | 41 ++++++++--
> > > fs/nfsd/nfs4layouts.c | 113 +++++++++++++++++++++++++-
> > > fs/nfsd/nfs4state.c | 1 +
> > > fs/nfsd/pnfs.h | 2 +-
> > > fs/nfsd/state.h | 8 ++
> > > include/linux/filelock.h | 1 +
> > > 8 files changed, 169 insertions(+), 14 deletions(-)
> > >
> > > v2:
> > > . Update Subject line to include fencing operation.
> > > . Allow conflicting lease to remain on flc_list until fencing
> > > is complete.
> > > . Use system worker to perform fencing operation asynchronously.
> > > . Use nfs4_stid.sc_count to ensure layout stateid remains
> > > valid before starting the fencing operation, nfs4_stid.sc_count
> > > is released after fencing operation is complete.
> > > . Rework nfsd4_scsi_fence_client to:
> > > . wait until fencing to complete before exiting.
> > > . wait until fencing in progress to complete before
> > > checking the NFSD_MDS_PR_FENCED flag.
> > > . Remove lm_need_to_retry from lease_manager_operations.
> > > v3:
> > > . correct locking requirement in locking.rst.
> > > . add max retry count to fencing operation.
> > > . add missing nfs4_put_stid in nfsd4_layout_fence_worker.
> > > . remove special-casing of FL_LAYOUT in lease_modify.
> > > . remove lease_want_dispose.
> > > . move lm_breaker_timedout call to time_out_leases.
> > > v4:
> > > . only increment ls_fence_retry_cnt after successfully
> > > schedule new work in nfsd4_layout_lm_breaker_timedout.
> > > v5:
> > > . take reference count on layout stateid before starting
> > > fence worker.
> > > . restore comments in nfsd4_scsi_fence_client and the
> > > code that check for specific errors.
> > > . cancel fence worker before freeing layout stateid.
> > > . increase fence retry from 5 to 20.
> > >
> > > NOTE:
> > > I experimented with having the fence worker handle lease
> > > disposal after fencing the client. However, this requires
> > > the lease code to export the lease_dispose_list function,
> > > and for the fence worker to acquire the flc_lock in order
> > > to perform the disposal. This approach adds unnecessary
> > > complexity and reduces code clarity, as it exposes internal
> > > lease code details to the nfsd worker, which should not
> > > be the case.
> > >
> > > Instead, the lm_breaker_timedout operation should simply
> > > notify the lease code about how to handle a lease that
> > > times out during a lease break, rather than directly
> > > manipulating the lease list.
> > >
> > Ok, fair point.
> >
> > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> > > index 04c7691e50e0..79bee9ae8bc3 100644
> > > --- a/Documentation/filesystems/locking.rst
> > > +++ b/Documentation/filesystems/locking.rst
> > > @@ -403,6 +403,7 @@ prototypes::
> > > bool (*lm_breaker_owns_lease)(struct file_lock *);
> > > bool (*lm_lock_expirable)(struct file_lock *);
> > > void (*lm_expire_lock)(void);
> > > + bool (*lm_breaker_timedout)(struct file_lease *);
> > >
> > > locking rules:
> > >
> > > @@ -417,6 +418,7 @@ lm_breaker_owns_lease: yes no no
> > > lm_lock_expirable yes no no
> > > lm_expire_lock no no yes
> > > lm_open_conflict yes no no
> > > +lm_breaker_timedout yes no no
> > > ====================== ============= ================= =========
> > >
> > > buffer_head
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 46f229f740c8..0e77423cf000 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -1524,6 +1524,7 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
> > > {
> > > struct file_lock_context *ctx = inode->i_flctx;
> > > struct file_lease *fl, *tmp;
> > > + bool remove = true;
> > >
> > > lockdep_assert_held(&ctx->flc_lock);
> > >
> > > @@ -1531,8 +1532,18 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
> > > trace_time_out_leases(inode, fl);
> > > if (past_time(fl->fl_downgrade_time))
> > > lease_modify(fl, F_RDLCK, dispose);
> > > - if (past_time(fl->fl_break_time))
> > > - lease_modify(fl, F_UNLCK, dispose);
> > > +
> > > + if (past_time(fl->fl_break_time)) {
> > > + /*
> > > + * Consult the lease manager when a lease break times
> > > + * out to determine whether the lease should be disposed
> > > + * of.
> > > + */
> > > + if (fl->fl_lmops && fl->fl_lmops->lm_breaker_timedout)
> > > + remove = fl->fl_lmops->lm_breaker_timedout(fl);
> > > + if (remove)
> > > + lease_modify(fl, F_UNLCK, dispose);
> > When remove is false, and lease_modify() doesn't happen (i.e., the
> > common case where we queue the wq job), when do you actually remove the
> > lease?
>
> The lease is removed when the fence worker completes the fencing operation
> and set ls_fenced to true. When __break_lease/time_out_leases calls
> lm_breaker_timedout again, nfsd4_layout_lm_breaker_timedout returns true
> since ls_fenced is now set.
>
> >
> > Are you just assuming that after the client is fenced, that the layout
> > stateid's refcount will go to zero? I'm curious what drives that
> > process, if so.
>
> No, after completing the fence operation, the fenced worker drops the
> reference count on the layout stateid by calling nfs4_put_stid(). If
> the reference drops to 0 then the layout stateid is freed at this
> point, otherwise it will be freed when the CB_RECALL callback times
> out.
>
In principle the stateid could stick around for a while after the fence
has occurred. It would be better to unlock the lease as soon as the
fencing is done, so that tasks waiting on it can proceed (a'la
kernel_setlease() with F_UNLCK).
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2026-02-06 19:40 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-05 20:29 [PATCH v5 1/1] NFSD: Enforce timeout on layout recall and integrate lease manager fencing Dai Ngo
2026-02-06 6:16 ` Christoph Hellwig
2026-02-06 18:28 ` Dai Ngo
2026-02-06 14:28 ` Jeff Layton
2026-02-06 18:17 ` Dai Ngo
2026-02-06 19:40 ` Jeff Layton [this message]
2026-02-06 23:33 ` Dai Ngo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3cb09bd01df3d43293f2f443ebb6b4a10ea50dee.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=alex.aring@gmail.com \
--cc=brauner@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox