From: Jeff Layton <jlayton@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>,
chuck.lever@oracle.com, neil@brown.name, okorniev@redhat.com,
tom@talpey.com, hch@lst.de, alex.aring@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] NFSD: Enforce recall timeout for layout conflict
Date: Tue, 20 Jan 2026 16:28:14 -0500 [thread overview]
Message-ID: <f2203e755aca4da45b099b18aac03b0a9d299343.camel@kernel.org> (raw)
In-Reply-To: <a1dc8306-6422-45c8-a5b0-8d10a4d89279@oracle.com>
On Tue, 2026-01-20 at 13:22 -0800, Dai Ngo wrote:
> On 1/20/26 12:41 PM, Jeff Layton wrote:
> > On Mon, 2026-01-19 at 09:47 -0800, Dai Ngo wrote:
> > > When a layout conflict triggers a recall, enforcing a timeout
> > > is necessary to prevent excessive nfsd threads from being tied
> > > up in __break_lease and ensure the server can continue servicing
> > > incoming requests efficiently.
> > >
> > > This patch introduces two new functions in lease_manager_operations:
> > >
> > > 1. lm_breaker_timedout: Invoked when a lease recall times out,
> > > allowing the lease manager to take appropriate action.
> > >
> > > The NFSD lease manager uses this to handle layout recall
> > > timeouts. If the layout type supports fencing, a fence
> > > operation is issued to prevent the client from accessing
> > > the block device.
> > >
> > > 2. lm_need_to_retry: Invoked when there is a lease conflict.
> > > This allows the lease manager to instruct __break_lease
> > > to return an error to the caller, prompting a retry of
> > > the conflicting operation.
> > >
> > > The NFSD lease manager uses this to avoid excessive nfsd
> > > from being blocked in __break_lease, which could hinder
> > > the server's ability to service incoming requests.
> > >
> > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > ---
> > > Documentation/filesystems/locking.rst | 4 ++
> > > fs/locks.c | 29 +++++++++++-
> > > fs/nfsd/nfs4layouts.c | 65 +++++++++++++++++++++++++--
> > > include/linux/filelock.h | 7 +++
> > > 4 files changed, 100 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> > > index 04c7691e50e0..ae9a1b207b95 100644
> > > --- a/Documentation/filesystems/locking.rst
> > > +++ b/Documentation/filesystems/locking.rst
> > > @@ -403,6 +403,8 @@ prototypes::
> > > bool (*lm_breaker_owns_lease)(struct file_lock *);
> > > bool (*lm_lock_expirable)(struct file_lock *);
> > > void (*lm_expire_lock)(void);
> > > + void (*lm_breaker_timedout)(struct file_lease *);
> > > + bool (*lm_need_to_retry)(struct file_lease *, struct file_lock_context *);
> > >
> > > locking rules:
> > >
> > > @@ -417,6 +419,8 @@ lm_breaker_owns_lease: yes no no
> > > lm_lock_expirable yes no no
> > > lm_expire_lock no no yes
> > > lm_open_conflict yes no no
> > > +lm_breaker_timedout no no yes
> > > +lm_need_to_retry yes no no
> > > ====================== ============= ================= =========
> > >
> > > buffer_head
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 46f229f740c8..cd08642ab8bb 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -381,6 +381,14 @@ lease_dispose_list(struct list_head *dispose)
> > > while (!list_empty(dispose)) {
> > > flc = list_first_entry(dispose, struct file_lock_core, flc_list);
> > > list_del_init(&flc->flc_list);
> > > + if (flc->flc_flags & FL_BREAKER_TIMEDOUT) {
> > > + struct file_lease *fl;
> > > +
> > > + fl = file_lease(flc);
> > > + if (fl->fl_lmops &&
> > > + fl->fl_lmops->lm_breaker_timedout)
> > > + fl->fl_lmops->lm_breaker_timedout(fl);
> > > + }
> > > locks_free_lease(file_lease(flc));
> > > }
> > > }
> > > @@ -1531,8 +1539,10 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
> > > trace_time_out_leases(inode, fl);
> > > if (past_time(fl->fl_downgrade_time))
> > > lease_modify(fl, F_RDLCK, dispose);
> > > - if (past_time(fl->fl_break_time))
> > > + if (past_time(fl->fl_break_time)) {
> > > lease_modify(fl, F_UNLCK, dispose);
> > > + fl->c.flc_flags |= FL_BREAKER_TIMEDOUT;
> > > + }
> > When the lease times out, you go ahead and remove it but then mark it
> > with FL_BREAKER_TIMEDOUT. Then later, you call ->lm_breaker_timedout if
> > that's set.
> >
> > That means that when this happens, there is a window of time where
> > there is no lease, but the rogue client isn't yet fenced. That sounds
> > like a problem as you could allow competing access.
>
> I have to think more about the implication of competing access. Since
> the thread that detects the conflict is in the process of fencing the
> other client and has not accessed the file data yet, I don't see the
> problem of allowing the other client to continue access the file until
> fence operation completed.
>
Isn't the whole point of write layout leases to grant exclusive access
to an external client? At the point where you lose the lease, any
competing access can then proceed. Maybe a local file writer starts
writing to the file at that point. But...what if the client is still
writing stuff to the backing store? Won't that corrupt data (and maybe
metadata)?
> >
> > I think you'll have to do this in reverse order: fence the client and
> > then remove the lease.
> >
> > > }
> > > }
> > >
> > > @@ -1633,6 +1643,8 @@ int __break_lease(struct inode *inode, unsigned int flags)
> > > list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list) {
> > > if (!leases_conflict(&fl->c, &new_fl->c))
> > > continue;
> > > + if (new_fl->fl_lmops != fl->fl_lmops)
> > > + new_fl->fl_lmops = fl->fl_lmops;
> > > if (want_write) {
> > > if (fl->c.flc_flags & FL_UNLOCK_PENDING)
> > > continue;
> > > @@ -1657,6 +1669,18 @@ int __break_lease(struct inode *inode, unsigned int flags)
> > > goto out;
> > > }
> > >
> > > + /*
> > > + * Check whether the lease manager wants the operation
> > > + * causing the conflict to be retried.
> > > + */
> > > + if (new_fl->fl_lmops && new_fl->fl_lmops->lm_need_to_retry &&
> > > + new_fl->fl_lmops->lm_need_to_retry(new_fl, ctx)) {
> > > + trace_break_lease_noblock(inode, new_fl);
> > > + error = -ERESTARTSYS;
> > > + goto out;
> > > + }
> > > + ctx->flc_in_conflict = true;
> > > +
> > I guess flc_in_conflict is supposed to indicate "hey, we're already
> > doing a layout break on this inode". That seems reasonable, if a little
> > klunky.
> >
> > It would be nice if you could track this flag inside of nfsd's data
> > structures instead (since only it cares about the flag), but I don't
> > think it has any convenient per-inode structures to set this in.
>
> Can we move this flag in to nfsd_file? set the flag there and clear
> the flag when fencing completed.
>
No, there can be several nfsd_file objects per inode. I think that'd be
hard to do.
> >
> > > restart:
> > > fl = list_first_entry(&ctx->flc_lease, struct file_lease, c.flc_list);
> > > break_time = fl->fl_break_time;
> > > @@ -1693,6 +1717,9 @@ int __break_lease(struct inode *inode, unsigned int flags)
> > > spin_unlock(&ctx->flc_lock);
> > > percpu_up_read(&file_rwsem);
> > > lease_dispose_list(&dispose);
> > > + spin_lock(&ctx->flc_lock);
> > > + ctx->flc_in_conflict = false;
> > > + spin_unlock(&ctx->flc_lock);
> > > free_lock:
> > > locks_free_lease(new_fl);
> > > return error;
> > > diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> > > index ad7af8cfcf1f..e7777d6ee8d0 100644
> > > --- a/fs/nfsd/nfs4layouts.c
> > > +++ b/fs/nfsd/nfs4layouts.c
> > > @@ -747,11 +747,9 @@ static bool
> > > nfsd4_layout_lm_break(struct file_lease *fl)
> > > {
> > > /*
> > > - * We don't want the locks code to timeout the lease for us;
> > > - * we'll remove it ourself if a layout isn't returned
> > > - * in time:
> > > + * Enforce break lease timeout to prevent NFSD
> > > + * thread from hanging in __break_lease.
> > > */
> > > - fl->fl_break_time = 0;
> > > nfsd4_recall_file_layout(fl->c.flc_owner);
> > > return false;
> > > }
> > > @@ -782,10 +780,69 @@ nfsd4_layout_lm_open_conflict(struct file *filp, int arg)
> > > return 0;
> > > }
> > >
> > > +/**
> > > + * nfsd_layout_breaker_timedout - The layout recall has timed out.
> > Please fix this kdoc header.
>
> I noticed this too, will fix in v2.
>
> >
> > > + * If the layout type supports fence operation then do it to stop
> > > + * the client from accessing the block device.
> > > + *
> > > + * @fl: file to check
> > > + *
> > > + * Return value: None.
> > > + */
> > > +static void
> > > +nfsd4_layout_lm_breaker_timedout(struct file_lease *fl)
> > > +{
> > > + struct nfs4_layout_stateid *ls = fl->c.flc_owner;
> > > + struct nfsd_file *nf;
> > > + u32 type;
> > > +
> > > + rcu_read_lock();
> > > + nf = nfsd_file_get(ls->ls_file);
> > > + rcu_read_unlock();
> > > + if (!nf)
> > > + return;
> > > + type = ls->ls_layout_type;
> > > + if (nfsd4_layout_ops[type]->fence_client)
> > > + nfsd4_layout_ops[type]->fence_client(ls, nf);
> > > + nfsd_file_put(nf);
> > > +}
> > > +
> > > +/**
> > > + * nfsd4_layout_lm_conflict - Handle multiple conflicts in the same file.
> > kdoc header is wrong here. This should be for nfsd4_layout_lm_retry().
>
> I noticed this too, will fix in v2. Kernel test robot also
> complains about this.
>
> >
> > > + *
> > > + * This function is called from __break_lease when a conflict occurs.
> > > + * For layout conflicts on the same file, each conflict triggers a
> > > + * layout recall. Only the thread handling the first conflict needs
> > > + * to remain in __break_lease to manage the timeout for these recalls;
> > > + * subsequent threads should not wait in __break_lease.
> > > + *
> > > + * This is done to prevent excessive nfsd threads from becoming tied up
> > > + * in __break_lease, which could hinder the server's ability to service
> > > + * incoming requests.
> > > + *
> > > + * Return true if thread should not wait in __break_lease else return
> > > + * false.
> > > + */
> > > +static bool
> > > +nfsd4_layout_lm_retry(struct file_lease *fl,
> > > + struct file_lock_context *ctx)
> > > +{
> > > + struct svc_rqst *rqstp;
> > > +
> > > + rqstp = nfsd_current_rqst();
> > > + if (!rqstp)
> > > + return false;
> > > + if ((fl->c.flc_flags & FL_LAYOUT) && ctx->flc_in_conflict)
> > This should never be called for anything but a FL_LAYOUT lease, since
> > you're only setting this in nfsd4_layouts_lm_ops.
>
> I will remove the check for FL_LAYOUT in v2.
>
> Thanks,
> -Dai
>
> >
> > > + return true;
> > > + return false;
> > > +}
> > > +
> > > static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
> > > .lm_break = nfsd4_layout_lm_break,
> > > .lm_change = nfsd4_layout_lm_change,
> > > .lm_open_conflict = nfsd4_layout_lm_open_conflict,
> > > + .lm_breaker_timedout = nfsd4_layout_lm_breaker_timedout,
> > > + .lm_need_to_retry = nfsd4_layout_lm_retry,
> > > };
> > >
> > > int
> > > diff --git a/include/linux/filelock.h b/include/linux/filelock.h
> > > index 2f5e5588ee07..6967af8b7fd2 100644
> > > --- a/include/linux/filelock.h
> > > +++ b/include/linux/filelock.h
> > > @@ -17,6 +17,7 @@
> > > #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
> > > #define FL_LAYOUT 2048 /* outstanding pNFS layout */
> > > #define FL_RECLAIM 4096 /* reclaiming from a reboot server */
> > > +#define FL_BREAKER_TIMEDOUT 8192 /* lease breaker timed out */
> > >
> > > #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
> > >
> > > @@ -50,6 +51,9 @@ struct lease_manager_operations {
> > > void (*lm_setup)(struct file_lease *, void **);
> > > bool (*lm_breaker_owns_lease)(struct file_lease *);
> > > int (*lm_open_conflict)(struct file *, int);
> > > + void (*lm_breaker_timedout)(struct file_lease *fl);
> > > + bool (*lm_need_to_retry)(struct file_lease *fl,
> > > + struct file_lock_context *ctx);
> > > };
> > >
> > > struct lock_manager {
> > > @@ -145,6 +149,9 @@ struct file_lock_context {
> > > struct list_head flc_flock;
> > > struct list_head flc_posix;
> > > struct list_head flc_lease;
> > > +
> > > + /* for FL_LAYOUT */
> > > + bool flc_in_conflict;
> > > };
> > >
> > > #ifdef CONFIG_FILE_LOCKING
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2026-01-20 21:28 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-19 17:47 [PATCH 1/1] NFSD: Enforce recall timeout for layout conflict Dai Ngo
2026-01-19 22:49 ` kernel test robot
2026-01-20 7:26 ` Christoph Hellwig
2026-01-20 18:54 ` Dai Ngo
2026-01-21 8:38 ` Christoph Hellwig
2026-01-20 15:19 ` Chuck Lever
2026-01-20 19:22 ` Dai Ngo
2026-01-20 20:55 ` Dai Ngo
2026-01-20 20:41 ` Jeff Layton
2026-01-20 21:22 ` Dai Ngo
2026-01-20 21:28 ` Jeff Layton [this message]
2026-01-20 21:42 ` Dai Ngo
2026-01-20 22:03 ` Jeff Layton
2026-01-20 22:48 ` Dai Ngo
2026-01-21 14:34 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f2203e755aca4da45b099b18aac03b0a9d299343.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=alex.aring@gmail.com \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox