public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>,
	chuck.lever@oracle.com, neil@brown.name,  okorniev@redhat.com,
	tom@talpey.com, hch@lst.de, alex.aring@gmail.com
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] NFSD: Enforce recall timeout for layout conflict
Date: Tue, 20 Jan 2026 16:28:14 -0500	[thread overview]
Message-ID: <f2203e755aca4da45b099b18aac03b0a9d299343.camel@kernel.org> (raw)
In-Reply-To: <a1dc8306-6422-45c8-a5b0-8d10a4d89279@oracle.com>

On Tue, 2026-01-20 at 13:22 -0800, Dai Ngo wrote:
> On 1/20/26 12:41 PM, Jeff Layton wrote:
> > On Mon, 2026-01-19 at 09:47 -0800, Dai Ngo wrote:
> > > When a layout conflict triggers a recall, enforcing a timeout
> > > is necessary to prevent excessive nfsd threads from being tied
> > > up in __break_lease and ensure the server can continue servicing
> > > incoming requests efficiently.
> > > 
> > > This patch introduces two new functions in lease_manager_operations:
> > > 
> > > 1. lm_breaker_timedout: Invoked when a lease recall times out,
> > >     allowing the lease manager to take appropriate action.
> > > 
> > >     The NFSD lease manager uses this to handle layout recall
> > >     timeouts. If the layout type supports fencing, a fence
> > >     operation is issued to prevent the client from accessing
> > >     the block device.
> > > 
> > > 2. lm_need_to_retry: Invoked when there is a lease conflict.
> > >     This allows the lease manager to instruct __break_lease
> > >     to return an error to the caller, prompting a retry of
> > >     the conflicting operation.
> > > 
> > >     The NFSD lease manager uses this to avoid excessive nfsd
> > >     from being blocked in __break_lease, which could hinder
> > >     the server's ability to service incoming requests.
> > > 
> > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > ---
> > >   Documentation/filesystems/locking.rst |  4 ++
> > >   fs/locks.c                            | 29 +++++++++++-
> > >   fs/nfsd/nfs4layouts.c                 | 65 +++++++++++++++++++++++++--
> > >   include/linux/filelock.h              |  7 +++
> > >   4 files changed, 100 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> > > index 04c7691e50e0..ae9a1b207b95 100644
> > > --- a/Documentation/filesystems/locking.rst
> > > +++ b/Documentation/filesystems/locking.rst
> > > @@ -403,6 +403,8 @@ prototypes::
> > >   	bool (*lm_breaker_owns_lease)(struct file_lock *);
> > >           bool (*lm_lock_expirable)(struct file_lock *);
> > >           void (*lm_expire_lock)(void);
> > > +        void (*lm_breaker_timedout)(struct file_lease *);
> > > +        bool (*lm_need_to_retry)(struct file_lease *, struct file_lock_context *);
> > >   
> > >   locking rules:
> > >   
> > > @@ -417,6 +419,8 @@ lm_breaker_owns_lease:	yes     	no			no
> > >   lm_lock_expirable	yes		no			no
> > >   lm_expire_lock		no		no			yes
> > >   lm_open_conflict	yes		no			no
> > > +lm_breaker_timedout     no              no                      yes
> > > +lm_need_to_retry        yes             no                      no
> > >   ======================	=============	=================	=========
> > >   
> > >   buffer_head
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 46f229f740c8..cd08642ab8bb 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -381,6 +381,14 @@ lease_dispose_list(struct list_head *dispose)
> > >   	while (!list_empty(dispose)) {
> > >   		flc = list_first_entry(dispose, struct file_lock_core, flc_list);
> > >   		list_del_init(&flc->flc_list);
> > > +		if (flc->flc_flags & FL_BREAKER_TIMEDOUT) {
> > > +			struct file_lease *fl;
> > > +
> > > +			fl = file_lease(flc);
> > > +			if (fl->fl_lmops &&
> > > +					fl->fl_lmops->lm_breaker_timedout)
> > > +				fl->fl_lmops->lm_breaker_timedout(fl);
> > > +		}
> > >   		locks_free_lease(file_lease(flc));
> > >   	}
> > >   }
> > > @@ -1531,8 +1539,10 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
> > >   		trace_time_out_leases(inode, fl);
> > >   		if (past_time(fl->fl_downgrade_time))
> > >   			lease_modify(fl, F_RDLCK, dispose);
> > > -		if (past_time(fl->fl_break_time))
> > > +		if (past_time(fl->fl_break_time)) {
> > >   			lease_modify(fl, F_UNLCK, dispose);
> > > +			fl->c.flc_flags |= FL_BREAKER_TIMEDOUT;
> > > +		}
> > When the lease times out, you go ahead and remove it but then mark it
> > with FL_BREAKER_TIMEDOUT. Then later, you call ->lm_breaker_timedout if
> > that's set.
> > 
> > That means that when this happens, there is a window of time where
> > there is no lease, but the rogue client isn't yet fenced. That sounds
> > like a problem as you could allow competing access.
> 
> I have to think more about the implication of competing access. Since
> the thread that detects the conflict is in the process of fencing the
> other client and has not accessed the file data yet, I don't see the
> problem of allowing the other client to continue access the file until
> fence operation completed.
> 

Isn't the whole point of write layout leases to grant exclusive access
to an external client? At the point where you lose the lease, any
competing access can then proceed. Maybe a local file writer starts
writing to the file at that point. But...what if the client is still
writing stuff to the backing store? Won't that corrupt data (and maybe
metadata)?

> > 
> > I think you'll have to do this in reverse order: fence the client and
> > then remove the lease.
> > 
> > >   	}
> > >   }
> > >   
> > > @@ -1633,6 +1643,8 @@ int __break_lease(struct inode *inode, unsigned int flags)
> > >   	list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list) {
> > >   		if (!leases_conflict(&fl->c, &new_fl->c))
> > >   			continue;
> > > +		if (new_fl->fl_lmops != fl->fl_lmops)
> > > +			new_fl->fl_lmops = fl->fl_lmops;
> > >   		if (want_write) {
> > >   			if (fl->c.flc_flags & FL_UNLOCK_PENDING)
> > >   				continue;
> > > @@ -1657,6 +1669,18 @@ int __break_lease(struct inode *inode, unsigned int flags)
> > >   		goto out;
> > >   	}
> > >   
> > > +	/*
> > > +	 * Check whether the lease manager wants the operation
> > > +	 * causing the conflict to be retried.
> > > +	 */
> > > +	if (new_fl->fl_lmops && new_fl->fl_lmops->lm_need_to_retry &&
> > > +			new_fl->fl_lmops->lm_need_to_retry(new_fl, ctx)) {
> > > +		trace_break_lease_noblock(inode, new_fl);
> > > +		error = -ERESTARTSYS;
> > > +		goto out;
> > > +	}
> > > +	ctx->flc_in_conflict = true;
> > > +
> > I guess flc_in_conflict is supposed to indicate "hey, we're already
> > doing a layout break on this inode". That seems reasonable, if a little
> > klunky.
> > 
> > It would be nice if you could track this flag inside of nfsd's data
> > structures instead (since only it cares about the flag), but I don't
> > think it has any convenient per-inode structures to set this in.
> 
> Can we move this flag in to nfsd_file? set the flag there and clear
> the flag when fencing completed.
> 

No, there can be several nfsd_file objects per inode. I think that'd be
hard to do.

> > 
> > >   restart:
> > >   	fl = list_first_entry(&ctx->flc_lease, struct file_lease, c.flc_list);
> > >   	break_time = fl->fl_break_time;
> > > @@ -1693,6 +1717,9 @@ int __break_lease(struct inode *inode, unsigned int flags)
> > >   	spin_unlock(&ctx->flc_lock);
> > >   	percpu_up_read(&file_rwsem);
> > >   	lease_dispose_list(&dispose);
> > > +	spin_lock(&ctx->flc_lock);
> > > +	ctx->flc_in_conflict = false;
> > > +	spin_unlock(&ctx->flc_lock);
> > >   free_lock:
> > >   	locks_free_lease(new_fl);
> > >   	return error;
> > > diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> > > index ad7af8cfcf1f..e7777d6ee8d0 100644
> > > --- a/fs/nfsd/nfs4layouts.c
> > > +++ b/fs/nfsd/nfs4layouts.c
> > > @@ -747,11 +747,9 @@ static bool
> > >   nfsd4_layout_lm_break(struct file_lease *fl)
> > >   {
> > >   	/*
> > > -	 * We don't want the locks code to timeout the lease for us;
> > > -	 * we'll remove it ourself if a layout isn't returned
> > > -	 * in time:
> > > +	 * Enforce break lease timeout to prevent NFSD
> > > +	 * thread from hanging in __break_lease.
> > >   	 */
> > > -	fl->fl_break_time = 0;
> > >   	nfsd4_recall_file_layout(fl->c.flc_owner);
> > >   	return false;
> > >   }
> > > @@ -782,10 +780,69 @@ nfsd4_layout_lm_open_conflict(struct file *filp, int arg)
> > >   	return 0;
> > >   }
> > >   
> > > +/**
> > > + * nfsd_layout_breaker_timedout - The layout recall has timed out.
> > Please fix this kdoc header.
> 
> I noticed this too, will fix in v2.
> 
> > 
> > > + * If the layout type supports fence operation then do it to stop
> > > + * the client from accessing the block device.
> > > + *
> > > + * @fl: file to check
> > > + *
> > > + * Return value: None.
> > > + */
> > > +static void
> > > +nfsd4_layout_lm_breaker_timedout(struct file_lease *fl)
> > > +{
> > > +	struct nfs4_layout_stateid *ls = fl->c.flc_owner;
> > > +	struct nfsd_file *nf;
> > > +	u32 type;
> > > +
> > > +	rcu_read_lock();
> > > +	nf = nfsd_file_get(ls->ls_file);
> > > +	rcu_read_unlock();
> > > +	if (!nf)
> > > +		return;
> > > +	type = ls->ls_layout_type;
> > > +	if (nfsd4_layout_ops[type]->fence_client)
> > > +		nfsd4_layout_ops[type]->fence_client(ls, nf);
> > > +	nfsd_file_put(nf);
> > > +}
> > > +
> > > +/**
> > > + * nfsd4_layout_lm_conflict - Handle multiple conflicts in the same file.
> > kdoc header is wrong here. This should be for nfsd4_layout_lm_retry().
> 
> I noticed this too, will fix in v2. Kernel test robot also
> complains about this.
> 
> > 
> > > + *
> > > + * This function is called from __break_lease when a conflict occurs.
> > > + * For layout conflicts on the same file, each conflict triggers a
> > > + * layout  recall. Only the thread handling the first conflict needs
> > > + * to remain in __break_lease to manage the timeout for these recalls;
> > > + * subsequent threads should not wait in __break_lease.
> > > + *
> > > + * This is done to prevent excessive nfsd threads from becoming tied up
> > > + * in __break_lease, which could hinder the server's ability to service
> > > + * incoming requests.
> > > + *
> > > + * Return true if thread should not wait in __break_lease else return
> > > + * false.
> > > + */
> > > +static bool
> > > +nfsd4_layout_lm_retry(struct file_lease *fl,
> > > +				struct file_lock_context *ctx)
> > > +{
> > > +	struct svc_rqst *rqstp;
> > > +
> > > +	rqstp = nfsd_current_rqst();
> > > +	if (!rqstp)
> > > +		return false;
> > > +	if ((fl->c.flc_flags & FL_LAYOUT) && ctx->flc_in_conflict)
> > This should never be called for anything but a FL_LAYOUT lease, since
> > you're only setting this in nfsd4_layouts_lm_ops.
> 
> I will remove the check for FL_LAYOUT in v2.
> 
> Thanks,
> -Dai
> 
> > 
> > > +		return true;
> > > +	return false;
> > > +}
> > > +
> > >   static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
> > >   	.lm_break		= nfsd4_layout_lm_break,
> > >   	.lm_change		= nfsd4_layout_lm_change,
> > >   	.lm_open_conflict	= nfsd4_layout_lm_open_conflict,
> > > +	.lm_breaker_timedout	= nfsd4_layout_lm_breaker_timedout,
> > > +	.lm_need_to_retry	= nfsd4_layout_lm_retry,
> > >   };
> > >   
> > >   int
> > > diff --git a/include/linux/filelock.h b/include/linux/filelock.h
> > > index 2f5e5588ee07..6967af8b7fd2 100644
> > > --- a/include/linux/filelock.h
> > > +++ b/include/linux/filelock.h
> > > @@ -17,6 +17,7 @@
> > >   #define FL_OFDLCK	1024	/* lock is "owned" by struct file */
> > >   #define FL_LAYOUT	2048	/* outstanding pNFS layout */
> > >   #define FL_RECLAIM	4096	/* reclaiming from a reboot server */
> > > +#define	FL_BREAKER_TIMEDOUT	8192	/* lease breaker timed out */
> > >   
> > >   #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
> > >   
> > > @@ -50,6 +51,9 @@ struct lease_manager_operations {
> > >   	void (*lm_setup)(struct file_lease *, void **);
> > >   	bool (*lm_breaker_owns_lease)(struct file_lease *);
> > >   	int (*lm_open_conflict)(struct file *, int);
> > > +	void (*lm_breaker_timedout)(struct file_lease *fl);
> > > +	bool (*lm_need_to_retry)(struct file_lease *fl,
> > > +			struct file_lock_context *ctx);
> > >   };
> > >   
> > >   struct lock_manager {
> > > @@ -145,6 +149,9 @@ struct file_lock_context {
> > >   	struct list_head	flc_flock;
> > >   	struct list_head	flc_posix;
> > >   	struct list_head	flc_lease;
> > > +
> > > +	/* for FL_LAYOUT */
> > > +	bool			flc_in_conflict;
> > >   };
> > >   
> > >   #ifdef CONFIG_FILE_LOCKING

-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2026-01-20 21:28 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-19 17:47 [PATCH 1/1] NFSD: Enforce recall timeout for layout conflict Dai Ngo
2026-01-19 22:49 ` kernel test robot
2026-01-20  7:26 ` Christoph Hellwig
2026-01-20 18:54   ` Dai Ngo
2026-01-21  8:38     ` Christoph Hellwig
2026-01-20 15:19 ` Chuck Lever
2026-01-20 19:22   ` Dai Ngo
2026-01-20 20:55   ` Dai Ngo
2026-01-20 20:41 ` Jeff Layton
2026-01-20 21:22   ` Dai Ngo
2026-01-20 21:28     ` Jeff Layton [this message]
2026-01-20 21:42       ` Dai Ngo
2026-01-20 22:03         ` Jeff Layton
2026-01-20 22:48       ` Dai Ngo
2026-01-21 14:34         ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2203e755aca4da45b099b18aac03b0a9d299343.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=alex.aring@gmail.com \
    --cc=chuck.lever@oracle.com \
    --cc=dai.ngo@oracle.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox