linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>,
	Benjamin Coddington <bcodding@hammerspace.com>
Cc: chuck.lever@oracle.com, neilb@ownmail.net, okorniev@redhat.com,
	 tom@talpey.com, hch@lst.de, alex.aring@gmail.com,
	viro@zeniv.linux.org.uk,  brauner@kernel.org, jack@suse.cz,
	linux-fsdevel@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org
Subject: Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
Date: Tue, 11 Nov 2025 10:45:26 -0500	[thread overview]
Message-ID: <bcfae669888a1a35e63cdcf09938b23db003abb3.camel@kernel.org> (raw)
In-Reply-To: <e38104d7-1c2e-4791-aa78-d4458233dcb6@oracle.com>

On Tue, 2025-11-11 at 07:24 -0800, Dai Ngo wrote:
> Hi Ben,
> 
> On 11/9/25 10:34 AM, Benjamin Coddington wrote:
> > On 6 Nov 2025, at 12:05, Dai Ngo wrote:
> > 
> > > When a layout conflict triggers a call to __break_lease, the function
> > > nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> > > the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> > > its loop, waiting indefinitely for the conflicting file lease to be
> > > released.
> > > 
> > > If the number of lease conflicts matches the number of NFSD threads (which
> > > defaults to 8), all available NFSD threads become occupied. Consequently,
> > > there are no threads left to handle incoming requests or callback replies,
> > > leading to a total hang of the NFS server.
> > > 
> > > This issue is reliably reproducible by running the Git test suite on a
> > > configuration using SCSI layout.
> > > 
> > > This patchset fixes this problem by introducing the new lm_breaker_timedout
> > > operation to lease_manager_operations and using timeout for layout
> > > lease break.
> > Hey Dai,
> > 
> > I like your solution here, but I worry it can cause unexpected or
> > unnecessary client fencing when the problem is server-side (not enough
> > threads).  Clients might be dutifully sending LAYOUTRETURN, but the server
> > can't service them
> 
> I agreed. This is a server problem and we penalize the client. We need
> a long term solution for dealing resource shortage (server threads)
> problem.
> 
> Fortunately, the client can detect reservation conflict errors and appears
> to retry the I/O. Also, the client will ask for new layout and in the
> process it re-registers its reservation key so I/O will continue.
> 
> >   - and this change will cause some potentially unexpected
> > fencing in environments where things could be fixed (by adding more knfsd
> > threads).
> >    Also, I think we significantly bumped default thread counts
> > recently in nfs-utils:
> > eb5abb5c60ab (tag: nfs-utils-2-8-2-rc3) nfsd: dump default number of threads to 16
> 
> This helps a bit but if there is always a chance that there is a load
> that requires more than the number of server threads.
> 
> > 
> > You probably have already seen previous discussions about this:
> > https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/1CC82EC5-6120-4EE4-A7F0-019CF7BC762C@redhat.com/__;!!ACWV5N9M2RV99hQ!Pq4vHQs-qk71XjZ0vOkONTD7nxkuyUUEKTBsJJ0L_OrFWudokphCyc2V0q0_OrNoGD3KnsgoHKp7rb_lDcs$
> > 
> > This also changes the behavior for all layouts, I haven't thought through
> > the implications of that - but I wish we could have knob for this behavior,
> > or perhaps a knfsd-specific fl_break_time tuneable.
> 
> There is already a knob to tune the fl_break_time:
> # cat /proc/sys/fs/lease-break-time
> 
> but currently lease-break-time is in seconds so the minimum we can set
> is 1 which I think is still too long to tight up a server thread.
> 
> > 
> > Last thought (for now): I think Neil has some work for dynamic knfsd thread
> > count.. or Jeff?  (I am having trouble finding it) Would that work around
> > this problem?
> 


It would help, up to a point, but so does increasing the static thread
count. Even if we get dynamically sized threadpool, we'll likely still
have a hard cap on the number of threads. We'll always going to be
subject to this if VFS operations are going to be blocking on
delegation breaks.

> This would help, and I prefer this route rather than rework __break_lease
> to return EAGAIN/jukebox while the server recalling the layout.
> 


One way I can see to address this properly is to allow for non-blocking
lease breaks in some fashion. Basically, have the fs return
-EAGAIN (maybe after a short wait) at some point so that maybe
LAYOUTRETURN can get through once the client retries).

Plumbing that intent down to the actual break_layout() calls is a
problem though -- that's a lot of layers. I wonder if we need some per-
task flag that tells the layout engine "always do non-blocking lease
breaks"? That sounds pretty ugly too.

The only other way I could see to fix this is to move to an
asynchronous model of some sort. IOW, have at least some operations
(anything that could conceivably cause a layout break) done
asynchronously.

Then you could dispatch the operation and put the rqstp on a some sort
of waitqueue, and then let the thread do more work instead of blocking.
When the work is done, just requeue the rqstp to send the reply.

Just thinking out loud, but maybe we could use io_uring's underlying
infrastructure for this? Basically, set up an io_uring but do it all in
kernel space in nfsd thread context?
-- 
Jeff Layton <jlayton@kernel.org>

      parent reply	other threads:[~2025-11-11 15:45 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-06 17:05 [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-06 17:05 ` [PATCH 1/2] locks: Introduce lm_breaker_timedout op to lease_manager_operations Dai Ngo
2025-11-07 13:26   ` Christoph Hellwig
2025-11-07 16:58     ` Dai Ngo
2025-11-06 17:05 ` [PATCH 2/2] NFSD: Fix server hang when there are multiple layout conflicts Dai Ngo
2025-11-07 13:29   ` Christoph Hellwig
2025-11-07 17:01     ` Dai Ngo
2025-11-07 13:30 ` [Patch 0/2] " Christoph Hellwig
2025-11-09 18:34 ` Benjamin Coddington
2025-11-11 15:24   ` Dai Ngo
2025-11-11 15:34     ` Chuck Lever
2025-11-11 15:36       ` Christoph Hellwig
2025-11-11 15:43       ` Dai Ngo
2025-11-11 15:53         ` Chuck Lever
2025-11-11 15:45     ` Jeff Layton [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bcfae669888a1a35e63cdcf09938b23db003abb3.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=alex.aring@gmail.com \
    --cc=bcodding@hammerspace.com \
    --cc=brauner@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=dai.ngo@oracle.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@ownmail.net \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).