From: Jeff Layton <jlayton@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>
Cc: steved@redhat.com, Chuck Lever <chuck.lever@oracle.com>,
NeilBrown <neil@brown.name>,
linux-nfs@vger.kernel.org
Subject: Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
Date: Tue, 27 May 2025 09:50:13 -0400 [thread overview]
Message-ID: <0a6ebf149045d2d7c5379e1187d7b44ee297457b.camel@kernel.org> (raw)
In-Reply-To: <aDHYtTJxeAr5FDRK@kernel.org>
On Sat, 2025-05-24 at 10:33 -0400, Mike Snitzer wrote:
> On Sat, May 24, 2025 at 08:05:19AM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> > > On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > > >
> > > > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > > >
> > > > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > > >
> > > > >
> > > > > Oops, I went and looked and nfsd isn't running in a container on these
> > > > > boxes. There are some other containerized apps running on the box, but
> > > > > nfsd isn't running in a container.
> > > >
> > > > OK.
> > > >
> > > > > > I'm using nfs-utils-2.8.2. I don't see any nfsd threads running if I
> > > > > > use "options sunrpc pool_mode=pernode".
> > > > > >
> > > > >
> > > > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > > > know.
> > > >
> > > > Will do.
> > > >
> > > > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > > > with this stack trace:
> > >
> > > Turns out this pool_mode=pernode issue is a regression caused by the
> > > very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> > > because why not upgrade to the latest!?).
> > >
> > > If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> > > pool_mode=pernode works fine.
> > >
> > > And this issue doesn't have anything to do with running in a container
> > > (it seemed to be container related purely because I happened to be
> > > seeing the issue with an EL9.5 container that had the EL10-based
> > > nfs-utils 2.8.2 installed).
> > >
> > > Steved, unfortunately I'm not sure what the problem is with the newer
> > > nfs-utils and setting "options sunrpc pool_mode=pernode"
> > >
> >
> > I tried to reproduce this using fedora-41 VMs (no f42 available for
> > virt-builder yet), but everything worked. I don't have any actual NUMA
> > hw here though, so maybe that matters?
> >
> > Can you run this on the nfs server and send back the output? I'm
> > wondering if this setting might not track the module option properly on
> > that host for some reason:
> >
> > # nfsdctl pool-mode
>
> (from EL9.5 container with nfs-utils 2.8.2)
> # nfsdctl pool-mode
> pool-mode: pernode
> npools: 2
>
> (on host)
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 11665 MB
> node 0 free: 9892 MB
> node 1 cpus: 8 9 10 11 12 13 14 15
> node 1 size: 6042 MB
> node 1 free: 5127 MB
> node distances:
> node 0 1
> 0: 10 20
> 1: 20 10
>
> (and yeahh I was aware the newer nfs-utils uses the netlink interface,
> will be interesting to pin down what the issue is with
> pool-mode=pernode)

OK, I can reproduce this on a real NUMA machine. The first interesting
thing is that the problem is intermittent: occasionally I can mount and
operate on the export, but most of the time requests on the socket just
hang.
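
For completeness, the pool mode here is set the usual way via a module
option; the file name below is just what I use locally:

```conf
# /etc/modprobe.d/sunrpc.conf -- must be in place before sunrpc loads
options sunrpc pool_mode=pernode
```

`nfsdctl pool-mode` then confirms it took effect, matching Mike's
output above.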
I turned up all of the nfsd and sunrpc tracepoints. After attempting a
mount that hung, I see only a single tracepoint fire:
<idle>-0 [038] ..s.. 5942.572721: svc_xprt_enqueue: server=[::]:2049 client=(einval) flags=CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL
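
For reference, those tracepoints were enabled with the usual tracefs
knobs (default tracefs mount point assumed; requires root):

```sh
# enable every nfsd and sunrpc tracepoint
echo 1 > /sys/kernel/tracing/events/nfsd/enable
echo 1 > /sys/kernel/tracing/events/sunrpc/enable
# ...reproduce the hang, then collect the buffer:
cat /sys/kernel/tracing/trace
```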
Based on those flags, svc_xprt_ready() should have returned true, which
should cause the xprt to be enqueued and an idle thread to be woken. It
looks like that last step isn't happening for some reason.
At this point, I'll probably have to add some debugging. I'll keep
poking at it -- stay tuned.
--
Jeff Layton <jlayton@kernel.org>
Thread overview: 15+ messages
2025-05-23 18:29 unable to run NFSD in container if "options sunrpc pool_mode=pernode" Mike Snitzer
2025-05-23 18:40 ` Jeff Layton
2025-05-23 22:19 ` Mike Snitzer
2025-05-23 22:38 ` Mike Snitzer
2025-05-23 22:40 ` Jeff Layton
2025-05-23 23:09 ` Mike Snitzer
2025-05-24 3:53 ` Mike Snitzer
2025-05-24 10:26 ` Jeff Layton
2025-05-24 12:05 ` Jeff Layton
2025-05-24 14:33 ` Mike Snitzer
2025-05-24 15:10 ` Jeff Layton
2025-05-27 13:50 ` Jeff Layton [this message]
2025-05-27 21:59 ` Jeff Layton
2025-06-13 12:32 ` Jeff Layton
2025-05-23 18:40 ` Chuck Lever