public inbox for linux-nfs@vger.kernel.org
* unable to run NFSD in container if "options sunrpc pool_mode=pernode"
@ 2025-05-23 18:29 Mike Snitzer
  2025-05-23 18:40 ` Jeff Layton
  2025-05-23 18:40 ` Chuck Lever
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2025-05-23 18:29 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton, NeilBrown; +Cc: linux-nfs

I don't know if $SUBJECT ever worked... but with latest 6.15 or
nfsd-testing if I just use pool_mode=global then all is fine.

If pool_mode=pernode then mounting the container's NFSv3 export fails.

I haven't started to dig into code yet but pool_mode=pernode works
perfectly fine if NFSD isn't running in a container.

Mike

ps. yet another reason why pool_mode=pernode should be the default if
more than 1 NUMA node ;)
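For reference, the option named in $SUBJECT is a sunrpc module parameter, so it has to be in place before the module loads; a minimal sketch (the modprobe.d path is an assumption, the sysfs path is the standard module-parameter location):

```shell
# Assumed setup: configure the pool mode before sunrpc loads, e.g.
#
#   /etc/modprobe.d/sunrpc.conf:
#     options sunrpc pool_mode=pernode
#
# (or sunrpc.pool_mode=pernode on the kernel command line).
#
# Once the module is loaded, read back the active mode:
cat /sys/module/sunrpc/parameters/pool_mode   # global | percpu | pernode
```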

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-23 18:29 unable to run NFSD in container if "options sunrpc pool_mode=pernode" Mike Snitzer
@ 2025-05-23 18:40 ` Jeff Layton
  2025-05-23 22:19   ` Mike Snitzer
  2025-05-23 18:40 ` Chuck Lever
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2025-05-23 18:40 UTC (permalink / raw)
  To: Mike Snitzer, Chuck Lever, NeilBrown; +Cc: linux-nfs

On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> I don't know if $SUBJECT ever worked... but with latest 6.15 or
> nfsd-testing if I just use pool_mode=global then all is fine.
> 
> If pool_mode=pernode then mounting the container's NFSv3 export fails.
> 
> I haven't started to dig into code yet but pool_mode=pernode works
> perfectly fine if NFSD isn't running in a container.
> 
> Mike
> 
> ps. yet another reason why pool_mode=pernode should be the default if
> more than 1 NUMA node ;)

Huh, strange. I've no idea why that would be. What kernel is this?

FWIW, I just built a localio-enabled v6.12-uek kernel for our own
purposes yesterday and it's running pool_mode=pernode. It seemed to
work fine as a v3 DS, but I didn't test mounting the container's export
directly.

-- 
Jeff Layton <jlayton@kernel.org>


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-23 18:29 unable to run NFSD in container if "options sunrpc pool_mode=pernode" Mike Snitzer
  2025-05-23 18:40 ` Jeff Layton
@ 2025-05-23 18:40 ` Chuck Lever
  1 sibling, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2025-05-23 18:40 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs, Jeff Layton, NeilBrown

On 5/23/25 2:29 PM, Mike Snitzer wrote:
> ps. yet another reason why pool_mode=pernode should be the default if
> more than 1 NUMA node ;)

Easier said than done. During bake-a-thon, I mentioned there are some
historical discussions about this. Two I found:

https://lore.kernel.org/linux-nfs/313d317dc0ca136de106979add5695ef5e2101e7.camel@hammerspace.com/

And the review comments on this patch:

https://lore.kernel.org/linux-nfs/20240715074657.18174-14-neilb@suse.de/


-- 
Chuck Lever


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-23 18:40 ` Jeff Layton
@ 2025-05-23 22:19   ` Mike Snitzer
  2025-05-23 22:38     ` Mike Snitzer
  2025-05-23 22:40     ` Jeff Layton
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2025-05-23 22:19 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, NeilBrown, linux-nfs

On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > nfsd-testing if I just use pool_mode=global then all is fine.
> > 
> > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > 
> > I haven't started to dig into code yet but pool_mode=pernode works
> > perfectly fine if NFSD isn't running in a container.
> > 
> > Mike
> > 
> > ps. yet another reason why pool_mode=pernode should be the default if
> > more than 1 NUMA node ;)
> 
> Huh, strange. I've no idea why that would be. What kernel is this?

It is this 6.12.24-based frankenbeast-ish kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=kernel-6.12.24/main-testing

Basically just 6.12.24 + NFS and NFSD sync'd through nfs-testing and
nfsd-testing (so 6.15 NFS and NFSD going on 6.16).

But I also just verified that this kernel built on Chuck's
nfsd-testing branch (with 2 extra patches) has the same issue:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=cel-nfsd-testing-6.16

Here is the NFS related config:

CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
# CONFIG_NFS_SWAP is not set
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
# CONFIG_NFS_V4_1_MIGRATION is not set
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_NFS_FSCACHE=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
# CONFIG_NFS_V4_2_READ_PLUS is not set
CONFIG_NFSD=m
# CONFIG_NFSD_V2 is not set
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
CONFIG_NFSD_SCSILAYOUT=y
# CONFIG_NFSD_FLEXFILELAYOUT is not set
# CONFIG_NFSD_V4_2_INTER_SSC is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
# CONFIG_NFSD_LEGACY_CLIENT_TRACKING is not set
# CONFIG_NFSD_V4_DELEG_TIMESTAMPS is not set
CONFIG_GRACE_PERIOD=m
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_NFS_COMMON_LOCALIO_SUPPORT=m
CONFIG_NFS_LOCALIO=y
CONFIG_NFS_V4_2_SSC_HELPER=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_BACKCHANNEL=y
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA1=y
CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA2=y
CONFIG_SUNRPC_DEBUG=y
CONFIG_SUNRPC_XPRT_RDMA=m

> FWIW, I just built a localio-enabled v6.12-uek kernel for our own
> purposes yesterday and it's running pool_mode=pernode. It seemed to
> work fine as a v3 DS, but I didn't test mounting the container's export
> directly.

OK, but you were able to access the v3 DS just fine (assuming pNFS
flexfiles layouts that point to your DS running NFSD in a container)?

I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
use "options sunrpc pool_mode=pernode".

Mike


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-23 22:19   ` Mike Snitzer
@ 2025-05-23 22:38     ` Mike Snitzer
  2025-05-23 22:40     ` Jeff Layton
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2025-05-23 22:38 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, NeilBrown, linux-nfs

On Fri, May 23, 2025 at 06:19:03PM -0400, Mike Snitzer wrote:
> On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > 
> > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > 
> > > I haven't started to dig into code yet but pool_mode=pernode works
> > > perfectly fine if NFSD isn't running in a container.
> > > 
> > > Mike
> > > 
> > > ps. yet another reason why pool_mode=pernode should be the default if
> > > more than 1 NUMA node ;)
> > 
> > Huh, strange. I've no idea why that would be. What kernel is this?
> 
> It is this 6.12.24 based frankenbeast-ish kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=kernel-6.12.24/main-testing
> 
> Basically just 6.12.24 + NFS and NFSD sync'd through nfs-testing and
> nfsd-testing (so 6.15 NFS and NFSD going on 6.16).
> 
> But I also just verified that this kernel built on Chuck's
> nfsd-testing branch (with 2 extra patches) has the same issue:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=cel-nfsd-testing-6.16
> 
> Here is the NFS related config:
> 
> CONFIG_NETWORK_FILESYSTEMS=y
> CONFIG_NFS_FS=m
> # CONFIG_NFS_V2 is not set
> CONFIG_NFS_V3=m
> CONFIG_NFS_V3_ACL=y
> CONFIG_NFS_V4=m
> # CONFIG_NFS_SWAP is not set
> CONFIG_NFS_V4_1=y
> CONFIG_NFS_V4_2=y
> CONFIG_PNFS_FILE_LAYOUT=m
> CONFIG_PNFS_BLOCK=m
> CONFIG_PNFS_FLEXFILE_LAYOUT=m
> CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
> # CONFIG_NFS_V4_1_MIGRATION is not set
> CONFIG_NFS_V4_SECURITY_LABEL=y
> CONFIG_NFS_FSCACHE=y
> # CONFIG_NFS_USE_LEGACY_DNS is not set
> CONFIG_NFS_USE_KERNEL_DNS=y
> CONFIG_NFS_DEBUG=y
> CONFIG_NFS_DISABLE_UDP_SUPPORT=y
> # CONFIG_NFS_V4_2_READ_PLUS is not set
> CONFIG_NFSD=m
> # CONFIG_NFSD_V2 is not set
> CONFIG_NFSD_V3_ACL=y
> CONFIG_NFSD_V4=y
> CONFIG_NFSD_PNFS=y
> # CONFIG_NFSD_BLOCKLAYOUT is not set
> CONFIG_NFSD_SCSILAYOUT=y
> # CONFIG_NFSD_FLEXFILELAYOUT is not set
> # CONFIG_NFSD_V4_2_INTER_SSC is not set
> CONFIG_NFSD_V4_SECURITY_LABEL=y
> # CONFIG_NFSD_LEGACY_CLIENT_TRACKING is not set
> # CONFIG_NFSD_V4_DELEG_TIMESTAMPS is not set
> CONFIG_GRACE_PERIOD=m
> CONFIG_LOCKD=m
> CONFIG_LOCKD_V4=y
> CONFIG_NFS_ACL_SUPPORT=m
> CONFIG_NFS_COMMON=y
> CONFIG_NFS_COMMON_LOCALIO_SUPPORT=m
> CONFIG_NFS_LOCALIO=y
> CONFIG_NFS_V4_2_SSC_HELPER=y
> CONFIG_SUNRPC=m
> CONFIG_SUNRPC_GSS=m
> CONFIG_SUNRPC_BACKCHANNEL=y
> CONFIG_RPCSEC_GSS_KRB5=m
> CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA1=y
> CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA2=y
> CONFIG_SUNRPC_DEBUG=y
> CONFIG_SUNRPC_XPRT_RDMA=m
> 
> > FWIW, I just built a localio-enabled v6.12-uek kernel for our own
> > purposes yesterday and it's running pool_mode=pernode. It seemed to
> > work fine as a v3 DS, but I didn't test mounting the container's export
> > directly.
> 
> OK, but you were able to access the v3 DS just fine (assuming pNFS
> flexfiles layouts that point to your DS that is running NFSD in a
> container) ?
> 
> I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> use "options sunrpc pool_mode=pernode".

Actually, I do see nfsd threads running... it's just that if I try to
issue I/O (using pNFS flexfiles to a file on the v3 DS) it hangs.

Mike


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-23 22:19   ` Mike Snitzer
  2025-05-23 22:38     ` Mike Snitzer
@ 2025-05-23 22:40     ` Jeff Layton
  2025-05-23 23:09       ` Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2025-05-23 22:40 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Chuck Lever, NeilBrown, linux-nfs

On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > 
> > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > 
> > > I haven't started to dig into code yet but pool_mode=pernode works
> > > perfectly fine if NFSD isn't running in a container.
> > > 

Oops, I went and looked and nfsd isn't running in a container on these
boxes. There are some other containerized apps running on the box, but
nfsd isn't running in a container.


> > > Mike
> > > 
> > > ps. yet another reason why pool_mode=pernode should be the default if
> > > more than 1 NUMA node ;)
> > 
> > Huh, strange. I've no idea why that would be. What kernel is this?
> 
> It is this 6.12.24 based frankenbeast-ish kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=kernel-6.12.24/main-testing
> 
> Basically just 6.12.24 + NFS and NFSD sync'd through nfs-testing and
> nfsd-testing (so 6.15 NFS and NFSD going on 6.16).
> 
> But I also just verified that this kernel built on Chuck's
> nfsd-testing branch (with 2 extra patches) has the same issue:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=cel-nfsd-testing-6.16
> 
> Here is the NFS related config:
> 
> CONFIG_NETWORK_FILESYSTEMS=y
> CONFIG_NFS_FS=m
> # CONFIG_NFS_V2 is not set
> CONFIG_NFS_V3=m
> CONFIG_NFS_V3_ACL=y
> CONFIG_NFS_V4=m
> # CONFIG_NFS_SWAP is not set
> CONFIG_NFS_V4_1=y
> CONFIG_NFS_V4_2=y
> CONFIG_PNFS_FILE_LAYOUT=m
> CONFIG_PNFS_BLOCK=m
> CONFIG_PNFS_FLEXFILE_LAYOUT=m
> CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
> # CONFIG_NFS_V4_1_MIGRATION is not set
> CONFIG_NFS_V4_SECURITY_LABEL=y
> CONFIG_NFS_FSCACHE=y
> # CONFIG_NFS_USE_LEGACY_DNS is not set
> CONFIG_NFS_USE_KERNEL_DNS=y
> CONFIG_NFS_DEBUG=y
> CONFIG_NFS_DISABLE_UDP_SUPPORT=y
> # CONFIG_NFS_V4_2_READ_PLUS is not set
> CONFIG_NFSD=m
> # CONFIG_NFSD_V2 is not set
> CONFIG_NFSD_V3_ACL=y
> CONFIG_NFSD_V4=y
> CONFIG_NFSD_PNFS=y
> # CONFIG_NFSD_BLOCKLAYOUT is not set
> CONFIG_NFSD_SCSILAYOUT=y
> # CONFIG_NFSD_FLEXFILELAYOUT is not set
> # CONFIG_NFSD_V4_2_INTER_SSC is not set
> CONFIG_NFSD_V4_SECURITY_LABEL=y
> # CONFIG_NFSD_LEGACY_CLIENT_TRACKING is not set
> # CONFIG_NFSD_V4_DELEG_TIMESTAMPS is not set
> CONFIG_GRACE_PERIOD=m
> CONFIG_LOCKD=m
> CONFIG_LOCKD_V4=y
> CONFIG_NFS_ACL_SUPPORT=m
> CONFIG_NFS_COMMON=y
> CONFIG_NFS_COMMON_LOCALIO_SUPPORT=m
> CONFIG_NFS_LOCALIO=y
> CONFIG_NFS_V4_2_SSC_HELPER=y
> CONFIG_SUNRPC=m
> CONFIG_SUNRPC_GSS=m
> CONFIG_SUNRPC_BACKCHANNEL=y
> CONFIG_RPCSEC_GSS_KRB5=m
> CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA1=y
> CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA2=y
> CONFIG_SUNRPC_DEBUG=y
> CONFIG_SUNRPC_XPRT_RDMA=m
> 
> > FWIW, I just built a localio-enabled v6.12-uek kernel for our own
> > purposes yesterday and it's running pool_mode=pernode. It seemed to
> > work fine as a v3 DS, but I didn't test mounting the container's export
> > directly.
> 
> OK, but you were able to access the v3 DS just fine (assuming pNFS
> flexfiles layouts that point to your DS that is running NFSD in a
> container) ?
> 
> I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> use "options sunrpc pool_mode=pernode".
> 

I'll have a look soon, but if you figure it out in the meantime, let us
know.

-- 
Jeff Layton <jlayton@kernel.org>


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-23 22:40     ` Jeff Layton
@ 2025-05-23 23:09       ` Mike Snitzer
  2025-05-24  3:53         ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2025-05-23 23:09 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, NeilBrown, linux-nfs

On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > 
> > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > 
> > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > perfectly fine if NFSD isn't running in a container.
> > > > 
> 
> Oops, I went and looked and nfsd isn't running in a container on these
> boxes. There are some other containerized apps running on the box, but
> nfsd isn't running in a container.

OK.

> > > > ps. yet another reason why pool_mode=pernode should be the default if
> > > > more than 1 NUMA node ;)
> > > 
> > > Huh, strange. I've no idea why that would be. What kernel is this?
> > 
> > It is this 6.12.24 based frankenbeast-ish kernel:
> > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=kernel-6.12.24/main-testing
> > 
> > Basically just 6.12.24 + NFS and NFSD sync'd through nfs-testing and
> > nfsd-testing (so 6.15 NFS and NFSD going on 6.16).
> > 
> > But I also just verified that this kernel built on Chuck's
> > nfsd-testing branch (with 2 extra patches) has the same issue:
> > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=cel-nfsd-testing-6.16
> > 
> > Here is the NFS related config:
> > 
> > CONFIG_NETWORK_FILESYSTEMS=y
> > CONFIG_NFS_FS=m
> > # CONFIG_NFS_V2 is not set
> > CONFIG_NFS_V3=m
> > CONFIG_NFS_V3_ACL=y
> > CONFIG_NFS_V4=m
> > # CONFIG_NFS_SWAP is not set
> > CONFIG_NFS_V4_1=y
> > CONFIG_NFS_V4_2=y
> > CONFIG_PNFS_FILE_LAYOUT=m
> > CONFIG_PNFS_BLOCK=m
> > CONFIG_PNFS_FLEXFILE_LAYOUT=m
> > CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
> > # CONFIG_NFS_V4_1_MIGRATION is not set
> > CONFIG_NFS_V4_SECURITY_LABEL=y
> > CONFIG_NFS_FSCACHE=y
> > # CONFIG_NFS_USE_LEGACY_DNS is not set
> > CONFIG_NFS_USE_KERNEL_DNS=y
> > CONFIG_NFS_DEBUG=y
> > CONFIG_NFS_DISABLE_UDP_SUPPORT=y
> > # CONFIG_NFS_V4_2_READ_PLUS is not set
> > CONFIG_NFSD=m
> > # CONFIG_NFSD_V2 is not set
> > CONFIG_NFSD_V3_ACL=y
> > CONFIG_NFSD_V4=y
> > CONFIG_NFSD_PNFS=y
> > # CONFIG_NFSD_BLOCKLAYOUT is not set
> > CONFIG_NFSD_SCSILAYOUT=y
> > # CONFIG_NFSD_FLEXFILELAYOUT is not set
> > # CONFIG_NFSD_V4_2_INTER_SSC is not set
> > CONFIG_NFSD_V4_SECURITY_LABEL=y
> > # CONFIG_NFSD_LEGACY_CLIENT_TRACKING is not set
> > # CONFIG_NFSD_V4_DELEG_TIMESTAMPS is not set
> > CONFIG_GRACE_PERIOD=m
> > CONFIG_LOCKD=m
> > CONFIG_LOCKD_V4=y
> > CONFIG_NFS_ACL_SUPPORT=m
> > CONFIG_NFS_COMMON=y
> > CONFIG_NFS_COMMON_LOCALIO_SUPPORT=m
> > CONFIG_NFS_LOCALIO=y
> > CONFIG_NFS_V4_2_SSC_HELPER=y
> > CONFIG_SUNRPC=m
> > CONFIG_SUNRPC_GSS=m
> > CONFIG_SUNRPC_BACKCHANNEL=y
> > CONFIG_RPCSEC_GSS_KRB5=m
> > CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA1=y
> > CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA2=y
> > CONFIG_SUNRPC_DEBUG=y
> > CONFIG_SUNRPC_XPRT_RDMA=m
> > 
> > > FWIW, I just built a localio-enabled v6.12-uek kernel for our own
> > > purposes yesterday and it's running pool_mode=pernode. It seemed to
> > > work fine as a v3 DS, but I didn't test mounting the container's export
> > > directly.
> > 
> > OK, but you were able to access the v3 DS just fine (assuming pNFS
> > flexfiles layouts that point to your DS that is running NFSD in a
> > container) ?
> > 
> > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > use "options sunrpc pool_mode=pernode".
> > 
> 
> I'll have a look soon, but if you figure it out in the meantime, let us
> know.

Will do.

Just the latest info I have: with sunrpc's pool_mode=pernode, dd hangs
with this stack trace:

# cat /proc/8087/stack
[<0>] rpc_wait_bit_killable+0x25/0x80 [sunrpc]
[<0>] __rpc_execute+0x151/0x480 [sunrpc]
[<0>] rpc_execute+0xca/0xf0 [sunrpc]
[<0>] rpc_run_task+0x110/0x180 [sunrpc]
[<0>] nfs4_call_sync_custom+0xb/0x30 [nfsv4]
[<0>] nfs4_do_call_sync+0x69/0x90 [nfsv4]
[<0>] _nfs4_proc_getattr+0x128/0x160 [nfsv4]
[<0>] nfs4_proc_getattr+0x73/0x100 [nfsv4]
[<0>] nfs4_do_open+0x775/0x9d0 [nfsv4]
[<0>] nfs4_atomic_open+0xf7/0x100 [nfsv4]
[<0>] nfs_atomic_open+0x1e7/0x6c0 [nfs]
[<0>] path_openat+0xd38/0x11f0
[<0>] do_filp_open+0xae/0x120
[<0>] do_sys_openat2+0x24d/0x2a0
[<0>] do_sys_open+0x4f/0x90
[<0>] do_syscall_64+0x7b/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

And if I just try to mount using v3 it fails with:

# mount -vvvvvvv -o vers=3,nolock 10.200.80.89:/cvol_12_0 /mnt/test
mount.nfs: timeout set for Fri May 23 22:52:04 2025
mount.nfs: trying text-based options 'vers=3,nolock,addr=10.200.80.89'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.200.80.89 prog 100003 vers 3 prot TCP port 2049
mount.nfs: portmap query retrying: RPC: Timed out
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying text-based options 'vers=3,nolock,addr=10.200.80.89'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.200.80.89 prog 100003 vers 3 prot TCP port 2049
mount.nfs: portmap query retrying: RPC: Timed out
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying text-based options 'vers=3,nolock,addr=10.200.80.89'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.200.80.89 prog 100003 vers 3 prot TCP port 2049
mount.nfs: portmap query retrying: RPC: Timed out
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: requested NFS version or transport protocol is not supported for /mnt/test

# rpcinfo -p 10.200.80.89
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp  20048  mountd
    100005    1   tcp  20048  mountd
    100005    2   udp  20048  mountd
    100005    2   tcp  20048  mountd
    100024    1   udp  45252  status
    100024    1   tcp  60557  status
    100005    3   udp  20048  mountd
    100005    3   tcp  20048  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100021    1   udp  40987  nlockmgr
    100021    3   udp  40987  nlockmgr
    100021    4   udp  40987  nlockmgr
    100021    1   tcp  36527  nlockmgr
    100021    3   tcp  36527  nlockmgr
    100021    4   tcp  36527  nlockmgr

(Not sure what's up with the portmap issues and why it doesn't progress
to trying program 100005... which, as you can see below, it does in the
working case.)

But if I just use sunrpc's default pool_mode=global:

# mount -vvvvvvv -o vers=3,nolock 10.200.80.89:/cvol_12_0 /mnt/test
mount.nfs: timeout set for Fri May 23 22:55:43 2025
mount.nfs: trying text-based options 'vers=3,nolock,addr=10.200.80.89'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.200.80.89 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 10.200.80.89 prog 100005 vers 3 prot UDP port 20048

# rpcinfo -p 10.200.80.89
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  54037  status
    100024    1   tcp  46339  status
    100005    1   udp  20048  mountd
    100005    1   tcp  20048  mountd
    100005    2   udp  20048  mountd
    100005    2   tcp  20048  mountd
    100005    3   udp  20048  mountd
    100005    3   tcp  20048  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100021    1   udp  36268  nlockmgr
    100021    3   udp  36268  nlockmgr
    100021    4   udp  36268  nlockmgr
    100021    1   tcp  44195  nlockmgr
    100021    3   tcp  44195  nlockmgr
    100021    4   tcp  44195  nlockmgr
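Note that both captures above do advertise NFS (program 100003) on TCP 2049, so in the pernode case the portmapper query is timing out rather than the registration being missing. For scripted comparison of such captures, a small helper sketch (it only assumes the "rpcinfo -p" tabular format shown above; the function name is made up):

```shell
# Return success iff a saved "rpcinfo -p <host>" capture lists the NFS
# program (100003) on TCP port 2049, i.e. what mount.nfs is probing for.
nfs_registered() {
    # $1: file holding the output of "rpcinfo -p <host>"
    awk '$1 == 100003 && $3 == "tcp" && $4 == 2049 { found = 1 }
         END { exit !found }' "$1"
}
```

Usage: `rpcinfo -p 10.200.80.89 > /tmp/ri && nfs_registered /tmp/ri && echo "nfs registered"`.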


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-23 23:09       ` Mike Snitzer
@ 2025-05-24  3:53         ` Mike Snitzer
  2025-05-24 10:26           ` Jeff Layton
  2025-05-24 12:05           ` Jeff Layton
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2025-05-24  3:53 UTC (permalink / raw)
  To: Jeff Layton, steved; +Cc: Chuck Lever, NeilBrown, linux-nfs

On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > 
> > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > 
> > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > perfectly fine if NFSD isn't running in a container.
> > > > > 
> > 
> > Oops, I went and looked and nfsd isn't running in a container on these
> > boxes. There are some other containerized apps running on the box, but
> > nfsd isn't running in a container.
> 
> OK.
> 
> > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > use "options sunrpc pool_mode=pernode".
> > > 
> > 
> > I'll have a look soon, but if you figure it out in the meantime, let us
> > know.
> 
> Will do.
> 
> Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> with this stack trace:

Turns out this pool_mode=pernode issue is a regression caused by the
very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
because why not upgrade to the latest!?).

If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
pool_mode=pernode works fine.

And this issue doesn't have anything to do with running in a container
(it seemed to be container related purely because I happened to be
seeing the issue with an EL9.5 container that had the EL10-based
nfs-utils 2.8.2 installed).

Steved, unfortunately I'm not sure what the problem is with the newer
nfs-utils and setting "options sunrpc pool_mode=pernode".

Mike


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-24  3:53         ` Mike Snitzer
@ 2025-05-24 10:26           ` Jeff Layton
  2025-05-24 12:05           ` Jeff Layton
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-05-24 10:26 UTC (permalink / raw)
  To: Mike Snitzer, steved; +Cc: Chuck Lever, NeilBrown, linux-nfs

On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > 
> > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > 
> > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > 
> > > 
> > > Oops, I went and looked and nfsd isn't running in a container on these
> > > boxes. There are some other containerized apps running on the box, but
> > > nfsd isn't running in a container.
> > 
> > OK.
> > 
> > > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > > use "options sunrpc pool_mode=pernode".
> > > > 
> > > 
> > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > know.
> > 
> > Will do.
> > 
> > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > with this stack trace:
> 
> Turns out this pool_mode=pernode issue is a regression caused by the
> very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> because why not upgrade to the latest!?).
> 
> If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> pool_mode=pernode works fine.
> 
> And this issue doesn't have anything to do with running in a container
> (it seemed to be container related purely because I happened to be
> seeing the issue with an EL9.5 container that had the EL10-based
> nfs-utils 2.8.2 installed).
> 
> Steved, unfortunately I'm not sure what the problem is with the newer
> nfs-utils and setting "options sunrpc pool_mode=pernode"
> 

This is probably a kernel problem.

Newer nfs-utils uses nfsdctl to start the server, whereas older
nfs-utils used rpc.nfsd. nfsdctl drives the kernel's netlink
interfaces instead of the /proc/fs/nfsd files that rpc.nfsd writes to.

If you're getting different results with the two different nfs-utils
versions then the problem is likely there. I'll take a look.
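Concretely, the two userland paths look roughly like this (the procfs write is the long-standing interface; the exact nfsdctl invocations are assumptions, see nfsdctl(8)):

```shell
# Legacy path: rpc.nfsd pokes procfs; starting 8 threads is roughly
echo 8 > /proc/fs/nfsd/threads

# New path: nfsdctl configures the server over the nfsd
# generic-netlink family instead of /proc/fs/nfsd, e.g.
# (assumed syntax, nfs-utils >= 2.8):
nfsdctl threads 8
nfsdctl status
```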
-- 
Jeff Layton <jlayton@kernel.org>


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-24  3:53         ` Mike Snitzer
  2025-05-24 10:26           ` Jeff Layton
@ 2025-05-24 12:05           ` Jeff Layton
  2025-05-24 14:33             ` Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2025-05-24 12:05 UTC (permalink / raw)
  To: Mike Snitzer, steved; +Cc: Chuck Lever, NeilBrown, linux-nfs

On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > 
> > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > 
> > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > 
> > > 
> > > Oops, I went and looked and nfsd isn't running in a container on these
> > > boxes. There are some other containerized apps running on the box, but
> > > nfsd isn't running in a container.
> > 
> > OK.
> > 
> > > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > > use "options sunrpc pool_mode=pernode".
> > > > 
> > > 
> > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > know.
> > 
> > Will do.
> > 
> > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > with this stack trace:
> 
> Turns out this pool_mode=pernode issue is a regression caused by the
> very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> because why not upgrade to the latest!?).
> 
> If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> pool_mode=pernode works fine.
> 
> And this issue doesn't have anything to do with running in a container
> (it seemed to be container related purely because I happened to be
> seeing the issue with an EL9.5 container that had the EL10-based
> nfs-utils 2.8.2 installed).
> 
> Steved, unfortunately I'm not sure what the problem is with the newer
> nfs-utils and setting "options sunrpc pool_mode=pernode"
> 

I tried to reproduce this using fedora-41 VMs (no f42 available for
virt-builder yet), but everything worked. I don't have any actual NUMA
hw here though, so maybe that matters?

Can you run this on the nfs server and send back the output? I'm
wondering if this setting might not track the module option properly on
that host for some reason:

    # nfsdctl pool-mode
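If that's inconclusive, the raw module parameter can be read alongside it for comparison (the sysfs path is the standard location for module parameters; the nfsdctl output format may differ):

```shell
# Compare the netlink-reported pool mode with the module parameter:
nfsdctl pool-mode                             # netlink view (nfs-utils >= 2.8)
cat /sys/module/sunrpc/parameters/pool_mode   # module-parameter view
```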

-- 
Jeff Layton <jlayton@kernel.org>


* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-24 12:05           ` Jeff Layton
@ 2025-05-24 14:33             ` Mike Snitzer
  2025-05-24 15:10               ` Jeff Layton
                                 ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mike Snitzer @ 2025-05-24 14:33 UTC (permalink / raw)
  To: Jeff Layton; +Cc: steved, Chuck Lever, NeilBrown, linux-nfs

On Sat, May 24, 2025 at 08:05:19AM -0400, Jeff Layton wrote:
> On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> > On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > > 
> > > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > > 
> > > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > > 
> > > > 
> > > > Oops, I went and looked and nfsd isn't running in a container on these
> > > > boxes. There are some other containerized apps running on the box, but
> > > > nfsd isn't running in a container.
> > > 
> > > OK.
> > > 
> > > > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > > > use "options sunrpc pool_mode=pernode".
> > > > > 
> > > > 
> > > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > > know.
> > > 
> > > Will do.
> > > 
> > > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > > with this stack trace:
> > 
> > Turns out this pool_mode=pernode issue is a regression caused by the
> > very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> > because why not upgrade to the latest!?).
> > 
> > If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> > pool_mode=pernode works fine.
> > 
> > And this issue doesn't have anything to do with running in a container
> > (it seemed to be container related purely because I happened to be
> > seeing the issue with an EL9.5 container that had the EL10-based
> > nfs-utils 2.8.2 installed).
> > 
> > Steved, unfortunately I'm not sure what the problem is with the newer
> > nfs-utils and setting "options sunrpc pool_mode=pernode"
> > 
> 
> I tried to reproduce this using fedora-41 VMs (no f42 available for
> virt-builder yet), but everything worked. I don't have any actual NUMA
> hw here though, so maybe that matters?
> 
> Can you run this on the nfs server and send back the output? I'm
> wondering if this setting might not track the module option properly on
> that host for some reason:
> 
>     # nfsdctl pool-mode

(from EL9.5 container with nfs-utils 2.8.2)
# nfsdctl pool-mode
pool-mode: pernode
npools: 2

(on host)
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 11665 MB
node 0 free: 9892 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 6042 MB
node 1 free: 5127 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

(and yeahh I was aware the newer nfs-utils uses the netlink interface,
will be interesting to pin down what the issue is with
pool-mode=pernode)

* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-24 14:33             ` Mike Snitzer
@ 2025-05-24 15:10               ` Jeff Layton
  2025-05-27 13:50               ` Jeff Layton
  2025-06-13 12:32               ` Jeff Layton
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-05-24 15:10 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: steved, Chuck Lever, NeilBrown, linux-nfs

On Sat, 2025-05-24 at 10:33 -0400, Mike Snitzer wrote:
> On Sat, May 24, 2025 at 08:05:19AM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> > > On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > > > 
> > > > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > > > 
> > > > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > > > 
> > > > > 
> > > > > Oops, I went and looked and nfsd isn't running in a container on these
> > > > > boxes. There are some other containerized apps running on the box, but
> > > > > nfsd isn't running in a container.
> > > > 
> > > > OK.
> > > > 
> > > > > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > > > > use "options sunrpc pool_mode=pernode".
> > > > > > 
> > > > > 
> > > > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > > > know.
> > > > 
> > > > Will do.
> > > > 
> > > > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > > > with this stack trace:
> > > 
> > > Turns out this pool_mode=pernode issue is a regression caused by the
> > > very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> > > because why not upgrade to the latest!?).
> > > 
> > > If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> > > pool_mode=pernode works fine.
> > > 
> > > And this issue doesn't have anything to do with running in a container
> > > (it seemed to be container related purely because I happened to be
> > > seeing the issue with an EL9.5 container that had the EL10-based
> > > nfs-utils 2.8.2 installed).
> > > 
> > > Steved, unfortunately I'm not sure what the problem is with the newer
> > > nfs-utils and setting "options sunrpc pool_mode=pernode"
> > > 
> > 
> > I tried to reproduce this using fedora-41 VMs (no f42 available for
> > virt-builder yet), but everything worked. I don't have any actual NUMA
> > hw here though, so maybe that matters?
> > 
> > Can you run this on the nfs server and send back the output? I'm
> > wondering if this setting might not track the module option properly on
> > that host for some reason:
> > 
> >     # nfsdctl pool-mode
> 
> (from EL9.5 container with nfs-utils 2.8.2)
> # nfsdctl pool-mode
> pool-mode: pernode
> npools: 2
> 
> (on host)
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 11665 MB
> node 0 free: 9892 MB
> node 1 cpus: 8 9 10 11 12 13 14 15
> node 1 size: 6042 MB
> node 1 free: 5127 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> 
> (and yeahh I was aware the newer nfs-utils uses the netlink interface,
> will be interesting to pin down what the issue is with
> pool-mode=pernode)

rpc.nfsd creates sockets in userland and then passes them to the
kernel. With the netlink interface, we now create the socket in the
kernel. If I had to guess, the problem is related to that difference.

I can check out a NUMA box at work this week and try to reproduce this,
if you don't figure it out over the weekend.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-24 14:33             ` Mike Snitzer
  2025-05-24 15:10               ` Jeff Layton
@ 2025-05-27 13:50               ` Jeff Layton
  2025-05-27 21:59                 ` Jeff Layton
  2025-06-13 12:32               ` Jeff Layton
  2 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2025-05-27 13:50 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: steved, Chuck Lever, NeilBrown, linux-nfs

On Sat, 2025-05-24 at 10:33 -0400, Mike Snitzer wrote:
> On Sat, May 24, 2025 at 08:05:19AM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> > > On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > > > 
> > > > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > > > 
> > > > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > > > 
> > > > > 
> > > > > Oops, I went and looked and nfsd isn't running in a container on these
> > > > > boxes. There are some other containerized apps running on the box, but
> > > > > nfsd isn't running in a container.
> > > > 
> > > > OK.
> > > > 
> > > > > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > > > > use "options sunrpc pool_mode=pernode".
> > > > > > 
> > > > > 
> > > > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > > > know.
> > > > 
> > > > Will do.
> > > > 
> > > > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > > > with this stack trace:
> > > 
> > > Turns out this pool_mode=pernode issue is a regression caused by the
> > > very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> > > because why not upgrade to the latest!?).
> > > 
> > > If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> > > pool_mode=pernode works fine.
> > > 
> > > And this issue doesn't have anything to do with running in a container
> > > (it seemed to be container related purely because I happened to be
> > > seeing the issue with an EL9.5 container that had the EL10-based
> > > nfs-utils 2.8.2 installed).
> > > 
> > > Steved, unfortunately I'm not sure what the problem is with the newer
> > > nfs-utils and setting "options sunrpc pool_mode=pernode"
> > > 
> > 
> > I tried to reproduce this using fedora-41 VMs (no f42 available for
> > virt-builder yet), but everything worked. I don't have any actual NUMA
> > hw here though, so maybe that matters?
> > 
> > Can you run this on the nfs server and send back the output? I'm
> > wondering if this setting might not track the module option properly on
> > that host for some reason:
> > 
> >     # nfsdctl pool-mode
> 
> (from EL9.5 container with nfs-utils 2.8.2)
> # nfsdctl pool-mode
> pool-mode: pernode
> npools: 2
> 
> (on host)
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 11665 MB
> node 0 free: 9892 MB
> node 1 cpus: 8 9 10 11 12 13 14 15
> node 1 size: 6042 MB
> node 1 free: 5127 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> 
> (and yeahh I was aware the newer nfs-utils uses the netlink interface,
> will be interesting to pin down what the issue is with
> pool-mode=pernode)

Ok, I can reproduce this on a true NUMA machine. The first thing that's
interesting is that it seems to be intermittent. Occasionally I can
mount and operate on the socket, but socket requests hang most of the
time.

I turned up all of the nfsd and sunrpc tracepoints. After attempting a
mount that hung, I see only a single tracepoint fire:

          <idle>-0       [038] ..s..  5942.572721: svc_xprt_enqueue: server=[::]:2049 client=(einval) flags=CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL

Based on the flags, svc_xprt_ready should have returned true. That should
make the xprt be enqueued and an idle thread be awoken. It looks like
that last bit may not be happening for some reason.

At this point, I'll probably have to add some debugging. I'll keep
poking at it -- stay tuned.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-27 13:50               ` Jeff Layton
@ 2025-05-27 21:59                 ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-05-27 21:59 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: steved, Chuck Lever, NeilBrown, linux-nfs

On Tue, 2025-05-27 at 09:50 -0400, Jeff Layton wrote:
> On Sat, 2025-05-24 at 10:33 -0400, Mike Snitzer wrote:
> > On Sat, May 24, 2025 at 08:05:19AM -0400, Jeff Layton wrote:
> > > On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > > > > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > > > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > > > > 
> > > > > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > > > > 
> > > > > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > > > > 
> > > > > > 
> > > > > > Oops, I went and looked and nfsd isn't running in a container on these
> > > > > > boxes. There are some other containerized apps running on the box, but
> > > > > > nfsd isn't running in a container.
> > > > > 
> > > > > OK.
> > > > > 
> > > > > > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > > > > > use "options sunrpc pool_mode=pernode".
> > > > > > > 
> > > > > > 
> > > > > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > > > > know.
> > > > > 
> > > > > Will do.
> > > > > 
> > > > > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > > > > with this stack trace:
> > > > 
> > > > Turns out this pool_mode=pernode issue is a regression caused by the
> > > > very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> > > > because why not upgrade to the latest!?).
> > > > 
> > > > If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> > > > pool_mode=pernode works fine.
> > > > 
> > > > And this issue doesn't have anything to do with running in a container
> > > > (it seemed to be container related purely because I happened to be
> > > > seeing the issue with an EL9.5 container that had the EL10-based
> > > > nfs-utils 2.8.2 installed).
> > > > 
> > > > Steved, unfortunately I'm not sure what the problem is with the newer
> > > > nfs-utils and setting "options sunrpc pool_mode=pernode"
> > > > 
> > > 
> > > I tried to reproduce this using fedora-41 VMs (no f42 available for
> > > virt-builder yet), but everything worked. I don't have any actual NUMA
> > > hw here though, so maybe that matters?
> > > 
> > > Can you run this on the nfs server and send back the output? I'm
> > > wondering if this setting might not track the module option properly on
> > > that host for some reason:
> > > 
> > >     # nfsdctl pool-mode
> > 
> > (from EL9.5 container with nfs-utils 2.8.2)
> > # nfsdctl pool-mode
> > pool-mode: pernode
> > npools: 2
> > 
> > (on host)
> > # numactl -H
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3 4 5 6 7
> > node 0 size: 11665 MB
> > node 0 free: 9892 MB
> > node 1 cpus: 8 9 10 11 12 13 14 15
> > node 1 size: 6042 MB
> > node 1 free: 5127 MB
> > node distances:
> > node   0   1
> >   0:  10  20
> >   1:  20  10
> > 
> > (and yeahh I was aware the newer nfs-utils uses the netlink interface,
> > will be interesting to pin down what the issue is with
> > pool-mode=pernode)
> 
> Ok, I can reproduce this on a true NUMA machine. The first thing that's
> interesting is that it seems to be intermittent. Occasionally I can
> mount and operate on the socket, but socket requests hang most of the
> time.
> 
> I turned up all of the nfsd and sunrpc tracepoints. After attempting a
> mount that hung, I see only a single tracepoint fire:
> 
>           <idle>-0       [038] ..s..  5942.572721: svc_xprt_enqueue: server=[::]:2049 client=(einval) flags=CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL
> 
> Based on the flags, svc_xprt_ready should have returned true. That should
> make the xprt be enqueued and an idle thread be awoken. It looks like
> that last bit may not be happening for some reason.
> 
> At this point, I'll probably have to add some debugging. I'll keep
> poking at it -- stay tuned.

Ok, I figured it out. The problem is that the netlink interfaces don't
correctly handle the special case of a non-global pool_mode and
userland passing down a single-element array.

The old interface interpreted that as "distribute all of these threads
amongst the available nodes", but the netlink interfaces take it
literally and start all of the threads in the first node. If a socket
request comes in on the other node, then you're out of luck and the
socket never gets serviced.

I have a patch I'm testing now that should do the right thing. In the
meantime, if you explicitly set the number of threads in each pool it
should work around the problem. E.g., if you want 128 threads total,
distributed evenly over both numa nodes, try a setting like this for
now:

[nfsd]
threads=64,64

I'll send a proper fix for the kernel once I've tested it a bit more. 

Thanks for the bug report!
-- 
Jeff Layton <jlayton@kernel.org>

* Re: unable to run NFSD in container if "options sunrpc pool_mode=pernode"
  2025-05-24 14:33             ` Mike Snitzer
  2025-05-24 15:10               ` Jeff Layton
  2025-05-27 13:50               ` Jeff Layton
@ 2025-06-13 12:32               ` Jeff Layton
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2025-06-13 12:32 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: steved, Chuck Lever, NeilBrown, linux-nfs

On Sat, 2025-05-24 at 10:33 -0400, Mike Snitzer wrote:
> On Sat, May 24, 2025 at 08:05:19AM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> > > On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > > > 
> > > > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > > > 
> > > > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > > > > 
> > > > > 
> > > > > Oops, I went and looked and nfsd isn't running in a container on these
> > > > > boxes. There are some other containerized apps running on the box, but
> > > > > nfsd isn't running in a container.
> > > > 
> > > > OK.
> > > > 
> > > > > > I'm using nfs-utils-2.8.2.  I don't see any nfsd threads running if I
> > > > > > use "options sunrpc pool_mode=pernode".
> > > > > > 
> > > > > 
> > > > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > > > know.
> > > > 
> > > > Will do.
> > > > 
> > > > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > > > with this stack trace:
> > > 
> > > Turns out this pool_mode=pernode issue is a regression caused by the
> > > very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> > > because why not upgrade to the latest!?).
> > > 
> > > If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> > > pool_mode=pernode works fine.
> > > 
> > > And this issue doesn't have anything to do with running in a container
> > > (it seemed to be container related purely because I happened to be
> > > seeing the issue with an EL9.5 container that had the EL10-based
> > > nfs-utils 2.8.2 installed).
> > > 
> > > Steved, unfortunately I'm not sure what the problem is with the newer
> > > nfs-utils and setting "options sunrpc pool_mode=pernode"
> > > 
> > 
> > I tried to reproduce this using fedora-41 VMs (no f42 available for
> > virt-builder yet), but everything worked. I don't have any actual NUMA
> > hw here though, so maybe that matters?
> > 
> > Can you run this on the nfs server and send back the output? I'm
> > wondering if this setting might not track the module option properly on
> > that host for some reason:
> > 
> >     # nfsdctl pool-mode
> 
> (from EL9.5 container with nfs-utils 2.8.2)
> # nfsdctl pool-mode
> pool-mode: pernode
> npools: 2
> 
> (on host)
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 11665 MB
> node 0 free: 9892 MB
> node 1 cpus: 8 9 10 11 12 13 14 15
> node 1 size: 6042 MB
> node 1 free: 5127 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> 
> (and yeahh I was aware the newer nfs-utils uses the netlink interface,
> will be interesting to pin down what the issue is with
> pool-mode=pernode)

Hi Mike,

I submitted a patch for this a couple of weeks ago:

    https://lore.kernel.org/linux-nfs/20250527-rpc-numa-v1-1-fa1d98e9a900@kernel.org/

Were you able to test it, and did it fix your issue?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

end of thread, other threads:[~2025-06-13 12:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-23 18:29 unable to run NFSD in container if "options sunrpc pool_mode=pernode" Mike Snitzer
2025-05-23 18:40 ` Jeff Layton
2025-05-23 22:19   ` Mike Snitzer
2025-05-23 22:38     ` Mike Snitzer
2025-05-23 22:40     ` Jeff Layton
2025-05-23 23:09       ` Mike Snitzer
2025-05-24  3:53         ` Mike Snitzer
2025-05-24 10:26           ` Jeff Layton
2025-05-24 12:05           ` Jeff Layton
2025-05-24 14:33             ` Mike Snitzer
2025-05-24 15:10               ` Jeff Layton
2025-05-27 13:50               ` Jeff Layton
2025-05-27 21:59                 ` Jeff Layton
2025-06-13 12:32               ` Jeff Layton
2025-05-23 18:40 ` Chuck Lever
