Linux NFS development
From: Jeff Layton <jlayton@kernel.org>
To: Trond Myklebust <trondmy@hammerspace.com>,
	Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
	Chris Mason	 <clm@meta.com>
Subject: Re: leaked pNFS DS nfs_client references
Date: Wed, 23 Apr 2025 11:53:44 -0400
Message-ID: <9fd8b6f84a39d5dc7715b1a7411df90b2ea83b74.camel@kernel.org>
In-Reply-To: <6dcb8370766dac91240301e1cfbf9b77e863da08.camel@kernel.org>

On Mon, 2025-04-21 at 15:46 -0400, Jeff Layton wrote:
> Hi Trond/Anna:
> 
> We (at Meta) have been hunting a number of problems surrounding leaked
> network namespaces with containerized workloads. We recently deployed a
> v6.9 based kernel on the clients that has all the known containerized
> NFS fixes from upstream.
> 
> Usually, when we've found problems with leaked netns's it has been
> because there were still outstanding RPCs associated with the rpc_clnt.
> Today, we found a host that seems to have some leaked nfs_client
> structures, but there is no associated RPC activity.
> 
> In this case, we had 2 leaked net namespaces. We discovered them by
> looking under /sys/kernel/debug/sunrpc/rpc_xprt for xprts associated
> with netns's that no longer have any userland tasks attached.
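
For anyone who wants to repeat that check, here is a rough sketch of the
userland half in Python. It assumes the candidate netns inode numbers
were already collected by hand (the `candidates` argument is hypothetical
input, not something this code parses out of debugfs), and the helper
names are made up for illustration. It only looks at /proc/<pid>/ns/net,
so a netns that is pinned by a bind mount or an open fd rather than by a
task would also be reported here.

```python
import os
import re

# /proc/<pid>/ns/net is a symlink whose target looks like "net:[4026531840]".
_NS_LINK = re.compile(r"^net:\[(\d+)\]$")


def parse_netns_link(target):
    """Return the netns inode number from a ns symlink target, or None."""
    m = _NS_LINK.match(target)
    return int(m.group(1)) if m else None


def netns_in_use(proc_root="/proc"):
    """Collect the netns inode numbers that userland tasks still reference."""
    inums = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            target = os.readlink(os.path.join(proc_root, entry, "ns", "net"))
        except OSError:
            continue  # task exited, or we lack permission to read the link
        inum = parse_netns_link(target)
        if inum is not None:
            inums.add(inum)
    return inums


def orphaned(candidates, in_use):
    """Candidate netns inums that no task references any more."""
    return sorted(set(candidates) - set(in_use))
```

Something like orphaned([4026558887, 4026558805], netns_in_use()) would
then list whichever of those two inums no task holds any more. Run it as
root, or the readlink calls on other users' tasks will fail and be
skipped.
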
> 
> Some drgn (pardon my terrible Python):
> 
> >>> for net in for_each_net():
> ...     if (net.ns.inum == 4026558887 or net.ns.inum == 4026558805):
> ...         print("netns:", net.ns.inum)
> ...         nfs_net = cast("struct nfs_net *", net.gen.ptr[prog["nfs_net_id"]])
> ...         print("Volume list empty:", list_empty(nfs_net.nfs_volume_list.address_of_()))
> ...         for clnt in list_for_each_entry("struct nfs_client", nfs_net.nfs_client_list.address_of_(), "cl_share_link"):
> ...             rpcclnt = clnt.cl_rpcclient
> ...             print(clnt.cl_count.refs.counter, clnt.cl_hostname, rpcclnt.cl_vers, "tasks: ", list_count_nodes(rpcclnt.cl_tasks.address_of_()))
> ... 
> netns: (unsigned int)4026558805
> Volume list empty: True
> (int)1 (char *)0xffff8a12e988a500 = "f00::3117:a4f1:a940:94af" (u32)3 tasks:  0
> (int)1 (char *)0xffff8881a0f694c0 = "f00::bfaa:cec2:8ee2:295" (u32)3 tasks:  0
> (int)1 (char *)0xffff889e81a74e40 = "f00::8f23:f52d:9d79:a7b0" (u32)3 tasks:  0
> (int)1 (char *)0xffff8a027d8e0780 = "f00::d209:97ba:1c6:3282" (u32)3 tasks:  0
> netns: (unsigned int)4026558887
> Volume list empty: True
> (int)1 (char *)0xffff8a14d5b0e2c0 = "f00::3f52:fea6:4ccb:96dd" (u32)3 tasks:  0
> (int)1 (char *)0xffff8881e6626cc0 = "f00::705:c924:ddc1:51e4" (u32)3 tasks:  0
> (int)1 (char *)0xffff8a149cdb6680 = "f00::3117:a4f1:a940:94af" (u32)3 tasks:  0
> (int)1 (char *)0xffff8896ada2f800 = "f00::d56c:cd93:1f0c:99c7" (u32)3 tasks:  0
> (int)1 (char *)0xffff8a159251f240 = "f00::614d:87c1:a73f:1f09" (u32)3 tasks:  0
> (int)1 (char *)0xffff888e699f4940 = "f00::1285:b785:f114:d38b" (u32)3 tasks:  0
> (int)1 (char *)0xffff88812ae41500 = "f00::fb1c:bc4a:3d9a:c2a6" (u32)3 tasks:  0
> (int)1 (char *)0xffff8a137dbc4e00 = "f00::bd2f:5851:b552:5bce" (u32)3 tasks:  0
> 
> There are 12 leaked nfs_clients in 2 netns's. There are no longer any
> struct nfs_servers associated with either netns. Each leaked client has
> a single outstanding reference. They're all connections to different
> DS's (except for one that appears in both netns's, but I suspect
> that's just coincidence). They're all NFSv3, which indicates that they
> are pNFS DS clients. None of them have any running RPCs.
> 
> I took a look at the nfs_client refcount handling in the pNFS code but
> didn't see any obvious bugs.
> 
> One thing we could consider is adding a refcount tracker for these
> objects. That would tell us pretty quickly what took the leftover
> references in the first place, assuming this is reproducible.
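
To illustrate the idea (this is a toy model in Python, not kernel code,
and the class and method names are invented for the example): each "get"
records who took the reference, each "put" drops that record, and
whatever is left over at teardown names the holder that never let go.
In the kernel this would presumably be built on the existing ref_tracker
infrastructure rather than anything like this.

```python
import itertools
import traceback


class TrackedRef:
    """Toy refcount that remembers the call site of every live reference."""

    def __init__(self, name):
        self.name = name
        self._ids = itertools.count(1)
        self._holders = {}  # token -> stack captured when the ref was taken

    def get(self):
        token = next(self._ids)
        # Drop the last frame (this method) so the stack ends at the caller.
        self._holders[token] = traceback.extract_stack()[:-1]
        return token

    def put(self, token):
        # Raises KeyError on a double put, which is itself worth knowing.
        del self._holders[token]

    @property
    def count(self):
        return len(self._holders)

    def report(self):
        """Map each outstanding token to the function that took it."""
        return {tok: stack[-1].name for tok, stack in self._holders.items()}
```

With a count of 1 left on each leaked client, report() on a tracker like
this would come back with a single entry, and that entry would be the
code path to go stare at.
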
> 
> This kernel is based on v6.9, so it's possible we missed a fix that we
> need. I didn't see anything obvious in recent git fixes though.
> 
> Any thoughts?

An update:

Omar and I worked together yesterday, and confirmed that this is the
same problem that he reported a week or two ago. This kernel has the
two patches you sent on April 6th:

    [PATCH v2 1/2] NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
    [PATCH v2 2/2] NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors

...so that's evidently not enough to fix it.

Note that this is a potential memory corruptor. The leaked layout segs
were all sitting on the layout's plh_return_segs list. If they get freed
later, then that will likely scribble over the list_head in the freed
layout.

Looking over the code, it appears that when the inode is evicted, the
plh_return_segs list should get cleaned out via:

nfs4_evict_inode
    pnfs_destroy_layout_final
        __pnfs_destroy_layout
            pnfs_mark_layout_stateid_invalid
                pnfs_free_returned_lsegs

pnfs_free_returned_lsegs() should put them all on the tmp_list and then
__pnfs_destroy_layout() should free the contents of the list via
pnfs_free_lseg_list().
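
As a toy model of what I'd expect that path to do (Python, not the
kernel code; the function names only loosely mirror the call chain
above, and everything about the bodies is a simplifying assumption):

```python
from collections import deque


class Layout:
    """Stand-in for a layout header with a list of returned segments."""

    def __init__(self, returned):
        self.return_segs = deque(returned)  # models plh_return_segs
        self.stateid_valid = True


def free_returned_lsegs(lo, tmp_list):
    # Move every returned segment off the layout and onto the caller's list.
    while lo.return_segs:
        tmp_list.append(lo.return_segs.popleft())


def mark_layout_stateid_invalid(lo, tmp_list):
    lo.stateid_valid = False
    free_returned_lsegs(lo, tmp_list)


def destroy_layout(lo):
    """Drain the layout, then free what was collected; returns what was freed."""
    tmp_list = []
    mark_layout_stateid_invalid(lo, tmp_list)
    freed = list(tmp_list)
    tmp_list.clear()  # models pnfs_free_lseg_list() emptying the list
    return freed
```

If the real code behaved like this, nothing would be left on the return
list by the time the layout itself is freed, which is exactly what we did
not see on this host.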

It's not clear to me why that didn't happen here.
-- 
Jeff Layton <jlayton@kernel.org>
