Subject: Leaking plh_return_segs in containerized pNFS client?
From: Omar Sandoval @ 2025-04-04 18:57 UTC
To: Trond Myklebust; +Cc: Jeff Layton, Chris Mason, linux-nfs
Hi, Trond,
I'm investigating an issue on our systems that are running your latest
containerized NFS client teardown patches while Jeff is out. We're not
seeing the NFS client get stuck anymore, but I'm debugging what appears
to be a reference leak.
Jeff noticed that there are some lingering network namespaces not in use
by any processes after the container shutdown. I chased these references
through:
net -> nfs_client -> nfs4_pnfs_ds -> nfs4_ff_layout_ds ->
nfs4_ff_layout_mirror -> nfs4_ff_layout_segment
What I'm seeing is:
* The nfs4_ff_layout_segment/pnfs_layout_segment has a pls_refcount of
0, but hasn't been freed.
* Its pls_layout has already been freed, and the nfs_inode
and nfs_server are also long gone.
* The segment is still linked on its (now freed) pls_layout->plh_return_segs list.
>>> lseg
*(struct pnfs_layout_segment *)0xffff88813147ca00 = {
.pls_list = (struct list_head){
.next = (struct list_head *)0xffff8885d49e0f38,
.prev = (struct list_head *)0xffff888dee919f80,
},
.pls_lc_list = (struct list_head){
.next = (struct list_head *)0xffff88813147ca10,
.prev = (struct list_head *)0xffff88813147ca10,
},
.pls_commits = (struct list_head){
.next = (struct list_head *)0xffff88813147ca20,
.prev = (struct list_head *)0xffff88813147ca20,
},
.pls_range = (struct pnfs_layout_range){
.iomode = (u32)1,
.offset = (u64)0,
.length = (u64)18446744073709551615,
},
.pls_refcount = (refcount_t){
.refs = (atomic_t){
.counter = (int)0,
},
},
.pls_seq = (u32)2,
.pls_flags = (unsigned long)10,
.pls_layout = (struct pnfs_layout_hdr *)0xffff8885d49e0f00,
}
>>> decode_enum_type_flags(lseg.pls_flags, prog["NFS_LSEG_VALID"].type_)
'NFS_LSEG_ROC|NFS_LSEG_LAYOUTRETURN'
>>> lseg.pls_list.next == lseg.pls_layout.plh_return_segs.address_of_()
True
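For completeness, this is roughly how I'd enumerate whatever else is still
linked on that list (list_for_each_entry here is the helper from
drgn.helpers.linux.list; since the pnfs_layout_hdr has already been freed,
this walks freed memory, so any output would only be suggestive):

>>> from drgn.helpers.linux.list import list_for_each_entry
>>> for seg in list_for_each_entry("struct pnfs_layout_segment",
...                                lseg.pls_layout.plh_return_segs.address_of_(),
...                                "pls_list"):
...     print(hex(seg.value_()), seg.pls_refcount.refs.counter)
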
So my guess is that there were still segments on plh_return_segs when
the pnfs_layout_hdr was freed. I wasn't able to make sense of how the
lifetime of that list is supposed to work. My next step is to test with
WARN_ONCE(!list_empty(&lo->plh_return_segs)) in the free path of
pnfs_put_layout_hdr(). In the meantime, do you have any ideas?
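For reference, the check I have in mind looks something like this. It's only
a sketch written against my recollection of pnfs_put_layout_hdr() in
fs/nfs/pnfs.c, so treat the surrounding code as approximate rather than an
actual patch:

        /* Sketch only: paraphrase of pnfs_put_layout_hdr() with the extra check. */
        void pnfs_put_layout_hdr(struct pnfs_layout_hdr *lo)
        {
                struct inode *inode = lo->plh_inode;

                if (refcount_dec_and_lock(&lo->plh_refcount, &inode->i_lock)) {
                        if (!list_empty(&lo->plh_segs))
                                WARN_ONCE(1, "NFS: BUG unfreed layout segments.\n");
                        /* New: catch segments stranded on the return list. */
                        WARN_ONCE(!list_empty(&lo->plh_return_segs),
                                  "NFS: freeing layout hdr with segments on plh_return_segs\n");
                        pnfs_detach_layout_hdr(lo);
                        spin_unlock(&inode->i_lock);
                        pnfs_free_layout_hdr(lo);
                }
        }
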
Thanks,
Omar
P.S. I spotted a separate issue that nfs4_data_server_cache is only
keyed on the socket address, not taking the network namespace into
account, which can result in connections being shared between
containers. This leak has a knock-on effect of pinning dead DS
connections in the cache, which other containers may try to reuse. Maybe
the cache should be split up by netns?
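Something along these lines is what I was picturing. This is purely
hypothetical: _data_server_lookup_locked(), _same_data_server_addrs_locked(),
and nfs4_data_server_cache are from my reading of fs/nfs/pnfs_nfs.c, and the
ds_net field does not exist today (it would have to be recorded when the
nfs4_pnfs_ds is created):

        /* Hypothetical sketch: key the DS cache on (netns, address list). */
        static struct nfs4_pnfs_ds *
        _data_server_lookup_locked(const struct net *net,
                                   const struct list_head *dsaddrs)
        {
                struct nfs4_pnfs_ds *ds;

                list_for_each_entry(ds, &nfs4_data_server_cache, ds_node)
                        if (ds->ds_net == net &&
                            _same_data_server_addrs_locked(&ds->ds_addrs, dsaddrs))
                                return ds;
                return NULL;
        }
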
Subject: Re: Leaking plh_return_segs in containerized pNFS client?
From: Trond Myklebust @ 2025-04-06 9:04 UTC
To: osandov@osandov.com
Cc: linux-nfs@vger.kernel.org, jlayton@kernel.org, clm@meta.com
Hi Omar,
On Fri, 2025-04-04 at 11:57 -0700, Omar Sandoval wrote:
> Hi, Trond,
>
> I'm investigating an issue on our systems that are running your
> latest
> containerized NFS client teardown patches while Jeff is out. We're
> not
> seeing the NFS client get stuck anymore, but I'm debugging what
> appears
> to be a reference leak.
>
> Jeff noticed that there are some lingering network namespaces not in
> use
> by any processes after the container shutdown. I chased these
> references
> through:
>
> net -> nfs_client -> nfs4_pnfs_ds -> nfs4_ff_layout_ds ->
> nfs4_ff_layout_mirror -> nfs4_ff_layout_segment
>
> What I'm seeing is:
>
> * The nfs4_ff_layout_segment/pnfs_layout_segment has a pls_refcount
> of
> 0, but hasn't been freed.
> * Its pls_layout has already been freed, and the nfs_inode
> and nfs_server are also long gone.
> * The segment is still linked on its (now freed) pls_layout->plh_return_segs list.
>
> >>> lseg
> *(struct pnfs_layout_segment *)0xffff88813147ca00 = {
> .pls_list = (struct list_head){
> .next = (struct list_head *)0xffff8885d49e0f38,
> .prev = (struct list_head *)0xffff888dee919f80,
> },
> .pls_lc_list = (struct list_head){
> .next = (struct list_head *)0xffff88813147ca10,
> .prev = (struct list_head *)0xffff88813147ca10,
> },
> .pls_commits = (struct list_head){
> .next = (struct list_head *)0xffff88813147ca20,
> .prev = (struct list_head *)0xffff88813147ca20,
> },
> .pls_range = (struct pnfs_layout_range){
> .iomode = (u32)1,
> .offset = (u64)0,
> .length = (u64)18446744073709551615,
> },
> .pls_refcount = (refcount_t){
> .refs = (atomic_t){
> .counter = (int)0,
> },
> },
> .pls_seq = (u32)2,
> .pls_flags = (unsigned long)10,
> .pls_layout = (struct pnfs_layout_hdr
> *)0xffff8885d49e0f00,
> }
> >>> decode_enum_type_flags(lseg.pls_flags,
> prog["NFS_LSEG_VALID"].type_)
> 'NFS_LSEG_ROC|NFS_LSEG_LAYOUTRETURN'
> >>> lseg.pls_list.next ==
> lseg.pls_layout.plh_return_segs.address_of_()
> True
>
> So my guess is that there were still segments on plh_return_segs when
> the pnfs_layout_hdr was freed. I wasn't able to make sense of how the
> lifetime of that list is supposed to work. My next step is to test
> with
> WARN_ONCE(!list_empty(&lo->plh_return_segs)) in the free path of
> pnfs_put_layout_hdr(). In the meantime, do you have any ideas?
I suspect these might just be layout segments that were queued for return
but never actually got sent. It looks to me as if the layoutreturn-on-close
code might be to blame.
I'll send out a patch series with a fix.
>
> Thanks,
> Omar
>
> P.S. I spotted a separate issue that nfs4_data_server_cache is only
> keyed on the socket address, not taking the network namespace into
> account, which can result in connections being shared between
> containers. This leak has a knock-on effect of pinning dead DS
> connections in the cache, which other containers may try to reuse.
> Maybe
> the cache should be split up by netns?
Agreed, and thanks for spotting this!
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com