From: "David Wang" <00107082@163.com>
To: "Mal Haak" <malcolm@haak.id.au>
Cc: "Viacheslav Dubeyko" <Slava.Dubeyko@ibm.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"Xiubo Li" <xiubli@redhat.com>,
"idryomov@gmail.com" <idryomov@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"surenb@google.com" <surenb@google.com>,
dhowells@redhat.com, pc@manguebit.org, netfs@lists.linux.dev
Subject: Re: Possible memory leak in 6.17.7
Date: Tue, 16 Dec 2025 20:18:11 +0800 (CST)
Message-ID: <5845dde.b3e3.19b2718bc89.Coremail.00107082@163.com>
In-Reply-To: <20251216215527.61c2e16f@xps15mal>
At 2025-12-16 19:55:27, "Mal Haak" <malcolm@haak.id.au> wrote:
>On Tue, 16 Dec 2025 17:09:18 +1000
>Mal Haak <malcolm@haak.id.au> wrote:
>
>> On Tue, 16 Dec 2025 15:00:43 +0800 (CST)
>> "David Wang" <00107082@163.com> wrote:
>>
>> > At 2025-12-16 09:26:47, "Mal Haak" <malcolm@haak.id.au> wrote:
>> > >On Mon, 15 Dec 2025 19:42:56 +0000
>> > >Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
>> > >
>> > >> Hi Mal,
>> > >>
>> > ><SNIP>
>> > >>
>> > >> Thanks a lot for reporting the issue. Finally, I can see the
>> > >> discussion in email list. :) Are you working on the patch with
>> > >> the fix? Should we wait for the fix or I need to start the issue
>> > >> reproduction and investigation? I am simply trying to avoid
>> > >> patches collision and, also, I have multiple other issues for
>> > >> the fix in CephFS kernel client. :)
>> > >>
>> > >> Thanks,
>> > >> Slava.
>> > >
>> > >Hello,
>> > >
>> > >Unfortunately creating a patch is just outside my comfort zone,
>> > >I've lived too long in Lustre land.
>> >
>> > Hi, just out of curiosity, have you narrowed down the caller of
>> > __filemap_get_folio causing the memory problem? Or do you have
>> > trouble applying the debug patch for memory allocation profiling?
>> >
>> > David
>> >
>> Hi David,
>>
>> I hadn't yet as I did test XFS and NFS to see if it replicated the
>> behaviour and it did not.
>>
>> But actually this could speed things up considerably. I will do that
>> now and see what I get.
>>
>> Thanks
>>
>> Mal
>>
>I did just give it a blast.
>
>Unfortunately it returned exactly what I expected, that is the calls
>are all coming from netfs.
>
>Which makes sense for cephfs.
>
># sort -g /proc/allocinfo|tail|numfmt --to=iec
>  10M   2541 drivers/block/zram/zram_drv.c:1597 [zram] func:zram_meta_alloc
>  12M   3001 mm/execmem.c:41 func:execmem_vmalloc
>  12M   3605 kernel/fork.c:311 func:alloc_thread_stack_node
>  16M    992 mm/slub.c:3061 func:alloc_slab_page
>  20M  35544 lib/xarray.c:378 func:xas_alloc
>  31M   7704 mm/memory.c:1192 func:folio_prealloc
>  69M  17562 mm/memory.c:1190 func:folio_prealloc
> 104M   8212 mm/slub.c:3059 func:alloc_slab_page
> 124M  30075 mm/readahead.c:189 func:ractl_alloc_folio
> 2.6G 661392 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
>
>So, unfortunately it doesn't reveal the true source. But was worth a
>shot! So thanks again
Oh, at least cephfs could be ruled out, right?
CC netfs folks then. :)
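
For anyone who wants to post-process the numbers rather than eyeball
them, here is a minimal Python sketch of the same ranking as the
sort/tail pipeline quoted above. It assumes the line layout visible in
that output, "<bytes> <calls> <file>:<line> [module] func:<name>", with
raw byte counts (i.e. before numfmt). The names AllocTag,
parse_allocinfo and top_tags are hypothetical helpers, not part of any
kernel tooling.

```python
import re
from typing import List, NamedTuple


class AllocTag(NamedTuple):
    size: int       # bytes currently allocated at this call site
    calls: int      # number of live allocations
    location: str   # "<file>:<line>"
    module: str     # module name from "[...]", or "" if built in
    func: str       # allocating function


# Assumed layout: "<bytes> <calls> <file>:<line> [module] func:<name>"
_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\S+)(?:\s+\[([^\]]+)\])?\s+func:(\S+)\s*$"
)


def parse_allocinfo(text: str) -> List[AllocTag]:
    """Parse allocinfo-style text, skipping header or unrecognized lines."""
    tags = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue
        size, calls, location, module, func = m.groups()
        tags.append(AllocTag(int(size), int(calls), location, module or "", func))
    return tags


def top_tags(text: str, n: int = 10) -> List[AllocTag]:
    """Return the n call sites holding the most bytes, largest first."""
    return sorted(parse_allocinfo(text), key=lambda t: t.size, reverse=True)[:n]
```

On a kernel with memory allocation profiling enabled, something like
top_tags(open("/proc/allocinfo").read()) run as root should give the
same ordering as the shell pipeline, largest first.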
>
>Mal
>
>
>> > >
>> > >I have been trying to narrow down a consistent reproducer that's
>> > >as fast as my production workload (which crashes a 32GB VM in 2hrs),
>> > >but I haven't got it quite as fast. I think the dd workload is too
>> > >well behaved.
>> > >
>> > >I can confirm the issue appeared in the major patch set that was
>> > >applied as part of the 6.15 kernel, i.e. during the more complete
>> > >pages-to-folios switch, and that nothing has changed in the bug
>> > >behaviour since then. I did have a look at all the diffs from 6.14
>> > >to 6.18 on addr.c and didn't see any changes post 6.15 that looked
>> > >like they would impact the bug behaviour.
>> > >
>> > >Again, I'm not super familiar with the CephFS code, but to hazard a
>> > >guess, the fact that the web download workload triggers things
>> > >faster suggests that unaligned writes might make things worse. But
>> > >again, I'm not 100% sure. I can't find a reproducer as fast as
>> > >downloading a dataset. Rsync of lots and lots of tiny files is a
>> > >tad faster than the dd case.
>> > >
>> > >I did see some changes in ceph_check_page_before_write where the
>> > >previous code unlocked pages and then continued, whereas the
>> > >changed folio code just returns ENODATA and doesn't unlock
>> > >anything with most of the rest of the logic unchanged. This might
>> > >be perfectly fine, but in my, admittedly limited, reading of the
>> > >code I couldn't figure out where anything that was locked prior to
>> > >this being called would get unlocked like it did prior to the
>> > >change. Again, I could be miles off here and one of the bulk
>> > >reclaim/unlock passes that was added might be cleaning this up
>> > >correctly or some other functional change might take care of this,
>> > >but it looks to be potentially in the code path I'm exercising and
>> > >it has had some unlock logic changed.
>> > >
>> > >I've spent most of my time trying to find a solid quick reproducer.
>> > >Not that it takes long to start leaking folios, but I wanted
>> > >something that aggressively triggered it so a small vm would oom
>> > >quickly and when combined with crash_on_oom it could potentially be
>> > >used for regression testing by way of "did vm crash?".
>> > >
>> > >I'm not sure if it will super help, but I'll provide what details I
>> > >can about the actual workload that really sets it off. It's a
>> > >python based tool for downloading datasets. Datasets are split
>> > >into N chunks and the tool downloads them in parallel 100 at a
>> > >time until all N chunks are down. The compressed dataset is then
>> > >unpacked and reassembled for use with workloads.
>> > >
>> > >This is replicating a common home folder use case in HPC. CephFS is
>> > >very attractive for home folders due to its "NFS-like" utility and
>> > >performance. And many tools use a similar method for fetching large
>> > >datasets. Tools are frequently written in python or go.
>> > >
>> > >None of my customers have hit this yet, nor have any enterprise
>> > >customers as none have moved to a new enough kernel yet due to slow
>> > >upgrade cycles. Even Proxmox have only just started testing on a
>> > >kernel version > 6.14.
>> > >
>> > >I'm more than happy to help however I can with testing. I can run
>> > >instrumented kernels or test patches or whatever you need. I am
>> > >sorry I haven't been able to produce a super clean, fast reproducer
>> > >(my test cluster at home is all spinners and only 500TB usable).
>> > >But I figured I needed to get the word out asap as distros and soon
>> > >customers are going to be moving past 6.12-6.14 kernels as the 5-7
>> > >year update cycle marches on. Especially those wanting to take full
>> > >advantage of CacheFS and encryption functionality.
>> > >
>> > >Again, thanks for looking at this and do reach out if I can help in
>> > >any way. I am in the ceph slack if it's faster to reach out that
>> > >way.
>> > >
>> > >Regards
>> > >
>> > >Mal Haak
>>
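
The download workload described in the quoted text (a dataset split
into N chunks, fetched 100 at a time, with possibly unaligned writes)
could be approximated locally along the following lines. This is only a
sketch of the I/O pattern with the network part replaced by random
data. It is not known to reproduce the leak, and the function names,
file naming and the 4093-byte write size are all assumptions chosen for
illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List


def write_chunk(path: str, chunk_size: int, write_size: int) -> str:
    """Write chunk_size bytes to path using write_size-byte writes.

    buffering=0 keeps Python from coalescing the writes, so each
    f.write() is issued as its own write call of (at most) write_size
    bytes. A short write is handled by counting only what was written.
    """
    remaining = chunk_size
    with open(path, "wb", buffering=0) as f:
        while remaining > 0:
            remaining -= f.write(os.urandom(min(write_size, remaining)))
    return path


def write_chunks(dest_dir: str, n_chunks: int, chunk_size: int,
                 workers: int = 100, write_size: int = 4093) -> List[str]:
    """Write n_chunks files into dest_dir, up to `workers` at a time.

    write_size defaults to a value that is not a multiple of 4096, so
    most writes start and end off a page boundary (an assumption about
    what "unaligned" means for this workload).
    """
    paths = [os.path.join(dest_dir, f"chunk_{i:05d}.part")
             for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda p: write_chunk(p, chunk_size, write_size), paths))
```

Pointing dest_dir at a CephFS mount and watching /proc/allocinfo while
it runs would be one way to compare against the dd and rsync cases.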