From: Mal Haak <malcolm@haak.id.au>
To: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Cc: "00107082@163.com" <00107082@163.com>,
Xiubo Li <xiubli@redhat.com>, David Howells <dhowells@redhat.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"surenb@google.com" <surenb@google.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"netfs@lists.linux.dev" <netfs@lists.linux.dev>,
"pc@manguebit.org" <pc@manguebit.org>,
"idryomov@gmail.com" <idryomov@gmail.com>
Subject: Re: Possible memory leak in 6.17.7
Date: Wed, 17 Dec 2025 12:28:38 +1000
Message-ID: <20251217122838.3748ea92@xps15mal>
In-Reply-To: <ec3b777ba176a6ca4738da8c62c030577a4e58eb.camel@ibm.com>
On Wed, 17 Dec 2025 01:56:52 +0000
Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> Hi Mal,
>
> On Tue, 2025-12-16 at 20:42 +0800, David Wang wrote:
> > At 2025-12-16 20:18:11, "David Wang" <00107082@163.com> wrote:
> > >
> > >
>
> <skipped>
>
> > > >
> > > >
> > > > > > >
> > > > > > > > I have been trying to narrow down a consistent
> > > > > > > > reproducer that's as fast as my production workload
> > > > > > > > (which crashes a 32GB VM in 2 hours), but I haven't got
> > > > > > > > one quite as fast. I think the dd workload is too well
> > > > > > > > behaved.
> > > > > > >
> > > > > > > > I can confirm the issue appeared in the major patch set
> > > > > > > > that was applied as part of the 6.15 kernel, so during
> > > > > > > > the more complete pages-to-folios switch, and that
> > > > > > > > nothing has changed in the bug behaviour since then. I
> > > > > > > > did have a look at all the diffs from 6.14 to 6.18 on
> > > > > > > > addr.c and didn't see any changes post 6.15 that looked
> > > > > > > > like they would impact the bug behaviour.
> > > > > > >
> > > > > > > > Again, I'm not super familiar with the CephFS code, but
> > > > > > > > to hazard a guess, the fact that the web download
> > > > > > > > workload triggers things faster suggests that unaligned
> > > > > > > > writes might make things worse. But again, I'm not 100%
> > > > > > > sure. I can't find a reproducer as fast as downloading a
> > > > > > > dataset. Rsync of lots and lots of tiny files is a tad
> > > > > > > faster than the dd case.
> > > > > > >
> > > > > > > I did see some changes in ceph_check_page_before_write
> > > > > > > where the previous code unlocked pages and then continued
> > > > > > > > whereas the changed folio code just returns ENODATA and
> > > > > > > doesn't unlock anything with most of the rest of the
> > > > > > > logic unchanged. This might be perfectly fine, but in my,
> > > > > > > admittedly limited, reading of the code I couldn't figure
> > > > > > > out where anything that was locked prior to this being
> > > > > > > called would get unlocked like it did prior to the
> > > > > > > change. Again, I could be miles off here and one of the
> > > > > > > bulk reclaim/unlock passes that was added might be
> > > > > > > cleaning this up correctly or some other functional
> > > > > > > change might take care of this, but it looks to be
> > > > > > > > potentially in the code path I'm exercising and it has had
> > > > > > > some unlock logic changed.
> > > > > > >
> > > > > > > I've spent most of my time trying to find a solid quick
> > > > > > > reproducer. Not that it takes long to start leaking
> > > > > > > folios, but I wanted something that aggressively
> > > > > > > triggered it so a small vm would oom quickly and when
> > > > > > > combined with crash_on_oom it could potentially be used
> > > > > > > for regression testing by way of "did vm crash?".
> > > > > > >
> > > > > > > I'm not sure if it will super help, but I'll provide what
> > > > > > > details I can about the actual workload that really sets
> > > > > > > it off. It's a python based tool for downloading
> > > > > > > datasets. Datasets are split into N chunks and the tool
> > > > > > > downloads them in parallel 100 at a time until all N
> > > > > > > chunks are down. The compressed dataset is then unpacked
> > > > > > > and reassembled for use with workloads.
> > > > > > >
> > > > > > > > This is replicating a common home folder use case in HPC.
> > > > > > > > CephFS is very attractive for home folders due to its
> > > > > > > "NFS-like" utility and performance. And many tools use a
> > > > > > > similar method for fetching large datasets. Tools are
> > > > > > > frequently written in python or go.
> > > > > > >
> > > > > > > > None of my customers have hit this yet, nor have any
> > > > > > > enterprise customers as none have moved to a new enough
> > > > > > > kernel yet due to slow upgrade cycles. Even Proxmox have
> > > > > > > only just started testing on a kernel version > 6.14.
> > > > > > >
> > > > > > > I'm more than happy to help however I can with testing. I
> > > > > > > can run instrumented kernels or test patches or whatever
> > > > > > > you need. I am sorry I haven't been able to produce a
> > > > > > > super clean, fast reproducer (my test cluster at home is
> > > > > > > all spinners and only 500TB usable). But I figured I
> > > > > > > needed to get the word out asap as distros and soon
> > > > > > > customers are going to be moving past 6.12-6.14 kernels
> > > > > > > as the 5-7 year update cycle marches on. Especially those
> > > > > > > wanting to take full advantage of CacheFS and encryption
> > > > > > > functionality.
> > > > > > >
> > > > > > > Again thanks for looking at this and do reach out if I
> > > > > > > > can help in any way. I am in the ceph slack if it's faster
> > > > > > > to reach out that way.
> > > > > > >
> > > > >
>
> Could you please add your CephFS kernel client's mount options into
> the ticket [1]?
>
> Thanks a lot,
> Slava.
>
> [1] https://tracker.ceph.com/issues/74156
I've updated the ticket.
I am curious about the differences between your test setup and my
actual setup in terms of capacity and hardware.
I can provide crash dumps if it is helpful.
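For anyone who wants to approximate the shape of the download workload
described in the quoted text (N chunks written in parallel, 100 at a
time), here is a minimal Python sketch. It is not the actual tool: it
writes locally generated data instead of downloading anything, and the
names write_chunk and write_chunks, the 4093-byte write size, and the
file naming are all assumptions made for illustration. The odd write
size is only there to produce writes that are not page aligned; whether
that matters for the leak is a guess.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_chunk(path, chunk_bytes, write_size):
    """Write chunk_bytes of random data to path in write_size pieces.

    buffering=0 keeps Python from coalescing the small writes, so each
    piece is issued to the kernel as its own write() call.
    """
    buf = os.urandom(write_size)
    written = 0
    with open(path, "wb", buffering=0) as f:
        while written < chunk_bytes:
            n = min(write_size, chunk_bytes - written)
            # Raw writes return the number of bytes actually written.
            written += f.write(buf[:n])
    return written


def write_chunks(target_dir, n_chunks, chunk_bytes, parallel=100,
                 write_size=4093):
    """Write n_chunks files into target_dir, at most `parallel` at once.

    Returns the total number of bytes written across all chunks.
    """
    os.makedirs(target_dir, exist_ok=True)
    paths = [os.path.join(target_dir, f"chunk_{i:05d}.bin")
             for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        sizes = pool.map(
            lambda p: write_chunk(p, chunk_bytes, write_size), paths)
        return sum(sizes)
```

The intended use would be to point target_dir at a CephFS mount and
watch memory usage while it runs. This has not been confirmed to
trigger the leak.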
Thanks
Mal