From: Johannes Weiner <hannes@cmpxchg.org>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Michal Hocko <mhocko@kernel.org>,
David Hildenbrand <david@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: skip lru_note_cost() when scanning only file or anon
Date: Mon, 14 Jul 2025 11:22:47 -0400
Message-ID: <20250714152247.GB991@cmpxchg.org>
In-Reply-To: <8734b2vcgr.fsf@linux.dev>
On Fri, Jul 11, 2025 at 10:55:48AM -0700, Roman Gushchin wrote:
> Johannes Weiner <hannes@cmpxchg.org> writes:
> > The caveat with this patch is that, aside from the static noswap
> > scenario, modes can switch back and forth abruptly or even overlap.
> >
> > So if you leave a pressure scenario and go back to cache trimming, you
> > will no longer age the cost information. The next spike could
> > be starting out with potentially quite stale information.
> >
> > Or say proactive reclaim recently already targeted anon, and there
> > were rotations and pageouts; that would be useful data for a reactive
> > reclaimer doing work at around the same time, or shortly thereafter.
>
> Agree, but at the same time it's possible to come up with a scenario
> where it's not good.
>          A
>         / \
>        B   C  memory.max=X
>           / \
>          D   E
>
> Let's say we have a cgroup structure like this and we apply a lot
> of proactive anon pressure on E; then the pressure on D from
> C's limit will be biased towards file without a good reason.
No, this is on purpose. D and E are not independent. They're in the
same memory domain, C. So if you want to reclaim C, and a subset of
its anon has already been pressured to resistance, then a larger part
of the reclaim candidates in C will need to come from file.
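
To make the D/E point concrete, here is a minimal user-space sketch, not kernel code, of cost recorded on a leaf being charged to every ancestor. It assumes a simplified model loosely patterned after lru_note_cost(): the cost is added at each level on the way up, and both counters at a level are halved once their sum exceeds a quarter of that level's LRU size. `struct node`, `note_cost()` and the `lru_size` field are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-cgroup lruvec; not the kernel struct. */
struct node {
	struct node *parent;
	unsigned long anon_cost;
	unsigned long file_cost;
	unsigned long lru_size;	/* pages on this level's LRU lists */
};

/* Charge @cost to @n and all of its ancestors, aging each level as we go. */
void note_cost(struct node *n, bool file, unsigned long cost)
{
	assert(n != NULL);

	for (; n; n = n->parent) {
		if (file)
			n->file_cost += cost;
		else
			n->anon_cost += cost;

		/* Assumed aging rule: decay once cost exceeds lru_size / 4. */
		if (n->anon_cost + n->file_cost > n->lru_size / 4) {
			n->anon_cost /= 2;
			n->file_cost /= 2;
		}
	}
}
```

With A as the root, C as its child and D and E as children of C, calling note_cost() on E for anon leaves D's counters untouched but raises C's anon cost. Under this model, reclaim driven by C's limit consults C's counters, so it leans towards file for all of C's children, including D.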
> Or as in my case, if a cgroup has memory.memsw.limit set and is
> thrashing, does it make sense to bias the rest of the system
> into anon reclaim? The recorded cost can be really large.
>
> >
> > So for everything but the static noswap case, the patch makes me
> > nervous. And I'm not sure it actually helps in the cases where it
> > would matter the most.
>
> I understand, but do you think it's acceptable with some additional
> conditions: e.g. narrow it down to only very high scanning priorities?
> Or !sc.may_swap case?
>
> In the end, we have the following code in get_scan_count(), so at
> least on priority 0 we ignore all costs anyway.
> 	if (!sc->priority && swappiness) {
> 		scan_balance = SCAN_EQUAL;
> 		goto out;
> 	}
>
> Wdyt?
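
For readers following along, a rough user-space sketch of how recorded costs could turn into relative scan weights, including the priority 0 shortcut quoted above. This is a simplified model and an assumption on my part, not a copy of get_scan_count(): the names `compute_scan_weights()` and `struct scan_weights` are hypothetical, and MAX_SWAPPINESS is assumed to be 200.

```c
#include <assert.h>

#define MAX_SWAPPINESS 200	/* assumed upper bound for swappiness */

enum scan_balance { SCAN_EQUAL, SCAN_FRACT };

/* Hypothetical result type: relative weights, meaningful for SCAN_FRACT. */
struct scan_weights {
	enum scan_balance mode;
	unsigned long anon;
	unsigned long file;
};

struct scan_weights compute_scan_weights(int priority, unsigned int swappiness,
					 unsigned long anon_cost,
					 unsigned long file_cost)
{
	struct scan_weights w = { SCAN_EQUAL, 1, 1 };
	unsigned long total;

	assert(swappiness <= MAX_SWAPPINESS);

	/* At priority 0 with swap allowed, recorded costs are ignored. */
	if (!priority && swappiness)
		return w;

	/* Dampen the ratio so neither list's weight collapses to zero. */
	total = anon_cost + file_cost;
	anon_cost += total;
	file_cost += total;
	total = anon_cost + file_cost;

	/* The costlier a list has been to reclaim, the smaller its weight. */
	w.mode = SCAN_FRACT;
	w.anon = swappiness * (total + 1) / (anon_cost + 1);
	w.file = (MAX_SWAPPINESS - swappiness) * (total + 1) / (file_cost + 1);
	return w;
}
```

In this model, a higher anon cost than file cost at a non-zero priority yields a larger file weight, which is the bias under discussion; at priority 0 both lists are scanned equally regardless of cost.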
I think relitigating a proven aging mechanism after half a decade in
production is going to be tough and require extensive testing.
If your primary problem is the cost of the locking, I'd focus on that.
> > It might make more sense to look into the cost (ha) of the cost
> > recording itself. Can we turn it into a vmstat item? That would make
> > it lockless, would get rstat batching up the cgroup tree etc. This
> > doesn't need to be 100% precise and race free after all.
>
> Idk, maybe yes, but rstat flushing was a source of the issues as well
> and now it's mostly ratelimited, so I'm concerned that because of that
> we'll have sudden changes in the reclaim behavior every 2 seconds.
That's not a new hazard, though. prepare_scan_control() decisions are
already subject to this, as is the lru cost aging itself.
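
As a rough illustration of the lockless direction suggested earlier in the thread (recording cost the way vmstat items are counted), here is a minimal user-space sketch of batched counting. It is not the vmstat or rstat implementation: a thread-local delta stands in for a per-CPU counter, and `cost_add()`, `cost_read()` and `COST_BATCH` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>

#define COST_BATCH 32	/* hypothetical fold threshold */

static atomic_long global_cost;		/* shared, approximate total */
static _Thread_local long local_cost;	/* stand-in for a per-CPU delta */

/* Record cost without taking a lock; fold into the total once per batch. */
void cost_add(long nr)
{
	assert(nr >= 0);

	local_cost += nr;
	if (local_cost >= COST_BATCH) {
		atomic_fetch_add_explicit(&global_cost, local_cost,
					  memory_order_relaxed);
		local_cost = 0;
	}
}

/* Approximate read: can lag by up to COST_BATCH - 1 per thread. */
long cost_read(void)
{
	return atomic_load_explicit(&global_cost, memory_order_relaxed);
}
```

The trade-off is the one raised above: readers see a value that lags behind the true total, which is acceptable only because the cost data does not need to be exact or race free.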