From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>,
	Zhongkun He <hezhongkun.hzk@bytedance.com>,
	akpm@linux-foundation.org, muchun.song@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2] mm: vmscan: skip the file folios in proactive reclaim if swappiness is MAX
Date: Fri, 14 Mar 2025 17:52:49 +0000
Message-ID: <Z9Rs8ZtgkupXpFYn@google.com>
In-Reply-To: <20250314165739.GB1316033@cmpxchg.org>

On Fri, Mar 14, 2025 at 12:57:39PM -0400, Johannes Weiner wrote:
> On Fri, Mar 14, 2025 at 03:49:30PM +0100, Michal Hocko wrote:
> > On Fri 14-03-25 10:18:33, Johannes Weiner wrote:
> > > On Fri, Mar 14, 2025 at 10:27:57AM +0100, Michal Hocko wrote:
> > [...]
> > > > I have just noticed that you have followed up [1] with a concern that
> > > > using swappiness in the whole min-max range without any heuristics turns
> > > > out to be harder than just relying on the min and max as extremes.
> > > > What seems to be still missing (or maybe it is just me not seeing that)
> > > > is why should we only enforce those extreme ends of the range and still
> > > > preserve under-defined semantic for all other swappiness values in the
> > > > pro-active reclaim.
> > > 
> > > I guess I'm not seeing the "under-defined" part.
> > 
> > What I meant here is that any other value than both ends of swappiness
> > doesn't have generally predictable behavior unless you know specific
> > details of the current memory reclaim heuristics in get_scan_count.
> > 
> > > cache_trim_mode is
> > > there to make sure a streaming file access pattern doesn't cause
> > > swapping.
> > 
> > Yes, I am aware of the purpose.
> > 
> > > He has a special usecase to override cache_trim_mode when he
> > > knows a large amount of anon is going cold. There is no way we can
> > > generally remove it from proactive reclaim.
> > 
> > I believe I do understand the requirement here. The patch offers a
> > counterpart to noswap pro-active reclaim and I do not have objections to
> > that.
> > 
> > The reason I brought this up is that everything in between 0..200 is
> > kinda gray area. We've had several queries why swappiness=N doesn't work
> > as expected and the usual answer was because of heuristics. Most people
> > just learned to live with that and stopped fine tuning vm_swappiness.
> > Which is good I guess.
> 
> You're still oversimplifying and then dismissing. The heuristics don't
> make swappiness meaningless, they make it useful in the first place.
> 
>   This control is used to define the rough relative IO cost of swapping
>   and filesystem paging, as a value between 0 and 200.
> 
> This is clearly defined, and implemented as such. cache_trim_mode is
> predicated on the *absence* of paging and caching benefits: A linear,
> use-once file access pattern that *does not* benefit from additional
> cache space. Kicking out anon for that purpose would be wrong under
> pretty much any circumstance. That's why it "overrides" swappiness:
> swappiness cannot apply when swapping at all would be nonsense.
> 
> Proactive reclaimers like ours rely on this. We use swappiness to
> express exactly what it says on the tin: the relative cost between
> thrashing file vs anon. We use it quite effectively to manage anon
> write rates for flash wear management e.g. Obviously that doesn't mean
> we want to swap when somebody streams through a large file set.
> 
> Zhongkun's case is a significant exception. He just wants to get rid
> of a known-cold anon set. This level of insight into userspace access
> patterns is rare in practice. You could argue that MADV_PAGEOUT might
> be more suitable for that.

We have a similar use case at Google where we have a known-cold anon set
and we proactively reclaim it. This is why we previously proposed
type=anon/file/.., but swappiness is more flexible for use cases like
the one Johannes describes above.

> But I also don't necessarily see a problem
> with making swappiness=200 do it; although we might have to teach our
> proactive reclaimer to auto-tune between 1 and 199 then.

Would it be better if we don't use the existing swappiness=200 for this?

We can support something like memory.reclaim X swappiness=max instead to
achieve the "anon only" mode without affecting the existing semantics of
swappiness at all. I have a feeling I may have already proposed that at
some point.

In the kernel, we can define a new value (say 201 or 1000) that means
anon only and set it in memory_reclaim() when "max" is specified. We can
then explicitly check for this value in get_scan_count() (we probably
also need to handle MGLRU?).
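
To make the idea concrete, here is a rough userspace sketch, not a patch.
Every name in it (SWAPPINESS_ANON_ONLY, parse_swappiness_arg(),
pick_scan_balance()) is made up for illustration, and the decision logic
is a heavily simplified stand-in for what get_scan_count() really does.
It only shows the sentinel being checked before the heuristics, so that
swappiness=200 keeps its current meaning:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SWAPPINESS        200                   /* top of the existing range */
#define SWAPPINESS_ANON_ONLY  (MAX_SWAPPINESS + 1)  /* hypothetical sentinel */

/* Simplified stand-in for the scan balance choices. */
enum scan_balance { SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/*
 * Hypothetical parser for the swappiness= argument of memory.reclaim.
 * "max" maps to the sentinel; anything else must be a number in 0..200.
 * Returns 0 on success, -EINVAL on bad input.
 */
int parse_swappiness_arg(const char *arg, int *out)
{
	char *end;
	long val;

	if (strcmp(arg, "max") == 0) {
		*out = SWAPPINESS_ANON_ONLY;
		return 0;
	}

	val = strtol(arg, &end, 10);
	if (end == arg || *end != '\0' || val < 0 || val > MAX_SWAPPINESS)
		return -EINVAL;

	*out = (int)val;
	return 0;
}

/*
 * Simplified decision: the sentinel is tested first, so it bypasses
 * cache_trim_mode. Every numeric value, including 200, still goes
 * through the heuristic as before.
 */
enum scan_balance pick_scan_balance(int swappiness, int cache_trim_mode)
{
	assert(swappiness >= 0 && swappiness <= SWAPPINESS_ANON_ONLY);

	if (swappiness == SWAPPINESS_ANON_ONLY)
		return SCAN_ANON;	/* explicit "anon only" request */
	if (swappiness == 0)
		return SCAN_FILE;	/* never swap */
	if (cache_trim_mode)
		return SCAN_FILE;	/* streaming file pattern: don't swap */
	return SCAN_FRACT;		/* balance by relative cost */
}
```

With this shape, the numeric range the user can write stays 0..200, and
201 is never accepted as a number; it is only reachable through "max".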

From a user perspective the swappiness semantics remain unchanged, and
you do not need to teach your proactive reclaimer to auto-tune up to 199
instead of 200. We just support a new swappiness mode specific to
proactive reclaim.

WDYT?
