Linux-mm Archive on lore.kernel.org
From: Gregory Price <gourry@gourry.net>
To: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	lsf-pc <lsf-pc@lists.linux-foundation.org>,
	Bharata B Rao <bharata@amd.com>,
	Donet Tom <donettom@linux.ibm.com>,
	Matthew Wilcox <willy@infradead.org>,
	Aboorva Devarajan <aboorvad@linux.ibm.com>,
	linux-mm@kvack.org, Ojaswin Mujoo <ojaswin@linux.ibm.com>
Subject: Re: [LSF/MM/BPF BoF Session] Numa-Aware Placement for Page Cache Pages
Date: Thu, 30 Apr 2026 18:32:13 +0100
Message-ID: <afOSHX3U_0F5kKQY@gourry-fedora-PF4VCD3F>
In-Reply-To: <h5ossmwe.ritesh.list@gmail.com>

On Thu, Apr 30, 2026 at 05:03:37PM +0530, Ritesh Harjani (IBM) wrote:
> 
> Linux already supports memory tiers

Allegedly. (TM)

In practice, having worked with it, the support is still nascent: it
causes LRU inversions by design, lacks unmapped page cache support (as
you note here), and generally does not work well out of the box on any
reasonably complicated system.

> and there are ongoing discussions around
> promotion of unmapped page cache pages, which lets kernel do the right thing
> for userspace page cache pages on a tiered system.
> 

I like to think of this more accurately as:

"Lets the kernel nudge the trajectory of the distribution in the right
direction".

There is no objectively "right thing" here, and chasing that is a dead
end.

> Userspace is sometimes in a better position than the kernel to know the
> workload's access pattern and whether it makes sense to drop page cache pages
> once the I/O is done.
> 

At the expense of an increasingly complex maintenance burden on the kernel.

> So the question is:
> Do we need a userspace interface for the placement policy of page cache pages on a per file basis?
>

To the extent that you get something like:

MADV/FADV_HOT (promote and read-ahead)

as an extension that mirrors MADV_WILLNEED (read-ahead)

... maybe.
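
For reference, the existing read-ahead hints that such an extension would
mirror can be exercised from userspace today. The following is a minimal
sketch using Python's wrappers around posix_fadvise(2) and madvise(2); the
helper names readahead_hint and mapped_readahead_hint are made up for
illustration. No FADV_HOT or MADV_HOT exists at the time of this mail, so
the sketch only shows the WILLNEED side and makes no claim about how a
promotion hint would look.

```python
import mmap
import os


def readahead_hint(path: str) -> int:
    """Hint that the whole file will be read soon; return the file size.

    Hypothetical helper. Uses the existing POSIX_FADV_WILLNEED advice,
    which triggers read-ahead but says nothing about NUMA or tier placement.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # offset=0, len=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return size
    finally:
        os.close(fd)


def mapped_readahead_hint(path: str) -> int:
    """Same hint for a mapped file via MADV_WILLNEED; return mapped length.

    Hypothetical helper. Requires Python 3.8+ for mmap.madvise().
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(fd, size, prot=mmap.PROT_READ) as mapping:
            # With no start/length arguments the advice covers the whole mapping.
            mapping.madvise(mmap.MADV_WILLNEED)
            return len(mapping)
    finally:
        os.close(fd)
```

Both calls are advisory: the kernel is free to ignore them, which is the
property that keeps a hint-style interface cheap to maintain.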

> 1. Is there a need for an interface that allows userspace to do per-fd page
>    placement and maybe per-fd page migration?

Maybe as MADV/FADV hints, but beyond this - no.  I agree with Willy that
the kernel should simply get placement right.

Building in the assumption that userland will do X and *then* the kernel
will get it right is just a road to a bunch of random interfaces that
eventually get deprecated once the kernel does it correctly.  We should
just do it correctly or not ship it.

> 3. Even if applications may not need this today, should kernel developers start
>    thinking about it now, before users start abusing some not-well-defined
>    existing interface. e.g. the story of echo 1 > /proc/sys/vm/drop_caches,
>    which became a production workload tool despite never being intended as
>    one?

We have a public meeting every 2 weeks on tiering topics:

https://lore.kernel.org/all/8a622c4f-0774-96a5-2d2a-2151e0bc2367@google.com/

> 
> Let me know if people think that this discussion qualifies for a BoF discussion at LSFMM?
> Or do you think it's a bad idea altogether, if that is the case - Then
> please help me understand, why so?
> Before starting to jump on the implementation of any of this - I would
> like to gather feedback on what do others think?
> 

Always happy to discuss.  Just need to figure out timing.

~Gregory

