From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	lsf-pc <lsf-pc@lists.linux-foundation.org>,
	Gregory Price <gourry@gourry.net>,
	Bharata B Rao <bharata@amd.com>,
	Donet Tom <donettom@linux.ibm.com>,
	Matthew Wilcox <willy@infradead.org>,
	Aboorva Devarajan <aboorvad@linux.ibm.com>,
	linux-mm@kvack.org, Ojaswin Mujoo <ojaswin@linux.ibm.com>
Subject: [LSF/MM/BPF BoF Session] Numa-Aware Placement for Page Cache Pages
Date: Thu, 30 Apr 2026 17:03:37 +0530
Message-ID: <h5ossmwe.ritesh.list@gmail.com>


Hi All,

Amir insisted on this :)
> IOW, bring the Hallway session into the room, so that other people
> can participate and we can use the hallway time for gossip and
> stuffing our faces.

So, since we might have a few slots available in the FS breakout sessions,
here is something I was hoping to discuss with you all in the hallway.
However, I thought it might be a good idea to initiate this thread here
first, to see what you think about it.


Linux already supports memory tiers, and there are ongoing discussions around
promotion of unmapped page cache pages, which would let the kernel do the right
thing for userspace page cache pages on a tiered system.

v6.17 added support for per-node global reclaim via
/sys/devices/system/node/nodeX/reclaim, which lets users perform per-node
reclaim of page cache pages. We also already have interfaces that let userspace
bound the lifetime of page cache pages, such as RWF_DONTCACHE and
POSIX_FADV_DONTNEED. These are increasingly useful because locally-attached
DRAM is a costly resource and we don't want page cache pollution there.
Userspace is sometimes in a better position than the kernel to know the
workload's access pattern and whether it makes sense to drop page cache pages
once the I/O is done.
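
To make the existing knobs concrete, here is a minimal sketch in C (the
file name and node number are placeholders; it assumes v6.14+ for
RWF_DONTCACHE, and that the per-node reclaim file takes a byte count in
the same format as memcg's memory.reclaim):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/uio.h>
  #include <unistd.h>

  #ifndef RWF_DONTCACHE
  #define RWF_DONTCACHE 0x00000080  /* include/uapi/linux/fs.h, v6.14+ */
  #endif

  int main(void)
  {
          char buf[4096];
          struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
          int fd = open("data.bin", O_RDONLY);  /* placeholder file */
          FILE *f;

          if (fd < 0)
                  return 1;

          /* Buffered read that asks the kernel to drop the cached
           * pages once the read completes. */
          if (preadv2(fd, &iov, 1, 0, RWF_DONTCACHE) < 0)
                  perror("preadv2");

          /* The same lifetime hint, given after the fact for a range
           * that was read normally. */
          posix_fadvise(fd, 0, sizeof(buf), POSIX_FADV_DONTNEED);
          close(fd);

          /* v6.17 per-node proactive reclaim (needs root; write format
           * assumed to mirror memcg's memory.reclaim; node0 is a
           * placeholder). */
          f = fopen("/sys/devices/system/node/node0/reclaim", "w");
          if (f) {
                  fputs("1G", f);
                  fclose(f);
          }
          return 0;
  }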

So the question is:
Do we need a userspace interface for controlling the placement policy of
page cache pages on a per-file basis?

Note that we do have per-task placement policies like set_mempolicy(), but
those are too coarse and don't help if userspace wants per-fd control.
mmap()+mbind() doesn't reach unmapped page cache, and the per-inode
shared_policy works for shmem/guest_memfd but not for other filesystems
(I think so, but I may be wrong).
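
To make that asymmetry concrete, here is a rough sketch (the memfd name
is a placeholder and the node mask is just an example):

  #define _GNU_SOURCE
  #include <numaif.h>       /* mbind(); link with -lnuma */
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          unsigned long nodemask = 1UL << 0;  /* bind to node 0 */
          size_t len = 1UL << 20;
          void *p;

          /* shmem: mbind() on a shared mapping installs a per-inode
           * shared policy, so page cache allocations for this range
           * honor it even when instantiated later without the mapping
           * in place. */
          int mfd = memfd_create("cache-policy-demo", 0);

          if (mfd < 0 || ftruncate(mfd, len) < 0)
                  return 1;
          p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
          if (p == MAP_FAILED)
                  return 1;
          mbind(p, len, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0);

          /* On a regular file (ext4/xfs/...), the same mbind() on a
           * MAP_SHARED mapping is documented to be ignored (see
           * mbind(2)), and nothing reaches the cache pages created by
           * plain read()/write() at all -- that is the gap a per-fd
           * interface would fill. */
          munmap(p, len);
          close(mfd);
          return 0;
  }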

So what I would like to discuss with others is:

1. Is there a need for an interface that allows userspace to do per-fd page
   placement, and maybe per-fd page migration?
2. Are there applications that need such an interface, or would benefit
   from it?
3. Even if applications don't need this today, should kernel developers start
   thinking about it now, before users start abusing some ill-defined
   existing interface? Compare the story of echo 1 > /proc/sys/vm/drop_caches,
   which became a production tool despite never being intended as one.

Let me know if you think this discussion qualifies for a BoF session at
LSF/MM/BPF. Or, if you think it's a bad idea altogether, please help me
understand why. Before jumping into the implementation of any of this, I
would like to gather feedback on what others think.

-ritesh

