Linux-mm Archive on lore.kernel.org
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	lsf-pc <lsf-pc@lists.linux-foundation.org>,
	Gregory Price <gourry@gourry.net>,
	Bharata B Rao <bharata@amd.com>,
	Donet Tom <donettom@linux.ibm.com>,
	Aboorva Devarajan <aboorvad@linux.ibm.com>,
	linux-mm@kvack.org, Ojaswin Mujoo <ojaswin@linux.ibm.com>
Subject: Re: [LSF/MM/BPF BoF Session] Numa-Aware Placement for Page Cache Pages
Date: Thu, 30 Apr 2026 20:13:08 +0530	[thread overview]
Message-ID: <ecjwse4j.ritesh.list@gmail.com> (raw)
In-Reply-To: <afNV5wbhsFQJLzxi@casper.infradead.org>

Matthew Wilcox <willy@infradead.org> writes:

> On Thu, Apr 30, 2026 at 05:03:37PM +0530, Ritesh Harjani (IBM) wrote:
>> Linux already supports memory tiers and there are ongoing discussions around
>> promotion of unmapped page cache pages, which lets kernel do the right thing
>> for userspace page cache pages on a tiered system.
>
> Well, you know my opinion of that idea ...
>

:)

>> So the question is:
>> Do we need a userspace interface for the placement policy of page cache pages on a per file basis?
>
> What do we do if two tasks both "know" the right NUMA placement for the
> inode's data, and they disagree?
>

Yes, that's a fair concern that I too had.

So, the placement policy only takes effect at the first allocation, i.e.
at the point a folio is first instantiated in the page cache. So in the
common case where two tasks read disjoint ranges of the same file, a
per-fd policy might work cleanly - each task's policy governs the folios
it instantiates and there shouldn't be any conflict.

However, if both tasks touch the same range, whoever instantiates the
folio first wins. But that problem exists today too, even with
set_mempolicy().

>> 1. Is there a need for an interface that allows userspace to do per-fd page
>>    placement and maybe per-fd page migration?
>
> Ideally, no, the kernel should observe the task and get it right.
>
> By the way, you're familiar with how filemap_alloc_folio_noprof()
> works today, right?

Are you pointing to your recent work here?

16a542e22339 Matthew Wilcox  mm/filemap: Extend __filemap_get_folio() to support NUMA memory poli..  8 months ago
7f3779a3ac3e Matthew Wilcox  mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio()         8 months ago
    mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio()

    Add a mempolicy parameter to filemap_alloc_folio() to enable NUMA-aware
    page cache allocations. This will be used by upcoming changes to
    support NUMA policies in guest-memfd, where guest memory needs to be
    allocated per the NUMA policy specified by the VMM.

    All existing users pass NULL maintaining current behavior.


Yup, that sort of lays the foundation for this discussion :)
Although I understand it was done specifically for guest_memfd.

Is that what you meant?


> I forget whether cpuset_do_page_mem_spread
> is on or off by default.
>

Should be off by default then, I guess...

cpuset_write_u64()
...
	case FILE_SPREAD_PAGE:
		pr_info_once("cpuset.%s is deprecated\n", cft->name);
		retval = cpuset_update_flag(CS_SPREAD_PAGE, cs, val);
		break;

Is this what you were referring to?
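For context, paraphrasing from memory (so the exact details may be off),
the CONFIG_NUMA variant of filemap_alloc_folio_noprof() in recent
mainline looks roughly like this; when the spread flag is clear it just
falls through to a plain allocation honouring the task's mempolicy:

```c
/* Paraphrased sketch of mm/filemap.c, CONFIG_NUMA build; not a
 * verbatim quote. */
struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
{
	int n;
	struct folio *folio;

	if (cpuset_do_page_mem_spread()) {
		unsigned int cpuset_mems_cookie;

		do {
			cpuset_mems_cookie = read_mems_allowed_begin();
			/* Round-robin across the cpuset's allowed nodes */
			n = cpuset_mem_spread_node();
			folio = __folio_alloc_node_noprof(gfp, order, n);
		} while (!folio && read_mems_allowed_retry(cpuset_mems_cookie));

		return folio;
	}
	/* Spread disabled (the default case): task mempolicy applies */
	return folio_alloc_noprof(gfp, order);
}
```

So with CS_SPREAD_PAGE clear, which it is unless userspace writes to the
(now deprecated) cpuset.memory_spread_page, page cache allocations
already follow the allocating task's mempolicy.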


>> Let me know if people think that this discussion qualifies for a BoF discussion at LSFMM?
>> Or do you think it's a bad idea altogether, if that is the case - Then
>> please help me understand, why so?
>> Before jumping into the implementation of any of this - I would
>> like to gather feedback on what others think.
>
> I'm just concerned about what other session i'll have to miss to attend
> this instead ;-)

It's good to know that there is an interest then ;)

-ritesh



Thread overview: 10+ messages
2026-04-30 11:33 [LSF/MM/BPF BoF Session] Numa-Aware Placement for Page Cache Pages Ritesh Harjani (IBM)
2026-04-30 13:15 ` Matthew Wilcox
2026-04-30 14:43   ` Ritesh Harjani [this message]
2026-05-02 14:57   ` Gregory Price
2026-05-02 15:49     ` Gregory Price
2026-05-03 16:18       ` Ritesh Harjani
2026-05-03 23:48         ` Gregory Price
2026-05-02 23:00     ` Matthew Wilcox
2026-05-03 14:15       ` Gregory Price
2026-04-30 17:32 ` Gregory Price
