Linux Trace Kernel
 help / color / mirror / Atom feed
From: Arun George/Arun George <arun.george@samsung.com>
To: Gregory Price <gourry@gourry.net>
Cc: lsf-pc@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	damon@lists.linux.dev, kernel-team@meta.com,
	gregkh@linuxfoundation.org, rafael@kernel.org, dakr@kernel.org,
	dave@stgolabs.net, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, longman@redhat.com,
	akpm@linux-foundation.org, david@kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, osalvador@suse.de, ziy@nvidia.com,
	matthew.brost@intel.com, joshua.hahnjy@gmail.com,
	rakie.kim@sk.com, byungchul@sk.com, ying.huang@linux.alibaba.com,
	apopple@nvidia.com, axelrasmussen@google.com, yuanchu@google.com,
	weixugc@google.com, yury.norov@gmail.com,
	linux@rasmusvillemoes.dk, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, tj@kernel.org,
	hannes@cmpxchg.org, mkoutny@suse.com, jackmanb@google.com,
	sj@kernel.org, baolin.wang@linux.alibaba.com, npache@redhat.com,
	ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
	lance.yang@linux.dev, muchun.song@linux.dev, xu.xin16@zte.com.cn,
	chengming.zhou@linux.dev, jannh@google.com, linmiaohe@huawei.com,
	nao.horiguchi@gmail.com, pfalcato@suse.de, rientjes@google.com,
	shakeel.butt@linux.dev, riel@surriel.com, harry.yoo@oracle.com,
	cl@gentwo.org, roman.gushchin@linux.dev, chrisl@kernel.org,
	kasong@tencent.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
	bhe@redhat.com, zhengqi.arch@bytedance.com, terry.bowman@amd.com,
	gost.dev@samsung.com, arungeorge05@gmail.com, cpgs@samsung.com
Subject: Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
Date: Fri, 22 May 2026 14:10:34 +0530	[thread overview]
Message-ID: <1891546521.01779449881859.JavaMail.epsvc@epcpadp2new> (raw)
In-Reply-To: <afmgJcFUjQLYxkb5@gourry-fedora-PF4VCD3F>

Thanks.

On 05-05-2026 01:15 pm, Gregory Price wrote:
> In the scenario i'm talking about, a "write budget" is defined as a
> number of pages that are allows to be mapped writable in the page
> tables at any given time.
> Agree. I was also in the same context.

I am trying to bring the device perspective here, and would like to 
discuss a few corner cases and possible solutions.

As I see, solving the compressed memory problem statement has these 
aspects mainly:

1) Allocation control: private/managed memory concept.
2) Write control: write-protected PTEs, write-controlled use cases like 
ZSWAP
3) Proactive reclaims: optional methods to ease back-pressure using 
memory shrinkers, ballooning, kswapd, promotion etc. These methods will 
be triggered based on notifications/interrupts from the device.

May be they are not enough to cover some corner cases for cram!

  I believe that this thin-provisioned memory infra is susceptible to 
'writes-above-media-capacity corner cases' (because of not handling 
device back-pressure notifications in time) whichever methods we use in 
the kernel. Even if we use write-controlled methods like ZSWAP and 
pro-active reclaims, there could be corner cases where the communication 
with the device could be broken and the write path is not aware of it 
immediately. Note that OCP spec [1] says the device should mark the 
memory location as 'poisoned' in 'over-capacity' writes.

So I have the following proposals / options for this scenario.

    Option 1: Poisoned data management - This is about accepting that 
poisoning of memory locations can happen in much more regular frequency 
here than regular memories and we need to figure out potential recovery 
mechanisms in host (not recovery of data; but recovery from the poison 
situation). But I guess folks will not be okay with it in general, and I 
am not aware of any workloads where data poisoning is tolerated (may be 
caching workloads?).

    Option 2 (preferred): Device assisted write budgeting - This is 
about a device aware / assisted mechanism for the write-controlled 
use-cases (Ex: ZSWAP) to know the 'safe number of  writes' that can be 
performed to the device (Or allows to be mapped writable in the page 
tables). This could be like a 'token bucket' algorithm, where the device 
provides a 'budget / set of tokens' to the host. And it need to be 
replenished periodically in the device communication code path; and if 
the host does not find the token, writes cannot go ahead.

In short, the communication with the device has to be maintained to make 
pages mapped writable. For MVP, this could be a simple constraint of 
checking actual device capacity periodically to replenish write-budget 
for CRAM. For other users of private nodes (GPU memory?), this 
constraint may not be needed at all.

We are planning to send an RFC code which will fit into your CRAM infra 
to discuss this poison management approach further.

[1]: 
https://www.opencompute.org/documents/hyperscale-tiered-memory-expander-specification-for-compute-express-link-cxl-1-pdf

~Arun George


  reply	other threads:[~2026-05-22 11:38 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20260427123800epcas5p1e1a2fed257091b31e2e6c3a7d1b0c2b0@epcas5p1.samsung.com>
2026-02-22  8:48 ` [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM) Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 01/27] numa: introduce N_MEMORY_PRIVATE node state Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 02/27] mm,cpuset: gate allocations from N_MEMORY_PRIVATE behind __GFP_PRIVATE Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 03/27] mm/page_alloc: add numa_zone_allowed() and wire it up Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 04/27] mm/page_alloc: Add private node handling to build_zonelists Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 05/27] mm: introduce folio_is_private_managed() unified predicate Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 06/27] mm/mlock: skip mlock for managed-memory folios Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 07/27] mm/madvise: skip madvise " Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 08/27] mm/ksm: skip KSM " Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 09/27] mm/khugepaged: skip private node folios when trying to collapse Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 10/27] mm/swap: add free_folio callback for folio release cleanup Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 11/27] mm/huge_memory.c: add private node folio split notification callback Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 12/27] mm/migrate: NP_OPS_MIGRATION - support private node user migration Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 13/27] mm/mempolicy: NP_OPS_MEMPOLICY - support private node mempolicy Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 14/27] mm/memory-tiers: NP_OPS_DEMOTION - support private node demotion Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 15/27] mm/mprotect: NP_OPS_PROTECT_WRITE - gate PTE/PMD write-upgrades Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 16/27] mm: NP_OPS_RECLAIM - private node reclaim participation Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 17/27] mm/oom: NP_OPS_OOM_ELIGIBLE - private node OOM participation Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 18/27] mm/memory: NP_OPS_NUMA_BALANCING - private node NUMA balancing Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 19/27] mm/compaction: NP_OPS_COMPACTION - private node compaction support Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 20/27] mm/gup: NP_OPS_LONGTERM_PIN - private node longterm pin support Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 21/27] mm/memory-failure: add memory_failure callback to node_private_ops Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 22/27] mm/memory_hotplug: add add_private_memory_driver_managed() Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 23/27] mm/cram: add compressed ram memory management subsystem Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 24/27] cxl/core: Add cxl_sysram region type Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 25/27] cxl/core: Add private node support to cxl_sysram Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 26/27] cxl: add cxl_mempolicy sample PCI driver Gregory Price
2026-02-22  8:48   ` [RFC PATCH v4 27/27] cxl: add cxl_compression " Gregory Price
2026-02-23 13:07   ` [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM) David Hildenbrand (Arm)
2026-02-23 14:54     ` Gregory Price
2026-02-23 16:08       ` Gregory Price
2026-03-17 13:05         ` David Hildenbrand (Arm)
2026-03-19 14:29           ` Gregory Price
2026-02-24  6:19   ` Alistair Popple
2026-02-24 15:17     ` Gregory Price
2026-02-24 16:54       ` Gregory Price
2026-02-25 22:21       ` Matthew Brost
2026-02-25 23:58         ` Gregory Price
2026-02-26  3:27       ` Alistair Popple
2026-02-26  5:54         ` Gregory Price
2026-02-26 22:49           ` Gregory Price
2026-03-03 20:36         ` Gregory Price
2026-02-25 12:40   ` Alejandro Lucero Palau
2026-02-25 14:43     ` Gregory Price
2026-05-06 14:43     ` Gregory Price
2026-03-17 13:25   ` David Hildenbrand (Arm)
2026-03-19 15:09     ` Gregory Price
2026-04-13 13:11       ` David Hildenbrand (Arm)
2026-04-13 17:05         ` Gregory Price
2026-04-15  9:49           ` David Hildenbrand (Arm)
2026-04-15 15:17             ` Gregory Price
2026-04-15 19:47               ` Frank van der Linden
2026-04-16  1:24                 ` Gregory Price
2026-04-17  9:50                   ` David Hildenbrand (Arm)
2026-04-17 15:07                     ` Gregory Price
2026-04-16 20:23                 ` Gregory Price
2026-04-17  9:39                 ` David Hildenbrand (Arm)
2026-04-17  9:37               ` David Hildenbrand (Arm)
2026-04-17 14:45                 ` Gregory Price
2026-04-20  2:56                 ` Gregory Price
2026-04-27 12:32   ` Arun George
2026-04-27 22:28     ` Gregory Price
2026-04-29  6:15       ` Arun George/Arun George
2026-04-29 13:42         ` Gregory Price
2026-05-04 13:08           ` Arun George/Arun George
2026-05-05  7:45             ` Gregory Price
2026-05-22  8:40               ` Arun George/Arun George [this message]
2026-05-05 22:21   ` Yiannis Nikolakopoulos
2026-05-09 16:38   ` [LSF/MM/BPF TOPIC] Private Memory Nodes - follow up Gregory Price
2026-05-21  6:23   ` [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM) Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1891546521.01779449881859.JavaMail.epsvc@epcpadp2new \
    --to=arun.george@samsung.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alison.schofield@intel.com \
    --cc=apopple@nvidia.com \
    --cc=arungeorge05@gmail.com \
    --cc=axelrasmussen@google.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bhe@redhat.com \
    --cc=byungchul@sk.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chengming.zhou@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=cl@gentwo.org \
    --cc=cpgs@samsung.com \
    --cc=dakr@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=gost.dev@samsung.com \
    --cc=gourry@gourry.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=harry.yoo@oracle.com \
    --cc=ira.weiny@intel.com \
    --cc=jackmanb@google.com \
    --cc=jannh@google.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kasong@tencent.com \
    --cc=kernel-team@meta.com \
    --cc=lance.yang@linux.dev \
    --cc=linmiaohe@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=longman@redhat.com \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=matthew.brost@intel.com \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=nao.horiguchi@gmail.com \
    --cc=npache@redhat.com \
    --cc=nphamcs@gmail.com \
    --cc=osalvador@suse.de \
    --cc=pfalcato@suse.de \
    --cc=rafael@kernel.org \
    --cc=rakie.kim@sk.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=sj@kernel.org \
    --cc=surenb@google.com \
    --cc=terry.bowman@amd.com \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=vishal.l.verma@intel.com \
    --cc=weixugc@google.com \
    --cc=xu.xin16@zte.com.cn \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuanchu@google.com \
    --cc=yury.norov@gmail.com \
    --cc=zhengqi.arch@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox