From: Michal Hocko <mhocko@suse.com>
To: Akinobu Mita <akinobu.mita@gmail.com>
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	hannes@cmpxchg.org, david@kernel.org, zhengqi.arch@bytedance.com,
	shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, ziy@nvidia.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net, ying.huang@linux.alibaba.com,
	apopple@nvidia.com, bingjiao@google.com,
	jonathan.cameron@huawei.com, pratyush.brahma@oss.qualcomm.com
Subject: Re: [PATCH v4 3/3] mm/vmscan: don't demote if there is not enough free memory in the lower memory tier
Date: Wed, 14 Jan 2026 14:40:09 +0100
Message-ID: <aWecuccS9tGZgXe1@tiehlicka>
In-Reply-To: <CAC5umyib=XmPkTvRT6eAsMx+WDx0NkSP9djh4xkhLJgwk_v8Mw@mail.gmail.com>

On Wed 14-01-26 21:51:28, Akinobu Mita wrote:
> On Tue, Jan 13, 2026 at 22:40, Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 13-01-26 17:14:53, Akinobu Mita wrote:
> > > On systems with multiple memory-tiers consisting of DRAM and CXL memory,
> > > the OOM killer is not invoked properly.
> > >
> > > Here's the command to reproduce:
> > >
> > > $ sudo swapoff -a
> > > $ stress-ng --oomable -v --memrate 20 --memrate-bytes 10G \
> > >     --memrate-rd-mbs 1 --memrate-wr-mbs 1
> > >
> > > The memory usage is the number of workers specified with the --memrate
> > > option multiplied by the buffer size specified with the --memrate-bytes
> > > option, so please adjust it so that it exceeds the total size of the
> > > installed DRAM and CXL memory.
> > >
> > > If swap is disabled, you can usually expect the OOM killer to terminate
> > > the stress-ng process when memory usage approaches the installed memory
> > > size.
> > >
> > > However, if multiple memory-tiers exist (multiple
> > > /sys/devices/virtual/memory_tiering/memory_tier<N> directories exist) and
> > > /sys/kernel/mm/numa/demotion_enabled is true, the OOM killer will not be
> > > invoked and the system will become inoperable, regardless of whether MGLRU
> > > is enabled or not.
> > >
> > > This issue can be reproduced using NUMA emulation even on systems with
> > > > only DRAM.  You can create two fake memory tiers by booting a single-node
> > > system with "numa=fake=2 numa_emulation.adistance=576,704" kernel
> > > parameters.
> > >
> > > The reason for this issue is that memory allocations do not directly
> > > trigger the oom-killer, assuming that if the target node has an underlying
> > > memory tier, it can always be reclaimed by demotion.
> >
> > Why don't we fall back to no demotion mode in this case? I mean we have
> > shrink_folio_list:
> >         if (!list_empty(&demote_folios)) {
> >                 /* Folios which weren't demoted go back on @folio_list */
> >                 list_splice_init(&demote_folios, folio_list);
> >
> >                 /*
> >                  * goto retry to reclaim the undemoted folios in folio_list if
> >                  * desired.
> >                  *
> >                  * Reclaiming directly from top tier nodes is not often desired
> >                  * due to it breaking the LRU ordering: in general memory
> >                  * should be reclaimed from lower tier nodes and demoted from
> >                  * top tier nodes.
> >                  *
> >                  * However, disabling reclaim from top tier nodes entirely
> >                  * would cause ooms in edge scenarios where lower tier memory
> >                  * is unreclaimable for whatever reason, eg memory being
> >                  * mlocked or too hot to reclaim. We can disable reclaim
> >                  * from top tier nodes in proactive reclaim though as that is
> >                  * not real memory pressure.
> >                  */
> >                 if (!sc->proactive) {
> >                         do_demote_pass = false;
> >                         goto retry;
> >                 }
> >         }
> >
> > to handle this situation, no?
> 
> can_demote() is called from four places.
> I tried modifying the patch to change the behavior only when can_demote()
> is called from shrink_folio_list(), but the problem was not fixed
> (oom did not occur).
> 
> Similarly, changing the behavior of can_demote() when called from
> can_reclaim_anon_pages(), shrink_folio_list(), and can_age_anon_pages(),
> but not when called from get_swappiness(), did not fix the problem either
> (oom did not occur).
> 
> Conversely, changing the behavior only when called from get_swappiness(),
> but not changing the behavior of can_reclaim_anon_pages(),
> shrink_folio_list(), and can_age_anon_pages(), fixed the problem
> (oom did occur).
> 
> Therefore, it appears that the behavior of get_swappiness() is important
> in this issue.

You have said that there is no swap configured in the system, right?
That would imply that anonymous pages are not reclaimable at all (see
can_reclaim_anon_pages)?
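
For reference, a minimal userspace sketch of the decision that
can_reclaim_anon_pages() is understood to make. It assumes the function first
checks for available swap and otherwise falls back to can_demote(). The struct
and the model_* names are hypothetical and exist only for illustration; they
are not kernel API, and memcg handling is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state the kernel consults. */
struct reclaim_model {
	long nr_swap_pages;     /* free swap slots; 0 after "swapoff -a" */
	bool demotion_enabled;  /* /sys/kernel/mm/numa/demotion_enabled */
	bool has_lower_tier;    /* node has a demotion target */
};

/* Assumed shape of can_demote(): demotion is on and a lower tier exists. */
static bool model_can_demote(const struct reclaim_model *m)
{
	return m->demotion_enabled && m->has_lower_tier;
}

/*
 * Assumed shape of can_reclaim_anon_pages(): anon memory counts as
 * reclaimable if it can be swapped out or, failing that, demoted.
 */
static bool model_can_reclaim_anon(const struct reclaim_model *m)
{
	if (m->nr_swap_pages > 0)
		return true;

	return model_can_demote(m);
}
```

Under these assumptions, a system with no swap but with demotion enabled and a
lower tier present still reports anonymous pages as reclaimable, so the answer
depends entirely on what can_demote() returns.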

-- 
Michal Hocko
SUSE Labs
