Date: Wed, 14 Jan 2026 14:40:09 +0100
From: Michal Hocko
To: Akinobu Mita
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	hannes@cmpxchg.org, david@kernel.org, zhengqi.arch@bytedance.com,
	shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, ziy@nvidia.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
	bingjiao@google.com, jonathan.cameron@huawei.com,
	pratyush.brahma@oss.qualcomm.com
Subject: Re: [PATCH v4 3/3] mm/vmscan: don't demote if there is not enough free memory in the lower memory tier
References: <20260113081453.8293-1-akinobu.mita@gmail.com>
	<20260113081453.8293-4-akinobu.mita@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 14-01-26 21:51:28, Akinobu Mita wrote:
> On Tue, Jan 13, 2026 at 22:40, Michal Hocko wrote:
> >
> > On Tue 13-01-26 17:14:53, Akinobu Mita wrote:
> > > On systems with multiple memory-tiers consisting of DRAM and CXL memory,
> > > the OOM killer is not invoked properly.
> > >
> > > Here's the command to reproduce:
> > >
> > > $ sudo swapoff -a
> > > $ stress-ng --oomable -v --memrate 20 --memrate-bytes 10G \
> > >     --memrate-rd-mbs 1 --memrate-wr-mbs 1
> > >
> > > The memory usage is the number of workers specified with the --memrate
> > > option multiplied by the buffer size specified with the --memrate-bytes
> > > option, so please adjust it so that it exceeds the total size of the
> > > installed DRAM and CXL memory.
> > >
> > > If swap is disabled, you can usually expect the OOM killer to terminate
> > > the stress-ng process when memory usage approaches the installed memory
> > > size.
> > >
> > > However, if multiple memory-tiers exist (multiple
> > > /sys/devices/virtual/memory_tiering/memory_tier directories exist) and
> > > /sys/kernel/mm/numa/demotion_enabled is true, the OOM killer will not be
> > > invoked and the system will become inoperable, regardless of whether MGLRU
> > > is enabled or not.
> > >
> > > This issue can be reproduced using NUMA emulation even on systems with
> > > only DRAM. You can create two-fake memory-tiers by booting a single-node
> > > system with "numa=fake=2 numa_emulation.adistance=576,704" kernel
> > > parameters.
> > >
> > > The reason for this issue is that memory allocations do not directly
> > > trigger the oom-killer, assuming that if the target node has an underlying
> > > memory tier, it can always be reclaimed by demotion.
> >
> > Why don't we fall back to no demotion mode in this case? I mean we have
> > shrink_folio_list:
> > 	if (!list_empty(&demote_folios)) {
> > 		/* Folios which weren't demoted go back on @folio_list */
> > 		list_splice_init(&demote_folios, folio_list);
> >
> > 		/*
> > 		 * goto retry to reclaim the undemoted folios in folio_list if
> > 		 * desired.
> > 		 *
> > 		 * Reclaiming directly from top tier nodes is not often desired
> > 		 * due to it breaking the LRU ordering: in general memory
> > 		 * should be reclaimed from lower tier nodes and demoted from
> > 		 * top tier nodes.
> > 		 *
> > 		 * However, disabling reclaim from top tier nodes entirely
> > 		 * would cause ooms in edge scenarios where lower tier memory
> > 		 * is unreclaimable for whatever reason, eg memory being
> > 		 * mlocked or too hot to reclaim. We can disable reclaim
> > 		 * from top tier nodes in proactive reclaim though as that is
> > 		 * not real memory pressure.
> > 		 */
> > 		if (!sc->proactive) {
> > 			do_demote_pass = false;
> > 			goto retry;
> > 		}
> > 	}
> >
> > to handle this situation no?
>
> can_demote() is called from four places.
> I tried modifying the patch to change the behavior only when can_demote()
> is called from shrink_folio_list(), but the problem was not fixed
> (oom did not occur).
>
> Similarly, changing the behavior of can_demote() when called from
> can_reclaim_anon_pages(), shrink_folio_list(), and can_age_anon_pages(),
> but not when called from get_swappiness(), did not fix the problem either
> (oom did not occur).
>
> Conversely, changing the behavior only when called from get_swappiness(),
> but not changing the behavior of can_reclaim_anon_pages(),
> shrink_folio_list(), and can_age_anon_pages(), fixed the problem
> (oom did occur).
>
> Therefore, it appears that the behavior of get_swappiness() is important
> in this issue.

You have said that there is no swap configured in the system, right? That
would imply that anonymous pages are not reclaimable at all (see
can_reclaim_anon_pages)?

--
Michal Hocko
SUSE Labs