From: Michal Hocko <mhocko@suse.com>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@redhat.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
Date: Fri, 7 Nov 2025 14:22:14 +0100
Message-ID: <aQ3yhmsT2NHeNwLi@tiehlicka>
In-Reply-To: <db4d9e73e6a70033da561ed88aef32c1ebe411dd@linux.dev>

Sorry for the late reply.

On Mon 20-10-25 10:11:23, Jiayuan Chen wrote:
[...]
> To provide more context about our specific setup:
>
> 1. The memory.low values set on host pods are actually quite large,
> some pods are set to 10GB, others to 20GB, etc.
> 2. Since most pods have memory limits configured, each time kswapd
> is woken up, if a pod's memory usage hasn't exceeded its own
> memory.low, its memory won't be reclaimed.
> 3. When applications start up, rapidly consume memory, or experience
> network traffic bursts, the kernel reaches steal_suitable_fallback(),
> which sets watermark_boost and subsequently wakes kswapd.
> 4. In the core logic of kswapd thread (balance_pgdat()), when reclaim is
> triggered by watermark_boost, the maximum priority is 10. Higher priority
> values mean less aggressive LRU scanning, which can result in no pages
> being reclaimed during a single scan cycle:
>
> 	if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
> 		raise_priority = false;
>
> 5. This eventually causes pgdat->kswapd_failures to continuously accumulate,
> exceeding MAX_RECLAIM_RETRIES, and consequently kswapd stops working.
> At this point, the system's available memory is still significantly above
> the high watermark, so it is inappropriate for kswapd to stop under these
> conditions.
>
> The final observable issue is that a brief period of rapid memory allocation
> causes kswapd to stop running, ultimately triggering direct reclaim and
> making the applications unresponsive.
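
To make point 4 of the quoted report concrete, here is a minimal sketch of the scan-size arithmetic. It assumes the simplified rule that the scan target is the LRU size shifted right by the reclaim priority. DEF_PRIORITY matches the kernel's value, while scan_target() is a hypothetical helper for illustration, not a kernel function.

```c
/* DEF_PRIORITY mirrors the kernel's definition; the rest is illustrative. */
#define DEF_PRIORITY 12

/*
 * Hypothetical helper: simplified scan target for one LRU list.
 * A higher priority value shifts more bits away, so fewer pages are scanned.
 */
static unsigned long scan_target(unsigned long lru_pages, int priority)
{
	return lru_pages >> priority;
}
```

Under this assumption, at the boost-reclaim floor of DEF_PRIORITY - 2 (priority 10), any LRU list smaller than 1024 pages yields a scan target of zero, so a pass over such lists can reclaim nothing.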

To me this sounds like something to be addressed in the watermark
boosting code. I do not think we should be breaching the low limit for
that (opportunistic) reclaim.
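
For illustration only, the failure mode from the quoted report can be sketched as a toy userspace model. This is not kernel code: struct toy_cgroup, toy_reclaim_pass() and toy_kswapd_gives_up() are hypothetical names, and the model assumes that cgroups at or below memory.low are skipped entirely, that each pass runs at the boost-reclaim priority floor, and that the failure counter resets whenever a pass reclaims something. Only DEF_PRIORITY and MAX_RECLAIM_RETRIES carry the kernel's values.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY        12
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for a cgroup: usage and memory.low, both in pages. */
struct toy_cgroup {
	unsigned long usage_pages;
	unsigned long low_pages;
};

/* One reclaim pass: protected cgroups are skipped, others give usage >> priority. */
static unsigned long toy_reclaim_pass(const struct toy_cgroup *cg, size_t n,
				      int priority)
{
	unsigned long reclaimed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Usage within memory.low: this model never touches the cgroup. */
		if (cg[i].usage_pages <= cg[i].low_pages)
			continue;
		reclaimed += cg[i].usage_pages >> priority;
	}
	return reclaimed;
}

/* Returns true once enough consecutive passes have reclaimed nothing. */
static bool toy_kswapd_gives_up(const struct toy_cgroup *cg, size_t n,
				int passes)
{
	int failures = 0;
	int i;

	for (i = 0; i < passes; i++) {
		if (toy_reclaim_pass(cg, n, DEF_PRIORITY - 2) > 0)
			failures = 0;	/* progress resets the counter */
		else
			failures++;	/* an empty pass counts as a failure */
		if (failures >= MAX_RECLAIM_RETRIES)
			return true;
	}
	return false;
}
```

In this model, if every cgroup sits below its memory.low, each boost-triggered pass reclaims nothing and the counter reaches MAX_RECLAIM_RETRIES after 16 passes. A single unprotected cgroup with enough pages keeps the counter at zero.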
--
Michal Hocko
SUSE Labs