Date: Tue, 6 Jan 2026 20:14:02 +0100
From: Michal Hocko
To: Shakeel Butt
Cc: Jiayuan Chen, linux-mm@kvack.org, Andrew Morton, Johannes Weiner,
 David Hildenbrand, Qi Zheng, Lorenzo Stoakes, Axel Rasmussen,
 Yuanchu Xie, Wei Xu, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm/vmscan: mitigate spurious kswapd_failures reset from direct reclaim
References: <4owaeb7bmkfgfzqd4ztdsi4tefc36cnmpju4yrknsgjm4y32ez@qsgn6lnv3cxb>
 <2e574085ed3d7775c3b83bb80d302ce45415ac42@linux.dev>
 <52cc0b2671b068903c6580b7431db0f22982ae86@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 06-01-26 08:50:11, Shakeel Butt wrote:
> On Tue, Jan 06, 2026 at 01:59:15PM +0100, Michal Hocko wrote:
> > On Tue 06-01-26 11:19:21, Jiayuan Chen wrote:
> > > January 6, 2026 at 17:49, "Michal Hocko" wrote:
> > >
> > > > On Tue 06-01-26 05:25:42, Jiayuan Chen wrote:
> > > >
> > > > > That said, I believe this patch is still a valid fix on its own - resetting kswapd_failures
> > > > > when the node is not actually balanced doesn't seem like correct behavior regardless of the
> > > > > broader context.
> > > > >
> > > > Originally I was more inclined to opt out memcg reclaim from resetting
> > > > kswapd retry counter but the more I am thinking about that the more your
> > > > patch makes sense to me.
> > > >
> > > > The reason being that it handles both memcg and global direct reclaims
> > > > in the same way which makes the logic easier to follow. After all the
> > > > primary purpose is to resurrect kswapd after we can see there is a
> > > > better chance to reclaim something for kswapd. Until that moment direct
> > > > reclaim is the only reclaim mechanism.
> > > >
> > > > Relying on pgdat_balanced might lead to re-enabling kswapd much
> > > > later while memory reclaim would be still mostly direct reclaim bound -
> > > > thus increase allocation latencies.
> > > > If we wanted to do better we would need to evaluate recent
> > > > refaults/thrashing behavior but even then I am not sure we can make a
> > > > good cut off.
> > > >
> > > > So in the end pgdat_balanced approach seems worth trying and see whether
> > > > this could cause any corner cases.
> > >
> > > Thanks Michal.
> > >
> > > Regarding the allocation latency concern - we are already
> > > in the direct reclaim slowpath, so a little extra overhead
> > > from the pgdat_balanced check should be negligible.
> >
> > Yes, I do not think that pgdat_balanced call itself adds to the latency
> > in the reclaim (slow) path. My main concern regarding latencies is
> > about direct reclaim as a sole source of reclaim itself (as kswapd is
> > not active).
>
> Yes we will be punting on direct reclaimers to collectively balance the
> node which I think is fine for such cases i.e. high kswapd_failures.
> However I still think the high kswapd_failures is most probably caused
> by misconfiguration of the system by the users (like overcommitting zones
> or nodes with unreclaimable memory or very high memory.min).

I am not questioning a misconfiguration.
It is just far from great that kswapd adds to the problem under those
conditions without a very good reason. I would be pushing back on
increasing complexity for apparently misconfigured systems but I believe
it is fair to say that the failure counter reset logic could see some
improvements. So let's see whether we can deal with the situation better
while improving on this logic without much added complexity.

> Yes, we can
> reduce the suffering of such misconfigurations like this patch but
> somehow the user should be notified that the system is misconfigured.
> Anyways, I think we can proceed with this path.
>
> Jiayuan, have you tested this patch on your production environment?

Yes, getting some reclaim stats into the changelog would be highly
appreciated (with and without the patch, of course, if you can reproduce
the issue).

-- 
Michal Hocko
SUSE Labs