X-Mailing-List: linux-kernel@vger.kernel.org
Date: Tue, 06 Jan 2026 03:14:29 +0000
From: "Jiayuan Chen"
To: "Shakeel Butt"
Cc: linux-mm@kvack.org, "Jiayuan Chen", "Johannes Weiner", "Michal Hocko", "Roman Gushchin", "Muchun Song", "Andrew Morton", "David Hildenbrand", "Qi Zheng", "Lorenzo Stoakes", "Axel Rasmussen", "Yuanchu Xie", "Wei Xu", cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, "Hui Zhu"
Subject: Re: [PATCH v2] mm/memcg: scale memory.high penalty based on refault recency
References: <20251229033957.296257-1-jiayuan.chen@linux.dev>

January 6, 2026 at 01:08, "Shakeel Butt" wrote:

> +Hui Zhu
>
> Hi Jiayuan,
>
> On Mon, Dec 29, 2025 at 11:39:55AM +0800, Jiayuan Chen wrote:
>
> > From: Jiayuan Chen
> >
> > Problem
> > -------
> > We observed an issue in production where a workload continuously
> > triggering memory.high also generates massive disk IO READ, causing
> > system-wide performance degradation.
> >
> > This happens because the memory.high penalty is currently based solely
> > on the overage amount, not on the actual impact of that overage:
> >
> > 1. A memcg over memory.high reclaiming cold/unused pages
> >    → minimal system impact, a light penalty is appropriate
> >
> > 2. A memcg over memory.high with hot pages being continuously
> >    reclaimed and refaulted → severe IO pressure, needs a heavy penalty
> >
> > Both cases receive identical penalties today. Users are forced to
> > combine memory.high with io.max as a workaround, but this is:
> > - The wrong abstraction level (memory policy shouldn't require IO tuning)
> > - Hard to configure correctly across different storage devices
> > - Unintuitive for users who only want memory control
>
> Thanks for raising and reporting this use-case. Overall I am supportive
> of making memory.high more useful, but instead of adding more
> heuristics in the kernel, I would prefer to make the enforcement of
> memory.high more flexible with BPF.
>
> At the moment, Hui Zhu is working on adding BPF support for memcg, but it
> is very generic and I would prefer to start with a specific and real
> use-case. I think your use-case is real and will be beneficial to many
> other users. Can you please follow up on Hui's RFC to present your
> use-case? I will also try to push the effort from the review side.
>
> thanks,
> Shakeel

Hi Shakeel,

Thanks for the feedback and for pointing me to Hui's RFC.

I noticed Michal has already forwarded my patch to that thread, and Hui
has responded. I'll wait to see how that discussion evolves and whether
there's an opportunity to integrate my use-case into his BPF framework.

You're right that my timestamp-based approach is heuristic. It was
designed as a simple, low-overhead approximation to detect active
thrashing without the cost of flushing refault counters on every charge.
But I agree that a more flexible BPF-based solution could be cleaner in
the long term.

I'll follow up on Hui's thread once there's more progress.

Thanks,
Jiayuan
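The patch itself is not quoted in this message. As a rough, hedged illustration of the idea under discussion, the userspace C sketch below uses a single "last refault" timestamp (rather than refault counters that would need flushing on every charge) to scale an overage-only penalty, so that case 2 (hot pages refaulting) is throttled harder than case 1 (cold pages). Every name in it (`struct memcg_sketch`, `memcg_note_refault()`, `refault_penalty_scale()`, `REFAULT_WINDOW_NS`, `MAX_PENALTY_SCALE`, `MAX_PENALTY_MS`) is hypothetical. The linear "1 ms per 4 KiB page" base penalty is a stand-in for the kernel's real overage calculation, and none of this is taken from the patch or from mm/memcontrol.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tunables, picked only for illustration. */
#define REFAULT_WINDOW_NS	(1000ULL * 1000 * 1000)	/* 1 s recency window */
#define MAX_PENALTY_SCALE	4u	/* factor for a refault at "now" */
#define MAX_PENALTY_MS		2000u	/* hypothetical cap on one throttle */

static_assert(MAX_PENALTY_SCALE >= 1, "scale must never shrink the penalty");

/* Hypothetical per-memcg state: a single timestamp, no refault counters. */
struct memcg_sketch {
	uint64_t last_refault_ns;	/* 0 means "never refaulted" */
};

/* Refault path: one store, so nothing has to be flushed on the charge path. */
static void memcg_note_refault(struct memcg_sketch *memcg, uint64_t now_ns)
{
	memcg->last_refault_ns = now_ns;
}

/*
 * Returns a factor in [1, MAX_PENALTY_SCALE] that decays linearly with the
 * age of the last refault and drops to 1 once it falls outside the window.
 */
static unsigned int refault_penalty_scale(const struct memcg_sketch *memcg,
					  uint64_t now_ns)
{
	uint64_t age;

	if (memcg->last_refault_ns == 0 || now_ns < memcg->last_refault_ns)
		return 1;
	age = now_ns - memcg->last_refault_ns;
	if (age >= REFAULT_WINDOW_NS)
		return 1;
	return 1 + (unsigned int)((MAX_PENALTY_SCALE - 1) *
				  (REFAULT_WINDOW_NS - age) / REFAULT_WINDOW_NS);
}

/* Stand-in for an overage-only penalty: 1 ms per 4 KiB page over the limit. */
static uint64_t base_penalty_ms(uint64_t usage, uint64_t high)
{
	return usage > high ? (usage - high) / 4096 : 0;
}

/* Overage penalty scaled by refault recency, then capped. */
static uint64_t scaled_penalty_ms(const struct memcg_sketch *memcg,
				  uint64_t usage, uint64_t high,
				  uint64_t now_ns)
{
	uint64_t penalty = base_penalty_ms(usage, high) *
			   refault_penalty_scale(memcg, now_ns);

	return penalty > MAX_PENALTY_MS ? MAX_PENALTY_MS : penalty;
}
```

With these assumed constants, a memcg 100 pages over memory.high that has not refaulted gets 100 ms, while the same overage with a refault recorded at the current time gets 400 ms; the factor decays back to 1 once the last refault is older than the window. Tracking only a timestamp keeps the charge path cheap, at the cost of being exactly the kind of heuristic the thread suggests moving into a BPF policy instead.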