Date: Wed, 11 Mar 2026 22:05:16 +0000
From: Bing Jiao
To: Joshua Hahn
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Qi Zheng,
    Axel Rasmussen, Yuanchu Xie, Wei Xu, linux-mm@kvack.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    kernel-team@meta.com
Subject: Re: [RFC PATCH 6/6] mm/memcontrol: Make memory.high tier-aware
References: <20260223223830.586018-1-joshua.hahnjy@gmail.com>
    <20260223223830.586018-7-joshua.hahnjy@gmail.com>
In-Reply-To: <20260223223830.586018-7-joshua.hahnjy@gmail.com>

On Mon, Feb 23, 2026 at 02:38:29PM -0800, Joshua Hahn wrote:
> @@ -4485,15 +4527,22 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  		return err;
>
>  	page_counter_set_high(&memcg->memory, high);
> +	toptier_high = page_counter_toptier_high(&memcg->memory);
>
>  	if (of->file->f_flags & O_NONBLOCK)
>  		goto out;
>
>  	for (;;) {
>  		unsigned long nr_pages = page_counter_read(&memcg->memory);
> +		unsigned long toptier_pages = mem_cgroup_toptier_usage(memcg);
>  		unsigned long reclaimed;
> +		unsigned long to_free;
> +		nodemask_t toptier_nodes, *reclaim_nodes;
> +		bool mem_high_ok = nr_pages <= high;
> +		bool toptier_high_ok = !(tier_aware_memcg_limits &&
> +					 toptier_pages > toptier_high);
>
> -		if (nr_pages <= high)
> +		if (mem_high_ok && toptier_high_ok)
>  			break;
>
>  		if (signal_pending(current))
> @@ -4505,8 +4554,17 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  			continue;
>  		}
>
> -		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
> -					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL);
> +		mt_get_toptier_nodemask(&toptier_nodes, NULL);
> +		if (mem_high_ok && !toptier_high_ok) {
> +			reclaim_nodes = &toptier_nodes;
> +			to_free = toptier_pages - toptier_high;
> +		} else {
> +			reclaim_nodes = NULL;
> +			to_free = nr_pages - high;
> +		}
> +		reclaimed = try_to_free_mem_cgroup_pages(memcg, to_free,
> +					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP,
> +					NULL, reclaim_nodes);
>
>  		if (!reclaimed && !nr_retries--)
>  			break;

Hi Joshua, thanks for the patch.

I have a concern about the behavior when both the total memory.high
limit and the new toptier_high limit are breached.

If both mem_high_ok and toptier_high_ok are false, memory_high_write()
calls try_to_free_mem_cgroup_pages() with reclaim_nodes set to NULL, so
reclaim targets all nodes. In that case the reclaimer may try to meet
the reclaim target (to_free = nr_pages - high) by demoting pages from
the top tier to lower tiers. Demotion satisfies the toptier_high
requirement, but it does not reduce the cgroup's total memory charge,
because the counter tracks the sum across all tiers.

Since total usage stays the same, the loop will likely keep spinning
until it reaches MAX_RECLAIM_RETRIES or hits another exit condition
(e.g. !reclaimed && !nr_retries--). The possible outcomes are excessive
CPU consumption without bringing the cgroup below its total memory
limit, all top-tier pages being demoted to the far tier, or premature
OOM kills.
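
To illustrate the accounting argument, here is a small userspace toy
model. It is not kernel code: struct toy_memcg and the toy_* helpers are
hypothetical names made up for this sketch, and it assumes only two tiers
and that lower-tier pages can always be swapped out. It shows that
demotion leaves the total charge unchanged, while swapping out from the
lower tier first and then demoting can bring both counters under their
limits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a cgroup's charge split across two tiers (hypothetical). */
struct toy_memcg {
	unsigned long toptier_pages;	/* pages charged on top-tier nodes */
	unsigned long lowtier_pages;	/* pages charged on lower-tier nodes */
};

/* Total charge is the sum across tiers, as with the memcg page counter. */
static unsigned long toy_total(const struct toy_memcg *cg)
{
	return cg->toptier_pages + cg->lowtier_pages;
}

/* Demotion moves pages between tiers; the total charge does not change. */
static unsigned long toy_demote(struct toy_memcg *cg, unsigned long nr)
{
	unsigned long before = toy_total(cg);

	if (nr > cg->toptier_pages)
		nr = cg->toptier_pages;
	cg->toptier_pages -= nr;
	cg->lowtier_pages += nr;
	assert(toy_total(cg) == before);
	return nr;
}

/* Swapping out lower-tier pages uncharges them, so the total drops. */
static unsigned long toy_swap_lowtier(struct toy_memcg *cg, unsigned long nr)
{
	if (nr > cg->lowtier_pages)
		nr = cg->lowtier_pages;
	cg->lowtier_pages -= nr;
	return nr;
}

/*
 * Lower-tier-first ordering: swap out from the lower tier until the total
 * is under 'high', then demote until the top tier is under 'toptier_high'.
 * Returns true if both limits are met afterwards.
 */
static bool toy_reclaim_ordered(struct toy_memcg *cg, unsigned long high,
				unsigned long toptier_high)
{
	if (toy_total(cg) > high)
		toy_swap_lowtier(cg, toy_total(cg) - high);
	if (cg->toptier_pages > toptier_high)
		toy_demote(cg, cg->toptier_pages - toptier_high);
	return toy_total(cg) <= high && cg->toptier_pages <= toptier_high;
}
```

For example, with 800 top-tier and 400 lower-tier pages, high = 1000 and
toptier_high = 600: demoting 200 pages alone leaves the total at 1200,
still above high, whereas toy_reclaim_ordered() swaps out 200 lower-tier
pages and then demotes 200, ending at 600 top-tier pages and a total of
1000.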

Given your tier-aware memcg limits, I think it is better to first
reclaim from the lower tiers to swap, by setting the allowed nodemask to
the far-tier nodes, so that mem_high_ok is satisfied, and then demote
pages from the top tier so that toptier_high is satisfied. This also
avoids reclaiming pages directly from the top tier to swap, and it
ensures that demotion actually contributes to reaching the targeted
memory state without unnecessary performance penalties.

To address the case where a memcg exceeds its total limit and demotion
cannot help relieve the memcg's memory pressure, I am considering
introducing a reclaim_options setting that prevents page demotion by
setting sc.no_demote = 1. I have a local patch for this and am preparing
it for submission.

Please let me know if I have misunderstood any part of your
implementation or if you see any issues with this proposed adjustment.

Best,
Bing