Message-ID: <6f9c78b2-3846-4f75-bcc2-41bf91230513@linux.dev>
Date: Thu, 28 May 2026 14:32:06 +0100
X-Mailing-List: cgroups@vger.kernel.org
Subject: Re: [PATCH v5 9/9] mm: switch deferred split shrinker to list_lru
From: Usama Arif
To: Johannes Weiner, Andrew Morton
Cc: David Hildenbrand, Lorenzo Stoakes, Shakeel Butt, Michal Hocko,
 Dave Chinner, Roman Gushchin, Muchun Song, Qi Zheng, Yosry Ahmed,
 Zi Yan, "Liam R . Howlett", Kiryl Shutsemau, Vlastimil Babka,
 Kairui Song, Mikhail Zaslonko, Vasily Gorbik, Baolin Wang, Barry Song,
 Dev Jain, Lance Yang, Nico Pache, Ryan Roberts,
 cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20260527204757.2544958-1-hannes@cmpxchg.org>
 <20260527204757.2544958-10-hannes@cmpxchg.org>
In-Reply-To: <20260527204757.2544958-10-hannes@cmpxchg.org>

On 27/05/2026 21:45, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection.
> That means on a cgrouped system, a node-restricted allocation entering
> reclaim can end up splitting large pages on other nodes:
>
>	alloc/unmap
>	  deferred_split_folio()
>	    list_add_tail(memcg->split_queue)
>	    set_shrinker_bit(memcg, node, deferred_shrinker_id)
>
>	for_each_zone_zonelist_nodemask(restricted_nodes)
>	  mem_cgroup_iter()
>	    shrink_slab(node, memcg)
>	      shrink_slab_memcg(node, memcg)
>	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
>	          deferred_split_scan()
>	            walks memcg->split_queue
>
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
>
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
>
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> folio_memcg_alloc_deferred(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
>
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.
>
> Reported-by: Mikhail Zaslonko
> Tested-by: Mikhail Zaslonko
> Acked-by: Shakeel Butt
> Reviewed-by: Lorenzo Stoakes (Oracle)
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/huge_mm.h    |   7 +-
>  include/linux/memcontrol.h |   4 -
>  include/linux/mmzone.h     |  12 --
>  mm/huge_memory.c           | 364 +++++++++++++------------------------
>  mm/internal.h              |   2 +-
>  mm/khugepaged.c            |   5 +
>  mm/memcontrol.c            |  12 +-
>  mm/memory.c                |   4 +
>  mm/mm_init.c               |  15 --
>  mm/swap_state.c            |  10 +
>  10 files changed, 150 insertions(+), 285 deletions(-)
>
> [...]
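
For anyone following along, the granularity problem described in the
changelog can be illustrated with a toy user-space model. This is only a
sketch and not kernel code: struct toy_folio, scan_per_memcg_queue() and
scan_per_node_list() are made-up names, and a flat array stands in for
both the old split queue and the list_lru structures.

```c
#include <stddef.h>

/* Toy stand-in for a large folio sitting on a deferred split queue. */
struct toy_folio {
	int nid;	/* NUMA node the folio's memory lives on */
};

/*
 * Old scheme (toy model): one queue per memcg, shared by all nodes.
 * A scan triggered on behalf of @nid still walks every queued folio,
 * because the queue itself carries no per-node structure.
 */
static size_t scan_per_memcg_queue(const struct toy_folio *queue,
				   size_t len, int nid)
{
	(void)queue;
	(void)nid;
	return len;
}

/*
 * list_lru-style scheme (toy model): one list per (node, memcg) pair.
 * Modelled here by filtering the flat array, so a scan for @nid only
 * visits folios that actually live on @nid.
 */
static size_t scan_per_node_list(const struct toy_folio *queue,
				 size_t len, int nid)
{
	size_t i, scanned = 0;

	for (i = 0; i < len; i++)
		if (queue[i].nid == nid)
			scanned++;
	return scanned;
}
```

With one queued folio on node 0 and three on node 1, a node-0 scan
visits all four folios in the first model but only one in the second.
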
> diff --git a/mm/memory.c b/mm/memory.c
> index 135f5c0f57bd..f22e61d8c8de 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5222,6 +5222,10 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>  				folio_put(folio);
>  				goto next;
>  			}
> +			if (order > 1 && folio_memcg_alloc_deferred(folio)) {
> +				folio_put(folio);

Ah sorry, I should have caught this in the previous version. Do we need
count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK); here? Or maybe we
could just goto next instead of goto fallback and try the next viable
order?

> +				goto fallback;
> +			}
>  			folio_throttle_swaprate(folio, gfp);
>  			/*
>  			 * When a folio is not zeroed during allocation