From: Usama Arif <usama.arif@linux.dev>
To: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Michal Hocko <mhocko@kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Qi Zheng <qi.zheng@linux.dev>,
	Yosry Ahmed <yosry.ahmed@linux.dev>, Zi Yan <ziy@nvidia.com>,
	"Liam R . Howlett" <liam@infradead.org>,
	Kiryl Shutsemau <kas@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Kairui Song <ryncsn@gmail.com>,
	Mikhail Zaslonko <zaslonko@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Barry Song <baohua@kernel.org>, Dev Jain <dev.jain@arm.com>,
	Lance Yang <lance.yang@linux.dev>, Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 9/9] mm: switch deferred split shrinker to list_lru
Date: Thu, 28 May 2026 14:32:06 +0100
Message-ID: <6f9c78b2-3846-4f75-bcc2-41bf91230513@linux.dev>
In-Reply-To: <20260527204757.2544958-10-hannes@cmpxchg.org>



On 27/05/2026 21:45, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
> 
>         alloc/unmap
>           deferred_split_folio()
>             list_add_tail(memcg->split_queue)
>             set_shrinker_bit(memcg, node, deferred_shrinker_id)
> 
>         for_each_zone_zonelist_nodemask(restricted_nodes)
>           mem_cgroup_iter()
>             shrink_slab(node, memcg)
>               shrink_slab_memcg(node, memcg)
>                 if test_shrinker_bit(memcg, node, deferred_shrinker_id)
>                   deferred_split_scan()
>                     walks memcg->split_queue
> 
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
> 
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
> 
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> folio_memcg_alloc_deferred(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
> 
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.
> 
> Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/huge_mm.h    |   7 +-
>  include/linux/memcontrol.h |   4 -
>  include/linux/mmzone.h     |  12 --
>  mm/huge_memory.c           | 364 +++++++++++++------------------------
>  mm/internal.h              |   2 +-
>  mm/khugepaged.c            |   5 +
>  mm/memcontrol.c            |  12 +-
>  mm/memory.c                |   4 +
>  mm/mm_init.c               |  15 --
>  mm/swap_state.c            |  10 +
>  10 files changed, 150 insertions(+), 285 deletions(-)
> 

[...]

> diff --git a/mm/memory.c b/mm/memory.c
> index 135f5c0f57bd..f22e61d8c8de 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5222,6 +5222,10 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>  			folio_put(folio);
>  			goto next;
>  		}
> +		if (order > 1 && folio_memcg_alloc_deferred(folio)) {
> +			folio_put(folio);

Ah sorry, I should have caught this in the previous version. Do we need

count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);

here?

Or maybe we could just goto next instead of goto fallback, and try the
next viable order?

> +			goto fallback;
> +		}
>  		folio_throttle_swaprate(folio, gfp);
>  		/*
>  		 * When a folio is not zeroed during allocation

