Linux-mm Archive on lore.kernel.org
From: Dennis Zhou <dennis@kernel.org>
To: Kaitao Cheng <kaitao.cheng@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Michal Hocko <mhocko@suse.com>,
	muchun.song@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	chengkaitao <chengkaitao@kylinos.cn>
Subject: Re: [PATCH v3 0/3] mm/percpu: Fix possible NOFS/NOIO reclaim recursion
Date: Wed, 17 Jun 2026 00:03:17 -0700
Message-ID: <ajJGtVDi0rShPfE1@palisades.local>
In-Reply-To: <20260612022648.13008-1-kaitao.cheng@linux.dev>

Hello,

On Fri, Jun 12, 2026 at 10:26:45AM +0800, Kaitao Cheng wrote:
> From: chengkaitao <chengkaitao@kylinos.cn>
> 
> Hi all,
> 
> After v1 was posted, there were many different opinions, mainly around
> optimizing pcpu_alloc_mutex. This v3 is intended to describe the existing
> problems more clearly and provide a conventional fix approach.
> 
> Commit 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations
> atomic") allowed GFP_NOFS and GFP_NOIO percpu allocations to use
> pcpu_alloc_mutex and the chunk creation slow path. This restored the
> allocation capability that was lost when those constrained allocations
> were treated as atomic, but it also makes the percpu slow path visible
> to callers from constrained reclaim contexts.
> 
> There are two related problems.
> 
> First, the create and populate slow paths do not fully preserve the
> caller's allocation constraints. pcpu_alloc_noprof() derives pcpu_gfp from
> the caller supplied GFP mask and passes it down to the percpu backing page
> allocator. However, chunk creation calls pcpu_get_vm_areas(), and chunk
> population can allocate temporary metadata or vmalloc page tables while
> mapping backing pages. Those internal allocations can still use GFP_KERNEL,
> so a caller using GFP_NOFS or GFP_NOIO can enter unconstrained FS or IO
> reclaim while holding pcpu_alloc_mutex.
> 
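
To make the first problem concrete, here is a toy userspace model, not
kernel code. The SK_* names and bit values are made up for illustration;
only the relationship between the three masks mirrors the kernel's
GFP_KERNEL, GFP_NOFS and GFP_NOIO. It contrasts an inner helper that
hard-codes the unconstrained mask with one that takes the caller's mask:

```c
#include <stdbool.h>

/* Illustrative stand-ins; the real flag values live in the kernel headers. */
#define SK_RECLAIM    0x1u
#define SK_IO         0x2u
#define SK_FS         0x4u
#define SK_GFP_KERNEL (SK_RECLAIM | SK_IO | SK_FS)
#define SK_GFP_NOFS   (SK_RECLAIM | SK_IO)
#define SK_GFP_NOIO   (SK_RECLAIM)

/* True if reclaim triggered with this mask may start IO. */
static bool sk_may_do_io(unsigned int gfp)
{
	return (gfp & SK_IO) != 0;
}

/* True if reclaim triggered with this mask may call into the filesystem. */
static bool sk_may_enter_fs(unsigned int gfp)
{
	return (gfp & SK_FS) != 0;
}

/* Inner helper that ignores the caller's mask (the pattern being fixed). */
static unsigned int sk_inner_mask_hardcoded(unsigned int caller_gfp)
{
	(void)caller_gfp;
	return SK_GFP_KERNEL;
}

/* Inner helper that honors the caller's mask. */
static unsigned int sk_inner_mask_passed(unsigned int caller_gfp)
{
	return caller_gfp;
}
```

With a GFP_NOIO-like caller, the hard-coded variant still permits IO and
FS reclaim, while the pass-through variant permits neither.
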
> One possible case is blk-cgroup after commit 5d726c4dbeed
> ("blk-cgroup: fix possible deadlock while configuring policy").
> blkg_conf_prep() now serializes against blkcg_deactivate_policy() with
> q->blkcg_mutex, and blkg_alloc() uses GFP_NOIO because queue freeze and IO
> reclaim dependencies can otherwise deadlock. If the percpu slow path loses
> that GFP_NOIO context, direct reclaim or writeback can issue IO to a frozen
> queue while q->blkcg_mutex is held.
> 
> Second, allowing sleepable GFP_NOFS/GFP_NOIO allocations to take
> pcpu_alloc_mutex means that unconstrained backing allocations made under
> the mutex can create an FS/IO reclaim dependency against a constrained
> caller which already holds an FS or IO lock and then waits for
> pcpu_alloc_mutex.
> 
> This series fixes those issues in three steps:
> 
>   - pass the caller supplied GFP mask into pcpu_get_vm_areas() and use it
>     for vmalloc metadata and KASAN shadow allocations;
>   - pass the GFP mask through the chunk population path, including the
>     temporary pages array and vmalloc page table allocation scope;
>   - restrict percpu backing allocations performed while holding
>     pcpu_alloc_mutex to GFP_NOIO, so they cannot recurse into IO or FS
>     reclaim.
> 
> This keeps sleepable GFP_NOFS/GFP_NOIO percpu allocations working, while
> avoiding the reclaim recursion risks introduced by making those allocations
> eligible for the mutex-protected slow path.
> 
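
The scope and masking ideas in the three steps can be sketched in
userspace as follows. This is a model of the stated intent, not the patch
itself: every sk_* name and bit value is hypothetical, and the scope
helpers only imitate the save/restore shape of the kernel's memalloc
scope API.

```c
/* Illustrative stand-ins; the real flag values live in the kernel headers. */
#define SK_RECLAIM    0x1u
#define SK_IO         0x2u
#define SK_FS         0x4u
#define SK_GFP_KERNEL (SK_RECLAIM | SK_IO | SK_FS)
#define SK_GFP_NOFS   (SK_RECLAIM | SK_IO)
#define SK_GFP_NOIO   (SK_RECLAIM)

#define SK_PF_NOIO    0x1u
#define SK_PF_NOFS    0x2u

/* Models the per-task flags word that a scope toggles. */
static unsigned int sk_task_flags;

/* Enter a NOIO scope; returns the previous state so scopes can nest. */
static unsigned int sk_noio_save(void)
{
	unsigned int old = sk_task_flags & SK_PF_NOIO;

	sk_task_flags |= SK_PF_NOIO;
	return old;
}

/* Leave a NOIO scope, restoring whatever the matching save returned. */
static void sk_noio_restore(unsigned int old)
{
	sk_task_flags = (sk_task_flags & ~SK_PF_NOIO) | old;
}

/* Mask the allocator would effectively use given the active scope. */
static unsigned int sk_effective_gfp(unsigned int gfp)
{
	if (sk_task_flags & SK_PF_NOIO)
		gfp &= ~(SK_IO | SK_FS);
	else if (sk_task_flags & SK_PF_NOFS)
		gfp &= ~SK_FS;
	return gfp;
}

/* Third step: clamp backing allocations made under the mutex to NOIO. */
static unsigned int sk_backing_gfp(unsigned int caller_gfp)
{
	return caller_gfp & ~(SK_IO | SK_FS);
}
```

Inside the scope, even an allocation that asks for the unconstrained mask
is treated as NOIO, which is what covers page table allocations that
cannot take a mask argument directly.
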
> Changes in v3:
>   - allow @gfp to pass __GFP_NOFAIL through. (Andrew Morton)
> 
> Changes in v2:
>   - split the previous first patch into vmalloc-area creation and chunk
>     population changes; (Pedro Falcato)
>   - pass the GFP mask explicitly to pcpu_get_vm_areas(); (Pedro Falcato)
>   - apply the corresponding memalloc scope around vmalloc page table
>     allocation during chunk population;
>   - replace the reclaim recursion avoidance with a GFP_NOIO backing
>     allocation mask instead of only rejecting nested reclaim.
>     (Michal Hocko)
> 
> Link to v2:
> https://lore.kernel.org/all/20260604113101.89510-1-kaitao.cheng@linux.dev/
> 
> Link to v1:
> https://lore.kernel.org/all/20260528132917.81123-1-kaitao.cheng@linux.dev/
> 
> Kaitao Cheng (3):
>   mm/vmalloc: honor GFP constraints in pcpu_get_vm_areas()
>   mm/percpu: honor GFP constraints when populating chunks
>   mm/percpu: Avoid IO/FS reclaim in backing allocations
> 
>  include/linux/vmalloc.h |  4 ++--
>  mm/percpu-vm.c          | 40 +++++++++++++++++++++++++++-------------
>  mm/percpu.c             | 18 ++++++++++++------
>  mm/vmalloc.c            | 23 ++++++++++++-----------
>  4 files changed, 53 insertions(+), 32 deletions(-)
> 
> -- 
> 2.43.0
> 

Thanks for taking on this work. I definitely missed this earlier.

I acked patches 1 and 2. I think 3 is good, but the __GFP_NOFAIL handling
warrants more discussion. My take back then was that a single percpu
allocation can require a large number of backing pages. So while the
caller may not be asking for much memory, we may need substantially more
to back that allocation. That discrepancy is why __GFP_NOFAIL currently
only selects mutex_lock() over mutex_lock_killable().
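
A rough userspace sketch of that reasoning, with hypothetical sk_* names
and made-up bit values. The lock selection follows the description above;
the masking of the backing allocation flags and the page arithmetic are
simplifying assumptions (the latter ignores pages that are already
populated), not a copy of mm/percpu.c:

```c
/* Illustrative stand-ins; the real flag values live in the kernel headers. */
#define SK_RECLAIM    0x01u
#define SK_IO         0x02u
#define SK_FS         0x04u
#define SK_NORETRY    0x08u
#define SK_NOWARN     0x10u
#define SK_NOFAIL     0x20u
#define SK_GFP_KERNEL (SK_RECLAIM | SK_IO | SK_FS)

enum sk_lock_mode { SK_LOCK_UNINTERRUPTIBLE, SK_LOCK_KILLABLE };

/* NOFAIL callers wait for the mutex unconditionally; others stay killable. */
static enum sk_lock_mode sk_lock_mode_for(unsigned int gfp)
{
	return (gfp & SK_NOFAIL) ? SK_LOCK_UNINTERRUPTIBLE : SK_LOCK_KILLABLE;
}

/* Assumed filter: NOFAIL is not forwarded to the backing page allocator. */
static unsigned int sk_backing_mask(unsigned int gfp)
{
	return gfp & (SK_GFP_KERNEL | SK_NORETRY | SK_NOWARN);
}

/* Worst-case pages needed to back `size` bytes on each of `nr_cpus` CPUs. */
static unsigned long sk_backing_pages(unsigned long size,
				      unsigned long page_size,
				      unsigned int nr_cpus)
{
	unsigned long per_cpu = (size + page_size - 1) / page_size;

	return per_cpu * nr_cpus;
}
```

In this model a 64-byte request on a 256-CPU machine with 4096-byte pages
can need up to 256 backing pages, which is the gap between what the
caller asked for and what a no-fail guarantee would have to cover.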

Thanks,
Dennis


Thread overview: 10+ messages
2026-06-12  2:26 [PATCH v3 0/3] mm/percpu: Fix possible NOFS/NOIO reclaim recursion Kaitao Cheng
2026-06-12  2:26 ` [PATCH v3 1/3] mm/vmalloc: honor GFP constraints in pcpu_get_vm_areas() Kaitao Cheng
2026-06-17  6:02   ` Dennis Zhou
2026-06-12  2:26 ` [PATCH v3 2/3] mm/percpu: honor GFP constraints when populating chunks Kaitao Cheng
2026-06-17  6:29   ` Dennis Zhou
2026-06-12  2:26 ` [PATCH v3 3/3] mm/percpu: Avoid IO/FS reclaim in backing allocations Kaitao Cheng
2026-06-17  6:53   ` Dennis Zhou
2026-06-17  8:56     ` Kaitao Cheng
2026-06-17 13:16       ` Michal Hocko
2026-06-17  7:03 ` Dennis Zhou [this message]
