From: Dennis Zhou <dennis@kernel.org>
To: Kaitao Cheng <kaitao.cheng@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Uladzislau Rezki <urezki@gmail.com>, Tejun Heo <tj@kernel.org>,
Christoph Lameter <cl@gentwo.org>,
Vlastimil Babka <vbabka@kernel.org>,
Michal Hocko <mhocko@suse.com>,
muchun.song@linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
chengkaitao <chengkaitao@kylinos.cn>
Subject: Re: [PATCH v3 0/3] mm/percpu: Fix possible NOFS/NOIO reclaim recursion
Date: Wed, 17 Jun 2026 00:03:17 -0700
Message-ID: <ajJGtVDi0rShPfE1@palisades.local>
In-Reply-To: <20260612022648.13008-1-kaitao.cheng@linux.dev>

Hello,

On Fri, Jun 12, 2026 at 10:26:45AM +0800, Kaitao Cheng wrote:
> From: chengkaitao <chengkaitao@kylinos.cn>
>
> Hi all,
>
> After v1 was posted, there were many different opinions, mainly around
> optimizing pcpu_alloc_mutex. This v3 is intended to describe the existing
> problems more clearly and provide a conventional fix approach.
>
> Commit 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations
> atomic") allowed GFP_NOFS and GFP_NOIO percpu allocations to use
> pcpu_alloc_mutex and the chunk creation slow path. This restored the
> allocation capability that was lost when those constrained allocations
> were treated as atomic, but it also makes the percpu slow path visible
> to callers from constrained reclaim contexts.
>
> There are two related problems.
>
> First, the create and populate slow paths do not fully preserve the
> caller's allocation constraints. pcpu_alloc_noprof() derives pcpu_gfp from
> the caller-supplied GFP mask and passes it down to the percpu backing page
> allocator. However, chunk creation calls pcpu_get_vm_areas(), and chunk
> population can allocate temporary metadata or vmalloc page tables while
> mapping backing pages. Those internal allocations can still use GFP_KERNEL,
> so a caller using GFP_NOFS or GFP_NOIO can enter unconstrained FS or IO
> reclaim while holding pcpu_alloc_mutex.
>
> One possible case is blk-cgroup after commit 5d726c4dbeed
> ("blk-cgroup: fix possible deadlock while configuring policy").
> blkg_conf_prep() now serializes against blkcg_deactivate_policy() with
> q->blkcg_mutex, and blkg_alloc() uses GFP_NOIO because queue freeze and IO
> reclaim dependencies can otherwise deadlock. If the percpu slow path loses
> that GFP_NOIO context, direct reclaim or writeback can issue IO to a frozen
> queue while q->blkcg_mutex is held.
>
> Second, allowing sleepable GFP_NOFS/GFP_NOIO allocations to take
> pcpu_alloc_mutex means that unconstrained backing allocations made under
> the mutex can create an FS/IO reclaim dependency against a constrained
> caller which already holds an FS or IO lock and then waits for
> pcpu_alloc_mutex.
>
> This series fixes those issues in three steps:
>
> - pass the caller-supplied GFP mask into pcpu_get_vm_areas() and use it
>   for vmalloc metadata and KASAN shadow allocations;
> - pass the GFP mask through the chunk population path, including the
>   temporary pages array and vmalloc page table allocation scope;
> - restrict percpu backing allocations performed while holding
>   pcpu_alloc_mutex to GFP_NOIO, so they cannot recurse into IO or FS
>   reclaim.
>
> This keeps sleepable GFP_NOFS/GFP_NOIO percpu allocations working, while
> avoiding the reclaim recursion risks introduced by making those allocations
> eligible for the mutex-protected slow path.
>
> Changes in v3:
> - allow @gfp to pass __GFP_NOFAIL through. (Andrew Morton)
>
> Changes in v2:
> - split the previous first patch into vmalloc-area creation and chunk
>   population changes; (Pedro Falcato)
> - pass the GFP mask explicitly to pcpu_get_vm_areas(); (Pedro Falcato)
> - apply the corresponding memalloc scope around vmalloc page table
>   allocation during chunk population;
> - replace the reclaim recursion avoidance with a GFP_NOIO backing
>   allocation mask instead of only rejecting nested reclaim.
>   (Michal Hocko)
>
> Link to v2:
> https://lore.kernel.org/all/20260604113101.89510-1-kaitao.cheng@linux.dev/
>
> Link to v1:
> https://lore.kernel.org/all/20260528132917.81123-1-kaitao.cheng@linux.dev/
>
> Kaitao Cheng (3):
>   mm/vmalloc: honor GFP constraints in pcpu_get_vm_areas()
>   mm/percpu: honor GFP constraints when populating chunks
>   mm/percpu: Avoid IO/FS reclaim in backing allocations
>
>  include/linux/vmalloc.h |  4 ++--
>  mm/percpu-vm.c          | 40 +++++++++++++++++++++++++++-------------
>  mm/percpu.c             | 18 ++++++++++++------
>  mm/vmalloc.c            | 23 ++++++++++++-----------
>  4 files changed, 53 insertions(+), 32 deletions(-)
>
> --
> 2.43.0
>

Thanks for taking on this work. I definitely missed this earlier.

I acked patches 1 and 2. I think 3 is good, but the __GFP_NOFAIL handling
warrants more discussion. My take back then was that a single percpu
allocation can require a large number of backing pages, because each
page of a chunk has to be populated for every possible CPU. As a result,
while the caller may not be asking for a lot of memory, we may need
substantially more to back that allocation. Given that discrepancy,
__GFP_NOFAIL only selects mutex_lock() over mutex_lock_killable().

Thanks,
Dennis

Thread overview: 10+ messages
2026-06-12 2:26 [PATCH v3 0/3] mm/percpu: Fix possible NOFS/NOIO reclaim recursion Kaitao Cheng
2026-06-12 2:26 ` [PATCH v3 1/3] mm/vmalloc: honor GFP constraints in pcpu_get_vm_areas() Kaitao Cheng
2026-06-17 6:02 ` Dennis Zhou
2026-06-12 2:26 ` [PATCH v3 2/3] mm/percpu: honor GFP constraints when populating chunks Kaitao Cheng
2026-06-17 6:29 ` Dennis Zhou
2026-06-12 2:26 ` [PATCH v3 3/3] mm/percpu: Avoid IO/FS reclaim in backing allocations Kaitao Cheng
2026-06-17 6:53 ` Dennis Zhou
2026-06-17 8:56 ` Kaitao Cheng
2026-06-17 13:16 ` Michal Hocko
2026-06-17 7:03 ` Dennis Zhou [this message]