From: Kaitao Cheng <kaitao.cheng@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Vlastimil Babka <vbabka@kernel.org>,
	Michal Hocko <mhocko@suse.com>
Cc: muchun.song@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	chengkaitao <chengkaitao@kylinos.cn>
Subject: [PATCH v2 0/3] mm/percpu: Fix possible NOFS/NOIO reclaim recursion
Date: Thu,  4 Jun 2026 19:30:58 +0800
Message-ID: <20260604113101.89510-1-kaitao.cheng@linux.dev>

From: chengkaitao <chengkaitao@kylinos.cn>

Hi all,

v1 drew a range of feedback, mostly about optimizing pcpu_alloc_mutex.
This v2 describes the existing problems more clearly and takes a
conventional approach to fixing them.

Commit 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations
atomic") allowed GFP_NOFS and GFP_NOIO percpu allocations to use
pcpu_alloc_mutex and the chunk creation slow path. That restored the
allocation capability lost when those constrained allocations were
treated as atomic, but it also exposes the percpu slow path to callers
in constrained reclaim contexts.

There are two related problems.

First, the create and populate slow paths do not fully preserve the
caller's allocation constraints. pcpu_alloc_noprof() derives pcpu_gfp from
the caller-supplied GFP mask and passes it down to the percpu backing page
allocator. However, chunk creation calls pcpu_get_vm_areas(), and chunk
population can allocate temporary metadata or vmalloc page tables while
mapping backing pages. Those internal allocations can still use GFP_KERNEL,
so a caller using GFP_NOFS or GFP_NOIO can enter unconstrained FS or IO
reclaim while holding pcpu_alloc_mutex.
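
To make the first problem concrete, here is a minimal userspace sketch of
how a constraint gets lost when an internal allocation hard-codes its own
mask. It is only a model: the MODEL_* bit values are illustrative and are
not the kernel's real GFP encodings, and model_internal_gfp() is a
hypothetical helper, not a function in mm/percpu.c or mm/vmalloc.c.

```c
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real GFP encodings. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_IO      0x1u  /* reclaim may start IO */
#define MODEL_GFP_FS      0x2u  /* reclaim may call into the filesystem */
#define MODEL_GFP_RECLAIM 0x4u  /* allocation may sleep and reclaim */

#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/*
 * Hypothetical helper: the mask an internal allocation on the slow path
 * ends up using. With propagate == false it ignores the caller and uses
 * the unconstrained mask, which is the behaviour described above.
 */
static model_gfp_t model_internal_gfp(model_gfp_t caller_gfp, bool propagate)
{
	return propagate ? caller_gfp : MODEL_GFP_KERNEL;
}

/* True if an allocation with this mask may recurse into IO reclaim. */
static bool model_may_enter_io_reclaim(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_IO) != 0;
}

/* True if an allocation with this mask may recurse into FS reclaim. */
static bool model_may_enter_fs_reclaim(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}
```

In this model a MODEL_GFP_NOIO caller can still reach IO reclaim when
propagate is false, and cannot once the caller's mask is passed down.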

One possible case is blk-cgroup after commit 5d726c4dbeed
("blk-cgroup: fix possible deadlock while configuring policy").
blkg_conf_prep() now serializes against blkcg_deactivate_policy() with
q->blkcg_mutex, and blkg_alloc() uses GFP_NOIO because queue freeze and IO
reclaim dependencies can otherwise deadlock. If the percpu slow path loses
that GFP_NOIO context, direct reclaim or writeback can issue IO to a frozen
queue while q->blkcg_mutex is held.

Second, now that sleepable GFP_NOFS/GFP_NOIO allocations can take
pcpu_alloc_mutex, an unconstrained backing allocation made under the
mutex can enter FS or IO reclaim and wait on an FS or IO lock. If that
lock is held by a constrained caller which is itself waiting for
pcpu_alloc_mutex, the two form a reclaim dependency cycle.

This series fixes those issues in three steps:

  - pass the caller-supplied GFP mask into pcpu_get_vm_areas() and use it
    for vmalloc metadata and KASAN shadow allocations;
  - pass the GFP mask through the chunk population path, including the
    temporary pages array and vmalloc page table allocation scope;
  - restrict percpu backing allocations performed while holding
    pcpu_alloc_mutex to GFP_NOIO, so they cannot recurse into IO or FS
    reclaim.
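
The "allocation scope" in the second step can be pictured with a small
userspace model, loosely patterned on the kernel's memalloc_noio_save(),
memalloc_noio_restore() and current_gfp_context(). Everything named
model_* or MODEL_* below is hypothetical, the bit values are illustrative
only, and this is not the code from the patches. The point it shows is
that a scope constrains nested allocations that cannot take a GFP
argument and would otherwise use an unconstrained mask.

```c
/* Illustrative bit values only; NOT the kernel's real encodings. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_RECLAIM 0x4u
#define MODEL_GFP_KERNEL  (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

#define MODEL_PF_NOIO     0x1u  /* stand-in for a per-task NOIO flag */

/* Stand-in for the current task's flags word. */
static unsigned int model_task_flags;

/* Enter a NOIO scope; returns the previous state so scopes can nest. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

/* Leave a NOIO scope, restoring whatever state the matching save saw. */
static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* Apply the active scope to a requested mask. */
static model_gfp_t model_effective_gfp(model_gfp_t gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}

/* A nested allocation that hard-codes the unconstrained mask. */
static model_gfp_t model_nested_alloc_gfp(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}
```

Because the save call returns the previous state, an inner scope that
ends early does not lift the constraint set by an outer one.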

This keeps sleepable GFP_NOFS/GFP_NOIO percpu allocations working, while
avoiding the reclaim recursion risks introduced by making those allocations
eligible for the mutex-protected slow path.
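
As a rough illustration of that trade-off, the sketch below models one
plausible way to restrict a backing allocation mask to GFP_NOIO: clear
the IO and FS bits and leave everything else alone. This is an assumption
about the shape of the change, not the patch itself; the MODEL_* values
are illustrative and model_backing_gfp() is a hypothetical name.

```c
/* Illustrative bit values only; NOT the kernel's real GFP encodings. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_IO      0x1u  /* reclaim may start IO */
#define MODEL_GFP_FS      0x2u  /* reclaim may call into the filesystem */
#define MODEL_GFP_RECLAIM 0x4u  /* allocation may sleep and reclaim */

#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/*
 * Hypothetical helper: the mask used for a backing allocation made while
 * the percpu mutex is held. IO and FS reclaim are masked out; the ability
 * to sleep and reclaim is kept if the caller had it.
 */
static model_gfp_t model_backing_gfp(model_gfp_t caller_gfp)
{
	return caller_gfp & ~(MODEL_GFP_IO | MODEL_GFP_FS);
}
```

In this model MODEL_GFP_KERNEL, MODEL_GFP_NOFS and MODEL_GFP_NOIO callers
all end up with MODEL_GFP_NOIO, so they can still sleep, while a mask with
no reclaim bit stays non-sleeping.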

Changes in v2:
  - split the previous first patch into vmalloc-area creation and chunk
    population changes; (Pedro Falcato)
  - pass the GFP mask explicitly to pcpu_get_vm_areas(); (Pedro Falcato)
  - apply the corresponding memalloc scope around vmalloc page table
    allocation during chunk population;
  - replace the reclaim recursion avoidance with a GFP_NOIO backing
    allocation mask instead of only rejecting nested reclaim.
    (Michal Hocko)

Link to v1:
https://lore.kernel.org/all/20260528132917.81123-1-kaitao.cheng@linux.dev/

Kaitao Cheng (3):
  mm/vmalloc: honor GFP constraints in pcpu_get_vm_areas()
  mm/percpu: honor GFP constraints when populating chunks
  mm/percpu: Avoid IO/FS reclaim in backing allocations

 include/linux/vmalloc.h |  4 ++--
 mm/percpu-vm.c          | 40 +++++++++++++++++++++++++++-------------
 mm/percpu.c             | 17 +++++++++++------
 mm/vmalloc.c            | 23 ++++++++++++-----------
 4 files changed, 52 insertions(+), 32 deletions(-)

-- 
2.43.0


