From: Pedro Falcato <pfalcato@suse.de>
To: Kaitao Cheng <kaitao.cheng@linux.dev>
Cc: dennis@kernel.org, tj@kernel.org, cl@gentwo.org,
akpm@linux-foundation.org, mhocko@suse.com, vbabka@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
muchun.song@linux.dev, Kaitao Cheng <chengkaitao@kylinos.cn>
Subject: Re: [PATCH 2/2] mm/percpu: Avoid pcpu_alloc_mutex recursion from reclaim
Date: Fri, 29 May 2026 10:34:55 +0100
Message-ID: <ahldBnDclhfNMyzu@pedro-suse.lan>
In-Reply-To: <20260528132917.81123-3-kaitao.cheng@linux.dev>

On Thu, May 28, 2026 at 09:29:17PM +0800, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>
> pcpu_alloc_noprof() takes pcpu_alloc_mutex for sleepable allocations
> so that it can create chunks and populate backing pages. If reclaim is
> entered while that mutex is already held, and reclaim reaches a path
> which allocates percpu memory, the nested allocation can try to take
> pcpu_alloc_mutex again.
>
> That creates a reclaim recursion dependency:
>
>   pcpu_alloc_noprof(GFP_KERNEL)
>     mutex_lock(&pcpu_alloc_mutex)
>     reclaim
>       pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS)
>         mutex_lock(&pcpu_alloc_mutex)
>
> Avoid this by treating percpu allocations from reclaim context as atomic.
> Such allocations may still be served from already available and populated
> areas, but they must not enter the mutex-protected slow path or create new
> chunks. If no space is available, fail the allocation and let the normal
> balance work handle replenishment outside reclaim.
>
> Update the function comment to describe that reclaim context allocations
> are atomic regardless of whether the supplied GFP mask would otherwise
> allow blocking.
>
> This patch is a preventive fix. There may not currently be any path that
> calls pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS) from direct reclaim context.

I don't like this. The proper way of fixing this would probably be to release
pcpu_alloc_mutex (or not have it in the first place!) while you're allocating
memory.

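As a rough illustration of the "release the mutex while allocating" idea, here is a minimal userspace sketch. It is not how mm/percpu.c is structured: pool_mutex, spare_chunk and pool_refill() are hypothetical names, malloc() stands in for the backing page allocator, and a pthread mutex stands in for pcpu_alloc_mutex. The point is only the shape: check under the lock, drop it, allocate, retake it and recheck.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins: pool_mutex models pcpu_alloc_mutex. */
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static void *spare_chunk;	/* protected by pool_mutex */

/* Returns 0 once a spare chunk is available, -1 if allocation failed. */
static int pool_refill(size_t chunk_size)
{
	void *chunk;

	pthread_mutex_lock(&pool_mutex);
	if (spare_chunk) {
		/* Already populated, nothing to allocate. */
		pthread_mutex_unlock(&pool_mutex);
		return 0;
	}
	pthread_mutex_unlock(&pool_mutex);

	/*
	 * Allocate with no lock held. Whatever the allocator ends up doing
	 * (in the kernel: possibly entering reclaim), it cannot find
	 * pool_mutex held by this thread.
	 */
	chunk = malloc(chunk_size);
	if (!chunk)
		return -1;

	pthread_mutex_lock(&pool_mutex);
	if (!spare_chunk) {
		spare_chunk = chunk;
		chunk = NULL;
	}
	pthread_mutex_unlock(&pool_mutex);

	/* Someone else refilled first; free(NULL) is a no-op otherwise. */
	free(chunk);
	return 0;
}
```

The cost of this shape is the recheck after retaking the lock: two callers may both allocate, and the loser has to throw its chunk away.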
>
> Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> ---
> mm/percpu.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 1bb38467390b..9c30e5897813 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1803,9 +1803,9 @@ static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
>   * @gfp: allocation flags
>   *
>   * Allocate percpu area of @size bytes aligned at @align. If @gfp doesn't
> - * contain %GFP_KERNEL, the allocation is atomic. If @gfp has __GFP_NOWARN
> - * then no warning will be triggered on invalid or failed allocation
> - * requests.
> + * allow blocking, or if allocation is requested from reclaim context, the
> + * allocation is atomic. If @gfp has __GFP_NOWARN then no warning will be
> + * triggered on invalid or failed allocation requests.
>   *
>   * RETURNS:
>   * Percpu pointer to the allocated area on success, NULL on failure.
> @@ -1828,7 +1828,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
>  	gfp = current_gfp_context(gfp);
>  	/* whitelisted flags that can be passed to the backing allocators */
>  	pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> -	is_atomic = !gfpflags_allow_blocking(gfp);
> +	/*
> +	 * Reclaim can be entered while pcpu_alloc_mutex is already held by
> +	 * another percpu allocation. Avoid recursing back into the mutex from
> +	 * reclaim; best-effort allocations from already populated areas are OK.
> +	 */

Since this is an entirely theoretical issue:

	/* Reclaim paths should not be hitting the percpu allocator, for now */
	if (WARN_ON_ONCE(current->reclaim_state))
		return NULL;

But that's just my 2c.

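To make the difference between the two behaviours concrete, here is a small userspace model, not kernel code. Everything in it is made up for illustration: struct task_model, the enums and pcpu_pick_path() do not exist in the kernel, in_reclaim stands in for a non-NULL current->reclaim_state, and warned_once only mimics the effect of WARN_ON_ONCE().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task state read through "current". */
struct task_model {
	bool in_reclaim;	/* models current->reclaim_state != NULL */
};

enum pcpu_policy {
	PCPU_ATOMIC_IN_RECLAIM,	/* the patch: treat the request as atomic */
	PCPU_REJECT_IN_RECLAIM,	/* the alternative: warn once and fail */
};

enum pcpu_path {
	PCPU_PATH_FAIL,		/* return NULL immediately */
	PCPU_PATH_ATOMIC,	/* populated areas only, mutex never taken */
	PCPU_PATH_SLEEPABLE,	/* may take the allocation mutex */
};

static bool warned_once;	/* models the one-shot warning */

static enum pcpu_path pcpu_pick_path(const struct task_model *t,
				     bool gfp_allows_blocking,
				     enum pcpu_policy policy)
{
	if (t->in_reclaim && policy == PCPU_REJECT_IN_RECLAIM) {
		warned_once = true;
		return PCPU_PATH_FAIL;
	}

	/* Mirrors: is_atomic = !allow_blocking || current->reclaim_state */
	if (!gfp_allows_blocking || t->in_reclaim)
		return PCPU_PATH_ATOMIC;

	return PCPU_PATH_SLEEPABLE;
}
```

Outside reclaim both policies behave identically. They only diverge for a caller in reclaim: the patch quietly serves it from populated areas, while the warning variant fails the request and makes the unexpected caller visible.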
> +	is_atomic = !gfpflags_allow_blocking(gfp) || current->reclaim_state;
>  	do_warn = !(gfp & __GFP_NOWARN);
>
>  	/*
> --
> 2.50.1 (Apple Git-155)
>
--
Pedro