From: Kaitao Cheng
To: Andrew Morton, Uladzislau Rezki, Dennis Zhou, Tejun Heo, Christoph Lameter
Cc: Vlastimil Babka, Michal Hocko, muchun.song@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengkaitao
Subject: [PATCH v3 0/3] mm/percpu: Fix possible NOFS/NOIO reclaim recursion
Date: Fri, 12 Jun 2026 10:26:45 +0800
Message-ID: <20260612022648.13008-1-kaitao.cheng@linux.dev>

From: chengkaitao

Hi all,

After v1 was posted, there were many different opinions, mainly around
optimizing pcpu_alloc_mutex. This v3 is intended to describe the
existing problems more clearly and provide a conventional fix approach.

Commit 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations
atomic") allowed GFP_NOFS and GFP_NOIO percpu allocations to use
pcpu_alloc_mutex and the chunk creation slow path. This restored the
allocation capability that was lost when those constrained allocations
were treated as atomic, but it also makes the percpu slow path visible
to callers from constrained reclaim contexts.

There are two related problems.

First, the create and populate slow paths do not fully preserve the
caller's allocation constraints. pcpu_alloc_noprof() derives pcpu_gfp
from the caller supplied GFP mask and passes it down to the percpu
backing page allocator.
However, chunk creation calls pcpu_get_vm_areas(), and chunk population
can allocate temporary metadata or vmalloc page tables while mapping
backing pages. Those internal allocations can still use GFP_KERNEL, so
a caller using GFP_NOFS or GFP_NOIO can enter unconstrained FS or IO
reclaim while holding pcpu_alloc_mutex.

One possible case is blk-cgroup after commit 5d726c4dbeed ("blk-cgroup:
fix possible deadlock while configuring policy"). blkg_conf_prep() now
serializes against blkcg_deactivate_policy() with q->blkcg_mutex, and
blkg_alloc() uses GFP_NOIO because queue freeze and IO reclaim
dependencies can otherwise deadlock. If the percpu slow path loses that
GFP_NOIO context, direct reclaim or writeback can issue IO to a frozen
queue while q->blkcg_mutex is held.

Second, allowing sleepable GFP_NOFS/GFP_NOIO allocations to take
pcpu_alloc_mutex means that unconstrained backing allocations made
under the mutex can create an FS/IO reclaim dependency against a
constrained caller which already holds an FS or IO lock and then waits
for pcpu_alloc_mutex.

This series fixes those issues in three steps:

- pass the caller supplied GFP mask into pcpu_get_vm_areas() and use
  it for vmalloc metadata and KASAN shadow allocations;
- pass the GFP mask through the chunk population path, including the
  temporary pages array and vmalloc page table allocation scope;
- restrict percpu backing allocations performed while holding
  pcpu_alloc_mutex to GFP_NOIO, so they cannot recurse into IO or FS
  reclaim.

This keeps sleepable GFP_NOFS/GFP_NOIO percpu allocations working,
while avoiding the reclaim recursion risks introduced by making those
allocations eligible for the mutex-protected slow path.

Changes in v3:
- Allow @gfp to pass __GFP_NOFAIL through.
  (Andrew Morton)

Changes in v2:
- split the previous first patch into vmalloc-area creation and chunk
  population changes; (Pedro Falcato)
- pass the GFP mask explicitly to pcpu_get_vm_areas(); (Pedro Falcato)
- apply the corresponding memalloc scope around vmalloc page table
  allocation during chunk population;
- replace the reclaim recursion avoidance with a GFP_NOIO backing
  allocation mask instead of only rejecting nested reclaim.
  (Michal Hocko)

Link to v2: https://lore.kernel.org/all/20260604113101.89510-1-kaitao.cheng@linux.dev/
Link to v1: https://lore.kernel.org/all/20260528132917.81123-1-kaitao.cheng@linux.dev/

Kaitao Cheng (3):
  mm/vmalloc: honor GFP constraints in pcpu_get_vm_areas()
  mm/percpu: honor GFP constraints when populating chunks
  mm/percpu: Avoid IO/FS reclaim in backing allocations

 include/linux/vmalloc.h |  4 ++--
 mm/percpu-vm.c          | 40 +++++++++++++++++++++++++++-------------
 mm/percpu.c             | 18 ++++++++++++------
 mm/vmalloc.c            | 23 ++++++++++++-----------
 4 files changed, 53 insertions(+), 32 deletions(-)

-- 
2.43.0
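
For readers less familiar with GFP masks, the third step of the series
(restricting backing allocations made under pcpu_alloc_mutex to
GFP_NOIO) can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions, not the code from the patches: the
model_* names and the bit values are hypothetical and chosen for
illustration. Only the relationship between the masks (GFP_NOIO allows
reclaim, GFP_NOFS additionally allows IO, GFP_KERNEL additionally
allows FS) mirrors the kernel's definitions, and reclaim is collapsed
into a single bit for simplicity.

```c
#include <stdint.h>

/* Hypothetical model type and bit values; the kernel's gfp_t bits differ. */
typedef uint32_t model_gfp_t;

#define MODEL_GFP_RECLAIM (1u << 0) /* allocation may sleep and reclaim */
#define MODEL_GFP_IO      (1u << 1) /* reclaim may start block IO */
#define MODEL_GFP_FS      (1u << 2) /* reclaim may call into filesystems */
#define MODEL_GFP_NOFAIL  (1u << 3) /* caller cannot handle failure */

/* Same composition as the kernel masks, expressed with the model bits. */
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

/*
 * Mask to use for backing allocations made while the (modelled) percpu
 * mutex is held: strip the IO and FS permissions so reclaim cannot
 * recurse into either, and leave every other caller bit untouched.
 */
static inline model_gfp_t model_backing_gfp(model_gfp_t gfp)
{
	return gfp & ~(model_gfp_t)(MODEL_GFP_IO | MODEL_GFP_FS);
}
```

With this model, a GFP_KERNEL or GFP_NOFS request is reduced to
GFP_NOIO, a GFP_NOIO request is unchanged, and a NOFAIL bit set by the
caller survives the restriction.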