From mboxrd@z Thu Jan  1 00:00:00 1970
From: Uladzislau Rezki <urezki@gmail.com>
X-Google-Original-From: Uladzislau Rezki <urezki@milan>
Date: Thu, 4 Jun 2026 18:49:52 +0200
To: Kaitao Cheng <kaitao.cheng@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Vlastimil Babka <vbabka@kernel.org>, Michal Hocko <mhocko@suse.com>,
	muchun.song@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Kaitao Cheng <chengkaitao@kylinos.cn>
Subject: Re: [PATCH v2 1/3] mm/vmalloc: honor GFP constraints in
 pcpu_get_vm_areas()
Message-ID: <aiGssN5OS0nN7a58@milan>
References: <20260604113101.89510-1-kaitao.cheng@linux.dev>
 <20260604113101.89510-2-kaitao.cheng@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20260604113101.89510-2-kaitao.cheng@linux.dev>

On Thu, Jun 04, 2026 at 07:30:59PM +0800, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
> 
> pcpu_alloc_noprof() derives pcpu_gfp from the caller-supplied GFP mask
> and passes it down to the backing percpu allocator. However, when the
> percpu vmalloc allocator has to create a new chunk, pcpu_create_chunk()
> calls pcpu_get_vm_areas() to allocate the corresponding vmalloc areas.
> 
> pcpu_get_vm_areas() currently performs its internal allocations with
> GFP_KERNEL, including vmap area metadata, vm_struct metadata and KASAN
> vmalloc shadow population. This means that a caller which deliberately
> uses GFP_NOFS or GFP_NOIO can still enter FS or IO reclaim while creating
> the vmalloc areas for a new percpu chunk.
> 
> One possible case is blk-cgroup after commit 5d726c4dbeed
> ("blk-cgroup: fix possible deadlock while configuring policy").
> blkg_conf_prep() now serializes against blkcg_deactivate_policy() with
> q->blkcg_mutex, and blkg_alloc() was changed to GFP_NOIO for that reason:
> 
>   CPU0: blkg_conf_prep()
>     mutex_lock(q->blkcg_mutex)
>     blkg_alloc(..., GFP_NOIO)
>       alloc_percpu_gfp(..., GFP_NOIO)
>         pcpu_alloc_noprof(..., GFP_NOIO)
>           pcpu_create_chunk(GFP_NOIO)
>             pcpu_get_vm_areas()
>               -> if percpu chunks are exhausted, chunk create may do
>                  internal GFP_KERNEL allocations
>               -> direct reclaim / writeback can issue IO to this queue
>               -> IO waits because the queue is frozen
> 
>   CPU1: blkcg_deactivate_policy()
>     blk_mq_freeze_queue(q)
>     mutex_lock(q->blkcg_mutex)
>       -> waits for CPU0
>     ... unfreeze only happens after q->blkcg_mutex is acquired/released
> 
> So the concern is that the caller deliberately uses GFP_NOIO because it
> may hold a lock which can be acquired after queue freeze, but the percpu
> slow path can temporarily lose that allocation context.
> 
> Pass the caller-supplied GFP mask from pcpu_create_chunk() to
> pcpu_get_vm_areas(), and use it for the internal vmalloc metadata and
> KASAN shadow allocations.
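
For anyone following along, the effect described above can be illustrated
with a small userspace toy model. This is only a sketch and not kernel
code: every toy_* name and every flag value below is hypothetical, and the
model only records which flag bits reach the innermost "allocation". It
assumes, as in the kernel, that GFP_KERNEL allows both IO and FS reclaim,
GFP_NOFS drops the FS bit, and GFP_NOIO drops both.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag type and bits. Names echo the kernel's, values are made up. */
typedef unsigned int toy_gfp_t;

#define TOY_GFP_RECLAIM 0x1u	/* allocation may enter reclaim */
#define TOY_GFP_IO      0x2u	/* reclaim may start IO */
#define TOY_GFP_FS      0x4u	/* reclaim may call into the filesystem */

#define TOY_GFP_KERNEL (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOFS   (TOY_GFP_RECLAIM | TOY_GFP_IO)
#define TOY_GFP_NOIO   (TOY_GFP_RECLAIM)

/* Union of the flags used by every toy allocation in the last call chain. */
static toy_gfp_t toy_flags_seen;

/* Stand-in for an allocator call: it only records the flags it was given. */
static void toy_alloc(toy_gfp_t gfp)
{
	toy_flags_seen |= gfp;
}

/* Models the old behaviour: no gfp argument, internals use GFP_KERNEL. */
static void toy_get_vm_areas_old(void)
{
	toy_alloc(TOY_GFP_KERNEL);
}

/* Models the patched behaviour: the caller's mask is forwarded. */
static void toy_get_vm_areas_new(toy_gfp_t gfp)
{
	toy_alloc(gfp);
}

/* Returns true if the chunk-creation path could start IO during reclaim. */
static bool toy_create_chunk_may_do_io(toy_gfp_t gfp, bool patched)
{
	toy_flags_seen = 0;
	if (patched)
		toy_get_vm_areas_new(gfp);
	else
		toy_get_vm_areas_old();
	return (toy_flags_seen & TOY_GFP_IO) != 0;
}

/* Usage example: the three cases relevant to the scenario above. */
static void toy_self_check(void)
{
	/* Old path: a NOIO caller still ends up allowing IO. */
	assert(toy_create_chunk_may_do_io(TOY_GFP_NOIO, false));
	/* Patched path: the NOIO constraint survives. */
	assert(!toy_create_chunk_may_do_io(TOY_GFP_NOIO, true));
	/* Patched path: NOFS still permits IO, as expected. */
	assert(toy_create_chunk_may_do_io(TOY_GFP_NOFS, true));
}
```

The model deliberately ignores locking and reclaim itself. It only shows
that a hard-coded mask in the callee widens the caller's constraints,
while forwarding the mask preserves them.
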
> 
> Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> ---
>  include/linux/vmalloc.h |  4 ++--
>  mm/percpu-vm.c          |  2 +-
>  mm/vmalloc.c            | 23 ++++++++++++-----------
>  3 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 3b02c0c6b371..9601e06624c8 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -308,14 +308,14 @@ static inline void set_vm_flush_reset_perms(void *addr) {}
>  #if defined(CONFIG_MMU) && defined(CONFIG_SMP)
>  struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     const size_t *sizes, int nr_vms,
> -				     size_t align);
> +				     size_t align, gfp_t gfp);
>  
>  void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms);
>  # else
>  static inline struct vm_struct **
>  pcpu_get_vm_areas(const unsigned long *offsets,
>  		const size_t *sizes, int nr_vms,
> -		size_t align)
> +		size_t align, gfp_t gfp)
>  {
>  	return NULL;
>  }
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 4f5937090590..69b00741dc68 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -340,7 +340,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
>  		return NULL;
>  
>  	vms = pcpu_get_vm_areas(pcpu_group_offsets, pcpu_group_sizes,
> -				pcpu_nr_groups, pcpu_atom_size);
> +				pcpu_nr_groups, pcpu_atom_size, gfp);
>  	if (!vms) {
>  		pcpu_free_chunk(chunk);
>  		return NULL;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1afca3568b9b..08f468135e4d 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4946,16 +4946,17 @@ pvm_determine_end_from_reverse(struct vmap_area **va, unsigned long align)
>   * @sizes: array containing size of each area
>   * @nr_vms: the number of areas to allocate
>   * @align: alignment, all entries in @offsets and @sizes must be aligned to this
> + * @gfp: allocation flags passed to the underlying memory allocator
>   *
>   * Returns: kmalloc'd vm_struct pointer array pointing to allocated
>   *	    vm_structs on success, %NULL on failure
>   *
>   * Percpu allocator wants to use congruent vm areas so that it can
>   * maintain the offsets among percpu areas.  This function allocates
> - * congruent vmalloc areas for it with GFP_KERNEL.  These areas tend to
> - * be scattered pretty far, distance between two areas easily going up
> - * to gigabytes.  To avoid interacting with regular vmallocs, these
> - * areas are allocated from top.
> + * congruent vmalloc areas for it. These areas tend to be scattered
> + * pretty far, distance between two areas easily going up to gigabytes.
> + * To avoid interacting with regular vmallocs, these areas are allocated
> + * from top.
>   *
>   * Despite its complicated look, this allocator is rather simple. It
>   * does everything top-down and scans free blocks from the end looking
> @@ -4966,7 +4967,7 @@ pvm_determine_end_from_reverse(struct vmap_area **va, unsigned long align)
>   */
>  struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     const size_t *sizes, int nr_vms,
> -				     size_t align)
> +				     size_t align, gfp_t gfp)
>  {
>  	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
>  	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> @@ -5004,14 +5005,14 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  		return NULL;
>  	}
>  
> -	vms = kzalloc_objs(vms[0], nr_vms);
> -	vas = kzalloc_objs(vas[0], nr_vms);
> +	vms = kzalloc_objs(vms[0], nr_vms, gfp);
> +	vas = kzalloc_objs(vas[0], nr_vms, gfp);
>  	if (!vas || !vms)
>  		goto err_free2;
>  
>  	for (area = 0; area < nr_vms; area++) {
> -		vas[area] = kmem_cache_zalloc(vmap_area_cachep, GFP_KERNEL);
> -		vms[area] = kzalloc_obj(struct vm_struct);
> +		vas[area] = kmem_cache_zalloc(vmap_area_cachep, gfp);
> +		vms[area] = kzalloc_obj(struct vm_struct, gfp);
>  		if (!vas[area] || !vms[area])
>  			goto err_free;
>  	}
> @@ -5101,7 +5102,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  
>  	/* populate the kasan shadow space */
>  	for (area = 0; area < nr_vms; area++) {
> -		if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area], GFP_KERNEL))
> +		if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area], gfp))
>  			goto err_free_shadow;
>  	}
>  
> @@ -5158,7 +5159,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				continue;
>  
>  			vas[area] = kmem_cache_zalloc(
> -				vmap_area_cachep, GFP_KERNEL);
> +				vmap_area_cachep, gfp);
>  			if (!vas[area])
>  				goto err_free;
>  		}
> -- 
> 2.43.0
> 
Looks good to me:

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki