X-Received: by 2002:a05:600c:c48f:b0:490:b645:3213 with SMTP id 5b1f17b1804b1-4923f56c119mr7287915e9.19.1781802331500;
        Thu, 18 Jun 2026 10:05:31 -0700 (PDT)
Received: from localhost (109-81-26-193.rct.o2.cz. [109.81.26.193])
        by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4923fd33dafsm2921175e9.8.2026.06.18.10.05.30
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 18 Jun 2026 10:05:31 -0700 (PDT)
Date: Thu, 18 Jun 2026 19:05:30 +0200
From: Michal Hocko
To: Kaitao Cheng
Cc: Andrew Morton, Uladzislau Rezki, Dennis Zhou, Tejun Heo,
        Christoph Lameter, Vlastimil Babka, Pedro Falcato,
        linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        Kaitao Cheng, Shivam Kalra
Subject: Re: [PATCH v4 1/4] mm/vmalloc: honor GFP constraints in pcpu_get_vm_areas()
Message-ID:
References: <20260618130414.96383-1-kaitao.cheng@linux.dev>
        <20260618130414.96383-2-kaitao.cheng@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20260618130414.96383-2-kaitao.cheng@linux.dev>

On Thu 18-06-26 21:04:11, Kaitao Cheng wrote:
> From: Kaitao Cheng
>
> pcpu_alloc_noprof() derives pcpu_gfp from the caller supplied GFP mask
> and passes it down to the backing percpu allocator. However, when the
> percpu vmalloc allocator has to create a new chunk, pcpu_create_chunk()
> calls pcpu_get_vm_areas() to allocate the corresponding vmalloc areas.
>
> pcpu_get_vm_areas() currently performs its internal allocations with
> GFP_KERNEL, including vmap area metadata, vm_struct metadata and KASAN
> vmalloc shadow population. This means that a caller which deliberately
> uses GFP_NOFS or GFP_NOIO can still enter FS or IO reclaim while creating
> the vmalloc areas for a new percpu chunk.
>
> One possible case is blk-cgroup after commit 5d726c4dbeed
> ("blk-cgroup: fix possible deadlock while configuring policy").
> blkg_conf_prep() now serializes against blkcg_deactivate_policy() with
> q->blkcg_mutex, and blkg_alloc() was changed to GFP_NOIO for that reason:
>
>   CPU0: blkg_conf_prep()
>     mutex_lock(q->blkcg_mutex)
>     blkg_alloc(..., GFP_NOIO)
>       alloc_percpu_gfp(..., GFP_NOIO)
>         pcpu_alloc_noprof(..., GFP_NOIO)
>           pcpu_create_chunk(GFP_NOIO)
>             pcpu_get_vm_areas()
>               -> if percpu chunks are exhausted, chunk create may do
>                  internal GFP_KERNEL allocations
>               -> direct reclaim / writeback can issue IO to this queue
>               -> IO waits because the queue is frozen
>
>   CPU1: blkcg_deactivate_policy()
>     blk_mq_freeze_queue(q)
>     mutex_lock(q->blkcg_mutex)
>       -> waits for CPU0
>     ... unfreeze only happens after q->blkcg_mutex is acquired/released
>
> So the concern is that the caller deliberately uses GFP_NOIO because it
> may hold a lock which can be acquired after queue freeze, but the percpu
> slow path can temporarily lose that allocation context.
>
> Pass the caller supplied GFP mask from pcpu_create_chunk() to
> pcpu_get_vm_areas(), and use it for the internal vmalloc metadata and
> KASAN shadow allocations.
>
> Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
> Signed-off-by: Kaitao Cheng
> Reviewed-by: Uladzislau Rezki (Sony)
> Reviewed-by: Shivam Kalra
> Acked-by: Dennis Zhou

LGTM
Acked-by: Michal Hocko

Thanks for fixing this!

> ---
>  include/linux/vmalloc.h |  4 ++--
>  mm/percpu-vm.c          |  2 +-
>  mm/vmalloc.c            | 23 ++++++++++++-----------
>  3 files changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index d87dc7f77f4e..e4d8d0a9f30f 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -310,14 +310,14 @@ static inline void set_vm_flush_reset_perms(void *addr) {}
>  #if defined(CONFIG_MMU) && defined(CONFIG_SMP)
>  struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     const size_t *sizes, int nr_vms,
> -				     size_t align);
> +				     size_t align, gfp_t gfp);
>  
>  void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms);
>  # else
>  static inline struct vm_struct **
>  pcpu_get_vm_areas(const unsigned long *offsets,
>  		const size_t *sizes, int nr_vms,
> -		size_t align)
> +		size_t align, gfp_t gfp)
>  {
>  	return NULL;
>  }
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 4f5937090590..69b00741dc68 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -340,7 +340,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
>  		return NULL;
>  
>  	vms = pcpu_get_vm_areas(pcpu_group_offsets, pcpu_group_sizes,
> -				pcpu_nr_groups, pcpu_atom_size);
> +				pcpu_nr_groups, pcpu_atom_size, gfp);
>  	if (!vms) {
>  		pcpu_free_chunk(chunk);
>  		return NULL;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1afca3568b9b..08f468135e4d 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4946,16 +4946,17 @@ pvm_determine_end_from_reverse(struct vmap_area **va, unsigned long align)
>   * @sizes: array containing size of each area
>   * @nr_vms: the number of areas to allocate
>   * @align: alignment, all entries in @offsets and @sizes must be aligned to this
> + * @gfp: allocation flags passed to the underlying memory allocator
>   *
>   * Returns: kmalloc'd vm_struct pointer array pointing to allocated
>   *	    vm_structs on success, %NULL on failure
>   *
>   * Percpu allocator wants to use congruent vm areas so that it can
>   * maintain the offsets among percpu areas. This function allocates
> - * congruent vmalloc areas for it with GFP_KERNEL. These areas tend to
> - * be scattered pretty far, distance between two areas easily going up
> - * to gigabytes. To avoid interacting with regular vmallocs, these
> - * areas are allocated from top.
> + * congruent vmalloc areas for it. These areas tend to be scattered
> + * pretty far, distance between two areas easily going up to gigabytes.
> + * To avoid interacting with regular vmallocs, these areas are allocated
> + * from top.
>   *
>   * Despite its complicated look, this allocator is rather simple. It
>   * does everything top-down and scans free blocks from the end looking
> @@ -4966,7 +4967,7 @@ pvm_determine_end_from_reverse(struct vmap_area **va, unsigned long align)
>   */
>  struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     const size_t *sizes, int nr_vms,
> -				     size_t align)
> +				     size_t align, gfp_t gfp)
>  {
>  	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
>  	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> @@ -5004,14 +5005,14 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  		return NULL;
>  	}
>  
> -	vms = kzalloc_objs(vms[0], nr_vms);
> -	vas = kzalloc_objs(vas[0], nr_vms);
> +	vms = kzalloc_objs(vms[0], nr_vms, gfp);
> +	vas = kzalloc_objs(vas[0], nr_vms, gfp);
>  	if (!vas || !vms)
>  		goto err_free2;
>  
>  	for (area = 0; area < nr_vms; area++) {
> -		vas[area] = kmem_cache_zalloc(vmap_area_cachep, GFP_KERNEL);
> -		vms[area] = kzalloc_obj(struct vm_struct);
> +		vas[area] = kmem_cache_zalloc(vmap_area_cachep, gfp);
> +		vms[area] = kzalloc_obj(struct vm_struct, gfp);
>  		if (!vas[area] || !vms[area])
>  			goto err_free;
>  	}
> @@ -5101,7 +5102,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  
>  	/* populate the kasan shadow space */
>  	for (area = 0; area < nr_vms; area++) {
> -		if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area], GFP_KERNEL))
> +		if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area], gfp))
>  			goto err_free_shadow;
>  	}
>  
> @@ -5158,7 +5159,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				continue;
>  
>  			vas[area] = kmem_cache_zalloc(
> -				vmap_area_cachep, GFP_KERNEL);
> +				vmap_area_cachep, gfp);
>  			if (!vas[area])
>  				goto err_free;
>  		}
> -- 
> 2.50.1 (Apple Git-155)
>

-- 
Michal Hocko
SUSE Labs