linux-mm.kvack.org archive mirror
* [RFC 0/7] vmalloc and non-blocking GFPs
@ 2025-07-04 15:25 Uladzislau Rezki (Sony)
  2025-07-04 15:25 ` [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case Uladzislau Rezki (Sony)
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki

This is a small series that aims to support non-blocking GFP flags,
such as GFP_ATOMIC or GFP_NOWAIT, in vmalloc. This is a draft version
and it should be improved or changed.

For example, there are still hard-coded GFP flags in:
    kmsan_vmap_pages_range_noflush(addr, end, prot, pages, page_shift);

if the kernel is built with KMSAN support. There are other parts which
should be fixed. But I tested this series with a fresh non-block-alloc
test together with CONFIG_DEBUG_ATOMIC_SLEEP=y to track sleep-in-atomic
issues.
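
As an illustration of what the series is meant to enable, here is a
minimal sketch of a caller in a non-sleeping context (illustrative
only, not part of this series):

<snip>
    static DEFINE_SPINLOCK(lock);

    static void example(void)
    {
        void *ptr;

        spin_lock(&lock);
        /* Sleeping is forbidden here, a blocking GFP would be a bug. */
        ptr = __vmalloc(PAGE_SIZE, GFP_ATOMIC);
        spin_unlock(&lock);

        if (ptr) {
            /* ... use the mapping ... */
            vfree(ptr);
        }
    }
<snip>

The new test case from patch 1 can then be exercised, for example, via:

    modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x800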

Based on:

VERSION = 6
PATCHLEVEL = 16
SUBLEVEL = 0
EXTRAVERSION = -rc1

Uladzislau Rezki (Sony) (7):
  lib/test_vmalloc: Add non-block-alloc-test case
  mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area()
  mm/vmalloc: Avoid cond_resched() when blocking is not permitted
  mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc()
  mm/vmalloc: Defer freeing partly initialized vm_struct
  mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  mm: Drop __GFP_DIRECT_RECLAIM flag if PF_MEMALLOC is set

 include/linux/kasan.h    |  6 +--
 include/linux/sched/mm.h |  7 ++-
 lib/test_vmalloc.c       | 27 ++++++++++++
 mm/kasan/shadow.c        | 22 +++++++---
 mm/vmalloc.c             | 93 +++++++++++++++++++++++++++++++++-------
 5 files changed, 129 insertions(+), 26 deletions(-)

-- 
2.39.5




* [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case
  2025-07-04 15:25 [RFC 0/7] vmalloc and non-blocking GFPs Uladzislau Rezki (Sony)
@ 2025-07-04 15:25 ` Uladzislau Rezki (Sony)
  2025-07-08  5:59   ` [External] " Adrian Huang12
  2025-07-04 15:25 ` [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area() Uladzislau Rezki (Sony)
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 lib/test_vmalloc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 1b0b59549aaf..9e3429dfe176 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -54,6 +54,7 @@ __param(int, run_test_mask, INT_MAX,
 		"\t\tid: 256,  name: kvfree_rcu_1_arg_vmalloc_test\n"
 		"\t\tid: 512,  name: kvfree_rcu_2_arg_vmalloc_test\n"
 		"\t\tid: 1024, name: vm_map_ram_test\n"
+		"\t\tid: 2048, name: no_block_alloc_test\n"
 		/* Add a new test case description here. */
 );
 
@@ -283,6 +284,31 @@ static int fix_size_alloc_test(void)
 	return 0;
 }
 
+static DEFINE_SPINLOCK(no_block_alloc_lock);
+
+static int no_block_alloc_test(void)
+{
+	void *ptr;
+	u8 rnd;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		rnd = get_random_u8();
+
+		spin_lock(&no_block_alloc_lock);
+		ptr = __vmalloc(PAGE_SIZE, (rnd % 2) ? GFP_NOWAIT : GFP_ATOMIC);
+		spin_unlock(&no_block_alloc_lock);
+
+		if (!ptr)
+			return -1;
+
+		*((__u8 *)ptr) = 0;
+		vfree(ptr);
+	}
+
+	return 0;
+}
+
 static int
 pcpu_alloc_test(void)
 {
@@ -410,6 +436,7 @@ static struct test_case_desc test_case_array[] = {
 	{ "kvfree_rcu_1_arg_vmalloc_test", kvfree_rcu_1_arg_vmalloc_test },
 	{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test },
 	{ "vm_map_ram_test", vm_map_ram_test },
+	{ "no_block_alloc_test", no_block_alloc_test },
 	/* Add a new test case here. */
 };
 
-- 
2.39.5




* [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area()
  2025-07-04 15:25 [RFC 0/7] vmalloc and non-blocking GFPs Uladzislau Rezki (Sony)
  2025-07-04 15:25 ` [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case Uladzislau Rezki (Sony)
@ 2025-07-04 15:25 ` Uladzislau Rezki (Sony)
  2025-07-07  7:11   ` Michal Hocko
  2025-07-04 15:25 ` [RFC 3/7] mm/vmalloc: Avoid cond_resched() when blocking is not permitted Uladzislau Rezki (Sony)
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki

alloc_vmap_area() currently assumes that sleeping is allowed during
allocation. This is not true for callers which pass non-blocking
GFP flags, such as GFP_ATOMIC or GFP_NOWAIT.

This patch adds logic to detect whether the given gfp_mask permits
blocking. It avoids invoking might_sleep() or falling back to the
reclaim path if blocking is not allowed.

This makes alloc_vmap_area() safer for use in non-sleeping contexts,
where previously it could hit unexpected sleeps and trigger warnings.
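
For reference, the blocking check uses gfpflags_allow_blocking() from
include/linux/gfp.h, which simply tests __GFP_DIRECT_RECLAIM:

<snip>
    static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
    {
        return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
    }
<snip>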

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ab986dd09b6a..8c375b8e269d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2009,6 +2009,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	unsigned long freed;
 	unsigned long addr;
 	unsigned int vn_id;
+	bool allow_block;
 	int purged = 0;
 	int ret;
 
@@ -2018,7 +2019,9 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	if (unlikely(!vmap_initialized))
 		return ERR_PTR(-EBUSY);
 
-	might_sleep();
+	allow_block = gfpflags_allow_blocking(gfp_mask);
+	if (allow_block)
+		might_sleep();
 
 	/*
 	 * If a VA is obtained from a global heap(if it fails here)
@@ -2030,7 +2033,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 */
 	va = node_alloc(size, align, vstart, vend, &addr, &vn_id);
 	if (!va) {
-		gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
+		if (allow_block)
+			gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
 
 		va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
 		if (unlikely(!va))
@@ -2057,8 +2061,14 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * If an allocation fails, the error value is
 	 * returned. Therefore trigger the overflow path.
 	 */
-	if (IS_ERR_VALUE(addr))
+	if (IS_ERR_VALUE(addr)) {
+		if (!allow_block) {
+			kmem_cache_free(vmap_area_cachep, va);
+			return ERR_PTR(-ENOMEM);
+		}
+
 		goto overflow;
+	}
 
 	va->va_start = addr;
 	va->va_end = addr + size;
-- 
2.39.5




* [RFC 3/7] mm/vmalloc: Avoid cond_resched() when blocking is not permitted
  2025-07-04 15:25 [RFC 0/7] vmalloc and non-blocking GFPs Uladzislau Rezki (Sony)
  2025-07-04 15:25 ` [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case Uladzislau Rezki (Sony)
  2025-07-04 15:25 ` [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area() Uladzislau Rezki (Sony)
@ 2025-07-04 15:25 ` Uladzislau Rezki (Sony)
  2025-07-07  7:11   ` Michal Hocko
  2025-07-04 15:25 ` [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc() Uladzislau Rezki (Sony)
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki

The vm_area_alloc_pages() function uses cond_resched() to yield the
CPU during potentially long-running loops. However, yielding should
only be done if the given GFP flags allow blocking.

This patch avoids calling cond_resched() when the allocation context
is non-blocking (GFP_ATOMIC, GFP_NOWAIT).

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8c375b8e269d..25d09f753239 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3624,7 +3624,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 							pages + nr_allocated);
 
 			nr_allocated += nr;
-			cond_resched();
+
+			if (gfpflags_allow_blocking(gfp))
+				cond_resched();
 
 			/*
 			 * If zero or pages were obtained partly,
@@ -3666,7 +3668,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		for (i = 0; i < (1U << order); i++)
 			pages[nr_allocated + i] = page + i;
 
-		cond_resched();
+		if (gfpflags_allow_blocking(gfp))
+			cond_resched();
+
 		nr_allocated += 1U << order;
 	}
 
-- 
2.39.5




* [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc()
  2025-07-04 15:25 [RFC 0/7] vmalloc and non-blocking GFPs Uladzislau Rezki (Sony)
                   ` (2 preceding siblings ...)
  2025-07-04 15:25 ` [RFC 3/7] mm/vmalloc: Avoid cond_resched() when blocking is not permitted Uladzislau Rezki (Sony)
@ 2025-07-04 15:25 ` Uladzislau Rezki (Sony)
  2025-07-07  1:47   ` Baoquan He
  2025-07-04 15:25 ` [RFC 5/7] mm/vmalloc: Defer freeing partly initialized vm_struct Uladzislau Rezki (Sony)
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki, Andrey Ryabinin,
	Alexander Potapenko

The function kasan_populate_vmalloc() internally allocates pages using
a hardcoded GFP_KERNEL flag. This is not safe in contexts where non-blocking
allocation flags are required, such as GFP_ATOMIC or GFP_NOWAIT, for example
in atomic vmalloc paths.

This patch modifies kasan_populate_vmalloc() and its helpers to accept a
gfp_mask argument and use it for page allocations. This allows the caller to
specify the correct allocation context.

Also, when non-blocking flags are used, apply_to_page_range() is wrapped
in memalloc_noreclaim_save/restore() to suppress potential reclaim behavior
that may otherwise violate atomic constraints.
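
The scope API pattern applied here looks roughly as follows (a sketch,
the actual hunk is in the diff below):

<snip>
    unsigned int flags;

    /* Sets PF_MEMALLOC: allocations in this scope skip direct reclaim. */
    flags = memalloc_noreclaim_save();

    ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
                              kasan_populate_vmalloc_pte, &data);

    memalloc_noreclaim_restore(flags);
<snip>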

Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/kasan.h |  6 +++---
 mm/kasan/shadow.c     | 22 +++++++++++++++-------
 mm/vmalloc.c          |  4 ++--
 3 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 890011071f2b..fe5ce9215821 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -562,7 +562,7 @@ static inline void kasan_init_hw_tags(void) { }
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 
 void kasan_populate_early_vm_area_shadow(void *start, unsigned long size);
-int kasan_populate_vmalloc(unsigned long addr, unsigned long size);
+int kasan_populate_vmalloc(unsigned long addr, unsigned long size, gfp_t gfp_mask);
 void kasan_release_vmalloc(unsigned long start, unsigned long end,
 			   unsigned long free_region_start,
 			   unsigned long free_region_end,
@@ -574,7 +574,7 @@ static inline void kasan_populate_early_vm_area_shadow(void *start,
 						       unsigned long size)
 { }
 static inline int kasan_populate_vmalloc(unsigned long start,
-					unsigned long size)
+					unsigned long size, gfp_t gfp_mask)
 {
 	return 0;
 }
@@ -610,7 +610,7 @@ static __always_inline void kasan_poison_vmalloc(const void *start,
 static inline void kasan_populate_early_vm_area_shadow(void *start,
 						       unsigned long size) { }
 static inline int kasan_populate_vmalloc(unsigned long start,
-					unsigned long size)
+					unsigned long size, gfp_t gfp_mask)
 {
 	return 0;
 }
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index d2c70cd2afb1..5edfc1f6b53e 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -335,13 +335,13 @@ static void ___free_pages_bulk(struct page **pages, int nr_pages)
 	}
 }
 
-static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
+static int ___alloc_pages_bulk(struct page **pages, int nr_pages, gfp_t gfp_mask)
 {
 	unsigned long nr_populated, nr_total = nr_pages;
 	struct page **page_array = pages;
 
 	while (nr_pages) {
-		nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, pages);
+		nr_populated = alloc_pages_bulk(gfp_mask, nr_pages, pages);
 		if (!nr_populated) {
 			___free_pages_bulk(page_array, nr_total - nr_pages);
 			return -ENOMEM;
@@ -353,25 +353,33 @@ static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
 	return 0;
 }
 
-static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
+static int __kasan_populate_vmalloc(unsigned long start, unsigned long end, gfp_t gfp_mask)
 {
 	unsigned long nr_pages, nr_total = PFN_UP(end - start);
+	bool noblock = !gfpflags_allow_blocking(gfp_mask);
 	struct vmalloc_populate_data data;
+	unsigned int flags;
 	int ret = 0;
 
-	data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+	data.pages = (struct page **)__get_free_page(gfp_mask | __GFP_ZERO);
 	if (!data.pages)
 		return -ENOMEM;
 
 	while (nr_total) {
 		nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
-		ret = ___alloc_pages_bulk(data.pages, nr_pages);
+		ret = ___alloc_pages_bulk(data.pages, nr_pages, gfp_mask);
 		if (ret)
 			break;
 
 		data.start = start;
+		if (noblock)
+			flags = memalloc_noreclaim_save();
+
 		ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
 					  kasan_populate_vmalloc_pte, &data);
+		if (noblock)
+			memalloc_noreclaim_restore(flags);
+
 		___free_pages_bulk(data.pages, nr_pages);
 		if (ret)
 			break;
@@ -385,7 +393,7 @@ static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
 	return ret;
 }
 
-int kasan_populate_vmalloc(unsigned long addr, unsigned long size)
+int kasan_populate_vmalloc(unsigned long addr, unsigned long size, gfp_t gfp_mask)
 {
 	unsigned long shadow_start, shadow_end;
 	int ret;
@@ -414,7 +422,7 @@ int kasan_populate_vmalloc(unsigned long addr, unsigned long size)
 	shadow_start = PAGE_ALIGN_DOWN(shadow_start);
 	shadow_end = PAGE_ALIGN(shadow_end);
 
-	ret = __kasan_populate_vmalloc(shadow_start, shadow_end);
+	ret = __kasan_populate_vmalloc(shadow_start, shadow_end, gfp_mask);
 	if (ret)
 		return ret;
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 25d09f753239..5bac15b09b03 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2091,7 +2091,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	BUG_ON(va->va_start < vstart);
 	BUG_ON(va->va_end > vend);
 
-	ret = kasan_populate_vmalloc(addr, size);
+	ret = kasan_populate_vmalloc(addr, size, gfp_mask);
 	if (ret) {
 		free_vmap_area(va);
 		return ERR_PTR(ret);
@@ -4832,7 +4832,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 
 	/* populate the kasan shadow space */
 	for (area = 0; area < nr_vms; area++) {
-		if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area]))
+		if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area], GFP_KERNEL))
 			goto err_free_shadow;
 	}
 
-- 
2.39.5




* [RFC 5/7] mm/vmalloc: Defer freeing partly initialized vm_struct
  2025-07-04 15:25 [RFC 0/7] vmalloc and non-blocking GFPs Uladzislau Rezki (Sony)
                   ` (3 preceding siblings ...)
  2025-07-04 15:25 ` [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc() Uladzislau Rezki (Sony)
@ 2025-07-04 15:25 ` Uladzislau Rezki (Sony)
  2025-07-04 15:25 ` [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node() Uladzislau Rezki (Sony)
  2025-07-04 15:25 ` [RFC 7/7] mm: Drop __GFP_DIRECT_RECLAIM flag if PF_MEMALLOC is set Uladzislau Rezki (Sony)
  6 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki

__vmalloc_area_node() may call free_vmap_area() or vfree() on
error paths, both of which can sleep. This becomes problematic
if the function is invoked from an atomic context, such as when
GFP_ATOMIC or GFP_NOWAIT is passed via gfp_mask.

To fix this, unify error paths and defer the cleanup of partly
initialized vm_struct objects to a workqueue. This ensures that
freeing happens in a process context and avoids invalid sleeps
in atomic regions.
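
The deferral follows the common lock-free llist plus workqueue idiom,
roughly as in this generic sketch (free_object() and the llnode member
are placeholders, the concrete code is in the diff below):

<snip>
    static LLIST_HEAD(pending);

    static void cleanup_work_fn(struct work_struct *work)
    {
        struct llist_node *node, *next;

        /* Grab the whole list atomically, then free in process context. */
        llist_for_each_safe(node, next, llist_del_all(&pending))
            free_object(node);
    }
    static DECLARE_WORK(cleanup_work, cleanup_work_fn);

    /*
     * Producer side, safe to call from atomic context. llist_add()
     * returns true only when the list was previously empty, so the
     * work is scheduled once per batch.
     */
    if (llist_add(&obj->llnode, &pending))
        schedule_work(&cleanup_work);
<snip>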

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5bac15b09b03..2eaff0575a9e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3677,6 +3677,36 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	return nr_allocated;
 }
 
+static LLIST_HEAD(pending_vm_area_cleanup);
+
+static void cleanup_vm_area_work(struct work_struct *work)
+{
+	struct llist_node *node, *next;
+	struct vm_struct *vm;
+
+	llist_for_each_safe(node, next, llist_del_all(&pending_vm_area_cleanup)) {
+		vm = (void *) node - offsetof(struct vm_struct, next);
+
+		if (!vm->nr_pages)
+			free_vm_area(vm);
+		else
+			vfree(vm->addr);
+	}
+}
+
+static DECLARE_WORK(cleanup_vm_area, cleanup_vm_area_work);
+
+/*
+ * Helper for __vmalloc_area_node() to defer cleanup
+ * of partially initialized vm_struct in error paths.
+ */
+static void
+defer_vm_area_cleanup(struct vm_struct *area)
+{
+	if (llist_add((struct llist_node *) &area->next, &pending_vm_area_cleanup))
+		schedule_work(&cleanup_vm_area);
+}
+
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, unsigned int page_shift,
 				 int node)
@@ -3708,8 +3738,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, failed to allocated page array size %lu",
 			nr_small_pages * PAGE_SIZE, array_size);
-		free_vm_area(area);
-		return NULL;
+		goto fail;
 	}
 
 	set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
@@ -3786,7 +3815,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	return area->addr;
 
 fail:
-	vfree(area->addr);
+	defer_vm_area_cleanup(area);
 	return NULL;
 }
 
-- 
2.39.5




* [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  2025-07-04 15:25 [RFC 0/7] vmalloc and non-blocking GFPs Uladzislau Rezki (Sony)
                   ` (4 preceding siblings ...)
  2025-07-04 15:25 ` [RFC 5/7] mm/vmalloc: Defer freeing partly initialized vm_struct Uladzislau Rezki (Sony)
@ 2025-07-04 15:25 ` Uladzislau Rezki (Sony)
  2025-07-07  7:13   ` Michal Hocko
  2025-07-08 15:47   ` Michal Hocko
  2025-07-04 15:25 ` [RFC 7/7] mm: Drop __GFP_DIRECT_RECLAIM flag if PF_MEMALLOC is set Uladzislau Rezki (Sony)
  6 siblings, 2 replies; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki

This patch makes __vmalloc_area_node() correctly handle non-blocking
allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:

- nested_gfp flag follows the same non-blocking constraints
  as the primary gfp_mask, ensuring consistency and avoiding
  sleeping allocations in atomic contexts.

- if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
  and warning is issued if it was set, since __GFP_NOFAIL is
  incompatible with non-blocking contexts;

- Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
  if there are no DMA constraints.

- in non-blocking mode we use memalloc_noreclaim_save/restore()
  to prevent reclaim related operations that may sleep while
  setting up page tables or mapping pages.

This is particularly important for page table allocations that
internally use GFP_PGTABLE_KERNEL, which may sleep unless such
scope restrictions are applied. For example:

<snip>
    #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)

    __pte_alloc_kernel()
        pte_alloc_one_kernel(&init_mm);
            pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2eaff0575a9e..fe1699e01e02 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, unsigned int page_shift,
 				 int node)
 {
-	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
+	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	bool nofail = gfp_mask & __GFP_NOFAIL;
 	unsigned long addr = (unsigned long)area->addr;
 	unsigned long size = get_vm_area_size(area);
@@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	unsigned int nr_small_pages = size >> PAGE_SHIFT;
 	unsigned int page_order;
 	unsigned int flags;
+	bool noblock;
 	int ret;
 
 	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
+	noblock = !gfpflags_allow_blocking(gfp_mask);
 
-	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
-		gfp_mask |= __GFP_HIGHMEM;
+	if (noblock) {
+		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
+		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+		gfp_mask &= ~__GFP_NOFAIL;
+
+		/*
+		 * In non-sleeping contexts, ensure nested allocations follow
+		 * same non-blocking rules.
+		 */
+		nested_gfp = gfp_mask | __GFP_ZERO;
+		nofail = false;
+	} else {
+		/* Allow highmem allocations if there are no DMA constraints. */
+		if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
+			gfp_mask |= __GFP_HIGHMEM;
+	}
 
 	/* Please note that the recursion is strictly bounded. */
 	if (array_size > PAGE_SIZE) {
@@ -3788,7 +3804,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	 * page tables allocations ignore external gfp mask, enforce it
 	 * by the scope API
 	 */
-	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+	if (noblock)
+		flags = memalloc_noreclaim_save();
+	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		flags = memalloc_nofs_save();
 	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
 		flags = memalloc_noio_save();
@@ -3800,7 +3818,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			schedule_timeout_uninterruptible(1);
 	} while (nofail && (ret < 0));
 
-	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+	if (noblock)
+		memalloc_noreclaim_restore(flags);
+	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		memalloc_nofs_restore(flags);
 	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
 		memalloc_noio_restore(flags);
-- 
2.39.5




* [RFC 7/7] mm: Drop __GFP_DIRECT_RECLAIM flag if PF_MEMALLOC is set
  2025-07-04 15:25 [RFC 0/7] vmalloc and non-blocking GFPs Uladzislau Rezki (Sony)
                   ` (5 preceding siblings ...)
  2025-07-04 15:25 ` [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node() Uladzislau Rezki (Sony)
@ 2025-07-04 15:25 ` Uladzislau Rezki (Sony)
  6 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-07-04 15:25 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Michal Hocko, LKML, Baoquan He, Uladzislau Rezki

The memory allocator already avoids reclaim when PF_MEMALLOC is set.
Clear __GFP_DIRECT_RECLAIM explicitly to suppress might_alloc() warnings
and to make the effective GFP context more accurate.
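
The intended effect, as a sketch (assuming the task runs with
PF_MEMALLOC set, e.g. inside a memalloc_noreclaim_save() scope):

<snip>
    unsigned int flags = memalloc_noreclaim_save(); /* sets PF_MEMALLOC */

    /*
     * Before this patch: returns GFP_KERNEL unchanged, even though
     * reclaim is skipped anyway due to PF_MEMALLOC.
     * After: __GFP_DIRECT_RECLAIM is cleared, so the effective mask
     * matches the actual non-reclaiming behavior.
     */
    gfp_t gfp = current_gfp_context(GFP_KERNEL);

    memalloc_noreclaim_restore(flags);
<snip>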

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/sched/mm.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index b13474825130..40757173acb1 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -246,12 +246,14 @@ static inline bool in_vfork(struct task_struct *tsk)
  * PF_MEMALLOC_NOIO implies GFP_NOIO
  * PF_MEMALLOC_NOFS implies GFP_NOFS
  * PF_MEMALLOC_PIN  implies !GFP_MOVABLE
+ * PF_MEMALLOC      implies !__GFP_DIRECT_RECLAIM
  */
 static inline gfp_t current_gfp_context(gfp_t flags)
 {
 	unsigned int pflags = READ_ONCE(current->flags);
 
-	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
+	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS |
+			PF_MEMALLOC_PIN | PF_MEMALLOC))) {
 		/*
 		 * NOIO implies both NOIO and NOFS and it is a weaker context
 		 * so always make sure it makes precedence
@@ -263,6 +265,9 @@ static inline gfp_t current_gfp_context(gfp_t flags)
 
 		if (pflags & PF_MEMALLOC_PIN)
 			flags &= ~__GFP_MOVABLE;
+
+		if (pflags & PF_MEMALLOC)
+			flags &= ~__GFP_DIRECT_RECLAIM;
 	}
 	return flags;
 }
-- 
2.39.5




* Re: [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc()
  2025-07-04 15:25 ` [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc() Uladzislau Rezki (Sony)
@ 2025-07-07  1:47   ` Baoquan He
  2025-07-08  1:15     ` Baoquan He
  0 siblings, 1 reply; 25+ messages in thread
From: Baoquan He @ 2025-07-07  1:47 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, Michal Hocko, LKML, Andrey Ryabinin,
	Alexander Potapenko

On 07/04/25 at 05:25pm, Uladzislau Rezki (Sony) wrote:
......snip.......
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index d2c70cd2afb1..5edfc1f6b53e 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -335,13 +335,13 @@ static void ___free_pages_bulk(struct page **pages, int nr_pages)
>  	}
>  }
>  
> -static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
> +static int ___alloc_pages_bulk(struct page **pages, int nr_pages, gfp_t gfp_mask)
>  {
>  	unsigned long nr_populated, nr_total = nr_pages;
>  	struct page **page_array = pages;
>  
>  	while (nr_pages) {
> -		nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, pages);
> +		nr_populated = alloc_pages_bulk(gfp_mask, nr_pages, pages);
>  		if (!nr_populated) {
>  			___free_pages_bulk(page_array, nr_total - nr_pages);
>  			return -ENOMEM;
> @@ -353,25 +353,33 @@ static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
>  	return 0;
>  }
>  
> -static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
> +static int __kasan_populate_vmalloc(unsigned long start, unsigned long end, gfp_t gfp_mask)
>  {
>  	unsigned long nr_pages, nr_total = PFN_UP(end - start);
> +	bool noblock = !gfpflags_allow_blocking(gfp_mask);
>  	struct vmalloc_populate_data data;
> +	unsigned int flags;
>  	int ret = 0;
>  
> -	data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> +	data.pages = (struct page **)__get_free_page(gfp_mask | __GFP_ZERO);
>  	if (!data.pages)
>  		return -ENOMEM;
>  
>  	while (nr_total) {
>  		nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
> -		ret = ___alloc_pages_bulk(data.pages, nr_pages);
> +		ret = ___alloc_pages_bulk(data.pages, nr_pages, gfp_mask);
>  		if (ret)
>  			break;
>  
>  		data.start = start;
> +		if (noblock)
> +			flags = memalloc_noreclaim_save();
> +
>  		ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
>  					  kasan_populate_vmalloc_pte, &data);

This series is a great enhancement, thanks.

When checking the code, it seems apply_to_page_range() can lead to page
table allocations which use GFP_PGTABLE_KERNEL. Not sure if we need to
handle this as well.

> +		if (noblock)
> +			memalloc_noreclaim_restore(flags);
> +
>  		___free_pages_bulk(data.pages, nr_pages);
>  		if (ret)
>  			break;
...snip...




* Re: [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area()
  2025-07-04 15:25 ` [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area() Uladzislau Rezki (Sony)
@ 2025-07-07  7:11   ` Michal Hocko
  2025-07-08 12:34     ` Uladzislau Rezki
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2025-07-07  7:11 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony); +Cc: linux-mm, Andrew Morton, LKML, Baoquan He

On Fri 04-07-25 17:25:32, Uladzislau Rezki wrote:
[...]
> @@ -2030,7 +2033,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 */
>  	va = node_alloc(size, align, vstart, vend, &addr, &vn_id);
>  	if (!va) {
> -		gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
> +		if (allow_block)
> +			gfp_mask = gfp_mask & GFP_RECLAIM_MASK;

I don't follow here and is this even correct?

>  
>  		va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
>  		if (unlikely(!va))
> @@ -2057,8 +2061,14 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 * If an allocation fails, the error value is
>  	 * returned. Therefore trigger the overflow path.
>  	 */
> -	if (IS_ERR_VALUE(addr))
> +	if (IS_ERR_VALUE(addr)) {
> +		if (!allow_block) {
> +			kmem_cache_free(vmap_area_cachep, va);
> +			return ERR_PTR(-ENOMEM);

I would suggest adding a comment for this. Something like:

only blockable requests take the overflow path, because it relies on
the vmap_purge_lock mutex and blocking notifiers.
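
Rendered into the hunk above, that could look like this (a sketch of
the suggested comment, not a committed change):

<snip>
    if (IS_ERR_VALUE(addr)) {
        if (!allow_block) {
            /*
             * Only blockable requests may take the overflow
             * path, since it relies on the vmap_purge_lock
             * mutex and on blocking notifiers.
             */
            kmem_cache_free(vmap_area_cachep, va);
            return ERR_PTR(-ENOMEM);
        }

        goto overflow;
    }
<snip>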

> +		}
> +
>  		goto overflow;
> +	}
>  
>  	va->va_start = addr;
>  	va->va_end = addr + size;
> -- 
> 2.39.5

-- 
Michal Hocko
SUSE Labs



* Re: [RFC 3/7] mm/vmalloc: Avoid cond_resched() when blocking is not permitted
  2025-07-04 15:25 ` [RFC 3/7] mm/vmalloc: Avoid cond_resched() when blocking is not permitted Uladzislau Rezki (Sony)
@ 2025-07-07  7:11   ` Michal Hocko
  2025-07-08 12:29     ` Uladzislau Rezki
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2025-07-07  7:11 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony); +Cc: linux-mm, Andrew Morton, LKML, Baoquan He

On Fri 04-07-25 17:25:33, Uladzislau Rezki wrote:
> The vm_area_alloc_pages() function uses cond_resched() to yield the
> CPU during potentially long-running loops. However, yielding should
> only be done if the given GFP flags allow blocking.
> 
> This patch avoids calling cond_resched() when the allocation context
> is non-blocking (GFP_ATOMIC, GFP_NOWAIT).

Do we even need those cond_resched calls? Both of them are called
shortly after the memory allocator, which already yields the CPU when allowed.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 8c375b8e269d..25d09f753239 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3624,7 +3624,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  							pages + nr_allocated);
>  
>  			nr_allocated += nr;
> -			cond_resched();
> +
> +			if (gfpflags_allow_blocking(gfp))
> +				cond_resched();
>  
>  			/*
>  			 * If zero or pages were obtained partly,
> @@ -3666,7 +3668,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  		for (i = 0; i < (1U << order); i++)
>  			pages[nr_allocated + i] = page + i;
>  
> -		cond_resched();
> +		if (gfpflags_allow_blocking(gfp))
> +			cond_resched();
> +
>  		nr_allocated += 1U << order;
>  	}
>  
> -- 
> 2.39.5

-- 
Michal Hocko
SUSE Labs



* Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  2025-07-04 15:25 ` [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node() Uladzislau Rezki (Sony)
@ 2025-07-07  7:13   ` Michal Hocko
  2025-07-08 12:27     ` Uladzislau Rezki
  2025-07-08 15:47   ` Michal Hocko
  1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2025-07-07  7:13 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony); +Cc: linux-mm, Andrew Morton, LKML, Baoquan He

On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> This patch makes __vmalloc_area_node() correctly handle non-blocking
> allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> 
> - nested_gfp flag follows the same non-blocking constraints
>   as the primary gfp_mask, ensuring consistency and avoiding
>   sleeping allocations in atomic contexts.
> 
> - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
>   and warning is issued if it was set, since __GFP_NOFAIL is
>   incompatible with non-blocking contexts;
> 
> - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
>   if there are no DMA constraints.
> 
> - in non-blocking mode we use memalloc_noreclaim_save/restore()
>   to prevent reclaim related operations that may sleep while
>   setting up page tables or mapping pages.
> 
> This is particularly important for page table allocations that
> internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> scope restrictions are applied. For example:
> 
> <snip>
>     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> 
>     __pte_alloc_kernel()
>         pte_alloc_one_kernel(&init_mm);
>             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> <snip>

The changelog doesn't explain the actual implementation and that is
really crucial here. You rely on memalloc_noreclaim_save (i.e.
PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
how you prevent the biggest caveat of this interface. Let me
quote the documentation:
 * Users of this scope have to be extremely careful to not deplete the reserves
 * completely and implement a throttling mechanism which controls the
 * consumption of the reserve based on the amount of freed memory. Usage of a
 * pre-allocated pool (e.g. mempool) should be always considered before using
 * this scope.

Unless I am missing something _any_ vmalloc(GFP_NOWAIT|GFP_ATOMIC) user
would get practically unbounded access to the whole available memory. This
is not really acceptable.

> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2eaff0575a9e..fe1699e01e02 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, unsigned int page_shift,
>  				 int node)
>  {
> -	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> +	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	bool nofail = gfp_mask & __GFP_NOFAIL;
>  	unsigned long addr = (unsigned long)area->addr;
>  	unsigned long size = get_vm_area_size(area);
> @@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>  	unsigned int page_order;
>  	unsigned int flags;
> +	bool noblock;
>  	int ret;
>  
>  	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> +	noblock = !gfpflags_allow_blocking(gfp_mask);
>  
> -	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> -		gfp_mask |= __GFP_HIGHMEM;
> +	if (noblock) {
> +		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
> +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> +		gfp_mask &= ~__GFP_NOFAIL;
> +
> +		/*
> +		 * In non-sleeping contexts, ensure nested allocations follow
> +		 * same non-blocking rules.
> +		 */
> +		nested_gfp = gfp_mask | __GFP_ZERO;
> +		nofail = false;
> +	} else {
> +		/* Allow highmem allocations if there are no DMA constraints. */
> +		if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> +			gfp_mask |= __GFP_HIGHMEM;
> +	}
>  
>  	/* Please note that the recursion is strictly bounded. */
>  	if (array_size > PAGE_SIZE) {
> @@ -3788,7 +3804,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	 * page tables allocations ignore external gfp mask, enforce it
>  	 * by the scope API
>  	 */
> -	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> +	if (noblock)
> +		flags = memalloc_noreclaim_save();
> +	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
>  		flags = memalloc_nofs_save();
>  	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
>  		flags = memalloc_noio_save();
> @@ -3800,7 +3818,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  			schedule_timeout_uninterruptible(1);
>  	} while (nofail && (ret < 0));
>  
> -	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> +	if (noblock)
> +		memalloc_noreclaim_restore(flags);
> +	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
>  		memalloc_nofs_restore(flags);
>  	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
>  		memalloc_noio_restore(flags);
> -- 
> 2.39.5
> 

-- 
Michal Hocko
SUSE Labs



* Re: [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc()
  2025-07-07  1:47   ` Baoquan He
@ 2025-07-08  1:15     ` Baoquan He
  2025-07-08  8:30       ` Uladzislau Rezki
  0 siblings, 1 reply; 25+ messages in thread
From: Baoquan He @ 2025-07-08  1:15 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, Michal Hocko, LKML, Andrey Ryabinin,
	Alexander Potapenko

On 07/07/25 at 09:47am, Baoquan He wrote:
> On 07/04/25 at 05:25pm, Uladzislau Rezki (Sony) wrote:
> ......snip.......
> > diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> > index d2c70cd2afb1..5edfc1f6b53e 100644
> > --- a/mm/kasan/shadow.c
> > +++ b/mm/kasan/shadow.c
> > @@ -335,13 +335,13 @@ static void ___free_pages_bulk(struct page **pages, int nr_pages)
> >  	}
> >  }
> >  
> > -static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
> > +static int ___alloc_pages_bulk(struct page **pages, int nr_pages, gfp_t gfp_mask)
> >  {
> >  	unsigned long nr_populated, nr_total = nr_pages;
> >  	struct page **page_array = pages;
> >  
> >  	while (nr_pages) {
> > -		nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, pages);
> > +		nr_populated = alloc_pages_bulk(gfp_mask, nr_pages, pages);
> >  		if (!nr_populated) {
> >  			___free_pages_bulk(page_array, nr_total - nr_pages);
> >  			return -ENOMEM;
> > @@ -353,25 +353,33 @@ static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
> >  	return 0;
> >  }
> >  
> > -static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
> > +static int __kasan_populate_vmalloc(unsigned long start, unsigned long end, gfp_t gfp_mask)
> >  {
> >  	unsigned long nr_pages, nr_total = PFN_UP(end - start);
> > +	bool noblock = !gfpflags_allow_blocking(gfp_mask);
> >  	struct vmalloc_populate_data data;
> > +	unsigned int flags;
> >  	int ret = 0;
> >  
> > -	data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> > +	data.pages = (struct page **)__get_free_page(gfp_mask | __GFP_ZERO);
> >  	if (!data.pages)
> >  		return -ENOMEM;
> >  
> >  	while (nr_total) {
> >  		nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
> > -		ret = ___alloc_pages_bulk(data.pages, nr_pages);
> > +		ret = ___alloc_pages_bulk(data.pages, nr_pages, gfp_mask);
> >  		if (ret)
> >  			break;
> >  
> >  		data.start = start;
> > +		if (noblock)
> > +			flags = memalloc_noreclaim_save();
> > +
> >  		ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
> >  					  kasan_populate_vmalloc_pte, &data);
> 
> This series is a great enhancement, thanks.
> 
> When checking the code, it seems apply_to_page_range() can lead to page
> table allocations which use GFP_PGTABLE_KERNEL. Not sure if we need to
> handle this as well.

I am a fool, I didn't see the obvious added scope between
memalloc_noreclaim_save/restore(). Please ignore this noise.

> 
> > +		if (noblock)
> > +			memalloc_noreclaim_restore(flags);
> > +
> >  		___free_pages_bulk(data.pages, nr_pages);
> >  		if (ret)
> >  			break;
> ...snip...
> 




* RE: [External] [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case
  2025-07-04 15:25 ` [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case Uladzislau Rezki (Sony)
@ 2025-07-08  5:59   ` Adrian Huang12
  2025-07-08  8:29     ` Uladzislau Rezki
  0 siblings, 1 reply; 25+ messages in thread
From: Adrian Huang12 @ 2025-07-08  5:59 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony), linux-mm@kvack.org, Andrew Morton
  Cc: Michal Hocko, LKML, Baoquan He

Hi Uladzislau,

> -----Original Message-----
> From: owner-linux-mm@kvack.org <owner-linux-mm@kvack.org> On Behalf
> Of Uladzislau Rezki (Sony)
> Sent: Friday, July 4, 2025 11:26 PM
> To: linux-mm@kvack.org; Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>; LKML
> <linux-kernel@vger.kernel.org>; Baoquan He <bhe@redhat.com>; Uladzislau
> Rezki <urezki@gmail.com>
> Subject: [External] [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  lib/test_vmalloc.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c index
> 1b0b59549aaf..9e3429dfe176 100644
> --- a/lib/test_vmalloc.c
> +++ b/lib/test_vmalloc.c
> @@ -54,6 +54,7 @@ __param(int, run_test_mask, INT_MAX,
>  		"\t\tid: 256,  name: kvfree_rcu_1_arg_vmalloc_test\n"
>  		"\t\tid: 512,  name: kvfree_rcu_2_arg_vmalloc_test\n"
>  		"\t\tid: 1024, name: vm_map_ram_test\n"
> +		"\t\tid: 2048, name: no_block_alloc_test\n"
>  		/* Add a new test case description here. */  );
> 
> @@ -283,6 +284,31 @@ static int fix_size_alloc_test(void)
>  	return 0;
>  }
> 
> +static DEFINE_SPINLOCK(no_block_alloc_lock);
> +
> +static int no_block_alloc_test(void)
> +{
> +	void *ptr;
> +	u8 rnd;
> +	int i;
> +
> +	for (i = 0; i < test_loop_count; i++) {
> +		rnd = get_random_u8();
> +
> +		spin_lock(&no_block_alloc_lock);

Since there is no shared data to be protected, do we need this lock for serialization? Any concerns?

It took 18 minutes to run this test (256-core server):
  # time modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x800
  real    18m6.099s
  user    0m0.002s
  sys     0m4.555s

Without the lock, it took 41 seconds (I have run 300+ tests without any failure on the 256-core server):
  # time modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x800
  real    0m41.367s
  user    0m0.003s
  sys     0m5.758s

Would it be better to run this test concurrently? That way, it can also verify the scalability problem when the number of CPUs grows.

> +		ptr = __vmalloc(PAGE_SIZE, (rnd % 2) ? GFP_NOWAIT :
> GFP_ATOMIC);
> +		spin_unlock(&no_block_alloc_lock);
> +
> +		if (!ptr)
> +			return -1;
> +
> +		*((__u8 *)ptr) = 0;
> +		vfree(ptr);
> +	}
> +
> +	return 0;
> +}
> +
>  static int
>  pcpu_alloc_test(void)
>  {
> @@ -410,6 +436,7 @@ static struct test_case_desc test_case_array[] = {
>  	{ "kvfree_rcu_1_arg_vmalloc_test", kvfree_rcu_1_arg_vmalloc_test },
>  	{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test },
>  	{ "vm_map_ram_test", vm_map_ram_test },
> +	{ "no_block_alloc_test", no_block_alloc_test },
>  	/* Add a new test case here. */
>  };
> 
> --
> 2.39.5
> 




* Re: [External] [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case
  2025-07-08  5:59   ` [External] " Adrian Huang12
@ 2025-07-08  8:29     ` Uladzislau Rezki
  0 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-08  8:29 UTC (permalink / raw)
  To: Adrian Huang12
  Cc: Uladzislau Rezki (Sony), linux-mm@kvack.org, Andrew Morton,
	Michal Hocko, LKML, Baoquan He

Hello, Adrian!

> 
> > -----Original Message-----
> > From: owner-linux-mm@kvack.org <owner-linux-mm@kvack.org> On Behalf
> > Of Uladzislau Rezki (Sony)
> > Sent: Friday, July 4, 2025 11:26 PM
> > To: linux-mm@kvack.org; Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@kernel.org>; LKML
> > <linux-kernel@vger.kernel.org>; Baoquan He <bhe@redhat.com>; Uladzislau
> > Rezki <urezki@gmail.com>
> > Subject: [External] [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  lib/test_vmalloc.c | 27 +++++++++++++++++++++++++++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c index
> > 1b0b59549aaf..9e3429dfe176 100644
> > --- a/lib/test_vmalloc.c
> > +++ b/lib/test_vmalloc.c
> > @@ -54,6 +54,7 @@ __param(int, run_test_mask, INT_MAX,
> >  		"\t\tid: 256,  name: kvfree_rcu_1_arg_vmalloc_test\n"
> >  		"\t\tid: 512,  name: kvfree_rcu_2_arg_vmalloc_test\n"
> >  		"\t\tid: 1024, name: vm_map_ram_test\n"
> > +		"\t\tid: 2048, name: no_block_alloc_test\n"
> >  		/* Add a new test case description here. */  );
> > 
> > @@ -283,6 +284,31 @@ static int fix_size_alloc_test(void)
> >  	return 0;
> >  }
> > 
> > +static DEFINE_SPINLOCK(no_block_alloc_lock);
> > +
> > +static int no_block_alloc_test(void)
> > +{
> > +	void *ptr;
> > +	u8 rnd;
> > +	int i;
> > +
> > +	for (i = 0; i < test_loop_count; i++) {
> > +		rnd = get_random_u8();
> > +
> > +		spin_lock(&no_block_alloc_lock);
> 
> Since there is no shared data to be protected, do we need this lock for serialization? Any concerns?
> 
> It spent 18 minutes for this test (256-core server):
>   # time modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x800
>   real    18m6.099s
>   user    0m0.002s
>   sys     0m4.555s
> 
> Without the lock, it spent 41 seconds (Have run for 300+ tests without any failure: 256-core server):
>   # time modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x800
>   real    0m41.367s
>   user    0m0.003s
>   sys     0m5.758s
> 
> Would it be better to run this test concurrently? That said, it can also verify the scalability problem when the number of CPUs grow. 
> 
It was added just to track sleep-in-atomic issues. We do not need
that spinlock, in fact. Instead, we can just invoke
preempt_disable/enable() to simulate a context which is not allowed
to trigger schedule(), i.e. sleeping.
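
A sketch of that variant (same test body as in patch 1, with the
spinlock replaced by a preemption-disabled section):

<snip>
    preempt_disable();
    ptr = __vmalloc(PAGE_SIZE, (rnd % 2) ? GFP_NOWAIT : GFP_ATOMIC);
    preempt_enable();
<snip>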

--
Uladzislau Rezki



* Re: [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc()
  2025-07-08  1:15     ` Baoquan He
@ 2025-07-08  8:30       ` Uladzislau Rezki
  0 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-08  8:30 UTC (permalink / raw)
  To: Baoquan He
  Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, Michal Hocko,
	LKML, Andrey Ryabinin, Alexander Potapenko

On Tue, Jul 08, 2025 at 09:15:19AM +0800, Baoquan He wrote:
> On 07/07/25 at 09:47am, Baoquan He wrote:
> > On 07/04/25 at 05:25pm, Uladzislau Rezki (Sony) wrote:
> > ......snip.......
> > > diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> > > index d2c70cd2afb1..5edfc1f6b53e 100644
> > > --- a/mm/kasan/shadow.c
> > > +++ b/mm/kasan/shadow.c
> > > @@ -335,13 +335,13 @@ static void ___free_pages_bulk(struct page **pages, int nr_pages)
> > >  	}
> > >  }
> > >  
> > > -static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
> > > +static int ___alloc_pages_bulk(struct page **pages, int nr_pages, gfp_t gfp_mask)
> > >  {
> > >  	unsigned long nr_populated, nr_total = nr_pages;
> > >  	struct page **page_array = pages;
> > >  
> > >  	while (nr_pages) {
> > > -		nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, pages);
> > > +		nr_populated = alloc_pages_bulk(gfp_mask, nr_pages, pages);
> > >  		if (!nr_populated) {
> > >  			___free_pages_bulk(page_array, nr_total - nr_pages);
> > >  			return -ENOMEM;
> > > @@ -353,25 +353,33 @@ static int ___alloc_pages_bulk(struct page **pages, int nr_pages)
> > >  	return 0;
> > >  }
> > >  
> > > -static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
> > > +static int __kasan_populate_vmalloc(unsigned long start, unsigned long end, gfp_t gfp_mask)
> > >  {
> > >  	unsigned long nr_pages, nr_total = PFN_UP(end - start);
> > > +	bool noblock = !gfpflags_allow_blocking(gfp_mask);
> > >  	struct vmalloc_populate_data data;
> > > +	unsigned int flags;
> > >  	int ret = 0;
> > >  
> > > -	data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> > > +	data.pages = (struct page **)__get_free_page(gfp_mask | __GFP_ZERO);
> > >  	if (!data.pages)
> > >  		return -ENOMEM;
> > >  
> > >  	while (nr_total) {
> > >  		nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
> > > -		ret = ___alloc_pages_bulk(data.pages, nr_pages);
> > > +		ret = ___alloc_pages_bulk(data.pages, nr_pages, gfp_mask);
> > >  		if (ret)
> > >  			break;
> > >  
> > >  		data.start = start;
> > > +		if (noblock)
> > > +			flags = memalloc_noreclaim_save();
> > > +
> > >  		ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
> > >  					  kasan_populate_vmalloc_pte, &data);
> > 
> > This series is a great enhancement, thanks.
> > 
> > When checking code, seems apply_to_page_range() will lead to page table
> > allocation which uses GFP_PGTABLE_KERNEL. Not sure if we need to handle
> > this either.
> 
> I am a fool, I didn't see the obvious added scope between
> memalloc_noreclaim_save/restore(). Please ignore this noise.
> 
No worries :)

--
Uladzislau Rezki



* Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  2025-07-07  7:13   ` Michal Hocko
@ 2025-07-08 12:27     ` Uladzislau Rezki
  2025-07-08 15:22       ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-08 12:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, LKML,
	Baoquan He

On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > This patch makes __vmalloc_area_node() correctly handle non-blocking
> > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > 
> > - nested_gfp flag follows the same non-blocking constraints
> >   as the primary gfp_mask, ensuring consistency and avoiding
> >   sleeping allocations in atomic contexts.
> > 
> > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> >   and warning is issued if it was set, since __GFP_NOFAIL is
> >   incompatible with non-blocking contexts;
> > 
> > - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
> >   if there are no DMA constraints.
> > 
> > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> >   to prevent reclaim related operations that may sleep while
> >   setting up page tables or mapping pages.
> > 
> > This is particularly important for page table allocations that
> > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > scope restrictions are applied. For example:
> > 
> > <snip>
> >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > 
> >     __pte_alloc_kernel()
> >         pte_alloc_one_kernel(&init_mm);
> >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > <snip>
> 
> The changelog doesn't explain the actual implementation and that is
> really crucial here. You rely on memalloc_noreclaim_save (i.e.
> PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
> how you prevent the biggest caveat of this interface. Let me
> quote the documentation:
>  * Users of this scope have to be extremely careful to not deplete the reserves
>  * completely and implement a throttling mechanism which controls the
>  * consumption of the reserve based on the amount of freed memory. Usage of a
>  * pre-allocated pool (e.g. mempool) should be always considered before using
>  * this scope.
> 
I am aware of that comment. I had the same concern about this, but it
looks like I/you may have overshot here. Yes, we have access to memory
reserves, but only for page-table manipulations, i.e. to allocate
a page for the 5-level page table structure. We have PGD, P4D, PUD, PMD
and PTE, which is the lowest level and which needs pages the most.

As I see it, we do not free pages, at least on the PTE level, which
means that an address space is populated forward only and never shrinks
back. Most of the time you do not need to allocate; this mostly occurs
initially, after boot.

>
> Unless I am missing something _any_ vmalloc(GFP_NOWAIT|GFP_ATOMIC) user
> would get practically unbounded access to the whole available memory. This
> is not really acceptable.
> 
See the above comment. If there is a big concern about this, I can add
a memalloc_noblock_save()/memalloc_noblock_restore() pair to eliminate
that concern. The context will be converted in a way that drops the
__GFP_DIRECT_RECLAIM flag.
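
Something along these lines, as a rough sketch (memalloc_noblock_save(),
memalloc_noblock_restore() and PF_MEMALLOC_NOBLOCK do not exist yet;
they are hypothetical, modeled on the existing scope helpers in
include/linux/sched/mm.h):

<snip>
    /* Hypothetical: mark a scope whose allocations must not block. */
    static inline unsigned int memalloc_noblock_save(void)
    {
        unsigned int flags = current->flags & PF_MEMALLOC_NOBLOCK;

        current->flags |= PF_MEMALLOC_NOBLOCK;
        return flags;
    }

    static inline void memalloc_noblock_restore(unsigned int flags)
    {
        current->flags = (current->flags & ~PF_MEMALLOC_NOBLOCK) | flags;
    }

    /* current_gfp_context() would then additionally do: */
    if (pflags & PF_MEMALLOC_NOBLOCK)
        flags &= ~__GFP_DIRECT_RECLAIM;
<snip>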

Thank you for your comments and input, I appreciate it.

--
Uladzislau Rezki



* Re: [RFC 3/7] mm/vmalloc: Avoid cond_resched() when blocking is not permitted
  2025-07-07  7:11   ` Michal Hocko
@ 2025-07-08 12:29     ` Uladzislau Rezki
  0 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-08 12:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, LKML,
	Baoquan He

On Mon, Jul 07, 2025 at 09:11:43AM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:33, Uladzislau Rezki wrote:
> > The vm_area_alloc_pages() function uses cond_resched() to yield the
> > CPU during potentially long-running loops. However, yielding should
> > only be done if the given GFP flags allow blocking.
> > 
> > This patch avoids calling cond_resched() when the allocation context
> > is non-blocking(GFP_ATOMIC, GFP_NOWAIT).
> 
> Do we even need those cond_resched calls? Both of them are called
> shortly after the memory allocator, which already yields the CPU when allowed.
>
I think they can just be dropped.

--
Uladzislau Rezki



* Re: [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area()
  2025-07-07  7:11   ` Michal Hocko
@ 2025-07-08 12:34     ` Uladzislau Rezki
  2025-07-08 15:17       ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-08 12:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, LKML,
	Baoquan He

On Mon, Jul 07, 2025 at 09:11:35AM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:32, Uladzislau Rezki wrote:
> [...]
> > @@ -2030,7 +2033,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> >  	 */
> >  	va = node_alloc(size, align, vstart, vend, &addr, &vn_id);
> >  	if (!va) {
> > -		gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
> > +		if (allow_block)
> > +			gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
> 
> I don't follow here and is this even correct?
> 
Allow nested flags to follow the user's request if there is a request
not to block. For example, if we apply GFP_RECLAIM_MASK to GFP_ATOMIC,
GFP_ATOMIC is converted to zero, thus to GFP_NOWAIT.

> >  
> >  		va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
> >  		if (unlikely(!va))
> > @@ -2057,8 +2061,14 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> >  	 * If an allocation fails, the error value is
> >  	 * returned. Therefore trigger the overflow path.
> >  	 */
> > -	if (IS_ERR_VALUE(addr))
> > +	if (IS_ERR_VALUE(addr)) {
> > +		if (!allow_block) {
> > +			kmem_cache_free(vmap_area_cachep, va);
> > +			return ERR_PTR(-ENOMEM);
> 
> I would suggest to add a comment for this. Something like
> 
> for blockable requests trigger the overflow paths because that
> relies on vmap_purge_lock mutex and blocking notifiers.
> 
Thanks, I can do that easily. Also, since this is an RFC, I think it
should be split and improved, maybe by moving some functionality out
into a separate function.

--
Uladzislau Rezki



* Re: [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area()
  2025-07-08 12:34     ` Uladzislau Rezki
@ 2025-07-08 15:17       ` Michal Hocko
  2025-07-08 16:45         ` Uladzislau Rezki
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2025-07-08 15:17 UTC (permalink / raw)
  To: Uladzislau Rezki; +Cc: linux-mm, Andrew Morton, LKML, Baoquan He

On Tue 08-07-25 14:34:28, Uladzislau Rezki wrote:
> On Mon, Jul 07, 2025 at 09:11:35AM +0200, Michal Hocko wrote:
> > On Fri 04-07-25 17:25:32, Uladzislau Rezki wrote:
> > [...]
> > > @@ -2030,7 +2033,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> > >  	 */
> > >  	va = node_alloc(size, align, vstart, vend, &addr, &vn_id);
> > >  	if (!va) {
> > > -		gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
> > > +		if (allow_block)
> > > +			gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
> > 
> > I don't follow here, and is this even correct?
> > 
> Allow the nested flags to follow the user request when there is a
> request not to block. For example, if we apply GFP_RECLAIM_MASK to
> GFP_ATOMIC, GFP_ATOMIC is converted to zero, i.e. to GFP_NOWAIT.

I still do not follow. The aim of this code is to filter out all
non-reclaim-related flags. Why should that work differently for
non-waiting allocations?
Btw. if you had GFP_ATOMIC, the resulting mask would still be GFP_ATOMIC,
as both __GFP_HIGH and __GFP_KSWAPD_RECLAIM are part of GFP_RECLAIM_MASK.
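
To illustrate, with the current definitions (GFP_ATOMIC is
__GFP_HIGH | __GFP_KSWAPD_RECLAIM, and both bits are included in
GFP_RECLAIM_MASK) the masking is a no-op for GFP_ATOMIC:

<snip>
	/* Holds with today's definitions; illustration only. */
	BUILD_BUG_ON((GFP_ATOMIC & GFP_RECLAIM_MASK) != GFP_ATOMIC);
<snip>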

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  2025-07-08 12:27     ` Uladzislau Rezki
@ 2025-07-08 15:22       ` Michal Hocko
  2025-07-09 11:20         ` Uladzislau Rezki
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2025-07-08 15:22 UTC (permalink / raw)
  To: Uladzislau Rezki; +Cc: linux-mm, Andrew Morton, LKML, Baoquan He

On Tue 08-07-25 14:27:57, Uladzislau Rezki wrote:
> On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
> > On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > > This patch makes __vmalloc_area_node() correctly handle non-blocking
> > > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > > 
> > > - the nested_gfp flag follows the same non-blocking constraints
> > >   as the primary gfp_mask, ensuring consistency and avoiding
> > >   sleeping allocations in atomic contexts.
> > > 
> > > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> > >   and a warning is issued if it was set, since __GFP_NOFAIL is
> > >   incompatible with non-blocking contexts.
> > > 
> > > - Add __GFP_HIGHMEM to gfp_mask only for blocking requests
> > >   if there are no DMA constraints.
> > > 
> > > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> > >   to prevent reclaim-related operations that may sleep while
> > >   setting up page tables or mapping pages.
> > > 
> > > This is particularly important for page table allocations that
> > > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > > scope restrictions are applied. For example:
> > > 
> > > <snip>
> > >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > > 
> > >     __pte_alloc_kernel()
> > >         pte_alloc_one_kernel(&init_mm);
> > >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > > <snip>
> > 
> > The changelog doesn't explain the actual implementation, and that is
> > really crucial here. You rely on memalloc_noreclaim_save() (i.e.
> > PF_MEMALLOC) to never trigger memory reclaim, but you are not explaining
> > how you prevent the biggest caveat of this interface. Let me quote the
> > documentation:
> >  * Users of this scope have to be extremely careful to not deplete the reserves
> >  * completely and implement a throttling mechanism which controls the
> >  * consumption of the reserve based on the amount of freed memory. Usage of a
> >  * pre-allocated pool (e.g. mempool) should be always considered before using
> >  * this scope.
> > 
> I am aware of that comment. I had the same concern about this, but it
> looks like we may have overshot here. Yes, we have access to memory
> reserves, but only for page-table manipulations, i.e. to allocate
> pages for the 5-level page-table structure. We have PGD, P4D, PUD, PMD
> and PTE, which is the lowest level and which needs pages the most.
> 
> As far as I can see, we do not free pages, at least at the PTE level,
> which means that an address space is only populated forward and never
> shrinks back. Most of the time no allocation is needed; this mostly
> occurs initially, after boot.

You are right, I misread the patch. I thought this included
vm_area_alloc_pages() as well, but indeed this is only for page
tables, and that seems much more reasonable. Having that outlined in the
changelog would have helped ;)
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  2025-07-04 15:25 ` [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node() Uladzislau Rezki (Sony)
  2025-07-07  7:13   ` Michal Hocko
@ 2025-07-08 15:47   ` Michal Hocko
  2025-07-09 13:45     ` Uladzislau Rezki
  1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2025-07-08 15:47 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony); +Cc: linux-mm, Andrew Morton, LKML, Baoquan He

On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> This patch makes __vmalloc_area_node() correctly handle non-blocking
> allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> 
> - the nested_gfp flag follows the same non-blocking constraints
>   as the primary gfp_mask, ensuring consistency and avoiding
>   sleeping allocations in atomic contexts.
> 
> - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
>   and a warning is issued if it was set, since __GFP_NOFAIL is
>   incompatible with non-blocking contexts.
> 
> - Add __GFP_HIGHMEM to gfp_mask only for blocking requests
>   if there are no DMA constraints.
> 
> - in non-blocking mode we use memalloc_noreclaim_save/restore()
>   to prevent reclaim-related operations that may sleep while
>   setting up page tables or mapping pages.
> 
> This is particularly important for page table allocations that
> internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> scope restrictions are applied. For example:
> 
> <snip>
>     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> 
>     __pte_alloc_kernel()
>         pte_alloc_one_kernel(&init_mm);
>             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> <snip>
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2eaff0575a9e..fe1699e01e02 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, unsigned int page_shift,
>  				 int node)
>  {
> -	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> +	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	bool nofail = gfp_mask & __GFP_NOFAIL;
>  	unsigned long addr = (unsigned long)area->addr;
>  	unsigned long size = get_vm_area_size(area);
> @@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>  	unsigned int page_order;
>  	unsigned int flags;
> +	bool noblock;
>  	int ret;
>  
>  	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> +	noblock = !gfpflags_allow_blocking(gfp_mask);
>  
> -	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> -		gfp_mask |= __GFP_HIGHMEM;
> +	if (noblock) {
> +		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
> +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> +		gfp_mask &= ~__GFP_NOFAIL;

Btw. we already ignore __GFP_NOFAIL for atomic allocations and warn
about that at the page allocator level (__alloc_pages_slowpath()).

What we can do, though, is add a pr_warn + dump_stack for requests whose
size would require (in the worst case) page-table allocations larger
than a portion of min_free_kbytes (to scale with different memory
sizes). That should be plenty for any reasonable non-blocking vmalloc.
That would give us a means to catch abusers.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area()
  2025-07-08 15:17       ` Michal Hocko
@ 2025-07-08 16:45         ` Uladzislau Rezki
  0 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-08 16:45 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Uladzislau Rezki, linux-mm, Andrew Morton, LKML, Baoquan He

On Tue, Jul 08, 2025 at 05:17:33PM +0200, Michal Hocko wrote:
> On Tue 08-07-25 14:34:28, Uladzislau Rezki wrote:
> > On Mon, Jul 07, 2025 at 09:11:35AM +0200, Michal Hocko wrote:
> > > On Fri 04-07-25 17:25:32, Uladzislau Rezki wrote:
> > > [...]
> > > > @@ -2030,7 +2033,8 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> > > >  	 */
> > > >  	va = node_alloc(size, align, vstart, vend, &addr, &vn_id);
> > > >  	if (!va) {
> > > > -		gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
> > > > +		if (allow_block)
> > > > +			gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
> > > 
> > > I don't follow here, and is this even correct?
> > > 
> > Allow the nested flags to follow the user request when there is a
> > request not to block. For example, if we apply GFP_RECLAIM_MASK to
> > GFP_ATOMIC, GFP_ATOMIC is converted to zero, i.e. to GFP_NOWAIT.
> 
> I still do not follow. The aim of this code is to filter out all
> non-reclaim-related flags. Why should that work differently for
> non-waiting allocations?
> Btw. if you had GFP_ATOMIC, the resulting mask would still be GFP_ATOMIC,
> as both __GFP_HIGH and __GFP_KSWAPD_RECLAIM are part of GFP_RECLAIM_MASK.
> 
Right. I misread GFP_RECLAIM_MASK; I thought that GFP_ATOMIC and
GFP_NOWAIT were not part of it. They do allow reclaim, just not direct
reclaim, i.e. it is OK to wake up kswapd.

So they should not work differently. Thank you for the comment!

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  2025-07-08 15:22       ` Michal Hocko
@ 2025-07-09 11:20         ` Uladzislau Rezki
  0 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-09 11:20 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Uladzislau Rezki, linux-mm, Andrew Morton, LKML, Baoquan He

On Tue, Jul 08, 2025 at 05:22:52PM +0200, Michal Hocko wrote:
> On Tue 08-07-25 14:27:57, Uladzislau Rezki wrote:
> > On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
> > > On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > > > This patch makes __vmalloc_area_node() correctly handle non-blocking
> > > > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > > > 
> > > > - the nested_gfp flag follows the same non-blocking constraints
> > > >   as the primary gfp_mask, ensuring consistency and avoiding
> > > >   sleeping allocations in atomic contexts.
> > > > 
> > > > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> > > >   and a warning is issued if it was set, since __GFP_NOFAIL is
> > > >   incompatible with non-blocking contexts.
> > > > 
> > > > - Add __GFP_HIGHMEM to gfp_mask only for blocking requests
> > > >   if there are no DMA constraints.
> > > > 
> > > > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> > > >   to prevent reclaim-related operations that may sleep while
> > > >   setting up page tables or mapping pages.
> > > > 
> > > > This is particularly important for page table allocations that
> > > > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > > > scope restrictions are applied. For example:
> > > > 
> > > > <snip>
> > > >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > > > 
> > > >     __pte_alloc_kernel()
> > > >         pte_alloc_one_kernel(&init_mm);
> > > >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > > > <snip>
> > > 
> > > The changelog doesn't explain the actual implementation, and that is
> > > really crucial here. You rely on memalloc_noreclaim_save() (i.e.
> > > PF_MEMALLOC) to never trigger memory reclaim, but you are not explaining
> > > how you prevent the biggest caveat of this interface. Let me quote the
> > > documentation:
> > >  * Users of this scope have to be extremely careful to not deplete the reserves
> > >  * completely and implement a throttling mechanism which controls the
> > >  * consumption of the reserve based on the amount of freed memory. Usage of a
> > >  * pre-allocated pool (e.g. mempool) should be always considered before using
> > >  * this scope.
> > > 
> > I am aware of that comment. I had the same concern about this, but it
> > looks like we may have overshot here. Yes, we have access to memory
> > reserves, but only for page-table manipulations, i.e. to allocate
> > pages for the 5-level page-table structure. We have PGD, P4D, PUD, PMD
> > and PTE, which is the lowest level and which needs pages the most.
> > 
> > As far as I can see, we do not free pages, at least at the PTE level,
> > which means that an address space is only populated forward and never
> > shrinks back. Most of the time no allocation is needed; this mostly
> > occurs initially, after boot.
> 
> You are right, I misread the patch. I thought this included
> vm_area_alloc_pages() as well, but indeed this is only for page
> tables, and that seems much more reasonable. Having that outlined in the
> changelog would have helped ;)
>
I will update the commit message with more detail in my next version.
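
To be explicit about what the scope covers: it only wraps the mapping
and page-table setup, roughly like this (sketch of the idea, not the
final code):

<snip>
	unsigned int flags;

	if (noblock)
		/* PF_MEMALLOC scope: no reclaim while building page tables. */
		flags = memalloc_noreclaim_save();

	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
			page_shift);

	if (noblock)
		memalloc_noreclaim_restore(flags);
<snip>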

Thank you!

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
  2025-07-08 15:47   ` Michal Hocko
@ 2025-07-09 13:45     ` Uladzislau Rezki
  0 siblings, 0 replies; 25+ messages in thread
From: Uladzislau Rezki @ 2025-07-09 13:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, LKML,
	Baoquan He

On Tue, Jul 08, 2025 at 05:47:21PM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > This patch makes __vmalloc_area_node() correctly handle non-blocking
> > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > 
> > - the nested_gfp flag follows the same non-blocking constraints
> >   as the primary gfp_mask, ensuring consistency and avoiding
> >   sleeping allocations in atomic contexts.
> > 
> > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> >   and a warning is issued if it was set, since __GFP_NOFAIL is
> >   incompatible with non-blocking contexts.
> > 
> > - Add __GFP_HIGHMEM to gfp_mask only for blocking requests
> >   if there are no DMA constraints.
> > 
> > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> >   to prevent reclaim-related operations that may sleep while
> >   setting up page tables or mapping pages.
> > 
> > This is particularly important for page table allocations that
> > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > scope restrictions are applied. For example:
> > 
> > <snip>
> >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > 
> >     __pte_alloc_kernel()
> >         pte_alloc_one_kernel(&init_mm);
> >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > <snip>
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
> >  1 file changed, 25 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 2eaff0575a9e..fe1699e01e02 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  				 pgprot_t prot, unsigned int page_shift,
> >  				 int node)
> >  {
> > -	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > +	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >  	bool nofail = gfp_mask & __GFP_NOFAIL;
> >  	unsigned long addr = (unsigned long)area->addr;
> >  	unsigned long size = get_vm_area_size(area);
> > @@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  	unsigned int nr_small_pages = size >> PAGE_SHIFT;
> >  	unsigned int page_order;
> >  	unsigned int flags;
> > +	bool noblock;
> >  	int ret;
> >  
> >  	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> > +	noblock = !gfpflags_allow_blocking(gfp_mask);
> >  
> > -	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> > -		gfp_mask |= __GFP_HIGHMEM;
> > +	if (noblock) {
> > +		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
> > +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> > +		gfp_mask &= ~__GFP_NOFAIL;
> 
> Btw. we already ignore __GFP_NOFAIL for atomic allocations and warn
> about that at the page allocator level (__alloc_pages_slowpath()).
> 
Thank you. I will add a comment about this!
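
Something like this (sketch):

<snip>
	if (noblock) {
		/*
		 * __GFP_NOFAIL is incompatible with non-blocking
		 * contexts. The page allocator already warns about
		 * and ignores it for atomic allocations in
		 * __alloc_pages_slowpath(); clear it here as well so
		 * nested allocations do not inherit it.
		 */
		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
		gfp_mask &= ~__GFP_NOFAIL;
	}
<snip>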

>
> What we can do, though, is add a pr_warn + dump_stack for requests whose
> size would require (in the worst case) page-table allocations larger
> than a portion of min_free_kbytes (to scale with different memory
> sizes). That should be plenty for any reasonable non-blocking vmalloc.
> That would give us a means to catch abusers.
> 
OK, I will add it. I assume you mean something like:

  /* min_free_kbytes is in KiB; convert to pages. */
  unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);

  if (request_pages > pages_min) {
      pr_warn("vmalloc: non-blocking request is too large\n");
      dump_stack();
  }

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-07-09 13:45 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-04 15:25 [RFC 0/7] vmallloc and non-blocking GFPs Uladzislau Rezki (Sony)
2025-07-04 15:25 ` [RFC 1/7] lib/test_vmalloc: Add non-block-alloc-test case Uladzislau Rezki (Sony)
2025-07-08  5:59   ` [External] " Adrian Huang12
2025-07-08  8:29     ` Uladzislau Rezki
2025-07-04 15:25 ` [RFC 2/7] mm/vmalloc: Support non-blocking GFP flags in alloc_vmap_area() Uladzislau Rezki (Sony)
2025-07-07  7:11   ` Michal Hocko
2025-07-08 12:34     ` Uladzislau Rezki
2025-07-08 15:17       ` Michal Hocko
2025-07-08 16:45         ` Uladzislau Rezki
2025-07-04 15:25 ` [RFC 3/7] mm/vmalloc: Avoid cond_resched() when blocking is not permitted Uladzislau Rezki (Sony)
2025-07-07  7:11   ` Michal Hocko
2025-07-08 12:29     ` Uladzislau Rezki
2025-07-04 15:25 ` [RFC 4/7] mm/kasan, mm/vmalloc: Respect GFP flags in kasan_populate_vmalloc() Uladzislau Rezki (Sony)
2025-07-07  1:47   ` Baoquan He
2025-07-08  1:15     ` Baoquan He
2025-07-08  8:30       ` Uladzislau Rezki
2025-07-04 15:25 ` [RFC 5/7] mm/vmalloc: Defer freeing partly initialized vm_struct Uladzislau Rezki (Sony)
2025-07-04 15:25 ` [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node() Uladzislau Rezki (Sony)
2025-07-07  7:13   ` Michal Hocko
2025-07-08 12:27     ` Uladzislau Rezki
2025-07-08 15:22       ` Michal Hocko
2025-07-09 11:20         ` Uladzislau Rezki
2025-07-08 15:47   ` Michal Hocko
2025-07-09 13:45     ` Uladzislau Rezki
2025-07-04 15:25 ` [RFC 7/7] mm: Drop __GFP_DIRECT_RECLAIM flag if PF_MEMALLOC is set Uladzislau Rezki (Sony)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).