rust-for-linux.vger.kernel.org archive mirror
* [PATCH v2 0/2] Align kvrealloc() with krealloc()
@ 2024-07-22 16:29 Danilo Krummrich
  2024-07-22 16:29 ` [PATCH v2 1/2] mm: vmalloc: implement vrealloc() Danilo Krummrich
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-22 16:29 UTC (permalink / raw)
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mhocko, mpe, chandan.babu, christian.koenig, maz, oliver.upton
  Cc: linux-kernel, linux-mm, rust-for-linux, Danilo Krummrich

Hi,

Besides the obvious (and desired) difference between krealloc() and kvrealloc(),
there is some inconsistency in their function signatures and behavior:

 - krealloc() frees the memory when the requested size is zero, whereas
   kvrealloc() simply returns a pointer to the existing allocation.

 - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
   kvrealloc() does not accept a NULL pointer at all and, if passed, would fault
   instead.

 - krealloc() is self-contained, whereas kvrealloc() relies on the caller to
   provide the size of the previous allocation.

Inconsistent behavior throughout allocation APIs is error prone, hence make
kvrealloc() behave like krealloc(), which seems superior in all mentioned
aspects.

In order to be able to get rid of kvrealloc()'s oldsize parameter, introduce
vrealloc() and make use of it in kvrealloc().

Making use of vrealloc() in kvrealloc() also provides opportunities to grow (and
shrink) allocations more efficiently. For instance, vrealloc() can be optimized
to allocate and map additional pages to grow the allocation or unmap and free
unused pages to shrink the allocation.

Besides the above, those functions are required by Rust's allocator abstractions
[1] (rework based on this series in [2]). With `Vec` or `KVec` respectively,
potentially growing (and shrinking) data structures are rather common.
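
As an illustration of the aligned semantics, a (purely hypothetical) caller
could look like the sketch below; struct foo_buf and its helpers are made up
for this example and are not part of the series:

	#include <linux/slab.h>	/* kvrealloc(), kvfree() */

	struct foo_buf {
		void *data;
		size_t size;
	};

	/* Grow (or initially allocate) @buf; @buf->data may be NULL here. */
	static int foo_buf_resize(struct foo_buf *buf, size_t new_size, gfp_t gfp)
	{
		void *tmp;

		/* With a NULL pointer, kvrealloc() behaves like kvmalloc(). */
		tmp = kvrealloc(buf->data, new_size, gfp);
		if (!tmp)
			return -ENOMEM;

		buf->data = tmp;
		buf->size = new_size;

		return 0;
	}

	static void foo_buf_free(struct foo_buf *buf)
	{
		/* A size of zero frees the allocation; kvfree() works just as well. */
		kvrealloc(buf->data, 0, GFP_KERNEL);
		buf->data = NULL;
		buf->size = 0;
	}

Note that the caller no longer needs to track the old size; kvrealloc() now
derives it itself.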

The patches of this series can also be found in [3].

[1] https://lore.kernel.org/lkml/20240704170738.3621-1-dakr@redhat.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/dakr/linux.git/log/?h=rust/mm
[3] https://git.kernel.org/pub/scm/linux/kernel/git/dakr/linux.git/log/?h=mm/krealloc

Changes in v2:
 - remove unnecessary extern and move __realloc_size to a new line for
   vrealloc_noprof() and kvrealloc_noprof()
 - drop EXPORT_SYMBOL for vrealloc_noprof()
 - rename to_kmalloc_flags() to kmalloc_gfp_adjust()
 - fix missing NULL check in vrealloc_noprof()
 - rephrase TODO comments in vrealloc_noprof()

Danilo Krummrich (2):
  mm: vmalloc: implement vrealloc()
  mm: kvmalloc: align kvrealloc() with krealloc()

 arch/arm64/kvm/nested.c                   |  1 -
 arch/powerpc/platforms/pseries/papr-vpd.c |  5 +-
 drivers/gpu/drm/drm_exec.c                |  3 +-
 fs/xfs/xfs_log_recover.c                  |  2 +-
 include/linux/slab.h                      |  4 +-
 include/linux/vmalloc.h                   |  4 +
 kernel/resource.c                         |  3 +-
 lib/fortify_kunit.c                       |  3 +-
 mm/util.c                                 | 89 +++++++++++++++--------
 mm/vmalloc.c                              | 59 +++++++++++++++
 10 files changed, 129 insertions(+), 44 deletions(-)


base-commit: 933069701c1b507825b514317d4edd5d3fd9d417
-- 
2.45.2



* [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-22 16:29 [PATCH v2 0/2] Align kvrealloc() with krealloc() Danilo Krummrich
@ 2024-07-22 16:29 ` Danilo Krummrich
  2024-07-26 14:37   ` Vlastimil Babka
  2024-07-22 16:29 ` [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc() Danilo Krummrich
  2024-07-23 18:54 ` [PATCH v2 0/2] Align " Michal Hocko
  2 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-22 16:29 UTC (permalink / raw)
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mhocko, mpe, chandan.babu, christian.koenig, maz, oliver.upton
  Cc: linux-kernel, linux-mm, rust-for-linux, Danilo Krummrich

Implement vrealloc() analogous to krealloc().

Currently, kvrealloc() requires the caller to pass the size of the
previous memory allocation; instead, it should be self-contained.

We attempt to fix this in a subsequent patch which, in order to do so,
requires vrealloc().

Besides that, we need realloc() functions for kernel allocators in Rust
too. With `Vec` or `KVec` respectively, potentially growing (and
shrinking) data structures are rather common.

Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
 include/linux/vmalloc.h |  4 +++
 mm/vmalloc.c            | 59 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index e4a631ec430b..ad2ce7a6ab7a 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -189,6 +189,10 @@ extern void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1
 extern void *vcalloc_noprof(size_t n, size_t size) __alloc_size(1, 2);
 #define vcalloc(...)		alloc_hooks(vcalloc_noprof(__VA_ARGS__))
 
+void * __must_check vrealloc_noprof(const void *p, size_t size, gfp_t flags)
+		__realloc_size(2);
+#define vrealloc(...)		alloc_hooks(vrealloc_noprof(__VA_ARGS__))
+
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6b783baf12a1..caf032f0bd69 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
 }
 EXPORT_SYMBOL(vzalloc_node_noprof);
 
+/**
+ * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
+ * @p: object to reallocate memory for
+ * @size: the size to reallocate
+ * @flags: the flags for the page level allocator
+ *
+ * The contents of the object pointed to are preserved up to the lesser of the
+ * new and old size (__GFP_ZERO flag is effectively ignored).
+ *
+ * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
+ * @p is not a %NULL pointer, the object pointed to is freed.
+ *
+ * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
+ *         failure
+ */
+void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
+{
+	size_t old_size = 0;
+	void *n;
+
+	if (!size) {
+		vfree(p);
+		return NULL;
+	}
+
+	if (p) {
+		struct vm_struct *vm;
+
+		vm = find_vm_area(p);
+		if (unlikely(!vm)) {
+			WARN(1, "Trying to vrealloc() nonexistent vm area (%p)\n", p);
+			return NULL;
+		}
+
+		old_size = get_vm_area_size(vm);
+	}
+
+	if (size <= old_size) {
+		/*
+		 * TODO: Shrink the vm_area, i.e. unmap and free unused pages.
+		 * What would be a good heuristic for when to shrink the
+		 * vm_area?
+		 */
+		return (void *)p;
+	}
+
+	/* TODO: Grow the vm_area, i.e. allocate and map additional pages. */
+	n = __vmalloc_noprof(size, flags);
+	if (!n)
+		return NULL;
+
+	if (p) {
+		memcpy(n, p, old_size);
+		vfree(p);
+	}
+
+	return n;
+}
+
 #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
 #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
 #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
-- 
2.45.2



* [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-22 16:29 [PATCH v2 0/2] Align kvrealloc() with krealloc() Danilo Krummrich
  2024-07-22 16:29 ` [PATCH v2 1/2] mm: vmalloc: implement vrealloc() Danilo Krummrich
@ 2024-07-22 16:29 ` Danilo Krummrich
  2024-07-23  1:43   ` Andrew Morton
                     ` (2 more replies)
  2024-07-23 18:54 ` [PATCH v2 0/2] Align " Michal Hocko
  2 siblings, 3 replies; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-22 16:29 UTC (permalink / raw)
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mhocko, mpe, chandan.babu, christian.koenig, maz, oliver.upton
  Cc: linux-kernel, linux-mm, rust-for-linux, Danilo Krummrich

Besides the obvious (and desired) difference between krealloc() and
kvrealloc(), there is some inconsistency in their function signatures
and behavior:

 - krealloc() frees the memory when the requested size is zero, whereas
   kvrealloc() simply returns a pointer to the existing allocation.

 - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
   kvrealloc() does not accept a NULL pointer at all and, if passed,
   would fault instead.

 - krealloc() is self-contained, whereas kvrealloc() relies on the caller
   to provide the size of the previous allocation.

Inconsistent behavior throughout allocation APIs is error prone, hence make
kvrealloc() behave like krealloc(), which seems superior in all mentioned
aspects.

Besides that, implementing kvrealloc() by making use of krealloc() and
vrealloc() provides opportunities to grow (and shrink) allocations more
efficiently. For instance, vrealloc() can be optimized to allocate and
map additional pages to grow the allocation or unmap and free unused
pages to shrink the allocation.

Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
 arch/arm64/kvm/nested.c                   |  1 -
 arch/powerpc/platforms/pseries/papr-vpd.c |  5 +-
 drivers/gpu/drm/drm_exec.c                |  3 +-
 fs/xfs/xfs_log_recover.c                  |  2 +-
 include/linux/slab.h                      |  4 +-
 kernel/resource.c                         |  3 +-
 lib/fortify_kunit.c                       |  3 +-
 mm/util.c                                 | 89 +++++++++++++++--------
 8 files changed, 66 insertions(+), 44 deletions(-)

diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index de789e0f1ae9..1ff3079aabc9 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -62,7 +62,6 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	 */
 	num_mmus = atomic_read(&kvm->online_vcpus) * S2_MMU_PER_VCPU;
 	tmp = kvrealloc(kvm->arch.nested_mmus,
-			size_mul(sizeof(*kvm->arch.nested_mmus), kvm->arch.nested_mmus_size),
 			size_mul(sizeof(*kvm->arch.nested_mmus), num_mmus),
 			GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!tmp)
diff --git a/arch/powerpc/platforms/pseries/papr-vpd.c b/arch/powerpc/platforms/pseries/papr-vpd.c
index c29e85db5f35..1574176e3ffc 100644
--- a/arch/powerpc/platforms/pseries/papr-vpd.c
+++ b/arch/powerpc/platforms/pseries/papr-vpd.c
@@ -156,10 +156,7 @@ static int vpd_blob_extend(struct vpd_blob *blob, const char *data, size_t len)
 	const char *old_ptr = blob->data;
 	char *new_ptr;
 
-	new_ptr = old_ptr ?
-		kvrealloc(old_ptr, old_len, new_len, GFP_KERNEL_ACCOUNT) :
-		kvmalloc(len, GFP_KERNEL_ACCOUNT);
-
+	new_ptr = kvrealloc(old_ptr, new_len, GFP_KERNEL_ACCOUNT);
 	if (!new_ptr)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
index 2da094bdf8a4..18e366cc4993 100644
--- a/drivers/gpu/drm/drm_exec.c
+++ b/drivers/gpu/drm/drm_exec.c
@@ -145,8 +145,7 @@ static int drm_exec_obj_locked(struct drm_exec *exec,
 		size_t size = exec->max_objects * sizeof(void *);
 		void *tmp;
 
-		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
-				GFP_KERNEL);
+		tmp = kvrealloc(exec->objects, size + PAGE_SIZE, GFP_KERNEL);
 		if (!tmp)
 			return -ENOMEM;
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 4423dd344239..1997981827fb 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2128,7 +2128,7 @@ xlog_recover_add_to_cont_trans(
 	old_ptr = item->ri_buf[item->ri_cnt-1].i_addr;
 	old_len = item->ri_buf[item->ri_cnt-1].i_len;
 
-	ptr = kvrealloc(old_ptr, old_len, len + old_len, GFP_KERNEL);
+	ptr = kvrealloc(old_ptr, len + old_len, GFP_KERNEL);
 	if (!ptr)
 		return -ENOMEM;
 	memcpy(&ptr[old_len], dp, len);
diff --git a/include/linux/slab.h b/include/linux/slab.h
index eb2bf4629157..c9cb42203183 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -841,8 +841,8 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
 #define kvcalloc_node(...)			alloc_hooks(kvcalloc_node_noprof(__VA_ARGS__))
 #define kvcalloc(...)				alloc_hooks(kvcalloc_noprof(__VA_ARGS__))
 
-extern void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
-		      __realloc_size(3);
+void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
+		__realloc_size(2);
 #define kvrealloc(...)				alloc_hooks(kvrealloc_noprof(__VA_ARGS__))
 
 extern void kvfree(const void *addr);
diff --git a/kernel/resource.c b/kernel/resource.c
index 14777afb0a99..9f747bb7cd03 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -450,8 +450,7 @@ int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
 			/* re-alloc */
 			struct resource *rams_new;
 
-			rams_new = kvrealloc(rams, rams_size * sizeof(struct resource),
-					     (rams_size + 16) * sizeof(struct resource),
+			rams_new = kvrealloc(rams, (rams_size + 16) * sizeof(struct resource),
 					     GFP_KERNEL);
 			if (!rams_new)
 				goto out;
diff --git a/lib/fortify_kunit.c b/lib/fortify_kunit.c
index f9ad60a9c7bd..ecb638d4cde1 100644
--- a/lib/fortify_kunit.c
+++ b/lib/fortify_kunit.c
@@ -306,8 +306,7 @@ DEFINE_ALLOC_SIZE_TEST_PAIR(vmalloc)
 	orig = kvmalloc(prev_size, gfp);				\
 	KUNIT_EXPECT_TRUE(test, orig != NULL);				\
 	checker(((expected_pages) * PAGE_SIZE) * 2,			\
-		kvrealloc(orig, prev_size,				\
-			  ((alloc_pages) * PAGE_SIZE) * 2, gfp),	\
+		kvrealloc(orig, ((alloc_pages) * PAGE_SIZE) * 2, gfp),	\
 		kvfree(p));						\
 } while (0)
 DEFINE_ALLOC_SIZE_TEST_PAIR(kvmalloc)
diff --git a/mm/util.c b/mm/util.c
index bc488f0121a7..0ff5898cc6de 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -608,6 +608,28 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_mmap);
 
+static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
+{
+	/*
+	 * We want to attempt a large physically contiguous block first because
+	 * it is less likely to fragment multiple larger blocks and therefore
+	 * contribute to a long term fragmentation less than vmalloc fallback.
+	 * However make sure that larger requests are not too disruptive - no
+	 * OOM killer and no allocation failure warnings as we have a fallback.
+	 */
+	if (size > PAGE_SIZE) {
+		flags |= __GFP_NOWARN;
+
+		if (!(flags & __GFP_RETRY_MAYFAIL))
+			flags |= __GFP_NORETRY;
+
+		/* nofail semantic is implemented by the vmalloc fallback */
+		flags &= ~__GFP_NOFAIL;
+	}
+
+	return flags;
+}
+
 /**
  * __kvmalloc_node - attempt to allocate physically contiguous memory, but upon
  * failure, fall back to non-contiguous (vmalloc) allocation.
@@ -627,32 +649,15 @@ EXPORT_SYMBOL(vm_mmap);
  */
 void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 {
-	gfp_t kmalloc_flags = flags;
 	void *ret;
 
-	/*
-	 * We want to attempt a large physically contiguous block first because
-	 * it is less likely to fragment multiple larger blocks and therefore
-	 * contribute to a long term fragmentation less than vmalloc fallback.
-	 * However make sure that larger requests are not too disruptive - no
-	 * OOM killer and no allocation failure warnings as we have a fallback.
-	 */
-	if (size > PAGE_SIZE) {
-		kmalloc_flags |= __GFP_NOWARN;
-
-		if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
-			kmalloc_flags |= __GFP_NORETRY;
-
-		/* nofail semantic is implemented by the vmalloc fallback */
-		kmalloc_flags &= ~__GFP_NOFAIL;
-	}
-
-	ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b), kmalloc_flags, node);
-
 	/*
 	 * It doesn't really make sense to fallback to vmalloc for sub page
 	 * requests
 	 */
+	ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
+				    kmalloc_gfp_adjust(flags, size),
+				    node);
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
@@ -715,18 +720,42 @@ void kvfree_sensitive(const void *addr, size_t len)
 }
 EXPORT_SYMBOL(kvfree_sensitive);
 
-void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+/**
+ * kvrealloc - reallocate memory; contents remain unchanged
+ * @p: object to reallocate memory for
+ * @size: the size to reallocate
+ * @flags: the flags for the page level allocator
+ *
+ * The contents of the object pointed to are preserved up to the lesser of the
+ * new and old size (__GFP_ZERO flag is effectively ignored).
+ *
+ * If @p is %NULL, kvrealloc() behaves exactly like kvmalloc(). If @size is 0
+ * and @p is not a %NULL pointer, the object pointed to is freed.
+ *
+ * Return: pointer to the allocated memory or %NULL in case of error
+ */
+void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
 {
-	void *newp;
+	void *n;
+
+	if (is_vmalloc_addr(p))
+		return vrealloc_noprof(p, size, flags);
+
+	n = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
+	if (!n) {
+		/* We failed to krealloc(), fall back to kvmalloc(). */
+		n = kvmalloc_noprof(size, flags);
+		if (!n)
+			return NULL;
+
+		if (p) {
+			/* We already know that `p` is not a vmalloc address. */
+			memcpy(n, p, ksize(p));
+			kfree(p);
+		}
+	}
 
-	if (oldsize >= newsize)
-		return (void *)p;
-	newp = kvmalloc_noprof(newsize, flags);
-	if (!newp)
-		return NULL;
-	memcpy(newp, p, oldsize);
-	kvfree(p);
-	return newp;
+	return n;
 }
 EXPORT_SYMBOL(kvrealloc_noprof);
 
-- 
2.45.2



* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-22 16:29 ` [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc() Danilo Krummrich
@ 2024-07-23  1:43   ` Andrew Morton
  2024-07-23 14:05     ` Danilo Krummrich
  2024-07-23  7:50   ` Michal Hocko
  2024-07-26 14:38   ` Vlastimil Babka
  2 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2024-07-23  1:43 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, vbabka, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux

On Mon, 22 Jul 2024 18:29:24 +0200 Danilo Krummrich <dakr@kernel.org> wrote:

> Besides the obvious (and desired) difference between krealloc() and
> kvrealloc(), there is some inconsistency in their function signatures
> and behavior:
> 
>  - krealloc() frees the memory when the requested size is zero, whereas
>    kvrealloc() simply returns a pointer to the existing allocation.

The old kvrealloc() behavior actually sounds somewhat useful.  You've
checked that no existing sites were relying on this?

And that all existing kvrealloc() callers were (incorrectly) checking
for NULL?  Seems that way.



* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-22 16:29 ` [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc() Danilo Krummrich
  2024-07-23  1:43   ` Andrew Morton
@ 2024-07-23  7:50   ` Michal Hocko
  2024-07-23 10:42     ` Danilo Krummrich
  2024-07-26 14:38   ` Vlastimil Babka
  2 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2024-07-23  7:50 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Mon 22-07-24 18:29:24, Danilo Krummrich wrote:
> Besides the obvious (and desired) difference between krealloc() and
> kvrealloc(), there is some inconsistency in their function signatures
> and behavior:
> 
>  - krealloc() frees the memory when the requested size is zero, whereas
>    kvrealloc() simply returns a pointer to the existing allocation.
> 
>  - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
>    kvrealloc() does not accept a NULL pointer at all and, if passed,
>    would fault instead.
> 
>  - krealloc() is self-contained, whereas kvrealloc() relies on the caller
>    to provide the size of the previous allocation.
> 
> Inconsistent behavior throughout allocation APIs is error prone, hence make
> kvrealloc() behave like krealloc(), which seems superior in all mentioned
> aspects.

I completely agree with this. Fortunately the number of existing callers
is small and none of them really seem to depend on the current behavior
in that aspect.
 
> Besides that, implementing kvrealloc() by making use of krealloc() and
> vrealloc() provides opportunities to grow (and shrink) allocations more
> efficiently. For instance, vrealloc() can be optimized to allocate and
> map additional pages to grow the allocation or unmap and free unused
> pages to shrink the allocation.

This seems like a change that is independent of the above and should be
a patch on its own.

[...]

> diff --git a/mm/util.c b/mm/util.c
> index bc488f0121a7..0ff5898cc6de 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -608,6 +608,28 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
>  }
>  EXPORT_SYMBOL(vm_mmap);
>  
> +static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)

This seems like a generally useful helper, which it is not. I would call
it something like __kvmalloc_gfp_adjust or something similar so that it is
clear that this is just a helper to adjust gfp flags for the slab allocator
path.

[...]
> -void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
> +/**
> + * kvrealloc - reallocate memory; contents remain unchanged
> + * @p: object to reallocate memory for
> + * @size: the size to reallocate
> + * @flags: the flags for the page level allocator
> + *
> + * The contents of the object pointed to are preserved up to the lesser of the
> + * new and old size (__GFP_ZERO flag is effectively ignored).
> + *
> + * If @p is %NULL, kvrealloc() behaves exactly like kvmalloc(). If @size is 0
> + * and @p is not a %NULL pointer, the object pointed to is freed.
> + *
> + * Return: pointer to the allocated memory or %NULL in case of error
> + */
> +void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
>  {
> -	void *newp;
> +	void *n;
> +

	if (!size && p) {
		kvfree(p);
		return NULL;
	}

would make this code flow slightly easier to read because the freeing
path would be shared for all combinations IMO.

> +	if (is_vmalloc_addr(p))
> +		return vrealloc_noprof(p, size, flags);
> +
> +	n = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> +	if (!n) {
> +		/* We failed to krealloc(), fall back to kvmalloc(). */
> +		n = kvmalloc_noprof(size, flags);

Why don't you simply use vrealloc_noprof here?

> +		if (!n)
> +			return NULL;
> +
> +		if (p) {
> +			/* We already know that `p` is not a vmalloc address. */
> +			memcpy(n, p, ksize(p));
> +			kfree(p);
> +		}
> +	}
>  
> -	if (oldsize >= newsize)
> -		return (void *)p;
> -	newp = kvmalloc_noprof(newsize, flags);
> -	if (!newp)
> -		return NULL;
> -	memcpy(newp, p, oldsize);
> -	kvfree(p);
> -	return newp;
> +	return n;
>  }
>  EXPORT_SYMBOL(kvrealloc_noprof);
>  
> -- 
> 2.45.2

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-23  7:50   ` Michal Hocko
@ 2024-07-23 10:42     ` Danilo Krummrich
  2024-07-23 10:55       ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-23 10:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Tue, Jul 23, 2024 at 09:50:13AM +0200, Michal Hocko wrote:
> On Mon 22-07-24 18:29:24, Danilo Krummrich wrote:
> > Besides the obvious (and desired) difference between krealloc() and
> > kvrealloc(), there is some inconsistency in their function signatures
> > and behavior:
> > 
> >  - krealloc() frees the memory when the requested size is zero, whereas
> >    kvrealloc() simply returns a pointer to the existing allocation.
> > 
> >  - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
> >    kvrealloc() does not accept a NULL pointer at all and, if passed,
> >    would fault instead.
> > 
> >  - krealloc() is self-contained, whereas kvrealloc() relies on the caller
> >    to provide the size of the previous allocation.
> > 
> > Inconsistent behavior throughout allocation APIs is error prone, hence make
> > kvrealloc() behave like krealloc(), which seems superior in all mentioned
> > aspects.
> 
> I completely agree with this. Fortunately the number of existing callers
> is small and none of them really seem to depend on the current behavior
> in that aspect.
>  
> > Besides that, implementing kvrealloc() by making use of krealloc() and
> > vrealloc() provides opportunities to grow (and shrink) allocations more
> > efficiently. For instance, vrealloc() can be optimized to allocate and
> > map additional pages to grow the allocation or unmap and free unused
> > pages to shrink the allocation.
> 
> This seems like a change that is independent on the above and should be
> a patch on its own.

The optimizations you mean? Yes, I intend to do this in a separate series. For
now, I put TODOs in vrealloc.

> 
> [...]
> 
> > diff --git a/mm/util.c b/mm/util.c
> > index bc488f0121a7..0ff5898cc6de 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -608,6 +608,28 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
> >  }
> >  EXPORT_SYMBOL(vm_mmap);
> >  
> > +static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
> 
> This seems like a generally useful helper which it is not. I would call
> it something like __kvmalloc_gfp_adjust or something similar so that it is
> clear that this is just a helper to adjust gfp flag for slab allocator
> path

Christoph proposed this name; I think he wanted to encode the target of the
flags, whereas you want to encode where the function is intended to be called
from.

When I originally named this thing, I had the same conflict - encoding both
turns out clumsy - and came up with to_kmalloc_flags().

Personally, I'd be fine with __kvmalloc_gfp_adjust() too.

> 
> [...]
> > -void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
> > +/**
> > + * kvrealloc - reallocate memory; contents remain unchanged
> > + * @p: object to reallocate memory for
> > + * @size: the size to reallocate
> > + * @flags: the flags for the page level allocator
> > + *
> > + * The contents of the object pointed to are preserved up to the lesser of the
> > + * new and old size (__GFP_ZERO flag is effectively ignored).
> > + *
> > + * If @p is %NULL, kvrealloc() behaves exactly like kvmalloc(). If @size is 0
> > + * and @p is not a %NULL pointer, the object pointed to is freed.
> > + *
> > + * Return: pointer to the allocated memory or %NULL in case of error
> > + */
> > +void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
> >  {
> > -	void *newp;
> > +	void *n;
> > +
> 
> 	if (!size && p) {
> 		kvfree(p);
> 		return NULL;
> 	}
> 
> would make this code flow slightly easier to read because the freeing
> path would be shared for all combinations IMO.

Personally, I like it without. For me the simplicity comes from directing things
to either krealloc() or vrealloc(). But I'd be open to change it however.

> 
> > +	if (is_vmalloc_addr(p))
> > +		return vrealloc_noprof(p, size, flags);
> > +
> > +	n = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> > +	if (!n) {
> > +		/* We failed to krealloc(), fall back to kvmalloc(). */
> > +		n = kvmalloc_noprof(size, flags);
> 
> Why don't you simply use vrealloc_noprof here?

We could do that, but we'd also need to do the same checks kvmalloc() does, i.e.

	/*
	 * It doesn't really make sense to fallback to vmalloc for sub page
	 * requests
	 */
	if (ret || size <= PAGE_SIZE)
		return ret;

	/* non-sleeping allocations are not supported by vmalloc */
	if (!gfpflags_allow_blocking(flags))
		return NULL;

	/* Don't even allow crazy sizes */
	if (unlikely(size > INT_MAX)) {
		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
		return NULL;
	}

Does the kmalloc() retry through kvmalloc() hurt us enough to do that? This
should only ever happen when we switch from a kmalloc buffer to a vmalloc
buffer, which we only do once; we never switch back.

> 
> > +		if (!n)
> > +			return NULL;
> > +
> > +		if (p) {
> > +			/* We already know that `p` is not a vmalloc address. */
> > +			memcpy(n, p, ksize(p));
> > +			kfree(p);
> > +		}
> > +	}
> >  
> > -	if (oldsize >= newsize)
> > -		return (void *)p;
> > -	newp = kvmalloc_noprof(newsize, flags);
> > -	if (!newp)
> > -		return NULL;
> > -	memcpy(newp, p, oldsize);
> > -	kvfree(p);
> > -	return newp;
> > +	return n;
> >  }
> >  EXPORT_SYMBOL(kvrealloc_noprof);
> >  
> > -- 
> > 2.45.2
> 
> -- 
> Michal Hocko
> SUSE Labs
> 


* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-23 10:42     ` Danilo Krummrich
@ 2024-07-23 10:55       ` Michal Hocko
  2024-07-23 11:55         ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2024-07-23 10:55 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Tue 23-07-24 12:42:17, Danilo Krummrich wrote:
> On Tue, Jul 23, 2024 at 09:50:13AM +0200, Michal Hocko wrote:
> > On Mon 22-07-24 18:29:24, Danilo Krummrich wrote:
[...]
> > > Besides that, implementing kvrealloc() by making use of krealloc() and
> > > vrealloc() provides opportunities to grow (and shrink) allocations more
> > > efficiently. For instance, vrealloc() can be optimized to allocate and
> > > map additional pages to grow the allocation or unmap and free unused
> > > pages to shrink the allocation.
> > 
> > This seems like a change that is independent on the above and should be
> > a patch on its own.
> 
> The optimizations you mean? Yes, I intend to do this in a separate series. For
> now, I put TODOs in vrealloc.

No, I mean that the change of the signature and semantics should be done along
with the update to the callers, and the new implementation of the function
itself should be done in its own patch.

[...]
> > > +void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
> > >  {
> > > -	void *newp;
> > > +	void *n;
> > > +
> > 
> > 	if (!size && p) {
> > 		kvfree(p);
> > 		return NULL;
> > 	}
> > 
> > would make this code flow slightly easier to read because the freeing
> > path would be shared for all combinations IMO.
> 
> Personally, I like it without. For me the simplicity comes from directing things
> to either krealloc() or vrealloc(). But I'd be open to change it however.

I would really prefer to have it there because it makes the follow up
code easier.

> > > +	if (is_vmalloc_addr(p))
> > > +		return vrealloc_noprof(p, size, flags);
> > > +
> > > +	n = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> > > +	if (!n) {
> > > +		/* We failed to krealloc(), fall back to kvmalloc(). */
> > > +		n = kvmalloc_noprof(size, flags);
> > 
> > Why don't you simply use vrealloc_noprof here?
> 
> We could do that, but we'd also need to do the same checks kvmalloc() does, i.e.
> 
> 	/*
> 	 * It doesn't really make sense to fallback to vmalloc for sub page
> 	 * requests
> 	 */
> 	if (ret || size <= PAGE_SIZE)
> 		return ret;

With the early !size && p check we wouldn't, right?

> 
> 	/* non-sleeping allocations are not supported by vmalloc */
> 	if (!gfpflags_allow_blocking(flags))
> 		return NULL;
> 
> 	/* Don't even allow crazy sizes */
> 	if (unlikely(size > INT_MAX)) {
> 		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
> 		return NULL;
> 	}

I do not see why kvrealloc should have a different set of constraints than
vrealloc in this regard.

> Does the kmalloc() retry through kvmalloc() hurt us enough to do that? This
> should only ever happen when we switch from a kmalloc buffer to a vmalloc
> buffer, which we only do once, we never switch back.

This is effectively open coding part of vrealloc without any good
reason. Please get rid of that.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-23 10:55       ` Michal Hocko
@ 2024-07-23 11:55         ` Danilo Krummrich
  2024-07-23 12:12           ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-23 11:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Tue, Jul 23, 2024 at 12:55:45PM +0200, Michal Hocko wrote:
> On Tue 23-07-24 12:42:17, Danilo Krummrich wrote:
> > On Tue, Jul 23, 2024 at 09:50:13AM +0200, Michal Hocko wrote:
> > > On Mon 22-07-24 18:29:24, Danilo Krummrich wrote:
> [...]
> > > > Besides that, implementing kvrealloc() by making use of krealloc() and
> > > > vrealloc() provides opportunities to grow (and shrink) allocations more
> > > > efficiently. For instance, vrealloc() can be optimized to allocate and
> > > > map additional pages to grow the allocation or unmap and free unused
> > > > pages to shrink the allocation.
> > > 
> > > This seems like a change that is independent on the above and should be
> > > a patch on its own.
> > 
> > The optimizations you mean? Yes, I intend to do this in a separate series. For
> > now, I put TODOs in vrealloc.
> 
> No I mean, that the change of the signature and semantic should be done along with
> update to callers and the new implementation of the function itself
> should be done in its own patch.

Sorry, it seems like you lost me a bit.

There is one patch that implements vrealloc() and one patch that does the change
of kvrealloc()'s signature, semantics and the corresponding update to the
callers.

Isn't that already what you ask for?

> 
> [...]
> > > > +void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
> > > >  {
> > > > -	void *newp;
> > > > +	void *n;
> > > > +
> > > 
> > > 	if (!size && p) {
> > > 		kvfree(p);
> > > 		return NULL;
> > > 	}
> > > 
> > > would make this code flow slightly easier to read because the freeing
> > > path would be shared for all combinations IMO.
> > 
> > Personally, I like it without. For me the simplicity comes from directing things
> > to either krealloc() or vrealloc(). But I'd be open to change it however.
> 
> I would really prefer to have it there because it makes the follow up
> code easier.

I don't think it does (see below).

Either way, I got notified that Andrew applied the patches to mm-unstable. How
to proceed from there for further changes, if any?

> 
> > > > +	if (is_vmalloc_addr(p))
> > > > +		return vrealloc_noprof(p, size, flags);
> > > > +
> > > > +	n = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> > > > +	if (!n) {
> > > > +		/* We failed to krealloc(), fall back to kvmalloc(). */
> > > > +		n = kvmalloc_noprof(size, flags);
> > > 
> > > Why don't you simply use vrealloc_noprof here?
> > 
> > We could do that, but we'd also need to do the same checks kvmalloc() does, i.e.
> > 
> > 	/*
> > 	 * It doesn't really make sense to fallback to vmalloc for sub page
> > 	 * requests
> > 	 */
> > 	if (ret || size <= PAGE_SIZE)
> > 		return ret;
> 
> With the early !size && p check we wouldn't right?

I think that's unrelated. Your proposed early check tests for size == 0 to free
and return early, whereas this check bails out if we fail kmalloc() with
size <= PAGE_SIZE, because a subsequent vmalloc() wouldn't make a lot of sense.

> 
> > 
> > 	/* non-sleeping allocations are not supported by vmalloc */
> > 	if (!gfpflags_allow_blocking(flags))
> > 		return NULL;
> > 
> > 	/* Don't even allow crazy sizes */
> > 	if (unlikely(size > INT_MAX)) {
> > 		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
> > 		return NULL;
> > 	}
> 
> I do not see why kvrealloc should have a different set of constraints than
> vrealloc in this regard.

Those constraints come from kvmalloc() and hence should also apply for
kvrealloc(). What you seem to question here is whether they should be moved from
kvmalloc() to vmalloc() (and hence implicitly to vrealloc()).

As for the gfpflags_allow_blocking() check, it seems like this one was suggested
by you for kvmalloc() [1]. It seems that some people call kvmalloc() with
GFP_ATOMIC (which seems a bit weird at first glance, but maybe makes sense in
some generic code paths). Hence, kvrealloc() must be able to handle it as well.

As for the size > INT_MAX check, please see the discussion in commit
0708a0afe291 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls").

But again, whether those checks should be moved to vmalloc() is probably a
different topic.

[1] https://lore.kernel.org/all/20220926151650.15293-1-fw@strlen.de/T/#u

> 
> > Does the kmalloc() retry through kvmalloc() hurt us enough to do that? This
> > should only ever happen when we switch from a kmalloc buffer to a vmalloc
> > buffer, which we only do once, we never switch back.
> 
> This is effectively open coding part of vrealloc without any good
> reason. Please get rid of that.
> 
> -- 
> Michal Hocko
> SUSE Labs
> 


* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-23 11:55         ` Danilo Krummrich
@ 2024-07-23 12:12           ` Michal Hocko
  2024-07-23 13:33             ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2024-07-23 12:12 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Tue 23-07-24 13:55:48, Danilo Krummrich wrote:
> On Tue, Jul 23, 2024 at 12:55:45PM +0200, Michal Hocko wrote:
> > On Tue 23-07-24 12:42:17, Danilo Krummrich wrote:
> > > On Tue, Jul 23, 2024 at 09:50:13AM +0200, Michal Hocko wrote:
> > > > On Mon 22-07-24 18:29:24, Danilo Krummrich wrote:
> > [...]
> > > > > Besides that, implementing kvrealloc() by making use of krealloc() and
> > > > > vrealloc() provides opportunities to grow (and shrink) allocations more
> > > > > efficiently. For instance, vrealloc() can be optimized to allocate and
> > > > > map additional pages to grow the allocation or unmap and free unused
> > > > > pages to shrink the allocation.
> > > > 
> > > > This seems like a change that is independent on the above and should be
> > > > a patch on its own.
> > > 
> > > The optimizations you mean? Yes, I intend to do this in a separate series. For
> > > now, I put TODOs in vrealloc.
> > 
> > No I mean, that the change of the signature and semantic should be done along with
> > update to callers and the new implementation of the function itself
> > should be done in its own patch.
> 
> Sorry, it seems like you lost me a bit.
> 
> There is one patch that implements vrealloc() and one patch that does the change
> of krealloc()'s signature, semantics and the corresponding update to the
> callers.
> 
> Isn't that already what you ask for?

No, because this second patch reimplements kvrealloc to use krealloc
and a vrealloc fallback. More clear now?
 
> > [...]
> > > > > +void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
> > > > >  {
> > > > > -	void *newp;
> > > > > +	void *n;
> > > > > +
> > > > 
> > > > 	if (!size && p) {
> > > > 		kvfree(p);
> > > > 		return NULL;
> > > > 	}
> > > > 
> > > > would make this code flow slightly easier to read because the freeing
> > > > path would be shared for all combinations IMO.
> > > 
> > > Personally, I like it without. For me the simplicity comes from directing things
> > > to either krealloc() or vrealloc(). But I'd be open to change it however.
> > 
> > I would really prefer to have it there because it makes the follow up
> > code easier.
> 
> I don't think it does (see below).
> 
> Either way, I got notified that Andrew applied the patches to mm-unstable. How
> to proceed from there for further changes, if any?

Andrew will either apply follow-up fixes or replace the series with a new
version.

> > 
> > > > > +	if (is_vmalloc_addr(p))
> > > > > +		return vrealloc_noprof(p, size, flags);
> > > > > +
> > > > > +	n = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> > > > > +	if (!n) {
> > > > > +		/* We failed to krealloc(), fall back to kvmalloc(). */
> > > > > +		n = kvmalloc_noprof(size, flags);
> > > > 
> > > > Why don't you simply use vrealloc_noprof here?
> > > 
> > > We could do that, but we'd also need to do the same checks kvmalloc() does, i.e.
> > > 
> > > 	/*
> > > 	 * It doesn't really make sense to fallback to vmalloc for sub page
> > > 	 * requests
> > > 	 */
> > > 	if (ret || size <= PAGE_SIZE)
> > > 		return ret;
> > 
> > With the early !size && p check we wouldn't right?
> 
> I think that's unrelated. Your proposed early check checks for size == 0 to free
> and return early. Whereas this check bails out if we fail kmalloc() with
> size <= PAGE_SIZE, because a subsequent vmalloc() wouldn't make a lot of sense.

It seems we are not on the same page here. Here is what I would like
kvrealloc to look like in the end:

void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
{
        void *newp;

        if (!size && p) {
                kvfree(p);
                return NULL;
        }

        if (!is_vmalloc_addr(p))
                newp = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));

        if (newp)
                return newp;

        return vrealloc_noprof(p, size, flags);
}
EXPORT_SYMBOL(kvrealloc_noprof);

krealloc_noprof should be extended for the maximum allowed size and so
should vrealloc_noprof. The implementation of kvrealloc cannot get any
easier or more straightforward AFAICS. See my point?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-23 12:12           ` Michal Hocko
@ 2024-07-23 13:33             ` Danilo Krummrich
  2024-07-23 18:53               ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-23 13:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Tue, Jul 23, 2024 at 02:12:23PM +0200, Michal Hocko wrote:
> On Tue 23-07-24 13:55:48, Danilo Krummrich wrote:
> > On Tue, Jul 23, 2024 at 12:55:45PM +0200, Michal Hocko wrote:
> > > On Tue 23-07-24 12:42:17, Danilo Krummrich wrote:
> > > > On Tue, Jul 23, 2024 at 09:50:13AM +0200, Michal Hocko wrote:
> > > > > On Mon 22-07-24 18:29:24, Danilo Krummrich wrote:
> > > [...]
> > > > > > Besides that, implementing kvrealloc() by making use of krealloc() and
> > > > > > vrealloc() provides opportunities to grow (and shrink) allocations more
> > > > > > efficiently. For instance, vrealloc() can be optimized to allocate and
> > > > > > map additional pages to grow the allocation or unmap and free unused
> > > > > > pages to shrink the allocation.
> > > > > 
> > > > > This seems like a change that is independent on the above and should be
> > > > > a patch on its own.
> > > > 
> > > > The optimizations you mean? Yes, I intend to do this in a separate series. For
> > > > now, I put TODOs in vrealloc.
> > > 
> > > No I mean, that the change of the signature and semantic should be done along with
> > > update to callers and the new implementation of the function itself
> > > should be done in its own patch.
> > 
> > Sorry, it seems like you lost me a bit.
> > 
> > There is one patch that implements vrealloc() and one patch that does the change
> > of krealloc()'s signature, semantics and the corresponding update to the
> > callers.
> > 
> > Isn't that already what you ask for?
> 
> No, because this second patch reimplements kvrealloc to use krealloc
> and a vrealloc fallback. More clear now?

I'm very sorry, but no. The second patch just changes kvrealloc(); how do you
want to split it up?

> > > > > > +	if (is_vmalloc_addr(p))
> > > > > > +		return vrealloc_noprof(p, size, flags);
> > > > > > +
> > > > > > +	n = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> > > > > > +	if (!n) {
> > > > > > +		/* We failed to krealloc(), fall back to kvmalloc(). */
> > > > > > +		n = kvmalloc_noprof(size, flags);
> > > > > 
> > > > > Why don't you simply use vrealloc_noprof here?
> > > > 
> > > > We could do that, but we'd also need to do the same checks kvmalloc() does, i.e.
> > > > 
> > > > 	/*
> > > > 	 * It doesn't really make sense to fallback to vmalloc for sub page
> > > > 	 * requests
> > > > 	 */
> > > > 	if (ret || size <= PAGE_SIZE)
> > > > 		return ret;
> > > 
> > > With the early !size && p check we wouldn't right?
> > 
> > I think that's unrelated. Your proposed early check checks for size == 0 to free
> > and return early. Whereas this check bails out if we fail kmalloc() with
> > size <= PAGE_SIZE, because a subsequent vmalloc() wouldn't make a lot of sense.
> 
> It seems we are not on the same page here. Here is what I would like
> kvrealloc to look like in the end:
> 
> void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
> {
>         void *newp;
> 
>         if (!size && p) {
>                 kvfree(p);
>                 return NULL;
>         }
> 
>         if (!is_vmalloc_addr(p))
>                 newp = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> 
>         if (newp)
>                 return newp;
> 
>         return vrealloc_noprof(p, size, flags);
> }
> EXPORT_SYMBOL(kvrealloc_noprof);

This looks weird. The fact that you're passing p to vrealloc_noprof() if
krealloc_noprof() fails, implies that vrealloc_noprof() must be able to deal
with pointers to kmalloc'd memory.

Also, you never migrate from kmalloc memory to vmalloc memory and never free p.
Given the above, do you mean to say that vrealloc_noprof() should do all that?

If so, I strongly disagree here. vrealloc() should only deal with vmalloc
memory.

> 
> krealloc_noprof should be extended for the maximum allowed size

krealloc_noprof() already has a maximum allowed size.

> and so does vrealloc_noprof.

Probably, but I don't think this series is the correct scope for this change.
I'd offer to send a separate patch for this though.

> The implementation of the kvrealloc cannot get any
> easier and more straightforward AFAICS. See my point?
> -- 
> Michal Hocko
> SUSE Labs
> 


* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-23  1:43   ` Andrew Morton
@ 2024-07-23 14:05     ` Danilo Krummrich
  0 siblings, 0 replies; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-23 14:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, vbabka, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux

On Mon, Jul 22, 2024 at 06:43:48PM -0700, Andrew Morton wrote:
> On Mon, 22 Jul 2024 18:29:24 +0200 Danilo Krummrich <dakr@kernel.org> wrote:
> 
> > Besides the obvious (and desired) difference between krealloc() and
> > kvrealloc(), there is some inconsistency in their function signatures
> > and behavior:
> > 
> >  - krealloc() frees the memory when the requested size is zero, whereas
> >    kvrealloc() simply returns a pointer to the existing allocation.
> 
> The old kvrealloc() behavior actually sounds somewhat useful.  You've
> checked that no existing sites were relying on this?

Yes, I did.

> 
> And that all existing kvrealloc() callers were (incorrectly) checking
> for NULL?  Seems that way.

You mean for the initial allocation? Yes, but I also noticed that as long as the
old kvrealloc() is called with p == NULL and oldsize == 0 it should work as
well.


* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-23 13:33             ` Danilo Krummrich
@ 2024-07-23 18:53               ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2024-07-23 18:53 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Tue 23-07-24 15:33:32, Danilo Krummrich wrote:
> On Tue, Jul 23, 2024 at 02:12:23PM +0200, Michal Hocko wrote:
> > On Tue 23-07-24 13:55:48, Danilo Krummrich wrote:
[...]
> > void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags)
> > {
> >         void *newp;
> > 
> >         if (!size && p) {
> >                 kvfree(p);
> >                 return NULL;
> >         }
> > 
> >         if (!is_vmalloc_addr(p))
> >                 newp = krealloc_noprof(p, size, kmalloc_gfp_adjust(flags, size));
> > 
> >         if (newp)
> >                 return newp;
> > 
> >         return vrealloc_noprof(p, size, flags);
> > }
> > EXPORT_SYMBOL(kvrealloc_noprof);
> 
> This looks weird. The fact that you're passing p to vrealloc_noprof() if
> krealloc_noprof() fails, implies that vrealloc_noprof() must be able to deal
> with pointers to kmalloc'd memory.

You are right, I have oversimplified this. I was hoping to follow the
kvmalloc model with a clear fallback, and that should be possible, but it
would require more changes. Scratch that.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 0/2] Align kvrealloc() with krealloc()
  2024-07-22 16:29 [PATCH v2 0/2] Align kvrealloc() with krealloc() Danilo Krummrich
  2024-07-22 16:29 ` [PATCH v2 1/2] mm: vmalloc: implement vrealloc() Danilo Krummrich
  2024-07-22 16:29 ` [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc() Danilo Krummrich
@ 2024-07-23 18:54 ` Michal Hocko
  2024-07-23 18:56   ` Danilo Krummrich
  2 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2024-07-23 18:54 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

To both patches:
Acked-by: Michal Hocko <mhocko@suse.com>

Sorry, I was a bit dense today.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 0/2] Align kvrealloc() with krealloc()
  2024-07-23 18:54 ` [PATCH v2 0/2] Align " Michal Hocko
@ 2024-07-23 18:56   ` Danilo Krummrich
  0 siblings, 0 replies; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-23 18:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mpe, chandan.babu, christian.koenig, maz, oliver.upton,
	linux-kernel, linux-mm, rust-for-linux

On Tue, Jul 23, 2024 at 08:54:21PM +0200, Michal Hocko wrote:
> To both patches:
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Sorry, I was a bit dense today.

No worries, thanks for reviewing!

- Danilo

> 
> -- 
> Michal Hocko
> SUSE Labs
> 


* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-22 16:29 ` [PATCH v2 1/2] mm: vmalloc: implement vrealloc() Danilo Krummrich
@ 2024-07-26 14:37   ` Vlastimil Babka
  2024-07-26 20:05     ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Vlastimil Babka @ 2024-07-26 14:37 UTC (permalink / raw)
  To: Danilo Krummrich, cl, penberg, rientjes, iamjoonsoo.kim, akpm,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mhocko, mpe, chandan.babu, christian.koenig, maz, oliver.upton
  Cc: linux-kernel, linux-mm, rust-for-linux

On 7/22/24 6:29 PM, Danilo Krummrich wrote:
> Implement vrealloc() analogous to krealloc().
> 
> Currently, krealloc() requires the caller to pass the size of the
> previous memory allocation, which, instead, should be self-contained.
> 
> We attempt to fix this in a subsequent patch which, in order to do so,
> requires vrealloc().
> 
> Besides that, we need realloc() functions for kernel allocators in Rust
> too. With `Vec` or `KVec` respectively, potentially growing (and
> shrinking) data structures are rather common.
> 
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
>  }
>  EXPORT_SYMBOL(vzalloc_node_noprof);
>  
> +/**
> + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
> + * @p: object to reallocate memory for
> + * @size: the size to reallocate
> + * @flags: the flags for the page level allocator
> + *
> + * The contents of the object pointed to are preserved up to the lesser of the
> + * new and old size (__GFP_ZERO flag is effectively ignored).

Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
new, kvrealloc() did the same before patch 2/2.
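
Spelled out for a vmalloc-backed buffer (hypothetical sequence, error
handling omitted):

	void *p;

	p = kvmalloc(8 * PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
	memset(p, 0xaa, 8 * PAGE_SIZE);

	/* no in-place shrink happens, the vm_area keeps its 8 pages */
	p = kvrealloc(p, 4 * PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);

	/* still fits the old vm_area, so the old pointer comes back as-is */
	p = kvrealloc(p, 8 * PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);

	/* the last 4 pages may still hold 0xaa rather than zeroes */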

But it's also fundamentally not true for krealloc(), or kvrealloc()
switching from a kmalloc to vmalloc. ksize() returns the size of the kmalloc
bucket, we don't know what was the exact prior allocation size. Worse, we
started poisoning the padding in debug configurations, so even a
kmalloc(__GFP_ZERO) followed by krealloc(__GFP_ZERO) can give you unexpected
poison now...

I guess we should just document __GFP_ZERO is not honored at all for
realloc, and maybe even start warning :/ Hopefully nobody relies on that.

> + *
> + * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
> + * @p is not a %NULL pointer, the object pointed to is freed.
> + *
> + * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
> + *         failure
> + */
> +void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
> +{
> +	size_t old_size = 0;
> +	void *n;
> +
> +	if (!size) {
> +		vfree(p);
> +		return NULL;
> +	}
> +
> +	if (p) {
> +		struct vm_struct *vm;
> +
> +		vm = find_vm_area(p);
> +		if (unlikely(!vm)) {
> +			WARN(1, "Trying to vrealloc() nonexistent vm area (%p)\n", p);
> +			return NULL;
> +		}
> +
> +		old_size = get_vm_area_size(vm);
> +	}
> +
> +	if (size <= old_size) {
> +		/*
> +		 * TODO: Shrink the vm_area, i.e. unmap and free unused pages.
> +		 * What would be a good heuristic for when to shrink the
> +		 * vm_area?
> +		 */
> +		return (void *)p;
> +	}
> +
> +	/* TODO: Grow the vm_area, i.e. allocate and map additional pages. */
> +	n = __vmalloc_noprof(size, flags);
> +	if (!n)
> +		return NULL;
> +
> +	if (p) {
> +		memcpy(n, p, old_size);
> +		vfree(p);
> +	}
> +
> +	return n;
> +}
> +
>  #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
>  #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
>  #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)



* Re: [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc()
  2024-07-22 16:29 ` [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc() Danilo Krummrich
  2024-07-23  1:43   ` Andrew Morton
  2024-07-23  7:50   ` Michal Hocko
@ 2024-07-26 14:38   ` Vlastimil Babka
  2 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2024-07-26 14:38 UTC (permalink / raw)
  To: Danilo Krummrich, cl, penberg, rientjes, iamjoonsoo.kim, akpm,
	roman.gushchin, 42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf,
	mhocko, mpe, chandan.babu, christian.koenig, maz, oliver.upton
  Cc: linux-kernel, linux-mm, rust-for-linux

On 7/22/24 6:29 PM, Danilo Krummrich wrote:
> Besides the obvious (and desired) difference between krealloc() and
> kvrealloc(), there is some inconsistency in their function signatures
> and behavior:
> 
>  - krealloc() frees the memory when the requested size is zero, whereas
>    kvrealloc() simply returns a pointer to the existing allocation.
> 
>  - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas
>    kvrealloc() does not accept a NULL pointer at all and, if passed,
>    would fault instead.
> 
>  - krealloc() is self-contained, whereas kvrealloc() relies on the caller
>    to provide the size of the previous allocation.
> 
> Inconsistent behavior throughout allocation APIs is error prone, hence make
> kvrealloc() behave like krealloc(), which seems superior in all mentioned
> aspects.
> 
> Besides that, implementing kvrealloc() by making use of krealloc() and
> vrealloc() provides opportunities to grow (and shrink) allocations more
> efficiently. For instance, vrealloc() can be optimized to allocate and
> map additional pages to grow the allocation or unmap and free unused
> pages to shrink the allocation.
> 
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

with the same caveat about the __GFP_ZERO comment on kvrealloc_noprof()


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-26 14:37   ` Vlastimil Babka
@ 2024-07-26 20:05     ` Danilo Krummrich
  2024-07-29 19:08       ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-26 20:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux

On Fri, Jul 26, 2024 at 04:37:43PM +0200, Vlastimil Babka wrote:
> On 7/22/24 6:29 PM, Danilo Krummrich wrote:
> > Implement vrealloc() analogous to krealloc().
> > 
> > Currently, krealloc() requires the caller to pass the size of the
> > previous memory allocation, which, instead, should be self-contained.
> > 
> > We attempt to fix this in a subsequent patch which, in order to do so,
> > requires vrealloc().
> > 
> > Besides that, we need realloc() functions for kernel allocators in Rust
> > too. With `Vec` or `KVec` respectively, potentially growing (and
> > shrinking) data structures are rather common.
> > 
> > Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
> >  }
> >  EXPORT_SYMBOL(vzalloc_node_noprof);
> >  
> > +/**
> > + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
> > + * @p: object to reallocate memory for
> > + * @size: the size to reallocate
> > + * @flags: the flags for the page level allocator
> > + *
> > + * The contents of the object pointed to are preserved up to the lesser of the
> > + * new and old size (__GFP_ZERO flag is effectively ignored).
> 
> Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
> 4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
> new, kvrealloc() did the same before patch 2/2.

Taking it (too) literally, it's not wrong. The contents of the object pointed to
are indeed preserved up to the lesser of the new and old size. It's just that
the rest may be "preserved" as well.

I'm working on implementing shrink and grow for vrealloc(). In the meantime I think
we could probably just memset() spare memory to zero.

nommu would still use krealloc() though...

> 
> But it's also fundamentally not true for krealloc(), or kvrealloc()
> switching from a kmalloc to valloc. ksize() returns the size of the kmalloc
> bucket, we don't know what was the exact prior allocation size.

Probably a stupid question, but can't we just zero the full bucket initially and
make sure to memset() spare memory in the bucket to zero when krealloc() is
called with new_size < ksize()?

> Worse, we
> started poisoning the padding in debug configurations, so even a
> kmalloc(__GFP_ZERO) followed by krealloc(__GFP_ZERO) can give you unexpected
> poison now...

As in writing magics directly to the spare memory in the bucket? Which would
then also be copied over to a new buffer in __do_krealloc()?

> 
> I guess we should just document __GFP_ZERO is not honored at all for
> realloc, and maybe start even warning :/ Hopefully nobody relies on that.

I think it'd be great to make __GFP_ZERO work in all cases. However, if that's
really not possible, I'd prefer if we could at least guarantee that
*realloc(NULL, size, flags | __GFP_ZERO) is a valid call, i.e.
WARN_ON(p && flags & __GFP_ZERO).

> 
> > + *
> > + * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
> > + * @p is not a %NULL pointer, the object pointed to is freed.
> > + *
> > + * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
> > + *         failure
> > + */
> > +void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
> > +{
> > +	size_t old_size = 0;
> > +	void *n;
> > +
> > +	if (!size) {
> > +		vfree(p);
> > +		return NULL;
> > +	}
> > +
> > +	if (p) {
> > +		struct vm_struct *vm;
> > +
> > +		vm = find_vm_area(p);
> > +		if (unlikely(!vm)) {
> > +			WARN(1, "Trying to vrealloc() nonexistent vm area (%p)\n", p);
> > +			return NULL;
> > +		}
> > +
> > +		old_size = get_vm_area_size(vm);
> > +	}
> > +
> > +	if (size <= old_size) {
> > +		/*
> > +		 * TODO: Shrink the vm_area, i.e. unmap and free unused pages.
> > +		 * What would be a good heuristic for when to shrink the
> > +		 * vm_area?
> > +		 */
> > +		return (void *)p;
> > +	}
> > +
> > +	/* TODO: Grow the vm_area, i.e. allocate and map additional pages. */
> > +	n = __vmalloc_noprof(size, flags);
> > +	if (!n)
> > +		return NULL;
> > +
> > +	if (p) {
> > +		memcpy(n, p, old_size);
> > +		vfree(p);
> > +	}
> > +
> > +	return n;
> > +}
> > +
> >  #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
> >  #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
> >  #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-26 20:05     ` Danilo Krummrich
@ 2024-07-29 19:08       ` Danilo Krummrich
  2024-07-30  1:35         ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-29 19:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux

On Fri, Jul 26, 2024 at 10:05:47PM +0200, Danilo Krummrich wrote:
> On Fri, Jul 26, 2024 at 04:37:43PM +0200, Vlastimil Babka wrote:
> > On 7/22/24 6:29 PM, Danilo Krummrich wrote:
> > > Implement vrealloc() analogous to krealloc().
> > > 
> > > Currently, krealloc() requires the caller to pass the size of the
> > > previous memory allocation, which, instead, should be self-contained.
> > > 
> > > We attempt to fix this in a subsequent patch which, in order to do so,
> > > requires vrealloc().
> > > 
> > > Besides that, we need realloc() functions for kernel allocators in Rust
> > > too. With `Vec` or `KVec` respectively, potentially growing (and
> > > shrinking) data structures are rather common.
> > > 
> > > Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> > 
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
> > >  }
> > >  EXPORT_SYMBOL(vzalloc_node_noprof);
> > >  
> > > +/**
> > > + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
> > > + * @p: object to reallocate memory for
> > > + * @size: the size to reallocate
> > > + * @flags: the flags for the page level allocator
> > > + *
> > > + * The contents of the object pointed to are preserved up to the lesser of the
> > > + * new and old size (__GFP_ZERO flag is effectively ignored).
> > 
> > Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
> > 4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
> > new, kvrealloc() did the same before patch 2/2.
> 
> Taking it (too) literal, it's not wrong. The contents of the object pointed to
> are indeed preserved up to the lesser of the new and old size. It's just that
> the rest may be "preserved" as well.
> 
> I work on implementing shrink and grow for vrealloc(). In the meantime I think
> we could probably just memset() spare memory to zero.

Probably, this was a bad idea. Even with shrinking implemented we'd need to
memset() potential spare memory of the last page to zero, when new_size <
old_size.

Analogously, the same would be true for krealloc() buckets. That's probably not
worth it.

I think we should indeed just document that __GFP_ZERO doesn't work for
re-allocating memory and start to warn about it. As already mentioned, I think
we should at least guarantee that *realloc(NULL, size, flags | __GFP_ZERO) is
valid, i.e. WARN_ON(p && flags & __GFP_ZERO).

> 
> nommu would still uses krealloc() though...
> 
> > 
> > But it's also fundamentally not true for krealloc(), or kvrealloc()
> > switching from a kmalloc to valloc. ksize() returns the size of the kmalloc
> > bucket, we don't know what was the exact prior allocation size.
> 
> Probably a stupid question, but can't we just zero the full bucket initially and
> make sure to memset() spare memory in the bucket to zero when krealloc() is
> called with new_size < ksize()?
> 
> > Worse, we
> > started poisoning the padding in debug configurations, so even a
> > kmalloc(__GFP_ZERO) followed by krealloc(__GFP_ZERO) can give you unexpected
> > poison now...
> 
> As in writing magics directly to the spare memory in the bucket? Which would
> then also be copied over to a new buffer in __do_krealloc()?
> 
> > 
> > I guess we should just document __GFP_ZERO is not honored at all for
> > realloc, and maybe start even warning :/ Hopefully nobody relies on that.
> 
> I think it'd be great to make __GFP_ZERO work in all cases. However, if that's
> really not possible, I'd prefer if we could at least gurantee that
> *realloc(NULL, size, flags | __GFP_ZERO) is a valid call, i.e.
> WARN_ON(p && flags & __GFP_ZERO).
> 
> > 
> > > + *
> > > + * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
> > > + * @p is not a %NULL pointer, the object pointed to is freed.
> > > + *
> > > + * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
> > > + *         failure
> > > + */
> > > +void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
> > > +{
> > > +	size_t old_size = 0;
> > > +	void *n;
> > > +
> > > +	if (!size) {
> > > +		vfree(p);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	if (p) {
> > > +		struct vm_struct *vm;
> > > +
> > > +		vm = find_vm_area(p);
> > > +		if (unlikely(!vm)) {
> > > +			WARN(1, "Trying to vrealloc() nonexistent vm area (%p)\n", p);
> > > +			return NULL;
> > > +		}
> > > +
> > > +		old_size = get_vm_area_size(vm);
> > > +	}
> > > +
> > > +	if (size <= old_size) {
> > > +		/*
> > > +		 * TODO: Shrink the vm_area, i.e. unmap and free unused pages.
> > > +		 * What would be a good heuristic for when to shrink the
> > > +		 * vm_area?
> > > +		 */
> > > +		return (void *)p;
> > > +	}
> > > +
> > > +	/* TODO: Grow the vm_area, i.e. allocate and map additional pages. */
> > > +	n = __vmalloc_noprof(size, flags);
> > > +	if (!n)
> > > +		return NULL;
> > > +
> > > +	if (p) {
> > > +		memcpy(n, p, old_size);
> > > +		vfree(p);
> > > +	}
> > > +
> > > +	return n;
> > > +}
> > > +
> > >  #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
> > >  #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
> > >  #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
> > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-29 19:08       ` Danilo Krummrich
@ 2024-07-30  1:35         ` Danilo Krummrich
  2024-07-30 12:15           ` Vlastimil Babka
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-30  1:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux

On Mon, Jul 29, 2024 at 09:08:16PM +0200, Danilo Krummrich wrote:
> On Fri, Jul 26, 2024 at 10:05:47PM +0200, Danilo Krummrich wrote:
> > On Fri, Jul 26, 2024 at 04:37:43PM +0200, Vlastimil Babka wrote:
> > > On 7/22/24 6:29 PM, Danilo Krummrich wrote:
> > > > Implement vrealloc() analogous to krealloc().
> > > > 
> > > > Currently, krealloc() requires the caller to pass the size of the
> > > > previous memory allocation, which, instead, should be self-contained.
> > > > 
> > > > We attempt to fix this in a subsequent patch which, in order to do so,
> > > > requires vrealloc().
> > > > 
> > > > Besides that, we need realloc() functions for kernel allocators in Rust
> > > > too. With `Vec` or `KVec` respectively, potentially growing (and
> > > > shrinking) data structures are rather common.
> > > > 
> > > > Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> > > 
> > > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > > 
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
> > > >  }
> > > >  EXPORT_SYMBOL(vzalloc_node_noprof);
> > > >  
> > > > +/**
> > > > + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
> > > > + * @p: object to reallocate memory for
> > > > + * @size: the size to reallocate
> > > > + * @flags: the flags for the page level allocator
> > > > + *
> > > > + * The contents of the object pointed to are preserved up to the lesser of the
> > > > + * new and old size (__GFP_ZERO flag is effectively ignored).
> > > 
> > > Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
> > > 4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
> > > new, kvrealloc() did the same before patch 2/2.
> > 
> > Taking it (too) literal, it's not wrong. The contents of the object pointed to
> > are indeed preserved up to the lesser of the new and old size. It's just that
> > the rest may be "preserved" as well.
> > 
> > I work on implementing shrink and grow for vrealloc(). In the meantime I think
> > we could probably just memset() spare memory to zero.
> 
> Probably, this was a bad idea. Even with shrinking implemented we'd need to
> memset() potential spare memory of the last page to zero, when new_size <
> old_size.
> 
> Analogously, the same would be true for krealloc() buckets. That's probably not
> worth it.
> 
> I think we should indeed just document that __GFP_ZERO doesn't work for
> re-allocating memory and start to warn about it. As already mentioned, I think
> we should at least gurantee that *realloc(NULL, size, flags | __GFP_ZERO) is
> valid, i.e. WARN_ON(p && flags & __GFP_ZERO).

Maybe I spoke a bit too soon with this last paragraph. I think continuously
growing something with __GFP_ZERO is a legitimate use case. I just did a quick
grep for users of krealloc() with __GFP_ZERO and found 18 matches.

So, I think, at least for now, we should instead document that __GFP_ZERO is
only fully honored when the buffer is grown continuously (without intermediate
shrinking) and __GFP_ZERO is supplied in every iteration.

In case I'm missing something here and not even this case is safe, it looks like
we have 18 broken users of krealloc().
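
The grow-only pattern those users rely on looks roughly like this (a schematic
example, not one of the 18 call sites; struct foo, have_more_entries() and the
in_use member are made up for illustration):

	struct foo *tmp, *arr = NULL;
	size_t nr = 0;

	while (have_more_entries()) {		/* hypothetical producer loop */
		tmp = krealloc(arr, (nr + 1) * sizeof(*arr),
			       GFP_KERNEL | __GFP_ZERO);
		if (!tmp) {
			kfree(arr);
			return -ENOMEM;
		}
		arr = tmp;
		/* relies on the newly grown tail being zeroed by __GFP_ZERO */
		arr[nr++].in_use = true;
	}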

> 
> > 
> > nommu would still uses krealloc() though...
> > 
> > > 
> > > But it's also fundamentally not true for krealloc(), or kvrealloc()
> > > switching from a kmalloc to valloc. ksize() returns the size of the kmalloc
> > > bucket, we don't know what was the exact prior allocation size.
> > 
> > Probably a stupid question, but can't we just zero the full bucket initially and
> > make sure to memset() spare memory in the bucket to zero when krealloc() is
> > called with new_size < ksize()?
> > 
> > > Worse, we
> > > started poisoning the padding in debug configurations, so even a
> > > kmalloc(__GFP_ZERO) followed by krealloc(__GFP_ZERO) can give you unexpected
> > > poison now...
> > 
> > As in writing magics directly to the spare memory in the bucket? Which would
> > then also be copied over to a new buffer in __do_krealloc()?
> > 
> > > 
> > > I guess we should just document __GFP_ZERO is not honored at all for
> > > realloc, and maybe start even warning :/ Hopefully nobody relies on that.
> > 
> > I think it'd be great to make __GFP_ZERO work in all cases. However, if that's
> > really not possible, I'd prefer if we could at least gurantee that
> > *realloc(NULL, size, flags | __GFP_ZERO) is a valid call, i.e.
> > WARN_ON(p && flags & __GFP_ZERO).
> > 
> > > 
> > > > + *
> > > > + * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
> > > > + * @p is not a %NULL pointer, the object pointed to is freed.
> > > > + *
> > > > + * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
> > > > + *         failure
> > > > + */
> > > > +void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
> > > > +{
> > > > +	size_t old_size = 0;
> > > > +	void *n;
> > > > +
> > > > +	if (!size) {
> > > > +		vfree(p);
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	if (p) {
> > > > +		struct vm_struct *vm;
> > > > +
> > > > +		vm = find_vm_area(p);
> > > > +		if (unlikely(!vm)) {
> > > > +			WARN(1, "Trying to vrealloc() nonexistent vm area (%p)\n", p);
> > > > +			return NULL;
> > > > +		}
> > > > +
> > > > +		old_size = get_vm_area_size(vm);
> > > > +	}
> > > > +
> > > > +	if (size <= old_size) {
> > > > +		/*
> > > > +		 * TODO: Shrink the vm_area, i.e. unmap and free unused pages.
> > > > +		 * What would be a good heuristic for when to shrink the
> > > > +		 * vm_area?
> > > > +		 */
> > > > +		return (void *)p;
> > > > +	}
> > > > +
> > > > +	/* TODO: Grow the vm_area, i.e. allocate and map additional pages. */
> > > > +	n = __vmalloc_noprof(size, flags);
> > > > +	if (!n)
> > > > +		return NULL;
> > > > +
> > > > +	if (p) {
> > > > +		memcpy(n, p, old_size);
> > > > +		vfree(p);
> > > > +	}
> > > > +
> > > > +	return n;
> > > > +}
> > > > +
> > > >  #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
> > > >  #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
> > > >  #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
> > > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-30  1:35         ` Danilo Krummrich
@ 2024-07-30 12:15           ` Vlastimil Babka
  2024-07-30 13:14             ` Danilo Krummrich
  2024-09-02  1:36             ` Feng Tang
  0 siblings, 2 replies; 28+ messages in thread
From: Vlastimil Babka @ 2024-07-30 12:15 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux, Feng Tang, kasan-dev

On 7/30/24 3:35 AM, Danilo Krummrich wrote:
> On Mon, Jul 29, 2024 at 09:08:16PM +0200, Danilo Krummrich wrote:
>> On Fri, Jul 26, 2024 at 10:05:47PM +0200, Danilo Krummrich wrote:
>>> On Fri, Jul 26, 2024 at 04:37:43PM +0200, Vlastimil Babka wrote:
>>>> On 7/22/24 6:29 PM, Danilo Krummrich wrote:
>>>>> Implement vrealloc() analogous to krealloc().
>>>>>
>>>>> Currently, krealloc() requires the caller to pass the size of the
>>>>> previous memory allocation, which, instead, should be self-contained.
>>>>>
>>>>> We attempt to fix this in a subsequent patch which, in order to do so,
>>>>> requires vrealloc().
>>>>>
>>>>> Besides that, we need realloc() functions for kernel allocators in Rust
>>>>> too. With `Vec` or `KVec` respectively, potentially growing (and
>>>>> shrinking) data structures are rather common.
>>>>>
>>>>> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
>>>>
>>>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>>>
>>>>> --- a/mm/vmalloc.c
>>>>> +++ b/mm/vmalloc.c
>>>>> @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
>>>>>  }
>>>>>  EXPORT_SYMBOL(vzalloc_node_noprof);
>>>>>  
>>>>> +/**
>>>>> + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
>>>>> + * @p: object to reallocate memory for
>>>>> + * @size: the size to reallocate
>>>>> + * @flags: the flags for the page level allocator
>>>>> + *
>>>>> + * The contents of the object pointed to are preserved up to the lesser of the
>>>>> + * new and old size (__GFP_ZERO flag is effectively ignored).
>>>>
>>>> Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
>>>> 4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
>>>> new, kvrealloc() did the same before patch 2/2.
>>>
>>> Taking it (too) literal, it's not wrong. The contents of the object pointed to
>>> are indeed preserved up to the lesser of the new and old size. It's just that
>>> the rest may be "preserved" as well.
>>>
>>> I work on implementing shrink and grow for vrealloc(). In the meantime I think
>>> we could probably just memset() spare memory to zero.
>>
>> Probably, this was a bad idea. Even with shrinking implemented we'd need to
>> memset() potential spare memory of the last page to zero, when new_size <
>> old_size.
>>
>> Analogously, the same would be true for krealloc() buckets. That's probably not
>> worth it.

I think it could remove unexpected bad surprises with the API, so why not
do it.

>> I think we should indeed just document that __GFP_ZERO doesn't work for
>> re-allocating memory and start to warn about it. As already mentioned, I think
>> we should at least gurantee that *realloc(NULL, size, flags | __GFP_ZERO) is
>> valid, i.e. WARN_ON(p && flags & __GFP_ZERO).
> 
> Maybe I spoke a bit to soon with this last paragraph. I think continuously
> gowing something with __GFP_ZERO is a legitimate use case. I just did a quick
> grep for users of krealloc() with __GFP_ZERO and found 18 matches.
> 
> So, I think, at least for now, we should instead document that __GFP_ZERO is
> only fully honored when the buffer is grown continuously (without intermediate
> shrinking) and __GFP_ZERO is supplied in every iteration.
> 
> In case I miss something here, and not even this case is safe, it looks like
> we have 18 broken users of krealloc().

+CC Feng Tang

Let's say we kmalloc(56, __GFP_ZERO), we get an object from kmalloc-64
cache. Since commit 946fa0dbf2d89 ("mm/slub: extend redzone check to
extra allocated kmalloc space than requested") and preceding commits, if
slub_debug is enabled (red zoning or user tracking), only the 56 bytes
will be zeroed. The rest will be either unknown garbage, or redzone.

Then we might e.g. krealloc(120) and get a kmalloc-128 object and 64
bytes (result of ksize()) will be copied, including the garbage/redzone.
I think it's fixable because when we do this in slub_debug, we also
store the original size in the metadata, so we could read it back and
adjust how many bytes are copied.

Then we could guarantee that if __GFP_ZERO is used consistently on
initial kmalloc() and on krealloc() and the user doesn't corrupt the
extra space themselves (which is a bug anyway that the redzoning is
supposed to catch) all will be fine.

There might also be a KASAN side to this; I see poison_kmalloc_redzone()
is also redzoning the area between the requested size and the cache's object_size?
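
The scenario above, as a caller would see it with slub_debug (redzoning or
user tracking) enabled; an illustrative sketch, not a real call site:

	/* kmalloc-64 object, but only the requested 56 bytes are zeroed */
	char *p = kmalloc(56, GFP_KERNEL | __GFP_ZERO);
	/* bytes 56..63 hold redzone/garbage, not zeroes */

	/* kmalloc-128 object; ksize() of the old object is 64, so all 64 bytes
	 * are copied over, redzone included: bytes 56..63 of the new object are
	 * not zero despite __GFP_ZERO */
	p = krealloc(p, 120, GFP_KERNEL | __GFP_ZERO);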

>>
>>>
>>> nommu would still uses krealloc() though...
>>>
>>>>
>>>> But it's also fundamentally not true for krealloc(), or kvrealloc()
>>>> switching from a kmalloc to valloc. ksize() returns the size of the kmalloc
>>>> bucket, we don't know what was the exact prior allocation size.
>>>
>>> Probably a stupid question, but can't we just zero the full bucket initially and
>>> make sure to memset() spare memory in the bucket to zero when krealloc() is
>>> called with new_size < ksize()?
>>>
>>>> Worse, we
>>>> started poisoning the padding in debug configurations, so even a
>>>> kmalloc(__GFP_ZERO) followed by krealloc(__GFP_ZERO) can give you unexpected
>>>> poison now...
>>>
>>> As in writing magics directly to the spare memory in the bucket? Which would
>>> then also be copied over to a new buffer in __do_krealloc()?
>>>
>>>>
>>>> I guess we should just document __GFP_ZERO is not honored at all for
>>>> realloc, and maybe start even warning :/ Hopefully nobody relies on that.
>>>
>>> I think it'd be great to make __GFP_ZERO work in all cases. However, if that's
>>> really not possible, I'd prefer if we could at least gurantee that
>>> *realloc(NULL, size, flags | __GFP_ZERO) is a valid call, i.e.
>>> WARN_ON(p && flags & __GFP_ZERO).
>>>
>>>>
>>>>> + *
>>>>> + * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
>>>>> + * @p is not a %NULL pointer, the object pointed to is freed.
>>>>> + *
>>>>> + * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
>>>>> + *         failure
>>>>> + */
>>>>> +void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
>>>>> +{
>>>>> +	size_t old_size = 0;
>>>>> +	void *n;
>>>>> +
>>>>> +	if (!size) {
>>>>> +		vfree(p);
>>>>> +		return NULL;
>>>>> +	}
>>>>> +
>>>>> +	if (p) {
>>>>> +		struct vm_struct *vm;
>>>>> +
>>>>> +		vm = find_vm_area(p);
>>>>> +		if (unlikely(!vm)) {
>>>>> +			WARN(1, "Trying to vrealloc() nonexistent vm area (%p)\n", p);
>>>>> +			return NULL;
>>>>> +		}
>>>>> +
>>>>> +		old_size = get_vm_area_size(vm);
>>>>> +	}
>>>>> +
>>>>> +	if (size <= old_size) {
>>>>> +		/*
>>>>> +		 * TODO: Shrink the vm_area, i.e. unmap and free unused pages.
>>>>> +		 * What would be a good heuristic for when to shrink the
>>>>> +		 * vm_area?
>>>>> +		 */
>>>>> +		return (void *)p;
>>>>> +	}
>>>>> +
>>>>> +	/* TODO: Grow the vm_area, i.e. allocate and map additional pages. */
>>>>> +	n = __vmalloc_noprof(size, flags);
>>>>> +	if (!n)
>>>>> +		return NULL;
>>>>> +
>>>>> +	if (p) {
>>>>> +		memcpy(n, p, old_size);
>>>>> +		vfree(p);
>>>>> +	}
>>>>> +
>>>>> +	return n;
>>>>> +}
>>>>> +
>>>>>  #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
>>>>>  #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
>>>>>  #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
>>>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-30 12:15           ` Vlastimil Babka
@ 2024-07-30 13:14             ` Danilo Krummrich
  2024-07-30 13:58               ` Vlastimil Babka
  2024-09-02  1:36             ` Feng Tang
  1 sibling, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-30 13:14 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux, Feng Tang, kasan-dev

On Tue, Jul 30, 2024 at 02:15:34PM +0200, Vlastimil Babka wrote:
> On 7/30/24 3:35 AM, Danilo Krummrich wrote:
> > On Mon, Jul 29, 2024 at 09:08:16PM +0200, Danilo Krummrich wrote:
> >> On Fri, Jul 26, 2024 at 10:05:47PM +0200, Danilo Krummrich wrote:
> >>> On Fri, Jul 26, 2024 at 04:37:43PM +0200, Vlastimil Babka wrote:
> >>>> On 7/22/24 6:29 PM, Danilo Krummrich wrote:
> >>>>> Implement vrealloc() analogous to krealloc().
> >>>>>
> >>>>> Currently, krealloc() requires the caller to pass the size of the
> >>>>> previous memory allocation, which, instead, should be self-contained.
> >>>>>
> >>>>> We attempt to fix this in a subsequent patch which, in order to do so,
> >>>>> requires vrealloc().
> >>>>>
> >>>>> Besides that, we need realloc() functions for kernel allocators in Rust
> >>>>> too. With `Vec` or `KVec` respectively, potentially growing (and
> >>>>> shrinking) data structures are rather common.
> >>>>>
> >>>>> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> >>>>
> >>>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> >>>>
> >>>>> --- a/mm/vmalloc.c
> >>>>> +++ b/mm/vmalloc.c
> >>>>> @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
> >>>>>  }
> >>>>>  EXPORT_SYMBOL(vzalloc_node_noprof);
> >>>>>  
> >>>>> +/**
> >>>>> + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
> >>>>> + * @p: object to reallocate memory for
> >>>>> + * @size: the size to reallocate
> >>>>> + * @flags: the flags for the page level allocator
> >>>>> + *
> >>>>> + * The contents of the object pointed to are preserved up to the lesser of the
> >>>>> + * new and old size (__GFP_ZERO flag is effectively ignored).
> >>>>
> >>>> Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
> >>>> 4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
> >>>> new, kvrealloc() did the same before patch 2/2.
> >>>
> >>> Taking it (too) literal, it's not wrong. The contents of the object pointed to
> >>> are indeed preserved up to the lesser of the new and old size. It's just that
> >>> the rest may be "preserved" as well.
> >>>
> >>> I work on implementing shrink and grow for vrealloc(). In the meantime I think
> >>> we could probably just memset() spare memory to zero.
> >>
> >> Probably, this was a bad idea. Even with shrinking implemented we'd need to
> >> memset() potential spare memory of the last page to zero, when new_size <
> >> old_size.
> >>
> >> Analogously, the same would be true for krealloc() buckets. That's probably not
> >> worth it.
> 
> I think it could remove unexpected bad surprises with the API so why not
> do it.

We'd either need to do it *every* time we shrink an allocation on spec, or
only do it when shrinking with the __GFP_ZERO flag set, which might be a bit
counter-intuitive.

If we do it, I'd probably vote for the latter semantics. While it sounds more
error prone, it's less wasteful and enough to cover the most common case where
the actual *realloc() call is always with the same parameters, but a changing
size.
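
For vrealloc(), the latter variant could look roughly like this against the
size <= old_size branch quoted above (a sketch only, not what the patch
currently does; p, size, old_size and flags are the function's existing
parameters and locals):

	if (size <= old_size) {
		/*
		 * Sketch: only when the caller asks for zeroed memory, clear
		 * the tail that would otherwise keep its old contents.
		 */
		if (flags & __GFP_ZERO)
			memset((void *)p + size, 0, old_size - size);
		return (void *)p;
	}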

> 
> >> I think we should indeed just document that __GFP_ZERO doesn't work for
> >> re-allocating memory and start to warn about it. As already mentioned, I think
> >> we should at least gurantee that *realloc(NULL, size, flags | __GFP_ZERO) is
> >> valid, i.e. WARN_ON(p && flags & __GFP_ZERO).
> > 
> > Maybe I spoke a bit to soon with this last paragraph. I think continuously
> > gowing something with __GFP_ZERO is a legitimate use case. I just did a quick
> > grep for users of krealloc() with __GFP_ZERO and found 18 matches.
> > 
> > So, I think, at least for now, we should instead document that __GFP_ZERO is
> > only fully honored when the buffer is grown continuously (without intermediate
> > shrinking) and __GFP_ZERO is supplied in every iteration.
> > 
> > In case I miss something here, and not even this case is safe, it looks like
> > we have 18 broken users of krealloc().
> 
> +CC Feng Tang
> 
> Let's say we kmalloc(56, __GFP_ZERO), we get an object from kmalloc-64
> cache. Since commit 946fa0dbf2d89 ("mm/slub: extend redzone check to
> extra allocated kmalloc space than requested") and preceding commits, if
> slub_debug is enabled (red zoning or user tracking), only the 56 bytes
> will be zeroed. The rest will be either unknown garbage, or redzone.
> 
> Then we might e.g. krealloc(120) and get a kmalloc-128 object and 64
> bytes (result of ksize()) will be copied, including the garbage/redzone.
> I think it's fixable because when we do this in slub_debug, we also
> store the original size in the metadata, so we could read it back and
> adjust how many bytes are copied.
> 
> Then we could guarantee that if __GFP_ZERO is used consistently on
> initial kmalloc() and on krealloc() and the user doesn't corrupt the
> extra space themselves (which is a bug anyway that the redzoning is
> supposed to catch) all will be fine.

Ok, so those 18 users are indeed currently broken, but only when slub_debug is
enabled (assuming that all of those are consistently growing with __GFP_ZERO).

> 
> There might be also KASAN side to this, I see poison_kmalloc_redzone()
> is also redzoning the area between requested size and cache's object_size?
> 
> >>
> >>>
> >>> nommu would still uses krealloc() though...
> >>>
> >>>>
> >>>> But it's also fundamentally not true for krealloc(), or kvrealloc()
> >>>> switching from a kmalloc to valloc. ksize() returns the size of the kmalloc
> >>>> bucket, we don't know what was the exact prior allocation size.
> >>>
> >>> Probably a stupid question, but can't we just zero the full bucket initially and
> >>> make sure to memset() spare memory in the bucket to zero when krealloc() is
> >>> called with new_size < ksize()?
> >>>
> >>>> Worse, we
> >>>> started poisoning the padding in debug configurations, so even a
> >>>> kmalloc(__GFP_ZERO) followed by krealloc(__GFP_ZERO) can give you unexpected
> >>>> poison now...
> >>>
> >>> As in writing magics directly to the spare memory in the bucket? Which would
> >>> then also be copied over to a new buffer in __do_krealloc()?
> >>>
> >>>>
> >>>> I guess we should just document __GFP_ZERO is not honored at all for
> >>>> realloc, and maybe start even warning :/ Hopefully nobody relies on that.
> >>>
> >>> I think it'd be great to make __GFP_ZERO work in all cases. However, if that's
> >>> really not possible, I'd prefer if we could at least gurantee that
> >>> *realloc(NULL, size, flags | __GFP_ZERO) is a valid call, i.e.
> >>> WARN_ON(p && flags & __GFP_ZERO).
> >>>
> >>>>
> >>>>> + *
> >>>>> + * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
> >>>>> + * @p is not a %NULL pointer, the object pointed to is freed.
> >>>>> + *
> >>>>> + * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
> >>>>> + *         failure
> >>>>> + */
> >>>>> +void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
> >>>>> +{
> >>>>> +	size_t old_size = 0;
> >>>>> +	void *n;
> >>>>> +
> >>>>> +	if (!size) {
> >>>>> +		vfree(p);
> >>>>> +		return NULL;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (p) {
> >>>>> +		struct vm_struct *vm;
> >>>>> +
> >>>>> +		vm = find_vm_area(p);
> >>>>> +		if (unlikely(!vm)) {
> >>>>> +			WARN(1, "Trying to vrealloc() nonexistent vm area (%p)\n", p);
> >>>>> +			return NULL;
> >>>>> +		}
> >>>>> +
> >>>>> +		old_size = get_vm_area_size(vm);
> >>>>> +	}
> >>>>> +
> >>>>> +	if (size <= old_size) {
> >>>>> +		/*
> >>>>> +		 * TODO: Shrink the vm_area, i.e. unmap and free unused pages.
> >>>>> +		 * What would be a good heuristic for when to shrink the
> >>>>> +		 * vm_area?
> >>>>> +		 */
> >>>>> +		return (void *)p;
> >>>>> +	}
> >>>>> +
> >>>>> +	/* TODO: Grow the vm_area, i.e. allocate and map additional pages. */
> >>>>> +	n = __vmalloc_noprof(size, flags);
> >>>>> +	if (!n)
> >>>>> +		return NULL;
> >>>>> +
> >>>>> +	if (p) {
> >>>>> +		memcpy(n, p, old_size);
> >>>>> +		vfree(p);
> >>>>> +	}
> >>>>> +
> >>>>> +	return n;
> >>>>> +}
> >>>>> +
> >>>>>  #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
> >>>>>  #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
> >>>>>  #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
> >>>>
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-30 13:14             ` Danilo Krummrich
@ 2024-07-30 13:58               ` Vlastimil Babka
  2024-07-30 14:32                 ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Vlastimil Babka @ 2024-07-30 13:58 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux, Feng Tang, kasan-dev

On 7/30/24 3:14 PM, Danilo Krummrich wrote:
> On Tue, Jul 30, 2024 at 02:15:34PM +0200, Vlastimil Babka wrote:
>> On 7/30/24 3:35 AM, Danilo Krummrich wrote:
>>> On Mon, Jul 29, 2024 at 09:08:16PM +0200, Danilo Krummrich wrote:
>>>> On Fri, Jul 26, 2024 at 10:05:47PM +0200, Danilo Krummrich wrote:
>>>>> On Fri, Jul 26, 2024 at 04:37:43PM +0200, Vlastimil Babka wrote:
>>>>>> On 7/22/24 6:29 PM, Danilo Krummrich wrote:
>>>>>>> Implement vrealloc() analogous to krealloc().
>>>>>>>
>>>>>>> Currently, krealloc() requires the caller to pass the size of the
>>>>>>> previous memory allocation, which, instead, should be self-contained.
>>>>>>>
>>>>>>> We attempt to fix this in a subsequent patch which, in order to do so,
>>>>>>> requires vrealloc().
>>>>>>>
>>>>>>> Besides that, we need realloc() functions for kernel allocators in Rust
>>>>>>> too. With `Vec` or `KVec` respectively, potentially growing (and
>>>>>>> shrinking) data structures are rather common.
>>>>>>>
>>>>>>> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
>>>>>>
>>>>>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>>>>>
>>>>>>> --- a/mm/vmalloc.c
>>>>>>> +++ b/mm/vmalloc.c
>>>>>>> @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL(vzalloc_node_noprof);
>>>>>>>  
>>>>>>> +/**
>>>>>>> + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
>>>>>>> + * @p: object to reallocate memory for
>>>>>>> + * @size: the size to reallocate
>>>>>>> + * @flags: the flags for the page level allocator
>>>>>>> + *
>>>>>>> + * The contents of the object pointed to are preserved up to the lesser of the
>>>>>>> + * new and old size (__GFP_ZERO flag is effectively ignored).
>>>>>>
>>>>>> Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
>>>>>> 4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
>>>>>> new, kvrealloc() did the same before patch 2/2.
>>>>>
>>>>> Taking it (too) literal, it's not wrong. The contents of the object pointed to
>>>>> are indeed preserved up to the lesser of the new and old size. It's just that
>>>>> the rest may be "preserved" as well.
>>>>>
>>>>> I work on implementing shrink and grow for vrealloc(). In the meantime I think
>>>>> we could probably just memset() spare memory to zero.
>>>>
>>>> Probably, this was a bad idea. Even with shrinking implemented we'd need to
>>>> memset() potential spare memory of the last page to zero, when new_size <
>>>> old_size.
>>>>
>>>> Analogously, the same would be true for krealloc() buckets. That's probably not
>>>> worth it.
>>
>> I think it could remove unexpected bad surprises with the API so why not
>> do it.
> 
> We'd either need to do it *every* time we shrink an allocation on spec, or we
> only do it when shrinking with __GFP_ZERO flag set, which might be a bit
> counter-intuitive.

I don't think it is that counterintuitive.

> If we do it, I'd probably vote for the latter semantics. While it sounds more
> error prone, it's less wasteful and enough to cover the most common case where
> the actual *realloc() call is always with the same parameters, but a changing
> size.

Yeah. Or with hardening enabled (init_on_alloc) it could always be done.
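
That combined condition might look roughly like this in the shrink path
sketched earlier (sketch only; want_init_on_alloc() is the existing helper
behind init_on_alloc, and p, size and old_size are the vrealloc() variables
from the patch):

	/* zero the spare tail if the caller asked for it or hardening wants it */
	if ((flags & __GFP_ZERO) || want_init_on_alloc(flags))
		memset((void *)p + size, 0, old_size - size);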

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-30 13:58               ` Vlastimil Babka
@ 2024-07-30 14:32                 ` Danilo Krummrich
  0 siblings, 0 replies; 28+ messages in thread
From: Danilo Krummrich @ 2024-07-30 14:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
	42.hyeyoo, urezki, hch, kees, ojeda, wedsonaf, mhocko, mpe,
	chandan.babu, christian.koenig, maz, oliver.upton, linux-kernel,
	linux-mm, rust-for-linux, Feng Tang, kasan-dev

On Tue, Jul 30, 2024 at 03:58:25PM +0200, Vlastimil Babka wrote:
> On 7/30/24 3:14 PM, Danilo Krummrich wrote:
> > On Tue, Jul 30, 2024 at 02:15:34PM +0200, Vlastimil Babka wrote:
> >> On 7/30/24 3:35 AM, Danilo Krummrich wrote:
> >>> On Mon, Jul 29, 2024 at 09:08:16PM +0200, Danilo Krummrich wrote:
> >>>> On Fri, Jul 26, 2024 at 10:05:47PM +0200, Danilo Krummrich wrote:
> >>>>> On Fri, Jul 26, 2024 at 04:37:43PM +0200, Vlastimil Babka wrote:
> >>>>>> On 7/22/24 6:29 PM, Danilo Krummrich wrote:
> >>>>>>> Implement vrealloc() analogous to krealloc().
> >>>>>>>
> >>>>>>> Currently, krealloc() requires the caller to pass the size of the
> >>>>>>> previous memory allocation, which, instead, should be self-contained.
> >>>>>>>
> >>>>>>> We attempt to fix this in a subsequent patch which, in order to do so,
> >>>>>>> requires vrealloc().
> >>>>>>>
> >>>>>>> Besides that, we need realloc() functions for kernel allocators in Rust
> >>>>>>> too. With `Vec` or `KVec` respectively, potentially growing (and
> >>>>>>> shrinking) data structures are rather common.
> >>>>>>>
> >>>>>>> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> >>>>>>
> >>>>>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> >>>>>>
> >>>>>>> --- a/mm/vmalloc.c
> >>>>>>> +++ b/mm/vmalloc.c
> >>>>>>> @@ -4037,6 +4037,65 @@ void *vzalloc_node_noprof(unsigned long size, int node)
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL(vzalloc_node_noprof);
> >>>>>>>  
> >>>>>>> +/**
> >>>>>>> + * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
> >>>>>>> + * @p: object to reallocate memory for
> >>>>>>> + * @size: the size to reallocate
> >>>>>>> + * @flags: the flags for the page level allocator
> >>>>>>> + *
> >>>>>>> + * The contents of the object pointed to are preserved up to the lesser of the
> >>>>>>> + * new and old size (__GFP_ZERO flag is effectively ignored).
> >>>>>>
> >>>>>> Well, technically not correct as we don't shrink. Get 8 pages, kvrealloc to
> >>>>>> 4 pages, kvrealloc back to 8 and the last 4 are not zeroed. But it's not
> >>>>>> new, kvrealloc() did the same before patch 2/2.
> >>>>>
> >>>>> Taking it (too) literal, it's not wrong. The contents of the object pointed to
> >>>>> are indeed preserved up to the lesser of the new and old size. It's just that
> >>>>> the rest may be "preserved" as well.
> >>>>>
> >>>>> I work on implementing shrink and grow for vrealloc(). In the meantime I think
> >>>>> we could probably just memset() spare memory to zero.
> >>>>
> >>>> Probably, this was a bad idea. Even with shrinking implemented we'd need to
> >>>> memset() potential spare memory of the last page to zero, when new_size <
> >>>> old_size.
> >>>>
> >>>> Analogously, the same would be true for krealloc() buckets. That's probably not
> >>>> worth it.
> >>
> >> I think it could remove unexpected bad surprises with the API so why not
> >> do it.
> > 
> > We'd either need to do it *every* time we shrink an allocation on spec, or we
> > only do it when shrinking with __GFP_ZERO flag set, which might be a bit
> > counter-intuitive.
> 
> I don't think it is that much counterintuitive.
> 
> > If we do it, I'd probably vote for the latter semantics. While it sounds more
> > error prone, it's less wasteful and enough to cover the most common case where
> > the actual *realloc() call is always with the same parameters, but a changing
> > size.
> 
> Yeah. Or with hardening enabled (init_on_alloc) it could be done always.
> 

Ok, sounds good. Will go with that then.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-07-30 12:15           ` Vlastimil Babka
  2024-07-30 13:14             ` Danilo Krummrich
@ 2024-09-02  1:36             ` Feng Tang
  2024-09-02  7:04               ` Feng Tang
  1 sibling, 1 reply; 28+ messages in thread
From: Feng Tang @ 2024-09-02  1:36 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Danilo Krummrich, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, roman.gushchin@linux.dev,
	42.hyeyoo@gmail.com, urezki@gmail.com, hch@infradead.org,
	kees@kernel.org, ojeda@kernel.org, wedsonaf@gmail.com,
	mhocko@kernel.org, mpe@ellerman.id.au, chandan.babu@oracle.com,
	christian.koenig@amd.com, maz@kernel.org, oliver.upton@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	rust-for-linux@vger.kernel.org, kasan-dev

On Tue, Jul 30, 2024 at 08:15:34PM +0800, Vlastimil Babka wrote:
> On 7/30/24 3:35 AM, Danilo Krummrich wrote:
[...]
> > 
> > Maybe I spoke a bit to soon with this last paragraph. I think continuously
> > gowing something with __GFP_ZERO is a legitimate use case. I just did a quick
> > grep for users of krealloc() with __GFP_ZERO and found 18 matches.
> > 
> > So, I think, at least for now, we should instead document that __GFP_ZERO is
> > only fully honored when the buffer is grown continuously (without intermediate
> > shrinking) and __GFP_ZERO is supplied in every iteration.
> > 
> > In case I miss something here, and not even this case is safe, it looks like
> > we have 18 broken users of krealloc().
> 
> +CC Feng Tang

Sorry for the late reply!

> 
> Let's say we kmalloc(56, __GFP_ZERO), we get an object from kmalloc-64
> cache. Since commit 946fa0dbf2d89 ("mm/slub: extend redzone check to
> extra allocated kmalloc space than requested") and preceding commits, if
> slub_debug is enabled (red zoning or user tracking), only the 56 bytes
> will be zeroed. The rest will be either unknown garbage, or redzone.

Yes.

> 
> Then we might e.g. krealloc(120) and get a kmalloc-128 object and 64
> bytes (result of ksize()) will be copied, including the garbage/redzone.
> I think it's fixable because when we do this in slub_debug, we also
> store the original size in the metadata, so we could read it back and
> adjust how many bytes are copied.

krealloc() --> __do_krealloc() --> ksize()
When ksize() is called, as we don't know what the user will do with the
extra space ([57, 64] here), the orig_size check will be unset by
__ksize() calling skip_orig_size_check(). 

And if the newsize is bigger than the old 'ksize', the 'orig_size'
will be correctly set for the newly allocated kmalloc object.

For the 'unstable' branch of the -mm tree, which has all the latest patches
from Danilo, I ran some basic tests and it seems to be fine. 

> 
> Then we could guarantee that if __GFP_ZERO is used consistently on
> initial kmalloc() and on krealloc() and the user doesn't corrupt the
> extra space themselves (which is a bug anyway that the redzoning is
> supposed to catch) all will be fine.
> 
> There might be also KASAN side to this, I see poison_kmalloc_redzone()
> is also redzoning the area between requested size and cache's object_size?

AFAIK, KASAN has 3 modes: generic, SW-tagged and HW-tagged, while the
latter 2 modes rely on arm64. In 'generic' mode, poison_kmalloc_redzone()
only redzones its own shadow memory, and not the kmalloc object data
space [orig_size + 1, ksize]. For the other 2 modes, I have no hardware
to test with, but I guess they are also fine, otherwise there would
already be some bug report :), as normal kmalloc() may call it too. 

Thanks,
Feng

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-09-02  1:36             ` Feng Tang
@ 2024-09-02  7:04               ` Feng Tang
  2024-09-02  8:56                 ` Vlastimil Babka
  0 siblings, 1 reply; 28+ messages in thread
From: Feng Tang @ 2024-09-02  7:04 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Danilo Krummrich, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, roman.gushchin@linux.dev,
	42.hyeyoo@gmail.com, urezki@gmail.com, hch@infradead.org,
	kees@kernel.org, ojeda@kernel.org, wedsonaf@gmail.com,
	mhocko@kernel.org, mpe@ellerman.id.au, chandan.babu@oracle.com,
	christian.koenig@amd.com, maz@kernel.org, oliver.upton@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	rust-for-linux@vger.kernel.org, kasan-dev

On Mon, Sep 02, 2024 at 09:36:26AM +0800, Tang, Feng wrote:
> On Tue, Jul 30, 2024 at 08:15:34PM +0800, Vlastimil Babka wrote:
> > On 7/30/24 3:35 AM, Danilo Krummrich wrote:
[...]
> > 
> > Let's say we kmalloc(56, __GFP_ZERO), we get an object from kmalloc-64
> > cache. Since commit 946fa0dbf2d89 ("mm/slub: extend redzone check to
> > extra allocated kmalloc space than requested") and preceding commits, if
> > slub_debug is enabled (red zoning or user tracking), only the 56 bytes
> > will be zeroed. The rest will be either unknown garbage, or redzone.
> 
> Yes.
> 
> > 
> > Then we might e.g. krealloc(120) and get a kmalloc-128 object and 64
> > bytes (result of ksize()) will be copied, including the garbage/redzone.
> > I think it's fixable because when we do this in slub_debug, we also
> > store the original size in the metadata, so we could read it back and
> > adjust how many bytes are copied.
> 
> krealloc() --> __do_krealloc() --> ksize()
> When ksize() is called, as we don't know what user will do with the
> extra space ([57, 64] here), the orig_size check will be unset by
> __ksize() calling skip_orig_size_check(). 
> 
> And if the newsize is bigger than the old 'ksize', the 'orig_size'
> will be correctly set for the newly allocated kmalloc object.
> 
> For the 'unstable' branch of -mm tree, which has all latest patches
> from Danilo, I run some basic test and it seems to be fine. 

When doing more tests, I found one case matching Vlastimil's previous
concern: if we kzalloc() a small object and then krealloc() with
a slightly bigger size which can still reuse the kmalloc object,
some redzone will be preserved.

With test code like: 

	buf = kzalloc(36, GFP_KERNEL);
	memset(buf, 0xff, 36);

	buf = krealloc(buf, 48, GFP_KERNEL | __GFP_ZERO);

Data after kzalloc+memset :

	ffff88802189b040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
	ffff88802189b050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
	ffff88802189b060: ff ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc  
	ffff88802189b070: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  

Data after krealloc:

	ffff88802189b040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
	ffff88802189b050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
	ffff88802189b060: ff ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc
	ffff88802189b070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

If we really want [37, 48] to be zeroed too, we can lift
get_orig_size() from slub.c to slab_common.c and use it as the start
of the zeroing in krealloc().
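
Roughly sketched, that could look like the following inside the object-reuse
path of krealloc() (assuming get_orig_size() were made visible outside
slub.c; s, p and new_size stand for the cache, the object and the requested
size in that path, and the surrounding code is simplified):

	/* Sketch only: the same kmalloc object is reused because new_size still fits. */
	if (flags & __GFP_ZERO) {
		/* what the caller originally asked for (slub_debug metadata) */
		size_t orig_size = get_orig_size(s, (void *)p);

		/* zero everything the caller could not rely on being zeroed */
		if (new_size > orig_size)
			memset((void *)p + orig_size, 0, new_size - orig_size);
	}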

Thanks,
Feng

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-09-02  7:04               ` Feng Tang
@ 2024-09-02  8:56                 ` Vlastimil Babka
  2024-09-03  3:18                   ` Feng Tang
  0 siblings, 1 reply; 28+ messages in thread
From: Vlastimil Babka @ 2024-09-02  8:56 UTC (permalink / raw)
  To: Feng Tang
  Cc: Danilo Krummrich, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, roman.gushchin@linux.dev,
	42.hyeyoo@gmail.com, urezki@gmail.com, hch@infradead.org,
	kees@kernel.org, ojeda@kernel.org, wedsonaf@gmail.com,
	mhocko@kernel.org, mpe@ellerman.id.au, chandan.babu@oracle.com,
	christian.koenig@amd.com, maz@kernel.org, oliver.upton@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	rust-for-linux@vger.kernel.org, kasan-dev

On 9/2/24 09:04, Feng Tang wrote:
> On Mon, Sep 02, 2024 at 09:36:26AM +0800, Tang, Feng wrote:
>> On Tue, Jul 30, 2024 at 08:15:34PM +0800, Vlastimil Babka wrote:
>> > On 7/30/24 3:35 AM, Danilo Krummrich wrote:
> [...]
>> > 
>> > Let's say we kmalloc(56, __GFP_ZERO), we get an object from kmalloc-64
>> > cache. Since commit 946fa0dbf2d89 ("mm/slub: extend redzone check to
>> > extra allocated kmalloc space than requested") and preceding commits, if
>> > slub_debug is enabled (red zoning or user tracking), only the 56 bytes
>> > will be zeroed. The rest will be either unknown garbage, or redzone.
>> 
>> Yes.
>> 
>> > 
>> > Then we might e.g. krealloc(120) and get a kmalloc-128 object and 64
>> > bytes (result of ksize()) will be copied, including the garbage/redzone.
>> > I think it's fixable because when we do this in slub_debug, we also
>> > store the original size in the metadata, so we could read it back and
>> > adjust how many bytes are copied.
>> 
>> krealloc() --> __do_krealloc() --> ksize()
>> When ksize() is called, as we don't know what user will do with the
>> extra space ([57, 64] here), the orig_size check will be unset by
>> __ksize() calling skip_orig_size_check(). 
>> 
>> And if the newsize is bigger than the old 'ksize', the 'orig_size'
>> will be correctly set for the newly allocated kmalloc object.

Yes, but the memcpy() to the new object will be done using ksize() and thus
include the redzone, e.g. [57, 64].

>> For the 'unstable' branch of -mm tree, which has all latest patches
>> from Danilo, I run some basic test and it seems to be fine. 

To test this, it would not always be enough to expect some slub_debug check to
fail; you'd e.g. have to kmalloc(48, GFP_KERNEL | __GFP_ZERO), krealloc(128,
GFP_KERNEL | __GFP_ZERO) and then verify there are zeroes from 48 to 128. I
suspect there won't be zeroes from 48 to 64 due to the redzone.

(this would have made a great lib/slub_kunit.c test :))
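
As a rough sketch of what such a lib/slub_kunit.c case might look like (the
test name is made up and this is not an existing test; with the behavior
discussed above it would currently be expected to fail on the [48, 64) range):

	#include <kunit/test.h>
	#include <linux/slab.h>

	static void krealloc_zeroing_test(struct kunit *test)
	{
		char *p = kmalloc(48, GFP_KERNEL | __GFP_ZERO);
		int i;

		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, p);

		p = krealloc(p, 128, GFP_KERNEL | __GFP_ZERO);
		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, p);

		/* everything past the originally requested 48 bytes should be zero */
		for (i = 48; i < 128; i++)
			KUNIT_EXPECT_EQ(test, p[i], 0);

		kfree(p);
	}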

> when doing more test, I found one case matching Vlastimil's previous
> concern, that if we kzalloc a small object, and then krealloc with
> a slightly bigger size which can still reuse the kmalloc object,
> some redzone will be preserved.
> 
> With test code like: 
> 
> 	buf = kzalloc(36, GFP_KERNEL);
> 	memset(buf, 0xff, 36);
> 
> 	buf = krealloc(buf, 48, GFP_KERNEL | __GFP_ZERO);
> 
> Data after kzalloc+memset :
> 
> 	ffff88802189b040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
> 	ffff88802189b050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
> 	ffff88802189b060: ff ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc  
> 	ffff88802189b070: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  
> 
> Data after krealloc:
> 
> 	ffff88802189b040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 	ffff88802189b050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 	ffff88802189b060: ff ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc
> 	ffff88802189b070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> If we really want to make [37, 48] to be zeroed too, we can lift the
> get_orig_size() from slub.c to slab_common.c and use it as the start
> of zeroing in krealloc().

Or maybe just move krealloc() to mm/slub.c so there are no unnecessary calls
between the files.

We should also set a new orig_size in cases where we are shrinking or enlarging
within the same object (i.e. 48->40 or 48->64). In case of shrinking, we also
might need to redzone the shrunk area (i.e. [40, 48]) or later checks will
fail. But if the current object is from kfence, then we should probably not do
any of this... sigh, this gets complicated. And we really need kunit tests for
all the scenarios :/

> Thanks,
> Feng


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-09-02  8:56                 ` Vlastimil Babka
@ 2024-09-03  3:18                   ` Feng Tang
  2024-09-06  7:35                     ` Feng Tang
  0 siblings, 1 reply; 28+ messages in thread
From: Feng Tang @ 2024-09-03  3:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Danilo Krummrich, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, roman.gushchin@linux.dev,
	42.hyeyoo@gmail.com, urezki@gmail.com, hch@infradead.org,
	kees@kernel.org, ojeda@kernel.org, wedsonaf@gmail.com,
	mhocko@kernel.org, mpe@ellerman.id.au, chandan.babu@oracle.com,
	christian.koenig@amd.com, maz@kernel.org, oliver.upton@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	rust-for-linux@vger.kernel.org, kasan-dev

On Mon, Sep 02, 2024 at 10:56:57AM +0200, Vlastimil Babka wrote:
> On 9/2/24 09:04, Feng Tang wrote:
> > On Mon, Sep 02, 2024 at 09:36:26AM +0800, Tang, Feng wrote:
> >> On Tue, Jul 30, 2024 at 08:15:34PM +0800, Vlastimil Babka wrote:
> >> > On 7/30/24 3:35 AM, Danilo Krummrich wrote:
> > [...]
> >> > 
> >> > Let's say we kmalloc(56, __GFP_ZERO), we get an object from kmalloc-64
> >> > cache. Since commit 946fa0dbf2d89 ("mm/slub: extend redzone check to
> >> > extra allocated kmalloc space than requested") and preceding commits, if
> >> > slub_debug is enabled (red zoning or user tracking), only the 56 bytes
> >> > will be zeroed. The rest will be either unknown garbage, or redzone.
> >> 
> >> Yes.
> >> 
> >> > 
> >> > Then we might e.g. krealloc(120) and get a kmalloc-128 object and 64
> >> > bytes (result of ksize()) will be copied, including the garbage/redzone.
> >> > I think it's fixable because when we do this in slub_debug, we also
> >> > store the original size in the metadata, so we could read it back and
> >> > adjust how many bytes are copied.
> >> 
> >> krealloc() --> __do_krealloc() --> ksize()
> >> When ksize() is called, as we don't know what user will do with the
> >> extra space ([57, 64] here), the orig_size check will be unset by
> >> __ksize() calling skip_orig_size_check(). 
> >> 
> >> And if the newsize is bigger than the old 'ksize', the 'orig_size'
> >> will be correctly set for the newly allocated kmalloc object.
> 
> Yes, but the memcpy() to the new object will be done using ksize() thus
> include the redzone, e.g. [57, 64]

Right.
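
Something like the sketch below is what I have in mind for that part (just a
sketch, not the real __do_krealloc(): kmalloc_orig_size() is a hypothetical
stand-in for the static get_orig_size() in mm/slub.c, and the NULL,
ZERO_SIZE_PTR, kasan and __GFP_ZERO handling is all left out):

	/*
	 * Sketch: bound the copy by the originally requested size rather than
	 * ksize(), so redzone bytes such as [57, 64) never get copied into the
	 * new object.
	 */
	static void *krealloc_move_sketch(const void *p, size_t new_size, gfp_t flags)
	{
		size_t ks = ksize(p);
		size_t old = min_t(size_t, ks, kmalloc_orig_size(p)); /* hypothetical */
		void *ret;

		if (ks >= new_size)
			return (void *)p;

		ret = kmalloc_node_track_caller(new_size, flags, NUMA_NO_NODE);
		if (ret) {
			/* copy only what the caller originally asked for */
			memcpy(ret, p, min_t(size_t, old, new_size));
			kfree(p);
		}
		return ret;
	}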

> 
> >> For the 'unstable' branch of -mm tree, which has all latest patches
> >> from Danilo, I run some basic test and it seems to be fine. 
> 
> To test it would not always be enough to expect some slub_debug to fail,
> you'd e.g. have to kmalloc(48, GFP_KERNEL | GFP_ZERO), krealloc(128,
> GFP_KERNEL | GFP_ZERO) and then verify there are zeroes from 48 to 128. I
> suspect there won't be zeroes from 48 to 64 due to redzone.

Yes, you are right.
 
> (this would have made a great lib/slub_kunit.c test :))

Agree.
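
For reference, a minimal kunit case along those lines could look roughly like
this (sketch only, to be hooked into the existing lib/slub_kunit.c suite; it
relies on slub_debug redzoning being enabled so that [48, 64) is non-zero
before the krealloc()):

	static void test_krealloc_redzone_zeroing(struct kunit *test)
	{
		u8 *buf;
		int i;

		buf = kmalloc(48, GFP_KERNEL | __GFP_ZERO);
		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);

		buf = krealloc(buf, 128, GFP_KERNEL | __GFP_ZERO);
		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);

		/* [48, 128) must be zero, including the old [48, 64) redzone */
		for (i = 48; i < 128; i++)
			KUNIT_EXPECT_EQ(test, buf[i], 0);

		kfree(buf);
	}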

> > when doing more test, I found one case matching Vlastimil's previous
> > concern, that if we kzalloc a small object, and then krealloc with
> > a slightly bigger size which can still reuse the kmalloc object,
> > some redzone will be preserved.
> > 
> > With test code like: 
> > 
> > 	buf = kzalloc(36, GFP_KERNEL);
> > 	memset(buf, 0xff, 36);
> > 
> > 	buf = krealloc(buf, 48, GFP_KERNEL | __GFP_ZERO);
> > 
> > Data after kzalloc+memset :
> > 
> > 	ffff88802189b040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
> > 	ffff88802189b050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
> > 	ffff88802189b060: ff ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc  
> > 	ffff88802189b070: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  
> > 
> > Data after krealloc:
> > 
> > 	ffff88802189b040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 	ffff88802189b050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 	ffff88802189b060: ff ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc
> > 	ffff88802189b070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 
> > If we really want to make [37, 48] to be zeroed too, we can lift the
> > get_orig_size() from slub.c to slab_common.c and use it as the start
> > of zeroing in krealloc().
> 
> Or maybe just move krealloc() to mm/slub.c so there are no unnecessary calls
> between the files.
> 
> We should also set a new orig_size in cases where we are shrinking or
> enlarging within the same object (e.g. 48->40 or 48->64). In the case of
> shrinking, we also might need to redzone the shrunk area (i.e. [40, 48]) or
> later checks will fail. But if the current object is from kfence, then we
> should probably not do any of this... sigh, this gets complicated. And we
> really need kunit tests for all the scenarios :/

Good point! I will think about it and try to implement it, to make sure the
orig_size and kmalloc-redzone check setup stay consistent.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] mm: vmalloc: implement vrealloc()
  2024-09-03  3:18                   ` Feng Tang
@ 2024-09-06  7:35                     ` Feng Tang
  0 siblings, 0 replies; 28+ messages in thread
From: Feng Tang @ 2024-09-06  7:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Danilo Krummrich, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, roman.gushchin@linux.dev,
	42.hyeyoo@gmail.com, urezki@gmail.com, hch@infradead.org,
	kees@kernel.org, ojeda@kernel.org, wedsonaf@gmail.com,
	mhocko@kernel.org, mpe@ellerman.id.au, chandan.babu@oracle.com,
	christian.koenig@amd.com, maz@kernel.org, oliver.upton@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	rust-for-linux@vger.kernel.org, kasan-dev

On Tue, Sep 03, 2024 at 11:18:48AM +0800, Tang, Feng wrote:
> On Mon, Sep 02, 2024 at 10:56:57AM +0200, Vlastimil Babka wrote:
[...]
> > > If we really want to make [37, 48] to be zeroed too, we can lift the
> > > get_orig_size() from slub.c to slab_common.c and use it as the start
> > > of zeroing in krealloc().
> > 
> > Or maybe just move krealloc() to mm/slub.c so there are no unnecessary calls
> > between the files.
> > 
> > We should also set a new orig_size in cases where we are shrinking or
> > enlarging within the same object (e.g. 48->40 or 48->64). In the case of
> > shrinking, we also might need to redzone the shrunk area (i.e. [40, 48]) or
> > later checks will fail. But if the current object is from kfence, then we
> > should probably not do any of this... sigh, this gets complicated. And we
> > really need kunit tests for all the scenarios :/
> 
> Good point! I will think about it and try to implement it, to make sure the
> orig_size and kmalloc-redzone check setup stay consistent.

I checked this, and as you mentioned, there is some kfence and kasan stuff
which needs to be handled to manage the 'orig_size'. As this work depends
on patches in both the -slab and -mm trees, I will base it against the
linux-next tree and send out the patches for review soon.
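
Roughly, the guard I have in mind looks like the sketch below (names and the
exact flag check are illustrative only, and the pointer may additionally need
kasan_reset_tag() before any metadata lookup):

	/*
	 * Sketch: decide whether an in-place krealloc() may touch the
	 * orig_size/redzone metadata at all. kfence-backed objects carry no
	 * slub_debug metadata, and orig_size is only stored when the relevant
	 * debug flags are enabled for the cache.
	 */
	static bool krealloc_may_adjust_orig_size(struct kmem_cache *s, void *p)
	{
		if (is_kfence_address(p))
			return false;

		if (!(s->flags & (SLAB_STORE_USER | SLAB_RED_ZONE)))
			return false;

		return true;
	}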

Thanks,
Feng

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2024-09-06  7:35 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-22 16:29 [PATCH v2 0/2] Align kvrealloc() with krealloc() Danilo Krummrich
2024-07-22 16:29 ` [PATCH v2 1/2] mm: vmalloc: implement vrealloc() Danilo Krummrich
2024-07-26 14:37   ` Vlastimil Babka
2024-07-26 20:05     ` Danilo Krummrich
2024-07-29 19:08       ` Danilo Krummrich
2024-07-30  1:35         ` Danilo Krummrich
2024-07-30 12:15           ` Vlastimil Babka
2024-07-30 13:14             ` Danilo Krummrich
2024-07-30 13:58               ` Vlastimil Babka
2024-07-30 14:32                 ` Danilo Krummrich
2024-09-02  1:36             ` Feng Tang
2024-09-02  7:04               ` Feng Tang
2024-09-02  8:56                 ` Vlastimil Babka
2024-09-03  3:18                   ` Feng Tang
2024-09-06  7:35                     ` Feng Tang
2024-07-22 16:29 ` [PATCH v2 2/2] mm: kvmalloc: align kvrealloc() with krealloc() Danilo Krummrich
2024-07-23  1:43   ` Andrew Morton
2024-07-23 14:05     ` Danilo Krummrich
2024-07-23  7:50   ` Michal Hocko
2024-07-23 10:42     ` Danilo Krummrich
2024-07-23 10:55       ` Michal Hocko
2024-07-23 11:55         ` Danilo Krummrich
2024-07-23 12:12           ` Michal Hocko
2024-07-23 13:33             ` Danilo Krummrich
2024-07-23 18:53               ` Michal Hocko
2024-07-26 14:38   ` Vlastimil Babka
2024-07-23 18:54 ` [PATCH v2 0/2] Align " Michal Hocko
2024-07-23 18:56   ` Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).