* [PATCH v5 0/6] slab: Introduce dedicated bucket allocator
@ 2024-06-19 19:33 Kees Cook
2024-06-19 19:33 ` [PATCH v5 1/6] mm/slab: Introduce kmem_buckets typedef Kees Cook
` (5 more replies)
From: Kees Cook @ 2024-06-19 19:33 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
Hi,
v5:
- Use vbabka's macros for optional arguments (thank you! I added a
Co-developed-by and S-o-b)
- Do not make Kconfig "default y", but recommend that it be enabled (vbabka)
- Do not check for NULL before kmem_cache_destroy() on error path (horms)
- Adjust size/bucket argument ordering on slab_alloc()
- Make sure kmem_buckets cache itself is SLAB_NO_MERGE
- Do not include "_noprof" in kern-doc (it is redundant)
- Fix kern-doc argument ordering
v4: https://lore.kernel.org/lkml/20240531191304.it.853-kees@kernel.org/
v3: https://lore.kernel.org/lkml/20240424213019.make.366-kees@kernel.org/
v2: https://lore.kernel.org/lkml/20240305100933.it.923-kees@kernel.org/
v1: https://lore.kernel.org/lkml/20240304184252.work.496-kees@kernel.org/
For the cover letter, I'm repeating the commit log for patch 4 here,
which has additional clarifications and rationale since v2:
Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.
This leads to a use-after-free exploitation weakness in the Linux
kernel since many heap memory spraying/grooming attacks depend on using
userspace-controllable dynamically sized allocations to collide with
fixed size allocations that end up in the same cache.
While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
against these kinds of "type confusion" attacks, including for fixed
same-size heap objects, we can create a complementary deterministic
defense for dynamically sized allocations that are directly user
controlled. Addressing these cases is limited in scope, so isolating these
kinds of interfaces will not become an unbounded game of whack-a-mole. For
example, many pass through memdup_user(), making isolation there very
effective.
In order to isolate user-controllable dynamically sized allocations from system
allocations, introduce kmem_buckets_create(), which behaves like
kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like
kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for
where caller tracking is needed. Introduce kmem_buckets_valloc() for
cases where vmalloc fallback is needed.
This allows for confining allocations to a dedicated set of sized caches
(which have the same layout as the kmalloc caches).
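As a rough usage sketch (the "foo" names below are made up for illustration;
the real conversions are in the last two patches):

static kmem_buckets *foo_buckets __ro_after_init;

static int __init init_foo_buckets(void)
{
        /* Same arguments as kmem_cache_create_usercopy(), minus the object size. */
        foo_buckets = kmem_buckets_create("foo", 0, 0, 0, INT_MAX, NULL);
        return 0;
}
subsys_initcall(init_foo_buckets);

static void *foo_alloc(size_t len)
{
        /* Lands in the dedicated "foo-*" caches; still freed with kfree(). */
        return kmem_buckets_alloc(foo_buckets, len, GFP_KERNEL);
}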
This can also be used in the future to extend codetag allocation
annotations to implement per-caller allocation cache isolation[1] even
for dynamic allocations.
Memory allocation pinning[2] is still needed to plug the Use-After-Free
cross-allocator weakness, but that is an existing and separate issue
which is complementary to this improvement. Development continues for
that feature via the SLAB_VIRTUAL[3] series (which could also provide
guard pages -- another complementary improvement).
Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
After the core implementation come 2 patches that cover the most heavily
abused "repeat offenders" used in exploits. Repeating those details here:
The msg subsystem is a common target for exploiting[1][2][3][4][5][6][7]
use-after-free type confusion flaws in the kernel for both read and
write primitives. Avoid having user-controlled dynamically sized
allocations share the global kmalloc caches by using a separate set of
kmalloc buckets.
Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
Link: https://zplin.me/papers/ELOISE.pdf [6]
Link: https://syst3mfailure.io/wall-of-perdition/ [7]
Both memdup_user() and vmemdup_user() handle allocations that are
regularly used for exploiting use-after-free type confusion flaws in
the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
respectively).
Since both are designed for contents coming from userspace, they allow
for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
buckets so these allocations do not share caches with the global kmalloc
buckets.
Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
Link: https://etenal.me/archives/1336 [3]
Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]
Thanks!
-Kees
Kees Cook (6):
mm/slab: Introduce kmem_buckets typedef
mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets
argument
mm/slab: Introduce kmem_buckets_create() and family
ipc, msg: Use dedicated slab buckets for alloc_msg()
mm/util: Use dedicated slab buckets for memdup_user()
include/linux/slab.h | 49 +++++++++++++++++++++-----
ipc/msgutil.c | 13 ++++++-
mm/Kconfig | 16 +++++++++
mm/slab.h | 6 ++--
mm/slab_common.c | 83 ++++++++++++++++++++++++++++++++++++++++++--
mm/slub.c | 20 +++++------
mm/util.c | 23 ++++++++----
7 files changed, 180 insertions(+), 30 deletions(-)
--
2.34.1
* [PATCH v5 1/6] mm/slab: Introduce kmem_buckets typedef
2024-06-19 19:33 [PATCH v5 0/6] slab: Introduce dedicated bucket allocator Kees Cook
@ 2024-06-19 19:33 ` Kees Cook
2024-06-19 19:33 ` [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node() Kees Cook
` (4 subsequent siblings)
From: Kees Cook @ 2024-06-19 19:33 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
Encapsulate the concept of a single set of kmem_caches that are used
for the kmalloc size buckets. Redefine kmalloc_caches as an array
of these buckets, one per global kmalloc cache type.
Signed-off-by: Kees Cook <kees@kernel.org>
---
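Note (a sketch, not part of the diff): the typedef just names one row of the
existing two-dimensional array, so a whole set of size buckets can be passed
around as a single pointer:

        kmem_buckets *b = &kmalloc_caches[KMALLOC_NORMAL];
        struct kmem_cache *s = (*b)[kmalloc_index(128)];        /* the kmalloc-128 cache */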
include/linux/slab.h | 5 +++--
mm/slab_common.c | 3 +--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index ed6bee5ec2b6..8a006fac57c6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -426,8 +426,9 @@ enum kmalloc_cache_type {
NR_KMALLOC_TYPES
};
-extern struct kmem_cache *
-kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
+typedef struct kmem_cache * kmem_buckets[KMALLOC_SHIFT_HIGH + 1];
+
+extern kmem_buckets kmalloc_caches[NR_KMALLOC_TYPES];
/*
* Define gfp bits that should not be set for KMALLOC_NORMAL.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1560a1546bb1..e0b1c109bed2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -653,8 +653,7 @@ static struct kmem_cache *__init create_kmalloc_cache(const char *name,
return s;
}
-struct kmem_cache *
-kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init =
+kmem_buckets kmalloc_caches[NR_KMALLOC_TYPES] __ro_after_init =
{ /* initialization for https://llvm.org/pr42570 */ };
EXPORT_SYMBOL(kmalloc_caches);
--
2.34.1
* [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
2024-06-19 19:33 [PATCH v5 0/6] slab: Introduce dedicated bucket allocator Kees Cook
2024-06-19 19:33 ` [PATCH v5 1/6] mm/slab: Introduce kmem_buckets typedef Kees Cook
@ 2024-06-19 19:33 ` Kees Cook
2024-06-20 13:08 ` Vlastimil Babka
2024-06-19 19:33 ` [PATCH v5 3/6] mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument Kees Cook
` (3 subsequent siblings)
From: Kees Cook @ 2024-06-19 19:33 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
support separated kmalloc buckets (in the following kmem_buckets_create()
patches and future codetag-based separation). Since this will provide
a mitigation for a very common case of exploits, enable it by default.
To be able to choose which buckets to allocate from, make the buckets
available to the internal kmalloc interfaces by adding them as the
first argument, rather than depending on the buckets being chosen from
the fixed set of global buckets. Where the bucket is not available,
pass NULL, which means "use the default system kmalloc bucket set"
(the prior existing behavior), as implemented in kmalloc_slab().
To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
top-level macros and static inlines use the buckets argument (where
they are stripped out and compiled out respectively). The actual extern
functions can then been built without the argument, and the internals
fall back to the global kmalloc buckets unconditionally.
Co-developed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Kees Cook <kees@kernel.org>
---
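Note (illustrative only, not taken from the diff): with CONFIG_SLAB_BUCKETS=y
the helper macros expand so the bucket pointer becomes a real second
parameter, and with it disabled the original signatures are kept:

        /* CONFIG_SLAB_BUCKETS=y */
        void *__kmalloc_node_noprof(size_t size, kmem_buckets *b, gfp_t flags, int node);
        __kmalloc_node_noprof(size, NULL, flags, node);

        /* CONFIG_SLAB_BUCKETS=n: PASS_BUCKET_PARAM(b) evaluates to NULL inside
         * mm/slub.c, so the internals always use the global kmalloc buckets. */
        void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node);
        __kmalloc_node_noprof(size, flags, node);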
include/linux/slab.h | 27 ++++++++++++++++++++++-----
mm/Kconfig | 16 ++++++++++++++++
mm/slab.h | 6 ++++--
mm/slab_common.c | 2 +-
mm/slub.c | 20 ++++++++++----------
5 files changed, 53 insertions(+), 18 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 8a006fac57c6..708bde6039f0 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -570,6 +570,21 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
int node) __assume_slab_alignment __malloc;
#define kmem_cache_alloc_node(...) alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
+/*
+ * These macros allow declaring a kmem_buckets * parameter alongside size, which
+ * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
+ * sites don't have to pass NULL.
+ */
+#ifdef CONFIG_SLAB_BUCKETS
+#define DECL_BUCKET_PARAMS(_size, _b) size_t (_size), kmem_buckets *(_b)
+#define PASS_BUCKET_PARAMS(_size, _b) (_size), (_b)
+#define PASS_BUCKET_PARAM(_b) (_b)
+#else
+#define DECL_BUCKET_PARAMS(_size, _b) size_t (_size)
+#define PASS_BUCKET_PARAMS(_size, _b) (_size)
+#define PASS_BUCKET_PARAM(_b) NULL
+#endif
+
/*
* The following functions are not to be used directly and are intended only
* for internal use from kmalloc() and kmalloc_node()
@@ -579,7 +594,7 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
void *__kmalloc_noprof(size_t size, gfp_t flags)
__assume_kmalloc_alignment __alloc_size(1);
-void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node)
+void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
__assume_kmalloc_alignment __alloc_size(1);
void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
@@ -679,7 +694,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gf
kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
flags, node, size);
}
- return __kmalloc_node_noprof(size, flags, node);
+ return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node);
}
#define kmalloc_node(...) alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))
@@ -730,8 +745,10 @@ static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(voi
*/
#define kcalloc(n, size, flags) kmalloc_array(n, size, (flags) | __GFP_ZERO)
-void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags, int node,
- unsigned long caller) __alloc_size(1);
+void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node,
+ unsigned long caller) __alloc_size(1);
+#define kmalloc_node_track_caller_noprof(size, flags, node, caller) \
+ __kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller)
#define kmalloc_node_track_caller(...) \
alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))
@@ -757,7 +774,7 @@ static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
return kmalloc_node_noprof(bytes, flags, node);
- return __kmalloc_node_noprof(bytes, flags, node);
+ return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(bytes, NULL), flags, node);
}
#define kmalloc_array_node(...) alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a54..20bb71e241c3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -273,6 +273,22 @@ config SLAB_FREELIST_HARDENED
sacrifices to harden the kernel slab allocator against common
freelist exploit methods.
+config SLAB_BUCKETS
+ bool "Support allocation from separate kmalloc buckets"
+ depends on !SLUB_TINY
+ help
+ Kernel heap attacks frequently depend on being able to create
+ specifically-sized allocations with user-controlled contents
+ that will be allocated into the same kmalloc bucket as a
+ target object. To avoid sharing these allocation buckets,
+ provide an explicitly separated set of buckets to be used for
+ user-controlled allocations. This may very slightly increase
+ memory fragmentation, though in practice it's only a handful
+ of extra pages since the bulk of user-controlled allocations
+ are relatively long-lived.
+
+ If unsure, say Y.
+
config SLUB_STATS
default n
bool "Enable performance statistics"
diff --git a/mm/slab.h b/mm/slab.h
index b16e63191578..d5e8034af9d5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -403,16 +403,18 @@ static inline unsigned int size_index_elem(unsigned int bytes)
* KMALLOC_MAX_CACHE_SIZE and the caller must check that.
*/
static inline struct kmem_cache *
-kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
+kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, unsigned long caller)
{
unsigned int index;
+ if (!b)
+ b = &kmalloc_caches[kmalloc_type(flags, caller)];
if (size <= 192)
index = kmalloc_size_index[size_index_elem(size)];
else
index = fls(size - 1);
- return kmalloc_caches[kmalloc_type(flags, caller)][index];
+ return (*b)[index];
}
gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index e0b1c109bed2..9b0f2ef951f1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -702,7 +702,7 @@ size_t kmalloc_size_roundup(size_t size)
* The flags don't matter since size_index is common to all.
* Neither does the caller for just getting ->object_size.
*/
- return kmalloc_slab(size, GFP_KERNEL, 0)->object_size;
+ return kmalloc_slab(size, NULL, GFP_KERNEL, 0)->object_size;
}
/* Above the smaller buckets, size is a multiple of page size. */
diff --git a/mm/slub.c b/mm/slub.c
index 3d19a0ee411f..80f0a51242d1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4117,7 +4117,7 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
EXPORT_SYMBOL(__kmalloc_large_node_noprof);
static __always_inline
-void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
+void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
unsigned long caller)
{
struct kmem_cache *s;
@@ -4133,32 +4133,32 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
if (unlikely(!size))
return ZERO_SIZE_PTR;
- s = kmalloc_slab(size, flags, caller);
+ s = kmalloc_slab(size, b, flags, caller);
ret = slab_alloc_node(s, NULL, flags, node, caller, size);
ret = kasan_kmalloc(s, ret, size, flags);
trace_kmalloc(caller, ret, size, s->size, flags, node);
return ret;
}
-
-void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node)
+void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
{
- return __do_kmalloc_node(size, flags, node, _RET_IP_);
+ return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, _RET_IP_);
}
EXPORT_SYMBOL(__kmalloc_node_noprof);
void *__kmalloc_noprof(size_t size, gfp_t flags)
{
- return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+ return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE, _RET_IP_);
}
EXPORT_SYMBOL(__kmalloc_noprof);
-void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags,
- int node, unsigned long caller)
+void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags,
+ int node, unsigned long caller)
{
- return __do_kmalloc_node(size, flags, node, caller);
+ return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, caller);
+
}
-EXPORT_SYMBOL(kmalloc_node_track_caller_noprof);
+EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
--
2.34.1
* [PATCH v5 3/6] mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
2024-06-19 19:33 [PATCH v5 0/6] slab: Introduce dedicated bucket allocator Kees Cook
2024-06-19 19:33 ` [PATCH v5 1/6] mm/slab: Introduce kmem_buckets typedef Kees Cook
2024-06-19 19:33 ` [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node() Kees Cook
@ 2024-06-19 19:33 ` Kees Cook
2024-06-19 19:33 ` [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family Kees Cook
` (2 subsequent siblings)
From: Kees Cook @ 2024-06-19 19:33 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
Plumb kmem_buckets arguments through kvmalloc_node_noprof() so it is
possible to provide an API to perform kvmalloc-style allocations with
a particular set of buckets. Introduce kvmalloc_buckets_node() that takes a
kmem_buckets argument.
Signed-off-by: Kees Cook <kees@kernel.org>
---
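Note (a sketch, not from the patch, ignoring the alloc_hooks() wrapper):
existing callers are unchanged because the compatibility macro passes a NULL
bucket set:

        /* kvmalloc_node(size, GFP_KERNEL, node) roughly becomes: */
        __kvmalloc_node_noprof(size, NULL, GFP_KERNEL, node);   /* CONFIG_SLAB_BUCKETS=y */
        __kvmalloc_node_noprof(size, GFP_KERNEL, node);         /* CONFIG_SLAB_BUCKETS=n */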
include/linux/slab.h | 4 +++-
mm/util.c | 9 +++++----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 708bde6039f0..8d0800c7579a 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -798,7 +798,9 @@ static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
#define kzalloc(...) alloc_hooks(kzalloc_noprof(__VA_ARGS__))
#define kzalloc_node(_size, _flags, _node) kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
-extern void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node) __alloc_size(1);
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) __alloc_size(1);
+#define kvmalloc_node_noprof(size, flags, node) \
+ __kvmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node)
#define kvmalloc_node(...) alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))
#define kvmalloc(_size, _flags) kvmalloc_node(_size, _flags, NUMA_NO_NODE)
diff --git a/mm/util.c b/mm/util.c
index c9e519e6811f..28c5356b9f1c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -594,9 +594,10 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
EXPORT_SYMBOL(vm_mmap);
/**
- * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * __kvmalloc_node - attempt to allocate physically contiguous memory, but upon
* failure, fall back to non-contiguous (vmalloc) allocation.
* @size: size of the request.
+ * @b: which set of kmalloc buckets to allocate from.
* @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
* @node: numa node to allocate from
*
@@ -609,7 +610,7 @@ EXPORT_SYMBOL(vm_mmap);
*
* Return: pointer to the allocated memory of %NULL in case of failure
*/
-void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
{
gfp_t kmalloc_flags = flags;
void *ret;
@@ -631,7 +632,7 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
kmalloc_flags &= ~__GFP_NOFAIL;
}
- ret = kmalloc_node_noprof(size, kmalloc_flags, node);
+ ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b), kmalloc_flags, node);
/*
* It doesn't really make sense to fallback to vmalloc for sub page
@@ -660,7 +661,7 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
node, __builtin_return_address(0));
}
-EXPORT_SYMBOL(kvmalloc_node_noprof);
+EXPORT_SYMBOL(__kvmalloc_node_noprof);
/**
* kvfree() - Free memory.
--
2.34.1
* [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-19 19:33 [PATCH v5 0/6] slab: Introduce dedicated bucket allocator Kees Cook
` (2 preceding siblings ...)
2024-06-19 19:33 ` [PATCH v5 3/6] mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument Kees Cook
@ 2024-06-19 19:33 ` Kees Cook
2024-06-20 13:56 ` Vlastimil Babka
2024-06-20 22:48 ` Andi Kleen
2024-06-19 19:33 ` [PATCH v5 5/6] ipc, msg: Use dedicated slab buckets for alloc_msg() Kees Cook
2024-06-19 19:33 ` [PATCH v5 6/6] mm/util: Use dedicated slab buckets for memdup_user() Kees Cook
From: Kees Cook @ 2024-06-19 19:33 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.
This leads to a use-after-free exploitation weakness in the Linux
kernel since many heap memory spraying/grooming attacks depend on using
userspace-controllable dynamically sized allocations to collide with
fixed size allocations that end up in the same cache.
While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
against these kinds of "type confusion" attacks, including for fixed
same-size heap objects, we can create a complementary deterministic
defense for dynamically sized allocations that are directly user
controlled. Addressing these cases is limited in scope, so isolating these
kinds of interfaces will not become an unbounded game of whack-a-mole. For
example, many pass through memdup_user(), making isolation there very
effective.
In order to isolate user-controllable dynamically-sized
allocations from the common system kmalloc allocations, introduce
kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
kmem_buckets_alloc_track_caller() for where caller tracking is
needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
is needed.
This can also be used in the future to extend allocation profiling's use
of code tagging to implement per-caller allocation cache isolation[1]
even for dynamic allocations.
Memory allocation pinning[2] is still needed to plug the Use-After-Free
cross-allocator weakness, but that is an existing and separate issue
which is complementary to this improvement. Development continues for
that feature via the SLAB_VIRTUAL[3] series (which could also provide
guard pages -- another complementary improvement).
Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
Signed-off-by: Kees Cook <kees@kernel.org>
---
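Note (a sketch of the effective behavior with CONFIG_SLAB_BUCKETS=n; "b", "p"
and "len" are illustrative): kmem_buckets_create() returns ZERO_SIZE_PTR
instead of creating caches, and the allocation helpers compile down to plain
global-bucket kmalloc calls because the bucket argument is stripped:

        kmem_buckets *b = kmem_buckets_create("example", 0, 0, 0, INT_MAX, NULL);
        /* b == ZERO_SIZE_PTR; the next line is effectively kmalloc(len, GFP_KERNEL) */
        void *p = kmem_buckets_alloc(b, len, GFP_KERNEL);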
include/linux/slab.h | 13 ++++++++
mm/slab_common.c | 78 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 8d0800c7579a..3698b15b6138 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
void kmem_cache_free(struct kmem_cache *s, void *objp);
+kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
+ slab_flags_t flags,
+ unsigned int useroffset, unsigned int usersize,
+ void (*ctor)(void *));
+
/*
* Bulk allocation and freeing operations. These are accelerated in an
* allocator specific way to avoid taking locks repeatedly or building
@@ -681,6 +686,12 @@ static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t f
}
#define kmalloc(...) alloc_hooks(kmalloc_noprof(__VA_ARGS__))
+#define kmem_buckets_alloc(_b, _size, _flags) \
+ alloc_hooks(__kmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
+
+#define kmem_buckets_alloc_track_caller(_b, _size, _flags) \
+ alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE, _RET_IP_))
+
static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
if (__builtin_constant_p(size) && size) {
@@ -808,6 +819,8 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
#define kvzalloc(_size, _flags) kvmalloc(_size, (_flags)|__GFP_ZERO)
#define kvzalloc_node(_size, _flags, _node) kvmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
+#define kmem_buckets_valloc(_b, _size, _flags) \
+ alloc_hooks(__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
static inline __alloc_size(1, 2) void *
kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9b0f2ef951f1..453bc4ec8b57 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -392,6 +392,80 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
}
EXPORT_SYMBOL(kmem_cache_create);
+static struct kmem_cache *kmem_buckets_cache __ro_after_init;
+
+kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
+ slab_flags_t flags,
+ unsigned int useroffset,
+ unsigned int usersize,
+ void (*ctor)(void *))
+{
+ kmem_buckets *b;
+ int idx;
+
+ /*
+ * When the separate buckets API is not built in, just return
+ * a non-NULL value for the kmem_buckets pointer, which will be
+ * unused when performing allocations.
+ */
+ if (!IS_ENABLED(CONFIG_SLAB_BUCKETS))
+ return ZERO_SIZE_PTR;
+
+ if (WARN_ON(!kmem_buckets_cache))
+ return NULL;
+
+ b = kmem_cache_alloc(kmem_buckets_cache, GFP_KERNEL|__GFP_ZERO);
+ if (WARN_ON(!b))
+ return NULL;
+
+ flags |= SLAB_NO_MERGE;
+
+ for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) {
+ char *short_size, *cache_name;
+ unsigned int cache_useroffset, cache_usersize;
+ unsigned int size;
+
+ if (!kmalloc_caches[KMALLOC_NORMAL][idx])
+ continue;
+
+ size = kmalloc_caches[KMALLOC_NORMAL][idx]->object_size;
+ if (!size)
+ continue;
+
+ short_size = strchr(kmalloc_caches[KMALLOC_NORMAL][idx]->name, '-');
+ if (WARN_ON(!short_size))
+ goto fail;
+
+ cache_name = kasprintf(GFP_KERNEL, "%s-%s", name, short_size + 1);
+ if (WARN_ON(!cache_name))
+ goto fail;
+
+ if (useroffset >= size) {
+ cache_useroffset = 0;
+ cache_usersize = 0;
+ } else {
+ cache_useroffset = useroffset;
+ cache_usersize = min(size - cache_useroffset, usersize);
+ }
+ (*b)[idx] = kmem_cache_create_usercopy(cache_name, size,
+ align, flags, cache_useroffset,
+ cache_usersize, ctor);
+ kfree(cache_name);
+ if (WARN_ON(!(*b)[idx]))
+ goto fail;
+ }
+
+ return b;
+
+fail:
+ for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++)
+ kmem_cache_destroy((*b)[idx]);
+ kfree(b);
+
+ return NULL;
+}
+EXPORT_SYMBOL(kmem_buckets_create);
+
#ifdef SLAB_SUPPORTS_SYSFS
/*
* For a given kmem_cache, kmem_cache_destroy() should only be called
@@ -931,6 +1005,10 @@ void __init create_kmalloc_caches(void)
/* Kmalloc array is now usable */
slab_state = UP;
+
+ kmem_buckets_cache = kmem_cache_create("kmalloc_buckets",
+ sizeof(kmem_buckets),
+ 0, SLAB_NO_MERGE, NULL);
}
/**
--
2.34.1
* [PATCH v5 5/6] ipc, msg: Use dedicated slab buckets for alloc_msg()
2024-06-19 19:33 [PATCH v5 0/6] slab: Introduce dedicated bucket allocator Kees Cook
` (3 preceding siblings ...)
2024-06-19 19:33 ` [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family Kees Cook
@ 2024-06-19 19:33 ` Kees Cook
2024-06-19 19:33 ` [PATCH v5 6/6] mm/util: Use dedicated slab buckets for memdup_user() Kees Cook
From: Kees Cook @ 2024-06-19 19:33 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
The msg subsystem is a common target for exploiting[1][2][3][4][5][6][7]
use-after-free type confusion flaws in the kernel for both read and write
primitives. Avoid having a user-controlled dynamically sized allocation
share the global kmalloc cache by using a separate set of kmalloc buckets
via the kmem_buckets API.
Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
Link: https://zplin.me/papers/ELOISE.pdf [6]
Link: https://syst3mfailure.io/wall-of-perdition/ [7]
Signed-off-by: Kees Cook <kees@kernel.org>
---
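Note on the usercopy window chosen below (a sketch, not from the diff): the
useroffset/usersize pair limits hardened usercopy to the message text that
follows the fixed msg_msg header:

        /*
         *   [ struct msg_msg ][ message text ... up to DATALEN_MSG ]
         *   |<- useroffset -->|<------------ usersize ------------>|
         */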
ipc/msgutil.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index d0a0e877cadd..f392f30a057a 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -42,6 +42,17 @@ struct msg_msgseg {
#define DATALEN_MSG ((size_t)PAGE_SIZE-sizeof(struct msg_msg))
#define DATALEN_SEG ((size_t)PAGE_SIZE-sizeof(struct msg_msgseg))
+static kmem_buckets *msg_buckets __ro_after_init;
+
+static int __init init_msg_buckets(void)
+{
+ msg_buckets = kmem_buckets_create("msg_msg", 0, SLAB_ACCOUNT,
+ sizeof(struct msg_msg),
+ DATALEN_MSG, NULL);
+
+ return 0;
+}
+subsys_initcall(init_msg_buckets);
static struct msg_msg *alloc_msg(size_t len)
{
@@ -50,7 +61,7 @@ static struct msg_msg *alloc_msg(size_t len)
size_t alen;
alen = min(len, DATALEN_MSG);
- msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL_ACCOUNT);
+ msg = kmem_buckets_alloc(msg_buckets, sizeof(*msg) + alen, GFP_KERNEL);
if (msg == NULL)
return NULL;
--
2.34.1
* [PATCH v5 6/6] mm/util: Use dedicated slab buckets for memdup_user()
2024-06-19 19:33 [PATCH v5 0/6] slab: Introduce dedicated bucket allocator Kees Cook
` (4 preceding siblings ...)
2024-06-19 19:33 ` [PATCH v5 5/6] ipc, msg: Use dedicated slab buckets for alloc_msg() Kees Cook
@ 2024-06-19 19:33 ` Kees Cook
From: Kees Cook @ 2024-06-19 19:33 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
Both memdup_user() and vmemdup_user() handle allocations that are
regularly used for exploiting use-after-free type confusion flaws in
the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
respectively).
Since both are designed for contents coming from userspace, they allow
for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
buckets so these allocations do not share caches with the global kmalloc
buckets.
After a fresh boot under Ubuntu 23.10, we can see the caches are already
in active use:
# grep ^memdup /proc/slabinfo
memdup_user-8k 4 4 8192 4 8 : ...
memdup_user-4k 8 8 4096 8 8 : ...
memdup_user-2k 16 16 2048 16 8 : ...
memdup_user-1k 0 0 1024 16 4 : ...
memdup_user-512 0 0 512 16 2 : ...
memdup_user-256 0 0 256 16 1 : ...
memdup_user-128 0 0 128 32 1 : ...
memdup_user-64 256 256 64 64 1 : ...
memdup_user-32 512 512 32 128 1 : ...
memdup_user-16 1024 1024 16 256 1 : ...
memdup_user-8 2048 2048 8 512 1 : ...
memdup_user-192 0 0 192 21 1 : ...
memdup_user-96 168 168 96 42 1 : ...
Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
Link: https://etenal.me/archives/1336 [3]
Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]
Signed-off-by: Kees Cook <kees@kernel.org>
---
mm/util.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/util.c b/mm/util.c
index 28c5356b9f1c..6f0fcc5f4243 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -198,6 +198,16 @@ char *kmemdup_nul(const char *s, size_t len, gfp_t gfp)
}
EXPORT_SYMBOL(kmemdup_nul);
+static kmem_buckets *user_buckets __ro_after_init;
+
+static int __init init_user_buckets(void)
+{
+ user_buckets = kmem_buckets_create("memdup_user", 0, 0, 0, INT_MAX, NULL);
+
+ return 0;
+}
+subsys_initcall(init_user_buckets);
+
/**
* memdup_user - duplicate memory region from user space
*
@@ -211,7 +221,7 @@ void *memdup_user(const void __user *src, size_t len)
{
void *p;
- p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);
+ p = kmem_buckets_alloc_track_caller(user_buckets, len, GFP_USER | __GFP_NOWARN);
if (!p)
return ERR_PTR(-ENOMEM);
@@ -237,7 +247,7 @@ void *vmemdup_user(const void __user *src, size_t len)
{
void *p;
- p = kvmalloc(len, GFP_USER);
+ p = kmem_buckets_valloc(user_buckets, len, GFP_USER);
if (!p)
return ERR_PTR(-ENOMEM);
--
2.34.1
* Re: [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
2024-06-19 19:33 ` [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node() Kees Cook
@ 2024-06-20 13:08 ` Vlastimil Babka
2024-06-20 13:37 ` Vlastimil Babka
2024-06-20 18:41 ` Kees Cook
From: Vlastimil Babka @ 2024-06-20 13:08 UTC (permalink / raw)
To: Kees Cook
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev
On 6/19/24 9:33 PM, Kees Cook wrote:
> Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
> support separated kmalloc buckets (in the following kmem_buckets_create()
> patches and future codetag-based separation). Since this will provide
> a mitigation for a very common case of exploits, enable it by default.
No longer "enable it by default".
>
> To be able to choose which buckets to allocate from, make the buckets
> available to the internal kmalloc interfaces by adding them as the
> first argument, rather than depending on the buckets being chosen from
second argument now
> the fixed set of global buckets. Where the bucket is not available,
> pass NULL, which means "use the default system kmalloc bucket set"
> (the prior existing behavior), as implemented in kmalloc_slab().
>
> To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
> top-level macros and static inlines use the buckets argument (where
> they are stripped out and compiled out respectively). The actual extern
> functions can then been built without the argument, and the internals
> fall back to the global kmalloc buckets unconditionally.
Also describes the previous implementation and not the new one?
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -273,6 +273,22 @@ config SLAB_FREELIST_HARDENED
> sacrifices to harden the kernel slab allocator against common
> freelist exploit methods.
>
> +config SLAB_BUCKETS
> + bool "Support allocation from separate kmalloc buckets"
> + depends on !SLUB_TINY
> + help
> + Kernel heap attacks frequently depend on being able to create
> + specifically-sized allocations with user-controlled contents
> + that will be allocated into the same kmalloc bucket as a
> + target object. To avoid sharing these allocation buckets,
> + provide an explicitly separated set of buckets to be used for
> + user-controlled allocations. This may very slightly increase
> + memory fragmentation, though in practice it's only a handful
> + of extra pages since the bulk of user-controlled allocations
> + are relatively long-lived.
> +
> + If unsure, say Y.
I was wondering why I don't see the buckets in slabinfo and turns out it was
SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?
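i.e. roughly something like this (just a sketch against the current option):

        config SLAB_MERGE_DEFAULT
                bool "Allow slab caches to be merged"
                default y
                depends on !SLAB_BUCKETS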
* Re: [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
2024-06-20 13:08 ` Vlastimil Babka
@ 2024-06-20 13:37 ` Vlastimil Babka
2024-06-20 18:46 ` Kees Cook
2024-06-20 18:41 ` Kees Cook
From: Vlastimil Babka @ 2024-06-20 13:37 UTC (permalink / raw)
To: Kees Cook
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev
On 6/20/24 3:08 PM, Vlastimil Babka wrote:
> On 6/19/24 9:33 PM, Kees Cook wrote:
> I was wondering why I don't see the buckets in slabinfo and turns out it was
> SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
> depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?
Hm I might have been just blind, can see them there now. Anyway it probably
doesn't make much sense to have SLAB_BUCKETS and/or RANDOM_KMALLOC_CACHES
together with SLAB_MERGE_DEFAULT?
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-19 19:33 ` [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family Kees Cook
@ 2024-06-20 13:56 ` Vlastimil Babka
2024-06-20 18:54 ` Kees Cook
2024-06-20 22:48 ` Andi Kleen
From: Vlastimil Babka @ 2024-06-20 13:56 UTC (permalink / raw)
To: Kees Cook
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev
On 6/19/24 9:33 PM, Kees Cook wrote:
> Dedicated caches are available for fixed size allocations via
> kmem_cache_alloc(), but for dynamically sized allocations there is only
> the global kmalloc API's set of buckets available. This means it isn't
> possible to separate specific sets of dynamically sized allocations into
> a separate collection of caches.
>
> This leads to a use-after-free exploitation weakness in the Linux
> kernel since many heap memory spraying/grooming attacks depend on using
> userspace-controllable dynamically sized allocations to collide with
> fixed size allocations that end up in same cache.
>
> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> against these kinds of "type confusion" attacks, including for fixed
> same-size heap objects, we can create a complementary deterministic
> defense for dynamically sized allocations that are directly user
> controlled. Addressing these cases is limited in scope, so isolating these
> kinds of interfaces will not become an unbounded game of whack-a-mole. For
> example, many pass through memdup_user(), making isolation there very
> effective.
>
> In order to isolate user-controllable dynamically-sized
> allocations from the common system kmalloc allocations, introduce
> kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
> kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
> kmem_buckets_alloc_track_caller() for where caller tracking is
> needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
> is needed.
>
> This can also be used in the future to extend allocation profiling's use
> of code tagging to implement per-caller allocation cache isolation[1]
> even for dynamic allocations.
>
> Memory allocation pinning[2] is still needed to plug the Use-After-Free
> cross-allocator weakness, but that is an existing and separate issue
> which is complementary to this improvement. Development continues for
> that feature via the SLAB_VIRTUAL[3] series (which could also provide
> guard pages -- another complementary improvement).
>
> Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> include/linux/slab.h | 13 ++++++++
> mm/slab_common.c | 78 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 91 insertions(+)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 8d0800c7579a..3698b15b6138 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>
> void kmem_cache_free(struct kmem_cache *s, void *objp);
>
> +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> + slab_flags_t flags,
> + unsigned int useroffset, unsigned int usersize,
> + void (*ctor)(void *));
I'd drop the ctor, I can't imagine how it would be used with variable-sized
allocations. Probably also "align" doesn't make much sense since we're just
copying the kmalloc cache sizes and its implicit alignment of any
power-of-two allocations. I don't think any current kmalloc user would
suddenly need either of those as you convert it to buckets, and definitely
not any user converted automatically by the code tagging.
* Re: [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
2024-06-20 13:08 ` Vlastimil Babka
2024-06-20 13:37 ` Vlastimil Babka
@ 2024-06-20 18:41 ` Kees Cook
From: Kees Cook @ 2024-06-20 18:41 UTC (permalink / raw)
To: Vlastimil Babka
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev
On Thu, Jun 20, 2024 at 03:08:32PM +0200, Vlastimil Babka wrote:
> On 6/19/24 9:33 PM, Kees Cook wrote:
> > Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
> > support separated kmalloc buckets (in the following kmem_buckets_create()
> > patches and future codetag-based separation). Since this will provide
> > a mitigation for a very common case of exploits, enable it by default.
>
> No longer "enable it by default".
Whoops! Yes, thank you.
>
> >
> > To be able to choose which buckets to allocate from, make the buckets
> > available to the internal kmalloc interfaces by adding them as the
> > first argument, rather than depending on the buckets being chosen from
>
> second argument now
Fixed.
>
> > the fixed set of global buckets. Where the bucket is not available,
> > pass NULL, which means "use the default system kmalloc bucket set"
> > (the prior existing behavior), as implemented in kmalloc_slab().
> >
> > To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
> > top-level macros and static inlines use the buckets argument (where
> > they are stripped out and compiled out respectively). The actual extern
> > functions can then been built without the argument, and the internals
> > fall back to the global kmalloc buckets unconditionally.
>
> Also describes the previous implementation and not the new one?
I think this still describes the implementation: the macros are doing
this work now. I wanted to explain in the commit log why the "static
inline"s still have explicit arguments (they will vanish during
inlining), as they are needed to detect the need for using the
global buckets.
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -273,6 +273,22 @@ config SLAB_FREELIST_HARDENED
> > sacrifices to harden the kernel slab allocator against common
> > freelist exploit methods.
> >
> > +config SLAB_BUCKETS
> > + bool "Support allocation from separate kmalloc buckets"
> > + depends on !SLUB_TINY
> > + help
> > + Kernel heap attacks frequently depend on being able to create
> > + specifically-sized allocations with user-controlled contents
> > + that will be allocated into the same kmalloc bucket as a
> > + target object. To avoid sharing these allocation buckets,
> > + provide an explicitly separated set of buckets to be used for
> > + user-controlled allocations. This may very slightly increase
> > + memory fragmentation, though in practice it's only a handful
> > + of extra pages since the bulk of user-controlled allocations
> > + are relatively long-lived.
> > +
> > + If unsure, say Y.
>
> I was wondering why I don't see the buckets in slabinfo and turns out it was
> SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
> depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?
You mention this was a misunderstanding in the next email, but just to
reply here: I explicitly use SLAB_NO_MERGE, so if it ever DOES become
invisible, then yes, that would be unexpected!
Thanks for the review!
--
Kees Cook
* Re: [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
2024-06-20 13:37 ` Vlastimil Babka
@ 2024-06-20 18:46 ` Kees Cook
2024-06-20 20:44 ` Vlastimil Babka
From: Kees Cook @ 2024-06-20 18:46 UTC (permalink / raw)
To: Vlastimil Babka
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev
On Thu, Jun 20, 2024 at 03:37:31PM +0200, Vlastimil Babka wrote:
> On 6/20/24 3:08 PM, Vlastimil Babka wrote:
> > On 6/19/24 9:33 PM, Kees Cook wrote:
> > I was wondering why I don't see the buckets in slabinfo and turns out it was
> > SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
> > depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?
>
> Hm I might have been just blind, can see them there now. Anyway it probably
> doesn't make much sense to have SLAB_BUCKETS and/or RANDOM_KMALLOC_CACHES
> together with SLAB_MERGE_DEFAULT?
It's already handled so that the _other_ caches can still be merged if
people want it. See new_kmalloc_cache():
#ifdef CONFIG_RANDOM_KMALLOC_CACHES
        if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
                flags |= SLAB_NO_MERGE;
#endif
--
Kees Cook
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-20 13:56 ` Vlastimil Babka
@ 2024-06-20 18:54 ` Kees Cook
2024-06-20 20:43 ` Vlastimil Babka
From: Kees Cook @ 2024-06-20 18:54 UTC (permalink / raw)
To: Vlastimil Babka
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev
On Thu, Jun 20, 2024 at 03:56:27PM +0200, Vlastimil Babka wrote:
> On 6/19/24 9:33 PM, Kees Cook wrote:
> > Dedicated caches are available for fixed size allocations via
> > kmem_cache_alloc(), but for dynamically sized allocations there is only
> > the global kmalloc API's set of buckets available. This means it isn't
> > possible to separate specific sets of dynamically sized allocations into
> > a separate collection of caches.
> >
> > This leads to a use-after-free exploitation weakness in the Linux
> > kernel since many heap memory spraying/grooming attacks depend on using
> > userspace-controllable dynamically sized allocations to collide with
> > fixed size allocations that end up in same cache.
> >
> > While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> > against these kinds of "type confusion" attacks, including for fixed
> > same-size heap objects, we can create a complementary deterministic
> > defense for dynamically sized allocations that are directly user
> > controlled. Addressing these cases is limited in scope, so isolating these
> > kinds of interfaces will not become an unbounded game of whack-a-mole. For
> > example, many pass through memdup_user(), making isolation there very
> > effective.
> >
> > In order to isolate user-controllable dynamically-sized
> > allocations from the common system kmalloc allocations, introduce
> > kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
> > kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
> > kmem_buckets_alloc_track_caller() for where caller tracking is
> > needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
> > is needed.
> >
> > This can also be used in the future to extend allocation profiling's use
> > of code tagging to implement per-caller allocation cache isolation[1]
> > even for dynamic allocations.
> >
> > Memory allocation pinning[2] is still needed to plug the Use-After-Free
> > cross-allocator weakness, but that is an existing and separate issue
> > which is complementary to this improvement. Development continues for
> > that feature via the SLAB_VIRTUAL[3] series (which could also provide
> > guard pages -- another complementary improvement).
> >
> > Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> > Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> > Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
> > Signed-off-by: Kees Cook <kees@kernel.org>
> > ---
> > include/linux/slab.h | 13 ++++++++
> > mm/slab_common.c | 78 ++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 91 insertions(+)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 8d0800c7579a..3698b15b6138 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> >
> > void kmem_cache_free(struct kmem_cache *s, void *objp);
> >
> > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> > + slab_flags_t flags,
> > + unsigned int useroffset, unsigned int usersize,
> > + void (*ctor)(void *));
>
> I'd drop the ctor, I can't imagine how it would be used with variable-sized
> allocations.
I've kept it because for "kmalloc wrapper" APIs, e.g. devm_kmalloc(),
there is some "housekeeping" that gets done explicitly right now that I
think would be better served by using a ctor in the future. These APIs
are variable-sized, but have a fixed size header, so they have a
"minimum size" that the ctor can still operate on, etc.
> Probably also "align" doesn't make much sense since we're just
> copying the kmalloc cache sizes and its implicit alignment of any
> power-of-two allocations.
Yeah, that's probably true. I kept it since I wanted to mirror
kmem_cache_create() to make this API more familiar looking.
> I don't think any current kmalloc user would
> suddenly need either of those as you convert it to buckets, and definitely
> not any user converted automatically by the code tagging.
Right, it's not needed for either the explicit users nor the future
automatic users. But since these arguments are available internally,
there seems to be future utility, it's not fast path, and it made things
feel like the existing API, I'd kind of like to keep it.
But all that said, if you really don't want it, then sure I can drop
those arguments. Adding them back in the future shouldn't be too
much churn.
--
Kees Cook
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-20 18:54 ` Kees Cook
@ 2024-06-20 20:43 ` Vlastimil Babka
2024-06-28 5:35 ` Boqun Feng
From: Vlastimil Babka @ 2024-06-20 20:43 UTC (permalink / raw)
To: Kees Cook
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev, rust-for-linux
On 6/20/24 8:54 PM, Kees Cook wrote:
> On Thu, Jun 20, 2024 at 03:56:27PM +0200, Vlastimil Babka wrote:
>> > @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>> >
>> > void kmem_cache_free(struct kmem_cache *s, void *objp);
>> >
>> > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
>> > + slab_flags_t flags,
>> > + unsigned int useroffset, unsigned int usersize,
>> > + void (*ctor)(void *));
>>
>> I'd drop the ctor, I can't imagine how it would be used with variable-sized
>> allocations.
>
> I've kept it because for "kmalloc wrapper" APIs, e.g. devm_kmalloc(),
> there is some "housekeeping" that gets done explicitly right now that I
> think would be better served by using a ctor in the future. These APIs
> are variable-sized, but have a fixed size header, so they have a
> "minimum size" that the ctor can still operate on, etc.
>
>> Probably also "align" doesn't make much sense since we're just
>> copying the kmalloc cache sizes and its implicit alignment of any
>> power-of-two allocations.
>
> Yeah, that's probably true. I kept it since I wanted to mirror
> kmem_cache_create() to make this API more familiar looking.
Rust people were asking about kmalloc alignment (but I forgot the details)
so maybe this could be useful for them? CC rust-for-linux.
>> I don't think any current kmalloc user would
>> suddenly need either of those as you convert it to buckets, and definitely
>> not any user converted automatically by the code tagging.
>
> Right, it's not needed for either the explicit users nor the future
> automatic users. But since these arguments are available internally,
> there seems to be future utility, it's not fast path, and it made things
> feel like the existing API, I'd kind of like to keep it.
>
> But all that said, if you really don't want it, then sure I can drop
> those arguments. Adding them back in the future shouldn't be too
> much churn.
I guess we can keep it then.
* Re: [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
2024-06-20 18:46 ` Kees Cook
@ 2024-06-20 20:44 ` Vlastimil Babka
From: Vlastimil Babka @ 2024-06-20 20:44 UTC (permalink / raw)
To: Kees Cook
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev
On 6/20/24 8:46 PM, Kees Cook wrote:
> On Thu, Jun 20, 2024 at 03:37:31PM +0200, Vlastimil Babka wrote:
>> On 6/20/24 3:08 PM, Vlastimil Babka wrote:
>> > On 6/19/24 9:33 PM, Kees Cook wrote:
>> > I was wondering why I don't see the buckets in slabinfo, and it turns out it was
>> > SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
>> > depend on !SLAB_BUCKETS now, as the merging defeats the purpose, wdyt?
>>
>> Hm, I might have just been blind; I can see them there now. Anyway, it probably
>> doesn't make much sense to have SLAB_BUCKETS and/or RANDOM_KMALLOC_CACHES
>> together with SLAB_MERGE_DEFAULT?
>
> It's already handled so that the _other_ caches can still be merged if
> people want it. See new_kmalloc_cache():
>
> #ifdef CONFIG_RANDOM_KMALLOC_CACHES
> if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
> flags |= SLAB_NO_MERGE;
> #endif
OK
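For the caches inside a dedicated kmem_buckets set, the same effect comes
from forcing the flag at creation time. A rough sketch of the idea (an
illustration only, not the exact hunk from the series):

	#include <linux/slab.h>

	/* Create an isolated, never-merged variant of a fixed-size cache;
	 * the series does the equivalent for every size in a bucket set. */
	static struct kmem_cache *create_isolated_cache(const char *name,
							unsigned int size,
							slab_flags_t flags)
	{
		/* SLAB_NO_MERGE prevents folding this cache back into a
		 * compatible global cache, which would defeat the isolation. */
		return kmem_cache_create(name, size, 0,
					 flags | SLAB_NO_MERGE, NULL);
	}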
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-19 19:33 ` [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family Kees Cook
2024-06-20 13:56 ` Vlastimil Babka
@ 2024-06-20 22:48 ` Andi Kleen
2024-06-20 23:29 ` Kees Cook
1 sibling, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2024-06-20 22:48 UTC (permalink / raw)
To: Kees Cook
Cc: Vlastimil Babka, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
Kees Cook <kees@kernel.org> writes:
> Dedicated caches are available for fixed size allocations via
> kmem_cache_alloc(), but for dynamically sized allocations there is only
> the global kmalloc API's set of buckets available. This means it isn't
> possible to separate specific sets of dynamically sized allocations into
> a separate collection of caches.
>
> This leads to a use-after-free exploitation weakness in the Linux
> kernel since many heap memory spraying/grooming attacks depend on using
> userspace-controllable dynamically sized allocations to collide with
> fixed size allocations that end up in the same cache.
>
> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> against these kinds of "type confusion" attacks, including for fixed
> same-size heap objects, we can create a complementary deterministic
> defense for dynamically sized allocations that are directly user
> controlled. Addressing these cases is limited in scope, so isolating these
> kinds of interfaces will not become an unbounded game of whack-a-mole. For
> example, many pass through memdup_user(), making isolation there very
> effective.
Isn't the attack still possible if the attacker can free the slab page
during the use-after-free period with enough memory pressure?
Someone else might grab the page that was in the bucket for another slab
and the type confusion could hurt again.
Or is there some other defense against that, other than
CONFIG_DEBUG_PAGEALLOC or full slab poisoning? And how expensive
does it get when any of those are enabled?
I remember reading some paper about an Apple allocator trying similar
techniques; it tried very hard to never reuse memory (probably
not a good idea for Linux, though).
I assume you thought about this, but it would be good to discuss such
limitations and interactions in the commit log.
-Andi
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-20 22:48 ` Andi Kleen
@ 2024-06-20 23:29 ` Kees Cook
0 siblings, 0 replies; 24+ messages in thread
From: Kees Cook @ 2024-06-20 23:29 UTC (permalink / raw)
To: Andi Kleen
Cc: Vlastimil Babka, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev
On Thu, Jun 20, 2024 at 03:48:24PM -0700, Andi Kleen wrote:
> Kees Cook <kees@kernel.org> writes:
>
> > Dedicated caches are available for fixed size allocations via
> > kmem_cache_alloc(), but for dynamically sized allocations there is only
> > the global kmalloc API's set of buckets available. This means it isn't
> > possible to separate specific sets of dynamically sized allocations into
> > a separate collection of caches.
> >
> > This leads to a use-after-free exploitation weakness in the Linux
> > kernel since many heap memory spraying/grooming attacks depend on using
> > userspace-controllable dynamically sized allocations to collide with
> > fixed size allocations that end up in the same cache.
> >
> > While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> > against these kinds of "type confusion" attacks, including for fixed
> > same-size heap objects, we can create a complementary deterministic
> > defense for dynamically sized allocations that are directly user
> > controlled. Addressing these cases is limited in scope, so isolating these
> > kinds of interfaces will not become an unbounded game of whack-a-mole. For
> > example, many pass through memdup_user(), making isolation there very
> > effective.
>
> Isn't the attack still possible if the attacker can free the slab page
> during the use-after-free period with enough memory pressure?
>
> Someone else might grab the page that was in the bucket for another slab
> and the type confusion could hurt again.
>
> Or is there some other defense against that, other than
> CONFIG_DEBUG_PAGEALLOC or full slab poisoning? And how expensive
> does it get when any of those are enabled?
>
> I remember reading some paper about an Apple allocator trying similar
> techniques; it tried very hard to never reuse memory (probably
> not a good idea for Linux, though).
>
> I assume you thought about this, but it would be good to discuss such
> limitations and interactions in the commit log.
Yup! It's in there; it's just after what you quoted above. Here it is:
> > Memory allocation pinning[2] is still needed to plug the Use-After-Free
> > cross-allocator weakness, but that is an existing and separate issue
> > which is complementary to this improvement. Development continues for
> > that feature via the SLAB_VIRTUAL[3] series (which could also provide
> > guard pages -- another complementary improvement).
> > [...]
> > Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> > Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
Let me know if you think this description needs to be improved...
-Kees
--
Kees Cook
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-20 20:43 ` Vlastimil Babka
@ 2024-06-28 5:35 ` Boqun Feng
2024-06-28 8:40 ` Vlastimil Babka
2024-06-28 15:47 ` Kees Cook
0 siblings, 2 replies; 24+ messages in thread
From: Boqun Feng @ 2024-06-28 5:35 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev, rust-for-linux
On Thu, Jun 20, 2024 at 10:43:39PM +0200, Vlastimil Babka wrote:
> On 6/20/24 8:54 PM, Kees Cook wrote:
> > On Thu, Jun 20, 2024 at 03:56:27PM +0200, Vlastimil Babka wrote:
> >> > @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> >> >
> >> > void kmem_cache_free(struct kmem_cache *s, void *objp);
> >> >
> >> > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> >> > + slab_flags_t flags,
> >> > + unsigned int useroffset, unsigned int usersize,
> >> > + void (*ctor)(void *));
> >>
> >> I'd drop the ctor, I can't imagine how it would be used with variable-sized
> >> allocations.
> >
> > I've kept it because for "kmalloc wrapper" APIs, e.g. devm_kmalloc(),
> > there is some "housekeeping" that gets done explicitly right now that I
> > think would be better served by using a ctor in the future. These APIs
> > are variable-sized, but have a fixed size header, so they have a
> > "minimum size" that the ctor can still operate on, etc.
> >
> >> Probably also "align" doesn't make much sense since we're just
> >> copying the kmalloc cache sizes and its implicit alignment of any
> >> power-of-two allocations.
> >
> > Yeah, that's probably true. I kept it since I wanted to mirror
> > kmem_cache_create() to make this API more familiar looking.
>
> Rust people were asking about kmalloc alignment (but I forgot the details)
It was me! The ask is whether we can specify the alignment in the
allocation API, for example requesting size=96 with align=32 memory,
or whether the allocation API could do a "best alignment", for example
where allocating size=96 would give 32-byte-aligned memory. As far as I
understand, kmalloc() doesn't support this.
> so maybe this could be useful for them? CC rust-for-linux.
>
I took a quick look at what kmem_buckets is, and it seems to me that align
doesn't make sense here (and is probably not useful in Rust either),
because a kmem_buckets is a set of kmem_caches, each with its own object
size; making them all share the same alignment is probably not what you
want. But I could be missing something.
Regards,
Boqun
> >> I don't think any current kmalloc user would
> >> suddenly need either of those as you convert it to buckets, and definitely
> >> not any user converted automatically by the code tagging.
> >
> > Right, it's not needed for either the explicit users nor the future
> > automatic users. But since these arguments are available internally,
> > there seems to be future utility, it's not fast path, and it made things
> > feel like the existing API, I'd kind of like to keep it.
> >
> > But all that said, if you really don't want it, then sure I can drop
> > those arguments. Adding them back in the future shouldn't be too
> > much churn.
>
> I guess we can keep it then.
>
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-28 5:35 ` Boqun Feng
@ 2024-06-28 8:40 ` Vlastimil Babka
2024-06-28 9:06 ` Alice Ryhl
2024-06-28 15:47 ` Kees Cook
1 sibling, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-06-28 8:40 UTC (permalink / raw)
To: Boqun Feng
Cc: Kees Cook, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev, rust-for-linux
On 6/28/24 7:35 AM, Boqun Feng wrote:
> On Thu, Jun 20, 2024 at 10:43:39PM +0200, Vlastimil Babka wrote:
>> On 6/20/24 8:54 PM, Kees Cook wrote:
>> > On Thu, Jun 20, 2024 at 03:56:27PM +0200, Vlastimil Babka wrote:
>> >> > @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>> >> >
>> >> > void kmem_cache_free(struct kmem_cache *s, void *objp);
>> >> >
>> >> > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
>> >> > + slab_flags_t flags,
>> >> > + unsigned int useroffset, unsigned int usersize,
>> >> > + void (*ctor)(void *));
>> >>
>> >> I'd drop the ctor, I can't imagine how it would be used with variable-sized
>> >> allocations.
>> >
>> > I've kept it because for "kmalloc wrapper" APIs, e.g. devm_kmalloc(),
>> > there is some "housekeeping" that gets done explicitly right now that I
>> > think would be better served by using a ctor in the future. These APIs
>> > are variable-sized, but have a fixed size header, so they have a
>> > "minimum size" that the ctor can still operate on, etc.
>> >
>> >> Probably also "align" doesn't make much sense since we're just
>> >> copying the kmalloc cache sizes and its implicit alignment of any
>> >> power-of-two allocations.
>> >
>> > Yeah, that's probably true. I kept it since I wanted to mirror
>> > kmem_cache_create() to make this API more familiar looking.
>>
>> Rust people were asking about kmalloc alignment (but I forgot the details)
>
> It was me! The ask is whether we can specify the alignment for the
> allocation API, for example, requesting a size=96 and align=32 memory,
> or the allocation API could do a "best alignment", for example,
> allocating a size=96 will give a align=32 memory. As far as I
> understand, kmalloc() doesn't support this.
Hm, yeah, we only have guarantees for power-of-2 allocations.
>> so maybe this could be useful for them? CC rust-for-linux.
>>
>
> I took a quick look as what kmem_buckets is, and seems to me that align
> doesn't make sense here (and probably not useful in Rust as well)
> because a kmem_buckets is a set of kmem_caches, each has its own object
> size, making them share the same alignment is probably not what you
> want. But I could be missing something.
How flexible do you need those alignments to be? Besides the power-of-two
guarantees, we currently have only two odd sizes, 96 and 192. If those
were guaranteed to be aligned to 32 bytes, would that be sufficient? Also,
do you ever allocate anything smaller than 32 bytes?
To summarize, if Rust's requirements can be captured by some rules and
it's not a completely ad-hoc per-allocation alignment requirement (or if it
is, does it have an upper bound?), we could perhaps figure out the creation
of rust-specific kmem_buckets to give it what's needed?
> Regards,
> Boqun
>
>> >> I don't think any current kmalloc user would
>> >> suddenly need either of those as you convert it to buckets, and definitely
>> >> not any user converted automatically by the code tagging.
>> >
>> > Right, it's not needed for either the explicit users nor the future
>> > automatic users. But since these arguments are available internally,
>> > there seems to be future utility, it's not fast path, and it made things
>> > feel like the existing API, I'd kind of like to keep it.
>> >
>> > But all that said, if you really don't want it, then sure I can drop
>> > those arguments. Adding them back in the future shouldn't be too
>> > much churn.
>>
>> I guess we can keep it then.
>>
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-28 8:40 ` Vlastimil Babka
@ 2024-06-28 9:06 ` Alice Ryhl
2024-06-28 9:17 ` Vlastimil Babka
0 siblings, 1 reply; 24+ messages in thread
From: Alice Ryhl @ 2024-06-28 9:06 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Boqun Feng, Kees Cook, GONG, Ruiqi, Christoph Lameter,
Pekka Enberg, David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev, rust-for-linux
On Fri, Jun 28, 2024 at 10:40 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 6/28/24 7:35 AM, Boqun Feng wrote:
> > On Thu, Jun 20, 2024 at 10:43:39PM +0200, Vlastimil Babka wrote:
> >> On 6/20/24 8:54 PM, Kees Cook wrote:
> >> > On Thu, Jun 20, 2024 at 03:56:27PM +0200, Vlastimil Babka wrote:
> >> >> > @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> >> >> >
> >> >> > void kmem_cache_free(struct kmem_cache *s, void *objp);
> >> >> >
> >> >> > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> >> >> > + slab_flags_t flags,
> >> >> > + unsigned int useroffset, unsigned int usersize,
> >> >> > + void (*ctor)(void *));
> >> >>
> >> >> I'd drop the ctor, I can't imagine how it would be used with variable-sized
> >> >> allocations.
> >> >
> >> > I've kept it because for "kmalloc wrapper" APIs, e.g. devm_kmalloc(),
> >> > there is some "housekeeping" that gets done explicitly right now that I
> >> > think would be better served by using a ctor in the future. These APIs
> >> > are variable-sized, but have a fixed size header, so they have a
> >> > "minimum size" that the ctor can still operate on, etc.
> >> >
> >> >> Probably also "align" doesn't make much sense since we're just
> >> >> copying the kmalloc cache sizes and its implicit alignment of any
> >> >> power-of-two allocations.
> >> >
> >> > Yeah, that's probably true. I kept it since I wanted to mirror
> >> > kmem_cache_create() to make this API more familiar looking.
> >>
> >> Rust people were asking about kmalloc alignment (but I forgot the details)
> >
> > It was me! The ask is whether we can specify the alignment for the
> > allocation API, for example, requesting a size=96 and align=32 memory,
> > or the allocation API could do a "best alignment", for example,
> > allocating a size=96 will give a align=32 memory. As far as I
> > understand, kmalloc() doesn't support this.
>
> Hm, yeah, we only have guarantees for power-of-2 allocations.
>
> >> so maybe this could be useful for them? CC rust-for-linux.
> >>
> >
> > I took a quick look as what kmem_buckets is, and seems to me that align
> > doesn't make sense here (and probably not useful in Rust as well)
> > because a kmem_buckets is a set of kmem_caches, each has its own object
> > size, making them share the same alignment is probably not what you
> > want. But I could be missing something.
>
> How flexible do you need those alignments to be? Besides the power-of-two
> guarantees, we currently have only two odd sizes with 96 and 192. If those
> were guaranteed to be aligned 32 bytes, would that be sufficient? Also do
> you ever allocate anything smaller than 32 bytes then?
>
> To summarize, if Rust's requirements can be summarized by some rules and
> it's not completely ad-hoc per-allocation alignment requirement (or if it
> is, does it have an upper bound?) we could perhaps figure out the creation
> of rust-specific kmem_buckets to give it what's needed?
Rust's allocator API can take any size and alignment as long as:
1. The alignment is a power of two.
2. The size is non-zero.
3. When you round up the size to the next multiple of the alignment,
then it must not overflow the signed type isize / ssize_t.
What happens right now is that when Rust wants an allocation with a
higher alignment than ARCH_SLAB_MINALIGN, it will increase the size
until it becomes a power of two, so that the power-of-two guarantee
gives a properly aligned allocation.
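Spelled out as code, those rules amount to roughly the following (a plain C
rendering just for illustration; the real checks live on the Rust side):

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>

	/* Validity of a (size, align) request per the three rules above. */
	static bool layout_is_valid(size_t size, size_t align)
	{
		/* 1. The alignment is a power of two. */
		if (align == 0 || (align & (align - 1)) != 0)
			return false;
		/* 2. The size is non-zero. */
		if (size == 0)
			return false;
		/* 3. Rounding the size up to the next multiple of the
		 *    alignment must not overflow isize/ssize_t. */
		size_t rounded = (size + align - 1) & ~(align - 1);
		if (rounded < size || rounded > (size_t)PTRDIFF_MAX)
			return false;
		return true;
	}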
Alice
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-28 9:06 ` Alice Ryhl
@ 2024-06-28 9:17 ` Vlastimil Babka
2024-06-28 9:34 ` Alice Ryhl
0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-06-28 9:17 UTC (permalink / raw)
To: Alice Ryhl
Cc: Boqun Feng, Kees Cook, GONG, Ruiqi, Christoph Lameter,
Pekka Enberg, David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev, rust-for-linux
On 6/28/24 11:06 AM, Alice Ryhl wrote:
>> >>
>> >
>> > I took a quick look as what kmem_buckets is, and seems to me that align
>> > doesn't make sense here (and probably not useful in Rust as well)
>> > because a kmem_buckets is a set of kmem_caches, each has its own object
>> > size, making them share the same alignment is probably not what you
>> > want. But I could be missing something.
>>
>> How flexible do you need those alignments to be? Besides the power-of-two
>> guarantees, we currently have only two odd sizes with 96 and 192. If those
>> were guaranteed to be aligned 32 bytes, would that be sufficient? Also do
>> you ever allocate anything smaller than 32 bytes then?
>>
>> To summarize, if Rust's requirements can be summarized by some rules and
>> it's not completely ad-hoc per-allocation alignment requirement (or if it
>> is, does it have an upper bound?) we could perhaps figure out the creation
>> of rust-specific kmem_buckets to give it what's needed?
>
> Rust's allocator API can take any size and alignment as long as:
>
> 1. The alignment is a power of two.
> 2. The size is non-zero.
> 3. When you round up the size to the next multiple of the alignment,
> then it must not overflow the signed type isize / ssize_t.
>
> What happens right now is that when Rust wants an allocation with a
> higher alignment than ARCH_SLAB_MINALIGN, then it will increase size
> until it becomes a power of two so that the power-of-two guarantee
> gives a properly aligned allocation.
So am I correct thinking that, if the cache of size 96 bytes guaranteed a
32-byte alignment, 192 bytes guaranteed a 64-byte alignment, and the rest of the
sizes kept their already-guaranteed power-of-two alignment, then on the Rust side
you would only have to round sizes up to the next multiple of the alignment
(rule 3 above) and that would be sufficient?
Abstracting from the specific sizes of 96 and 192, the guarantee on the kmalloc
side would have to be: guarantee alignment to the largest power-of-two
divisor of the size. Does that sound right?
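In code, the largest power-of-two divisor of a size is just its lowest set
bit, so the guaranteed alignment could be computed as (illustrative helper
only, nothing by this name exists today):

	/* e.g. 96 -> 32, 192 -> 64, 128 -> 128, 24 -> 8 */
	static inline unsigned int kmalloc_natural_align(unsigned int size)
	{
		return size & -size;
	}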
Then I think we could have some flag for kmem_buckets creation that would do
the right thing.
> Alice
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-28 9:17 ` Vlastimil Babka
@ 2024-06-28 9:34 ` Alice Ryhl
0 siblings, 0 replies; 24+ messages in thread
From: Alice Ryhl @ 2024-06-28 9:34 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Boqun Feng, Kees Cook, GONG, Ruiqi, Christoph Lameter,
Pekka Enberg, David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev, rust-for-linux
On Fri, Jun 28, 2024 at 11:17 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 6/28/24 11:06 AM, Alice Ryhl wrote:
> >> >>
> >> >
> >> > I took a quick look as what kmem_buckets is, and seems to me that align
> >> > doesn't make sense here (and probably not useful in Rust as well)
> >> > because a kmem_buckets is a set of kmem_caches, each has its own object
> >> > size, making them share the same alignment is probably not what you
> >> > want. But I could be missing something.
> >>
> >> How flexible do you need those alignments to be? Besides the power-of-two
> >> guarantees, we currently have only two odd sizes with 96 and 192. If those
> >> were guaranteed to be aligned 32 bytes, would that be sufficient? Also do
> >> you ever allocate anything smaller than 32 bytes then?
> >>
> >> To summarize, if Rust's requirements can be summarized by some rules and
> >> it's not completely ad-hoc per-allocation alignment requirement (or if it
> >> is, does it have an upper bound?) we could perhaps figure out the creation
> >> of rust-specific kmem_buckets to give it what's needed?
> >
> > Rust's allocator API can take any size and alignment as long as:
> >
> > 1. The alignment is a power of two.
> > 2. The size is non-zero.
> > 3. When you round up the size to the next multiple of the alignment,
> > then it must not overflow the signed type isize / ssize_t.
> >
> > What happens right now is that when Rust wants an allocation with a
> > higher alignment than ARCH_SLAB_MINALIGN, then it will increase size
> > until it becomes a power of two so that the power-of-two guarantee
> > gives a properly aligned allocation.
>
> So am I correct thinking that, if the cache of size 96 bytes guaranteed a
> 32byte alignment, and 192 bytes guaranteed 64byte alignment, and the rest of
> sizes with the already guaranteed power-of-two alignment, then on rust side
> you would only have to round up sizes to the next multiples of the alignment
> (rule 3 above) and that would be sufficient?
> Abstracting from the specific sizes of 96 and 192, the guarantee on kmalloc
> side would have to be - guarantee alignment to the largest power-of-two
> divisor of the size. Does that sound right?
>
> Then I think we could have some flag for kmem_buckets creation that would do
> the right thing.
If kmalloc/krealloc guarantee that an allocation is aligned according
to the largest power-of-two divisor of the size, then the Rust
allocator would definitely be simplified, as we would no longer need
this part:
    if layout.align() > bindings::ARCH_SLAB_MINALIGN {
        // The alignment requirement exceeds the slab guarantee, thus try to
        // enlarge the size to use the "power-of-two" size/alignment guarantee
        // (see comments in `kmalloc()` for more information).
        //
        // Note that `layout.size()` (after padding) is guaranteed to be a
        // multiple of `layout.align()`, so `next_power_of_two` gives enough
        // alignment guarantee.
        size = size.next_power_of_two();
    }
We would only need to keep the part that rounds up the size to a
multiple of the alignment.
Alice
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-28 5:35 ` Boqun Feng
2024-06-28 8:40 ` Vlastimil Babka
@ 2024-06-28 15:47 ` Kees Cook
2024-06-28 16:53 ` Vlastimil Babka
1 sibling, 1 reply; 24+ messages in thread
From: Kees Cook @ 2024-06-28 15:47 UTC (permalink / raw)
To: Boqun Feng
Cc: Vlastimil Babka, GONG, Ruiqi, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, jvoisin, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan,
Kent Overstreet, Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu,
linux-kernel, linux-mm, linux-hardening, netdev, rust-for-linux
On Thu, Jun 27, 2024 at 10:35:36PM -0700, Boqun Feng wrote:
> On Thu, Jun 20, 2024 at 10:43:39PM +0200, Vlastimil Babka wrote:
> > Rust people were asking about kmalloc alignment (but I forgot the details)
>
> It was me! The ask is whether we can specify the alignment for the
> allocation API, for example, requesting a size=96 and align=32 memory,
> or the allocation API could do a "best alignment", for example,
> allocating a size=96 will give a align=32 memory. As far as I
> understand, kmalloc() doesn't support this.
I can drop the "align" argument. Do we want to hard-code a
per-cache-size alignment for the caches in a kmem_buckets collection?
--
Kees Cook
* Re: [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family
2024-06-28 15:47 ` Kees Cook
@ 2024-06-28 16:53 ` Vlastimil Babka
0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2024-06-28 16:53 UTC (permalink / raw)
To: Kees Cook, Boqun Feng
Cc: GONG, Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, jvoisin, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, Xiu Jianfeng, Suren Baghdasaryan, Kent Overstreet,
Jann Horn, Matteo Rizzo, Thomas Graf, Herbert Xu, linux-kernel,
linux-mm, linux-hardening, netdev, rust-for-linux
On 6/28/24 5:47 PM, Kees Cook wrote:
> On Thu, Jun 27, 2024 at 10:35:36PM -0700, Boqun Feng wrote:
>> On Thu, Jun 20, 2024 at 10:43:39PM +0200, Vlastimil Babka wrote:
>> > Rust people were asking about kmalloc alignment (but I forgot the details)
>>
>> It was me! The ask is whether we can specify the alignment for the
>> allocation API, for example, requesting a size=96 and align=32 memory,
>> or the allocation API could do a "best alignment", for example,
>> allocating a size=96 will give a align=32 memory. As far as I
>> understand, kmalloc() doesn't support this.
>
> I can drop the "align" argument. Do we want to hard-code a
> per-cache-size alignment for the caches in a kmem_buckets collection?
I think you can drop it, as a single value is really ill-suited for a
collection of different sizes.
As for Rust's requirements, we could consider whether to add a special flag
if they use their own buckets, or just implement the rules for non-power-of-two
sizes globally. It should be feasible, as I think for the non-debug caches it
shouldn't in fact change the existing layout, which is the same situation as
when we codified the power-of-two caches' alignment as guaranteed.
End of thread (newest: 2024-06-28 16:53 UTC)
Thread overview: 24+ messages
2024-06-19 19:33 [PATCH v5 0/6] slab: Introduce dedicated bucket allocator Kees Cook
2024-06-19 19:33 ` [PATCH v5 1/6] mm/slab: Introduce kmem_buckets typedef Kees Cook
2024-06-19 19:33 ` [PATCH v5 2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node() Kees Cook
2024-06-20 13:08 ` Vlastimil Babka
2024-06-20 13:37 ` Vlastimil Babka
2024-06-20 18:46 ` Kees Cook
2024-06-20 20:44 ` Vlastimil Babka
2024-06-20 18:41 ` Kees Cook
2024-06-19 19:33 ` [PATCH v5 3/6] mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument Kees Cook
2024-06-19 19:33 ` [PATCH v5 4/6] mm/slab: Introduce kmem_buckets_create() and family Kees Cook
2024-06-20 13:56 ` Vlastimil Babka
2024-06-20 18:54 ` Kees Cook
2024-06-20 20:43 ` Vlastimil Babka
2024-06-28 5:35 ` Boqun Feng
2024-06-28 8:40 ` Vlastimil Babka
2024-06-28 9:06 ` Alice Ryhl
2024-06-28 9:17 ` Vlastimil Babka
2024-06-28 9:34 ` Alice Ryhl
2024-06-28 15:47 ` Kees Cook
2024-06-28 16:53 ` Vlastimil Babka
2024-06-20 22:48 ` Andi Kleen
2024-06-20 23:29 ` Kees Cook
2024-06-19 19:33 ` [PATCH v5 5/6] ipc, msg: Use dedicated slab buckets for alloc_msg() Kees Cook
2024-06-19 19:33 ` [PATCH v5 6/6] mm/util: Use dedicated slab buckets for memdup_user() Kees Cook