* [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
@ 2025-08-25 15:44 Marco Elver
  2025-08-25 16:48 ` Harry Yoo
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Marco Elver @ 2025-08-25 15:44 UTC (permalink / raw)
  To: elver
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	GONG Ruiqi, Harry Yoo, Jann Horn, Kees Cook, Lorenzo Stoakes,
	Matteo Rizzo, Michal Hocko, Mike Rapoport, Nathan Chancellor,
	Roman Gushchin, Suren Baghdasaryan, Vlastimil Babka,
	linux-hardening, linux-mm

[ Beware, this is an early RFC for an in-development Clang feature, and
  requires the following Clang/LLVM development tree:
   https://github.com/melver/llvm-project/tree/alloc-token
  The corresponding LLVM RFC and discussion can be found here:
   https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 ]

Rework the general infrastructure around RANDOM_KMALLOC_CACHES into more
flexible PARTITION_KMALLOC_CACHES, with the former being a partitioning
mode of the latter.

Introduce a new mode, TYPED_KMALLOC_CACHES, which leverages Clang's
"allocation tokens" via __builtin_alloc_token_infer [1].

This mechanism allows the compiler to pass a token ID derived from the
allocation's type to the allocator. The compiler performs best-effort
type inference, and recognizes idioms such as kmalloc(sizeof(T), ...).
Unlike RANDOM_KMALLOC_CACHES, this mode deterministically assigns a slab
cache to an allocation of type T, regardless of allocation site.
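
For illustration, a rough sketch of what this looks like at an
allocation site (struct foo and alloc_foo() are made-up examples, not
taken from this patch):

  struct foo {
          struct list_head link;  /* contains pointers */
          u64 id;
  };

  static struct foo *alloc_foo(void)
  {
          /*
           * With CONFIG_TYPED_KMALLOC_CACHES, kmalloc() expands so that
           * __builtin_alloc_token_infer(sizeof(struct foo), GFP_KERNEL)
           * is evaluated alongside the call; the compiler recognizes
           * the sizeof(struct foo) idiom and produces a constant token
           * in [0, 15] (given -falloc-token-max=16), which
           * kmalloc_type() maps to KMALLOC_PARTITION_START + token for
           * normal allocations.
           */
          return kmalloc(sizeof(struct foo), GFP_KERNEL);
  }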

Clang's default token ID calculation is described as [1]:

   TypeHashPointerSplit: This mode assigns a token ID based on the hash
   of the allocated type's name, where the top half ID-space is reserved
   for types that contain pointers and the bottom half for types that do
   not contain pointers.
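
A simplified model of that calculation, for intuition only (the
stand-in hash below is not the hash Clang actually uses):

  /* Simplified model of TypeHashPointerSplit -- illustration only. */
  static unsigned long token_for_type(const char *type_name,
                                      bool contains_pointers,
                                      unsigned long max_tokens)
  {
          unsigned long half = max_tokens / 2;
          unsigned long h = 0;

          while (*type_name)      /* stand-in hash, not Clang's */
                  h = h * 31 + (unsigned char)*type_name++;

          /* Top half: pointer-containing types; bottom half: the rest. */
          return (contains_pointers ? half : 0) + (h % half);
  }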

Separating pointer-containing objects from pointerless objects and data
allocations can help mitigate certain classes of memory corruption
exploits [2]: attackers who gain a buffer overflow on a primitive
buffer cannot use it to directly corrupt pointers or other critical
metadata in an object residing in a different, isolated heap region.

It is important to note that heap isolation strategies offer a
best-effort approach, and do not provide a 100% security guarantee,
albeit one achievable at relatively low performance cost. Note that this
also does not prevent cross-cache attacks, and SLAB_VIRTUAL [3] should
be used as a complementary mitigation.

With all that, my kernel (x86 defconfig) shows me a histogram of slab
cache object distribution per /proc/slabinfo (after boot):

  <slab cache>      <objs> <hist>
  kmalloc-part-15     619  ++++++
  kmalloc-part-14    1412  ++++++++++++++
  kmalloc-part-13    1063  ++++++++++
  kmalloc-part-12    1745  +++++++++++++++++
  kmalloc-part-11     891  ++++++++
  kmalloc-part-10     610  ++++++
  kmalloc-part-09     792  +++++++
  kmalloc-part-08    3054  ++++++++++++++++++++++++++++++
  kmalloc-part-07     245  ++
  kmalloc-part-06     182  +
  kmalloc-part-05     122  +
  kmalloc-part-04     295  ++
  kmalloc-part-03     241  ++
  kmalloc-part-02     107  +
  kmalloc-part-01     124  +
  kmalloc            6231  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The above /proc/slabinfo snapshot shows me there are 7547 allocated
objects (slabs 00 - 07) that the compiler claims contain no pointers or
it was unable to infer the type of, and 10186 objects that contain
pointers (slabs 08 - 15). On the whole, this looks relatively sane.

Additionally, when I compile my kernel with -Rpass=alloc-token, which
provides diagnostics where (after dead-code elimination) type inference
failed, I see 966 allocation sites where the compiler failed to identify
a type. Some initial review confirms these are mostly variable sized
buffers, but also include structs with trailing flexible length arrays
(the latter could be recognized by the compiler by teaching it to look
more deeply into complex expressions such as those generated by
struct_size).
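
For example, a (made-up) allocation site of the latter kind that
currently defeats the type inference:

  struct pkt {
          u32 len;
          u8 data[];              /* trailing flexible array member */
  };

  static struct pkt *alloc_pkt(u32 n)
  {
          struct pkt *p;

          /*
           * struct_size() expands into a non-trivial size expression,
           * so the compiler cannot currently infer 'struct pkt' here,
           * and -Rpass=alloc-token reports the site.
           */
          p = kmalloc(struct_size(p, data, n), GFP_KERNEL);
          if (p)
                  p->len = n;
          return p;
  }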

Link: https://github.com/melver/llvm-project/blob/alloc-token/clang/docs/AllocToken.rst [1]
Link: https://blog.dfsec.com/ios/2025/05/30/blasting-past-ios-18/ [2]
Link: https://lwn.net/Articles/944647/ [3]
Signed-off-by: Marco Elver <elver@google.com>
---
 Makefile                        |  5 ++
 include/linux/percpu.h          |  2 +-
 include/linux/slab.h            | 88 ++++++++++++++++++++-------------
 kernel/configs/hardening.config |  2 +-
 mm/Kconfig                      | 43 ++++++++++++----
 mm/kfence/kfence_test.c         |  4 +-
 mm/slab.h                       |  4 +-
 mm/slab_common.c                | 48 +++++++++---------
 mm/slub.c                       | 20 ++++----
 9 files changed, 131 insertions(+), 85 deletions(-)

diff --git a/Makefile b/Makefile
index d1adb78c3596..cc761267fc75 100644
--- a/Makefile
+++ b/Makefile
@@ -936,6 +936,11 @@ KBUILD_CFLAGS	+= $(CC_AUTO_VAR_INIT_ZERO_ENABLER)
 endif
 endif
 
+ifdef CONFIG_TYPED_KMALLOC_CACHES
+# PARTITION_KMALLOC_CACHES_NR + 1
+KBUILD_CFLAGS	+= -falloc-token-max=16
+endif
+
 # Explicitly clear padding bits during variable initialization
 KBUILD_CFLAGS += $(call cc-option,-fzero-init-padding-bits=all)
 
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 85bf8dd9f087..271b41be314d 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -36,7 +36,7 @@
 #define PCPU_BITMAP_BLOCK_BITS		(PCPU_BITMAP_BLOCK_SIZE >>	\
 					 PCPU_MIN_ALLOC_SHIFT)
 
-#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#ifdef CONFIG_PARTITION_KMALLOC_CACHES
 # if defined(CONFIG_LOCKDEP) && !defined(CONFIG_PAGE_SIZE_4KB)
 # define PERCPU_DYNAMIC_SIZE_SHIFT      13
 # else
diff --git a/include/linux/slab.h b/include/linux/slab.h
index d5a8ab98035c..4ace54744b54 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -583,10 +583,10 @@ static inline unsigned int arch_slab_minalign(void)
 #define SLAB_OBJ_MIN_SIZE      (KMALLOC_MIN_SIZE < 16 ? \
                                (KMALLOC_MIN_SIZE) : 16)
 
-#ifdef CONFIG_RANDOM_KMALLOC_CACHES
-#define RANDOM_KMALLOC_CACHES_NR	15 // # of cache copies
+#ifdef CONFIG_PARTITION_KMALLOC_CACHES
+#define PARTITION_KMALLOC_CACHES_NR	15 // # of cache copies
 #else
-#define RANDOM_KMALLOC_CACHES_NR	0
+#define PARTITION_KMALLOC_CACHES_NR	0
 #endif
 
 /*
@@ -605,8 +605,8 @@ enum kmalloc_cache_type {
 #ifndef CONFIG_MEMCG
 	KMALLOC_CGROUP = KMALLOC_NORMAL,
 #endif
-	KMALLOC_RANDOM_START = KMALLOC_NORMAL,
-	KMALLOC_RANDOM_END = KMALLOC_RANDOM_START + RANDOM_KMALLOC_CACHES_NR,
+	KMALLOC_PARTITION_START = KMALLOC_NORMAL,
+	KMALLOC_PARTITION_END = KMALLOC_PARTITION_START + PARTITION_KMALLOC_CACHES_NR,
 #ifdef CONFIG_SLUB_TINY
 	KMALLOC_RECLAIM = KMALLOC_NORMAL,
 #else
@@ -633,9 +633,20 @@ extern kmem_buckets kmalloc_caches[NR_KMALLOC_TYPES];
 	(IS_ENABLED(CONFIG_ZONE_DMA)   ? __GFP_DMA : 0) |	\
 	(IS_ENABLED(CONFIG_MEMCG) ? __GFP_ACCOUNT : 0))
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
 extern unsigned long random_kmalloc_seed;
+typedef struct { unsigned long ip; } kmalloc_token_t;
+#define __kmalloc_token(...) ((kmalloc_token_t) { .ip = _RET_IP_ })
+#elif defined(CONFIG_TYPED_KMALLOC_CACHES)
+typedef struct { unsigned long v; } kmalloc_token_t;
+#define __kmalloc_token(...) ((kmalloc_token_t){ .v = __builtin_alloc_token_infer(__VA_ARGS__) })
+#else
+/* no-op */
+typedef struct {} kmalloc_token_t;
+#define __kmalloc_token(...) ((kmalloc_token_t){})
+#endif
 
-static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
+static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, kmalloc_token_t token)
 {
 	/*
 	 * The most common case is KMALLOC_NORMAL, so test for it
@@ -643,9 +654,11 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigne
 	 */
 	if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
 #ifdef CONFIG_RANDOM_KMALLOC_CACHES
-		/* RANDOM_KMALLOC_CACHES_NR (=15) copies + the KMALLOC_NORMAL */
-		return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
-						      ilog2(RANDOM_KMALLOC_CACHES_NR + 1));
+		/* PARTITION_KMALLOC_CACHES_NR (=15) copies + the KMALLOC_NORMAL */
+		return KMALLOC_PARTITION_START + hash_64(token.ip ^ random_kmalloc_seed,
+							 ilog2(PARTITION_KMALLOC_CACHES_NR + 1));
+#elif defined(CONFIG_TYPED_KMALLOC_CACHES)
+		return KMALLOC_PARTITION_START + token.v;
 #else
 		return KMALLOC_NORMAL;
 #endif
@@ -819,10 +832,10 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
  * with the exception of kunit tests
  */
 
-void *__kmalloc_noprof(size_t size, gfp_t flags)
+void *__kmalloc_noprof(size_t size, gfp_t flags, kmalloc_token_t token)
 				__assume_kmalloc_alignment __alloc_size(1);
 
-void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
+void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node, kmalloc_token_t token)
 				__assume_kmalloc_alignment __alloc_size(1);
 
 void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
@@ -893,7 +906,7 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
  *	Try really hard to succeed the allocation but fail
  *	eventually.
  */
-static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t flags)
+static __always_inline __alloc_size(1) void *_kmalloc_noprof(size_t size, gfp_t flags, kmalloc_token_t token)
 {
 	if (__builtin_constant_p(size) && size) {
 		unsigned int index;
@@ -903,20 +916,21 @@ static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t f
 
 		index = kmalloc_index(size);
 		return __kmalloc_cache_noprof(
-				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
+				kmalloc_caches[kmalloc_type(flags, token)][index],
 				flags, size);
 	}
-	return __kmalloc_noprof(size, flags);
+	return __kmalloc_noprof(size, flags, token);
 }
+#define kmalloc_noprof(...)			_kmalloc_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
 #define kmalloc(...)				alloc_hooks(kmalloc_noprof(__VA_ARGS__))
 
 #define kmem_buckets_alloc(_b, _size, _flags)	\
-	alloc_hooks(__kmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
+	alloc_hooks(__kmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE, __kmalloc_token(_size)))
 
 #define kmem_buckets_alloc_track_caller(_b, _size, _flags)	\
-	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE, _RET_IP_))
+	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE, _RET_IP_, __kmalloc_token(_size)))
 
-static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gfp_t flags, int node)
+static __always_inline __alloc_size(1) void *_kmalloc_node_noprof(size_t size, gfp_t flags, int node, kmalloc_token_t token)
 {
 	if (__builtin_constant_p(size) && size) {
 		unsigned int index;
@@ -926,11 +940,12 @@ static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gf
 
 		index = kmalloc_index(size);
 		return __kmalloc_cache_node_noprof(
-				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
+				kmalloc_caches[kmalloc_type(flags, token)][index],
 				flags, node, size);
 	}
-	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node);
+	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, token);
 }
+#define kmalloc_node_noprof(...)		_kmalloc_node_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
 #define kmalloc_node(...)			alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))
 
 /**
@@ -939,14 +954,15 @@ static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gf
  * @size: element size.
  * @flags: the type of memory to allocate (see kmalloc).
  */
-static inline __alloc_size(1, 2) void *kmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *_kmalloc_array_noprof(size_t n, size_t size, gfp_t flags, kmalloc_token_t token)
 {
 	size_t bytes;
 
 	if (unlikely(check_mul_overflow(n, size, &bytes)))
 		return NULL;
-	return kmalloc_noprof(bytes, flags);
+	return _kmalloc_noprof(bytes, flags, token);
 }
+#define kmalloc_array_noprof(...)		_kmalloc_array_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
 #define kmalloc_array(...)			alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
 
 /**
@@ -989,9 +1005,9 @@ static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(voi
 #define kcalloc(n, size, flags)		kmalloc_array(n, size, (flags) | __GFP_ZERO)
 
 void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node,
-					 unsigned long caller) __alloc_size(1);
+					 unsigned long caller, kmalloc_token_t token) __alloc_size(1);
 #define kmalloc_node_track_caller_noprof(size, flags, node, caller) \
-	__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller)
+	__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller, __kmalloc_token(size))
 #define kmalloc_node_track_caller(...)		\
 	alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))
 
@@ -1008,17 +1024,18 @@ void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flag
 #define kmalloc_track_caller_noprof(...)	\
 		kmalloc_node_track_caller_noprof(__VA_ARGS__, NUMA_NO_NODE, _RET_IP_)
 
-static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags,
-							  int node)
+static inline __alloc_size(1, 2) void *_kmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags,
+								  int node, kmalloc_token_t token)
 {
 	size_t bytes;
 
 	if (unlikely(check_mul_overflow(n, size, &bytes)))
 		return NULL;
 	if (__builtin_constant_p(n) && __builtin_constant_p(size))
-		return kmalloc_node_noprof(bytes, flags, node);
-	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(bytes, NULL), flags, node);
+		return _kmalloc_node_noprof(bytes, flags, node, token);
+	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(bytes, NULL), flags, node, token);
 }
+#define kmalloc_array_node_noprof(...)		_kmalloc_array_node_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
 #define kmalloc_array_node(...)			alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))
 
 #define kcalloc_node(_n, _size, _flags, _node)	\
@@ -1034,16 +1051,17 @@ static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_
  * @size: how many bytes of memory are required.
  * @flags: the type of memory to allocate (see kmalloc).
  */
-static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
+static inline __alloc_size(1) void *_kzalloc_noprof(size_t size, gfp_t flags, kmalloc_token_t token)
 {
-	return kmalloc_noprof(size, flags | __GFP_ZERO);
+	return _kmalloc_noprof(size, flags | __GFP_ZERO, token);
 }
+#define kzalloc_noprof(...)			_kzalloc_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
 #define kzalloc(...)				alloc_hooks(kzalloc_noprof(__VA_ARGS__))
 #define kzalloc_node(_size, _flags, _node)	kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
 
-void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) __alloc_size(1);
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node, kmalloc_token_t token) __alloc_size(1);
 #define kvmalloc_node_noprof(size, flags, node)	\
-	__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node)
+	__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, __kmalloc_token(size))
 #define kvmalloc_node(...)			alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))
 
 #define kvmalloc(_size, _flags)			kvmalloc_node(_size, _flags, NUMA_NO_NODE)
@@ -1052,19 +1070,19 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 
 #define kvzalloc_node(_size, _flags, _node)	kvmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
 #define kmem_buckets_valloc(_b, _size, _flags)	\
-	alloc_hooks(__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
+	alloc_hooks(__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE, __kmalloc_token(_size)))
 
 static inline __alloc_size(1, 2) void *
-kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
+_kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node, kmalloc_token_t token)
 {
 	size_t bytes;
 
 	if (unlikely(check_mul_overflow(n, size, &bytes)))
 		return NULL;
 
-	return kvmalloc_node_noprof(bytes, flags, node);
+	return __kvmalloc_node_noprof(PASS_BUCKET_PARAMS(bytes, NULL), flags, node, token);
 }
-
+#define kvmalloc_array_node_noprof(...)		_kvmalloc_array_node_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
 #define kvmalloc_array_noprof(...)		kvmalloc_array_node_noprof(__VA_ARGS__, NUMA_NO_NODE)
 #define kvcalloc_node_noprof(_n,_s,_f,_node)	kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)
 #define kvcalloc_noprof(...)			kvcalloc_node_noprof(__VA_ARGS__, NUMA_NO_NODE)
diff --git a/kernel/configs/hardening.config b/kernel/configs/hardening.config
index 64caaf997fc0..df4fba56a3fd 100644
--- a/kernel/configs/hardening.config
+++ b/kernel/configs/hardening.config
@@ -22,7 +22,7 @@ CONFIG_SLAB_FREELIST_RANDOM=y
 CONFIG_SLAB_FREELIST_HARDENED=y
 CONFIG_SLAB_BUCKETS=y
 CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
-CONFIG_RANDOM_KMALLOC_CACHES=y
+CONFIG_PARTITION_KMALLOC_CACHES=y
 
 # Sanity check userspace page table mappings.
 CONFIG_PAGE_TABLE_CHECK=y
diff --git a/mm/Kconfig b/mm/Kconfig
index e443fe8cd6cf..f194ead443d4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -284,22 +284,45 @@ config SLUB_CPU_PARTIAL
 	  which requires the taking of locks that may cause latency spikes.
 	  Typically one would choose no for a realtime system.
 
-config RANDOM_KMALLOC_CACHES
-	default n
+config PARTITION_KMALLOC_CACHES
 	depends on !SLUB_TINY
-	bool "Randomize slab caches for normal kmalloc"
+	bool "Partitioned slab caches for normal kmalloc"
 	help
-	  A hardening feature that creates multiple copies of slab caches for
-	  normal kmalloc allocation and makes kmalloc randomly pick one based
-	  on code address, which makes the attackers more difficult to spray
-	  vulnerable memory objects on the heap for the purpose of exploiting
-	  memory vulnerabilities.
+	  A hardening feature that creates multiple isolated copies of slab
+	  caches for normal kmalloc allocations. This makes it more difficult
+	  to exploit memory-safety vulnerabilities by attacking vulnerable
+	  co-located memory objects. Several modes are provided.
 
 	  Currently the number of copies is set to 16, a reasonably large value
 	  that effectively diverges the memory objects allocated for different
 	  subsystems or modules into different caches, at the expense of a
-	  limited degree of memory and CPU overhead that relates to hardware and
-	  system workload.
+	  limited degree of memory and CPU overhead that relates to hardware
+	  and system workload.
+
+choice
+	prompt "Partitioned slab cache mode"
+	depends on PARTITION_KMALLOC_CACHES
+	default TYPED_KMALLOC_CACHES
+	help
+	  Selects the slab cache partitioning mode.
+
+config RANDOM_KMALLOC_CACHES
+	bool "Randomize slab caches for normal kmalloc"
+	help
+	  Randomly pick a slab cache based on code address.
+
+config TYPED_KMALLOC_CACHES
+	bool "Type based slab cache selection for normal kmalloc"
+	depends on $(cc-option,-falloc-token-max=123)
+	help
+	  Rely on Clang's allocation tokens to choose a slab cache, where token
+	  IDs are derived from the allocated type.
+
+	  The current effectiveness of Clang's type inference can be judged by
+	  -Rpass=alloc-token, which provides diagnostics where (after dead-code
+	  elimination) type inference failed.
+
+endchoice
 
 endmenu # Slab allocator options
 
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 00034e37bc9f..76111257bbcc 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -214,7 +214,7 @@ static void test_cache_destroy(void)
 static inline size_t kmalloc_cache_alignment(size_t size)
 {
 	/* just to get ->align so no need to pass in the real caller */
-	enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, 0);
+	enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, __kmalloc_token(0));
 	return kmalloc_caches[type][__kmalloc_index(size, false)]->align;
 }
 
@@ -285,7 +285,7 @@ static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocat
 
 		if (is_kfence_address(alloc)) {
 			struct slab *slab = virt_to_slab(alloc);
-			enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, _RET_IP_);
+			enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, __kmalloc_token(size));
 			struct kmem_cache *s = test_cache ?:
 					kmalloc_caches[type][__kmalloc_index(size, false)];
 
diff --git a/mm/slab.h b/mm/slab.h
index 248b34c839b7..e956b59e0bd8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -386,12 +386,12 @@ static inline unsigned int size_index_elem(unsigned int bytes)
  * KMALLOC_MAX_CACHE_SIZE and the caller must check that.
  */
 static inline struct kmem_cache *
-kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, unsigned long caller)
+kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, kmalloc_token_t token)
 {
 	unsigned int index;
 
 	if (!b)
-		b = &kmalloc_caches[kmalloc_type(flags, caller)];
+		b = &kmalloc_caches[kmalloc_type(flags, token)];
 	if (size <= 192)
 		index = kmalloc_size_index[size_index_elem(size)];
 	else
diff --git a/mm/slab_common.c b/mm/slab_common.c
index bfe7c40eeee1..6c826d50c819 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -741,7 +741,7 @@ size_t kmalloc_size_roundup(size_t size)
 		 * The flags don't matter since size_index is common to all.
 		 * Neither does the caller for just getting ->object_size.
 		 */
-		return kmalloc_slab(size, NULL, GFP_KERNEL, 0)->object_size;
+		return kmalloc_slab(size, NULL, GFP_KERNEL, __kmalloc_token(0))->object_size;
 	}
 
 	/* Above the smaller buckets, size is a multiple of page size. */
@@ -775,26 +775,26 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
 #define KMALLOC_RCL_NAME(sz)
 #endif
 
-#ifdef CONFIG_RANDOM_KMALLOC_CACHES
-#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
-#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
-#define KMA_RAND_1(sz)                  .name[KMALLOC_RANDOM_START +  1] = "kmalloc-rnd-01-" #sz,
-#define KMA_RAND_2(sz)  KMA_RAND_1(sz)  .name[KMALLOC_RANDOM_START +  2] = "kmalloc-rnd-02-" #sz,
-#define KMA_RAND_3(sz)  KMA_RAND_2(sz)  .name[KMALLOC_RANDOM_START +  3] = "kmalloc-rnd-03-" #sz,
-#define KMA_RAND_4(sz)  KMA_RAND_3(sz)  .name[KMALLOC_RANDOM_START +  4] = "kmalloc-rnd-04-" #sz,
-#define KMA_RAND_5(sz)  KMA_RAND_4(sz)  .name[KMALLOC_RANDOM_START +  5] = "kmalloc-rnd-05-" #sz,
-#define KMA_RAND_6(sz)  KMA_RAND_5(sz)  .name[KMALLOC_RANDOM_START +  6] = "kmalloc-rnd-06-" #sz,
-#define KMA_RAND_7(sz)  KMA_RAND_6(sz)  .name[KMALLOC_RANDOM_START +  7] = "kmalloc-rnd-07-" #sz,
-#define KMA_RAND_8(sz)  KMA_RAND_7(sz)  .name[KMALLOC_RANDOM_START +  8] = "kmalloc-rnd-08-" #sz,
-#define KMA_RAND_9(sz)  KMA_RAND_8(sz)  .name[KMALLOC_RANDOM_START +  9] = "kmalloc-rnd-09-" #sz,
-#define KMA_RAND_10(sz) KMA_RAND_9(sz)  .name[KMALLOC_RANDOM_START + 10] = "kmalloc-rnd-10-" #sz,
-#define KMA_RAND_11(sz) KMA_RAND_10(sz) .name[KMALLOC_RANDOM_START + 11] = "kmalloc-rnd-11-" #sz,
-#define KMA_RAND_12(sz) KMA_RAND_11(sz) .name[KMALLOC_RANDOM_START + 12] = "kmalloc-rnd-12-" #sz,
-#define KMA_RAND_13(sz) KMA_RAND_12(sz) .name[KMALLOC_RANDOM_START + 13] = "kmalloc-rnd-13-" #sz,
-#define KMA_RAND_14(sz) KMA_RAND_13(sz) .name[KMALLOC_RANDOM_START + 14] = "kmalloc-rnd-14-" #sz,
-#define KMA_RAND_15(sz) KMA_RAND_14(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-rnd-15-" #sz,
-#else // CONFIG_RANDOM_KMALLOC_CACHES
-#define KMALLOC_RANDOM_NAME(N, sz)
+#ifdef CONFIG_PARTITION_KMALLOC_CACHES
+#define __KMALLOC_PARTITION_CONCAT(a, b) a ## b
+#define KMALLOC_PARTITION_NAME(N, sz) __KMALLOC_PARTITION_CONCAT(KMA_PART_, N)(sz)
+#define KMA_PART_1(sz)                  .name[KMALLOC_PARTITION_START +  1] = "kmalloc-part-01-" #sz,
+#define KMA_PART_2(sz)  KMA_PART_1(sz)  .name[KMALLOC_PARTITION_START +  2] = "kmalloc-part-02-" #sz,
+#define KMA_PART_3(sz)  KMA_PART_2(sz)  .name[KMALLOC_PARTITION_START +  3] = "kmalloc-part-03-" #sz,
+#define KMA_PART_4(sz)  KMA_PART_3(sz)  .name[KMALLOC_PARTITION_START +  4] = "kmalloc-part-04-" #sz,
+#define KMA_PART_5(sz)  KMA_PART_4(sz)  .name[KMALLOC_PARTITION_START +  5] = "kmalloc-part-05-" #sz,
+#define KMA_PART_6(sz)  KMA_PART_5(sz)  .name[KMALLOC_PARTITION_START +  6] = "kmalloc-part-06-" #sz,
+#define KMA_PART_7(sz)  KMA_PART_6(sz)  .name[KMALLOC_PARTITION_START +  7] = "kmalloc-part-07-" #sz,
+#define KMA_PART_8(sz)  KMA_PART_7(sz)  .name[KMALLOC_PARTITION_START +  8] = "kmalloc-part-08-" #sz,
+#define KMA_PART_9(sz)  KMA_PART_8(sz)  .name[KMALLOC_PARTITION_START +  9] = "kmalloc-part-09-" #sz,
+#define KMA_PART_10(sz) KMA_PART_9(sz)  .name[KMALLOC_PARTITION_START + 10] = "kmalloc-part-10-" #sz,
+#define KMA_PART_11(sz) KMA_PART_10(sz) .name[KMALLOC_PARTITION_START + 11] = "kmalloc-part-11-" #sz,
+#define KMA_PART_12(sz) KMA_PART_11(sz) .name[KMALLOC_PARTITION_START + 12] = "kmalloc-part-12-" #sz,
+#define KMA_PART_13(sz) KMA_PART_12(sz) .name[KMALLOC_PARTITION_START + 13] = "kmalloc-part-13-" #sz,
+#define KMA_PART_14(sz) KMA_PART_13(sz) .name[KMALLOC_PARTITION_START + 14] = "kmalloc-part-14-" #sz,
+#define KMA_PART_15(sz) KMA_PART_14(sz) .name[KMALLOC_PARTITION_START + 15] = "kmalloc-part-15-" #sz,
+#else // CONFIG_PARTITION_KMALLOC_CACHES
+#define KMALLOC_PARTITION_NAME(N, sz)
 #endif
 
 #define INIT_KMALLOC_INFO(__size, __short_size)			\
@@ -803,7 +803,7 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
 	KMALLOC_RCL_NAME(__short_size)				\
 	KMALLOC_CGROUP_NAME(__short_size)			\
 	KMALLOC_DMA_NAME(__short_size)				\
-	KMALLOC_RANDOM_NAME(RANDOM_KMALLOC_CACHES_NR, __short_size)	\
+	KMALLOC_PARTITION_NAME(PARTITION_KMALLOC_CACHES_NR, __short_size)	\
 	.size = __size,						\
 }
 
@@ -915,8 +915,8 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type)
 		flags |= SLAB_CACHE_DMA;
 	}
 
-#ifdef CONFIG_RANDOM_KMALLOC_CACHES
-	if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
+#ifdef CONFIG_PARTITION_KMALLOC_CACHES
+	if (type >= KMALLOC_PARTITION_START && type <= KMALLOC_PARTITION_END)
 		flags |= SLAB_NO_MERGE;
 #endif
 
diff --git a/mm/slub.c b/mm/slub.c
index 30003763d224..d3c2beab0ea2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4344,7 +4344,7 @@ EXPORT_SYMBOL(__kmalloc_large_node_noprof);
 
 static __always_inline
 void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
-			unsigned long caller)
+			unsigned long caller, kmalloc_token_t token)
 {
 	struct kmem_cache *s;
 	void *ret;
@@ -4359,29 +4359,29 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
 	if (unlikely(!size))
 		return ZERO_SIZE_PTR;
 
-	s = kmalloc_slab(size, b, flags, caller);
+	s = kmalloc_slab(size, b, flags, token);
 
 	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
 	ret = kasan_kmalloc(s, ret, size, flags);
 	trace_kmalloc(caller, ret, size, s->size, flags, node);
 	return ret;
 }
-void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
+void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node, kmalloc_token_t token)
 {
-	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, _RET_IP_);
+	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, _RET_IP_, token);
 }
 EXPORT_SYMBOL(__kmalloc_node_noprof);
 
-void *__kmalloc_noprof(size_t size, gfp_t flags)
+void *__kmalloc_noprof(size_t size, gfp_t flags, kmalloc_token_t token)
 {
-	return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE, _RET_IP_);
+	return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE, _RET_IP_, token);
 }
 EXPORT_SYMBOL(__kmalloc_noprof);
 
 void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags,
-					 int node, unsigned long caller)
+					 int node, unsigned long caller, kmalloc_token_t token)
 {
-	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, caller);
+	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, caller, token);
 
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
@@ -5041,7 +5041,7 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
  *
  * Return: pointer to the allocated memory of %NULL in case of failure
  */
-void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node, kmalloc_token_t token)
 {
 	void *ret;
 
@@ -5051,7 +5051,7 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 	 */
 	ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
 				kmalloc_gfp_adjust(flags, size),
-				node, _RET_IP_);
+				node, _RET_IP_, token);
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
-- 
2.51.0.rc2.233.g662b1ed5c5-goog



* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-25 15:44 [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning Marco Elver
@ 2025-08-25 16:48 ` Harry Yoo
  2025-08-26 10:45   ` Marco Elver
  2025-08-26 11:14   ` Matteo Rizzo
  2025-08-25 20:17 ` Kees Cook
  2025-08-26  4:59 ` GONG Ruiqi
  2 siblings, 2 replies; 10+ messages in thread
From: Harry Yoo @ 2025-08-25 16:48 UTC (permalink / raw)
  To: Marco Elver
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	GONG Ruiqi, Jann Horn, Kees Cook, Lorenzo Stoakes, Matteo Rizzo,
	Michal Hocko, Mike Rapoport, Nathan Chancellor, Roman Gushchin,
	Suren Baghdasaryan, Vlastimil Babka, linux-hardening, linux-mm

On Mon, Aug 25, 2025 at 05:44:40PM +0200, Marco Elver wrote:
> [ Beware, this is an early RFC for an in-development Clang feature, and
>   requires the following Clang/LLVM development tree:
>    https://github.com/melver/llvm-project/tree/alloc-token
>   The corresponding LLVM RFC and discussion can be found here:
>    https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434  ]

Whoa, a cutting-edge feature!

> Rework the general infrastructure around RANDOM_KMALLOC_CACHES into more
> flexible PARTITION_KMALLOC_CACHES, with the former being a partitioning
> mode of the latter.
> 
> Introduce a new mode, TYPED_KMALLOC_CACHES, which leverages Clang's
> "allocation tokens" via __builtin_alloc_token_infer [1].
> 
> This mechanism allows the compiler to pass a token ID derived from the
> allocation's type to the allocator. The compiler performs best-effort
> type inference, and recognizes idioms such as kmalloc(sizeof(T), ...).
> Unlike RANDOM_KMALLOC_CACHES, this mode deterministically assigns a slab
> cache to an allocation of type T, regardless of allocation site.

I don't think either TYPED_KMALLOC_CACHES or RANDOM_KMALLOC_CACHES is
strictly superior to the other (or am I wrong?). Would it be reasonable
to do some run-time randomization for TYPED_KMALLOC_CACHES too?
(i.e., randomize index within top/bottom half based on allocation site and
random seed)

> Clang's default token ID calculation is described as [1]:
> 
>    TypeHashPointerSplit: This mode assigns a token ID based on the hash
>    of the allocated type's name, where the top half ID-space is reserved
>    for types that contain pointers and the bottom half for types that do
>    not contain pointers.
> 
> Separating pointer-containing objects from pointerless objects and data
> allocations can help mitigate certain classes of memory corruption
> exploits [2]: attackers who gain a buffer overflow on a primitive
> buffer cannot use it to directly corrupt pointers or other critical
> metadata in an object residing in a different, isolated heap region.
>
> It is important to note that heap isolation strategies offer a
> best-effort approach, and do not provide a 100% security guarantee,
> albeit one achievable at relatively low performance cost. Note that this
> also does not prevent cross-cache attacks, and SLAB_VIRTUAL [3] should
> be used as a complementary mitigation.

Not relevant to this patch, but just wondering if there are
any plans for SLAB_VIRTUAL?

> With all that, my kernel (x86 defconfig) shows me a histogram of slab
> cache object distribution per /proc/slabinfo (after boot):
> 
>   <slab cache>      <objs> <hist>
>   kmalloc-part-15     619  ++++++
>   kmalloc-part-14    1412  ++++++++++++++
>   kmalloc-part-13    1063  ++++++++++
>   kmalloc-part-12    1745  +++++++++++++++++
>   kmalloc-part-11     891  ++++++++
>   kmalloc-part-10     610  ++++++
>   kmalloc-part-09     792  +++++++
>   kmalloc-part-08    3054  ++++++++++++++++++++++++++++++
>   kmalloc-part-07     245  ++
>   kmalloc-part-06     182  +
>   kmalloc-part-05     122  +
>   kmalloc-part-04     295  ++
>   kmalloc-part-03     241  ++
>   kmalloc-part-02     107  +
>   kmalloc-part-01     124  +
>   kmalloc            6231  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> The above /proc/slabinfo snapshot shows me there are 7547 allocated
> objects (slabs 00 - 07) that the compiler claims contain no pointers or
> it was unable to infer the type of, and 10186 objects that contain
> pointers (slabs 08 - 15). On the whole, this looks relatively sane.
> 
> Additionally, when I compile my kernel with -Rpass=alloc-token, which
> provides diagnostics where (after dead-code elimination) type inference
> failed, I see 966 allocation sites where the compiler failed to identify
> a type. Some initial review confirms these are mostly variable sized
> buffers, but also include structs with trailing flexible length arrays
> (the latter could be recognized by the compiler by teaching it to look
> more deeply into complex expressions such as those generated by
> struct_size).

When the compiler fails to identify a type, does it go to top half or
bottom half, or perhaps it doesn't matter?

> Link: https://github.com/melver/llvm-project/blob/alloc-token/clang/docs/AllocToken.rst [1]
> Link: https://blog.dfsec.com/ios/2025/05/30/blasting-past-ios-18 [2]
> Link: https://lwn.net/Articles/944647/ [3]
> Signed-off-by: Marco Elver <elver@google.com>
> ---

I didn't go too deep into the implementation details, but I'm happy with
it since the change looks quite simple ;)


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-25 15:44 [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning Marco Elver
  2025-08-25 16:48 ` Harry Yoo
@ 2025-08-25 20:17 ` Kees Cook
  2025-08-26 10:50   ` Marco Elver
  2025-08-26  4:59 ` GONG Ruiqi
  2 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2025-08-25 20:17 UTC (permalink / raw)
  To: Marco Elver, elver
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	GONG Ruiqi, Harry Yoo, Jann Horn, Lorenzo Stoakes, Matteo Rizzo,
	Michal Hocko, Mike Rapoport, Nathan Chancellor, Roman Gushchin,
	Suren Baghdasaryan, Vlastimil Babka, linux-hardening, linux-mm



On August 25, 2025 11:44:40 AM EDT, Marco Elver <elver@google.com> wrote:
>Additionally, when I compile my kernel with -Rpass=alloc-token, which
>provides diagnostics where (after dead-code elimination) type inference
>failed, I see 966 allocation sites where the compiler failed to identify
>a type. Some initial review confirms these are mostly variable sized
>buffers, but also include structs with trailing flexible length arrays
>(the latter could be recognized by the compiler by teaching it to look
>more deeply into complex expressions such as those generated by
>struct_size).

Can the type be extracted from an AST analysis of the lhs?

struct foo *p = kmalloc(bytes, gfp);

Doesn't tell us much from "bytes", but typeof(*p) does...

-- 
Kees Cook


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-25 15:44 [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning Marco Elver
  2025-08-25 16:48 ` Harry Yoo
  2025-08-25 20:17 ` Kees Cook
@ 2025-08-26  4:59 ` GONG Ruiqi
  2025-08-26 11:01   ` Marco Elver
  2 siblings, 1 reply; 10+ messages in thread
From: GONG Ruiqi @ 2025-08-26  4:59 UTC (permalink / raw)
  To: Marco Elver
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	Harry Yoo, Jann Horn, Kees Cook, Lorenzo Stoakes, Matteo Rizzo,
	Michal Hocko, Mike Rapoport, Nathan Chancellor, Roman Gushchin,
	Suren Baghdasaryan, Vlastimil Babka, linux-hardening, linux-mm


On 8/25/2025 11:44 PM, Marco Elver wrote:
> ...
> 
> Introduce a new mode, TYPED_KMALLOC_CACHES, which leverages Clang's
> "allocation tokens" via __builtin_alloc_token_infer [1].
> 
> This mechanism allows the compiler to pass a token ID derived from the
> allocation's type to the allocator. The compiler performs best-effort
> type inference, and recognizes idioms such as kmalloc(sizeof(T), ...).
> Unlike RANDOM_KMALLOC_CACHES, this mode deterministically assigns a slab
> cache to an allocation of type T, regardless of allocation site.
> 
> Clang's default token ID calculation is described as [1]:
> 
>    TypeHashPointerSplit: This mode assigns a token ID based on the hash
>    of the allocated type's name, where the top half ID-space is reserved
>    for types that contain pointers and the bottom half for types that do
>    not contain pointers.
> 

Is a type's token id always the same across different builds? Or somehow
predictable? If so, the attacker could probably find out all types that
end up with the same id, and use some of them to exploit the buggy one.

-Ruiqi


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-25 16:48 ` Harry Yoo
@ 2025-08-26 10:45   ` Marco Elver
  2025-08-26 11:14   ` Matteo Rizzo
  1 sibling, 0 replies; 10+ messages in thread
From: Marco Elver @ 2025-08-26 10:45 UTC (permalink / raw)
  To: Harry Yoo
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	GONG Ruiqi, Jann Horn, Kees Cook, Lorenzo Stoakes, Matteo Rizzo,
	Michal Hocko, Mike Rapoport, Nathan Chancellor, Roman Gushchin,
	Suren Baghdasaryan, Vlastimil Babka, linux-hardening, linux-mm

On Mon, 25 Aug 2025 at 18:49, Harry Yoo <harry.yoo@oracle.com> wrote:
[...]
> > This mechanism allows the compiler to pass a token ID derived from the
> > allocation's type to the allocator. The compiler performs best-effort
> > type inference, and recognizes idioms such as kmalloc(sizeof(T), ...).
> > Unlike RANDOM_KMALLOC_CACHES, this mode deterministically assigns a slab
> > cache to an allocation of type T, regardless of allocation site.
>
> I don't think either TYPED_KMALLOC_CACHES or RANDOM_KMALLOC_CACHES is
> strictly superior to the other (or am I wrong?).

TYPED_KMALLOC_CACHES provides stronger guarantees on how objects are
isolated; in particular, isolating (most) pointer-containing objects
from plain data objects means that it's a lot harder to gain control
of a pointer from an ordinary buffer overflow in a plain data object.

This particular proposed scheme is the result of conclusions I
gathered from various security researchers (and also reconfirmed by
e.g. [2]), the conclusion being that many successful exploits gain
a write primitive through a vulnerable plain data allocation. That
write primitive can then be used to overwrite pointers in adjacent
objects.

In addition, I have been told by some of those security researchers
(citation needed) that RANDOM_KMALLOC_CACHES actually makes some
exploits easier, because there is less "noise" in each individual slab
cache, yet a given allocation is predictably assigned to a slab cache
by its callsite (via _RET_IP_ + boot-time seed). RANDOM_KMALLOC_CACHES
does not separate pointer-containing and non-pointer-containing
objects, and therefore it's likely that a vulnerable object is still
co-located with a pointer-containing object that can be overwritten.

That being said, none of these mitigations are perfect. But on systems
that cannot afford to enable KASAN (or rather, KASAN_HW_TAGS) in
production, it's a lot better than nothing.

[2] https://blog.dfsec.com/ios/2025/05/30/blasting-past-ios-18

> Would it be reasonable
> to do some run-time randomization for TYPED_KMALLOC_CACHES too?
> (i.e., randomize index within top/bottom half based on allocation site and
> random seed)

It's unclear to me if that would strengthen or weaken the mitigation.
Irrespective of the top/bottom split, one of the key properties to
retain is that allocations of type T are predictably assigned a slab
cache. This means that even if a pointer-containing object of type T
is vulnerable, yet the pointer within T is useless for exploitation,
the difficulty of getting to a sensitive object S is still increased
by the fact that S is unlikely to be co-located. If we were to
introduce more randomness, we increase the probability that S will be
co-located with T, which is counter-intuitive to me.

> > Clang's default token ID calculation is described as [1]:
> >
> >    TypeHashPointerSplit: This mode assigns a token ID based on the hash
> >    of the allocated type's name, where the top half ID-space is reserved
> >    for types that contain pointers and the bottom half for types that do
> >    not contain pointers.
> >
> > Separating pointer-containing objects from pointerless objects and data
> > allocations can help mitigate certain classes of memory corruption
> > exploits [2]: attackers who gain a buffer overflow on a primitive
> > buffer cannot use it to directly corrupt pointers or other critical
> > metadata in an object residing in a different, isolated heap region.
> >
> > It is important to note that heap isolation strategies offer a
> > best-effort approach, and do not provide a 100% security guarantee,
> > albeit one achievable at relatively low performance cost. Note that this
> > also does not prevent cross-cache attacks, and SLAB_VIRTUAL [3] should
> > be used as a complementary mitigation.
>
> Not relevant to this patch, but just wondering if there are
> any plans for SLAB_VIRTUAL?

The relevant folks are Cc'd, so hopefully they are aware.

[...]
> > Additionally, when I compile my kernel with -Rpass=alloc-token, which
> > provides diagnostics where (after dead-code elimination) type inference
> > failed, I see 966 allocation sites where the compiler failed to identify
> > a type. Some initial review confirms these are mostly variable sized
> > buffers, but also include structs with trailing flexible length arrays
> > (the latter could be recognized by the compiler by teaching it to look
> > more deeply into complex expressions such as those generated by
> > struct_size).
>
> When the compiler fails to identify a type, does it go to top half or
> bottom half, or perhaps it doesn't matter?

It picks a fallback of 0 by default, so that'd be the bottom half, which
would be the pointer-less bucket. That also matches what I'm seeing,
where the majority of these objects are variably sized plain buffers.
The fallback itself is configurable, so it'd also be possible to pick
a dedicated slab cache for the "unknown type" allocations.


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-25 20:17 ` Kees Cook
@ 2025-08-26 10:50   ` Marco Elver
  0 siblings, 0 replies; 10+ messages in thread
From: Marco Elver @ 2025-08-26 10:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	GONG Ruiqi, Harry Yoo, Jann Horn, Lorenzo Stoakes, Matteo Rizzo,
	Michal Hocko, Mike Rapoport, Nathan Chancellor, Roman Gushchin,
	Suren Baghdasaryan, Vlastimil Babka, linux-hardening, linux-mm

On Mon, 25 Aug 2025 at 22:18, Kees Cook <kees@kernel.org> wrote:
> On August 25, 2025 11:44:40 AM EDT, Marco Elver <elver@google.com> wrote:
> >Additionally, when I compile my kernel with -Rpass=alloc-token, which
> >provides diagnostics where (after dead-code elimination) type inference
> >failed, I see 966 allocation sites where the compiler failed to identify
> >a type. Some initial review confirms these are mostly variable sized
> >buffers, but also include structs with trailing flexible length arrays
> >(the latter could be recognized by the compiler by teaching it to look
> >more deeply into complex expressions such as those generated by
> >struct_size).
>
> Can the type be extracted from an AST analysis of the lhs?
>
> struct foo *p = kmalloc(bytes, gfp);
>
> Doesn't tell us much from "bytes", but typeof(*p) does...

Certainly possible. It currently looks for explicit casts if it can't
figure out from malloc args, but is not yet able to deal with implicit
casts like that. But it's fixable - on the TODO list, and should
improve coverage even more.
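
To illustrate with your example (sketch only):

  p = (struct foo *)kmalloc(bytes, gfp);  /* explicit cast: inferred today */
  struct foo *q = kmalloc(bytes, gfp);    /* implicit cast: not yet handled */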


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-26  4:59 ` GONG Ruiqi
@ 2025-08-26 11:01   ` Marco Elver
  2025-08-26 11:31     ` Florent Revest
  2025-08-27  8:34     ` GONG Ruiqi
  0 siblings, 2 replies; 10+ messages in thread
From: Marco Elver @ 2025-08-26 11:01 UTC (permalink / raw)
  To: GONG Ruiqi
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	Harry Yoo, Jann Horn, Kees Cook, Lorenzo Stoakes, Matteo Rizzo,
	Michal Hocko, Mike Rapoport, Nathan Chancellor, Roman Gushchin,
	Suren Baghdasaryan, Vlastimil Babka, linux-hardening, linux-mm

On Tue, 26 Aug 2025 at 06:59, GONG Ruiqi <gongruiqi1@huawei.com> wrote:
> On 8/25/2025 11:44 PM, Marco Elver wrote:
> > ...
> >
> > Introduce a new mode, TYPED_KMALLOC_CACHES, which leverages Clang's
> > "allocation tokens" via __builtin_alloc_token_infer [1].
> >
> > This mechanism allows the compiler to pass a token ID derived from the
> > allocation's type to the allocator. The compiler performs best-effort
> > type inference, and recognizes idioms such as kmalloc(sizeof(T), ...).
> > Unlike RANDOM_KMALLOC_CACHES, this mode deterministically assigns a slab
> > cache to an allocation of type T, regardless of allocation site.
> >
> > Clang's default token ID calculation is described as [1]:
> >
> >    TypeHashPointerSplit: This mode assigns a token ID based on the hash
> >    of the allocated type's name, where the top half ID-space is reserved
> >    for types that contain pointers and the bottom half for types that do
> >    not contain pointers.
>
> Is a type's token id always the same across different builds? Or somehow
> predictable? If so, the attacker could probably find out all types that
> end up with the same id, and use some of them to exploit the buggy one.

Yes, it's meant to be deterministic and predictable. I guess this is
the same question regarding randomness, for which it's unclear if it
strengthens or weakens the mitigation. As I wrote elsewhere:

> Irrespective of the top/bottom split, one of the key properties to
> retain is that allocations of type T are predictably assigned a slab
> cache. This means that even if a pointer-containing object of type T
> is vulnerable, yet the pointer within T is useless for exploitation,
> the difficulty of getting to a sensitive object S is still increased
> by the fact that S is unlikely to be co-located. If we were to
> introduce more randomness, we increase the probability that S will be
> co-located with T, which is counter-intuitive to me.

I think we can reason either way, and I grant you this is rather ambiguous.

But the definitive point that was made to me by various security
researchers that inspired this technique is that the most useful thing
we can do is separate pointer-containing objects from
non-pointer-containing objects (in the absence of a slab per type,
which is likely too costly in the common case).


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-25 16:48 ` Harry Yoo
  2025-08-26 10:45   ` Marco Elver
@ 2025-08-26 11:14   ` Matteo Rizzo
  1 sibling, 0 replies; 10+ messages in thread
From: Matteo Rizzo @ 2025-08-26 11:14 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Marco Elver, linux-kernel, kasan-dev, Gustavo A. R. Silva,
	Liam R. Howlett, Alexander Potapenko, Andrew Morton,
	Andrey Konovalov, David Hildenbrand, David Rientjes,
	Dmitry Vyukov, Florent Revest, GONG Ruiqi, Jann Horn, Kees Cook,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Nathan Chancellor,
	Roman Gushchin, Suren Baghdasaryan, Vlastimil Babka,
	linux-hardening, linux-mm

On Mon, 25 Aug 2025 at 18:49, Harry Yoo <harry.yoo@oracle.com> wrote:
>
> Not relevant to this patch, but just wondering if there are
> any plans for SLAB_VIRTUAL?

I'm still working on it, I hope to submit a new version upstream soon.
There are a few issues with the current version (mainly virtual memory
exhaustion) that I would like to solve first.

Matteo


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-26 11:01   ` Marco Elver
@ 2025-08-26 11:31     ` Florent Revest
  2025-08-27  8:34     ` GONG Ruiqi
  1 sibling, 0 replies; 10+ messages in thread
From: Florent Revest @ 2025-08-26 11:31 UTC (permalink / raw)
  To: Marco Elver
  Cc: GONG Ruiqi, linux-kernel, kasan-dev, Gustavo A. R. Silva,
	Liam R. Howlett, Alexander Potapenko, Andrew Morton,
	Andrey Konovalov, David Hildenbrand, David Rientjes,
	Dmitry Vyukov, Harry Yoo, Jann Horn, Kees Cook, Lorenzo Stoakes,
	Matteo Rizzo, Michal Hocko, Mike Rapoport, Nathan Chancellor,
	Roman Gushchin, Suren Baghdasaryan, Vlastimil Babka,
	linux-hardening, linux-mm

On Tue, Aug 26, 2025 at 1:01 PM Marco Elver <elver@google.com> wrote:
>
> On Tue, 26 Aug 2025 at 06:59, GONG Ruiqi <gongruiqi1@huawei.com> wrote:
> > On 8/25/2025 11:44 PM, Marco Elver wrote:
> > > ...
> > >
> > > Introduce a new mode, TYPED_KMALLOC_CACHES, which leverages Clang's
> > > "allocation tokens" via __builtin_alloc_token_infer [1].
> > >
> > > This mechanism allows the compiler to pass a token ID derived from the
> > > allocation's type to the allocator. The compiler performs best-effort
> > > type inference, and recognizes idioms such as kmalloc(sizeof(T), ...).
> > > Unlike RANDOM_KMALLOC_CACHES, this mode deterministically assigns a slab
> > > cache to an allocation of type T, regardless of allocation site.
> > >
> > > Clang's default token ID calculation is described as [1]:
> > >
> > >    TypeHashPointerSplit: This mode assigns a token ID based on the hash
> > >    of the allocated type's name, where the top half ID-space is reserved
> > >    for types that contain pointers and the bottom half for types that do
> > >    not contain pointers.
> >
> > Is a type's token id always the same across different builds? Or somehow
> > predictable? If so, the attacker could probably find out all types that
> > end up with the same id, and use some of them to exploit the buggy one.
>
> Yes, it's meant to be deterministic and predictable. I guess this is
> the same question regarding randomness, for which it's unclear if it
> strengthens or weakens the mitigation. As I wrote elsewhere:
>
> > Irrespective of the top/bottom split, one of the key properties to
> > retain is that allocations of type T are predictably assigned a slab
> > cache. This means that even if a pointer-containing object of type T
> > is vulnerable, yet the pointer within T is useless for exploitation,
> > the difficulty of getting to a sensitive object S is still increased
> > by the fact that S is unlikely to be co-located. If we were to
> > introduce more randomness, we increase the probability that S will be
> > co-located with T, which is counter-intuitive to me.
>
> I think we can reason either way, and I grant you this is rather ambiguous.
>
> But the definitive point that was made to me from various security
> researchers that inspired this technique is that the most useful thing
> we can do is separate pointer-containing objects from
> non-pointer-containing objects (in absence of slab per type, which is
> likely too costly in the common case).

One more perspective on this: in a data center environment, attackers
typically get a first foothold by compromising a userspace network
service. If they can do that once, they can do that a bunch of times,
and gain code execution on different machines every time.

Before trying to exploit a kernel memory corruption to elevate
privileges on a machine, they can test the SLAB properties of the
running kernel to make sure it's as they wish (eg: with timing side
channels like in the SLUBStick paper). So with RANDOM_KMALLOC_CACHES,
attackers can just keep retrying their attacks until they land on a
machine where the types T and S are collocated and only then proceed
with their exploit.

With TYPED_KMALLOC_CACHES (and with SLAB_VIRTUAL hopefully someday),
they are simply never able to cross the "objects without pointers" to
"objects with pointers" boundary which really gets in the way of many
exploitation techniques and feels at least to me like a much stronger
security boundary.

This limitation of RANDOM_KMALLOC_CACHES may not be as relevant in other
deployments (eg: on a smartphone) but it makes me strongly prefer
TYPED_KMALLOC_CACHES for server use cases at least.


* Re: [PATCH RFC] slab: support for compiler-assisted type-based slab cache partitioning
  2025-08-26 11:01   ` Marco Elver
  2025-08-26 11:31     ` Florent Revest
@ 2025-08-27  8:34     ` GONG Ruiqi
  1 sibling, 0 replies; 10+ messages in thread
From: GONG Ruiqi @ 2025-08-27  8:34 UTC (permalink / raw)
  To: Marco Elver
  Cc: linux-kernel, kasan-dev, Gustavo A. R. Silva, Liam R. Howlett,
	Alexander Potapenko, Andrew Morton, Andrey Konovalov,
	David Hildenbrand, David Rientjes, Dmitry Vyukov, Florent Revest,
	Harry Yoo, Jann Horn, Kees Cook, Lorenzo Stoakes, Matteo Rizzo,
	Michal Hocko, Mike Rapoport, Nathan Chancellor, Roman Gushchin,
	Suren Baghdasaryan, Vlastimil Babka, linux-hardening, linux-mm



On 8/26/2025 7:01 PM, Marco Elver wrote:
> On Tue, 26 Aug 2025 at 06:59, GONG Ruiqi <gongruiqi1@huawei.com> wrote:
>> On 8/25/2025 11:44 PM, Marco Elver wrote:
>>> ...
>>>
>>> Introduce a new mode, TYPED_KMALLOC_CACHES, which leverages Clang's
>>> "allocation tokens" via __builtin_alloc_token_infer [1].
>>>
>>> This mechanism allows the compiler to pass a token ID derived from the
>>> allocation's type to the allocator. The compiler performs best-effort
>>> type inference, and recognizes idioms such as kmalloc(sizeof(T), ...).
>>> Unlike RANDOM_KMALLOC_CACHES, this mode deterministically assigns a slab
>>> cache to an allocation of type T, regardless of allocation site.
>>>
>>> Clang's default token ID calculation is described as [1]:
>>>
>>>    TypeHashPointerSplit: This mode assigns a token ID based on the hash
>>>    of the allocated type's name, where the top half ID-space is reserved
>>>    for types that contain pointers and the bottom half for types that do
>>>    not contain pointers.
>>
>> Is a type's token id always the same across different builds? Or somehow
>> predictable? If so, the attacker could probably find out all types that
>> end up with the same id, and use some of them to exploit the buggy one.
> 
> Yes, it's meant to be deterministic and predictable. I guess this is
> the same question regarding randomness, for which it's unclear if it
> strengthens or weakens the mitigation. As I wrote elsewhere:
> 
>> Irrespective of the top/bottom split, one of the key properties to
>> retain is that allocations of type T are predictably assigned a slab
>> cache. This means that even if a pointer-containing object of type T
>> is vulnerable, yet the pointer within T is useless for exploitation,
>> the difficulty of getting to a sensitive object S is still increased
>> by the fact that S is unlikely to be co-located. If we were to
>> introduce more randomness, we increase the probability that S will be
>> co-located with T, which is counter-intuitive to me.

I'm interested in this topic. Let's discuss multiple situations here.

If S doesn't contain a pointer member, then your pointer-containing
object isolation completely separates S from T. No problem, and
nothing to do with randomness.

If S does, then whether they co-locate is completely based on the token
algorithm, which has two problems: 1. The result is deterministic and so
can be known by everyone including the attacker, so the attacker could
analyze the code and try to find out an S suitable for being exploited.
And 2. once such T & S exist, we can't interfere with the algorithm, and
the defense fails for all builds (of the same or nearby kernel versions
at least).

Here I think randomness could help: its value is not just about
separating things based on probability, but more about blinding the
attacker. In this scenario, with randomness we could leave the attacker
unable to find a suitable S, so they couldn't exploit it even
though such S & T exist. As you mentioned (somewhere else), the attacker
might still be able to "take off the eye mask" and locate S & T by some
other methods, e.g. analyzing the resource information at runtime, but
randomness is not to blame for that. We could do something else about that
(e.g. show less information for random-candidate slab caches), and that's another story.

> 
> I think we can reason either way, and I grant you this is rather ambiguous.
> 
> But the definitive point that was made to me from various security
> researchers that inspired this technique is that the most useful thing
> we can do is separate pointer-containing objects from
> non-pointer-containing objects (in absence of slab per type, which is
> likely too costly in the common case).

Isolating pointer-containing objects is the key point indeed. And for me
it's orthogonal to randomness, and they can be combined to achieve
better hardening solutions.

