* [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches
@ 2024-08-09  7:33 Kees Cook
  2024-08-09  7:33 ` [PATCH 1/5] slab: Introduce kmem_buckets_destroy() Kees Cook
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Kees Cook @ 2024-08-09  7:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Suren Baghdasaryan, Kent Overstreet, GONG, Ruiqi,
	Jann Horn, Matteo Rizzo, jvoisin, Xiu Jianfeng, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-kernel, linux-mm,
	linux-hardening

Hi,

Here's my current progress on using per-call-site kmalloc caches (instead
of KMALLOC_NORMAL), as a defense against the common heap-grooming attacks
that construct malicious objects in the same cache as a target object.

I'd like to get feedback on the general approach before I continue with
it. I've noted in the later patches what additional improvements I'd
like to make. The first 3 patches are relatively small infrastructure
changes.
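
For readers new to the problem, a rough userspace sketch of the
difference (not kernel code; every toy_* name is made up for
illustration, and the power-of-two size classes are a simplification):
with a shared cache the slab is picked by size class alone, so an
attacker-controlled allocation of a similar size lands next to the
target, while with per-site caches only the same call site shares.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_site {
	const char *function;	/* call site identity */
	int lineno;
};

/* Shared-cache model: the cache is chosen by size class alone. */
static size_t toy_shared_cache_id(size_t size)
{
	size_t cls = 8;

	while (cls < size)
		cls <<= 1;
	return cls;
}

/* Per-site model: two allocations share a cache only if the site matches. */
static int toy_same_site_cache(const struct toy_site *a,
			       const struct toy_site *b)
{
	return a->lineno == b->lineno &&
	       strcmp(a->function, b->function) == 0;
}
```

In this model a 100-byte and a 128-byte request map to the same shared
class, but two distinct call sites never compare equal, regardless of
size.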

Thanks!

-Kees

Kees Cook (5):
  slab: Introduce kmem_buckets_destroy()
  codetag: Run module_load hooks for builtin codetags
  codetag: Introduce codetag_early_walk()
  alloc_tag: Track fixed vs dynamic sized kmalloc calls
  slab: Allocate and use per-call-site caches

 include/linux/alloc_tag.h |  38 +++++++++--
 include/linux/codetag.h   |   2 +
 include/linux/slab.h      |  17 ++---
 lib/alloc_tag.c           | 129 +++++++++++++++++++++++++++++++++++---
 lib/codetag.c             |  21 +++++--
 mm/Kconfig                |  25 ++++++++
 mm/slab_common.c          |  18 +++++-
 mm/slub.c                 |  31 ++++++++-
 8 files changed, 253 insertions(+), 28 deletions(-)

-- 
2.34.1




* [PATCH 1/5] slab: Introduce kmem_buckets_destroy()
  2024-08-09  7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
@ 2024-08-09  7:33 ` Kees Cook
  2024-08-09  7:33 ` [PATCH 2/5] codetag: Run module_load hooks for builtin codetags Kees Cook
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Kees Cook @ 2024-08-09  7:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo,
	linux-mm, Suren Baghdasaryan, Kent Overstreet, GONG, Ruiqi,
	Jann Horn, Matteo Rizzo, jvoisin, Xiu Jianfeng, linux-kernel,
	linux-hardening

Modular use of kmem_buckets_create() means that kmem_buckets will need
to be removed as well. Introduce kmem_buckets_destroy(), matching
kmem_cache_destroy().
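
As a rough illustration of why a shared teardown helper is convenient,
here is a userspace sketch (toy_* names and TOY_NR_BUCKETS are
assumptions, not kernel API). It relies on the per-cache destroy being a
no-op for NULL, as kmem_cache_destroy() is, so a partially populated
set can be torn down by the same function on the create failure path.

```c
#include <stdlib.h>

#define TOY_NR_BUCKETS 4	/* hypothetical; stands in for the size classes */

typedef void *toy_buckets[TOY_NR_BUCKETS];

static int toy_live_caches;	/* counts outstanding per-bucket caches */

static void toy_cache_destroy(void *cache)
{
	if (!cache)		/* NULL is a no-op, like kmem_cache_destroy() */
		return;
	toy_live_caches--;
	free(cache);
}

/* Same shape as kmem_buckets_destroy(): tolerate NULL, free every slot. */
static void toy_buckets_destroy(toy_buckets *b)
{
	int idx;

	if (!b)
		return;
	for (idx = 0; idx < TOY_NR_BUCKETS; idx++)
		toy_cache_destroy((*b)[idx]);
	free(b);
}

/* fail_at < 0 means no simulated failure. */
static toy_buckets *toy_buckets_create(int fail_at)
{
	toy_buckets *b = calloc(1, sizeof(*b));
	int idx;

	if (!b)
		return NULL;
	for (idx = 0; idx < TOY_NR_BUCKETS; idx++) {
		if (idx == fail_at)
			goto fail;
		(*b)[idx] = malloc(1);
		if (!(*b)[idx])
			goto fail;
		toy_live_caches++;
	}
	return b;

fail:
	/* Partially filled set: the same teardown skips the NULL slots. */
	toy_buckets_destroy(b);
	return NULL;
}
```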

Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
---
 include/linux/slab.h |  1 +
 mm/slab_common.c     | 17 ++++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index eb2bf4629157..86cb61a0102c 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -552,6 +552,7 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
 kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
 				  unsigned int useroffset, unsigned int usersize,
 				  void (*ctor)(void *));
+void kmem_buckets_destroy(kmem_buckets *b);
 
 /*
  * Bulk allocation and freeing operations. These are accelerated in an
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 40b582a014b8..fc698cba0ebe 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -392,6 +392,19 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
 }
 EXPORT_SYMBOL(kmem_cache_create);
 
+void kmem_buckets_destroy(kmem_buckets *b)
+{
+	int idx;
+
+	if (!b)
+		return;
+
+	for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++)
+		kmem_cache_destroy((*b)[idx]);
+	kfree(b);
+}
+EXPORT_SYMBOL(kmem_buckets_destroy);
+
 static struct kmem_cache *kmem_buckets_cache __ro_after_init;
 
 /**
@@ -476,9 +489,7 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
 	return b;
 
 fail:
-	for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++)
-		kmem_cache_destroy((*b)[idx]);
-	kfree(b);
+	kmem_buckets_destroy(b);
 
 	return NULL;
 }
-- 
2.34.1




* [PATCH 2/5] codetag: Run module_load hooks for builtin codetags
  2024-08-09  7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
  2024-08-09  7:33 ` [PATCH 1/5] slab: Introduce kmem_buckets_destroy() Kees Cook
@ 2024-08-09  7:33 ` Kees Cook
  2024-08-29 15:02   ` Suren Baghdasaryan
  2024-08-09  7:33 ` [PATCH 3/5] codetag: Introduce codetag_early_walk() Kees Cook
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Kees Cook @ 2024-08-09  7:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Suren Baghdasaryan, Kent Overstreet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn,
	Matteo Rizzo, jvoisin, Xiu Jianfeng, linux-kernel,
	linux-hardening

The module_load callback should still run for builtin codetags that
define it, even in a non-modular kernel (i.e. for the
cmod->mod == NULL case).

Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
---
 lib/codetag.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/lib/codetag.c b/lib/codetag.c
index 5ace625f2328..ef7634c7ee18 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -125,7 +125,6 @@ static inline size_t range_size(const struct codetag_type *cttype,
 			cttype->desc.tag_size;
 }
 
-#ifdef CONFIG_MODULES
 static void *get_symbol(struct module *mod, const char *prefix, const char *name)
 {
 	DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
@@ -199,6 +198,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 	return 0;
 }
 
+#ifdef CONFIG_MODULES
 void codetag_load_module(struct module *mod)
 {
 	struct codetag_type *cttype;
@@ -248,9 +248,6 @@ bool codetag_unload_module(struct module *mod)
 
 	return unload_ok;
 }
-
-#else /* CONFIG_MODULES */
-static int codetag_module_init(struct codetag_type *cttype, struct module *mod) { return 0; }
 #endif /* CONFIG_MODULES */
 
 struct codetag_type *
-- 
2.34.1




* [PATCH 3/5] codetag: Introduce codetag_early_walk()
  2024-08-09  7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
  2024-08-09  7:33 ` [PATCH 1/5] slab: Introduce kmem_buckets_destroy() Kees Cook
  2024-08-09  7:33 ` [PATCH 2/5] codetag: Run module_load hooks for builtin codetags Kees Cook
@ 2024-08-09  7:33 ` Kees Cook
  2024-08-29 15:39   ` Suren Baghdasaryan
  2024-08-09  7:33 ` [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls Kees Cook
  2024-08-09  7:33 ` [PATCH 5/5] slab: Allocate and use per-call-site caches Kees Cook
  4 siblings, 1 reply; 17+ messages in thread
From: Kees Cook @ 2024-08-09  7:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Suren Baghdasaryan, Kent Overstreet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn,
	Matteo Rizzo, jvoisin, Xiu Jianfeng, linux-kernel,
	linux-hardening

In order to process builtin alloc_tags much earlier during boot (before
codetag_register_type() is processed), provide codetag_early_walk(),
which performs a lockless walk with a specified callback function. This
will be used to allocate required caches that cannot be allocated on
demand.
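
A userspace sketch of the walk, for illustration only (the toy_* types
are stand-ins, not the real struct codetag layout): the range is treated
as an array of tag_size-byte records, each beginning with the generic
tag, and no lock is taken on the assumption that nothing else touches
the range this early.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct codetag and a larger tag embedding it. */
struct toy_codetag {
	const char *function;
	int lineno;
};

struct toy_alloc_tag {
	struct toy_codetag ct;	/* must be first so the cast below is valid */
	size_t sized;
};

/*
 * Walk [start, stop) in steps of tag_size, calling callback on each entry.
 * No locking: this assumes nothing else touches the range yet.
 */
static void toy_early_walk(void *start, void *stop, size_t tag_size,
			   void (*callback)(struct toy_codetag *ct))
{
	char *p;

	if (!start || !stop || (char *)start >= (char *)stop || !tag_size)
		return;

	for (p = start; p < (char *)stop; p += tag_size)
		callback((struct toy_codetag *)p);
}

static int toy_seen;

/* Example callback: count the entries that carry a function name. */
static void toy_count(struct toy_codetag *ct)
{
	if (ct->function)
		toy_seen++;
}
```

Stepping by tag_size rather than sizeof(struct toy_codetag) is what lets
one walker serve any tag type that embeds the generic tag first.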

Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
---
 include/linux/codetag.h |  2 ++
 lib/codetag.c           | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c2a579ccd455..9eb1fcd90570 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -64,6 +64,8 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
 bool codetag_trylock_module_list(struct codetag_type *cttype);
 struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
 struct codetag *codetag_next_ct(struct codetag_iterator *iter);
+void codetag_early_walk(const struct codetag_type_desc *desc,
+			void (*callback)(struct codetag *ct));
 
 void codetag_to_text(struct seq_buf *out, struct codetag *ct);
 
diff --git a/lib/codetag.c b/lib/codetag.c
index ef7634c7ee18..9d563c8c088a 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -154,6 +154,22 @@ static struct codetag_range get_section_range(struct module *mod,
 	};
 }
 
+void codetag_early_walk(const struct codetag_type_desc *desc,
+			void (*callback)(struct codetag *ct))
+{
+	struct codetag_range range;
+	struct codetag *ct;
+
+	range = get_section_range(NULL, desc->section);
+	if (!range.start || !range.stop ||
+	    range.start == range.stop ||
+	    range.start > range.stop)
+		return;
+
+	for (ct = range.start; ct < range.stop; ct = ((void *)ct + desc->tag_size))
+		callback(ct);
+}
+
 static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 {
 	struct codetag_range range;
-- 
2.34.1




* [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls
  2024-08-09  7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
                   ` (2 preceding siblings ...)
  2024-08-09  7:33 ` [PATCH 3/5] codetag: Introduce codetag_early_walk() Kees Cook
@ 2024-08-09  7:33 ` Kees Cook
  2024-08-29 16:00   ` Suren Baghdasaryan
  2024-08-09  7:33 ` [PATCH 5/5] slab: Allocate and use per-call-site caches Kees Cook
  4 siblings, 1 reply; 17+ messages in thread
From: Kees Cook @ 2024-08-09  7:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Suren Baghdasaryan, Kent Overstreet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn,
	Matteo Rizzo, jvoisin, Xiu Jianfeng, linux-kernel,
	linux-hardening

For slab allocations, record whether the call site is using a fixed
size (i.e. compile time constant) or a dynamic size. Report the results
in /proc/allocinfo.
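
A small sketch of the classification, assuming GCC or Clang for
__builtin_constant_p() (TOY_SIZED and toy_size_kind are hypothetical
names; the encoding follows the comment in struct alloc_meta: 0 is
non-slab, SIZE_MAX is dynamic, anything else is the fixed size):

```c
#include <stddef.h>
#include <stdint.h>

/* Same encoding as the patch: SIZE_MAX marks a dynamic size. */
#define TOY_SIZED(_size) \
	(__builtin_constant_p(_size) ? (size_t)(_size) : SIZE_MAX)

/* Map the recorded value back to the label shown in the report. */
static const char *toy_size_kind(size_t sized)
{
	if (sized == 0)
		return "non-slab";
	return sized == SIZE_MAX ? "dynamic" : "fixed";
}
```

Note that an optimizing compiler may treat a local variable with a known
value as constant inside a function body, so what counts as "fixed" can
depend on context; in a static initializer, as used for the tag, only
true constant expressions qualify.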

Improvements needed:
- examine realloc routines for needed coverage

Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
---
 include/linux/alloc_tag.h | 30 ++++++++++++++++++++++++++----
 include/linux/slab.h      | 16 ++++++++--------
 lib/alloc_tag.c           |  8 ++++++++
 mm/Kconfig                |  8 ++++++++
 4 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 8c61ccd161ba..f5d8c5849b82 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -20,6 +20,19 @@ struct alloc_tag_counters {
 	u64 calls;
 };
 
+#ifdef CONFIG_SLAB_PER_SITE
+struct alloc_meta {
+	/* 0 means non-slab, SIZE_MAX means dynamic, and everything else is fixed-size. */
+	size_t sized;
+};
+#define ALLOC_META_INIT(_size)	{		\
+		.sized = (__builtin_constant_p(_size) ? (_size) : SIZE_MAX), \
+	}
+#else
+struct alloc_meta { };
+#define ALLOC_META_INIT(_size)	{ }
+#endif
+
 /*
  * An instance of this structure is created in a special ELF section at every
  * allocation callsite. At runtime, the special section is treated as
@@ -27,6 +40,7 @@ struct alloc_tag_counters {
  */
 struct alloc_tag {
 	struct codetag			ct;
+	struct alloc_meta		meta;
 	struct alloc_tag_counters __percpu	*counters;
 } __aligned(8);
 
@@ -74,19 +88,21 @@ static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
  */
 DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 
-#define DEFINE_ALLOC_TAG(_alloc_tag)						\
+#define DEFINE_ALLOC_TAG(_alloc_tag, _meta_init)				\
 	static struct alloc_tag _alloc_tag __used __aligned(8)			\
 	__section("alloc_tags") = {						\
 		.ct = CODE_TAG_INIT,						\
+		.meta = _meta_init,						\
 		.counters = &_shared_alloc_tag };
 
 #else /* ARCH_NEEDS_WEAK_PER_CPU */
 
-#define DEFINE_ALLOC_TAG(_alloc_tag)						\
+#define DEFINE_ALLOC_TAG(_alloc_tag, _meta_init)				\
 	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
 	static struct alloc_tag _alloc_tag __used __aligned(8)			\
 	__section("alloc_tags") = {						\
 		.ct = CODE_TAG_INIT,						\
+		.meta = _meta_init,						\
 		.counters = &_alloc_tag_cntr };
 
 #endif /* ARCH_NEEDS_WEAK_PER_CPU */
@@ -191,7 +207,7 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
 
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
-#define DEFINE_ALLOC_TAG(_alloc_tag)
+#define DEFINE_ALLOC_TAG(_alloc_tag, _meta_init)
 static inline bool mem_alloc_profiling_enabled(void) { return false; }
 static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
 				 size_t bytes) {}
@@ -210,8 +226,14 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
 
 #define alloc_hooks(_do_alloc)						\
 ({									\
-	DEFINE_ALLOC_TAG(_alloc_tag);					\
+	DEFINE_ALLOC_TAG(_alloc_tag, { });				\
 	alloc_hooks_tag(&_alloc_tag, _do_alloc);			\
 })
 
+#define alloc_sized_hooks(_do_alloc, _size, ...)			\
+({									\
+	DEFINE_ALLOC_TAG(_alloc_tag, ALLOC_META_INIT(_size));		\
+	alloc_hooks_tag(&_alloc_tag, _do_alloc(_size, __VA_ARGS__));	\
+})
+
 #endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 86cb61a0102c..314d24c79e05 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -541,7 +541,7 @@ static_assert(PAGE_SHIFT <= 20);
  */
 void *kmem_cache_alloc_noprof(struct kmem_cache *cachep,
 			      gfp_t flags) __assume_slab_alignment __malloc;
-#define kmem_cache_alloc(...)			alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
+#define kmem_cache_alloc(...)		alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
 
 void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 			    gfp_t gfpflags) __assume_slab_alignment __malloc;
@@ -685,7 +685,7 @@ static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t f
 	}
 	return __kmalloc_noprof(size, flags);
 }
-#define kmalloc(...)				alloc_hooks(kmalloc_noprof(__VA_ARGS__))
+#define kmalloc(size, ...)	alloc_sized_hooks(kmalloc_noprof, size, __VA_ARGS__)
 
 #define kmem_buckets_alloc(_b, _size, _flags)	\
 	alloc_hooks(__kmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
@@ -708,7 +708,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gf
 	}
 	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node);
 }
-#define kmalloc_node(...)			alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))
+#define kmalloc_node(size, ...)		alloc_sized_hooks(kmalloc_node_noprof, size, __VA_ARGS__)
 
 /**
  * kmalloc_array - allocate memory for an array.
@@ -726,7 +726,7 @@ static inline __alloc_size(1, 2) void *kmalloc_array_noprof(size_t n, size_t siz
 		return kmalloc_noprof(bytes, flags);
 	return kmalloc_noprof(bytes, flags);
 }
-#define kmalloc_array(...)			alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
+#define kmalloc_array(...)		alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
 
 /**
  * krealloc_array - reallocate memory for an array.
@@ -761,8 +761,8 @@ void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flag
 					 unsigned long caller) __alloc_size(1);
 #define kmalloc_node_track_caller_noprof(size, flags, node, caller) \
 	__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller)
-#define kmalloc_node_track_caller(...)		\
-	alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))
+#define kmalloc_node_track_caller(size, ...)		\
+	alloc_sized_hooks(kmalloc_node_track_caller_noprof, size, __VA_ARGS__, _RET_IP_)
 
 /*
  * kmalloc_track_caller is a special version of kmalloc that records the
@@ -807,13 +807,13 @@ static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
 {
 	return kmalloc_noprof(size, flags | __GFP_ZERO);
 }
-#define kzalloc(...)				alloc_hooks(kzalloc_noprof(__VA_ARGS__))
+#define kzalloc(size, ...)			alloc_sized_hooks(kzalloc_noprof, size, __VA_ARGS__)
 #define kzalloc_node(_size, _flags, _node)	kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
 
 void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) __alloc_size(1);
 #define kvmalloc_node_noprof(size, flags, node)	\
 	__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node)
-#define kvmalloc_node(...)			alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))
+#define kvmalloc_node(size, ...)		alloc_sized_hooks(kvmalloc_node_noprof, size, __VA_ARGS__)
 
 #define kvmalloc(_size, _flags)			kvmalloc_node(_size, _flags, NUMA_NO_NODE)
 #define kvmalloc_noprof(_size, _flags)		kvmalloc_node_noprof(_size, _flags, NUMA_NO_NODE)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 81e5f9a70f22..6d2cb72bf269 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -78,6 +78,14 @@ static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
 
 	seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
 	codetag_to_text(out, ct);
+#ifdef CONFIG_SLAB_PER_SITE
+	seq_buf_putc(out, ' ');
+	seq_buf_printf(out, "size:%s(%zu) slab:%s",
+				tag->meta.sized == 0 ? "non-slab" :
+					tag->meta.sized == SIZE_MAX ? "dynamic" : "fixed",
+				tag->meta.sized == SIZE_MAX ? 0 : tag->meta.sized,
+				tag->meta.cache ? "ready" : "unused");
+#endif
 	seq_buf_putc(out, ' ');
 	seq_buf_putc(out, '\n');
 }
diff --git a/mm/Kconfig b/mm/Kconfig
index b72e7d040f78..855c63c3270d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -296,6 +296,14 @@ config SLAB_BUCKETS
 
 	  If unsure, say Y.
 
+config SLAB_PER_SITE
+	bool "Separate slab allocations by call site"
+	depends on !SLUB_TINY
+	default SLAB_FREELIST_HARDENED
+	select SLAB_BUCKETS
+	help
+	  Track sizes of kmalloc() call sites.
+
 config SLUB_STATS
 	default n
 	bool "Enable performance statistics"
-- 
2.34.1




* [PATCH 5/5] slab: Allocate and use per-call-site caches
  2024-08-09  7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
                   ` (3 preceding siblings ...)
  2024-08-09  7:33 ` [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls Kees Cook
@ 2024-08-09  7:33 ` Kees Cook
  2024-08-17  1:30   ` Xiu Jianfeng
  2024-08-29 17:03   ` Suren Baghdasaryan
  4 siblings, 2 replies; 17+ messages in thread
From: Kees Cook @ 2024-08-09  7:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Suren Baghdasaryan, Kent Overstreet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn,
	Matteo Rizzo, jvoisin, Xiu Jianfeng, linux-kernel,
	linux-hardening

Use separate per-call-site kmem_cache or kmem_buckets. These are
allocated on demand to avoid wasting memory for unused caches.
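
The on-demand pattern can be sketched in userspace with C11 atomics (an
illustration only; the toy_* names are assumptions and the counters are
not thread-safe). The cache is created without holding a lock and then
published with a compare-and-exchange; a thread that loses the race
destroys its own copy and uses the winner's.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-site slot; stands in for the tag's cache pointer. */
struct toy_site_slot {
	_Atomic(void *) cache;
};

/* Plain counters for the demo only; not safe across threads. */
static int toy_created, toy_destroyed;

static void *toy_site_cache_create(void)
{
	toy_created++;
	return malloc(1);
}

static void toy_site_cache_destroy(void *cache)
{
	toy_destroyed++;
	free(cache);
}

/*
 * Return the site's cache, creating it on first use. If another thread
 * publishes a cache first, discard ours and use the winner's.
 */
static void *toy_site_cache(struct toy_site_slot *slot)
{
	void *old = atomic_load(&slot->cache);
	void *p;

	if (old)
		return old;

	p = toy_site_cache_create();
	if (!p)
		return NULL;	/* caller falls back to a shared cache */

	old = NULL;
	if (!atomic_compare_exchange_strong(&slot->cache, &old, p)) {
		toy_site_cache_destroy(p);	/* lost the race */
		return old;	/* now holds the winner's pointer */
	}
	return p;
}
```

The trade-off is that a racing thread may do a wasted create and
destroy, in exchange for keeping the fast path to a single load.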

A few caches need to be allocated very early to support allocating the
caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
GFP_ATOMIC allocations are currently left to be allocated from
KMALLOC_NORMAL.

With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.

Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
text that compares the features.

Improvements needed:
- Retain call site gfp flags in alloc_tag meta field to:
  - pre-allocate all GFP_ATOMIC caches (since their caches cannot
    be allocated on demand unless we want them to be GFP_ATOMIC
    themselves...)
  - Separate MEMCG allocations as well
- Allocate individual caches within kmem_buckets on demand to
  further reduce memory usage overhead.

Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
---
 include/linux/alloc_tag.h |   8 +++
 lib/alloc_tag.c           | 121 +++++++++++++++++++++++++++++++++++---
 mm/Kconfig                |  19 +++++-
 mm/slab_common.c          |   1 +
 mm/slub.c                 |  31 +++++++++-
 5 files changed, 170 insertions(+), 10 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index f5d8c5849b82..c95628f9b049 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -24,6 +24,7 @@ struct alloc_tag_counters {
 struct alloc_meta {
 	/* 0 means non-slab, SIZE_MAX means dynamic, and everything else is fixed-size. */
 	size_t sized;
+	void *cache;
 };
 #define ALLOC_META_INIT(_size)	{		\
 		.sized = (__builtin_constant_p(_size) ? (_size) : SIZE_MAX), \
@@ -216,6 +217,13 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
+#ifdef CONFIG_SLAB_PER_SITE
+void alloc_tag_early_walk(void);
+void alloc_tag_site_init(struct codetag *ct, bool ondemand);
+#else
+static inline void alloc_tag_early_walk(void) {}
+#endif
+
 #define alloc_hooks_tag(_tag, _do_alloc)				\
 ({									\
 	struct alloc_tag * __maybe_unused _old = alloc_tag_save(_tag);	\
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 6d2cb72bf269..e8a66a7c4a6b 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -157,6 +157,89 @@ static void __init procfs_init(void)
 	proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
 }
 
+#ifdef CONFIG_SLAB_PER_SITE
+static bool ondemand_ready;
+
+void alloc_tag_site_init(struct codetag *ct, bool ondemand)
+{
+	struct alloc_tag *tag = ct_to_alloc_tag(ct);
+	char *name;
+	void *p, *old;
+
+	/* Only handle kmalloc allocations. */
+	if (!tag->meta.sized)
+		return;
+
+	/* Must be ready for on-demand allocations. */
+	if (ondemand && !ondemand_ready)
+		return;
+
+	old = READ_ONCE(tag->meta.cache);
+	/* Already allocated? */
+	if (old)
+		return;
+
+	if (tag->meta.sized < SIZE_MAX) {
+		/* Fixed-size allocations. */
+		name = kasprintf(GFP_KERNEL, "f:%zu:%s:%d", tag->meta.sized, ct->function, ct->lineno);
+		if (WARN_ON_ONCE(!name))
+			return;
+		/*
+		 * As with KMALLOC_NORMAL, the entire allocation needs to be
+		 * open to usercopy access. :(
+		 */
+		p = kmem_cache_create_usercopy(name, tag->meta.sized, 0,
+					       SLAB_NO_MERGE, 0, tag->meta.sized,
+					       NULL);
+	} else {
+		/* Dynamically-sized allocations. */
+		name = kasprintf(GFP_KERNEL, "d:%s:%d", ct->function, ct->lineno);
+		if (WARN_ON_ONCE(!name))
+			return;
+		p = kmem_buckets_create(name, SLAB_NO_MERGE, 0, UINT_MAX, NULL);
+	}
+	if (p) {
+		if (unlikely(!try_cmpxchg(&tag->meta.cache, &old, p))) {
+			/* We lost the allocation race; clean up. */
+			if (tag->meta.sized < SIZE_MAX)
+				kmem_cache_destroy(p);
+			else
+				kmem_buckets_destroy(p);
+		}
+	}
+	kfree(name);
+}
+
+static void alloc_tag_site_init_early(struct codetag *ct)
+{
+	/* Explicitly initialize the caches needed to initialize caches. */
+	if (strcmp(ct->function, "kstrdup") == 0 ||
+	    strcmp(ct->function, "kvasprintf") == 0 ||
+	    strcmp(ct->function, "pcpu_mem_zalloc") == 0)
+		alloc_tag_site_init(ct, false);
+
+	/* TODO: pre-allocate GFP_ATOMIC caches here. */
+}
+#endif
+
+static void alloc_tag_module_load(struct codetag_type *cttype,
+				  struct codetag_module *cmod)
+{
+#ifdef CONFIG_SLAB_PER_SITE
+	struct codetag_iterator iter;
+	struct codetag *ct;
+
+	iter = codetag_get_ct_iter(cttype);
+	for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
+		if (iter.cmod != cmod)
+			continue;
+
+		/* TODO: pre-allocate GFP_ATOMIC caches here. */
+		//alloc_tag_site_init(ct, false);
+	}
+#endif
+}
+
 static bool alloc_tag_module_unload(struct codetag_type *cttype,
 				    struct codetag_module *cmod)
 {
@@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
 
 		if (WARN(counter.bytes,
 			 "%s:%u module %s func:%s has %llu allocated at module unload",
-			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
+			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
 			module_unused = false;
+		}
+#ifdef CONFIG_SLAB_PER_SITE
+		else if (tag->meta.sized) {
+			/* Remove the allocated caches, if possible. */
+			void *p = READ_ONCE(tag->meta.cache);
+
+			WRITE_ONCE(tag->meta.cache, NULL);
+			if (tag->meta.sized < SIZE_MAX)
+				kmem_cache_destroy(p);
+			else
+				kmem_buckets_destroy(p);
+		}
+#endif
 	}
 
 	return module_unused;
@@ -260,15 +356,16 @@ static void __init sysctl_init(void)
 static inline void sysctl_init(void) {}
 #endif /* CONFIG_SYSCTL */
 
+static const struct codetag_type_desc alloc_tag_desc = {
+	.section	= "alloc_tags",
+	.tag_size	= sizeof(struct alloc_tag),
+	.module_load	= alloc_tag_module_load,
+	.module_unload	= alloc_tag_module_unload,
+};
+
 static int __init alloc_tag_init(void)
 {
-	const struct codetag_type_desc desc = {
-		.section	= "alloc_tags",
-		.tag_size	= sizeof(struct alloc_tag),
-		.module_unload	= alloc_tag_module_unload,
-	};
-
-	alloc_tag_cttype = codetag_register_type(&desc);
+	alloc_tag_cttype = codetag_register_type(&alloc_tag_desc);
 	if (IS_ERR(alloc_tag_cttype))
 		return PTR_ERR(alloc_tag_cttype);
 
@@ -278,3 +375,11 @@ static int __init alloc_tag_init(void)
 	return 0;
 }
 module_init(alloc_tag_init);
+
+#ifdef CONFIG_SLAB_PER_SITE
+void alloc_tag_early_walk(void)
+{
+	codetag_early_walk(&alloc_tag_desc, alloc_tag_site_init_early);
+	ondemand_ready = true;
+}
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 855c63c3270d..4f01cb6dd32e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -302,7 +302,20 @@ config SLAB_PER_SITE
 	default SLAB_FREELIST_HARDENED
 	select SLAB_BUCKETS
 	help
-	  Track sizes of kmalloc() call sites.
+	  As a defense against shared-cache "type confusion" use-after-free
+	  attacks, every kmalloc()-family call allocates from a separate
+	  kmem_cache (or when dynamically sized, kmem_buckets). Attackers
+	  will no longer be able to groom malicious objects via similarly
+	  sized allocations that share the same cache as the target object.
+
+	  This increases the "at rest" kmalloc slab memory usage by
+	  roughly 5x (around 7MiB), and adds the potential for greater
+	  long-term memory fragmentation. However, some workloads
+	  actually see performance improvements when single allocation
+	  sites are hot.
+
+	  For a similar defense, see CONFIG_RANDOM_KMALLOC_CACHES, which
+	  has less memory usage overhead, but is probabilistic.
 
 config SLUB_STATS
 	default n
@@ -331,6 +344,7 @@ config SLUB_CPU_PARTIAL
 config RANDOM_KMALLOC_CACHES
 	default n
 	depends on !SLUB_TINY
+	depends on !SLAB_PER_SITE
 	bool "Randomize slab caches for normal kmalloc"
 	help
 	  A hardening feature that creates multiple copies of slab caches for
@@ -345,6 +359,9 @@ config RANDOM_KMALLOC_CACHES
 	  limited degree of memory and CPU overhead that relates to hardware and
 	  system workload.
 
+	  For a similar defense, see CONFIG_SLAB_PER_SITE, which is
+	  deterministic, but has greater memory usage overhead.
+
 endmenu # Slab allocator options
 
 config SHUFFLE_PAGE_ALLOCATOR
diff --git a/mm/slab_common.c b/mm/slab_common.c
index fc698cba0ebe..09506bfa972c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1040,6 +1040,7 @@ void __init create_kmalloc_caches(void)
 		kmem_buckets_cache = kmem_cache_create("kmalloc_buckets",
 						       sizeof(kmem_buckets),
 						       0, SLAB_NO_MERGE, NULL);
+	alloc_tag_early_walk();
 }
 
 /**
diff --git a/mm/slub.c b/mm/slub.c
index 3520acaf9afa..d14102c4b4d7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4135,6 +4135,35 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
 }
 EXPORT_SYMBOL(__kmalloc_large_node_noprof);
 
+static __always_inline
+struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
+			       unsigned long caller)
+{
+#ifdef CONFIG_SLAB_PER_SITE
+	struct alloc_tag *tag = current->alloc_tag;
+
+	if (!b && tag && tag->meta.sized &&
+	    kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
+	    (flags & GFP_ATOMIC) != GFP_ATOMIC) {
+		void *p = READ_ONCE(tag->meta.cache);
+
+		if (!p && slab_state >= UP) {
+			alloc_tag_site_init(&tag->ct, true);
+			p = READ_ONCE(tag->meta.cache);
+		}
+
+		if (tag->meta.sized < SIZE_MAX) {
+			if (p)
+				return p;
+			/* Otherwise continue with default buckets. */
+		} else {
+			b = p;
+		}
+	}
+#endif
+	return kmalloc_slab(size, b, flags, caller);
+}
+
 static __always_inline
 void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
 			unsigned long caller)
@@ -4152,7 +4181,7 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
 	if (unlikely(!size))
 		return ZERO_SIZE_PTR;
 
-	s = kmalloc_slab(size, b, flags, caller);
+	s = choose_slab(size, b, flags, caller);
 
 	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
 	ret = kasan_kmalloc(s, ret, size, flags);
-- 
2.34.1




* Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
  2024-08-09  7:33 ` [PATCH 5/5] slab: Allocate and use per-call-site caches Kees Cook
@ 2024-08-17  1:30   ` Xiu Jianfeng
  2024-08-22 17:47     ` Kees Cook
  2024-08-29 17:03   ` Suren Baghdasaryan
  1 sibling, 1 reply; 17+ messages in thread
From: Xiu Jianfeng @ 2024-08-17  1:30 UTC (permalink / raw)
  To: Kees Cook, Vlastimil Babka
  Cc: Suren Baghdasaryan, Kent Overstreet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn,
	Matteo Rizzo, jvoisin, linux-kernel, linux-hardening

Hi Kees,

On 2024/8/9 15:33, Kees Cook wrote:
> Use separate per-call-site kmem_cache or kmem_buckets. These are
> allocated on demand to avoid wasting memory for unused caches.
> 
> A few caches need to be allocated very early to support allocating the
> caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> GFP_ATOMIC allocations are currently left to be allocated from
> KMALLOC_NORMAL.
> 
> With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> 
> Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> text that compares the features.
> 
> Improvements needed:
> - Retain call site gfp flags in alloc_tag meta field to:
>   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
>     be allocated on demand unless we want them to be GFP_ATOMIC
>     themselves...)
>   - Separate MEMCG allocations as well
> - Allocate individual caches within kmem_buckets on demand to
>   further reduce memory usage overhead.
> 
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> ---
>  include/linux/alloc_tag.h |   8 +++
>  lib/alloc_tag.c           | 121 +++++++++++++++++++++++++++++++++++---
>  mm/Kconfig                |  19 +++++-
>  mm/slab_common.c          |   1 +
>  mm/slub.c                 |  31 +++++++++-
>  5 files changed, 170 insertions(+), 10 deletions(-)
> 

[...]

> diff --git a/mm/slub.c b/mm/slub.c
> index 3520acaf9afa..d14102c4b4d7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4135,6 +4135,35 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
>  }
>  EXPORT_SYMBOL(__kmalloc_large_node_noprof);
>  
> +static __always_inline
> +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> +			       unsigned long caller)
> +{
> +#ifdef CONFIG_SLAB_PER_SITE
> +	struct alloc_tag *tag = current->alloc_tag;

I get a compile error here when I test this patchset with
CONFIG_MEM_ALLOC_PROFILING disabled:

mm/slub.c: In function ‘choose_slab’:
mm/slub.c:4187:40: error: ‘struct task_struct’ has no member named ‘alloc_tag’
 4187 |         struct alloc_tag *tag = current->alloc_tag;
      |                                        ^~
  CC      mm/page_reporting.o

Maybe CONFIG_SLAB_PER_SITE should depend on CONFIG_MEM_ALLOC_PROFILING?


> +
> +	if (!b && tag && tag->meta.sized &&
> +	    kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> +	    (flags & GFP_ATOMIC) != GFP_ATOMIC) {
> +		void *p = READ_ONCE(tag->meta.cache);
> +
> +		if (!p && slab_state >= UP) {
> +			alloc_tag_site_init(&tag->ct, true);
> +			p = READ_ONCE(tag->meta.cache);
> +		}
> +
> +		if (tag->meta.sized < SIZE_MAX) {
> +			if (p)
> +				return p;
> +			/* Otherwise continue with default buckets. */
> +		} else {
> +			b = p;
> +		}
> +	}
> +#endif
> +	return kmalloc_slab(size, b, flags, caller);
> +}
> +
>  static __always_inline
>  void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
>  			unsigned long caller)
> @@ -4152,7 +4181,7 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
>  	if (unlikely(!size))
>  		return ZERO_SIZE_PTR;
>  
> -	s = kmalloc_slab(size, b, flags, caller);
> +	s = choose_slab(size, b, flags, caller);
>  
>  	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
>  	ret = kasan_kmalloc(s, ret, size, flags);




* Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
  2024-08-17  1:30   ` Xiu Jianfeng
@ 2024-08-22 17:47     ` Kees Cook
  0 siblings, 0 replies; 17+ messages in thread
From: Kees Cook @ 2024-08-22 17:47 UTC (permalink / raw)
  To: Xiu Jianfeng
  Cc: Vlastimil Babka, Suren Baghdasaryan, Kent Overstreet,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	GONG, Ruiqi, Jann Horn, Matteo Rizzo, jvoisin, linux-kernel,
	linux-hardening

On Sat, Aug 17, 2024 at 09:30:58AM +0800, Xiu Jianfeng wrote:
> Hi Kees,
> 
> On 2024/8/9 15:33, Kees Cook wrote:
> > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > allocated on demand to avoid wasting memory for unused caches.
> > 
> > A few caches need to be allocated very early to support allocating the
> > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > GFP_ATOMIC allocations are currently left to be allocated from
> > KMALLOC_NORMAL.
> > 
> > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> > 
> > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> > text that compares the features.
> > 
> > Improvements needed:
> > - Retain call site gfp flags in alloc_tag meta field to:
> >   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> >     be allocated on demand unless we want them to be GFP_ATOMIC
> >     themselves...)
> >   - Separate MEMCG allocations as well
> > - Allocate individual caches within kmem_buckets on demand to
> >   further reduce memory usage overhead.
> > 
> > Signed-off-by: Kees Cook <kees@kernel.org>
> > ---
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Kent Overstreet <kent.overstreet@linux.dev>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Cc: linux-mm@kvack.org
> > ---
> >  include/linux/alloc_tag.h |   8 +++
> >  lib/alloc_tag.c           | 121 +++++++++++++++++++++++++++++++++++---
> >  mm/Kconfig                |  19 +++++-
> >  mm/slab_common.c          |   1 +
> >  mm/slub.c                 |  31 +++++++++-
> >  5 files changed, 170 insertions(+), 10 deletions(-)
> > 
> 
> [...]
> 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 3520acaf9afa..d14102c4b4d7 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4135,6 +4135,35 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
> >  }
> >  EXPORT_SYMBOL(__kmalloc_large_node_noprof);
> >  
> > +static __always_inline
> > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > +			       unsigned long caller)
> > +{
> > +#ifdef CONFIG_SLAB_PER_SITE
> > +	struct alloc_tag *tag = current->alloc_tag;
> 
> I get a compile error here when I test this patchset with
> CONFIG_MEM_ALLOC_PROFILING disabled:
> 
> mm/slub.c: In function ‘choose_slab’:
> mm/slub.c:4187:40: error: ‘struct task_struct’ has no member named
> ‘alloc_tag’
>  4187 |         struct alloc_tag *tag = current->alloc_tag;
>       |                                        ^~
>   CC      mm/page_reporting.o
> 
> Maybe CONFIG_SLAB_PER_SITE should depend on CONFIG_MEM_ALLOC_PROFILING?

Thanks! I tried to make the Kconfig use the right dependencies, but I
clearly missed something. There is also some weird behavior between
"depends" and "select". I will get this fixed for the next version.

-Kees

-- 
Kees Cook



* Re: [PATCH 2/5] codetag: Run module_load hooks for builtin codetags
  2024-08-09  7:33 ` [PATCH 2/5] codetag: Run module_load hooks for builtin codetags Kees Cook
@ 2024-08-29 15:02   ` Suren Baghdasaryan
  2024-09-11 22:17     ` Kees Cook
  0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-08-29 15:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
>
> The module_load callback should still run for builtin codetags that
> define it, even in a non-modular kernel. (i.e. for the cmod->mod == NULL
> case).
>
> Signed-off-by: Kees Cook <kees@kernel.org>

Hi Kees,
I finally got some time and started reviewing your patches.
Coincidentally I recently posted a fix for this issue at
https://lore.kernel.org/all/20240828231536.1770519-1-surenb@google.com/
Your fix is missing one small part: codetag_module_init() uses
mod->name while struct module is undefined (CONFIG_MODULES=n), so you
should see this build error:

In file included from ./include/linux/kernel.h:31,
                 from ./include/linux/cpumask.h:11,
                 from ./include/linux/smp.h:13,
                 from ./include/linux/lockdep.h:14,
                 from ./include/linux/radix-tree.h:14,
                 from ./include/linux/idr.h:15,
                 from lib/codetag.c:3:
lib/codetag.c: In function ‘codetag_module_init’:
  CC      drivers/acpi/acpica/extrace.o
lib/codetag.c:167:34: error: invalid use of undefined type ‘struct module’
  167 |                         mod ? mod->name : "(built-in)");
      |                                  ^~

Thanks,
Suren.


> ---
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> ---
>  lib/codetag.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/lib/codetag.c b/lib/codetag.c
> index 5ace625f2328..ef7634c7ee18 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -125,7 +125,6 @@ static inline size_t range_size(const struct codetag_type *cttype,
>                         cttype->desc.tag_size;
>  }
>
> -#ifdef CONFIG_MODULES
>  static void *get_symbol(struct module *mod, const char *prefix, const char *name)
>  {
>         DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
> @@ -199,6 +198,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
>         return 0;
>  }
>
> +#ifdef CONFIG_MODULES
>  void codetag_load_module(struct module *mod)
>  {
>         struct codetag_type *cttype;
> @@ -248,9 +248,6 @@ bool codetag_unload_module(struct module *mod)
>
>         return unload_ok;
>  }
> -
> -#else /* CONFIG_MODULES */
> -static int codetag_module_init(struct codetag_type *cttype, struct module *mod) { return 0; }
>  #endif /* CONFIG_MODULES */
>
>  struct codetag_type *
> --
> 2.34.1
>



* Re: [PATCH 3/5] codetag: Introduce codetag_early_walk()
  2024-08-09  7:33 ` [PATCH 3/5] codetag: Introduce codetag_early_walk() Kees Cook
@ 2024-08-29 15:39   ` Suren Baghdasaryan
  2024-09-11 22:18     ` Kees Cook
  0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-08-29 15:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
>
> In order to process builtin alloc_tags much earlier during boot (before
> register_codetag() is processed), provide codetag_early_walk() that
> performs a lockless walk with a specified callback function. This will be
> used to allocate required caches that cannot be allocated on demand.
>
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> ---
>  include/linux/codetag.h |  2 ++
>  lib/codetag.c           | 16 ++++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index c2a579ccd455..9eb1fcd90570 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -64,6 +64,8 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
>  bool codetag_trylock_module_list(struct codetag_type *cttype);
>  struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
>  struct codetag *codetag_next_ct(struct codetag_iterator *iter);
> +void codetag_early_walk(const struct codetag_type_desc *desc,
> +                       void (*callback)(struct codetag *ct));
>
>  void codetag_to_text(struct seq_buf *out, struct codetag *ct);
>
> diff --git a/lib/codetag.c b/lib/codetag.c
> index ef7634c7ee18..9d563c8c088a 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -154,6 +154,22 @@ static struct codetag_range get_section_range(struct module *mod,
>         };
>  }
>
> +void codetag_early_walk(const struct codetag_type_desc *desc,
> +                       void (*callback)(struct codetag *ct))
> +{
> +       struct codetag_range range;
> +       struct codetag *ct;
> +
> +       range = get_section_range(NULL, desc->section);
> +       if (!range.start || !range.stop ||
> +           range.start == range.stop ||
> +           range.start > range.stop)
> +               return;

I think this check can be simplified to:

        if (!range.start || range.start >= range.stop)
                return;

nit: Technically the (!range.start) check is also redundant. In a valid
image these symbols are either both missing (range.start == range.stop ==
NULL), which (range.start >= range.stop) already catches, or both
defined with (range.start < range.stop).
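
As a sanity check on the simplification, here is a minimal userspace
sketch (not kernel code) that compares the posted condition with the
shorter one over a small set of addresses. struct range_model and the
integer addresses are stand-ins invented for this illustration; the
real code works on struct codetag_range pointers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct codetag_range. */
struct range_model {
	uintptr_t start;
	uintptr_t stop;
};

/* The condition as posted in the patch. */
static bool skip_original(struct range_model r)
{
	return !r.start || !r.stop || r.start == r.stop || r.start > r.stop;
}

/* The shorter condition suggested in this review. */
static bool skip_simplified(struct range_model r)
{
	return !r.start || r.start >= r.stop;
}

/* Returns true if both conditions agree for every pair of small addresses. */
static bool checks_agree(void)
{
	for (uintptr_t start = 0; start < 8; start++) {
		for (uintptr_t stop = 0; stop < 8; stop++) {
			struct range_model r = { start, stop };

			if (skip_original(r) != skip_simplified(r))
				return false;
		}
	}
	return true;
}
```

In this model the two conditions only stop being equivalent if
(!r.start) is dropped as well, and then only for start == 0 with a
non-zero stop, which should not occur in a valid image.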

> +
> +       for (ct = range.start; ct < range.stop; ct = ((void *)ct + desc->tag_size))
> +               callback(ct);
> +}
> +
>  static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
>  {
>         struct codetag_range range;
> --
> 2.34.1
>



* Re: [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls
  2024-08-09  7:33 ` [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls Kees Cook
@ 2024-08-29 16:00   ` Suren Baghdasaryan
  2024-09-11 22:23     ` Kees Cook
  0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-08-29 16:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
>
> For slab allocations, record whether the call site is using a fixed
> size (i.e. compile time constant) or a dynamic size. Report the results
> in /proc/allocinfo.
>
> Improvements needed:
> - examine realloc routines for needed coverage
>
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> ---
>  include/linux/alloc_tag.h | 30 ++++++++++++++++++++++++++----
>  include/linux/slab.h      | 16 ++++++++--------
>  lib/alloc_tag.c           |  8 ++++++++
>  mm/Kconfig                |  8 ++++++++
>  4 files changed, 50 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index 8c61ccd161ba..f5d8c5849b82 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -20,6 +20,19 @@ struct alloc_tag_counters {
>         u64 calls;
>  };
>
> +#ifdef CONFIG_SLAB_PER_SITE
> +struct alloc_meta {
> +       /* 0 means non-slab, SIZE_MAX means dynamic, and everything else is fixed-size. */
> +       size_t sized;
> +};
> +#define ALLOC_META_INIT(_size) {               \
> +               .sized = (__builtin_constant_p(_size) ? (_size) : SIZE_MAX), \
> +       }
> +#else
> +struct alloc_meta { };
> +#define ALLOC_META_INIT(_size) { }
> +#endif
> +
>  /*
>   * An instance of this structure is created in a special ELF section at every
>   * allocation callsite. At runtime, the special section is treated as
> @@ -27,6 +40,7 @@ struct alloc_tag_counters {
>   */
>  struct alloc_tag {
>         struct codetag                  ct;
> +       struct alloc_meta               meta;
>         struct alloc_tag_counters __percpu      *counters;
>  } __aligned(8);
>
> @@ -74,19 +88,21 @@ static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
>   */
>  DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
>
> -#define DEFINE_ALLOC_TAG(_alloc_tag)                                           \
> +#define DEFINE_ALLOC_TAG(_alloc_tag, _meta_init)                               \
>         static struct alloc_tag _alloc_tag __used __aligned(8)                  \
>         __section("alloc_tags") = {                                             \
>                 .ct = CODE_TAG_INIT,                                            \
> +               .meta = _meta_init,                                             \
>                 .counters = &_shared_alloc_tag };
>
>  #else /* ARCH_NEEDS_WEAK_PER_CPU */
>
> -#define DEFINE_ALLOC_TAG(_alloc_tag)                                           \
> +#define DEFINE_ALLOC_TAG(_alloc_tag, _meta_init)                               \
>         static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);      \
>         static struct alloc_tag _alloc_tag __used __aligned(8)                  \
>         __section("alloc_tags") = {                                             \
>                 .ct = CODE_TAG_INIT,                                            \
> +               .meta = _meta_init,                                             \
>                 .counters = &_alloc_tag_cntr };
>
>  #endif /* ARCH_NEEDS_WEAK_PER_CPU */
> @@ -191,7 +207,7 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
>
>  #else /* CONFIG_MEM_ALLOC_PROFILING */
>
> -#define DEFINE_ALLOC_TAG(_alloc_tag)
> +#define DEFINE_ALLOC_TAG(_alloc_tag, _meta_init)
>  static inline bool mem_alloc_profiling_enabled(void) { return false; }
>  static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
>                                  size_t bytes) {}
> @@ -210,8 +226,14 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
>
>  #define alloc_hooks(_do_alloc)                                         \
>  ({                                                                     \
> -       DEFINE_ALLOC_TAG(_alloc_tag);                                   \
> +       DEFINE_ALLOC_TAG(_alloc_tag, { });                              \
>         alloc_hooks_tag(&_alloc_tag, _do_alloc);                        \
>  })
>
> +#define alloc_sized_hooks(_do_alloc, _size, ...)                       \
> +({                                                                     \
> +       DEFINE_ALLOC_TAG(_alloc_tag, ALLOC_META_INIT(_size));           \
> +       alloc_hooks_tag(&_alloc_tag, _do_alloc(_size, __VA_ARGS__));    \
> +})
> +
>  #endif /* _LINUX_ALLOC_TAG_H */
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 86cb61a0102c..314d24c79e05 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -541,7 +541,7 @@ static_assert(PAGE_SHIFT <= 20);
>   */
>  void *kmem_cache_alloc_noprof(struct kmem_cache *cachep,
>                               gfp_t flags) __assume_slab_alignment __malloc;
> -#define kmem_cache_alloc(...)                  alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
> +#define kmem_cache_alloc(...)          alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))

nit: this seems like unnecessary churn.

>
>  void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>                             gfp_t gfpflags) __assume_slab_alignment __malloc;
> @@ -685,7 +685,7 @@ static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t f
>         }
>         return __kmalloc_noprof(size, flags);
>  }
> -#define kmalloc(...)                           alloc_hooks(kmalloc_noprof(__VA_ARGS__))
> +#define kmalloc(size, ...)     alloc_sized_hooks(kmalloc_noprof, size, __VA_ARGS__)
>
>  #define kmem_buckets_alloc(_b, _size, _flags)  \
>         alloc_hooks(__kmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
> @@ -708,7 +708,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gf
>         }
>         return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node);
>  }
> -#define kmalloc_node(...)                      alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))
> +#define kmalloc_node(size, ...)                alloc_sized_hooks(kmalloc_node_noprof, size, __VA_ARGS__)
>
>  /**
>   * kmalloc_array - allocate memory for an array.
> @@ -726,7 +726,7 @@ static inline __alloc_size(1, 2) void *kmalloc_array_noprof(size_t n, size_t siz
>                 return kmalloc_noprof(bytes, flags);
>         return kmalloc_noprof(bytes, flags);
>  }
> -#define kmalloc_array(...)                     alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
> +#define kmalloc_array(...)             alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))

ditto.

>
>  /**
>   * krealloc_array - reallocate memory for an array.
> @@ -761,8 +761,8 @@ void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flag
>                                          unsigned long caller) __alloc_size(1);
>  #define kmalloc_node_track_caller_noprof(size, flags, node, caller) \
>         __kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller)
> -#define kmalloc_node_track_caller(...)         \
> -       alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))
> +#define kmalloc_node_track_caller(size, ...)           \
> +       alloc_sized_hooks(kmalloc_node_track_caller_noprof, size, __VA_ARGS__, _RET_IP_)
>
>  /*
>   * kmalloc_track_caller is a special version of kmalloc that records the
> @@ -807,13 +807,13 @@ static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
>  {
>         return kmalloc_noprof(size, flags | __GFP_ZERO);
>  }
> -#define kzalloc(...)                           alloc_hooks(kzalloc_noprof(__VA_ARGS__))
> +#define kzalloc(size, ...)                     alloc_sized_hooks(kzalloc_noprof, size, __VA_ARGS__)
>  #define kzalloc_node(_size, _flags, _node)     kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
>
>  void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) __alloc_size(1);
>  #define kvmalloc_node_noprof(size, flags, node)        \
>         __kvmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node)
> -#define kvmalloc_node(...)                     alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))
> +#define kvmalloc_node(size, ...)               alloc_sized_hooks(kvmalloc_node_noprof, size, __VA_ARGS__)
>
>  #define kvmalloc(_size, _flags)                        kvmalloc_node(_size, _flags, NUMA_NO_NODE)
>  #define kvmalloc_noprof(_size, _flags)         kvmalloc_node_noprof(_size, _flags, NUMA_NO_NODE)
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 81e5f9a70f22..6d2cb72bf269 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -78,6 +78,14 @@ static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
>
>         seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
>         codetag_to_text(out, ct);
> +#ifdef CONFIG_SLAB_PER_SITE
> +       seq_buf_putc(out, ' ');
> +       seq_buf_printf(out, "size:%s(%zu) slab:%s",
> +                               tag->meta.sized == 0 ? "non-slab" :

The term "non-slab" sounds overly specific, since we might extend this
to other allocations in the future. I would suggest "unknown" instead.

> +                                       tag->meta.sized == SIZE_MAX ? "dynamic" : "fixed",
> +                               tag->meta.sized == SIZE_MAX ? 0 : tag->meta.sized,
> +                               tag->meta.cache ? "ready" : "unused");

I don't see "struct alloc_meta" having a "cache" member...

Since you are changing the format of this file, you will also want to
bump the file version inside print_allocinfo_header().
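
Coming back to the label wording, the mapping from meta.sized to the
printed text could look roughly like the sketch below. This is a
userspace illustration only: sized_label() and sized_value() are
hypothetical helper names, and "unknown" is the wording suggested in
this review rather than what the posted patch prints.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Map the meta.sized encoding from this patch to a label:
 * 0 = not tracked, SIZE_MAX = dynamic size, anything else = fixed size.
 */
static const char *sized_label(size_t sized)
{
	if (sized == 0)
		return "unknown";
	return sized == SIZE_MAX ? "dynamic" : "fixed";
}

/* Size to print next to the label; dynamic sites report 0. */
static size_t sized_value(size_t sized)
{
	return sized == SIZE_MAX ? 0 : sized;
}
```

Keeping the mapping in one helper would also keep the nested ternaries
out of the seq_buf_printf() argument list.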


> +#endif
>         seq_buf_putc(out, ' ');
>         seq_buf_putc(out, '\n');
>  }
> diff --git a/mm/Kconfig b/mm/Kconfig
> index b72e7d040f78..855c63c3270d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -296,6 +296,14 @@ config SLAB_BUCKETS
>
>           If unsure, say Y.
>
> +config SLAB_PER_SITE
> +       bool "Separate slab allocations by call size"
> +       depends on !SLUB_TINY
> +       default SLAB_FREELIST_HARDENED
> +       select SLAB_BUCKETS
> +       help
> +         Track sizes of kmalloc() call sites.
> +
>  config SLUB_STATS
>         default n
>         bool "Enable performance statistics"
> --
> 2.34.1
>



* Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
  2024-08-09  7:33 ` [PATCH 5/5] slab: Allocate and use per-call-site caches Kees Cook
  2024-08-17  1:30   ` Xiu Jianfeng
@ 2024-08-29 17:03   ` Suren Baghdasaryan
  2024-09-11 22:30     ` Kees Cook
  1 sibling, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-08-29 17:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
>
> Use separate per-call-site kmem_cache or kmem_buckets. These are
> allocated on demand to avoid wasting memory for unused caches.
>
> A few caches need to be allocated very early to support allocating the
> caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> GFP_ATOMIC allocations are currently left to be allocated from
> KMALLOC_NORMAL.
>
> With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
>
> Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> text that compares the features.
>
> Improvements needed:
> - Retain call site gfp flags in alloc_tag meta field to:
>   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
>     be allocated on demand unless we want them to be GFP_ATOMIC
>     themselves...)

I'm currently working on a feature to identify allocations with
__GFP_ACCOUNT known at compile time (similar to how you handle the
size in the previous patch). Might be something you can reuse/extend.

>   - Separate MEMCG allocations as well

Do you mean allocations with __GFP_ACCOUNT or something else?

> - Allocate individual caches within kmem_buckets on demand to
>   further reduce memory usage overhead.
>
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> ---
>  include/linux/alloc_tag.h |   8 +++
>  lib/alloc_tag.c           | 121 +++++++++++++++++++++++++++++++++++---
>  mm/Kconfig                |  19 +++++-
>  mm/slab_common.c          |   1 +
>  mm/slub.c                 |  31 +++++++++-
>  5 files changed, 170 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index f5d8c5849b82..c95628f9b049 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -24,6 +24,7 @@ struct alloc_tag_counters {
>  struct alloc_meta {
>         /* 0 means non-slab, SIZE_MAX means dynamic, and everything else is fixed-size. */
>         size_t sized;
> +       void *cache;

I see now where that meta.cache in the previous patch came from...
That part should be moved here.

>  };
>  #define ALLOC_META_INIT(_size) {               \
>                 .sized = (__builtin_constant_p(_size) ? (_size) : SIZE_MAX), \
> @@ -216,6 +217,13 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
>
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>
> +#ifdef CONFIG_SLAB_PER_SITE
> +void alloc_tag_early_walk(void);
> +void alloc_tag_site_init(struct codetag *ct, bool ondemand);
> +#else
> +static inline void alloc_tag_early_walk(void) {}
> +#endif
> +
>  #define alloc_hooks_tag(_tag, _do_alloc)                               \
>  ({                                                                     \
>         struct alloc_tag * __maybe_unused _old = alloc_tag_save(_tag);  \
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 6d2cb72bf269..e8a66a7c4a6b 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -157,6 +157,89 @@ static void __init procfs_init(void)
>         proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
>  }
>
> +#ifdef CONFIG_SLAB_PER_SITE
> +static bool ondemand_ready;
> +
> +void alloc_tag_site_init(struct codetag *ct, bool ondemand)
> +{
> +       struct alloc_tag *tag = ct_to_alloc_tag(ct);
> +       char *name;
> +       void *p, *old;
> +
> +       /* Only handle kmalloc allocations. */
> +       if (!tag->meta.sized)
> +               return;
> +
> +       /* Must be ready for on-demand allocations. */
> +       if (ondemand && !ondemand_ready)
> +               return;
> +
> +       old = READ_ONCE(tag->meta.cache);
> +       /* Already allocated? */
> +       if (old)
> +               return;
> +
> +       if (tag->meta.sized < SIZE_MAX) {
> +               /* Fixed-size allocations. */
> +               name = kasprintf(GFP_KERNEL, "f:%zu:%s:%d", tag->meta.sized, ct->function, ct->lineno);
> +               if (WARN_ON_ONCE(!name))
> +                       return;
> +               /*
> +                * As with KMALLOC_NORMAL, the entire allocation needs to be
> +                * open to usercopy access. :(
> +                */
> +               p = kmem_cache_create_usercopy(name, tag->meta.sized, 0,
> +                                              SLAB_NO_MERGE, 0, tag->meta.sized,
> +                                              NULL);
> +       } else {
> +               /* Dynamically-size allocations. */
> +               name = kasprintf(GFP_KERNEL, "d:%s:%d", ct->function, ct->lineno);
> +               if (WARN_ON_ONCE(!name))
> +                       return;
> +               p = kmem_buckets_create(name, SLAB_NO_MERGE, 0, UINT_MAX, NULL);
> +       }
> +       if (p) {
> +               if (unlikely(!try_cmpxchg(&tag->meta.cache, &old, p))) {
> +                       /* We lost the allocation race; clean up. */
> +                       if (tag->meta.sized < SIZE_MAX)
> +                               kmem_cache_destroy(p);
> +                       else
> +                               kmem_buckets_destroy(p);
> +               }
> +       }
> +       kfree(name);
> +}
> +
> +static void alloc_tag_site_init_early(struct codetag *ct)
> +{
> +       /* Explicitly initialize the caches needed to initialize caches. */
> +       if (strcmp(ct->function, "kstrdup") == 0 ||
> +           strcmp(ct->function, "kvasprintf") == 0 ||
> +           strcmp(ct->function, "pcpu_mem_zalloc") == 0)

I hope we can find a better way to distinguish these allocations.
Maybe have a specialized hook for them, like alloc_hooks_early() which
sets a bit inside ct->flags to distinguish them?

> +               alloc_tag_site_init(ct, false);
> +
> +       /* TODO: pre-allocate GFP_ATOMIC caches here. */

You could pre-allocate GFP_ATOMIC caches during
alloc_tag_module_load() only if gfp_flags are known at compile time I
think. I guess for the dynamic case choose_slab() will fall back to
kmalloc_slab()?

> +}
> +#endif
> +
> +static void alloc_tag_module_load(struct codetag_type *cttype,
> +                                 struct codetag_module *cmod)
> +{
> +#ifdef CONFIG_SLAB_PER_SITE
> +       struct codetag_iterator iter;
> +       struct codetag *ct;
> +
> +       iter = codetag_get_ct_iter(cttype);
> +       for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
> +               if (iter.cmod != cmod)
> +                       continue;
> +
> +               /* TODO: pre-allocate GFP_ATOMIC caches here. */
> +               //alloc_tag_site_init(ct, false);
> +       }
> +#endif
> +}
> +
>  static bool alloc_tag_module_unload(struct codetag_type *cttype,
>                                     struct codetag_module *cmod)
>  {
> @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
>
>                 if (WARN(counter.bytes,
>                          "%s:%u module %s func:%s has %llu allocated at module unload",
> -                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> +                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
>                         module_unused = false;
> +               }
> +#ifdef CONFIG_SLAB_PER_SITE
> +               else if (tag->meta.sized) {
> +                       /* Remove the allocated caches, if possible. */
> +                       void *p = READ_ONCE(tag->meta.cache);
> +
> +                       WRITE_ONCE(tag->meta.cache, NULL);

I'm guessing you are not using try_cmpxchg() the same way you did in
alloc_tag_site_init() because a race with any other user is impossible
at the module unload time? If so, a comment mentioning that would be
good.

> +                       if (tag->meta.sized < SIZE_MAX)
> +                               kmem_cache_destroy(p);
> +                       else
> +                               kmem_buckets_destroy(p);
> +               }
> +#endif
>         }
>
>         return module_unused;
> @@ -260,15 +356,16 @@ static void __init sysctl_init(void)
>  static inline void sysctl_init(void) {}
>  #endif /* CONFIG_SYSCTL */
>
> +static const struct codetag_type_desc alloc_tag_desc = {
> +       .section        = "alloc_tags",
> +       .tag_size       = sizeof(struct alloc_tag),
> +       .module_load    = alloc_tag_module_load,
> +       .module_unload  = alloc_tag_module_unload,
> +};
> +
>  static int __init alloc_tag_init(void)
>  {
> -       const struct codetag_type_desc desc = {
> -               .section        = "alloc_tags",
> -               .tag_size       = sizeof(struct alloc_tag),
> -               .module_unload  = alloc_tag_module_unload,
> -       };
> -
> -       alloc_tag_cttype = codetag_register_type(&desc);
> +       alloc_tag_cttype = codetag_register_type(&alloc_tag_desc);
>         if (IS_ERR(alloc_tag_cttype))
>                 return PTR_ERR(alloc_tag_cttype);
>
> @@ -278,3 +375,11 @@ static int __init alloc_tag_init(void)
>         return 0;
>  }
>  module_init(alloc_tag_init);
> +
> +#ifdef CONFIG_SLAB_PER_SITE
> +void alloc_tag_early_walk(void)
> +{
> +       codetag_early_walk(&alloc_tag_desc, alloc_tag_site_init_early);
> +       ondemand_ready = true;
> +}
> +#endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 855c63c3270d..4f01cb6dd32e 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -302,7 +302,20 @@ config SLAB_PER_SITE
>         default SLAB_FREELIST_HARDENED
>         select SLAB_BUCKETS
>         help
> -         Track sizes of kmalloc() call sites.
> +         As a defense against shared-cache "type confusion" use-after-free
> +         attacks, every kmalloc()-family call allocates from a separate
> +         kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> +         will no longer be able to groom malicious objects via similarly
> +         sized allocations that share the same cache as the target object.
> +
> +         This increases the "at rest" kmalloc slab memory usage by
> +         roughly 5x (around 7MiB), and adds the potential for greater
> +         long-term memory fragmentation. However, some workloads
> +         actually see performance improvements when single allocation
> +         sites are hot.

I hope you provide the performance and overhead data in the cover
letter when you post v1.

> +
> +         For a similar defense, see CONFIG_RANDOM_KMALLOC_CACHES, which
> +         has less memory usage overhead, but is probabilistic.
>
>  config SLUB_STATS
>         default n
> @@ -331,6 +344,7 @@ config SLUB_CPU_PARTIAL
>  config RANDOM_KMALLOC_CACHES
>         default n
>         depends on !SLUB_TINY
> +       depends on !SLAB_PER_SITE
>         bool "Randomize slab caches for normal kmalloc"
>         help
>           A hardening feature that creates multiple copies of slab caches for
> @@ -345,6 +359,9 @@ config RANDOM_KMALLOC_CACHES
>           limited degree of memory and CPU overhead that relates to hardware and
>           system workload.
>
> +         For a similar defense, see CONFIG_SLAB_PER_SITE, which is
> +         deterministic, but has greater memory usage overhead.
> +
>  endmenu # Slab allocator options
>
>  config SHUFFLE_PAGE_ALLOCATOR
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index fc698cba0ebe..09506bfa972c 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1040,6 +1040,7 @@ void __init create_kmalloc_caches(void)
>                 kmem_buckets_cache = kmem_cache_create("kmalloc_buckets",
>                                                        sizeof(kmem_buckets),
>                                                        0, SLAB_NO_MERGE, NULL);
> +       alloc_tag_early_walk();
>  }
>
>  /**
> diff --git a/mm/slub.c b/mm/slub.c
> index 3520acaf9afa..d14102c4b4d7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4135,6 +4135,35 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
>  }
>  EXPORT_SYMBOL(__kmalloc_large_node_noprof);
>
> +static __always_inline
> +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> +                              unsigned long caller)
> +{
> +#ifdef CONFIG_SLAB_PER_SITE
> +       struct alloc_tag *tag = current->alloc_tag;
> +
> +       if (!b && tag && tag->meta.sized &&
> +           kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> +           (flags & GFP_ATOMIC) != GFP_ATOMIC) {

What if the allocation is GFP_ATOMIC but a previous allocation from the
same location (same tag) happened without GFP_ATOMIC and
tag->meta.cache was already allocated? Why not use that existing cache?
The same applies if tag->meta.cache was pre-allocated.


> +               void *p = READ_ONCE(tag->meta.cache);
> +
> +               if (!p && slab_state >= UP) {
> +                       alloc_tag_site_init(&tag->ct, true);
> +                       p = READ_ONCE(tag->meta.cache);
> +               }
> +
> +               if (tag->meta.sized < SIZE_MAX) {
> +                       if (p)
> +                               return p;
> +                       /* Otherwise continue with default buckets. */
> +               } else {
> +                       b = p;
> +               }
> +       }
> +#endif
> +       return kmalloc_slab(size, b, flags, caller);
> +}
> +
>  static __always_inline
>  void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
>                         unsigned long caller)
> @@ -4152,7 +4181,7 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
>         if (unlikely(!size))
>                 return ZERO_SIZE_PTR;
>
> -       s = kmalloc_slab(size, b, flags, caller);
> +       s = choose_slab(size, b, flags, caller);
>
>         ret = slab_alloc_node(s, NULL, flags, node, caller, size);
>         ret = kasan_kmalloc(s, ret, size, flags);
> --
> 2.34.1
>



* Re: [PATCH 2/5] codetag: Run module_load hooks for builtin codetags
  2024-08-29 15:02   ` Suren Baghdasaryan
@ 2024-09-11 22:17     ` Kees Cook
  0 siblings, 0 replies; 17+ messages in thread
From: Kees Cook @ 2024-09-11 22:17 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Thu, Aug 29, 2024 at 08:02:13AM -0700, Suren Baghdasaryan wrote:
> On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
> >
> > The module_load callback should still run for builtin codetags that
> > define it, even in a non-modular kernel. (i.e. for the cmod->mod == NULL
> > case).
> >
> > Signed-off-by: Kees Cook <kees@kernel.org>
> 
> Hi Kees,
> I finally got some time and started reviewing your patches.
> Coincidentally I recently posted a fix for this issue at
> https://lore.kernel.org/all/20240828231536.1770519-1-surenb@google.com/
> Your fix is missing a small part when codetag_module_init() is using
> mod->name while struct module is undefined (CONFIG_MODULES=n) and you
> should see this build error:
> 
> In file included from ./include/linux/kernel.h:31,
>                  from ./include/linux/cpumask.h:11,
>                  from ./include/linux/smp.h:13,
>                  from ./include/linux/lockdep.h:14,
>                  from ./include/linux/radix-tree.h:14,
>                  from ./include/linux/idr.h:15,
>                  from lib/codetag.c:3:
> lib/codetag.c: In function ‘codetag_module_init’:
>   CC      drivers/acpi/acpica/extrace.o
> lib/codetag.c:167:34: error: invalid use of undefined type ‘struct module’
>   167 |                         mod ? mod->name : "(built-in)");
>       |                                  ^~

Ah-ha! Excellent. Thanks; I will double-check that your version of this
doesn't have any surprises for how I was using it here. I expect it'll
be fine.

-Kees

-- 
Kees Cook



* Re: [PATCH 3/5] codetag: Introduce codetag_early_walk()
  2024-08-29 15:39   ` Suren Baghdasaryan
@ 2024-09-11 22:18     ` Kees Cook
  0 siblings, 0 replies; 17+ messages in thread
From: Kees Cook @ 2024-09-11 22:18 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Thu, Aug 29, 2024 at 08:39:29AM -0700, Suren Baghdasaryan wrote:
> On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
> >
> > In order to process builtin alloc_tags much earlier during boot (before
> > register_codetag() is processed), provide codetag_early_walk() that
> > performs a lockless walk with a specified callback function. This will be
> > used to allocate required caches that cannot be allocated on demand.
> >
> > Signed-off-by: Kees Cook <kees@kernel.org>
> > ---
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Kent Overstreet <kent.overstreet@linux.dev>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Cc: linux-mm@kvack.org
> > ---
> >  include/linux/codetag.h |  2 ++
> >  lib/codetag.c           | 16 ++++++++++++++++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> > index c2a579ccd455..9eb1fcd90570 100644
> > --- a/include/linux/codetag.h
> > +++ b/include/linux/codetag.h
> > @@ -64,6 +64,8 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
> >  bool codetag_trylock_module_list(struct codetag_type *cttype);
> >  struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
> >  struct codetag *codetag_next_ct(struct codetag_iterator *iter);
> > +void codetag_early_walk(const struct codetag_type_desc *desc,
> > +                       void (*callback)(struct codetag *ct));
> >
> >  void codetag_to_text(struct seq_buf *out, struct codetag *ct);
> >
> > diff --git a/lib/codetag.c b/lib/codetag.c
> > index ef7634c7ee18..9d563c8c088a 100644
> > --- a/lib/codetag.c
> > +++ b/lib/codetag.c
> > @@ -154,6 +154,22 @@ static struct codetag_range get_section_range(struct module *mod,
> >         };
> >  }
> >
> > +void codetag_early_walk(const struct codetag_type_desc *desc,
> > +                       void (*callback)(struct codetag *ct))
> > +{
> > +       struct codetag_range range;
> > +       struct codetag *ct;
> > +
> > +       range = get_section_range(NULL, desc->section);
> > +       if (!range.start || !range.stop ||
> > +           range.start == range.stop ||
> > +           range.start > range.stop)
> > +               return;
> 
> I think this check can be simplified to:
> 
>         if (!range.start || range.start >= range.stop)
>                 return;
> 
> nit: Technically (!range.start) should also never trigger. In a valid
> image these symbols are either missing (range.start == range.stop ==
> NULL) or both are defined and (range.start < range.stop).

Yeah, all true. I was mainly copying all the checks that existed in the
"slow path" version.

I will adjust this for the next version.
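
For reference, here is a small user-space sketch of why the two forms
of the range check discussed above reject the same inputs. It is only a
model: struct range_model, skip_original(), skip_simplified() and
checks_agree() are hypothetical names, and addresses are represented as
integers so the ordering comparisons are well defined outside the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct codetag_range (addresses as integers). */
struct range_model {
	uintptr_t start;
	uintptr_t stop;
};

/* The four-part check from the RFC patch: true means "skip this range". */
static bool skip_original(struct range_model r)
{
	return !r.start || !r.stop ||
	       r.start == r.stop ||
	       r.start > r.stop;
}

/* The simplified form suggested in review. */
static bool skip_simplified(struct range_model r)
{
	return !r.start || r.start >= r.stop;
}

/*
 * Compare both predicates over every pair drawn from a few boundary
 * values (NULL, small, page-sized, and maximal addresses).
 */
static bool checks_agree(void)
{
	static const uintptr_t vals[] = { 0, 1, 2, 4096, UINTPTR_MAX };
	const size_t n = sizeof(vals) / sizeof(vals[0]);

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			struct range_model r = { vals[i], vals[j] };

			if (skip_original(r) != skip_simplified(r))
				return false;
		}
	}
	return true;
}
```

The !r.stop term is covered by r.start >= r.stop whenever r.start is
non-NULL, which is why it can be dropped. A caller could confirm this
with assert(checks_agree()).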

-- 
Kees Cook



* Re: [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls
  2024-08-29 16:00   ` Suren Baghdasaryan
@ 2024-09-11 22:23     ` Kees Cook
  0 siblings, 0 replies; 17+ messages in thread
From: Kees Cook @ 2024-09-11 22:23 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Thu, Aug 29, 2024 at 09:00:37AM -0700, Suren Baghdasaryan wrote:
> On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
> [...]
> > -#define kmem_cache_alloc(...)                  alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
> > +#define kmem_cache_alloc(...)          alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
> 
> nit: this seems like unnecessary churn.

Whoops, yes. This was left over from an earlier pass and I failed to get
the whitespace correctly restored. I will fix this.

> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index 81e5f9a70f22..6d2cb72bf269 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -78,6 +78,14 @@ static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
> >
> >         seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
> >         codetag_to_text(out, ct);
> > +#ifdef CONFIG_SLAB_PER_SITE
> > +       seq_buf_putc(out, ' ');
> > +       seq_buf_printf(out, "size:%s(%zu) slab:%s",
> > +                               tag->meta.sized == 0 ? "non-slab" :
> 
> "non-slab" term sounds overly specific and we might extend this to
> some other allocations as well in the future. I would suggest
> "unknown" instead.

Heh, yeah. I went back and forth on the name for this and went with
non-slab because we do know what it isn't. It's not some kind of
unexpected state. Maybe "untracked", or "unsized", though both seem
inaccurate from certain perspectives.

> 
> > +                                       tag->meta.sized == SIZE_MAX ? "dynamic" : "fixed",
> > +                               tag->meta.sized == SIZE_MAX ? 0 : tag->meta.sized,
> > +                               tag->meta.cache ? "ready" : "unused");
> 
> I don't see "struct alloc_meta" having a "cache" member...

Oops, yes, as you found this should have been associated with the next
patch that adds "cache".

> Since you are changing the format of this file, you want to also bump
> up the file version inside print_allocinfo_header().

Okay, yeah. In that case I'll probably split the report into a separate
patch after "cache" is added so there's only a single bump in allocinfo
versioning.
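
As a reading aid, the three-way encoding of tag->meta.sized used by the
quoted seq_buf_printf() call can be modeled in user space roughly as
below. This is a sketch under the assumption that the hunk above is the
whole story: 0 means the site is not a sized kmalloc call, SIZE_MAX
means dynamically sized, and anything else is the fixed byte count.
sized_kind() and sized_report() are hypothetical helper names, and the
"non-slab" label is the one still under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classify a call site from its recorded size value. */
static const char *sized_kind(size_t sized)
{
	if (sized == 0)
		return "non-slab";	/* label under discussion in review */
	if (sized == SIZE_MAX)
		return "dynamic";
	return "fixed";
}

/* Size printed next to the kind: dynamic sites report 0. */
static size_t sized_report(size_t sized)
{
	return sized == SIZE_MAX ? 0 : sized;
}
```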

-- 
Kees Cook



* Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
  2024-08-29 17:03   ` Suren Baghdasaryan
@ 2024-09-11 22:30     ` Kees Cook
  2024-09-12 15:58       ` Suren Baghdasaryan
  0 siblings, 1 reply; 17+ messages in thread
From: Kees Cook @ 2024-09-11 22:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
> >
> > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > allocated on demand to avoid wasting memory for unused caches.
> >
> > A few caches need to be allocated very early to support allocating the
> > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > GFP_ATOMIC allocations are currently left to be allocated from
> > KMALLOC_NORMAL.
> >
> > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> >
> > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> > text that compares the features.
> >
> > Improvements needed:
> > - Retain call site gfp flags in alloc_tag meta field to:
> >   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> >     be allocated on demand unless we want them to be GFP_ATOMIC
> >     themselves...)
> 
> I'm currently working on a feature to identify allocations with
> __GFP_ACCOUNT known at compile time (similar to how you handle the
> size in the previous patch). Might be something you can reuse/extend.

Great, yes! I'd love to check it out.

> >   - Separate MEMCG allocations as well
> 
> Do you mean allocations with __GFP_ACCOUNT or something else?

I do, yes.

> > +static void alloc_tag_site_init_early(struct codetag *ct)
> > +{
> > +       /* Explicitly initialize the caches needed to initialize caches. */
> > +       if (strcmp(ct->function, "kstrdup") == 0 ||
> > +           strcmp(ct->function, "kvasprintf") == 0 ||
> > +           strcmp(ct->function, "pcpu_mem_zalloc") == 0)
> 
> I hope we can find a better way to distinguish these allocations.
> Maybe have a specialized hook for them, like alloc_hooks_early() which
> sets a bit inside ct->flags to distinguish them?

That might be possible. I'll see how that ends up looking. I don't want
to even further fragment the alloc_hooks_... variants.

> 
> > +               alloc_tag_site_init(ct, false);
> > +
> > +       /* TODO: pre-allocate GFP_ATOMIC caches here. */
> 
> You could pre-allocate GFP_ATOMIC caches during
> alloc_tag_module_load() only if gfp_flags are known at compile time I
> think. I guess for the dynamic case choose_slab() will fall back to
> kmalloc_slab()?

Right, yes. I'd do it like the size checking: if we know at compile
time, we can depend on it, otherwise it's a run-time fallback.
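
To illustrate the "known at compile time, otherwise run-time fallback"
idea, here is a minimal sketch using the GCC/Clang builtin
__builtin_constant_p(). This is an assumption about how such a check
could be expressed, not a copy of what patch 4 does; SITE_SIZED() and
site_sized_literal() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro: record the size when the compiler can prove it is
 * a constant at this call site, otherwise record SIZE_MAX ("dynamic").
 */
#define SITE_SIZED(sz) \
	(__builtin_constant_p(sz) ? (size_t)(sz) : SIZE_MAX)

/* A call site with a literal size is classified as fixed. */
static size_t site_sized_literal(void)
{
	return SITE_SIZED(128);
}
```

The same shape would presumably apply to gfp flags: a call site whose
flags are a compile-time constant could be recorded in the tag, and
everything else would take the run-time path.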

> 
> > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> >
> >                 if (WARN(counter.bytes,
> >                          "%s:%u module %s func:%s has %llu allocated at module unload",
> > -                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > +                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> >                         module_unused = false;
> > +               }
> > +#ifdef CONFIG_SLAB_PER_SITE
> > +               else if (tag->meta.sized) {
> > +                       /* Remove the allocated caches, if possible. */
> > +                       void *p = READ_ONCE(tag->meta.cache);
> > +
> > +                       WRITE_ONCE(tag->meta.cache, NULL);
> 
> I'm guessing you are not using try_cmpxchg() the same way you did in
> alloc_tag_site_init() because a race with any other user is impossible
> at the module unload time? If so, a comment mentioning that would be
> good.

Correct. It should not be possible. But yes, I will add a comment.
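
For readers following the race discussion, the publish-once pattern in
alloc_tag_site_init() can be modeled in user space with C11 atomics, as
sketched below. Everything here is a stand-in: cache_slot models
tag->meta.cache, malloc() models cache creation, and unpublish() uses
atomic_exchange() where the patch uses READ_ONCE()/WRITE_ONCE() on the
assumption that nothing else touches the slot during module unload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tag->meta.cache slot. */
static _Atomic(void *) cache_slot;

/* Counts destroyed objects, for illustration only. */
static int destroyed;

static void destroy_cache(void *p)
{
	free(p);
	destroyed++;
}

/*
 * Publish a freshly created object unless one is already present.
 * Only the first caller wins; a loser destroys its own copy and
 * returns the winner's object instead.
 */
static void *publish_once(void)
{
	void *expected = NULL;
	void *p = atomic_load(&cache_slot);

	if (p)			/* Already allocated? */
		return p;

	p = malloc(64);		/* models cache creation */
	if (!p)
		return NULL;

	if (!atomic_compare_exchange_strong(&cache_slot, &expected, p)) {
		/* Lost the race: expected now holds the winner. */
		destroy_cache(p);
		return expected;
	}
	return p;
}

/* Unload path: no concurrent users are assumed. */
static void unpublish(void)
{
	void *p = atomic_exchange(&cache_slot, NULL);

	if (p)
		destroy_cache(p);
}
```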

> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 855c63c3270d..4f01cb6dd32e 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> >         default SLAB_FREELIST_HARDENED
> >         select SLAB_BUCKETS
> >         help
> > -         Track sizes of kmalloc() call sites.
> > +         As a defense against shared-cache "type confusion" use-after-free
> > +         attacks, every kmalloc()-family call allocates from a separate
> > +         kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > +         will no longer be able to groom malicious objects via similarly
> > +         sized allocations that share the same cache as the target object.
> > +
> > +         This increases the "at rest" kmalloc slab memory usage by
> > +         roughly 5x (around 7MiB), and adds the potential for greater
> > +         long-term memory fragmentation. However, some workloads
> > +         actually see performance improvements when single allocation
> > +         sites are hot.
> 
> I hope you provide the performance and overhead data in the cover
> letter when you post v1.

That's my plan. It's always odd choosing workloads, but we do seem to
have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
there anything in particular you'd want to see?

> > +static __always_inline
> > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > +                              unsigned long caller)
> > +{
> > +#ifdef CONFIG_SLAB_PER_SITE
> > +       struct alloc_tag *tag = current->alloc_tag;
> > +
> > +       if (!b && tag && tag->meta.sized &&
> > +           kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > +           (flags & GFP_ATOMIC) != GFP_ATOMIC) {
> 
> What if the allocation is GFP_ATOMIC but a previous allocation from the
> same location (same tag) happened without GFP_ATOMIC and
> tag->meta.cache was already allocated? Why not use that existing cache?
> The same applies if tag->meta.cache was pre-allocated.

Maybe I was being too conservative in my understanding -- I thought that
I couldn't use those caches on the chance that they may already be full?
Or is that always the risk, and GFP_ATOMIC deals with that? If it would
be considered safe to attempt the allocation from the existing cache, then
yeah, I can adjust this check.
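
A side note on the shape of the quoted test,
(flags & GFP_ATOMIC) != GFP_ATOMIC: GFP_ATOMIC is a multi-bit mask, so
the comparison against the full mask matters. The sketch below uses
made-up bit values (the M_* names are hypothetical) and assumes the
compositions GFP_ATOMIC = __GFP_HIGH | __GFP_KSWAPD_RECLAIM and
GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS, as I understand the
current gfp definitions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_model_t;

/* Illustrative bit positions only; the real values live in the kernel. */
#define M_HIGH            0x01u
#define M_KSWAPD_RECLAIM  0x02u
#define M_DIRECT_RECLAIM  0x04u
#define M_IO              0x08u
#define M_FS              0x10u

#define M_GFP_ATOMIC  (M_HIGH | M_KSWAPD_RECLAIM)
#define M_GFP_KERNEL  (M_DIRECT_RECLAIM | M_KSWAPD_RECLAIM | M_IO | M_FS)

/* True only when every bit of the atomic mask is set. */
static bool has_all_atomic_bits(gfp_model_t flags)
{
	return (flags & M_GFP_ATOMIC) == M_GFP_ATOMIC;
}

/* True when at least one bit of the atomic mask is set. */
static bool has_any_atomic_bit(gfp_model_t flags)
{
	return (flags & M_GFP_ATOMIC) != 0;
}
```

Under these assumptions a GFP_KERNEL request shares one bit with
GFP_ATOMIC, so an "any bit" test would misclassify it. The full-mask
test also matches GFP_KERNEL | __GFP_HIGH in this model, so it is a
conservative filter rather than an exact "cannot sleep" check.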

Thanks for looking these over!

-Kees

-- 
Kees Cook



* Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
  2024-09-11 22:30     ` Kees Cook
@ 2024-09-12 15:58       ` Suren Baghdasaryan
  0 siblings, 0 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-09-12 15:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, GONG, Ruiqi, Jann Horn, Matteo Rizzo,
	jvoisin, Xiu Jianfeng, linux-kernel, linux-hardening

On Wed, Sep 11, 2024 at 3:30 PM Kees Cook <kees@kernel.org> wrote:
>
> On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> > On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
> > >
> > > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > > allocated on demand to avoid wasting memory for unused caches.
> > >
> > > A few caches need to be allocated very early to support allocating the
> > > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > > GFP_ATOMIC allocations are currently left to be allocated from
> > > KMALLOC_NORMAL.
> > >
> > > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> > >
> > > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > > > CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> > > text that compares the features.
> > >
> > > Improvements needed:
> > > - Retain call site gfp flags in alloc_tag meta field to:
> > >   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> > >     be allocated on demand unless we want them to be GFP_ATOMIC
> > >     themselves...)
> >
> > I'm currently working on a feature to identify allocations with
> > __GFP_ACCOUNT known at compile time (similar to how you handle the
> > size in the previous patch). Might be something you can reuse/extend.
>
> Great, yes! I'd love to check it out.
>
> > >   - Separate MEMCG allocations as well
> >
> > Do you mean allocations with __GFP_ACCOUNT or something else?
>
> I do, yes.
>
> > > +static void alloc_tag_site_init_early(struct codetag *ct)
> > > +{
> > > +       /* Explicitly initialize the caches needed to initialize caches. */
> > > +       if (strcmp(ct->function, "kstrdup") == 0 ||
> > > +           strcmp(ct->function, "kvasprintf") == 0 ||
> > > +           strcmp(ct->function, "pcpu_mem_zalloc") == 0)
> >
> > I hope we can find a better way to distinguish these allocations.
> > Maybe have a specialized hook for them, like alloc_hooks_early() which
> > sets a bit inside ct->flags to distinguish them?
>
> That might be possible. I'll see how that ends up looking. I don't want
> to even further fragment the alloc_hooks_... variants.
>
> >
> > > +               alloc_tag_site_init(ct, false);
> > > +
> > > +       /* TODO: pre-allocate GFP_ATOMIC caches here. */
> >
> > You could pre-allocate GFP_ATOMIC caches during
> > alloc_tag_module_load() only if gfp_flags are known at compile time I
> > think. I guess for the dynamic case choose_slab() will fall back to
> > kmalloc_slab()?
>
> Right, yes. I'd do it like the size checking: if we know at compile
> time, we can depend on it, otherwise it's a run-time fallback.
>
> >
> > > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> > >
> > >                 if (WARN(counter.bytes,
> > >                          "%s:%u module %s func:%s has %llu allocated at module unload",
> > > -                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > > +                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> > >                         module_unused = false;
> > > +               }
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +               else if (tag->meta.sized) {
> > > +                       /* Remove the allocated caches, if possible. */
> > > +                       void *p = READ_ONCE(tag->meta.cache);
> > > +
> > > +                       WRITE_ONCE(tag->meta.cache, NULL);
> >
> > I'm guessing you are not using try_cmpxchg() the same way you did in
> > alloc_tag_site_init() because a race with any other user is impossible
> > at the module unload time? If so, a comment mentioning that would be
> > good.
>
> Correct. It should not be possible. But yes, I will add a comment.
>
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 855c63c3270d..4f01cb6dd32e 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> > >         default SLAB_FREELIST_HARDENED
> > >         select SLAB_BUCKETS
> > >         help
> > > -         Track sizes of kmalloc() call sites.
> > > +         As a defense against shared-cache "type confusion" use-after-free
> > > +         attacks, every kmalloc()-family call allocates from a separate
> > > +         kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > > +         will no longer be able to groom malicious objects via similarly
> > > +         sized allocations that share the same cache as the target object.
> > > +
> > > +         This increases the "at rest" kmalloc slab memory usage by
> > > +         roughly 5x (around 7MiB), and adds the potential for greater
> > > +         long-term memory fragmentation. However, some workloads
> > > +         actually see performance improvements when single allocation
> > > +         sites are hot.
> >
> > I hope you provide the performance and overhead data in the cover
> > letter when you post v1.
>
> That's my plan. It's always odd choosing workloads, but we do seem to
> have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
> there anything in particular you'd want to see?

I have a stress test implemented as a loadable module to benchmark
slab and page allocation times (just a tight loop and timing it). I
can clean it up a bit and share with you.

>
> > > +static __always_inline
> > > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > > +                              unsigned long caller)
> > > +{
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +       struct alloc_tag *tag = current->alloc_tag;
> > > +
> > > +       if (!b && tag && tag->meta.sized &&
> > > +           kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > > +           (flags & GFP_ATOMIC) != GFP_ATOMIC) {
> >
> > > What if the allocation is GFP_ATOMIC but a previous allocation from the
> > > same location (same tag) happened without GFP_ATOMIC and
> > > tag->meta.cache was already allocated? Why not use that existing cache?
> > > The same applies if tag->meta.cache was pre-allocated.
>
> Maybe I was being too conservative in my understanding -- I thought that
> I couldn't use those caches on the chance that they may already be full?
> Or is that always the risk, and GFP_ATOMIC deals with that? If it would
> be considered safe to attempt the allocation from the existing cache, then
> yeah, I can adjust this check.

Well, you fall back to kmalloc_slab() which also might be full. So,
how would using an existing cache be different?
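
One possible reading of this suggestion, as a user-space model: only
the on-demand creation step needs to be skipped for atomic callers,
while an already-created per-site cache can be used by anyone. This is
a sketch of that idea for the fixed-size case only, not the patch's
current behavior; struct site_model, site_create() and choose_model()
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pick { PICK_DEFAULT, PICK_SITE_CACHE };

struct site_model {
	void *cache;		/* models tag->meta.cache; NULL until created */
	int create_calls;	/* how often on-demand creation ran */
};

static int dummy_cache;		/* placeholder object for a created cache */

/* Models alloc_tag_site_init(): may sleep in the real kernel. */
static void site_create(struct site_model *s)
{
	s->create_calls++;
	s->cache = &dummy_cache;
}

static enum pick choose_model(struct site_model *s, bool atomic)
{
	/* Creation may sleep, so only non-atomic callers attempt it. */
	if (!s->cache && !atomic)
		site_create(s);

	/* Any caller, atomic or not, may use a cache that already exists. */
	return s->cache ? PICK_SITE_CACHE : PICK_DEFAULT;
}
```

In this model an atomic caller falls back to the default buckets only
until some non-atomic caller (or a pre-allocation step) has created the
site's cache.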

>
> Thanks for looking these over!
>
> -Kees
>
> --
> Kees Cook



end of thread, other threads:[~2024-09-12 15:59 UTC | newest]

Thread overview: 17+ messages
2024-08-09  7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
2024-08-09  7:33 ` [PATCH 1/5] slab: Introduce kmem_buckets_destroy() Kees Cook
2024-08-09  7:33 ` [PATCH 2/5] codetag: Run module_load hooks for builtin codetags Kees Cook
2024-08-29 15:02   ` Suren Baghdasaryan
2024-09-11 22:17     ` Kees Cook
2024-08-09  7:33 ` [PATCH 3/5] codetag: Introduce codetag_early_walk() Kees Cook
2024-08-29 15:39   ` Suren Baghdasaryan
2024-09-11 22:18     ` Kees Cook
2024-08-09  7:33 ` [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls Kees Cook
2024-08-29 16:00   ` Suren Baghdasaryan
2024-09-11 22:23     ` Kees Cook
2024-08-09  7:33 ` [PATCH 5/5] slab: Allocate and use per-call-site caches Kees Cook
2024-08-17  1:30   ` Xiu Jianfeng
2024-08-22 17:47     ` Kees Cook
2024-08-29 17:03   ` Suren Baghdasaryan
2024-09-11 22:30     ` Kees Cook
2024-09-12 15:58       ` Suren Baghdasaryan
