From: Kees Cook <kees@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Kent Overstreet <kent.overstreet@linux.dev>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
linux-mm@kvack.org, "GONG, Ruiqi" <gongruiqi@huaweicloud.com>,
Jann Horn <jannh@google.com>,
Matteo Rizzo <matteorizzo@google.com>,
jvoisin <julien.voisin@dustri.org>,
Xiu Jianfeng <xiujianfeng@huawei.com>,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
Date: Wed, 11 Sep 2024 15:30:40 -0700 [thread overview]
Message-ID: <202409111523.AEAEE48@keescook> (raw)
In-Reply-To: <CAJuCfpHOqKPzbbkULpWU5g1-8mTLXraQM4taHzajY_cJ-YFWgQ@mail.gmail.com>
On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
> >
> > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > allocated on demand to avoid wasting memory for unused caches.
> >
> > A few caches need to be allocated very early to support allocating the
> > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > GFP_ATOMIC allocations are currently left to be allocated from
> > KMALLOC_NORMAL.
> >
> > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> >
> > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > CONFIG_RANDOM_KMALLOC_CACHES, mark it a incompatible. Add Kconfig help
> > text that compares the features.
> >
> > Improvements needed:
> > - Retain call site gfp flags in alloc_tag meta field to:
> > - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> > be allocated on demand unless we want them to be GFP_ATOMIC
> > themselves...)
>
> I'm currently working on a feature to identify allocations with
> __GFP_ACCOUNT known at compile time (similar to how you handle the
> size in the previous patch). Might be something you can reuse/extend.
Great, yes! I'd love to check it out.
> > - Separate MEMCG allocations as well
>
> Do you mean allocations with __GFP_ACCOUNT or something else?
I do, yes.
> > +static void alloc_tag_site_init_early(struct codetag *ct)
> > +{
> > + /* Explicitly initialize the caches needed to initialize caches. */
> > + if (strcmp(ct->function, "kstrdup") == 0 ||
> > + strcmp(ct->function, "kvasprintf") == 0 ||
> > + strcmp(ct->function, "pcpu_mem_zalloc") == 0)
>
> I hope we can find a better way to distinguish these allocations.
> Maybe have a specialized hook for them, like alloc_hooks_early() which
> sets a bit inside ct->flags to distinguish them?
That might be possible. I'll see how that ends up looking. I don't want
to even further fragment the alloc_hooks_... variants.
>
> > + alloc_tag_site_init(ct, false);
> > +
> > + /* TODO: pre-allocate GFP_ATOMIC caches here. */
>
> You could pre-allocate GFP_ATOMIC caches during
> alloc_tag_module_load() only if gfp_flags are known at compile time I
> think. I guess for the dynamic case choose_slab() will fall back to
> kmalloc_slab()?
Right, yes. I'd do it like the size checking: if we know at compile
time, we can depend on it, otherwise it's a run-time fallback.
>
> > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> >
> > if (WARN(counter.bytes,
> > "%s:%u module %s func:%s has %llu allocated at module unload",
> > - ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > + ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> > module_unused = false;
> > + }
> > +#ifdef CONFIG_SLAB_PER_SITE
> > + else if (tag->meta.sized) {
> > + /* Remove the allocated caches, if possible. */
> > + void *p = READ_ONCE(tag->meta.cache);
> > +
> > + WRITE_ONCE(tag->meta.cache, NULL);
>
> I'm guessing you are not using try_cmpxchg() the same way you did in
> alloc_tag_site_init() because a race with any other user is impossible
> at the module unload time? If so, a comment mentioning that would be
> good.
Correct. It should not be possible. But yes, I will add a comment.
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 855c63c3270d..4f01cb6dd32e 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> > default SLAB_FREELIST_HARDENED
> > select SLAB_BUCKETS
> > help
> > - Track sizes of kmalloc() call sites.
> > + As a defense against shared-cache "type confusion" use-after-free
> > + attacks, every kmalloc()-family call allocates from a separate
> > + kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > + will no longer be able to groom malicious objects via similarly
> > + sized allocations that share the same cache as the target object.
> > +
> > + This increases the "at rest" kmalloc slab memory usage by
> > + roughly 5x (around 7MiB), and adds the potential for greater
> > + long-term memory fragmentation. However, some workloads
> > + actually see performance improvements when single allocation
> > + sites are hot.
>
> I hope you provide the performance and overhead data in the cover
> letter when you post v1.
That's my plan. It's always odd choosing workloads, but we do seem to
have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
there anything in particular you'd want to see?
> > +static __always_inline
> > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > + unsigned long caller)
> > +{
> > +#ifdef CONFIG_SLAB_PER_SITE
> > + struct alloc_tag *tag = current->alloc_tag;
> > +
> > + if (!b && tag && tag->meta.sized &&
> > + kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > + (flags & GFP_ATOMIC) != GFP_ATOMIC) {
>
> What if allocation is GFP_ATOMIC but a previous allocation from the
> same location (same tag) happened without GFP_ATOMIC and
> tag->meta.cache was allocated. Why not use that existing cache?
> Same if the tag->meta.cache was pre-allocated.
Maybe I was being too conservative in my understanding -- I thought that
I couldn't use those caches on the chance that they may already be full?
Or is that always the risk, ad GFP_ATOMIC deals with that? If it would
be considered safe attempt the allocation from the existing cache, then
yeah, I can adjust this check.
Thanks for looking these over!
-Kees
--
Kees Cook
next prev parent reply other threads:[~2024-09-11 22:30 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-09 7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
2024-08-09 7:33 ` [PATCH 1/5] slab: Introduce kmem_buckets_destroy() Kees Cook
2024-08-09 7:33 ` [PATCH 2/5] codetag: Run module_load hooks for builtin codetags Kees Cook
2024-08-29 15:02 ` Suren Baghdasaryan
2024-09-11 22:17 ` Kees Cook
2024-08-09 7:33 ` [PATCH 3/5] codetag: Introduce codetag_early_walk() Kees Cook
2024-08-29 15:39 ` Suren Baghdasaryan
2024-09-11 22:18 ` Kees Cook
2024-08-09 7:33 ` [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls Kees Cook
2024-08-29 16:00 ` Suren Baghdasaryan
2024-09-11 22:23 ` Kees Cook
2024-08-09 7:33 ` [PATCH 5/5] slab: Allocate and use per-call-site caches Kees Cook
2024-08-17 1:30 ` Xiu Jianfeng
2024-08-22 17:47 ` Kees Cook
2024-08-29 17:03 ` Suren Baghdasaryan
2024-09-11 22:30 ` Kees Cook [this message]
2024-09-12 15:58 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202409111523.AEAEE48@keescook \
--to=kees@kernel.org \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=gongruiqi@huaweicloud.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=jannh@google.com \
--cc=julien.voisin@dustri.org \
--cc=kent.overstreet@linux.dev \
--cc=linux-hardening@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=matteorizzo@google.com \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=xiujianfeng@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.