From: Kees Cook <kees@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Kent Overstreet <kent.overstreet@linux.dev>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
linux-mm@kvack.org, "GONG, Ruiqi" <gongruiqi@huaweicloud.com>,
Jann Horn <jannh@google.com>,
Matteo Rizzo <matteorizzo@google.com>,
jvoisin <julien.voisin@dustri.org>,
Xiu Jianfeng <xiujianfeng@huawei.com>,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
Date: Wed, 11 Sep 2024 15:30:40 -0700 [thread overview]
Message-ID: <202409111523.AEAEE48@keescook> (raw)
In-Reply-To: <CAJuCfpHOqKPzbbkULpWU5g1-8mTLXraQM4taHzajY_cJ-YFWgQ@mail.gmail.com>
On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@kernel.org> wrote:
> >
> > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > allocated on demand to avoid wasting memory for unused caches.
> >
> > A few caches need to be allocated very early to support allocating the
> > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > GFP_ATOMIC allocations are currently left to be allocated from
> > KMALLOC_NORMAL.
> >
> > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> >
> > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > CONFIG_RANDOM_KMALLOC_CACHES, mark it a incompatible. Add Kconfig help
> > text that compares the features.
> >
> > Improvements needed:
> > - Retain call site gfp flags in alloc_tag meta field to:
> > - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> > be allocated on demand unless we want them to be GFP_ATOMIC
> > themselves...)
>
> I'm currently working on a feature to identify allocations with
> __GFP_ACCOUNT known at compile time (similar to how you handle the
> size in the previous patch). Might be something you can reuse/extend.
Great, yes! I'd love to check it out.
> > - Separate MEMCG allocations as well
>
> Do you mean allocations with __GFP_ACCOUNT or something else?
I do, yes.
> > +static void alloc_tag_site_init_early(struct codetag *ct)
> > +{
> > + /* Explicitly initialize the caches needed to initialize caches. */
> > + if (strcmp(ct->function, "kstrdup") == 0 ||
> > + strcmp(ct->function, "kvasprintf") == 0 ||
> > + strcmp(ct->function, "pcpu_mem_zalloc") == 0)
>
> I hope we can find a better way to distinguish these allocations.
> Maybe have a specialized hook for them, like alloc_hooks_early() which
> sets a bit inside ct->flags to distinguish them?
That might be possible. I'll see how that ends up looking. I don't want
to even further fragment the alloc_hooks_... variants.
>
> > + alloc_tag_site_init(ct, false);
> > +
> > + /* TODO: pre-allocate GFP_ATOMIC caches here. */
>
> You could pre-allocate GFP_ATOMIC caches during
> alloc_tag_module_load() only if gfp_flags are known at compile time I
> think. I guess for the dynamic case choose_slab() will fall back to
> kmalloc_slab()?
Right, yes. I'd do it like the size checking: if we know at compile
time, we can depend on it, otherwise it's a run-time fallback.
>
> > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> >
> > if (WARN(counter.bytes,
> > "%s:%u module %s func:%s has %llu allocated at module unload",
> > - ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > + ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> > module_unused = false;
> > + }
> > +#ifdef CONFIG_SLAB_PER_SITE
> > + else if (tag->meta.sized) {
> > + /* Remove the allocated caches, if possible. */
> > + void *p = READ_ONCE(tag->meta.cache);
> > +
> > + WRITE_ONCE(tag->meta.cache, NULL);
>
> I'm guessing you are not using try_cmpxchg() the same way you did in
> alloc_tag_site_init() because a race with any other user is impossible
> at the module unload time? If so, a comment mentioning that would be
> good.
Correct. It should not be possible. But yes, I will add a comment.
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 855c63c3270d..4f01cb6dd32e 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> > default SLAB_FREELIST_HARDENED
> > select SLAB_BUCKETS
> > help
> > - Track sizes of kmalloc() call sites.
> > + As a defense against shared-cache "type confusion" use-after-free
> > + attacks, every kmalloc()-family call allocates from a separate
> > + kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > + will no longer be able to groom malicious objects via similarly
> > + sized allocations that share the same cache as the target object.
> > +
> > + This increases the "at rest" kmalloc slab memory usage by
> > + roughly 5x (around 7MiB), and adds the potential for greater
> > + long-term memory fragmentation. However, some workloads
> > + actually see performance improvements when single allocation
> > + sites are hot.
>
> I hope you provide the performance and overhead data in the cover
> letter when you post v1.
That's my plan. It's always odd choosing workloads, but we do seem to
have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
there anything in particular you'd want to see?
> > +static __always_inline
> > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > + unsigned long caller)
> > +{
> > +#ifdef CONFIG_SLAB_PER_SITE
> > + struct alloc_tag *tag = current->alloc_tag;
> > +
> > + if (!b && tag && tag->meta.sized &&
> > + kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > + (flags & GFP_ATOMIC) != GFP_ATOMIC) {
>
> What if allocation is GFP_ATOMIC but a previous allocation from the
> same location (same tag) happened without GFP_ATOMIC and
> tag->meta.cache was allocated. Why not use that existing cache?
> Same if the tag->meta.cache was pre-allocated.
Maybe I was being too conservative in my understanding -- I thought that
I couldn't use those caches on the chance that they may already be full?
Or is that always the risk, ad GFP_ATOMIC deals with that? If it would
be considered safe attempt the allocation from the existing cache, then
yeah, I can adjust this check.
Thanks for looking these over!
-Kees
--
Kees Cook
next prev parent reply other threads:[~2024-09-11 22:30 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-09 7:33 [RFC][PATCH 0/5] slab: Allocate and use per-call-site caches Kees Cook
2024-08-09 7:33 ` [PATCH 1/5] slab: Introduce kmem_buckets_destroy() Kees Cook
2024-08-09 7:33 ` [PATCH 2/5] codetag: Run module_load hooks for builtin codetags Kees Cook
2024-08-29 15:02 ` Suren Baghdasaryan
2024-09-11 22:17 ` Kees Cook
2024-08-09 7:33 ` [PATCH 3/5] codetag: Introduce codetag_early_walk() Kees Cook
2024-08-29 15:39 ` Suren Baghdasaryan
2024-09-11 22:18 ` Kees Cook
2024-08-09 7:33 ` [PATCH 4/5] alloc_tag: Track fixed vs dynamic sized kmalloc calls Kees Cook
2024-08-29 16:00 ` Suren Baghdasaryan
2024-09-11 22:23 ` Kees Cook
2024-08-09 7:33 ` [PATCH 5/5] slab: Allocate and use per-call-site caches Kees Cook
2024-08-17 1:30 ` Xiu Jianfeng
2024-08-22 17:47 ` Kees Cook
2024-08-29 17:03 ` Suren Baghdasaryan
2024-09-11 22:30 ` Kees Cook [this message]
2024-09-12 15:58 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202409111523.AEAEE48@keescook \
--to=kees@kernel.org \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=gongruiqi@huaweicloud.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=jannh@google.com \
--cc=julien.voisin@dustri.org \
--cc=kent.overstreet@linux.dev \
--cc=linux-hardening@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=matteorizzo@google.com \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=xiujianfeng@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).