From: Harry Yoo <harry.yoo@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
bpf@vger.kernel.org, linux-mm@kvack.org, shakeel.butt@linux.dev,
mhocko@suse.com, bigeasy@linutronix.de, andrii@kernel.org,
memxor@gmail.com, akpm@linux-foundation.org,
peterz@infradead.org, rostedt@goodmis.org, hannes@cmpxchg.org,
willy@infradead.org
Subject: Re: [PATCH 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
Date: Wed, 7 May 2025 09:31:17 +0900 [thread overview]
Message-ID: <aBqp1ScxaTznSf36@harry> (raw)
In-Reply-To: <4d3e5d4b-502b-459b-9779-c0bf55ef2a03@suse.cz>
On Tue, May 06, 2025 at 02:01:48PM +0200, Vlastimil Babka wrote:
> On 5/1/25 05:27, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > kmalloc_nolock() relies on ability of local_lock to detect the situation
> > when it's locked.
> > In !PREEMPT_RT local_lock_is_locked() is true only when NMI happened in
> > irq saved region that protects _that specific_ per-cpu kmem_cache_cpu.
> > In that case retry the operation in a different kmalloc bucket.
> > The second attempt will likely succeed, since this cpu locked
> > different kmem_cache_cpu.
> > When lock_local_is_locked() sees locked memcg_stock.stock_lock
> > fallback to atomic operations.
> >
> > Similarly, in PREEMPT_RT local_lock_is_locked() returns true when
> > per-cpu rt_spin_lock is locked by current task. In this case re-entrance
> > into the same kmalloc bucket is unsafe, and kmalloc_nolock() tries
> > a different bucket that is most likely is not locked by current
> > task. Though it may be locked by a different task it's safe to
> > rt_spin_lock() on it.
> >
> > Similar to alloc_pages_nolock() the kmalloc_nolock() returns NULL
> > immediately if called from hard irq or NMI in PREEMPT_RT.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>
... snip ...
> > @@ -4354,6 +4406,88 @@ void *__kmalloc_noprof(size_t size, gfp_t flags)
> > }
> > EXPORT_SYMBOL(__kmalloc_noprof);
> >
> > +/**
> > + * kmalloc_nolock - Allocate an object of given size from any context.
> > + * @size: size to allocate
> > + * @gfp_flags: GFP flags. Only __GFP_ACCOUNT, __GFP_ZERO allowed.
> > + * @node: node number of the target node.
> > + *
> > + * Return: pointer to the new object or NULL in case of error.
> > + * NULL does not mean EBUSY or EAGAIN. It means ENOMEM.
> > + * There is no reason to call it again and expect !NULL.
> > + */
> > +void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
> > +{
> > + gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
> > + struct kmem_cache *s;
> > + bool can_retry = true;
> > + void *ret = ERR_PTR(-EBUSY);
> > +
> > + VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO));
> > +
> > + if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> > + return NULL;
> > + if (unlikely(!size))
> > + return ZERO_SIZE_PTR;
> > +
> > + if (!USE_LOCKLESS_FAST_PATH() && (in_nmi() || in_hardirq()))
> > + /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
> > + return NULL;
> > +retry:
> > + s = kmalloc_slab(size, NULL, alloc_gfp, _RET_IP_);
>
> The idea of retrying on different bucket is based on wrong assumptions and
> thus won't work as you expect. kmalloc_slab() doesn't select buckets truly
> randomly, but deterministically via hashing from a random per-boot seed and
> the _RET_IP_, as the security hardening goal is to make different kmalloc()
> callsites get different caches with high probability.
It's not retrying with the same size, so I don't think it's relying on any
assumption about random kmalloc caches. (yeah, it wastes some memory if
allocated from the next size bucket)
if (PTR_ERR(ret) == -EBUSY) {
if (can_retry) {
/* pick the next kmalloc bucket */
size = s->object_size + 1;
/*
* Another alternative is to
* if (memcg) alloc_gfp &= ~__GFP_ACCOUNT;
* else if (!memcg) alloc_gfp |= __GFP_ACCOUNT;
* to retry from bucket of the same size.
*/
can_retry = false;
goto retry;
}
ret = NULL;
}
By the way, it doesn't check if a kmalloc cache that can serve
(s->object_size + 1) allocations actually exists, which is not true for
the largest kmalloc cache?
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-05-07 0:32 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-01 3:27 [PATCH 0/6] mm: Reentrant kmalloc Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 1/6] mm: Rename try_alloc_pages() to alloc_pages_nolock() Alexei Starovoitov
2025-05-06 8:26 ` Vlastimil Babka
2025-05-07 1:24 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 2/6] locking/local_lock: Expose dep_map in local_trylock_t Alexei Starovoitov
2025-05-06 12:56 ` Vlastimil Babka
2025-05-06 14:55 ` Vlastimil Babka
2025-05-07 1:25 ` Alexei Starovoitov
2025-05-12 13:26 ` Sebastian Andrzej Siewior
2025-05-12 16:46 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 3/6] locking/local_lock: Introduce local_lock_is_locked() Alexei Starovoitov
2025-05-06 12:59 ` Vlastimil Babka
2025-05-07 1:28 ` Alexei Starovoitov
2025-05-12 14:56 ` Sebastian Andrzej Siewior
2025-05-12 15:01 ` Vlastimil Babka
2025-05-12 15:23 ` Sebastian Andrzej Siewior
2025-05-01 3:27 ` [PATCH 4/6] locking/local_lock: Introduce local_lock_irqsave_check() Alexei Starovoitov
2025-05-07 13:02 ` Vlastimil Babka
2025-05-12 14:03 ` Sebastian Andrzej Siewior
2025-05-12 17:16 ` Alexei Starovoitov
2025-05-13 6:58 ` Vlastimil Babka
2025-05-13 21:55 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 5/6] mm: Allow GFP_ACCOUNT and GFP_COMP to be used in alloc_pages_nolock() Alexei Starovoitov
2025-05-06 8:55 ` Vlastimil Babka
2025-05-07 1:33 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock() Alexei Starovoitov
2025-05-05 18:46 ` Shakeel Butt
2025-05-06 0:49 ` Alexei Starovoitov
2025-05-06 1:24 ` Shakeel Butt
2025-05-06 1:51 ` Alexei Starovoitov
2025-05-06 18:05 ` Shakeel Butt
2025-05-06 12:01 ` Vlastimil Babka
2025-05-07 0:31 ` Harry Yoo [this message]
2025-05-07 2:23 ` Alexei Starovoitov
2025-05-07 8:38 ` Vlastimil Babka
2025-05-07 2:20 ` Alexei Starovoitov
2025-05-07 10:44 ` Vlastimil Babka
2025-05-09 1:03 ` Harry Yoo
2025-06-24 17:13 ` SLAB_NO_CMPXCHG was:: " Alexei Starovoitov
2025-06-25 11:38 ` Harry Yoo
2025-06-26 20:03 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBqp1ScxaTznSf36@harry \
--to=harry.yoo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=bpf@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
--cc=memxor@gmail.com \
--cc=mhocko@suse.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.