From: Harry Yoo <harry.yoo@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>, bpf <bpf@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Sebastian Sewior <bigeasy@linutronix.de>,
Andrii Nakryiko <andrii@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
Date: Fri, 11 Jul 2025 15:06:33 +0900 [thread overview]
Message-ID: <aHCpkPBPEiSECFc6@hyeyoo> (raw)
In-Reply-To: <CAADnVQ+G340va8h2B7nNO00mWxbP_chx3oHW2PYrKt2AfOZS8w@mail.gmail.com>
On Thu, Jul 10, 2025 at 12:13:20PM -0700, Alexei Starovoitov wrote:
> On Thu, Jul 10, 2025 at 8:05 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 7/10/25 12:21, Harry Yoo wrote:
> > > On Thu, Jul 10, 2025 at 11:36:02AM +0200, Vlastimil Babka wrote:
> > >> On 7/9/25 03:53, Alexei Starovoitov wrote:
> > >>
> > >> Hm but this is leaking the slab we allocated and have in the "slab"
> > >> variable, we need to free it back in that case.
>
> ohh. sorry for the silly mistake.
> Re-reading the diff again I realized that I made a similar mistake in
> alloc_single_from_new_slab().
> It has this bit:
> if (!alloc_debug_processing(...))
> return NULL;
Yeah but we purposefully leak slabs if !alloc_debug_processing(),
the same in alloc_single_from_partial().
> so I assumed that doing:
> if (!spin_trylock_irqsave(&n->list_lock,..))
> return NULL;
>
> is ok too. Now I see that !alloc_debug is purposefully leaking memory.
>
> Should we add:
> @@ -2841,6 +2841,7 @@ static void *alloc_single_from_new_slab(struct
> kmem_cache *s, struct slab *slab,
> * It's not really expected that this would fail on a
> * freshly allocated slab, but a concurrent memory
> * corruption in theory could cause that.
> + * Leak newly allocated slab.
> */
> return NULL;
>
> so the next person doesn't make the same mistake?
Looks fine. Probably add a comment to alloc_single_from_partial() as well?
> Also help me understand...
> slab->objects is never equal to 1, right?
No. For example, if you allocate a 4k obj and SLUB allocate a slab with
oo_order(s->min), s->objects will be 1.
/proc/slabinfo only prints oo_order(s->oo), not oo_order(s->min).
> /proc/slabinfo agrees, but I cannot decipher it through slab init code.
> Logically it makes sense.
I think the reason why there is no <objsperslab> == 1 in your
/proc/slabinfo is that calculate_order() tries to choose higher order
for slabs (based on nr of CPUs) to reduce lock contention.
But nothing prevents s->objects from being 1.
> If that's the case why alloc_single_from_new_slab()
> has this part:
> if (slab->inuse == slab->objects)
> add_full(s, n, slab);
> else
> add_partial(n, slab, DEACTIVATE_TO_HEAD);
>
> Shouldn't it call add_partial() only ?
> since slab->inuse == 1 and slab->objects != 1
...and that means we need to handle slab->inuse == slab->objects.
> > > But it might be a partial slab taken from the list?
> >
> > True.
> >
> > > Then we need to trylock n->list_lock and if that fails, oh...
> >
> > So... since we succeeded taking it from the list and thus the spin_trylock,
> > it means it's safe to spinlock n->list_lock again - we might be waiting on
> > other cpu to unlock it but we know we didn't NMI on our own cpu having the
> > lock, right? But we'd probably need to convince lockdep about this somehow,
> > and also remember if we allocated a new slab or taken on from the partial
> > list... or just deal with this unlikely situation in another irq work :/
>
> irq_work might be the least mind bending.
> Good point about partial vs new slab.
> For partial we can indeed proceed with deactivate_slab() and if
> I'm reading the code correctly, it won't have new.inuse == 0,
> so it won't go to discard_slab() (which won't be safe in this path)
> But teaching lockdep that below bit in deactivate_slab() is safe:
> } else if (new.freelist) {
> spin_lock_irqsave(&n->list_lock, flags);
> add_partial(n, slab, tail);
> is a challenge.
>
> Since defer_free_work is there, I'm leaning to reuse it for
> deactive_slab too. It will process
> static DEFINE_PER_CPU(struct llist_head, defer_free_objects);
> and
> static DEFINE_PER_CPU(struct llist_head, defer_deactivate_slabs);
+1 for another irq work for slab deactivation
it should be rare anyway...
> Shouldn't be too ugly. Better ideas?
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-07-11 6:07 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-09 1:52 [PATCH v2 0/6] slab: Re-entrant kmalloc_nolock() Alexei Starovoitov
2025-07-09 1:52 ` [PATCH v2 1/6] locking/local_lock: Expose dep_map in local_trylock_t Alexei Starovoitov
2025-07-11 8:02 ` Sebastian Andrzej Siewior
2025-07-09 1:52 ` [PATCH v2 2/6] locking/local_lock: Introduce local_lock_is_locked() Alexei Starovoitov
2025-07-11 7:52 ` Sebastian Andrzej Siewior
2025-07-09 1:53 ` [PATCH v2 3/6] locking/local_lock: Introduce local_lock_lockdep_start/end() Alexei Starovoitov
2025-07-11 7:50 ` Sebastian Andrzej Siewior
2025-07-11 9:55 ` Vlastimil Babka
2025-07-11 15:17 ` Sebastian Andrzej Siewior
2025-07-11 15:23 ` Vlastimil Babka
2025-07-12 2:19 ` Alexei Starovoitov
2025-07-14 11:06 ` Sebastian Andrzej Siewior
2025-07-14 15:35 ` Vlastimil Babka
2025-07-14 15:54 ` Sebastian Andrzej Siewior
2025-07-14 17:52 ` Alexei Starovoitov
2025-07-14 18:33 ` Vlastimil Babka
2025-07-14 18:46 ` Alexei Starovoitov
2025-07-15 6:56 ` Vlastimil Babka
2025-07-15 17:29 ` Alexei Starovoitov
2025-07-15 17:48 ` Vlastimil Babka
2025-07-15 21:00 ` Alexei Starovoitov
2025-07-09 1:53 ` [PATCH v2 4/6] mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock() Alexei Starovoitov
2025-07-09 14:20 ` Vlastimil Babka
2025-07-09 1:53 ` [PATCH v2 5/6] mm: Introduce alloc_frozen_pages_nolock() Alexei Starovoitov
2025-07-09 14:21 ` Vlastimil Babka
2025-07-09 1:53 ` [PATCH v2 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock() Alexei Starovoitov
2025-07-10 9:36 ` Vlastimil Babka
2025-07-10 10:21 ` Harry Yoo
2025-07-10 15:05 ` Vlastimil Babka
2025-07-10 19:13 ` Alexei Starovoitov
2025-07-11 6:06 ` Harry Yoo [this message]
2025-07-11 10:30 ` Vlastimil Babka
2025-07-12 1:55 ` Alexei Starovoitov
2025-07-10 19:21 ` Alexei Starovoitov
2025-07-11 7:26 ` Sebastian Andrzej Siewior
2025-07-11 7:36 ` Harry Yoo
2025-07-11 7:40 ` Harry Yoo
2025-07-11 10:48 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aHCpkPBPEiSECFc6@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=bpf@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
--cc=memxor@gmail.com \
--cc=mhocko@suse.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.