From: Harry Yoo <harry.yoo@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
Date: Fri, 24 Oct 2025 11:03:29 +0900 [thread overview]
Message-ID: <aPrecUasNUbEkLlS@hyeyoo> (raw)
In-Reply-To: <CAADnVQK+3GLbq4GjOYO0Q6vhURPyNyy70bZKUUwRpLuK-R8NAA@mail.gmail.com>
On Thu, Oct 23, 2025 at 06:17:19PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> > > > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> > > > that we discard the newly allocated slab if we can't spin and we fail to
> > > > trylock. As a result we don't perform inc_slabs_node() later in the
> > > > function. Instead we perform a deferred deactivate_slab() which can
> > > > either put the unacounted slab on partial list, or discard it
> > > > immediately while performing dec_slabs_node(). Either way will cause an
> > > > accounting imbalance.
> > > >
> > > > Fix this by not marking the slab as frozen, and using free_slab()
> > > > instead of deactivate_slab() for non-frozen slabs in
> > > > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> > > > case. By not using discard_slab() we avoid dec_slabs_node().
> > > >
> > > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > > > ---
> > > > Changes in v2:
> > > > - Fix the problem differently. Harry pointed out that we can't move
> > > > inc_slabs_node() outside of list_lock protected regions as that would
> > > > reintroduce issues fixed by commit c7323a5ad078
> > > > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> > > > ---
> > > > mm/slub.c | 8 +++++---
> > > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > index 23d8f54e9486..87a1d2f9de0d 100644
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > > >
> > > > if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
> > > > /* Unlucky, discard newly allocated slab */
> > > > - slab->frozen = 1;
> > > > defer_deactivate_slab(slab, NULL);
> > > > return NULL;
> > > > }
> > > > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> > > > struct slab *slab = container_of(pos, struct slab, llnode);
> > > >
> > > > #ifdef CONFIG_SLUB_TINY
> > > > - discard_slab(slab->slab_cache, slab);
> > > > + free_slab(slab->slab_cache, slab);
> > > > #else
> > > > - deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > > + if (slab->frozen)
> > > > + deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > > + else
> > > > + free_slab(slab->slab_cache, slab);
> > >
> > > A bit odd to use 'frozen' flag as such a signal.
> > > I guess I'm worried that truly !frozen slab can come here
> > > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> > > And things will be much worse than just accounting.
> >
> > But the cpu slab must have been frozen before it's attached to
> > c->slab?
>
> Is it?
> the path is
> c = slub_get_cpu_ptr(s->cpu_slab);
> if (unlikely(c->slab)) {
> struct slab *flush_slab = c->slab;
> defer_deactivate_slab(flush_slab, ...);
>
> I don't see why it would be frozen.
Oh god. I was going to say the cpu slab is always frozen. It has been
true for very long time, but it seems it's not true after commit 90b1e56641
("mm/slub: directly load freelist from cpu partial slab in the likely case").
So I think you're right that a non-frozen slab can go through
free_slab() in free_deferred_objects()...
But fixing this should be simple. Add something like
freeze_and_get_freelist() and call it when SLUB take a slab from
per-cpu partial slab list?
> > > Maybe add
> > > inc_slabs_node(s, nid, slab->objects);
> > > right before
> > > defer_deactivate_slab(slab, NULL);
> > > return NULL;
> > >
> > > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> > > It's been 3 years since.
> >
> > When n->nr_slabs is inconsistent, validate_slab_node() might report an
> > error (false positive) when someone wrote '1' to
> > /sys/kernel/slab/<cache name>/validate
>
> Ok. I see it now. It's the actual number of elements in n->full
> list needs to match n->nr_slabs.
>
> But then how it's not broken already?
> I see that
> alloc_single_from_new_slab()
> unconditionally does inc_slabs_node(), but
It increments n->nr_slabs. It doesn't matter which list it's going to be
added to, because it's total number of slabs in that node.
> slab itself is added either to n->full or n->partial lists.
and then n->nr_partial is also incremented if it's added to n->partial.
> And validate_slab_node() should be complaining already.
The debug routine checks if:
- the number of slabs in n->partial == n->nr_partial
- the number of slabs in n->full + n->partial == n->nr_slabs
under n->list_lock. So it's not broken?
> Anyway, I'm not arguing. Just trying to understand.
> If you think the fix is fine, then go ahead.
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-10-24 2:03 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-23 12:01 [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab() Vlastimil Babka
2025-10-23 12:56 ` Harry Yoo
2025-10-23 23:13 ` Alexei Starovoitov
2025-10-24 0:00 ` Harry Yoo
2025-10-24 1:17 ` Alexei Starovoitov
2025-10-24 2:03 ` Harry Yoo [this message]
2025-10-24 8:55 ` Vlastimil Babka
2025-10-24 9:36 ` Harry Yoo
2025-10-24 18:08 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPrecUasNUbEkLlS@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@kernel.org \
--cc=cl@gentwo.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.