From: Harry Yoo <harry.yoo@oracle.com>
To: linux-mm@kvack.org
Cc: Dmitry Vyukov <dvyukov@google.com>,
lkmm@lists.linux.dev, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org,
Joel Fernandes <joelagnelf@nvidia.com>,
Daniel Lustig <dlustig@nvidia.com>,
Akira Yokosawa <akiyks@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Luc Maranget <luc.maranget@inria.fr>,
Jade Alglave <j.alglave@ucl.ac.uk>,
David Howells <dhowells@redhat.com>,
Nicholas Piggin <npiggin@gmail.com>,
Boqun Feng <boqun@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Will Deacon <will@kernel.org>,
Andrea Parri <parri.andrea@gmail.com>,
Alan Stern <stern@rowland.harvard.edu>,
Pedro Falcato <pfalcato@suse.de>,
Vlastimil Babka <vbabka@suse.cz>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hao Li <hao.li@linux.dev>, Shakeel Butt <shakeel.butt@linux.dev>,
Venkat Rao Bagalkote <venkat88@linux.ibm.com>,
Mateusz Guzik <mjguzik@gmail.com>,
Suren Baghdasaryan <surenb@google.com>,
Marco Elver <elver@google.com>
Subject: Re: [BUG] Memory ordering between kmalloc() and kfree()? it's confusing!
Date: Fri, 6 Mar 2026 11:46:29 +0900
Message-ID: <aapABVbVYNwhEV55@hyeyoo>
In-Reply-To: <aZ_lJAqxh_hNGr_v@hyeyoo>
On Thu, Feb 26, 2026 at 03:35:08PM +0900, Harry Yoo wrote:
> Hello, SLAB, LKMM, and KCSAN folks!
[...snip...]
> # Now, let's take a look at the bug I've been investigating
>
> There were two bugs [3] [4] reported, with symptoms that appear to be
> caused by slab returning wrong metadata (the symptoms: incorrect
> reference counting of obj_cgroup, integer overflow as more memory is
> uncharged than charged).
>
> [3] https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com
> [4] https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@linux.ibm.com
>
> Hmm, if it's returning wrong metadata, how could that happen?
>
> Well, perhaps it's either 1) the calculation of metadata address is
> incorrect, or 2) reading the metadata itself is racy.
>
> Shakeel Butt pointed out [9] that there's a potential memory ordering
> issue. It suggests that the lack of enforced ordering between
> slab->obj_exts and slab->stride can make the metadata address
> calculation incorrect.
>
> [9] https://lore.kernel.org/lkml/aZu9G9mVIVzSm6Ft@hyeyoo
>
> Let's say CPU X and Y are allocating/freeing slab objects from/to
> the same slab. They need to access metadata for the objects:
>
> CPU X                                   CPU Y
>
> // CPU X allocates metadata array
> - slab->obj_exts = <the address of the metadata array>
> - slab->stride = 16 (sizeof(struct slabobj_ext))
>
> - stride = plain load slab->stride
> - obj_exts = READ_ONCE(slab->obj_exts)
> - if (obj_exts)
>   - metadata_addr =
>       stride * index + obj_exts
>                                         - stride = plain load slab->stride
>                                         - obj_exts = READ_ONCE(slab->obj_exts)
>                                         - if (obj_exts)
>                                           - metadata_addr = stride * index +
>                                                             obj_exts
>
>                                         // Wait, obj_exts is non-NULL,
>                                         // but slab->stride is stale!
>
>                                         // Now, metadata_addr is wrong.
>
> Hmm, this could definitely happen when two CPUs try to allocate/free
> objects from/to the same slab. We need to make sure that CPUs cannot
> see a stale slab->stride as long as slab->obj_exts is not NULL.
>
> # How I tried to fix it
>
> An expensive solution would be to do:
>
>   CPU X:                              CPU Y:
>   - slab->stride = 16                 - READ_ONCE(slab->obj_exts)
>   - smp_wmb()                         - if (obj_exts)
>   - slab->obj_exts = <something>      - smp_rmb()
>                                       - plain load slab->stride
>
> Then, CPU Y should see either (obj_exts == 0), or
> (obj_exts != 0 and a valid stride). (obj_exts != 0) && (invalid stride)
> is impossible.
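
For readers less familiar with this pairing, here is a minimal user-space
sketch of the pattern quoted above, written with C11 atomics rather than the
kernel primitives. Everything in it is an assumption made for illustration:
struct fake_slab, publish_obj_exts() and metadata_addr() are hypothetical
names, and the release/acquire fences only approximate smp_wmb()/smp_rmb()
(they are somewhat stronger). It is not the actual slab code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct slab fields discussed above. */
struct fake_slab {
	_Atomic uintptr_t obj_exts;	/* 0 means "no metadata array yet" */
	unsigned int stride;		/* plain field, published via obj_exts */
};

/* Writer side (CPU X): store stride first, then publish obj_exts. */
static void publish_obj_exts(struct fake_slab *slab, uintptr_t array,
			     unsigned int stride)
{
	assert(array != 0);	/* 0 is reserved for "not published" */
	slab->stride = stride;
	/* Approximates smp_wmb(): the stride store is ordered before the
	 * obj_exts store below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&slab->obj_exts, array, memory_order_relaxed);
}

/* Reader side (CPU Y): returns 0 if no array is visible yet, otherwise
 * the metadata address for the object at the given index. */
static uintptr_t metadata_addr(struct fake_slab *slab, size_t index)
{
	uintptr_t obj_exts = atomic_load_explicit(&slab->obj_exts,
						  memory_order_relaxed);

	if (!obj_exts)
		return 0;
	/* Approximates smp_rmb(): the obj_exts load is ordered before the
	 * plain stride load below. */
	atomic_thread_fence(memory_order_acquire);
	return obj_exts + (uintptr_t)slab->stride * index;
}
```

With this ordering, a reader that observes a non-zero obj_exts also observes
the stride stored before it. Note that the stride is loaded only after the
obj_exts check, unlike the racy sequence in the first diagram.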
>
> This fix [5] seems to resolve the bug [6], yay!
>
> Before testing this fix, I wasn't fully convinced that it was a memory
> ordering issue. But after testing it, it seems reasonable to assume that
> it's indeed a memory ordering issue.

Apologies for the delay. I had to confirm that there was some confusion
in the analysis above.

It turns out that the smp_wmb() + smp_rmb() pair didn't really fix the
underlying problem [10]. The confusion was my assumption that the
reported bugs [5] [7] were actually caused by the lack of enforced
memory ordering.

It's true that there was a theoretical memory ordering issue (now fixed
in 7.0-rc2 [7]), but the stride value was invalid because stride's type
was unsigned short, which was too small [9] [11].

So my previous argument that "probably there is a user that violates
slab's assumption" becomes invalid. That's a relief ;)

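
To illustrate what "too small" means here, below is a minimal user-space
sketch of how a stride that does not fit in an unsigned short silently
truncates and produces a wrong metadata offset. The helper names and the
example value 65552 are hypothetical and chosen only for the demonstration;
they are not taken from the reports or from the actual fix.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The sketch assumes a 16-bit unsigned short, as on Linux targets. */
static_assert(USHRT_MAX == 65535, "sketch assumes 16-bit unsigned short");

/* Offset computed with the stride kept in an unsigned short: values
 * above 65535 wrap modulo 65536 when converted. */
static size_t offset_with_short_stride(size_t stride, size_t index)
{
	unsigned short s = (unsigned short)stride;

	return (size_t)s * index;
}

/* Same computation with a wider stride type (assumed for illustration). */
static size_t offset_with_wide_stride(size_t stride, size_t index)
{
	unsigned int s = (unsigned int)stride;

	return (size_t)s * index;
}
```

For a small stride such as 16, both helpers agree. For a hypothetical stride
of 65552 (65536 + 16), the short version behaves as if the stride were 16,
so index 2 yields offset 32 instead of 131104.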
> [5] https://lore.kernel.org/linux-mm/aZ2Gwie5dpXotxWc@hyeyoo
> [6] https://lore.kernel.org/linux-mm/84492f08-04c2-485c-9a18-cdafd5a9c3e5@linux.ibm.com
[9] https://lore.kernel.org/linux-mm/20260303135722.2680521-1-harry.yoo@oracle.com
[10] https://lore.kernel.org/linux-mm/aaj--Lej6kWE0aV-@hyeyoo
[11] https://lore.kernel.org/linux-mm/41f1c856-2c41-4d11-96e6-079d95d8efbb@linux.ibm.com
--
Cheers,
Harry / Hyeonggon