* Re: [PATCH 0/3] slab: support memoryless nodes with sheaves
[not found] ` <8ab58ecb-1fc1-42a1-b67a-c3107de2ece4@kernel.org>
@ 2026-04-08 13:04 ` Jon Hunter
2026-04-08 14:06 ` Hao Li
2026-04-08 14:31 ` Harry Yoo (Oracle)
0 siblings, 2 replies; 5+ messages in thread
From: Jon Hunter @ 2026-04-08 13:04 UTC (permalink / raw)
To: Vlastimil Babka (SUSE), Ming Lei
Cc: Harry Yoo, Hao Li, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, linux-mm, linux-kernel,
linux-tegra@vger.kernel.org
Hi Vlastimil,
On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
> On 3/11/26 10:49, Ming Lei wrote:
>> On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
>>> This is the draft patch from [1] turned into a proper series with
>>> incremental changes. It's based on v7.0-rc3. It's too intrusive for a
>>> 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
>>> hope it's acceptable given it's a non-standard configuration, 7.0 is not
>>> a LTS, and it's a perf regression, not functionality.
>>>
>>> Ming can you please retest this on top of v7.0-rc3, which already has
>>> fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
>>> allowed"). Separate data point for v7.0-rc3 could be also useful.
>>>
>>> [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
>>>
>>> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>>> ---
>>> Vlastimil Babka (SUSE) (3):
>>> slab: decouple pointer to barn from kmem_cache_node
>>> slab: create barns for online memoryless nodes
>>> slab: free remote objects to sheaves on memoryless nodes
>>
>> Hi Vlastimil and Guys,
>>
>> I re-ran the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
>>
>> - v6.19-rc5: 34M
>>
>> - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
>>
>> - v7.0-rc3: 13M
>
> Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
> refill if blocking is not allowed" making no difference here. At least we
> just learned it helps other benchmarks :)
>
>> - v7.0-rc3 + the three patches: 24M
>
> OK. So now it might be really the total per-cpu caching capacity difference.
I have also observed a performance regression in Linux v7.0-rc for
some graphics-related tests we run. I bisected to ...
# first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
sheaves to most caches
I came across Ming's report and hence, found this series. I have also
tested the 3 patches in this series and it did appear to help with one
test, but overall I am still seeing a ~25% performance regression (the
tests are taking about 25% longer to run). I am not the owner or author
of these specific tests and I have not dived in to see exactly what is
taking longer, but I just know they are taking longer to run.
Anyway, I have not seen any recent updates on this, and so I am not sure
if there are any other updates or what the current status of this is?
If there are any more patches available I will be happy to test.
Thanks!
Jon
--
nvpublic
* Re: [PATCH 0/3] slab: support memoryless nodes with sheaves
2026-04-08 13:04 ` [PATCH 0/3] slab: support memoryless nodes with sheaves Jon Hunter
@ 2026-04-08 14:06 ` Hao Li
2026-04-09 20:02 ` Jon Hunter
2026-04-08 14:31 ` Harry Yoo (Oracle)
1 sibling, 1 reply; 5+ messages in thread
From: Hao Li @ 2026-04-08 14:06 UTC (permalink / raw)
To: Jon Hunter
Cc: Vlastimil Babka (SUSE), Ming Lei, Harry Yoo, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, linux-mm,
linux-kernel, linux-tegra@vger.kernel.org
On Wed, Apr 08, 2026 at 02:04:54PM +0100, Jon Hunter wrote:
> Hi Vlastimil,
>
> On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
> > On 3/11/26 10:49, Ming Lei wrote:
> > > On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > This is the draft patch from [1] turned into a proper series with
> > > > incremental changes. It's based on v7.0-rc3. It's too intrusive for a
> > > > 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
> > > > hope it's acceptable given it's a non-standard configuration, 7.0 is not
> > > > a LTS, and it's a perf regression, not functionality.
> > > >
> > > > Ming can you please retest this on top of v7.0-rc3, which already has
> > > > fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
> > > > allowed"). Separate data point for v7.0-rc3 could be also useful.
> > > >
> > > > [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
> > > >
> > > > Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> > > > ---
> > > > Vlastimil Babka (SUSE) (3):
> > > > slab: decouple pointer to barn from kmem_cache_node
> > > > slab: create barns for online memoryless nodes
> > > > slab: free remote objects to sheaves on memoryless nodes
> > >
> > > Hi Vlastimil and Guys,
> > >
> > > I re-ran the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
> > >
> > > - v6.19-rc5: 34M
> > >
> > > - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
> > >
> > > - v7.0-rc3: 13M
> >
> > Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
> > refill if blocking is not allowed" making no difference here. At least we
> > just learned it helps other benchmarks :)
> >
> > > - v7.0-rc3 + the three patches: 24M
> >
> > OK. So now it might be really the total per-cpu caching capacity difference.
>
>
> I have also observed a performance regression in Linux v7.0-rc for some
> graphics related tests we run. I bisected to ...
>
> # first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
> sheaves to most caches
Hi, Jon
Thanks for the report.
This first bad commit is surprising. In theory, this commit should not be
able to hurt performance.
Could you manually check out the commits on either side to verify this bad
commit again, without using git bisect?
>
> I came across Ming's report and hence, found this series. I have also tested
> the 3 patches in this series and it did appear to help with one test, but
> overall I am still seeing a ~25% performance regression (the tests are
> taking about 25% longer to run). I am not the owner or author of these
> specific tests and I have not dived in to see exactly what is taking longer,
> but I just know they are taking longer to run.
>
> Anyway, I have not seen any recent updates on this, and so I am not sure if
> there are any other updates or what the current status of this is?
>
> If there are any more patches available I will be happy to test.
>
> Thanks!
> Jon
>
> --
> nvpublic
--
Thanks,
Hao
* Re: [PATCH 0/3] slab: support memoryless nodes with sheaves
2026-04-08 13:04 ` [PATCH 0/3] slab: support memoryless nodes with sheaves Jon Hunter
2026-04-08 14:06 ` Hao Li
@ 2026-04-08 14:31 ` Harry Yoo (Oracle)
2026-04-09 20:11 ` Jon Hunter
1 sibling, 1 reply; 5+ messages in thread
From: Harry Yoo (Oracle) @ 2026-04-08 14:31 UTC (permalink / raw)
To: Jon Hunter
Cc: Vlastimil Babka (SUSE), Ming Lei, Hao Li, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, linux-mm,
linux-kernel, linux-tegra@vger.kernel.org
On Wed, Apr 08, 2026 at 02:04:54PM +0100, Jon Hunter wrote:
> Hi Vlastimil,
Hi Jon,
> On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
> > On 3/11/26 10:49, Ming Lei wrote:
> > > On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > This is the draft patch from [1] turned into a proper series with
> > > > incremental changes. It's based on v7.0-rc3. It's too intrusive for a
> > > > 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
> > > > hope it's acceptable given it's a non-standard configuration, 7.0 is not
> > > > a LTS, and it's a perf regression, not functionality.
> > > >
> > > > Ming can you please retest this on top of v7.0-rc3, which already has
> > > > fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
> > > > allowed"). Separate data point for v7.0-rc3 could be also useful.
> > > >
> > > > [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
> > > >
> > > > Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> > > > ---
> > > > Vlastimil Babka (SUSE) (3):
> > > > slab: decouple pointer to barn from kmem_cache_node
> > > > slab: create barns for online memoryless nodes
> > > > slab: free remote objects to sheaves on memoryless nodes
> > >
> > > Hi Vlastimil and Guys,
> > >
> > > I re-ran the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
> > >
> > > - v6.19-rc5: 34M
> > >
> > > - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
> > >
> > > - v7.0-rc3: 13M
> >
> > Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
> > refill if blocking is not allowed" making no difference here. At least we
> > just learned it helps other benchmarks :)
> >
> > > - v7.0-rc3 + the three patches: 24M
> >
> > OK. So now it might be really the total per-cpu caching capacity difference.
>
> I have also observed a performance regression in Linux v7.0-rc for some
> graphics related tests we run. I bisected to ...
>
> # first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
> sheaves to most caches
>
> I came across Ming's report and hence, found this series. I have also tested
> the 3 patches in this series and it did appear to help with one test, but
> overall I am still seeing a ~25% performance regression (the tests are
> taking about 25% longer to run). I am not the owner or author of these
> specific tests and I have not dived in to see exactly what is taking longer,
> but I just know they are taking longer to run.
>
> Anyway, I have not seen any recent updates on this, and so I am not sure if
> there are any other updates or what the current status of this is?
As far as I remember, we haven't fully recovered the performance
yet. Interestingly, even when most allocations go through the fastpath,
it didn't fully recover [1].
[1] https://lore.kernel.org/all/abI9DKxuwl_4Gasj@hyeyoo
I was suspecting it's probably because of:
- false sharing on something (sheaves, obj metadata, etc.), or
- suboptimal NUMA placement, or
- something outside slab involved
But I don't have enough data to back up any of these theories yet.
> If there are any more patches available I will be happy to test.
Thanks!
Before diving deeper, could you please share the NUMA topology from
`numactl -H` on your machine?
Is it a NUMA machine? (Hopefully not one with memoryless nodes!)
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH 0/3] slab: support memoryless nodes with sheaves
2026-04-08 14:06 ` Hao Li
@ 2026-04-09 20:02 ` Jon Hunter
0 siblings, 0 replies; 5+ messages in thread
From: Jon Hunter @ 2026-04-09 20:02 UTC (permalink / raw)
To: Hao Li
Cc: Vlastimil Babka (SUSE), Ming Lei, Harry Yoo, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, linux-mm,
linux-kernel, linux-tegra@vger.kernel.org
On 08/04/2026 15:06, Hao Li wrote:
> On Wed, Apr 08, 2026 at 02:04:54PM +0100, Jon Hunter wrote:
>> Hi Vlastimil,
>>
>> On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
>>> On 3/11/26 10:49, Ming Lei wrote:
>>>> On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
>>>>> This is the draft patch from [1] turned into a proper series with
>>>>> incremental changes. It's based on v7.0-rc3. It's too intrusive for a
>>>>> 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
>>>>> hope it's acceptable given it's a non-standard configuration, 7.0 is not
>>>>> a LTS, and it's a perf regression, not functionality.
>>>>>
>>>>> Ming can you please retest this on top of v7.0-rc3, which already has
>>>>> fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
>>>>> allowed"). Separate data point for v7.0-rc3 could be also useful.
>>>>>
>>>>> [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
>>>>>
>>>>> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>>>>> ---
>>>>> Vlastimil Babka (SUSE) (3):
>>>>> slab: decouple pointer to barn from kmem_cache_node
>>>>> slab: create barns for online memoryless nodes
>>>>> slab: free remote objects to sheaves on memoryless nodes
>>>>
>>>> Hi Vlastimil and Guys,
>>>>
>>>> I re-ran the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
>>>>
>>>> - v6.19-rc5: 34M
>>>>
>>>> - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
>>>>
>>>> - v7.0-rc3: 13M
>>>
>>> Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
>>> refill if blocking is not allowed" making no difference here. At least we
>>> just learned it helps other benchmarks :)
>>>
>>>> - v7.0-rc3 + the three patches: 24M
>>>
>>> OK. So now it might be really the total per-cpu caching capacity difference.
>>
>>
>> I have also observed a performance regression in Linux v7.0-rc for some
>> graphics related tests we run. I bisected to ...
>>
>> # first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
>> sheaves to most caches
>
> Hi, Jon
>
> Thanks for the report.
> This first bad commit is surprising. In theory, this commit should not be
> able to hurt performance.
> Could you manually check out the commits on either side to verify this bad
> commit again, without using git bisect?
So I went back and checked out commit
e47c897a29491ade20b27612fdd3107c39a07357 ("slab: add
sheaves to most caches") and confirmed that the problem exists there. I
then reverted that commit and confirmed that I no longer see the problem.
Jon
--
nvpublic
* Re: [PATCH 0/3] slab: support memoryless nodes with sheaves
2026-04-08 14:31 ` Harry Yoo (Oracle)
@ 2026-04-09 20:11 ` Jon Hunter
0 siblings, 0 replies; 5+ messages in thread
From: Jon Hunter @ 2026-04-09 20:11 UTC (permalink / raw)
To: Harry Yoo (Oracle)
Cc: Vlastimil Babka (SUSE), Ming Lei, Hao Li, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, linux-mm,
linux-kernel, linux-tegra@vger.kernel.org
On 08/04/2026 15:31, Harry Yoo (Oracle) wrote:
> On Wed, Apr 08, 2026 at 02:04:54PM +0100, Jon Hunter wrote:
>> Hi Vlastimil,
>
> Hi Jon,
>
>> On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
>>> On 3/11/26 10:49, Ming Lei wrote:
>>>> On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
>>>>> This is the draft patch from [1] turned into a proper series with
>>>>> incremental changes. It's based on v7.0-rc3. It's too intrusive for a
>>>>> 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
>>>>> hope it's acceptable given it's a non-standard configuration, 7.0 is not
>>>>> a LTS, and it's a perf regression, not functionality.
>>>>>
>>>>> Ming can you please retest this on top of v7.0-rc3, which already has
>>>>> fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
>>>>> allowed"). Separate data point for v7.0-rc3 could be also useful.
>>>>>
>>>>> [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
>>>>>
>>>>> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>>>>> ---
>>>>> Vlastimil Babka (SUSE) (3):
>>>>> slab: decouple pointer to barn from kmem_cache_node
>>>>> slab: create barns for online memoryless nodes
>>>>> slab: free remote objects to sheaves on memoryless nodes
>>>>
>>>> Hi Vlastimil and Guys,
>>>>
>>>> I re-ran the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
>>>>
>>>> - v6.19-rc5: 34M
>>>>
>>>> - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
>>>>
>>>> - v7.0-rc3: 13M
>>>
>>> Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
>>> refill if blocking is not allowed" making no difference here. At least we
>>> just learned it helps other benchmarks :)
>>>
>>>> - v7.0-rc3 + the three patches: 24M
>>>
>>> OK. So now it might be really the total per-cpu caching capacity difference.
>>
>> I have also observed a performance regression in Linux v7.0-rc for some
>> graphics related tests we run. I bisected to ...
>>
>> # first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
>> sheaves to most caches
>>
>> I came across Ming's report and hence, found this series. I have also tested
>> the 3 patches in this series and it did appear to help with one test, but
>> overall I am still seeing a ~25% performance regression (the tests are
>> taking about 25% longer to run). I am not the owner or author of these
>> specific tests and I have not dived in to see exactly what is taking longer,
>> but I just know they are taking longer to run.
>>
>> Anyway, I have not seen any recent updates on this, and so I am not sure if
>> there are any other updates or what the current status of this is?
>
> As far as I remember, we haven't fully recovered the performance
> yet. Interestingly, even when most allocations go through the fastpath,
> it didn't fully recover [1].
>
> [1] https://lore.kernel.org/all/abI9DKxuwl_4Gasj@hyeyoo
>
> I was suspecting it's probably because of:
> - false sharing on something (sheaves, obj metadata, etc.), or
> - suboptimal NUMA placement, or
> - something outside slab involved
>
> But I don't have enough data to back up any of these theories yet.
>
>> If there are any more patches available I will be happy to test.
>
> Thanks!
>
> Before diving deeper, could you please share the NUMA topology from
> `numactl -H` on your machine?
>
> Is it a NUMA machine? (Hopefully not one with memoryless nodes!)
This is not a NUMA machine, this is a Tegra234 Jetson AGX Orin board [0] ...
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 30517 MB
node 0 free: 29263 MB
node distances:
node 0
0: 10
Jon
[0]
https://www.nvidia.com/en-gb/autonomous-machines/embedded-systems/jetson-orin/
--
nvpublic