linux-tegra.vger.kernel.org archive mirror
* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
       [not found]   ` <0406562e-2066-4cf8-9902-b2b0616dd742@kernel.org>
@ 2025-11-27 11:38     ` Jon Hunter
  2025-11-27 11:50       ` Jon Hunter
                         ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jon Hunter @ 2025-11-27 11:38 UTC
  To: Daniel Gomez, Vlastimil Babka, Suren Baghdasaryan,
	Liam R. Howlett, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, Sidhartha Kumar,
	linux-mm, linux-kernel, rcu, maple-tree, linux-modules,
	Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Aaron Tomlin,
	Lucas De Marchi, linux-tegra@vger.kernel.org



On 31/10/2025 21:32, Daniel Gomez wrote:
> 
> 
> On 10/09/2025 10.01, Vlastimil Babka wrote:
>> Extend the sheaf infrastructure for more efficient kfree_rcu() handling.
>> For caches with sheaves, on each cpu maintain a rcu_free sheaf in
>> addition to main and spare sheaves.
>>
>> kfree_rcu() operations will try to put objects on this sheaf. Once full,
>> the sheaf is detached and submitted to call_rcu() with a handler that
>> will try to put it in the barn, or flush to slab pages using bulk free,
>> when the barn is full. Then a new empty sheaf must be obtained to put
>> more objects there.
>>
>> It's possible that no free sheaves are available to use for a new
>> rcu_free sheaf, and the allocation in kfree_rcu() context can only use
>> GFP_NOWAIT and thus may fail. In that case, fall back to the existing
>> kfree_rcu() implementation.
>>
>> Expected advantages:
>> - batching the kfree_rcu() operations, that could eventually replace the
>>    existing batching
>> - sheaves can be reused for allocations via barn instead of being
>>    flushed to slabs, which is more efficient
>>    - this includes cases where only some cpus are allowed to process rcu
>>      callbacks (Android)
>>
>> Possible disadvantage:
>> - objects might be waiting for more than their grace period (it is
>>    determined by the last object freed into the sheaf), increasing memory
>>    usage - but the existing batching does that too.
>>
>> Only implement this for CONFIG_KVFREE_RCU_BATCHED as the tiny
>> implementation favors smaller memory footprint over performance.
>>
>> Also for now skip the usage of rcu sheaf for CONFIG_PREEMPT_RT as the
>> contexts where kfree_rcu() is called might not be compatible with taking
>> a barn spinlock or a GFP_NOWAIT allocation of a new sheaf taking a
>> spinlock - the current kfree_rcu() implementation avoids doing that.
>>
>> Teach kvfree_rcu_barrier() to flush all rcu_free sheaves from all caches
>> that have them. This is not a cheap operation, but the barrier usage is
>> rare - currently kmem_cache_destroy() or on module unload.
>>
>> Add CONFIG_SLUB_STATS counters free_rcu_sheaf and free_rcu_sheaf_fail to
>> count how many kfree_rcu() used the rcu_free sheaf successfully and how
>> many had to fall back to the existing implementation.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Hi Vlastimil,
> 
> This patch increases kmod selftest (stress module loader) runtime by about
> ~50-60%, from ~200s to ~300s total execution time. My tested kernel has
> CONFIG_KVFREE_RCU_BATCHED enabled. Any idea or suggestions on what might be
> causing this, or how to address it?
> 

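The rcu_free sheaf flow in the quoted patch description can be modelled roughly as below. This is only a user-space sketch in Python, not the kernel code: the class names, the `SHEAF_CAPACITY` and `max_full` values, and the `can_allocate` flag (standing in for a GFP_NOWAIT allocation succeeding or failing) are all assumptions made for illustration. Only the overall flow and the two counter names come from the description.

```python
from dataclasses import dataclass, field

SHEAF_CAPACITY = 4  # hypothetical value chosen for the sketch


@dataclass
class Sheaf:
    objects: list = field(default_factory=list)

    def is_full(self):
        return len(self.objects) >= SHEAF_CAPACITY


@dataclass
class Barn:
    """Shared store of sheaves, modelled as two plain lists."""
    empty: list = field(default_factory=list)
    full: list = field(default_factory=list)
    max_full: int = 2  # hypothetical limit on stored full sheaves


class CpuCache:
    """Model of one CPU's rcu_free sheaf for one cache."""

    def __init__(self, barn, can_allocate=True):
        self.barn = barn
        self.can_allocate = can_allocate  # stands in for GFP_NOWAIT succeeding
        self.rcu_free = None
        self.pending_rcu = []      # sheaves submitted to call_rcu()
        self.fallback = []         # objects sent to the existing kfree_rcu() path
        self.flushed_to_slab = []  # objects bulk-freed because the barn was full
        self.stats = {"free_rcu_sheaf": 0, "free_rcu_sheaf_fail": 0}

    def _get_empty_sheaf(self):
        # Prefer an empty sheaf from the barn, otherwise try to allocate one.
        if self.barn.empty:
            return self.barn.empty.pop()
        return Sheaf() if self.can_allocate else None

    def kfree_rcu(self, obj):
        if self.rcu_free is None:
            self.rcu_free = self._get_empty_sheaf()
        if self.rcu_free is None:
            # No sheaf available: fall back to the existing implementation.
            self.stats["free_rcu_sheaf_fail"] += 1
            self.fallback.append(obj)
            return False
        self.rcu_free.objects.append(obj)
        self.stats["free_rcu_sheaf"] += 1
        if self.rcu_free.is_full():
            # Detach the full sheaf and hand it to the RCU callback queue.
            self.pending_rcu.append(self.rcu_free)
            self.rcu_free = None
        return True

    def grace_period_elapsed(self):
        """Model the call_rcu() handler running for each pending sheaf."""
        for sheaf in self.pending_rcu:
            if len(self.barn.full) < self.barn.max_full:
                self.barn.full.append(sheaf)  # reusable for later allocations
            else:
                self.flushed_to_slab.extend(sheaf.objects)
        self.pending_rcu.clear()
```

In this model every object in a sheaf waits until the whole sheaf has gone through a grace period, which corresponds to the "possible disadvantage" noted in the description. A barrier in this model would have to detach and process the partially filled `rcu_free` sheaf of every `CpuCache`, which hints at why the description calls the barrier "not a cheap operation".
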
I have been looking into a regression for Linux v6.18-rc where time 
taken to run some internal graphics tests on our Tegra234 device has 
increased from around 35% causing the tests to timeout. Bisect is 
pointing to this commit and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.

I have not tried disabling CONFIG_KVFREE_RCU_BATCHED=y but I can. I am 
not sure if there are any downsides to disabling this?

Thanks
Jon

-- 
nvpublic



* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
  2025-11-27 11:38     ` [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations Jon Hunter
@ 2025-11-27 11:50       ` Jon Hunter
  2025-11-27 12:33       ` Harry Yoo
  2025-11-27 13:18       ` Vlastimil Babka
  2 siblings, 0 replies; 8+ messages in thread
From: Jon Hunter @ 2025-11-27 11:50 UTC
  To: Daniel Gomez, Vlastimil Babka, Suren Baghdasaryan,
	Liam R. Howlett, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, Sidhartha Kumar,
	linux-mm, linux-kernel, rcu, maple-tree, linux-modules,
	Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Aaron Tomlin,
	Lucas De Marchi, linux-tegra@vger.kernel.org


On 27/11/2025 11:38, Jon Hunter wrote:
> 
> 
> On 31/10/2025 21:32, Daniel Gomez wrote:
>>
>>
>> On 10/09/2025 10.01, Vlastimil Babka wrote:
>>
>> Hi Vlastimil,
>>
>> This patch increases kmod selftest (stress module loader) runtime by 
>> about
>> ~50-60%, from ~200s to ~300s total execution time. My tested kernel has
>> CONFIG_KVFREE_RCU_BATCHED enabled. Any idea or suggestions on what 
>> might be
>> causing this, or how to address it?
>>
> 
> I have been looking into a regression for Linux v6.18-rc where time 
> taken to run some internal graphics tests on our Tegra234 device has 
> increased from around 35% causing the tests to timeout. Bisect is 

I meant 'increased by around 35%'.

> pointing to this commit and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.
> 
> I have not tried disabling CONFIG_KVFREE_RCU_BATCHED=y but I can. I am 
> not sure if there are any downsides to disabling this?
> 
> Thanks
> Jon
> 

-- 
nvpublic



* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
  2025-11-27 11:38     ` [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations Jon Hunter
  2025-11-27 11:50       ` Jon Hunter
@ 2025-11-27 12:33       ` Harry Yoo
  2025-11-27 12:48         ` Harry Yoo
  2025-11-27 13:18       ` Vlastimil Babka
  2 siblings, 1 reply; 8+ messages in thread
From: Harry Yoo @ 2025-11-27 12:33 UTC
  To: Jon Hunter
  Cc: Daniel Gomez, Vlastimil Babka, Suren Baghdasaryan,
	Liam R. Howlett, Christoph Lameter, David Rientjes,
	Roman Gushchin, Uladzislau Rezki, Sidhartha Kumar, linux-mm,
	linux-kernel, rcu, maple-tree, linux-modules, Luis Chamberlain,
	Petr Pavlu, Sami Tolvanen, Aaron Tomlin, Lucas De Marchi,
	linux-tegra@vger.kernel.org

On Thu, Nov 27, 2025 at 11:38:49AM +0000, Jon Hunter wrote:
> 
> 
> On 31/10/2025 21:32, Daniel Gomez wrote:
> > 
> > 
> > On 10/09/2025 10.01, Vlastimil Babka wrote:
> > 
> > Hi Vlastimil,
> > 
> > This patch increases kmod selftest (stress module loader) runtime by about
> > ~50-60%, from ~200s to ~300s total execution time. My tested kernel has
> > CONFIG_KVFREE_RCU_BATCHED enabled. Any idea or suggestions on what might be
> > causing this, or how to address it?
> > 
> 
> I have been looking into a regression for Linux v6.18-rc where time taken to
> run some internal graphics tests on our Tegra234 device has increased from
> around 35% causing the tests to timeout. Bisect is pointing to this commit
> and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.

Thanks for reporting! Uh, this has been put aside while I was busy working
on other stuff... but now that we have two people complaining about this,
I'll allocate some time to investigate and improve it.

It'll take some time though :)

> I have not tried disabling CONFIG_KVFREE_RCU_BATCHED=y but I can. I am not
> sure if there are any downsides to disabling this?

I would not recommend doing that, unless you want to sacrifice overall
performance just for the test. Disabling it could create too many RCU
grace periods in the system.

> 
> Thanks
> Jon
> 
> -- 
> nvpublic

-- 
Cheers,
Harry / Hyeonggon


* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
  2025-11-27 12:33       ` Harry Yoo
@ 2025-11-27 12:48         ` Harry Yoo
  2025-11-28  8:57           ` Jon Hunter
  0 siblings, 1 reply; 8+ messages in thread
From: Harry Yoo @ 2025-11-27 12:48 UTC
  To: Jon Hunter
  Cc: Daniel Gomez, Vlastimil Babka, Suren Baghdasaryan,
	Liam R. Howlett, Christoph Lameter, David Rientjes,
	Roman Gushchin, Uladzislau Rezki, Sidhartha Kumar, linux-mm,
	linux-kernel, rcu, maple-tree, linux-modules, Luis Chamberlain,
	Petr Pavlu, Sami Tolvanen, Aaron Tomlin, Lucas De Marchi,
	linux-tegra@vger.kernel.org

On Thu, Nov 27, 2025 at 09:33:46PM +0900, Harry Yoo wrote:
> On Thu, Nov 27, 2025 at 11:38:49AM +0000, Jon Hunter wrote:
> > 
> > 
> > On 31/10/2025 21:32, Daniel Gomez wrote:
> > > 
> > > 
> > > On 10/09/2025 10.01, Vlastimil Babka wrote:
> > > 
> > > Hi Vlastimil,
> > > 
> > > This patch increases kmod selftest (stress module loader) runtime by about
> > > ~50-60%, from ~200s to ~300s total execution time. My tested kernel has
> > > CONFIG_KVFREE_RCU_BATCHED enabled. Any idea or suggestions on what might be
> > > causing this, or how to address it?
> > > 
> > 
> > I have been looking into a regression for Linux v6.18-rc where time taken to
> > run some internal graphics tests on our Tegra234 device has increased from
> > around 35% causing the tests to timeout. Bisect is pointing to this commit
> > and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.
> 
> Thanks for reporting! Uh, this has been put aside while I was busy working
> on other stuff... but now that we have two people complaining about this,
> I'll allocate some time to investigate and improve it.
> 
> It'll take some time though :)

By the way, how many CPUs do you have on your system, and does your
kernel have CONFIG_CODE_TAGGING enabled?

-- 
Cheers,
Harry / Hyeonggon


* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
  2025-11-27 11:38     ` [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations Jon Hunter
  2025-11-27 11:50       ` Jon Hunter
  2025-11-27 12:33       ` Harry Yoo
@ 2025-11-27 13:18       ` Vlastimil Babka
  2025-11-28  8:59         ` Jon Hunter
  2 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2025-11-27 13:18 UTC
  To: Jon Hunter, Daniel Gomez, Suren Baghdasaryan, Liam R. Howlett,
	Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, Sidhartha Kumar,
	linux-mm, linux-kernel, rcu, maple-tree, linux-modules,
	Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Aaron Tomlin,
	Lucas De Marchi, linux-tegra@vger.kernel.org

On 11/27/25 12:38, Jon Hunter wrote:
> 
> 
> On 31/10/2025 21:32, Daniel Gomez wrote:
>> 
>> 
>> On 10/09/2025 10.01, Vlastimil Babka wrote:
>> 
>> Hi Vlastimil,
>> 
>> This patch increases kmod selftest (stress module loader) runtime by about
>> ~50-60%, from ~200s to ~300s total execution time. My tested kernel has
>> CONFIG_KVFREE_RCU_BATCHED enabled. Any idea or suggestions on what might be
>> causing this, or how to address it?
>> 
> 
> I have been looking into a regression for Linux v6.18-rc where time 
> taken to run some internal graphics tests on our Tegra234 device has 
> increased from around 35% causing the tests to timeout. Bisect is 
> pointing to this commit and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.

Do the tegra tests involve (frequent) module unloads too, then? Or calling
kmem_cache_destroy() somewhere?

Thanks,
Vlastimil

> I have not tried disabling CONFIG_KVFREE_RCU_BATCHED=y but I can. I am 
> not sure if there are any downsides to disabling this?
> 
> Thanks
> Jon
> 



* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
  2025-11-27 12:48         ` Harry Yoo
@ 2025-11-28  8:57           ` Jon Hunter
  2025-12-01  6:55             ` Harry Yoo
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Hunter @ 2025-11-28  8:57 UTC
  To: Harry Yoo
  Cc: Daniel Gomez, Vlastimil Babka, Suren Baghdasaryan,
	Liam R. Howlett, Christoph Lameter, David Rientjes,
	Roman Gushchin, Uladzislau Rezki, Sidhartha Kumar, linux-mm,
	linux-kernel, rcu, maple-tree, linux-modules, Luis Chamberlain,
	Petr Pavlu, Sami Tolvanen, Aaron Tomlin, Lucas De Marchi,
	linux-tegra@vger.kernel.org


On 27/11/2025 12:48, Harry Yoo wrote:

...

>>> I have been looking into a regression for Linux v6.18-rc where time taken to
>>> run some internal graphics tests on our Tegra234 device has increased from
>>> around 35% causing the tests to timeout. Bisect is pointing to this commit
>>> and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.
>>
>> Thanks for reporting! Uh, this has been put aside while I was busy working
>> on other stuff... but now that we have two people complaining about this,
>> I'll allocate some time to investigate and improve it.
>>
>> It'll take some time though :)
> 
> By the way, how many CPUs do you have on your system, and does your
> kernel have CONFIG_CODE_TAGGING enabled?

For this device there are 12 CPUs. I don't see CONFIG_CODE_TAGGING enabled.

Thanks
Jon

-- 
nvpublic



* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
  2025-11-27 13:18       ` Vlastimil Babka
@ 2025-11-28  8:59         ` Jon Hunter
  0 siblings, 0 replies; 8+ messages in thread
From: Jon Hunter @ 2025-11-28  8:59 UTC
  To: Vlastimil Babka, Daniel Gomez, Suren Baghdasaryan,
	Liam R. Howlett, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, Sidhartha Kumar,
	linux-mm, linux-kernel, rcu, maple-tree, linux-modules,
	Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Aaron Tomlin,
	Lucas De Marchi, linux-tegra@vger.kernel.org


On 27/11/2025 13:18, Vlastimil Babka wrote:
> On 11/27/25 12:38, Jon Hunter wrote:
>>
>>
>> On 31/10/2025 21:32, Daniel Gomez wrote:
>>>
>>>
>>> On 10/09/2025 10.01, Vlastimil Babka wrote:
>>>
>>> Hi Vlastimil,
>>>
>>> This patch increases kmod selftest (stress module loader) runtime by about
>>> ~50-60%, from ~200s to ~300s total execution time. My tested kernel has
>>> CONFIG_KVFREE_RCU_BATCHED enabled. Any idea or suggestions on what might be
>>> causing this, or how to address it?
>>>
>>
>> I have been looking into a regression for Linux v6.18-rc where time
>> taken to run some internal graphics tests on our Tegra234 device has
>> increased from around 35% causing the tests to timeout. Bisect is
>> pointing to this commit and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.
> 
> Do the tegra tests involve (frequent) module unloads too, then? Or calling
> kmem_cache_destroy() somewhere?

In this specific case I am not running the tegra-tests but we have an
internal test suite of GPU-related tests. I don't believe that this is
unloading any modules. I can take a look next week to see if
kmem_cache_destroy() is getting called somewhere when these tests run.

Thanks
Jon

-- 
nvpublic



* Re: [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations
  2025-11-28  8:57           ` Jon Hunter
@ 2025-12-01  6:55             ` Harry Yoo
  0 siblings, 0 replies; 8+ messages in thread
From: Harry Yoo @ 2025-12-01  6:55 UTC
  To: Jon Hunter
  Cc: Daniel Gomez, Vlastimil Babka, Suren Baghdasaryan,
	Liam R. Howlett, Christoph Lameter, David Rientjes,
	Roman Gushchin, Uladzislau Rezki, Sidhartha Kumar, linux-mm,
	linux-kernel, rcu, maple-tree, linux-modules, Luis Chamberlain,
	Petr Pavlu, Sami Tolvanen, Aaron Tomlin, Lucas De Marchi,
	linux-tegra@vger.kernel.org

On Fri, Nov 28, 2025 at 08:57:28AM +0000, Jon Hunter wrote:
> 
> On 27/11/2025 12:48, Harry Yoo wrote:
> 
> ...
> 
> > > > I have been looking into a regression for Linux v6.18-rc where time taken to
> > > > run some internal graphics tests on our Tegra234 device has increased from
> > > > around 35% causing the tests to timeout. Bisect is pointing to this commit
> > > > and I also see we have CONFIG_KVFREE_RCU_BATCHED=y.
> > > 
> > > Thanks for reporting! Uh, this has been put aside while I was busy working
> > > on other stuff... but now that we have two people complaining about this,
> > > I'll allocate some time to investigate and improve it.
> > > 
> > > It'll take some time though :)
> > 
> > By the way, how many CPUs do you have on your system, and does your
> > kernel have CONFIG_CODE_TAGGING enabled?
> 
> For this device there are 12 CPUs. I don't see CONFIG_CODE_TAGGING enabled.

Thanks! Then it's probably due to kmem_cache_destroy().
Please let me know if this patch improves your test execution time.

https://lore.kernel.org/linux-mm/20251128113740.90129-1-harry.yoo@oracle.com/

-- 
Cheers,
Harry / Hyeonggon


Thread overview: 8+ messages
     [not found] <20250910-slub-percpu-caches-v8-0-ca3099d8352c@suse.cz>
     [not found] ` <20250910-slub-percpu-caches-v8-4-ca3099d8352c@suse.cz>
     [not found]   ` <0406562e-2066-4cf8-9902-b2b0616dd742@kernel.org>
2025-11-27 11:38     ` [PATCH v8 04/23] slab: add sheaf support for batching kfree_rcu() operations Jon Hunter
2025-11-27 11:50       ` Jon Hunter
2025-11-27 12:33       ` Harry Yoo
2025-11-27 12:48         ` Harry Yoo
2025-11-28  8:57           ` Jon Hunter
2025-12-01  6:55             ` Harry Yoo
2025-11-27 13:18       ` Vlastimil Babka
2025-11-28  8:59         ` Jon Hunter
