* [ISSUE] cgroup: test_percpu_basic fails on PREEMPT_RT due to lazy percpu stat flushing
@ 2026-03-11 8:49 Lucas Liu
2026-03-11 14:17 ` Waiman Long
0 siblings, 1 reply; 5+ messages in thread
From: Lucas Liu @ 2026-03-11 8:49 UTC (permalink / raw)
To: cgroups, linux-kselftest
Hi, I recently hit this issue:
./test_kmem
ok 1 test_kmem_basic
ok 2 test_kmem_memcg_deletion
ok 3 test_kmem_proc_kpagecgroup
ok 4 test_kmem_kernel_stacks
ok 5 test_kmem_dead_cgroups
memory.current 24514560
percpu 15280000
not ok 6 test_percpu_basic
In this test, memory.current is 24514560 and percpu is 15280000, a diff of ~9.2MB.
#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
With this definition (8 CPUs), MAX_VMSTAT_ERROR is 2MB of memory. On the RT
kernel, labs(current - percpu) is 9.2MB, which is the root cause of this
failure. I am not sure what value is suitable for this case (2MB per CPU,
maybe?).
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [ISSUE] cgroup: test_percpu_basic fails on PREEMPT_RT due to lazy percpu stat flushing
2026-03-11 8:49 [ISSUE] cgroup: test_percpu_basic fails on PREEMPT_RT due to lazy percpu stat flushing Lucas Liu
@ 2026-03-11 14:17 ` Waiman Long
2026-03-12 6:27 ` Lucas Liu
2026-03-12 10:18 ` Li Wang
0 siblings, 2 replies; 5+ messages in thread
From: Waiman Long @ 2026-03-11 14:17 UTC (permalink / raw)
To: Lucas Liu, cgroups, linux-kselftest; +Cc: Li Wang
On 3/11/26 4:49 AM, Lucas Liu wrote:
> Hi, I recently hit this issue:
> ./test_kmem
> ok 1 test_kmem_basic
> ok 2 test_kmem_memcg_deletion
> ok 3 test_kmem_proc_kpagecgroup
> ok 4 test_kmem_kernel_stacks
> ok 5 test_kmem_dead_cgroups
> memory.current 24514560
> percpu 15280000
> not ok 6 test_percpu_basic
>
> In this test, memory.current is 24514560 and percpu is 15280000, a diff of ~9.2MB.
>
> #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
>
> With this definition (8 CPUs), MAX_VMSTAT_ERROR is 2MB of memory. On the RT
> kernel, labs(current - percpu) is 9.2MB, which is the root cause of this
> failure. I am not sure what value is suitable for this case (2MB per CPU,
> maybe?).
Li Wang had posted patches to address some of the problems in this test.
https://lore.kernel.org/lkml/20260306071843.149147-2-liwang@redhat.com/
It could be that lazy percpu stat flushing is also a factor here. In that
case, we may need to re-read the stat counters several times with some
delay in between to solve this problem.
Cheers,
Longman
* Re: [ISSUE] cgroup: test_percpu_basic fails on PREEMPT_RT due to lazy percpu stat flushing
2026-03-11 14:17 ` Waiman Long
@ 2026-03-12 6:27 ` Lucas Liu
2026-03-12 10:18 ` Li Wang
1 sibling, 0 replies; 5+ messages in thread
From: Lucas Liu @ 2026-03-12 6:27 UTC (permalink / raw)
To: Waiman Long; +Cc: cgroups, linux-kselftest, Li Wang
Hi Waiman,
Thanks for responding. I tried Li Wang's patch, and the problem is fixed.
# ./test_kmem
ok 1 test_kmem_basic
ok 2 test_kmem_memcg_deletion
ok 3 test_kmem_proc_kpagecgroup
ok 4 test_kmem_kernel_stacks
ok 5 test_kmem_dead_cgroups
ok 6 test_percpu_basic
[root@localhost cgroup]# bash run.sh
run 100 times...
--------------------------------------
proccess: 100/100 status: [ OK ] failure: 0
--------------------------------------
done
overall: 100
ok: 100
fail: 0
As for the lazy percpu stat flushing, I assume this is expected behavior
on RT kernels? If so, can Li Wang's patch be our final solution? Please
correct me if I am wrong.
Thanks
On Wed, Mar 11, 2026 at 10:17 PM Waiman Long <longman@redhat.com> wrote:
>
> On 3/11/26 4:49 AM, Lucas Liu wrote:
> > Hi, I recently hit this issue:
> > ./test_kmem
> > ok 1 test_kmem_basic
> > ok 2 test_kmem_memcg_deletion
> > ok 3 test_kmem_proc_kpagecgroup
> > ok 4 test_kmem_kernel_stacks
> > ok 5 test_kmem_dead_cgroups
> > memory.current 24514560
> > percpu 15280000
> > not ok 6 test_percpu_basic
> >
> > In this test, memory.current is 24514560 and percpu is 15280000, a diff of ~9.2MB.
> >
> > #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
> >
> > With this definition (8 CPUs), MAX_VMSTAT_ERROR is 2MB of memory. On the RT
> > kernel, labs(current - percpu) is 9.2MB, which is the root cause of this
> > failure. I am not sure what value is suitable for this case (2MB per CPU,
> > maybe?).
>
> Li Wang had posted patches to address some of the problems in this test.
>
> https://lore.kernel.org/lkml/20260306071843.149147-2-liwang@redhat.com/
>
> It could be that lazy percpu stat flushing is also a factor here. In that
> case, we may need to re-read the stat counters several times with some
> delay in between to solve this problem.
>
> Cheers,
> Longman
>
* Re: [ISSUE] cgroup: test_percpu_basic fails on PREEMPT_RT due to lazy percpu stat flushing
2026-03-11 14:17 ` Waiman Long
2026-03-12 6:27 ` Lucas Liu
@ 2026-03-12 10:18 ` Li Wang
2026-03-12 10:30 ` Li Wang
1 sibling, 1 reply; 5+ messages in thread
From: Li Wang @ 2026-03-12 10:18 UTC (permalink / raw)
To: Waiman Long, Lucas Liu; +Cc: cgroups, linux-kselftest, Li Wang
Waiman Long wrote:
> On 3/11/26 4:49 AM, Lucas Liu wrote:
> > Hi, I recently hit this issue:
> > ./test_kmem
> > ok 1 test_kmem_basic
> > ok 2 test_kmem_memcg_deletion
> > ok 3 test_kmem_proc_kpagecgroup
> > ok 4 test_kmem_kernel_stacks
> > ok 5 test_kmem_dead_cgroups
> > memory.current 24514560
> > percpu 15280000
> > not ok 6 test_percpu_basic
> >
> > In this test, memory.current is 24514560 and percpu is 15280000, a diff of ~9.2MB.
> >
> > #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
> >
> > With this definition (8 CPUs), MAX_VMSTAT_ERROR is 2MB of memory. On the RT
> > kernel, labs(current - percpu) is 9.2MB, which is the root cause of this
> > failure. I am not sure what value is suitable for this case (2MB per CPU,
> > maybe?).
>
> Li Wang had posted patches to address some of the problems in this test.
>
> https://lore.kernel.org/lkml/20260306071843.149147-2-liwang@redhat.com/
>
> It could be that lazy percpu stat flushing is also a factor here. In that
> case, we may need to re-read the stat counters several times with some
> delay in between to solve this problem.
When memory.stat is read, the kernel calls mem_cgroup_flush_stats(), which
invokes cgroup_rstat_flush() to drain per-cpu counters before returning
results. So in the normal read path, stats are flushed; they aren't
arbitrarily stale at the point this test reads them.
The "lazy" aspect, as I understand it, is that the flush may sometimes be
skipped: __mem_cgroup_flush_stats() skips the flush if the total pending
update is below a threshold, i.e.
static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
{
	return atomic64_read(&vmstats->stats_updates) >
	       MEMCG_CHARGE_BATCH * num_online_cpus();
}
So the "lazy" skip could matter on a machine with many CPUs, where that
threshold becomes non-trivial and could contribute a few MB of discrepancy.
But my failure was observed on a 3-CPU box, where the "lazy" skip shouldn't
apply:
# ./test_kmem
TAP version 13
1..6
ok 1 test_kmem_basic
ok 2 test_kmem_memcg_deletion
ok 3 test_kmem_proc_kpagecgroup
ok 4 test_kmem_kernel_stacks
ok 5 test_kmem_dead_cgroups
memory.current 11530240
percpu 8440000
not ok 6 test_percpu_basic
# Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
# uname -r
6.12.0-211.el10.aarch64
# getconf PAGE_SIZE
4096
# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 3
On-line CPU(s) list: 0-2
...
Even on Lucas's test system (8 CPUs), assuming a 4k page size, the
lazy-flush threshold is only 2MB, far less than the observed gap:
64 × 8 = 512 pages = 512 × 4096 = 2MB
Based on the above two test results, the deviation produced by lazy
flushing does not look like the root cause.
--
Regards,
Li Wang
* Re: [ISSUE] cgroup: test_percpu_basic fails on PREEMPT_RT due to lazy percpu stat flushing
2026-03-12 10:18 ` Li Wang
@ 2026-03-12 10:30 ` Li Wang
0 siblings, 0 replies; 5+ messages in thread
From: Li Wang @ 2026-03-12 10:30 UTC (permalink / raw)
To: Waiman Long, Lucas Liu; +Cc: cgroups, linux-kselftest, Li Wang
On Thu, Mar 12, 2026 at 06:18:09PM +0800, Li Wang wrote:
> Waiman Long wrote:
>
> > On 3/11/26 4:49 AM, Lucas Liu wrote:
> > > Hi, I recently hit this issue:
> > > ./test_kmem
> > > ok 1 test_kmem_basic
> > > ok 2 test_kmem_memcg_deletion
> > > ok 3 test_kmem_proc_kpagecgroup
> > > ok 4 test_kmem_kernel_stacks
> > > ok 5 test_kmem_dead_cgroups
> > > memory.current 24514560
> > > percpu 15280000
> > > not ok 6 test_percpu_basic
> > >
> > > In this test, memory.current is 24514560 and percpu is 15280000, a diff of ~9.2MB.
> > >
> > > #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
> > >
> > > With this definition (8 CPUs), MAX_VMSTAT_ERROR is 2MB of memory. On the RT
> > > kernel, labs(current - percpu) is 9.2MB, which is the root cause of this
> > > failure. I am not sure what value is suitable for this case (2MB per CPU,
> > > maybe?).
> >
> > Li Wang had posted patches to address some of the problems in this test.
> >
> > https://lore.kernel.org/lkml/20260306071843.149147-2-liwang@redhat.com/
> >
> > It could be that lazy percpu stat flushing is also a factor here. In that
> > case, we may need to re-read the stat counters several times with some
> > delay in between to solve this problem.
>
> When memory.stat is read, the kernel calls mem_cgroup_flush_stats(), which
> invokes cgroup_rstat_flush() to drain per-cpu counters before returning
> results. So in the normal read path, stats are flushed; they aren't
> arbitrarily stale at the point this test reads them.
>
> The "lazy" aspect, as I understand it, is that the flush may sometimes be
> skipped: __mem_cgroup_flush_stats() skips the flush if the total pending
> update is below a threshold, i.e.
>
> static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
> {
> 	return atomic64_read(&vmstats->stats_updates) >
> 	       MEMCG_CHARGE_BATCH * num_online_cpus();
> }
>
> So the "lazy" skip could matter on a machine with many CPUs, where that
> threshold becomes non-trivial and could contribute a few MB of discrepancy.
>
> But my failure was observed on a 3-CPU box, where the "lazy" skip shouldn't
> apply:
>
> # ./test_kmem
> TAP version 13
> 1..6
> ok 1 test_kmem_basic
> ok 2 test_kmem_memcg_deletion
> ok 3 test_kmem_proc_kpagecgroup
> ok 4 test_kmem_kernel_stacks
> ok 5 test_kmem_dead_cgroups
> memory.current 11530240
> percpu 8440000
> not ok 6 test_percpu_basic
> # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
>
> # uname -r
> 6.12.0-211.el10.aarch64
>
> # getconf PAGE_SIZE
> 4096
>
> # lscpu
> Architecture: aarch64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 3
> On-line CPU(s) list: 0-2
> ...
>
> Even on Lucas's test system (8 CPUs), assuming a 4k page size, the
> lazy-flush threshold is only 2MB, far less than the observed gap:
> 64 × 8 = 512 pages = 512 × 4096 = 2MB
>
> Based on the above two test results, the deviation produced by lazy
> flushing does not look like the root cause.
BTW, if the lazy flush does become a problem on large-CPU machines in
real testing, we can add a retry loop (as Waiman suggested) in a
separate patch. But I'd prefer to keep this one focused on the
missing slab accounting first.
--
Regards,
Li Wang
end of thread, other threads:[~2026-03-12 10:30 UTC | newest]
Thread overview: 5+ messages
2026-03-11 8:49 [ISSUE] cgroup: test_percpu_basic fails on PREEMPT_RT due to lazy percpu stat flushing Lucas Liu
2026-03-11 14:17 ` Waiman Long
2026-03-12 6:27 ` Lucas Liu
2026-03-12 10:18 ` Li Wang
2026-03-12 10:30 ` Li Wang