cgroups.vger.kernel.org archive mirror
* [linus:master] [selftests/cgroup]  954bacce36: kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail
@ 2025-08-07  5:52 kernel test robot
  2025-08-11 21:32 ` Michal Koutný
  0 siblings, 1 reply; 2+ messages in thread
From: kernel test robot @ 2025-08-07  5:52 UTC
  To: Shashank Balaji
  Cc: oe-lkp, lkp, linux-kernel, Tejun Heo, Michal Koutný, cgroups,
	oliver.sang



Hello,


we noticed that 954bacce36 is meant to fix the cpu.max tests; however, we found
that the two tests below pass on the parent commit but fail on 954bacce36.

dfe25fbaedfc2a07 954bacce36d976fe472090b5598
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :6           100%          6:6     kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail
           :6           100%          6:6     kernel-selftests.cgroup.test_cpu.test_cpucg_max_nested.fail


not sure if there are any necessary env setting? thanks


kernel test robot noticed "kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail" on:

commit: 954bacce36d976fe472090b55987df66da00c49b ("selftests/cgroup: fix cpu.max tests")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linux-next/master 84b92a499e7eca54ba1df6f6c6e01766025943f1]

in testcase: kernel-selftests
version: kernel-selftests-x86_64-ae388edd4a8f-1_20250729
with following parameters:

	group: cgroup



config: x86_64-rhel-9.4-kselftests
compiler: gcc-12
test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz (Ivy Bridge) with 8G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508070722.46239e7c-lkp@intel.com



# timeout set to 300
# selftests: cgroup: test_cpu
# ok 1 test_cpucg_subtree_control
# ok 2 test_cpucg_stats
# ok 3 test_cpucg_nice
# not ok 4 test_cpucg_weight_overprovisioned
# ok 5 test_cpucg_weight_underprovisioned
# not ok 6 test_cpucg_nested_weight_overprovisioned
# ok 7 test_cpucg_nested_weight_underprovisioned
# not ok 8 test_cpucg_max
# not ok 9 test_cpucg_max_nested
not ok 2 selftests: cgroup: test_cpu # exit=1



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250807/202508070722.46239e7c-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linus:master] [selftests/cgroup]  954bacce36: kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail
  2025-08-07  5:52 [linus:master] [selftests/cgroup] 954bacce36: kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail kernel test robot
@ 2025-08-11 21:32 ` Michal Koutný
  0 siblings, 0 replies; 2+ messages in thread
From: Michal Koutný @ 2025-08-11 21:32 UTC
  To: kernel test robot
  Cc: Shashank Balaji, oe-lkp, lkp, linux-kernel, Tejun Heo, cgroups

Hello.

On Thu, Aug 07, 2025 at 01:52:31PM +0800, kernel test robot <oliver.sang@intel.com> wrote:
> dfe25fbaedfc2a07 954bacce36d976fe472090b5598
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :6           100%          6:6     kernel-selftests.cgroup.test_cpu.test_cpucg_max.fail
>            :6           100%          6:6     kernel-selftests.cgroup.test_cpu.test_cpucg_max_nested.fail
> 
> 
> not sure if there are any necessary env setting? thanks

The selftest commit essentially changed the tolerance margin from
ridiculously large to something that looked statistically appropriate
[1].
However, when I run the test (30x) on 954bacce36, I get:

quantile([D1 D2 D8])  # 1 2 and 8 vCPUs respectively
ans =

   1.3086e+04   1.1559e+04   1.1177e+04 # min
   1.5109e+04   1.2936e+04   1.2989e+04 # 25th percentile
   1.5791e+04   1.3938e+04   1.3788e+04 # median
   1.6159e+04   1.5385e+04   1.4980e+04 # 75th percentile
   1.8757e+04   1.8699e+04   1.9494e+04 # max

I also obtain similar values on v6.15 (that kernel plus the 954bacce36
selftest), so nothing in the throttling implementation is affecting
this.

The tests above were run with HZ=250; with HZ=1000, I get slightly smaller
results for D2:
   1.1753e+04 # min
   1.2634e+04
   1.3208e+04 # median
   1.4010e+04
   1.6937e+04 # max

But that is still nowhere near the 20% margin (i.e. values_close(..., 10%));
these values would demand up to 100% (values_close(..., 50%)). Alternatively,
we could add a bias derived from sched_cfs_bandwidth_slice_us, or increase
the tested quota from 1% to 5%, which would be an improvement:

   48882 # min
   52450
   52941 # median
   54284 # 75th percentile
   73186 # max (limit would be 60000)
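
For reference, the arithmetic behind the 1% and 5% figures can be sketched as
follows. This is a minimal Python sketch, not the selftest code: it assumes
the default 100000 us cpu.max period and a 1 second run (neither is stated in
this thread), and the helper names expected_usage_usec and overrun_percent are
hypothetical, introduced only for illustration.

```python
def expected_usage_usec(quota_usec: int,
                        period_usec: int = 100_000,      # assumed default period
                        duration_usec: int = 1_000_000   # assumed 1 s test run
                        ) -> int:
    """CPU time a fully throttled hog may consume over the run."""
    n_periods, remainder_usec = divmod(duration_usec, period_usec)
    # Each full period grants one quota; a partial period grants at most one.
    return n_periods * quota_usec + min(remainder_usec, quota_usec)


def overrun_percent(usage_usec: int, expected_usec: int) -> float:
    """Measured usage above the expectation, as a percentage of it."""
    return 100.0 * (usage_usec - expected_usec) / expected_usec


if __name__ == "__main__":
    one_pct = expected_usage_usec(1_000)    # 1% quota
    five_pct = expected_usage_usec(5_000)   # 5% quota
    # Medians reported in this thread (8 vCPUs at 1% quota; 5% quota run).
    print(one_pct, overrun_percent(13_788, one_pct))
    print(five_pct, overrun_percent(52_941, five_pct))
```

If those assumptions hold, the reported medians overrun the expectation by
roughly 38% at the 1% quota and roughly 6% at the 5% quota, which is
consistent with an overrun that is closer to a fixed amount of CPU time than
to a fixed fraction of the quota.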

I'm not sure how big an overrun we want to accept as a pass.
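
To make the pass criterion concrete, the tolerance check can be sketched as
below. This is a Python transliteration under an assumption: that the cgroup
selftest helper values_close(a, b, err) is defined as
labs(a - b) <= (a + b) / 100 * err with integer arithmetic. The helpers
max_passing_usage and min_err_needed are hypothetical names used only here.
Because err is applied to the sum a + b, err=10 corresponds to roughly a 20%
margin around the expected value.

```python
def values_close(a: int, b: int, err: int) -> bool:
    """Assumed selftest check: |a - b| within err percent of (a + b)."""
    return abs(a - b) <= (a + b) // 100 * err


def max_passing_usage(expected: int, err: int) -> int:
    """Largest usage at or above `expected` that still passes the check."""
    usage = expected
    while values_close(usage + 1, expected, err):
        usage += 1
    return usage


def min_err_needed(usage: int, expected: int) -> int:
    """Smallest integer err for which values_close() accepts `usage`."""
    err = 0
    while not values_close(usage, expected, err):
        err += 1
    return err


if __name__ == "__main__":
    # Expected usages assume a 100000 us period and a 1 s run (not stated here).
    print(max_passing_usage(10_000, 10))   # upper bound at the 1% quota
    print(max_passing_usage(50_000, 10))   # upper bound at the 5% quota
    print(min_err_needed(19_494, 10_000))  # largest value reported at 1%
```

Under that definition the upper acceptance bound is 12220 for an expected
10000 and 61110 for an expected 50000, slightly above the flat 20% figures
(12000 and 60000), so min_err_needed() gives one way to see what tolerance a
given measurement would actually demand.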

Michal

[1] lore.kernel.org/r/20250701-kselftest-cgroup-fix-cpu-max-v1-2-049507ad6832@sony.com
