From: Nathan Chancellor <nathan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>,
K Prateek Nayak <kprateek.nayak@amd.com>
Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org,
Shrikanth Hegde <sshegde@linux.ibm.com>,
Chen Yu <yu.c.chen@intel.com>,
Valentin Schneider <vschneid@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
x86@kernel.org
Subject: Re: [tip: sched/core] sched/topology: Compute sd_weight considering cpuset partitions
Date: Fri, 20 Mar 2026 16:58:24 -0700
Message-ID: <20260320235824.GA1176840@ax162>
In-Reply-To: <177382132440.1647592.1849180094328011054.tip-bot2@tip-bot2>
Hi all,
On Wed, Mar 18, 2026 at 08:08:44AM -0000, tip-bot2 for K Prateek Nayak wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID: 8e8e23dea43e64ddafbd1246644c3219209be113
> Gitweb: https://git.kernel.org/tip/8e8e23dea43e64ddafbd1246644c3219209be113
> Author: K Prateek Nayak <kprateek.nayak@amd.com>
> AuthorDate: Thu, 12 Mar 2026 04:44:26
> Committer: Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Wed, 18 Mar 2026 09:06:47 +01:00
>
> sched/topology: Compute sd_weight considering cpuset partitions
>
> The "sd_weight" used for calculating the load balancing interval, and
> its limits, considers the span weight of the entire topology level
> without accounting for cpuset partitions.
>
> For example, consider a large system of 128CPUs divided into 8 * 16CPUs
> partition which is typical when deploying virtual machines:
>
> [ PKG Domain: 128CPUs ]
>
> [Partition0: 16CPUs][Partition1: 16CPUs] ... [Partition7: 16CPUs]
>
> Although each partition only contains 16CPUs, the load balancing
> interval is set to a minimum of 128 jiffies considering the span of the
> entire domain with 128CPUs which can lead to longer imbalances within
> the partition although balancing within is cheaper with 16CPUs.
>
> Compute the "sd_weight" after computing the "sd_span" considering the
> cpu_map covered by the partition, and set the load balancing interval,
> and its limits accordingly.
>
> For the above example, the balancing intervals for the partitions PKG
> domain changes as follows:
>
> before after
> balance_interval 128 16
> min_interval 128 16
> max_interval 256 32
>
> Intervals are now proportional to the CPUs in the partitioned domain as
> was intended by the original formula.
>
> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Link: https://patch.msgid.link/20260312044434.1974-2-kprateek.nayak@amd.com
> ---
> kernel/sched/topology.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 061f8c8..79bab80 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1645,13 +1645,17 @@ sd_init(struct sched_domain_topology_level *tl,
> struct cpumask *sd_span;
> u64 now = sched_clock();
>
> - sd_weight = cpumask_weight(tl->mask(tl, cpu));
> + sd_span = sched_domain_span(sd);
> + cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> + sd_weight = cpumask_weight(sd_span);
> + sd_id = cpumask_first(sd_span);
>
> if (tl->sd_flags)
> sd_flags = (*tl->sd_flags)();
> if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> - "wrong sd_flags in topology description\n"))
> + "wrong sd_flags in topology description\n"))
> sd_flags &= TOPOLOGY_SD_FLAGS;
> + sd_flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
>
> *sd = (struct sched_domain){
> .min_interval = sd_weight,
> @@ -1689,12 +1693,6 @@ sd_init(struct sched_domain_topology_level *tl,
> .name = tl->name,
> };
>
> - sd_span = sched_domain_span(sd);
> - cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> - sd_id = cpumask_first(sd_span);
> -
> - sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
> -
> WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
> (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
> "CPU capacity asymmetry not supported on SMT\n");
Apologies if this has already been reported or addressed, but I am seeing
a crash when booting certain ARM configurations after this change landed
in -next. I reduced it down to:
$ cat kernel/configs/schedstats.config
CONFIG_SCHEDSTATS=y
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper defconfig schedstats.config zImage
$ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/arm-rootfs.cpio.zst | zstd -d >rootfs.cpio
$ qemu-system-arm \
-display none \
-nodefaults \
-no-reboot \
-machine virt \
-append 'console=ttyAMA0 earlycon' \
-kernel arch/arm/boot/zImage \
-initrd rootfs.cpio \
-m 1G \
-serial mon:stdio
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 7.0.0-rc4-00017-g8e8e23dea43e (nathan@framework-amd-ryzen-maxplus-395) (arm-linux-gnueabi-gcc (GCC) 15.2.0, GNU ld (GNU Binutils) 2.45) #1 SMP Fri Mar 20 16:12:05 MST 2026
...
[ 0.031929] 8<--- cut here ---
[ 0.031999] Unable to handle kernel NULL pointer dereference at virtual address 00000000 when write
[ 0.032172] [00000000] *pgd=00000000
[ 0.032459] Internal error: Oops: 805 [#1] SMP ARM
[ 0.032902] Modules linked in:
[ 0.033466] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00017-g8e8e23dea43e #1 VOLUNTARY
[ 0.033658] Hardware name: Generic DT based system
[ 0.033770] PC is at build_sched_domains+0x7d0/0x1628
[ 0.034091] LR is at build_sched_domains+0x78c/0x1628
[ 0.034166] pc : [<c03c54bc>] lr : [<c03c5478>] psr: 20000053
[ 0.034255] sp : f080dec0 ip : 00000000 fp : c1e244a4
[ 0.034339] r10: c1e04fd4 r9 : c1e24518 r8 : 00000000
[ 0.034415] r7 : c2088f20 r6 : c28db924 r5 : c1e051ec r4 : 00000010
[ 0.034508] r3 : 00000000 r2 : 00000000 r1 : 00000010 r0 : 00000010
[ 0.034623] Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
[ 0.034730] Control: 10c5387d Table: 4020406a DAC: 00000051
[ 0.034819] Register r0 information: zero-size pointer
[ 0.034990] Register r1 information: zero-size pointer
[ 0.035064] Register r2 information: NULL pointer
[ 0.035133] Register r3 information: NULL pointer
[ 0.035198] Register r4 information: zero-size pointer
[ 0.035266] Register r5 information: non-slab/vmalloc memory
[ 0.035376] Register r6 information: slab kmalloc-512 start c28db800 pointer offset 292 size 512
[ 0.035623] Register r7 information: non-slab/vmalloc memory
[ 0.035703] Register r8 information: NULL pointer
[ 0.035769] Register r9 information: non-slab/vmalloc memory
[ 0.035848] Register r10 information: non-slab/vmalloc memory
[ 0.035928] Register r11 information: non-slab/vmalloc memory
[ 0.036006] Register r12 information: NULL pointer
[ 0.036083] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[ 0.036243] Stack: (0xf080dec0 to 0xf080e000)
[ 0.036339] dec0: 00000000 c139a06c 00000001 00000000 c1e243f4 c28db924 c28db800 00000000
[ 0.036450] dee0: 00000000 ffff8ad3 00000000 00000001 c18f9f1c 00000000 c1e03d80 c1a8d4d0
[ 0.036559] df00: 00000000 c2073b8f c28e3180 00000000 c20d0050 c1d7ea64 c28b8800 f4b63fe3
[ 0.036665] df20: c1e22714 c2969480 c1e22714 00000000 c2074620 c1a8a0e8 00000000 00000000
[ 0.036772] df40: f080df6c c1c1d724 20000053 c0303d80 f080df64 f4b63fe3 c1d703dc c1d703dc
[ 0.036878] df60: c1d703dc 00000000 00000000 c1c01368 c2969480 f080df74 f080df74 f4b63fe3
[ 0.036989] df80: 00000000 c1e04f80 c13979fc 00000000 00000000 00000000 00000000 00000000
[ 0.037097] dfa0: 00000000 c1397a14 00000000 c03001ac 00000000 00000000 00000000 00000000
[ 0.037206] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 0.037316] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 0.037447] Call trace:
[ 0.037698] build_sched_domains from sched_init_smp+0x80/0x108
[ 0.037943] sched_init_smp from kernel_init_freeable+0xe8/0x24c
[ 0.038029] kernel_init_freeable from kernel_init+0x18/0x12c
[ 0.038122] kernel_init from ret_from_fork+0x14/0x28
[ 0.038209] Exception stack(0xf080dfb0 to 0xf080dff8)
[ 0.038277] dfa0: 00000000 00000000 00000000 00000000
[ 0.038386] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 0.038495] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 0.038640] Code: e58d3020 e58d300c e59d3020 e59d200c (e5832000)
[ 0.038903] ---[ end trace 0000000000000000 ]---
[ 0.039275] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.039628] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
If there is any more information I can provide or patches I can test, I
am more than happy to do so.
Cheers,
Nathan
# bad: [b5d083a3ed1e2798396d5e491432e887da8d4a06] Add linux-next specific files for 20260319
# good: [8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb] Merge tag 'nfsd-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect start 'b5d083a3ed1e2798396d5e491432e887da8d4a06' '8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb'
# good: [21fbd87ec0afe2af5457f5a7f9acbee4bf5db891] Merge branch 'main' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 21fbd87ec0afe2af5457f5a7f9acbee4bf5db891
# good: [bffa4391cf4ee844778893a781f14faa55c75cce] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
git bisect good bffa4391cf4ee844778893a781f14faa55c75cce
# bad: [a360efb89caee066919156db3921e616093c43b6] Merge branch 'for-leds-next' of https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git
git bisect bad a360efb89caee066919156db3921e616093c43b6
# good: [77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e
# bad: [d0b3afea83e48990083c0367c10f02af751166b4] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
git bisect bad d0b3afea83e48990083c0367c10f02af751166b4
# good: [fe58c95c6f191a8c45dc183a2348a3b4caa77ed8] Merge branch into tip/master: 'perf/core'
git bisect good fe58c95c6f191a8c45dc183a2348a3b4caa77ed8
# bad: [90924d8b73ac96a1a8b1cb9ba6cae36e193061a1] Merge branch into tip/master: 'timers/vdso'
git bisect bad 90924d8b73ac96a1a8b1cb9ba6cae36e193061a1
# bad: [91396a53d7c7cb694627c665e0dbd2589c99eb0a] Merge branch into tip/master: 'timers/core'
git bisect bad 91396a53d7c7cb694627c665e0dbd2589c99eb0a
# bad: [fe7171d0d5dfbe189e41db99580ebacafc3c09ce] sched/fair: Simplify SIS_UTIL handling in select_idle_cpu()
git bisect bad fe7171d0d5dfbe189e41db99580ebacafc3c09ce
# good: [54a66e431eeacf23e1dc47cb3507f2d0c068aaf0] sched/headers: Inline raw_spin_rq_unlock()
git bisect good 54a66e431eeacf23e1dc47cb3507f2d0c068aaf0
# bad: [1cc8a33ca7e8d38f962b64ece2a42c411a67bc76] sched/topology: Allocate per-CPU sched_domain_shared in s_data
git bisect bad 1cc8a33ca7e8d38f962b64ece2a42c411a67bc76
# good: [786244f70322e41c937e69f0f935bfd11a9611bf] Merge tag 'v7.0-rc4' into sched/core, to pick up scheduler fixes
git bisect good 786244f70322e41c937e69f0f935bfd11a9611bf
# bad: [5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1] sched/topology: Extract "imb_numa_nr" calculation into a separate helper
git bisect bad 5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1
# bad: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions
git bisect bad 8e8e23dea43e64ddafbd1246644c3219209be113
# first bad commit: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions
Thread overview: 56+ messages
2026-03-12 4:44 [PATCH v4 0/9] sched/topology: Optimize sd->shared allocation K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 1/9] sched/topology: Compute sd_weight considering cpuset partitions K Prateek Nayak
2026-03-12 9:34 ` Peter Zijlstra
2026-03-12 9:59 ` K Prateek Nayak
2026-03-12 10:01 ` Peter Zijlstra
2026-03-12 10:09 ` K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-20 23:58 ` Nathan Chancellor [this message]
2026-03-21 3:36 ` K Prateek Nayak
2026-03-21 7:33 ` Chen, Yu C
2026-03-21 7:47 ` Chen, Yu C
2026-03-21 8:59 ` K Prateek Nayak
2026-03-21 9:45 ` K Prateek Nayak
2026-03-21 10:13 ` K Prateek Nayak
2026-03-21 12:48 ` Chen, Yu C
2026-03-24 2:54 ` K Prateek Nayak
2026-03-21 14:13 ` Shrikanth Hegde
2026-03-21 15:14 ` K Prateek Nayak
2026-03-21 16:38 ` [PATCH] sched/topology: Initialize sd_span after assignment to *sd K Prateek Nayak
2026-03-23 9:08 ` Shrikanth Hegde
2026-03-23 17:34 ` K Prateek Nayak
2026-03-23 9:36 ` Peter Zijlstra
2026-03-23 13:24 ` Jon Hunter
2026-03-23 15:36 ` Chen, Yu C
2026-03-23 17:24 ` K Prateek Nayak
2026-03-23 22:41 ` Nathan Chancellor
2026-03-24 9:10 ` [tip: sched/core] sched/topology: Fix sched_domain_span() tip-bot2 for Peter Zijlstra
2026-03-12 4:44 ` [PATCH v4 2/9] sched/topology: Extract "imb_numa_nr" calculation into a separate helper K Prateek Nayak
2026-03-12 13:37 ` kernel test robot
2026-03-12 15:42 ` K Prateek Nayak
2026-03-12 16:02 ` Peter Zijlstra
2026-03-16 0:18 ` Dietmar Eggemann
2026-03-16 3:41 ` K Prateek Nayak
2026-03-16 8:24 ` Dietmar Eggemann
2026-03-16 8:50 ` K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 3/9] sched/topology: Allocate per-CPU sched_domain_shared in s_data K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 4/9] sched/topology: Switch to assigning "sd->shared" from s_data K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 5/9] sched/topology: Remove sched_domain_shared allocation with sd_data K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 6/9] sched/core: Check for rcu_read_lock_any_held() in idle_get_state() K Prateek Nayak
2026-03-12 9:46 ` Peter Zijlstra
2026-03-12 10:06 ` K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 7/9] sched/fair: Remove superfluous rcu_read_lock() in the wakeup path K Prateek Nayak
2026-03-15 23:36 ` Dietmar Eggemann
2026-03-16 3:19 ` K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] PM: EM: Switch to rcu_dereference_all() in " tip-bot2 for Dietmar Eggemann
2026-03-18 8:08 ` [tip: sched/core] sched/fair: Remove superfluous rcu_read_lock() in the " tip-bot2 for K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 8/9] sched/fair: Simplify the entry condition for update_idle_cpu_scan() K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12 4:44 ` [PATCH v4 9/9] sched/fair: Simplify SIS_UTIL handling in select_idle_cpu() K Prateek Nayak
2026-03-18 8:08 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-16 0:22 ` [PATCH v4 0/9] sched/topology: Optimize sd->shared allocation Dietmar Eggemann