From: Nathan Chancellor <nathan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>,
K Prateek Nayak <kprateek.nayak@amd.com>
Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org,
Shrikanth Hegde <sshegde@linux.ibm.com>,
Chen Yu <yu.c.chen@intel.com>,
Valentin Schneider <vschneid@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
x86@kernel.org
Subject: Re: [tip: sched/core] sched/topology: Compute sd_weight considering cpuset partitions
Date: Fri, 20 Mar 2026 16:58:24 -0700
Message-ID: <20260320235824.GA1176840@ax162>
In-Reply-To: <177382132440.1647592.1849180094328011054.tip-bot2@tip-bot2>
Hi all,
On Wed, Mar 18, 2026 at 08:08:44AM -0000, tip-bot2 for K Prateek Nayak wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID: 8e8e23dea43e64ddafbd1246644c3219209be113
> Gitweb: https://git.kernel.org/tip/8e8e23dea43e64ddafbd1246644c3219209be113
> Author: K Prateek Nayak <kprateek.nayak@amd.com>
> AuthorDate: Thu, 12 Mar 2026 04:44:26
> Committer: Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Wed, 18 Mar 2026 09:06:47 +01:00
>
> sched/topology: Compute sd_weight considering cpuset partitions
>
> The "sd_weight" used for calculating the load balancing interval, and
> its limits, considers the span weight of the entire topology level
> without accounting for cpuset partitions.
>
> For example, consider a large system of 128CPUs divided into 8 * 16CPUs
> partition which is typical when deploying virtual machines:
>
> [ PKG Domain: 128CPUs ]
>
> [Partition0: 16CPUs][Partition1: 16CPUs] ... [Partition7: 16CPUs]
>
> Although each partition only contains 16CPUs, the load balancing
> interval is set to a minimum of 128 jiffies considering the span of the
> entire domain with 128CPUs which can lead to longer imbalances within
> the partition although balancing within is cheaper with 16CPUs.
>
> Compute the "sd_weight" after computing the "sd_span" considering the
> cpu_map covered by the partition, and set the load balancing interval,
> and its limits accordingly.
>
> For the above example, the balancing intervals for the partitions PKG
> domain changes as follows:
>
>                     before   after
>   balance_interval     128      16
>   min_interval         128      16
>   max_interval         256      32
>
> Intervals are now proportional to the CPUs in the partitioned domain as
> was intended by the original formula.
>
> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Link: https://patch.msgid.link/20260312044434.1974-2-kprateek.nayak@amd.com
> ---
> kernel/sched/topology.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 061f8c8..79bab80 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1645,13 +1645,17 @@ sd_init(struct sched_domain_topology_level *tl,
> struct cpumask *sd_span;
> u64 now = sched_clock();
>
> - sd_weight = cpumask_weight(tl->mask(tl, cpu));
> + sd_span = sched_domain_span(sd);
> + cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> + sd_weight = cpumask_weight(sd_span);
> + sd_id = cpumask_first(sd_span);
>
> if (tl->sd_flags)
> sd_flags = (*tl->sd_flags)();
> if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> - "wrong sd_flags in topology description\n"))
> + "wrong sd_flags in topology description\n"))
> sd_flags &= TOPOLOGY_SD_FLAGS;
> + sd_flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
>
> *sd = (struct sched_domain){
> .min_interval = sd_weight,
> @@ -1689,12 +1693,6 @@ sd_init(struct sched_domain_topology_level *tl,
> .name = tl->name,
> };
>
> - sd_span = sched_domain_span(sd);
> - cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> - sd_id = cpumask_first(sd_span);
> -
> - sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
> -
> WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
> (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
> "CPU capacity asymmetry not supported on SMT\n");
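For readers following the arithmetic in the quoted commit message, here is a minimal userspace sketch of how the weight of the partitioned span yields the intervals in the before/after table. It is not kernel code: `struct mask`, `mask_set_range()`, `mask_and_weight()` and `intervals_for()` are hypothetical names standing in for `struct cpumask`, `cpumask_and()` plus `cpumask_weight()`, and the interval fields of `struct sched_domain`. The factor of two for `max_interval` is read off the table (128 to 256, 16 to 32).

```c
#include <stdint.h>

#define NR_CPUS    128
#define MASK_WORDS (NR_CPUS / 64)

/* Hypothetical fixed-size CPU mask; the kernel uses struct cpumask. */
struct mask {
	uint64_t bits[MASK_WORDS];
};

/* Set CPUs [first, first + count) in the mask. */
static void mask_set_range(struct mask *m, unsigned int first, unsigned int count)
{
	for (unsigned int cpu = first; cpu < first + count; cpu++)
		m->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/* Weight of (a & b): stands in for cpumask_and() then cpumask_weight(). */
static unsigned int mask_and_weight(const struct mask *a, const struct mask *b)
{
	unsigned int weight = 0;

	for (unsigned int i = 0; i < MASK_WORDS; i++) {
		uint64_t w = a->bits[i] & b->bits[i];

		/* Clear the lowest set bit until none remain. */
		while (w) {
			w &= w - 1;
			weight++;
		}
	}
	return weight;
}

/* Hypothetical container for the three values shown in the table. */
struct intervals {
	unsigned int balance;
	unsigned int min;
	unsigned int max;
};

/* Intervals as a function of the domain weight, per the table above. */
static struct intervals intervals_for(unsigned int sd_weight)
{
	return (struct intervals){
		.balance = sd_weight,
		.min     = sd_weight,
		.max     = 2 * sd_weight,
	};
}
```

With a 128-CPU topology mask and a 16-CPU partition as `cpu_map`, `mask_and_weight()` returns 16 and `intervals_for(16)` gives 16/16/32, matching the "after" column. Passing the full topology mask for both arguments gives 128 and the "before" column.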
Apologies if this has already been reported or addressed, but I am seeing
a crash when booting certain ARM configurations after this change landed
in -next. I reduced it down to:
$ cat kernel/configs/schedstats.config
CONFIG_SCHEDSTATS=y
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper defconfig schedstats.config zImage
$ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/arm-rootfs.cpio.zst | zstd -d >rootfs.cpio
$ qemu-system-arm \
-display none \
-nodefaults \
-no-reboot \
-machine virt \
-append 'console=ttyAMA0 earlycon' \
-kernel arch/arm/boot/zImage \
-initrd rootfs.cpio \
-m 1G \
-serial mon:stdio
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 7.0.0-rc4-00017-g8e8e23dea43e (nathan@framework-amd-ryzen-maxplus-395) (arm-linux-gnueabi-gcc (GCC) 15.2.0, GNU ld (GNU Binutils) 2.45) #1 SMP Fri Mar 20 16:12:05 MST 2026
...
[ 0.031929] 8<--- cut here ---
[ 0.031999] Unable to handle kernel NULL pointer dereference at virtual address 00000000 when write
[ 0.032172] [00000000] *pgd=00000000
[ 0.032459] Internal error: Oops: 805 [#1] SMP ARM
[ 0.032902] Modules linked in:
[ 0.033466] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00017-g8e8e23dea43e #1 VOLUNTARY
[ 0.033658] Hardware name: Generic DT based system
[ 0.033770] PC is at build_sched_domains+0x7d0/0x1628
[ 0.034091] LR is at build_sched_domains+0x78c/0x1628
[ 0.034166] pc : [<c03c54bc>] lr : [<c03c5478>] psr: 20000053
[ 0.034255] sp : f080dec0 ip : 00000000 fp : c1e244a4
[ 0.034339] r10: c1e04fd4 r9 : c1e24518 r8 : 00000000
[ 0.034415] r7 : c2088f20 r6 : c28db924 r5 : c1e051ec r4 : 00000010
[ 0.034508] r3 : 00000000 r2 : 00000000 r1 : 00000010 r0 : 00000010
[ 0.034623] Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
[ 0.034730] Control: 10c5387d Table: 4020406a DAC: 00000051
[ 0.034819] Register r0 information: zero-size pointer
[ 0.034990] Register r1 information: zero-size pointer
[ 0.035064] Register r2 information: NULL pointer
[ 0.035133] Register r3 information: NULL pointer
[ 0.035198] Register r4 information: zero-size pointer
[ 0.035266] Register r5 information: non-slab/vmalloc memory
[ 0.035376] Register r6 information: slab kmalloc-512 start c28db800 pointer offset 292 size 512
[ 0.035623] Register r7 information: non-slab/vmalloc memory
[ 0.035703] Register r8 information: NULL pointer
[ 0.035769] Register r9 information: non-slab/vmalloc memory
[ 0.035848] Register r10 information: non-slab/vmalloc memory
[ 0.035928] Register r11 information: non-slab/vmalloc memory
[ 0.036006] Register r12 information: NULL pointer
[ 0.036083] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[ 0.036243] Stack: (0xf080dec0 to 0xf080e000)
[ 0.036339] dec0: 00000000 c139a06c 00000001 00000000 c1e243f4 c28db924 c28db800 00000000
[ 0.036450] dee0: 00000000 ffff8ad3 00000000 00000001 c18f9f1c 00000000 c1e03d80 c1a8d4d0
[ 0.036559] df00: 00000000 c2073b8f c28e3180 00000000 c20d0050 c1d7ea64 c28b8800 f4b63fe3
[ 0.036665] df20: c1e22714 c2969480 c1e22714 00000000 c2074620 c1a8a0e8 00000000 00000000
[ 0.036772] df40: f080df6c c1c1d724 20000053 c0303d80 f080df64 f4b63fe3 c1d703dc c1d703dc
[ 0.036878] df60: c1d703dc 00000000 00000000 c1c01368 c2969480 f080df74 f080df74 f4b63fe3
[ 0.036989] df80: 00000000 c1e04f80 c13979fc 00000000 00000000 00000000 00000000 00000000
[ 0.037097] dfa0: 00000000 c1397a14 00000000 c03001ac 00000000 00000000 00000000 00000000
[ 0.037206] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 0.037316] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 0.037447] Call trace:
[ 0.037698] build_sched_domains from sched_init_smp+0x80/0x108
[ 0.037943] sched_init_smp from kernel_init_freeable+0xe8/0x24c
[ 0.038029] kernel_init_freeable from kernel_init+0x18/0x12c
[ 0.038122] kernel_init from ret_from_fork+0x14/0x28
[ 0.038209] Exception stack(0xf080dfb0 to 0xf080dff8)
[ 0.038277] dfa0: 00000000 00000000 00000000 00000000
[ 0.038386] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 0.038495] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 0.038640] Code: e58d3020 e58d300c e59d3020 e59d200c (e5832000)
[ 0.038903] ---[ end trace 0000000000000000 ]---
[ 0.039275] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.039628] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
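This report does not identify a root cause. One thing the quoted diff does change is ordering: `sd_span` is now filled in before the `*sd = (struct sched_domain){ ... }` assignment instead of after it. The span of a `struct sched_domain` lives in a variable-length array at the end of the struct. As a purely illustrative sketch, the hypothetical `struct demo_domain` below shows how a flexible array member can begin inside the struct's tail padding, so that a whole-struct assignment covering `sizeof` bytes may also touch the first bytes of that array. Whether this applies to the ARM configuration above is an assumption, not something established here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a struct ending in a variable-length span. */
struct demo_domain {
	_Alignas(8) uint64_t stamp;	/* forces 8-byte struct alignment */
	uint32_t flags;
	uint32_t span[];		/* flexible array member */
};

/*
 * Number of bytes of span[] that fall inside sizeof(struct demo_domain),
 * i.e. inside the region a whole-struct copy is allowed to write.
 */
static size_t span_bytes_inside_struct(void)
{
	size_t start = offsetof(struct demo_domain, span);
	size_t end = sizeof(struct demo_domain);

	return end > start ? end - start : 0;
}
```

With GCC or Clang this layout places `span` at offset 12 while `sizeof(struct demo_domain)` is 16, so four bytes of the array overlap the padding. What a compiler actually writes to padding during a struct assignment is unspecified, so data stored there beforehand should not be relied upon.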
If there is any more information I can provide or patches I can test, I
am more than happy to do so.
Cheers,
Nathan
# bad: [b5d083a3ed1e2798396d5e491432e887da8d4a06] Add linux-next specific files for 20260319
# good: [8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb] Merge tag 'nfsd-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect start 'b5d083a3ed1e2798396d5e491432e887da8d4a06' '8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb'
# good: [21fbd87ec0afe2af5457f5a7f9acbee4bf5db891] Merge branch 'main' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 21fbd87ec0afe2af5457f5a7f9acbee4bf5db891
# good: [bffa4391cf4ee844778893a781f14faa55c75cce] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
git bisect good bffa4391cf4ee844778893a781f14faa55c75cce
# bad: [a360efb89caee066919156db3921e616093c43b6] Merge branch 'for-leds-next' of https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git
git bisect bad a360efb89caee066919156db3921e616093c43b6
# good: [77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e
# bad: [d0b3afea83e48990083c0367c10f02af751166b4] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
git bisect bad d0b3afea83e48990083c0367c10f02af751166b4
# good: [fe58c95c6f191a8c45dc183a2348a3b4caa77ed8] Merge branch into tip/master: 'perf/core'
git bisect good fe58c95c6f191a8c45dc183a2348a3b4caa77ed8
# bad: [90924d8b73ac96a1a8b1cb9ba6cae36e193061a1] Merge branch into tip/master: 'timers/vdso'
git bisect bad 90924d8b73ac96a1a8b1cb9ba6cae36e193061a1
# bad: [91396a53d7c7cb694627c665e0dbd2589c99eb0a] Merge branch into tip/master: 'timers/core'
git bisect bad 91396a53d7c7cb694627c665e0dbd2589c99eb0a
# bad: [fe7171d0d5dfbe189e41db99580ebacafc3c09ce] sched/fair: Simplify SIS_UTIL handling in select_idle_cpu()
git bisect bad fe7171d0d5dfbe189e41db99580ebacafc3c09ce
# good: [54a66e431eeacf23e1dc47cb3507f2d0c068aaf0] sched/headers: Inline raw_spin_rq_unlock()
git bisect good 54a66e431eeacf23e1dc47cb3507f2d0c068aaf0
# bad: [1cc8a33ca7e8d38f962b64ece2a42c411a67bc76] sched/topology: Allocate per-CPU sched_domain_shared in s_data
git bisect bad 1cc8a33ca7e8d38f962b64ece2a42c411a67bc76
# good: [786244f70322e41c937e69f0f935bfd11a9611bf] Merge tag 'v7.0-rc4' into sched/core, to pick up scheduler fixes
git bisect good 786244f70322e41c937e69f0f935bfd11a9611bf
# bad: [5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1] sched/topology: Extract "imb_numa_nr" calculation into a separate helper
git bisect bad 5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1
# bad: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions
git bisect bad 8e8e23dea43e64ddafbd1246644c3219209be113
# first bad commit: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions