From: Nathan Chancellor <nathan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>,
	K Prateek Nayak <kprateek.nayak@amd.com>
Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	Chen Yu <yu.c.chen@intel.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	x86@kernel.org
Subject: Re: [tip: sched/core] sched/topology: Compute sd_weight considering cpuset partitions
Date: Fri, 20 Mar 2026 16:58:24 -0700
Message-ID: <20260320235824.GA1176840@ax162>
In-Reply-To: <177382132440.1647592.1849180094328011054.tip-bot2@tip-bot2>

Hi all,

On Wed, Mar 18, 2026 at 08:08:44AM -0000, tip-bot2 for K Prateek Nayak wrote:
> The following commit has been merged into the sched/core branch of tip:
> 
> Commit-ID:     8e8e23dea43e64ddafbd1246644c3219209be113
> Gitweb:        https://git.kernel.org/tip/8e8e23dea43e64ddafbd1246644c3219209be113
> Author:        K Prateek Nayak <kprateek.nayak@amd.com>
> AuthorDate:    Thu, 12 Mar 2026 04:44:26 
> Committer:     Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Wed, 18 Mar 2026 09:06:47 +01:00
> 
> sched/topology: Compute sd_weight considering cpuset partitions
> 
> The "sd_weight" used for calculating the load balancing interval, and
> its limits, considers the span weight of the entire topology level
> without accounting for cpuset partitions.
> 
> For example, consider a large system with 128 CPUs divided into 8 partitions
> of 16 CPUs each, which is typical when deploying virtual machines:
> 
>   [                      PKG Domain: 128CPUs                      ]
> 
>   [Partition0: 16CPUs][Partition1: 16CPUs] ... [Partition7: 16CPUs]
> 
> Although each partition only contains 16 CPUs, the load balancing
> interval is set to a minimum of 128 jiffies, based on the span of the
> entire 128-CPU domain. This can lead to longer imbalances within the
> partition, even though balancing across only 16 CPUs is cheaper.
> 
> Compute the "sd_weight" after computing the "sd_span" considering the
> cpu_map covered by the partition, and set the load balancing interval,
> and its limits accordingly.
> 
> For the above example, the balancing intervals for the partition's PKG
> domain change as follows:
> 
>                   before   after
> balance_interval   128      16
> min_interval       128      16
> max_interval       256      32
> 
> Intervals are now proportional to the CPUs in the partitioned domain as
> was intended by the original formula.
> 
> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Link: https://patch.msgid.link/20260312044434.1974-2-kprateek.nayak@amd.com
> ---
>  kernel/sched/topology.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 061f8c8..79bab80 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1645,13 +1645,17 @@ sd_init(struct sched_domain_topology_level *tl,
>  	struct cpumask *sd_span;
>  	u64 now = sched_clock();
>  
> -	sd_weight = cpumask_weight(tl->mask(tl, cpu));
> +	sd_span = sched_domain_span(sd);
> +	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> +	sd_weight = cpumask_weight(sd_span);
> +	sd_id = cpumask_first(sd_span);
>  
>  	if (tl->sd_flags)
>  		sd_flags = (*tl->sd_flags)();
>  	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> -			"wrong sd_flags in topology description\n"))
> +		      "wrong sd_flags in topology description\n"))
>  		sd_flags &= TOPOLOGY_SD_FLAGS;
> +	sd_flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
>  
>  	*sd = (struct sched_domain){
>  		.min_interval		= sd_weight,
> @@ -1689,12 +1693,6 @@ sd_init(struct sched_domain_topology_level *tl,
>  		.name			= tl->name,
>  	};
>  
> -	sd_span = sched_domain_span(sd);
> -	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> -	sd_id = cpumask_first(sd_span);
> -
> -	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
> -
>  	WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
>  		  (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
>  		  "CPU capacity asymmetry not supported on SMT\n");
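
To make the arithmetic in the commit message concrete, here is a minimal
userspace sketch of how I read the new computation. It is not kernel code:
struct mask128, mask_weight(), mask_and() and intervals_for() are
hypothetical stand-ins for struct cpumask, cpumask_weight(), cpumask_and()
and the relevant part of sd_init(). The factor of two for max_interval and
the balance_interval value are assumptions taken from the before/after
table above, since only .min_interval is visible in the quoted hunk.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for a 128-bit struct cpumask. */
struct mask128 {
	uint64_t w[2];
};

/* Interval fields, named after the sched_domain members discussed above. */
struct intervals {
	unsigned int min_interval;
	unsigned int max_interval;
	unsigned int balance_interval;
};

/* Count the set bits; plays the role of cpumask_weight(). */
static unsigned int mask_weight(const struct mask128 *m)
{
	unsigned int n = 0;

	for (int i = 0; i < 2; i++)
		for (uint64_t v = m->w[i]; v; v &= v - 1)
			n++;
	return n;
}

/* Intersect two masks; plays the role of cpumask_and(). */
static struct mask128 mask_and(const struct mask128 *a,
			       const struct mask128 *b)
{
	struct mask128 r = { { a->w[0] & b->w[0], a->w[1] & b->w[1] } };

	return r;
}

/*
 * Derive the intervals from the weight of (level mask & cpu_map),
 * i.e. from the partitioned span rather than the whole topology level.
 */
static struct intervals intervals_for(const struct mask128 *level_mask,
				      const struct mask128 *cpu_map)
{
	struct mask128 span = mask_and(level_mask, cpu_map);
	unsigned int sd_weight = mask_weight(&span);
	struct intervals iv = { sd_weight, 2 * sd_weight, sd_weight };

	return iv;
}
```

With a level mask of all 128 bits and a cpu_map covering CPUs 0-15
(0xffff in the low word), intervals_for() gives 16/32/16, which matches the
"after" column. Passing the level mask itself as cpu_map reproduces the
"before" values of 128/256/128.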

Apologies if this has already been reported or addressed, but I am seeing
a crash when booting certain ARM configurations after this change landed
in -next. I reduced it down to the following reproducer:

  $ cat kernel/configs/schedstats.config
  CONFIG_SCHEDSTATS=y

  $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper defconfig schedstats.config zImage

  $ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/arm-rootfs.cpio.zst | zstd -d >rootfs.cpio

  $ qemu-system-arm \
      -display none \
      -nodefaults \
      -no-reboot \
      -machine virt \
      -append 'console=ttyAMA0 earlycon' \
      -kernel arch/arm/boot/zImage \
      -initrd rootfs.cpio \
      -m 1G \
      -serial mon:stdio
  [    0.000000] Booting Linux on physical CPU 0x0
  [    0.000000] Linux version 7.0.0-rc4-00017-g8e8e23dea43e (nathan@framework-amd-ryzen-maxplus-395) (arm-linux-gnueabi-gcc (GCC) 15.2.0, GNU ld (GNU Binutils) 2.45) #1 SMP Fri Mar 20 16:12:05 MST 2026
  ...
  [    0.031929] 8<--- cut here ---
  [    0.031999] Unable to handle kernel NULL pointer dereference at virtual address 00000000 when write
  [    0.032172] [00000000] *pgd=00000000
  [    0.032459] Internal error: Oops: 805 [#1] SMP ARM
  [    0.032902] Modules linked in:
  [    0.033466] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00017-g8e8e23dea43e #1 VOLUNTARY
  [    0.033658] Hardware name: Generic DT based system
  [    0.033770] PC is at build_sched_domains+0x7d0/0x1628
  [    0.034091] LR is at build_sched_domains+0x78c/0x1628
  [    0.034166] pc : [<c03c54bc>]    lr : [<c03c5478>]    psr: 20000053
  [    0.034255] sp : f080dec0  ip : 00000000  fp : c1e244a4
  [    0.034339] r10: c1e04fd4  r9 : c1e24518  r8 : 00000000
  [    0.034415] r7 : c2088f20  r6 : c28db924  r5 : c1e051ec  r4 : 00000010
  [    0.034508] r3 : 00000000  r2 : 00000000  r1 : 00000010  r0 : 00000010
  [    0.034623] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
  [    0.034730] Control: 10c5387d  Table: 4020406a  DAC: 00000051
  [    0.034819] Register r0 information: zero-size pointer
  [    0.034990] Register r1 information: zero-size pointer
  [    0.035064] Register r2 information: NULL pointer
  [    0.035133] Register r3 information: NULL pointer
  [    0.035198] Register r4 information: zero-size pointer
  [    0.035266] Register r5 information: non-slab/vmalloc memory
  [    0.035376] Register r6 information: slab kmalloc-512 start c28db800 pointer offset 292 size 512
  [    0.035623] Register r7 information: non-slab/vmalloc memory
  [    0.035703] Register r8 information: NULL pointer
  [    0.035769] Register r9 information: non-slab/vmalloc memory
  [    0.035848] Register r10 information: non-slab/vmalloc memory
  [    0.035928] Register r11 information: non-slab/vmalloc memory
  [    0.036006] Register r12 information: NULL pointer
  [    0.036083] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
  [    0.036243] Stack: (0xf080dec0 to 0xf080e000)
  [    0.036339] dec0: 00000000 c139a06c 00000001 00000000 c1e243f4 c28db924 c28db800 00000000
  [    0.036450] dee0: 00000000 ffff8ad3 00000000 00000001 c18f9f1c 00000000 c1e03d80 c1a8d4d0
  [    0.036559] df00: 00000000 c2073b8f c28e3180 00000000 c20d0050 c1d7ea64 c28b8800 f4b63fe3
  [    0.036665] df20: c1e22714 c2969480 c1e22714 00000000 c2074620 c1a8a0e8 00000000 00000000
  [    0.036772] df40: f080df6c c1c1d724 20000053 c0303d80 f080df64 f4b63fe3 c1d703dc c1d703dc
  [    0.036878] df60: c1d703dc 00000000 00000000 c1c01368 c2969480 f080df74 f080df74 f4b63fe3
  [    0.036989] df80: 00000000 c1e04f80 c13979fc 00000000 00000000 00000000 00000000 00000000
  [    0.037097] dfa0: 00000000 c1397a14 00000000 c03001ac 00000000 00000000 00000000 00000000
  [    0.037206] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  [    0.037316] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
  [    0.037447] Call trace:
  [    0.037698]  build_sched_domains from sched_init_smp+0x80/0x108
  [    0.037943]  sched_init_smp from kernel_init_freeable+0xe8/0x24c
  [    0.038029]  kernel_init_freeable from kernel_init+0x18/0x12c
  [    0.038122]  kernel_init from ret_from_fork+0x14/0x28
  [    0.038209] Exception stack(0xf080dfb0 to 0xf080dff8)
  [    0.038277] dfa0:                                     00000000 00000000 00000000 00000000
  [    0.038386] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  [    0.038495] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
  [    0.038640] Code: e58d3020 e58d300c e59d3020 e59d200c (e5832000)
  [    0.038903] ---[ end trace 0000000000000000 ]---
  [    0.039275] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  [    0.039628] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
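
For anyone reading the oops without a disassembler at hand, the small
decoder below is a sketch of how the "Code:" line can be interpreted. It
assumes the parenthesized word is the instruction at PC and only handles
the A32 load/store word/byte form with an immediate offset; struct a32_ldst
and decode_a32_ldst() are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an A32 LDR/STR (immediate offset) instruction. */
struct a32_ldst {
	bool valid;		/* word matches the immediate-offset form */
	bool is_load;		/* L bit (20): 1 = LDR, 0 = STR */
	bool is_byte;		/* B bit (22): 1 = byte, 0 = word */
	bool add;		/* U bit (23): 1 = add offset, 0 = subtract */
	unsigned int rn;	/* base register, bits 19:16 */
	unsigned int rd;	/* transferred register, bits 15:12 */
	unsigned int imm12;	/* immediate offset, bits 11:0 */
};

static struct a32_ldst decode_a32_ldst(uint32_t insn)
{
	struct a32_ldst d = { 0 };

	/* Bits 27:25 == 0b010 select load/store with a 12-bit immediate. */
	if (((insn >> 25) & 0x7) != 0x2)
		return d;

	d.valid = true;
	d.is_load = (insn >> 20) & 1;
	d.is_byte = (insn >> 22) & 1;
	d.add = (insn >> 23) & 1;
	d.rn = (insn >> 16) & 0xf;
	d.rd = (insn >> 12) & 0xf;
	d.imm12 = insn & 0xfff;
	return d;
}
```

Feeding it 0xe5832000 decodes to a word store of r2 through base r3 with
offset 0, that is "str r2, [r3]". The register dump above has r3 =
00000000, which lines up with the reported NULL pointer write at virtual
address 00000000.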

If there is any more information I can provide or patches I can test, I
am more than happy to do so.

Cheers,
Nathan

# bad: [b5d083a3ed1e2798396d5e491432e887da8d4a06] Add linux-next specific files for 20260319
# good: [8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb] Merge tag 'nfsd-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect start 'b5d083a3ed1e2798396d5e491432e887da8d4a06' '8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb'
# good: [21fbd87ec0afe2af5457f5a7f9acbee4bf5db891] Merge branch 'main' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 21fbd87ec0afe2af5457f5a7f9acbee4bf5db891
# good: [bffa4391cf4ee844778893a781f14faa55c75cce] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
git bisect good bffa4391cf4ee844778893a781f14faa55c75cce
# bad: [a360efb89caee066919156db3921e616093c43b6] Merge branch 'for-leds-next' of https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git
git bisect bad a360efb89caee066919156db3921e616093c43b6
# good: [77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e
# bad: [d0b3afea83e48990083c0367c10f02af751166b4] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
git bisect bad d0b3afea83e48990083c0367c10f02af751166b4
# good: [fe58c95c6f191a8c45dc183a2348a3b4caa77ed8] Merge branch into tip/master: 'perf/core'
git bisect good fe58c95c6f191a8c45dc183a2348a3b4caa77ed8
# bad: [90924d8b73ac96a1a8b1cb9ba6cae36e193061a1] Merge branch into tip/master: 'timers/vdso'
git bisect bad 90924d8b73ac96a1a8b1cb9ba6cae36e193061a1
# bad: [91396a53d7c7cb694627c665e0dbd2589c99eb0a] Merge branch into tip/master: 'timers/core'
git bisect bad 91396a53d7c7cb694627c665e0dbd2589c99eb0a
# bad: [fe7171d0d5dfbe189e41db99580ebacafc3c09ce] sched/fair: Simplify SIS_UTIL handling in select_idle_cpu()
git bisect bad fe7171d0d5dfbe189e41db99580ebacafc3c09ce
# good: [54a66e431eeacf23e1dc47cb3507f2d0c068aaf0] sched/headers: Inline raw_spin_rq_unlock()
git bisect good 54a66e431eeacf23e1dc47cb3507f2d0c068aaf0
# bad: [1cc8a33ca7e8d38f962b64ece2a42c411a67bc76] sched/topology: Allocate per-CPU sched_domain_shared in s_data
git bisect bad 1cc8a33ca7e8d38f962b64ece2a42c411a67bc76
# good: [786244f70322e41c937e69f0f935bfd11a9611bf] Merge tag 'v7.0-rc4' into sched/core, to pick up scheduler fixes
git bisect good 786244f70322e41c937e69f0f935bfd11a9611bf
# bad: [5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1] sched/topology: Extract "imb_numa_nr" calculation into a separate helper
git bisect bad 5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1
# bad: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions
git bisect bad 8e8e23dea43e64ddafbd1246644c3219209be113
# first bad commit: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions
