Date: Fri, 20 Mar 2026 16:58:24 -0700
From: Nathan Chancellor
To: Peter Zijlstra, K Prateek Nayak
Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org,
 Shrikanth Hegde, Chen Yu, Valentin Schneider, Dietmar Eggemann,
 x86@kernel.org
Subject: Re: [tip: sched/core] sched/topology: Compute sd_weight considering cpuset partitions
Message-ID: <20260320235824.GA1176840@ax162>
References: <20260312044434.1974-2-kprateek.nayak@amd.com>
 <177382132440.1647592.1849180094328011054.tip-bot2@tip-bot2>
In-Reply-To: <177382132440.1647592.1849180094328011054.tip-bot2@tip-bot2>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi all,

On Wed, Mar 18, 2026 at 08:08:44AM -0000, tip-bot2 for K Prateek Nayak wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID:     8e8e23dea43e64ddafbd1246644c3219209be113
> Gitweb:        https://git.kernel.org/tip/8e8e23dea43e64ddafbd1246644c3219209be113
> Author:        K Prateek Nayak
> AuthorDate:    Thu, 12 Mar 2026 04:44:26
> Committer:     Peter Zijlstra
> CommitterDate: Wed, 18 Mar 2026 09:06:47 +01:00
>
> sched/topology: Compute sd_weight considering cpuset partitions
>
> The "sd_weight" used for calculating the load balancing interval, and
> its limits, considers the span weight of the entire topology level
> without accounting for cpuset partitions.
>
> For example, consider a large system of 128CPUs divided into 8 * 16CPUs
> partition which is typical when deploying virtual machines:
>
>   [                    PKG Domain: 128CPUs                      ]
>
>   [Partition0: 16CPUs][Partition1: 16CPUs] ... [Partition7: 16CPUs]
>
> Although each partition only contains 16CPUs, the load balancing
> interval is set to a minimum of 128 jiffies considering the span of the
> entire domain with 128CPUs which can lead to longer imbalances within
> the partition although balancing within is cheaper with 16CPUs.
>
> Compute the "sd_weight" after computing the "sd_span" considering the
> cpu_map covered by the partition, and set the load balancing interval,
> and its limits accordingly.
>
> For the above example, the balancing intervals for the partitions PKG
> domain changes as follows:
>
>                     before   after
>   balance_interval   128      16
>   min_interval       128      16
>   max_interval       256      32
>
> Intervals are now proportional to the CPUs in the partitioned domain as
> was intended by the original formula.
>
> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
> Signed-off-by: K Prateek Nayak
> Signed-off-by: Peter Zijlstra (Intel)
> Reviewed-by: Shrikanth Hegde
> Reviewed-by: Chen Yu
> Reviewed-by: Valentin Schneider
> Reviewed-by: Dietmar Eggemann
> Tested-by: Dietmar Eggemann
> Link: https://patch.msgid.link/20260312044434.1974-2-kprateek.nayak@amd.com
> ---
>  kernel/sched/topology.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 061f8c8..79bab80 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1645,13 +1645,17 @@ sd_init(struct sched_domain_topology_level *tl,
>  	struct cpumask *sd_span;
>  	u64 now = sched_clock();
>
> -	sd_weight = cpumask_weight(tl->mask(tl, cpu));
> +	sd_span = sched_domain_span(sd);
> +	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> +	sd_weight = cpumask_weight(sd_span);
> +	sd_id = cpumask_first(sd_span);
>
>  	if (tl->sd_flags)
>  		sd_flags = (*tl->sd_flags)();
>  	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> -			"wrong sd_flags in topology description\n"))
> +		      "wrong sd_flags in topology description\n"))
>  		sd_flags &= TOPOLOGY_SD_FLAGS;
> +	sd_flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
>
>  	*sd = (struct sched_domain){
>  		.min_interval		= sd_weight,
> @@ -1689,12 +1693,6 @@ sd_init(struct sched_domain_topology_level *tl,
>  		.name			= tl->name,
>  	};
>
> -	sd_span = sched_domain_span(sd);
> -	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
> -	sd_id = cpumask_first(sd_span);
> -
> -	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
> -
>  	WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
>  		  (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
>  		  "CPU capacity asymmetry not supported on SMT\n");

Apologies if this has already been reported or addressed but I am seeing
a crash when booting certain ARM configurations after this change landed
in -next. I reduced it down to

  $ cat kernel/configs/schedstats.config
  CONFIG_SCHEDSTATS=y

  $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper defconfig schedstats.config zImage

  $ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/arm-rootfs.cpio.zst | zstd -d >rootfs.cpio

  $ qemu-system-arm \
      -display none \
      -nodefaults \
      -no-reboot \
      -machine virt \
      -append 'console=ttyAMA0 earlycon' \
      -kernel arch/arm/boot/zImage \
      -initrd rootfs.cpio \
      -m 1G \
      -serial mon:stdio
  [    0.000000] Booting Linux on physical CPU 0x0
  [    0.000000] Linux version 7.0.0-rc4-00017-g8e8e23dea43e (nathan@framework-amd-ryzen-maxplus-395) (arm-linux-gnueabi-gcc (GCC) 15.2.0, GNU ld (GNU Binutils) 2.45) #1 SMP Fri Mar 20 16:12:05 MST 2026
  ...
  [    0.031929] 8<--- cut here ---
  [    0.031999] Unable to handle kernel NULL pointer dereference at virtual address 00000000 when write
  [    0.032172] [00000000] *pgd=00000000
  [    0.032459] Internal error: Oops: 805 [#1] SMP ARM
  [    0.032902] Modules linked in:
  [    0.033466] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00017-g8e8e23dea43e #1 VOLUNTARY
  [    0.033658] Hardware name: Generic DT based system
  [    0.033770] PC is at build_sched_domains+0x7d0/0x1628
  [    0.034091] LR is at build_sched_domains+0x78c/0x1628
  [    0.034166] pc : []    lr : []    psr: 20000053
  [    0.034255] sp : f080dec0  ip : 00000000  fp : c1e244a4
  [    0.034339] r10: c1e04fd4  r9 : c1e24518  r8 : 00000000
  [    0.034415] r7 : c2088f20  r6 : c28db924  r5 : c1e051ec  r4 : 00000010
  [    0.034508] r3 : 00000000  r2 : 00000000  r1 : 00000010  r0 : 00000010
  [    0.034623] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
  [    0.034730] Control: 10c5387d  Table: 4020406a  DAC: 00000051
  [    0.034819] Register r0 information: zero-size pointer
  [    0.034990] Register r1 information: zero-size pointer
  [    0.035064] Register r2 information: NULL pointer
  [    0.035133] Register r3 information: NULL pointer
  [    0.035198] Register r4 information: zero-size pointer
  [    0.035266] Register r5 information: non-slab/vmalloc memory
  [    0.035376] Register r6 information: slab kmalloc-512 start c28db800 pointer offset 292 size 512
  [    0.035623] Register r7 information: non-slab/vmalloc memory
  [    0.035703] Register r8 information: NULL pointer
  [    0.035769] Register r9 information: non-slab/vmalloc memory
  [    0.035848] Register r10 information: non-slab/vmalloc memory
  [    0.035928] Register r11 information: non-slab/vmalloc memory
  [    0.036006] Register r12 information: NULL pointer
  [    0.036083] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
  [    0.036243] Stack: (0xf080dec0 to 0xf080e000)
  [    0.036339] dec0: 00000000 c139a06c 00000001 00000000 c1e243f4 c28db924 c28db800 00000000
  [    0.036450] dee0: 00000000 ffff8ad3 00000000 00000001 c18f9f1c 00000000 c1e03d80 c1a8d4d0
  [    0.036559] df00: 00000000 c2073b8f c28e3180 00000000 c20d0050 c1d7ea64 c28b8800 f4b63fe3
  [    0.036665] df20: c1e22714 c2969480 c1e22714 00000000 c2074620 c1a8a0e8 00000000 00000000
  [    0.036772] df40: f080df6c c1c1d724 20000053 c0303d80 f080df64 f4b63fe3 c1d703dc c1d703dc
  [    0.036878] df60: c1d703dc 00000000 00000000 c1c01368 c2969480 f080df74 f080df74 f4b63fe3
  [    0.036989] df80: 00000000 c1e04f80 c13979fc 00000000 00000000 00000000 00000000 00000000
  [    0.037097] dfa0: 00000000 c1397a14 00000000 c03001ac 00000000 00000000 00000000 00000000
  [    0.037206] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  [    0.037316] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
  [    0.037447] Call trace:
  [    0.037698]  build_sched_domains from sched_init_smp+0x80/0x108
  [    0.037943]  sched_init_smp from kernel_init_freeable+0xe8/0x24c
  [    0.038029]  kernel_init_freeable from kernel_init+0x18/0x12c
  [    0.038122]  kernel_init from ret_from_fork+0x14/0x28
  [    0.038209] Exception stack(0xf080dfb0 to 0xf080dff8)
  [    0.038277] dfa0:                                     00000000 00000000 00000000 00000000
  [    0.038386] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  [    0.038495] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
  [    0.038640] Code: e58d3020 e58d300c e59d3020 e59d200c (e5832000)
  [    0.038903] ---[ end trace 0000000000000000 ]---
  [    0.039275] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  [    0.039628] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

If there is any more information I can provide or patches I can test, I
am more than happy to do so.

Cheers,
Nathan

# bad: [b5d083a3ed1e2798396d5e491432e887da8d4a06] Add linux-next specific files for 20260319
# good: [8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb] Merge tag 'nfsd-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect start 'b5d083a3ed1e2798396d5e491432e887da8d4a06' '8a30aeb0d1b4e4aaf7f7bae72f20f2ae75385ccb'
# good: [21fbd87ec0afe2af5457f5a7f9acbee4bf5db891] Merge branch 'main' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 21fbd87ec0afe2af5457f5a7f9acbee4bf5db891
# good: [bffa4391cf4ee844778893a781f14faa55c75cce] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
git bisect good bffa4391cf4ee844778893a781f14faa55c75cce
# bad: [a360efb89caee066919156db3921e616093c43b6] Merge branch 'for-leds-next' of https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git
git bisect bad a360efb89caee066919156db3921e616093c43b6
# good: [77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 77f1b9e1181ac53ae9ce7c3c0e52002d02495c5e
# bad: [d0b3afea83e48990083c0367c10f02af751166b4] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
git bisect bad d0b3afea83e48990083c0367c10f02af751166b4
# good: [fe58c95c6f191a8c45dc183a2348a3b4caa77ed8] Merge branch into tip/master: 'perf/core'
git bisect good fe58c95c6f191a8c45dc183a2348a3b4caa77ed8
# bad: [90924d8b73ac96a1a8b1cb9ba6cae36e193061a1] Merge branch into tip/master: 'timers/vdso'
git bisect bad 90924d8b73ac96a1a8b1cb9ba6cae36e193061a1
# bad: [91396a53d7c7cb694627c665e0dbd2589c99eb0a] Merge branch into tip/master: 'timers/core'
git bisect bad 91396a53d7c7cb694627c665e0dbd2589c99eb0a
# bad: [fe7171d0d5dfbe189e41db99580ebacafc3c09ce] sched/fair: Simplify SIS_UTIL handling in select_idle_cpu()
git bisect bad fe7171d0d5dfbe189e41db99580ebacafc3c09ce
# good: [54a66e431eeacf23e1dc47cb3507f2d0c068aaf0] sched/headers: Inline raw_spin_rq_unlock()
git bisect good 54a66e431eeacf23e1dc47cb3507f2d0c068aaf0
# bad: [1cc8a33ca7e8d38f962b64ece2a42c411a67bc76] sched/topology: Allocate per-CPU sched_domain_shared in s_data
git bisect bad 1cc8a33ca7e8d38f962b64ece2a42c411a67bc76
# good: [786244f70322e41c937e69f0f935bfd11a9611bf] Merge tag 'v7.0-rc4' into sched/core, to pick up scheduler fixes
git bisect good 786244f70322e41c937e69f0f935bfd11a9611bf
# bad: [5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1] sched/topology: Extract "imb_numa_nr" calculation into a separate helper
git bisect bad 5a7b576b3ec1acc2694c5b58f80cd1d44a11b2c1
# bad: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions
git bisect bad 8e8e23dea43e64ddafbd1246644c3219209be113
# first bad commit: [8e8e23dea43e64ddafbd1246644c3219209be113] sched/topology: Compute sd_weight considering cpuset partitions
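
As a reading aid for the interval arithmetic in the quoted commit message
(not an analysis of the crash), here is a rough userspace sketch of how
restricting the span to cpu_map changes the weight and hence the
intervals. Everything in it is an assumption made for illustration:
struct mask128, mask_range(), mask_and(), mask_weight() and
intervals_for() are hypothetical stand-ins and not the kernel's cpumask
API, and the balance/max relationships (weight and 2 * weight) are read
off the before/after table above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 128u

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 128 CPUs. */
struct mask128 {
	uint64_t bits[2];
};

/* Build a mask with CPUs [first, first + count) set. */
static struct mask128 mask_range(unsigned int first, unsigned int count)
{
	struct mask128 m = { { 0, 0 } };
	unsigned int cpu;

	assert(first <= MAX_CPUS && count <= MAX_CPUS - first);
	for (cpu = first; cpu < first + count; cpu++)
		m.bits[cpu / 64] |= (uint64_t)1 << (cpu % 64);
	return m;
}

/* Intersection of two masks, analogous in spirit to cpumask_and(). */
static struct mask128 mask_and(const struct mask128 *a, const struct mask128 *b)
{
	struct mask128 m;

	m.bits[0] = a->bits[0] & b->bits[0];
	m.bits[1] = a->bits[1] & b->bits[1];
	return m;
}

/* Number of set bits, analogous in spirit to cpumask_weight(). */
static unsigned int mask_weight(const struct mask128 *m)
{
	unsigned int i, n = 0;

	for (i = 0; i < 2; i++) {
		uint64_t w = m->bits[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			n++;
		}
	}
	return n;
}

struct intervals {
	unsigned int balance;
	unsigned int min;
	unsigned int max;
};

/* Intervals as implied by the commit message table: weight, weight, 2 * weight. */
static struct intervals intervals_for(unsigned int sd_weight)
{
	struct intervals iv = { sd_weight, sd_weight, 2 * sd_weight };

	return iv;
}
```

With a 128-CPU topology level and a cpu_map covering one 16-CPU
partition, taking the weight of the level alone gives 128/128/256, while
taking the weight of the intersection gives 16/16/32, which matches the
"before" and "after" columns in the commit message.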