From: Srikar Dronamraju <srikar@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Ben Segall <bsegall@google.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Mel Gorman <mgorman@suse.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Swapnil Sapkal <swapnil.sapkal@amd.com>,
	Thomas Huth <thuth@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	virtualization@lists.linux.dev,
	Ilya Leoshkevich <iii@linux.ibm.com>,
	Beata Michalska <beata.michalska@arm.com>
Subject: Re: [PATCH 08/17] sched/core: Implement CPU soft offline/online
Date: Sat, 6 Dec 2025 00:24:51 +0530
Message-ID: <aTMqeylKyRwS7mn_@linux.ibm.com>
In-Reply-To: <20251205160326.GF2528459@noisy.programming.kicks-ass.net>

* Peter Zijlstra <peterz@infradead.org> [2025-12-05 17:03:26]:

Hi Peter, 


> 
> What happens if you then offline one of these softoffline CPUs? Doesn't
> that do sched_cpu_deactivate() again?
> 
> Also, the way this seems to use softoffline_mask is as a hidden argument
> to sched_cpu_{de,}activate() instead of as an actual mask.
> 
> Moreover, there does not seem to be any sort of serialization vs
> concurrent set_cpu_softoffline() callers. At the very least
> update_group_capacity() would end up with indeterminate results.
> 

To serialize soft offline against actual CPU offline, could we take
cpu_maps_update_begin() / cpu_maps_update_done()?
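
A minimal userspace sketch of that idea follows. It is only a model, not
kernel code: a pthread mutex stands in for the lock behind
cpu_maps_update_begin() / cpu_maps_update_done(), and every model_* name, the
64-bit masks, and the choice to skip a second deactivate for an already
soft-offlined CPU are assumptions made for illustration, not what the patch
does today.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the lock behind cpu_maps_update_begin()/_done(). */
static pthread_mutex_t model_maps_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical state: one bit per CPU, at most 64 CPUs in this model. */
static uint64_t model_active_mask = ~0ULL;
static uint64_t model_softoffline_mask;

static void model_maps_update_begin(void)
{
	pthread_mutex_lock(&model_maps_lock);
}

static void model_maps_update_done(void)
{
	pthread_mutex_unlock(&model_maps_lock);
}

/* Soft offline: mark the CPU and deactivate it, all under the lock. */
static bool model_soft_offline(unsigned int cpu)
{
	bool changed = false;

	assert(cpu < 64);
	model_maps_update_begin();
	if ((model_active_mask >> cpu) & 1) {
		model_softoffline_mask |= 1ULL << cpu;
		model_active_mask &= ~(1ULL << cpu);
		changed = true;
	}
	model_maps_update_done();
	return changed;
}

/* Real offline: same lock, so it cannot interleave with a soft offline. */
static bool model_hard_offline(unsigned int cpu)
{
	bool deactivated = false;

	assert(cpu < 64);
	model_maps_update_begin();
	if ((model_active_mask >> cpu) & 1) {
		model_active_mask &= ~(1ULL << cpu);
		deactivated = true;
	}
	/* An already soft-offlined CPU is inactive: no second deactivate. */
	model_softoffline_mask &= ~(1ULL << cpu);
	model_maps_update_done();
	return deactivated;
}
```

In this model, model_hard_offline() on a CPU that was soft-offlined earlier
returns false, i.e. the deactivate step runs only once per CPU.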


> This all doesn't look 'robust'.

I figured this out when Shrikanth Hegde reported a warning to me this evening.

To reproduce: pin a task to a CPU, run a workload heavy enough that the load
causes steal, and then do a CPU offline.
Pinning just widens the window enough to hit the case easily.
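
The offline step of that recipe can be driven through the standard sysfs
hotplug file. Below is a minimal C sketch of just that step; the helper names
cpu_online_path() and set_cpu_online() are hypothetical, the write needs root,
and the pinning itself is assumed to be done separately, for example with
taskset(1) or sched_setaffinity(2).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs hotplug path for a CPU.
 * Returns 0 on success, -1 if the buffer is too small. */
static int cpu_online_path(int cpu, char *buf, size_t len)
{
	int n;

	assert(cpu >= 0);
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "0" (offline) or "1" (online) to the CPU's sysfs file.
 * Returns 0 on success, -1 on any failure (no such CPU, no permission). */
static int set_cpu_online(int cpu, int online)
{
	const char *val = online ? "1" : "0";
	char path[64];
	FILE *f;
	int ret = 0;

	if (cpu_online_path(cpu, path, sizeof(path)) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fwrite(val, 1, strlen(val), f) != strlen(val))
		ret = -1;
	/* The kernel reports hotplug errors when the buffer is flushed. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Calling set_cpu_online(N, 0) and then set_cpu_online(N, 1) while the pinned
workload is running would take CPU N down and bring it back.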

[  804.464298] ------------[ cut here ]------------
[  804.464325] CPU capacity asymmetry not supported on SMT
[  804.464341] WARNING: CPU: 575 PID: 2926 at kernel/sched/topology.c:1677 sd_init+0x428/0x494
[  804.464355] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls rfkill ip_set nf_tables nfnetlink sunrpc pseries_rng vmx_crypto drm drm_panel_orientation_quirks xfs sd_mod sg ibmvscsi scsi_transport_srp ibmveth pseries_wdt dm_mirror dm_region_hash dm_log dm_mod fuse
[  804.464409] CPU: 575 UID: 0 PID: 2926 Comm: cpuhp/575 Kdump: loaded Not tainted 6.18.0-master+ #15 VOLUNTARY
[  804.464415] Hardware name: IBM,9080-HEU Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.00 (OK1110_066) hv:phyp pSeries
[  804.464420] NIP:  c000000000215c4c LR: c000000000215c48 CTR: 00000000005d54a0
[  804.464425] REGS: c00001801cfff3c0 TRAP: 0700   Not tainted  (6.18.0-master+)
[  804.464429] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28828228  XER: 0000000c
[  804.464441] CFAR: c000000000171988 IRQMASK: 0
               GPR00: c000000000215c48 c00001801cfff660 c000000001c28100 000000000000002b
               GPR04: 0000000000000000 c00001801cfff470 c00001801cfff468 000001fff1280000
               GPR08: 0000000000000027 0000000000000000 0000000000000000 0000000000000001
               GPR12: c00001ffe182ffa8 c00001fff5d43b00 c00001804e999548 0000000000000000
               GPR16: 0000000000000000 c0000000015732e8 c00000000153f380 c00000012b337c18
               GPR20: c000000002edb660 0000000000000239 0000000000000004 c000018029a26200
               GPR24: 0000000000000000 c0000000029787c8 0000000000000002 c00000012b337c00
               GPR28: c00001804e7cb948 c000000002ee06d0 c00001804e7cb800 c0000000029787c8
[  804.464491] NIP [c000000000215c4c] sd_init+0x428/0x494
[  804.464496] LR [c000000000215c48] sd_init+0x424/0x494
[  804.464501] Call Trace:
[  804.464504] [c00001801cfff660] [c000000000215c48] sd_init+0x424/0x494 (unreliable)
[  804.464511] [c00001801cfff740] [c000000000226fd8] build_sched_domains+0x1c0/0x938
[  804.464517] [c00001801cfff850] [c000000000228f98] partition_sched_domains_locked+0x4a8/0x688
[  804.464523] [c00001801cfff940] [c000000000229244] partition_sched_domains+0x5c/0x84
[  804.464528] [c00001801cfff990] [c00000000031a020] rebuild_sched_domains_locked+0x1d8/0x260
[  804.464536] [c00001801cfff9f0] [c00000000031dde4] cpuset_handle_hotplug+0x564/0x728
[  804.464542] [c00001801cfffd80] [c0000000001d9fa8] sched_cpu_activate+0x2d4/0x2dc
[  804.464549] [c00001801cfffde0] [c00000000017567c] cpuhp_invoke_callback+0x26c/0xb20
[  804.464556] [c00001801cfffec0] [c000000000177554] cpuhp_thread_fun+0x210/0x2e8
[  804.464561] [c00001801cffff40] [c0000000001c1640] smpboot_thread_fn+0x200/0x2c0
[  804.464568] [c00001801cffff90] [c0000000001b5758] kthread+0x134/0x164
[  804.464575] [c00001801cffffe0] [c00000000000ded8] start_kernel_thread+0x14/0x18
[  804.464581] Code: 4082fe5c 3d420120 894a2525 2c0a0000 4082fe4c 3c62ff95 39200001 3d420120 38639830 992a2525 4bf5bcbd 60000000 <0fe00000> 813e003c 4bfffe24 60000000
[  804.464598] ---[ end trace 0000000000000000 ]---


But this warning will remain even if we take cpu_maps_update_begin().

This comes from
	WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
		  (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
		  "CPU capacity asymmetry not supported on SMT\n");

which was recently added by
commit c744dc4ab58d ("sched/topology: Rework CPU capacity asymmetry detection").
Is there a way to tweak this WARN_ONCE()?
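
For reference, the condition in that WARN_ONCE() is a plain "both bits set"
test on sd->flags. A small userspace sketch of the same comparison follows.
The MODEL_* bit values are placeholders chosen for this example, not the
kernel's real SD_* values, and model_asym_on_smt() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for this model only; the kernel's real SD_*
 * values come from its own sched-domain flag definitions. */
#define MODEL_SD_SHARE_CPUCAPACITY	(1u << 0)
#define MODEL_SD_ASYM_CPUCAPACITY	(1u << 1)

static_assert((MODEL_SD_SHARE_CPUCAPACITY & MODEL_SD_ASYM_CPUCAPACITY) == 0,
	      "model flags must be distinct bits");

/* True only when BOTH flags are set, mirroring the WARN_ONCE() condition:
 * (flags & (A | B)) == (A | B). */
static bool model_asym_on_smt(unsigned int sd_flags)
{
	const unsigned int both = MODEL_SD_SHARE_CPUCAPACITY |
				  MODEL_SD_ASYM_CPUCAPACITY;

	return (sd_flags & both) == both;
}
```

Either flag on its own leaves the check quiet; it fires only for a domain
that carries both at once.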

-- 
Thanks and Regards
Srikar Dronamraju


