From: Pingfan Liu <piliu@redhat.com>
To: Juri Lelli <juri.lelli@redhat.com>
Cc: "Waiman Long" <llong@redhat.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
"Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Pierre Gondois" <pierre.gondois@arm.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Valentin Schneider" <vschneid@redhat.com>
Subject: Re: [PATCHv3] sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
Date: Mon, 20 Oct 2025 21:34:13 +0800 [thread overview]
Message-ID: <aPY6VeMfcu_iddY4@fedora> (raw)
In-Reply-To: <aPXQra4TWR0NVwDQ@jlelli-thinkpadt14gen4.remote.csb>
Hi Juri,
Thanks for following up on this topic. Please check my comment below.
On Mon, Oct 20, 2025 at 08:03:25AM +0200, Juri Lelli wrote:
> Hi!
>
> On 20/10/25 11:21, Pingfan Liu wrote:
> > Hi Waiman,
> >
> > I appreciate your time in reviewing my patch. Please see the comment
> > belows.
> >
> > On Fri, Oct 17, 2025 at 01:52:45PM -0400, Waiman Long wrote:
> > > On 10/17/25 8:26 AM, Pingfan Liu wrote:
> > > > When testing kexec-reboot on a 144 cpus machine with
> > > > isolcpus=managed_irq,domain,1-71,73-143 in kernel command line, I
> > > > encounter the following bug:
> > > >
> > > > [ 97.114759] psci: CPU142 killed (polled 0 ms)
> > > > [ 97.333236] Failed to offline CPU143 - error=-16
> > > > [ 97.333246] ------------[ cut here ]------------
> > > > [ 97.342682] kernel BUG at kernel/cpu.c:1569!
> > > > [ 97.347049] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> > > > [ 97.353281] Modules linked in: rfkill sunrpc dax_hmem cxl_acpi cxl_port cxl_core einj vfat fat arm_smmuv3_pmu nvidia_cspmu arm_spe_pmu coresight_trbe arm_cspmu_module rndis_host ipmi_ssif cdc_ether i2c_smbus spi_nor usbnet ast coresight_tmc mii ixgbe i2c_algo_bit mdio mtd coresight_funnel coresight_stm stm_core coresight_etm4x coresight cppc_cpufreq loop fuse nfnetlink xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt nvme nvme_core nvme_auth i2c_tegra acpi_power_meter acpi_ipmi ipmi_devintf ipmi_msghandler dm_mirror dm_region_hash dm_log dm_mod
> > > > [ 97.404119] CPU: 0 UID: 0 PID: 2583 Comm: kexec Kdump: loaded Not tainted 6.12.0-41.el10.aarch64 #1
> > > > [ 97.413371] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 2.0 07/12/2024
> > > > [ 97.420400] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> > > > [ 97.427518] pc : smp_shutdown_nonboot_cpus+0x104/0x128
> > > > [ 97.432778] lr : smp_shutdown_nonboot_cpus+0x11c/0x128
> > > > [ 97.438028] sp : ffff800097c6b9a0
> > > > [ 97.441411] x29: ffff800097c6b9a0 x28: ffff0000a099d800 x27: 0000000000000000
> > > > [ 97.448708] x26: 0000000000000000 x25: 0000000000000000 x24: ffffb94aaaa8f218
> > > > [ 97.456004] x23: ffffb94aaaabaae0 x22: ffffb94aaaa8f018 x21: 0000000000000000
> > > > [ 97.463301] x20: ffffb94aaaa8fc10 x19: 000000000000008f x18: 00000000fffffffe
> > > > [ 97.470598] x17: 0000000000000000 x16: ffffb94aa958fcd0 x15: ffff103acfca0b64
> > > > [ 97.477894] x14: ffff800097c6b520 x13: 36312d3d726f7272 x12: ffff103acfc6ffa8
> > > > [ 97.485191] x11: ffff103acf6f0000 x10: ffff103bc085c400 x9 : ffffb94aa88a0eb0
> > > > [ 97.492488] x8 : 0000000000000001 x7 : 000000000017ffe8 x6 : c0000000fffeffff
> > > > [ 97.499784] x5 : ffff003bdf62b408 x4 : 0000000000000000 x3 : 0000000000000000
> > > > [ 97.507081] x2 : 0000000000000000 x1 : ffff0000a099d800 x0 : 0000000000000002
> > > > [ 97.514379] Call trace:
> > > > [ 97.516874] smp_shutdown_nonboot_cpus+0x104/0x128
> > > > [ 97.521769] machine_shutdown+0x20/0x38
> > > > [ 97.525693] kernel_kexec+0xc4/0xf0
> > > > [ 97.529260] __do_sys_reboot+0x24c/0x278
> > > > [ 97.533272] __arm64_sys_reboot+0x2c/0x40
> > > > [ 97.537370] invoke_syscall.constprop.0+0x74/0xd0
> > > > [ 97.542179] do_el0_svc+0xb0/0xe8
> > > > [ 97.545562] el0_svc+0x44/0x1d0
> > > > [ 97.548772] el0t_64_sync_handler+0x120/0x130
> > > > [ 97.553222] el0t_64_sync+0x1a4/0x1a8
> > > > [ 97.556963] Code: a94363f7 a8c47bfd d50323bf d65f03c0 (d4210000)
> > > > [ 97.563191] ---[ end trace 0000000000000000 ]---
> > > > [ 97.595854] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > > > [ 97.602275] Kernel Offset: 0x394a28600000 from 0xffff800080000000
> > > > [ 97.608502] PHYS_OFFSET: 0x80000000
> > > > [ 97.612062] CPU features: 0x10,0000000d,002a6928,5667fea7
> > > > [ 97.617580] Memory Limit: none
> > > > [ 97.648626] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]
> > > >
> > > > Tracking down this issue, I found that dl_bw_deactivate() returned
> > > > -EBUSY, which caused sched_cpu_deactivate() to fail on the last CPU.
> > > > When a CPU is inactive, its rd is set to def_root_domain. For an
> > > > blocked-state deadline task (in this case, "cppc_fie"), it was not
> > > > migrated to CPU0, and its task_rq() information is stale. As a result,
> > > > its bandwidth is wrongly accounted into def_root_domain during domain
> > > > rebuild.
> > >
> > > First of all, in an emergency situation when we need to shutdown the kernel,
> > > does it really matter if dl_bw_activate() returns -EBUSY? Should we just go
> > > ahead and ignore this dl_bw generated error?
> > >
> >
> > Ah, sorry - the previous test example was misleading. Let me restate it
> > as an equivalent operation on a system with 144 CPUs:
> > sudo bash -c 'taskset -cp 0 $$ && for i in {1..143}; do echo 0 > /sys/devices/system/cpu/cpu$i/online 2>/dev/null; done'
> >
> > That extracts the hot-removal part, which is affected by the bug, from
> > the kexec reboot process. It expects that only cpu0 is online, but in
> > practice, the cpu143 refused to be offline due to this bug.
>
> I confess I am still perplexed by this, considering the "particular"
> nature of cppc worker that seems to be the only task that is able to
> trigger this problem. First of all, is that indeed the case or are you
> able to reproduce this problem with standard (non-kthread) DEADLINE
> tasks as well?
>
Yes, I can. I wrote a SCHED_DEADLINE task that waits indefinitely on a
semaphore (or, more precisely, for a very long period that may span the
entire CPU hot-removal process) to emulate waiting for an undetermined
driver input. Then I spawned multiple instances of this program to
ensure that some of them run on CPU 72. When I attempted to offline CPUs
1–143 one by one, CPU 143 failed to go offline.
> I essentially wonder how cppc worker affinity/migration on hotplug is
> handled. With your isolcpus configuration you have one isolated root
The affinity/migration on hotplug work fine. The keypoint is that they
only handle the task on rq. For the blocked-state tasks (here it is cppc
worker), they just ignore them.
Thanks,
Pingfan
> domain per isolated cpu, so if cppc worker is not migrated away from (in
> the case above) cpu 143, then BW control might be right in saying we
> can't offline that cpu, as the worker still has BW running there. This
> is also why I fist wondered (and suggested) we remove cppc worker BW
> from the picture (make it DEADLINE special) as we don't really seem to
> have a reliable way to associate meaningful BW to it anyway.
>
> Thanks,
> Juri
>
next prev parent reply other threads:[~2025-10-20 13:34 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-17 12:26 [PATCHv3] sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug Pingfan Liu
2025-10-17 17:52 ` Waiman Long
2025-10-20 3:21 ` Pingfan Liu
2025-10-20 6:03 ` Juri Lelli
2025-10-20 13:34 ` Pingfan Liu [this message]
2025-10-20 15:25 ` Juri Lelli
2025-10-18 4:36 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPY6VeMfcu_iddY4@fedora \
--to=piliu@redhat.com \
--cc=bsegall@google.com \
--cc=cgroups@vger.kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=llong@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=peterz@infradead.org \
--cc=pierre.gondois@arm.com \
--cc=rostedt@goodmis.org \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox