From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Yu Liao <liaoyu15@huawei.com>,
Thomas Gleixner <tglx@linutronix.de>,
Liu Tie <liutie4@huawei.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 07/55] hrtimers: Push pending hrtimers away from outgoing CPU earlier
Date: Mon, 11 Dec 2023 19:21:17 +0100
Message-ID: <20231211182012.494629114@linuxfoundation.org>
In-Reply-To: <20231211182012.263036284@linuxfoundation.org>
4.19-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner <tglx@linutronix.de>
[ Upstream commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 ]
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug")
solved the straightforward CPU hotplug deadlock vs. the scheduler
bandwidth timer. Yu discovered a more involved variant where a task which
has a bandwidth timer started on the outgoing CPU holds a lock and then
gets throttled. If the lock is required by one of the CPU hotplug callbacks,
the hotplug operation deadlocks because the unthrottling timer event is not
handled on the dying CPU and can only be recovered once the control CPU
reaches the hotplug state which pulls the pending hrtimers from the dead
CPU.
Solve this by pushing the hrtimers away from the dying CPU in the dying
callbacks. Nothing can queue a hrtimer on the dying CPU at that point because
all other CPUs spin in stop_machine() with interrupts disabled, and once the
operation is finished, the CPU is marked offline.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liu Tie <liutie4@huawei.com>
Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/cpuhotplug.h | 1 +
include/linux/hrtimer.h | 4 ++--
kernel/cpu.c | 8 +++++++-
kernel/time/hrtimer.c | 33 ++++++++++++---------------------
4 files changed, 22 insertions(+), 24 deletions(-)
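To make the ordering argument in the commit message concrete, here is a small user-space C model. It is a sketch, not kernel code. It assumes a heavily simplified, hypothetical subset of `enum cpuhp_state` (prefixed `M_`) kept in the same relative order as the hunks below, plus an invented PREPARE-stage callback, `M_HYPO_LOCK_PREPARE`, that needs the lock held by the throttled task. Teardown walks the states from high to low, and states below `M_AP_OFFLINE` are treated as running on the control CPU after the outgoing CPU has died.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified subset of enum cpuhp_state, in the same relative
 * order as include/linux/cpuhotplug.h. M_HYPO_LOCK_PREPARE is invented: a
 * PREPARE-stage callback that needs the lock held by the throttled task. */
enum model_state {
	M_OFFLINE,
	M_HRTIMERS_PREPARE,
	M_HYPO_LOCK_PREPARE,
	M_BRINGUP_CPU,
	M_AP_OFFLINE,
	M_AP_SMPCFD_DYING,
	M_AP_HRTIMERS_DYING,
	M_AP_ONLINE,
	M_TEARDOWN_CPU,
	M_ONLINE,
};

/* Model assumption: states below M_AP_OFFLINE are torn down on the control
 * CPU once the outgoing CPU is dead and can no longer expire hrtimers. */
static bool runs_after_cpu_dead(int s)
{
	return s < M_AP_OFFLINE;
}

/* Walk teardown from high to low state. Report a deadlock if the lock-needing
 * callback runs after the CPU died but before the state that moves hrtimers
 * off that CPU, i.e. while the unthrottle timer is stranded. */
static bool unplug_deadlocks(enum model_state migrate_state,
			     enum model_state lock_state)
{
	bool migrated = false;

	for (int s = M_ONLINE - 1; s > M_OFFLINE; s--) {
		if (s == (int)lock_state && runs_after_cpu_dead(s) && !migrated)
			return true;
		if (s == (int)migrate_state)
			migrated = true;
	}
	return false;
}
```

With migration at `M_HRTIMERS_PREPARE` (the old `hrtimers_dead_cpu` placement) the model reports a deadlock for a lock user at `M_HYPO_LOCK_PREPARE`; with migration at `M_AP_HRTIMERS_DYING` it does not, because the timers leave the outgoing CPU before it dies.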
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 71a0a5ffdbb1a..dd9f035be63f7 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -139,6 +139,7 @@ enum cpuhp_state {
CPUHP_AP_ARM_CORESIGHT_STARTING,
CPUHP_AP_ARM64_ISNDEP_STARTING,
CPUHP_AP_SMPCFD_DYING,
+ CPUHP_AP_HRTIMERS_DYING,
CPUHP_AP_X86_TBOOT_DYING,
CPUHP_AP_ARM_CACHE_B15_RAC_DYING,
CPUHP_AP_ONLINE,
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 542b4fa2cda9b..3bdaa92a2cab3 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -508,9 +508,9 @@ extern void sysrq_timer_list_show(void);
int hrtimers_prepare_cpu(unsigned int cpu);
#ifdef CONFIG_HOTPLUG_CPU
-int hrtimers_dead_cpu(unsigned int cpu);
+int hrtimers_cpu_dying(unsigned int cpu);
#else
-#define hrtimers_dead_cpu NULL
+#define hrtimers_cpu_dying NULL
#endif
#endif
diff --git a/kernel/cpu.c b/kernel/cpu.c
index c9ca190ec0347..34c09c3d37bc6 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1418,7 +1418,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
[CPUHP_HRTIMERS_PREPARE] = {
.name = "hrtimers:prepare",
.startup.single = hrtimers_prepare_cpu,
- .teardown.single = hrtimers_dead_cpu,
+ .teardown.single = NULL,
},
[CPUHP_SMPCFD_PREPARE] = {
.name = "smpcfd:prepare",
@@ -1485,6 +1485,12 @@ static struct cpuhp_step cpuhp_hp_states[] = {
.startup.single = NULL,
.teardown.single = smpcfd_dying_cpu,
},
+ [CPUHP_AP_HRTIMERS_DYING] = {
+ .name = "hrtimers:dying",
+ .startup.single = NULL,
+ .teardown.single = hrtimers_cpu_dying,
+ },
+
/* Entry state on starting. Interrupts enabled from here on. Transient
* state for synchronsization */
[CPUHP_AP_ONLINE] = {
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 8512f06f0ebef..bf74f43e42af0 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1922,29 +1922,22 @@ static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
}
}
-int hrtimers_dead_cpu(unsigned int scpu)
+int hrtimers_cpu_dying(unsigned int dying_cpu)
{
struct hrtimer_cpu_base *old_base, *new_base;
- int i;
+ int i, ncpu = cpumask_first(cpu_active_mask);
- BUG_ON(cpu_online(scpu));
- tick_cancel_sched_timer(scpu);
+ tick_cancel_sched_timer(dying_cpu);
+
+ old_base = this_cpu_ptr(&hrtimer_bases);
+ new_base = &per_cpu(hrtimer_bases, ncpu);
- /*
- * this BH disable ensures that raise_softirq_irqoff() does
- * not wakeup ksoftirqd (and acquire the pi-lock) while
- * holding the cpu_base lock
- */
- local_bh_disable();
- local_irq_disable();
- old_base = &per_cpu(hrtimer_bases, scpu);
- new_base = this_cpu_ptr(&hrtimer_bases);
/*
* The caller is globally serialized and nobody else
* takes two locks at once, deadlock is not possible.
*/
- raw_spin_lock(&new_base->lock);
- raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
+ raw_spin_lock(&old_base->lock);
+ raw_spin_lock_nested(&new_base->lock, SINGLE_DEPTH_NESTING);
for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
migrate_hrtimer_list(&old_base->clock_base[i],
@@ -1955,15 +1948,13 @@ int hrtimers_dead_cpu(unsigned int scpu)
* The migration might have changed the first expiring softirq
* timer on this CPU. Update it.
*/
- hrtimer_update_softirq_timer(new_base, false);
+ __hrtimer_get_next_event(new_base, HRTIMER_ACTIVE_SOFT);
+ /* Tell the other CPU to retrigger the next event */
+ smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
- raw_spin_unlock(&old_base->lock);
raw_spin_unlock(&new_base->lock);
+ raw_spin_unlock(&old_base->lock);
- /* Check, if we got expired work to do */
- __hrtimer_peek_ahead_timers();
- local_irq_enable();
- local_bh_enable();
return 0;
}
--
2.42.0
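The new `hrtimers_cpu_dying()` runs on the outgoing CPU itself and pushes its timers to the first CPU in `cpu_active_mask`. A minimal single-threaded user-space sketch of that data movement follows. It assumes a hypothetical `struct model_base` holding only an unsorted array of expiry times instead of the real per-clock-base timerqueues, and a `retrigger_pending` flag standing in for the `smp_call_function_single(ncpu, retrigger_next_event, NULL, 0)` IPI. Locking appears only as comments, and the hard/soft timer split is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS    4
#define MODEL_MAX_TIMERS 8

/* Hypothetical stand-in for struct hrtimer_cpu_base. */
struct model_base {
	uint64_t expires[MODEL_MAX_TIMERS]; /* absolute expiry times, unsorted */
	int nr;                             /* number of queued timers */
	uint64_t next_event;                /* earliest expiry, UINT64_MAX if none */
	bool retrigger_pending;             /* stands in for the retrigger IPI */
};

static struct model_base bases[MODEL_NR_CPUS];

/* Like cpumask_first(): lowest set bit, or -1 if the mask is empty. */
static int model_first_cpu(unsigned int mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return -1;
}

static uint64_t model_earliest(const struct model_base *b)
{
	uint64_t min = UINT64_MAX;

	for (int i = 0; i < b->nr; i++)
		if (b->expires[i] < min)
			min = b->expires[i];
	return min;
}

/* Push every timer from dying_cpu to the first active CPU; returns that CPU. */
static int model_cpu_dying(int dying_cpu, unsigned int active_mask)
{
	int ncpu = model_first_cpu(active_mask);
	struct model_base *src, *dst;

	/* The dying CPU has already been cleared from the active mask. */
	assert(ncpu >= 0 && ncpu != dying_cpu);
	src = &bases[dying_cpu];
	dst = &bases[ncpu];

	/* Kernel: raw_spin_lock(&old_base->lock), then
	 * raw_spin_lock_nested(&new_base->lock, SINGLE_DEPTH_NESTING). */
	for (int i = 0; i < src->nr; i++) {
		assert(dst->nr < MODEL_MAX_TIMERS);
		dst->expires[dst->nr++] = src->expires[i];
	}
	src->nr = 0;
	src->next_event = UINT64_MAX;

	/* Recompute the target's first expiry, then ask it to reprogram. */
	dst->next_event = model_earliest(dst);
	dst->retrigger_pending = true;

	/* Kernel: unlock new_base->lock, then old_base->lock. */
	return ncpu;
}
```

For example, with CPU 2 dying and CPUs 0, 1 and 3 active (mask `0xB`), the timers land on CPU 0, and CPU 0's next event becomes the earliest expiry across both sets of timers.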