public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tick/broadcast-hrtimer: Prevent the timer device on broadcast duty CPU from being disabled
@ 2023-12-18  2:58 Yu Liao
  2024-01-12  7:40 ` Yu Liao
  2024-01-24 19:18 ` Thomas Gleixner
  0 siblings, 2 replies; 5+ messages in thread
From: Yu Liao @ 2023-12-18  2:58 UTC (permalink / raw)
  To: linux-kernel, tglx; +Cc: liaoyu15, liwei391, wangxiongfeng2, frederic, mingo

It was found that running the LTP hotplug stress test on a aarch64
system could produce rcu_sched stall warnings.

The issue is the following:

CPU1 (owns the broadcast hrtimer)	CPU2

				tick_broadcast_enter()
				//shut down local timer device
				...
				tick_broadcast_exit()
				//exits with tick_broadcast_force_mask set,
				timer device remains disabled

				initiates offlining of CPU1
take_cpu_down()
//CPU1 shuts down and does
not send broadcast IPI anymore
				takedown_cpu()
				  hotplug_cpu__broadcast_tick_pull()
				  //move broadcast hrtimer to this CPU
				    clockevents_program_event()
				      bc_set_next()
					hrtimer_start()
					//does not call hrtimer_reprogram()
					to program timer device if expires
					equals dev->next_event, so the timer
					device remains disabled.

CPU2 takes over the broadcast duty but local timer device is disabled,
causing many CPUs to become stuck.

Fix this by calling tick_program_event() to reprogram the local timer
device in this scenario.

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
---
 kernel/time/tick-broadcast-hrtimer.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index e28f9210f8a1..6a4a612581fb 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -42,10 +42,22 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
+	ktime_t next_event = this_cpu_ptr(&tick_cpu_device)->evtdev->next_event;
+
 	/*
-	 * This is called either from enter/exit idle code or from the
-	 * broadcast handler. In all cases tick_broadcast_lock is held.
-	 *
+	 * This can be called from CPU offline operation to move broadcast
+	 * assignment. If tick_broadcast_force_mask is set, the CPU local
+	 * timer device may be disabled. And hrtimer_reprogram() will not
+	 * called if the timer is not the first expiring timer. Reprogram
+	 * the cpu local timer device to ensure we can take over the
+	 * broadcast duty.
+	 */
+	if (tick_check_broadcast_expired() && expires >= next_event)
+		tick_program_event(next_event, 1);
+
+	/*
+	 * This is called from enter/exit idle code, broadcast handler or
+	 * CPU offline operation. In all cases tick_broadcast_lock is held.
 	 * hrtimer_cancel() cannot be called here neither from the
 	 * broadcast handler nor from the enter/exit idle code. The idle
 	 * code can run into the problem described in bc_shutdown() and the
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-07-11 13:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-18  2:58 [PATCH] tick/broadcast-hrtimer: Prevent the timer device on broadcast duty CPU from being disabled Yu Liao
2024-01-12  7:40 ` Yu Liao
2024-01-12 15:34   ` Thomas Gleixner
2024-01-24 19:18 ` Thomas Gleixner
2024-07-11 13:10   ` Yu Liao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox