From: Tejun Heo <tj@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Kyle McMartin <jkkm@meta.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Linux RT Development <linux-rt-devel@lists.linux.dev>,
	Clark Williams <williams@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	John Kacur <jkacur@redhat.com>
Subject: Re: [PATCH sched/core] sched/rt: Fix RT_PUSH_IPI soft lockup loop
Date: Wed, 13 May 2026 09:39:14 -1000
Message-ID: <20260513193914.1593369-1-tj@kernel.org>
In-Reply-To: <20260512172847.5024e5e8@gandalf.local.home>

Hello,

Capturing this on the actual production hosts is awkward. It requires
a fleet with a particular management operation in flight, and while
the aggregate occurrence rate is reliable, which specific machine
hits it isn't predictable, so I haven't been able to catch one with
tracing on.

Production context: hosts serve live network traffic and storage IO.
CPU util before lockup is moderate (~30-40% steady state), but the
moment-to-moment softirq work is bursty - traffic patterns, IO
completions, plus the PSI poll triggers that source the migratable
psimon. Once softirq processing falls behind on a CPU, work piles
up. With the prio-bail-without-clearing-overload path, the IPI
storm forms on top and amplifies the slowdown.

So, here's a capture from a synthetic reproducer that I think
models the dynamic and reaches the same end state.

Test box, 192 CPUs (two of them targets), kernel without the fix:

- Per-target hrtimer (HRTIMER_MODE_REL_PINNED_HARD) fires every
  750us. Each fire schedules one tasklet round-robin from a pool
  of 20k distinct tasklets. Each tasklet body is a 500us cpu_relax
  loop, standing in for "process one item of softirq work".

- Storm driver: 190 SCHED_FIFO-50 nanosleep loops on non-target
  CPUs drive tell_cpu_to_push from balance_rt. Two synthetic
  psimon-shaped kthreads (FIFO 1) are bound to the targets to pin
  them into rto_mask.

Baseline (no storm helpers): ~85% softirq util, no lockup, runs
indefinitely. The reproducer's baseline is higher than production -
my guess is we need to scrape up against capacity to grow a backlog
with the fixed-shape workload here, while production gets the same
effect from bursty arrivals during brief slowdowns.

With the storm: walker IPI overhead stretches each tasklet body
from 500us to ~1.1ms. Service rate drops below arrival, backlog
grows ~430/s. After ~46s, one tasklet_action_common snapshot has
~20k tasklets which it processes serially in BH-disabled softirq
context. That's ~22s uninterruptible, so the watchdog fires.
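
The numbers above can be cross-checked with a back-of-the-envelope
model. This is only a sketch: it assumes constant arrival and service
rates and no other softirq work on the CPU, and the variable names are
made up for illustration. The inputs are the figures quoted above.

```python
# Rough model of the backlog growth. Inputs are the figures quoted in
# this message; the constant-rate model itself is an assumption.

TIMER_PERIOD_S = 750e-6      # one tasklet scheduled per hrtimer fire
BODY_BASELINE_S = 500e-6     # tasklet body without the storm
BODY_STORM_S = 1.1e-3        # approximate body time under the IPI storm
POOL_SIZE = 20_000           # distinct tasklets in the pool


def backlog_growth_per_s(period_s, body_s):
    """Net queue growth in tasklets/s; 0 if service keeps up with arrival."""
    arrival = 1.0 / period_s
    service = 1.0 / body_s
    return max(0.0, arrival - service)


baseline = backlog_growth_per_s(TIMER_PERIOD_S, BODY_BASELINE_S)
storm = backlog_growth_per_s(TIMER_PERIOD_S, BODY_STORM_S)
fill_time_s = POOL_SIZE / storm          # time until ~20k tasklets are queued
drain_time_s = POOL_SIZE * BODY_STORM_S  # one serial pass over that snapshot

print(f"baseline growth: {baseline:.0f}/s")
print(f"storm growth:    {storm:.0f}/s")
print(f"fill time:       {fill_time_s:.0f}s")
print(f"drain time:      {drain_time_s:.0f}s")
```

With these inputs the model gives no growth at baseline, roughly 424/s
under the storm, about 47s to queue 20k tasklets and about 22s to drain
them serially, which is in line with the ~430/s, ~46s and ~22s observed.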

Six soft-lockups in a 120s run:

  [61125.38] BUG: soft lockup - CPU#95 stuck for 22s! [kworker/95:0]
  [61145.38] BUG: soft lockup - CPU#47 stuck for 45s! [migration/47]
  [61173.38] BUG: soft lockup - CPU#47 stuck for 71s! [migration/47]
  [61197.38] BUG: soft lockup - CPU#95 stuck for 22s! [migration/95]
  [61209.38] BUG: soft lockup - CPU#47 stuck for 21s! [kworker/47:1]
  [61225.38] BUG: soft lockup - CPU#95 stuck for 48s! [migration/95]
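
For a quick per-CPU summary of reports like these, something along the
lines of the sketch below works. The regex and the summarize() helper
are illustrative and only assume the line format shown above.

```python
import re
from collections import defaultdict

# The six watchdog reports quoted above, copied verbatim.
LOG = """\
[61125.38] BUG: soft lockup - CPU#95 stuck for 22s! [kworker/95:0]
[61145.38] BUG: soft lockup - CPU#47 stuck for 45s! [migration/47]
[61173.38] BUG: soft lockup - CPU#47 stuck for 71s! [migration/47]
[61197.38] BUG: soft lockup - CPU#95 stuck for 22s! [migration/95]
[61209.38] BUG: soft lockup - CPU#47 stuck for 21s! [kworker/47:1]
[61225.38] BUG: soft lockup - CPU#95 stuck for 48s! [migration/95]
"""

# Matches one soft-lockup report line in the format shown above.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<comm>[^\]]+)\]"
)


def summarize(log):
    """Return {cpu: (report_count, longest_stuck_seconds)}."""
    acc = defaultdict(lambda: [0, 0])
    for m in PATTERN.finditer(log):
        entry = acc[int(m["cpu"])]
        entry[0] += 1
        entry[1] = max(entry[1], int(m["secs"]))
    return {cpu: tuple(v) for cpu, v in acc.items()}


summary = summarize(LOG)
for cpu in sorted(summary):
    count, longest = summary[cpu]
    print(f"CPU {cpu}: {count} reports, longest {longest}s")
```

All six reports land on CPUs 47 and 95, three each, with the longest
stall at 71s on CPU 47 and 48s on CPU 95.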

Stack at fire:

  rt_storm_wedge_fn+0x22/0xe0
  tasklet_action_common+0x100/0x2b0
  handle_softirqs+0xbe/0x280
  __irq_exit_rcu+0x47/0x100
  sysvec_apic_timer_interrupt+0x3a/0x80     <- watchdog hrtimer
  asm_sysvec_apic_timer_interrupt+0x16/0x20
  RIP: 0033:0x...     <- user task (rt_storm_hog)

Trace captured with your event list plus IPI:

  -e sched_switch -e sched_waking -e irq -e workqueue -e ipi
  -e irq_vectors:call_function_single_entry/exit
  -e irq_vectors:irq_work_entry/exit
  -e irq_vectors:reschedule_entry/exit
  -e irq_vectors:local_timer_entry/exit

Sliced to a 17s window around the first RCU stall + first
soft-lockup, filtered to CPUs 47 and 95, gzipped text (~11MB):

  https://drive.google.com/file/d/11AN6dyvOWiZLVNEEuVtQieRyAxJYbCbt/view?usp=sharing

Thanks.

--
tejun
