From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
stable-rt <stable-rt@vger.kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Davidlohr Bueso <dave@stgolabs.net>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
Thomas Gleixner <tglx@linutronix.de>,
Florian Bezdeka <florian.bezdeka@siemens.com>,
Pavel Machek <pavel@denx.de>
Subject: Re: [PATCH v3 3/3] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
Date: Mon, 1 Dec 2025 20:49:20 -0300
Message-ID: <aS4pgDjn1b8coe0r@redhat.com>
In-Reply-To: <0d66a966-0b89-416a-8712-6a6131af355e@siemens.com>
On Mon, Dec 01, 2025 at 10:51:50PM +0100, Jan Kiszka wrote:
> On 06.11.24 15:51, Sebastian Andrzej Siewior wrote:
> > A timer/hrtimer softirq is raised in-IRQ context. With threaded
> > interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd
> > for the processing of the softirq. ksoftirqd runs as SCHED_OTHER which
> > means it will compete with other tasks for CPU resources.
> > This can introduce long delays for timer processing on heavily loaded
> > systems and is not desired.
> >
> > Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
> > timers thread and let it run at the lowest SCHED_FIFO priority.
> > Wake-ups for RT tasks happen from hardirq context so only timer_list timers
> > and hrtimers for "regular" tasks are processed here. The higher priority
> > ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
> >
> > Using a dedicated variable to store the pending softirq bits
> > ensures that the timers are not accidentally picked up by ksoftirqd and
> > other threaded interrupts.
> > It shouldn't be picked up by ksoftirqd since it runs at lower priority.
> > However if ksoftirqd is already running while a timer fires, then
> > ksoftirqd will be PI-boosted due to the BH-lock to ktimer's priority.
> > Ideally we try to avoid having ksoftirqd running.
> >
> > The timer thread can pick up pending softirqs from ksoftirqd but only
> > if the softirq load is high. It is not desired that the picked up
> > softirqs are processed at SCHED_FIFO priority under high softirq load
> > but this can already happen by a PI-boost by a force-threaded interrupt.
> >
> > [ frederic@kernel.org: rcutorture.c fixes, storm fix by introduction of
> > local_timers_pending() for tick_nohz_next_event() ]
> >
> > [ junxiao.chang@intel.com: Ensure ktimersd gets woken up even if a
> > softirq is currently served. ]
> >
> > Reviewed-by: Paul E. McKenney <paulmck@kernel.org> [rcutorture]
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> This went into 6.13 and was never backported to 6.12-lts. And that is
> why you can easily stall the latter with a workload like this and
> CONFIG_PREEMPT_RT enabled:
>
> echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control
> echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
>
> mkdir /sys/fs/cgroup/stalltest.sub1
> mkdir /sys/fs/cgroup/stalltest.sub2
> sleep 10000000 &
> pid=$!
>
> systemd-run --slice "stalltest.slice" taskset -c 0 sh -c " \
> while true; do
> echo $pid > /sys/fs/cgroup/stalltest.sub1/cgroup.procs;
> echo $pid > /sys/fs/cgroup/stalltest.sub2/cgroup.procs;
> done"
>
> echo "1000 20000" > /sys/fs/cgroup/stalltest.slice/cpu.max
>
> This triggers a lock-up when a SCHED_OTHER holder of cgroup_file_kn_lock
> is scheduled out after using up its timeslice and
> cgroup_file_notify_timer then fires in a SCHED_OTHER context as well.
> The timer callback tries to take this lock, fails, and the lock holder
> is never reactivated.
>
> I've nicely reproduced this with upstream 6.12.58 while Debian's latest
> 6.12-rt does not trigger because it additionally has the downstream -rt
> patches on board.
>
> How should we handle this? Consider 6.12 mainline with -rt and cgroups
> as potentially broken, asking people to use 6.12-rt? Or port this back?
>
> BTW, the original report of this issue came from an older
> 5.10.194-cip39-rt16 kernel (based on rt94 for 5.10). When was this
> feature introduced to the -rt patches? Was it ever backported to 5.10-rt
> or other -rt versions?
Hi Jan!

I failed to locate the original discussion (from v5.10-rt), as the v1 of this
patchset starts a new thread. Anyway, you are correct: the commit below and the
other two changes from the series are not present in v5.10-rt.

AFAICT commit 49a17639508c ("softirq: Use a dedicated thread for timer wakeups
on PREEMPT_RT.") was first merged in v6.13-rc1; it was never exclusive to the
RT tree.

Luis
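
For anyone who wants to confirm locally which releases contain a given
commit, here is a minimal sketch using git merge-base --is-ancestor. The
helper names commit_in_ref and report_commit are made up for illustration,
and the sketch assumes a kernel clone that already has the relevant tags
fetched. Note that it only tests ancestry of the exact hash: a stable
backport gets a new hash, so a negative result alone does not prove that a
change was never backported.

```shell
#!/bin/sh
# commit_in_ref: succeed if <commit> is an ancestor of <ref>.
# It also fails if either name is unknown to the repository.
# Usage: commit_in_ref <commit> <ref>
commit_in_ref() {
    git merge-base --is-ancestor "$1" "$2"
}

# report_commit: print a one-line verdict for a commit/ref pair.
# (Hypothetical helper, not part of any kernel tooling.)
report_commit() {
    if commit_in_ref "$1" "$2"; then
        echo "$1 is contained in $2"
    else
        echo "$1 is NOT contained in $2"
    fi
}
```

In a kernel tree this could be run as "report_commit 49a17639508c v6.13-rc1".
To look for a backport under a different hash, searching the stable branch
log for the upstream commit ID (git log --grep=49a17639508c) is a common
complement, since stable backports usually quote it.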
> Jan
>
> --
> Siemens AG, Foundational Technologies
> Linux Expert Center
>
---end quoted text---
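
A quick way to see whether a running kernel has the dedicated timer thread
described in the quoted commit message is to look at the per-CPU kernel
threads and their scheduling class. The following is a rough sketch, not a
definitive test. It assumes the threads show up as ktimers/<cpu> (the
message above only says "ktimersd" and "ktimer", so the exact name is an
assumption) and that ps reports SCHED_FIFO as "FF". Both function names are
made up for illustration.

```shell
#!/bin/sh
# list_timer_threads: filter "CLS RTPRIO COMM" lines, as printed by
# `ps -eo cls,rtprio,comm`, down to per-CPU timer and softirq threads.
# Assumption: the threads are named ktimers/<cpu> and ksoftirqd/<cpu>.
list_timer_threads() {
    awk '$3 ~ /^(ktimers|ksoftirqd)\// { print $3, $1, $2 }'
}

# has_fifo_timer_thread: succeed only if at least one ktimers/<cpu>
# line is present on stdin and its class column reads "FF" (SCHED_FIFO).
has_fifo_timer_thread() {
    awk '$3 ~ /^ktimers\// && $1 == "FF" { found = 1 } END { exit !found }'
}
```

Typical use would be "ps -eo cls,rtprio,comm | list_timer_threads", or
"ps -eo cls,rtprio,comm | has_fifo_timer_thread && echo present". On a
kernel without the change one would expect only ksoftirqd/<cpu> entries.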