linux-rt-devel.lists.linux.dev archive mirror
From: Steven Rostedt <rostedt@goodmis.org>
To: "Šindelář, Jindřich" <JindrichSindelar@eaton.com>
Cc: "linux-rt-devel@lists.linux.dev" <linux-rt-devel@lists.linux.dev>,
	"bigeasy@linutronix.de" <bigeasy@linutronix.de>,
	"kprateek.nayak@amd.com" <kprateek.nayak@amd.com>,
	"ryotkkr98@gmail.com" <ryotkkr98@gmail.com>
Subject: Re: softirqs causing high IRQ jitter
Date: Wed, 2 Jul 2025 13:09:09 -0400
Message-ID: <20250702130909.0bc4702d@batman.local.home>
In-Reply-To: <SA1PR17MB5553039BCE5C4C1A68CEFB87D740A@SA1PR17MB5553.namprd17.prod.outlook.com>

On Wed, 2 Jul 2025 14:51:37 +0000
"Šindelář, Jindřich" <JindrichSindelar@eaton.com> wrote:

> Hello,
> 
> This is my first post here, hope everything is right with it :)

Welcome!

> 
> Our team is observing high jitter in certain IRQ handlers on a PREEMPT_RT
> kernel, and we have pinned the issue to softirqs. However, we're not sure
> how to address the problem. I'll describe the situation first, sum up how
> I understand things, and ask questions at the end.
> 
> We have an embedded system with an NXP i.MX6 single-core SoC, and our
> kernel is based on linux-stable-rt v5.15.163-rt78. Our modifications are
> rather small and don't touch anything softirq-related.
> 
> All of our IRQ threads are running at the default priority of 50
> (SCHED_FIFO). There are also several user-space application SCHED_FIFO
> threads - some above, some below priority 50. We also noticed that the
> application is raising the ksoftirqd priority to 60 (SCHED_FIFO), based on
> recommendations from an external supplier - the reason being the ability
> to handle TIMER and HRTIMER softirqs with minimal latency and jitter.
> 
> We observe that a UART "Tx complete" IRQ occasionally comes late, causing
> unacceptable jitter in the communication. There are actually more threads
> involved (SDMA IRQ, its tasklet, and UART "Tx complete" IRQ), and any of
> these can come late.
> 
> We did some experiments and debugging through GPIOs and found that the
> UART "Tx complete" IRQ comes late when a bunch of softirqs is processed
> in a row. We set a GPIO when entering the loop in
> handle_softirqs() and clear the GPIO when leaving it. The pulse we observe
> on that GPIO when it delays the UART IRQ can have a length of a few
> hundred microseconds.
> 
> Furthermore, we found that in v6.6.12-rt20, an additional thread has been
> introduced to primarily handle the timer-related softirqs
> (https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?id=7f49c1dd9ab0c9f23c526f58241d6cc9d50f1778).
> We thought this might
> be helpful in our situation, as we could leave the ksoftirqd thread at
> prio 1 and set the timer thread prio to 60, as requested by the
> application. We backported the patch, and it did improve the situation,
> but it didn't resolve it completely. The IRQs we're interested in still
> come late, though less often. My understanding is that, despite the
> claim in the commit message, the timer thread is not really dedicated to
> timer softirqs only. It gets woken up when a timer softirq is raised, but
> it will also handle any other softirqs that are pending at that moment.

I think you may have found the main issue: the timer softirq thread
will pick up other softirqs that are pending. Yes, it is a separate
thread, and it wakes up to handle softirqs when the timer softirq needs
to be handled, but I don't see anything limiting it from handling other
softirqs.

This looks like it could be an enhancement: have the timer softirq
thread handle only timer softirqs. Unfortunately, I don't have the
time to implement this. Perhaps somebody else?
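
To make the distinction concrete, here is a small user-space Python model
(not kernel code) of a softirq pending bitmask. The enum mirrors the softirq
numbering in include/linux/interrupt.h; handle_pending(), TIMER_MASK and
ALL_MASK are hypothetical names made up for this sketch, and nothing here
reflects how the timer thread is actually implemented. It only shows the
difference between handling every pending bit and handling only the
TIMER/HRTIMER bits.

```python
from enum import IntEnum


class Softirq(IntEnum):
    # Same order as the softirq enum in include/linux/interrupt.h
    HI = 0
    TIMER = 1
    NET_TX = 2
    NET_RX = 3
    BLOCK = 4
    IRQ_POLL = 5
    TASKLET = 6
    SCHED = 7
    HRTIMER = 8
    RCU = 9


def bit(nr):
    """Bit in the pending mask that corresponds to softirq number nr."""
    return 1 << nr


# Hypothetical masks, for illustration only.
ALL_MASK = (1 << len(Softirq)) - 1
TIMER_MASK = bit(Softirq.TIMER) | bit(Softirq.HRTIMER)


def handle_pending(pending, allowed_mask=ALL_MASK):
    """Model one pass over the pending softirqs.

    Returns the softirqs that were handled (lowest number first) and the
    bits that are still pending because the mask excluded them.
    """
    todo = pending & allowed_mask
    handled = [s for s in Softirq if todo & bit(s)]
    return handled, pending & ~todo


if __name__ == "__main__":
    # A timer softirq is raised while NET_RX and a tasklet are pending too.
    pending = bit(Softirq.TIMER) | bit(Softirq.NET_RX) | bit(Softirq.TASKLET)

    # Unrestricted: the woken thread handles everything that is pending.
    print(handle_pending(pending))

    # Restricted: only timer softirqs run; the rest stays pending.
    print(handle_pending(pending, TIMER_MASK))
```

In the restricted case, NET_RX and TASKLET remain pending in the model, so
in a real implementation they would be left for ksoftirqd or whichever
thread processes softirqs next, rather than running at the timer thread's
priority.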

-- Steve


> 
> Additionally, the pending softirqs can also be executed in the context of
> any threaded IRQ. So even if there was an isolation where the timer thread
> handles (hr)timers only, the other softirqs could still cause unacceptable jitter
> in IRQ handling. In my understanding, the whole thing gets even more
> complicated by priority inheritance - even if softirq processing gets
> started in the context of a ksoftirqd thread with priority 1, its priority
> may get PI-boosted when a higher priority thread runs into a lock held by
> ksoftirqd.
> 
> Please correct me if any of my understandings or observations mentioned
> above are wrong. Now, I'd like to ask these questions:
> 
> 1. In our view, not all of the softirqs have the same priority: e.g.,
> (hr)timers and tasklets seem important, but things like NET_RX, NET_TX, or
> BLOCK less so. Would it be a bad idea to try to introduce a more selective
> approach, where a certain type of softirqs can only be executed with a
> defined maximum priority? This could make more sense if we introduced some
> IRQ priority partitioning instead of leaving all IRQ threads at prio 50
> (we're considering this step as well).
> 
> 2. Several of our peripherals use DMA (our SoC actually has 2 different DMA
> blocks), and we noticed that the dmaengine is using tasklets to execute
> the DMA callbacks. This means the callbacks can be executed in the context
> of an arbitrary thread (and thus arbitrary priority). It feels quite
> strange to me, especially because the DMA callbacks can further
> "cooperate" with other IRQs (such as enabling the "Tx complete" IRQ of our
> UART). I understood that tasklets are deprecated and it's recommended to
> use BH workqueues instead. Is there a reason why we shouldn't modify the
> dmaengine (which I see as a very important component in the system) to
> make it use BH workqueues instead of tasklets?
> 
> 3. Do you have any other recommendations on how we should configure and
> balance our system? Setting different priorities to individual IRQ threads
> based on how critical we see them looks quite straightforward, but the
> fact that pending softirqs can be executed in the context of an arbitrary
> IRQ thread still makes it look nondeterministic.
> 
> Best regards
> Jindra

