* softirqs causing high IRQ jitter
@ 2025-07-02 14:51 Šindelář, Jindřich
2025-07-02 17:09 ` Steven Rostedt
0 siblings, 1 reply; 8+ messages in thread
From: Šindelář, Jindřich @ 2025-07-02 14:51 UTC (permalink / raw)
To: linux-rt-devel@lists.linux.dev
Cc: bigeasy@linutronix.de, kprateek.nayak@amd.com,
ryotkkr98@gmail.com
Hello,
This is my first post here, hope everything is right with it :)
Our team is observing high jitter in certain IRQ handlers on a PREEMPT_RT
kernel, and we have pinned the issue to softirqs. However, we're not sure
how to address the problem. I'll describe the situation first, sum up how
I understand things, and ask questions at the end.
We have an embedded system with the NXP i.MX6 single-core SoC, and our
kernel is based on linux-stable-rt v5.15.163-rt78. Our modifications are
rather small and don't touch anything softirq-related.
All of our IRQ threads are running at the default priority of 50
(SCHED_FIFO). There are also several user-space application SCHED_FIFO
threads - some above, some below the priority 50. We also noticed that the
application is raising the ksoftirqd priority to 60 (SCHED_FIFO), based on
recommendations from an external supplier - the reason being the ability
to handle TIMER and HRTIMER softirqs with minimal latency and jitter.
We observe that a UART "Tx complete" IRQ occasionally comes late, causing
unacceptable jitter in the communication. There are actually more threads
involved (SDMA IRQ, its tasklet, and UART "Tx complete" IRQ), and any of
these can come late.
We did some experiments and debugging through GPIOs and found that when
the UART "Tx complete" IRQ comes late, it is when there is a bunch of
softirqs processed in a row. We set a GPIO when entering the loop in
handle_softirqs() and clear the GPIO when leaving it. The pulse we observe
on that GPIO when it delays the UART IRQ can have a length of a few
hundred microseconds.
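Schematically, the instrumentation looks like this (debug-only kernel sketch, not runnable as-is; DEBUG_GPIO stands for a board-specific pin we request at boot, using the legacy gpio_set_value() API present in v5.15):

```c
/* kernel/softirq.c, debug instrumentation sketch */
static void handle_softirqs(bool ksirqd)
{
	/* ... */
	gpio_set_value(DEBUG_GPIO, 1);	/* entering the softirq loop */
	while ((softirq_bit = ffs(pending))) {
		/* ... existing loop body ... */
	}
	gpio_set_value(DEBUG_GPIO, 0);	/* leaving the loop */
	/* ... */
}
```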
Furthermore, we found that in v6.6.12-rt20, an additional thread has been
introduced to primarily handle the timer-related softirqs
(https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?id=7f49c1dd9ab0c9f23c526f58241d6cc9d50f1778).
We thought this might
be helpful in our situation, as we could leave the ksoftirqd thread at
prio 1 and set the timer thread prio to 60, as requested by the
application. We backported the patch, and it did improve the situation,
but it didn't resolve it completely. The IRQs we're interested in still
come late, but now less often. My understanding is that, despite the
claim in the commit message, the timer thread is not really dedicated to
timer softirqs only. It gets woken up when a timer softirq is raised, but
it will also handle any other softirqs that are pending at that moment.
Additionally, the pending softirqs can also be executed in the context of
any threaded IRQ. So even if there was an isolation where the timer thread
handles (hr)timers only, the other softirqs could still cause unacceptable jitter
in IRQ handling. In my understanding, the whole thing gets even more
complicated by priority inheritance - even if softirq processing gets
started in the context of a ksoftirqd thread with priority 1, its priority
may get PI-boosted when a higher priority thread runs into a lock held by
ksoftirqd.
Please correct me if any of my understandings or observations mentioned
above are wrong. Now, I'd like to ask these questions:
1. In our view, not all of the softirqs have the same priority: e.g.,
(hr)timers and tasklets seem important, but things like NET_RX, NET_TX, or
BLOCK less so. Would it be a bad idea to try to introduce a more selective
approach, where a certain type of softirqs can only be executed with a
defined maximum priority? This could make more sense if we introduced some
IRQ priority partitioning instead of leaving all IRQ threads at prio 50
(we're considering this step as well).
2. Several of our peripherals use a DMA (our SoC actually has 2 different DMA
blocks), and we noticed that the dmaengine is using tasklets to execute
the DMA callbacks. This means the callbacks can be executed in the context
of an arbitrary thread (and thus arbitrary priority). It feels quite
strange to me, especially because the DMA callbacks can further
"cooperate" with other IRQs (such as enabling the "Tx complete" IRQ of our
UART). I understood that tasklets are deprecated and it's recommended to
use BH workqueues instead. Is there a reason why we shouldn't modify the
dmaengine (which I see as a very important component in the system) to
make it use BH workqueues instead of tasklets?
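The conversion I have in mind would schematically look like this (a sketch only: BH workqueues and system_bh_wq exist since v6.9, and the chan/callback names are made up):

```c
/* Today (simplified): completion callbacks run from a tasklet. */
tasklet_setup(&chan->task, dma_complete_tasklet);
tasklet_schedule(&chan->task);

/* With BH workqueues (v6.9+): the same deferred execution, but as a
 * work item on system_bh_wq, which on PREEMPT_RT runs in thread
 * context whose priority can be managed like any other thread's. */
INIT_WORK(&chan->work, dma_complete_work);
queue_work(system_bh_wq, &chan->work);
```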
3. Do you have any other recommendations on how we should configure and
balance our system? Setting different priorities to individual IRQ threads
based on how critical we see them looks quite straightforward, but the
fact that pending softirqs can be executed in the context of an arbitrary
IRQ thread still makes it look nondeterministic.
Best regards
Jindra
________________________________
Eaton Elektrotechnika s.r.o. ~ Sídlo společnosti, jak je zapsáno v rejstříku: Komárovská 2406, Praha 9 - Horní Počernice, 193 00, Česká Republika ~ Jméno, místo, kde byla společnost zaregistrována: Praha ~ Identifikační číslo (IČO): 498 11 894
________________________________
* Re: softirqs causing high IRQ jitter
2025-07-02 14:51 softirqs causing high IRQ jitter Šindelář, Jindřich
@ 2025-07-02 17:09 ` Steven Rostedt
2025-07-06 12:01 ` Ryo Takakura
From: Steven Rostedt @ 2025-07-02 17:09 UTC (permalink / raw)
To: Šindelář, Jindřich
Cc: linux-rt-devel@lists.linux.dev, bigeasy@linutronix.de,
kprateek.nayak@amd.com, ryotkkr98@gmail.com
On Wed, 2 Jul 2025 14:51:37 +0000
"Šindelář, Jindřich" <JindrichSindelar@eaton.com> wrote:
> Hello,
>
> This is my first post here, hope everything is right with it :)
Welcome!
>
> Our team is observing high jitter in certain IRQ handlers on a PREEMPT_RT
> kernel, and we have pinned the issue to softirqs. However, we're not sure
> how to address the problem. I'll describe the situation first, sum up how
> I understand things, and ask questions at the end.
>
> We have an embedded system with the NXP i.MX6 single-core SoC, and our
> kernel is based on linux-stable-rt v5.15.163-rt78. Our modifications are
> rather small and don't touch anything softirq-related.
>
> All of our IRQ threads are running at the default priority of 50
> (SCHED_FIFO). There are also several user-space application SCHED_FIFO
> threads - some above, some below the priority 50. We also noticed that the
> application is raising the ksoftirqd priority to 60 (SCHED_FIFO), based on
> recommendations from an external supplier - the reason being the ability
> to handle TIMER and HRTIMER softirqs with minimal latency and jitter.
>
> We observe that a UART "Tx complete" IRQ occasionally comes late, causing
> unacceptable jitter in the communication. There are actually more threads
> involved (SDMA IRQ, its tasklet, and UART "Tx complete" IRQ), and any of
> these can come late.
>
> We did some experiments and debugging through GPIOs and found that when
> the UART "Tx complete" IRQ comes late, it is when there is a bunch of
> softirqs processed in a row. We set a GPIO when entering the loop in
> handle_softirqs() and clear the GPIO when leaving it. The pulse we observe
> on that GPIO when it delays the UART IRQ can have a length of a few
> hundred microseconds.
>
> Furthermore, we found that in v6.6.12-rt20, an additional thread has been
> introduced to primarily handle the timer-related softirqs
> (https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?id=7f49c1dd9ab0c9f23c526f58241d6cc9d50f1778).
> We thought this might
> be helpful in our situation, as we could leave the ksoftirqd thread at
> prio 1 and set the timer thread prio to 60, as requested by the
> application. We backported the patch, and it did improve the situation,
> but it didn't resolve it completely. The IRQs we're interested in still
> come late, but it's now more seldom. My understanding is that, despite the
> claim in the commit message, the timer thread is not really dedicated to
> timer softirqs only. It gets woken up when a timer softirq is raised, but
> it will also handle any other softirqs that are pending at that moment.
I think you may have found the main issue. That is the timer softirqd
will pick up other softirqs that are pending. Yes it is a separate
thread and will wake up to handle softirqs when the timer softirq needs
to be handled but I don't see anything limiting it from handling other
softirqs.
This looks like it could be an enhancement to have the timer softirq
only handle timer softirqs. But unfortunately, I don't have the time to
implement this. Perhaps somebody else?
-- Steve
>
> Additionally, the pending softirqs can also be executed in the context of
> any threaded IRQ. So even if there was an isolation where the timer thread
> handles (hr)timers only, the other softirqs could still cause unacceptable jitter
> in IRQ handling. In my understanding, the whole thing gets even more
> complicated by priority inheritance - even if softirq processing gets
> started in the context of a ksoftirqd thread with priority 1, its priority
> may get PI-boosted when a higher priority thread runs into a lock held by
> ksoftirqd.
>
> Please correct me if any of my understandings or observations mentioned
> above are wrong. Now, I'd like to ask these questions:
>
> 1. In our view, not all of the softirqs have the same priority: e.g.,
> (hr)timers and tasklets seem important, but things like NET_RX, NET_TX, or
> BLOCK less so. Would it be a bad idea to try to introduce a more selective
> approach, where a certain type of softirqs can only be executed with a
> defined maximum priority? This could make more sense if we introduced some
> IRQ priority partitioning instead of leaving all IRQ threads at prio 50
> (we're considering this step as well).
>
> 2. Several of our peripherals use a DMA (our SoC actually has 2 different DMA
> blocks), and we noticed that the dmaengine is using tasklets to execute
> the DMA callbacks. This means the callbacks can be executed in the context
> of an arbitrary thread (and thus arbitrary priority). It feels quite
> strange to me, especially because the DMA callbacks can further
> "cooperate" with other IRQs (such as enabling the "Tx complete" IRQ of our
> UART). I understood that tasklets are deprecated and it's recommended to
> use BH workqueues instead. Is there a reason why we shouldn't modify the
> dmaengine (which I see as a very important component in the system) to
> make it use BH workqueues instead of tasklets?
>
> 3. Do you have any other recommendations on how we should configure and
> balance our system? Setting different priorities to individual IRQ threads
> based on how critical we see them looks quite straightforward, but the
> fact that pending softirqs can be executed in the context of an arbitrary
> IRQ thread still makes it look nondeterministic.
>
> Best regards
> Jindra
* Re: softirqs causing high IRQ jitter
2025-07-02 17:09 ` Steven Rostedt
@ 2025-07-06 12:01 ` Ryo Takakura
2025-07-07 13:05 ` Šindelář, Jindřich
From: Ryo Takakura @ 2025-07-06 12:01 UTC (permalink / raw)
To: rostedt, JindrichSindelar
Cc: bigeasy, kprateek.nayak, linux-rt-devel, ryotkkr98
Hi Jindra and Steven!
On Wed, 2 Jul 2025 13:09:09 -0400, Steven Rostedt wrote:
>On Wed, 2 Jul 2025 14:51:37 +0000
>"Šindelář, Jindřich" <JindrichSindelar@eaton.com> wrote:
>> Our team is observing high jitter in certain IRQ handlers on a PREEMPT_RT
>> kernel, and we have pinned the issue to softirqs. However, we're not sure
>> how to address the problem. I'll describe the situation first, sum up how
>> I understand things, and ask questions at the end.
>>
>> We have an embedded system with the NXP i.MX6 single-core SoC, and our
>> kernel is based on linux-stable-rt v5.15.163-rt78. Our modifications are
>> rather small and don't touch anything softirq-related.
>>
>> All of our IRQ threads are running at the default priority of 50
>> (SCHED_FIFO). There are also several user-space application SCHED_FIFO
>> threads - some above, some below the priority 50. We also noticed that the
>> application is raising the ksoftirqd priority to 60 (SCHED_FIFO), based on
>> recommendations from an external supplier - the reason being the ability
>> to handle TIMER and HRTIMER softirqs with minimal latency and jitter.
>>
>> We observe that a UART "Tx complete" IRQ occasionally comes late, causing
>> unacceptable jitter in the communication. There are actually more threads
>> involved (SDMA IRQ, its tasklet, and UART "Tx complete" IRQ), and any of
>> these can come late.
>>
>> We did some experiments and debugging through GPIOs and found that when
>> the UART "Tx complete" IRQ comes late, it is when there is a bunch of
>> softirqs processed in a row. We set a GPIO when entering the loop in
>> handle_softirqs() and clear the GPIO when leaving it. The pulse we observe
>> on that GPIO when it delays the UART IRQ can have a length of a few
>> hundred microseconds.
>>
>> Furthermore, we found that in v6.6.12-rt20, an additional thread has been
>> introduced to primarily handle the timer-related softirqs
>> (https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?id=7f49c1dd9ab0c9f23c526f58241d6cc9d50f1778).
>> We thought this might
>> be helpful in our situation, as we could leave the ksoftirqd thread at
>> prio 1 and set the timer thread prio to 60, as requested by the
>> application. We backported the patch, and it did improve the situation,
>> but it didn't resolve it completely. The IRQs we're interested in still
>> come late, but it's now more seldom. My understanding is that, despite the
>> claim in the commit message, the timer thread is not really dedicated to
>> timer softirqs only. It gets woken up when a timer softirq is raised, but
>> it will also handle any other softirqs that are pending at that moment.
>
>I think you may have found the main issue. That is the timer softirqd
>will pick up other softirqs that are pending. Yes it is a separate
>thread and will wake up to handle softirqs when the timer softirq needs
>to be handled but I don't see anything limiting it from handling other
>softirqs.
>
>This looks like it could be an enhancement to have the timer softirq
>only handle timer softirqs. But unfortunately, I don't have the time to
>implement this. Perhaps somebody else?
I tried to come up with an idea isolating (hr)timer softirq
from the rest. Would be nice to get a comment!
The idea is that we reorder the softirq vector and only handle
(hr)timer softirq and HI_SOFTIRQ in non-ksoftirqd thread context,
namely the timer threads and threaded IRQs' context, and leave
the rest of softirqs to ksoftirqd by checking @ksirqd.
This way, I believe the jitter Jindra describes below, caused
by softirqs being executed in threaded-IRQ context, can be
reduced as well.
----- BEGIN -----
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 51b6484c0493..d43e59db6efa 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -548,13 +548,18 @@ enum
{
HI_SOFTIRQ=0,
TIMER_SOFTIRQ,
+#ifdef CONFIG_PREEMPT_RT
+ HRTIMER_SOFTIRQ,
+#endif
NET_TX_SOFTIRQ,
NET_RX_SOFTIRQ,
BLOCK_SOFTIRQ,
IRQ_POLL_SOFTIRQ,
TASKLET_SOFTIRQ,
SCHED_SOFTIRQ,
+#ifndef CONFIG_PREEMPT_RT
HRTIMER_SOFTIRQ,
+#endif
RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */
NR_SOFTIRQS
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 513b1945987c..41df13c07aa2 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -571,6 +571,13 @@ static void handle_softirqs(bool ksirqd)
h += softirq_bit - 1;
vec_nr = h - softirq_vec;
+
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && vec_nr > HRTIMER_SOFTIRQ && !ksirqd) {
+ /* Restore the pending bitmask for ksoftirqd to handle */
+ or_softirq_pending(pending << (vec_nr - ffs(pending) + 1));
+ break;
+ }
+
prev_count = preempt_count();
kstat_incr_softirqs_this_cpu(vec_nr);
@@ -596,7 +603,7 @@ static void handle_softirqs(bool ksirqd)
pending = local_softirq_pending();
if (pending) {
if (time_before(jiffies, end) && !need_resched() &&
- --max_restart)
+ --max_restart && !IS_ENABLED(CONFIG_PREEMPT_RT))
goto restart;
wakeup_softirqd();
----- END -----
Sincerely,
Ryo Takakura
>>
>> Additionally, the pending softirqs can also be executed in the context of
>> any threaded IRQ. So even if there was an isolation where the timer thread
>> handles (hr)timers only, the other softirqs could still cause unacceptable jitter
>> in IRQ handling. In my understanding, the whole thing gets even more
>> complicated by priority inheritance - even if softirq processing gets
>> started in the context of a ksoftirqd thread with priority 1, its priority
>> may get PI-boosted when a higher priority thread runs into a lock held by
>> ksoftirqd.
>>
>> Please correct me if any of my understandings or observations mentioned
>> above are wrong. Now, I'd like to ask these questions:
>>
>> 1. In our view, not all of the softirqs have the same priority: e.g.,
>> (hr)timers and tasklets seem important, but things like NET_RX, NET_TX, or
>> BLOCK less so. Would it be a bad idea to try to introduce a more selective
>> approach, where a certain type of softirqs can only be executed with a
>> defined maximum priority? This could make more sense if we introduced some
>> IRQ priority partitioning instead of leaving all IRQ threads at prio 50
>> (we're considering this step as well).
>>
>> 2. Several of our peripherals use a DMA (our SoC actually has 2 different DMA
>> blocks), and we noticed that the dmaengine is using tasklets to execute
>> the DMA callbacks. This means the callbacks can be executed in the context
>> of an arbitrary thread (and thus arbitrary priority). It feels quite
>> strange to me, especially because the DMA callbacks can further
>> "cooperate" with other IRQs (such as enabling the "Tx complete" IRQ of our
>> UART). I understood that tasklets are deprecated and it's recommended to
>> use BH workqueues instead. Is there a reason why we shouldn't modify the
>> dmaengine (which I see as a very important component in the system) to
>> make it use BH workqueues instead of tasklets?
>>
>> 3. Do you have any other recommendations on how we should configure and
>> balance our system? Setting different priorities to individual IRQ threads
>> based on how critical we see them looks quite straightforward, but the
>> fact that pending softirqs can be executed in the context of an arbitrary
>> IRQ thread still makes it look nondeterministic.
* Re: softirqs causing high IRQ jitter
2025-07-06 12:01 ` Ryo Takakura
@ 2025-07-07 13:05 ` Šindelář, Jindřich
2025-07-08 14:36 ` Šindelář, Jindřich
2025-07-13 1:35 ` Ryo Takakura
From: Šindelář, Jindřich @ 2025-07-07 13:05 UTC (permalink / raw)
To: Ryo Takakura, rostedt@goodmis.org
Cc: bigeasy@linutronix.de, kprateek.nayak@amd.com,
linux-rt-devel@lists.linux.dev
Hello Steve, hello Ryo,
> I think you may have found the main issue. That is the timer softirqd
> will pick up other softirqs that are pending. Yes it is a separate
> thread and will wake up to handle softirqs when the timer softirq needs
> to be handled but I don't see anything limiting it from handling other
> softirqs.
Thank you for confirming this. I am now leaning more towards trying
even more separation, to achieve a state where, if handle_softirqs()
gets called in the context of an IRQ thread, it will only handle the
softirq raised by that thread, leaving the rest of pending softirqs
untouched.
I have an idea in mind, but before trying, I would like to understand
a bit more how the bitmask that holds the pending softirqs is protected
against races. There's this block of code in handle_softirqs():
// ...
pending = local_softirq_pending();
softirq_handle_begin();
in_hardirq = lockdep_softirq_start();
account_softirq_enter(current);
restart:
/* Reset the pending bitmask before enabling irqs */
set_softirq_pending(0);
local_irq_enable();
// ...
Here, local_softirq_pending() and set_softirq_pending() access the same
bitmask, and AFAICT, the code isn't surrounded by any lock or critical
section. It is true that (hard)IRQs are disabled at that point, but to
my knowledge, preemption is not. If all that is true, then a softirq
can be set pending in the meantime and can get "lost" by the
set_softirq_pending(0). I find it very unlikely that such a mistake
would be there, so I'm probably wrong here. Could anyone please tell me
what I'm missing?
> I tried to come up with an idea isolating (hr)timer softirq from the
> rest. Would be nice to get a comment!
Thank you for coming up with the suggestion. There will be people who
are better qualified to review this than me, but I will try to comment on
what I see.
- Your suggestion will not only modify the behavior when called in
the context of the timer thread; it will apply any time handle_softirqs()
is called outside of ksoftirqd (including in IRQ threads).
- IMHO it would be nicer to signify the fact that it's called from the
timer thread with a function argument, similar to the existing ksirqd one.
- I would avoid reordering the softirqs; instead of re-raising
them in the loop, I would suggest only clearing the timer-related
bits at the place where set_softirq_pending(0) currently sits.
- Your restart logic would mean a restart never happens with PREEMPT_RT,
including when already running in ksoftirqd. I'm not sure if
that's desired.
Overall, as written at the beginning, I'm now at a point where I'm not
sure if the isolation of the timer softirqs will solve the whole
problem we have - it may still happen that multiple pending softirqs
will be handled in the context of an arbitrary IRQ thread. And trying
to completely avoid the execution in that context (leaving it only to a
low-prio ksoftirqd) doesn't look like a good idea either, as ksoftirqd
could get starved out on heavily loaded systems.
Best regards
Jindra
* Re: softirqs causing high IRQ jitter
2025-07-07 13:05 ` Šindelář, Jindřich
@ 2025-07-08 14:36 ` Šindelář, Jindřich
2025-07-08 15:17 ` Šindelář, Jindřich
2025-07-13 1:40 ` Ryo Takakura
2025-07-13 1:35 ` Ryo Takakura
From: Šindelář, Jindřich @ 2025-07-08 14:36 UTC (permalink / raw)
To: Ryo Takakura, rostedt@goodmis.org
Cc: bigeasy@linutronix.de, kprateek.nayak@amd.com,
linux-rt-devel@lists.linux.dev
Hello,
an update from my side on this:
> Thank you for confirming this. I am now leaning more towards trying
> even more separation, to achieve a state where, if handle_softirqs()
> gets called in a context of an IRQ thread, it will only handle the
> softirq raised by that thread, leaving the rest of pending softirqs
> untouched.
My way of thinking about this separation (either for the timers only or
per IRQ thread) was based on the idea that inside handle_softirqs(),
I'll only clear some bits of the per-CPU pending softirq bitmask and
leave the rest pending, i.e. modify the code mentioned above to
something like this:
// ...
__u32 softirq_mask = 0xFFFFFFFF;
if (in_timer_thread) {
/* when running in the timer thread, only handle the (hr)timer softirqs */
softirq_mask = (1UL << TIMER_SOFTIRQ) | (1UL << HRTIMER_SOFTIRQ);
}
pending = local_softirq_pending();
softirq_handle_begin();
in_hardirq = lockdep_softirq_start();
account_softirq_enter(current);
restart:
/* Reset the softirqs we want to handle in this run, keep the rest pending */
set_softirq_pending(pending & ~softirq_mask);
pending &= softirq_mask;
local_irq_enable();
// ...
The issue here is once the IRQs get enabled, the rest of the code using
local_softirq_pending() to check what's pending can immediately start
handling softirqs again and again...
I thought I could do these changes in a local and relatively non-invasive
way, but now I see the use of local_softirq_pending() is spread over
multiple places, also outside of softirq.c (namely in smp.c and
tick-sched.c). It appears to me that the approach of "let's only handle
some softirqs and leave the rest pending for now" may be very difficult
to achieve when the single pending-softirq bitmask is checked in a number
of different places and it's assumed that whenever it's non-zero, it
should be handled at the first opportunity.
If anyone can suggest a different way, I'll be happy to hear it, but
I don't see any reasonable approach myself. I think I'll focus on
different ways to reduce the number of softirqs occurring, maybe
trying to use something like BH workqueues for some of them (NET_RX and
NET_TX come to mind).
PS: Sorry for the long gibberish line at the end, it's automatically
added by my company's mail server...
Regards
Jindra
________________________________
Eaton Elektrotechnika s.r.o. ~ Sídlo společnosti, jak je zapsáno v rejstříku: Komárovská 2406, Praha 9 - Horní Počernice, 193 00, Česká Republika ~ Jméno, místo, kde byla společnost zaregistrována: Praha ~ Identifikační číslo (IČO): 498 11 894
________________________________
* Re: softirqs causing high IRQ jitter
2025-07-08 14:36 ` Šindelář, Jindřich
@ 2025-07-08 15:17 ` Šindelář, Jindřich
2025-07-13 1:40 ` Ryo Takakura
From: Šindelář, Jindřich @ 2025-07-08 15:17 UTC (permalink / raw)
To: Ryo Takakura, rostedt@goodmis.org
Cc: bigeasy@linutronix.de, kprateek.nayak@amd.com,
linux-rt-devel@lists.linux.dev
> I think I'll focus on different ways how to reduce the number of
> softirqs occurring, maybe trying to use something like BH workqueues
> for some of them (NET_RX and NET_TX come to mind).
Correcting myself before someone else does: I meant threaded
workqueues, and I now realize that touching NET_RX and NET_TX would
be a bad idea. If anyone has suggestions on what I could
move from softirqs to a more manageable mechanism for deferred work, I
will be glad to read them :)
Best regards
Jindra
* Re: softirqs causing high IRQ jitter
2025-07-07 13:05 ` Šindelář, Jindřich
2025-07-08 14:36 ` Šindelář, Jindřich
@ 2025-07-13 1:35 ` Ryo Takakura
From: Ryo Takakura @ 2025-07-13 1:35 UTC (permalink / raw)
To: jindrichsindelar
Cc: bigeasy, kprateek.nayak, linux-rt-devel, rostedt, ryotkkr98
On Mon, 7 Jul 2025 13:05:40 +0000, Šindelář, Jindřich wrote:
>On Wed, 2 Jul 2025 13:09:09 -0400, Steven Rostedt wrote:
>> I think you may have found the main issue. That is the timer softirqd
>> will pick up other softirqs that are pending. Yes it is a separate
>> thread and will wake up to handle softirqs when the timer softirq needs
>> to be handled but I don't see anything limiting it from handling other
>> softirqs.
>
>Thank you for confirming this. I am now leaning more towards trying
>even more separation, to achieve a state where, if handle_softirqs()
>gets called in a context of an IRQ thread, it will only handle the
>softirq raised by that thread, leaving the rest of pending softirqs
>untouched.
>I have an idea in mind, but before trying, I would like to understand
>a bit more how the bitmask that holds the pending softirqs is protected
>against races. There's this block of code in the handle_softirqs():
>
> // ...
> pending = local_softirq_pending();
>
> softirq_handle_begin();
> in_hardirq = lockdep_softirq_start();
> account_softirq_enter(current);
>
> restart:
> /* Reset the pending bitmask before enabling irqs */
> set_softirq_pending(0);
>
> local_irq_enable();
> // ...
>
>Here, local_softirq_pending() and set_softirq_pending() access the same
>bitmask, and AFAICT, the code isn't surrounded by any lock or critical
>section. It is true that (hard)IRQs are disabled at that point, but to
>my knowledge, preemption is not. If all that is true, then a softirq
>can be set pending in the meantime and can get "lost" by the
>set_softirq_pending(0). I find it very unlikely that such a mistake
>would be there, so I'm probably wrong here. Could anyone please tell me
>what am I missing?
I'm afraid I might be wrong, but my understanding is that no
scheduling can happen while interrupts (including timer interrupts)
are disabled, unless we explicitly call for it. And the path in
question has no code that could trigger scheduling, so it should be
safe. I believe it's documented here[0].
>> I tried to come up with an idea isolating (hr)timer softirq from the
>> rest. Would be nice to get a comment!
>
>Thank you for comming up with the suggestion. There will be people who
>are more entitled to review this than me, but I will try to comment on
>what I see.
> - Your suggestion will not only modify the behavior when called in
> the context of the timer thread, it will apply anytime it's called
> outside of the ksoftirqd (including IRQ threads).
> - IMHO it would be nicer to signify the fact it's called from the
> timer thread with a function argument, similarly to the ksirqd one
> - I would avoid reordering the softirqs and instead of re-raising
> them in the loop, I would suggest to only clear the timer related
> bits in the place where is currently the set_softirq_pending(0).
Thanks for the comment! It was my intention to include IRQ threads as well,
and I wasn't really thinking of distinguishing the timer thread from others.
If we were to signify that it's called from the timer thread, I think what
you suggested works better. I also agree it's better to clear the
timer-related bits to be executed rather than reorder the softirqs, as you say.
> - Your restart logic would mean restart never happens with PREEMPT_RT,
> including when already running in the ksoftirqd. I'm not sure if
> that's desired.
Oh yes, ksoftirqd had better be allowed to restart. I should have
given more thought to this... Thanks for pointing that out.
Sincerely,
Ryo Takakura
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/locking/preempt-locking.rst
>Overall, as written at the beginning, I'm now at a point where I'm not
>sure if the isolation of the timer softirqs will solve the whole
>problem we have - it may still happen that multiple pending softirqs
>will be handled in the context of an arbitrary IRQ thread. And trying
>to completely avoid the execution in that context (leaving it only to a
>low-prio ksoftirqd) doesn't look like a good idea either, as ksoftirqd
>could get starved out on heavily loaded systems.
>
>Best regards
>Jindra
* Re: softirqs causing high IRQ jitter
2025-07-08 14:36 ` Šindelář, Jindřich
2025-07-08 15:17 ` Šindelář, Jindřich
@ 2025-07-13 1:40 ` Ryo Takakura
From: Ryo Takakura @ 2025-07-13 1:40 UTC (permalink / raw)
To: jindrichsindelar
Cc: bigeasy, kprateek.nayak, linux-rt-devel, rostedt, ryotkkr98
On Tue, 8 Jul 2025 14:36:23 +0000, Šindelář, Jindřich wrote:
>an update from my side on this:
>> Thank you for confirming this. I am now leaning more towards trying
>> even more separation, to achieve a state where, if handle_softirqs()
>> gets called in a context of an IRQ thread, it will only handle the
>> softirq raised by that thread, leaving the rest of pending softirqs
>> untouched.
>
>My way of thinking about this separation (either only for the timers or
>for per-IRQ-thread) was based on the idea that inside handle_softirqs(),
>I'll only clear some bits of the per-CPU pending softirq bitmask and
>leave the rest pending, i.e. modify the code mentioned above to
>something like this:
>
> // ...
> __u32 softirq_mask = 0xFFFFFFFF;
>
> if (in_timer_thread) {
> /* when running in the timer thread, only handle the (hr)timer softirqs */
> softirq_mask = (1UL << TIMER_SOFTIRQ) | (1UL << HRTIMER_SOFTIRQ);
> }
>
> pending = local_softirq_pending();
>
> softirq_handle_begin();
> in_hardirq = lockdep_softirq_start();
> account_softirq_enter(current);
>
>restart:
> /* Reset the softirqs we want to handle in this run, keep the rest pending */
> set_softirq_pending(pending & ~softirq_mask);
> pending &= softirq_mask;
>
> local_irq_enable();
> // ...
>
>The issue here is once the IRQs get enabled, the rest of the code using
>local_softirq_pending() to check what's pending can immediately start
>handling softirqs again and again...
Maybe I'm missing something here, but my understanding is that pending bits
are expected to be set once IRQs get enabled.
I do think we need to try to avoid restarting for those non-timer softirqs
that won't be handled when in_timer_thread is set; something like below?
pending = local_softirq_pending();
if (pending) {
+ /* Don't restart for non-timer softirqs if in_timer_thread */
if (time_before(jiffies, end) && !need_resched() &&
- --max_restart)
+ --max_restart && (pending & softirq_mask))
goto restart;
wakeup_softirqd();
Sincerely,
Ryo Takakura
>I thought I could do these changes in a local and relatively non-invasive
>way, but now I see the use of local_softirq_pending() is spread over
>multiple places, also outside of softirq.c (namely in smp.c and
>tick-sched.c). It appears to me that the approach of "let's only handle
>some softirqs and leave the rest pending for now" may be very difficult
>to achieve when the single pending softirq bitmask is checked at number
>of different places and it's assumed that whenever it's non-zero, it
>should be handled on the first next occasion.
>
>If anyone would suggest a different way, I'll be happy to hear it, but
>I don't see any reasonable approach myself. I think I'll focus on
>different ways how to reduce the number of softirqs occurring, maybe
>trying to use something like BH workqueues for some of them (NET_RX and
>NET_TX come to mind).