public inbox for linux-rt-devel@lists.linux.dev
* Regression in performance when using PREEMPT_RT
@ 2025-12-26 16:02 Steven Rostedt
  2026-01-12 15:27 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2025-12-26 16:02 UTC (permalink / raw)
  To: LKML, linux-rt-devel; +Cc: Sebastian Andrzej Siewior, liangjlee

Hi Sebastian,

We have been doing some experiments running Android Pixel with a PREEMPT_RT
kernel, and we found a few unacceptable performance regressions. One was in
the block layer. In non-rt, the ufshcd interrupt would trigger the BLOCK
softirq on another CPU. In RT, it always triggered the softirq on the same
CPU as the interrupt. As the interrupt line always triggers on CPU 0, it
forces the BLOCK softirq to also always run on CPU 0, which is bad because
CPU 0 is a little core and the main work should be running on a big core.

In block/blk-mq.c:blk_mq_complete_need_ipi() there's this code:

	/*
	 * With force threaded interrupts enabled, raising softirq from an SMP
	 * function call will always result in waking the ksoftirqd thread.
	 * This is probably worse than completing the request on a different
	 * cache domain.
	 */
	if (force_irqthreads())
		return false;

When I saw "probably worse", I figured this was decided by analysis and
not by any real numbers. Was it?

When we commented out the above if statement so that it did not return
false, things sped up to almost non-rt speeds again.

The fio benchmark went from 76MB/s to 94MB/s (higher is better). It's still
not at the level of non-rt, but this was definitely one of the areas that
caused the regression.

Is that exit out of the function truly needed?

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Regression in performance when using PREEMPT_RT
  2025-12-26 16:02 Regression in performance when using PREEMPT_RT Steven Rostedt
@ 2026-01-12 15:27 ` Sebastian Andrzej Siewior
  2026-01-12 17:33   ` Steven Rostedt
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-12 15:27 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, linux-rt-devel, liangjlee

On 2025-12-26 11:02:49 [-0500], Steven Rostedt wrote:
> Hi Sebastian,
> 
> We have been doing some experiments running Android Pixel with a PREEMPT_RT
> kernel, and we found a few unacceptable performance regressions. One was in
> the block layer. In non-rt, the ufshcd interrupt would trigger the BLOCK
> softirq on another CPU. In RT, it always triggered the softirq on the same
> CPU as the interrupt. As the interrupt line always triggers on CPU 0, it
> forces the BLOCK softirq to also always run on CPU 0, which is bad because
> CPU 0 is a little core and the main work should be running on a big core.
> 
> In block/blk-mq.c:blk_mq_complete_need_ipi() there's this code:
> 
> 	/*
> 	 * With force threaded interrupts enabled, raising softirq from an SMP
> 	 * function call will always result in waking the ksoftirqd thread.
> 	 * This is probably worse than completing the request on a different
> 	 * cache domain.
> 	 */
> 	if (force_irqthreads())
> 		return false;
> 
> When I saw "probably worse", I figured this was decided by analysis and
> not by any real numbers. Was it?
> 
> When we commented out the above if statement so that it did not return
> false, things sped up to almost non-rt speeds again.
> 
> The fio benchmark went from 76MB/s to 94MB/s (higher is better). It's still
> not at the level of non-rt, but this was definitely one of the areas that
> caused the regression.
> 
> Is that exit out of the function truly needed?

If I remember correctly, it completes in the context of the
threaded handler. It only does the remote-IPI thingy if the queue is
assigned to a different CPU than the CPU where it completes the request.
In your case it seems that either the device has multiple queues
configured and just one interrupt, or the queue is configured to a
different CPU than the interrupt.

If you have multiple queues but just one interrupt, then the lack of
load distribution is unfortunate. If the queue has been moved to a
BIG CPU then I suggest moving the IRQ to a BIG CPU, too.

If you ignore the statement and allow the remote IPI to kick the softirq,
then the request ends up in ksoftirqd on the remote CPU, probably
accompanied by a warning. There it runs as SCHED_OTHER and competes for
CPU resources with any other task on that CPU, which is different from
softirq on !RT, where it only has to wait until other hardirqs complete.
The other thing is that if something "else" is busy, say a
threaded interrupt, then it will pick up this request (before ksoftirqd
had the chance). The result is that this handler now does the I/O at its
end instead of ksoftirqd. If the interrupt is important and has a higher
priority than the average MAX_RT_PRIO / 2, then this block I/O might
disrupt its schedule.

I think moving the interrupt to a BIG CPU (same as the block queue), if
possible, would be the easiest thing to do.
If the hardware restricts it, I would suggest that a dedicated
SCHED_FIFO thread for this duty would be better than the anonymous
catch-all ksoftirqd.

> Thanks,

Sebastian


* Re: Regression in performance when using PREEMPT_RT
  2026-01-12 15:27 ` Sebastian Andrzej Siewior
@ 2026-01-12 17:33   ` Steven Rostedt
       [not found]     ` <CAM5rmdez5fyEU-=MYxpPrg1Sr+VjbMi1tyek17uoJw_4gyMGhg@mail.gmail.com>
  2026-01-13  8:43     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 5+ messages in thread
From: Steven Rostedt @ 2026-01-12 17:33 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: LKML, linux-rt-devel, liangjlee

On Mon, 12 Jan 2026 16:27:02 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:


> If I remember correctly, it completes in the context of the
> threaded handler. It only does the remote-IPI thingy if the queue is
> assigned to a different CPU than the CPU where it completes the request.
> In your case it seems that either the device has multiple queues
> configured and just one interrupt, or the queue is configured to a
> different CPU than the interrupt.
> 
> If you have multiple queues but just one interrupt, then the lack of
> load distribution is unfortunate. If the queue has been moved to a
> BIG CPU then I suggest moving the IRQ to a BIG CPU, too.
> 
> If you ignore the statement and allow the remote IPI to kick the softirq,
> then the request ends up in ksoftirqd on the remote CPU, probably
> accompanied by a warning. There it runs as SCHED_OTHER and competes for

I'm not sure it triggers a warning. Jack, have you seen that when you
removed the force_irqthreads() check?

> CPU resources with any other task on that CPU, which is different from
> softirq on !RT, where it only has to wait until other hardirqs complete.
> The other thing is that if something "else" is busy, say a
> threaded interrupt, then it will pick up this request (before ksoftirqd
> had the chance). The result is that this handler now does the I/O at its
> end instead of ksoftirqd. If the interrupt is important and has a higher
> priority than the average MAX_RT_PRIO / 2, then this block I/O might
> disrupt its schedule.

I'm not sure what the effect of that would be.

> 
> I think moving the interrupt to a BIG CPU (same as the block queue), if
> possible, would be the easiest thing to do.

We have done that. It appears that the hardware simply picks the first CPU
it can use and *always* uses that. By setting it to a big core, we did get
some improvement, but the problem is still that the softirq only runs on
that CPU.

> If the hardware restricts it, I would suggest that a dedicated
> SCHED_FIFO thread for this duty would be better than the anonymous
> catch-all ksoftirqd.

I think the solution may be to give up on PREEMPT_RT if that's the case,
unless there's a non-PREEMPT_RT reason to make that change.

To give you an idea of what the issue is, here is the distribution of the
block softirq (even though the irq itself always comes in on a single
CPU).

 cat /proc/softirqs | grep -i block

non-rt:
	BLOCK:          6          0          0          0          1          7     162986     164750

RT unmodified:
	BLOCK:     329875          0          0          0          0          0          0          0

RT without the force_irqthreads() check:
	BLOCK:          0          0          0          0         11         15     164116     163619

-- Steve



* Re: Regression in performance when using PREEMPT_RT
       [not found]     ` <CAM5rmdez5fyEU-=MYxpPrg1Sr+VjbMi1tyek17uoJw_4gyMGhg@mail.gmail.com>
@ 2026-01-13  7:44       ` Jack Lee
  0 siblings, 0 replies; 5+ messages in thread
From: Jack Lee @ 2026-01-13  7:44 UTC (permalink / raw)
  To: rostedt; +Cc: bigeasy, linux-rt-devel, linux-kernel

I resent the email because I didn't switch to plain text mode the
first time, and LKML rejected it. Sorry for the inconvenience.

Hi Steven and Sebastian,

> If you ignore the statement and allow the remote IPI to kick the softirq,
> then the request ends up in ksoftirqd on the remote CPU, probably
> accompanied by a warning.

> I'm not sure it triggers a warning. Jack, have you seen that when you
> removed the force_irqthreads() check?

Yes, the two warnings below appeared after I removed the check. [1]

[ 6643.468395] WARNING: CPU: 7 PID: 0 at kernel/softirq.c:317 do_softirq_post_smp_call_flush+0x60/0x70
[ 6643.468594] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Tainted: G        W  OE      6.12.18-android16-0-maybe-dirty-4k #1
[ 6643.468624] Call trace:
[ 6643.468625] do_softirq_post_smp_call_flush+0x60/0x70
[ 6643.468626] flush_smp_call_function_queue+0x5c/0x78
[ 6643.468629] do_idle+0x210/0x25c
[ 6643.468632] cpu_startup_entry+0x34/0x3c
[ 6643.468634] secondary_start_kernel+0x130/0x150
[ 6643.468637] __secondary_switched+0xc0/0xc4

[ 6644.917108] WARNING: CPU: 0 PID: 340 at kernel/irq/handle.c:162 __handle_irq_event_percpu+0x15c/0x25c
[ 6644.917753] CPU: 0 UID: 0 PID: 340 Comm: irq/301-exynos- Tainted: G        W  OE      6.12.18-android16-0-maybe-dirty-4k #1
[ 6644.917858] Call trace:
[ 6644.917859] __handle_irq_event_percpu+0x15c/0x25c
[ 6644.917865] handle_irq_event+0x64/0xcc
[ 6644.917871] handle_edge_irq+0x130/0x304
[ 6644.917878] generic_handle_domain_irq+0x58/0x80
[ 6644.917884] dw_handle_msi_irq+0xc8/0xdc
[ 6644.917894] exynos_pcie_rc_irq_handler+0xa4/0x2c8 [pcie_exynos_gs f050c3a198650ac29c8e84e424e142925038c66a]
[ 6644.917935] irq_forced_thread_fn+0x48/0xac
[ 6644.917942] irq_thread+0x158/0x2b8
[ 6644.917948] kthread+0x11c/0x1b0
[ 6644.917956] ret_from_fork+0x10/0x20

---

[1]: Steps:
$ dmesg -C
$ echo 1 > /d/clear_warn_once
$ taskset c0 /data/local/tmp/fio --name=fio.result --direct=1
--directory=/data/local/tmp/out/ --rw=write --size=128m --numjobs=1
--group_reporting=1 --loops=10 --fsync_on_close=1 --end_fsync=1
$ dmesg > dmesg.txt

Regards,
Jack Lee


* Re: Regression in performance when using PREEMPT_RT
  2026-01-12 17:33   ` Steven Rostedt
       [not found]     ` <CAM5rmdez5fyEU-=MYxpPrg1Sr+VjbMi1tyek17uoJw_4gyMGhg@mail.gmail.com>
@ 2026-01-13  8:43     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-13  8:43 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, linux-rt-devel, liangjlee

On 2026-01-12 12:33:56 [-0500], Steven Rostedt wrote:
> > CPU resources with any other task on that CPU, which is different from
> > softirq on !RT, where it only has to wait until other hardirqs complete.
> > The other thing is that if something "else" is busy, say a
> > threaded interrupt, then it will pick up this request (before ksoftirqd
> > had the chance). The result is that this handler now does the I/O at its
> > end instead of ksoftirqd. If the interrupt is important and has a higher
> > priority than the average MAX_RT_PRIO / 2, then this block I/O might
> > disrupt its schedule.
> 
> I'm not sure what the effect of that would be.

It could get completed by the networking interrupt. That would be okay
from the "correctness" POV, but if your networking has higher priority
and disk I/O is considered low priority, then it will probably not be
good.

> > I think moving the interrupt to a BIG CPU (same as the block queue), if
> > possible, would be the easiest thing to do.
> 
> We have done that. It appears that the hardware simply picks the first CPU
> it can use and *always* uses that. By setting it to a big core, we did get
> some improvement, but the problem is still that the softirq only runs on
> that CPU.

That is "normal", unless you have firmware that configures each
interrupt to a CPU; Linux simply uses the default value.

> > If the hardware restricts it, I would suggest that a dedicated
> > SCHED_FIFO thread for this duty would be better than the anonymous
> > catch-all ksoftirqd.
> 
> I think the solution may be to give up on PREEMPT_RT if that's the case,
> unless there's a non-PREEMPT_RT reason to make that change.
> 
> To give you an idea of what the issue is, here is the distribution of the
> block softirq (even though the irq itself always comes in on a single
> CPU).
> 
>  cat /proc/softirqs | grep -i block
> 
> non-rt:
> 	BLOCK:          6          0          0          0          1          7     162986     164750
> 
> RT unmodified:
> 	BLOCK:     329875          0          0          0          0          0          0          0
> 
> RT without the force_irqthreads() check:
> 	BLOCK:          0          0          0          0         11         15     164116     163619

So the last two or four CPUs are the big ones? It seems that you have at
least two queues. Not sure where they come from.
I would suggest creating a thread per queue just to keep it within
the context. This should mimic the anonymous softirq. The difference
would be the higher priority and preference over SCHED_OTHER tasks,
which might improve the performance (depending on the current
workload, i.e. if your system is idle then there is no fight for CPU
resources).
Or you get one interrupt per queue, if this is missing in the eMMC
driver somewhere on the software side. Usually the NVMe drivers do this.

> -- Steve

Sebastian


end of thread, other threads:[~2026-01-13  8:44 UTC | newest]

Thread overview: 5+ messages
2025-12-26 16:02 Regression in performance when using PREEMPT_RT Steven Rostedt
2026-01-12 15:27 ` Sebastian Andrzej Siewior
2026-01-12 17:33   ` Steven Rostedt
     [not found]     ` <CAM5rmdez5fyEU-=MYxpPrg1Sr+VjbMi1tyek17uoJw_4gyMGhg@mail.gmail.com>
2026-01-13  7:44       ` Jack Lee
2026-01-13  8:43     ` Sebastian Andrzej Siewior
