* Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
@ 2024-12-20 13:18 Florian Paul Schmidt
2024-12-20 14:05 ` John Ogness
2025-01-07 14:39 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2024-12-20 13:18 UTC (permalink / raw)
To: linux-rt-users
Hi!
I'm playing around with a Raspberry Pi 5 and a PREEMPT_RT kernel (raspberry
pi 6.13.y branch) and wonder about the latencies I see. First I tried
cyclictest and saw latencies in the upper 50s up to 80s or so under a
kernel compile load.
So I compiled the kernel with timerlat and osnoise tracers and ran a bit of
rtla timerlat top -a 60 --dma-latency 0
and I get traces like the following:
Timer Latency
0 00:00:22 | IRQ Timer Latency (us) | Thread Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max
0 #21138 | 4 0 0 14 | 17 3 4 48
1 #21138 | 6 0 0 21 | 41 3 4 50
2 #21138 | 11 0 0 11 | 61 3 4 61
3 #21138 | 5 0 0 10 | 40 3 4 57
rtla timerlat hit stop tracing
## CPU 2 hit stop tracing, analyzing it ##
IRQ handler delay: 1.36 us (2.22 %)
IRQ latency: 11.60 us
Timerlat IRQ duration: 21.26 us (34.63 %)
Blocking thread: 19.46 us (31.70 %)
cc1:265581 19.46 us
Blocking thread stack trace
-> timerlat_irq
-> __hrtimer_run_queues
-> hrtimer_interrupt
-> arch_timer_handler_phys
-> handle_percpu_devid_irq
-> generic_handle_domain_irq
-> gic_handle_irq
-> call_on_irq_stack
-> do_interrupt_handler
-> el0_interrupt
-> __el0_irq_handler_common
-> el0t_64_irq_handler
-> el0t_64_irq
------------------------------------------------------------------------
Thread latency: 61.40 us (100%)
Max timerlat IRQ latency from idle: 4.74 us in cpu 2
Saving trace to timerlat_trace.txt
And the surprising thing was that there was nothing out of the ordinary (at
least to my non-expert eyes): no spin locks or similar. Just a syscall (these
are implemented via an IRQ from EL0, right?) which then goes straight to the
timer handling. I might be completely off here though, as I am not an expert
at all in these things.
When I contrast this with an unloaded system and a threshold like 10 us, it
seems that there actually is "something" in the trace.
Timer Latency
0 00:00:04 | IRQ Timer Latency (us) | Thread Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max
0 #4004 | 0 0 0 1 | 3 3 3 6
1 #4004 | 1 0 0 1 | 10 3 3 10
2 #4003 | 0 0 0 0 | 3 3 3 7
3 #4003 | 0 0 0 1 | 3 3 3 7
rtla timerlat hit stop tracing
## CPU 1 hit stop tracing, analyzing it ##
IRQ handler delay: 0.00 us (0.00 %)
IRQ latency: 1.40 us
Timerlat IRQ duration: 4.67 us (45.67 %)
Blocking thread: 3.17 us (30.98 %)
htop:262562 3.17 us
Blocking thread stack trace
-> timerlat_irq
-> __hrtimer_run_queues
-> hrtimer_interrupt
-> arch_timer_handler_phys
-> handle_percpu_devid_irq
-> generic_handle_domain_irq
-> gic_handle_irq
-> call_on_irq_stack
-> do_interrupt_handler
-> el1_interrupt
-> el1h_64_irq_handler
-> el1h_64_irq
-> tracer_preempt_on
-> preempt_count_sub
-> migrate_enable
-> kmem_cache_free
-> __fput
-> __fput_sync
-> __arm64_sys_close
-> invoke_syscall
-> el0_svc_common.constprop.0
-> do_el0_svc
-> el0_svc
-> el0t_64_sync_handler
-> el0t_64_sync
------------------------------------------------------------------------
Thread latency: 10.22 us (100%)
Max timerlat IRQ latency from idle: 1.43 us in cpu 1
Saving trace to timerlat_trace.txt
In this case it's htop invoking some syscall. So I really wonder how to
interpret things like the first trace above, where "nothing" is happening
and there are still large-ish latencies.
At some point I wondered what kinds of load trigger the longest
latencies, so I played with "stress -m 6" a little and was surprised
to see the latencies go even higher than 100 us. Here is an example
trace:
Timer Latency
0 00:00:16 | IRQ Timer Latency (us) | Thread Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max
0 #15654 | 1 0 2 31 | 6 3 15 75
1 #15654 | 2 0 3 42 | 13 3 17 91
2 #15654 | 2 0 2 38 | 17 3 16 87
3 #15654 | 5 0 3 26 | 111 3 16 111
rtla timerlat hit stop tracing
## CPU 3 hit stop tracing, analyzing it ##
IRQ handler delay: 0.00 us (0.00 %)
IRQ latency: 5.16 us
Timerlat IRQ duration: 47.41 us (42.53 %)
Blocking thread: 50.50 us (45.30 %)
stress:278226 50.50 us
Blocking thread stack trace
-> timerlat_irq
-> __hrtimer_run_queues
-> hrtimer_interrupt
-> arch_timer_handler_phys
-> handle_percpu_devid_irq
-> generic_handle_domain_irq
-> gic_handle_irq
-> call_on_irq_stack
-> do_interrupt_handler
-> el1_interrupt
-> el1h_64_irq_handler
-> el1h_64_irq
-> folios_put_refs
-> free_pages_and_swap_cache
-> __tlb_batch_free_encoded_pages
-> tlb_flush_mmu
-> unmap_page_range
-> unmap_vmas
-> vms_clear_ptes
-> vms_complete_munmap_vmas
-> do_vmi_align_munmap
-> do_vmi_munmap
-> __vm_munmap
-> __arm64_sys_munmap
-> invoke_syscall
-> el0_svc_common.constprop.0
-> do_el0_svc
-> el0_svc
-> el0t_64_sync_handler
-> el0t_64_sync
------------------------------------------------------------------------
Thread latency: 111.47 us (100%)
Max timerlat IRQ latency from idle: 1.98 us in cpu 3
Saving trace to timerlat_trace.txt
OK, here at least something related to virtual memory is happening.
I still wonder though: How come the latencies are so large? Is the
raspberry pi platform just not suited to reach lower latencies? Is
everything working as expected? Could the raspberry pi patches be
responsible for latency sources which are invisible in the traces?
Is there any hope of getting these latencies a little more under
control?
Kind regards,
FPS
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2024-12-20 13:18 Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3) Florian Paul Schmidt
@ 2024-12-20 14:05 ` John Ogness
2024-12-20 15:57 ` Florian Paul Schmidt
2025-01-09 8:39 ` Florian Paul Schmidt
2025-01-07 14:39 ` Sebastian Andrzej Siewior
1 sibling, 2 replies; 20+ messages in thread
From: John Ogness @ 2024-12-20 14:05 UTC (permalink / raw)
To: Florian Paul Schmidt, linux-rt-users
Hi Florian,
On 2024-12-20, Florian Paul Schmidt <mista.tapas@gmx.net> wrote:
> I'm playing around with a Raspberry Pi 5 and a PREEMPT_RT kernel (raspberry
> pi 6.13.y branch) and wonder about the latencies I see. First I tried
> cyclictest and saw latencies in the upper 50s up to 80s or so under a
> kernel compile load.
...
> I still wonder though: How come the latencies are so large? Is the
> raspberry pi platform just not suited to reach lower latencies? Is
> everything working as expected? Could the raspberry pi patches be
> responsible for latency sources which are invisible in the traces?
> Is there any hope of getting these latencies a little more under
> control?
There are various kernel configurations that affect latency. But I am
curious... Why do you think a maximum latency of 80us on an $80 board is
large? What maximum latencies are you expecting? And why do you think it
is not "under control"?
John Ogness
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2024-12-20 14:05 ` John Ogness
@ 2024-12-20 15:57 ` Florian Paul Schmidt
2024-12-22 10:32 ` Leon Woestenberg
2025-01-09 8:39 ` Florian Paul Schmidt
1 sibling, 1 reply; 20+ messages in thread
From: Florian Paul Schmidt @ 2024-12-20 15:57 UTC (permalink / raw)
To: John Ogness, linux-rt-users
Hi John! Sorry, forgot to hit "reply all". So here it goes again :)
With a little clarification as well.
On 20/12/2024 15:05, John Ogness wrote:
> There are various kernel configurations that affect latency. But I am
> curious... Why do you think a maximum latency of 80us on an $80 board is
> large?
Simply because 80 us is an "eternity" 😉 That's about 192000 cycles at
2.4 GHz. And also because other slow systems (like that Celeron J4125
board I have) behave much better even at similar clock speeds (EDIT:
IIRC). I don't think the price point has anything to do with it, really.
EDIT: Don't get me wrong though. 80 us are OK for my use case (audio
processing) - I just wonder if it might be even better.
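For the record, the cycle count above is just latency times clock rate (2.4 GHz means 2400 cycles per microsecond); as shell arithmetic:

```shell
# cycles = latency_us * cycles_per_us; at 2.4 GHz that is 2400 cycles/us
echo $(( 80 * 2400 ))   # 192000
```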
> What maximum latencies are you expecting? And why do you think it
> is not "under control"?
I would _wish_ for latencies in the low tens of microseconds 🙂 I do
not expect anything.
Kind regards,
FPS
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2024-12-20 15:57 ` Florian Paul Schmidt
@ 2024-12-22 10:32 ` Leon Woestenberg
2025-01-08 9:42 ` Florian Paul Schmidt
2025-01-12 15:14 ` Florian Paul Schmidt
0 siblings, 2 replies; 20+ messages in thread
From: Leon Woestenberg @ 2024-12-22 10:32 UTC (permalink / raw)
To: Florian Paul Schmidt; +Cc: John Ogness, linux-rt-users
Hello,
First of all, you can look up long-term latency plots for PREEMPT_RT on
different systems. In my experience, and from looking at the plots, I
doubt the Raspberry Pi 5 differs much from Celeron/Atom, but YMMV.
https://www.osadl.org/Latency-plot-of-system-in-rack-2-slot.qa-latencyplot-r2s7.0.html?shadow=0
https://www.osadl.org/Long-term-latency-plot-of-system-in-rack.qa-3d-latencyplot-rds3.0.html?shadow=0
You could consider using isolated CPU cores, which should give you
much lower latencies, if you have full control over the application
logic.
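A minimal sketch of that setup (stress and cyclictest as used elsewhere in this thread; the cpu_mask helper and ./rt_app name are mine, just for illustration):

```shell
# Boot with the cores to isolate on the kernel command line, e.g.:
#   isolcpus=1,2,3
# then pin the RT application onto an isolated core:
#   taskset -c 1 ./rt_app
# taskset also accepts a hex mask; this helper builds one from a CPU list.
cpu_mask() {
  local mask=0 cpu
  for cpu in "$@"; do
    mask=$(( mask | (1 << cpu) ))
  done
  printf '0x%x\n' "$mask"
}
cpu_mask 1 2 3   # prints 0xe, i.e. CPUs 1-3
```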
Regards,
Leon.
--
Leon Woestenberg
leon@sidebranch.com
T: +31 40 711 42 76
M: +31 6 472 30 372
Sidebranch Embedded Systems
Eindhoven, The Netherlands
http://www.sidebranch.com
On Fri, Dec 20, 2024 at 4:58 PM Florian Paul Schmidt
<mista.tapas@gmx.net> wrote:
>
> Hi John! Sorry, forgot to hit "reply all". So here it goes again :)
> With a little clarification as well.
>
> On 20/12/2024 15:05, John Ogness wrote:
> > There are various kernel configurations that affect latency. But I am
> > curious... Why do you think a maximum latency of 80us on an $80 board is
> > large?
>
> Simply because 80 us is an "eternity" 😉 That's about 192000 cycles at
> 2.4 GHz. And also because other slow systems (like that Celeron J4125
> board I have) behave much better even at similar clock speeds (EDIT:
> IIRC). I don't think the price point has anything to do with it, really.
>
> EDIT: Don't get me wrong though. 80 us are OK for my use case (audio
> processing) - I just wonder if it might be even better.
>
> > What maximum latencies are you expecting? And why do you think it
> > is not "under control"?
>
> I would _wish_ for latencies in the low tens of microseconds 🙂 I do
> not expect anything.
>
> Kind regards,
> FPS
>
>
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2024-12-20 13:18 Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3) Florian Paul Schmidt
2024-12-20 14:05 ` John Ogness
@ 2025-01-07 14:39 ` Sebastian Andrzej Siewior
2025-01-08 9:48 ` Florian Paul Schmidt
1 sibling, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-07 14:39 UTC (permalink / raw)
To: Florian Paul Schmidt; +Cc: linux-rt-users
On 2024-12-20 14:18:43 [+0100], Florian Paul Schmidt wrote:
> Hi!
Hi,
> I'm playing around with a Raspberry Pi 5 and a PREEMPT_RT kernel (raspberry
> pi 6.13.y branch) and wonder about the latencies I see. First I tried
> cyclictest and saw latencies in the upper 50s up to 80s or so under a
> kernel compile load.
>
> So I compiled the kernel with timerlat and osnoise tracers and ran a bit of
>
> rtla timerlat top -a 60 --dma-latency 0
>
> and I get traces like the following:
…
> ## CPU 3 hit stop tracing, analyzing it ##
> IRQ handler delay: 0.00 us (0.00 %)
> IRQ latency: 5.16 us
> Timerlat IRQ duration: 47.41 us (42.53 %)
> Blocking thread: 50.50 us (45.30 %)
> stress:278226 50.50 us
> Blocking thread stack trace
> -> timerlat_irq
> -> __hrtimer_run_queues
> -> hrtimer_interrupt
> -> arch_timer_handler_phys
> -> handle_percpu_devid_irq
> -> generic_handle_domain_irq
> -> gic_handle_irq
> -> call_on_irq_stack
> -> do_interrupt_handler
> -> el1_interrupt
> -> el1h_64_irq_handler
> -> el1h_64_irq
> -> folios_put_refs
…
> Thread latency: 111.47 us (100%)
The backtrace appears long, but everything up to folios_put_refs()
appears preemptible. Then there is the interrupt handler. This looks
okay.
> Max timerlat IRQ latency from idle: 1.98 us in cpu 3
> Saving trace to timerlat_trace.txt
>
>
> OK, here at least something related to virtual memory is happening.
> I still wonder though: How come the latencies are so large? Is the
> raspberry pi platform just not suited to reach lower latencies? Is
> everything working as expected? Could the raspberry pi patches be
> responsible for latency sources which are invisible in the traces?
> Is there any hope of getting these latencies a little more under
> control?
Based on this trace, I don't see a section which disables preemption or
interrupts and could be the reason for the delayed interrupt.
One thing that could slow you down is small caches.
Then there is looking within the system: an inaccurate or low-resolution
clocksource and/or timer device.
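A quick way to check which clocksource the system is actually using (standard sysfs path; arm64 systems normally report arch_sys_counter):

```shell
# Print the current clocksource, or a note if the sysfs entry is absent.
f=/sys/devices/system/clocksource/clocksource0/current_clocksource
[ -r "$f" ] && cat "$f" || echo "no clocksource sysfs entry"
# The per-CPU timer event devices and their resolution can be inspected
# in /proc/timer_list.
```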
> Kind regards,
> FPS
Sebastian
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2024-12-22 10:32 ` Leon Woestenberg
@ 2025-01-08 9:42 ` Florian Paul Schmidt
2025-01-12 15:14 ` Florian Paul Schmidt
1 sibling, 0 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-01-08 9:42 UTC (permalink / raw)
To: Leon Woestenberg; +Cc: John Ogness, linux-rt-users
Hi!
On 12/22/2024 11:32 AM, Leon Woestenberg wrote:
> Hello,
>
> First of all, you can look up long-term latency plots for PREEMPT_RT on
> different systems. In my experience, and from looking at the plots, I
> doubt the Raspberry Pi 5 differs much from Celeron/Atom, but YMMV.
>
> https://www.osadl.org/Latency-plot-of-system-in-rack-2-slot.qa-latencyplot-r2s7.0.html?shadow=0
> https://www.osadl.org/Long-term-latency-plot-of-system-in-rack.qa-3d-latencyplot-rds3.0.html?shadow=0
I am aware of the OSADL test farm :) But thanks for the pointer nonetheless!
> You could consider using isolated CPU cores, which should give you
> much lower latencies, if you have full control over the application
> logic.
I would love to make use of all 4 cores though. I'll play around and see
if the latency reductions with isolated cores make up for the loss of a
core for the RT work.
Thanks and regards,
FPS
--
https://blog.dfdx.eu
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-07 14:39 ` Sebastian Andrzej Siewior
@ 2025-01-08 9:48 ` Florian Paul Schmidt
2025-01-28 12:30 ` Tim Sander
0 siblings, 1 reply; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-01-08 9:48 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users
Hi!
On 1/7/2025 3:39 PM, Sebastian Andrzej Siewior wrote:
> On 2024-12-20 14:18:43 [+0100], Florian Paul Schmidt wrote:
[...]
> The backtrace appears long, but everything up to folios_put_refs()
> appears preemptible. Then there is the interrupt handler. This looks
> okay.
That was my impression as well. Nothing really surprising (at least
given my limited knowledge in these matters).
> Based on this trace, I don't see a section which disables preemption or
> interrupts and could be the reason for the delayed interrupt.
>
> One thing that could slow you down is small caches.
> Then there is looking within the system: an inaccurate or low-resolution
> clocksource and/or timer device.
I will look into these. Thanks for the pointers!
Kind regards,
FPS
--
https://blog.dfdx.eu
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2024-12-20 14:05 ` John Ogness
2024-12-20 15:57 ` Florian Paul Schmidt
@ 2025-01-09 8:39 ` Florian Paul Schmidt
1 sibling, 0 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-01-09 8:39 UTC (permalink / raw)
To: John Ogness, linux-rt-users
Hi John again, Hi list :)
On 12/20/2024 3:05 PM, John Ogness wrote:
> There are various kernel configurations that affect latency. But I am
> curious... Why do you think a maximum latency of 80us on an $80 board is
> large? What maximum latencies are you expecting? And why do you think it
> is not "under control"?
I played around some more and coincidentally found a way to trigger even
longer latencies:
stress -m 4 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 4
This gives me latencies in excess of 500 us. I even saw 800 us once or
so in my tests. The traces still look fine. Here's one for 438 us:
rtla timerlat hit stop tracing
## CPU 3 hit stop tracing, analyzing it ##
IRQ handler delay: 0.00 us (0.00 %)
IRQ latency: 439.58 us
Blocking thread:
stress:1706
Blocking thread stack trace
-> timerlat_irq
-> __hrtimer_run_queues
-> hrtimer_interrupt
-> arch_timer_handler_phys
-> handle_percpu_devid_irq
-> generic_handle_domain_irq
-> gic_handle_irq
-> call_on_irq_stack
-> do_interrupt_handler
-> el0_interrupt
-> __el0_irq_handler_common
-> el0t_64_irq_handler
-> el0t_64_irq
------------------------------------------------------------------------
IRQ latency: 439.58 us (100%)
A Raspberry Pi 4 seems to be less susceptible to this stress test. There
the maximum latencies go to ca. 200 us. I will try a couple of kernels
including the one in the OSADL farm.
Kind regards,
FPS
--
https://blog.dfdx.eu
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2024-12-22 10:32 ` Leon Woestenberg
2025-01-08 9:42 ` Florian Paul Schmidt
@ 2025-01-12 15:14 ` Florian Paul Schmidt
2025-01-12 21:30 ` John Ogness
2025-01-14 10:28 ` Mike Galbraith
1 sibling, 2 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-01-12 15:14 UTC (permalink / raw)
To: Leon Woestenberg; +Cc: John Ogness, linux-rt-users
Hi again!
On 22/12/2024 11:32, Leon Woestenberg wrote:
> You could consider using isolated CPU cores, which should give you
> much lower latencies, if you have full control over the application
> logic.
OK, I got around to playing with isolcpus. As a first attempt I added
isolcpus=1,2,3
to my kernel commandline and tried
stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
and it ran on core 0 maxxing it out. So, all fine and dandy up to
this point. Then I went on to run cyclictest on the isolated core:
sudo cyclictest -m -p 95 -a 1,2,3 -t 3
But I get enormous latencies:
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 2.98 3.08 2.95 3/222 134837
T: 0 (134706) P:95 I:1000 C: 4824 Min: 1 Act: 3 Avg: 8 Max: 518
T: 1 (134707) P:95 I:1500 C: 3215 Min: 1 Act: 1 Avg: 8 Max: 566
T: 2 (134708) P:95 I:2000 C: 2411 Min: 1 Act: 1 Avg: 9 Max: 503
Side note: I get these enormous latencies also without cpu
isolation and this particular stress test (albeit with -c 4 and -m 4
- so the stressor running on all cores.) But I would have expected
the core isolation to mitigate this somewhat. Here's the really
surprising thing though. If I run
sudo cyclictest -m -p 95 -a 0,1,2,3 -t 4
then I get these enormous latencies on cores 1,2 and 3 but not on
core 0:
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 2.50 2.83 2.88 3/223 139478
T: 0 (139313) P:95 I:1000 C: 5881 Min: 2 Act: 3 Avg: 3 Max: 10
T: 1 (139314) P:95 I:1500 C: 3920 Min: 1 Act: 1 Avg: 7 Max: 419
T: 2 (139315) P:95 I:2000 C: 2940 Min: 1 Act: 1 Avg: 7 Max: 480
T: 3 (139316) P:95 I:2500 C: 2352 Min: 1 Act: 1 Avg: 9 Max: 433
And _that_ is really strange from my understanding. What is this
raspberry pi 5 doing? O_o
10 us would be an excellent worst case latency ;)
And here's another strange thing: The latencies depend a lot on the
--vm-stride argument to stress. 16 seems to be the value that maximizes
these latencies. If I go to either higher (32, 64, ...) or lower (8, 4,
2, 1) values, the latencies disappear almost completely.
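In case anyone wants to reproduce the sweep, something like this maps out the effect (the stride list and 60 s duration are my choice; the real commands need stress, cyclictest and root, so they are left commented):

```shell
# Sweep --vm-stride and print the stressor invocation for each value.
strides="1 2 4 8 16 32 64 128"
for s in $strides; do
  echo "stress -m 1 --vm-stride $s --vm-bytes 512000000 --vm-keep -c 1"
  # timeout 60 stress -m 1 --vm-stride $s --vm-bytes 512000000 --vm-keep -c 1 &
  # sudo cyclictest -m -p 95 -a 1,2,3 -t 3 -D 60 -q
done
```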
Now I am more puzzled than before.
Kind regards,
FPS
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-12 15:14 ` Florian Paul Schmidt
@ 2025-01-12 21:30 ` John Ogness
2025-01-13 9:15 ` Florian Paul Schmidt
2025-01-14 10:28 ` Mike Galbraith
1 sibling, 1 reply; 20+ messages in thread
From: John Ogness @ 2025-01-12 21:30 UTC (permalink / raw)
To: Florian Paul Schmidt, Leon Woestenberg; +Cc: linux-rt-users
On 2025-01-12, Florian Paul Schmidt <mista.tapas@gmx.net> wrote:
> OK, I got around to playing with isolcpus. As a first attempt I added
>
> isolcpus=1,2,3
>
> to my kernel commandline and tried
>
> stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>
> and it ran on core 0 maxing it out.
...
> Here's the really surprising thing though. If I run
>
> sudo cyclictest -m -p 95 -a 0,1,2,3 -t 4
>
> then I get these enormous latencies on cores 1,2 and 3 but not on
> core 0:
>
> # /dev/cpu_dma_latency set to 0us
> policy: fifo: loadavg: 2.50 2.83 2.88 3/223 139478
>
> T: 0 (139313) P:95 I:1000 C: 5881 Min: 2 Act: 3 Avg: 3 Max: 10
> T: 1 (139314) P:95 I:1500 C: 3920 Min: 1 Act: 1 Avg: 7 Max: 419
> T: 2 (139315) P:95 I:2000 C: 2940 Min: 1 Act: 1 Avg: 7 Max: 480
> T: 3 (139316) P:95 I:2500 C: 2352 Min: 1 Act: 1 Avg: 9 Max: 433
Notice the average is considerably higher on the "idle" CPUs. Perhaps
you have cpufreq scaling enabled?
Are you running these tests using the OSADL kernel?
I assume you see the same effect when running stress(1) pinned to CPU1?
... just to be sure the boot CPU is not somehow special. (No need to
boot with isolcpus since the machine is otherwise idle anyway.)
taskset 0x2 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
sudo cyclictest -m -p 95 -a 1,2,3 -t 3
John Ogness
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-12 21:30 ` John Ogness
@ 2025-01-13 9:15 ` Florian Paul Schmidt
2025-01-13 10:28 ` Florian Paul Schmidt
2025-01-13 11:09 ` John Ogness
0 siblings, 2 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-01-13 9:15 UTC (permalink / raw)
To: John Ogness, Leon Woestenberg; +Cc: linux-rt-users
On 1/12/2025 10:30 PM, John Ogness wrote:
> On 2025-01-12, Florian Paul Schmidt <mista.tapas@gmx.net> wrote:
[...]
>> T: 0 (139313) P:95 I:1000 C: 5881 Min: 2 Act: 3 Avg: 3 Max: 10
>> T: 1 (139314) P:95 I:1500 C: 3920 Min: 1 Act: 1 Avg: 7 Max: 419
>> T: 2 (139315) P:95 I:2000 C: 2940 Min: 1 Act: 1 Avg: 7 Max: 480
>> T: 3 (139316) P:95 I:2500 C: 2352 Min: 1 Act: 1 Avg: 9 Max: 433
>
> Notice the average is considerably higher on the "idle" CPUs. Perhaps
> you have cpufreq scaling enabled?
I don't think so. Cores are pinned at 2.4 GHz by way of cpufreq-set -g
performance.
> Are you running these tests using the OSADL kernel?
No, that is going to be the next thing I'll try. Sadly something's up
with the generated patch script. The kernel directory resulting from
running it is missing the top-level Kconfig. There's a hunk in the
monsterpatch that deletes it:
Index: linux-6.1.66-rt19-v8-16k/Kconfig
===================================================================
--- linux-6.1.66-rt19-v8-16k.orig/Kconfig
+++ /dev/null
[...]
Lines 246845 and following.
Why? :)
The kernel I am running is:
$ uname -a
Linux ogfx5 6.13.0-rc6-v8-rt2+ #1 SMP PREEMPT_RT Thu Jan 9 17:07:08 CET
2025 aarch64 GNU/Linux
I also get similar results with 6.13.0-rc3 but compiled with 16k pages
and for the pi 5 architecture.
> I assume you see the same effect when running stress(1) pinned to CPU1?
> ... just to be sure the boot CPU is not somehow special. (No need to
> boot with isolcpus since the machine is otherwise idle anyway.)
>
> taskset 0x2 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>
> sudo cyclictest -m -p 95 -a 1,2,3 -t 3
Yeah, I see the same behaviour with this way of running the test: The
core with the stressor on it shows small latencies and the other huge
ones. The effect disappears once I run two or more stressors, e.g.
taskset 0x4 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
and
taskset 0x8 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
Then all cores show huge latencies.
Kind regards,
FPS
--
https://blog.dfdx.eu
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-13 9:15 ` Florian Paul Schmidt
@ 2025-01-13 10:28 ` Florian Paul Schmidt
2025-01-13 11:09 ` John Ogness
1 sibling, 0 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-01-13 10:28 UTC (permalink / raw)
To: John Ogness, Leon Woestenberg; +Cc: linux-rt-users
On 1/13/2025 10:15 AM, Florian Paul Schmidt wrote:
[...]
> No, that is going to be the next thing I'll try. Sadly something's up
> with the generated patch script. The kernel directory resulting from
> running it is missing the top-level Kconfig. There's a hunk in the
> monsterpatch that deletes it:
>
> Index: linux-6.1.66-rt19-v8-16k/Kconfig
> ===================================================================
> --- linux-6.1.66-rt19-v8-16k.orig/Kconfig
> +++ /dev/null
> [...]
>
> Lines 246845 and following.
>
> Why? :)
Hmm, it also seems the provided .config is not really complete.
After re-adding the Kconfig file, make oldconfig asks a couple of questions.
It also seems the kernel has to be built differently from the usual
raspberry pi kernels. Does it not support device tree overlays? The dtbs
target complained about missing makefiles.
Kind regards,
FPS
--
https://blog.dfdx.eu
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-13 9:15 ` Florian Paul Schmidt
2025-01-13 10:28 ` Florian Paul Schmidt
@ 2025-01-13 11:09 ` John Ogness
2025-01-13 12:56 ` Florian Paul Schmidt
1 sibling, 1 reply; 20+ messages in thread
From: John Ogness @ 2025-01-13 11:09 UTC (permalink / raw)
To: Florian Paul Schmidt, Leon Woestenberg; +Cc: linux-rt-users
On 2025-01-13, Florian Paul Schmidt <mista.tapas@gmx.net> wrote:
>>> T: 0 (139313) P:95 I:1000 C: 5881 Min: 2 Act: 3 Avg: 3 Max: 10
>>> T: 1 (139314) P:95 I:1500 C: 3920 Min: 1 Act: 1 Avg: 7 Max: 419
>>> T: 2 (139315) P:95 I:2000 C: 2940 Min: 1 Act: 1 Avg: 7 Max: 480
>>> T: 3 (139316) P:95 I:2500 C: 2352 Min: 1 Act: 1 Avg: 9 Max: 433
>
>> I assume you see the same effect when running stress(1) pinned to CPU1?
>> ... just to be sure the boot CPU is not somehow special. (No need to
>> boot with isolcpus since the machine is otherwise idle anyway.)
>>
>> taskset 0x2 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>>
>> sudo cyclictest -m -p 95 -a 1,2,3 -t 3
>
> Yeah, I see the same behaviour with this way of running the test: The
> core with the stressor on it shows small latencies and the other huge
> ones. The effect disappears once I run two or more stressors, e.g.
>
> taskset 0x4 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>
> and
>
> taskset 0x8 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>
> Then all cores show huge latencies.
To me this looks like memory bus contention. I would expect you could
reproduce the same behavior with bare metal software.
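One cheap way to probe the contention hypothesis without the RT stack at all: pin the diddler to one core and a second memory-heavy task to another, and compare the latter's throughput with and without the diddler running. The sketch below is a dry run that only prints the commands (the dd load and CPU choices are mine, just a stand-in):

```shell
# Dry-run wrapper: prints each command instead of executing it.
# Replace 'echo "+ $*"' with '"$@"' to actually run the probe.
run() { echo "+ $*"; }
run taskset -c 1 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
run taskset -c 2 dd if=/dev/zero of=/dev/null bs=1M count=2048
```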
Someone with some experience with this platform would need to speak
up. The usefulness of my casual responses has come to an end. ;-)
John Ogness
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-13 11:09 ` John Ogness
@ 2025-01-13 12:56 ` Florian Paul Schmidt
0 siblings, 0 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-01-13 12:56 UTC (permalink / raw)
To: John Ogness, Leon Woestenberg; +Cc: linux-rt-users
On 1/13/2025 12:09 PM, John Ogness wrote:
[...]
>> Yeah, I see the same behaviour with this way of running the test: The
>> core with the stressor on it shows small latencies and the other huge
>> ones. The effect disappears once I run two or more stressors, e.g.
>>
>> taskset 0x4 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>>
>> and
>>
>> taskset 0x8 stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>>
>> Then all cores show huge latencies.
>
> To me this looks like memory bus contention. I would expect you could
> reproduce the same behavior with bare metal software.
>
> Someone with some experience with this platform would need to speak
> up. The usefulness of my casual responses has come to an end. ;-)
Your input is nonetheless very much appreciated :) Also, the peculiar
dependence on the stride parameter, which affects which memory banks are
touched in what order and with what timing, points in this direction
IMHO.
Another datapoint: When I set the cpu speed to 1.5 GHz instead of 2.4 by
way of cpufreq-set -g powersave, the effect is greatly diminished, which
also points in the direction of contention.
Kind regards,
FPS
--
https://blog.dfdx.eu
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-12 15:14 ` Florian Paul Schmidt
2025-01-12 21:30 ` John Ogness
@ 2025-01-14 10:28 ` Mike Galbraith
2025-02-14 8:31 ` Florian Paul Schmidt
1 sibling, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2025-01-14 10:28 UTC (permalink / raw)
To: Florian Paul Schmidt, Leon Woestenberg; +Cc: John Ogness, linux-rt-users
On Sun, 2025-01-12 at 16:14 +0100, Florian Paul Schmidt wrote:
> Hi again!
>
> On 22/12/2024 11:32, Leon Woestenberg wrote:
> > You could consider using isolated CPU cores, which should give you
> > much lower latencies, if you have full control over the application
> > logic.
>
> OK, I got around to playing with isolcpus. As a first attempt I added
>
> isolcpus=1,2,3
>
> to my kernel commandline and tried
>
> stress -m 1 --vm-stride 16 --vm-bytes 512000000 --vm-keep -c 1
>
> and it ran on core 0 maxing it out. So, all fine and dandy up to
> this point. Then I went on to run cyclictest on the isolated core:
>
> sudo cyclictest -m -p 95 -a 1,2,3 -t 3
>
> But I get enormous latencies:
>
> # /dev/cpu_dma_latency set to 0us
> policy: fifo: loadavg: 2.98 3.08 2.95 3/222 134837
>
> T: 0 (134706) P:95 I:1000 C: 4824 Min: 1 Act: 3 Avg: 8 Max: 518
> T: 1 (134707) P:95 I:1500 C: 3215 Min: 1 Act: 1 Avg: 8 Max: 566
> T: 2 (134708) P:95 I:2000 C: 2411 Min: 1 Act: 1 Avg: 9 Max: 503
>
> Side note: I get these enormous latencies also without cpu
> isolation and this particular stress test (albeit with -c 4 and -m 4
> - so the stressor running on all cores.) But I would have expected
> the core isolation to mitigate this somewhat. Here's the really
> surprising thing though. If I run
>
> sudo cyclictest -m -p 95 -a 0,1,2,3 -t 4
>
> then I get these enormous latencies on cores 1,2 and 3 but not on
> core 0:
This seems rather odd: one core diddling memory is the worst case. Any
ARM experts out there know why that would be?
pi5, locked at 2.4GHz:
taskset -c 3 stress -m 1 --vm-bytes 1G --vm-stride 16 --vm-keep --timeout 60s;killall cyclictest
T: 0 (23157) P:99 I:1000 C: 63421 Min: 1 Act: 12 Avg: 27 Max: 1378
T: 1 (23158) P:99 I:1500 C: 42327 Min: 1 Act: 32 Avg: 29 Max: 1734
T: 2 (23159) P:99 I:2000 C: 31744 Min: 1 Act: 18 Avg: 28 Max: 1353
T: 3 (23160) P:99 I:2500 C: 25393 Min: 1 Act: 11 Avg: 6 Max: 39
With all cores diddling ram, things are far from lovely, but cyclictest
wakeup latency improves drastically. As you increase the number of
diddlers 1 -> 4, the latency numbers improve until you get all the way
up to... merely awful.
stress -m 4 --vm-bytes 1G --vm-stride 16 --vm-keep --timeout 60s;killall cyclictest
T: 0 (23276) P:99 I:1000 C: 62706 Min: 1 Act: 49 Avg: 48 Max: 220
T: 1 (23277) P:99 I:1500 C: 41800 Min: 1 Act: 53 Avg: 62 Max: 274
T: 2 (23278) P:99 I:2000 C: 31348 Min: 1 Act: 45 Avg: 65 Max: 302
T: 3 (23279) P:99 I:2500 C: 25077 Min: 1 Act: 32 Avg: 54 Max: 175
Trace of the 4-diddler run below, timewarp-plagued single-diddler trace
below that. Single-diddler timewarps go away if you reduce the size
enough, I just stopped poking the thing at 1G.
cyclictest reports markedly higher latency than the ftrace measured
wakeup -> task switch latency in all cases.. as if it has to slog
through miles of mud on the way to reading the damn clock.
Bottom line seems to be don't do that to dinky/cute/cheap boards :)
High speed scroll or skip recommended.
# tracer: wakeup_RT (only traces prio=99)
#
# wakeup_RT latency trace v1.1.5 on 6.13.0-rc6-v8-rt
# --------------------------------------------------------------------
# latency: 234 us, #272/272, CPU#3 | (M:PREEMPT_RT VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: cyclictest-17297 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off/BH-disabled
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / _-=> migrate-disable
# ||||| / delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
stress-17318 3dNh4. 4us : 17318:120:R + [003] 17297: 0:R cyclictest
stress-17318 3dNh4. 11us : try_to_wake_up <-wake_up_process
stress-17318 3dNh3. 12us : task_woken_rt <-ttwu_do_activate
stress-17318 3dNh3. 13us : _raw_spin_unlock <-try_to_wake_up
stress-17318 3dNh3. 13us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh2. 14us : _raw_spin_unlock_irqrestore <-try_to_wake_up
stress-17318 3dNh2. 14us : preempt_count_sub <-_raw_spin_unlock_irqrestore
stress-17318 3dNh1. 14us : preempt_count_sub <-try_to_wake_up
stress-17318 3dNh.. 15us : _raw_spin_lock_irq <-__hrtimer_run_queues
stress-17318 3dNh.. 15us : preempt_count_add <-_raw_spin_lock_irq
stress-17318 3dNh1. 18us : hrtimer_update_next_event <-hrtimer_interrupt
stress-17318 3dNh1. 18us : __hrtimer_next_event_base <-hrtimer_update_next_event
stress-17318 3dNh1. 20us : __hrtimer_next_event_base <-hrtimer_update_next_event
stress-17318 3dNh1. 21us : _raw_spin_unlock_irqrestore <-hrtimer_interrupt
stress-17318 3dNh1. 21us : preempt_count_sub <-_raw_spin_unlock_irqrestore
stress-17318 3dNh.. 21us : tick_program_event <-hrtimer_interrupt
stress-17318 3dNh.. 23us : clockevents_program_event <-tick_program_event
stress-17318 3dNh.. 24us : ktime_get <-clockevents_program_event
stress-17318 3dNh.. 25us : arch_counter_read <-ktime_get
stress-17318 3dNh.. 25us : arch_timer_set_next_event_phys <-clockevents_program_event
stress-17318 3dNh.. 26us : gic_eoimode1_eoi_irq <-handle_percpu_devid_irq
stress-17318 3dNh.. 31us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 32us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 32us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 34us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 37us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 37us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 38us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 39us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 40us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 40us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 41us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 44us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 45us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 46us : wake_up_process <-__irq_wake_thread
stress-17318 3dNh.. 46us : try_to_wake_up <-wake_up_process
stress-17318 3dNh.. 47us : preempt_count_add <-try_to_wake_up
stress-17318 3dNh1. 47us : _raw_spin_lock_irqsave <-try_to_wake_up
stress-17318 3dNh1. 48us : preempt_count_add <-_raw_spin_lock_irqsave
stress-17318 3dNh2. 52us : kthread_is_per_cpu <-is_cpu_allowed
stress-17318 3dNh2. 54us : ttwu_queue_wakelist <-try_to_wake_up
stress-17318 3dNh2. 54us : preempt_count_add <-try_to_wake_up
stress-17318 3dNh3. 54us : _raw_spin_lock <-try_to_wake_up
stress-17318 3dNh3. 55us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh4. 55us : preempt_count_sub <-try_to_wake_up
stress-17318 3dNh3. 56us : update_rq_clock <-try_to_wake_up
stress-17318 3dNh3. 56us : ttwu_do_activate <-try_to_wake_up
stress-17318 3dNh3. 57us : enqueue_task <-ttwu_do_activate
stress-17318 3dNh3. 59us : enqueue_task_rt <-enqueue_task
stress-17318 3dNh3. 59us : dequeue_rt_stack <-enqueue_task_rt
stress-17318 3dNh3. 61us : wakeup_preempt <-ttwu_do_activate
stress-17318 3dNh3. 61us : __resched_curr.constprop.0 <-wakeup_preempt
stress-17318 3dNh3. 65us : task_woken_rt <-ttwu_do_activate
stress-17318 3dNh3. 65us : _raw_spin_unlock <-try_to_wake_up
stress-17318 3dNh3. 66us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh2. 66us : _raw_spin_unlock_irqrestore <-try_to_wake_up
stress-17318 3dNh2. 66us : preempt_count_sub <-_raw_spin_unlock_irqrestore
stress-17318 3dNh1. 67us : preempt_count_sub <-try_to_wake_up
stress-17318 3dNh.. 68us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 69us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 74us : add_timer_on <-add_interrupt_randomness
stress-17318 3dNh.. 75us : lock_timer_base <-add_timer_on
stress-17318 3dNh.. 76us : _raw_spin_lock_irqsave <-lock_timer_base
stress-17318 3dNh.. 77us : preempt_count_add <-_raw_spin_lock_irqsave
stress-17318 3dNh1. 79us : calc_wheel_index <-add_timer_on
stress-17318 3dNh1. 82us : enqueue_timer <-add_timer_on
stress-17318 3dNh1. 83us : _raw_spin_unlock_irqrestore <-add_timer_on
stress-17318 3dNh1. 84us : preempt_count_sub <-_raw_spin_unlock_irqrestore
stress-17318 3dNh.. 84us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 85us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 85us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 88us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 90us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 91us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 92us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 92us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 96us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 96us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 96us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 96us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 97us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 98us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 98us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 99us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 99us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 99us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 100us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 100us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 100us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 100us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 101us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 101us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 102us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 102us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 103us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 104us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 104us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 106us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 106us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 111us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 111us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 114us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 114us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 115us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 115us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 115us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 116us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 116us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 116us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 116us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 117us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 117us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 117us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 118us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 118us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 120us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 120us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 120us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 121us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 122us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 125us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 125us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 129us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 129us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 130us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 130us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 130us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 130us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 131us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 131us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 131us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 132us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 132us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 133us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 134us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 134us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 135us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 135us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 135us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 135us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 136us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 137us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 137us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 138us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 138us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 140us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 140us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 140us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 141us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 141us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 141us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 142us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 142us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 142us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 143us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 143us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 144us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 144us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 145us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 145us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 145us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 145us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 145us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 146us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 146us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 147us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 148us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 148us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 150us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 150us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 150us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 151us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 151us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 151us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 152us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 152us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 152us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 152us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 153us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 153us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 153us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 154us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 154us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 155us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 155us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 155us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 155us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 155us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 156us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 157us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 158us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 160us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 160us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 160us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 161us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 161us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 161us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 162us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 162us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 163us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 163us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 163us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 164us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 164us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 165us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 167us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 167us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 169us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 169us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 170us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 171us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 171us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 174us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 174us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 180us : generic_handle_domain_irq <-gic_handle_irq
stress-17318 3dNh.. 180us : __irq_resolve_mapping <-generic_handle_domain_irq
stress-17318 3dNh.. 181us : __rcu_read_lock <-__irq_resolve_mapping
stress-17318 3dNh.. 181us : __rcu_read_unlock <-__irq_resolve_mapping
stress-17318 3dNh.. 181us : handle_fasteoi_irq <-generic_handle_domain_irq
stress-17318 3dNh.. 182us : _raw_spin_lock <-handle_fasteoi_irq
stress-17318 3dNh.. 182us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 183us : handle_irq_event <-handle_fasteoi_irq
stress-17318 3dNh1. 183us : _raw_spin_unlock <-handle_irq_event
stress-17318 3dNh1. 184us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 184us : __handle_irq_event_percpu <-handle_irq_event
stress-17318 3dNh.. 184us : irq_default_primary_handler <-__handle_irq_event_percpu
stress-17318 3dNh.. 184us : __irq_wake_thread <-__handle_irq_event_percpu
stress-17318 3dNh.. 184us : add_interrupt_randomness <-handle_irq_event
stress-17318 3dNh.. 185us : fast_mix <-add_interrupt_randomness
stress-17318 3dNh.. 185us : note_interrupt <-handle_irq_event
stress-17318 3dNh.. 186us : _raw_spin_lock <-handle_irq_event
stress-17318 3dNh.. 186us : preempt_count_add <-_raw_spin_lock
stress-17318 3dNh1. 186us : irq_chip_eoi_parent <-handle_fasteoi_irq
stress-17318 3dNh1. 187us : irq_chip_eoi_parent <-irq_chip_eoi_parent
stress-17318 3dNh1. 187us : gic_eoimode1_eoi_irq <-irq_chip_eoi_parent
stress-17318 3dNh1. 189us : _raw_spin_unlock <-handle_fasteoi_irq
stress-17318 3dNh1. 189us : preempt_count_sub <-_raw_spin_unlock
stress-17318 3dNh.. 191us : irq_exit_rcu <-el0_interrupt
stress-17318 3dN... 193us : idle_cpu <-__irq_exit_rcu
stress-17318 3dN... 194us : do_notify_resume <-el0_interrupt
stress-17318 3.N... 195us : schedule <-do_notify_resume
stress-17318 3dN.1. 196us : rcu_note_context_switch <-__schedule
stress-17318 3dN.1. 198us : preempt_count_add <-__schedule
stress-17318 3dN.2. 198us : _raw_spin_lock <-__schedule
stress-17318 3dN.2. 199us : preempt_count_add <-_raw_spin_lock
stress-17318 3dN.3. 200us : preempt_count_sub <-__schedule
stress-17318 3dN.2. 200us : update_rq_clock <-__schedule
stress-17318 3dN.2. 202us : balance_fair <-__schedule
stress-17318 3dN.2. 203us : pick_task_stop <-__schedule
stress-17318 3dN.2. 204us : pick_task_dl <-__schedule
stress-17318 3dN.2. 206us : pick_task_rt <-__schedule
stress-17318 3dN.2. 207us : put_prev_task_fair <-__schedule
stress-17318 3dN.2. 208us : put_prev_entity <-put_prev_task_fair
stress-17318 3dN.2. 208us : update_curr <-put_prev_entity
stress-17318 3dN.2. 209us : update_curr_se <-update_curr
stress-17318 3dN.2. 211us : update_min_vruntime <-update_curr
stress-17318 3dN.2. 213us : cpuacct_charge <-update_curr
stress-17318 3dN.2. 214us : __cgroup_account_cputime <-update_curr
stress-17318 3dN.2. 215us : preempt_count_add <-__cgroup_account_cputime
stress-17318 3dN.3. 216us : cgroup_rstat_updated <-__cgroup_account_cputime
stress-17318 3dN.3. 216us : preempt_count_sub <-__cgroup_account_cputime
stress-17318 3dN.2. 217us : dl_server_update <-update_curr
stress-17318 3dN.2. 218us : update_curr_dl_se <-dl_server_update
stress-17318 3dN.2. 218us : dl_scaled_delta_exec <-update_curr_dl_se
stress-17318 3dN.2. 219us : __enqueue_entity <-put_prev_entity
stress-17318 3dN.2. 221us : __update_load_avg_se <-update_load_avg
stress-17318 3dN.2. 223us : __update_load_avg_cfs_rq <-update_load_avg
stress-17318 3dN.2. 224us : decay_load <-__update_load_avg_cfs_rq
stress-17318 3dN.2. 224us : decay_load <-__update_load_avg_cfs_rq
stress-17318 3dN.2. 227us : update_rt_rq_load_avg <-set_next_task_rt
stress-17318 3d..3. 232us : __schedule <-preempt_schedule_notrace
stress-17318 3d..3. 233us : 17318:120:R ==> [003] 17297: 0:R cyclictest
Note the time-warps (the `!` delay markers flag gaps over 100us between
consecutive events, e.g. 5us -> 347us below) and the far nuttier IRQ
holdoff.
# tracer: wakeup_RT
#
# wakeup_RT latency trace v1.1.5 on 6.13.0-rc6-v8-rt
# --------------------------------------------------------------------
# latency: 1316 us, #273/273, CPU#1 | (M:PREEMPT_RT VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: cyclictest-15842 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off/BH-disabled
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / _-=> migrate-disable
# ||||| / delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
<idle>-0 1dnh5. 0us : 0:120:R + [001] 15842: 0:R cyclictest
<idle>-0 1dnh5. 1us : try_to_wake_up <-wake_up_process
<idle>-0 1dnh4. 1us : task_woken_rt <-ttwu_do_activate
<idle>-0 1dnh4. 1us : _raw_spin_unlock <-try_to_wake_up
<idle>-0 1dnh4. 1us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dnh3. 1us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 1dnh3. 2us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 1dnh2. 2us : preempt_count_sub <-try_to_wake_up
<idle>-0 1dnh1. 2us : _raw_spin_lock_irq <-__hrtimer_run_queues
<idle>-0 1dnh1. 2us : preempt_count_add <-_raw_spin_lock_irq
<idle>-0 1dnh2. 2us : hrtimer_update_next_event <-hrtimer_interrupt
<idle>-0 1dnh2. 3us : __hrtimer_next_event_base <-hrtimer_update_next_event
<idle>-0 1dnh2. 3us : __hrtimer_next_event_base <-hrtimer_update_next_event
<idle>-0 1dnh2. 3us : _raw_spin_unlock_irqrestore <-hrtimer_interrupt
<idle>-0 1dnh2. 3us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 1dnh1. 3us : tick_program_event <-hrtimer_interrupt
<idle>-0 1dnh1. 4us : clockevents_program_event <-tick_program_event
<idle>-0 1dnh1. 4us : ktime_get <-clockevents_program_event
<idle>-0 1dnh1. 4us : arch_counter_read <-ktime_get
<idle>-0 1dnh1. 4us : arch_timer_set_next_event_phys <-clockevents_program_event
<idle>-0 1dnh1. 5us!: gic_eoimode1_eoi_irq <-handle_percpu_devid_irq
<idle>-0 1dnh1. 347us : generic_handle_domain_irq <-gic_handle_irq
<idle>-0 1dnh1. 347us : __irq_resolve_mapping <-generic_handle_domain_irq
<idle>-0 1dnh1. 347us : __rcu_read_lock <-__irq_resolve_mapping
<idle>-0 1dnh1. 347us : __rcu_read_unlock <-__irq_resolve_mapping
<idle>-0 1dnh1. 348us : handle_percpu_devid_irq <-generic_handle_domain_irq
<idle>-0 1dnh1. 348us : ipi_handler <-handle_percpu_devid_irq
<idle>-0 1dNh1. 348us : gic_eoimode1_eoi_irq <-handle_percpu_devid_irq
<idle>-0 1dNh1. 352us : irq_exit_rcu <-el1_interrupt
<idle>-0 1dN.1. 352us : idle_cpu <-__irq_exit_rcu
<idle>-0 1.N.1. 352us : arch_cpu_idle_exit <-do_idle
<idle>-0 1.N.1. 352us : tick_nohz_idle_exit <-do_idle
<idle>-0 1dN.1. 353us : ktime_get <-tick_nohz_idle_exit
<idle>-0 1dN.1. 353us : arch_counter_read <-ktime_get
<idle>-0 1dN.1. 353us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
<idle>-0 1dN.1. 353us : _raw_spin_lock <-tick_do_update_jiffies64
<idle>-0 1dN.1. 353us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dN.2. 354us : calc_global_load <-tick_do_update_jiffies64
<idle>-0 1dN.2. 354us : _raw_spin_unlock <-tick_do_update_jiffies64
<idle>-0 1dN.2. 354us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dN.1. 354us : update_wall_time <-tick_do_update_jiffies64
<idle>-0 1dN.1. 355us : timekeeping_advance <-update_wall_time
<idle>-0 1dN.1. 355us : _raw_spin_lock_irqsave <-timekeeping_advance
<idle>-0 1dN.1. 355us : preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 1dN.2. 355us : arch_counter_read <-timekeeping_advance
<idle>-0 1dN.2. 355us : ntp_tick_length <-timekeeping_advance
<idle>-0 1dN.2. 356us : ntp_tick_length <-timekeeping_advance
<idle>-0 1dN.2. 356us : timekeeping_update_from_shadow.constprop.0 <-timekeeping_advance
<idle>-0 1dN.2. 356us : ntp_get_next_leap <-timekeeping_update_from_shadow.constprop.0
<idle>-0 1dN.2. 357us : update_vsyscall <-timekeeping_update_from_shadow.constprop.0
<idle>-0 1dN.2. 357us : raw_notifier_call_chain <-timekeeping_update_from_shadow.constprop.0
<idle>-0 1dN.2. 357us : notifier_call_chain <-raw_notifier_call_chain
<idle>-0 1dN.2. 357us : _raw_spin_unlock_irqrestore <-timekeeping_advance
<idle>-0 1dN.2. 357us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 1dN.1. 358us : timer_clear_idle <-tick_nohz_idle_exit
<idle>-0 1dN.1. 358us : tmigr_cpu_activate <-timer_clear_idle
<idle>-0 1dN.1. 358us : _raw_spin_lock <-tmigr_cpu_activate
<idle>-0 1dN.1. 358us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dN.2. 358us : __tmigr_cpu_activate <-tmigr_cpu_activate
<idle>-0 1dN.2. 361us : tmigr_active_up <-__tmigr_cpu_activate
<idle>-0 1dN.2. 361us : _raw_spin_unlock <-tmigr_cpu_activate
<idle>-0 1dN.2. 361us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dN.1. 362us : calc_load_nohz_stop <-tick_nohz_idle_exit
<idle>-0 1dN.1. 362us : tick_nohz_restart <-tick_nohz_idle_exit
<idle>-0 1dN.1. 362us : hrtimer_cancel <-tick_nohz_restart
<idle>-0 1dN.1. 362us : hrtimer_active <-hrtimer_cancel
<idle>-0 1dN.1. 362us : hrtimer_forward <-tick_nohz_restart
<idle>-0 1dN.1. 362us : hrtimer_start_range_ns <-tick_nohz_restart
<idle>-0 1dN.1. 363us : _raw_spin_lock_irqsave <-hrtimer_start_range_ns
<idle>-0 1dN.1. 363us : preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 1dN.2. 363us : enqueue_hrtimer <-hrtimer_start_range_ns
<idle>-0 1dN.2. 363us : hrtimer_reprogram.constprop.0 <-hrtimer_start_range_ns
<idle>-0 1dN.2. 364us : tick_program_event <-hrtimer_reprogram.constprop.0
<idle>-0 1dN.2. 364us : clockevents_program_event <-tick_program_event
<idle>-0 1dN.2. 365us : ktime_get <-clockevents_program_event
<idle>-0 1dN.2. 365us : arch_counter_read <-ktime_get
<idle>-0 1dN.2. 365us : arch_timer_set_next_event_phys <-clockevents_program_event
<idle>-0 1dN.2. 365us : _raw_spin_unlock_irqrestore <-hrtimer_start_range_ns
<idle>-0 1dN.2. 365us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 1dN.1. 366us : account_idle_ticks <-tick_nohz_idle_exit
<idle>-0 1dN.1. 366us : account_idle_time <-account_idle_ticks
<idle>-0 1dN.1. 366us : irq_enter_rcu <-el1_interrupt
<idle>-0 1dNh1. 366us : tick_irq_enter <-irq_enter_rcu
<idle>-0 1dNh1. 367us : tick_check_oneshot_broadcast_this_cpu <-tick_irq_enter
<idle>-0 1dNh1. 367us : do_interrupt_handler <-el1_interrupt
<idle>-0 1dNh1. 367us : gic_handle_irq <-call_on_irq_stack
<idle>-0 1dNh1. 368us : generic_handle_domain_irq <-gic_handle_irq
<idle>-0 1dNh1. 368us : __irq_resolve_mapping <-generic_handle_domain_irq
<idle>-0 1dNh1. 368us : __rcu_read_lock <-__irq_resolve_mapping
<idle>-0 1dNh1. 368us : __rcu_read_unlock <-__irq_resolve_mapping
<idle>-0 1dNh1. 368us : handle_percpu_devid_irq <-generic_handle_domain_irq
<idle>-0 1dNh1. 369us : ipi_handler <-handle_percpu_devid_irq
<idle>-0 1dNh1. 369us : generic_smp_call_function_single_interrupt <-ipi_handler
<idle>-0 1dNh1. 369us : __flush_smp_call_function_queue <-generic_smp_call_function_single_interrupt
<idle>-0 1dNh1. 370us : rto_push_irq_work_func <-irq_work_single
<idle>-0 1dNh1. 370us : raw_spin_rq_lock_nested <-rto_push_irq_work_func
<idle>-0 1dNh1. 370us : preempt_count_add <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 370us : _raw_spin_lock <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 370us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dNh3. 371us : preempt_count_sub <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 371us : push_rt_task <-rto_push_irq_work_func
<idle>-0 1dNh2. 371us : pick_next_pushable_task <-push_rt_task
<idle>-0 1dNh2. 371us : resched_curr <-push_rt_task
<idle>-0 1dNh2. 371us : __resched_curr.constprop.0 <-resched_curr
<idle>-0 1dNh2. 372us : raw_spin_rq_unlock <-rto_push_irq_work_func
<idle>-0 1dNh2. 372us : _raw_spin_unlock <-raw_spin_rq_unlock
<idle>-0 1dNh2. 372us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dNh1. 372us : _raw_spin_lock <-rto_push_irq_work_func
<idle>-0 1dNh1. 373us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dNh2. 373us : rto_next_cpu <-rto_push_irq_work_func
<idle>-0 1dNh2. 373us : _raw_spin_unlock <-rto_push_irq_work_func
<idle>-0 1dNh2. 373us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dNh1. 373us : preempt_count_add <-irq_work_queue_on
<idle>-0 1dNh2. 374us : arch_irq_work_raise <-__irq_work_queue_local
<idle>-0 1dNh2. 374us : smp_cross_call <-arch_irq_work_raise
<idle>-0 1dNh2. 374us : __ipi_send_mask <-smp_cross_call
<idle>-0 1dNh2. 375us : gic_ipi_send_mask <-__ipi_send_mask
<idle>-0 1dNh2. 377us : preempt_count_sub <-irq_work_queue_on
<idle>-0 1dNh1. 377us : gic_eoimode1_eoi_irq <-handle_percpu_devid_irq
<idle>-0 1dNh1. 382us : generic_handle_domain_irq <-gic_handle_irq
<idle>-0 1dNh1. 382us : __irq_resolve_mapping <-generic_handle_domain_irq
<idle>-0 1dNh1. 382us : __rcu_read_lock <-__irq_resolve_mapping
<idle>-0 1dNh1. 383us : __rcu_read_unlock <-__irq_resolve_mapping
<idle>-0 1dNh1. 383us : handle_percpu_devid_irq <-generic_handle_domain_irq
<idle>-0 1dNh1. 383us : ipi_handler <-handle_percpu_devid_irq
<idle>-0 1dNh1. 383us : rto_push_irq_work_func <-irq_work_single
<idle>-0 1dNh1. 383us : raw_spin_rq_lock_nested <-rto_push_irq_work_func
<idle>-0 1dNh1. 384us : preempt_count_add <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 384us : _raw_spin_lock <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 384us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dNh3. 384us : preempt_count_sub <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 384us : push_rt_task <-rto_push_irq_work_func
<idle>-0 1dNh2. 384us : pick_next_pushable_task <-push_rt_task
<idle>-0 1dNh2. 385us : resched_curr <-push_rt_task
<idle>-0 1dNh2. 385us : __resched_curr.constprop.0 <-resched_curr
<idle>-0 1dNh2. 385us : raw_spin_rq_unlock <-rto_push_irq_work_func
<idle>-0 1dNh2. 385us : _raw_spin_unlock <-raw_spin_rq_unlock
<idle>-0 1dNh2. 385us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dNh1. 386us : _raw_spin_lock <-rto_push_irq_work_func
<idle>-0 1dNh1. 386us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dNh2. 386us : rto_next_cpu <-rto_push_irq_work_func
<idle>-0 1dNh2. 386us : _raw_spin_unlock <-rto_push_irq_work_func
<idle>-0 1dNh2. 386us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dNh1. 387us : preempt_count_add <-irq_work_queue_on
<idle>-0 1dNh2. 387us : arch_irq_work_raise <-__irq_work_queue_local
<idle>-0 1dNh2. 387us : smp_cross_call <-arch_irq_work_raise
<idle>-0 1dNh2. 387us : __ipi_send_mask <-smp_cross_call
<idle>-0 1dNh2. 388us!: gic_ipi_send_mask <-__ipi_send_mask
<idle>-0 1dNh2. 495us : preempt_count_sub <-irq_work_queue_on
<idle>-0 1dNh1. 496us!: gic_eoimode1_eoi_irq <-handle_percpu_devid_irq
<idle>-0 1dNh1. 816us : generic_handle_domain_irq <-gic_handle_irq
<idle>-0 1dNh1. 816us : __irq_resolve_mapping <-generic_handle_domain_irq
<idle>-0 1dNh1. 816us : __rcu_read_lock <-__irq_resolve_mapping
<idle>-0 1dNh1. 816us : __rcu_read_unlock <-__irq_resolve_mapping
<idle>-0 1dNh1. 816us : handle_percpu_devid_irq <-generic_handle_domain_irq
<idle>-0 1dNh1. 816us : ipi_handler <-handle_percpu_devid_irq
<idle>-0 1dNh1. 817us : rto_push_irq_work_func <-irq_work_single
<idle>-0 1dNh1. 817us : raw_spin_rq_lock_nested <-rto_push_irq_work_func
<idle>-0 1dNh1. 817us : preempt_count_add <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 817us : _raw_spin_lock <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 817us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dNh3. 818us : preempt_count_sub <-raw_spin_rq_lock_nested
<idle>-0 1dNh2. 818us : push_rt_task <-rto_push_irq_work_func
<idle>-0 1dNh2. 818us : pick_next_pushable_task <-push_rt_task
<idle>-0 1dNh2. 818us : resched_curr <-push_rt_task
<idle>-0 1dNh2. 819us : __resched_curr.constprop.0 <-resched_curr
<idle>-0 1dNh2. 819us : raw_spin_rq_unlock <-rto_push_irq_work_func
<idle>-0 1dNh2. 819us : _raw_spin_unlock <-raw_spin_rq_unlock
<idle>-0 1dNh2. 819us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dNh1. 819us : _raw_spin_lock <-rto_push_irq_work_func
<idle>-0 1dNh1. 820us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dNh2. 820us : rto_next_cpu <-rto_push_irq_work_func
<idle>-0 1dNh2. 820us : _raw_spin_unlock <-rto_push_irq_work_func
<idle>-0 1dNh2. 820us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dNh1. 820us : sched_put_rd <-rto_push_irq_work_func
<idle>-0 1dNh1. 820us!: gic_eoimode1_eoi_irq <-handle_percpu_devid_irq
<idle>-0 1dNh1. 966us : generic_handle_domain_irq <-gic_handle_irq
<idle>-0 1dNh1. 966us : __irq_resolve_mapping <-generic_handle_domain_irq
<idle>-0 1dNh1. 967us : __rcu_read_lock <-__irq_resolve_mapping
<idle>-0 1dNh1. 968us : __rcu_read_unlock <-__irq_resolve_mapping
<idle>-0 1dNh1. 968us : handle_percpu_devid_irq <-generic_handle_domain_irq
<idle>-0 1dNh1. 969us : arch_timer_handler_phys <-handle_percpu_devid_irq
<idle>-0 1dNh1. 969us : hrtimer_interrupt <-arch_timer_handler_phys
<idle>-0 1dNh1. 969us : _raw_spin_lock_irqsave <-hrtimer_interrupt
<idle>-0 1dNh1. 969us : preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 1dNh2. 969us : ktime_get_update_offsets_now <-hrtimer_interrupt
<idle>-0 1dNh2. 970us : arch_counter_read <-ktime_get_update_offsets_now
<idle>-0 1dNh2. 970us : __hrtimer_run_queues <-hrtimer_interrupt
<idle>-0 1dNh2. 970us : _raw_spin_unlock_irqrestore <-__hrtimer_run_queues
<idle>-0 1dNh2. 970us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 1dNh1. 970us : tick_nohz_handler <-__hrtimer_run_queues
<idle>-0 1dNh1. 971us : ktime_get <-tick_nohz_handler
<idle>-0 1dNh1. 971us : arch_counter_read <-ktime_get
<idle>-0 1dNh1. 971us : update_process_times <-tick_nohz_handler
<idle>-0 1dNh1. 971us : account_process_tick <-update_process_times
<idle>-0 1dNh1. 972us : account_idle_time <-account_process_tick
<idle>-0 1dNh1. 972us : hrtimer_run_queues <-update_process_times
<idle>-0 1dNh1. 972us : tmigr_requires_handle_remote <-update_process_times
<idle>-0 1dNh1. 972us : get_jiffies_update <-tmigr_requires_handle_remote
<idle>-0 1dNh1. 972us : rcu_sched_clock_irq <-update_process_times
<idle>-0 1dNh1. 973us : rcu_is_cpu_rrupt_from_idle <-rcu_sched_clock_irq
<idle>-0 1dNh1. 973us : rcu_segcblist_ready_cbs <-rcu_sched_clock_irq
<idle>-0 1dNh1. 973us : rcu_is_cpu_rrupt_from_idle <-rcu_sched_clock_irq
<idle>-0 1dNh1. 973us : sched_tick <-update_process_times
<idle>-0 1dNh1. 974us : topology_scale_freq_tick <-sched_tick
<idle>-0 1dNh1. 974us : preempt_count_add <-sched_tick
<idle>-0 1dNh2. 974us : _raw_spin_lock <-sched_tick
<idle>-0 1dNh2. 974us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dNh3. 974us : preempt_count_sub <-sched_tick
<idle>-0 1dNh2. 975us : update_rq_clock <-sched_tick
<idle>-0 1dNh2. 975us : task_tick_idle <-sched_tick
<idle>-0 1dNh2. 975us : calc_global_load_tick <-sched_tick
<idle>-0 1dNh2. 975us : task_tick_mm_cid <-sched_tick
<idle>-0 1dNh2. 975us : _raw_spin_unlock <-sched_tick
<idle>-0 1dNh2. 976us : preempt_count_sub <-_raw_spin_unlock
<idle>-0 1dNh1. 976us : perf_event_task_tick <-sched_tick
<idle>-0 1dNh1. 976us : perf_adjust_freq_unthr_context <-perf_event_task_tick
<idle>-0 1dNh1. 976us : __rcu_read_lock <-perf_event_task_tick
<idle>-0 1dNh1. 977us : __rcu_read_unlock <-perf_event_task_tick
<idle>-0 1dNh1. 977us : idle_cpu <-sched_tick
<idle>-0 1dNh1. 977us : sched_balance_trigger <-sched_tick
<idle>-0 1dNh1. 977us : nohz_balance_exit_idle <-sched_balance_trigger
<idle>-0 1dNh1. 978us : __rcu_read_lock <-nohz_balance_exit_idle
<idle>-0 1dNh1. 978us : __rcu_read_unlock <-nohz_balance_exit_idle
<idle>-0 1dNh1. 978us : housekeeping_cpumask <-sched_balance_trigger
<idle>-0 1dNh1. 978us : idle_cpu <-sched_balance_trigger
<idle>-0 1dNh1. 979us : smp_call_function_single_async <-sched_balance_trigger
<idle>-0 1dNh1. 979us : preempt_count_add <-smp_call_function_single_async
<idle>-0 1dNh2. 979us : generic_exec_single <-smp_call_function_single_async
<idle>-0 1dNh2. 979us : __smp_call_single_queue <-generic_exec_single
<idle>-0 1dNh2. 980us : call_function_single_prep_ipi <-__smp_call_single_queue
<idle>-0 1dNh2. 980us : arch_send_call_function_single_ipi <-__smp_call_single_queue
<idle>-0 1dNh2. 980us : smp_cross_call <-arch_send_call_function_single_ipi
<idle>-0 1dNh2. 980us : __ipi_send_mask <-smp_cross_call
<idle>-0 1dNh2. 980us!: gic_ipi_send_mask <-__ipi_send_mask
<idle>-0 1dNh2. 1304us : preempt_count_sub <-smp_call_function_single_async
<idle>-0 1dNh1. 1305us : run_posix_cpu_timers <-update_process_times
<idle>-0 1dNh1. 1305us : profile_tick <-tick_nohz_handler
<idle>-0 1dNh1. 1305us : profile_pc <-profile_tick
<idle>-0 1dNh1. 1306us : profile_pc_cb <-arch_stack_walk
<idle>-0 1dNh1. 1306us : hrtimer_forward <-tick_nohz_handler
<idle>-0 1dNh1. 1306us : _raw_spin_lock_irq <-__hrtimer_run_queues
<idle>-0 1dNh1. 1306us : preempt_count_add <-_raw_spin_lock_irq
<idle>-0 1dNh2. 1306us : enqueue_hrtimer <-__hrtimer_run_queues
<idle>-0 1dNh2. 1307us : hrtimer_update_next_event <-hrtimer_interrupt
<idle>-0 1dNh2. 1307us : __hrtimer_next_event_base <-hrtimer_update_next_event
<idle>-0 1dNh2. 1307us : __hrtimer_next_event_base <-hrtimer_update_next_event
<idle>-0 1dNh2. 1307us : _raw_spin_unlock_irqrestore <-hrtimer_interrupt
<idle>-0 1dNh2. 1307us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 1dNh1. 1308us : tick_program_event <-hrtimer_interrupt
<idle>-0 1dNh1. 1308us : clockevents_program_event <-tick_program_event
<idle>-0 1dNh1. 1308us : ktime_get <-clockevents_program_event
<idle>-0 1dNh1. 1308us : arch_counter_read <-ktime_get
<idle>-0 1dNh1. 1309us : arch_timer_set_next_event_phys <-clockevents_program_event
<idle>-0 1dNh1. 1309us : gic_eoimode1_eoi_irq <-handle_percpu_devid_irq
<idle>-0 1dNh1. 1312us : irq_exit_rcu <-el1_interrupt
<idle>-0 1dN.1. 1312us : idle_cpu <-__irq_exit_rcu
<idle>-0 1.N.1. 1313us : flush_smp_call_function_queue <-do_idle
<idle>-0 1.N.1. 1313us : schedule_idle <-do_idle
<idle>-0 1dN.1. 1313us : rcu_note_context_switch <-__schedule
<idle>-0 1dN.1. 1313us : preempt_count_add <-__schedule
<idle>-0 1dN.2. 1313us : _raw_spin_lock <-__schedule
<idle>-0 1dN.2. 1314us : preempt_count_add <-_raw_spin_lock
<idle>-0 1dN.3. 1314us : preempt_count_sub <-__schedule
<idle>-0 1dN.2. 1314us : update_rq_clock <-__schedule
<idle>-0 1dN.2. 1314us : pick_task_stop <-__schedule
<idle>-0 1dN.2. 1314us : pick_task_dl <-__schedule
<idle>-0 1dN.2. 1315us : pick_task_rt <-__schedule
<idle>-0 1dN.2. 1315us : put_prev_task_idle <-__schedule
<idle>-0 1dN.2. 1315us : dl_scaled_delta_exec <-put_prev_task_idle
<idle>-0 1dN.2. 1315us : update_rt_rq_load_avg <-set_next_task_rt
<idle>-0 1d..3. 1316us : __schedule <-schedule_idle
<idle>-0 1d..3. 1316us : 0:120:R ==> [001] 15842: 0:R cyclictest
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-08 9:48 ` Florian Paul Schmidt
@ 2025-01-28 12:30 ` Tim Sander
0 siblings, 0 replies; 20+ messages in thread
From: Tim Sander @ 2025-01-28 12:30 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Florian Paul Schmidt; +Cc: linux-rt-users
Hi
Just my 2 ct.
Am Mittwoch, 8. Januar 2025, 10:48:56 CET schrieb Florian Paul Schmidt:
> Hi!
>
> On 1/7/2025 3:39 PM, Sebastian Andrzej Siewior wrote:
> > On 2024-12-20 14:18:43 [+0100], Florian Paul Schmidt wrote:
> [...]
>
> > The backtrace appears long, but everything up to folios_put_refs()
> > appears preemptible. Then there is the interrupt handler. This looks
> > okay.
>
> That was my impression as well. Nothing really surprising (at least for
> my limited knowledge in these matters.)
>
> > Based on this trace, I don't see a section which disables preemption/
> > interrupts and would be the reason for the delayed interrupt.
> >
> > One thing that could slow you down is small caches.
> > Then there is looking within the system: an inaccurate/low-resolution
> > clocksource and/or timer device.
It could also be the TLB infrastructure. I have found that TLB events can
be very time-consuming, probably in combination with memory contention.
I think this could be traced with performance counters, but I never looked
into it further than musing about the possibility.
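For the record, something along these lines might be a starting point
(untested on the Pi 5 by me; the generic perf event aliases may map
differently, or not at all, on the Cortex-A76 PMU — `perf list` shows
what is actually available):

```shell
# Measure TLB and LLC miss rates system-wide while the memory stressor
# from earlier in the thread runs. The event names are generic perf
# aliases and may need replacing with raw PMU event codes on the Pi 5.
EVENTS="dTLB-load-misses,iTLB-load-misses,cache-misses,cache-references"
LOAD="stress -m 1 --vm-bytes 1G --vm-stride 16 --vm-keep --timeout 10s"

if command -v perf >/dev/null 2>&1 && command -v stress >/dev/null 2>&1; then
    # -a: count on all CPUs, so misses caused by the stressor on other
    # cores show up alongside those of the measured workload
    perf stat -a -e "$EVENTS" -- $LOAD
else
    echo "perf and/or stress not installed; would run: perf stat -a -e $EVENTS -- $LOAD"
fi
```

Comparing the miss rates between the idle case and the one-stressor case
would at least tell us whether TLB pressure tracks the latency spikes.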
Best regards
Tim
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-01-14 10:28 ` Mike Galbraith
@ 2025-02-14 8:31 ` Florian Paul Schmidt
2025-02-14 9:05 ` Mike Galbraith
2025-02-14 12:06 ` Gabriele Monaco
0 siblings, 2 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2025-02-14 8:31 UTC (permalink / raw)
To: Mike Galbraith, Leon Woestenberg; +Cc: John Ogness, linux-rt-users
Hi again!
On 1/14/2025 11:28 AM, Mike Galbraith wrote:
[...]
> This seems rather odd, one core diddling memory is the worst case. Any
> arm experts out there know why that would be?
>
> pi5, locked at 2.4GHz:
>
> taskset -c 3 stress -m 1 --vm-bytes 1G --vm-stride 16 --vm-keep --timeout 60s;killall cyclictest
>
> T: 0 (23157) P:99 I:1000 C: 63421 Min: 1 Act: 12 Avg: 27 Max: 1378
> T: 1 (23158) P:99 I:1500 C: 42327 Min: 1 Act: 32 Avg: 29 Max: 1734
> T: 2 (23159) P:99 I:2000 C: 31744 Min: 1 Act: 18 Avg: 28 Max: 1353
> T: 3 (23160) P:99 I:2500 C: 25393 Min: 1 Act: 11 Avg: 6 Max: 39
>
> With all cores diddling ram, things are far from lovely, but cyclictest
> wakeup latency improves drastically. As you increase the number of
> diddlers 1 -> 4, the latency numbers improve until you get all the way
> up to.. merely awful.
>
> stress -m 4 --vm-bytes 1G --vm-stride 16 --vm-keep --timeout 60s;killall cyclictest
>
> T: 0 (23276) P:99 I:1000 C: 62706 Min: 1 Act: 49 Avg: 48 Max: 220
> T: 1 (23277) P:99 I:1500 C: 41800 Min: 1 Act: 53 Avg: 62 Max: 274
> T: 2 (23278) P:99 I:2000 C: 31348 Min: 1 Act: 45 Avg: 65 Max: 302
> T: 3 (23279) P:99 I:2500 C: 25077 Min: 1 Act: 32 Avg: 54 Max: 175
[...]
I wonder if there is still a way to save the day, i.e. to use the RPi 5 as a
somewhat capable RT system, by
1. making sure that the RT load does not do anything nasty - this is
mostly under the users' control, since they can set up their RT load to
not do this.
2. preventing other tasks on the system from thrashing the memory bus.
These are not under the users' direct control if they wish to run a
general-purpose Linux system besides the RT load.
Is there any mechanism in the Linux kernel to achieve 2.? It would
probably come with a significant slowdown of the non-RT tasks, but that
would be acceptable, at least for me.
Kind regards,
FPS
--
https://blog.dfdx.eu
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-02-14 8:31 ` Florian Paul Schmidt
@ 2025-02-14 9:05 ` Mike Galbraith
2025-02-14 12:40 ` gene heskett
2025-02-14 12:06 ` Gabriele Monaco
1 sibling, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2025-02-14 9:05 UTC (permalink / raw)
To: Florian Paul Schmidt, Leon Woestenberg; +Cc: John Ogness, linux-rt-users
On Fri, 2025-02-14 at 09:31 +0100, Florian Paul Schmidt wrote:
> Hi again!
>
> On 1/14/2025 11:28 AM, Mike Galbraith wrote:
>
> [...]
>
> > This seems rather odd, one core diddling memory is the worst case. Any
> > arm experts out there know why that would be?
> >
> > pi5, locked at 2.4GHz:
> >
> > taskset -c 3 stress -m 1 --vm-bytes 1G --vm-stride 16 --vm-keep --timeout 60s;killall cyclictest
> >
> > T: 0 (23157) P:99 I:1000 C: 63421 Min: 1 Act: 12 Avg: 27 Max: 1378
> > T: 1 (23158) P:99 I:1500 C: 42327 Min: 1 Act: 32 Avg: 29 Max: 1734
> > T: 2 (23159) P:99 I:2000 C: 31744 Min: 1 Act: 18 Avg: 28 Max: 1353
> > T: 3 (23160) P:99 I:2500 C: 25393 Min: 1 Act: 11 Avg: 6 Max: 39
> >
>
> I wonder if there is still a way to save the day, i.e. use the RPI5 as a
> somewhat capable RT-system by
>
> 1. making sure that the RT load does not do anything nasty - this is
> mostly under the users' control since they can set up their RT load to
> not do this.
I think that's the answer: don't expect miracles from cute/dinky boxen.
Beating the crap out of any box is only useful for finding its limits; it
has nothing to do with "RT capable". A box is RT capable iff a given RT
load's hard constraints can be met on that box. Both of our stress
results show only that if you need amazing memory performance, the dinky
pi5 is not for you :)
-Mike
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-02-14 8:31 ` Florian Paul Schmidt
2025-02-14 9:05 ` Mike Galbraith
@ 2025-02-14 12:06 ` Gabriele Monaco
1 sibling, 0 replies; 20+ messages in thread
From: Gabriele Monaco @ 2025-02-14 12:06 UTC (permalink / raw)
To: Florian Paul Schmidt
Cc: John Ogness, linux-rt-users, Mike Galbraith, Leon Woestenberg
On Fri, 2025-02-14 at 09:31 +0100, Florian Paul Schmidt wrote:
>
> I wonder if there is still a way to save the day, i.e. use the RPI5
> as a
> somewhat capable RT-system by
>
> 1. making sure that the RT load does not do anything nasty - this is
> mostly under the users' control since they can set up their RT load to
> not do this.
>
> 2. preventing other tasks on the system from thrashing the memory bus.
> These are not under direct control by the users if they wish to run a
> general purpose Linux system besides the RT load.
>
> Is there any mechanism in the Linux kernel to achieve 2.? This would
> probably come with significant slowdown of the non-RT tasks, but that
> would be acceptable at least for me.
>
Hi Florian,
as far as I'm aware, there's not much in the Linux kernel to achieve 2.,
at least without hardware support (e.g. Intel MBA and friends).
What you describe really looks to me like MemGuard
(https://github.com/heechul/memguard). There's a lot of research on
similar approaches: essentially, you assume that accesses to the bus
correspond to LLC (last-level cache) misses, as those are usually
easier to measure with PMUs (on-chip performance counters). Then you
give each core a budget and throttle the cores that have exhausted
theirs.
MemGuard is the closest to a ready-made solution (though a bit old),
but you can find several papers on the topic, including some that
throttle tasks rather than whole cores, so you wouldn't need to play
with isolating the few cores on the Pi.
This requires a bit of crafting, but I hope it gives you some insight.
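To make the idea concrete, a MemGuard-style loop could be sketched in
userspace roughly like this. Purely illustrative: the cgroup path, the
per-window budget, and the perf event name are all assumptions, and a
real implementation (like MemGuard itself) does the sampling and
throttling in the kernel with per-core PMU overflow interrupts:

```shell
#!/bin/sh
# Userspace sketch of a MemGuard-style throttle: sample LLC misses each
# window, and when a CPU exhausts its miss budget, throttle a
# (hypothetical) "nonrt" cgroup holding the non-RT tasks via cpu.max.

BUDGET=100000                # assumed LLC-miss budget per 100 ms window
CG=/sys/fs/cgroup/nonrt     # hypothetical cgroup for the non-RT tasks

# Pure decision logic: succeeds (exit 0) when the throttle should engage.
should_throttle() {          # usage: should_throttle MISSES BUDGET
    [ "$1" -gt "$2" ]
}

# Sample LLC load misses on CPU 0 for one 100 ms window. CSV output
# (-x,) puts the counter value in field 1; the event name varies by PMU.
sample_llc_misses() {
    perf stat -x, -e LLC-load-misses -C 0 -- sleep 0.1 2>&1 | cut -d, -f1
}

throttle_loop() {
    while :; do
        if should_throttle "$(sample_llc_misses)" "$BUDGET"; then
            echo "10000 100000" > "$CG/cpu.max"   # cap non-RT tasks at 10%
        else
            echo "max 100000" > "$CG/cpu.max"     # lift the cap
        fi
    done
}

# throttle_loop   # uncomment on a system with perf and the cgroup set up
```

Throttling CPU time is of course only a crude proxy for throttling
bus accesses, which is why the papers work with miss budgets directly.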
Best,
Gabriele
* Re: Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
2025-02-14 9:05 ` Mike Galbraith
@ 2025-02-14 12:40 ` gene heskett
0 siblings, 0 replies; 20+ messages in thread
From: gene heskett @ 2025-02-14 12:40 UTC (permalink / raw)
To: Mike Galbraith, Florian Paul Schmidt, Leon Woestenberg
Cc: John Ogness, linux-rt-users
On 2/14/25 04:05, Mike Galbraith wrote:
> On Fri, 2025-02-14 at 09:31 +0100, Florian Paul Schmidt wrote:
>> Hi again!
>>
>> On 1/14/2025 11:28 AM, Mike Galbraith wrote:
>>
>> [...]
>>
>>> This seems rather odd, one core diddling memory is the worst case. Any
>>> arm experts out there know why that would be?
>>>
>>> pi5, locked at 2.4GHz:
>>>
>>> taskset -c 3 stress -m 1 --vm-bytes 1G --vm-stride 16 --vm-keep --timeout 60s;killall cyclictest
>>>
>>> T: 0 (23157) P:99 I:1000 C: 63421 Min: 1 Act: 12 Avg: 27 Max: 1378
>>> T: 1 (23158) P:99 I:1500 C: 42327 Min: 1 Act: 32 Avg: 29 Max: 1734
>>> T: 2 (23159) P:99 I:2000 C: 31744 Min: 1 Act: 18 Avg: 28 Max: 1353
>>> T: 3 (23160) P:99 I:2500 C: 25393 Min: 1 Act: 11 Avg: 6 Max: 39
>>>
>> I wonder if there is still a way to save the day, i.e. use the RPI5 as a
>> somewhat capable RT-system by
>>
>> 1. making sure that the RT load does not do anything nasty - this is
>> mostly under the users' control since they can set up their RT load to
>> not do this.
> I think that's the answer, don't expect miracles from cute/dinky boxen.
>
> Beating crap out of any box is only useful for finding limits, has
> nothing to do with "RT capable". A box is RT capable iff a given RT
> load's hard constraints can be met on that box. Both of our stress
> results show only that if you need amazing memory performance, dinky
> pi5 is not for you :)
>
> -Mike
This is all well and good, Mike; you are actively researching the arm
stuff, which needs some TLC.
A decade+ back up the log, I undertook to run an old Sheldon lathe with
a Pi 3B, using a Mesa 7i90HD SPI card and a trio of 7i42TAs to isolate
the 5 volt signals from the 3.3 volt logic in the 7i90HD. It worked
fairly well but stuttered a bit from running out of data. So I switched
it to an RPi 4B, which does the job nicely. Then I changed the steppers
to 50 volt stepper/servos and threw out the PIDs. That's where it still
is today. The servos doubled the speeds. But that Pi is still running on
a 64 GB SD card. Enter the Banana Pi M5 with its built-in M.2-like
soldered-on memory, only 4 GB but many times faster than the micro SD
card. How would that work if the BPI-M5 were to boot and run from the
memory it has? Bears investigating, IMO. I am booting them from micro SD
on my 3D printers and running them quite well. How to boot from that
memory is well hidden, if it's possible at all. Hints appreciated.
Thanks, Mike.
Cheers, Gene Heskett, CET.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
end of thread, other threads: [~2025-02-14 12:41 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-20 13:18 Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3) Florian Paul Schmidt
2024-12-20 14:05 ` John Ogness
2024-12-20 15:57 ` Florian Paul Schmidt
2024-12-22 10:32 ` Leon Woestenberg
2025-01-08 9:42 ` Florian Paul Schmidt
2025-01-12 15:14 ` Florian Paul Schmidt
2025-01-12 21:30 ` John Ogness
2025-01-13 9:15 ` Florian Paul Schmidt
2025-01-13 10:28 ` Florian Paul Schmidt
2025-01-13 11:09 ` John Ogness
2025-01-13 12:56 ` Florian Paul Schmidt
2025-01-14 10:28 ` Mike Galbraith
2025-02-14 8:31 ` Florian Paul Schmidt
2025-02-14 9:05 ` Mike Galbraith
2025-02-14 12:40 ` gene heskett
2025-02-14 12:06 ` Gabriele Monaco
2025-01-09 8:39 ` Florian Paul Schmidt
2025-01-07 14:39 ` Sebastian Andrzej Siewior
2025-01-08 9:48 ` Florian Paul Schmidt
2025-01-28 12:30 ` Tim Sander
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox