public inbox for linux-rt-users@vger.kernel.org
* Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3)
@ 2024-12-20 13:18 Florian Paul Schmidt
  2024-12-20 14:05 ` John Ogness
  2025-01-07 14:39 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 20+ messages in thread
From: Florian Paul Schmidt @ 2024-12-20 13:18 UTC (permalink / raw)
  To: linux-rt-users

Hi!

I'm playing around with a Raspberry Pi 5 and a PREEMPT_RT kernel (raspberry
pi 6.13.y branch) and wonder about the latencies I see. First I tried
cyclictest and saw maximum latencies in the upper 50s up to around 80 us
under a kernel compile load.
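For reference, the numbers above came from a run along these lines (the
exact flags and priority are my reconstruction, not necessarily what was
actually used; requires root and the rt-tests package):

```shell
# Hypothetical cyclictest invocation -- flags assumed, not taken from the
# original run.
#   -m        lock memory (mlockall) to avoid page faults during measurement
#   -S        one measurement thread per CPU
#   -p 90     run the measurement threads at SCHED_FIFO priority 90
#   -i 200    200 us wake-up interval
#   -D 10m    run for 10 minutes
cyclictest -m -S -p 90 -i 200 -D 10m

# ...while generating the compile load in another terminal, e.g.:
make -j"$(nproc)"
```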

So I rebuilt the kernel with the timerlat and osnoise tracers enabled and ran

rtla timerlat top -a 60 --dma-latency 0

and I get traces like the following:


                                      Timer Latency
   0 00:00:22   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)
CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max
   0 #21138     |        4         0         0        14 |       17         3         4        48
   1 #21138     |        6         0         0        21 |       41         3         4        50
   2 #21138     |       11         0         0        11 |       61         3         4        61
   3 #21138     |        5         0         0        10 |       40         3         4        57
rtla timerlat hit stop tracing
## CPU 2 hit stop tracing, analyzing it ##
   IRQ handler delay:                                         1.36 us (2.22 %)
   IRQ latency:                                              11.60 us
   Timerlat IRQ duration:                                    21.26 us (34.63 %)
   Blocking thread:                                          19.46 us (31.70 %)
                              cc1:265581                     19.46 us
     Blocking thread stack trace
                 -> timerlat_irq
                 -> __hrtimer_run_queues
                 -> hrtimer_interrupt
                 -> arch_timer_handler_phys
                 -> handle_percpu_devid_irq
                 -> generic_handle_domain_irq
                 -> gic_handle_irq
                 -> call_on_irq_stack
                 -> do_interrupt_handler
                 -> el0_interrupt
                 -> __el0_irq_handler_common
                 -> el0t_64_irq_handler
                 -> el0t_64_irq
------------------------------------------------------------------------
   Thread latency:                                           61.40 us (100%)

Max timerlat IRQ latency from idle: 4.74 us in cpu 2
   Saving trace to timerlat_trace.txt

And the surprising thing was that there was nothing out of the ordinary (at
least to my non-expert eyes): no spin locks or similar. Just the timer IRQ
interrupting a user-space task (el0t_64_irq is an interrupt taken from EL0,
i.e. user space, right?), which then goes straight into the timer handling.
I might be completely off here though, as I am not an expert at all in these
things.

When I contrast this with an unloaded system and a stop threshold of 10 us,
there actually is "something" in the trace:
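The unloaded run was presumably the same invocation as above with the lower
stop threshold, i.e. something like:

```shell
# Same rtla invocation as before, but stopping at a 10 us thread latency:
#   -a 10            auto-analysis mode: stop tracing once 10 us is exceeded
#   --dma-latency 0  hold /dev/cpu_dma_latency at 0 to keep CPUs out of
#                    deep idle states during the measurement
rtla timerlat top -a 10 --dma-latency 0
```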

                                      Timer Latency
   0 00:00:04   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)
CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max
   0 #4004      |        0         0         0         1 |        3         3         3         6
   1 #4004      |        1         0         0         1 |       10         3         3        10
   2 #4003      |        0         0         0         0 |        3         3         3         7
   3 #4003      |        0         0         0         1 |        3         3         3         7
rtla timerlat hit stop tracing
## CPU 1 hit stop tracing, analyzing it ##
   IRQ handler delay:                                         0.00 us (0.00 %)
   IRQ latency:                                               1.40 us
   Timerlat IRQ duration:                                     4.67 us (45.67 %)
   Blocking thread:                                           3.17 us (30.98 %)
                             htop:262562                      3.17 us
     Blocking thread stack trace
                 -> timerlat_irq
                 -> __hrtimer_run_queues
                 -> hrtimer_interrupt
                 -> arch_timer_handler_phys
                 -> handle_percpu_devid_irq
                 -> generic_handle_domain_irq
                 -> gic_handle_irq
                 -> call_on_irq_stack
                 -> do_interrupt_handler
                 -> el1_interrupt
                 -> el1h_64_irq_handler
                 -> el1h_64_irq
                 -> tracer_preempt_on
                 -> preempt_count_sub
                 -> migrate_enable
                 -> kmem_cache_free
                 -> __fput
                 -> __fput_sync
                 -> __arm64_sys_close
                 -> invoke_syscall
                 -> el0_svc_common.constprop.0
                 -> do_el0_svc
                 -> el0_svc
                 -> el0t_64_sync_handler
                 -> el0t_64_sync
------------------------------------------------------------------------
   Thread latency:                                           10.22 us (100%)

Max timerlat IRQ latency from idle: 1.43 us in cpu 1
   Saving trace to timerlat_trace.txt

In this case it's htop invoking a close() syscall. So I really wonder how to
interpret traces like the first one above, where "nothing" is happening
and there are still large-ish latencies.

At some point I wondered what kind of loads trigger the longest latencies,
so I played with "stress -m 6" a little, and I was surprised to see the
latencies climb even higher than 100 us. Here is an example trace:

                                      Timer Latency
   0 00:00:16   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)
CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max
   0 #15654     |        1         0         2        31 |        6         3        15        75
   1 #15654     |        2         0         3        42 |       13         3        17        91
   2 #15654     |        2         0         2        38 |       17         3        16        87
   3 #15654     |        5         0         3        26 |      111         3        16       111
rtla timerlat hit stop tracing
## CPU 3 hit stop tracing, analyzing it ##
   IRQ handler delay:                                         0.00 us (0.00 %)
   IRQ latency:                                               5.16 us
   Timerlat IRQ duration:                                    47.41 us (42.53 %)
   Blocking thread:                                          50.50 us (45.30 %)
                           stress:278226                     50.50 us
     Blocking thread stack trace
                 -> timerlat_irq
                 -> __hrtimer_run_queues
                 -> hrtimer_interrupt
                 -> arch_timer_handler_phys
                 -> handle_percpu_devid_irq
                 -> generic_handle_domain_irq
                 -> gic_handle_irq
                 -> call_on_irq_stack
                 -> do_interrupt_handler
                 -> el1_interrupt
                 -> el1h_64_irq_handler
                 -> el1h_64_irq
                 -> folios_put_refs
                 -> free_pages_and_swap_cache
                 -> __tlb_batch_free_encoded_pages
                 -> tlb_flush_mmu
                 -> unmap_page_range
                 -> unmap_vmas
                 -> vms_clear_ptes
                 -> vms_complete_munmap_vmas
                 -> do_vmi_align_munmap
                 -> do_vmi_munmap
                 -> __vm_munmap
                 -> __arm64_sys_munmap
                 -> invoke_syscall
                 -> el0_svc_common.constprop.0
                 -> do_el0_svc
                 -> el0_svc
                 -> el0t_64_sync_handler
                 -> el0t_64_sync
------------------------------------------------------------------------
   Thread latency:                                          111.47 us (100%)

Max timerlat IRQ latency from idle: 1.98 us in cpu 3
   Saving trace to timerlat_trace.txt


OK, here at least something related to virtual memory is happening.
I still wonder though: how come the latencies are so large? Is the
Raspberry Pi platform just not suited to reaching lower latencies? Is
everything working as expected? Are maybe the Raspberry Pi
patches responsible for latency sources that are invisible in the
traces? Is there any hope of getting these latencies a little more
under control?
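(For what it's worth, "stress -m 6" spawns six VM workers that loop on
allocating, dirtying and freeing memory, which fits the munmap()/TLB-flush
path in the trace above. Spelled out, and assuming the stress defaults:)

```shell
# Roughly what "stress -m 6" expands to: six VM workers, each repeatedly
# malloc()ing, dirtying and free()ing 256 MB (the stress default size).
# The constant unmapping is what drives the munmap/TLB-flush churn seen
# in the blocking thread stack trace.
stress --vm 6 --vm-bytes 256M
```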

Kind regards,
FPS


Thread overview:
2024-12-20 13:18 Raspberry Pi 5 and PREEMPT_RT (6.13.0-rc3) Florian Paul Schmidt
2024-12-20 14:05 ` John Ogness
2024-12-20 15:57   ` Florian Paul Schmidt
2024-12-22 10:32     ` Leon Woestenberg
2025-01-08  9:42       ` Florian Paul Schmidt
2025-01-12 15:14       ` Florian Paul Schmidt
2025-01-12 21:30         ` John Ogness
2025-01-13  9:15           ` Florian Paul Schmidt
2025-01-13 10:28             ` Florian Paul Schmidt
2025-01-13 11:09             ` John Ogness
2025-01-13 12:56               ` Florian Paul Schmidt
2025-01-14 10:28         ` Mike Galbraith
2025-02-14  8:31           ` Florian Paul Schmidt
2025-02-14  9:05             ` Mike Galbraith
2025-02-14 12:40               ` gene heskett
2025-02-14 12:06             ` Gabriele Monaco
2025-01-09  8:39   ` Florian Paul Schmidt
2025-01-07 14:39 ` Sebastian Andrzej Siewior
2025-01-08  9:48   ` Florian Paul Schmidt
2025-01-28 12:30     ` Tim Sander
