* Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB
@ 2025-06-10 11:23 Marc Strämke
2025-06-13 14:54 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-10 11:23 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]
Hello everyone, I am reposting to LKML as I am not sure the rt-users
mailing list is read by many people (I hope that is okay).
On an AMD Ryzen Embedded machine I am experiencing strange latency
spikes in cyclictest and need some hints on how to debug this further.
The system typically has maximum latencies of 88 us and averages of 4-8 us,
which is more than sufficient for my application, but I saw some spikes
of many hundreds of us in testing.
I can provoke latencies of more than 500-1000 us by invoking "ip l set
enp1s0 promisc off" on the first network interface. The network
interface is an "Intel Corporation I210 Gigabit Network Connection"
using the igb driver.
I tried more or less all tracers but am not knowledgeable enough to make
sense of the output. irqsoff and wakeup_rt trace output is attached.
Can anyone point me in the right direction? I am not sure how to
interpret the function and function_graph tracer output in a
meaningful way. As mainly a user of the kernel I am a bit overwhelmed
by the interaction of the scheduler, RCU and so on.
I have attached my config for reference.
Kind Regards
Marc
[-- Attachment #2: irqsoff_trace --]
[-- Type: text/plain, Size: 2114 bytes --]
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 6.15.1-dirty
# --------------------------------------------------------------------
# latency: 208 us, #15/15, CPU#1 | (M:PREEMPT_RT VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: ktimers/1-31 (uid:0 nice:0 policy:1 rt_prio:1)
# -----------------
# => started at: handle_softirqs.constprop.0
# => ended at: run_ktimerd
#
#
# _------=> CPU#
# / _-----=> irqs-off/BH-disabled
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / _-=> migrate-disable
# ||||| / delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
ktimers/-31 1d.s.2 0us : handle_softirqs.constprop.0
ktimers/-31 1d.s.2 0us : irq_disable: caller=handle_softirqs.constprop.0+0x2d8/0x300 parent=0x0
ktimers/-31 1d.s.2 0us : __local_bh_enable(cnt=256, unlock=1) <-run_ktimerd
ktimers/-31 1d...2 0us : __rcu_read_unlock() <-__local_bh_enable
ktimers/-31 1d...2 1us+: rt_spin_unlock(lock=0xffff983497d17660) <-__local_bh_enable
ktimers/-31 1d...2 99us : migrate_enable() <-rt_spin_unlock
ktimers/-31 1d...1 99us : __rcu_read_unlock() <-rt_spin_unlock
ktimers/-31 1d...1 99us : migrate_enable() <-run_ktimerd
ktimers/-31 1d..11 100us!: preempt_disable: caller=run_ktimerd+0x4f/0x90 parent=run_ktimerd+0x4f/0x90
ktimers/-31 1d..1. 208us : preempt_enable: caller=run_ktimerd+0x4f/0x90 parent=run_ktimerd+0x4f/0x90
ktimers/-31 1d.... 209us : irq_enable: caller=run_ktimerd+0x78/0x90 parent=0x0
ktimers/-31 1d.... 209us : run_ktimerd
ktimers/-31 1d..1. 209us : preempt_disable: caller=_raw_spin_lock_irqsave+0x23/0x60 parent=0x0
ktimers/-31 1d.... 209us!: tracer_hardirqs_on <-run_ktimerd
ktimers/-31 1d.... 581us : <stack trace>
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
=> ret_from_fork_asm
[-- Attachment #3: wakeup_rt_trace --]
[-- Type: text/plain, Size: 9760 bytes --]
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 6.15.1-dirty
# --------------------------------------------------------------------
# latency: 411 us, #89/89, CPU#1 | (M:PREEMPT_RT VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: cyclictest-218784 (uid:0 nice:0 policy:1 rt_prio:95)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off/BH-disabled
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / _-=> migrate-disable
# ||||| / delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
<idle>-0 1dNh5. 0us : 0:120:R + [001] 218784: 4:R cyclictest
<idle>-0 1dNh5. 8us : <stack trace>
=> __ftrace_trace_stack
=> probe_wakeup
=> __traceiter_sched_wakeup
=> ttwu_do_activate
=> try_to_wake_up
=> hrtimer_wakeup
=> __hrtimer_run_queues
=> hrtimer_interrupt
=> __sysvec_apic_timer_interrupt
=> sysvec_apic_timer_interrupt
=> asm_sysvec_apic_timer_interrupt
=> cpu_idle_poll.isra.0
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> common_startup_64
<idle>-0 1dNh5. 8us : 0
<idle>-0 1dNh4. 8us : task_woken_rt(rq=0xffff983497d2b400, p=0xffff9833c394a000) <-ttwu_do_activate
<idle>-0 1dNh4. 8us : _raw_spin_unlock(lock=0xffff983497d2b400) <-try_to_wake_up
<idle>-0 1dNh4. 8us+: preempt_count_sub(val=1) <-_raw_spin_unlock
<idle>-0 1dNh3. 74us : _raw_spin_unlock_irqrestore(lock=0xffff9833c394a8d8, flags=134) <-try_to_wake_up
<idle>-0 1dNh3. 74us : preempt_count_sub(val=1) <-_raw_spin_unlock_irqrestore
<idle>-0 1dNh1. 75us : hrtimer_expire_exit: hrtimer=00000000129a7594
<idle>-0 1dNh1. 75us+: _raw_spin_lock_irq(lock=0xffff983497d1b700) <-__hrtimer_run_queues
<idle>-0 1dNh1. 112us : preempt_count_add(val=1) <-_raw_spin_lock_irq
<idle>-0 1dNh2. 112us : hrtimer_update_next_event(cpu_base=0xffff983497d1b700) <-hrtimer_interrupt
<idle>-0 1dNh2. 112us : __hrtimer_next_event_base(cpu_base=0xffff983497d1b700, exclude=0x0, active=1, expires_next=9223372036854775807) <-hrtimer_interrupt
<idle>-0 1dNh2. 112us : _raw_spin_unlock_irqrestore(lock=0xffff983497d1b700, flags=134) <-hrtimer_interrupt
<idle>-0 1dNh2. 112us : preempt_count_sub(val=1) <-_raw_spin_unlock_irqrestore
<idle>-0 1dNh1. 113us : tick_program_event(expires=57049782000000, force=0) <-hrtimer_interrupt
<idle>-0 1dNh1. 113us : clockevents_program_event(dev=0xffff983497d17380, expires=57049782000000, force=0) <-hrtimer_interrupt
<idle>-0 1dNh1. 113us : ktime_get() <-clockevents_program_event
<idle>-0 1dNh1. 113us : lapic_next_event(delta=838, evt=0xffff983497d17380) <-clockevents_program_event
<idle>-0 1dNh1. 113us : local_timer_exit: vector=236
<idle>-0 1dNh1. 113us : irq_exit_rcu() <-sysvec_apic_timer_interrupt
<idle>-0 1dNh1. 113us : preempt_count_sub(val=65536) <-__irq_exit_rcu
<idle>-0 1dN.1. 113us : wake_up_process(p=0xffff9833c0366000) <-__irq_exit_rcu
<idle>-0 1dN.1. 113us : try_to_wake_up(p=0xffff9833c0366000, state=3, wake_flags=0) <-__irq_exit_rcu
<idle>-0 1dN.2. 114us : _raw_spin_lock_irqsave(lock=0xffff9833c03668d8) <-try_to_wake_up
<idle>-0 1dN.2. 114us : preempt_count_add(val=1) <-_raw_spin_lock_irqsave
<idle>-0 1dN.3. 114us : sched_waking: comm=ktimers/1 pid=31 prio=98 target_cpu=001
<idle>-0 1dN.3. 114us : kthread_is_per_cpu(p=0xffff9833c0366000) <-try_to_wake_up
<idle>-0 1dN.3. 114us : ttwu_queue_wakelist(p=0xffff9833c0366000, cpu=1, wake_flags=8) <-try_to_wake_up
<idle>-0 1dN.4. 114us : _raw_spin_lock(lock=0xffff983497d2b400) <-try_to_wake_up
<idle>-0 1dN.4. 114us : preempt_count_add(val=1) <-_raw_spin_lock
<idle>-0 1dN.4. 114us+: update_rq_clock(rq=0xffff983497d2b400) <-try_to_wake_up
<idle>-0 1dN.4. 149us : ttwu_do_activate(rq=0xffff983497d2b400, p=0xffff9833c0366000, wake_flags=8, rf=0xffffb7c340160f88) <-try_to_wake_up
<idle>-0 1dN.4. 149us : enqueue_task(rq=0xffff983497d2b400, p=0xffff9833c0366000, flags=9) <-ttwu_do_activate
<idle>-0 1dN.4. 149us+: enqueue_task_rt(rq=0xffff983497d2b400, p=0xffff9833c0366000, flags=9) <-enqueue_task
<idle>-0 1dN.4. 185us : dequeue_rt_stack(rt_se=0xffff9833c0366180, flags=9) <-enqueue_task_rt
<idle>-0 1dN.4. 186us : wakeup_preempt(rq=0xffff983497d2b400, p=0xffff9833c0366000, flags=8) <-ttwu_do_activate
<idle>-0 1dN.4. 186us : __resched_curr.constprop.0() <-wakeup_preempt
<idle>-0 1dN.5. 186us : __traceiter_sched_wakeup(__data=0x0, p=0xffff9833c0366000) <-ttwu_do_activate
<idle>-0 1dN.4. 186us : sched_wakeup: comm=ktimers/1 pid=31 prio=98 target_cpu=001
<idle>-0 1dN.4. 186us : task_woken_rt(rq=0xffff983497d2b400, p=0xffff9833c0366000) <-ttwu_do_activate
<idle>-0 1dN.4. 186us : _raw_spin_unlock(lock=0xffff983497d2b400) <-try_to_wake_up
<idle>-0 1dN.4. 186us : preempt_count_sub(val=1) <-_raw_spin_unlock
<idle>-0 1dN.3. 187us : _raw_spin_unlock_irqrestore(lock=0xffff9833c03668d8, flags=146) <-try_to_wake_up
<idle>-0 1dN.3. 187us : preempt_count_sub(val=1) <-_raw_spin_unlock_irqrestore
<idle>-0 1dN.1. 187us : idle_cpu(cpu=1) <-__irq_exit_rcu
<idle>-0 1dN.1. 187us : irq_enable: caller=irqentry_exit+0x50/0x70 parent=0x0
<idle>-0 1dN.1. 187us!: rcu_watching: Endirq 1 0 0x5cc
<idle>-0 1dN.1. 295us : rcu_watching: Start 0 1 0x5d4
<idle>-0 1dN.1. 295us : cpu_idle: state=4294967295 cpu_id=1
<idle>-0 1dN.2. 295us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
<idle>-0 1.N.1. 295us : arch_cpu_idle_exit() <-do_idle
<idle>-0 1dN.2. 296us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
<idle>-0 1dN.2. 296us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
<idle>-0 1.N.1. 296us : tick_nohz_idle_exit() <-do_idle
<idle>-0 1dN.2. 296us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
<idle>-0 1dN.1. 296us : irq_disable: caller=tick_nohz_idle_exit+0x75/0x140 parent=0x0
<idle>-0 1dN.1. 296us : irq_enable: caller=tick_nohz_idle_exit+0x64/0x140 parent=0x0
<idle>-0 1dN.2. 296us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
<idle>-0 1.N.1. 296us!: flush_smp_call_function_queue() <-do_idle
<idle>-0 1dN.2. 404us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
<idle>-0 1dN.2. 405us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
<idle>-0 1.N.1. 405us : schedule_idle() <-cpu_startup_entry
<idle>-0 1dN.2. 405us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
<idle>-0 1dN.2. 405us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
<idle>-0 1.N.1. 405us : hrtimer_active(timer=0xffff983497d2bfe0) <-__schedule
<idle>-0 1dN.2. 405us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
<idle>-0 1dN.1. 405us : irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
<idle>-0 1dN.1. 405us : rcu_note_context_switch(preempt=0) <-__schedule
<idle>-0 1dN.1. 405us : rcu_utilization: Start context switch
<idle>-0 1dN.1. 405us : rcu_qs() <-rcu_note_context_switch
<idle>-0 1dN.1. 406us : rcu_utilization: End context switch
<idle>-0 1dN.1. 406us : raw_spin_rq_lock_nested.constprop.0() <-__schedule
<idle>-0 1dN.2. 406us : _raw_spin_lock(lock=0xffff983497d2b400) <-raw_spin_rq_lock_nested.constprop.0
<idle>-0 1dN.2. 406us : preempt_count_add(val=1) <-_raw_spin_lock
<idle>-0 1dN.2. 406us : update_rq_clock(rq=0xffff983497d2b400) <-__schedule
<idle>-0 1dN.2. 406us : pick_task_stop(rq=0xffff983497d2b400) <-__schedule
<idle>-0 1dN.2. 406us : pick_task_dl(rq=0xffff983497d2b400) <-__schedule
<idle>-0 1dN.2. 406us : pick_task_rt(rq=0xffff983497d2b400) <-__schedule
<idle>-0 1dN.2. 407us : put_prev_task_idle(rq=0xffff983497d2b400, prev=0xffff9833c035a000, next=0xffff9833c394a000) <-__schedule
<idle>-0 1dN.2. 407us : dl_scaled_delta_exec(rq=0xffff983497d2b400, dl_se=0xffff983497d2bce8, delta_exec=332332) <-put_prev_task_idle
<idle>-0 1dN.2. 407us : arch_scale_cpu_capacity(cpu=1) <-dl_scaled_delta_exec
<idle>-0 1dN.2. 407us : dequeue_pushable_task(rq=0xffff983497d2b400, p=0xffff9833c394a000) <-set_next_task_rt
<idle>-0 1dN.2. 407us : update_rt_rq_load_avg(now=57049782342572, rq=0xffff983497d2b400, running=0) <-set_next_task_rt
<idle>-0 1d..3. 407us : __traceiter_sched_switch(__data=0x0, preempt=0, prev=0xffff9833c035a000, next=0xffff9833c394a000, prev_state=0) <-__schedule
<idle>-0 1d..2. 407us : sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=218784 next_prio=4
<idle>-0 1d..3. 407us : __traceiter_sched_switch
<idle>-0 1d..3. 408us : 0:120:R ==> [001] 218784: 4:R cyclictest
<idle>-0 1d..3. 412us : <stack trace>
=> __ftrace_trace_stack
=> probe_wakeup_sched_switch
=> __traceiter_sched_switch
=> __schedule
=> schedule_idle
=> cpu_startup_entry
=> start_secondary
=> common_startup_64
[-- Attachment #4: config.gz --]
[-- Type: application/x-gzip, Size: 34225 bytes --]
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB
2025-06-10 11:23 Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB Marc Strämke
@ 2025-06-13 14:54 ` Sebastian Andrzej Siewior
[not found] ` <E1uQ6I0-000000003aa-37uJ@smtprelay05.ispgateway.de>
[not found] ` <522704b0-5a35-42c3-9bcc-627b487b740c@eltropuls.de>
0 siblings, 2 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-13 14:54 UTC (permalink / raw)
To: Marc Strämke; +Cc: linux-kernel, linux-rt-users
On 2025-06-10 13:23:13 [+0200], Marc Strämke wrote:
> Hello Everyone, I am reposting to LKML as I am not sure the rt-users
Hi,
> mailing list is read by many people (I hope that is okay).
>
> On an AMD Ryzen Embedded machine I am experiencing strange latency spikes in
> cyclictest and need some hints on how to debug this further.
>
> The system typically has maximum latencies of 88 us and averages of 4-8 us, which is
> more than sufficient for my application, but I saw some spikes of many
> hundreds of us in testing.
>
> I can provoke latencies of more than 500-1000 us by invoking "ip l set
> enp1s0 promisc off" on the first network interface. The network interface
> is an "Intel Corporation I210 Gigabit Network Connection" using the igb
> driver.
>
> I tried more or less all tracers but am not knowledgeable enough to make
> sense of the output. IRQSoff and wakeup_rt trace output attached.
I'm not sure what your two traces captured. The irqsoff trace captured
208us and it looks like an ordinary run_ktimerd() invocation.
There is not much going on.
The wakeup_rt trace shows the wakeup of cyclictest. It records 411us. Most
of it is the scheduler itself, with some 100us from
flush_smp_call_function_queue().
> Can anyone point me in the right direction? I am not sure how to interpret
> the function tracers and function_graph tracers output in a meaningful way.
> As mainly a user of of the kernel I am a bit overwhelmed by the interaction
> of the scheduler, RCU and so on..
Could you please try one of the following:
- enable the sched tracing events and tell cyclictest to break & stop
tracing? The options for cyclictest would be -b --tracemark.
The idea is to see the scheduler events before the delay happens. So if your
latency spike is 500us, you can try -b 490 or so (see the sketch after this list).
- use rtla. This might be easier and give you the backtrace of what you
are looking for
https://bristot.me/linux-scheduling-latency-debug-and-analysis/
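As a rough sketch of the first option (only -b and --tracemark are the
options meant above; the priority, interval and the 490 us threshold are
example values, adjust them to your setup):

  # enable the scheduler trace events, then let cyclictest stop tracing
  # once one sample exceeds the threshold
  echo 1 > /sys/kernel/tracing/events/sched/enable
  cyclictest -m -p95 -i1000 -b 490 --tracemark
  # afterwards, /sys/kernel/tracing/trace ends right at the spike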
> I have attached my config for reference.
There is nothing wrong with it. You might want to disable NO_HZ (and use
PERIODIC) and use HZ=250 (or less)
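In .config terms that would roughly be (a sketch; the exact set of NO_HZ
symbols depends on the kernel version):

  CONFIG_HZ_PERIODIC=y
  # CONFIG_NO_HZ_IDLE is not set
  # CONFIG_NO_HZ_FULL is not set
  CONFIG_HZ_250=y
  CONFIG_HZ=250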
> Kind Regards
>
> Marc
>
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
[not found] ` <E1uQ6I0-000000003aa-37uJ@smtprelay05.ispgateway.de>
@ 2025-06-13 19:58 ` Sebastian Andrzej Siewior
2025-06-14 8:52 ` Marc Strämke
0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-13 19:58 UTC (permalink / raw)
To: marc.straemke; +Cc: linux-kernel, linux-rt-users
On 2025-06-13 17:26:15 [+0200], marc.straemke wrote:
> Thanks Sebastian, I will do that tomorrow. To confirm: Just pure event
> tracing without the function tracer? (After enabling the sched events)
> Regards, Marc
The event tracing should narrow down which of the tasks causes the spike.
So if you say it is the ip command then you should see ip.
Step two would be the function tracer to narrow it down further. You
could start right away with the function tracer to see where the big gap
is.
rtla could speed up the whole process (via the timerlat auto analysis).
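For illustration, step two and the rtla shortcut could look roughly like
this (standard tracefs paths; the igb_* filter and the 500 us threshold are
examples):

  # step two: function tracer, narrowed to the suspected driver
  echo 'igb_*' > /sys/kernel/tracing/set_ftrace_filter
  echo function > /sys/kernel/tracing/current_tracer
  # or let rtla stop and auto-analyze once a 500 us latency is seen
  rtla timerlat top -a 500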
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-13 19:58 ` Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB Sebastian Andrzej Siewior
@ 2025-06-14 8:52 ` Marc Strämke
2025-06-16 15:10 ` Marc Strämke
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Marc Strämke @ 2025-06-14 8:52 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]
Sebastian,
I tried that in the past (rtla top auto analysis), but I do not really
understand the result:
rtla timerlat hit stop tracing
## CPU 1 hit stop tracing, analyzing it ##
IRQ handler delay: 0.00 us (0.00 %)
IRQ latency: 709.25 us
Blocking thread:
ip:3567
Blocking thread stack trace
-> timerlat_irq
-> __hrtimer_run_queues
-> hrtimer_interrupt
-> __sysvec_apic_timer_interrupt
-> sysvec_apic_timer_interrupt
-> asm_sysvec_apic_timer_interrupt
-> igb_update_mc_addr_list
-> igb_set_rx_mode
-> __dev_change_flags
-> netif_change_flags
-> do_setlink.constprop.0
-> rtnl_newlink
-> rtnetlink_rcv_msg
-> netlink_rcv_skb
-> netlink_unicast
-> netlink_sendmsg
-> ____sys_sendmsg
-> ___sys_sendmsg
-> __sys_sendmsg
-> do_syscall_64
-> entry_SYSCALL_64_after_hwframe
------------------------------------------------------------------------
IRQ latency: 709.25 us (100%)
I do not really understand where the IRQ/preemption disabling is
happening. What would the next thing to do be? Function (graph?) tracing
on all the functions visible in the backtrace?
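For reference, the function_graph setup I have in mind would be something
like this (standard tracefs paths; whether these two symbols are traceable
depends on them not being inlined):

  echo igb_set_rx_mode > /sys/kernel/tracing/set_graph_function
  echo igb_update_mc_addr_list >> /sys/kernel/tracing/set_graph_function
  echo function_graph > /sys/kernel/tracing/current_tracer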
I tried to look at the event trace output starting with the call to
igb_set_rx_mode. I have attached the trace with all events and a
function filter on igb, restricted to the CPU executing ip. I cannot
understand what is happening between timestamp 700.149995 and the IRQ
disable event at 700.150795.
Thanks for your help,
Marc
On 13.06.2025 at 21:58, Sebastian Andrzej Siewior wrote:
> On 2025-06-13 17:26:15 [+0200], marc.straemke wrote:
>> Thanks Sebastian, I will do that tomorrow. To confirm: Just pure event
>> tracing without the function tracer? (After enabling the sched events)
>> Regards, Marc
> The event tracing should narrow down which of the tasks causes the spike.
> So if you say it is the ip command then you should see ip.
> Step two would be the function tracer to narrow it down further. You
> could start right away with the function tracer to see where the big gap
> is.
> rtla could speed up the whole process (via the timerlat auto analysis).
>
> Sebastian
[-- Attachment #2: trace_after_igb_set_rx_mode --]
[-- Type: text/plain, Size: 11269 bytes --]
ip-4931 [001] ...1. 700.149994: preempt_disable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
ip-4931 [001] ...11 700.149994: preempt_enable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
ip-4931 [001] b...3 700.149994: igb_set_rx_mode <-__dev_change_flags
ip-4931 [001] b...3 700.149995: kmalloc: call_site=igb_set_rx_mode+0x4f8/0x5a0 ptr=00000000b0b4e5c8 bytes_req=12 bytes_alloc=16 gfp_flags=GFP_ATOMIC|__GFP_ZERO node=-1 accounted=false
ip-4931 [001] b...3 700.149995: igb_update_mc_addr_list <-igb_set_rx_mode
ip-4931 [001] Dn..3 700.150795: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
ip-4931 [001] Dnh.3 700.150795: preempt_disable: caller=sysvec_reschedule_ipi+0x24/0x130 parent=sysvec_reschedule_ipi+0x24/0x130
ip-4931 [001] Dnh.3 700.150795: reschedule_entry: vector=253
ip-4931 [001] DNh.3 700.150795: reschedule_exit: vector=253
ip-4931 [001] DNh.3 700.150796: preempt_enable: caller=sysvec_reschedule_ipi+0x8d/0x130 parent=sysvec_reschedule_ipi+0x8d/0x130
ip-4931 [001] DN.13 700.150796: preempt_disable: caller=irqentry_exit+0x64/0x70 parent=irqentry_exit+0x64/0x70
ip-4931 [001] DN.13 700.150796: irq_enable: caller=preempt_schedule_irq+0x3b/0xe0 parent=0x0
ip-4931 [001] DN.13 700.150796: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
ip-4931 [001] DNh13 700.150797: call_function_single_entry: vector=251
ip-4931 [001] DNh23 700.150798: ipi_send_cpu: cpu=1 callsite=irq_work_queue_on+0x109/0x120 callback=rto_push_irq_work_func+0x0/0xa0
ip-4931 [001] DNh13 700.150798: call_function_single_exit: vector=251
ip-4931 [001] DN.13 700.150798: irq_enable: caller=asm_sysvec_call_function_single+0x1a/0x20 parent=0x0
ip-4931 [001] DN.13 700.150799: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
ip-4931 [001] DNh13 700.150799: irq_work_entry: vector=246
ip-4931 [001] DNh13 700.150799: irq_work_exit: vector=246
ip-4931 [001] DN.13 700.150800: irq_enable: caller=asm_sysvec_irq_work+0x1a/0x20 parent=0x0
ip-4931 [001] DN.13 700.150800: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
ip-4931 [001] DNh13 700.150800: local_timer_entry: vector=236
ip-4931 [001] DNh23 700.150801: hrtimer_cancel: hrtimer=000000003093a798
ip-4931 [001] DNh13 700.150801: hrtimer_expire_entry: hrtimer=000000003093a798 function=hrtimer_wakeup now=700145368977
ip-4931 [001] DNh33 700.150802: sched_waking: comm=cyclictest pid=4893 prio=9 target_cpu=001
ip-4931 [001] DNh43 700.150802: sched_stat_sleep: comm=cyclictest pid=4893 delay=1223239 [ns]
ip-4931 [001] DNh43 700.150803: sched_wakeup: comm=cyclictest pid=4893 prio=9 target_cpu=001
ip-4931 [001] DNh13 700.150803: hrtimer_expire_exit: hrtimer=000000003093a798
ip-4931 [001] DNh23 700.150803: hrtimer_cancel: hrtimer=000000001da73767
ip-4931 [001] DNh13 700.150803: hrtimer_expire_entry: hrtimer=000000001da73767 function=tick_nohz_handler now=700145368977
ip-4931 [001] DNh13 700.150804: softirq_raise: vec=1 [action=TIMER]
ip-4931 [001] DNh13 700.150804: rcu_utilization: Start scheduler-tick
ip-4931 [001] DNh13 700.150805: rcu_utilization: End scheduler-tick
ip-4931 [001] DNh13 700.150805: read_msr: e8, value 1787ff7229c
ip-4931 [001] DNh13 700.150805: read_msr: e7, value 177daa8fd55
ip-4931 [001] DNh23 700.150805: sched_stat_runtime: comm=ip pid=4931 runtime=1226636 [ns]
ip-4931 [001] DNh13 700.150808: hrtimer_expire_exit: hrtimer=000000001da73767
ip-4931 [001] DNh23 700.150808: hrtimer_start: hrtimer=000000001da73767 function=tick_nohz_handler expires=700146000000 softexpires=700146000000 mode=ABS
ip-4931 [001] DNh13 700.150809: local_timer_exit: vector=236
ip-4931 [001] DN.33 700.150809: sched_waking: comm=ktimers/1 pid=31 prio=98 target_cpu=001
ip-4931 [001] DN.43 700.150809: sched_stat_sleep: comm=ktimers/1 pid=31 delay=3349898 [ns]
ip-4931 [001] DN.43 700.150810: sched_wakeup: comm=ktimers/1 pid=31 prio=98 target_cpu=001
ip-4931 [001] DN.13 700.150810: irq_enable: caller=asm_sysvec_apic_timer_interrupt+0x1a/0x20 parent=0x0
ip-4931 [001] DN.13 700.150810: irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
ip-4931 [001] DN.13 700.150810: rcu_utilization: Start context switch
ip-4931 [001] DN.23 700.150810: rcu_preempt_task: rcu_preempt 93628 4931
ip-4931 [001] DN.13 700.150811: rcu_utilization: End context switch
ip-4931 [001] DN.23 700.150811: sched_stat_runtime: comm=ip pid=4931 runtime=4037 [ns]
ip-4931 [001] DN.23 700.150812: sched_stat_wait: comm=cyclictest pid=4893 delay=7434 [ns]
ip-4931 [001] D..23 700.150813: sched_stat_wait: comm=ip pid=4931 delay=0 [ns]
ip-4931 [001] D..33 700.150814: hrtimer_start: hrtimer=0000000087697d14 function=inactive_task_timer expires=700464978086 softexpires=700464978086 mode=REL|HARD
ip-4931 [001] D..33 700.150814: hrtimer_cancel: hrtimer=0000000096921230
ip-4931 [001] D..33 700.150815: hrtimer_cancel: hrtimer=0000000087697d14
ip-4931 [001] D..33 700.150816: hrtimer_start: hrtimer=0000000096921230 function=dl_task_timer expires=701094857133 softexpires=701094857133 mode=ABS|HARD
ip-4931 [001] Dp.23 700.150816: sched_switch: prev_comm=ip prev_pid=4931 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=4893 next_prio=9
ip-4931 [001] Dp.23 700.150817: tlb_flush: pages:-1 reason:flush on task switch (0)
ip-4931 [001] Dp.23 700.150817: x86_fpu_regs_deactivated: x86/fpu: 00000000dd84b5fb load: 0 xfeatures: 2 xcomp_bv: 8000000000000007
cyclictest-4893 [001] dN.3. 700.150818: sched_stat_runtime: comm=cyclictest pid=4893 runtime=8747 [ns]
cyclictest-4893 [001] dN.3. 700.150819: sched_migrate_task: comm=rcu_preempt pid=17 prio=98 orig_cpu=1 dest_cpu=0
cyclictest-4893 [001] dN.3. 700.150819: sched_wake_idle_without_ipi: cpu=0
cyclictest-4893 [001] dN.1. 700.150820: irq_enable: caller=finish_task_switch.isra.0+0xb1/0x300 parent=0x0
cyclictest-4893 [001] dN.1. 700.150820: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
cyclictest-4893 [001] dNh1. 700.150820: call_function_single_entry: vector=251
cyclictest-4893 [001] dNh2. 700.150821: csd_queue_cpu: cpu=0 callsite=irq_work_queue_on+0xc6/0x120 func=rto_push_irq_work_func csd=00000000e2a8d506
cyclictest-4893 [001] dNh2. 700.150821: ipi_send_cpu: cpu=0 callsite=irq_work_queue_on+0xc6/0x120 callback=generic_smp_call_function_single_interrupt+0x0/0x20
cyclictest-4893 [001] dNh1. 700.150821: call_function_single_exit: vector=251
cyclictest-4893 [001] dN.1. 700.150822: irq_enable: caller=asm_sysvec_call_function_single+0x1a/0x20 parent=0x0
cyclictest-4893 [001] .N.1. 700.150822: preempt_enable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
cyclictest-4893 [001] .N.1. 700.150822: preempt_disable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
cyclictest-4893 [001] dN.1. 700.150822: irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
cyclictest-4893 [001] dN.1. 700.150822: rcu_utilization: Start context switch
cyclictest-4893 [001] dN.1. 700.150822: rcu_utilization: End context switch
cyclictest-4893 [001] d..1. 700.150823: irq_enable: caller=__schedule+0xb5e/0x1080 parent=0x0
cyclictest-4893 [001] ...1. 700.150823: preempt_enable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
cyclictest-4893 [001] ..... 700.150823: sys_exit: NR 230 = 0
cyclictest-4893 [001] d.... 700.150823: irq_disable: caller=syscall_exit_to_user_mode+0x1c4/0x240 parent=0x0
cyclictest-4893 [001] d.... 700.150824: irq_enable: caller=syscall_exit_to_user_mode+0xe4/0x240 parent=0x0
cyclictest-4893 [001] ..... 700.150824: rseq_update: cpu_id=1 node_id=0 mm_cid=0
cyclictest-4893 [001] d.... 700.150824: irq_disable: caller=syscall_exit_to_user_mode+0x203/0x240 parent=0x0
cyclictest-4893 [001] d.... 700.150825: x86_fpu_regs_activated: x86/fpu: 00000000ff492f85 load: 1 xfeatures: 2 xcomp_bv: 8000000000000007
cyclictest-4893 [001] d.... 700.150825: irq_enable: caller=syscall_exit_to_user_mode+0x7c/0x240 parent=0x0
cyclictest-4893 [001] d.... 700.150830: irq_disable: caller=do_syscall_64+0x45/0x1a0 parent=0x0
cyclictest-4893 [001] d.... 700.150830: irq_enable: caller=do_syscall_64+0x4a/0x1a0 parent=0x0
cyclictest-4893 [001] ..... 700.150831: sys_enter: NR 1 (4, 7f640320e2c0, 21, 7f640320d5b5, 0, 64)
cyclictest-4893 [001] ...1. 700.150832: preempt_disable: caller=vfs_write+0x306/0x430 parent=vfs_write+0x306/0x430
cyclictest-4893 [001] ...1. 700.150832: preempt_enable: caller=vfs_write+0x328/0x430 parent=vfs_write+0x328/0x430
cyclictest-4893 [001] ..... 700.150832: tracing_mark_write: hit latency threshold (759 > 700)
cyclictest-4893 [001] ...1. 700.150833: preempt_disable: caller=vfs_write+0x12f/0x430 parent=vfs_write+0x12f/0x430
cyclictest-4893 [001] ...1. 700.150833: preempt_enable: caller=vfs_write+0x151/0x430 parent=vfs_write+0x151/0x430
cyclictest-4893 [001] ...1. 700.150833: preempt_disable: caller=fput+0x1f/0x90 parent=fput+0x1f/0x90
cyclictest-4893 [001] ...1. 700.150833: preempt_enable: caller=fput+0x3f/0x90 parent=fput+0x3f/0x90
cyclictest-4893 [001] ..... 700.150833: sys_exit: NR 1 = 33
cyclictest-4893 [001] d.... 700.150834: irq_disable: caller=syscall_exit_to_user_mode+0x1c4/0x240 parent=0x0
cyclictest-4893 [001] d.... 700.150834: irq_enable: caller=syscall_exit_to_user_mode+0x7c/0x240 parent=0x0
cyclictest-4893 [001] d.... 700.150834: irq_disable: caller=do_syscall_64+0x45/0x1a0 parent=0x0
cyclictest-4893 [001] d.... 700.150834: irq_enable: caller=do_syscall_64+0x4a/0x1a0 parent=0x0
cyclictest-4893 [001] ..... 700.150834: sys_enter: NR 1 (5, 5556d858a461, 2, 7f640320d5b5, 0, 64)
cyclictest-4893 [001] ...1. 700.150835: preempt_disable: caller=vfs_write+0x306/0x430 parent=vfs_write+0x306/0x430
cyclictest-4893 [001] ...1. 700.150835: preempt_enable: caller=vfs_write+0x328/0x430 parent=vfs_write+0x328/0x430
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB
[not found] ` <522704b0-5a35-42c3-9bcc-627b487b740c@eltropuls.de>
@ 2025-06-14 18:13 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-14 18:13 UTC (permalink / raw)
To: Marc Strämke; +Cc: linux-kernel, linux-rt-users
On 2025-06-14 14:59:43 [+0200], Marc Strämke wrote:
> Hello Sebastian,
Hi,
> On 13.06.2025 at 16:54, Sebastian Andrzej Siewior wrote:
> > > I have attached my config for reference.
> > There is nothing wrong with it. You might want to disable NO_HZ (and use
> > PERIODIC) and use HZ=250 (or less)
>
> I am a bit puzzled by that. Most advice I have read suggests disabling the
> periodic tick to decrease jitter in an RT application. What is the tradeoff
> here?
TICK = periodic generates HZ timer interrupts per second. Each timer interrupt
serves pending timer_list timers within the kernel (not user space
timers such as clock_nanosleep()/sleep from userland - those are
scheduled accurately). It also does some timekeeping and accounting.
TICK = nohz makes it possible to skip ticks if the CPU is idle. A
notebook CPU which has nothing to do and goes idle can stay idle longer
because the HZ interrupt will not wake it up on the next tick. So
it can stay idle longer and save energy.
If the CPU does not go idle, the timer tick fires every tick.
With TICK = no_hz_full you can disable the tick if the CPU remains in
userland for a longer period of time. Remaining in userland means it does
not call into the kernel.
State-tracking overhead is higher with nohz, and higher still with no_hz_full.
Most RT applications are periodic. Cyclictest should show you lower numbers
with periodic than with nohz.
You can monitor the timer interrupts in /proc/interrupts and compare an
idle CPU vs one that runs an application (which does a busy loop).
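For example (LOC is the local timer interrupt line on x86; the 10 s
interval is arbitrary):

  grep 'LOC:' /proc/interrupts; sleep 10; grep 'LOC:' /proc/interrupts
  # with NO_HZ the per-CPU count barely moves on an idle CPU,
  # while a CPU running a busy loop ticks HZ times per second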
> Marc
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-14 8:52 ` Marc Strämke
@ 2025-06-16 15:10 ` Marc Strämke
2025-06-17 9:16 ` Sebastian Andrzej Siewior
2025-06-17 9:08 ` Sebastian Andrzej Siewior
2025-06-17 9:28 ` Sebastian Andrzej Siewior
2 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-16 15:10 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users
Hi Sebastian,
On 14.06.25 10:52, Marc Strämke wrote:
>
> I do not really understand where the IRQ/preemption disabling is
> happening. What would the next thing to do be? Function (graph?)
> tracing on all the functions visible in the backtrace?
>
>
> I tried to look at the event trace output starting with the call to
> igb_set_rx_mode. I have attached the trace with all events and a
> function filter on igb, restricted to the CPU executing ip. I cannot
> understand what is happening between timestamp 700.149995 and the IRQ
> disable event at 700.150795.
>
>
> Thanks for your help,
>
> Marc
>
>
I am still trying to figure that puzzle out: Please see the following
function trace snippet:
ip-690530 [000] ..... 178636.460435: rtnl_is_locked <-__dev_change_flags
ip-690530 [000] ..... 178636.460435: __local_bh_disable_ip <-__dev_change_flags
ip-690530 [000] ..... 178636.460435: migrate_disable <-__local_bh_disable_ip
ip-690530 [000] ...1. 178636.460435: preempt_disable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
ip-690530 [000] ...11 178636.460435: preempt_enable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
ip-690530 [000] ....1 178636.460435: rt_spin_lock <-__local_bh_disable_ip
ip-690530 [000] ....1 178636.460436: __rcu_read_lock <-rt_spin_lock
ip-690530 [000] ....1 178636.460436: migrate_disable <-__local_bh_disable_ip
ip-690530 [000] ....2 178636.460436: __rcu_read_lock <-__local_bh_disable_ip
ip-690530 [000] b...2 178636.460436: rt_spin_lock <-__dev_change_flags
ip-690530 [000] b...2 178636.460436: __rcu_read_lock <-rt_spin_lock
ip-690530 [000] b...2 178636.460436: migrate_disable <-__dev_change_flags
ip-690530 [000] b...3 178636.460436: __dev_set_rx_mode <-__dev_change_flags
ip-690530 [000] b...3 178636.460437: igb_set_rx_mode <-__dev_change_flags
It is my understanding that __local_bh_disable_ip called from
netif_addr_lock_bh seems to be the origin of my latency.
What I do not understand is: even if the bottom halves are disabled,
shouldn't I see the interrupt arriving in the trace?
If I understood it correctly, your talk "Unblocking the softirq lock for
PREEMPT_RT" at the Plumbers conference 2023 is exactly about that
case, right?
Probably fixing this issue is out of my abilities for now ;-) The vast
variety of locking concepts inside the kernel is quite intimidating for
a newcomer in kernel land...
Regards,
Marc
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-14 8:52 ` Marc Strämke
2025-06-16 15:10 ` Marc Strämke
@ 2025-06-17 9:08 ` Sebastian Andrzej Siewior
2025-06-17 9:28 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17 9:08 UTC (permalink / raw)
To: Marc Strämke; +Cc: linux-kernel, linux-rt-users
On 2025-06-14 10:52:36 [+0200], Marc Strämke wrote:
> Sebastian,
Hi,
> I tried that in the past (rtla top auto analysis), but I do not really
> understand the result:
>
> rtla timerlat hit stop tracing
> ## CPU 1 hit stop tracing, analyzing it ##
> IRQ handler delay: 0.00 us (0.00 %)
> IRQ latency: 709.25 us
> Blocking thread:
> ip:3567
> Blocking thread stack trace
> -> timerlat_irq
> -> __hrtimer_run_queues
> -> hrtimer_interrupt
> -> __sysvec_apic_timer_interrupt
> -> sysvec_apic_timer_interrupt
> -> asm_sysvec_apic_timer_interrupt
> -> igb_update_mc_addr_list
> -> igb_set_rx_mode
> -> __dev_change_flags
> -> netif_change_flags
> -> do_setlink.constprop.0
> -> rtnl_newlink
> -> rtnetlink_rcv_msg
> -> netlink_rcv_skb
> -> netlink_unicast
> -> netlink_sendmsg
> -> ____sys_sendmsg
> -> ___sys_sendmsg
> -> __sys_sendmsg
> -> do_syscall_64
> -> entry_SYSCALL_64_after_hwframe
> ------------------------------------------------------------------------
> IRQ latency: 709.25 us (100%)
>
>
> I do not really understand where the IRQ/Preemption disabling is happening.
> What would the next thing be to do? Function (graph?) tracing on all the
> functions visible in the backtrace?
So somewhere between entry_SYSCALL_64_after_hwframe() and timerlat_irq()
something happened that led to the spike. Up to __dev_change_flags() it
is mostly regular and asm_sysvec_apic_timer_interrupt() is already the
interrupt.
There is only one calling site for igb_update_mc_addr_list() and
everything before that looks harmless.
igb_update_mc_addr_list() has these loops where it writes to a FIFO
register. This is followed by wrfl(), which reads the status register.
This read forces the writes to go through the bus. What might happen is
that all the writes are "cached" and scheduled for a better time. That
read will force the writes to actually happen, and the CPU will stall
until this is done.
My guess would be that one of these writes causes the stall here.
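For reference, the pattern in question looks roughly like this (paraphrased
from the igb driver's igb_update_mc_addr_list(), not copied verbatim):

	/* a burst of posted MMIO writes to the multicast table array,
	 * flushed only once at the end
	 */
	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
		array_wr32(E1000_MTA, i, hw->mac.mta_shadow[i]);
	wrfl();	/* one status-register read flushes all of them at once */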
> I tried to look at the event trace output starting with the call to
> igb_set_rx_mode. I have attached the trace with all events and a function
> filter on igb, restricted to the CPU executing ip. I cannot understand what is
> happening between timestamp 700.149995 and the IRQ disable event at
> 700.150795.
>
> ip-4931 [001] b...3 700.149994: igb_set_rx_mode <-__dev_change_flags
> ip-4931 [001] b...3 700.149995: kmalloc: call_site=igb_set_rx_mode+0x4f8/0x5a0 ptr=00000000b0b4e5c8 bytes_req=12 bytes_alloc=16 gfp_flags=GFP_ATOMIC|__GFP_ZERO node=-1 accounted=false
> ip-4931 [001] b...3 700.149995: igb_update_mc_addr_list <-igb_set_rx_mode
> ip-4931 [001] Dn..3 700.150795: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
The CPU is somewhere in igb_update_mc_addr_list(). Based on the
timestamps it was there for 800us before the irq_disable() occurred.
That irq_disable() did not come from igb_update_mc_addr_list() itself but
from an incoming interrupt.
> Thanks for your help,
>
> Marc
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-16 15:10 ` Marc Strämke
@ 2025-06-17 9:16 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17 9:16 UTC (permalink / raw)
To: Marc Strämke; +Cc: linux-kernel, linux-rt-users
On 2025-06-16 17:10:33 [+0200], Marc Strämke wrote:
> Hi Sebastian,
Hi Marc,
> I am still trying to figure that puzzle out: Please see the following
> function trace snippet:
>
> ip-690530 [000] ..... 178636.460435: rtnl_is_locked <-__dev_change_flags
> ip-690530 [000] ..... 178636.460435: __local_bh_disable_ip <-__dev_change_flags
> ip-690530 [000] ..... 178636.460435: migrate_disable <-__local_bh_disable_ip
> ip-690530 [000] ...1. 178636.460435: preempt_disable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
> ip-690530 [000] ...11 178636.460435: preempt_enable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
> ip-690530 [000] ....1 178636.460435: rt_spin_lock <-__local_bh_disable_ip
> ip-690530 [000] ....1 178636.460436: __rcu_read_lock <-rt_spin_lock
> ip-690530 [000] ....1 178636.460436: migrate_disable <-__local_bh_disable_ip
> ip-690530 [000] ....2 178636.460436: __rcu_read_lock <-__local_bh_disable_ip
> ip-690530 [000] b...2 178636.460436: rt_spin_lock <-__dev_change_flags
> ip-690530 [000] b...2 178636.460436: __rcu_read_lock <-rt_spin_lock
> ip-690530 [000] b...2 178636.460436: migrate_disable <-__dev_change_flags
> ip-690530 [000] b...3 178636.460436: __dev_set_rx_mode <-__dev_change_flags
> ip-690530 [000] b...3 178636.460437: igb_set_rx_mode <-__dev_change_flags
>
> It is my understanding that __local_bh_disable_ip called from
> netif_addr_lock_bh seems to be the origin of my latency.
How so?
> What i do not understand is, even if the bottom halves are disabled.
> Shouldn't I see the interrupt arriving in the trace?
Yes.
> If i understood it correctly, your talk "Unblocking the softirq lock for
> PREEMPT_RT" during the plumbers conference 2023 is exactly about that case,
> right?
No.
> Probably fixing this issue is out of my abilities for now ;-) The vast
> variety of locking concepts inside the kernel is quite intimidating for a
> newcomer in kernel land...
If you had only the scheduler events enabled and saw
that "ip-690530" is doing something, then an interrupt fires, that
interrupt wakes a thread, the CPU switches to that thread, that thread
does sched_pi_setprio() and switches back to "ip-690530" until it is
done - _then_ it would be what I talked about at Plumbers in 2023.
Your trace snippet above is short, latency-wise. It covers just 2us.
From that it all looks good. If you go further down in your trace,
there has to be a spike between igb_update_mc_addr_list() and
asm_sysvec_apic_timer_interrupt().
> Regards,
>
>
> Marc
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-14 8:52 ` Marc Strämke
2025-06-16 15:10 ` Marc Strämke
2025-06-17 9:08 ` Sebastian Andrzej Siewior
@ 2025-06-17 9:28 ` Sebastian Andrzej Siewior
2025-06-17 9:45 ` Marc Strämke
2 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17 9:28 UTC (permalink / raw)
To: Marc Strämke; +Cc: linux-kernel, linux-rt-users
On 2025-06-14 10:52:36 [+0200], Marc Strämke wrote:
> ip-4931 [001] b...3 700.149995: igb_update_mc_addr_list <-igb_set_rx_mode
> ip-4931 [001] Dn..3 700.150795: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
Between those two functions you have an 800us delay. Interrupts are not
disabled, so the CPU must be stalling. As explained earlier, I expect the
read on the bus flushes the writes, causing the spike.
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-17 9:28 ` Sebastian Andrzej Siewior
@ 2025-06-17 9:45 ` Marc Strämke
2025-06-17 10:00 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-17 9:45 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users
Hi Sebastian,
On 17.06.25 11:28, Sebastian Andrzej Siewior wrote:
> Between those two functions you have 800us delay. Interrupts are not
> disabled so the CPU stalls. As explained earlier, I expect the read on
> the bus flushes the writes causing the spike.
>
So the delay you think is really the hardware (CPU) being stalled on the
bus for so long? Or do you mean that this is the reason for the long
runtime of the IGB function only?
Shouldn't the other core (it is a 2-core machine) still be able to handle
the timer interrupt then? (I did some testing with isolating ip on one
core and cyclictest on the other; I am not sure if I moved the timer IRQ
to the second core.)
My mistake was to search for where the interrupt was being disabled in
kernel space (I did not think that the hardware could introduce an IRQ
delay of 800us.)
Marc
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-17 9:45 ` Marc Strämke
@ 2025-06-17 10:00 ` Sebastian Andrzej Siewior
2025-06-17 10:03 ` Marc Strämke
0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17 10:00 UTC (permalink / raw)
To: Marc Strämke; +Cc: linux-kernel, linux-rt-users
On 2025-06-17 11:45:52 [+0200], Marc Strämke wrote:
> Hi Sebastian,
Hi Marc,
> On 17.06.25 11:28, Sebastian Andrzej Siewior wrote:
> > Between those two functions you have 800us delay. Interrupts are not
> > disabled so the CPU stalls. As explained earlier, I expect the read on
> > the bus flushes the writes causing the spike.
> >
> So the delay you think is really the hardware(CPU) being stalled on the bus
> for so long? Or do you mean that this is the reason for the long runtime of
Yes.
> the IGB function only?
Both. The bus is flushed, the CPU stalls until it is done, and then the
function takes long _and_, due to the stall, the interrupt cannot
fire any sooner.
> Shouldn't the other core (it is 2 core machine) still be able to handle the
> timer interrupt then? (I did some testing with isolating IP on one core and
You have two cores and two cyclictest threads - one on each CPU. This is
explicitly per-CPU. No migration between those two cores. Timers for CPU0
are always handled by CPU0.
Even if CPU1 handled CPU0's timers, it would wake cyclictest on
CPU0, but that thread would have to wait until CPU0 is done with the PCI
bus. CPU1 knows nothing about it.
> cyclictest on the other, I am not sure if I moved the timer IRQ to the
> second core)
> My mistake was to search for where the interrupt was being disabled in
> kernel space (I did not think that the hardware could introduce an IRQ delay
> of 800us.)
Yeah.
> Marc
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-17 10:00 ` Sebastian Andrzej Siewior
@ 2025-06-17 10:03 ` Marc Strämke
2025-06-17 10:15 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-17 10:03 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users
Hi Sebastian,
On 17.06.25 12:00, Sebastian Andrzej Siewior wrote:
> Even if CPU1 would handle CPU0's timers then it would wake cyclictest on
> CPU0 but that thread would have to wake until CPU0 is done with the PCI
> bus. CPU1 knows nothing about it.
Okay, then the latency I see on the other CPU must be from a PCI access
done by the second CPU, which stalls on the same shared bus.
Anyway: Thanks for your help Sebastian! I can probably live well with
these spikes in latency. I was more concerned that there is a deeper
issue with my config and the response time could be unbounded.
Marc
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-17 10:03 ` Marc Strämke
@ 2025-06-17 10:15 ` Sebastian Andrzej Siewior
2025-06-17 10:54 ` Kurt Kanzenbach
0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17 10:15 UTC (permalink / raw)
To: Marc Strämke; +Cc: linux-kernel, linux-rt-users
On 2025-06-17 12:03:31 [+0200], Marc Strämke wrote:
> Hi Sebastian,
Hi,
> On 17.06.25 12:00, Sebastian Andrzej Siewior wrote:
> > Even if CPU1 would handle CPU0's timers then it would wake cyclictest on
> > CPU0 but that thread would have to wake until CPU0 is done with the PCI
> > bus. CPU1 knows nothing about it.
>
> Okay, then the latency I see on the other CPU must be from a PCI access done
> by the second CPU, which stalls on the same shared bus.
Okay.
>
> Anyway: Thanks for your help Sebastian! I can probably live well with these
> spikes in latency. I was more concerned that there is a deeper issue with my
> config and the response time could be unbounded.
You don't have to live with it. You could add a read after the writes in
the loop (wrfl()). This should help.
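A sketch of that change (untested; it uses the register helpers the igb
driver already provides and assumes this loop is indeed the one that
stalls):

	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
		array_wr32(E1000_MTA, i, hw->mac.mta_shadow[i]);
		wrfl();	/* read back so each posted write is pushed out now,
			 * instead of stalling on one big flush at the end
			 */
	}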
Some Intel CPUs have an MSR bit to disable this kind of caching on the
PCI bus. Maybe the AMD CPU has this, too. Or there might be a switch in
the BIOS.
> Marc
Sebastian
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
2025-06-17 10:15 ` Sebastian Andrzej Siewior
@ 2025-06-17 10:54 ` Kurt Kanzenbach
[not found] ` <19a46896-3d26-415b-9820-730a42b0702e@eltropuls.de>
0 siblings, 1 reply; 15+ messages in thread
From: Kurt Kanzenbach @ 2025-06-17 10:54 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Marc Strämke; +Cc: linux-kernel, linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 612 bytes --]
On Tue Jun 17 2025, Sebastian Andrzej Siewior wrote:
> On 2025-06-17 12:03:31 [+0200], Marc Strämke wrote:
>> Anyway: Thanks for your help Sebastian! I can probably live well with these
>> spikes in latency. I was more concerned that there is a deeper issue with my
>> config and the response time could be unbounded.
>
> You don't have to live with it. You could add a read after the writes in
> the loop (wrfl()). This should help.
Something like this, which was done for Intel e1000?
https://lore.kernel.org/intel-wired-lan/20241219192743.4499-1-gerhard@engleder-embedded.com/
Thanks,
Kurt
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]
* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
[not found] ` <19a46896-3d26-415b-9820-730a42b0702e@eltropuls.de>
@ 2025-06-17 13:35 ` Kurt Kanzenbach
0 siblings, 0 replies; 15+ messages in thread
From: Kurt Kanzenbach @ 2025-06-17 13:35 UTC (permalink / raw)
To: Marc Strämke, Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 532 bytes --]
On Tue Jun 17 2025, Marc Strämke wrote:
> Hi Kurt,
> On 17.06.25 12:54, Kurt Kanzenbach wrote:
>>> You don't have to live with it. You could add a read after the writes in
>>> the loop (wrfl()). This should help.
>> Something like this, which was done for Intel e1000?
>>
>> https://lore.kernel.org/intel-wired-lan/20241219192743.4499-1-gerhard@engleder-embedded.com/
>>
>> Thanks,
>> Kurt
>
> This works wonders (latency spike down to 120 us from 700)! I will
> prepare a patch and submit it to LKML.
Great :-)
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]