* Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB
@ 2025-06-10 11:23 Marc Strämke
  2025-06-13 14:54 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-10 11:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]

Hello everyone, I am reposting to LKML as I am not sure the rt-users
mailing list is read by many people (I hope that is okay).

On an AMD Ryzen Embedded machine I am experiencing strange latency
spikes in cyclictest and need some hints on how to debug them further.

The system typically has maximum latencies of 88 us and averages of
4-8 us, which is more than sufficient for my application, but I saw
some spikes of many hundred us in testing.

I can provoke latencies of 500-1000 us or more by invoking "ip l set
enp1s0 promisc off" on the first network interface. The network
interface is an "Intel Corporation I210 Gigabit Network Connection"
using the igb driver.

I tried more or less all tracers but am not knowledgeable enough to
make sense of the output. The irqsoff and wakeup_rt trace output is
attached.

Can anyone point me in the right direction? I am not sure how to
interpret the function tracer and function_graph tracer output in a
meaningful way. As mainly a user of the kernel I am a bit overwhelmed
by the interaction of the scheduler, RCU and so on.


I have attached my config for reference.


Kind Regards

Marc


[-- Attachment #2: irqsoff_trace --]
[-- Type: text/plain, Size: 2114 bytes --]

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 6.15.1-dirty
# --------------------------------------------------------------------
# latency: 208 us, #15/15, CPU#1 | (M:PREEMPT_RT VP:0, KP:0, SP:0 HP:0 #P:2)
#    -----------------
#    | task: ktimers/1-31 (uid:0 nice:0 policy:1 rt_prio:1)
#    -----------------
#  => started at: handle_softirqs.constprop.0
#  => ended at:   run_ktimerd
#
#
#                    _------=> CPU#            
#                   / _-----=> irqs-off/BH-disabled
#                  | / _----=> need-resched    
#                  || / _---=> hardirq/softirq 
#                  ||| / _--=> preempt-depth   
#                  |||| / _-=> migrate-disable 
#                  ||||| /     delay           
#  cmd     pid     |||||| time  |   caller     
#     \   /        ||||||  \    |    /       
ktimers/-31        1d.s.2    0us : handle_softirqs.constprop.0
ktimers/-31        1d.s.2    0us : irq_disable: caller=handle_softirqs.constprop.0+0x2d8/0x300 parent=0x0
ktimers/-31        1d.s.2    0us : __local_bh_enable(cnt=256, unlock=1) <-run_ktimerd
ktimers/-31        1d...2    0us : __rcu_read_unlock() <-__local_bh_enable
ktimers/-31        1d...2    1us+: rt_spin_unlock(lock=0xffff983497d17660) <-__local_bh_enable
ktimers/-31        1d...2   99us : migrate_enable() <-rt_spin_unlock
ktimers/-31        1d...1   99us : __rcu_read_unlock() <-rt_spin_unlock
ktimers/-31        1d...1   99us : migrate_enable() <-run_ktimerd
ktimers/-31        1d..11  100us!: preempt_disable: caller=run_ktimerd+0x4f/0x90 parent=run_ktimerd+0x4f/0x90
ktimers/-31        1d..1.  208us : preempt_enable: caller=run_ktimerd+0x4f/0x90 parent=run_ktimerd+0x4f/0x90
ktimers/-31        1d....  209us : irq_enable: caller=run_ktimerd+0x78/0x90 parent=0x0
ktimers/-31        1d....  209us : run_ktimerd
ktimers/-31        1d..1.  209us : preempt_disable: caller=_raw_spin_lock_irqsave+0x23/0x60 parent=0x0
ktimers/-31        1d....  209us!: tracer_hardirqs_on <-run_ktimerd
ktimers/-31        1d....  581us : <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
 => ret_from_fork_asm

[-- Attachment #3: wakeup_rt_trace --]
[-- Type: text/plain, Size: 9760 bytes --]

# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 6.15.1-dirty
# --------------------------------------------------------------------
# latency: 411 us, #89/89, CPU#1 | (M:PREEMPT_RT VP:0, KP:0, SP:0 HP:0 #P:2)
#    -----------------
#    | task: cyclictest-218784 (uid:0 nice:0 policy:1 rt_prio:95)
#    -----------------
#
#                    _------=> CPU#            
#                   / _-----=> irqs-off/BH-disabled
#                  | / _----=> need-resched    
#                  || / _---=> hardirq/softirq 
#                  ||| / _--=> preempt-depth   
#                  |||| / _-=> migrate-disable 
#                  ||||| /     delay           
#  cmd     pid     |||||| time  |   caller     
#     \   /        ||||||  \    |    /       
  <idle>-0         1dNh5.    0us :        0:120:R   + [001]  218784:  4:R cyclictest
  <idle>-0         1dNh5.    8us : <stack trace>
 => __ftrace_trace_stack
 => probe_wakeup
 => __traceiter_sched_wakeup
 => ttwu_do_activate
 => try_to_wake_up
 => hrtimer_wakeup
 => __hrtimer_run_queues
 => hrtimer_interrupt
 => __sysvec_apic_timer_interrupt
 => sysvec_apic_timer_interrupt
 => asm_sysvec_apic_timer_interrupt
 => cpu_idle_poll.isra.0
 => do_idle
 => cpu_startup_entry
 => start_secondary
 => common_startup_64
  <idle>-0         1dNh5.    8us : 0
  <idle>-0         1dNh4.    8us : task_woken_rt(rq=0xffff983497d2b400, p=0xffff9833c394a000) <-ttwu_do_activate
  <idle>-0         1dNh4.    8us : _raw_spin_unlock(lock=0xffff983497d2b400) <-try_to_wake_up
  <idle>-0         1dNh4.    8us+: preempt_count_sub(val=1) <-_raw_spin_unlock
  <idle>-0         1dNh3.   74us : _raw_spin_unlock_irqrestore(lock=0xffff9833c394a8d8, flags=134) <-try_to_wake_up
  <idle>-0         1dNh3.   74us : preempt_count_sub(val=1) <-_raw_spin_unlock_irqrestore
  <idle>-0         1dNh1.   75us : hrtimer_expire_exit: hrtimer=00000000129a7594
  <idle>-0         1dNh1.   75us+: _raw_spin_lock_irq(lock=0xffff983497d1b700) <-__hrtimer_run_queues
  <idle>-0         1dNh1.  112us : preempt_count_add(val=1) <-_raw_spin_lock_irq
  <idle>-0         1dNh2.  112us : hrtimer_update_next_event(cpu_base=0xffff983497d1b700) <-hrtimer_interrupt
  <idle>-0         1dNh2.  112us : __hrtimer_next_event_base(cpu_base=0xffff983497d1b700, exclude=0x0, active=1, expires_next=9223372036854775807) <-hrtimer_interrupt
  <idle>-0         1dNh2.  112us : _raw_spin_unlock_irqrestore(lock=0xffff983497d1b700, flags=134) <-hrtimer_interrupt
  <idle>-0         1dNh2.  112us : preempt_count_sub(val=1) <-_raw_spin_unlock_irqrestore
  <idle>-0         1dNh1.  113us : tick_program_event(expires=57049782000000, force=0) <-hrtimer_interrupt
  <idle>-0         1dNh1.  113us : clockevents_program_event(dev=0xffff983497d17380, expires=57049782000000, force=0) <-hrtimer_interrupt
  <idle>-0         1dNh1.  113us : ktime_get() <-clockevents_program_event
  <idle>-0         1dNh1.  113us : lapic_next_event(delta=838, evt=0xffff983497d17380) <-clockevents_program_event
  <idle>-0         1dNh1.  113us : local_timer_exit: vector=236
  <idle>-0         1dNh1.  113us : irq_exit_rcu() <-sysvec_apic_timer_interrupt
  <idle>-0         1dNh1.  113us : preempt_count_sub(val=65536) <-__irq_exit_rcu
  <idle>-0         1dN.1.  113us : wake_up_process(p=0xffff9833c0366000) <-__irq_exit_rcu
  <idle>-0         1dN.1.  113us : try_to_wake_up(p=0xffff9833c0366000, state=3, wake_flags=0) <-__irq_exit_rcu
  <idle>-0         1dN.2.  114us : _raw_spin_lock_irqsave(lock=0xffff9833c03668d8) <-try_to_wake_up
  <idle>-0         1dN.2.  114us : preempt_count_add(val=1) <-_raw_spin_lock_irqsave
  <idle>-0         1dN.3.  114us : sched_waking: comm=ktimers/1 pid=31 prio=98 target_cpu=001
  <idle>-0         1dN.3.  114us : kthread_is_per_cpu(p=0xffff9833c0366000) <-try_to_wake_up
  <idle>-0         1dN.3.  114us : ttwu_queue_wakelist(p=0xffff9833c0366000, cpu=1, wake_flags=8) <-try_to_wake_up
  <idle>-0         1dN.4.  114us : _raw_spin_lock(lock=0xffff983497d2b400) <-try_to_wake_up
  <idle>-0         1dN.4.  114us : preempt_count_add(val=1) <-_raw_spin_lock
  <idle>-0         1dN.4.  114us+: update_rq_clock(rq=0xffff983497d2b400) <-try_to_wake_up
  <idle>-0         1dN.4.  149us : ttwu_do_activate(rq=0xffff983497d2b400, p=0xffff9833c0366000, wake_flags=8, rf=0xffffb7c340160f88) <-try_to_wake_up
  <idle>-0         1dN.4.  149us : enqueue_task(rq=0xffff983497d2b400, p=0xffff9833c0366000, flags=9) <-ttwu_do_activate
  <idle>-0         1dN.4.  149us+: enqueue_task_rt(rq=0xffff983497d2b400, p=0xffff9833c0366000, flags=9) <-enqueue_task
  <idle>-0         1dN.4.  185us : dequeue_rt_stack(rt_se=0xffff9833c0366180, flags=9) <-enqueue_task_rt
  <idle>-0         1dN.4.  186us : wakeup_preempt(rq=0xffff983497d2b400, p=0xffff9833c0366000, flags=8) <-ttwu_do_activate
  <idle>-0         1dN.4.  186us : __resched_curr.constprop.0() <-wakeup_preempt
  <idle>-0         1dN.5.  186us : __traceiter_sched_wakeup(__data=0x0, p=0xffff9833c0366000) <-ttwu_do_activate
  <idle>-0         1dN.4.  186us : sched_wakeup: comm=ktimers/1 pid=31 prio=98 target_cpu=001
  <idle>-0         1dN.4.  186us : task_woken_rt(rq=0xffff983497d2b400, p=0xffff9833c0366000) <-ttwu_do_activate
  <idle>-0         1dN.4.  186us : _raw_spin_unlock(lock=0xffff983497d2b400) <-try_to_wake_up
  <idle>-0         1dN.4.  186us : preempt_count_sub(val=1) <-_raw_spin_unlock
  <idle>-0         1dN.3.  187us : _raw_spin_unlock_irqrestore(lock=0xffff9833c03668d8, flags=146) <-try_to_wake_up
  <idle>-0         1dN.3.  187us : preempt_count_sub(val=1) <-_raw_spin_unlock_irqrestore
  <idle>-0         1dN.1.  187us : idle_cpu(cpu=1) <-__irq_exit_rcu
  <idle>-0         1dN.1.  187us : irq_enable: caller=irqentry_exit+0x50/0x70 parent=0x0
  <idle>-0         1dN.1.  187us!: rcu_watching: Endirq 1 0 0x5cc
  <idle>-0         1dN.1.  295us : rcu_watching: Start 0 1 0x5d4
  <idle>-0         1dN.1.  295us : cpu_idle: state=4294967295 cpu_id=1
  <idle>-0         1dN.2.  295us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
  <idle>-0         1.N.1.  295us : arch_cpu_idle_exit() <-do_idle
  <idle>-0         1dN.2.  296us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
  <idle>-0         1dN.2.  296us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
  <idle>-0         1.N.1.  296us : tick_nohz_idle_exit() <-do_idle
  <idle>-0         1dN.2.  296us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
  <idle>-0         1dN.1.  296us : irq_disable: caller=tick_nohz_idle_exit+0x75/0x140 parent=0x0
  <idle>-0         1dN.1.  296us : irq_enable: caller=tick_nohz_idle_exit+0x64/0x140 parent=0x0
  <idle>-0         1dN.2.  296us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
  <idle>-0         1.N.1.  296us!: flush_smp_call_function_queue() <-do_idle
  <idle>-0         1dN.2.  404us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
  <idle>-0         1dN.2.  405us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
  <idle>-0         1.N.1.  405us : schedule_idle() <-cpu_startup_entry
  <idle>-0         1dN.2.  405us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
  <idle>-0         1dN.2.  405us : irq_disable: caller=wakeup_tracer_call+0xbb/0xe0 parent=0x0
  <idle>-0         1.N.1.  405us : hrtimer_active(timer=0xffff983497d2bfe0) <-__schedule
  <idle>-0         1dN.2.  405us : irq_enable: caller=wakeup_tracer_call+0xd4/0xe0 parent=0x0
  <idle>-0         1dN.1.  405us : irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
  <idle>-0         1dN.1.  405us : rcu_note_context_switch(preempt=0) <-__schedule
  <idle>-0         1dN.1.  405us : rcu_utilization: Start context switch
  <idle>-0         1dN.1.  405us : rcu_qs() <-rcu_note_context_switch
  <idle>-0         1dN.1.  406us : rcu_utilization: End context switch
  <idle>-0         1dN.1.  406us : raw_spin_rq_lock_nested.constprop.0() <-__schedule
  <idle>-0         1dN.2.  406us : _raw_spin_lock(lock=0xffff983497d2b400) <-raw_spin_rq_lock_nested.constprop.0
  <idle>-0         1dN.2.  406us : preempt_count_add(val=1) <-_raw_spin_lock
  <idle>-0         1dN.2.  406us : update_rq_clock(rq=0xffff983497d2b400) <-__schedule
  <idle>-0         1dN.2.  406us : pick_task_stop(rq=0xffff983497d2b400) <-__schedule
  <idle>-0         1dN.2.  406us : pick_task_dl(rq=0xffff983497d2b400) <-__schedule
  <idle>-0         1dN.2.  406us : pick_task_rt(rq=0xffff983497d2b400) <-__schedule
  <idle>-0         1dN.2.  407us : put_prev_task_idle(rq=0xffff983497d2b400, prev=0xffff9833c035a000, next=0xffff9833c394a000) <-__schedule
  <idle>-0         1dN.2.  407us : dl_scaled_delta_exec(rq=0xffff983497d2b400, dl_se=0xffff983497d2bce8, delta_exec=332332) <-put_prev_task_idle
  <idle>-0         1dN.2.  407us : arch_scale_cpu_capacity(cpu=1) <-dl_scaled_delta_exec
  <idle>-0         1dN.2.  407us : dequeue_pushable_task(rq=0xffff983497d2b400, p=0xffff9833c394a000) <-set_next_task_rt
  <idle>-0         1dN.2.  407us : update_rt_rq_load_avg(now=57049782342572, rq=0xffff983497d2b400, running=0) <-set_next_task_rt
  <idle>-0         1d..3.  407us : __traceiter_sched_switch(__data=0x0, preempt=0, prev=0xffff9833c035a000, next=0xffff9833c394a000, prev_state=0) <-__schedule
  <idle>-0         1d..2.  407us : sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=218784 next_prio=4
  <idle>-0         1d..3.  407us : __traceiter_sched_switch
  <idle>-0         1d..3.  408us :        0:120:R ==> [001]  218784:  4:R cyclictest
  <idle>-0         1d..3.  412us : <stack trace>
 => __ftrace_trace_stack
 => probe_wakeup_sched_switch
 => __traceiter_sched_switch
 => __schedule
 => schedule_idle
 => cpu_startup_entry
 => start_secondary
 => common_startup_64

[-- Attachment #4: config.gz --]
[-- Type: application/x-gzip, Size: 34225 bytes --]


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB
  2025-06-10 11:23 Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB Marc Strämke
@ 2025-06-13 14:54 ` Sebastian Andrzej Siewior
       [not found]   ` <E1uQ6I0-000000003aa-37uJ@smtprelay05.ispgateway.de>
       [not found]   ` <522704b0-5a35-42c3-9bcc-627b487b740c@eltropuls.de>
  0 siblings, 2 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-13 14:54 UTC (permalink / raw)
  To: Marc Strämke; +Cc: linux-kernel, linux-rt-users

On 2025-06-10 13:23:13 [+0200], Marc Strämke wrote:
> Hello Everyone, I am reposting to LKML as I am not sure the rt-users
Hi,

> mailing list is read by many people (I hope that is okay).
> 
> On an AMD Ryzen Embedded machine I am experiencing strange latency spikes in
> cyclictest and need some hints on how to debug them further.
> 
> The system typically has maximum latencies of 88 us and averages of 4-8 us,
> which is more than sufficient for my application, but I saw some spikes of
> many hundred us in testing.
> 
> I can provoke latencies of 500-1000 us or more by invoking "ip l set
> enp1s0 promisc off" on the first network interface. The network interface
> is an "Intel Corporation I210 Gigabit Network Connection" using the igb
> driver.
> 
> I tried more or less all tracers but am not knowledgeable enough to make
> sense of the output. The irqsoff and wakeup_rt trace output is attached.

I'm not sure what your two traces captured. The irqsoff_trace captured
208 us and this looks like a regular run_ktimerd() invocation.
There is not much going on.

The wakeup_rt_trace shows the wakeup of cyclictest. It records 411 us.
Most of it is the scheduler itself, with some 100 us from
flush_smp_call_function_queue().
 
> Can anyone point me in the right direction? I am not sure how to interpret
> the function tracer and function_graph tracer output in a meaningful way.
> As mainly a user of the kernel I am a bit overwhelmed by the interaction
> of the scheduler, RCU and so on.


Could you please try one of the following:
- enable sched tracing events and tell cyclictest to break & stop
  tracing? The options for cyclictest would be -b --tracemark.
  The idea is to see the scheduler events before the delay happens. So
  if your latency spike is 500 us, you can try -b 490 or so.

- use rtla. This might be easier and give you the backtrace of what you
  are looking for:
  	https://bristot.me/linux-scheduling-latency-debug-and-analysis/

> I have attached my config for reference.

There is nothing wrong with it. You might want to disable NO_HZ (and use
PERIODIC) and use HZ=250 (or less).

> Kind Regards
> 
> Marc
> 

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
       [not found]   ` <E1uQ6I0-000000003aa-37uJ@smtprelay05.ispgateway.de>
@ 2025-06-13 19:58     ` Sebastian Andrzej Siewior
  2025-06-14  8:52       ` Marc Strämke
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-13 19:58 UTC (permalink / raw)
  To: marc.straemke; +Cc: linux-kernel, linux-rt-users

On 2025-06-13 17:26:15 [+0200], marc.straemke wrote:
> Thanks Sebastian, I will do that tomorrow. To confirm: Just pure event
> tracing without the function tracer? (After enabling the sched events)
> Regards, Marc

The event tracing should narrow down which of the tasks causes the spike.
So if you say it is the ip command then you should see ip.
Step two would be the function tracer to narrow it down further. You
could start right away with the function tracer to see where the big gap
is.
rtla could speed up the whole process (via the timerlat auto analysis).

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-13 19:58     ` Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB Sebastian Andrzej Siewior
@ 2025-06-14  8:52       ` Marc Strämke
  2025-06-16 15:10         ` Marc Strämke
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Marc Strämke @ 2025-06-14  8:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]

Sebastian,

I tried that in the past (the rtla timerlat auto analysis), but I do not
really understand the result:

rtla timerlat hit stop tracing
## CPU 1 hit stop tracing, analyzing it ##
   IRQ handler delay:                                         0.00 us (0.00 %)
   IRQ latency:                                             709.25 us
   Blocking thread:
                               ip:3567
     Blocking thread stack trace
                 -> timerlat_irq
                 -> __hrtimer_run_queues
                 -> hrtimer_interrupt
                 -> __sysvec_apic_timer_interrupt
                 -> sysvec_apic_timer_interrupt
                 -> asm_sysvec_apic_timer_interrupt
                 -> igb_update_mc_addr_list
                 -> igb_set_rx_mode
                 -> __dev_change_flags
                 -> netif_change_flags
                 -> do_setlink.constprop.0
                 -> rtnl_newlink
                 -> rtnetlink_rcv_msg
                 -> netlink_rcv_skb
                 -> netlink_unicast
                 -> netlink_sendmsg
                 -> ____sys_sendmsg
                 -> ___sys_sendmsg
                 -> __sys_sendmsg
                 -> do_syscall_64
                 -> entry_SYSCALL_64_after_hwframe
------------------------------------------------------------------------
      IRQ latency:                                          709.25 us (100%)


I do not really understand where the IRQ/preemption disabling is
happening. What would be the next thing to do? Function (graph?) tracing
on all the functions visible in the backtrace?


I tried to look at the event trace output starting with the call to
igb_set_rx_mode. I have attached the trace with all events and a
function filter on igb, on only the CPU executing ip. I cannot
understand what is happening between timestamp 700.149995 and the IRQ
disable event at 700.150795...


Thanks for your help,

Marc



On 13.06.2025 at 21:58, Sebastian Andrzej Siewior wrote:
> On 2025-06-13 17:26:15 [+0200], marc.straemke wrote:
>> Thanks Sebastian, I will do that tomorrow. To confirm: Just pure event
>> tracing without the function tracer? (After enabling the sched events)
>> Regards, Marc
> The event tracing should narrow down which of the tasks causes the spike.
> So if you say it is the ip command then you should see ip.
> Step two would be the function tracer to narrow it down further. You
> could start right away with the function tracer to see where the big gap
> is.
> rtla could speed up the whole process (via the timerlat auto analysis).
>
> Sebastian

[-- Attachment #2: trace_after_igb_set_rx_mode --]
[-- Type: text/plain, Size: 11269 bytes --]

              ip-4931    [001] ...1.   700.149994: preempt_disable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
              ip-4931    [001] ...11   700.149994: preempt_enable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
              ip-4931    [001] b...3   700.149994: igb_set_rx_mode <-__dev_change_flags
              ip-4931    [001] b...3   700.149995: kmalloc: call_site=igb_set_rx_mode+0x4f8/0x5a0 ptr=00000000b0b4e5c8 bytes_req=12 bytes_alloc=16 gfp_flags=GFP_ATOMIC|__GFP_ZERO node=-1 accounted=false
              ip-4931    [001] b...3   700.149995: igb_update_mc_addr_list <-igb_set_rx_mode
              ip-4931    [001] Dn..3   700.150795: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] Dnh.3   700.150795: preempt_disable: caller=sysvec_reschedule_ipi+0x24/0x130 parent=sysvec_reschedule_ipi+0x24/0x130
              ip-4931    [001] Dnh.3   700.150795: reschedule_entry: vector=253
              ip-4931    [001] DNh.3   700.150795: reschedule_exit: vector=253
              ip-4931    [001] DNh.3   700.150796: preempt_enable: caller=sysvec_reschedule_ipi+0x8d/0x130 parent=sysvec_reschedule_ipi+0x8d/0x130
              ip-4931    [001] DN.13   700.150796: preempt_disable: caller=irqentry_exit+0x64/0x70 parent=irqentry_exit+0x64/0x70
              ip-4931    [001] DN.13   700.150796: irq_enable: caller=preempt_schedule_irq+0x3b/0xe0 parent=0x0
              ip-4931    [001] DN.13   700.150796: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] DNh13   700.150797: call_function_single_entry: vector=251
              ip-4931    [001] DNh23   700.150798: ipi_send_cpu: cpu=1 callsite=irq_work_queue_on+0x109/0x120 callback=rto_push_irq_work_func+0x0/0xa0
              ip-4931    [001] DNh13   700.150798: call_function_single_exit: vector=251
              ip-4931    [001] DN.13   700.150798: irq_enable: caller=asm_sysvec_call_function_single+0x1a/0x20 parent=0x0
              ip-4931    [001] DN.13   700.150799: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] DNh13   700.150799: irq_work_entry: vector=246
              ip-4931    [001] DNh13   700.150799: irq_work_exit: vector=246
              ip-4931    [001] DN.13   700.150800: irq_enable: caller=asm_sysvec_irq_work+0x1a/0x20 parent=0x0
              ip-4931    [001] DN.13   700.150800: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] DNh13   700.150800: local_timer_entry: vector=236
              ip-4931    [001] DNh23   700.150801: hrtimer_cancel: hrtimer=000000003093a798
              ip-4931    [001] DNh13   700.150801: hrtimer_expire_entry: hrtimer=000000003093a798 function=hrtimer_wakeup now=700145368977
              ip-4931    [001] DNh33   700.150802: sched_waking: comm=cyclictest pid=4893 prio=9 target_cpu=001
              ip-4931    [001] DNh43   700.150802: sched_stat_sleep: comm=cyclictest pid=4893 delay=1223239 [ns]
              ip-4931    [001] DNh43   700.150803: sched_wakeup: comm=cyclictest pid=4893 prio=9 target_cpu=001
              ip-4931    [001] DNh13   700.150803: hrtimer_expire_exit: hrtimer=000000003093a798
              ip-4931    [001] DNh23   700.150803: hrtimer_cancel: hrtimer=000000001da73767
              ip-4931    [001] DNh13   700.150803: hrtimer_expire_entry: hrtimer=000000001da73767 function=tick_nohz_handler now=700145368977
              ip-4931    [001] DNh13   700.150804: softirq_raise: vec=1 [action=TIMER]
              ip-4931    [001] DNh13   700.150804: rcu_utilization: Start scheduler-tick
              ip-4931    [001] DNh13   700.150805: rcu_utilization: End scheduler-tick
              ip-4931    [001] DNh13   700.150805: read_msr: e8, value 1787ff7229c
              ip-4931    [001] DNh13   700.150805: read_msr: e7, value 177daa8fd55
              ip-4931    [001] DNh23   700.150805: sched_stat_runtime: comm=ip pid=4931 runtime=1226636 [ns]
              ip-4931    [001] DNh13   700.150808: hrtimer_expire_exit: hrtimer=000000001da73767
              ip-4931    [001] DNh23   700.150808: hrtimer_start: hrtimer=000000001da73767 function=tick_nohz_handler expires=700146000000 softexpires=700146000000 mode=ABS
              ip-4931    [001] DNh13   700.150809: local_timer_exit: vector=236
              ip-4931    [001] DN.33   700.150809: sched_waking: comm=ktimers/1 pid=31 prio=98 target_cpu=001
              ip-4931    [001] DN.43   700.150809: sched_stat_sleep: comm=ktimers/1 pid=31 delay=3349898 [ns]
              ip-4931    [001] DN.43   700.150810: sched_wakeup: comm=ktimers/1 pid=31 prio=98 target_cpu=001
              ip-4931    [001] DN.13   700.150810: irq_enable: caller=asm_sysvec_apic_timer_interrupt+0x1a/0x20 parent=0x0
              ip-4931    [001] DN.13   700.150810: irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
              ip-4931    [001] DN.13   700.150810: rcu_utilization: Start context switch
              ip-4931    [001] DN.23   700.150810: rcu_preempt_task: rcu_preempt 93628 4931
              ip-4931    [001] DN.13   700.150811: rcu_utilization: End context switch
              ip-4931    [001] DN.23   700.150811: sched_stat_runtime: comm=ip pid=4931 runtime=4037 [ns]
              ip-4931    [001] DN.23   700.150812: sched_stat_wait: comm=cyclictest pid=4893 delay=7434 [ns]
              ip-4931    [001] D..23   700.150813: sched_stat_wait: comm=ip pid=4931 delay=0 [ns]
              ip-4931    [001] D..33   700.150814: hrtimer_start: hrtimer=0000000087697d14 function=inactive_task_timer expires=700464978086 softexpires=700464978086 mode=REL|HARD
              ip-4931    [001] D..33   700.150814: hrtimer_cancel: hrtimer=0000000096921230
              ip-4931    [001] D..33   700.150815: hrtimer_cancel: hrtimer=0000000087697d14
              ip-4931    [001] D..33   700.150816: hrtimer_start: hrtimer=0000000096921230 function=dl_task_timer expires=701094857133 softexpires=701094857133 mode=ABS|HARD
              ip-4931    [001] Dp.23   700.150816: sched_switch: prev_comm=ip prev_pid=4931 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=4893 next_prio=9
              ip-4931    [001] Dp.23   700.150817: tlb_flush: pages:-1 reason:flush on task switch (0)
              ip-4931    [001] Dp.23   700.150817: x86_fpu_regs_deactivated: x86/fpu: 00000000dd84b5fb load: 0 xfeatures: 2 xcomp_bv: 8000000000000007
      cyclictest-4893    [001] dN.3.   700.150818: sched_stat_runtime: comm=cyclictest pid=4893 runtime=8747 [ns]
      cyclictest-4893    [001] dN.3.   700.150819: sched_migrate_task: comm=rcu_preempt pid=17 prio=98 orig_cpu=1 dest_cpu=0
      cyclictest-4893    [001] dN.3.   700.150819: sched_wake_idle_without_ipi: cpu=0
      cyclictest-4893    [001] dN.1.   700.150820: irq_enable: caller=finish_task_switch.isra.0+0xb1/0x300 parent=0x0
      cyclictest-4893    [001] dN.1.   700.150820: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
      cyclictest-4893    [001] dNh1.   700.150820: call_function_single_entry: vector=251
      cyclictest-4893    [001] dNh2.   700.150821: csd_queue_cpu: cpu=0 callsite=irq_work_queue_on+0xc6/0x120 func=rto_push_irq_work_func csd=00000000e2a8d506
      cyclictest-4893    [001] dNh2.   700.150821: ipi_send_cpu: cpu=0 callsite=irq_work_queue_on+0xc6/0x120 callback=generic_smp_call_function_single_interrupt+0x0/0x20
      cyclictest-4893    [001] dNh1.   700.150821: call_function_single_exit: vector=251
      cyclictest-4893    [001] dN.1.   700.150822: irq_enable: caller=asm_sysvec_call_function_single+0x1a/0x20 parent=0x0
      cyclictest-4893    [001] .N.1.   700.150822: preempt_enable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
      cyclictest-4893    [001] .N.1.   700.150822: preempt_disable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
      cyclictest-4893    [001] dN.1.   700.150822: irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
      cyclictest-4893    [001] dN.1.   700.150822: rcu_utilization: Start context switch
      cyclictest-4893    [001] dN.1.   700.150822: rcu_utilization: End context switch
      cyclictest-4893    [001] d..1.   700.150823: irq_enable: caller=__schedule+0xb5e/0x1080 parent=0x0
      cyclictest-4893    [001] ...1.   700.150823: preempt_enable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
      cyclictest-4893    [001] .....   700.150823: sys_exit: NR 230 = 0
      cyclictest-4893    [001] d....   700.150823: irq_disable: caller=syscall_exit_to_user_mode+0x1c4/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150824: irq_enable: caller=syscall_exit_to_user_mode+0xe4/0x240 parent=0x0
      cyclictest-4893    [001] .....   700.150824: rseq_update: cpu_id=1 node_id=0 mm_cid=0
      cyclictest-4893    [001] d....   700.150824: irq_disable: caller=syscall_exit_to_user_mode+0x203/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150825: x86_fpu_regs_activated: x86/fpu: 00000000ff492f85 load: 1 xfeatures: 2 xcomp_bv: 8000000000000007
      cyclictest-4893    [001] d....   700.150825: irq_enable: caller=syscall_exit_to_user_mode+0x7c/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150830: irq_disable: caller=do_syscall_64+0x45/0x1a0 parent=0x0
      cyclictest-4893    [001] d....   700.150830: irq_enable: caller=do_syscall_64+0x4a/0x1a0 parent=0x0
      cyclictest-4893    [001] .....   700.150831: sys_enter: NR 1 (4, 7f640320e2c0, 21, 7f640320d5b5, 0, 64)
      cyclictest-4893    [001] ...1.   700.150832: preempt_disable: caller=vfs_write+0x306/0x430 parent=vfs_write+0x306/0x430
      cyclictest-4893    [001] ...1.   700.150832: preempt_enable: caller=vfs_write+0x328/0x430 parent=vfs_write+0x328/0x430
      cyclictest-4893    [001] .....   700.150832: tracing_mark_write: hit latency threshold (759 > 700)
      cyclictest-4893    [001] ...1.   700.150833: preempt_disable: caller=vfs_write+0x12f/0x430 parent=vfs_write+0x12f/0x430
      cyclictest-4893    [001] ...1.   700.150833: preempt_enable: caller=vfs_write+0x151/0x430 parent=vfs_write+0x151/0x430
      cyclictest-4893    [001] ...1.   700.150833: preempt_disable: caller=fput+0x1f/0x90 parent=fput+0x1f/0x90
      cyclictest-4893    [001] ...1.   700.150833: preempt_enable: caller=fput+0x3f/0x90 parent=fput+0x3f/0x90
      cyclictest-4893    [001] .....   700.150833: sys_exit: NR 1 = 33
      cyclictest-4893    [001] d....   700.150834: irq_disable: caller=syscall_exit_to_user_mode+0x1c4/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150834: irq_enable: caller=syscall_exit_to_user_mode+0x7c/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150834: irq_disable: caller=do_syscall_64+0x45/0x1a0 parent=0x0
      cyclictest-4893    [001] d....   700.150834: irq_enable: caller=do_syscall_64+0x4a/0x1a0 parent=0x0
      cyclictest-4893    [001] .....   700.150834: sys_enter: NR 1 (5, 5556d858a461, 2, 7f640320d5b5, 0, 64)
      cyclictest-4893    [001] ...1.   700.150835: preempt_disable: caller=vfs_write+0x306/0x430 parent=vfs_write+0x306/0x430
      cyclictest-4893    [001] ...1.   700.150835: preempt_enable: caller=vfs_write+0x328/0x430 parent=vfs_write+0x328/0x430


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel IGB
       [not found]   ` <522704b0-5a35-42c3-9bcc-627b487b740c@eltropuls.de>
@ 2025-06-14 18:13     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-14 18:13 UTC (permalink / raw)
  To: Marc Strämke; +Cc: linux-kernel, linux-rt-users

On 2025-06-14 14:59:43 [+0200], Marc Strämke wrote:
> Hello Sebastian,
Hi,

> Am 13.06.2025 um 16:54 schrieb Sebastian Andrzej Siewior:
> > > I have attached my config for reference.
> > There is nothing wrong with it. You might want to disable NO_HZ (and use
> > PERIODIC) and use HZ=250 (or less)
> 
> I am a bit puzzled by that. Most advice I have read suggests disabling the
> periodic tick to decrease jitter in an RT application. What is the tradeoff
> here?

TICK = periodic does HZ timer interrupts per second. Each timer interrupt
serves pending timer_list timers within the kernel (not user space
timers such as clock_nanosleep()/sleep() from userland - those are
scheduled accurately). It also does some timekeeping and accounting.

TICK = nohz makes it possible to skip ticks if the CPU is idle. A
notebook CPU which has nothing to do and goes idle can stay idle longer
because the HZ interrupt will not wake it up on the next tick. So it can
stay idle longer and save energy.
If the CPU does not go idle, the timer tick fires on every tick.

With TICK = no_hz_full you can disable the tick if the CPU remains in
userland for a longer period of time. Remaining in userland means it
does not call into the kernel.

State-tracking overhead is higher with nohz, and higher still with
no_hz_full. Most RT applications are periodic. Cyclictest should show
you lower numbers with periodic than with nohz.
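
Such a periodic application (and cyclictest itself) basically boils down
to a loop like this - a minimal sketch in plain C, with no error handling
or RT priority setup, just to illustrate the pattern:

#include <time.h>

#define PERIOD_NS 1000000L	/* 1 ms cycle, an arbitrary example value */

/* Sleep until an absolute deadline, do the work, advance the deadline.
 * The wakeup comes from a hrtimer and is programmed precisely,
 * independent of HZ/NO_HZ. The difference between the deadline and the
 * actual wakeup time is what cyclictest reports as latency.
 */
static void periodic_loop(void (*work)(void))
{
	struct timespec next;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (;;) {
		next.tv_nsec += PERIOD_NS;
		if (next.tv_nsec >= 1000000000L) {
			next.tv_nsec -= 1000000000L;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		work();
	}
}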

You can monitor the timer interrupts in /proc/interrupts and compare an
idle CPU vs one that runs an application (which does a busy loop).

> Marc

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-14  8:52       ` Marc Strämke
@ 2025-06-16 15:10         ` Marc Strämke
  2025-06-17  9:16           ` Sebastian Andrzej Siewior
  2025-06-17  9:08         ` Sebastian Andrzej Siewior
  2025-06-17  9:28         ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-16 15:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users

Hi Sebastian,

On 14.06.25 10:52, Marc Strämke wrote:
>
> I do not really understand where the IRQ/Preemption disabling is 
> happening. What would the next thing be to do? Function (graph?) 
> tracing on all the functions visible in the backtrace?
>
>
> I tried to look at the event trace output starting with the call to
> igb_set_rx_mode. I have attached the trace with all events and a
> function filter on igb, on only the CPU executing ip. I cannot
> understand what is happening between timestamp 700.149995 and the IRQ
> disable event at 700.150795...
>
>
> Thanks for your help,
>
> Marc
>
>
I am still trying to figure this puzzle out. Please see the following
function trace snippet:

ip-690530  [000] ..... 178636.460435: rtnl_is_locked <-__dev_change_flags
ip-690530  [000] ..... 178636.460435: __local_bh_disable_ip <-__dev_change_flags
ip-690530  [000] ..... 178636.460435: migrate_disable <-__local_bh_disable_ip
ip-690530  [000] ...1. 178636.460435: preempt_disable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
ip-690530  [000] ...11 178636.460435: preempt_enable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
ip-690530  [000] ....1 178636.460435: rt_spin_lock <-__local_bh_disable_ip
ip-690530  [000] ....1 178636.460436: __rcu_read_lock <-rt_spin_lock
ip-690530  [000] ....1 178636.460436: migrate_disable <-__local_bh_disable_ip
ip-690530  [000] ....2 178636.460436: __rcu_read_lock <-__local_bh_disable_ip
ip-690530  [000] b...2 178636.460436: rt_spin_lock <-__dev_change_flags
ip-690530  [000] b...2 178636.460436: __rcu_read_lock <-rt_spin_lock
ip-690530  [000] b...2 178636.460436: migrate_disable <-__dev_change_flags
ip-690530  [000] b...3 178636.460436: __dev_set_rx_mode <-__dev_change_flags
ip-690530  [000] b...3 178636.460437: igb_set_rx_mode <-__dev_change_flags

It is my understanding that __local_bh_disable_ip called from 
netif_addr_lock_bh seems to be the origin of my latency.

What I do not understand is: even if the bottom halves are disabled,
shouldn't I see the interrupt arriving in the trace?

If I understood it correctly, your talk "Unblocking the softirq lock for
PREEMPT_RT" at the Plumbers conference 2023 is about exactly that case,
right?

Probably fixing this issue is beyond my abilities for now ;-) The vast
variety of locking concepts inside the kernel is quite intimidating for
a newcomer in kernel land...


Regards,


Marc



* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-14  8:52       ` Marc Strämke
  2025-06-16 15:10         ` Marc Strämke
@ 2025-06-17  9:08         ` Sebastian Andrzej Siewior
  2025-06-17  9:28         ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17  9:08 UTC (permalink / raw)
  To: Marc Strämke; +Cc: linux-kernel, linux-rt-users

On 2025-06-14 10:52:36 [+0200], Marc Strämke wrote:
> Sebastian,
Hi,

> I tried that in the past (the rtla timerlat auto analysis), but I do not
> really understand the result:
> 
> rtla timerlat hit stop tracing
> ## CPU 1 hit stop tracing, analyzing it ##
>   IRQ handler delay:                                         0.00 us (0.00 %)
>   IRQ latency:                                             709.25 us
>   Blocking thread:
>                               ip:3567
>     Blocking thread stack trace
>                 -> timerlat_irq
>                 -> __hrtimer_run_queues
>                 -> hrtimer_interrupt
>                 -> __sysvec_apic_timer_interrupt
>                 -> sysvec_apic_timer_interrupt
>                 -> asm_sysvec_apic_timer_interrupt
>                 -> igb_update_mc_addr_list
>                 -> igb_set_rx_mode
>                 -> __dev_change_flags
>                 -> netif_change_flags
>                 -> do_setlink.constprop.0
>                 -> rtnl_newlink
>                 -> rtnetlink_rcv_msg
>                 -> netlink_rcv_skb
>                 -> netlink_unicast
>                 -> netlink_sendmsg
>                 -> ____sys_sendmsg
>                 -> ___sys_sendmsg
>                 -> __sys_sendmsg
>                 -> do_syscall_64
>                 -> entry_SYSCALL_64_after_hwframe
> ------------------------------------------------------------------------
>      IRQ latency:                                          709.25 us (100%)
> 
> 
> I do not really understand where the IRQ/preemption disabling is happening.
> What would be the next thing to do? Function (graph?) tracing on all the
> functions visible in the backtrace?

So somewhere between entry_SYSCALL_64_after_hwframe() and timerlat_irq()
something happened that led to the spike. Up to __dev_change_flags() it
is mostly regular and asm_sysvec_apic_timer_interrupt() is already the
interrupt.
There is only one call site for igb_update_mc_addr_list() and
everything before that looks harmless.
igb_update_mc_addr_list() has these loops where it writes to a FIFO
register. This is followed by wrfl(), which reads the status register.
This read forces the writes to go through the bus. What might happen is
that all the writes are "cached" and scheduled for a better time. That
read will force the writes to actually happen and the CPU will stall
until this is done.
My guess would be that one of the writes causes the stall here.
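
Roughly, the pattern is something like this - a simplified sketch with
made-up register offsets, not the actual igb code:

#include <linux/io.h>

/* Illustrative offsets only - not the real I210 register layout. */
#define EX_MTA(i)	(0x5200 + (i) * 4)
#define EX_STATUS	0x0008

static void ex_update_mc_addr_list(u8 __iomem *hw_addr,
				   const u32 *mta_shadow, int count)
{
	int i;

	/* Posted MMIO writes: the CPU/chipset may buffer these and keep
	 * executing without waiting for the device.
	 */
	for (i = 0; i < count; i++)
		writel(mta_shadow[i], hw_addr + EX_MTA(i));

	/* The flush read (what wrfl() amounts to): it cannot complete
	 * before all buffered writes have reached the NIC, so the CPU
	 * stalls here for as long as the bus needs - with interrupts
	 * enabled but unable to be taken until the read retires.
	 */
	readl(hw_addr + EX_STATUS);
}

If that is what happens, the whole wait shows up as the gap before the
timer interrupt in your trace.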

> I tried to look at the event trace output starting with the call to
> igb_set_rx_mode. I have attached the trace with all events and a function
> filter on igb, on only the CPU executing ip. I cannot understand what is
> happening between timestamp 700.149995 and the IRQ disable event at
> 700.150795...
> 
>               ip-4931    [001] b...3   700.149994: igb_set_rx_mode <-__dev_change_flags
>               ip-4931    [001] b...3   700.149995: kmalloc: call_site=igb_set_rx_mode+0x4f8/0x5a0 ptr=00000000b0b4e5c8 bytes_req=12 bytes_alloc=16 gfp_flags=GFP_ATOMIC|__GFP_ZERO node=-1 accounted=false
>               ip-4931    [001] b...3   700.149995: igb_update_mc_addr_list <-igb_set_rx_mode
>               ip-4931    [001] Dn..3   700.150795: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0

The CPU is somewhere in igb_update_mc_addr_list(). Based on the
timestamps it was there for 800 us before the irq_disable() occurred.
The irq_disable() did not happen within igb_update_mc_addr_list() itself
but was due to an incoming interrupt.

> Thanks for your help,
> 
> Marc

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-16 15:10         ` Marc Strämke
@ 2025-06-17  9:16           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17  9:16 UTC (permalink / raw)
  To: Marc Strämke; +Cc: linux-kernel, linux-rt-users

On 2025-06-16 17:10:33 [+0200], Marc Strämke wrote:
> Hi Sebastian,
Hi Marc,

> I am still trying to figure that puzzle out: Please see the following
> function trace snippet:
> 
> ip-690530  [000] ..... 178636.460435: rtnl_is_locked <-__dev_change_flags
> ip-690530  [000] ..... 178636.460435: __local_bh_disable_ip <-__dev_change_flags
> ip-690530  [000] ..... 178636.460435: migrate_disable <-__local_bh_disable_ip
> ip-690530  [000] ...1. 178636.460435: preempt_disable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
> ip-690530  [000] ...11 178636.460435: preempt_enable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
> ip-690530  [000] ....1 178636.460435: rt_spin_lock <-__local_bh_disable_ip
> ip-690530  [000] ....1 178636.460436: __rcu_read_lock <-rt_spin_lock
> ip-690530  [000] ....1 178636.460436: migrate_disable <-__local_bh_disable_ip
> ip-690530  [000] ....2 178636.460436: __rcu_read_lock <-__local_bh_disable_ip
> ip-690530  [000] b...2 178636.460436: rt_spin_lock <-__dev_change_flags
> ip-690530  [000] b...2 178636.460436: __rcu_read_lock <-rt_spin_lock
> ip-690530  [000] b...2 178636.460436: migrate_disable <-__dev_change_flags
> ip-690530  [000] b...3 178636.460436: __dev_set_rx_mode <-__dev_change_flags
> ip-690530  [000] b...3 178636.460437: igb_set_rx_mode <-__dev_change_flags
> 
> It is my understanding that __local_bh_disable_ip called from
> netif_addr_lock_bh seems to be the origin of my latency.

How so?

> What i do not understand is, even if the bottom halves are disabled.
> Shouldn't I see the interrupt arriving in the trace?
Yes.

> If i understood it correctly, your talk "Unblocking the softirq lock for
> PREEMPT_RT" during the plumbers conference 2023 is exactly about that case,
> right?
No.

> Probably fixing this issue is beyond my abilities for now ;-) The vast
> variety of locking concepts inside the kernel is quite intimidating for a
> newcomer in kernel land...

If you had only the scheduler events enabled and you saw that
"ip-690530" is doing something, then an interrupt fires, that
interrupt wakes a thread, the CPU switches to that thread, that thread
does sched_pi_setprio() and switches back to "ip-690530" until it is
done - _then_ it would be what I talked about at Plumbers in 2023.

Your trace snippet above is short latency-wise. It covers just 2 us.
From that it all looks good. If you go further down in your trace,
there has to be a spike between igb_update_mc_addr_list() and
asm_sysvec_apic_timer_interrupt().

> Regards,
> 
> 
> Marc

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-14  8:52       ` Marc Strämke
  2025-06-16 15:10         ` Marc Strämke
  2025-06-17  9:08         ` Sebastian Andrzej Siewior
@ 2025-06-17  9:28         ` Sebastian Andrzej Siewior
  2025-06-17  9:45           ` Marc Strämke
  2 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17  9:28 UTC (permalink / raw)
  To: Marc Strämke; +Cc: linux-kernel, linux-rt-users

On 2025-06-14 10:52:36 [+0200], Marc Strämke wrote:
>               ip-4931    [001] b...3   700.149995: igb_update_mc_addr_list <-igb_set_rx_mode
>               ip-4931    [001] Dn..3   700.150795: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0

Between those two functions you have 800us delay. Interrupts are not
disabled so the CPU stalls. As explained earlier, I expect the read on
the bus flushes the writes causing the spike.

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-17  9:28         ` Sebastian Andrzej Siewior
@ 2025-06-17  9:45           ` Marc Strämke
  2025-06-17 10:00             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-17  9:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users

Hi Sebastian,

On 17.06.25 11:28, Sebastian Andrzej Siewior wrote:
> Between those two functions you have 800us delay. Interrupts are not
> disabled so the CPU stalls. As explained earlier, I expect the read on
> the bus flushes the writes causing the spike.
>
So you think the delay is really the hardware (CPU) being stalled on the
bus for so long? Or do you mean that this is the reason for the long
runtime of the igb function only?

Shouldn't the other core (it is a 2-core machine) still be able to handle
the timer interrupt then? (I did some testing with isolating ip on one
core and cyclictest on the other; I am not sure if I moved the timer IRQ
to the second core.)

My mistake was to search for where the interrupt was being disabled in
kernel space (I did not think that the hardware could introduce an IRQ
delay of 800 us...)



Marc







* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-17  9:45           ` Marc Strämke
@ 2025-06-17 10:00             ` Sebastian Andrzej Siewior
  2025-06-17 10:03               ` Marc Strämke
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17 10:00 UTC (permalink / raw)
  To: Marc Strämke; +Cc: linux-kernel, linux-rt-users

On 2025-06-17 11:45:52 [+0200], Marc Strämke wrote:
> Hi Sebastian,
Hi Marc,

> On 17.06.25 11:28, Sebastian Andrzej Siewior wrote:
> > Between those two functions you have 800us delay. Interrupts are not
> > disabled so the CPU stalls. As explained earlier, I expect the read on
> > the bus flushes the writes causing the spike.
> > 
> So the delay you think is really the hardware(CPU) being stalled on the bus
> for so long? Or do you mean that this is the reason for the long runtime of
Yes.

> the IGB function only?
Both. The bus is flushed, the CPU stalls until the end, and so the
function takes long _and_, due to the stall, the interrupt cannot
fire any sooner.

> Shouldn't the other core (it is 2 core machine) still be able to handle the
> timer interrupt then? (I did some testing with isolating IP on one core and

You have two cores and two cyclictest threads - one on each CPU. This is
explicitly per-CPU. No migration between those two cores. Timers for CPU0
are always handled by CPU0.
Even if CPU1 handled CPU0's timers, it would wake cyclictest on CPU0,
but that thread would have to wait until CPU0 is done with the PCI bus.
CPU1 knows nothing about it.

> cyclictest on the other, I am not sure if I moved the timer IRQ to the
> second core)

> My mistake was to search for where the interrupt was being disabled in
> kernel space (I did not think that the hardware could introduce an IRQ delay
> of 800 us...)

Yeah. 

> Marc

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-17 10:00             ` Sebastian Andrzej Siewior
@ 2025-06-17 10:03               ` Marc Strämke
  2025-06-17 10:15                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Strämke @ 2025-06-17 10:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users

Hi Sebastian,

On 17.06.25 12:00, Sebastian Andrzej Siewior wrote:
> Even if CPU1 handled CPU0's timers, it would wake cyclictest on CPU0,
> but that thread would have to wait until CPU0 is done with the PCI bus.
> CPU1 knows nothing about it.

Okay, then the latency I see on the other CPU must be from a PCI access
done by the second CPU which stalls on the same shared bus.


Anyway: Thanks for your help Sebastian! I can probably live well with 
these spikes in latency. I was more concerned that there is a deeper 
issue with my config and the response time could be unbounded.


Marc



* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-17 10:03               ` Marc Strämke
@ 2025-06-17 10:15                 ` Sebastian Andrzej Siewior
  2025-06-17 10:54                   ` Kurt Kanzenbach
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-17 10:15 UTC (permalink / raw)
  To: Marc Strämke; +Cc: linux-kernel, linux-rt-users

On 2025-06-17 12:03:31 [+0200], Marc Strämke wrote:
> Hi Sebastian,
Hi,

> On 17.06.25 12:00, Sebastian Andrzej Siewior wrote:
> > Even if CPU1 would handle CPU0's timers then it would wake cyclictest on
> > CPU0 but that thread would have to wake until CPU0 is done with the PCI
> > bus. CPU1 knows nothing about it.
> 
> Okay, then the latency I see on the other CPU must be from a PCI access done
> by the second CPU which stalls on the same shared bus.

Okay.
> 
> Anyway: Thanks for your help Sebastian! I can probably live well with these
> spikes in latency. I was more concerned that there is a deeper issue with my
> config and the response time could be unbounded.

You don't have to live with it. You could add a read after the writes in
the loop (wrfl()). This should help.
Some Intel CPUs have an MSR bit to disable this kind of caching on the
PCI bus. Maybe the AMD CPU has this, too. Or there might be a switch in
the BIOS.
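
As a sketch of the idea (untested, with the same illustrative offsets as
in my earlier mail, not a patch against the actual driver):

#include <linux/io.h>

#define EX_MTA(i)	(0x5200 + (i) * 4)	/* illustrative only */
#define EX_STATUS	0x0008			/* illustrative only */

static void ex_update_mc_addr_list_flushed(u8 __iomem *hw_addr,
					   const u32 *mta_shadow, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		writel(mta_shadow[i], hw_addr + EX_MTA(i));
		/* Read back after each write so no single read has to
		 * drain a large backlog of posted writes; the worst case
		 * stall is then one register write instead of the whole
		 * table.
		 */
		readl(hw_addr + EX_STATUS);
	}
}

Whether every iteration needs the flush or one every few writes is enough
is something to measure.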

> Marc

Sebastian


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
  2025-06-17 10:15                 ` Sebastian Andrzej Siewior
@ 2025-06-17 10:54                   ` Kurt Kanzenbach
       [not found]                     ` <19a46896-3d26-415b-9820-730a42b0702e@eltropuls.de>
  0 siblings, 1 reply; 15+ messages in thread
From: Kurt Kanzenbach @ 2025-06-17 10:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Marc Strämke; +Cc: linux-kernel, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

On Tue Jun 17 2025, Sebastian Andrzej Siewior wrote:
> On 2025-06-17 12:03:31 [+0200], Marc Strämke wrote:
>> Anyway: Thanks for your help Sebastian! I can probably live well with these
>> spikes in latency. I was more concerned that there is a deeper issue with my
>> config and the response time could be unbounded.
>
> You don't have to live with it. You could add a read after the writes in
> the loop (wrfl()). This should help.

Something like this, which was done for Intel e1000?

https://lore.kernel.org/intel-wired-lan/20241219192743.4499-1-gerhard@engleder-embedded.com/

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]


* Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB
       [not found]                     ` <19a46896-3d26-415b-9820-730a42b0702e@eltropuls.de>
@ 2025-06-17 13:35                       ` Kurt Kanzenbach
  0 siblings, 0 replies; 15+ messages in thread
From: Kurt Kanzenbach @ 2025-06-17 13:35 UTC (permalink / raw)
  To: Marc Strämke, Sebastian Andrzej Siewior; +Cc: linux-kernel, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

On Tue Jun 17 2025, Marc Strämke wrote:
> Hi Kurt,
> On 17.06.25 12:54, Kurt Kanzenbach wrote:
>>> You don't have to live with it. You could add a read after the writes in
>>> the loop (wrfl()). This should help.
>> Something like this, which was done for Intel e1000?
>>
>> https://lore.kernel.org/intel-wired-lan/20241219192743.4499-1-gerhard@engleder-embedded.com/
>>
>> Thanks,
>> Kurt
>
> This works wonders (Latency spike down to 120 us from 700) ! I will 
> prepare a patch and submit it to LKML.

Great :-)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]
