public inbox for linux-rt-users@vger.kernel.org
 help / color / mirror / Atom feed
* Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll"
@ 2024-01-19 15:38 Leon Woestenberg
  2024-01-19 22:08 ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Woestenberg @ 2024-01-19 15:38 UTC (permalink / raw)
  To: linux-rt-users

Hello all,

On 6.1.71-rt21 (PREEMPT_RT against vanilla) I am trying to find the
root cause of an "rcuc" thread scheduled in on an isolated core (CPU
#4).
The isolated core should run a single non-kernel user-space-busy-loop
thread (SCHED_OTHER, not FIFO or RR).

tsc=reliable isolcpus=domain,managed_irq,4-7 nohz_full=4-7
rcu_nocbs=all rcu_nocb_poll
/proc/irq/*/smp_affinity and cpumasks in
/sys/devices/virtual/workqueue/ all show 000f or less (CPUs 0-3)

Using osnoise I see rcuc/4 popping up as a cause of noise, but my
kernel command line should prevent that, right?

I see this in the kernel boot log, and I wonder if this is related to my issue:

[    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.

tree.c:102:static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);

Could this cause the rcuc threads to become active, even with the
"rcu_nocbs=all rcu_nocb_poll" setting?


Thanks, Leon.


[    0.063516] rcu: Preemptible hierarchical RCU implementation.
[    0.063517] rcu:     RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=16.
[    0.063518] rcu:     RCU priority boosting: priority 1 delay 500 ms.
[    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.
[    0.063519]     No expedited grace period (rcu_normal_after_boot).
[    0.063519]     Trampoline variant of Tasks RCU enabled.
[    0.063519]     Rude variant of Tasks RCU enabled.
[    0.063520]     Tracing variant of Tasks RCU enabled.
[    0.063520] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.063521] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
[    0.066896] NO_HZ: Full dynticks CPUs: 4-7.
[    0.066897] rcu:     Offload RCU callbacks from CPUs: 0-15.
[    0.066898] rcu:     Poll for callbacks from no-CBs CPUs.
[    0.066901] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    1.145094] rcu: Hierarchical SRCU implementation.
[    1.145094] rcu:     Max phase no-delay instances is 1000.


--
Leon Woestenberg
leon@sidebranch.com
T: +31 40 711 42 76
M: +31 6 472 30 372

Sidebranch Embedded Systems
Eindhoven, The Netherlands
http://www.sidebranch.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll"
  2024-01-19 15:38 Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll" Leon Woestenberg
@ 2024-01-19 22:08 ` Frederic Weisbecker
  2024-01-20  0:37   ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2024-01-19 22:08 UTC (permalink / raw)
  To: Leon Woestenberg; +Cc: linux-rt-users, Paul E. McKenney, rcu

Le Fri, Jan 19, 2024 at 04:38:23PM +0100, Leon Woestenberg a écrit :
> Hello all,
> 
> On 6.1.71-rt21 (PREEMPT_RT against vanilla) I am trying to find the
> root cause of an "rcuc" thread scheduled in on an isolated core (CPU
> #4).
> The isolated core should run a single non-kernel user-space-busy-loop
> thread (SCHED_OTHER, not FIFO or RR).
> 
> tsc=reliable isolcpus=domain,managed_irq,4-7 nohz_full=4-7
> rcu_nocbs=all rcu_nocb_poll
> /proc/irq/*/smp_affinity and cpumasks in
> /sys/devices/virtual/workqueue/ all show 000f or less (CPUs 0-3)
> 
> Using osnoise I see rcuc/4 popping up as a cause of noise, but my
> kernel command line should prevent that, right?
> 
> I see this in the kernel boot log, and I wonder if this is related to my issue:
> 
> [    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.
> 
> tree.c:102:static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);
> 
> Could this cause the rcuc threads to become active, even with the
> "rcu_nocbs=all rcu_nocb_poll" setting?

That's unexpected. The RCU load should be moved from rcuc to rcuo* kthreads.

Do you have CONFIG_RCU_STRICT_GRACE_PERIOD=y perhaps?

Also do you really need rcu_nocb_poll? I would expect this option to be useful
for debugging only, though I'm not aware of every usecases.

Thanks.

> 
> 
> Thanks, Leon.
> 
> 
> [    0.063516] rcu: Preemptible hierarchical RCU implementation.
> [    0.063517] rcu:     RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=16.
> [    0.063518] rcu:     RCU priority boosting: priority 1 delay 500 ms.
> [    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.
> [    0.063519]     No expedited grace period (rcu_normal_after_boot).
> [    0.063519]     Trampoline variant of Tasks RCU enabled.
> [    0.063519]     Rude variant of Tasks RCU enabled.
> [    0.063520]     Tracing variant of Tasks RCU enabled.
> [    0.063520] rcu: RCU calculated value of scheduler-enlistment delay
> is 25 jiffies.
> [    0.063521] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
> [    0.066896] NO_HZ: Full dynticks CPUs: 4-7.
> [    0.066897] rcu:     Offload RCU callbacks from CPUs: 0-15.
> [    0.066898] rcu:     Poll for callbacks from no-CBs CPUs.
> [    0.066901] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    1.145094] rcu: Hierarchical SRCU implementation.
> [    1.145094] rcu:     Max phase no-delay instances is 1000.
> 
> 
> --
> Leon Woestenberg
> leon@sidebranch.com
> T: +31 40 711 42 76
> M: +31 6 472 30 372
> 
> Sidebranch Embedded Systems
> Eindhoven, The Netherlands
> http://www.sidebranch.com
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll"
  2024-01-19 22:08 ` Frederic Weisbecker
@ 2024-01-20  0:37   ` Paul E. McKenney
  2024-01-22 14:32     ` Leon Woestenberg
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2024-01-20  0:37 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Leon Woestenberg, linux-rt-users, rcu

On Fri, Jan 19, 2024 at 11:08:23PM +0100, Frederic Weisbecker wrote:
> Le Fri, Jan 19, 2024 at 04:38:23PM +0100, Leon Woestenberg a écrit :
> > Hello all,
> > 
> > On 6.1.71-rt21 (PREEMPT_RT against vanilla) I am trying to find the
> > root cause of an "rcuc" thread scheduled in on an isolated core (CPU
> > #4).
> > The isolated core should run a single non-kernel user-space-busy-loop
> > thread (SCHED_OTHER, not FIFO or RR).
> > 
> > tsc=reliable isolcpus=domain,managed_irq,4-7 nohz_full=4-7
> > rcu_nocbs=all rcu_nocb_poll
> > /proc/irq/*/smp_affinity and cpumasks in
> > /sys/devices/virtual/workqueue/ all show 000f or less (CPUs 0-3)
> > 
> > Using osnoise I see rcuc/4 popping up as a cause of noise, but my
> > kernel command line should prevent that, right?

Well, your choice is rcuc or ksoftirqd.  Either way, there is some per-CPU
work that must be done.  But to Frederic's later point, not very much
work.  In particular, RCU callbacks should be invoked on an rcuoc kthread.

Unless CPU 4 spends absolutely no time in the kernel, and never ever
has more than one runnable task at any given time.  And doesn't have
any other excuse to take a scheduling-clock interrupt.  Otherwise, CPU
4's rcuc kthread will burn at least a little bit of time.  The more time
CPU 4 spends in the kernel and the more time CPU 4 spends with more than
one runnable task, the more time CPU 4's rcuc kthread will burn.

> > I see this in the kernel boot log, and I wonder if this is related to my issue:
> > 
> > [    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.

This simply means that the processing that would have been done in
softirq (for example, by ksoftirqd) will now be done by rcuc.

> > tree.c:102:static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);
> > 
> > Could this cause the rcuc threads to become active, even with the
> > "rcu_nocbs=all rcu_nocb_poll" setting?

You can override this with rcutree.use_softirq=1 if you wish.  But that
will simply move the processing from the rcuc kthread to softirq context.

> That's unexpected. The RCU load should be moved from rcuc to rcuo* kthreads.
> 
> Do you have CONFIG_RCU_STRICT_GRACE_PERIOD=y perhaps?
> 
> Also do you really need rcu_nocb_poll? I would expect this option to be useful
> for debugging only, though I'm not aware of every usecases.

I have the same questions.  ;-)

							Thanx, Paul

> Thanks.
> 
> > 
> > 
> > Thanks, Leon.
> > 
> > 
> > [    0.063516] rcu: Preemptible hierarchical RCU implementation.
> > [    0.063517] rcu:     RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=16.
> > [    0.063518] rcu:     RCU priority boosting: priority 1 delay 500 ms.
> > [    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.
> > [    0.063519]     No expedited grace period (rcu_normal_after_boot).
> > [    0.063519]     Trampoline variant of Tasks RCU enabled.
> > [    0.063519]     Rude variant of Tasks RCU enabled.
> > [    0.063520]     Tracing variant of Tasks RCU enabled.
> > [    0.063520] rcu: RCU calculated value of scheduler-enlistment delay
> > is 25 jiffies.
> > [    0.063521] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
> > [    0.066896] NO_HZ: Full dynticks CPUs: 4-7.
> > [    0.066897] rcu:     Offload RCU callbacks from CPUs: 0-15.
> > [    0.066898] rcu:     Poll for callbacks from no-CBs CPUs.
> > [    0.066901] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> > [    1.145094] rcu: Hierarchical SRCU implementation.
> > [    1.145094] rcu:     Max phase no-delay instances is 1000.
> > 
> > 
> > --
> > Leon Woestenberg
> > leon@sidebranch.com
> > T: +31 40 711 42 76
> > M: +31 6 472 30 372
> > 
> > Sidebranch Embedded Systems
> > Eindhoven, The Netherlands
> > http://www.sidebranch.com
> > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll"
  2024-01-20  0:37   ` Paul E. McKenney
@ 2024-01-22 14:32     ` Leon Woestenberg
  2024-01-23 18:32       ` Paul E. McKenney
  2024-08-16 14:06       ` Frederic Weisbecker
  0 siblings, 2 replies; 6+ messages in thread
From: Leon Woestenberg @ 2024-01-22 14:32 UTC (permalink / raw)
  To: paulmck; +Cc: Frederic Weisbecker, linux-rt-users, rcu

Hi Frederic, Paul, all,

On Sat, Jan 20, 2024 at 1:37 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> On Fri, Jan 19, 2024 at 11:08:23PM +0100, Frederic Weisbecker wrote:
> > Le Fri, Jan 19, 2024 at 04:38:23PM +0100, Leon Woestenberg a écrit :
> > > The isolated core should run a single non-kernel user-space-busy-loop
> > > thread (SCHED_OTHER, not FIFO or RR).
> > >
> > > tsc=reliable isolcpus=domain,managed_irq,4-7 nohz_full=4-7
> > > rcu_nocbs=all rcu_nocb_poll
> > > /proc/irq/*/smp_affinity and cpumasks in
> > > /sys/devices/virtual/workqueue/ all show 000f or less (CPUs 0-3)
> > >
> > > Using osnoise I see rcuc/4 popping up as a cause of noise, but my
> > > kernel command line should prevent that, right?
>
> Well, your choice is rcuc or ksoftirqd.  Either way, there is some per-CPU
> work that must be done.  But to Frederic's later point, not very much
> work.  In particular, RCU callbacks should be invoked on an rcuoc kthread.
>
Yes, that's what I expected (and what I got to work on an earlier
system with an older kernel).

I went back to running ftrace against the isolated CPU, and I see some
kworker work.
Sometimes  blk_mq_run_work_func().

(I think seeing rcuc in my osnoise trace was coincidental.)

What would be the method to find the root cause more clearly? Should I
deepen my ftrace function call depth?

I am getting one interruption per 8 minutes on the isolated CPU. Here
is an example trace that ran 30 minutes:

cat trace.4
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
  4)               |  vtime_user_exit() {
  4)   0.634 us    |    get_vtime_delta();
  4)   1.799 us    |    account_user_time();
  4)   5.808 us    |  }
  4)               |  irq_enter_rcu() {
  4)   0.640 us    |    tick_irq_enter();
  4)   0.914 us    |  }
  4)               |  __sysvec_call_function_single() {
  4)               |  /* call_function_single_entry: vector=251 */
  4)   0.513 us    |    generic_smp_call_function_single_interrupt();
  4)               |  /* call_function_single_exit: vector=251 */
  4)   2.136 us    |  }
  4)               |  irq_exit_rcu() {
  4)   0.203 us    |    idle_cpu();
  4)               |    tick_nohz_irq_exit() {
  4)               |      /* tick_stop: success=0 dependency=SCHED */
  4)   7.564 us    |    }
  4)   8.285 us    |  }
  4)               |  exit_to_user_mode_prepare() {
  4)   0.217 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.187 us    |    __rcu_irq_enter_check_tick();
  4)               |  /* reschedule_entry: vector=253 */
  4)               |  /* reschedule_exit: vector=253 */
  4)               |    raw_irqentry_exit_cond_resched() {
  4)               |      /* local_timer_entry: vector=236 */
  4)               |      /* local_timer_exit: vector=236 */
  4)               |      /* tick_stop: success=0 dependency=SCHED */
  4)               |      /* sched_switch: prev_comm=hiccups_x86-64
prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:0H
next_pid=1160 next_prio=100 */
 ------------------------------------------
  4)  hiccups-1335  =>  kworker-1160
 ------------------------------------------

  4)               |  finish_task_switch.isra.0() {
  4)   0.475 us    |    vtime_task_switch_generic();
  4)   0.126 us    |    __tick_nohz_task_switch();
  4)   1.069 us    |  }
  4)               |  wq_worker_running() {
  4)   0.223 us    |    kthread_data();
  4)   0.547 us    |  }
  4)   0.180 us    |  _raw_spin_lock_irq();
  4)   0.115 us    |  _raw_spin_unlock_irq();
  4)               |  mod_timer() {
  4)   0.241 us    |    lock_timer_base();
  4)   0.104 us    |    detach_if_pending();
  4)   0.123 us    |    _raw_spin_lock();
  4)   0.120 us    |    calc_wheel_index();
  4)   0.115 us    |    enqueue_timer();
  4)   0.134 us    |    _raw_spin_unlock_irqrestore();
  4)   1.815 us    |  }
  4)               |  create_worker() {
  4)   0.238 us    |    rt_spin_lock();
  4)   0.288 us    |    rt_spin_unlock();
  4)   0.144 us    |    kfree();
  4)   1.216 us    |    alloc_worker();
  4)               |    kthread_create_on_node() {
  4)               |      /* sched_switch: prev_comm=kworker/4:0H
prev_pid=1160 prev_prio=100 prev_state=D ==> next_comm=hiccups_x86-64
next_pid=1335 next_prio=120 */
 ------------------------------------------
  4)  kworker-1160  =>  hiccups-1335
 ------------------------------------------

  4) + 33.354 us   |    } /* raw_irqentry_exit_cond_resched */
  4)   1.334 us    |    schedule();
  4)   0.151 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.133 us    |    mem_cgroup_handle_over_high();
  4)   0.203 us    |    blkcg_maybe_throttle_current();
  4)   0.300 us    |    __rseq_handle_notify_resume();
  4)   0.113 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.114 us    |    fpregs_assert_state_consistent();
  4)   0.142 us    |    switch_fpu_return();
  4) + 39.502 us   |  } /* exit_to_user_mode_prepare */
  4)               |  vtime_user_enter() {
  4)   0.177 us    |    get_vtime_delta();
  4)   0.390 us    |  }
  4)               |  vtime_user_exit() {
  4)   0.171 us    |    get_vtime_delta();
  4)   0.437 us    |  }
  4)               |  irq_enter_rcu() {
  4)   0.167 us    |    tick_irq_enter();
  4)   0.393 us    |  }
  4)               |  __sysvec_call_function_single() {
  4)               |  /* call_function_single_entry: vector=251 */
  4)   0.348 us    |    generic_smp_call_function_single_interrupt();
  4)               |  /* call_function_single_exit: vector=251 */
  4)   0.908 us    |  }
  4)               |  irq_exit_rcu() {
  4)   0.304 us    |    idle_cpu();
  4)               |    tick_nohz_irq_exit() {
  4)               |      /* tick_stop: success=0 dependency=SCHED */
  4)   0.646 us    |    }
  4)   2.154 us    |  }
  4)               |  exit_to_user_mode_prepare() {
  4)   0.129 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.219 us    |    __rcu_irq_enter_check_tick();
  4)               |  /* reschedule_entry: vector=253 */
  4)               |  /* reschedule_exit: vector=253 */
  4)               |    raw_irqentry_exit_cond_resched() {
  4)               |      /* sched_switch: prev_comm=hiccups_x86-64
prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:0H
next_pid=1160 next_prio=100 */
 ------------------------------------------
  4)  hiccups-1335  =>  kworker-1160
 ------------------------------------------

  4) + 96.177 us   |    } /* kthread_create_on_node */
  4)   1.071 us    |    set_user_nice();
  4)   0.999 us    |    kthread_bind_mask();
  4)   0.436 us    |    worker_attach_to_pool();
  4)   0.112 us    |    _raw_spin_lock_irq();
  4)   0.146 us    |    worker_enter_idle();
  4)   2.932 us    |    wake_up_process();
  4)   0.114 us    |    _raw_spin_unlock_irq();
  4) ! 107.063 us  |  } /* create_worker */
  4)               |  del_timer_sync() {
  4)   0.154 us    |    lock_timer_base();
  4)   0.132 us    |    detach_if_pending();
  4)   0.099 us    |    _raw_spin_unlock_irqrestore();
  4)   0.854 us    |  }
  4)   0.103 us    |  _raw_spin_lock_irq();
  4)               |  rcuwait_wake_up() {
  4)   0.097 us    |    __rcu_read_lock();
  4)   0.099 us    |    __rcu_read_unlock();
  4)   0.511 us    |  }
  4)               |  process_one_work() {
  4)   0.100 us    |    _raw_spin_unlock_irq();
  4)   8.334 us    |    blk_mq_run_work_fn();
  4)   0.133 us    |    _raw_spin_lock_irq();
  4)   0.118 us    |    pwq_dec_nr_in_flight();
  4)   9.496 us    |  }
  4)   0.114 us    |  worker_enter_idle();
  4)   0.108 us    |  _raw_spin_unlock_irq();
  4)               |  schedule() {
  4)   0.148 us    |    wq_worker_sleeping();
  4)   0.114 us    |    rcu_note_context_switch();
  4)   0.100 us    |    _raw_spin_lock();
  4)   0.125 us    |    update_rq_clock();
  4)   0.817 us    |    dequeue_task_fair();
  4)   0.550 us    |    pick_next_task_fair();
  4)   0.435 us    |    psi_task_switch();
  4)               |    __traceiter_sched_switch() {
  4)               |      /* sched_switch: prev_comm=kworker/4:0H
prev_pid=1160 prev_prio=100 prev_state=I ==> next_comm=kworker/4:1H
next_pid=1348 next_prio=100 */
  4)   0.315 us    |    }
  4)   0.114 us    |    enter_lazy_tlb();
 ------------------------------------------
  4)  kworker-1160  =>  kworker-1348
 ------------------------------------------

  4)               |  finish_task_switch.isra.0() {
  4)   0.289 us    |    vtime_task_switch_generic();
  4)   0.111 us    |    __tick_nohz_task_switch();
  4)   0.791 us    |  }
  4)               |  __kthread_parkme() {
  4)   0.106 us    |    _raw_spin_lock_irqsave();
  4)   0.098 us    |    _raw_spin_unlock_irqrestore();
  4)   0.611 us    |  }
  4)               |  worker_thread() {
  4)   0.107 us    |    mutex_lock();
  4)   0.107 us    |    mutex_unlock();
  4)   0.101 us    |    _raw_spin_lock_irq();
  4)   0.128 us    |    worker_enter_idle();
  4)   0.104 us    |    _raw_spin_unlock_irq();
  4)               |    schedule() {
  4)               |      /* sched_switch: prev_comm=kworker/4:1H
prev_pid=1348 prev_prio=100 prev_state=I ==> next_comm=hiccups_x86-64
next_pid=1335 next_prio=120 */
 ------------------------------------------
  4)  kworker-1348  =>  hiccups-1335
 ------------------------------------------

  4) + 36.544 us   |    } /* raw_irqentry_exit_cond_resched */
  4)   1.160 us    |    schedule();
  4)   0.138 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.125 us    |    mem_cgroup_handle_over_high();
  4)   0.113 us    |    blkcg_maybe_throttle_current();
  4)   0.262 us    |    __rseq_handle_notify_resume();
  4)   0.112 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.111 us    |    fpregs_assert_state_consistent();
  4)   0.124 us    |    switch_fpu_return();
  4) + 41.539 us   |  } /* exit_to_user_mode_prepare */
  4)               |  vtime_user_enter() {
  4)   0.189 us    |    get_vtime_delta();
  4)   0.390 us    |  }
  4)               |  vtime_user_exit() {
  4)   0.178 us    |    get_vtime_delta();
  4)   0.413 us    |  }
  4)               |  irq_enter_rcu() {
  4)   0.155 us    |    tick_irq_enter();
  4)   0.428 us    |  }
  4)               |  __sysvec_call_function_single() {
  4)               |  /* call_function_single_entry: vector=251 */
  4)   0.329 us    |    generic_smp_call_function_single_interrupt();
  4)               |  /* call_function_single_exit: vector=251 */
  4)   1.158 us    |  }
  4)               |  irq_exit_rcu() {
  4)   0.190 us    |    idle_cpu();
  4)               |    tick_nohz_irq_exit() {
  4)               |      /* tick_stop: success=0 dependency=SCHED */
  4)   0.518 us    |    }
  4)   1.128 us    |  }
  4)               |  exit_to_user_mode_prepare() {
  4)   0.128 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.109 us    |    __rcu_irq_enter_check_tick();
  4)               |  /* reschedule_entry: vector=253 */
  4)               |  /* reschedule_exit: vector=253 */
  4)               |    raw_irqentry_exit_cond_resched() {
  4)               |      /* sched_switch: prev_comm=hiccups_x86-64
prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:1H
next_pid=1348 next_prio=100 */
 ------------------------------------------
  4)  hiccups-1335  =>  kworker-1348
 ------------------------------------------

  4) ! 277.107 us  |    } /* schedule */
  4)   0.190 us    |    _raw_spin_lock_irq();
  4)   6.380 us    |    process_one_work();
  4)   0.128 us    |    worker_enter_idle();
  4)   0.115 us    |    _raw_spin_unlock_irq();
  4)               |    schedule() {
  4)               |      /* sched_switch: prev_comm=kworker/4:1H
prev_pid=1348 prev_prio=100 prev_state=I ==> next_comm=hiccups_x86-64
next_pid=1335 next_prio=120 */
 ------------------------------------------
  4)  kworker-1348  =>  hiccups-1335
 ------------------------------------------

  4) + 15.222 us   |    } /* raw_irqentry_exit_cond_resched */
  4)   1.086 us    |    schedule();
  4)   0.115 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.100 us    |    mem_cgroup_handle_over_high();
  4)   0.106 us    |    blkcg_maybe_throttle_current();
  4)   0.148 us    |    __rseq_handle_notify_resume();
  4)   0.116 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.119 us    |    fpregs_assert_state_consistent();
  4)   0.123 us    |    switch_fpu_return();
  4) + 19.265 us   |  }
  4)               |  vtime_user_enter() {
  4)   0.160 us    |    get_vtime_delta();
  4)   0.366 us    |  }
  4)               |  vtime_user_exit() {
  4)   0.173 us    |    get_vtime_delta();
  4)   0.399 us    |  }
  4)               |  irq_enter_rcu() {
  4)   0.152 us    |    tick_irq_enter();
  4)   0.380 us    |  }
  4)               |  __sysvec_apic_timer_interrupt() {
  4)               |  /* local_timer_entry: vector=236 */
  4)   8.901 us    |    hrtimer_interrupt();
  4)               |  /* local_timer_exit: vector=236 */
  4)   9.422 us    |  }
  4)               |  irq_exit_rcu() {
  4)   0.394 us    |    wake_up_process();
  4)   0.137 us    |    idle_cpu();
  4)   0.677 us    |    tick_nohz_irq_exit();
  4)   1.797 us    |  }
  4)               |  exit_to_user_mode_prepare() {
  4)   0.112 us    |    rcu_nocb_flush_deferred_wakeup();
  4)               |    schedule() {
  4)               |      /* sched_switch: prev_comm=hiccups_x86-64
prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=ktimers/4
next_pid=1157 next_prio=98 */
 ------------------------------------------
  4)  hiccups-1335  =>  ktimers-1157
 ------------------------------------------

  4)               |  finish_task_switch.isra.0() {
  4)   0.564 us    |    vtime_task_switch_generic();
  4)   0.210 us    |    __tick_nohz_task_switch();
  4)   1.321 us    |  }
  4)   0.213 us    |  kthread_should_stop();
  4)   0.135 us    |  kthread_should_park();
  4)   0.118 us    |  timersd_should_run();
  4)               |  run_timersd() {
  4)   0.409 us    |    __local_bh_disable_ip();
  4)               |    __do_softirq() {
  4)               |      /* timer_expire_entry:
timer=00000000b281d363 function=delayed_work_timer_fn now=4295134337
baseclk=4295016960 */
  4)   4.507 us    |    }
  4)   0.355 us    |    __local_bh_enable();
  4)   5.864 us    |  }
  4)   0.129 us    |  kthread_should_stop();
  4)   0.102 us    |  kthread_should_park();
  4)   0.099 us    |  timersd_should_run();
  4)               |  schedule() {
  4)   0.132 us    |    rcu_note_context_switch();
  4)   0.114 us    |    _raw_spin_lock();
  4)   0.122 us    |    update_rq_clock();
  4)   1.191 us    |    dequeue_task_rt();
  4)   0.184 us    |    balance_rt();
  4)   0.120 us    |    balance_fair();
  4)   0.233 us    |    put_prev_task_rt();
  4)   0.121 us    |    pick_next_task_stop();
  4)   0.121 us    |    pick_next_task_dl();
  4)   0.175 us    |    pick_next_task_rt();
  4)   0.442 us    |    __pick_next_task_fair();
  4)   0.396 us    |    psi_task_switch();
  4)               |    __traceiter_sched_switch() {
  4)               |      /* sched_switch: prev_comm=ktimers/4
prev_pid=1157 prev_prio=98 prev_state=S ==> next_comm=kworker/4:1
next_pid=1163 next_prio=120 */
  4)   0.269 us    |    }
  4)   0.114 us    |    enter_lazy_tlb();
 ------------------------------------------
  4)  ktimers-1157  =>  kworker-1163
 ------------------------------------------

  4)               |  finish_task_switch.isra.0() {
  4)   0.288 us    |    vtime_task_switch_generic();
  4)   0.116 us    |    __tick_nohz_task_switch();
  4)   0.111 us    |    __rcu_irq_enter_check_tick();
  4)   0.176 us    |    irq_enter_rcu();
  4)               |    __sysvec_irq_work() {
  4)               |      /* irq_work_entry: vector=246 */
  4)               |      /* irq_work_exit: vector=246 */
  4)   0.907 us    |    }
  4)               |    irq_exit_rcu() {
  4)               |      /* tick_stop: success=0 dependency=SCHED */
  4)   0.554 us    |    }
  4)   0.123 us    |    raw_irqentry_exit_cond_resched();
  4)   3.596 us    |  }
  4)               |  wq_worker_running() {
  4)   0.101 us    |    kthread_data();
  4)   0.344 us    |  }
  4)   0.121 us    |  _raw_spin_lock_irq();
  4)               |  process_one_work() {
  4)   0.112 us    |    _raw_spin_unlock_irq();
  4)   1.246 us    |    vmstat_update();
  4)   0.107 us    |    _raw_spin_lock_irq();
  4)   0.108 us    |    pwq_dec_nr_in_flight();
  4)   2.356 us    |  }
  4)   0.099 us    |  worker_enter_idle();
  4)   0.097 us    |  _raw_spin_unlock_irq();
  4)               |  schedule() {
  4)   0.153 us    |    wq_worker_sleeping();
  4)   0.194 us    |    rcu_note_context_switch();
  4)   0.413 us    |    _raw_spin_lock();
  4)   0.134 us    |    update_rq_clock();
  4)   0.702 us    |    dequeue_task_fair();
  4)   1.123 us    |    pick_next_task_fair();
  4)   0.687 us    |    psi_task_switch();
  4)               |    __traceiter_sched_switch() {
  4)               |      /* sched_switch: prev_comm=kworker/4:1
prev_pid=1163 prev_prio=120 prev_state=I ==> next_comm=hiccups_x86-64
next_pid=1335 next_prio=120 */
  4)   0.246 us    |    }
  4)   0.139 us    |    switch_mm_irqs_off();
 ------------------------------------------
  4)  kworker-1163  =>  hiccups-1335
 ------------------------------------------

  4) + 32.824 us   |    } /* schedule */
  4)   0.113 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.110 us    |    mem_cgroup_handle_over_high();
  4)   0.104 us    |    blkcg_maybe_throttle_current();
  4)   0.151 us    |    __rseq_handle_notify_resume();
  4)   0.116 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.097 us    |    fpregs_assert_state_consistent();
  4)   0.102 us    |    switch_fpu_return();
  4) + 34.991 us   |  } /* exit_to_user_mode_prepare */
  4)               |  vtime_user_enter() {
  4)   0.174 us    |    get_vtime_delta();
  4)   0.388 us    |  }
  4)               |  vtime_user_exit() {
  4)   0.188 us    |    get_vtime_delta();
  4)   0.456 us    |    account_user_time();
  4)   1.118 us    |  }
  4)               |  irq_enter_rcu() {
  4)   0.158 us    |    tick_irq_enter();
  4)   0.391 us    |  }
  4)               |  __sysvec_apic_timer_interrupt() {
  4)               |  /* local_timer_entry: vector=236 */
  4)   5.066 us    |    hrtimer_interrupt();
  4)               |  /* local_timer_exit: vector=236 */
  4)   5.596 us    |  }
  4)               |  irq_exit_rcu() {
  4)   0.111 us    |    idle_cpu();
  4)               |    tick_nohz_irq_exit() {
  4)               |      /* tick_stop: success=1 dependency=NONE */
  4)   5.546 us    |    }
  4)   6.018 us    |  }
  4)               |  exit_to_user_mode_prepare() {
  4)   0.118 us    |    rcu_nocb_flush_deferred_wakeup();
  4)   0.128 us    |    fpregs_assert_state_consistent();
  4)   0.623 us    |  }
  4)               |  vtime_user_enter() {
  4)   0.158 us    |    get_vtime_delta();
  4)   0.364 us    |  }


> Unless CPU 4 spends absolutely no time in the kernel, and never ever
> has more than one runnable task at any given time.  And doesn't have
> any other excuse to take a scheduling-clock interrupt.  Otherwise, CPU
> 4's rcuc kthread will burn at least a little bit of time.  The more time
> CPU 4 spends in the kernel and the more time CPU 4 spends with more than
> one runnable task, the more time CPU 4's rcuc kthread will burn.
>
I am exactly trying to achieve this (again, but now on 6.1.*-rt* latest).

> > > I see this in the kernel boot log, and I wonder if this is related to my issue:
> > >
> > > [    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.
>
> This simply means that the processing that would have been done in
> softirq (for example, by ksoftirqd) will now be done by rcuc.
>
Note that my kernel has the PREEMPT_RT patch, so I would expect any
softirq to be a kthread anyway.

> > > tree.c:102:static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);
> > >
> > > Could this cause the rcuc threads to become active, even with the
> > > "rcu_nocbs=all rcu_nocb_poll" setting?
>
> You can override this with rcutree.use_softirq=1 if you wish.  But that
> will simply move the processing from the rcuc kthread to softirq context.
>
...which is a kthread after all in PREEMPT_RT,  right?

> > That's unexpected. The RCU load should be moved from rcuc to rcuo* kthreads.
> >
> > Do you have CONFIG_RCU_STRICT_GRACE_PERIOD=y perhaps?
> >
No. I do have some _TIMEOUT related settings, that I am not sure of if
they are rcuc side or rcuo* side:

162:CONFIG_RCU_EXPERT=y
172:CONFIG_RCU_STALL_COMMON=y
173:CONFIG_RCU_NEED_SEGCBLIST=y
174:CONFIG_RCU_FANOUT=64
175:CONFIG_RCU_FANOUT_LEAF=16
176:CONFIG_RCU_BOOST=y
177:CONFIG_RCU_BOOST_DELAY=500
178:# CONFIG_RCU_EXP_KTHREAD is not set
179:CONFIG_RCU_NOCB_CPU=y
180:# CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set
181:CONFIG_RCU_NOCB_CPU_CB_BOOST=y
10445:# CONFIG_RCU_SCALE_TEST is not set
10446:# CONFIG_RCU_TORTURE_TEST is not set
10447:# CONFIG_RCU_REF_SCALE_TEST is not set
10448:CONFIG_RCU_CPU_STALL_TIMEOUT=21
10449:CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
10450:# CONFIG_RCU_TRACE is not set
10451:# CONFIG_RCU_EQS_DEBUG is not set

> > Also do you really need rcu_nocb_poll? I would expect this option to be useful
> > for debugging only, though I'm not aware of every usecases.
>
"Setting rcu_nocb_poll to relieve each CPU from the responsibility
awakening their RCU offload threads".

I hoped this would decrease the CPU specific RCU (rcuc) side to become
more silent on the timer. Or is it purely synchronous on the event
anyway?

> I have the same questions.  ;-)
>
Good, that means -EUSER is even more likely.  (five nines? :-) )


Regards,

Leon.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll"
  2024-01-22 14:32     ` Leon Woestenberg
@ 2024-01-23 18:32       ` Paul E. McKenney
  2024-08-16 14:06       ` Frederic Weisbecker
  1 sibling, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2024-01-23 18:32 UTC (permalink / raw)
  To: Leon Woestenberg; +Cc: Frederic Weisbecker, linux-rt-users, rcu

On Mon, Jan 22, 2024 at 03:32:44PM +0100, Leon Woestenberg wrote:
> Hi Frederic, Paul, all,
> 
> On Sat, Jan 20, 2024 at 1:37 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Fri, Jan 19, 2024 at 11:08:23PM +0100, Frederic Weisbecker wrote:
> > > Le Fri, Jan 19, 2024 at 04:38:23PM +0100, Leon Woestenberg a écrit :
> > > > The isolated core should run a single non-kernel user-space-busy-loop
> > > > thread (SCHED_OTHER, not FIFO or RR).
> > > >
> > > > tsc=reliable isolcpus=domain,managed_irq,4-7 nohz_full=4-7
> > > > rcu_nocbs=all rcu_nocb_poll
> > > > /proc/irq/*/smp_affinity and cpumasks in
> > > > /sys/devices/virtual/workqueue/ all show 000f or less (CPUs 0-3)
> > > >
> > > > Using osnoise I see rcuc/4 popping up as a cause of noise, but my
> > > > kernel command line should prevent that, right?
> >
> > Well, your choice is rcuc or ksoftirqd.  Either way, there is some per-CPU
> > work that must be done.  But to Frederic's later point, not very much
> > work.  In particular, RCU callbacks should be invoked on an rcuoc kthread.
> >
> Yes, that's what I expected (and what I got to work on an earlier
> system with an older kernel).
> 
> I went back to running ftrace against the isolated CPU, and I see some
> kworker work.
> Sometimes  blk_mq_run_work_func().
> 
> (I think seeing rcuc in my osnoise trace was coincidental.)
> 
> What would be the method to find the root cause more clearly? Should I
> deepen my ftrace function call depth?

Is the application thread running on CPU 4 doing any kernel work?
Alternatively, is there more than one runnable task on CPU 4?

> I am getting one interruption per 8 minutes on the isolated CPU. Here
> is an example trace that ran 30 minutes:
> 
> cat trace.4
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>   4)               |  vtime_user_exit() {
>   4)   0.634 us    |    get_vtime_delta();
>   4)   1.799 us    |    account_user_time();
>   4)   5.808 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.640 us    |    tick_irq_enter();
>   4)   0.914 us    |  }
>   4)               |  __sysvec_call_function_single() {
>   4)               |  /* call_function_single_entry: vector=251 */
>   4)   0.513 us    |    generic_smp_call_function_single_interrupt();
>   4)               |  /* call_function_single_exit: vector=251 */
>   4)   2.136 us    |  }

CPU 4 was the target of an smp_call_function_single().  Which just
means that some other CPU needs CPU 4 to do something for it.  The
csd_function_entry trace event, if enabled, will give you the name of the
function called, which might help you work out what needed to be adjusted.

>   4)               |  irq_exit_rcu() {
>   4)   0.203 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   7.564 us    |    }
>   4)   8.285 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.217 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.187 us    |    __rcu_irq_enter_check_tick();
>   4)               |  /* reschedule_entry: vector=253 */
>   4)               |  /* reschedule_exit: vector=253 */
>   4)               |    raw_irqentry_exit_cond_resched() {
>   4)               |      /* local_timer_entry: vector=236 */
>   4)               |      /* local_timer_exit: vector=236 */
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:0H
> next_pid=1160 next_prio=100 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  kworker-1160
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.475 us    |    vtime_task_switch_generic();
>   4)   0.126 us    |    __tick_nohz_task_switch();
>   4)   1.069 us    |  }
>   4)               |  wq_worker_running() {
>   4)   0.223 us    |    kthread_data();
>   4)   0.547 us    |  }
>   4)   0.180 us    |  _raw_spin_lock_irq();
>   4)   0.115 us    |  _raw_spin_unlock_irq();
>   4)               |  mod_timer() {
>   4)   0.241 us    |    lock_timer_base();
>   4)   0.104 us    |    detach_if_pending();
>   4)   0.123 us    |    _raw_spin_lock();
>   4)   0.120 us    |    calc_wheel_index();
>   4)   0.115 us    |    enqueue_timer();
>   4)   0.134 us    |    _raw_spin_unlock_irqrestore();
>   4)   1.815 us    |  }
>   4)               |  create_worker() {
>   4)   0.238 us    |    rt_spin_lock();
>   4)   0.288 us    |    rt_spin_unlock();
>   4)   0.144 us    |    kfree();
>   4)   1.216 us    |    alloc_worker();
>   4)               |    kthread_create_on_node() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:0H

The workqueue_execute_start trace event will tell you which function
is the workqueue handler, though that would not have helped here.

> prev_pid=1160 prev_prio=100 prev_state=D ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>  ------------------------------------------
>   4)  kworker-1160  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 33.354 us   |    } /* raw_irqentry_exit_cond_resched */
>   4)   1.334 us    |    schedule();
>   4)   0.151 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.133 us    |    mem_cgroup_handle_over_high();
>   4)   0.203 us    |    blkcg_maybe_throttle_current();
>   4)   0.300 us    |    __rseq_handle_notify_resume();
>   4)   0.113 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.114 us    |    fpregs_assert_state_consistent();
>   4)   0.142 us    |    switch_fpu_return();
>   4) + 39.502 us   |  } /* exit_to_user_mode_prepare */
>   4)               |  vtime_user_enter() {
>   4)   0.177 us    |    get_vtime_delta();
>   4)   0.390 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.171 us    |    get_vtime_delta();
>   4)   0.437 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.167 us    |    tick_irq_enter();
>   4)   0.393 us    |  }
>   4)               |  __sysvec_call_function_single() {
>   4)               |  /* call_function_single_entry: vector=251 */
>   4)   0.348 us    |    generic_smp_call_function_single_interrupt();
>   4)               |  /* call_function_single_exit: vector=251 */

Another smp_call_function_single(), same tracepoint.

>   4)   0.908 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.304 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   0.646 us    |    }
>   4)   2.154 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.129 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.219 us    |    __rcu_irq_enter_check_tick();
>   4)               |  /* reschedule_entry: vector=253 */
>   4)               |  /* reschedule_exit: vector=253 */
>   4)               |    raw_irqentry_exit_cond_resched() {
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:0H
> next_pid=1160 next_prio=100 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  kworker-1160
>  ------------------------------------------
> 
>   4) + 96.177 us   |    } /* kthread_create_on_node */
>   4)   1.071 us    |    set_user_nice();
>   4)   0.999 us    |    kthread_bind_mask();
>   4)   0.436 us    |    worker_attach_to_pool();
>   4)   0.112 us    |    _raw_spin_lock_irq();
>   4)   0.146 us    |    worker_enter_idle();
>   4)   2.932 us    |    wake_up_process();
>   4)   0.114 us    |    _raw_spin_unlock_irq();
>   4) ! 107.063 us  |  } /* create_worker */
>   4)               |  del_timer_sync() {
>   4)   0.154 us    |    lock_timer_base();
>   4)   0.132 us    |    detach_if_pending();
>   4)   0.099 us    |    _raw_spin_unlock_irqrestore();
>   4)   0.854 us    |  }
>   4)   0.103 us    |  _raw_spin_lock_irq();
>   4)               |  rcuwait_wake_up() {
>   4)   0.097 us    |    __rcu_read_lock();
>   4)   0.099 us    |    __rcu_read_unlock();
>   4)   0.511 us    |  }
>   4)               |  process_one_work() {
>   4)   0.100 us    |    _raw_spin_unlock_irq();
>   4)   8.334 us    |    blk_mq_run_work_fn();

And in this case the workqueue handler is blk_mq_run_work_fn(), as
you noted.  Is I/O being initiated on CPU 4?  If so, it might help
to move that I/O to one of CPUs 0-3.

>   4)   0.133 us    |    _raw_spin_lock_irq();
>   4)   0.118 us    |    pwq_dec_nr_in_flight();
>   4)   9.496 us    |  }
>   4)   0.114 us    |  worker_enter_idle();
>   4)   0.108 us    |  _raw_spin_unlock_irq();
>   4)               |  schedule() {
>   4)   0.148 us    |    wq_worker_sleeping();
>   4)   0.114 us    |    rcu_note_context_switch();
>   4)   0.100 us    |    _raw_spin_lock();
>   4)   0.125 us    |    update_rq_clock();
>   4)   0.817 us    |    dequeue_task_fair();
>   4)   0.550 us    |    pick_next_task_fair();
>   4)   0.435 us    |    psi_task_switch();
>   4)               |    __traceiter_sched_switch() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:0H
> prev_pid=1160 prev_prio=100 prev_state=I ==> next_comm=kworker/4:1H
> next_pid=1348 next_prio=100 */
>   4)   0.315 us    |    }
>   4)   0.114 us    |    enter_lazy_tlb();
>  ------------------------------------------
>   4)  kworker-1160  =>  kworker-1348
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.289 us    |    vtime_task_switch_generic();
>   4)   0.111 us    |    __tick_nohz_task_switch();
>   4)   0.791 us    |  }
>   4)               |  __kthread_parkme() {
>   4)   0.106 us    |    _raw_spin_lock_irqsave();
>   4)   0.098 us    |    _raw_spin_unlock_irqrestore();
>   4)   0.611 us    |  }
>   4)               |  worker_thread() {
>   4)   0.107 us    |    mutex_lock();
>   4)   0.107 us    |    mutex_unlock();
>   4)   0.101 us    |    _raw_spin_lock_irq();
>   4)   0.128 us    |    worker_enter_idle();
>   4)   0.104 us    |    _raw_spin_unlock_irq();
>   4)               |    schedule() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:1H
> prev_pid=1348 prev_prio=100 prev_state=I ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>  ------------------------------------------
>   4)  kworker-1348  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 36.544 us   |    } /* raw_irqentry_exit_cond_resched */
>   4)   1.160 us    |    schedule();
>   4)   0.138 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.125 us    |    mem_cgroup_handle_over_high();
>   4)   0.113 us    |    blkcg_maybe_throttle_current();
>   4)   0.262 us    |    __rseq_handle_notify_resume();
>   4)   0.112 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.111 us    |    fpregs_assert_state_consistent();
>   4)   0.124 us    |    switch_fpu_return();
>   4) + 41.539 us   |  } /* exit_to_user_mode_prepare */
>   4)               |  vtime_user_enter() {
>   4)   0.189 us    |    get_vtime_delta();
>   4)   0.390 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.178 us    |    get_vtime_delta();
>   4)   0.413 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.155 us    |    tick_irq_enter();
>   4)   0.428 us    |  }
>   4)               |  __sysvec_call_function_single() {
>   4)               |  /* call_function_single_entry: vector=251 */
>   4)   0.329 us    |    generic_smp_call_function_single_interrupt();
>   4)               |  /* call_function_single_exit: vector=251 */
>   4)   1.158 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.190 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   0.518 us    |    }
>   4)   1.128 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.128 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.109 us    |    __rcu_irq_enter_check_tick();
>   4)               |  /* reschedule_entry: vector=253 */
>   4)               |  /* reschedule_exit: vector=253 */
>   4)               |    raw_irqentry_exit_cond_resched() {
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:1H
> next_pid=1348 next_prio=100 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  kworker-1348
>  ------------------------------------------
> 
>   4) ! 277.107 us  |    } /* schedule */
>   4)   0.190 us    |    _raw_spin_lock_irq();
>   4)   6.380 us    |    process_one_work();
>   4)   0.128 us    |    worker_enter_idle();
>   4)   0.115 us    |    _raw_spin_unlock_irq();
>   4)               |    schedule() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:1H
> prev_pid=1348 prev_prio=100 prev_state=I ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>  ------------------------------------------
>   4)  kworker-1348  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 15.222 us   |    } /* raw_irqentry_exit_cond_resched */
>   4)   1.086 us    |    schedule();
>   4)   0.115 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.100 us    |    mem_cgroup_handle_over_high();
>   4)   0.106 us    |    blkcg_maybe_throttle_current();
>   4)   0.148 us    |    __rseq_handle_notify_resume();
>   4)   0.116 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.119 us    |    fpregs_assert_state_consistent();
>   4)   0.123 us    |    switch_fpu_return();
>   4) + 19.265 us   |  }
>   4)               |  vtime_user_enter() {
>   4)   0.160 us    |    get_vtime_delta();
>   4)   0.366 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.173 us    |    get_vtime_delta();
>   4)   0.399 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.152 us    |    tick_irq_enter();
>   4)   0.380 us    |  }
>   4)               |  __sysvec_apic_timer_interrupt() {
>   4)               |  /* local_timer_entry: vector=236 */
>   4)   8.901 us    |    hrtimer_interrupt();
>   4)               |  /* local_timer_exit: vector=236 */
>   4)   9.422 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.394 us    |    wake_up_process();
>   4)   0.137 us    |    idle_cpu();
>   4)   0.677 us    |    tick_nohz_irq_exit();
>   4)   1.797 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.112 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)               |    schedule() {
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=ktimers/4
> next_pid=1157 next_prio=98 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  ktimers-1157
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.564 us    |    vtime_task_switch_generic();
>   4)   0.210 us    |    __tick_nohz_task_switch();
>   4)   1.321 us    |  }
>   4)   0.213 us    |  kthread_should_stop();
>   4)   0.135 us    |  kthread_should_park();
>   4)   0.118 us    |  timersd_should_run();
>   4)               |  run_timersd() {
>   4)   0.409 us    |    __local_bh_disable_ip();
>   4)               |    __do_softirq() {
>   4)               |      /* timer_expire_entry:
> timer=00000000b281d363 function=delayed_work_timer_fn now=4295134337
> baseclk=4295016960 */
>   4)   4.507 us    |    }
>   4)   0.355 us    |    __local_bh_enable();
>   4)   5.864 us    |  }
>   4)   0.129 us    |  kthread_should_stop();
>   4)   0.102 us    |  kthread_should_park();
>   4)   0.099 us    |  timersd_should_run();
>   4)               |  schedule() {
>   4)   0.132 us    |    rcu_note_context_switch();
>   4)   0.114 us    |    _raw_spin_lock();
>   4)   0.122 us    |    update_rq_clock();
>   4)   1.191 us    |    dequeue_task_rt();
>   4)   0.184 us    |    balance_rt();
>   4)   0.120 us    |    balance_fair();
>   4)   0.233 us    |    put_prev_task_rt();
>   4)   0.121 us    |    pick_next_task_stop();
>   4)   0.121 us    |    pick_next_task_dl();
>   4)   0.175 us    |    pick_next_task_rt();
>   4)   0.442 us    |    __pick_next_task_fair();
>   4)   0.396 us    |    psi_task_switch();
>   4)               |    __traceiter_sched_switch() {
>   4)               |      /* sched_switch: prev_comm=ktimers/4
> prev_pid=1157 prev_prio=98 prev_state=S ==> next_comm=kworker/4:1
> next_pid=1163 next_prio=120 */
>   4)   0.269 us    |    }
>   4)   0.114 us    |    enter_lazy_tlb();
>  ------------------------------------------
>   4)  ktimers-1157  =>  kworker-1163
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.288 us    |    vtime_task_switch_generic();
>   4)   0.116 us    |    __tick_nohz_task_switch();
>   4)   0.111 us    |    __rcu_irq_enter_check_tick();
>   4)   0.176 us    |    irq_enter_rcu();
>   4)               |    __sysvec_irq_work() {
>   4)               |      /* irq_work_entry: vector=246 */
>   4)               |      /* irq_work_exit: vector=246 */
>   4)   0.907 us    |    }
>   4)               |    irq_exit_rcu() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   0.554 us    |    }
>   4)   0.123 us    |    raw_irqentry_exit_cond_resched();
>   4)   3.596 us    |  }
>   4)               |  wq_worker_running() {
>   4)   0.101 us    |    kthread_data();
>   4)   0.344 us    |  }
>   4)   0.121 us    |  _raw_spin_lock_irq();
>   4)               |  process_one_work() {
>   4)   0.112 us    |    _raw_spin_unlock_irq();
>   4)   1.246 us    |    vmstat_update();
>   4)   0.107 us    |    _raw_spin_lock_irq();
>   4)   0.108 us    |    pwq_dec_nr_in_flight();
>   4)   2.356 us    |  }

This is your kworker doing work, if I am not too confused.
Preceded by a context switch.

>   4)   0.099 us    |  worker_enter_idle();
>   4)   0.097 us    |  _raw_spin_unlock_irq();
>   4)               |  schedule() {
>   4)   0.153 us    |    wq_worker_sleeping();
>   4)   0.194 us    |    rcu_note_context_switch();
>   4)   0.413 us    |    _raw_spin_lock();
>   4)   0.134 us    |    update_rq_clock();
>   4)   0.702 us    |    dequeue_task_fair();
>   4)   1.123 us    |    pick_next_task_fair();
>   4)   0.687 us    |    psi_task_switch();
>   4)               |    __traceiter_sched_switch() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:1
> prev_pid=1163 prev_prio=120 prev_state=I ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>   4)   0.246 us    |    }
>   4)   0.139 us    |    switch_mm_irqs_off();
>  ------------------------------------------
>   4)  kworker-1163  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 32.824 us   |    } /* schedule */
>   4)   0.113 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.110 us    |    mem_cgroup_handle_over_high();
>   4)   0.104 us    |    blkcg_maybe_throttle_current();
>   4)   0.151 us    |    __rseq_handle_notify_resume();
>   4)   0.116 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.097 us    |    fpregs_assert_state_consistent();
>   4)   0.102 us    |    switch_fpu_return();
>   4) + 34.991 us   |  } /* exit_to_user_mode_prepare */
>   4)               |  vtime_user_enter() {
>   4)   0.174 us    |    get_vtime_delta();
>   4)   0.388 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.188 us    |    get_vtime_delta();
>   4)   0.456 us    |    account_user_time();
>   4)   1.118 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.158 us    |    tick_irq_enter();
>   4)   0.391 us    |  }
>   4)               |  __sysvec_apic_timer_interrupt() {
>   4)               |  /* local_timer_entry: vector=236 */
>   4)   5.066 us    |    hrtimer_interrupt();
>   4)               |  /* local_timer_exit: vector=236 */
>   4)   5.596 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.111 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=1 dependency=NONE */
>   4)   5.546 us    |    }
>   4)   6.018 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.118 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.128 us    |    fpregs_assert_state_consistent();
>   4)   0.623 us    |  }
>   4)               |  vtime_user_enter() {
>   4)   0.158 us    |    get_vtime_delta();
>   4)   0.364 us    |  }

This looks like the kernel exiting to userspace?

> > Unless CPU 4 spends absolutely no time in the kernel, and never ever
> > has more than one runnable task at any given time.  And doesn't have
> > any other excuse to take a scheduling-clock interrupt.  Otherwise, CPU
> > 4's rcuc kthread will burn at least a little bit of time.  The more time
> > CPU 4 spends in the kernel and the more time CPU 4 spends with more than
> > one runnable task, the more time CPU 4's rcuc kthread will burn.
> >
> I am exactly trying to achieve this (again, but now on 6.1.*-rt* latest).
> 
> > > > I see this in the kernel boot log, and I wonder if this is related to my issue:
> > > >
> > > > [    0.063518] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.
> >
> > This simply means that the processing that would have been done in
> > softirq (for example, by ksoftirqd) will now be done by rcuc.
> >
> Note that my kernel has the PREEMPT_RT patch, so I would expect any
> softirq to be a kthread anyway.
> 
> > > > tree.c:102:static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);
> > > >
> > > > Could this cause the rcuc threads to become active, even with the
> > > > "rcu_nocbs=all rcu_nocb_poll" setting?
> >
> > You can override this with rcutree.use_softirq=1 if you wish.  But that
> > will simply move the processing from the rcuc kthread to softirq context.
> >
> ...which is a kthread after all in PREEMPT_RT,  right?
> 
> > > That's unexpected. The RCU load should be moved from rcuc to rcuo* kthreads.
> > >
> > > Do you have CONFIG_RCU_STRICT_GRACE_PERIOD=y perhaps?
> > >
> No. I do have some _TIMEOUT related settings, that I am not sure of if
> they are rcuc side or rcuo* side:
> 
> 162:CONFIG_RCU_EXPERT=y
> 172:CONFIG_RCU_STALL_COMMON=y
> 173:CONFIG_RCU_NEED_SEGCBLIST=y
> 174:CONFIG_RCU_FANOUT=64
> 175:CONFIG_RCU_FANOUT_LEAF=16
> 176:CONFIG_RCU_BOOST=y
> 177:CONFIG_RCU_BOOST_DELAY=500
> 178:# CONFIG_RCU_EXP_KTHREAD is not set
> 179:CONFIG_RCU_NOCB_CPU=y

This combined with your nohz_full=4-7 will offload handling of RCU
callbacks to kthreads, but you should also check to make sure that
the resulting rcuo* kthreads are never running on CPUs 4-7.

> 180:# CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set
> 181:CONFIG_RCU_NOCB_CPU_CB_BOOST=y
> 10445:# CONFIG_RCU_SCALE_TEST is not set
> 10446:# CONFIG_RCU_TORTURE_TEST is not set
> 10447:# CONFIG_RCU_REF_SCALE_TEST is not set
> 10448:CONFIG_RCU_CPU_STALL_TIMEOUT=21
> 10449:CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
> 10450:# CONFIG_RCU_TRACE is not set
> 10451:# CONFIG_RCU_EQS_DEBUG is not set
> 
> > > Also do you really need rcu_nocb_poll? I would expect this option to be useful
> > > for debugging only, though I'm not aware of every usecases.
> >
> "Setting rcu_nocb_poll to relieve each CPU from the responsibility
> awakening their RCU offload threads".
> 
> I hoped this would decrease the CPU specific RCU (rcuc) side to become
> more silent on the timer. Or is it purely synchronous on the event
> anyway?

Without rcu_nocb_poll, when your kernel code does call_rcu(), for example,
when userspace closes a file, that call_rcu() will do a wakeup with
attendant overhead.  With rcu_nocb_poll, call_rcu() won't do wakeups
(unless your application (say) closes tens of thousands of files all at
once on a single CPU), but the rcuog* kthreads will wake up periodically
to poll.  Depending on what is pinned where, you might want either
setting of rcu_nocb_poll.

> > I have the same questions.  ;-)
> >
> Good, that means -EUSER is even more likely.  (five nines? :-) )

This file gives some information on per-CPU kthreads, a bit dated, but
still likely helpful:

Documentation/admin-guide/kernel-per-CPU-kthreads.rst

							Thanx, Paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll"
  2024-01-22 14:32     ` Leon Woestenberg
  2024-01-23 18:32       ` Paul E. McKenney
@ 2024-08-16 14:06       ` Frederic Weisbecker
  1 sibling, 0 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2024-08-16 14:06 UTC (permalink / raw)
  To: Leon Woestenberg; +Cc: paulmck, linux-rt-users, rcu

Le Mon, Jan 22, 2024 at 03:32:44PM +0100, Leon Woestenberg a écrit :
> cat trace.4
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>   4)               |  vtime_user_exit() {
>   4)   0.634 us    |    get_vtime_delta();
>   4)   1.799 us    |    account_user_time();
>   4)   5.808 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.640 us    |    tick_irq_enter();
>   4)   0.914 us    |  }
>   4)               |  __sysvec_call_function_single() {
>   4)               |  /* call_function_single_entry: vector=251 */
>   4)   0.513 us    |    generic_smp_call_function_single_interrupt();
>   4)               |  /* call_function_single_exit: vector=251 */
>   4)   2.136 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.203 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   7.564 us    |    }
>   4)   8.285 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.217 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.187 us    |    __rcu_irq_enter_check_tick();
>   4)               |  /* reschedule_entry: vector=253 */
>   4)               |  /* reschedule_exit: vector=253 */
>   4)               |    raw_irqentry_exit_cond_resched() {
>   4)               |      /* local_timer_entry: vector=236 */
>   4)               |      /* local_timer_exit: vector=236 */
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:0H
> next_pid=1160 next_prio=100 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  kworker-1160
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.475 us    |    vtime_task_switch_generic();
>   4)   0.126 us    |    __tick_nohz_task_switch();
>   4)   1.069 us    |  }
>   4)               |  wq_worker_running() {
>   4)   0.223 us    |    kthread_data();
>   4)   0.547 us    |  }
>   4)   0.180 us    |  _raw_spin_lock_irq();
>   4)   0.115 us    |  _raw_spin_unlock_irq();
>   4)               |  mod_timer() {
>   4)   0.241 us    |    lock_timer_base();
>   4)   0.104 us    |    detach_if_pending();
>   4)   0.123 us    |    _raw_spin_lock();
>   4)   0.120 us    |    calc_wheel_index();
>   4)   0.115 us    |    enqueue_timer();
>   4)   0.134 us    |    _raw_spin_unlock_irqrestore();
>   4)   1.815 us    |  }
>   4)               |  create_worker() {
>   4)   0.238 us    |    rt_spin_lock();
>   4)   0.288 us    |    rt_spin_unlock();
>   4)   0.144 us    |    kfree();
>   4)   1.216 us    |    alloc_worker();
>   4)               |    kthread_create_on_node() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:0H
> prev_pid=1160 prev_prio=100 prev_state=D ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>  ------------------------------------------
>   4)  kworker-1160  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 33.354 us   |    } /* raw_irqentry_exit_cond_resched */
>   4)   1.334 us    |    schedule();
>   4)   0.151 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.133 us    |    mem_cgroup_handle_over_high();
>   4)   0.203 us    |    blkcg_maybe_throttle_current();
>   4)   0.300 us    |    __rseq_handle_notify_resume();
>   4)   0.113 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.114 us    |    fpregs_assert_state_consistent();
>   4)   0.142 us    |    switch_fpu_return();
>   4) + 39.502 us   |  } /* exit_to_user_mode_prepare */
>   4)               |  vtime_user_enter() {
>   4)   0.177 us    |    get_vtime_delta();
>   4)   0.390 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.171 us    |    get_vtime_delta();
>   4)   0.437 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.167 us    |    tick_irq_enter();
>   4)   0.393 us    |  }
>   4)               |  __sysvec_call_function_single() {
>   4)               |  /* call_function_single_entry: vector=251 */
>   4)   0.348 us    |    generic_smp_call_function_single_interrupt();
>   4)               |  /* call_function_single_exit: vector=251 */
>   4)   0.908 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.304 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   0.646 us    |    }
>   4)   2.154 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.129 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.219 us    |    __rcu_irq_enter_check_tick();
>   4)               |  /* reschedule_entry: vector=253 */
>   4)               |  /* reschedule_exit: vector=253 */
>   4)               |    raw_irqentry_exit_cond_resched() {
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:0H
> next_pid=1160 next_prio=100 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  kworker-1160
>  ------------------------------------------
> 
>   4) + 96.177 us   |    } /* kthread_create_on_node */
>   4)   1.071 us    |    set_user_nice();
>   4)   0.999 us    |    kthread_bind_mask();
>   4)   0.436 us    |    worker_attach_to_pool();
>   4)   0.112 us    |    _raw_spin_lock_irq();
>   4)   0.146 us    |    worker_enter_idle();
>   4)   2.932 us    |    wake_up_process();
>   4)   0.114 us    |    _raw_spin_unlock_irq();
>   4) ! 107.063 us  |  } /* create_worker */
>   4)               |  del_timer_sync() {
>   4)   0.154 us    |    lock_timer_base();
>   4)   0.132 us    |    detach_if_pending();
>   4)   0.099 us    |    _raw_spin_unlock_irqrestore();
>   4)   0.854 us    |  }
>   4)   0.103 us    |  _raw_spin_lock_irq();
>   4)               |  rcuwait_wake_up() {
>   4)   0.097 us    |    __rcu_read_lock();
>   4)   0.099 us    |    __rcu_read_unlock();
>   4)   0.511 us    |  }
>   4)               |  process_one_work() {
>   4)   0.100 us    |    _raw_spin_unlock_irq();
>   4)   8.334 us    |    blk_mq_run_work_fn();
>   4)   0.133 us    |    _raw_spin_lock_irq();
>   4)   0.118 us    |    pwq_dec_nr_in_flight();
>   4)   9.496 us    |  }
>   4)   0.114 us    |  worker_enter_idle();
>   4)   0.108 us    |  _raw_spin_unlock_irq();
>   4)               |  schedule() {
>   4)   0.148 us    |    wq_worker_sleeping();
>   4)   0.114 us    |    rcu_note_context_switch();
>   4)   0.100 us    |    _raw_spin_lock();
>   4)   0.125 us    |    update_rq_clock();
>   4)   0.817 us    |    dequeue_task_fair();
>   4)   0.550 us    |    pick_next_task_fair();
>   4)   0.435 us    |    psi_task_switch();
>   4)               |    __traceiter_sched_switch() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:0H
> prev_pid=1160 prev_prio=100 prev_state=I ==> next_comm=kworker/4:1H
> next_pid=1348 next_prio=100 */
>   4)   0.315 us    |    }
>   4)   0.114 us    |    enter_lazy_tlb();
>  ------------------------------------------
>   4)  kworker-1160  =>  kworker-1348
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.289 us    |    vtime_task_switch_generic();
>   4)   0.111 us    |    __tick_nohz_task_switch();
>   4)   0.791 us    |  }
>   4)               |  __kthread_parkme() {
>   4)   0.106 us    |    _raw_spin_lock_irqsave();
>   4)   0.098 us    |    _raw_spin_unlock_irqrestore();
>   4)   0.611 us    |  }
>   4)               |  worker_thread() {
>   4)   0.107 us    |    mutex_lock();
>   4)   0.107 us    |    mutex_unlock();
>   4)   0.101 us    |    _raw_spin_lock_irq();
>   4)   0.128 us    |    worker_enter_idle();
>   4)   0.104 us    |    _raw_spin_unlock_irq();
>   4)               |    schedule() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:1H
> prev_pid=1348 prev_prio=100 prev_state=I ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>  ------------------------------------------
>   4)  kworker-1348  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 36.544 us   |    } /* raw_irqentry_exit_cond_resched */
>   4)   1.160 us    |    schedule();
>   4)   0.138 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.125 us    |    mem_cgroup_handle_over_high();
>   4)   0.113 us    |    blkcg_maybe_throttle_current();
>   4)   0.262 us    |    __rseq_handle_notify_resume();
>   4)   0.112 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.111 us    |    fpregs_assert_state_consistent();
>   4)   0.124 us    |    switch_fpu_return();
>   4) + 41.539 us   |  } /* exit_to_user_mode_prepare */
>   4)               |  vtime_user_enter() {
>   4)   0.189 us    |    get_vtime_delta();
>   4)   0.390 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.178 us    |    get_vtime_delta();
>   4)   0.413 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.155 us    |    tick_irq_enter();
>   4)   0.428 us    |  }
>   4)               |  __sysvec_call_function_single() {
>   4)               |  /* call_function_single_entry: vector=251 */
>   4)   0.329 us    |    generic_smp_call_function_single_interrupt();
>   4)               |  /* call_function_single_exit: vector=251 */
>   4)   1.158 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.190 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   0.518 us    |    }
>   4)   1.128 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.128 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.109 us    |    __rcu_irq_enter_check_tick();
>   4)               |  /* reschedule_entry: vector=253 */
>   4)               |  /* reschedule_exit: vector=253 */
>   4)               |    raw_irqentry_exit_cond_resched() {
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:1H
> next_pid=1348 next_prio=100 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  kworker-1348
>  ------------------------------------------
> 
>   4) ! 277.107 us  |    } /* schedule */
>   4)   0.190 us    |    _raw_spin_lock_irq();
>   4)   6.380 us    |    process_one_work();
>   4)   0.128 us    |    worker_enter_idle();
>   4)   0.115 us    |    _raw_spin_unlock_irq();
>   4)               |    schedule() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:1H
> prev_pid=1348 prev_prio=100 prev_state=I ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>  ------------------------------------------
>   4)  kworker-1348  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 15.222 us   |    } /* raw_irqentry_exit_cond_resched */
>   4)   1.086 us    |    schedule();
>   4)   0.115 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.100 us    |    mem_cgroup_handle_over_high();
>   4)   0.106 us    |    blkcg_maybe_throttle_current();
>   4)   0.148 us    |    __rseq_handle_notify_resume();
>   4)   0.116 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.119 us    |    fpregs_assert_state_consistent();
>   4)   0.123 us    |    switch_fpu_return();
>   4) + 19.265 us   |  }
>   4)               |  vtime_user_enter() {
>   4)   0.160 us    |    get_vtime_delta();
>   4)   0.366 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.173 us    |    get_vtime_delta();
>   4)   0.399 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.152 us    |    tick_irq_enter();
>   4)   0.380 us    |  }
>   4)               |  __sysvec_apic_timer_interrupt() {
>   4)               |  /* local_timer_entry: vector=236 */
>   4)   8.901 us    |    hrtimer_interrupt();
>   4)               |  /* local_timer_exit: vector=236 */
>   4)   9.422 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.394 us    |    wake_up_process();
>   4)   0.137 us    |    idle_cpu();
>   4)   0.677 us    |    tick_nohz_irq_exit();
>   4)   1.797 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.112 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)               |    schedule() {
>   4)               |      /* sched_switch: prev_comm=hiccups_x86-64
> prev_pid=1335 prev_prio=120 prev_state=R+ ==> next_comm=ktimers/4
> next_pid=1157 next_prio=98 */
>  ------------------------------------------
>   4)  hiccups-1335  =>  ktimers-1157
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.564 us    |    vtime_task_switch_generic();
>   4)   0.210 us    |    __tick_nohz_task_switch();
>   4)   1.321 us    |  }
>   4)   0.213 us    |  kthread_should_stop();
>   4)   0.135 us    |  kthread_should_park();
>   4)   0.118 us    |  timersd_should_run();
>   4)               |  run_timersd() {
>   4)   0.409 us    |    __local_bh_disable_ip();
>   4)               |    __do_softirq() {
>   4)               |      /* timer_expire_entry:
> timer=00000000b281d363 function=delayed_work_timer_fn now=4295134337
> baseclk=4295016960 */
>   4)   4.507 us    |    }
>   4)   0.355 us    |    __local_bh_enable();
>   4)   5.864 us    |  }
>   4)   0.129 us    |  kthread_should_stop();
>   4)   0.102 us    |  kthread_should_park();
>   4)   0.099 us    |  timersd_should_run();
>   4)               |  schedule() {
>   4)   0.132 us    |    rcu_note_context_switch();
>   4)   0.114 us    |    _raw_spin_lock();
>   4)   0.122 us    |    update_rq_clock();
>   4)   1.191 us    |    dequeue_task_rt();
>   4)   0.184 us    |    balance_rt();
>   4)   0.120 us    |    balance_fair();
>   4)   0.233 us    |    put_prev_task_rt();
>   4)   0.121 us    |    pick_next_task_stop();
>   4)   0.121 us    |    pick_next_task_dl();
>   4)   0.175 us    |    pick_next_task_rt();
>   4)   0.442 us    |    __pick_next_task_fair();
>   4)   0.396 us    |    psi_task_switch();
>   4)               |    __traceiter_sched_switch() {
>   4)               |      /* sched_switch: prev_comm=ktimers/4
> prev_pid=1157 prev_prio=98 prev_state=S ==> next_comm=kworker/4:1
> next_pid=1163 next_prio=120 */
>   4)   0.269 us    |    }
>   4)   0.114 us    |    enter_lazy_tlb();
>  ------------------------------------------
>   4)  ktimers-1157  =>  kworker-1163
>  ------------------------------------------
> 
>   4)               |  finish_task_switch.isra.0() {
>   4)   0.288 us    |    vtime_task_switch_generic();
>   4)   0.116 us    |    __tick_nohz_task_switch();
>   4)   0.111 us    |    __rcu_irq_enter_check_tick();
>   4)   0.176 us    |    irq_enter_rcu();
>   4)               |    __sysvec_irq_work() {
>   4)               |      /* irq_work_entry: vector=246 */
>   4)               |      /* irq_work_exit: vector=246 */
>   4)   0.907 us    |    }
>   4)               |    irq_exit_rcu() {
>   4)               |      /* tick_stop: success=0 dependency=SCHED */
>   4)   0.554 us    |    }
>   4)   0.123 us    |    raw_irqentry_exit_cond_resched();
>   4)   3.596 us    |  }
>   4)               |  wq_worker_running() {
>   4)   0.101 us    |    kthread_data();
>   4)   0.344 us    |  }
>   4)   0.121 us    |  _raw_spin_lock_irq();
>   4)               |  process_one_work() {
>   4)   0.112 us    |    _raw_spin_unlock_irq();
>   4)   1.246 us    |    vmstat_update();
>   4)   0.107 us    |    _raw_spin_lock_irq();
>   4)   0.108 us    |    pwq_dec_nr_in_flight();
>   4)   2.356 us    |  }
>   4)   0.099 us    |  worker_enter_idle();
>   4)   0.097 us    |  _raw_spin_unlock_irq();
>   4)               |  schedule() {
>   4)   0.153 us    |    wq_worker_sleeping();
>   4)   0.194 us    |    rcu_note_context_switch();
>   4)   0.413 us    |    _raw_spin_lock();
>   4)   0.134 us    |    update_rq_clock();
>   4)   0.702 us    |    dequeue_task_fair();
>   4)   1.123 us    |    pick_next_task_fair();
>   4)   0.687 us    |    psi_task_switch();
>   4)               |    __traceiter_sched_switch() {
>   4)               |      /* sched_switch: prev_comm=kworker/4:1
> prev_pid=1163 prev_prio=120 prev_state=I ==> next_comm=hiccups_x86-64
> next_pid=1335 next_prio=120 */
>   4)   0.246 us    |    }
>   4)   0.139 us    |    switch_mm_irqs_off();
>  ------------------------------------------
>   4)  kworker-1163  =>  hiccups-1335
>  ------------------------------------------
> 
>   4) + 32.824 us   |    } /* schedule */
>   4)   0.113 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.110 us    |    mem_cgroup_handle_over_high();
>   4)   0.104 us    |    blkcg_maybe_throttle_current();
>   4)   0.151 us    |    __rseq_handle_notify_resume();
>   4)   0.116 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.097 us    |    fpregs_assert_state_consistent();
>   4)   0.102 us    |    switch_fpu_return();
>   4) + 34.991 us   |  } /* exit_to_user_mode_prepare */
>   4)               |  vtime_user_enter() {
>   4)   0.174 us    |    get_vtime_delta();
>   4)   0.388 us    |  }
>   4)               |  vtime_user_exit() {
>   4)   0.188 us    |    get_vtime_delta();
>   4)   0.456 us    |    account_user_time();
>   4)   1.118 us    |  }
>   4)               |  irq_enter_rcu() {
>   4)   0.158 us    |    tick_irq_enter();
>   4)   0.391 us    |  }
>   4)               |  __sysvec_apic_timer_interrupt() {
>   4)               |  /* local_timer_entry: vector=236 */
>   4)   5.066 us    |    hrtimer_interrupt();
>   4)               |  /* local_timer_exit: vector=236 */
>   4)   5.596 us    |  }
>   4)               |  irq_exit_rcu() {
>   4)   0.111 us    |    idle_cpu();
>   4)               |    tick_nohz_irq_exit() {
>   4)               |      /* tick_stop: success=1 dependency=NONE */
>   4)   5.546 us    |    }
>   4)   6.018 us    |  }
>   4)               |  exit_to_user_mode_prepare() {
>   4)   0.118 us    |    rcu_nocb_flush_deferred_wakeup();
>   4)   0.128 us    |    fpregs_assert_state_consistent();
>   4)   0.623 us    |  }
>   4)               |  vtime_user_enter() {
>   4)   0.158 us    |    get_vtime_delta();
>   4)   0.364 us    |  }

I'm not seeing rcuc here in this trace. Do you have another one?

OTOH I see quite some workqueue activity and some timer callbacks.
For example blk_mq_run_work_fn() (doing IO from isolated CPU ?)
And also vmstat which should be handled since v6.4 with:

be5e015d107d (vmstat: skip periodic vmstat update for isolated CPUs)

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-08-16 14:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-19 15:38 Unexpected rcuc osnoise on PREEMPT_RT with "rcu_nocbs=all rcu_nocb_poll" Leon Woestenberg
2024-01-19 22:08 ` Frederic Weisbecker
2024-01-20  0:37   ` Paul E. McKenney
2024-01-22 14:32     ` Leon Woestenberg
2024-01-23 18:32       ` Paul E. McKenney
2024-08-16 14:06       ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox