All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] bonding: rate-limit printing a warning message
@ 2022-01-26 23:12 Mahesh Bandewar
  2022-01-26 23:24 ` Jay Vosburgh
  0 siblings, 1 reply; 3+ messages in thread
From: Mahesh Bandewar @ 2022-01-26 23:12 UTC (permalink / raw)
  To: Netdev, Jay Vosburgh, Andy Gospodarek, Veaceslav Falico
  Cc: David Miller, Jakub Kicinski, Mahesh Bandewar, Mahesh Bandewar

dev.c:get_link_speed() schedules a work-queue aggressively when it
fails to get the speed. If the link n question is a bond device which
may have multiple links to iterate through to get the link
speed. If the underlying link(s) has/have issues, bonding driver prints
a link-status message and this doesn't go well with the aggressive
work-queue scheduling and results in a rcu stall. This fix just
adds a ratelimiter to the message printing to avoid the stall.

Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x14d/0x20b lib/dump_stack.c:53
 nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
 rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
 print_cpu_stall kernel/rcu/tree.c:1349 [inline]
 check_cpu_stall kernel/rcu/tree.c:1423 [inline]
 rcu_pending kernel/rcu/tree.c:3010 [inline]
 rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
 update_process_times+0x32/0x80 kernel/time/timer.c:1641
 tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
 tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
 __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
 __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
 hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
 smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
 </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
 RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
 vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
 vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
 vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
 printk+0xba/0xed kernel/printk/printk.c:1909
 get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
 get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
 process_one_work+0x881/0x1560 kernel/workqueue.c:2147
 worker_thread+0x653/0x1150 kernel/workqueue.c:2281
 kthread+0x345/0x410 kernel/kthread.c:246
 ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/bonding/bond_main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 238b56d77c36..98b37af3fb6b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
 				/* recovered before downdelay expired */
 				bond_propose_link_state(slave, BOND_LINK_UP);
 				slave->last_link_up = jiffies;
-				slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
-					   (bond->params.downdelay - slave->delay) *
-					   bond->params.miimon);
+				if (net_ratelimit())
+					slave_info(bond->dev, slave->dev,
+						   "link status up again after %d ms\n",
+						   (bond->params.downdelay -
+						    slave->delay) *
+						   bond->params.miimon);
 				commit++;
 				continue;
 			}
-- 
2.35.0.rc0.227.g00780c9af4-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] bonding: rate-limit printing a warning message
  2022-01-26 23:12 [PATCH] bonding: rate-limit printing a warning message Mahesh Bandewar
@ 2022-01-26 23:24 ` Jay Vosburgh
  2022-01-27  0:18   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 3+ messages in thread
From: Jay Vosburgh @ 2022-01-26 23:24 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
	Jakub Kicinski, Mahesh Bandewar

Mahesh Bandewar <maheshb@google.com> wrote:

>dev.c:get_link_speed() schedules a work-queue aggressively when it
>fails to get the speed. If the link n question is a bond device which
>may have multiple links to iterate through to get the link
>speed. If the underlying link(s) has/have issues, bonding driver prints
>a link-status message and this doesn't go well with the aggressive
>work-queue scheduling and results in a rcu stall. This fix just
>adds a ratelimiter to the message printing to avoid the stall.

	I don't see a get_link_speed() function in net-next
net/core/dev.c; am I missing something?

	-J

>Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x14d/0x20b lib/dump_stack.c:53
> nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
> nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
> arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
> rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
> print_cpu_stall kernel/rcu/tree.c:1349 [inline]
> check_cpu_stall kernel/rcu/tree.c:1423 [inline]
> rcu_pending kernel/rcu/tree.c:3010 [inline]
> rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
> update_process_times+0x32/0x80 kernel/time/timer.c:1641
> tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
> tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
> __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
> __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
> hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
> smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
> apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
> </IRQ>
>RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
>RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
> RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
>RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
>RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
>RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
> vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
> vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
> printk+0xba/0xed kernel/printk/printk.c:1909
> get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
> get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
> process_one_work+0x881/0x1560 kernel/workqueue.c:2147
> worker_thread+0x653/0x1150 kernel/workqueue.c:2281
> kthread+0x345/0x410 kernel/kthread.c:246
> ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
>
>Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>---
> drivers/net/bonding/bond_main.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 238b56d77c36..98b37af3fb6b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
> 				/* recovered before downdelay expired */
> 				bond_propose_link_state(slave, BOND_LINK_UP);
> 				slave->last_link_up = jiffies;
>-				slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
>-					   (bond->params.downdelay - slave->delay) *
>-					   bond->params.miimon);
>+				if (net_ratelimit())
>+					slave_info(bond->dev, slave->dev,
>+						   "link status up again after %d ms\n",
>+						   (bond->params.downdelay -
>+						    slave->delay) *
>+						   bond->params.miimon);
> 				commit++;
> 				continue;
> 			}
>-- 
>2.35.0.rc0.227.g00780c9af4-goog
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] bonding: rate-limit printing a warning message
  2022-01-26 23:24 ` Jay Vosburgh
@ 2022-01-27  0:18   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 0 replies; 3+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2022-01-27  0:18 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
	Jakub Kicinski, Mahesh Bandewar

On Wed, Jan 26, 2022 at 3:25 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Mahesh Bandewar <maheshb@google.com> wrote:
>
> >dev.c:get_link_speed() schedules a work-queue aggressively when it
> >fails to get the speed. If the link n question is a bond device which
> >may have multiple links to iterate through to get the link
> >speed. If the underlying link(s) has/have issues, bonding driver prints
> >a link-status message and this doesn't go well with the aggressive
> >work-queue scheduling and results in a rcu stall. This fix just
> >adds a ratelimiter to the message printing to avoid the stall.
>
>         I don't see a get_link_speed() function in net-next
> net/core/dev.c; am I missing something?
>
Oops, looks like this case is relevant to our kernel only, please
ignore the patch.

>         -J
>
> >Call Trace:
> > <IRQ>
> > __dump_stack lib/dump_stack.c:17 [inline]
> > dump_stack+0x14d/0x20b lib/dump_stack.c:53
> > nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
> > nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
> > arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> > trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
> > rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
> > print_cpu_stall kernel/rcu/tree.c:1349 [inline]
> > check_cpu_stall kernel/rcu/tree.c:1423 [inline]
> > rcu_pending kernel/rcu/tree.c:3010 [inline]
> > rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
> > update_process_times+0x32/0x80 kernel/time/timer.c:1641
> > tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
> > tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
> > __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
> > __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
> > hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
> > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
> > smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
> > apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
> > </IRQ>
> >RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
> >RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
> > RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
> >RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
> >RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
> >RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
> >R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
> > vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
> > vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
> > vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
> > printk+0xba/0xed kernel/printk/printk.c:1909
> > get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
> > get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
> > process_one_work+0x881/0x1560 kernel/workqueue.c:2147
> > worker_thread+0x653/0x1150 kernel/workqueue.c:2281
> > kthread+0x345/0x410 kernel/kthread.c:246
> > ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
> >
> >Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >---
> > drivers/net/bonding/bond_main.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index 238b56d77c36..98b37af3fb6b 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
> >                               /* recovered before downdelay expired */
> >                               bond_propose_link_state(slave, BOND_LINK_UP);
> >                               slave->last_link_up = jiffies;
> >-                              slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
> >-                                         (bond->params.downdelay - slave->delay) *
> >-                                         bond->params.miimon);
> >+                              if (net_ratelimit())
> >+                                      slave_info(bond->dev, slave->dev,
> >+                                                 "link status up again after %d ms\n",
> >+                                                 (bond->params.downdelay -
> >+                                                  slave->delay) *
> >+                                                 bond->params.miimon);
> >                               commit++;
> >                               continue;
> >                       }
> >--
> >2.35.0.rc0.227.g00780c9af4-goog
> >
>
> ---
>         -Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-27  0:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-26 23:12 [PATCH] bonding: rate-limit printing a warning message Mahesh Bandewar
2022-01-26 23:24 ` Jay Vosburgh
2022-01-27  0:18   ` Mahesh Bandewar (महेश बंडेवार)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.