netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bonding: rate-limit printing a warning message
@ 2022-01-26 23:12 Mahesh Bandewar
  2022-01-26 23:24 ` Jay Vosburgh
  0 siblings, 1 reply; 3+ messages in thread
From: Mahesh Bandewar @ 2022-01-26 23:12 UTC (permalink / raw)
  To: Netdev, Jay Vosburgh, Andy Gospodarek, Veaceslav Falico
  Cc: David Miller, Jakub Kicinski, Mahesh Bandewar, Mahesh Bandewar

dev.c:get_link_speed() schedules a work-queue aggressively when it
fails to get the speed. If the link n question is a bond device which
may have multiple links to iterate through to get the link
speed. If the underlying link(s) has/have issues, bonding driver prints
a link-status message and this doesn't go well with the aggressive
work-queue scheduling and results in a rcu stall. This fix just
adds a ratelimiter to the message printing to avoid the stall.

Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x14d/0x20b lib/dump_stack.c:53
 nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
 rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
 print_cpu_stall kernel/rcu/tree.c:1349 [inline]
 check_cpu_stall kernel/rcu/tree.c:1423 [inline]
 rcu_pending kernel/rcu/tree.c:3010 [inline]
 rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
 update_process_times+0x32/0x80 kernel/time/timer.c:1641
 tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
 tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
 __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
 __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
 hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
 smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
 </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
 RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
 vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
 vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
 vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
 printk+0xba/0xed kernel/printk/printk.c:1909
 get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
 get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
 process_one_work+0x881/0x1560 kernel/workqueue.c:2147
 worker_thread+0x653/0x1150 kernel/workqueue.c:2281
 kthread+0x345/0x410 kernel/kthread.c:246
 ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/bonding/bond_main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 238b56d77c36..98b37af3fb6b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
 				/* recovered before downdelay expired */
 				bond_propose_link_state(slave, BOND_LINK_UP);
 				slave->last_link_up = jiffies;
-				slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
-					   (bond->params.downdelay - slave->delay) *
-					   bond->params.miimon);
+				if (net_ratelimit())
+					slave_info(bond->dev, slave->dev,
+						   "link status up again after %d ms\n",
+						   (bond->params.downdelay -
+						    slave->delay) *
+						   bond->params.miimon);
 				commit++;
 				continue;
 			}
-- 
2.35.0.rc0.227.g00780c9af4-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] bonding: rate-limit printing a warning message
  2022-01-26 23:12 [PATCH] bonding: rate-limit printing a warning message Mahesh Bandewar
@ 2022-01-26 23:24 ` Jay Vosburgh
  2022-01-27  0:18   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 3+ messages in thread
From: Jay Vosburgh @ 2022-01-26 23:24 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
	Jakub Kicinski, Mahesh Bandewar

Mahesh Bandewar <maheshb@google.com> wrote:

>dev.c:get_link_speed() schedules a work-queue aggressively when it
>fails to get the speed. If the link n question is a bond device which
>may have multiple links to iterate through to get the link
>speed. If the underlying link(s) has/have issues, bonding driver prints
>a link-status message and this doesn't go well with the aggressive
>work-queue scheduling and results in a rcu stall. This fix just
>adds a ratelimiter to the message printing to avoid the stall.

	I don't see a get_link_speed() function in net-next
net/core/dev.c; am I missing something?

	-J

>Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x14d/0x20b lib/dump_stack.c:53
> nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
> nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
> arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
> rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
> print_cpu_stall kernel/rcu/tree.c:1349 [inline]
> check_cpu_stall kernel/rcu/tree.c:1423 [inline]
> rcu_pending kernel/rcu/tree.c:3010 [inline]
> rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
> update_process_times+0x32/0x80 kernel/time/timer.c:1641
> tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
> tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
> __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
> __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
> hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
> smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
> apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
> </IRQ>
>RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
>RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
> RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
>RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
>RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
>RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
> vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
> vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
> printk+0xba/0xed kernel/printk/printk.c:1909
> get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
> get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
> process_one_work+0x881/0x1560 kernel/workqueue.c:2147
> worker_thread+0x653/0x1150 kernel/workqueue.c:2281
> kthread+0x345/0x410 kernel/kthread.c:246
> ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
>
>Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>---
> drivers/net/bonding/bond_main.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 238b56d77c36..98b37af3fb6b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
> 				/* recovered before downdelay expired */
> 				bond_propose_link_state(slave, BOND_LINK_UP);
> 				slave->last_link_up = jiffies;
>-				slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
>-					   (bond->params.downdelay - slave->delay) *
>-					   bond->params.miimon);
>+				if (net_ratelimit())
>+					slave_info(bond->dev, slave->dev,
>+						   "link status up again after %d ms\n",
>+						   (bond->params.downdelay -
>+						    slave->delay) *
>+						   bond->params.miimon);
> 				commit++;
> 				continue;
> 			}
>-- 
>2.35.0.rc0.227.g00780c9af4-goog
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] bonding: rate-limit printing a warning message
  2022-01-26 23:24 ` Jay Vosburgh
@ 2022-01-27  0:18   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 0 replies; 3+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2022-01-27  0:18 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
	Jakub Kicinski, Mahesh Bandewar

On Wed, Jan 26, 2022 at 3:25 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Mahesh Bandewar <maheshb@google.com> wrote:
>
> >dev.c:get_link_speed() schedules a work-queue aggressively when it
> >fails to get the speed. If the link n question is a bond device which
> >may have multiple links to iterate through to get the link
> >speed. If the underlying link(s) has/have issues, bonding driver prints
> >a link-status message and this doesn't go well with the aggressive
> >work-queue scheduling and results in a rcu stall. This fix just
> >adds a ratelimiter to the message printing to avoid the stall.
>
>         I don't see a get_link_speed() function in net-next
> net/core/dev.c; am I missing something?
>
Oops, looks like this case is relevant to our kernel only, please
ignore the patch.

>         -J
>
> >Call Trace:
> > <IRQ>
> > __dump_stack lib/dump_stack.c:17 [inline]
> > dump_stack+0x14d/0x20b lib/dump_stack.c:53
> > nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
> > nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
> > arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> > trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
> > rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
> > print_cpu_stall kernel/rcu/tree.c:1349 [inline]
> > check_cpu_stall kernel/rcu/tree.c:1423 [inline]
> > rcu_pending kernel/rcu/tree.c:3010 [inline]
> > rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
> > update_process_times+0x32/0x80 kernel/time/timer.c:1641
> > tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
> > tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
> > __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
> > __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
> > hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
> > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
> > smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
> > apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
> > </IRQ>
> >RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
> >RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
> > RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
> >RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
> >RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
> >RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
> >R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
> > vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
> > vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
> > vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
> > printk+0xba/0xed kernel/printk/printk.c:1909
> > get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
> > get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
> > process_one_work+0x881/0x1560 kernel/workqueue.c:2147
> > worker_thread+0x653/0x1150 kernel/workqueue.c:2281
> > kthread+0x345/0x410 kernel/kthread.c:246
> > ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
> >
> >Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >---
> > drivers/net/bonding/bond_main.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index 238b56d77c36..98b37af3fb6b 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
> >                               /* recovered before downdelay expired */
> >                               bond_propose_link_state(slave, BOND_LINK_UP);
> >                               slave->last_link_up = jiffies;
> >-                              slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
> >-                                         (bond->params.downdelay - slave->delay) *
> >-                                         bond->params.miimon);
> >+                              if (net_ratelimit())
> >+                                      slave_info(bond->dev, slave->dev,
> >+                                                 "link status up again after %d ms\n",
> >+                                                 (bond->params.downdelay -
> >+                                                  slave->delay) *
> >+                                                 bond->params.miimon);
> >                               commit++;
> >                               continue;
> >                       }
> >--
> >2.35.0.rc0.227.g00780c9af4-goog
> >
>
> ---
>         -Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-27  0:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-26 23:12 [PATCH] bonding: rate-limit printing a warning message Mahesh Bandewar
2022-01-26 23:24 ` Jay Vosburgh
2022-01-27  0:18   ` Mahesh Bandewar (महेश बंडेवार)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).