* [PATCH] bonding: rate-limit printing a warning message
@ 2022-01-26 23:12 Mahesh Bandewar
2022-01-26 23:24 ` Jay Vosburgh
0 siblings, 1 reply; 3+ messages in thread
From: Mahesh Bandewar @ 2022-01-26 23:12 UTC (permalink / raw)
To: Netdev, Jay Vosburgh, Andy Gospodarek, Veaceslav Falico
Cc: David Miller, Jakub Kicinski, Mahesh Bandewar, Mahesh Bandewar
dev.c:get_link_speed() schedules a work-queue aggressively when it
fails to get the speed. If the link n question is a bond device which
may have multiple links to iterate through to get the link
speed. If the underlying link(s) has/have issues, bonding driver prints
a link-status message and this doesn't go well with the aggressive
work-queue scheduling and results in a rcu stall. This fix just
adds a ratelimiter to the message printing to avoid the stall.
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x14d/0x20b lib/dump_stack.c:53
nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
print_cpu_stall kernel/rcu/tree.c:1349 [inline]
check_cpu_stall kernel/rcu/tree.c:1423 [inline]
rcu_pending kernel/rcu/tree.c:3010 [inline]
rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
update_process_times+0x32/0x80 kernel/time/timer.c:1641
tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
__run_hrtimer kernel/time/hrtimer.c:1396 [inline]
__hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
</IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
printk+0xba/0xed kernel/printk/printk.c:1909
get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
process_one_work+0x881/0x1560 kernel/workqueue.c:2147
worker_thread+0x653/0x1150 kernel/workqueue.c:2281
kthread+0x345/0x410 kernel/kthread.c:246
ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
drivers/net/bonding/bond_main.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 238b56d77c36..98b37af3fb6b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
/* recovered before downdelay expired */
bond_propose_link_state(slave, BOND_LINK_UP);
slave->last_link_up = jiffies;
- slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
- (bond->params.downdelay - slave->delay) *
- bond->params.miimon);
+ if (net_ratelimit())
+ slave_info(bond->dev, slave->dev,
+ "link status up again after %d ms\n",
+ (bond->params.downdelay -
+ slave->delay) *
+ bond->params.miimon);
commit++;
continue;
}
--
2.35.0.rc0.227.g00780c9af4-goog
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] bonding: rate-limit printing a warning message
2022-01-26 23:12 [PATCH] bonding: rate-limit printing a warning message Mahesh Bandewar
@ 2022-01-26 23:24 ` Jay Vosburgh
2022-01-27 0:18 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 1 reply; 3+ messages in thread
From: Jay Vosburgh @ 2022-01-26 23:24 UTC (permalink / raw)
To: Mahesh Bandewar
Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
Jakub Kicinski, Mahesh Bandewar
Mahesh Bandewar <maheshb@google.com> wrote:
>dev.c:get_link_speed() schedules a work-queue aggressively when it
>fails to get the speed. If the link n question is a bond device which
>may have multiple links to iterate through to get the link
>speed. If the underlying link(s) has/have issues, bonding driver prints
>a link-status message and this doesn't go well with the aggressive
>work-queue scheduling and results in a rcu stall. This fix just
>adds a ratelimiter to the message printing to avoid the stall.
I don't see a get_link_speed() function in net-next
net/core/dev.c; am I missing something?
-J
>Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x14d/0x20b lib/dump_stack.c:53
> nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
> nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
> arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
> rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
> print_cpu_stall kernel/rcu/tree.c:1349 [inline]
> check_cpu_stall kernel/rcu/tree.c:1423 [inline]
> rcu_pending kernel/rcu/tree.c:3010 [inline]
> rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
> update_process_times+0x32/0x80 kernel/time/timer.c:1641
> tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
> tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
> __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
> __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
> hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
> smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
> apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
> </IRQ>
>RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
>RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
> RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
>RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
>RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
>RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
> vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
> vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
> printk+0xba/0xed kernel/printk/printk.c:1909
> get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
> get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
> process_one_work+0x881/0x1560 kernel/workqueue.c:2147
> worker_thread+0x653/0x1150 kernel/workqueue.c:2281
> kthread+0x345/0x410 kernel/kthread.c:246
> ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
>
>Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>---
> drivers/net/bonding/bond_main.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 238b56d77c36..98b37af3fb6b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
> /* recovered before downdelay expired */
> bond_propose_link_state(slave, BOND_LINK_UP);
> slave->last_link_up = jiffies;
>- slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
>- (bond->params.downdelay - slave->delay) *
>- bond->params.miimon);
>+ if (net_ratelimit())
>+ slave_info(bond->dev, slave->dev,
>+ "link status up again after %d ms\n",
>+ (bond->params.downdelay -
>+ slave->delay) *
>+ bond->params.miimon);
> commit++;
> continue;
> }
>--
>2.35.0.rc0.227.g00780c9af4-goog
>
---
-Jay Vosburgh, jay.vosburgh@canonical.com
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] bonding: rate-limit printing a warning message
2022-01-26 23:24 ` Jay Vosburgh
@ 2022-01-27 0:18 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 0 replies; 3+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2022-01-27 0:18 UTC (permalink / raw)
To: Jay Vosburgh
Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
Jakub Kicinski, Mahesh Bandewar
On Wed, Jan 26, 2022 at 3:25 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Mahesh Bandewar <maheshb@google.com> wrote:
>
> >dev.c:get_link_speed() schedules a work-queue aggressively when it
> >fails to get the speed. If the link n question is a bond device which
> >may have multiple links to iterate through to get the link
> >speed. If the underlying link(s) has/have issues, bonding driver prints
> >a link-status message and this doesn't go well with the aggressive
> >work-queue scheduling and results in a rcu stall. This fix just
> >adds a ratelimiter to the message printing to avoid the stall.
>
> I don't see a get_link_speed() function in net-next
> net/core/dev.c; am I missing something?
>
Oops, looks like this case is relevant to our kernel only, please
ignore the patch.
> -J
>
> >Call Trace:
> > <IRQ>
> > __dump_stack lib/dump_stack.c:17 [inline]
> > dump_stack+0x14d/0x20b lib/dump_stack.c:53
> > nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
> > nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
> > arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> > trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
> > rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
> > print_cpu_stall kernel/rcu/tree.c:1349 [inline]
> > check_cpu_stall kernel/rcu/tree.c:1423 [inline]
> > rcu_pending kernel/rcu/tree.c:3010 [inline]
> > rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
> > update_process_times+0x32/0x80 kernel/time/timer.c:1641
> > tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
> > tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
> > __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
> > __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
> > hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
> > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
> > smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
> > apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
> > </IRQ>
> >RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
> >RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
> > RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
> >RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
> >RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
> >RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
> >R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
> > vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
> > vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
> > vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
> > printk+0xba/0xed kernel/printk/printk.c:1909
> > get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
> > get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
> > process_one_work+0x881/0x1560 kernel/workqueue.c:2147
> > worker_thread+0x653/0x1150 kernel/workqueue.c:2281
> > kthread+0x345/0x410 kernel/kthread.c:246
> > ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
> >
> >Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >---
> > drivers/net/bonding/bond_main.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index 238b56d77c36..98b37af3fb6b 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
> > /* recovered before downdelay expired */
> > bond_propose_link_state(slave, BOND_LINK_UP);
> > slave->last_link_up = jiffies;
> >- slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
> >- (bond->params.downdelay - slave->delay) *
> >- bond->params.miimon);
> >+ if (net_ratelimit())
> >+ slave_info(bond->dev, slave->dev,
> >+ "link status up again after %d ms\n",
> >+ (bond->params.downdelay -
> >+ slave->delay) *
> >+ bond->params.miimon);
> > commit++;
> > continue;
> > }
> >--
> >2.35.0.rc0.227.g00780c9af4-goog
> >
>
> ---
> -Jay Vosburgh, jay.vosburgh@canonical.com
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-01-27 0:19 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-26 23:12 [PATCH] bonding: rate-limit printing a warning message Mahesh Bandewar
2022-01-26 23:24 ` Jay Vosburgh
2022-01-27 0:18 ` Mahesh Bandewar (महेश बंडेवार)
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.