From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Mahesh Bandewar <maheshb@google.com>
Cc: Netdev <netdev@vger.kernel.org>,
Andy Gospodarek <andy@greyhouse.net>,
Veaceslav Falico <vfalico@gmail.com>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Mahesh Bandewar <mahesh@bandewar.net>
Subject: Re: [PATCH] bonding: rate-limit printing a warning message
Date: Wed, 26 Jan 2022 15:24:59 -0800
Message-ID: <25071.1643239499@famine>
In-Reply-To: <20220126231229.4028998-1-maheshb@google.com>
Mahesh Bandewar <maheshb@google.com> wrote:
>dev.c:get_link_speed() schedules a work-queue aggressively when it
>fails to get the speed. If the link in question is a bond device, it
>may have multiple links to iterate through to get the link
>speed. If the underlying link(s) has/have issues, the bonding driver prints
>a link-status message, which does not go well with the aggressive
>work-queue scheduling and results in an RCU stall. This fix just
>adds a ratelimiter to the message printing to avoid the stall.
I don't see a get_link_speed() function in net-next
net/core/dev.c; am I missing something?
-J
>Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x14d/0x20b lib/dump_stack.c:53
> nmi_cpu_backtrace.cold+0x19/0x98 lib/nmi_backtrace.c:103
> nmi_trigger_cpumask_backtrace+0x16a/0x17e lib/nmi_backtrace.c:62
> arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> trigger_single_cpu_backtrace include/linux/nmi.h:161 [inline]
> rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1210
> print_cpu_stall kernel/rcu/tree.c:1349 [inline]
> check_cpu_stall kernel/rcu/tree.c:1423 [inline]
> rcu_pending kernel/rcu/tree.c:3010 [inline]
> rcu_check_callbacks.cold+0x494/0x7d3 kernel/rcu/tree.c:2551
> update_process_times+0x32/0x80 kernel/time/timer.c:1641
> tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
> tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
> __run_hrtimer kernel/time/hrtimer.c:1396 [inline]
> __hrtimer_run_queues+0x304/0xd80 kernel/time/hrtimer.c:1458
> hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1516
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
> smp_apic_timer_interrupt+0x150/0x5e0 arch/x86/kernel/apic/apic.c:1056
> apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:780
> </IRQ>
>RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
>RIP: 0010:console_unlock+0x82b/0xcc0 kernel/printk/printk.c:2302
> RSP: 0018:ffff8801966cb9e8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff12
>RAX: ffff8801968d0040 RBX: 0000000000000000 RCX: 0000000000000006
>RDX: 0000000000000000 RSI: ffffffff815a6515 RDI: 0000000000000293
>RBP: ffff8801966cba70 R08: ffff8801968d0040 R09: 0000000000000000
>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>R13: ffffffff82cc61b0 R14: dffffc0000000000 R15: 0000000000000001
> vprintk_emit+0x593/0x610 kernel/printk/printk.c:1836
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1876
> vprintk_func+0x7a/0xed kernel/printk/printk_safe.c:379
> printk+0xba/0xed kernel/printk/printk.c:1909
> get_link_speed.cold+0x43/0x144 net/core/dev.c:1493
> get_link_speed_work+0x1e/0x30 net/core/dev.c:1515
> process_one_work+0x881/0x1560 kernel/workqueue.c:2147
> worker_thread+0x653/0x1150 kernel/workqueue.c:2281
> kthread+0x345/0x410 kernel/kthread.c:246
> ret_from_fork+0x3f/0x50 arch/x86/entry/entry_64.S:393
>
>Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>---
> drivers/net/bonding/bond_main.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 238b56d77c36..98b37af3fb6b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2538,9 +2538,12 @@ static int bond_miimon_inspect(struct bonding *bond)
> 				/* recovered before downdelay expired */
> 				bond_propose_link_state(slave, BOND_LINK_UP);
> 				slave->last_link_up = jiffies;
>-				slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
>-					   (bond->params.downdelay - slave->delay) *
>-					   bond->params.miimon);
>+				if (net_ratelimit())
>+					slave_info(bond->dev, slave->dev,
>+						   "link status up again after %d ms\n",
>+						   (bond->params.downdelay -
>+						    slave->delay) *
>+						   bond->params.miimon);
> 				commit++;
> 				continue;
> 			}
>--
>2.35.0.rc0.227.g00780c9af4-goog
>
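For readers unfamiliar with what the net_ratelimit() guard in the quoted
patch buys, here is a minimal userspace sketch of an interval/burst
limiter. It is an illustration only, not the kernel's implementation:
struct rl_state, rl_allow() and rl_count_allowed() are hypothetical names,
time is passed in explicitly as an integer, and the interval of 5 with a
burst of 10 used in the note below is assumed to mirror the net_ratelimit()
defaults.

```c
#include <stdbool.h>

/* Hypothetical userspace model of an interval/burst rate limiter.
 * Not the kernel's struct ratelimit_state; names are illustrative. */
struct rl_state {
	long interval;	/* window length, in arbitrary time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages allowed so far in this window */
	int missed;	/* messages suppressed in this window */
	bool started;	/* false until the first call opens a window */
};

/* Returns true if the caller may print at time 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	/* Open a fresh window on first use or once the interval has passed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Simulate 'calls' back-to-back print attempts at one instant and
 * return how many of them were allowed through. */
static int rl_count_allowed(struct rl_state *rs, long now, int calls)
{
	int allowed = 0;

	for (int i = 0; i < calls; i++)
		if (rl_allow(rs, now))
			allowed++;
	return allowed;
}
```

With interval = 5 and burst = 10, a tight loop of 1000 attempts at one
instant lets 10 messages through and suppresses the rest until the window
rolls over, which is the property that keeps an aggressively rescheduled
work item from flooding the console.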
---
-Jay Vosburgh, jay.vosburgh@canonical.com