netdev.vger.kernel.org archive mirror
From: Leon Romanovsky <leon@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@resnulli.us>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net v1] net/sched: Don't print dump stack in event of transmission timeout
Date: Sun, 12 Apr 2020 22:23:36 +0300
Message-ID: <20200412192336.GD334007@unreal>
In-Reply-To: <20200412115913.14d69a7c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Sun, Apr 12, 2020 at 11:59:13AM -0700, Jakub Kicinski wrote:
> On Sun, 12 Apr 2020 09:08:54 +0300 Leon Romanovsky wrote:
> > Hi Dave,
> >
> > This is a new version of previously sent v0 [1] with change in print error
> > level as was suggested by Jakub and Cong. I'm asking you to reevaluate
> > your previous decision [2] given the fact that this is user triggered
> > bug and very similar scenario was committed by Linus "fs/filesystems.c:
> > downgrade user-reachable WARN_ONCE() to pr_warn_once()" a couple of days
> > ago [3].
> >
> > [1] https://lore.kernel.org/netdev/20200402152336.538433-1-leon@kernel.org
> > [2] https://lore.kernel.org/netdev/20200402.180218.940555077368617365.davem@davemloft.net
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=26c5d78c976ca298e59a56f6101a97b618ba3539
>
> How is it user triggerable? If there's a IB-specific reason maybe ib
> netdev should stop implementing ndo_tx_timeout.

It happens when the device is extremely overloaded with traffic:
internally the HW decreases its performance (a HW bug), which causes
the TX timeouts and, in turn, the WARN_ON splat.

We don't want to stop implementing ndo_tx_timeout, because it works
correctly most (if not all) of the time.

If it is very important, I will dig into internal bug reports to look
for possible reproduction scenarios, but from what I have seen so far,
it is a statistical failure.

And it is not IB-specific, but mlx4-specific.

Thanks

