netdev.vger.kernel.org archive mirror
From: Leon Romanovsky <leon@kernel.org>
To: David Miller <davem@davemloft.net>
Cc: kuba@kernel.org, arjan@linux.intel.com, xiyou.wangcong@gmail.com,
	jhs@mojatatu.com, jiri@resnulli.us, netdev@vger.kernel.org
Subject: Re: [PATCH net v1] net/sched: Don't print dump stack in event of transmission timeout
Date: Mon, 13 Apr 2020 08:03:57 +0300
Message-ID: <20200413050357.GF334007@unreal>
In-Reply-To: <20200412.211925.400624643622219681.davem@davemloft.net>

On Sun, Apr 12, 2020 at 09:19:25PM -0700, David Miller wrote:
>
> This is caused by a device "overwhelmed with traffic"?  Sounds like
> normal operation to me.
>
> That's a bug, and the driver handling the device with this problem
> should adjust how it implements TX timeouts to accommodate this.

The following is taken from the internal bug description; I hope it makes sense.

-----
A timeout may occur if the number of reported bytes is higher than the queue limit.
In this case the kernel closes the queue and reopens it only after getting a
completion.

While debugging we saw that in some situations the driver gets a **delayed completion**:
completions arrive after **1 min**, and therefore the number of queued bytes exceeds the
DQL (dynamic queue limits) max size.

As a result, once watchdog_timeo expires the kernel calls the driver's timeout function,
which prints the timeout to dmesg.

After debugging the issue with FW to understand the root cause of the delayed completions,
we found that, since the IB and TCP traffic run at the same service level (SL), the same
scheduling queue schedules all the QPs. In this case, if one of the IB QPs gets stuck
because of congestion, all other QPs (including the TCP QPs) are stuck until the stuck QP
is released.
-----
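
The stop/reopen/timeout sequence in the description above can be sketched as a toy
model. This is only an illustration, not kernel or driver code: the class TxQueue, its
fields and its methods are hypothetical names, and the watchdog is simplified to "the
queue has been stopped for longer than watchdog_timeo".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxQueue:
    """Toy model of one TX queue with a byte limit and a watchdog (hypothetical)."""
    limit_bytes: int                       # stands in for the DQL limit
    watchdog_timeo: float                  # seconds the queue may stay stopped
    inflight_bytes: int = 0                # bytes queued but not yet completed
    stopped_since: Optional[float] = None  # time the queue was closed, if closed

    def send(self, nbytes: int, now: float) -> bool:
        """Queue a packet; return False if the queue is currently closed."""
        if self.stopped_since is not None:
            return False
        self.inflight_bytes += nbytes
        if self.inflight_bytes >= self.limit_bytes:
            self.stopped_since = now       # limit reached: close the queue
        return True

    def complete(self, nbytes: int) -> None:
        """Handle a completion; reopen the queue once below the limit."""
        self.inflight_bytes -= nbytes
        if self.inflight_bytes < self.limit_bytes:
            self.stopped_since = None

    def watchdog_expired(self, now: float) -> bool:
        """True if the queue has stayed closed longer than watchdog_timeo."""
        return (self.stopped_since is not None
                and now - self.stopped_since > self.watchdog_timeo)
```

With a 3000-byte limit and a 5 second watchdog_timeo, two 1500-byte packets close the
queue. If the completion only arrives a minute later, watchdog_expired() is already true
well before that, which corresponds to the timeout message in dmesg.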

The user resolves this by separating the IB and TCP traffic into different SLs.
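
Why different SLs help can be shown with a small sketch. It assumes, as in the
description quoted above, one scheduling queue per SL and that a single stuck QP stalls
every QP sharing that queue. The function name runnable_qps and the QP names are made up
for illustration.

```python
def runnable_qps(sl_of, stuck):
    """Return the QPs that can still make progress, sorted by name.

    sl_of maps a QP name to its service level (SL); stuck is the set of
    congested QPs. Assumption: one scheduling queue per SL, and one stuck
    QP blocks all QPs on the same SL.
    """
    blocked_sls = {sl_of[qp] for qp in stuck}
    return sorted(qp for qp, sl in sl_of.items() if sl not in blocked_sls)


# IB and TCP on the same SL: the stuck IB QP blocks the TCP QPs too.
shared = {"ib0": 0, "tcp0": 0, "tcp1": 0}
print(runnable_qps(shared, {"ib0"}))   # []

# IB and TCP on different SLs: the TCP QPs keep running.
split = {"ib0": 0, "tcp0": 1, "tcp1": 1}
print(runnable_qps(split, {"ib0"}))    # ['tcp0', 'tcp1']
```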

Thanks

Thread overview: 12+ messages
2020-04-12  6:08 [PATCH net v1] net/sched: Don't print dump stack in event of transmission timeout Leon Romanovsky
2020-04-12 18:59 ` Jakub Kicinski
2020-04-12 19:23   ` Leon Romanovsky
2020-04-13  4:19 ` David Miller
2020-04-13  5:03   ` Leon Romanovsky [this message]
2020-04-13  9:01 ` Jose Abreu
2020-04-13 10:20   ` Leon Romanovsky
2020-04-13 10:37     ` Jose Abreu
2020-04-13 10:54       ` Leon Romanovsky
2020-04-13 11:01         ` Jose Abreu
2020-04-13 11:25           ` Leon Romanovsky
2020-04-13 17:22 ` Cong Wang
