netdev.vger.kernel.org archive mirror
From: lsorense@csclub.uwaterloo.ca (Lennart Sorensen)
To: linux-kernel@vger.kernel.org
Cc: Len Sorensen <lsorense@csclub.uwaterloo.ca>,
	netdev@vger.kernel.org, Benjamin Poirier <bpoirier@suse.com>,
	intel-wired-lan@lists.osuosl.org,
	Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Subject: commit 16ecba59 breaks 82574L under heavy load.
Date: Tue, 18 Jul 2017 10:21:09 -0400
Message-ID: <20170718142109.GO18556@csclub.uwaterloo.ca>

Commit 16ecba59bc333d6282ee057fb02339f77a880beb has apparently broken
at least the 82574L under heavy load (as in load heavy enough to cause
packet drops).  In this case, when running in MSI-X mode, the Other
Causes interrupt fires about 3000 times per second, but not due to link
state changes.  Unfortunately this commit changed the driver to assume
that the Other Causes interrupt can only mean a link state change, and
hence it sets the flag that (unfortunately) means both "link is down" and
"link state should be checked".  Since this now happens 3000 times per
second, the chances of it happening while the watchdog_task is checking
the link state become pretty high, and if it does happen to coincide,
the watchdog_task will reset the adapter, which causes a real loss of link.
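
The race can be illustrated with a minimal userspace sketch.  This is a
hypothetical, simplified model, not the e1000e code: the flag is named
after the driver's get_link_status, and sim_adapter, sim_other_irq() and
sim_watchdog() are invented for illustration.  The point is only that a
single flag read twice with two different meanings is vulnerable to an
interrupt landing between the two reads.

```c
#include <stdbool.h>

/* Hypothetical, simplified model (not the e1000e code): one flag carries
 * two meanings, "link state should be rechecked" and "link is down". */
struct sim_adapter {
    bool get_link_status;  /* set by the Other Causes handler */
    bool phy_link_up;      /* what the PHY really reports */
    int resets;            /* adapter resets performed by the watchdog */
};

/* Post-commit behaviour: every Other Causes interrupt sets the flag. */
void sim_other_irq(struct sim_adapter *a)
{
    a->get_link_status = true;
}

/* Simplified watchdog pass.  Step 1 rechecks the link and clears the flag
 * if the link is up; step 2 reads the same flag as "link is down" and
 * resets.  irq_between_steps injects an interrupt between the two reads. */
void sim_watchdog(struct sim_adapter *a, bool irq_between_steps)
{
    if (a->get_link_status && a->phy_link_up)
        a->get_link_status = false;   /* link verified as up */

    if (irq_between_steps)
        sim_other_irq(a);             /* spurious interrupt lands here */

    if (a->get_link_status)
        a->resets++;                  /* flag misread as "link down" */
}
```

With phy_link_up fixed at true, a watchdog pass with no interrupt leaves
resets at 0, while a pass with an interrupt injected between the two
steps increments it, even though the link never actually went down.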

Reverting the commit makes everything work fine again (of course packets
are still dropped, but at least the link stays up, the adapter isn't
reset, and most packets make it through).

I tried checking what the bits in the ICR actually were under these
conditions, and it would appear that the only bit set is 24 (the Other
Causes interrupt bit).  So I don't know what the real cause is, although
an rx buffer overrun would be my guess; in fact I see nothing in the
datasheet indicating that the rx buffer overrun can actually be prevented
from generating an interrupt.
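
To make the ICR observation concrete, here is a minimal userspace sketch
that lists which bits of a captured ICR value are set.  The mask values
are assumed to match the definitions in the e1000e driver's defines.h
(LSC = bit 2, RXO = bit 6, OTHER = bit 24); verify them against your
kernel tree.  icr_dump() is a hypothetical helper, not a driver function.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed to match e1000e's defines.h; check your kernel tree. */
#define E1000_ICR_LSC   0x00000004u  /* bit 2: link status change */
#define E1000_ICR_RXO   0x00000040u  /* bit 6: receiver overrun */
#define E1000_ICR_OTHER 0x01000000u  /* bit 24: Other Causes */

/* Print every set bit of an ICR snapshot and return how many were set. */
int icr_dump(uint32_t icr)
{
    int count = 0;

    for (int bit = 0; bit < 32; bit++) {
        if (icr & (1u << bit)) {
            printf("ICR bit %d set\n", bit);
            count++;
        }
    }
    return count;
}
```

Calling icr_dump(E1000_ICR_OTHER) reports only bit 24.  Neither the LSC
bit nor the RXO bit is set in such a snapshot, so the ICR alone cannot
confirm the overrun guess.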

Prior to this commit, the interrupt handler explicitly checked that the
interrupt was caused by a link state change and only then did it trigger
a recheck, which worked fine and did not cause incorrect adapter resets,
although it of course still had lots of undesired interrupts to deal with.
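
The difference between the old and new behaviour can be reduced to a
predicate on the ICR value.  This is a simplified model of the behaviour
described above, not the actual e1000e interrupt handler; the function
names are hypothetical and the mask values are assumed to match e1000e's
defines.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define E1000_ICR_LSC   0x00000004u  /* link status change (assumed value) */
#define E1000_ICR_OTHER 0x01000000u  /* Other Causes (assumed value) */

/* Each predicate returns true when the handler would request a link
 * recheck for the given ICR value. */

/* Before 16ecba59: only a real link status change triggers a recheck. */
bool other_irq_requests_check_before(uint32_t icr)
{
    return (icr & E1000_ICR_LSC) != 0;
}

/* After 16ecba59: every Other Causes interrupt is treated as a link event. */
bool other_irq_requests_check_after(uint32_t icr)
{
    (void)icr;  /* the cause is not inspected */
    return true;
}
```

For the ICR value observed here (only bit 24 set), the pre-commit
predicate returns false and the post-commit one returns true, which is
what turns roughly 3000 spurious interrupts per second into 3000 link
rechecks per second.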

Of course, ideally there would be a way to stop these 3000 pointless
interrupts per second from happening, but unless such a way can be
found, I think this commit needs reverting, since it apparently causes
link failures on real, existing hardware.

The ports are onboard Intel 82574L on a Supermicro X7SPA-HF-D525 with
BIOS 1.2a (upgrading to 1.2b to check whether it makes a difference is
unfortunately not an option).

-- 
Len Sorensen


Thread overview: 32+ messages
2017-07-18 14:21 Lennart Sorensen [this message]
2017-07-18 23:14 ` commit 16ecba59 breaks 82574L under heavy load Benjamin Poirier
2017-07-19 14:19   ` Lennart Sorensen
2017-07-20  0:07     ` Benjamin Poirier
2017-07-20 14:00       ` Lennart Sorensen
2017-07-20 23:44         ` Benjamin Poirier
2017-07-21 15:27           ` Lennart Sorensen
2017-07-21 16:09             ` Lennart Sorensen
2017-07-21 18:36               ` [PATCH 1/5] e1000e: Fix error path in link detection Benjamin Poirier
2017-07-21 18:36                 ` [PATCH 2/5] e1000e: Fix wrong comment related to " Benjamin Poirier
2017-09-19  0:13                   ` [Intel-wired-lan] " Brown, Aaron F
2017-07-21 18:36                 ` [PATCH 3/5] e1000e: Fix return value test Benjamin Poirier
2017-09-15  0:20                   ` [Intel-wired-lan] " Brown, Aaron F
2017-07-21 18:36                 ` [PATCH 4/5] e1000e: Separate signaling for link check/link up Benjamin Poirier
2017-07-21 18:50                   ` Lennart Sorensen
2017-08-02 11:28                   ` [Intel-wired-lan] " Neftin, Sasha
2017-08-02 14:34                     ` Lennart Sorensen
2017-08-02 14:49                       ` Benjamin Poirier
2017-09-15  0:27                         ` Brown, Aaron F
2017-07-21 18:36                 ` [PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts Benjamin Poirier
2017-07-21 18:48                   ` Lennart Sorensen
2017-08-12  2:13                     ` Philip Prindeville
2017-08-12  2:47                       ` Philip Prindeville
2017-08-21 17:17                   ` Benjamin Poirier
2017-09-15  0:38                   ` [Intel-wired-lan] " Brown, Aaron F
2017-09-19 18:38                   ` [5/5] " Philip Prindeville
2017-09-19 19:41                     ` Benjamin Poirier
2017-10-24 17:20                       ` Lennart Sorensen
2017-10-24 17:39                         ` Philip Prindeville
2017-09-15  0:18                 ` [Intel-wired-lan] [PATCH 1/5] e1000e: Fix error path in link detection Brown, Aaron F
2017-07-21 19:02           ` commit 16ecba59 breaks 82574L under heavy load Lennart Sorensen
2017-07-24 21:56           ` Philip Prindeville
