netdev.vger.kernel.org archive mirror
From: lsorense@csclub.uwaterloo.ca (Lennart Sorensen)
To: Benjamin Poirier <bpoirier@suse.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org,
	Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Subject: Re: commit 16ecba59 breaks 82574L under heavy load.
Date: Fri, 21 Jul 2017 12:09:37 -0400
Message-ID: <20170721160937.GA22632@csclub.uwaterloo.ca>
In-Reply-To: <20170721152709.GT18556@csclub.uwaterloo.ca>

On Fri, Jul 21, 2017 at 11:27:09AM -0400, Lennart Sorensen wrote:
> On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote:
> > Could you please test the following patch and let me know if it:
> > 1) reduces the interrupt rate of the Other msi-x vector
> > 2) avoids the link flaps
> > or
> > 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]"
> > In this case, please paste icr values printed.
> 
> I will give it a try.

So the test looks excellent.  It now seems to get interrupts only when
the link state actually changes.

> Another odd behaviour I see is that the driver will hang in
> napi_synchronize on shutdown if there is traffic at the time (at least
> I think that's the trigger, maybe the trigger is if there has been an
> overload of traffic and the backlog in napi was used).
> 
> From doing some searching, this seems to be a problem that has plagued
> some people for years with this driver.
> 
> I am having trouble figuring out exactly what napi_synchronize is waiting
> for and who is supposed to toggle the flag it is waiting on.  The flag
> appears to work backwards from what I would have expected it to do.
> I see lots of places that can set the bit, but only napi_enable seems
> to clear it again, and I don't see how that would get called for all
> the places that potentially set the bit.

I just realized NAPI_STATE_SCHED and NAPIF_STATE_SCHED refer to the same
bit (the first is the bit number, the second is the mask), so I need to
look at both of those.

Still, something seems odd in some corner case where napi gets stuck and
you can't close the port anymore because napi_synchronize is never able
to finish.  Some traffic pattern causes the SCHED state bit to get into
the wrong state, and nothing ever clears it.  I even managed to see it
get stuck so that it never passed traffic again and then hung on
shutdown.  The napi poll was never called again.
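
To make the suspected failure mode concrete, here is a rough user-space
model of the SCHED bit handshake.  It is only a sketch, not the kernel's
code: every model_* name is made up for illustration, the real
napi_schedule_prep() also checks a disable bit that is left out here,
and the real napi_synchronize() sleeps in an unbounded loop, whereas this
version gives up after max_tries so the model itself cannot hang.

```c
/* User-space model of the NAPI SCHED bit; hypothetical names, not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { MODEL_STATE_SCHED = 0 };                          /* bit number */
#define MODEL_F_STATE_SCHED (1UL << MODEL_STATE_SCHED)   /* bit mask   */

struct model_napi {
	atomic_ulong state;
};

/*
 * Roughly what napi_schedule_prep() does: atomically set SCHED and
 * report whether this caller was the one that set it.
 */
bool model_schedule_prep(struct model_napi *n)
{
	unsigned long old = atomic_fetch_or(&n->state, MODEL_F_STATE_SCHED);

	return !(old & MODEL_F_STATE_SCHED);
}

/*
 * Roughly what napi_complete_done() does at the end of a poll:
 * clear SCHED so the instance can be scheduled again.
 */
void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~MODEL_F_STATE_SCHED);
}

/*
 * Roughly what napi_synchronize() does: wait until SCHED is clear.
 * Bounded here (assumption for the model); returns false if the bit
 * is still set after max_tries checks.
 */
bool model_synchronize(struct model_napi *n, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (!(atomic_load(&n->state) & MODEL_F_STATE_SCHED))
			return true;
	}
	return false;
}
```

In this model, if something sets the bit through model_schedule_prep()
and the poll that should end in model_complete() never runs, then
model_synchronize() can never succeed and later schedule attempts are
refused as well, which would match both the hang on shutdown and the
port no longer passing traffic.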

-- 
Len Sorensen

Thread overview: 32+ messages
2017-07-18 14:21 commit 16ecba59 breaks 82574L under heavy load Lennart Sorensen
2017-07-18 23:14 ` Benjamin Poirier
2017-07-19 14:19   ` Lennart Sorensen
2017-07-20  0:07     ` Benjamin Poirier
2017-07-20 14:00       ` Lennart Sorensen
2017-07-20 23:44         ` Benjamin Poirier
2017-07-21 15:27           ` Lennart Sorensen
2017-07-21 16:09             ` Lennart Sorensen [this message]
2017-07-21 18:36               ` [PATCH 1/5] e1000e: Fix error path in link detection Benjamin Poirier
2017-07-21 18:36                 ` [PATCH 2/5] e1000e: Fix wrong comment related to " Benjamin Poirier
2017-09-19  0:13                   ` [Intel-wired-lan] " Brown, Aaron F
2017-07-21 18:36                 ` [PATCH 3/5] e1000e: Fix return value test Benjamin Poirier
2017-09-15  0:20                   ` [Intel-wired-lan] " Brown, Aaron F
2017-07-21 18:36                 ` [PATCH 4/5] e1000e: Separate signaling for link check/link up Benjamin Poirier
2017-07-21 18:50                   ` Lennart Sorensen
2017-08-02 11:28                   ` [Intel-wired-lan] " Neftin, Sasha
2017-08-02 14:34                     ` Lennart Sorensen
2017-08-02 14:49                       ` Benjamin Poirier
2017-09-15  0:27                         ` Brown, Aaron F
2017-07-21 18:36                 ` [PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts Benjamin Poirier
2017-07-21 18:48                   ` Lennart Sorensen
2017-08-12  2:13                     ` Philip Prindeville
2017-08-12  2:47                       ` Philip Prindeville
2017-08-21 17:17                   ` Benjamin Poirier
2017-09-15  0:38                   ` [Intel-wired-lan] " Brown, Aaron F
2017-09-19 18:38                   ` [5/5] " Philip Prindeville
2017-09-19 19:41                     ` Benjamin Poirier
2017-10-24 17:20                       ` Lennart Sorensen
2017-10-24 17:39                         ` Philip Prindeville
2017-09-15  0:18                 ` [Intel-wired-lan] [PATCH 1/5] e1000e: Fix error path in link detection Brown, Aaron F
2017-07-21 19:02           ` commit 16ecba59 breaks 82574L under heavy load Lennart Sorensen
2017-07-24 21:56           ` Philip Prindeville
