netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Poirier <benjamin.poirier@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Shrikrishna Khare <skhare@vmware.com>,
	Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
	Netdev <netdev@vger.kernel.org>,
	intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
Date: Mon, 22 Jan 2018 16:12:14 +0900	[thread overview]
Message-ID: <20180122071214.cv773dufu6n4lvnw@f1.synalogic.ca> (raw)
In-Reply-To: <CAKgT0Ueu1T6UKf-yi13RN9+9=V5PPsFzhYrMSz+3sneXyxV9Lw@mail.gmail.com>

On 2018/01/20 09:21, Alexander Duyck wrote:
> On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier
> <benjamin.poirier@gmail.com> wrote:
> > On 2018/01/20 07:45, Benjamin Poirier wrote:
> > [...]
> >> >
> >> > I'm of the mind that we need to cut down on the code thrash.  This
> >> > driver is supposed to have been in a "maintenance" mode for the last
> >> > year or so as there aren't being any new parts added is my
> >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
> >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
> >> > accepted in the first place. I don't see any notes about it fixing any
> >> > bug or addressing any issue and it seems like that is the start of all
> >> > the issues we have been having recently with RXO triggering more
> >> > interrupts, various link issues, and this most recent VMware issue.
> >>
> >> I'm sorry to say but you're the one who suggested that change:
> >>
> >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
> >>
> >> On 2015/10/28 23:08, Alexander Duyck wrote:
> >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
> >> [...]
> >> >
> >> > I would argue your first patch probably didn't go far enough to remove dead
> >> > code.  Specifically you should only ever get into this function if LSC is
> >> > set.  There are no other causes that should trigger this.  As such you could
> >> > probably remove the ICR read, and instead replace it with an ICR write of
> >> > the LSC bit since OTHER is already cleared via EIAC.
> >> >
> >
> > ... The assumption that "There are no other causes that should trigger
> > this." turned out to be wrong and that caused the RXO problems. Clearing
> > OTHER via EIAC is causing the problems with vmware now. I don't think
> > you foresaw those problems back in 2015 and neither did I.
> 
> Well that explains why I felt like I was explaining things to a
> younger version of myself. I was a bit more relaxed in terms of being
> willing to make arbitrary changes a few years ago. I tend to be a bit
> more conservative now, at least as far as having justifications as to
> why I want to do things. With any change you end up taking on risk,
> and so usually a patch has a justification as to why you are making
> the change.
> 
> Unfortunately at the time I didn't have all the information and based
> my suggestion on a bad assumption. I would guess at the time I was
> thinking of doing general code cleanup. Other drivers such as igb work
> this way, but it led us down the path we are on now where we are
> having to make one fix after another. It is leading in the opposite
> direction of maintainability as this is all becoming more complex.
> Suggesting this was a bad decision on my part at the time. I'm only
> human, I make mistakes.. :-)

Thanks for the introspection.

> 
> With further review of the code I am seeing various other issues that
> could still pop up as I am not certain we should even have the "other"
> interrupt messing with the NAPI polling or packet accounting logic at
> all. The question I would have at this point is if we revert
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
> and all the related fixes for it, what do we end up with?

The patch I submitted for the current vmware issue actually finishes
reverting commit 16ecba59bc33.

I believe the relevant commits to consider are:

16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1)
a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1)
0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)

19110cfbb34d e1000e: Separate signaling for link check/link up
             (v4.15-rc1)
4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)

4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return
	     value. (v4.15-rc8)

(submitted)  e1000e: Remove Other from EIAC.

commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of
16ecba59bc33 (ICR read). The submitted patch reverts the rest of
16ecba59bc33 (EIAC clearing of Other).

> It seems
> like the code is slowly heading back in the direction of where it was
> originally anyway since there have been a number of partial reverts.
> I'm wondering what would happen if we were to just short-cut that and
> audit the patches involved to see what we really need and don't.
> 
> Your patch as proposed is essentially another step in that direction.
> I'm thinking we may want to drop my currently proposed fix for now and
> instead look at going through and figure out what changes after that
> first one are still really needed.

If the patch that I submitted for the current vmware issue is merged,
the significant commits that are left are:

0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
	Fixes a problem in the irq disabling of the napi implementation.

19110cfbb34d e1000e: Separate signaling for link check/link up
             (v4.15-rc1)
	Fixes link flapping caused by a race condition in link
	detection. It was found because the Other interrupt was being
	triggered sort of spuriously by rxo.

4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
	Fixes Other interrupt bursts during sustained rxo conditions.

I think all of those are still needed, or if they're removed, they need
to be reimplemented differently.

> It doesn't look like my fix will
> make it for 4.15 anyway so we might as well focus on making certain to
> have things as solid as possible by the time 4.16-rc1 rolls around.
> 
> Thanks.
> 
> - Alex

  reply	other threads:[~2018-01-22  7:12 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-18  6:50 [RFC PATCH] e1000e: Remove Other from EIAC Benjamin Poirier
2018-01-18  7:27 ` Benjamin Poirier
2018-01-19  2:42   ` Shrikrishna Khare
2018-01-19  5:36     ` Benjamin Poirier
2018-01-18  7:41 ` Benjamin Poirier
2018-01-18 15:51 ` [Intel-wired-lan] " Alexander Duyck
2018-01-19  8:59   ` Benjamin Poirier
2018-01-19 13:36     ` Benjamin Poirier
2018-01-19 16:22       ` Alexander Duyck
2018-01-19 22:45         ` Benjamin Poirier
2018-01-19 22:55           ` Benjamin Poirier
2018-01-20 17:21             ` Alexander Duyck
2018-01-22  7:12               ` Benjamin Poirier [this message]
2018-01-22 18:01                 ` Alexander Duyck
2018-01-24  8:35                   ` Benjamin Poirier
2018-01-24 16:01                     ` Alexander Duyck
2018-01-30 19:46 ` Alexander Duyck
2018-01-31  7:31   ` Benjamin Poirier
2018-02-02  4:29     ` Brown, Aaron F
2018-02-02  4:31     ` Brown, Aaron F

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180122071214.cv773dufu6n4lvnw@f1.synalogic.ca \
    --to=benjamin.poirier@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=skhare@vmware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).