From: Benjamin Poirier <benjamin.poirier@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Shrikrishna Khare <skhare@vmware.com>,
Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
Netdev <netdev@vger.kernel.org>,
intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
linux-kernel@vger.kernel.org
Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
Date: Mon, 22 Jan 2018 16:12:14 +0900 [thread overview]
Message-ID: <20180122071214.cv773dufu6n4lvnw@f1.synalogic.ca> (raw)
In-Reply-To: <CAKgT0Ueu1T6UKf-yi13RN9+9=V5PPsFzhYrMSz+3sneXyxV9Lw@mail.gmail.com>
On 2018/01/20 09:21, Alexander Duyck wrote:
> On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier
> <benjamin.poirier@gmail.com> wrote:
> > On 2018/01/20 07:45, Benjamin Poirier wrote:
> > [...]
> >> >
> >> > I'm of the mind that we need to cut down on the code thrash. This
> >> > driver is supposed to have been in a "maintenance" mode for the last
> >> > year or so as there aren't being any new parts added is my
> >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
> >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
> >> > accepted in the first place. I don't see any notes about it fixing any
> >> > bug or addressing any issue and it seems like that is the start of all
> >> > the issues we have been having recently with RXO triggering more
> >> > interrupts, various link issues, and this most recent VMware issue.
> >>
> >> I'm sorry to say but you're the one who suggested that change:
> >>
> >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
> >>
> >> On 2015/10/28 23:08, Alexander Duyck wrote:
> >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
> >> [...]
> >> >
> >> > I would argue your first patch probably didn't go far enough to remove dead
> >> > code. Specifically you should only ever get into this function if LSC is
> >> > set. There are no other causes that should trigger this. As such you could
> >> > probably remove the ICR read, and instead replace it with an ICR write of
> >> > the LSC bit since OTHER is already cleared via EIAC.
> >> >
> >
> > ... The assumption that "There are no other causes that should trigger
> > this." turned out to be wrong and that caused the RXO problems. Clearing
> > OTHER via EIAC is causing the problems with vmware now. I don't think
> > you foresaw those problems back in 2015 and neither did I.
>
> Well that explains why I felt like I was explaining things to a
> younger version of myself. I was a bit more relaxed in terms of being
> willing to make arbitrary changes a few years ago. I tend to be a bit
> more conservative now, at least as far as having justifications as to
> why I want to do things. With any change you end up taking on risk,
> and so usually a patch has a justification as to why you are making
> the change.
>
> Unfortunately at the time I didn't have all the information and based
> my suggestion on a bad assumption. I would guess at the time I was
> thinking of doing general code cleanup. Other drivers such as igb work
> this way, but it led us down the path we are on now where we are
> having to make one fix after another. It is leading in the opposite
> direction of maintainability as this is all becoming more complex.
> Suggesting this was a bad decision on my part at the time. I'm only
> human, I make mistakes.. :-)
Thanks for the introspection.
>
> With further review of the code I am seeing various other issues that
> could still pop up as I am not certain we should even have the "other"
> interrupt messing with the NAPI polling or packet accounting logic at
> all. The question I would have at this point is if we revert
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
> and all the related fixes for it, what do we end up with?
The patch I submitted for the current vmware issue actually finishes
reverting commit 16ecba59bc33.
I believe the relevant commits to consider are:
16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1)
a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1)
0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
19110cfbb34d e1000e: Separate signaling for link check/link up
(v4.15-rc1)
4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return
value. (v4.15-rc8)
(submitted) e1000e: Remove Other from EIAC.
commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of
16ecba59bc33 (ICR read). The submitted patch reverts the rest of
16ecba59bc33 (EIAC clearing of Other).
> It seems
> like the code is slowly heading back in the direction of where it was
> originally anyway since there have been a number of partial reverts.
> I'm wondering what would happen if we were to just short-cut that and
> audit the patches involved to see what we really need and don't.
>
> Your patch as proposed is essentially another step in that direction.
> I'm thinking we may want to drop my currently proposed fix for now and
> instead look at going through and figure out what changes after that
> first one are still really needed.
If the patch that I submitted for the current vmware issue is merged,
the significant commits that are left are:
0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
Fixes a problem in the irq disabling of the napi implementation.
19110cfbb34d e1000e: Separate signaling for link check/link up
(v4.15-rc1)
Fixes link flapping caused by a race condition in link
detection. It was found because the Other interrupt was being
triggered sort of spuriously by rxo.
4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
Fixes Other interrupt bursts during sustained rxo conditions.
I think all of those are still needed, or if they're removed, they need
to be reimplemented differently.
> It doesn't look like my fix will
> make it for 4.15 anyway so we might as well focus on making certain to
> have things as solid as possible by the time 4.16-rc1 rolls around.
>
> Thanks.
>
> - Alex
next prev parent reply other threads:[~2018-01-22 7:12 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-18 6:50 [RFC PATCH] e1000e: Remove Other from EIAC Benjamin Poirier
2018-01-18 7:27 ` Benjamin Poirier
2018-01-19 2:42 ` Shrikrishna Khare
2018-01-19 5:36 ` Benjamin Poirier
2018-01-18 7:41 ` Benjamin Poirier
2018-01-18 15:51 ` [Intel-wired-lan] " Alexander Duyck
2018-01-19 8:59 ` Benjamin Poirier
2018-01-19 13:36 ` Benjamin Poirier
2018-01-19 16:22 ` Alexander Duyck
2018-01-19 22:45 ` Benjamin Poirier
2018-01-19 22:55 ` Benjamin Poirier
2018-01-20 17:21 ` Alexander Duyck
2018-01-22 7:12 ` Benjamin Poirier [this message]
2018-01-22 18:01 ` Alexander Duyck
2018-01-24 8:35 ` Benjamin Poirier
2018-01-24 16:01 ` Alexander Duyck
2018-01-30 19:46 ` Alexander Duyck
2018-01-31 7:31 ` Benjamin Poirier
2018-02-02 4:29 ` Brown, Aaron F
2018-02-02 4:31 ` Brown, Aaron F
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180122071214.cv773dufu6n4lvnw@f1.synalogic.ca \
--to=benjamin.poirier@gmail.com \
--cc=alexander.duyck@gmail.com \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=jeffrey.t.kirsher@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=skhare@vmware.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).