From: Gavin Lambert <intel@mirality.co.nz>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
Date: Thu, 11 Jul 2019 18:50:54 +1200
Message-ID: <3acf459ddbbd30687cda0a79523afe04@mirality.co.nz>

This might be a bit of a tricky question, but I'm not really sure where
else to ask. Please cc me on any replies or I might overlook them.
I'm using a system with an e1000e network driver which has been patched
to bypass the regular Linux network stack (because it can get called
from a Xenomai RT context, among other reasons -- although in my case
I'm not doing that). The complete source for the patched version of the
code can be found here:
https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c
(There are some minor changes to other files, but the majority of
changes are only to this file. You can see just the changes at
https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions .)
It was originally based on the in-kernel e1000e driver as of Linux
4.9.65. (I'm not the person who originally made the patches, but I am
the person who rebased them to kernel 4.9 and I'm the one trying to
maintain them for newer kernel versions. Though I'm also not the person
who made that github repo.)
On a Debian system with kernel linux-image-4.9.0-4-rt-amd64 (4.9.65)
installed, this works perfectly. It also works perfectly with
linux-image-4.9.0-8-rt-amd64 (4.9.110).
However, with kernel linux-image-4.9.0-9-rt-amd64 (4.9.168) installed
(and no other changes to the system other than building the patched
e1000e module against this kernel's headers), something weird happens
when the driver is running in its alternate "ecdev" mode.
Specifically, when the module is initially loaded, it works as expected
and can send/receive without problems. When link is removed (by
disconnecting the Ethernet cable), it detects this as expected. When
link is restored, it detects this and reports it but is then unable to
actually send any packets. (Note: to send packets the external code
calls the "ndo_start_xmit" operation directly, and to receive packets it
calls "ec_poll". Also note that it won't receive a packet unless it
sends one first, due to the way that the network it's connected to
works, so I can't tell if receives work or not when sends don't work.)
Unloading and reloading the module fixes this, even if the link is
initially down and then reconnected after the module is reloaded. (So
perhaps the problem is something it does at the link-loss event?)
Occasionally, it does manage to survive one or two replugs before
getting into the problem state. But once there, no amount of replugging
appears to recover it; only reloading the module.
I do know that when it's in the failure state (not actually sending
packets), e1000_xmit_frame continues to get all the way to the bottom
and return NETDEV_TX_OK.
Note that the e1000e code being used is still the code as shown in the
link above, not the code as exists in Linux 4.9.168. I did try rebasing
the ethercat patches onto the new driver version, but this didn't seem
to change the behavior.
Also note that the bad behavior was observed on an I219-V and an
I219-LM, but does not appear to happen with an 82571EB (these are the
only devices I have handy at the moment). The problem also doesn't
occur when using the unpatched driver from 4.9.168 as a standard Linux
network driver.
Obviously, something the patches are doing is causing problems, but it
seems odd that the issue only occurs with certain hardware and with
certain kernel versions. Any ideas on what could be the cause and
solution (or how to narrow it down further)? I can easily make changes
to the driver code; trying kernel versions between the two above is a
lot harder, but I might be able to do that too.
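
For a rough sense of what testing the intermediate kernels would involve:
4.9.110 (works) and 4.9.168 (fails) are 58 point releases apart, so a
binary search over the 4.9.y releases needs at most six builds. Below is
a minimal Python sketch of that search. It assumes the breakage is
monotonic (every release from the first bad one onward fails). The
is_bad callback and the 4.9.150 value are purely hypothetical stand-ins
for "build the patched module against this kernel, then replug the
cable and try to send".

```python
def bisect_first_bad(good, bad, is_bad):
    """Return (first_bad, tested) for 4.9.y point releases.

    good:   highest point release known to work (e.g. 110)
    bad:    lowest point release known to fail (e.g. 168)
    is_bad: callable taking a point release number, returning True if
            that kernel shows the failure (hypothetical test hook)
    """
    tested = []
    lo, hi = good, bad  # invariant: lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(mid)
        if is_bad(mid):
            hi = mid   # failure already present at mid
        else:
            lo = mid   # mid still works; the change came later
    return hi, tested


if __name__ == "__main__":
    # Hypothetical scenario: pretend the regression arrived in 4.9.150.
    first_bad, tested = bisect_first_bad(110, 168, lambda v: v >= 150)
    print(f"first bad: 4.9.{first_bad}, builds tested: {len(tested)}")
```

Because the failure sometimes survives one or two replugs, each "good"
verdict would need several replug cycles before it can be trusted;
a single false "good" sends the search in the wrong direction.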