netdev.vger.kernel.org archive mirror
* Re: [Bugme-new] [Bug 14737] New: e1000e driver experiences large packet losses
       [not found] <bug-14737-10286@http.bugzilla.kernel.org/>
@ 2009-12-07 21:19 ` Andrew Morton
  2009-12-07 21:53   ` Brandeburg, Jesse
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2009-12-07 21:19 UTC
  To: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, PJ Waskiewicz, Joh
  Cc: bugzilla-daemon, bugme-daemon, netdev, xenoterracide


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 5 Dec 2009 07:02:49 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14737
> 
>            Summary: e1000e driver experiences large packet losses
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.32--
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: xenoterracide@gmail.com
>         Regression: No
> 
> 
> Possibly related to bug #13835: same symptoms, and I uploaded a lot of
> information for this bug there.
> 
> Also read this thread: http://marc.info/?t=125699907100001&r=1&w=2
> 
> So far I have found similar symptoms all the way back to 2.6.29.6; I have
> not yet tested further back. The problem is intermittent. It does not appear
> to affect another NIC on the system (although that NIC uses a different
> driver and my testing of it has not been extensive). If the bug has not
> manifested after a boot, it will not manifest until I reboot (unless,
> perhaps, I reload modules or restart interfaces; I have not tested that).
> 



* Re: [Bugme-new] [Bug 14737] New: e1000e driver experiences large packet losses
  2009-12-07 21:19 ` [Bugme-new] [Bug 14737] New: e1000e driver experiences large packet losses Andrew Morton
@ 2009-12-07 21:53   ` Brandeburg, Jesse
  2009-12-07 22:20     ` Caleb Cushing
  2009-12-07 22:59     ` Jarek Poplawski
  0 siblings, 2 replies; 4+ messages in thread
From: Brandeburg, Jesse @ 2009-12-07 21:53 UTC
  To: xenoterracide@gmail.com
  Cc: Kirsher, Jeffrey T, Allan, Bruce W, Waskiewicz Jr, Peter P,
	Ronciak, John, bugzilla-daemon@bugzilla.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Andrew Morton,
	netdev@vger.kernel.org

Thanks akpm, I've been watching this thread, and now I will try to jump in.

Caleb, can you please summarize where we are today? You've done a lot of
testing and the thread has gone on a while.

Kernels known to fail (after any length):

Kernels known to work:

Have you been able to try the latest e1000e from 2.6.32?  It has some
fixes in it, although none that I can think of offhand that would fix your
issue.

I have a couple of related questions. Why don't you have irqbalance
enabled?  Network interrupts should not be migrating evenly across all
CPUs; at the very least, your system should be reconfigured to lock the
interrupts to a particular core with smp_affinity.


           CPU0       CPU1       CPU2       CPU3       
  0:        119         59         69         70   IO-APIC-edge      timer
  1:          1          2          1          0   IO-APIC-edge      i8042
  6:          0          1          0          1   IO-APIC-edge      floppy
  8:        185        178        175        180   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          1          2          3   IO-APIC-edge      i8042
 16:     761720     767583     765772     762262   IO-APIC-fasteoi   uhci_hcd:usb3, EMU10K1
 17:          2          1          0          0   IO-APIC-fasteoi   ohci1394
 18:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb8
 19:     192022     191598     191809     191886   IO-APIC-fasteoi   uhci_hcd:usb5, uhci_hcd:usb7
 21:          0          1          1          3   IO-APIC-fasteoi   uhci_hcd:usb4, eth0
 23:      19600      19263      19489      19502   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
 25:     419910     412980     411109     416834   PCI-MSI-edge      i915
 26:     233236     233744     233647     233567   PCI-MSI-edge      ahci
 27:     709493     708677     709630     708963   PCI-MSI-edge      eth1
NMI:          0          0          0          0   Non-maskable interrupts
LOC:   10375694    9592098    6283658    6319369   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0   Performance pending work
RES:      50103      49240      47545      45606   Rescheduling interrupts
CAL:      74174        408      71586        453   Function call interrupts
TLB:      49410      53567      50409      52426   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:        271        271        271        271   Machine check polls
ERR:          0
MIS:          0
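
The interrupt counts above have the layout of /proc/interrupts, and the eth1 row is spread almost evenly over the four CPUs. A minimal Python sketch of how one might quantify that spread and derive a bitmask for pinning an interrupt to one core follows. The function names and the embedded sample are illustrative assumptions, not part of any tool mentioned in this thread; the only kernel interface assumed is that /proc/irq/<N>/smp_affinity takes a hexadecimal CPU bitmask.

```python
# Sample rows copied from the table above (a real run would read /proc/interrupts).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 21:          0          1          1          3   IO-APIC-fasteoi   uhci_hcd:usb4, eth0
 27:     709493     708677     709630     708963   PCI-MSI-edge      eth1
ERR:          0
"""


def parse_interrupts(text):
    """Return {label: (per_cpu_counts, description)} for /proc/interrupts-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counts; rows such as ERR have fewer.
        for field in fields:
            if len(counts) < ncpu and field.isdigit():
                counts.append(int(field))
            else:
                break
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def find_irq(table, device):
    """Return the label of the first row whose description names the device."""
    for label, (_, desc) in table.items():
        if device in desc.replace(",", " ").split():
            return label
    return None


def cpu_shares(counts):
    """Fraction of the interrupts handled by each CPU."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def affinity_mask(cpu):
    """Hex bitmask selecting a single CPU, in the form smp_affinity expects."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    return format(1 << cpu, "x")


table = parse_interrupts(SAMPLE)
irq = find_irq(table, "eth1")
print(irq, [round(s, 3) for s in cpu_shares(table[irq][0])], affinity_mask(0))
```

Writing the resulting mask to /proc/irq/<N>/smp_affinity requires root, and a running irqbalance daemon may later change it, so this is only a way to experiment with the suggestion, not a permanent configuration.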

I see nothing in the ethtool -S statistics that indicates anything is
wrong, and you've gotten no tx timeouts as far as I can tell. Have you
had any system panics (possibly seemingly unrelated to the network)?



* Re: [Bugme-new] [Bug 14737] New: e1000e driver experiences large packet losses
  2009-12-07 21:53   ` Brandeburg, Jesse
@ 2009-12-07 22:20     ` Caleb Cushing
  2009-12-07 22:59     ` Jarek Poplawski
  1 sibling, 0 replies; 4+ messages in thread
From: Caleb Cushing @ 2009-12-07 22:20 UTC
  To: Brandeburg, Jesse
  Cc: Kirsher, Jeffrey T, Allan, Bruce W, Waskiewicz Jr, Peter P,
	Ronciak, John, bugzilla-daemon@bugzilla.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Andrew Morton,
	netdev@vger.kernel.org

On Mon, Dec 7, 2009 at 4:53 PM, Brandeburg, Jesse
<jesse.brandeburg@intel.com> wrote:
> Thanks akpm, I've been watching this thread, and now I will try to jump in.
>
> Caleb, can you please summarize where we are today? You've done a lot of
> testing and the thread has gone on a while.
>
> Kernels known to fail (after any length):

2.6.29.6 through 2.6.32 is the range I've tested. 2.6.29 only seemed to
have 10% packet loss with mtr, as opposed to the 30-50% on later kernels;
still, that's abnormal and shouldn't be happening. I haven't tested
further back yet.

> Kernels known to work:

Flawlessly, none at this point. I've been able to reproduce it on every
version tested. Given that it doesn't happen on every reboot, and I
rarely reboot, this is difficult to test. Other than the fact that I've
been unable to find a good kernel, nothing suggests hardware failure.
Given that some of the other e1000e bugs go back further than I've
tested...

> Have you been able to try the latest e1000e from 2.6.32?  It has some
> fixes in it, although none that I can think of offhand that would fix your
> issue.

Yes. It is reproducible there; whether it occurs as often, I'm not sure.

> I have a couple of related questions. Why don't you have irqbalance
> enabled?  Network interrupts should not be migrating evenly across all
> CPUs; at the very least, your system should be reconfigured to lock the
> interrupts to a particular core with smp_affinity.

Is that new with 2.6.32? If not, I don't know... I'm using Arch Linux's
config as a base; if it's something they should have enabled, I can
relay the message.

> I see nothing in the ethtool -S statistics that indicates anything is
> wrong, and you've gotten no tx timeouts as far as I can tell. Have you
> had any system panics (possibly seemingly unrelated to the network)?

No. My system seems perfectly stable (aside from some end-user
software bugs, and even then only Kopete seems to crash these days,
because I'm using an experimental protocol). I can't explain why the
tests don't show anything wrong...

Hmm, a thought: possibly iptables is dropping them as INVALID? I still
think that testing on just this system, with one NIC hooked into the
other, might be a good idea, as the firewall configuration in OpenWrt
is not straightforward to me. This would also remove any QoS rules that
the router is applying, as well as random packets floating around
(that Windows boxen are sending).
-- 
Caleb Cushing

http://xenoterracide.blogspot.com


* Re: [Bugme-new] [Bug 14737] New: e1000e driver experiences large packet losses
  2009-12-07 21:53   ` Brandeburg, Jesse
  2009-12-07 22:20     ` Caleb Cushing
@ 2009-12-07 22:59     ` Jarek Poplawski
  1 sibling, 0 replies; 4+ messages in thread
From: Jarek Poplawski @ 2009-12-07 22:59 UTC
  To: Brandeburg, Jesse
  Cc: xenoterracide@gmail.com, Kirsher, Jeffrey T, Allan, Bruce W,
	Waskiewicz Jr, Peter P, Ronciak, John,
	bugzilla-daemon@bugzilla.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Andrew Morton,
	netdev@vger.kernel.org

Brandeburg, Jesse wrote, On 12/07/2009 10:53 PM:

> I see nothing in the ethtool -S statistics that indicates anything is
> wrong, and you've gotten no tx timeouts as far as I can tell. Have you
> had any system panics (possibly seemingly unrelated to the network)?


There are unanswered ICMP echoes and a lot of TCP retransmits in the
first one (netstat_after.slave4.log.gz):

Ip:
    812 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    802 incoming packets delivered
    1048 requests sent out
Icmp:
    488 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 2
        timeout in transit: 289
        echo replies: 197
    677 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 2
IcmpMsg:
        InType0: 197
        InType3: 2
        InType11: 289
        OutType3: 2
        OutType69: 675
Tcp:
    17 active connections openings
    0 passive connection openings
    14 failed connection attempts
    0 connection resets received
    0 connections established
    45 segments received
    49 segments send out
    19 segments retransmited
    0 bad segments received.
    19 resets sent

I analyzed tcpdumps from the router, and ICMP echo requests really were
missing on input. At the same time, nothing is wrong in the stats
(qdisc, ifconfig, ethtool) of the sending box with e1000e, or in the
router's ifconfig (the only thing I didn't see is the router's netstat).
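
As a rough cross-check of the counters quoted above, the arithmetic can be sketched in Python. This assumes, and it is only an assumption, that all 675 outgoing ICMP messages on the OutType69 line were mtr probes, each of which should have drawn either an echo reply or a "timeout in transit" message; the function names are illustrative.

```python
def unanswered_fraction(sent, answered):
    """Fraction of probes that never drew a reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - answered) / sent


def retransmit_ratio(retransmitted, sent_out):
    """Retransmitted TCP segments relative to all segments sent out."""
    if sent_out <= 0:
        raise ValueError("sent_out must be positive")
    return retransmitted / sent_out


# Counters copied from the netstat listing above.
icmp_probes_sent = 675   # OutType69 line; assumed to be the mtr probes
echo_replies = 197       # InType0
time_exceeded = 289      # InType11, "timeout in transit"
tcp_segments_out = 49
tcp_retransmitted = 19

icmp_loss = unanswered_fraction(icmp_probes_sent, echo_replies + time_exceeded)
tcp_retx = retransmit_ratio(tcp_retransmitted, tcp_segments_out)
print(f"unanswered ICMP probes: {icmp_loss:.1%}")
print(f"TCP retransmit ratio:   {tcp_retx:.1%}")
```

Under that assumption, the unanswered share is of the same order as the packet loss reported with mtr earlier in the thread, even though the interface statistics show no errors.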

Anyway, I doubt it's accidental, or that the router is to blame, if
another NIC, and this e1000e with some boots/kernels(?), can work
flawlessly.

Btw, after finding the similarly mysterious story below (with the
same NIC), I wonder if the router model can matter here too, but
maybe I'm wrong.

http://bugzilla.kernel.org/show_bug.cgi?id=11998


Jarek P.
  

