netdev.vger.kernel.org archive mirror
* dl2k tx timeout problems
From: Jon Mason @ 2005-06-08 14:10 UTC (permalink / raw)
  To: MOUNIER Emmanuel; +Cc: netdev

Bonjour,
Please see my comments below.

On 6/8/05, MOUNIER Emmanuel <Emmanuel.MOUNIER@rfo.fr> wrote:
> Hello!
>             I'm not sure, but I think EMT 64 refers to the extended (64-bit) PCI slot, right?

Actually no.  EM64T is Intel's version of the 64-bit x86 extensions
(equivalent to AMD64 on the Athlon 64/Opteron).  It lets you run either
a 64-bit or a 32-bit kernel.  If you installed a standard x86 version of
Linux, then you are running in 32-bit mode.  If you installed an x86_64
version (sometimes called amd64), then you are running in 64-bit mode.
"uname -a" will show you which one you are running (x86_64 for 64-bit,
typically i686 or i386 for 32-bit).
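For a script-friendly version of the same check, the machine field returned by uname(2) is what distinguishes the two cases. A minimal Python sketch using the standard `os.uname()`; the helper name `kernel_mode` is hypothetical, and only x86 machine strings are classified:

```python
import os


def kernel_mode(machine: str) -> str:
    """Classify a uname machine string (e.g. 'x86_64' or 'i686') as 64- or 32-bit x86."""
    if machine in ("x86_64", "amd64"):
        return "64-bit"
    # i386, i486, i586 and i686 all denote 32-bit x86 kernels
    if len(machine) == 4 and machine.startswith("i") and machine.endswith("86"):
        return "32-bit"
    return "unknown (" + machine + ")"


if __name__ == "__main__":
    # os.uname() is only available on Unix-like systems
    print(kernel_mode(os.uname().machine))
```

On the reporter's Mandrake 10.1 box this would print "32-bit" if a standard x86 kernel is installed.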

> If that's true, then yes, my card works in 64-bit mode, and I think that may be the problem, because the DGE-550SX works perfectly in some of our older servers in standard (32-bit) PCI slots.

I think this is worth noting.  I'll investigate it with my copper adapter.

>             I've tried many kernel versions without success, but currently I'm running kernel 2.6.8.1-10smp on Mandrake Linux 10.1.

What other kernels have you tried?  Have you tried a vanilla kernel
from kernel.org?
 
>             Now, I will try to explain my problem as clearly as I can:
> 
>             I plugged in the card and turned on my Linux box. The card was detected correctly and the module was loaded at boot.
> 
>             I can assign an IP address to the card, and I'm able to ping my network. After a short time, network traffic hangs completely and the log says: TX timeout, is buffer full?
 
How long (an estimate is fine) before the system experiences the Tx
timeout?  What kind of network traffic is the system handling during
this time?  Are the systems idle?  Are they running NFS?
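Some context on the timing question: a "TX timeout" report comes from the driver's tx_timeout handler, which the networking core's transmit watchdog invokes once the transmit queue has been stopped and nothing has been sent for longer than the device's `watchdog_timeo` (in jiffies), while the carrier is up. A simplified Python model of that check; the field names mirror `struct net_device`, but this is a sketch, not kernel code, and the 4-second timeout in the usage note is a hypothetical value:

```python
from dataclasses import dataclass


@dataclass
class NetDev:
    """Minimal stand-in for the net_device fields the TX watchdog looks at."""
    name: str
    queue_stopped: bool   # driver stopped its queue (e.g. TX ring full)
    trans_start: int      # jiffies timestamp of the last transmit
    watchdog_timeo: int   # allowed transmit stall, in jiffies
    carrier_ok: bool = True


def watchdog_fires(dev: NetDev, jiffies: int) -> bool:
    """True when the core would call the driver's tx_timeout handler.

    Jiffies wraparound is ignored in this model.
    """
    return (dev.carrier_ok
            and dev.queue_stopped
            and jiffies - dev.trans_start > dev.watchdog_timeo)
```

For example, with HZ=1000 and a hypothetical watchdog_timeo of 4000, a queue that stalled after a transmit at jiffies 10000 is only reported once jiffies exceeds 14000, so the logged timeout trails the actual hang by that interval.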
 
 
>             When I restart the network service, I can see in my logs that Linux simply disables the IRQ of my NIC:
 
Can you send me the output of "lspci -v"?  This will help confirm that
no other device shares the same interrupt.
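A complementary check is /proc/interrupts, where an IRQ line with several registered handlers lists their names separated by commas. A rough Python sketch that parses that text; the column layout differs between kernel versions, so the comma-based heuristic is an assumption, and the `uhci_hcd` entry in the test sample is invented for illustration, not taken from the reporter's machine:

```python
def shared_irqs(proc_interrupts: str) -> dict:
    """Return {irq: [device names]} for IRQ lines that list more than one handler.

    Heuristic: multiple handlers on one line appear as a ", "-separated
    list after the per-CPU counters and the interrupt controller type.
    """
    shared = {}
    for line in proc_interrupts.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip the CPU header and named rows such as NMI/LOC/ERR
        fields = rest.split(",")
        if len(fields) < 2:
            continue  # only one handler registered on this IRQ
        # The first field still carries counters and controller type;
        # its last whitespace-separated token is the first device name.
        first = fields[0].split()[-1]
        shared[int(label)] = [first] + [f.strip() for f in fields[1:]]
    return shared
```

On a live system the input would be the contents of /proc/interrupts; an empty result means no numbered IRQ line lists more than one handler.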
 
> /var/log/messages:
>
> FWRFO kernel: eth2: D-Link DGE-550SX Gigabit Ethernet Adapter, 00:0d:88:b5:f3:f5, IRQ 4
> FWRFO kernel: tx_coalesce:    16 packets
> FWRFO kernel: rx_coalesce:    10 packets
> FWRFO kernel: rx_timeout:     128000 ns
> FWRFO kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
> FWRFO kernel: You probably have a hardware problem with your RAM chips


The error above is a memory parity error.  That is definitely not
good.  Are you seeing this error very often?
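For reference, on 2.6-era i386 kernels this message is printed by mem_parity_error(), which the NMI handler calls when bit 7 of the NMI status byte (read from I/O port 0x61) is set; bit 6 instead signals an I/O channel check, and if neither bit is set the NMI is treated as unknown. On PC chipsets the same bit 7 can also be raised by a PCI SERR#, so a PCI device is a possible source as well as the RAM. A small Python sketch of that decoding; the bit meanings reflect a reading of arch/i386/kernel/traps.c and should be treated as an assumption, and `decode_nmi_reason` is a hypothetical name:

```python
# NMI status byte (I/O port 0x61) bits tested by the 2.6-era i386 NMI handler
# (assumption based on arch/i386/kernel/traps.c).
SERR_OR_PARITY = 0x80  # memory parity error / PCI SERR#
IOCHK = 0x40           # I/O channel check


def decode_nmi_reason(reason: int) -> list:
    """Return the NMI causes encoded in the port-0x61 status byte."""
    causes = []
    if reason & SERR_OR_PARITY:
        causes.append("memory parity / PCI SERR#")  # kernel calls mem_parity_error()
    if reason & IOCHK:
        causes.append("I/O channel check")          # kernel calls io_check_error()
    if not causes:
        causes.append("unknown NMI source")         # kernel calls unknown_nmi_error()
    return causes
```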

> FWRFO kernel: eth2: Link off
> FWRFO kernel: eth2: Link up
> FWRFO kernel: Auto 1000 Mbps, Full duplex
> FWRFO kernel: Enable Tx Flow Control
> FWRFO kernel: Enable Rx Flow Control
> FWRFO kernel: irq 4: nobody cared!
> FWRFO kernel:  [dump_stack+30/32] dump_stack+0x1e/0x20
> FWRFO kernel:  [<c0107e3e>] dump_stack+0x1e/0x20
> FWRFO kernel:  [__report_bad_irq+43/144] __report_bad_irq+0x2b/0x90
> FWRFO kernel:  [<c010945b>] __report_bad_irq+0x2b/0x90
> FWRFO kernel:  [note_interrupt+144/176] note_interrupt+0x90/0xb0
> FWRFO kernel:  [<c0109570>] note_interrupt+0x90/0xb0
> FWRFO kernel:  [do_IRQ+272/304] do_IRQ+0x110/0x130
> FWRFO kernel:  [<c0109850>] do_IRQ+0x110/0x130
> FWRFO kernel:  [common_interrupt+24/32] common_interrupt+0x18/0x20
> FWRFO kernel:  [<c0107960>] common_interrupt+0x18/0x20
> FWRFO kernel:  [do_softirq+53/64] do_softirq+0x35/0x40
> FWRFO kernel:  [<c01278f5>] do_softirq+0x35/0x40
> FWRFO kernel:  [do_IRQ+279/304] do_IRQ+0x117/0x130
> FWRFO kernel:  [<c0109857>] do_IRQ+0x117/0x130
> FWRFO kernel:  [common_interrupt+24/32] common_interrupt+0x18/0x20
> FWRFO kernel:  [<c0107960>] common_interrupt+0x18/0x20
> FWRFO kernel:  [pg0+945814381/1069203456] rio_open+0x5d/0x210 [dl2k]
> FWRFO kernel:  [<f8a51b6d>] rio_open+0x5d/0x210 [dl2k]
> FWRFO kernel:  [dev_open+232/256] dev_open+0xe8/0x100
> FWRFO kernel:  [<c0287bf8>] dev_open+0xe8/0x100
> FWRFO kernel:  [dev_change_flags+88/304] dev_change_flags+0x58/0x130
> FWRFO kernel:  [<c0289378>] dev_change_flags+0x58/0x130
> FWRFO kernel:  [devinet_ioctl+1392/1584] devinet_ioctl+0x570/0x630
> FWRFO kernel:  [<c02cda60>] devinet_ioctl+0x570/0x630
> FWRFO kernel:  [inet_ioctl+192/208] inet_ioctl+0xc0/0xd0
> FWRFO kernel:  [<c02cfd60>] inet_ioctl+0xc0/0xd0
> FWRFO kernel:  [sock_ioctl+522/720] sock_ioctl+0x20a/0x2d0
> FWRFO kernel:  [<c027ecca>] sock_ioctl+0x20a/0x2d0
> FWRFO kernel:  [sys_ioctl+586/662] sys_ioctl+0x24a/0x296
> FWRFO kernel:  [<c0174ffa>] sys_ioctl+0x24a/0x296
> FWRFO kernel:  [sysenter_past_esp+82/113] sysenter_past_esp+0x52/0x71
> FWRFO kernel:  [<c0106fa1>] sysenter_past_esp+0x52/0x71
> FWRFO kernel: handlers:
> FWRFO kernel: [pg0+945816320/1069203456] (rio_interrupt+0x0/0xf0 [dl2k])
> FWRFO kernel: [<f8a52300>] (rio_interrupt+0x0/0xf0 [dl2k])
> FWRFO kernel: Disabling IRQ #4
 

The bad interrupt is most likely related to the restarting of the
network while the adapter is hung.
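The "nobody cared" report and the "Disabling IRQ #4" line come from the kernel's spurious-interrupt detection in note_interrupt(): each time the IRQ fires, the kernel records whether any registered handler (here only rio_interrupt) claimed it. In 2.6-era kernels, once a window of 100,000 interrupts has elapsed, the line is reported and disabled if more than 99,900 of them went unhandled. A simplified Python model of that counting; the thresholds reflect a reading of the 2.6 note_interrupt() code, and the class is purely illustrative:

```python
class SpuriousIrqDetector:
    """Simplified model of the 2.6-era note_interrupt() heuristic."""

    WINDOW = 100_000          # interrupts per evaluation window
    UNHANDLED_LIMIT = 99_900  # unhandled count above which the IRQ is disabled

    def __init__(self, irq: int):
        self.irq = irq
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        if self.disabled:
            return  # a disabled line no longer delivers interrupts
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < self.WINDOW:
            return
        # End of window: decide, then reset both counters.
        self.count = 0
        if self.unhandled > self.UNHANDLED_LIMIT:
            print(f"irq {self.irq}: nobody cared!")
            print(f"Disabling IRQ #{self.irq}")
            self.disabled = True
        self.unhandled = 0
```

Feeding it 100,000 interrupts that the handler never claims (for instance, a level-triggered line left asserted) reproduces the two log lines above, while a line where even a few hundred interrupts per window are handled stays enabled.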


>             I went into the BIOS setup and configured the system not to share the IRQ of my NIC.
>
>             I've tried several D-Link NICs of the same series, in 4 HP DL-360 servers, so I don't think it's a hardware malfunction.
>
>             I also tried building a new kernel without power management, and with the D-Link driver built into the kernel (not as a module).
>
>             I can try as many debug patches as you want =)

Great!  I'm sure I'll have something for you to test.  I can send you
the patch that I sent to Richard.  It solves the problem under light
load, but the network will still hang under high load.

>             And of course, you can forward our e-mails to the Linux kernel networking mailing list.

I have CC'ed them on this e-mail, and changed the subject accordingly. 
 
 
>             I have some knowledge of Linux, but I'm not much of a software developer, so you may need to explain in detail what I must do to apply patches, etc.
> 

I'll be happy to explain when the time comes.

> Thank you very much, and sorry for my poor English...

Your English is very good (and loads better than my French).  

> Emmanuel Mounier
> 
> Chargé de projet direction Technique
> 
> RFO ( www.rfo.fr )
> 
> mail : emmanuel.mounier@rfo.fr
> 
> 
> ________________________________
> 
> From: Jon Mason [mailto:jdmason@gmail.com]
> Date: Tue. 07/06/2005 18:35
> To: MOUNIER Emmanuel
> Subject: Re: Help : Big Problem With DLINK Fiber NIC
> 
> 
> 
> Bonjour!
> 
> I am happy to help.  My previous experience has been with the copper
> adapters (I have one at home), but the fiber ones should be fairly
> similar.
> 
> From "http://h18004.www1.hp.com/products/servers/proliantdl360/", I
> see that your systems are EM64T.  Are you running them in 64-bit or
> 32-bit mode?  What kernel version are you running?
> 
> When you refer to the same problem, I assume you mean tx timeouts.
> How are you causing the error?
> 
> I never fully fixed Richard's issue, but I was able to get it working
> under light traffic.  I got sidetracked and haven't looked at the
> problem in a little while.  Are you willing to try some debug patches?
> 
> With your approval, I would like to CC the netdev mailing list
> (netdev@oss.sgi.com) on these e-mails.  netdev is the Linux kernel
> networking mailing list (in case you didn't already know).
> 
> Thanks,
> Jon
> 
> On 6/7/05, MOUNIER Emmanuel <Emmanuel.MOUNIER@rfo.fr> wrote:
> >
> >
> >
> > Hello.
> >
> >   I'm a French network manager, and I have a big problem with some D-Link
> > fiber network cards (DGE-550SX).
> >
> >  I saw on a website that you helped Mr. Richard EMS try to find a
> > solution.
> > (http://www.ussg.iu.edu/hypermail/linux/kernel/0412.2/0371.html)
> >
> >  I contacted him, but he said he had bought a different fiber NIC.
> >
> >  My problem is that I have 13 DGE-550SX cards in 8 HP ProLiant
> > DL-360 G4 servers, and I have the same problem.
> >
> >  I just want to know whether you have any ideas now, or whether you could
> > help me...
> >
> >  Fiber NICs are very expensive, and I hope I can find a way to solve the
> > problem, but neither D-Link nor HP seems able to give me a solution.
> >
> >  If I can do something to help you, just tell me what!
> >
> >  Thanks in advance.
> >
> >  Emmanuel Mounier
> >  Chargé de projet direction Technique
> >  RFO ( www.rfo.fr )
> >  mail : emmanuel.mounier@rfo.fr
> 
> 
> 
>
