netdev.vger.kernel.org archive mirror
* Simultanious transmits seems to cause hang on pcnet32
@ 2006-07-18 15:20 Lennart Sorensen
  2006-07-18 17:57 ` Don Fry
  0 siblings, 1 reply; 3+ messages in thread
From: Lennart Sorensen @ 2006-07-18 15:20 UTC (permalink / raw)
  To: netdev

I am currently doing some testing on my system and managing to totally
hang the system (so that the watchdog has to come along and reboot it).

The setup is this:
I have a PLX PCI-PCI bridge with 4 79C972 chips behind it, each running
100baseTX.  I am transmitting traffic from a smartbits test system from
port 1 to port 3 and back, and from port 2 to port 4 and back.  I am
running 500 packets/second with 60 byte packets each way.
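
For scale, the offered load can be worked out from the numbers above. This
is only a rough sketch: it assumes every router port both receives and
forwards 500 frames per second, counts only the 60 bytes stated (no
preamble, CRC or inter-frame gap), and all names in it are illustrative.

```python
# Back-of-the-envelope load for the test described above.
# Assumptions (not from the driver): each port receives AND transmits
# PACKETS_PER_SECOND frames, and only the stated 60 bytes are counted.
PORTS = 4
PACKETS_PER_SECOND = 500            # per port, each direction
FRAME_BYTES = 60
LINK_BITS_PER_SECOND = 100_000_000  # 100baseTX


def port_load_bits_per_second(pps, frame_bytes):
    """Bits per second carried in one direction on one port."""
    return pps * frame_bytes * 8


def router_packets_per_second(ports, pps):
    """Frames the router handles per second, counting rx and tx."""
    return ports * pps * 2


per_port_bps = port_load_bits_per_second(PACKETS_PER_SECOND, FRAME_BYTES)
utilisation = per_port_bps / LINK_BITS_PER_SECOND
total_pps = router_packets_per_second(PORTS, PACKETS_PER_SECOND)

print("per-port load (bit/s):", per_port_bps)
print("link utilisation:", utilisation)
print("frames handled per second (rx + tx):", total_pps)
```

Under those assumptions each link is well under 1% utilised, so the hang
is not a matter of raw bandwidth.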

If I start the traffic on all 4 ports at the same time, I get less than
100 packets received back at the smartbits on each port, and then the
linux kernel is hung.  No response to anything I have tried.  The
watchdog then reboots the system.

If I start traffic on less than 4 ports, and then add the remaining
ports a second or so later, then it runs just fine and keeps up with the
traffic.

I tried making the traffic all flow out eth0 (an rtl8139 port) instead
of out the pcnet32 ports, and then there is no problem, so I think there
is some problem when multiple ports try to start transmitting at the
same time.

So far it has failed with 2.6.8 and 2.6.16 and with 2.6.17's pcnet32
with the napi patches applied.

I noticed that sometime between 2.6.4 and 2.6.8, the TxDone interrupts
were removed entirely, whereas they used to be sent every once in a
while.  I am not sure if this is making a difference yet.

I tried increasing the ring sizes to their maximum setting of 9/9 rather
than the current default of 4/5, and that didn't make any difference
either.
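
The ring settings can be turned into descriptor counts with a one-line
calculation. This sketch assumes the values are log2 of the number of ring
entries and that "4/5" is given in tx/rx order; neither is spelled out
here, and the names are illustrative.

```python
# Convert log2 ring settings into descriptor counts.
# Assumption: the settings are log2 of the ring length, in tx/rx order.
DEFAULT_TX_LOG2, DEFAULT_RX_LOG2 = 4, 5
MAX_TX_LOG2, MAX_RX_LOG2 = 9, 9


def ring_entries(log2_size):
    """Number of descriptors in a ring of the given log2 size."""
    return 1 << log2_size


default_rings = (ring_entries(DEFAULT_TX_LOG2), ring_entries(DEFAULT_RX_LOG2))
max_rings = (ring_entries(MAX_TX_LOG2), ring_entries(MAX_RX_LOG2))

print("default tx/rx entries:", default_rings)
print("maximum tx/rx entries:", max_rings)
```

If that reading is right, the change was from 16/32 entries to 512/512,
which makes it less likely that the hang comes from a ring simply filling
up.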

Does anyone have a suggestion for how to go about debugging this issue?
So far I am very confused.

I tried turning on lots of debugging in pcnet32, but that seems to slow
the system down enough (printing debug messages on the serial console)
that it only manages to transmit 10 packets per port per second, at
which point it doesn't lock up.  Reducing the test setting from 500
60-byte packets/second to 100 makes the problem disappear as well.

So I am open to suggestions to try.  I really don't know how to go
about debugging this when it makes the kernel lock up.  It makes me think
it is getting stuck somewhere with interrupts disabled, but I can't see
anything in the transmit code that looks like that could happen.

--
Len Sorensen


* Re: Simultanious transmits seems to cause hang on pcnet32
  2006-07-18 15:20 Simultanious transmits seems to cause hang on pcnet32 Lennart Sorensen
@ 2006-07-18 17:57 ` Don Fry
  2006-07-18 18:45   ` Lennart Sorensen
  0 siblings, 1 reply; 3+ messages in thread
From: Don Fry @ 2006-07-18 17:57 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: netdev

On Tue, Jul 18, 2006 at 11:20:17AM -0400, Lennart Sorensen wrote:
> I am currently doing some testing on my system and managing to totally
> hang the system (so that the watchdog has to come along and reboot it).
> 
> The setup is this:
> I have a PLX PCI-PCI bridge with 4 79C972 chips behind it, each running
> 100baseTX.  I am transmitting traffic from a smartbits test system from
> port 1 to port 3 and back, and from port 2 to port 4 and back.  I am
> running 500 packets/second with 60 byte packets each way.

I don't know what a 'smartbits test system' is or how it works.  Could
you please briefly explain what it is and does?

> 
> If I start the traffic on all 4 ports at the same time, I get less than
> 100 packets received back at the smartbits on each port, and then the
> linux kernel is hung.  No response to anything I have tried.  The
> watchdog then reboots the system.
> 
> If I start traffic on less than 4 ports, and then add the remaining
> ports a second or so later, then it runs just fine and keeps up with the
> traffic.
> 
> I tried making the traffic all flow out eth0 (an rtl8139 port) instead
> of out the pcnet32 ports, and then there is no problem, so I think there
> is some problem when multiple ports try to start transmitting at the
> same time.

Is the rtl8139 on the same PCI bus?

> 
> So far it has failed with 2.6.8 and 2.6.16 and with 2.6.17's pcnet32
> with the napi patches applied.

Is there a version of the pcnet32 driver that does work?  Is this a
stock driver or do you have modifications made as well?

> 
> I noticed that sometime between 2.6.4 and 2.6.8, the TxDone interrupts
> were removed entirely, whereas they used to be sent every once in a
> while.  I am not sure if this is making a difference yet.

The ltint or TxDone interrupt deferral code was removed in May 2004,
2.6.7 timeframe.  Every transmit packet causes an interrupt, rather than
just occasionally.

> 
> I tried increasing the ring sizes to their maximum setting of 9/9 rather
> than the current default of 4/5, and that didn't make any difference
> either.

Does reducing the ring size make any difference?  Or tx large/rx small,
or vice-versa?

> 
> Does anyone have a suggestion for how to go about debugging this issue?
> So far I am very confused.

Is there any way to see what is happening on the PCI bus where the
pcnet32 devices are connected?  Or see what is happening on the master
side of the pci-to-pci bridge?  Do the chips share any interrupt lines
or do they all have dedicated irq's?

Is this an SMP or UP system?

> 
> I tried turning on lots of debugging in pcnet32, but that seems to slow
> the system down enough (printing debug messages on the serial console)
> that it only manages to transmit 10 packets per port per second, at
> which point it doesn't lock up.  Reducing the test setting from 500
> 60-byte packets/second to 100 makes the problem disappear as well.
> 
> So I am open to suggestions to try.  I really don't know how to go
> about debugging this when it makes the kernel lock up.  It makes me think
> it is getting stuck somewhere with interrupts disabled, but I can't see
> anything in the transmit code that looks like that could happen.
> 
> --
> Len Sorensen

-- 
Don Fry
brazilnut@us.ibm.com


* Re: Simultanious transmits seems to cause hang on pcnet32
  2006-07-18 17:57 ` Don Fry
@ 2006-07-18 18:45   ` Lennart Sorensen
  0 siblings, 0 replies; 3+ messages in thread
From: Lennart Sorensen @ 2006-07-18 18:45 UTC (permalink / raw)
  To: Don Fry; +Cc: netdev

On Tue, Jul 18, 2006 at 10:57:47AM -0700, Don Fry wrote:
> I don't know what a 'smartbits test system' is or how it works.  Could
> you please briefly explain what it is and does?

It is a network test system built by spirent (www.spirentcom.com).

It is mainly a layer 2 test system: you configure what you want the
ethernet packets to look like, what rate you want them sent at, and which
fields to change and by how much on each packet sent out.  We have it
configured to generate packets from 192.168.1.2 to 192.168.3.2 (and vice
versa), with the IP of the router containing the pcnet32 chips set as
the gateway.  Each packet is simply an ethernet frame carrying an IPv4
header with the source and destination IP filled in, along with the
other required fields and the checksum, and a data part filled with 0s
in this case.

> Is the rtl8139 on the same PCI bus?

The 8139 is on the primary PCI bus, the 972s are behind the pci bridge.
The 8139 driver is normally not even loaded.

> Is there a version of the pcnet32 driver that does work?  Is this a
> stock driver or do you have modifications made as well?

I haven't found one that works yet.  The only changes I have made are to
initialize the PHY and set the MAC address, since we don't have an eeprom
connected to the 972s.  I was thinking of going and trying with 2.4.27
or something around there, to see if an older driver behaves differently.

> The ltint or TxDone interrupt deferral code was removed in May 2004,
> 2.6.7 timeframe.  Every transmit packet causes an interrupt, rather than
> just occasionally.

Hmm, the way I read the code, it looked like setting the status to 0x8300
meant the packet would not generate an interrupt, and setting it to 0x9300
meant it would.  I guess I read it backwards.  That
wouldn't surprise me. :)
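
For reference, the two status values differ in a single bit. The sketch
below decodes them using bit names recalled from the Am79C97x transmit
descriptor layout (OWN, LTINT, STP, ENP); those names and positions are an
assumption here, not something taken from this thread, and should be
checked against the datasheet.

```python
# Decode the two transmit status values discussed above.
# Assumption: bit names and positions for the 16-bit status word are
# recalled from the Am79C97x datasheet and are not confirmed here.
TX_STATUS_BITS = [
    (0x8000, "OWN"),    # descriptor is owned by the controller
    (0x1000, "LTINT"),  # request an interrupt when this frame completes
    (0x0200, "STP"),    # start of packet
    (0x0100, "ENP"),    # end of packet
]


def decode_tx_status(status):
    """Return the names of the known bits set in a status word."""
    return [name for bit, name in TX_STATUS_BITS if status & bit]


def differing_bits(a, b):
    """Bits that are set in exactly one of the two status words."""
    return a ^ b


print(hex(0x8300), decode_tx_status(0x8300))
print(hex(0x9300), decode_tx_status(0x9300))
print("difference:", hex(differing_bits(0x8300, 0x9300)))
```

If the layout is as assumed, 0x9300 is 0x8300 plus the per-frame interrupt
request bit. That bit presumably only matters while the chip's deferred
transmit interrupt mode is enabled; with that mode off, every transmitted
frame would interrupt regardless, which fits the explanation quoted above.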

> Does reducing the ring size make any difference?  Or tx large/rx small,
> or vice-versa?

I don't know.  I can try that.

> Is there any way to see what is happening on the PCI bus where the
> pcnet32 devices are connected?  Or see what is happening on the master
> side of the pci-to-pci bridge?  Do the chips share any interrupt lines
> or do they all have dedicated irq's?

We have two interrupts for the PCI bus, irq10 and 11.  eth1 and 3 share
one, and eth2 and 4 share the other.

> Is this an SMP or UP system?

UP: a single AMD Geode SCx200 at 266MHz.

I have also considered building with PREEMPT off, to see if that makes a
difference, not that there are really any user space processes doing
anything on the system.

--
Len Sorensen
