public inbox for linux-kernel@vger.kernel.org
* Linux 2.6.9 Adaptec 4 Port Starfire Sickness
@ 2005-04-03  4:41 jmerkey
  2005-04-03  5:47 ` Willy Tarreau
  2005-04-03  7:26 ` Jeff Garzik
  0 siblings, 2 replies; 7+ messages in thread
From: jmerkey @ 2005-04-03  4:41 UTC (permalink / raw)
  To: linux-kernel

With Linux 2.6.9 handling 192 MB/s of network load, and protocol-splitting
drivers routing packets out of a 2.6.9 device at the full 100 Mb/s
(12.5 MB/s) simultaneously over 4 ports, the Adaptec Starfire driver goes
into constant Tx FIFO reconfiguration mode. It keeps resetting the Tx FIFO
window and generates a deluge of messages such as:

ethX:  PCI bus congestion, resetting Tx FIFO window to X bytes

which pour into the system log file at a rate of a dozen per minute. After
3-4 days of this, the PCI bus locks up completely and hangs the system.
I need a config option that allows disabling this feature on the Starfire.
At very high bus loading rates with protocol splitting and routing, the
Starfire card will completely lock the bus after 3-4 days of constant Tx
FIFO reconfiguration.
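The "config option" asked for above could take roughly the following shape. This is a user-space sketch, not starfire code: `tx_fifo_autotune`, `struct fake_port` and `handle_tx_congestion()` are hypothetical names, the byte-based threshold and fixed step are assumptions, and in a real driver such a switch would more likely be a module parameter than a plain global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical switch: a real driver might expose this as a module
 * parameter. Setting it to false leaves the Tx FIFO threshold alone. */
static bool tx_fifo_autotune = true;

/* Hypothetical, simplified per-port state (not the real starfire struct). */
struct fake_port {
    const char *name;
    unsigned int tx_threshold;   /* current threshold, in bytes (assumed) */
    unsigned int step;           /* bytes added per congestion event */
    unsigned int max_threshold;  /* never program more than this */
};

/* Called when the simulated adapter reports PCI bus congestion on Tx.
 * Returns true if the threshold was reprogrammed. */
static bool handle_tx_congestion(struct fake_port *p)
{
    assert(p != NULL && p->step > 0);
    if (!tx_fifo_autotune)
        return false;                       /* feature switched off */
    if (p->tx_threshold + p->step > p->max_threshold)
        return false;                       /* already at the ceiling */
    p->tx_threshold += p->step;
    printf("%s: PCI bus congestion, Tx FIFO threshold now %u bytes\n",
           p->name, p->tx_threshold);
    return true;
}
```

With the switch off, congestion reports are simply ignored, so neither the FIFO reprogramming nor the log message happens.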

Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linux 2.6.9 Adaptec 4 Port Starfire Sickness
  2005-04-03  4:41 Linux 2.6.9 Adaptec 4 Port Starfire Sickness jmerkey
@ 2005-04-03  5:47 ` Willy Tarreau
  2005-04-03  6:58   ` jmerkey
  2005-04-03  7:26 ` Jeff Garzik
  1 sibling, 1 reply; 7+ messages in thread
From: Willy Tarreau @ 2005-04-03  5:47 UTC (permalink / raw)
  To: jmerkey; +Cc: linux-kernel

Hi Jeff,

I've also experienced those messages under 2.4, but they were harmless,
and I never had a machine hang even after weeks of full load (the adapter
was mounted on a stress test machine before being used in firewalls for
months).

So I wonder how you can be sure that it is this driver which finally locks
the bus. Perhaps the system locks up for some other reason (e.g. a race
condition). Have you tried any other 4-port NIC (Tulip or Sun, for
example)? The Sun QFE would be the most interesting to test, as it also
supports 64-bit / 66 MHz.
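For context on why bus width and clock matter, peak PCI bandwidth is the bus width in bytes times the clock rate. These are the standard nominal (theoretical) figures, not measurements from this setup, and sustained throughput is always lower:

```latex
\begin{aligned}
B_{32/33} &= 4\,\text{B} \times 33.33\,\text{MHz} \approx 133\,\text{MB/s} \\
B_{64/66} &= 8\,\text{B} \times 66.67\,\text{MHz} \approx 533\,\text{MB/s} \\
B_{\text{4 ports}} &= 4 \times 100\,\text{Mb/s} = 400\,\text{Mb/s} = 50\,\text{MB/s per direction}
\end{aligned}
```

Full-duplex traffic on all four ports doubles the last figure to 100 MB/s. If the card sat on a 32-bit/33 MHz segment (an assumption; the thread does not say which slot was used), that alone would be a large share of the bus before any disk I/O competes for it.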

Regards,
Willy

On Sat, Apr 02, 2005 at 09:41:28PM -0700, jmerkey wrote:
> With Linux 2.6.9 handling 192 MB/s of network load, and protocol-splitting
> drivers routing packets out of a 2.6.9 device at the full 100 Mb/s
> (12.5 MB/s) simultaneously over 4 ports, the Adaptec Starfire driver goes
> into constant Tx FIFO reconfiguration mode. It keeps resetting the Tx FIFO
> window and generates a deluge of messages such as:
> 
> ethX:  PCI bus congestion, resetting Tx FIFO window to X bytes
> 
> which pour into the system log file at a rate of a dozen per minute. After
> 3-4 days of this, the PCI bus locks up completely and hangs the system.
> I need a config option that allows disabling this feature on the Starfire.
> At very high bus loading rates with protocol splitting and routing, the
> Starfire card will completely lock the bus after 3-4 days of constant Tx
> FIFO reconfiguration.
> 
> Jeff
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


* Re: Linux 2.6.9 Adaptec 4 Port Starfire Sickness
  2005-04-03  5:47 ` Willy Tarreau
@ 2005-04-03  6:58   ` jmerkey
  2005-04-03  7:38     ` Willy Tarreau
  0 siblings, 1 reply; 7+ messages in thread
From: jmerkey @ 2005-04-03  6:58 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel


It works fine with the Intel Dual Port Pro-1000 MT adapters, without these
problems. My test scenarios use jumbo frames as well. My guess is that PCI
bus contention is high because of the disk I/O bandwidth, and that this
creates conditions the adapter does not normally see. The documentation
states that this message should be very rare, not something that spools
into the logs at this rate.

See http://www.ibiblio.org/mdw/HOWTO/Ethernet-HOWTO-8.html
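Independently of what the hardware does, the log flood itself can be tamed by rate-limiting the message. The kernel has its own helper for networking messages (`net_ratelimit()`); the fixed-window limiter below is only a self-contained user-space stand-in, and `struct log_limiter` and `log_allowed()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fixed-window rate limiter: allow at most `burst`
 * messages per `interval` seconds and count what was suppressed. */
struct log_limiter {
    long interval;      /* window length in seconds */
    int burst;          /* messages allowed per window */
    long window_start;  /* start time of the current window */
    int printed;        /* messages printed in the current window */
    int suppressed;     /* messages dropped in the current window */
};

/* Returns true if a message stamped `now` (in seconds) may be printed. */
static bool log_allowed(struct log_limiter *rl, long now)
{
    assert(rl != NULL && rl->interval > 0 && rl->burst > 0);
    if (now - rl->window_start >= rl->interval) {
        /* New window: report what was dropped, then reset the counters. */
        if (rl->suppressed > 0)
            printf("%d congestion messages suppressed\n", rl->suppressed);
        rl->window_start = now;
        rl->printed = 0;
        rl->suppressed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->suppressed++;
    return false;
}
```

With `interval = 60` and `burst = 2`, a dozen congestion events per minute produce two log lines plus one "suppressed" summary per minute instead of twelve lines.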

Jeff

Willy Tarreau wrote:

>Hi Jeff,
>
>I've also experienced those messages under 2.4, but they were harmless,
>and I never had a machine hang even after weeks of full load (the adapter
>was mounted on a stress test machine before being used in firewalls for
>months).
>
>So I wonder how you can be sure that it is this driver which finally locks
>the bus. Perhaps the system locks up for some other reason (e.g. a race
>condition). Have you tried any other 4-port NIC (Tulip or Sun, for
>example)? The Sun QFE would be the most interesting to test, as it also
>supports 64-bit / 66 MHz.
>
>Regards,
>Willy



* Re: Linux 2.6.9 Adaptec 4 Port Starfire Sickness
  2005-04-03  7:26 ` Jeff Garzik
@ 2005-04-03  7:07   ` jmerkey
  0 siblings, 0 replies; 7+ messages in thread
From: jmerkey @ 2005-04-03  7:07 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

Jeff Garzik wrote:

> The feature doesn't need disabling; just modify the driver to stop the 
> flapping.
>
> Jeff
>
I am going to try turning off the Tx FIFO reconfiguration in the code
completely, not just the message, and see if that helps. We'll see what
happens ...

Jeff


* Re: Linux 2.6.9 Adaptec 4 Port Starfire Sickness
  2005-04-03  7:38     ` Willy Tarreau
@ 2005-04-03  7:21       ` jmerkey
  0 siblings, 0 replies; 7+ messages in thread
From: jmerkey @ 2005-04-03  7:21 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel


I disabled the FIFO resetting code and am running tests; we'll see what
happens. I am on 2.6, not 2.4, so the problem could be specific to 2.6.
At any rate, I will see whether the problem goes away.

Jeff

Willy Tarreau wrote:

>On Sat, Apr 02, 2005 at 11:58:44PM -0700, jmerkey wrote:
>
>>It works fine with the Intel Dual Port Pro-1000 MT adapters, without these
>>problems.
>
>But unless I'm mistaken, there's no PCI bridge on the Intel board, and the
>two ports may well share the same IRQ. That's why I suggested trying a
>4-port Sun QFE or something more similar to the Starfire.
>
>>My test scenarios use jumbo frames as well. My guess is that PCI bus
>>contention is high because of the disk I/O bandwidth, and that this
>>creates conditions the adapter does not normally see.
>
>As I said, I have been saturating this card for weeks during stress tests,
>and although it spat out lots of messages, it never hung (at least on
>recent 2.4 kernels; very early 2.4 kernels were a real pain with this card).
>
>>The documentation states that this message should be very rare, not
>>something that spools into the logs at this rate.
>
>Perhaps you have a mix of small and large frames that makes the driver
>constantly change the FIFO size, and this part is not handled properly?
>
>Willy



* Re: Linux 2.6.9 Adaptec 4 Port Starfire Sickness
  2005-04-03  4:41 Linux 2.6.9 Adaptec 4 Port Starfire Sickness jmerkey
  2005-04-03  5:47 ` Willy Tarreau
@ 2005-04-03  7:26 ` Jeff Garzik
  2005-04-03  7:07   ` jmerkey
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2005-04-03  7:26 UTC (permalink / raw)
  To: jmerkey; +Cc: linux-kernel

jmerkey wrote:
> With Linux 2.6.9 handling 192 MB/s of network load, and protocol-splitting
> drivers routing packets out of a 2.6.9 device at the full 100 Mb/s
> (12.5 MB/s) simultaneously over 4 ports, the Adaptec Starfire driver goes
> into constant Tx FIFO reconfiguration mode. It keeps resetting the Tx FIFO
> window and generates a deluge of messages such as:
> 
> ethX:  PCI bus congestion, resetting Tx FIFO window to X bytes
> 
> which pour into the system log file at a rate of a dozen per minute. After
> 3-4 days of this, the PCI bus locks up completely and hangs the system.
> I need a config option that allows disabling this feature on the Starfire.
> At very high bus loading rates with protocol splitting and routing, the
> Starfire card will completely lock the bus after 3-4 days of constant Tx
> FIFO reconfiguration.

The feature doesn't need disabling; just modify the driver to stop the 
flapping.
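One generic way to "stop the flapping" is hysteresis: the threshold only rises after several congestion reports and only drops back after a long quiet period. This is a sketch under assumed parameters, not the change actually made to the starfire driver; `struct tx_hyst`, `tx_hyst_tick()` and the tick-based model are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hysteresis state for a Tx FIFO threshold. */
struct tx_hyst {
    unsigned int threshold;   /* current threshold (arbitrary units) */
    unsigned int min, max;    /* allowed range */
    unsigned int raise_after; /* congestion events needed to raise */
    unsigned int lower_after; /* consecutive quiet ticks needed to lower */
    unsigned int congest_cnt; /* congestion events since last change */
    unsigned int quiet_cnt;   /* consecutive quiet ticks */
};

/* Feed one sample: `congested` is true if the adapter reported a
 * near-underrun during this tick. Returns true if the threshold changed. */
static bool tx_hyst_tick(struct tx_hyst *h, bool congested)
{
    assert(h->raise_after > 0 && h->lower_after > 0);
    assert(h->min <= h->threshold && h->threshold <= h->max);
    if (congested) {
        h->quiet_cnt = 0;                   /* a report breaks the quiet run */
        if (h->congest_cnt < h->raise_after)
            h->congest_cnt++;               /* clamp so it cannot overflow */
        if (h->congest_cnt >= h->raise_after && h->threshold < h->max) {
            h->threshold++;
            h->congest_cnt = 0;
            return true;
        }
        return false;
    }
    if (h->quiet_cnt < h->lower_after)
        h->quiet_cnt++;
    if (h->quiet_cnt >= h->lower_after && h->threshold > h->min) {
        h->threshold--;
        h->quiet_cnt = 0;
        h->congest_cnt = 0;
        return true;
    }
    return false;
}
```

Because raising needs several reports and lowering needs a long quiet run, alternating busy and idle ticks nudge the threshold upward a few times instead of reprogramming it on every tick.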

	Jeff





* Re: Linux 2.6.9 Adaptec 4 Port Starfire Sickness
  2005-04-03  6:58   ` jmerkey
@ 2005-04-03  7:38     ` Willy Tarreau
  2005-04-03  7:21       ` jmerkey
  0 siblings, 1 reply; 7+ messages in thread
From: Willy Tarreau @ 2005-04-03  7:38 UTC (permalink / raw)
  To: jmerkey; +Cc: linux-kernel

On Sat, Apr 02, 2005 at 11:58:44PM -0700, jmerkey wrote:
> It works fine with the Intel Dual Port Pro-1000 MT adapters, without these
> problems.

But unless I'm mistaken, there's no PCI bridge on the Intel board, and the
two ports may well share the same IRQ. That's why I suggested trying a
4-port Sun QFE or something more similar to the Starfire.

> My test scenarios use jumbo frames as well. My guess is that PCI bus
> contention is high because of the disk I/O bandwidth, and that this
> creates conditions the adapter does not normally see.

As I said, I have been saturating this card for weeks during stress tests,
and although it spat out lots of messages, it never hung (at least on
recent 2.4 kernels; very early 2.4 kernels were a real pain with this card).

> The documentation states that this message should be very rare, not
> something that spools into the logs at this rate.

Perhaps you have a mix of small and large frames that makes the driver
constantly change the FIFO size, and this part is not handled properly?
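The mechanism behind this hypothesis can be illustrated with a deliberately naive policy that picks a FIFO threshold bucket from the size of each frame. This is not how the starfire driver chooses its threshold; `naive_bucket()`, `count_reprograms()` and the size cut-offs are hypothetical, chosen only to show how mixed traffic can turn such a policy into constant reprogramming.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical naive policy: map a frame length to a threshold bucket.
 * 0 = small frames, 1 = standard frames, 2 = jumbo frames. */
static unsigned int naive_bucket(unsigned int frame_len)
{
    if (frame_len > 1514)
        return 2u;
    return frame_len > 256 ? 1u : 0u;
}

/* Count how often the naive policy would reprogram the FIFO over a
 * trace of frame lengths (one reprogram per bucket change). */
static size_t count_reprograms(const unsigned int *lens, size_t n)
{
    size_t changes = 0;
    assert(lens != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (naive_bucket(lens[i]) != naive_bucket(lens[i - 1]))
            changes++;
    return changes;
}
```

With uniform full-size frames the policy never reprograms; alternating 64-byte and 9000-byte jumbo frames reprogram it on every frame.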

Willy

> See http://www.ibiblio.org/mdw/HOWTO/Ethernet-HOWTO-8.html
> 
> Jeff


end of thread

Thread overview: 7+ messages
2005-04-03  4:41 Linux 2.6.9 Adaptec 4 Port Starfire Sickness jmerkey
2005-04-03  5:47 ` Willy Tarreau
2005-04-03  6:58   ` jmerkey
2005-04-03  7:38     ` Willy Tarreau
2005-04-03  7:21       ` jmerkey
2005-04-03  7:26 ` Jeff Garzik
2005-04-03  7:07   ` jmerkey
