* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
[not found] ` <86802c440805071130m62c1f4edydb3316dac4a2aba2@mail.gmail.com>
@ 2008-05-07 21:15 ` David Miller
2008-05-09 18:32 ` Jesper Krogh
0 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-07 21:15 UTC (permalink / raw)
To: yhlu.kernel; +Cc: jesper, linux-kernel, netdev
From: "Yinghai Lu" <yhlu.kernel@gmail.com>
Date: Wed, 7 May 2008 11:30:18 -0700
> On Wed, May 7, 2008 at 11:23 AM, Jesper Krogh <jesper@krogh.cc> wrote:
> > Hi.
> >
> > I get errors like this after a few minutes of traffic on a Sun Neptune 10g
> > ethernet card. (with nice 500MB/s throughput).
> >
> > Then the server seems too busy with something, so it doesn't even respond
> > to a serial terminal login.
> >
> > May 7 16:16:33 hest kernel: [ 166.948958] niu: niu_get_parent:
> > platform_type[1] port[3]
> > May 7 16:16:33 hest kernel: [ 166.949366] niu: niu_get_and_validate_port:
> > port[3] num_ports[2]
> > May 7 16:16:33 hest kernel: [ 166.949886] niu: niu_put_parent: port[3]
> > .. bootup ends here ..
> > May 7 17:13:54 hest kernel: [ 3670.128178] niu 0000:84:00.0: niu: eth4:
> > Transmit timed out, resetting
> > May 7 17:14:04 hest kernel: [ 3680.108614] niu 0000:84:00.0: niu: eth4:
> > Transmit timed out, resetting
> > May 7 17:14:14 hest kernel: [ 3690.093089] niu 0000:84:00.0: niu: eth4:
> > Transmit timed out, resetting
> > May 7 17:14:19 hest kernel: [ 3695.079254] niu 0000:84:00.0: niu: eth4:
> > Transmit timed out, resetting
> > May 7 17:14:24 hest kernel: [ 3700.073525] niu 0000:84:00.0: niu: eth4:
> > Transmit timed out, resetting
> > May 7 17:14:29 hest kernel: [ 3705.063744] niu 0000:84:00.0: niu: eth4:
> > Transmit timed out, resetting
> > May 7 17:14:34 hest kernel: [ 3710.049918] niu 0000:84:00.0: niu: eth4:
> > Transmit timed out, resetting
> >
> >
> > Any suggestions?
> >
> > The system is an Ubuntu Hardy (2.6.24-17-server) amd64.
>
> can you try 2.6.25 or current git?
Also, please always CC: netdev@vger.kernel.org on networking reports.
Thank you.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-07 21:15 ` NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24) David Miller
@ 2008-05-09 18:32 ` Jesper Krogh
2008-05-09 21:32 ` David Miller
2008-05-10 11:01 ` Jesper Krogh
0 siblings, 2 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-05-09 18:32 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> From: "Yinghai Lu" <yhlu.kernel@gmail.com>
> Date: Wed, 7 May 2008 11:30:18 -0700
>
>> On Wed, May 7, 2008 at 11:23 AM, Jesper Krogh <jesper@krogh.cc> wrote:
>>> Hi.
>>>
>>> I get errors like this after a few minutes of traffic on a Sun Neptune 10g
>>> ethernet card. (with nice 500MB/s throughput).
>>>
>>> Then the server seems too busy with something, so it doesn't even respond
>>> to a serial terminal login.
>>>
>>> May 7 16:16:33 hest kernel: [ 166.948958] niu: niu_get_parent:
>>> platform_type[1] port[3]
>>> May 7 16:16:33 hest kernel: [ 166.949366] niu: niu_get_and_validate_port:
>>> port[3] num_ports[2]
>>> May 7 16:16:33 hest kernel: [ 166.949886] niu: niu_put_parent: port[3]
>>> .. bootup ends here ..
>>> May 7 17:13:54 hest kernel: [ 3670.128178] niu 0000:84:00.0: niu: eth4:
>>> Transmit timed out, resetting
>>> May 7 17:14:04 hest kernel: [ 3680.108614] niu 0000:84:00.0: niu: eth4:
>>> Transmit timed out, resetting
>>> May 7 17:14:14 hest kernel: [ 3690.093089] niu 0000:84:00.0: niu: eth4:
>>> Transmit timed out, resetting
>>> May 7 17:14:19 hest kernel: [ 3695.079254] niu 0000:84:00.0: niu: eth4:
>>> Transmit timed out, resetting
>>> May 7 17:14:24 hest kernel: [ 3700.073525] niu 0000:84:00.0: niu: eth4:
>>> Transmit timed out, resetting
>>> May 7 17:14:29 hest kernel: [ 3705.063744] niu 0000:84:00.0: niu: eth4:
>>> Transmit timed out, resetting
>>> May 7 17:14:34 hest kernel: [ 3710.049918] niu 0000:84:00.0: niu: eth4:
>>> Transmit timed out, resetting
>>>
>>>
>>> Any suggestions?
>>>
>>> The system is an Ubuntu Hardy (2.6.24-17-server) amd64.
>> can you try 2.6.25 or current git?
>
> Also, please always CC: netdev@vger.kernel.org on networking reports.
Yes. It is reproducible under 2.6.25.2 when the load goes up. (It worked
excellently in the <100MB/s range for several hours.)
When it works, I don't seem to be able to get it past 500MB/s.
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 18:32 ` Jesper Krogh
@ 2008-05-09 21:32 ` David Miller
2008-05-09 21:59 ` Jesper Krogh
2008-05-10 11:01 ` Jesper Krogh
1 sibling, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-09 21:32 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev
From: Jesper Krogh <jesper@krogh.cc>
Date: Fri, 09 May 2008 20:32:53 +0200
> When it works I doesnt seem to be able to get it pass 500MB/s.
With this card you really need multiple cpus and multiple threads
sending data through the card in order to fill the 10Gb pipe.
Single connections will not fill the pipe.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 21:32 ` David Miller
@ 2008-05-09 21:59 ` Jesper Krogh
2008-05-09 22:07 ` David Miller
` (2 more replies)
0 siblings, 3 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-05-09 21:59 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Fri, 09 May 2008 20:32:53 +0200
>
>> When it works I doesnt seem to be able to get it pass 500MB/s.
>
> With this card you really need multiple cpus and multiple threads
> sending data through the card in order to fill the 10Gb pipe.
>
> Single connections will not fill the pipe.
The server is a Sun X4600 with 8 x dual-core CPUs, set up with 64
NFS threads. The other end of the fiber goes into a switch with gigabit
ports connected to 48 dual-core CPUs. The test was done by running dd on a
4.5GB file from the server to /dev/null on the clients.
The number of context switches seems enormous, sometimes over 120,000.
When transmitting around the same amount of data (4 x gigabit bonded with
802.3ad, 4 x 110MB/s), the number of context switches only reaches
3,000-4,000. I have no idea whether this is relevant.
Should this setup not be able to fill the pipe?
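For comparing context-switch figures like the ones above between the 10g and the bonded setup, a minimal sketch follows. It assumes the numbers are per-second rates (as vmstat would report them) and that the kernel exposes the cumulative "ctxt" counter in /proc/stat; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the cumulative context-switch counter from a /proc/stat-style file.
# The optional file argument exists only so the parser can be tried on a copy.
ctxt_count() {
    awk '$1 == "ctxt" { print $2 }' "${1:-/proc/stat}"
}

# Average context switches per second over an interval (default 5 seconds).
ctxt_rate() {
    local secs=${1:-5} a b
    a=$(ctxt_count)
    sleep "$secs"
    b=$(ctxt_count)
    echo $(( (b - a) / secs ))
}

# Usage, while the dd test is running:
#   ctxt_rate 10
```

Running it once per setup under the same load gives two directly comparable numbers.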
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 21:59 ` Jesper Krogh
@ 2008-05-09 22:07 ` David Miller
2008-05-09 22:13 ` Jesper Krogh
2008-05-09 22:09 ` Matheos Worku
2008-05-09 22:20 ` Rick Jones
2 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-09 22:07 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev
From: Jesper Krogh <jesper@krogh.cc>
Date: Fri, 09 May 2008 23:59:18 +0200
> David Miller wrote:
> > From: Jesper Krogh <jesper@krogh.cc>
> > Date: Fri, 09 May 2008 20:32:53 +0200
> >
> >> When it works I doesnt seem to be able to get it pass 500MB/s.
> >
> > With this card you really need multiple cpus and multiple threads
> > sending data through the card in order to fill the 10Gb pipe.
> >
> > Single connections will not fill the pipe.
>
> The server is a Sun X4600 with 8 x dual-core CPU's, setup with 64
> NFS-threads. The other end of the fiber goes into a switch with gigabit
> ports connected to 48 dual-core cpus. The test was done doing a dd on a
> 4.5GB file from the server to /dev/null on the clients.
A single file transfer will not fill the pipe using this card, no
matter how many cpus you have :-)
The card is designed such that multiple parallel data streams over
different connections result in the greatest throughput.
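A minimal sketch of what "multiple parallel data streams over different connections" could look like in a test, assuming netperf is installed on both ends and that each netperf instance opens its own data connection. The helper name and the host address are placeholders, not something from this thread.

```shell
#!/usr/bin/env bash
# Run N copies of a command in parallel and wait for all of them to finish.
# Hypothetical helper for illustration only.
run_parallel() {
    local n=$1; shift
    local i
    for ((i = 0; i < n; i++)); do
        "$@" &
    done
    wait
}

# Usage (placeholder address; substitute one of your clients):
#   run_parallel 8 netperf -H 192.0.2.10 -t TCP_STREAM -l 30
```

Spreading the streams over several remote hosts, rather than one, is closer to the NFS setup described in this thread.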
> Should this setup not be able to fill the pipe?
Nope, not with this card.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 21:59 ` Jesper Krogh
2008-05-09 22:07 ` David Miller
@ 2008-05-09 22:09 ` Matheos Worku
2008-05-09 22:15 ` Jesper Krogh
2008-05-09 22:20 ` Rick Jones
2 siblings, 1 reply; 52+ messages in thread
From: Matheos Worku @ 2008-05-09 22:09 UTC (permalink / raw)
To: Jesper Krogh; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> David Miller wrote:
>> From: Jesper Krogh <jesper@krogh.cc>
>> Date: Fri, 09 May 2008 20:32:53 +0200
>>
>>> When it works I doesnt seem to be able to get it pass 500MB/s.
>>
>> With this card you really need multiple cpus and multiple threads
>> sending data through the card in order to fill the 10Gb pipe.
>>
>> Single connections will not fill the pipe.
>
> The server is a Sun X4600 with 8 x dual-core CPU's, setup with 64
> NFS-threads. The other end of the fiber goes into a switch with gigabit
> ports connected to 48 dual-core cpus. The test was done doing a dd on a
> 4.5GB file from the server to /dev/null on the clients.
Are you doing a TX or RX (with respect to the 10G if)?
-- matheos
>
> The number of contextswitches seems enourmous.. over 120.000 sometimes.
> When transmitting around the same amount of data (4xgigabit bonded with
> 802.3ad) 4x110MB/s the amount of contextswitches only reaches 3-4.000. I
> have no idea if this has any relevance.
>
> Should this setup not be able to fill the pipe?
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:07 ` David Miller
@ 2008-05-09 22:13 ` Jesper Krogh
0 siblings, 0 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-05-09 22:13 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Fri, 09 May 2008 23:59:18 +0200
>
>> David Miller wrote:
>>> From: Jesper Krogh <jesper@krogh.cc>
>>> Date: Fri, 09 May 2008 20:32:53 +0200
>>>
>>>> When it works I doesnt seem to be able to get it pass 500MB/s.
>>> With this card you really need multiple cpus and multiple threads
>>> sending data through the card in order to fill the 10Gb pipe.
>>>
>>> Single connections will not fill the pipe.
>> The server is a Sun X4600 with 8 x dual-core CPU's, setup with 64
>> NFS-threads. The other end of the fiber goes into a switch with gigabit
>> ports connected to 48 dual-core cpus. The test was done doing a dd on a
>> 4.5GB file from the server to /dev/null on the clients.
>
> A single file transfer will not fill the pipe using this card, no
> matter how many cpus you have :-)
I do run 20+ dd processes on each of the clients at the same time. Using a
single client I should be able to get it past 1 gigabit. (The 4.5 GB
file fits in the server's 32GB of memory, so the server disk isn't
the bottleneck either.)
Further investigation shows that:
# ethtool -k eth4
Offload parameters for eth4:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
Does that seem correct?
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:09 ` Matheos Worku
@ 2008-05-09 22:15 ` Jesper Krogh
2008-05-09 22:36 ` Matheos Worku
0 siblings, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-09 22:15 UTC (permalink / raw)
To: Matheos Worku; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Matheos Worku wrote:
> Jesper Krogh wrote:
>> David Miller wrote:
>>> From: Jesper Krogh <jesper@krogh.cc>
>>> Date: Fri, 09 May 2008 20:32:53 +0200
>>>
>>>> When it works I doesnt seem to be able to get it pass 500MB/s.
>>>
>>> With this card you really need multiple cpus and multiple threads
>>> sending data through the card in order to fill the 10Gb pipe.
>>>
>>> Single connections will not fill the pipe.
>>
>> The server is a Sun X4600 with 8 x dual-core CPU's, setup with 64
>> NFS-threads. The other end of the fiber goes into a switch with gigabit
>> ports connected to 48 dual-core cpus. The test was done doing a dd on a
>> 4.5GB file from the server to /dev/null on the clients.
>
> Are you doing a TX or RX (with respect to the 10G if)?
That's a transmit, from the NFS server to the clients.
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 21:59 ` Jesper Krogh
2008-05-09 22:07 ` David Miller
2008-05-09 22:09 ` Matheos Worku
@ 2008-05-09 22:20 ` Rick Jones
2008-05-09 22:48 ` Jesper Krogh
2 siblings, 1 reply; 52+ messages in thread
From: Rick Jones @ 2008-05-09 22:20 UTC (permalink / raw)
To: Jesper Krogh; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> David Miller wrote:
>
>> From: Jesper Krogh <jesper@krogh.cc>
>> Date: Fri, 09 May 2008 20:32:53 +0200
>>
>>> When it works I doesnt seem to be able to get it pass 500MB/s.
>>
>>
>> With this card you really need multiple cpus and multiple threads
>> sending data through the card in order to fill the 10Gb pipe.
>>
>> Single connections will not fill the pipe.
>
>
> The server is a Sun X4600 with 8 x dual-core CPU's, setup with 64
> NFS-threads. The other end of the fiber goes into a switch with gigabit
> ports connected to 48 dual-core cpus. The test was done doing a dd on a
> 4.5GB file from the server to /dev/null on the clients.
>
> The number of contextswitches seems enourmous.. over 120.000 sometimes.
> When transmitting around the same amount of data (4xgigabit bonded with
> 802.3ad) 4x110MB/s the amount of contextswitches only reaches 3-4.000. I
> have no idea if this has any relevance.
>
> Should this setup not be able to fill the pipe?
Into which slot was the Neptune inserted? (sure will be nice to have
Alex Chiang's pci slot id patch in mainline one of these days :)
Is that slot x4, x8, x16?
To which cpu(s) were the neptune's interrupts assigned? (grep <ethN>
/proc/interrupts)
Is the irqbalanced running?
Were any of the 16 CPUs in the system saturated during the test? (top
with all CPUs displayed)
Do you have/know of any diagrams showing the way the I/O slots are wired
to the rest of the system?
Have you tried any tests without any filesystem involvement? A script
like this (might need post-mailer fixup) might be interesting to run:
s2:~ # cat runemomniagg.sh
length=30
confidence="-i 30,30"
# comment the following to get proper aggregates
# rather than quick-and dirty
confidence=""
#edit these to match your clients
control_host[1]=192.168.2.205
control_host[2]=192.168.2.206
control_host[3]=192.168.2.207
control_host[4]=192.168.2.208
control_host[5]=192.168.2.209
control_host[6]=192.168.2.210
control_host[7]=192.168.2.211
control_host[8]=192.168.2.212
control_host[9]=192.168.2.201
control_host[10]=192.168.2.203
control_host[11]=192.168.2.204
control_host[12]=192.168.2.202
concurrent_sessions="1 2 3 4 5 6 7 8 9 10 11 12"
HDR="-P 1"
# -O means "human" -o means "csv"
CSV="-o"
#CSV="-O"
echo text you supply about interrupts
echo text you supply about the systems
uname -a
echo TCP_STREAM to multiple systems
for i in $concurrent_sessions; do j=1; echo $i concurrent streams;
while [ $j -le $i ]; do netperf $HDR -t omni -c -C -H ${control_host[$j]} -l $length $confidence -- $CSV -m 64K & HDR="-P 0";j=`expr $j + 1`; done; wait; done
echo TCP_MAERTS to multiple systems
HDR="-P 1"
for i in $concurrent_sessions; do j=1; echo $i concurrent streams;
while [ $j -le $i ]; do netperf $HDR -t omni -c -C -H ${control_host[$j]} -l $length $confidence -- $CSV -M ,64K & HDR="-P 0";j=`expr $j + 1`; done; wait; done
echo bidir TCP_RR MEGABITS to multiple systems
HDR="-P 1"
for i in $concurrent_sessions; do j=1; echo $i concurrent streams;
while [ $j -le $i ]; do netperf $HDR -t omni -f m -c -C -H ${control_host[$j]} -l $length $confidence -- $CSV -s 1M -S 1M -r 64K -b 12 & HDR="-P 0";j=`expr $j + 1`; done; wait; done
for burst in 0 1 4 16
do
echo TCP_RR to multiple systems burst of $burst
HDR="-P 1"
for i in $concurrent_sessions; do j=1; echo $i concurrent streams;
while [ $j -le $i ]; do netperf $HDR -t omni -c -C -H ${control_host[$j]} -l $length $confidence -- $CSV -r 1 -b $burst -D & HDR="-P 0";j=`expr $j + 1`; done; wait; done
done
cat /proc/meminfo
cat /proc/cpuinfo
Which will run netperf "omni" tests and emit a _LOT_ of information.
The concurrent tests run "properly" will be ~15 minutes a data point.
The output is probably best viewed in a spreadsheet program. It is
possible to limit what netperf will emit by placing the names of the
output items of interest into a file you pass in the test specific -o or
-O options. The netperf omni tests are in the top of trunk at:
http://www.netperf.org/svn/netperf2/trunk/
via subversion. The script presumes your ./configure was a superset of:
./configure --enable-omni --enable-burst
You would want it installed on both your SUT and at least a subset of
your LG's.
rick jones
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:15 ` Jesper Krogh
@ 2008-05-09 22:36 ` Matheos Worku
2008-05-09 22:43 ` Matheos Worku
2008-05-09 22:45 ` David Miller
0 siblings, 2 replies; 52+ messages in thread
From: Matheos Worku @ 2008-05-09 22:36 UTC (permalink / raw)
To: Jesper Krogh; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> Matheos Worku wrote:
>
>> Jesper Krogh wrote:
>>
>>> David Miller wrote:
>>>
>>>> From: Jesper Krogh <jesper@krogh.cc>
>>>> Date: Fri, 09 May 2008 20:32:53 +0200
>>>>
>>>>> When it works I doesnt seem to be able to get it pass 500MB/s.
>>>>
>>>>
>>>> With this card you really need multiple cpus and multiple threads
>>>> sending data through the card in order to fill the 10Gb pipe.
>>>>
>>>> Single connections will not fill the pipe.
>>>
>>>
>>> The server is a Sun X4600 with 8 x dual-core CPU's, setup with 64
>>> NFS-threads. The other end of the fiber goes into a switch with gigabit
>>> ports connected to 48 dual-core cpus. The test was done doing a dd on a
>>> 4.5GB file from the server to /dev/null on the clients.
>>
>>
>> Are you doing a TX or RX (with respect to the 10G if)?
>
>
> Thats a transmit.. from the NFS server to the clients.
I have observed that TX throughput degrades (and CPU utilization
increases) as the number of connections grows, when the system has more
than 4 CPUs. I don't think it is related to the driver (or HW). A while ago I
prototyped a driver which drops all UDP TX packets, and the throughput
degradation (and CPU utilization increase) still occurred even though the
driver was not doing much work. LSO/TSO seems to help with the
situation though. With LSO disabled, I have observed the issue on
several 10G nics.
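Since the ethtool -k output posted earlier in the thread shows "tcp segmentation offload: off", a small sketch for checking and toggling TSO may be useful here. It assumes the driver and kernel in use actually support TSO on this interface, which this thread does not confirm; the function name is made up, and eth4 is the interface name from the report.

```shell
#!/usr/bin/env bash
# Read `ethtool -k` output on stdin and print the TSO state.
# Matches both the "tcp segmentation offload" spelling shown in this thread
# and the hyphenated "tcp-segmentation-offload" spelling of newer ethtool.
tso_state() {
    awk -F': *' '/tcp[ -]segmentation[ -]offload:/ { print $2; exit }'
}

# Usage (changing the setting needs root and driver support):
#   ethtool -k eth4 | tso_state
#   ethtool -K eth4 tso on
#   ethtool -k eth4 | tso_state
```

If the second command fails with "Operation not supported", the driver in that kernel does not offer TSO and the comparison cannot be made that way.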
Regards
Matheos
>
> Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:36 ` Matheos Worku
@ 2008-05-09 22:43 ` Matheos Worku
2008-05-09 22:46 ` David Miller
2008-05-09 23:10 ` Jesper Krogh
2008-05-09 22:45 ` David Miller
1 sibling, 2 replies; 52+ messages in thread
From: Matheos Worku @ 2008-05-09 22:43 UTC (permalink / raw)
To: Matheos Worku
Cc: Jesper Krogh, David Miller, yhlu.kernel, linux-kernel, netdev
Matheos Worku wrote:
> Jesper Krogh wrote:
>
>> Matheos Worku wrote:
>>
>>> Jesper Krogh wrote:
>>>
>>>> David Miller wrote:
>>>>
>>>>> From: Jesper Krogh <jesper@krogh.cc>
>>>>> Date: Fri, 09 May 2008 20:32:53 +0200
>>>>>
>>>>>> When it works I doesnt seem to be able to get it pass 500MB/s.
>>>>>
>>>>>
>>>>>
>>>>> With this card you really need multiple cpus and multiple threads
>>>>> sending data through the card in order to fill the 10Gb pipe.
>>>>>
>>>>> Single connections will not fill the pipe.
>>>>
>>>>
>>>>
>>>> The server is a Sun X4600 with 8 x dual-core CPU's, setup with 64
>>>> NFS-threads. The other end of the fiber goes into a switch with
>>>> gigabit
>>>> ports connected to 48 dual-core cpus. The test was done doing a dd
>>>> on a
>>>> 4.5GB file from the server to /dev/null on the clients.
>>>
>>>
>>>
>>> Are you doing a TX or RX (with respect to the 10G if)?
>>
>>
>>
>> Thats a transmit.. from the NFS server to the clients.
>
Is MSI/MSI-X enabled in the kernel? I have noticed that it was not
enabled on Gutsy-SPARC.
--Matheos
>
> I have observed TX throughput degradation (and increased CPU
> utilization) occurs with increased # of connections, when CPU count >
> 4 CPUs. I don't think it is related to the driver (or HW). A while
> ago I prototyped a driver which drops all UDP TX packets and the
> throughput degradation (and CPU utilization increase) behavior
> occurred though the driver was not doing much work. LSO/TSO seems to
> help with the situation though. With LSO disabled, I have observed the
> issue on several 10G nics.
>
> Regards
> Matheos
>
>
>>
>> Jesper
>
>
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:36 ` Matheos Worku
2008-05-09 22:43 ` Matheos Worku
@ 2008-05-09 22:45 ` David Miller
2008-05-22 16:32 ` Jesper Krogh
1 sibling, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-09 22:45 UTC (permalink / raw)
To: Matheos.Worku; +Cc: jesper, yhlu.kernel, linux-kernel, netdev
From: Matheos Worku <Matheos.Worku@Sun.COM>
Date: Fri, 09 May 2008 15:36:17 -0700
> I have observed TX throughput degradation (and increased CPU
> utilization) occurs with increased # of connections, when CPU count > 4
> CPUs. I don't think it is related to the driver (or HW).
All transmits through a device are fully serialized currently,
it's a known problem and something we plan to fix.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:43 ` Matheos Worku
@ 2008-05-09 22:46 ` David Miller
2008-05-09 23:10 ` Jesper Krogh
1 sibling, 0 replies; 52+ messages in thread
From: David Miller @ 2008-05-09 22:46 UTC (permalink / raw)
To: Matheos.Worku; +Cc: jesper, yhlu.kernel, linux-kernel, netdev
From: Matheos Worku <Matheos.Worku@Sun.COM>
Date: Fri, 09 May 2008 15:43:39 -0700
> Is MSI/MSI-X enabled on the kernel? I have noticed that it was not
> enabled on Gutsy-SPARC?
He's on an x86 system.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:20 ` Rick Jones
@ 2008-05-09 22:48 ` Jesper Krogh
2008-05-09 23:03 ` Rick Jones
` (2 more replies)
0 siblings, 3 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-05-09 22:48 UTC (permalink / raw)
To: Rick Jones; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Rick Jones wrote:
>> The number of contextswitches seems enourmous.. over 120.000 sometimes.
>> When transmitting around the same amount of data (4xgigabit bonded with
>> 802.3ad) 4x110MB/s the amount of contextswitches only reaches 3-4.000. I
>> have no idea if this has any relevance.
>>
>> Should this setup not be able to fill the pipe?
>
> Into which slot was the Neptune inserted? (sure will be nice to have
> Alex Chiang's pci slot id patch in mainline one of these days :)
>
> Is that slot x4, x8, x16?
I can find out exactly on Monday. But shouldn't x4 be enough anyway?
Wikipedia says 250MB/s per lane. And no slot is less than x4, so I
thought that it didn't matter to me.
> To which cpu(s) were the neptune's interrupts assigned? (grep <ethN>
> /proc/interrupts)
Several
> Is the irqbalanced running?
Yes. I started by not running it, but saw that cpu 0 was saturated by
ksoftirqd
> Were any of the 16 CPUs in the system saturated during the test? (top
> with all CPUs displayed)
Yes..
Cpu12 : 34.2%us, 5.9%sy, 0.0%ni, 0.0%id, 14.5%wa, 1.0%hi, 44.4%si,
0.0%st
All others have idle time.
> Do you have/know of any diagrams showing the way the I/O slots are wired
> to the rest of the system?
I haven't dug into that. Probably naively, I expected the hardware
guys at Sun to have taken care of that when I bought the recommended
10g card for their own server.
> Have you tried any tests without any filesystem involvement?
No, not yet. I'll try that.
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:48 ` Jesper Krogh
@ 2008-05-09 23:03 ` Rick Jones
2008-05-09 23:13 ` Jesper Krogh
2008-05-09 23:08 ` David Dillow
2008-05-10 2:20 ` Bill Fink
2 siblings, 1 reply; 52+ messages in thread
From: Rick Jones @ 2008-05-09 23:03 UTC (permalink / raw)
To: Jesper Krogh; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
>> To which cpu(s) were the neptune's interrupts assigned? (grep <ethN>
>> /proc/interrupts)
>
> Several
Which ones?-) I suspect that since it is several, that at least answers
the MSI-X question, right? Or is it possible for a single device to
have several MSI interrupts?
>> Is the irqbalanced running?
>
>
> Yes. I started by not running it, but saw that cpu 0 was saturated by
> ksoftirqd
>
>> Were any of the 16 CPUs in the system saturated during the test? (top
>> with all CPUs displayed)
>
>
> Yes..
> Cpu12 : 34.2%us, 5.9%sy, 0.0%ni, 0.0%id, 14.5%wa, 1.0%hi, 44.4%si,
> 0.0%st
David mentioned TX serialization - perhaps that is where TX completions
are happening?
> All others have idle time.
>
>> Do you have/know of any diagrams showing the way the I/O slots are
>> wired to the rest of the system?
>
>
> I havent dug into that. Probably naive, as I as, I expected som hardware
> guys as Sun to have taken care of that, when I bought the recommended
> 10g card for their own server.
I am very clueless as to the design considerations under which Neptune
was designed, but given that Neptune is based (IIRC) on the same "in
core" 10G stuff as Sun talks about being in the UltraSPARC-T2
processors, there is the possibility the design center was somewhat
different than what one sees in an 8P Opteron system, whether it carries
a Sun logo or not. From what I've learned thus far, 8P Opteron systems
are "interesting" beasts with what I suspect are rather different
constraints and behaviours than an UltraSPARC-T[12] system.
What I was asking about was more a block diagram showing via which
sockets the given I/O slots were connected.
>> Have you tried any tests without any filesystem involvement?
>
> no not yet. I'll try that.
If you decide to go the netperf route, feel free to contact me directly
with netperf questions, or via the netperf-talk mailing list on
netperf.org.
rick jones
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:48 ` Jesper Krogh
2008-05-09 23:03 ` Rick Jones
@ 2008-05-09 23:08 ` David Dillow
2008-05-10 6:22 ` Jesper Krogh
2008-05-10 2:20 ` Bill Fink
2 siblings, 1 reply; 52+ messages in thread
From: David Dillow @ 2008-05-09 23:08 UTC (permalink / raw)
To: Jesper Krogh; +Cc: Rick Jones, David Miller, yhlu.kernel, linux-kernel, netdev
On Sat, 2008-05-10 at 00:48 +0200, Jesper Krogh wrote:
> Rick Jones wrote:
> >> The number of contextswitches seems enourmous.. over 120.000 sometimes.
> >> When transmitting around the same amount of data (4xgigabit bonded with
> >> 802.3ad) 4x110MB/s the amount of contextswitches only reaches 3-4.000. I
> >> have no idea if this has any relevance.
> >>
> >> Should this setup not be able to fill the pipe?
> >
> > Into which slot was the Neptune inserted? (sure will be nice to have
> > Alex Chiang's pci slot id patch in mainline one of these days :)
> >
> > Is that slot x4, x8, x16?
>
> I can find out excactly .. on monday. But shouldn't x4 be enough anyway?
> wikipedia says 250MB/s pr. lane. And no slots is less than x4, so I
> thought that it didn't matter to me.
lspci -vvv on the slot can tell you what has been negotiated. Also, you
can figure the effective PCIe bandwidth using ~200 MB/s as a rule of
thumb, given the various overheads on the bus. So an x4 won't fully load
up a 10Gbps link.
At least, that's what we're seeing in x8 slots using Infiniband cards,
so your mileage may vary.
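A rough sketch of the check described above: pull the negotiated link width out of lspci output and apply the ~200 MB/s per lane rule of thumb. It assumes lspci prints a "LnkSta: ... Width xN" line, which depends on the pciutils version; the function names are made up, and 84:00.0 is the device address from the original report.

```shell
#!/usr/bin/env bash
# Read `lspci -vvv` output on stdin and print the negotiated width (e.g. x8).
# Assumes a "LnkSta: Speed ..., Width xN" line; older pciutils may differ.
negotiated_width() {
    sed -n 's/.*LnkSta:.*Width \(x[0-9][0-9]*\).*/\1/p'
}

# Effective bandwidth in MB/s using the ~200 MB/s per lane rule of thumb.
rule_of_thumb_mbps() {
    local lanes=${1#x}
    echo $(( lanes * 200 ))
}

# Usage:
#   w=$(lspci -vvv -s 84:00.0 | negotiated_width)
#   rule_of_thumb_mbps "$w"
```

By this estimate an x4 link gives about 800 MB/s, short of the roughly 1250 MB/s a 10Gbps link can carry, while x8 gives about 1600 MB/s.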
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:43 ` Matheos Worku
2008-05-09 22:46 ` David Miller
@ 2008-05-09 23:10 ` Jesper Krogh
2008-05-09 23:21 ` Matheos Worku
1 sibling, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-09 23:10 UTC (permalink / raw)
To: Matheos Worku; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Matheos Worku wrote:
>>> Thats a transmit.. from the NFS server to the clients.
>>
> Is MSI/MSI-X enabled on the kernel? I have noticed that it was not
> enabled on Gutsy-SPARC?
# grep MSI .config
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
Should that be ok?
--
Jesper Krogh
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 23:03 ` Rick Jones
@ 2008-05-09 23:13 ` Jesper Krogh
2008-05-09 23:33 ` Rick Jones
0 siblings, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-09 23:13 UTC (permalink / raw)
To: Rick Jones; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Rick Jones wrote:
>>> To which cpu(s) were the neptune's interrupts assigned? (grep <ethN>
>>> /proc/interrupts)
>>
>> Several
>
> Which ones?-) I suspect that since it is several that at least answers
> the MSI_X question right? Or is it possible to have a single device
> have several MSI interrupts?
The table is rather large.
http://rafb.net/p/a9zfFA59.html
>>> Have you tried any tests without any filesystem involvement?
>>
>> no not yet. I'll try that.
>
> If you decide to go the netperf route, feel free to contact me directly
> with netperf questions, or via the netperf-talk mailling list on
> netperf.org.
Ok.
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 23:10 ` Jesper Krogh
@ 2008-05-09 23:21 ` Matheos Worku
0 siblings, 0 replies; 52+ messages in thread
From: Matheos Worku @ 2008-05-09 23:21 UTC (permalink / raw)
To: Jesper Krogh; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> Matheos Worku wrote:
>>>> Thats a transmit.. from the NFS server to the clients.
>>>
>> Is MSI/MSI-X enabled on the kernel? I have noticed that it was not
>> enabled on Gutsy-SPARC?
>
> # grep MSI .config
> CONFIG_ARCH_SUPPORTS_MSI=y
> CONFIG_PCI_MSI=y
>
> Should that be ok?
>
That should be ok.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 23:13 ` Jesper Krogh
@ 2008-05-09 23:33 ` Rick Jones
0 siblings, 0 replies; 52+ messages in thread
From: Rick Jones @ 2008-05-09 23:33 UTC (permalink / raw)
To: Jesper Krogh; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> Rick Jones wrote:
>
>>>> To which cpu(s) were the neptune's interrupts assigned? (grep <ethN>
>>>> /proc/interrupts)
>>>
>>>
>>> Several
>>
>>
>> Which ones?-) I suspect that since it is several that at least
>> answers the MSI_X question right? Or is it possible to have a single
>> device have several MSI interrupts?
Well, the output below answered my question about multiple MSIs from a
single device :)
>
>
> The table is rather large.
> http://rafb.net/p/a9zfFA59.html
So, over time at least, the interrupts have been spread over half the
cores in the system. This is where the block diagram showing the socket
and I/O slot connections would be useful.
For the interrupts over a given interval, one can take snapshots at
either end of the interval and then run them through beforeafter:
ftp://ftp.cup.hp.com/dist/networking/tools/
I think that collectl might have something there too.
I guess the addresses in use weren't sufficient to get interrupts spread
across all 24 IRQ's.
rick jones
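The before/after snapshot comparison suggested above can be sketched in a few lines. This is a minimal illustration, not the beforeafter tool itself: it assumes two text snapshots in the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ), and the IRQ numbers and names in the sample data are made up.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: [per-CPU counts]}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():       # reached the controller/device columns
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return table


def delta(before, after):
    """Per-IRQ, per-CPU difference between two parsed snapshots."""
    out = {}
    for irq, after_counts in after.items():
        before_counts = before.get(irq, [0] * len(after_counts))
        out[irq] = [a - b for a, b in zip(after_counts, before_counts)]
    return out


# Hypothetical snapshots taken at either end of a test interval.
BEFORE = """\
           CPU0       CPU1
 56:       1000        200   PCI-MSI-edge      eth4-rx-0
 57:         10       5000   PCI-MSI-edge      eth4-tx-0
"""
AFTER = """\
           CPU0       CPU1
 56:       1500        200   PCI-MSI-edge      eth4-rx-0
 57:         10       9000   PCI-MSI-edge      eth4-tx-0
"""

DIFF = delta(parse_interrupts(BEFORE), parse_interrupts(AFTER))
for irq, counts in sorted(DIFF.items()):
    print(irq, counts)
```

In practice the two snapshots would come from `cat /proc/interrupts` saved before and after the run; rows whose delta is all zeros were idle during the interval.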
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:48 ` Jesper Krogh
2008-05-09 23:03 ` Rick Jones
2008-05-09 23:08 ` David Dillow
@ 2008-05-10 2:20 ` Bill Fink
2 siblings, 0 replies; 52+ messages in thread
From: Bill Fink @ 2008-05-10 2:20 UTC (permalink / raw)
To: Jesper Krogh; +Cc: Rick Jones, David Miller, yhlu.kernel, linux-kernel, netdev
On Sat, 10 May 2008, Jesper Krogh wrote:
> Rick Jones wrote:
> >
> > Into which slot was the Neptune inserted? (sure will be nice to have
> > Alex Chiang's pci slot id patch in mainline one of these days :)
> >
> > Is that slot x4, x8, x16?
>
> I can find out exactly .. on Monday. But shouldn't x4 be enough anyway?
> Wikipedia says 250MB/s per lane. And no slot is less than x4, so I
> thought that it didn't matter to me.
No, you need x8 for 10-GigE. The 250 MB/sec per lane is 2 Gbps per lane,
so x4 is only 8 Gbps. For a typical PCI-E transaction size of 128 bytes,
and PCI-E protocol overhead of 20 to 24 bytes, that takes off about 20%,
which leaves you with about 6.5 Gbps of usable bandwidth, which is further
reduced somewhat by the required PCI-E ACK and flow control packets that
accompany the actual data transaction packets. I am not an expert on
PCI-E but I believe this info to be generally correct.
-Bill
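The arithmetic above can be checked with a short calculation. This is a rough sketch under the figures given in the message (2 Gbps per lane, 128-byte transactions, 20 to 24 bytes of per-transaction overhead); the function name is made up for illustration, and ACK and flow-control traffic is not modelled.

```python
def pcie_usable_gbps(lanes, payload_bytes=128, overhead_bytes=24, lane_gbps=2.0):
    """Estimate usable PCIe bandwidth after per-transaction protocol overhead."""
    raw = lanes * lane_gbps
    efficiency = payload_bytes / (payload_bytes + overhead_bytes)
    return raw * efficiency


for lanes in (4, 8):
    for overhead in (20, 24):
        usable = pcie_usable_gbps(lanes, overhead_bytes=overhead)
        print(f"x{lanes}, {overhead}-byte overhead: {usable:.2f} Gbps usable")
```

With these inputs an x4 link comes out at roughly 6.7 to 6.9 Gbps, a little above the rounder 6.5 Gbps estimate quoted above, and still well short of 10 Gbps. An x8 link has headroom under the same model.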
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 23:08 ` David Dillow
@ 2008-05-10 6:22 ` Jesper Krogh
2008-05-10 15:53 ` Roland Dreier
0 siblings, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-10 6:22 UTC (permalink / raw)
To: David Dillow; +Cc: Rick Jones, David Miller, yhlu.kernel, linux-kernel, netdev
David Dillow wrote:
> lspci -vvv on the slot can tell you what has been negotiated. Also, you
> can figure the effective PCIe bandwidth using ~200 MB/s per lane as a
> rule of thumb, given the various overheads on the bus. So an x4 won't fully load
> up a 10Gbps link.
Ok. It must be x8 then:
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 18:32 ` Jesper Krogh
2008-05-09 21:32 ` David Miller
@ 2008-05-10 11:01 ` Jesper Krogh
2008-05-11 4:34 ` David Miller
1 sibling, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-10 11:01 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> David Miller wrote:
>> From: "Yinghai Lu" <yhlu.kernel@gmail.com>
>> Date: Wed, 7 May 2008 11:30:18 -0700
>>
>>> On Wed, May 7, 2008 at 11:23 AM, Jesper Krogh <jesper@krogh.cc> wrote:
>>>> Hi.
>>>>
>>>> I get errors like this after a few minutes of traffic on a Sun
>>>> Neptune 10g
>>>> ethernet card. (with nice 500MB/s throughput).
>>>>
>>>> Then the server seems too busy with something, so it doesn't even
>>>> respont
>>>> to a serial terminal login.
>>>>
>>>> May 7 16:16:33 hest kernel: [ 166.948958] niu: niu_get_parent:
>>>> platform_type[1] port[3]
>>>> May 7 16:16:33 hest kernel: [ 166.949366] niu:
>>>> niu_get_and_validate_port:
>>>> port[3] num_ports[2]
>>>> May 7 16:16:33 hest kernel: [ 166.949886] niu: niu_put_parent:
>>>> port[3]
>>>> .. bootup ends here ..
>>>> May 7 17:13:54 hest kernel: [ 3670.128178] niu 0000:84:00.0: niu:
>>>> eth4:
>>>> Transmit timed out, resetting
>>>> May 7 17:14:04 hest kernel: [ 3680.108614] niu 0000:84:00.0: niu:
>>>> eth4:
>>>> Transmit timed out, resetting
>>>> May 7 17:14:14 hest kernel: [ 3690.093089] niu 0000:84:00.0: niu:
>>>> eth4:
>>>> Transmit timed out, resetting
>>>> May 7 17:14:19 hest kernel: [ 3695.079254] niu 0000:84:00.0: niu:
>>>> eth4:
>>>> Transmit timed out, resetting
>>>> May 7 17:14:24 hest kernel: [ 3700.073525] niu 0000:84:00.0: niu:
>>>> eth4:
>>>> Transmit timed out, resetting
>>>> May 7 17:14:29 hest kernel: [ 3705.063744] niu 0000:84:00.0: niu:
>>>> eth4:
>>>> Transmit timed out, resetting
>>>> May 7 17:14:34 hest kernel: [ 3710.049918] niu 0000:84:00.0: niu:
>>>> eth4:
>>>> Transmit timed out, resetting
>>>>
>>>>
>>>> Any suggestions?
>>>>
>>>> The system is an Ubuntu Hardy (2.6.24-17-server) amd64.
>>> can you try 2.6.25 or current git?
>>
>> Also, please always CC: netdev@vger.kernel.org on networking reports.
>
> Yes. It is reproducible under 2.6.25.2 when the load goes up (it worked
> excellently in the <100MB/s range for several hours).
Any good suggestions about the "Transmit timed out" messages? It
currently leads to a system that "doesn't die" but doesn't respond within
15 minutes of load on the network adapter.
Does the high number of context switches (120,000+) have any influence?
Should I be able to use TSO?
# ethtool -k eth4
Offload parameters for eth4:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
ethtool v6
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-10 6:22 ` Jesper Krogh
@ 2008-05-10 15:53 ` Roland Dreier
2008-05-12 6:49 ` Jesper Krogh
0 siblings, 1 reply; 52+ messages in thread
From: Roland Dreier @ 2008-05-10 15:53 UTC (permalink / raw)
To: Jesper Krogh
Cc: David Dillow, Rick Jones, David Miller, yhlu.kernel, linux-kernel,
netdev
> Ok. It must be x8 then:
> Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
Actually that line is showing the supported capabilities... you want the
second line a few down that shows the actual status.
- R.
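The distinction between the capability line and the status line can be made mechanical. The sketch below only handles the old-style "Link:" wording that the lspci output in this thread uses (newer pciutils versions label these lines LnkCap and LnkSta, which this does not parse); the function name is an illustration, and the sample text is copied from the two lines reported in this thread.

```python
import re

# The capability line and the status line, as reported in this thread.
SAMPLE = """\
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
Link: Speed 2.5Gb/s, Width x8
"""


def link_widths(lspci_text):
    """Return (supported_width, negotiated_width) from old-style 'Link:' lines."""
    supported = negotiated = None
    pattern = re.compile(r"Link: (Supported )?Speed [^,]+, Width x(\d+)")
    for line in lspci_text.splitlines():
        m = pattern.match(line.strip())
        if not m:
            continue
        if m.group(1):                 # "Supported" present: what the device can do
            supported = int(m.group(2))
        else:                          # no "Supported": what was actually negotiated
            negotiated = int(m.group(2))
    return supported, negotiated


print(link_widths(SAMPLE))
```

A negotiated width smaller than the supported width would indicate that the card trained at fewer lanes than it is capable of, for example because of the slot it sits in.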
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-10 11:01 ` Jesper Krogh
@ 2008-05-11 4:34 ` David Miller
2008-05-11 5:44 ` Jesper Krogh
2008-05-12 6:52 ` Jesper Krogh
0 siblings, 2 replies; 52+ messages in thread
From: David Miller @ 2008-05-11 4:34 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev
From: Jesper Krogh <jesper@krogh.cc>
Date: Sat, 10 May 2008 13:01:08 +0200
> Any good suggestions about the "Transmit timed out" messages? It
> currently leads to a system that "doesn't die" but doesn't respond within
> 15 minutes of load on the network adapter.
It's likely some bug in the driver that hangs the card for whatever
reason, which we'll need to work out.
> Does the high number of context switches (120,000+) have any influence?
Unlikely.
> Should I be able to use TSO?
NIU doesn't support TSO.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-11 4:34 ` David Miller
@ 2008-05-11 5:44 ` Jesper Krogh
2008-05-11 6:08 ` David Miller
2008-05-12 6:52 ` Jesper Krogh
1 sibling, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-11 5:44 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Sat, 10 May 2008 13:01:08 +0200
>
>> Any good suggestions about the "Transmit timed out" messages? It
>> currently leads to a system that "doesn't die" but doesn't respond within
>> 15 minutes of load on the network adapter.
>
> It's likely some bug in the driver that hangs the card for whatever
> reason, which we'll need to work out.
What can I do to help out? I'd really like to get the 10g interface
operational quite soon.
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-11 5:44 ` Jesper Krogh
@ 2008-05-11 6:08 ` David Miller
2008-05-11 9:47 ` Jesper Krogh
0 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-11 6:08 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev
From: Jesper Krogh <jesper@krogh.cc>
Date: Sun, 11 May 2008 07:44:22 +0200
> David Miller wrote:
> > From: Jesper Krogh <jesper@krogh.cc>
> > Date: Sat, 10 May 2008 13:01:08 +0200
> >
> >> Any good suggestions about the "Transmit timed out" messages? It
> >> currently leads to a system that "doesn't die" but doesn't respond within
> >> 15 minutes of load on the network adapter.
> >
> > It's likely some bug in the driver that hangs the card for whatever
> > reason, which we'll need to work out.
>
> What can I do to help out? I'd really like to get the 10g interface
> operational quite soon.
I personally don't have the time to work on this currently, but maybe
I will later this week.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-11 6:08 ` David Miller
@ 2008-05-11 9:47 ` Jesper Krogh
0 siblings, 0 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-05-11 9:47 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> I personally don't have the time to work on this currently, but maybe
> I will later this week.
Ok. Just let me know if I can help and/or when you have something to
test.
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-10 15:53 ` Roland Dreier
@ 2008-05-12 6:49 ` Jesper Krogh
0 siblings, 0 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-05-12 6:49 UTC (permalink / raw)
To: Roland Dreier
Cc: David Dillow, Rick Jones, David Miller, yhlu.kernel, linux-kernel,
netdev
Roland Dreier wrote:
> > Ok. It must be x8 then:
> > Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
>
> Actually that line is showing the supported capabilities... you want the
> second line a few down that shows the actual status.
Ok it is still x8 then.
Link: Speed 2.5Gb/s, Width x8
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-11 4:34 ` David Miller
2008-05-11 5:44 ` Jesper Krogh
@ 2008-05-12 6:52 ` Jesper Krogh
2008-05-26 19:03 ` Jesper Krogh
1 sibling, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-12 6:52 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Sat, 10 May 2008 13:01:08 +0200
>
>> Any good suggestions about the "Transmit timed out" messages? It
>> currently leads to a system that "doesn't die" but doesn't respond within
>> 15 minutes of load on the network adapter.
>
> It's likely some bug in the driver that hangs the card for whatever
> reason, which we'll need to work out.
Ok. I have been testing a bit more. It generally works fine. It seems
the "transmit timed out" thing was provoked by dd in the tests, because
it was creating a lot of small requests (default bs=512) over NFS.
When the blocksize went up, the problem disappeared (and the number of
context switches went down).
When the blocksize went up, the performance likewise rose to 615MB/s.
I haven't been able to get past that number.
This is where one of the CPUs settles at 100% load.
Jesper
--
Jesper
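To put the block-size observation in numbers: dd issues one write() per block, so the default bs=512 produces far more calls than a large block size for the same amount of data. The sketch below only counts those calls. How many of them become separate NFS requests on the wire depends on client-side caching and mount options, which are not modelled here, and the function name is made up for illustration.

```python
def write_calls(total_bytes, block_size):
    """Number of write() calls needed to move total_bytes in block_size chunks."""
    return -(-total_bytes // block_size)   # ceiling division


ONE_GIB = 1 << 30
for bs in (512, 64 * 1024, 1 << 20):
    print(f"bs={bs}: {write_calls(ONE_GIB, bs)} write() calls per GiB")
```

For one GiB this gives 2,097,152 calls at bs=512 against 1,024 at bs=1M, which is consistent with the drop in context switches reported above.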
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-09 22:45 ` David Miller
@ 2008-05-22 16:32 ` Jesper Krogh
2008-05-22 17:15 ` Ben Hutchings
2008-05-22 17:41 ` David Miller
0 siblings, 2 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-05-22 16:32 UTC (permalink / raw)
To: David Miller; +Cc: Matheos.Worku, yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> From: Matheos Worku <Matheos.Worku@Sun.COM>
> Date: Fri, 09 May 2008 15:36:17 -0700
>
>> I have observed TX throughput degradation (and increased CPU
>> utilization) occurs with increased # of connections, when CPU count > 4
>> CPUs. I don't think it is related to the driver (or HW).
>
> All transmits through a device are fully serialized currently,
> it's a known problem and something we plan to fix.
I google'd up this one:
http://vger.kernel.org/~davem/davem_tokyo08.pdf (slide 23+).
Does this mean that I can expect every 10G card to have this limitation
under Linux? Or are some known to be better than others?
(I can probably justify paying for another 10G card if I can expect to
gain the last 400MB/s).
Since I cannot push more than ~600MB/s through the NIC before a single
cpu is bottlenecked, it seems most likely to be this TX throughput
thing I hit.
Thanks
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-22 16:32 ` Jesper Krogh
@ 2008-05-22 17:15 ` Ben Hutchings
2008-05-22 17:41 ` David Miller
1 sibling, 0 replies; 52+ messages in thread
From: Ben Hutchings @ 2008-05-22 17:15 UTC (permalink / raw)
To: Jesper Krogh
Cc: David Miller, Matheos.Worku, yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> David Miller wrote:
> >From: Matheos Worku <Matheos.Worku@Sun.COM>
> >Date: Fri, 09 May 2008 15:36:17 -0700
> >
> >>I have observed TX throughput degradation (and increased CPU
> >>utilization) occurs with increased # of connections, when CPU count > 4
> >>CPUs. I don't think it is related to the driver (or HW).
> >
> >All transmits through a device are fully serialized currently,
> >it's a known problem and something we plan to fix.
>
> I google'd up this one:
>
> http://vger.kernel.org/~davem/davem_tokyo08.pdf (slide 23+).
>
> Does this mean that I can expect every 10G card to have this limitation
> under Linux? Or are some known to be better than others?
> (I can probably justify paying for another 10G card if I can expect to
> gain the last 400MB/s).
[...]
Our controller and driver can reach line rate for a single TCP/IPv4 stream,
depending on the host system. I don't think we're alone in that, though
there are few published benchmarks that don't involve TOEs.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-22 16:32 ` Jesper Krogh
2008-05-22 17:15 ` Ben Hutchings
@ 2008-05-22 17:41 ` David Miller
2008-05-22 18:14 ` Ben Hutchings
2008-06-01 7:25 ` Andrey Panin
1 sibling, 2 replies; 52+ messages in thread
From: David Miller @ 2008-05-22 17:41 UTC (permalink / raw)
To: jesper; +Cc: Matheos.Worku, yhlu.kernel, linux-kernel, netdev
From: Jesper Krogh <jesper@krogh.cc>
Date: Thu, 22 May 2008 18:32:07 +0200
> Does this mean that I can expect every 10G card to have this limitation
> under Linux?
For now, yes. The transmit path itself in the generic network
device layer is where the serialization comes from.
I'm travelling now for 3 weeks so I won't be able to work on
this stuff until I get back.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-22 17:41 ` David Miller
@ 2008-05-22 18:14 ` Ben Hutchings
2008-05-22 18:28 ` David Miller
2008-06-01 7:25 ` Andrey Panin
1 sibling, 1 reply; 52+ messages in thread
From: Ben Hutchings @ 2008-05-22 18:14 UTC (permalink / raw)
To: David Miller; +Cc: jesper, Matheos.Worku, yhlu.kernel, linux-kernel, netdev
David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Thu, 22 May 2008 18:32:07 +0200
>
> > Does this mean that I can expect every 10G card to have this limitation
> > under Linux?
>
> For now, yes. The transmit path itself in the generic network
> device layer is where the serialization comes from.
This is true in the general case, but can be substantially mitigated
by segmentation offload. Unfortunately GSO doesn't help much as the
overhead of allocating the extra skbs is fairly high.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-22 18:14 ` Ben Hutchings
@ 2008-05-22 18:28 ` David Miller
0 siblings, 0 replies; 52+ messages in thread
From: David Miller @ 2008-05-22 18:28 UTC (permalink / raw)
To: bhutchings; +Cc: jesper, Matheos.Worku, yhlu.kernel, linux-kernel, netdev
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 22 May 2008 19:14:21 +0100
> David Miller wrote:
> > For now, yes. The transmit path itself in the generic network
> > device layer is where the serialization comes from.
>
> This is true in the general case, but can be substantially mitigated
> by segmentation offload. Unfortunately GSO doesn't help much as the
> overhead of allocating the extra skbs is fairly high.
But GSO still does help a lot for chips that lack hw TSO support,
such as NIU, therefore sw GSO support in the NIU driver was pretty
high on my todo list.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-12 6:52 ` Jesper Krogh
@ 2008-05-26 19:03 ` Jesper Krogh
2008-05-26 19:33 ` David Miller
0 siblings, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-26 19:03 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> David Miller wrote:
>> From: Jesper Krogh <jesper@krogh.cc>
>> Date: Sat, 10 May 2008 13:01:08 +0200
>>
>>> Any good suggestions about the "Transmit timed out" messages? It
>>> currently leads to a system that "doesn't die" but doesn't respond within
>>> 15 minutes of load on the network adapter.
>>
>> It's likely some bug in the driver that hangs the card for whatever
>> reason, which we'll need to work out.
>
> Ok. I have been testing a bit more. It generally works fine. It seems
> the "transmit timed out" thing was provoked by dd in the tests, because
> it was creating a lot of small requests (default bs=512) over NFS.
> When the blocksize went up, the problem disappeared (and the number of
> context switches went down).
Ok. Now I also hit it in production with the NFS-server, so this
is definitely a real bug somewhere in the driver. Should I register it
at bugzilla?
$ sudo grep -ci "Transmit timed out" /var/log/messages
1007
And only a hard reboot brought it back online.
2.6.25.2
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-26 19:03 ` Jesper Krogh
@ 2008-05-26 19:33 ` David Miller
2008-05-26 19:39 ` David Miller
0 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-26 19:33 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev
From: Jesper Krogh <jesper@krogh.cc>
Date: Mon, 26 May 2008 21:03:34 +0200
> Ok. Now I also hit it in production with the NFS-server, so this
> is definitely a real bug somewhere in the driver. Should I register it
> at bugzilla?
Please feel free to do that.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-26 19:33 ` David Miller
@ 2008-05-26 19:39 ` David Miller
2008-05-26 20:54 ` Jesper Krogh
0 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-26 19:39 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev, matheos.worku
From: David Miller <davem@davemloft.net>
Date: Mon, 26 May 2008 12:33:38 -0700 (PDT)
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Mon, 26 May 2008 21:03:34 +0200
>
> > Ok. Now I also hit it in production with the NFS-server, so this
> > is definitely a real bug somewhere in the driver. Should I register it
> > at bugzilla?
>
> Please feel free to do that.
BTW, I did stare at some of the transmit code of the NIU driver
while flying from Tokyo to Seattle a few hours ago, and I
found one possible theory on the transmit timeouts.
Can you try the patch below and let us know if the symptoms
continue?
[ Note to Matheos: The IRQ marking scheme of the NIU doesn't mesh
well with how things work under Linux. We really need a
"TX queue empty" interrupt status in order to handle all cases
properly. Otherwise we really cannot decide not to mark some TX
descriptors without potentially entering a deadlock condition. ]
diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index 918f802..7ab7f8e 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -6165,7 +6165,7 @@ static int niu_start_xmit(struct sk_buff *skb, struct net_device *dev)
rp->tx_buffs[prod].mapping = mapping;
mrk = TX_DESC_SOP;
- if (++rp->mark_counter == rp->mark_freq) {
+ if (1 /*++rp->mark_counter == rp->mark_freq*/) {
rp->mark_counter = 0;
mrk |= TX_DESC_MARK;
rp->mark_pending++;
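The deadlock concern in the note above can be illustrated with a toy model of the marking counter. It assumes, purely for illustration, that a TX interrupt fires only for descriptors carrying TX_DESC_MARK and that the handler then reclaims everything up to and including that descriptor; that assumption and the function name are not taken from niu.c.

```python
def unreclaimed_after_burst(num_packets, mark_freq):
    """Toy model: descriptors left unreclaimed once a burst has been sent
    and no further traffic arrives to trigger another mark."""
    mark_counter = 0
    last_marked = 0                      # 1-based index of the latest marked descriptor
    for i in range(1, num_packets + 1):
        mark_counter += 1
        if mark_counter == mark_freq:    # same test as the original driver line
            mark_counter = 0
            last_marked = i
    # Only marked descriptors raise an interrupt, so anything queued after
    # the last mark waits until more packets push the counter to mark_freq.
    return num_packets - last_marked


print(unreclaimed_after_burst(10, mark_freq=4))   # tail of the burst is stranded
print(unreclaimed_after_burst(10, mark_freq=1))   # the test patch: every descriptor marked
```

With mark_freq=4 a burst of 10 packets leaves the last two descriptors waiting, while marking every descriptor, as the test patch does, leaves none, at the cost of one interrupt request per packet.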
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-26 19:39 ` David Miller
@ 2008-05-26 20:54 ` Jesper Krogh
2008-05-26 22:15 ` David Miller
0 siblings, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-26 20:54 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev, matheos.worku
David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 26 May 2008 12:33:38 -0700 (PDT)
>
>> From: Jesper Krogh <jesper@krogh.cc>
>> Date: Mon, 26 May 2008 21:03:34 +0200
>>
>>> Ok. Now I also hit it in production with the NFS-server, so this
>>> is definitely a real bug somewhere in the driver. Should I register it
>>> at bugzilla?
>> Please feel free to do that.
>
> BTW, I did stare at some of the transmit code of the NIU driver
> while flying from Tokyo to Seattle a few hours ago, and I
> found one possible theory on the transmit timeouts.
>
> Can you try the patch below and let us know if the symptoms
> continue?
>
> [ Note to Matheos: The IRQ marking scheme of the NIU doesn't mesh
> well with how things work under Linux. We really need a
> "TX queue empty" interrupt status in order to handle all cases
> properly. Otherwise we really cannot decide not to mark some TX
> descriptors without potentially entering a deadlock condition. ]
>
> diff --git a/drivers/net/niu.c b/drivers/net/niu.c
> index 918f802..7ab7f8e 100644
> --- a/drivers/net/niu.c
> +++ b/drivers/net/niu.c
> @@ -6165,7 +6165,7 @@ static int niu_start_xmit(struct sk_buff *skb, struct net_device *dev)
> rp->tx_buffs[prod].mapping = mapping;
>
> mrk = TX_DESC_SOP;
> - if (++rp->mark_counter == rp->mark_freq) {
> + if (1 /*++rp->mark_counter == rp->mark_freq*/) {
> rp->mark_counter = 0;
> mrk |= TX_DESC_MARK;
> rp->mark_pending++;
Applied and running.. I've now pushed 400GB of data through it trying to
get it to hit the bug but it is still running.
So without saying that it solved the problem, it definitely seems so.
2.6.26-rc4 + above patch.
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-26 20:54 ` Jesper Krogh
@ 2008-05-26 22:15 ` David Miller
2008-05-26 22:21 ` Jesper Krogh
2008-05-27 6:19 ` Jesper Krogh
0 siblings, 2 replies; 52+ messages in thread
From: David Miller @ 2008-05-26 22:15 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev, matheos.worku
From: Jesper Krogh <jesper@krogh.cc>
Date: Mon, 26 May 2008 22:54:53 +0200
> Applied and running.. I've now pushed 400GB of data through it trying to
> get it to hit the bug but it is still running.
>
> So without saying that it solved the problem, it definitely seems so.
> 2.6.26-rc4 + above patch.
Thanks for testing.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-26 22:15 ` David Miller
@ 2008-05-26 22:21 ` Jesper Krogh
2008-05-26 22:30 ` David Miller
2008-05-27 6:19 ` Jesper Krogh
1 sibling, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-26 22:21 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev, matheos.worku
David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Mon, 26 May 2008 22:54:53 +0200
>
>> Applied and running.. I've now pushed 400GB of data through it trying to
>> get it to hit the bug but it is still running.
>>
>> So without saying that it solved the problem, it definitely seems so.
>> 2.6.26-rc4 + above patch.
>
> Thanks for testing.
I got this one:
(also reported in the linux-2.6.26-rc4 thread)
[42949399.810959] ck804xrom ck804xrom_init_one(): Unable to register
resource 0x0000000000000000-0x00000000ffffffff - kernel bug?
[42949399.979924] ------------[ cut here ]------------
[42949399.979924] WARNING: at arch/x86/mm/ioremap.c:159
__ioremap_caller+0x299/0x330()
[42949399.979924] Modules linked in: ck804xrom(+) mtd i2c_nforce2(+)
niu(+) i2c_core serio_raw button(+) chipreg map_funcs pcspkr k8temp
shpchp pci_hotplug evdev joydev ext3 jbd mbcache pata_amd sr_mod cdrom
sg sd_mod usb_storage libusual pata_acpi usbhid hid mptsas ata_generic
mptspi scsi_transport_sas mptscsih mptbase libata scsi_transport_spi
ehci_hcd e1000 scsi_mod dock ohci_hcd usbcore dm_mirror dm_log
dm_snapshot dm_mod thermal processor fan fuse
[42949399.979924] Pid: 5660, comm: modprobe Not tainted 2.6.26-rc4 #1
[42949399.979924]
[42949399.979924] Call Trace:
[42949399.979924] [<ffffffff80236aa4>] warn_on_slowpath+0x64/0xa0
[42949399.979924] [<ffffffff8034b697>] idr_get_empty_slot+0xf7/0x280
[42949399.979924] [<ffffffff80237b4e>] printk+0x4e/0x60
[42949399.979924] [<ffffffff80465e52>] klist_iter_exit+0x12/0x20
[42949399.979924] [<ffffffff80223139>] __ioremap_caller+0x299/0x330
[42949399.979924] [<ffffffffa00ee1f6>]
:ck804xrom:init_ck804xrom+0x1f6/0x631
[42949399.979924] [<ffffffffa00ee1f6>]
:ck804xrom:init_ck804xrom+0x1f6/0x631
[42949399.979924] [<ffffffff8025c23a>] sys_init_module+0x17a/0x1d00
[42949399.979924] [<ffffffff8028c228>] vma_prio_tree_insert+0x28/0x60
[42949399.979924] [<ffffffff8020c2bb>] system_call_after_swapgs+0x7b/0x80
[42949399.979924]
[42949399.979924] ---[ end trace 5bb785355abc57e6 ]---
[42949399.979924] ck804xrom: ioremap(00000000, 100000000) failed
(to me it seemed unrelated, but I am definitely not an expert in this)
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-26 22:21 ` Jesper Krogh
@ 2008-05-26 22:30 ` David Miller
0 siblings, 0 replies; 52+ messages in thread
From: David Miller @ 2008-05-26 22:30 UTC (permalink / raw)
To: jesper; +Cc: yhlu.kernel, linux-kernel, netdev, matheos.worku
From: Jesper Krogh <jesper@krogh.cc>
Date: Tue, 27 May 2008 00:21:15 +0200
> David Miller wrote:
> > From: Jesper Krogh <jesper@krogh.cc>
> > Date: Mon, 26 May 2008 22:54:53 +0200
> >
> >> Applied and running.. I've now pushed 400GB of data through it trying to
> >> get it to hit the bug but it is still running.
> >>
> >> So without saying that it solved the problem, it definitely seems so.
> >> 2.6.26-rc4 + above patch.
> >
> > Thanks for testing.
>
> I got this one:
> (also reported in the linux-2.6.26-rc4 thread)
That's for another driver.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-26 22:15 ` David Miller
2008-05-26 22:21 ` Jesper Krogh
@ 2008-05-27 6:19 ` Jesper Krogh
2008-05-28 1:18 ` Matheos Worku
1 sibling, 1 reply; 52+ messages in thread
From: Jesper Krogh @ 2008-05-27 6:19 UTC (permalink / raw)
To: David Miller; +Cc: yhlu.kernel, linux-kernel, netdev, matheos.worku
David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Mon, 26 May 2008 22:54:53 +0200
>
>> Applied and running.. I've now pushed 400GB of data through it trying to
>> get it to hit the bug but it is still running.
>>
>> So without saying that it solved the problem, it definitely seems so.
>> 2.6.26-rc4 + above patch.
>
> Thanks for testing.
Ok. I spoke too soon.. it ended up in the same situation again.
May 27 08:09:12 hest kernel: [42953871.982072] NETDEV WATCHDOG: eth4:
transmit timed out
May 27 08:09:17 hest kernel: [42953877.827797] NETDEV WATCHDOG: eth4:
transmit timed out
May 27 08:09:22 hest kernel: [42953883.958375] NETDEV WATCHDOG: eth4:
transmit timed out
May 27 08:09:27 hest kernel: [42953890.668401] NETDEV WATCHDOG: eth4:
transmit timed out
Jesper
--
Jesper
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-27 6:19 ` Jesper Krogh
@ 2008-05-28 1:18 ` Matheos Worku
2008-05-29 5:34 ` David Miller
2008-06-18 0:02 ` David Miller
0 siblings, 2 replies; 52+ messages in thread
From: Matheos Worku @ 2008-05-28 1:18 UTC (permalink / raw)
To: Jesper Krogh; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Jesper Krogh wrote:
> David Miller wrote:
>
>> From: Jesper Krogh <jesper@krogh.cc>
>> Date: Mon, 26 May 2008 22:54:53 +0200
>>
>>> Applied and running.. I've now pushed 400GB of data through it
>>> trying to
>>> get it to hit the bug but it is still running.
>>>
>>> So without saying that it solved the problem, it definitely seems so.
>>> 2.6.26-rc4 + above patch.
>>
>>
>> Thanks for testing.
>
>
> Ok. I spoke too soon.. it ended up in the same situation again.
>
> May 27 08:09:12 hest kernel: [42953871.982072] NETDEV WATCHDOG: eth4:
> transmit timed out
> May 27 08:09:17 hest kernel: [42953877.827797] NETDEV WATCHDOG: eth4:
> transmit timed out
> May 27 08:09:22 hest kernel: [42953883.958375] NETDEV WATCHDOG: eth4:
> transmit timed out
> May 27 08:09:27 hest kernel: [42953890.668401] NETDEV WATCHDOG: eth4:
> transmit timed out
>
>
> Jesper
Dave,
Considering that fixing the HW would take considerable time, I was
wondering if the scheme we use in the nxge driver could be considered as
a workaround. Since the niu driver is already doing skb_orphan as a
workaround, what if already transmitted TX buffers are reclaimed
periodically, within dev->hard_start_xmit() ? Then TX_DESC_MARK would
be set if/when available TX descriptor count falls below some watermark.
Disable device TX queue about the time TX_DESC_MARK is set and enable
it within TX interrupt.
Regards
Matheos
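The scheme proposed above can be sketched as a small state machine. This is a toy model of the idea only, not the nxge or niu implementation: the class, its fields and the "hardware" step are all made up for illustration, and it stops the queue at the exact moment the mark is set, where the message says "about the time".

```python
class TxRing:
    """Toy model: reclaim in the transmit path, mark only when running low."""

    def __init__(self, size, watermark):
        self.size = size              # total TX descriptors in the ring
        self.watermark = watermark    # free-descriptor threshold for setting the mark
        self.pending = 0              # queued, not yet sent by the "hardware"
        self.done = 0                 # sent by the "hardware", not yet reclaimed
        self.mark_outstanding = False
        self.stopped = False

    def free(self):
        return self.size - self.pending - self.done

    def reclaim(self):
        """Free descriptors the hardware has already finished with."""
        self.done = 0

    def start_xmit(self):
        """Queue one packet; returns True if its descriptor carries the mark."""
        if self.stopped:
            raise RuntimeError("queue is stopped")
        self.reclaim()                        # opportunistic reclaim, no interrupt needed
        self.pending += 1
        if self.free() < self.watermark:
            self.mark_outstanding = True      # request an interrupt only when low
            self.stopped = True               # stop the queue until it arrives
            return True
        return False

    def hardware_sends_all(self):
        """Simulate the hardware draining the ring."""
        self.done += self.pending
        self.pending = 0
        if self.mark_outstanding:
            self.tx_interrupt()

    def tx_interrupt(self):
        self.reclaim()
        self.mark_outstanding = False
        self.stopped = False                  # wake the queue


ring = TxRing(size=8, watermark=2)
marks = [ring.start_xmit() for _ in range(7)]   # no hardware progress in between
print(marks, ring.stopped)
ring.hardware_sends_all()
print(ring.stopped, ring.free())
```

In this model, light traffic never sets the mark at all: each transmit reclaims what the previous one left behind. The last packet of a burst stays unreclaimed until the next transmit, which is presumably why the message points out that the driver already calls skb_orphan.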
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-28 1:18 ` Matheos Worku
@ 2008-05-29 5:34 ` David Miller
2008-05-30 0:14 ` Matheos Worku
2008-06-18 0:02 ` David Miller
1 sibling, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-29 5:34 UTC (permalink / raw)
To: Matheos.Worku; +Cc: jesper, yhlu.kernel, linux-kernel, netdev
From: Matheos Worku <Matheos.Worku@Sun.COM>
Date: Tue, 27 May 2008 18:18:57 -0700
> Considering that fixing the HW would take considerable time, I was
> wondering if the scheme we use in the nxge driver could be considered as
> a workaround. Since the niu driver is already doing skb_orphan as a
> workaround, what if already transmitted TX buffers are reclaimed
> periodically, within dev->hard_start_xmit() ? Then TX_DESC_MARK would
> be set if/when available TX descriptor count falls below some watermark.
> Disable device TX queue about the time TX_DESC_MARK is set and enable
> it within TX interrupt.
Since my hack patch didn't fix his problem at all, are you suggesting
that we end up not fielding TX mark interrupts even though mark is set
in all the TX descriptors and this is what hangs the chip?
I find that very unlikely, especially because with my test patch every
single TX descriptor will have the mark bit set and therefore we'd
have to not receive all of those TX mark interrupts in order for the
TX unit to hang like that.
Something else must be going wrong.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-29 5:34 ` David Miller
@ 2008-05-30 0:14 ` Matheos Worku
2008-05-30 7:00 ` David Miller
0 siblings, 1 reply; 52+ messages in thread
From: Matheos Worku @ 2008-05-30 0:14 UTC (permalink / raw)
To: David Miller; +Cc: jesper, yhlu.kernel, linux-kernel, netdev
David Miller wrote:
>From: Matheos Worku <Matheos.Worku@Sun.COM>
>Date: Tue, 27 May 2008 18:18:57 -0700
>
>
>
>>Considering that fixing the HW would take considerable time, I was
>>wondering if the scheme we use in the nxge driver could be considered as
>>a workaround. Since the niu driver is already doing skb_orphan as a work
>>around, what if already transmitted TX buffers are reclaimed
>>periodically, within dev->hard_start_xmit() ? Then TX_DESC_MARK would
>>be set if/when available TX descriptor count falls below some watermark.
>>Disable device TX queue about the time TX_DESC_MARK is set and enable
>>it within TX interrupt.
>>
>>
>
>Since my hack patch didn't fix his problem at all, are you suggesting
>that we end up not fielding TX mark interrupts even though mark is set
>in all the TX descriptors and this is what hangs the chip?
>
>I find that very unlikely, especially because with my test patch every
>single TX descriptor will have the mark bit set and therefore we'd
>have to not receive all of those TX mark interrupts in order for the
>TX unit to hang like that.
>
>Something else must be going wrong.
>
>
Dave,
Actually, what I was suggesting was a workaround for the lack of a "TX Ring
Empty" interrupt: not relying on the TX interrupt at all. As for the
TX hang, I will try to reproduce the problem and look at the registers
for clues.
Regards
Matheos
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-30 0:14 ` Matheos Worku
@ 2008-05-30 7:00 ` David Miller
2008-06-16 18:09 ` Matheos Worku
0 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2008-05-30 7:00 UTC (permalink / raw)
To: Matheos.Worku; +Cc: jesper, yhlu.kernel, linux-kernel, netdev
From: Matheos Worku <Matheos.Worku@Sun.COM>
Date: Thu, 29 May 2008 17:14:29 -0700
> Actually what I am suggesting was a workaround for the lack of "TX Ring
> Empty" interrupt by not relying on the TX interrupt at all.
Ahh I see.
Some of the things I talked about in my presentation here in
Berlin at LinuxTAG yesterday can help mitigate the effects.
Most of it revolves around batching, and allowing the driver
to manage the backlog of packets directly when the TX queue
fills up.
In such a case we could batch the TX queue refill, know how many more
TX packets we will queue up to the chip right now, and therefore know
that we can safely set periodic MARK bits and only need to force set
the MARK bit at the very end.
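A small sketch of the marking policy described above, assuming the driver knows the batch size up front. The function name mark_batch and the interval parameter are hypothetical; this is an illustration, not code from the niu driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide which descriptors of a batch of n packets get the MARK bit:
 * every interval-th descriptor, plus the final one so that the tail
 * of the batch is always reclaimed. Returns the number of marks set. */
static size_t mark_batch(bool *mark, size_t n, size_t interval)
{
	size_t i, count = 0;

	assert(interval > 0);
	for (i = 0; i < n; i++) {
		bool periodic = ((i + 1) % interval) == 0;
		bool last = (i + 1 == n);

		mark[i] = periodic || last;
		if (mark[i])
			count++;
	}
	return count;
}
```

With a batch of 10 packets and an interval of 4, descriptors 3, 7 and 9 (zero-based) are marked, so the chip raises three TX interrupts instead of ten.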
> As for the TX hang, I will try to reproduce the problem and look at
> the registers for the clue.
Thanks a lot.
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-22 17:41 ` David Miller
2008-05-22 18:14 ` Ben Hutchings
@ 2008-06-01 7:25 ` Andrey Panin
2008-06-01 16:01 ` David Miller
1 sibling, 1 reply; 52+ messages in thread
From: Andrey Panin @ 2008-06-01 7:25 UTC (permalink / raw)
To: David Miller; +Cc: jesper, Matheos.Worku, yhlu.kernel, linux-kernel, netdev
On Thu, May 22, 2008 at 10:41:45 -0700, David Miller wrote:
> From: Jesper Krogh <jesper@krogh.cc>
> Date: Thu, 22 May 2008 18:32:07 +0200
>
> > Does this mean that I can expect every 10G card to have this limitation
> > under Linux?
>
> For now, yes. The transmit path itself in the generic network
> device layer is where the serialization comes from.
BTW, does this problem affect bonded interfaces?
> I'm travelling now for 3 weeks so I won't be able to work on
> this stuff until I get back.
--
Andrey Panin | Linux and UNIX system administrator
pazke@donpac.ru | PGP key: wwwkeys.pgp.net
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-06-01 7:25 ` Andrey Panin
@ 2008-06-01 16:01 ` David Miller
0 siblings, 0 replies; 52+ messages in thread
From: David Miller @ 2008-06-01 16:01 UTC (permalink / raw)
To: pazke; +Cc: jesper, Matheos.Worku, yhlu.kernel, linux-kernel, netdev
From: Andrey Panin <pazke@pazke.donpac.ru>
Date: Sun, 1 Jun 2008 11:25:28 +0400
> On Thu, May 22, 2008 at 10:41:45 -0700, David Miller wrote:
> > For now, yes. The transmit path itself in the generic network
> > device layer is where the serialization comes from.
>
> BTW, does this problem affect bonded interfaces?
The toplevel bond device uses lockless transmit because
it does not queue, so the arity of the locking is equal
to the number of slave devices sitting under the toplevel
bond device.
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-30 7:00 ` David Miller
@ 2008-06-16 18:09 ` Matheos Worku
2008-06-16 18:21 ` Jesper Krogh
0 siblings, 1 reply; 52+ messages in thread
From: Matheos Worku @ 2008-06-16 18:09 UTC (permalink / raw)
To: David Miller; +Cc: jesper, yhlu.kernel, linux-kernel, netdev
David Miller wrote:
>From: Matheos Worku <Matheos.Worku@Sun.COM>
>Date: Thu, 29 May 2008 17:14:29 -0700
>
>
>
>>Actually what I am suggesting was a workaround for the lack of "TX Ring
>>Empty" interrupt by not relying on the TX interrupt at all.
>>
>>
>
>Ahh I see.
>
>Some of the things I talked about in my presentation here in
>Berlin at LinuxTAG yesterday can help mitigate the effects.
>Most of it revolves around batching, and allowing the driver
>to manage the backlog of packets directly when the TX queue
>fills up.
>
>In such a case we could batch the TX queue refill, know how many more
>TX packets we will queue up to the chip right now, and therefore know
>that we can safely set periodic MARK bits and only need to force set
>the MARK bit at the very end.
>
>
>
>>As for the TX hang, I will try to reproduce the problem and look at
>>the registers for the clue.
>>
>>
I have been trying but have not been able to reproduce the timeout. I am
using NFSv3 over TCP. Are you using UDP by any chance?
Regards
Matheos
>
>Thanks a lot.
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-06-16 18:09 ` Matheos Worku
@ 2008-06-16 18:21 ` Jesper Krogh
0 siblings, 0 replies; 52+ messages in thread
From: Jesper Krogh @ 2008-06-16 18:21 UTC (permalink / raw)
To: Matheos Worku; +Cc: David Miller, yhlu.kernel, linux-kernel, netdev
Matheos Worku wrote:
> David Miller wrote:
>
>> From: Matheos Worku <Matheos.Worku@Sun.COM>
>> Date: Thu, 29 May 2008 17:14:29 -0700
>>
>>
>>
>>> Actually what I am suggesting was a workaround for the lack of "TX
>>> Ring Empty" interrupt by not relying on the TX interrupt at all.
>>>
>>
>> Ahh I see.
>>
>> Some of the things I talked about in my presentation here in
>> Berlin at LinuxTAG yesterday can help mitigate the effects.
>> Most of it revolves around batching, and allowing the driver
>> to manage the backlog of packets directly when the TX queue
>> fills up.
>>
>> In such a case we could batch the TX queue refill, know how many more
>> TX packets we will queue up to the chip right now, and therefore know
>> that we can safely set periodic MARK bits and only need to force set
>> the MARK bit at the very end.
>>
>>
>>
>>> As for the TX hang, I will try to reproduce the problem and look at
>>> the registers for the clue.
>>>
> Have been trying but not able to reproduce the timeout. I am using NFS
> V3 with TCP. Are you using UDP by any chance?
I wouldn't say it is easy either. I have never hit it before pushing a
few TB over the "wire". I've got proto=tcp in /proc/mounts for the
mountpoints, so I'd assume that I use TCP.
There is an Extreme Networks switch at the other end. I haven't got the
hardware to test with a different card, so I cannot rule
the switch out either, but it would be strange.
Jesper
--
Jesper
* Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)
2008-05-28 1:18 ` Matheos Worku
2008-05-29 5:34 ` David Miller
@ 2008-06-18 0:02 ` David Miller
1 sibling, 0 replies; 52+ messages in thread
From: David Miller @ 2008-06-18 0:02 UTC (permalink / raw)
To: Matheos.Worku; +Cc: jesper, yhlu.kernel, linux-kernel, netdev
From: Matheos Worku <Matheos.Worku@Sun.COM>
Date: Tue, 27 May 2008 18:18:57 -0700
> Considering that fixing the HW would take considerable time, I was
> wondering if the scheme we use in the nxge driver could be considered as
> a workaround. Since the niu driver is already doing skb_orphan as a work
> around, what if already transmitted TX buffers are reclaimed
> periodically, within dev->hard_start_xmit() ? Then TX_DESC_MARK would
> be set if/when available TX descriptor count falls below some watermark.
> Disable device TX queue about the time TX_DESC_MARK is set and enable
> it within TX interrupt.
This is still insufficient.
Even if we detach the socket association, there are still resources
held by the SKB, such as firewalling state.
So you can get into situations where, for example, you can't unload
netfilter modules; attempts just hang the system.
The only workaround is to use a timer to purge the TX queue, and that's
far from acceptable in my opinion, because of the timer maintenance
overhead and the latency.
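To put a rough number on the latency concern, here is a toy calculation, assuming a purely periodic purge timer and time measured in abstract ticks. The names next_purge and hold_latency are hypothetical and no kernel timer API is modelled.

```c
#include <assert.h>

/* First firing of a periodic timer (period in ticks) at or after now. */
static unsigned long next_purge(unsigned long now, unsigned long period)
{
	assert(period > 0);
	return ((now + period - 1) / period) * period;
}

/* How long a completed SKB keeps holding its resources (socket,
 * firewalling state, ...) before the timer finally frees it. */
static unsigned long hold_latency(unsigned long done_at, unsigned long period)
{
	return next_purge(done_at, period) - done_at;
}
```

Under these assumptions an SKB that completes just after a timer tick is held for almost a full period, so shortening the period trades latency for more timer overhead.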
end of thread, other threads:[~2008-06-18 0:02 UTC | newest]
Thread overview: 52+ messages
[not found] <4821F3B7.2090702@krogh.cc>
[not found] ` <86802c440805071130m62c1f4edydb3316dac4a2aba2@mail.gmail.com>
2008-05-07 21:15 ` NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24) David Miller
2008-05-09 18:32 ` Jesper Krogh
2008-05-09 21:32 ` David Miller
2008-05-09 21:59 ` Jesper Krogh
2008-05-09 22:07 ` David Miller
2008-05-09 22:13 ` Jesper Krogh
2008-05-09 22:09 ` Matheos Worku
2008-05-09 22:15 ` Jesper Krogh
2008-05-09 22:36 ` Matheos Worku
2008-05-09 22:43 ` Matheos Worku
2008-05-09 22:46 ` David Miller
2008-05-09 23:10 ` Jesper Krogh
2008-05-09 23:21 ` Matheos Worku
2008-05-09 22:45 ` David Miller
2008-05-22 16:32 ` Jesper Krogh
2008-05-22 17:15 ` Ben Hutchings
2008-05-22 17:41 ` David Miller
2008-05-22 18:14 ` Ben Hutchings
2008-05-22 18:28 ` David Miller
2008-06-01 7:25 ` Andrey Panin
2008-06-01 16:01 ` David Miller
2008-05-09 22:20 ` Rick Jones
2008-05-09 22:48 ` Jesper Krogh
2008-05-09 23:03 ` Rick Jones
2008-05-09 23:13 ` Jesper Krogh
2008-05-09 23:33 ` Rick Jones
2008-05-09 23:08 ` David Dillow
2008-05-10 6:22 ` Jesper Krogh
2008-05-10 15:53 ` Roland Dreier
2008-05-12 6:49 ` Jesper Krogh
2008-05-10 2:20 ` Bill Fink
2008-05-10 11:01 ` Jesper Krogh
2008-05-11 4:34 ` David Miller
2008-05-11 5:44 ` Jesper Krogh
2008-05-11 6:08 ` David Miller
2008-05-11 9:47 ` Jesper Krogh
2008-05-12 6:52 ` Jesper Krogh
2008-05-26 19:03 ` Jesper Krogh
2008-05-26 19:33 ` David Miller
2008-05-26 19:39 ` David Miller
2008-05-26 20:54 ` Jesper Krogh
2008-05-26 22:15 ` David Miller
2008-05-26 22:21 ` Jesper Krogh
2008-05-26 22:30 ` David Miller
2008-05-27 6:19 ` Jesper Krogh
2008-05-28 1:18 ` Matheos Worku
2008-05-29 5:34 ` David Miller
2008-05-30 0:14 ` Matheos Worku
2008-05-30 7:00 ` David Miller
2008-06-16 18:09 ` Matheos Worku
2008-06-16 18:21 ` Jesper Krogh
2008-06-18 0:02 ` David Miller