public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Very high bandwith packet based interface and performance problems
@ 2001-02-21  2:19 Nye Liu
       [not found] ` <E14VXub-0001vv-00@the-village.bc.nu>
  0 siblings, 1 reply; 14+ messages in thread
From: Nye Liu @ 2001-02-21  2:19 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 308 bytes --]

I am working on a very high speed packet based interface, but we are having
severe problems related to bandwidth vs cpu horsepower. Enclosed is part
of a summary. PLEASE cc responses directly to spamnyet@nyet.org

Thanks!!!

-- 
"Who would be stupid enough to quote a fictitious character?"
	-- Don Quixote

[-- Attachment #2: Type: message/rfc822, Size: 3189 bytes --]

From: Nye Liu <nyet@zumanetworks.com>
To: Peter Thommen <pthommen@zumanetworks.com>
Cc: support@mvista.com
Subject: SI/ppc performance issues.
Date: Tue, 20 Feb 2001 18:03:56 -0800
Message-ID: <20010220180356.A1936@hobag.internal.zumanetworks.com>

Due to the limited horsepower of our ppc740 (as it has no cache) our
proprietary, 2 Gbit packet based interface is capable of overwhelming
the software throughput capabilities of the kernel. This congestion is
causing severe network performance issues in both UDP and TCP.  In UDP,
if the frame rate exceeds approximately 300Mbit (1500 byte packets),
the kernel usage goes to 100%, leaving no cpu power for user space
applications to even receive frames, causing severe queuing packet
loss. In the TCP case, there seem to be constant acks from the kernel,
but most data never seems to make it to user space.

Inspecting the /proc/net/dev and /proc/net/snmp counters reveals no errors.

As a control, the private 10/100 ethernet interface is capable of
sustaining 100Mbits of unidirectional UDP and TCP traffic with no problems.

Similarly, if a traffic policer is used to limit the load of the
proprietary high speed interface to approx. 200Mbits, there is no packet
loss in UDP. Since we lack a shaper, we can't test TCP reliably, as
the policer drops packets instead of shaping output. We can test this
qualitatively by artificially preventing the TCP source from sending
too quickly; we can do this by loading the source cpu heavily. However,
results from this are mixed. We seem to be able to attain only approx.
50-60Mbits by this method.

Questions:

There are two options in the 2.0 kernel. One is "Cpu is too slow for
network" or something similar. A second (driver specific option) is a
flow control mechanism.

In 2.4, the first seems to be missing. The second is only available for
a few drivers (e.g. tulip).

What do these options do?

In 2.4, what is the recommended way of keeping a high speed interface
from overwhelming the kernel network queue (e.g. Gig ethernet)?

Does this affect user space programs (e.g. ftpd, apache, etc)?

If so, how?

What are the mechanisms by which the Linux kernel drops frames?

Which mechanisms are accompanied by statistics, and what are they?

Which mechanisms are NOT accompanied by statistics?

Why is the kernel acking a blocked TCP stream? (i.e. when a user space
TCP program is unable to read from a socket because it is not being
scheduled due to kernel cpu load)

(todd... please comment, as this is a prelim document for the problem
description)

-Nye

-- 
"Who would be stupid enough to quote a fictitious character?"
	-- Don Quixote

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
       [not found] ` <E14VXub-0001vv-00@the-village.bc.nu>
@ 2001-02-21 22:00   ` Nye Liu
  2001-02-21 22:07     ` Alan Cox
  2001-02-21 22:27     ` Gregory Maxwell
  0 siblings, 2 replies; 14+ messages in thread
From: Nye Liu @ 2001-02-21 22:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Wed, Feb 21, 2001 at 11:58:23AM +0000, Alan Cox wrote:
> Dropping packets under load will make tcp do the right thing. You don't need
> complex mathematical models since dropping frames under load is just another
> form of congestion and tcp handles it pretty sanely

Alan: thanks for your response...

This is exactly what I would expect to see, but we are seeing something
else.

Under HEAVY load we are seeing approximately 20Mbit of TCP throughput. If
we "shape" (I use the term loosely; we don't actually have a real shaper,
we just load the cpu that is transmitting) the presented load, we can
get 60-70Mbit. I'm not quite sure why this is.  My first guess was
that because the kernel was getting 99% of the cpu, the application was
getting very little, and thus the read wasn't happening fast enough, and
the socket was blocking. In this case, you would expect the system to get
to a nice equilibrium, where if the app stopped reading, the kernel would
stop acking, and the transmitter would back off, eventually to a point
where the app could start reading again because the kernel load dropped.

This is NOT what I'm seeing at all.. the kernel load appears to be
pegged at 100% (or very close to it), the user space app is getting
enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
appears to be ACKING ALL the traffic, which I don't understand at all
(e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

With udp, we can get the full 300MBit throughput, but only if we shape
the load to 300Mbit. If we increase the load past 300 MBit, the received
frames (at the user space udp app) drops to 10-20MBit, again due to
user-space application scheduling problems.

-nye

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-21 22:00   ` Nye Liu
@ 2001-02-21 22:07     ` Alan Cox
  2001-02-21 22:11       ` Nye Liu
                         ` (3 more replies)
  2001-02-21 22:27     ` Gregory Maxwell
  1 sibling, 4 replies; 14+ messages in thread
From: Alan Cox @ 2001-02-21 22:07 UTC (permalink / raw)
  To: Nye Liu; +Cc: Alan Cox, linux-kernel

> that because the kernel was getting 99% of the cpu, the application was
> getting very little, and thus the read wasn't happening fast enough, and

Seems reasonable

> This is NOT what I'm seeing at all.. the kernel load appears to be
> pegged at 100% (or very close to it), the user space app is getting
> enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> appears to be ACKING ALL the traffic, which I don't understand at all
> (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

TCP _requires_ the remote end ack every 2nd frame regardless of progress.

> With udp, we can get the full 300MBit throughput, but only if we shape
> the load to 300Mbit. If we increase the load past 300 MBit, the received
> frames (at the user space udp app) drops to 10-20MBit, again due to
> user-space application scheduling problems.

How is your incoming traffic handled architecturally - irq per packet or
some kind of ring buffer with irq mitigation.  Do you know where the cpu
load is - is it mostly the irq servicing or mostly network stack ?




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-21 22:07     ` Alan Cox
@ 2001-02-21 22:11       ` Nye Liu
  2001-02-21 22:25         ` Alan Cox
  2001-02-22  1:24       ` Nye Liu
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Nye Liu @ 2001-02-21 22:11 UTC (permalink / raw)
  To: Alan Cox, linux-kernel

On Wed, Feb 21, 2001 at 10:07:32PM +0000, Alan Cox wrote:
> > that because the kernel was getting 99% of the cpu, the application was
> > getting very little, and thus the read wasn't happening fast enough, and
> 
> Seems reasonable
> 
> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)
> 
> TCP _requires_ the remote end ack every 2nd frame regardless of progress.
> 
> > With udp, we can get the full 300MBit throughput, but only if we shape
> > the load to 300Mbit. If we increase the load past 300 MBit, the received
> > frames (at the user space udp app) drops to 10-20MBit, again due to
> > user-space application scheduling problems.
> 
> How is your incoming traffic handled architecturally - irq per packet or
> some kind of ring buffer with irq mitigation.  Do you know where the cpu
> load is - is it mostly the irq servicing or mostly network stack ?
> 
> 

Alan: thanks again for your prompt response!

bus mastered DMA ring buffer. As to the load, I'm not quite sure... we
were using a fairly large ring buffer, but increasing/decreasing the size
didn't seem to affect the number of packets per interrupt. I added a
little watermarking code, and it seems that we do (at peak) about 30-35
packets per interrupt. That is STILL a heck of a lot of interrupts! I
can't quite figure out why the driver refuses to go deeper.

I can think of a couple of possible solutions. Our interface has a HUGE
amount of hardware buffering, so I can easily just stop reading for
a small time if we detect congestion... can you suggest a nice clean
mechanism for this?

any other ideas?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-21 22:11       ` Nye Liu
@ 2001-02-21 22:25         ` Alan Cox
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Cox @ 2001-02-21 22:25 UTC (permalink / raw)
  To: Nye Liu; +Cc: Alan Cox, linux-kernel

> I can think of a couple possible solutions. our interface has a HUGE
> amount of hardware buffers, so I can easily simply stop reading for
> a small time if we detect conjestion... can you suggest a nice clean
> mechanism for this?

If you have a lot of buffers you can try one thing to see if it's IRQ load:
turn the IRQ off, set a fast timer running, and hook the buffer handling to
the timer irq.

Next obvious step would be using the timer based irq handling to limit the
number of buffers you use netif_rx() on and discard any others.

Finally, don't rule out memory bandwidth: if the ram is main memory, then the
dma engine could be pretty much driving the cpu off the bus at high data
rates.

Alan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-21 22:00   ` Nye Liu
  2001-02-21 22:07     ` Alan Cox
@ 2001-02-21 22:27     ` Gregory Maxwell
  1 sibling, 0 replies; 14+ messages in thread
From: Gregory Maxwell @ 2001-02-21 22:27 UTC (permalink / raw)
  To: Nye Liu; +Cc: Alan Cox, linux-kernel

On Wed, Feb 21, 2001 at 02:00:55PM -0800, Nye Liu wrote:
[snip]
> This is NOT what I'm seeing at all.. the kernel load appears to be
> pegged at 100% (or very close to it), the user space app is getting
> enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> appears to be ACKING ALL the traffic, which I don't understand at all
> (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)
> 
> With udp, we can get the full 300MBit throughput, but only if we shape
> the load to 300Mbit. If we increase the load past 300 MBit, the received
> frames (at the user space udp app) drops to 10-20MBit, again due to
> user-space application scheduling problems.

Perhaps excess context switches are thrashing the system?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-21 22:07     ` Alan Cox
  2001-02-21 22:11       ` Nye Liu
@ 2001-02-22  1:24       ` Nye Liu
  2001-02-22  1:50         ` Rick Jones
  2001-02-22 10:14         ` Alan Cox
  2001-02-22  1:46       ` Rick Jones
  2001-02-22 21:48       ` Pavel Machek
  3 siblings, 2 replies; 14+ messages in thread
From: Nye Liu @ 2001-02-22  1:24 UTC (permalink / raw)
  To: Alan Cox, linux-kernel

On Wed, Feb 21, 2001 at 10:07:32PM +0000, Alan Cox wrote:
> > that because the kernel was getting 99% of the cpu, the application was
> > getting very little, and thus the read wasn't happening fast enough, and
> 
> Seems reasonable
> 
> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)
> 
> TCP _requires_ the remote end ack every 2nd frame regardless of progress.

YIPES. I didn't realize this was the case. How is end-to-end application
flow control handled when the bottleneck is user-space bound and not b/w
bound? e.g. if I write a test app that does a

while (1) {
    sleep(5);
    read(sock, buf, 1);
}

and the transmitter is unrestricted, what happens?

Does it have to do with TCP_FORMAL_WINDOW (e.g. automatically reducing the
window size to zero when the queue backs up)?

Or is it only a cpu loading problem? (i.e. is there a difference in queuing
behavior between 1) the user process doesn't get cycles and 2) the user
process simply fails to read?)

Also, I have been reading up on CONFIG_HW_FLOWCONTROL... what is the
recommended way for the driver to stop receiving? In the sample tulip
code I see you can register an xon callback, but I can't tell if there
is a way to see the backlog from the driver.

-nye

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-21 22:07     ` Alan Cox
  2001-02-21 22:11       ` Nye Liu
  2001-02-22  1:24       ` Nye Liu
@ 2001-02-22  1:46       ` Rick Jones
  2001-02-22 10:20         ` Alan Cox
  2001-02-22 21:48       ` Pavel Machek
  3 siblings, 1 reply; 14+ messages in thread
From: Rick Jones @ 2001-02-22  1:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: Nye Liu, linux-kernel

Alan Cox wrote:
> 
> > that because the kernel was getting 99% of the cpu, the application was
> > getting very little, and thus the read wasn't happening fast enough, and
> 
> Seems reasonable
> 
> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)
> 
> TCP _requires_ the remote end ack every 2nd frame regardless of progress.

um, I thought the spec says that ACK every 2nd segment is a SHOULD not a
MUST?

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-22  1:24       ` Nye Liu
@ 2001-02-22  1:50         ` Rick Jones
  2001-02-22 10:14         ` Alan Cox
  1 sibling, 0 replies; 14+ messages in thread
From: Rick Jones @ 2001-02-22  1:50 UTC (permalink / raw)
  To: Nye Liu; +Cc: Alan Cox, linux-kernel

> > > This is NOT what I'm seeing at all.. the kernel load appears to be
> > > pegged at 100% (or very close to it), the user space app is getting
> > > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > > appears to be ACKING ALL the traffic, which I don't understand at all
> > > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)
> >
> > TCP _requires_ the remote end ack every 2nd frame regardless of progress.
> 
> YIPES. I didn't realize this was the case.. how is end-to-end application
> flow control handled when the bottle neck is user space bound and not b/w
> bound? e.g. if i write a test app that does a

If the app is not reading from the socket buffer, the receiving TCP is
supposed to stop sending window-updates, and the sender is supposed to
stop sending data when it runs-out of window.

If TCP ACKs data, it really should (must?) not then later drop it on
the floor without aborting the connection. If a TCP is ACKing data and
then that data is dropped before it is given to the application, and the
connection is not being reset, that is probably a bug.

A TCP _is_ free to drop data prior to sending an ACK - it simply drops
it and does not ACK it.

rick jones

-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-22  1:24       ` Nye Liu
  2001-02-22  1:50         ` Rick Jones
@ 2001-02-22 10:14         ` Alan Cox
  1 sibling, 0 replies; 14+ messages in thread
From: Alan Cox @ 2001-02-22 10:14 UTC (permalink / raw)
  To: Nye Liu; +Cc: Alan Cox, linux-kernel

> and the transmitter is unrestricted, what happens?
> Does it have to do with TCP_FORMAL_WINDOW (eg. automatically reduce window
> size to zero when queue backs up?)

Read RFC1122. Basically your guess is right. The sender sends data, and gets
back acks saying 'window 0'. It will then do exponential backoff while
polling the zero window as it backs off (acks being unreliable).



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-22  1:46       ` Rick Jones
@ 2001-02-22 10:20         ` Alan Cox
  2001-02-22 18:12           ` Rick Jones
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Cox @ 2001-02-22 10:20 UTC (permalink / raw)
  To: Rick Jones; +Cc: Alan Cox, Nye Liu, linux-kernel

> > TCP _requires_ the remote end ack every 2nd frame regardless of progress.
> 
> um, I thought the spec says that ACK every 2nd segment is a SHOULD not a
> MUST?

Yes, it's a SHOULD in RFC1122, but in any normal environment it's pretty much
a must, and I know of no stack significantly violating it.

RFC1122 also requires that your protocol stack SHOULD be able to leap tall
buildings at a single bound, of course...


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-22 10:20         ` Alan Cox
@ 2001-02-22 18:12           ` Rick Jones
  2001-02-23 18:27             ` kuznet
  0 siblings, 1 reply; 14+ messages in thread
From: Rick Jones @ 2001-02-22 18:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: Nye Liu, linux-kernel

Alan Cox wrote:
> 
> > > TCP _requires_ the remote end ack every 2nd frame regardless of progress.
> >
> > um, I thought the spec says that ACK every 2nd segment is a SHOULD not a
> > MUST?
> 
> Yes its a SHOULD in RFC1122, but in any normal environment pretty much a
> must and I know of no stack significantly violating it.

I didn't know there was such a thing as a normal environment :)

> RFC1122 also requires that your protocol stack SHOULD be able to leap tall
> buldings at a single bound of course...

And, of course my protocol stack does :) It is also a floor wax, AND a
dessert topping!-)

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-21 22:07     ` Alan Cox
                         ` (2 preceding siblings ...)
  2001-02-22  1:46       ` Rick Jones
@ 2001-02-22 21:48       ` Pavel Machek
  3 siblings, 0 replies; 14+ messages in thread
From: Pavel Machek @ 2001-02-22 21:48 UTC (permalink / raw)
  To: Alan Cox, Nye Liu; +Cc: linux-kernel

Hi!

> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)
> 
> TCP _requires_ the remote end ack every 2nd frame regardless of
> progress.

Shouldn't TCP advertise a window of 0 to stop the sender?

Where does the kernel put all that data in the tcp case? I do not understand
that. The transmitter blasts at 300Mbit, userspace gets 20Mbit. There's a
280Mbit datastream going _somewhere_. It should be eating memory at
35MB/second; unless you have 1Gig of ram, something interesting should
happen after a minute or so...
								Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Very high bandwith packet based interface and performance problems
  2001-02-22 18:12           ` Rick Jones
@ 2001-02-23 18:27             ` kuznet
  0 siblings, 0 replies; 14+ messages in thread
From: kuznet @ 2001-02-23 18:27 UTC (permalink / raw)
  To: Rick Jones; +Cc: linux-kernel

Hello!

> > Yes its a SHOULD in RFC1122, but in any normal environment pretty much a
> > must and I know of no stack significantly violating it.
> 
> I didn't know there was such a thing as a normal environment :)

Jokes apart, such "normal" environments are rare today.

From tcpdumps it is clear that win2000 does not ack every other mss.
It can ack once per window at high load. I have seen the same behaviour
from solaris. freebsd-4.x surely does not ack every second mss
(this is clear from the source code), which is probably a bug (at least, it
stops acking at all as soon as MSG_WAITALL is used. 8))

Acking every second mss is required to do slow start more or less
quickly. As soon as the window is full, the extra acks are useless, so
win2000 is fully right and, in fact, optimal.

Alexey

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2001-02-23 18:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-02-21  2:19 Very high bandwith packet based interface and performance problems Nye Liu
     [not found] ` <E14VXub-0001vv-00@the-village.bc.nu>
2001-02-21 22:00   ` Nye Liu
2001-02-21 22:07     ` Alan Cox
2001-02-21 22:11       ` Nye Liu
2001-02-21 22:25         ` Alan Cox
2001-02-22  1:24       ` Nye Liu
2001-02-22  1:50         ` Rick Jones
2001-02-22 10:14         ` Alan Cox
2001-02-22  1:46       ` Rick Jones
2001-02-22 10:20         ` Alan Cox
2001-02-22 18:12           ` Rick Jones
2001-02-23 18:27             ` kuznet
2001-02-22 21:48       ` Pavel Machek
2001-02-21 22:27     ` Gregory Maxwell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox