public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* sendfile+zerocopy: fairly sexy (nothing to do with ECN)
@ 2001-01-27  5:45 Andrew Morton
  2001-01-27  6:20 ` Aaron Lehmann
                   ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Andrew Morton @ 2001-01-27  5:45 UTC (permalink / raw)
  To: lkml, netdev@oss.sgi.com

(Please keep netdev copied, else Jamal will grump at you, and
 you don't want that).

I've whacked together some tools to measure TCP throughput
with both sendfile and read/write.  I've tested with and
without the zerocopy patch.

The CPU load figures are very accurate: the tool uses a `subtractive'
algorithm which measures how much CPU is left over by the networking
code, rather than trying to measure how much CPU load the networking
stuff actually takes, if you see what I mean.  This accounts accurately
for CPU load, interrupts, softirq processing, memory bandwidth utilisation
and cache pollution.

The client is a 650 MHz PIII.  The NIC is a 3CCFE575CT Cardbus 3com.
It supports Scatter/Gather and hardware checksums.  The NIC's interrupt
is shared with the Cardbus controller, so this will impact throughput
slightly.

The kernels which were tested were 2.4.1-pre10 with and without the
zerocopy patch.  We only look at client load (the TCP sender).

The link throughput was 11.5 mbytes/sec at all times (saturated 100baseT)

2.4.1-pre10-vanilla, using sendfile():          29.6% CPU
2.4.1-pre10-vanilla, using read()/write():      34.5% CPU

2.4.1-pre10+zerocopy, using sendfile():         18.2% CPU
2.4.1-pre10+zerocopy, using read()/write():     38.1% CPU

2.4.1-pre10+zerocopy, using sendfile():         22.9% CPU    * hardware tx checksums disabled
2.4.1-pre10+zerocopy, using read()/write():     39.2% CPU    * hardware tx checksums disabled


What can we conclude?

- sendfile is 10% cheaper than read()-then-write() on 2.4.1-pre10.

- sendfile() with the zerocopy patch is 40% cheaper than
  sendfile() without the zerocopy patch.

- hardware Tx checksums don't make much difference.  hmm...

Bear in mind that the 3c59x driver uses a one-interrupt-per-packet
algorithm.  Mitigation reduces this to 0.3 ints/packet.
So we're absorbing 4,500 interrupts/sec while processing
12,000 packets/sec.  gigE NICs do much better mitigation than
this and the relative benefits of zerocopy will be much higher
for these.  Hopefully Jamal can do some testing.

BTW: I could not reproduce Jamal's oops when sending large
files (2 gigs with sendfile()).

The test tool is, of course, documented [ :-)/2 ].  It's at

	http://www.uow.edu.au/~andrewm/linux/#zc

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27  5:45 sendfile+zerocopy: fairly sexy (nothing to do with ECN) Andrew Morton
@ 2001-01-27  6:20 ` Aaron Lehmann
  2001-01-27  8:19   ` Andrew Morton
  2001-01-27 10:05 ` Ion Badulescu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 56+ messages in thread
From: Aaron Lehmann @ 2001-01-27  6:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com

On Sat, Jan 27, 2001 at 04:45:43PM +1100, Andrew Morton wrote:
> 2.4.1-pre10-vanilla, using read()/write():      34.5% CPU
> 2.4.1-pre10+zerocopy, using read()/write():     38.1% CPU

Am I right to be bothered by this?

The majority of Unix network traffic is handled with read()/write().
Why would zerocopy slow that down?

If zerocopy is simply unoptimized, that's fine for now. But if the
problem is inherent in the implementation or design, that might be a
problem. Any patch which incurs a significant slowdown on traditional
networking should be controversial.

Aaron Lehmann

please ignore me if I don't know what I'm talking about.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27  6:20 ` Aaron Lehmann
@ 2001-01-27  8:19   ` Andrew Morton
  2001-01-27 10:09     ` Ion Badulescu
  2001-01-30  6:00     ` David S. Miller
  0 siblings, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2001-01-27  8:19 UTC (permalink / raw)
  To: Aaron Lehmann; +Cc: lkml, netdev@oss.sgi.com

Aaron Lehmann wrote:
> 
> On Sat, Jan 27, 2001 at 04:45:43PM +1100, Andrew Morton wrote:
> > 2.4.1-pre10-vanilla, using read()/write():      34.5% CPU
> > 2.4.1-pre10+zerocopy, using read()/write():     38.1% CPU
> 
> Am I right to be bothered by this?
> 
> The majority of Unix network traffic is handled with read()/write().
> Why would zerocopy slow that down?
> 
> If zerocopy is simply unoptimized, that's fine for now. But if the
> problem is inherent in the implementation or design, that might be a
> problem. Any patch which incurs a significant slowdown on traditional
> networking should be controversial.

Good point.

The figures I quoted for the no-hw-checksum case were still
using scatter/gather.  That can be turned off as well and
it makes it a tiny bit quicker.  So the table is now:

2.4.1-pre10-vanilla, using sendfile():          29.6% CPU
2.4.1-pre10-vanilla, using read()/write():      34.5% CPU

2.4.1-pre10+zerocopy, using sendfile():         18.2% CPU
2.4.1-pre10+zerocopy, using read()/write():     38.1% CPU

2.4.1-pre10+zerocopy, using sendfile():         22.9% CPU    * hardware tx checksums disabled
2.4.1-pre10+zerocopy, using read()/write():     39.2% CPU    * hardware tx checksums disabled

2.4.1-pre10+zerocopy, using sendfile():         22.4% CPU    * hardware tx checksums and SG disabled
2.4.1-pre10+zerocopy, using read()/write():     38.5% CPU    * hardware tx checksums and SG disabled

But that's not relevant.

I just retested everything.  Yes, the zerocopy patch does
appear to decrease the efficiency of TCP on non-SG+checksumming
hardware by 5% - 10%.  Others need to test...


With an RTL8139/8139too.  CPU is 500MHz PII Celeron, uniprocessor:

2.4.1-pre10-vanilla, using sendfile():          43.8% CPU
2.4.1-pre10-vanilla, using read()/write():      54.1% CPU

2.4.1-pre10+zerocopy, using sendfile():         43.1% CPU
2.4.1-pre10+zerocopy, using read()/write():     55.5% CPU

Note that the 8139 only gets 10.8 Mbytes/sec here.  It randomly
jumps up to 11.5 occasionally, but spends most of its time at
10.8. Hard to know what to make of this.  Of course, if you're
using an 8139 you don't care about performance anyway :)


Contradictory results.  rtl8139 doesn't do Rx checksums,
and I think it has an extra copy in the driver, so caching effects
may be obscuring things here.

I can test with eepro100 in a couple of days.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27  5:45 sendfile+zerocopy: fairly sexy (nothing to do with ECN) Andrew Morton
  2001-01-27  6:20 ` Aaron Lehmann
@ 2001-01-27 10:05 ` Ion Badulescu
  2001-01-27 10:39   ` Andrew Morton
  2001-01-27 12:49   ` jamal
  2001-01-27 12:43 ` sendfile+zerocopy: fairly sexy (nothing to do with ECN) jamal
       [not found] ` <200101271854.VAA02845@ms2.inr.ac.ru>
  3 siblings, 2 replies; 56+ messages in thread
From: Ion Badulescu @ 2001-01-27 10:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com

On Sat, 27 Jan 2001 16:45:43 +1100, Andrew Morton <andrewm@uow.edu.au> wrote:

> The client is a 650 MHz PIII.  The NIC is a 3CCFE575CT Cardbus 3com.
> It supports Scatter/Gather and hardware checksums.  The NIC's interrupt
> is shared with the Cardbus controller, so this will impact throughput
> slightly.
> 
> The kernels which were tested were 2.4.1-pre10 with and without the
> zerocopy patch.  We only look at client load (the TCP sender).
> 
> The link throughput was 11.5 mbytes/sec at all times (saturated 100baseT)
> 
> 2.4.1-pre10-vanilla, using sendfile():          29.6% CPU
> 2.4.1-pre10-vanilla, using read()/write():      34.5% CPU
> 
> 2.4.1-pre10+zerocopy, using sendfile():         18.2% CPU
> 2.4.1-pre10+zerocopy, using read()/write():     38.1% CPU
> 
> 2.4.1-pre10+zerocopy, using sendfile():         22.9% CPU    * hardware tx checksums disabled
> 2.4.1-pre10+zerocopy, using read()/write():     39.2% CPU    * hardware tx checksums disabled

750MHz PIII, Adaptec Starfire NIC, driver modified to use hardware sg+csum
(both Tx/Rx), and Intel i82559 (eepro100), no hardware csum support,
vanilla driver.

The box has 512MB of RAM, and I'm using a 100MB file, so it's entirely cached.

starfire:
2.4.1-pre10+zerocopy, using sendfile():		 9.6% CPU
2.4.1-pre10+zerocopy, using read()/write():	18.3%-29.6% CPU		* why so much variance?

2.4.1-pre10+zerocopy, using sendfile():		17.4% CPU		* hardware csum disabled
2.4.1-pre10+zerocopy, using read()/write():	16.5%-26.8% CPU		* idem, again why so much variance?

2.4.1-pre10-vanilla, using sendfile():		16.5% CPU
2.4.1-pre10-vanilla, using read()/write():	14.5%-24.5% CPU		* high variance again

eepro100:
2.4.1-pre10+zerocopy, using sendfile():		16.0% CPU
2.4.1-pre10+zerocopy, using read()/write():	15.0%-24.5% CPU		* why so much variance?

2.4.1-pre10-vanilla, using sendfile():		16.7% CPU
2.4.1-pre10-vanilla, using read()/write():	14.5%-24.6% CPU		* high variance again

The read+write case is really weird. I'm getting results like this:

CPU load: 27.9491
CPU load: 25.4763
CPU load: 15.8544
CPU load: 25.455
CPU load: 25.2072
CPU load: 15.8677
CPU load: 25.4896
CPU load: 25.2791
CPU load: 15.8837

i.e. 2 slow, 1 fast, 2 slow, 1 fast, and so on and so forth.

> What can we conclude?
> 
> - sendfile is 10% cheaper than read()-then-write() on 2.4.1-pre10.

Hard to tell, with such inconclusive results...

> - sendfile() with the zerocopy patch is 40% cheaper than
>   sendfile() without the zerocopy patch.

Indeed. Close to 50% in fact.

> - hardware Tx checksums don't make much difference.  hmm...

Actually it makes all the difference in the world for the starfire.
Interesting...

> Bear in mind that the 3c59x driver uses a one-interrupt-per-packet
> algorithm.  Mitigation reduces this to 0.3 ints/packet.
> So we're absorbing 4,500 interrupts/sec while processing
> 12,000 packets/sec.  gigE NICs do much better mitigation than
> this and the relative benefits of zerocopy will be much higher
> for these.  Hopefully Jamal can do some testing.

Hmm.. the starfire also has quite advanced interrupt mitigation,
but I have not played with it. Maybe tomorrow. So these results
are with one-interrupt-per-packet.

P.S. The starfire still doesn't like tinygrams (skb's with 1-byte
fragments). Fortunately your test program doesn't seem to generate
them. :-)

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27  8:19   ` Andrew Morton
@ 2001-01-27 10:09     ` Ion Badulescu
  2001-01-27 10:45       ` Andrew Morton
  2001-01-30  6:00     ` David S. Miller
  1 sibling, 1 reply; 56+ messages in thread
From: Ion Badulescu @ 2001-01-27 10:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com, Aaron Lehmann

On Sat, 27 Jan 2001 19:19:01 +1100, Andrew Morton <andrewm@uow.edu.au> wrote:

> The figures I quoted for the no-hw-checksum case were still
> using scatter/gather.  That can be turned off as well and
> it makes it a tiny bit quicker.

Hmm. Are you sure the differences are not just noise? Unless you
modified the zerocopy patch yourself, it won't use SG without
checksums...

In fact it would be interesting to revert that policy and
see how much SG alone helps. Probably not much, since the
CPU checksumming is close to onecopy.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 10:05 ` Ion Badulescu
@ 2001-01-27 10:39   ` Andrew Morton
  2001-01-27 12:49   ` jamal
  1 sibling, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2001-01-27 10:39 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: lkml, netdev@oss.sgi.com

Ion Badulescu wrote:
> 
> 2.4.1-pre10+zerocopy, using read()/write():     18.3%-29.6% CPU         * why so much variance?

The variance is presumably because of the naive read/write
implementation.  It sucks in 16 megs and writes it out again.
With a 100 megabyte file you'll get aliasing effects between
the sampling interval and the client's activity.

You will get more repeatable results using smaller files.  I'm
just sending /usr/local/bin/* ten times, with

./zcc -s otherhost -c /usr/local/bin/* -n10 -N2 -S

Maybe that 16 meg buffer should be shorter...  Yes, making it
smaller smooths things out.

Heh, look at this.  It's a simple read-some, send-some loop.
Plot CPU utilisation against the transfer size:

Size           %CPU


256             31
512             25
1024            22
2048            18
4096            17
8192            16
16384           18
32768           19
65536           21
128k            22
256k            22.5

8192 bytes is best.

I've added the `-b' option to zcc to set the transfer size.  Same
URL.
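
For reference, the read-some, send-some loop being measured is
essentially the sketch below (illustrative only, not the actual zcc
source; the function name is made up):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Naive read-some/send-some loop: copy everything from in_fd to
 * out_fd in chunks of at most bufsize bytes.  Returns the number
 * of bytes moved, or -1 on error. */
static long copy_chunked(int in_fd, int out_fd, char *buf, size_t bufsize)
{
        long total = 0;

        for (;;) {
                ssize_t n = read(in_fd, buf, bufsize);
                ssize_t off = 0;

                if (n == 0)
                        return total;           /* EOF: all done */
                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -1;
                }
                while (off < n) {               /* handle short writes */
                        ssize_t w = write(out_fd, buf + off, n - off);
                        if (w < 0) {
                                if (errno == EINTR)
                                        continue;
                                return -1;
                        }
                        off += w;
                }
                total += n;
        }
}
```

The sweet spot in the table (8192 bytes) is plausibly where the buffer
still fits comfortably in cache but the per-syscall overhead has
stopped dominating.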


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 10:09     ` Ion Badulescu
@ 2001-01-27 10:45       ` Andrew Morton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2001-01-27 10:45 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: lkml, netdev@oss.sgi.com, Aaron Lehmann

Ion Badulescu wrote:
> 
> On Sat, 27 Jan 2001 19:19:01 +1100, Andrew Morton <andrewm@uow.edu.au> wrote:
> 
> > The figures I quoted for the no-hw-checksum case were still
> > using scatter/gather.  That can be turned off as well and
> > it makes it a tiny bit quicker.
> 
> Hmm. Are you sure the differences are not just noise?

I don't think so.  It's all pretty repeatable.

> Unless you
> modified the zerocopy patch yourself, it won't use SG without
> checksums...

I believe it in fact does use SG when hardware tx checksums are unavailable,
but this capability will be removed RSN because userspace can scribble
on the pagecache after the checksum has been calculated, and before
the frame has hit the wire.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27  5:45 sendfile+zerocopy: fairly sexy (nothing to do with ECN) Andrew Morton
  2001-01-27  6:20 ` Aaron Lehmann
  2001-01-27 10:05 ` Ion Badulescu
@ 2001-01-27 12:43 ` jamal
  2001-01-27 13:29   ` Andrew Morton
  2001-01-29 18:50   ` Rick Jones
       [not found] ` <200101271854.VAA02845@ms2.inr.ac.ru>
  3 siblings, 2 replies; 56+ messages in thread
From: jamal @ 2001-01-27 12:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com



On Sat, 27 Jan 2001, Andrew Morton wrote:

> (Please keep netdev copied, else Jamal will grump at you, and
>  you don't want that).
>

Thanks, Andrew ;-> Isn't netdev where networking stuff should be
discussed? I think I give up and will join lk, RSN ;->

> The kernels which were tested were 2.4.1-pre10 with and without the
> zerocopy patch.  We only look at client load (the TCP sender).
>
> The link throughput was 11.5 mbytes/sec at all times (saturated 100baseT)
>
> 2.4.1-pre10-vanilla, using sendfile():          29.6% CPU
> 2.4.1-pre10-vanilla, using read()/write():      34.5% CPU
>
> 2.4.1-pre10+zerocopy, using sendfile():         18.2% CPU
> 2.4.1-pre10+zerocopy, using read()/write():     38.1% CPU
>
> 2.4.1-pre10+zerocopy, using sendfile():         22.9% CPU    * hardware tx checksums disabled
> 2.4.1-pre10+zerocopy, using read()/write():     39.2% CPU    * hardware tx checksums disabled
>
>
> What can we conclude?
>
> - sendfile is 10% cheaper than read()-then-write() on 2.4.1-pre10.
>
> - sendfile() with the zerocopy patch is 40% cheaper than
>   sendfile() without the zerocopy patch.
>

It is also useful to have both client and server stats.
BTW, since the laptop (with the 3C card) is the client, the SG
shouldn't kick in at all.

> - hardware Tx checksums don't make much difference.  hmm...
>
> Bear in mind that the 3c59x driver uses a one-interrupt-per-packet
> algorithm.  Mitigation reduces this to 0.3 ints/packet.
> So we're absorbing 4,500 interrupts/sec while processing
> 12,000 packets/sec.  gigE NICs do much better mitigation than
> this and the relative benefits of zerocopy will be much higher
> for these.  Hopefully Jamal can do some testing.
>

I don't have my babies right now, but I'll test as soon as I can get
access to them.

> BTW: I could not reproduce Jamal's oops when sending large
> files (2 gigs with sendfile()).

Alexey was concerned about this. Good. But maybe it will still
happen with my setup. We'll see.

>
> The test tool is, of course, documented [ :-)/2 ].  It's at
>
> 	http://www.uow.edu.au/~andrewm/linux/#zc
>

I'll give this a shot later. Can you try with the sendfiled-ttcp?
http://www.cyberus.ca/~hadi/ttcp-sf.tar.gz
Anyways, you are NIC-challenged ;-> Get GigE. 100Mbps doesn't give
much information.

cheers,
jamal


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 10:05 ` Ion Badulescu
  2001-01-27 10:39   ` Andrew Morton
@ 2001-01-27 12:49   ` jamal
  2001-01-30  1:06     ` Ion Badulescu
  1 sibling, 1 reply; 56+ messages in thread
From: jamal @ 2001-01-27 12:49 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com



On Sat, 27 Jan 2001, Ion Badulescu wrote:

>
> 750MHz PIII, Adaptec Starfire NIC, driver modified to use hardware sg+csum
> (both Tx/Rx), and Intel i82559 (eepro100), no hardware csum support,
> vanilla driver.
>
> The box has 512MB of RAM, and I'm using a 100MB file, so it's entirely cached.
>
> starfire:
> 2.4.1-pre10+zerocopy, using sendfile():		 9.6% CPU
> 2.4.1-pre10+zerocopy, using read()/write():	18.3%-29.6% CPU		* why so much variance?
>

What are your throughput numbers?

Could you also, please, test using:

http://www.cyberus.ca/~hadi/ttcp-sf.tar.gz

post both sender and receiver data. Repeat each test about
5 times.

cheers,
jamal



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 12:43 ` sendfile+zerocopy: fairly sexy (nothing to do with ECN) jamal
@ 2001-01-27 13:29   ` Andrew Morton
  2001-01-27 14:15     ` jamal
  2001-01-29 18:50   ` Rick Jones
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2001-01-27 13:29 UTC (permalink / raw)
  To: jamal; +Cc: lkml, netdev@oss.sgi.com

jamal wrote:
> 
> ..
> It is also useful to have both client and server stats.
> BTW, since the laptop (with the 3C card) is the client, the SG
> shouldnt kick in at all.

The `client' here is doing the sendfiling, so yes, the
gathering occurs on the client.

> ...
> > The test tool is, of course, documented [ :-)/2 ].  It's at
> >
> >       http://www.uow.edu.au/~andrewm/linux/#zc
> >
> 
> I'll give this a shot later. Can you try with the sendfiled-ttcp?
> http://www.cyberus.ca/~hadi/ttcp-sf.tar.gz

hmm..  I didn't bother with TCP_CORK because the files being
sent are "much" larger than a frame.  Guess I should.

The problem with things like ttcp is the measurement of CPU load.
If your network is so fast that your machine can't keep up then
fine, raw throughput is a good measure. But if the link is saturated
then normal process accounting doesn't cut it.

For example, at 100 mbps, `top' says ttcp is chewing 4% CPU. But guess
what?  A low-priority process running on the same machine is in fact
slowed down by 30%.  top lies.  Most of the cost of the networking layer
is being accounted to swapper, and lost.  And who accounts for cache
eviction, bus utilisation, etc.  We're better off measuring what's
left behind, rather than measuring what is consumed.

You can in fact do this with ttcp: run it with a super-high priority
and run a little task in the background (dummyload.c in the above
tarball does this).  See how much the dummy task is slowed down
wrt an unloaded system.  It gets tricky on SMP though.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 13:29   ` Andrew Morton
@ 2001-01-27 14:15     ` jamal
  2001-01-28 16:05       ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: jamal @ 2001-01-27 14:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com



On Sun, 28 Jan 2001, Andrew Morton wrote:

> jamal wrote:
> >
> > ..
> > It is also useful to have both client and server stats.
> > BTW, since the laptop (with the 3C card) is the client, the SG
> > shouldnt kick in at all.
>
> The `client' here is doing the sendfiling, so yes, the
> gathering occurs on the client.
>

OK, semantics. Maybe we should stick to sender and receiver.
(a "server" normally translates to the box that "serves" files)

> > I'll give this a shot later. Can you try with the sendfiled-ttcp?
> > http://www.cyberus.ca/~hadi/ttcp-sf.tar.gz
>
> hmm..  I didn't bother with TCP_CORK because the files being
> sent are "much" larger than a frame.  Guess I should.

It doesn't make much sense to use sendfile without TCP_CORK.
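
For what it's worth, the cork-around-sendfile pattern under discussion
looks something like the sketch below (illustrative, with error
checking trimmed; not taken from either test tool):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Cork the socket for the duration of the transfer so the stack
 * builds full frames instead of flushing a short segment at every
 * call boundary; uncorking at the end pushes out the tail. */
static long send_file_corked(int sock, int fd)
{
        int on = 1, off = 0;
        struct stat st;
        off_t pos = 0;
        long sent = 0;

        if (fstat(fd, &st) < 0)
                return -1;
        setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
        while (pos < st.st_size) {
                ssize_t n = sendfile(sock, fd, &pos, st.st_size - pos);
                if (n <= 0) {
                        sent = -1;
                        break;
                }
                sent += n;
        }
        setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
        return sent;
}
```

With files much larger than a frame the cork mostly matters at the
start and end of the transfer, which matches Andrew's observation
above.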

> The problem with things like ttcp is the measurement of CPU load.
> If your network is so fast that your machine can't keep up then
> fine, raw throughput is a good measure. But if the link is saturated
> then normal process accounting doesn't cut it.

ttcp's CPU measure is not the best.  Part of my plan was to change that.
It uses times(), so the measurement is not good.  In fact it is not
very reflective on SMP; the way to do it there is to break it down
by CPU.
Throughput: 100Mbps is really nothing.  Linux never had a problem with
400-500Mbps file serving.  So throughput is an important number.  So is
end-to-end latency, but in the file-serving case latency might not be a
big deal, so ignore it.

> For example, at 100 mbps, `top' says ttcp is chewing 4% CPU. But guess
> what?  A low-priority process running on the same machine is in fact
> slowed down by 30%.  top lies.  Most of the cost of the networking layer
> is being accounted to swapper, and lost.  And who accounts for cache
> eviction, bus utilisation, etc.  We're better off measuring what's
> left behind, rather than measuring what is consumed.
>
> You can in fact do this with ttcp: run it with a super-high priority
> and run a little task in the background (dummyload.c in the above
> tarball does this).  See how much the dummy task is slowed down
> wrt an unloaded system.  It gets tricky on SMP though.
>

The best way to do CPU measurement is via /proc, the way top
does it.  You measure it from within your nettest program.  This does
measure what is "left behind", since your proggie is in user space.
Actually, it shouldn't matter whether you do it from your test program or
from dummyload.c.  With dummyload you might have to sigkill the program
every time a test terminates.
You also should break down utilization by CPU.
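
Reading the per-CPU lines boils down to something like this (a sketch
assuming the 2.4-era /proc/stat fields user/nice/system/idle; the
helper name is made up, it's not from ttcp or zcc):

```c
#include <assert.h>
#include <stdio.h>

/* Parse one "cpuN user nice system idle" line, as found in the
 * 2.4-era /proc/stat, and return the busy fraction of that CPU.
 * Utilization over an interval is the difference between two
 * such samples, which is how top derives its numbers. */
static double cpu_busy_fraction(const char *line)
{
        long user, nice, sys, idle, total;

        if (sscanf(line, "%*s %ld %ld %ld %ld",
                   &user, &nice, &sys, &idle) != 4)
                return -1.0;
        total = user + nice + sys + idle;
        return total ? (double)(user + nice + sys) / total : 0.0;
}
```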

cheers,
jamal

PS: can you try it out with the ttcp test code I posted?


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
       [not found] ` <200101271854.VAA02845@ms2.inr.ac.ru>
@ 2001-01-28  5:34   ` Andrew Morton
  2001-01-28 13:37     ` Felix von Leitner
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2001-01-28  5:34 UTC (permalink / raw)
  To: kuznet; +Cc: netdev, lkml

kuznet@ms2.inr.ac.ru wrote:
> 
> Hello!
> 
> > 2.4.1-pre10+zerocopy, using read()/write():     38.1% CPU
> 
> write() on zc card is worse than normal write() by definition.
> It generates split buffers.

yes.  The figures below show this.  Disabling SG+checksums speeds
up write() and send().

> Split buffers are more expensive and we have to pay for this.
> You have paid too much for slow card though. 8)
>
> Do you measure load correctly?

Yes.  Quite confident about this.  Here's the algorithm:

1: Run a cycle-soaker on each CPU on an otherwise unloaded
   system.  See how much "work" they all do per second.

2: Run the cycle-soakers again, but with network traffic happening.
   See how much their "work" is reduced. Deduce networking CPU load
   from this difference.

   The networking code all runs SCHED_FIFO or in interrupt context,
   so the cycle-soakers have no effect upon the network code's access
   to the CPU.

   The "cycle-soakers" just sit there spinning and dirtying 10,000
   cachelines per second.
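
In code, the subtractive idea boils down to something like this (an
illustrative sketch, not the actual zcc source; the names are made up):

```c
#include <assert.h>
#include <stddef.h>

/* One "unit of work" for the cycle-soaker: dirty a spread of
 * cachelines so the soaker is sensitive to cache eviction and
 * memory bandwidth as well as raw CPU time. */
static void soak_once(volatile char *buf, size_t nlines)
{
        for (size_t i = 0; i < nlines; i++)
                buf[i * 64]++;          /* one dirty cacheline per step */
}

/* Subtractive accounting: whatever work rate the soaker loses under
 * network load is attributed to the networking code, interrupts,
 * softirqs and cache pollution included. */
static double net_cpu_load(double idle_work_per_sec,
                           double loaded_work_per_sec)
{
        return 100.0 * (idle_work_per_sec - loaded_work_per_sec)
                     / idle_work_per_sec;
}
```

Run the soaker at normal priority while the traffic runs SCHED_FIFO or
in interrupt context, sample both work rates, and the difference is the
networking load.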

> > 2.4.1-pre10+zerocopy, using read()/write():     39.2% CPU    * hardware tx checksums disabled
> 
> This is illegal combination of parameters. You force two memory accesses,
> doing this. The fact that it does not add to load is dubious. 8)8)

mm.. Perhaps with read()/write() the data is already in cache?

Anyway, I've tweaked up the tool again so it can do send() or
write() (then I looked at the implementation and wondered why
I'd bothered).  It also does TCP_CORK now.

I ran another set of tests.  The zerocopy patch improves sendfile()
hugely but slows down send()/write() significantly, with a 3c905C:

http://www.uow.edu.au/~andrewm/linux/#zc



The kernels which were tested were 2.4.1-pre10 with and without the
zerocopy patch.  We only look at client load (the TCP sender).

In all tests the link throughput was 11.5 mbytes/sec
(saturated 100baseT), unless otherwise noted.

The client (the thing which sends data) is a dual 500MHz PII with a
3c905C.

For the write() and send() tests, the chunk size was 64 kbytes.

The workload was 63 files with an average length of 350 kbytes.

                                                     CPU

    2.4.1-pre10+zerocopy, using sendfile():          9.6%
    2.4.1-pre10+zerocopy, using send():             24.1%
    2.4.1-pre10+zerocopy, using write():            24.2%

    2.4.1-pre10+zerocopy, using sendfile():         16.2%       * checksums and SG disabled
    2.4.1-pre10+zerocopy, using send():             21.5%       * checksums and SG disabled
    2.4.1-pre10+zerocopy, using write():            21.5%       * checksums and SG disabled



    2.4.1-pre10-vanilla, using sendfile():          17.1%
    2.4.1-pre10-vanilla, using send():              21.1%
    2.4.1-pre10-vanilla, using write():             21.1%


Bearing in mind that a large amount of the load is in the device
driver, the zerocopy patch makes a large improvement in sendfile
efficiency.  But send() and write() performance is decreased by 10% -
more than this if you factor out the constant device driver overhead.

TCP_CORK makes no difference.  The files being sent are much larger
than a single frame.

Conclusions:

  For a NIC which cannot do scatter/gather/checksums, the zerocopy
  patch makes no change in throughput in any case.

  For a NIC which can do scatter/gather/checksums, sendfile()
  efficiency is improved by 40% and send() efficiency is decreased by
  10%.  The increase and decrease caused by the zerocopy patch will in
  fact be significantly larger than these two figures, because the
  measurements here include a constant base load caused by the device
  driver.
 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-28  5:34   ` Andrew Morton
@ 2001-01-28 13:37     ` Felix von Leitner
  2001-01-28 14:11       ` Dan Hollis
                         ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Felix von Leitner @ 2001-01-28 13:37 UTC (permalink / raw)
  To: lkml

Thus spake Andrew Morton (andrewm@uow.edu.au):
> Conclusions:

>   For a NIC which cannot do scatter/gather/checksums, the zerocopy
>   patch makes no change in throughput in any case.

>   For a NIC which can do scatter/gather/checksums, sendfile()
>   efficiency is improved by 40% and send() efficiency is decreased by
>   10%.  The increase and decrease caused by the zerocopy patch will in
>   fact be significantly larger than these two figures, because the
>   measurements here include a constant base load caused by the device
>   driver.

What is missing here is a good authoritative web resource that tells
people which NIC to buy.

I have a tulip NIC because a few years ago that apparently was the NIC
of choice.  It has good multicast (which is important to me), but AFAIK
it has neither scatter-gather nor hardware checksumming.

Is there such a web page already?
If not, I volunteer to create and maintain one.

Felix

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-28 13:37     ` Felix von Leitner
@ 2001-01-28 14:11       ` Dan Hollis
  2001-01-28 14:27       ` Andi Kleen
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 56+ messages in thread
From: Dan Hollis @ 2001-01-28 14:11 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: lkml

On Sun, 28 Jan 2001, Felix von Leitner wrote:
> What is missing here is a good authoritative web resource that tells
> people which NIC to buy.
> I have a tulip NIC because a few years ago that apparently was the NIC
> of choice.  It has good multicast (which is important to me), but AFAIK
> it has neither scatter-gather nor hardware checksumming.
> Is there such a web page already?

http://www.anime.net/~goemon/cardz/

Based on discussions I've had with Donald Becker about chipsets.

For 100bt, 3c905C is the most efficient card at the moment.
I've no idea about gigglebit ethernet.

-Dan


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-28 13:37     ` Felix von Leitner
  2001-01-28 14:11       ` Dan Hollis
@ 2001-01-28 14:27       ` Andi Kleen
  2001-01-29 21:50         ` Pavel Machek
  2001-01-28 19:43       ` Gregory Maxwell
  2001-01-28 19:48       ` Choosing Linux NICs (was: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)) Felix von Leitner
  3 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2001-01-28 14:27 UTC (permalink / raw)
  To: lkml; +Cc: leitner

On Sun, Jan 28, 2001 at 02:37:48PM +0100, Felix von Leitner wrote:
> What is missing here is a good authoritative web resource that tells
> people which NIC to buy.
> 
> I have a tulip NIC because a few years ago that apparently was the NIC
> of choice.  It has good multicast (which is important to me), but AFAIK
> it has neither scatter-gather nor hardware checksumming.
> 
> Is there such a web page already?
> If not, I volunteer to create and maintain one.

Here is an attempt for FastEthernet. Corrections/additions welcome.

Currently the 3c9xx cards look like the best commonly affordable ones,
at least if you care about zero-copy networking. The newer ones have
all the necessary toys for it.
Don't use them with the 3com vendor driver though; their driver is crap.

eepro100 seems to have mostly the same facilities, but Intel doesn't
document them fully, so they are not usable in the standard Linux
driver. Intel has its own driver available (e100.c) which does more,
but it of course lags behind the mainline stack.

I don't know about starfire, but it seems to be hard to even buy them anyway.

Sun HME seems to be on a similar level to the 3c9xx, but is near
impossible to buy, or very expensive.

Realtek is OK for low cost and being relatively hassle-free, but the
cards lack lots of useful facilities and basically require a copy on RX.

SMC epic/100 is handicapped by not being able to receive to unaligned
addresses, which in Linux requires a driver-level copy (that may change
in 2.5 though; the zero-copy infrastructure makes it possible to copy
only the header in this case, not the whole packet).

AMD pcnet afaik doesn't do hardware checksums.

Tulip doesn't do hardware checksums and is a bit constrained by the
long-word alignment required on RX (causing problems with misaligned IP
headers; see the note on epic100 above).

An advantage of Tulip and AMD is that, in my experience, they perform
much better on half-duplex Ethernet than other cards, because they use
a modified, patented backoff scheme. Without it, Linux 2.1+ tends to
suffer badly from Ethernet congestion by colliding with its own ACKs,
probably because it sends too fast.

-Andi



* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 14:15     ` jamal
@ 2001-01-28 16:05       ` Andrew Morton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2001-01-28 16:05 UTC (permalink / raw)
  To: jamal; +Cc: lkml, netdev@oss.sgi.com

jamal wrote:
> 
> PS:- can you try it out with the ttcp testcode i posted?

Yup.  See below.  The numbers are almost the same as
with `zcs' and `zcc'.

The CPU utilisation code which was in `zcc' has been
broken out into a standalone tool, so the new `cyclesoak'
app is a general-purpose system load measurement tool.
It's fascinating to play with, if you're into that sort
of thing.

`cyclesoak' was used to measure ttcp-sf and NFS/UDP client
and server throughput.  The times()-based instrumentation
inside ttcp-sf doesn't (can't) give correct numbers.  2-4% CPU
at 100 mbps?  We wish :)

The zerocopy patch doesn't seem to affect NFS efficiency at
all.  Confused.

Excerpt from the rapidly swelling README:



NFS/UDP client results
======================

Reading a 100 meg file across 100baseT.  The file is fully cached on
the server.  The client is the above machine.  You need to unmount the
server between runs to avoid client-side caching.

The server is mounted with various rsize and wsize options.

  Kernel           rsize wsize   mbyte/sec     CPU

  2.4.1-pre10+zc   1024  1024     2.4         10.3%
  2.4.1-pre10+zc   2048  2048     3.7         11.4%
  2.4.1-pre10+zc   4096  4096     10.1        29.0%
  2.4.1-pre10+zc   8192  8192     11.9        28.2%
  2.4.1-pre10+zc  16384 16384     11.9        28.2%

  2.4.1-pre10      1024  1024      2.4         9.7%
  2.4.1-pre10      2048  2048      3.7        11.8%
  2.4.1-pre10      4096  4096     10.7        33.6%
  2.4.1-pre10      8192  8192     11.9        29.5%
  2.4.1-pre10     16384 16384     11.9        29.2%

Small diff at 8192.


NFS/UDP server results
======================

Reading a 100 meg file across 100baseT.  The file is fully cached on
the server.  The server is the above machine.

  Kernel           rsize wsize   mbyte/sec     CPU

  2.4.1-pre10+zc   1024  1024      2.6        19.1%
  2.4.1-pre10+zc   2048  2048      3.9        18.8%
  2.4.1-pre10+zc   4096  4096     10.0        34.5%
  2.4.1-pre10+zc   8192  8192     11.8        28.9%
  2.4.1-pre10+zc  16384 16384     11.8        29.0%

  2.4.1-pre10      1024  1024      2.6        18.5%
  2.4.1-pre10      2048  2048      3.9        18.6%
  2.4.1-pre10      4096  4096     10.9        33.8%
  2.4.1-pre10      8192  8192     11.8        29.0%
  2.4.1-pre10     16384 16384     11.8        29.0%

No diff.


ttcp-sf Results
===============

Jamal Hadi Salim has taught ttcp to use sendfile.  See
http://www.cyberus.ca/~hadi/ttcp-sf.tar.gz

Using the same machine as above, and the following commands:

Sender:    ./ttcp-sf -t -c -l 32768 -v receiver_host
Receiver:  ./ttcp-sf -c -r -l 32768 -v sender_host

                                                        CPU

    2.4.1-pre10-zerocopy, sending with ttcp-sf:        10.5%
    2.4.1-pre10-zerocopy, receiving with ttcp-sf:      16.1%

    2.4.1-pre10-vanilla, sending with ttcp-sf:         18.5%
    2.4.1-pre10-vanilla, receiving with ttcp-sf:       16.0%


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-28 13:37     ` Felix von Leitner
  2001-01-28 14:11       ` Dan Hollis
  2001-01-28 14:27       ` Andi Kleen
@ 2001-01-28 19:43       ` Gregory Maxwell
  2001-01-28 19:48       ` Choosing Linux NICs (was: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)) Felix von Leitner
  3 siblings, 0 replies; 56+ messages in thread
From: Gregory Maxwell @ 2001-01-28 19:43 UTC (permalink / raw)
  To: lkml

On Sun, Jan 28, 2001 at 02:37:48PM +0100, Felix von Leitner wrote:
> Thus spake Andrew Morton (andrewm@uow.edu.au):
> > Conclusions:
> 
> >   For a NIC which cannot do scatter/gather/checksums, the zerocopy
> >   patch makes no change in throughput in any case.
> 
> >   For a NIC which can do scatter/gather/checksums, sendfile()
> >   efficiency is improved by 40% and send() efficiency is decreased by
> >   10%.  The increase and decrease caused by the zerocopy patch will in
> >   fact be significantly larger than these two figures, because the
> >   measurements here include a constant base load caused by the device
> >   driver.
> 
> What is missing here is a good authoritative web resource that tells
> people which NIC to buy.
> 
> I have a tulip NIC because a few years ago that apparently was the NIC
> of choice.  It has good multicast (which is important to me), but AFAIK
> it has neither scatter-gather nor hardware checksumming.
> 
> Is there such a web page already?
> If not, I volunteer to create and maintain one.

Additionally, it would be useful to have some boot messages comment on the
abilities of cards. I am sick and tired of dealing with people telling me
that 'Linux performance sucks' when they keep putting Linux on systems with
pci 8139 adaptors. 

* Choosing Linux NICs (was: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN))
  2001-01-28 13:37     ` Felix von Leitner
                         ` (2 preceding siblings ...)
  2001-01-28 19:43       ` Gregory Maxwell
@ 2001-01-28 19:48       ` Felix von Leitner
  3 siblings, 0 replies; 56+ messages in thread
From: Felix von Leitner @ 2001-01-28 19:48 UTC (permalink / raw)
  To: lkml

Thus spake Felix von Leitner (leitner@fefe.de):
> What is missing here is a good authoritative web resource that tells
> people which NIC to buy.

I started one now.

It's at http://www.fefe.de/linuxeth/, but there is not much content yet.
Please contribute!

Felix

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 12:43 ` sendfile+zerocopy: fairly sexy (nothing to do with ECN) jamal
  2001-01-27 13:29   ` Andrew Morton
@ 2001-01-29 18:50   ` Rick Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Rick Jones @ 2001-01-29 18:50 UTC (permalink / raw)
  To: jamal; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com

> I'll give this a shot later. Can you try with the sendfiled-ttcp?
> http://www.cyberus.ca/~hadi/ttcp-sf.tar.gz

I guess I need to "leverage" some bits for netperf :)


WRT getting data with links that cannot saturate a system, having
something akin to the netperf service demand measure can help. Nothing
terribly fancy - simply a conversion of the CPU utilization and
throughput to microseconds of CPU per KB of data transferred.

As for CKO and avoiding copies and such, if past experience is any guide
(ftp://ftp.cup.hp.com/dist/networking/briefs/copyavoid.ps) you get a
very nice synergistic effect once the last "access" of data is removed.
CKO gets you say 10%, avoiding the copy gets you say 10%, but doing both
at the same time gets you 30%.

rick jones
http://www.netperf.org/
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-28 14:27       ` Andi Kleen
@ 2001-01-29 21:50         ` Pavel Machek
  0 siblings, 0 replies; 56+ messages in thread
From: Pavel Machek @ 2001-01-29 21:50 UTC (permalink / raw)
  To: Andi Kleen, lkml; +Cc: leitner

Hi!

> An advantage of Tulip and AMD is that, in my experience, they perform
> much better on half-duplex Ethernet than other cards, because they use
> a modified, patented backoff scheme. Without it, Linux 2.1+ tends to
> suffer badly from Ethernet congestion by colliding with its own ACKs,
> probably because it sends too fast.

Is that real problem? If so, some strategic delay loop should do the
trick...
								Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27 12:49   ` jamal
@ 2001-01-30  1:06     ` Ion Badulescu
  2001-01-30  2:48       ` jamal
  0 siblings, 1 reply; 56+ messages in thread
From: Ion Badulescu @ 2001-01-30  1:06 UTC (permalink / raw)
  To: jamal; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com

On Sat, 27 Jan 2001, jamal wrote:

> > starfire:
> > 2.4.1-pre10+zerocopy, using sendfile():		 9.6% CPU
> > 2.4.1-pre10+zerocopy, using read()/write():	18.3%-29.6% CPU		* why so much variance?
> >
>
> What are your throughput numbers?

11.5kBps, quite consistently.

BTW, Andrew's new tool (with 8k reads/writes) has shown the load in the
read/write case to be essentially the lower margin of the intervals I got
in the first mail.

> Could you also, please, test using:
>
> http://www.cyberus.ca/~hadi/ttcp-sf.tar.gz
>
> post both sender and receiver data. Repeat each test about
> 5 times.

I've tried it, but I'm not really sure what I can report. ttcp's
measurements are clearly misleading, so I used Andrew's cyclesoak instead.
The numbers are (with 2.4.1-pre10+zerocopy):

[starfire, hw csum & sg enabled]
sending with sendfile:		10.0-10.2%
sending with send/write:	13.5-13.7%
receiving:			20.0-20.2%

[starfire, hw csum & sg disabled]
sending with sendfile:		18.1-18.3%
sending with send/write:	13.9-14.1%
receiving:			24.3-24.5%

[eepro100, i82559, no hw fancies]
sending with sendfile:		16.2-16.4%
sending with send/write:	12.0-12.2%
receiving:			21.5-21.7%

Same tests, this time with 2.4.1-pre10 vanilla:

[starfire]
sending with sendfile:		18.1-18.3%
sending with send/write:	12.5-12.7%
receiving:			23.0-23.1%

[eepro100, i82559]
sending with sendfile:		16.7-16.9%
sending with send/write:	12.0-12.2%
receiving:			20.8-20.9%


Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30  1:06     ` Ion Badulescu
@ 2001-01-30  2:48       ` jamal
  2001-01-30  3:26         ` Ion Badulescu
  0 siblings, 1 reply; 56+ messages in thread
From: jamal @ 2001-01-30  2:48 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com



On Mon, 29 Jan 2001, Ion Badulescu wrote:

> 11.5kBps, quite consistently.

This gige card is really sick. Are you sure? Please double check.

>
> I've tried it, but I'm not really sure what I can report. ttcp's
> measurements are clearly misleading, so I used Andrew's cyclesoak instead.

The ttcp CPU (times()) measurements are misleading, in particular when
doing sendfile. All they say is how much time ttcp spent in kernel
space vs user space. So all the CPU measurements I have posted in the
past should be considered bogus. It is interesting to note, however,
that the trend reported by ttcp's CPU measurements is similar to what
Andrew (and you) reported ;->
But the point is: CPU is not the only measure of interest. Throughput
is definitely one of extreme importance. 100Mbps is not exciting. You
seem to have gigE. I think your 11KB looks suspiciously wrong. Can you
double check please?

cheers,
jamal

PS:- another important parameter is latency, but that might not be as
important in file serving (maybe in short file transfers a la http).



* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30  2:48       ` jamal
@ 2001-01-30  3:26         ` Ion Badulescu
  2001-01-31  0:53           ` Still not sexy! (Re: " jamal
  0 siblings, 1 reply; 56+ messages in thread
From: Ion Badulescu @ 2001-01-30  3:26 UTC (permalink / raw)
  To: jamal; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com

On Mon, 29 Jan 2001, jamal wrote:

> > 11.5kBps, quite consistently.
>
> This gige card is really sick. Are you sure? Please double check.

Umm.. the starfire chipset is 100Mbit only. So 11.5MBps (sorry, that was a
typo, it's mega not kilo) is really all I'd expect out of it.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-27  8:19   ` Andrew Morton
  2001-01-27 10:09     ` Ion Badulescu
@ 2001-01-30  6:00     ` David S. Miller
  2001-01-30 12:44       ` Andrew Morton
  2001-02-02 10:12       ` Andrew Morton
  1 sibling, 2 replies; 56+ messages in thread
From: David S. Miller @ 2001-01-30  6:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com


The "more expensive" write/send in zerocopy is a known cost of paged
SKBs.  This cost may be decreased a bit with some fine tuning, but not
eliminated entirely.

What do we get for this cost?

Basically, the big win is not that the card checksums the packet.
We could get that for free while copying the data from userspace
into the kernel pages during the sendmsg(), using the combined
"copy+checksum" hand-coded assembly routines we already have.

It is in fact the better use of memory.  Firstly, we use page
allocations, only single ones.  With linear buffers SLAB could
use multiple pages which strain the memory subsystem quite a bit at
times.  Secondly, we fill pages with socket data precisely whereas
SLAB can only get as tight packing as any general purpose memory
allocator can.

This, I feel, outweighs the slight performance decrease.  And I would
wager a bet that the better usage of memory will result in better
all around performance.

The problem with microscopic tests is that you do not see the world
around the thing being focused on.  I feel Andrew/Jamal's test are
very valuable, but lets keep things in perspective when doing cost
analysis.

Finally, please do some tests on loopback.  It is usually a great
way to get "pure software overhead" measurements of our TCP stack.

Later,
David S. Miller
davem@redhat.com

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30  6:00     ` David S. Miller
@ 2001-01-30 12:44       ` Andrew Morton
  2001-01-30 12:52         ` David S. Miller
  2001-02-02 10:12       ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2001-01-30 12:44 UTC (permalink / raw)
  To: David S. Miller; +Cc: lkml, netdev@oss.sgi.com

"David S. Miller" wrote:
> 
> The "more expensive" write/send in zerocopy is a known cost of paged
> SKBs.  This cost may be decreased a bit with some fine tuning, but not
> eliminated entirely.

Can you say what causes the difference?  I had a brief poke
around - generic_copy_from_user() dominates in both cases
of course, but nothing really stood out when comparing the
zerocopy kernel's profile with non-zc.

Varying the value of MAXPGS (all the way down to 1) and also
the amount of data which is sent with send() does change the
throughput, but not the ratio wrt non-zc.

> What do we get for this cost?
> 
> Basically, the big win is not that the card checksums the packet.
> We could get that for free while copying the data from userspace
> into the kernel pages during the sendmsg(), using the combined
> "copy+checksum" hand-coded assembly routines we already have.
> 
> It is in fact the better use of memory.  Firstly, we use page
> allocations, only single ones.  With linear buffers SLAB could
> use multiple pages which strain the memory subsystem quite a bit at
> times.  Secondly, we fill pages with socket data precisely whereas
> SLAB can only get as tight packing as any general purpose memory
> allocator can.
> 
> This, I feel, outweighs the slight performance decrease.  And I would
> wager a bet that the better usage of memory will result in better
> all around performance.

i.e.: inappropriate test coverage.  Not surprising.  What
additional scenarios need to be tested?  Zillions of
connections?

If anyone really needs that 10% they can use the `hw_checksums=0'
module parm, but SG+xsum is enabled by default - we need the testing.

> The problem with microscopic tests is that you do not see the world
> around the thing being focused on.  I feel Andrew/Jamal's test are
> very valuable, but lets keep things in perspective when doing cost
> analysis.
> 
> Finally, please do some tests on loopback.  It is usually a great
> way to get "pure software overhead" measurements of our TCP stack.

Will do.

BTW: can you suggest why I'm not observing any change in NFS client
efficiency?


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30 12:44       ` Andrew Morton
@ 2001-01-30 12:52         ` David S. Miller
  2001-01-30 14:58           ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: David S. Miller @ 2001-01-30 12:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com


Andrew Morton writes:
 > BTW: can you suggest why I'm not observing any change in NFS client
 > efficiency?

As in "filecopy speed" or "cpu usage while copying a file"?

The current fragmentation code eliminates a full SKB allocation and
data copy on the NFS file data receive path in the client, so CPU has
to be saved compared to pre-zerocopy, or something is very wrong.

File copy speed: well, you should be link-speed limited, as even
without the zerocopy patches you ought to have enough CPU to keep the
link busy.

Later,
David S. Miller
davem@redhat.com


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30 12:52         ` David S. Miller
@ 2001-01-30 14:58           ` Andrew Morton
  2001-01-30 17:49             ` Chris Wedgwood
  2001-01-30 22:28             ` David S. Miller
  0 siblings, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2001-01-30 14:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: lkml, netdev@oss.sgi.com

"David S. Miller" wrote:
> 
> Andrew Morton writes:
>  > BTW: can you suggest why I'm not observing any change in NFS client
>  > efficiency?
> 
> As in "filecopy speed" or "cpu usage while copying a file"?
> 
> The current fragmentation code eliminates a full SKB allocation and
> data copy on the NFS file data receive path in the client, CPU has to
> be saved compared to pre-zerocopy or something is very wrong.
> 
> File copy speed, well you should be link speed limited as even without
> the zerocopy patches you ought to have enough cpu to keep it busy.
> 

Mount the server with rsize=wsize=8192.  `cp' a 102,400,000 byte file
from the NFS server to /dev/null.  The file is fully cached on
the server.  Unmount and remount the server between runs
to eliminate client caching.  The copy takes 8.654 seconds.  That's
11.8 megabytes/sec.

Client is 2.4.1-vanilla:                   29.8% CPU
Client is 2.4.1-zc:                        28.2% CPU
Client is 2.4.1-zc, non-SG+xsum NIC:       27.7% CPU

So I was mistaken - there is an improvement. (A 2% CPU
change is easily measurable with this setup).

It may be a little better than this - I think cyclesoak will
underestimate the benefit of saving on memory traffic.
It only generates 10,000 cacheline writebacks per second per
CPU.  But winding it up to 80,000 doesn't affect the above
figures much at all.

The box has 130 mbyte/sec memory write bandwidth, so saving
a copy should save 10% of this.   (Wanders away, scratching
head...)


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30 14:58           ` Andrew Morton
@ 2001-01-30 17:49             ` Chris Wedgwood
  2001-01-30 22:17               ` David S. Miller
  2001-01-30 22:28             ` David S. Miller
  1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2001-01-30 17:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, lkml, netdev@oss.sgi.com

On Wed, Jan 31, 2001 at 01:58:44AM +1100, Andrew Morton wrote:

    Mount the server rsize=wsize=8192.  `cp' a 102,400,000 byte file
    from the NFS server to /dev/null.  The file is fully cached on
    the server.  unmount and remount the server between runs to
    eliminate client caching. The copy takes 8.654 seconds.  That's
    11.8 megabytes/sec.

What server are you using here? Using NetApp filers I don't see
anything like this, probably only 8.5MB/s at most and this number is
fairly noisy.



  --cw

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30 17:49             ` Chris Wedgwood
@ 2001-01-30 22:17               ` David S. Miller
  2001-01-31  0:31                 ` Chris Wedgwood
  0 siblings, 1 reply; 56+ messages in thread
From: David S. Miller @ 2001-01-30 22:17 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com


Chris Wedgwood writes:
 > What server are you using here? Using NetApp filers I don't see
 > anything like this, probably only 8.5MB/s at most and this number is
 > fairly noisy.

8.5MB/sec sounds like half-duplex 100baseT.  Positive you are running
at full duplex all the way to the netapp, and if so how many switches
sit between you and this netapp?

Later,
David S. Miller
davem@redhat.com

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30 14:58           ` Andrew Morton
  2001-01-30 17:49             ` Chris Wedgwood
@ 2001-01-30 22:28             ` David S. Miller
  2001-01-30 23:34               ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: David S. Miller @ 2001-01-30 22:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, netdev@oss.sgi.com


Andrew Morton writes:
 > The box has 130 mbyte/sec memory write bandwidth, so saving
 > a copy should save 10% of this.   (Wanders away, scratching
 > head...)

Are you sure your measurement program will properly account
for all system cycles spent in softnet processing?  This is
where the bulk of the CPU cycle savings will occur.

Later,
David S. Miller
davem@redhat.com

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30 22:28             ` David S. Miller
@ 2001-01-30 23:34               ` Andrew Morton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2001-01-30 23:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: lkml, netdev@oss.sgi.com

"David S. Miller" wrote:
> 
> Andrew Morton writes:
>  > The box has 130 mbyte/sec memory write bandwidth, so saving
>  > a copy should save 10% of this.   (Wanders away, scratching
>  > head...)
> 
> Are you sure your measurment program will account properly
> for all system cycles spent in softnet processing?  This is
> where the bulk of the cpu cycle savings will occur.
> 

It tries to. It runs n_cpus instances of this:

static void busyloop(int instance)
{
        int idx;

        for ( ; ; ) {
                for (idx = 0; idx < busyloop_size; idx++) {
                        int thumb;

                        busyloop_buf[idx]++;                    /* Dirty a cacheline */
                        for (thumb = 0; thumb < 200; thumb++)
                                ;                               /* twiddle */
                        busyloop_progress[instance * CACHE_LINE_SIZE]++;
                }
        }
}

At minimum priority.

And it measures how much these threads are slowed
down, wrt an unloaded system. So interrupt work
is definitely accounted for.

It needs work.  It should walk the buffer in cacheline-sized
strides, should have tunable read-versus-write ratios, should
be scheduled with `idle' priority, should be bondable
to CPUs and should create PCI traffic.  That means an in-kernel
implementation.

But tweaking this thing thus far has made only very small
differences in output.


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30 22:17               ` David S. Miller
@ 2001-01-31  0:31                 ` Chris Wedgwood
  2001-01-31  0:45                   ` David S. Miller
  0 siblings, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2001-01-31  0:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com

On Tue, Jan 30, 2001 at 02:17:57PM -0800, David S. Miller wrote:

    8.5MB/sec sounds like half-duplex 100baseT.

No; I'm 100% sure it's FD; HD gives 40k/sec TCP because of collisions
and the like.

    Positive you are running at full duplex all the way to the
    netapp, and if so how many switches sit between you and this
    netapp?

It's FD all the way (we hardwire everything to 100-FD and never trust
auto-negotiation); I see no errors or anything like that anywhere.

There are ... <pause> ... four switches in between, mostly linked via
GE. I'm not sure if latency might be an issue here; if it were
critical I can imagine 10 km of glass might be a problem, but it's not
_that_ far...


  --cw


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31  0:31                 ` Chris Wedgwood
@ 2001-01-31  0:45                   ` David S. Miller
  0 siblings, 0 replies; 56+ messages in thread
From: David S. Miller @ 2001-01-31  0:45 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com


Chris Wedgwood writes:
 > There are ... <pause> ... 3 switches between four switches in
 > between, mostly linked via GE. I'm not sure if latency might be an
 > issue here, is it was critical I can imagine 10 km of glass might be
 > a problem but it's not _that_ far...

Other than this, I don't know what to postulate.  Really,
most reports and my own experimentation (directly connected
Linux knfsd to 2.4.x nfs client) supports the fact that our
client can saturate 100baseT rather fully.

Later,
David S. Miller
davem@redhat.com

* Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30  3:26         ` Ion Badulescu
@ 2001-01-31  0:53           ` jamal
  2001-01-31  0:59             ` Ingo Molnar
  2001-01-31  1:10             ` Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to dowith ECN) Rick Jones
  0 siblings, 2 replies; 56+ messages in thread
From: jamal @ 2001-01-31  0:53 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com



On Mon, 29 Jan 2001, Ion Badulescu wrote:

> On Mon, 29 Jan 2001, jamal wrote:
>
> > > 11.5kBps, quite consistently.
> >
> > This gige card is really sick. Are you sure? Please double check.
>
> Umm.. the starfire chipset is 100Mbit only. So 11.5MBps (sorry, that was a
> typo, it's mega not kilo) is really all I'd expect out of it.
>

not good.

So far all the tests have focused on CPU. The general trend seems
to be:
- sendfile + ZC is good for CPU
- write() + ZC is not good for CPU
(I might have forgotten something from Andrew's results.)
My own numbers (even with my bogus CPU measure) show a similar
pattern, which seems explainable.

** I reported that there was also an oddity in throughput values;
unfortunately, since no one (other than me) on the ZC list seems to
have access to a gige card, nobody can confirm or disprove what I
posted. Here it is again as a reminder:

Kernel     |  tput  | sender-CPU | receiver-CPU |
-------------------------------------------------
2.4.0-pre3 | 99MB/s |   87%      |  23%         |
NSF        |        |            |              |
-------------------------------------------------
2.4.0-pre3 | 86MB/s |   100%     |  17%         |
SF         |        |            |              |
-------------------------------------------------
2.4.0-pre3 | 66.2   |   60%      |  11%         |
+ZC        | MB/s   |            |              |
-------------------------------------------------
2.4.0-pre3 | 68     |   8%       |  8%          |
+ZC  SF    | MB/s   |            |              |
-------------------------------------------------


Just ignore the CPU readings and focus on throughput. And could someone
please post results?

cheers,
jamal




* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31  0:53           ` Still not sexy! (Re: " jamal
@ 2001-01-31  0:59             ` Ingo Molnar
  2001-01-31  1:04               ` jamal
  2001-01-31  1:10             ` Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to dowith ECN) Rick Jones
  1 sibling, 1 reply; 56+ messages in thread
From: Ingo Molnar @ 2001-01-31  0:59 UTC (permalink / raw)
  To: jamal; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com


On Tue, 30 Jan 2001, jamal wrote:

> Kernel     |  tput  | sender-CPU | receiver-CPU |
> -------------------------------------------------
> 2.4.0-pre3 | 99MB/s |   87%      |  23%         |
> NSF        |        |            |              |
> -------------------------------------------------
> 2.4.0-pre3 | 68     |   8%       |  8%          |
> +ZC  SF    | MB/s   |            |              |
> -------------------------------------------------

isn't the CPU utilization difference amazing? :-)

a couple of questions:

- is this UDP or TCP based? (UDP, I guess)

- what wsize/rsize are you using? What do these requests look like on the
  network, i.e. are they sufficiently MTU-sized?

- what happens if you run multiple instances of the test code, does it
  saturate bandwidth (or CPU)?

	Ingo



* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31  0:59             ` Ingo Molnar
@ 2001-01-31  1:04               ` jamal
  2001-01-31  1:14                 ` Ingo Molnar
  0 siblings, 1 reply; 56+ messages in thread
From: jamal @ 2001-01-31  1:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com



On Wed, 31 Jan 2001, Ingo Molnar wrote:

>
> On Tue, 30 Jan 2001, jamal wrote:
>
> > Kernel     |  tput  | sender-CPU | receiver-CPU |
> > -------------------------------------------------
> > 2.4.0-pre3 | 99MB/s |   87%      |  23%         |
> > NSF        |        |            |              |
> > -------------------------------------------------
> > 2.4.0-pre3 | 68     |   8%       |  8%          |
> > +ZC  SF    | MB/s   |            |              |
> > -------------------------------------------------
>
> isnt the CPU utilization difference amazing? :-)
>

With a caveat, sadly ;-> ttcp uses the times() system call (or rather a
diff of two times() calls, one at the beginning and another at the end),
so the CPU measurements are not representative.

> a couple of questions:
>
> - is this UDP or TCP based? (UDP i guess)
>
TCP

> - what wsize/rsize are you using? What do these requests look like on the
>   network, i.e. are they sufficiently MTU-sized?

Yes. Writes vary from 8K to 64K, but there is not much difference over a
long period of time.

>
> - what happens if you run multiple instances of the testcode, does it
>   saturate bandwidth (or CPU)?

This is of great interest. I haven't tried it; I should.
I suspect this is where the value of the ZC changes will become
evident.

cheers,
jamal


* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to  dowith ECN)
  2001-01-31  0:53           ` Still not sexy! (Re: " jamal
  2001-01-31  0:59             ` Ingo Molnar
@ 2001-01-31  1:10             ` Rick Jones
  2001-01-31  1:45               ` jamal
  1 sibling, 1 reply; 56+ messages in thread
From: Rick Jones @ 2001-01-31  1:10 UTC (permalink / raw)
  To: jamal; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com

> ** I reported that there was also an oddity in throughput values,
> unfortunately since no one (other than me) seems to have access
> to a gige cards in the ZC list, nobody can confirm or disprove
> what i posted. Here again as a reminder:
> 
> Kernel     |  tput  | sender-CPU | receiver-CPU |
> -------------------------------------------------
> 2.4.0-pre3 | 99MB/s |   87%      |  23%         |
> NSF        |        |            |              |
> -------------------------------------------------
> 2.4.0-pre3 | 86MB/s |   100%     |  17%         |
> SF         |        |            |              |
> -------------------------------------------------
> 2.4.0-pre3 | 66.2   |   60%      |  11%         |
> +ZC        | MB/s   |            |              |
> -------------------------------------------------
> 2.4.0-pre3 | 68     |   8%       |  8%          |
> +ZC  SF    | MB/s   |            |              |
> -------------------------------------------------
> 
> Just ignore the CPU readings, focus on throughput. And could someone plese
> post results?

In the spirit of the socratic method :)

Is your gige card based on Alteon?

How does ZC/SG change the nature of the packets presented to the NIC?

How well does the NIC do with that changed nature?

rick jones

sometimes, performance tuning is like squeezing a balloon. one part gets
smaller, but then you start to see the rest of the balloon...

-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31  1:04               ` jamal
@ 2001-01-31  1:14                 ` Ingo Molnar
  2001-01-31  1:39                   ` jamal
  2001-01-31 11:21                   ` Malcolm Beattie
  0 siblings, 2 replies; 56+ messages in thread
From: Ingo Molnar @ 2001-01-31  1:14 UTC (permalink / raw)
  To: jamal; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com


On Tue, 30 Jan 2001, jamal wrote:

> > - is this UDP or TCP based? (UDP i guess)
> >
> TCP

well then i'd suggest to do:

	echo 100000 100000 100000 > /proc/sys/net/ipv4/tcp_wmem

does this make any difference?

	Ingo


* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31  1:14                 ` Ingo Molnar
@ 2001-01-31  1:39                   ` jamal
  2001-01-31 11:21                   ` Malcolm Beattie
  1 sibling, 0 replies; 56+ messages in thread
From: jamal @ 2001-01-31  1:39 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com



On Wed, 31 Jan 2001, Ingo Molnar wrote:

>
> On Tue, 30 Jan 2001, jamal wrote:
>
> > > - is this UDP or TCP based? (UDP i guess)
> > >
> > TCP
>
> well then i'd suggest to do:
>
> 	echo 100000 100000 100000 > /proc/sys/net/ipv4/tcp_wmem
>
> does this make any difference?

According to my notes, I don't see any difference from this; I do,
however, echo 262144 into /proc/sys/net/core/{r,w}mem_{max,default}.

I have access to my h/ware this weekend. Hopefully I can get something
better than ttcp to use.

cheers,
jamal


* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to dowith ECN)
  2001-01-31  1:10             ` Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to dowith ECN) Rick Jones
@ 2001-01-31  1:45               ` jamal
  2001-01-31  2:25                 ` Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing todowith ECN) Rick Jones
  0 siblings, 1 reply; 56+ messages in thread
From: jamal @ 2001-01-31  1:45 UTC (permalink / raw)
  To: Rick Jones; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com



On Tue, 30 Jan 2001, Rick Jones wrote:

> > ** I reported that there was also an oddity in throughput values,
> > unfortunately since no one (other than me) seems to have access
> > to a gige cards in the ZC list, nobody can confirm or disprove
> > what i posted. Here again as a reminder:
> >
> > Kernel     |  tput  | sender-CPU | receiver-CPU |
> > -------------------------------------------------
> > 2.4.0-pre3 | 99MB/s |   87%      |  23%         |
> > NSF        |        |            |              |
> > -------------------------------------------------
> > 2.4.0-pre3 | 86MB/s |   100%     |  17%         |
> > SF         |        |            |              |
> > -------------------------------------------------
> > 2.4.0-pre3 | 66.2   |   60%      |  11%         |
> > +ZC        | MB/s   |            |              |
> > -------------------------------------------------
> > 2.4.0-pre3 | 68     |   8%       |  8%          |
> > +ZC  SF    | MB/s   |            |              |
> > -------------------------------------------------
> >
> > Just ignore the CPU readings, focus on throughput. And could someone plese
> > post results?
>
> In the spirit of the socratic method :)

;->

>
> Is your gige card based on Alteon?

Yes, sir, it is. To be precise:

** Sender: SMP PII 450 MHz, ASUS m/board; 3Com version of the acenic
- 1M version
** Receiver: same hardware; Alteon acenic card - 1M version

> How does ZC/SG change the nature of the packets presented to the NIC?

What do you mean? I am _sure_ you know how SG/ZC work, so I suspect
more than a Socratic view on life here. Could be influence from
Aristotle ;->

> How well does the NIC do with that changed nature?
>

Hard question to answer ;-> I haven't done any analysis at that level.

cheers,
jamal


* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing  todowith ECN)
  2001-01-31  1:45               ` jamal
@ 2001-01-31  2:25                 ` Rick Jones
  2001-02-04 19:48                   ` jamal
  0 siblings, 1 reply; 56+ messages in thread
From: Rick Jones @ 2001-01-31  2:25 UTC (permalink / raw)
  To: jamal; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com

> > How does ZC/SG change the nature of the packets presented to the NIC?
> 
> what do you mean? I am _sure_ you know how SG/ZC work. So i am suspecting
> more than socratic view on life here. Could be influence from Aristotle;->

Well, I don't know the specifics of Linux, but I gather from what I've
read on the list thus far that, prior to implementing SG support, Linux
NIC drivers would copy packets into single contiguous buffers that were
then sent to the NIC, yes?

If so, the implication is that with SG going, that copy no longer takes
place, and so a chain of buffers is given to the NIC.

Also, if one is fully ZC :) pesky things like protocol headers can
naturally end-up in separate buffers.

So, now you have to ask how well any given NIC follows chains of
buffers. At what number of buffers is the overhead in the NIC of
following the chains enough to keep it from achieving link-rate?

One way to try to deduce that would be to meld some of the SG and
pre-SG behaviours: copy packets into varying numbers of buffers per
packet and measure the resulting impact on throughput through the NIC.

rick jones

As time marches on, the orders of magnitude of the constants may change,
but basic concepts still remain, and the "lessons" learned in the past
by one generation tend to get relearned in the next :) for example -
there is no such a thing as a free lunch... :)

-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31  1:14                 ` Ingo Molnar
  2001-01-31  1:39                   ` jamal
@ 2001-01-31 11:21                   ` Malcolm Beattie
  2001-01-31 11:24                     ` Ingo Molnar
  1 sibling, 1 reply; 56+ messages in thread
From: Malcolm Beattie @ 2001-01-31 11:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: jamal, Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com

Ingo Molnar writes:
> 
> On Tue, 30 Jan 2001, jamal wrote:
> 
> > > - is this UDP or TCP based? (UDP i guess)
> > >
> > TCP
> 
> well then i'd suggest to do:
> 
> 	echo 100000 100000 100000 > /proc/sys/net/ipv4/tcp_wmem
> 
> does this make any difference?

For the last week I've been benchmarking Linux network and I/O on a
couple of machines with 3c985 gigabit cards and some other stuff
(see below). One of the things I tried yesterday was a beta test
version of a secure ftpd written by Chris Evans which happens to use
sendfile() making it a convenient extra benchmark. I'd already put
net.core.{r,w}mem_max up to 262144 for the sake of gensink and other
benchmarks which raise SO_{SND,RCV}BUF. I hadn't, however, tried
raising tcp_wmem as per your suggestion above.

Currently the systems are linked back to back with fibre with jumbo
frames (MTU 9000) on and running pure kernel 2.4.1. I transferred a 300
MByte file repeatedly from the server to the client with an ftp "get"
client-side. The file will have been completely in page cache on the
server (both machines have 512MB RAM) and was written to /dev/null on
the client side. (Yes, I checked the client was doing ordinary
read/write and not throwing it away).

Without the raised tcp_wmem setting I was getting 81 MByte/s.
With tcp_wmem set as above I got 86 MByte/s. Nice increase. Any other
setting I can tweak apart from {r,w}mem_max and tcp_{w,r}mem? The CPU
on the client (350 MHz PII) is the bottleneck: gensink4 maxes out at
69 Mbyte/s pulling TCP from the server and 94 Mbyte/s pushing. (The
other system, 733 MHz PIII pushes >100MByte/s UDP with ttcp but the
client drops most of it).

I'll be following up Dave Miller's "please benchmark zerocopy"
request when I've got some more numbers written down since I've only
just put the zerocopy patch in and haven't rebooted yet.

If anyone wants any other specific benchmarks done (I/O or network)
I may get some time to do them: the PIII system has an 8-port
Escalade card with 8 x 46GB disks (117 MByte/s block writes as
measured by Bonnie on a RAID1/0 mixed RAIDset) and there are also
four dual-port eepro fast ethernet cards, a Cisco 8-port 3508G gigabit
switch and a 24-port 3524 fast ethernet switch (gigastack linked to
the 3508G).  I'm benchmarking and looking into the possibility of a DIY
NAS or SAN-type thing.

--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services

* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31 11:21                   ` Malcolm Beattie
@ 2001-01-31 11:24                     ` Ingo Molnar
  0 siblings, 0 replies; 56+ messages in thread
From: Ingo Molnar @ 2001-01-31 11:24 UTC (permalink / raw)
  To: Malcolm Beattie
  Cc: jamal, Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com


On Wed, 31 Jan 2001, Malcolm Beattie wrote:

> Without the raised tcp_wmem setting I was getting 81 MByte/s. With
> tcp_wmem set as above I got 86 MByte/s. Nice increase. Any other
> setting I can tweak apart from {r,w}mem_max and tcp_{w,r}mem? The CPU
> on the client (350 MHz PII) is the bottleneck: gensink4 maxes out at
> 69 Mbyte/s pulling TCP from the server and 94 Mbyte/s pushing. (The
> other system, 733 MHz PIII pushes >100MByte/s UDP with ttcp but the
> client drops most of it).

you can speed up the client significantly by using the MSG_TRUNC option
('truncate message'). It will zap incoming data without copying it into
user-space. (You can use this for the 'bulk transfer' part; the initial
protocol-handling code needs to see the actual data.) This way you should
be able to saturate the server even more.
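[A minimal sketch of the MSG_TRUNC trick, not from the original mail;
it assumes Linux 2.4 or later, where tcp(7) documents MSG_TRUNC on a
TCP recv() as discarding the received data instead of copying it to the
caller's buffer. The return value still reports how many bytes were
zapped, so a benchmark sink can count throughput without paying for the
kernel-to-user copy.]

```python
import socket

# Set up a TCP connection over loopback to stand in for the server.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
peer, _ = srv.accept()

peer.sendall(b"y" * 4096)           # the "bulk transfer" part
peer.close()

discarded = 0
while True:
    # MSG_TRUNC: the kernel drops the data; only the byte count comes back.
    n = len(cli.recv(65536, socket.MSG_TRUNC))
    if n == 0:                      # EOF
        break
    discarded += n
```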

	Ingo


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-30  6:00     ` David S. Miller
  2001-01-30 12:44       ` Andrew Morton
@ 2001-02-02 10:12       ` Andrew Morton
  2001-02-02 12:14         ` Trond Myklebust
  2001-02-02 17:51         ` David Lang
  1 sibling, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2001-02-02 10:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: lkml, netdev@oss.sgi.com

"David S. Miller" wrote:
> 
> ...
> Finally, please do some tests on loopback.  It is usually a great
> way to get "pure software overhead" measurements of our TCP stack.

Here we are.  TCP and NFS/UDP over lo.

Machine is a dual-PII.  I didn't bother running CPU utilisation
testing while benchmarking loopback, although this may be of
some interest for SMP.  I just looked at the throughput.

Machine is a dual 500MHz PII (again).  Memory read bandwidth
is 320 meg/sec.  Write b/w is 130 meg/sec.  The working set
is 60 ~300k files, everything cached. We run the following
tests:

1: sendfile() to localhost, sender and receiver pinned to
   separate CPUs

2: sendfile() to localhost, sender and receiver pinned to
   the same CPU

3: sendfile() to localhost, no explicit pinning.

4, 5, 6: same as above, except we use send() in 8kbyte
   chunks.

Repeat with and without zerocopy patch 2.4.1-2.

The receiver reads 64k hunks and throws them away. sendfile()
sends the entire file.

Also, do an NFS mount of localhost, rsize=wsize=8192, see how
long it takes to `cp' a 100 meg file from the "server" to
/dev/null.  The file is cached on the "server".  Do this for
the three pinning cases as well - all the NFS kernel processes
were pinned as a group and `cp' was the other group.


                                sendfile()     send(8k)   NFS
                                 Kbyte/s        Kbyte/s   Kbyte/s

No explicit bonding
  2.4.1:                          66600        70000     25600
  2.4.1-zc:                      208000        69000     25000

Bond client and server to separate CPUs
  2.4.1:                          66700        68000     27800
  2.4.1-zc:                      213047        66000     25700

Bond client and server to same CPU:
  2.4.1:                          56000        57000     23300
  2.4.1-zc:                      176000        55000     22100



Much the same story.  Big increase in sendfile() efficiency,
small drop in send() and NFS unchanged.

The relative increase in sendfile() efficiency is much higher
than with a real NIC, presumably because we've factored out
the constant (and large) cost of the device driver.

All the bits and pieces to reproduce this are at

	http://www.uow.edu.au/~andrewm/linux/#zc


* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 10:12       ` Andrew Morton
@ 2001-02-02 12:14         ` Trond Myklebust
  2001-02-02 17:51         ` David Lang
  1 sibling, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2001-02-02 12:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, lkml, netdev@oss.sgi.com

>>>>> " " == Andrew Morton <andrewm@uow.edu.au> writes:


     > Much the same story.  Big increase in sendfile() efficiency,
     > small drop in send() and NFS unchanged.

This is normal. The server doesn't do zero copy reads, but instead
copies from the page cache into an NFS-specific buffer using
file.f_op->read(). Alexey and Dave's changes are therefore unlikely to
register on NFS performance (other than on CPU use as has been
mentioned before) until we implement a sendfile-like scheme for knfsd
over TCP.
I've been wanting to start doing that (and also to finish the client
conversion to use the TCP zero-copy), but I'm pretty pressed for time
at the moment.

Cheers,
  Trond

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 10:12       ` Andrew Morton
  2001-02-02 12:14         ` Trond Myklebust
@ 2001-02-02 17:51         ` David Lang
  2001-02-02 22:46           ` David S. Miller
  1 sibling, 1 reply; 56+ messages in thread
From: David Lang @ 2001-02-02 17:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David S. Miller, lkml, netdev@oss.sgi.com

I have been watching this thread with interest for a while now, but am
wondering about the real-world use of this, given the performance penalty
for write().

As I see it there are two basic cases you are saying this will help in.

1. webservers

2. other fileservers

I also freely admit that I don't know a lot about sendfile(), so it may
have some capability that makes my concerns meaningless; if so, please
let me know.

1a. For webservers that serve static content (and can therefore use
sendfile()) I don't see this as significant because, as your tests have
been showing, even a modest machine can saturate your network (unless
you are using gigE, at which point it takes a slightly larger machine).

1b. Webservers that are not primarily serving static content have to
use write() for the output from CGIs, etc., and therefore pay the
performance penalty without being able to use sendfile() much to get the
advantages. These machines are the ones that really need the
performance, as the CGIs take a significant amount of your CPU.

2. For other fileservers, sendfile() sounds like it would be useful if
the client is reading the entire file, but what about the cases where
the client is reading part of the file, or is writing to the file? In
both of these cases it seems that the fileserver is back to the write()
penalty. Does anyone have stats on the types of requests that
fileservers are being asked for?

David Lang




* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 17:51         ` David Lang
@ 2001-02-02 22:46           ` David S. Miller
  2001-02-02 22:57             ` David Lang
  0 siblings, 1 reply; 56+ messages in thread
From: David S. Miller @ 2001-02-02 22:46 UTC (permalink / raw)
  To: David Lang; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com


David Lang writes:
 > 1a. For webservers that serve static content (and can therefore use
 > sendfile()) I don't see this as significant because, as your tests have
 > been showing, even a modest machine can saturate your network (unless
 > you are using gigE, at which point it takes a slightly larger machine)

Start using more than one interface, then it begins to become
interesting.

 > 1b. Webservers that are not primarily serving static content have to
 > use write() for the output from CGIs, etc., and therefore pay the
 > performance penalty without being able to use sendfile() much to get the
 > advantages. These machines are the ones that really need the
 > performance, as the CGIs take a significant amount of your CPU.

CGIs can be cached, BTW, if the implementation is clever (f.e. the CGI
tells the web server that if the file used as input to the CGI does
not change then the output from the CGI will not change; i.e. CGI
output is based solely on its input, and therefore CGI output can be
cached by the web server).

 > 2. For other fileservers, sendfile() sounds like it would be useful if
 > the client is reading the entire file, but what about the cases where
 > the client is reading part of the file, or is writing to the file? In
 > both of these cases it seems that the fileserver is back to the write()
 > penalty. Does anyone have stats on the types of requests that
 > fileservers are being asked for?

It helps no matter what part of the file the client reads.

sendfile() can be used on an arbitrary offset+len portion of
a file; it is not limited to just sending an entire file.
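[A hedged sketch of the offset+len point, not from the original mail:
sendfile() takes an explicit offset and count, so a file server can
ship any byte range of a cached file (a partial read request) rather
than only the whole file. The file contents and range are invented for
the example; a Unix socketpair stands in for a client connection.]

```python
import os
import socket
import tempfile

f = tempfile.TemporaryFile()
f.write(b"0123456789" * 100)        # a 1000-byte "file"
f.flush()

a, b = socket.socketpair()          # stands in for a client connection
# Ship only bytes 200..699: offset=200, count=500.
sent = os.sendfile(a.fileno(), f.fileno(), 200, 500)
a.close()

# The peer receives exactly the requested slice of the file.
chunk = b""
while True:
    piece = b.recv(4096)
    if not piece:
        break
    chunk += piece
```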

Later,
David S. Miller
davem@redhat.com

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 22:46           ` David S. Miller
@ 2001-02-02 22:57             ` David Lang
  2001-02-02 23:09               ` David S. Miller
  2001-02-03  2:27               ` James Sutherland
  0 siblings, 2 replies; 56+ messages in thread
From: David Lang @ 2001-02-02 22:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com

Thanks, that info on sendfile makes sense for the fileserver situation.
For webservers we will have to see (many/most CGIs look at stuff from the
client, so I still have doubts as to how useful caching will be).

David Lang


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 22:57             ` David Lang
@ 2001-02-02 23:09               ` David S. Miller
  2001-02-02 23:13                 ` David Lang
  2001-02-03  2:27               ` James Sutherland
  1 sibling, 1 reply; 56+ messages in thread
From: David S. Miller @ 2001-02-02 23:09 UTC (permalink / raw)
  To: David Lang; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com


David Lang writes:
 > Thanks, that info on sendfile makes sense for the fileserver situation.
 > For webservers we will have to see (many/most CGIs look at stuff from the
 > client, so I still have doubts as to how useful caching will be).

Also note that the decreased CPU utilization resulting from
zerocopy sendfile leaves more CPU available for CGI execution.

This was a point I forgot to make.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 23:09               ` David S. Miller
@ 2001-02-02 23:13                 ` David Lang
  2001-02-02 23:28                   ` Jeff Barrow
  2001-02-02 23:31                   ` David S. Miller
  0 siblings, 2 replies; 56+ messages in thread
From: David Lang @ 2001-02-02 23:13 UTC (permalink / raw)
  To: David S. Miller; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com

Right, assuming that there is enough sendfile() benefit to overcome the
write() penalty from the stuff that can't be cached or sent from a file.

My question was basically: are there enough places where sendfile would
actually be used to make it a net gain?

David Lang


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 23:13                 ` David Lang
@ 2001-02-02 23:28                   ` Jeff Barrow
  2001-02-02 23:31                   ` David S. Miller
  1 sibling, 0 replies; 56+ messages in thread
From: Jeff Barrow @ 2001-02-02 23:28 UTC (permalink / raw)
  To: David Lang; +Cc: David S. Miller, Andrew Morton, lkml, netdev@oss.sgi.com


Let's see.... all the work being done for clustering would definitely
benefit... all the static images on your webserver -- and static images
make up most of the bandwidth from web servers (images, ActiveX controls,
Java apps, sound clips...)... NFS servers, Samba servers (both of which
are used more than you may think)... email servers...

Once Real Networks patches their RealServer to use sendfile (which
shouldn't be all that hard), that would help too....

I think that sendfile can be used in a LOT of applications, and the only
ones that wouldn't benefit are mostly low-bandwidth anyway (CGI apps
almost always return either a small HTML file or a small image file; then
there's telnet and other interactive utilities...).

Most applications that use a lot of bandwidth (and thus a lot of CPU time
sending packets) are capable of being patched to use sendfile.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 23:13                 ` David Lang
  2001-02-02 23:28                   ` Jeff Barrow
@ 2001-02-02 23:31                   ` David S. Miller
  1 sibling, 0 replies; 56+ messages in thread
From: David S. Miller @ 2001-02-02 23:31 UTC (permalink / raw)
  To: David Lang; +Cc: Andrew Morton, lkml, netdev@oss.sgi.com


David Lang writes:
 > Right, assuming that there is enough sendfile() benefit to overcome the
 > write() penalty from the stuff that can't be cached or sent from a file.
 > 
 > My question was basically: are there enough places where sendfile would
 > actually be used to make it a net gain?

There are non-performance issues as well (really, all of these points
have been mentioned in this thread btw).  One is that since paged
SKBs use only single-order page allocations, the memory allocation
subsystem is stressed less than the current scheme where SLAB
allocates multi-order pages to satisfy allocations of linear SKB data
buffers.

This has consequences and benefits system wide.
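The allocation-order point can be made concrete with a little arithmetic (an illustrative sketch only, assuming 4 KiB pages; this is not kernel code):

```python
# Compare the buddy-allocator order needed for one contiguous linear skb
# buffer against paged skbs, which only ever need order-0 (single) pages.
import math

PAGE_SIZE = 4096  # assumed page size

def alloc_order(nbytes):
    """Smallest order n such that 2**n contiguous pages hold nbytes."""
    pages = math.ceil(nbytes / PAGE_SIZE)
    return 0 if pages <= 1 else math.ceil(math.log2(pages))

def order0_pages(nbytes):
    """A paged skb spreads the same payload over independent order-0 pages."""
    return math.ceil(nbytes / PAGE_SIZE)

# A 9000-byte jumbo-frame payload as one linear buffer needs an order-2
# (16 KiB) contiguous allocation; as a paged skb it is three order-0 pages.
print(alloc_order(9000), order0_pages(9000))  # -> 2 3
```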

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-02 22:57             ` David Lang
  2001-02-02 23:09               ` David S. Miller
@ 2001-02-03  2:27               ` James Sutherland
  1 sibling, 0 replies; 56+ messages in thread
From: James Sutherland @ 2001-02-03  2:27 UTC (permalink / raw)
  To: David Lang; +Cc: David S. Miller, Andrew Morton, lkml, netdev@oss.sgi.com

On Fri, 2 Feb 2001, David Lang wrote:

> Thanks, that info on sendfile makes sense for the fileserver situation.
> For webservers we will have to see (many/most CGIs look at stuff from the
> client, so I still have doubts as to how useful caching will be)

CGI performance isn't directly affected by this - the whole point is to
reduce the "cost" of handling static requests to zero (or at least as
close to it as possible), leaving as much CPU as possible for the CGI
to use.

So sendfile won't help your CGI directly - it will just give your CGI more
resources to work with.


James.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-01-31  2:25                 ` Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) Rick Jones
@ 2001-02-04 19:48                   ` jamal
  2001-02-05  5:13                     ` David S. Miller
  2001-02-05 18:51                     ` Rick Jones
  0 siblings, 2 replies; 56+ messages in thread
From: jamal @ 2001-02-04 19:48 UTC (permalink / raw)
  To: Rick Jones; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com



On Tue, 30 Jan 2001, Rick Jones wrote:

> > > How does ZC/SG change the nature of the packets presented to the NIC?
> >
> > what do you mean? I am _sure_ you know how SG/ZC work. So i am suspecting
> > more than socratic view on life here. Could be influence from Aristotle;->
>
> Well, I don't know the specifics of Linux, but I gather from what I've
> read on the list thus far, that prior to implementing SG support, Linux
> NIC drivers would copy packets into single contiguous buffers that were
> then sent to the NIC, yes?
>

yes.

> If so, the implication is with SG going, that copy no longer takes
> place, and so a chain of buffers is given to the NIC.
>

yes.

> Also, if one is fully ZC :) pesky things like protocol headers can
> naturally end up in separate buffers.
>

yes.

> So, now you have to ask how well any given NIC follows chains of
> buffers. At what number of buffers is the overhead in the NIC of
> following the chains enough to keep it from achieving link-rate?
>

Hmmm... not sure how you would enforce this today, or why you would
want to. Alexey, Dave?
The kernel should be able to break it into two buffers (with netperf,
for example -- header + data).
OK, probably three with tux-http (header, data, trailer).

> One way to try and deduce that would be to meld some of the SG and preSG
> behaviours and copy packets into varying numbers of buffers per packet
> and measure the resulting impact on throughput through the NIC.
>

If only time were on my hands I'd love to do this. Alas.
Note also that the effect would depend on the specific NIC.

> rick jones
>
> As time marches on, the orders of magnitude of the constants may change,
> but basic concepts still remain, and the "lessons" learned in the past
> by one generation tend to get relearned in the next :) for example -
> there is no such a thing as a free lunch... :)

;->
BTW, I am reading one of your papers (circa 1993 ;->, "we go fast with a
little help from your apps") in which you make an interesting
observation: that (figure 2) there is "a considerable increase in
efficiency but not a considerable increase in throughput" .... I "scanned"
to the end of the paper and don't see an explanation.
I've made a somewhat similar observation with the current zc patches and
in fact observed that throughput goes down with the linux zc patches.
[This is being contested, but no-one else is testing at gigE, so my word is
the only truth].
Of course your paper doesn't talk about sendfile but rather the page
pinning + COW tricks (which are considered taboo in Linux), but I do
sense a relationship.

cheers,
jamal

PS:- I don't have "my" machines yet, and I have a feeling it will be a while
before I re-run the tests; however, I have created a patch for
linux-sendfile with netperf. Please take a look at it at:
http://www.cyberus.ca/~hadi/patch-nperf-sfile-linux.gz
Tell me if it is missing anything and, if it is OK, could you please
merge it into your tree?




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-04 19:48                   ` jamal
@ 2001-02-05  5:13                     ` David S. Miller
  2001-02-05 18:51                     ` Rick Jones
  1 sibling, 0 replies; 56+ messages in thread
From: David S. Miller @ 2001-02-05  5:13 UTC (permalink / raw)
  To: jamal; +Cc: Rick Jones, Ion Badulescu, Andrew Morton, lkml,
	netdev@oss.sgi.com


jamal writes:
 > > So, now you have to ask how well any given NIC follows chains of
 > > buffers. At what number of buffers is the overhead in the NIC of
 > > following the chains enough to keep it from achieving link-rate?
 > >
 > 
 > hmmm... not sure how you would enforce this today or why you would
 > want that. Alexey, Dave?
 > The kernel should be able to break it into two buffers(with netperf,
 > for example -- header + data).
 > Ok, probably with tux-http 3 (header, data, trailler).

First, just to make sure Jamal understands the point Rick Jones is
making: he is saying that the cost of dealing with extra TX descriptor
ring entries can begin to nullify the gains of zerocopy, depending upon
the HW implementation (both at the NIC and the PCI controller).

Back to today: it is possible that this is an issue if your machine
is near PCI bandwidth saturation before zerocopy for these tests.
I think this may be one of the factors causing Jamal to see results
Alexey cannot reproduce.  Get two people with identical PCI host
bridges and an Acenic in an identical PCI slot, and I bet the numbers
begin to jibe.

Currently, you get "1 + ((MTU + PAGE_SIZE - 1) / PAGE_SIZE)" buffers
per packet when going over a zerocopy device using TCP.
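As a quick worked example of that expression (PAGE_SIZE assumed to be 4096 here):

```python
# Buffers per packet for a zerocopy TCP send, per the formula above:
# 1 + ((MTU + PAGE_SIZE - 1) / PAGE_SIZE), using integer division.
PAGE_SIZE = 4096

def tx_buffers_per_packet(mtu, page_size=PAGE_SIZE):
    # One buffer for the protocol headers, plus one per page of payload.
    return 1 + (mtu + page_size - 1) // page_size

print(tx_buffers_per_packet(1500))  # standard Ethernet MTU -> 2 buffers
print(tx_buffers_per_packet(9000))  # jumbo frames -> 4 buffers
```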

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
  2001-02-04 19:48                   ` jamal
  2001-02-05  5:13                     ` David S. Miller
@ 2001-02-05 18:51                     ` Rick Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Rick Jones @ 2001-02-05 18:51 UTC (permalink / raw)
  To: jamal; +Cc: Ion Badulescu, Andrew Morton, lkml, netdev@oss.sgi.com

> > As time marches on, the orders of magnitude of the constants may change,
> > but basic concepts still remain, and the "lessons" learned in the past
> > by one generation tend to get relearned in the next :) for example -
> > there is no such a thing as a free lunch... :)
> 
> ;->
> BTW, i am reading one of your papers (circa 1993 ;->, "we go fast with a
> little help from your apps")  in which you make an interesting
> observation. That (figure 2) there is "a considerable increase in
> efficiency but not a considerable increase in throughput" .... I "scanned"
> to the end of the paper and dont see an explanation.

That would be the copyavoidance paper using the very old G30 with the
HP-PB (sometimes called PeanutButter) bus :)
(http://ftp.cup.hp.com/dist/networking/briefs/)

No, back then we were not going to describe the dirty laundry of the G30
hardware :) The limiter appears to have been the bus converter from the
SGC (?) main bus of the Novas (8x7,F,G,H,I) to the HP-PB bus. The chip
was (appropriately enough) codenamed "BOA" and it was a constrictor :)

I never had a chance to carry out the tests on an older 852 system -
those have slower CPUs, but HP-PB was _the_ bus in the system.
Prototypes leading to the HP-PB FDDI card achieved 10 MB/s on an 832
system using UDP - this was back in the 1988-1989 timeframe, iirc.

> I've made a somehow similar observation with the current zc patches and
> infact observed that throughput goes down with the linux zc patches.
> [This is being contested but no-one else is testing at gigE, so my word is
> the only truth].
> Of course your paper doesnt talk about sendfile rather the page pinning +
> COW tricks (which are considered taboo in Linux) but i do sense a
> relationship.

Well, the HP-PB FDDI card did follow buffer chains rather well, and
there was no mapping overhead on a Nova - it was a non-coherent I/O
subsystem and DMA was done exclusively with physical addresses (with the
requisite pre-DMA flushes on outbound, and purges on inbound - another
reason why copy-avoidance was such a win, overhead-wise).

Also, there was no throughput drop when going to copy-avoidance in that
stuff. So, I'd say that while some things might "feel" similar, it does
not go much deeper than that.


rick

> PS:- I dont have "my" machines yet and i have a feeling it will be a while
> before i re-run the tests; however, i have created a patch for
> linux-sendfile with netperf. Please take a look at it at:
> http://www.cyberus.ca/~hadi/patch-nperf-sfile-linux.gz
> tell me if is missing anything and if it is ok, could you please merge in
> your tree?

I will take a look.

-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2001-02-05 18:51 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-01-27  5:45 sendfile+zerocopy: fairly sexy (nothing to do with ECN) Andrew Morton
2001-01-27  6:20 ` Aaron Lehmann
2001-01-27  8:19   ` Andrew Morton
2001-01-27 10:09     ` Ion Badulescu
2001-01-27 10:45       ` Andrew Morton
2001-01-30  6:00     ` David S. Miller
2001-01-30 12:44       ` Andrew Morton
2001-01-30 12:52         ` David S. Miller
2001-01-30 14:58           ` Andrew Morton
2001-01-30 17:49             ` Chris Wedgwood
2001-01-30 22:17               ` David S. Miller
2001-01-31  0:31                 ` Chris Wedgwood
2001-01-31  0:45                   ` David S. Miller
2001-01-30 22:28             ` David S. Miller
2001-01-30 23:34               ` Andrew Morton
2001-02-02 10:12       ` Andrew Morton
2001-02-02 12:14         ` Trond Myklebust
2001-02-02 17:51         ` David Lang
2001-02-02 22:46           ` David S. Miller
2001-02-02 22:57             ` David Lang
2001-02-02 23:09               ` David S. Miller
2001-02-02 23:13                 ` David Lang
2001-02-02 23:28                   ` Jeff Barrow
2001-02-02 23:31                   ` David S. Miller
2001-02-03  2:27               ` James Sutherland
2001-01-27 10:05 ` Ion Badulescu
2001-01-27 10:39   ` Andrew Morton
2001-01-27 12:49   ` jamal
2001-01-30  1:06     ` Ion Badulescu
2001-01-30  2:48       ` jamal
2001-01-30  3:26         ` Ion Badulescu
2001-01-31  0:53           ` Still not sexy! (Re: " jamal
2001-01-31  0:59             ` Ingo Molnar
2001-01-31  1:04               ` jamal
2001-01-31  1:14                 ` Ingo Molnar
2001-01-31  1:39                   ` jamal
2001-01-31 11:21                   ` Malcolm Beattie
2001-01-31 11:24                     ` Ingo Molnar
2001-01-31  1:10             ` Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) Rick Jones
2001-01-31  1:45               ` jamal
2001-01-31  2:25                 ` Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) Rick Jones
2001-02-04 19:48                   ` jamal
2001-02-05  5:13                     ` David S. Miller
2001-02-05 18:51                     ` Rick Jones
2001-01-27 12:43 ` sendfile+zerocopy: fairly sexy (nothing to do with ECN) jamal
2001-01-27 13:29   ` Andrew Morton
2001-01-27 14:15     ` jamal
2001-01-28 16:05       ` Andrew Morton
2001-01-29 18:50   ` Rick Jones
     [not found] ` <200101271854.VAA02845@ms2.inr.ac.ru>
2001-01-28  5:34   ` Andrew Morton
2001-01-28 13:37     ` Felix von Leitner
2001-01-28 14:11       ` Dan Hollis
2001-01-28 14:27       ` Andi Kleen
2001-01-29 21:50         ` Pavel Machek
2001-01-28 19:43       ` Gregory Maxwell
2001-01-28 19:48       ` Choosing Linux NICs (was: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)) Felix von Leitner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox