linuxppc-dev.lists.ozlabs.org archive mirror
* GigE Performance Comparison of GMAC and SUNGEM Drivers
@ 2001-11-18  6:45 Bill Fink
  2001-11-19  0:15 ` Anton Blanchard
  0 siblings, 1 reply; 12+ messages in thread
From: Bill Fink @ 2001-11-18  6:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Bill Fink


Hi,

I just did a GigE performance comparison of the GMAC and SUNGEM drivers
using the latest 2.4.15-pre4-ben0 kernel.  The two test systems were
both 867 MHz G4s connected to two ports on the same Extreme 5i GigE
switch.  The test was simply to measure the sustained TCP network
throughput for a 60 second period (using memory-to-memory transfers
of 64 KB buffers with a 768 KB window size).  The test was run shortly
after both systems had been rebooted, and there was nothing else of
significance running on either system (not even X windows).

The GMAC driver had significantly better performance.  It sustained
663 Mbps for the 60 second test period, and used 63 % of the CPU on
the transmitter and 64 % of the CPU on the receiver.  By comparison,
the SUNGEM driver only achieved 588 Mbps, and utilized 100 % of the
CPU on the transmitter and 86 % of the CPU on the receiver.  Thus,
the SUNGEM driver had an 11.3 % lower network performance while
using 58.7 % more CPU (and was in fact totally CPU saturated).

I was actually somewhat disappointed even by the GMAC GigE performance.
I was expecting to achieve nearly full GigE performance, and since there
was still about 1/3 of the CPU available, the bottleneck was obviously
elsewhere.  Perhaps it is just a limitation of the actual Broadcom BCM5411
GigE chip built into the 867 MHz G4.  I am hoping that this is in fact
the case.  I will be trying more tests later using a NetGear GA620T
PCI NIC using the ACENIC driver to see if it has better performance.
This NetGear NIC is also supposed to support jumbo frames (9K MTU),
and I am very interested in determining the presumably significant
performance benefits and/or reduced CPU usage associated with using
jumbo frames.
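
For reference, enabling jumbo frames on a NIC that supports them should
just be a matter of raising the interface MTU -- "ifconfig eth0 mtu 9000",
or programmatically something like the minimal sketch below (the interface
name and the 9000 value are only illustrative):

/* Minimal sketch: set a 9000 byte MTU via SIOCSIFMTU (what
 * "ifconfig eth0 mtu 9000" does).  Needs root and a driver that
 * accepts the new MTU. */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>

int main(void)
{
        struct ifreq ifr;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
        ifr.ifr_mtu = 9000;

        if (ioctl(fd, SIOCSIFMTU, &ifr) < 0)
                perror("SIOCSIFMTU");

        close(fd);
        return 0;
}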

						-Bill



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-18  6:45 GigE Performance Comparison of GMAC and SUNGEM Drivers Bill Fink
@ 2001-11-19  0:15 ` Anton Blanchard
  2001-11-19 12:54   ` Benjamin Herrenschmidt
  2001-11-21  3:46   ` Bill Fink
  0 siblings, 2 replies; 12+ messages in thread
From: Anton Blanchard @ 2001-11-19  0:15 UTC (permalink / raw)
  To: Bill Fink; +Cc: linuxppc-dev



Hi,

> The GMAC driver had significantly better performance.  It sustained
> 663 Mbps for the 60 second test period, and used 63 % of the CPU on
> the transmitter and 64 % of the CPU on the receiver.  By comparison,
> the SUNGEM driver only achieved 588 Mbps, and utilized 100 % of the
> CPU on the transmitter and 86 % of the CPU on the receiver.  Thus,
> the SUNGEM driver had an 11.3 % lower network performance while
> using 58.7 % more CPU (and was in fact totally CPU saturated).

It would be interesting to see where the cpu is being used. Could you
boot with profile=2 and use readprofile to find the worst cpu hogs
during a run?

> I will be trying more tests later using a NetGear GA620T
> PCI NIC using the ACENIC driver to see if it has better performance.
> This NetGear NIC is also supposed to support jumbo frames (9K MTU),
> and I am very interested in determining the presumably significant
> performance benefits and/or reduced CPU usage associated with using
> jumbo frames.

On two ppc64 machines I can get up to 100MB/s payload using 1500 byte MTU.
When using zero copy this drops to 80MB/s (I guess the MIPS cpu on the
acenic is flat out), but the host cpu usage is much less of course.

With 9K MTU I can get ~122.5MB/s payload which is pretty good.

PS: Be sure to increase all the /proc/sys/net/.../*mem* sysctl variables.
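
Something like the minimal sketch below is one way to script that; the
paths are the usual knobs and the 4 MB values are only illustrative
(writing under /proc/sys needs root):

/* Minimal sketch: raise the socket buffer limits by writing the usual
 * /proc/sys/net knobs.  The 4 MB numbers are only an example. */
#include <stdio.h>

static void set_sysctl(const char *path, const char *value)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return;
        }
        fprintf(f, "%s\n", value);
        fclose(f);
}

int main(void)
{
        /* Per-socket caps honoured by SO_SNDBUF/SO_RCVBUF. */
        set_sysctl("/proc/sys/net/core/rmem_max", "4194304");
        set_sysctl("/proc/sys/net/core/wmem_max", "4194304");
        /* min/default/max triples for TCP's automatic buffer sizing. */
        set_sysctl("/proc/sys/net/ipv4/tcp_rmem", "4096 87380 4194304");
        set_sysctl("/proc/sys/net/ipv4/tcp_wmem", "4096 65536 4194304");
        return 0;
}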

Anton


* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-19  0:15 ` Anton Blanchard
@ 2001-11-19 12:54   ` Benjamin Herrenschmidt
  2001-11-20  6:34     ` Bill Fink
  2001-11-21  3:46   ` Bill Fink
  1 sibling, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2001-11-19 12:54 UTC (permalink / raw)
  To: Bill Fink; +Cc: Anton Blanchard, linuxppc-dev, David S. Miller


>
>
>Hi,
>
>> The GMAC driver had significantly better performance.  It sustained
>> 663 Mbps for the 60 second test period, and used 63 % of the CPU on
>> the transmitter and 64 % of the CPU on the receiver.  By comparison,
>> the SUNGEM driver only achieved 588 Mbps, and utilized 100 % of the
>> CPU on the transmitter and 86 % of the CPU on the receiver.  Thus,
>> the SUNGEM driver had an 11.3 % lower network performance while
>> using 58.7 % more CPU (and was in fact totally CPU saturated).

This is weird and unexpected, as GMAC will request an interrupt for each
transmitted packet while sungem won't.

However, I noticed that sungem is getting a lot of rxmac and txmac
interrupts, I'll investigate this a bit more.

(Could you check the difference of /proc/interrupts between a test
with gmac and a test with sungem ?)
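
Grabbing the eth0 lines of /proc/interrupts once before and once after
the transfer is enough for that; a minimal sketch (it just assumes the
interface is called eth0):

/* Minimal sketch: print the eth0 line(s) of /proc/interrupts, so a
 * before/after pair gives the interrupt count for one test run. */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/proc/interrupts", "r");

        if (!f) {
                perror("/proc/interrupts");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                if (strstr(line, "eth0"))
                        fputs(line, stdout);
        }
        fclose(f);
        return 0;
}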

Note that I just updated sungem in my rsync tree, it now has all of
the power management and ethtool/miitool support.
I plan to replace gmac with sungem completely, so it would be nice to
figure out where that problem comes from.

Ben.



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-19 12:54   ` Benjamin Herrenschmidt
@ 2001-11-20  6:34     ` Bill Fink
  2001-11-20 12:02       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Bill Fink @ 2001-11-20  6:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bill Fink, Anton Blanchard, linuxppc-dev, David S. Miller


On Mon, 19 Nov 2001, Benjamin Herrenschmidt wrote:

> >> The GMAC driver had significantly better performance.  It sustained
> >> 663 Mbps for the 60 second test period, and used 63 % of the CPU on
> >> the transmitter and 64 % of the CPU on the receiver.  By comparison,
> >> the SUNGEM driver only achieved 588 Mbps, and utilized 100 % of the
> >> CPU on the transmitter and 86 % of the CPU on the receiver.  Thus,
> >> the SUNGEM driver had an 11.3 % lower network performance while
> >> using 58.7 % more CPU (and was in fact totally CPU saturated).
>
> This is weird and unexpected, as GMAC will request an interrupt for each
> transmitted packet while sungem won't.
>
> However, I noticed that sungem is getting a lot of rxmac and txmac
> interrupts, I'll investigate this a bit more.
>
> (Could you check the difference of /proc/interrupts between a test
> with gmac and a test with sungem ?)

Hi Ben,

OK.  Here's the GMAC test:

 60 second test:  4698.401 MB at 656.7557 Mbps (63 % TX, 63 % RX)

 Transmitter before and after:

 41:        191   OpenPIC   Level     eth0
 41:     476734   OpenPIC   Level     eth0

 Receiver before and after:

 41:        264   OpenPIC   Level     eth0
 41:    1157318   OpenPIC   Level     eth0

And here's the SUNGEM test:

 60 second test:  4223.125 MB at 590.4346 Mbps (100 % TX, 87 % RX)

 Transmitter before and after:

 41:        193   OpenPIC   Level     eth0
 41:    4673225   OpenPIC   Level     eth0

 Receiver before and after:

 41:        229   OpenPIC   Level     eth0
 41:    3610859   OpenPIC   Level     eth0

Taking the GMAC case, 4698.401 MB works out to 3284421 1500-byte MTU
packets (not counting TCP/IP overhead), so it would appear that the
GMAC driver is doing some type of interrupt coalescing and that the
SUNGEM driver isn't.

> Note that I just updated sungem in my rsync tree, it now has all of
> the power management and ethtool/miitool support.
> I plan to replace gmac with sungem completely, so it would be nice to
> figure out where that problem comes from.

I'd consider it much more than nice.  Since the whole point of GigE
is better performance, taking such a huge performance/CPU hit would
be extremely bad.  OTOH, I probably won't be using the built-in GigE
hardware anyway because of its apparent performance ceiling of about
660 Mbps and its lack of jumbo frame support.

						-Bill



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-20  6:34     ` Bill Fink
@ 2001-11-20 12:02       ` Benjamin Herrenschmidt
  2001-11-20 19:50         ` David S. Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2001-11-20 12:02 UTC (permalink / raw)
  To: Bill Fink; +Cc: David S. Miller, linuxppc-dev, Anton Blanchard


>
>I'd consider it much more than nice.  Since the whole point of GigE
>is better performance, taking such a huge performance/CPU hit would
>be extremely bad.  OTOH, I probably won't be using the built-in GigE
>hardware anyway because of its apparent performance ceiling of about
>660 Mbps and its lack of jumbo frame support.

Well, I think we may get overall better perfs once we have this
fixed, and (David, can you confirm ?), I think jumbo frames can
be supported.

So there is a small issue with the Tx ring that I'll fix asap,
but it doesn't explain your problem: you are getting way too
many interrupts, possibly those rxmac/txmac interrupts I've noticed
here.

I'll investigate a bit and send you a patched driver.

Thanks for taking the time to test this,

Ben



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-20 12:02       ` Benjamin Herrenschmidt
@ 2001-11-20 19:50         ` David S. Miller
  2001-11-21  0:05           ` benh
  0 siblings, 1 reply; 12+ messages in thread
From: David S. Miller @ 2001-11-20 19:50 UTC (permalink / raw)
  To: benh; +Cc: billfink, linuxppc-dev, anton


   From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
   Date: Tue, 20 Nov 2001 13:02:32 +0100

   Well, I think we may get overall better perfs once we have this
   fixed, and (David, can you confirm ?), I think jumbo frames can
   be supported.

No, the GEM chips are broken and don't support jumbo frames.

Franks a lot,
David S. Miller
davem@redhat.com



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-20 19:50         ` David S. Miller
@ 2001-11-21  0:05           ` benh
  2001-11-21  0:19             ` David S. Miller
  0 siblings, 1 reply; 12+ messages in thread
From: benh @ 2001-11-21  0:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, billfink, linuxppc-dev


>   Well, I think we may get overall better perfs once we have this
>   fixed, and (David, can you confirm ?), I think jumbo frames can
>   be supported.
>
>No, the GEM chips are broken and don't support jumbo frames.

Ah, ok, I stand corrected.  Is this a bug in the chip or a flaw in
the GEM chip design?  I mean, is there a chance that later revs
of the chip, or perhaps the one used by Apple, could support it?

Ben.



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-21  0:05           ` benh
@ 2001-11-21  0:19             ` David S. Miller
  2001-11-21  1:25               ` Benjamin Herrenschmidt
  2001-11-21  4:17               ` Bill Fink
  0 siblings, 2 replies; 12+ messages in thread
From: David S. Miller @ 2001-11-21  0:19 UTC (permalink / raw)
  To: benh; +Cc: anton, billfink, linuxppc-dev


   From: benh@kernel.crashing.org
   Date: Wed, 21 Nov 2001 01:05:18 +0100

   Ah, ok, I stand corrected.  Is this a bug in the chip or a flaw in
   the GEM chip design?  I mean, is there a chance that later revs
   of the chip, or perhaps the one used by Apple, could support it?

There is no chance whatsoever of this ever working on any
GEM revision.

Even if it could work, GEM has a 9K transmit and 20K receive fifo in
the largest configuration.  The Acenic, by comparison, has 512K or 1MB
of total on-chip ram for packet buffering.

As a result GEM sends pause frames when there is even the slightest
amount of DMA traffic it has to compete with.  On a 33 MHz/32-bit PCI
bus, it is sending pause frames all the time even if it is the only
agent making use of the bus.


* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-21  0:19             ` David S. Miller
@ 2001-11-21  1:25               ` Benjamin Herrenschmidt
  2001-11-21  4:17               ` Bill Fink
  1 sibling, 0 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2001-11-21  1:25 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton, billfink, linuxppc-dev


>As a result GEM sends pause frames when there is even the slightest
>amount of DMA traffic it has to compete with.  On a 33 MHz/32-bit PCI
>bus, it is sending pause frames all the time even if it is the only
>agent making use of the bus.

Ok. At least, the UniNorth version runs on a 66Mhz bus directly
hooked to the CPU bus within the north bridge. The only other device
sharing that PCI segment is FireWire (well, it's a bandwidth hog
too be more rarely used so far).

Thanks for the clarification.

Ben.



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-19  0:15 ` Anton Blanchard
  2001-11-19 12:54   ` Benjamin Herrenschmidt
@ 2001-11-21  3:46   ` Bill Fink
  2001-11-21  5:36     ` Anton Blanchard
  1 sibling, 1 reply; 12+ messages in thread
From: Bill Fink @ 2001-11-21  3:46 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Bill Fink, linuxppc-dev


Hi Anton,

On Mon, 19 Nov 2001, Anton Blanchard wrote:

> > The GMAC driver had significantly better performance.  It sustained
> > 663 Mbps for the 60 second test period, and used 63 % of the CPU on
> > the transmitter and 64 % of the CPU on the receiver.  By comparison,
> > the SUNGEM driver only achieved 588 Mbps, and utilized 100 % of the
> > CPU on the transmitter and 86 % of the CPU on the receiver.  Thus,
> > the SUNGEM driver had an 11.3 % lower network performance while
> > using 58.7 % more CPU (and was in fact totally CPU saturated).
>
> It would be interesting to see where the cpu is being used. Could you
> boot with profile=2 and use readprofile to find the worst cpu hogs
> during a run?

Since Ben suspected, and there does indeed seem to be, an abnormally
large number of interrupts with the SUNGEM driver, I haven't pursued
your suggestion yet.  However, since it sounds like an interesting
tool to have in one's arsenal and I've never used it before, I'll
probably give it a try a little later.

> > I will be trying more tests later using a NetGear GA620T
> > PCI NIC using the ACENIC driver to see if it has better performance.
> > This NetGear NIC is also supposed to support jumbo frames (9K MTU),
> > and I am very interested in determining the presumably significant
> > performance benefits and/or reduced CPU usage associated with using
> > jumbo frames.
>
> On two ppc64 machines I can get up to 100MB/s payload using 1500 byte MTU.
> When using zero copy this drops to 80MB/s (I guess the MIPS cpu on the
> acenic is flat out), but the host cpu usage is much less of course.
>
> With 9K MTU I can get ~122.5MB/s payload which is pretty good.

That's an understatement.  That's damn good!  I hope I can reproduce
that.  What NIC card were you using?  Unfortunately, I just checked
the NetGear web page and they don't seem to have the GA620T anymore.
They now have a GA622T, but I believe that uses a different chip,
which I don't think is supported by the acenic driver.

> PS: Be sure to increase all the /proc/sys/net/.../*mem* sysctl variables.

I had set the /proc/sys/net/core/[rw]mem_max to 1 MB each, which was
sufficient since my test application uses the SO_SNDBUF and SO_RCVBUF
socket options (via setsockopt) to explicitly set the TCP transmitter
and receiver window sizes.  However, I now also noticed the
/proc/sys/net/ipv4/tcp_[rw]mem variables, for which I found some terse
documentation explaining that they are used to automatically select
the receive and send buffer sizes for the
TCP socket.  Is there any more extensive documentation anywhere for
how this auto tuning of TCP receive and send buffers is done?
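
For reference, the explicit window sizing amounts to something like the
minimal sketch below -- not the actual test program; the peer address
and port are made up, and the kernel caps these values at the
[rw]mem_max limits mentioned above:

/* Minimal sketch: size the TCP windows explicitly with setsockopt()
 * before connecting.  768 KB matches the tests above; the peer address
 * and port below are made up. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void)
{
        int window = 768 * 1024;        /* bytes */
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in peer;

        /* Set before connect() so the window scale is negotiated on the
         * SYN; the kernel caps these at /proc/sys/net/core/[rw]mem_max. */
        setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &window, sizeof(window));
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &window, sizeof(window));

        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(5001);            /* made-up port */
        peer.sin_addr.s_addr = inet_addr("192.168.1.2");  /* made up */

        if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) < 0)
                perror("connect");

        close(sock);
        return 0;
}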

						-Thanks

						-Bill

P.S.  It turns out that my using such a large window size of 768 KB was
      having an adverse impact on performance.  I'm used to doing tests
      across MANs and WANs.  But for the simple case of a local GigE
      switch, where the RTT is only about 0.12 msec, the necessary TCP
      window size (BW*RTT = 1 Gbps x 0.12 msec) is only about 15 KB (talk
      about overkill with my 768 KB window size).  I did another test with
      the GMAC driver
      just using the default TCP send and receive window sizes, and was
      able to achieve about 720 Mbps.



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-21  0:19             ` David S. Miller
  2001-11-21  1:25               ` Benjamin Herrenschmidt
@ 2001-11-21  4:17               ` Bill Fink
  1 sibling, 0 replies; 12+ messages in thread
From: Bill Fink @ 2001-11-21  4:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: benh, anton, billfink, linuxppc-dev


On Tue, 20 Nov 2001, David S. Miller wrote:

>    From: benh@kernel.crashing.org
>    Date: Wed, 21 Nov 2001 01:05:18 +0100
>
>    Ah, ok, I stand corrected.  Is this a bug in the chip or a flaw in
>    the GEM chip design?  I mean, is there a chance that later revs
>    of the chip, or perhaps the one used by Apple, could support it?
>
> There is no chance whatsoever of this ever working on any
> GEM revision.
>
> Even if it could work, GEM has a 9K transmit and 20K receive fifo in
> the largest configuration.  The Acenic, by comparison, has 512K or 1MB
> of total on-chip ram for packet buffering.
>
> As a result GEM sends pause frames when there is even the slightest
> amount of DMA traffic it has to compete with.  On a 33 MHz/32-bit PCI
> bus, it is sending pause frames all the time even if it is the only
> agent making use of the bus.

Ah, maybe the limited on-chip buffering with the GEM chips explains
the performance cap of about 720 Mbps that I am experiencing with my
testing.  Since Ben indicated that the GEM is on a 66 MHz PCI bus,
the PCI bus itself has a bandwidth in excess of 2 Gbps (32-bit) or
4 Gbps (64-bit).  I don't know whether the GEM/PCI is 32-bit or 64-bit.

BTW, is there any way of checking on the system what the bandwidth
and width of the various PCI buses are?  I can see from /proc/pci that
there are 3 PCI buses on the 867 MHz G4, and that the GEM is on the third
PCI bus with the FireWire, but I don't see anything obvious that indicates
what the bus bandwidth or width is.  Also, I'm now wondering if I'll have
any contention problems if and when I get any FireWire disks.

						-Thanks

						-Bill



* Re: GigE Performance Comparison of GMAC and SUNGEM Drivers
  2001-11-21  3:46   ` Bill Fink
@ 2001-11-21  5:36     ` Anton Blanchard
  0 siblings, 0 replies; 12+ messages in thread
From: Anton Blanchard @ 2001-11-21  5:36 UTC (permalink / raw)
  To: Bill Fink; +Cc: linuxppc-dev



> That's an understatement.  That's damn good!  I hope I can reproduce
> that.  What NIC card were you using?  Unfortunately, I just checked
> the NetGear web page and they don't seem to have the GA620T anymore.
> They now have a GA622T, but I believe that uses a different chip,
> which I don't think is supported by the acenic driver.

Yes, I use acenic type cards (IBM and Netgear GA620).  I haven't tested
the GA622 at all, although I think there is a Linux driver for it.

> Is there any more extensive documentation anywhere for
> how this auto tuning of TCP receive and send buffers is done?

Check out Documentation/networking/ip-sysctl.txt in the kernel source
for some information.  It's not much but it's a start :)

Anton

