From: Andrew Gallatin <gallatin@myri.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>,
brice@myri.com, sgruszka@redhat.com, netdev@vger.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment
Date: Tue, 28 Apr 2009 11:00:16 -0400
Message-ID: <49F71A00.5090701@myri.com>
In-Reply-To: <20090428061225.GA1591@gondor.apana.org.au>
Herbert Xu wrote:
> On Mon, Apr 27, 2009 at 04:05:01PM +0800, Herbert Xu wrote:
>> On Fri, Apr 24, 2009 at 12:16:08PM -0400, Andrew Gallatin wrote:
>>> These results are indeed quite close, so the performance problem seems
>>> isolated to AMD CPUS, and perhaps due to the smaller caches.
>>> Do you have any AMD you can use as a receiver?
>> I now have an AMD with 512K cache to test this. Unfortunately
>> I'd just locked it up before I got a chance to do any serious
>> testing. So it might take a while.
>
> OK that's been fixed up. Indeed the AMD can't do wire speed.
> But still the performance seems comparable. Both of them sit
> between 6600Mb/s and 7100Mb/s. The sender is running at about
> 66% idle in either case.
It's strange; I still consistently see about 1Gb/s better performance
from LRO than from GRO on this weak machine (6.5Gb/s LRO, 5.5Gb/s GRO)
when binding everything to the same CPU. mpstat -P 0 shows roughly
10% more time spent in "soft" when using GRO vs. LRO:
GRO:
10:17:45  CPU  %user  %nice  %system  %iowait   %irq  %soft  %idle    intr/s
10:17:46    0   0.00   0.00    54.00     0.00   0.00  46.00   0.00  11754.00
10:17:47    0   0.00   0.00    54.00     0.00   1.00  45.00   0.00  11718.00
10:17:48    0   0.00   0.00    47.00     0.00   2.00  51.00   0.00  11639.00
LRO:
10:21:55  CPU  %user  %nice  %system  %iowait   %irq  %soft  %idle    intr/s
10:21:56    0   0.00   0.00    66.00     0.00   1.00  33.00   0.00  13228.00
10:21:57    0   0.00   0.00    65.35     0.00   1.98  32.67   0.00  13118.81
10:21:58    0   0.00   0.00    63.00     0.00   1.00  36.00   0.00  13238.00
According to oprofile, the top 20 symbols by sample count when running GRO are:
CPU: AMD64 processors, speed 2050.03 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        image name   app name   symbol name
4382     30.5408  vmlinux      vmlinux    copy_user_generic_string
534      3.7218   myri10ge.ko  myri10ge   myri10ge_poll
463      3.2269   vmlinux      vmlinux    _raw_spin_lock
394      2.7460   vmlinux      vmlinux    rb_get_reader_page
382      2.6624   vmlinux      vmlinux    acpi_pm_read
356      2.4812   vmlinux      vmlinux    inet_gro_receive
293      2.0421   oprofiled    oprofiled  (no symbols)
268      1.8679   vmlinux      vmlinux    find_next_bit
268      1.8679   vmlinux      vmlinux    tg_shares_up
257      1.7912   vmlinux      vmlinux    ring_buffer_consume
247      1.7215   myri10ge.ko  myri10ge   myri10ge_alloc_rx_pages
247      1.7215   vmlinux      vmlinux    tcp_gro_receive
228      1.5891   vmlinux      vmlinux    __free_pages_ok
219      1.5263   vmlinux      vmlinux    skb_gro_receive
167      1.1639   vmlinux      vmlinux    skb_gro_header
149      1.0385   bash         bash       (no symbols)
141      0.9827   vmlinux      vmlinux    skb_copy_datagram_iovec
132      0.9200   vmlinux      vmlinux    rb_buffer_peek
129      0.8991   vmlinux      vmlinux    _raw_spin_unlock
123      0.8573   vmlinux      vmlinux    delay_tsc
Nothing really stands out to me. Here is the LRO profile:
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        image name   app name   symbol name
4884     33.1164  vmlinux      vmlinux    copy_user_generic_string
721      4.8888   myri10ge.ko  myri10ge   myri10ge_poll
580      3.9327   vmlinux      vmlinux    _raw_spin_lock
409      2.7733   vmlinux      vmlinux    acpi_pm_read
306      2.0749   vmlinux      vmlinux    rb_get_reader_page
293      1.9867   oprofiled    oprofiled  (no symbols)
286      1.9392   myri10ge.ko  myri10ge   myri10ge_get_frag_header
253      1.7155   vmlinux      vmlinux    __lro_proc_segment
250      1.6951   vmlinux      vmlinux    rb_buffer_peek
247      1.6748   vmlinux      vmlinux    ring_buffer_consume
232      1.5731   vmlinux      vmlinux    __free_pages_ok
211      1.4307   myri10ge.ko  myri10ge   myri10ge_alloc_rx_pages
206      1.3968   vmlinux      vmlinux    tg_shares_up
175      1.1866   vmlinux      vmlinux    skb_copy_datagram_iovec
158      1.0713   vmlinux      vmlinux    find_next_bit
146      0.9900   vmlinux      vmlinux    lro_tcp_ip_check
131      0.8883   oprofile.ko  oprofile   op_cpu_buffer_read_entry
127      0.8611   vmlinux      vmlinux    delay_tsc
125      0.8476   bash         bash       (no symbols)
125      0.8476   vmlinux      vmlinux    _raw_spin_unlock
If I can't figure out why LRO is so much faster in some cases, I think
I'll just put together a patch that keeps LRO and uses GRO only if LRO
is disabled. Kind of ugly, but better than losing 15% performance on
some machines.
Drew