netdev.vger.kernel.org archive mirror
From: Andrew Gallatin <gallatin@myri.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>,
	brice@myri.com, sgruszka@redhat.com, netdev@vger.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment
Date: Tue, 28 Apr 2009 11:00:16 -0400
Message-ID: <49F71A00.5090701@myri.com>
In-Reply-To: <20090428061225.GA1591@gondor.apana.org.au>

Herbert Xu wrote:
 > On Mon, Apr 27, 2009 at 04:05:01PM +0800, Herbert Xu wrote:
 >> On Fri, Apr 24, 2009 at 12:16:08PM -0400, Andrew Gallatin wrote:
 >>> These results are indeed quite close, so the performance problem seems
 >>> isolated to AMD CPUS, and perhaps due to the smaller caches.
 >>> Do you have any AMD you can use as a receiver?
 >> I now have an AMD with 512K cache to test this.  Unfortunately
 >> I'd just locked it up before I got a chance to do any serious
 >> testing.  So it might take a while.
 >
 > OK that's been fixed up.  Indeed the AMD can't do wire speed.
 > But still the performance seems comparable.  Both of them sit
 > between 6600Mb/s and 7100Mb/s.  The sender is running at about
 > 66% idle in either case.

It's strange: I still consistently see about 1Gb/s better performance
from LRO than from GRO on this weak machine (6.5Gb/s with LRO vs.
5.5Gb/s with GRO) when binding everything to the same CPU. mpstat -P 0
shows GRO spending roughly 13 percentage points more time in "soft"
than LRO (about 47% vs. 34%, averaged over three samples):

GRO:
10:17:45     CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
10:17:46       0    0.00    0.00   54.00    0.00    0.00   46.00    0.00  11754.00
10:17:47       0    0.00    0.00   54.00    0.00    1.00   45.00    0.00  11718.00
10:17:48       0    0.00    0.00   47.00    0.00    2.00   51.00    0.00  11639.00


LRO:
10:21:55     CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
10:21:56       0    0.00    0.00   66.00    0.00    1.00   33.00    0.00  13228.00
10:21:57       0    0.00    0.00   65.35    0.00    1.98   32.67    0.00  13118.81
10:21:58       0    0.00    0.00   63.00    0.00    1.00   36.00    0.00  13238.00
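As a quick check of the figures in the paragraph above, the sketch below averages the %soft column of the three mpstat samples per run and computes the relative throughput loss from the quoted 6.5Gb/s and 5.5Gb/s. It assumes three one-second samples are representative of each run; the variable names are illustrative only.

```python
from statistics import mean

# %soft samples copied from the three mpstat -P 0 lines of each run above.
gro_soft = [46.00, 45.00, 51.00]
lro_soft = [33.00, 32.67, 36.00]

# Throughput quoted in the text (Gb/s), everything bound to one CPU.
gro_gbps, lro_gbps = 5.5, 6.5

soft_gap = mean(gro_soft) - mean(lro_soft)          # percentage points
tput_loss = (lro_gbps - gro_gbps) / lro_gbps * 100  # % lost with GRO vs LRO

print(f"mean %soft: GRO {mean(gro_soft):.1f}, LRO {mean(lro_soft):.1f}")
# mean %soft: GRO 47.3, LRO 33.9
print(f"softirq gap: {soft_gap:.1f} points, throughput loss: {tput_loss:.1f}%")
# softirq gap: 13.4 points, throughput loss: 15.4%
```

The roughly 15% loss computed here is the same figure cited at the end of this message.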


According to oprofile, the top 20 symbols by sample count when running GRO are:
CPU: AMD64 processors, speed 2050.03 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
4382     30.5408  vmlinux                  vmlinux                  copy_user_generic_string
534       3.7218  myri10ge.ko              myri10ge                 myri10ge_poll
463       3.2269  vmlinux                  vmlinux                  _raw_spin_lock
394       2.7460  vmlinux                  vmlinux                  rb_get_reader_page
382       2.6624  vmlinux                  vmlinux                  acpi_pm_read
356       2.4812  vmlinux                  vmlinux                  inet_gro_receive
293       2.0421  oprofiled                oprofiled                (no symbols)
268       1.8679  vmlinux                  vmlinux                  find_next_bit
268       1.8679  vmlinux                  vmlinux                  tg_shares_up
257       1.7912  vmlinux                  vmlinux                  ring_buffer_consume
247       1.7215  myri10ge.ko              myri10ge                 myri10ge_alloc_rx_pages
247       1.7215  vmlinux                  vmlinux                  tcp_gro_receive
228       1.5891  vmlinux                  vmlinux                  __free_pages_ok
219       1.5263  vmlinux                  vmlinux                  skb_gro_receive
167       1.1639  vmlinux                  vmlinux                  skb_gro_header
149       1.0385  bash                     bash                     (no symbols)
141       0.9827  vmlinux                  vmlinux                  skb_copy_datagram_iovec
132       0.9200  vmlinux                  vmlinux                  rb_buffer_peek
129       0.8991  vmlinux                  vmlinux                  _raw_spin_unlock
123       0.8573  vmlinux                  vmlinux                  delay_tsc

Nothing really stands out to me.  Here is the same profile for LRO:


Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
4884     33.1164  vmlinux                  vmlinux                  copy_user_generic_string
721       4.8888  myri10ge.ko              myri10ge                 myri10ge_poll
580       3.9327  vmlinux                  vmlinux                  _raw_spin_lock
409       2.7733  vmlinux                  vmlinux                  acpi_pm_read
306       2.0749  vmlinux                  vmlinux                  rb_get_reader_page
293       1.9867  oprofiled                oprofiled                (no symbols)
286       1.9392  myri10ge.ko              myri10ge                 myri10ge_get_frag_header
253       1.7155  vmlinux                  vmlinux                  __lro_proc_segment
250       1.6951  vmlinux                  vmlinux                  rb_buffer_peek
247       1.6748  vmlinux                  vmlinux                  ring_buffer_consume
232       1.5731  vmlinux                  vmlinux                  __free_pages_ok
211       1.4307  myri10ge.ko              myri10ge                 myri10ge_alloc_rx_pages
206       1.3968  vmlinux                  vmlinux                  tg_shares_up
175       1.1866  vmlinux                  vmlinux                  skb_copy_datagram_iovec
158       1.0713  vmlinux                  vmlinux                  find_next_bit
146       0.9900  vmlinux                  vmlinux                  lro_tcp_ip_check
131       0.8883  oprofile.ko              oprofile                 op_cpu_buffer_read_entry
127       0.8611  vmlinux                  vmlinux                  delay_tsc
125       0.8476  bash                     bash                     (no symbols)
125       0.8476  vmlinux                  vmlinux                  _raw_spin_unlock
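One way to compare the two profiles beyond individual symbols is to sum the entries that belong to each aggregation path. The sketch below does this under two stated assumptions. First, the grouping is a hand classification by symbol name, not something oprofile reports. Second, only the top 20 symbols are listed, so helpers below the cutoff are missing. Each percentage is also relative to its own run's sample total, so the result is a rough comparison only.

```python
# Percentages copied from the two opreport listings above.
# The grouping into "aggregation path" symbols is a hand classification.
gro_path = {
    "inet_gro_receive": 2.4812,
    "tcp_gro_receive": 1.7215,
    "skb_gro_receive": 1.5263,
    "skb_gro_header": 1.1639,
}
lro_path = {
    "myri10ge_get_frag_header": 1.9392,
    "__lro_proc_segment": 1.7155,
    "lro_tcp_ip_check": 0.9900,
}

gro_total = sum(gro_path.values())
lro_total = sum(lro_path.values())

print(f"GRO path: {gro_total:.2f}%  LRO path: {lro_total:.2f}%")
# GRO path: 6.89%  LRO path: 4.64%
```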


If I can't figure out why LRO is so much faster in some cases, I'll
probably just put together a patch that keeps LRO and uses GRO only
when LRO is disabled.  Kind of ugly, but better than losing 15%
performance on some machines.
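The fallback policy described above can be modelled in a few lines. This is a hypothetical sketch only: RxPath and choose_rx_path are illustrative names, not the myri10ge driver's or the kernel's API. The real change would live in the driver's receive path in C.

```python
from enum import Enum


class RxPath(Enum):
    """Hypothetical labels for the two receive-aggregation paths."""
    LRO = "lro"
    GRO = "gro"


def choose_rx_path(lro_enabled: bool) -> RxPath:
    """Keep LRO as the receive path; fall back to GRO only when LRO is off."""
    return RxPath.LRO if lro_enabled else RxPath.GRO


# With LRO enabled (the default), frames keep going through LRO; disabling
# LRO (e.g. with `ethtool -K <dev> lro off`) routes them through GRO instead.
print(choose_rx_path(True).value)   # lro
print(choose_rx_path(False).value)  # gro
```

Keying the choice off the existing LRO on/off setting means the proposal would need no new tunable; GRO simply becomes the path used when LRO is turned off.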

Drew

Thread overview: 41+ messages
2009-04-15  8:09 [PATCH] myr10ge: again fix lro_gen_skb() alignment Stanislaw Gruszka
2009-04-15  9:28 ` David Miller
2009-04-15  9:48   ` Brice Goglin
2009-04-15 10:02     ` David Miller
2009-04-15 13:01       ` Andrew Gallatin
2009-04-15 21:04         ` Andrew Gallatin
2009-04-15 23:42           ` David Miller
2009-04-16  8:50             ` Herbert Xu
2009-04-16  9:02               ` David Miller
2009-04-21 19:19               ` Andrew Gallatin
2009-04-22 10:48                 ` Herbert Xu
2009-04-22 15:37                   ` Andrew Gallatin
2009-04-24  5:45                     ` Herbert Xu
2009-04-24 12:45                       ` Andrew Gallatin
2009-04-24 12:51                         ` Herbert Xu
2009-04-24 17:13                         ` Rick Jones
2009-04-24 16:16                       ` Andrew Gallatin
2009-04-24 16:30                         ` Herbert Xu
2009-04-24 16:31                           ` Herbert Xu
2009-04-27  8:05                         ` Herbert Xu
2009-04-27  8:07                           ` Herbert Xu
2009-04-27  9:32                             ` David Miller
2009-04-27 11:01                               ` Herbert Xu
2009-04-27 12:45                             ` David Miller
2009-04-27 12:45                           ` David Miller
2009-04-28  6:12                           ` Herbert Xu
2009-04-28 15:00                             ` Andrew Gallatin [this message]
2009-04-28 15:02                               ` David Miller
2009-04-28 15:20                               ` Herbert Xu
2009-04-28 15:44                                 ` Andrew Gallatin
2009-04-28 21:12                                 ` Andrew Gallatin
2009-04-29 13:42                                   ` Andrew Gallatin
2009-04-29 13:53                                     ` Eric Dumazet
2009-04-29 14:18                                       ` Andrew Gallatin
2009-04-29 15:26                                         ` Eric Dumazet
2009-04-29 17:28                                           ` Andrew Gallatin
2009-04-30  8:10                                             ` Herbert Xu
2009-04-30  8:14                                               ` Herbert Xu
2009-04-30  8:17                                             ` Eric Dumazet
2009-04-30 19:14                                               ` Andrew Gallatin
2009-04-23  8:00                 ` Herbert Xu
