From: Ingo Molnar <mingo@kernel.org>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
linux-kernel@vger.kernel.org, sebastien.dugue@bull.net,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
Date: Sat, 26 Oct 2013 14:01:08 +0200
Message-ID: <20131026120108.GC24067@gmail.com>
In-Reply-To: <20131021201958.GC4154@hmsreliant.think-freely.org>
* Neil Horman <nhorman@tuxdriver.com> wrote:
> On Mon, Oct 21, 2013 at 12:44:05PM -0700, Eric Dumazet wrote:
> > On Mon, 2013-10-21 at 15:21 -0400, Neil Horman wrote:
> >
> > >
> > > Ok, so I ran the above code on a single cpu using taskset, and set irq affinity
> > > such that no interrupts (save for local ones) would occur on that cpu. Note
> > > that I had to convert csum_partial_opt to csum_partial, as the _opt variant
> > > doesn't exist in my tree, nor do I see it in any upstream tree or in the history
> > > anywhere.
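
For anyone trying to reproduce this without the original test module, a minimal
user-space stand-in for the timing loop might look like the sketch below. This
is not the module used in this thread; the buffer size, iteration count and the
plain single-accumulator checksum are assumptions. CPU pinning and IRQ steering
are done outside the program, e.g. with taskset(1) and /proc/irq/<n>/smp_affinity,
as described above.

/*
 * Hedged sketch: a self-contained user-space approximation of the timing
 * loop -- checksum a fixed buffer repeatedly and print the total time per
 * pass in nanoseconds, ten passes in a row.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUFSZ  (64 * 1024)   /* assumed buffer size, not taken from the mail */
#define PASSES 10            /* ten samples, matching the posted results     */
#define ITERS  100000        /* iterations per sample (assumption)           */

/* Plain single-accumulator 16-bit one's-complement sum, reference only. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	const uint16_t *p = (const uint16_t *)buf;

	while (len > 1) {
		sum += *p++;
		len -= 2;
	}
	if (len)
		sum += *(const uint8_t *)p;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
	uint8_t *buf = malloc(BUFSZ);
	volatile uint16_t sink = 0;

	if (!buf)
		return 1;
	for (size_t i = 0; i < BUFSZ; i++)
		buf[i] = (uint8_t)i;

	for (int pass = 0; pass < PASSES; pass++) {
		uint64_t t0 = now_ns();

		for (int i = 0; i < ITERS; i++)
			sink ^= csum16(buf, BUFSZ);   /* keep the result live */
		printf("%llu\n", (unsigned long long)(now_ns() - t0));
	}
	free(buf);
	return 0;
}
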
> >
> > This csum_partial_opt() was a private implementation of csum_partial()
> > so that I could load the module without rebooting the kernel ;)
> >
> > >
> > > base results:
> > > 53569916
> > > 43506025
> > > 43476542
> > > 44048436
> > > 45048042
> > > 48550429
> > > 53925556
> > > 53927374
> > > 53489708
> > > 53003915
> > >
> > > AVG = 492 ns
> > >
> > > prefetching only:
> > > 53279213
> > > 45518140
> > > 49585388
> > > 53176179
> > > 44071822
> > > 43588822
> > > 44086546
> > > 47507065
> > > 53646812
> > > 54469118
> > >
> > > AVG = 488 ns
> > >
> > >
> > > parallel alu's only:
> > > 46226844
> > > 44458101
> > > 46803498
> > > 45060002
> > > 46187624
> > > 37542946
> > > 45632866
> > > 46275249
> > > 45031141
> > > 46281204
> > >
> > > AVG = 449 ns
> > >
> > >
> > > both optimizations:
> > > 45708837
> > > 45631124
> > > 45697135
> > > 45647011
> > > 45036679
> > > 39418544
> > > 44481577
> > > 46820868
> > > 44496471
> > > 35523928
> > >
> > > AVG = 438 ns
> > >
> > >
> > > We continue to see a small savings in execution time with prefetching (4 ns, or
> > > about 0.8%), a better savings with parallel alu execution (43 ns, or 8.7%), and
> > > the best savings with both optimizations (54 ns, or 10.9%).
> > >
> > > These results, while they've changed as we've modified the test case slightly,
> > > have remained consistent in their speedup ordering. Prefetching helps, but
> > > not as much as using multiple alu's, and neither is as good as doing both
> > > together.
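
For readers following along, the two ideas being compared reduce to something
like the C sketch below. The actual candidate patch is x86-64 assembly built
around adcq chains, so this is only an illustration, and the prefetch stride is
a guess rather than a value taken from this thread.

/*
 * Illustrative sketch of the two optimizations under test: two independent
 * accumulators so the additions do not all serialize on one carry/dependency
 * chain (the "parallel alu" idea), plus a software prefetch a few cache
 * lines ahead.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fold a 64-bit one's-complement accumulator down to 32 bits. */
static uint32_t fold64(uint64_t s)
{
	s = (s & 0xffffffffu) + (s >> 32);
	s = (s & 0xffffffffu) + (s >> 32);
	return (uint32_t)s;
}

uint32_t csum_two_accum(const uint64_t *p, size_t nwords)
{
	uint64_t s0 = 0, s1 = 0;
	size_t i;

	for (i = 0; i + 2 <= nwords; i += 2) {
		uint64_t a = p[i], b = p[i + 1];

		/* prefetch ahead; the 5-cache-line stride is arbitrary */
		__builtin_prefetch((const char *)&p[i] + 5 * 64);

		s0 += a;
		if (s0 < a)		/* emulate add-with-carry */
			s0++;
		s1 += b;
		if (s1 < b)
			s1++;
	}
	if (i < nwords) {		/* odd trailing word */
		uint64_t a = p[i];

		s0 += a;
		if (s0 < a)
			s0++;
	}

	/* combine the two partial sums with an end-around carry and fold */
	s0 += s1;
	if (s0 < s1)
		s0++;
	return fold64(s0);
}

int main(void)
{
	static const uint64_t data[4] = { 1, 2, 3, 4 };

	printf("0x%08x\n", csum_two_accum(data, 4));
	return 0;
}

Whether breaking the carry chain like this actually buys anything on a given
microarchitecture is exactly what the numbers above are trying to settle.
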
> > >
> > > Unless you see something else that I'm doing wrong here, it seems like a win to
> > > do both.
> > >
> >
> > Well, I only said (or maybe I forgot) that on my machines I got no
> > improvement at all with the multiple alu or the prefetch (I tried
> > different strides).
> >
> > Only noise in the results.
> >
> I thought you previously said that running netperf gave you a statistically
> significant performance boost when you added prefetching:
> http://marc.info/?l=linux-kernel&m=138178914124863&w=2
>
> But perhaps I missed a note somewhere.
>
> > It seems it depends on cpus and/or multiple factors.
> >
> > Last machine I used for the tests had :
> >
> > processor : 23
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 44
> > model name : Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
> > stepping : 2
> > microcode : 0x13
> > cpu MHz : 2800.256
> > cache size : 12288 KB
> > physical id : 1
> > siblings : 12
> > core id : 10
> > cpu cores : 6
> >
>
> That's about what I'm running with:
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 44
> model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> stepping : 2
> microcode : 0x13
> cpu MHz : 1600.000
> cache size : 12288 KB
> physical id : 0
> siblings : 8
> core id : 0
> cpu cores : 4
>
>
> I can't imagine what would cause the discrepancy in our results (a
> 10% savings in execution time seems significant to me). My only
> thought would be that possibly the alu's on your cpu are faster
> than mine, and reduce the speedup obtained by performing operations
> in parallel, though I can't imagine that's the case with these
> processors being so closely matched.
You keep ignoring my request to calculate and account for the noise
of the measurement.

For example, you are talking about a 0.8% prefetch effect while the
noise in the results is obviously much larger than that, with a
min/max distance of around 5%:
> > > 43476542
> > > 53927374
so the noise of 10 measurements would be around 5-10% (a back-of-the-envelope
calculation).
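
For what it's worth, that spread is easy to quantify. A throwaway program like
the sketch below, fed the ten "base" samples quoted earlier in the thread,
prints the mean, the sample standard deviation and the min/max spread relative
to the mean:

/* Quantify the spread of the ten quoted "base" samples.
 * Build with: cc -O2 noise.c -lm   (the file name is arbitrary) */
#include <math.h>
#include <stdio.h>

int main(void)
{
	static const double s[] = {
		53569916, 43506025, 43476542, 44048436, 45048042,
		48550429, 53925556, 53927374, 53489708, 53003915,
	};
	const int n = (int)(sizeof(s) / sizeof(s[0]));
	double sum = 0.0, min = s[0], max = s[0], mean, var = 0.0;

	for (int i = 0; i < n; i++) {
		sum += s[i];
		if (s[i] < min)
			min = s[i];
		if (s[i] > max)
			max = s[i];
	}
	mean = sum / n;

	for (int i = 0; i < n; i++)
		var += (s[i] - mean) * (s[i] - mean);
	var /= n - 1;				/* sample variance */

	printf("mean    %.0f\n", mean);
	printf("stddev  %.0f  (%.1f%% of mean)\n",
	       sqrt(var), 100.0 * sqrt(var) / mean);
	printf("spread  %.1f%%  ((max - min) / mean)\n",
	       100.0 * (max - min) / mean);
	return 0;
}

Against samples that jump around by that much, a handful of runs cannot
distinguish a sub-1% effect from noise.
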
So you might be right in the end, but the posted data does not
support your claims, statistically.

It's your responsibility to come up with convincing measurements and
results, not that of those who review your work.
Thanks,
Ingo