From: Ingo Molnar <mingo@kernel.org>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: linux-kernel@vger.kernel.org, sebastien.dugue@bull.net,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
Date: Tue, 15 Oct 2013 09:32:48 +0200
Message-ID: <20131015073248.GA25493@gmail.com>
In-Reply-To: <20131014202854.GH26880@hmsreliant.think-freely.org>
* Neil Horman <nhorman@tuxdriver.com> wrote:
> On Sat, Oct 12, 2013 at 07:21:24PM +0200, Ingo Molnar wrote:
> >
> > * Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > > Sébastien Dugué reported to me that devices implementing IPoIB (which
> > > don't have checksum offload hardware) were spending a significant amount
> > > of time computing checksums. We found that by splitting the checksum
> > > computation into two separate streams, each summing alternate elements
> > > of the buffer, we could parallelize the checksum operation
> > > across multiple ALUs. Since neither chain is dependent on the result of
> > > the other, we get a speedup in execution (on hardware that has multiple
> > > ALUs available, which is almost ubiquitous on x86), and only a
> > > negligible decrease on hardware that has only a single ALU (an extra
> > > addition is introduced). Since addition is commutative, the result is
> > > the same, only faster.
> >
> > This patch should really come with measurement numbers: what performance
> > increase (and drop) did you get on what CPUs.
> >
> > Thanks,
> >
> > Ingo
> >
>
>
> So, early testing results today. I wrote a test module that allocated
> a 4k buffer, initialized it with random data, and called csum_partial on
> it 100000 times, recording the time at the start and end of that loop.
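The two-stream idea from the quoted patch can be sketched in plain C. This is a minimal user-space illustration under stated assumptions, not the arch/x86 csum_partial() code: it sums 32-bit words into 64-bit accumulators in native byte order, zero-pads a trailing partial word, and folds to 16 bits only once at the end. The helper names (fold_to_16, tail_word, csum_one_stream, csum_two_streams) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator down to 16 bits with end-around carry. */
static uint16_t fold_to_16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Load the final 0-3 bytes as one zero-padded 32-bit word. */
static uint32_t tail_word(const uint8_t *p, size_t n)
{
	uint32_t w = 0;

	if (n)
		memcpy(&w, p, n);
	return w;
}

/* Baseline: a single accumulator, so each add waits on the previous one. */
static uint16_t csum_one_stream(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4) {
		uint32_t w;

		memcpy(&w, buf + i, 4);
		sum += w;
	}
	sum += tail_word(buf + i, len - i);
	return fold_to_16(sum);
}

/* Two accumulators: even-indexed words feed sum_a, odd-indexed words feed
 * sum_b, so the two chains of additions do not depend on each other. */
static uint16_t csum_two_streams(const uint8_t *buf, size_t len)
{
	uint64_t sum_a = 0, sum_b = 0;
	size_t i;

	for (i = 0; i + 8 <= len; i += 8) {
		uint32_t w0, w1;

		memcpy(&w0, buf + i, 4);
		memcpy(&w1, buf + i + 4, 4);
		sum_a += w0;
		sum_b += w1;
	}
	if (i + 4 <= len) {	/* one leftover full word */
		uint32_t w;

		memcpy(&w, buf + i, 4);
		sum_a += w;
		i += 4;
	}
	sum_a += tail_word(buf + i, len - i);
	/* The single extra addition that merges the two streams. */
	return fold_to_16(sum_a + sum_b);
}
```

Because both variants add exactly the same words into 64-bit accumulators and fold only once, they return identical results for any length; the only difference is that the two-stream loop gives an out-of-order core two independent dependency chains to schedule on separate ALUs.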
It would be nice to stick that testcase into tools/perf/bench/; see how we
are able to benchmark the kernel's memcpy and memset implementations there:
$ perf bench mem memcpy -r help
# Running 'mem/memcpy' benchmark:
Unknown routine:help
Available routines...
default ... Default memcpy() provided by glibc
x86-64-unrolled ... unrolled memcpy() in arch/x86/lib/memcpy_64.S
x86-64-movsq ... movsq-based memcpy() in arch/x86/lib/memcpy_64.S
x86-64-movsb ... movsb-based memcpy() in arch/x86/lib/memcpy_64.S
In a similar fashion we could build the csum_partial() code as well and do
measurements. (We could change arch/x86/ code as well to make such
embedding/including easier, as long as it does not change performance.)
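A csum benchmark could reuse the named-routine pattern that 'perf bench mem memcpy' exposes. The sketch below is a stand-alone user-space approximation of the test module described earlier in the thread (a 4 KiB buffer of random data, timed across many calls with a start and end timestamp). It is not perf's actual bench API: struct csum_routine, find_routine(), bench_routine() and csum_generic() are hypothetical names, and csum_generic() is only a portable stand-in for csum_partial() that adds an odd trailing byte as the low byte, which matches little-endian x86.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef uint32_t (*csum_fn)(const void *buf, size_t len, uint32_t sum);

/* Hypothetical stand-in for csum_partial(): 16-bit one's-complement sum,
 * folded to a 32-bit partial result. */
static uint32_t csum_generic(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	uint64_t acc = sum;
	size_t i;

	for (i = 0; i + 2 <= len; i += 2) {
		uint16_t w;

		memcpy(&w, p + i, 2);
		acc += w;
	}
	if (i < len)
		acc += p[i];	/* odd trailing byte, little-endian order */
	while (acc >> 32)
		acc = (acc & 0xffffffffu) + (acc >> 32);
	return (uint32_t)acc;
}

/* Named routines, in the spirit of perf bench's "Available routines" list. */
struct csum_routine {
	const char *name;
	const char *desc;
	csum_fn fn;
};

static const struct csum_routine routines[] = {
	{ "generic", "portable C loop (illustrative only)", csum_generic },
	{ NULL, NULL, NULL },
};

/* Returns NULL for an unknown name, like perf's "Unknown routine" case. */
static const struct csum_routine *find_routine(const char *name)
{
	const struct csum_routine *r;

	for (r = routines; r->name; r++)
		if (strcmp(r->name, name) == 0)
			return r;
	return NULL;
}

/* Time 'loops' calls of fn over a len-byte buffer of pseudo-random data.
 * Returns elapsed nanoseconds (or -1.0 if allocation fails); *result gets
 * the last checksum so the timed calls cannot be optimized away. */
static double bench_routine(csum_fn fn, size_t len, unsigned long loops,
			    uint32_t *result)
{
	unsigned char *buf = malloc(len);
	struct timespec start, end;
	volatile uint32_t sink = 0;
	unsigned long n;
	size_t i;

	if (!buf)
		return -1.0;
	srand(1);
	for (i = 0; i < len; i++)
		buf[i] = (unsigned char)rand();

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (n = 0; n < loops; n++)
		sink = fn(buf, len, 0);
	clock_gettime(CLOCK_MONOTONIC, &end);
	assert(end.tv_sec >= start.tv_sec);	/* monotonic clock */

	*result = sink;
	free(buf);
	return (end.tv_sec - start.tv_sec) * 1e9 +
	       (end.tv_nsec - start.tv_nsec);
}
```

A caller would look up a routine by name, run bench_routine(r->fn, 4096, 100000, &res) and report the elapsed time divided by the loop count; adding a two-stream variant is then just one more row in the routines[] table, so both can be compared on the same buffer and CPU.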
Thanks,
Ingo