From: Joel Schopp <jschopp@austin.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org, paulus@samba.org, anton@samba.org
Subject: Re: [PATCH/RFC] 64 bit csum_partial_copy_generic
Date: Thu, 11 Sep 2008 12:44:44 -0500
Message-ID: <48C9590C.2020508@austin.ibm.com>
In-Reply-To: <37E422D0-D937-4C5E-A4A2-EF911B4149E3@kernel.crashing.org>
> Did you consider the other alternative? If you work on 32-bit chunks
> instead of 64-bit chunks (either load them with lwz, or split them
> after loading with ld), you can add them up with a regular non-carrying
> add, which isn't serialising like adde; this also allows unrolling the
> loop (using several accumulators instead of just one). Since your
> registers are 64-bit, you can sum 16GB of data before ever getting a
> carry out.
>
> Or maybe the bottleneck here is purely the memory bandwidth?
I think the main bottleneck is the bandwidth/latency of memory.

When I sent the patch out I hadn't thought about eliminating the e from
the add with 32 bit chunks. So I tried it today: I converted the
existing function to use plain add instead of adde (it was only doing
32 bits at a time already) and got 1.5% - 15.7% faster on Power5. That
is nice, but it was still way behind the new function in every
testcase. I then added 1 level of unrolling to that (using 2
accumulators) and got anywhere from 59% slower to 10% faster on Power5
depending on input. That is quite a bit slower than I would have
expected (I would have expected basically even), but that's what got
measured. The comment in the existing function indicates unrolling the
loop doesn't help because the bdnz has zero overhead, so I guess the
unrolling hurt more than I expected.

In any case I have now thought about it and don't think it will work out.
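
For anyone following along, here is a rough C sketch of the 32-bit-chunk idea discussed above: two 64-bit accumulators, plain adds with no carry chain, and a single fold at the end. It is not the PowerPC assembly from the patch or from the existing function. The names csum32_unrolled and fold64 are made up for illustration, and the sketch assumes the buffer is small enough (at most 16GB) that the accumulators cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 64-bit sum down to a 16-bit
 * ones' complement sum using end-around carries. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* fits in 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* fits in 16 bits */
	return (uint16_t)sum;
}

/* Hypothetical illustration: sum native-endian 32-bit words into two
 * independent 64-bit accumulators, so no add depends on a carry flag. */
static uint16_t csum32_unrolled(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t acc0 = 0, acc1 = 0;
	uint32_t w0, w1;

	/* Assumption: at most 2^32 words in total, so acc0 + acc1
	 * cannot wrap around 64 bits. */
	assert((uint64_t)len <= (UINT64_C(1) << 34));

	while (len >= 8) {
		memcpy(&w0, p, 4);	/* alignment-safe 32-bit loads */
		memcpy(&w1, p + 4, 4);
		acc0 += w0;		/* the two adds are independent */
		acc1 += w1;
		p += 8;
		len -= 8;
	}
	if (len >= 4) {
		memcpy(&w0, p, 4);
		acc0 += w0;
		p += 4;
		len -= 4;
	}
	if (len) {
		/* 1-3 trailing bytes, padded with zero bytes in memory */
		w0 = 0;
		memcpy(&w0, p, len);
		acc1 += w0;
	}
	return fold64(acc0 + acc1);
}
```

The result is the unfolded-then-folded sum in native byte order (not yet complemented), which matches what a plain 16-bit ones' complement loop would produce. Whether this shape actually wins depends on the hardware, as the Power5 numbers above show.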
>
>> Signed-off-by: Joel Schopp<jschopp@austin.ibm.com>
>
> You missed a space there.
If at first you don't succeed...
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>