* Re: Accelerating crush with SIMD
From: Piotr Dałek @ 2016-08-29 9:16 UTC
To: Ceph Development
On Mon, Aug 29, 2016 at 10:56:47AM +0200, Loic Dachary wrote:
> Hi Greg,
>
> On 29/08/2016 06:28, Gregory Farnum wrote:
> > On Sun, Aug 28, 2016 at 10:59 AM, Loic Dachary <loic@dachary.org> wrote:
> >> Hi,
> >>
> >> Could we significantly accelerate crush with SIMD instructions? I don't remember the idea being discussed but maybe I missed it.
> >
> > I think it was attempted, but using a lookup table method turned out
> > to be much faster. Sage did some prototyping and then some folks from
> > Intel did a lot of heavy optimization; I'd be surprised if anybody
> > managed to speed up the CRUSH calculations much at this point (at
> > least, without changing the fundamental math involved).
> >
> > Sorry I can't be more detailed; the actual CRUSH implementation is
> > something I've largely left alone. I imagine the optimization points
> > become pretty clear running git blame or something though. ;)
>
> I was not thinking of accelerating the crush hash function or the straw2 function, but of having them run simultaneously on 4/8/16 items at a time using _mm, _mm256 or _mm512 intrinsics[1], when possible. I'll put together a proof of concept later today to clarify what I have in mind.
>
> Cheers
>
> [1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/
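To make the quoted "4/8/16 items at a time" idea concrete, here is a minimal sketch assuming x86-64 with SSE2 (four 32-bit lanes). It applies the mix() step from Bob Jenkins' lookup2 hash, the family CRUSH's rjenkins1 hash belongs to, to four independent inputs at once. The three-input wrapper, its seed (HASH_SEED) and its constants (HASH_X, HASH_Y) are modelled on CRUSH's three-argument rjenkins1 hash but are assumptions that have not been checked against Ceph's source; hash3, hash3_x4 and lanes_match_scalar are hypothetical names.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_sub_epi32, _mm_xor_si128, _mm_slli/srli_epi32 */
#include <stdint.h>

/* Assumed seed and constants, modelled on CRUSH's rjenkins1 hash but not
 * verified against Ceph's source. */
#define HASH_SEED 1315423911u
#define HASH_X 231232u
#define HASH_Y 1232u

/* Scalar mix() step from Bob Jenkins' lookup2 hash. */
#define MIX(p, q, r) do {                 \
        p -= q; p -= r; p ^= (r >> 13);   \
        q -= r; q -= p; q ^= (p << 8);    \
        r -= p; r -= q; r ^= (q >> 13);   \
        p -= q; p -= r; p ^= (r >> 12);   \
        q -= r; q -= p; q ^= (p << 16);   \
        r -= p; r -= q; r ^= (q >> 5);    \
        p -= q; p -= r; p ^= (r >> 3);    \
        q -= r; q -= p; q ^= (p << 10);   \
        r -= p; r -= q; r ^= (q >> 15);   \
    } while (0)

/* One vector step in every lane: d = (d - e - f) ^ (f >> n) or ^ (f << n). */
#define VSTEP_R(d, e, f, n) \
    d = _mm_xor_si128(_mm_sub_epi32(_mm_sub_epi32(d, e), f), _mm_srli_epi32(f, n))
#define VSTEP_L(d, e, f, n) \
    d = _mm_xor_si128(_mm_sub_epi32(_mm_sub_epi32(d, e), f), _mm_slli_epi32(f, n))

/* The same mix(), applied to four independent 32-bit lanes at once. */
#define VMIX(p, q, r) do {                                                \
        VSTEP_R(p, q, r, 13); VSTEP_L(q, r, p, 8);  VSTEP_R(r, p, q, 13); \
        VSTEP_R(p, q, r, 12); VSTEP_L(q, r, p, 16); VSTEP_R(r, p, q, 5);  \
        VSTEP_R(p, q, r, 3);  VSTEP_L(q, r, p, 10); VSTEP_R(r, p, q, 15); \
    } while (0)

/* Scalar reference: hash one (a, b, c) triple. */
static uint32_t hash3(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t h = HASH_SEED ^ a ^ b ^ c;
    uint32_t x = HASH_X, y = HASH_Y;
    MIX(a, b, h);
    MIX(c, x, h);
    MIX(y, a, h);
    MIX(b, x, h);
    MIX(y, c, h);
    return h;
}

/* Hash four triples at once; lane i of out equals hash3(a[i], b[i], c[i]). */
static void hash3_x4(const uint32_t a_in[4], const uint32_t b_in[4],
                     const uint32_t c_in[4], uint32_t out[4])
{
    /* Pack: one unaligned 128-bit load per operand (four 32-bit lanes). */
    __m128i a = _mm_loadu_si128((const __m128i *)a_in);
    __m128i b = _mm_loadu_si128((const __m128i *)b_in);
    __m128i c = _mm_loadu_si128((const __m128i *)c_in);
    __m128i h = _mm_xor_si128(_mm_set1_epi32((int)HASH_SEED),
                              _mm_xor_si128(a, _mm_xor_si128(b, c)));
    __m128i x = _mm_set1_epi32((int)HASH_X);
    __m128i y = _mm_set1_epi32((int)HASH_Y);
    VMIX(a, b, h);
    VMIX(c, x, h);
    VMIX(y, a, h);
    VMIX(b, x, h);
    VMIX(y, c, h);
    /* Unpack: store the four lane results back to memory. */
    _mm_storeu_si128((__m128i *)out, h);
}

/* Returns 1 if every SIMD lane agrees with the scalar reference. */
static int lanes_match_scalar(const uint32_t a[4], const uint32_t b[4],
                              const uint32_t c[4])
{
    uint32_t out[4];
    hash3_x4(a, b, c, out);
    for (int i = 0; i < 4; i++)
        if (out[i] != hash3(a[i], b[i], c[i]))
            return 0;
    return 1;
}
```

With 256- or 512-bit registers the same structure would use the _mm256_ (AVX2) or _mm512_ (AVX-512F) forms of these intrinsics and handle 8 or 16 lanes per call.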
Last time I checked, it made no sense at all, as the CRUSH functions were
already fast enough and there was little room for parallelizing the
calculations. It *is* possible, but it requires a lot of careful rework in
every part that actually uses them. Note that just calculating 4/8/16 hashes
at once doesn't bring an instant benefit, as the calculation is only part of
the story; you also need to pack the data from its source and unpack the
results to their destination, and that takes time too. Also, I don't think
Ceph does enough CRUSH recalculations per second to make such a rework
worthwhile, but feel free to prove me wrong.
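The packing and unpacking cost described above can be illustrated with a hypothetical straw-like selection over a bucket stored as an array of structs, assuming x86-64 with SSE2. The ids are not contiguous in memory, so four of them have to be gathered into one register; and since SSE2 has no unsigned 32-bit compare, the four draws are spilled back to memory before the winner is picked. A scalar tail handles the last n % 4 items. The draw function is a deliberately simple xorshift32 stand-in, not CRUSH's hash, weights are ignored (this is not straw2), and struct item, DRAW_SALT, draw, draw_x4, choose and choose_scalar are all made-up names.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Hypothetical array-of-structs bucket layout: ids are not contiguous. */
struct item {
    int32_t id;
    uint32_t weight; /* ignored: this is an equal-weight simplification */
};

/* Arbitrary salt for the stand-in draw below; NOT CRUSH's hash. */
#define DRAW_SALT 0x5bd1e995u

/* Stand-in scalar draw: xorshift32 of x mixed with the item id. */
static uint32_t draw(uint32_t x, uint32_t id)
{
    uint32_t h = (x ^ DRAW_SALT) + id;
    h ^= h << 13;
    h ^= h >> 17;
    h ^= h << 5;
    return h;
}

/* The same draw for four ids held in one register. */
static __m128i draw_x4(__m128i x, __m128i ids)
{
    __m128i h = _mm_add_epi32(_mm_xor_si128(x, _mm_set1_epi32((int)DRAW_SALT)), ids);
    h = _mm_xor_si128(h, _mm_slli_epi32(h, 13));
    h = _mm_xor_si128(h, _mm_srli_epi32(h, 17));
    h = _mm_xor_si128(h, _mm_slli_epi32(h, 5));
    return h;
}

/* Scalar reference: index of the item with the highest draw, -1 if n == 0. */
static int choose_scalar(const struct item *items, int n, uint32_t x)
{
    int best = -1;
    uint32_t best_draw = 0;
    for (int i = 0; i < n; i++) {
        uint32_t d = draw(x, (uint32_t)items[i].id);
        if (best < 0 || d > best_draw) {
            best_draw = d;
            best = i;
        }
    }
    return best;
}

/* SIMD version: four items per iteration, then a scalar tail. */
static int choose(const struct item *items, int n, uint32_t x)
{
    const uint32_t xs[4] = {x, x, x, x};
    const __m128i vx = _mm_loadu_si128((const __m128i *)xs);
    uint32_t draws[4];
    int best = -1;
    uint32_t best_draw = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Pack: gather four strided ids (lane 0 = items[i]). */
        __m128i ids = _mm_set_epi32(items[i + 3].id, items[i + 2].id,
                                    items[i + 1].id, items[i].id);
        /* Unpack: SSE2 has no unsigned 32-bit compare, so spill the
         * draws to memory and compare them as scalars. */
        _mm_storeu_si128((__m128i *)draws, draw_x4(vx, ids));
        for (int k = 0; k < 4; k++) {
            if (best < 0 || draws[k] > best_draw) {
                best_draw = draws[k];
                best = i + k;
            }
        }
    }
    /* Scalar tail for the last n % 4 items. */
    for (; i < n; i++) {
        uint32_t d = draw(x, (uint32_t)items[i].id);
        if (best < 0 || d > best_draw) {
            best_draw = d;
            best = i;
        }
    }
    return best;
}
```

Whether choose() actually beats choose_scalar() is an empirical question: for buckets of only a few items, the gather, the spill and the tail loop can eat most of what the four-lane hash saves.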
Best regards,
--
Piotr Dałek
branch@predictor.org.pl
http://blog.predictor.org.pl
* Re: Accelerating crush with SIMD
From: Gregory Farnum @ 2016-08-29 18:15 UTC
To: Loic Dachary; +Cc: Ceph Development
On Mon, Aug 29, 2016 at 1:56 AM, Loic Dachary <loic@dachary.org> wrote:
> Hi Greg,
>
> On 29/08/2016 06:28, Gregory Farnum wrote:
>> On Sun, Aug 28, 2016 at 10:59 AM, Loic Dachary <loic@dachary.org> wrote:
>>> Hi,
>>>
>>> Could we significantly accelerate crush with SIMD instructions? I don't remember the idea being discussed but maybe I missed it.
>>
>> I think it was attempted, but using a lookup table method turned out
>> to be much faster. Sage did some prototyping and then some folks from
>> Intel did a lot of heavy optimization; I'd be surprised if anybody
>> managed to speed up the CRUSH calculations much at this point (at
>> least, without changing the fundamental math involved).
>>
>> Sorry I can't be more detailed; the actual CRUSH implementation is
>> something I've largely left alone. I imagine the optimization points
>> become pretty clear running git blame or something though. ;)
>
> I was not thinking of accelerating the crush hash function or the straw2 function, but of having them run simultaneously on 4/8/16 items at a time using _mm, _mm256 or _mm512 intrinsics[1], when possible. I'll put together a proof of concept later today to clarify what I have in mind.
I know this moved threads, but now I get it and that sounds cool. :)
*thumbs up*
-Greg