From mboxrd@z Thu Jan 1 00:00:00 1970
From: Loic Dachary
Subject: Re: SIMD accelerated crush_do_rule proof of concept
Date: Mon, 29 Aug 2016 16:54:11 +0200
Message-ID: <57C44C93.3050100@dachary.org>
References: <57C41F9E.7090309@dachary.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Return-path:
Received: from slow1-d.mail.gandi.net ([217.70.178.86]:46068 "EHLO slow1-d.mail.gandi.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1757200AbcH2O6z (ORCPT ); Mon, 29 Aug 2016 10:58:55 -0400
Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195])
	by slow1-d.mail.gandi.net (Postfix) with ESMTP id 5668947BB6D for ; Mon, 29 Aug 2016 16:55:36 +0200 (CEST)
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Vincent JARDIN
Cc: Ceph Development

Hi Vincent,

On 29/08/2016 16:08, Vincent JARDIN wrote:
> On 29/08/2016 at 15:55, Sage Weil wrote:
>> To answer your question, the only real risk/problem I see is that we need
>> to keep the perfectly in sync with the non-optimized variant
>
> I do propose a generic implementation that allows to share SIMD on ARM, Intel and others (Altivec),
>
>
> https://github.com/dachary/ceph/commit/71ae4584d9ed57f70aad718d0ffe206a01e91fef
>
> You can try the following,
> For instance,
> #include
> #include
> {
> __v32qi va, vb;
> va = (__v32qi) { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 4, 1, 0 };
> vb = (__v32qi) { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
>
> __v32qi res = va ^ vb;
> }
>
> it will produce the optimized Neon or AVX, AVX2 according to each targets.

Generic code that relies on the compiler optimizations is terse, which is nice. But the code is not generic: it needs to be written specifically for the optimizer, which is self-defeating.
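
To make the trade-off concrete, here is a minimal sketch of the kind of code under discussion, assuming a compiler that supports the GCC/Clang vector_size attribute. The names v32u8, xor32_vector, xor32_scalar and xor32_variants_agree are made up for this illustration and are not taken from the commit above. It pairs the vector version with a plain scalar one, since the two have to stay in sync.

```c
#include <stdint.h>
#include <string.h>

/* 32 lanes of 8 bits. Assumption: the compiler supports the GCC/Clang
 * vector_size attribute; this typedef is used instead of the x86-specific
 * __v32qi so the same source can be built for other targets. */
typedef uint8_t v32u8 __attribute__((vector_size(32)));

/* XOR two 32-byte buffers using the vector type. Which instructions come
 * out (AVX2, SSE2, NEON, or plain scalar code) is up to the compiler and
 * the flags it is given. */
void xor32_vector(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    v32u8 va, vb, vr;

    memcpy(&va, a, sizeof va);   /* unaligned-safe load */
    memcpy(&vb, b, sizeof vb);
    vr = va ^ vb;                /* lane-wise XOR */
    memcpy(out, &vr, sizeof vr); /* unaligned-safe store */
}

/* Scalar reference that the vector version must match. */
void xor32_scalar(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (int i = 0; i < 32; i++)
        out[i] = a[i] ^ b[i];
}

/* Returns 1 when both variants agree on the values from the quoted
 * example, 0 otherwise. */
int xor32_variants_agree(void)
{
    uint8_t a[32], b[32], r_vec[32], r_ref[32];

    for (int i = 0; i < 32; i++)
        a[i] = b[i] = (uint8_t)(31 - i);
    a[29] = 4; /* the quoted va has a 4 where vb has a 2 */

    xor32_vector(a, b, r_vec);
    xor32_scalar(a, b, r_ref);
    return memcmp(r_vec, r_ref, sizeof r_ref) == 0;
}
```

The source alone does not say whether xor32_vector ends up as a single wide instruction. That can only be checked by building for each target (for instance with gcc -O2 -mavx2 -S on x86) and reading the generated assembly.
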
The http://locklessinc.com/articles/vectorize/ article illustrates that in a fun way. Instead of maintaining code with SIMD instructions, you need to understand each optimizer by reading assembly language, which is more complicated.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre