From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51869) by lists.gnu.org with esmtp (Exim 4.71) id 1ap1dW-00047k-Sn for qemu-devel@nongnu.org; Sat, 09 Apr 2016 18:45:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) id 1ap1dT-0000bF-Nf for qemu-devel@nongnu.org; Sat, 09 Apr 2016 18:45:50 -0400
Sender: Richard Henderson
References: <1460023087-31509-1-git-send-email-vijayak@caviumnetworks.com> <1460023087-31509-2-git-send-email-vijayak@caviumnetworks.com>
From: Richard Henderson
Message-ID: <57098617.1020308@twiddle.net>
Date: Sat, 9 Apr 2016 15:45:43 -0700
MIME-Version: 1.0
In-Reply-To: <1460023087-31509-2-git-send-email-vijayak@caviumnetworks.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
To: vijayak@caviumnetworks.com, qemu-arm@nongnu.org, peter.maydell@linaro.org, pbonzini@redhat.com
Cc: vijay.kilari@gmail.com, Prasun.Kapoor@caviumnetworks.com, knv.suresh2009@gmail.com, qemu-devel@nongnu.org, Suresh , Vijay

On 04/07/2016 02:58 AM, vijayak@caviumnetworks.com wrote:
> +#elif defined __aarch64__
> +#include "arm_neon.h"

A better test is __NEON__, which asserts that neon is available at compile time (which will be true basically always for aarch64), and then you don't need a runtime test for neon. You also get support for armv7 with neon.

> +#define NEON_VECTYPE uint64x2_t
> +#define NEON_LOAD_N_ORR(v1, v2) (vld1q_u64(&v1) | vld1q_u64(&v2))
> +#define NEON_ORR(v1, v2) ((v1) | (v2))
> +#define NEON_NOT_EQ_ZERO(v1) \
> + ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))

FWIW, I think that vmaxvq_u32 would be a better reduction for aarch64. Extracting the individual lanes isn't as efficient as one would like.
For armv7, folding via vget_lane_u64(vget_high_u64(v1) | vget_low_u64(v1), 0) is probably best.

r~
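
To make the two reductions above concrete, here is a rough sketch of how they might look side by side. It is not taken from the patch. The helper names vec_not_zero and pair_not_zero are hypothetical, and the sketch assumes the ACLE macro __ARM_NEON (or the older __ARM_NEON__ spelling) as the compile-time test, which may not match the exact macro the series ends up using. A scalar fallback is included only so the snippet builds on hosts without neon.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: true if any bit of the 128-bit vector is set. */
static inline bool vec_not_zero(uint64x2_t v)
{
#if defined(__aarch64__)
    /* Across-lanes unsigned max over four 32-bit lanes; the result is
     * nonzero exactly when at least one lane is nonzero. */
    return vmaxvq_u32(vreinterpretq_u32_u64(v)) != 0;
#else
    /* armv7: OR the high and low 64-bit halves together, then extract
     * a single lane instead of two. */
    uint64x1_t folded = vorr_u64(vget_high_u64(v), vget_low_u64(v));
    return vget_lane_u64(folded, 0) != 0;
#endif
}

/* Hypothetical helper: p points at two consecutive uint64_t values. */
static inline bool pair_not_zero(const uint64_t *p)
{
    return vec_not_zero(vld1q_u64(p));
}

#else

/* Scalar fallback for hosts without neon, with the same contract. */
static inline bool pair_not_zero(const uint64_t *p)
{
    return (p[0] | p[1]) != 0;
}

#endif
```

With something like this, NEON_NOT_EQ_ZERO(v1) could presumably expand to vec_not_zero(v1) in both the aarch64 and armv7 builds, but that wiring is an assumption and not part of the posted series.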