From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41947)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1fXqfI-0008TT-Mr for qemu-devel@nongnu.org;
    Tue, 26 Jun 2018 12:18:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1fXqfF-0005Jm-Hg for qemu-devel@nongnu.org;
    Tue, 26 Jun 2018 12:18:00 -0400
Received: from mail-pg0-x241.google.com ([2607:f8b0:400e:c05::241]:36025)
    by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71)
    (envelope-from ) id 1fXqfF-0005Ja-B4 for qemu-devel@nongnu.org;
    Tue, 26 Jun 2018 12:17:57 -0400
Received: by mail-pg0-x241.google.com with SMTP id m5-v6so7843216pgd.3
    for ; Tue, 26 Jun 2018 09:17:57 -0700 (PDT)
References: <20180621015359.12018-1-richard.henderson@linaro.org>
    <20180621015359.12018-34-richard.henderson@linaro.org>
From: Richard Henderson
Date: Tue, 26 Jun 2018 09:17:52 -0700
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v5 33/35] target/arm: Implement SVE dot product (indexed)
To: Peter Maydell
Cc: QEMU Developers

On 06/26/2018 08:30 AM, Peter Maydell wrote:
> On 21 June 2018 at 02:53, Richard Henderson wrote:
>> Signed-off-by: Richard Henderson
>> ---
>>  target/arm/helper.h        |  5 ++
>>  target/arm/translate-sve.c | 18 +++++++
>>  target/arm/vec_helper.c    | 96 ++++++++++++++++++++++++++++++++++++++
>>  target/arm/sve.decode      |  8 +++-
>>  4 files changed, 126 insertions(+), 1 deletion(-)
>>
>
>> +void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
>> +{
>> +    intptr_t i, j, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
>> +    intptr_t index = simd_data(desc);
>> +    uint32_t *d = vd;
>> +    int8_t *n = vn, *m = vm;
>> +
>> +    for (i = 0; i < opr_sz_4; i = j) {
>> +        int8_t m0 = m[(i + index) * 4 + 0];
>> +        int8_t m1 = m[(i + index) * 4 + 1];
>> +        int8_t m2 = m[(i + index) * 4 + 2];
>> +        int8_t m3 = m[(i + index) * 4 + 3];
>> +
>> +        j = i;
>> +        do {
>> +            d[j] += n[j * 4 + 0] * m0
>> +                  + n[j * 4 + 1] * m1
>> +                  + n[j * 4 + 2] * m2
>> +                  + n[j * 4 + 3] * m3;
>> +        } while (++j < MIN(i + 4, opr_sz_4));
>> +    }
>> +    clear_tail(d, opr_sz, simd_maxsz(desc));
>> +}
>
> Maybe I'm just half asleep this afternoon, but this is pretty
> confusing -- nested loops where the outer loop's increment
> uses the inner loop's index, and the inner loop's conditions
> depend on the outer loop index...

Yeah, well.  There is an edge case of aa64 advsimd, reusing this same helper,

	sdot v0.2s, v1.8b, v0.4b[0]

where m values must be read (and held) before writing d results,
and there are not 16/4=4 elements to process but only 2.

I suppose I could special-case oprsz == 8 in order to simplify
iteration of what is otherwise a multiple of 16.

I thought iterating J from I to I+4 was easier to read than
writing out I+J everywhere.  Perhaps not.

>> -DOT_zzz         01000100 1 sz:1 0 rm:5 00000 u:1 rn:5 rd:5
>> +DOT_zzz         01000100 1 sz:1 0 rm:5 00000 u:1 rn:5 rd:5 ra=%reg_movprfx
>
> Should this have been in the previous patch ?

Yes, thanks.


r~
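
For illustration only, and not part of the patch: a minimal standalone sketch of the "I+J everywhere" form mentioned above. The function name `sdot_idx_b` and its explicit `opr_sz` and `index` parameters are hypothetical stand-ins for the helper's `desc` decoding, and the `clear_tail` step is left out. It assumes `opr_sz` is either 8 or a multiple of 16 bytes, and that the m register is fully readable for the chosen index.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone variant of the indexed signed dot product.
 * opr_sz is the operation size in bytes (assumed: 8, or a multiple of 16).
 * index selects one group of four bytes within each 16-byte segment of m.
 */
void sdot_idx_b(uint32_t *d, const int8_t *n, const int8_t *m,
                size_t opr_sz, size_t index)
{
    size_t elems = opr_sz / 4;          /* number of 32-bit results */

    for (size_t i = 0; i < elems; i += 4) {
        /*
         * Read and hold the four selected m bytes for this segment
         * before any d element is written: d may alias m, as in
         * "sdot v0.2s, v1.8b, v0.4b[0]".
         */
        int8_t m0 = m[(i + index) * 4 + 0];
        int8_t m1 = m[(i + index) * 4 + 1];
        int8_t m2 = m[(i + index) * 4 + 2];
        int8_t m3 = m[(i + index) * 4 + 3];

        /* A segment holds 4 results, except the 8-byte case, which holds 2. */
        size_t seg = (elems - i < 4) ? elems - i : 4;

        for (size_t j = 0; j < seg; j++) {
            d[i + j] += n[(i + j) * 4 + 0] * m0
                      + n[(i + j) * 4 + 1] * m1
                      + n[(i + j) * 4 + 2] * m2
                      + n[(i + j) * 4 + 3] * m3;
        }
    }
}
```

Calling it with the same buffer for `d` and `m`, `opr_sz` of 8 and `index` of 0 exercises the aliased two-element case. Whether this reads better than iterating J from I to I+4 is a matter of taste.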