From: Alex Bennée
To: Richard Henderson
Cc: qemu-devel@nongnu.org, peter.maydell@linaro.org
Date: Wed, 27 Jun 2018 12:37:30 +0100
Subject: Re: [Qemu-devel] [PATCH v5 02/35] target/arm: Implement SVE Contiguous Load, first-fault and no-fault
Message-ID: <87fu18wjd1.fsf@linaro.org>
In-reply-to: <292aa987-8644-c22e-4594-c3ad6518c4c3@linaro.org>

Richard Henderson writes:

> On 06/26/2018 05:52 AM, Alex Bennée wrote:
>>> +#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H)                               \
>>> +static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg,     \
>>> +                               target_ulong addr, intptr_t oprsz,        \
>>> +                               bool first, uintptr_t ra)                 \
>>> +{                                                                        \
>>> +    intptr_t i = 0;                                                      \
>>> +    do {                                                                 \
>>> +        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                  \
>>> +        do {                                                             \
>>> +            TYPEM m = 0;                                                 \
>>> +            if (pg & 1) {                                                \
>>> +                if (!first &&                                            \
>>> +                    page_check_range(addr, sizeof(TYPEM), PAGE_READ)) {  \
>>> +                    record_fault(env, i, oprsz);                         \
>>> +                    return;                                              \
>>> +                }                                                        \
>>> +                m = FN(env, addr, ra);                                   \
>>> +                first = false;                                           \
>>> +            }                                                            \
>>> +            *(TYPEE *)(vd + H(i)) = m;                                   \
>>> +            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);                    \
>>> +            addr += sizeof(TYPEM);                                       \
>>> +        } while (i & 15);                                                \
>>> +    } while (i < oprsz);                                                 \
>>> +}                                                                        \
>>
>> So I noticed that the disassembly of these two functions is mostly
>> parameter pushing and popping. Is there a case to be made to use the
>> __flatten__ approach and see how the compiler unrolls it all?
>
> Em... for the most part the functions being called are not inlinable,
> being defined in accel/tcg/.

*sigh* I guess. It's a shame because the numbers get more disappointing:

  12:13:48 [alex@zen:~/l/q/q/aarch64-linux-user] review/rth-sve-v5(+26/-1) + ./qemu-aarch64 ./tests/simd-memcpy libc intreg intpair simdreg simdpair sve
  libc, 248298053, 4228 kb/s
  intreg, 646085220, 1623 kb/s
  intpair, 369350825, 2841 kb/s
  simdreg, 1422096252, 737 kb/s
  simdpair, 1369635566, 765 kb/s
  sve, 2646179942, 396 kb/s

and the above example doesn't even have the cost of page_check_range.

I guess this isn't something that can be improved until other
architectures have a similar predicated load solution we could use in
generated code. Helpers are always going to suck here :-/

Anyway my boy-racer disappointments aside:

Reviewed-by: Alex Bennée

--
Alex Bennée
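For readers following the thread, the first-fault behaviour of the quoted DO_LDFF1 macro can be modelled outside QEMU. This is a minimal, standalone sketch, not QEMU code: the names ldff1b_sketch, mem and page_readable, and the 16-byte "pages", are all hypothetical. It assumes byte elements, models the FFR as one bool per element (clearing it from the faulting element onward plays the role of record_fault), and reports a fault on the first active element by returning false, where the real helper would instead raise the guest exception via FN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tiny address space: 4 "pages" of 16 bytes each. */
#define SKETCH_PAGE_SIZE 16
#define SKETCH_NPAGES    4

static uint8_t mem[SKETCH_PAGE_SIZE * SKETCH_NPAGES];
static bool page_readable[SKETCH_NPAGES];

/* Stand-in for page_check_range(): true if the byte at addr may be read. */
static bool readable(size_t addr)
{
    return addr < sizeof(mem) && page_readable[addr / SKETCH_PAGE_SIZE];
}

/*
 * First-fault load of n byte elements starting at addr.
 * pg[i] marks active elements; inactive elements are written as zero.
 * ffr[i] stays true for elements before the first suppressed fault and is
 * cleared from the faulting element onward; later elements of vd are left
 * untouched. Returns false if the *first* active element faults (real
 * hardware would take the exception there rather than suppress it).
 */
static bool ldff1b_sketch(uint8_t *vd, const bool *pg, bool *ffr,
                          size_t addr, size_t n)
{
    bool first = true;

    assert(n <= 256);              /* SVE vectors are at most 2048 bits */
    for (size_t i = 0; i < n; i++) {
        ffr[i] = true;
    }
    for (size_t i = 0; i < n; i++, addr++) {
        uint8_t m = 0;
        if (pg[i]) {
            if (!readable(addr)) {
                if (first) {
                    return false;  /* first active element: fault is taken */
                }
                for (size_t j = i; j < n; j++) {
                    ffr[j] = false;   /* like record_fault(env, i, oprsz) */
                }
                return true;
            }
            m = mem[addr];
            first = false;
        }
        vd[i] = m;
    }
    return true;
}
```

With pages 0 and 1 readable, an 8-element load starting at address 28 fills elements 0..3 from addresses 28..31 and clears the FFR for elements 4..7, because element 4 would touch the unreadable page 2. The same load starting at address 32 faults on its first active element and so is not suppressed.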