From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39107) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gAZtx-0000or-H2 for qemu-devel@nongnu.org; Thu, 11 Oct 2018 08:17:14 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gAZtr-0006Jy-6s for qemu-devel@nongnu.org; Thu, 11 Oct 2018 08:17:13 -0400 Received: from fanzine.igalia.com ([91.117.99.155]:37131) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1gAZtq-0005nX-R3 for qemu-devel@nongnu.org; Thu, 11 Oct 2018 08:17:07 -0400 From: Alberto Garcia In-Reply-To: <20181009153132.GL22838@redhat.com> References: <20181009125541.24455-1-berrange@redhat.com> <20181009125541.24455-5-berrange@redhat.com> <20181009150629.GK22838@redhat.com> <20181009153132.GL22838@redhat.com> Date: Thu, 11 Oct 2018 14:16:24 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 4/6] crypto: convert xts_tweak_encdec to use xts_uint128 type List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: =?utf-8?Q?Daniel_P=2E_Berrang=C3=A9?= Cc: qemu-devel@nongnu.org On Tue 09 Oct 2018 05:31:32 PM CEST, Daniel P. Berrang=C3=A9 wrote: > On Tue, Oct 09, 2018 at 05:30:25PM +0200, Alberto Garcia wrote: >> >> > for (i =3D 0; i < lim; i++) { >> >> > - xts_tweak_encdec(datactx, decfunc, src, dst, (uint8_t *)&T= ); >> >> > + xts_uint128 S, D; >> >> > + >> >> > + memcpy(&S, src, XTS_BLOCK_SIZE); >> >> > + xts_tweak_encdec(datactx, decfunc, &S, &D, &T); >> >> > + memcpy(dst, &D, XTS_BLOCK_SIZE); >> >>=20 >> >> Why do you need S and D? >> > >> > I think src & dst pointers can't be guaranteed to be aligned >> > sufficiently for int64 operations, if we just cast from uint8t*. >>=20 >> I see. I did a quick test without the memcpy() calls and it doesn't >> seem to have a visible effect on performance, but if it turns out >> that it does then maybe this is worth investigating further. I >> suspect all buffers received by this code are allocated with >> qemu_try_blockalign() anyway, so it should be safe. > > The extra memcpy() calls certainly had a perf impact when I added > them, so if we can determine that we can safely do without, that would > be desirable. So I was having a look at this. From the block layer everything comes properly aligned. Then there's VirtioCrypto, which seems to allow XTS mode but I couldn't quite tell from virtio_crypto_sym_op_helper() if all buffers are guaranteed to be aligned. What you could do is a runtime check (with QEMU_PTR_IS_ALIGNED()) and decide what implementation to use. A couple of additional thoughts: - x86_64 (and others) allow unaligned memory accesses, and that might be faster than copying the buffer using memcpy(). I haven't measured it however. - qcrypto_block_{encrypt,decrypt}_helper() (used for encrypted block I/O) use the same buffer for input and output, so maybe it's worth exploring if this fact allows for additional optimization if you still need to use memcpy(). Berto