Date: Fri, 18 Feb 2022 16:54:12 -0800
From: Eric Biggers
To: Nathan Huckleberry
Cc: linux-crypto@vger.kernel.org, Herbert Xu, "David S. Miller",
	linux-arm-kernel@lists.infradead.org, Paul Crowley, Sami Tolvanen,
	Ard Biesheuvel
Subject: Re: [RFC PATCH v2 6/7] crypto: x86/polyval: Add PCLMULQDQ
	accelerated implementation of POLYVAL
References: <20220210232812.798387-1-nhuck@google.com>
	<20220210232812.798387-7-nhuck@google.com>

On Fri, Feb 18, 2022 at 04:34:28PM -0800, Eric Biggers wrote:
> > +.macro schoolbook1_iteration i xor_sum
> > +	.set i, \i
> > +	.set xor_sum, \xor_sum
> > +	movups (16*i)(OP1), %xmm0
> > +	.if(i == 0 && xor_sum == 1)
> > +		pxor SUM, %xmm0
> > +	.endif
> > +	vpclmulqdq $0x01, (16*i)(OP2), %xmm0, %xmm1
> > +	vpxor %xmm1, MI, MI
> > +	vpclmulqdq $0x00, (16*i)(OP2), %xmm0, %xmm2
> > +	vpxor %xmm2, LO, LO
> > +	vpclmulqdq $0x11, (16*i)(OP2), %xmm0, %xmm3
> > +	vpxor %xmm3, HI, HI
> > +	vpclmulqdq $0x10, (16*i)(OP2), %xmm0, %xmm4
> > +	vpxor %xmm4, MI, MI
>
> Perhaps the above multiplications and XORs should be reordered slightly so that
> each XOR doesn't depend on the previous instruction?  A good ordering might be:
>
> 	vpclmulqdq $0x01, (16*\i)(OP2), %xmm0, %xmm1
> 	vpclmulqdq $0x10, (16*\i)(OP2), %xmm0, %xmm2
> 	vpclmulqdq $0x00, (16*\i)(OP2), %xmm0, %xmm3
> 	vpclmulqdq $0x11, (16*\i)(OP2), %xmm0, %xmm4
> 	vpxor %xmm1, MI, MI
> 	vpxor %xmm3, LO, LO
> 	vpxor %xmm4, HI, HI
> 	vpxor %xmm2, MI, MI
>
> With that, no instruction would depend on either of the previous two
> instructions.
>
> This might be more important in the ARM64 version than the x86_64 version, as
> x86_64 CPUs are pretty aggressive about internally reordering instructions.  But
> it's something to consider in both versions.
>
> Likewise in schoolbook1_noload.

Or slightly better:

	vpclmulqdq $0x01, (16*\i)(OP2), %xmm0, %xmm2
	vpclmulqdq $0x00, (16*\i)(OP2), %xmm0, %xmm1
	vpclmulqdq $0x10, (16*\i)(OP2), %xmm0, %xmm3
	vpclmulqdq $0x11, (16*\i)(OP2), %xmm0, %xmm4
	vpxor %xmm2, MI, MI
	vpxor %xmm1, LO, LO
	vpxor %xmm4, HI, HI
	vpxor %xmm3, MI, MI

- Eric
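
For anyone who wants to convince themselves that hoisting the four
multiplications ahead of the XORs cannot change the result, here is a rough
Python model of one schoolbook iteration.  It is only a sketch and not part of
the patch: the names clmul, pclmul, TARGET and schoolbook are made up for
illustration, and the model assumes that bit 0 of the immediate picks the
64-bit half of one operand and bit 4 picks the half of the other, with 0x00
accumulated into LO, 0x01 and 0x10 into MI, and 0x11 into HI, as in the quoted
macro.

```python
MASK64 = (1 << 64) - 1

def clmul(a, b):
    """Carry-less (GF(2)[x]) multiplication of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def pclmul(imm, a, b):
    """Model of one 64x64 carry-less product selected by the immediate.

    Assumption of this sketch: bit 0 selects the half of `a`,
    bit 4 selects the half of `b` (0 = low qword, 1 = high qword).
    """
    a_half = (a >> 64) if imm & 0x01 else (a & MASK64)
    b_half = (b >> 64) if imm & 0x10 else (b & MASK64)
    return clmul(a_half, b_half)

# Which accumulator each partial product is XORed into.
TARGET = {0x00: "LO", 0x01: "MI", 0x10: "MI", 0x11: "HI"}

def schoolbook(a, b, mul_order, xor_order):
    """Compute all products in mul_order, then accumulate in xor_order."""
    products = {imm: pclmul(imm, a, b) for imm in mul_order}
    acc = {"LO": 0, "MI": 0, "HI": 0}
    for imm in xor_order:
        acc[TARGET[imm]] ^= products[imm]
    return acc["LO"], acc["MI"], acc["HI"]

# Orderings taken from the instruction sequences quoted in this thread.
XOR_ORDER = (0x01, 0x00, 0x11, 0x10)
ORIGINAL_MULS = (0x01, 0x00, 0x11, 0x10)
FIRST_SUGGESTION_MULS = (0x01, 0x10, 0x00, 0x11)
SECOND_SUGGESTION_MULS = (0x01, 0x00, 0x10, 0x11)

if __name__ == "__main__":
    a = 0x0123456789ABCDEFFEDCBA9876543210
    b = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
    for muls in (ORIGINAL_MULS, FIRST_SUGGESTION_MULS, SECOND_SUGGESTION_MULS):
        lo, mi, hi = schoolbook(a, b, muls, XOR_ORDER)
        print(hex(lo), hex(mi), hex(hi))
```

The model says nothing about latency or throughput, which is the actual reason
for the reordering; it only shows that the accumulated LO, MI and HI values are
the same, since XOR is commutative and each product is independent of the
others.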