From: Adrian Ratiu
To: Nick Desaulniers
Cc: Arnd Bergmann, LKML, Russell King, clang-built-linux, Nathan Chancellor, Collabora Kernel ML, Ard Biesheuvel, Linux ARM
Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization
Date: Sat, 07 Nov 2020 20:07:47 +0200
Message-ID: <87tuu1ujkc.fsf@collabora.com>
References: <20201106051436.2384842-1-adrian.ratiu@collabora.com> <20201106051436.2384842-3-adrian.ratiu@collabora.com> <20201106101419.GB3811063@ubuntu-m3-large-x86> <87wnyyvh56.fsf@collabora.com>

On Fri, 06 Nov 2020, Nick Desaulniers wrote:
> On Fri, Nov 6, 2020 at 3:50 AM Adrian Ratiu wrote:
>>
>> Hi Nathan,
>>
>> On Fri, 06 Nov 2020, Nathan Chancellor wrote:
>> > + Ard, who wrote this code.
>> >
>> > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
>> >> Due to a Clang bug [1] neon autoloop vectorization does not
>> >> happen or happens badly with no gains and considering
>> >> previous GCC experiences which generated unoptimized code
>> >> which was worse than the default asm implementation, it is
>> >> safer to default clang builds to the known good generic
>> >> implementation.
>> >>
>> >> The kernel currently supports a minimum Clang version of
>> >> v10.0.1, see commit 1f7a44f63e6c ("compiler-clang: add build
>> >> check for clang 10.0.1").
>> >>
>> >> When the bug gets eventually fixed, this commit could be
>> >> reverted or, if the minimum clang version bump takes a long
>> >> time, a warning could be added for users to upgrade their
>> >> compilers like was done for GCC.
>> >>
>> >> [1] https://bugs.llvm.org/show_bug.cgi?id=40976
>> >>
>> >> Signed-off-by: Adrian Ratiu
>> >
>> > Thank you for the patch! We are also tracking this here:
>> >
>> > https://github.com/ClangBuiltLinux/linux/issues/496
>> >
>> > It was on my TODO to revisit getting the warning eliminated,
>> > which likely would have involved a patch like this as well.
>> >
>> > I am curious if it is worth revisiting or dusting off Arnd's
>> > patch in the LLVM bug tracker first. I have not tried it
>> > personally. If that is not a worthwhile option, I am fine
>> > with this for now. It would be nice to try and get a fix
>> > pinned down on the LLVM side at some point but alas, finite
>> > amount of resources and people :(
>>
>> I tested Arnd's kernel patch from the LLVM bugtracker [1], but
>> with the Clang v10.0.1 I still get warnings like the following
>> even though the __restrict workaround seems to affect the
>> generated instructions:
>>
>> ./include/asm-generic/xor.h:15:2: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-missed=loop-vectorize]
>> ./include/asm-generic/xor.h:11:1: remark: List vectorization was possible but not beneficial with cost 0 >= 0 [-Rpass-missed=slp-vectorizer]
>> xor_8regs_2(unsigned long bytes, unsigned long *__restrict p1, unsigned long *__restrict p2)
>
> If it's just a matter of overruling the cost model
> #pragma clang loop vectorize(enable)
>
> will do the trick.
>
> Indeed,
> ```
> diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
> index b62a2a56a4d4..8796955498b7 100644
> --- a/include/asm-generic/xor.h
> +++ b/include/asm-generic/xor.h
> @@ -12,6 +12,7 @@ xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
>  {
>  	long lines = bytes / (sizeof (long)) / 8;
>
> +#pragma clang loop vectorize(enable)
>  	do {
>  		p1[0] ^= p2[0];
>  		p1[1] ^= p2[1];
> @@ -32,6 +33,7 @@ xor_8regs_3(unsigned long bytes, unsigned long *p1, unsigned long *p2,
>  {
>  	long lines = bytes / (sizeof (long)) / 8;
>
> +#pragma clang loop vectorize(enable)
>  	do {
>  		p1[0] ^= p2[0] ^ p3[0];
>  		p1[1] ^= p2[1] ^ p3[1];
> @@ -53,6 +55,7 @@ xor_8regs_4(unsigned long bytes, unsigned long *p1, unsigned long *p2,
>  {
>  	long lines = bytes / (sizeof (long)) / 8;
>
> +#pragma clang loop vectorize(enable)
>  	do {
>  		p1[0] ^= p2[0] ^ p3[0] ^ p4[0];
>  		p1[1] ^= p2[1] ^ p3[1] ^ p4[1];
> @@ -75,6 +78,7 @@ xor_8regs_5(unsigned long bytes, unsigned long *p1, unsigned long *p2,
>  {
>  	long lines = bytes / (sizeof (long)) / 8;
>
> +#pragma clang loop vectorize(enable)
>  	do {
>  		p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
>  		p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
> ```
> seems to generate the vectorized code.
>
> Why don't we find a way to make those pragmas more toolchain
> portable, rather than open coding them like I have above,
> instead of this series?

Hi Nick,

Thank you very much for the suggestion. I agree. If a
toolchain-portable way can be found to reliably trigger the
optimization, I will gladly replace this patch. :)

I will work on it starting Monday and then either report back my
findings or, if I can get it working satisfactorily, send a v2
series directly.

The first patch is still needed because it is more of a general
cleanup, as Nathan correctly observed.

Regards,
Adrian

>
> --
> Thanks,
> ~Nick Desaulniers
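
For illustration only, here is a minimal sketch of what a more toolchain-portable form of the pragma suggested above might look like. It is an assumption, not code from this thread or from the kernel: the macro name XOR_LOOP_VECTORIZE and the function name xor_8regs_2_hinted are hypothetical, the const qualifier and the assert are additions, and the non-Clang branch simply expands to nothing rather than trying to pick an equivalent hint for other compilers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical wrapper macro (not an existing kernel macro): expands to
 * the Clang loop hint on Clang and to nothing on other compilers, so the
 * source builds without unknown-pragma warnings elsewhere.
 */
#if defined(__clang__)
#define XOR_LOOP_VECTORIZE _Pragma("clang loop vectorize(enable)")
#else
#define XOR_LOOP_VECTORIZE
#endif

/* XOR p2 into p1, eight longs per iteration, modeled on xor_8regs_2(). */
static void xor_8regs_2_hinted(unsigned long bytes, unsigned long *p1,
			       const unsigned long *p2)
{
	long lines = bytes / (sizeof(long)) / 8;

	/* The do/while body always runs once, so require whole 8-long lines. */
	assert(bytes >= 8 * sizeof(long) && bytes % (8 * sizeof(long)) == 0);

	/* The hint must directly precede the loop it applies to. */
	XOR_LOOP_VECTORIZE
	do {
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1[2] ^= p2[2];
		p1[3] ^= p2[3];
		p1[4] ^= p2[4];
		p1[5] ^= p2[5];
		p1[6] ^= p2[6];
		p1[7] ^= p2[7];
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
}
```

Whether the hint actually produces vectorized code would still need to be checked per compiler version, for example with Clang's -Rpass=loop-vectorize remarks; the sketch only shows how the pragma could be kept out of the loop bodies themselves.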