From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A2EAC28CF6 for ; Wed, 1 Aug 2018 17:58:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1E41B208A4 for ; Wed, 1 Aug 2018 17:58:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1E41B208A4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linuxfoundation.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732456AbeHATpJ (ORCPT ); Wed, 1 Aug 2018 15:45:09 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:49806 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405945AbeHATIk (ORCPT ); Wed, 1 Aug 2018 15:08:40 -0400 Received: from localhost (D57E6652.static.ziggozakelijk.nl [213.126.102.82]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 8B4681374; Wed, 1 Aug 2018 17:13:52 +0000 (UTC) From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Christophe Leroy , Michael Ellerman , Sasha Levin Subject: [PATCH 4.14 046/246] powerpc/lib: Adjust .balign inside string functions for PPC32 Date: Wed, 1 Aug 2018 18:49:16 +0200 Message-Id: <20180801165013.920182664@linuxfoundation.org> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180801165011.700991984@linuxfoundation.org> References: <20180801165011.700991984@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.14-stable review patch. If anyone has any objections, please let me know. ------------------ From: Christophe Leroy [ Upstream commit 1128bb7813a896bd608fb622eee3c26aaf33b473 ] commit 87a156fb18fe1 ("Align hot loops of some string functions") degraded the performance of string functions by adding useless nops A simple benchmark on an 8xx calling 100000x a memchr() that matches the first byte runs in 41668 TB ticks before this patch and in 35986 TB ticks after this patch. So this gives an improvement of approx 10% Another benchmark doing the same with a memchr() matching the 128th byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks after this patch, so regardless on the number of loops, removing those useless nops improves the test by 5683 TB ticks. Fixes: 87a156fb18fe1 ("Align hot loops of some string functions") Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/cache.h | 3 +++ arch/powerpc/lib/string.S | 7 ++++--- 2 files changed, 7 insertions(+), 3 deletions(-) --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -9,11 +9,14 @@ #if defined(CONFIG_PPC_8xx) || defined(CONFIG_403GCX) #define L1_CACHE_SHIFT 4 #define MAX_COPY_PREFETCH 1 +#define IFETCH_ALIGN_SHIFT 2 #elif defined(CONFIG_PPC_E500MC) #define L1_CACHE_SHIFT 6 #define MAX_COPY_PREFETCH 4 +#define IFETCH_ALIGN_SHIFT 3 #elif defined(CONFIG_PPC32) #define MAX_COPY_PREFETCH 4 +#define IFETCH_ALIGN_SHIFT 3 /* 603 fetches 2 insn at a time */ #if defined(CONFIG_PPC_47x) #define L1_CACHE_SHIFT 7 #else --- a/arch/powerpc/lib/string.S +++ b/arch/powerpc/lib/string.S @@ -12,6 +12,7 @@ #include #include #include +#include .text @@ -23,7 +24,7 @@ _GLOBAL(strncpy) mtctr r5 addi r6,r3,-1 addi r4,r4,-1 - .balign 16 + .balign IFETCH_ALIGN_BYTES 1: lbzu r0,1(r4) cmpwi 0,r0,0 stbu r0,1(r6) @@ -43,7 +44,7 @@ _GLOBAL(strncmp) mtctr r5 addi r5,r3,-1 addi r4,r4,-1 - .balign 16 + .balign IFETCH_ALIGN_BYTES 1: lbzu r3,1(r5) cmpwi 1,r3,0 lbzu r0,1(r4) @@ -77,7 +78,7 @@ _GLOBAL(memchr) beq- 2f mtctr r5 addi r3,r3,-1 - .balign 16 + .balign IFETCH_ALIGN_BYTES 1: lbzu r0,1(r3) cmpw 0,r0,r4 bdnzf 2,1b