Date: Fri, 4 Oct 2024 13:07:14 +0300
From: Catalin Marinas
To: Kristina Martsenko
Cc: linux-arm-kernel@lists.infradead.org, Will Deacon, Mark Rutland, Robin Murphy, Marc Zyngier
Subject: Re: [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
References: <20240930161051.3777828-1-kristina.martsenko@arm.com> <20240930161051.3777828-5-kristina.martsenko@arm.com>

On Thu, Oct 03, 2024 at 05:46:08PM +0100, Kristina Martsenko wrote:
> On 02/10/2024 16:29, Catalin Marinas wrote:
> > On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
> >> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
> >> index 4ab48d49c451..9b99106fb95f 100644
> >> --- a/arch/arm64/lib/memcpy.S
> >> +++ b/arch/arm64/lib/memcpy.S
> >> @@ -57,7 +57,7 @@
> >>  The loop tail is handled by always copying 64 bytes from the end.
> >>  */
> >>
> >> -SYM_FUNC_START(__pi_memcpy)
> >> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
> >>      add srcend, src, count
> >>      add dstend, dstin, count
> >>      cmp count, 128
> >> @@ -238,7 +238,24 @@ L(copy64_from_start):
> >>      stp B_l, B_h, [dstin, 16]
> >>      stp C_l, C_h, [dstin]
> >>      ret
> >> +SYM_FUNC_END(__pi_memcpy_generic)
> >> +
> >> +#ifdef CONFIG_AS_HAS_MOPS
> >> +    .arch_extension mops
> >> +SYM_FUNC_START(__pi_memcpy)
> >> +alternative_if_not ARM64_HAS_MOPS
> >> +    b __pi_memcpy_generic
> >> +alternative_else_nop_endif
> >
> > I'm fine with patching the branch but I wonder whether, for the time
> > being, we should use alternative_if instead and the NOP to fall through
> > the default implementation. The hardware in the field doesn't have
> > FEAT_MOPS yet and they may see a slight penalty introduced by the
> > branch, especially for small memcpys. Just guessing, I haven't done any
> > benchmarks.
>
> My thinking was that this way it doesn't have to be changed again in the
> future. But I'm fine with switching to alternative_if for v2.

The other option is to benchmark the proposed patches a bit and see if
we notice any difference on current hardware. Not sure exactly what
benchmarks would exercise these paths. For copy_page(), I suspect the
branch is probably lost in the noise. It's more like small copies that
might notice.

Yet another option is to leave the patches as they are and see if anyone
complains, we swap them over then ;).

--
Catalin