Date: Thu, 17 Oct
 2024 12:57:22 +0100
From: Catalin Marinas
To: Kristina Martsenko
Cc: linux-arm-kernel@lists.infradead.org, Will Deacon, Mark Rutland, Robin Murphy, Marc Zyngier
Subject: Re: [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
References: <20240930161051.3777828-1-kristina.martsenko@arm.com> <20240930161051.3777828-5-kristina.martsenko@arm.com>

On Wed, Oct 16, 2024 at 02:08:27PM +0100, Kristina Martsenko wrote:
> On 04/10/2024 11:07, Catalin Marinas wrote:
> > On Thu, Oct 03, 2024 at 05:46:08PM +0100, Kristina Martsenko wrote:
> >> On 02/10/2024 16:29, Catalin Marinas wrote:
> >>> On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
> >>>> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
> >>>> index 4ab48d49c451..9b99106fb95f 100644
> >>>> --- a/arch/arm64/lib/memcpy.S
> >>>> +++ b/arch/arm64/lib/memcpy.S
> >>>> @@ -57,7 +57,7 @@
> >>>>     The loop tail is handled by always copying 64 bytes from the end.
> >>>>  */
> >>>>
> >>>> -SYM_FUNC_START(__pi_memcpy)
> >>>> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
> >>>>      add     srcend, src, count
> >>>>      add     dstend, dstin, count
> >>>>      cmp     count, 128
> >>>> @@ -238,7 +238,24 @@ L(copy64_from_start):
> >>>>      stp     B_l, B_h, [dstin, 16]
> >>>>      stp     C_l, C_h, [dstin]
> >>>>      ret
> >>>> +SYM_FUNC_END(__pi_memcpy_generic)
> >>>> +
> >>>> +#ifdef CONFIG_AS_HAS_MOPS
> >>>> +    .arch_extension mops
> >>>> +SYM_FUNC_START(__pi_memcpy)
> >>>> +alternative_if_not ARM64_HAS_MOPS
> >>>> +    b       __pi_memcpy_generic
> >>>> +alternative_else_nop_endif
> >>>
> >>> I'm fine with patching the branch but I wonder whether, for the time
> >>> being, we should use alternative_if instead and the NOP to fall through
> >>> the default implementation. The hardware in the field doesn't have
> >>> FEAT_MOPS yet and they may see a slight penalty introduced by the
> >>> branch, especially for small memcpys. Just guessing, I haven't done any
> >>> benchmarks.
> >>
> >> My thinking was that this way it doesn't have to be changed again in the
> >> future. But I'm fine with switching to alternative_if for v2.
> >
> > The other option is to benchmark the proposed patches a bit and see if
> > we notice any difference on current hardware. Not sure exactly what
> > benchmarks would exercise these paths. For copy_page(), I suspect the
> > branch is probably lost in the noise. It's more like small copies that
> > might notice.
> >
> > Yet another option is to leave the patches as they are and see if anyone
> > complains, we swap them over then ;).
>
> I tried benchmarking a kernel build and hackbench on a Morello board (with
> usercopy patches applied as well) but didn't see any significant performance
> difference between the branch and NOP so I would leave the patches as they are.

That's great. Thanks for checking.

-- 
Catalin