Date: Wed, 22 Nov 2023 12:11:16 +0000
Message-ID: <86il5uyom3.wl-maz@kernel.org>
From: Marc Zyngier
To: Mark Rutland
Cc: Will Deacon, Huang Shijie, catalin.marinas@arm.com,
	suzuki.poulose@arm.com, broonie@kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	anshuman.khandual@arm.com, robh@kernel.org, oliver.upton@linux.dev,
	patches@amperecomputing.com
Subject: Re: [PATCH 0/4] arm64: an optimization for AmpereOne
References: <20231122092855.4440-1-shijie@os.amperecomputing.com>
	<20231122094857.GA2959@willie-the-truck>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 22 Nov 2023 11:40:09 +0000,
Mark Rutland wrote:
>
> On Wed, Nov 22, 2023 at 09:48:57AM +0000, Will Deacon wrote:
> > On Wed, Nov 22, 2023 at 05:28:51PM +0800, Huang Shijie wrote:
> > > 0) Background:
> > >    We found that AmpereOne benefits from aggressive prefetches when
> > >    using 4K page size.
> >
> > We tend to shy away from micro-architecture specific optimisations in
> > the arm64 kernel as they're pretty unmaintainable, hard to test properly,
> > generally lead to bloat and add additional obstacles to updating our
> > library routines.
> >
> > Admittedly, we have something for Thunder-X1 in copy_page() (disguised
> > as ARM64_HAS_NO_HW_PREFETCH) but, frankly, that machine needed all the
> > help it could get and given where it is today I suspect we could drop
> > that code without any material consequences.
> >
> > So I'd really prefer not to merge this; modern CPUs should do better at
> > copying data. It's copy_to_user(), not rocket science.
>
> I agree, and I'd also like to drop ARM64_HAS_NO_HW_PREFETCH.

+1. Also, as the (most probably) sole user of this remarkable
implementation, I hacked -rc2 to drop ARM64_HAS_NO_HW_PREFETCH. The
result is that a kernel compilation job regressed by 0.4%, something
that I consider being pure noise.

If nobody beats me to it, I'll send the patch.

	M.

-- 
Without deviation from the norm, progress is not possible.