Date: Wed, 26 Jun 2024 19:50:57 +0800
From: Jisheng Zhang
To: Catalin Marinas
Cc: Will Deacon, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64/lib: copy_page: s/stnp/stp
References: <20240613001812.2141-1-jszhang@kernel.org>

On Mon, Jun 24, 2024 at 06:56:33PM +0100, Catalin Marinas wrote:
> On Thu, Jun 13, 2024 at 08:18:12AM +0800, Jisheng Zhang wrote:
> > stnp performs non-temporal store, give a hints to the memory system
> > that caching is not useful for this data. But the scenario where
> > copy_page() used may not have this implication, although I must admit
> > there's such case where stnp helps performance(good). In this good
> > case, we can rely on the HW write streaming mechanism in some
> > implementations such as cortex-a55 to detect the case and take actions.
> >
> > testing with https://github.com/apinski-cavium/copy_page_benchmark
> > this patch can reduce the time by about 3% on cortex-a55 platforms.
>
> What about other CPUs? I'm also not convinced by such microbenchmarks.

Per my tests on CA53 and CA73, CA73 got a similar improvement. On CA53
there's no difference, maybe due to the following commit in the ATF:
54035fc4672aa ("Disable non-temporal hint on Cortex-A53/57")

> It looks like it always copies to the same page, the stp may even
> benefit from some caching of the data which we wouldn't need in a real
> scenario.

Yep, this is also my understanding of where the improvement comes from.
And I must admit there are cases where stnp helps performance. In those
cases we can rely on the HW write streaming mechanism to detect them and
take action. However, sometimes the "cache" behavior can benefit the
scenario, and in that case the stnp here would be a double loss. What do
you think?

>
> So, I'm not merging this unless it's backed by some solid data across
> several CPU implementations.
>
> --
> Catalin
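
To separate the "same page stays cached" effect from the store
instruction itself, one option is to time the copy both ways: always
into one destination page, and rotating through a pool of pages much
larger than the caches. The following is only a rough userspace sketch
of that idea, not the benchmark linked above. memcpy() stands in for
copy_page(), and PAGE_SZ, dst_index(), bench_copy() and the
COPY_BENCH_MAIN macro are hypothetical names made up for illustration.
A real comparison would swap in stp- and stnp-based arm64 copy loops.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_SZ 4096u /* assumed page size, hypothetical constant */

/* Nanoseconds from a monotonic clock. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Destination page index for iteration i. same_page != 0 reproduces
 * the "always copies to the same page" pattern; otherwise the pool of
 * npages pages is walked round-robin.
 */
static size_t dst_index(size_t i, size_t npages, int same_page)
{
    assert(npages > 0);
    return same_page ? 0 : i % npages;
}

/* Copy src into the pool iters times; return the elapsed time in ns. */
static uint64_t bench_copy(unsigned char *pool, size_t npages,
                           const unsigned char *src, size_t iters,
                           int same_page)
{
    uint64_t start = now_ns();

    for (size_t i = 0; i < iters; i++) {
        unsigned char *dst =
            pool + dst_index(i, npages, same_page) * PAGE_SZ;

        /* Stand-in for copy_page(); assumption, not the kernel code. */
        memcpy(dst, src, PAGE_SZ);
        /* GCC/Clang compiler barrier so the copy is not optimized out. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    return now_ns() - start;
}

#ifdef COPY_BENCH_MAIN
int main(void)
{
    size_t npages = 16384; /* 64 MiB pool, assumed larger than the caches */
    size_t iters = (size_t)1 << 18;
    unsigned char *pool = aligned_alloc(PAGE_SZ, npages * PAGE_SZ);
    unsigned char *src = aligned_alloc(PAGE_SZ, PAGE_SZ);

    if (!pool || !src)
        return 1;
    memset(pool, 0, npages * PAGE_SZ); /* fault the pages in up front */
    memset(src, 0xa5, PAGE_SZ);

    printf("same page: %llu ns\n",
           (unsigned long long)bench_copy(pool, npages, src, iters, 1));
    printf("rotating:  %llu ns\n",
           (unsigned long long)bench_copy(pool, npages, src, iters, 0));
    free(src);
    free(pool);
    return 0;
}
#endif
```

Built with -DCOPY_BENCH_MAIN this prints one timing per mode. If an
stp-vs-stnp gap shows up only in the same-page mode, that would suggest
the gain comes from cache residency of the destination rather than from
anything a real copy_page() caller would see.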