Date: Wed, 26 Jun 2024 19:08:42 +0100
From: Catalin Marinas
To: Jisheng Zhang
Cc: Will Deacon, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64/lib: copy_page: s/stnp/stp
References: <20240613001812.2141-1-jszhang@kernel.org>

On Wed, Jun 26, 2024 at 07:50:57PM +0800, Jisheng Zhang wrote:
> On Mon, Jun 24, 2024 at 06:56:33PM +0100, Catalin Marinas wrote:
> > On Thu, Jun 13, 2024 at 08:18:12AM +0800, Jisheng Zhang wrote:
> > > stnp performs a non-temporal store, giving a hint to the memory system
> > > that caching is not useful for this data. But the scenarios where
> > > copy_page() is used may not have this implication, although I must admit
> > > there are cases where stnp helps performance (good). In such good
> > > cases, we can rely on the HW write streaming mechanism in some
> > > implementations, such as Cortex-A55, to detect the case and take action.
> > >
> > > Testing with https://github.com/apinski-cavium/copy_page_benchmark,
> > > this patch can reduce the time by about 3% on Cortex-A55 platforms.
[...]
> > It looks like it always copies to the same page, so the stp may even
> > benefit from some caching of the data, which we wouldn't need in a real
> > scenario.
>
> Yep, this is also my understanding of where the improvement comes from.
> And I must admit there are cases where stnp helps performance; we can rely
> on the HW write streaming mechanism to detect them and take action.

Well, is that case realistic? Can you show any improvement with some
real-world uses? Most likely, modern CPUs fall back to non-temporal stores
after a series of STPs, but it depends on how soon they do it and how much
of the cache gets polluted. OTOH, page copying could be the result of a CoW,
and we'd expect subsequent accesses from the user where some caching may be
beneficial.

So it's hard to tell, but we shouldn't make a decision based on a
microbenchmark that writes over the same page multiple times. If you have
some real-world tests that exercise this path (e.g. CoW, Android app
startup) and show an improvement, I'd be in favour of this. Otherwise, no.

Thanks.

-- 
Catalin
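The concern above is that copying into the same destination page on every iteration keeps that page cache-hot, which favours stp. A minimal sketch of a benchmark that contrasts this with rotating through a pool of destination pages follows. It is portable C and uses memcpy() as a hypothetical stand-in for copy_page(); it does not reproduce the arm64 stp/stnp assembly. The names dest_index() and bench_copy() are illustrative only. The assumption is that the caller passes a pool larger than the last-level cache, so that rotated destinations are cold.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_SIZE 4096

/*
 * Call memcpy() through a volatile function pointer so the compiler cannot
 * treat repeated copies to the same destination as redundant and drop them.
 * Hypothetical stand-in for copy_page(), NOT the arm64 stp/stnp routine.
 */
static void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

/* Destination page for iteration i: always page 0 (the "same page"
 * pattern) when rotate is 0, otherwise cycling through the whole pool. */
size_t dest_index(size_t i, size_t pool_pages, int rotate)
{
    return rotate ? i % pool_pages : 0;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Copy one source page into the pool 'iters' times and return the
 * elapsed wall-clock time in seconds. */
double bench_copy(unsigned char *pool, size_t pool_pages,
                  const unsigned char *src, size_t iters, int rotate)
{
    assert(pool_pages > 0);
    double t0 = now_sec();
    for (size_t i = 0; i < iters; i++) {
        size_t idx = dest_index(i, pool_pages, rotate);
        copy_fn(pool + idx * PAGE_SIZE, src, PAGE_SIZE);
    }
    return now_sec() - t0;
}
```

To use it, one would allocate a pool well above the LLC size (for example 64 MiB, an assumption to adjust per platform) and compare bench_copy(pool, n, src, iters, 0) against bench_copy(pool, n, src, iters, 1). Even the rotating variant only times the copy itself. It does not capture the later user accesses after a CoW, which is why real-world workloads remain the better evidence.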