Date: Mon, 28 Sep 2020 16:22:06 +0100
From: Catalin Marinas
To: Gavin Shan
Cc: mark.rutland@arm.com, anshuman.khandual@arm.com, robin.murphy@arm.com, linux-kernel@vger.kernel.org, shan.gavin@gmail.com, will@kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v3 0/2] arm64/mm: Enable color zero pages
Message-ID: <20200928152206.GC27500@gaia>
References: <20200928072256.13098-1-gshan@redhat.com>
In-Reply-To: <20200928072256.13098-1-gshan@redhat.com>

Hi Gavin,

On Mon, Sep 28, 2020 at 05:22:54PM +1000, Gavin Shan wrote:
> Testing
> =======
> [1] The experiment reveals how heavily L1 data cache misses affect
>     overall application performance. The machine where the test
>     is carried out has the L1 data cache topology and host kernel
>     configuration listed below.
>
>     The test case allocates contiguous page frames through HugeTLBfs
>     and reads 4 bytes of data from the same offset (0x0) in each of
>     these N contiguous page frames. N is 8 and 9, respectively, in
>     the two test cases. This is repeated one million times.
>
>     Note that 8 is the number of L1 data cache ways. The experiment is
>     meant to cause L1 cache thrashing on one particular set.
>
>     Host:      CONFIG_ARM64_PAGE_SHIFT=12
>                DEFAULT_HUGE_PAGE_SIZE=2MB
>     L1 dcache: cache-line-size=64
>                number-of-sets=64
>                number-of-ways=8
>
>                               N=8            N=9
>     ------------------------------------------------------------------
>     cache-misses:             43,429         9,038,460
>     L1-dcache-load-misses:    43,429         9,038,460
>     seconds time elapsed:     0.299206372    0.722253140 (2.41 times)
>
> [2] The experiment should have been carried out on a machine where the
>     capacity of one L1 data cache way is larger than 4KB. However, I'm
>     unable to find such a machine, so I have to evaluate the
>     performance impact caused by L2 data cache thrashing instead.
>     The experiment is carried out on a machine with the following
>     L1/L2 data cache topology. The host kernel configuration is the
>     same as in [1].
>
>     The corresponding test program allocates contiguous page frames
>     through hugeTLBfs and builds VMAs backed by zero pages. These
>     contiguous pages are read sequentially from a fixed offset (0) in
>     steps of 32KB, 8 times. After that, the VMA backed by zero pages is
>     read sequentially in steps of 4KB, once. This is repeated 8
>     million times.
>
>     Note that 32KB is the capacity of one L2 data cache way and 8 is
>     the number of L2 data cache ways. This experiment is meant to cause
>     L2 data cache thrashing on one particular set.
>
>     L1 dcache:
>     L2 dcache: cache-line-size=64
>                number-of-sets=512
>                number-of-ways=8
>
>     -----------------------------------------------------------------------
>     cache-references:         1,427,213,737    1,421,394,472
>     cache-misses:             35,804,552       42,636,698
>     L1-dcache-load-misses:    35,804,552       42,636,698
>     seconds time elapsed:     2.602511671      2.098198172 (+19.3%)

No one is denying a performance improvement in a very specific case, but
what's missing here is an explanation of how these artificial benchmarks
relate to real-world applications.

-- 
Catalin
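
For reference, the set-conflict arithmetic behind test [1] can be sketched
with a toy model. This is only an illustration, not the test program from
the patch series: it assumes true LRU replacement and that the set index is
taken directly from the address, neither of which is stated in the thread.
The names set_index and count_misses are hypothetical. With 64 sets of
64-byte lines, one way covers 64 * 64 = 4096 bytes, which equals the page
size, so offset 0 of every 4KB page selects the same set.

```python
from collections import OrderedDict

# Values taken from the quoted L1 dcache topology and host config.
LINE_SIZE = 64      # bytes per cache line
NUM_SETS = 64       # sets in the L1 data cache
NUM_WAYS = 8        # ways per set
PAGE_SIZE = 4096    # CONFIG_ARM64_PAGE_SHIFT=12


def set_index(addr, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Cache set selected by an address (assumes simple modulo indexing)."""
    return (addr // line_size) % num_sets


def count_misses(n_pages, iterations, num_ways=NUM_WAYS):
    """Read offset 0 of n_pages contiguous pages, `iterations` times,
    through a toy set-associative cache with true LRU replacement.
    LRU is an assumption; real replacement policies differ."""
    sets = {}       # set index -> OrderedDict of resident line tags
    misses = 0
    for _ in range(iterations):
        for page in range(n_pages):
            addr = page * PAGE_SIZE
            idx = set_index(addr)
            tag = addr // (LINE_SIZE * NUM_SETS)
            lines = sets.setdefault(idx, OrderedDict())
            if tag in lines:
                lines.move_to_end(tag)          # hit: mark most recently used
            else:
                misses += 1
                if len(lines) >= num_ways:
                    lines.popitem(last=False)   # evict least recently used
                lines[tag] = True
    return misses


if __name__ == "__main__":
    for n in (8, 9):
        print(f"N={n}: {count_misses(n, 1000)} misses in 1000 iterations")
```

In this model N=8 takes only the 8 cold misses, while N=9 misses on every
access, i.e. 9 misses per iteration. Scaled to one million iterations that
is 9,000,000, which is close to the 9,038,460 reported above. The model
says nothing about how often real workloads hit this access pattern.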