Date: Fri, 10 Jul 2020 10:35:16 +0100
From: Mark Rutland
To: Pingfan Liu
Cc: Jean-Philippe Brucker, Vladimir Murzin, Steve Capper, Catalin Marinas, Will Deacon, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path
Message-ID: <20200710093516.GA25856@C02TD0UTHF1T.local>
References: <1593755079-2160-1-git-send-email-kernelfans@gmail.com>
 <20200703101336.GA31383@C02TD0UTHF1T.local>
 <20200709114805.GA11227@C02TD0UTHF1T.local>
On Fri, Jul 10, 2020 at 04:03:39PM +0800, Pingfan Liu wrote:
> On Thu, Jul 9, 2020 at 7:48 PM Mark Rutland wrote:
> [...]
> > IIUC that's a 0.3% improvement. It'd be worth putting these results
> > in the commit message.
> Sure, I will.
>
> > Could you also try that with "perf bench sched messaging" as the
> > workload? As a microbenchmark, that might show the highest potential
> > benefit, and it'd be nice to have those figures too if possible.
> I have finished 10 runs of this test, and will put the results in the
> commit log too. In summary, this microbenchmark shows about a 1.69%
> improvement with this patch.

Great; thanks for gathering this data!

Mark.

> Test data:
>
> 1. without this patch, total 0.707 sec over 10 runs
>
> # perf stat -r 10 perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time per run: 0.074, 0.071, 0.068, 0.072, 0.070, 0.070, 0.072,
> 0.072, 0.068, 0.070 [sec]
>
> Performance counter stats for 'perf bench sched messaging' (10 runs):
>
>        3,102.15 msec task-clock          # 11.018 CPUs utilized    ( +- 0.47% )
>          16,468      context-switches    #  0.005 M/sec            ( +- 2.56% )
>           6,877      cpu-migrations      #  0.002 M/sec            ( +- 3.44% )
>          83,645      page-faults         #  0.027 M/sec            ( +- 0.05% )
>   6,440,897,966      cycles              #  2.076 GHz              ( +- 0.37% )
>   3,620,264,483      instructions        #  0.56 insn per cycle    ( +- 0.11% )
>                      branches
>      11,187,394      branch-misses                                 ( +- 0.73% )
>
>         0.28155 +- 0.00166 seconds time elapsed  ( +- 0.59% )
>
> 2. with this patch, total 0.695 sec over 10 runs
>
> # perf stat -r 10 perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time per run: 0.069, 0.070, 0.070, 0.070, 0.071, 0.069, 0.072,
> 0.066, 0.069, 0.069 [sec]
>
> Performance counter stats for 'perf bench sched messaging' (10 runs):
>
>        3,098.48 msec task-clock          # 11.182 CPUs utilized    ( +- 0.38% )
>          15,485      context-switches    #  0.005 M/sec            ( +- 2.28% )
>           6,707      cpu-migrations      #  0.002 M/sec            ( +- 2.80% )
>          83,606      page-faults         #  0.027 M/sec            ( +- 0.00% )
>   6,435,068,186      cycles              #  2.077 GHz              ( +- 0.26% )
>   3,611,197,297      instructions        #  0.56 insn per cycle    ( +- 0.08% )
>                      branches
>      11,323,244      branch-misses                                 ( +- 0.51% )
>
>        0.277087 +- 0.000625 seconds time elapsed  ( +- 0.23% )
>
> Thanks,
> Pingfan

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
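The ~1.69% summary figure in the thread follows from the per-run "Total time" values quoted in the test data. A minimal sanity check of that arithmetic (not part of the original thread, just a reproduction of the quoted numbers):

```python
# Per-run "Total time" values [sec] quoted in the thread, 10 runs each.
before = [0.074, 0.071, 0.068, 0.072, 0.070, 0.070, 0.072, 0.072, 0.068, 0.070]
after  = [0.069, 0.070, 0.070, 0.070, 0.071, 0.069, 0.072, 0.066, 0.069, 0.069]

total_before = sum(before)  # 0.707 sec, matching "total 0.707 sec"
total_after  = sum(after)   # 0.695 sec, matching "total 0.695 sec"

# Relative improvement of the patched kernel over the baseline.
improvement = (total_before - total_after) / total_before * 100
print(f"{total_before:.3f} -> {total_after:.3f} sec: {improvement:.2f}% improvement")
# ~1.70%; the thread quotes "about 1.69%", likely a truncated rounding.
```

Note that this compares summed wall-clock totals of the benchmark runs; the `perf stat` elapsed-time averages (0.28155 s vs. 0.277087 s) give a slightly smaller delta, as they also include per-run setup overhead.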