Date: Fri, 23 Apr 2021 16:37:02 +0100
From: Catalin Marinas
To: Kai Shen
Cc: will@kernel.org, linux-arm-kernel@lists.infradead.org, LKML,
 xuwei5@hisilicon.com, hewenliang4@huawei.com, wuxu.wu@huawei.com
Subject: Re: [PATCH] arm64:align function __arch_clear_user
Message-ID: <20210423153701.GP18757@arm.com>
In-Reply-To:
 <6829062c-a2d4-57da-4037-269fb7508993@huawei.com>

On Mon, Apr 19, 2021 at 10:05:16AM +0800, Kai Shen wrote:
> On 2021/4/14 18:41, Catalin Marinas wrote:
> > On Wed, Apr 14, 2021 at 05:25:43PM +0800, Kai Shen wrote:
> > > Performance decreases happen in __arch_clear_user when this
> > > function is not correctly aligned on the HISI-HIP08 arm64 SoC, which
> > > fetches 32 bytes (8 instructions) from the icache with a 32-byte
> > > aligned end address. As a result, if the hot loop is not 32-byte
> > > aligned, it may take more icache fetches, which leads to a decrease
> > > in performance.
> > >
> > > Dump of assembler code for function __arch_clear_user:
> > >       0xffff0000809e3f10 :    nop
> > >       0xffff0000809e3f14 :    mov   x2, x1
> > >       0xffff0000809e3f18 :    subs  x1, x1, #0x8
> > >       0xffff0000809e3f1c :    b.mi  0xffff0000809e3f30 <__arch_clear_user+3
> > > ----- 0xffff0000809e3f20 :    str   xzr, [x0],#8
> > > hot   0xffff0000809e3f24 :    nop
> > > loop  0xffff0000809e3f28 :    subs  x1, x1, #0x8
> > > ----- 0xffff0000809e3f2c :    b.pl  0xffff0000809e3f20 <__arch_clear_user+1
> > >
> > > The hot loop above takes one icache fetch because the code sits in
> > > one 32-byte aligned block, and the loop takes one more icache fetch
> > > when it is not aligned, as below.
> > >       0xffff0000809e4178 :    str   xzr, [x0],#8
> > >       0xffff0000809e417c :    nop
> > >       0xffff0000809e4180 :    subs  x1, x1, #0x8
> > >       0xffff0000809e4184 :    b.pl  0xffff0000809e4178 <__arch_clear_user+
> > >
> > > Data collected by perf:
> > >                               aligned   not aligned
> > >     instructions             57733790      57739065
> > >     L1-dcache-store          14938070      13718242
> > >     L1-dcache-store-misses     349280        349869
> > >     L1-icache-loads          15380895      28500665
> > >
> > > As we can see, L1-icache-loads almost double when the loop is not
> > > aligned.
> > >
> > > This problem was found in Linux 4.19 on the HISI-HIP08 arm64 SoC.
> > > We are not sure what the case is on other arm64 SoCs, but it should
> > > do no harm.
> > >
> > > Signed-off-by: Kai Shen
> >
> > Do you have a real-world workload that's affected by this function?
> >
> > I'm against adding alignments and nops for specific hardware
> > implementations. What about lots of other loops that the compiler may
> > generate or that we wrote in asm?
>
> The benchmarks we used that suffer a performance decrease:
> https://github.com/redhat-performance/libMicro
> pread $OPTS -N "pread_z1k" -s 1k -I 300 -f /dev/zero
> pread $OPTS -N "pread_z10k" -s 10k -I 1000 -f /dev/zero
> pread $OPTS -N "pread_z100k" -s 100k -I 2000 -f /dev/zero

Is there any real-world use case that would benefit from this
optimisation? Reading /dev/zero in a loop hardly counts as a practical
workload.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
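The fetch-count argument in the quoted patch description can be checked with simple address arithmetic. The C sketch below is a simplified model, assuming, as the patch states for HISI-HIP08, that each icache access returns one 32-byte-aligned block of eight 4-byte A64 instructions. fetch_blocks() and align_padding() are hypothetical helpers for illustration, not kernel functions, and the model ignores prefetching, loop buffers and branch prediction.

```c
#include <stdint.h>

/*
 * Simplified front-end model (assumption taken from the patch text):
 * each icache access returns one 32-byte-aligned block, i.e. eight
 * 4-byte A64 instructions.
 */
#define FETCH_BLOCK_BYTES 32u
#define A64_INSN_BYTES     4u

/*
 * Hypothetical helper: number of distinct 32-byte fetch blocks covered by
 * a straight-line loop body of 'ninsns' instructions (ninsns > 0) whose
 * first instruction is at 'start'. Under the model above this is the
 * number of icache fetches needed per loop iteration.
 */
static unsigned fetch_blocks(uint64_t start, unsigned ninsns)
{
    /* Address of the last byte of the loop body. */
    uint64_t last = start + (uint64_t)ninsns * A64_INSN_BYTES - 1;

    return (unsigned)(last / FETCH_BLOCK_BYTES - start / FETCH_BLOCK_BYTES + 1);
}

/*
 * Hypothetical helper: bytes of padding (nops) needed to move 'addr'
 * up to the next 32-byte boundary; 0 if it is already aligned.
 */
static unsigned align_padding(uint64_t addr)
{
    return (unsigned)((FETCH_BLOCK_BYTES - addr % FETCH_BLOCK_BYTES) % FETCH_BLOCK_BYTES);
}
```

Fed the four-instruction loop addresses from the two dumps, fetch_blocks(0xffff0000809e3f20, 4) returns 1 and fetch_blocks(0xffff0000809e4178, 4) returns 2, which is the extra fetch per iteration the patch describes. align_padding(0xffff0000809e4178) returns 8, i.e. two nops would be needed to start that loop on a 32-byte boundary. The model only encodes the fetch behaviour described for HIP08 and makes no prediction for other arm64 implementations.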