Subject: Patch "arm64: rework EL0 MRS emulation" has been added to the 5.10-stable tree
To: anshuman.khandual@arm.com,ardb@kernel.org,broonie@kernel.org,catalin.marinas@arm.com,ebiederm@xmission.com,f.fainelli@gmail.com,gregkh@linuxfoundation.org,haibinzhang@tencent.com,hewenliang4@huawei.com,james.morse@arm.com,joey.gouly@arm.com,linux-arm-kernel@lists.infradead.org,mark.rutland@arm.com,peterz@infradead.org,ruanjinjie@huawei.com,sashal@kernel.org,scott@os.amperecomputing.com,stable@kernel.org,will@kernel.org,youngmin.nam@samsung.com,yuzenghui@huawei.com
Date: Mon, 16 Oct 2023 10:02:25 +0200
In-Reply-To: <20231011100545.979577-11-ruanjinjie@huawei.com>
Message-ID: <2023101625-hacker-clump-4447@gregkh>


This is a note to let you know that I've just added the patch titled

    arm64: rework EL0 MRS emulation

to the 5.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     arm64-rework-el0-mrs-emulation.patch
and it can be found in the queue-5.10
subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.


From ruanjinjie@huawei.com Wed Oct 11 12:06:38 2023
From: Jinjie Ruan
Date: Wed, 11 Oct 2023 10:05:40 +0000
Subject: arm64: rework EL0 MRS emulation
Message-ID: <20231011100545.979577-11-ruanjinjie@huawei.com>

From: Mark Rutland

commit f5962add74b61f8ae31c6311f75ca35d7e1d2d8f upstream.

On CPUs without FEAT_IDST, ID register emulation is slower than it needs
to be, as all threads contend for the same lock to perform the
emulation. This patch reworks the emulation to avoid this unnecessary
contention.

On CPUs with FEAT_IDST (which is mandatory from ARMv8.4 onwards), EL0
accesses to ID registers result in a SYS trap, and emulation of these is
handled with a sys64_hook. These hooks are statically allocated, and no
locking is required to iterate through the hooks and perform the
emulation, allowing emulation to occur in parallel with no contention.

On CPUs without FEAT_IDST, EL0 accesses to ID registers result in an
UNDEFINED exception, and emulation of these accesses is handled with an
undef_hook. When an EL0 MRS instruction is trapped to EL1, the kernel
finds the relevant handler by iterating through all of the undef_hooks,
requiring undef_lock to be held during this lookup.

This locking is only required to safely traverse the list of undef_hooks
(as it can be concurrently modified), and the actual emulation of the
MRS does not require any mutual exclusion. This locking is an
unfortunate bottleneck, especially given that MRS emulation is enabled
unconditionally and is never disabled.

This patch reworks the non-FEAT_IDST MRS emulation logic so that it can
be invoked directly from do_el0_undef(). This removes the bottleneck,
allowing MRS traps to be handled entirely in parallel, and is a stepping
stone to making all of the undef_hooks lock-free.
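
As a rough userspace illustration of the check-and-decode step that
try_emulate_mrs() performs before handing off to do_emulate_mrs(), the
sketch below mirrors it without any kernel dependencies. The mask and
value come from the mrs_hook removed by this patch. The sketch_* names
are hypothetical, and the bit positions (16-bit immediate in bits
[20:5], Rt in bits [4:0]) are my assumption about what
aarch64_insn_decode_immediate(AARCH64_INSN_IMM_16, ...) and
aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RT, ...) extract. It
is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask/value pair taken from the mrs_hook that this patch removes:
 * an instruction is treated as MRS when bits [31:20] equal 0xd53.
 */
#define SKETCH_MRS_INSN_MASK 0xfff00000u
#define SKETCH_MRS_INSN_VAL  0xd5300000u

/* Hypothetical stand-in for aarch64_insn_is_mrs(). */
static bool sketch_insn_is_mrs(uint32_t insn)
{
	return (insn & SKETCH_MRS_INSN_MASK) == SKETCH_MRS_INSN_VAL;
}

/*
 * Hypothetical stand-in for the sys_reg computation. Assumes the
 * 16-bit immediate lives in bits [20:5]; shifting it left by 5 puts
 * the fields back where they sit in the instruction word.
 */
static uint32_t sketch_mrs_sys_reg(uint32_t insn)
{
	uint32_t imm16 = (insn >> 5) & 0xffffu;

	return imm16 << 5;
}

/* Hypothetical stand-in for decoding Rt, assumed to be bits [4:0]. */
static uint32_t sketch_mrs_rt(uint32_t insn)
{
	return insn & 0x1fu;
}
```

For example, feeding in 0xd5380600 (which I believe is the encoding of
"mrs x0, ID_AA64ISAR0_EL1") passes the MRS check and yields a sys_reg
value of 0x180600 with Rt 0. None of these steps touches shared state,
which is consistent with the commit message's point that the emulation
itself needs no mutual exclusion.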

I've tested this in a 64-vCPU VM on a 64-CPU ThunderX2 host, with a
benchmark which spawns a number of threads which each try to read
ID_AA64ISAR0_EL1 1000000 times. This is vastly more contention than will
ever be seen in realistic usage, but clearly demonstrates the removal of
the bottleneck:

  | Threads || Time (seconds)                       |
  |         || Before           || After            |
  |         || Real   | System  || Real   | System  |
  |---------++--------+---------++--------+---------|
  | 1       || 0.29   | 0.20    || 0.24   | 0.12    |
  | 2       || 0.35   | 0.51    || 0.23   | 0.27    |
  | 4       || 1.08   | 3.87    || 0.24   | 0.56    |
  | 8       || 4.31   | 33.60   || 0.24   | 1.11    |
  | 16      || 9.47   | 149.39  || 0.23   | 2.15    |
  | 32      || 19.07  | 605.27  || 0.24   | 4.38    |
  | 64      || 65.40  | 3609.09 || 0.33   | 11.27   |

Aside from the speedup, there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland
Cc: Catalin Marinas
Cc: James Morse
Cc: Joey Gouly
Cc: Peter Zijlstra
Cc: Will Deacon
Link: https://lore.kernel.org/r/20221019144123.612388-6-mark.rutland@arm.com
Signed-off-by: Will Deacon
Signed-off-by: Jinjie Ruan
Signed-off-by: Greg Kroah-Hartman
---
 arch/arm64/include/asm/cpufeature.h |    3 ++-
 arch/arm64/kernel/cpufeature.c      |   23 +++++------------------
 arch/arm64/kernel/traps.c           |    3 +++
 3 files changed, 10 insertions(+), 19 deletions(-)

--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -759,7 +759,8 @@ static inline bool system_supports_tlb_r
 		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
 }
 
-extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
+int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
+bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
 
 static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
 {
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2852,35 +2852,22 @@ int do_emulate_mrs(struct pt_regs *regs,
 	return rc;
 }
 
-static int emulate_mrs(struct pt_regs *regs, u32 insn)
+bool try_emulate_mrs(struct pt_regs *regs, u32 insn)
 {
 	u32 sys_reg, rt;
 
+	if (compat_user_mode(regs) || !aarch64_insn_is_mrs(insn))
+		return false;
+
 	/*
 	 * sys_reg values are defined as used in mrs/msr instruction.
 	 * shift the imm value to get the encoding.
 	 */
 	sys_reg = (u32)aarch64_insn_decode_immediate(AARCH64_INSN_IMM_16, insn) << 5;
 	rt = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RT, insn);
-	return do_emulate_mrs(regs, sys_reg, rt);
-}
-
-static struct undef_hook mrs_hook = {
-	.instr_mask = 0xfff00000,
-	.instr_val  = 0xd5300000,
-	.pstate_mask = PSR_AA32_MODE_MASK,
-	.pstate_val = PSR_MODE_EL0t,
-	.fn = emulate_mrs,
-};
-
-static int __init enable_mrs_emulation(void)
-{
-	register_undef_hook(&mrs_hook);
-	return 0;
+	return do_emulate_mrs(regs, sys_reg, rt) == 0;
 }
 
-core_initcall(enable_mrs_emulation);
-
 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr,
 			  char *buf)
 {
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -408,6 +408,9 @@ void do_el0_undef(struct pt_regs *regs,
 	if (user_insn_read(regs, &insn))
 		goto out_err;
 
+	if (try_emulate_mrs(regs, insn))
+		return;
+
 	if (call_undef_hook(regs, insn) == 0)
 		return;
 


Patches currently in stable-queue which might be from ruanjinjie@huawei.com are

queue-5.10/arm64-factor-insn-read-out-of-call_undef_hook.patch
queue-5.10/arm64-rework-el0-mrs-emulation.patch
queue-5.10/arm64-die-pass-err-as-long.patch
queue-5.10/arm64-armv8_deprecated-rework-deprected-instruction-handling.patch
queue-5.10/arm64-armv8_deprecated-fix-unused-function-error.patch
queue-5.10/arm64-armv8_deprecated-move-aarch32-helper-earlier.patch
queue-5.10/arm64-consistently-pass-esr_elx-to-die.patch
queue-5.10/arm64-factor-out-el1-ssbs-emulation-hook.patch
queue-5.10/arm64-report-el1-undefs-better.patch
queue-5.10/arm64-armv8_deprecated-fold-ops-into-insn_emulation.patch
queue-5.10/arm64-rework-bti-exception-handling.patch
queue-5.10/arm64-rework-fpac-exception-handling.patch
queue-5.10/arm64-split-el0-el1-undef-handlers.patch
queue-5.10/arm64-allow-kprobes-on-el0-handlers.patch
queue-5.10/arm64-armv8_deprecated-move-emulation-functions.patch