Date: Wed, 10 Feb 2021 11:09:56 +0000
From: Dave Martin
To: Mark Brown
Subject: Re: [PATCH v7 2/2] arm64/sve: Rework SVE trap access to minimise memory access
Message-ID: <20210210110951.GJ21837@arm.com>
References: <20210201122901.11331-1-broonie@kernel.org>
 <20210201122901.11331-3-broonie@kernel.org>
In-Reply-To: <20210201122901.11331-3-broonie@kernel.org>
Cc: Julien Grall, Catalin Marinas, Zhang Lei, Will Deacon,
 linux-arm-kernel@lists.infradead.org, Daniel Kiss

On Mon, Feb 01, 2021 at 12:29:01PM +0000, Mark Brown wrote:
> When we take an SVE access trap, only the subset of the SVE Z0-Z31
> registers shared with the FPSIMD V0-V31 registers is valid; the rest
> of the bits in the SVE registers must be cleared before returning to
> userspace. Currently we do this by saving the current FPSIMD register
> state to the task struct and then using that to initialize the copy of
> the SVE registers in the task struct so they can be loaded from there
> into the registers. This requires a lot more memory access than we
> need.
>
> The newly added TIF_SVE_FULL_REGS can be used to reduce this overhead:
> instead of doing the conversion immediately we can set only TIF_SVE_EXEC
> and not TIF_SVE_FULL_REGS. This means that until we return to userspace
> we only need to store the FPSIMD registers, and if (as should be the
> common case) the hardware still has the task state and does not need
> it to be reloaded from the task struct, we can do the initialization of
> the SVE state entirely in registers. In the event that we do need to
> reload the registers from the task struct, only the FPSIMD subset needs
> to be loaded from memory.
>
> If the FPSIMD state is loaded then we need to set the vector length;
> this is because the vector length is only set when loading from memory,
> and the expectation is that the vector length is set when TIF_SVE_EXEC
> is set. We also need to rebind the task to the CPU so the newly
> allocated SVE state is used when the task is saved.
>
> This is based on earlier work by Julien Grall implementing a similar
> idea.
>
> Signed-off-by: Mark Brown
> ---
>  arch/arm64/include/asm/fpsimd.h  |  2 ++
>  arch/arm64/kernel/entry-fpsimd.S |  5 +++++
>  arch/arm64/kernel/fpsimd.c       | 35 +++++++++++++++++++++-----------
>  3 files changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index bec5f14b622a..e60aa4ebb351 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -74,6 +74,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
>  				       unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>  
> +extern void sve_set_vq(unsigned long vq_minus_1);
> +
>  struct arm64_cpu_capabilities;
>  extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
>  
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index 2ca395c25448..3ecec60d3295 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl)
>  	ret
>  SYM_FUNC_END(sve_get_vl)
>  
> +SYM_FUNC_START(sve_set_vq)
> +	sve_load_vq	x0, x1, x2
> +	ret
> +SYM_FUNC_END(sve_set_vq)
> +
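To make the deferred flush described in the commit message concrete,
here is a rough sketch of the conversion that ret_to_user is expected
to perform. This is not the patch's code: sve_flush_live() is assumed
from elsewhere in this series, and the control flow is simplified.

static void sve_finish_access_trap_sketch(struct task_struct *tsk)
{
	if (test_tsk_thread_flag(tsk, TIF_SVE_FULL_REGS))
		return;		/* no conversion pending */

	if (!test_tsk_thread_flag(tsk, TIF_FOREIGN_FPSTATE)) {
		/*
		 * Common case: the task's state is still live in the
		 * hardware, so zero the bits not shared with V0-V31
		 * entirely in registers.
		 */
		sve_flush_live();
	} else {
		/*
		 * Otherwise reload only the FPSIMD subset from memory;
		 * the load zeroes the remaining SVE bits as a side
		 * effect.
		 */
		sve_load_from_fpsimd_state(&tsk->thread.uw.fpsimd_state,
					   sve_vq_from_vl(tsk->thread.sve_vl) - 1);
	}

	/* The full SVE register state is now valid. */
	set_tsk_thread_flag(tsk, TIF_SVE_FULL_REGS);
}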
>  /*
>   * Load SVE state from FPSIMD state.
>   *
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 58c749ef04c4..05caf207e2ce 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -994,10 +994,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
>  /*
>   * Trapped SVE access
>   *
> - * Storage is allocated for the full SVE state, the current FPSIMD
> - * register contents are migrated across, and TIF_SVE_EXEC is set so that
> - * the SVE access trap will be disabled the next time this task
> - * reaches ret_to_user.
> + * Storage is allocated for the full SVE state so that the code
> + * running subsequently has somewhere to save the SVE registers to. We
> + * then rely on ret_to_user to actually convert the FPSIMD registers
> + * to SVE state by flushing as required.
>   *
>   * TIF_SVE_EXEC should be clear on entry: otherwise,
>   * fpsimd_restore_current_state() would have disabled the SVE access
> @@ -1016,15 +1016,26 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  
>  	get_cpu_fpsimd_context();
>  
> -	fpsimd_save();
> -
> -	/* Force ret_to_user to reload the registers: */
> -	fpsimd_flush_task_state(current);
> -
> -	fpsimd_to_sve(current);
> +	/*
> +	 * We shouldn't trap if we can execute SVE instructions and
> +	 * there should be no SVE state if that is the case.
> +	 */
>  	if (test_and_set_thread_flag(TIF_SVE_EXEC))
> -		WARN_ON(1); /* SVE access shouldn't have trapped */
> -	set_thread_flag(TIF_SVE_FULL_REGS);
> +		WARN_ON(1);
> +	if (test_and_clear_thread_flag(TIF_SVE_FULL_REGS))
> +		WARN_ON(1);
> +
> +	/*
> +	 * When the FPSIMD state is loaded:
> +	 *	- The return path (see fpsimd_restore_current_state) requires
> +	 *	  the vector length to be loaded beforehand.
> +	 *	- We need to rebind the task to the CPU so the newly allocated
> +	 *	  SVE state is used when the task is saved.
> +	 */
> +	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);

Hmmm, I can see why we need this here, but it feels slightly odd.
Still, I don't have a better idea.

Logically, this is all part of a single state change, where we
transition from live FPSIMD-only state in the registers to live SVE
state with a pending flush. Although we could wrap that up in a
helper, we only do this particular transition here, so I guess
factoring it out may not be worth it.

> +		fpsimd_bind_task_to_cpu();
> +	}
>  
>  	put_cpu_fpsimd_context();

From here, can things go wrong if we get preempted and scheduled out?

I think fpsimd_save() would just set TIF_SVE_FULL_REGS and save out
the full register data, which may contain stale data in the non-FPSIMD
bits because we haven't flushed them yet.

Assuming I've not confused myself here, the same thing could probably
happen with Ard's changes if a softirq uses kernel_neon_begin(),
causing fpsimd_save() to get called.

Cheers
---Dave
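For reference, a minimal sketch of the hazard described above, assuming
fpsimd_save() from patch 1 behaves as suggested. The interleaving is:
do_sve_acc() leaves TIF_SVE_EXEC set and TIF_SVE_FULL_REGS clear with
the flush still pending; the task is then preempted before ret_to_user
runs, and the context-switch path calls something like:

static void fpsimd_save_sketch(void)
{
	if (test_thread_flag(TIF_SVE_EXEC)) {
		/*
		 * The full SVE registers are written out even though
		 * the deferred flush has not run yet, so the bits not
		 * shared with V0-V31 may still hold stale data...
		 */
		sve_save_state(sve_pffr(&current->thread),
			       &current->thread.uw.fpsimd_state.fpsr);
		/* ...and marking them valid makes that stale data live. */
		set_thread_flag(TIF_SVE_FULL_REGS);
	} else {
		fpsimd_save_state(&current->thread.uw.fpsimd_state);
	}
}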