From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 383BDC433E0 for ; Mon, 8 Feb 2021 17:27:45 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C510364E84 for ; Mon, 8 Feb 2021 17:27:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C510364E84 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=R0WrcdA/2womW1wZDjuAU39nZffEWOY+1QvvmBEsoug=; b=xo8aM5j7nsYugMG5dsOOtikjN KKVLbn+2ql5NkqXuFNwvihDCRC4K2rsRXrjQ3j84/iIBIwABTO/nwnxYiqobFddfXGVQs7KK2eh7U dSPokQHjvdPKPKryEreKV3HcSsPjyN6ijaO7VBMaP0D2kFD0bwftM8mwDenMI7g0MzWLCKiBjqZG1 tUsuNTUydAhcm3upkcAUvDKRA1aWMJmzGM5zSmvZPI3ON2gvWU3jLTDR7xPQuXHjbbTJBQNdAy2QZ 4bfkh3ycAi4u24ef4Id6+lcWQIzgnF+bAsCadq1LgiUJNlRthWmLU8SA+voI1RLtTUFl8KT5kJsOu KZ7RcYrfA==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1l9AJ3-0001ZA-KN; Mon, 08 Feb 2021 17:26:37 +0000 Received: from foss.arm.com ([217.140.110.172]) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1l9AJ1-0001Yd-Cy for linux-arm-kernel@lists.infradead.org; Mon, 08 Feb 2021 17:26:36 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C83D81FB; Mon, 8 Feb 2021 09:26:32 -0800 (PST) Received: from arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A0C6A3F719; Mon, 8 Feb 2021 09:26:31 -0800 (PST) Date: Mon, 8 Feb 2021 17:26:10 +0000 From: Dave Martin To: Mark Brown Subject: Re: [PATCH v7 0/2] arm64/sve: Improve performance when handling SVE access traps Message-ID: <20210208172608.GF21837@arm.com> References: <20210201122901.11331-1-broonie@kernel.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20210201122901.11331-1-broonie@kernel.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210208_122635_576474_8ACA78CB X-CRM114-Status: GOOD ( 41.76 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Julien Grall , Julien Grall , Catalin Marinas , Zhang Lei , Will Deacon , linux-arm-kernel@lists.infradead.org, Daniel Kiss Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Mon, Feb 01, 2021 at 12:28:59PM +0000, Mark Brown wrote: > This patch series aims to improve the performance of handling SVE access > traps, earlier versions were originally written by Julien Gral but based > on discussions on previous versions the patches have been substantially > reworked to use a different approach. The patches are now different > enough that I set myself as the author, hopefully that's OK for Julien. > > I've marked this as RFC since it's not quite ready yet but I'd really > like feedback on the overall approach, it's a big change in > implementation. It needs at least one more pass for polish and while > it's holding up in my testing thus far I've not done as much as I'd like > yet. > > Per the syscall ABI, SVE registers will be unknown after a syscall. In > practice, the kernel will disable SVE and the registers will be zeroed > (except the first 128 bits of each vector) on the next SVE instruction. > Currently we do this by saving the FPSIMD state to memory, converting to > the matching SVE state and then reloading the registers on return to > userspace. This requires a lot of memory accesses that we shouldn't > need, improve this by reworking the SVE state tracking so we track if we > should trap on executing SVE instructions separately to if we need to > save the full register state. This allows us to avoid tracking the full > SVE state until we need to return to userspace and to convert directly > in registers in the common case where the FPSIMD state is still in > registers then. > > As with current mainline we disable SVE on every syscall. This may not > be ideal for applications that mix SVE and syscall usage, strategies > such as SH's fpu_counter may perform better but we need to assess the > performance on a wider range of systems than are currently available > before implementing anything. > > It is also possible to optimize the case when the SVE vector length > is 128-bit (ie the same size as the FPSIMD vectors). This could be > explored in the future, it becomes a lot easier to do with this > implementation. > > v7: > - A few minor cosmetic updates and one bugfix for > fpsimd_update_current_state(). I see the following delta in this function: @@ -1272,7 +1272,8 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state) get_cpu_fpsimd_context(); current->thread.uw.fpsimd_state = *state; - clear_thread_flag(TIF_SVE_FULL_REGS); + if (test_thread_flag(TIF_SVE_FULL_REGS)) + fpsimd_to_sve(current); task_fpsimd_load(); fpsimd_bind_task_to_cpu(); Can you elaborate on that? > v6: > - Substantially rework the patch so that TIF_SVE is now replaced by > two flags TIF_SVE_EXEC and TIF_SVE_FULL_REGS. > - Return to disabling SVE after every syscall as for current > mainine rather than leaving it enabled unless reset via ptrace. Some general thoughts before I get into the detail: I think the implementation flow in this series is roughly: a) Split TIF_SVE into two flags, but keep the flags in lockstep, and don't pretend to handle cases where they are unequal (no functional change). b) Add handling for cases where the flags are unequal (the only meaningful case being TIF_SVE_EXEC && !TIF_SVE_FULL_REGS) (purely dead code addition; no functional change). c) Add code that causes / resolves inequality of the flags in order to achieve the goal of the series (functional change). Does that sound about right? Patch 2 just does (c), broadly speaking, I couldn't convince myself whether patch 1 is introducing functional changes or not. The KVM parts of (b) could maybe benefit from their own commit message. If (b) were all in a separate patch by itself, it would be straightfoward to split the host and KVM changes up. Maybe splitting patch 1 into two would help clarify the situation. Second, TIF_SVE_EXEC is advertised as being a pure indication of whether the task is to be allowed to use SVE in userspace. I find that this jars a little with the fact that we leave TIF_SVE_EXEC set in the !FULL_REGS case -- where we cannot enter userspace without doing some additional work. This is not necessarily a problem, but it was one extra thing to get my head around. There seems to be a preexisting convention that a set thread flag implies a "dirtier" condition than when the flag is clear, though obviously it works either way. TIF_SVE_FULL_REGS here is really a work- not-needed flag (whereas TIF_FOREIGN_FPSTATE and various others are work-needed flags). This also is not necessarily a problem, but I find it a bit confusing. Inverting the sense of TIF_SVE_FULL_REGS, or a more explicit name such as TIF_SVE_FULL_REGS_VALID might make it clearer that there is work to do in the "not valid" case and we cannot blindly enter userspace or save the task state without taking any notice of the flag. These are not intended as criticisms -- it's hard to slice up the states we need into nice orthogonal concepts, and I'm still waiting for this refactored view to settle properly in my head. Ultimately, we want a decision point somewhere about whether to fall back to FPSIMD-only mode for a task or not, even if the decision is trivial for now. Where do you think that sits? I'd generally assumed that it would go in fpsimd_save() since that's where we take the hit for saving (and later restoring) the state. That function isn't preemptible though, so it might also make sense to make the decision elsewhere if there are suitable places to do it. I say this because it is important for this series to make a foundation that such a decision can be slotted naturally into -- that's part of the motiviation of making these changes in the first place (IIUC, anyway). [...] Cheers ---Dave _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel