From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26479C433E0 for ; Wed, 20 Jan 2021 15:46:20 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C09E72336D for ; Wed, 20 Jan 2021 15:46:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C09E72336D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=dN6/W5ecdR3Iy6DoFDIQKjdw3pXu5RwyucQ/eD713uA=; b=PK2NRJPaGRqdZbAl9pG1kx9U4 k536kqKrXV9n7oETjKIVhgHO2z5AlMTXwGQbZalJtZ9OLaSNaKdK0hlWDa5/XCB2JaDUav7h1umk0 foqUhw2HT+xQgzoywmAGfuyF6OW84JPWFfcgEeZsRxLtwZsZFMaZ3u0odcpEKYec6qGtVAdjz0j8c TBPV6KzArCdoMfyqAN+yx00R8TOo2HqtjyNWH5nHW9k5nWOfHA4SNojvm+d1MDSqDtP+i/bjQsFP5 vgFvF4kdvaRKYTLlkkL8i1MEIlBPJV6kcnw1+2Cb9PCmWI14CE8747pjlqzePlhTNoYN/5EX5XZGl eWw1W9n2w==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1l2Ff5-0008K2-Q1; Wed, 20 Jan 2021 15:44:47 +0000 Received: from foss.arm.com ([217.140.110.172]) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1l2Ff2-0008IR-Iu for linux-arm-kernel@lists.infradead.org; Wed, 20 Jan 2021 15:44:45 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2976631B; Wed, 20 Jan 2021 07:44:41 -0800 (PST) Received: from arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7882E3F68F; Wed, 20 Jan 2021 07:44:39 -0800 (PST) Date: Wed, 20 Jan 2021 15:44:23 +0000 From: Dave Martin To: Ard Biesheuvel Subject: Re: [RFC PATCH 4/5] arm64: fpsimd: run kernel mode NEON with softirqs disabled Message-ID: <20210120154422.GB1684@arm.com> References: <20201218170106.23280-1-ardb@kernel.org> <20201218170106.23280-5-ardb@kernel.org> <20210119160045.GA1684@arm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210120_104444_711691_3FBC4C40 X-CRM114-Status: GOOD ( 48.74 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Herbert Xu , Peter Zijlstra , Catalin Marinas , Sebastian Andrzej Siewior , Linux Kernel Mailing List , Eric Biggers , Mark Brown , Linux Crypto Mailing List , Thomas Gleixner , Will Deacon , Ingo Molnar , Linux ARM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Jan 19, 2021 at 05:29:05PM +0100, Ard Biesheuvel wrote: > On Tue, 19 Jan 2021 at 17:01, Dave Martin wrote: > > > > On Fri, Dec 18, 2020 at 06:01:05PM +0100, Ard Biesheuvel wrote: > > > Kernel mode NEON can be used in task or softirq context, but only in > > > a non-nesting manner, i.e., softirq context is only permitted if the > > > interrupt was not taken at a point where the kernel was using the NEON > > > in task context. > > > > > > This means all users of kernel mode NEON have to be aware of this > > > limitation, and either need to provide scalar fallbacks that may be much > > > slower (up to 20x for AES instructions) and potentially less safe, or > > > use an asynchronous interface that defers processing to a later time > > > when the NEON is guaranteed to be available. > > > > > > Given that grabbing and releasing the NEON is cheap, we can relax this > > > restriction, by increasing the granularity of kernel mode NEON code, and > > > always disabling softirq processing while the NEON is being used in task > > > context. > > > > > > Signed-off-by: Ard Biesheuvel > > > > Sorry for the slow reply on this... it looks reasonable, but I have a > > few comments below. > > > > No worries - thanks for taking a look. > > > > --- > > > arch/arm64/include/asm/assembler.h | 19 +++++++++++++------ > > > arch/arm64/kernel/asm-offsets.c | 2 ++ > > > arch/arm64/kernel/fpsimd.c | 4 ++-- > > > 3 files changed, 17 insertions(+), 8 deletions(-) > > > > > > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h > > > index ddbe6bf00e33..74ce46ed55ac 100644 > > > --- a/arch/arm64/include/asm/assembler.h > > > +++ b/arch/arm64/include/asm/assembler.h > > > @@ -15,6 +15,7 @@ > > > #include > > > > > > #include > > > +#include > > > #include > > > #include > > > #include > > > @@ -717,17 +718,23 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU > > > .endm > > > > > > .macro if_will_cond_yield_neon > > > -#ifdef CONFIG_PREEMPTION > > > get_current_task x0 > > > ldr x0, [x0, #TSK_TI_PREEMPT] > > > - sub x0, x0, #PREEMPT_DISABLE_OFFSET > > > - cbz x0, .Lyield_\@ > > > +#ifdef CONFIG_PREEMPTION > > > + cmp x0, #PREEMPT_DISABLE_OFFSET > > > + beq .Lyield_\@ // yield on need_resched in task context > > > +#endif > > > + /* never yield while serving a softirq */ > > > + tbnz x0, #SOFTIRQ_SHIFT, .Lnoyield_\@ > > > > Can you explain the rationale here? > > > > Using if_will_cond_yield_neon suggests the algo thinks it may run for > > too long the stall preemption until completion, but we happily stall > > preemption _and_ softirqs here. > > > > Is it actually a bug to use the NEON conditional yield helpers in > > softirq context? > > > > No, it is not. But calling kernel_neon_end() from softirq context will > not cause it to finish any faster, so there is really no point in > doing so. > > > Ideally, if processing in softirq context takes an unreasonable about of > > time, the work would be handed off to an asynchronous worker, but that > > does seem to conflict rather with the purpose of this series... > > > > Agreed, but this is not something we can police at this level. If the > caller does an unreasonable amount of work from a softirq, no amount > of yielding is going to make a difference. Ack, just wanted to make sure I wasn't missing something. Anyone writing softirq code can starve preemption, so I agree that we should trust people to know what they're doing. > > > + > > > + adr_l x0, irq_stat + IRQ_CPUSTAT_SOFTIRQ_PENDING > > > + this_cpu_offset x1 > > > + ldr w0, [x0, x1] > > > + cbnz w0, .Lyield_\@ // yield on pending softirq in task context > > > +.Lnoyield_\@: > > > /* fall through to endif_yield_neon */ > > > .subsection 1 > > > .Lyield_\@ : > > > -#else > > > - .section ".discard.cond_yield_neon", "ax" > > > -#endif > > > .endm > > > > > > .macro do_cond_yield_neon > > > diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c > > > index 7d32fc959b1a..34ef70877de4 100644 > > > --- a/arch/arm64/kernel/asm-offsets.c > > > +++ b/arch/arm64/kernel/asm-offsets.c > > > @@ -93,6 +93,8 @@ int main(void) > > > DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE); > > > BLANK(); > > > DEFINE(PREEMPT_DISABLE_OFFSET, PREEMPT_DISABLE_OFFSET); > > > + DEFINE(SOFTIRQ_SHIFT, SOFTIRQ_SHIFT); > > > + DEFINE(IRQ_CPUSTAT_SOFTIRQ_PENDING, offsetof(irq_cpustat_t, __softirq_pending)); > > > BLANK(); > > > DEFINE(CPU_BOOT_STACK, offsetof(struct secondary_data, stack)); > > > DEFINE(CPU_BOOT_TASK, offsetof(struct secondary_data, task)); > > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > > > index 062b21f30f94..823e3a8a8871 100644 > > > --- a/arch/arm64/kernel/fpsimd.c > > > +++ b/arch/arm64/kernel/fpsimd.c > > > @@ -180,7 +180,7 @@ static void __get_cpu_fpsimd_context(void) > > > */ > > > static void get_cpu_fpsimd_context(void) > > > { > > > - preempt_disable(); > > > + local_bh_disable(); > > > __get_cpu_fpsimd_context(); > > > } > > > > > > @@ -201,7 +201,7 @@ static void __put_cpu_fpsimd_context(void) > > > static void put_cpu_fpsimd_context(void) > > > { > > > __put_cpu_fpsimd_context(); > > > - preempt_enable(); > > > + local_bh_enable(); > > > } > > > > > > static bool have_cpu_fpsimd_context(void) > > > > I was concerned about catching all the relevant preempt_disable()s, but > > it had slipped my memory that Julien had factored these into one place. > > > > I can't see off the top of my head any reason why this shouldn't work. > > > > Thanks. > > > > > In threory, switching to local_bh_enable() here will add a check for > > pending softirqs onto context handling fast paths. I haven't dug into > > how that works, so perhaps this is trivial on top of the preemption > > check in preempt_enable(). Do you see any difference in hackbench or > > similar benchmarks? > > > > I haven't tried, tbh. But by context handling fast paths, you mean > managing the FP/SIMD state at context switch time, right? Checking for > pending softirqs amounts to a single per-CPU load plus compare, so > that should be negligible AFAICT. Obviously, actually handling the Yes. I've tended to assume, rather than prove, that this kind of thing is negligible -- so I confess I had not attempted to measure these effects when writing the original code. > softirq may take additional time, but that penalty has to be taken > somewhere - I don't see how that would create extra work that we > wouldn't have to do otherwise. > > I'll do some experiments with hackbench once I get back to this series. That sounds fine. Probably you won't find a significant difference anyway. Cheers ---Dave _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel