From: Dave Martin <Dave.Martin@arm.com>
To: Mark Brown <broonie@kernel.org>
Cc: Alexandre Torgue <alexandre.torgue@st.com>,
Catalin Marinas <catalin.marinas@arm.com>,
"David S. Miller" <davem@davemloft.net>,
l00374334 <liqiang64@huawei.com>,
Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
Maxime Coquelin <mcoquelin.stm32@gmail.com>,
Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: [PATCH 1/1] arm64: Accelerate Adler32 using arm64 SVE instructions.
Date: Wed, 4 Nov 2020 18:13:06 +0000 [thread overview]
Message-ID: <20201104181256.GG6882@arm.com> (raw)
In-Reply-To: <20201104175032.GA15020@sirena.org.uk>
On Wed, Nov 04, 2020 at 05:50:33PM +0000, Mark Brown wrote:
> On Tue, Nov 03, 2020 at 06:00:32PM +0000, Dave Martin wrote:
> > On Tue, Nov 03, 2020 at 03:34:27PM +0100, Ard Biesheuvel wrote:
>
> > > First of all, I don't think it is safe at the moment to use SVE in the
> > > kernel, as we don't preserve all state IIRC. My memory is a bit hazy,
>
> > I'm not convinced that it's safe right now. SVE in the kernel is
> > unsupported, partly due to cost and partly due to the lack of a
> > compelling use case.
>
> I think at a minimum we'd want to handle the vector length explicitly
> for kernel mode SVE, vector length independent code will work most of
> the time but at the very least it feels like a landmine waiting to cause
> trouble. If nothing else there's probably going to be cases where it
> makes a difference for performance. Other than that I'm not currently
> seeing any issues since we're handling SVE in the same paths we handle
> the rest of the FPSIMD stuff.
Having a random vector length could be good for testing ;)
I was tempted to add that as a deliberate feature, but that sort of
nothing doesn't really belong in the kernel...
Anyway:
The main reasons for constraining the vector length are a) to hide
mismatches between CPUs in heterogeneous systems, b) to ensure that
validated software doesn't run with a vector length it wasn't validated
for, and c) testing.
For kernel code, it's reasonable to say that all code should be vector-
length agnostic unless there's a really good reason not to be. So we
may not care too much about (b).
In that case, just setting ZCR_EL1.LEN to max in kernel_sve_begin() (or
whatever) probably makes sense.
For (c), it might be useful to have a command-line parameter or debugfs
widget to constrain the vector length for kernel code; perhaps globally
or perhaps per driver or algo.
Otherwise, I agree that using SVE in the kernel _should_ probably work
safely, using the same basic mechanism as kernel_mode_neon(). Also,
it shouldn't have higher overheads than kernel-mode-NEON now.
>
> > I think it would be preferable to see this algo accelerated for NEON
> > first, since all AArch64 hardware can benefit from that.
>
> ...
>
> > much of a problem. kernel_neon_begin() may incur a save of the full SVE
> > state anyway, so in some ways it would be a good thing if we could
> > actually make use of all those registers.
>
> > SVE hardware remains rare, so as a general policy I don't think we
> > should accept SVE implementations of any algorithm that does not
> > already have a NEON implementation -- unless the contributor can
> > explain why nobody with non-SVE hardware is going to care about the
> > performance of that algo.
>
> I tend to agree here, my concerns are around the cost of maintaining a
> SVE implementation relative to the number of users who'd benefit from it
> rather than around the basic idea of using SVE at all. If we were
> seeing substantial performance benefits over an implementation using
> NEON, or had some other strong push to use SVE like a really solid
> understanding of why SVE is a good fit for the algorithm but NEON isn't,
> then it'd be worth finishing up the infrastructure. The infrastructure
> itself doesn't seem fundamentally problematic.
Agreed
Nonetheless, working up a candidate algorithm to help us see whether
there is a good use case seems like a worthwhile project, so I don't
want to discourage that too much.
Cheers
---Dave
WARNING: multiple messages have this Message-ID (diff)
From: Dave Martin <Dave.Martin@arm.com>
To: Mark Brown <broonie@kernel.org>
Cc: Alexandre Torgue <alexandre.torgue@st.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Ard Biesheuvel <ardb@kernel.org>,
l00374334 <liqiang64@huawei.com>,
Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
Maxime Coquelin <mcoquelin.stm32@gmail.com>,
Will Deacon <will@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: [PATCH 1/1] arm64: Accelerate Adler32 using arm64 SVE instructions.
Date: Wed, 4 Nov 2020 18:13:06 +0000 [thread overview]
Message-ID: <20201104181256.GG6882@arm.com> (raw)
In-Reply-To: <20201104175032.GA15020@sirena.org.uk>
On Wed, Nov 04, 2020 at 05:50:33PM +0000, Mark Brown wrote:
> On Tue, Nov 03, 2020 at 06:00:32PM +0000, Dave Martin wrote:
> > On Tue, Nov 03, 2020 at 03:34:27PM +0100, Ard Biesheuvel wrote:
>
> > > First of all, I don't think it is safe at the moment to use SVE in the
> > > kernel, as we don't preserve all state IIRC. My memory is a bit hazy,
>
> > I'm not convinced that it's safe right now. SVE in the kernel is
> > unsupported, partly due to cost and partly due to the lack of a
> > compelling use case.
>
> I think at a minimum we'd want to handle the vector length explicitly
> for kernel mode SVE, vector length independent code will work most of
> the time but at the very least it feels like a landmine waiting to cause
> trouble. If nothing else there's probably going to be cases where it
> makes a difference for performance. Other than that I'm not currently
> seeing any issues since we're handling SVE in the same paths we handle
> the rest of the FPSIMD stuff.
Having a random vector length could be good for testing ;)
I was tempted to add that as a deliberate feature, but that sort of
nothing doesn't really belong in the kernel...
Anyway:
The main reasons for constraining the vector length are a) to hide
mismatches between CPUs in heterogeneous systems, b) to ensure that
validated software doesn't run with a vector length it wasn't validated
for, and c) testing.
For kernel code, it's reasonable to say that all code should be vector-
length agnostic unless there's a really good reason not to be. So we
may not care too much about (b).
In that case, just setting ZCR_EL1.LEN to max in kernel_sve_begin() (or
whatever) probably makes sense.
For (c), it might be useful to have a command-line parameter or debugfs
widget to constrain the vector length for kernel code; perhaps globally
or perhaps per driver or algo.
Otherwise, I agree that using SVE in the kernel _should_ probably work
safely, using the same basic mechanism as kernel_mode_neon(). Also,
it shouldn't have higher overheads than kernel-mode-NEON now.
>
> > I think it would be preferable to see this algo accelerated for NEON
> > first, since all AArch64 hardware can benefit from that.
>
> ...
>
> > much of a problem. kernel_neon_begin() may incur a save of the full SVE
> > state anyway, so in some ways it would be a good thing if we could
> > actually make use of all those registers.
>
> > SVE hardware remains rare, so as a general policy I don't think we
> > should accept SVE implementations of any algorithm that does not
> > already have a NEON implementation -- unless the contributor can
> > explain why nobody with non-SVE hardware is going to care about the
> > performance of that algo.
>
> I tend to agree here, my concerns are around the cost of maintaining a
> SVE implementation relative to the number of users who'd benefit from it
> rather than around the basic idea of using SVE at all. If we were
> seeing substantial performance benefits over an implementation using
> NEON, or had some other strong push to use SVE like a really solid
> understanding of why SVE is a good fit for the algorithm but NEON isn't,
> then it'd be worth finishing up the infrastructure. The infrastructure
> itself doesn't seem fundamentally problematic.
Agreed
Nonetheless, working up a candidate algorithm to help us see whether
there is a good use case seems like a worthwhile project, so I don't
want to discourage that too much.
Cheers
---Dave
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-11-04 18:13 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-03 12:15 [PATCH 0/1] arm64: Accelerate Adler32 using arm64 SVE instructions l00374334
2020-11-03 12:15 ` l00374334
2020-11-03 12:15 ` [PATCH 1/1] " l00374334
2020-11-03 12:15 ` l00374334
2020-11-03 14:34 ` Ard Biesheuvel
2020-11-03 14:34 ` Ard Biesheuvel
2020-11-03 18:00 ` Dave Martin
2020-11-03 18:00 ` Dave Martin
2020-11-04 9:19 ` Li Qiang
2020-11-04 9:19 ` Li Qiang
2020-11-04 14:49 ` Dave Martin
2020-11-04 14:49 ` Dave Martin
2020-11-05 2:32 ` Li Qiang
2020-11-05 2:32 ` Li Qiang
2020-11-04 17:50 ` Mark Brown
2020-11-04 17:50 ` Mark Brown
2020-11-04 18:13 ` Dave Martin [this message]
2020-11-04 18:13 ` Dave Martin
2020-11-04 18:49 ` Mark Brown
2020-11-04 18:49 ` Mark Brown
2020-11-05 17:56 ` Dave Martin
2020-11-05 17:56 ` Dave Martin
2020-11-04 8:01 ` Li Qiang
2020-11-04 8:01 ` Li Qiang
2020-11-04 8:04 ` Ard Biesheuvel
2020-11-04 8:04 ` Ard Biesheuvel
2020-11-04 8:14 ` Li Qiang
2020-11-04 8:14 ` Li Qiang
2020-11-04 17:57 ` Eric Biggers
2020-11-04 17:57 ` Eric Biggers
2020-11-05 2:49 ` Li Qiang
2020-11-05 2:49 ` Li Qiang
2020-11-05 7:51 ` Ard Biesheuvel
2020-11-05 7:51 ` Ard Biesheuvel
2020-11-05 9:05 ` Li Qiang
2020-11-05 9:05 ` Li Qiang
2020-11-05 18:21 ` Eric Biggers
2020-11-05 18:21 ` Eric Biggers
2020-11-09 6:29 ` Li Qiang
2020-11-09 6:29 ` Li Qiang
2020-11-05 16:53 ` [PATCH 0/1] " Dave Martin
2020-11-05 16:53 ` Dave Martin
2020-11-09 3:43 ` Li Qiang
2020-11-09 3:43 ` Li Qiang
2020-11-10 10:46 ` Dave Martin
2020-11-10 10:46 ` Dave Martin
2020-11-10 13:20 ` Li Qiang
2020-11-10 13:20 ` Li Qiang
2020-11-10 16:07 ` Dave Martin
2020-11-10 16:07 ` Dave Martin
2020-11-12 7:20 ` Li Qiang
2020-11-12 7:20 ` Li Qiang
2020-11-12 11:17 ` Dave Martin
2020-11-12 11:17 ` Dave Martin
2020-11-14 7:31 ` Li Qiang
2020-11-14 7:31 ` Li Qiang
2020-11-16 15:56 ` Dave Martin
2020-11-16 15:56 ` Dave Martin
2020-11-17 12:45 ` Li Qiang
2020-11-17 12:45 ` Li Qiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201104181256.GG6882@arm.com \
--to=dave.martin@arm.com \
--cc=alexandre.torgue@st.com \
--cc=ardb@kernel.org \
--cc=broonie@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=davem@davemloft.net \
--cc=herbert@gondor.apana.org.au \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-crypto@vger.kernel.org \
--cc=liqiang64@huawei.com \
--cc=mcoquelin.stm32@gmail.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.