From: Eric Biggers <ebiggers@google.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jeffrey Walton <noloader@gmail.com>,
Greg Kaiser <gkaiser@google.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
Michael Halcrow <mhalcrow@google.com>,
Patrik Torstensson <totte@google.com>,
Stefan Agner <stefan@agner.ch>,
Paul Lawrence <paullawrence@google.com>,
linux-fscrypt@vger.kernel.org,
"open list:HARDWARE RANDOM NUMBER GENERATOR CORE"
<linux-crypto@vger.kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alex Cope <alexcope@google.com>,
linux-crypto-owner@vger.kernel.org,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
Paul Crowley <paulcrowley@google.com>
Subject: Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Date: Mon, 18 Jun 2018 14:56:57 -0700 [thread overview]
Message-ID: <20180618215657.GB8022@google.com> (raw)
In-Reply-To: <CAKv+Gu8OnTY7ezbkU5-EzVHp3vvPr_okkOoWfOjkdxhSqSqhBg@mail.gmail.com>
On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> + // One-time XTS preparation
> >>>>> +
> >>>>> + /*
> >>>>> + * Allocate stack space to store 128 bytes worth of tweaks. For
> >>>>> + * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> + * can use the load/store instructions that declare 16-byte alignment.
> >>>>> + */
> >>>>> + sub sp, #128
> >>>>> + bic sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>> AS arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> - sub sp, #128
> >>>> - bic sp, #0xf
> >>>> + mov r6, sp
> >>>> + sub r6, #128
> >>>> + bic r6, #0xf
> >>>> + mov sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
>
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
>
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
>
> Just my 2 cents, anyway.
I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:
sub r12, sp, #128
bic r12, #0xf
mov sp, r12
Ard, is the following what you're suggesting instead?
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
*/
#include <linux/linkage.h>
+#include <asm/assembler.h>
.text
.fpu neon
@@ -233,6 +234,12 @@
* nonzero multiple of 128.
*/
.macro _speck_xts_crypt n, decrypting
+
+ .align 2
+ THUMB(bx pc)
+ THUMB(nop)
+ THUMB(.arm)
+
push {r4-r7}
mov r7, sp
@@ -413,6 +420,8 @@
mov sp, r7
pop {r4-r7}
bx lr
+
+ THUMB(.thumb)
.endm
ENTRY(speck128_xts_encrypt_neon)
WARNING: multiple messages have this Message-ID (diff)
From: Eric Biggers <ebiggers@google.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Stefan Agner <stefan@agner.ch>,
"open list:HARDWARE RANDOM NUMBER GENERATOR CORE"
<linux-crypto@vger.kernel.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
linux-fscrypt@vger.kernel.org,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
Jeffrey Walton <noloader@gmail.com>,
Paul Crowley <paulcrowley@google.com>,
Patrik Torstensson <totte@google.com>,
Greg Kaiser <gkaiser@google.com>,
Paul Lawrence <paullawrence@google.com>,
Michael Halcrow <mhalcrow@google.com>,
Alex Cope <alexcope@google.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
linux-crypto-owner@vger.kernel.org
Subject: Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Date: Mon, 18 Jun 2018 14:56:57 -0700 [thread overview]
Message-ID: <20180618215657.GB8022@google.com> (raw)
In-Reply-To: <CAKv+Gu8OnTY7ezbkU5-EzVHp3vvPr_okkOoWfOjkdxhSqSqhBg@mail.gmail.com>
On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> + // One-time XTS preparation
> >>>>> +
> >>>>> + /*
> >>>>> + * Allocate stack space to store 128 bytes worth of tweaks. For
> >>>>> + * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> + * can use the load/store instructions that declare 16-byte alignment.
> >>>>> + */
> >>>>> + sub sp, #128
> >>>>> + bic sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>> AS arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> - sub sp, #128
> >>>> - bic sp, #0xf
> >>>> + mov r6, sp
> >>>> + sub r6, #128
> >>>> + bic r6, #0xf
> >>>> + mov sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
>
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
>
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
>
> Just my 2 cents, anyway.
I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:
sub r12, sp, #128
bic r12, #0xf
mov sp, r12
Ard, is the following what you're suggesting instead?
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
*/
#include <linux/linkage.h>
+#include <asm/assembler.h>
.text
.fpu neon
@@ -233,6 +234,12 @@
* nonzero multiple of 128.
*/
.macro _speck_xts_crypt n, decrypting
+
+ .align 2
+ THUMB(bx pc)
+ THUMB(nop)
+ THUMB(.arm)
+
push {r4-r7}
mov r7, sp
@@ -413,6 +420,8 @@
mov sp, r7
pop {r4-r7}
bx lr
+
+ THUMB(.thumb)
.endm
ENTRY(speck128_xts_encrypt_neon)
WARNING: multiple messages have this Message-ID (diff)
From: ebiggers@google.com (Eric Biggers)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Date: Mon, 18 Jun 2018 14:56:57 -0700 [thread overview]
Message-ID: <20180618215657.GB8022@google.com> (raw)
In-Reply-To: <CAKv+Gu8OnTY7ezbkU5-EzVHp3vvPr_okkOoWfOjkdxhSqSqhBg@mail.gmail.com>
On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> + // One-time XTS preparation
> >>>>> +
> >>>>> + /*
> >>>>> + * Allocate stack space to store 128 bytes worth of tweaks. For
> >>>>> + * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> + * can use the load/store instructions that declare 16-byte alignment.
> >>>>> + */
> >>>>> + sub sp, #128
> >>>>> + bic sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>> AS arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> - sub sp, #128
> >>>> - bic sp, #0xf
> >>>> + mov r6, sp
> >>>> + sub r6, #128
> >>>> + bic r6, #0xf
> >>>> + mov sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
>
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
>
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
>
> Just my 2 cents, anyway.
I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:
sub r12, sp, #128
bic r12, #0xf
mov sp, r12
Ard, is the following what you're suggesting instead?
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
*/
#include <linux/linkage.h>
+#include <asm/assembler.h>
.text
.fpu neon
@@ -233,6 +234,12 @@
* nonzero multiple of 128.
*/
.macro _speck_xts_crypt n, decrypting
+
+ .align 2
+ THUMB(bx pc)
+ THUMB(nop)
+ THUMB(.arm)
+
push {r4-r7}
mov r7, sp
@@ -413,6 +420,8 @@
mov sp, r7
pop {r4-r7}
bx lr
+
+ THUMB(.thumb)
.endm
ENTRY(speck128_xts_encrypt_neon)
next prev parent reply other threads:[~2018-06-18 21:56 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-14 18:42 [PATCH v3 0/5] crypto: Speck support Eric Biggers
2018-02-14 18:42 ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 1/5] crypto: add support for the Speck block cipher Eric Biggers
2018-02-14 18:42 ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 2/5] crypto: speck - export common helpers Eric Biggers
2018-02-14 18:42 ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS Eric Biggers
2018-02-14 18:42 ` Eric Biggers
2018-06-16 22:40 ` Stefan Agner
2018-06-16 22:40 ` Stefan Agner
2018-06-16 22:40 ` Stefan Agner
2018-06-17 9:30 ` Ard Biesheuvel
2018-06-17 9:30 ` Ard Biesheuvel
2018-06-17 9:30 ` Ard Biesheuvel
2018-06-17 9:40 ` Ard Biesheuvel
2018-06-17 9:40 ` Ard Biesheuvel
2018-06-17 9:40 ` Ard Biesheuvel
2018-06-17 10:41 ` Stefan Agner
2018-06-17 10:41 ` Stefan Agner
2018-06-17 10:41 ` Stefan Agner
2018-06-17 11:10 ` Ard Biesheuvel
2018-06-17 11:10 ` Ard Biesheuvel
2018-06-17 11:10 ` Ard Biesheuvel
2018-06-18 21:56 ` Eric Biggers [this message]
2018-06-18 21:56 ` Eric Biggers
2018-06-18 21:56 ` Eric Biggers
2018-06-18 22:04 ` Ard Biesheuvel
2018-06-18 22:04 ` Ard Biesheuvel
2018-06-18 22:04 ` Ard Biesheuvel
2018-02-14 18:42 ` [PATCH v3 4/5] crypto: speck - add test vectors for Speck128-XTS Eric Biggers
2018-02-14 18:42 ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 5/5] crypto: speck - add test vectors for Speck64-XTS Eric Biggers
2018-02-14 18:42 ` Eric Biggers
2018-02-22 15:13 ` [PATCH v3 0/5] crypto: Speck support Herbert Xu
2018-02-22 15:13 ` Herbert Xu
2018-02-22 15:13 ` Herbert Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180618215657.GB8022@google.com \
--to=ebiggers@google.com \
--cc=alexcope@google.com \
--cc=ard.biesheuvel@linaro.org \
--cc=gkaiser@google.com \
--cc=gregkh@linuxfoundation.org \
--cc=herbert@gondor.apana.org.au \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-crypto-owner@vger.kernel.org \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-fscrypt@vger.kernel.org \
--cc=mhalcrow@google.com \
--cc=noloader@gmail.com \
--cc=paulcrowley@google.com \
--cc=paullawrence@google.com \
--cc=stefan@agner.ch \
--cc=totte@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.