All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@google.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jeffrey Walton <noloader@gmail.com>,
	Greg Kaiser <gkaiser@google.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Michael Halcrow <mhalcrow@google.com>,
	Patrik Torstensson <totte@google.com>,
	Stefan Agner <stefan@agner.ch>,
	Paul Lawrence <paullawrence@google.com>,
	linux-fscrypt@vger.kernel.org,
	"open list:HARDWARE RANDOM NUMBER GENERATOR CORE"
	<linux-crypto@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alex Cope <alexcope@google.com>,
	linux-crypto-owner@vger.kernel.org,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Paul Crowley <paulcrowley@google.com>
Subject: Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Date: Mon, 18 Jun 2018 14:56:57 -0700	[thread overview]
Message-ID: <20180618215657.GB8022@google.com> (raw)
In-Reply-To: <CAKv+Gu8OnTY7ezbkU5-EzVHp3vvPr_okkOoWfOjkdxhSqSqhBg@mail.gmail.com>

On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> +     // One-time XTS preparation
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Allocate stack space to store 128 bytes worth of tweaks.  For
> >>>>> +      * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> +      * can use the load/store instructions that declare 16-byte alignment.
> >>>>> +      */
> >>>>> +     sub             sp, #128
> >>>>> +     bic             sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>>   AS      arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> -       sub             sp, #128
> >>>> -       bic             sp, #0xf
> >>>> +       mov             r6, sp
> >>>> +       sub             r6, #128
> >>>> +       bic             r6, #0xf
> >>>> +       mov             sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
> 
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
> 
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
> 
> Just my 2 cents, anyway.

I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:

        sub             r12, sp, #128
        bic             r12, #0xf
        mov             sp, r12

Ard, is the following what you're suggesting instead?

diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
  */
 
 #include <linux/linkage.h>
+#include <asm/assembler.h>
 
 	.text
 	.fpu		neon
@@ -233,6 +234,12 @@
  * nonzero multiple of 128.
  */
 .macro _speck_xts_crypt	n, decrypting
+
+	.align		2
+	THUMB(bx pc)
+	THUMB(nop)
+	THUMB(.arm)
+
 	push		{r4-r7}
 	mov		r7, sp
 
@@ -413,6 +420,8 @@
 	mov		sp, r7
 	pop		{r4-r7}
 	bx		lr
+
+	THUMB(.thumb)
 .endm
 
 ENTRY(speck128_xts_encrypt_neon)

WARNING: multiple messages have this Message-ID (diff)
From: Eric Biggers <ebiggers@google.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Stefan Agner <stefan@agner.ch>,
	"open list:HARDWARE RANDOM NUMBER GENERATOR CORE"
	<linux-crypto@vger.kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	linux-fscrypt@vger.kernel.org,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Jeffrey Walton <noloader@gmail.com>,
	Paul Crowley <paulcrowley@google.com>,
	Patrik Torstensson <totte@google.com>,
	Greg Kaiser <gkaiser@google.com>,
	Paul Lawrence <paullawrence@google.com>,
	Michael Halcrow <mhalcrow@google.com>,
	Alex Cope <alexcope@google.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-crypto-owner@vger.kernel.org
Subject: Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Date: Mon, 18 Jun 2018 14:56:57 -0700	[thread overview]
Message-ID: <20180618215657.GB8022@google.com> (raw)
In-Reply-To: <CAKv+Gu8OnTY7ezbkU5-EzVHp3vvPr_okkOoWfOjkdxhSqSqhBg@mail.gmail.com>

On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> +     // One-time XTS preparation
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Allocate stack space to store 128 bytes worth of tweaks.  For
> >>>>> +      * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> +      * can use the load/store instructions that declare 16-byte alignment.
> >>>>> +      */
> >>>>> +     sub             sp, #128
> >>>>> +     bic             sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>>   AS      arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> -       sub             sp, #128
> >>>> -       bic             sp, #0xf
> >>>> +       mov             r6, sp
> >>>> +       sub             r6, #128
> >>>> +       bic             r6, #0xf
> >>>> +       mov             sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
> 
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
> 
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
> 
> Just my 2 cents, anyway.

I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:

        sub             r12, sp, #128
        bic             r12, #0xf
        mov             sp, r12

Ard, is the following what you're suggesting instead?

diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
  */
 
 #include <linux/linkage.h>
+#include <asm/assembler.h>
 
 	.text
 	.fpu		neon
@@ -233,6 +234,12 @@
  * nonzero multiple of 128.
  */
 .macro _speck_xts_crypt	n, decrypting
+
+	.align		2
+	THUMB(bx pc)
+	THUMB(nop)
+	THUMB(.arm)
+
 	push		{r4-r7}
 	mov		r7, sp
 
@@ -413,6 +420,8 @@
 	mov		sp, r7
 	pop		{r4-r7}
 	bx		lr
+
+	THUMB(.thumb)
 .endm
 
 ENTRY(speck128_xts_encrypt_neon)

WARNING: multiple messages have this Message-ID (diff)
From: ebiggers@google.com (Eric Biggers)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Date: Mon, 18 Jun 2018 14:56:57 -0700	[thread overview]
Message-ID: <20180618215657.GB8022@google.com> (raw)
In-Reply-To: <CAKv+Gu8OnTY7ezbkU5-EzVHp3vvPr_okkOoWfOjkdxhSqSqhBg@mail.gmail.com>

On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> +     // One-time XTS preparation
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Allocate stack space to store 128 bytes worth of tweaks.  For
> >>>>> +      * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> +      * can use the load/store instructions that declare 16-byte alignment.
> >>>>> +      */
> >>>>> +     sub             sp, #128
> >>>>> +     bic             sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>>   AS      arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> -       sub             sp, #128
> >>>> -       bic             sp, #0xf
> >>>> +       mov             r6, sp
> >>>> +       sub             r6, #128
> >>>> +       bic             r6, #0xf
> >>>> +       mov             sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
> 
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
> 
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
> 
> Just my 2 cents, anyway.

I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:

        sub             r12, sp, #128
        bic             r12, #0xf
        mov             sp, r12

Ard, is the following what you're suggesting instead?

diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
  */
 
 #include <linux/linkage.h>
+#include <asm/assembler.h>
 
 	.text
 	.fpu		neon
@@ -233,6 +234,12 @@
  * nonzero multiple of 128.
  */
 .macro _speck_xts_crypt	n, decrypting
+
+	.align		2
+	THUMB(bx pc)
+	THUMB(nop)
+	THUMB(.arm)
+
 	push		{r4-r7}
 	mov		r7, sp
 
@@ -413,6 +420,8 @@
 	mov		sp, r7
 	pop		{r4-r7}
 	bx		lr
+
+	THUMB(.thumb)
 .endm
 
 ENTRY(speck128_xts_encrypt_neon)

  reply	other threads:[~2018-06-18 21:56 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-14 18:42 [PATCH v3 0/5] crypto: Speck support Eric Biggers
2018-02-14 18:42 ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 1/5] crypto: add support for the Speck block cipher Eric Biggers
2018-02-14 18:42   ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 2/5] crypto: speck - export common helpers Eric Biggers
2018-02-14 18:42   ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS Eric Biggers
2018-02-14 18:42   ` Eric Biggers
2018-06-16 22:40   ` Stefan Agner
2018-06-16 22:40     ` Stefan Agner
2018-06-16 22:40     ` Stefan Agner
2018-06-17  9:30     ` Ard Biesheuvel
2018-06-17  9:30       ` Ard Biesheuvel
2018-06-17  9:30       ` Ard Biesheuvel
2018-06-17  9:40       ` Ard Biesheuvel
2018-06-17  9:40         ` Ard Biesheuvel
2018-06-17  9:40         ` Ard Biesheuvel
2018-06-17 10:41         ` Stefan Agner
2018-06-17 10:41           ` Stefan Agner
2018-06-17 10:41           ` Stefan Agner
2018-06-17 11:10           ` Ard Biesheuvel
2018-06-17 11:10             ` Ard Biesheuvel
2018-06-17 11:10             ` Ard Biesheuvel
2018-06-18 21:56             ` Eric Biggers [this message]
2018-06-18 21:56               ` Eric Biggers
2018-06-18 21:56               ` Eric Biggers
2018-06-18 22:04               ` Ard Biesheuvel
2018-06-18 22:04                 ` Ard Biesheuvel
2018-06-18 22:04                 ` Ard Biesheuvel
2018-02-14 18:42 ` [PATCH v3 4/5] crypto: speck - add test vectors for Speck128-XTS Eric Biggers
2018-02-14 18:42   ` Eric Biggers
2018-02-14 18:42 ` [PATCH v3 5/5] crypto: speck - add test vectors for Speck64-XTS Eric Biggers
2018-02-14 18:42   ` Eric Biggers
2018-02-22 15:13 ` [PATCH v3 0/5] crypto: Speck support Herbert Xu
2018-02-22 15:13   ` Herbert Xu
2018-02-22 15:13   ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180618215657.GB8022@google.com \
    --to=ebiggers@google.com \
    --cc=alexcope@google.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=gkaiser@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-crypto-owner@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-fscrypt@vger.kernel.org \
    --cc=mhalcrow@google.com \
    --cc=noloader@gmail.com \
    --cc=paulcrowley@google.com \
    --cc=paullawrence@google.com \
    --cc=stefan@agner.ch \
    --cc=totte@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.