Linux cryptographic layer development
 help / color / mirror / Atom feed
* Re: [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
From: Eric Biggers @ 2017-01-13  3:19 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan
In-Reply-To: <c32a28630157c619ac2a7c851be586e72f193c68.1484215956.git.omosnacek@gmail.com>

On Thu, Jan 12, 2017 at 01:59:57PM +0100, Ondrej Mosnacek wrote:
> This patch implements bulk request handling in the AES-NI crypto drivers.
> The major advantage of this is that with bulk requests, the kernel_fpu_*
> functions (which are usually quite slow) are now called only once for the whole
> request.
> 

Hi Ondrej,

To what extent does the performance benefit of this patchset result from just
the reduced numbers of calls to kernel_fpu_begin() and kernel_fpu_end()?

If it's most of the benefit, would it make any sense to optimize
kernel_fpu_begin() and kernel_fpu_end() instead?

And if there are other examples besides kernel_fpu_begin/kernel_fpu_end where
the bulk API would provide a significant performance boost, can you mention
them?

Interestingly, the arm64 equivalent to kernel_fpu_begin()
(kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have an
optimization where the SIMD registers aren't saved if they were already saved.
I wonder why something similar isn't done on x86.

Eric

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Andy Lutomirski @ 2017-01-13  3:23 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <20170113031107.mgitq54fmjnrvi6f@treble>

On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
>> >> <torvalds@linux-foundation.org> wrote:
>> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >> >>
>> >> >> Just to clarify, I think you're asking if, for versions of gcc which
>> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> >> >> functions to ensure their stacks are 16-byte aligned.
>> >> >>
>> >> >> It's certainly possible, but I don't see how that solves the problem.
>> >> >> The stack will still be misaligned by entry code.  Or am I missing
>> >> >> something?
>> >> >
>> >> > I think the argument is that we *could* try to align things, if we
>> >> > just had some tool that actually then verified that we aren't missing
>> >> > anything.
>> >> >
>> >> > I'm not entirely happy with checking the generated code, though,
>> >> > because as Ingo says, you have a 50:50 chance of just getting it right
>> >> > by mistake. So I'd much rather have some static tool that checks
>> >> > things at a code level (ie coccinelle or sparse).
>> >>
>> >> What I meant was checking the entry code to see if it aligns stack
>> >> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
>> >> alignment for real may actually be entirely a lost cause.  After all,
>> >> I think we have some inline functions that do asm volatile ("call
>> >> ..."), and I don't see any credible way of forcing alignment short of
>> >> generating an entirely new stack frame and aligning that.
>> >
>> > Actually we already found all such cases and fixed them by forcing a new
>> > stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
>>
>> What I mean is: what guarantees that the stack is properly aligned for
>> the subroutine call?  gcc promises to set up a stack frame, but does
>> it promise that rsp will be properly aligned to call a C function?
>
> Yes, I did an experiment and you're right.  I had naively assumed that
> all stack frames would be aligned.

Just to check: did you do your experiment with -mpreferred-stack-boundary=4?

--Andy

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Josh Poimboeuf @ 2017-01-13  4:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrUFCFn-rKnr+NG3SU7J78ree9siJC=Kz8f_Bk6eG2HyPA@mail.gmail.com>

On Thu, Jan 12, 2017 at 07:23:18PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> >> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> >> >> <torvalds@linux-foundation.org> wrote:
> >> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >> >>
> >> >> >> Just to clarify, I think you're asking if, for versions of gcc which
> >> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> >> >> >> functions to ensure their stacks are 16-byte aligned.
> >> >> >>
> >> >> >> It's certainly possible, but I don't see how that solves the problem.
> >> >> >> The stack will still be misaligned by entry code.  Or am I missing
> >> >> >> something?
> >> >> >
> >> >> > I think the argument is that we *could* try to align things, if we
> >> >> > just had some tool that actually then verified that we aren't missing
> >> >> > anything.
> >> >> >
> >> >> > I'm not entirely happy with checking the generated code, though,
> >> >> > because as Ingo says, you have a 50:50 chance of just getting it right
> >> >> > by mistake. So I'd much rather have some static tool that checks
> >> >> > things at a code level (ie coccinelle or sparse).
> >> >>
> >> >> What I meant was checking the entry code to see if it aligns stack
> >> >> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> >> >> alignment for real may actually be entirely a lost cause.  After all,
> >> >> I think we have some inline functions that do asm volatile ("call
> >> >> ..."), and I don't see any credible way of forcing alignment short of
> >> >> generating an entirely new stack frame and aligning that.
> >> >
> >> > Actually we already found all such cases and fixed them by forcing a new
> >> > stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
> >>
> >> What I mean is: what guarantees that the stack is properly aligned for
> >> the subroutine call?  gcc promises to set up a stack frame, but does
> >> it promise that rsp will be properly aligned to call a C function?
> >
> > Yes, I did an experiment and you're right.  I had naively assumed that
> > all stack frames would be aligned.
> 
> Just to check: did you do your experiment with -mpreferred-stack-boundary=4?

Yes, but it's too late for me to be doing hard stuff and I think my
first experiment was bogus.  I didn't use all the other kernel-specific
gcc options.

I tried again with all the kernel gcc options, except with
-mpreferred-stack-boundary=4 instead of 3, and actually came up with the
opposite conclusion.

I used the following code:

void otherfunc(void);

static inline void bar(long *f)
{
	asm volatile("call otherfunc" : : "m" (f) : );
}

void foo(void)
{
	long buf[3] = {0, 0, 0};
	bar(buf);
}

The stack frame was always 16-byte aligned regardless of whether the
buf array size was even or odd.

So my half-asleep brain is telling me that my original assumption was
right.

-- 
Josh

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Josh Poimboeuf @ 2017-01-13  5:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Crypto Mailing List, Thomas Gleixner, Herbert Xu,
	Andy Lutomirski, Ingo Molnar, Andy Lutomirski,
	Linux Kernel Mailing List, Ard Biesheuvel
In-Reply-To: <CA+55aFzRrSwGxxfZk-RUEnsz=xhcSmOwE1CenfCPBWtsS9MwDw@mail.gmail.com>

On Thu, Jan 12, 2017 at 08:37:18PM -0800, Linus Torvalds wrote:
> On Jan 12, 2017 8:28 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> 
> 
> The stack frame was always 16-byte aligned regardless of whether the
> buf array size was even or odd.
> 
> 
> Including with -fomit-frame-pointer?
> 
> With frame pointers, stack frames really are naturally 16 bytes, and then
> keeping the frame 16-byte aligned is just a matter of making any extra
> frame allocations or push/pop sequences that you do also be a multiple of
> 16 bytes.
> 
> But *without* frame pointers, the"native" frame size is just 8 bytes, and a
> function that doesn't need any other local storage and then calls another
> function (think various trivial wrapper functions that just add an argument
> and then munge the return value) would thus naturally cause the frame to
> become misaligned.
> 
> So then the compiler actually needs to start adding useless instructions
> just to keep the stack 16-byte aligned.

Disabling frame pointers didn't seem to help, but I finally got it to
misalign with a different test case.  I think it had been aligning the
array, so instead I made it push a register.


void otherfunc(void);

static inline void bar(int f)
{
	register void *__sp asm(_ASM_SP);
	asm volatile("call otherfunc" : "+r" (__sp) : "b"(f));
}

void foo(void)
{
	bar(5);
}


00000000000020f0 <foo>:
    20f0:	55                   	push   %rbp
    20f1:	48 89 e5             	mov    %rsp,%rbp
    20f4:	53                   	push   %rbx
    20f5:	bb 05 00 00 00       	mov    $0x5,%ebx
    20fa:	e8 00 00 00 00       	callq  20ff <foo+0xf>
			20fb: R_X86_64_PC32	otherfunc-0x4
    20ff:	5b                   	pop    %rbx
    2100:	5d                   	pop    %rbp
    2101:	c3                   	retq   
    2102:	0f 1f 40 00          	nopl   0x0(%rax)
    2106:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
    210d:	00 00 00 

-- 
Josh

^ permalink raw reply

* [PATCH] virtio-crypto: adjust priority of algorithm
From: Gonglei @ 2017-01-13  6:25 UTC (permalink / raw)
  To: virtualization, linux-crypto, linux-kernel
  Cc: mst, herbert, borntraeger, Gonglei

Some hardware accelerators (like intel aseni or the s390
cpacf functions) have lower priorities than virtio
crypto, and those drivers are faster than the same in
the host via virtio. So let's lower the priority of
virtio-crypto's algorithm, make it's higher than sofeware
implimentations but lower than the hardware ones.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
index 6f40a42..4de4740 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
 static struct crypto_alg virtio_crypto_algs[] = { {
 	.cra_name = "cbc(aes)",
 	.cra_driver_name = "virtio_crypto_aes_cbc",
-	.cra_priority = 501,
+	.cra_priority = 150,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = AES_BLOCK_SIZE,
 	.cra_ctxsize  = sizeof(struct virtio_crypto_ablkcipher_ctx),
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH] crypto: testmgr - don't DMA map IV from stack in test_skcipher()
From: Horia Geantă @ 2017-01-13  6:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, linux-crypto

Fix the "DMA-API: device driver maps memory from stack" warning
generated when crypto accelerators map the IV.

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
---
 crypto/testmgr.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 500a5277cc22..64245aeef634 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1081,12 +1081,16 @@ static int __test_skcipher(struct crypto_skcipher *tfm, int enc,
 	const char *e, *d;
 	struct tcrypt_result result;
 	void *data;
-	char iv[MAX_IVLEN];
+	char *iv;
 	char *xbuf[XBUFSIZE];
 	char *xoutbuf[XBUFSIZE];
 	int ret = -ENOMEM;
 	unsigned int ivsize = crypto_skcipher_ivsize(tfm);
 
+	iv = kmalloc(MAX_IVLEN, GFP_KERNEL);
+	if (!iv)
+		return ret;
+
 	if (testmgr_alloc_buf(xbuf))
 		goto out_nobuf;
 
@@ -1328,6 +1332,7 @@ static int __test_skcipher(struct crypto_skcipher *tfm, int enc,
 out_nooutbuf:
 	testmgr_free_buf(xbuf);
 out_nobuf:
+	kfree(iv);
 	return ret;
 }
 
-- 
2.4.4

^ permalink raw reply related

* Re: [PATCH] virtio-crypto: adjust priority of algorithm
From: Christian Borntraeger @ 2017-01-13  8:28 UTC (permalink / raw)
  To: Gonglei, virtualization, linux-crypto, linux-kernel; +Cc: herbert, mst
In-Reply-To: <1484288741-31100-1-git-send-email-arei.gonglei@huawei.com>

ACK. Whoever takes this patch might want to fixup 3 typos.

On 01/13/2017 07:25 AM, Gonglei wrote:

> Some hardware accelerators (like intel aseni or the s390
					aesni
> cpacf functions) have lower priorities than virtio
> crypto, and those drivers are faster than the same in
> the host via virtio. So let's lower the priority of
> virtio-crypto's algorithm, make it's higher than sofeware
                                                   software
> implimentations but lower than the hardware ones.
  implementations
> 
> Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
> index 6f40a42..4de4740 100644
> --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
>  static struct crypto_alg virtio_crypto_algs[] = { {
>  	.cra_name = "cbc(aes)",
>  	.cra_driver_name = "virtio_crypto_aes_cbc",
> -	.cra_priority = 501,
> +	.cra_priority = 150,
>  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>  	.cra_blocksize = AES_BLOCK_SIZE,
>  	.cra_ctxsize  = sizeof(struct virtio_crypto_ablkcipher_ctx),
> 

^ permalink raw reply

* [PATCH] crypto: arm/aes - avoid reserved 'tt' mnemonic in asm code
From: Ard Biesheuvel @ 2017-01-13  8:33 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-arm-kernel, herbert, arnd, Ard Biesheuvel

The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.

Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/aes-cipher-core.S | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm/crypto/aes-cipher-core.S b/arch/arm/crypto/aes-cipher-core.S
index b04261e1e068..c817a86c4ca8 100644
--- a/arch/arm/crypto/aes-cipher-core.S
+++ b/arch/arm/crypto/aes-cipher-core.S
@@ -18,7 +18,7 @@
 	rounds		.req	r1
 	in		.req	r2
 	out		.req	r3
-	tt		.req	ip
+	ttab		.req	ip
 
 	t0		.req	lr
 	t1		.req	r2
@@ -34,9 +34,9 @@
 
 	.macro		__load, out, in, idx
 	.if		__LINUX_ARM_ARCH__ < 7 && \idx > 0
-	ldr		\out, [tt, \in, lsr #(8 * \idx) - 2]
+	ldr		\out, [ttab, \in, lsr #(8 * \idx) - 2]
 	.else
-	ldr		\out, [tt, \in, lsl #2]
+	ldr		\out, [ttab, \in, lsl #2]
 	.endif
 	.endm
 
@@ -136,7 +136,7 @@
 	eor		r6, r6, r10
 	eor		r7, r7, r11
 
-	__adrl		tt, \ttab
+	__adrl		ttab, \ttab
 
 	tst		rounds, #2
 	bne		1f
@@ -146,7 +146,7 @@
 
 1:	subs		rounds, rounds, #4
 	\round		r8, r9, r10, r11, r4, r5, r6, r7
-	__adrl		tt, \ltab, ls
+	__adrl		ttab, \ltab, ls
 	\round		r4, r5, r6, r7, r8, r9, r10, r11
 	bhi		0b
 
-- 
2.7.4

^ permalink raw reply related

* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13  8:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Josh Poimboeuf, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrVz-wEFVUwrpS8-Ln9SWnsF5KxkqJC-Br6wJ+e0LGM9UA@mail.gmail.com>

On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>
> I think we have some inline functions that do asm volatile ("call
> ..."), and I don't see any credible way of forcing alignment short of
> generating an entirely new stack frame and aligning that.  Ick.  This

A straight asm call from C should always work because gcc keeps
the stack aligned in the prologue.

The only problem with inline assembly is when you start pushing
things onto the stack directly.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13  8:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Andy Lutomirski, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CA+55aFwkodRf3QM+GLzq5G4F7fp+=ds-cHJRtCgJczMZgVx6Ug@mail.gmail.com>

On Thu, Jan 12, 2017 at 01:40:54PM -0800, Linus Torvalds wrote:
> 
> The 8-byte alignment mainly makes sense when the basic call sequence
> just adds 8 bytes, and you have functions without frames (that still
> call other functions).

The question is does it really make sense to save those 8 bytes
of padding on x86-64 when arm64 apparently also requires 16-byte
stack alignment.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13  8:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Josh Poimboeuf, Linus Torvalds, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrXom8aY2XhpAyOtAwQQYF7wftBHJE_px1xr0iRmcYEJoA@mail.gmail.com>

On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
> 
> What I mean is: what guarantees that the stack is properly aligned for
> the subroutine call?  gcc promises to set up a stack frame, but does
> it promise that rsp will be properly aligned to call a C function?

Yes, as long as you don't go behind its back and start directly
pushing things onto the stack with inline asm.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13  8:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Linux Crypto Mailing List, Thomas Gleixner,
	Andy Lutomirski, Ingo Molnar, Andy Lutomirski,
	Linux Kernel Mailing List, Ard Biesheuvel
In-Reply-To: <CA+55aFzRrSwGxxfZk-RUEnsz=xhcSmOwE1CenfCPBWtsS9MwDw@mail.gmail.com>

On Thu, Jan 12, 2017 at 08:37:18PM -0800, Linus Torvalds wrote:
>
> So then the compiler actually needs to start adding useless instructions
> just to keep the stack 16-byte aligned.

Which it does.  Of course most of the time no extra instructions
are required because there are stack variables, so it's just matter
of adding 8 to the value you're subtracting from rsp.  But it is
probably why gcc assumes that the stack is 16-byte aligned which
triggered my original crash.

Here is an example from the function that was involved in the crash,
without frame pointers:

00000000000001b0 <chacha20_simd>:
 1b0:   41 54                   push   %r12
 1b2:   55                      push   %rbp
 1b3:   48 81 ec f8 00 00 00    sub    $0xf8,%rsp

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13  8:43 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Linux Crypto Mailing List, Thomas Gleixner,
	Andy Lutomirski, Ingo Molnar, Andy Lutomirski,
	Linux Kernel Mailing List, Ard Biesheuvel
In-Reply-To: <20170113050709.yrdtfet5d4sebubi@treble>

On Thu, Jan 12, 2017 at 11:07:09PM -0600, Josh Poimboeuf wrote:
> 
> Disabling frame pointers didn't seem to help, but I finally got it to
> misalign with a different test case.  I think it had been aligning the
> array, so instead I made it push a register.

Right.  If you start manipulating the stack directly then all bets
are off.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] crypto: testmgr - don't DMA map IV from stack in test_skcipher()
From: Herbert Xu @ 2017-01-13  8:46 UTC (permalink / raw)
  To: Horia Geantă; +Cc: David S. Miller, linux-crypto
In-Reply-To: <1484290756-20868-1-git-send-email-horia.geanta@nxp.com>

On Fri, Jan 13, 2017 at 08:59:16AM +0200, Horia Geantă wrote:
> Fix the "DMA-API: device driver maps memory from stack" warning
> generated when crypto accelerators map the IV.
> 
> Signed-off-by: Horia Geantă <horia.geanta@nxp.com>

Hmm, the IV comes in as a pointer.  So you should not assume that
it can be DMAed at all.

Perhaps we should change the API so that it gets passed in as an
SG list.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* RE: [PATCH] virtio-crypto: adjust priority of algorithm
From: Gonglei (Arei) @ 2017-01-13  9:24 UTC (permalink / raw)
  To: Christian Borntraeger, virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: herbert@gondor.apana.org.au, mst@redhat.com
In-Reply-To: <874ee338-4dd6-8a2c-fb95-59bd784de026@de.ibm.com>

>
> From: Christian Borntraeger [mailto:borntraeger@de.ibm.com]
> Sent: Friday, January 13, 2017 4:28 PM
> To: Gonglei (Arei); virtualization@lists.linux-foundation.org;
> linux-crypto@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: mst@redhat.com; herbert@gondor.apana.org.au
> Subject: Re: [PATCH] virtio-crypto: adjust priority of algorithm
> 
> ACK. Whoever takes this patch might want to fixup 3 typos.
> 
Thanks, I'd better send v2 IMHO.  :)

Regards,
-Gonglei

> On 01/13/2017 07:25 AM, Gonglei wrote:
> 
> > Some hardware accelerators (like intel aseni or the s390
> 					aesni
> > cpacf functions) have lower priorities than virtio
> > crypto, and those drivers are faster than the same in
> > the host via virtio. So let's lower the priority of
> > virtio-crypto's algorithm, make it's higher than sofeware
>                                                    software
> > implimentations but lower than the hardware ones.
>   implementations
> >
> > Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > ---
> >  drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> > index 6f40a42..4de4740 100644
> > --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> > +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> > @@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
> >  static struct crypto_alg virtio_crypto_algs[] = { {
> >  	.cra_name = "cbc(aes)",
> >  	.cra_driver_name = "virtio_crypto_aes_cbc",
> > -	.cra_priority = 501,
> > +	.cra_priority = 150,
> >  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> >  	.cra_blocksize = AES_BLOCK_SIZE,
> >  	.cra_ctxsize  = sizeof(struct virtio_crypto_ablkcipher_ctx),
> >

^ permalink raw reply

* [PATCH v2] virtio-crypto: adjust priority of algorithm
From: Gonglei @ 2017-01-13  9:34 UTC (permalink / raw)
  To: virtualization, linux-crypto, linux-kernel
  Cc: mst, herbert, borntraeger, Gonglei

Some hardware accelerators (like intel aesni or the s390
cpacf functions) have lower priorities than virtio
crypto, and those drivers are faster than the same in
the host via virtio. So let's lower the priority of
virtio-crypto's algorithm, make it's higher than software
implementations but lower than the hardware ones.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
v2:
    fix three typos. [Christian]
---
 drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
index 6f40a42..4de4740 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
 static struct crypto_alg virtio_crypto_algs[] = { {
 	.cra_name = "cbc(aes)",
 	.cra_driver_name = "virtio_crypto_aes_cbc",
-	.cra_priority = 501,
+	.cra_priority = 150,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = AES_BLOCK_SIZE,
 	.cra_ctxsize  = sizeof(struct virtio_crypto_ablkcipher_ctx),
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Herbert Xu @ 2017-01-13 10:21 UTC (permalink / raw)
  To: Stephan Müller; +Cc: linux-crypto
In-Reply-To: <5211147.RjDSfvrhhz@tauon.atsec.com>

On Thu, Jan 12, 2017 at 05:19:57PM +0100, Stephan Müller wrote:
>
> > I don't understand, what's wrong with:
> > 
> > sendmsg(fd, ...)
> > aio_read(iocb1)
> > sendmsg(fd, ...)
> > aio_read(iocb2)
> 
> Sure, that works. But here you limit yourself to one IOCB per aio_read. But 
> aio_read supports multiple IOCBs in one invocation. And this is the issue I am 
> considering.

Not really.  You just lay it out in the same way with lio_listio.
That is, a write followed by read, etc.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
From: Herbert Xu @ 2017-01-13 10:23 UTC (permalink / raw)
  To: Stephan Müller; +Cc: linux-crypto
In-Reply-To: <3463059.TVfkfLOfNf@tauon.atsec.com>

On Thu, Jan 12, 2017 at 05:37:33PM +0100, Stephan Müller wrote:
>
> I would not understand that statement.
> 
> With the patch mentioned above that I provided some weeks ago, we have the 
> following scenario for an encryption (in case of decryption, it is almost 
> identical, just the tag location is reversed):
> 
> user calls sendmsg with data buffer/IOVEC: AAD || PT
> 	-> algif_aead turns this into the src SGL
> 
> user calls recvmsg with data buffer/IOVEC: CT || Tag
> 	-> algif_aead creates the first SG entry in the dst SGL pointing to the 
> AAD from the src SGL
> 	-> algif_aead appends the user buffers to the dst SGL
> 
> 	-> algif_aead performs its operation and during that operation, only the 
> CT and Tag parts are changed
> 
> I.e. with the pre-pending of the SG pointing to the AAD from the src SGL to 
> the dst SGL we have a clean invocation of the kernel API.

But that means you can never invoke the in-place path of the kernel
API, which is the most optimised code path.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11
From: Herbert Xu @ 2017-01-13 10:28 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-crypto@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAKv+Gu9a5ZsWtT0XYczPBf54L7d5Whu5gG1eUOt8T-wms0orbg@mail.gmail.com>

On Thu, Jan 12, 2017 at 04:48:08PM +0000, Ard Biesheuvel wrote:
>
> Actually, patch #6 was the huge one not #7, and I don't see it in your tree yet.
> 
> https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/commit/?h=crypto-arm-v4.11&id=cbf03b255f7c
> 
> The order does not matter, though, so could you please put it on top? Thanks.

OK I've applied it now and will push out soon.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
From: Herbert Xu @ 2017-01-13 10:41 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz, Mikulas Patocka,
	Binoy Jayan
In-Reply-To: <cover.1484215956.git.omosnacek@gmail.com>

On Thu, Jan 12, 2017 at 01:59:52PM +0100, Ondrej Mosnacek wrote:
> 
> the goal of this patchset is to allow those skcipher API users that need to
> process batches of small messages (especially dm-crypt) to do so efficiently.

Please explain why this can't be done with the existing framework
using IV generators similar to the ones used for IPsec.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Stephan Müller @ 2017-01-13 10:49 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113102145.GA23349@gondor.apana.org.au>

Am Freitag, 13. Januar 2017, 18:21:45 CET schrieb Herbert Xu:

Hi Herbert,

> On Thu, Jan 12, 2017 at 05:19:57PM +0100, Stephan Müller wrote:
> > > I don't understand, what's wrong with:
> > > 
> > > sendmsg(fd, ...)
> > > aio_read(iocb1)
> > > sendmsg(fd, ...)
> > > aio_read(iocb2)
> > 
> > Sure, that works. But here you limit yourself to one IOCB per aio_read.
> > But
> > aio_read supports multiple IOCBs in one invocation. And this is the issue
> > I am considering.
> 
> Not really.  You just lay it out in the same way with lio_listio.
> That is, a write followed by read, etc.

According to the man page of lio_listio(3) the provided AIO operations are 
executed in an unspecified order. I would infer from that statement that even 
if an order of write / read / write / read is defined by the caller, this 
order may not be followed by the kernel. Thus we would need to consider the 
case that in the end, algif has to process the order of write / write / read / 
read or any other order.

Besides, the crashes I reported for the current AIO implementation in 
algif_aead and algif_skcipher are always triggered when invoking an aio_read 
with two or more IOCBs. The most important aspect I want to cover with the 
patch set is to stop crashing the kernel.

Ciao
Stephan

^ permalink raw reply

* Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
From: Stephan Müller @ 2017-01-13 10:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113102339.GB23349@gondor.apana.org.au>

Am Freitag, 13. Januar 2017, 18:23:39 CET schrieb Herbert Xu:

Hi Herbert,

> On Thu, Jan 12, 2017 at 05:37:33PM +0100, Stephan Müller wrote:
> > I would not understand that statement.
> > 
> > With the patch mentioned above that I provided some weeks ago, we have the
> > following scenario for an encryption (in case of decryption, it is almost
> > identical, just the tag location is reversed):
> > 
> > user calls sendmsg with data buffer/IOVEC: AAD || PT
> > 
> > 	-> algif_aead turns this into the src SGL
> > 
> > user calls recvmsg with data buffer/IOVEC: CT || Tag
> > 
> > 	-> algif_aead creates the first SG entry in the dst SGL pointing to the
> > 
> > AAD from the src SGL
> > 
> > 	-> algif_aead appends the user buffers to the dst SGL
> > 	
> > 	-> algif_aead performs its operation and during that operation, only the
> > 
> > CT and Tag parts are changed
> > 
> > I.e. with the pre-pending of the SG pointing to the AAD from the src SGL
> > to
> > the dst SGL we have a clean invocation of the kernel API.
> 
> But that means you can never invoke the in-place path of the kernel
> API, which is the most optimised code path.

May I ask how the in-place code path can be invoked by algif_aead or 
algif_skcipher? As far as I understand, this code path is only invoked when 
the cipher implementation sees that the src and dst SGLs are identical.

However, both algif interfaces maintain separate src and dst SGLs and always 
invoke the cipher operation with these dissimilar SGLs. Thus, I would infer 
that even when the user invokes zerocopy, the src/dst SGLs are not identical 
and therefore the cipher implementations would not use the in-place code path.

Ciao
Stephan

^ permalink raw reply

* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Herbert Xu @ 2017-01-13 11:03 UTC (permalink / raw)
  To: Stephan Müller; +Cc: linux-crypto
In-Reply-To: <1588177.ehMkkoOMrj@positron.chronox.de>

On Fri, Jan 13, 2017 at 11:49:05AM +0100, Stephan Müller wrote:
> 
> According to the man page of lio_listio(3) the provided AIO operations are 
> executed in an unspecified order. I would infer from that statement that even 
> if an order of write / read / write / read is defined by the caller, this 
> order may not be followed by the kernel. Thus we would need to consider the 
> case that in the end, algif has to process the order of write / write / read / 
> read or any other order.

Well if ordering is not guaranteed that I don't see how your code
can work either.  Or am I missing something?

> Besides, the crashes I reported for the current AIO implementation in 
> algif_aead and algif_skcipher are always triggered when invoking an aio_read 
> with two or more IOCBs. The most important aspect I want to cover with the 
> patch set is to stop crashing the kernel.

Please stop adding new features and just fix the existing crash.
Once we have that covered and backported to stable then we can
start addressing new features.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Stephan Müller @ 2017-01-13 11:10 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113110335.GA23617@gondor.apana.org.au>

Am Freitag, 13. Januar 2017, 19:03:35 CET schrieb Herbert Xu:

Hi Herbert,

> On Fri, Jan 13, 2017 at 11:49:05AM +0100, Stephan Müller wrote:
> > According to the man page of lio_listio(3) the provided AIO operations are
> > executed in an unspecified order. I would infer from that statement that
> > even if an order of write / read / write / read is defined by the caller,
> > this order may not be followed by the kernel. Thus we would need to
> > consider the case that in the end, algif has to process the order of
> > write / write / read / read or any other order.
> 
> Well if ordering is not guaranteed that I don't see how your code
> can work either.  Or am I missing something?

The patch simply stores all data it gets from sendmsg in the src SGL. In 
addition it maintains an offset pointer into that src SGLs.

When the recvmsg call comes in and the dst SGL is prepared, it simply takes as 
much data from the src SGL as needed to cover the request defined by the dst 
SGL. After completing that operation, the offset pointer is moved forward to 
point to a yet unused part of the src SGL. If another recvmsg comes in without 
an intermediate sendmsg, it simply starts using the data from the src SGL 
starting from the offset.

Therefore, the code should now be able to handle a write / write / read / read 
scenario. Or it can handle, say, a write(32 bytes) / read (16 bytes) / read 
(16 bytes). At least my tests covered a successful testing of that scenario 
which always crashed the kernel before.

Ciao
Stephan

^ permalink raw reply

* Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
From: Stephan Müller @ 2017-01-13 11:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113110417.GB23617@gondor.apana.org.au>

Am Freitag, 13. Januar 2017, 19:04:17 CET schrieb Herbert Xu:

Hi Herbert,

> On Fri, Jan 13, 2017 at 11:58:24AM +0100, Stephan Müller wrote:
> > May I ask how the in-place code path can be invoked by algif_aead or
> > algif_skcipher? As far as I understand, this code path is only invoked
> > when
> > the cipher implementation sees that the src and dst SGLs are identical.
> 
> It's not right now but it isn't difficult to add the code to allow
> it by comparing SGLs.

Adding such code should IMHO not be impaired by pointing to the AAD held in 
the src SGL by the dst SGL as offered with the older patch mentioned before.

Ciao
Stephan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox