* Re: [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
From: Eric Biggers @ 2017-01-13 3:19 UTC (permalink / raw)
To: Ondrej Mosnacek
Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
Mikulas Patocka, Binoy Jayan
In-Reply-To: <c32a28630157c619ac2a7c851be586e72f193c68.1484215956.git.omosnacek@gmail.com>
On Thu, Jan 12, 2017 at 01:59:57PM +0100, Ondrej Mosnacek wrote:
> This patch implements bulk request handling in the AES-NI crypto drivers.
> The major advantage of this is that with bulk requests, the kernel_fpu_*
> functions (which are usually quite slow) are now called only once for the whole
> request.
>
Hi Ondrej,
To what extent does the performance benefit of this patchset result from just
the reduced numbers of calls to kernel_fpu_begin() and kernel_fpu_end()?
If it's most of the benefit, would it make any sense to optimize
kernel_fpu_begin() and kernel_fpu_end() instead?
And if there are other examples besides kernel_fpu_begin/kernel_fpu_end where
the bulk API would provide a significant performance boost, can you mention
them?
Interestingly, the arm64 equivalent to kernel_fpu_begin()
(kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have an
optimization where the SIMD registers aren't saved if they were already saved.
I wonder why something similar isn't done on x86.
Eric
^ permalink raw reply
* Re: x86-64: Maintain 16-byte stack alignment
From: Andy Lutomirski @ 2017-01-13 3:23 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <20170113031107.mgitq54fmjnrvi6f@treble>
On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
>> >> <torvalds@linux-foundation.org> wrote:
>> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >> >>
>> >> >> Just to clarify, I think you're asking if, for versions of gcc which
>> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> >> >> functions to ensure their stacks are 16-byte aligned.
>> >> >>
>> >> >> It's certainly possible, but I don't see how that solves the problem.
>> >> >> The stack will still be misaligned by entry code. Or am I missing
>> >> >> something?
>> >> >
>> >> > I think the argument is that we *could* try to align things, if we
>> >> > just had some tool that actually then verified that we aren't missing
>> >> > anything.
>> >> >
>> >> > I'm not entirely happy with checking the generated code, though,
>> >> > because as Ingo says, you have a 50:50 chance of just getting it right
>> >> > by mistake. So I'd much rather have some static tool that checks
>> >> > things at a code level (ie coccinelle or sparse).
>> >>
>> >> What I meant was checking the entry code to see if it aligns stack
>> >> frames, and good luck getting sparse to do that. Hmm, getting 16-byte
>> >> alignment for real may actually be entirely a lost cause. After all,
>> >> I think we have some inline functions that do asm volatile ("call
>> >> ..."), and I don't see any credible way of forcing alignment short of
>> >> generating an entirely new stack frame and aligning that.
>> >
>> > Actually we already found all such cases and fixed them by forcing a new
>> > stack frame, thanks to objtool. For example, see 55a76b59b5fe.
>>
>> What I mean is: what guarantees that the stack is properly aligned for
>> the subroutine call? gcc promises to set up a stack frame, but does
>> it promise that rsp will be properly aligned to call a C function?
>
> Yes, I did an experiment and you're right. I had naively assumed that
> all stack frames would be aligned.
Just to check: did you do your experiment with -mpreferred-stack-boundary=4?
--Andy
^ permalink raw reply
* Re: x86-64: Maintain 16-byte stack alignment
From: Josh Poimboeuf @ 2017-01-13 4:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrUFCFn-rKnr+NG3SU7J78ree9siJC=Kz8f_Bk6eG2HyPA@mail.gmail.com>
On Thu, Jan 12, 2017 at 07:23:18PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> >> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> >> >> <torvalds@linux-foundation.org> wrote:
> >> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >> >>
> >> >> >> Just to clarify, I think you're asking if, for versions of gcc which
> >> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> >> >> >> functions to ensure their stacks are 16-byte aligned.
> >> >> >>
> >> >> >> It's certainly possible, but I don't see how that solves the problem.
> >> >> >> The stack will still be misaligned by entry code. Or am I missing
> >> >> >> something?
> >> >> >
> >> >> > I think the argument is that we *could* try to align things, if we
> >> >> > just had some tool that actually then verified that we aren't missing
> >> >> > anything.
> >> >> >
> >> >> > I'm not entirely happy with checking the generated code, though,
> >> >> > because as Ingo says, you have a 50:50 chance of just getting it right
> >> >> > by mistake. So I'd much rather have some static tool that checks
> >> >> > things at a code level (ie coccinelle or sparse).
> >> >>
> >> >> What I meant was checking the entry code to see if it aligns stack
> >> >> frames, and good luck getting sparse to do that. Hmm, getting 16-byte
> >> >> alignment for real may actually be entirely a lost cause. After all,
> >> >> I think we have some inline functions that do asm volatile ("call
> >> >> ..."), and I don't see any credible way of forcing alignment short of
> >> >> generating an entirely new stack frame and aligning that.
> >> >
> >> > Actually we already found all such cases and fixed them by forcing a new
> >> > stack frame, thanks to objtool. For example, see 55a76b59b5fe.
> >>
> >> What I mean is: what guarantees that the stack is properly aligned for
> >> the subroutine call? gcc promises to set up a stack frame, but does
> >> it promise that rsp will be properly aligned to call a C function?
> >
> > Yes, I did an experiment and you're right. I had naively assumed that
> > all stack frames would be aligned.
>
> Just to check: did you do your experiment with -mpreferred-stack-boundary=4?
Yes, but it's too late for me to be doing hard stuff and I think my
first experiment was bogus. I didn't use all the other kernel-specific
gcc options.
I tried again with all the kernel gcc options, except with
-mpreferred-stack-boundary=4 instead of 3, and actually came up with the
opposite conclusion.
I used the following code:
void otherfunc(void);
static inline void bar(long *f)
{
asm volatile("call otherfunc" : : "m" (f) : );
}
void foo(void)
{
long buf[3] = {0, 0, 0};
bar(buf);
}
The stack frame was always 16-byte aligned regardless of whether the
buf array size was even or odd.
So my half-asleep brain is telling me that my original assumption was
right.
--
Josh
^ permalink raw reply
* Re: x86-64: Maintain 16-byte stack alignment
From: Josh Poimboeuf @ 2017-01-13 5:07 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Crypto Mailing List, Thomas Gleixner, Herbert Xu,
Andy Lutomirski, Ingo Molnar, Andy Lutomirski,
Linux Kernel Mailing List, Ard Biesheuvel
In-Reply-To: <CA+55aFzRrSwGxxfZk-RUEnsz=xhcSmOwE1CenfCPBWtsS9MwDw@mail.gmail.com>
On Thu, Jan 12, 2017 at 08:37:18PM -0800, Linus Torvalds wrote:
> On Jan 12, 2017 8:28 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
>
>
> The stack frame was always 16-byte aligned regardless of whether the
> buf array size was even or odd.
>
>
> Including with -fomit-frame-pointer?
>
> With frame pointers, stack frames really are naturally 16 bytes, and then
> keeping the frame 16-byte aligned is just a matter of making any extra
> frame allocations or push/pop sequences that you do also be a multiple of
> 16 bytes.
>
> But *without* frame pointers, the"native" frame size is just 8 bytes, and a
> function that doesn't need any other local storage and then calls another
> function (think various trivial wrapper functions that just add an argument
> and then munge the return value) would thus naturally cause the frame to
> become misaligned.
>
> So then the compiler actually needs to start adding useless instructions
> just to keep the stack 16-byte aligned.
Disabling frame pointers didn't seem to help, but I finally got it to
misalign with a different test case. I think it had been aligning the
array, so instead I made it push a register.
void otherfunc(void);
static inline void bar(int f)
{
register void *__sp asm(_ASM_SP);
asm volatile("call otherfunc" : "+r" (__sp) : "b"(f));
}
void foo(void)
{
bar(5);
}
00000000000020f0 <foo>:
20f0: 55 push %rbp
20f1: 48 89 e5 mov %rsp,%rbp
20f4: 53 push %rbx
20f5: bb 05 00 00 00 mov $0x5,%ebx
20fa: e8 00 00 00 00 callq 20ff <foo+0xf>
20fb: R_X86_64_PC32 otherfunc-0x4
20ff: 5b pop %rbx
2100: 5d pop %rbp
2101: c3 retq
2102: 0f 1f 40 00 nopl 0x0(%rax)
2106: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
210d: 00 00 00
--
Josh
^ permalink raw reply
* [PATCH] virtio-crypto: adjust priority of algorithm
From: Gonglei @ 2017-01-13 6:25 UTC (permalink / raw)
To: virtualization, linux-crypto, linux-kernel
Cc: mst, herbert, borntraeger, Gonglei
Some hardware accelerators (like intel aseni or the s390
cpacf functions) have lower priorities than virtio
crypto, and those drivers are faster than the same in
the host via virtio. So let's lower the priority of
virtio-crypto's algorithm, make it's higher than sofeware
implimentations but lower than the hardware ones.
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
index 6f40a42..4de4740 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
static struct crypto_alg virtio_crypto_algs[] = { {
.cra_name = "cbc(aes)",
.cra_driver_name = "virtio_crypto_aes_cbc",
- .cra_priority = 501,
+ .cra_priority = 150,
.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct virtio_crypto_ablkcipher_ctx),
--
1.8.3.1
^ permalink raw reply related
* [PATCH] crypto: testmgr - don't DMA map IV from stack in test_skcipher()
From: Horia Geantă @ 2017-01-13 6:59 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, linux-crypto
Fix the "DMA-API: device driver maps memory from stack" warning
generated when crypto accelerators map the IV.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
---
crypto/testmgr.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 500a5277cc22..64245aeef634 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1081,12 +1081,16 @@ static int __test_skcipher(struct crypto_skcipher *tfm, int enc,
const char *e, *d;
struct tcrypt_result result;
void *data;
- char iv[MAX_IVLEN];
+ char *iv;
char *xbuf[XBUFSIZE];
char *xoutbuf[XBUFSIZE];
int ret = -ENOMEM;
unsigned int ivsize = crypto_skcipher_ivsize(tfm);
+ iv = kmalloc(MAX_IVLEN, GFP_KERNEL);
+ if (!iv)
+ return ret;
+
if (testmgr_alloc_buf(xbuf))
goto out_nobuf;
@@ -1328,6 +1332,7 @@ static int __test_skcipher(struct crypto_skcipher *tfm, int enc,
out_nooutbuf:
testmgr_free_buf(xbuf);
out_nobuf:
+ kfree(iv);
return ret;
}
--
2.4.4
^ permalink raw reply related
* Re: [PATCH] virtio-crypto: adjust priority of algorithm
From: Christian Borntraeger @ 2017-01-13 8:28 UTC (permalink / raw)
To: Gonglei, virtualization, linux-crypto, linux-kernel; +Cc: herbert, mst
In-Reply-To: <1484288741-31100-1-git-send-email-arei.gonglei@huawei.com>
ACK. Whoever takes this patch might want to fixup 3 typos.
On 01/13/2017 07:25 AM, Gonglei wrote:
> Some hardware accelerators (like intel aseni or the s390
aesni
> cpacf functions) have lower priorities than virtio
> crypto, and those drivers are faster than the same in
> the host via virtio. So let's lower the priority of
> virtio-crypto's algorithm, make it's higher than sofeware
software
> implimentations but lower than the hardware ones.
implementations
>
> Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
> drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
> index 6f40a42..4de4740 100644
> --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
> static struct crypto_alg virtio_crypto_algs[] = { {
> .cra_name = "cbc(aes)",
> .cra_driver_name = "virtio_crypto_aes_cbc",
> - .cra_priority = 501,
> + .cra_priority = 150,
> .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> .cra_blocksize = AES_BLOCK_SIZE,
> .cra_ctxsize = sizeof(struct virtio_crypto_ablkcipher_ctx),
>
^ permalink raw reply
* [PATCH] crypto: arm/aes - avoid reserved 'tt' mnemonic in asm code
From: Ard Biesheuvel @ 2017-01-13 8:33 UTC (permalink / raw)
To: linux-crypto; +Cc: linux-arm-kernel, herbert, arnd, Ard Biesheuvel
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.
Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm/crypto/aes-cipher-core.S | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm/crypto/aes-cipher-core.S b/arch/arm/crypto/aes-cipher-core.S
index b04261e1e068..c817a86c4ca8 100644
--- a/arch/arm/crypto/aes-cipher-core.S
+++ b/arch/arm/crypto/aes-cipher-core.S
@@ -18,7 +18,7 @@
rounds .req r1
in .req r2
out .req r3
- tt .req ip
+ ttab .req ip
t0 .req lr
t1 .req r2
@@ -34,9 +34,9 @@
.macro __load, out, in, idx
.if __LINUX_ARM_ARCH__ < 7 && \idx > 0
- ldr \out, [tt, \in, lsr #(8 * \idx) - 2]
+ ldr \out, [ttab, \in, lsr #(8 * \idx) - 2]
.else
- ldr \out, [tt, \in, lsl #2]
+ ldr \out, [ttab, \in, lsl #2]
.endif
.endm
@@ -136,7 +136,7 @@
eor r6, r6, r10
eor r7, r7, r11
- __adrl tt, \ttab
+ __adrl ttab, \ttab
tst rounds, #2
bne 1f
@@ -146,7 +146,7 @@
1: subs rounds, rounds, #4
\round r8, r9, r10, r11, r4, r5, r6, r7
- __adrl tt, \ltab, ls
+ __adrl ttab, \ltab, ls
\round r4, r5, r6, r7, r8, r9, r10, r11
bhi 0b
--
2.7.4
^ permalink raw reply related
* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13 8:36 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Josh Poimboeuf, Linux Kernel Mailing List,
Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrVz-wEFVUwrpS8-Ln9SWnsF5KxkqJC-Br6wJ+e0LGM9UA@mail.gmail.com>
On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>
> I think we have some inline functions that do asm volatile ("call
> ..."), and I don't see any credible way of forcing alignment short of
> generating an entirely new stack frame and aligning that. Ick. This
A straight asm call from C should always work because gcc keeps
the stack aligned in the prologue.
The only problem with inline assembly is when you start pushing
things onto the stack directly.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13 8:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Josh Poimboeuf, Andy Lutomirski, Linux Kernel Mailing List,
Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CA+55aFwkodRf3QM+GLzq5G4F7fp+=ds-cHJRtCgJczMZgVx6Ug@mail.gmail.com>
On Thu, Jan 12, 2017 at 01:40:54PM -0800, Linus Torvalds wrote:
>
> The 8-byte alignment mainly makes sense when the basic call sequence
> just adds 8 bytes, and you have functions without frames (that still
> call other functions).
The question is does it really make sense to save those 8 bytes
of padding on x86-64 when arm64 apparently also requires 16-byte
stack alignment.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13 8:39 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Josh Poimboeuf, Linus Torvalds, Linux Kernel Mailing List,
Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrXom8aY2XhpAyOtAwQQYF7wftBHJE_px1xr0iRmcYEJoA@mail.gmail.com>
On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
>
> What I mean is: what guarantees that the stack is properly aligned for
> the subroutine call? gcc promises to set up a stack frame, but does
> it promise that rsp will be properly aligned to call a C function?
Yes, as long as you don't go behind its back and start directly
pushing things onto the stack with inline asm.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13 8:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Josh Poimboeuf, Linux Crypto Mailing List, Thomas Gleixner,
Andy Lutomirski, Ingo Molnar, Andy Lutomirski,
Linux Kernel Mailing List, Ard Biesheuvel
In-Reply-To: <CA+55aFzRrSwGxxfZk-RUEnsz=xhcSmOwE1CenfCPBWtsS9MwDw@mail.gmail.com>
On Thu, Jan 12, 2017 at 08:37:18PM -0800, Linus Torvalds wrote:
>
> So then the compiler actually needs to start adding useless instructions
> just to keep the stack 16-byte aligned.
Which it does. Of course most of the time no extra instructions
are required because there are stack variables, so it's just matter
of adding 8 to the value you're subtracting from rsp. But it is
probably why gcc assumes that the stack is 16-byte aligned which
triggered my original crash.
Here is an example from the function that was involved in the crash,
without frame pointers:
00000000000001b0 <chacha20_simd>:
1b0: 41 54 push %r12
1b2: 55 push %rbp
1b3: 48 81 ec f8 00 00 00 sub $0xf8,%rsp
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-13 8:43 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Linus Torvalds, Linux Crypto Mailing List, Thomas Gleixner,
Andy Lutomirski, Ingo Molnar, Andy Lutomirski,
Linux Kernel Mailing List, Ard Biesheuvel
In-Reply-To: <20170113050709.yrdtfet5d4sebubi@treble>
On Thu, Jan 12, 2017 at 11:07:09PM -0600, Josh Poimboeuf wrote:
>
> Disabling frame pointers didn't seem to help, but I finally got it to
> misalign with a different test case. I think it had been aligning the
> array, so instead I made it push a register.
Right. If you start manipulating the stack directly then all bets
are off.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH] crypto: testmgr - don't DMA map IV from stack in test_skcipher()
From: Herbert Xu @ 2017-01-13 8:46 UTC (permalink / raw)
To: Horia Geantă; +Cc: David S. Miller, linux-crypto
In-Reply-To: <1484290756-20868-1-git-send-email-horia.geanta@nxp.com>
On Fri, Jan 13, 2017 at 08:59:16AM +0200, Horia Geantă wrote:
> Fix the "DMA-API: device driver maps memory from stack" warning
> generated when crypto accelerators map the IV.
>
> Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Hmm, the IV comes in as a pointer. So you should not assume that
it can be DMAed at all.
Perhaps we should change the API so that it gets passed in as an
SG list.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* RE: [PATCH] virtio-crypto: adjust priority of algorithm
From: Gonglei (Arei) @ 2017-01-13 9:24 UTC (permalink / raw)
To: Christian Borntraeger, virtualization@lists.linux-foundation.org,
linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: herbert@gondor.apana.org.au, mst@redhat.com
In-Reply-To: <874ee338-4dd6-8a2c-fb95-59bd784de026@de.ibm.com>
>
> From: Christian Borntraeger [mailto:borntraeger@de.ibm.com]
> Sent: Friday, January 13, 2017 4:28 PM
> To: Gonglei (Arei); virtualization@lists.linux-foundation.org;
> linux-crypto@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: mst@redhat.com; herbert@gondor.apana.org.au
> Subject: Re: [PATCH] virtio-crypto: adjust priority of algorithm
>
> ACK. Whoever takes this patch might want to fixup 3 typos.
>
Thanks, I'd better send v2 IMHO. :)
Regards,
-Gonglei
> On 01/13/2017 07:25 AM, Gonglei wrote:
>
> > Some hardware accelerators (like intel aseni or the s390
> aesni
> > cpacf functions) have lower priorities than virtio
> > crypto, and those drivers are faster than the same in
> > the host via virtio. So let's lower the priority of
> > virtio-crypto's algorithm, make it's higher than sofeware
> software
> > implimentations but lower than the hardware ones.
> implementations
> >
> > Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > ---
> > drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> > index 6f40a42..4de4740 100644
> > --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> > +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> > @@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
> > static struct crypto_alg virtio_crypto_algs[] = { {
> > .cra_name = "cbc(aes)",
> > .cra_driver_name = "virtio_crypto_aes_cbc",
> > - .cra_priority = 501,
> > + .cra_priority = 150,
> > .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> > .cra_blocksize = AES_BLOCK_SIZE,
> > .cra_ctxsize = sizeof(struct virtio_crypto_ablkcipher_ctx),
> >
^ permalink raw reply
* [PATCH v2] virtio-crypto: adjust priority of algorithm
From: Gonglei @ 2017-01-13 9:34 UTC (permalink / raw)
To: virtualization, linux-crypto, linux-kernel
Cc: mst, herbert, borntraeger, Gonglei
Some hardware accelerators (like intel aesni or the s390
cpacf functions) have lower priorities than virtio
crypto, and those drivers are faster than the same in
the host via virtio. So let's lower the priority of
virtio-crypto's algorithm, make it's higher than software
implementations but lower than the hardware ones.
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
v2:
fix three typos. [Christian]
---
drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
index 6f40a42..4de4740 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
static struct crypto_alg virtio_crypto_algs[] = { {
.cra_name = "cbc(aes)",
.cra_driver_name = "virtio_crypto_aes_cbc",
- .cra_priority = 501,
+ .cra_priority = 150,
.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct virtio_crypto_ablkcipher_ctx),
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Herbert Xu @ 2017-01-13 10:21 UTC (permalink / raw)
To: Stephan Müller; +Cc: linux-crypto
In-Reply-To: <5211147.RjDSfvrhhz@tauon.atsec.com>
On Thu, Jan 12, 2017 at 05:19:57PM +0100, Stephan Müller wrote:
>
> > I don't understand, what's wrong with:
> >
> > sendmsg(fd, ...)
> > aio_read(iocb1)
> > sendmsg(fd, ...)
> > aio_read(iocb2)
>
> Sure, that works. But here you limit yourself to one IOCB per aio_read. But
> aio_read supports multiple IOCBs in one invocation. And this is the issue I am
> considering.
Not really. You just lay it out in the same way with lio_listio.
That is, a write followed by read, etc.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
From: Herbert Xu @ 2017-01-13 10:23 UTC (permalink / raw)
To: Stephan Müller; +Cc: linux-crypto
In-Reply-To: <3463059.TVfkfLOfNf@tauon.atsec.com>
On Thu, Jan 12, 2017 at 05:37:33PM +0100, Stephan Müller wrote:
>
> I would not understand that statement.
>
> With the patch mentioned above that I provided some weeks ago, we have the
> following scenario for an encryption (in case of decryption, it is almost
> identical, just the tag location is reversed):
>
> user calls sendmsg with data buffer/IOVEC: AAD || PT
> -> algif_aead turns this into the src SGL
>
> user calls recvmsg with data buffer/IOVEC: CT || Tag
> -> algif_aead creates the first SG entry in the dst SGL pointing to the
> AAD from the src SGL
> -> algif_aead appends the user buffers to the dst SGL
>
> -> algif_aead performs its operation and during that operation, only the
> CT and Tag parts are changed
>
> I.e. with the pre-pending of the SG pointing to the AAD from the src SGL to
> the dst SGL we have a clean invocation of the kernel API.
But that means you can never invoke the in-place path of the kernel
API, which is the most optimised code path.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11
From: Herbert Xu @ 2017-01-13 10:28 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-crypto@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAKv+Gu9a5ZsWtT0XYczPBf54L7d5Whu5gG1eUOt8T-wms0orbg@mail.gmail.com>
On Thu, Jan 12, 2017 at 04:48:08PM +0000, Ard Biesheuvel wrote:
>
> Actually, patch #6 was the huge one not #7, and I don't see it in your tree yet.
>
> https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/commit/?h=crypto-arm-v4.11&id=cbf03b255f7c
>
> The order does not matter, though, so could you please put it on top? Thanks.
OK I've applied it now and will push out soon.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
From: Herbert Xu @ 2017-01-13 10:41 UTC (permalink / raw)
To: Ondrej Mosnacek
Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz, Mikulas Patocka,
Binoy Jayan
In-Reply-To: <cover.1484215956.git.omosnacek@gmail.com>
On Thu, Jan 12, 2017 at 01:59:52PM +0100, Ondrej Mosnacek wrote:
>
> the goal of this patchset is to allow those skcipher API users that need to
> process batches of small messages (especially dm-crypt) to do so efficiently.
Please explain why this can't be done with the existing framework
using IV generators similar to the ones used for IPsec.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Stephan Müller @ 2017-01-13 10:49 UTC (permalink / raw)
To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113102145.GA23349@gondor.apana.org.au>
Am Freitag, 13. Januar 2017, 18:21:45 CET schrieb Herbert Xu:
Hi Herbert,
> On Thu, Jan 12, 2017 at 05:19:57PM +0100, Stephan Müller wrote:
> > > I don't understand, what's wrong with:
> > >
> > > sendmsg(fd, ...)
> > > aio_read(iocb1)
> > > sendmsg(fd, ...)
> > > aio_read(iocb2)
> >
> > Sure, that works. But here you limit yourself to one IOCB per aio_read.
> > But
> > aio_read supports multiple IOCBs in one invocation. And this is the issue
> > I am considering.
>
> Not really. You just lay it out in the same way with lio_listio.
> That is, a write followed by read, etc.
According to the man page of lio_listio(3) the provided AIO operations are
executed in an unspecified order. I would infer from that statement that even
if an order of write / read / write / read is defined by the caller, this
order may not be followed by the kernel. Thus we would need to consider the
case that in the end, algif has to process the order of write / write / read /
read or any other order.
Besides, the crashes I reported for the current AIO implementation in
algif_aead and algif_skcipher are always triggered when invoking an aio_read
with two or more IOCBs. The most important aspect I want to cover with the
patch set is to stop crashing the kernel.
Ciao
Stephan
^ permalink raw reply
* Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
From: Stephan Müller @ 2017-01-13 10:58 UTC (permalink / raw)
To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113102339.GB23349@gondor.apana.org.au>
Am Freitag, 13. Januar 2017, 18:23:39 CET schrieb Herbert Xu:
Hi Herbert,
> On Thu, Jan 12, 2017 at 05:37:33PM +0100, Stephan Müller wrote:
> > I would not understand that statement.
> >
> > With the patch mentioned above that I provided some weeks ago, we have the
> > following scenario for an encryption (in case of decryption, it is almost
> > identical, just the tag location is reversed):
> >
> > user calls sendmsg with data buffer/IOVEC: AAD || PT
> >
> > -> algif_aead turns this into the src SGL
> >
> > user calls recvmsg with data buffer/IOVEC: CT || Tag
> >
> > -> algif_aead creates the first SG entry in the dst SGL pointing to the
> >
> > AAD from the src SGL
> >
> > -> algif_aead appends the user buffers to the dst SGL
> >
> > -> algif_aead performs its operation and during that operation, only the
> >
> > CT and Tag parts are changed
> >
> > I.e. with the pre-pending of the SG pointing to the AAD from the src SGL
> > to
> > the dst SGL we have a clean invocation of the kernel API.
>
> But that means you can never invoke the in-place path of the kernel
> API, which is the most optimised code path.
May I ask how the in-place code path can be invoked by algif_aead or
algif_skcipher? As far as I understand, this code path is only invoked when
the cipher implementation sees that the src and dst SGLs are identical.
However, both algif interfaces maintain separate src and dst SGLs and always
invoke the cipher operation with these dissimilar SGLs. Thus, I would infer
that even when the user invokes zerocopy, the src/dst SGLs are not identical
and therefore the cipher implementations would not use the in-place code path.
Ciao
Stephan
^ permalink raw reply
* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Herbert Xu @ 2017-01-13 11:03 UTC (permalink / raw)
To: Stephan Müller; +Cc: linux-crypto
In-Reply-To: <1588177.ehMkkoOMrj@positron.chronox.de>
On Fri, Jan 13, 2017 at 11:49:05AM +0100, Stephan Müller wrote:
>
> According to the man page of lio_listio(3) the provided AIO operations are
> executed in an unspecified order. I would infer from that statement that even
> if an order of write / read / write / read is defined by the caller, this
> order may not be followed by the kernel. Thus we would need to consider the
> case that in the end, algif has to process the order of write / write / read /
> read or any other order.
Well if ordering is not guaranteed that I don't see how your code
can work either. Or am I missing something?
> Besides, the crashes I reported for the current AIO implementation in
> algif_aead and algif_skcipher are always triggered when invoking an aio_read
> with two or more IOCBs. The most important aspect I want to cover with the
> patch set is to stop crashing the kernel.
Please stop adding new features and just fix the existing crash.
Once we have that covered and backported to stable then we can
start addressing new features.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Stephan Müller @ 2017-01-13 11:10 UTC (permalink / raw)
To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113110335.GA23617@gondor.apana.org.au>
Am Freitag, 13. Januar 2017, 19:03:35 CET schrieb Herbert Xu:
Hi Herbert,
> On Fri, Jan 13, 2017 at 11:49:05AM +0100, Stephan Müller wrote:
> > According to the man page of lio_listio(3) the provided AIO operations are
> > executed in an unspecified order. I would infer from that statement that
> > even if an order of write / read / write / read is defined by the caller,
> > this order may not be followed by the kernel. Thus we would need to
> > consider the case that in the end, algif has to process the order of
> > write / write / read / read or any other order.
>
> Well if ordering is not guaranteed that I don't see how your code
> can work either. Or am I missing something?
The patch simply stores all data it gets from sendmsg in the src SGL. In
addition it maintains an offset pointer into that src SGLs.
When the recvmsg call comes in and the dst SGL is prepared, it simply takes as
much data from the src SGL as needed to cover the request defined by the dst
SGL. After completing that operation, the offset pointer is moved forward to
point to a yet unused part of the src SGL. If another recvmsg comes in without
an intermediate sendmsg, it simply starts using the data from the src SGL
starting from the offset.
Therefore, the code should now be able to handle a write / write / read / read
scenario. Or it can handle, say, a write(32 bytes) / read (16 bytes) / read
(16 bytes). At least my tests covered a successful testing of that scenario
which always crashed the kernel before.
Ciao
Stephan
^ permalink raw reply
* Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
From: Stephan Müller @ 2017-01-13 11:12 UTC (permalink / raw)
To: Herbert Xu; +Cc: linux-crypto
In-Reply-To: <20170113110417.GB23617@gondor.apana.org.au>
Am Freitag, 13. Januar 2017, 19:04:17 CET schrieb Herbert Xu:
Hi Herbert,
> On Fri, Jan 13, 2017 at 11:58:24AM +0100, Stephan Müller wrote:
> > May I ask how the in-place code path can be invoked by algif_aead or
> > algif_skcipher? As far as I understand, this code path is only invoked
> > when
> > the cipher implementation sees that the src and dst SGLs are identical.
>
> It's not right now but it isn't difficult to add the code to allow
> it by comparing SGLs.
Adding such code should IMHO not be impaired by pointing to the AAD held in
the src SGL by the dst SGL as offered with the older patch mentioned before.
Ciao
Stephan
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox