Linux cryptographic layer development
 help / color / mirror / Atom feed
* Re: x86-64: Maintain 16-byte stack alignment
From: Andy Lutomirski @ 2017-01-10 19:00 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Herbert Xu, Linux Kernel Mailing List, Linux Crypto Mailing List,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andy Lutomirski
In-Reply-To: <CAKv+Gu8xVrfR=J5b2pGK7R5Tv-M3xZhL_dnTvvM7nTZLLtC-EQ@mail.gmail.com>

On Tue, Jan 10, 2017 at 9:30 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 10 January 2017 at 14:33, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> I recently applied the patch
>>
>>         https://patchwork.kernel.org/patch/9468391/
>>
>> and ended up with a boot crash when it tried to run the x86 chacha20
>> code.  It turned out that the patch changed a manually aligned
>> stack buffer to one that is aligned by gcc.  What was happening was
>> that gcc can stack align to any value on x86-64 except 16.  The
>> reason is that gcc assumes that the stack is always 16-byte aligned,
>> which is not actually the case in the kernel.
>>
>
> Apologies for introducing this breakage. It seemed like an obvious and
> simple cleanup, so I didn't even bother to mention it in the commit
> log, but if the kernel does not guarantee 16 byte alignment, I guess
> we should revert to the old method. If SSE instructions are the only
> ones that require this alignment, then I suppose not having a ABI
> conforming stack pointer should not be an issue in general.

Here's what I think is really going on.  This is partially from
memory, so I could be off base.  The kernel is up against
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383, which means that,
on some GCC versions (like the bad one and maybe even current ones),
things compiled without -mno-sse can't have the stack alignment set
properly.  IMO we should fix this in the affected code, not the entry
code.  In fact, I think that fixing it in the entry code won't even
fully fix it because modern GCC will compile the rest of the kernel
with 8-byte alignment and the stack will get randomly unaligned (GCC
4.8 and newer).

Can we just add __attribute__((force_align_arg_pointer)) to the
affected functions?  Maybe have:

#define __USES_SSE __attribute__((force_align_arg_pointer))

on affected gcc versions?

***HOWEVER***

I think this is missing the tree for the supposed forest.  The actual
affected code appears to be:

static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
                         struct scatterlist *src, unsigned int nbytes)
{
        u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];

...

        state = (u32 *)roundup((uintptr_t)state_buf, CHACHA20_STATE_ALIGN);

gcc presumably infers (incorrectly) that state_buf is 16-byte aligned
and optimizes out the roundup.  How about just declaring an actual
__aligned(16) buffer, marking the function
__attribute__((force_align_arg_pointer)), and being done with it?
After all, we need that forcible alignment on *all* gcc versions.

--Andy

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Ard Biesheuvel @ 2017-01-10 17:30 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linux Kernel Mailing List, Linux Crypto Mailing List,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andy Lutomirski
In-Reply-To: <20170110143340.GA3787@gondor.apana.org.au>

On 10 January 2017 at 14:33, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> I recently applied the patch
>
>         https://patchwork.kernel.org/patch/9468391/
>
> and ended up with a boot crash when it tried to run the x86 chacha20
> code.  It turned out that the patch changed a manually aligned
> stack buffer to one that is aligned by gcc.  What was happening was
> that gcc can stack align to any value on x86-64 except 16.  The
> reason is that gcc assumes that the stack is always 16-byte aligned,
> which is not actually the case in the kernel.
>

Apologies for introducing this breakage. It seemed like an obvious and
simple cleanup, so I didn't even bother to mention it in the commit
log, but if the kernel does not guarantee 16 byte alignment, I guess
we should revert to the old method. If SSE instructions are the only
ones that require this alignment, then I suppose not having a ABI
conforming stack pointer should not be an issue in general.

> The x86-64 CPU actually tries to keep the stack 16-byte aligned,
> e.g., it'll do so when an IRQ comes in.  So the reason it doesn't
> work in the kernel mostly comes down to the fact that the struct
> pt_regs which lives near the top of the stack is 168 bytes which
> is not a multiple of 16.
>
> This patch tries to fix this by adding an 8-byte padding at the
> top of the call-chain involving pt_regs so that when we call a C
> function later we do so with an aligned stack.
>
> The same problem probably exists on i386 too since gcc also assumes
> 16-byte alignment there.  It's harder to fix however as the CPU
> doesn't help us in the IRQ case.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 05ed3d3..29d3bcb 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -59,39 +59,42 @@
>  /*
>   * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
>   * unless syscall needs a complete, fully filled "struct pt_regs".
> + *
> + * Note we add 8 extra bytes at the beginning to preserve stack alignment.
>   */
> -#define R15            0*8
> -#define R14            1*8
> -#define R13            2*8
> -#define R12            3*8
> -#define RBP            4*8
> -#define RBX            5*8
> +#define R15            1*8
> +#define R14            2*8
> +#define R13            3*8
> +#define R12            4*8
> +#define RBP            5*8
> +#define RBX            6*8
>  /* These regs are callee-clobbered. Always saved on kernel entry. */
> -#define R11            6*8
> -#define R10            7*8
> -#define R9             8*8
> -#define R8             9*8
> -#define RAX            10*8
> -#define RCX            11*8
> -#define RDX            12*8
> -#define RSI            13*8
> -#define RDI            14*8
> +#define R11            7*8
> +#define R10            8*8
> +#define R9             9*8
> +#define R8             10*8
> +#define RAX            11*8
> +#define RCX            12*8
> +#define RDX            13*8
> +#define RSI            14*8
> +#define RDI            15*8
>  /*
>   * On syscall entry, this is syscall#. On CPU exception, this is error code.
>   * On hw interrupt, it's IRQ number:
>   */
> -#define ORIG_RAX       15*8
> +#define ORIG_RAX       16*8
>  /* Return frame for iretq */
> -#define RIP            16*8
> -#define CS             17*8
> -#define EFLAGS         18*8
> -#define RSP            19*8
> -#define SS             20*8
> +#define RIP            17*8
> +#define CS             18*8
> +#define EFLAGS         19*8
> +#define RSP            20*8
> +#define SS             21*8
>
> +/* Note that this excludes the 8-byte padding. */
>  #define SIZEOF_PTREGS  21*8
>
>         .macro ALLOC_PT_GPREGS_ON_STACK
> -       addq    $-(15*8), %rsp
> +       addq    $-(16*8), %rsp
>         .endm
>
>         .macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8910=1 r11=1
> @@ -114,7 +117,7 @@
>         movq %rdi, 14*8+\offset(%rsp)
>         .endm
>         .macro SAVE_C_REGS offset=0
> -       SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1
> +       SAVE_C_REGS_HELPER 8+\offset, 1, 1, 1, 1
>         .endm
>         .macro SAVE_C_REGS_EXCEPT_RAX_RCX offset=0
>         SAVE_C_REGS_HELPER \offset, 0, 0, 1, 1
> @@ -130,43 +133,43 @@
>         .endm
>
>         .macro SAVE_EXTRA_REGS offset=0
> -       movq %r15, 0*8+\offset(%rsp)
> -       movq %r14, 1*8+\offset(%rsp)
> -       movq %r13, 2*8+\offset(%rsp)
> -       movq %r12, 3*8+\offset(%rsp)
> -       movq %rbp, 4*8+\offset(%rsp)
> -       movq %rbx, 5*8+\offset(%rsp)
> +       movq %r15, 1*8+\offset(%rsp)
> +       movq %r14, 2*8+\offset(%rsp)
> +       movq %r13, 3*8+\offset(%rsp)
> +       movq %r12, 4*8+\offset(%rsp)
> +       movq %rbp, 5*8+\offset(%rsp)
> +       movq %rbx, 6*8+\offset(%rsp)
>         .endm
>
>         .macro RESTORE_EXTRA_REGS offset=0
> -       movq 0*8+\offset(%rsp), %r15
> -       movq 1*8+\offset(%rsp), %r14
> -       movq 2*8+\offset(%rsp), %r13
> -       movq 3*8+\offset(%rsp), %r12
> -       movq 4*8+\offset(%rsp), %rbp
> -       movq 5*8+\offset(%rsp), %rbx
> +       movq 1*8+\offset(%rsp), %r15
> +       movq 2*8+\offset(%rsp), %r14
> +       movq 3*8+\offset(%rsp), %r13
> +       movq 4*8+\offset(%rsp), %r12
> +       movq 5*8+\offset(%rsp), %rbp
> +       movq 6*8+\offset(%rsp), %rbx
>         .endm
>
>         .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
>         .if \rstor_r11
> -       movq 6*8(%rsp), %r11
> +       movq 7*8(%rsp), %r11
>         .endif
>         .if \rstor_r8910
> -       movq 7*8(%rsp), %r10
> -       movq 8*8(%rsp), %r9
> -       movq 9*8(%rsp), %r8
> +       movq 8*8(%rsp), %r10
> +       movq 9*8(%rsp), %r9
> +       movq 10*8(%rsp), %r8
>         .endif
>         .if \rstor_rax
> -       movq 10*8(%rsp), %rax
> +       movq 11*8(%rsp), %rax
>         .endif
>         .if \rstor_rcx
> -       movq 11*8(%rsp), %rcx
> +       movq 12*8(%rsp), %rcx
>         .endif
>         .if \rstor_rdx
> -       movq 12*8(%rsp), %rdx
> +       movq 13*8(%rsp), %rdx
>         .endif
> -       movq 13*8(%rsp), %rsi
> -       movq 14*8(%rsp), %rdi
> +       movq 14*8(%rsp), %rsi
> +       movq 15*8(%rsp), %rdi
>         .endm
>         .macro RESTORE_C_REGS
>         RESTORE_C_REGS_HELPER 1,1,1,1,1
> @@ -185,7 +188,7 @@
>         .endm
>
>         .macro REMOVE_PT_GPREGS_FROM_STACK addskip=0
> -       subq $-(15*8+\addskip), %rsp
> +       subq $-(16*8+\addskip), %rsp
>         .endm
>
>         .macro icebp
> @@ -203,11 +206,7 @@
>   */
>  .macro ENCODE_FRAME_POINTER ptregs_offset=0
>  #ifdef CONFIG_FRAME_POINTER
> -       .if \ptregs_offset
> -               leaq \ptregs_offset(%rsp), %rbp
> -       .else
> -               mov %rsp, %rbp
> -       .endif
> +       leaq    8+\ptregs_offset(%rsp), %rbp
>         orq     $0x1, %rbp
>  #endif
>  .endm
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 5b21970..880bbb8 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -168,7 +168,7 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
>         pushq   %r9                             /* pt_regs->r9 */
>         pushq   %r10                            /* pt_regs->r10 */
>         pushq   %r11                            /* pt_regs->r11 */
> -       sub     $(6*8), %rsp                    /* pt_regs->bp, bx, r12-15 not saved */
> +       sub     $(7*8), %rsp                    /* pt_regs->bp, bx, r12-15 not saved */
>
>         /*
>          * If we need to do entry work or if we guess we'll need to do
> @@ -234,14 +234,14 @@ entry_SYSCALL_64_fastpath:
>         TRACE_IRQS_ON
>         ENABLE_INTERRUPTS(CLBR_NONE)
>         SAVE_EXTRA_REGS
> -       movq    %rsp, %rdi
> +       leaq    8(%rsp), %rdi
>         call    syscall_return_slowpath /* returns with IRQs disabled */
>         jmp     return_from_SYSCALL_64
>
>  entry_SYSCALL64_slow_path:
>         /* IRQs are off. */
>         SAVE_EXTRA_REGS
> -       movq    %rsp, %rdi
> +       leaq    8(%rsp), %rdi
>         call    do_syscall_64           /* returns with IRQs disabled */
>
>  return_from_SYSCALL_64:
> @@ -342,9 +342,9 @@ ENTRY(stub_ptregs_64)
>          * Called from fast path -- disable IRQs again, pop return address
>          * and jump to slow path
>          */
> +       popq    %rax
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
> -       popq    %rax
>         jmp     entry_SYSCALL64_slow_path
>
>  1:
> @@ -409,13 +409,14 @@ END(__switch_to_asm)
>   */
>  ENTRY(ret_from_fork)
>         movq    %rax, %rdi
> +       subq    $8, %rsp
>         call    schedule_tail                   /* rdi: 'prev' task parameter */
>
>         testq   %rbx, %rbx                      /* from kernel_thread? */
>         jnz     1f                              /* kernel threads are uncommon */
>
>  2:
> -       movq    %rsp, %rdi
> +       leaq    8(%rsp), %rdi
>         call    syscall_return_slowpath /* returns with IRQs disabled */
>         TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
>         SWAPGS
> @@ -494,10 +495,12 @@ END(irq_entries_start)
>          * a little cheaper to use a separate counter in the PDA (short of
>          * moving irq_enter into assembly, which would be too much work)
>          */
> -       movq    %rsp, %rdi
> +       movq    %rsp, %rax
> +       leaq    8(%rsp), %rdi
>         incl    PER_CPU_VAR(irq_count)
>         cmovzq  PER_CPU_VAR(irq_stack_ptr), %rsp
> -       pushq   %rdi
> +       sub     $8, %rsp
> +       pushq   %rax
>         /* We entered an interrupt context - irqs are off: */
>         TRACE_IRQS_OFF
>
> @@ -527,7 +530,7 @@ ret_from_intr:
>
>         /* Interrupt came from user space */
>  GLOBAL(retint_user)
> -       mov     %rsp,%rdi
> +       leaq    8(%rsp), %rdi
>         call    prepare_exit_to_usermode
>         TRACE_IRQS_IRETQ
>         SWAPGS
> @@ -774,7 +777,7 @@ ENTRY(\sym)
>         .endif
>         .endif
>
> -       movq    %rsp, %rdi                      /* pt_regs pointer */
> +       leaq    8(%rsp), %rdi                   /* pt_regs pointer */
>
>         .if \has_error_code
>         movq    ORIG_RAX(%rsp), %rsi            /* get error code */
> @@ -810,11 +813,11 @@ ENTRY(\sym)
>         call    error_entry
>
>
> -       movq    %rsp, %rdi                      /* pt_regs pointer */
> +       leaq    8(%rsp), %rdi                   /* pt_regs pointer */
>         call    sync_regs
> -       movq    %rax, %rsp                      /* switch stack */
> +       leaq    -8(%rax), %rsp                  /* switch stack */
>
> -       movq    %rsp, %rdi                      /* pt_regs pointer */
> +       movq    %rax, %rdi                      /* pt_regs pointer */
>
>         .if \has_error_code
>         movq    ORIG_RAX(%rsp), %rsi            /* get error code */
> @@ -895,6 +898,7 @@ ENTRY(do_softirq_own_stack)
>         mov     %rsp, %rbp
>         incl    PER_CPU_VAR(irq_count)
>         cmove   PER_CPU_VAR(irq_stack_ptr), %rsp
> +       sub     $8, %rsp
>         push    %rbp                            /* frame pointer backlink */
>         call    __do_softirq
>         leaveq
> @@ -924,10 +928,11 @@ ENTRY(xen_do_hypervisor_callback)         /* do_hypervisor_callback(struct *pt_regs) */
>   * Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
>   * see the correct pointer to the pt_regs
>   */
> -       movq    %rdi, %rsp                      /* we don't return, adjust the stack frame */
> +       leaq    -8(%rdi), %rsp                  /* we don't return, adjust the stack frame */
>  11:    incl    PER_CPU_VAR(irq_count)
>         movq    %rsp, %rbp
>         cmovzq  PER_CPU_VAR(irq_stack_ptr), %rsp
> +       subq    $8, %rsp
>         pushq   %rbp                            /* frame pointer backlink */
>         call    xen_evtchn_do_upcall
>         popq    %rsp
> @@ -1264,6 +1269,7 @@ ENTRY(nmi)
>          */
>
>         movq    %rsp, %rdi
> +       subq    $8, %rsp
>         movq    $-1, %rsi
>         call    do_nmi
>
> @@ -1475,7 +1481,7 @@ end_repeat_nmi:
>         call    paranoid_entry
>
>         /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
> -       movq    %rsp, %rdi
> +       leaq    8(%rsp), %rdi
>         movq    $-1, %rsi
>         call    do_nmi
>
> @@ -1519,7 +1525,7 @@ ENTRY(rewind_stack_do_exit)
>         xorl    %ebp, %ebp
>
>         movq    PER_CPU_VAR(cpu_current_top_of_stack), %rax
> -       leaq    -TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%rax), %rsp
> +       leaq    -TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE-8(%rax), %rsp
>
>         call    do_exit
>  1:     jmp 1b
> diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
> index e1721da..7d3f1e3 100644
> --- a/arch/x86/entry/entry_64_compat.S
> +++ b/arch/x86/entry/entry_64_compat.S
> @@ -89,6 +89,8 @@ ENTRY(entry_SYSENTER_compat)
>         pushq   $0                      /* pt_regs->r13 = 0 */
>         pushq   $0                      /* pt_regs->r14 = 0 */
>         pushq   $0                      /* pt_regs->r15 = 0 */
> +
> +       subq    $8, %rsp
>         cld
>
>         /*
> @@ -120,7 +122,7 @@ ENTRY(entry_SYSENTER_compat)
>          */
>         TRACE_IRQS_OFF
>
> -       movq    %rsp, %rdi
> +       leaq    8(%rsp), %rdi
>         call    do_fast_syscall_32
>         /* XEN PV guests always use IRET path */
>         ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
> @@ -215,13 +217,15 @@ ENTRY(entry_SYSCALL_compat)
>         pushq   $0                      /* pt_regs->r14 = 0 */
>         pushq   $0                      /* pt_regs->r15 = 0 */
>
> +       subq    $8, %rsp
> +
>         /*
>          * User mode is traced as though IRQs are on, and SYSENTER
>          * turned them off.
>          */
>         TRACE_IRQS_OFF
>
> -       movq    %rsp, %rdi
> +       leaq    8(%rsp), %rdi
>         call    do_fast_syscall_32
>         /* XEN PV guests always use IRET path */
>         ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
> @@ -324,6 +328,8 @@ ENTRY(entry_INT80_compat)
>         pushq   %r13                    /* pt_regs->r13 */
>         pushq   %r14                    /* pt_regs->r14 */
>         pushq   %r15                    /* pt_regs->r15 */
> +
> +       subq    $8, %rsp
>         cld
>
>         /*
> @@ -332,7 +338,7 @@ ENTRY(entry_INT80_compat)
>          */
>         TRACE_IRQS_OFF
>
> -       movq    %rsp, %rdi
> +       leaq    8(%rsp), %rdi
>         call    do_int80_syscall_32
>  .Lsyscall_32_done:
>
> diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
> index be36bf4..3c80aac 100644
> --- a/arch/x86/entry/thunk_64.S
> +++ b/arch/x86/entry/thunk_64.S
> @@ -33,6 +33,7 @@
>         movq 8(%rbp), %rdi
>         .endif
>
> +       sub $8, %rsp
>         call \func
>         jmp  .L_restore
>         _ASM_NOKPROBE(\name)
> @@ -58,6 +59,7 @@
>   || defined(CONFIG_DEBUG_LOCK_ALLOC) \
>   || defined(CONFIG_PREEMPT)
>  .L_restore:
> +       add $8, %rsp
>         popq %r11
>         popq %r10
>         popq %r9
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index b467b14..d03ab72 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -384,6 +384,8 @@ early_idt_handler_common:
>         pushq %r14                              /* pt_regs->r14 */
>         pushq %r15                              /* pt_regs->r15 */
>
> +       sub $8, %rsp
> +
>         cmpq $14,%rsi           /* Page fault? */
>         jnz 10f
>         GET_CR2_INTO(%rdi)      /* Can clobber any volatile register if pv */
> @@ -392,7 +394,7 @@ early_idt_handler_common:
>         jz 20f                  /* All good */
>
>  10:
> -       movq %rsp,%rdi          /* RDI = pt_regs; RSI is already trapnr */
> +       leaq 8(%rsp), %rdi      /* RDI = pt_regs; RSI is already trapnr */
>         call early_fixup_exception
>
>  20:
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index bf0c6d0..2af9f81 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -590,6 +590,7 @@ asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
>
>  struct bad_iret_stack {
>         void *error_entry_ret;
> +       void *padding;
>         struct pt_regs regs;
>  };
>
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Andy Lutomirski @ 2017-01-10 17:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Herbert Xu, Linux Kernel Mailing List, Linux Crypto Mailing List,
	Ingo Molnar, Thomas Gleixner, Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CA+55aFyuUCr3aUBhFt6T3vKyZOskLyH8Kd_2jeFkPYxxvEaXEg@mail.gmail.com>

On Tue, Jan 10, 2017 at 9:05 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Jan 10, 2017 at 6:39 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>
>> BTW this is with Debian gcc 4.7.2 which does not allow an 8-byte
>> stack alignment as attempted by the Makefile:
>
> I'm pretty sure we have random asm code that may not maintain a
> sus16-byte stack alignment when it calls other code (including, in some
> cases, calling C code).

I suspect so.

If we change this, changing pt_regs might make sense but is kind of
weird.  It also needs to be tested with and without frame pointers.

>
> So I'm not at all convinced that this is a good idea. We shouldn't
> expect 16-byte alignment to be something trustworthy.
>
>                  Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Linus Torvalds @ 2017-01-10 17:05 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linux Kernel Mailing List, Linux Crypto Mailing List, Ingo Molnar,
	Thomas Gleixner, Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <20170110143913.GA3822@gondor.apana.org.au>

On Tue, Jan 10, 2017 at 6:39 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> BTW this is with Debian gcc 4.7.2 which does not allow an 8-byte
> stack alignment as attempted by the Makefile:

I'm pretty sure we have random asm code that may not maintain a
16-byte stack alignment when it calls other code (including, in some
cases, calling C code).

So I'm not at all convinced that this is a good idea. We shouldn't
expect 16-byte alignment to be something trustworthy.

                 Linus

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-10 14:39 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Linux Crypto Mailing List,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Ard Biesheuvel
In-Reply-To: <20170110143340.GA3787@gondor.apana.org.au>

On Tue, Jan 10, 2017 at 10:33:40PM +0800, Herbert Xu wrote:
> I recently applied the patch
> 
> 	https://patchwork.kernel.org/patch/9468391/
> 
> and ended up with a boot crash when it tried to run the x86 chacha20
> code.  It turned out that the patch changed a manually aligned
> stack buffer to one that is aligned by gcc.  What was happening was
> that gcc can stack align to any value on x86-64 except 16.  The
> reason is that gcc assumes that the stack is always 16-byte aligned,
> which is not actually the case in the kernel.

BTW this is with Debian gcc 4.7.2 which does not allow an 8-byte
stack alignment as attempted by the Makefile:

$ gcc -S -O2 -mno-sse -mpreferred-stack-boundary=3 a.c
a.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
$ 

Obviously this is not an issue if your compiler actually allows
the 8-byte alignment.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* x86-64: Maintain 16-byte stack alignment
From: Herbert Xu @ 2017-01-10 14:33 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Linux Crypto Mailing List,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Ard Biesheuvel

I recently applied the patch

	https://patchwork.kernel.org/patch/9468391/

and ended up with a boot crash when it tried to run the x86 chacha20
code.  It turned out that the patch changed a manually aligned
stack buffer to one that is aligned by gcc.  What was happening was
that gcc can stack align to any value on x86-64 except 16.  The
reason is that gcc assumes that the stack is always 16-byte aligned,
which is not actually the case in the kernel.

The x86-64 CPU actually tries to keep the stack 16-byte aligned,
e.g., it'll do so when an IRQ comes in.  So the reason it doesn't
work in the kernel mostly comes down to the fact that the struct
pt_regs which lives near the top of the stack is 168 bytes which
is not a multiple of 16.

This patch tries to fix this by adding an 8-byte padding at the
top of the call-chain involving pt_regs so that when we call a C
function later we do so with an aligned stack.

The same problem probably exists on i386 too since gcc also assumes
16-byte alignment there.  It's harder to fix however as the CPU
doesn't help us in the IRQ case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 05ed3d3..29d3bcb 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -59,39 +59,42 @@
 /*
  * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
  * unless syscall needs a complete, fully filled "struct pt_regs".
+ *
+ * Note we add 8 extra bytes at the beginning to preserve stack alignment.
  */
-#define R15		0*8
-#define R14		1*8
-#define R13		2*8
-#define R12		3*8
-#define RBP		4*8
-#define RBX		5*8
+#define R15		1*8
+#define R14		2*8
+#define R13		3*8
+#define R12		4*8
+#define RBP		5*8
+#define RBX		6*8
 /* These regs are callee-clobbered. Always saved on kernel entry. */
-#define R11		6*8
-#define R10		7*8
-#define R9		8*8
-#define R8		9*8
-#define RAX		10*8
-#define RCX		11*8
-#define RDX		12*8
-#define RSI		13*8
-#define RDI		14*8
+#define R11		7*8
+#define R10		8*8
+#define R9		9*8
+#define R8		10*8
+#define RAX		11*8
+#define RCX		12*8
+#define RDX		13*8
+#define RSI		14*8
+#define RDI		15*8
 /*
  * On syscall entry, this is syscall#. On CPU exception, this is error code.
  * On hw interrupt, it's IRQ number:
  */
-#define ORIG_RAX	15*8
+#define ORIG_RAX	16*8
 /* Return frame for iretq */
-#define RIP		16*8
-#define CS		17*8
-#define EFLAGS		18*8
-#define RSP		19*8
-#define SS		20*8
+#define RIP		17*8
+#define CS		18*8
+#define EFLAGS		19*8
+#define RSP		20*8
+#define SS		21*8
 
+/* Note that this excludes the 8-byte padding. */
 #define SIZEOF_PTREGS	21*8
 
 	.macro ALLOC_PT_GPREGS_ON_STACK
-	addq	$-(15*8), %rsp
+	addq	$-(16*8), %rsp
 	.endm
 
 	.macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8910=1 r11=1
@@ -114,7 +117,7 @@
 	movq %rdi, 14*8+\offset(%rsp)
 	.endm
 	.macro SAVE_C_REGS offset=0
-	SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1
+	SAVE_C_REGS_HELPER 8+\offset, 1, 1, 1, 1
 	.endm
 	.macro SAVE_C_REGS_EXCEPT_RAX_RCX offset=0
 	SAVE_C_REGS_HELPER \offset, 0, 0, 1, 1
@@ -130,43 +133,43 @@
 	.endm
 
 	.macro SAVE_EXTRA_REGS offset=0
-	movq %r15, 0*8+\offset(%rsp)
-	movq %r14, 1*8+\offset(%rsp)
-	movq %r13, 2*8+\offset(%rsp)
-	movq %r12, 3*8+\offset(%rsp)
-	movq %rbp, 4*8+\offset(%rsp)
-	movq %rbx, 5*8+\offset(%rsp)
+	movq %r15, 1*8+\offset(%rsp)
+	movq %r14, 2*8+\offset(%rsp)
+	movq %r13, 3*8+\offset(%rsp)
+	movq %r12, 4*8+\offset(%rsp)
+	movq %rbp, 5*8+\offset(%rsp)
+	movq %rbx, 6*8+\offset(%rsp)
 	.endm
 
 	.macro RESTORE_EXTRA_REGS offset=0
-	movq 0*8+\offset(%rsp), %r15
-	movq 1*8+\offset(%rsp), %r14
-	movq 2*8+\offset(%rsp), %r13
-	movq 3*8+\offset(%rsp), %r12
-	movq 4*8+\offset(%rsp), %rbp
-	movq 5*8+\offset(%rsp), %rbx
+	movq 1*8+\offset(%rsp), %r15
+	movq 2*8+\offset(%rsp), %r14
+	movq 3*8+\offset(%rsp), %r13
+	movq 4*8+\offset(%rsp), %r12
+	movq 5*8+\offset(%rsp), %rbp
+	movq 6*8+\offset(%rsp), %rbx
 	.endm
 
 	.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
 	.if \rstor_r11
-	movq 6*8(%rsp), %r11
+	movq 7*8(%rsp), %r11
 	.endif
 	.if \rstor_r8910
-	movq 7*8(%rsp), %r10
-	movq 8*8(%rsp), %r9
-	movq 9*8(%rsp), %r8
+	movq 8*8(%rsp), %r10
+	movq 9*8(%rsp), %r9
+	movq 10*8(%rsp), %r8
 	.endif
 	.if \rstor_rax
-	movq 10*8(%rsp), %rax
+	movq 11*8(%rsp), %rax
 	.endif
 	.if \rstor_rcx
-	movq 11*8(%rsp), %rcx
+	movq 12*8(%rsp), %rcx
 	.endif
 	.if \rstor_rdx
-	movq 12*8(%rsp), %rdx
+	movq 13*8(%rsp), %rdx
 	.endif
-	movq 13*8(%rsp), %rsi
-	movq 14*8(%rsp), %rdi
+	movq 14*8(%rsp), %rsi
+	movq 15*8(%rsp), %rdi
 	.endm
 	.macro RESTORE_C_REGS
 	RESTORE_C_REGS_HELPER 1,1,1,1,1
@@ -185,7 +188,7 @@
 	.endm
 
 	.macro REMOVE_PT_GPREGS_FROM_STACK addskip=0
-	subq $-(15*8+\addskip), %rsp
+	subq $-(16*8+\addskip), %rsp
 	.endm
 
 	.macro icebp
@@ -203,11 +206,7 @@
  */
 .macro ENCODE_FRAME_POINTER ptregs_offset=0
 #ifdef CONFIG_FRAME_POINTER
-	.if \ptregs_offset
-		leaq \ptregs_offset(%rsp), %rbp
-	.else
-		mov %rsp, %rbp
-	.endif
+	leaq	8+\ptregs_offset(%rsp), %rbp
 	orq	$0x1, %rbp
 #endif
 .endm
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 5b21970..880bbb8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -168,7 +168,7 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
 	pushq	%r9				/* pt_regs->r9 */
 	pushq	%r10				/* pt_regs->r10 */
 	pushq	%r11				/* pt_regs->r11 */
-	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
+	sub	$(7*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
 
 	/*
 	 * If we need to do entry work or if we guess we'll need to do
@@ -234,14 +234,14 @@ entry_SYSCALL_64_fastpath:
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
 	SAVE_EXTRA_REGS
-	movq	%rsp, %rdi
+	leaq	8(%rsp), %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	jmp	return_from_SYSCALL_64
 
 entry_SYSCALL64_slow_path:
 	/* IRQs are off. */
 	SAVE_EXTRA_REGS
-	movq	%rsp, %rdi
+	leaq	8(%rsp), %rdi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
 return_from_SYSCALL_64:
@@ -342,9 +342,9 @@ ENTRY(stub_ptregs_64)
 	 * Called from fast path -- disable IRQs again, pop return address
 	 * and jump to slow path
 	 */
+	popq	%rax
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
-	popq	%rax
 	jmp	entry_SYSCALL64_slow_path
 
 1:
@@ -409,13 +409,14 @@ END(__switch_to_asm)
  */
 ENTRY(ret_from_fork)
 	movq	%rax, %rdi
+	subq	$8, %rsp
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
 	testq	%rbx, %rbx			/* from kernel_thread? */
 	jnz	1f				/* kernel threads are uncommon */
 
 2:
-	movq	%rsp, %rdi
+	leaq	8(%rsp), %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
@@ -494,10 +495,12 @@ END(irq_entries_start)
 	 * a little cheaper to use a separate counter in the PDA (short of
 	 * moving irq_enter into assembly, which would be too much work)
 	 */
-	movq	%rsp, %rdi
+	movq	%rsp, %rax
+	leaq	8(%rsp), %rdi
 	incl	PER_CPU_VAR(irq_count)
 	cmovzq	PER_CPU_VAR(irq_stack_ptr), %rsp
-	pushq	%rdi
+	sub	$8, %rsp
+	pushq	%rax
 	/* We entered an interrupt context - irqs are off: */
 	TRACE_IRQS_OFF
 
@@ -527,7 +530,7 @@ ret_from_intr:
 
 	/* Interrupt came from user space */
 GLOBAL(retint_user)
-	mov	%rsp,%rdi
+	leaq	8(%rsp), %rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
 	SWAPGS
@@ -774,7 +777,7 @@ ENTRY(\sym)
 	.endif
 	.endif
 
-	movq	%rsp, %rdi			/* pt_regs pointer */
+	leaq	8(%rsp), %rdi			/* pt_regs pointer */
 
 	.if \has_error_code
 	movq	ORIG_RAX(%rsp), %rsi		/* get error code */
@@ -810,11 +813,11 @@ ENTRY(\sym)
 	call	error_entry
 
 
-	movq	%rsp, %rdi			/* pt_regs pointer */
+	leaq	8(%rsp), %rdi			/* pt_regs pointer */
 	call	sync_regs
-	movq	%rax, %rsp			/* switch stack */
+	leaq	-8(%rax), %rsp			/* switch stack */
 
-	movq	%rsp, %rdi			/* pt_regs pointer */
+	movq	%rax, %rdi			/* pt_regs pointer */
 
 	.if \has_error_code
 	movq	ORIG_RAX(%rsp), %rsi		/* get error code */
@@ -895,6 +898,7 @@ ENTRY(do_softirq_own_stack)
 	mov	%rsp, %rbp
 	incl	PER_CPU_VAR(irq_count)
 	cmove	PER_CPU_VAR(irq_stack_ptr), %rsp
+	sub	$8, %rsp
 	push	%rbp				/* frame pointer backlink */
 	call	__do_softirq
 	leaveq
@@ -924,10 +928,11 @@ ENTRY(xen_do_hypervisor_callback)		/* do_hypervisor_callback(struct *pt_regs) */
  * Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
  * see the correct pointer to the pt_regs
  */
-	movq	%rdi, %rsp			/* we don't return, adjust the stack frame */
+	leaq	-8(%rdi), %rsp			/* we don't return, adjust the stack frame */
 11:	incl	PER_CPU_VAR(irq_count)
 	movq	%rsp, %rbp
 	cmovzq	PER_CPU_VAR(irq_stack_ptr), %rsp
+	subq	$8, %rsp
 	pushq	%rbp				/* frame pointer backlink */
 	call	xen_evtchn_do_upcall
 	popq	%rsp
@@ -1264,6 +1269,7 @@ ENTRY(nmi)
 	 */
 
 	movq	%rsp, %rdi
+	subq	$8, %rsp
 	movq	$-1, %rsi
 	call	do_nmi
 
@@ -1475,7 +1481,7 @@ end_repeat_nmi:
 	call	paranoid_entry
 
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
-	movq	%rsp, %rdi
+	leaq	8(%rsp), %rdi
 	movq	$-1, %rsi
 	call	do_nmi
 
@@ -1519,7 +1525,7 @@ ENTRY(rewind_stack_do_exit)
 	xorl	%ebp, %ebp
 
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rax
-	leaq	-TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%rax), %rsp
+	leaq	-TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE-8(%rax), %rsp
 
 	call	do_exit
 1:	jmp 1b
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e1721da..7d3f1e3 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -89,6 +89,8 @@ ENTRY(entry_SYSENTER_compat)
 	pushq   $0			/* pt_regs->r13 = 0 */
 	pushq   $0			/* pt_regs->r14 = 0 */
 	pushq   $0			/* pt_regs->r15 = 0 */
+
+	subq	$8, %rsp
 	cld
 
 	/*
@@ -120,7 +122,7 @@ ENTRY(entry_SYSENTER_compat)
 	 */
 	TRACE_IRQS_OFF
 
-	movq	%rsp, %rdi
+	leaq	8(%rsp), %rdi
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
@@ -215,13 +217,15 @@ ENTRY(entry_SYSCALL_compat)
 	pushq   $0			/* pt_regs->r14 = 0 */
 	pushq   $0			/* pt_regs->r15 = 0 */
 
+	subq	$8, %rsp
+
 	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
 	 */
 	TRACE_IRQS_OFF
 
-	movq	%rsp, %rdi
+	leaq	8(%rsp), %rdi
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
@@ -324,6 +328,8 @@ ENTRY(entry_INT80_compat)
 	pushq   %r13                    /* pt_regs->r13 */
 	pushq   %r14                    /* pt_regs->r14 */
 	pushq   %r15                    /* pt_regs->r15 */
+
+	subq	$8, %rsp
 	cld
 
 	/*
@@ -332,7 +338,7 @@ ENTRY(entry_INT80_compat)
 	 */
 	TRACE_IRQS_OFF
 
-	movq	%rsp, %rdi
+	leaq	8(%rsp), %rdi
 	call	do_int80_syscall_32
 .Lsyscall_32_done:
 
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index be36bf4..3c80aac 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -33,6 +33,7 @@
 	movq 8(%rbp), %rdi
 	.endif
 
+	sub $8, %rsp
 	call \func
 	jmp  .L_restore
 	_ASM_NOKPROBE(\name)
@@ -58,6 +59,7 @@
  || defined(CONFIG_DEBUG_LOCK_ALLOC) \
  || defined(CONFIG_PREEMPT)
 .L_restore:
+	add $8, %rsp
 	popq %r11
 	popq %r10
 	popq %r9
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index b467b14..d03ab72 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -384,6 +384,8 @@ early_idt_handler_common:
 	pushq %r14				/* pt_regs->r14 */
 	pushq %r15				/* pt_regs->r15 */
 
+	sub $8, %rsp
+
 	cmpq $14,%rsi		/* Page fault? */
 	jnz 10f
 	GET_CR2_INTO(%rdi)	/* Can clobber any volatile register if pv */
@@ -392,7 +394,7 @@ early_idt_handler_common:
 	jz 20f			/* All good */
 
 10:
-	movq %rsp,%rdi		/* RDI = pt_regs; RSI is already trapnr */
+	leaq 8(%rsp), %rdi	/* RDI = pt_regs; RSI is already trapnr */
 	call early_fixup_exception
 
 20:
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bf0c6d0..2af9f81 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -590,6 +590,7 @@ asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
 
 struct bad_iret_stack {
 	void *error_entry_ret;
+	void *padding;
 	struct pt_regs regs;
 };
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* Re: [PATCH v8 1/1] crypto: add virtio-crypto driver
From: Christian Borntraeger @ 2017-01-10 12:56 UTC (permalink / raw)
  To: Gonglei (Arei), linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org
  Cc: Huangweidong (C), Claudio Fontana, mst@redhat.com, Luonengjun,
	Hanweidong (Randy), Xuquan (Quan Xu), Wanzongshun (Vincent),
	stefanha@redhat.com, Zhoujian (jay, Euler), longpeng,
	arei.gonglei@hotmail.com, davem@davemloft.net, Wubin (H),
	herbert@gondor.apana.org.au
In-Reply-To: <33183CC9F5247A488A2544077AF19020DA182B83@DGGEMA505-MBX.china.huawei.com>

On 01/10/2017 01:36 PM, Gonglei (Arei) wrote:
> Hi,
> 
>>
>> On 12/15/2016 03:03 AM, Gonglei wrote:
>> [...]
>>> +
>>> +static struct crypto_alg virtio_crypto_algs[] = { {
>>> +	.cra_name = "cbc(aes)",
>>> +	.cra_driver_name = "virtio_crypto_aes_cbc",
>>> +	.cra_priority = 501,
>>
>>
>> This is still higher than the hardware-accelerators (like intel aesni or the
>> s390 cpacf functions or the arm hw). aesni and s390/cpacf are supported by the
>> hardware virtualization and available to the guests. I do not see a way how
>> virtio
>> crypto can be faster than that (in the end it might be cpacf/aesni + overhead)
>> instead it will very likely be slower.
>> So we should use a number that is higher than software implementations but
>> lower than the hw ones.
>>
>> Just grepping around, the software ones seem be be around 100 and the
>> hardware
>> ones around 200-400. So why was 150 not enough?
>>
> I didn't find a documentation about how we use the priority, and I assumed
> people use virtio-crypto will configure hardware accelerators in the
> host. So I choosed the number which bigger than aesni's priority.

Yes, but the aesni driver will only bind if there is HW support in the guest.
And if aesni is available in the guest (or the s390 aes function from cpacf)
it will always be faster than the same in the host via virtio.So your priority
should be smaller.

^ permalink raw reply

* RE: [PATCH v8 1/1] crypto: add virtio-crypto driver
From: Gonglei (Arei) @ 2017-01-10 12:36 UTC (permalink / raw)
  To: Christian Borntraeger, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org
  Cc: Huangweidong (C), Claudio Fontana, mst@redhat.com, Luonengjun,
	Hanweidong (Randy), Xuquan (Quan Xu), Wanzongshun (Vincent),
	stefanha@redhat.com, Zhoujian (jay, Euler), longpeng,
	arei.gonglei@hotmail.com, davem@davemloft.net, Wubin (H),
	herbert@gondor.apana.org.au
In-Reply-To: <e235caf4-f51c-dccf-4eb7-548d3fa71644@de.ibm.com>

Hi,

> 
> On 12/15/2016 03:03 AM, Gonglei wrote:
> [...]
> > +
> > +static struct crypto_alg virtio_crypto_algs[] = { {
> > +	.cra_name = "cbc(aes)",
> > +	.cra_driver_name = "virtio_crypto_aes_cbc",
> > +	.cra_priority = 501,
> 
> 
> This is still higher than the hardware-accelerators (like intel aesni or the
> s390 cpacf functions or the arm hw). aesni and s390/cpacf are supported by the
> hardware virtualization and available to the guests. I do not see a way how
> virtio
> crypto can be faster than that (in the end it might be cpacf/aesni + overhead)
> instead it will very likely be slower.
> So we should use a number that is higher than software implementations but
> lower than the hw ones.
> 
> Just grepping around, the software ones seem be be around 100 and the
> hardware
> ones around 200-400. So why was 150 not enough?
> 
I didn't find a documentation about how we use the priority, and I assumed
people use virtio-crypto will configure hardware accelerators in the
host. So I choosed the number which bigger than aesni's priority.

Regards,
-Gonglei

^ permalink raw reply

* Re: [PATCH v8 1/1] crypto: add virtio-crypto driver
From: Christian Borntraeger @ 2017-01-10 12:27 UTC (permalink / raw)
  To: Gonglei, linux-kernel, qemu-devel, virtio-dev, virtualization,
	linux-crypto
  Cc: xuquan8, weidong.huang, herbert, hanweidong, claudio.fontana, mst,
	luonengjun, wanzongshun, stefanha, jianjay.zhou, longpeng2, davem,
	wu.wubin, arei.gonglei
In-Reply-To: <1481767396-187748-2-git-send-email-arei.gonglei@huawei.com>

On 12/15/2016 03:03 AM, Gonglei wrote:
[...]
> +
> +static struct crypto_alg virtio_crypto_algs[] = { {
> +	.cra_name = "cbc(aes)",
> +	.cra_driver_name = "virtio_crypto_aes_cbc",
> +	.cra_priority = 501,


This is still higher than the hardware-accelerators (like intel aesni or the
s390 cpacf functions or the arm hw). aesni and s390/cpacf are supported by the
hardware virtualization and available to the guests. I do not see a way how virtio
crypto can be faster than that (in the end it might be cpacf/aesni + overhead)
instead it will very likely be slower.
So we should use a number that is higher than software implementations but
lower than the hw ones.

Just grepping around, the software ones seem be be around 100 and the hardware 
ones around 200-400. So why was 150 not enough?

Christian

^ permalink raw reply

* Re: [PATCH v2 1/4] lib: Update LZ4 compressor module based on LZ4 v1.7.2.
From: Willy Tarreau @ 2017-01-10 10:50 UTC (permalink / raw)
  To: Greg KH
  Cc: Sven Schmidt, akpm, bongkyu.kim, rsalvaterra, sergey.senozhatsky,
	linux-kernel, herbert, davem, linux-crypto, anton, ccross,
	keescook, tony.luck, phillip
In-Reply-To: <20170110100032.GC32419@kroah.com>

On Tue, Jan 10, 2017 at 11:00:32AM +0100, Greg KH wrote:
> On Tue, Jan 10, 2017 at 10:21:16AM +0100, Sven Schmidt wrote:
> > On 01/08/2017 12:25 PM, Greg KH wrote:
> > >On Sat, Jan 07, 2017 at 05:55:42PM +0100, Sven Schmidt wrote:
> > >> This patch updates LZ4 kernel module to LZ4 v1.7.2 by Yann Collet.
> > >> The kernel module is inspired by the previous work by Chanho Min.
> > >> The updated LZ4 module will not break existing code since there were alias
> > >> methods added to ensure backwards compatibility.
> > >
> > > Meta-comment.  Does this update include all of the security fixes that
> > > we have made over the past few years to the lz4 code?  I don't want to
> > > be adding back insecure functions that will cause us problems.
> > >
> > > Specifically look at the changes I made in 2014 in this directory for an
> > > example of what I am talking about here.
> > >
> > 
> > Hi Greg,
> > 
> > it doesn't. I didn't have that in mind until now.
> 
> Ick, those changes never got made "upstream"?  Not good, but makes sense
> as we couldn't really find an "upstream" when we made them :(

I *seem* to remember that some of these changes were specific to our
implementation, and were discovered during a review after we worked on
the the LZO implementation bugs, though I could be wrong.

If this is the case, it is one more reason for being extra careful.

> As you took this code from somewhere, you might want to also push your
> changes for these issues there as well, so that others don't run into
> them in the future.

Agreed!

Willy

^ permalink raw reply

* Re: [PATCH v2 1/4] lib: Update LZ4 compressor module based on LZ4 v1.7.2.
From: Greg KH @ 2017-01-10 10:00 UTC (permalink / raw)
  To: Sven Schmidt
  Cc: akpm, bongkyu.kim, rsalvaterra, sergey.senozhatsky, linux-kernel,
	herbert, davem, linux-crypto, anton, ccross, keescook, tony.luck,
	phillip
In-Reply-To: <1484040076-5004-1-git-send-email-4sschmid@informatik.uni-hamburg.de>

On Tue, Jan 10, 2017 at 10:21:16AM +0100, Sven Schmidt wrote:
> On 01/08/2017 12:25 PM, Greg KH wrote:
> >On Sat, Jan 07, 2017 at 05:55:42PM +0100, Sven Schmidt wrote:
> >> This patch updates LZ4 kernel module to LZ4 v1.7.2 by Yann Collet.
> >> The kernel module is inspired by the previous work by Chanho Min.
> >> The updated LZ4 module will not break existing code since there were alias
> >> methods added to ensure backwards compatibility.
> >
> > Meta-comment.  Does this update include all of the security fixes that
> > we have made over the past few years to the lz4 code?  I don't want to
> > be adding back insecure functions that will cause us problems.
> >
> > Specifically look at the changes I made in 2014 in this directory for an
> > example of what I am talking about here.
> >
> 
> Hi Greg,
> 
> it doesn't. I didn't have that in mind until now.

Ick, those changes never got made "upstream"?  Not good, but makes sense
as we couldn't really find an "upstream" when we made them :(

As you took this code from somewhere, you might want to also push your
changes for these issues there as well, so that others don't run into
them in the future.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v2 1/4] lib: Update LZ4 compressor module based on LZ4 v1.7.2.
From: Greg KH @ 2017-01-10  9:59 UTC (permalink / raw)
  To: Sven Schmidt
  Cc: akpm, bongkyu.kim, rsalvaterra, sergey.senozhatsky, linux-kernel,
	herbert, davem, linux-crypto, anton, ccross, keescook, tony.luck,
	phillip
In-Reply-To: <1484040737-5400-1-git-send-email-4sschmid@informatik.uni-hamburg.de>

On Tue, Jan 10, 2017 at 10:32:17AM +0100, Sven Schmidt wrote:
> On 01/08/2017 12:23 PM, Greg KH wrote:
> > And follow the proper kernel coding style rules, putting your patches
> > through scripts/checkpatch.pl should help you out here.
> 
> Sorry, I didn't know about that particular script. I already put
> Patches 2, 3 and 4 through checkpatch.pl; patch 1 is a little more to
> rework but I will, of course, do that.

You do know about Documentation/SubmittingPatches, right?  If not, take
a look at that as well :)

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v2 4/4] fs/pstore: fs/squashfs: Change usage of LZ4 to comply with new LZ4 module version
From: Sven Schmidt @ 2017-01-10  9:45 UTC (permalink / raw)
  To: keescook
  Cc: gregkh, akpm, bongkyu.kim, rsalvaterra, sergey.senozhatsky,
	linux-kernel, herbert, davem, linux-crypto, anton, ccross,
	tony.luck, phillip, Sven Schmidt
In-Reply-To: <CAGXu5jKqMUMU8DbnE=AamrCp_-JN7sGRCpaGMps16X_TshfK+Q@mail.gmail.com>

Hi Kees,

On 01/07/2017 10:33 PM, Kees Cook wrote:
 >On Sat, Jan 7, 2017 at 8:55 AM, Sven Schmidt
 ><4sschmid@informatik.uni-hamburg.de> wrote:
 >> This patch updates fs/pstore and fs/squashfs to use the updated functions from
 >> the new LZ4 module.
 >>
 >> Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
 >> ---
 >>  fs/pstore/platform.c      | 14 ++++++++------
 >>  fs/squashfs/lz4_wrapper.c | 12 ++++++------
 >>  2 files changed, 14 insertions(+), 12 deletions(-)
 >>
 >> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
 >> index 729677e..a0d8ca8 100644
 >> --- a/fs/pstore/platform.c
 >> +++ b/fs/pstore/platform.c
 >> @@ -342,31 +342,33 @@ static int compress_lz4(const void *in, void *out, size_t inlen, size_t outlen)
 >>  {
 >>         int ret;
 >>
 >> -       ret = lz4_compress(in, inlen, out, &outlen, workspace);
 >> +       ret = LZ4_compress_default(in, out, inlen, outlen, workspace);
 >>         if (ret) {
 >> +               // ret is 0 means an error occured
 >
 >If that's true, then shouldn't the "if" logic be changed? Also, here
 >and in all following comments are C++ style instead of kernel C-style.
 >This should be "/* ret == 0 means an error occured */", though really,
 >that should be obvious from the code and the comment isn't really
 >needed.

indeed, it should. I fixed that one.

 >>                 pr_err("lz4_compress error, ret = %d!\n", ret);
 >If it's always going to be zero here, is there a better place to get
 >details on why it failed?

 It is always going to be zero. Honestly, after looking at the current LZ4 in kernel again
 I don't get why there actually was a need to print the return value since lz4_compress
 will (as far as I see) always return -1 in case of an error while the new lz4_compress_fast/default
 will return 0 in such case. Maybe I should just stick with the error?

 Thanks,

 Sven

^ permalink raw reply

* Re: [PATCH v2 1/4] lib: Update LZ4 compressor module based on LZ4 v1.7.2.
From: Sven Schmidt @ 2017-01-10  9:32 UTC (permalink / raw)
  To: gregkh
  Cc: akpm, bongkyu.kim, rsalvaterra, sergey.senozhatsky, linux-kernel,
	herbert, davem, linux-crypto, anton, ccross, keescook, tony.luck,
	phillip, Sven Schmidt
In-Reply-To: <20170108112223.GA12798@kroah.com>

On 01/08/2017 12:23 PM, Greg KH wrote:
> On Sat, Jan 07, 2017 at 05:55:43PM +0100, Sven Schmidt wrote:
>> This patch updates the unlz4 wrapper to work with the new LZ4 kernel module version.
>>
>> Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
>> ---
>>  lib/decompress_unlz4.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
>> index 036fc88..1b0baf3 100644
>> --- a/lib/decompress_unlz4.c
>> +++ b/lib/decompress_unlz4.c
>> @@ -72,7 +72,7 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
>>  		error("NULL input pointer and missing fill function");
>>  		goto exit_1;
>>  	} else {
>> -		inp = large_malloc(lz4_compressbound(uncomp_chunksize));
>> +		inp = large_malloc(LZ4_compressBound(uncomp_chunksize));
>
> Having functions differ by different cases of the characters is ripe for
> abuse and confusion.  Please never do that, especially as these "new"
> functions you created don't follow the correct kernel coding style rules
> :(
>
> thanks,
>
> greg k-h

Hi Greg,

you're right about that. I think I put a little too much effort in keeping old function names here.
I will get rid of that.

I also want to quote your previous E-Mail here:

> And follow the proper kernel coding style rules, putting your patches
> through scripts/checkpatch.pl should help you out here.

Sorry, I didn't know about that particular script. I already put Patches 2, 3 and 4 through checkpatch.pl;
patch 1 is a little more to rework but I will, of course, do that.

Thanks,

Sven

^ permalink raw reply

* Re: [PATCH v2 1/4] lib: Update LZ4 compressor module based on LZ4 v1.7.2.
From: Sven Schmidt @ 2017-01-10  9:21 UTC (permalink / raw)
  To: gregkh
  Cc: akpm, bongkyu.kim, rsalvaterra, sergey.senozhatsky, linux-kernel,
	herbert, davem, linux-crypto, anton, ccross, keescook, tony.luck,
	phillip, Sven Schmidt
In-Reply-To: <20170108112542.GC12798@kroah.com>

On 01/08/2017 12:25 PM, Greg KH wrote:
>On Sat, Jan 07, 2017 at 05:55:42PM +0100, Sven Schmidt wrote:
>> This patch updates LZ4 kernel module to LZ4 v1.7.2 by Yann Collet.
>> The kernel module is inspired by the previous work by Chanho Min.
>> The updated LZ4 module will not break existing code since there were alias
>> methods added to ensure backwards compatibility.
>
> Meta-comment.  Does this update include all of the security fixes that
> we have made over the past few years to the lz4 code?  I don't want to
> be adding back insecure functions that will cause us problems.
>
> Specifically look at the changes I made in 2014 in this directory for an
> example of what I am talking about here.
>

Hi Greg,

it doesn't. I didn't have that in mind until now.
But I had a look at the related commits after I received your E-Mail and will review what I have to change
(and, obviously, do the changes :)).

Thanks,

Sven

^ permalink raw reply

* [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
From: Stephan Müller @ 2017-01-10  1:36 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto

Hi,

to all driver maintainers: the patches I added are compile tested, but
I do not have the hardware to verify the code. May I ask the respective
hardware maintainers to verify that the code is appropriate and works
as intended? Thanks a lot.

Herbert, this is my proprosal for our discussion around copying the
AAD for algif_aead. Instead of adding the code to algif_aead and wait
until it transpires to all cipher implementations, I thought it would
be more helpful to fix all cipher implementations.

To do so, the AAD copy function found in authenc is extracted to a global
service function. Furthermore, the generic AEAD TFM initialization code
now allocates the null cipher too. This allows us now to only invoke
the AAD copy function in the various implementations without any additional
allocation logic.

The code for x86 and the generic code was tested with libkcapi.

The code for the drivers is compile tested for drivers applicable to
x86 only. All others are neither compile tested nor functionally tested.

Stephan Mueller (13):
  crypto: service function to copy AAD from src to dst
  crypto: gcm_generic - copy AAD during encryption
  crypto: ccm_generic - copy AAD during encryption
  crypto: rfc4106-gcm-aesni - copy AAD during encryption
  crypto: ccm-aes-ce - copy AAD during encryption
  crypto: talitos - copy AAD during encryption
  crypto: picoxcell - copy AAD during encryption
  crypto: ixp4xx - copy AAD during encryption
  crypto: atmel - copy AAD during encryption
  crypto: caam - copy AAD during encryption
  crypto: chelsio - copy AAD during encryption
  crypto: nx - copy AAD during encryption
  crypto: qat - copy AAD during encryption

 arch/arm64/crypto/aes-ce-ccm-glue.c      |  4 ++++
 arch/x86/crypto/aesni-intel_glue.c       |  5 +++++
 crypto/Kconfig                           |  4 ++--
 crypto/aead.c                            | 36 ++++++++++++++++++++++++++++++--
 crypto/authenc.c                         | 36 ++++----------------------------
 crypto/ccm.c                             | 10 +++++++++
 crypto/gcm.c                             | 17 +++++++++++++++
 drivers/crypto/atmel-aes.c               |  6 ++++++
 drivers/crypto/caam/caamalg.c            |  8 +++++++
 drivers/crypto/chelsio/chcr_algo.c       |  5 +++++
 drivers/crypto/ixp4xx_crypto.c           |  6 ++++++
 drivers/crypto/nx/nx-aes-ccm.c           |  4 ++++
 drivers/crypto/nx/nx-aes-gcm.c           | 10 +++++++++
 drivers/crypto/picoxcell_crypto.c        |  5 +++++
 drivers/crypto/qat/qat_common/qat_algs.c |  4 ++++
 drivers/crypto/talitos.c                 |  5 +++++
 include/crypto/aead.h                    |  2 ++
 include/crypto/internal/aead.h           | 12 +++++++++++
 18 files changed, 143 insertions(+), 36 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [PATCH 01/13] crypto: service function to copy AAD from src to dst
From: Stephan Müller @ 2017-01-10  1:36 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

The service function crypto_aead_copy_ad uses the null cipher to copy
the AAD from teh source to the destination SGL. The copy operation is
prevented when the source and destination SGL is identical.

The required null cipher is allocated during the allocation of the TFM
and released with the TFM. Therefore, a use of the helper function only
requires adding an invocation of this function in the encrypt code path
shortly before the encrypt operation is invoked but after the SGLs are
set with the AEAD request.

The patch converts the authenc implementation to use this service
function instead of its own copy logic.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/Kconfig                 |  4 ++--
 crypto/aead.c                  | 36 ++++++++++++++++++++++++++++++++++--
 crypto/authenc.c               | 36 ++++--------------------------------
 include/crypto/aead.h          |  2 ++
 include/crypto/internal/aead.h | 12 ++++++++++++
 5 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 160f08e..f53928a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -44,6 +44,8 @@ config CRYPTO_AEAD
 	tristate
 	select CRYPTO_AEAD2
 	select CRYPTO_ALGAPI
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_NULL
 
 config CRYPTO_AEAD2
 	tristate
@@ -227,10 +229,8 @@ config CRYPTO_MCRYPTD
 config CRYPTO_AUTHENC
 	tristate "Authenc support"
 	select CRYPTO_AEAD
-	select CRYPTO_BLKCIPHER
 	select CRYPTO_MANAGER
 	select CRYPTO_HASH
-	select CRYPTO_NULL
 	help
 	  Authenc: Combined mode wrapper for IPsec.
 	  This is required for IPSec.
diff --git a/crypto/aead.c b/crypto/aead.c
index 3f5c5ff..e9291e6 100644
--- a/crypto/aead.c
+++ b/crypto/aead.c
@@ -79,11 +79,31 @@ int crypto_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
 }
 EXPORT_SYMBOL_GPL(crypto_aead_setauthsize);
 
+int crypto_aead_copy_ad(struct aead_request *req)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	SKCIPHER_REQUEST_ON_STACK(skreq, aead->null);
+
+	/* No copy operation if src and dst are identical. */
+	if (req->src == req->dst)
+		return 0;
+
+	skcipher_request_set_tfm(skreq, aead->null);
+	skcipher_request_set_callback(skreq, aead_request_flags(req),
+				      NULL, NULL);
+	skcipher_request_set_crypt(skreq, req->src, req->dst, req->assoclen,
+				   NULL);
+
+	return crypto_skcipher_encrypt(skreq);
+}
+EXPORT_SYMBOL_GPL(crypto_aead_copy_ad);
+
 static void crypto_aead_exit_tfm(struct crypto_tfm *tfm)
 {
 	struct crypto_aead *aead = __crypto_aead_cast(tfm);
 	struct aead_alg *alg = crypto_aead_alg(aead);
 
+	crypto_put_default_null_skcipher2();
 	alg->exit(aead);
 }
 
@@ -91,14 +111,26 @@ static int crypto_aead_init_tfm(struct crypto_tfm *tfm)
 {
 	struct crypto_aead *aead = __crypto_aead_cast(tfm);
 	struct aead_alg *alg = crypto_aead_alg(aead);
+	struct crypto_skcipher *null;
+	int err;
 
 	aead->authsize = alg->maxauthsize;
 
+	null = crypto_get_default_null_skcipher2();
+	err = PTR_ERR(null);
+	if (IS_ERR(null))
+		return err;
+	aead->null = null;
+
 	if (alg->exit)
 		aead->base.exit = crypto_aead_exit_tfm;
 
-	if (alg->init)
-		return alg->init(aead);
+	if (alg->init) {
+		err = alg->init(aead);
+		if (err)
+			crypto_put_default_null_skcipher2();
+		return err;
+	}
 
 	return 0;
 }
diff --git a/crypto/authenc.c b/crypto/authenc.c
index 875470b..a7a304a 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -14,7 +14,6 @@
 #include <crypto/internal/hash.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/authenc.h>
-#include <crypto/null.h>
 #include <crypto/scatterwalk.h>
 #include <linux/err.h>
 #include <linux/init.h>
@@ -33,7 +32,6 @@ struct authenc_instance_ctx {
 struct crypto_authenc_ctx {
 	struct crypto_ahash *auth;
 	struct crypto_skcipher *enc;
-	struct crypto_skcipher *null;
 };
 
 struct authenc_request_ctx {
@@ -180,21 +178,6 @@ static void crypto_authenc_encrypt_done(struct crypto_async_request *req,
 	authenc_request_complete(areq, err);
 }
 
-static int crypto_authenc_copy_assoc(struct aead_request *req)
-{
-	struct crypto_aead *authenc = crypto_aead_reqtfm(req);
-	struct crypto_authenc_ctx *ctx = crypto_aead_ctx(authenc);
-	SKCIPHER_REQUEST_ON_STACK(skreq, ctx->null);
-
-	skcipher_request_set_tfm(skreq, ctx->null);
-	skcipher_request_set_callback(skreq, aead_request_flags(req),
-				      NULL, NULL);
-	skcipher_request_set_crypt(skreq, req->src, req->dst, req->assoclen,
-				   NULL);
-
-	return crypto_skcipher_encrypt(skreq);
-}
-
 static int crypto_authenc_encrypt(struct aead_request *req)
 {
 	struct crypto_aead *authenc = crypto_aead_reqtfm(req);
@@ -212,13 +195,12 @@ static int crypto_authenc_encrypt(struct aead_request *req)
 	src = scatterwalk_ffwd(areq_ctx->src, req->src, req->assoclen);
 	dst = src;
 
-	if (req->src != req->dst) {
-		err = crypto_authenc_copy_assoc(req);
-		if (err)
-			return err;
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
 
+	if (req->src != req->dst)
 		dst = scatterwalk_ffwd(areq_ctx->dst, req->dst, req->assoclen);
-	}
 
 	skcipher_request_set_tfm(skreq, enc);
 	skcipher_request_set_callback(skreq, aead_request_flags(req),
@@ -317,7 +299,6 @@ static int crypto_authenc_init_tfm(struct crypto_aead *tfm)
 	struct crypto_authenc_ctx *ctx = crypto_aead_ctx(tfm);
 	struct crypto_ahash *auth;
 	struct crypto_skcipher *enc;
-	struct crypto_skcipher *null;
 	int err;
 
 	auth = crypto_spawn_ahash(&ictx->auth);
@@ -329,14 +310,8 @@ static int crypto_authenc_init_tfm(struct crypto_aead *tfm)
 	if (IS_ERR(enc))
 		goto err_free_ahash;
 
-	null = crypto_get_default_null_skcipher2();
-	err = PTR_ERR(null);
-	if (IS_ERR(null))
-		goto err_free_skcipher;
-
 	ctx->auth = auth;
 	ctx->enc = enc;
-	ctx->null = null;
 
 	crypto_aead_set_reqsize(
 		tfm,
@@ -350,8 +325,6 @@ static int crypto_authenc_init_tfm(struct crypto_aead *tfm)
 
 	return 0;
 
-err_free_skcipher:
-	crypto_free_skcipher(enc);
 err_free_ahash:
 	crypto_free_ahash(auth);
 	return err;
@@ -363,7 +336,6 @@ static void crypto_authenc_exit_tfm(struct crypto_aead *tfm)
 
 	crypto_free_ahash(ctx->auth);
 	crypto_free_skcipher(ctx->enc);
-	crypto_put_default_null_skcipher2();
 }
 
 static void crypto_authenc_free(struct aead_instance *inst)
diff --git a/include/crypto/aead.h b/include/crypto/aead.h
index 03b9762..9872f92 100644
--- a/include/crypto/aead.h
+++ b/include/crypto/aead.h
@@ -16,6 +16,7 @@
 #include <linux/crypto.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
+#include <crypto/skcipher.h>
 
 /**
  * DOC: Authenticated Encryption With Associated Data (AEAD) Cipher API
@@ -155,6 +156,7 @@ struct crypto_aead {
 	unsigned int authsize;
 	unsigned int reqsize;
 
+	struct crypto_skcipher *null;
 	struct crypto_tfm base;
 };
 
diff --git a/include/crypto/internal/aead.h b/include/crypto/internal/aead.h
index 6ad8e31..01050c0 100644
--- a/include/crypto/internal/aead.h
+++ b/include/crypto/internal/aead.h
@@ -187,5 +187,17 @@ void crypto_unregister_aeads(struct aead_alg *algs, int count);
 int aead_register_instance(struct crypto_template *tmpl,
 			   struct aead_instance *inst);
 
+/**
+ * crypto_aead_copy_ad - copy the AAD from src to dst buffers
+ * @req: AEAD cipher request
+ *
+ * This function is intended for the encrypt operation only as the ciphertext
+ * should be accompanied by the AAD. The copy operation is performed with
+ * the null cipher that is allocated during initialization of the AEAD TFM.
+ *
+ * Return: 0 upon success, < 0 in case of error
+ */
+int crypto_aead_copy_ad(struct aead_request *req);
+
 #endif	/* _CRYPTO_INTERNAL_AEAD_H */
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH 02/13] crypto: gcm_generic - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:37 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

>From 8690e52f7e90663b14b2490d37a3b5b1c5a53bfe Mon Sep 17 00:00:00 2001
From: Stephan Mueller <smueller@chronox.de>
Date: Mon, 9 Jan 2017 12:35:35 +0100
Subject: 

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/gcm.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index b7ad808..1ca0b05c 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -496,11 +496,16 @@ static int crypto_gcm_encrypt(struct aead_request *req)
 	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
 	struct skcipher_request *skreq = &pctx->u.skreq;
 	u32 flags = aead_request_flags(req);
+	int err;
 
 	crypto_gcm_init_common(req);
 	crypto_gcm_init_crypt(req, req->cryptlen);
 	skcipher_request_set_callback(skreq, flags, gcm_encrypt_done, req);
 
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	return crypto_skcipher_encrypt(skreq) ?:
 	       gcm_encrypt_continue(req, flags);
 }
@@ -866,9 +871,15 @@ static struct aead_request *crypto_rfc4106_crypt(struct aead_request *req)
 
 static int crypto_rfc4106_encrypt(struct aead_request *req)
 {
+	int err;
+
 	if (req->assoclen != 16 && req->assoclen != 20)
 		return -EINVAL;
 
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	req = crypto_rfc4106_crypt(req);
 
 	return crypto_aead_encrypt(req);
@@ -1099,6 +1110,12 @@ static int crypto_rfc4543_copy_src_to_dst(struct aead_request *req, bool enc)
 
 static int crypto_rfc4543_encrypt(struct aead_request *req)
 {
+	int err;
+
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	return crypto_rfc4543_crypt(req, true);
 }
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH 03/13] crypto: ccm_generic - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:37 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/ccm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/crypto/ccm.c b/crypto/ccm.c
index 26b924d..e2b6d3d 100644
--- a/crypto/ccm.c
+++ b/crypto/ccm.c
@@ -358,6 +358,10 @@ static int crypto_ccm_encrypt(struct aead_request *req)
 	if (err)
 		return err;
 
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	err = crypto_ccm_auth(req, sg_next(pctx->src), cryptlen);
 	if (err)
 		return err;
@@ -749,9 +753,15 @@ static struct aead_request *crypto_rfc4309_crypt(struct aead_request *req)
 
 static int crypto_rfc4309_encrypt(struct aead_request *req)
 {
+	int err;
+
 	if (req->assoclen != 16 && req->assoclen != 20)
 		return -EINVAL;
 
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	req = crypto_rfc4309_crypt(req);
 
 	return crypto_aead_encrypt(req);
-- 
2.9.3

^ permalink raw reply related

* [PATCH 04/13] crypto: rfc4106-gcm-aesni - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:37 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 arch/x86/crypto/aesni-intel_glue.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 36ca150..6149018 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -882,6 +882,7 @@ static int rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct cryptd_aead **ctx = crypto_aead_ctx(tfm);
 	struct cryptd_aead *cryptd_tfm = *ctx;
+	int err;
 
 	tfm = &cryptd_tfm->base;
 	if (irq_fpu_usable() && (!in_atomic() ||
@@ -890,6 +891,10 @@ static int rfc4106_encrypt(struct aead_request *req)
 
 	aead_request_set_tfm(req, tfm);
 
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	return crypto_aead_encrypt(req);
 }
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH 05/13] crypto: ccm-aes-ce - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:38 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

>From acad5f6af68d18bc1dbac460bd44f2298902018b Mon Sep 17 00:00:00 2001
From: Stephan Mueller <smueller@chronox.de>
Date: Mon, 9 Jan 2017 14:32:02 +0100
Subject: 

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index cc5515d..7dfcb1f 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -155,6 +155,10 @@ static int ccm_encrypt(struct aead_request *req)
 	u32 len = req->cryptlen;
 	int err;
 
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	err = ccm_init_mac(req, mac, len);
 	if (err)
 		return err;
-- 
2.9.3

^ permalink raw reply related

* [PATCH 06/13] crypto: talitos - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:38 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 drivers/crypto/talitos.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 0bba6a1..6a6e343 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -1429,6 +1429,11 @@ static int aead_encrypt(struct aead_request *req)
 	struct crypto_aead *authenc = crypto_aead_reqtfm(req);
 	struct talitos_ctx *ctx = crypto_aead_ctx(authenc);
 	struct talitos_edesc *edesc;
+	int err;
+
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
 
 	/* allocate extended descriptor */
 	edesc = aead_edesc_alloc(req, req->iv, 0, true);
-- 
2.9.3

^ permalink raw reply related

* [PATCH 07/13] crypto: picoxcell - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:38 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 drivers/crypto/picoxcell_crypto.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/crypto/picoxcell_crypto.c b/drivers/crypto/picoxcell_crypto.c
index 4757609..ef955d3 100644
--- a/drivers/crypto/picoxcell_crypto.c
+++ b/drivers/crypto/picoxcell_crypto.c
@@ -688,6 +688,11 @@ static int spacc_aead_encrypt(struct aead_request *req)
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
 	struct spacc_aead *alg = to_spacc_aead(crypto_aead_alg(aead));
+	int err;
+
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
 
 	return spacc_aead_setup(req, alg->type, 1);
 }
-- 
2.9.3

^ permalink raw reply related

* [PATCH 08/13] crypto: ixp4xx - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:39 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 drivers/crypto/ixp4xx_crypto.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/crypto/ixp4xx_crypto.c b/drivers/crypto/ixp4xx_crypto.c
index 7868765..20a5bd8 100644
--- a/drivers/crypto/ixp4xx_crypto.c
+++ b/drivers/crypto/ixp4xx_crypto.c
@@ -1178,6 +1178,12 @@ static int aead_setkey(struct crypto_aead *tfm, const u8 *key,
 
 static int aead_encrypt(struct aead_request *req)
 {
+	int err;
+
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	return aead_perform(req, 1, req->assoclen, req->cryptlen, req->iv);
 }
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH 09/13] crypto: atmel - copy AAD during encryption
From: Stephan Müller @ 2017-01-10  1:39 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <10526995.lyZ7Je1KMx@positron.chronox.de>

Invoke the crypto_aead_copy_ad function during the encryption code path
to copy the AAD from the source to the destination buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 drivers/crypto/atmel-aes.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/crypto/atmel-aes.c b/drivers/crypto/atmel-aes.c
index 0e3d0d6..48ecf72 100644
--- a/drivers/crypto/atmel-aes.c
+++ b/drivers/crypto/atmel-aes.c
@@ -1752,6 +1752,12 @@ static int atmel_aes_gcm_setauthsize(struct crypto_aead *tfm,
 
 static int atmel_aes_gcm_encrypt(struct aead_request *req)
 {
+	int err;
+
+	err = crypto_aead_copy_ad(req);
+	if (err)
+		return err;
+
 	return atmel_aes_gcm_crypt(req, AES_FLAGS_ENCRYPT);
 }
 
-- 
2.9.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox