Re: [PATCH v13 7/7] x86: vdso: Wire up getrandom() vDSO implementation

linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Eric Biggers <ebiggers@kernel.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	Carlos O'Donell <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	Samuel Neves <sneves@dei.uc.pt>
Subject: Re: [PATCH v13 7/7] x86: vdso: Wire up getrandom() vDSO implementation
Date: Wed, 21 Dec 2022 15:27:04 -0800	[thread overview]
Message-ID: <Y6OWSM18QL977nbC@sol.localdomain> (raw)
In-Reply-To: <20221221142327.126451-8-Jason@zx2c4.com>

On Wed, Dec 21, 2022 at 03:23:27PM +0100, Jason A. Donenfeld wrote:
> diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S
> new file mode 100644
> index 000000000000..91fbb7ac7af4
> --- /dev/null
> +++ b/arch/x86/entry/vdso/vgetrandom-chacha.S
> @@ -0,0 +1,177 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/frame.h>
> +
> +.section	.rodata.cst16.CONSTANTS, "aM", @progbits, 16
> +.align 16
> +CONSTANTS:	.octa 0x6b20657479622d323320646e61707865
> +.text

For simplicity, maybe leave off the section mergeability stuff and just have
plain ".section .rodata"?

> +/*
> + * Very basic SSE2 implementation of ChaCha20. Produces a given positive number
> + * of blocks of output with a nonce of 0, taking an input key and 8-byte
> + * counter. Importantly does not spill to the stack. Its arguments are:
> + *
> + *	rdi: output bytes
> + *	rsi: 32-byte key input
> + *	rdx: 8-byte counter input/output
> + *	rcx: number of 64-byte blocks to write to output
> + */
> +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
> +
> +#define output  %rdi
> +#define key     %rsi
> +#define counter %rdx
> +#define nblocks %rcx
> +#define i       %al
> +#define state0  %xmm0
> +#define state1  %xmm1
> +#define state2  %xmm2
> +#define state3  %xmm3
> +#define copy0   %xmm4
> +#define copy1   %xmm5
> +#define copy2   %xmm6
> +#define copy3   %xmm7
> +#define temp    %xmm8
> +#define one     %xmm9

It would be worth mentioning in the function comment that none of the xmm
registers are callee-save.  That was not obvious to me.  I know that on arm64,
*kernel* code doesn't need to save/restore NEON registers, so it's not something
that arch/arm64/crypto/ does.  But, it *is* needed in arm64 userspace code.  So
I was worried that something similar would apply to x86_64, but it seems not.

> +	/* state1[0,1,2,3] = state1[0,3,2,1] */
> +	pshufd		$0x39,state1,state1
> +	/* state2[0,1,2,3] = state2[1,0,3,2] */
> +	pshufd		$0x4e,state2,state2
> +	/* state3[0,1,2,3] = state3[2,1,0,3] */
> +	pshufd		$0x93,state3,state3

The comments don't match the pshufd constants.  The code is correct but the
comments are not.  They should be:

	/* state1[0,1,2,3] = state1[1,2,3,0] */
	pshufd		$0x39,state1,state1
	/* state2[0,1,2,3] = state2[2,3,0,1] */
	pshufd		$0x4e,state2,state2
	/* state3[0,1,2,3] = state3[3,0,1,2] */
	pshufd		$0x93,state3,state3

> +	/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */
> +	paddd		state1,state0
> +	pxor		state0,state3
> +	movdqa		state3,temp
> +	pslld		$16,temp
> +	psrld		$16,state3
> +	por		temp,state3
> +
> +	/* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */
> +	paddd		state3,state2
> +	pxor		state2,state1
> +	movdqa		state1,temp
> +	pslld		$12,temp
> +	psrld		$20,state1
> +	por		temp,state1
> +
> +	/* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */
> +	paddd		state1,state0
> +	pxor		state0,state3
> +	movdqa		state3,temp
> +	pslld		$8,temp
> +	psrld		$24,state3
> +	por		temp,state3
> +
> +	/* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */
> +	paddd		state3,state2
> +	pxor		state2,state1
> +	movdqa		state1,temp
> +	pslld		$7,temp
> +	psrld		$25,state1
> +	por		temp,state1

The above sequence of 24 instructions is repeated twice, so maybe it should be a
macro (".chacha_round"?).

> +	/* state1[0,1,2,3] = state1[2,1,0,3] */
> +	pshufd		$0x93,state1,state1
> +	/* state2[0,1,2,3] = state2[1,0,3,2] */
> +	pshufd		$0x4e,state2,state2
> +	/* state3[0,1,2,3] = state3[0,3,2,1] */
> +	pshufd		$0x39,state3,state3

Similarly, the above comments are wrong.  They should be:

	/* state1[0,1,2,3] = state1[3,0,1,2] */
	pshufd		$0x93,state1,state1
	/* state2[0,1,2,3] = state2[2,3,0,1] */
	pshufd		$0x4e,state2,state2
	/* state3[0,1,2,3] = state3[1,2,3,0] */
	pshufd		$0x39,state3,state3

- Eric

next prev parent reply	other threads:[~2022-12-21 23:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-21 14:23 [PATCH v13 0/7] implement getrandom() in vDSO Jason A. Donenfeld
2022-12-21 14:23 ` [PATCH v13 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace Jason A. Donenfeld
2022-12-21 14:23 ` [PATCH v13 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings Jason A. Donenfeld
2022-12-21 14:23 ` [PATCH v13 3/7] x86: mm: Skip faulting instruction for VM_DROPPABLE faults Jason A. Donenfeld
2022-12-21 14:23 ` [PATCH v13 4/7] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
2022-12-21 14:23 ` [PATCH v13 5/7] arch: allocate vgetrandom_alloc() syscall number Jason A. Donenfeld
2022-12-21 14:23 ` [PATCH v13 6/7] random: introduce generic vDSO getrandom() implementation Jason A. Donenfeld
2022-12-21 22:23   ` Eric Biggers
2023-01-01 15:53     ` Jason A. Donenfeld
2022-12-21 14:23 ` [PATCH v13 7/7] x86: vdso: Wire up getrandom() vDSO implementation Jason A. Donenfeld
2022-12-21 23:27   ` Eric Biggers [this message]
2023-01-01 16:21     ` Jason A. Donenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y6OWSM18QL977nbC@sol.localdomain \
    --to=ebiggers@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=arnd@arndb.de \
    --cc=brauner@kernel.org \
    --cc=carlos@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=sneves@dei.uc.pt \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).