* [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO
@ 2024-08-15 13:33 Xi Ruoyao
  2024-08-15 13:33 ` [PATCH v2 1/2] LoongArch: Perform alternative runtime patching on vDSO Xi Ruoyao
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Xi Ruoyao @ 2024-08-15 13:33 UTC (permalink / raw)
  To: Jason A . Donenfeld, Huacai Chen, WANG Xuerui
  Cc: linux-crypto, loongarch, Jinyang He, Tiezhu Yang, Arnd Bergmann,
	Xi Ruoyao

For the rationale to implement getrandom() in vDSO see [1].

The vDSO getrandom() needs a stack-less ChaCha20 implementation, so we
need to add architecture-specific code and wire it up with the generic
code.

Without LSX it's not easy to implement ChaCha20 without using the stack.
So the current implementation just falls back to a getrandom() syscall if
LSX is unavailable.  In the 1st patch the existing alternative runtime
patching mechanism is expanded to cover vDSO, so we don't need to invoke
cpucfg for each vDSO getrandom() call.

Then in the 2nd patch stack-less ChaCha20 is implemented with LSX.  The
code is basically a direct translation of the x86 SSE2 implementation.
One annoying thing here is that the compiler generates a memset() call
for a "large" struct initialization in a cold path and there seems to be
no way to prevent it.  So a naive memset implementation is copied from
the kernel code into vDSO.
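
As a rough illustration of what the LSX routine computes, here is a plain
scalar C sketch of the same block function (zero nonce, 8-byte counter).
The name chacha20_blocks_ref and the argument layout are assumptions made
for this sketch, not part of the patch, and unlike the vDSO code this
version freely uses the stack, so it is only a reference for checking
output.  Note that rotl32(v, n) is the same as rotating right by 32 - n,
which is how a rotate-right instruction can express the 16/12/8/7 left
rotations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate left by n bits, valid for 0 < n < 32. */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on four 32-bit words. */
#define QR(a, b, c, d) do {			\
	a += b; d = rotl32(d ^ a, 16);		\
	c += d; b = rotl32(b ^ c, 12);		\
	a += b; d = rotl32(d ^ a, 8);		\
	c += d; b = rotl32(b ^ c, 7);		\
} while (0)

/*
 * Hypothetical reference routine (not from the patch): writes nblocks
 * 64-byte ChaCha20 blocks to dst and advances the 64-bit counter, which
 * is stored as two 32-bit words, low word first.  NOT stack-less.
 */
void chacha20_blocks_ref(uint8_t *dst, const uint32_t key[8],
			 uint32_t counter[2], size_t nblocks)
{
	/* "expand 32-byte k" as four little-endian words */
	static const uint32_t constants[4] = {
		0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
	};
	uint32_t init[16], x[16];
	int i;

	assert(nblocks > 0);	/* like the assembly, expects at least one block */
	do {
		for (i = 0; i < 4; i++)
			init[i] = constants[i];
		for (i = 0; i < 8; i++)
			init[4 + i] = key[i];
		init[12] = counter[0];
		init[13] = counter[1];
		init[14] = init[15] = 0;	/* zero nonce */
		for (i = 0; i < 16; i++)
			x[i] = init[i];

		/* 10 double rounds = 20 rounds */
		for (i = 0; i < 10; i++) {
			/* column round */
			QR(x[0], x[4], x[8],  x[12]);
			QR(x[1], x[5], x[9],  x[13]);
			QR(x[2], x[6], x[10], x[14]);
			QR(x[3], x[7], x[11], x[15]);
			/* diagonal round */
			QR(x[0], x[5], x[10], x[15]);
			QR(x[1], x[6], x[11], x[12]);
			QR(x[2], x[7], x[8],  x[13]);
			QR(x[3], x[4], x[9],  x[14]);
		}

		/* output = permuted state + initial state, little-endian */
		for (i = 0; i < 16; i++) {
			uint32_t w = x[i] + init[i];

			dst[4 * i + 0] = (uint8_t)(w >> 0);
			dst[4 * i + 1] = (uint8_t)(w >> 8);
			dst[4 * i + 2] = (uint8_t)(w >> 16);
			dst[4 * i + 3] = (uint8_t)(w >> 24);
		}
		dst += 64;

		/* 64-bit counter increment with carry into the high word */
		if (++counter[0] == 0)
			++counter[1];
	} while (--nblocks);
}
```

The vector code keeps each row of the 4x4 state in one 128-bit register
and rotates the rows between the two half-rounds; the diagonal round in
the sketch above does the same thing through indexing.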

The implementation is tested with the kernel selftests added by the last
patch in [1].  I had to make some adjustments to make it work on
LoongArch (see [2]; I've not submitted the changes as of now because I'm
unsure about the KHDR_INCLUDES addition).  The vdso_test_getrandom
bench-single result:

       vdso: 25000000 times in 0.631345201 seconds
       libc: 25000000 times in 6.953121083 seconds
    syscall: 25000000 times in 6.992112386 seconds

The vdso_test_getrandom bench-multi result:

       vdso: 25000000 x 256 times in 29.558284986 seconds
       libc: 25000000 x 256 times in 356.633930139 seconds
    syscall: 25000000 x 256 times in 334.885555338 seconds

[1]:https://lore.kernel.org/all/20240712014009.281406-1-Jason@zx2c4.com/
[2]:https://github.com/xry111/linux/commits/xry111/la-vdso/

v1->v2: Remove Cc: lists in the cover letter and just type them in git
send-email command.  I assumed the Cc: lists in the cover letter would be
"propagated" to the patches by git send-email but I was wrong, so v1 was
never properly delivered to the lists.

Xi Ruoyao (2):
  LoongArch: Perform alternative runtime patching on vDSO
  LoongArch: vDSO: Wire up getrandom() vDSO implementation

 arch/loongarch/Kconfig                      |   1 +
 arch/loongarch/include/asm/vdso/getrandom.h |  47 ++++++
 arch/loongarch/include/asm/vdso/vdso.h      |   8 +
 arch/loongarch/kernel/asm-offsets.c         |  10 ++
 arch/loongarch/kernel/vdso.c                |  14 +-
 arch/loongarch/vdso/Makefile                |   2 +
 arch/loongarch/vdso/memset.S                |  24 +++
 arch/loongarch/vdso/vdso.lds.S              |   7 +
 arch/loongarch/vdso/vgetrandom-alt.S        |  19 +++
 arch/loongarch/vdso/vgetrandom-chacha.S     | 162 ++++++++++++++++++++
 arch/loongarch/vdso/vgetrandom.c            |  16 ++
 11 files changed, 309 insertions(+), 1 deletion(-)
 create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h
 create mode 100644 arch/loongarch/vdso/memset.S
 create mode 100644 arch/loongarch/vdso/vgetrandom-alt.S
 create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S
 create mode 100644 arch/loongarch/vdso/vgetrandom.c

-- 
2.46.0


* [PATCH v2 1/2] LoongArch: Perform alternative runtime patching on vDSO
  2024-08-15 13:33 [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Xi Ruoyao
@ 2024-08-15 13:33 ` Xi Ruoyao
  2024-08-15 13:33 ` [PATCH v2 2/2] LoongArch: vDSO: Wire up getrandom() vDSO implementation Xi Ruoyao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Xi Ruoyao @ 2024-08-15 13:33 UTC (permalink / raw)
  To: Jason A . Donenfeld, Huacai Chen, WANG Xuerui
  Cc: linux-crypto, loongarch, Jinyang He, Tiezhu Yang, Arnd Bergmann,
	Xi Ruoyao

To implement getrandom() in vDSO, we need to implement stack-less
ChaCha20.  ChaCha20 is designed to be SIMD-friendly, but LSX is not
guaranteed to be available on all LoongArch CPU models.  Perform
alternative runtime patching on vDSO so we'll be able to use LSX in
vDSO.
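
The idea of alternative patching can be sketched in user space as
follows.  This is a simplified toy model, not the kernel's
apply_alternatives(): struct toy_alt_entry, TOY_NOP and
toy_apply_alternatives() are hypothetical names, and the real table uses
its own entry layout and real nop encodings.  Each entry records where
the default instructions live, where the replacement lives, and which
CPU feature enables the replacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NOP 0x00u	/* placeholder padding byte, not a real encoding */

/* Hypothetical, simplified patch-table entry. */
struct toy_alt_entry {
	size_t instr_off;	/* offset of the default code in the image */
	size_t repl_off;	/* offset of the replacement code */
	unsigned int feature;	/* feature bit that enables the replacement */
	size_t instr_len;	/* bytes reserved for the default code */
	size_t repl_len;	/* bytes of replacement code */
};

/*
 * Walk the table once at init time.  If the CPU has the feature, copy
 * the replacement over the default code and pad the remainder, so the
 * hot path never has to test the feature at run time.
 */
void toy_apply_alternatives(uint8_t *image,
			    const struct toy_alt_entry *begin,
			    const struct toy_alt_entry *end,
			    uint32_t cpu_features)
{
	const struct toy_alt_entry *e;

	for (e = begin; e < end; e++) {
		/* the replacement must fit in the reserved slot */
		assert(e->repl_len <= e->instr_len);

		if (!(cpu_features & (UINT32_C(1) << e->feature)))
			continue;	/* keep the default code */

		memmove(image + e->instr_off, image + e->repl_off,
			e->repl_len);
		memset(image + e->instr_off + e->repl_len, TOY_NOP,
		       e->instr_len - e->repl_len);
	}
}
```

In this patch the "image" is the vDSO text and the table boundaries come
from the linker script symbols around the .altinstructions section.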

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---
 arch/loongarch/kernel/vdso.c   | 8 +++++++-
 arch/loongarch/vdso/vdso.lds.S | 6 ++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
index 90dfccb41c14..d606ddf65b97 100644
--- a/arch/loongarch/kernel/vdso.c
+++ b/arch/loongarch/kernel/vdso.c
@@ -17,6 +17,7 @@
 #include <linux/time_namespace.h>
 #include <linux/timekeeper_internal.h>
 
+#include <asm/alternative.h>
 #include <asm/page.h>
 #include <asm/vdso.h>
 #include <vdso/helpers.h>
@@ -99,7 +100,7 @@ struct loongarch_vdso_info vdso_info = {
 
 static int __init init_vdso(void)
 {
-	unsigned long i, cpu, pfn;
+	unsigned long i, cpu, pfn, vdso;
 
 	BUG_ON(!PAGE_ALIGNED(vdso_info.vdso));
 	BUG_ON(!PAGE_ALIGNED(vdso_info.size));
@@ -111,6 +112,11 @@ static int __init init_vdso(void)
 	for (i = 0; i < vdso_info.size / PAGE_SIZE; i++)
 		vdso_info.code_mapping.pages[i] = pfn_to_page(pfn + i);
 
+	vdso = (unsigned long)vdso_info.vdso;
+
+	apply_alternatives((struct alt_instr *)(vdso + vdso_offset_alt),
+			   (struct alt_instr *)(vdso + vdso_offset_alt_end));
+
 	return 0;
 }
 subsys_initcall(init_vdso);
diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S
index 56ad855896de..746d31bd4e90 100644
--- a/arch/loongarch/vdso/vdso.lds.S
+++ b/arch/loongarch/vdso/vdso.lds.S
@@ -35,6 +35,12 @@ SECTIONS
 
 	.rodata		: { *(.rodata*) }		:text
 
+	.altinstructions : ALIGN(4) {
+		VDSO_alt = .;
+		*(.altinstructions)
+		VDSO_alt_end = .;
+	} :text
+
 	_end = .;
 	PROVIDE(end = .);
 
-- 
2.46.0


* [PATCH v2 2/2] LoongArch: vDSO: Wire up getrandom() vDSO implementation
  2024-08-15 13:33 [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Xi Ruoyao
  2024-08-15 13:33 ` [PATCH v2 1/2] LoongArch: Perform alternative runtime patching on vDSO Xi Ruoyao
@ 2024-08-15 13:33 ` Xi Ruoyao
  2024-08-15 14:04 ` [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Jason A. Donenfeld
  2024-08-15 14:08 ` Jason A. Donenfeld
  3 siblings, 0 replies; 9+ messages in thread
From: Xi Ruoyao @ 2024-08-15 13:33 UTC (permalink / raw)
  To: Jason A . Donenfeld, Huacai Chen, WANG Xuerui
  Cc: linux-crypto, loongarch, Jinyang He, Tiezhu Yang, Arnd Bergmann,
	Xi Ruoyao

Hook up the generic vDSO implementation to the LoongArch vDSO data page:
embed struct vdso_rng_data into struct loongarch_vdso_data, and use an
assembler hack to resolve the symbol name "_vdso_rng_data" (which is
expected by the generic vDSO implementation) to the rng_data field in
loongarch_vdso_data.
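
Both the symbol alias and the vDSO-side accessor boil down to "base
address of the arch data plus the offset of the embedded field".  A
minimal stand-alone sketch of that address computation, using made-up
toy_* types (assumptions for illustration, not the kernel structures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Toy stand-ins for the per-CPU data and the RNG data. */
struct toy_pcpu_data {
	uint32_t node;
};

struct toy_rng_data {
	uint64_t generation;
	uint8_t is_ready;
};

/* The RNG data is embedded after the per-CPU array. */
struct toy_arch_vdso_data {
	struct toy_pcpu_data pdata[TOY_NR_CPUS];
	struct toy_rng_data rng_data;
};

/*
 * Locate the embedded RNG data from the base address of the enclosing
 * structure, using only offsetof().
 */
const struct toy_rng_data *toy_get_rng_data(const void *arch_data_base)
{
	assert(arch_data_base != NULL);
	return (const struct toy_rng_data *)
		((const char *)arch_data_base +
		 offsetof(struct toy_arch_vdso_data, rng_data));
}
```

The kernel side cannot use a C expression for a symbol definition, which
is why the patch exports the offset through asm-offsets and applies it
with a .set directive instead.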

The vDSO function requires a ChaCha20 implementation that does not write
to the stack, yet can still do an entire ChaCha20 permutation, so
provide this using LSX.  For processors lacking LSX just fall back to a
getrandom() syscall.

The compiler (GCC 14.2) calls memset() for initializing a "large" struct
in a cold path of the generic vDSO getrandom() code.  There seems to be
no way to prevent it from calling memset(), and it's a cold path so the
performance does not matter, so just provide a naive memset()
implementation for vDSO.
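
For reference, the byte-at-a-time loop is equivalent to the C sketch
below.  naive_memset is a hypothetical name used only here.  The
volatile-qualified pointer is a deliberate choice for this sketch: an
optimizing compiler may recognize a plain store loop and turn it back
into a memset() call, which would defeat the purpose, and writing the
routine in assembly sidesteps that as well.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical C equivalent of the byte-wise loop: store the low 8 bits
 * of c into each of the first n bytes at dst and return dst.
 */
void *naive_memset(void *dst, int c, size_t n)
{
	/* volatile keeps the compiler from replacing the loop with memset() */
	volatile unsigned char *p = dst;

	assert(dst != NULL || n == 0);
	while (n--)
		*p++ = (unsigned char)c;

	return dst;
}
```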

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---
 arch/loongarch/Kconfig                      |   1 +
 arch/loongarch/include/asm/vdso/getrandom.h |  47 ++++++
 arch/loongarch/include/asm/vdso/vdso.h      |   8 +
 arch/loongarch/kernel/asm-offsets.c         |  10 ++
 arch/loongarch/kernel/vdso.c                |   6 +
 arch/loongarch/vdso/Makefile                |   2 +
 arch/loongarch/vdso/memset.S                |  24 +++
 arch/loongarch/vdso/vdso.lds.S              |   1 +
 arch/loongarch/vdso/vgetrandom-alt.S        |  19 +++
 arch/loongarch/vdso/vgetrandom-chacha.S     | 162 ++++++++++++++++++++
 arch/loongarch/vdso/vgetrandom.c            |  16 ++
 11 files changed, 296 insertions(+)
 create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h
 create mode 100644 arch/loongarch/vdso/memset.S
 create mode 100644 arch/loongarch/vdso/vgetrandom-alt.S
 create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S
 create mode 100644 arch/loongarch/vdso/vgetrandom.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 70f169210b52..56b3fc8feb0b 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -190,6 +190,7 @@ config LOONGARCH
 	select TRACE_IRQFLAGS_SUPPORT
 	select USE_PERCPU_NUMA_NODE_ID
 	select USER_STACKTRACE_SUPPORT
+	select VDSO_GETRANDOM if CPU_HAS_LSX
 	select ZONE_DMA32
 
 config 32BIT
diff --git a/arch/loongarch/include/asm/vdso/getrandom.h b/arch/loongarch/include/asm/vdso/getrandom.h
new file mode 100644
index 000000000000..a369588a4ebf
--- /dev/null
+++ b/arch/loongarch/include/asm/vdso/getrandom.h
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
+ */
+#ifndef __ASM_VDSO_GETRANDOM_H
+#define __ASM_VDSO_GETRANDOM_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/unistd.h>
+#include <asm/vdso/vdso.h>
+
+static __always_inline ssize_t getrandom_syscall(void *_buffer,
+						 size_t _len,
+						 unsigned int _flags)
+{
+	register long ret asm("a0");
+	register long int nr asm("a7") = __NR_getrandom;
+	register void *buffer asm("a0") = _buffer;
+	register size_t len asm("a1") = _len;
+	register unsigned int flags asm("a2") = _flags;
+
+	asm volatile(
+	"      syscall 0\n"
+	: "+r" (ret)
+	: "r" (nr), "r" (buffer), "r" (len), "r" (flags)
+	: "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7", "$t8",
+	  "memory");
+
+	return ret;
+}
+
+static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
+	void)
+{
+	return (const struct vdso_rng_data *)(
+		get_vdso_data() +
+		VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
+		offsetof(struct loongarch_vdso_data, rng_data));
+}
+
+extern void __arch_chacha20_blocks_nostack(u8 *dst_bytes, const u32 *key,
+					   u32 *counter, size_t nblocks);
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_GETRANDOM_H */
diff --git a/arch/loongarch/include/asm/vdso/vdso.h b/arch/loongarch/include/asm/vdso/vdso.h
index 5a12309d9fb5..a2e24c3007e2 100644
--- a/arch/loongarch/include/asm/vdso/vdso.h
+++ b/arch/loongarch/include/asm/vdso/vdso.h
@@ -4,6 +4,9 @@
  * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
  */
 
+#ifndef _ASM_VDSO_VDSO_H
+#define _ASM_VDSO_VDSO_H
+
 #ifndef __ASSEMBLY__
 
 #include <asm/asm.h>
@@ -16,6 +19,9 @@ struct vdso_pcpu_data {
 
 struct loongarch_vdso_data {
 	struct vdso_pcpu_data pdata[NR_CPUS];
+#ifdef CONFIG_VDSO_GETRANDOM
+	struct vdso_rng_data rng_data;
+#endif
 };
 
 /*
@@ -63,3 +69,5 @@ static inline unsigned long get_vdso_data(void)
 }
 
 #endif /* __ASSEMBLY__ */
+
+#endif
diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
index bee9f7a3108f..86f6d8a6dc23 100644
--- a/arch/loongarch/kernel/asm-offsets.c
+++ b/arch/loongarch/kernel/asm-offsets.c
@@ -14,6 +14,7 @@
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 #include <asm/ftrace.h>
+#include <asm/vdso/vdso.h>
 
 static void __used output_ptreg_defines(void)
 {
@@ -321,3 +322,12 @@ static void __used output_kvm_defines(void)
 	OFFSET(KVM_GPGD, kvm, arch.pgd);
 	BLANK();
 }
+
+#ifdef CONFIG_VDSO_GETRANDOM
+static void __used output_vdso_rng_defines(void)
+{
+	COMMENT("LoongArch VDSO getrandom offsets.");
+	OFFSET(VDSO_RNG_DATA, loongarch_vdso_data, rng_data);
+	BLANK();
+}
+#endif
diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
index d606ddf65b97..d500436f252b 100644
--- a/arch/loongarch/kernel/vdso.c
+++ b/arch/loongarch/kernel/vdso.c
@@ -23,6 +23,7 @@
 #include <vdso/helpers.h>
 #include <vdso/vsyscall.h>
 #include <vdso/datapage.h>
+#include <generated/asm-offsets.h>
 #include <generated/vdso-offsets.h>
 
 extern char vdso_start[], vdso_end[];
@@ -35,6 +36,11 @@ static union {
 	struct loongarch_vdso_data vdata;
 } loongarch_vdso_data __page_aligned_data;
 
+#ifdef CONFIG_VDSO_GETRANDOM
+asm(".globl _vdso_rng_data\n"
+    ".set _vdso_rng_data, loongarch_vdso_data + " __stringify(VDSO_RNG_DATA));
+#endif
+
 static struct page *vdso_pages[] = { NULL };
 struct vdso_data *vdso_data = generic_vdso_data.data;
 struct vdso_pcpu_data *vdso_pdata = loongarch_vdso_data.vdata.pdata;
diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile
index 2ddf0480e710..4be33ec54d1d 100644
--- a/arch/loongarch/vdso/Makefile
+++ b/arch/loongarch/vdso/Makefile
@@ -6,6 +6,8 @@ include $(srctree)/lib/vdso/Makefile
 
 obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
 
+obj-vdso-$(CONFIG_VDSO_GETRANDOM) += vgetrandom.o vgetrandom-chacha.o vgetrandom-alt.o memset.o
+
 # Common compiler flags between ABIs.
 ccflags-vdso := \
 	$(filter -I%,$(KBUILD_CFLAGS)) \
diff --git a/arch/loongarch/vdso/memset.S b/arch/loongarch/vdso/memset.S
new file mode 100644
index 000000000000..ec1531683936
--- /dev/null
+++ b/arch/loongarch/vdso/memset.S
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A copy of __memset_generic from arch/loongarch/lib/memset.S for vDSO.
+ *
+ * Copyright (C) 2020-2024 Loongson Technology Corporation Limited
+ */
+
+#include <asm/regdef.h>
+#include <linux/linkage.h>
+
+SYM_FUNC_START(memset)
+	move	a3, a0
+	beqz	a2, 2f
+
+1:	st.b	a1, a0, 0
+	addi.d	a0, a0, 1
+	addi.d	a2, a2, -1
+	bgt	a2, zero, 1b
+
+2:	move	a0, a3
+	jr	ra
+SYM_FUNC_END(memset)
+
+.hidden memset
diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S
index 746d31bd4e90..ac63dc080bc9 100644
--- a/arch/loongarch/vdso/vdso.lds.S
+++ b/arch/loongarch/vdso/vdso.lds.S
@@ -69,6 +69,7 @@ VERSION
 		__vdso_clock_gettime;
 		__vdso_gettimeofday;
 		__vdso_rt_sigreturn;
+		__vdso_getrandom;
 	local: *;
 	};
 }
diff --git a/arch/loongarch/vdso/vgetrandom-alt.S b/arch/loongarch/vdso/vgetrandom-alt.S
new file mode 100644
index 000000000000..655b9f0dfece
--- /dev/null
+++ b/arch/loongarch/vdso/vgetrandom-alt.S
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
+ *
+ */
+
+#include <asm/alternative-asm.h>
+#include <asm/cpu.h>
+#include <asm/unistd.h>
+#include <asm/regdef.h>
+#include <linux/linkage.h>
+
+SYM_FUNC_START(__vdso_getrandom)
+	ALTERNATIVE	__stringify(li.w a7, __NR_getrandom; syscall 0; jr ra), \
+			"b __vdso_getrandom_lsx", CPU_FEATURE_LSX
+SYM_FUNC_END(__vdso_getrandom)
+
+.weak	getrandom
+.set	getrandom, __vdso_getrandom
diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S
new file mode 100644
index 000000000000..be385b04c3ea
--- /dev/null
+++ b/arch/loongarch/vdso/vgetrandom-chacha.S
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
+ *
+ * Based on arch/x86/entry/vdso/vgetrandom-chacha.S:
+ *
+ * Copyright (C) 2022-2024 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights
+ * Reserved.
+ */
+
+#include <asm/asm.h>
+#include <asm/regdef.h>
+#include <linux/linkage.h>
+
+.section	.rodata
+.align 4
+CONSTANTS:	.octa 0x6b20657479622d323320646e61707865
+
+.text
+
+/*
+ * Very basic LSX implementation of ChaCha20. Produces a given positive
+ * number of blocks of output with a nonce of 0, taking an input key and
+ * 8-byte counter. Importantly does not spill to the stack. Its arguments
+ * are:
+ *
+ *	a0: output bytes
+ *	a1: 32-byte key input
+ *	a2: 8-byte counter input/output
+ *	a3: number of 64-byte blocks to write to output
+ */
+SYM_FUNC_START(__arch_chacha20_blocks_nostack)
+#define output		a0
+#define key		a1
+#define counter		a2
+#define nblocks		a3
+#define i		t0
+/* LSX registers vr0-vr23 are caller-save. */
+#define state0		$vr0
+#define state1		$vr1
+#define state2		$vr2
+#define state3		$vr3
+#define copy0		$vr4
+#define copy1		$vr5
+#define copy2		$vr6
+#define copy3		$vr7
+#define one		$vr8
+
+	/* copy0 = "expand 32-byte k" */
+	la.pcrel	t1, CONSTANTS
+	vld		copy0, t1, 0
+	/* copy1, copy2 = key */
+	vld		copy1, key, 0
+	vld		copy2, key, 0x10
+	/* copy3 = counter || zero nonce */
+	vldrepl.d	copy3, counter, 0
+	vinsgr2vr.d	copy3, zero, 1
+	/* one = 1 || 0 */
+	vldi		one, 0b0110000000001
+	vinsgr2vr.d	one, zero, 1
+
+.Lblock:
+	/* state = copy */
+	vori.b		state0, copy0, 0
+	vori.b		state1, copy1, 0
+	vori.b		state2, copy2, 0
+	vori.b		state3, copy3, 0
+
+	li.w		i, 10
+.Lpermute:
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */
+	vadd.w		state0, state0, state1
+	vxor.v		state3, state3, state0
+	vrotri.w	state3, state3, 16
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */
+	vadd.w		state2, state2, state3
+	vxor.v		state1, state1, state2
+	vrotri.w	state1, state1, 20
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */
+	vadd.w		state0, state0, state1
+	vxor.v		state3, state3, state0
+	vrotri.w	state3, state3, 24
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */
+	vadd.w		state2, state2, state3
+	vxor.v		state1, state1, state2
+	vrotri.w	state1, state1, 25
+
+	/* state1[0,1,2,3] = state1[1,2,3,0] */
+	vshuf4i.w	state1, state1, 0b00111001
+	/* state2[0,1,2,3] = state2[2,3,0,1] */
+	vshuf4i.w	state2, state2, 0b01001110
+	/* state3[0,1,2,3] = state3[3,0,1,2] */
+	vshuf4i.w	state3, state3, 0b10010011
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */
+	vadd.w		state0, state0, state1
+	vxor.v		state3, state3, state0
+	vrotri.w	state3, state3, 16
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */
+	vadd.w		state2, state2, state3
+	vxor.v		state1, state1, state2
+	vrotri.w	state1, state1, 20
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */
+	vadd.w		state0, state0, state1
+	vxor.v		state3, state3, state0
+	vrotri.w	state3, state3, 24
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */
+	vadd.w		state2, state2, state3
+	vxor.v		state1, state1, state2
+	vrotri.w	state1, state1, 25
+
+	/* state1[0,1,2,3] = state1[3,0,1,2] */
+	vshuf4i.w	state1, state1, 0b10010011
+	/* state2[0,1,2,3] = state2[2,3,0,1] */
+	vshuf4i.w	state2, state2, 0b01001110
+	/* state3[0,1,2,3] = state3[1,2,3,0] */
+	vshuf4i.w	state3, state3, 0b00111001
+
+	addi.w		i, i, -1
+	bnez		i, .Lpermute
+
+	/* output0 = state0 + copy0 */
+	vadd.w		state0, state0, copy0
+	vst		state0, output, 0
+	/* output1 = state1 + copy1 */
+	vadd.w		state1, state1, copy1
+	vst		state1, output, 0x10
+	/* output2 = state2 + copy2 */
+	vadd.w		state2, state2, copy2
+	vst		state2, output, 0x20
+	/* output3 = state3 + copy3 */
+	vadd.w		state3, state3, copy3
+	vst		state3, output, 0x30
+
+	/* ++copy3.counter */
+	vadd.d		copy3, copy3, one
+
+	/* output += 64 */
+	PTR_ADDI	output, output, 64
+	/* --nblocks */
+	PTR_ADDI	nblocks, nblocks, -1
+	bnez		nblocks, .Lblock
+
+	/* counter = copy3.counter */
+	vstelm.d	copy3, counter, 0, 0
+
+	/* Zero out the potentially sensitive regs, in case nothing uses these again. */
+	vldi		state0, 0
+	vldi		state1, 0
+	vldi		state2, 0
+	vldi		state3, 0
+	vldi		copy1, 0
+	vldi		copy2, 0
+
+	jr		ra
+SYM_FUNC_END(__arch_chacha20_blocks_nostack)
diff --git a/arch/loongarch/vdso/vgetrandom.c b/arch/loongarch/vdso/vgetrandom.c
new file mode 100644
index 000000000000..fd09c3847b65
--- /dev/null
+++ b/arch/loongarch/vdso/vgetrandom.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
+ */
+#include <linux/types.h>
+
+#include "../../../../lib/vdso/getrandom.c"
+
+typeof(__cvdso_getrandom) __vdso_getrandom_lsx;
+
+ssize_t __vdso_getrandom_lsx(void *buffer, size_t len, unsigned int flags,
+			     void *opaque_state, size_t opaque_len)
+{
+	return __cvdso_getrandom(buffer, len, flags, opaque_state,
+				 opaque_len);
+}
-- 
2.46.0


* Re: [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO
  2024-08-15 13:33 [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Xi Ruoyao
  2024-08-15 13:33 ` [PATCH v2 1/2] LoongArch: Perform alternative runtime patching on vDSO Xi Ruoyao
  2024-08-15 13:33 ` [PATCH v2 2/2] LoongArch: vDSO: Wire up getrandom() vDSO implementation Xi Ruoyao
@ 2024-08-15 14:04 ` Jason A. Donenfeld
  2024-08-15 14:22   ` Xi Ruoyao
  2024-08-26  6:32   ` Xi Ruoyao
  2024-08-15 14:08 ` Jason A. Donenfeld
  3 siblings, 2 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2024-08-15 14:04 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Huacai Chen, WANG Xuerui, linux-crypto, loongarch, Jinyang He,
	Tiezhu Yang, Arnd Bergmann

Hi Xi,

Thanks for posting this! That's very nice to see.

I'm currently traveling without my laptop (actually in Yunnan, China!),
so I'll be able to take a look at this for real starting the 26th, as
right now I'm just on my cellphone using lore+mutt.

One thing I wanted to ask, though, is - doesn't LoongArch have 32 8-byte
registers? Shouldn't that be enough to implement ChaCha without spilling
and without using LSX?

Jason

* Re: [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO
  2024-08-15 13:33 [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Xi Ruoyao
                   ` (2 preceding siblings ...)
  2024-08-15 14:04 ` [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Jason A. Donenfeld
@ 2024-08-15 14:08 ` Jason A. Donenfeld
  3 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2024-08-15 14:08 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Huacai Chen, WANG Xuerui, linux-crypto, loongarch, Jinyang He,
	Tiezhu Yang, Arnd Bergmann

On Thu, Aug 15, 2024 at 09:33:55PM +0800, Xi Ruoyao wrote:
> v1->v2: Remove Cc: lists in the cover letter and just type them in git
> send-email command.  I assumed the Cc: lists in the cover letter would be
> "propagated" to the patches by git send-email but I was wrong, so v1 was
> never properly delivered to the lists.

The `--cc-cover` flag is what you want, or set sendemail.ccCover in your
git config file.

https://git-scm.com/docs/git-send-email/en#Documentation/git-send-email.txt---no-cc-cover

* Re: [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO
  2024-08-15 14:04 ` [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Jason A. Donenfeld
@ 2024-08-15 14:22   ` Xi Ruoyao
  2024-08-15 14:28     ` Jason A. Donenfeld
  2024-08-26  6:32   ` Xi Ruoyao
  1 sibling, 1 reply; 9+ messages in thread
From: Xi Ruoyao @ 2024-08-15 14:22 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Huacai Chen, WANG Xuerui, linux-crypto, loongarch, Jinyang He,
	Tiezhu Yang, Arnd Bergmann

On Thu, 2024-08-15 at 14:04 +0000, Jason A. Donenfeld wrote:
> Hi Xi,
> 
> Thanks for posting this! That's very nice to see.
> 
> I'm currently traveling without my laptop (actually in Yunnan, China!),

Have fun!

> so I'll be able to take a look at this for real starting the 26th, as
> right now I'm just on my cellphone using lore+mutt.
> 
> One thing I wanted to ask, though, is - doesn't LoongArch have 32 8-byte
> registers? Shouldn't that be enough to implement ChaCha without spilling
> and without using LSX?

I'll work on it but I need to ask a question (it may be stupid because I
know little about security) before starting to code:

Does "stack-less" simply mean "don't spill any sensitive data onto the
stack," or more strictly "stack shouldn't be used at all"?

For example, is it OK to save all the callee-saved registers in the
function prologue onto the stack, and restore them in the epilogue?

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

* Re: [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO
  2024-08-15 14:22   ` Xi Ruoyao
@ 2024-08-15 14:28     ` Jason A. Donenfeld
  0 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2024-08-15 14:28 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Huacai Chen, WANG Xuerui, linux-crypto, loongarch, Jinyang He,
	Tiezhu Yang, Arnd Bergmann

On Thu, Aug 15, 2024 at 10:22:31PM +0800, Xi Ruoyao wrote:
> > so I'll be able to take a look at this for real starting the 26th, as
> > right now I'm just on my cellphone using lore+mutt.
> > 
> > One thing I wanted to ask, though, is - doesn't LoongArch have 32 8-byte
> > registers? Shouldn't that be enough to implement ChaCha without spilling
> > and without using LSX?
> 
> I'll work on it but I need to ask a question (it may be stupid because I
> know little about security) before starting to code:
> 
> Does "stack-less" simply mean "don't spill any sensitive data onto the
> stack," or more strictly "stack shouldn't be used at all"?
> 
> For example, is it OK to save all the callee-saved registers in the
> function prologue onto the stack, and restore them in the epilogue?

Just means don't spill sensitive info, which means the key, the output,
the entire ChaCha state, and all intermediate states. But saving
callee-saved registers in the prologue like usual is fine.

Jason

* Re: [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO
  2024-08-15 14:04 ` [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO Jason A. Donenfeld
  2024-08-15 14:22   ` Xi Ruoyao
@ 2024-08-26  6:32   ` Xi Ruoyao
  2024-08-26  8:54     ` Jason A. Donenfeld
  1 sibling, 1 reply; 9+ messages in thread
From: Xi Ruoyao @ 2024-08-26  6:32 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Huacai Chen, WANG Xuerui, linux-crypto, loongarch, Jinyang He,
	Tiezhu Yang, Arnd Bergmann

On Thu, 2024-08-15 at 14:04 +0000, Jason A. Donenfeld wrote:
> Thanks for posting this! That's very nice to see.
> 
> I'm currently traveling without my laptop (actually in Yunnan, China!),
> so I'll be able to take a look at this for real starting the 26th, as
> right now I'm just on my cellphone using lore+mutt.

Hi Jason,

When you start reviewing I guess you can check out the powerpc
implementation first and add me to the Cc of your reply.  There seems to
be something useful to me in the powerpc implementation (avoiding memset,
adding __arch_get_k_vdso_data so I wouldn't need the inline asm trick
for the _vdso_rng_data symbol, and the selftest support).

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

* Re: [PATCH v2 0/2] LoongArch: Implement getrandom() in vDSO
  2024-08-26  6:32   ` Xi Ruoyao
@ 2024-08-26  8:54     ` Jason A. Donenfeld
  0 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2024-08-26  8:54 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Huacai Chen, WANG Xuerui, linux-crypto, loongarch, Jinyang He,
	Tiezhu Yang, Arnd Bergmann

On Mon, Aug 26, 2024 at 02:32:05PM +0800, Xi Ruoyao wrote:
> On Thu, 2024-08-15 at 14:04 +0000, Jason A. Donenfeld wrote:
> > Thanks for posting this! That's very nice to see.
> > 
> > I'm currently traveling without my laptop (actually in Yunnan, China!),
> > so I'll be able to take a look at this for real starting the 26th, as
> > right now I'm just on my cellphone using lore+mutt.
> 
> Hi Jason,
> 
> When you start reviewing I guess you can check out the powerpc
> implementation first and add me to the Cc of your reply.  There seems to
> be something useful to me in the powerpc implementation (avoiding memset,
> adding __arch_get_k_vdso_data so I wouldn't need the inline asm trick
> for the _vdso_rng_data symbol, and the selftest support).

Indeed, I just committed a bit of those fixups to the random.git tree,
if you want to base your work on that for the time being:

   https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/
