linux-riscv.lists.infradead.org archive mirror
* [v8, 00/10] riscv: support kernel-mode Vector
@ 2023-12-23  4:29 Andy Chiu
  2023-12-23  4:29 ` [v8, 01/10] riscv: Add support for kernel mode vector Andy Chiu
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou

This series provides support for running Vector code in kernel mode.
Along with this support, we add some Vector-optimized routines and
provide a simple threshold to decide when to run the vectorized
functions.

This series is composed of 3 parts:
 patch 1-4: adds basic support for kernel-mode Vector
 patch 5-6: includes vectorized common library routines into the kernel
 patch 7-10: provides some code refactors and support for preemptible
             kernel-mode Vector.

This series can be merged incrementally if we feel any part of
{1~4, 5~6, 7~10} is mature enough.

This series is tested on QEMU with V and verified that booting and
normal userspace operations all work as usual with thresholds set to 0.
Also, we test by launching multiple kernel threads which continuously
execute and verify Vector operations in the background. The module that
tests these operations is expected to be upstreamed later.
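For illustration, the threshold dispatch described above can be modelled in plain C (all names here, such as `vector_threshold` and `memcpy_dispatch`, are hypothetical stand-ins and not identifiers from this series):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold, standing in for the Kconfig-settable values. */
static size_t vector_threshold = 768;

/* Records which path ran last; for demonstration only. */
static int used_vector;

static void *memcpy_scalar(void *dst, const void *src, size_t n)
{
	used_vector = 0;
	return memcpy(dst, src, n);
}

static void *memcpy_vectorized(void *dst, const void *src, size_t n)
{
	/*
	 * A real vectorized routine would run between kernel_vector_begin()
	 * and kernel_vector_end(); plain memcpy() stands in here.
	 */
	used_vector = 1;
	return memcpy(dst, src, n);
}

static void *memcpy_dispatch(void *dst, const void *src, size_t n)
{
	/*
	 * Setting vector_threshold to 0 makes every call take the V path,
	 * which matches the "thresholds set to 0" testing described above.
	 */
	if (n >= vector_threshold)
		return memcpy_vectorized(dst, src, n);
	return memcpy_scalar(dst, src, n);
}
```

Large copies take the (stand-in) Vector path, small copies stay scalar.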

v7 of this series can be found at [1]

Link: https://lore.kernel.org/all/20231221134318.28105-1-andy.chiu@sifive.com/

Patch summary:
 - Updated patches: 1, 2, 3, 5, 10
 - New patch: none
 - Unchanged patch: 4, 6, 7, 8, 9
 - Deleted patch: none

Changelog v8:
 - Address build fail on no-mmu config
 - Fix build fail with W=1
 - Refactor patches (1, 2), Eric

Changelog v7:
 - Fix build fail for allmodconfig and test building the series with
   allmodconfig/allyesconfig

Changelog v6:
 - Provide a more robust check on the use of non-preemptible Vector.
 - Add Kconfigs to set threshold value at compile time. (Charlie)
 - Add a patch to utilize kmem_cache_* for V context allocations.
 - Re-write and add preemptible Vector.

Changelog v5:
 - Rebase on top of riscv for-next (6.7-rc1)
Changelog v4:
 - Use kernel_v_flags and helpers to track vector context.
 - Prevent softirq from nesting V context for non-preempt V
 - Add user copy and mem* routines

Changelog v3:
 - Rebase on top of riscv for-next (6.6-rc1)
 - Fix a build issue (Conor)
 - Guard vstate_save, vstate_restore with {get,put}_cpu_vector_context.
 - Save V context after disabling preemption. (Guo)
 - Remove irqs_disabled() check from may_use_simd(). (Björn)
 - Comment about nesting V context.

Changelog v2:
 - fix build issues
 - Follow arm's way of starting kernel-mode simd code:
   - add include/asm/simd.h and rename may_use_vector() ->
     may_use_simd()
   - return void in kernel_vector_begin(), and BUG_ON if may_use_simd()
     fails
 - Change naming scheme for functions/macros (Conor):
   - remove KMV
   - 's/rvv/vector/'
   - 's/RISCV_ISA_V_PREEMPTIVE_KMV/RISCV_ISA_V_PREEMPTIVE/'
   - 's/TIF_RISCV_V_KMV/TIF_RISCV_V_KERNEL_MODE/'

Andy Chiu (8):
  riscv: vector: make Vector always available for softirq context
  riscv: sched: defer restoring Vector context for user
  riscv: lib: vectorize copy_to_user/copy_from_user
  riscv: lib: add vectorized mem* routines
  riscv: vector: do not pass task_struct into
    riscv_v_vstate_{save,restore}()
  riscv: vector: use a mask to write vstate_ctrl
  riscv: vector: use kmem_cache to manage vector context
  riscv: vector: allow kernel-mode Vector with preemption

Greentime Hu (2):
  riscv: Add support for kernel mode vector
  riscv: Add vector extension XOR implementation

 arch/riscv/Kconfig                      |  46 +++++
 arch/riscv/include/asm/asm-prototypes.h |  27 +++
 arch/riscv/include/asm/entry-common.h   |  17 ++
 arch/riscv/include/asm/processor.h      |  39 ++++-
 arch/riscv/include/asm/simd.h           |  64 +++++++
 arch/riscv/include/asm/thread_info.h    |   2 +
 arch/riscv/include/asm/vector.h         | 103 +++++++++--
 arch/riscv/include/asm/xor.h            |  68 ++++++++
 arch/riscv/kernel/Makefile              |   1 +
 arch/riscv/kernel/entry.S               |   8 +
 arch/riscv/kernel/kernel_mode_vector.c  | 219 ++++++++++++++++++++++++
 arch/riscv/kernel/process.c             |  13 +-
 arch/riscv/kernel/ptrace.c              |   7 +-
 arch/riscv/kernel/signal.c              |   7 +-
 arch/riscv/kernel/vector.c              |  50 +++++-
 arch/riscv/lib/Makefile                 |  10 +-
 arch/riscv/lib/memcpy_vector.S          |  29 ++++
 arch/riscv/lib/memmove_vector.S         |  49 ++++++
 arch/riscv/lib/memset_vector.S          |  33 ++++
 arch/riscv/lib/riscv_v_helpers.c        |  70 ++++++++
 arch/riscv/lib/uaccess.S                |  10 ++
 arch/riscv/lib/uaccess_vector.S         |  50 ++++++
 arch/riscv/lib/xor.S                    |  81 +++++++++
 23 files changed, 977 insertions(+), 26 deletions(-)
 create mode 100644 arch/riscv/include/asm/simd.h
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
 create mode 100644 arch/riscv/lib/memcpy_vector.S
 create mode 100644 arch/riscv/lib/memmove_vector.S
 create mode 100644 arch/riscv/lib/memset_vector.S
 create mode 100644 arch/riscv/lib/riscv_v_helpers.c
 create mode 100644 arch/riscv/lib/uaccess_vector.S
 create mode 100644 arch/riscv/lib/xor.S

-- 
2.17.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

* [v8, 01/10] riscv: Add support for kernel mode vector
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-27  1:36   ` Charlie Jenkins
  2023-12-23  4:29 ` [v8, 02/10] riscv: vector: make Vector always available for softirq context Andy Chiu
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Vincent Chen, Andy Chiu, Albert Ou,
	Heiko Stuebner, Baoquan He, Clément Léger, Guo Ren,
	Xiao Wang, Björn Töpel, Conor Dooley, Alexandre Ghiti,
	Sami Tolvanen, Sia Jee Heng, Evan Green, Jisheng Zhang

From: Greentime Hu <greentime.hu@sifive.com>

Add kernel_vector_begin() and kernel_vector_end() function declarations
and corresponding definitions in kernel_mode_vector.c.

These are needed to wrap uses of vector in kernel mode.
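As a sketch of the nesting discipline these wrappers enforce, here is a user-space model (the real kernel_vector_begin()/kernel_vector_end() also save task V state and disable preemption; the stubs below only model the flag accounting from this patch):

```c
#include <assert.h>

/* Model of the context-tracking flag introduced by this patch. */
#define RISCV_KERNEL_MODE_V_MASK	0xff
#define RISCV_KERNEL_MODE_V		0x1

static unsigned int riscv_v_flags;

static int may_use_simd(void)
{
	/* The kernel version additionally checks in_hardirq()/in_nmi(). */
	return !(riscv_v_flags & RISCV_KERNEL_MODE_V_MASK);
}

static void kernel_vector_begin(void)
{
	assert(may_use_simd());	/* BUG_ON() in the real code */
	riscv_v_flags += RISCV_KERNEL_MODE_V;
}

static void kernel_vector_end(void)
{
	assert((riscv_v_flags & RISCV_KERNEL_MODE_V_MASK) == RISCV_KERNEL_MODE_V);
	riscv_v_flags -= RISCV_KERNEL_MODE_V;
}
```

A caller wraps its vector code between begin/end; calling begin twice without an intervening end trips the may_use_simd() check, which is exactly the nesting the WARN/BUG checks below are guarding against.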

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v8:
 - Refactor unnecessary whitespace change (Eric)
Changelog v7:
 - fix build fail for allmodconfig
Changelog v6:
 - Use 8 bits to track non-preemptible vector context to provide better
   WARN coverage.
Changelog v4:
 - Use kernel_v_flags and helpers to track vector context.
Changelog v3:
 - Reorder patch 1 to patch 3 to make use of
   {get,put}_cpu_vector_context later.
 - Export {get,put}_cpu_vector_context.
 - Save V context after disabling preemption. (Guo)
 - Fix a build fail. (Conor)
 - Remove irqs_disabled() check as it is not needed, fix styling. (Björn)
Changelog v2:
 - 's/kernel_rvv/kernel_vector' and return void in kernel_vector_begin
   (Conor)
 - export may_use_simd to include/asm/simd.h
---
 arch/riscv/include/asm/processor.h     | 17 ++++-
 arch/riscv/include/asm/simd.h          | 44 ++++++++++++
 arch/riscv/include/asm/vector.h        | 21 ++++++
 arch/riscv/kernel/Makefile             |  1 +
 arch/riscv/kernel/kernel_mode_vector.c | 95 ++++++++++++++++++++++++++
 arch/riscv/kernel/process.c            |  1 +
 6 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/simd.h
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index f19f861cda54..15781e2232e0 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -73,6 +73,20 @@
 struct task_struct;
 struct pt_regs;
 
+/*
+ * We use a flag to track in-kernel Vector context. Currently the flag has the
+ * following meaning:
+ *
+ *  - bits 0-7 indicate whether the in-kernel Vector context is active. The
+ *    activation of this state disables preemption. On a non-RT kernel, it
+ *    also disables bh. Currently only 0 and 1 are valid values for this
+ *    field. Other values are reserved for future use.
+ */
+
+#define RISCV_KERNEL_MODE_V_MASK	0xff
+
+#define RISCV_KERNEL_MODE_V	0x1
+
 /* CPU-specific state of a task */
 struct thread_struct {
 	/* Callee-saved registers */
@@ -81,7 +95,8 @@ struct thread_struct {
 	unsigned long s[12];	/* s[0]: frame pointer */
 	struct __riscv_d_ext_state fstate;
 	unsigned long bad_cause;
-	unsigned long vstate_ctrl;
+	u32 riscv_v_flags;
+	u32 vstate_ctrl;
 	struct __riscv_v_ext_state vstate;
 	unsigned long align_ctl;
 };
diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
new file mode 100644
index 000000000000..3b603e47c5d8
--- /dev/null
+++ b/arch/riscv/include/asm/simd.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_SIMD_H
+#define __ASM_SIMD_H
+
+#include <linux/compiler.h>
+#include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#include <asm/vector.h>
+
+#ifdef CONFIG_RISCV_ISA_V
+/*
+ * may_use_simd - whether it is allowable at this time to issue vector
+ *                instructions or access the vector register file
+ *
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static __must_check inline bool may_use_simd(void)
+{
+	/*
+	 * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
+	 * and is clear whenever preemption is enabled.
+	 */
+	return !in_hardirq() && !in_nmi() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
+}
+
+#else /* ! CONFIG_RISCV_ISA_V */
+
+static __must_check inline bool may_use_simd(void)
+{
+	return false;
+}
+
+#endif /* ! CONFIG_RISCV_ISA_V */
+
+#endif
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 87aaef656257..6254830c0668 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -22,6 +22,27 @@
 extern unsigned long riscv_v_vsize;
 int riscv_v_setup_vsize(void);
 bool riscv_v_first_use_handler(struct pt_regs *regs);
+void kernel_vector_begin(void);
+void kernel_vector_end(void);
+void get_cpu_vector_context(void);
+void put_cpu_vector_context(void);
+
+static inline void riscv_v_ctx_cnt_add(u32 offset)
+{
+	current->thread.riscv_v_flags += offset;
+	barrier();
+}
+
+static inline void riscv_v_ctx_cnt_sub(u32 offset)
+{
+	barrier();
+	current->thread.riscv_v_flags -= offset;
+}
+
+static inline u32 riscv_v_ctx_cnt(void)
+{
+	return READ_ONCE(current->thread.riscv_v_flags);
+}
 
 static __always_inline bool has_vector(void)
 {
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index fee22a3d1b53..8c58595696b3 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
 obj-$(CONFIG_RISCV_MISALIGNED)	+= traps_misaligned.o
 obj-$(CONFIG_FPU)		+= fpu.o
 obj-$(CONFIG_RISCV_ISA_V)	+= vector.o
+obj-$(CONFIG_RISCV_ISA_V)	+= kernel_mode_vector.o
 obj-$(CONFIG_SMP)		+= smpboot.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= cpu_ops.o
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
new file mode 100644
index 000000000000..105147c7d2da
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <catalin.marinas@arm.com>
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2021 SiFive
+ */
+#include <linux/compiler.h>
+#include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+#include <asm/simd.h>
+
+/*
+ * Claim ownership of the CPU vector context for use by the calling context.
+ *
+ * The caller may freely manipulate the vector context metadata until
+ * put_cpu_vector_context() is called.
+ */
+void get_cpu_vector_context(void)
+{
+	preempt_disable();
+
+	WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != 0);
+	riscv_v_ctx_cnt_add(RISCV_KERNEL_MODE_V);
+}
+
+/*
+ * Release the CPU vector context.
+ *
+ * Must be called from a context in which get_cpu_vector_context() was
+ * previously called, with no call to put_cpu_vector_context() in the
+ * meantime.
+ */
+void put_cpu_vector_context(void)
+{
+	WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != RISCV_KERNEL_MODE_V);
+	riscv_v_ctx_cnt_sub(RISCV_KERNEL_MODE_V);
+
+	preempt_enable();
+}
+
+/*
+ * kernel_vector_begin(): obtain the CPU vector registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_simd() returns true.
+ * Task context in the vector registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_vector_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the vector registers until kernel_vector_end() is
+ * called.
+ */
+void kernel_vector_begin(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	BUG_ON(!may_use_simd());
+
+	get_cpu_vector_context();
+
+	riscv_v_vstate_save(current, task_pt_regs(current));
+
+	riscv_v_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_vector_begin);
+
+/*
+ * kernel_vector_end(): give the CPU vector registers back to the current task
+ *
+ * Must be called from a context in which kernel_vector_begin() was previously
+ * called, with no call to kernel_vector_end() in the meantime.
+ *
+ * The caller must not use the vector registers after this function is called,
+ * unless kernel_vector_begin() is called again in the meantime.
+ */
+void kernel_vector_end(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	riscv_v_vstate_restore(current, task_pt_regs(current));
+
+	riscv_v_disable();
+
+	put_cpu_vector_context();
+}
+EXPORT_SYMBOL_GPL(kernel_vector_end);
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 4f21d970a129..4a1275db1146 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -221,6 +221,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		childregs->a0 = 0; /* Return value of fork() */
 		p->thread.s[0] = 0;
 	}
+	p->thread.riscv_v_flags = 0;
 	p->thread.ra = (unsigned long)ret_from_fork;
 	p->thread.sp = (unsigned long)childregs; /* kernel sp */
 	return 0;
-- 
2.17.1




* [v8, 02/10] riscv: vector: make Vector always available for softirq context
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
  2023-12-23  4:29 ` [v8, 01/10] riscv: Add support for kernel mode vector Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-23  4:29 ` [v8, 03/10] riscv: Add vector extension XOR implementation Andy Chiu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Vincent Chen,
	Conor Dooley

The goal of this patch is to provide full support for Vector in kernel
softirq context, so that some of the crypto algorithms won't need scalar
fallbacks.

By disabling bottom halves in active kernel-mode Vector, softirq will
not be able to nest on top of any kernel-mode Vector. So, softirq
context is able to use Vector whenever it runs.

After this patch, Vector context cannot start with irqs disabled.
Otherwise local_bh_enable() may run in a wrong context.

Disabling bh is not enough for an RT kernel to prevent preemption. So
we must disable preemption, which also implies disabling bh on RT.

Related-to: commit 696207d4258b ("arm64/sve: Make kernel FPU protection RT friendly")
Related-to: commit 66c3ec5a7120 ("arm64: neon: Forbid when irqs are disabled")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v8:
 - refine comments, fix typos (Eric)
Changelog v4:
 - new patch since v4
---
 arch/riscv/include/asm/simd.h          |  6 +++++-
 arch/riscv/kernel/kernel_mode_vector.c | 14 ++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
index 3b603e47c5d8..2f1e95ccb03c 100644
--- a/arch/riscv/include/asm/simd.h
+++ b/arch/riscv/include/asm/simd.h
@@ -28,8 +28,12 @@ static __must_check inline bool may_use_simd(void)
 	/*
 	 * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
 	 * and is clear whenever preemption is enabled.
+	 *
+	 * Kernel-mode Vector temporarily disables bh. So we must not return
+	 * true on irqs_disabled(). Otherwise we would fail the lockdep check
+	 * when calling local_bh_enable().
 	 */
-	return !in_hardirq() && !in_nmi() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
+	return !in_hardirq() && !in_nmi() && !irqs_disabled() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
 }
 
 #else /* ! CONFIG_RISCV_ISA_V */
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index 105147c7d2da..385d9b4d8cc6 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -23,7 +23,14 @@
  */
 void get_cpu_vector_context(void)
 {
-	preempt_disable();
+	/*
+	 * Disable softirqs so it is impossible for softirqs to nest
+	 * get_cpu_vector_context() when the kernel is actively using Vector.
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_bh_disable();
+	else
+		preempt_disable();
 
 	WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != 0);
 	riscv_v_ctx_cnt_add(RISCV_KERNEL_MODE_V);
@@ -41,7 +48,10 @@ void put_cpu_vector_context(void)
 	WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != RISCV_KERNEL_MODE_V);
 	riscv_v_ctx_cnt_sub(RISCV_KERNEL_MODE_V);
 
-	preempt_enable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_bh_enable();
+	else
+		preempt_enable();
 }
 
 /*
-- 
2.17.1




* [v8, 03/10] riscv: Add vector extension XOR implementation
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
  2023-12-23  4:29 ` [v8, 01/10] riscv: Add support for kernel mode vector Andy Chiu
  2023-12-23  4:29 ` [v8, 02/10] riscv: vector: make Vector always available for softirq context Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-23  4:29 ` [v8, 04/10] riscv: sched: defer restoring Vector context for user Andy Chiu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Han-Kuan Chen, Andy Chiu, Albert Ou,
	Guo Ren, Sami Tolvanen, Deepak Gupta, Andrew Jones, Conor Dooley,
	Heiko Stuebner

From: Greentime Hu <greentime.hu@sifive.com>

This patch adds support for vector-optimized XOR; it is tested in QEMU.
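Functionally, each xor_regs_N_() routine XORs the source buffers into p1, byte by byte, over `bytes` bytes. A scalar C reference for the two-operand case (illustrative only, not part of the patch) is:

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise reference for the semantics of xor_regs_2_(). */
static void xor_ref_2(size_t bytes, unsigned char *p1,
		      const unsigned char *p2)
{
	for (size_t i = 0; i < bytes; i++)
		p1[i] ^= p2[i];
}
```

The vectorized routines below compute the same result, processing up to VLMAX bytes per iteration via vsetvli instead of one byte at a time.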

Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v8:
 - wrap xor function prototypes with CONFIG_RISCV_ISA_V
Changelog v7:
 - fix build warning message and use proper entry/exit macro for
   assembly. Drop Conor's A-b
Changelog v2:
 - 's/rvv/vector/' (Conor)
---
 arch/riscv/include/asm/asm-prototypes.h | 18 ++++++
 arch/riscv/include/asm/xor.h            | 68 +++++++++++++++++++++
 arch/riscv/lib/Makefile                 |  1 +
 arch/riscv/lib/xor.S                    | 81 +++++++++++++++++++++++++
 4 files changed, 168 insertions(+)
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/lib/xor.S

diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index 36b955c762ba..6db1a9bbff4c 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -9,6 +9,24 @@ long long __lshrti3(long long a, int b);
 long long __ashrti3(long long a, int b);
 long long __ashlti3(long long a, int b);
 
+#ifdef CONFIG_RISCV_ISA_V
+
+void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2);
+void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3);
+void xor_regs_4_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4);
+void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4,
+		 const unsigned long *__restrict p5);
+
+#endif /* CONFIG_RISCV_ISA_V */
 
 #define DECLARE_DO_ERROR_INFO(name)	asmlinkage void name(struct pt_regs *regs)
 
diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h
new file mode 100644
index 000000000000..96011861e46b
--- /dev/null
+++ b/arch/riscv/include/asm/xor.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+
+#include <linux/hardirq.h>
+#include <asm-generic/xor.h>
+#ifdef CONFIG_RISCV_ISA_V
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+#include <asm/asm-prototypes.h>
+
+static void xor_vector_2(unsigned long bytes, unsigned long *__restrict p1,
+			 const unsigned long *__restrict p2)
+{
+	kernel_vector_begin();
+	xor_regs_2_(bytes, p1, p2);
+	kernel_vector_end();
+}
+
+static void xor_vector_3(unsigned long bytes, unsigned long *__restrict p1,
+			 const unsigned long *__restrict p2,
+			 const unsigned long *__restrict p3)
+{
+	kernel_vector_begin();
+	xor_regs_3_(bytes, p1, p2, p3);
+	kernel_vector_end();
+}
+
+static void xor_vector_4(unsigned long bytes, unsigned long *__restrict p1,
+			 const unsigned long *__restrict p2,
+			 const unsigned long *__restrict p3,
+			 const unsigned long *__restrict p4)
+{
+	kernel_vector_begin();
+	xor_regs_4_(bytes, p1, p2, p3, p4);
+	kernel_vector_end();
+}
+
+static void xor_vector_5(unsigned long bytes, unsigned long *__restrict p1,
+			 const unsigned long *__restrict p2,
+			 const unsigned long *__restrict p3,
+			 const unsigned long *__restrict p4,
+			 const unsigned long *__restrict p5)
+{
+	kernel_vector_begin();
+	xor_regs_5_(bytes, p1, p2, p3, p4, p5);
+	kernel_vector_end();
+}
+
+static struct xor_block_template xor_block_rvv = {
+	.name = "rvv",
+	.do_2 = xor_vector_2,
+	.do_3 = xor_vector_3,
+	.do_4 = xor_vector_4,
+	.do_5 = xor_vector_5
+};
+
+#undef XOR_TRY_TEMPLATES
+#define XOR_TRY_TEMPLATES           \
+	do {        \
+		xor_speed(&xor_block_8regs);    \
+		xor_speed(&xor_block_32regs);    \
+		if (has_vector()) { \
+			xor_speed(&xor_block_rvv);\
+		} \
+	} while (0)
+#endif
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 26cb2502ecf8..494f9cd1a00c 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -11,3 +11,4 @@ lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+lib-$(CONFIG_RISCV_ISA_V)	+= xor.o
diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S
new file mode 100644
index 000000000000..b28f2430e52f
--- /dev/null
+++ b/arch/riscv/lib/xor.S
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+#include <linux/linkage.h>
+#include <linux/export.h>
+#include <asm/asm.h>
+
+SYM_FUNC_START(xor_regs_2_)
+	vsetvli a3, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a3
+	vxor.vv v16, v0, v8
+	add a2, a2, a3
+	vse8.v v16, (a1)
+	add a1, a1, a3
+	bnez a0, xor_regs_2_
+	ret
+SYM_FUNC_END(xor_regs_2_)
+EXPORT_SYMBOL(xor_regs_2_)
+
+SYM_FUNC_START(xor_regs_3_)
+	vsetvli a4, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a4
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a4
+	vxor.vv v16, v0, v16
+	add a3, a3, a4
+	vse8.v v16, (a1)
+	add a1, a1, a4
+	bnez a0, xor_regs_3_
+	ret
+SYM_FUNC_END(xor_regs_3_)
+EXPORT_SYMBOL(xor_regs_3_)
+
+SYM_FUNC_START(xor_regs_4_)
+	vsetvli a5, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a5
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a5
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a5
+	vxor.vv v16, v0, v24
+	add a4, a4, a5
+	vse8.v v16, (a1)
+	add a1, a1, a5
+	bnez a0, xor_regs_4_
+	ret
+SYM_FUNC_END(xor_regs_4_)
+EXPORT_SYMBOL(xor_regs_4_)
+
+SYM_FUNC_START(xor_regs_5_)
+	vsetvli a6, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a6
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a6
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a6
+	vxor.vv v0, v0, v24
+	vle8.v v8, (a5)
+	add a4, a4, a6
+	vxor.vv v16, v0, v8
+	add a5, a5, a6
+	vse8.v v16, (a1)
+	add a1, a1, a6
+	bnez a0, xor_regs_5_
+	ret
+SYM_FUNC_END(xor_regs_5_)
+EXPORT_SYMBOL(xor_regs_5_)
-- 
2.17.1




* [v8, 04/10] riscv: sched: defer restoring Vector context for user
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
                   ` (2 preceding siblings ...)
  2023-12-23  4:29 ` [v8, 03/10] riscv: Add vector extension XOR implementation Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-27 12:07   ` Song Shuai
  2023-12-23  4:29 ` [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user Andy Chiu
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Oleg Nesterov,
	Björn Töpel, Conor Dooley, Guo Ren,
	Clément Léger, Jisheng Zhang, Sami Tolvanen,
	Deepak Gupta, Vincent Chen, Heiko Stuebner, Xiao Wang, Haorong Lu,
	Mathis Salmen, Joel Granados

Userspace will use its Vector registers only after the kernel really
returns to userspace. So we can delay restoring Vector registers as
long as we are still running in kernel mode. Thus, add a thread flag
that indicates the need to restore Vector and do the restore at the
last arch-specific exit-to-user hook. This saves the context-restoring
cost when we switch over multiple processes that run V in kernel mode.
For example, if the kernel performs a context switch from A->B->C and
returns to C's userspace, then there is no need to restore B's
V-registers.

Besides, this also prevents us from repeatedly restoring V context when
executing kernel-mode Vector multiple times.

The cost of this is that we must disable preemption and mark Vector as
busy during vstate_{save,restore}, so that the V context will not get
restored back immediately when a trap-causing context switch happens in
the middle of vstate_{save,restore}.
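The A->B->C example can be modelled in user space (the struct and names below are illustrative; `defer_restore` models TIF_RISCV_V_DEFER_RESTORE and `restores` counts how many real V-context restores a task pays for):

```c
#include <assert.h>

struct task {
	int defer_restore;	/* models TIF_RISCV_V_DEFER_RESTORE */
	int restores;		/* real V-context restores this task paid for */
};

/* __switch_to_vector() after this patch: only mark next, do not restore. */
static void switch_to(struct task *next)
{
	next->defer_restore = 1;
}

/* arch_exit_to_user_mode_prepare(): restore once, just before user mode. */
static void exit_to_user(struct task *t)
{
	if (t->defer_restore) {
		t->defer_restore = 0;
		t->restores++;
	}
}
```

Switching A->B->C and then exiting to C's userspace restores only C's context; B never pays for a restore it would not have used.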

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
---
Changelog v4:
 - fix typos and re-add Conor's A-b.
Changelog v3:
 - Guard {get,put}_cpu_vector_context between vstate_* operation and
   explain it in the commit msg.
 - Drop R-b from Björn and A-b from Conor.
Changelog v2:
 - rename and add comment for the new thread flag (Conor)
---
 arch/riscv/include/asm/entry-common.h  | 17 +++++++++++++++++
 arch/riscv/include/asm/thread_info.h   |  2 ++
 arch/riscv/include/asm/vector.h        | 11 ++++++++++-
 arch/riscv/kernel/kernel_mode_vector.c |  2 +-
 arch/riscv/kernel/process.c            |  2 ++
 arch/riscv/kernel/ptrace.c             |  5 ++++-
 arch/riscv/kernel/signal.c             |  5 ++++-
 arch/riscv/kernel/vector.c             |  2 +-
 8 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/entry-common.h b/arch/riscv/include/asm/entry-common.h
index 7ab5e34318c8..6361a8488642 100644
--- a/arch/riscv/include/asm/entry-common.h
+++ b/arch/riscv/include/asm/entry-common.h
@@ -4,6 +4,23 @@
 #define _ASM_RISCV_ENTRY_COMMON_H
 
 #include <asm/stacktrace.h>
+#include <asm/thread_info.h>
+#include <asm/vector.h>
+
+static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
+						  unsigned long ti_work)
+{
+	if (ti_work & _TIF_RISCV_V_DEFER_RESTORE) {
+		clear_thread_flag(TIF_RISCV_V_DEFER_RESTORE);
+		/*
+		 * We are already called with irq disabled, so go without
+		 * keeping track of vector_context_busy.
+		 */
+		riscv_v_vstate_restore(current, regs);
+	}
+}
+
+#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
 
 void handle_page_fault(struct pt_regs *regs);
 void handle_break(struct pt_regs *regs);
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 574779900bfb..1047a97ddbc8 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -103,12 +103,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 #define TIF_NOTIFY_SIGNAL	9	/* signal notifications exist */
 #define TIF_UPROBE		10	/* uprobe breakpoint or singlestep */
 #define TIF_32BIT		11	/* compat-mode 32bit process */
+#define TIF_RISCV_V_DEFER_RESTORE	12 /* restore Vector before returning to user */
 
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
+#define _TIF_RISCV_V_DEFER_RESTORE	(1 << TIF_RISCV_V_DEFER_RESTORE)
 
 #define _TIF_WORK_MASK \
 	(_TIF_NOTIFY_RESUME | _TIF_SIGPENDING | _TIF_NEED_RESCHED | \
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 6254830c0668..e706613aae2c 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -205,6 +205,15 @@ static inline void riscv_v_vstate_restore(struct task_struct *task,
 	}
 }
 
+static inline void riscv_v_vstate_set_restore(struct task_struct *task,
+					      struct pt_regs *regs)
+{
+	if ((regs->status & SR_VS) != SR_VS_OFF) {
+		set_tsk_thread_flag(task, TIF_RISCV_V_DEFER_RESTORE);
+		riscv_v_vstate_on(regs);
+	}
+}
+
 static inline void __switch_to_vector(struct task_struct *prev,
 				      struct task_struct *next)
 {
@@ -212,7 +221,7 @@ static inline void __switch_to_vector(struct task_struct *prev,
 
 	regs = task_pt_regs(prev);
 	riscv_v_vstate_save(prev, regs);
-	riscv_v_vstate_restore(next, task_pt_regs(next));
+	riscv_v_vstate_set_restore(next, task_pt_regs(next));
 }
 
 void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index 385d9b4d8cc6..63814e780c28 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -96,7 +96,7 @@ void kernel_vector_end(void)
 	if (WARN_ON(!has_vector()))
 		return;
 
-	riscv_v_vstate_restore(current, task_pt_regs(current));
+	riscv_v_vstate_set_restore(current, task_pt_regs(current));
 
 	riscv_v_disable();
 
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 4a1275db1146..36993f408de4 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -171,6 +171,7 @@ void flush_thread(void)
 	riscv_v_vstate_off(task_pt_regs(current));
 	kfree(current->thread.vstate.datap);
 	memset(&current->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
+	clear_tsk_thread_flag(current, TIF_RISCV_V_DEFER_RESTORE);
 #endif
 }
 
@@ -187,6 +188,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	*dst = *src;
 	/* clear entire V context, including datap for a new task */
 	memset(&dst->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
+	clear_tsk_thread_flag(dst, TIF_RISCV_V_DEFER_RESTORE);
 
 	return 0;
 }
diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index 2afe460de16a..7b93bcbdf9fa 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -99,8 +99,11 @@ static int riscv_vr_get(struct task_struct *target,
 	 * Ensure the vector registers have been saved to the memory before
 	 * copying them to membuf.
 	 */
-	if (target == current)
+	if (target == current) {
+		get_cpu_vector_context();
 		riscv_v_vstate_save(current, task_pt_regs(current));
+		put_cpu_vector_context();
+	}
 
 	ptrace_vstate.vstart = vstate->vstart;
 	ptrace_vstate.vl = vstate->vl;
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 88b6220b2608..aca4a12c8416 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -86,7 +86,10 @@ static long save_v_state(struct pt_regs *regs, void __user **sc_vec)
 	/* datap is designed to be 16 byte aligned for better performance */
 	WARN_ON(unlikely(!IS_ALIGNED((unsigned long)datap, 16)));
 
+	get_cpu_vector_context();
 	riscv_v_vstate_save(current, regs);
+	put_cpu_vector_context();
+
 	/* Copy everything of vstate but datap. */
 	err = __copy_to_user(&state->v_state, &current->thread.vstate,
 			     offsetof(struct __riscv_v_ext_state, datap));
@@ -134,7 +137,7 @@ static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
 	if (unlikely(err))
 		return err;
 
-	riscv_v_vstate_restore(current, regs);
+	riscv_v_vstate_set_restore(current, regs);
 
 	return err;
 }
diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index 578b6292487e..66e8c6ab09d2 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -167,7 +167,7 @@ bool riscv_v_first_use_handler(struct pt_regs *regs)
 		return true;
 	}
 	riscv_v_vstate_on(regs);
-	riscv_v_vstate_restore(current, regs);
+	riscv_v_vstate_set_restore(current, regs);
 	return true;
 }
 
-- 
2.17.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
                   ` (3 preceding siblings ...)
  2023-12-23  4:29 ` [v8, 04/10] riscv: sched: defer restoring Vector context for user Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-27  1:27   ` Charlie Jenkins
  2023-12-27  1:34   ` Guo Ren
  2023-12-23  4:29 ` [v8, 06/10] riscv: lib: add vectorized mem* routines Andy Chiu
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Guo Ren,
	Sami Tolvanen, Han-Kuan Chen, Deepak Gupta, Andrew Jones,
	Conor Dooley, Heiko Stuebner, Aurelien Jarno, Bo YU,
	Alexandre Ghiti, Clément Léger

This patch uses Vector to perform copy_to_user()/copy_from_user(). If
Vector is available and the copy is large enough for Vector to
outperform the scalar routine, then the kernel carries out the user
copy with Vector. Though the best programming practice for users is to
reduce copies, this provides a faster variant when copies are
inevitable.

The optimal size for using Vector, riscv_v_usercopy_threshold, is only
a heuristic for now. We can add DT parsing later if people feel the
need to customize it.

The exception fixup code of __asm_vector_usercopy must fall back to
the scalar routine, because accessing user pages might fault and the
fault handling must be able to sleep. The current kernel-mode Vector
does not allow tasks to be preemptible, so we must deactivate Vector
and perform the scalar fallback in that case.

The original implementation of the Vector routines comes from
https://github.com/sifive/sifive-libc, which we have agreed to
contribute to the Linux kernel.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v8:
 - fix no-mmu build
Changelog v6:
 - Add a kconfig entry to configure threshold values (Charlie)
 - Refine assembly code (Charlie)
Changelog v4:
 - new patch since v4
---
 arch/riscv/Kconfig                      |  8 ++++
 arch/riscv/include/asm/asm-prototypes.h |  4 ++
 arch/riscv/lib/Makefile                 |  6 ++-
 arch/riscv/lib/riscv_v_helpers.c        | 44 ++++++++++++++++++++++
 arch/riscv/lib/uaccess.S                | 10 +++++
 arch/riscv/lib/uaccess_vector.S         | 50 +++++++++++++++++++++++++
 6 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/riscv_v_helpers.c
 create mode 100644 arch/riscv/lib/uaccess_vector.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..3c5ba05e8a2d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -525,6 +525,14 @@ config RISCV_ISA_V_DEFAULT_ENABLE
 
 	  If you don't know what to do here, say Y.
 
+config RISCV_ISA_V_UCOPY_THRESHOLD
+	int "Threshold size for vectorized user copies"
+	depends on RISCV_ISA_V
+	default 768
+	help
+	  Prefer using vectorized copy_to_user()/copy_from_user() when the
+	  workload size exceeds this value.
+
 config TOOLCHAIN_HAS_ZBB
 	bool
 	default y
diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index 6db1a9bbff4c..be438932f321 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -11,6 +11,10 @@ long long __ashlti3(long long a, int b);
 
 #ifdef CONFIG_RISCV_ISA_V
 
+#ifdef CONFIG_MMU
+asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n);
+#endif /* CONFIG_MMU  */
+
 void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
 		 const unsigned long *__restrict p2);
 void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 494f9cd1a00c..c8a6787d5827 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -6,9 +6,13 @@ lib-y			+= memmove.o
 lib-y			+= strcmp.o
 lib-y			+= strlen.o
 lib-y			+= strncmp.o
-lib-$(CONFIG_MMU)	+= uaccess.o
+ifeq ($(CONFIG_MMU), y)
+lib-y				+= uaccess.o
+lib-$(CONFIG_RISCV_ISA_V)	+= uaccess_vector.o
+endif
 lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 lib-$(CONFIG_RISCV_ISA_V)	+= xor.o
+lib-$(CONFIG_RISCV_ISA_V)	+= riscv_v_helpers.o
diff --git a/arch/riscv/lib/riscv_v_helpers.c b/arch/riscv/lib/riscv_v_helpers.c
new file mode 100644
index 000000000000..6cac8f4e69e9
--- /dev/null
+++ b/arch/riscv/lib/riscv_v_helpers.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 SiFive
+ * Author: Andy Chiu <andy.chiu@sifive.com>
+ */
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+#include <asm/vector.h>
+#include <asm/simd.h>
+
+#ifdef CONFIG_MMU
+#include <asm/asm-prototypes.h>
+#endif
+
+#ifdef CONFIG_MMU
+size_t riscv_v_usercopy_threshold = CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD;
+int __asm_vector_usercopy(void *dst, void *src, size_t n);
+int fallback_scalar_usercopy(void *dst, void *src, size_t n);
+asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
+{
+	size_t remain, copied;
+
+	/* skip has_vector() check because it has been done by the asm  */
+	if (!may_use_simd())
+		goto fallback;
+
+	kernel_vector_begin();
+	remain = __asm_vector_usercopy(dst, src, n);
+	kernel_vector_end();
+
+	if (remain) {
+		copied = n - remain;
+		dst += copied;
+		src += copied;
+		goto fallback;
+	}
+
+	return remain;
+
+fallback:
+	return fallback_scalar_usercopy(dst, src, n);
+}
+#endif
diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index 3ab438f30d13..a1e4a3c42925 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -3,6 +3,8 @@
 #include <asm/asm.h>
 #include <asm/asm-extable.h>
 #include <asm/csr.h>
+#include <asm/hwcap.h>
+#include <asm/alternative-macros.h>
 
 	.macro fixup op reg addr lbl
 100:
@@ -11,6 +13,13 @@
 	.endm
 
 SYM_FUNC_START(__asm_copy_to_user)
+#ifdef CONFIG_RISCV_ISA_V
+	ALTERNATIVE("j fallback_scalar_usercopy", "nop", 0, RISCV_ISA_EXT_v, CONFIG_RISCV_ISA_V)
+	REG_L	t0, riscv_v_usercopy_threshold
+	bltu	a2, t0, fallback_scalar_usercopy
+	tail enter_vector_usercopy
+#endif
+SYM_FUNC_START(fallback_scalar_usercopy)
 
 	/* Enable access to user memory */
 	li t6, SR_SUM
@@ -181,6 +190,7 @@ SYM_FUNC_START(__asm_copy_to_user)
 	sub a0, t5, a0
 	ret
 SYM_FUNC_END(__asm_copy_to_user)
+SYM_FUNC_END(fallback_scalar_usercopy)
 EXPORT_SYMBOL(__asm_copy_to_user)
 SYM_FUNC_ALIAS(__asm_copy_from_user, __asm_copy_to_user)
 EXPORT_SYMBOL(__asm_copy_from_user)
diff --git a/arch/riscv/lib/uaccess_vector.S b/arch/riscv/lib/uaccess_vector.S
new file mode 100644
index 000000000000..7bd96cee39e4
--- /dev/null
+++ b/arch/riscv/lib/uaccess_vector.S
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm-generic/export.h>
+#include <asm/asm.h>
+#include <asm/asm-extable.h>
+#include <asm/csr.h>
+
+#define pDst a0
+#define pSrc a1
+#define iNum a2
+
+#define iVL a3
+
+#define ELEM_LMUL_SETTING m8
+#define vData v0
+
+	.macro fixup op reg addr lbl
+100:
+	\op \reg, \addr
+	_asm_extable	100b, \lbl
+	.endm
+
+SYM_FUNC_START(__asm_vector_usercopy)
+	/* Enable access to user memory */
+	li t6, SR_SUM
+	csrs CSR_STATUS, t6
+
+loop:
+	vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
+	fixup vle8.v vData, (pSrc), 10f
+	fixup vse8.v vData, (pDst), 10f
+	sub iNum, iNum, iVL
+	add pSrc, pSrc, iVL
+	add pDst, pDst, iVL
+	bnez iNum, loop
+
+.Lout_copy_user:
+	/* Disable access to user memory */
+	csrc CSR_STATUS, t6
+	li	a0, 0
+	ret
+
+	/* Exception fixup code */
+10:
+	/* Disable access to user memory */
+	csrc	CSR_STATUS, t6
+	mv	a0, iNum
+	ret
+SYM_FUNC_END(__asm_vector_usercopy)
-- 
2.17.1



* [v8, 06/10] riscv: lib: add vectorized mem* routines
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
                   ` (4 preceding siblings ...)
  2023-12-23  4:29 ` [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-27  1:42   ` Charlie Jenkins
  2023-12-23  4:29 ` [v8, 07/10] riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}() Andy Chiu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Kees Cook,
	Han-Kuan Chen, Conor Dooley, Andrew Jones, Heiko Stuebner

Provide vectorized memcpy()/memset()/memmove() to accelerate common
memory operations. Also, group them under the V_OPT_TEMPLATE3 macro
because their setup/tear-down and fallback logic are the same.

The optimal size at which the kernel prefers Vector over scalar,
riscv_v_mem*_threshold, is only a heuristic for now. We can add DT
parsing later if people feel the need to customize it.
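The shared setup/tear-down-plus-fallback shape described above boils
down to a size-gated dispatcher. A hedged userspace sketch of the same
macro shape follows; the predicate stubs and the `asm_*`/`scalar_*`
backends are stand-ins for has_vector()/may_use_simd() and the real
`__asm_*_vector()`/`__*()` implementations, and kernel_vector_begin()/
kernel_vector_end() are omitted.

```c
#include <stddef.h>
#include <string.h>

/* Stubs for the kernel predicates; always "yes" in this model. */
static int has_vector(void) { return 1; }
static int may_use_simd(void) { return 1; }
static int vector_calls;	/* counts dispatches to the "vector" backend */

/* Hypothetical backends standing in for __asm_memcpy_vector()/__memcpy(). */
static void *asm_memcpy_vector(void *d, const void *s, size_t n)
{ vector_calls++; return memcpy(d, s, n); }
static void *scalar_memcpy(void *d, const void *s, size_t n)
{ return memcpy(d, s, n); }

/* Same shape as V_OPT_TEMPLATE3: take the vector path only above a
 * per-routine threshold, otherwise fall back to the scalar routine. */
#define V_OPT_TEMPLATE3(prefix, threshold)				\
void *opt_##prefix(void *a0, const void *a1, size_t n)			\
{									\
	if (has_vector() && may_use_simd() && n > (threshold))		\
		return asm_##prefix##_vector(a0, a1, n);		\
	return scalar_##prefix(a0, a1, n);				\
}

V_OPT_TEMPLATE3(memcpy, 768)	/* defines opt_memcpy() */

/* Returns how many calls were routed to the vector backend. */
int demo(void)
{
	char src[2048] = {0}, dst[2048];

	vector_calls = 0;
	opt_memcpy(dst, src, 16);	/* below threshold: scalar */
	opt_memcpy(dst, src, 2048);	/* above threshold: vector */
	return vector_calls;
}
```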

The original implementation of the Vector routines comes from
https://github.com/sifive/sifive-libc, which we have agreed to
contribute to the Linux kernel.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v7:
 - add __NO_FORTIFY to prevent conflicting function declaration with
   macro for mem* functions.
Changelog v6:
 - provide kconfig to set threshold for vectorized functions (Charlie)
 - rename *thres to *threshold (Charlie)
Changelog v4:
 - new patch since v4
---
 arch/riscv/Kconfig               | 24 ++++++++++++++++
 arch/riscv/lib/Makefile          |  3 ++
 arch/riscv/lib/memcpy_vector.S   | 29 +++++++++++++++++++
 arch/riscv/lib/memmove_vector.S  | 49 ++++++++++++++++++++++++++++++++
 arch/riscv/lib/memset_vector.S   | 33 +++++++++++++++++++++
 arch/riscv/lib/riscv_v_helpers.c | 26 +++++++++++++++++
 6 files changed, 164 insertions(+)
 create mode 100644 arch/riscv/lib/memcpy_vector.S
 create mode 100644 arch/riscv/lib/memmove_vector.S
 create mode 100644 arch/riscv/lib/memset_vector.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 3c5ba05e8a2d..cba53dcc2ae0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -533,6 +533,30 @@ config RISCV_ISA_V_UCOPY_THRESHOLD
 	  Prefer using vectorized copy_to_user()/copy_from_user() when the
 	  workload size exceeds this value.
 
+config RISCV_ISA_V_MEMSET_THRESHOLD
+	int "Threshold size for vectorized memset()"
+	depends on RISCV_ISA_V
+	default 1280
+	help
+	  Prefer using vectorized memset() when the workload size exceeds this
+	  value.
+
+config RISCV_ISA_V_MEMCPY_THRESHOLD
+	int "Threshold size for vectorized memcpy()"
+	depends on RISCV_ISA_V
+	default 768
+	help
+	  Prefer using vectorized memcpy() when the workload size exceeds this
+	  value.
+
+config RISCV_ISA_V_MEMMOVE_THRESHOLD
+	int "Threshold size for vectorized memmove()"
+	depends on RISCV_ISA_V
+	default 512
+	help
+	  Prefer using vectorized memmove() when the workload size exceeds this
+	  value.
+
 config TOOLCHAIN_HAS_ZBB
 	bool
 	default y
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index c8a6787d5827..d389dbf285fe 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -16,3 +16,6 @@ lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 lib-$(CONFIG_RISCV_ISA_V)	+= xor.o
 lib-$(CONFIG_RISCV_ISA_V)	+= riscv_v_helpers.o
+lib-$(CONFIG_RISCV_ISA_V)	+= memset_vector.o
+lib-$(CONFIG_RISCV_ISA_V)	+= memcpy_vector.o
+lib-$(CONFIG_RISCV_ISA_V)	+= memmove_vector.o
diff --git a/arch/riscv/lib/memcpy_vector.S b/arch/riscv/lib/memcpy_vector.S
new file mode 100644
index 000000000000..4176b6e0a53c
--- /dev/null
+++ b/arch/riscv/lib/memcpy_vector.S
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+#define pDst a0
+#define pSrc a1
+#define iNum a2
+
+#define iVL a3
+#define pDstPtr a4
+
+#define ELEM_LMUL_SETTING m8
+#define vData v0
+
+
+/* void *memcpy(void *, const void *, size_t) */
+SYM_FUNC_START(__asm_memcpy_vector)
+	mv pDstPtr, pDst
+loop:
+	vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
+	vle8.v vData, (pSrc)
+	sub iNum, iNum, iVL
+	add pSrc, pSrc, iVL
+	vse8.v vData, (pDstPtr)
+	add pDstPtr, pDstPtr, iVL
+	bnez iNum, loop
+	ret
+SYM_FUNC_END(__asm_memcpy_vector)
diff --git a/arch/riscv/lib/memmove_vector.S b/arch/riscv/lib/memmove_vector.S
new file mode 100644
index 000000000000..4cea9d244dc9
--- /dev/null
+++ b/arch/riscv/lib/memmove_vector.S
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+#define pDst a0
+#define pSrc a1
+#define iNum a2
+
+#define iVL a3
+#define pDstPtr a4
+#define pSrcBackwardPtr a5
+#define pDstBackwardPtr a6
+
+#define ELEM_LMUL_SETTING m8
+#define vData v0
+
+SYM_FUNC_START(__asm_memmove_vector)
+
+    mv pDstPtr, pDst
+
+    bgeu pSrc, pDst, forward_copy_loop
+    add pSrcBackwardPtr, pSrc, iNum
+    add pDstBackwardPtr, pDst, iNum
+    bltu pDst, pSrcBackwardPtr, backward_copy_loop
+
+forward_copy_loop:
+    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8.v vData, (pSrc)
+    sub iNum, iNum, iVL
+    add pSrc, pSrc, iVL
+    vse8.v vData, (pDstPtr)
+    add pDstPtr, pDstPtr, iVL
+
+    bnez iNum, forward_copy_loop
+    ret
+
+backward_copy_loop:
+    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
+
+    sub pSrcBackwardPtr, pSrcBackwardPtr, iVL
+    vle8.v vData, (pSrcBackwardPtr)
+    sub iNum, iNum, iVL
+    sub pDstBackwardPtr, pDstBackwardPtr, iVL
+    vse8.v vData, (pDstBackwardPtr)
+    bnez iNum, backward_copy_loop
+    ret
+
+SYM_FUNC_END(__asm_memmove_vector)
diff --git a/arch/riscv/lib/memset_vector.S b/arch/riscv/lib/memset_vector.S
new file mode 100644
index 000000000000..4611feed72ac
--- /dev/null
+++ b/arch/riscv/lib/memset_vector.S
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+#define pDst a0
+#define iValue a1
+#define iNum a2
+
+#define iVL a3
+#define iTemp a4
+#define pDstPtr a5
+
+#define ELEM_LMUL_SETTING m8
+#define vData v0
+
+/* void *memset(void *, int, size_t) */
+SYM_FUNC_START(__asm_memset_vector)
+
+    mv pDstPtr, pDst
+
+    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
+    vmv.v.x vData, iValue
+
+loop:
+    vse8.v vData, (pDstPtr)
+    sub iNum, iNum, iVL
+    add pDstPtr, pDstPtr, iVL
+    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
+    bnez iNum, loop
+
+    ret
+
+SYM_FUNC_END(__asm_memset_vector)
diff --git a/arch/riscv/lib/riscv_v_helpers.c b/arch/riscv/lib/riscv_v_helpers.c
index 6cac8f4e69e9..c62f333ba557 100644
--- a/arch/riscv/lib/riscv_v_helpers.c
+++ b/arch/riscv/lib/riscv_v_helpers.c
@@ -3,9 +3,13 @@
  * Copyright (C) 2023 SiFive
  * Author: Andy Chiu <andy.chiu@sifive.com>
  */
+#ifndef __NO_FORTIFY
+# define __NO_FORTIFY
+#endif
 #include <linux/linkage.h>
 #include <asm/asm.h>
 
+#include <asm/string.h>
 #include <asm/vector.h>
 #include <asm/simd.h>
 
@@ -42,3 +46,25 @@ asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
 	return fallback_scalar_usercopy(dst, src, n);
 }
 #endif
+
+#define V_OPT_TEMPLATE3(prefix, type_r, type_0, type_1)				\
+extern type_r __asm_##prefix##_vector(type_0, type_1, size_t n);		\
+type_r prefix(type_0 a0, type_1 a1, size_t n)					\
+{										\
+	type_r ret;								\
+	if (has_vector() && may_use_simd() &&					\
+	    n > riscv_v_##prefix##_threshold) {					\
+		kernel_vector_begin();						\
+		ret = __asm_##prefix##_vector(a0, a1, n);			\
+		kernel_vector_end();						\
+		return ret;							\
+	}									\
+	return __##prefix(a0, a1, n);						\
+}
+
+static size_t riscv_v_memset_threshold = CONFIG_RISCV_ISA_V_MEMSET_THRESHOLD;
+V_OPT_TEMPLATE3(memset, void *, void*, int)
+static size_t riscv_v_memcpy_threshold = CONFIG_RISCV_ISA_V_MEMCPY_THRESHOLD;
+V_OPT_TEMPLATE3(memcpy, void *, void*, const void *)
+static size_t riscv_v_memmove_threshold = CONFIG_RISCV_ISA_V_MEMMOVE_THRESHOLD;
+V_OPT_TEMPLATE3(memmove, void *, void*, const void *)
-- 
2.17.1



* [v8, 07/10] riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}()
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
                   ` (5 preceding siblings ...)
  2023-12-23  4:29 ` [v8, 06/10] riscv: lib: add vectorized mem* routines Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-23  4:29 ` [v8, 08/10] riscv: vector: use a mask to write vstate_ctrl Andy Chiu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Oleg Nesterov,
	Guo Ren, Björn Töpel, Conor Dooley,
	Clément Léger, Vincent Chen, Heiko Stuebner, Xiao Wang,
	Mathis Salmen, Haorong Lu

riscv_v_vstate_{save,restore}() only needs knowledge of
struct __riscv_v_ext_state and struct pt_regs, so let the caller decide
which vstate to pass into these functions. Meanwhile, kernel-mode
Vector is going to introduce another vstate, so this also makes the
functions reusable.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
---
Changelog v6:
 - re-added for v6
Changelog v3:
 - save V context after get_cpu_vector_context
Changelog v2:
 - fix build fail that get caught on this patch (Conor)
---
 arch/riscv/include/asm/entry-common.h  |  2 +-
 arch/riscv/include/asm/vector.h        | 14 +++++---------
 arch/riscv/kernel/kernel_mode_vector.c |  2 +-
 arch/riscv/kernel/ptrace.c             |  2 +-
 arch/riscv/kernel/signal.c             |  2 +-
 5 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/include/asm/entry-common.h b/arch/riscv/include/asm/entry-common.h
index 6361a8488642..08fe8cdbf33e 100644
--- a/arch/riscv/include/asm/entry-common.h
+++ b/arch/riscv/include/asm/entry-common.h
@@ -16,7 +16,7 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 		 * We are already called with irq disabled, so go without
 		 * keeping track of vector_context_busy.
 		 */
-		riscv_v_vstate_restore(current, regs);
+		riscv_v_vstate_restore(&current->thread.vstate, regs);
 	}
 }
 
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index e706613aae2c..c5a83c277583 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -183,23 +183,19 @@ static inline void riscv_v_vstate_discard(struct pt_regs *regs)
 	__riscv_v_vstate_dirty(regs);
 }
 
-static inline void riscv_v_vstate_save(struct task_struct *task,
+static inline void riscv_v_vstate_save(struct __riscv_v_ext_state *vstate,
 				       struct pt_regs *regs)
 {
 	if ((regs->status & SR_VS) == SR_VS_DIRTY) {
-		struct __riscv_v_ext_state *vstate = &task->thread.vstate;
-
 		__riscv_v_vstate_save(vstate, vstate->datap);
 		__riscv_v_vstate_clean(regs);
 	}
 }
 
-static inline void riscv_v_vstate_restore(struct task_struct *task,
+static inline void riscv_v_vstate_restore(struct __riscv_v_ext_state *vstate,
 					  struct pt_regs *regs)
 {
 	if ((regs->status & SR_VS) != SR_VS_OFF) {
-		struct __riscv_v_ext_state *vstate = &task->thread.vstate;
-
 		__riscv_v_vstate_restore(vstate, vstate->datap);
 		__riscv_v_vstate_clean(regs);
 	}
@@ -220,7 +216,7 @@ static inline void __switch_to_vector(struct task_struct *prev,
 	struct pt_regs *regs;
 
 	regs = task_pt_regs(prev);
-	riscv_v_vstate_save(prev, regs);
+	riscv_v_vstate_save(&prev->thread.vstate, regs);
 	riscv_v_vstate_set_restore(next, task_pt_regs(next));
 }
 
@@ -238,8 +234,8 @@ static inline bool riscv_v_vstate_query(struct pt_regs *regs) { return false; }
 static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
 #define riscv_v_vsize (0)
 #define riscv_v_vstate_discard(regs)		do {} while (0)
-#define riscv_v_vstate_save(task, regs)		do {} while (0)
-#define riscv_v_vstate_restore(task, regs)	do {} while (0)
+#define riscv_v_vstate_save(vstate, regs)	do {} while (0)
+#define riscv_v_vstate_restore(vstate, regs)	do {} while (0)
 #define __switch_to_vector(__prev, __next)	do {} while (0)
 #define riscv_v_vstate_off(regs)		do {} while (0)
 #define riscv_v_vstate_on(regs)			do {} while (0)
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index 63814e780c28..7350e975e094 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -76,7 +76,7 @@ void kernel_vector_begin(void)
 
 	get_cpu_vector_context();
 
-	riscv_v_vstate_save(current, task_pt_regs(current));
+	riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
 
 	riscv_v_enable();
 }
diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index 7b93bcbdf9fa..e8515aa9d80b 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -101,7 +101,7 @@ static int riscv_vr_get(struct task_struct *target,
 	 */
 	if (target == current) {
 		get_cpu_vector_context();
-		riscv_v_vstate_save(current, task_pt_regs(current));
+		riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
 		put_cpu_vector_context();
 	}
 
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index aca4a12c8416..5d69f4db9e8f 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -87,7 +87,7 @@ static long save_v_state(struct pt_regs *regs, void __user **sc_vec)
 	WARN_ON(unlikely(!IS_ALIGNED((unsigned long)datap, 16)));
 
 	get_cpu_vector_context();
-	riscv_v_vstate_save(current, regs);
+	riscv_v_vstate_save(&current->thread.vstate, regs);
 	put_cpu_vector_context();
 
 	/* Copy everything of vstate but datap. */
-- 
2.17.1



* [v8, 08/10] riscv: vector: use a mask to write vstate_ctrl
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
                   ` (6 preceding siblings ...)
  2023-12-23  4:29 ` [v8, 07/10] riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}() Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-23  4:29 ` [v8, 09/10] riscv: vector: use kmem_cache to manage vector context Andy Chiu
  2023-12-23  4:29 ` [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption Andy Chiu
  9 siblings, 0 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Vincent Chen,
	Conor Dooley, Joel Granados

riscv_v_ctrl_set() should only touch bits within
PR_RISCV_V_VSTATE_CTRL_MASK. So, apply the mask when actually writing
the task's vstate_ctrl.
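The masked read-modify-write above keeps bits outside the mask intact
where a plain assignment would clobber them. A minimal C sketch of the
pattern; the mask value is illustrative, not the real
PR_RISCV_V_VSTATE_CTRL_* layout:

```c
/* Illustrative stand-in for PR_RISCV_V_VSTATE_CTRL_MASK. */
#define CTRL_MASK	0x1fUL

/* Before the fix: vstate_ctrl = new_ctrl;  -- clobbers bits outside
 * the mask. After: clear only the masked bits, then OR in the new
 * (pre-masked) value, preserving everything else in the word. */
unsigned long ctrl_set(unsigned long vstate_ctrl, unsigned long new_ctrl)
{
	vstate_ctrl &= ~CTRL_MASK;
	vstate_ctrl |= (new_ctrl & CTRL_MASK);
	return vstate_ctrl;
}
```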

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v6:
 - split out from v3
---
 arch/riscv/kernel/vector.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index 66e8c6ab09d2..c1f28bc89ec6 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -122,7 +122,8 @@ static inline void riscv_v_ctrl_set(struct task_struct *tsk, int cur, int nxt,
 	ctrl |= VSTATE_CTRL_MAKE_NEXT(nxt);
 	if (inherit)
 		ctrl |= PR_RISCV_V_VSTATE_CTRL_INHERIT;
-	tsk->thread.vstate_ctrl = ctrl;
+	tsk->thread.vstate_ctrl &= ~PR_RISCV_V_VSTATE_CTRL_MASK;
+	tsk->thread.vstate_ctrl |= ctrl;
 }
 
 bool riscv_v_vstate_ctrl_user_allowed(void)
-- 
2.17.1



* [v8, 09/10] riscv: vector: use kmem_cache to manage vector context
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
                   ` (7 preceding siblings ...)
  2023-12-23  4:29 ` [v8, 08/10] riscv: vector: use a mask to write vstate_ctrl Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-23  4:29 ` [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption Andy Chiu
  9 siblings, 0 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Vincent Chen,
	Heiko Stuebner, Guo Ren, Björn Töpel, Xiao Wang,
	Clément Léger, Jisheng Zhang, Conor Dooley,
	Joel Granados

The allocation size of thread.vstate.datap is always riscv_v_vsize, so
it is possible to use kmem_cache_* to manage the allocation. This gives
users more insight into the allocation of vector contexts via
/proc/slabinfo, and it potentially reduces the latency of the
first-use trap thanks to the allocation cache.
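Conceptually, a kmem_cache is an allocator for objects of one fixed
size that recycles freed objects instead of returning them to the
general allocator, which is why it can cut first-use-trap latency. A
toy userspace model (nothing here is the real slab API; the tiny free
list stands in for the per-CPU caches):

```c
#include <stdlib.h>
#include <string.h>

/* Toy model of a fixed-size object cache: every allocation has the
 * same size (riscv_v_vsize in the patch), and freed objects are kept
 * on a small free list for cheap reuse instead of going to free(). */
struct toy_cache {
	size_t obj_size;
	void *freelist[8];
	int nfree;
};

static void *cache_zalloc(struct toy_cache *c)
{
	void *p = c->nfree ? c->freelist[--c->nfree] : malloc(c->obj_size);

	if (p)
		memset(p, 0, c->obj_size);	/* kmem_cache_zalloc() semantics */
	return p;
}

static void cache_free(struct toy_cache *c, void *p)
{
	if (c->nfree < 8)
		c->freelist[c->nfree++] = p;	/* recycle: cheap next alloc */
	else
		free(p);
}

/* Returns 1 when a freed object is handed back on the next allocation. */
int demo(void)
{
	struct toy_cache c = { .obj_size = 128 };
	void *a = cache_zalloc(&c);
	void *b;
	int reused;

	cache_free(&c, a);
	b = cache_zalloc(&c);		/* recycled from the free list */
	reused = (a == b);
	free(b);
	return reused;
}
```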

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v6:
 - new patch since v6
---
 arch/riscv/include/asm/vector.h |  4 ++++
 arch/riscv/kernel/process.c     |  7 ++++++-
 arch/riscv/kernel/vector.c      | 16 +++++++++++++++-
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index c5a83c277583..0e6741dd9ef3 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -26,6 +26,8 @@ void kernel_vector_begin(void);
 void kernel_vector_end(void);
 void get_cpu_vector_context(void);
 void put_cpu_vector_context(void);
+void riscv_v_thread_free(struct task_struct *tsk);
+void __init riscv_v_setup_ctx_cache(void);
 
 static inline void riscv_v_ctx_cnt_add(u32 offset)
 {
@@ -239,6 +241,8 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
 #define __switch_to_vector(__prev, __next)	do {} while (0)
 #define riscv_v_vstate_off(regs)		do {} while (0)
 #define riscv_v_vstate_on(regs)			do {} while (0)
+#define riscv_v_thread_free(tsk)		do {} while (0)
+#define  riscv_v_setup_ctx_cache()		do {} while (0)
 
 #endif /* CONFIG_RISCV_ISA_V */
 
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 36993f408de4..862d59c3872e 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -179,7 +179,7 @@ void arch_release_task_struct(struct task_struct *tsk)
 {
 	/* Free the vector context of datap. */
 	if (has_vector())
-		kfree(tsk->thread.vstate.datap);
+		riscv_v_thread_free(tsk);
 }
 
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
@@ -228,3 +228,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	p->thread.sp = (unsigned long)childregs; /* kernel sp */
 	return 0;
 }
+
+void __init arch_task_cache_init(void)
+{
+	riscv_v_setup_ctx_cache();
+}
diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index c1f28bc89ec6..1fe140e34557 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -21,6 +21,7 @@
 #include <asm/bug.h>
 
 static bool riscv_v_implicit_uacc = IS_ENABLED(CONFIG_RISCV_ISA_V_DEFAULT_ENABLE);
+static struct kmem_cache *riscv_v_user_cachep;
 
 unsigned long riscv_v_vsize __read_mostly;
 EXPORT_SYMBOL_GPL(riscv_v_vsize);
@@ -47,6 +48,13 @@ int riscv_v_setup_vsize(void)
 	return 0;
 }
 
+void __init riscv_v_setup_ctx_cache(void)
+{
+	riscv_v_user_cachep = kmem_cache_create_usercopy("riscv_vector_ctx",
+							 riscv_v_vsize, 16, SLAB_PANIC,
+							 0, riscv_v_vsize, NULL);
+}
+
 static bool insn_is_vector(u32 insn_buf)
 {
 	u32 opcode = insn_buf & __INSN_OPCODE_MASK;
@@ -84,7 +92,7 @@ static int riscv_v_thread_zalloc(void)
 {
 	void *datap;
 
-	datap = kzalloc(riscv_v_vsize, GFP_KERNEL);
+	datap = kmem_cache_zalloc(riscv_v_user_cachep, GFP_KERNEL);
 	if (!datap)
 		return -ENOMEM;
 
@@ -94,6 +102,12 @@ static int riscv_v_thread_zalloc(void)
 	return 0;
 }
 
+void riscv_v_thread_free(struct task_struct *tsk)
+{
+	if (tsk->thread.vstate.datap)
+		kmem_cache_free(riscv_v_user_cachep, tsk->thread.vstate.datap);
+}
+
 #define VSTATE_CTRL_GET_CUR(x) ((x) & PR_RISCV_V_VSTATE_CTRL_CUR_MASK)
 #define VSTATE_CTRL_GET_NEXT(x) (((x) & PR_RISCV_V_VSTATE_CTRL_NEXT_MASK) >> 2)
 #define VSTATE_CTRL_MAKE_NEXT(x) (((x) << 2) & PR_RISCV_V_VSTATE_CTRL_NEXT_MASK)
-- 
2.17.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption
  2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
                   ` (8 preceding siblings ...)
  2023-12-23  4:29 ` [v8, 09/10] riscv: vector: use kmem_cache to manage vector context Andy Chiu
@ 2023-12-23  4:29 ` Andy Chiu
  2023-12-27 12:12   ` Song Shuai
  2023-12-27 22:45   ` Samuel Holland
  9 siblings, 2 replies; 24+ messages in thread
From: Andy Chiu @ 2023-12-23  4:29 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Andy Chiu, Albert Ou, Guo Ren,
	Sami Tolvanen, Han-Kuan Chen, Deepak Gupta, Vincent Chen,
	Heiko Stuebner, Baoquan He, Clément Léger,
	Björn Töpel, Xiao Wang, Nathan Chancellor,
	Jisheng Zhang, Conor Dooley, Joel Granados

Add kernel_vstate to keep track of kernel-mode Vector registers when a
trap-introduced context switch happens. Also, provide riscv_v_flags to
let the context save/restore routines track context status. Context
tracking happens whenever the core starts its in-kernel Vector
execution. An active (dirty) kernel task's V context is saved to memory
whenever a trap-introduced context switch happens, or when a softirq
that nests on top of it uses Vector. Context restoring happens when
execution transfers back to the original kernel context where it first
enabled preempt_v.

Also, provide a config CONFIG_RISCV_ISA_V_PREEMPTIVE to give users an
option to disable preemptible kernel-mode Vector at build time. Users
with constrained memory may want to disable this config, as preemptible
kernel-mode Vector needs extra space to track each thread's kernel-mode
V context. Users may also want to disable it if all their kernel-mode
Vector code is time-sensitive and cannot tolerate context-switch
overhead.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v8:
 - fix -Wmissing-prototypes for functions with asmlinkage
Changelog v6:
 - re-write patch to handle context nesting for softirqs
 - drop thread flag and track context instead in riscv_v_flags
 - refine some asm code and constraint it into C functions
 - preallocate v context for preempt_v
 - Return non-zero in riscv_v_start_kernel_context with non-preemptible
   kernel-mode Vector
Changelog v4:
 - dropped from v4
Changelog v3:
 - Guard vstate_save with {get,set}_cpu_vector_context
 - Add comments on preventions of nesting V contexts
 - remove warnings in context switch when trap's reg is not present (Conor)
 - refactor code (Björn)
Changelog v2:
 - fix build fail when compiling without RISCV_ISA_V (Conor)
 - 's/TIF_RISCV_V_KMV/TIF_RISCV_V_KERNEL_MODE' and add comment (Conor)
 - merge Kconfig patch into this one (Conor).
 - 's/CONFIG_RISCV_ISA_V_PREEMPTIVE_KMV/CONFIG_RISCV_ISA_V_PREEMPTIVE/'
   (Conor)
 - fix some typos (Conor)
 - enclose assembly with RISCV_ISA_V_PREEMPTIVE.
 - change riscv_v_vstate_ctrl_config_kmv() to
   kernel_vector_allow_preemption() for better understanding. (Conor)
 - 's/riscv_v_kmv_preempitble/kernel_vector_preemptible/'
---
 arch/riscv/Kconfig                      |  14 +++
 arch/riscv/include/asm/asm-prototypes.h |   5 +
 arch/riscv/include/asm/processor.h      |  26 ++++-
 arch/riscv/include/asm/simd.h           |  26 ++++-
 arch/riscv/include/asm/vector.h         |  57 ++++++++++-
 arch/riscv/kernel/entry.S               |   8 ++
 arch/riscv/kernel/kernel_mode_vector.c  | 124 +++++++++++++++++++++++-
 arch/riscv/kernel/process.c             |   3 +
 arch/riscv/kernel/vector.c              |  31 ++++--
 9 files changed, 273 insertions(+), 21 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index cba53dcc2ae0..70603c486593 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -557,6 +557,20 @@ config RISCV_ISA_V_MEMMOVE_THRESHOLD
 	  Prefer using vectorized memmove() when the workload size exceeds this
 	  value.
 
+config RISCV_ISA_V_PREEMPTIVE
+	bool "Run kernel-mode Vector with kernel preemption"
+	depends on PREEMPTION
+	depends on RISCV_ISA_V
+	default y
+	help
+	  Usually, in-kernel SIMD routines are run with preemption disabled.
+	  Functions which invoke long-running SIMD thus must yield the core's
+	  vector unit to prevent blocking other tasks for too long.
+
+	  This config allows the kernel to run SIMD without explicitly
+	  disabling preemption. Enabling this config will result in higher
+	  memory consumption due to the allocation of per-task kernel Vector context.
+
 config TOOLCHAIN_HAS_ZBB
 	bool
 	default y
diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index be438932f321..cd627ec289f1 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -30,6 +30,11 @@ void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1,
 		 const unsigned long *__restrict p4,
 		 const unsigned long *__restrict p5);
 
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+asmlinkage void riscv_v_context_nesting_start(struct pt_regs *regs);
+asmlinkage void riscv_v_context_nesting_end(struct pt_regs *regs);
+#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
+
 #endif /* CONFIG_RISCV_ISA_V */
 
 #define DECLARE_DO_ERROR_INFO(name)	asmlinkage void name(struct pt_regs *regs)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 15781e2232e0..4de9124bcf4f 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -81,11 +81,32 @@ struct pt_regs;
  *    activation of this state disables the preemption. On a non-RT kernel, it
  *    also disable bh. Currently only 0 and 1 are valid value for this field.
  *    Other values are reserved for future uses.
+ *  - bits 8-15 are used for tracking preemptible kernel-mode Vector, when
+ *    RISCV_ISA_V_PREEMPTIVE is set. Calling kernel_vector_begin() does not
+ *    disable the preemption if the thread's kernel_vstate.datap is allocated.
+ *    Instead, the kernel adds 1 into this field. Then the trap entry/exit code
+ *    knows if we are entering/exiting the context that owns preempt_v.
+ *     - 0: the task is not using preempt_v
+ *     - 1: the task is actively using, and owns preempt_v
+ *     - >1: the task was using preempt_v, but then took a trap within. Thus,
+ *       the task does not own preempt_v. Any use of Vector will have to save
+ *       preempt_v, if dirty, and fallback to non-preemptible kernel-mode
+ *       Vector.
+ *   - bit 30: The in-kernel preempt_v context has been saved, and requires
+ *     a restore when returning to the context that owns preempt_v.
+ *   - bit 31: The in-kernel preempt_v context is dirty, as signaled by the
+ *     trap entry code. Any context switch out of the current task needs to
+ *     save it to the task's in-kernel V context. Also, any trap nesting on
+ *     top of preempt_v that requests to use V needs a save.
  */
 
-#define RISCV_KERNEL_MODE_V_MASK	0xff
+#define RISCV_KERNEL_MODE_V_MASK	0x000000ff
+#define RISCV_PREEMPT_V_MASK		0x0000ff00
 
-#define RISCV_KERNEL_MODE_V	0x1
+#define RISCV_KERNEL_MODE_V		0x00000001
+#define RISCV_PREEMPT_V			0x00000100
+#define RISCV_PREEMPT_V_DIRTY		0x80000000
+#define RISCV_PREEMPT_V_NEED_RESTORE	0x40000000
 
 /* CPU-specific state of a task */
 struct thread_struct {
@@ -99,6 +120,7 @@ struct thread_struct {
 	u32 vstate_ctrl;
 	struct __riscv_v_ext_state vstate;
 	unsigned long align_ctl;
+	struct __riscv_v_ext_state kernel_vstate;
 };
 
 /* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
index 2f1e95ccb03c..7daccdcbdee8 100644
--- a/arch/riscv/include/asm/simd.h
+++ b/arch/riscv/include/asm/simd.h
@@ -12,6 +12,7 @@
 #include <linux/percpu.h>
 #include <linux/preempt.h>
 #include <linux/types.h>
+#include <linux/thread_info.h>
 
 #include <asm/vector.h>
 
@@ -28,12 +29,27 @@ static __must_check inline bool may_use_simd(void)
 	/*
 	 * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
 	 * and is clear whenever preemption is enabled.
-	 *
-	 * Kernel-mode Vector temporarily disables bh. So we must not return
-	 * true on irq_disabled(). Otherwise we would fail the lockdep check
-	 * calling local_bh_enable()
 	 */
-	return !in_hardirq() && !in_nmi() && !irqs_disabled() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
+	if (in_hardirq() || in_nmi())
+		return false;
+
+	/*
+	 * Nesting is achieved in preempt_v by spreading the control for
+	 * preemptible and non-preemptible kernel-mode Vector into two fields.
+	 * Always try to match with preempt_v if a kernel V-context exists. Then,
+	 * fall back to checking non-preempt_v if nesting happens, or if the
+	 * config is not set.
+	 */
+	if (IS_ENABLED(CONFIG_RISCV_ISA_V_PREEMPTIVE) && current->thread.kernel_vstate.datap) {
+		if (!riscv_preempt_v_started(current))
+			return true;
+	}
+	/*
+	 * Non-preemptible kernel-mode Vector temporarily disables bh. So we
+	 * must not return true when irqs_disabled(). Otherwise we would fail
+	 * the lockdep check when calling local_bh_enable().
+	 */
+	return !irqs_disabled() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
 }
 
 #else /* ! CONFIG_RISCV_ISA_V */
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 0e6741dd9ef3..542eaf9227c3 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -28,6 +28,7 @@ void get_cpu_vector_context(void);
 void put_cpu_vector_context(void);
 void riscv_v_thread_free(struct task_struct *tsk);
 void __init riscv_v_setup_ctx_cache(void);
+void riscv_v_thread_alloc(struct task_struct *tsk);
 
 static inline void riscv_v_ctx_cnt_add(u32 offset)
 {
@@ -212,14 +213,63 @@ static inline void riscv_v_vstate_set_restore(struct task_struct *task,
 	}
 }
 
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+static inline bool riscv_preempt_v_dirty(struct task_struct *task)
+{
+	u32 val = READ_ONCE(task->thread.riscv_v_flags);
+
+	return !!(val & RISCV_PREEMPT_V_DIRTY);
+}
+
+static inline bool riscv_preempt_v_restore(struct task_struct *task)
+{
+	u32 val = READ_ONCE(task->thread.riscv_v_flags);
+
+	return !!(val & RISCV_PREEMPT_V_NEED_RESTORE);
+}
+
+static inline void riscv_preempt_v_clear_dirty(struct task_struct *task)
+{
+	barrier();
+	task->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_DIRTY;
+}
+
+static inline void riscv_preempt_v_set_restore(struct task_struct *task)
+{
+	barrier();
+	task->thread.riscv_v_flags |= RISCV_PREEMPT_V_NEED_RESTORE;
+}
+
+static inline bool riscv_preempt_v_started(struct task_struct *task)
+{
+	return !!(READ_ONCE(task->thread.riscv_v_flags) & RISCV_PREEMPT_V_MASK);
+}
+#else /* !CONFIG_RISCV_ISA_V_PREEMPTIVE */
+static inline bool riscv_preempt_v_dirty(struct task_struct *task) { return false; }
+static inline bool riscv_preempt_v_restore(struct task_struct *task) { return false; }
+static inline bool riscv_preempt_v_started(struct task_struct *task) { return false; }
+#define riscv_preempt_v_clear_dirty(tsk)	do {} while (0)
+#define riscv_preempt_v_set_restore(tsk)	do {} while (0)
+#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
+
 static inline void __switch_to_vector(struct task_struct *prev,
 				      struct task_struct *next)
 {
 	struct pt_regs *regs;
 
-	regs = task_pt_regs(prev);
-	riscv_v_vstate_save(&prev->thread.vstate, regs);
-	riscv_v_vstate_set_restore(next, task_pt_regs(next));
+	if (riscv_preempt_v_dirty(prev)) {
+		__riscv_v_vstate_save(&prev->thread.kernel_vstate,
+				      prev->thread.kernel_vstate.datap);
+		riscv_preempt_v_clear_dirty(prev);
+	} else {
+		regs = task_pt_regs(prev);
+		riscv_v_vstate_save(&prev->thread.vstate, regs);
+	}
+
+	if (riscv_preempt_v_started(next))
+		riscv_preempt_v_set_restore(next);
+	else
+		riscv_v_vstate_set_restore(next, task_pt_regs(next));
 }
 
 void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
@@ -243,6 +293,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
 #define riscv_v_vstate_on(regs)			do {} while (0)
 #define riscv_v_thread_free(tsk)		do {} while (0)
+#define riscv_v_setup_ctx_cache()		do {} while (0)
+#define riscv_v_thread_alloc(tsk)		do {} while (0)
 
 #endif /* CONFIG_RISCV_ISA_V */
 
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 54ca4564a926..9d1a305d5508 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -83,6 +83,10 @@ SYM_CODE_START(handle_exception)
 	/* Load the kernel shadow call stack pointer if coming from userspace */
 	scs_load_current_if_task_changed s5
 
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+	move a0, sp
+	call riscv_v_context_nesting_start
+#endif
 	move a0, sp /* pt_regs */
 	la ra, ret_from_exception
 
@@ -138,6 +142,10 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
 	 */
 	csrw CSR_SCRATCH, tp
 1:
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+	move a0, sp
+	call riscv_v_context_nesting_end
+#endif
 	REG_L a0, PT_STATUS(sp)
 	/*
 	 * The current load reservation is effectively part of the processor's
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index 7350e975e094..75d6b00842b3 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -14,6 +14,9 @@
 #include <asm/vector.h>
 #include <asm/switch_to.h>
 #include <asm/simd.h>
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+#include <asm/asm-prototypes.h>
+#endif
 
 /*
  * Claim ownership of the CPU vector context for use by the calling context.
@@ -54,6 +57,111 @@ void put_cpu_vector_context(void)
 		preempt_enable();
 }
 
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+static inline void riscv_preempt_v_set_dirty(void)
+{
+	current->thread.riscv_v_flags |= RISCV_PREEMPT_V_DIRTY;
+}
+
+static inline void riscv_preempt_v_reset_flags(void)
+{
+	current->thread.riscv_v_flags &= ~(RISCV_PREEMPT_V_DIRTY | RISCV_PREEMPT_V_NEED_RESTORE);
+}
+
+static inline void riscv_preempt_v_depth_inc(void)
+{
+	riscv_v_ctx_cnt_add(RISCV_PREEMPT_V);
+}
+
+static inline void riscv_preempt_v_depth_dec(void)
+{
+	riscv_v_ctx_cnt_sub(RISCV_PREEMPT_V);
+}
+
+static inline u32 riscv_preempt_v_get_depth(void)
+{
+	return riscv_v_ctx_cnt() & RISCV_PREEMPT_V_MASK;
+}
+
+#define PREEMPT_V_FIRST_DEPTH	RISCV_PREEMPT_V
+static int riscv_v_stop_kernel_context(void)
+{
+	if (riscv_preempt_v_get_depth() != PREEMPT_V_FIRST_DEPTH)
+		return 1;
+
+	riscv_preempt_v_depth_dec();
+	return 0;
+}
+
+static int riscv_v_start_kernel_context(bool *is_nested)
+{
+	struct __riscv_v_ext_state *vstate = &current->thread.kernel_vstate;
+
+	if (!vstate->datap)
+		return -ENOENT;
+
+	if (riscv_preempt_v_started(current)) {
+		WARN_ON(riscv_preempt_v_get_depth() == PREEMPT_V_FIRST_DEPTH);
+		if (riscv_preempt_v_dirty(current)) {
+			get_cpu_vector_context();
+			__riscv_v_vstate_save(vstate, vstate->datap);
+			riscv_preempt_v_clear_dirty(current);
+			put_cpu_vector_context();
+		}
+		get_cpu_vector_context();
+		riscv_preempt_v_set_restore(current);
+		*is_nested = true;
+		return 0;
+	}
+
+	get_cpu_vector_context();
+	riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
+	put_cpu_vector_context();
+
+	riscv_preempt_v_depth_inc();
+	return 0;
+}
+
+/* low-level V context handling code, called with irq disabled */
+asmlinkage void riscv_v_context_nesting_start(struct pt_regs *regs)
+{
+	int depth;
+
+	if (!riscv_preempt_v_started(current))
+		return;
+
+	depth = riscv_preempt_v_get_depth();
+	if (depth == PREEMPT_V_FIRST_DEPTH && (regs->status & SR_VS) == SR_VS_DIRTY)
+		riscv_preempt_v_set_dirty();
+
+	riscv_preempt_v_depth_inc();
+}
+
+asmlinkage void riscv_v_context_nesting_end(struct pt_regs *regs)
+{
+	struct __riscv_v_ext_state *vstate = &current->thread.kernel_vstate;
+	u32 depth;
+
+	lockdep_assert_irqs_disabled();
+
+	if (!riscv_preempt_v_started(current))
+		return;
+
+	riscv_preempt_v_depth_dec();
+	depth = riscv_preempt_v_get_depth();
+	if (depth == PREEMPT_V_FIRST_DEPTH) {
+		if (riscv_preempt_v_restore(current)) {
+			__riscv_v_vstate_restore(vstate, vstate->datap);
+			__riscv_v_vstate_clean(regs);
+		}
+		riscv_preempt_v_reset_flags();
+	}
+}
+#else
+#define riscv_v_start_kernel_context(nested)	(-ENOENT)
+#define riscv_v_stop_kernel_context()		(-ENOENT)
+#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
+
 /*
  * kernel_vector_begin(): obtain the CPU vector registers for use by the calling
  * context
@@ -69,14 +177,20 @@ void put_cpu_vector_context(void)
  */
 void kernel_vector_begin(void)
 {
+	bool nested = false;
+
 	if (WARN_ON(!has_vector()))
 		return;
 
 	BUG_ON(!may_use_simd());
 
-	get_cpu_vector_context();
+	if (riscv_v_start_kernel_context(&nested)) {
+		get_cpu_vector_context();
+		riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
+	}
 
-	riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
+	if (!nested)
+		riscv_v_vstate_set_restore(current, task_pt_regs(current));
 
 	riscv_v_enable();
 }
@@ -96,10 +210,10 @@ void kernel_vector_end(void)
 	if (WARN_ON(!has_vector()))
 		return;
 
-	riscv_v_vstate_set_restore(current, task_pt_regs(current));
-
 	riscv_v_disable();
 
-	put_cpu_vector_context();
+	if (riscv_v_stop_kernel_context()) {
+		put_cpu_vector_context();
+	}
 }
 EXPORT_SYMBOL_GPL(kernel_vector_end);
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 862d59c3872e..92922dbd5b5c 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -188,6 +188,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	*dst = *src;
 	/* clear entire V context, including datap for a new task */
 	memset(&dst->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
+	memset(&dst->thread.kernel_vstate, 0, sizeof(struct __riscv_v_ext_state));
 	clear_tsk_thread_flag(dst, TIF_RISCV_V_DEFER_RESTORE);
 
 	return 0;
@@ -224,6 +225,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		p->thread.s[0] = 0;
 	}
 	p->thread.riscv_v_flags = 0;
+	if (has_vector())
+		riscv_v_thread_alloc(p);
 	p->thread.ra = (unsigned long)ret_from_fork;
 	p->thread.sp = (unsigned long)childregs; /* kernel sp */
 	return 0;
diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index 1fe140e34557..f9769703fd39 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -22,6 +22,9 @@
 
 static bool riscv_v_implicit_uacc = IS_ENABLED(CONFIG_RISCV_ISA_V_DEFAULT_ENABLE);
 static struct kmem_cache *riscv_v_user_cachep;
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+static struct kmem_cache *riscv_v_kernel_cachep;
+#endif
 
 unsigned long riscv_v_vsize __read_mostly;
 EXPORT_SYMBOL_GPL(riscv_v_vsize);
@@ -53,6 +56,11 @@ void __init riscv_v_setup_ctx_cache(void)
 	riscv_v_user_cachep = kmem_cache_create_usercopy("riscv_vector_ctx",
 							 riscv_v_vsize, 16, SLAB_PANIC,
 							 0, riscv_v_vsize, NULL);
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+	riscv_v_kernel_cachep = kmem_cache_create("riscv_vector_kctx",
+						  riscv_v_vsize, 16,
+						  SLAB_PANIC, NULL);
+#endif
 }
 
 static bool insn_is_vector(u32 insn_buf)
@@ -88,24 +96,35 @@ static bool insn_is_vector(u32 insn_buf)
 	return false;
 }
 
-static int riscv_v_thread_zalloc(void)
+static int riscv_v_thread_zalloc(struct kmem_cache *cache,
+				 struct __riscv_v_ext_state *ctx)
 {
 	void *datap;
 
-	datap = kmem_cache_zalloc(riscv_v_user_cachep, GFP_KERNEL);
+	datap = kmem_cache_zalloc(cache, GFP_KERNEL);
 	if (!datap)
 		return -ENOMEM;
 
-	current->thread.vstate.datap = datap;
-	memset(&current->thread.vstate, 0, offsetof(struct __riscv_v_ext_state,
-						    datap));
+	ctx->datap = datap;
+	memset(ctx, 0, offsetof(struct __riscv_v_ext_state, datap));
 	return 0;
 }
 
+void riscv_v_thread_alloc(struct task_struct *tsk)
+{
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+	riscv_v_thread_zalloc(riscv_v_kernel_cachep, &tsk->thread.kernel_vstate);
+#endif
+}
+
 void riscv_v_thread_free(struct task_struct *tsk)
 {
 	if (tsk->thread.vstate.datap)
 		kmem_cache_free(riscv_v_user_cachep, tsk->thread.vstate.datap);
+#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
+	if (tsk->thread.kernel_vstate.datap)
+		kmem_cache_free(riscv_v_kernel_cachep, tsk->thread.kernel_vstate.datap);
+#endif
 }
 
 #define VSTATE_CTRL_GET_CUR(x) ((x) & PR_RISCV_V_VSTATE_CTRL_CUR_MASK)
@@ -177,7 +196,7 @@ bool riscv_v_first_use_handler(struct pt_regs *regs)
 	 * context where VS has been off. So, try to allocate the user's V
 	 * context and resume execution.
 	 */
-	if (riscv_v_thread_zalloc()) {
+	if (riscv_v_thread_zalloc(riscv_v_user_cachep, &current->thread.vstate)) {
 		force_sig(SIGBUS);
 		return true;
 	}
-- 
2.17.1




* Re: [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user
  2023-12-23  4:29 ` [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user Andy Chiu
@ 2023-12-27  1:27   ` Charlie Jenkins
  2023-12-27  1:34   ` Guo Ren
  1 sibling, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-12-27  1:27 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	ardb, arnd, peterz, tglx, ebiggers, Albert Ou, Guo Ren,
	Sami Tolvanen, Han-Kuan Chen, Deepak Gupta, Andrew Jones,
	Conor Dooley, Heiko Stuebner, Aurelien Jarno, Bo YU,
	Alexandre Ghiti, Clément Léger

On Sat, Dec 23, 2023 at 04:29:09AM +0000, Andy Chiu wrote:
> This patch utilizes Vector to perform copy_to_user/copy_from_user. If
> Vector is available and the size of copy is large enough for Vector to
> perform better than scalar, then direct the kernel to do Vector copies
> for userspace. Though the best programming practice for users is to
> reduce the copy, this provides a faster variant when copies are
> inevitable.
> 
> The optimal size for using Vector, copy_to_user_thres, is only a
> heuristic for now. We can add DT parsing if people feel the need to
> customize it.
> 
> The exception fixup code of __asm_vector_usercopy must fall back to
> the scalar one because accessing user pages might fault, and must be
> sleepable. Current kernel-mode Vector does not allow tasks to be
> preemptible, so we must deactivate Vector and perform a scalar fallback
> in such cases.
> 
> The original implementation of Vector operations comes from
> https://github.com/sifive/sifive-libc, which we agree to contribute to
> Linux kernel.
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v8:
>  - fix no-mmu build
> Changelog v6:
>  - Add a kconfig entry to configure threshold values (Charlie)
>  - Refine assembly code (Charlie)
> Changelog v4:
>  - new patch since v4
> ---
>  arch/riscv/Kconfig                      |  8 ++++
>  arch/riscv/include/asm/asm-prototypes.h |  4 ++
>  arch/riscv/lib/Makefile                 |  6 ++-
>  arch/riscv/lib/riscv_v_helpers.c        | 44 ++++++++++++++++++++++
>  arch/riscv/lib/uaccess.S                | 10 +++++
>  arch/riscv/lib/uaccess_vector.S         | 50 +++++++++++++++++++++++++
>  6 files changed, 121 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/lib/riscv_v_helpers.c
>  create mode 100644 arch/riscv/lib/uaccess_vector.S
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 95a2a06acc6a..3c5ba05e8a2d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -525,6 +525,14 @@ config RISCV_ISA_V_DEFAULT_ENABLE
>  
>  	  If you don't know what to do here, say Y.
>  
> +config RISCV_ISA_V_UCOPY_THRESHOLD
> +	int "Threshold size for vectorized user copies"
> +	depends on RISCV_ISA_V
> +	default 768
> +	help
> +	  Prefer using vectorized copy_to_user()/copy_from_user() when the
> +	  workload size exceeds this value.
> +
>  config TOOLCHAIN_HAS_ZBB
>  	bool
>  	default y
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index 6db1a9bbff4c..be438932f321 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -11,6 +11,10 @@ long long __ashlti3(long long a, int b);
>  
>  #ifdef CONFIG_RISCV_ISA_V
>  
> +#ifdef CONFIG_MMU
> +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n);
> +#endif /* CONFIG_MMU  */
> +
>  void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
>  		 const unsigned long *__restrict p2);
>  void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 494f9cd1a00c..c8a6787d5827 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -6,9 +6,13 @@ lib-y			+= memmove.o
>  lib-y			+= strcmp.o
>  lib-y			+= strlen.o
>  lib-y			+= strncmp.o
> -lib-$(CONFIG_MMU)	+= uaccess.o
> +ifeq ($(CONFIG_MMU), y)
> +lib-y				+= uaccess.o
> +lib-$(CONFIG_RISCV_ISA_V)	+= uaccess_vector.o
> +endif
>  lib-$(CONFIG_64BIT)	+= tishift.o
>  lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
>  
>  obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>  lib-$(CONFIG_RISCV_ISA_V)	+= xor.o
> +lib-$(CONFIG_RISCV_ISA_V)	+= riscv_v_helpers.o
> diff --git a/arch/riscv/lib/riscv_v_helpers.c b/arch/riscv/lib/riscv_v_helpers.c
> new file mode 100644
> index 000000000000..6cac8f4e69e9
> --- /dev/null
> +++ b/arch/riscv/lib/riscv_v_helpers.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2023 SiFive
> + * Author: Andy Chiu <andy.chiu@sifive.com>
> + */
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +
> +#include <asm/vector.h>
> +#include <asm/simd.h>
> +
> +#ifdef CONFIG_MMU
> +#include <asm/asm-prototypes.h>
> +#endif
> +
> +#ifdef CONFIG_MMU
> +size_t riscv_v_usercopy_threshold = CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD;
> +int __asm_vector_usercopy(void *dst, void *src, size_t n);
> +int fallback_scalar_usercopy(void *dst, void *src, size_t n);
> +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
> +{
> +	size_t remain, copied;
> +
> +	/* skip has_vector() check because it has been done by the asm  */
> +	if (!may_use_simd())
> +		goto fallback;
> +
> +	kernel_vector_begin();
> +	remain = __asm_vector_usercopy(dst, src, n);
> +	kernel_vector_end();
> +
> +	if (remain) {
> +		copied = n - remain;
> +		dst += copied;
> +		src += copied;
> +		goto fallback;
> +	}
> +
> +	return remain;
> +
> +fallback:
> +	return fallback_scalar_usercopy(dst, src, n);
> +}
> +#endif
> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
> index 3ab438f30d13..a1e4a3c42925 100644
> --- a/arch/riscv/lib/uaccess.S
> +++ b/arch/riscv/lib/uaccess.S
> @@ -3,6 +3,8 @@
>  #include <asm/asm.h>
>  #include <asm/asm-extable.h>
>  #include <asm/csr.h>
> +#include <asm/hwcap.h>
> +#include <asm/alternative-macros.h>
>  
>  	.macro fixup op reg addr lbl
>  100:
> @@ -11,6 +13,13 @@
>  	.endm
>  
>  SYM_FUNC_START(__asm_copy_to_user)
> +#ifdef CONFIG_RISCV_ISA_V
> +	ALTERNATIVE("j fallback_scalar_usercopy", "nop", 0, RISCV_ISA_EXT_v, CONFIG_RISCV_ISA_V)
> +	REG_L	t0, riscv_v_usercopy_threshold
> +	bltu	a2, t0, fallback_scalar_usercopy
> +	tail enter_vector_usercopy
> +#endif
> +SYM_FUNC_START(fallback_scalar_usercopy)
>  
>  	/* Enable access to user memory */
>  	li t6, SR_SUM
> @@ -181,6 +190,7 @@ SYM_FUNC_START(__asm_copy_to_user)
>  	sub a0, t5, a0
>  	ret
>  SYM_FUNC_END(__asm_copy_to_user)
> +SYM_FUNC_END(fallback_scalar_usercopy)
>  EXPORT_SYMBOL(__asm_copy_to_user)
>  SYM_FUNC_ALIAS(__asm_copy_from_user, __asm_copy_to_user)
>  EXPORT_SYMBOL(__asm_copy_from_user)
> diff --git a/arch/riscv/lib/uaccess_vector.S b/arch/riscv/lib/uaccess_vector.S
> new file mode 100644
> index 000000000000..7bd96cee39e4
> --- /dev/null
> +++ b/arch/riscv/lib/uaccess_vector.S
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm-generic/export.h>
> +#include <asm/asm.h>
> +#include <asm/asm-extable.h>
> +#include <asm/csr.h>
> +
> +#define pDst a0
> +#define pSrc a1
> +#define iNum a2
> +
> +#define iVL a3
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +	.macro fixup op reg addr lbl
> +100:
> +	\op \reg, \addr
> +	_asm_extable	100b, \lbl
> +	.endm
> +
> +SYM_FUNC_START(__asm_vector_usercopy)
> +	/* Enable access to user memory */
> +	li t6, SR_SUM
> +	csrs CSR_STATUS, t6
> +
> +loop:
> +	vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +	fixup vle8.v vData, (pSrc), 10f
> +	fixup vse8.v vData, (pDst), 10f
> +	sub iNum, iNum, iVL
> +	add pSrc, pSrc, iVL
> +	add pDst, pDst, iVL
> +	bnez iNum, loop
> +
> +.Lout_copy_user:
> +	/* Disable access to user memory */
> +	csrc CSR_STATUS, t6
> +	li	a0, 0

It appears that iNum will always equal 0 at this line. Can this section
be eliminated and handled by the following fixup code or is there a
reason to keep them separate?

- Charlie

> +	ret
> +
> +	/* Exception fixup code */
> +10:
> +	/* Disable access to user memory */
> +	csrc	CSR_STATUS, t6
> +	mv	a0, iNum
> +	ret
> +SYM_FUNC_END(__asm_vector_usercopy)
> -- 
> 2.17.1
> 



* Re: [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user
  2023-12-23  4:29 ` [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user Andy Chiu
  2023-12-27  1:27   ` Charlie Jenkins
@ 2023-12-27  1:34   ` Guo Ren
  2023-12-27  3:15     ` Andy Chiu
  1 sibling, 1 reply; 24+ messages in thread
From: Guo Ren @ 2023-12-27  1:34 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	charlie, ardb, arnd, peterz, tglx, ebiggers, Albert Ou,
	Sami Tolvanen, Han-Kuan Chen, Deepak Gupta, Andrew Jones,
	Conor Dooley, Heiko Stuebner, Aurelien Jarno, Bo YU,
	Alexandre Ghiti, Clément Léger

On Sat, Dec 23, 2023 at 12:30 PM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> This patch utilizes Vector to perform copy_to_user/copy_from_user. If
> Vector is available and the size of copy is large enough for Vector to
> perform better than scalar, then direct the kernel to do Vector copies
> for userspace. Though the best programming practice for users is to
> reduce the copy, this provides a faster variant when copies are
> inevitable.
>
> The optimal size for using Vector, copy_to_user_thres, is only a
> heuristic for now. We can add DT parsing if people feel the need to
> customize it.
>
> The exception fixup code of __asm_vector_usercopy must fall back to
> the scalar one because accessing user pages might fault, and must be
> sleepable. Current kernel-mode Vector does not allow tasks to be
> preemptible, so we must deactivate Vector and perform a scalar fallback
> in such cases.
>
> The original implementation of Vector operations comes from
> https://github.com/sifive/sifive-libc, which we agree to contribute to
> Linux kernel.
>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v8:
>  - fix no-mmu build
> Changelog v6:
>  - Add a kconfig entry to configure threshold values (Charlie)
>  - Refine assembly code (Charlie)
> Changelog v4:
>  - new patch since v4
> ---
>  arch/riscv/Kconfig                      |  8 ++++
>  arch/riscv/include/asm/asm-prototypes.h |  4 ++
>  arch/riscv/lib/Makefile                 |  6 ++-
>  arch/riscv/lib/riscv_v_helpers.c        | 44 ++++++++++++++++++++++
>  arch/riscv/lib/uaccess.S                | 10 +++++
>  arch/riscv/lib/uaccess_vector.S         | 50 +++++++++++++++++++++++++
>  6 files changed, 121 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/lib/riscv_v_helpers.c
>  create mode 100644 arch/riscv/lib/uaccess_vector.S
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 95a2a06acc6a..3c5ba05e8a2d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -525,6 +525,14 @@ config RISCV_ISA_V_DEFAULT_ENABLE
>
>           If you don't know what to do here, say Y.
>
> +config RISCV_ISA_V_UCOPY_THRESHOLD
> +       int "Threshold size for vectorized user copies"
> +       depends on RISCV_ISA_V
> +       default 768
> +       help
> +         Prefer using vectorized copy_to_user()/copy_from_user() when the
> +         workload size exceeds this value.
> +
>  config TOOLCHAIN_HAS_ZBB
>         bool
>         default y
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index 6db1a9bbff4c..be438932f321 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -11,6 +11,10 @@ long long __ashlti3(long long a, int b);
>
>  #ifdef CONFIG_RISCV_ISA_V
>
> +#ifdef CONFIG_MMU
> +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n);
> +#endif /* CONFIG_MMU  */
> +
>  void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
>                  const unsigned long *__restrict p2);
>  void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 494f9cd1a00c..c8a6787d5827 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -6,9 +6,13 @@ lib-y                  += memmove.o
>  lib-y                  += strcmp.o
>  lib-y                  += strlen.o
>  lib-y                  += strncmp.o
> -lib-$(CONFIG_MMU)      += uaccess.o
> +ifeq ($(CONFIG_MMU), y)
> +lib-y                          += uaccess.o
> +lib-$(CONFIG_RISCV_ISA_V)      += uaccess_vector.o
> +endif
>  lib-$(CONFIG_64BIT)    += tishift.o
>  lib-$(CONFIG_RISCV_ISA_ZICBOZ) += clear_page.o
>
>  obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>  lib-$(CONFIG_RISCV_ISA_V)      += xor.o
> +lib-$(CONFIG_RISCV_ISA_V)      += riscv_v_helpers.o
> diff --git a/arch/riscv/lib/riscv_v_helpers.c b/arch/riscv/lib/riscv_v_helpers.c
> new file mode 100644
> index 000000000000..6cac8f4e69e9
> --- /dev/null
> +++ b/arch/riscv/lib/riscv_v_helpers.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2023 SiFive
> + * Author: Andy Chiu <andy.chiu@sifive.com>
> + */
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +
> +#include <asm/vector.h>
> +#include <asm/simd.h>
> +
> +#ifdef CONFIG_MMU
> +#include <asm/asm-prototypes.h>
> +#endif
> +
> +#ifdef CONFIG_MMU
> +size_t riscv_v_usercopy_threshold = CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD;
> +int __asm_vector_usercopy(void *dst, void *src, size_t n);
> +int fallback_scalar_usercopy(void *dst, void *src, size_t n);
> +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
> +{
> +       size_t remain, copied;
> +
> +       /* skip the has_vector() check because it has been done by the asm */
> +       if (!may_use_simd())
> +               goto fallback;
> +
> +       kernel_vector_begin();
> +       remain = __asm_vector_usercopy(dst, src, n);
> +       kernel_vector_end();
> +
> +       if (remain) {
> +               copied = n - remain;
> +               dst += copied;
> +               src += copied;
> +               goto fallback;
> +       }
> +
> +       return remain;
> +
> +fallback:
> +       return fallback_scalar_usercopy(dst, src, n);
> +}
> +#endif
> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
> index 3ab438f30d13..a1e4a3c42925 100644
> --- a/arch/riscv/lib/uaccess.S
> +++ b/arch/riscv/lib/uaccess.S
> @@ -3,6 +3,8 @@
>  #include <asm/asm.h>
>  #include <asm/asm-extable.h>
>  #include <asm/csr.h>
> +#include <asm/hwcap.h>
> +#include <asm/alternative-macros.h>
>
>         .macro fixup op reg addr lbl
>  100:
> @@ -11,6 +13,13 @@
>         .endm
>
>  SYM_FUNC_START(__asm_copy_to_user)
> +#ifdef CONFIG_RISCV_ISA_V
> +       ALTERNATIVE("j fallback_scalar_usercopy", "nop", 0, RISCV_ISA_EXT_v, CONFIG_RISCV_ISA_V)
> +       REG_L   t0, riscv_v_usercopy_threshold
> +       bltu    a2, t0, fallback_scalar_usercopy
> +       tail enter_vector_usercopy
> +#endif
> +SYM_FUNC_START(fallback_scalar_usercopy)
>
>         /* Enable access to user memory */
>         li t6, SR_SUM
> @@ -181,6 +190,7 @@ SYM_FUNC_START(__asm_copy_to_user)
>         sub a0, t5, a0
>         ret
>  SYM_FUNC_END(__asm_copy_to_user)
> +SYM_FUNC_END(fallback_scalar_usercopy)
>  EXPORT_SYMBOL(__asm_copy_to_user)
>  SYM_FUNC_ALIAS(__asm_copy_from_user, __asm_copy_to_user)
>  EXPORT_SYMBOL(__asm_copy_from_user)
> diff --git a/arch/riscv/lib/uaccess_vector.S b/arch/riscv/lib/uaccess_vector.S
> new file mode 100644
> index 000000000000..7bd96cee39e4
> --- /dev/null
> +++ b/arch/riscv/lib/uaccess_vector.S
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm-generic/export.h>
> +#include <asm/asm.h>
> +#include <asm/asm-extable.h>
> +#include <asm/csr.h>
> +
> +#define pDst a0
> +#define pSrc a1
> +#define iNum a2
> +
> +#define iVL a3
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +       .macro fixup op reg addr lbl
> +100:
> +       \op \reg, \addr
> +       _asm_extable    100b, \lbl
> +       .endm
> +
> +SYM_FUNC_START(__asm_vector_usercopy)
> +       /* Enable access to user memory */
> +       li t6, SR_SUM
> +       csrs CSR_STATUS, t6
> +
> +loop:
> +       vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +       fixup vle8.v vData, (pSrc), 10f
> +       fixup vse8.v vData, (pDst), 10f
> +       sub iNum, iNum, iVL
> +       add pSrc, pSrc, iVL
> +       add pDst, pDst, iVL
> +       bnez iNum, loop
> +
> +.Lout_copy_user:
> +       /* Disable access to user memory */
> +       csrc CSR_STATUS, t6
> +       li      a0, 0
> +       ret
> +
> +       /* Exception fixup code */
> +10:
> +       /* Disable access to user memory */
> +       csrc    CSR_STATUS, t6
> +       mv      a0, iNum
Shall we check CSR_VSTART to find out how many elements were copied?

> +       ret
> +SYM_FUNC_END(__asm_vector_usercopy)
> --
> 2.17.1
>


-- 
Best Regards
 Guo Ren

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [v8, 01/10] riscv: Add support for kernel mode vector
  2023-12-23  4:29 ` [v8, 01/10] riscv: Add support for kernel mode vector Andy Chiu
@ 2023-12-27  1:36   ` Charlie Jenkins
  2023-12-27  2:46     ` Andy Chiu
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-12-27  1:36 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	ardb, arnd, peterz, tglx, ebiggers, Vincent Chen, Albert Ou,
	Heiko Stuebner, Baoquan He, Clément Léger, Guo Ren,
	Xiao Wang, Björn Töpel, Conor Dooley, Alexandre Ghiti,
	Sami Tolvanen, Sia Jee Heng, Evan Green, Jisheng Zhang

On Sat, Dec 23, 2023 at 04:29:05AM +0000, Andy Chiu wrote:
> From: Greentime Hu <greentime.hu@sifive.com>
> 
> Add kernel_vector_begin() and kernel_vector_end() function declarations
> and corresponding definitions in kernel_mode_vector.c
> 
> These are needed to wrap uses of vector in kernel mode.
> 
> Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v8:
>  - Refactor unnecessary whitespace change (Eric)
> Changelog v7:
>  - fix build fail for allmodconfig
> Changelog v6:
>  - Use 8 bits to track non-preemptible vector context to provide better
>    WARN coverage.
> Changelog v4:
>  - Use kernel_v_flags and helpers to track vector context.
> Changelog v3:
>  - Reorder patch 1 to patch 3 to make use of
>    {get,put}_cpu_vector_context later.
>  - Export {get,put}_cpu_vector_context.
>  - Save V context after disabling preemption. (Guo)
>  - Fix a build fail. (Conor)
>  - Remove irqs_disabled() check as it is not needed, fix styling. (Björn)
> Changelog v2:
>  - 's/kernel_rvv/kernel_vector' and return void in kernel_vector_begin
>    (Conor)
>  - export may_use_simd to include/asm/simd.h
> ---
>  arch/riscv/include/asm/processor.h     | 17 ++++-
>  arch/riscv/include/asm/simd.h          | 44 ++++++++++++
>  arch/riscv/include/asm/vector.h        | 21 ++++++
>  arch/riscv/kernel/Makefile             |  1 +
>  arch/riscv/kernel/kernel_mode_vector.c | 95 ++++++++++++++++++++++++++
>  arch/riscv/kernel/process.c            |  1 +
>  6 files changed, 178 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/include/asm/simd.h
>  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> 
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index f19f861cda54..15781e2232e0 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -73,6 +73,20 @@
>  struct task_struct;
>  struct pt_regs;
>  
> +/*
> + * We use a flag to track in-kernel Vector context. Currently the flag has the
> + * following meaning:
> + *
> + *  - bits 0-7 indicate whether the in-kernel Vector context is active. The
> + *    activation of this state disables preemption. On a non-RT kernel, it
> + *    also disables bh. Currently only 0 and 1 are valid values for this
> + *    field. Other values are reserved for future use.
> + */
> +
> +#define RISCV_KERNEL_MODE_V_MASK	0xff
> +
> +#define RISCV_KERNEL_MODE_V	0x1
> +
>  /* CPU-specific state of a task */
>  struct thread_struct {
>  	/* Callee-saved registers */
> @@ -81,7 +95,8 @@ struct thread_struct {
>  	unsigned long s[12];	/* s[0]: frame pointer */
>  	struct __riscv_d_ext_state fstate;
>  	unsigned long bad_cause;
> -	unsigned long vstate_ctrl;
> +	u32 riscv_v_flags;
> +	u32 vstate_ctrl;
>  	struct __riscv_v_ext_state vstate;
>  	unsigned long align_ctl;
>  };
> diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
> new file mode 100644
> index 000000000000..3b603e47c5d8
> --- /dev/null
> +++ b/arch/riscv/include/asm/simd.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> + * Copyright (C) 2023 SiFive
> + */
> +
> +#ifndef __ASM_SIMD_H
> +#define __ASM_SIMD_H
> +
> +#include <linux/compiler.h>
> +#include <linux/irqflags.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/types.h>
> +
> +#include <asm/vector.h>
> +
> +#ifdef CONFIG_RISCV_ISA_V
> +/*
> + * may_use_simd - whether it is allowable at this time to issue vector
> + *                instructions or access the vector register file
> + *
> + * Callers must not assume that the result remains true beyond the next
> + * preempt_enable() or return from softirq context.
> + */
> +static __must_check inline bool may_use_simd(void)
> +{
> +	/*
> +	 * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
> +	 * and is clear whenever preemption is enabled.
> +	 */
> +	return !in_hardirq() && !in_nmi() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
> +}
> +
> +#else /* ! CONFIG_RISCV_ISA_V */
> +
> +static __must_check inline bool may_use_simd(void)
> +{
> +	return false;
> +}
> +
> +#endif /* ! CONFIG_RISCV_ISA_V */
> +
> +#endif
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 87aaef656257..6254830c0668 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -22,6 +22,27 @@
>  extern unsigned long riscv_v_vsize;
>  int riscv_v_setup_vsize(void);
>  bool riscv_v_first_use_handler(struct pt_regs *regs);
> +void kernel_vector_begin(void);
> +void kernel_vector_end(void);
> +void get_cpu_vector_context(void);
> +void put_cpu_vector_context(void);
> +
> +static inline void riscv_v_ctx_cnt_add(u32 offset)
> +{
> +	current->thread.riscv_v_flags += offset;
> +	barrier();
> +}
> +
> +static inline void riscv_v_ctx_cnt_sub(u32 offset)
> +{
> +	barrier();
> +	current->thread.riscv_v_flags -= offset;
> +}
> +
> +static inline u32 riscv_v_ctx_cnt(void)
> +{
> +	return READ_ONCE(current->thread.riscv_v_flags);
> +}
>  
>  static __always_inline bool has_vector(void)
>  {
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index fee22a3d1b53..8c58595696b3 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -63,6 +63,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
>  obj-$(CONFIG_RISCV_MISALIGNED)	+= traps_misaligned.o
>  obj-$(CONFIG_FPU)		+= fpu.o
>  obj-$(CONFIG_RISCV_ISA_V)	+= vector.o
> +obj-$(CONFIG_RISCV_ISA_V)	+= kernel_mode_vector.o
>  obj-$(CONFIG_SMP)		+= smpboot.o
>  obj-$(CONFIG_SMP)		+= smp.o
>  obj-$(CONFIG_SMP)		+= cpu_ops.o
> diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> new file mode 100644
> index 000000000000..105147c7d2da
> --- /dev/null
> +++ b/arch/riscv/kernel/kernel_mode_vector.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + * Author: Catalin Marinas <catalin.marinas@arm.com>
> + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> + * Copyright (C) 2021 SiFive
> + */
> +#include <linux/compiler.h>
> +#include <linux/irqflags.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/types.h>
> +
> +#include <asm/vector.h>
> +#include <asm/switch_to.h>
> +#include <asm/simd.h>
> +
> +/*
> + * Claim ownership of the CPU vector context for use by the calling context.
> + *
> + * The caller may freely manipulate the vector context metadata until
> + * put_cpu_vector_context() is called.
> + */
> +void get_cpu_vector_context(void)
> +{
> +	preempt_disable();
> +
> +	WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != 0);
> +	riscv_v_ctx_cnt_add(RISCV_KERNEL_MODE_V);

In our last conversation I thought we agreed that a bitwise operation
would be more appropriate than addition. You also mentioned allowing
this function to be called multiple times. Did something change?

- Charlie

> +}
> +
> +/*
> + * Release the CPU vector context.
> + *
> + * Must be called from a context in which get_cpu_vector_context() was
> + * previously called, with no call to put_cpu_vector_context() in the
> + * meantime.
> + */
> +void put_cpu_vector_context(void)
> +{
> +	WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != RISCV_KERNEL_MODE_V);
> +	riscv_v_ctx_cnt_sub(RISCV_KERNEL_MODE_V);
> +
> +	preempt_enable();
> +}
> +
> +/*
> + * kernel_vector_begin(): obtain the CPU vector registers for use by the calling
> + * context
> + *
> + * Must not be called unless may_use_simd() returns true.
> + * Task context in the vector registers is saved back to memory as necessary.
> + *
> + * A matching call to kernel_vector_end() must be made before returning from the
> + * calling context.
> + *
> + * The caller may freely use the vector registers until kernel_vector_end() is
> + * called.
> + */
> +void kernel_vector_begin(void)
> +{
> +	if (WARN_ON(!has_vector()))
> +		return;
> +
> +	BUG_ON(!may_use_simd());
> +
> +	get_cpu_vector_context();
> +
> +	riscv_v_vstate_save(current, task_pt_regs(current));
> +
> +	riscv_v_enable();
> +}
> +EXPORT_SYMBOL_GPL(kernel_vector_begin);
> +
> +/*
> + * kernel_vector_end(): give the CPU vector registers back to the current task
> + *
> + * Must be called from a context in which kernel_vector_begin() was previously
> + * called, with no call to kernel_vector_end() in the meantime.
> + *
> + * The caller must not use the vector registers after this function is called,
> + * unless kernel_vector_begin() is called again in the meantime.
> + */
> +void kernel_vector_end(void)
> +{
> +	if (WARN_ON(!has_vector()))
> +		return;
> +
> +	riscv_v_vstate_restore(current, task_pt_regs(current));
> +
> +	riscv_v_disable();
> +
> +	put_cpu_vector_context();
> +}
> +EXPORT_SYMBOL_GPL(kernel_vector_end);
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 4f21d970a129..4a1275db1146 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -221,6 +221,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  		childregs->a0 = 0; /* Return value of fork() */
>  		p->thread.s[0] = 0;
>  	}
> +	p->thread.riscv_v_flags = 0;
>  	p->thread.ra = (unsigned long)ret_from_fork;
>  	p->thread.sp = (unsigned long)childregs; /* kernel sp */
>  	return 0;
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [v8, 06/10] riscv: lib: add vectorized mem* routines
  2023-12-23  4:29 ` [v8, 06/10] riscv: lib: add vectorized mem* routines Andy Chiu
@ 2023-12-27  1:42   ` Charlie Jenkins
  0 siblings, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-12-27  1:42 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	ardb, arnd, peterz, tglx, ebiggers, Albert Ou, Kees Cook,
	Han-Kuan Chen, Conor Dooley, Andrew Jones, Heiko Stuebner

On Sat, Dec 23, 2023 at 04:29:10AM +0000, Andy Chiu wrote:
> Provide vectorized memcpy/memset/memmove to accelerate common memory
> operations. Also, group them under the V_OPT_TEMPLATE3 macro because
> their setup/tear-down and fallback logic is the same.
> 
> The optimal size at which the kernel prefers Vector over scalar code,
> riscv_v_mem*_threshold, is only a heuristic for now. We can add DT
> parsing if people feel the need to customize it.
> 
> The original implementation of the Vector operations comes from
> https://github.com/sifive/sifive-libc, which we have agreed to
> contribute to the Linux kernel.
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v7:
>  - add __NO_FORTIFY to prevent conflicting function declaration with
>    macro for mem* functions.
> Changelog v6:
>  - provide kconfig to set threshold for vectorized functions (Charlie)
>  - rename *thres to *threshold (Charlie)
> Changelog v4:
>  - new patch since v4
> ---
>  arch/riscv/Kconfig               | 24 ++++++++++++++++
>  arch/riscv/lib/Makefile          |  3 ++
>  arch/riscv/lib/memcpy_vector.S   | 29 +++++++++++++++++++
>  arch/riscv/lib/memmove_vector.S  | 49 ++++++++++++++++++++++++++++++++
>  arch/riscv/lib/memset_vector.S   | 33 +++++++++++++++++++++
>  arch/riscv/lib/riscv_v_helpers.c | 26 +++++++++++++++++
>  6 files changed, 164 insertions(+)
>  create mode 100644 arch/riscv/lib/memcpy_vector.S
>  create mode 100644 arch/riscv/lib/memmove_vector.S
>  create mode 100644 arch/riscv/lib/memset_vector.S
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 3c5ba05e8a2d..cba53dcc2ae0 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -533,6 +533,30 @@ config RISCV_ISA_V_UCOPY_THRESHOLD
>  	  Prefer using vectorized copy_to_user()/copy_from_user() when the
>  	  workload size exceeds this value.
>  
> +config RISCV_ISA_V_MEMSET_THRESHOLD
> +	int "Threshold size for vectorized memset()"
> +	depends on RISCV_ISA_V
> +	default 1280
> +	help
> +	  Prefer using vectorized memset() when the workload size exceeds this
> +	  value.
> +
> +config RISCV_ISA_V_MEMCPY_THRESHOLD
> +	int "Threshold size for vectorized memcpy()"
> +	depends on RISCV_ISA_V
> +	default 768
> +	help
> +	  Prefer using vectorized memcpy() when the workload size exceeds this
> +	  value.
> +
> +config RISCV_ISA_V_MEMMOVE_THRESHOLD
> +	int "Threshold size for vectorized memmove()"
> +	depends on RISCV_ISA_V
> +	default 512
> +	help
> +	  Prefer using vectorized memmove() when the workload size exceeds this
> +	  value.
> +
>  config TOOLCHAIN_HAS_ZBB
>  	bool
>  	default y
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index c8a6787d5827..d389dbf285fe 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -16,3 +16,6 @@ lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
>  obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>  lib-$(CONFIG_RISCV_ISA_V)	+= xor.o
>  lib-$(CONFIG_RISCV_ISA_V)	+= riscv_v_helpers.o
> +lib-$(CONFIG_RISCV_ISA_V)	+= memset_vector.o
> +lib-$(CONFIG_RISCV_ISA_V)	+= memcpy_vector.o
> +lib-$(CONFIG_RISCV_ISA_V)	+= memmove_vector.o
> diff --git a/arch/riscv/lib/memcpy_vector.S b/arch/riscv/lib/memcpy_vector.S
> new file mode 100644
> index 000000000000..4176b6e0a53c
> --- /dev/null
> +++ b/arch/riscv/lib/memcpy_vector.S
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +
> +#define pDst a0
> +#define pSrc a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define pDstPtr a4
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +
> +/* void *memcpy(void *, const void *, size_t) */
> +SYM_FUNC_START(__asm_memcpy_vector)
> +	mv pDstPtr, pDst
> +loop:
> +	vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +	vle8.v vData, (pSrc)
> +	sub iNum, iNum, iVL
> +	add pSrc, pSrc, iVL
> +	vse8.v vData, (pDstPtr)
> +	add pDstPtr, pDstPtr, iVL
> +	bnez iNum, loop
> +	ret
> +SYM_FUNC_END(__asm_memcpy_vector)
> diff --git a/arch/riscv/lib/memmove_vector.S b/arch/riscv/lib/memmove_vector.S
> new file mode 100644
> index 000000000000..4cea9d244dc9
> --- /dev/null
> +++ b/arch/riscv/lib/memmove_vector.S
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +
> +#define pDst a0
> +#define pSrc a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define pDstPtr a4
> +#define pSrcBackwardPtr a5
> +#define pDstBackwardPtr a6
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +SYM_FUNC_START(__asm_memmove_vector)
> +
> +    mv pDstPtr, pDst
> +
> +    bgeu pSrc, pDst, forward_copy_loop
> +    add pSrcBackwardPtr, pSrc, iNum
> +    add pDstBackwardPtr, pDst, iNum
> +    bltu pDst, pSrcBackwardPtr, backward_copy_loop
> +
> +forward_copy_loop:
> +    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +
> +    vle8.v vData, (pSrc)
> +    sub iNum, iNum, iVL
> +    add pSrc, pSrc, iVL
> +    vse8.v vData, (pDstPtr)
> +    add pDstPtr, pDstPtr, iVL
> +
> +    bnez iNum, forward_copy_loop
> +    ret
> +
> +backward_copy_loop:
> +    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +
> +    sub pSrcBackwardPtr, pSrcBackwardPtr, iVL
> +    vle8.v vData, (pSrcBackwardPtr)
> +    sub iNum, iNum, iVL
> +    sub pDstBackwardPtr, pDstBackwardPtr, iVL
> +    vse8.v vData, (pDstBackwardPtr)
> +    bnez iNum, backward_copy_loop
> +    ret
> +
> +SYM_FUNC_END(__asm_memmove_vector)
> diff --git a/arch/riscv/lib/memset_vector.S b/arch/riscv/lib/memset_vector.S
> new file mode 100644
> index 000000000000..4611feed72ac
> --- /dev/null
> +++ b/arch/riscv/lib/memset_vector.S
> @@ -0,0 +1,33 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +
> +#define pDst a0
> +#define iValue a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define iTemp a4
> +#define pDstPtr a5
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +/* void *memset(void *, int, size_t) */
> +SYM_FUNC_START(__asm_memset_vector)
> +
> +    mv pDstPtr, pDst
> +
> +    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +    vmv.v.x vData, iValue
> +
> +loop:
> +    vse8.v vData, (pDstPtr)
> +    sub iNum, iNum, iVL
> +    add pDstPtr, pDstPtr, iVL
> +    vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +    bnez iNum, loop
> +
> +    ret
> +
> +SYM_FUNC_END(__asm_memset_vector)
> diff --git a/arch/riscv/lib/riscv_v_helpers.c b/arch/riscv/lib/riscv_v_helpers.c
> index 6cac8f4e69e9..c62f333ba557 100644
> --- a/arch/riscv/lib/riscv_v_helpers.c
> +++ b/arch/riscv/lib/riscv_v_helpers.c
> @@ -3,9 +3,13 @@
>   * Copyright (C) 2023 SiFive
>   * Author: Andy Chiu <andy.chiu@sifive.com>
>   */
> +#ifndef __NO_FORTIFY
> +# define __NO_FORTIFY
> +#endif
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  
> +#include <asm/string.h>
>  #include <asm/vector.h>
>  #include <asm/simd.h>
>  
> @@ -42,3 +46,25 @@ asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
>  	return fallback_scalar_usercopy(dst, src, n);
>  }
>  #endif
> +
> +#define V_OPT_TEMPLATE3(prefix, type_r, type_0, type_1)				\
> +extern type_r __asm_##prefix##_vector(type_0, type_1, size_t n);		\
> +type_r prefix(type_0 a0, type_1 a1, size_t n)					\
> +{										\
> +	type_r ret;								\
> +	if (has_vector() && may_use_simd() &&					\
> +	    n > riscv_v_##prefix##_threshold) {					\
> +		kernel_vector_begin();						\
> +		ret = __asm_##prefix##_vector(a0, a1, n);			\
> +		kernel_vector_end();						\
> +		return ret;							\
> +	}									\
> +	return __##prefix(a0, a1, n);						\
> +}
> +
> +static size_t riscv_v_memset_threshold = CONFIG_RISCV_ISA_V_MEMSET_THRESHOLD;
> +V_OPT_TEMPLATE3(memset, void *, void*, int)
> +static size_t riscv_v_memcpy_threshold = CONFIG_RISCV_ISA_V_MEMCPY_THRESHOLD;
> +V_OPT_TEMPLATE3(memcpy, void *, void*, const void *)
> +static size_t riscv_v_memmove_threshold = CONFIG_RISCV_ISA_V_MEMMOVE_THRESHOLD;
> +V_OPT_TEMPLATE3(memmove, void *, void*, const void *)
> -- 
> 2.17.1
> 

Thank you for adding the kconfigs for the thresholds.

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [v8, 01/10] riscv: Add support for kernel mode vector
  2023-12-27  1:36   ` Charlie Jenkins
@ 2023-12-27  2:46     ` Andy Chiu
  2023-12-27  5:30       ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Chiu @ 2023-12-27  2:46 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	ardb, arnd, peterz, tglx, ebiggers, Vincent Chen, Albert Ou,
	Heiko Stuebner, Baoquan He, Clément Léger, Guo Ren,
	Xiao Wang, Björn Töpel, Conor Dooley, Alexandre Ghiti,
	Sami Tolvanen, Sia Jee Heng, Evan Green, Jisheng Zhang

On Wed, Dec 27, 2023 at 9:36 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> On Sat, Dec 23, 2023 at 04:29:05AM +0000, Andy Chiu wrote:
> > From: Greentime Hu <greentime.hu@sifive.com>
> >
> > Add kernel_vector_begin() and kernel_vector_end() function declarations
> > and corresponding definitions in kernel_mode_vector.c
> >
> > These are needed to wrap uses of vector in kernel mode.
> >
> > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > ---
> > Changelog v8:
> >  - Refactor unnecessary whitespace change (Eric)
> > Changelog v7:
> >  - fix build fail for allmodconfig
> > Changelog v6:
> >  - Use 8 bits to track non-preemptible vector context to provide better
> >    WARN coverage.
> > Changelog v4:
> >  - Use kernel_v_flags and helpers to track vector context.
> > Changelog v3:
> >  - Reorder patch 1 to patch 3 to make use of
> >    {get,put}_cpu_vector_context later.
> >  - Export {get,put}_cpu_vector_context.
> >  - Save V context after disabling preemption. (Guo)
> >  - Fix a build fail. (Conor)
> >  - Remove irqs_disabled() check as it is not needed, fix styling. (Björn)
> > Changelog v2:
> >  - 's/kernel_rvv/kernel_vector' and return void in kernel_vector_begin
> >    (Conor)
> >  - export may_use_simd to include/asm/simd.h
> > ---
> >  arch/riscv/include/asm/processor.h     | 17 ++++-
> >  arch/riscv/include/asm/simd.h          | 44 ++++++++++++
> >  arch/riscv/include/asm/vector.h        | 21 ++++++
> >  arch/riscv/kernel/Makefile             |  1 +
> >  arch/riscv/kernel/kernel_mode_vector.c | 95 ++++++++++++++++++++++++++
> >  arch/riscv/kernel/process.c            |  1 +
> >  6 files changed, 178 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/riscv/include/asm/simd.h
> >  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> >
> > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > index f19f861cda54..15781e2232e0 100644
> > --- a/arch/riscv/include/asm/processor.h
> > +++ b/arch/riscv/include/asm/processor.h
> > @@ -73,6 +73,20 @@
> >  struct task_struct;
> >  struct pt_regs;
> >
> > +/*
> > + * We use a flag to track in-kernel Vector context. Currently the flag has the
> > + * following meaning:
> > + *
> > + *  - bits 0-7 indicate whether the in-kernel Vector context is active. The
> > + *    activation of this state disables preemption. On a non-RT kernel, it
> > + *    also disables bh. Currently only 0 and 1 are valid values for this
> > + *    field. Other values are reserved for future use.
> > + */
> > +
> > +#define RISCV_KERNEL_MODE_V_MASK     0xff
> > +
> > +#define RISCV_KERNEL_MODE_V  0x1
> > +
> >  /* CPU-specific state of a task */
> >  struct thread_struct {
> >       /* Callee-saved registers */
> > @@ -81,7 +95,8 @@ struct thread_struct {
> >       unsigned long s[12];    /* s[0]: frame pointer */
> >       struct __riscv_d_ext_state fstate;
> >       unsigned long bad_cause;
> > -     unsigned long vstate_ctrl;
> > +     u32 riscv_v_flags;
> > +     u32 vstate_ctrl;
> >       struct __riscv_v_ext_state vstate;
> >       unsigned long align_ctl;
> >  };
> > diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
> > new file mode 100644
> > index 000000000000..3b603e47c5d8
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/simd.h
> > @@ -0,0 +1,44 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > + * Copyright (C) 2023 SiFive
> > + */
> > +
> > +#ifndef __ASM_SIMD_H
> > +#define __ASM_SIMD_H
> > +
> > +#include <linux/compiler.h>
> > +#include <linux/irqflags.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/types.h>
> > +
> > +#include <asm/vector.h>
> > +
> > +#ifdef CONFIG_RISCV_ISA_V
> > +/*
> > + * may_use_simd - whether it is allowable at this time to issue vector
> > + *                instructions or access the vector register file
> > + *
> > + * Callers must not assume that the result remains true beyond the next
> > + * preempt_enable() or return from softirq context.
> > + */
> > +static __must_check inline bool may_use_simd(void)
> > +{
> > +     /*
> > +      * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
> > +      * and is clear whenever preemption is enabled.
> > +      */
> > +     return !in_hardirq() && !in_nmi() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
> > +}
> > +
> > +#else /* ! CONFIG_RISCV_ISA_V */
> > +
> > +static __must_check inline bool may_use_simd(void)
> > +{
> > +     return false;
> > +}
> > +
> > +#endif /* ! CONFIG_RISCV_ISA_V */
> > +
> > +#endif
> > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> > index 87aaef656257..6254830c0668 100644
> > --- a/arch/riscv/include/asm/vector.h
> > +++ b/arch/riscv/include/asm/vector.h
> > @@ -22,6 +22,27 @@
> >  extern unsigned long riscv_v_vsize;
> >  int riscv_v_setup_vsize(void);
> >  bool riscv_v_first_use_handler(struct pt_regs *regs);
> > +void kernel_vector_begin(void);
> > +void kernel_vector_end(void);
> > +void get_cpu_vector_context(void);
> > +void put_cpu_vector_context(void);
> > +
> > +static inline void riscv_v_ctx_cnt_add(u32 offset)
> > +{
> > +     current->thread.riscv_v_flags += offset;
> > +     barrier();
> > +}
> > +
> > +static inline void riscv_v_ctx_cnt_sub(u32 offset)
> > +{
> > +     barrier();
> > +     current->thread.riscv_v_flags -= offset;
> > +}
> > +
> > +static inline u32 riscv_v_ctx_cnt(void)
> > +{
> > +     return READ_ONCE(current->thread.riscv_v_flags);
> > +}
> >
> >  static __always_inline bool has_vector(void)
> >  {
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index fee22a3d1b53..8c58595696b3 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -63,6 +63,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> >  obj-$(CONFIG_RISCV_MISALIGNED)       += traps_misaligned.o
> >  obj-$(CONFIG_FPU)            += fpu.o
> >  obj-$(CONFIG_RISCV_ISA_V)    += vector.o
> > +obj-$(CONFIG_RISCV_ISA_V)    += kernel_mode_vector.o
> >  obj-$(CONFIG_SMP)            += smpboot.o
> >  obj-$(CONFIG_SMP)            += smp.o
> >  obj-$(CONFIG_SMP)            += cpu_ops.o
> > diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> > new file mode 100644
> > index 000000000000..105147c7d2da
> > --- /dev/null
> > +++ b/arch/riscv/kernel/kernel_mode_vector.c
> > @@ -0,0 +1,95 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2012 ARM Ltd.
> > + * Author: Catalin Marinas <catalin.marinas@arm.com>
> > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > + * Copyright (C) 2021 SiFive
> > + */
> > +#include <linux/compiler.h>
> > +#include <linux/irqflags.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/types.h>
> > +
> > +#include <asm/vector.h>
> > +#include <asm/switch_to.h>
> > +#include <asm/simd.h>
> > +
> > +/*
> > + * Claim ownership of the CPU vector context for use by the calling context.
> > + *
> > + * The caller may freely manipulate the vector context metadata until
> > + * put_cpu_vector_context() is called.
> > + */
> > +void get_cpu_vector_context(void)
> > +{
> > +     preempt_disable();
> > +
> > +     WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != 0);
> > +     riscv_v_ctx_cnt_add(RISCV_KERNEL_MODE_V);
>
> In our last conversation I thought we agreed that a bitwise operation
> would be more appropriate then addition. You also mentioned allowing
> this function to be called multiple times. Did something change?

I am having the same discussion with Eric on this thread [1]. Using
counter add/sub and masking with the bitmask provides the same overflow
protection. It also lets us reuse the same mechanism for preempt_v
and for allowing this function to be called multiple times. I have not
done the second part because it comes very close to the idea of
enabling V for the entire kernel. For example, it would be possible to
launch a kernel thread and wrap it with kernel_vector_*. If people
are OK with this then I will add it in v9. We would have to change
the bitmap a little, and track context at trap entry/exit
regardless of CONFIG_RISCV_ISA_V_PREEMPTIVE.

- [1]: https://lore.kernel.org/all/20231222053014.GC52600@quark.localdomain/T/#m4f87d3c745853d518f96fb87a48c1d59e63b3d18

Thanks,
Andy

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user
  2023-12-27  1:34   ` Guo Ren
@ 2023-12-27  3:15     ` Andy Chiu
  2024-01-15  5:42       ` Andy Chiu
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Chiu @ 2023-12-27  3:15 UTC (permalink / raw)
  To: Guo Ren
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	charlie, ardb, arnd, peterz, tglx, ebiggers, Albert Ou,
	Sami Tolvanen, Han-Kuan Chen, Deepak Gupta, Andrew Jones,
	Conor Dooley, Heiko Stuebner, Aurelien Jarno, Bo YU,
	Alexandre Ghiti, Clément Léger

On Wed, Dec 27, 2023 at 9:34 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Sat, Dec 23, 2023 at 12:30 PM Andy Chiu <andy.chiu@sifive.com> wrote:
> >
> > This patch utilizes Vector to perform copy_to_user/copy_from_user. If
> > Vector is available and the size of copy is large enough for Vector to
> > perform better than scalar, then direct the kernel to do Vector copies
> > for userspace. Though the best programming practice for users is to
> > reduce the copy, this provides a faster variant when copies are
> > inevitable.
> >
> > The optimal size for using Vector, copy_to_user_thres, is only a
> > heuristic for now. We can add DT parsing if people feel the need of
> > customizing it.
> >
> > The exception fixup code of the __asm_vector_usercopy must fallback to
> > the scalar one because accessing user pages might fault, and must be
> > sleepable. Current kernel-mode Vector does not allow tasks to be
> > preemptible, so we must deactivate Vector and perform a scalar fallback
> > in such cases.
> >
> > The original implementation of Vector operations comes from
> > https://github.com/sifive/sifive-libc, which we agree to contribute to
> > Linux kernel.
> >
> > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > ---
> > Changelog v8:
> >  - fix no-mmu build
> > Changelog v6:
> >  - Add a kconfig entry to configure threshold values (Charlie)
> >  - Refine assembly code (Charlie)
> > Changelog v4:
> >  - new patch since v4
> > ---
> >  arch/riscv/Kconfig                      |  8 ++++
> >  arch/riscv/include/asm/asm-prototypes.h |  4 ++
> >  arch/riscv/lib/Makefile                 |  6 ++-
> >  arch/riscv/lib/riscv_v_helpers.c        | 44 ++++++++++++++++++++++
> >  arch/riscv/lib/uaccess.S                | 10 +++++
> >  arch/riscv/lib/uaccess_vector.S         | 50 +++++++++++++++++++++++++
> >  6 files changed, 121 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/riscv/lib/riscv_v_helpers.c
> >  create mode 100644 arch/riscv/lib/uaccess_vector.S
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 95a2a06acc6a..3c5ba05e8a2d 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -525,6 +525,14 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> >
> >           If you don't know what to do here, say Y.
> >
> > +config RISCV_ISA_V_UCOPY_THRESHOLD
> > +       int "Threshold size for vectorized user copies"
> > +       depends on RISCV_ISA_V
> > +       default 768
> > +       help
> > +         Prefer using vectorized copy_to_user()/copy_from_user() when the
> > +         workload size exceeds this value.
> > +
> >  config TOOLCHAIN_HAS_ZBB
> >         bool
> >         default y
> > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > index 6db1a9bbff4c..be438932f321 100644
> > --- a/arch/riscv/include/asm/asm-prototypes.h
> > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > @@ -11,6 +11,10 @@ long long __ashlti3(long long a, int b);
> >
> >  #ifdef CONFIG_RISCV_ISA_V
> >
> > +#ifdef CONFIG_MMU
> > +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n);
> > +#endif /* CONFIG_MMU  */
> > +
> >  void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
> >                  const unsigned long *__restrict p2);
> >  void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
> > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > index 494f9cd1a00c..c8a6787d5827 100644
> > --- a/arch/riscv/lib/Makefile
> > +++ b/arch/riscv/lib/Makefile
> > @@ -6,9 +6,13 @@ lib-y                  += memmove.o
> >  lib-y                  += strcmp.o
> >  lib-y                  += strlen.o
> >  lib-y                  += strncmp.o
> > -lib-$(CONFIG_MMU)      += uaccess.o
> > +ifeq ($(CONFIG_MMU), y)
> > +lib-y                          += uaccess.o
> > +lib-$(CONFIG_RISCV_ISA_V)      += uaccess_vector.o
> > +endif
> >  lib-$(CONFIG_64BIT)    += tishift.o
> >  lib-$(CONFIG_RISCV_ISA_ZICBOZ) += clear_page.o
> >
> >  obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> >  lib-$(CONFIG_RISCV_ISA_V)      += xor.o
> > +lib-$(CONFIG_RISCV_ISA_V)      += riscv_v_helpers.o
> > diff --git a/arch/riscv/lib/riscv_v_helpers.c b/arch/riscv/lib/riscv_v_helpers.c
> > new file mode 100644
> > index 000000000000..6cac8f4e69e9
> > --- /dev/null
> > +++ b/arch/riscv/lib/riscv_v_helpers.c
> > @@ -0,0 +1,44 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2023 SiFive
> > + * Author: Andy Chiu <andy.chiu@sifive.com>
> > + */
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +
> > +#include <asm/vector.h>
> > +#include <asm/simd.h>
> > +
> > +#ifdef CONFIG_MMU
> > +#include <asm/asm-prototypes.h>
> > +#endif
> > +
> > +#ifdef CONFIG_MMU
> > +size_t riscv_v_usercopy_threshold = CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD;
> > +int __asm_vector_usercopy(void *dst, void *src, size_t n);
> > +int fallback_scalar_usercopy(void *dst, void *src, size_t n);
> > +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
> > +{
> > +       size_t remain, copied;
> > +
> > +       /* skip has_vector() check because it has been done by the asm  */
> > +       if (!may_use_simd())
> > +               goto fallback;
> > +
> > +       kernel_vector_begin();
> > +       remain = __asm_vector_usercopy(dst, src, n);
> > +       kernel_vector_end();
> > +
> > +       if (remain) {
> > +               copied = n - remain;
> > +               dst += copied;
> > +               src += copied;
> > +               goto fallback;
> > +       }
> > +
> > +       return remain;
> > +
> > +fallback:
> > +       return fallback_scalar_usercopy(dst, src, n);
> > +}
> > +#endif
> > diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
> > index 3ab438f30d13..a1e4a3c42925 100644
> > --- a/arch/riscv/lib/uaccess.S
> > +++ b/arch/riscv/lib/uaccess.S
> > @@ -3,6 +3,8 @@
> >  #include <asm/asm.h>
> >  #include <asm/asm-extable.h>
> >  #include <asm/csr.h>
> > +#include <asm/hwcap.h>
> > +#include <asm/alternative-macros.h>
> >
> >         .macro fixup op reg addr lbl
> >  100:
> > @@ -11,6 +13,13 @@
> >         .endm
> >
> >  SYM_FUNC_START(__asm_copy_to_user)
> > +#ifdef CONFIG_RISCV_ISA_V
> > +       ALTERNATIVE("j fallback_scalar_usercopy", "nop", 0, RISCV_ISA_EXT_v, CONFIG_RISCV_ISA_V)
> > +       REG_L   t0, riscv_v_usercopy_threshold
> > +       bltu    a2, t0, fallback_scalar_usercopy
> > +       tail enter_vector_usercopy
> > +#endif
> > +SYM_FUNC_START(fallback_scalar_usercopy)
> >
> >         /* Enable access to user memory */
> >         li t6, SR_SUM
> > @@ -181,6 +190,7 @@ SYM_FUNC_START(__asm_copy_to_user)
> >         sub a0, t5, a0
> >         ret
> >  SYM_FUNC_END(__asm_copy_to_user)
> > +SYM_FUNC_END(fallback_scalar_usercopy)
> >  EXPORT_SYMBOL(__asm_copy_to_user)
> >  SYM_FUNC_ALIAS(__asm_copy_from_user, __asm_copy_to_user)
> >  EXPORT_SYMBOL(__asm_copy_from_user)
> > diff --git a/arch/riscv/lib/uaccess_vector.S b/arch/riscv/lib/uaccess_vector.S
> > new file mode 100644
> > index 000000000000..7bd96cee39e4
> > --- /dev/null
> > +++ b/arch/riscv/lib/uaccess_vector.S
> > @@ -0,0 +1,50 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm-generic/export.h>
> > +#include <asm/asm.h>
> > +#include <asm/asm-extable.h>
> > +#include <asm/csr.h>
> > +
> > +#define pDst a0
> > +#define pSrc a1
> > +#define iNum a2
> > +
> > +#define iVL a3
> > +
> > +#define ELEM_LMUL_SETTING m8
> > +#define vData v0
> > +
> > +       .macro fixup op reg addr lbl
> > +100:
> > +       \op \reg, \addr
> > +       _asm_extable    100b, \lbl
> > +       .endm
> > +
> > +SYM_FUNC_START(__asm_vector_usercopy)
> > +       /* Enable access to user memory */
> > +       li t6, SR_SUM
> > +       csrs CSR_STATUS, t6
> > +
> > +loop:
> > +       vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> > +       fixup vle8.v vData, (pSrc), 10f
> > +       fixup vse8.v vData, (pDst), 10f
> > +       sub iNum, iNum, iVL
> > +       add pSrc, pSrc, iVL
> > +       add pDst, pDst, iVL
> > +       bnez iNum, loop
> > +
> > +.Lout_copy_user:
> > +       /* Disable access to user memory */
> > +       csrc CSR_STATUS, t6
> > +       li      a0, 0
> > +       ret
> > +
> > +       /* Exception fixup code */
> > +10:
> > +       /* Disable access to user memory */
> > +       csrc    CSR_STATUS, t6
> > +       mv      a0, iNum
> Shall we check CSR_VSTART to find out how many elements were copied?

This is a good idea! But is it possible to find out whether we trapped
at the load or the store instruction? IIUC, if we trap at the load then
we cannot derive the number of copied bytes from CSR_VSTART.

Thanks,
Andy



* Re: [v8, 01/10] riscv: Add support for kernel mode vector
  2023-12-27  2:46     ` Andy Chiu
@ 2023-12-27  5:30       ` Charlie Jenkins
  2023-12-27  9:18         ` Andy Chiu
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-12-27  5:30 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	ardb, arnd, peterz, tglx, ebiggers, Vincent Chen, Albert Ou,
	Heiko Stuebner, Baoquan He, Clément Léger, Guo Ren,
	Xiao Wang, Björn Töpel, Conor Dooley, Alexandre Ghiti,
	Sami Tolvanen, Sia Jee Heng, Evan Green, Jisheng Zhang

On Wed, Dec 27, 2023 at 10:46:58AM +0800, Andy Chiu wrote:
> On Wed, Dec 27, 2023 at 9:36 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > On Sat, Dec 23, 2023 at 04:29:05AM +0000, Andy Chiu wrote:
> > > From: Greentime Hu <greentime.hu@sifive.com>
> > >
> > > Add kernel_vector_begin() and kernel_vector_end() function declarations
> > > and corresponding definitions in kernel_mode_vector.c
> > >
> > > These are needed to wrap uses of vector in kernel mode.
> > >
> > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > > ---
> > > Changelog v8:
> > >  - Refactor unnecessary whitespace change (Eric)
> > > Changelog v7:
> > >  - fix build fail for allmodconfig
> > > Changelog v6:
> > >  - Use 8 bits to track non-preemptible vector context to provide better
> > >    WARN coverage.
> > > Changelog v4:
> > >  - Use kernel_v_flags and helpers to track vector context.
> > > Changelog v3:
> > >  - Reorder patch 1 to patch 3 to make use of
> > >    {get,put}_cpu_vector_context later.
> > >  - Export {get,put}_cpu_vector_context.
> > >  - Save V context after disabling preemption. (Guo)
> > >  - Fix a build fail. (Conor)
> > >  - Remove irqs_disabled() check as it is not needed, fix styling. (Björn)
> > > Changelog v2:
> > >  - 's/kernel_rvv/kernel_vector' and return void in kernel_vector_begin
> > >    (Conor)
> > >  - export may_use_simd to include/asm/simd.h
> > > ---
> > >  arch/riscv/include/asm/processor.h     | 17 ++++-
> > >  arch/riscv/include/asm/simd.h          | 44 ++++++++++++
> > >  arch/riscv/include/asm/vector.h        | 21 ++++++
> > >  arch/riscv/kernel/Makefile             |  1 +
> > >  arch/riscv/kernel/kernel_mode_vector.c | 95 ++++++++++++++++++++++++++
> > >  arch/riscv/kernel/process.c            |  1 +
> > >  6 files changed, 178 insertions(+), 1 deletion(-)
> > >  create mode 100644 arch/riscv/include/asm/simd.h
> > >  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> > >
> > > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > > index f19f861cda54..15781e2232e0 100644
> > > --- a/arch/riscv/include/asm/processor.h
> > > +++ b/arch/riscv/include/asm/processor.h
> > > @@ -73,6 +73,20 @@
> > >  struct task_struct;
> > >  struct pt_regs;
> > >
> > > +/*
> > > + * We use a flag to track in-kernel Vector context. Currently the flag has the
> > > + * following meaning:
> > > + *
> > > + *  - bit 0-7 indicates whether the in-kernel Vector context is active. The
> > > + *    activation of this state disables the preemption. On a non-RT kernel, it
> > > + *    also disable bh. Currently only 0 and 1 are valid value for this field.
> > > + *    Other values are reserved for future uses.
> > > + */
> > > +
> > > +#define RISCV_KERNEL_MODE_V_MASK     0xff
> > > +
> > > +#define RISCV_KERNEL_MODE_V  0x1
> > > +
> > >  /* CPU-specific state of a task */
> > >  struct thread_struct {
> > >       /* Callee-saved registers */
> > > @@ -81,7 +95,8 @@ struct thread_struct {
> > >       unsigned long s[12];    /* s[0]: frame pointer */
> > >       struct __riscv_d_ext_state fstate;
> > >       unsigned long bad_cause;
> > > -     unsigned long vstate_ctrl;
> > > +     u32 riscv_v_flags;
> > > +     u32 vstate_ctrl;
> > >       struct __riscv_v_ext_state vstate;
> > >       unsigned long align_ctl;
> > >  };
> > > diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
> > > new file mode 100644
> > > index 000000000000..3b603e47c5d8
> > > --- /dev/null
> > > +++ b/arch/riscv/include/asm/simd.h
> > > @@ -0,0 +1,44 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > > + * Copyright (C) 2023 SiFive
> > > + */
> > > +
> > > +#ifndef __ASM_SIMD_H
> > > +#define __ASM_SIMD_H
> > > +
> > > +#include <linux/compiler.h>
> > > +#include <linux/irqflags.h>
> > > +#include <linux/percpu.h>
> > > +#include <linux/preempt.h>
> > > +#include <linux/types.h>
> > > +
> > > +#include <asm/vector.h>
> > > +
> > > +#ifdef CONFIG_RISCV_ISA_V
> > > +/*
> > > + * may_use_simd - whether it is allowable at this time to issue vector
> > > + *                instructions or access the vector register file
> > > + *
> > > + * Callers must not assume that the result remains true beyond the next
> > > + * preempt_enable() or return from softirq context.
> > > + */
> > > +static __must_check inline bool may_use_simd(void)
> > > +{
> > > +     /*
> > > +      * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
> > > +      * and is clear whenever preemption is enabled.
> > > +      */
> > > +     return !in_hardirq() && !in_nmi() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
> > > +}
> > > +
> > > +#else /* ! CONFIG_RISCV_ISA_V */
> > > +
> > > +static __must_check inline bool may_use_simd(void)
> > > +{
> > > +     return false;
> > > +}
> > > +
> > > +#endif /* ! CONFIG_RISCV_ISA_V */
> > > +
> > > +#endif
> > > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> > > index 87aaef656257..6254830c0668 100644
> > > --- a/arch/riscv/include/asm/vector.h
> > > +++ b/arch/riscv/include/asm/vector.h
> > > @@ -22,6 +22,27 @@
> > >  extern unsigned long riscv_v_vsize;
> > >  int riscv_v_setup_vsize(void);
> > >  bool riscv_v_first_use_handler(struct pt_regs *regs);
> > > +void kernel_vector_begin(void);
> > > +void kernel_vector_end(void);
> > > +void get_cpu_vector_context(void);
> > > +void put_cpu_vector_context(void);
> > > +
> > > +static inline void riscv_v_ctx_cnt_add(u32 offset)
> > > +{
> > > +     current->thread.riscv_v_flags += offset;
> > > +     barrier();
> > > +}
> > > +
> > > +static inline void riscv_v_ctx_cnt_sub(u32 offset)
> > > +{
> > > +     barrier();
> > > +     current->thread.riscv_v_flags -= offset;
> > > +}
> > > +
> > > +static inline u32 riscv_v_ctx_cnt(void)
> > > +{
> > > +     return READ_ONCE(current->thread.riscv_v_flags);
> > > +}
> > >
> > >  static __always_inline bool has_vector(void)
> > >  {
> > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > index fee22a3d1b53..8c58595696b3 100644
> > > --- a/arch/riscv/kernel/Makefile
> > > +++ b/arch/riscv/kernel/Makefile
> > > @@ -63,6 +63,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > >  obj-$(CONFIG_RISCV_MISALIGNED)       += traps_misaligned.o
> > >  obj-$(CONFIG_FPU)            += fpu.o
> > >  obj-$(CONFIG_RISCV_ISA_V)    += vector.o
> > > +obj-$(CONFIG_RISCV_ISA_V)    += kernel_mode_vector.o
> > >  obj-$(CONFIG_SMP)            += smpboot.o
> > >  obj-$(CONFIG_SMP)            += smp.o
> > >  obj-$(CONFIG_SMP)            += cpu_ops.o
> > > diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> > > new file mode 100644
> > > index 000000000000..105147c7d2da
> > > --- /dev/null
> > > +++ b/arch/riscv/kernel/kernel_mode_vector.c
> > > @@ -0,0 +1,95 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +/*
> > > + * Copyright (C) 2012 ARM Ltd.
> > > + * Author: Catalin Marinas <catalin.marinas@arm.com>
> > > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > > + * Copyright (C) 2021 SiFive
> > > + */
> > > +#include <linux/compiler.h>
> > > +#include <linux/irqflags.h>
> > > +#include <linux/percpu.h>
> > > +#include <linux/preempt.h>
> > > +#include <linux/types.h>
> > > +
> > > +#include <asm/vector.h>
> > > +#include <asm/switch_to.h>
> > > +#include <asm/simd.h>
> > > +
> > > +/*
> > > + * Claim ownership of the CPU vector context for use by the calling context.
> > > + *
> > > + * The caller may freely manipulate the vector context metadata until
> > > + * put_cpu_vector_context() is called.
> > > + */
> > > +void get_cpu_vector_context(void)
> > > +{
> > > +     preempt_disable();
> > > +
> > > +     WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != 0);
> > > +     riscv_v_ctx_cnt_add(RISCV_KERNEL_MODE_V);
> >
> > In our last conversation I thought we agreed that a bitwise operation
> > would be more appropriate then addition. You also mentioned allowing
> > this function to be called multiple times. Did something change?
> 
> I am having the same discussion with Eric on this thread [1]. Using
> counter add/sub and mask with the bitmask provides the same overflow
> protection. It also helps us reuse the same mechanism for preempt_v
> and for allowing this function to be called multiple times. I have not
> done the second part because it is going to be very close to an idea
> of enabling V for the entire kernel. For example, it is possible to
> launch a kernel thread and wrap it with kernel_vector_*. If people
> feel ok about this then I will add this into v9. We will have to
> change the bitmap a little, and track context at trap entry/exit
> regardless of CONFIG_RISCV_ISA_V_PREEMPTIVE.
> 
> - [1]: https://lore.kernel.org/all/20231222053014.GC52600@quark.localdomain/T/#m4f87d3c745853d518f96fb87a48c1d59e63b3d18
> 
> Thanks,
> Andy

Okay, I understand now: it is a counter that tracks how many calls
along the chain have entered get_cpu_vector_context(). However, if
nested calls to get_cpu_vector_context() are not yet supported, then
calling it more than once should be an error, not just a warning.

- Charlie




* Re: [v8, 01/10] riscv: Add support for kernel mode vector
  2023-12-27  5:30       ` Charlie Jenkins
@ 2023-12-27  9:18         ` Andy Chiu
  2023-12-28  1:52           ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Chiu @ 2023-12-27  9:18 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	ardb, arnd, peterz, tglx, ebiggers, Vincent Chen, Albert Ou,
	Heiko Stuebner, Baoquan He, Clément Léger, Guo Ren,
	Xiao Wang, Björn Töpel, Conor Dooley, Alexandre Ghiti,
	Sami Tolvanen, Sia Jee Heng, Evan Green, Jisheng Zhang

On Wed, Dec 27, 2023 at 1:30 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> On Wed, Dec 27, 2023 at 10:46:58AM +0800, Andy Chiu wrote:
> > On Wed, Dec 27, 2023 at 9:36 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> > >
> > > On Sat, Dec 23, 2023 at 04:29:05AM +0000, Andy Chiu wrote:
> > > > From: Greentime Hu <greentime.hu@sifive.com>
> > > >
> > > > Add kernel_vector_begin() and kernel_vector_end() function declarations
> > > > and corresponding definitions in kernel_mode_vector.c
> > > >
> > > > These are needed to wrap uses of vector in kernel mode.
> > > >
> > > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > > > ---
> > > > Changelog v8:
> > > >  - Refactor unnecessary whitespace change (Eric)
> > > > Changelog v7:
> > > >  - fix build fail for allmodconfig
> > > > Changelog v6:
> > > >  - Use 8 bits to track non-preemptible vector context to provide better
> > > >    WARN coverage.
> > > > Changelog v4:
> > > >  - Use kernel_v_flags and helpers to track vector context.
> > > > Changelog v3:
> > > >  - Reorder patch 1 to patch 3 to make use of
> > > >    {get,put}_cpu_vector_context later.
> > > >  - Export {get,put}_cpu_vector_context.
> > > >  - Save V context after disabling preemption. (Guo)
> > > >  - Fix a build fail. (Conor)
> > > >  - Remove irqs_disabled() check as it is not needed, fix styling. (Björn)
> > > > Changelog v2:
> > > >  - 's/kernel_rvv/kernel_vector' and return void in kernel_vector_begin
> > > >    (Conor)
> > > >  - export may_use_simd to include/asm/simd.h
> > > > ---
> > > >  arch/riscv/include/asm/processor.h     | 17 ++++-
> > > >  arch/riscv/include/asm/simd.h          | 44 ++++++++++++
> > > >  arch/riscv/include/asm/vector.h        | 21 ++++++
> > > >  arch/riscv/kernel/Makefile             |  1 +
> > > >  arch/riscv/kernel/kernel_mode_vector.c | 95 ++++++++++++++++++++++++++
> > > >  arch/riscv/kernel/process.c            |  1 +
> > > >  6 files changed, 178 insertions(+), 1 deletion(-)
> > > >  create mode 100644 arch/riscv/include/asm/simd.h
> > > >  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> > > >
> > > > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > > > index f19f861cda54..15781e2232e0 100644
> > > > --- a/arch/riscv/include/asm/processor.h
> > > > +++ b/arch/riscv/include/asm/processor.h
> > > > @@ -73,6 +73,20 @@
> > > >  struct task_struct;
> > > >  struct pt_regs;
> > > >
> > > > +/*
> > > > + * We use a flag to track in-kernel Vector context. Currently the flag has the
> > > > + * following meaning:
> > > > + *
> > > > + *  - bit 0-7 indicates whether the in-kernel Vector context is active. The
> > > > + *    activation of this state disables the preemption. On a non-RT kernel, it
> > > > + *    also disable bh. Currently only 0 and 1 are valid value for this field.
> > > > + *    Other values are reserved for future uses.
> > > > + */
> > > > +
> > > > +#define RISCV_KERNEL_MODE_V_MASK     0xff
> > > > +
> > > > +#define RISCV_KERNEL_MODE_V  0x1
> > > > +
> > > >  /* CPU-specific state of a task */
> > > >  struct thread_struct {
> > > >       /* Callee-saved registers */
> > > > @@ -81,7 +95,8 @@ struct thread_struct {
> > > >       unsigned long s[12];    /* s[0]: frame pointer */
> > > >       struct __riscv_d_ext_state fstate;
> > > >       unsigned long bad_cause;
> > > > -     unsigned long vstate_ctrl;
> > > > +     u32 riscv_v_flags;
> > > > +     u32 vstate_ctrl;
> > > >       struct __riscv_v_ext_state vstate;
> > > >       unsigned long align_ctl;
> > > >  };
> > > > diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
> > > > new file mode 100644
> > > > index 000000000000..3b603e47c5d8
> > > > --- /dev/null
> > > > +++ b/arch/riscv/include/asm/simd.h
> > > > @@ -0,0 +1,44 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > +/*
> > > > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > > > + * Copyright (C) 2023 SiFive
> > > > + */
> > > > +
> > > > +#ifndef __ASM_SIMD_H
> > > > +#define __ASM_SIMD_H
> > > > +
> > > > +#include <linux/compiler.h>
> > > > +#include <linux/irqflags.h>
> > > > +#include <linux/percpu.h>
> > > > +#include <linux/preempt.h>
> > > > +#include <linux/types.h>
> > > > +
> > > > +#include <asm/vector.h>
> > > > +
> > > > +#ifdef CONFIG_RISCV_ISA_V
> > > > +/*
> > > > + * may_use_simd - whether it is allowable at this time to issue vector
> > > > + *                instructions or access the vector register file
> > > > + *
> > > > + * Callers must not assume that the result remains true beyond the next
> > > > + * preempt_enable() or return from softirq context.
> > > > + */
> > > > +static __must_check inline bool may_use_simd(void)
> > > > +{
> > > > +     /*
> > > > +      * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
> > > > +      * and is clear whenever preemption is enabled.
> > > > +      */
> > > > +     return !in_hardirq() && !in_nmi() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
> > > > +}
> > > > +
> > > > +#else /* ! CONFIG_RISCV_ISA_V */
> > > > +
> > > > +static __must_check inline bool may_use_simd(void)
> > > > +{
> > > > +     return false;
> > > > +}
> > > > +
> > > > +#endif /* ! CONFIG_RISCV_ISA_V */
> > > > +
> > > > +#endif
> > > > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> > > > index 87aaef656257..6254830c0668 100644
> > > > --- a/arch/riscv/include/asm/vector.h
> > > > +++ b/arch/riscv/include/asm/vector.h
> > > > @@ -22,6 +22,27 @@
> > > >  extern unsigned long riscv_v_vsize;
> > > >  int riscv_v_setup_vsize(void);
> > > >  bool riscv_v_first_use_handler(struct pt_regs *regs);
> > > > +void kernel_vector_begin(void);
> > > > +void kernel_vector_end(void);
> > > > +void get_cpu_vector_context(void);
> > > > +void put_cpu_vector_context(void);
> > > > +
> > > > +static inline void riscv_v_ctx_cnt_add(u32 offset)
> > > > +{
> > > > +     current->thread.riscv_v_flags += offset;
> > > > +     barrier();
> > > > +}
> > > > +
> > > > +static inline void riscv_v_ctx_cnt_sub(u32 offset)
> > > > +{
> > > > +     barrier();
> > > > +     current->thread.riscv_v_flags -= offset;
> > > > +}
> > > > +
> > > > +static inline u32 riscv_v_ctx_cnt(void)
> > > > +{
> > > > +     return READ_ONCE(current->thread.riscv_v_flags);
> > > > +}
> > > >
> > > >  static __always_inline bool has_vector(void)
> > > >  {
> > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > index fee22a3d1b53..8c58595696b3 100644
> > > > --- a/arch/riscv/kernel/Makefile
> > > > +++ b/arch/riscv/kernel/Makefile
> > > > @@ -63,6 +63,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > > >  obj-$(CONFIG_RISCV_MISALIGNED)       += traps_misaligned.o
> > > >  obj-$(CONFIG_FPU)            += fpu.o
> > > >  obj-$(CONFIG_RISCV_ISA_V)    += vector.o
> > > > +obj-$(CONFIG_RISCV_ISA_V)    += kernel_mode_vector.o
> > > >  obj-$(CONFIG_SMP)            += smpboot.o
> > > >  obj-$(CONFIG_SMP)            += smp.o
> > > >  obj-$(CONFIG_SMP)            += cpu_ops.o
> > > > diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> > > > new file mode 100644
> > > > index 000000000000..105147c7d2da
> > > > --- /dev/null
> > > > +++ b/arch/riscv/kernel/kernel_mode_vector.c
> > > > @@ -0,0 +1,95 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > +/*
> > > > + * Copyright (C) 2012 ARM Ltd.
> > > > + * Author: Catalin Marinas <catalin.marinas@arm.com>
> > > > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > > > + * Copyright (C) 2021 SiFive
> > > > + */
> > > > +#include <linux/compiler.h>
> > > > +#include <linux/irqflags.h>
> > > > +#include <linux/percpu.h>
> > > > +#include <linux/preempt.h>
> > > > +#include <linux/types.h>
> > > > +
> > > > +#include <asm/vector.h>
> > > > +#include <asm/switch_to.h>
> > > > +#include <asm/simd.h>
> > > > +
> > > > +/*
> > > > + * Claim ownership of the CPU vector context for use by the calling context.
> > > > + *
> > > > + * The caller may freely manipulate the vector context metadata until
> > > > + * put_cpu_vector_context() is called.
> > > > + */
> > > > +void get_cpu_vector_context(void)
> > > > +{
> > > > +     preempt_disable();
> > > > +
> > > > +     WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != 0);
> > > > +     riscv_v_ctx_cnt_add(RISCV_KERNEL_MODE_V);
> > >
> > > In our last conversation I thought we agreed that a bitwise operation
> > > would be more appropriate then addition. You also mentioned allowing
> > > this function to be called multiple times. Did something change?
> >
> > I am having the same discussion with Eric on this thread [1]. Using
> > counter add/sub and masking with the bitmask provides the same overflow
> > protection. It also lets us reuse the same mechanism for preempt_v
> > and for allowing this function to be called multiple times. I have not
> > done the second part because it comes very close to enabling V for
> > the entire kernel. For example, it would then be possible to launch a
> > kernel thread and wrap it with kernel_vector_*. If people feel OK
> > about this then I will add it in v9. We would have to change the
> > bitmap a little, and track context at trap entry/exit regardless of
> > CONFIG_RISCV_ISA_V_PREEMPTIVE.
> >
> > - [1]: https://lore.kernel.org/all/20231222053014.GC52600@quark.localdomain/T/#m4f87d3c745853d518f96fb87a48c1d59e63b3d18
> >
> > Thanks,

Hey, I figured out a way to address the above problems, please wait for v9.

> > Andy
>
> Okay, I understand now: it is a counter to track how many calls along the
> chain have called get_cpu_vector_context. However, if nested calls to
> get_cpu_vector_context are not yet supported, then it should be an error
> to call it more than once, not just a warning.

Do you suggest promoting WARN_ON to a BUG_ON?

>
> - Charlie
>

Thanks,
Andy

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [v8, 04/10] riscv: sched: defer restoring Vector context for user
  2023-12-23  4:29 ` [v8, 04/10] riscv: sched: defer restoring Vector context for user Andy Chiu
@ 2023-12-27 12:07   ` Song Shuai
  0 siblings, 0 replies; 24+ messages in thread
From: Song Shuai @ 2023-12-27 12:07 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Albert Ou, Oleg Nesterov,
	Björn Töpel, Conor Dooley, Guo Ren,
	Clément Léger, Jisheng Zhang, Sami Tolvanen,
	Deepak Gupta, Vincent Chen, Heiko Stuebner, Xiao Wang, Haorong Lu,
	Mathis Salmen, Joel Granados



On 2023/12/23 12:29, Andy Chiu wrote:
> A user can only use its Vector registers after the kernel actually
> returns to userspace. So we can delay restoring Vector registers as long
> as we are still running in kernel mode. Add a thread flag to indicate
> the need of restoring Vector and do the restore at the last
> arch-specific exit-to-user hook. This saves the context-restoring cost
> when we switch over multiple processes that run V in kernel mode. For
> example, if the kernel performs a context switch from A->B->C, and
> returns to C's userspace, then there is no need to restore B's
> V-registers.
> 
> Besides, this also prevents us from repeatedly restoring V context when
> executing kernel-mode Vector multiple times.
> 
> The cost of this is that we must disable preemption and mark Vector as
> busy during vstate_{save,restore}, so that the V context will not get
> restored immediately when a trap-causing context switch happens in the
> middle of vstate_{save,restore}.
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
> ---
> Changelog v4:
>   - fix typos and re-add Conor's A-b.
> Changelog v3:
>   - Guard {get,put}_cpu_vector_context between vstate_* operation and
>     explain it in the commit msg.
>   - Drop R-b from Björn and A-b from Conor.
> Changelog v2:
>   - rename and add comment for the new thread flag (Conor)
> ---
>   arch/riscv/include/asm/entry-common.h  | 17 +++++++++++++++++
>   arch/riscv/include/asm/thread_info.h   |  2 ++
>   arch/riscv/include/asm/vector.h        | 11 ++++++++++-
>   arch/riscv/kernel/kernel_mode_vector.c |  2 +-
>   arch/riscv/kernel/process.c            |  2 ++
>   arch/riscv/kernel/ptrace.c             |  5 ++++-
>   arch/riscv/kernel/signal.c             |  5 ++++-
>   arch/riscv/kernel/vector.c             |  2 +-
>   8 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/entry-common.h b/arch/riscv/include/asm/entry-common.h
> index 7ab5e34318c8..6361a8488642 100644
> --- a/arch/riscv/include/asm/entry-common.h
> +++ b/arch/riscv/include/asm/entry-common.h
> @@ -4,6 +4,23 @@
>   #define _ASM_RISCV_ENTRY_COMMON_H
>   
>   #include <asm/stacktrace.h>
> +#include <asm/thread_info.h>
> +#include <asm/vector.h>
> +
> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +						  unsigned long ti_work)
> +{
> +	if (ti_work & _TIF_RISCV_V_DEFER_RESTORE) {
> +		clear_thread_flag(TIF_RISCV_V_DEFER_RESTORE);
> +		/*
> +		 * We are already called with irq disabled, so go without
> +		 * keeping track of vector_context_busy.
"vector_context_busy" here refers to the flag used to track the in-kernel
Vector context -- riscv_v_flags in this version; please update the comment.

> +		 */
> +		riscv_v_vstate_restore(current, regs);
> +	}
> +}
> +
> +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
>   
>   void handle_page_fault(struct pt_regs *regs);
>   void handle_break(struct pt_regs *regs);
> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index 574779900bfb..1047a97ddbc8 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -103,12 +103,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
>   #define TIF_NOTIFY_SIGNAL	9	/* signal notifications exist */
>   #define TIF_UPROBE		10	/* uprobe breakpoint or singlestep */
>   #define TIF_32BIT		11	/* compat-mode 32bit process */
> +#define TIF_RISCV_V_DEFER_RESTORE	12 /* restore Vector before returing to user */
>   
>   #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
>   #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>   #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
>   #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
>   #define _TIF_UPROBE		(1 << TIF_UPROBE)
> +#define _TIF_RISCV_V_DEFER_RESTORE	(1 << TIF_RISCV_V_DEFER_RESTORE)
>   
>   #define _TIF_WORK_MASK \
>   	(_TIF_NOTIFY_RESUME | _TIF_SIGPENDING | _TIF_NEED_RESCHED | \
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 6254830c0668..e706613aae2c 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -205,6 +205,15 @@ static inline void riscv_v_vstate_restore(struct task_struct *task,
>   	}
>   }
>   
> +static inline void riscv_v_vstate_set_restore(struct task_struct *task,
> +					      struct pt_regs *regs)
> +{
> +	if ((regs->status & SR_VS) != SR_VS_OFF) {
> +		set_tsk_thread_flag(task, TIF_RISCV_V_DEFER_RESTORE);
> +		riscv_v_vstate_on(regs);
> +	}
> +}
> +
>   static inline void __switch_to_vector(struct task_struct *prev,
>   				      struct task_struct *next)
>   {
> @@ -212,7 +221,7 @@ static inline void __switch_to_vector(struct task_struct *prev,
>   
>   	regs = task_pt_regs(prev);
>   	riscv_v_vstate_save(prev, regs);
> -	riscv_v_vstate_restore(next, task_pt_regs(next));
> +	riscv_v_vstate_set_restore(next, task_pt_regs(next));
>   }
>   
>   void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
> diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> index 385d9b4d8cc6..63814e780c28 100644
> --- a/arch/riscv/kernel/kernel_mode_vector.c
> +++ b/arch/riscv/kernel/kernel_mode_vector.c
> @@ -96,7 +96,7 @@ void kernel_vector_end(void)
>   	if (WARN_ON(!has_vector()))
>   		return;
>   
> -	riscv_v_vstate_restore(current, task_pt_regs(current));
> +	riscv_v_vstate_set_restore(current, task_pt_regs(current));
>   
>   	riscv_v_disable();
>   
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 4a1275db1146..36993f408de4 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -171,6 +171,7 @@ void flush_thread(void)
>   	riscv_v_vstate_off(task_pt_regs(current));
>   	kfree(current->thread.vstate.datap);
>   	memset(&current->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
> +	clear_tsk_thread_flag(current, TIF_RISCV_V_DEFER_RESTORE);
>   #endif
>   }
>   
> @@ -187,6 +188,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>   	*dst = *src;
>   	/* clear entire V context, including datap for a new task */
>   	memset(&dst->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
> +	clear_tsk_thread_flag(dst, TIF_RISCV_V_DEFER_RESTORE);
>   
>   	return 0;
>   }
> diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
> index 2afe460de16a..7b93bcbdf9fa 100644
> --- a/arch/riscv/kernel/ptrace.c
> +++ b/arch/riscv/kernel/ptrace.c
> @@ -99,8 +99,11 @@ static int riscv_vr_get(struct task_struct *target,
>   	 * Ensure the vector registers have been saved to the memory before
>   	 * copying them to membuf.
>   	 */
> -	if (target == current)
> +	if (target == current) {
> +		get_cpu_vector_context();
>   		riscv_v_vstate_save(current, task_pt_regs(current));
> +		put_cpu_vector_context();
> +	}
>   
>   	ptrace_vstate.vstart = vstate->vstart;
>   	ptrace_vstate.vl = vstate->vl;
> diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
> index 88b6220b2608..aca4a12c8416 100644
> --- a/arch/riscv/kernel/signal.c
> +++ b/arch/riscv/kernel/signal.c
> @@ -86,7 +86,10 @@ static long save_v_state(struct pt_regs *regs, void __user **sc_vec)
>   	/* datap is designed to be 16 byte aligned for better performance */
>   	WARN_ON(unlikely(!IS_ALIGNED((unsigned long)datap, 16)));
>   
> +	get_cpu_vector_context();
>   	riscv_v_vstate_save(current, regs);
> +	put_cpu_vector_context();
> +
>   	/* Copy everything of vstate but datap. */
>   	err = __copy_to_user(&state->v_state, &current->thread.vstate,
>   			     offsetof(struct __riscv_v_ext_state, datap));
> @@ -134,7 +137,7 @@ static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
>   	if (unlikely(err))
>   		return err;
>   
> -	riscv_v_vstate_restore(current, regs);
> +	riscv_v_vstate_set_restore(current, regs);
>   
>   	return err;
>   }
> diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
> index 578b6292487e..66e8c6ab09d2 100644
> --- a/arch/riscv/kernel/vector.c
> +++ b/arch/riscv/kernel/vector.c
> @@ -167,7 +167,7 @@ bool riscv_v_first_use_handler(struct pt_regs *regs)
>   		return true;
>   	}
>   	riscv_v_vstate_on(regs);
> -	riscv_v_vstate_restore(current, regs);
> +	riscv_v_vstate_set_restore(current, regs);
>   	return true;
>   }
>   

-- 
Thanks
Song Shuai



* Re: [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption
  2023-12-23  4:29 ` [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption Andy Chiu
@ 2023-12-27 12:12   ` Song Shuai
  2023-12-27 22:45   ` Samuel Holland
  1 sibling, 0 replies; 24+ messages in thread
From: Song Shuai @ 2023-12-27 12:12 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Albert Ou, Guo Ren, Sami Tolvanen,
	Han-Kuan Chen, Deepak Gupta, Vincent Chen, Heiko Stuebner,
	Baoquan He, Clément Léger, Björn Töpel,
	Xiao Wang, Nathan Chancellor, Jisheng Zhang, Conor Dooley,
	Joel Granados, palmer


On 2023/12/23 12:29, Andy Chiu wrote:
> Add kernel_vstate to keep track of kernel-mode Vector registers when a
> trap-introduced context switch happens. Also, provide riscv_v_flags to
> let the context save/restore routine track context status. Context
> tracking happens whenever the core starts its in-kernel Vector
> execution. An active (dirty) kernel task's V context will be saved to
> memory whenever a trap-introduced context switch happens, or when a
> softirq that happens to nest on top of it uses Vector. Context restoring
> happens when execution transfers back to the original kernel context
> where it first enabled preempt_v.
> 
> Also, provide a config CONFIG_RISCV_ISA_V_PREEMPTIVE to give users an
> option to disable preemptible kernel-mode Vector at build time. Users
> with constrained memory may want to disable this config, as preemptible
> kernel-mode Vector needs extra space for tracking each thread's
> kernel-mode V context. Users might also want to disable it if all
> kernel-mode Vector code is time-sensitive and cannot tolerate context
> switch overhead.
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v8:
>   - fix -Wmissing-prototypes for functions with asmlinkage
> Changelog v6:
>   - re-write patch to handle context nesting for softirqs
>   - drop thread flag and track context instead in riscv_v_flags
>   - refine some asm code and constraint it into C functions
>   - preallocate v context for preempt_v
>   - Return non-zero in riscv_v_start_kernel_context with non-preemptible
>     kernel-mode Vector
> Changelog v4:
>   - dropped from v4
> Changelog v3:
>   - Guard vstate_save with {get,set}_cpu_vector_context
>   - Add comments on preventions of nesting V contexts
>   - remove warnings in context switch when trap's reg is not present (Conor)
>   - refactor code (Björn)
> Changelog v2:
>   - fix build fail when compiling without RISCV_ISA_V (Conor)
>   - 's/TIF_RISCV_V_KMV/TIF_RISCV_V_KERNEL_MODE' and add comment (Conor)
>   - merge Kconfig patch into this one (Conor).
>   - 's/CONFIG_RISCV_ISA_V_PREEMPTIVE_KMV/CONFIG_RISCV_ISA_V_PREEMPTIVE/'
>     (Conor)
>   - fix some typos (Conor)
>   - enclose assembly with RISCV_ISA_V_PREEMPTIVE.
>   - change riscv_v_vstate_ctrl_config_kmv() to
>     kernel_vector_allow_preemption() for better understanding. (Conor)
>   - 's/riscv_v_kmv_preempitble/kernel_vector_preemptible/'
> ---
>   arch/riscv/Kconfig                      |  14 +++
>   arch/riscv/include/asm/asm-prototypes.h |   5 +
>   arch/riscv/include/asm/processor.h      |  26 ++++-
>   arch/riscv/include/asm/simd.h           |  26 ++++-
>   arch/riscv/include/asm/vector.h         |  57 ++++++++++-
>   arch/riscv/kernel/entry.S               |   8 ++
>   arch/riscv/kernel/kernel_mode_vector.c  | 124 +++++++++++++++++++++++-
>   arch/riscv/kernel/process.c             |   3 +
>   arch/riscv/kernel/vector.c              |  31 ++++--
>   9 files changed, 273 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index cba53dcc2ae0..70603c486593 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -557,6 +557,20 @@ config RISCV_ISA_V_MEMMOVE_THRESHOLD
>   	  Prefer using vectorized memmove() when the workload size exceeds this
>   	  value.
>   
> +config RISCV_ISA_V_PREEMPTIVE
> +	bool "Run kernel-mode Vector with kernel preemption"
> +	depends on PREEMPTION
> +	depends on RISCV_ISA_V
> +	default y
> +	help
> +	  Usually, in-kernel SIMD routines are run with preemption disabled.
> +	  Functions which envoke long running SIMD thus must yield core's
> +	  vector unit to prevent blocking other tasks for too long.
> +
> +	  This config allows kernel to run SIMD without explicitly disable
> +	  preemption. Enabling this config will result in higher memory
> +	  consumption due to the allocation of per-task's kernel Vector context.
> +
>   config TOOLCHAIN_HAS_ZBB
>   	bool
>   	default y
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index be438932f321..cd627ec289f1 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -30,6 +30,11 @@ void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1,
>   		 const unsigned long *__restrict p4,
>   		 const unsigned long *__restrict p5);
>   
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +asmlinkage void riscv_v_context_nesting_start(struct pt_regs *regs);
> +asmlinkage void riscv_v_context_nesting_end(struct pt_regs *regs);
> +#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +
>   #endif /* CONFIG_RISCV_ISA_V */
>   
>   #define DECLARE_DO_ERROR_INFO(name)	asmlinkage void name(struct pt_regs *regs)
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 15781e2232e0..4de9124bcf4f 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -81,11 +81,32 @@ struct pt_regs;
>    *    activation of this state disables the preemption. On a non-RT kernel, it
>    *    also disable bh. Currently only 0 and 1 are valid value for this field.
>    *    Other values are reserved for future uses.
> + *  - bits 8-15 are used for tracking preemptible kernel-mode Vector, when
> + *    RISCV_ISA_V_PREEMPTIVE is set. Calling kernel_vector_begin() does not
> + *    disable the preemption if the thread's kernel_vstate.datap is allocated.
> + *    Instead, the kernel adds 1 into this field. Then the trap entry/exit code
> + *    knows if we are entering/exiting the context that owns preempt_v.
> + *     - 0: the task is not using preempt_v
> + *     - 1: the task is actively using, and owns preempt_v
> + *     - >1: the task was using preempt_v, but then took a trap within. Thus,
> + *       the task does not own preempt_v. Any use of Vector will have to save
> + *       preempt_v, if dirty, and fallback to non-preemptible kernel-mode
> + *       Vector.
> + *   - bit 30: The in-kernel preempt_v context is saved, and requries to be
> + *     restored when returning to the context that owns the preempt_v.
> + *   - bit 31: The in-kernel preempt_v context is dirty, as signaled by the
> + *     trap entry code. Any context switches out-of current task need to save
> + *     it to the task's in-kernel V context. Also, any traps nesting on-top-of
> + *     preempt_v requesting to use V needs a save.
>    */
>   
> -#define RISCV_KERNEL_MODE_V_MASK	0xff
> +#define RISCV_KERNEL_MODE_V_MASK	0x000000ff
> +#define RISCV_PREEMPT_V_MASK		0x0000ff00
>   
> -#define RISCV_KERNEL_MODE_V	0x1
> +#define RISCV_KERNEL_MODE_V		0x00000001
> +#define RISCV_PREEMPT_V			0x00000100
> +#define RISCV_PREEMPT_V_DIRTY		0x80000000
> +#define RISCV_PREEMPT_V_NEED_RESTORE	0x40000000
>   
>   /* CPU-specific state of a task */
>   struct thread_struct {
> @@ -99,6 +120,7 @@ struct thread_struct {
>   	u32 vstate_ctrl;
>   	struct __riscv_v_ext_state vstate;
>   	unsigned long align_ctl;
> +	struct __riscv_v_ext_state kernel_vstate;
>   };
>   
>   /* Whitelist the fstate from the task_struct for hardened usercopy */
> diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
> index 2f1e95ccb03c..7daccdcbdee8 100644
> --- a/arch/riscv/include/asm/simd.h
> +++ b/arch/riscv/include/asm/simd.h
> @@ -12,6 +12,7 @@
>   #include <linux/percpu.h>
>   #include <linux/preempt.h>
>   #include <linux/types.h>
> +#include <linux/thread_info.h>
>   
>   #include <asm/vector.h>
>   
> @@ -28,12 +29,27 @@ static __must_check inline bool may_use_simd(void)
>   	/*
>   	 * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
>   	 * and is clear whenever preemption is enabled.
> -	 *
> -	 * Kernel-mode Vector temporarily disables bh. So we must not return
> -	 * true on irq_disabled(). Otherwise we would fail the lockdep check
> -	 * calling local_bh_enable()
>   	 */
> -	return !in_hardirq() && !in_nmi() && !irqs_disabled() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
> +	if (in_hardirq() || in_nmi())
> +		return false;
> +
> +	/*
> +	 * Nesting is acheived in preempt_v by spreading the control for
> +	 * preemptible and non-preemptible kernel-mode Vector into two fields.
> +	 * Always try to match with prempt_v if kernel V-context exists. Then,
> +	 * fallback to check non preempt_v if nesting happens, or if the config
> +	 * is not set.
> +	 */
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_V_PREEMPTIVE) && current->thread.kernel_vstate.datap) {
> +		if (!riscv_preempt_v_started(current))
> +			return true;
> +	}
> +	/*
> +	 * Non-preemptible kernel-mode Vector temporarily disables bh. So we
> +	 * must not return true on irq_disabled(). Otherwise we would fail the
> +	 * lockdep check calling local_bh_enable()
> +	 */
> +	return !irqs_disabled() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
>   }
>   
>   #else /* ! CONFIG_RISCV_ISA_V */
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 0e6741dd9ef3..542eaf9227c3 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -28,6 +28,7 @@ void get_cpu_vector_context(void);
>   void put_cpu_vector_context(void);
>   void riscv_v_thread_free(struct task_struct *tsk);
>   void __init riscv_v_setup_ctx_cache(void);
> +void riscv_v_thread_alloc(struct task_struct *tsk);
>   
>   static inline void riscv_v_ctx_cnt_add(u32 offset)
>   {
> @@ -212,14 +213,63 @@ static inline void riscv_v_vstate_set_restore(struct task_struct *task,
>   	}
>   }
>   
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +static inline bool riscv_preempt_v_dirty(struct task_struct *task)
> +{
> +	u32 val = READ_ONCE(task->thread.riscv_v_flags);
> +
> +	return !!(val & RISCV_PREEMPT_V_DIRTY);
> +}
> +
> +static inline bool riscv_preempt_v_restore(struct task_struct *task)
> +{
> +	u32 val = READ_ONCE(task->thread.riscv_v_flags);
> +
> +	return !!(val & RISCV_PREEMPT_V_NEED_RESTORE);
> +}
> +
> +static inline void riscv_preempt_v_clear_dirty(struct task_struct *task)
> +{
> +	barrier();
> +	task->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_DIRTY;
> +}
> +
> +static inline void riscv_preempt_v_set_restore(struct task_struct *task)
> +{
> +	barrier();
> +	task->thread.riscv_v_flags |= RISCV_PREEMPT_V_NEED_RESTORE;
> +}
> +
> +static inline bool riscv_preempt_v_started(struct task_struct *task)
> +{
> +	return !!(READ_ONCE(task->thread.riscv_v_flags) & RISCV_PREEMPT_V_MASK);
> +}
> +#else /* !CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +static inline bool riscv_preempt_v_dirty(struct task_struct *task) { return false; }
> +static inline bool riscv_preempt_v_restore(struct task_struct *task) { return false; }
> +static inline bool riscv_preempt_v_started(struct task_struct *task) { return false; }
> +#define riscv_preempt_v_clear_dirty(tsk)	do {} while (0)
> +#define riscv_preempt_v_set_restore(tsk)	do {} while (0)
> +#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +
>   static inline void __switch_to_vector(struct task_struct *prev,
>   				      struct task_struct *next)
>   {
>   	struct pt_regs *regs;
>   
> -	regs = task_pt_regs(prev);
> -	riscv_v_vstate_save(&prev->thread.vstate, regs);
> -	riscv_v_vstate_set_restore(next, task_pt_regs(next));
> +	if (riscv_preempt_v_dirty(prev)) {
> +		__riscv_v_vstate_save(&prev->thread.kernel_vstate,
> +				      prev->thread.kernel_vstate.datap);
> +		riscv_preempt_v_clear_dirty(prev);
> +	} else {
> +		regs = task_pt_regs(prev);
> +		riscv_v_vstate_save(&prev->thread.vstate, regs);
In this thread [1], IIUC, you and Wang prefer to skip the SR_SD check
before saving the [vf]state, and I see that check is also absent from
this snippet.

How about removing the SR_SD check for the fpu case as well and
including that change in this series?

[1]:https://lore.kernel.org/linux-riscv/20231221070449.1809020-1-songshuaishuai@tinylab.org/ 

> +	}
> +
> +	if (riscv_preempt_v_started(next))
> +		riscv_preempt_v_set_restore(next);
> +	else
> +		riscv_v_vstate_set_restore(next, task_pt_regs(next));
>   }
>   
>   void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
> @@ -243,6 +293,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>   #define riscv_v_vstate_on(regs)			do {} while (0)
>   #define riscv_v_thread_free(tsk)		do {} while (0)
>   #define  riscv_v_setup_ctx_cache()		do {} while (0)
> +#define riscv_v_thread_alloc(tsk)		do {} while (0)
>   
>   #endif /* CONFIG_RISCV_ISA_V */
>   
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 54ca4564a926..9d1a305d5508 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -83,6 +83,10 @@ SYM_CODE_START(handle_exception)
>   	/* Load the kernel shadow call stack pointer if coming from userspace */
>   	scs_load_current_if_task_changed s5
>   
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	move a0, sp
> +	call riscv_v_context_nesting_start
> +#endif
>   	move a0, sp /* pt_regs */
>   	la ra, ret_from_exception
>   
> @@ -138,6 +142,10 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
>   	 */
>   	csrw CSR_SCRATCH, tp
>   1:
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	move a0, sp
> +	call riscv_v_context_nesting_end
> +#endif
>   	REG_L a0, PT_STATUS(sp)
>   	/*
>   	 * The current load reservation is effectively part of the processor's
> diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> index 7350e975e094..75d6b00842b3 100644
> --- a/arch/riscv/kernel/kernel_mode_vector.c
> +++ b/arch/riscv/kernel/kernel_mode_vector.c
> @@ -14,6 +14,9 @@
>   #include <asm/vector.h>
>   #include <asm/switch_to.h>
>   #include <asm/simd.h>
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +#include <asm/asm-prototypes.h>
> +#endif
>   
>   /*
>    * Claim ownership of the CPU vector context for use by the calling context.
> @@ -54,6 +57,111 @@ void put_cpu_vector_context(void)
>   		preempt_enable();
>   }
>   
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +static inline void riscv_preempt_v_set_dirty(void)
> +{
> +	current->thread.riscv_v_flags |= RISCV_PREEMPT_V_DIRTY;
> +}
> +
> +static inline void riscv_preempt_v_reset_flags(void)
> +{
> +	current->thread.riscv_v_flags &= ~(RISCV_PREEMPT_V_DIRTY | RISCV_PREEMPT_V_NEED_RESTORE);
> +}
> +
> +static inline void riscv_preempt_v_depth_inc(void)
> +{
> +	riscv_v_ctx_cnt_add(RISCV_PREEMPT_V);
> +}
> +
> +static inline void riscv_preempt_v_depth_dec(void)
> +{
> +	riscv_v_ctx_cnt_sub(RISCV_PREEMPT_V);
> +}
> +
> +static inline u32 riscv_preempt_v_get_depth(void)
> +{
> +	return riscv_v_ctx_cnt() & RISCV_PREEMPT_V_MASK;
> +}
> +
> +#define PREEMPT_V_FIRST_DEPTH	RISCV_PREEMPT_V
> +static int riscv_v_stop_kernel_context(void)
> +{
> +	if (riscv_preempt_v_get_depth() != PREEMPT_V_FIRST_DEPTH)
> +		return 1;
> +
> +	riscv_preempt_v_depth_dec();
> +	return 0;
> +}
> +
> +static int riscv_v_start_kernel_context(bool *is_nested)
> +{
> +	struct __riscv_v_ext_state *vstate = &current->thread.kernel_vstate;
> +
> +	if (!vstate->datap)
> +		return -ENOENT;
> +
> +	if (riscv_preempt_v_started(current)) {
> +		WARN_ON(riscv_preempt_v_get_depth() == PREEMPT_V_FIRST_DEPTH);
> +		if (riscv_preempt_v_dirty(current)) {
> +			get_cpu_vector_context();
> +			__riscv_v_vstate_save(vstate, vstate->datap);
> +			riscv_preempt_v_clear_dirty(current);
> +			put_cpu_vector_context();
> +		}
> +		get_cpu_vector_context();
> +		riscv_preempt_v_set_restore(current);
> +		*is_nested = true;
> +		return 0;
> +	}
> +
> +	get_cpu_vector_context();
> +	riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
> +	put_cpu_vector_context();
> +
> +	riscv_preempt_v_depth_inc();
> +	return 0;
> +}
> +
> +/* low-level V context handling code, called with irq disabled */
> +asmlinkage void riscv_v_context_nesting_start(struct pt_regs *regs)
> +{
> +	int depth;
> +
> +	if (!riscv_preempt_v_started(current))
> +		return;
> +
> +	depth = riscv_preempt_v_get_depth();
> +	if (depth == PREEMPT_V_FIRST_DEPTH && (regs->status & SR_VS) == SR_VS_DIRTY)
> +		riscv_preempt_v_set_dirty();
> +
> +	riscv_preempt_v_depth_inc();
> +}
> +
> +asmlinkage void riscv_v_context_nesting_end(struct pt_regs *regs)
> +{
> +	struct __riscv_v_ext_state *vstate = &current->thread.kernel_vstate;
> +	u32 depth;
> +
> +	lockdep_assert_irqs_disabled();
> +
> +	if (!riscv_preempt_v_started(current))
> +		return;
> +
> +	riscv_preempt_v_depth_dec();
> +	depth = riscv_preempt_v_get_depth();
> +	if (depth == PREEMPT_V_FIRST_DEPTH) {
> +		if (riscv_preempt_v_restore(current)) {
> +			__riscv_v_vstate_restore(vstate, vstate->datap);
> +			__riscv_v_vstate_clean(regs);
> +		}
> +		riscv_preempt_v_reset_flags();
> +	}
> +}
> +#else
> +#define riscv_v_start_kernel_context(nested)	(-ENOENT)
> +#define riscv_v_stop_kernel_context()		(-ENOENT)
> +#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +
>   /*
>    * kernel_vector_begin(): obtain the CPU vector registers for use by the calling
>    * context
> @@ -69,14 +177,20 @@ void put_cpu_vector_context(void)
>    */
>   void kernel_vector_begin(void)
>   {
> +	bool nested = false;
> +
>   	if (WARN_ON(!has_vector()))
>   		return;
>   
>   	BUG_ON(!may_use_simd());
>   
> -	get_cpu_vector_context();
> +	if (riscv_v_start_kernel_context(&nested)) {
> +		get_cpu_vector_context();
> +		riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
> +	}
>   
> -	riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
> +	if (!nested)
> +		riscv_v_vstate_set_restore(current, task_pt_regs(current));
>   
>   	riscv_v_enable();
>   }
> @@ -96,10 +210,10 @@ void kernel_vector_end(void)
>   	if (WARN_ON(!has_vector()))
>   		return;
>   
> -	riscv_v_vstate_set_restore(current, task_pt_regs(current));
> -
>   	riscv_v_disable();
>   
> -	put_cpu_vector_context();
> +	if (riscv_v_stop_kernel_context()) {// we should call this early
> +		put_cpu_vector_context();
> +	}
>   }
>   EXPORT_SYMBOL_GPL(kernel_vector_end);
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 862d59c3872e..92922dbd5b5c 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -188,6 +188,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>   	*dst = *src;
>   	/* clear entire V context, including datap for a new task */
>   	memset(&dst->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
> +	memset(&dst->thread.kernel_vstate, 0, sizeof(struct __riscv_v_ext_state));
>   	clear_tsk_thread_flag(dst, TIF_RISCV_V_DEFER_RESTORE);
>   
>   	return 0;
> @@ -224,6 +225,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>   		p->thread.s[0] = 0;
>   	}
>   	p->thread.riscv_v_flags = 0;
> +	if (has_vector())
> +		riscv_v_thread_alloc(p);
>   	p->thread.ra = (unsigned long)ret_from_fork;
>   	p->thread.sp = (unsigned long)childregs; /* kernel sp */
>   	return 0;
> diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
> index 1fe140e34557..f9769703fd39 100644
> --- a/arch/riscv/kernel/vector.c
> +++ b/arch/riscv/kernel/vector.c
> @@ -22,6 +22,9 @@
>   
>   static bool riscv_v_implicit_uacc = IS_ENABLED(CONFIG_RISCV_ISA_V_DEFAULT_ENABLE);
>   static struct kmem_cache *riscv_v_user_cachep;
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +static struct kmem_cache *riscv_v_kernel_cachep;
> +#endif
>   
>   unsigned long riscv_v_vsize __read_mostly;
>   EXPORT_SYMBOL_GPL(riscv_v_vsize);
> @@ -53,6 +56,11 @@ void __init riscv_v_setup_ctx_cache(void)
>   	riscv_v_user_cachep = kmem_cache_create_usercopy("riscv_vector_ctx",
>   							 riscv_v_vsize, 16, SLAB_PANIC,
>   							 0, riscv_v_vsize, NULL);
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	riscv_v_kernel_cachep = kmem_cache_create("riscv_vector_kctx",
> +						  riscv_v_vsize, 16,
> +						  SLAB_PANIC, NULL);
> +#endif
>   }
>   
>   static bool insn_is_vector(u32 insn_buf)
> @@ -88,24 +96,35 @@ static bool insn_is_vector(u32 insn_buf)
>   	return false;
>   }
>   
> -static int riscv_v_thread_zalloc(void)
> +static int riscv_v_thread_zalloc(struct kmem_cache *cache,
> +				 struct __riscv_v_ext_state *ctx)
>   {
>   	void *datap;
>   
> -	datap = kmem_cache_zalloc(riscv_v_user_cachep, GFP_KERNEL);
> +	datap = kmem_cache_zalloc(cache, GFP_KERNEL);
>   	if (!datap)
>   		return -ENOMEM;
>   
> -	current->thread.vstate.datap = datap;
> -	memset(&current->thread.vstate, 0, offsetof(struct __riscv_v_ext_state,
> -						    datap));
> +	ctx->datap = datap;
> +	memset(ctx, 0, offsetof(struct __riscv_v_ext_state, datap));
>   	return 0;
>   }
>   
> +void riscv_v_thread_alloc(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	riscv_v_thread_zalloc(riscv_v_kernel_cachep, &tsk->thread.kernel_vstate);
> +#endif
> +}
> +
>   void riscv_v_thread_free(struct task_struct *tsk)
>   {
>   	if (tsk->thread.vstate.datap)
>   		kmem_cache_free(riscv_v_user_cachep, tsk->thread.vstate.datap);
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	if (tsk->thread.kernel_vstate.datap)
> +		kmem_cache_free(riscv_v_kernel_cachep, tsk->thread.kernel_vstate.datap);
> +#endif
>   }
>   
>   #define VSTATE_CTRL_GET_CUR(x) ((x) & PR_RISCV_V_VSTATE_CTRL_CUR_MASK)
> @@ -177,7 +196,7 @@ bool riscv_v_first_use_handler(struct pt_regs *regs)
>   	 * context where VS has been off. So, try to allocate the user's V
>   	 * context and resume execution.
>   	 */
> -	if (riscv_v_thread_zalloc()) {
> +	if (riscv_v_thread_zalloc(riscv_v_user_cachep, &current->thread.vstate)) {
>   		force_sig(SIGBUS);
>   		return true;
>   	}

-- 
Thanks
Song Shuai

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption
  2023-12-23  4:29 ` [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption Andy Chiu
  2023-12-27 12:12   ` Song Shuai
@ 2023-12-27 22:45   ` Samuel Holland
  1 sibling, 0 replies; 24+ messages in thread
From: Samuel Holland @ 2023-12-27 22:45 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv, palmer
  Cc: paul.walmsley, greentime.hu, guoren, bjorn, charlie, ardb, arnd,
	peterz, tglx, ebiggers, Albert Ou, Guo Ren, Sami Tolvanen,
	Han-Kuan Chen, Deepak Gupta, Vincent Chen, Heiko Stuebner,
	Baoquan He, Clément Léger, Björn Töpel,
	Xiao Wang, Nathan Chancellor, Jisheng Zhang, Conor Dooley,
	Joel Granados

On 2023-12-22 10:29 PM, Andy Chiu wrote:
> Add kernel_vstate to keep track of kernel-mode Vector registers when a
> trap-introduced context switch happens. Also, provide riscv_v_flags to
> let the context save/restore routines track context status. Context
> tracking happens whenever the core starts its in-kernel Vector
> execution. An active (dirty) kernel task's V context is saved to memory
> whenever a trap-introduced context switch happens, or when a softirq
> that nests on top of it uses Vector. Context restoring happens when
> execution transfers back to the original kernel context where it first
> enabled preempt_v.
> 
> Also, provide a config CONFIG_RISCV_ISA_V_PREEMPTIVE to give users an
> option to disable preemptible kernel-mode Vector at build time. Users
> with constrained memory may want to disable this config, as preemptible
> kernel-mode Vector needs extra space to track each thread's kernel-mode
> V context. Users may also want to disable it if all kernel-mode Vector
> code is time-sensitive and cannot tolerate context switch overhead.
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v8:
>  - fix -Wmissing-prototypes for functions with asmlinkage
> Changelog v6:
>  - re-write patch to handle context nesting for softirqs
>  - drop thread flag and track context instead in riscv_v_flags
>  - refine some asm code and constrain it into C functions
>  - preallocate v context for preempt_v
>  - Return non-zero in riscv_v_start_kernel_context with non-preemptible
>    kernel-mode Vector
> Changelog v4:
>  - dropped from v4
> Changelog v3:
>  - Guard vstate_save with {get,put}_cpu_vector_context
>  - Add comments on prevention of nesting V contexts
>  - remove warnings in context switch when trap's reg is not present (Conor)
>  - refactor code (Björn)
> Changelog v2:
>  - fix build fail when compiling without RISCV_ISA_V (Conor)
>  - 's/TIF_RISCV_V_KMV/TIF_RISCV_V_KERNEL_MODE' and add comment (Conor)
>  - merge Kconfig patch into this one (Conor).
>  - 's/CONFIG_RISCV_ISA_V_PREEMPTIVE_KMV/CONFIG_RISCV_ISA_V_PREEMPTIVE/'
>    (Conor)
>  - fix some typos (Conor)
>  - enclose assembly with RISCV_ISA_V_PREEMPTIVE.
>  - change riscv_v_vstate_ctrl_config_kmv() to
>    kernel_vector_allow_preemption() for better understanding. (Conor)
>  - 's/riscv_v_kmv_preempitble/kernel_vector_preemptible/'
> ---
>  arch/riscv/Kconfig                      |  14 +++
>  arch/riscv/include/asm/asm-prototypes.h |   5 +
>  arch/riscv/include/asm/processor.h      |  26 ++++-
>  arch/riscv/include/asm/simd.h           |  26 ++++-
>  arch/riscv/include/asm/vector.h         |  57 ++++++++++-
>  arch/riscv/kernel/entry.S               |   8 ++
>  arch/riscv/kernel/kernel_mode_vector.c  | 124 +++++++++++++++++++++++-
>  arch/riscv/kernel/process.c             |   3 +
>  arch/riscv/kernel/vector.c              |  31 ++++--
>  9 files changed, 273 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index cba53dcc2ae0..70603c486593 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -557,6 +557,20 @@ config RISCV_ISA_V_MEMMOVE_THRESHOLD
>  	  Prefer using vectorized memmove() when the workload size exceeds this
>  	  value.
>  
> +config RISCV_ISA_V_PREEMPTIVE
> +	bool "Run kernel-mode Vector with kernel preemption"
> +	depends on PREEMPTION
> +	depends on RISCV_ISA_V
> +	default y
> +	help
> +	  Usually, in-kernel SIMD routines are run with preemption disabled.
> +	  Functions which invoke long-running SIMD code must therefore yield
> +	  the core's vector unit to avoid blocking other tasks for too long.
> +
> +	  This config allows the kernel to run SIMD without explicitly
> +	  disabling preemption. Enabling this config results in higher memory
> +	  consumption due to the allocation of a per-task kernel Vector context.
> +
>  config TOOLCHAIN_HAS_ZBB
>  	bool
>  	default y
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index be438932f321..cd627ec289f1 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -30,6 +30,11 @@ void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1,
>  		 const unsigned long *__restrict p4,
>  		 const unsigned long *__restrict p5);
>  
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +asmlinkage void riscv_v_context_nesting_start(struct pt_regs *regs);
> +asmlinkage void riscv_v_context_nesting_end(struct pt_regs *regs);
> +#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +
>  #endif /* CONFIG_RISCV_ISA_V */
>  
>  #define DECLARE_DO_ERROR_INFO(name)	asmlinkage void name(struct pt_regs *regs)
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 15781e2232e0..4de9124bcf4f 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -81,11 +81,32 @@ struct pt_regs;
>   *    activation of this state disables preemption. On a non-RT kernel, it
>   *    also disables bh. Currently only 0 and 1 are valid values for this field.
>   *    Other values are reserved for future uses.
> + *  - bits 8-15 are used for tracking preemptible kernel-mode Vector, when
> + *    RISCV_ISA_V_PREEMPTIVE is set. Calling kernel_vector_begin() does not
> + *    disable the preemption if the thread's kernel_vstate.datap is allocated.
> + *    Instead, the kernel adds 1 into this field. Then the trap entry/exit code
> + *    knows if we are entering/exiting the context that owns preempt_v.
> + *     - 0: the task is not using preempt_v
> + *     - 1: the task is actively using, and owns preempt_v
> + *     - >1: the task was using preempt_v, but then took a trap within. Thus,
> + *       the task does not own preempt_v. Any use of Vector will have to save
> + *       preempt_v, if dirty, and fall back to non-preemptible kernel-mode
> + *       Vector.
> + *   - bit 30: The in-kernel preempt_v context is saved, and needs to be
> + *     restored when returning to the context that owns the preempt_v.
> + *   - bit 31: The in-kernel preempt_v context is dirty, as signaled by the
> + *     trap entry code. Any context switch out of the current task needs to
> + *     save it to the task's in-kernel V context. Also, any trap nesting on
> + *     top of preempt_v that requests to use V needs a save.
>   */
>  
> -#define RISCV_KERNEL_MODE_V_MASK	0xff
> +#define RISCV_KERNEL_MODE_V_MASK	0x000000ff
> +#define RISCV_PREEMPT_V_MASK		0x0000ff00
>  
> -#define RISCV_KERNEL_MODE_V	0x1
> +#define RISCV_KERNEL_MODE_V		0x00000001
> +#define RISCV_PREEMPT_V			0x00000100
> +#define RISCV_PREEMPT_V_DIRTY		0x80000000
> +#define RISCV_PREEMPT_V_NEED_RESTORE	0x40000000
>  
>  /* CPU-specific state of a task */
>  struct thread_struct {
> @@ -99,6 +120,7 @@ struct thread_struct {
>  	u32 vstate_ctrl;
>  	struct __riscv_v_ext_state vstate;
>  	unsigned long align_ctl;
> +	struct __riscv_v_ext_state kernel_vstate;
>  };
>  
>  /* Whitelist the fstate from the task_struct for hardened usercopy */
> diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
> index 2f1e95ccb03c..7daccdcbdee8 100644
> --- a/arch/riscv/include/asm/simd.h
> +++ b/arch/riscv/include/asm/simd.h
> @@ -12,6 +12,7 @@
>  #include <linux/percpu.h>
>  #include <linux/preempt.h>
>  #include <linux/types.h>
> +#include <linux/thread_info.h>
>  
>  #include <asm/vector.h>
>  
> @@ -28,12 +29,27 @@ static __must_check inline bool may_use_simd(void)
>  	/*
>  	 * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
>  	 * and is clear whenever preemption is enabled.
> -	 *
> -	 * Kernel-mode Vector temporarily disables bh. So we must not return
> -	 * true on irq_disabled(). Otherwise we would fail the lockdep check
> -	 * calling local_bh_enable()
>  	 */
> -	return !in_hardirq() && !in_nmi() && !irqs_disabled() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
> +	if (in_hardirq() || in_nmi())
> +		return false;
> +
> +	/*
> +	 * Nesting is achieved in preempt_v by spreading the control for
> +	 * preemptible and non-preemptible kernel-mode Vector into two fields.
> +	 * Always try to match with preempt_v if a kernel V-context exists.
> +	 * Then, fall back to checking non-preempt_v if nesting happens, or if
> +	 * the config is not set.
> +	 */
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_V_PREEMPTIVE) && current->thread.kernel_vstate.datap) {
> +		if (!riscv_preempt_v_started(current))
> +			return true;
> +	}
> +	/*
> +	 * Non-preemptible kernel-mode Vector temporarily disables bh. So we
> +	 * must not return true when irqs_disabled(). Otherwise we would fail
> +	 * the lockdep check when calling local_bh_enable().
> +	 */
> +	return !irqs_disabled() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
>  }
>  
>  #else /* ! CONFIG_RISCV_ISA_V */
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 0e6741dd9ef3..542eaf9227c3 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -28,6 +28,7 @@ void get_cpu_vector_context(void);
>  void put_cpu_vector_context(void);
>  void riscv_v_thread_free(struct task_struct *tsk);
>  void __init riscv_v_setup_ctx_cache(void);
> +void riscv_v_thread_alloc(struct task_struct *tsk);
>  
>  static inline void riscv_v_ctx_cnt_add(u32 offset)
>  {
> @@ -212,14 +213,63 @@ static inline void riscv_v_vstate_set_restore(struct task_struct *task,
>  	}
>  }
>  
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +static inline bool riscv_preempt_v_dirty(struct task_struct *task)
> +{
> +	u32 val = READ_ONCE(task->thread.riscv_v_flags);
> +
> +	return !!(val & RISCV_PREEMPT_V_DIRTY);
> +}
> +
> +static inline bool riscv_preempt_v_restore(struct task_struct *task)
> +{
> +	u32 val = READ_ONCE(task->thread.riscv_v_flags);
> +
> +	return !!(val & RISCV_PREEMPT_V_NEED_RESTORE);
> +}
> +
> +static inline void riscv_preempt_v_clear_dirty(struct task_struct *task)
> +{
> +	barrier();
> +	task->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_DIRTY;
> +}
> +
> +static inline void riscv_preempt_v_set_restore(struct task_struct *task)
> +{
> +	barrier();
> +	task->thread.riscv_v_flags |= RISCV_PREEMPT_V_NEED_RESTORE;
> +}
> +
> +static inline bool riscv_preempt_v_started(struct task_struct *task)
> +{
> +	return !!(READ_ONCE(task->thread.riscv_v_flags) & RISCV_PREEMPT_V_MASK);
> +}
> +#else /* !CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +static inline bool riscv_preempt_v_dirty(struct task_struct *task) { return false; }
> +static inline bool riscv_preempt_v_restore(struct task_struct *task) { return false; }
> +static inline bool riscv_preempt_v_started(struct task_struct *task) { return false; }
> +#define riscv_preempt_v_clear_dirty(tsk)	do {} while (0)
> +#define riscv_preempt_v_set_restore(tsk)	do {} while (0)
> +#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +
>  static inline void __switch_to_vector(struct task_struct *prev,
>  				      struct task_struct *next)
>  {
>  	struct pt_regs *regs;
>  
> -	regs = task_pt_regs(prev);
> -	riscv_v_vstate_save(&prev->thread.vstate, regs);
> -	riscv_v_vstate_set_restore(next, task_pt_regs(next));
> +	if (riscv_preempt_v_dirty(prev)) {
> +		__riscv_v_vstate_save(&prev->thread.kernel_vstate,
> +				      prev->thread.kernel_vstate.datap);
> +		riscv_preempt_v_clear_dirty(prev);
> +	} else {
> +		regs = task_pt_regs(prev);
> +		riscv_v_vstate_save(&prev->thread.vstate, regs);
> +	}
> +
> +	if (riscv_preempt_v_started(next))
> +		riscv_preempt_v_set_restore(next);
> +	else
> +		riscv_v_vstate_set_restore(next, task_pt_regs(next));
>  }
>  
>  void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
> @@ -243,6 +293,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>  #define riscv_v_vstate_on(regs)			do {} while (0)
>  #define riscv_v_thread_free(tsk)		do {} while (0)
>  #define  riscv_v_setup_ctx_cache()		do {} while (0)
> +#define riscv_v_thread_alloc(tsk)		do {} while (0)
>  
>  #endif /* CONFIG_RISCV_ISA_V */
>  
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 54ca4564a926..9d1a305d5508 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -83,6 +83,10 @@ SYM_CODE_START(handle_exception)
>  	/* Load the kernel shadow call stack pointer if coming from userspace */
>  	scs_load_current_if_task_changed s5
>  
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	move a0, sp
> +	call riscv_v_context_nesting_start
> +#endif
>  	move a0, sp /* pt_regs */
>  	la ra, ret_from_exception
>  
> @@ -138,6 +142,10 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
>  	 */
>  	csrw CSR_SCRATCH, tp
>  1:
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	move a0, sp
> +	call riscv_v_context_nesting_end
> +#endif
>  	REG_L a0, PT_STATUS(sp)
>  	/*
>  	 * The current load reservation is effectively part of the processor's
> diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> index 7350e975e094..75d6b00842b3 100644
> --- a/arch/riscv/kernel/kernel_mode_vector.c
> +++ b/arch/riscv/kernel/kernel_mode_vector.c
> @@ -14,6 +14,9 @@
>  #include <asm/vector.h>
>  #include <asm/switch_to.h>
>  #include <asm/simd.h>
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +#include <asm/asm-prototypes.h>
> +#endif
>  
>  /*
>   * Claim ownership of the CPU vector context for use by the calling context.
> @@ -54,6 +57,111 @@ void put_cpu_vector_context(void)
>  		preempt_enable();
>  }
>  
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +static inline void riscv_preempt_v_set_dirty(void)
> +{
> +	current->thread.riscv_v_flags |= RISCV_PREEMPT_V_DIRTY;
> +}
> +
> +static inline void riscv_preempt_v_reset_flags(void)
> +{
> +	current->thread.riscv_v_flags &= ~(RISCV_PREEMPT_V_DIRTY | RISCV_PREEMPT_V_NEED_RESTORE);
> +}
> +
> +static inline void riscv_preempt_v_depth_inc(void)
> +{
> +	riscv_v_ctx_cnt_add(RISCV_PREEMPT_V);
> +}
> +
> +static inline void riscv_preempt_v_depth_dec(void)
> +{
> +	riscv_v_ctx_cnt_sub(RISCV_PREEMPT_V);
> +}
> +
> +static inline u32 riscv_preempt_v_get_depth(void)
> +{
> +	return riscv_v_ctx_cnt() & RISCV_PREEMPT_V_MASK;
> +}
> +
> +#define PREEMPT_V_FIRST_DEPTH	RISCV_PREEMPT_V
> +static int riscv_v_stop_kernel_context(void)
> +{
> +	if (riscv_preempt_v_get_depth() != PREEMPT_V_FIRST_DEPTH)
> +		return 1;
> +
> +	riscv_preempt_v_depth_dec();
> +	return 0;
> +}
> +
> +static int riscv_v_start_kernel_context(bool *is_nested)
> +{
> +	struct __riscv_v_ext_state *vstate = &current->thread.kernel_vstate;
> +
> +	if (!vstate->datap)
> +		return -ENOENT;
> +
> +	if (riscv_preempt_v_started(current)) {
> +		WARN_ON(riscv_preempt_v_get_depth() == PREEMPT_V_FIRST_DEPTH);
> +		if (riscv_preempt_v_dirty(current)) {
> +			get_cpu_vector_context();
> +			__riscv_v_vstate_save(vstate, vstate->datap);
> +			riscv_preempt_v_clear_dirty(current);
> +			put_cpu_vector_context();
> +		}
> +		get_cpu_vector_context();
> +		riscv_preempt_v_set_restore(current);
> +		*is_nested = true;
> +		return 0;
> +	}
> +
> +	get_cpu_vector_context();
> +	riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
> +	put_cpu_vector_context();
> +
> +	riscv_preempt_v_depth_inc();
> +	return 0;
> +}
> +
> +/* low-level V context handling code, called with irq disabled */
> +asmlinkage void riscv_v_context_nesting_start(struct pt_regs *regs)
> +{
> +	int depth;
> +
> +	if (!riscv_preempt_v_started(current))
> +		return;
> +
> +	depth = riscv_preempt_v_get_depth();
> +	if (depth == PREEMPT_V_FIRST_DEPTH && (regs->status & SR_VS) == SR_VS_DIRTY)
> +		riscv_preempt_v_set_dirty();
> +
> +	riscv_preempt_v_depth_inc();
> +}
> +
> +asmlinkage void riscv_v_context_nesting_end(struct pt_regs *regs)
> +{
> +	struct __riscv_v_ext_state *vstate = &current->thread.kernel_vstate;
> +	u32 depth;
> +
> +	lockdep_assert_irqs_disabled();

I'm seeing this assertion fail immediately during boot:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/kernel_mode_vector.c:145 riscv_v_context_nesting_end+0x17a/0x184
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc7-mainline-maybe-dirty #1
Hardware name: SiFive HiFive Unmatched A00 (DT)
epc : riscv_v_context_nesting_end+0x17a/0x184
 ra : ret_from_exception+0x1c/0x6e
epc : ffffffff8000a410 ra : ffffffff80d64da6 sp : ffffffff81a03d60
 gp : ffffffff81c047c8 tp : ffffffff81a27040 t0 : fffffffffffffb58
 t1 : ffffffff81aae7c0 t2 : 0000000000000000 s0 : ffffffff81a03d90
 s1 : 0000000000000001 a0 : 0000000000000001 a1 : ffffffff8101e430
 a2 : 0000000000000001 a3 : ffffffff81a27a30 a4 : 0000000000000000
 a5 : 0000000000000000 a6 : 0000000000000003 a7 : ffffffdbefeed0a0
 s2 : ffffffff81a03d90 s3 : ffffffff8297f190 s4 : 8000000000000005
 s5 : ffffffff81a27040 s6 : 00000000ffef6ab0 s7 : 0000000080200000
 s8 : 0000000000000710 s9 : 00000000ffef6bc8 s10: 0000000000000003
 s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000
 t5 : 0000000000003288 t6 : 0000000000000008
status: 0000000200000100 badaddr: ffffffff81a27040 cause: 0000000000000003
[<ffffffff8000a410>] riscv_v_context_nesting_end+0x17a/0x184
[<ffffffff80d64da6>] ret_from_exception+0x1c/0x6e
irq event stamp: 11
hardirqs last  enabled at (11): [<ffffffff80d581b0>] irqentry_exit+0xd2/0x116
hardirqs last disabled at (9): [<ffffffff80d6536c>] __do_softirq+0x404/0x526
softirqs last  enabled at (10): [<ffffffff80d65430>] __do_softirq+0x4c8/0x526
softirqs last disabled at (3): [<ffffffff80042a94>] __irq_exit_rcu+0x74/0xca
---[ end trace 0000000000000000 ]---

It looks like lockdep_hardirqs_on() is called from the generic entry code,
so lockdep thinks IRQs are enabled throughout ret_from_exception(), even if
they don't actually get enabled until the sret instruction. So I think this
assertion should be removed.

Regards,
Samuel

> +
> +	if (!riscv_preempt_v_started(current))
> +		return;
> +
> +	riscv_preempt_v_depth_dec();
> +	depth = riscv_preempt_v_get_depth();
> +	if (depth == PREEMPT_V_FIRST_DEPTH) {
> +		if (riscv_preempt_v_restore(current)) {
> +			__riscv_v_vstate_restore(vstate, vstate->datap);
> +			__riscv_v_vstate_clean(regs);
> +		}
> +		riscv_preempt_v_reset_flags();
> +	}
> +}
> +#else
> +#define riscv_v_start_kernel_context(nested)	(-ENOENT)
> +#define riscv_v_stop_kernel_context()		(-ENOENT)
> +#endif /* CONFIG_RISCV_ISA_V_PREEMPTIVE */
> +
>  /*
>   * kernel_vector_begin(): obtain the CPU vector registers for use by the calling
>   * context
> @@ -69,14 +177,20 @@ void put_cpu_vector_context(void)
>   */
>  void kernel_vector_begin(void)
>  {
> +	bool nested = false;
> +
>  	if (WARN_ON(!has_vector()))
>  		return;
>  
>  	BUG_ON(!may_use_simd());
>  
> -	get_cpu_vector_context();
> +	if (riscv_v_start_kernel_context(&nested)) {
> +		get_cpu_vector_context();
> +		riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
> +	}
>  
> -	riscv_v_vstate_save(&current->thread.vstate, task_pt_regs(current));
> +	if (!nested)
> +		riscv_v_vstate_set_restore(current, task_pt_regs(current));
>  
>  	riscv_v_enable();
>  }
> @@ -96,10 +210,10 @@ void kernel_vector_end(void)
>  	if (WARN_ON(!has_vector()))
>  		return;
>  
> -	riscv_v_vstate_set_restore(current, task_pt_regs(current));
> -
>  	riscv_v_disable();
>  
> -	put_cpu_vector_context();
> +	if (riscv_v_stop_kernel_context()) {	/* we should call this early */
> +		put_cpu_vector_context();
> +	}
>  }
>  EXPORT_SYMBOL_GPL(kernel_vector_end);
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 862d59c3872e..92922dbd5b5c 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -188,6 +188,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  	*dst = *src;
>  	/* clear entire V context, including datap for a new task */
>  	memset(&dst->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
> +	memset(&dst->thread.kernel_vstate, 0, sizeof(struct __riscv_v_ext_state));
>  	clear_tsk_thread_flag(dst, TIF_RISCV_V_DEFER_RESTORE);
>  
>  	return 0;
> @@ -224,6 +225,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  		p->thread.s[0] = 0;
>  	}
>  	p->thread.riscv_v_flags = 0;
> +	if (has_vector())
> +		riscv_v_thread_alloc(p);
>  	p->thread.ra = (unsigned long)ret_from_fork;
>  	p->thread.sp = (unsigned long)childregs; /* kernel sp */
>  	return 0;
> diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
> index 1fe140e34557..f9769703fd39 100644
> --- a/arch/riscv/kernel/vector.c
> +++ b/arch/riscv/kernel/vector.c
> @@ -22,6 +22,9 @@
>  
>  static bool riscv_v_implicit_uacc = IS_ENABLED(CONFIG_RISCV_ISA_V_DEFAULT_ENABLE);
>  static struct kmem_cache *riscv_v_user_cachep;
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +static struct kmem_cache *riscv_v_kernel_cachep;
> +#endif
>  
>  unsigned long riscv_v_vsize __read_mostly;
>  EXPORT_SYMBOL_GPL(riscv_v_vsize);
> @@ -53,6 +56,11 @@ void __init riscv_v_setup_ctx_cache(void)
>  	riscv_v_user_cachep = kmem_cache_create_usercopy("riscv_vector_ctx",
>  							 riscv_v_vsize, 16, SLAB_PANIC,
>  							 0, riscv_v_vsize, NULL);
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	riscv_v_kernel_cachep = kmem_cache_create("riscv_vector_kctx",
> +						  riscv_v_vsize, 16,
> +						  SLAB_PANIC, NULL);
> +#endif
>  }
>  
>  static bool insn_is_vector(u32 insn_buf)
> @@ -88,24 +96,35 @@ static bool insn_is_vector(u32 insn_buf)
>  	return false;
>  }
>  
> -static int riscv_v_thread_zalloc(void)
> +static int riscv_v_thread_zalloc(struct kmem_cache *cache,
> +				 struct __riscv_v_ext_state *ctx)
>  {
>  	void *datap;
>  
> -	datap = kmem_cache_zalloc(riscv_v_user_cachep, GFP_KERNEL);
> +	datap = kmem_cache_zalloc(cache, GFP_KERNEL);
>  	if (!datap)
>  		return -ENOMEM;
>  
> -	current->thread.vstate.datap = datap;
> -	memset(&current->thread.vstate, 0, offsetof(struct __riscv_v_ext_state,
> -						    datap));
> +	ctx->datap = datap;
> +	memset(ctx, 0, offsetof(struct __riscv_v_ext_state, datap));
>  	return 0;
>  }
>  
> +void riscv_v_thread_alloc(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	riscv_v_thread_zalloc(riscv_v_kernel_cachep, &tsk->thread.kernel_vstate);
> +#endif
> +}
> +
>  void riscv_v_thread_free(struct task_struct *tsk)
>  {
>  	if (tsk->thread.vstate.datap)
>  		kmem_cache_free(riscv_v_user_cachep, tsk->thread.vstate.datap);
> +#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
> +	if (tsk->thread.kernel_vstate.datap)
> +		kmem_cache_free(riscv_v_kernel_cachep, tsk->thread.kernel_vstate.datap);
> +#endif
>  }
>  
>  #define VSTATE_CTRL_GET_CUR(x) ((x) & PR_RISCV_V_VSTATE_CTRL_CUR_MASK)
> @@ -177,7 +196,7 @@ bool riscv_v_first_use_handler(struct pt_regs *regs)
>  	 * context where VS has been off. So, try to allocate the user's V
>  	 * context and resume execution.
>  	 */
> -	if (riscv_v_thread_zalloc()) {
> +	if (riscv_v_thread_zalloc(riscv_v_user_cachep, &current->thread.vstate)) {
>  		force_sig(SIGBUS);
>  		return true;
>  	}




* Re: [v8, 01/10] riscv: Add support for kernel mode vector
  2023-12-27  9:18         ` Andy Chiu
@ 2023-12-28  1:52           ` Charlie Jenkins
  0 siblings, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-12-28  1:52 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	ardb, arnd, peterz, tglx, ebiggers, Vincent Chen, Albert Ou,
	Heiko Stuebner, Baoquan He, Clément Léger, Guo Ren,
	Xiao Wang, Björn Töpel, Conor Dooley, Alexandre Ghiti,
	Sami Tolvanen, Sia Jee Heng, Evan Green, Jisheng Zhang

On Wed, Dec 27, 2023 at 05:18:10PM +0800, Andy Chiu wrote:
> On Wed, Dec 27, 2023 at 1:30 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > On Wed, Dec 27, 2023 at 10:46:58AM +0800, Andy Chiu wrote:
> > > On Wed, Dec 27, 2023 at 9:36 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> > > >
> > > > On Sat, Dec 23, 2023 at 04:29:05AM +0000, Andy Chiu wrote:
> > > > > From: Greentime Hu <greentime.hu@sifive.com>
> > > > >
> > > > > Add kernel_vector_begin() and kernel_vector_end() function declarations
> > > > > and corresponding definitions in kernel_mode_vector.c
> > > > >
> > > > > These are needed to wrap uses of vector in kernel mode.
> > > > >
> > > > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > > > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > > > > ---
> > > > > Changelog v8:
> > > > >  - Refactor unnecessary whitespace change (Eric)
> > > > > Changelog v7:
> > > > >  - fix build fail for allmodconfig
> > > > > Changelog v6:
> > > > >  - Use 8 bits to track non-preemptible vector context to provide better
> > > > >    WARN coverage.
> > > > > Changelog v4:
> > > > >  - Use kernel_v_flags and helpers to track vector context.
> > > > > Changelog v3:
> > > > >  - Reorder patch 1 to patch 3 to make use of
> > > > >    {get,put}_cpu_vector_context later.
> > > > >  - Export {get,put}_cpu_vector_context.
> > > > >  - Save V context after disabling preemption. (Guo)
> > > > >  - Fix a build fail. (Conor)
> > > > >  - Remove irqs_disabled() check as it is not needed, fix styling. (Björn)
> > > > > Changelog v2:
> > > > >  - 's/kernel_rvv/kernel_vector' and return void in kernel_vector_begin
> > > > >    (Conor)
> > > > >  - export may_use_simd to include/asm/simd.h
> > > > > ---
> > > > >  arch/riscv/include/asm/processor.h     | 17 ++++-
> > > > >  arch/riscv/include/asm/simd.h          | 44 ++++++++++++
> > > > >  arch/riscv/include/asm/vector.h        | 21 ++++++
> > > > >  arch/riscv/kernel/Makefile             |  1 +
> > > > >  arch/riscv/kernel/kernel_mode_vector.c | 95 ++++++++++++++++++++++++++
> > > > >  arch/riscv/kernel/process.c            |  1 +
> > > > >  6 files changed, 178 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 arch/riscv/include/asm/simd.h
> > > > >  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > > > > index f19f861cda54..15781e2232e0 100644
> > > > > --- a/arch/riscv/include/asm/processor.h
> > > > > +++ b/arch/riscv/include/asm/processor.h
> > > > > @@ -73,6 +73,20 @@
> > > > >  struct task_struct;
> > > > >  struct pt_regs;
> > > > >
> > > > > +/*
> > > > > + * We use a flag to track in-kernel Vector context. Currently the flag has the
> > > > > + * following meaning:
> > > > > + *
> > > > > + *  - bits 0-7 indicate whether the in-kernel Vector context is active. The
> > > > > + *    activation of this state disables preemption. On a non-RT kernel, it
> > > > > + *    also disables bh. Currently only 0 and 1 are valid values for this field.
> > > > > + *    Other values are reserved for future uses.
> > > > > + */
> > > > > +
> > > > > +#define RISCV_KERNEL_MODE_V_MASK     0xff
> > > > > +
> > > > > +#define RISCV_KERNEL_MODE_V  0x1
> > > > > +
> > > > >  /* CPU-specific state of a task */
> > > > >  struct thread_struct {
> > > > >       /* Callee-saved registers */
> > > > > @@ -81,7 +95,8 @@ struct thread_struct {
> > > > >       unsigned long s[12];    /* s[0]: frame pointer */
> > > > >       struct __riscv_d_ext_state fstate;
> > > > >       unsigned long bad_cause;
> > > > > -     unsigned long vstate_ctrl;
> > > > > +     u32 riscv_v_flags;
> > > > > +     u32 vstate_ctrl;
> > > > >       struct __riscv_v_ext_state vstate;
> > > > >       unsigned long align_ctl;
> > > > >  };
> > > > > diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
> > > > > new file mode 100644
> > > > > index 000000000000..3b603e47c5d8
> > > > > --- /dev/null
> > > > > +++ b/arch/riscv/include/asm/simd.h
> > > > > @@ -0,0 +1,44 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > +/*
> > > > > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > > > > + * Copyright (C) 2023 SiFive
> > > > > + */
> > > > > +
> > > > > +#ifndef __ASM_SIMD_H
> > > > > +#define __ASM_SIMD_H
> > > > > +
> > > > > +#include <linux/compiler.h>
> > > > > +#include <linux/irqflags.h>
> > > > > +#include <linux/percpu.h>
> > > > > +#include <linux/preempt.h>
> > > > > +#include <linux/types.h>
> > > > > +
> > > > > +#include <asm/vector.h>
> > > > > +
> > > > > +#ifdef CONFIG_RISCV_ISA_V
> > > > > +/*
> > > > > + * may_use_simd - whether it is allowable at this time to issue vector
> > > > > + *                instructions or access the vector register file
> > > > > + *
> > > > > + * Callers must not assume that the result remains true beyond the next
> > > > > + * preempt_enable() or return from softirq context.
> > > > > + */
> > > > > +static __must_check inline bool may_use_simd(void)
> > > > > +{
> > > > > +     /*
> > > > > +      * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
> > > > > +      * and is clear whenever preemption is enabled.
> > > > > +      */
> > > > > +     return !in_hardirq() && !in_nmi() && !(riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK);
> > > > > +}
> > > > > +
> > > > > +#else /* ! CONFIG_RISCV_ISA_V */
> > > > > +
> > > > > +static __must_check inline bool may_use_simd(void)
> > > > > +{
> > > > > +     return false;
> > > > > +}
> > > > > +
> > > > > +#endif /* ! CONFIG_RISCV_ISA_V */
> > > > > +
> > > > > +#endif
> > > > > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> > > > > index 87aaef656257..6254830c0668 100644
> > > > > --- a/arch/riscv/include/asm/vector.h
> > > > > +++ b/arch/riscv/include/asm/vector.h
> > > > > @@ -22,6 +22,27 @@
> > > > >  extern unsigned long riscv_v_vsize;
> > > > >  int riscv_v_setup_vsize(void);
> > > > >  bool riscv_v_first_use_handler(struct pt_regs *regs);
> > > > > +void kernel_vector_begin(void);
> > > > > +void kernel_vector_end(void);
> > > > > +void get_cpu_vector_context(void);
> > > > > +void put_cpu_vector_context(void);
> > > > > +
> > > > > +static inline void riscv_v_ctx_cnt_add(u32 offset)
> > > > > +{
> > > > > +     current->thread.riscv_v_flags += offset;
> > > > > +     barrier();
> > > > > +}
> > > > > +
> > > > > +static inline void riscv_v_ctx_cnt_sub(u32 offset)
> > > > > +{
> > > > > +     barrier();
> > > > > +     current->thread.riscv_v_flags -= offset;
> > > > > +}
> > > > > +
> > > > > +static inline u32 riscv_v_ctx_cnt(void)
> > > > > +{
> > > > > +     return READ_ONCE(current->thread.riscv_v_flags);
> > > > > +}
> > > > >
> > > > >  static __always_inline bool has_vector(void)
> > > > >  {
> > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > index fee22a3d1b53..8c58595696b3 100644
> > > > > --- a/arch/riscv/kernel/Makefile
> > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > @@ -63,6 +63,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > > > >  obj-$(CONFIG_RISCV_MISALIGNED)       += traps_misaligned.o
> > > > >  obj-$(CONFIG_FPU)            += fpu.o
> > > > >  obj-$(CONFIG_RISCV_ISA_V)    += vector.o
> > > > > +obj-$(CONFIG_RISCV_ISA_V)    += kernel_mode_vector.o
> > > > >  obj-$(CONFIG_SMP)            += smpboot.o
> > > > >  obj-$(CONFIG_SMP)            += smp.o
> > > > >  obj-$(CONFIG_SMP)            += cpu_ops.o
> > > > > diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> > > > > new file mode 100644
> > > > > index 000000000000..105147c7d2da
> > > > > --- /dev/null
> > > > > +++ b/arch/riscv/kernel/kernel_mode_vector.c
> > > > > @@ -0,0 +1,95 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > > +/*
> > > > > + * Copyright (C) 2012 ARM Ltd.
> > > > > + * Author: Catalin Marinas <catalin.marinas@arm.com>
> > > > > + * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
> > > > > + * Copyright (C) 2021 SiFive
> > > > > + */
> > > > > +#include <linux/compiler.h>
> > > > > +#include <linux/irqflags.h>
> > > > > +#include <linux/percpu.h>
> > > > > +#include <linux/preempt.h>
> > > > > +#include <linux/types.h>
> > > > > +
> > > > > +#include <asm/vector.h>
> > > > > +#include <asm/switch_to.h>
> > > > > +#include <asm/simd.h>
> > > > > +
> > > > > +/*
> > > > > + * Claim ownership of the CPU vector context for use by the calling context.
> > > > > + *
> > > > > + * The caller may freely manipulate the vector context metadata until
> > > > > + * put_cpu_vector_context() is called.
> > > > > + */
> > > > > +void get_cpu_vector_context(void)
> > > > > +{
> > > > > +     preempt_disable();
> > > > > +
> > > > > +     WARN_ON((riscv_v_ctx_cnt() & RISCV_KERNEL_MODE_V_MASK) != 0);
> > > > > +     riscv_v_ctx_cnt_add(RISCV_KERNEL_MODE_V);
> > > >
> > > > In our last conversation I thought we agreed that a bitwise operation
> > > > would be more appropriate than addition. You also mentioned allowing
> > > > this function to be called multiple times. Did something change?
> > >
> > > I am having the same discussion with Eric on this thread [1]. Using
> > > counter add/sub and masking with the bitmask provides the same
> > > overflow protection. It also helps us reuse the same mechanism for
> > > preempt_v and for allowing this function to be called multiple times.
> > > I have not done the second part yet because it is very close to the
> > > idea of enabling V for the entire kernel. For example, it would be
> > > possible to launch a kernel thread and wrap it with kernel_vector_*.
> > > If people feel OK about this then I will add it in v9. We would have
> > > to change the bitmap a little and track context at trap entry/exit
> > > regardless of CONFIG_RISCV_ISA_V_PREEMPTIVE.
> > >
> > > - [1]: https://lore.kernel.org/all/20231222053014.GC52600@quark.localdomain/T/#m4f87d3c745853d518f96fb87a48c1d59e63b3d18
> > >
> > > Thanks,
> 
> Hey, I figured out a way to address the above problems; please wait for v9.
> 
> > > Andy
> >
> > Okay, I understand now: it is a counter tracking how many calls along
> > the chain have entered get_cpu_vector_context(). However, if nested
> > calls to get_cpu_vector_context() are not yet supported, then calling
> > it more than once should be an error, not just a warning.
> 
> Do you suggest promoting WARN_ON to a BUG_ON?

Yes. I think that is more clear in this case.

> 
> >
> > - Charlie
> >
> 
> Thanks,
> Andy

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user
  2023-12-27  3:15     ` Andy Chiu
@ 2024-01-15  5:42       ` Andy Chiu
  0 siblings, 0 replies; 24+ messages in thread
From: Andy Chiu @ 2024-01-15  5:42 UTC (permalink / raw)
  To: Guo Ren
  Cc: linux-riscv, palmer, paul.walmsley, greentime.hu, guoren, bjorn,
	charlie, ardb, arnd, peterz, tglx, ebiggers, Albert Ou,
	Sami Tolvanen, Han-Kuan Chen, Deepak Gupta, Andrew Jones,
	Conor Dooley, Heiko Stuebner, Aurelien Jarno, Bo YU,
	Alexandre Ghiti, Clément Léger

Hi Guo,

On Wed, Dec 27, 2023 at 11:15 AM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> On Wed, Dec 27, 2023 at 9:34 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Sat, Dec 23, 2023 at 12:30 PM Andy Chiu <andy.chiu@sifive.com> wrote:
> > >
> > > This patch utilizes Vector to perform copy_to_user/copy_from_user. If
> > > Vector is available and the size of the copy is large enough for
> > > Vector to perform better than scalar, then direct the kernel to do
> > > Vector copies for userspace. Though the best practice for users is to
> > > reduce copies, this provides a faster variant when copies are
> > > inevitable.
> > >
> > > The optimal size for using Vector, copy_to_user_thres, is only a
> > > heuristic for now. We can add DT parsing if people feel the need to
> > > customize it.
> > >
> > > The exception fixup code of __asm_vector_usercopy must fall back to
> > > the scalar one because accessing user pages might fault and must be
> > > sleepable. Current kernel-mode Vector does not allow tasks to be
> > > preemptible, so we must deactivate Vector and perform a scalar
> > > fallback in such cases.
> > >
> > > The original implementation of Vector operations comes from
> > > https://github.com/sifive/sifive-libc, which we agree to contribute to
> > > Linux kernel.
> > >
> > > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > > ---
> > > Changelog v8:
> > >  - fix no-mmu build
> > > Changelog v6:
> > >  - Add a kconfig entry to configure threshold values (Charlie)
> > >  - Refine assembly code (Charlie)
> > > Changelog v4:
> > >  - new patch since v4
> > > ---
> > >  arch/riscv/Kconfig                      |  8 ++++
> > >  arch/riscv/include/asm/asm-prototypes.h |  4 ++
> > >  arch/riscv/lib/Makefile                 |  6 ++-
> > >  arch/riscv/lib/riscv_v_helpers.c        | 44 ++++++++++++++++++++++
> > >  arch/riscv/lib/uaccess.S                | 10 +++++
> > >  arch/riscv/lib/uaccess_vector.S         | 50 +++++++++++++++++++++++++
> > >  6 files changed, 121 insertions(+), 1 deletion(-)
> > >  create mode 100644 arch/riscv/lib/riscv_v_helpers.c
> > >  create mode 100644 arch/riscv/lib/uaccess_vector.S
> > >
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 95a2a06acc6a..3c5ba05e8a2d 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -525,6 +525,14 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > >
> > >           If you don't know what to do here, say Y.
> > >
> > > +config RISCV_ISA_V_UCOPY_THRESHOLD
> > > +       int "Threshold size for vectorized user copies"
> > > +       depends on RISCV_ISA_V
> > > +       default 768
> > > +       help
> > > +         Prefer using vectorized copy_to_user()/copy_from_user() when the
> > > +         workload size exceeds this value.
> > > +
> > >  config TOOLCHAIN_HAS_ZBB
> > >         bool
> > >         default y
> > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > index 6db1a9bbff4c..be438932f321 100644
> > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > @@ -11,6 +11,10 @@ long long __ashlti3(long long a, int b);
> > >
> > >  #ifdef CONFIG_RISCV_ISA_V
> > >
> > > +#ifdef CONFIG_MMU
> > > +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n);
> > > +#endif /* CONFIG_MMU  */
> > > +
> > >  void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
> > >                  const unsigned long *__restrict p2);
> > >  void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
> > > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > > index 494f9cd1a00c..c8a6787d5827 100644
> > > --- a/arch/riscv/lib/Makefile
> > > +++ b/arch/riscv/lib/Makefile
> > > @@ -6,9 +6,13 @@ lib-y                  += memmove.o
> > >  lib-y                  += strcmp.o
> > >  lib-y                  += strlen.o
> > >  lib-y                  += strncmp.o
> > > -lib-$(CONFIG_MMU)      += uaccess.o
> > > +ifeq ($(CONFIG_MMU), y)
> > > +lib-y                          += uaccess.o
> > > +lib-$(CONFIG_RISCV_ISA_V)      += uaccess_vector.o
> > > +endif
> > >  lib-$(CONFIG_64BIT)    += tishift.o
> > >  lib-$(CONFIG_RISCV_ISA_ZICBOZ) += clear_page.o
> > >
> > >  obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> > >  lib-$(CONFIG_RISCV_ISA_V)      += xor.o
> > > +lib-$(CONFIG_RISCV_ISA_V)      += riscv_v_helpers.o
> > > diff --git a/arch/riscv/lib/riscv_v_helpers.c b/arch/riscv/lib/riscv_v_helpers.c
> > > new file mode 100644
> > > index 000000000000..6cac8f4e69e9
> > > --- /dev/null
> > > +++ b/arch/riscv/lib/riscv_v_helpers.c
> > > @@ -0,0 +1,44 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +/*
> > > + * Copyright (C) 2023 SiFive
> > > + * Author: Andy Chiu <andy.chiu@sifive.com>
> > > + */
> > > +#include <linux/linkage.h>
> > > +#include <asm/asm.h>
> > > +
> > > +#include <asm/vector.h>
> > > +#include <asm/simd.h>
> > > +
> > > +#ifdef CONFIG_MMU
> > > +#include <asm/asm-prototypes.h>
> > > +#endif
> > > +
> > > +#ifdef CONFIG_MMU
> > > +size_t riscv_v_usercopy_threshold = CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD;
> > > +int __asm_vector_usercopy(void *dst, void *src, size_t n);
> > > +int fallback_scalar_usercopy(void *dst, void *src, size_t n);
> > > +asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
> > > +{
> > > +       size_t remain, copied;
> > > +
> > > +       /* skip has_vector() check because it has been done by the asm  */
> > > +       if (!may_use_simd())
> > > +               goto fallback;
> > > +
> > > +       kernel_vector_begin();
> > > +       remain = __asm_vector_usercopy(dst, src, n);
> > > +       kernel_vector_end();
> > > +
> > > +       if (remain) {
> > > +               copied = n - remain;
> > > +               dst += copied;
> > > +               src += copied;
> > > +               goto fallback;
> > > +       }
> > > +
> > > +       return remain;
> > > +
> > > +fallback:
> > > +       return fallback_scalar_usercopy(dst, src, n);
> > > +}
> > > +#endif
> > > diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
> > > index 3ab438f30d13..a1e4a3c42925 100644
> > > --- a/arch/riscv/lib/uaccess.S
> > > +++ b/arch/riscv/lib/uaccess.S
> > > @@ -3,6 +3,8 @@
> > >  #include <asm/asm.h>
> > >  #include <asm/asm-extable.h>
> > >  #include <asm/csr.h>
> > > +#include <asm/hwcap.h>
> > > +#include <asm/alternative-macros.h>
> > >
> > >         .macro fixup op reg addr lbl
> > >  100:
> > > @@ -11,6 +13,13 @@
> > >         .endm
> > >
> > >  SYM_FUNC_START(__asm_copy_to_user)
> > > +#ifdef CONFIG_RISCV_ISA_V
> > > +       ALTERNATIVE("j fallback_scalar_usercopy", "nop", 0, RISCV_ISA_EXT_v, CONFIG_RISCV_ISA_V)
> > > +       REG_L   t0, riscv_v_usercopy_threshold
> > > +       bltu    a2, t0, fallback_scalar_usercopy
> > > +       tail enter_vector_usercopy
> > > +#endif
> > > +SYM_FUNC_START(fallback_scalar_usercopy)
> > >
> > >         /* Enable access to user memory */
> > >         li t6, SR_SUM
> > > @@ -181,6 +190,7 @@ SYM_FUNC_START(__asm_copy_to_user)
> > >         sub a0, t5, a0
> > >         ret
> > >  SYM_FUNC_END(__asm_copy_to_user)
> > > +SYM_FUNC_END(fallback_scalar_usercopy)
> > >  EXPORT_SYMBOL(__asm_copy_to_user)
> > >  SYM_FUNC_ALIAS(__asm_copy_from_user, __asm_copy_to_user)
> > >  EXPORT_SYMBOL(__asm_copy_from_user)
> > > diff --git a/arch/riscv/lib/uaccess_vector.S b/arch/riscv/lib/uaccess_vector.S
> > > new file mode 100644
> > > index 000000000000..7bd96cee39e4
> > > --- /dev/null
> > > +++ b/arch/riscv/lib/uaccess_vector.S
> > > @@ -0,0 +1,50 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +
> > > +#include <linux/linkage.h>
> > > +#include <asm-generic/export.h>
> > > +#include <asm/asm.h>
> > > +#include <asm/asm-extable.h>
> > > +#include <asm/csr.h>
> > > +
> > > +#define pDst a0
> > > +#define pSrc a1
> > > +#define iNum a2
> > > +
> > > +#define iVL a3
> > > +
> > > +#define ELEM_LMUL_SETTING m8
> > > +#define vData v0
> > > +
> > > +       .macro fixup op reg addr lbl
> > > +100:
> > > +       \op \reg, \addr
> > > +       _asm_extable    100b, \lbl
> > > +       .endm
> > > +
> > > +SYM_FUNC_START(__asm_vector_usercopy)
> > > +       /* Enable access to user memory */
> > > +       li t6, SR_SUM
> > > +       csrs CSR_STATUS, t6
> > > +
> > > +loop:
> > > +       vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> > > +       fixup vle8.v vData, (pSrc), 10f
> > > +       fixup vse8.v vData, (pDst), 10f
> > > +       sub iNum, iNum, iVL
> > > +       add pSrc, pSrc, iVL
> > > +       add pDst, pDst, iVL
> > > +       bnez iNum, loop
> > > +
> > > +.Lout_copy_user:
> > > +       /* Disable access to user memory */
> > > +       csrc CSR_STATUS, t6
> > > +       li      a0, 0
> > > +       ret
> > > +
> > > +       /* Exception fixup code */
> > > +10:
> > > +       /* Disable access to user memory */
> > > +       csrc    CSR_STATUS, t6
> > > +       mv      a0, iNum
> > Shall we check CSR_VSTART to find out how many elements were copied?
>
> This is a good idea! But is it possible to find out if we were trapped
> at the load or the store instruction? IIUC if we trap at the load then
> we could not derive the number of copied bytes from CSR_VSTART.

Actually we can, by separating out the fixup for vle8.v and vse8.v. I
am going to include the update for this in v11. Thanks for the
suggestion!

>
> Thanks,
> Andy

Regards,
Andy




Thread overview: 24+ messages
2023-12-23  4:29 [v8, 00/10] riscv: support kernel-mode Vector Andy Chiu
2023-12-23  4:29 ` [v8, 01/10] riscv: Add support for kernel mode vector Andy Chiu
2023-12-27  1:36   ` Charlie Jenkins
2023-12-27  2:46     ` Andy Chiu
2023-12-27  5:30       ` Charlie Jenkins
2023-12-27  9:18         ` Andy Chiu
2023-12-28  1:52           ` Charlie Jenkins
2023-12-23  4:29 ` [v8, 02/10] riscv: vector: make Vector always available for softirq context Andy Chiu
2023-12-23  4:29 ` [v8, 03/10] riscv: Add vector extension XOR implementation Andy Chiu
2023-12-23  4:29 ` [v8, 04/10] riscv: sched: defer restoring Vector context for user Andy Chiu
2023-12-27 12:07   ` Song Shuai
2023-12-23  4:29 ` [v8, 05/10] riscv: lib: vectorize copy_to_user/copy_from_user Andy Chiu
2023-12-27  1:27   ` Charlie Jenkins
2023-12-27  1:34   ` Guo Ren
2023-12-27  3:15     ` Andy Chiu
2024-01-15  5:42       ` Andy Chiu
2023-12-23  4:29 ` [v8, 06/10] riscv: lib: add vectorized mem* routines Andy Chiu
2023-12-27  1:42   ` Charlie Jenkins
2023-12-23  4:29 ` [v8, 07/10] riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}() Andy Chiu
2023-12-23  4:29 ` [v8, 08/10] riscv: vector: use a mask to write vstate_ctrl Andy Chiu
2023-12-23  4:29 ` [v8, 09/10] riscv: vector: use kmem_cache to manage vector context Andy Chiu
2023-12-23  4:29 ` [v8, 10/10] riscv: vector: allow kernel-mode Vector with preemption Andy Chiu
2023-12-27 12:12   ` Song Shuai
2023-12-27 22:45   ` Samuel Holland
