linux-fsdevel.vger.kernel.org archive mirror
* [PATCH RFC/RFT 0/3] Converge common flows for cpu assisted shadow stack
@ 2024-10-11  0:32 Deepak Gupta
  2024-10-11  0:32 ` [PATCH RFC/RFT 1/3] mm: Introduce ARCH_HAS_USER_SHADOW_STACK Deepak Gupta
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Deepak Gupta @ 2024-10-11  0:32 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-arch, Rick Edgecombe,
	Mark Brown, Deepak Gupta, David Hildenbrand, Carlos Bilbao

x86, arm64 and risc-v support CPU-assisted shadow stacks. x86 was the
first, and most of the shadow-stack-related code lives in the x86 arch
directory. arm64 guarded control stack (GCS) patches from Mark Brown are in -next.

There are significant flows which are common to all three arches:

- Enabling is done via prctl.
- Virtual memory for shadow stacks is managed similarly.
- Virtual memory management of shadow stacks on clone/fork is similar.

This led to the obvious discussion of how to merge certain common flows
into generic code; a recent instance is [1]. It goes without saying that
generic code also helps with bug management (not having to fix the same
bug for three different arches).

In that vein, Mark Brown introduced `ARCH_HAS_USER_SHADOW_STACK` as part
of the arm64 GCS series [2]. This patchset uses the same config to move
as much common code as possible into the generic kernel. Additionally, it
introduces wrapper abstractions where arch-specific handling is required.
I looked only at x86 and risc-v while carving out the common code and
defining these abstractions. Mark, please take a look and point out
whether arm64 would require something additional (or a removal).

I've not tested this; it has only been compiled for x86 with shadow
stack enabled. Thus this is an RFC, and I'm also looking for help
testing it on x86.

[1] - https://lore.kernel.org/all/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@rivosinc.com/T/#m98d14237663150778a3f8df59a76a3fe6318624a

[2] - https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org/T/#m1ff65a49873b0e770e71de7af178f581c72be7ad

To: Thomas Gleixner <tglx@linutronix.de>
To: Ingo Molnar <mingo@redhat.com>
To: Borislav Petkov <bp@alien8.de>
To: Dave Hansen <dave.hansen@linux.intel.com>
To: x86@kernel.org
To: H. Peter Anvin <hpa@zytor.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Mark Brown <broonie@kernel.org>

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
Deepak Gupta (2):
      mm: helper `is_shadow_stack_vma` to check shadow stack vma
      kernel: converge common shadow stack flow agnostic to arch

Mark Brown (1):
      mm: Introduce ARCH_HAS_USER_SHADOW_STACK

 arch/x86/Kconfig                       |   1 +
 arch/x86/include/asm/shstk.h           |   9 +
 arch/x86/include/uapi/asm/mman.h       |   3 -
 arch/x86/kernel/shstk.c                | 270 ++++++------------------------
 fs/proc/task_mmu.c                     |   2 +-
 include/linux/mm.h                     |   2 +-
 include/linux/usershstk.h              |  25 +++
 include/uapi/asm-generic/mman-common.h |   3 +
 kernel/Makefile                        |   2 +
 kernel/usershstk.c                     | 289 +++++++++++++++++++++++++++++++++
 mm/Kconfig                             |   6 +
 mm/gup.c                               |   2 +-
 mm/vma.h                               |  10 +-
 13 files changed, 392 insertions(+), 232 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241010-shstk_converge-aefbcbef5d71
--
- debug


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RFC/RFT 1/3] mm: Introduce ARCH_HAS_USER_SHADOW_STACK
  2024-10-11  0:32 [PATCH RFC/RFT 0/3] Converge common flows for cpu assisted shadow stack Deepak Gupta
@ 2024-10-11  0:32 ` Deepak Gupta
  2024-10-11 10:33   ` Mark Brown
  2024-10-11  0:32 ` [PATCH RFC/RFT 2/3] mm: helper `is_shadow_stack_vma` to check shadow stack vma Deepak Gupta
  2024-10-11  0:32 ` [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch Deepak Gupta
  2 siblings, 1 reply; 11+ messages in thread
From: Deepak Gupta @ 2024-10-11  0:32 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-arch, Rick Edgecombe,
	Mark Brown, Deepak Gupta, David Hildenbrand, Carlos Bilbao

From: Mark Brown <broonie@kernel.org>

Since multiple architectures have support for shadow stacks, and we need
to select support for this feature in several places in the generic code,
provide a generic config option that the architectures can select.
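Other architectures can then opt in the same way x86 does below; e.g. a
hypothetical (not part of this series) risc-v option would simply do:

```
config RISCV_USER_CFI
	def_bool n
	select ARCH_HAS_USER_SHADOW_STACK
	help
	  Hypothetical example of an architecture selecting the generic
	  user shadow stack support.
```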

Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
---
 arch/x86/Kconfig   | 1 +
 fs/proc/task_mmu.c | 2 +-
 include/linux/mm.h | 2 +-
 mm/Kconfig         | 6 ++++++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2852fcd82cbd..8ccae77d40f7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1954,6 +1954,7 @@ config X86_USER_SHADOW_STACK
 	depends on AS_WRUSS
 	depends on X86_64
 	select ARCH_USES_HIGH_VMA_FLAGS
+	select ARCH_HAS_USER_SHADOW_STACK
 	select X86_CET
 	help
 	  Shadow stack protection is a hardware feature that detects function
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 72f14fd59c2d..23f875e78eae 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -971,7 +971,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
-#ifdef CONFIG_X86_USER_SHADOW_STACK
+#ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
 		[ilog2(VM_SHADOW_STACK)] = "ss",
 #endif
 #if defined(CONFIG_64BIT) || defined(CONFIG_PPC32)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecf63d2b0582..57533b9cae95 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -354,7 +354,7 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
-#ifdef CONFIG_X86_USER_SHADOW_STACK
+#ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
 /*
  * VM_SHADOW_STACK should not be set with VM_SHARED because of lack of
  * support core mm.
diff --git a/mm/Kconfig b/mm/Kconfig
index 4c9f5ea13271..4b2a1ef9a161 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1296,6 +1296,12 @@ config NUMA_EMU
 	  into virtual nodes when booted with "numa=fake=N", where N is the
 	  number of nodes. This is only useful for debugging.
 
+config ARCH_HAS_USER_SHADOW_STACK
+	bool
+	help
+	  The architecture has hardware support for userspace shadow call
+	  stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
+
 source "mm/damon/Kconfig"
 
 endmenu

-- 
2.45.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC/RFT 2/3] mm: helper `is_shadow_stack_vma` to check shadow stack vma
  2024-10-11  0:32 [PATCH RFC/RFT 0/3] Converge common flows for cpu assisted shadow stack Deepak Gupta
  2024-10-11  0:32 ` [PATCH RFC/RFT 1/3] mm: Introduce ARCH_HAS_USER_SHADOW_STACK Deepak Gupta
@ 2024-10-11  0:32 ` Deepak Gupta
  2024-10-11 10:38   ` Mark Brown
  2024-10-11  0:32 ` [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch Deepak Gupta
  2 siblings, 1 reply; 11+ messages in thread
From: Deepak Gupta @ 2024-10-11  0:32 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-arch, Rick Edgecombe,
	Mark Brown, Deepak Gupta

VM_SHADOW_STACK (an alias of VM_HIGH_ARCH_5) is used to encode a shadow
stack VMA on three architectures (x86 shadow stack, arm64 GCS and RISC-V
shadow stack). If an architecture doesn't implement shadow stacks, it is
VM_NONE. Introduce a helper, `is_shadow_stack_vma`, to determine whether
a VMA is a shadow stack.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 mm/gup.c |  2 +-
 mm/vma.h | 10 +++++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a82890b46a36..8e6e14179f6c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1282,7 +1282,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		    !writable_file_mapping_allowed(vma, gup_flags))
 			return -EFAULT;
 
-		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
+		if (!(vm_flags & VM_WRITE) || is_shadow_stack_vma(vm_flags)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
 			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
diff --git a/mm/vma.h b/mm/vma.h
index 819f994cf727..0f238dc37231 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -357,7 +357,7 @@ static inline struct vm_area_struct *vma_prev_limit(struct vma_iterator *vmi,
 }
 
 /*
- * These three helpers classifies VMAs for virtual memory accounting.
+ * These four helpers classify VMAs for virtual memory accounting.
  */
 
 /*
@@ -368,6 +368,11 @@ static inline bool is_exec_mapping(vm_flags_t flags)
 	return (flags & (VM_EXEC | VM_WRITE | VM_STACK)) == VM_EXEC;
 }
 
+static inline bool is_shadow_stack_vma(vm_flags_t vm_flags)
+{
+	return !!(vm_flags & VM_SHADOW_STACK);
+}
+
 /*
  * Stack area (including shadow stacks)
  *
@@ -376,7 +381,7 @@ static inline bool is_exec_mapping(vm_flags_t flags)
  */
 static inline bool is_stack_mapping(vm_flags_t flags)
 {
-	return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK);
+	return ((flags & VM_STACK) == VM_STACK) || is_shadow_stack_vma(flags);
 }
 
 /*
@@ -387,7 +392,6 @@ static inline bool is_data_mapping(vm_flags_t flags)
 	return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
 }
 
-
 static inline void vma_iter_config(struct vma_iterator *vmi,
 		unsigned long index, unsigned long last)
 {

-- 
2.45.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch
  2024-10-11  0:32 [PATCH RFC/RFT 0/3] Converge common flows for cpu assisted shadow stack Deepak Gupta
  2024-10-11  0:32 ` [PATCH RFC/RFT 1/3] mm: Introduce ARCH_HAS_USER_SHADOW_STACK Deepak Gupta
  2024-10-11  0:32 ` [PATCH RFC/RFT 2/3] mm: helper `is_shadow_stack_vma` to check shadow stack vma Deepak Gupta
@ 2024-10-11  0:32 ` Deepak Gupta
  2024-10-11 12:33   ` Mark Brown
  2 siblings, 1 reply; 11+ messages in thread
From: Deepak Gupta @ 2024-10-11  0:32 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-arch, Rick Edgecombe,
	Mark Brown, Deepak Gupta

CPU-assisted shadow stacks are supported by x86, arm64 and risc-v. In
terms of enabling the shadow stack feature for usermode code, they have
the following commonalities:

- Expose a user ABI (via a prctl) to allow user mode to explicitly
  ask for shadow stack enabling, instead of enabling it by default.
  The x86 series pre-dates the arm64 and risc-v announcements of
  support, so it ended up doing an arch-specific prctl instead of a
  generic one. arm64 and risc-v have converged on using the generic
  prctl, and each of them can handle it appropriately.

- On fork or clone, the shadow stack has to be COWed or not COWed,
  depending on whether CLONE_VM was passed. Additionally, if
  CLONE_VFORK was passed, the same (parent's) shadow stack should be
  used.

- To create shadow stack mappings, implement the `map_shadow_stack`
  system call.

This patch picks up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` config
introduction and incorporates most of the common flows between the
different architectures.

At a high level, shadow stack allocation and de-allocation are basic
operations on virtual memory and common between architectures. Similarly,
shadow stack setup on prctl (arch-specific or otherwise) is a common
flow. Treatment of shadow stack virtual memory on `clone/fork` and the
implementation of `map_shadow_stack` are also converged into common
flows.

To implement these common flows, each architecture has an arch-specific
enabling mechanism as well as arch-specific data structures in the task/
thread struct. So, additionally, this patch abstracts certain operations
and helpers, allowing each architecture to provide its own arch_*
implementation of the abstractions.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/x86/include/asm/shstk.h           |   9 +
 arch/x86/include/uapi/asm/mman.h       |   3 -
 arch/x86/kernel/shstk.c                | 270 ++++++------------------------
 include/linux/usershstk.h              |  25 +++
 include/uapi/asm-generic/mman-common.h |   3 +
 kernel/Makefile                        |   2 +
 kernel/usershstk.c                     | 289 +++++++++++++++++++++++++++++++++
 7 files changed, 375 insertions(+), 226 deletions(-)

diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 4cb77e004615..4bb20af6cf7b 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -37,6 +37,15 @@ static inline int shstk_update_last_frame(unsigned long val) { return 0; }
 static inline bool shstk_is_enabled(void) { return false; }
 #endif /* CONFIG_X86_USER_SHADOW_STACK */
 
+int arch_create_rstor_token(unsigned long ssp, unsigned long *token_addr);
+bool arch_cpu_supports_shadow_stack(void);
+bool arch_is_shstk_enabled(struct task_struct *task);
+void arch_set_shstk_base_size(struct task_struct *task, unsigned long base,
+			unsigned long size);
+void arch_get_shstk_base_size(struct task_struct *task, unsigned long *base,
+			unsigned long *size);
+void arch_set_shstk_ptr_and_enable(unsigned long ssp);
+void arch_set_thread_shstk_status(bool enable);
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_SHSTK_H */
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index 46cdc941f958..ac1e6277212b 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -5,9 +5,6 @@
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 #define MAP_ABOVE4G	0x80		/* only map above 4GB */
 
-/* Flags for map_shadow_stack(2) */
-#define SHADOW_STACK_SET_TOKEN	(1ULL << 0)	/* Set up a restore token in the shadow stack */
-
 #include <asm-generic/mman.h>
 
 #endif /* _ASM_X86_MMAN_H */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 059685612362..512339b271e1 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -25,6 +25,7 @@
 #include <asm/special_insns.h>
 #include <asm/fpu/api.h>
 #include <asm/prctl.h>
+#include <linux/usershstk.h>
 
 #define SS_FRAME_SIZE 8
 
@@ -43,11 +44,56 @@ static void features_clr(unsigned long features)
 	current->thread.features &= ~features;
 }
 
+bool arch_cpu_supports_shadow_stack(void)
+{
+	return cpu_feature_enabled(X86_FEATURE_USER_SHSTK);
+}
+
+bool arch_is_shstk_enabled(struct task_struct *task)
+{
+	return features_enabled(ARCH_SHSTK_SHSTK);
+}
+
+void arch_set_shstk_base_size(struct task_struct *task, unsigned long base,
+			unsigned long size)
+{
+	struct thread_shstk *shstk = &task->thread.shstk;
+
+	shstk->base = base;
+	shstk->size = size;
+}
+
+void arch_get_shstk_base_size(struct task_struct *task, unsigned long *base,
+			unsigned long *size)
+{
+	struct thread_shstk *shstk = &task->thread.shstk;
+
+	*base = shstk->base;
+	*size = shstk->size;
+}
+
+
+void arch_set_shstk_ptr_and_enable(unsigned long ssp)
+{
+	fpregs_lock_and_load();
+	wrmsrl(MSR_IA32_PL3_SSP, ssp);
+	wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN);
+	fpregs_unlock();
+}
+
+void arch_set_thread_shstk_status(bool enable)
+{
+	if (enable)
+		features_set(ARCH_SHSTK_SHSTK);
+	else
+		features_clr(ARCH_SHSTK_SHSTK);
+}
+
 /*
  * Create a restore token on the shadow stack.  A token is always 8-byte
  * and aligned to 8.
  */
-static int create_rstor_token(unsigned long ssp, unsigned long *token_addr)
+int arch_create_rstor_token(unsigned long ssp, unsigned long *token_addr)
 {
 	unsigned long addr;
 
@@ -72,118 +118,6 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr)
 	return 0;
 }
 
-/*
- * VM_SHADOW_STACK will have a guard page. This helps userspace protect
- * itself from attacks. The reasoning is as follows:
- *
- * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The
- * INCSSP instruction can increment the shadow stack pointer. It is the
- * shadow stack analog of an instruction like:
- *
- *   addq $0x80, %rsp
- *
- * However, there is one important difference between an ADD on %rsp
- * and INCSSP. In addition to modifying SSP, INCSSP also reads from the
- * memory of the first and last elements that were "popped". It can be
- * thought of as acting like this:
- *
- * READ_ONCE(ssp);       // read+discard top element on stack
- * ssp += nr_to_pop * 8; // move the shadow stack
- * READ_ONCE(ssp-8);     // read+discard last popped stack element
- *
- * The maximum distance INCSSP can move the SSP is 2040 bytes, before
- * it would read the memory. Therefore a single page gap will be enough
- * to prevent any operation from shifting the SSP to an adjacent stack,
- * since it would have to land in the gap at least once, causing a
- * fault.
- */
-static unsigned long alloc_shstk(unsigned long addr, unsigned long size,
-				 unsigned long token_offset, bool set_res_tok)
-{
-	int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G;
-	struct mm_struct *mm = current->mm;
-	unsigned long mapped_addr, unused;
-
-	if (addr)
-		flags |= MAP_FIXED_NOREPLACE;
-
-	mmap_write_lock(mm);
-	mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags,
-			      VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL);
-	mmap_write_unlock(mm);
-
-	if (!set_res_tok || IS_ERR_VALUE(mapped_addr))
-		goto out;
-
-	if (create_rstor_token(mapped_addr + token_offset, NULL)) {
-		vm_munmap(mapped_addr, size);
-		return -EINVAL;
-	}
-
-out:
-	return mapped_addr;
-}
-
-static unsigned long adjust_shstk_size(unsigned long size)
-{
-	if (size)
-		return PAGE_ALIGN(size);
-
-	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
-}
-
-static void unmap_shadow_stack(u64 base, u64 size)
-{
-	int r;
-
-	r = vm_munmap(base, size);
-
-	/*
-	 * mmap_write_lock_killable() failed with -EINTR. This means
-	 * the process is about to die and have it's MM cleaned up.
-	 * This task shouldn't ever make it back to userspace. In this
-	 * case it is ok to leak a shadow stack, so just exit out.
-	 */
-	if (r == -EINTR)
-		return;
-
-	/*
-	 * For all other types of vm_munmap() failure, either the
-	 * system is out of memory or there is bug.
-	 */
-	WARN_ON_ONCE(r);
-}
-
-static int shstk_setup(void)
-{
-	struct thread_shstk *shstk = &current->thread.shstk;
-	unsigned long addr, size;
-
-	/* Already enabled */
-	if (features_enabled(ARCH_SHSTK_SHSTK))
-		return 0;
-
-	/* Also not supported for 32 bit */
-	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_ia32_syscall())
-		return -EOPNOTSUPP;
-
-	size = adjust_shstk_size(0);
-	addr = alloc_shstk(0, size, 0, false);
-	if (IS_ERR_VALUE(addr))
-		return PTR_ERR((void *)addr);
-
-	fpregs_lock_and_load();
-	wrmsrl(MSR_IA32_PL3_SSP, addr + size);
-	wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN);
-	fpregs_unlock();
-
-	shstk->base = addr;
-	shstk->size = size;
-	features_set(ARCH_SHSTK_SHSTK);
-
-	return 0;
-}
-
 void reset_thread_features(void)
 {
 	memset(&current->thread.shstk, 0, sizeof(struct thread_shstk));
@@ -191,48 +125,6 @@ void reset_thread_features(void)
 	current->thread.features_locked = 0;
 }
 
-unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
-				       unsigned long stack_size)
-{
-	struct thread_shstk *shstk = &tsk->thread.shstk;
-	unsigned long addr, size;
-
-	/*
-	 * If shadow stack is not enabled on the new thread, skip any
-	 * switch to a new shadow stack.
-	 */
-	if (!features_enabled(ARCH_SHSTK_SHSTK))
-		return 0;
-
-	/*
-	 * For CLONE_VFORK the child will share the parents shadow stack.
-	 * Make sure to clear the internal tracking of the thread shadow
-	 * stack so the freeing logic run for child knows to leave it alone.
-	 */
-	if (clone_flags & CLONE_VFORK) {
-		shstk->base = 0;
-		shstk->size = 0;
-		return 0;
-	}
-
-	/*
-	 * For !CLONE_VM the child will use a copy of the parents shadow
-	 * stack.
-	 */
-	if (!(clone_flags & CLONE_VM))
-		return 0;
-
-	size = adjust_shstk_size(stack_size);
-	addr = alloc_shstk(0, size, 0, false);
-	if (IS_ERR_VALUE(addr))
-		return addr;
-
-	shstk->base = addr;
-	shstk->size = size;
-
-	return addr + size;
-}
-
 static unsigned long get_user_shstk_addr(void)
 {
 	unsigned long long ssp;
@@ -402,44 +294,6 @@ int restore_signal_shadow_stack(void)
 	return 0;
 }
 
-void shstk_free(struct task_struct *tsk)
-{
-	struct thread_shstk *shstk = &tsk->thread.shstk;
-
-	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
-	    !features_enabled(ARCH_SHSTK_SHSTK))
-		return;
-
-	/*
-	 * When fork() with CLONE_VM fails, the child (tsk) already has a
-	 * shadow stack allocated, and exit_thread() calls this function to
-	 * free it.  In this case the parent (current) and the child share
-	 * the same mm struct.
-	 */
-	if (!tsk->mm || tsk->mm != current->mm)
-		return;
-
-	/*
-	 * If shstk->base is NULL, then this task is not managing its
-	 * own shadow stack (CLONE_VFORK). So skip freeing it.
-	 */
-	if (!shstk->base)
-		return;
-
-	/*
-	 * shstk->base is NULL for CLONE_VFORK child tasks, and so is
-	 * normal. But size = 0 on a shstk->base is not normal and
-	 * indicated an attempt to free the thread shadow stack twice.
-	 * Warn about it.
-	 */
-	if (WARN_ON(!shstk->size))
-		return;
-
-	unmap_shadow_stack(shstk->base, shstk->size);
-
-	shstk->size = 0;
-}
-
 static int wrss_control(bool enable)
 {
 	u64 msrval;
@@ -502,36 +356,6 @@ static int shstk_disable(void)
 	return 0;
 }
 
-SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
-{
-	bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
-	unsigned long aligned_size;
-
-	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
-		return -EOPNOTSUPP;
-
-	if (flags & ~SHADOW_STACK_SET_TOKEN)
-		return -EINVAL;
-
-	/* If there isn't space for a token */
-	if (set_tok && size < 8)
-		return -ENOSPC;
-
-	if (addr && addr < SZ_4G)
-		return -ERANGE;
-
-	/*
-	 * An overflow would result in attempting to write the restore token
-	 * to the wrong location. Not catastrophic, but just return the right
-	 * error code and block it.
-	 */
-	aligned_size = PAGE_ALIGN(size);
-	if (aligned_size < size)
-		return -EOVERFLOW;
-
-	return alloc_shstk(addr, aligned_size, size, set_tok);
-}
-
 long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 {
 	unsigned long features = arg2;
diff --git a/include/linux/usershstk.h b/include/linux/usershstk.h
new file mode 100644
index 000000000000..68d751948e35
--- /dev/null
+++ b/include/linux/usershstk.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SHSTK_H
+#define _SHSTK_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+unsigned long alloc_shstk(unsigned long addr, unsigned long size,
+				 unsigned long token_offset, bool set_res_tok);
+int shstk_setup(void);
+int create_rstor_token(unsigned long ssp, unsigned long *token_addr);
+bool cpu_supports_shadow_stack(void);
+bool is_shstk_enabled(struct task_struct *task);
+void set_shstk_base_size(struct task_struct *task, unsigned long base,
+			unsigned long size);
+void get_shstk_base_size(struct task_struct *task, unsigned long *base,
+			unsigned long *size);
+void set_shstk_ptr_and_enable(unsigned long ssp);
+void set_thread_shstk_status(bool enable);
+unsigned long adjust_shstk_size(unsigned long size);
+void unmap_shadow_stack(u64 base, u64 size);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _SHSTK_H */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..2c36e4c7b6ec 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -87,4 +87,7 @@
 #define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
 				 PKEY_DISABLE_WRITE)
 
+/* Flags for map_shadow_stack(2) */
+#define SHADOW_STACK_SET_TOKEN	(1ULL << 0)	/* Set up a restore token in the shadow stack */
+
 #endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 87866b037fbe..1922c456b954 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -140,6 +140,8 @@ KCOV_INSTRUMENT_stackleak.o := n
 
 obj-$(CONFIG_SCF_TORTURE_TEST) += scftorture.o
 
+obj-$(CONFIG_ARCH_HAS_USER_SHADOW_STACK) += usershstk.o
+
 $(obj)/configs.o: $(obj)/config_data.gz
 
 targets += config_data config_data.gz
diff --git a/kernel/usershstk.c b/kernel/usershstk.c
new file mode 100644
index 000000000000..055d70b99893
--- /dev/null
+++ b/kernel/usershstk.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * usershstk.c - generic userspace shadow stack support
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Yu-cheng Yu <yu-cheng.yu@intel.com>
+ */
+
+#include <linux/sched.h>
+#include <linux/bitops.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/sched/signal.h>
+#include <linux/compat.h>
+#include <linux/sizes.h>
+#include <linux/user.h>
+#include <linux/syscalls.h>
+#include <asm/shstk.h>
+#include <linux/usershstk.h>
+
+#define SHSTK_ENTRY_SIZE sizeof(void *)
+
+bool cpu_supports_shadow_stack(void)
+{
+	return arch_cpu_supports_shadow_stack();
+}
+
+bool is_shstk_enabled(struct task_struct *task)
+{
+	return arch_is_shstk_enabled(task);
+}
+
+void set_shstk_base_size(struct task_struct *task, unsigned long base,
+			unsigned long size)
+{
+	arch_set_shstk_base_size(task, base, size);
+}
+
+void get_shstk_base_size(struct task_struct *task, unsigned long *base,
+			unsigned long *size)
+{
+	arch_get_shstk_base_size(task, base, size);
+}
+
+void set_shstk_ptr_and_enable(unsigned long ssp)
+{
+	arch_set_shstk_ptr_and_enable(ssp);
+}
+
+void set_thread_shstk_status(bool enable)
+{
+	arch_set_thread_shstk_status(enable);
+}
+
+int create_rstor_token(unsigned long ssp, unsigned long *token_addr)
+{
+	return arch_create_rstor_token(ssp, token_addr);
+}
+
+unsigned long adjust_shstk_size(unsigned long size)
+{
+	if (size)
+		return PAGE_ALIGN(size);
+
+	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
+}
+
+void unmap_shadow_stack(u64 base, u64 size)
+{
+	int r;
+
+	r = vm_munmap(base, size);
+
+	/*
+	 * mmap_write_lock_killable() failed with -EINTR. This means
+	 * the process is about to die and have its MM cleaned up.
+	 * This task shouldn't ever make it back to userspace. In this
+	 * case it is ok to leak a shadow stack, so just exit out.
+	 */
+	if (r == -EINTR)
+		return;
+
+	/*
+	 * For all other types of vm_munmap() failure, either the
+	 * system is out of memory or there is a bug.
+	 */
+	WARN_ON_ONCE(r);
+}
+
+/*
+ * VM_SHADOW_STACK will have a guard page. This helps userspace protect
+ * itself from attacks. The reasoning is as follows:
+ *
+ * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The
+ * INCSSP instruction can increment the shadow stack pointer. It is the
+ * shadow stack analog of an instruction like:
+ *
+ *   addq $0x80, %rsp
+ *
+ * However, there is one important difference between an ADD on %rsp
+ * and INCSSP. In addition to modifying SSP, INCSSP also reads from the
+ * memory of the first and last elements that were "popped". It can be
+ * thought of as acting like this:
+ *
+ * READ_ONCE(ssp);       // read+discard top element on stack
+ * ssp += nr_to_pop * 8; // move the shadow stack
+ * READ_ONCE(ssp-8);     // read+discard last popped stack element
+ *
+ * The maximum distance INCSSP can move the SSP is 2040 bytes, before
+ * it would read the memory. Therefore a single page gap will be enough
+ * to prevent any operation from shifting the SSP to an adjacent stack,
+ * since it would have to land in the gap at least once, causing a
+ * fault.
+ */
+unsigned long alloc_shstk(unsigned long addr, unsigned long size,
+				 unsigned long token_offset, bool set_res_tok)
+{
+	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	struct mm_struct *mm = current->mm;
+	unsigned long mapped_addr, unused;
+
+	/* x86 doesn't support shadow stacks in 32-bit mode, keep them above 4G */
+	flags |= IS_ENABLED(CONFIG_X86_64) ? MAP_ABOVE4G : 0;
+
+	if (addr)
+		flags |= MAP_FIXED_NOREPLACE;
+
+	mmap_write_lock(mm);
+	mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags,
+			      VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL);
+	mmap_write_unlock(mm);
+
+	if (!set_res_tok || IS_ERR_VALUE(mapped_addr))
+		goto out;
+
+	if (create_rstor_token(mapped_addr + token_offset, NULL)) {
+		vm_munmap(mapped_addr, size);
+		return -EINVAL;
+	}
+
+out:
+	return mapped_addr;
+}
+
+void shstk_free(struct task_struct *tsk)
+{
+	unsigned long base, size;
+
+	if (!cpu_supports_shadow_stack() ||
+	    !is_shstk_enabled(current))
+		return;
+
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it.  In this case the parent (current) and the child share
+	 * the same mm struct.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
+		return;
+
+	get_shstk_base_size(tsk, &base, &size);
+	/*
+	 * If the recorded base is NULL, then this task is not managing
+	 * its own shadow stack (CLONE_VFORK). So skip freeing it.
+	 */
+	if (!base)
+		return;
+
+	/*
+	 * A NULL base is normal for CLONE_VFORK child tasks. But
+	 * size == 0 with a non-zero base is not normal and indicates
+	 * an attempt to free the thread shadow stack twice. Warn
+	 * about it.
+	 */
+	if (WARN_ON(!size))
+		return;
+
+	unmap_shadow_stack(base, size);
+
+	set_shstk_base_size(tsk, 0, 0);
+}
+
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
+	bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
+	unsigned long aligned_size;
+
+	if (!cpu_supports_shadow_stack())
+		return -EOPNOTSUPP;
+
+	if (flags & ~SHADOW_STACK_SET_TOKEN)
+		return -EINVAL;
+
+	/* If there isn't space for a token */
+	if (set_tok && size < SHSTK_ENTRY_SIZE)
+		return -ENOSPC;
+
+	if (addr && (addr & (PAGE_SIZE - 1)))
+		return -EINVAL;
+
+	if (IS_ENABLED(CONFIG_X86_64) &&
+		addr && addr < SZ_4G)
+		return -ERANGE;
+
+	/*
+	 * An overflow would result in attempting to write the restore token
+	 * to the wrong location. Not catastrophic, but just return the right
+	 * error code and block it.
+	 */
+	aligned_size = PAGE_ALIGN(size);
+	if (aligned_size < size)
+		return -EOVERFLOW;
+
+	return alloc_shstk(addr, aligned_size, size, set_tok);
+}
+
+int shstk_setup(void)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	unsigned long addr, size;
+
+	/* Already enabled */
+	if (is_shstk_enabled(current))
+		return 0;
+
+	/* Also not supported for 32 bit */
+	if (!cpu_supports_shadow_stack() ||
+		(IS_ENABLED(CONFIG_X86_64) && in_ia32_syscall()))
+		return -EOPNOTSUPP;
+
+	size = adjust_shstk_size(0);
+	addr = alloc_shstk(0, size, 0, false);
+	if (IS_ERR_VALUE(addr))
+		return PTR_ERR((void *)addr);
+
+	set_shstk_ptr_and_enable(addr + size);
+	set_shstk_base_size(current, addr, size);
+
+	set_thread_shstk_status(true);
+
+	return 0;
+}
+
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
+				       unsigned long stack_size)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+	unsigned long addr, size;
+
+	if (!cpu_supports_shadow_stack())
+		return -EOPNOTSUPP;
+
+	/*
+	 * If shadow stack is not enabled on the new thread, skip any
+	 * switch to a new shadow stack.
+	 */
+	if (!is_shstk_enabled(tsk))
+		return 0;
+
+	/*
+	 * For CLONE_VFORK the child will share the parent's shadow stack.
+	 * Make sure to clear the internal tracking of the thread shadow
+	 * stack so the freeing logic run for the child knows to leave it
+	 * alone.
+	 */
+	if (clone_flags & CLONE_VFORK) {
+		set_shstk_base_size(tsk, 0, 0);
+		return 0;
+	}
+
+	/*
+	 * For !CLONE_VM the child will use a copy of the parent's shadow
+	 * stack.
+	 */
+	if (!(clone_flags & CLONE_VM))
+		return 0;
+
+	size = adjust_shstk_size(stack_size);
+	addr = alloc_shstk(0, size, 0, false);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
+	set_shstk_base_size(tsk, addr, size);
+
+	return addr + size;
+}
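[Editor's note: the clone-flag handling in shstk_alloc_thread_stack() above can be sketched as plain userspace C. This is an illustration, not part of the patch; `child_needs_new_shstk` is a hypothetical helper and the CLONE_* values are the standard Linux ones.]

```c
#include <assert.h>

#define CLONE_VM    0x00000100UL
#define CLONE_VFORK 0x00004000UL

/* Userspace sketch (not kernel code) of the decision made above:
 * return 1 when the child needs its own shadow stack, 0 when it
 * shares (CLONE_VFORK) or gets a copy-on-write copy (!CLONE_VM)
 * of the parent's. */
static int child_needs_new_shstk(unsigned long clone_flags)
{
	if (clone_flags & CLONE_VFORK)
		return 0;	/* child runs on the parent's shadow stack */
	if (!(clone_flags & CLONE_VM))
		return 0;	/* plain fork(): CoW copy of the parent's */
	return 1;		/* thread in the same mm: allocate a new one */
}
```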

-- 
2.45.0
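[Editor's note: the argument validation in map_shadow_stack() above can be mirrored in userspace for illustration. The sketch below is hypothetical; 4 KiB pages and 8-byte shadow stack entries are assumptions matching the common x86-64 configuration, and no mapping is actually performed.]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define PAGE_SIZE		4096UL	/* assumption: 4 KiB pages */
#define SHSTK_ENTRY_SIZE	8UL	/* assumption: 64-bit entries */
#define SHADOW_STACK_SET_TOKEN	(1UL << 0)
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Hypothetical userspace mirror of the validation in the syscall above;
 * it performs only the argument checks, not the allocation. */
static long check_map_shadow_stack_args(unsigned long addr,
					unsigned long size,
					unsigned int flags)
{
	bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
	unsigned long aligned_size;

	if (flags & ~SHADOW_STACK_SET_TOKEN)
		return -EINVAL;
	if (set_tok && size < SHSTK_ENTRY_SIZE)
		return -ENOSPC;		/* no room for a restore token */
	if (addr & (PAGE_SIZE - 1))
		return -EINVAL;		/* addr must be page aligned */
	aligned_size = PAGE_ALIGN(size);
	if (aligned_size < size)
		return -EOVERFLOW;	/* PAGE_ALIGN() wrapped around */
	return 0;
}
```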


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC/RFT 1/3] mm: Introduce ARCH_HAS_USER_SHADOW_STACK
  2024-10-11  0:32 ` [PATCH RFC/RFT 1/3] mm: Introduce ARCH_HAS_USER_SHADOW_STACK Deepak Gupta
@ 2024-10-11 10:33   ` Mark Brown
  2024-10-11 17:08     ` Deepak Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Brown @ 2024-10-11 10:33 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann, linux-kernel, linux-fsdevel,
	linux-mm, linux-arch, Rick Edgecombe, David Hildenbrand,
	Carlos Bilbao


On Thu, Oct 10, 2024 at 05:32:03PM -0700, Deepak Gupta wrote:
> From: Mark Brown <broonie@kernel.org>
> 
> Since multiple architectures have support for shadow stacks and we need to
> select support for this feature in several places in the generic code
> provide a generic config option that the architectures can select.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Deepak Gupta <debug@rivosinc.com>
> Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
> ---

You need to add your own signoff when resending things (though I guess
this is likely to get applied to a tree that already contains this
patch so it likely doesn't matter in the end).



* Re: [PATCH RFC/RFT 2/3] mm: helper `is_shadow_stack_vma` to check shadow stack vma
  2024-10-11  0:32 ` [PATCH RFC/RFT 2/3] mm: helper `is_shadow_stack_vma` to check shadow stack vma Deepak Gupta
@ 2024-10-11 10:38   ` Mark Brown
  2024-10-11 17:08     ` Deepak Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Brown @ 2024-10-11 10:38 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann, linux-kernel, linux-fsdevel,
	linux-mm, linux-arch, Rick Edgecombe


On Thu, Oct 10, 2024 at 05:32:04PM -0700, Deepak Gupta wrote:
> VM_SHADOW_STACK (an alias of VM_HIGH_ARCH_5) is used to encode shadow
> stack VMAs on three architectures (x86 shadow stack, arm64 GCS and
> RISC-V shadow stack). If the architecture doesn't implement shadow
> stack, it is VM_NONE. Introduce a helper `is_shadow_stack_vma()` to
> determine whether a VMA is a shadow stack VMA.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  mm/gup.c |  2 +-
>  mm/vma.h | 10 +++++++---
>  2 files changed, 8 insertions(+), 4 deletions(-)

As I noted in reply to the version of this patch in the RISC-V series
there's another test for VM_SHADOW_STACK in mm/mmap.c.



* Re: [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch
  2024-10-11  0:32 ` [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch Deepak Gupta
@ 2024-10-11 12:33   ` Mark Brown
  2024-10-11 17:05     ` Deepak Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Brown @ 2024-10-11 12:33 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann, linux-kernel, linux-fsdevel,
	linux-mm, linux-arch, Rick Edgecombe


On Thu, Oct 10, 2024 at 05:32:05PM -0700, Deepak Gupta wrote:

> +unsigned long alloc_shstk(unsigned long addr, unsigned long size,
> +				 unsigned long token_offset, bool set_res_tok);
> +int shstk_setup(void);
> +int create_rstor_token(unsigned long ssp, unsigned long *token_addr);
> +bool cpu_supports_shadow_stack(void);

The cpu_ naming is confusing in an arm64 context, we use cpu_ for
functions that report if a feature is supported on the current CPU and
system_ for functions that report if a feature is enabled on the system.

> +void set_thread_shstk_status(bool enable);

It might be better if this took the flags that the prctl() takes?  It
feels like 

> +/* Flags for map_shadow_stack(2) */
> +#define SHADOW_STACK_SET_TOKEN	(1ULL << 0)	/* Set up a restore token in the shadow stack */
> +

We've also got SHADOW_STACK_SET_MARKER now.

> +bool cpu_supports_shadow_stack(void)
> +{
> +	return arch_cpu_supports_shadow_stack();
> +}
> +
> +bool is_shstk_enabled(struct task_struct *task)
> +{
> +	return arch_is_shstk_enabled(task);
> +}

Do we need these wrappers (or could they just be static inlines in the
header)?

> +void set_thread_shstk_status(bool enable)
> +{
> +	arch_set_thread_shstk_status(enable);
> +}

arm64 can return an error here, we reject a bunch of conditions like 32
bit threads and locked enable status.

> +unsigned long adjust_shstk_size(unsigned long size)
> +{
> +	if (size)
> +		return PAGE_ALIGN(size);
> +
> +	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
> +}

static?

> +/*
> + * VM_SHADOW_STACK will have a guard page. This helps userspace protect
> + * itself from attacks. The reasoning is as follows:
> + *
> + * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The
> + * INCSSP instruction can increment the shadow stack pointer. It is the
> + * shadow stack analog of an instruction like:
> + *
> + *   addq $0x80, %rsp
> + *
> + * However, there is one important difference between an ADD on %rsp
> + * and INCSSP. In addition to modifying SSP, INCSSP also reads from the
> + * memory of the first and last elements that were "popped". It can be
> + * thought of as acting like this:
> + *
> + * READ_ONCE(ssp);       // read+discard top element on stack
> + * ssp += nr_to_pop * 8; // move the shadow stack
> + * READ_ONCE(ssp-8);     // read+discard last popped stack element
> + *
> + * The maximum distance INCSSP can move the SSP is 2040 bytes, before
> + * it would read the memory. Therefore a single page gap will be enough
> + * to prevent any operation from shifting the SSP to an adjacent stack,
> + * since it would have to land in the gap at least once, causing a
> + * fault.

This is all very x86 centric...
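[Editor's note: the arithmetic behind the quoted guard-page claim can be checked directly. The sketch below is an illustration; the 255-entry limit comes from INCSSPQ's 8-bit operand, and the 4 KiB page size is an assumption.]

```c
#include <assert.h>

#define SHSTK_ENTRY_SIZE	8UL	/* 64-bit shadow stack entries */
#define INCSSPQ_MAX_POPS	255UL	/* 8-bit immediate operand */
#define PAGE_SIZE		4096UL	/* assumed page size */

/* Maximum distance a single INCSSPQ can move the SSP; the instruction
 * reads the first and last popped elements, so moving farther than a
 * page in one step is impossible without faulting in the guard gap. */
static unsigned long incsspq_max_move(void)
{
	return INCSSPQ_MAX_POPS * SHSTK_ENTRY_SIZE;	/* 2040 bytes */
}
```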

> +	if (create_rstor_token(mapped_addr + token_offset, NULL)) {
> +		vm_munmap(mapped_addr, size);
> +		return -EINVAL;
> +	}

Bikeshedding but can we call the function create_shstk_token() instead?
The rstor means absolutely nothing in an arm64 context.

> +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
> +{
> +	bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
> +	unsigned long aligned_size;
> +
> +	if (!cpu_supports_shadow_stack())
> +		return -EOPNOTSUPP;
> +
> +	if (flags & ~SHADOW_STACK_SET_TOKEN)
> +		return -EINVAL;

This needs SHADOW_STACK_SET_MARKER for arm64.

> +	if (addr && (addr & (PAGE_SIZE - 1)))
> +		return -EINVAL;

	if (!PAGE_ALIGNED(addr))

> +int shstk_setup(void)
> +{

This is half of the implementation of the prctl() for enabling shadow
stacks.  Looking at the arm64 implementation this refactoring feels a
bit awkward; we don't have the one-flag-at-a-time requirement that x86
has and we structure things rather differently.  I'm not sure that the
arch_prctl() and prctl() are going to line up comfortably...

> +	struct thread_shstk *shstk = &current->thread.shstk;
> +	unsigned long addr, size;
> +
> +	/* Already enabled */
> +	if (is_shstk_enabled(current))
> +		return 0;
> +
> +	/* Also not supported for 32 bit */
> +	if (!cpu_supports_shadow_stack() ||
> +		(IS_ENABLED(CONFIG_X86_64) && in_ia32_syscall()))
> +		return -EOPNOTSUPP;

We probably need a thread_supports_shstk(), arm64 has a similar check
for not 32 bit threads and I noted an issue with needing this check
elsewhere.

> +	/*
> +	 * For CLONE_VFORK the child will share the parents shadow stack.
> +	 * Make sure to clear the internal tracking of the thread shadow
> +	 * stack so the freeing logic run for child knows to leave it alone.
> +	 */
> +	if (clone_flags & CLONE_VFORK) {
> +		set_shstk_base_size(tsk, 0, 0);
> +		return 0;
> +	}

On arm64 we set the new thread's shadow stack pointer here, the logic
around that can probably also be usefully factored out.

> +	/*
> +	 * For !CLONE_VM the child will use a copy of the parents shadow
> +	 * stack.
> +	 */
> +	if (!(clone_flags & CLONE_VM))
> +		return 0;

Here also.



* Re: [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch
  2024-10-11 12:33   ` Mark Brown
@ 2024-10-11 17:05     ` Deepak Gupta
  2024-10-12  8:49       ` Mark Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Deepak Gupta @ 2024-10-11 17:05 UTC (permalink / raw)
  To: Mark Brown
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann, linux-kernel, linux-fsdevel,
	linux-mm, linux-arch, Rick Edgecombe

On Fri, Oct 11, 2024 at 01:33:05PM +0100, Mark Brown wrote:
>On Thu, Oct 10, 2024 at 05:32:05PM -0700, Deepak Gupta wrote:
>
>> +unsigned long alloc_shstk(unsigned long addr, unsigned long size,
>> +				 unsigned long token_offset, bool set_res_tok);
>> +int shstk_setup(void);
>> +int create_rstor_token(unsigned long ssp, unsigned long *token_addr);
>> +bool cpu_supports_shadow_stack(void);
>
>The cpu_ naming is confusing in an arm64 context, we use cpu_ for
>functions that report if a feature is supported on the current CPU and
>system_ for functions that report if a feature is enabled on the system.
>

hmm...
Curious. What's the difference between cpu and system?
We can ditch both cpu and system and call it
`user_shstk_supported()`. Again, not a great name, but all we are looking
for is whether user shadow stack is supported or not.

>> +void set_thread_shstk_status(bool enable);
>
>It might be better if this took the flags that the prctl() takes?  It
>feels like

hmm we can do that. But I imagine it might get invoked from other flows as well.
Although instead of `bool`, we can take `unsigned long` here. It would work for
now for `prctl`, and future users get the option to chisel around it.
I'll do that.

>
>> +/* Flags for map_shadow_stack(2) */
>> +#define SHADOW_STACK_SET_TOKEN	(1ULL << 0)	/* Set up a restore token in the shadow stack */
>> +
>
>We've also got SHADOW_STACK_SET_MARKER now.
>

Sorry, I missed it.

>> +bool cpu_supports_shadow_stack(void)
>> +{
>> +	return arch_cpu_supports_shadow_stack();
>> +}
>> +
>> +bool is_shstk_enabled(struct task_struct *task)
>> +{
>> +	return arch_is_shstk_enabled(task);
>> +}
>
>Do we need these wrappers (or could they just be static inlines in the
>header)?

As I started doing this exercise, I ran into header and multiple-definition
issues, because both modules (the arch-specific shstk.c and the generic
usershstk.c) need to call into each other independently and include each
other's headers, so I took the path of least resistance.

But now that it's settling a bit and I have a better picture, I'll give it
a try again.

>
>> +void set_thread_shstk_status(bool enable)
>> +{
>> +	arch_set_thread_shstk_status(enable);
>> +}
>
>arm64 can return an error here, we reject a bunch of conditions like 32
>bit threads and locked enable status.

Ok.
You would like this error to be `bool` or an `int`?

>
>> +unsigned long adjust_shstk_size(unsigned long size)
>> +{
>> +	if (size)
>> +		return PAGE_ALIGN(size);
>> +
>> +	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
>> +}
>
>static?

Yes this can be static.
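[Editor's note: the helper under discussion can be mirrored in userspace for illustration. This is a sketch: the rlimit is passed in explicitly rather than read with getrlimit(), and 4 KiB pages plus a 64-bit `unsigned long` are assumed.]

```c
#include <assert.h>

#define PAGE_SIZE	4096UL
#define SZ_4G		(4UL * 1024 * 1024 * 1024)
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Hypothetical userspace mirror of adjust_shstk_size(): an explicit size
 * is page-aligned; the default is RLIMIT_STACK clamped to 4 GiB. */
static unsigned long adjust_shstk_size(unsigned long size,
				       unsigned long long rlimit_stack)
{
	if (size)
		return PAGE_ALIGN(size);
	if (rlimit_stack > SZ_4G)
		rlimit_stack = SZ_4G;
	return PAGE_ALIGN((unsigned long)rlimit_stack);
}
```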

>
>> +/*
>> + * VM_SHADOW_STACK will have a guard page. This helps userspace protect
>> + * itself from attacks. The reasoning is as follows:
>> + *
>> + * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The
>> + * INCSSP instruction can increment the shadow stack pointer. It is the
>> + * shadow stack analog of an instruction like:
>> + *
>> + *   addq $0x80, %rsp
>> + *
>> + * However, there is one important difference between an ADD on %rsp
>> + * and INCSSP. In addition to modifying SSP, INCSSP also reads from the
>> + * memory of the first and last elements that were "popped". It can be
>> + * thought of as acting like this:
>> + *
>> + * READ_ONCE(ssp);       // read+discard top element on stack
>> + * ssp += nr_to_pop * 8; // move the shadow stack
>> + * READ_ONCE(ssp-8);     // read+discard last popped stack element
>> + *
>> + * The maximum distance INCSSP can move the SSP is 2040 bytes, before
>> + * it would read the memory. Therefore a single page gap will be enough
>> + * to prevent any operation from shifting the SSP to an adjacent stack,
>> + * since it would have to land in the gap at least once, causing a
>> + * fault.
>
>This is all very x86 centric...

Yes, I am aware; it will likely be removed, assuming no objections from the
x86 side on removing it.

>
>> +	if (create_rstor_token(mapped_addr + token_offset, NULL)) {
>> +		vm_munmap(mapped_addr, size);
>> +		return -EINVAL;
>> +	}
>
>Bikeshedding but can we call the function create_shstk_token() instead?
>The rstor means absolutely nothing in an arm64 context.

`create_shstk_token` it is. Much better name. Thanks.

>
>> +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
>> +{
>> +	bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
>> +	unsigned long aligned_size;
>> +
>> +	if (!cpu_supports_shadow_stack())
>> +		return -EOPNOTSUPP;
>> +
>> +	if (flags & ~SHADOW_STACK_SET_TOKEN)
>> +		return -EINVAL;
>
>This needs SHADOW_STACK_SET_MARKER for arm64.

Ack.

>
>> +	if (addr && (addr & (PAGE_SIZE - 1)))
>> +		return -EINVAL;
>
>	if (!PAGE_ALIGNED(addr))
>
>> +int shstk_setup(void)
>> +{
>
>This is half of the implementation of the prctl() for enabling shadow
>stacks.  Looking at the arm64 implementation this refactoring feels a
>bit awkward; we don't have the one-flag-at-a-time requirement that x86
>has and we structure things rather differently.  I'm not sure that the
>arch_prctl() and prctl() are going to line up comfortably...
>

Yes, I was in two minds as well. In the x86 case it's arch-specific anyway,
so you land in arch-specific code first and only later in generic code.
arm64 and riscv will likewise each have an arch-specific implementation.

I'll give making the prctl handling arch-agnostic a try. If it becomes too
kludgy and ugly, maybe it's best that prctl handling and first-time shadow
stack setup stay entirely arch-specific.

>> +	struct thread_shstk *shstk = &current->thread.shstk;

Note to myself: this ^ is not needed. It's x86 specific and I missed this
cleanup.

>> +	unsigned long addr, size;
>> +
>> +	/* Already enabled */
>> +	if (is_shstk_enabled(current))
>> +		return 0;
>> +
>> +	/* Also not supported for 32 bit */
>> +	if (!cpu_supports_shadow_stack() ||
>> +		(IS_ENABLED(CONFIG_X86_64) && in_ia32_syscall()))
>> +		return -EOPNOTSUPP;
>
>We probably need a thread_supports_shstk(), 

`is_shstk_enabled(current)` doesn't work?

> arm64 has a similar check
>for not 32 bit threads and I noted an issue with needing this check
>elsewhere.

hmm, maybe that's why we need `is_shstk_enabled(current)` and another
`thread_supports_shstk()` (probably with a better name from someone on the
list).

And in case we end up having no commonalities in prctl handling (as mentioned
in the comment above), maybe `thread_supports_shstk()` isn't needed at all,
because it's only needed during prctl handling.

>
>> +	/*
>> +	 * For CLONE_VFORK the child will share the parents shadow stack.
>> +	 * Make sure to clear the internal tracking of the thread shadow
>> +	 * stack so the freeing logic run for child knows to leave it alone.
>> +	 */
>> +	if (clone_flags & CLONE_VFORK) {
>> +		set_shstk_base_size(tsk, 0, 0);
>> +		return 0;
>> +	}
>
>On arm64 we set the new thread's shadow stack pointer here, the logic
>around that can probably also be usefully factored out.

Ok I'll take a look.

>
>> +	/*
>> +	 * For !CLONE_VM the child will use a copy of the parents shadow
>> +	 * stack.
>> +	 */
>> +	if (!(clone_flags & CLONE_VM))
>> +		return 0;
>
>Here also.

Same comment as above.




* Re: [PATCH RFC/RFT 2/3] mm: helper `is_shadow_stack_vma` to check shadow stack vma
  2024-10-11 10:38   ` Mark Brown
@ 2024-10-11 17:08     ` Deepak Gupta
  0 siblings, 0 replies; 11+ messages in thread
From: Deepak Gupta @ 2024-10-11 17:08 UTC (permalink / raw)
  To: Mark Brown
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann, linux-kernel, linux-fsdevel,
	linux-mm, linux-arch, Rick Edgecombe

On Fri, Oct 11, 2024 at 11:38:51AM +0100, Mark Brown wrote:
>On Thu, Oct 10, 2024 at 05:32:04PM -0700, Deepak Gupta wrote:
>> VM_SHADOW_STACK (an alias of VM_HIGH_ARCH_5) is used to encode shadow
>> stack VMAs on three architectures (x86 shadow stack, arm64 GCS and
>> RISC-V shadow stack). If the architecture doesn't implement shadow
>> stack, it is VM_NONE. Introduce a helper `is_shadow_stack_vma()` to
>> determine whether a VMA is a shadow stack VMA.
>>
>> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>> ---
>>  mm/gup.c |  2 +-
>>  mm/vma.h | 10 +++++++---
>>  2 files changed, 8 insertions(+), 4 deletions(-)
>
>As I noted in reply to the version of this patch in the RISC-V series
>there's another test for VM_SHADOW_STACK in mm/mmap.c.

Yeah, I'll make sure to consolidate comments from both patch series in the
next version. Thanks.


* Re: [PATCH RFC/RFT 1/3] mm: Introduce ARCH_HAS_USER_SHADOW_STACK
  2024-10-11 10:33   ` Mark Brown
@ 2024-10-11 17:08     ` Deepak Gupta
  0 siblings, 0 replies; 11+ messages in thread
From: Deepak Gupta @ 2024-10-11 17:08 UTC (permalink / raw)
  To: Mark Brown
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann, linux-kernel, linux-fsdevel,
	linux-mm, linux-arch, Rick Edgecombe, David Hildenbrand,
	Carlos Bilbao

On Fri, Oct 11, 2024 at 11:33:24AM +0100, Mark Brown wrote:
>On Thu, Oct 10, 2024 at 05:32:03PM -0700, Deepak Gupta wrote:
>> From: Mark Brown <broonie@kernel.org>
>>
>> Since multiple architectures have support for shadow stacks and we need to
>> select support for this feature in several places in the generic code
>> provide a generic config option that the architectures can select.
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Mark Brown <broonie@kernel.org>
>> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>> Reviewed-by: Deepak Gupta <debug@rivosinc.com>
>> Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
>> ---
>
>You need to add your own signoff when resending things (though I guess
>this is likely to get applied to a tree that already contains this
>patch so it likely doesn't matter in the end).

oops :(



* Re: [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch
  2024-10-11 17:05     ` Deepak Gupta
@ 2024-10-12  8:49       ` Mark Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Brown @ 2024-10-12  8:49 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Lorenzo Stoakes, Arnd Bergmann, linux-kernel, linux-fsdevel,
	linux-mm, linux-arch, Rick Edgecombe


On Fri, Oct 11, 2024 at 10:05:50AM -0700, Deepak Gupta wrote:
> On Fri, Oct 11, 2024 at 01:33:05PM +0100, Mark Brown wrote:
> > On Thu, Oct 10, 2024 at 05:32:05PM -0700, Deepak Gupta wrote:

> > > +unsigned long alloc_shstk(unsigned long addr, unsigned long size,
> > > +				 unsigned long token_offset, bool set_res_tok);
> > > +int shstk_setup(void);
> > > +int create_rstor_token(unsigned long ssp, unsigned long *token_addr);
> > > +bool cpu_supports_shadow_stack(void);

> > The cpu_ naming is confusing in an arm64 context, we use cpu_ for
> > functions that report if a feature is supported on the current CPU and
> > system_ for functions that report if a feature is enabled on the system.

> hmm...
> Curious. What's the difference between cpu and system?

Like I say above, cpu_ is for the current CPU and system_ is for the
system as a whole.  On a big.LITTLE system it's common to have a mix of
implementations which don't have consistent feature sets.

> We can ditch both cpu and system and call it
> `user_shstk_supported()`. Again, not a great name, but all we are looking
> for is whether user shadow stack is supported or not.

That avoids the confusion so works for me.

> > > +void set_thread_shstk_status(bool enable);
> > 
> > It might be better if this took the flags that the prctl() takes?  It
> > feels like

> hmm we can do that. But I imagine it might get invoked from other flows as well.

I'd expect that any other contexts would be either copying an existing
set of flags or disabling, either of which should be manageable.

> Although instead of `bool`, we can take `unsigned long` here. It would work for now
> for `prctl` and future users get options to chisel around it.
> I'll do that.

Sounds good.

> > > +void set_thread_shstk_status(bool enable)
> > > +{
> > > +	arch_set_thread_shstk_status(enable);
> > > +}

> > arm64 can return an error here, we reject a bunch of conditions like 32
> > bit threads and locked enable status.

> Ok.
> You would like this error to be `bool` or an `int`?

An int seems safer (e.g., differentiating not supported, invalid arguments,
and permission failures).

> > > +	unsigned long addr, size;

> > > +	/* Already enabled */
> > > +	if (is_shstk_enabled(current))
> > > +		return 0;

> > > +	/* Also not supported for 32 bit */
> > > +	if (!cpu_supports_shadow_stack() ||
> > > +		(IS_ENABLED(CONFIG_X86_64) && in_ia32_syscall()))
> > > +		return -EOPNOTSUPP;

> > We probably need a thread_supports_shstk(),

> `is_shstk_enabled(current)` doesn't work?

No, we just checked that immediately above; this is checking that we're not
trying to enable shadow stack on a 32-bit task, so it's a per-task
property separate from the task already having it enabled.

