Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [PATCH] ima: Fix NULL pointer dereference in ima_file_hash
From: KP Singh @ 2020-09-16 12:47 UTC (permalink / raw)
  To: Florent Revest
  Cc: open list, Linux Security Module list, Mimi Zohar, James Morris,
	Kees Cook, Lakshmi Ramasubramanian, Jann Horn
In-Reply-To: <5ab256148e8ad5f9882e888dac809bc72cd3fe66.camel@chromium.org>

Somehow this patch does not seem to have been picked up by any of the
mailing list archives and I am not sure if this was delivered to the
lists. I will send a v2 and add Florent's Reviewed-by tag.

On Wed, Sep 16, 2020 at 2:41 PM Florent Revest <revest@chromium.org> wrote:
>
> Reviewed-by: Florent Revest <revest@chromium.org>
>
> On Wed, 2020-09-16 at 12:05 +0000, KP Singh wrote:
> > From: KP Singh <kpsingh@google.com>
> >
> > ima_file_hash can be called when there is no iint->ima_hash available
> > even though the inode exists in the integrity cache.
> >
> > An example where this can happen (suggested by Jann Horn):
> >
> > Process A does:
> >
> >       while(1) {
> >               unlink("/tmp/imafoo");
> >               fd = open("/tmp/imafoo", O_RDWR|O_CREAT|O_TRUNC, 0700);
> >               if (fd == -1) {
> >                       perror("open");
> >                       continue;
> >               }
> >               write(fd, "A", 1);
> >               close(fd);
> >       }
> >
> > and Process B does:
> >
> >       while (1) {
> >               int fd = open("/tmp/imafoo", O_RDONLY);
> >               if (fd == -1)
> >                       continue;
> >               char *mapping = mmap(NULL, 0x1000, PROT_READ|PROT_EXEC,
> >                                    MAP_PRIVATE, fd, 0);
> >               if (mapping != MAP_FAILED)
> >                       munmap(mapping, 0x1000);
> >               close(fd);
> >       }
> >
> > Due to the race to get the iint->mutex between ima_file_hash and
> > process_measurement iint->ima_hash could still be NULL.
> >
> > Fixes: 6beea7afcc72 ("ima: add the ability to query the cached hash
> > of a given file")
> > Signed-off-by: KP Singh <kpsingh@google.com>
> > ---
> >  security/integrity/ima/ima_main.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/security/integrity/ima/ima_main.c
> > b/security/integrity/ima/ima_main.c
> > index 8a91711ca79b..4c86cd4eece0 100644
> > --- a/security/integrity/ima/ima_main.c
> > +++ b/security/integrity/ima/ima_main.c
> > @@ -531,6 +531,16 @@ int ima_file_hash(struct file *file, char *buf,
> > size_t buf_size)
> >               return -EOPNOTSUPP;
> >
> >       mutex_lock(&iint->mutex);
> > +
> > +     /*
> > +      * ima_file_hash can be called when ima_collect_measurement has
> > still
> > +      * not been called, we might not always have a hash.
> > +      */
> > +     if (!iint->ima_hash) {
> > +             mutex_unlock(&iint->mutex);
> > +             return -EOPNOTSUPP;
> > +     }
> > +
> >       if (buf) {
> >               size_t copied_size;
> >
>

^ permalink raw reply

* [PATCH v2 3/4] [RFC] arm64/trampfd: Provide support for the trampoline file descriptor
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka
In-Reply-To: <20200916150826.5990-1-madvenka@linux.microsoft.com>

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table for:
		- 32-bit user process
		- 64-bit user process

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/unistd.h      |   2 +-
 arch/arm64/include/asm/unistd32.h    |   2 +
 arch/arm64/include/uapi/asm/ptrace.h |  59 +++++++
 arch/arm64/kernel/Makefile           |   2 +
 arch/arm64/kernel/trampfd.c          | 244 +++++++++++++++++++++++++++
 5 files changed, 308 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/trampfd.c

diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3b859596840d..b3b2019f8d16 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		441
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f..c0493c5322d9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_trampfd 440
+__SYSCALL(__NR_trampfd, sys_trampfd)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 42cbe34d95ce..2778789c1cbe 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -88,6 +88,65 @@ struct user_pt_regs {
 	__u64		pstate;
 };
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	arm_min,
+	arm_r0 = arm_min,
+	arm_r1,
+	arm_r2,
+	arm_r3,
+	arm_r4,
+	arm_r5,
+	arm_r6,
+	arm_r7,
+	arm_r8,
+	arm_r9,
+	arm_r10,
+	arm_r11,
+	arm_r12,
+	arm_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+	arm64_min = arm_max,
+	arm64_r0 = arm64_min,
+	arm64_r1,
+	arm64_r2,
+	arm64_r3,
+	arm64_r4,
+	arm64_r5,
+	arm64_r6,
+	arm64_r7,
+	arm64_r8,
+	arm64_r9,
+	arm64_r10,
+	arm64_r11,
+	arm64_r12,
+	arm64_r13,
+	arm64_r14,
+	arm64_r15,
+	arm64_r16,
+	arm64_r17,
+	arm64_r18,
+	arm64_r19,
+	arm64_r20,
+	arm64_r21,
+	arm64_r22,
+	arm64_r23,
+	arm64_r24,
+	arm64_r25,
+	arm64_r26,
+	arm64_r27,
+	arm64_r28,
+	arm64_r29,
+	arm64_max,
+};
+
 struct user_fpsimd_state {
 	__uint128_t	vregs[32];
 	__u32		fpsr;
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a561cbb91d4d..18d373fb1208 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -71,3 +71,5 @@ extra-y					+= $(head-y) vmlinux.lds
 ifeq ($(CONFIG_DEBUG_EFI),y)
 AFLAGS_head.o += -DVMLINUX_PATH="\"$(realpath $(objtree)/vmlinux)\""
 endif
+
+obj-$(CONFIG_TRAMPFD)			+= trampfd.o
diff --git a/arch/arm64/kernel/trampfd.c b/arch/arm64/kernel/trampfd.c
new file mode 100644
index 000000000000..3b40ebb12907
--- /dev/null
+++ b/arch/arm64/kernel/trampfd.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - ARM64 support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <asm/compat.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE		28
+#define TRAMPFD_CODE_64_SIZE		48
+
+static inline bool is_compat(void)
+{
+	return is_compat_thread(task_thread_info(current));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	if (is_compat())
+		info->code_size = TRAMPFD_CODE_32_SIZE;
+	else
+		info->code_size = TRAMPFD_CODE_64_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	if (is_compat()) {
+		min = arm_min;
+		max = arm_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+	} else {
+		min = arm64_min;
+		max = arm64_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+	}
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	if (is_compat()) {
+		min = arm_min;
+		max = arm_max;
+	} else {
+		min = arm64_min;
+		max = arm64_max;
+	}
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+#define MOVARM(ins, reg, imm32)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm32);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 2; _hw++) {					\
+		/* movw or movt */					\
+		_opcode = _hw ? 0xe3400000 : 0xe3000000;		\
+		*ins++ = _opcode | (_imm16[_hw] >> 12) << 16 |		\
+			 (reg) << 12 | (_imm16[_hw] & 0xFFF);		\
+	}								\
+}
+
+#define LDRARM(ins, reg)						\
+{									\
+	*ins++ = 0xe5900000 | (reg) << 16 | (reg) << 12;		\
+}
+
+#define BXARM(ins, reg)							\
+{									\
+	*ins++ = 0xe12fff10 | (reg);					\
+}
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm_min;
+	int		dreg = trampfd->data_reg - arm_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * movw creg, code & 0xFFFF
+		 * movt creg, code >> 16
+		 */
+		MOVARM(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDRARM(instruction, creg);
+
+		/*
+		 * movw dreg, data & 0xFFFF
+		 * movt dreg, data >> 16
+		 */
+		MOVARM(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDRARM(instruction, dreg);
+
+		/*
+		 * bx	creg
+		 */
+		BXARM(instruction, creg);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
+
+#define MOVQ(ins, reg, imm64)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm64);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 4; _hw++) {					\
+		/* movz or movk */					\
+		_opcode = _hw ? 0xf2800000 : 0xd2800000;		\
+		*ins++ = _opcode | _hw << 21 | _imm16[_hw] << 5 | (reg);\
+	}								\
+}
+
+#define LDR(ins, reg)							\
+{									\
+	*ins++ = 0xf9400000 | (reg) << 5 | (reg);			\
+}
+
+#define BR(ins, reg)							\
+{									\
+	*ins++ = 0xd61f0000 | (reg) << 5;				\
+}
+
+#define PAD(ins)							\
+{									\
+	while ((uintptr_t) ins & 7)					\
+		*ins++ = 0;						\
+}
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm64_min;
+	int		dreg = trampfd->data_reg - arm64_min;
+	u64		*code = trampfd->code;
+	u64		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * Pseudo instruction:
+		 *
+		 * movq creg, code
+		 *
+		 * Actual instructions:
+		 *
+		 * movz	creg, code & 0xFFFF
+		 * movk	creg, (code >> 16) & 0xFFFF, lsl 16
+		 * movk	creg, (code >> 32) & 0xFFFF, lsl 32
+		 * movk	creg, (code >> 48) & 0xFFFF, lsl 48
+		 */
+		MOVQ(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDR(instruction, creg);
+
+		/*
+		 * Pseudo instruction:
+		 *
+		 * movq dreg, data
+		 *
+		 * Actual instructions:
+		 *
+		 * movz	dreg, data & 0xFFFF
+		 * movk	dreg, (data >> 16) & 0xFFFF, lsl 16
+		 * movk	dreg, (data >> 32) & 0xFFFF, lsl 32
+		 * movk	dreg, (data >> 48) & 0xFFFF, lsl 48
+		 */
+		MOVQ(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDR(instruction, dreg);
+
+		/*
+		 * br	creg
+		 */
+		BR(instruction, creg);
+
+		/*
+		 * Pad to 8-byte boundary
+		 */
+		PAD(instruction);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	if (is_compat())
+		trampfd_code_fill_32(trampfd, addr);
+	else
+		trampfd_code_fill_64(trampfd, addr);
+}
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka
In-Reply-To: <20200916150826.5990-1-madvenka@linux.microsoft.com>

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table for:
		- 32-bit user process
		- 64-bit user process

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/include/uapi/asm/ptrace.h     |  38 ++++
 arch/x86/kernel/Makefile               |   1 +
 arch/x86/kernel/trampfd.c              | 238 +++++++++++++++++++++++++
 5 files changed, 279 insertions(+)
 create mode 100644 arch/x86/kernel/trampfd.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed1..d4f17806c9ab 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -443,3 +443,4 @@
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	trampfd			sys_trampfd
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78847b32e137..91b37bc4b6f0 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -360,6 +360,7 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	common	trampfd			sys_trampfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/x86/include/uapi/asm/ptrace.h b/arch/x86/include/uapi/asm/ptrace.h
index 85165c0edafc..b4be362929b3 100644
--- a/arch/x86/include/uapi/asm/ptrace.h
+++ b/arch/x86/include/uapi/asm/ptrace.h
@@ -9,6 +9,44 @@
 
 #ifndef __ASSEMBLY__
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	x32_min = 0,
+	x32_eax = x32_min,
+	x32_ebx,
+	x32_ecx,
+	x32_edx,
+	x32_esi,
+	x32_edi,
+	x32_ebp,
+	x32_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+	x64_min = x32_max,
+	x64_rax = x64_min,
+	x64_rbx,
+	x64_rcx,
+	x64_rdx,
+	x64_rsi,
+	x64_rdi,
+	x64_rbp,
+	x64_r8,
+	x64_r9,
+	x64_r10,
+	x64_r11,
+	x64_r12,
+	x64_r13,
+	x64_r14,
+	x64_r15,
+	x64_max,
+};
+
 #ifdef __i386__
 /* this struct defines the way the registers are stored on the
    stack during a system call. */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e77261db2391..feb7f4f311fd 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -157,3 +157,4 @@ ifeq ($(CONFIG_X86_64),y)
 endif
 
 obj-$(CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT)	+= ima_arch.o
+obj-$(CONFIG_TRAMPFD)				+= trampfd.o
diff --git a/arch/x86/kernel/trampfd.c b/arch/x86/kernel/trampfd.c
new file mode 100644
index 000000000000..7b812c200d01
--- /dev/null
+++ b/arch/x86/kernel/trampfd.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - X86 support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE		24
+#define TRAMPFD_CODE_64_SIZE		40
+
+static inline bool is_compat(void)
+{
+	return (IS_ENABLED(CONFIG_X86_32) ||
+		(IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_ADDR32)));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	if (is_compat())
+		info->code_size = TRAMPFD_CODE_32_SIZE;
+	else
+		info->code_size = TRAMPFD_CODE_64_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	if (is_compat()) {
+		min = x32_min;
+		max = x32_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+	} else {
+		min = x64_min;
+		max = x64_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+	}
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	if (is_compat()) {
+		min = x32_min;
+		max = x32_max;
+	} else {
+		min = x64_min;
+		max = x64_max;
+	}
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * X32 register encodings.
+ */
+static unsigned char	reg_32[] = {
+	0,	/* x32_eax */
+	3,	/* x32_ebx */
+	1,	/* x32_ecx */
+	2,	/* x32_edx */
+	6,	/* x32_esi */
+	7,	/* x32_edi */
+	5,	/* x32_ebp */
+};
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - x32_min;
+	int		dreg = trampfd->data_reg - x32_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/* endbr32 */
+		addr[0] = 0xf3;
+		addr[1] = 0x0f;
+		addr[2] = 0x1e;
+		addr[3] = 0xfb;
+
+		/* mov code, %creg */
+		addr[4] = 0xB8 | reg_32[creg];			/* opcode+reg */
+		memcpy(&addr[5], &code, sizeof(u32));		/* imm32 */
+
+		/* mov (%creg), %creg */
+		addr[9] = 0x8B;				/* opcode */
+		addr[10] = 0x00 |				/* MODRM.mode */
+			   reg_32[creg] << 3 |			/* MODRM.reg */
+			   reg_32[creg];			/* MODRM.r/m */
+
+		/* mov data, %dreg */
+		addr[11] = 0xB8 | reg_32[dreg];			/* opcode+reg */
+		memcpy(&addr[12], &data, sizeof(u32));		/* imm32 */
+
+		/* mov (%dreg), %dreg */
+		addr[16] = 0x8B;				/* opcode */
+		addr[17] = 0x00 |				/* MODRM.mode */
+			   reg_32[dreg] << 3 |			/* MODRM.reg */
+			   reg_32[dreg];			/* MODRM.r/m */
+
+		/* jmp *%creg */
+		addr[18] = 0xff;				/* opcode */
+		addr[19] = 0xe0 | reg_32[creg];			/* MODRM.r/m */
+
+		/* nopl (%eax) */
+		addr[20] = 0x0f;
+		addr[21] = 0x1f;
+		addr[22] = 0x00;
+
+		/* pad to 4-byte boundary */
+		memset(&addr[23], 0, TRAMPFD_CODE_32_SIZE - 23);
+		addr += TRAMPFD_CODE_32_SIZE;
+	}
+	memset(addr, 0, eaddr - addr);
+}
+
+/*
+ * X64 register encodings.
+ */
+static unsigned char	reg_64[] = {
+	0,	/* x64_rax */
+	3,	/* x64_rbx */
+	1,	/* x64_rcx */
+	2,	/* x64_rdx */
+	6,	/* x64_rsi */
+	7,	/* x64_rdi */
+	5,	/* x64_rbp */
+	8,	/* x64_r8 */
+	9,	/* x64_r9 */
+	10,	/* x64_r10 */
+	11,	/* x64_r11 */
+	12,	/* x64_r12 */
+	13,	/* x64_r13 */
+	14,	/* x64_r14 */
+	15,	/* x64_r15 */
+};
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - x64_min;
+	int		dreg = trampfd->data_reg - x64_min;
+	u64		*code = trampfd->code;
+	u64		*data = trampfd->data;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/* endbr64 */
+		addr[0] = 0xf3;
+		addr[1] = 0x0f;
+		addr[2] = 0x1e;
+		addr[3] = 0xfa;
+
+		/* movabs code, %creg */
+		addr[4] = 0x48 |				/* REX.W */
+			  ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[5] = 0xB8 | (reg_64[creg] & 0x7);		/* opcode+reg */
+		memcpy(&addr[6], &code, sizeof(u64));		/* imm64 */
+
+		/* movq (%creg), %creg */
+		addr[14] = 0x48 |				/* REX.W */
+			   ((reg_64[creg] & 0x8) >> 1) |	/* REX.R */
+			   ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[15] = 0x8B;				/* opcode */
+		addr[16] = 0x00 |				/* MODRM.mode */
+			   ((reg_64[creg] & 0x7)) << 3 |	/* MODRM.reg */
+			   ((reg_64[creg] & 0x7));		/* MODRM.r/m */
+
+		/* movabs data, %dreg */
+		addr[17] = 0x48 |				/* REX.W */
+			  ((reg_64[dreg] & 0x8) >> 3);		/* REX.B */
+		addr[18] = 0xB8 | (reg_64[dreg] & 0x7);		/* opcode+reg */
+		memcpy(&addr[19], &data, sizeof(u64));		/* imm64 */
+
+		/* movq (%dreg), %dreg */
+		addr[27] = 0x48 |				/* REX.W */
+			   ((reg_64[dreg] & 0x8) >> 1) |	/* REX.R */
+			   ((reg_64[dreg] & 0x8) >> 3);		/* REX.B */
+		addr[28] = 0x8B;				/* opcode */
+		addr[29] = 0x00 |				/* MODRM.mode */
+			   ((reg_64[dreg] & 0x7)) << 3 |	/* MODRM.reg */
+			   ((reg_64[dreg] & 0x7));		/* MODRM.r/m */
+
+		/* jmpq *%creg */
+		addr[30] = 0x40 |				/* REX.W */
+			   ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[31] = 0xff;				/* opcode */
+		addr[32] = 0xe0 | (reg_64[creg] & 0x7);		/* MODRM.r/m */
+
+		/* nopl (%rax) */
+		addr[33] = 0x0f;
+		addr[34] = 0x1f;
+		addr[35] = 0x00;
+
+		/* pad to 8-byte boundary */
+		memset(&addr[36], 0, TRAMPFD_CODE_64_SIZE - 36);
+		addr += TRAMPFD_CODE_64_SIZE;
+	}
+	memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	if (is_compat())
+		trampfd_code_fill_32(trampfd, addr);
+	else
+		trampfd_code_fill_64(trampfd, addr);
+}
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka
In-Reply-To: <210d7cd762d5307c2aa1676705b392bd445f1baa>

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Introduction
============

Dynamic code is used in many different user applications. Dynamic code is
often generated at runtime. Dynamic code can also just be a pre-defined
sequence of machine instructions in a data buffer. Examples of dynamic
code are trampolines, JIT code, DBT code, etc.

Dynamic code is placed either in a data page or in a stack page. In order
to execute dynamic code, the page it resides in needs to be mapped with
execute permissions. Writable pages with execute permissions provide an
attack surface for hackers. Attackers can use this to inject malicious
code, modify existing code or do other harm.

To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
allow pages to have both write and execute permissions. This prevents
dynamic code from executing and blocks applications that use it. To allow
genuine applications to run, exceptions have to be made for them (by setting
execmem, etc) which opens the door to security issues.

The W^X implementation today is not complete. There exist many user level
tricks that can be used to load and execute dynamic code. E.g.,

- Load the code into a file and map the file with R-X.

- Load the code in an RW- page. Change the permissions to R--. Then,
  change the permissions to R-X.

- Load the code in an RW- page. Remap the page with R-X to get a separate
  mapping to the same underlying physical page.

IMO, these are all security holes as an attacker can exploit them to inject
his own code.

In the future, these holes will definitely be closed. For instance, LSMs
(such as the IPE proposal [1]) may only allow code in properly signed object
files to be mapped with execute permissions. This will do two things:

	- user level tricks using anonymous pages will fail as anonymous
	  pages have no file identity

	- loading the code in a temporary file and mapping it with R-X
	  will fail as the temporary file would not have a signature

We need a way to execute such code without making security exceptions.
Trampolines are a good example of dynamic code. A couple of examples
of trampolines are given below. My first use case for this RFC is
libffi.

Examples of trampolines
=======================

libffi (A Portable Foreign Function Interface Library):

libffi allows a user to define functions with an arbitrary list of
arguments and return value through a feature called "Closures".
Closures use trampolines to jump to ABI handlers that handle calling
conventions and call a target function. libffi is used by a lot
of different applications. To name a few:

	- Python
	- Java
	- Javascript
	- Ruby FFI
	- Lisp
	- Objective C

GCC nested functions:

GCC has traditionally used trampolines for implementing nested
functions. The trampoline is placed on the user stack. So, the stack
needs to be executable.

Currently available solution
============================

One solution that has been proposed to allow trampolines to be executed
without making security exceptions is Trampoline Emulation. See:

https://pax.grsecurity.net/docs/emutramp.txt

In this solution, the kernel recognizes certain sequences of instructions
as "well-known" trampolines. When such a trampoline is executed, a page
fault happens because the trampoline page does not have execute permission.
The kernel recognizes the trampoline and emulates it. Basically, the
kernel does the work of the trampoline on behalf of the application.

Currently, the emulated trampolines are the ones used in libffi and GCC
nested functions. To my knowledge, only X86 is supported at this time.

As noted in emutramp.txt, this is not a generic solution. For every new
trampoline that needs to be supported, new instruction sequences need to
be recognized by the kernel and emulated. And this has to be done for
every architecture that needs to be supported.

emutramp.txt notes the following:

"... the real solution is not in emulation but by designing a kernel API
for runtime code generation and modifying userland to make use of it."

Solution proposed in this RFC
=============================

From this RFC's perspective, there are two scenarios for dynamic code:

Scenario 1
----------

We know what code we need only at runtime. For instance, JIT code generated
for frequently executed Java methods. Only at runtime do we know what
methods need to be JIT compiled. Such code cannot be statically defined. It
has to be generated at runtime.

Scenario 2
----------

We know what code we need in advance. User trampolines are a good example of
this. It is possible to define such code statically with some help from the
kernel.

This RFC addresses (2). (1) needs a general purpose trusted code generator
and is out of scope for this RFC.

For (2), the solution is to convert dynamic code to static code and place it
in a source file. The binary generated from the source can be signed. The
kernel can use signature verification to authenticate the binary and
allow the code to be mapped and executed.

The problem is that the static code has to be able to find the data that it
needs when it executes. For functions, the ABI defines the way to pass
parameters. But, for arbitrary dynamic code, there isn't a standard ABI
compliant way to pass data to the code for most architectures. Each instance
of dynamic code defines its own way. For instance, co-location of code and
data and PC-relative data referencing are used in cases where the ISA
supports it.

We need one standard way that would work for all architectures and ABIs.

The solution proposed here is:

1. Write the static code assuming that the data needed by the code is already
   pointed to by a designated register.

2. Get the kernel to supply a small universal trampoline that does the
   following:

	- Load the address of the data in a designated register
	- Load the address of the static code in a designated register
	- Jump to the static code

User code would use a kernel supplied API to create and map the trampoline.
The address values would be baked into the code so that no special ISA
features are needed.

To conserve memory, the kernel will pack as many trampolines as possible in
a page and provide a trampoline table to user code. The table itself is
managed by the user.

Trampoline File Descriptor (trampfd)
==========================

I am proposing a kernel API using anonymous file descriptors that can be
used to create the trampolines. The API is described in patch 1/4 of this
patchset. I provide a summary here:

	- Create a trampoline file object

	- Write a code descriptor into the trampoline file and specify:

		- the number of trampolines desired
		- the name of the code register
		- user pointer to a table of code addresses, one address
		  per trampoline

	- Write a data descriptor into the trampoline file and specify:

		- the name of the data register
		- user pointer to a table of data addresses, one address
		  per trampoline

	- mmap() the trampoline file. The kernel generates a table of
	  trampolines in a page and returns the trampoline table address

	- munmap() a trampoline file mapping

	- Close the trampoline file

Each mmap() will only map a single base page. Large pages are not supported.

A trampoline file can only be mapped once in an address space.

Trampoline file mappings cannot be shared across address spaces. So,
sending the trampoline file descriptor over a unix domain socket and
mapping it in another process will not work.

It is recommended that the code descriptor and the code table be placed
in the .rodata section so an attacker cannot modify them.

Trampoline use and reuse
========================

The code for trampoline X in the trampoline table is:

	load	&code_table[X], code_reg
	load	(code_reg), code_reg
	load	&data_table[X], data_reg
	load	(data_reg), data_reg
	jump	code_reg

The addresses &code_table[X] and &data_table[X] are baked into the
trampoline code. So, PC-relative data references are not needed. The user
can modify code_table[X] and data_table[X] dynamically.

For instance, within libffi, the same trampoline X can be used for different
closures at different times by setting:

	data_table[X] = closure;
	code_table[X] = ABI handling code;

Advantages of the Trampoline File Descriptor approach
=====================================================

- Using this support from the kernel, dynamic code can be converted to
  static code with a little effort so applications and libraries can move to
  a more secure model. In the simplest cases such as libffi, dynamic code can
  even be eliminated.

- This initial work is targeted towards X86 and ARM. But it can be supported
  easily on all architectures. We don't need any special ISA features such
  as PC-relative data referencing.

- The only code generation needed is for this small, universal trampoline.

- The kernel does not have to deal with any ABI issues in the generation of
  this trampoline.

- The kernel provides a trampoline table to conserve memory.

- An SELinux setting called "exectramp" can be implemented along the
  lines of "execmem", "execstack" and "execheap" to selectively allow the
  use of trampolines on a per application basis.

- In version 1, a trip to the kernel was required to execute the trampoline.
  In version 2, that is not required. So, there are no performance
  concerns in this approach.

libffi
======

I have implemented my solution for libffi and provided the changes for
X86 and ARM, 32-bit and 64-bit. Here is the reference patch:

http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt

If the trampfd patchset gets accepted, I will send the libffi changes
to the maintainers for a review. BTW, I have also successfully executed
the libffi self tests.

Work that is pending
====================

- I am working on implementing the SELinux setting - "exectramp".

- I have a test program to test the kernel API. I am working on adding it
  to selftests.

References
==========

[1] https://microsoft.github.io/ipe/
---

Changelog:

v1
	Introduced the Trampfd feature.

v2
	- Changed the system call. Version 2 does not support different
	  trampoline types and their associated type structures. It only
	  supports a kernel generated trampoline.

	  The system call now returns information to the user that is
	  used to define trampoline descriptors. E.g., the maximum
	  number of trampolines that can be packed in a single page.

	- Removed all the trampoline contexts such as register contexts
	  and stack contexts. This is based on the feedback that the kernel
	  should not have to worry about ABI issues and H/W features that
	  may deal with the context of a process.

	- Removed the need to make a trip into the kernel on trampoline
	  invocation. This is based on the feedback about performance.

	- Removed the ability to share trampolines across address spaces.
	  This would have made sense to different trampoline types based
	  on their semantics. But since I support only one specific
	  trampoline, sharing does not make sense.

	- Added calls to specify trampoline descriptors that the kernel
	  uses to generate trampolines.

	- Added architecture-specific code to generate the small, universal
	  trampoline for X86 32 and 64-bit, ARM 32 and 64-bit.

	- Implemented the trampoline table in a page.
Madhavan T. Venkataraman (4):
  Implement the kernel API for the trampoline file descriptor.
  Implement i386 and X86 support for the trampoline file descriptor.
  Implement ARM64 support for the trampoline file descriptor.
  Implement ARM support for the trampoline file descriptor.

 arch/arm/include/uapi/asm/ptrace.h     |  21 +++
 arch/arm/kernel/Makefile               |   1 +
 arch/arm/kernel/trampfd.c              | 124 +++++++++++++
 arch/arm/tools/syscall.tbl             |   1 +
 arch/arm64/include/asm/unistd.h        |   2 +-
 arch/arm64/include/asm/unistd32.h      |   2 +
 arch/arm64/include/uapi/asm/ptrace.h   |  59 ++++++
 arch/arm64/kernel/Makefile             |   2 +
 arch/arm64/kernel/trampfd.c            | 244 +++++++++++++++++++++++++
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/include/uapi/asm/ptrace.h     |  38 ++++
 arch/x86/kernel/Makefile               |   1 +
 arch/x86/kernel/trampfd.c              | 238 ++++++++++++++++++++++++
 fs/Makefile                            |   1 +
 fs/trampfd/Makefile                    |   5 +
 fs/trampfd/trampfd_fops.c              | 241 ++++++++++++++++++++++++
 fs/trampfd/trampfd_map.c               | 142 ++++++++++++++
 include/linux/syscalls.h               |   2 +
 include/linux/trampfd.h                |  49 +++++
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/trampfd.h           | 184 +++++++++++++++++++
 init/Kconfig                           |   7 +
 kernel/sys_ni.c                        |   3 +
 24 files changed, 1371 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/kernel/trampfd.c
 create mode 100644 arch/arm64/kernel/trampfd.c
 create mode 100644 arch/x86/kernel/trampfd.c
 create mode 100644 fs/trampfd/Makefile
 create mode 100644 fs/trampfd/trampfd_fops.c
 create mode 100644 fs/trampfd/trampfd_map.c
 create mode 100644 include/linux/trampfd.h
 create mode 100644 include/uapi/linux/trampfd.h

-- 
2.17.1

^ permalink raw reply

* [PATCH v2 4/4] [RFC] arm/trampfd: Provide support for the trampoline file descriptor
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka
In-Reply-To: <20200916150826.5990-1-madvenka@linux.microsoft.com>

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table,

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm/include/uapi/asm/ptrace.h |  21 +++++
 arch/arm/kernel/Makefile           |   1 +
 arch/arm/kernel/trampfd.c          | 124 +++++++++++++++++++++++++++++
 arch/arm/tools/syscall.tbl         |   1 +
 4 files changed, 147 insertions(+)
 create mode 100644 arch/arm/kernel/trampfd.c

diff --git a/arch/arm/include/uapi/asm/ptrace.h b/arch/arm/include/uapi/asm/ptrace.h
index e61c65b4018d..598047768f9b 100644
--- a/arch/arm/include/uapi/asm/ptrace.h
+++ b/arch/arm/include/uapi/asm/ptrace.h
@@ -151,6 +151,27 @@ struct pt_regs {
 #define ARM_r0		uregs[0]
 #define ARM_ORIG_r0	uregs[17]
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	arm_min,
+	arm_r0 = arm_min,
+	arm_r1,
+	arm_r2,
+	arm_r3,
+	arm_r4,
+	arm_r5,
+	arm_r6,
+	arm_r7,
+	arm_r8,
+	arm_r9,
+	arm_r10,
+	arm_r11,
+	arm_r12,
+	arm_max,
+};
+
 /*
  * The size of the user-visible VFP state as seen by PTRACE_GET/SETVFPREGS
  * and core dumps.
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89e5d864e923..652c54c2f19a 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -105,5 +105,6 @@ obj-$(CONFIG_SMP)		+= psci_smp.o
 endif
 
 obj-$(CONFIG_HAVE_ARM_SMCCC)	+= smccc-call.o
+obj-$(CONFIG_TRAMPFD)		+= trampfd.o
 
 extra-y := $(head-y) vmlinux.lds
diff --git a/arch/arm/kernel/trampfd.c b/arch/arm/kernel/trampfd.c
new file mode 100644
index 000000000000..45146ed489e8
--- /dev/null
+++ b/arch/arm/kernel/trampfd.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - ARM support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_SIZE		28
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	info->code_size = TRAMPFD_CODE_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	min = arm_min;
+	max = arm_max;
+	ntrampolines = PAGE_SIZE / TRAMPFD_CODE_SIZE;
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	min = arm_min;
+	max = arm_max;
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+#define MOVW(ins, reg, imm32)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm32);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 2; _hw++) {					\
+		/* movw or movt */					\
+		_opcode = _hw ? 0xe3400000 : 0xe3000000;		\
+		*ins++ = _opcode | (_imm16[_hw] >> 12) << 16 |		\
+			 (reg) << 12 | (_imm16[_hw] & 0xFFF);		\
+	}								\
+}
+
+#define LDR(ins, reg)							\
+{									\
+	*ins++ = 0xe5900000 | (reg) << 16 | (reg) << 12;		\
+}
+
+#define BX(ins, reg)							\
+{									\
+	*ins++ = 0xe12fff10 | (reg);					\
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm_min;
+	int		dreg = trampfd->data_reg - arm_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * movw creg, code & 0xFFFF
+		 * movt creg, code >> 16
+		 */
+		MOVW(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDR(instruction, creg);
+
+		/*
+		 * movw dreg, data & 0xFFFF
+		 * movt dreg, data >> 16
+		 */
+		MOVW(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDR(instruction, dreg);
+
+		/*
+		 * bx	creg
+		 */
+		BX(instruction, creg);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index d5cae5ffede0..85dcbc9e08ee 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -452,3 +452,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	trampfd				sys_trampfd
-- 
2.17.1


^ permalink raw reply related

* [PATCH] ima: Fix NULL pointer dereference in ima_file_hash
From: KP Singh @ 2020-09-16 12:05 UTC (permalink / raw)
  To: linux-kernel, linux-security-module
  Cc: Mimi Zohar, James Morris, Kees Cook, Lakshmi Ramasubramanian,
	Jann Horn, Florent Revest

From: KP Singh <kpsingh@google.com>

ima_file_hash can be called when there is no iint->ima_hash available
even though the inode exists in the integrity cache.

An example where this can happen (suggested by Jann Horn):

Process A does:

	while(1) {
		unlink("/tmp/imafoo");
		fd = open("/tmp/imafoo", O_RDWR|O_CREAT|O_TRUNC, 0700);
		if (fd == -1) {
			perror("open");
			continue;
		}
		write(fd, "A", 1);
		close(fd);
	}

and Process B does:

	while (1) {
		int fd = open("/tmp/imafoo", O_RDONLY);
		if (fd == -1)
			continue;
    		char *mapping = mmap(NULL, 0x1000, PROT_READ|PROT_EXEC,
			 	     MAP_PRIVATE, fd, 0);
		if (mapping != MAP_FAILED)
			munmap(mapping, 0x1000);
		close(fd);
  	}

Due to the race to get the iint->mutex between ima_file_hash and
process_measurement iint->ima_hash could still be NULL.

Fixes: 6beea7afcc72 ("ima: add the ability to query the cached hash of a given file")
Signed-off-by: KP Singh <kpsingh@google.com>
---
 security/integrity/ima/ima_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 8a91711ca79b..4c86cd4eece0 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -531,6 +531,16 @@ int ima_file_hash(struct file *file, char *buf, size_t buf_size)
 		return -EOPNOTSUPP;
 
 	mutex_lock(&iint->mutex);
+
+	/*
+	 * ima_file_hash can be called when ima_collect_measurement has still
+	 * not been called, we might not always have a hash.
+	 */
+	if (!iint->ima_hash) {
+		mutex_unlock(&iint->mutex);
+		return -EOPNOTSUPP;
+	}
+
 	if (buf) {
 		size_t copied_size;
 
-- 
2.28.0.618.gf4bc123cb7-goog


^ permalink raw reply related

* Re: [PATCH v4] certs: Add EFI_CERT_X509_GUID support for dbx entries
From: Jarkko Sakkinen @ 2020-09-16 18:09 UTC (permalink / raw)
  To: Eric Snowberg
  Cc: dhowells, dwmw2, herbert, davem, jmorris, serge, nayna, zohar,
	erichte, mpe, keyrings, linux-kernel, linux-crypto,
	linux-security-module
In-Reply-To: <20200916004927.64276-1-eric.snowberg@oracle.com>

On Tue, Sep 15, 2020 at 08:49:27PM -0400, Eric Snowberg wrote:
> The Secure Boot Forbidden Signature Database, dbx, contains a list of now
> revoked signatures and keys previously approved to boot with UEFI Secure
> Boot enabled.  The dbx is capable of containing any number of
> EFI_CERT_X509_SHA256_GUID, EFI_CERT_SHA256_GUID, and EFI_CERT_X509_GUID
> entries.
> 
> Currently when EFI_CERT_X509_GUID are contained in the dbx, the entries are
> skipped.
> 
> Add support for EFI_CERT_X509_GUID dbx entries. When a EFI_CERT_X509_GUID
> is found, it is added as an asymmetrical key to the .blacklist keyring.
> Anytime the .platform keyring is used, the keys in the .blacklist keyring
> are referenced, if a matching key is found, the key will be rejected.
> 
> Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>

Looks good to me.

Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

/Jarkko

^ permalink raw reply

* [PATCH v3] ima: Fix NULL pointer dereference in ima_file_hash
From: KP Singh @ 2020-09-16 18:02 UTC (permalink / raw)
  To: linux-kernel, linux-security-module
  Cc: Florent Revest, Mimi Zohar, James Morris, Kees Cook,
	Lakshmi Ramasubramanian, Jann Horn

From: KP Singh <kpsingh@google.com>

ima_file_hash can be called when there is no iint->ima_hash available
even though the inode exists in the integrity cache. It is fairly
common for a file to not have a hash. (e.g. an mknodat, prior to the
file being closed).

Another example where this can happen (suggested by Jann Horn):

Process A does:

	while(1) {
		unlink("/tmp/imafoo");
		fd = open("/tmp/imafoo", O_RDWR|O_CREAT|O_TRUNC, 0700);
		if (fd == -1) {
			perror("open");
			continue;
		}
		write(fd, "A", 1);
		close(fd);
	}

and Process B does:

	while (1) {
		int fd = open("/tmp/imafoo", O_RDONLY);
		if (fd == -1)
			continue;
    		char *mapping = mmap(NULL, 0x1000, PROT_READ|PROT_EXEC,
			 	     MAP_PRIVATE, fd, 0);
		if (mapping != MAP_FAILED)
			munmap(mapping, 0x1000);
		close(fd);
  	}

Due to the race to get the iint->mutex between ima_file_hash and
process_measurement iint->ima_hash could still be NULL.

Fixes: 6beea7afcc72 ("ima: add the ability to query the cached hash of a given file")
Signed-off-by: KP Singh <kpsingh@google.com>
Reviewed-by: Florent Revest <revest@chromium.org>
---
 security/integrity/ima/ima_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Changelog:

v2 -> v3

- Updated commit description.

v1 -> v2

- Added Review Tags, attempt to get the patch on the lists somehow.

diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 8a91711ca79b..4c86cd4eece0 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -531,6 +531,16 @@ int ima_file_hash(struct file *file, char *buf, size_t buf_size)
 		return -EOPNOTSUPP;
 
 	mutex_lock(&iint->mutex);
+
+	/*
+	 * ima_file_hash can be called when ima_collect_measurement has still
+	 * not been called, we might not always have a hash.
+	 */
+	if (!iint->ima_hash) {
+		mutex_unlock(&iint->mutex);
+		return -EOPNOTSUPP;
+	}
+
 	if (buf) {
 		size_t copied_size;
 
-- 
2.28.0.618.gf4bc123cb7-goog


^ permalink raw reply related

* Re: [PATCH v2 04/12] evm: Execute evm_inode_init_security() only when the HMAC key is loaded
From: Mimi Zohar @ 2020-09-16 16:15 UTC (permalink / raw)
  To: Roberto Sassu, mjg59
  Cc: linux-integrity, linux-security-module, linux-kernel,
	silviu.vlasceanu, stable
In-Reply-To: <20200904092339.19598-5-roberto.sassu@huawei.com>

Hi Roberto,

On Fri, 2020-09-04 at 11:23 +0200, Roberto Sassu wrote:
> evm_inode_init_security() requires the HMAC key to calculate the HMAC on
> initial xattrs provided by LSMs. Unfortunately, with the evm_key_loaded()
> check, the function continues even if the HMAC key is not loaded
> (evm_key_loaded() returns true also if EVM has been initialized only with a
> public key). If the HMAC key is not loaded, evm_inode_init_security()
> returns an error later when it calls evm_init_hmac().

This is all true, but the context for why it wasn't an issue previously
is missing.

The original usecase for allowing signature verificaton prior to
loading the HMAC key was a fully signed, possibly immutable, initrd. 
No new files were created or, at least, were in policy until the HMAC
key was loaded.   More recently support for requiring an EVM HMAC key
was removed.  Files having a portable and immutable signature were
given additional privileges.

Please update the patch description with the context of what has
changed.

Mimi

^ permalink raw reply

* [PATCH v2] ima: Fix NULL pointer dereference in ima_file_hash
From: KP Singh @ 2020-09-16 12:49 UTC (permalink / raw)
  To: linux-kernel, linux-security-module
  Cc: Florent Revest, Mimi Zohar, James Morris, Kees Cook,
	Lakshmi Ramasubramanian, Jann Horn

From: KP Singh <kpsingh@google.com>

ima_file_hash can be called when there is no iint->ima_hash available
even though the inode exists in the integrity cache.

An example where this can happen (suggested by Jann Horn):

Process A does:

	while(1) {
		unlink("/tmp/imafoo");
		fd = open("/tmp/imafoo", O_RDWR|O_CREAT|O_TRUNC, 0700);
		if (fd == -1) {
			perror("open");
			continue;
		}
		write(fd, "A", 1);
		close(fd);
	}

and Process B does:

	while (1) {
		int fd = open("/tmp/imafoo", O_RDONLY);
		if (fd == -1)
			continue;
    		char *mapping = mmap(NULL, 0x1000, PROT_READ|PROT_EXEC,
			 	     MAP_PRIVATE, fd, 0);
		if (mapping != MAP_FAILED)
			munmap(mapping, 0x1000);
		close(fd);
  	}

Due to the race to get the iint->mutex between ima_file_hash and
process_measurement iint->ima_hash could still be NULL.

Fixes: 6beea7afcc72 ("ima: add the ability to query the cached hash of a given file")
Signed-off-by: KP Singh <kpsingh@google.com>
Reviewed-by: Florent Revest <revest@chromium.org>
---
 security/integrity/ima/ima_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 8a91711ca79b..4c86cd4eece0 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -531,6 +531,16 @@ int ima_file_hash(struct file *file, char *buf, size_t buf_size)
 		return -EOPNOTSUPP;
 
 	mutex_lock(&iint->mutex);
+
+	/*
+	 * ima_file_hash can be called when ima_collect_measurement has still
+	 * not been called, we might not always have a hash.
+	 */
+	if (!iint->ima_hash) {
+		mutex_unlock(&iint->mutex);
+		return -EOPNOTSUPP;
+	}
+
 	if (buf) {
 		size_t copied_size;
 
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply related

* [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka
In-Reply-To: <20200916150826.5990-1-madvenka@linux.microsoft.com>

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Introduction
============

Dynamic code is used in many different user applications. Dynamic code is
often generated at runtime. Dynamic code can also just be a pre-defined
sequence of machine instructions in a data buffer. Examples of dynamic
code are trampolines, JIT code, DBT code, etc.

Dynamic code is placed either in a data page or in a stack page. In order
to execute dynamic code, the page it resides in needs to be mapped with
execute permissions. Writable pages with execute permissions provide an
attack surface for hackers. Attackers can use this to inject malicious
code, modify existing code or do other harm.

To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
allow pages to have both write and execute permissions. This prevents
dynamic code from executing and blocks applications that use it. To allow
genuine applications to run, exceptions have to be made for them (by setting
execmem, etc) which opens the door to security issues.

The W^X implementation today is not complete. There exist many user level
tricks that can be used to load and execute dynamic code. E.g.,

- Load the code into a file and map the file with R-X.

- Load the code in an RW- page. Change the permissions to R--. Then,
  change the permissions to R-X.

- Load the code in an RW- page. Remap the page with R-X to get a separate
  mapping to the same underlying physical page.

IMO, these are all security holes as an attacker can exploit them to inject
his own code.

In the future, these holes will definitely be closed. For instance, LSMs
(such as the IPE proposal [1]) may only allow code in properly signed object
files to be mapped with execute permissions. This will do two things:

	- user level tricks using anonymous pages will fail as anonymous
	  pages have no file identity

	- loading the code in a temporary file and mapping it with R-X
	  will fail as the temporary file would not have a signature

We need a way to execute such code without making security exceptions.
Trampolines are a good example of dynamic code. A couple of examples
of trampolines are given below. My first use case for this RFC is
libffi.

Solution
========

The solution is to convert dynamic code to static code and place it in a
source file. The binary generated from the source can be signed. The kernel
can use signature verification to authenticate the binary and allow the code
to be mapped and executed.

The problem is that the static code has to be able to find the data that it
needs when it executes. For functions, the ABI defines the way to pass
parameters. But, for arbitrary dynamic code, there isn't a standard ABI
compliant way to pass data to the code for most architectures. Each instance
of dynamic code defines its own way. For instance, co-location of code and
data and PC-relative data referencing are used in cases where the ISA
supports it.

We need one standard way that would work for all architectures and ABIs.

The solution has two parts:

1. The maintainer of the code writes the static code assuming that the data
   needed by the code is already pointed to by a designated register.

2. The kernel supplies a small universal trampoline that does the following:

	- Load the address of the data in a designated register
	- Load the address of the static code in a designated register
	- Jump to the static code

User code would use a kernel supplied API to create and map the trampoline.
The address values would be baked into the code so that no special ISA
features are needed.

To conserve memory, the kernel will pack as many trampolines as possible in
a page and provide a trampoline table to user code. The table itself is
managed by the user.

Kernel API
==========

A kernel API based on anonymous file descriptors is defined to create
trampolines. The following sections describe the API.

Create trampfd
==============

This feature introduces a new trampfd system call.

	struct trampfd_info	info;
	int			trampfd;

	trampfd = syscall(440, &info);

The kernel creates a trampoline file object and returns the following items
in info:

ntrampolines
	The number of trampolines that can be created with one trampfd. The
	user may create fewer trampolines if he wishes.

code_size
	The size of each trampoline.

code_offset
	The file offset to be used in mmap() to map the trampoline code.

Initialize trampfd
==================

A trampfd is initialized in this manner:

	struct trampfd_code	code;
	struct trampfd_data	data;

	/*
	 * Code descriptor.
	 */
	code.ntrampolines = number of desired trampolines;
	code.reg = code register name;
	code.table = array of code addresses

	/*
	 * Data descriptor.
	 */
	data.reg = data register name;
	data.table = array of data addresses

	pwrite(trampfd, &code, sizeof(init), TRAMPFD_CODE);
	pwrite(trampfd, &data, sizeof(init), TRAMPFD_DATA);

The register names are defined in ptrace.h (reg_32_name and reg_64_name).

It is recommended that the code descriptor and code array be placed in the
.rodata section so that an attacker cannot modify its contents.

Instead of pwrite(), the user can also do lseek() and write().

Map trampfd
===========

The user uses mmap() to map the trampoline table into user address space.

	len = info.code_size * code.ntrampolines;
	prot = PROT_READ | PROT_EXEC;
	flags = MAP_PRIVATE;
	offset = info.code_offset;

	trampoline_table = mmap(NULL, len, prot, flags, trampfd, offset);

The kernel generates the trampoline table. The code for trampoline X in the
table is:

	load	&code_table[X], code_reg
	load	(code_reg), code_reg
	load	&data_table[X], data_reg
	load	(data_reg), data_reg
	jump	code_reg

Each mmap() will only map a single base page. Large pages are not supported.

A trampoline file can only be mmapped once in an address space.

Trampoline file mappings cannot be shared across address spaces. So,
sending the trampoline file descriptor over a unix domain socket and
mapping it in another process will not work.

The trampoline code is generated with &code_table[X] and &data_table[X] hard
coded in it. But code_table[X] and data_table[X] can be modified by user
code dynamically so supply the code and data to trampoline X.

Trampoline table management
===========================

The user manages the trampoline table. The address of trampoline X is:

	trampoline_table + info.code_size * X;

Prior to invoking trampoline X, the user must initialize code_table[X] and
data_table[X].

Unmap trampfd
=============

Once the user is done with the trampoline table, it may be unmapped:

	len = info.code_size * code.ntrampolines;
	munmap(trampoline_table, len);

Remove trampfd
==============

To remove the trampfd:

	close(trampfd);

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 fs/Makefile                       |   1 +
 fs/trampfd/Makefile               |   5 +
 fs/trampfd/trampfd_fops.c         | 241 ++++++++++++++++++++++++++++++
 fs/trampfd/trampfd_map.c          | 142 ++++++++++++++++++
 include/linux/syscalls.h          |   2 +
 include/linux/trampfd.h           |  49 ++++++
 include/uapi/asm-generic/unistd.h |   4 +-
 include/uapi/linux/trampfd.h      | 184 +++++++++++++++++++++++
 init/Kconfig                      |   7 +
 kernel/sys_ni.c                   |   3 +
 10 files changed, 637 insertions(+), 1 deletion(-)
 create mode 100644 fs/trampfd/Makefile
 create mode 100644 fs/trampfd/trampfd_fops.c
 create mode 100644 fs/trampfd/trampfd_map.c
 create mode 100644 include/linux/trampfd.h
 create mode 100644 include/uapi/linux/trampfd.h

diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..227761302000 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -136,3 +136,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
+obj-$(CONFIG_TRAMPFD)		+= trampfd/
diff --git a/fs/trampfd/Makefile b/fs/trampfd/Makefile
new file mode 100644
index 000000000000..ae09a0b1f841
--- /dev/null
+++ b/fs/trampfd/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_TRAMPFD) += trampfd.o
+
+trampfd-y += trampfd_fops.o trampfd_map.o
diff --git a/fs/trampfd/trampfd_fops.c b/fs/trampfd/trampfd_fops.c
new file mode 100644
index 000000000000..7164dd4d9039
--- /dev/null
+++ b/fs/trampfd/trampfd_fops.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - System call and File operations.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@microsoft.com)
+ *
+ * Copyright (C) 2020 Microsoft Corporation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/syscalls.h>
+#include <linux/seq_file.h>
+#include <linux/anon_inodes.h>
+#include <linux/trampfd.h>
+
+char	*trampfd_name = "[trampfd]";
+
+struct kmem_cache	*trampfd_cache;
+
+/*
+ * Arch stub function to return info for the trampfd syscall.
+ */
+int __attribute__((weak)) trampfd_arch(struct trampfd_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Arch stub function to do arch specific initialization for a code
+ * descriptor.
+ */
+int __attribute__((weak)) trampfd_code_arch(struct trampfd_code *code)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Arch stub function to do arch specific initialization for a data
+ * descriptor.
+ */
+int __attribute__((weak)) trampfd_data_arch(struct trampfd_data *data)
+{
+	return -EOPNOTSUPP;
+}
+
+#ifdef CONFIG_PROC_FS
+static void trampfd_show_fdinfo(struct seq_file *sfile, struct file *file)
+{
+	seq_puts(sfile, "Trampoline FD\n");
+}
+#endif
+
+static loff_t trampfd_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct trampfd		*trampfd = file->private_data;
+
+	if (whence != SEEK_SET)
+		return -EINVAL;
+
+	if ((offset < 0) || (offset >= TRAMPFD_NUM_OFFSETS))
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+	mutex_unlock(&trampfd->lock);
+	return offset;
+}
+
+int trampfd_code(struct file *file, const char __user *arg, size_t count)
+{
+	struct trampfd		*trampfd = file->private_data;
+	struct trampfd_code	code;
+	int			rc = 0;
+
+	if (count != sizeof(code))
+		return -EINVAL;
+
+	if (copy_from_user(&code, arg, sizeof(code)))
+		return -EFAULT;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->code) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	rc = trampfd_code_arch(&code);
+	if (rc)
+		goto unlock;
+
+	trampfd->code_reg = code.reg;
+	trampfd->ntrampolines = code.ntrampolines;
+	trampfd->code = (void *) (uintptr_t) code.table;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+int trampfd_data(struct file *file, const char __user *arg, size_t count)
+{
+	struct trampfd		*trampfd = file->private_data;
+	struct trampfd_data	data;
+	int			rc = 0;
+
+	if (count != sizeof(data))
+		return -EINVAL;
+
+	if (copy_from_user(&data, arg, sizeof(data)))
+		return -EFAULT;
+
+	if (data.reserved)
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->data) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	rc = trampfd_data_arch(&data);
+	if (rc)
+		goto unlock;
+
+	trampfd->data_reg = data.reg;
+	trampfd->data = (void *) (uintptr_t) data.table;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+static ssize_t trampfd_write(struct file *file, const char __user *arg,
+			     size_t count, loff_t *ppos)
+{
+	int		rc;
+
+	if (!arg || !count)
+		return -EINVAL;
+
+	switch (*ppos) {
+	case TRAMPFD_CODE:
+		rc = trampfd_code(file, arg, count);
+		break;
+
+	case TRAMPFD_DATA:
+		rc = trampfd_data(file, arg, count);
+		break;
+
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+out:
+	return rc ? rc : (ssize_t) count;
+}
+
+static int trampfd_release(struct inode *inode, struct file *file)
+{
+	struct trampfd		*trampfd = file->private_data;
+
+	mutex_destroy(&trampfd->lock);
+	kmem_cache_free(trampfd_cache, trampfd);
+	return 0;
+}
+
+const struct file_operations trampfd_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo		= trampfd_show_fdinfo,
+#endif
+	.llseek			= trampfd_llseek,
+	.write			= trampfd_write,
+	.release		= trampfd_release,
+	.mmap			= trampfd_mmap,
+	.get_unmapped_area	= trampfd_get_unmapped_area,
+};
+
+SYSCALL_DEFINE1(trampfd, struct trampfd_info *, info_arg)
+{
+	struct trampfd		*trampfd;
+	struct trampfd_info	info;
+	struct file		*file;
+	int			fd;
+	int			rc;
+
+	if (!trampfd_cache)
+		return -ENOMEM;
+
+	if (!info_arg)
+		return -EINVAL;
+
+	trampfd = kmem_cache_zalloc(trampfd_cache, GFP_KERNEL);
+	if (!trampfd)
+		return -ENOMEM;
+	mutex_init(&trampfd->lock);
+	trampfd->creator = current->pid;
+
+	trampfd_arch(&info);
+
+	if (copy_to_user(info_arg, &info, sizeof(info))) {
+		rc = -EFAULT;
+		goto freetramp;
+	}
+
+	rc = get_unused_fd_flags(O_CLOEXEC);
+	if (rc < 0)
+		goto freetramp;
+	fd = rc;
+
+	file = anon_inode_getfile(trampfd_name, &trampfd_fops, trampfd, O_RDWR);
+	if (IS_ERR(file)) {
+		rc = PTR_ERR(file);
+		goto freefd;
+	}
+	file->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+
+	fd_install(fd, file);
+	return fd;
+freefd:
+	put_unused_fd(fd);
+freetramp:
+	kmem_cache_free(trampfd_cache, trampfd);
+	return rc;
+}
+
+int __init trampfd_feature_init(void)
+{
+	trampfd_cache = kmem_cache_create("trampfd_cache",
+		sizeof(struct trampfd), 0, SLAB_HWCACHE_ALIGN, NULL);
+	if (trampfd_cache == NULL) {
+		pr_warn("%s: kmem_cache_create failed", __func__);
+		return -ENOMEM;
+	}
+	return 0;
+}
+core_initcall(trampfd_feature_init);
diff --git a/fs/trampfd/trampfd_map.c b/fs/trampfd/trampfd_map.c
new file mode 100644
index 000000000000..679b29768491
--- /dev/null
+++ b/fs/trampfd/trampfd_map.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - Memory mapping.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@microsoft.com)
+ *
+ * Copyright (C) 2020 Microsoft Corporation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/mman.h>
+#include <linux/highmem.h>
+#include <linux/trampfd.h>
+
+/*
+ * Arch stub function to populate a page with trampolines based on a
+ * trampoline specification.
+ */
+void __attribute__((weak)) trampfd_code_fill(struct trampfd *trampfd,
+					     char *addr)
+{
+}
+
+static void trampfd_close(struct vm_area_struct *vma)
+{
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->page) {
+		__free_pages(trampfd->page, 0);
+		trampfd->page = NULL;
+	}
+	trampfd->mapped = false;
+
+	mutex_unlock(&trampfd->lock);
+}
+
+static vm_fault_t trampfd_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct	*vma = vmf->vma;
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+	struct page		*new_page = NULL;
+	void			*addr;
+
+	/*
+	 * Check this outside the lock so the lock does not have to be
+	 * dropped in order to allocate a page. Races are benign.
+	 */
+	if (!trampfd->page) {
+		new_page = alloc_pages(GFP_KERNEL, 0);
+		if (!new_page)
+			return VM_FAULT_OOM;
+	}
+
+	mutex_lock(&trampfd->lock);
+
+	if (!trampfd->page) {
+		trampfd->page = new_page;
+		new_page = NULL;
+		/*
+		 * Populate the page with trampolines.
+		 */
+		addr = kmap(trampfd->page);
+		trampfd_code_fill(trampfd, addr);
+		kunmap(trampfd->page);
+	}
+	vmf->page = trampfd->page;
+	get_page(vmf->page);
+
+	mutex_unlock(&trampfd->lock);
+
+	if (new_page)
+		__free_pages(new_page, 0);
+	return 0;
+}
+
+static const struct vm_operations_struct trampfd_vm_ops = {
+	.close	= trampfd_close,
+	.fault	= trampfd_fault,
+};
+
+int trampfd_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+	int			rc = 0;
+
+	/*
+	 * A trampfd cannot be mapped into multiple address spaces.
+	 */
+	if (current->pid != trampfd->creator)
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+
+	/*
+	 * trampfd must be initialized before it can be mapped.
+	 */
+	if (!trampfd->code || !trampfd->data) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	/*
+	 * A trampfd cannot be mapped multiple times in the same address space.
+	 */
+	if (trampfd->mapped) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	/*
+	 * prot should be R-X.
+	 */
+	if ((vma->vm_flags & VM_WRITE) || !(vma->vm_flags & VM_READ) ||
+	    !(vma->vm_flags & VM_EXEC)) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+	trampfd->mapped = true;
+	vma->vm_ops = &trampfd_vm_ops;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+unsigned long
+trampfd_get_unmapped_area(struct file *file, unsigned long orig_addr,
+			  unsigned long len, unsigned long pgoff,
+			  unsigned long flags)
+{
+	const typeof_member(struct file_operations, get_unmapped_area)
+	get_area = current->mm->get_unmapped_area;
+
+	if (pgoff != TRAMPFD_CODE_PGOFF || flags != MAP_PRIVATE ||
+	    len != PAGE_SIZE)
+		return -EINVAL;
+
+	return get_area(file, orig_addr, len, pgoff, flags);
+}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b951a87da987..91f55ff3cdac 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -69,6 +69,7 @@ union bpf_attr;
 struct io_uring_params;
 struct clone_args;
 struct open_how;
+struct trampfd_info;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -1005,6 +1006,7 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_trampfd(struct trampfd_info *info);
 
 /*
  * Architecture-specific system calls
diff --git a/include/linux/trampfd.h b/include/linux/trampfd.h
new file mode 100644
index 000000000000..c98fa1741c36
--- /dev/null
+++ b/include/linux/trampfd.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Trampoline FD - Internal structures and definitions.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+#ifndef _LINUX_TRAMPFD_H
+#define _LINUX_TRAMPFD_H
+
+#include <uapi/linux/trampfd.h>
+
+/*
+ * mmap() offsets.
+ */
+enum trampfd_pgoff {
+	TRAMPFD_CODE_PGOFF = 1,
+	TRAMPFD_NUM_PGOFF,
+};
+
+/*
+ * Trampoline structure.
+ */
+struct trampfd {
+	struct mutex		lock;		/* to serialize access */
+	pid_t			creator;	/* to prevent sharing */
+
+	short			code_reg;	/* code register name */
+	short			data_reg;	/* data register name */
+	int			ntrampolines;	/* number of trampolines */
+
+	void			*code;		/* user code address table */
+	void			*data;		/* user data address table */
+
+	struct page		*page;		/* code page */
+	bool			mapped;		/* mapped into address space? */
+};
+
+#ifdef CONFIG_TRAMPFD
+
+int trampfd_mmap(struct file *file, struct vm_area_struct *vma);
+unsigned long trampfd_get_unmapped_area(struct file *file, unsigned long addr,
+					unsigned long len, unsigned long pgoff,
+					unsigned long flags);
+
+#endif /* CONFIG_TRAMPFD */
+
+#endif /* _LINUX_TRAMPFD_H */
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f4a01305d9a6..3b1ad4b75c7a 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_trampfd 440
+__SYSCALL(__NR_trampfd, sys_trampfd)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/trampfd.h b/include/uapi/linux/trampfd.h
new file mode 100644
index 000000000000..9bbc0450e16d
--- /dev/null
+++ b/include/uapi/linux/trampfd.h
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Trampoline FD - API structures and definitions.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+#ifndef _UAPI_LINUX_TRAMPFD_H
+#define _UAPI_LINUX_TRAMPFD_H
+
+#include <linux/types.h>
+#include <linux/ptrace.h>
+
+/*
+ * All structure fields are defined so that they are the same width and at the
+ * same structure offset on 32-bit and 64-bit to avoid compat code.
+ *
+ * All fields named "reserved" must be set to 0. They are there primarily for
+ * alignment. But they may be used in the future.
+ */
+
+/* ---------------------------- Trampfd Feature --------------------------- */
+
+/*
+ * This feature can be used to help convert dynamic user code to static user
+ * code so that the code can be in the text segment of a binary file. This
+ * allows the kernel to authenticate the code. E.g., using signature
+ * verification of the binary file.
+ *
+ * The problem in converting dynamic code to static code is that the static
+ * code needs to be able to locate its data dynamically. So, its data needs
+ * to be loaded in a designated register before jumping to the static code.
+ *
+ * This feature uses the kernel to generate a small, secure trampoline to do
+ * this. The trampoline code looks like this:
+ *
+ *	- load the address of the static code in a register (code_reg)
+ *	- load the address of its data in a register (data_reg)
+ *	- jump to code_reg
+ *
+ * The kernel places the trampoline in a user page and maps it into the user
+ * address space. To conserve memory, the kernel packs multiple trampolines in
+ * a page and creates a trampoline table.
+ */
+
+/* -------------------------- Trampoline Creation ------------------------- */
+
+/*
+ * This feature introduces a new trampfd system call.
+ *
+ *	struct trampfd_info	info;
+ *
+ *	trampfd = syscall(430, &info);
+ *
+ * The kernel returns the following items in info:
+ *
+ * ntrampolines
+ *	The number of trampolines that can be created with one trampfd. The
+ *	user may create fewer trampolines if he wishes.
+ *
+ * code_size
+ *	The size of each trampoline.
+ *
+ * code_offset
+ *	The file offset to be used in mmap() to map the trampoline code.
+ */
+struct trampfd_info {
+	__u32		ntrampolines;
+	__u32		code_size;
+	__u32		code_offset;
+	__u32		reserved;
+};
+
+/* ----------------------- Trampoline Initialization ---------------------- */
+
+/*
+ * Trampoline code descriptor.
+ *
+ * ntrampolines
+ *	User specified number of trampolines. This number cannot exceed
+ *	info.ntrampolines.
+ *
+ * reg
+ *	User specified code register name. This is architecture specific and
+ *	can be obtained from ptrace.h.
+ *
+ * table
+ *	User array of code addresses, one address per trampoline.
+ *
+ */
+struct trampfd_code {
+	__u32		ntrampolines;
+	__u32		reg;
+	__u64		table;
+};
+
+/*
+ * Trampoline data descriptor.
+ *
+ * reg
+ *	User specified data register name. This is architecture specific and
+ *	can be obtained from ptrace.h.
+ *
+ * table
+ *	User array of data addresses, one address per trampoline.
+ *
+ */
+struct trampfd_data {
+	__u32		reg;
+	__u32		reserved;
+	__u64		table;
+};
+
+/*
+ * A trampfd is initialized in this manner:
+ *
+ *	struct trampfd_code	code;
+ *	struct trampfd_data	data;
+ *
+ *	code.ntrampolines = number of desired trampolines;
+ *	code.reg = code register name;
+ *	code.table = array of code addresses
+ *
+ *	data.reg = data register name;
+ *	data.table = array of data addresses
+ *
+ *	pwrite(trampfd, &code, sizeof(init), TRAMPFD_CODE);
+ *	pwrite(trampfd, &data, sizeof(init), TRAMPFD_DATA);
+ *
+ * It is recommended that the code descriptor and code array be placed in the
+ * .rodata section so that an attacker cannot modify its contents.
+ */
+
+/* ---------------------------- Trampoline mapping ------------------------ */
+
+/*
+ * The user uses mmap() to map the trampoline table into user address space.
+ *
+ *	len = info.code_size * code.ntrampolines;
+ *	prot = PROT_READ | PROT_EXEC;
+ *	flags = MAP_PRIVATE;
+ *	offset = info.code_offset;
+ *
+ *	trampoline_table = mmap(NULL, len, prot, flags, trampfd, offset);
+ *
+ * The kernel generates the trampoline table. The code for trampoline X in the
+ * table is:
+ *
+ *	load code_table[X] into code_reg
+ *	load data_table[X] into data_reg
+ *	jump code_reg
+ *
+ * The user manages the trampoline table. The address of trampoline X is:
+ *
+ *	trampoline_table + info.code_size * X;
+ *
+ * Prior to invoking trampoline X, the user must initialize code_table[X] and
+ * data_table[X].
+ */
+
+/* ------------------------- Symbolic offsets -------------------------- */
+
+/*
+ * trampfd can have different actions/parameters associated with it. Each one
+ * has a symbolic file offset. Action/Parameter structures are read or written
+ * at their file offsets.
+ *
+ * Offset		Operation	Data
+ * ------------------------------------------------------------------------
+ * TRAMPFD_CODE		Write		struct trampfd_code
+ * TRAMPFD_DATA		Write		struct trampfd_data
+ * ------------------------------------------------------------------------
+ */
+enum trampfd_offsets {
+	TRAMPFD_CODE,
+	TRAMPFD_DATA,
+	TRAMPFD_NUM_OFFSETS,
+};
+
+
+/* ----------------------------------------------------------------------- */
+
+#endif /* _UAPI_LINUX_TRAMPFD_H */
diff --git a/init/Kconfig b/init/Kconfig
index 0498af567f70..bb3ecca5b8e7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2313,3 +2313,10 @@ config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 # <asm/syscall_wrapper.h>.
 config ARCH_HAS_SYSCALL_WRAPPER
 	def_bool n
+
+config TRAMPFD
+	bool "Enable trampfd() system call"
+	depends on MMU
+	help
+	  Enable the trampfd() system call that allows a process to map
+	  kernel generated trampolines within its address space.
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..93c5972aba85 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -349,6 +349,9 @@ COND_SYSCALL(pkey_mprotect);
 COND_SYSCALL(pkey_alloc);
 COND_SYSCALL(pkey_free);
 
+/* Trampoline fd */
+COND_SYSCALL(trampfd);
+
 
 /*
  * Architecture specific weak syscall entries.
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v6 7/8] X.509: support OSCCA sm2-with-sm3 certificate verification
From: Gilad Ben-Yossef @ 2020-09-16  8:01 UTC (permalink / raw)
  To: Tianjia Zhang
  Cc: Herbert Xu, David S. Miller, David Howells, Maxime Coquelin,
	Alexandre Torgue, James Morris, Serge E. Hallyn, Stephan Mueller,
	Marcelo Henrique Cerri, Steven Rostedt (VMware), Masahiro Yamada,
	Brendan Higgins, Andrew Morton, Johannes Weiner, Waiman Long,
	Mimi Zohar, Lakshmi Ramasubramanian, Colin Ian King,
	Tushar Sugandhi, Vitaly Chikunov, Pascal van Leeuwen,
	Linux Crypto Mailing List, Linux kernel mailing list, keyrings,
	linux-stm32, Linux ARM, linux-security-module, Xufeng Zhang,
	Jia Zhang
In-Reply-To: <6f251e1e-42a0-7e6c-e0cd-51fba3150d17@linux.alibaba.com>

On Mon, Sep 14, 2020 at 9:34 AM Tianjia Zhang
<tianjia.zhang@linux.alibaba.com> wrote:
>
> Hi Gilad,
>
> On 9/13/20 3:12 PM, Gilad Ben-Yossef wrote:
> > Hi,
> >
> >
> > On Thu, Sep 3, 2020 at 4:13 PM Tianjia Zhang
> > <tianjia.zhang@linux.alibaba.com> wrote:
> >>
> >> The digital certificate format based on SM2 crypto algorithm as
> >> specified in GM/T 0015-2012. It was published by State Encryption
> >> Management Bureau, China.
> >>
> >> The method of generating Other User Information is defined as
> >> ZA=H256(ENTLA || IDA || a || b || xG || yG || xA || yA), it also
> >> specified in https://tools.ietf.org/html/draft-shen-sm2-ecdsa-02.
> >>
> >> The x509 certificate supports sm2-with-sm3 type certificate
> >> verification.  Because certificate verification requires ZA
> >> in addition to tbs data, ZA also depends on elliptic curve
> >> parameters and public key data, so you need to access tbs in sig
> >> and calculate ZA. Finally calculate the digest of the
> >> signature and complete the verification work. The calculation
> >> process of ZA is declared in specifications GM/T 0009-2012
> >> and GM/T 0003.2-2012.
> >>
> >> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
> >> Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
> >> ---
> >>   crypto/asymmetric_keys/Makefile          |  1 +
> >>   crypto/asymmetric_keys/public_key.c      |  6 +++
> >>   crypto/asymmetric_keys/public_key_sm2.c  | 61 ++++++++++++++++++++++++
> >>   crypto/asymmetric_keys/x509_public_key.c |  3 ++
> >>   include/crypto/public_key.h              | 15 ++++++
> >>   5 files changed, 86 insertions(+)
> >>   create mode 100644 crypto/asymmetric_keys/public_key_sm2.c
> >>
> >> diff --git a/crypto/asymmetric_keys/Makefile b/crypto/asymmetric_keys/Makefile
> >> index 28b91adba2ae..1a99ea5acb6b 100644
> >> --- a/crypto/asymmetric_keys/Makefile
> >> +++ b/crypto/asymmetric_keys/Makefile
> >> @@ -11,6 +11,7 @@ asymmetric_keys-y := \
> >>          signature.o
> >>
> >>   obj-$(CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE) += public_key.o
> >> +obj-$(CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE) += public_key_sm2.o
> >>   obj-$(CONFIG_ASYMMETRIC_TPM_KEY_SUBTYPE) += asym_tpm.o
> >>
> >>   #
> >> diff --git a/crypto/asymmetric_keys/public_key.c b/crypto/asymmetric_keys/public_key.c
> >> index d8410ffd7f12..1d0492098bbd 100644
> >> --- a/crypto/asymmetric_keys/public_key.c
> >> +++ b/crypto/asymmetric_keys/public_key.c
> >> @@ -299,6 +299,12 @@ int public_key_verify_signature(const struct public_key *pkey,
> >>          if (ret)
> >>                  goto error_free_key;
> >>
> >> +       if (strcmp(sig->pkey_algo, "sm2") == 0 && sig->data_size) {
> >> +               ret = cert_sig_digest_update(sig, tfm);
> >> +               if (ret)
> >> +                       goto error_free_key;
> >> +       }
> >> +
> >>          sg_init_table(src_sg, 2);
> >>          sg_set_buf(&src_sg[0], sig->s, sig->s_size);
> >>          sg_set_buf(&src_sg[1], sig->digest, sig->digest_size);
> >> diff --git a/crypto/asymmetric_keys/public_key_sm2.c b/crypto/asymmetric_keys/public_key_sm2.c
> >> new file mode 100644
> >> index 000000000000..7325cf21dbb4
> >> --- /dev/null
> >> +++ b/crypto/asymmetric_keys/public_key_sm2.c
> >> @@ -0,0 +1,61 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> +/*
> >> + * asymmetric public-key algorithm for SM2-with-SM3 certificate
> >> + * as specified by OSCCA GM/T 0003.1-2012 -- 0003.5-2012 SM2 and
> >> + * described at https://tools.ietf.org/html/draft-shen-sm2-ecdsa-02
> >> + *
> >> + * Copyright (c) 2020, Alibaba Group.
> >> + * Authors: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
> >> + */
> >> +
> >> +#include <crypto/sm3_base.h>
> >> +#include <crypto/sm2.h>
> >> +#include <crypto/public_key.h>
> >> +
> >> +#if IS_REACHABLE(CONFIG_CRYPTO_SM2)
> >> +
> >> +int cert_sig_digest_update(const struct public_key_signature *sig,
> >> +                               struct crypto_akcipher *tfm_pkey)
> >> +{
> >> +       struct crypto_shash *tfm;
> >> +       struct shash_desc *desc;
> >> +       size_t desc_size;
> >> +       unsigned char dgst[SM3_DIGEST_SIZE];
> >> +       int ret;
> >> +
> >> +       BUG_ON(!sig->data);
> >> +
> >> +       ret = sm2_compute_z_digest(tfm_pkey, SM2_DEFAULT_USERID,
> >> +                                       SM2_DEFAULT_USERID_LEN, dgst);
> >> +       if (ret)
> >> +               return ret;
> >> +
> >> +       tfm = crypto_alloc_shash(sig->hash_algo, 0, 0);
> >> +       if (IS_ERR(tfm))
> >> +               return PTR_ERR(tfm);
> >> +
> >> +       desc_size = crypto_shash_descsize(tfm) + sizeof(*desc);
> >> +       desc = kzalloc(desc_size, GFP_KERNEL);
> >> +       if (!desc)
> >> +               goto error_free_tfm;
> >> +
> >> +       desc->tfm = tfm;
> >> +
> >> +       ret = crypto_shash_init(desc);
> >> +       if (ret < 0)
> >> +               goto error_free_desc;
> >> +
> >> +       ret = crypto_shash_update(desc, dgst, SM3_DIGEST_SIZE);
> >> +       if (ret < 0)
> >> +               goto error_free_desc;
> >> +
> >> +       ret = crypto_shash_finup(desc, sig->data, sig->data_size, sig->digest);
> >
> > It looks like you are doing a separate init, update, finup every time
> > - I would consider using crypto_shash_digest() in one go.
> >
> > In fact, considering the fact that you are allocating a tfm just for
> > this use and then releasing it, I would consider switching to
> > crypto_shash_tfm_digest() and dropping the kzalloc all together.
> >
> > This should simplify the code a bit.
> >
> > Other than that I don't have anything smart to say :-)
> >
> > Gilad
> >
>
> The hash calculation here includes two parts of data, 'dgst' and
> 'sig->data'. The last call is 'finup()' not 'final()'. I understand that
> it should not be possible to use 'crypto_shash_tfm_digest()' This kind
> of function is simplified.
>
> If a new scope is added, the assignment of desc can be optimized, as
> follows:
> ```
> do {
>      SHASH_DESC_ON_STACK(desc, tfm);
>      desc->tfm = tfm;
>
>      /* ... */
> } while (0);
> ```
> However, the kernel code may not accept this style. What is your opinion?

No, you are right. I've indeed missed that it's a finup() and not a
final(). If the size of data was big enough it might have been worth
going to the async. hash interface and creating a scatter list for
this but I suspect it is not justified with the data sizes we are
dealing with there.

So:

Reviewed-by: Gilad Ben-Yossef <gilad@benyossef.com>

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

values of β will give rise to dom!

^ permalink raw reply

* [PATCH v38 15/24] x86/sgx: Enable provisioning for remote attestation
From: Jarkko Sakkinen @ 2020-09-15 11:28 UTC (permalink / raw)
  To: x86, linux-sgx
  Cc: linux-kernel, Jarkko Sakkinen, linux-security-module,
	Jethro Beekman, Darren Kenny, Andy Lutomirski, akpm,
	andriy.shevchenko, asapek, bp, cedric.xing, chenalexchen,
	conradparker, cyhanish, dave.hansen, haitao.huang, josh,
	kai.huang, kai.svahn, kmoy, ludloff, nhorman, npmccallum,
	puiterwijk, rientjes, sean.j.christopherson, tglx, yaozhangx
In-Reply-To: <20200915112842.897265-1-jarkko.sakkinen@linux.intel.com>

Provisioning Certification Enclave (PCE), the root of trust for other
enclaves, generates a signing key from a fused key called Provisioning
Certification Key. PCE can then use this key to certify an attestation key
of a Quoting Enclave (QE), e.g. we get the chain of trust down to the
hardware if the Intel signed PCE is used.

To use the needed keys, ATTRIBUTE.PROVISIONKEY is required but should be
only allowed for those who actually need it so that only the trusted
parties can certify QE's.

Obviously the attestation service should know the public key of the used
PCE and that way detect illegit attestation, but whitelisting the legit
users still adds an additional layer of defence.

Add new device file called /dev/sgx/provision. The sole purpose of this
file is to provide file descriptors that act as privilege tokens to allow
to build enclaves with ATTRIBUTE.PROVISIONKEY set. A new ioctl called
SGX_IOC_ENCLAVE_PROVISION is used to assign this token to an enclave.

Cc: linux-security-module@vger.kernel.org
Acked-by: Jethro Beekman <jethro@fortanix.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/include/uapi/asm/sgx.h  | 11 ++++++++
 arch/x86/kernel/cpu/sgx/driver.c | 18 ++++++++++++
 arch/x86/kernel/cpu/sgx/driver.h |  2 ++
 arch/x86/kernel/cpu/sgx/ioctl.c  | 47 ++++++++++++++++++++++++++++++++
 4 files changed, 78 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 7729730d8580..d0916fb9629e 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -25,6 +25,8 @@ enum sgx_page_flags {
 	_IOWR(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages)
 #define SGX_IOC_ENCLAVE_INIT \
 	_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
+#define SGX_IOC_ENCLAVE_PROVISION \
+	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -61,4 +63,13 @@ struct sgx_enclave_init {
 	__u64 sigstruct;
 };
 
+/**
+ * struct sgx_enclave_provision - parameter structure for the
+ *				  %SGX_IOC_ENCLAVE_PROVISION ioctl
+ * @attribute_fd:	file handle of the attribute file in the securityfs
+ */
+struct sgx_enclave_provision {
+	__u64 attribute_fd;
+};
+
 #endif /* _UAPI_ASM_X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 7bdb49dfcca6..d01b28f7ce4a 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -134,6 +134,10 @@ static const struct file_operations sgx_encl_fops = {
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
+const struct file_operations sgx_provision_fops = {
+	.owner			= THIS_MODULE,
+};
+
 static struct miscdevice sgx_dev_enclave = {
 	.minor = MISC_DYNAMIC_MINOR,
 	.name = "enclave",
@@ -141,6 +145,13 @@ static struct miscdevice sgx_dev_enclave = {
 	.fops = &sgx_encl_fops,
 };
 
+static struct miscdevice sgx_dev_provision = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "provision",
+	.nodename = "sgx/provision",
+	.fops = &sgx_provision_fops,
+};
+
 int __init sgx_drv_init(void)
 {
 	unsigned int eax, ebx, ecx, edx;
@@ -181,5 +192,12 @@ int __init sgx_drv_init(void)
 		return ret;
 	}
 
+	ret = misc_register(&sgx_dev_provision);
+	if (ret) {
+		pr_err("Creating /dev/sgx/provision failed with %d.\n", ret);
+		misc_deregister(&sgx_dev_enclave);
+		return ret;
+	}
+
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
index e4063923115b..72747d01c046 100644
--- a/arch/x86/kernel/cpu/sgx/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -23,6 +23,8 @@ extern u64 sgx_attributes_reserved_mask;
 extern u64 sgx_xfrm_reserved_mask;
 extern u32 sgx_xsave_size_tbl[64];
 
+extern const struct file_operations sgx_provision_fops;
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
 
 int sgx_drv_init(void);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index de2ed4f35ffb..4227bca7b477 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -673,6 +673,50 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
+/**
+ * sgx_ioc_enclave_set_attribute - handler for %SGX_IOC_ENCLAVE_PROVISION
+ * @filep:	open file to /dev/sgx
+ * @arg:	userspace pointer to a struct sgx_enclave_provision instance
+ *
+ * Mark the enclave as being allowed to access a restricted attribute bit.
+ * The requested attribute is specified via the attribute_fd field in the
+ * provided struct sgx_enclave_provision.  The attribute_fd must be a
+ * handle to an SGX attribute file, e.g. "/dev/sgx/provision".
+ *
+ * Failure to explicitly request access to a restricted attribute will cause
+ * sgx_ioc_enclave_init() to fail.  Currently, the only restricted attribute
+ * is access to the PROVISION_KEY.
+ *
+ * Note, access to the EINITTOKEN_KEY is disallowed entirely.
+ *
+ * Return: 0 on success, -errno otherwise
+ */
+static long sgx_ioc_enclave_provision(struct sgx_encl *encl,
+					  void __user *arg)
+{
+	struct sgx_enclave_provision params;
+	struct file *attribute_file;
+	int ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	attribute_file = fget(params.attribute_fd);
+	if (!attribute_file)
+		return -EINVAL;
+
+	if (attribute_file->f_op != &sgx_provision_fops) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	encl->attributes |= SGX_ATTR_PROVISIONKEY;
+	ret = 0;
+
+out:
+	fput(attribute_file);
+	return ret;
+}
 
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
@@ -698,6 +742,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_INIT:
 		ret = sgx_ioc_enclave_init(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_PROVISION:
+		ret = sgx_ioc_enclave_provision(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
-- 
2.25.1


^ permalink raw reply related

* [PATCH v4] certs: Add EFI_CERT_X509_GUID support for dbx entries
From: Eric Snowberg @ 2020-09-16  0:49 UTC (permalink / raw)
  To: dhowells, dwmw2, jarkko.sakkinen
  Cc: herbert, davem, jmorris, serge, nayna, zohar, eric.snowberg,
	erichte, mpe, keyrings, linux-kernel, linux-crypto,
	linux-security-module

The Secure Boot Forbidden Signature Database, dbx, contains a list of now
revoked signatures and keys previously approved to boot with UEFI Secure
Boot enabled.  The dbx is capable of containing any number of
EFI_CERT_X509_SHA256_GUID, EFI_CERT_SHA256_GUID, and EFI_CERT_X509_GUID
entries.

Currently when EFI_CERT_X509_GUID are contained in the dbx, the entries are
skipped.

Add support for EFI_CERT_X509_GUID dbx entries. When a EFI_CERT_X509_GUID
is found, it is added as an asymmetrical key to the .blacklist keyring.
Anytime the .platform keyring is used, the keys in the .blacklist keyring
are referenced, if a matching key is found, the key will be rejected.

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
---
v4:
Remove unneeded symbol export found by Jarkko Sakkinen

v3:
Fixed an issue when CONFIG_PKCS7_MESSAGE_PARSER is not builtin and defined
as a module instead, pointed out by Randy Dunlap

v2: 
Fixed build issue reported by kernel test robot <lkp@intel.com>
Commit message update (suggested by Jarkko Sakkinen)
---
 certs/blacklist.c                             | 32 +++++++++++++++++++
 certs/blacklist.h                             | 12 +++++++
 certs/system_keyring.c                        |  6 ++++
 include/keys/system_keyring.h                 | 11 +++++++
 .../platform_certs/keyring_handler.c          | 11 +++++++
 5 files changed, 72 insertions(+)

diff --git a/certs/blacklist.c b/certs/blacklist.c
index 6514f9ebc943..4adac7f8fd94 100644
--- a/certs/blacklist.c
+++ b/certs/blacklist.c
@@ -100,6 +100,38 @@ int mark_hash_blacklisted(const char *hash)
 	return 0;
 }
 
+int mark_key_revocationlisted(const char *data, size_t size)
+{
+	key_ref_t key;
+
+	key = key_create_or_update(make_key_ref(blacklist_keyring, true),
+				   "asymmetric",
+				   NULL,
+				   data,
+				   size,
+				   ((KEY_POS_ALL & ~KEY_POS_SETATTR) | KEY_USR_VIEW),
+				   KEY_ALLOC_NOT_IN_QUOTA | KEY_ALLOC_BUILT_IN);
+
+	if (IS_ERR(key)) {
+		pr_err("Problem with revocation key (%ld)\n", PTR_ERR(key));
+		return PTR_ERR(key);
+	}
+
+	return 0;
+}
+
+int is_key_revocationlisted(struct pkcs7_message *pkcs7)
+{
+	int ret;
+
+	ret = validate_trust(pkcs7, blacklist_keyring);
+
+	if (ret == 0)
+		return -EKEYREJECTED;
+
+	return -ENOKEY;
+}
+
 /**
  * is_hash_blacklisted - Determine if a hash is blacklisted
  * @hash: The hash to be checked as a binary blob
diff --git a/certs/blacklist.h b/certs/blacklist.h
index 1efd6fa0dc60..420bb7c86e07 100644
--- a/certs/blacklist.h
+++ b/certs/blacklist.h
@@ -1,3 +1,15 @@
 #include <linux/kernel.h>
+#include <linux/errno.h>
+#include <crypto/pkcs7.h>
 
 extern const char __initconst *const blacklist_hashes[];
+
+#ifdef CONFIG_INTEGRITY_PLATFORM_KEYRING
+#define validate_trust pkcs7_validate_trust
+#else
+static inline int validate_trust(struct pkcs7_message *pkcs7,
+				 struct key *trust_keyring)
+{
+	return -ENOKEY;
+}
+#endif
diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index 798291177186..f8ea96219155 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -241,6 +241,12 @@ int verify_pkcs7_message_sig(const void *data, size_t len,
 			pr_devel("PKCS#7 platform keyring is not available\n");
 			goto error;
 		}
+
+		ret = is_key_revocationlisted(pkcs7);
+		if (ret != -ENOKEY) {
+			pr_devel("PKCS#7 platform key revocationlisted\n");
+			goto error;
+		}
 	}
 	ret = pkcs7_validate_trust(pkcs7, trusted_keys);
 	if (ret < 0) {
diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h
index fb8b07daa9d1..b6991cfe1b6d 100644
--- a/include/keys/system_keyring.h
+++ b/include/keys/system_keyring.h
@@ -31,11 +31,14 @@ extern int restrict_link_by_builtin_and_secondary_trusted(
 #define restrict_link_by_builtin_and_secondary_trusted restrict_link_by_builtin_trusted
 #endif
 
+extern struct pkcs7_message *pkcs7;
 #ifdef CONFIG_SYSTEM_BLACKLIST_KEYRING
 extern int mark_hash_blacklisted(const char *hash);
+extern int mark_key_revocationlisted(const char *data, size_t size);
 extern int is_hash_blacklisted(const u8 *hash, size_t hash_len,
 			       const char *type);
 extern int is_binary_blacklisted(const u8 *hash, size_t hash_len);
+extern int is_key_revocationlisted(struct pkcs7_message *pkcs7);
 #else
 static inline int is_hash_blacklisted(const u8 *hash, size_t hash_len,
 				      const char *type)
@@ -47,6 +50,14 @@ static inline int is_binary_blacklisted(const u8 *hash, size_t hash_len)
 {
 	return 0;
 }
+static inline int mark_key_revocationlisted(const char *data, size_t size)
+{
+	return 0;
+}
+static inline int is_key_revocationlisted(struct pkcs7_message *pkcs7)
+{
+	return -ENOKEY;
+}
 #endif
 
 #ifdef CONFIG_IMA_BLACKLIST_KEYRING
diff --git a/security/integrity/platform_certs/keyring_handler.c b/security/integrity/platform_certs/keyring_handler.c
index c5ba695c10e3..cc5a43804bc4 100644
--- a/security/integrity/platform_certs/keyring_handler.c
+++ b/security/integrity/platform_certs/keyring_handler.c
@@ -55,6 +55,15 @@ static __init void uefi_blacklist_binary(const char *source,
 	uefi_blacklist_hash(source, data, len, "bin:", 4);
 }
 
+/*
+ * Revocationlist the X509 cert
+ */
+static __init void uefi_revocationlist_x509(const char *source,
+					    const void *data, size_t len)
+{
+	mark_key_revocationlisted(data, len);
+}
+
 /*
  * Return the appropriate handler for particular signature list types found in
  * the UEFI db and MokListRT tables.
@@ -76,5 +85,7 @@ __init efi_element_handler_t get_handler_for_dbx(const efi_guid_t *sig_type)
 		return uefi_blacklist_x509_tbs;
 	if (efi_guidcmp(*sig_type, efi_cert_sha256_guid) == 0)
 		return uefi_blacklist_binary;
+	if (efi_guidcmp(*sig_type, efi_cert_x509_guid) == 0)
+		return uefi_revocationlist_x509;
 	return 0;
 }
-- 
2.18.1


^ permalink raw reply related

* [PATCH v38 11/24] x86/sgx: Add SGX enclave driver
From: Jarkko Sakkinen @ 2020-09-15 11:28 UTC (permalink / raw)
  To: x86, linux-sgx
  Cc: linux-kernel, Jarkko Sakkinen, linux-security-module, linux-mm,
	Andrew Morton, Matthew Wilcox, Jethro Beekman, Haitao Huang,
	Chunyang Hui, Jordan Hand, Nathaniel McCallum, Seth Moore,
	Darren Kenny, Sean Christopherson, Suresh Siddha,
	andriy.shevchenko, asapek, bp, cedric.xing, chenalexchen,
	conradparker, cyhanish, dave.hansen, haitao.huang, josh,
	kai.huang, kai.svahn, kmoy, ludloff, luto, nhorman, puiterwijk,
	rientjes, tglx, yaozhangx
In-Reply-To: <20200915112842.897265-1-jarkko.sakkinen@linux.intel.com>

Intel Software Guard eXtensions (SGX) is a set of CPU instructions that can
be used by applications to set aside private regions of code and data. The
code outside the SGX hosted software entity is prevented from accessing the
memory inside the enclave by the CPU. We call these entities as enclaves.

Add a driver that provides an ioctl API to construct and run enclaves.
Enclaves are constructed from pages residing in reserved physical memory
areas. The contents of these pages can only be accessed when they are
mapped as part of an enclave, by a hardware thread running inside the
enclave.

The starting state of an enclave consists of a fixed measured set of
pages that are copied to the EPC during the construction process by
using the opcode ENCLS leaf functions and Software Enclave Control
Structure (SECS) that defines the enclave properties.

Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and
EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
the EPC and EINIT checks a given signed measurement and moves the enclave
into a state ready for execution.

An initialized enclave can only be accessed through special Thread Control
Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER.  This leaf
function converts a thread into enclave mode and continues the execution in
the offset defined by the TCS provided to EENTER. An enclave is exited
through syscall, exception, interrupts or by explicitly calling another
ENCLU leaf EEXIT.

The mmap() permissions are capped by the contained enclave page
permissions. The mapped areas must also be populated, i.e. each page
address must contain a page. This logic is implemented in
sgx_encl_may_map().

Cc: linux-security-module@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Haitao Huang <haitao.huang@linux.intel.com>
Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>
Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
Tested-by: Seth Moore <sethmo@google.com>
Tested-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/Makefile |   2 +
 arch/x86/kernel/cpu/sgx/driver.c | 173 ++++++++++++++++
 arch/x86/kernel/cpu/sgx/driver.h |  29 +++
 arch/x86/kernel/cpu/sgx/encl.c   | 331 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h   |  85 ++++++++
 arch/x86/kernel/cpu/sgx/main.c   |  11 +
 6 files changed, 631 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.h

diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 79510ce01b3b..3fc451120735 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,2 +1,4 @@
 obj-y += \
+	driver.o \
+	encl.o \
 	main.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
new file mode 100644
index 000000000000..f54da5f19c2b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/acpi.h>
+#include <linux/miscdevice.h>
+#include <linux/mman.h>
+#include <linux/security.h>
+#include <linux/suspend.h>
+#include <asm/traps.h>
+#include "driver.h"
+#include "encl.h"
+
+u64 sgx_encl_size_max_32;
+u64 sgx_encl_size_max_64;
+u32 sgx_misc_reserved_mask;
+u64 sgx_attributes_reserved_mask;
+u64 sgx_xfrm_reserved_mask = ~0x3;
+u32 sgx_xsave_size_tbl[64];
+
+static int sgx_open(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl;
+	int ret;
+
+	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
+	if (!encl)
+		return -ENOMEM;
+
+	atomic_set(&encl->flags, 0);
+	kref_init(&encl->refcount);
+	xa_init(&encl->page_array);
+	mutex_init(&encl->lock);
+	INIT_LIST_HEAD(&encl->mm_list);
+	spin_lock_init(&encl->mm_lock);
+
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}
+
+	file->private_data = encl;
+
+	return 0;
+}
+
+static int sgx_release(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl = file->private_data;
+	struct sgx_encl_mm *encl_mm;
+
+	for ( ; ; )  {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
+		}
+
+		spin_unlock(&encl->mm_lock);
+
+		/* The list is empty, ready to go. */
+		if (!encl_mm)
+			break;
+
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
+	}
+
+	mutex_lock(&encl->lock);
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+	mutex_unlock(&encl->lock);
+
+	kref_put(&encl->refcount, sgx_encl_release);
+	return 0;
+}
+
+static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = file->private_data;
+	int ret;
+
+	ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end, vma->vm_flags);
+	if (ret)
+		return ret;
+
+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;
+
+	vma->vm_ops = &sgx_vm_ops;
+	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+	vma->vm_private_data = encl;
+
+	return 0;
+}
+
+static unsigned long sgx_get_unmapped_area(struct file *file,
+					   unsigned long addr,
+					   unsigned long len,
+					   unsigned long pgoff,
+					   unsigned long flags)
+{
+	if ((flags & MAP_TYPE) == MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
+static const struct file_operations sgx_encl_fops = {
+	.owner			= THIS_MODULE,
+	.open			= sgx_open,
+	.release		= sgx_release,
+	.mmap			= sgx_mmap,
+	.get_unmapped_area	= sgx_get_unmapped_area,
+};
+
+static struct miscdevice sgx_dev_enclave = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "enclave",
+	.nodename = "sgx/enclave",
+	.fops = &sgx_encl_fops,
+};
+
+int __init sgx_drv_init(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	u64 attr_mask, xfrm_mask;
+	int ret;
+	int i;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		pr_info("The public key MSRs are not writable.\n");
+		return -ENODEV;
+	}
+
+	cpuid_count(SGX_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	sgx_misc_reserved_mask = ~ebx | SGX_MISC_RESERVED_MASK;
+	sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
+	sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
+
+	cpuid_count(SGX_CPUID, 1, &eax, &ebx, &ecx, &edx);
+
+	attr_mask = (((u64)ebx) << 32) + (u64)eax;
+	sgx_attributes_reserved_mask = ~attr_mask | SGX_ATTR_RESERVED_MASK;
+
+	if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+		xfrm_mask = (((u64)edx) << 32) + (u64)ecx;
+
+		for (i = 2; i < 64; i++) {
+			cpuid_count(0x0D, i, &eax, &ebx, &ecx, &edx);
+			if ((1UL << i) & xfrm_mask)
+				sgx_xsave_size_tbl[i] = eax + ebx;
+		}
+
+		sgx_xfrm_reserved_mask = ~xfrm_mask;
+	}
+
+	ret = misc_register(&sgx_dev_enclave);
+	if (ret) {
+		pr_err("Creating /dev/sgx/enclave failed with %d.\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
new file mode 100644
index 000000000000..f7ce40dedc91
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef __ARCH_SGX_DRIVER_H__
+#define __ARCH_SGX_DRIVER_H__
+
+#include <crypto/hash.h>
+#include <linux/kref.h>
+#include <linux/mmu_notifier.h>
+#include <linux/radix-tree.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include "sgx.h"
+
+#define SGX_EINIT_SPIN_COUNT	20
+#define SGX_EINIT_SLEEP_COUNT	50
+#define SGX_EINIT_SLEEP_TIME	20
+
+extern u64 sgx_encl_size_max_32;
+extern u64 sgx_encl_size_max_64;
+extern u32 sgx_misc_reserved_mask;
+extern u64 sgx_attributes_reserved_mask;
+extern u64 sgx_xfrm_reserved_mask;
+extern u32 sgx_xsave_size_tbl[64];
+
+long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
+
+int sgx_drv_init(void);
+
+#endif /* __ARCH_X86_SGX_DRIVER_H__ */
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
new file mode 100644
index 000000000000..c2c4a77af36b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/lockdep.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/shmem_fs.h>
+#include <linux/suspend.h>
+#include <linux/sched/mm.h>
+#include "arch.h"
+#include "encl.h"
+#include "encls.h"
+#include "sgx.h"
+
+static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+						unsigned long addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned int flags;
+
+	/* If process was forked, VMA is still there but vm_private_data is set
+	 * to NULL.
+	 */
+	if (!encl)
+		return ERR_PTR(-EFAULT);
+
+	flags = atomic_read(&encl->flags);
+	if ((flags & SGX_ENCL_DEAD) || !(flags & SGX_ENCL_INITIALIZED))
+		return ERR_PTR(-EFAULT);
+
+	entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+	if (!entry)
+		return ERR_PTR(-EFAULT);
+
+	/* Page is already resident in the EPC. */
+	if (entry->epc_page)
+		return entry;
+
+	return ERR_PTR(-EFAULT);
+}
+
+static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
+	struct sgx_encl_mm *tmp = NULL;
+
+	/*
+	 * The enclave itself can remove encl_mm.  Note, objects can't be moved
+	 * off an RCU protected list, but deletion is ok.
+	 */
+	spin_lock(&encl_mm->encl->mm_lock);
+	list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
+		if (tmp == encl_mm) {
+			list_del_rcu(&encl_mm->list);
+			break;
+		}
+	}
+	spin_unlock(&encl_mm->encl->mm_lock);
+
+	if (tmp == encl_mm) {
+		synchronize_srcu(&encl_mm->encl->srcu);
+		mmu_notifier_put(mn);
+	}
+}
+
+static void sgx_mmu_notifier_free(struct mmu_notifier *mn)
+{
+	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
+
+	kfree(encl_mm);
+}
+
+static const struct mmu_notifier_ops sgx_mmu_notifier_ops = {
+	.release		= sgx_mmu_notifier_release,
+	.free_notifier		= sgx_mmu_notifier_free,
+};
+
+static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
+					    struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = NULL;
+	struct sgx_encl_mm *tmp;
+	int idx;
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
+		if (tmp->mm == mm) {
+			encl_mm = tmp;
+			break;
+		}
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return encl_mm;
+}
+
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm;
+	int ret;
+
+	/* mm_list can be accessed only by a single thread at a time. */
+	mmap_assert_write_locked(mm);
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
+		return -EINVAL;
+
+	/*
+	 * mm_structs are kept on mm_list until the mm or the enclave dies,
+	 * i.e. once an mm is off the list, it's gone for good, therefore it's
+	 * impossible to get a false positive on @mm due to a stale mm_list.
+	 */
+	if (sgx_encl_find_mm(encl, mm))
+		return 0;
+
+	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
+	if (!encl_mm)
+		return -ENOMEM;
+
+	encl_mm->encl = encl;
+	encl_mm->mm = mm;
+	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
+
+	ret = __mmu_notifier_register(&encl_mm->mmu_notifier, mm);
+	if (ret) {
+		kfree(encl_mm);
+		return ret;
+	}
+
+	spin_lock(&encl->mm_lock);
+	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	spin_unlock(&encl->mm_lock);
+
+	return 0;
+}
+
+static void sgx_vma_open(struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = vma->vm_private_data;
+
+	if (!encl)
+		return;
+
+	if (sgx_encl_mm_add(encl, vma->vm_mm))
+		vma->vm_private_data = NULL;
+}
+
+static unsigned int sgx_vma_fault(struct vm_fault *vmf)
+{
+	unsigned long addr = (unsigned long)vmf->address;
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_encl *encl = vma->vm_private_data;
+	struct sgx_encl_page *entry;
+	int ret = VM_FAULT_NOPAGE;
+	unsigned long pfn;
+
+	if (!encl)
+		return VM_FAULT_SIGBUS;
+
+	mutex_lock(&encl->lock);
+
+	entry = sgx_encl_load_page(encl, addr);
+	if (IS_ERR(entry)) {
+		if (unlikely(PTR_ERR(entry) != -EBUSY))
+			ret = VM_FAULT_SIGBUS;
+
+		goto out;
+	}
+
+	if (!follow_pfn(vma, addr, &pfn))
+		goto out;
+
+	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(entry->epc_page->desc));
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
+/**
+ * sgx_encl_may_map() - Check if a requested VMA mapping is allowed
+ * @encl:		an enclave pointer
+ * @start:		lower bound of the address range, inclusive
+ * @end:		upper bound of the address range, exclusive
+ * @vm_prot_bits:	requested protections of the address range
+ *
+ * Iterate through the enclave pages contained within [@start, @end) to verify
+ * the permissions requested by @vm_prot_bits do not exceed that of any enclave
+ * page to be mapped.
+ *
+ * Return:
+ *   0 on success,
+ *   -EACCES if VMA permissions exceed enclave page permissions
+ */
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_flags)
+{
+	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+	unsigned long idx_start = PFN_DOWN(start);
+	unsigned long idx_end = PFN_DOWN(end - 1);
+	struct sgx_encl_page *page;
+
+	XA_STATE(xas, &encl->page_array, idx_start);
+
+	/*
+	 * Disallow READ_IMPLIES_EXEC tasks as their VMA permissions might
+	 * conflict with the enclave page permissions.
+	 */
+	if (current->personality & READ_IMPLIES_EXEC)
+		return -EACCES;
+
+	xas_for_each(&xas, page, idx_end)
+		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
+			return -EACCES;
+
+	return 0;
+}
+
+static int sgx_vma_mprotect(struct vm_area_struct *vma,
+			    struct vm_area_struct **pprev, unsigned long start,
+			    unsigned long end, unsigned long newflags)
+{
+	int ret;
+
+	ret = sgx_encl_may_map(vma->vm_private_data, start, end, newflags);
+	if (ret)
+		return ret;
+
+	return mprotect_fixup(vma, pprev, start, end, newflags);
+}
+
+const struct vm_operations_struct sgx_vm_ops = {
+	.open = sgx_vma_open,
+	.fault = sgx_vma_fault,
+	.mprotect = sgx_vma_mprotect,
+};
+
+/**
+ * sgx_encl_find - find an enclave
+ * @mm:		mm struct of the current process
+ * @addr:	address in the ELRANGE
+ * @vma:	the resulting VMA
+ *
+ * Find an enclave identified by the given address. Give back a VMA that is
+ * part of the enclave and located in that address. The VMA is given back if it
+ * is a proper enclave VMA even if an &sgx_encl instance does not exist yet
+ * (enclave creation has not been performed).
+ *
+ * Return:
+ *   0 on success,
+ *   -EINVAL if an enclave was not found,
+ *   -ENOENT if the enclave has not been created yet
+ */
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma)
+{
+	struct vm_area_struct *result;
+	struct sgx_encl *encl;
+
+	result = find_vma(mm, addr);
+	if (!result || result->vm_ops != &sgx_vm_ops || addr < result->vm_start)
+		return -EINVAL;
+
+	encl = result->vm_private_data;
+	*vma = result;
+
+	return encl ? 0 : -ENOENT;
+}
+
+/**
+ * sgx_encl_destroy() - destroy enclave resources
+ * @encl:	an enclave pointer
+ */
+void sgx_encl_destroy(struct sgx_encl *encl)
+{
+	struct sgx_encl_page *entry;
+	unsigned long index;
+
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+
+	xa_for_each(&encl->page_array, index, entry) {
+		if (entry->epc_page) {
+			sgx_free_epc_page(entry->epc_page);
+			encl->secs_child_cnt--;
+			entry->epc_page = NULL;
+		}
+
+		kfree(entry);
+	}
+
+	xa_destroy(&encl->page_array);
+
+	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_free_epc_page(encl->secs.epc_page);
+		encl->secs.epc_page = NULL;
+	}
+}
+
+/**
+ * sgx_encl_release - Destroy an enclave instance
+ * @kref:	address of a kref inside &sgx_encl
+ *
+ * Used together with kref_put(). Frees all the resources associated with the
+ * enclave and the instance itself.
+ */
+void sgx_encl_release(struct kref *ref)
+{
+	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
+
+	sgx_encl_destroy(encl);
+
+	if (encl->backing)
+		fput(encl->backing);
+
+	cleanup_srcu_struct(&encl->srcu);
+
+	WARN_ON_ONCE(!list_empty(&encl->mm_list));
+
+	/* Detect EPC page leak's. */
+	WARN_ON_ONCE(encl->secs_child_cnt);
+	WARN_ON_ONCE(encl->secs.epc_page);
+
+	kfree(encl);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
new file mode 100644
index 000000000000..8ff445476657
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+#ifndef _X86_ENCL_H
+#define _X86_ENCL_H
+
+#include <linux/cpumask.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mm_types.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/srcu.h>
+#include <linux/workqueue.h>
+#include <linux/xarray.h>
+#include "sgx.h"
+
+/**
+ * enum sgx_encl_page_desc - defines bits for an enclave page's descriptor
+ * %SGX_ENCL_PAGE_ADDR_MASK:		Holds the virtual address of the page.
+ *
+ * The page address for SECS is zero and is used by the subsystem to recognize
+ * the SECS page.
+ */
+enum sgx_encl_page_desc {
+	/* Bits 11:3 are available when the page is not swapped. */
+	SGX_ENCL_PAGE_ADDR_MASK		= PAGE_MASK,
+};
+
+#define SGX_ENCL_PAGE_ADDR(page) \
+	((page)->desc & SGX_ENCL_PAGE_ADDR_MASK)
+
+struct sgx_encl_page {
+	unsigned long desc;
+	unsigned long vm_max_prot_bits;
+	struct sgx_epc_page *epc_page;
+	struct sgx_encl *encl;
+};
+
+enum sgx_encl_flags {
+	SGX_ENCL_CREATED	= BIT(0),
+	SGX_ENCL_INITIALIZED	= BIT(1),
+	SGX_ENCL_DEBUG		= BIT(2),
+	SGX_ENCL_DEAD		= BIT(3),
+	SGX_ENCL_IOCTL		= BIT(4),
+};
+
+struct sgx_encl_mm {
+	struct sgx_encl *encl;
+	struct mm_struct *mm;
+	struct list_head list;
+	struct mmu_notifier mmu_notifier;
+};
+
+struct sgx_encl {
+	atomic_t flags;
+	unsigned int page_cnt;
+	unsigned int secs_child_cnt;
+	struct mutex lock;
+	struct list_head mm_list;
+	spinlock_t mm_lock;
+	struct file *backing;
+	struct kref refcount;
+	struct srcu_struct srcu;
+	unsigned long base;
+	unsigned long size;
+	unsigned long ssaframesize;
+	struct xarray page_array;
+	struct sgx_encl_page secs;
+	cpumask_t cpumask;
+};
+
+extern const struct vm_operations_struct sgx_vm_ops;
+
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma);
+void sgx_encl_destroy(struct sgx_encl *encl);
+void sgx_encl_release(struct kref *ref);
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_flags);
+
+#endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 97c6895fb6c9..4137254fb29e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -9,6 +9,8 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include "driver.h"
+#include "encl.h"
 #include "encls.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
@@ -260,6 +262,8 @@ static bool __init sgx_page_cache_init(void)
 
 static void __init sgx_init(void)
 {
+	int ret;
+
 	if (!boot_cpu_has(X86_FEATURE_SGX))
 		return;
 
@@ -269,8 +273,15 @@ static void __init sgx_init(void)
 	if (!sgx_page_reclaimer_init())
 		goto err_page_cache;
 
+	ret = sgx_drv_init();
+	if (ret)
+		goto err_kthread;
+
 	return;
 
+err_kthread:
+	kthread_stop(ksgxswapd_tsk);
+
 err_page_cache:
 	sgx_page_cache_teardown();
 }
-- 
2.25.1


^ permalink raw reply related

* [PATCH v38 11/24] x86/sgx: Add SGX enclave driver
From: Jarkko Sakkinen @ 2020-09-15 11:05 UTC (permalink / raw)
  To: x86, linux-sgx
  Cc: linux-kernel, Jarkko Sakkinen, linux-security-module, linux-mm,
	Andrew Morton, Matthew Wilcox, Jethro Beekman, Haitao Huang,
	Chunyang Hui, Jordan Hand, Nathaniel McCallum, Seth Moore,
	Darren Kenny, Sean Christopherson, Suresh Siddha,
	andriy.shevchenko, asapek, bp, cedric.xing, chenalexchen,
	conradparker, cyhanish, dave.hansen, haitao.huang, josh,
	kai.huang, kai.svahn, kmoy, ludloff, luto, nhorman, puiterwijk,
	rientjes, tglx, yaozhangx
In-Reply-To: <20200915110522.893152-1-jarkko.sakkinen@linux.intel.com>

Intel Software Guard eXtensions (SGX) is a set of CPU instructions that can
be used by applications to set aside private regions of code and data. The
code outside the SGX hosted software entity is prevented from accessing the
memory inside the enclave by the CPU. We call these entities as enclaves.

Add a driver that provides an ioctl API to construct and run enclaves.
Enclaves are constructed from pages residing in reserved physical memory
areas. The contents of these pages can only be accessed when they are
mapped as part of an enclave, by a hardware thread running inside the
enclave.

The starting state of an enclave consists of a fixed measured set of
pages that are copied to the EPC during the construction process by
using the opcode ENCLS leaf functions and Software Enclave Control
Structure (SECS) that defines the enclave properties.

Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and
EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
the EPC and EINIT checks a given signed measurement and moves the enclave
into a state ready for execution.

An initialized enclave can only be accessed through special Thread Control
Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER.  This leaf
function converts a thread into enclave mode and continues the execution in
the offset defined by the TCS provided to EENTER. An enclave is exited
through syscall, exception, interrupts or by explicitly calling another
ENCLU leaf EEXIT.

The mmap() permissions are capped by the contained enclave page
permissions. The mapped areas must also be populated, i.e. each page
address must contain a page. This logic is implemented in
sgx_encl_may_map().

Cc: linux-security-module@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Haitao Huang <haitao.huang@linux.intel.com>
Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>
Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
Tested-by: Seth Moore <sethmo@google.com>
Tested-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/Makefile |   2 +
 arch/x86/kernel/cpu/sgx/driver.c | 173 ++++++++++++++++
 arch/x86/kernel/cpu/sgx/driver.h |  29 +++
 arch/x86/kernel/cpu/sgx/encl.c   | 331 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h   |  85 ++++++++
 arch/x86/kernel/cpu/sgx/main.c   |  11 +
 6 files changed, 631 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.h

diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 79510ce01b3b..3fc451120735 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,2 +1,4 @@
 obj-y += \
+	driver.o \
+	encl.o \
 	main.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
new file mode 100644
index 000000000000..f54da5f19c2b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/acpi.h>
+#include <linux/miscdevice.h>
+#include <linux/mman.h>
+#include <linux/security.h>
+#include <linux/suspend.h>
+#include <asm/traps.h>
+#include "driver.h"
+#include "encl.h"
+
+u64 sgx_encl_size_max_32;
+u64 sgx_encl_size_max_64;
+u32 sgx_misc_reserved_mask;
+u64 sgx_attributes_reserved_mask;
+u64 sgx_xfrm_reserved_mask = ~0x3;
+u32 sgx_xsave_size_tbl[64];
+
+static int sgx_open(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl;
+	int ret;
+
+	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
+	if (!encl)
+		return -ENOMEM;
+
+	atomic_set(&encl->flags, 0);
+	kref_init(&encl->refcount);
+	xa_init(&encl->page_array);
+	mutex_init(&encl->lock);
+	INIT_LIST_HEAD(&encl->mm_list);
+	spin_lock_init(&encl->mm_lock);
+
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}
+
+	file->private_data = encl;
+
+	return 0;
+}
+
+static int sgx_release(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl = file->private_data;
+	struct sgx_encl_mm *encl_mm;
+
+	for ( ; ; )  {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
+		}
+
+		spin_unlock(&encl->mm_lock);
+
+		/* The list is empty, ready to go. */
+		if (!encl_mm)
+			break;
+
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
+	}
+
+	mutex_lock(&encl->lock);
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+	mutex_unlock(&encl->lock);
+
+	kref_put(&encl->refcount, sgx_encl_release);
+	return 0;
+}
+
+static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = file->private_data;
+	int ret;
+
+	ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end, vma->vm_flags);
+	if (ret)
+		return ret;
+
+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;
+
+	vma->vm_ops = &sgx_vm_ops;
+	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+	vma->vm_private_data = encl;
+
+	return 0;
+}
+
+static unsigned long sgx_get_unmapped_area(struct file *file,
+					   unsigned long addr,
+					   unsigned long len,
+					   unsigned long pgoff,
+					   unsigned long flags)
+{
+	if ((flags & MAP_TYPE) == MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
+static const struct file_operations sgx_encl_fops = {
+	.owner			= THIS_MODULE,
+	.open			= sgx_open,
+	.release		= sgx_release,
+	.mmap			= sgx_mmap,
+	.get_unmapped_area	= sgx_get_unmapped_area,
+};
+
+static struct miscdevice sgx_dev_enclave = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "enclave",
+	.nodename = "sgx/enclave",
+	.fops = &sgx_encl_fops,
+};
+
+int __init sgx_drv_init(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	u64 attr_mask, xfrm_mask;
+	int ret;
+	int i;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		pr_info("The public key MSRs are not writable.\n");
+		return -ENODEV;
+	}
+
+	cpuid_count(SGX_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	sgx_misc_reserved_mask = ~ebx | SGX_MISC_RESERVED_MASK;
+	sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
+	sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
+
+	cpuid_count(SGX_CPUID, 1, &eax, &ebx, &ecx, &edx);
+
+	attr_mask = (((u64)ebx) << 32) + (u64)eax;
+	sgx_attributes_reserved_mask = ~attr_mask | SGX_ATTR_RESERVED_MASK;
+
+	if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+		xfrm_mask = (((u64)edx) << 32) + (u64)ecx;
+
+		for (i = 2; i < 64; i++) {
+			cpuid_count(0x0D, i, &eax, &ebx, &ecx, &edx);
+			if ((1UL << i) & xfrm_mask)
+				sgx_xsave_size_tbl[i] = eax + ebx;
+		}
+
+		sgx_xfrm_reserved_mask = ~xfrm_mask;
+	}
+
+	ret = misc_register(&sgx_dev_enclave);
+	if (ret) {
+		pr_err("Creating /dev/sgx/enclave failed with %d.\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
new file mode 100644
index 000000000000..f7ce40dedc91
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef __ARCH_SGX_DRIVER_H__
+#define __ARCH_SGX_DRIVER_H__
+
+#include <crypto/hash.h>
+#include <linux/kref.h>
+#include <linux/mmu_notifier.h>
+#include <linux/radix-tree.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include "sgx.h"
+
+#define SGX_EINIT_SPIN_COUNT	20
+#define SGX_EINIT_SLEEP_COUNT	50
+#define SGX_EINIT_SLEEP_TIME	20
+
+extern u64 sgx_encl_size_max_32;
+extern u64 sgx_encl_size_max_64;
+extern u32 sgx_misc_reserved_mask;
+extern u64 sgx_attributes_reserved_mask;
+extern u64 sgx_xfrm_reserved_mask;
+extern u32 sgx_xsave_size_tbl[64];
+
+long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
+
+int sgx_drv_init(void);
+
+#endif /* __ARCH_X86_SGX_DRIVER_H__ */
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
new file mode 100644
index 000000000000..c2c4a77af36b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/lockdep.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/shmem_fs.h>
+#include <linux/suspend.h>
+#include <linux/sched/mm.h>
+#include "arch.h"
+#include "encl.h"
+#include "encls.h"
+#include "sgx.h"
+
+static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+						unsigned long addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned int flags;
+
+	/* If process was forked, VMA is still there but vm_private_data is set
+	 * to NULL.
+	 */
+	if (!encl)
+		return ERR_PTR(-EFAULT);
+
+	flags = atomic_read(&encl->flags);
+	if ((flags & SGX_ENCL_DEAD) || !(flags & SGX_ENCL_INITIALIZED))
+		return ERR_PTR(-EFAULT);
+
+	entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+	if (!entry)
+		return ERR_PTR(-EFAULT);
+
+	/* Page is already resident in the EPC. */
+	if (entry->epc_page)
+		return entry;
+
+	return ERR_PTR(-EFAULT);
+}
+
+static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
+	struct sgx_encl_mm *tmp = NULL;
+
+	/*
+	 * The enclave itself can remove encl_mm.  Note, objects can't be moved
+	 * off an RCU protected list, but deletion is ok.
+	 */
+	spin_lock(&encl_mm->encl->mm_lock);
+	list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
+		if (tmp == encl_mm) {
+			list_del_rcu(&encl_mm->list);
+			break;
+		}
+	}
+	spin_unlock(&encl_mm->encl->mm_lock);
+
+	if (tmp == encl_mm) {
+		synchronize_srcu(&encl_mm->encl->srcu);
+		mmu_notifier_put(mn);
+	}
+}
+
+static void sgx_mmu_notifier_free(struct mmu_notifier *mn)
+{
+	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
+
+	kfree(encl_mm);
+}
+
+static const struct mmu_notifier_ops sgx_mmu_notifier_ops = {
+	.release		= sgx_mmu_notifier_release,
+	.free_notifier		= sgx_mmu_notifier_free,
+};
+
+static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
+					    struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = NULL;
+	struct sgx_encl_mm *tmp;
+	int idx;
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
+		if (tmp->mm == mm) {
+			encl_mm = tmp;
+			break;
+		}
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return encl_mm;
+}
+
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm;
+	int ret;
+
+	/* mm_list can be accessed only by a single thread at a time. */
+	mmap_assert_write_locked(mm);
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
+		return -EINVAL;
+
+	/*
+	 * mm_structs are kept on mm_list until the mm or the enclave dies,
+	 * i.e. once an mm is off the list, it's gone for good, therefore it's
+	 * impossible to get a false positive on @mm due to a stale mm_list.
+	 */
+	if (sgx_encl_find_mm(encl, mm))
+		return 0;
+
+	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
+	if (!encl_mm)
+		return -ENOMEM;
+
+	encl_mm->encl = encl;
+	encl_mm->mm = mm;
+	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
+
+	ret = __mmu_notifier_register(&encl_mm->mmu_notifier, mm);
+	if (ret) {
+		kfree(encl_mm);
+		return ret;
+	}
+
+	spin_lock(&encl->mm_lock);
+	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	spin_unlock(&encl->mm_lock);
+
+	return 0;
+}
+
+static void sgx_vma_open(struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = vma->vm_private_data;
+
+	if (!encl)
+		return;
+
+	if (sgx_encl_mm_add(encl, vma->vm_mm))
+		vma->vm_private_data = NULL;
+}
+
+static unsigned int sgx_vma_fault(struct vm_fault *vmf)
+{
+	unsigned long addr = (unsigned long)vmf->address;
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_encl *encl = vma->vm_private_data;
+	struct sgx_encl_page *entry;
+	int ret = VM_FAULT_NOPAGE;
+	unsigned long pfn;
+
+	if (!encl)
+		return VM_FAULT_SIGBUS;
+
+	mutex_lock(&encl->lock);
+
+	entry = sgx_encl_load_page(encl, addr);
+	if (IS_ERR(entry)) {
+		if (unlikely(PTR_ERR(entry) != -EBUSY))
+			ret = VM_FAULT_SIGBUS;
+
+		goto out;
+	}
+
+	if (!follow_pfn(vma, addr, &pfn))
+		goto out;
+
+	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(entry->epc_page->desc));
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
+/**
+ * sgx_encl_may_map() - Check if a requested VMA mapping is allowed
+ * @encl:		an enclave pointer
+ * @start:		lower bound of the address range, inclusive
+ * @end:		upper bound of the address range, exclusive
+ * @vm_prot_bits:	requested protections of the address range
+ *
+ * Iterate through the enclave pages contained within [@start, @end) to verify
+ * the permissions requested by @vm_prot_bits do not exceed that of any enclave
+ * page to be mapped.
+ *
+ * Return:
+ *   0 on success,
+ *   -EACCES if VMA permissions exceed enclave page permissions
+ */
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_flags)
+{
+	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+	unsigned long idx_start = PFN_DOWN(start);
+	unsigned long idx_end = PFN_DOWN(end - 1);
+	struct sgx_encl_page *page;
+
+	XA_STATE(xas, &encl->page_array, idx_start);
+
+	/*
+	 * Disallow READ_IMPLIES_EXEC tasks as their VMA permissions might
+	 * conflict with the enclave page permissions.
+	 */
+	if (current->personality & READ_IMPLIES_EXEC)
+		return -EACCES;
+
+	xas_for_each(&xas, page, idx_end)
+		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
+			return -EACCES;
+
+	return 0;
+}
+
+static int sgx_vma_mprotect(struct vm_area_struct *vma,
+			    struct vm_area_struct **pprev, unsigned long start,
+			    unsigned long end, unsigned long newflags)
+{
+	int ret;
+
+	ret = sgx_encl_may_map(vma->vm_private_data, start, end, newflags);
+	if (ret)
+		return ret;
+
+	return mprotect_fixup(vma, pprev, start, end, newflags);
+}
+
+const struct vm_operations_struct sgx_vm_ops = {
+	.open = sgx_vma_open,
+	.fault = sgx_vma_fault,
+	.mprotect = sgx_vma_mprotect,
+};
+
+/**
+ * sgx_encl_find - find an enclave
+ * @mm:		mm struct of the current process
+ * @addr:	address in the ELRANGE
+ * @vma:	the resulting VMA
+ *
+ * Find an enclave identified by the given address. Give back a VMA that is
+ * part of the enclave and located in that address. The VMA is given back if it
+ * is a proper enclave VMA even if an &sgx_encl instance does not exist yet
+ * (enclave creation has not been performed).
+ *
+ * Return:
+ *   0 on success,
+ *   -EINVAL if an enclave was not found,
+ *   -ENOENT if the enclave has not been created yet
+ */
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma)
+{
+	struct vm_area_struct *result;
+	struct sgx_encl *encl;
+
+	result = find_vma(mm, addr);
+	if (!result || result->vm_ops != &sgx_vm_ops || addr < result->vm_start)
+		return -EINVAL;
+
+	encl = result->vm_private_data;
+	*vma = result;
+
+	return encl ? 0 : -ENOENT;
+}
+
+/**
+ * sgx_encl_destroy() - destroy enclave resources
+ * @encl:	an enclave pointer
+ */
+void sgx_encl_destroy(struct sgx_encl *encl)
+{
+	struct sgx_encl_page *entry;
+	unsigned long index;
+
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+
+	xa_for_each(&encl->page_array, index, entry) {
+		if (entry->epc_page) {
+			sgx_free_epc_page(entry->epc_page);
+			encl->secs_child_cnt--;
+			entry->epc_page = NULL;
+		}
+
+		kfree(entry);
+	}
+
+	xa_destroy(&encl->page_array);
+
+	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_free_epc_page(encl->secs.epc_page);
+		encl->secs.epc_page = NULL;
+	}
+}
+
+/**
+ * sgx_encl_release - Destroy an enclave instance
+ * @kref:	address of a kref inside &sgx_encl
+ *
+ * Used together with kref_put(). Frees all the resources associated with the
+ * enclave and the instance itself.
+ */
+void sgx_encl_release(struct kref *ref)
+{
+	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
+
+	sgx_encl_destroy(encl);
+
+	if (encl->backing)
+		fput(encl->backing);
+
+	cleanup_srcu_struct(&encl->srcu);
+
+	WARN_ON_ONCE(!list_empty(&encl->mm_list));
+
+	/* Detect EPC page leak's. */
+	WARN_ON_ONCE(encl->secs_child_cnt);
+	WARN_ON_ONCE(encl->secs.epc_page);
+
+	kfree(encl);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
new file mode 100644
index 000000000000..8ff445476657
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+#ifndef _X86_ENCL_H
+#define _X86_ENCL_H
+
+#include <linux/cpumask.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mm_types.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/srcu.h>
+#include <linux/workqueue.h>
+#include <linux/xarray.h>
+#include "sgx.h"
+
+/**
+ * enum sgx_encl_page_desc - defines bits for an enclave page's descriptor
+ * %SGX_ENCL_PAGE_ADDR_MASK:		Holds the virtual address of the page.
+ *
+ * The page address for SECS is zero and is used by the subsystem to recognize
+ * the SECS page.
+ */
+enum sgx_encl_page_desc {
+	/* Bits 11:3 are available when the page is not swapped. */
+	SGX_ENCL_PAGE_ADDR_MASK		= PAGE_MASK,
+};
+
+#define SGX_ENCL_PAGE_ADDR(page) \
+	((page)->desc & SGX_ENCL_PAGE_ADDR_MASK)
+
+struct sgx_encl_page {
+	unsigned long desc;
+	unsigned long vm_max_prot_bits;
+	struct sgx_epc_page *epc_page;
+	struct sgx_encl *encl;
+};
+
+enum sgx_encl_flags {
+	SGX_ENCL_CREATED	= BIT(0),
+	SGX_ENCL_INITIALIZED	= BIT(1),
+	SGX_ENCL_DEBUG		= BIT(2),
+	SGX_ENCL_DEAD		= BIT(3),
+	SGX_ENCL_IOCTL		= BIT(4),
+};
+
+struct sgx_encl_mm {
+	struct sgx_encl *encl;
+	struct mm_struct *mm;
+	struct list_head list;
+	struct mmu_notifier mmu_notifier;
+};
+
+struct sgx_encl {
+	atomic_t flags;
+	unsigned int page_cnt;
+	unsigned int secs_child_cnt;
+	struct mutex lock;
+	struct list_head mm_list;
+	spinlock_t mm_lock;
+	struct file *backing;
+	struct kref refcount;
+	struct srcu_struct srcu;
+	unsigned long base;
+	unsigned long size;
+	unsigned long ssaframesize;
+	struct xarray page_array;
+	struct sgx_encl_page secs;
+	cpumask_t cpumask;
+};
+
+extern const struct vm_operations_struct sgx_vm_ops;
+
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma);
+void sgx_encl_destroy(struct sgx_encl *encl);
+void sgx_encl_release(struct kref *ref);
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_flags);
+
+#endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 97c6895fb6c9..4137254fb29e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -9,6 +9,8 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include "driver.h"
+#include "encl.h"
 #include "encls.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
@@ -260,6 +262,8 @@ static bool __init sgx_page_cache_init(void)
 
 static void __init sgx_init(void)
 {
+	int ret;
+
 	if (!boot_cpu_has(X86_FEATURE_SGX))
 		return;
 
@@ -269,8 +273,15 @@ static void __init sgx_init(void)
 	if (!sgx_page_reclaimer_init())
 		goto err_page_cache;
 
+	ret = sgx_drv_init();
+	if (ret)
+		goto err_kthread;
+
 	return;
 
+err_kthread:
+	kthread_stop(ksgxswapd_tsk);
+
 err_page_cache:
 	sgx_page_cache_teardown();
 }
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v2] socket.7,unix.7: add initial description for SO_PEERSEC
From: Serge E. Hallyn @ 2020-09-15 14:19 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Stephen Smalley, linux-man, linux-security-module, selinux, smcv,
	James Morris, Serge E. Hallyn
In-Reply-To: <CAKgNAkgjndEr4zd1zGD_h+2srWsRAQT7=Hyqh7Ktxy7FQn35Dg@mail.gmail.com>

On Tue, Sep 15, 2020 at 10:56:18AM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Stephen,
> 
> On Mon, 14 Sep 2020 at 20:07, Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> >
> > SO_PEERSEC was introduced for AF_UNIX stream sockets connected via
> > connect(2) in Linux 2.6.2 [1] and later augmented to support AF_UNIX stream
> > and datagram sockets created via socketpair(2) in Linux 4.18 [2].  Document
> > SO_PEERSEC in the socket.7 and unix.7 man pages following the example
> > of the existing SO_PEERCRED descriptions.  SO_PEERSEC is also supported
> > on AF_INET sockets when using labeled IPSEC or NetLabel but defer
> > adding a description of that support to a separate patch.
> >
> > The module-independent description of the security context returned
> > by SO_PEERSEC is from Simon McVittie <smcv@collabora.com>.
> 
> Thanks for the patch, The text looks in reasonable shape to me. I'm
> just hanging off applying for a bit in case some Reviewed/Acked-by
> comes in.
> 

fwiw just as someone who's used this,

Reviewed-by: Serge Hallyn <serge@hallyn.com>

Thanks Stephen, thanks Michael.

> Cheers,
> 
> Michael
> 
> 
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=da6e57a2e6bd7939f610d957afacaf6a131e75ed
> >
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b811db2cb2aabc910e53d34ebb95a15997c33e7
> >
> > Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > ---
> > v2 adds kernel commit info to the description and man page and uses
> > the suggested text from Simon McVittie for the description of
> > the security context string in a module-neutral way.
> >
> >  man7/socket.7 |  5 +++++
> >  man7/unix.7   | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 51 insertions(+)
> >
> > diff --git a/man7/socket.7 b/man7/socket.7
> > index 21e891791..c3635f95b 100644
> > --- a/man7/socket.7
> > +++ b/man7/socket.7
> > @@ -690,6 +690,11 @@ Return the credentials of the peer process connected to this socket.
> >  For further details, see
> >  .BR unix (7).
> >  .TP
> > +.BR SO_PEERSEC " (since Linux 2.6.2)"
> > +Return the security context of the peer socket connected to this socket.
> > +For further details, see
> > +.BR unix (7).
> > +.TP
> >  .B SO_PRIORITY
> >  Set the protocol-defined priority for all packets to be sent on
> >  this socket.
> > diff --git a/man7/unix.7 b/man7/unix.7
> > index 50828a5bc..298521d4a 100644
> > --- a/man7/unix.7
> > +++ b/man7/unix.7
> > @@ -349,6 +349,52 @@ stream sockets and for
> >  .B AF_UNIX
> >  stream and datagram socket pairs created using
> >  .BR socketpair (2).
> > +.TP
> > +.B SO_PEERSEC
> > +This read-only socket option returns the
> > +security context of the peer socket connected to this socket.
> > +By default, this will be the same as the security context of
> > +the process that created the peer socket unless overridden
> > +by the policy or by a process with the required permissions.
> > +.IP
> > +The argument to
> > +.BR getsockopt (2)
> > +is a pointer to a
> > +buffer of the specified length in bytes
> > +into which the security context string will be copied.
> > +If the buffer length is less than the length of the security
> > +context string, then
> > +.BR getsockopt (2)
> > +will return the required length
> > +via
> > +.I optlen
> > +and return \-1 and sets
> > +.I errno
> > +to
> > +.BR ERANGE .
> > +The caller should allocate at least
> > +.BR NAME_MAX
> > +bytes for the buffer initially although this is not guaranteed
> > +to be sufficient.  Resizing the buffer to the returned length
> > +and retrying may be necessary.
> > +.IP
> > +The security context string may include a terminating null character
> > +in the returned length, but is not guaranteed to do so: a security
> > +context "foo" might be represented as either {'f','o','o'} of length 3
> > +or {'f','o','o','\\0'} of length 4, which are considered to be
> > +interchangeable. It is printable, does not contain non-terminating
> > +null characters, and is in an unspecified encoding (in particular it
> > +is not guaranteed to be ASCII or UTF-8).
> > +.IP
> > +The use of this option for sockets in the
> > +.B AF_UNIX
> > +address family
> > +is supported since Linux 2.6.2 for connected stream sockets and
> > +since Linux 4.18,
> > +.\" commit 0b811db2cb2aabc910e53d34ebb95a15997c33e7
> > +also for stream and datagram socket pairs created
> > +using
> > +.BR socketpair (2).
> >  .\"
> >  .SS Autobind feature
> >  If a
> > --
> > 2.25.1
> >
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: [PATCH v3] certs: Add EFI_CERT_X509_GUID support for dbx entries
From: Eric Snowberg @ 2020-09-15 15:42 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dhowells, dwmw2, jmorris, serge, nayna, erichte, mpe, zohar,
	keyrings, linux-kernel, linux-security-module, rdunlap
In-Reply-To: <20200914181227.GF9369@linux.intel.com>


> On Sep 14, 2020, at 12:12 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Fri, Sep 11, 2020 at 02:22:30PM -0400, Eric Snowberg wrote:
>> The Secure Boot Forbidden Signature Database, dbx, contains a list of now
>> revoked signatures and keys previously approved to boot with UEFI Secure
>> Boot enabled.  The dbx is capable of containing any number of
>> EFI_CERT_X509_SHA256_GUID, EFI_CERT_SHA256_GUID, and EFI_CERT_X509_GUID
>> entries.
>> 
>> Currently when EFI_CERT_X509_GUID are contained in the dbx, the entries are
>> skipped.
>> 
>> Add support for EFI_CERT_X509_GUID dbx entries. When a EFI_CERT_X509_GUID
>> is found, it is added as an asymmetrical key to the .blacklist keyring.
>> Anytime the .platform keyring is used, the keys in the .blacklist keyring
>> are referenced, if a matching key is found, the key will be rejected.
>> 
>> Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
>> ---
>> v3:
>> Fixed an issue when CONFIG_PKCS7_MESSAGE_PARSER is not builtin and defined
>> as a module instead, pointed out by Randy Dunlap
>> 
>> v2: 
>> Fixed build issue reported by kernel test robot <lkp@intel.com>
>> Commit message update (suggested by Jarkko Sakkinen)
>> ---
>> certs/blacklist.c                             | 33 +++++++++++++++++++
>> certs/blacklist.h                             | 12 +++++++
>> certs/system_keyring.c                        |  6 ++++
>> include/keys/system_keyring.h                 | 11 +++++++
>> .../platform_certs/keyring_handler.c          | 11 +++++++
>> 5 files changed, 73 insertions(+)
>> 
>> diff --git a/certs/blacklist.c b/certs/blacklist.c
>> index 6514f9ebc943..3d1514ba5d47 100644
>> --- a/certs/blacklist.c
>> +++ b/certs/blacklist.c
>> @@ -100,6 +100,39 @@ int mark_hash_blacklisted(const char *hash)
>> 	return 0;
>> }
>> 
>> +int mark_key_revocationlisted(const char *data, size_t size)
>> +{
>> +	key_ref_t key;
>> +
>> +	key = key_create_or_update(make_key_ref(blacklist_keyring, true),
>> +				   "asymmetric",
>> +				   NULL,
>> +				   data,
>> +				   size,
>> +				   ((KEY_POS_ALL & ~KEY_POS_SETATTR) | KEY_USR_VIEW),
>> +				   KEY_ALLOC_NOT_IN_QUOTA | KEY_ALLOC_BUILT_IN);
>> +
>> +	if (IS_ERR(key)) {
>> +		pr_err("Problem with revocation key (%ld)\n", PTR_ERR(key));
>> +		return PTR_ERR(key);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +int is_key_revocationlisted(struct pkcs7_message *pkcs7)
>> +{
>> +	int ret;
>> +
>> +	ret = validate_trust(pkcs7, blacklist_keyring);
>> +
>> +	if (ret == 0)
>> +		return -EKEYREJECTED;
>> +
>> +	return -ENOKEY;
>> +}
>> +EXPORT_SYMBOL_GPL(is_key_revocationlisted);
> 
> Hmm... ignore my previous comment about this. Export symbol is called
> only by system keyring code.
> 
> Would be best if the commit message would explicitly reason new exports.

I don’t see a good reason to keep the export now, I’ll remove it from the
next version.  Thanks.


^ permalink raw reply

* Re: [[PATCH V4]] audit: trigger accompanying records when no rules present
From: Paul Moore @ 2020-09-15 16:18 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Linux-Audit Mailing List, LKML, Linux Security Module list,
	Eric Paris
In-Reply-To: <35f2b8c69b4b9abbc076dd55a6f0f52cf20abad7.1599687447.git.rgb@redhat.com>

On Thu, Sep 10, 2020 at 11:03 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> When there are no audit rules registered, mandatory records (config,
> etc.) are missing their accompanying records (syscall, proctitle, etc.).
>
> This is due to audit context dummy set on syscall entry based on absence
> of rules that signals that no other records are to be printed.
>
> Clear the dummy bit if any record is generated.
>
> The proctitle context and dummy checks are pointless since the
> proctitle record will not be printed if no syscall records are printed.
>
> The fds array is reset to -1 after the first syscall to indicate it
> isn't valid any more, but was never set to -1 when the context was
> allocated to indicate it wasn't yet valid.
>
> The audit_inode* functions can be called without going through
> getname_flags() or getname_kernel() that sets audit_names and cwd, so
> set the cwd if it has not already been done so due to audit_names being
> valid.
>
> The LSM dump_common_audit_data() LSM_AUDIT_DATA_NET:AF_UNIX case was
> missed with the ghak96 patch, so add that case here.
>
> Thanks to bauen1 <j2468h@googlemail.com> for reporting LSM situations in
> which context->cwd is not valid, inadvertantly fixed by the ghak96 patch.
>
> Please see upstream github issue
> https://github.com/linux-audit/audit-kernel/issues/120
> This is also related to upstream github issue
> https://github.com/linux-audit/audit-kernel/issues/96
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
> Passes audit-testsuite.
>
> Chagelog:
> v4:
> - rebase on audit/next v5.9-rc1
> - squash v2+v3fix
> - add pwd NULL check in audit_log_name()
> - resubmit after revert
>
> v3:
> - initialize fds[0] to -1
> - init cwd for ghak96 LSM_AUDIT_DATA_NET:AF_UNIX case
> - init cwd for audit_inode{,_child}
>
> v2:
> - unconditionally clear dummy
> - create audit_clear_dummy accessor function
> - remove proctitle context and dummy checks
>
>  kernel/audit.c       |  1 +
>  kernel/audit.h       |  8 ++++++++
>  kernel/auditsc.c     | 11 +++++++----
>  security/lsm_audit.c |  1 +
>  4 files changed, 17 insertions(+), 4 deletions(-)

Comments below, but can you elaborate on if any testing was done
beyond the audit-testsuite?

> diff --git a/kernel/audit.h b/kernel/audit.h
> index 3b9c0945225a..abcfef58435b 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -290,6 +290,13 @@ extern int audit_signal_info_syscall(struct task_struct *t);
>  extern void audit_filter_inodes(struct task_struct *tsk,
>                                 struct audit_context *ctx);
>  extern struct list_head *audit_killed_trees(void);
> +
> +static inline void audit_clear_dummy(struct audit_context *ctx)
> +{
> +       if (ctx)
> +               ctx->dummy = 0;
> +}

With the only caller being audit_log_start(), should this be moved to
kernel/audit.c?  I'm just not sure this is something we would ever
need (or want) to call from elsewhere, thoughts?

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 8dba8f0983b5..9d2de93f40b3 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -1367,7 +1368,10 @@ static void audit_log_name(struct audit_context *context, struct audit_names *n,
>                         /* name was specified as a relative path and the
>                          * directory component is the cwd
>                          */
> -                       audit_log_d_path(ab, " name=", &context->pwd);
> +                       if (&context->pwd)

Hmm, I don't think this is going to work the way you are intending; I
believe this will always evaluate to true regardless of the state of
context->pwd.  If you look elsewhere in kernel/auditsc.c you will see
some examples of checking to see if context->pwd is valid (e.g.
_audit_getcwd() and audit_log_exit()).

> +                               audit_log_d_path(ab, " name=", &context->pwd);
> +                       else
> +                               audit_log_format(ab, " name=(null)");
>                         break;
>                 default:
>                         /* log the name's directory component */...

> @@ -2079,6 +2080,7 @@ void __audit_inode(struct filename *name, const struct dentry *dentry,
>         }
>         handle_path(dentry);
>         audit_copy_inode(n, dentry, inode, flags & AUDIT_INODE_NOEVAL);
> +       _audit_getcwd(context);
>  }
>
>  void __audit_file(const struct file *file)
> @@ -2197,6 +2199,7 @@ void __audit_inode_child(struct inode *parent,
>                 audit_copy_inode(found_child, dentry, inode, 0);
>         else
>                 found_child->ino = AUDIT_INO_UNSET;
> +       _audit_getcwd(context);
>  }
>  EXPORT_SYMBOL_GPL(__audit_inode_child);
>
> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> index 53d0d183db8f..e93077612246 100644
> --- a/security/lsm_audit.c
> +++ b/security/lsm_audit.c
> @@ -369,6 +369,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>                                         audit_log_untrustedstring(ab, p);
>                                 else
>                                         audit_log_n_hex(ab, p, len);
> +                               audit_getcwd();
>                                 break;
>                         }
>                 }

I'm starting to wonder if audit is doing this wrong (it is audit after
all) ... why not just fetch the cwd in audit_log_exit() if there are
entries in the context->names_list?  The only time we care about
logging the working dir is when we actually have PATH records, right?
My initial thinking is that we can simplify a lot of code if we just
add a audit_getcwd() call in audit_log_exit() if the
context->names_list is not empty.  We should even be safe in the task
exit case as the fs info appears to get cleaned up *after*
audit_log_exit() is called.

Assuming we go this route, we can probably get rid of all the
audit_getcwd() calls outside of the audit code (e.g. the lsm_audit.c
code).  I guess we would need to make sure things still behave the
same for chdir(2), getcwd(2), etc. but even if we have to insert one
or two audit_getcwd() calls in that case we should still come out on
top (although I suspect the necessary calls are already being made).

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [PATCH] socket.7,ip.7: Document SO_PEERSEC for AF_INET sockets
From: Stephen Smalley @ 2020-09-15 16:39 UTC (permalink / raw)
  To: mtk.manpages
  Cc: linux-man, linux-security-module, selinux, smcv, jmorris, serge,
	paul, Stephen Smalley

Augment the description of SO_PEERSEC to cover AF_INET sockets
in addition to the prior description for AF_UNIX.

SO_PEERSEC for TCP sockets was introduced in Linux 2.6.17 [1], and
SO_PEERSEC for SCTP sockets was introduced in Linux 4.17 [2].

This does not cover usage of SCM_SECURITY for UDP sockets, which
was also introduced in the same commit for 2.6.17.

Examples of the necessary labeled IPSEC and NetLabel configurations
to enable use of SO_PEERSEC for TCP and SCTP sockets can be found in
the SELinux Notebook [3] and the selinux-testsuite [4].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2c7946a7bf45ae86736ab3b43d0085e43947945c

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d452930fd3b9031e59abfeddb2fa383f1403d61a

[3] https://github.com/SELinuxProject/selinux-notebook

[4] https://github.com/SELinuxProject/selinux-testsuite
---
 man7/ip.7     | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
 man7/socket.7 |  2 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/man7/ip.7 b/man7/ip.7
index c522b219c..03a9f3f7c 100644
--- a/man7/ip.7
+++ b/man7/ip.7
@@ -979,6 +979,62 @@ Argument is an
 .I ip_mreq_source
 structure as described under
 .BR IP_ADD_SOURCE_MEMBERSHIP .
+.TP
+.BR SO_PEERSEC " (since Linux 2.6.17)"
+If labeled IPSEC or NetLabel is configured on both the sending and
+receiving hosts, this read-only socket option returns the security
+context of the peer socket connected to this socket.  By default, this
+will be the same as the security context of the process that created
+the peer socket unless overridden by the policy or by a process with
+the required permissions.
+.IP
+The argument to
+.BR getsockopt (2)
+is a pointer to a
+buffer of the specified length in bytes
+into which the security context string will be copied.
+If the buffer length is less than the length of the security
+context string, then
+.BR getsockopt (2)
+will return the required length
+via
+.I optlen
+and return \-1 and sets
+.I errno
+to
+.BR ERANGE .
+The caller should allocate at least
+.BR NAME_MAX
+bytes for the buffer initially although this is not guaranteed
+to be sufficient.  Resizing the buffer to the returned length
+and retrying may be necessary.
+.IP
+The security context string may include a terminating null character
+in the returned length, but is not guaranteed to do so: a security
+context "foo" might be represented as either {'f','o','o'} of length 3
+or {'f','o','o','\\0'} of length 4, which are considered to be
+interchangeable. It is printable, does not contain non-terminating
+null characters, and is in an unspecified encoding (in particular it
+is not guaranteed to be ASCII or UTF-8).
+.IP
+The use of this option for sockets in the
+.B AF_INET
+address family
+is supported since Linux 2.6.17
+.\" commit 2c7946a7bf45ae86736ab3b43d0085e43947945c
+for TCP sockets and since Linux
+4.17
+.\" commit d452930fd3b9031e59abfeddb2fa383f1403d61a
+for SCTP sockets.
+.IP
+For SELinux, NetLabel only conveys the MLS portion of the security
+context of the peer across the wire, defaulting the rest of the
+security context to the values defined in the policy for the
+netmsg initial security identifier (SID). However, NetLabel can
+be configured to pass full security contexts over loopback.  Labeled
+IPSEC always passes full security contexts as part of establishing
+the security association (SA) and looks them up based on the association
+for each packet.
 .SS /proc interfaces
 The IP protocol
 supports a set of
diff --git a/man7/socket.7 b/man7/socket.7
index c3635f95b..2f9039333 100644
--- a/man7/socket.7
+++ b/man7/socket.7
@@ -693,7 +693,7 @@ For further details, see
 .BR SO_PEERSEC " (since Linux 2.6.2)"
 Return the security context of the peer socket connected to this socket.
 For further details, see
-.BR unix (7).
+.BR unix (7) and ip(7).
 .TP
 .B SO_PRIORITY
 Set the protocol-defined priority for all packets to be sent on
-- 
2.25.1


^ permalink raw reply related

* [GIT PULL] security: device_cgroup RCU warning fix
From: James Morris @ 2020-09-15 20:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-security-module

This was posted a while back and been baking in -next for a while, please 
consider for 5.9.


The following changes since commit bcf876870b95592b52519ed4aafcf9d95999bc9c:

  Linux 5.8 (2020-08-02 14:21:45 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git tags/fixes-v5.9a

for you to fetch changes up to bc62d68e2a0a69fcdcf28aca8edb01abf306b698:

  device_cgroup: Fix RCU list debugging warning (2020-08-20 11:25:03 -0700)

----------------------------------------------------------------
device_cgroup RCU warning fix from Amol Grover <frextrite@gmail.com>

----------------------------------------------------------------

Amol Grover (1):
      device_cgroup: Fix RCU list debugging warning

 security/device_cgroup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


commit bc62d68e2a0a69fcdcf28aca8edb01abf306b698
Author: Amol Grover <frextrite@gmail.com>
Date:   Mon Apr 6 16:29:50 2020 +0530

    device_cgroup: Fix RCU list debugging warning
    
    exceptions may be traversed using list_for_each_entry_rcu()
    outside of an RCU read side critical section BUT under the
    protection of decgroup_mutex. Hence add the corresponding
    lockdep expression to fix the following false-positive
    warning:
    
    [    2.304417] =============================
    [    2.304418] WARNING: suspicious RCU usage
    [    2.304420] 5.5.4-stable #17 Tainted: G            E
    [    2.304422] -----------------------------
    [    2.304424] security/device_cgroup.c:355 RCU-list traversed in non-reader section!!
    
    Signed-off-by: Amol Grover <frextrite@gmail.com>
    Signed-off-by: James Morris <jmorris@namei.org>

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 43ab0ad45c1b..04375df52fc9 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -354,7 +354,8 @@ static bool match_exception_partial(struct list_head *exceptions, short type,
 {
 	struct dev_exception_item *ex;
 
-	list_for_each_entry_rcu(ex, exceptions, list) {
+	list_for_each_entry_rcu(ex, exceptions, list,
+				lockdep_is_held(&devcgroup_mutex)) {
 		if ((type & DEVCG_DEV_BLOCK) && !(ex->type & DEVCG_DEV_BLOCK))
 			continue;
 		if ((type & DEVCG_DEV_CHAR) && !(ex->type & DEVCG_DEV_CHAR))

^ permalink raw reply related

* Re: [RFC PATCH 5/6] security/fbfam: Detect a fork brute force attack
From: John Wood @ 2020-09-15 18:44 UTC (permalink / raw)
  To: Jann Horn
  Cc: John Wood, Kees Cook, Kernel Hardening, Matthew Wilcox,
	Jonathan Corbet, Alexander Viro, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Luis Chamberlain, Iurii Zaikin,
	James Morris, Serge E. Hallyn, linux-doc, kernel list,
	linux-fsdevel, linux-security-module
In-Reply-To: <CAG48ez0BcSY0is2LzdkizcOQYkaOJwfa=5ZSwjKb+faRwG9QCA@mail.gmail.com>

On Mon, Sep 14, 2020 at 09:42:37PM +0200, Jann Horn wrote:
> On Sun, Sep 13, 2020 at 7:55 PM John Wood <john.wood@gmx.com> wrote:
> > On Thu, Sep 10, 2020 at 11:10:38PM +0200, Jann Horn wrote:
> > > > +       delta_jiffies = get_jiffies_64() - stats->jiffies;
> > > > +       delta_time = jiffies64_to_msecs(delta_jiffies);
> > > > +       crashing_rate = delta_time / (u64)stats->faults;
> > >
> > > Do I see this correctly, is this computing the total runtime of this
> > > process hierarchy divided by the total number of faults seen in this
> > > process hierarchy? If so, you may want to reconsider whether that's
> > > really the behavior you want. For example, if I configure the minimum
> > > period between crashes to be 30s (as is the default in the sysctl
> > > patch), and I try to attack a server that has been running without any
> > > crashes for a month, I'd instantly be able to crash around
> > > 30*24*60*60/30 = 86400 times before the detection kicks in. That seems
> > > suboptimal.
> >
> > You are right. This is not the behaviour we want. So, for the next
> > version it would be better to compute the crashing period as the time
> > between two faults, or the time between the execve call and the first
> > fault (first fault case).
> >
> > However, I am afraid of a premature detection if a child process fails
> > twice in a short period.
> >
> > So, I think it would be a good idea add a new sysctl to setup a
> > minimum number of faults before the time between faults starts to be
> > computed. And so, the attack detection only will be triggered if the
> > application crashes quickly but after a number of crashes.
> >
> > What do you think?
>
> You could keep a list of the timestamps of the last five crashes or
> so, and then take action if the last five crashes happened within
> (5-1)*crash_period_limit time.

Ok, your proposed solution seems a more clever one. Anyway I think that a
new sysctl for fine tuning the number of timestamps would be needed.

Thanks,
John Wood


^ permalink raw reply

* Re: [RFC PATCH 5/6] security/fbfam: Detect a fork brute force attack
From: John Wood @ 2020-09-15 17:36 UTC (permalink / raw)
  To: Jann Horn
  Cc: John Wood, Kees Cook, Kernel Hardening, Matthew Wilcox,
	Jonathan Corbet, Alexander Viro, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Luis Chamberlain, Iurii Zaikin,
	James Morris, Serge E. Hallyn, linux-doc, kernel list,
	linux-fsdevel, linux-security-module
In-Reply-To: <CAG48ez3aQXb3EuGRVvLLo7BxycqJ4Y2mL83QhY9-QMK_qkfCuQ@mail.gmail.com>

Hi,

On Mon, Sep 14, 2020 at 09:39:10PM +0200, Jann Horn wrote:
> On Sun, Sep 13, 2020 at 6:56 PM John Wood <john.wood@gmx.com> wrote:
> > On Fri, Sep 11, 2020 at 02:01:56AM +0200, Jann Horn wrote:
> > > On Fri, Sep 11, 2020 at 1:49 AM Kees Cook <keescook@chromium.org> wrote:
> > > > On Thu, Sep 10, 2020 at 01:21:06PM -0700, Kees Cook wrote:
> > > > [...]
> > > > I don't think this is the right place for detecting a crash -- isn't
> > > > this only for the "dumping core" condition? In other words, don't you
> > > > want to do this in get_signal()'s "fatal" block? (i.e. very close to the
> > > > do_coredump, but without the "should I dump?" check?)
> > > >
> > > > Hmm, but maybe I'm wrong? It looks like you're looking at noticing the
> > > > process taking a signal from SIG_KERNEL_COREDUMP_MASK ?
> > > >
> > > > (Better yet: what are fatal conditions that do NOT match
> > > > SIG_KERNEL_COREDUMP_MASK, and should those be covered?)
> > > >
> > > > Regardless, *this* looks like the only place without an LSM hook. And it
> > > > doesn't seem unreasonable to add one here. I assume it would probably
> > > > just take the siginfo pointer, which is also what you're checking.
> > >
> > > Good point, making this an LSM might be a good idea.
> > >
> > > > e.g. for include/linux/lsm_hook_defs.h:
> > > >
> > > > LSM_HOOK(int, 0, task_coredump, const kernel_siginfo_t *siginfo);
> > >
> > > I guess it should probably be an LSM_RET_VOID hook? And since, as you
> > > said, it's not really semantically about core dumping, maybe it should
> > > be named task_fatal_signal or something like that.
> >
> > If I understand correctly you propose to add a new LSM hook without return
> > value and place it here:
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index a38b3edc6851..074492d23e98 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2751,6 +2751,8 @@ bool get_signal(struct ksignal *ksig)
> >                         do_coredump(&ksig->info);
> >                 }
> >
> > +               // Add the new LSM hook here
> > +
> >                 /*
> >                  * Death signals, no core dump.
> >                  */
>
> It should probably be in the "if (sig_kernel_coredump(signr)) {"
> branch. And I'm not sure whether it should be before or after
> do_coredump() - if you do it after do_coredump(), the hook will have
> to wait until the core dump file has been written, which may take a
> little bit of time.

But if the LSM hook is placed in the "if (sig_kernel_coredump(signr)) {"
branch, then only the following signals will be passed to it.

SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGFPE, SIGSEGV, SIGBUS, SIGSYS,
SIGXCPU, SIGXFSZ, SIGEMT

The above signals are extracted from SIG_KERNEL_COREDUMP_MASK macro, and
are only related to coredump.

So, if we add a new LSM hook (named task_fatal_signal) to detect a fatal
signal it would be better to place it just above the if statement.

diff --git a/kernel/signal.c b/kernel/signal.c
index a38b3edc6851..406af87f2f96 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2736,6 +2736,8 @@ bool get_signal(struct ksignal *ksig)
                 */
                current->flags |= PF_SIGNALED;

+               // Place the new LSM hook here
+
                if (sig_kernel_coredump(signr)) {
                        if (print_fatal_signals)
                                print_fatal_signal(ksig->info.si_signo);

This way all the fatal signals are caught and we also avoid the commented
delay if a core dump is necessary.

Thanks,
John Wood


^ permalink raw reply related

* Re: [PATCH] ima: Use kmemdup rather than kmalloc+memcpy
From: Mimi Zohar @ 2020-09-15 14:14 UTC (permalink / raw)
  To: Alex Dewar
  Cc: Dmitry Kasatkin, James Morris, Serge E. Hallyn, linux-integrity,
	linux-security-module, linux-kernel
In-Reply-To: <20200909190907.164013-1-alex.dewar90@gmail.com>

On Wed, 2020-09-09 at 20:09 +0100, Alex Dewar wrote:
> Issue identified with Coccinelle.
> 
> Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>

Thank you.

Mimi


^ permalink raw reply

* Re: [RFC PATCH v9 2/3] arch: Wire up introspect_access(2)
From: Arnd Bergmann @ 2020-09-15 13:32 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel@vger.kernel.org, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Casey Schaufler,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk, Miklos Szeredi,
	Mimi Zohar, Philippe Trébuchet, Scott Shell,
	Sean Christopherson, Shuah Khan, Steve Dower, Steve Grubb,
	Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	Kernel Hardening, Linux API, linux-integrity, LSM List,
	Linux FS-devel Mailing List, Mickaël Salaün,
	Thibaut Sautereau
In-Reply-To: <20200910164612.114215-3-mic@digikod.net>

On Thu, Sep 10, 2020 at 6:46 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> From: Mickaël Salaün <mic@linux.microsoft.com>
>
> Wire up access_interpreted(2) for all architectures.
>
> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
> Reviewed-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Vincent Strubel <vincent.strubel@ssi.gouv.fr>
> ---
>
> Changes since v7:
> * New patch for the new syscall.
> * Increase syscall numbers by 2 to leave space for new ones (in
>   linux-next): watch_mount(2) and process_madvise(2).

I checked that the syscall calling conventions are sane and that
it is wired up correctly on all architectures in this patch.

Acked-by: Arnd Bergmann <arnd@arndb.de>

I did not look at the system call implementation or its purpose though,
as that is not my area.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox