linux-security-module.vger.kernel.org archive mirror
* Re: [RFC 0/8] PGP key parser using SandBox Mode
       [not found]       ` <87y1bktjdk.fsf@meer.lwn.net>
@ 2024-02-16 18:24         ` Roberto Sassu
  0 siblings, 0 replies; 10+ messages in thread
From: Roberto Sassu @ 2024-02-16 18:24 UTC (permalink / raw)
  To: Jonathan Corbet, Petr Tesařík, Dave Hansen
  Cc: Petr Tesarik, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Andy Lutomirski, Oleg Nesterov, Peter Zijlstra,
	Xin Li, Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, David Howells,
	Petr Tesarik, jannh, linux-security-module

On 2/16/2024 6:21 PM, Jonathan Corbet wrote:
> Petr Tesařík <petr@tesarici.cz> writes:
> 
>> On Fri, 16 Feb 2024 07:38:30 -0800
>> Dave Hansen <dave.hansen@intel.com> wrote:
>>> I'm confused by this.  The kernel doesn't (appear to) have a PGP parser
>>> today.  So are you saying that it *should* have one and it's only
>>> feasible if it's confined in a sandbox?
>>
>> I'm sorry if this is confusing. Yes, your understanding is correct.
>> This patch series demonstrates that SBM (even in the initial version
>> that was submitted) allows writing a PGP parser which can survive
>> memory safety bugs without compromising the rest of the kernel.
> 
> So I have a different question: some years ago we added the "usermode
> blob" feature for just this kind of use case - parsing firewall rules at
> the time.  It has never been used for that, but it's still there in
> kernel/usermode_driver.c.  Is there a reason why this existing
> functionality can't be used for tasks like PGP parsing as well?

Yes, it was an option I explored last year (briefly talked about it as a 
BoF at LSS NA 2023).

You are right, there is such a feature, and it seemed to fit well.

The user space blob is embedded in a kernel module, so it is signed. The 
user space process is connected to the kernel only through a pipe.

I even went ahead and created a framework:

https://lore.kernel.org/linux-kernel/20230317145240.363908-1-roberto.sassu@huaweicloud.com/

so that anyone can implement similar use cases.
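
For reference, the kernel side of that framework builds on what
kernel/usermode_driver.c already provides. A minimal sketch (error
handling trimmed; the blob symbol names and the request/reply protocol
are made up for illustration):

#include <linux/usermode_driver.h>

/* Blob linked into the module; these symbol names are hypothetical. */
extern char parser_umh_start[], parser_umh_end[];

static struct umd_info parser_ctx = {
	.driver_name = "parser_umh",
};

static int parser_umh_run(void)
{
	int err;

	err = umd_load_blob(&parser_ctx, parser_umh_start,
			    parser_umh_end - parser_umh_start);
	if (err)
		return err;

	err = fork_usermode_driver(&parser_ctx);
	if (err)
		umd_unload_blob(&parser_ctx);

	/* Requests and replies then go over parser_ctx.pipe_to_umh and
	 * parser_ctx.pipe_from_umh using kernel_write()/kernel_read(). */
	return err;
}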

The next step was: how can I ensure that the process launched by the 
kernel is not attacked by root (which I assume to be less powerful than 
the kernel in a locked-down system)?

I handled this in both directions:

- The process launched by the kernel runs under a seccomp strict
   profile and can only read/write through the pipe created by the
   kernel (and call a few other system calls, such as exit());
   otherwise it is killed. It cannot create any new communication
   channel. See the sketch below this list.

- I created an LSM that denies any attempt to ptrace or send signals to
   the process launched by the kernel. Jann Horn also suggested making
   the process non-swappable.
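
To illustrate the first point, the helper side of such a setup can be
as small as the following sketch (assuming, purely for illustration,
that the kernel-created pipe ends up on fds 0 and 1; seccomp strict
mode then allows only read(), write(), _exit() and sigreturn()):

#include <unistd.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>

int main(void)
{
	char buf[4096];
	ssize_t n;

	/* From here on, any other system call kills the process. */
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0))
		_exit(1);

	/* Request/reply loop over the pipe created by the kernel. */
	while ((n = read(0, buf, sizeof(buf))) > 0) {
		/* ... parse the request, build a reply in buf ... */
		if (write(1, buf, (size_t)n) != n)
			_exit(1);
	}
	_exit(0);
}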

However, despite these attempts, security people don't feel confident 
about offloading a kernel workload outside the kernel.

This is why this work started.

Roberto


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC 0/5] PoC: convert AppArmor parser to SandBox Mode
       [not found] <fb4a40c7-af9a-406a-95ab-406595f3ffe5@intel.com>
       [not found] ` <20240216152435.1575-1-petrtesarik@huaweicloud.com>
@ 2024-02-22 13:12 ` Petr Tesarik
  2024-02-22 13:12   ` [RFC 1/5] sbm: x86: fix SBM error entry path Petr Tesarik
                     ` (4 more replies)
  1 sibling, 5 replies; 10+ messages in thread
From: Petr Tesarik @ 2024-02-22 13:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Petr Tesařík, Petr Tesarik, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

[ For people newly added to Cc, this RFC is a reply to subsystem
  maintainers who asked for a real-world demonstration of how
  SandBox Mode could be used in practice. SandBox Mode itself
  was proposed in these two series (generic and x86):

* https://lore.kernel.org/lkml/20240214113516.2307-1-petrtesarik@huaweicloud.com/T/
* https://lore.kernel.org/lkml/20240214113035.2117-1-petrtesarik@huaweicloud.com/T/
]

This patch series provides an example of running existing kernel code in
a sandbox. It also adds some fixes and infrastructure to the base series.
If you only want to see what the conversion itself might look like, skip
straight to patch 5/5.

Patches 1 and 2 amend the base patch series. Patches 3 and 4 are ported
from my earlier proof of concept and adapted to work without adding too
much other code. I am sending a complete WIP patch series so you can
actually build and run the code.

Disclaimer: This code is not ready for submission. It is incomplete and
may contain bugs. It is provided here for the sole purpose of demonstrating
how existing kernel code would be modified to run in a sandbox.

PATCH 1/5 is a bug fix discovered after sending patch series v1.
PATCH 2/5 allows mapping a buffer into the sandbox at its kernel address.
PATCH 3/5 is required to intercept calls to pre-selected kernel functions.
PATCH 4/5 implements dynamic allocation in sandbox mode.
PATCH 5/5 demonstrates how to convert existing kernel code to use SBM.

Petr Tesarik (5):
  sbm: x86: fix SBM error entry path
  sbm: enhance buffer mapping API
  sbm: x86: infrastructure to fix up sandbox faults
  sbm: fix up calls to dynamic memory allocators
  apparmor: parse profiles in sandbox mode

 arch/x86/entry/entry_64.S     |  10 ++-
 arch/x86/kernel/sbm/call_64.S |  20 +++++
 arch/x86/kernel/sbm/core.c    | 161 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/vmlinux.lds.S |   9 ++
 include/linux/sbm.h           |  77 ++++++++++++++++
 kernel/sbm.c                  |  34 +++++++
 mm/slab_common.c              |   3 +-
 mm/slub.c                     |  17 ++--
 mm/vmalloc.c                  |  11 +--
 security/apparmor/crypto.c    |   7 +-
 security/apparmor/policy.c    |  29 ++++--
 security/apparmor/secid.c     |   3 +-
 12 files changed, 352 insertions(+), 29 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC 1/5] sbm: x86: fix SBM error entry path
  2024-02-22 13:12 ` [RFC 0/5] PoC: convert AppArmor parser to " Petr Tesarik
@ 2024-02-22 13:12   ` Petr Tesarik
  2024-02-22 13:12   ` [RFC 2/5] sbm: enhance buffer mapping API Petr Tesarik
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Petr Tesarik @ 2024-02-22 13:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Petr Tesařík, Petr Tesarik, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

Normal interrupt entry from SBM should be generally treated as entry from
kernel mode (no swapgs, no speculation mitigations), but since there is a
CPL change, the interrupt handler runs on the trampoline stack, which may
get reused if the current task is re-scheduled.

Make sure to switch to the SBM exception stack.

Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
 arch/x86/entry/entry_64.S | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4ba3eea38102..96830591302d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1062,14 +1062,20 @@ SYM_CODE_START(error_entry)
 	/*
 	 * If sandbox mode was active, adjust the saved CS,
 	 * unconditionally switch to kernel CR3 and continue
-	 * as if the interrupt was from kernel space.
+	 * as if the interrupt was from kernel space, but
+	 * switch away from the trampoline stack.
 	 */
 	movq	x86_sbm_state + SBM_kernel_cr3, %rcx
 	jrcxz	.Lerror_swapgs
 
 	andb	$~3, CS+8(%rsp)
 	movq	%rcx, %cr3
-	jmp	.Lerror_entry_done_lfence
+
+	FENCE_SWAPGS_KERNEL_ENTRY
+	CALL_DEPTH_ACCOUNT
+	leaq	8(%rsp), %rdi
+	/* Put us onto the SBM exception stack. */
+	jmp	sync_regs
 #endif
 
 .Lerror_swapgs:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC 2/5] sbm: enhance buffer mapping API
  2024-02-22 13:12 ` [RFC 0/5] PoC: convert AppArmor parser to " Petr Tesarik
  2024-02-22 13:12   ` [RFC 1/5] sbm: x86: fix SBM error entry path Petr Tesarik
@ 2024-02-22 13:12   ` Petr Tesarik
  2024-02-22 13:12   ` [RFC 3/5] sbm: x86: infrastructure to fix up sandbox faults Petr Tesarik
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Petr Tesarik @ 2024-02-22 13:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Petr Tesařík, Petr Tesarik, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

Add SBM_MAP_READONLY() and SBM_MAP_WRITABLE() to the public API to allow
mapping kernel buffers directly into the sandbox with no copying.
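
A hypothetical caller, mirroring the pattern used later in patch 5/5,
might look like this (parse_blob() is a made-up sandboxed function;
sbm_init(), sbm_call(), sbm_destroy() and the SBM_DEFINE_* macros come
from the base series):

static SBM_DEFINE_CALL(parse_blob, const void *, data, size_t, size);
static SBM_DEFINE_THUNK(parse_blob, const void *, data, size_t, size);

static int parse_in_sandbox(const void *data, size_t size)
{
	struct sbm sbm;
	int err;

	sbm_init(&sbm);
	/* Page-granular read-only view of the input; no copy is made. */
	SBM_MAP_READONLY(&sbm, data, size);
	err = sbm_call(&sbm, parse_blob, data, size);
	sbm_destroy(&sbm);
	return err;
}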

Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
 include/linux/sbm.h | 71 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/sbm.c        | 34 ++++++++++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/include/linux/sbm.h b/include/linux/sbm.h
index 98fd27cd58d0..dbdc0781349f 100644
--- a/include/linux/sbm.h
+++ b/include/linux/sbm.h
@@ -181,6 +181,31 @@ static inline void *sbm_add_buf(struct sbm *sbm, struct sbm_buf **list,
 #define SBM_COPY_INOUT(sbm, buf, size) \
 	((typeof(({buf; })))sbm_add_buf((sbm), &(sbm)->io, (buf), (size)))
 
+/**
+ * sbm_map_readonly() - Map memory for reading.
+ * @sbm:   SBM instance.
+ * @ptr:   Starting virtual address.
+ * @size:  Size in bytes.
+ *
+ * Make the specified virtual address range readable in sandbox code.
+ *
+ * Return: Address of the buffer, or %NULL on error.
+ */
+void *sbm_map_readonly(struct sbm *sbm, const void *ptr, size_t size);
+
+/**
+ * sbm_map_writable() - Map memory for reading and writing.
+ * @sbm:   SBM instance.
+ * @ptr:   Starting virtual address.
+ * @size:  Size in bytes.
+ *
+ * Make the specified virtual address range readable and writable in sandbox
+ * code.
+ *
+ * Return: Address of the buffer, or %NULL on error.
+ */
+void *sbm_map_writable(struct sbm *sbm, const void *ptr, size_t size);
+
 #ifdef CONFIG_HAVE_ARCH_SBM
 
 /**
@@ -303,8 +328,54 @@ static inline int sbm_exec(struct sbm *sbm, sbm_func func, void *data)
 #define SBM_COPY_OUT(sbm, buf, size) __SBM_EVAL(buf)
 #define SBM_COPY_INOUT(sbm, buf, size) __SBM_EVAL(buf)
 
+static inline void *sbm_map_readonly(struct sbm *sbm, const void *ptr,
+				     size_t size)
+{
+	return (void *)ptr;
+}
+
+static inline void *sbm_map_writable(struct sbm *sbm, const void *ptr,
+				     size_t size)
+{
+	return (void *)ptr;
+}
+
 #endif /* CONFIG_SANDBOX_MODE */
 
+/**
+ * SBM_MAP_READONLY() - Map an input buffer into SBM.
+ * @sbm:   SBM instance.
+ * @buf:   Buffer virtual address.
+ * @size:  Size of the buffer.
+ *
+ * Make a read-only mapping of buffer in sandbox mode.
+ *
+ * This works with page granularity. If the buffer is not page-aligned,
+ * some data before and/or after the buffer is also mapped into the sandbox.
+ * The mapping does not provide guard pages either.
+ *
+ * Return: Buffer address in sandbox mode (same as kernel mode).
+ */
+#define SBM_MAP_READONLY(sbm, buf, size) \
+	((typeof(({buf; })))sbm_map_readonly((sbm), (buf), (size)))
+
+/**
+ * SBM_MAP_WRITABLE() - Map an input/output buffer into SBM.
+ * @sbm:   SBM instance.
+ * @buf:   Buffer virtual address.
+ * @size:  Size of the buffer.
+ *
+ * Make a writable mapping of buffer in sandbox mode.
+ *
+ * This works with page granularity. If the buffer is not page-aligned,
+ * some data before and/or after the buffer is also mapped into the sandbox.
+ * The mapping does not provide guard pages either.
+ *
+ * Return: Buffer address in sandbox mode (same as kernel mode).
+ */
+#define SBM_MAP_WRITABLE(sbm, buf, size) \
+	((typeof(({buf; })))sbm_map_writable((sbm), (buf), (size)))
+
 /**
  * __SBM_MAP() - Convert parameters to comma-separated expressions.
  * @m: Macro used to convert each pair.
diff --git a/kernel/sbm.c b/kernel/sbm.c
index df57184f5d87..c832808b538e 100644
--- a/kernel/sbm.c
+++ b/kernel/sbm.c
@@ -71,6 +71,40 @@ void sbm_destroy(struct sbm *sbm)
 }
 EXPORT_SYMBOL(sbm_destroy);
 
+void *sbm_map_readonly(struct sbm *sbm, const void *ptr, size_t size)
+{
+	struct sbm_buf buf;
+
+	if (sbm->error)
+		return NULL;
+
+	buf.sbm_ptr = (void *)ptr;
+	buf.size = size;
+	sbm->error = arch_sbm_map_readonly(sbm, &buf);
+	if (sbm->error)
+		return NULL;
+
+	return buf.sbm_ptr;
+}
+EXPORT_SYMBOL(sbm_map_readonly);
+
+void *sbm_map_writable(struct sbm *sbm, const void *ptr, size_t size)
+{
+	struct sbm_buf buf;
+
+	if (sbm->error)
+		return NULL;
+
+	buf.sbm_ptr = (void *)ptr;
+	buf.size = size;
+	sbm->error = arch_sbm_map_writable(sbm, &buf);
+	if (sbm->error)
+		return NULL;
+
+	return buf.sbm_ptr;
+}
+EXPORT_SYMBOL(sbm_map_writable);
+
 /* Copy input buffers into a sandbox. */
 static int sbm_copy_in(struct sbm *sbm)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC 3/5] sbm: x86: infrastructure to fix up sandbox faults
  2024-02-22 13:12 ` [RFC 0/5] PoC: convert AppArmor parser to " Petr Tesarik
  2024-02-22 13:12   ` [RFC 1/5] sbm: x86: fix SBM error entry path Petr Tesarik
  2024-02-22 13:12   ` [RFC 2/5] sbm: enhance buffer mapping API Petr Tesarik
@ 2024-02-22 13:12   ` Petr Tesarik
  2024-02-22 13:12   ` [RFC 4/5] sbm: fix up calls to dynamic memory allocators Petr Tesarik
  2024-02-22 13:12   ` [RFC 5/5] apparmor: parse profiles in sandbox mode Petr Tesarik
  4 siblings, 0 replies; 10+ messages in thread
From: Petr Tesarik @ 2024-02-22 13:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Petr Tesařík, Petr Tesarik, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

Since sandbox mode cannot modify kernel data, much of the core API cannot
be used directly. Provide a method to call a known subset of kernel
functions from the sandbox fault handler on behalf of the sandbox code.

Since SBM permissions have page granularity, the code of an intercepted
function must not be in the same page as another function running in
sandbox mode. Provide a __nosbm marker to move the intercepted functions
into a special ELF section, align it to page boundaries and map it so that
it is not executable in sandbox mode. To minimize alignment padding, merge
the __nosbm section with the kernel entry code.
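
For instance, intercepting calls to a hypothetical helper would take a
__nosbm annotation on its definition plus one entry in the fixup table
added below (do_helper() is made up; x86_sbm_proxy_call() forwards the
register arguments unchanged):

/* Built into the page-aligned .text.nosbm section: */
int __nosbm do_helper(void *arg)
{
	/* ... runs in kernel mode, reached via the fault fixup ... */
	return 0;
}

static const struct sbm_fixup fixups[] = {
	{ do_helper, x86_sbm_proxy_call },
	{ }
};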

Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
 arch/x86/kernel/sbm/call_64.S | 20 +++++++++++
 arch/x86/kernel/sbm/core.c    | 65 +++++++++++++++++++++++++++++++++--
 arch/x86/kernel/vmlinux.lds.S |  9 +++++
 include/linux/sbm.h           |  6 ++++
 4 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 21edce5666bc..6d8ae30a0984 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -93,3 +93,23 @@ SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL)
 	pop	%rbp
 	RET
 SYM_FUNC_END(x86_sbm_exec)
+
+.text
+
+/*
+ * arguments:
+ * rdi  .. state (ignored)
+ * rsi  .. target function
+ * rdx  .. struct pt_regs
+*/
+SYM_FUNC_START(x86_sbm_proxy_call)
+	mov	%rdx, %r10
+	mov	%rsi, %r11
+	mov	pt_regs_di(%r10), %rdi
+	mov	pt_regs_si(%r10), %rsi
+	mov	pt_regs_dx(%r10), %rdx
+	mov	pt_regs_cx(%r10), %rcx
+	mov	pt_regs_r8(%r10), %r8
+	mov	pt_regs_r9(%r10), %r9
+	JMP_NOSPEC r11
+SYM_FUNC_END(x86_sbm_proxy_call)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index 296f1fde3c22..c8ac7ecb08cc 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -28,6 +28,60 @@ asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
 			    unsigned long exc_tos);
 extern char x86_sbm_return[];
 
+extern char __nosbm_text_start[], __nosbm_text_end[];
+
+/*************************************************************
+ * HACK: PROOF-OF-CONCEPT FIXUP CODE STARTS HERE
+ */
+
+typedef unsigned long (*sbm_proxy_call_fn)(struct x86_sbm_state *,
+					   unsigned long func,
+					   struct pt_regs *);
+
+asmlinkage unsigned long x86_sbm_proxy_call(struct x86_sbm_state *state,
+					    unsigned long func,
+					    struct pt_regs *regs);
+
+/**
+ * struct sbm_fixup - Describe a sandbox fault fixup.
+ * @target:  Target function to be called.
+ * @proxy:   Proxy call function.
+ */
+struct sbm_fixup {
+	void *target;
+	sbm_proxy_call_fn proxy;
+};
+
+static const struct sbm_fixup fixups[] =
+{
+	{ }
+};
+
+/* Fix up a page fault if it is one of the known exceptions. */
+static bool fixup_sbm_call(struct x86_sbm_state *state,
+			   struct pt_regs *regs, unsigned long address)
+{
+	const struct sbm_fixup *fixup;
+
+	for (fixup = fixups; fixup->target; ++fixup) {
+		if (address == (unsigned long)fixup->target) {
+			regs->ax = fixup->proxy(state, address, regs);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/* Execution in sandbox mode continues here after fixup. */
+static void x86_sbm_continue(void)
+{
+}
+
+/*
+ * HACK: PROOF-OF-CONCEPT FIXUP CODE ENDS HERE
+ *************************************************************/
+
 union {
 	struct x86_sbm_state state;
 	char page[PAGE_SIZE];
@@ -140,8 +194,8 @@ static int map_kernel(struct x86_sbm_state *state)
 	if (err)
 		return err;
 
-	err = map_range(state, (unsigned long)__entry_text_start,
-			(unsigned long)__entry_text_end, PAGE_KERNEL_ROX);
+	err = map_range(state, (unsigned long)__nosbm_text_start,
+			(unsigned long)__nosbm_text_end, PAGE_KERNEL_ROX);
 	if (err)
 		return err;
 
@@ -482,6 +536,13 @@ void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code,
 	if (spurious_sbm_fault(state, error_code, address))
 		return;
 
+	if ((error_code & ~X86_PF_PROT) == (X86_PF_USER | X86_PF_INSTR) &&
+	    fixup_sbm_call(state, regs, address)) {
+		/* Return back to sandbox... */
+		regs->ip = (unsigned long)x86_sbm_continue;
+		return;
+	}
+
 	/*
 	 * Force -EFAULT unless the fault was due to a user-mode instruction
 	 * fetch from the designated return address.
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index a349dbfc6d5a..c530a7faaa9a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -139,8 +139,17 @@ SECTIONS
 		STATIC_CALL_TEXT
 
 		ALIGN_ENTRY_TEXT_BEGIN
+#ifdef CONFIG_SANDBOX_MODE
+		. = ALIGN(PAGE_SIZE);
+		__nosbm_text_start = .;
+#endif
 		*(.text..__x86.rethunk_untrain)
 		ENTRY_TEXT
+#ifdef CONFIG_SANDBOX_MODE
+		*(.text.nosbm)
+		. = ALIGN(PAGE_SIZE);
+		__nosbm_text_end = .;
+#endif
 
 #ifdef CONFIG_CPU_SRSO
 		/*
diff --git a/include/linux/sbm.h b/include/linux/sbm.h
index dbdc0781349f..9d7eb525e489 100644
--- a/include/linux/sbm.h
+++ b/include/linux/sbm.h
@@ -267,6 +267,8 @@ int arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf);
  */
 int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data);
 
+#define __nosbm __section(".text.nosbm")
+
 #else /* !CONFIG_HAVE_ARCH_SBM */
 
 static inline int arch_sbm_init(struct sbm *sbm)
@@ -295,6 +297,8 @@ static inline int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data)
 	return func(data);
 }
 
+#define __nosbm
+
 #endif /* CONFIG_HAVE_ARCH_SBM */
 
 #else /* !CONFIG_SANDBOX_MODE */
@@ -340,6 +344,8 @@ static inline void *sbm_map_writable(struct sbm *sbm, const void *ptr,
 	return (void *)ptr;
 }
 
+#define __nosbm
+
 #endif /* CONFIG_SANDBOX_MODE */
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC 4/5] sbm: fix up calls to dynamic memory allocators
  2024-02-22 13:12 ` [RFC 0/5] PoC: convert AppArmor parser to " Petr Tesarik
                     ` (2 preceding siblings ...)
  2024-02-22 13:12   ` [RFC 3/5] sbm: x86: infrastructure to fix up sandbox faults Petr Tesarik
@ 2024-02-22 13:12   ` Petr Tesarik
  2024-02-22 15:51     ` Dave Hansen
  2024-02-22 13:12   ` [RFC 5/5] apparmor: parse profiles in sandbox mode Petr Tesarik
  4 siblings, 1 reply; 10+ messages in thread
From: Petr Tesarik @ 2024-02-22 13:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Petr Tesařík, Petr Tesarik, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

Add fixup functions to call kmalloc(), vmalloc() and friends on behalf of
the sandbox code.

Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
 arch/x86/kernel/sbm/core.c | 81 ++++++++++++++++++++++++++++++++++++++
 mm/slab_common.c           |  3 +-
 mm/slub.c                  | 17 ++++----
 mm/vmalloc.c               | 11 +++---
 4 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index c8ac7ecb08cc..3cf3842292b9 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -20,6 +20,12 @@
 #include <linux/sbm.h>
 #include <linux/sched/task_stack.h>
 
+/*
+ * FIXME: Remove these includes when there is proper API for defining
+ * which functions can be called from sandbox mode.
+ */
+#include <linux/vmalloc.h>
+
 #define GFP_SBM_PGTABLE	(GFP_KERNEL | __GFP_ZERO)
 #define PGD_ORDER	get_order(sizeof(pgd_t) * PTRS_PER_PGD)
 
@@ -52,8 +58,83 @@ struct sbm_fixup {
 	sbm_proxy_call_fn proxy;
 };
 
+static int map_range(struct x86_sbm_state *state, unsigned long start,
+		     unsigned long end, pgprot_t prot);
+
+/* Map the newly allocated dynamic memory region. */
+static unsigned long post_alloc(struct x86_sbm_state *state,
+				unsigned long objp, size_t size)
+{
+	int err;
+
+	if (!objp)
+		return objp;
+
+	err = map_range(state, objp, objp + size, PAGE_SHARED);
+	if (err) {
+		kfree((void*)objp);
+		return 0UL;
+	}
+	return objp;
+}
+
+/* Allocation proxy handler if size is the 1st parameter. */
+static unsigned long proxy_alloc1(struct x86_sbm_state *state,
+				    unsigned long func, struct pt_regs *regs)
+{
+	unsigned long objp;
+
+	objp = x86_sbm_proxy_call(state, func, regs);
+	return post_alloc(state, objp, regs->di);
+}
+
+/* Allocation proxy handler if size is the 2nd parameter. */
+static unsigned long proxy_alloc2(struct x86_sbm_state *state,
+				    unsigned long func, struct pt_regs *regs)
+{
+	unsigned long objp;
+
+	objp = x86_sbm_proxy_call(state, func, regs);
+	return post_alloc(state, objp, regs->si);
+}
+
+/* Allocation proxy handler if size is the 3rd parameter. */
+static unsigned long proxy_alloc3(struct x86_sbm_state *state,
+				    unsigned long func, struct pt_regs *regs)
+{
+	unsigned long objp;
+
+	objp = x86_sbm_proxy_call(state, func, regs);
+	return post_alloc(state, objp, regs->dx);
+}
+
+/* Proxy handler to free previously allocated memory. */
+static unsigned long proxy_free(struct x86_sbm_state *state,
+				unsigned long func, struct pt_regs *regs)
+{
+	/* TODO: unmap allocated addresses from sandbox! */
+	return x86_sbm_proxy_call(state, func, regs);
+}
+
 static const struct sbm_fixup fixups[] =
 {
+	/* kmalloc() and friends */
+	{ kmalloc_trace, proxy_alloc3 },
+	{ __kmalloc, proxy_alloc1 },
+	{ __kmalloc_node, proxy_alloc1 },
+	{ __kmalloc_node_track_caller, proxy_alloc1 },
+	{ kmalloc_large, proxy_alloc1 },
+	{ kmalloc_large_node, proxy_alloc1 },
+	{ krealloc, proxy_alloc2 },
+	{ kfree, proxy_free },
+
+	/* vmalloc() and friends */
+	{ vmalloc, proxy_alloc1 },
+	{ __vmalloc, proxy_alloc1 },
+	{ __vmalloc_node, proxy_alloc1 },
+	{ vzalloc, proxy_alloc1 },
+	{ vfree, proxy_free },
+
 	{ }
 };
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 238293b1dbe1..2b72118d9bfa 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -28,6 +28,7 @@
 #include <asm/page.h>
 #include <linux/memcontrol.h>
 #include <linux/stackdepot.h>
+#include <linux/sbm.h>
 
 #include "internal.h"
 #include "slab.h"
@@ -1208,7 +1209,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
  *
  * Return: pointer to the allocated memory or %NULL in case of error
  */
-void *krealloc(const void *p, size_t new_size, gfp_t flags)
+void * __nosbm krealloc(const void *p, size_t new_size, gfp_t flags)
 {
 	void *ret;
 
diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..5f2290fe4df0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -42,6 +42,7 @@
 #include <kunit/test.h>
 #include <kunit/test-bug.h>
 #include <linux/sort.h>
+#include <linux/sbm.h>
 
 #include <linux/debugfs.h>
 #include <trace/events/kmem.h>
@@ -3913,7 +3914,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);
  * directly to the page allocator. We use __GFP_COMP, because we will need to
  * know the allocation order to free the pages properly in kfree.
  */
-static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
+static void * __nosbm __kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct folio *folio;
 	void *ptr = NULL;
@@ -3938,7 +3939,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
 	return ptr;
 }
 
-void *kmalloc_large(size_t size, gfp_t flags)
+void * __nosbm kmalloc_large(size_t size, gfp_t flags)
 {
 	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
 
@@ -3983,26 +3984,26 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
 	return ret;
 }
 
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+void * __nosbm __kmalloc_node(size_t size, gfp_t flags, int node)
 {
 	return __do_kmalloc_node(size, flags, node, _RET_IP_);
 }
 EXPORT_SYMBOL(__kmalloc_node);
 
-void *__kmalloc(size_t size, gfp_t flags)
+void * __nosbm __kmalloc(size_t size, gfp_t flags)
 {
 	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
 }
 EXPORT_SYMBOL(__kmalloc);
 
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
-				  int node, unsigned long caller)
+void * __nosbm __kmalloc_node_track_caller(size_t size, gfp_t flags,
+					   int node, unsigned long caller)
 {
 	return __do_kmalloc_node(size, flags, node, caller);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
 
-void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+void * __nosbm kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
 					    _RET_IP_, size);
@@ -4386,7 +4387,7 @@ static void free_large_kmalloc(struct folio *folio, void *object)
  *
  * If @object is NULL, no operation is performed.
  */
-void kfree(const void *object)
+void __nosbm kfree(const void *object)
 {
 	struct folio *folio;
 	struct slab *slab;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..d7a5b715ac03 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -40,6 +40,7 @@
 #include <linux/pgtable.h>
 #include <linux/hugetlb.h>
 #include <linux/sched/mm.h>
+#include <linux/sbm.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -2804,7 +2805,7 @@ void vfree_atomic(const void *addr)
  * if we have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, but making the calling
  * conventions for vfree() arch-dependent would be a really bad idea).
  */
-void vfree(const void *addr)
+void __nosbm vfree(const void *addr)
 {
 	struct vm_struct *vm;
 	int i;
@@ -3379,7 +3380,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *__vmalloc_node(unsigned long size, unsigned long align,
+void * __nosbm __vmalloc_node(unsigned long size, unsigned long align,
 			    gfp_t gfp_mask, int node, const void *caller)
 {
 	return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
@@ -3394,7 +3395,7 @@ void *__vmalloc_node(unsigned long size, unsigned long align,
 EXPORT_SYMBOL_GPL(__vmalloc_node);
 #endif
 
-void *__vmalloc(unsigned long size, gfp_t gfp_mask)
+void * __nosbm __vmalloc(unsigned long size, gfp_t gfp_mask)
 {
 	return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE,
 				__builtin_return_address(0));
@@ -3413,7 +3414,7 @@ EXPORT_SYMBOL(__vmalloc);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc(unsigned long size)
+void * __nosbm vmalloc(unsigned long size)
 {
 	return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE,
 				__builtin_return_address(0));
@@ -3453,7 +3454,7 @@ EXPORT_SYMBOL_GPL(vmalloc_huge);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vzalloc(unsigned long size)
+void * __nosbm vzalloc(unsigned long size)
 {
 	return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
 				__builtin_return_address(0));
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC 5/5] apparmor: parse profiles in sandbox mode
  2024-02-22 13:12 ` [RFC 0/5] PoC: convert AppArmor parser to " Petr Tesarik
                     ` (3 preceding siblings ...)
  2024-02-22 13:12   ` [RFC 4/5] sbm: fix up calls to dynamic memory allocators Petr Tesarik
@ 2024-02-22 13:12   ` Petr Tesarik
  4 siblings, 0 replies; 10+ messages in thread
From: Petr Tesarik @ 2024-02-22 13:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Petr Tesařík, Petr Tesarik, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

Run aa_unpack() in sandbox mode. Map the input data read-only and then walk
the resulting list created in sandbox mode.

Hashes are calculated in kernel mode, because crypto routines are not
sandboxed. The fixups should sanitize the parameters of AppArmor functions
and they should be defined close to the target functions. Both requires
extending the generic API and adding some more arch hooks, which would grow
this patch series too much.

For demonstration purposes, the fixups blindly trust the input from sandbox
mode and are hard-wired in the arch code.

Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
 arch/x86/kernel/sbm/core.c | 15 +++++++++++++++
 security/apparmor/crypto.c |  7 ++++---
 security/apparmor/policy.c | 29 ++++++++++++++++++++++-------
 security/apparmor/secid.c  |  3 ++-
 4 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index 3cf3842292b9..3268c00da873 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -40,6 +40,16 @@ extern char __nosbm_text_start[], __nosbm_text_end[];
  * HACK: PROOF-OF-CONCEPT FIXUP CODE STARTS HERE
  */
 
+/*
+ * HACK: These declarations are needed to make a proxy call, but in the
+ * final version, AppArmor itself will define the proxies. At least, do
+ * not make the functions callable from here. All we need are their
+ * entry point addresses.
+ */
+extern char aa_alloc_secid[];
+extern char aa_calc_hash[];
+extern char aa_calc_profile_hash[];
+
 typedef unsigned long (*sbm_proxy_call_fn)(struct x86_sbm_state *,
 					   unsigned long func,
 					   struct pt_regs *);
@@ -135,6 +145,11 @@ static const struct sbm_fixup fixups[] =
 	{ vzalloc, proxy_alloc1 },
 	{ vfree, proxy_free },
 
+	/* AppArmor */
+	{ aa_alloc_secid, x86_sbm_proxy_call },
+	{ aa_calc_hash, x86_sbm_proxy_call },
+	{ aa_calc_profile_hash, x86_sbm_proxy_call },
+
 	{ }
 };
 
diff --git a/security/apparmor/crypto.c b/security/apparmor/crypto.c
index aad486b2fca6..db349cd4e467 100644
--- a/security/apparmor/crypto.c
+++ b/security/apparmor/crypto.c
@@ -12,6 +12,7 @@
  */
 
 #include <crypto/hash.h>
+#include <linux/sbm.h>
 
 #include "include/apparmor.h"
 #include "include/crypto.h"
@@ -25,7 +26,7 @@ unsigned int aa_hash_size(void)
 	return apparmor_hash_size;
 }
 
-char *aa_calc_hash(void *data, size_t len)
+char * __nosbm aa_calc_hash(void *data, size_t len)
 {
 	SHASH_DESC_ON_STACK(desc, apparmor_tfm);
 	char *hash;
@@ -58,8 +59,8 @@ char *aa_calc_hash(void *data, size_t len)
 	return ERR_PTR(error);
 }
 
-int aa_calc_profile_hash(struct aa_profile *profile, u32 version, void *start,
-			 size_t len)
+int __nosbm aa_calc_profile_hash(struct aa_profile *profile, u32 version, void *start,
+				 size_t len)
 {
 	SHASH_DESC_ON_STACK(desc, apparmor_tfm);
 	int error;
diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index 957654d253dd..f2b9bf851be0 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -74,6 +74,7 @@
 #include <linux/cred.h>
 #include <linux/rculist.h>
 #include <linux/user_namespace.h>
+#include <linux/sbm.h>
 
 #include "include/apparmor.h"
 #include "include/capability.h"
@@ -1040,6 +1041,11 @@ static struct aa_profile *update_to_newest_parent(struct aa_profile *new)
 	return newest;
 }
 
+static SBM_DEFINE_CALL(aa_unpack, struct aa_loaddata *, udata,
+		       struct list_head *, lh, const char **, ns);
+static SBM_DEFINE_THUNK(aa_unpack, struct aa_loaddata *, udata,
+		       struct list_head *, lh, const char **, ns);
+
 /**
  * aa_replace_profiles - replace profile(s) on the profile list
  * @policy_ns: namespace load is occurring on
@@ -1063,12 +1069,20 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
 	struct aa_loaddata *rawdata_ent;
 	const char *op;
 	ssize_t count, error;
-	LIST_HEAD(lh);
+	struct list_head lh, *sbm_lh;
+	struct sbm sbm;
 
 	op = mask & AA_MAY_REPLACE_POLICY ? OP_PROF_REPL : OP_PROF_LOAD;
 	aa_get_loaddata(udata);
 	/* released below */
-	error = aa_unpack(udata, &lh, &ns_name);
+	sbm_init(&sbm);
+	SBM_MAP_READONLY(&sbm, udata->data, udata->size);
+	/* TODO: Handling of list heads could be improved */
+	sbm_lh = SBM_COPY_OUT(&sbm, &lh, sizeof(lh));
+	INIT_LIST_HEAD(sbm_lh);
+	error = sbm_call(&sbm, aa_unpack,
+			 SBM_COPY_INOUT(&sbm, udata, sizeof(*udata)), sbm_lh,
+			 SBM_COPY_OUT(&sbm, &ns_name, sizeof(ns_name)));
 	if (error)
 		goto out;
 
@@ -1078,7 +1092,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
 	 *       fail. Sort ent list and take ns locks in hierarchy order
 	 */
 	count = 0;
-	list_for_each_entry(ent, &lh, list) {
+	list_for_each_entry(ent, sbm_lh, list) {
 		if (ns_name) {
 			if (ent->ns_name &&
 			    strcmp(ent->ns_name, ns_name) != 0) {
@@ -1128,7 +1142,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
 		}
 	}
 	/* setup parent and ns info */
-	list_for_each_entry(ent, &lh, list) {
+	list_for_each_entry(ent, sbm_lh, list) {
 		struct aa_policy *policy;
 		struct aa_profile *p;
 
@@ -1159,7 +1173,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
 		policy = __lookup_parent(ns, ent->new->base.hname);
 		if (!policy) {
 			/* first check for parent in the load set */
-			p = __list_lookup_parent(&lh, ent->new);
+			p = __list_lookup_parent(sbm_lh, ent->new);
 			if (!p) {
 				/*
 				 * fill in missing parent with null
@@ -1198,7 +1212,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
 			goto fail_lock;
 		}
 	}
-	list_for_each_entry(ent, &lh, list) {
+	list_for_each_entry(ent, sbm_lh, list) {
 		if (!ent->old) {
 			struct dentry *parent;
 			if (rcu_access_pointer(ent->new->parent)) {
@@ -1220,7 +1234,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
 	__aa_bump_ns_revision(ns);
 	if (aa_g_export_binary)
 		__aa_loaddata_update(udata, ns->revision);
-	list_for_each_entry_safe(ent, tmp, &lh, list) {
+	list_for_each_entry_safe(ent, tmp, sbm_lh, list) {
 		list_del_init(&ent->list);
 		op = (!ent->old && !ent->rename) ? OP_PROF_LOAD : OP_PROF_REPL;
 
@@ -1265,6 +1279,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
 	mutex_unlock(&ns->lock);
 
 out:
+	sbm_destroy(&sbm);
 	aa_put_ns(ns);
 	aa_put_loaddata(udata);
 	kfree(ns_name);
diff --git a/security/apparmor/secid.c b/security/apparmor/secid.c
index 83d3d1e6d9dc..4190666d2dee 100644
--- a/security/apparmor/secid.c
+++ b/security/apparmor/secid.c
@@ -16,6 +16,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/xarray.h>
+#include <linux/sbm.h>
 
 #include "include/cred.h"
 #include "include/lib.h"
@@ -116,7 +117,7 @@ void apparmor_release_secctx(char *secdata, u32 seclen)
  * Returns: 0 with @label->secid initialized
  *          <0 returns error with @label->secid set to AA_SECID_INVALID
  */
-int aa_alloc_secid(struct aa_label *label, gfp_t gfp)
+int __nosbm aa_alloc_secid(struct aa_label *label, gfp_t gfp)
 {
 	unsigned long flags;
 	int ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC 4/5] sbm: fix up calls to dynamic memory allocators
  2024-02-22 13:12   ` [RFC 4/5] sbm: fix up calls to dynamic memory allocators Petr Tesarik
@ 2024-02-22 15:51     ` Dave Hansen
  2024-02-22 17:57       ` Petr Tesařík
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2024-02-22 15:51 UTC (permalink / raw)
  To: Petr Tesarik
  Cc: Petr Tesařík, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

On 2/22/24 05:12, Petr Tesarik wrote:
>  static const struct sbm_fixup fixups[] =
>  {
> +	/* kmalloc() and friends */
> +	{ kmalloc_trace, proxy_alloc3 },
> +	{ __kmalloc, proxy_alloc1 },
> +	{ __kmalloc_node, proxy_alloc1 },
> +	{ __kmalloc_node_track_caller, proxy_alloc1 },
> +	{ kmalloc_large, proxy_alloc1 },
> +	{ kmalloc_large_node, proxy_alloc1 },
> +	{ krealloc, proxy_alloc2 },
> +	{ kfree, proxy_free },
> +
> +	/* vmalloc() and friends */
> +	{ vmalloc, proxy_alloc1 },
> +	{ __vmalloc, proxy_alloc1 },
> +	{ __vmalloc_node, proxy_alloc1 },
> +	{ vzalloc, proxy_alloc1 },
> +	{ vfree, proxy_free },
> +
>  	{ }
>  };

Petr, thanks for sending this.  This _is_ a pretty concise example of
what it means to convert kernel code to run in your sandbox mode.  But,
from me, it's still "no thanks".

Establishing and maintaining this proxy list will be painful.  Folks
will change the code to call something new and break this *constantly*.

That goes for infrastructure like the allocators and for individual
sandbox instances like apparmor.

It's also telling that sandboxing a bit of apparmor took four fixups.
That tells me we're probably still only looking at the tip of the iceberg
if we were to convert a bunch more sites.

That's on top of everything I was concerned about before.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 4/5] sbm: fix up calls to dynamic memory allocators
  2024-02-22 15:51     ` Dave Hansen
@ 2024-02-22 17:57       ` Petr Tesařík
  2024-02-22 18:03         ` Dave Hansen
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Tesařík @ 2024-02-22 17:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Petr Tesarik, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

On Thu, 22 Feb 2024 07:51:00 -0800
Dave Hansen <dave.hansen@intel.com> wrote:

> On 2/22/24 05:12, Petr Tesarik wrote:
> >  static const struct sbm_fixup fixups[] =
> >  {
> > +	/* kmalloc() and friends */
> > +	{ kmalloc_trace, proxy_alloc3 },
> > +	{ __kmalloc, proxy_alloc1 },
> > +	{ __kmalloc_node, proxy_alloc1 },
> > +	{ __kmalloc_node_track_caller, proxy_alloc1 },
> > +	{ kmalloc_large, proxy_alloc1 },
> > +	{ kmalloc_large_node, proxy_alloc1 },
> > +	{ krealloc, proxy_alloc2 },
> > +	{ kfree, proxy_free },
> > +
> > +	/* vmalloc() and friends */
> > +	{ vmalloc, proxy_alloc1 },
> > +	{ __vmalloc, proxy_alloc1 },
> > +	{ __vmalloc_node, proxy_alloc1 },
> > +	{ vzalloc, proxy_alloc1 },
> > +	{ vfree, proxy_free },
> > +
> >  	{ }
> >  };  
> 
> Petr, thanks for sending this.  This _is_ a pretty concise example of
> what it means to convert kernel code to run in your sandbox mode.  But,
> from me, it's still "no thanks".
> 
> Establishing and maintaining this proxy list will be painful.  Folks
> will change the code to call something new and break this *constantly*.
> 
> That goes for infrastructure like the allocators and for individual
> sandbox instances like apparmor.

Understood.

OTOH the proxy list is here for the PoC so I could send something that
builds and runs without making it an overly big patch series. As
explained in patch 5/5, the goal is not to make a global list. Instead,
each instance should define what it needs and that way define its
specific policy of interfacing with the rest of the kernel.
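
As a rough illustration of that direction (none of this API exists
yet; the entries just reuse the PoC's sbm_fixup layout and proxies):

/* Hypothetical per-instance policy for the aa_unpack() sandbox. */
static const struct sbm_fixup aa_unpack_fixups[] = {
	{ __kmalloc,      proxy_alloc1 },
	{ kfree,          proxy_free },
	{ aa_alloc_secid, x86_sbm_proxy_call },
	{ }
};

/* Attached when the sandbox is set up, e.g. through a (hypothetical)
 * sbm_set_fixups(&sbm, aa_unpack_fixups) call, instead of being
 * hard-wired in a global table in the arch code. */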

To give an example, these AppArmor fixups would be added only to the
sandbox which runs aa_unpack(), but not to the one which runs
unpack_to_rootfs(), which is another PoC I did (but required porting
more patches).

If more fixups are needed after you change your code, you know you've
just added a new dependency. It's then up to you to decide if it was
intentional.

> It's also telling that sandboxing a bit of apparmor took four fixups.
> That tells me we're probably still only looking at the tip of the iceberg
> if we were to convert a bunch more sites.

Yes, it is the cost paid for getting code and data flows under control.

In your opinion this kind of memory safety is not worth the effort of
explicitly defining the interface between a sandboxed component and the
rest of the kernel, because it increases maintenance costs. Correct?

> That's on top of everything I was concerned about before.

Good, I think I can understand the new concern, but regarding
everything you were concerned about before, this part is still not
quite clear to me. I'll try to summarize the points:

* Running code in ring-0 is inherently safer than running code in
  ring-3.

  Since what I'm trying to do is protect kernel data structures
  from memory safety bugs in another part of the kernel, it roughly
  translates to: "Kernel data structures are better protected from
  rogue kernel modules than from userspace applications." This cannot
  possibly be what you are trying to say.

* SMAP, SMEP and/or LASS can somehow protect one part of the kernel
  from memory safety bugs in another part of the kernel.

  I somehow can't see how that is the case. I have always thought that:

  * SMEP prevents the kernel from executing code in user pages.
  * SMAP prevents the kernel from reading from or writing to user pages.
  * LASS does pretty much the same job as SMEP+SMAP, but instead of
    using page table protection bits, it uses the highest bit of the
    virtual address because that's much faster.

* Hardware designers are adding (other) hardware security defenses to
  ring-0 that are not applied to ring-3.

  Could you give an example of these other security defenses, please?

* Ring-3 is more exposed to attacks.

  This statement sounds a bit too vague on its own. What attack vectors
  are we talking about? The primary attack vector that SBM is trying to
  address is exploits of kernel code vulnerabilities triggered by data
  from sources outside the kernel (boot loader, userspace, etc.).

H. Peter Anvin added a few other points:

* SBM has all the downsides of a microkernel without the upsides.

  I can only guess what would be the downsides and upsides...

  One notorious downside is performance. Agreed, there is some overhead.
  I'm not promoting SBM for time-critical operations. But compared to
  user-mode helpers (which was suggested as an alternative for one of
  the proposed scenarios), the overhead of SBM is at least an order of
  magnitude less.

  IPC and the need to define how servers interact with each other is
  another downside I can think of. Yes, there is a bit of it in SBM, as
  you have correctly noted above.

* SBM introduces architectural changes that are most definitely *very*
  harmful both to maintainers and users.

  It is very difficult to learn something from this statement. Could
  you give some examples of how SBM harms either group, please?

* SBM feels like paravirtualization all over again.

  All right, hpa, you've had lots of pain with paravirtualization. I
  feel for you, I've had my share of it too. Can you imagine how much
  trouble I could have spared myself for the libkdumpfile project if I
  didn't have to deal with the difference between "physical addresses"
  and "machine addresses"?

  However, this is hardly a relevant point. The Linux kernel community
  is respected for making decisions based on facts, not feelings.

* SBM exposes kernel memory to user space.

  This is a misunderstanding. Sandbox mode does not share anything at
  all with user mode. It does share some CPU state with kernel mode,
  but not with user mode. If "user space" was intended to mean "Ring-3",
  then it doesn't explain how that is a really bad idea.

* SBM is not needed, because there is already eBPF.

  Well, yes, but I believe they work on a different level. For example,
  eBPF needs a verifier to ensure memory safety. If you run eBPF code
  itself in a sandbox instead, that verifier is not needed, because
  memory safety is enforced by CPU hardware.

When hpa says that SandBox Mode is "an enormous step in the wrong
direction", I want to understand why this direction is wrong, so I can
take a step in the right direction next time.

So far there has been only one objective concern: the need to track code
(and data) dependencies explicitly. AFAICS this is an inherent drawback
of any kind of program decomposition.

Is decomposition considered harmful?

Petr T

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 4/5] sbm: fix up calls to dynamic memory allocators
  2024-02-22 17:57       ` Petr Tesařík
@ 2024-02-22 18:03         ` Dave Hansen
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Hansen @ 2024-02-22 18:03 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: Petr Tesarik, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
	Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	Masami Hiramatsu (Google), Pengfei Xu, Josh Poimboeuf, Ze Gao,
	Kirill A. Shutemov, Kai Huang, David Woodhouse, Brian Gerst,
	Jason Gunthorpe, Joerg Roedel, Mike Rapoport (IBM), Tina Zhang,
	Jacob Pan, open list:DOCUMENTATION, open list, Roberto Sassu,
	John Johansen, Paul Moore, James Morris, Serge E. Hallyn,
	apparmor, linux-security-module, Petr Tesarik

On 2/22/24 09:57, Petr Tesařík wrote:
> * Hardware designers are adding (other) hardware security defenses to
>   ring-0 that are not applied to ring-3.
> 
>   Could you give an example of these other security defenses, please?

Here's one example:

> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/data-dependent-prefetcher.html

"DDP is neither trained by nor triggered by supervisor-mode accesses."

But seriously, this is going to be my last message on this topic.  I
appreciate your enthusiasm, but I don't see any viable way forward for
this approach.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-02-22 18:03 UTC | newest]

Thread overview: 10+ messages
     [not found] <fb4a40c7-af9a-406a-95ab-406595f3ffe5@intel.com>
     [not found] ` <20240216152435.1575-1-petrtesarik@huaweicloud.com>
     [not found]   ` <c65eb8f1-2903-4043-a3ab-945d880043b5@intel.com>
     [not found]     ` <20240216170805.0d0decd5@meshulam.tesarici.cz>
     [not found]       ` <87y1bktjdk.fsf@meer.lwn.net>
2024-02-16 18:24         ` [RFC 0/8] PGP key parser using SandBox Mode Roberto Sassu
2024-02-22 13:12 ` [RFC 0/5] PoC: convert AppArmor parser to " Petr Tesarik
2024-02-22 13:12   ` [RFC 1/5] sbm: x86: fix SBM error entry path Petr Tesarik
2024-02-22 13:12   ` [RFC 2/5] sbm: enhance buffer mapping API Petr Tesarik
2024-02-22 13:12   ` [RFC 3/5] sbm: x86: infrastructure to fix up sandbox faults Petr Tesarik
2024-02-22 13:12   ` [RFC 4/5] sbm: fix up calls to dynamic memory allocators Petr Tesarik
2024-02-22 15:51     ` Dave Hansen
2024-02-22 17:57       ` Petr Tesařík
2024-02-22 18:03         ` Dave Hansen
2024-02-22 13:12   ` [RFC 5/5] apparmor: parse profiles in sandbox mode Petr Tesarik
