linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open
@ 2024-06-13 13:39 Adrian Ratiu
  2024-06-13 13:39 ` [PATCH v6 2/2] proc: restrict /proc/pid/mem Adrian Ratiu
  2024-06-17  8:48 ` [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open Christian Brauner
  0 siblings, 2 replies; 9+ messages in thread
From: Adrian Ratiu @ 2024-06-13 13:39 UTC
  To: linux-fsdevel
  Cc: linux-security-module, linux-kernel, linux-hardening, linux-doc,
	kernel, gbiv, ryanbeltran, inglorion, ajordanr, jorgelo,
	Adrian Ratiu, Jann Horn, Kees Cook, Christian Brauner, Jeff Xu,
	Kees Cook

The file struct is required in proc_mem_open() so its
f_mode can be checked when deciding whether to allow or
deny /proc/*/mem open requests via the new read/write
and foll_force restriction mechanism.

Thus, instead of passing the inode directly to the function,
we pass the file and get the inode from it.

Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Xu <jeffxu@google.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Reviewed-by: Kees Cook <kees@kernel.org>
---
No changes in v6
---
 fs/proc/base.c       | 6 +++---
 fs/proc/internal.h   | 2 +-
 fs/proc/task_mmu.c   | 6 +++---
 fs/proc/task_nommu.c | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 72a1acd03675..4c607089f66e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -794,9 +794,9 @@ static const struct file_operations proc_single_file_operations = {
 };
 
 
-struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
+struct mm_struct *proc_mem_open(struct file  *file, unsigned int mode)
 {
-	struct task_struct *task = get_proc_task(inode);
+	struct task_struct *task = get_proc_task(file->f_inode);
 	struct mm_struct *mm = ERR_PTR(-ESRCH);
 
 	if (task) {
@@ -816,7 +816,7 @@ struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
 
 static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
 {
-	struct mm_struct *mm = proc_mem_open(inode, mode);
+	struct mm_struct *mm = proc_mem_open(file, mode);
 
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index a71ac5379584..d38b2eea40d1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -295,7 +295,7 @@ struct proc_maps_private {
 #endif
 } __randomize_layout;
 
-struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
+struct mm_struct *proc_mem_open(struct file *file, unsigned int mode);
 
 extern const struct file_operations proc_pid_maps_operations;
 extern const struct file_operations proc_pid_numa_maps_operations;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f8d35f993fe5..fe3b2182b0aa 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -210,7 +210,7 @@ static int proc_maps_open(struct inode *inode, struct file *file,
 		return -ENOMEM;
 
 	priv->inode = inode;
-	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
 	if (IS_ERR(priv->mm)) {
 		int err = PTR_ERR(priv->mm);
 
@@ -1030,7 +1030,7 @@ static int smaps_rollup_open(struct inode *inode, struct file *file)
 		goto out_free;
 
 	priv->inode = inode;
-	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
 	if (IS_ERR(priv->mm)) {
 		ret = PTR_ERR(priv->mm);
 
@@ -1754,7 +1754,7 @@ static int pagemap_open(struct inode *inode, struct file *file)
 {
 	struct mm_struct *mm;
 
-	mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	mm = proc_mem_open(file, PTRACE_MODE_READ);
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
 	file->private_data = mm;
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index bce674533000..a8ab182a4ed1 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -259,7 +259,7 @@ static int maps_open(struct inode *inode, struct file *file,
 		return -ENOMEM;
 
 	priv->inode = inode;
-	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
 	if (IS_ERR(priv->mm)) {
 		int err = PTR_ERR(priv->mm);
 
-- 
2.44.2



* [PATCH v6 2/2] proc: restrict /proc/pid/mem
  2024-06-13 13:39 [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open Adrian Ratiu
@ 2024-06-13 13:39 ` Adrian Ratiu
  2024-06-17 18:00   ` Kees Cook
  2024-06-18 22:39   ` Jeff Xu
  2024-06-17  8:48 ` [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open Christian Brauner
  1 sibling, 2 replies; 9+ messages in thread
From: Adrian Ratiu @ 2024-06-13 13:39 UTC
  To: linux-fsdevel
  Cc: linux-security-module, linux-kernel, linux-hardening, linux-doc,
	kernel, gbiv, ryanbeltran, inglorion, ajordanr, jorgelo,
	Adrian Ratiu, Guenter Roeck, Doug Anderson, Kees Cook, Jann Horn,
	Andrew Morton, Randy Dunlap, Christian Brauner, Jeff Xu,
	Mike Frysinger

Prior to v2.6.39 write access to /proc/<pid>/mem was restricted,
after which it got allowed in commit 198214a7ee50 ("proc: enable
writing to /proc/pid/mem"). Famous last words from that patch:
"no longer a security hazard". :)

Afterwards, exploits started causing drama like [1]. Exploits
using /proc/*/mem can be rather sophisticated, like [2], which
installed an arbitrary payload from noexec storage into a running
process and then exec'd it; that payload could itself include an
ELF loader to run arbitrary code off noexec storage.

One of the well-known problems with /proc/*/mem writes is that
they ignore page permissions via FOLL_FORCE, unlike writes via
process_vm_writev(), which respect page permissions. These writes
can also be used to bypass mode bits.
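The asymmetry can be observed from user space. A minimal sketch, assuming Linux with /proc mounted; force_write_demo() and struct force_demo are hypothetical names. It makes a private anonymous page read-only and then tries to overwrite its first byte twice: with process_vm_writev(), which honours page permissions and is expected to fail, and through /proc/self/mem, which on kernels that still apply FOLL_FORCE is expected to succeed. With a foll_force or write restriction in effect, the second write should fail as well.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

struct force_demo {
	int pvw_ok;     /* 1 if process_vm_writev() modified the page */
	int procmem_ok; /* 1 if the /proc/self/mem pwrite() succeeded */
	char final;     /* first byte of the page after both attempts */
};

/* Returns 0 on success (results in *out), -1 if the page setup failed. */
static int force_write_demo(struct force_demo *out)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	char *page = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;
	page[0] = 'A';
	if (mprotect(page, pagesz, PROT_READ) != 0) {
		munmap(page, pagesz);
		return -1;
	}

	/* Attempt 1: process_vm_writev() respects the read-only protection. */
	char b = 'B';
	struct iovec local = { .iov_base = &b, .iov_len = 1 };
	struct iovec remote = { .iov_base = page, .iov_len = 1 };
	out->pvw_ok = process_vm_writev(getpid(), &local, 1, &remote, 1, 0) == 1;

	/* Attempt 2: /proc/self/mem; the file offset is the target address. */
	char c = 'C';
	out->procmem_ok = 0;
	int fd = open("/proc/self/mem", O_RDWR | O_CLOEXEC);
	if (fd >= 0) {
		out->procmem_ok = pwrite(fd, &c, 1, (off_t)(uintptr_t)page) == 1;
		close(fd);
	}

	out->final = *(volatile char *)page; /* re-read after the writes */
	munmap(page, pagesz);
	return 0;
}
```

Only the /proc/self/mem path is able to change the read-only page, and only while FOLL_FORCE is still applied.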

To harden against these types of attacks, distributions might want
to restrict /proc/pid/mem accesses, either entirely or partially,
for example to restrict FOLL_FORCE usage.

Known valid use-cases which still need these accesses are:

* Debuggers which also have ptrace permissions, so they can access
memory anyway via PTRACE_POKEDATA & co. Some debuggers like GDB
are designed to write /proc/pid/mem for basic functionality.

* Container supervisors using the seccomp notifier to intercept
syscalls and rewrite memory of calling processes by passing
around /proc/pid/mem file descriptors.
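Both use-cases address target memory by treating the file offset as a virtual address in the target process. A minimal sketch of a read through /proc/<pid>/mem, assuming a 64-bit Linux system; read_proc_mem() is a hypothetical helper. When proc_mem.restrict_open_read denies the caller, open() is the call expected to fail with EACCES.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read len bytes at virtual address addr of process pid via
 * /proc/<pid>/mem. Returns the number of bytes read, or -errno.
 */
static ssize_t read_proc_mem(pid_t pid, uintptr_t addr, void *out, size_t len)
{
	char path[64];
	snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);

	int fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -errno; /* e.g. -EACCES from a restriction or ptrace check */

	ssize_t n = pread(fd, out, len, (off_t)addr); /* offset == address */
	int saved = errno;
	close(fd);
	return n < 0 ? -saved : n;
}
```

A debugger-style write would use pwrite() on an O_RDWR descriptor in the same way, and would therefore also be subject to the open-write, write and foll_force knobs.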

There might be more, which is why these parameters default to disabled.

Regarding other mechanisms which can block these accesses:

* seccomp filters can be used to block mmap/mprotect calls with W|X
perms, but they often can't block open calls, because daemons need
to read/write their runtime state and seccomp filters cannot check
file paths, so plain write calls can't easily be blocked.

* Since the mem file is part of the dynamic /proc/<pid>/ space, we
can't run chmod once at boot to restrict it (and trying to react
to every process and run chmod doesn't scale, and the kernel no
longer allows chmod on any of these paths).

* SELinux could be used with a rule to cover all /proc/*/mem files,
but even then having multiple ways to deny an attack is useful in
case one layer fails.

Thus we introduce four kernel parameters to restrict /proc/*/mem
access: open-read, open-write, write and foll_force. All these can
be independently set to the following values:

all     => restrict all access unconditionally.
ptracer => restrict all access except for ptracer processes.
off     => do not restrict (e.g. to override a Kconfig default).

If left unset, the existing behaviour is preserved, i.e. access
is governed by basic file permissions.
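The per-knob semantics can be modelled as a small decision function. This is a hypothetical user-space sketch, not kernel code: the enum and proc_mem_access_allowed() are illustrative names, and caller_is_ptracer stands in for the kernel's current == ptrace_parent(task) check (for write and foll_force the patch additionally requires the file's mm to still match the task's mm). For foll_force, "not allowed" means FOLL_FORCE is dropped rather than the access being denied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the value of one proc_mem.restrict_* knob. */
enum proc_mem_restrict_level {
	RESTRICT_OFF,     /* unset or "off": existing permission checks only */
	RESTRICT_PTRACER, /* "ptracer": only the target's active ptracer */
	RESTRICT_ALL,     /* "all": restricted unconditionally */
};

/*
 * Returns true if this knob lets the access through. The normal ptrace
 * and file-permission checks still apply on top of this decision.
 */
static bool proc_mem_access_allowed(enum proc_mem_restrict_level level,
				    bool caller_is_ptracer)
{
	switch (level) {
	case RESTRICT_ALL:
		return false;
	case RESTRICT_PTRACER:
		return caller_is_ptracer;
	case RESTRICT_OFF:
	default:
		return true;
	}
}
```

For example, with proc_mem.restrict_open_write=ptracer an attached ptracer can still open the file for writing, while an unrelated process with the same credentials gets EACCES from open().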

Examples which can be passed by bootloaders:

proc_mem.restrict_foll_force=all
proc_mem.restrict_open_write=ptracer
proc_mem.restrict_open_read=ptracer
proc_mem.restrict_write=all
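The early_param handler that the patch registers for each of these parameters accepts "all", "ptracer" and also "off", and leaves the current (Kconfig-selected) setting untouched for any other value, printing a warning. A hypothetical user-space model of that parsing; parse_proc_mem_restrict() and the enum are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum proc_mem_restrict_level { RESTRICT_OFF, RESTRICT_PTRACER, RESTRICT_ALL };

/*
 * Model of "proc_mem.restrict_<name>=<buf>": returns the new level, or
 * cur when the value is missing or unknown.
 */
static enum proc_mem_restrict_level
parse_proc_mem_restrict(const char *name, const char *buf,
			enum proc_mem_restrict_level cur)
{
	if (!buf) /* parameter given without "=value"; the kernel returns -EINVAL */
		return cur;
	if (strcmp(buf, "all") == 0)
		return RESTRICT_ALL;
	if (strcmp(buf, "ptracer") == 0)
		return RESTRICT_PTRACER;
	if (strcmp(buf, "off") == 0)
		return RESTRICT_OFF;

	fprintf(stderr, "proc_mem.restrict_%s: ignoring unknown option '%s'\n",
		name, buf);
	return cur;
}
```

Because a recognised value replaces whatever default Kconfig selected, "off" can be used on the command line to lift a restriction that was built in.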

These knobs can also be enabled via Kconfig, for example:

CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE=y
CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE=y

Each distribution needs to decide what restrictions to apply,
depending on its use-cases. Embedded systems might want to do
more, while general-purpose distros might want a more relaxed
policy, because, for example, foll_force=all and write=all both
break GDB, so they might be a bit excessive.

Based on an initial patch by Mike Frysinger <vapier@chromium.org>.

Link: https://lwn.net/Articles/476947/ [1]
Link: https://issues.chromium.org/issues/40089045 [2]
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Xu <jeffxu@google.com>
Co-developed-by: Mike Frysinger <vapier@chromium.org>
Signed-off-by: Mike Frysinger <vapier@chromium.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
---
Changes in v6:
* Replaced slow_inc() with static_key_enable/disable
* Added pr_warn() when restricting calls to warn userspace
* Reworked Kconfig menu to be choices with 3 states each
* Reworked static key defines to add OFF state
* Double-checked that all combinations, including OFF, work as
  expected (booted and ran GDB/gdbserver)
---
 .../admin-guide/kernel-parameters.txt         |  38 ++++
 fs/proc/base.c                                | 197 +++++++++++++++++-
 security/Kconfig                              | 121 +++++++++++
 3 files changed, 355 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b600df82669d..ad2cb6b3c54d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4814,6 +4814,44 @@
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	proc_mem.restrict_foll_force= [KNL]
+			Format: {all | ptracer}
+			Restricts the use of the FOLL_FORCE flag for /proc/*/mem access.
+			If restricted, the FOLL_FORCE flag will not be added to vm accesses.
+			Can be one of:
+			- 'all' restricts all access unconditionally.
+			- 'ptracer' allows access only for ptracer processes.
+			If not specified, FOLL_FORCE is always used.
+
+	proc_mem.restrict_open_read= [KNL]
+			Format: {all | ptracer}
+			Allows restricting read access to /proc/*/mem files during open().
+			Depending on the restriction level, opens for reading return -EACCES.
+			Can be one of:
+			- 'all' restricts all access unconditionally.
+			- 'ptracer' allows access only for ptracer processes.
+			If not specified, then basic file permissions continue to apply.
+
+	proc_mem.restrict_open_write= [KNL]
+			Format: {all | ptracer}
+			Allows restricting write access to /proc/*/mem files during open().
+			Depending on the restriction level, opens for writing return -EACCES.
+			Can be one of:
+			- 'all' restricts all access unconditionally.
+			- 'ptracer' allows access only for ptracer processes.
+			If not specified, then basic file permissions continue to apply.
+
+	proc_mem.restrict_write= [KNL]
+			Format: {all | ptracer}
+			Allows restricting write access to /proc/*/mem after the files
+			have been opened, during the actual write calls. This is useful for
+			systems which can't block writes earlier during open().
+			Depending on restriction level, writes will return -EACCES.
+			Can be one of:
+			- 'all' restricts all access unconditionally.
+			- 'ptracer' allows access only for ptracer processes.
+			If not specified, then basic file permissions continue to apply.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4c607089f66e..9ad9ddd94784 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -152,6 +152,77 @@ struct pid_entry {
 		NULL, &proc_pid_attr_operations,	\
 		{ .lsmid = LSMID })
 
+#if IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_ALL)
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_open_read_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_read_ptracer);
+#elif IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_PTRACE)
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_read_all);
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_open_read_ptracer);
+#else
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_read_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_read_ptracer);
+#endif
+
+#if IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL)
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_open_write_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_write_ptracer);
+#elif IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE)
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_write_all);
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_open_write_ptracer);
+#else
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_write_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_open_write_ptracer);
+#endif
+
+#if IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_WRITE_ALL)
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_write_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_write_ptracer);
+#elif IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE)
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_write_all);
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_write_ptracer);
+#else
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_write_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_write_ptracer);
+#endif
+
+#if IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_ALL)
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_foll_force_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_foll_force_ptracer);
+#elif IS_ENABLED(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE)
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_foll_force_all);
+DEFINE_STATIC_KEY_TRUE_RO(proc_mem_restrict_foll_force_ptracer);
+#else
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_foll_force_all);
+DEFINE_STATIC_KEY_FALSE_RO(proc_mem_restrict_foll_force_ptracer);
+#endif
+
+#define DEFINE_EARLY_PROC_MEM_RESTRICT(name)					\
+static int __init early_proc_mem_restrict_##name(char *buf)			\
+{										\
+	if (!buf)								\
+		return -EINVAL;							\
+										\
+	if (strcmp(buf, "all") == 0) {						\
+		static_key_enable(&proc_mem_restrict_##name##_all.key);		\
+		static_key_disable(&proc_mem_restrict_##name##_ptracer.key);	\
+	} else if (strcmp(buf, "ptracer") == 0) {				\
+		static_key_disable(&proc_mem_restrict_##name##_all.key);	\
+		static_key_enable(&proc_mem_restrict_##name##_ptracer.key);	\
+	} else if (strcmp(buf, "off") == 0) {					\
+		static_key_disable(&proc_mem_restrict_##name##_all.key);	\
+		static_key_disable(&proc_mem_restrict_##name##_ptracer.key);	\
+	} else									\
+		pr_warn("%s: ignoring unknown option '%s'\n",			\
+			"proc_mem.restrict_" #name, buf);			\
+	return 0;								\
+}										\
+early_param("proc_mem.restrict_" #name, early_proc_mem_restrict_##name)
+
+DEFINE_EARLY_PROC_MEM_RESTRICT(open_read);
+DEFINE_EARLY_PROC_MEM_RESTRICT(open_write);
+DEFINE_EARLY_PROC_MEM_RESTRICT(write);
+DEFINE_EARLY_PROC_MEM_RESTRICT(foll_force);
+
 /*
  * Count the number of hardlinks for the pid_entry table, excluding the .
  * and .. links.
@@ -794,12 +865,71 @@ static const struct file_operations proc_single_file_operations = {
 };
 
 
+static void report_mem_rw_reject(const char *action, struct task_struct *task)
+{
+	pr_warn_ratelimited("Denied %s of /proc/%d/mem (%s) by pid %d (%s)\n",
+			    action, task_pid_nr(task), task->comm,
+			    task_pid_nr(current), current->comm);
+}
+
+static int __mem_open_access_permitted(struct file *file, struct task_struct *task)
+{
+	bool is_ptracer;
+
+	rcu_read_lock();
+	is_ptracer = current == ptrace_parent(task);
+	rcu_read_unlock();
+
+	if (file->f_mode & FMODE_WRITE) {
+		/* Deny if writes are unconditionally disabled via param */
+		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT,
+					&proc_mem_restrict_open_write_all)) {
+			report_mem_rw_reject("all open-for-write", task);
+			return -EACCES;
+		}
+
+		/* Deny if writes are allowed only for ptracers via param */
+		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT,
+					&proc_mem_restrict_open_write_ptracer) &&
+		    !is_ptracer) {
+			report_mem_rw_reject("non-ptracer open-for-write", task);
+			return -EACCES;
+		}
+	}
+
+	if (file->f_mode & FMODE_READ) {
+		/* Deny if reads are unconditionally disabled via param */
+		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_DEFAULT,
+					&proc_mem_restrict_open_read_all)) {
+			report_mem_rw_reject("all open-for-read", task);
+			return -EACCES;
+		}
+
+		/* Deny if reads are allowed only for ptracers via param */
+		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT,
+					&proc_mem_restrict_open_read_ptracer) &&
+		    !is_ptracer) {
+			report_mem_rw_reject("non-ptracer open-for-read", task);
+			return -EACCES;
+		}
+	}
+
+	return 0; /* R/W are not restricted */
+}
+
 struct mm_struct *proc_mem_open(struct file  *file, unsigned int mode)
 {
 	struct task_struct *task = get_proc_task(file->f_inode);
 	struct mm_struct *mm = ERR_PTR(-ESRCH);
+	int ret;
 
 	if (task) {
+		ret = __mem_open_access_permitted(file, task);
+		if (ret) {
+			put_task_struct(task);
+			return ERR_PTR(ret);
+		}
+
 		mm = mm_access(task, mode | PTRACE_MODE_FSCREDS);
 		put_task_struct(task);
 
@@ -835,10 +965,67 @@ static int mem_open(struct inode *inode, struct file *file)
 	return ret;
 }
 
+static bool __mem_rw_current_is_ptracer(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *task = get_proc_task(inode);
+	struct mm_struct *mm = NULL;
+	int is_ptracer = false, has_mm_access = false;
+
+	if (task) {
+		rcu_read_lock();
+		is_ptracer = current == ptrace_parent(task);
+		rcu_read_unlock();
+
+		mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
+		if (mm && file->private_data == mm) {
+			has_mm_access = true;
+			mmput(mm);
+		}
+
+		put_task_struct(task);
+	}
+
+	return is_ptracer && has_mm_access;
+}
+
+static unsigned int __mem_rw_get_foll_force_flag(struct file *file)
+{
+	/* Deny if FOLL_FORCE is disabled via param */
+	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT,
+				&proc_mem_restrict_foll_force_all))
+		return 0;
+
+	/* Deny if FOLL_FORCE is allowed only for ptracers via param */
+	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT,
+				&proc_mem_restrict_foll_force_ptracer) &&
+	    !__mem_rw_current_is_ptracer(file))
+		return 0;
+
+	return FOLL_FORCE;
+}
+
+static bool __mem_rw_block_writes(struct file *file)
+{
+	/* Block if writes are disabled via param proc_mem.restrict_write=all */
+	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_DEFAULT,
+				&proc_mem_restrict_write_all))
+		return true;
+
+	/* Block with an exception only for ptracers */
+	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT,
+				&proc_mem_restrict_write_ptracer) &&
+	    !__mem_rw_current_is_ptracer(file))
+		return true;
+
+	return false;
+}
+
 static ssize_t mem_rw(struct file *file, char __user *buf,
 			size_t count, loff_t *ppos, int write)
 {
 	struct mm_struct *mm = file->private_data;
+	struct task_struct *task = NULL;
 	unsigned long addr = *ppos;
 	ssize_t copied;
 	char *page;
@@ -847,6 +1034,13 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
 	if (!mm)
 		return 0;
 
+	if (write && __mem_rw_block_writes(file)) {
+		task = get_proc_task(file->f_inode);
+		if (task)
+			report_mem_rw_reject("write call", task);
+		return -EACCES;
+	}
+
 	page = (char *)__get_free_page(GFP_KERNEL);
 	if (!page)
 		return -ENOMEM;
@@ -855,7 +1049,8 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
 	if (!mmget_not_zero(mm))
 		goto free;
 
-	flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
+	flags = (write ? FOLL_WRITE : 0);
+	flags |= __mem_rw_get_foll_force_flag(file);
 
 	while (count > 0) {
 		size_t this_len = min_t(size_t, count, PAGE_SIZE);
diff --git a/security/Kconfig b/security/Kconfig
index 412e76f1575d..da4d9aa2c99f 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -183,6 +183,127 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+choice
+	prompt "Restrict /proc/pid/mem FOLL_FORCE usage"
+	default PROC_MEM_RESTRICT_FOLL_FORCE_OFF
+	help
+	  Reading and writing of /proc/pid/mem bypasses memory permission
+	  checks due to the internal use of the FOLL_FORCE flag. This can be
+	  used by attackers to manipulate process memory contents that would
+	  have been otherwise protected. However, debuggers, like GDB, use
+	  this to set breakpoints, etc. To force debuggers to fall back to
+	  PEEK/POKE, see PROC_MEM_RESTRICT_OPEN_WRITE_ALL.
+
+	config PROC_MEM_RESTRICT_FOLL_FORCE_OFF
+	bool "Do not restrict FOLL_FORCE usage with /proc/pid/mem (regular)"
+	help
+	  Regular behavior: continue to use the FOLL_FORCE flag for
+	  /proc/pid/mem access.
+
+	config PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE
+	bool "Only allow ptracers to use FOLL_FORCE with /proc/pid/mem (safer)"
+	help
+	  Only use the FOLL_FORCE flag for /proc/pid/mem access when the
+	  current task is the active ptracer of the target task. (Safer,
+	  least disruptive to most usage patterns.)
+
+	config PROC_MEM_RESTRICT_FOLL_FORCE_ALL
+	bool "Do not use FOLL_FORCE with /proc/pid/mem (safest)"
+	help
+	  Remove the FOLL_FORCE flag for all /proc/pid/mem accesses.
+	  (Safest, but may be disruptive to some usage patterns.)
+endchoice
+
+choice
+	prompt "Restrict /proc/pid/mem OPEN_READ usage"
+	default PROC_MEM_RESTRICT_OPEN_READ_OFF
+	help
+	  Reading and writing of /proc/pid/mem bypasses memory permission
+	  checks due to the internal use of the FOLL_FORCE flag. This can be
+	  used by attackers to manipulate process memory contents that would
+	  have been otherwise protected. However, debuggers, like GDB, use
+	  this to set breakpoints, etc. To force debuggers to fall back to
+	  PEEK/POKE, see PROC_MEM_RESTRICT_OPEN_WRITE_ALL.
+
+	config PROC_MEM_RESTRICT_OPEN_READ_OFF
+	bool "Do not restrict /proc/pid/mem open for read (regular)"
+	help
+	  Regular behavior: allow /proc/pid/mem open for read access.
+
+	config PROC_MEM_RESTRICT_OPEN_READ_PTRACE
+	bool "Only allow ptracers to open /proc/pid/mem for read (safer)"
+	help
+	  Only allow opening /proc/pid/mem for reading when the current
+	  task is the active ptracer of the target task. (Safer, least
+	  disruptive to most usage patterns.)
+
+	config PROC_MEM_RESTRICT_OPEN_READ_ALL
+	bool "Do not allow /proc/pid/mem open for read (safest)"
+	help
+	  Do not allow /proc/pid/mem open for reading access.
+	  (Safest, but may be disruptive to some usage patterns.)
+endchoice
+
+choice
+	prompt "Restrict /proc/pid/mem OPEN_WRITE usage"
+	default PROC_MEM_RESTRICT_OPEN_WRITE_OFF
+	help
+	  Reading and writing of /proc/pid/mem bypasses memory permission
+	  checks due to the internal use of the FOLL_FORCE flag. This can be
+	  used by attackers to manipulate process memory contents that would
+	  have been otherwise protected. However, debuggers, like GDB, use
+	  this to set breakpoints, etc. To force debuggers to fall back to
+	  PEEK/POKE, see PROC_MEM_RESTRICT_OPEN_WRITE_ALL.
+
+	config PROC_MEM_RESTRICT_OPEN_WRITE_OFF
+	bool "Do not restrict /proc/pid/mem open for write (regular)"
+	help
+	  Regular behavior: allow /proc/pid/mem open for write access.
+
+	config PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE
+	bool "Only allow ptracers to open /proc/pid/mem for write (safer)"
+	help
+	  Only allow opening /proc/pid/mem for writing when the current
+	  task is the active ptracer of the target task. (Safer, least
+	  disruptive to most usage patterns.)
+
+	config PROC_MEM_RESTRICT_OPEN_WRITE_ALL
+	bool "Do not allow /proc/pid/mem open for write (safest)"
+	help
+	  Do not allow /proc/pid/mem open for writing access.
+	  (Safest, but may be disruptive to some usage patterns.)
+endchoice
+
+choice
+	prompt "Restrict /proc/pid/mem WRITE usage"
+	default PROC_MEM_RESTRICT_WRITE_OFF
+	help
+	  Reading and writing of /proc/pid/mem bypasses memory permission
+	  checks due to the internal use of the FOLL_FORCE flag. This can be
+	  used by attackers to manipulate process memory contents that would
+	  have been otherwise protected. However, debuggers, like GDB, use
+	  this to set breakpoints, etc. To force debuggers to fall back to
+	  PEEK/POKE, see PROC_MEM_RESTRICT_OPEN_WRITE_ALL.
+
+	config PROC_MEM_RESTRICT_WRITE_OFF
+	bool "Do not restrict /proc/pid/mem writes (regular)"
+	help
+	  Regular behavior: allow /proc/pid/mem write access.
+
+	config PROC_MEM_RESTRICT_WRITE_PTRACE
+	bool "Only allow ptracers to write to /proc/pid/mem (safer)"
+	help
+	  Only allow writing to /proc/pid/mem when the current task is
+	  the active ptracer of the target task. (Safer, least disruptive
+	  to most usage patterns.)
+
+	config PROC_MEM_RESTRICT_WRITE_ALL
+	bool "Do not allow writes to /proc/pid/mem (safest)"
+	help
+	  Do not allow writing to /proc/pid/mem.
+	  (Safest, but may be disruptive to some usage patterns.)
+endchoice
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
-- 
2.44.2



* Re: [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open
  2024-06-13 13:39 [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open Adrian Ratiu
  2024-06-13 13:39 ` [PATCH v6 2/2] proc: restrict /proc/pid/mem Adrian Ratiu
@ 2024-06-17  8:48 ` Christian Brauner
  2024-06-17 10:47   ` Adrian Ratiu
  1 sibling, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2024-06-17  8:48 UTC
  To: Adrian Ratiu
  Cc: linux-fsdevel, linux-security-module, linux-kernel,
	linux-hardening, linux-doc, kernel, gbiv, ryanbeltran, inglorion,
	ajordanr, jorgelo, Jann Horn, Kees Cook, Jeff Xu, Kees Cook

On Thu, Jun 13, 2024 at 04:39:36PM GMT, Adrian Ratiu wrote:
> The file struct is required in proc_mem_open() so its
> f_mode can be checked when deciding whether to allow or
> deny /proc/*/mem open requests via the new read/write
> and foll_force restriction mechanism.
> 
> Thus, instead of passing the inode directly to the function,
> we pass the file and get the inode from it.
> 
> Cc: Jann Horn <jannh@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Jeff Xu <jeffxu@google.com>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> Reviewed-by: Kees Cook <kees@kernel.org>
> ---

I've tentatively applied this patch to #vfs.procfs.
One comment, one question:

> No changes in v6
> ---
>  fs/proc/base.c       | 6 +++---
>  fs/proc/internal.h   | 2 +-
>  fs/proc/task_mmu.c   | 6 +++---
>  fs/proc/task_nommu.c | 2 +-
>  4 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 72a1acd03675..4c607089f66e 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -794,9 +794,9 @@ static const struct file_operations proc_single_file_operations = {
>  };
>  
>  
> -struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
> +struct mm_struct *proc_mem_open(struct file  *file, unsigned int mode)
>  {
> -	struct task_struct *task = get_proc_task(inode);
> +	struct task_struct *task = get_proc_task(file->f_inode);

Comment: This should use file_inode(file) but I've just fixed that when I
applied.

Question: Is this an equivalent transformation? That is, is the inode
that was passed to proc_mem_open() always the same inode as file_inode(file)?

>  	struct mm_struct *mm = ERR_PTR(-ESRCH);
>  
>  	if (task) {
> @@ -816,7 +816,7 @@ struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
>  
>  static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
>  {
> -	struct mm_struct *mm = proc_mem_open(inode, mode);
> +	struct mm_struct *mm = proc_mem_open(file, mode);
>  
>  	if (IS_ERR(mm))
>  		return PTR_ERR(mm);
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index a71ac5379584..d38b2eea40d1 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -295,7 +295,7 @@ struct proc_maps_private {
>  #endif
>  } __randomize_layout;
>  
> -struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
> +struct mm_struct *proc_mem_open(struct file *file, unsigned int mode);
>  
>  extern const struct file_operations proc_pid_maps_operations;
>  extern const struct file_operations proc_pid_numa_maps_operations;
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f8d35f993fe5..fe3b2182b0aa 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -210,7 +210,7 @@ static int proc_maps_open(struct inode *inode, struct file *file,
>  		return -ENOMEM;
>  
>  	priv->inode = inode;
> -	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
> +	priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
>  	if (IS_ERR(priv->mm)) {
>  		int err = PTR_ERR(priv->mm);
>  
> @@ -1030,7 +1030,7 @@ static int smaps_rollup_open(struct inode *inode, struct file *file)
>  		goto out_free;
>  
>  	priv->inode = inode;
> -	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
> +	priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
>  	if (IS_ERR(priv->mm)) {
>  		ret = PTR_ERR(priv->mm);
>  
> @@ -1754,7 +1754,7 @@ static int pagemap_open(struct inode *inode, struct file *file)
>  {
>  	struct mm_struct *mm;
>  
> -	mm = proc_mem_open(inode, PTRACE_MODE_READ);
> +	mm = proc_mem_open(file, PTRACE_MODE_READ);
>  	if (IS_ERR(mm))
>  		return PTR_ERR(mm);
>  	file->private_data = mm;
> diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
> index bce674533000..a8ab182a4ed1 100644
> --- a/fs/proc/task_nommu.c
> +++ b/fs/proc/task_nommu.c
> @@ -259,7 +259,7 @@ static int maps_open(struct inode *inode, struct file *file,
>  		return -ENOMEM;
>  
>  	priv->inode = inode;
> -	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
> +	priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
>  	if (IS_ERR(priv->mm)) {
>  		int err = PTR_ERR(priv->mm);
>  
> -- 
> 2.44.2
> 


* Re: [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open
  2024-06-17  8:48 ` [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open Christian Brauner
@ 2024-06-17 10:47   ` Adrian Ratiu
  0 siblings, 0 replies; 9+ messages in thread
From: Adrian Ratiu @ 2024-06-17 10:47 UTC
  To: Christian Brauner
  Cc: linux-fsdevel, linux-security-module, linux-kernel,
	linux-hardening, linux-doc, kernel, gbiv, ryanbeltran, inglorion,
	ajordanr, jorgelo, Jann Horn, Kees Cook, Jeff Xu, Kees Cook

On Monday, June 17, 2024 11:48 EEST, Christian Brauner <brauner@kernel.org> wrote:

> On Thu, Jun 13, 2024 at 04:39:36PM GMT, Adrian Ratiu wrote:
> > The file struct is required in proc_mem_open() so its
> > f_mode can be checked when deciding whether to allow or
> > deny /proc/*/mem open requests via the new read/write
> > and foll_force restriction mechanism.
> > 
> > Thus, instead of passing the inode directly to the function,
> > we pass the file and get the inode from it.
> > 
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Jeff Xu <jeffxu@google.com>
> > Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> > Reviewed-by: Kees Cook <kees@kernel.org>
> > ---
> 
> I've tentatively applied this patch to #vfs.procfs.
> One comment, one question:
> 
> > No changes in v6
> > ---
> >  fs/proc/base.c       | 6 +++---
> >  fs/proc/internal.h   | 2 +-
> >  fs/proc/task_mmu.c   | 6 +++---
> >  fs/proc/task_nommu.c | 2 +-
> >  4 files changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 72a1acd03675..4c607089f66e 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -794,9 +794,9 @@ static const struct file_operations proc_single_file_operations = {
> >  };
> >  
> >  
> > -struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
> > +struct mm_struct *proc_mem_open(struct file  *file, unsigned int mode)
> >  {
> > -	struct task_struct *task = get_proc_task(inode);
> > +	struct task_struct *task = get_proc_task(file->f_inode);
> 
> Comment: This should use file_inode(file) but I've just fixed that when I
> applied.
> 
> Question: Is this an equivalent transformation, i.e. is the inode that was
> passed to proc_mem_open() always the same inode as file_inode(file)?

Thank you!

Yes. The inode associated with a file struct should always stay the same
while the file is open, so the link that exists when the top-level mem_open()
callback runs still holds when that callback calls into helpers like
proc_mem_open().
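
To illustrate the equivalence Christian asked about, here is a minimal userspace sketch. It uses hypothetical, heavily stripped-down stand-ins for struct inode and struct file (the real kernel structures have many more fields) and mirrors the idea of the kernel helper file_inode(), which simply returns the file's cached f_inode pointer. It also shows why passing the file is useful: the open mode becomes visible to a policy check. The names mock_proc_mem_open() and MOCK_FMODE_* are illustrative only and are not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct inode {
	unsigned long i_ino;
};

struct file {
	struct inode *f_inode;
	unsigned int f_mode;
};

/* Illustrative open-mode bits (not the kernel's fmode_t definitions). */
#define MOCK_FMODE_READ  0x1u
#define MOCK_FMODE_WRITE 0x2u

/* Same idea as the kernel helper: return the cached inode pointer. */
static inline struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

/* Hypothetical new-style entry point: takes the file, derives the inode
 * from it, and can refuse a write-mode open before doing any lookup.
 * Returns 0 and stores the inode number on success, -1 if denied. */
static int mock_proc_mem_open(const struct file *file, int deny_write,
			      unsigned long *ino_out)
{
	if (deny_write && (file->f_mode & MOCK_FMODE_WRITE))
		return -1;
	*ino_out = file_inode(file)->i_ino;
	return 0;
}
```

Because file_inode() only dereferences f_inode, calling it inside the function yields the same pointer a caller would previously have passed in directly, for as long as the file stays open.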


* Re: [PATCH v6 2/2] proc: restrict /proc/pid/mem
  2024-06-13 13:39 ` [PATCH v6 2/2] proc: restrict /proc/pid/mem Adrian Ratiu
@ 2024-06-17 18:00   ` Kees Cook
  2024-06-18 22:39   ` Jeff Xu
  1 sibling, 0 replies; 9+ messages in thread
From: Kees Cook @ 2024-06-17 18:00 UTC
  To: Adrian Ratiu
  Cc: linux-fsdevel, linux-security-module, linux-kernel,
	linux-hardening, linux-doc, kernel, gbiv, ryanbeltran, inglorion,
	ajordanr, jorgelo, Guenter Roeck, Doug Anderson, Jann Horn,
	Andrew Morton, Randy Dunlap, Christian Brauner, Jeff Xu,
	Mike Frysinger

On Thu, Jun 13, 2024 at 04:39:37PM +0300, Adrian Ratiu wrote:
> Prior to v2.6.39 write access to /proc/<pid>/mem was restricted,
> after which it got allowed in commit 198214a7ee50 ("proc: enable
> writing to /proc/pid/mem"). Famous last words from that patch:
> "no longer a security hazard". :)

This version looks great! Thanks for all the changes. :)

Reviewed-by: Kees Cook <kees@kernel.org>

-- 
Kees Cook

* Re: [PATCH v6 2/2] proc: restrict /proc/pid/mem
  2024-06-13 13:39 ` [PATCH v6 2/2] proc: restrict /proc/pid/mem Adrian Ratiu
  2024-06-17 18:00   ` Kees Cook
@ 2024-06-18 22:39   ` Jeff Xu
  2024-06-19 20:41     ` Kees Cook
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff Xu @ 2024-06-18 22:39 UTC
  To: Adrian Ratiu
  Cc: linux-fsdevel, linux-security-module, linux-kernel,
	linux-hardening, linux-doc, kernel, gbiv, ryanbeltran, inglorion,
	ajordanr, jorgelo, Guenter Roeck, Doug Anderson, Kees Cook,
	Jann Horn, Andrew Morton, Randy Dunlap, Christian Brauner,
	Jeff Xu, Mike Frysinger

Hi

Thanks for the patch !

On Thu, Jun 13, 2024 at 6:40 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
>
> Prior to v2.6.39 write access to /proc/<pid>/mem was restricted,
> after which it got allowed in commit 198214a7ee50 ("proc: enable
> writing to /proc/pid/mem"). Famous last words from that patch:
> "no longer a security hazard". :)
>
> Afterwards exploits started causing drama like [1]. The exploits
> using /proc/*/mem can be rather sophisticated, like [2], which
> installed an arbitrary payload from noexec storage into a running
> process then exec'd it, which itself could include an ELF loader
> to run arbitrary code off noexec storage.
>
> One of the well-known problems with /proc/*/mem writes is they
> ignore page permissions via FOLL_FORCE, as opposed to writes via
> process_vm_writev which respect page permissions. These writes can
> also be used to bypass mode bits.
>
> To harden against these types of attacks, distributions might want
> to restrict /proc/pid/mem accesses, either entirely or partially,
> e.g. to restrict FOLL_FORCE usage.
>
> Known valid use-cases which still need these accesses are:
>
> * Debuggers which also have ptrace permissions, so they can access
> memory anyway via PTRACE_POKEDATA & co. Some debuggers like GDB
> are designed to write /proc/pid/mem for basic functionality.
>
> * Container supervisors using the seccomp notifier to intercept
> syscalls and rewrite memory of calling processes by passing
> around /proc/pid/mem file descriptors.
>
> There might be more; that's why these params default to disabled.
>
> Regarding other mechanisms which can block these accesses:
>
> * seccomp filters can be used to block mmap/mprotect calls with W|X
> perms, but they often can't block open calls as daemons want to
> read/write their runtime state and seccomp filters cannot check
> file paths, so plain write calls can't be easily blocked.
>
> * Since the mem file is part of the dynamic /proc/<pid>/ space, we
> can't run chmod once at boot to restrict it (and trying to react
> to every process and run chmod doesn't scale, and the kernel no
> longer allows chmod on any of these paths).
>
> * SELinux could be used with a rule to cover all /proc/*/mem files,
> but even then having multiple ways to deny an attack is useful in
> case one layer fails.
>
> Thus we introduce four kernel parameters to restrict /proc/*/mem
> access: open-read, open-write, write and foll_force. All these can
> be independently set to the following values:
>
> all     => restrict all access unconditionally.
> ptracer => restrict all access except for ptracer processes.
>
> If left unset, the existing behaviour is preserved, i.e. access
> is governed by basic file permissions.
>
> Examples which can be passed by bootloaders:
>
> proc_mem.restrict_foll_force=all
> proc_mem.restrict_open_write=ptracer
> proc_mem.restrict_open_read=ptracer
> proc_mem.restrict_write=all
>
> These knobs can also be enabled via Kconfig, e.g.:
>
> CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT=y
> CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT=y
>
> Each distribution needs to decide what restrictions to apply,
> depending on its use-cases. Embedded systems might want to do
> more, while general-purpose distros might want a more relaxed
> policy, because e.g. foll_force=all and write=all both break
> GDB, so they might be a bit excessive.
>
> Based on an initial patch by Mike Frysinger <vapier@chromium.org>.
>
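
The semantics of the two values listed in the commit message (all, ptracer, or left unset) can be modelled in a few lines. This is a hypothetical sketch, not the patch's implementation: the enum and function names are invented, and is_ptracer stands in for whatever ptrace relationship the kernel actually checks between the caller and the target task. Each of the four knobs is evaluated independently with its own setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three states each knob can be in. */
enum mem_restrict {
	MEM_RESTRICT_UNSET,   /* default: only file permissions apply */
	MEM_RESTRICT_PTRACER, /* deny unless the caller is a ptracer */
	MEM_RESTRICT_ALL,     /* deny unconditionally */
};

/* One independent setting per knob. */
struct mem_policy {
	enum mem_restrict open_read;
	enum mem_restrict open_write;
	enum mem_restrict write;
	enum mem_restrict foll_force;
};

/* Returns true if the access guarded by a single knob is allowed. */
static bool mem_access_allowed(enum mem_restrict policy, bool is_ptracer)
{
	switch (policy) {
	case MEM_RESTRICT_ALL:
		return false;
	case MEM_RESTRICT_PTRACER:
		return is_ptracer;
	case MEM_RESTRICT_UNSET:
	default:
		return true;
	}
}

/* Open-time decision: every requested direction must be allowed. */
static bool mem_open_allowed(const struct mem_policy *pol, bool want_read,
			     bool want_write, bool is_ptracer)
{
	if (want_read && !mem_access_allowed(pol->open_read, is_ptracer))
		return false;
	if (want_write && !mem_access_allowed(pol->open_write, is_ptracer))
		return false;
	return true;
}
```

Keeping one enum value per knob is what lets a distribution mix settings, for example restricting writes to ptracers while leaving reads governed only by file permissions.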

It is noteworthy that ChromeOS has benefited from blocking
/proc/pid/mem writes since 2017 [1], thanks to the patch implemented by
Mike Frysinger.

It is great that upstream can consider this patch; ChromeOS will use
the solution once it is accepted.

> Link: https://lwn.net/Articles/476947/ [1]
> Link: https://issues.chromium.org/issues/40089045 [2]
> Cc: Guenter Roeck <groeck@chromium.org>
> Cc: Doug Anderson <dianders@chromium.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Jeff Xu <jeffxu@google.com>
> Co-developed-by: Mike Frysinger <vapier@chromium.org>
> Signed-off-by: Mike Frysinger <vapier@chromium.org>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>

Reviewed-by: Jeff Xu <jeffxu@chromium.org>
Tested-by: Jeff Xu <jeffxu@chromium.org>
[1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/764773

-Jeff Xu

* Re: [PATCH v6 2/2] proc: restrict /proc/pid/mem
  2024-06-18 22:39   ` Jeff Xu
@ 2024-06-19 20:41     ` Kees Cook
  2024-06-19 21:31       ` Adrian Ratiu
  2024-06-20 16:24       ` Jeff Xu
  0 siblings, 2 replies; 9+ messages in thread
From: Kees Cook @ 2024-06-19 20:41 UTC
  To: Jeff Xu
  Cc: Adrian Ratiu, linux-fsdevel, linux-security-module, linux-kernel,
	linux-hardening, linux-doc, kernel, gbiv, ryanbeltran, inglorion,
	ajordanr, jorgelo, Guenter Roeck, Doug Anderson, Jann Horn,
	Andrew Morton, Randy Dunlap, Christian Brauner, Jeff Xu,
	Mike Frysinger

On Tue, Jun 18, 2024 at 03:39:44PM -0700, Jeff Xu wrote:
> [...]
>
> It is noteworthy that ChromeOS has benefited from blocking
> /proc/pid/mem write since 2017 [1], owing to the patch implemented by
> Mike Frysinger.
> 
> It is great that upstream can consider this patch, ChromeOS will use
> the solution once it is accepted.
> 
> [...]
> 
> Reviewed-by: Jeff Xu <jeffxu@chromium.org>
> Tested-by: Jeff Xu <jeffxu@chromium.org>
> [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/764773

Thanks for the testing! What settings did you use? I think Chrome OS was
effectively doing this?

CONFIG_PROC_MEM_RESTRICT_OPEN_READ_OFF=y
CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y
CONFIG_PROC_MEM_RESTRICT_WRITE_ALL=y
CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_ALL=y

I don't see the FOLL_FORCE changes in the linked Chrome OS patch, but I
suspect that path is unreachable with
CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y.

-Kees

-- 
Kees Cook

* Re: [PATCH v6 2/2] proc: restrict /proc/pid/mem
  2024-06-19 20:41     ` Kees Cook
@ 2024-06-19 21:31       ` Adrian Ratiu
  2024-06-20 16:24       ` Jeff Xu
  1 sibling, 0 replies; 9+ messages in thread
From: Adrian Ratiu @ 2024-06-19 21:31 UTC
  To: Kees Cook
  Cc: Jeff Xu, linux-fsdevel, linux-security-module, linux-kernel,
	linux-hardening, linux-doc, kernel, gbiv, ryanbeltran, inglorion,
	ajordanr, jorgelo, Guenter Roeck, Doug Anderson, Jann Horn,
	Andrew Morton, Randy Dunlap, Christian Brauner, Jeff Xu,
	Mike Frysinger

On Wednesday, June 19, 2024 23:41 EEST, Kees Cook <kees@kernel.org> wrote:

> On Tue, Jun 18, 2024 at 03:39:44PM -0700, Jeff Xu wrote:
> > [...]
> >
> > Reviewed-by: Jeff Xu <jeffxu@chromium.org>
> > Tested-by: Jeff Xu <jeffxu@chromium.org>
> > [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/764773
> 
> Thanks for the testing! What settings did you use? I think Chrome OS was
> effectively doing this?
> 
> CONFIG_PROC_MEM_RESTRICT_OPEN_READ_OFF=y
> CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y
> CONFIG_PROC_MEM_RESTRICT_WRITE_ALL=y
> CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_ALL=y

Correct, except for CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y, which would
make ChromeOS boot-loop: upstart/systemd-tmpfiles would fail and trigger a
recovery + reboot, then the kernel would again block opening the file, and
so on. :)

In effect, ChromeOS only blocks all writes, which also blocks all foll_force writes.

> 
> Though I don't see the FOLL_FORCE changes in the linked Chrome OS patch,
> but I suspect it's unreachable with
> CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y.
 
That is correct: CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y also
blocks FOLL_FORCE.

The idea there is to restrict writes entirely in production images via
Kconfig and then relax the restriction in dev/test images via the boot
parameters proc_mem.restrict_write=ptracer and
proc_mem.restrict_foll_force=ptracer.
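
The production/dev split described above can be pictured as a two-level lookup. The sketch below is hypothetical: it assumes that a recognised boot parameter value simply replaces the Kconfig default and that an unrecognised value is ignored. The real kernel parsing may handle invalid values differently, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-knob states, as listed in the commit message. */
enum mem_restrict {
	MEM_RESTRICT_UNSET,
	MEM_RESTRICT_PTRACER,
	MEM_RESTRICT_ALL,
};

/* Map a boot-parameter value such as "all" or "ptracer" to a policy.
 * Returns 0 on success, -1 if the string is not recognised. */
static int parse_mem_restrict(const char *val, enum mem_restrict *out)
{
	if (strcmp(val, "all") == 0) {
		*out = MEM_RESTRICT_ALL;
		return 0;
	}
	if (strcmp(val, "ptracer") == 0) {
		*out = MEM_RESTRICT_PTRACER;
		return 0;
	}
	return -1;
}

/* Assumed precedence: a valid boot parameter overrides the built-in
 * (Kconfig) default; a missing or unrecognised value keeps the default. */
static enum mem_restrict effective_policy(enum mem_restrict kconfig_default,
					  const char *boot_val)
{
	enum mem_restrict p;

	if (boot_val != NULL && parse_mem_restrict(boot_val, &p) == 0)
		return p;
	return kconfig_default;
}
```

For example, a production image whose write knob defaults to all gets effective_policy(MEM_RESTRICT_ALL, NULL) == MEM_RESTRICT_ALL, while a dev image booted with proc_mem.restrict_write=ptracer gets effective_policy(MEM_RESTRICT_ALL, "ptracer") == MEM_RESTRICT_PTRACER.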

See this CL:
https://chromium-review.googlesource.com/c/chromiumos/platform/vboot_reference/+/5631026


* Re: [PATCH v6 2/2] proc: restrict /proc/pid/mem
  2024-06-19 20:41     ` Kees Cook
  2024-06-19 21:31       ` Adrian Ratiu
@ 2024-06-20 16:24       ` Jeff Xu
  1 sibling, 0 replies; 9+ messages in thread
From: Jeff Xu @ 2024-06-20 16:24 UTC
  To: Kees Cook
  Cc: Adrian Ratiu, linux-fsdevel, linux-security-module, linux-kernel,
	linux-hardening, linux-doc, kernel, gbiv, ryanbeltran, inglorion,
	ajordanr, jorgelo, Guenter Roeck, Doug Anderson, Jann Horn,
	Andrew Morton, Randy Dunlap, Christian Brauner, Jeff Xu,
	Mike Frysinger

On Wed, Jun 19, 2024 at 1:41 PM Kees Cook <kees@kernel.org> wrote:
>
> On Tue, Jun 18, 2024 at 03:39:44PM -0700, Jeff Xu wrote:
> > [...]
> >
> > Reviewed-by: Jeff Xu <jeffxu@chromium.org>
> > Tested-by: Jeff Xu <jeffxu@chromium.org>
> > [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/764773
>
> Thanks for the testing! What settings did you use? I think Chrome OS was
> effectively doing this?
>
> CONFIG_PROC_MEM_RESTRICT_OPEN_READ_OFF=y
> CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y
> CONFIG_PROC_MEM_RESTRICT_WRITE_ALL=y
> CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_ALL=y
>
> Though I don't see the FOLL_FORCE changes in the linked Chrome OS patch,
> but I suspect it's unreachable with
> CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_ALL=y.
>
I used CONFIG_PROC_MEM_RESTRICT_WRITE_ALL=y and manually tested
writing to /proc/pid/mem using code similar to [1].
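
A minimal sketch of that kind of manual check, assuming a Linux userspace build with glibc (for process_vm_writev()). It is not Jeff's test code; it only demonstrates the quirk described in [1]. A write through /proc/self/mem is serviced with FOLL_FORCE and can modify a read-only page, whereas process_vm_writev() honours page permissions and fails. On a kernel with a write restriction such as proc_mem.restrict_write=all in effect, the /proc/self/mem write is expected to fail as well.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Map one anonymous page, fill it with 'A', then make it read-only.
 * Returns NULL on failure; *len_out receives the page size. */
static unsigned char *make_readonly_page(size_t *len_out)
{
	long pg = sysconf(_SC_PAGESIZE);
	if (pg <= 0)
		return NULL;
	unsigned char *p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	memset(p, 'A', (size_t)pg);
	if (mprotect(p, (size_t)pg, PROT_READ) != 0) {
		munmap(p, (size_t)pg);
		return NULL;
	}
	*len_out = (size_t)pg;
	return p;
}

/* Write through /proc/self/mem: the file offset is the target address.
 * Page protections are overridden unless a restriction is in effect.
 * Returns 0 on success or a negative errno value. */
static int poke_via_proc_mem(void *addr, const void *buf, size_t len)
{
	int fd = open("/proc/self/mem", O_RDWR);
	if (fd < 0)
		return -errno;
	ssize_t n = pwrite(fd, buf, len, (off_t)(uintptr_t)addr);
	int err = errno; /* capture before close() can clobber it */
	close(fd);
	if (n < 0)
		return -err;
	return (size_t)n == len ? 0 : -EIO;
}

/* Write via process_vm_writev() on our own pid: no FOLL_FORCE, so a
 * read-only destination page makes the call fail (typically EFAULT).
 * Returns 0 on success or a negative errno value. */
static int poke_via_vm_writev(void *addr, const void *buf, size_t len)
{
	struct iovec local = { .iov_base = (void *)buf, .iov_len = len };
	struct iovec remote = { .iov_base = addr, .iov_len = len };
	ssize_t n = process_vm_writev(getpid(), &local, 1, &remote, 1, 0);
	if (n < 0)
		return -errno;
	return (size_t)n == len ? 0 : -EIO;
}
```

A caller can check that poke_via_vm_writev() fails on a page returned by make_readonly_page() and leaves it untouched, while poke_via_proc_mem() succeeds and changes the byte on an unrestricted kernel.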

The __mem_rw_block_writes check is placed ahead of
__mem_rw_get_foll_force_flag, so it doesn't need
CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT. It might be nice to call
this out in kernel-parameters.txt.
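
The ordering described here, together with Adrian's point that an open-time write restriction makes the FOLL_FORCE check unreachable, can be modelled as three gates. This is a hypothetical userspace model: only the helper names __mem_rw_block_writes and __mem_rw_get_foll_force_flag come from the thread, their real signatures are not shown here, and the -EACCES return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-knob states, as listed in the commit message. */
enum mem_restrict {
	MEM_RESTRICT_UNSET,
	MEM_RESTRICT_PTRACER,
	MEM_RESTRICT_ALL,
};

struct mem_policy {
	enum mem_restrict open_write;
	enum mem_restrict write;
	enum mem_restrict foll_force;
};

/* True if a knob set to p denies this caller. */
static bool restricted(enum mem_restrict p, bool is_ptracer)
{
	return p == MEM_RESTRICT_ALL ||
	       (p == MEM_RESTRICT_PTRACER && !is_ptracer);
}

/* Gate 1, at open(): in the modelled flow, a refused write-mode open
 * means the per-access gates below are never reached for that file. */
static int mock_open_for_write(const struct mem_policy *pol, bool is_ptracer)
{
	return restricted(pol->open_write, is_ptracer) ? -EACCES : 0;
}

/* Gates 2 and 3, per access: the write block is evaluated first (the
 * role described for __mem_rw_block_writes), and only then is the
 * FOLL_FORCE flag chosen (the role of __mem_rw_get_foll_force_flag).
 * Returns -EACCES if blocked; otherwise 0 with *use_foll_force set. */
static int mock_mem_rw(const struct mem_policy *pol, bool is_write,
		       bool is_ptracer, bool *use_foll_force)
{
	if (is_write && restricted(pol->write, is_ptracer))
		return -EACCES;
	*use_foll_force = !restricted(pol->foll_force, is_ptracer);
	return 0;
}
```

With a ChromeOS-style policy (write restricted to all, foll_force left unset), mock_mem_rw() rejects a write before the FOLL_FORCE decision is ever made, while reads still get FOLL_FORCE. That is why a separate FOLL_FORCE restriction is not needed to stop forced writes in that configuration.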

I didn't set restrict_open_read or restrict_open_write; ChromeOS doesn't
use those two.

-Jeff

[1] https://offlinemark.com/an-obscure-quirk-of-proc/

Thread overview: 9+ messages
2024-06-13 13:39 [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open Adrian Ratiu
2024-06-13 13:39 ` [PATCH v6 2/2] proc: restrict /proc/pid/mem Adrian Ratiu
2024-06-17 18:00   ` Kees Cook
2024-06-18 22:39   ` Jeff Xu
2024-06-19 20:41     ` Kees Cook
2024-06-19 21:31       ` Adrian Ratiu
2024-06-20 16:24       ` Jeff Xu
2024-06-17  8:48 ` [PATCH v6 1/2] proc: pass file instead of inode to proc_mem_open Christian Brauner
2024-06-17 10:47   ` Adrian Ratiu
