Linux-mm Archive on lore.kernel.org
* [PATCH v2] exec: mirror set_dumpable() to thread-group siblings' user_dumpable
@ 2026-05-15  2:05 Hyunwoo Kim
  2026-05-15  3:28 ` Hyunwoo Kim
  0 siblings, 1 reply; 2+ messages in thread
From: Hyunwoo Kim @ 2026-05-15  2:05 UTC (permalink / raw)
  To: oleg, mingo, peterz, juri.lelli, vincent.guittot, torvalds, qsa,
	kees
  Cc: linux-kernel, linux-mm, imv4bel

task_still_dumpable() reads task->user_dumpable when task->mm is NULL.
That cache is written exactly once in exit_mm(): a single
get_dumpable(mm) load taken just before task->mm is cleared.

The load is not ordered against set_dumpable() writes performed by
CLONE_VM siblings, either via commit_creds() (an euid/egid/fsuid/fsgid
change or a capability gain - !cred_cap_issubset(old, new)) or via
prctl(PR_SET_DUMPABLE). If a sibling stores SUID_DUMP_DISABLE after
the exiting thread observed SUID_DUMP_USER, the live shared mm and the
cached value diverge with no later refresh, so task_still_dumpable()
keeps returning SUID_DUMP_USER for the exiting task.
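
The sequence described above can be replayed in a small user-space model.
This is only a sketch: struct model_mm, struct model_task and every
model_*() helper are hypothetical stand-ins, not kernel code, and the
model is single-threaded, so it shows the ordering problem rather than
the concurrency itself. Only the SUID_DUMP_* values are taken from the
kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same numeric values as the kernel's SUID_DUMP_* constants. */
enum { SUID_DUMP_DISABLE = 0, SUID_DUMP_USER = 1, SUID_DUMP_ROOT = 2 };

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct model_mm { int dumpable; };
struct model_task {
	struct model_mm *mm;	/* NULL once the task has passed exit_mm() */
	bool user_dumpable;	/* cache consulted when mm == NULL */
};

/* Models exit_mm(): take one snapshot of the mm state, then drop mm. */
static void model_exit_mm(struct model_task *t)
{
	t->user_dumpable = (t->mm->dumpable == SUID_DUMP_USER);
	t->mm = NULL;
}

/* Models set_dumpable() without the mirror: only the shared mm changes. */
static void model_set_dumpable(struct model_mm *mm, int value)
{
	mm->dumpable = value;
}

/* Models set_dumpable() with the mirror: also refresh every task's cache. */
static void model_set_dumpable_mirrored(struct model_mm *mm, int value,
					struct model_task *group, size_t n)
{
	mm->dumpable = value;
	for (size_t i = 0; i < n; i++)
		group[i].user_dumpable = (value == SUID_DUMP_USER);
}

/* Models the check: use the live mm if present, otherwise the cache. */
static bool model_still_dumpable(const struct model_task *t)
{
	if (t->mm)
		return t->mm->dumpable == SUID_DUMP_USER;
	return t->user_dumpable;
}

/*
 * Task 0 exits while the mm is SUID_DUMP_USER, then sibling task 1
 * disables dumping. Returns what the check reports for task 0.
 */
static bool exited_task_reports_dumpable(bool mirrored)
{
	struct model_mm mm = { SUID_DUMP_USER };
	struct model_task group[2] = { { &mm, false }, { &mm, false } };

	model_exit_mm(&group[0]);
	if (mirrored)
		model_set_dumpable_mirrored(&mm, SUID_DUMP_DISABLE, group, 2);
	else
		model_set_dumpable(&mm, SUID_DUMP_DISABLE);
	return model_still_dumpable(&group[0]);
}
```

In this model, exited_task_reports_dumpable(false) leaves the exited
task's cache at its pre-disable value, while
exited_task_reports_dumpable(true) reflects the sibling's later store.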

In set_dumpable(), after the atomic update to mm->flags, walk every
thread in current's thread group under rcu_read_lock() and mirror the
new SUID_DUMP_USER bit into each task's user_dumpable cache. All
in-tree callers of set_dumpable() pass current->mm (commit_creds(),
prctl(PR_SET_DUMPABLE), begin_new_exec()), so for_each_thread(current)
covers exactly the CLONE_VM thread group whose mm was just updated.

The mirror takes task_lock(t) around each per-task write to serialize
with exit_mm()'s own read-modify-write of user_dumpable: the bit-field
shares a word with neighboring fields, so without serialization the
mirror's store and exit_mm()'s store to that word can interleave and
the mirror's update can be lost.
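
The lost update on a shared bit-field word can be illustrated with a
deterministic user-space sketch. The bit names and both helpers below
are hypothetical; the interleaving is written out by hand instead of
using real threads, and the actual layout of the word in task_struct is
not modelled.

```c
#include <stdint.h>

/* Hypothetical bit positions within one shared word. */
#define BIT_USER_DUMPABLE	0x1u
#define BIT_NEIGHBOR		0x2u

/*
 * Unserialized case: both writers load the word before either stores.
 * Writer A (the mirror) clears BIT_USER_DUMPABLE; writer B sets
 * BIT_NEIGHBOR from its stale copy and stores last.
 */
static uint32_t interleaved_rmw(uint32_t word)
{
	uint32_t a = word;		/* A loads */
	uint32_t b = word;		/* B loads the same old value */

	a &= ~BIT_USER_DUMPABLE;	/* A modifies its copy */
	b |= BIT_NEIGHBOR;		/* B modifies its stale copy */
	word = a;			/* A stores */
	word = b;			/* B stores and undoes A's clear */
	return word;
}

/* Serialized case: each load-modify-store completes before the next. */
static uint32_t serialized_rmw(uint32_t word)
{
	uint32_t a = word;

	a &= ~BIT_USER_DUMPABLE;
	word = a;

	uint32_t b = word;

	b |= BIT_NEIGHBOR;
	word = b;
	return word;
}
```

Starting from a word with only BIT_USER_DUMPABLE set, the interleaved
variant ends with that bit still set, whereas the serialized variant
ends with only BIT_NEIGHBOR set. Holding a common lock around each
read-modify-write forces the second ordering.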

This keeps the cache in sync with the latest set_dumpable() result
even after every thread in the group has passed exit_mm().

Fixes: 31e62c2ebbfd ("ptrace: slightly saner 'get_dumpable()' logic")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
Changes in v2:
- Move the update from a read-side sibling walk in task_still_dumpable()
  to a write-side mirror in set_dumpable(); v1 left the cache stale
  when no live sibling mm remained.
- v1: https://lore.kernel.org/all/agZjJdaIGCvf0z_1@v4bel/
---
 fs/exec.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index ba12b4c466f6..9732b38b727f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1910,10 +1910,26 @@ EXPORT_SYMBOL(set_binfmt);
  */
 void set_dumpable(struct mm_struct *mm, int value)
 {
+	struct task_struct *t;
+	bool user;
+
 	if (WARN_ON((unsigned)value > SUID_DUMP_ROOT))
 		return;
 
 	__mm_flags_set_mask_dumpable(mm, value);
+
+	/*
+	 * Mirror to every sibling's user_dumpable cache, serialized with
+	 * exit_mm()'s write via task_lock(t) to avoid bit-field RMW races.
+	 */
+	user = (value == SUID_DUMP_USER);
+	rcu_read_lock();
+	for_each_thread(current, t) {
+		task_lock(t);
+		t->user_dumpable = user;
+		task_unlock(t);
+	}
+	rcu_read_unlock();
 }
 
 static inline struct user_arg_ptr native_arg(const char __user *const __user *p)
-- 
2.43.0




* Re: [PATCH v2] exec: mirror set_dumpable() to thread-group siblings' user_dumpable
  2026-05-15  2:05 [PATCH v2] exec: mirror set_dumpable() to thread-group siblings' user_dumpable Hyunwoo Kim
@ 2026-05-15  3:28 ` Hyunwoo Kim
  0 siblings, 0 replies; 2+ messages in thread
From: Hyunwoo Kim @ 2026-05-15  3:28 UTC (permalink / raw)
  To: oleg, mingo, peterz, juri.lelli, vincent.guittot, torvalds, qsa,
	kees
  Cc: linux-kernel, linux-mm, imv4bel

Withdrawing this patch.

On re-checking, the stale-cache condition is reachable only in a
contrived scenario with no practical impact; realistic binary flows are
already blocked by 31e62c2ebbfd.


Best regards,
Hyunwoo Kim

