linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] kernel: Avoid memcpy of task comm
@ 2024-06-02  2:37 Yafang Shao
  2024-06-02  2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  2:37 UTC (permalink / raw)
  To: torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Yafang Shao

Using memcpy to copy the task comm relies on the length of task comm.
Changes in the task comm could result in a destination string that is not
NUL-terminated. Therefore, we should explicitly ensure the destination
string is always NUL-terminated, regardless of the task comm. This approach
will facilitate future extensions to the task comm.

As suggested by Linus [0], we can identify all relevant code with the
following git grep command:

  git grep 'memcpy.*->comm\>'

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0]

Yafang Shao (6):
  fs/exec: Drop task_lock() inside __get_task_comm()
  tracing: Replace memcpy() with __get_task_comm()
  auditsc: Replace memcpy() with __get_task_comm()
  security: Replace memcpy() with __get_task_comm()
  bpftool: Make task comm always be NUL-terminated
  selftests/bpf: Replace memcpy() with __get_task_comm()

 fs/exec.c                                     |  7 +++--
 include/linux/sched.h                         |  2 +-
 include/linux/tracepoint.h                    |  4 +--
 include/trace/events/block.h                  | 10 +++----
 include/trace/events/oom.h                    |  2 +-
 include/trace/events/osnoise.h                |  2 +-
 include/trace/events/sched.h                  | 27 ++++++++++---------
 include/trace/events/signal.h                 |  2 +-
 include/trace/events/task.h                   |  4 +--
 kernel/auditsc.c                              |  6 ++---
 security/lsm_audit.c                          |  4 +--
 security/selinux/selinuxfs.c                  |  2 +-
 tools/bpf/bpftool/pids.c                      |  2 ++
 .../bpf/bpf_testmod/bpf_testmod-events.h      |  2 +-
 14 files changed, 41 insertions(+), 35 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02  2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao
@ 2024-06-02  2:37 ` Yafang Shao
  2024-06-02  3:51   ` Eric W. Biederman
                     ` (2 more replies)
  2024-06-02  2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  2:37 UTC (permalink / raw)
  To: torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Yafang Shao, Alexander Viro,
	Christian Brauner, Jan Kara, Eric Biederman, Kees Cook

Quoted from Linus [0]:

  Since user space can randomly change their names anyway, using locking
  was always wrong for readers (for writers it probably does make sense
  to have some lock - although practically speaking nobody cares there
  either, but at least for a writer some kind of race could have
  long-term mixed results

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
---
 fs/exec.c             | 7 +++++--
 include/linux/sched.h | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b3c40fbb325f..b43992d35a8a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1227,12 +1227,15 @@ static int unshare_sighand(struct task_struct *me)
 	return 0;
 }
 
+/*
+ * User space can randomly change their names anyway, so locking for readers
+ * doesn't make sense. For writers, locking is probably necessary, as a race
+ * condition could lead to long-term mixed results.
+ */
 char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
 {
-	task_lock(tsk);
 	/* Always NUL terminated and zero-padded */
 	strscpy_pad(buf, tsk->comm, buf_size);
-	task_unlock(tsk);
 	return buf;
 }
 EXPORT_SYMBOL_GPL(__get_task_comm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c75fd46506df..56a927393a38 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1083,7 +1083,7 @@ struct task_struct {
 	 *
 	 * - normally initialized setup_new_exec()
 	 * - access it with [gs]et_task_comm()
-	 * - lock it with task_lock()
+	 * - lock it with task_lock() for writing
 	 */
 	char				comm[TASK_COMM_LEN];
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-02  2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao
  2024-06-02  2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao
@ 2024-06-02  2:37 ` Yafang Shao
  2024-06-03 21:20   ` Steven Rostedt
  2024-06-02  2:37 ` [PATCH 3/6] auditsc: " Yafang Shao
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  2:37 UTC (permalink / raw)
  To: torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Yafang Shao, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers

Using __get_task_comm() to read the task comm ensures that the name is
always NUL-terminated, regardless of the source string. This approach also
facilitates future extensions to the task comm.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/tracepoint.h     |  4 ++--
 include/trace/events/block.h   | 10 +++++-----
 include/trace/events/oom.h     |  2 +-
 include/trace/events/osnoise.h |  2 +-
 include/trace/events/sched.h   | 27 ++++++++++++++-------------
 include/trace/events/signal.h  |  2 +-
 include/trace/events/task.h    |  4 ++--
 7 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 689b6d71590e..6381824d8107 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -519,10 +519,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
  *	*
  *
  *	TP_fast_assign(
- *		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
+ *		__get_task_comm(__entry->next_comm, TASK_COMM_LEN, next);
  *		__entry->prev_pid	= prev->pid;
  *		__entry->prev_prio	= prev->prio;
- *		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
+ *		__get_task_comm(__entry->prev_comm, TASK_COMM_LEN, prev);
  *		__entry->next_pid	= next->pid;
  *		__entry->next_prio	= next->prio;
  *	),
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 0e128ad51460..6f8c5d0014e6 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -193,7 +193,7 @@ DECLARE_EVENT_CLASS(block_rq,
 
 		blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
 		__get_str(cmd)[0] = '\0';
-		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, current);
 	),
 
 	TP_printk("%d,%d %s %u (%s) %llu + %u [%s]",
@@ -328,7 +328,7 @@ DECLARE_EVENT_CLASS(block_bio,
 		__entry->sector		= bio->bi_iter.bi_sector;
 		__entry->nr_sector	= bio_sectors(bio);
 		blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
-		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, current);
 	),
 
 	TP_printk("%d,%d %s %llu + %u [%s]",
@@ -415,7 +415,7 @@ TRACE_EVENT(block_plug,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, current);
 	),
 
 	TP_printk("[%s]", __entry->comm)
@@ -434,7 +434,7 @@ DECLARE_EVENT_CLASS(block_unplug,
 
 	TP_fast_assign(
 		__entry->nr_rq = depth;
-		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, current);
 	),
 
 	TP_printk("[%s] %d", __entry->comm, __entry->nr_rq)
@@ -485,7 +485,7 @@ TRACE_EVENT(block_split,
 		__entry->sector		= bio->bi_iter.bi_sector;
 		__entry->new_sector	= new_sector;
 		blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
-		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, current);
 	),
 
 	TP_printk("%d,%d %s %llu / %llu [%s]",
diff --git a/include/trace/events/oom.h b/include/trace/events/oom.h
index b799f3bcba82..f29be9ebcd4d 100644
--- a/include/trace/events/oom.h
+++ b/include/trace/events/oom.h
@@ -23,7 +23,7 @@ TRACE_EVENT(oom_score_adj_update,
 
 	TP_fast_assign(
 		__entry->pid = task->pid;
-		memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, task);
 		__entry->oom_score_adj = task->signal->oom_score_adj;
 	),
 
diff --git a/include/trace/events/osnoise.h b/include/trace/events/osnoise.h
index 82f741ec0f57..50f480655722 100644
--- a/include/trace/events/osnoise.h
+++ b/include/trace/events/osnoise.h
@@ -20,7 +20,7 @@ TRACE_EVENT(thread_noise,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, t);
 		__entry->pid = t->pid;
 		__entry->start = start;
 		__entry->duration = duration;
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 68973f650c26..2a9d7c62c58a 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -9,6 +9,7 @@
 #include <linux/sched/numa_balancing.h>
 #include <linux/tracepoint.h>
 #include <linux/binfmts.h>
+#include <linux/sched.h>
 
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
@@ -25,7 +26,7 @@ TRACE_EVENT(sched_kthread_stop,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, t);
 		__entry->pid	= t->pid;
 	),
 
@@ -152,7 +153,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, p);
 		__entry->pid		= p->pid;
 		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
 		__entry->target_cpu	= task_cpu(p);
@@ -239,11 +240,11 @@ TRACE_EVENT(sched_switch,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->next_comm, TASK_COMM_LEN, next);
 		__entry->prev_pid	= prev->pid;
 		__entry->prev_prio	= prev->prio;
 		__entry->prev_state	= __trace_sched_switch_state(preempt, prev_state, prev);
-		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->prev_comm, TASK_COMM_LEN, prev);
 		__entry->next_pid	= next->pid;
 		__entry->next_prio	= next->prio;
 		/* XXX SCHED_DEADLINE */
@@ -286,7 +287,7 @@ TRACE_EVENT(sched_migrate_task,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, p);
 		__entry->pid		= p->pid;
 		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
 		__entry->orig_cpu	= task_cpu(p);
@@ -311,7 +312,7 @@ DECLARE_EVENT_CLASS(sched_process_template,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, p);
 		__entry->pid		= p->pid;
 		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
 	),
@@ -357,7 +358,7 @@ TRACE_EVENT(sched_process_wait,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, current);
 		__entry->pid		= pid_nr(pid);
 		__entry->prio		= current->prio; /* XXX SCHED_DEADLINE */
 	),
@@ -383,9 +384,9 @@ TRACE_EVENT(sched_process_fork,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->parent_comm, parent->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->parent_comm, TASK_COMM_LEN, parent);
 		__entry->parent_pid	= parent->pid;
-		memcpy(__entry->child_comm, child->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->child_comm, TASK_COMM_LEN, child);
 		__entry->child_pid	= child->pid;
 	),
 
@@ -481,7 +482,7 @@ DECLARE_EVENT_CLASS_SCHEDSTAT(sched_stat_template,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, tsk);
 		__entry->pid	= tsk->pid;
 		__entry->delay	= delay;
 	),
@@ -539,7 +540,7 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, tsk);
 		__entry->pid		= tsk->pid;
 		__entry->runtime	= runtime;
 	),
@@ -571,7 +572,7 @@ TRACE_EVENT(sched_pi_setprio,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, tsk);
 		__entry->pid		= tsk->pid;
 		__entry->oldprio	= tsk->prio;
 		__entry->newprio	= pi_task ?
@@ -596,7 +597,7 @@ TRACE_EVENT(sched_process_hang,
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, tsk);
 		__entry->pid = tsk->pid;
 	),
 
diff --git a/include/trace/events/signal.h b/include/trace/events/signal.h
index 1db7e4b07c01..8f317a265392 100644
--- a/include/trace/events/signal.h
+++ b/include/trace/events/signal.h
@@ -67,7 +67,7 @@ TRACE_EVENT(signal_generate,
 	TP_fast_assign(
 		__entry->sig	= sig;
 		TP_STORE_SIGINFO(__entry, info);
-		memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, task);
 		__entry->pid	= task->pid;
 		__entry->group	= group;
 		__entry->result	= result;
diff --git a/include/trace/events/task.h b/include/trace/events/task.h
index 47b527464d1a..77c14707460e 100644
--- a/include/trace/events/task.h
+++ b/include/trace/events/task.h
@@ -21,7 +21,7 @@ TRACE_EVENT(task_newtask,
 
 	TP_fast_assign(
 		__entry->pid = task->pid;
-		memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, task);
 		__entry->clone_flags = clone_flags;
 		__entry->oom_score_adj = task->signal->oom_score_adj;
 	),
@@ -46,7 +46,7 @@ TRACE_EVENT(task_rename,
 
 	TP_fast_assign(
 		__entry->pid = task->pid;
-		memcpy(entry->oldcomm, task->comm, TASK_COMM_LEN);
+		__get_task_comm(entry->oldcomm, TASK_COMM_LEN, task);
 		strscpy(entry->newcomm, comm, TASK_COMM_LEN);
 		__entry->oom_score_adj = task->signal->oom_score_adj;
 	),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/6] auditsc: Replace memcpy() with __get_task_comm()
  2024-06-02  2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao
  2024-06-02  2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao
  2024-06-02  2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao
@ 2024-06-02  2:37 ` Yafang Shao
  2024-06-03 21:03   ` Paul Moore
  2024-06-02  2:37 ` [PATCH 4/6] security: " Yafang Shao
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  2:37 UTC (permalink / raw)
  To: torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Yafang Shao, Paul Moore,
	Eric Paris

Using __get_task_comm() to read the task comm ensures that the name is
always NUL-terminated, regardless of the source string. This approach also
facilitates future extensions to the task comm.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@redhat.com>
---
 kernel/auditsc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 6f0d6fb6523f..0459a141dc86 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t)
 	context->target_uid = task_uid(t);
 	context->target_sessionid = audit_get_sessionid(t);
 	security_task_getsecid_obj(t, &context->target_sid);
-	memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
+	__get_task_comm(context->target_comm, TASK_COMM_LEN, t);
 }
 
 /**
@@ -2757,7 +2757,7 @@ int audit_signal_info_syscall(struct task_struct *t)
 		ctx->target_uid = t_uid;
 		ctx->target_sessionid = audit_get_sessionid(t);
 		security_task_getsecid_obj(t, &ctx->target_sid);
-		memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN);
+		__get_task_comm(ctx->target_comm, TASK_COMM_LEN, t);
 		return 0;
 	}
 
@@ -2778,7 +2778,7 @@ int audit_signal_info_syscall(struct task_struct *t)
 	axp->target_uid[axp->pid_count] = t_uid;
 	axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
 	security_task_getsecid_obj(t, &axp->target_sid[axp->pid_count]);
-	memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN);
+	__get_task_comm(axp->target_comm[axp->pid_count], TASK_COMM_LEN, t);
 	axp->pid_count++;
 
 	return 0;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/6] security: Replace memcpy() with __get_task_comm()
  2024-06-02  2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao
                   ` (2 preceding siblings ...)
  2024-06-02  2:37 ` [PATCH 3/6] auditsc: " Yafang Shao
@ 2024-06-02  2:37 ` Yafang Shao
  2024-06-03 22:06   ` Paul Moore
  2024-06-02  2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao
  2024-06-02  2:46 ` [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm() Yafang Shao
  5 siblings, 1 reply; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  2:37 UTC (permalink / raw)
  To: torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Yafang Shao, Paul Moore,
	James Morris, Serge E. Hallyn, Stephen Smalley, Ondrej Mosnacek

Quoted from Linus [0]:

  selinux never wanted a lock, and never wanted any kind of *consistent*
  result, it just wanted a *stable* result.

Using __get_task_comm() to read the task comm ensures that the name is
always NUL-terminated, regardless of the source string. This approach also
facilitates future extensions to the task comm.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
LINK: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com/ [0]
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
---
 security/lsm_audit.c         | 4 ++--
 security/selinux/selinuxfs.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 849e832719e2..a922e4339dd5 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -207,7 +207,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 	BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2);
 
 	audit_log_format(ab, " pid=%d comm=", task_tgid_nr(current));
-	audit_log_untrustedstring(ab, memcpy(comm, current->comm, sizeof(comm)));
+	audit_log_untrustedstring(ab, __get_task_comm(comm, sizeof(comm), current));
 
 	switch (a->type) {
 	case LSM_AUDIT_DATA_NONE:
@@ -302,7 +302,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 				char comm[sizeof(tsk->comm)];
 				audit_log_format(ab, " opid=%d ocomm=", pid);
 				audit_log_untrustedstring(ab,
-				    memcpy(comm, tsk->comm, sizeof(comm)));
+				    __get_task_comm(comm, sizeof(comm), tsk));
 			}
 		}
 		break;
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index e172f182b65c..a8a2ec742576 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -708,7 +708,7 @@ static ssize_t sel_write_checkreqprot(struct file *file, const char __user *buf,
 	if (new_value) {
 		char comm[sizeof(current->comm)];
 
-		memcpy(comm, current->comm, sizeof(comm));
+		__get_task_comm(comm, sizeof(comm), current);
 		pr_err("SELinux: %s (%d) set checkreqprot to 1. This is no longer supported.\n",
 		       comm, current->pid);
 	}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/6] bpftool: Make task comm always be NUL-terminated
  2024-06-02  2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao
                   ` (3 preceding siblings ...)
  2024-06-02  2:37 ` [PATCH 4/6] security: " Yafang Shao
@ 2024-06-02  2:37 ` Yafang Shao
  2024-06-02 21:01   ` Quentin Monnet
  2024-06-02  2:46 ` [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm() Yafang Shao
  5 siblings, 1 reply; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  2:37 UTC (permalink / raw)
  To: torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Yafang Shao, Quentin Monnet

Let's explicitly ensure the destination string is NUL-terminated. This way,
it won't be affected by changes to the source string.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Quentin Monnet <qmo@kernel.org>
---
 tools/bpf/bpftool/pids.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c
index 9b898571b49e..23f488cf1740 100644
--- a/tools/bpf/bpftool/pids.c
+++ b/tools/bpf/bpftool/pids.c
@@ -54,6 +54,7 @@ static void add_ref(struct hashmap *map, struct pid_iter_entry *e)
 		ref = &refs->refs[refs->ref_cnt];
 		ref->pid = e->pid;
 		memcpy(ref->comm, e->comm, sizeof(ref->comm));
+		ref->comm[sizeof(ref->comm) - 1] = '\0';
 		refs->ref_cnt++;
 
 		return;
@@ -77,6 +78,7 @@ static void add_ref(struct hashmap *map, struct pid_iter_entry *e)
 	ref = &refs->refs[0];
 	ref->pid = e->pid;
 	memcpy(ref->comm, e->comm, sizeof(ref->comm));
+	ref->comm[sizeof(ref->comm) - 1] = '\0';
 	refs->ref_cnt = 1;
 	refs->has_bpf_cookie = e->has_bpf_cookie;
 	refs->bpf_cookie = e->bpf_cookie;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm()
  2024-06-02  2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao
                   ` (4 preceding siblings ...)
  2024-06-02  2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao
@ 2024-06-02  2:46 ` Yafang Shao
  5 siblings, 0 replies; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  2:46 UTC (permalink / raw)
  To: laoar.shao, torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko

Using __get_task_comm() to read the task comm ensures that the name is
always NUL-terminated, regardless of the source string. This approach also
facilitates future extensions to the task comm.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Mykola Lysenko <mykolal@fb.com>
---
 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
index 11ee801e75e7..e5df95b56c53 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
@@ -20,7 +20,7 @@ TRACE_EVENT(bpf_testmod_test_read,
 	),
 	TP_fast_assign(
 		__entry->pid = task->pid;
-		memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+		__get_task_comm(__entry->comm, TASK_COMM_LEN, task);
 		__entry->off = ctx->off;
 		__entry->len = ctx->len;
 	),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02  2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao
@ 2024-06-02  3:51   ` Eric W. Biederman
  2024-06-02  6:56     ` Yafang Shao
  2024-06-04 13:02   ` Matus Jokay
  2024-06-04 20:01   ` Matus Jokay
  2 siblings, 1 reply; 31+ messages in thread
From: Eric W. Biederman @ 2024-06-02  3:51 UTC (permalink / raw)
  To: Yafang Shao
  Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

Yafang Shao <laoar.shao@gmail.com> writes:

> Quoted from Linus [0]:
>
>   Since user space can randomly change their names anyway, using locking
>   was always wrong for readers (for writers it probably does make sense
>   to have some lock - although practically speaking nobody cares there
>   either, but at least for a writer some kind of race could have
>   long-term mixed results

Ugh.
Ick.

This code is buggy.

I won't argue that Linus is wrong, about removing the
task_lock.

Unfortunately strscpy_pad does not work properly with the
task_lock removed, and buf_size larger that TASK_COMM_LEN.
There is a race that will allow reading past the end
of tsk->comm, if we read while tsk->common is being
updated.

So __get_task_comm needs to look something like:

char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
{
	size_t len = buf_size;
        if (len > TASK_COMM_LEN)
        	len = TASK_COMM_LEN;
	memcpy(buf, tsk->comm, len);
        buf[len -1] = '\0';
	return buf;
}

What shows up in buf past the '\0' is not guaranteed in the above
version but I would be surprised if anyone cares.

If people do care the code can do something like:
char *last = strchr(buf);
memset(last, '\0', buf_size - (last - buf));

To zero everything in the buffer past the first '\0' byte.


Eric


> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Kees Cook <keescook@chromium.org>
> ---
>  fs/exec.c             | 7 +++++--
>  include/linux/sched.h | 2 +-
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index b3c40fbb325f..b43992d35a8a 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1227,12 +1227,15 @@ static int unshare_sighand(struct task_struct *me)
>  	return 0;
>  }
>  
> +/*
> + * User space can randomly change their names anyway, so locking for readers
> + * doesn't make sense. For writers, locking is probably necessary, as a race
> + * condition could lead to long-term mixed results.
> + */
>  char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
>  {
> -	task_lock(tsk);
>  	/* Always NUL terminated and zero-padded */
>  	strscpy_pad(buf, tsk->comm, buf_size);
> -	task_unlock(tsk);
>  	return buf;
>  }
>  EXPORT_SYMBOL_GPL(__get_task_comm);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c75fd46506df..56a927393a38 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1083,7 +1083,7 @@ struct task_struct {
>  	 *
>  	 * - normally initialized setup_new_exec()
>  	 * - access it with [gs]et_task_comm()
> -	 * - lock it with task_lock()
> +	 * - lock it with task_lock() for writing
>  	 */
>  	char				comm[TASK_COMM_LEN];

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02  3:51   ` Eric W. Biederman
@ 2024-06-02  6:56     ` Yafang Shao
  2024-06-02 16:35       ` Alexei Starovoitov
  2024-06-02 17:56       ` Eric W. Biederman
  0 siblings, 2 replies; 31+ messages in thread
From: Yafang Shao @ 2024-06-02  6:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Yafang Shao <laoar.shao@gmail.com> writes:
>
> > Quoted from Linus [0]:
> >
> >   Since user space can randomly change their names anyway, using locking
> >   was always wrong for readers (for writers it probably does make sense
> >   to have some lock - although practically speaking nobody cares there
> >   either, but at least for a writer some kind of race could have
> >   long-term mixed results
>
> Ugh.
> Ick.
>
> This code is buggy.
>
> I won't argue that Linus is wrong, about removing the
> task_lock.
>
> Unfortunately strscpy_pad does not work properly with the
> task_lock removed, and buf_size larger that TASK_COMM_LEN.
> There is a race that will allow reading past the end
> of tsk->comm, if we read while tsk->common is being
> updated.

It appears so. Thanks for pointing it out. Additionally, other code,
such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad()
directly without the task_lock. It seems we should change that as
well.

>
> So __get_task_comm needs to look something like:
>
> char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
> {
>         size_t len = buf_size;
>         if (len > TASK_COMM_LEN)
>                 len = TASK_COMM_LEN;
>         memcpy(buf, tsk->comm, len);
>         buf[len -1] = '\0';
>         return buf;
> }

Thanks for your suggestion.

>
> What shows up in buf past the '\0' is not guaranteed in the above
> version but I would be surprised if anyone cares.

I believe we pad it to prevent the leakage of kernel data. In this
case, since no kernel data will be leaked, the following change may be
unnecessary.

>
> If people do care the code can do something like:
> char *last = strchr(buf);
> memset(last, '\0', buf_size - (last - buf));
>
> To zero everything in the buffer past the first '\0' byte.
>

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02  6:56     ` Yafang Shao
@ 2024-06-02 16:35       ` Alexei Starovoitov
  2024-06-02 17:52         ` Eric W. Biederman
  2024-06-02 17:56       ` Eric W. Biederman
  1 sibling, 1 reply; 31+ messages in thread
From: Alexei Starovoitov @ 2024-06-02 16:35 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Eric W. Biederman, Linus Torvalds, linux-mm, Linux-Fsdevel,
	linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > Yafang Shao <laoar.shao@gmail.com> writes:
> >
> > > Quoted from Linus [0]:
> > >
> > >   Since user space can randomly change their names anyway, using locking
> > >   was always wrong for readers (for writers it probably does make sense
> > >   to have some lock - although practically speaking nobody cares there
> > >   either, but at least for a writer some kind of race could have
> > >   long-term mixed results
> >
> > Ugh.
> > Ick.
> >
> > This code is buggy.
> >
> > I won't argue that Linus is wrong, about removing the
> > task_lock.
> >
> > Unfortunately strscpy_pad does not work properly with the
> > task_lock removed, and buf_size larger that TASK_COMM_LEN.
> > There is a race that will allow reading past the end
> > of tsk->comm, if we read while tsk->common is being
> > updated.
>
> It appears so. Thanks for pointing it out. Additionally, other code,
> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad()
> directly without the task_lock. It seems we should change that as
> well.

Hmm. What race do you see?
If lock is removed from __get_task_comm() it probably can be removed from
__set_task_comm() as well.
And both are calling strscpy_pad to write and read comm.
So I don't see how it would read past sizeof(comm),
because 'buf' passed into __set_task_comm is NUL-terminated.
So the concurrent read will find it.

> >
> > So __get_task_comm needs to look something like:
> >
> > char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
> > {
> >         size_t len = buf_size;
> >         if (len > TASK_COMM_LEN)
> >                 len = TASK_COMM_LEN;
> >         memcpy(buf, tsk->comm, len);
> >         buf[len -1] = '\0';
> >         return buf;
> > }
>
> Thanks for your suggestion.
>
> >
> > What shows up in buf past the '\0' is not guaranteed in the above
> > version but I would be surprised if anyone cares.
>
> I believe we pad it to prevent the leakage of kernel data. In this
> case, since no kernel data will be leaked, the following change may be
> unnecessary.

It's not about leaking of kernel data, but more about not writing
garbage past NUL.
Because comm[] is a part of some record that is used as a key
in a hash map.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02 16:35       ` Alexei Starovoitov
@ 2024-06-02 17:52         ` Eric W. Biederman
  2024-06-02 18:23           ` Alexei Starovoitov
  2024-06-02 20:11           ` Linus Torvalds
  0 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2024-06-02 17:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel,
	linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>
>> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >
>> > Yafang Shao <laoar.shao@gmail.com> writes:
>> >
>> > > Quoted from Linus [0]:
>> > >
>> > >   Since user space can randomly change their names anyway, using locking
>> > >   was always wrong for readers (for writers it probably does make sense
>> > >   to have some lock - although practically speaking nobody cares there
>> > >   either, but at least for a writer some kind of race could have
>> > >   long-term mixed results
>> >
>> > Ugh.
>> > Ick.
>> >
>> > This code is buggy.
>> >
>> > I won't argue that Linus is wrong, about removing the
>> > task_lock.
>> >
>> > Unfortunately strscpy_pad does not work properly with the
>> > task_lock removed, and buf_size larger that TASK_COMM_LEN.
>> > There is a race that will allow reading past the end
>> > of tsk->comm, if we read while tsk->common is being
>> > updated.
>>
>> It appears so. Thanks for pointing it out. Additionally, other code,
>> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad()
>> directly without the task_lock. It seems we should change that as
>> well.
>
> Hmm. What race do you see?
> If lock is removed from __get_task_comm() it probably can be removed from
> __set_task_comm() as well.
> And both are calling strscpy_pad to write and read comm.
> So I don't see how it would read past sizeof(comm),
> because 'buf' passed into __set_task_comm is NUL-terminated.
> So the concurrent read will find it.

The read may race with a write that is changing the location
of '\0'.  Especially if the new value is shorter than
the old value.

If you are performing lockless reads and depending upon a '\0'
terminator without limiting yourself to the size of the buffer
there needs to be a big fat comment as to how in the world
you are guaranteed that a '\0' inside the buffer will always
be found.

Eric

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02  6:56     ` Yafang Shao
  2024-06-02 16:35       ` Alexei Starovoitov
@ 2024-06-02 17:56       ` Eric W. Biederman
  1 sibling, 0 replies; 31+ messages in thread
From: Eric W. Biederman @ 2024-06-02 17:56 UTC (permalink / raw)
  To: Yafang Shao
  Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

Yafang Shao <laoar.shao@gmail.com> writes:

> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Yafang Shao <laoar.shao@gmail.com> writes:
>>
>> > Quoted from Linus [0]:
>> >
>> >   Since user space can randomly change their names anyway, using locking
>> >   was always wrong for readers (for writers it probably does make sense
>> >   to have some lock - although practically speaking nobody cares there
>> >   either, but at least for a writer some kind of race could have
>> >   long-term mixed results
>>
>> Ugh.
>> Ick.
>>
>> This code is buggy.
>>
>> I won't argue that Linus is wrong, about removing the
>> task_lock.
>>
>> Unfortunately strscpy_pad does not work properly with the
>> task_lock removed, and buf_size larger that TASK_COMM_LEN.
>> There is a race that will allow reading past the end
>> of tsk->comm, if we read while tsk->common is being
>> updated.
>
> It appears so. Thanks for pointing it out. Additionally, other code,
> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad()
> directly without the task_lock. It seems we should change that as
> well.

Which suggests that we could really use a helper that handles all of
the tricky business of reading the tsk->comm lock-free.

Eric

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02 17:52         ` Eric W. Biederman
@ 2024-06-02 18:23           ` Alexei Starovoitov
  2024-06-03 11:35             ` Yafang Shao
  2024-06-10 12:34             ` Eric W. Biederman
  2024-06-02 20:11           ` Linus Torvalds
  1 sibling, 2 replies; 31+ messages in thread
From: Alexei Starovoitov @ 2024-06-02 18:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel,
	linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>
> >> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >
> >> > Yafang Shao <laoar.shao@gmail.com> writes:
> >> >
> >> > > Quoted from Linus [0]:
> >> > >
> >> > >   Since user space can randomly change their names anyway, using locking
> >> > >   was always wrong for readers (for writers it probably does make sense
> >> > >   to have some lock - although practically speaking nobody cares there
> >> > >   either, but at least for a writer some kind of race could have
> >> > >   long-term mixed results
> >> >
> >> > Ugh.
> >> > Ick.
> >> >
> >> > This code is buggy.
> >> >
> >> > I won't argue that Linus is wrong, about removing the
> >> > task_lock.
> >> >
> >> > Unfortunately strscpy_pad does not work properly with the
> >> > task_lock removed, and buf_size larger that TASK_COMM_LEN.
> >> > There is a race that will allow reading past the end
> >> > of tsk->comm, if we read while tsk->common is being
> >> > updated.
> >>
> >> It appears so. Thanks for pointing it out. Additionally, other code,
> >> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad()
> >> directly without the task_lock. It seems we should change that as
> >> well.
> >
> > Hmm. What race do you see?
> > If lock is removed from __get_task_comm() it probably can be removed from
> > __set_task_comm() as well.
> > And both are calling strscpy_pad to write and read comm.
> > So I don't see how it would read past sizeof(comm),
> > because 'buf' passed into __set_task_comm is NUL-terminated.
> > So the concurrent read will find it.
>
> The read may race with a write that is changing the location
> of '\0'.  Especially if the new value is shorter than
> the old value.

so ?
strscpy_pad in __[gs]et_task_comm will read/write either long
or byte at a time.
Assume 64 bit and, say, we had comm where 2nd u64 had NUL.
Now two cpus are racing. One is writing shorter comm.
Another is reading.
The latter can read 1st u64 without NUL and will proceed
to read 2nd u64. Either it will read the old u64 with NUL in it
or it will read all zeros in 2nd u64 or some zeros in 2nd u64
depending on how the compiler generated memset(.., 0, ..)
as part of strscpy_pad().
_pad() part is critical here.
If it was just strscpy() then there would indeed be a chance
of reading both u64-s and not finding NUL in any of them.

> If you are performing lockless reads and depending upon a '\0'
> terminator without limiting yourself to the size of the buffer
> there needs to be a big fat comment as to how in the world
> you are guaranteed that a '\0' inside the buffer will always
> be found.

I think Yafang can certainly add such a comment next to
__[gs]et_task_comm.

I prefer to avoid open coding memcpy + mmemset when strscpy_pad works.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02 17:52         ` Eric W. Biederman
  2024-06-02 18:23           ` Alexei Starovoitov
@ 2024-06-02 20:11           ` Linus Torvalds
  1 sibling, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2024-06-02 20:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexei Starovoitov, Yafang Shao, linux-mm, Linux-Fsdevel,
	linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

On Sun, 2 Jun 2024 at 10:53, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> The read may race with a write that is changing the location
> of '\0'.  Especially if the new value is shorter than
> the old value.

It *shouldn't* happen.

So 'strscpy()' itself is written to be NUL-safe, in that if it ever
copies a NUL character, it will stop. Admittedly the byte loop at the
end might technically need a READ_ONCE() for that to eb strictly true
in theory, but in practice it already is.

And even if the new string is shorter, the comm[] array will always
have a NUL terminator _somewhere_, in how the last byte is never
non-NUL.

Now, the only real issue is if something writes *to* the  comm[] array
without following the rules properly - like writing a non-NULL
character to the end of the array before then filling it in with NUL
again.

But that would be a bug on the comm[] writing side, I feel, not a bug
on the reader side.

               Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/6] bpftool: Make task comm always be NUL-terminated
  2024-06-02  2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao
@ 2024-06-02 21:01   ` Quentin Monnet
  0 siblings, 0 replies; 31+ messages in thread
From: Quentin Monnet @ 2024-06-02 21:01 UTC (permalink / raw)
  To: Yafang Shao, torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf

On 02/06/2024 03:37, Yafang Shao wrote:
> Let's explicitly ensure the destination string is NUL-terminated. This way,
> it won't be affected by changes to the source string.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Quentin Monnet <qmo@kernel.org>

The change looks good to me, thank you.

Reviewed-by: Quentin Monnet <qmo@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02 18:23           ` Alexei Starovoitov
@ 2024-06-03 11:35             ` Yafang Shao
  2024-06-10 12:34             ` Eric W. Biederman
  1 sibling, 0 replies; 31+ messages in thread
From: Yafang Shao @ 2024-06-03 11:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric W. Biederman, Linus Torvalds, linux-mm, Linux-Fsdevel,
	linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

On Mon, Jun 3, 2024 at 2:23 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >
> > > On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >>
> > >> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > >> >
> > >> > Yafang Shao <laoar.shao@gmail.com> writes:
> > >> >
> > >> > > Quoted from Linus [0]:
> > >> > >
> > >> > >   Since user space can randomly change their names anyway, using locking
> > >> > >   was always wrong for readers (for writers it probably does make sense
> > >> > >   to have some lock - although practically speaking nobody cares there
> > >> > >   either, but at least for a writer some kind of race could have
> > >> > >   long-term mixed results
> > >> >
> > >> > Ugh.
> > >> > Ick.
> > >> >
> > >> > This code is buggy.
> > >> >
> > >> > I won't argue that Linus is wrong, about removing the
> > >> > task_lock.
> > >> >
> > >> > Unfortunately strscpy_pad does not work properly with the
> > >> > task_lock removed, and buf_size larger that TASK_COMM_LEN.
> > >> > There is a race that will allow reading past the end
> > >> > of tsk->comm, if we read while tsk->common is being
> > >> > updated.
> > >>
> > >> It appears so. Thanks for pointing it out. Additionally, other code,
> > >> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad()
> > >> directly without the task_lock. It seems we should change that as
> > >> well.
> > >
> > > Hmm. What race do you see?
> > > If lock is removed from __get_task_comm() it probably can be removed from
> > > __set_task_comm() as well.
> > > And both are calling strscpy_pad to write and read comm.
> > > So I don't see how it would read past sizeof(comm),
> > > because 'buf' passed into __set_task_comm is NUL-terminated.
> > > So the concurrent read will find it.
> >
> > The read may race with a write that is changing the location
> > of '\0'.  Especially if the new value is shorter than
> > the old value.
>
> so ?
> strscpy_pad in __[gs]et_task_comm will read/write either long
> or byte at a time.
> Assume 64 bit and, say, we had comm where 2nd u64 had NUL.
> Now two cpus are racing. One is writing shorter comm.
> Another is reading.
> The latter can read 1st u64 without NUL and will proceed
> to read 2nd u64. Either it will read the old u64 with NUL in it
> or it will read all zeros in 2nd u64 or some zeros in 2nd u64
> depending on how the compiler generated memset(.., 0, ..)
> as part of strscpy_pad().
> _pad() part is critical here.
> If it was just strscpy() then there would indeed be a chance
> of reading both u64-s and not finding NUL in any of them.
>
> > If you are performing lockless reads and depending upon a '\0'
> > terminator without limiting yourself to the size of the buffer
> > there needs to be a big fat comment as to how in the world
> > you are guaranteed that a '\0' inside the buffer will always
> > be found.
>
> I think Yafang can certainly add such a comment next to
> __[gs]et_task_comm.
>
> I prefer to avoid open coding memcpy + mmemset when strscpy_pad works.

Thanks for your explanation.
I will add a comment for it in the next version.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/6] auditsc: Replace memcpy() with __get_task_comm()
  2024-06-02  2:37 ` [PATCH 3/6] auditsc: " Yafang Shao
@ 2024-06-03 21:03   ` Paul Moore
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Moore @ 2024-06-03 21:03 UTC (permalink / raw)
  To: Yafang Shao
  Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Eric Paris

On Sat, Jun 1, 2024 at 10:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Using __get_task_comm() to read the task comm ensures that the name is
> always NUL-terminated, regardless of the source string. This approach also
> facilitates future extensions to the task comm.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Eric Paris <eparis@redhat.com>
> ---
>  kernel/auditsc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Assuming you've sorted out all the problems identified earlier in the
patchset and __get_task_comm() no longer takes task_lock() this should
be okay from an audit perspective.

Acked-by: Paul Moore <paul@paul-moore.com>

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 6f0d6fb6523f..0459a141dc86 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t)
>         context->target_uid = task_uid(t);
>         context->target_sessionid = audit_get_sessionid(t);
>         security_task_getsecid_obj(t, &context->target_sid);
> -       memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
> +       __get_task_comm(context->target_comm, TASK_COMM_LEN, t);
>  }
>
>  /**
> @@ -2757,7 +2757,7 @@ int audit_signal_info_syscall(struct task_struct *t)
>                 ctx->target_uid = t_uid;
>                 ctx->target_sessionid = audit_get_sessionid(t);
>                 security_task_getsecid_obj(t, &ctx->target_sid);
> -               memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN);
> +               __get_task_comm(ctx->target_comm, TASK_COMM_LEN, t);
>                 return 0;
>         }
>
> @@ -2778,7 +2778,7 @@ int audit_signal_info_syscall(struct task_struct *t)
>         axp->target_uid[axp->pid_count] = t_uid;
>         axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
>         security_task_getsecid_obj(t, &axp->target_sid[axp->pid_count]);
> -       memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN);
> +       __get_task_comm(axp->target_comm[axp->pid_count], TASK_COMM_LEN, t);
>         axp->pid_count++;
>
>         return 0;
> --
> 2.39.1

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-02  2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao
@ 2024-06-03 21:20   ` Steven Rostedt
  2024-06-03 21:42     ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2024-06-03 21:20 UTC (permalink / raw)
  To: Yafang Shao
  Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Sun,  2 Jun 2024 10:37:50 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> Using __get_task_comm() to read the task comm ensures that the name is
> always NUL-terminated, regardless of the source string. This approach also
> facilitates future extensions to the task comm.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
>  include/linux/tracepoint.h     |  4 ++--
>  include/trace/events/block.h   | 10 +++++-----
>  include/trace/events/oom.h     |  2 +-
>  include/trace/events/osnoise.h |  2 +-
>  include/trace/events/sched.h   | 27 ++++++++++++++-------------
>  include/trace/events/signal.h  |  2 +-
>  include/trace/events/task.h    |  4 ++--
>  7 files changed, 26 insertions(+), 25 deletions(-)
> 
[..]

> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 68973f650c26..2a9d7c62c58a 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -9,6 +9,7 @@
>  #include <linux/sched/numa_balancing.h>
>  #include <linux/tracepoint.h>
>  #include <linux/binfmts.h>
> +#include <linux/sched.h>
>  
>  /*
>   * Tracepoint for calling kthread_stop, performed to end a kthread:
> @@ -25,7 +26,7 @@ TRACE_EVENT(sched_kthread_stop,
>  	),
>  
>  	TP_fast_assign(
> -		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
> +		__get_task_comm(__entry->comm, TASK_COMM_LEN, t);
>  		__entry->pid	= t->pid;
>  	),
>  
> @@ -152,7 +153,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
>  	),
>  
>  	TP_fast_assign(
> -		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
> +		__get_task_comm(__entry->comm, TASK_COMM_LEN, p);
>  		__entry->pid		= p->pid;
>  		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
>  		__entry->target_cpu	= task_cpu(p);


> @@ -239,11 +240,11 @@ TRACE_EVENT(sched_switch,
>  	),
>  
>  	TP_fast_assign(
> -		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> +		__get_task_comm(__entry->next_comm, TASK_COMM_LEN, next);
>  		__entry->prev_pid	= prev->pid;
>  		__entry->prev_prio	= prev->prio;
>  		__entry->prev_state	= __trace_sched_switch_state(preempt, prev_state, prev);
> -		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
> +		__get_task_comm(__entry->prev_comm, TASK_COMM_LEN, prev);
>  		__entry->next_pid	= next->pid;
>  		__entry->next_prio	= next->prio;
>  		/* XXX SCHED_DEADLINE */

sched_switch is special so we could probably hold off, but the rest should
be converted to the normal way strings are processed in trace events. That
is, to use __string(), __assign_str() and __get_str() for task->comm. I've
been wanting to do that for a while, but thought that memcpy() was a bit
faster than the need for strlen(). But this now needs to test the length of
comm. This method will also allow comms to be recorded that are larger than
16 bytes (if we extend comm).

TRACE_EVENT(sched_migrate_task,

	TP_PROTO(struct task_struct *p, int dest_cpu),

	TP_ARGS(p, dest_cpu),

	TP_STRUCT__entry(
-		__array(	char,	comm,	TASK_COMM_LEN	)
+		__string(	comm,	strlen(comm)		)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	orig_cpu		)
		__field(	int,	dest_cpu		)
	),

	TP_fast_assign(
-		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
+		__assign_str(comm);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
		__entry->orig_cpu	= task_cpu(p);
		__entry->dest_cpu	= dest_cpu;
	),

	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d",
-		  __entry->comm, __entry->pid, __entry->prio,
+		  __get_str(comm), __entry->pid, __entry->prio,
		  __entry->orig_cpu, __entry->dest_cpu)
);

-- Steve


> @@ -286,7 +287,7 @@ TRACE_EVENT(sched_migrate_task,
>  	),
>  
>  	TP_fast_assign(
> -		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
> +		__get_task_comm(__entry->comm, TASK_COMM_LEN, p);
>  		__entry->pid		= p->pid;
>  		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
>  		__entry->orig_cpu	= task_cpu(p);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-03 21:20   ` Steven Rostedt
@ 2024-06-03 21:42     ` Linus Torvalds
  2024-06-03 22:19       ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2024-06-03 21:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Mon, 3 Jun 2024 at 14:19, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> -               __array(        char,   comm,   TASK_COMM_LEN   )
> +               __string(       comm,   strlen(comm)            )

Is this actually safe is 'comm[]' is being modified at the same time?
The 'strlen()' will not be consistent with the string copy.

Because that is very much the case. It's not a stable source.

For example, strlen() may return 5. But by the time  you then actually
copy the data, the string may have changed, and there would not
necessarily be a NUL character at comm[5] any more. It might be
further in the string, or it might be earlier.

                  Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/6] security: Replace memcpy() with __get_task_comm()
  2024-06-02  2:37 ` [PATCH 4/6] security: " Yafang Shao
@ 2024-06-03 22:06   ` Paul Moore
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Moore @ 2024-06-03 22:06 UTC (permalink / raw)
  To: Yafang Shao
  Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, James Morris,
	Serge E. Hallyn, Stephen Smalley, Ondrej Mosnacek

On Sat, Jun 1, 2024 at 10:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Quoted from Linus [0]:
>
>   selinux never wanted a lock, and never wanted any kind of *consistent*
>   result, it just wanted a *stable* result.
>
> Using __get_task_comm() to read the task comm ensures that the name is
> always NUL-terminated, regardless of the source string. This approach also
> facilitates future extensions to the task comm.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> LINK: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com/ [0]
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
> Cc: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  security/lsm_audit.c         | 4 ++--
>  security/selinux/selinuxfs.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)

Similar to the audit change, as long as you sort out the
__get_task_comm() issues such that it can operate without task_lock()
this should be fine.

Acked-by: Paul Moore <paul@paul-moore.com>

> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> index 849e832719e2..a922e4339dd5 100644
> --- a/security/lsm_audit.c
> +++ b/security/lsm_audit.c
> @@ -207,7 +207,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>         BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2);
>
>         audit_log_format(ab, " pid=%d comm=", task_tgid_nr(current));
> -       audit_log_untrustedstring(ab, memcpy(comm, current->comm, sizeof(comm)));
> +       audit_log_untrustedstring(ab, __get_task_comm(comm, sizeof(comm), current));
>
>         switch (a->type) {
>         case LSM_AUDIT_DATA_NONE:
> @@ -302,7 +302,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>                                 char comm[sizeof(tsk->comm)];
>                                 audit_log_format(ab, " opid=%d ocomm=", pid);
>                                 audit_log_untrustedstring(ab,
> -                                   memcpy(comm, tsk->comm, sizeof(comm)));
> +                                   __get_task_comm(comm, sizeof(comm), tsk));
>                         }
>                 }
>                 break;
> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
> index e172f182b65c..a8a2ec742576 100644
> --- a/security/selinux/selinuxfs.c
> +++ b/security/selinux/selinuxfs.c
> @@ -708,7 +708,7 @@ static ssize_t sel_write_checkreqprot(struct file *file, const char __user *buf,
>         if (new_value) {
>                 char comm[sizeof(current->comm)];
>
> -               memcpy(comm, current->comm, sizeof(comm));
> +               __get_task_comm(comm, sizeof(comm), current);
>                 pr_err("SELinux: %s (%d) set checkreqprot to 1. This is no longer supported.\n",
>                        comm, current->pid);
>         }
> --
> 2.39.1

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-03 21:42     ` Linus Torvalds
@ 2024-06-03 22:19       ` Steven Rostedt
  2024-06-03 22:23         ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2024-06-03 22:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Mon, 3 Jun 2024 14:42:10 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, 3 Jun 2024 at 14:19, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > -               __array(        char,   comm,   TASK_COMM_LEN   )
> > +               __string(       comm,   strlen(comm)            )  
> 
> Is this actually safe is 'comm[]' is being modified at the same time?
> The 'strlen()' will not be consistent with the string copy.

First, I realized that it should actually be:

		__string(	comm,	task->comm	)

But your question is still a valid question, as the internal logic will
call strlen() on the second parameter.

> 
> Because that is very much the case. It's not a stable source.
> 
> For example, strlen() may return 5. But by the time  you then actually
> copy the data, the string may have changed, and there would not
> necessarily be a NUL character at comm[5] any more. It might be
> further in the string, or it might be earlier.

The logic behind __string() and __assign_str() will always add a NUL
character.

__string() is defined as:

  static inline const char *__string_src(const char *str)
  {
       if (!str)
               return EVENT_NULL_STR;
       return str;
  }

  #undef __dynamic_array
  #define __dynamic_array(type, item, len)                              \
        __item_length = (len) * sizeof(type);                           \
        __data_offsets->item = __data_size +                            \
                               offsetof(typeof(*entry), __data);        \
        __data_offsets->item |= __item_length << 16;                    \
        __data_size += __item_length;

  #undef __string
  #define __string(item, src) __dynamic_array(char, item,               \
                    strlen(__string_src(src)) + 1)                      \
        __data_offsets->item##_ptr_ = src;


The above will use the strlen(src) to specify the amount of memory to
allocate on the ring buffer: "strlen(__string_src(src)) + 1)"

This is stored on a special structure for the entry and used in the
__assign_str() (the reason I removed the source to that macro during this
merge window).

  #undef __assign_str
  #define __assign_str(dst)                                             \
        do {                                                            \
                char *__str__ = __get_str(dst);                         \
                int __len__ = __get_dynamic_array_len(dst) - 1;         \
                memcpy(__str__, __data_offsets.dst##_ptr_ ? :           \
                       EVENT_NULL_STR, __len__);                        \
                __str__[__len__] = '\0';                                \
        } while (0)


The source of the string is copied via memcpy() using the length stored
from the __string() macro (minus 1), and then '\0' is added to it to force
the NUL character to be in the memory for the string.

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-03 22:19       ` Steven Rostedt
@ 2024-06-03 22:23         ` Linus Torvalds
  2024-06-03 22:37           ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2024-06-03 22:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Mon, 3 Jun 2024 at 15:18, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The logic behind __string() and __assign_str() will always add a NUL
> character.

Ok. But then you still end up with the issue that now the profiles are
different, and you have a 8-byte pointer to dynamically allocated
memory instead of just the simpler comm[TASK_COMM_LEN].

Is that actually a good idea for tracing?

We're trying to fix the core code to be cleaner for places that may
actually *care* (like 'ps').

Would we really want to touch this part of tracing?

            Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-03 22:23         ` Linus Torvalds
@ 2024-06-03 22:37           ` Steven Rostedt
  2024-06-03 22:38             ` Linus Torvalds
  2024-06-03 22:40             ` Steven Rostedt
  0 siblings, 2 replies; 31+ messages in thread
From: Steven Rostedt @ 2024-06-03 22:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Mon, 3 Jun 2024 15:23:48 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, 3 Jun 2024 at 15:18, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > The logic behind __string() and __assign_str() will always add a NUL
> > character.  
> 
> Ok. But then you still end up with the issue that now the profiles are
> different, and you have a 8-byte pointer to dynamically allocated
> memory instead of just the simpler comm[TASK_COMM_LEN].

It's actually a 4 byte meta data that holds it.

	__data_offsets->item##_ptr_ = src;

The __data_offsets is a local helper structure that holds the information
about where the string data will be in the ring buffer event, while the
event is being recorded. The actual data in the ring buffer is a 4 byte
word, where 2 bytes is for the size of the string and 2 bytes is for the
offset into the event.

If you have a task->comm = "ps", that will take up 12 bytes in the ring buffer.

   field: 2 bytes: for where in the event the "ps" is.
          2 bytes: for the length of ps.

Then after the data, you have 3 or 4 bytes to hold "ps\0". (the data always
ends on a 4 byte alignment).

The amount of data in the ring buffer to hold "ps" just went from 16 bytes
down to 12 bytes, and nothing is truncated if we extend the size of comm.

> 
> Is that actually a good idea for tracing?
> 
> We're trying to fix the core code to be cleaner for places that may
> actually *care* (like 'ps').
> 
> Would we really want to touch this part of tracing?

Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the
events for a while. As I mentioned before, the only reason the memcpy exists
is because it was added before the __string() logic was. Then it became
somewhat of a habit to do that for everything that referenced task->comm. :-/

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-03 22:37           ` Steven Rostedt
@ 2024-06-03 22:38             ` Linus Torvalds
  2024-06-03 22:40             ` Steven Rostedt
  1 sibling, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2024-06-03 22:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Mon, 3 Jun 2024 at 15:36, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> It's actually a 4 byte meta data that holds it.

Ok, much better.

> Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the
> events for a while. As I mentioned before, the only reason the memcpy exists
> is because it was added before the __string() logic was. Then it became
> somewhat of a habit to do that for everything that referenced task->comm. :-/

Ok, as long as it's an actual improvement, then ack.

            Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-03 22:37           ` Steven Rostedt
  2024-06-03 22:38             ` Linus Torvalds
@ 2024-06-03 22:40             ` Steven Rostedt
  2024-06-04  2:35               ` Yafang Shao
  1 sibling, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2024-06-03 22:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Mon, 3 Jun 2024 18:37:42 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the
> events for a while. As I mentioned before, the only reason the memcpy exists
> is because it was added before the __string() logic was. Then it became
> somewhat of a habit to do that for everything that referenced task->comm. :-/

My point is that if we are going to be changing the way we record
task->comm in the events, might as well do the change I've been wanting to
do for years. I'd hold off on the sched_switch event, as that's the one
event (and perhaps sched_waking events) that user space may be have
hardcoded how to read it.

I would be interested in changing it, just to see what breaks, so I would
know where to go fix things. But keep it a separate patch so that it could
be easily reverted.

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm()
  2024-06-03 22:40             ` Steven Rostedt
@ 2024-06-04  2:35               ` Yafang Shao
  0 siblings, 0 replies; 31+ messages in thread
From: Yafang Shao @ 2024-06-04  2:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, linux-mm, linux-fsdevel, linux-trace-kernel,
	audit, linux-security-module, selinux, bpf, Masami Hiramatsu,
	Mathieu Desnoyers

On Tue, Jun 4, 2024 at 6:39 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 3 Jun 2024 18:37:42 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the
> > events for a while. As I mentioned before, the only reason the memcpy exists
> > is because it was added before the __string() logic was. Then it became
> > somewhat of a habit to do that for everything that referenced task->comm. :-/
>
> My point is that if we are going to be changing the way we record
> task->comm in the events, might as well do the change I've been wanting to
> do for years. I'd hold off on the sched_switch event, as that's the one
> event (and perhaps sched_waking events) that user space may be have
> hardcoded how to read it.
>
> I would be interested in changing it, just to see what breaks, so I would
> know where to go fix things. But keep it a separate patch so that it could
> be easily reverted.
>

I will drop the tracing part that includes this patch and the patch #6
 in the next version, and leave it to you.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02  2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao
  2024-06-02  3:51   ` Eric W. Biederman
@ 2024-06-04 13:02   ` Matus Jokay
  2024-06-04 20:01   ` Matus Jokay
  2 siblings, 0 replies; 31+ messages in thread
From: Matus Jokay @ 2024-06-04 13:02 UTC (permalink / raw)
  To: Yafang Shao, torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Eric Biederman, Kees Cook

Hi Yafang,

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c75fd46506df..56a927393a38 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1083,7 +1083,7 @@ struct task_struct {
>  	 *
>  	 * - normally initialized setup_new_exec()
>  	 * - access it with [gs]et_task_comm()
> -	 * - lock it with task_lock()
> +	 * - lock it with task_lock() for writing
since you are updating this comment, you might as well fix other comments concerning ->comm, e.g.

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c75fd46506df..95b382d790d9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1081,9 +1081,9 @@ struct task_struct {
 	/*
 	 * executable name, excluding path.
 	 *
-	 * - normally initialized setup_new_exec()
-	 * - access it with [gs]et_task_comm()
-	 * - lock it with task_lock()
+	 * - normally initialized begin_new_exec()
+	 * - access it with __[gs]et_task_comm()
+	 * - lock it with task_lock() for writing
 	 */
 	char				comm[TASK_COMM_LEN];

-- 
Thanks
mY

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02  2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao
  2024-06-02  3:51   ` Eric W. Biederman
  2024-06-04 13:02   ` Matus Jokay
@ 2024-06-04 20:01   ` Matus Jokay
  2024-06-05  2:48     ` Yafang Shao
  2 siblings, 1 reply; 31+ messages in thread
From: Matus Jokay @ 2024-06-04 20:01 UTC (permalink / raw)
  To: Yafang Shao, torvalds
  Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Eric Biederman, Kees Cook

Sorry guys for the mistake,

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c75fd46506df..56a927393a38 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1083,7 +1083,7 @@ struct task_struct {
>  	 *
>  	 * - normally initialized setup_new_exec()
>  	 * - access it with [gs]et_task_comm()
> -	 * - lock it with task_lock()
> +	 * - lock it with task_lock() for writing
there should be fixed only the comment about ->comm initialization during exec.

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c75fd46506df..48aa5c85ed9e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1081,9 +1081,9 @@ struct task_struct {
 	/*
 	 * executable name, excluding path.
 	 *
-	 * - normally initialized setup_new_exec()
+	 * - normally initialized begin_new_exec()
 	 * - access it with [gs]et_task_comm()
-	 * - lock it with task_lock()
+	 * - lock it with task_lock() for writing
 	 */
 	char				comm[TASK_COMM_LEN];
 
Again, sorry for the noise. It's a very minor fix, but maybe even a small fix to the documentation can help increase the readability of the code.

--
Thanks
mY


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-04 20:01   ` Matus Jokay
@ 2024-06-05  2:48     ` Yafang Shao
  0 siblings, 0 replies; 31+ messages in thread
From: Yafang Shao @ 2024-06-05  2:48 UTC (permalink / raw)
  To: Matus Jokay
  Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit,
	linux-security-module, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Eric Biederman, Kees Cook

On Wed, Jun 5, 2024 at 4:01 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
>
> Sorry guys for the mistake,
>
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index c75fd46506df..56a927393a38 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1083,7 +1083,7 @@ struct task_struct {
> >        *
> >        * - normally initialized setup_new_exec()
> >        * - access it with [gs]et_task_comm()
> > -      * - lock it with task_lock()
> > +      * - lock it with task_lock() for writing
> there should be fixed only the comment about ->comm initialization during exec.
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c75fd46506df..48aa5c85ed9e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1081,9 +1081,9 @@ struct task_struct {
>         /*
>          * executable name, excluding path.
>          *
> -        * - normally initialized setup_new_exec()
> +        * - normally initialized begin_new_exec()
>          * - access it with [gs]et_task_comm()
> -        * - lock it with task_lock()
> +        * - lock it with task_lock() for writing
>          */
>         char                            comm[TASK_COMM_LEN];
>
> Again, sorry for the noise. It's a very minor fix, but maybe even a small fix to the documentation can help increase the readability of the code.
>

Thank you for your improvement. It is very helpful. I will include it
in the next version.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-02 18:23           ` Alexei Starovoitov
  2024-06-03 11:35             ` Yafang Shao
@ 2024-06-10 12:34             ` Eric W. Biederman
  2024-06-10 23:01               ` Alexei Starovoitov
  1 sibling, 1 reply; 31+ messages in thread
From: Eric W. Biederman @ 2024-06-10 12:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel,
	linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> If you are performing lockless reads and depending upon a '\0'
>> terminator without limiting yourself to the size of the buffer
>> there needs to be a big fat comment as to how in the world
>> you are guaranteed that a '\0' inside the buffer will always
>> be found.
>
> I think Yafang can certainly add such a comment next to
> __[gs]et_task_comm.
>
> I prefer to avoid open coding memcpy + mmemset when strscpy_pad works.

Looking through the code in set_task_comm
strscpy_pad only works when both the source and designation are aligned.
Otherwise it performs a byte a time copy, and is most definitely
susceptible to the race I observed.

Further I looked a couple of the uses of set_task_com, in
fs/proc/base.c, kernel/kthread.c, and kernel/sys.c.

Nowhere do I see a guarantee that the source buffer is word aligned
or even something that would reasonably cause a compiler to place the
buffer that is being passed to set_task_comm to be word aligned.

As far as I can tell it is completely up to the compiler if it will
cause strscpy_pad to honor the word at a time guarantee needed
to make strscpy_pad safe for reading the information.

This is not to say we can't make it safe.

The easiest would be to create an aligned temporary buffer in
set_task_comm, and preserve the existing interface.  Alternatively
a type that has the appropriate size and alignment could be used
as input to set_task_comm and it could be caller's responsibility
to use it.

While we can definitely make reading task->comm happen without taking
the lock.  Doing so without updating set_task_comm to provide the
guarantees needed to make it safe, looks like a case of play silly
games, win silly prizes.

Eric

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm()
  2024-06-10 12:34             ` Eric W. Biederman
@ 2024-06-10 23:01               ` Alexei Starovoitov
  0 siblings, 0 replies; 31+ messages in thread
From: Alexei Starovoitov @ 2024-06-10 23:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel,
	linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro,
	Christian Brauner, Jan Kara, Kees Cook

On Mon, Jun 10, 2024 at 5:34 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >> If you are performing lockless reads and depending upon a '\0'
> >> terminator without limiting yourself to the size of the buffer
> >> there needs to be a big fat comment as to how in the world
> >> you are guaranteed that a '\0' inside the buffer will always
> >> be found.
> >
> > I think Yafang can certainly add such a comment next to
> > __[gs]et_task_comm.
> >
> > I prefer to avoid open coding memcpy + mmemset when strscpy_pad works.
>
> Looking through the code in set_task_comm
> strscpy_pad only works when both the source and designation are aligned.
> Otherwise it performs a byte a time copy, and is most definitely
> susceptible to the race I observed.

Byte copy doesn't have an issue either.
Due to padding there is always a zero there.
Worst case in the last byte. So dst buffer will be zero terminated.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-06-10 23:01 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-02  2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao
2024-06-02  2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao
2024-06-02  3:51   ` Eric W. Biederman
2024-06-02  6:56     ` Yafang Shao
2024-06-02 16:35       ` Alexei Starovoitov
2024-06-02 17:52         ` Eric W. Biederman
2024-06-02 18:23           ` Alexei Starovoitov
2024-06-03 11:35             ` Yafang Shao
2024-06-10 12:34             ` Eric W. Biederman
2024-06-10 23:01               ` Alexei Starovoitov
2024-06-02 20:11           ` Linus Torvalds
2024-06-02 17:56       ` Eric W. Biederman
2024-06-04 13:02   ` Matus Jokay
2024-06-04 20:01   ` Matus Jokay
2024-06-05  2:48     ` Yafang Shao
2024-06-02  2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao
2024-06-03 21:20   ` Steven Rostedt
2024-06-03 21:42     ` Linus Torvalds
2024-06-03 22:19       ` Steven Rostedt
2024-06-03 22:23         ` Linus Torvalds
2024-06-03 22:37           ` Steven Rostedt
2024-06-03 22:38             ` Linus Torvalds
2024-06-03 22:40             ` Steven Rostedt
2024-06-04  2:35               ` Yafang Shao
2024-06-02  2:37 ` [PATCH 3/6] auditsc: " Yafang Shao
2024-06-03 21:03   ` Paul Moore
2024-06-02  2:37 ` [PATCH 4/6] security: " Yafang Shao
2024-06-03 22:06   ` Paul Moore
2024-06-02  2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao
2024-06-02 21:01   ` Quentin Monnet
2024-06-02  2:46 ` [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm() Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).