* [PATCH 0/6] kernel: Avoid memcpy of task comm @ 2024-06-02 2:37 Yafang Shao 2024-06-02 2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao ` (5 more replies) 0 siblings, 6 replies; 31+ messages in thread From: Yafang Shao @ 2024-06-02 2:37 UTC (permalink / raw) To: torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Yafang Shao Using memcpy to copy the task comm relies on the length of task comm. Changes in the task comm could result in a destination string that is not NUL-terminated. Therefore, we should explicitly ensure the destination string is always NUL-terminated, regardless of the task comm. This approach will facilitate future extensions to the task comm. As suggested by Linus [0], we can identify all relevant code with the following git grep command: git grep 'memcpy.*->comm\>' Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0] Yafang Shao (6): fs/exec: Drop task_lock() inside __get_task_comm() tracing: Replace memcpy() with __get_task_comm() auditsc: Replace memcpy() with __get_task_comm() security: Replace memcpy() with __get_task_comm() bpftool: Make task comm always be NUL-terminated selftests/bpf: Replace memcpy() with __get_task_comm() fs/exec.c | 7 +++-- include/linux/sched.h | 2 +- include/linux/tracepoint.h | 4 +-- include/trace/events/block.h | 10 +++---- include/trace/events/oom.h | 2 +- include/trace/events/osnoise.h | 2 +- include/trace/events/sched.h | 27 ++++++++++--------- include/trace/events/signal.h | 2 +- include/trace/events/task.h | 4 +-- kernel/auditsc.c | 6 ++--- security/lsm_audit.c | 4 +-- security/selinux/selinuxfs.c | 2 +- tools/bpf/bpftool/pids.c | 2 ++ .../bpf/bpf_testmod/bpf_testmod-events.h | 2 +- 14 files changed, 41 insertions(+), 35 deletions(-) -- 2.39.1 ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao @ 2024-06-02 2:37 ` Yafang Shao 2024-06-02 3:51 ` Eric W. Biederman ` (2 more replies) 2024-06-02 2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao ` (4 subsequent siblings) 5 siblings, 3 replies; 31+ messages in thread From: Yafang Shao @ 2024-06-02 2:37 UTC (permalink / raw) To: torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Yafang Shao, Alexander Viro, Christian Brauner, Jan Kara, Eric Biederman, Kees Cook Quoted from Linus [0]: Since user space can randomly change their names anyway, using locking was always wrong for readers (for writers it probably does make sense to have some lock - although practically speaking nobody cares there either, but at least for a writer some kind of race could have long-term mixed results Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> --- fs/exec.c | 7 +++++-- include/linux/sched.h | 2 +- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index b3c40fbb325f..b43992d35a8a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1227,12 +1227,15 @@ static int unshare_sighand(struct task_struct *me) return 0; } +/* + * User space can randomly change their names anyway, so locking for readers + * doesn't make sense. For writers, locking is probably necessary, as a race + * condition could lead to long-term mixed results. + */ char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk) { - task_lock(tsk); /* Always NUL terminated and zero-padded */ strscpy_pad(buf, tsk->comm, buf_size); - task_unlock(tsk); return buf; } EXPORT_SYMBOL_GPL(__get_task_comm); diff --git a/include/linux/sched.h b/include/linux/sched.h index c75fd46506df..56a927393a38 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1083,7 +1083,7 @@ struct task_struct { * * - normally initialized setup_new_exec() * - access it with [gs]et_task_comm() - * - lock it with task_lock() + * - lock it with task_lock() for writing */ char comm[TASK_COMM_LEN]; -- 2.39.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao @ 2024-06-02 3:51 ` Eric W. Biederman 2024-06-02 6:56 ` Yafang Shao 2024-06-04 13:02 ` Matus Jokay 2024-06-04 20:01 ` Matus Jokay 2 siblings, 1 reply; 31+ messages in thread From: Eric W. Biederman @ 2024-06-02 3:51 UTC (permalink / raw) To: Yafang Shao Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook Yafang Shao <laoar.shao@gmail.com> writes: > Quoted from Linus [0]: > > Since user space can randomly change their names anyway, using locking > was always wrong for readers (for writers it probably does make sense > to have some lock - although practically speaking nobody cares there > either, but at least for a writer some kind of race could have > long-term mixed results Ugh. Ick. This code is buggy. I won't argue that Linus is wrong, about removing the task_lock. Unfortunately strscpy_pad does not work properly with the task_lock removed, and buf_size larger that TASK_COMM_LEN. There is a race that will allow reading past the end of tsk->comm, if we read while tsk->common is being updated. So __get_task_comm needs to look something like: char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk) { size_t len = buf_size; if (len > TASK_COMM_LEN) len = TASK_COMM_LEN; memcpy(buf, tsk->comm, len); buf[len -1] = '\0'; return buf; } What shows up in buf past the '\0' is not guaranteed in the above version but I would be surprised if anyone cares. If people do care the code can do something like: char *last = strchr(buf); memset(last, '\0', buf_size - (last - buf)); To zero everything in the buffer past the first '\0' byte. Eric > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> > Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0] > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: Christian Brauner <brauner@kernel.org> > Cc: Jan Kara <jack@suse.cz> > Cc: Eric Biederman <ebiederm@xmission.com> > Cc: Kees Cook <keescook@chromium.org> > --- > fs/exec.c | 7 +++++-- > include/linux/sched.h | 2 +- > 2 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index b3c40fbb325f..b43992d35a8a 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1227,12 +1227,15 @@ static int unshare_sighand(struct task_struct *me) > return 0; > } > > +/* > + * User space can randomly change their names anyway, so locking for readers > + * doesn't make sense. For writers, locking is probably necessary, as a race > + * condition could lead to long-term mixed results. > + */ > char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk) > { > - task_lock(tsk); > /* Always NUL terminated and zero-padded */ > strscpy_pad(buf, tsk->comm, buf_size); > - task_unlock(tsk); > return buf; > } > EXPORT_SYMBOL_GPL(__get_task_comm); > diff --git a/include/linux/sched.h b/include/linux/sched.h > index c75fd46506df..56a927393a38 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1083,7 +1083,7 @@ struct task_struct { > * > * - normally initialized setup_new_exec() > * - access it with [gs]et_task_comm() > - * - lock it with task_lock() > + * - lock it with task_lock() for writing > */ > char comm[TASK_COMM_LEN]; ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 3:51 ` Eric W. Biederman @ 2024-06-02 6:56 ` Yafang Shao 2024-06-02 16:35 ` Alexei Starovoitov 2024-06-02 17:56 ` Eric W. Biederman 0 siblings, 2 replies; 31+ messages in thread From: Yafang Shao @ 2024-06-02 6:56 UTC (permalink / raw) To: Eric W. Biederman Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Yafang Shao <laoar.shao@gmail.com> writes: > > > Quoted from Linus [0]: > > > > Since user space can randomly change their names anyway, using locking > > was always wrong for readers (for writers it probably does make sense > > to have some lock - although practically speaking nobody cares there > > either, but at least for a writer some kind of race could have > > long-term mixed results > > Ugh. > Ick. > > This code is buggy. > > I won't argue that Linus is wrong, about removing the > task_lock. > > Unfortunately strscpy_pad does not work properly with the > task_lock removed, and buf_size larger that TASK_COMM_LEN. > There is a race that will allow reading past the end > of tsk->comm, if we read while tsk->common is being > updated. It appears so. Thanks for pointing it out. Additionally, other code, such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad() directly without the task_lock. It seems we should change that as well. > > So __get_task_comm needs to look something like: > > char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk) > { > size_t len = buf_size; > if (len > TASK_COMM_LEN) > len = TASK_COMM_LEN; > memcpy(buf, tsk->comm, len); > buf[len -1] = '\0'; > return buf; > } Thanks for your suggestion. > > What shows up in buf past the '\0' is not guaranteed in the above > version but I would be surprised if anyone cares. I believe we pad it to prevent the leakage of kernel data. In this case, since no kernel data will be leaked, the following change may be unnecessary. > > If people do care the code can do something like: > char *last = strchr(buf); > memset(last, '\0', buf_size - (last - buf)); > > To zero everything in the buffer past the first '\0' byte. > -- Regards Yafang ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 6:56 ` Yafang Shao @ 2024-06-02 16:35 ` Alexei Starovoitov 2024-06-02 17:52 ` Eric W. Biederman 2024-06-02 17:56 ` Eric W. Biederman 1 sibling, 1 reply; 31+ messages in thread From: Alexei Starovoitov @ 2024-06-02 16:35 UTC (permalink / raw) To: Yafang Shao Cc: Eric W. Biederman, Linus Torvalds, linux-mm, Linux-Fsdevel, linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > Yafang Shao <laoar.shao@gmail.com> writes: > > > > > Quoted from Linus [0]: > > > > > > Since user space can randomly change their names anyway, using locking > > > was always wrong for readers (for writers it probably does make sense > > > to have some lock - although practically speaking nobody cares there > > > either, but at least for a writer some kind of race could have > > > long-term mixed results > > > > Ugh. > > Ick. > > > > This code is buggy. > > > > I won't argue that Linus is wrong, about removing the > > task_lock. > > > > Unfortunately strscpy_pad does not work properly with the > > task_lock removed, and buf_size larger that TASK_COMM_LEN. > > There is a race that will allow reading past the end > > of tsk->comm, if we read while tsk->common is being > > updated. > > It appears so. Thanks for pointing it out. Additionally, other code, > such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad() > directly without the task_lock. It seems we should change that as > well. Hmm. What race do you see? If lock is removed from __get_task_comm() it probably can be removed from __set_task_comm() as well. And both are calling strscpy_pad to write and read comm. So I don't see how it would read past sizeof(comm), because 'buf' passed into __set_task_comm is NUL-terminated. So the concurrent read will find it. > > > > So __get_task_comm needs to look something like: > > > > char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk) > > { > > size_t len = buf_size; > > if (len > TASK_COMM_LEN) > > len = TASK_COMM_LEN; > > memcpy(buf, tsk->comm, len); > > buf[len -1] = '\0'; > > return buf; > > } > > Thanks for your suggestion. > > > > > What shows up in buf past the '\0' is not guaranteed in the above > > version but I would be surprised if anyone cares. > > I believe we pad it to prevent the leakage of kernel data. In this > case, since no kernel data will be leaked, the following change may be > unnecessary. It's not about leaking of kernel data, but more about not writing garbage past NUL. Because comm[] is a part of some record that is used as a key in a hash map. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 16:35 ` Alexei Starovoitov @ 2024-06-02 17:52 ` Eric W. Biederman 2024-06-02 18:23 ` Alexei Starovoitov 2024-06-02 20:11 ` Linus Torvalds 0 siblings, 2 replies; 31+ messages in thread From: Eric W. Biederman @ 2024-06-02 17:52 UTC (permalink / raw) To: Alexei Starovoitov Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel, linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote: >> >> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote: >> > >> > Yafang Shao <laoar.shao@gmail.com> writes: >> > >> > > Quoted from Linus [0]: >> > > >> > > Since user space can randomly change their names anyway, using locking >> > > was always wrong for readers (for writers it probably does make sense >> > > to have some lock - although practically speaking nobody cares there >> > > either, but at least for a writer some kind of race could have >> > > long-term mixed results >> > >> > Ugh. >> > Ick. >> > >> > This code is buggy. >> > >> > I won't argue that Linus is wrong, about removing the >> > task_lock. >> > >> > Unfortunately strscpy_pad does not work properly with the >> > task_lock removed, and buf_size larger that TASK_COMM_LEN. >> > There is a race that will allow reading past the end >> > of tsk->comm, if we read while tsk->common is being >> > updated. >> >> It appears so. Thanks for pointing it out. Additionally, other code, >> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad() >> directly without the task_lock. It seems we should change that as >> well. > > Hmm. What race do you see? > If lock is removed from __get_task_comm() it probably can be removed from > __set_task_comm() as well. > And both are calling strscpy_pad to write and read comm. > So I don't see how it would read past sizeof(comm), > because 'buf' passed into __set_task_comm is NUL-terminated. > So the concurrent read will find it. The read may race with a write that is changing the location of '\0'. Especially if the new value is shorter than the old value. If you are performing lockless reads and depending upon a '\0' terminator without limiting yourself to the size of the buffer there needs to be a big fat comment as to how in the world you are guaranteed that a '\0' inside the buffer will always be found. Eric ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 17:52 ` Eric W. Biederman @ 2024-06-02 18:23 ` Alexei Starovoitov 2024-06-03 11:35 ` Yafang Shao 2024-06-10 12:34 ` Eric W. Biederman 2024-06-02 20:11 ` Linus Torvalds 1 sibling, 2 replies; 31+ messages in thread From: Alexei Starovoitov @ 2024-06-02 18:23 UTC (permalink / raw) To: Eric W. Biederman Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel, linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote: > >> > >> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > >> > > >> > Yafang Shao <laoar.shao@gmail.com> writes: > >> > > >> > > Quoted from Linus [0]: > >> > > > >> > > Since user space can randomly change their names anyway, using locking > >> > > was always wrong for readers (for writers it probably does make sense > >> > > to have some lock - although practically speaking nobody cares there > >> > > either, but at least for a writer some kind of race could have > >> > > long-term mixed results > >> > > >> > Ugh. > >> > Ick. > >> > > >> > This code is buggy. > >> > > >> > I won't argue that Linus is wrong, about removing the > >> > task_lock. > >> > > >> > Unfortunately strscpy_pad does not work properly with the > >> > task_lock removed, and buf_size larger that TASK_COMM_LEN. > >> > There is a race that will allow reading past the end > >> > of tsk->comm, if we read while tsk->common is being > >> > updated. > >> > >> It appears so. Thanks for pointing it out. Additionally, other code, > >> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad() > >> directly without the task_lock. It seems we should change that as > >> well. > > > > Hmm. What race do you see? > > If lock is removed from __get_task_comm() it probably can be removed from > > __set_task_comm() as well. > > And both are calling strscpy_pad to write and read comm. > > So I don't see how it would read past sizeof(comm), > > because 'buf' passed into __set_task_comm is NUL-terminated. > > So the concurrent read will find it. > > The read may race with a write that is changing the location > of '\0'. Especially if the new value is shorter than > the old value. so ? strscpy_pad in __[gs]et_task_comm will read/write either long or byte at a time. Assume 64 bit and, say, we had comm where 2nd u64 had NUL. Now two cpus are racing. One is writing shorter comm. Another is reading. The latter can read 1st u64 without NUL and will proceed to read 2nd u64. Either it will read the old u64 with NUL in it or it will read all zeros in 2nd u64 or some zeros in 2nd u64 depending on how the compiler generated memset(.., 0, ..) as part of strscpy_pad(). _pad() part is critical here. If it was just strscpy() then there would indeed be a chance of reading both u64-s and not finding NUL in any of them. > If you are performing lockless reads and depending upon a '\0' > terminator without limiting yourself to the size of the buffer > there needs to be a big fat comment as to how in the world > you are guaranteed that a '\0' inside the buffer will always > be found. I think Yafang can certainly add such a comment next to __[gs]et_task_comm. I prefer to avoid open coding memcpy + mmemset when strscpy_pad works. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 18:23 ` Alexei Starovoitov @ 2024-06-03 11:35 ` Yafang Shao 2024-06-10 12:34 ` Eric W. Biederman 1 sibling, 0 replies; 31+ messages in thread From: Yafang Shao @ 2024-06-03 11:35 UTC (permalink / raw) To: Alexei Starovoitov Cc: Eric W. Biederman, Linus Torvalds, linux-mm, Linux-Fsdevel, linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook On Mon, Jun 3, 2024 at 2:23 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > > > On Sat, Jun 1, 2024 at 11:57 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > >> > > >> On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > >> > > > >> > Yafang Shao <laoar.shao@gmail.com> writes: > > >> > > > >> > > Quoted from Linus [0]: > > >> > > > > >> > > Since user space can randomly change their names anyway, using locking > > >> > > was always wrong for readers (for writers it probably does make sense > > >> > > to have some lock - although practically speaking nobody cares there > > >> > > either, but at least for a writer some kind of race could have > > >> > > long-term mixed results > > >> > > > >> > Ugh. > > >> > Ick. > > >> > > > >> > This code is buggy. > > >> > > > >> > I won't argue that Linus is wrong, about removing the > > >> > task_lock. > > >> > > > >> > Unfortunately strscpy_pad does not work properly with the > > >> > task_lock removed, and buf_size larger that TASK_COMM_LEN. > > >> > There is a race that will allow reading past the end > > >> > of tsk->comm, if we read while tsk->common is being > > >> > updated. > > >> > > >> It appears so. Thanks for pointing it out. Additionally, other code, > > >> such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad() > > >> directly without the task_lock. It seems we should change that as > > >> well. > > > > > > Hmm. What race do you see? > > > If lock is removed from __get_task_comm() it probably can be removed from > > > __set_task_comm() as well. > > > And both are calling strscpy_pad to write and read comm. > > > So I don't see how it would read past sizeof(comm), > > > because 'buf' passed into __set_task_comm is NUL-terminated. > > > So the concurrent read will find it. > > > > The read may race with a write that is changing the location > > of '\0'. Especially if the new value is shorter than > > the old value. > > so ? > strscpy_pad in __[gs]et_task_comm will read/write either long > or byte at a time. > Assume 64 bit and, say, we had comm where 2nd u64 had NUL. > Now two cpus are racing. One is writing shorter comm. > Another is reading. > The latter can read 1st u64 without NUL and will proceed > to read 2nd u64. Either it will read the old u64 with NUL in it > or it will read all zeros in 2nd u64 or some zeros in 2nd u64 > depending on how the compiler generated memset(.., 0, ..) > as part of strscpy_pad(). > _pad() part is critical here. > If it was just strscpy() then there would indeed be a chance > of reading both u64-s and not finding NUL in any of them. > > > If you are performing lockless reads and depending upon a '\0' > > terminator without limiting yourself to the size of the buffer > > there needs to be a big fat comment as to how in the world > > you are guaranteed that a '\0' inside the buffer will always > > be found. > > I think Yafang can certainly add such a comment next to > __[gs]et_task_comm. > > I prefer to avoid open coding memcpy + mmemset when strscpy_pad works. Thanks for your explanation. I will add a comment for it in the next version. -- Regards Yafang ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 18:23 ` Alexei Starovoitov 2024-06-03 11:35 ` Yafang Shao @ 2024-06-10 12:34 ` Eric W. Biederman 2024-06-10 23:01 ` Alexei Starovoitov 1 sibling, 1 reply; 31+ messages in thread From: Eric W. Biederman @ 2024-06-10 12:34 UTC (permalink / raw) To: Alexei Starovoitov Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel, linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> If you are performing lockless reads and depending upon a '\0' >> terminator without limiting yourself to the size of the buffer >> there needs to be a big fat comment as to how in the world >> you are guaranteed that a '\0' inside the buffer will always >> be found. > > I think Yafang can certainly add such a comment next to > __[gs]et_task_comm. > > I prefer to avoid open coding memcpy + mmemset when strscpy_pad works. Looking through the code in set_task_comm strscpy_pad only works when both the source and designation are aligned. Otherwise it performs a byte a time copy, and is most definitely susceptible to the race I observed. Further I looked a couple of the uses of set_task_com, in fs/proc/base.c, kernel/kthread.c, and kernel/sys.c. Nowhere do I see a guarantee that the source buffer is word aligned or even something that would reasonably cause a compiler to place the buffer that is being passed to set_task_comm to be word aligned. As far as I can tell it is completely up to the compiler if it will cause strscpy_pad to honor the word at a time guarantee needed to make strscpy_pad safe for reading the information. This is not to say we can't make it safe. The easiest would be to create an aligned temporary buffer in set_task_comm, and preserve the existing interface. Alternatively a type that has the appropriate size and alignment could be used as input to set_task_comm and it could be caller's responsibility to use it. While we can definitely make reading task->comm happen without taking the lock. Doing so without updating set_task_comm to provide the guarantees needed to make it safe, looks like a case of play silly games, win silly prizes. Eric ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-10 12:34 ` Eric W. Biederman @ 2024-06-10 23:01 ` Alexei Starovoitov 0 siblings, 0 replies; 31+ messages in thread From: Alexei Starovoitov @ 2024-06-10 23:01 UTC (permalink / raw) To: Eric W. Biederman Cc: Yafang Shao, Linus Torvalds, linux-mm, Linux-Fsdevel, linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook On Mon, Jun 10, 2024 at 5:34 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Sun, Jun 2, 2024 at 10:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > >> > >> If you are performing lockless reads and depending upon a '\0' > >> terminator without limiting yourself to the size of the buffer > >> there needs to be a big fat comment as to how in the world > >> you are guaranteed that a '\0' inside the buffer will always > >> be found. > > > > I think Yafang can certainly add such a comment next to > > __[gs]et_task_comm. > > > > I prefer to avoid open coding memcpy + mmemset when strscpy_pad works. > > Looking through the code in set_task_comm > strscpy_pad only works when both the source and designation are aligned. > Otherwise it performs a byte a time copy, and is most definitely > susceptible to the race I observed. Byte copy doesn't have an issue either. Due to padding there is always a zero there. Worst case in the last byte. So dst buffer will be zero terminated. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 17:52 ` Eric W. Biederman 2024-06-02 18:23 ` Alexei Starovoitov @ 2024-06-02 20:11 ` Linus Torvalds 1 sibling, 0 replies; 31+ messages in thread From: Linus Torvalds @ 2024-06-02 20:11 UTC (permalink / raw) To: Eric W. Biederman Cc: Alexei Starovoitov, Yafang Shao, linux-mm, Linux-Fsdevel, linux-trace-kernel, audit, LSM List, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook On Sun, 2 Jun 2024 at 10:53, Eric W. Biederman <ebiederm@xmission.com> wrote: > > The read may race with a write that is changing the location > of '\0'. Especially if the new value is shorter than > the old value. It *shouldn't* happen. So 'strscpy()' itself is written to be NUL-safe, in that if it ever copies a NUL character, it will stop. Admittedly the byte loop at the end might technically need a READ_ONCE() for that to eb strictly true in theory, but in practice it already is. And even if the new string is shorter, the comm[] array will always have a NUL terminator _somewhere_, in how the last byte is never non-NUL. Now, the only real issue is if something writes *to* the comm[] array without following the rules properly - like writing a non-NULL character to the end of the array before then filling it in with NUL again. But that would be a bug on the comm[] writing side, I feel, not a bug on the reader side. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 6:56 ` Yafang Shao 2024-06-02 16:35 ` Alexei Starovoitov @ 2024-06-02 17:56 ` Eric W. Biederman 1 sibling, 0 replies; 31+ messages in thread From: Eric W. Biederman @ 2024-06-02 17:56 UTC (permalink / raw) To: Yafang Shao Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Kees Cook Yafang Shao <laoar.shao@gmail.com> writes: > On Sun, Jun 2, 2024 at 11:52 AM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> Yafang Shao <laoar.shao@gmail.com> writes: >> >> > Quoted from Linus [0]: >> > >> > Since user space can randomly change their names anyway, using locking >> > was always wrong for readers (for writers it probably does make sense >> > to have some lock - although practically speaking nobody cares there >> > either, but at least for a writer some kind of race could have >> > long-term mixed results >> >> Ugh. >> Ick. >> >> This code is buggy. >> >> I won't argue that Linus is wrong, about removing the >> task_lock. >> >> Unfortunately strscpy_pad does not work properly with the >> task_lock removed, and buf_size larger that TASK_COMM_LEN. >> There is a race that will allow reading past the end >> of tsk->comm, if we read while tsk->common is being >> updated. > > It appears so. Thanks for pointing it out. Additionally, other code, > such as the BPF helper bpf_get_current_comm(), also uses strscpy_pad() > directly without the task_lock. It seems we should change that as > well. Which suggests that we could really use a helper that handles all of the tricky business of reading the tsk->comm lock-free. Eric ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao 2024-06-02 3:51 ` Eric W. Biederman @ 2024-06-04 13:02 ` Matus Jokay 2024-06-04 20:01 ` Matus Jokay 2 siblings, 0 replies; 31+ messages in thread From: Matus Jokay @ 2024-06-04 13:02 UTC (permalink / raw) To: Yafang Shao, torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Eric Biederman, Kees Cook Hi Yafang, > diff --git a/include/linux/sched.h b/include/linux/sched.h > index c75fd46506df..56a927393a38 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1083,7 +1083,7 @@ struct task_struct { > * > * - normally initialized setup_new_exec() > * - access it with [gs]et_task_comm() > - * - lock it with task_lock() > + * - lock it with task_lock() for writing since you are updating this comment, you might as well fix other comments concerning ->comm, e.g. diff --git a/include/linux/sched.h b/include/linux/sched.h index c75fd46506df..95b382d790d9 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1081,9 +1081,9 @@ struct task_struct { /* * executable name, excluding path. * - * - normally initialized setup_new_exec() - * - access it with [gs]et_task_comm() - * - lock it with task_lock() + * - normally initialized begin_new_exec() + * - access it with __[gs]et_task_comm() + * - lock it with task_lock() for writing */ char comm[TASK_COMM_LEN]; -- Thanks mY ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-02 2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao 2024-06-02 3:51 ` Eric W. Biederman 2024-06-04 13:02 ` Matus Jokay @ 2024-06-04 20:01 ` Matus Jokay 2024-06-05 2:48 ` Yafang Shao 2 siblings, 1 reply; 31+ messages in thread From: Matus Jokay @ 2024-06-04 20:01 UTC (permalink / raw) To: Yafang Shao, torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Eric Biederman, Kees Cook Sorry guys for the mistake, > diff --git a/include/linux/sched.h b/include/linux/sched.h > index c75fd46506df..56a927393a38 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1083,7 +1083,7 @@ struct task_struct { > * > * - normally initialized setup_new_exec() > * - access it with [gs]et_task_comm() > - * - lock it with task_lock() > + * - lock it with task_lock() for writing there should be fixed only the comment about ->comm initialization during exec. diff --git a/include/linux/sched.h b/include/linux/sched.h index c75fd46506df..48aa5c85ed9e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1081,9 +1081,9 @@ struct task_struct { /* * executable name, excluding path. * - * - normally initialized setup_new_exec() + * - normally initialized begin_new_exec() * - access it with [gs]et_task_comm() - * - lock it with task_lock() + * - lock it with task_lock() for writing */ char comm[TASK_COMM_LEN]; Again, sorry for the noise. It's a very minor fix, but maybe even a small fix to the documentation can help increase the readability of the code. -- Thanks mY ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() 2024-06-04 20:01 ` Matus Jokay @ 2024-06-05 2:48 ` Yafang Shao 0 siblings, 0 replies; 31+ messages in thread From: Yafang Shao @ 2024-06-05 2:48 UTC (permalink / raw) To: Matus Jokay Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Alexander Viro, Christian Brauner, Jan Kara, Eric Biederman, Kees Cook On Wed, Jun 5, 2024 at 4:01 AM Matus Jokay <matus.jokay@stuba.sk> wrote: > > Sorry guys for the mistake, > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index c75fd46506df..56a927393a38 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -1083,7 +1083,7 @@ struct task_struct { > > * > > * - normally initialized setup_new_exec() > > * - access it with [gs]et_task_comm() > > - * - lock it with task_lock() > > + * - lock it with task_lock() for writing > there should be fixed only the comment about ->comm initialization during exec. > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index c75fd46506df..48aa5c85ed9e 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1081,9 +1081,9 @@ struct task_struct { > /* > * executable name, excluding path. > * > - * - normally initialized setup_new_exec() > + * - normally initialized begin_new_exec() > * - access it with [gs]et_task_comm() > - * - lock it with task_lock() > + * - lock it with task_lock() for writing > */ > char comm[TASK_COMM_LEN]; > > Again, sorry for the noise. It's a very minor fix, but maybe even a small fix to the documentation can help increase the readability of the code. > Thank you for your improvement. It is very helpful. I will include it in the next version. -- Regards Yafang ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-02 2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao 2024-06-02 2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao @ 2024-06-02 2:37 ` Yafang Shao 2024-06-03 21:20 ` Steven Rostedt 2024-06-02 2:37 ` [PATCH 3/6] auditsc: " Yafang Shao ` (3 subsequent siblings) 5 siblings, 1 reply; 31+ messages in thread From: Yafang Shao @ 2024-06-02 2:37 UTC (permalink / raw) To: torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Yafang Shao, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers Using __get_task_comm() to read the task comm ensures that the name is always NUL-terminated, regardless of the source string. This approach also facilitates future extensions to the task comm. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- include/linux/tracepoint.h | 4 ++-- include/trace/events/block.h | 10 +++++----- include/trace/events/oom.h | 2 +- include/trace/events/osnoise.h | 2 +- include/trace/events/sched.h | 27 ++++++++++++++------------- include/trace/events/signal.h | 2 +- include/trace/events/task.h | 4 ++-- 7 files changed, 26 insertions(+), 25 deletions(-) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 689b6d71590e..6381824d8107 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -519,10 +519,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) * * * * TP_fast_assign( - * memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); + * __get_task_comm(__entry->next_comm, TASK_COMM_LEN, next); * __entry->prev_pid = prev->pid; * __entry->prev_prio = prev->prio; - * memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); + * __get_task_comm(__entry->prev_comm, TASK_COMM_LEN, prev); * __entry->next_pid = next->pid; * __entry->next_prio = next->prio; * ), diff --git a/include/trace/events/block.h b/include/trace/events/block.h index 0e128ad51460..6f8c5d0014e6 100644 --- a/include/trace/events/block.h +++ b/include/trace/events/block.h @@ -193,7 +193,7 @@ DECLARE_EVENT_CLASS(block_rq, blk_fill_rwbs(__entry->rwbs, rq->cmd_flags); __get_str(cmd)[0] = '\0'; - memcpy(__entry->comm, current->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, current); ), TP_printk("%d,%d %s %u (%s) %llu + %u [%s]", @@ -328,7 +328,7 @@ DECLARE_EVENT_CLASS(block_bio, __entry->sector = bio->bi_iter.bi_sector; __entry->nr_sector = bio_sectors(bio); blk_fill_rwbs(__entry->rwbs, bio->bi_opf); - memcpy(__entry->comm, current->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, current); ), TP_printk("%d,%d %s %llu + %u [%s]", @@ -415,7 +415,7 @@ TRACE_EVENT(block_plug, ), TP_fast_assign( - memcpy(__entry->comm, current->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, current); ), TP_printk("[%s]", __entry->comm) @@ -434,7 +434,7 @@ DECLARE_EVENT_CLASS(block_unplug, TP_fast_assign( __entry->nr_rq = depth; - memcpy(__entry->comm, current->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, current); ), TP_printk("[%s] %d", __entry->comm, __entry->nr_rq) @@ -485,7 +485,7 @@ TRACE_EVENT(block_split, __entry->sector = bio->bi_iter.bi_sector; __entry->new_sector = new_sector; blk_fill_rwbs(__entry->rwbs, bio->bi_opf); - memcpy(__entry->comm, current->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, current); ), TP_printk("%d,%d %s %llu / %llu [%s]", diff --git a/include/trace/events/oom.h b/include/trace/events/oom.h index b799f3bcba82..f29be9ebcd4d 100644 --- a/include/trace/events/oom.h +++ b/include/trace/events/oom.h @@ -23,7 +23,7 @@ TRACE_EVENT(oom_score_adj_update, TP_fast_assign( __entry->pid = task->pid; - memcpy(__entry->comm, task->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, task); __entry->oom_score_adj = task->signal->oom_score_adj; ), diff --git a/include/trace/events/osnoise.h b/include/trace/events/osnoise.h index 82f741ec0f57..50f480655722 100644 --- a/include/trace/events/osnoise.h +++ b/include/trace/events/osnoise.h @@ -20,7 +20,7 @@ TRACE_EVENT(thread_noise, ), TP_fast_assign( - memcpy(__entry->comm, t->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, t); __entry->pid = t->pid; __entry->start = start; __entry->duration = duration; diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 68973f650c26..2a9d7c62c58a 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -9,6 +9,7 @@ #include <linux/sched/numa_balancing.h> #include <linux/tracepoint.h> #include <linux/binfmts.h> +#include <linux/sched.h> /* * Tracepoint for calling kthread_stop, performed to end a kthread: @@ -25,7 +26,7 @@ TRACE_EVENT(sched_kthread_stop, ), TP_fast_assign( - memcpy(__entry->comm, t->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, t); __entry->pid = t->pid; ), @@ -152,7 +153,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template, ), TP_fast_assign( - memcpy(__entry->comm, p->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, p); __entry->pid = p->pid; __entry->prio = p->prio; /* XXX SCHED_DEADLINE */ __entry->target_cpu = task_cpu(p); @@ -239,11 +240,11 @@ TRACE_EVENT(sched_switch, ), TP_fast_assign( - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); + __get_task_comm(__entry->next_comm, TASK_COMM_LEN, next); __entry->prev_pid = prev->pid; __entry->prev_prio = prev->prio; __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev); - memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); + __get_task_comm(__entry->prev_comm, TASK_COMM_LEN, prev); __entry->next_pid = next->pid; __entry->next_prio = next->prio; /* XXX SCHED_DEADLINE */ @@ -286,7 +287,7 @@ TRACE_EVENT(sched_migrate_task, ), TP_fast_assign( - memcpy(__entry->comm, p->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, p); __entry->pid = p->pid; __entry->prio = p->prio; /* XXX SCHED_DEADLINE */ __entry->orig_cpu = task_cpu(p); @@ -311,7 +312,7 @@ DECLARE_EVENT_CLASS(sched_process_template, ), TP_fast_assign( - memcpy(__entry->comm, p->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, p); __entry->pid = p->pid; __entry->prio = p->prio; /* XXX SCHED_DEADLINE */ ), @@ -357,7 +358,7 @@ TRACE_EVENT(sched_process_wait, ), TP_fast_assign( - memcpy(__entry->comm, current->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, current); __entry->pid = pid_nr(pid); __entry->prio = current->prio; /* XXX SCHED_DEADLINE */ ), @@ -383,9 +384,9 @@ TRACE_EVENT(sched_process_fork, ), TP_fast_assign( - memcpy(__entry->parent_comm, parent->comm, TASK_COMM_LEN); + __get_task_comm(__entry->parent_comm, TASK_COMM_LEN, parent); __entry->parent_pid = parent->pid; - memcpy(__entry->child_comm, child->comm, TASK_COMM_LEN); + __get_task_comm(__entry->child_comm, TASK_COMM_LEN, child); __entry->child_pid = child->pid; ), @@ -481,7 +482,7 @@ DECLARE_EVENT_CLASS_SCHEDSTAT(sched_stat_template, ), TP_fast_assign( - memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, tsk); __entry->pid = tsk->pid; __entry->delay = delay; ), @@ -539,7 +540,7 @@ DECLARE_EVENT_CLASS(sched_stat_runtime, ), TP_fast_assign( - memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, tsk); __entry->pid = tsk->pid; __entry->runtime = runtime; ), @@ -571,7 +572,7 @@ TRACE_EVENT(sched_pi_setprio, ), TP_fast_assign( - memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, tsk); __entry->pid = tsk->pid; __entry->oldprio = tsk->prio; __entry->newprio = pi_task ? @@ -596,7 +597,7 @@ TRACE_EVENT(sched_process_hang, ), TP_fast_assign( - memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, tsk); __entry->pid = tsk->pid; ), diff --git a/include/trace/events/signal.h b/include/trace/events/signal.h index 1db7e4b07c01..8f317a265392 100644 --- a/include/trace/events/signal.h +++ b/include/trace/events/signal.h @@ -67,7 +67,7 @@ TRACE_EVENT(signal_generate, TP_fast_assign( __entry->sig = sig; TP_STORE_SIGINFO(__entry, info); - memcpy(__entry->comm, task->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, task); __entry->pid = task->pid; __entry->group = group; __entry->result = result; diff --git a/include/trace/events/task.h b/include/trace/events/task.h index 47b527464d1a..77c14707460e 100644 --- a/include/trace/events/task.h +++ b/include/trace/events/task.h @@ -21,7 +21,7 @@ TRACE_EVENT(task_newtask, TP_fast_assign( __entry->pid = task->pid; - memcpy(__entry->comm, task->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, task); __entry->clone_flags = clone_flags; __entry->oom_score_adj = task->signal->oom_score_adj; ), @@ -46,7 +46,7 @@ TRACE_EVENT(task_rename, TP_fast_assign( __entry->pid = task->pid; - memcpy(entry->oldcomm, task->comm, TASK_COMM_LEN); + __get_task_comm(entry->oldcomm, TASK_COMM_LEN, task); strscpy(entry->newcomm, comm, TASK_COMM_LEN); __entry->oom_score_adj = task->signal->oom_score_adj; ), -- 2.39.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-02 2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao @ 2024-06-03 21:20 ` Steven Rostedt 2024-06-03 21:42 ` Linus Torvalds 0 siblings, 1 reply; 31+ messages in thread From: Steven Rostedt @ 2024-06-03 21:20 UTC (permalink / raw) To: Yafang Shao Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Sun, 2 Jun 2024 10:37:50 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > Using __get_task_comm() to read the task comm ensures that the name is > always NUL-terminated, regardless of the source string. This approach also > facilitates future extensions to the task comm. > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Masami Hiramatsu <mhiramat@kernel.org> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > --- > include/linux/tracepoint.h | 4 ++-- > include/trace/events/block.h | 10 +++++----- > include/trace/events/oom.h | 2 +- > include/trace/events/osnoise.h | 2 +- > include/trace/events/sched.h | 27 ++++++++++++++------------- > include/trace/events/signal.h | 2 +- > include/trace/events/task.h | 4 ++-- > 7 files changed, 26 insertions(+), 25 deletions(-) > [..] > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h > index 68973f650c26..2a9d7c62c58a 100644 > --- a/include/trace/events/sched.h > +++ b/include/trace/events/sched.h > @@ -9,6 +9,7 @@ > #include <linux/sched/numa_balancing.h> > #include <linux/tracepoint.h> > #include <linux/binfmts.h> > +#include <linux/sched.h> > > /* > * Tracepoint for calling kthread_stop, performed to end a kthread: > @@ -25,7 +26,7 @@ TRACE_EVENT(sched_kthread_stop, > ), > > TP_fast_assign( > - memcpy(__entry->comm, t->comm, TASK_COMM_LEN); > + __get_task_comm(__entry->comm, TASK_COMM_LEN, t); > __entry->pid = t->pid; > ), > > @@ -152,7 +153,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template, > ), > > TP_fast_assign( > - memcpy(__entry->comm, p->comm, TASK_COMM_LEN); > + __get_task_comm(__entry->comm, TASK_COMM_LEN, p); > __entry->pid = p->pid; > __entry->prio = p->prio; /* XXX SCHED_DEADLINE */ > __entry->target_cpu = task_cpu(p); > @@ -239,11 +240,11 @@ TRACE_EVENT(sched_switch, > ), > > TP_fast_assign( > - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); > + __get_task_comm(__entry->next_comm, TASK_COMM_LEN, next); > __entry->prev_pid = prev->pid; > __entry->prev_prio = prev->prio; > __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev); > - memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); > + __get_task_comm(__entry->prev_comm, TASK_COMM_LEN, prev); > __entry->next_pid = next->pid; > __entry->next_prio = next->prio; > /* XXX SCHED_DEADLINE */ sched_switch is special so we could probably hold off, but the rest should be converted to the normal way strings are processed in trace events. That is, to use __string(), __assign_str() and __get_str() for task->comm. I've been wanting to do that for a while, but thought that memcpy() was a bit faster than the need for strlen(). But this now needs to test the length of comm. This method will also allow comms to be recorded that are larger than 16 bytes (if we extend comm). TRACE_EVENT(sched_migrate_task, TP_PROTO(struct task_struct *p, int dest_cpu), TP_ARGS(p, dest_cpu), TP_STRUCT__entry( - __array( char, comm, TASK_COMM_LEN ) + __string( comm, strlen(comm) ) __field( pid_t, pid ) __field( int, prio ) __field( int, orig_cpu ) __field( int, dest_cpu ) ), TP_fast_assign( - memcpy(__entry->comm, p->comm, TASK_COMM_LEN); + __assign_str(comm); __entry->pid = p->pid; __entry->prio = p->prio; /* XXX SCHED_DEADLINE */ __entry->orig_cpu = task_cpu(p); __entry->dest_cpu = dest_cpu; ), TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d", - __entry->comm, __entry->pid, __entry->prio, + __get_str(comm), __entry->pid, __entry->prio, __entry->orig_cpu, __entry->dest_cpu) ); -- Steve > @@ -286,7 +287,7 @@ TRACE_EVENT(sched_migrate_task, > ), > > TP_fast_assign( > - memcpy(__entry->comm, p->comm, TASK_COMM_LEN); > + __get_task_comm(__entry->comm, TASK_COMM_LEN, p); > __entry->pid = p->pid; > __entry->prio = p->prio; /* XXX SCHED_DEADLINE */ > __entry->orig_cpu = task_cpu(p); ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-03 21:20 ` Steven Rostedt @ 2024-06-03 21:42 ` Linus Torvalds 2024-06-03 22:19 ` Steven Rostedt 0 siblings, 1 reply; 31+ messages in thread From: Linus Torvalds @ 2024-06-03 21:42 UTC (permalink / raw) To: Steven Rostedt Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Mon, 3 Jun 2024 at 14:19, Steven Rostedt <rostedt@goodmis.org> wrote: > > - __array( char, comm, TASK_COMM_LEN ) > + __string( comm, strlen(comm) ) Is this actually safe is 'comm[]' is being modified at the same time? The 'strlen()' will not be consistent with the string copy. Because that is very much the case. It's not a stable source. For example, strlen() may return 5. But by the time you then actually copy the data, the string may have changed, and there would not necessarily be a NUL character at comm[5] any more. It might be further in the string, or it might be earlier. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-03 21:42 ` Linus Torvalds @ 2024-06-03 22:19 ` Steven Rostedt 2024-06-03 22:23 ` Linus Torvalds 0 siblings, 1 reply; 31+ messages in thread From: Steven Rostedt @ 2024-06-03 22:19 UTC (permalink / raw) To: Linus Torvalds Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Mon, 3 Jun 2024 14:42:10 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, 3 Jun 2024 at 14:19, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > - __array( char, comm, TASK_COMM_LEN ) > > + __string( comm, strlen(comm) ) > > Is this actually safe is 'comm[]' is being modified at the same time? > The 'strlen()' will not be consistent with the string copy. First, I realized that it should actually be: __string( comm, task->comm ) But your question is still a valid question, as the internal logic will call strlen() on the second parameter. > > Because that is very much the case. It's not a stable source. > > For example, strlen() may return 5. But by the time you then actually > copy the data, the string may have changed, and there would not > necessarily be a NUL character at comm[5] any more. It might be > further in the string, or it might be earlier. The logic behind __string() and __assign_str() will always add a NUL character. __string() is defined as: static inline const char *__string_src(const char *str) { if (!str) return EVENT_NULL_STR; return str; } #undef __dynamic_array #define __dynamic_array(type, item, len) \ __item_length = (len) * sizeof(type); \ __data_offsets->item = __data_size + \ offsetof(typeof(*entry), __data); \ __data_offsets->item |= __item_length << 16; \ __data_size += __item_length; #undef __string #define __string(item, src) __dynamic_array(char, item, \ strlen(__string_src(src)) + 1) \ __data_offsets->item##_ptr_ = src; The above will use the strlen(src) to specify the amount of memory to allocate on the ring buffer: "strlen(__string_src(src)) + 1)" This is stored on a special structure for the entry and used in the __assign_str() (the reason I removed the source to that macro during this merge window). #undef __assign_str #define __assign_str(dst) \ do { \ char *__str__ = __get_str(dst); \ int __len__ = __get_dynamic_array_len(dst) - 1; \ memcpy(__str__, __data_offsets.dst##_ptr_ ? : \ EVENT_NULL_STR, __len__); \ __str__[__len__] = '\0'; \ } while (0) The source of the string is copied via memcpy() using the length stored from the __string() macro (minus 1), and then '\0' is added to it to force the NUL character to be in the memory for the string. -- Steve ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-03 22:19 ` Steven Rostedt @ 2024-06-03 22:23 ` Linus Torvalds 2024-06-03 22:37 ` Steven Rostedt 0 siblings, 1 reply; 31+ messages in thread From: Linus Torvalds @ 2024-06-03 22:23 UTC (permalink / raw) To: Steven Rostedt Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Mon, 3 Jun 2024 at 15:18, Steven Rostedt <rostedt@goodmis.org> wrote: > > The logic behind __string() and __assign_str() will always add a NUL > character. Ok. But then you still end up with the issue that now the profiles are different, and you have a 8-byte pointer to dynamically allocated memory instead of just the simpler comm[TASK_COMM_LEN]. Is that actually a good idea for tracing? We're trying to fix the core code to be cleaner for places that may actually *care* (like 'ps'). Would we really want to touch this part of tracing? Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-03 22:23 ` Linus Torvalds @ 2024-06-03 22:37 ` Steven Rostedt 2024-06-03 22:38 ` Linus Torvalds 2024-06-03 22:40 ` Steven Rostedt 0 siblings, 2 replies; 31+ messages in thread From: Steven Rostedt @ 2024-06-03 22:37 UTC (permalink / raw) To: Linus Torvalds Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Mon, 3 Jun 2024 15:23:48 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, 3 Jun 2024 at 15:18, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > The logic behind __string() and __assign_str() will always add a NUL > > character. > > Ok. But then you still end up with the issue that now the profiles are > different, and you have a 8-byte pointer to dynamically allocated > memory instead of just the simpler comm[TASK_COMM_LEN]. It's actually a 4 byte meta data that holds it. __data_offsets->item##_ptr_ = src; The __data_offsets is a local helper structure that holds the information about where the string data will be in the ring buffer event, while the event is being recorded. The actual data in the ring buffer is a 4 byte word, where 2 bytes is for the size of the string and 2 bytes is for the offset into the event. If you have a task->comm = "ps", that will take up 12 bytes in the ring buffer. field: 2 bytes: for where in the event the "ps" is. 2 bytes: for the length of ps. Then after the data, you have 3 or 4 bytes to hold "ps\0". (the data always ends on a 4 byte alignment). The amount of data in the ring buffer to hold "ps" just went from 16 bytes down to 12 bytes, and nothing is truncated if we extend the size of comm. > > Is that actually a good idea for tracing? > > We're trying to fix the core code to be cleaner for places that may > actually *care* (like 'ps'). > > Would we really want to touch this part of tracing? Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the events for a while. As I mentioned before, the only reason the memcpy exists is because it was added before the __string() logic was. Then it became somewhat of a habit to do that for everything that referenced task->comm. :-/ -- Steve ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-03 22:37 ` Steven Rostedt @ 2024-06-03 22:38 ` Linus Torvalds 2024-06-03 22:40 ` Steven Rostedt 1 sibling, 0 replies; 31+ messages in thread From: Linus Torvalds @ 2024-06-03 22:38 UTC (permalink / raw) To: Steven Rostedt Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Mon, 3 Jun 2024 at 15:36, Steven Rostedt <rostedt@goodmis.org> wrote: > > It's actually a 4 byte meta data that holds it. Ok, much better. > Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the > events for a while. As I mentioned before, the only reason the memcpy exists > is because it was added before the __string() logic was. Then it became > somewhat of a habit to do that for everything that referenced task->comm. :-/ Ok, as long as it's an actual improvement, then ack. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-03 22:37 ` Steven Rostedt 2024-06-03 22:38 ` Linus Torvalds @ 2024-06-03 22:40 ` Steven Rostedt 2024-06-04 2:35 ` Yafang Shao 1 sibling, 1 reply; 31+ messages in thread From: Steven Rostedt @ 2024-06-03 22:40 UTC (permalink / raw) To: Linus Torvalds Cc: Yafang Shao, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Mon, 3 Jun 2024 18:37:42 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the > events for a while. As I mentioned before, the only reason the memcpy exists > is because it was added before the __string() logic was. Then it became > somewhat of a habit to do that for everything that referenced task->comm. :-/ My point is that if we are going to be changing the way we record task->comm in the events, might as well do the change I've been wanting to do for years. I'd hold off on the sched_switch event, as that's the one event (and perhaps sched_waking events) that user space may be have hardcoded how to read it. I would be interested in changing it, just to see what breaks, so I would know where to go fix things. But keep it a separate patch so that it could be easily reverted. -- Steve ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() 2024-06-03 22:40 ` Steven Rostedt @ 2024-06-04 2:35 ` Yafang Shao 0 siblings, 0 replies; 31+ messages in thread From: Yafang Shao @ 2024-06-04 2:35 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Masami Hiramatsu, Mathieu Desnoyers On Tue, Jun 4, 2024 at 6:39 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Mon, 3 Jun 2024 18:37:42 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > Note, I've been wanting to get rid of the hard coded TASK_COMM_LEN from the > > events for a while. As I mentioned before, the only reason the memcpy exists > > is because it was added before the __string() logic was. Then it became > > somewhat of a habit to do that for everything that referenced task->comm. :-/ > > My point is that if we are going to be changing the way we record > task->comm in the events, might as well do the change I've been wanting to > do for years. I'd hold off on the sched_switch event, as that's the one > event (and perhaps sched_waking events) that user space may be have > hardcoded how to read it. > > I would be interested in changing it, just to see what breaks, so I would > know where to go fix things. But keep it a separate patch so that it could > be easily reverted. > I will drop the tracing part that includes this patch and the patch #6 in the next version, and leave it to you. -- Regards Yafang ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 3/6] auditsc: Replace memcpy() with __get_task_comm() 2024-06-02 2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao 2024-06-02 2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao 2024-06-02 2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao @ 2024-06-02 2:37 ` Yafang Shao 2024-06-03 21:03 ` Paul Moore 2024-06-02 2:37 ` [PATCH 4/6] security: " Yafang Shao ` (2 subsequent siblings) 5 siblings, 1 reply; 31+ messages in thread From: Yafang Shao @ 2024-06-02 2:37 UTC (permalink / raw) To: torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Yafang Shao, Paul Moore, Eric Paris Using __get_task_comm() to read the task comm ensures that the name is always NUL-terminated, regardless of the source string. This approach also facilitates future extensions to the task comm. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@redhat.com> --- kernel/auditsc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 6f0d6fb6523f..0459a141dc86 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t) context->target_uid = task_uid(t); context->target_sessionid = audit_get_sessionid(t); security_task_getsecid_obj(t, &context->target_sid); - memcpy(context->target_comm, t->comm, TASK_COMM_LEN); + __get_task_comm(context->target_comm, TASK_COMM_LEN, t); } /** @@ -2757,7 +2757,7 @@ int audit_signal_info_syscall(struct task_struct *t) ctx->target_uid = t_uid; ctx->target_sessionid = audit_get_sessionid(t); security_task_getsecid_obj(t, &ctx->target_sid); - memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN); + __get_task_comm(ctx->target_comm, TASK_COMM_LEN, t); return 0; } @@ -2778,7 +2778,7 @@ int audit_signal_info_syscall(struct task_struct *t) axp->target_uid[axp->pid_count] = t_uid; axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t); security_task_getsecid_obj(t, &axp->target_sid[axp->pid_count]); - memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN); + __get_task_comm(axp->target_comm[axp->pid_count], TASK_COMM_LEN, t); axp->pid_count++; return 0; -- 2.39.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 3/6] auditsc: Replace memcpy() with __get_task_comm() 2024-06-02 2:37 ` [PATCH 3/6] auditsc: " Yafang Shao @ 2024-06-03 21:03 ` Paul Moore 0 siblings, 0 replies; 31+ messages in thread From: Paul Moore @ 2024-06-03 21:03 UTC (permalink / raw) To: Yafang Shao Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Eric Paris On Sat, Jun 1, 2024 at 10:38 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > Using __get_task_comm() to read the task comm ensures that the name is > always NUL-terminated, regardless of the source string. This approach also > facilitates future extensions to the task comm. > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Paul Moore <paul@paul-moore.com> > Cc: Eric Paris <eparis@redhat.com> > --- > kernel/auditsc.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Assuming you've sorted out all the problems identified earlier in the patchset and __get_task_comm() no longer takes task_lock() this should be okay from an audit perspective. Acked-by: Paul Moore <paul@paul-moore.com> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index 6f0d6fb6523f..0459a141dc86 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t) > context->target_uid = task_uid(t); > context->target_sessionid = audit_get_sessionid(t); > security_task_getsecid_obj(t, &context->target_sid); > - memcpy(context->target_comm, t->comm, TASK_COMM_LEN); > + __get_task_comm(context->target_comm, TASK_COMM_LEN, t); > } > > /** > @@ -2757,7 +2757,7 @@ int audit_signal_info_syscall(struct task_struct *t) > ctx->target_uid = t_uid; > ctx->target_sessionid = audit_get_sessionid(t); > security_task_getsecid_obj(t, &ctx->target_sid); > - memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN); > + __get_task_comm(ctx->target_comm, TASK_COMM_LEN, t); > return 0; > } > > @@ -2778,7 +2778,7 @@ int audit_signal_info_syscall(struct task_struct *t) > axp->target_uid[axp->pid_count] = t_uid; > axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t); > security_task_getsecid_obj(t, &axp->target_sid[axp->pid_count]); > - memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN); > + __get_task_comm(axp->target_comm[axp->pid_count], TASK_COMM_LEN, t); > axp->pid_count++; > > return 0; > -- > 2.39.1 -- paul-moore.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 4/6] security: Replace memcpy() with __get_task_comm() 2024-06-02 2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao ` (2 preceding siblings ...) 2024-06-02 2:37 ` [PATCH 3/6] auditsc: " Yafang Shao @ 2024-06-02 2:37 ` Yafang Shao 2024-06-03 22:06 ` Paul Moore 2024-06-02 2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao 2024-06-02 2:46 ` [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm() Yafang Shao 5 siblings, 1 reply; 31+ messages in thread From: Yafang Shao @ 2024-06-02 2:37 UTC (permalink / raw) To: torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Yafang Shao, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Ondrej Mosnacek Quoted from Linus [0]: selinux never wanted a lock, and never wanted any kind of *consistent* result, it just wanted a *stable* result. Using __get_task_comm() to read the task comm ensures that the name is always NUL-terminated, regardless of the source string. This approach also facilitates future extensions to the task comm. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> LINK: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com/ [0] Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> --- security/lsm_audit.c | 4 ++-- security/selinux/selinuxfs.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/security/lsm_audit.c b/security/lsm_audit.c index 849e832719e2..a922e4339dd5 100644 --- a/security/lsm_audit.c +++ b/security/lsm_audit.c @@ -207,7 +207,7 @@ static void dump_common_audit_data(struct audit_buffer *ab, BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2); audit_log_format(ab, " pid=%d comm=", task_tgid_nr(current)); - audit_log_untrustedstring(ab, memcpy(comm, current->comm, sizeof(comm))); + audit_log_untrustedstring(ab, __get_task_comm(comm, sizeof(comm), current)); switch (a->type) { case LSM_AUDIT_DATA_NONE: @@ -302,7 +302,7 @@ static void dump_common_audit_data(struct audit_buffer *ab, char comm[sizeof(tsk->comm)]; audit_log_format(ab, " opid=%d ocomm=", pid); audit_log_untrustedstring(ab, - memcpy(comm, tsk->comm, sizeof(comm))); + __get_task_comm(comm, sizeof(comm), tsk)); } } break; diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c index e172f182b65c..a8a2ec742576 100644 --- a/security/selinux/selinuxfs.c +++ b/security/selinux/selinuxfs.c @@ -708,7 +708,7 @@ static ssize_t sel_write_checkreqprot(struct file *file, const char __user *buf, if (new_value) { char comm[sizeof(current->comm)]; - memcpy(comm, current->comm, sizeof(comm)); + __get_task_comm(comm, sizeof(comm), current); pr_err("SELinux: %s (%d) set checkreqprot to 1. This is no longer supported.\n", comm, current->pid); } -- 2.39.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 4/6] security: Replace memcpy() with __get_task_comm() 2024-06-02 2:37 ` [PATCH 4/6] security: " Yafang Shao @ 2024-06-03 22:06 ` Paul Moore 0 siblings, 0 replies; 31+ messages in thread From: Paul Moore @ 2024-06-03 22:06 UTC (permalink / raw) To: Yafang Shao Cc: torvalds, linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, James Morris, Serge E. Hallyn, Stephen Smalley, Ondrej Mosnacek On Sat, Jun 1, 2024 at 10:38 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > Quoted from Linus [0]: > > selinux never wanted a lock, and never wanted any kind of *consistent* > result, it just wanted a *stable* result. > > Using __get_task_comm() to read the task comm ensures that the name is > always NUL-terminated, regardless of the source string. This approach also > facilitates future extensions to the task comm. > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > LINK: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com/ [0] > Cc: Paul Moore <paul@paul-moore.com> > Cc: James Morris <jmorris@namei.org> > Cc: "Serge E. Hallyn" <serge@hallyn.com> > Cc: Stephen Smalley <stephen.smalley.work@gmail.com> > Cc: Ondrej Mosnacek <omosnace@redhat.com> > --- > security/lsm_audit.c | 4 ++-- > security/selinux/selinuxfs.c | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) Similar to the audit change, as long as you sort out the __get_task_comm() issues such that it can operate without task_lock() this should be fine. Acked-by: Paul Moore <paul@paul-moore.com> > diff --git a/security/lsm_audit.c b/security/lsm_audit.c > index 849e832719e2..a922e4339dd5 100644 > --- a/security/lsm_audit.c > +++ b/security/lsm_audit.c > @@ -207,7 +207,7 @@ static void dump_common_audit_data(struct audit_buffer *ab, > BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2); > > audit_log_format(ab, " pid=%d comm=", task_tgid_nr(current)); > - audit_log_untrustedstring(ab, memcpy(comm, current->comm, sizeof(comm))); > + audit_log_untrustedstring(ab, __get_task_comm(comm, sizeof(comm), current)); > > switch (a->type) { > case LSM_AUDIT_DATA_NONE: > @@ -302,7 +302,7 @@ static void dump_common_audit_data(struct audit_buffer *ab, > char comm[sizeof(tsk->comm)]; > audit_log_format(ab, " opid=%d ocomm=", pid); > audit_log_untrustedstring(ab, > - memcpy(comm, tsk->comm, sizeof(comm))); > + __get_task_comm(comm, sizeof(comm), tsk)); > } > } > break; > diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c > index e172f182b65c..a8a2ec742576 100644 > --- a/security/selinux/selinuxfs.c > +++ b/security/selinux/selinuxfs.c > @@ -708,7 +708,7 @@ static ssize_t sel_write_checkreqprot(struct file *file, const char __user *buf, > if (new_value) { > char comm[sizeof(current->comm)]; > > - memcpy(comm, current->comm, sizeof(comm)); > + __get_task_comm(comm, sizeof(comm), current); > pr_err("SELinux: %s (%d) set checkreqprot to 1. This is no longer supported.\n", > comm, current->pid); > } > -- > 2.39.1 -- paul-moore.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 5/6] bpftool: Make task comm always be NUL-terminated 2024-06-02 2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao ` (3 preceding siblings ...) 2024-06-02 2:37 ` [PATCH 4/6] security: " Yafang Shao @ 2024-06-02 2:37 ` Yafang Shao 2024-06-02 21:01 ` Quentin Monnet 2024-06-02 2:46 ` [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm() Yafang Shao 5 siblings, 1 reply; 31+ messages in thread From: Yafang Shao @ 2024-06-02 2:37 UTC (permalink / raw) To: torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Yafang Shao, Quentin Monnet Let's explicitly ensure the destination string is NUL-terminated. This way, it won't be affected by changes to the source string. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Quentin Monnet <qmo@kernel.org> --- tools/bpf/bpftool/pids.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c index 9b898571b49e..23f488cf1740 100644 --- a/tools/bpf/bpftool/pids.c +++ b/tools/bpf/bpftool/pids.c @@ -54,6 +54,7 @@ static void add_ref(struct hashmap *map, struct pid_iter_entry *e) ref = &refs->refs[refs->ref_cnt]; ref->pid = e->pid; memcpy(ref->comm, e->comm, sizeof(ref->comm)); + ref->comm[sizeof(ref->comm) - 1] = '\0'; refs->ref_cnt++; return; @@ -77,6 +78,7 @@ static void add_ref(struct hashmap *map, struct pid_iter_entry *e) ref = &refs->refs[0]; ref->pid = e->pid; memcpy(ref->comm, e->comm, sizeof(ref->comm)); + ref->comm[sizeof(ref->comm) - 1] = '\0'; refs->ref_cnt = 1; refs->has_bpf_cookie = e->has_bpf_cookie; refs->bpf_cookie = e->bpf_cookie; -- 2.39.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 5/6] bpftool: Make task comm always be NUL-terminated 2024-06-02 2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao @ 2024-06-02 21:01 ` Quentin Monnet 0 siblings, 0 replies; 31+ messages in thread From: Quentin Monnet @ 2024-06-02 21:01 UTC (permalink / raw) To: Yafang Shao, torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf On 02/06/2024 03:37, Yafang Shao wrote: > Let's explicitly ensure the destination string is NUL-terminated. This way, > it won't be affected by changes to the source string. > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Quentin Monnet <qmo@kernel.org> The change looks good to me, thank you. Reviewed-by: Quentin Monnet <qmo@kernel.org> ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm() 2024-06-02 2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao ` (4 preceding siblings ...) 2024-06-02 2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao @ 2024-06-02 2:46 ` Yafang Shao 5 siblings, 0 replies; 31+ messages in thread From: Yafang Shao @ 2024-06-02 2:46 UTC (permalink / raw) To: laoar.shao, torvalds Cc: linux-mm, linux-fsdevel, linux-trace-kernel, audit, linux-security-module, selinux, bpf, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko Using __get_task_comm() to read the task comm ensures that the name is always NUL-terminated, regardless of the source string. This approach also facilitates future extensions to the task comm. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Mykola Lysenko <mykolal@fb.com> --- tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h index 11ee801e75e7..e5df95b56c53 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h @@ -20,7 +20,7 @@ TRACE_EVENT(bpf_testmod_test_read, ), TP_fast_assign( __entry->pid = task->pid; - memcpy(__entry->comm, task->comm, TASK_COMM_LEN); + __get_task_comm(__entry->comm, TASK_COMM_LEN, task); __entry->off = ctx->off; __entry->len = ctx->len; ), -- 2.39.1 ^ permalink raw reply related [flat|nested] 31+ messages in thread
end of thread, other threads:[~2024-06-10 23:01 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-06-02 2:37 [PATCH 0/6] kernel: Avoid memcpy of task comm Yafang Shao 2024-06-02 2:37 ` [PATCH 1/6] fs/exec: Drop task_lock() inside __get_task_comm() Yafang Shao 2024-06-02 3:51 ` Eric W. Biederman 2024-06-02 6:56 ` Yafang Shao 2024-06-02 16:35 ` Alexei Starovoitov 2024-06-02 17:52 ` Eric W. Biederman 2024-06-02 18:23 ` Alexei Starovoitov 2024-06-03 11:35 ` Yafang Shao 2024-06-10 12:34 ` Eric W. Biederman 2024-06-10 23:01 ` Alexei Starovoitov 2024-06-02 20:11 ` Linus Torvalds 2024-06-02 17:56 ` Eric W. Biederman 2024-06-04 13:02 ` Matus Jokay 2024-06-04 20:01 ` Matus Jokay 2024-06-05 2:48 ` Yafang Shao 2024-06-02 2:37 ` [PATCH 2/6] tracing: Replace memcpy() with __get_task_comm() Yafang Shao 2024-06-03 21:20 ` Steven Rostedt 2024-06-03 21:42 ` Linus Torvalds 2024-06-03 22:19 ` Steven Rostedt 2024-06-03 22:23 ` Linus Torvalds 2024-06-03 22:37 ` Steven Rostedt 2024-06-03 22:38 ` Linus Torvalds 2024-06-03 22:40 ` Steven Rostedt 2024-06-04 2:35 ` Yafang Shao 2024-06-02 2:37 ` [PATCH 3/6] auditsc: " Yafang Shao 2024-06-03 21:03 ` Paul Moore 2024-06-02 2:37 ` [PATCH 4/6] security: " Yafang Shao 2024-06-03 22:06 ` Paul Moore 2024-06-02 2:37 ` [PATCH 5/6] bpftool: Make task comm always be NUL-terminated Yafang Shao 2024-06-02 21:01 ` Quentin Monnet 2024-06-02 2:46 ` [PATCH 6/6] selftests/bpf: Replace memcpy() with __get_task_comm() Yafang Shao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).