* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-03-31 12:18 ` [PATCH v2 1/3] exec: " Bhupesh
@ 2025-04-01 2:07 ` Yafang Shao
2025-04-04 6:35 ` Bhupesh Sharma
2025-04-01 4:02 ` Harry Yoo
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Yafang Shao @ 2025-04-01 2:07 UTC (permalink / raw)
To: Bhupesh, Linus Torvalds
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, pmladek, rostedt,
mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, keescook,
ebiederm, brauner, jack, mingo, juri.lelli, bsegall, mgorman,
vschneid
On Mon, Mar 31, 2025 at 8:18 PM Bhupesh <bhupesh@igalia.com> wrote:
>
> Provide a parallel implementation for get_task_comm() called
> get_task_full_name() which allows the dynamically allocated
> and filled-in task's full name to be passed to interested
> users such as 'gdb'.
>
> Currently while running 'gdb', the 'task->comm' value of a long
> task name is truncated due to the limitation of TASK_COMM_LEN.
>
> For example using gdb to debug a simple app currently which generate
> threads with long task names:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLo
>
> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
> that all the legacy / existing ABI, continue to work as before using
> '/proc/$pid/task/$tid/comm'.
>
> This patch only adds a parallel, dynamically-allocated
> 'task->full_name' which can be used by interested users
> via '/proc/$pid/task/$tid/full_name'.
>
> After this change, gdb is able to show full name of the task:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLongForComm[4662]
>
> Signed-off-by: Bhupesh <bhupesh@igalia.com>
> ---
> fs/exec.c | 21 ++++++++++++++++++---
> include/linux/sched.h | 9 +++++++++
> 2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index f45859ad13ac..4219d77a519c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
> {
> struct task_struct *me = current;
> int retval;
> + va_list args;
> + char *name;
> + const char *fmt;
>
> /* Once we are committed compute the creds */
> retval = bprm_creds_from_file(bprm);
> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
> * detecting a concurrent rename and just want a terminated name.
> */
> rcu_read_lock();
> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> - true);
> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
> + if (!name)
> + return -ENOMEM;
> +
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
> rcu_read_unlock();
> } else {
> - __set_task_comm(me, kbasename(bprm->filename), true);
> + fmt = kbasename(bprm->filename);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
> + if (!name)
> + return -ENOMEM;
> +
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
> }
>
> /* An exec changes our domain. We are no longer part of the thread
> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
> return 0;
>
> out_unlock:
> + kfree(me->full_name);
> up_write(&me->signal->exec_update_lock);
> if (!bprm->cred)
> mutex_unlock(&me->signal->cred_guard_mutex);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 56ddeb37b5cd..053b52606652 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1166,6 +1166,9 @@ struct task_struct {
> */
> char comm[TASK_COMM_LEN];
>
> + /* To store the full name if task comm is truncated. */
> + char *full_name;
> +
Adding another field to store the task name isn’t ideal. What about
combining them into a single field, as Linus suggested [0]?
[0]. https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/
--
Regards
Yafang
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-01 2:07 ` Yafang Shao
@ 2025-04-04 6:35 ` Bhupesh Sharma
2025-04-06 2:28 ` Yafang Shao
0 siblings, 1 reply; 18+ messages in thread
From: Bhupesh Sharma @ 2025-04-04 6:35 UTC (permalink / raw)
To: Yafang Shao, Bhupesh, Linus Torvalds
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, pmladek, rostedt,
mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, keescook,
ebiederm, brauner, jack, mingo, juri.lelli, bsegall, mgorman,
vschneid
On 4/1/25 7:37 AM, Yafang Shao wrote:
> On Mon, Mar 31, 2025 at 8:18 PM Bhupesh <bhupesh@igalia.com> wrote:
>> Provide a parallel implementation for get_task_comm() called
>> get_task_full_name() which allows the dynamically allocated
>> and filled-in task's full name to be passed to interested
>> users such as 'gdb'.
>>
>> Currently while running 'gdb', the 'task->comm' value of a long
>> task name is truncated due to the limitation of TASK_COMM_LEN.
>>
>> For example using gdb to debug a simple app currently which generate
>> threads with long task names:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLo
>>
>> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
>> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
>> that all the legacy / existing ABI, continue to work as before using
>> '/proc/$pid/task/$tid/comm'.
>>
>> This patch only adds a parallel, dynamically-allocated
>> 'task->full_name' which can be used by interested users
>> via '/proc/$pid/task/$tid/full_name'.
>>
>> After this change, gdb is able to show full name of the task:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLongForComm[4662]
>>
>> Signed-off-by: Bhupesh <bhupesh@igalia.com>
>> ---
>> fs/exec.c | 21 ++++++++++++++++++---
>> include/linux/sched.h | 9 +++++++++
>> 2 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index f45859ad13ac..4219d77a519c 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
>> {
>> struct task_struct *me = current;
>> int retval;
>> + va_list args;
>> + char *name;
>> + const char *fmt;
>>
>> /* Once we are committed compute the creds */
>> retval = bprm_creds_from_file(bprm);
>> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
>> * detecting a concurrent rename and just want a terminated name.
>> */
>> rcu_read_lock();
>> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
>> - true);
>> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
>> rcu_read_unlock();
>> } else {
>> - __set_task_comm(me, kbasename(bprm->filename), true);
>> + fmt = kbasename(bprm->filename);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
>> }
>>
>> /* An exec changes our domain. We are no longer part of the thread
>> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>> return 0;
>>
>> out_unlock:
>> + kfree(me->full_name);
>> up_write(&me->signal->exec_update_lock);
>> if (!bprm->cred)
>> mutex_unlock(&me->signal->cred_guard_mutex);
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 56ddeb37b5cd..053b52606652 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1166,6 +1166,9 @@ struct task_struct {
>> */
>> char comm[TASK_COMM_LEN];
>>
>> + /* To store the full name if task comm is truncated. */
>> + char *full_name;
>> +
> Adding another field to store the task name isn’t ideal. What about
> combining them into a single field, as Linus suggested [0]?
>
> [0]. https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/
>
Thanks for sharing Linus's suggestion. I went through the suggested
changes in the related threads and came up with the following set of points:
1. struct task_struct would contain both 'comm' and 'full_name',
2. Remove the task_lock() inside __get_task_comm(),
3. Users of task->comm will be affected in the following ways:
(a). Printing with '%s' and tsk->comm would just continue to
work,but will get a longer max string.
(b). For users of memcpy.*->comm\>', we should change 'memcpy()' to
'copy_comm()' which would look like:
memcpy(dst, src, TASK_COMM_LEN);
dst[TASK_COMM_LEN-1] = 0;
(c). Users which use "sizeof(->comm)" will continue to get the old value because of the hacky union.
Am I missing something here. Please let me know your views.
Thanks,
Bhupesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-04 6:35 ` Bhupesh Sharma
@ 2025-04-06 2:28 ` Yafang Shao
2025-04-09 11:13 ` Bhupesh Sharma
0 siblings, 1 reply; 18+ messages in thread
From: Yafang Shao @ 2025-04-06 2:28 UTC (permalink / raw)
To: Bhupesh Sharma
Cc: Bhupesh, Linus Torvalds, akpm, kernel-dev, linux-kernel, bpf,
linux-perf-users, linux-fsdevel, linux-mm, oliver.sang, lkp,
pmladek, rostedt, mathieu.desnoyers, arnaldo.melo,
alexei.starovoitov, andrii.nakryiko, mirq-linux, peterz, willy,
david, viro, keescook, ebiederm, brauner, jack, mingo, juri.lelli,
bsegall, mgorman, vschneid
On Fri, Apr 4, 2025 at 2:35 PM Bhupesh Sharma <bhsharma@igalia.com> wrote:
>
>
> On 4/1/25 7:37 AM, Yafang Shao wrote:
> > On Mon, Mar 31, 2025 at 8:18 PM Bhupesh <bhupesh@igalia.com> wrote:
> >> Provide a parallel implementation for get_task_comm() called
> >> get_task_full_name() which allows the dynamically allocated
> >> and filled-in task's full name to be passed to interested
> >> users such as 'gdb'.
> >>
> >> Currently while running 'gdb', the 'task->comm' value of a long
> >> task name is truncated due to the limitation of TASK_COMM_LEN.
> >>
> >> For example using gdb to debug a simple app currently which generate
> >> threads with long task names:
> >> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> >> # cat log
> >>
> >> NameThatIsTooLo
> >>
> >> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
> >> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
> >> that all the legacy / existing ABI, continue to work as before using
> >> '/proc/$pid/task/$tid/comm'.
> >>
> >> This patch only adds a parallel, dynamically-allocated
> >> 'task->full_name' which can be used by interested users
> >> via '/proc/$pid/task/$tid/full_name'.
> >>
> >> After this change, gdb is able to show full name of the task:
> >> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> >> # cat log
> >>
> >> NameThatIsTooLongForComm[4662]
> >>
> >> Signed-off-by: Bhupesh <bhupesh@igalia.com>
> >> ---
> >> fs/exec.c | 21 ++++++++++++++++++---
> >> include/linux/sched.h | 9 +++++++++
> >> 2 files changed, 27 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/exec.c b/fs/exec.c
> >> index f45859ad13ac..4219d77a519c 100644
> >> --- a/fs/exec.c
> >> +++ b/fs/exec.c
> >> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
> >> {
> >> struct task_struct *me = current;
> >> int retval;
> >> + va_list args;
> >> + char *name;
> >> + const char *fmt;
> >>
> >> /* Once we are committed compute the creds */
> >> retval = bprm_creds_from_file(bprm);
> >> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
> >> * detecting a concurrent rename and just want a terminated name.
> >> */
> >> rcu_read_lock();
> >> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> >> - true);
> >> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
> >> + name = kvasprintf(GFP_KERNEL, fmt, args);
> >> + if (!name)
> >> + return -ENOMEM;
> >> +
> >> + me->full_name = name;
> >> + __set_task_comm(me, fmt, true);
> >> rcu_read_unlock();
> >> } else {
> >> - __set_task_comm(me, kbasename(bprm->filename), true);
> >> + fmt = kbasename(bprm->filename);
> >> + name = kvasprintf(GFP_KERNEL, fmt, args);
> >> + if (!name)
> >> + return -ENOMEM;
> >> +
> >> + me->full_name = name;
> >> + __set_task_comm(me, fmt, true);
> >> }
> >>
> >> /* An exec changes our domain. We are no longer part of the thread
> >> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
> >> return 0;
> >>
> >> out_unlock:
> >> + kfree(me->full_name);
> >> up_write(&me->signal->exec_update_lock);
> >> if (!bprm->cred)
> >> mutex_unlock(&me->signal->cred_guard_mutex);
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 56ddeb37b5cd..053b52606652 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -1166,6 +1166,9 @@ struct task_struct {
> >> */
> >> char comm[TASK_COMM_LEN];
> >>
> >> + /* To store the full name if task comm is truncated. */
> >> + char *full_name;
> >> +
> > Adding another field to store the task name isn’t ideal. What about
> > combining them into a single field, as Linus suggested [0]?
> >
> > [0]. https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/
> >
>
> Thanks for sharing Linus's suggestion. I went through the suggested
> changes in the related threads and came up with the following set of points:
>
> 1. struct task_struct would contain both 'comm' and 'full_name',
Correct.
> 2. Remove the task_lock() inside __get_task_comm(),
This has been implemented in the patch series titled "Improve the copy
of task comm". For details, please refer to:
https://lore.kernel.org/linux-mm/20240828030321.20688-1-laoar.shao@gmail.com/.
> 3. Users of task->comm will be affected in the following ways:
Correct.
> (a). Printing with '%s' and tsk->comm would just continue to
> work,but will get a longer max string.
> (b). For users of memcpy.*->comm\>', we should change 'memcpy()' to
> 'copy_comm()' which would look like:
>
> memcpy(dst, src, TASK_COMM_LEN);
> dst[TASK_COMM_LEN-1] = 0;
>
> (c). Users which use "sizeof(->comm)" will continue to get the old value because of the hacky union.
Using a separate pointer rather than a union could simplify the
implementation. I’m open to introducing a new pointer if you believe
it’s the better approach.
>
> Am I missing something here. Please let me know your views.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-06 2:28 ` Yafang Shao
@ 2025-04-09 11:13 ` Bhupesh Sharma
0 siblings, 0 replies; 18+ messages in thread
From: Bhupesh Sharma @ 2025-04-09 11:13 UTC (permalink / raw)
To: Yafang Shao
Cc: Bhupesh, Linus Torvalds, akpm, kernel-dev, linux-kernel, bpf,
linux-perf-users, linux-fsdevel, linux-mm, oliver.sang, lkp,
pmladek, rostedt, mathieu.desnoyers, arnaldo.melo,
alexei.starovoitov, andrii.nakryiko, mirq-linux, peterz, willy,
david, viro, keescook, ebiederm, brauner, jack, mingo, juri.lelli,
bsegall, mgorman, vschneid
Sorry for the delay in reply, I was out for a couple of days.
On 4/6/25 7:58 AM, Yafang Shao wrote:
> On Fri, Apr 4, 2025 at 2:35 PM Bhupesh Sharma <bhsharma@igalia.com> wrote:
>>
>> On 4/1/25 7:37 AM, Yafang Shao wrote:
>>> On Mon, Mar 31, 2025 at 8:18 PM Bhupesh <bhupesh@igalia.com> wrote:
>>>> Provide a parallel implementation for get_task_comm() called
>>>> get_task_full_name() which allows the dynamically allocated
>>>> and filled-in task's full name to be passed to interested
>>>> users such as 'gdb'.
>>>>
>>>> Currently while running 'gdb', the 'task->comm' value of a long
>>>> task name is truncated due to the limitation of TASK_COMM_LEN.
>>>>
>>>> For example using gdb to debug a simple app currently which generate
>>>> threads with long task names:
>>>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>>>> # cat log
>>>>
>>>> NameThatIsTooLo
>>>>
>>>> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
>>>> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
>>>> that all the legacy / existing ABI, continue to work as before using
>>>> '/proc/$pid/task/$tid/comm'.
>>>>
>>>> This patch only adds a parallel, dynamically-allocated
>>>> 'task->full_name' which can be used by interested users
>>>> via '/proc/$pid/task/$tid/full_name'.
>>>>
>>>> After this change, gdb is able to show full name of the task:
>>>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>>>> # cat log
>>>>
>>>> NameThatIsTooLongForComm[4662]
>>>>
>>>> Signed-off-by: Bhupesh <bhupesh@igalia.com>
>>>> ---
>>>> fs/exec.c | 21 ++++++++++++++++++---
>>>> include/linux/sched.h | 9 +++++++++
>>>> 2 files changed, 27 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/exec.c b/fs/exec.c
>>>> index f45859ad13ac..4219d77a519c 100644
>>>> --- a/fs/exec.c
>>>> +++ b/fs/exec.c
>>>> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
>>>> {
>>>> struct task_struct *me = current;
>>>> int retval;
>>>> + va_list args;
>>>> + char *name;
>>>> + const char *fmt;
>>>>
>>>> /* Once we are committed compute the creds */
>>>> retval = bprm_creds_from_file(bprm);
>>>> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
>>>> * detecting a concurrent rename and just want a terminated name.
>>>> */
>>>> rcu_read_lock();
>>>> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
>>>> - true);
>>>> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
>>>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>>>> + if (!name)
>>>> + return -ENOMEM;
>>>> +
>>>> + me->full_name = name;
>>>> + __set_task_comm(me, fmt, true);
>>>> rcu_read_unlock();
>>>> } else {
>>>> - __set_task_comm(me, kbasename(bprm->filename), true);
>>>> + fmt = kbasename(bprm->filename);
>>>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>>>> + if (!name)
>>>> + return -ENOMEM;
>>>> +
>>>> + me->full_name = name;
>>>> + __set_task_comm(me, fmt, true);
>>>> }
>>>>
>>>> /* An exec changes our domain. We are no longer part of the thread
>>>> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>>>> return 0;
>>>>
>>>> out_unlock:
>>>> + kfree(me->full_name);
>>>> up_write(&me->signal->exec_update_lock);
>>>> if (!bprm->cred)
>>>> mutex_unlock(&me->signal->cred_guard_mutex);
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 56ddeb37b5cd..053b52606652 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -1166,6 +1166,9 @@ struct task_struct {
>>>> */
>>>> char comm[TASK_COMM_LEN];
>>>>
>>>> + /* To store the full name if task comm is truncated. */
>>>> + char *full_name;
>>>> +
>>> Adding another field to store the task name isn’t ideal. What about
>>> combining them into a single field, as Linus suggested [0]?
>>>
>>> [0]. https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/
>>>
>> Thanks for sharing Linus's suggestion. I went through the suggested
>> changes in the related threads and came up with the following set of points:
>>
>> 1. struct task_struct would contain both 'comm' and 'full_name',
> Correct.
>
>> 2. Remove the task_lock() inside __get_task_comm(),
> This has been implemented in the patch series titled "Improve the copy
> of task comm". For details, please refer to:
> https://lore.kernel.org/linux-mm/20240828030321.20688-1-laoar.shao@gmail.com/.
>
>> 3. Users of task->comm will be affected in the following ways:
> Correct.
>
>> (a). Printing with '%s' and tsk->comm would just continue to
>> work,but will get a longer max string.
>> (b). For users of memcpy.*->comm\>', we should change 'memcpy()' to
>> 'copy_comm()' which would look like:
>>
>> memcpy(dst, src, TASK_COMM_LEN);
>> dst[TASK_COMM_LEN-1] = 0;
>>
>> (c). Users which use "sizeof(->comm)" will continue to get the old value because of the hacky union.
> Using a separate pointer rather than a union could simplify the
> implementation. I’m open to introducing a new pointer if you believe
> it’s the better approach.
Right, that's what I was thinking of earlier as well, i.e. having a new
pointer like tsk->full_name, however
allocating it outside the exec() hot-path may be tricky.
Let me try that though and come up with a v3, that addresses (a), (b) as
mentioned above and (c) with a pointer instead of union.
Thanks,
Bhupesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-03-31 12:18 ` [PATCH v2 1/3] exec: " Bhupesh
2025-04-01 2:07 ` Yafang Shao
@ 2025-04-01 4:02 ` Harry Yoo
2025-04-04 5:31 ` Bhupesh Sharma
2025-04-03 16:17 ` Andrii Nakryiko
2025-04-03 16:30 ` Kees Cook
3 siblings, 1 reply; 18+ messages in thread
From: Harry Yoo @ 2025-04-01 4:02 UTC (permalink / raw)
To: Bhupesh
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, keescook,
ebiederm, brauner, jack, mingo, juri.lelli, bsegall, mgorman,
vschneid
On Mon, Mar 31, 2025 at 05:48:18PM +0530, Bhupesh wrote:
> Provide a parallel implementation for get_task_comm() called
> get_task_full_name() which allows the dynamically allocated
> and filled-in task's full name to be passed to interested
> users such as 'gdb'.
>
> Currently while running 'gdb', the 'task->comm' value of a long
> task name is truncated due to the limitation of TASK_COMM_LEN.
>
> For example using gdb to debug a simple app currently which generate
> threads with long task names:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLo
>
> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
> that all the legacy / existing ABI, continue to work as before using
> '/proc/$pid/task/$tid/comm'.
>
> This patch only adds a parallel, dynamically-allocated
> 'task->full_name' which can be used by interested users
> via '/proc/$pid/task/$tid/full_name'.
>
> After this change, gdb is able to show full name of the task:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLongForComm[4662]
>
> Signed-off-by: Bhupesh <bhupesh@igalia.com>
> ---
> fs/exec.c | 21 ++++++++++++++++++---
> include/linux/sched.h | 9 +++++++++
> 2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index f45859ad13ac..4219d77a519c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
> {
> struct task_struct *me = current;
> int retval;
> + va_list args;
> + char *name;
> + const char *fmt;
>
> /* Once we are committed compute the creds */
> retval = bprm_creds_from_file(bprm);
> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
> * detecting a concurrent rename and just want a terminated name.
> */
> rcu_read_lock();
> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> - true);
> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
> + if (!name)
> + return -ENOMEM;
Is it safe to return error here, instead of jumping to 'out_unlock' label
and then releasing locks?
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
> rcu_read_unlock();
> } else {
> - __set_task_comm(me, kbasename(bprm->filename), true);
> + fmt = kbasename(bprm->filename);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
> + if (!name)
> + return -ENOMEM;
> +
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
> }
>
> /* An exec changes our domain. We are no longer part of the thread
> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
> return 0;
>
> out_unlock:
> + kfree(me->full_name);
> up_write(&me->signal->exec_update_lock);
> if (!bprm->cred)
> mutex_unlock(&me->signal->cred_guard_mutex);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 56ddeb37b5cd..053b52606652 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1166,6 +1166,9 @@ struct task_struct {
> */
> char comm[TASK_COMM_LEN];
>
> + /* To store the full name if task comm is truncated. */
> + char *full_name;
> +
> struct nameidata *nameidata;
>
> #ifdef CONFIG_SYSVIPC
> @@ -2007,6 +2010,12 @@ extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec
> buf; \
> })
>
> +#define get_task_full_name(buf, buf_size, tsk) ({ \
> + BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN); \
> + strscpy_pad(buf, (tsk)->full_name, buf_size); \
> + buf; \
> +})
> +
> #ifdef CONFIG_SMP
> static __always_inline void scheduler_ipi(void)
> {
> --
> 2.38.1
>
>
--
Cheers,
Harry (formerly known as Hyeonggon)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-01 4:02 ` Harry Yoo
@ 2025-04-04 5:31 ` Bhupesh Sharma
0 siblings, 0 replies; 18+ messages in thread
From: Bhupesh Sharma @ 2025-04-04 5:31 UTC (permalink / raw)
To: Harry Yoo, Bhupesh
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, keescook,
ebiederm, brauner, jack, mingo, juri.lelli, bsegall, mgorman,
vschneid
On 4/1/25 9:32 AM, Harry Yoo wrote:
> On Mon, Mar 31, 2025 at 05:48:18PM +0530, Bhupesh wrote:
>> Provide a parallel implementation for get_task_comm() called
>> get_task_full_name() which allows the dynamically allocated
>> and filled-in task's full name to be passed to interested
>> users such as 'gdb'.
>>
>> Currently while running 'gdb', the 'task->comm' value of a long
>> task name is truncated due to the limitation of TASK_COMM_LEN.
>>
>> For example using gdb to debug a simple app currently which generate
>> threads with long task names:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLo
>>
>> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
>> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
>> that all the legacy / existing ABI, continue to work as before using
>> '/proc/$pid/task/$tid/comm'.
>>
>> This patch only adds a parallel, dynamically-allocated
>> 'task->full_name' which can be used by interested users
>> via '/proc/$pid/task/$tid/full_name'.
>>
>> After this change, gdb is able to show full name of the task:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLongForComm[4662]
>>
>> Signed-off-by: Bhupesh <bhupesh@igalia.com>
>> ---
>> fs/exec.c | 21 ++++++++++++++++++---
>> include/linux/sched.h | 9 +++++++++
>> 2 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index f45859ad13ac..4219d77a519c 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
>> {
>> struct task_struct *me = current;
>> int retval;
>> + va_list args;
>> + char *name;
>> + const char *fmt;
>>
>> /* Once we are committed compute the creds */
>> retval = bprm_creds_from_file(bprm);
>> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
>> * detecting a concurrent rename and just want a terminated name.
>> */
>> rcu_read_lock();
>> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
>> - true);
>> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>> + if (!name)
>> + return -ENOMEM;
> Is it safe to return error here, instead of jumping to 'out_unlock' label
> and then releasing locks?
Ok, let me modify this in v3 to ensure that locks are released properly.
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
>> rcu_read_unlock();
>> } else {
>> - __set_task_comm(me, kbasename(bprm->filename), true);
>> + fmt = kbasename(bprm->filename);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
>> }
>>
>> /* An exec changes our domain. We are no longer part of the thread
>> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>> return 0;
>>
>> out_unlock:
>> + kfree(me->full_name);
>> up_write(&me->signal->exec_update_lock);
>> if (!bprm->cred)
>> mutex_unlock(&me->signal->cred_guard_mutex);
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 56ddeb37b5cd..053b52606652 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1166,6 +1166,9 @@ struct task_struct {
>> */
>> char comm[TASK_COMM_LEN];
>>
>> + /* To store the full name if task comm is truncated. */
>> + char *full_name;
>> +
>> struct nameidata *nameidata;
>>
>> #ifdef CONFIG_SYSVIPC
>> @@ -2007,6 +2010,12 @@ extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec
>> buf; \
>> })
>>
>> +#define get_task_full_name(buf, buf_size, tsk) ({ \
>> + BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN); \
>> + strscpy_pad(buf, (tsk)->full_name, buf_size); \
>> + buf; \
>> +})
>> +
>> #ifdef CONFIG_SMP
>> static __always_inline void scheduler_ipi(void)
>> {
>> --
>> 2.38.1
>>
>>
Thanks.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-03-31 12:18 ` [PATCH v2 1/3] exec: " Bhupesh
2025-04-01 2:07 ` Yafang Shao
2025-04-01 4:02 ` Harry Yoo
@ 2025-04-03 16:17 ` Andrii Nakryiko
2025-04-04 5:36 ` Bhupesh Sharma
2025-04-03 16:30 ` Kees Cook
3 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2025-04-03 16:17 UTC (permalink / raw)
To: Bhupesh
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
mirq-linux, peterz, willy, david, viro, keescook, ebiederm,
brauner, jack, mingo, juri.lelli, bsegall, mgorman, vschneid
On Mon, Mar 31, 2025 at 5:18 AM Bhupesh <bhupesh@igalia.com> wrote:
>
> Provide a parallel implementation for get_task_comm() called
> get_task_full_name() which allows the dynamically allocated
> and filled-in task's full name to be passed to interested
> users such as 'gdb'.
>
> Currently while running 'gdb', the 'task->comm' value of a long
> task name is truncated due to the limitation of TASK_COMM_LEN.
>
> For example using gdb to debug a simple app currently which generate
> threads with long task names:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLo
>
> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
> that all the legacy / existing ABI, continue to work as before using
> '/proc/$pid/task/$tid/comm'.
>
> This patch only adds a parallel, dynamically-allocated
> 'task->full_name' which can be used by interested users
> via '/proc/$pid/task/$tid/full_name'.
>
> After this change, gdb is able to show full name of the task:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLongForComm[4662]
>
> Signed-off-by: Bhupesh <bhupesh@igalia.com>
> ---
> fs/exec.c | 21 ++++++++++++++++++---
> include/linux/sched.h | 9 +++++++++
> 2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index f45859ad13ac..4219d77a519c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
> {
> struct task_struct *me = current;
> int retval;
> + va_list args;
> + char *name;
> + const char *fmt;
>
> /* Once we are committed compute the creds */
> retval = bprm_creds_from_file(bprm);
> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
> * detecting a concurrent rename and just want a terminated name.
> */
> rcu_read_lock();
> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> - true);
> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
this `args` argument, it's not initialized anywhere, right? It's not
clear where it's coming from, but you are passing it directly into
kvasprintf(), I can't convince myself that this is correct. Can you
please explain what is happening here?
Also, instead of allocating a buffer unconditionally, maybe check that
comm is longer than 16, and if not, just use the old-schoold 16-byte
comm array?
> + if (!name)
> + return -ENOMEM;
> +
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
> rcu_read_unlock();
> } else {
> - __set_task_comm(me, kbasename(bprm->filename), true);
> + fmt = kbasename(bprm->filename);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
> + if (!name)
> + return -ENOMEM;
> +
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
> }
>
> /* An exec changes our domain. We are no longer part of the thread
> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
> return 0;
>
> out_unlock:
> + kfree(me->full_name);
> up_write(&me->signal->exec_update_lock);
> if (!bprm->cred)
> mutex_unlock(&me->signal->cred_guard_mutex);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 56ddeb37b5cd..053b52606652 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1166,6 +1166,9 @@ struct task_struct {
> */
> char comm[TASK_COMM_LEN];
>
> + /* To store the full name if task comm is truncated. */
> + char *full_name;
> +
> struct nameidata *nameidata;
>
> #ifdef CONFIG_SYSVIPC
> @@ -2007,6 +2010,12 @@ extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec
> buf; \
> })
>
> +#define get_task_full_name(buf, buf_size, tsk) ({ \
> + BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN); \
> + strscpy_pad(buf, (tsk)->full_name, buf_size); \
> + buf; \
> +})
> +
> #ifdef CONFIG_SMP
> static __always_inline void scheduler_ipi(void)
> {
> --
> 2.38.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-03 16:17 ` Andrii Nakryiko
@ 2025-04-04 5:36 ` Bhupesh Sharma
0 siblings, 0 replies; 18+ messages in thread
From: Bhupesh Sharma @ 2025-04-04 5:36 UTC (permalink / raw)
To: Andrii Nakryiko, Bhupesh
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
mirq-linux, peterz, willy, david, viro, keescook, ebiederm,
brauner, jack, mingo, juri.lelli, bsegall, mgorman, vschneid
On 4/3/25 9:47 PM, Andrii Nakryiko wrote:
> On Mon, Mar 31, 2025 at 5:18 AM Bhupesh <bhupesh@igalia.com> wrote:
>> Provide a parallel implementation for get_task_comm() called
>> get_task_full_name() which allows the dynamically allocated
>> and filled-in task's full name to be passed to interested
>> users such as 'gdb'.
>>
>> Currently while running 'gdb', the 'task->comm' value of a long
>> task name is truncated due to the limitation of TASK_COMM_LEN.
>>
>> For example using gdb to debug a simple app currently which generate
>> threads with long task names:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLo
>>
>> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
>> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
>> that all the legacy / existing ABI, continue to work as before using
>> '/proc/$pid/task/$tid/comm'.
>>
>> This patch only adds a parallel, dynamically-allocated
>> 'task->full_name' which can be used by interested users
>> via '/proc/$pid/task/$tid/full_name'.
>>
>> After this change, gdb is able to show full name of the task:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLongForComm[4662]
>>
>> Signed-off-by: Bhupesh <bhupesh@igalia.com>
>> ---
>> fs/exec.c | 21 ++++++++++++++++++---
>> include/linux/sched.h | 9 +++++++++
>> 2 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index f45859ad13ac..4219d77a519c 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
>> {
>> struct task_struct *me = current;
>> int retval;
>> + va_list args;
>> + char *name;
>> + const char *fmt;
>>
>> /* Once we are committed compute the creds */
>> retval = bprm_creds_from_file(bprm);
>> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
>> * detecting a concurrent rename and just want a terminated name.
>> */
>> rcu_read_lock();
>> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
>> - true);
>> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
> this `args` argument, it's not initialized anywhere, right? It's not
> clear where it's coming from, but you are passing it directly into
> kvasprintf(), I can't convince myself that this is correct. Can you
> please explain what is happening here?
>
> Also, instead of allocating a buffer unconditionally, maybe check that
> comm is longer than 16, and if not, just use the old-schoold 16-byte
> comm array?
Ok. As Kees also mentioned in his comment, I will try to do away with
the allocation in the exec() hot-path in v3.
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
>> rcu_read_unlock();
>> } else {
>> - __set_task_comm(me, kbasename(bprm->filename), true);
>> + fmt = kbasename(bprm->filename);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
>> }
>>
>> /* An exec changes our domain. We are no longer part of the thread
>> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>> return 0;
>>
>> out_unlock:
>> + kfree(me->full_name);
>> up_write(&me->signal->exec_update_lock);
>> if (!bprm->cred)
>> mutex_unlock(&me->signal->cred_guard_mutex);
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 56ddeb37b5cd..053b52606652 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1166,6 +1166,9 @@ struct task_struct {
>> */
>> char comm[TASK_COMM_LEN];
>>
>> + /* To store the full name if task comm is truncated. */
>> + char *full_name;
>> +
>> struct nameidata *nameidata;
>>
>> #ifdef CONFIG_SYSVIPC
>> @@ -2007,6 +2010,12 @@ extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec
>> buf; \
>> })
>>
>> +#define get_task_full_name(buf, buf_size, tsk) ({ \
>> + BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN); \
>> + strscpy_pad(buf, (tsk)->full_name, buf_size); \
>> + buf; \
>> +})
>> +
>> #ifdef CONFIG_SMP
>> static __always_inline void scheduler_ipi(void)
>> {
>> --
>> 2.38.1
>>
Thanks.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-03-31 12:18 ` [PATCH v2 1/3] exec: " Bhupesh
` (2 preceding siblings ...)
2025-04-03 16:17 ` Andrii Nakryiko
@ 2025-04-03 16:30 ` Kees Cook
2025-04-04 6:48 ` Bhupesh Sharma
3 siblings, 1 reply; 18+ messages in thread
From: Kees Cook @ 2025-04-03 16:30 UTC (permalink / raw)
To: Bhupesh
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, ebiederm,
brauner, jack, mingo, juri.lelli, bsegall, mgorman, vschneid
On Mon, Mar 31, 2025 at 05:48:18PM +0530, Bhupesh wrote:
> Provide a parallel implementation for get_task_comm() called
> get_task_full_name() which allows the dynamically allocated
> and filled-in task's full name to be passed to interested
> users such as 'gdb'.
>
> Currently while running 'gdb', the 'task->comm' value of a long
> task name is truncated due to the limitation of TASK_COMM_LEN.
>
> For example using gdb to debug a simple app currently which generate
> threads with long task names:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLo
>
> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
> that all the legacy / existing ABI, continue to work as before using
> '/proc/$pid/task/$tid/comm'.
>
> This patch only adds a parallel, dynamically-allocated
> 'task->full_name' which can be used by interested users
> via '/proc/$pid/task/$tid/full_name'.
>
> After this change, gdb is able to show full name of the task:
> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
> # cat log
>
> NameThatIsTooLongForComm[4662]
>
> Signed-off-by: Bhupesh <bhupesh@igalia.com>
> ---
> fs/exec.c | 21 ++++++++++++++++++---
> include/linux/sched.h | 9 +++++++++
> 2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index f45859ad13ac..4219d77a519c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
> {
> struct task_struct *me = current;
> int retval;
> + va_list args;
> + char *name;
> + const char *fmt;
>
> /* Once we are committed compute the creds */
> retval = bprm_creds_from_file(bprm);
> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
> * detecting a concurrent rename and just want a terminated name.
> */
> rcu_read_lock();
> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> - true);
> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
> + if (!name)
> + return -ENOMEM;
> +
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
I don't want to add new allocations to the default exec path unless we
absolutely must.
In the original proposal this was about setting thread names (after
exec), and I think that'll be fine.
> rcu_read_unlock();
> } else {
> - __set_task_comm(me, kbasename(bprm->filename), true);
> + fmt = kbasename(bprm->filename);
> + name = kvasprintf(GFP_KERNEL, fmt, args);
> + if (!name)
> + return -ENOMEM;
> +
> + me->full_name = name;
> + __set_task_comm(me, fmt, true);
> }
I think we can just set me->full_name = me->comm by default.
>
> /* An exec changes our domain. We are no longer part of the thread
> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
> return 0;
>
> out_unlock:
> + kfree(me->full_name);
> up_write(&me->signal->exec_update_lock);
> if (!bprm->cred)
> mutex_unlock(&me->signal->cred_guard_mutex);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 56ddeb37b5cd..053b52606652 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1166,6 +1166,9 @@ struct task_struct {
> */
> char comm[TASK_COMM_LEN];
>
> + /* To store the full name if task comm is truncated. */
> + char *full_name;
> +
> struct nameidata *nameidata;
>
> #ifdef CONFIG_SYSVIPC
> @@ -2007,6 +2010,12 @@ extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec
> buf; \
> })
>
> +#define get_task_full_name(buf, buf_size, tsk) ({ \
> + BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN); \
> + strscpy_pad(buf, (tsk)->full_name, buf_size); \
> + buf; \
> +})
I think it should be possible to just switch get_task_comm() to use
(tsk)->full_name.
--
Kees Cook
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-03 16:30 ` Kees Cook
@ 2025-04-04 6:48 ` Bhupesh Sharma
2025-04-04 17:24 ` Kees Cook
0 siblings, 1 reply; 18+ messages in thread
From: Bhupesh Sharma @ 2025-04-04 6:48 UTC (permalink / raw)
To: Kees Cook, Bhupesh
Cc: akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, ebiederm,
brauner, jack, mingo, juri.lelli, bsegall, mgorman, vschneid
Hi Kees,
On 4/3/25 10:00 PM, Kees Cook wrote:
> On Mon, Mar 31, 2025 at 05:48:18PM +0530, Bhupesh wrote:
>> Provide a parallel implementation for get_task_comm() called
>> get_task_full_name() which allows the dynamically allocated
>> and filled-in task's full name to be passed to interested
>> users such as 'gdb'.
>>
>> Currently while running 'gdb', the 'task->comm' value of a long
>> task name is truncated due to the limitation of TASK_COMM_LEN.
>>
>> For example using gdb to debug a simple app currently which generate
>> threads with long task names:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLo
>>
>> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
>> 'TASK_COMM_LEN' and the 16-byte design remains untouched. Which means
>> that all the legacy / existing ABI, continue to work as before using
>> '/proc/$pid/task/$tid/comm'.
>>
>> This patch only adds a parallel, dynamically-allocated
>> 'task->full_name' which can be used by interested users
>> via '/proc/$pid/task/$tid/full_name'.
>>
>> After this change, gdb is able to show full name of the task:
>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>> # cat log
>>
>> NameThatIsTooLongForComm[4662]
>>
>> Signed-off-by: Bhupesh <bhupesh@igalia.com>
>> ---
>> fs/exec.c | 21 ++++++++++++++++++---
>> include/linux/sched.h | 9 +++++++++
>> 2 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index f45859ad13ac..4219d77a519c 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
>> {
>> struct task_struct *me = current;
>> int retval;
>> + va_list args;
>> + char *name;
>> + const char *fmt;
>>
>> /* Once we are committed compute the creds */
>> retval = bprm_creds_from_file(bprm);
>> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
>> * detecting a concurrent rename and just want a terminated name.
>> */
>> rcu_read_lock();
>> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
>> - true);
>> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
> I don't want to add new allocations to the default exec path unless we
> absolutely must.
>
> In the original proposal this was about setting thread names (after
> exec), and I think that'll be fine.
>
Ok.
>> rcu_read_unlock();
>> } else {
>> - __set_task_comm(me, kbasename(bprm->filename), true);
>> + fmt = kbasename(bprm->filename);
>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + me->full_name = name;
>> + __set_task_comm(me, fmt, true);
>> }
> I think we can just set me->full_name = me->comm by default.
Sure.
>>
>> /* An exec changes our domain. We are no longer part of the thread
>> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>> return 0;
>>
>> out_unlock:
>> + kfree(me->full_name);
>> up_write(&me->signal->exec_update_lock);
>> if (!bprm->cred)
>> mutex_unlock(&me->signal->cred_guard_mutex);
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 56ddeb37b5cd..053b52606652 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1166,6 +1166,9 @@ struct task_struct {
>> */
>> char comm[TASK_COMM_LEN];
>>
>> + /* To store the full name if task comm is truncated. */
>> + char *full_name;
>> +
>> struct nameidata *nameidata;
>>
>> #ifdef CONFIG_SYSVIPC
>> @@ -2007,6 +2010,12 @@ extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec
>> buf; \
>> })
>>
>> +#define get_task_full_name(buf, buf_size, tsk) ({ \
>> + BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN); \
>> + strscpy_pad(buf, (tsk)->full_name, buf_size); \
>> + buf; \
>> +})
> I think it should be possible to just switch get_task_comm() to use
> (tsk)->full_name.
>
In another review for this series, Yafang mentioned the following
cleanup + approach suggested by Linus (see [0]).
Also I have summarized my understanding on the basis of the suggestions
Linus shared and the accompanying background threads (please see [1]).
Kindly share your views on the same, so that I can change the
implementation in v3 series accordingly.
[0].https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/
[1]. https://lore.kernel.org/all/20250331121820.455916-1-bhupesh@igalia.com/T/#m7c163829440f2c98e64b475b7cdbc35ba4d613ca
Thanks,
Bhupesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-04 6:48 ` Bhupesh Sharma
@ 2025-04-04 17:24 ` Kees Cook
2025-04-09 11:31 ` Bhupesh Sharma
0 siblings, 1 reply; 18+ messages in thread
From: Kees Cook @ 2025-04-04 17:24 UTC (permalink / raw)
To: Bhupesh Sharma
Cc: Bhupesh, akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, ebiederm,
brauner, jack, mingo, juri.lelli, bsegall, mgorman, vschneid
On Fri, Apr 04, 2025 at 12:18:56PM +0530, Bhupesh Sharma wrote:
> In another review for this series, Yafang mentioned the following cleanup +
> approach suggested by Linus (see [0]).
> Also I have summarized my understanding on the basis of the suggestions
> Linus shared and the accompanying background threads (please see [1]).
>
> Kindly share your views on the same, so that I can change the implementation
> in v3 series accordingly.
In thinking about this a little more I think we can't universally change
all the APIs to use the new full_name since it is a pointer, which may
be getting changed out from under readers if a setter changes it. So
this may need some careful redesign, likely with RCU. hmm.
--
Kees Cook
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's full name
2025-04-04 17:24 ` Kees Cook
@ 2025-04-09 11:31 ` Bhupesh Sharma
0 siblings, 0 replies; 18+ messages in thread
From: Bhupesh Sharma @ 2025-04-09 11:31 UTC (permalink / raw)
To: Kees Cook
Cc: Bhupesh, akpm, kernel-dev, linux-kernel, bpf, linux-perf-users,
linux-fsdevel, linux-mm, oliver.sang, lkp, laoar.shao, pmladek,
rostedt, mathieu.desnoyers, arnaldo.melo, alexei.starovoitov,
andrii.nakryiko, mirq-linux, peterz, willy, david, viro, ebiederm,
brauner, jack, mingo, juri.lelli, bsegall, mgorman, vschneid
Hi Kees,
Sorry for the delay - I was out for a couple of days.
On 4/4/25 10:54 PM, Kees Cook wrote:
> On Fri, Apr 04, 2025 at 12:18:56PM +0530, Bhupesh Sharma wrote:
>> In another review for this series, Yafang mentioned the following cleanup +
>> approach suggested by Linus (see [0]).
>> Also I have summarized my understanding on the basis of the suggestions
>> Linus shared and the accompanying background threads (please see [1]).
>>
>> Kindly share your views on the same, so that I can change the implementation
>> in v3 series accordingly.
> In thinking about this a little more I think we can't universally change
> all the APIs to use the new full_name since it is a pointer, which may
> be getting changed out from under readers if a setter changes it. So
> this may need some careful redesign, likely with RCU. hmm.
>
Thinking more about this, Linus mentioned in [0]:
'Since user space can randomly change their names anyway, using locking
was always wrong for readers (for writers it probably does make sense
to have some lock'
So, if we go with the union approach, probably we can do with just a writer-lock, whereas if we go with a task->full_name like pointer one, we would probably need a rcu lock.
Please let me know your comments.
[0]. https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com/
^ permalink raw reply [flat|nested] 18+ messages in thread