linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/9] Improve the copy of task comm
@ 2024-08-04  7:56 Yafang Shao
  2024-08-05 21:28 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Yafang Shao @ 2024-08-04  7:56 UTC (permalink / raw)
  To: akpm
  Cc: torvalds, ebiederm, alexei.starovoitov, rostedt, catalin.marinas,
	penguin-kernel, linux-mm, linux-fsdevel, linux-trace-kernel,
	audit, linux-security-module, selinux, bpf, netdev, dri-devel,
	Yafang Shao

Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the
length of task comm. Changes in the task comm could result in a destination
string that is overflow. Therefore, we should explicitly ensure the destination
string is always NUL-terminated, regardless of the task comm. This approach
will facilitate future extensions to the task comm.

As suggested by Linus [0], we can identify all relevant code with the
following git grep command:

  git grep 'memcpy.*->comm\>'
  git grep 'kstrdup.*->comm\>'
  git grep 'strncpy.*->comm\>'
  git grep 'strcpy.*->comm\>'

PATCH #2~#4:   memcpy
PATCH #5~#6:   kstrdup
PATCH #7:      strncpy
PATCH #8~#9:   strcpy

There is a BUILD_BUG_ON() inside get_task_comm(), so when you use
get_task_comm(), it implies that the BUILD_BUG_ON() is necessary. However,
we don't want to impose this restriction on code where the length can be
changed, so we use __get_task_comm(), rather than the get_task_comm().

One use case of get_task_comm() is in code that has already exposed the
length to userspace. In such cases, we specifically add the BUILD_BUG_ON()
to prevent developers from changing it. For more information, see
commit 95af469c4f60 ("fs/binfmt_elf: replace open-coded string copy with
get_task_comm").

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0]

Changes:
v4->v5:
- Drop changes in the mm/kmemleak.c as it was fixed by
  commit 0b84780134fb ("mm/kmemleak: replace strncpy() with strscpy()")
- Drop changes in kernel/tsacct.c as it was fixed by
  commmit 0fe2356434e ("tsacct: replace strncpy() with strscpy()")

v3->v4: https://lore.kernel.org/linux-mm/20240729023719.1933-1-laoar.shao@gmail.com/
- Rename __kstrndup() to __kmemdup_nul() and define it inside mm/util.c
  (Matthew)
- Remove unused local varaible (Simon)

v2->v3: https://lore.kernel.org/all/20240621022959.9124-1-laoar.shao@gmail.com/
- Deduplicate code around kstrdup (Andrew)
- Add commit log for dropping task_lock (Catalin)

v1->v2: https://lore.kernel.org/bpf/20240613023044.45873-1-laoar.shao@gmail.com/
- Add comment for dropping task_lock() in __get_task_comm() (Alexei)
- Drop changes in trace event (Steven)
- Fix comment on task comm (Matus)

v1: https://lore.kernel.org/all/20240602023754.25443-1-laoar.shao@gmail.com/

Yafang Shao (9):
  fs/exec: Drop task_lock() inside __get_task_comm()
  auditsc: Replace memcpy() with __get_task_comm()
  security: Replace memcpy() with __get_task_comm()
  bpftool: Ensure task comm is always NUL-terminated
  mm/util: Fix possible race condition in kstrdup()
  mm/util: Deduplicate code in {kstrdup,kstrndup,kmemdup_nul}
  tracing: Replace strncpy() with __get_task_comm()
  net: Replace strcpy() with __get_task_comm()
  drm: Replace strcpy() with __get_task_comm()

 drivers/gpu/drm/drm_framebuffer.c     |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |  2 +-
 fs/exec.c                             | 10 ++++-
 include/linux/sched.h                 |  4 +-
 kernel/auditsc.c                      |  6 +--
 kernel/trace/trace.c                  |  2 +-
 kernel/trace/trace_events_hist.c      |  2 +-
 mm/util.c                             | 61 ++++++++++++---------------
 net/ipv6/ndisc.c                      |  2 +-
 security/lsm_audit.c                  |  4 +-
 security/selinux/selinuxfs.c          |  2 +-
 tools/bpf/bpftool/pids.c              |  2 +
 12 files changed, 49 insertions(+), 50 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/9] Improve the copy of task comm
  2024-08-04  7:56 Yafang Shao
@ 2024-08-05 21:28 ` Linus Torvalds
  2024-08-06  3:00   ` Yafang Shao
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2024-08-05 21:28 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, ebiederm, alexei.starovoitov, rostedt, catalin.marinas,
	penguin-kernel, linux-mm, linux-fsdevel, linux-trace-kernel,
	audit, linux-security-module, selinux, bpf, netdev, dri-devel

On Sun, 4 Aug 2024 at 00:56, Yafang Shao <laoar.shao@gmail.com> wrote:
>
> There is a BUILD_BUG_ON() inside get_task_comm(), so when you use
> get_task_comm(), it implies that the BUILD_BUG_ON() is necessary.

Let's just remove that silly BUILD_BUG_ON(). I don't think it adds any
value, and honestly, it really only makes this patch-series uglier
when reasonable uses suddenly pointlessly need that double-underscore
version..

So let's aim at

 (a) documenting that the last byte in 'tsk->comm{}' is always
guaranteed to be NUL, so that the thing can always just be treated as
a string. Yes, it may change under us, but as long as we know there is
always a stable NUL there *somewhere*, we really really don't care.

 (b) removing __get_task_comm() entirely, and replacing it with a
plain 'str*cpy*()' functions

The whole (a) thing is a requirement anyway, since the *bulk* of
tsk->comm really just seems to be various '%s' things in printk
strings etc.

And once we just admit that we can use the string functions, all the
get_task_comm() stuff is just unnecessary.

And yes, some people may want to use the strscpy_pad() function
because they want to fill the whole destination buffer. But that's
entirely about the *destination* use, not the tsk->comm[] source, so
it has nothing to do with any kind of "get_task_comm()" logic, and it
was always wrong to care about the buffer sizes magically matching.

Hmm?

                Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/9] Improve the copy of task comm
  2024-08-05 21:28 ` Linus Torvalds
@ 2024-08-06  3:00   ` Yafang Shao
  2024-08-06  3:09     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Yafang Shao @ 2024-08-06  3:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: akpm, ebiederm, alexei.starovoitov, rostedt, catalin.marinas,
	penguin-kernel, linux-mm, linux-fsdevel, linux-trace-kernel,
	audit, linux-security-module, selinux, bpf, netdev, dri-devel

On Tue, Aug 6, 2024 at 5:28 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 4 Aug 2024 at 00:56, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > There is a BUILD_BUG_ON() inside get_task_comm(), so when you use
> > get_task_comm(), it implies that the BUILD_BUG_ON() is necessary.
>
> Let's just remove that silly BUILD_BUG_ON(). I don't think it adds any
> value, and honestly, it really only makes this patch-series uglier
> when reasonable uses suddenly pointlessly need that double-underscore
> version..
>
> So let's aim at
>
>  (a) documenting that the last byte in 'tsk->comm{}' is always
> guaranteed to be NUL, so that the thing can always just be treated as
> a string. Yes, it may change under us, but as long as we know there is
> always a stable NUL there *somewhere*, we really really don't care.
>
>  (b) removing __get_task_comm() entirely, and replacing it with a
> plain 'str*cpy*()' functions
>
> The whole (a) thing is a requirement anyway, since the *bulk* of
> tsk->comm really just seems to be various '%s' things in printk
> strings etc.
>
> And once we just admit that we can use the string functions, all the
> get_task_comm() stuff is just unnecessary.
>
> And yes, some people may want to use the strscpy_pad() function
> because they want to fill the whole destination buffer. But that's
> entirely about the *destination* use, not the tsk->comm[] source, so
> it has nothing to do with any kind of "get_task_comm()" logic, and it
> was always wrong to care about the buffer sizes magically matching.
>
> Hmm?

One concern about removing the BUILD_BUG_ON() is that if we extend
TASK_COMM_LEN to a larger size, such as 24, the caller with a
hardcoded 16-byte buffer may overflow. This could be an issue with
code in include/linux/elfcore.h and include/linux/elfcore-compat.h,
posing a risk. However, I believe it is the caller's responsibility to
explicitly add a null terminator if it uses a fixed buffer that cannot
be changed. Therefore, the following code change is necessary:

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 5ae8045f4df4..e4b0b7cf0c1f 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1579,6 +1579,7 @@ static int fill_psinfo(struct elf_prpsinfo
*psinfo, struct task_struct *p,
        SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
        rcu_read_unlock();
        get_task_comm(psinfo->pr_fname, p);
+       psinfo->pr_fname[15] = '\0';

        return 0;
 }

However, it is currently safe to remove the BUILD_BUG_ON() since
TASK_COMM_LEN is still 16.

--
Regards
Yafang

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/9] Improve the copy of task comm
  2024-08-06  3:00   ` Yafang Shao
@ 2024-08-06  3:09     ` Linus Torvalds
  2024-08-06  3:50       ` Yafang Shao
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2024-08-06  3:09 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, ebiederm, alexei.starovoitov, rostedt, catalin.marinas,
	penguin-kernel, linux-mm, linux-fsdevel, linux-trace-kernel,
	audit, linux-security-module, selinux, bpf, netdev, dri-devel

On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@gmail.com> wrote:
>
> One concern about removing the BUILD_BUG_ON() is that if we extend
> TASK_COMM_LEN to a larger size, such as 24, the caller with a
> hardcoded 16-byte buffer may overflow.

No, not at all. Because get_task_comm() - and the replacements - would
never use TASK_COMM_LEN.

They'd use the size of the *destination*. That's what the code already does:

  #define get_task_comm(buf, tsk) ({                      \
  ...
        __get_task_comm(buf, sizeof(buf), tsk);         \

note how it uses "sizeof(buf)".

Now, it might be a good idea to also verify that 'buf' is an actual
array, and that this code doesn't do some silly "sizeof(ptr)" thing.

We do have a helper for that, so we could do something like

   #define get_task_comm(buf, tsk) \
        strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)

as a helper macro for this all.

(Although I'm not convinced we generally want the "_pad()" version,
but whatever).

                    Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/9] Improve the copy of task comm
  2024-08-06  3:09     ` Linus Torvalds
@ 2024-08-06  3:50       ` Yafang Shao
  0 siblings, 0 replies; 8+ messages in thread
From: Yafang Shao @ 2024-08-06  3:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: akpm, ebiederm, alexei.starovoitov, rostedt, catalin.marinas,
	penguin-kernel, linux-mm, linux-fsdevel, linux-trace-kernel,
	audit, linux-security-module, selinux, bpf, netdev, dri-devel

On Tue, Aug 6, 2024 at 11:10 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > One concern about removing the BUILD_BUG_ON() is that if we extend
> > TASK_COMM_LEN to a larger size, such as 24, the caller with a
> > hardcoded 16-byte buffer may overflow.
>
> No, not at all. Because get_task_comm() - and the replacements - would
> never use TASK_COMM_LEN.
>
> They'd use the size of the *destination*. That's what the code already does:
>
>   #define get_task_comm(buf, tsk) ({                      \
>   ...
>         __get_task_comm(buf, sizeof(buf), tsk);         \
>
> note how it uses "sizeof(buf)".
>
> Now, it might be a good idea to also verify that 'buf' is an actual
> array, and that this code doesn't do some silly "sizeof(ptr)" thing.
>
> We do have a helper for that, so we could do something like
>
>    #define get_task_comm(buf, tsk) \
>         strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)
>
> as a helper macro for this all.
>
> (Although I'm not convinced we generally want the "_pad()" version,
> but whatever).
>

Will do it.
Thanks for your explanation.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/9] Improve the copy of task comm
@ 2024-08-06 17:28 Alejandro Colomar
  2024-08-08  2:49 ` Yafang Shao
  0 siblings, 1 reply; 8+ messages in thread
From: Alejandro Colomar @ 2024-08-06 17:28 UTC (permalink / raw)
  To: torvalds
  Cc: akpm, alexei.starovoitov, audit, bpf, catalin.marinas, dri-devel,
	ebiederm, laoar.shao, linux-fsdevel, linux-mm,
	linux-security-module, linux-trace-kernel, netdev, penguin-kernel,
	rostedt, selinux, serge

[-- Attachment #1: Type: text/plain, Size: 5129 bytes --]

Hi Linus,

Serge let me know about this thread earlier today.

On 2024-08-05, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > One concern about removing the BUILD_BUG_ON() is that if we extend
> > TASK_COMM_LEN to a larger size, such as 24, the caller with a
> > hardcoded 16-byte buffer may overflow.
> 
> No, not at all. Because get_task_comm() - and the replacements - would
> never use TASK_COMM_LEN.
> 
> They'd use the size of the *destination*. That's what the code already does:
> 
>   #define get_task_comm(buf, tsk) ({                      \
>   ...
>         __get_task_comm(buf, sizeof(buf), tsk);         \
> 
> note how it uses "sizeof(buf)".

In shadow.git, we also implemented macros that are named after functions
and calculate the appropriate number of elements internally.

	$ grepc -h STRNCAT .
	#define STRNCAT(dst, src)  strncat(dst, src, NITEMS(src))
	$ grepc -h STRNCPY .
	#define STRNCPY(dst, src)  strncpy(dst, src, NITEMS(dst))
	$ grepc -h STRTCPY .
	#define STRTCPY(dst, src)  strtcpy(dst, src, NITEMS(dst))
	$ grepc -h STRFTIME .
	#define STRFTIME(dst, fmt, tm)  strftime(dst, NITEMS(dst), fmt, tm)
	$ grepc -h DAY_TO_STR .
	#define DAY_TO_STR(str, day, iso)   day_to_str(NITEMS(str), str, day, iso)

They're quite useful, and when implementing them we found and fixed
several bugs thanks to them.

> Now, it might be a good idea to also verify that 'buf' is an actual
> array, and that this code doesn't do some silly "sizeof(ptr)" thing.

I decided to use NITEMS() instead of sizeof() for that reason.
(NITEMS() is just our name for ARRAY_SIZE().)

	$ grepc -h NITEMS .
	#define NITEMS(a)            (SIZEOF_ARRAY((a)) / sizeof((a)[0]))

> We do have a helper for that, so we could do something like
> 
>    #define get_task_comm(buf, tsk) \
>         strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)

We have SIZEOF_ARRAY() for when you want the size of an array:

	$ grepc -h SIZEOF_ARRAY .
	#define SIZEOF_ARRAY(a)      (sizeof(a) + must_be_array(a))

However, I don't think you want sizeof().  Let me explain why:

-  Let's say you want to call wcsncpy(3) (I know nobody should be using
   that function, not strncpy(3), but I'm using it as a standard example
   of a wide-character string function).

   You should call wcsncpy(dst, src, NITEMS(dst)).
   A call wcsncpy(dst, src, sizeof(dst)) is bogus, since the argument is
   the number of wide characters, not the number of bytes.

   When translating that to normal characters, you want conceptually the
   same operation, but on (normal) characters.  That is, you want
   strncpy(dst, src, NITEMS(dst)).  While strncpy(3) with sizeof() works
   just fine because sizeof(char)==1 by definition, it is conceptually
   wrong to use it.

   By using NITEMS() (i.e., ARRAY_SIZE()), you get the __must_be_array()
   check for free.

In the end, SIZEOF_ARRAY() is something we very rarely use.  It's there
only used in the following two cases at the moment:

	#define NITEMS(a)            (SIZEOF_ARRAY((a)) / sizeof((a)[0]))
	#define MEMZERO(arr)  memzero(arr, SIZEOF_ARRAY(arr))

Does that sound convincing?

For memcpy(3) for example, you do want sizeof(), because you're copying
raw bytes, but with strings, in which characters are conceptually
meaningful elements, NITEMS() makes more sense.

BTW, I'm working on a __lengthof__ operator that will soon allow using
it on function parameters declared with array notation.  That is,

	size_t
	f(size_t n, int a[n])
	{
		return __lengthof__(a);  // This will return n.
	}

If you're interested in it, I'm developing and discussing it here:
<https://inbox.sourceware.org/gcc-patches/20240806122218.3827577-1-alx@kernel.org/>

> 
> as a helper macro for this all.
> 
> (Although I'm not convinced we generally want the "_pad()" version,
> but whatever).

We had problems with it in shadow recently.  In user-space, it's similar
to strncpy(3) (at least if you wrap it in a macro that makes sure that
it terminates the string with a null byte).

We had a lot of uses of strncpy(3), from old times where that was used
to copy strings with truncation.  I audited all of that code (and
haven't really finished yet), and translated to calls similar to
strscpy(9) (we call it strtcpy(), as it _t_runcates).  The problem was
that in some cases the padding was necessary, and in others it was not,
and it was very hard to distinguish those.

I recommend not zeroing strings unnecessarily, since that will make it
hard to review the code later.  E.g., twenty years from now, someone
takes a piece of code with a _pad() call, and has no clue if the zeroing
was for a reason, or for no reason.

On the other hand, not zeroing may make it easier to explot bugs, so
whatever you think best.  In the kernel you may need to be more worried
than in user space.  Whatever.  :)

> 
>                     Linus

Have a lovely day!
Alex


-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/9] Improve the copy of task comm
  2024-08-06 17:28 [PATCH v5 0/9] Improve the copy of task comm Alejandro Colomar
@ 2024-08-08  2:49 ` Yafang Shao
  2024-08-08  7:58   ` Alejandro Colomar
  0 siblings, 1 reply; 8+ messages in thread
From: Yafang Shao @ 2024-08-08  2:49 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: torvalds, akpm, alexei.starovoitov, audit, bpf, catalin.marinas,
	dri-devel, ebiederm, linux-fsdevel, linux-mm,
	linux-security-module, linux-trace-kernel, netdev, penguin-kernel,
	rostedt, selinux, serge

On Wed, Aug 7, 2024 at 1:28 AM Alejandro Colomar <alx@kernel.org> wrote:
>
> Hi Linus,
>
> Serge let me know about this thread earlier today.
>
> On 2024-08-05, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > One concern about removing the BUILD_BUG_ON() is that if we extend
> > > TASK_COMM_LEN to a larger size, such as 24, the caller with a
> > > hardcoded 16-byte buffer may overflow.
> >
> > No, not at all. Because get_task_comm() - and the replacements - would
> > never use TASK_COMM_LEN.
> >
> > They'd use the size of the *destination*. That's what the code already does:
> >
> >   #define get_task_comm(buf, tsk) ({                      \
> >   ...
> >         __get_task_comm(buf, sizeof(buf), tsk);         \
> >
> > note how it uses "sizeof(buf)".
>
> In shadow.git, we also implemented macros that are named after functions
> and calculate the appropriate number of elements internally.
>
>         $ grepc -h STRNCAT .
>         #define STRNCAT(dst, src)  strncat(dst, src, NITEMS(src))
>         $ grepc -h STRNCPY .
>         #define STRNCPY(dst, src)  strncpy(dst, src, NITEMS(dst))
>         $ grepc -h STRTCPY .
>         #define STRTCPY(dst, src)  strtcpy(dst, src, NITEMS(dst))
>         $ grepc -h STRFTIME .
>         #define STRFTIME(dst, fmt, tm)  strftime(dst, NITEMS(dst), fmt, tm)
>         $ grepc -h DAY_TO_STR .
>         #define DAY_TO_STR(str, day, iso)   day_to_str(NITEMS(str), str, day, iso)
>
> They're quite useful, and when implementing them we found and fixed
> several bugs thanks to them.
>
> > Now, it might be a good idea to also verify that 'buf' is an actual
> > array, and that this code doesn't do some silly "sizeof(ptr)" thing.
>
> I decided to use NITEMS() instead of sizeof() for that reason.
> (NITEMS() is just our name for ARRAY_SIZE().)
>
>         $ grepc -h NITEMS .
>         #define NITEMS(a)            (SIZEOF_ARRAY((a)) / sizeof((a)[0]))
>
> > We do have a helper for that, so we could do something like
> >
> >    #define get_task_comm(buf, tsk) \
> >         strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)
>
> We have SIZEOF_ARRAY() for when you want the size of an array:
>
>         $ grepc -h SIZEOF_ARRAY .
>         #define SIZEOF_ARRAY(a)      (sizeof(a) + must_be_array(a))

There is already a similar macro in Linux:

  /**
   * ARRAY_SIZE - get the number of elements in array @arr
   * @arr: array to be sized
   */
  #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) +
__must_be_array(arr))

will use it instead of the sizeof().

>
> However, I don't think you want sizeof().  Let me explain why:
>
> -  Let's say you want to call wcsncpy(3) (I know nobody should be using
>    that function, not strncpy(3), but I'm using it as a standard example
>    of a wide-character string function).
>
>    You should call wcsncpy(dst, src, NITEMS(dst)).
>    A call wcsncpy(dst, src, sizeof(dst)) is bogus, since the argument is
>    the number of wide characters, not the number of bytes.
>
>    When translating that to normal characters, you want conceptually the
>    same operation, but on (normal) characters.  That is, you want
>    strncpy(dst, src, NITEMS(dst)).  While strncpy(3) with sizeof() works
>    just fine because sizeof(char)==1 by definition, it is conceptually
>    wrong to use it.
>
>    By using NITEMS() (i.e., ARRAY_SIZE()), you get the __must_be_array()
>    check for free.
>
> In the end, SIZEOF_ARRAY() is something we very rarely use.  It's there
> only used in the following two cases at the moment:
>
>         #define NITEMS(a)            (SIZEOF_ARRAY((a)) / sizeof((a)[0]))
>         #define MEMZERO(arr)  memzero(arr, SIZEOF_ARRAY(arr))
>
> Does that sound convincing?
>
> For memcpy(3) for example, you do want sizeof(), because you're copying
> raw bytes, but with strings, in which characters are conceptually
> meaningful elements, NITEMS() makes more sense.
>
> BTW, I'm working on a __lengthof__ operator that will soon allow using
> it on function parameters declared with array notation.  That is,
>
>         size_t
>         f(size_t n, int a[n])
>         {
>                 return __lengthof__(a);  // This will return n.
>         }
>
> If you're interested in it, I'm developing and discussing it here:
> <https://inbox.sourceware.org/gcc-patches/20240806122218.3827577-1-alx@kernel.org/>
>
> >
> > as a helper macro for this all.
> >
> > (Although I'm not convinced we generally want the "_pad()" version,
> > but whatever).
>
> We had problems with it in shadow recently.  In user-space, it's similar
> to strncpy(3) (at least if you wrap it in a macro that makes sure that
> it terminates the string with a null byte).
>
> We had a lot of uses of strncpy(3), from old times where that was used
> to copy strings with truncation.  I audited all of that code (and
> haven't really finished yet), and translated to calls similar to
> strscpy(9) (we call it strtcpy(), as it _t_runcates).  The problem was
> that in some cases the padding was necessary, and in others it was not,
> and it was very hard to distinguish those.
>
> I recommend not zeroing strings unnecessarily, since that will make it
> hard to review the code later.  E.g., twenty years from now, someone
> takes a piece of code with a _pad() call, and has no clue if the zeroing
> was for a reason, or for no reason.
>
> On the other hand, not zeroing may make it easier to explot bugs, so
> whatever you think best.  In the kernel you may need to be more worried
> than in user space.  Whatever.  :)

Good point.
I will avoid using the _pad().

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/9] Improve the copy of task comm
  2024-08-08  2:49 ` Yafang Shao
@ 2024-08-08  7:58   ` Alejandro Colomar
  0 siblings, 0 replies; 8+ messages in thread
From: Alejandro Colomar @ 2024-08-08  7:58 UTC (permalink / raw)
  To: Yafang Shao
  Cc: torvalds, akpm, alexei.starovoitov, audit, bpf, catalin.marinas,
	dri-devel, ebiederm, linux-fsdevel, linux-mm,
	linux-security-module, linux-trace-kernel, netdev, penguin-kernel,
	rostedt, selinux, serge

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

Hi Yafang,

On Thu, Aug 08, 2024 at 10:49:17AM GMT, Yafang Shao wrote:
> > > Now, it might be a good idea to also verify that 'buf' is an actual
> > > array, and that this code doesn't do some silly "sizeof(ptr)" thing.
> >
> > I decided to use NITEMS() instead of sizeof() for that reason.
> > (NITEMS() is just our name for ARRAY_SIZE().)
> >
> >         $ grepc -h NITEMS .
> >         #define NITEMS(a)            (SIZEOF_ARRAY((a)) / sizeof((a)[0]))
> >
> > > We do have a helper for that, so we could do something like
> > >
> > >    #define get_task_comm(buf, tsk) \
> > >         strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)
> >
> > We have SIZEOF_ARRAY() for when you want the size of an array:
> >
> >         $ grepc -h SIZEOF_ARRAY .
> >         #define SIZEOF_ARRAY(a)      (sizeof(a) + must_be_array(a))
> 
> There is already a similar macro in Linux:
> 
>   /**
>    * ARRAY_SIZE - get the number of elements in array @arr
>    * @arr: array to be sized
>    */
>   #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) +
> __must_be_array(arr))

This is actually the same as our NITEMS(), not SIZEOF_ARRAY().

> will use it instead of the sizeof().

But yeah, indeed I think you should use ARRAY_SIZE() in
get_task_comm().  :)

> 
> Good point.
> I will avoid using the _pad().

Nice.  :)

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-08-08  7:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-06 17:28 [PATCH v5 0/9] Improve the copy of task comm Alejandro Colomar
2024-08-08  2:49 ` Yafang Shao
2024-08-08  7:58   ` Alejandro Colomar
  -- strict thread matches above, loose matches on Subject: below --
2024-08-04  7:56 Yafang Shao
2024-08-05 21:28 ` Linus Torvalds
2024-08-06  3:00   ` Yafang Shao
2024-08-06  3:09     ` Linus Torvalds
2024-08-06  3:50       ` Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).