From: David Laight <david.laight.linux@gmail.com>
To: "André Almeida" <andrealmeid@igalia.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Steven Rostedt <rostedt@goodmis.org>,
Christian Brauner <brauner@kernel.org>,
Kees Cook <kees@kernel.org>, Shuah Khan <shuah@kernel.org>,
willy@infradead.org, mathieu.desnoyers@efficios.com,
Linus Torvalds <torvalds@linux-foundation.org>,
akpm@linux-foundation.org, Yafang Shao <laoar.shao@gmail.com>,
andrii.nakryiko@gmail.com, arnaldo.melo@gmail.com,
Petr Mladek <pmladek@suse.com>,
linux-kernel@vger.kernel.org, kernel-dev@igalia.com,
linux-mm@kvack.org, linux-api@vger.kernel.org
Subject: Re: [PATCH 3/6] string: Introduce strtostr() for safe and performance string copies
Date: Mon, 18 May 2026 19:38:43 +0100 [thread overview]
Message-ID: <20260518193843.7bde8d53@pumpkin> (raw)
In-Reply-To: <d4d6cf61-568e-478e-88d6-01b769d7eded@igalia.com>
On Mon, 18 May 2026 11:36:49 -0300
André Almeida <andrealmeid@igalia.com> wrote:
> Hi David, thanks for the feedback!
>
> Em 17/05/2026 18:34, David Laight escreveu:
> > On Sun, 17 May 2026 15:36:13 -0300
> > André Almeida <andrealmeid@igalia.com> wrote:
> >
> >> Some parts of the kernel uses memcpy() instead of strscpy() because they
> >> are performance sensitive and doesn't care about the return value of
> >> strscpy(). One such common case is to copy current->comm to a different
> >> buffer.
> >>
> >> As the command name is guaranteed to be NUL-terminated in the range of
> >> TASK_COMM_LEN, this is safe enough and doesn't create unterminated
> >> strings. However, in order to expand the size of current->comm, this
> >> expectation will be broken and those memcpy() could create such strings
> >> without trailing NUL byte.
> >>
> >> In order to support a fast and safe string copy, create strtostr(), to copy
> >> a NUL-terminated string to a new string buffer. If the destination buffer
> >> is bigger than the source, no pad is applied, but the string is
> >> NUL-terminated. If the destination buffer is smaller, the string is
> >> truncated. The last byte of the destination is always set to NUL for safety.
> >>
> >> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> >> ---
> [...]>> +/**
> >> + * strtostr - Copy NUL-terminanted string to NUL-terminate string
> >> + *
> >> + * @dest: Pointer of destination string
> >> + * @src: Pointer to NUL-terminates string
> >> + *
> >> + * This is a replacement for strcpy() where the caller doesn't care about the
> >> + * return value and if the string is going to be truncated, albeit it needs
> >> + * to mark sure that it will be NUL-terminated. Intended for performance
> >> + * sensitive cases, such as tracing.
> >
> > If you care about performance, and the destination isn't smaller (especially
> > if the sizes are the same) then just use memcpy().
> >
>
> The problem is that as I'm expanding current->comm, the source buffer
> might be bigger than destination, and when we truncate the string, it
> won't have the termination NUL byte. So we need an extra dest[len-1] =
> \0 after the memcpy.
It depends on other access to the destination.
If it might be being concurrently read it is vital that it is always
terminated.
So you can't even temporarily have a non-zero byte at the end.
>
> >> + *
> >> + * If the destination is bigger than the source, no padding happens. It it's
> >> + * smaller the strings gets truncated.
> >> + *
> >> + * Both arguments needs to be arrays with lengths discoverable by the compiler.
> >> + */
> >> +#define strtostr(dest, src) do { \
> >> + const size_t _dest_len = __must_be_cstr(dest) + \
> >> + ARRAY_SIZE(dest); \
> >> + const size_t _src_len = __must_be_cstr(src) + \
> >> + __builtin_object_size(src, 1); \
> >> + \
> >> + BUILD_BUG_ON(!__builtin_constant_p(_dest_len) || \
> >> + _dest_len == (size_t)-1); \
> >> + memcpy(dest, src, strnlen(src, min(_src_len, _dest_len))); \
> >> + dest[_dest_len - 1] = '\0'; \
> >> +} while (0)
> >
> > That doesn't work (for all sorts of reasons).
> > _dest_len can be the size of a pointer - no array check.
> > You need to use __is_array() and sizeof () for both dest and src.
> > You might have meant to check that _src_len is constant, not _dest_len.
> > You must not leave the destination unterminated.
> >
> > __builtin_object_size(x->y,1) is also entirely useless!
> > If you have a pointer to a structure that ends in an array then the
> > object size of that array is SIZE_MAX (as if the array continues past
> > the end of the structure).
> > See https://godbolt.org/z/csenjfvxe (which I happened to prepare earlier today).
> >
> > __builtin_object_size(x->y,0) also seems to always return SIZE_MAX.
> > You do get a sane answer for (x->y,3) on recent clang - but nowhere else.
> >
>
> Oops, you are right, thanks for pointing that out. This is how it would
> look like checking that both args are arrays and using sizeof to get
> their length, if it sounds good I can apply for the v2:
>
> #define strtostr(dest, src) do { \
> const size_t _dest_len = __must_be_array(dest) + \
> sizeof(dest); \
> const size_t _src_len = __must_be_array(src) + \
> sizeof(src); \
> \
> BUILD_BUG_ON(!__builtin_constant_p(_dest_len) || \
> _dest_len == (size_t)-1); \
That test can never fail.
> memcpy(dest, src, min(_src_len, _dest_len))); \
> dest[_dest_len - 1] = '\0'; \
You are expending 'dest' twice.
Where it (p++)->array then the two values would be different and the final
value of 'p' incorrect.
Much better to assign both pointers to local variables.
Here you can use their required types to get type checking (I wouldn't bother
about the extra checks that _must_be_cstr() does).
I'd also create function that is explicitly for copying process names.
(Or replace the one that is already there - saves a lot of churn.)
then you know (and can check) the sizes are the expected ones.
It might even be worth making the #define (needed to get the array sizes)
call out to different functions for the different cases.
Thinks more...
On 64bit the 16 byte copy can be 'load; store; load; mask; store' provided
the buffer is aligned (copying u64 on 32bit will work the same).
But that requires that all the buffers be aligned.
So you'd need to check _Alignof(dest) >= _Alignof(u64) as well.
(Probably with a fallback to get things to compile.)
Whether that is best for the longer 64 byte copy is anybodies guess.
I also suspect it would be best to zero fill when copying a 16 byte
name into a 64 byte buffer.
(If you zero fill first then you can just copy 16 bytes over.)
-- David
> } while (0)
>
>
> > -- David
> >
> >
>
next prev parent reply other threads:[~2026-05-18 18:38 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-17 18:36 [PATCH 0/6] sched: Add support for long task name André Almeida
2026-05-17 18:36 ` [PATCH 1/6] sched: Update get_task_comm() comment André Almeida
2026-05-17 18:36 ` [PATCH 2/6] treewide: Get rid of get_task_comm() André Almeida
2026-05-17 18:36 ` [PATCH 3/6] string: Introduce strtostr() for safe and performance string copies André Almeida
2026-05-17 21:34 ` David Laight
[not found] ` <d4d6cf61-568e-478e-88d6-01b769d7eded@igalia.com>
2026-05-18 18:38 ` David Laight [this message]
2026-05-19 19:47 ` André Almeida
2026-05-20 9:53 ` David Laight
2026-05-19 20:37 ` Linus Torvalds
2026-05-19 20:46 ` André Almeida
2026-05-17 18:36 ` [PATCH 4/6] sched: Extend task command name to 64 bytes André Almeida
2026-05-17 18:36 ` [PATCH 5/6] prctl: Add support for long user thread names André Almeida
2026-05-17 18:36 ` [PATCH 6/6] selftests: prctl: Add test for long " André Almeida
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260518193843.7bde8d53@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andrealmeid@igalia.com \
--cc=andrii.nakryiko@gmail.com \
--cc=arnaldo.melo@gmail.com \
--cc=brauner@kernel.org \
--cc=juri.lelli@redhat.com \
--cc=kees@kernel.org \
--cc=kernel-dev@igalia.com \
--cc=laoar.shao@gmail.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=shuah@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vincent.guittot@linaro.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox