* swab.3: mention UB when from and to overlap
@ 2025-10-11 22:18 Sertonix
2025-10-11 22:40 ` Collin Funk
0 siblings, 1 reply; 5+ messages in thread
From: Sertonix @ 2025-10-11 22:18 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man
The current swab.3 page doesn't seem to mention anything about what
happens when from and to overlap. In POSIX any overlap is UB.
glibc handles the case from == to, but it will choke when, for example,
from == to+1. I am uncertain whether from == to is meant to be a feature.
If it is, would it be possible to mention that overlap is only safe when
from == to and the libc is glibc (not, e.g., musl)? If it's not intended,
would it be possible to include the same information as in POSIX?
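
To make the overlap question concrete, here is a minimal sketch (not
taken from the swab.3 page) of two ways to stay within what POSIX
defines: call swab(3) with distinct buffers, or swap in place by hand
instead of passing from == to. The helper names swab_copy() and
swab_inplace() are hypothetical, chosen only for illustration.

```c
#define _XOPEN_SOURCE 700   /* exposes swab() in <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: swap adjacent bytes of src into dst.
 * The two buffers must not overlap, as POSIX requires for swab(3). */
static void
swab_copy(const char *restrict src, char *restrict dst, size_t n)
{
	/* Cheap sanity check; it catches only exact aliasing,
	 * not a partial overlap such as src == dst + 1. */
	assert(src != dst);
	swab(src, dst, (ssize_t) n);
}

/* Hypothetical in-place variant that never calls swab(3) with
 * aliased pointers; a trailing odd byte is left untouched. */
static void
swab_inplace(char *buf, size_t n)
{
	for (size_t i = 0; i + 1 < n; i += 2) {
		char tmp = buf[i];

		buf[i] = buf[i + 1];
		buf[i + 1] = tmp;
	}
}
```

With this, code that needs an in-place swap does not have to rely on
how a particular libc happens to behave when from == to.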
* Re: swab.3: mention UB when from and to overlap
From: Collin Funk @ 2025-10-11 22:40 UTC (permalink / raw)
To: Sertonix; +Cc: Alejandro Colomar, linux-man
"Sertonix" <sertonix@posteo.net> writes:
> The current swab.3 page doesn't seem to mention anything about what
> happens when from and to overlap. In POSIX any overlap is UB.
>
> glibc handles cases when from == to but it will choke when for example
> from == to+1. I am uncertain if from == to is meant to be a feature.
>
> If it is, would it be possible to mention that overlap is only safe when
> from == to and it's glibc (not eg. musl)? If it's not intended would it
> be possible to include the same information as in POSIX?
The prototype uses restrict for both pointers which is how you tell the
C compiler that two objects will not overlap.
But maybe it is better to be friendly to those new to C and state it
explicitly? Alex will know better than I.
Collin
* Re: swab.3: mention UB when from and to overlap
From: Alejandro Colomar @ 2025-10-12 0:02 UTC (permalink / raw)
To: Collin Funk; +Cc: Sertonix, linux-man
Hi Sertonix, Collin,
On Sat, Oct 11, 2025 at 03:40:34PM -0700, Collin Funk wrote:
> "Sertonix" <sertonix@posteo.net> writes:
>
> > The current swab.3 page doesn't seem to mention anything about what
> > happens when from and to overlap. In POSIX any overlap is UB.
> >
> > glibc handles cases when from == to but it will choke when for example
> > from == to+1. I am uncertain if from == to is meant to be a feature.
> >
> > If it is, would it be possible to mention that overlap is only safe when
> > from == to and it's glibc (not eg. musl)? If it's not intended would it
> > be possible to include the same information as in POSIX?
>
> The prototype uses restrict for both pointers which is how you tell the
> C compiler that two objects will not overlap.
As Collin says, 'restrict' is there to document this.
The 'restrict' keyword (theoretically, a qualifier, but it works more
like an attribute) is difficult to explain (and the wording of the
standard to describe it is quite difficult to follow). However, the
core idea is simple: nothing should overlap such a pointer.
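
As a rough illustration of that core idea (a sketch with a hypothetical
function, add_into(), not something from the manual pages), 'restrict'
on both parameters is a promise made by the caller that the two arrays
do not overlap during the call:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function: 'restrict' promises the compiler that dst
 * and src do not refer to overlapping memory during the call. */
static void
add_into(int *restrict dst, const int *restrict src, size_t n)
{
	/* Catches only exact aliasing, not partial overlap. */
	assert(dst != src);

	for (size_t i = 0; i < n; i++)
		dst[i] += src[i];
}
```

A call such as add_into(a, a + 1, 3) would break that promise and has
undefined behavior, even if the compiler stays silent about it.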
There are exceptions, such as the case when a function doesn't access
such a pointer. That's why strtol(3) is declared as
	long strtol(const char *restrict s, char **restrict endp, int base);
even though one can (and usually does) call it as strtol(s, &s, 0), where
's' is indeed aliased by another pointer (*endp). That's because *endp
is never accessed within strtol(3).
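
A small sketch of that calling pattern, assuming a hypothetical helper
parse_next() that takes a pointer to a non-const 'char *' (so that no
cast is needed): the first argument and *endp hold the same address
when strtol(3) is entered, and the call is still valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse one integer at *sp and advance *sp past
 * it.  This is the strtol(s, &s, 0) pattern: the first argument and
 * *endp hold the same address when the call starts. */
static long
parse_next(char **sp)
{
	assert(sp != NULL && *sp != NULL);

	return strtol(*sp, sp, 0);
}
```

Calling it repeatedly on the same pointer walks through a string of
whitespace-separated numbers, one per call.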
This is somewhat unfortunate, as it doesn't allow the compiler to
diagnose bad calls to restrict functions, as the compiler isn't able to
know if the pointer is accessed or not, and thus it doesn't know if
the call is valid or not. One could use the [[gnu::access()]] attribute
to give the compiler some extra information, which would allow it to
diagnose.
Const-correctness would also help, but strtol(3) in particular can't be
const-correct, because of the issue with pointers to pointers to const.
A better API would be a const-generic macro:
	long strtol(QChar *restrict s, QChar **restrict endp, int base);
(See the C23 standard for the meaning of QChar.) When designing new
APIs, we should make sure to not make this mistake. But with old APIs,
we're stuck with this problem. I'm working on a replacement of
strtol(3), which would allow adding a diagnostic in the compiler. It
would trigger on valid uses of strtol(3), but people will be able to
either switch to the new API, or turn off the diagnostic.
> But maybe it is better to be friendly to those new to C and state it
> explicitly? Alex will know better than I.
I hope we don't. That would require a CAVEATS section in too many
pages.
	$ grep -rl '\<restrict\>' man | wc -l
	156
While the detail about strtol(3) is tricky, and could be worthy of
documentation (most likely), the general idea behind 'restrict' is
quite easy to understand, IMO, and I think documenting it in plain
English would be too much. Especially, since 'restrict' is a C99
keyword, so I expect it to be more or less common knowledge.
Have a lovely night!
Alex
--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
* Re: swab.3: mention UB when from and to overlap
From: Sertonix @ 2025-10-12 10:42 UTC (permalink / raw)
To: Alejandro Colomar, Collin Funk; +Cc: Sertonix, linux-man
On Sun Oct 12, 2025 at 2:02 AM CEST, Alejandro Colomar wrote:
> Hi Sertonix, Collin,
>
> On Sat, Oct 11, 2025 at 03:40:34PM -0700, Collin Funk wrote:
>> "Sertonix" <sertonix@posteo.net> writes:
>>
>> > The current swab.3 page doesn't seem to mention anything about what
>> > happens when from and to overlap. In POSIX any overlap is UB.
>> >
>> > glibc handles cases when from == to but it will choke when for example
>> > from == to+1. I am uncertain if from == to is meant to be a feature.
>> >
>> > If it is, would it be possible to mention that overlap is only safe when
>> > from == to and it's glibc (not eg. musl)? If it's not intended would it
>> > be possible to include the same information as in POSIX?
>>
>> The prototype uses restrict for both pointers which is how you tell the
>> C compiler that two objects will not overlap.
>
> As Collin says, 'restrict' is there to document this.
>
> The 'restrict' keyword (theoretically, a qualifier, but it works more
> like an attribute) is difficult to explain (and the wording of the
> standard to describe it is quite difficult to follow). However, the
> core idea is simple: nothing should overlap such a pointer.
Thanks for the explanation! Then this was just my lack of understanding
of the restrict keyword.
> There are exceptions, such as the case when a function doesn't access
> such a pointer. That's why strtol(3) is declared as
>
> long strtol(const char *restrict s, char **restrict endp, int base);
>
> even though one can (and usually does) call it as strtol(s, &s, 0), where
> 's' is indeed aliased by another pointer (*endp). That's because *endp
> is never accessed within strtol(3).
If endp is considered to point to a 0 size block of memory it works ;)
>> But maybe it is better to be friendly to those new to C and state it
>> explicitly? Alex will know better than I.
>
> I hope we don't. That would require a CAVEATS section in too many
> pages.
>
> $ grep -rl '\<restrict\>' man | wc -l
> 156
>
> While the detail about strtol(3) is tricky, and could be worthy of
> documentation (most likely), the general idea behind 'restrict' is
> quite easy to understand, IMO, and I think documenting it in plain
> English would be too much. Especially, since 'restrict' is a C99
> keyword, so I expect it to be more or less common knowledge.
It seems fair to not repeat the meaning of restrict everywhere.
* Re: swab.3: mention UB when from and to overlap
From: Alejandro Colomar @ 2025-10-12 11:04 UTC (permalink / raw)
To: Sertonix; +Cc: Collin Funk, linux-man
Hi Sertonix,
On Sun, Oct 12, 2025 at 10:42:45AM +0000, Sertonix wrote:
> > There are exceptions, such as the case when a function doesn't access
> > such a pointer. That's why strtol(3) is declared as
> >
> > long strtol(const char *restrict s, char **restrict endp, int base);
> >
> > even though one can (and usually does) call it as strtol(s, &s, 0), where
> > 's' is indeed aliased by another pointer (*endp). That's because *endp
> > is never accessed within strtol(3).
>
> If endp is considered to point to a 0 size block of memory it works ;)
Actually, it must be considered to point to a non-zero-size block, because
strtol(3) dereferences 'endp' and writes to '*endp'. What is never
dereferenced is '*endp' itself, which is why it doesn't matter what it
points to.
This is how it should be declared (I wonder why glibc doesn't use the
[[gnu::access()]] attribute):
	[[gnu::access(read_only, 1)]]
	[[gnu::access(write_only, 2)]]
	[[gnu::null_terminated_string_arg(1)]]
	[[gnu::leaf]]
	[[gnu::nothrow]]
	long
	strtol(const char *restrict s, char **restrict endp, int base);
This gives enough information to the compiler to realize that a call
strtol(s, &s, 0) is safe regardless of restrict, since the second
argument won't be used to modify the string (but it will change where
's' points to after the call).
However, actually using that information in the compiler won't be easy,
I suspect. That's why diagnostics about restrict are bad today.
(Reminder to self: I need to write a gnu::access(3attr) manual page
where I should explain this.)
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).