* swab.3: mention UB when from and to overlap
From: Sertonix @ 2025-10-11 22:18 UTC
To: Alejandro Colomar; +Cc: linux-man

The current swab.3 page doesn't seem to mention anything about what
happens when from and to overlap. In POSIX any overlap is UB.

glibc handles cases when from == to but it will choke when, for example,
from == to+1. I am uncertain if from == to is meant to be a feature.

If it is, would it be possible to mention that overlap is only safe when
from == to and it's glibc (not e.g. musl)? If it's not intended, would it
be possible to include the same information as in POSIX?

* Re: swab.3: mention UB when from and to overlap
From: Collin Funk @ 2025-10-11 22:40 UTC
To: Sertonix; +Cc: Alejandro Colomar, linux-man

"Sertonix" <sertonix@posteo.net> writes:

> The current swab.3 page doesn't seem to mention anything about what
> happens when from and to overlap. In POSIX any overlap is UB.
>
> glibc handles cases when from == to but it will choke when for example
> from == to+1. I am uncertain if from == to is meant to be a feature.
>
> If it is, would it be possible to mention that overlap is only safe when
> from == to and it's glibc (not eg. musl)? If it's not intended would it
> be possible to include the same information as in POSIX?

The prototype uses restrict for both pointers, which is how you tell the
C compiler that two objects will not overlap.

But maybe it is better to be friendly to those new to C and state it
explicitly? Alex will know better than I.

Collin

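To illustrate the non-overlap requirement discussed in the two messages
above, here is a minimal sketch in C. It assumes a hypothetical helper,
swab_inplace() (not part of any library), that swaps adjacent bytes of a
buffer by going through a scratch copy, so the two arguments passed to
swab(3) never overlap. The 256-byte scratch size is an arbitrary choice
for the example.

```c
#define _XOPEN_SOURCE 700   /* ask <unistd.h> to declare swab() */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration only):
 * swap adjacent bytes of buf "in place" without ever handing
 * overlapping buffers to swab(3).  n must be even and must fit
 * in the scratch buffer.
 */
void
swab_inplace(unsigned char *buf, size_t n)
{
	unsigned char tmp[256];   /* arbitrary scratch size */

	assert(n % 2 == 0 && n <= sizeof(tmp));

	memcpy(tmp, buf, n);          /* tmp and buf are distinct objects */
	swab(tmp, buf, (ssize_t) n);  /* from and to do not overlap */
}
```

The extra copy costs some time, but it keeps the call within what the
restrict-qualified prototype promises, regardless of which libc is in use.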
* Re: swab.3: mention UB when from and to overlap
From: Alejandro Colomar @ 2025-10-12 0:02 UTC
To: Collin Funk; +Cc: Sertonix, linux-man

Hi Sertonix, Collin,

On Sat, Oct 11, 2025 at 03:40:34PM -0700, Collin Funk wrote:
> "Sertonix" <sertonix@posteo.net> writes:
>
> > The current swab.3 page doesn't seem to mention anything about what
> > happens when from and to overlap. In POSIX any overlap is UB.
> >
> > glibc handles cases when from == to but it will choke when for example
> > from == to+1. I am uncertain if from == to is meant to be a feature.
> >
> > If it is, would it be possible to mention that overlap is only safe when
> > from == to and it's glibc (not eg. musl)? If it's not intended would it
> > be possible to include the same information as in POSIX?
>
> The prototype uses restrict for both pointers which is how you tell the
> C compiler that two objects will not overlap.

As Collin says, 'restrict' is there to document this.

The 'restrict' keyword (theoretically, a qualifier, but it works more
like an attribute) is difficult to explain (and the wording of the
standard to describe it is quite difficult to follow). However, the
core idea is simple: nothing should overlap such a pointer.

There are exceptions, such as the case when a function doesn't access
such a pointer. That's why strtol(3) is declared as

	long strtol(const char *restrict s, char **restrict endp, int base);

even though one can (and usually does) call it as strtol(s, &s, 0), where
's' is indeed aliased by another pointer (*endp). That's because *endp
is never accessed within strtol(3).

This is somewhat unfortunate, as it doesn't allow the compiler to
diagnose bad calls to restrict functions: the compiler isn't able to
know whether the pointer is accessed, and thus it doesn't know whether
the call is valid.

One could use the [[gnu::access()]] attribute to give the compiler some
extra information, which would allow it to diagnose. Const-correctness
would also help, but strtol(3) is precisely a function that can't be
const-correct, because of the issue with pointers to pointers to const.
A better API would be a const-generic macro:

	long strtol(QChar *restrict s, QChar **restrict endp, int base);

(See the C23 standard for the meaning of QChar.)

When designing new APIs, we should make sure not to make this mistake.
But with old APIs, we're stuck with this problem.

I'm working on a replacement for strtol(3), which would allow adding
a diagnostic in the compiler. It would trigger on valid uses of
strtol(3), but people will be able to either switch to the new API or
turn off the diagnostic.

> But maybe it is better to be friendly to those new to see and state it
> explicitly? Alex will know better than I.

I hope we don't. That would require a CAVEATS section in too many
pages.

	$ grep -rl '\<restrict\>' man | wc -l
	156

While the detail about strtol(3) is tricky, and could be worthy of
documentation (most likely), the general idea behind 'restrict' is
quite easy to understand, IMO, and I think documenting it in plain
English would be too much. Especially since 'restrict' is a C99
keyword, so I expect it to be more or less common knowledge.

Have a lovely night!
Alex

--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

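The strtol(s, &s, 0) calling pattern described in the message above can
be sketched as follows. This is only an illustration: sum_longs() is a
hypothetical helper, not an existing API, and it takes a non-const
char * because &s has to be a char **, which is the const-correctness
limitation the message mentions.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption for illustration only):
 * sum the whitespace-separated integers found in str.
 */
long
sum_longs(char *str)
{
	long  sum = 0;
	char  *s = str;

	assert(str != NULL);

	for (;;) {
		char  *prev = s;

		/*
		 * 's' is passed as the string and '&s' as endp.
		 * strtol(3) stores the end position into 's' itself,
		 * so the loop continues after the parsed number.
		 */
		long  v = strtol(s, &s, 0);

		if (s == prev)   /* no digits consumed: stop */
			break;
		sum += v;
	}
	return sum;
}
```

With base 0, strtol(3) also accepts hexadecimal and octal prefixes, so
an input such as "1 2 0x10" is parsed as three numbers.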
* Re: swab.3: mention UB when from and to overlap
From: Sertonix @ 2025-10-12 10:42 UTC
To: Alejandro Colomar, Collin Funk; +Cc: Sertonix, linux-man

On Sun Oct 12, 2025 at 2:02 AM CEST, Alejandro Colomar wrote:
> Hi Sertonix, Collin,
>
> On Sat, Oct 11, 2025 at 03:40:34PM -0700, Collin Funk wrote:
>> "Sertonix" <sertonix@posteo.net> writes:
>>
>> > The current swab.3 page doesn't seem to mention anything about what
>> > happens when from and to overlap. In POSIX any overlap is UB.
>> >
>> > glibc handles cases when from == to but it will choke when for example
>> > from == to+1. I am uncertain if from == to is meant to be a feature.
>> >
>> > If it is, would it be possible to mention that overlap is only safe when
>> > from == to and it's glibc (not eg. musl)? If it's not intended would it
>> > be possible to include the same information as in POSIX?
>>
>> The prototype uses restrict for both pointers which is how you tell the
>> C compiler that two objects will not overlap.
>
> As Collin says, 'restrict' is there to document this.
>
> The 'restrict' keyword (theoretically, a qualifier, but it works more
> like an attribute) is difficult to explain (and the wording of the
> standard to describe it is quite difficult to follow). However, the
> core idea is simple: nothing should overlap such a pointer.

Thanks for the explanation! Then this was just my lack of understanding
the restrict keyword.

> There are exceptions, such as the case when a function doesn't access
> such a pointer. That's why strtol(3) is declared as
>
> long strtol(const char *restrict s, char **restrict endp, int base);
>
> even though one can (and usually do) call it as strtol(s, &s, 0), where
> 's' is indeed aliased by another pointer (*endp). That's because *endp
> is never accessed within strtol(3).

If endp is considered to point to a 0 size block of memory it works ;)

>> But maybe it is better to be friendly to those new to see and state it
>> explicitly? Alex will know better than I.
>
> I hope we don't. That would require a CAVEATS section in too many
> pages.
>
> $ grep -rl '\<restrict\>' man | wc -l
> 156
>
> While the detail about strtol(3) is tricky, and could be worthy of
> documentation (most likely), the general idea behind 'restrict' is
> quite easy to understand, IMO, and I think documenting it in plain
> English would be too much. Especially, since 'restrict' is a C99
> keyword, so I expect it to be more or less common knowledge.

It seems fair to not repeat the meaning of restrict everywhere.

* Re: swab.3: mention UB when from and to overlap
From: Alejandro Colomar @ 2025-10-12 11:04 UTC
To: Sertonix; +Cc: Collin Funk, linux-man

Hi Sertonix,

On Sun, Oct 12, 2025 at 10:42:45AM +0000, Sertonix wrote:
> > There are exceptions, such as the case when a function doesn't access
> > such a pointer. That's why strtol(3) is declared as
> >
> > long strtol(const char *restrict s, char **restrict endp, int base);
> >
> > even though one can (and usually do) call it as strtol(s, &s, 0), where
> > 's' is indeed aliased by another pointer (*endp). That's because *endp
> > is never accessed within strtol(3).
>
> If endp is considered to point to a 0 size block of memory it works ;)

Actually, it must be considered to point to a non-0 size block, because
strtol(3) accesses 'endp' and writes to it. It is '*endp' which is not
accessed, which is why it doesn't matter what it points to.

This is how it should be declared (I wonder why glibc doesn't use the
[[gnu::access()]] attribute):

	[[gnu::access(read_only, 1)]]
	[[gnu::access(write_only, 2)]]
	[[gnu::null_terminated_string_arg(1)]]
	[[gnu::leaf]]
	[[gnu::nothrow]]
	long strtol(const char *restrict s, char **restrict endp, int base);

This gives enough information to the compiler to realize that a call
strtol(s, &s, 0) is safe regardless of restrict, since the second
argument won't be used to modify the string (but it will change where
's' points to after the call).

However, actually using that information in the compiler won't be easy,
I suspect. That's why diagnostics about restrict are bad today.

(Reminder to self: I need to write a gnu::access(3attr) manual page
where I should explain this.)

Have a lovely day!
Alex

--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

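As a rough, compilable illustration of the access annotations proposed
in the last message, the sketch below applies them to a hypothetical
wrapper, my_strtol(), rather than to strtol(3) itself. Several things
here are assumptions: the wrapper name, the use of the older
__attribute__((...)) spelling instead of [[gnu::...]] (chosen only so
that it compiles in pre-C23 modes), and a compiler that knows the
'access' attribute (GCC 10 or later; other compilers may ignore it with
a warning). The remaining attributes from the message are left out to
keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper, NOT glibc's declaration of strtol(3).
 * access(read_only, 1):  the function only reads through 's'.
 * access(write_only, 2): the function only writes through 'endp'.
 */
__attribute__((access(read_only, 1)))
__attribute__((access(write_only, 2)))
long my_strtol(const char *restrict s, char **restrict endp, int base);

long
my_strtol(const char *restrict s, char **restrict endp, int base)
{
	/* Forward to strtol(3); the wrapper adds only the annotations. */
	return strtol(s, endp, base);
}
```

A caller can then write my_strtol(p, &p, 10) for a char *p, the same
pattern as strtol(s, &s, 0) in the thread. Whether a given compiler
actually uses the annotations to refine its restrict diagnostics is a
separate question, as the message itself notes.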