* Fwd: mbrtowc(3) state after an invalid sequence "undefined" or "unspecified"?
From: Kang-Che Sung @ 2026-05-22 1:23 UTC
To: linux-man
---------- Forwarded message ---------
From: Kang-Che Sung <explorer09@gmail.com>
Date: Thu, May 21, 2026 at 11:08 PM
Subject: mbrtowc(3) state after an invalid sequence "undefined" or "unspecified"?
To: Alejandro Colomar <alx@kernel.org>
Cc: <linux-man@vger.kernel.org>, <libc-alpha@sourceware.org>
Hi, Alejandro (or anyone else interested),
There's a discrepancy between POSIX and ISO C in the wording for the
mbrtowc(3) function (and, similarly, the mbsrtowcs(3) function). It
could be reported as an issue to POSIX (the Austin Group), but I am not
sure whether you can do that.
In ISO C (I checked both C99 and C23, the latter via the N3220 draft),
there's a statement that if mbrtowc() returns (size_t)(-1) because an
encoding error occurred, "the conversion state is unspecified".
POSIX (see <https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbrtowc.html>)
says, for the same case, that "the conversion state is undefined".
This wording difference matters because "unspecified behavior" and
"undefined behavior" are technically different. One example is whether
the mbstate_t object can be reused after an invalid sequence is
encountered. When the state is said to be "undefined", the implication
is that it cannot be used again (unless it is reset, e.g., by an
`mbrtowc(NULL, "", 1, ps)` call). When it's "unspecified", implementations
may allow the state to be reused for certain encodings (this is possible
for UTF-8, for example).
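As an illustration of the reuse question, here is a minimal sketch of the
conservative pattern that should be safe under either wording: never feed
the mbstate_t back to mbrtowc() after an encoding error without resetting
it first. The helper name `decode_resync` and its policy of skipping
exactly one byte per error are assumptions made for this example, not
anything taken from ISO C, POSIX, or glibc. The reset is done with memset()
rather than the mbrtowc() call mentioned above, relying on ISO C's
statement that a zero-valued mbstate_t describes an initial conversion
state.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (illustration only): decode up to n bytes of s into
 * out[0..cap), using the current locale's multibyte encoding.
 * Returns the number of wide characters stored; *errors receives the
 * number of bytes skipped because of encoding errors.
 */
size_t decode_resync(const char *s, size_t n,
                     wchar_t *out, size_t cap, size_t *errors)
{
    mbstate_t ps;
    size_t stored = 0;

    assert(s != NULL && out != NULL && errors != NULL);

    /* A zero-valued mbstate_t is a valid initial conversion state. */
    memset(&ps, 0, sizeof ps);
    *errors = 0;

    while (n > 0 && stored < cap) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, n, &ps);

        if (r == (size_t)-1) {
            /*
             * Encoding error (errno is EILSEQ). The state is "unspecified"
             * per ISO C and "undefined" per POSIX, so do not reuse it:
             * reset it before the next call, then skip one byte
             * (an assumed resynchronization policy).
             */
            memset(&ps, 0, sizeof ps);
            ++*errors;
            ++s;
            --n;
            continue;
        }
        if (r == (size_t)-2)
            break;              /* incomplete sequence at end of input */
        if (r == 0)
            break;              /* decoded the null wide character: stop */

        out[stored++] = wc;
        s += r;
        n -= r;
    }
    return stored;
}
```

Code written this way does not depend on which wording applies. The
difference would only become observable for a caller that omits the
memset() in the error branch and keeps passing the same `ps`.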
This is something I discovered accidentally when researching the
multibyte functions in the C standard library and how they work with
an encoding like UTF-8.
* Re: mbrtowc(3) state after an invalid sequence "undefined" or "unspecified"?
From: Alejandro Colomar @ 2026-05-28 12:11 UTC
To: Kang-Che Sung; +Cc: linux-man, libc-alpha
Hi Kang-Che,
Thanks! I've opened this bug report:
<https://www.austingroupbugs.net/view.php?id=1982>
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>