public inbox for linux-man@vger.kernel.org
From: bugzilla-daemon@kernel.org
To: linux-man@vger.kernel.org
Subject: [Bug 219847] New: mbsnrtowcs(3) man page behavior with glibc incorrect (and POSIX.1-2024 incompatible)
Date: Thu, 06 Mar 2025 11:14:31 +0000
Message-ID: <bug-219847-11311@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=219847

            Bug ID: 219847
           Summary: mbsnrtowcs(3) man page behavior with glibc incorrect
                    (and POSIX.1-2024 incompatible)
           Product: Documentation
           Version: unspecified
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: man-pages
          Assignee: documentation_man-pages@kernel-bugs.osdl.org
          Reporter: explorer09@gmail.com
        Regression: No

The mbsnrtowcs(3) man page contains the following passage:

"According to POSIX.1, if the input buffer ends with an incomplete
character, it is unspecified whether conversion stops at the end
of the previous character (if any), or at the end of the input
buffer. The glibc implementation adopts the former behavior."

(https://man7.org/linux/man-pages/man3/mbsnrtowcs.3.html)
(Source:
https://web.git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man3/mbsnrtowcs.3)

The problem:

Only POSIX.1-2008 and POSIX.1-2017 leave it unspecified where the
conversion stops.

POSIX.1-2024 now requires the _latter_ behavior, and the rationale cited for
the change is, strangely, glibc. Yet this man page says that glibc uses the
former behavior.

(https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbsrtowcs.html)
(https://www.austingroupbugs.net/view.php?id=616)
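
To make the two candidate stopping points concrete, here is a minimal sketch
that computes them for a UTF-8 buffer without calling the C library converter.
The helper name utf8_complete_prefix is hypothetical (it is not part of glibc
or POSIX), and it only looks at lead and continuation byte patterns; it does
not validate the encoding. The "former" behavior would stop at the offset it
returns, while the "latter" behavior stops at the byte limit itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: returns the length in bytes
 * of the longest prefix of buf[0..n) that does not end in the middle of a
 * UTF-8 character. Bytes that cannot start or continue a sequence are left
 * in place for the real converter to reject.
 */
size_t utf8_complete_prefix(const unsigned char *buf, size_t n)
{
        size_t i = n;
        size_t need, have;
        unsigned char lead;

        assert(buf != NULL || n == 0);

        /* Walk back over at most 3 trailing continuation bytes (10xxxxxx). */
        while (i > 0 && n - i < 3 && (buf[i - 1] & 0xC0) == 0x80)
                i--;
        if (i == 0)
                return n;       /* no lead byte in range; nothing to trim */

        /* buf[i - 1] is the last byte that is not a skipped continuation. */
        lead = buf[i - 1];
        if (lead < 0x80)
                need = 1;
        else if ((lead & 0xE0) == 0xC0)
                need = 2;
        else if ((lead & 0xF0) == 0xE0)
                need = 3;
        else if ((lead & 0xF8) == 0xF0)
                need = 4;
        else
                return n;       /* not a valid lead byte; do not guess */

        have = n - (i - 1);     /* bytes present for the last character */
        return have < need ? i - 1 : n;
}
```

For the bytes "\xe7\x95\x8c\xe7\xb7\x9a" with a limit of 5, the helper returns
3 (the end of the previous character), whereas the limit itself is 5 (the end
of the input buffer).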

Out of curiosity, I tested this with the code from the Austin Group issue
report (pasted below with my own modifications) on Devuan GNU/Linux 5
(glibc 2.36-9+deb12u9).

glibc's behavior is close to the latter, but I would describe it more
precisely as follows:

"If the input buffer (up to the `nmc` limit) ends with an incomplete character,
conversion stops at byte index `nmc` of the input buffer. However, if a
null byte ('\0') is encountered in the input buffer before the `nmc` limit,
then the incomplete sequence is treated as invalid instead, and `*src` is left
pointing to the start of that invalid byte sequence."

(Treating an incomplete sequence before '\0' as invalid makes
`mbsnrtowcs(dest, src, SIZE_MAX, size, ps)` behave identically to
`mbsrtowcs(dest, src, size, ps)`, so mbsrtowcs(3) can be implemented directly
on top of mbsnrtowcs(3).)
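
The equivalence described in the parenthetical above can be sketched as a thin
wrapper. The function name mbsrtowcs_via_mbsnrtowcs is hypothetical and this
is not how any particular libc is known to implement mbsrtowcs(3); it only
assumes that mbsnrtowcs(3) stops at the terminating null byte on its own when
the byte limit is SIZE_MAX, as observed above for glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/*
 * Hypothetical wrapper, for illustration only: mbsrtowcs(3) expressed
 * through mbsnrtowcs(3) with no effective limit on the input bytes.
 * The conversion then ends at the terminating null byte, at an invalid
 * sequence, or when `size` wide characters have been stored.
 */
size_t mbsrtowcs_via_mbsnrtowcs(wchar_t *dest, const char **src,
                                size_t size, mbstate_t *ps)
{
        assert(src != NULL && *src != NULL);
        return mbsnrtowcs(dest, src, SIZE_MAX, size, ps);
}
```

With this wrapper, an incomplete sequence directly before '\0' has to be
reported as an error, because there is no later call that could supply the
missing bytes.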

My wording isn't great, so please revise it as you see fit.

```c
#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>

wchar_t wcs[100];
char mbs[100];

int main(void)
{
        mbstate_t state;
        const char *s;

        setlocale(LC_CTYPE, "en_US.UTF-8");

        // U+754C U+7DDA
        memset(&state, 0, sizeof(state));
        memcpy(mbs, "\xe7\x95\x8c\xe7\xb7\x9a", 7);
        s = mbs;
        printf("%u ", (unsigned)mbsnrtowcs(wcs, &s, 5, 100, &state));
        printf("%u\n", (unsigned)(s - mbs));
        // Output: "1 5"
        // (If conversion stops at character boundary, the output would be
        // "1 3".)

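        // U+754C followed by an incomplete character; the limit of 6 bytes
        // includes the terminating '\0'.
        memset(&state, 0, sizeof(state));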
        memcpy(mbs, "\xe7\x95\x8c\xe7\xb7", 6);
        s = mbs;
        printf("%u ", (unsigned)mbsnrtowcs(wcs, &s, 6, 100, &state));
        printf("%u\n", (unsigned)(s - mbs));
        // Output: "4294967295 3"
        // (4294967295 is the error return (size_t)-1 cast to unsigned.)
}
```
