* Re: strlen man-page misinformation
[not found] ` <56B237F9.8010206-j9pdmedNgrk@public.gmane.org>
@ 2016-02-18 13:12 ` Michael Kerrisk (man-pages)
[not found] ` <56C5C33E.7030407-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-02-18 13:12 UTC (permalink / raw)
To: Alan Aversa
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA
Hello Alan,
On 02/03/2016 06:25 PM, Alan Aversa wrote:
> Hello,
>
> The 2015-08-08 strlen man-page is incorrect. Here's a diff:
>
> --- a/man3/strlen.3
> +++ b/man3/strlen.3
> @@ -45,7 +45,7 @@ excluding the terminating null byte (\(aq\\0\(aq).
> .SH RETURN VALUE
> The
> .BR strlen ()
> -function returns the number of bytes in the string
> +function returns the number of *characters* in the string that
> precede the terminating null character
I went for a simpler change: s/bytes/characters/
> .IR s .
> .SH ATTRIBUTES
> For an explanation of the terms used in this section, see
> @@ -60,7 +60,7 @@ T{
> T} Thread safety MT-Safe
> .TE
> .SH CONFORMING TO
> -POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
> +POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD.
Fixed.
> .SH SEE ALSO
> .BR string (3),
> .BR strnlen (3),
>
> Page 392 (PDF p. 390, §7.24.6.3) of the C11 standard
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf> says:
>
> The *strlen* function returns the number of characters that precede
> the terminating null character.
Thanks for the report. Interestingly, POSIX.1 still uses the term "bytes"
in the spec.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: strlen man-page misinformation
[not found] ` <56C5C33E.7030407-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-02-18 15:42 ` walter harms
[not found] ` <56C5E67A.2010401-fPG8STNUNVg@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: walter harms @ 2016-02-18 15:42 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: Alan Aversa, linux-man-u79uwXL29TY76Z2rM5mHXA
Am 18.02.2016 14:12, schrieb Michael Kerrisk (man-pages):
> Hello Alan,
>
> On 02/03/2016 06:25 PM, Alan Aversa wrote:
>> Hello,
>>
>> The 2015-08-08 strlen man-page is incorrect. Here's a diff:
>>
>> --- a/man3/strlen.3
>> +++ b/man3/strlen.3
>> @@ -45,7 +45,7 @@ excluding the terminating null byte (\(aq\\0\(aq).
>> .SH RETURN VALUE
>> The
>> .BR strlen ()
>> -function returns the number of bytes in the string
>> +function returns the number of *characters* in the string that
>> precede the terminating null character
>
> I went for a simpler change: s/bytes/characters/
As I understand it, this is wrong: one character may be represented by two or more bytes (e.g. in UTF-8).
In the example below, the string (test) is 3 characters long but occupies 6 bytes.
Did I miss something? Has the definition of "character" changed?
re,
wh
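#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Three characters; six bytes when the source file is UTF-8 encoded. */
    const char *test = "ÖÄÜ";
    size_t len = strlen(test);
    size_t i;

    printf("strlen=%zu\n", len);
    /* Dump each byte of the string in hex. */
    for (i = 0; i < len; i++)
        printf("%02x\n", (unsigned char)test[i]);
    return 0;
}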
output:
strlen=6
c3
96
c3
84
c3
9c
>
>> .IR s .
>> .SH ATTRIBUTES
>> For an explanation of the terms used in this section, see
>> @@ -60,7 +60,7 @@ T{
>> T} Thread safety MT-Safe
>> .TE
>> .SH CONFORMING TO
>> -POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
>> +POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD.
>
> Fixed.
>
>> .SH SEE ALSO
>> .BR string (3),
>> .BR strnlen (3),
>>
>> Page 392 (PDF p. 390, §7.24.6.3) of the C11 standard
>> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf> says:
>>
>> The *strlen* function returns the number of characters that precede
>> the terminating null character.
>
> Thanks for the report. Interestingly, POSIX.1 still uses the term "bytes"
> in the spec.
>
> Cheers,
>
> Michael
>
>
* Re: strlen man-page misinformation
[not found] ` <56C5E67A.2010401-fPG8STNUNVg@public.gmane.org>
@ 2016-02-18 16:25 ` Keith Thompson
0 siblings, 0 replies; 3+ messages in thread
From: Keith Thompson @ 2016-02-18 16:25 UTC (permalink / raw)
To: wharms-fPG8STNUNVg
Cc: Michael Kerrisk (man-pages), Alan Aversa,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Thu, Feb 18, 2016 at 7:42 AM, walter harms <wharms-fPG8STNUNVg@public.gmane.org> wrote:
>
>
> Am 18.02.2016 14:12, schrieb Michael Kerrisk (man-pages):
>> Hello Alan,
>>
>> On 02/03/2016 06:25 PM, Alan Aversa wrote:
>>> Hello,
>>>
>>> The 2015-08-08 strlen man-page is incorrect. Here's a diff:
>>>
>>> --- a/man3/strlen.3
>>> +++ b/man3/strlen.3
>>> @@ -45,7 +45,7 @@ excluding the terminating null byte (\(aq\\0\(aq).
>>> .SH RETURN VALUE
>>> The
>>> .BR strlen ()
>>> -function returns the number of bytes in the string
>>> +function returns the number of *characters* in the string that
>>> precede the terminating null character
>>
>> I went for a simpler change: s/bytes/characters/
>
>
> As I understand it, this is wrong: one character may be represented by two or more bytes (e.g. in UTF-8).
> In the example below, the string (test) is 3 characters long but occupies 6 bytes.
>
> Did I miss something? Has the definition of "character" changed?
[...]
Either "bytes" or "characters" would be correct. POSIX says "bytes";
ISO C says "characters".
See the definition of "character" in C11 3.7.1:
> bit representation that fits in a byte
On the other hand, 3.7 defines an (abstract) "character" as:
> member of a set of elements used for the organization, control,
> or representation of data
It also defines the terms "multibyte character" (a sequence of one
or more bytes representing a member of the extended character set)
and "wide character" (a value of type wchar_t).
"Bytes" is less ambiguous, but "characters" matches the wording of the
ISO C standard (and, in that context, refers to single-byte characters).
--
Keith Thompson <Keith.S.Thompson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>