* Re: strlen man-page misinformation
From: Michael Kerrisk (man-pages) @ 2016-02-18 13:12 UTC
To: Alan Aversa
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-man-u79uwXL29TY76Z2rM5mHXA

Hello Alan,

On 02/03/2016 06:25 PM, Alan Aversa wrote:
> Hello,
>
> The 2015-08-08 strlen man-page is incorrect. Here's a diff:
>
> --- a/man3/strlen.3
> +++ b/man3/strlen.3
> @@ -45,7 +45,7 @@ excluding the terminating null byte (\(aq\\0\(aq).
> .SH RETURN VALUE
> The
> .BR strlen ()
> -function returns the number of bytes in the string
> +function returns the number of *characters* in the string that
> precede the terminating null character

I went for a simpler change: s/bytes/characters/

> .IR s .
> .SH ATTRIBUTES
> For an explanation of the terms used in this section, see
> @@ -60,7 +60,7 @@ T{
> T} Thread safety MT-Safe
> .TE
> .SH CONFORMING TO
> -POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
> +POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD.

Fixed.

> .SH SEE ALSO
> .BR string (3),
> .BR strnlen (3),
>
> Page 392 (PDF p. 390, §7.24.6.3) of the C11 standard
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf> says:
>
> The *strlen* function returns the number of characters that precede
> the terminating null character.

Thanks for the report. Interesting: POSIX.1 still uses the term "bytes"
in the spec.
Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: strlen man-page misinformation
From: walter harms @ 2016-02-18 15:42 UTC
To: Michael Kerrisk (man-pages)
Cc: Alan Aversa, linux-man-u79uwXL29TY76Z2rM5mHXA

On 18.02.2016 14:12, Michael Kerrisk (man-pages) wrote:
> On 02/03/2016 06:25 PM, Alan Aversa wrote:
>> [...]
>> -function returns the number of bytes in the string
>> +function returns the number of *characters* in the string that
>> precede the terminating null character
>
> I went for a simpler change: s/bytes/characters/

As I understand it, this is wrong: one character may be represented by
two or more bytes (UTF-8).
See this example: the string (test) is 3 characters long but takes 6 bytes of space.

Did I miss something? Did the specification of "character" change?

re, wh

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Assumes this source file is saved as UTF-8, so each of the
       three letters is stored as a two-byte sequence. */
    const char *test = "ÖÄÜ";
    size_t len = strlen(test);   /* counts bytes, not letters */

    printf("strlen=%zu\n", len);
    for (size_t i = 0; i < len; i++)
        printf("%02x\n", (unsigned char)test[i]);   /* dump each byte in hex */
    return 0;
}

output:
strlen=6
c3
96
c3
84
c3
9c
* Re: strlen man-page misinformation
From: Keith Thompson @ 2016-02-18 16:25 UTC
To: wharms-fPG8STNUNVg
Cc: Michael Kerrisk (man-pages), Alan Aversa, linux-man-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 18, 2016 at 7:42 AM, walter harms <wharms-fPG8STNUNVg@public.gmane.org> wrote:
> On 18.02.2016 14:12, Michael Kerrisk (man-pages) wrote:
>> [...]
>> I went for a simpler change: s/bytes/characters/
>
> As I understand it, this is wrong: one character may be represented by
> two or more bytes (UTF-8).
> See this example: the string (test) is 3 characters long but takes 6 bytes of space.
>
> Did I miss something? Did the specification of "character" change?
[...]

Either "bytes" or "characters" would be correct. POSIX says "bytes";
ISO C says "characters". See the definition of "character" (in the
sense of single-byte character) in C11 3.7.1:

> bit representation that fits in a byte

On the other hand, 3.7 defines an (abstract) "character" as:

> member of a set of elements used for the organization, control,
> or representation of data

It also defines the terms "multibyte character" (a sequence of one or
more bytes representing a member of the extended character set) and
"wide character" (a value of type wchar_t).

"Bytes" is less ambiguous, but "characters" matches the wording of the
ISO C standard (and, in that context, refers to single-byte characters).
-- 
Keith Thompson <Keith.S.Thompson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>