* [PATCH] unicode.7: update to reflect past developments
From: Marko Myllynen @ 2014-06-10 8:39 UTC
To: linux-man; +Cc: H. Peter Anvin, Markus Kuhn

Hi,

the unicode(7) page will look more modern with a few small changes,
please see below.

From a3e9003950b6226b83ec319639bd8ecb9932275b Mon Sep 17 00:00:00 2001
From: Marko Myllynen <myllynen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date: Mon, 9 Jun 2014 17:03:38 +0300
Subject: [PATCH] unicode.7: update to reflect past developments

- drop old BUGS section, editors cope with UTF-8 ok these days,
  and perhaps the state-of-the-art is better described elsewhere
  anyway than in a man page
- drop old suggestion about avoiding combined characters
- refer to LANANA for Linux zone, add registry file reference
- drop a reference to an inactive/dead mailing list
- update some reference URLs
---
 man7/unicode.7 |   43 ++++++++-----------------------------------
 1 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/man7/unicode.7 b/man7/unicode.7
index 3eb1054..2fd8407 100644
--- a/man7/unicode.7
+++ b/man7/unicode.7
@@ -213,14 +213,6 @@ and
 tells, how many positions (0\(en2) the cursor is advanced by the
 output of a character.
 .PP
-Under Linux, in general only the BMP at implementation level 1 should
-be used at the moment.
-Up to two combining characters per base
-character for certain scripts (in particular Thai) are also supported
-by some UTF-8 terminal emulators and ISO 10646 fonts (level 2), but in
-general precomposed characters should be preferred where available
-(Unicode calls this
-.BR "Normalization Form C" ).
 .SS Private area
 In the
 .BR BMP ,
@@ -232,8 +224,10 @@ range 0xe000 to 0xefff which can be used individually by any end-user
 and the Linux zone in the range 0xf000 to 0xf8ff where extensions are
 coordinated among all Linux users.
 The registry of the characters
-assigned to the Linux zone is currently maintained by H. Peter Anvin
-<Peter.Anvin-Xh+NVF5n0LLYtjvyW6yDsg@public.gmane.org>.
+assigned to the Linux zone is maintained by LANANA and the registry
+itself is
+.I Documentation/unicode.txt
+in the Linux kernel sources.
 .SS Literature
 .TP 0.2i
 *
@@ -244,7 +238,7 @@ for Standardization, Geneva, 2000.
 
 This is the official specification of
 .BR UCS .
-Available as a PDF file on CD-ROM from
+Available from
 .UR http://www.iso.ch/
 .UE .
 .TP
@@ -267,7 +261,7 @@ which improved wide and multibyte character support even further.
 *
 Unicode Technical Reports.
 .RS
-.UR http://www.unicode.org\:/unicode\:/reports/
+.UR http://www.unicode.org\:/reports/
 .UE
 .RE
 .TP
@@ -276,39 +270,18 @@ Markus Kuhn: UTF-8 and Unicode FAQ for UNIX/Linux.
 .RS
 .UR http://www.cl.cam.ac.uk\:/~mgk25\:/unicode.html
 .UE
-
-Provides subscription information for the
-.I linux-utf8
-mailing list, which is the best place to look for advice on using
-Unicode under Linux.
 .RE
 .TP
 *
 Bruno Haible: Unicode HOWTO.
 .RS
-.UR ftp://ftp.ilog.fr\:/pub\:/Users\:/haible\:/utf8\:/Unicode-HOWTO.html
+.UR http://www.tldp.org\:/HOWTO\:/Unicode-HOWTO.html
 .UE
 .RE
-.SH BUGS
-When this man page was last revised, the GNU C Library support for
-.B UTF-8
-locales was mature and XFree86 support was in an advanced state, but
-work on making applications (most notably editors) suitable for use in
-.B UTF-8
-locales was still fully in progress.
-Current general
-.B UCS
-support under Linux usually provides for CJK double-width characters
-and sometimes even simple overstriking combining characters, but
-usually does not include support for scripts with right-to-left
-writing direction or ligature substitution requirements such as
-Hebrew, Arabic, or the Indic scripts.
-These scripts are currently
-supported only in certain GUI applications (HTML viewers, word processors)
-with sophisticated text rendering engines.
 .\" .SH AUTHOR
 .\" Markus Kuhn <mgk25-kDbDZe0LBGWFxr2TtlUqVg@public.gmane.org>
 .SH SEE ALSO
+.BR locale (1),
 .BR setlocale (3),
 .BR charsets (7),
 .BR utf-8 (7)
-- 
1.7.1

-- 
Marko Myllynen
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH] unicode.7: update to reflect past developments
From: Michael Kerrisk (man-pages) @ 2014-06-10 14:52 UTC
To: myllynen-H+wXaHxf7aLQT0dZR+AlfA, linux-man
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, H. Peter Anvin, Markus Kuhn

On 06/10/2014 10:39 AM, Marko Myllynen wrote:
> Hi,
>
> the unicode(7) page will look more modern with a few small changes,
> please see below.

Thanks, Marko. Applied.

Cheers,

Michael

> From a3e9003950b6226b83ec319639bd8ecb9932275b Mon Sep 17 00:00:00 2001
> From: Marko Myllynen <myllynen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Date: Mon, 9 Jun 2014 17:03:38 +0300
> Subject: [PATCH] unicode.7: update to reflect past developments
>
> - drop old BUGS section, editors cope with UTF-8 ok these days,
>   and perhaps the state-of-the-art is better described elsewhere
>   anyway than in a man page
> - drop old suggestion about avoiding combined characters
> - refer to LANANA for Linux zone, add registry file reference
> - drop a reference to an inactive/dead mailing list
> - update some reference URLs
> ---
>  man7/unicode.7 |   43 ++++++++-----------------------------------
>  1 files changed, 8 insertions(+), 35 deletions(-)
>
> diff --git a/man7/unicode.7 b/man7/unicode.7
> index 3eb1054..2fd8407 100644
> --- a/man7/unicode.7
> +++ b/man7/unicode.7
> @@ -213,14 +213,6 @@ and
>  tells, how many positions (0\(en2) the cursor is advanced by the
>  output of a character.
>  .PP
> -Under Linux, in general only the BMP at implementation level 1 should
> -be used at the moment.
> -Up to two combining characters per base
> -character for certain scripts (in particular Thai) are also supported
> -by some UTF-8 terminal emulators and ISO 10646 fonts (level 2), but in
> -general precomposed characters should be preferred where available
> -(Unicode calls this
> -.BR "Normalization Form C" ).
>  .SS Private area
>  In the
>  .BR BMP ,
> @@ -232,8 +224,10 @@ range 0xe000 to 0xefff which can be used individually by any end-user
>  and the Linux zone in the range 0xf000 to 0xf8ff where extensions are
>  coordinated among all Linux users.
>  The registry of the characters
> -assigned to the Linux zone is currently maintained by H. Peter Anvin
> -<Peter.Anvin-Xh+NVF5n0LLYtjvyW6yDsg@public.gmane.org>.
> +assigned to the Linux zone is maintained by LANANA and the registry
> +itself is
> +.I Documentation/unicode.txt
> +in the Linux kernel sources.
>  .SS Literature
>  .TP 0.2i
>  *
> @@ -244,7 +238,7 @@ for Standardization, Geneva, 2000.
> 
>  This is the official specification of
>  .BR UCS .
> -Available as a PDF file on CD-ROM from
> +Available from
>  .UR http://www.iso.ch/
>  .UE .
>  .TP
> @@ -267,7 +261,7 @@ which improved wide and multibyte character support even further.
>  *
>  Unicode Technical Reports.
>  .RS
> -.UR http://www.unicode.org\:/unicode\:/reports/
> +.UR http://www.unicode.org\:/reports/
>  .UE
>  .RE
>  .TP
> @@ -276,39 +270,18 @@ Markus Kuhn: UTF-8 and Unicode FAQ for UNIX/Linux.
>  .RS
>  .UR http://www.cl.cam.ac.uk\:/~mgk25\:/unicode.html
>  .UE
> -
> -Provides subscription information for the
> -.I linux-utf8
> -mailing list, which is the best place to look for advice on using
> -Unicode under Linux.
>  .RE
>  .TP
>  *
>  Bruno Haible: Unicode HOWTO.
>  .RS
> -.UR ftp://ftp.ilog.fr\:/pub\:/Users\:/haible\:/utf8\:/Unicode-HOWTO.html
> +.UR http://www.tldp.org\:/HOWTO\:/Unicode-HOWTO.html
>  .UE
>  .RE
> -.SH BUGS
> -When this man page was last revised, the GNU C Library support for
> -.B UTF-8
> -locales was mature and XFree86 support was in an advanced state, but
> -work on making applications (most notably editors) suitable for use in
> -.B UTF-8
> -locales was still fully in progress.
> -Current general
> -.B UCS
> -support under Linux usually provides for CJK double-width characters
> -and sometimes even simple overstriking combining characters, but
> -usually does not include support for scripts with right-to-left
> -writing direction or ligature substitution requirements such as
> -Hebrew, Arabic, or the Indic scripts.
> -These scripts are currently
> -supported only in certain GUI applications (HTML viewers, word processors)
> -with sophisticated text rendering engines.
>  .\" .SH AUTHOR
>  .\" Markus Kuhn <mgk25-kDbDZe0LBGWFxr2TtlUqVg@public.gmane.org>
>  .SH SEE ALSO
> +.BR locale (1),
>  .BR setlocale (3),
>  .BR charsets (7),
>  .BR utf-8 (7)

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/