From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org
To: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [Bug 60807] not all the pages are encoded using utf-8
Date: Fri, 14 Feb 2014 10:22:04 +0000
Message-ID: <bug-60807-11311-MQEHsQCnOr@https.bugzilla.kernel.org/>
In-Reply-To: <bug-60807-11311-3bo0kxnWaOQUvHkbgXJLS5sdmw4N0Rt+2LY78lusg7I@public.gmane.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=60807
Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
--- Comment #4 from Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> ---
(In reply to Peter Schiffer from comment #3)
> $ ./print_encoding.sh man?/*
>
> Man Page                Encoding by file  Encoding by first line
>
> * man2/close.2          iso-8859-1
> * man2/getdomainname.2  iso-8859-1
> * man2/getrlimit.2      iso-8859-1
> * man2/madvise.2        iso-8859-1
> * man2/mount.2          utf-8
> * man2/sysinfo.2        iso-8859-1
> * man2/umask.2          iso-8859-1
> * man3/encrypt.3        iso-8859-1
> * man3/fclose.3         iso-8859-1
> * man3/fflush.3         iso-8859-1
> * man3/lockf.3          iso-8859-1
> * man3/rand.3           iso-8859-1
> * man3/strtok.3         iso-8859-1
> * man3/toupper.3        iso-8859-1
> * man3/updwtmp.3        iso-8859-1
> * man4/st.4             utf-8
> * man5/utmp.5           iso-8859-1
> * man7/armscii-8.7      iso-8859-1        ARMSCII-8
> * man7/cp1251.7         unknown-8bit      CP1251
> * man7/environ.7        iso-8859-1
> * man7/hier.7           iso-8859-1
> * man7/iso_8859-10.7    iso-8859-1        ISO-8859-10
> * man7/iso_8859-11.7    iso-8859-1        ISO-8859-11
> * man7/iso_8859-13.7    iso-8859-1        ISO-8859-7
> * man7/iso_8859-14.7    iso-8859-1        ISO-8859-14
> * man7/iso_8859-15.7    iso-8859-1        ISO-8859-15
> * man7/iso_8859-16.7    iso-8859-1        ISO-8859-16
> * man7/iso_8859-1.7     iso-8859-1
> * man7/iso_8859-2.7     iso-8859-1        ISO-8859-2
> * man7/iso_8859-3.7     iso-8859-1        ISO-8859-3
> * man7/iso_8859-4.7     iso-8859-1        ISO-8859-4
> * man7/iso_8859-5.7     iso-8859-1        ISO-8859-5
> * man7/iso_8859-6.7     iso-8859-1        ISO-8859-6
> * man7/iso_8859-7.7     iso-8859-1        ISO-8859-7
> * man7/iso_8859-8.7     iso-8859-1        ISO-8859-8
> * man7/iso_8859-9.7     iso-8859-1        ISO-8859-9
> * man7/koi8-r.7         unknown-8bit      KOI8-R
> * man7/koi8-u.7         unknown-8bit
> * man7/suffixes.7       iso-8859-1
>
> $ ./convert_to_utf_8.sh tmp_encoded man?/*
> Converting man2/close.2 from iso-8859-1
> Converting man2/getdomainname.2 from iso-8859-1
> Converting man2/getrlimit.2 from iso-8859-1
> Converting man2/madvise.2 from iso-8859-1
> Converting man2/mount.2 from utf-8
> Converting man2/sysinfo.2 from iso-8859-1
> Converting man2/umask.2 from iso-8859-1
> Converting man3/encrypt.3 from iso-8859-1
> Converting man3/fclose.3 from iso-8859-1
> Converting man3/fflush.3 from iso-8859-1
> Converting man3/lockf.3 from iso-8859-1
> Converting man3/rand.3 from iso-8859-1
> Converting man3/strtok.3 from iso-8859-1
> Converting man3/toupper.3 from iso-8859-1
> Converting man3/updwtmp.3 from iso-8859-1
> Converting man4/st.4 from utf-8
> Converting man5/utmp.5 from iso-8859-1
> Converting man7/armscii-8.7 from armscii-8
> Converting man7/cp1251.7 from cp1251
> Converting man7/environ.7 from iso-8859-1
> Converting man7/hier.7 from iso-8859-1
> Converting man7/iso_8859-10.7 from iso_8859-10
> Converting man7/iso_8859-11.7 from iso-8859-1
> Converting man7/iso_8859-13.7 from iso-8859-1
> Converting man7/iso_8859-14.7 from iso_8859-14
> Converting man7/iso_8859-15.7 from iso_8859-15
> Converting man7/iso_8859-16.7 from iso_8859-16
> Converting man7/iso_8859-1.7 from iso_8859-1
> Converting man7/iso_8859-2.7 from iso_8859-2
> Converting man7/iso_8859-3.7 from iso_8859-3
> Converting man7/iso_8859-4.7 from iso_8859-4
> Converting man7/iso_8859-5.7 from iso_8859-5
> Converting man7/iso_8859-6.7 from iso_8859-6
> Converting man7/iso_8859-7.7 from iso_8859-7
> Converting man7/iso_8859-8.7 from iso_8859-8
> Converting man7/iso_8859-9.7 from iso_8859-9
> Converting man7/koi8-r.7 from koi8-r
> Converting man7/koi8-u.7 from koi8-u
> Converting man7/suffixes.7 from iso-8859-1
>
> $ cd tmp_encoded/
>
> $ ../print_encoding.sh man?/*
>
> Man Page                Encoding by file  Encoding by first line
>
> * man2/close.2          utf-8             UTF-8
> * man2/getdomainname.2  utf-8             UTF-8
> * man2/getrlimit.2      utf-8             UTF-8
> * man2/madvise.2        utf-8             UTF-8
> * man2/mount.2          utf-8             UTF-8
> * man2/sysinfo.2        utf-8             UTF-8
> * man2/umask.2          utf-8             UTF-8
> * man3/encrypt.3        utf-8             UTF-8
> * man3/fclose.3         utf-8             UTF-8
> * man3/fflush.3         utf-8             UTF-8
> * man3/lockf.3          utf-8             UTF-8
> * man3/rand.3           utf-8             UTF-8
> * man3/strtok.3         utf-8             UTF-8
> * man3/toupper.3        utf-8             UTF-8
> * man3/updwtmp.3        utf-8             UTF-8
> * man4/st.4             utf-8             UTF-8
> * man5/utmp.5           utf-8             UTF-8
> * man7/armscii-8.7      utf-8             UTF-8
> * man7/cp1251.7         utf-8             UTF-8
> * man7/environ.7        utf-8             UTF-8
> * man7/hier.7           utf-8             UTF-8
> * man7/iso_8859-10.7    utf-8             UTF-8
> * man7/iso_8859-11.7    utf-8             UTF-8
> * man7/iso_8859-13.7    utf-8             UTF-8
> * man7/iso_8859-14.7    utf-8             UTF-8
> * man7/iso_8859-15.7    utf-8             UTF-8
> * man7/iso_8859-16.7    utf-8             UTF-8
> * man7/iso_8859-1.7     utf-8             UTF-8
> * man7/iso_8859-2.7     utf-8             UTF-8
> * man7/iso_8859-3.7     utf-8             UTF-8
> * man7/iso_8859-4.7     utf-8             UTF-8
> * man7/iso_8859-5.7     utf-8             UTF-8
> * man7/iso_8859-6.7     utf-8             UTF-8
> * man7/iso_8859-7.7     utf-8             UTF-8
> * man7/iso_8859-8.7     utf-8             UTF-8
> * man7/iso_8859-9.7     utf-8             UTF-8
> * man7/koi8-r.7         utf-8             UTF-8
> * man7/koi8-u.7         utf-8             UTF-8
> * man7/suffixes.7       utf-8             UTF-8

Peter,

Sorry to be slow in following up on this. Thanks for the scripts.

As some background, I'll just note that the current encoding markers in the
iso_8859* pages were added in response to this 2009 bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519209

It seems a reasonable idea to convert everything to UTF-8, but I have some
concerns/questions.

1. Is the encoding line:

       '\" t -*- coding: UTF-8 -*-

   really needed, or does modern groff just work this out?

2. I'm concerned about backward compatibility issues. As in: what if someone
   loads the man pages onto a system with an old groff? Now, as far as I can
   work out, groff added Unicode input support in v1.20, in 2009
   (http://lists.gnu.org/archive/html/groff/2009-01/msg00011.html). So, perhaps
   that's long enough ago that we don't need to worry too much about these
   issues.

Any thoughts?
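
To make question 1 concrete, here is a minimal sketch, in Python, of how a tool could read a first-line coding tag like the one above and re-encode a page to UTF-8. It is not one of Peter's scripts (their contents are not shown in this thread) and it says nothing about what groff itself does. The names CODING_TAG, declared_encoding and convert_to_utf8 are hypothetical, and the iso-8859-1 fallback for untagged pages is an assumption based on the listing above.

```python
import re

# Emacs-style tag as used on the first line of some pages, for example:
#   '\" t -*- coding: UTF-8 -*-
CODING_TAG = re.compile(rb"-\*-\s*coding:\s*([A-Za-z0-9_.-]+)\s*-\*-")


def declared_encoding(first_line: bytes):
    """Return the encoding named in a first-line coding tag, or None."""
    # Only roff comment lines ('\" or .\") are considered.
    if not first_line.startswith((b"'\\\"", b".\\\"")):
        return None
    match = CODING_TAG.search(first_line)
    return match.group(1).decode("ascii") if match else None


def convert_to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Re-encode a page to UTF-8, updating the coding tag if one exists."""
    first_line, newline, rest = raw.partition(b"\n")
    source = declared_encoding(first_line)
    if source is None:
        # No tag: assume the fallback encoding for the whole file.
        return raw.decode(fallback).encode("utf-8")
    # decode() raises LookupError if Python has no codec of that name.
    body = rest.decode(source).encode("utf-8")
    new_first = CODING_TAG.sub(b"-*- coding: UTF-8 -*-", first_line)
    return new_first + newline + body
```

For a single page this could be used as convert_to_utf8(open(path, "rb").read()), with the result written to a separate output tree, much as the tmp_encoded directory is used above. Unlike the run quoted above, this sketch does not add a tag to pages that have none, and Python's codec registry may not know every encoding name that appears in the pages.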