Issue in man page charsets.7

public inbox for linux-man@vger.kernel.org

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  2023-11-11 19:38 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 4873 → ISO/IEC 4873

"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
"the high order bit set.  In particular, B<\\[ha]N> and B<\\[ha]O> are not "
"used anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, "
"ESC + xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

* Re: Issue in man page charsets.7
  2023-11-01 14:02 Issue in man page charsets.7 Helge Kreutzmann
@ 2023-11-11 19:38 ` Alejandro Colomar
  2023-11-11 19:45   ` Helge Kreutzmann
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2023-11-11 19:38 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: alx.manpages, mario.blaettermann, linux-man

Hi Helge,

On Wed, Nov 01, 2023 at 02:02:13PM +0000, Helge Kreutzmann wrote:
> Without further ado, the following was found:
> 
> Issue:    ISO 4873 → ISO/IEC 4873

For all the reports about ISO -> ISO/IEC, I'd appreciate if you could
add a link to an official document that shows that it's the correct
name.  I'd include that in the commit message.

Thanks!
Alex

> 
> "ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
> "(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
> "the high order bit set.  In particular, B<\\[ha]N> and B<\\[ha]O> are not "
> "used anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, "
> "ESC + xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

-- 
<https://www.alejandro-colomar.es/>

* Re: Issue in man page charsets.7
  2023-11-11 19:38 ` Alejandro Colomar
@ 2023-11-11 19:45   ` Helge Kreutzmann
  2024-01-28 20:22     ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-11 19:45 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: alx.manpages, mario.blaettermann, linux-man

On Sat, Nov 11, 2023 at 08:38:59PM +0100, Alejandro Colomar wrote:
> Hi Helge,
> 
> On Wed, Nov 01, 2023 at 02:02:13PM +0000, Helge Kreutzmann wrote:
> > Without further ado, the following was found:
> > 
> > Issue:    ISO 4873 → ISO/IEC 4873
> 
> For all the reports about ISO -> ISO/IEC, I'd appreciate if you could
> add a link to an official document that shows that it's the correct
> name.  I'd include that in the commit message.

Simply go to 
https://www.iso.org

and enter the number in the search field.

For this one you will get:
https://www.iso.org/standard/10859.html

Is this sufficient?

Greetings

         Helge

-- 
      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/

* Re: Issue in man page charsets.7
  2023-11-11 19:45   ` Helge Kreutzmann
@ 2024-01-28 20:22     ` Alejandro Colomar
  2024-01-28 20:32       ` Helge Kreutzmann
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2024-01-28 20:22 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: alx.manpages, mario.blaettermann, linux-man

On Sat, Nov 11, 2023 at 07:45:02PM +0000, Helge Kreutzmann wrote:
> On Sat, Nov 11, 2023 at 08:38:59PM +0100, Alejandro Colomar wrote:
> > Hi Helge,
> > 
> > On Wed, Nov 01, 2023 at 02:02:13PM +0000, Helge Kreutzmann wrote:
> > > Without further ado, the following was found:
> > > 
> > > Issue:    ISO 4873 → ISO/IEC 4873
> > 
> > For all the reports about ISO -> ISO/IEC, I'd appreciate if you could
> > add a link to an official document that shows that it's the correct
> > name.  I'd include that in the commit message.
> 
> Simply go to 
> https://www.iso.org
> 
> and enter the number in the search field.
> 
> For this one you will get:
> https://www.iso.org/standard/10859.html
> 
> Is this sufficient?

Hi Helge!

Yes (but a bit tedious; thus the delay).  I've finally done the change.
I hope you'll enjoy the improved consistency.  :)

You can find the changes in this branch:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=76fcba21723fcb1f1281babee2fa5308bbc5ef2b>

In a few days, I'll merge to master.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>
Looking for a remote C programming job at the moment.

* Re: Issue in man page charsets.7
  2024-01-28 20:22     ` Alejandro Colomar
@ 2024-01-28 20:32       ` Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2024-01-28 20:32 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: alx.manpages, mario.blaettermann, linux-man

Hello Alex,
thanks for keeping on track. And no, in German we say "good things
need a while". So I'm happy that you still have/had them on the list.

Greetings

           Helge
-- 
      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/

* Issue in man page charsets.7
@ 2024-11-17 10:46 Helge Kreutzmann
  2024-11-17 14:47 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2024-11-17 10:46 UTC (permalink / raw)
  To: alx; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    (it) is \\[aq]/\\[aq]s correct? (the final s is an English plural s)

"Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other byte is "
"the head of a code.  Note that the only way ASCII bytes occur in a UTF-8 "
"stream, is as themselves.  In particular, there are no embedded NULs "
"(\\[aq]\\[rs]0\\[aq]) or \\[aq]/\\[aq]s that form part of some larger code."
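
The two properties described in the quoted paragraph can be illustrated with a minimal Python sketch. The helper names `is_tail`, `resync`, and `find_ascii` are hypothetical and chosen only for this example; nothing here comes from the man page itself.

```python
def is_tail(byte: int) -> bool:
    """True for a continuation ("tail") byte of the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def resync(data: bytes, pos: int) -> int:
    """Return the index of the first head byte at or after pos.

    Starting anywhere in a UTF-8 stream, skipping tail bytes is enough
    to land on the start of the next code.
    """
    while pos < len(data) and is_tail(data[pos]):
        pos += 1
    return pos


def find_ascii(data: bytes, char: str) -> list[int]:
    """Byte offsets of an ASCII character in a UTF-8 stream.

    Every hit is a real occurrence of the character, because bytes
    below 0x80 never appear inside a multi-byte code.
    """
    return [i for i, b in enumerate(data) if b == ord(char)]
```

For example, `"a/é€/".encode("utf-8")` yields the bytes 61 2F C3 A9 E2 82 AC 2F. Calling `resync` on them at offset 3 (the tail byte A9) returns 4, and `find_ascii(..., "/")` returns `[1, 7]`.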

* Re: Issue in man page charsets.7
  2024-11-17 10:46 Helge Kreutzmann
@ 2024-11-17 14:47 ` Alejandro Colomar
  2024-11-17 15:07   ` Helge Kreutzmann
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-17 14:47 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: mario.blaettermann, linux-man

Hi Helge,

On Sun, Nov 17, 2024 at 10:46:25AM GMT, Helge Kreutzmann wrote:
> Without further ado, the following was found:
> 
> Issue:    (it) is \\[aq]/\\[aq]s correct? (the final s is an English plural s)

Would you mind clarifying the report?  I don't understand it.  Thanks!

Cheers,
Alex

> 
> "Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other byte is "
> "the head of a code.  Note that the only way ASCII bytes occur in a UTF-8 "
> "stream, is as themselves.  In particular, there are no embedded NULs "
> "(\\[aq]\\[rs]0\\[aq]) or \\[aq]/\\[aq]s that form part of some larger code."

-- 
<https://www.alejandro-colomar.es/>

* Re: Issue in man page charsets.7
  2024-11-17 14:47 ` Alejandro Colomar
@ 2024-11-17 15:07   ` Helge Kreutzmann
  2024-11-17 15:16     ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2024-11-17 15:07 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: mario.blaettermann, linux-man

Hello Alejandro,
On Sun, Nov 17, 2024 at 03:47:27PM +0100, Alejandro Colomar wrote:
> On Sun, Nov 17, 2024 at 10:46:25AM GMT, Helge Kreutzmann wrote:
> > Without further ado, the following was found:
> > 
> > Issue:    (it) is \\[aq]/\\[aq]s correct? (the final s is an English plural s)
> 
> Would you mind clarifying the report?  I don't understand it.  Thanks!
> 
> Cheers,
> Alex
> 
> > 
> > "Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other byte is "
> > "the head of a code.  Note that the only way ASCII bytes occur in a UTF-8 "
> > "stream, is as themselves.  In particular, there are no embedded NULs "
> > "(\\[aq]\\[rs]0\\[aq]) or \\[aq]/\\[aq]s that form part of some larger code."

As I understand it, the reporter is wondering if the "s" after \\[aq]/\\[aq] 
is correct. For the (\\[aq]\\[rs]0\\[aq]) there is no (plural) "s" and
here grammar (which probably dictates a plural s) and clarity are a
little in conflict.

Greetings

      Helge

-- 
      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/

* Re: Issue in man page charsets.7
  2024-11-17 15:07   ` Helge Kreutzmann
@ 2024-11-17 15:16     ` Alejandro Colomar
  0 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-17 15:16 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: mario.blaettermann, linux-man

Hi Helge,

On Sun, Nov 17, 2024 at 03:07:30PM GMT, Helge Kreutzmann wrote:
> Hello Alejandro,
> On Sun, Nov 17, 2024 at 03:47:27PM +0100, Alejandro Colomar wrote:
> > On Sun, Nov 17, 2024 at 10:46:25AM GMT, Helge Kreutzmann wrote:
> > > Without further ado, the following was found:
> > > 
> > > Issue:    (it) is \\[aq]/\\[aq]s correct? (the final s is an English plural s)
> > 
> > Would you mind clarifying the report?  I don't understand it.  Thanks!
> > 
> > Cheers,
> > Alex
> > 
> > > 
> > > "Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other byte is "
> > > "the head of a code.  Note that the only way ASCII bytes occur in a UTF-8 "
> > > "stream, is as themselves.  In particular, there are no embedded NULs "
> > > "(\\[aq]\\[rs]0\\[aq]) or \\[aq]/\\[aq]s that form part of some larger code."
> 
> As I understand it, the reporter is wondering if the "s" after \\[aq]/\\[aq] 
> is correct. For the (\\[aq]\\[rs]0\\[aq]) there is no (plural) "s" and

The '\0' is in a parenthetical, but '/'s has the plural like NULs.
Let's keep it like that.  :)

Cheers,
Alex

> here grammar (which probably dictates a plural s) and clarity are a
> little in conflict.
> 
> Greetings
> 
>       Helge
> 
> -- 
>       Dr. Helge Kreutzmann                     debian@helgefjell.de
>            Dipl.-Phys.                   http://www.helgefjell.de/debian.php
>         64bit GNU powered                     gpg signed mail preferred
>            Help keep free software "libre": http://www.ffii.de/



-- 
<https://www.alejandro-colomar.es/>

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 646:1991 → ISO/IEC 646:1991-12

"ASCII (American Standard Code For Information Interchange) is the original 7-"
"bit character set, originally designed for American English.  Also known as "
"US-ASCII.  It is currently described by the ISO 646:1991 IRV (International "
"Reference Version) standard."

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 2375 → ISO/IEC 2375

"A 94-character set is designated as GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets.  For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."
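
The designation sequences listed in the quoted paragraph follow a simple pattern, which the following Python sketch spells out. The function name `designate` and the constants are hypothetical, and the sketch only builds the byte strings; it does not check that the final symbol is actually registered.

```python
ESC = b"\x1b"
# Intermediate bytes that designate a 94-character set as G0, G1, G2, G3.
INTERMEDIATE = b"()*+"


def designate(n: int, final: bytes) -> bytes:
    """Build the escape sequence designating set `final` as Gn (n in 0..3)."""
    if not 0 <= n <= 3:
        raise ValueError("n must be 0, 1, 2 or 3")
    return ESC + INTERMEDIATE[n:n + 1] + final
```

With this, `designate(0, b"B")` gives ESC ( B (ASCII as G0) and `designate(0, b"!A")` gives ESC ( ! A (the Cuban set as G0).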

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 8859 → ISO/IEC 8859

"For most texts in ISO 8859 character sets, this means that the characters "
"outside of ASCII are now coded with two bytes.  This tends to expand "
"ordinary text files by only one or two percent.  For Russian or Greek texts, "
"this expands ordinary text files by 100%, since text in those languages is "
"mostly outside of ASCII.  For Japanese users this means that the 16-bit "
"codes now in common use will take three bytes.  While there are algorithmic "
"conversions from some character sets (especially ISO 8859-1) to Unicode, "
"general conversion requires carrying around conversion tables, which can be "
"quite large for 16-bit codes."
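
The size effect described in the quoted paragraph can be measured directly. The sketch below is a rough illustration in Python; the function name `utf8_growth` is hypothetical, and the codec names are those of Python's standard `codecs` registry.

```python
def utf8_growth(text: str, legacy: str) -> float:
    """Size of text in UTF-8 divided by its size in a legacy encoding."""
    return len(text.encode("utf-8")) / len(text.encode(legacy))


# Pure Cyrillic text doubles in size (a 100% expansion).
russian = utf8_growth("Привет", "iso8859_5")

# Two-byte EUC-JP codes become three bytes each.
japanese = utf8_growth("日本", "euc_jp")
```

Here `russian` is 2.0 and `japanese` is 1.5. For text that is mostly ASCII, the ratio stays close to 1.0, since only the non-ASCII characters grow.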

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO → ISO/IEC

"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice.  This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1).  Several ISO 2022-based character encodings have been defined, "
"especially for Japanese."

* Issue in man page charsets.7
@ 2023-03-11 17:14 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:14 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 8859 → ISO/IEC 8859

"For most texts in ISO 8859 character sets, this means that the characters "
"outside of ASCII are now coded with two bytes.  This tends to expand "
"ordinary text files by only one or two percent.  For Russian or Greek texts, "
"this expands ordinary text files by 100%, since text in those languages is "
"mostly outside of ASCII.  For Japanese users this means that the 16-bit "
"codes now in common use will take three bytes.  While there are algorithmic "
"conversions from some character sets (especially ISO 8859-1) to Unicode, "
"general conversion requires carrying around conversion tables, which can be "
"quite large for 16-bit codes."

* Issue in man page charsets.7
@ 2023-03-11 17:14 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:14 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 10646 → ISO/IEC 10646

"Unicode (ISO 10646) is a standard which aims to unambiguously represent "
"every character in every human language.  Unicode's structure permits 20.1 "
"bits to encode every character.  Since most computers don't include 20.1-bit "
"integers, Unicode is usually encoded as 32-bit integers internally and "
"either a series of 16-bit integers (UTF-16) (needing two 16-bit integers "
"only when encoding certain rare characters) or a series of 8-bit bytes "
"(UTF-8)."

"A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy is "
"assembled into 00000xxx xxyyyyyy.  A byte 1110xxxx is the start of a 3-byte "
"code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled into xxxxyyyy yyzzzzzz.  "
"(When UTF-8 is used to code the 31-bit ISO 10646 then this progression "
"continues up to 6-byte codes.)"
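
The bit patterns in the second quoted paragraph translate almost directly into code. The following Python sketch covers only the 1-, 2-, and 3-byte forms described there; the name `decode_head` is hypothetical, and the sketch assumes well-formed input (it does not check that the following bytes really are 10xxxxxx tails).

```python
def decode_head(data: bytes) -> int:
    """Decode one 1-, 2- or 3-byte UTF-8 code at the start of data."""
    b0 = data[0]
    if b0 < 0x80:
        # 0xxxxxxx: an ASCII byte stands for itself.
        return b0
    if (b0 & 0xE0) == 0xC0:
        # 110xxxxx 10yyyyyy -> 00000xxx xxyyyyyy
        return ((b0 & 0x1F) << 6) | (data[1] & 0x3F)
    if (b0 & 0xF0) == 0xE0:
        # 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyy yyzzzzzz
        return ((b0 & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
    raise ValueError("tail byte or longer code: not handled in this sketch")
```

For instance, the bytes C3 A9 decode to 0xE9 (é) and E2 82 AC decode to 0x20AC (€).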

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 2375 → ISO/IEC 2375

"A 94-character set is designated as GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets.  For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."
msgstr ""
"Ein 94-Zeichen Satz wird durch eine Escape-Sequenz ESC ( xx (für G0), ESC ) "
"xx (für G1), ESC * xx (für G2), ESC + xx (für G3) bezeichnet, wobei xx ein "
"im internationalen Register von kodierten Zeichensätzen in ISO/IEC 2375 "
"gefundenes Symbol oder ein Paar von Symbolen ist. Beispielsweise wählt ESC "
"( @ den ISO-646-Zeichensatz als G0, ESC ( A wählt den UK-Standardzeichensatz "
"(mit Pfundzeichen statt des Nummernzeichens), ESC ( B wählt ASCII (mit "
"Dollarzeichen anstelle des Währungszeichens), ESC ( M wählt einen "
"Zeichensatz für afrikanische Sprachen ESC ( ! A wählt den kubanischen "
"Zeichensatz und so weiter."

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 4873 → ISO/IEC 4873

"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
"the high order bit set.  In particular, B<\\[ha]N> and B<\\[ha]O> are not "
"used anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, "
"ESC + xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 646:1991 → ISO/IEC 646:1991-12

"ASCII (American Standard Code For Information Interchange) is the original 7-"
"bit character set, originally designed for American English.  Also known as "
"US-ASCII.  It is currently described by the ISO 646:1991 IRV (International "
"Reference Version) standard."

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  2023-03-11 23:26 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    „“ are not old-style, they are the current quotation marks required by German orthography

"Latin-1 covers many European languages such as Albanian, Basque, Danish, "
"English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian, "
"Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ, "
"French œ, and old-style „German“ quotation marks was considered tolerable."

* Re: Issue in man page charsets.7
  2023-03-11 17:13 Helge Kreutzmann
@ 2023-03-11 23:26 ` Alejandro Colomar
  2023-03-12  5:08   ` Helge Kreutzmann
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2023-03-11 23:26 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: mario.blaettermann, linux-man


Hi Helge,

On 3/11/23 18:13, Helge Kreutzmann wrote:
> Without further ado, the following was found:
> 
> Issue:    „“ are not old-style, they are the current quotation marks required by German orthography
> 
> "Latin-1 covers many European languages such as Albanian, Basque, Danish, "
> "English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian, "
> "Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ, "
> "French œ, and old-style „German“ quotation marks was considered tolerable."

Please suggest what fix you would apply.

Thanks,

Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

* Re: Issue in man page charsets.7
  2023-03-11 23:26 ` Alejandro Colomar
@ 2023-03-12  5:08   ` Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-12  5:08 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: mario.blaettermann, linux-man

Hello Alex,
On Sun, Mar 12, 2023 at 12:26:13AM +0100, Alejandro Colomar wrote:
> Hi Helge,
> 
> On 3/11/23 18:13, Helge Kreutzmann wrote:
> > Without further ado, the following was found:
> > 
> > Issue:    „“ are not old-style, they are the current quotation marks required by German orthography
> > 
> > "Latin-1 covers many European languages such as Albanian, Basque, Danish, "
> > "English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian, "
> > "Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ, "
> > "French œ, and old-style „German“ quotation marks was considered tolerable."
> 
> Please suggest what fix you would apply.

old-style → standard

Greetings

      Helge


-- 
      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  2023-03-11 23:27 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO → ISO/IEC

"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice.  This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1).  Several ISO 2022-based character encodings have been defined, "
"especially for Japanese."

* Re: Issue in man page charsets.7
  2023-03-11 17:13 Helge Kreutzmann
@ 2023-03-11 23:27 ` Alejandro Colomar
  2023-03-12  5:14   ` Helge Kreutzmann
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2023-03-11 23:27 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: mario.blaettermann, linux-man


Hi Helge,

On 3/11/23 18:13, Helge Kreutzmann wrote:
> Without further ado, the following was found:
> 
> Issue:    ISO → ISO/IEC

I've already seen several reports about ISO -> ISO/IEC in several pages
from several people.  I'd like someone who knows about these standards
to take a look at all the man pages and suggest a global fix about this.

Thanks,

Alex

> 
> "The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
> "practice.  This model is (partially) supported by the Linux kernel and by "
> "B<xterm>(1).  Several ISO 2022-based character encodings have been defined, "
> "especially for Japanese."

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

* Re: Issue in man page charsets.7
  2023-03-11 23:27 ` Alejandro Colomar
@ 2023-03-12  5:14   ` Helge Kreutzmann
  2023-03-12 11:28     ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-12  5:14 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: mario.blaettermann, linux-man

Hello Alex,
On Sun, Mar 12, 2023 at 12:27:36AM +0100, Alejandro Colomar wrote:
> On 3/11/23 18:13, Helge Kreutzmann wrote:
> > Without further ado, the following was found:
> > 
> > Issue:    ISO → ISO/IEC
> 
> I've already seen several reports about ISO -> ISO/IEC in several pages
> from several people.  I'd like someone who knows about these standards
> to take a look at all the man pages and suggest a global fix about this.

Well, *most likely* the global fix is to always replace "ISO" by
"ISO/IEC" in the man pages.

Rationale:
Almost all relevant standards in the IT domain are prepared by the
Joint Technical Committee 1 of ISO and IEC (ISO/IEC JTC 1). Hence
they always carry an "ISO/IEC" prefix.

But there *may* be exceptions. Thus I always check each individual
case (and hence made several reports). You can simply do this by going
to http://www.iso.org and entering the number in the search box.

If this is too tiresome, then a global fix of ISO → ISO/IEC is most
likely the correct fix.

All I can do is to review each occurrence I note and point you to this.

Greetings

         Helge

P.S. If I should explain this even more verbosely, I can, please let
     me know. I have worked in these committees for ~15 years.

-- 
      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/

* Re: Issue in man page charsets.7
  2023-03-12  5:14   ` Helge Kreutzmann
@ 2023-03-12 11:28     ` Alejandro Colomar
  0 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2023-03-12 11:28 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: mario.blaettermann, linux-man


Hi Helge,

On 3/12/23 06:14, Helge Kreutzmann wrote:
> Hello Alex,
> On Sun, Mar 12, 2023 at 12:27:36AM +0100, Alejandro Colomar wrote:
>> On 3/11/23 18:13, Helge Kreutzmann wrote:
>>> Without further ado, the following was found:
>>>
>>> Issue:    ISO → ISO/IEC
>>
>> I've already seen several reports about ISO -> ISO/IEC in several pages
>> from several people.  I'd like someone who knows about these standards
>> to take a look at all the man pages and suggest a global fix about this.
> 
> Well, *most likely* the global fix is to always replace "ISO" by
> "ISO/IEC" in the man pages.
> 
> Rationale:
> Almost all relevant standards in the IT domain are prepared by the
> Joint Technical Committee 1 of ISO and IEC (ISO/IEC JTC 1). Hence
> they always carry an "ISO/IEC" prefix.
> 
> But there *may* be exceptions. Thus I always check each individual
> case (and hence made several reports). You can simply do this by going
> to http://www.iso.org and entering the number in the search box.

Okay.

> 
> If this is too tiresome, then a global fix of ISO → ISO/IEC is most
> likely the correct fix.
> 
> All I can do is to review each occurrence I note and point you to this.

I've listed all ISO numbers that I could find (maybe there remain a few
uncovered, but this should show at least most of them):

$ grep -rho 'ISO[^a-zA-Z()<";:\.,&/[]*' \
  | sed 's/_/ /g' \
  | sed 's/\\\\//g' \
  | sed 's/-/ /g' \
  | sed 's/[ \t]*$//' \
  | sort \
  | uniq \
  | sed 's/ /-/' \
  | sed 's/ .*//' \
  | sort \
  | uniq \
  | sed 's/-/ /' \
  | grep ' ';
ISO 10646
ISO 14652
ISO 2022
ISO 2375
ISO 3166
ISO 4217
ISO 4873
ISO 639
ISO 6429
ISO 646
ISO 6709
ISO 8208
ISO 8601
ISO 8602
ISO 8859
ISO 9660
ISO 9945


And then searching those numbers in the ISO website, I created the
following list:


iso/iec:
	10646
	14652

iso:
	2022
	2375
	3166
	4217
	4873
	639
	6429
	8208
	8601
	8602
	8859-
	9660

iso/iec/ieee:
	9945

I'll apply a global fix with this info.
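
A global fix along these lines could be sketched in Python as follows. This is only an illustration, not the script that was actually used: the names `PREFIX` and `fix_iso_names` are hypothetical, the table holds just the entries from the list above that need a different prefix, and only the plain spelling "ISO <number>" is handled (not variants with escapes, underscores, or hyphens).

```python
import re

# Prefixes taken from the list above; numbers not listed keep plain "ISO".
PREFIX = {
    "10646": "ISO/IEC",
    "14652": "ISO/IEC",
    "9945": "ISO/IEC/IEEE",
}


def fix_iso_names(text: str) -> str:
    """Rewrite 'ISO <number>' according to PREFIX.

    Names that already read 'ISO/IEC <number>' do not match the
    pattern and are left untouched.
    """
    def repl(match: re.Match) -> str:
        number = match.group(1)
        return f"{PREFIX.get(number, 'ISO')} {number}"

    return re.sub(r"\bISO (\d+)", repl, text)
```

For example, `fix_iso_names("Unicode (ISO 10646)")` returns `"Unicode (ISO/IEC 10646)"`, while `"ISO 8601"` is returned unchanged.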

> 
> Greetings
> 
>          Helge

Greetings,

Alex

> 
> P.S. If I should explain this even more verbosely, I can, please let
>      me know. I have worked in these committees for ~15 years.
> 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

* Issue in man page charsets.7
@ 2023-01-22 19:31 Helge Kreutzmann
  2023-01-29 16:45 ` Stefan Puiu
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-01-22 19:31 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    „“ are not old-style, they are the current quotation marks required by German orthography

"Latin-1 covers many West European languages such as Albanian, Basque, "
"Danish, English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian, "
"Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ, "
"French œ, and old-style „German“ quotation marks was considered tolerable."

* Re: Issue in man page charsets.7
  2023-01-22 19:31 Helge Kreutzmann
@ 2023-01-29 16:45 ` Stefan Puiu
  2023-01-29 18:35   ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Puiu @ 2023-01-29 16:45 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: alx.manpages, mario.blaettermann, linux-man

Hi Helge,

On Sun, Jan 22, 2023 at 9:39 PM Helge Kreutzmann <debian@helgefjell.de> wrote:
>
> Without further ado, the following was found:
>
> Issue:    „“ are not old-style, they are the current quotation marks required by German orthography

Those are also used in Romanian, and probably other languages as well.

>
> "Latin-1 covers many West European languages such as Albanian, Basque, "
> "Danish, English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian, "
> "Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ, "
> "French œ, and old-style „German“ quotation marks was considered tolerable."

A bit weird to include Albanian in West European languages, isn't it?
Maybe the text could be reworked to:

  "many West European languages such as Basque, Danish, [... other
languages ...] and also Albanian."

Regards,
Stefan.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Issue in man page charsets.7
  2023-01-29 16:45 ` Stefan Puiu
@ 2023-01-29 18:35   ` Alejandro Colomar
  2023-01-29 19:20     ` Bernd Petrovitsch
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2023-01-29 18:35 UTC (permalink / raw)
  To: Stefan Puiu, Helge Kreutzmann; +Cc: mario.blaettermann, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 1151 bytes --]

Hi Stefan,

On 1/29/23 17:45, Stefan Puiu wrote:
> Hi Helge,
> 
> On Sun, Jan 22, 2023 at 9:39 PM Helge Kreutzmann <debian@helgefjell.de> wrote:
>>
>> Without further ado, the following was found:
>>
>> Issue:    „“ are not old-style, they are the current quotation marks required by German orthography
> 
> Those are also used in Romanian, and probably other languages as well.
> 
>>
>> "Latin-1 covers many West European languages such as Albanian, Basque,"
>> "Danish, English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian,"
>> "Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ,"
>> "French œ, and old-style „German“ quotation marks was considered tolerable."
> 
> A bit weird to include Albanian in West European languages, isn't it?
> Maybe the text could be reworked to:
> 
>    "many West European languages such as Basque, Danish, [... other
> languages ...] and also Albanian."

I'd rather remove the "West" adjective from Europe.  It's simpler.  Does it 
sound reasonable to you?

Cheers,

Alex

> 
> Regards,
> Stefan.

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Issue in man page charsets.7
  2023-01-29 18:35   ` Alejandro Colomar
@ 2023-01-29 19:20     ` Bernd Petrovitsch
  2023-01-29 19:29       ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Bernd Petrovitsch @ 2023-01-29 19:20 UTC (permalink / raw)
  To: Alejandro Colomar, Stefan Puiu, Helge Kreutzmann
  Cc: mario.blaettermann, linux-man

Hi all!

On 29/01/2023 19:35, Alejandro Colomar wrote:
[...]
> On 1/29/23 17:45, Stefan Puiu wrote:
[...]
>> On Sun, Jan 22, 2023 at 9:39 PM Helge Kreutzmann <debian@helgefjell.de> wrote:
[...]
>>> "Latin-1 covers many West European languages such as Albanian, Basque,"
>>> "Danish, English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian,"
>>> "Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ,"
>>> "French œ, and old-style „German“ quotation marks was considered tolerable."
>>
>> A bit weird to include Albanian in West European languages, isn't it?
>> Maybe the text could be reworked to:
>>
>>    "many West European languages such as Basque, Danish, [... other
>> languages ...] and also Albanian."
> 
> I'd rather remove the "West" adjective from Europe.  It's simpler.  Does it sound reasonable to you?

And it's way more accurate:
- Albanian is Balkan.
- Icelandic, Norwegian and Swedish are Scandinavian.
- Italy is (usually) southern Europe.
- Faroese is probably also Scandinavian.
- Where is actually Galician spoken? In the north-west of Spain?

Kind regards,
	Bernd

PS: Keen to learn something.
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
      There is NO CLOUD, just other people's computers. - FSFE
                      LUGA : http://www.luga.at


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Issue in man page charsets.7
  2023-01-29 19:20     ` Bernd Petrovitsch
@ 2023-01-29 19:29       ` Alejandro Colomar
  2023-01-31 10:56         ` Stefan Puiu
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2023-01-29 19:29 UTC (permalink / raw)
  To: Bernd Petrovitsch, Stefan Puiu, Helge Kreutzmann
  Cc: mario.blaettermann, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 1513 bytes --]

Hi Bernd,

On 1/29/23 20:20, Bernd Petrovitsch wrote:
> Hi all!
> 
> On 29/01/2023 19:35, Alejandro Colomar wrote:
> [...]
>> On 1/29/23 17:45, Stefan Puiu wrote:
> [...]
>>> On Sun, Jan 22, 2023 at 9:39 PM Helge Kreutzmann <debian@helgefjell.de> wrote:
> [...]
>>>> "Latin-1 covers many West European languages such as Albanian, Basque,"
>>>> "Danish, English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian,"
>>>> "Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ,"
>>>> "French œ, and old-style „German“ quotation marks was considered tolerable."
>>>
>>> A bit weird to include Albanian in West European languages, isn't it?
>>> Maybe the text could be reworked to:
>>>
>>>    "many West European languages such as Basque, Danish, [... other
>>> languages ...] and also Albanian."
>>
>> I'd rather remove the "West" adjective from Europe.  It's simpler.  Does it 
>> sound reasonable to you?
> 
> And it's way more accurate:
> - Albanian is Balkan.
> - Icelandic, Norwegian and Swedish are Scandinavian.
> - Italy is (usually) southern Europe.
> - Faroese is probably also Scandinavian.
> - Where is actually Galician spoken? In the north-west of Spain?

Yep, Galician is the language spoken in Galicia, in the north-west of Spain.
It's a language very similar to Portuguese.

Will fix then.

> 
> Kind regards,
>      Bernd
> 
> PS: Keen to learn something.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Issue in man page charsets.7
  2023-01-29 19:29       ` Alejandro Colomar
@ 2023-01-31 10:56         ` Stefan Puiu
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Puiu @ 2023-01-31 10:56 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Bernd Petrovitsch, Helge Kreutzmann, mario.blaettermann,
	linux-man

Hi,

On Sun, Jan 29, 2023 at 9:29 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Bernd,
>
> On 1/29/23 20:20, Bernd Petrovitsch wrote:
> > Hi all!
> >
> > On 29/01/2023 19:35, Alejandro Colomar wrote:
> > [...]
> >> On 1/29/23 17:45, Stefan Puiu wrote:
> > [...]
> >>> On Sun, Jan 22, 2023 at 9:39 PM Helge Kreutzmann <debian@helgefjell.de> wrote:
> > [...]
> >>>> "Latin-1 covers many West European languages such as Albanian, Basque,"
> >>>> "Danish, English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian,"
> >>>> "Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch Ĳ/ĳ,"
> >>>> "French œ, and old-style „German“ quotation marks was considered tolerable."
> >>>
> >>> A bit weird to include Albanian in West European languages, isn't it?
> >>> Maybe the text could be reworked to:
> >>>
> >>>    "many West European languages such as Basque, Danish, [... other
> >>> languages ...] and also Albanian."
> >>
> >> I'd rather remove the "West" adjective from Europe.  It's simpler.  Does it
> >> sound reasonable to you?
> >
> > And it's way more accurate:
> > - Albanian is Balkan.
> > - Icelandic, Norwegian and Swedish are Scandinavian.
> > - Italy is (usually) southern Europe.
> > - Faroese is probably also Scandinavian.
> > - Where is actually Galician spoken? In the north-west of Spain?
>
> Yep, Galician is the language spoken in Galicia, in the north-west of Spain.
> It's a language very similar to Portuguese.
>
> Will fix then.

Well, I think it might be that the intention was to suit Western
European languages (although yes, strictly speaking there are multiple
language families involved), and Albanian just happened to be able to
use the same charset. I'm thinking of latin-1 in contrast to latin-2,
which I was using in the past for Romanian.

Stefan.

>
> >
> > Kind regards,
> >      Bernd
> >
> > PS: Keen to learn something.
>
> Cheers,
>
> Alex
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Issue in man page charsets.7
@ 2023-01-22 19:31 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-01-22 19:31 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    set → character set

"Here are brief descriptions of each set:"

"This set does not exist."

"This set covers many Southeast European languages, and most importantly "
"supports Romanian more completely than Latin-2."

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Issue in man page charsets.7
@ 2023-01-22 19:31 Helge Kreutzmann
  2023-02-05 14:28 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-01-22 19:31 UTC (permalink / raw)
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO → ISO/IEC

"ASCII (American Standard Code For Information Interchange) is the original 7-"
"bit character set, originally designed for American English.  Also known as "
"US-ASCII.  It is currently described by the ISO 646:1991 IRV (International "
"Reference Version) standard."

"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice.  This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1).  Several ISO 2022-based character encodings have been defined, "
"especially for Japanese."

"A 94-character set is designated as GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets.  For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."

"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
"the high order bit set.  In particular, B<\\(haN> and B<\\(haO> are not used "
"anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, ESC + "
"xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

"Unicode (ISO 10646) is a standard which aims to unambiguously represent "
"every character in every human language.  Unicode's structure permits 20.1 "
"bits to encode every character.  Since most computers don't include 20.1-bit "
"integers, Unicode is usually encoded as 32-bit integers internally and "
"either a series of 16-bit integers (UTF-16) (needing two 16-bit integers "
"only when encoding certain rare characters) or a series of 8-bit bytes "
"(UTF-8)."

"A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy is "
"assembled into 00000xxx xxyyyyyy.  A byte 1110xxxx is the start of a 3-byte "
"code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled into xxxxyyyy yyzzzzzz.  "
"(When UTF-8 is used to code the 31-bit ISO 10646 then this progression "
"continues up to 6-byte codes.)"

"For most texts in ISO 8859 character sets, this means that the characters "
"outside of ASCII are now coded with two bytes.  This tends to expand "
"ordinary text files by only one or two percent.  For Russian or Greek texts, "
"this expands ordinary text files by 100%, since text in those languages is "
"mostly outside of ASCII.  For Japanese users this means that the 16-bit "
"codes now in common use will take three bytes.  While there are algorithmic "
"conversions from some character sets (especially ISO 8859-1) to Unicode, "
"general conversion requires carrying around conversion tables, which can be "
"quite large for 16-bit codes."
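
The 2-byte and 3-byte UTF-8 assembly rules quoted above can be checked
with shell arithmetic.  This is only an illustrative sketch: the helper
names utf8_decode2 and utf8_decode3 are hypothetical, and the functions
assume well-formed input (they do not validate the lead or continuation
bits).

```shell
# Hypothetical helpers; each takes the bytes of one sequence as numbers.

utf8_decode2() {
    # 110xxxxx 10yyyyyy -> 00000xxx xxyyyyyy
    printf 'U+%04X\n' $(( (($1 & 0x1F) << 6) | ($2 & 0x3F) ))
}

utf8_decode3() {
    # 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyy yyzzzzzz
    printf 'U+%04X\n' $(( (($1 & 0x0F) << 12) | (($2 & 0x3F) << 6) | ($3 & 0x3F) ))
}

utf8_decode2 0xC3 0xA9        # prints U+00E9 (é)
utf8_decode3 0xE2 0x82 0xAC   # prints U+20AC (€)
```

The masks 0x1F, 0x0F, and 0x3F strip the 110, 1110, and 10 marker bits
respectively, leaving only the x, y, and z payload bits to be shifted
into place.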

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Issue in man page charsets.7
  2023-01-22 19:31 Helge Kreutzmann
@ 2023-02-05 14:28 ` Alejandro Colomar
  2023-02-05 14:49   ` Helge Kreutzmann
  0 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2023-02-05 14:28 UTC (permalink / raw)
  To: Helge Kreutzmann; +Cc: mario.blaettermann, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 3412 bytes --]

Hi Helge,

On 1/22/23 20:31, Helge Kreutzmann wrote:
> Without further ado, the following was found:
> 
> Issue:    ISO → ISO/IEC

Please someone write a documented patch for this one.

Cheers,

Alex

> 
> "ASCII (American Standard Code For Information Interchange) is the original 7-"
> "bit character set, originally designed for American English.  Also known as"
> "US-ASCII.  It is currently described by the ISO 646:1991 IRV (International"
> "Reference Version) standard."
> 
> "The ISO 2022 and 4873 standards describe a font-control model based on VT100"
> "practice.  This model is (partially) supported by the Linux kernel and by"
> "B<xterm>(1).  Several ISO 2022-based character encodings have been defined,"
> "especially for Japanese."
> 
> "A 94-character set is designated as GI<n> character set by an escape"
> "sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx"
> "(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375"
> "International Register of Coded Character Sets.  For example, ESC ( @"
> "selects the ISO 646 character set as G0, ESC ( A selects the UK standard"
> "character set (with pound instead of number sign), ESC ( B selects ASCII"
> "(with dollar instead of currency sign), ESC ( M selects a character set for"
> "African languages, ESC ( ! A selects the Cuban character set, and so on."
> 
> "ISO 4873 stipulates a narrower use of character sets, where G0 is fixed"
> "(always ASCII), so that G1, G2, and G3 can be invoked only for codes with"
> "the high order bit set.  In particular, B<\\(haN> and B<\\(haO> are not used"
> "anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, ESC +"
> "xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."
> 
> "Unicode (ISO 10646) is a standard which aims to unambiguously represent"
> "every character in every human language.  Unicode's structure permits 20.1"
> "bits to encode every character.  Since most computers don't include 20.1-bit"
> "integers, Unicode is usually encoded as 32-bit integers internally and"
> "either a series of 16-bit integers (UTF-16) (needing two 16-bit integers"
> "only when encoding certain rare characters) or a series of 8-bit bytes"
> "(UTF-8)."
> 
> "A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy is"
> "assembled into 00000xxx xxyyyyyy.  A byte 1110xxxx is the start of a 3-byte"
> "code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled into xxxxyyyy yyzzzzzz."
> "(When UTF-8 is used to code the 31-bit ISO 10646 then this progression"
> "continues up to 6-byte codes.)"
> 
> "For most texts in ISO 8859 character sets, this means that the characters"
> "outside of ASCII are now coded with two bytes.  This tends to expand"
> "ordinary text files by only one or two percent.  For Russian or Greek texts,"
> "this expands ordinary text files by 100%, since text in those languages is"
> "mostly outside of ASCII.  For Japanese users this means that the 16-bit"
> "codes now in common use will take three bytes.  While there are algorithmic"
> "conversions from some character sets (especially ISO 8859-1) to Unicode,"
> "general conversion requires carrying around conversion tables, which can be"
> "quite large for 16-bit codes."

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Issue in man page charsets.7
  2023-02-05 14:28 ` Alejandro Colomar
@ 2023-02-05 14:49   ` Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-02-05 14:49 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: mario.blaettermann, linux-man

[-- Attachment #1: Type: text/plain, Size: 930 bytes --]

Hello Alex,
On Sun, Feb 05, 2023 at 03:28:45PM +0100, Alejandro Colomar wrote:
> Hi Helge,
> 
> On 1/22/23 20:31, Helge Kreutzmann wrote:
> > Without further ado, the following was found:
> > 
> > Issue:    ISO → ISO/IEC
> 
> Please someone write a documented patch for this one.

These standards are all written in the same committee, hence they are
all from "ISO/IEC".

So pick your favourite editor, and do a global search and replace,
i.e. ISO → ISO/IEC

If you still feel uncomfortable with any occurrence, put the number
into the search field on www.iso.org and check yourself.

Greetings

          Helge

-- 
      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2024-11-17 15:16 UTC | newest]

Thread overview: 35+ messages
2023-11-01 14:02 Issue in man page charsets.7 Helge Kreutzmann
2023-11-11 19:38 ` Alejandro Colomar
2023-11-11 19:45   ` Helge Kreutzmann
2024-01-28 20:22     ` Alejandro Colomar
2024-01-28 20:32       ` Helge Kreutzmann
  -- strict thread matches above, loose matches on Subject: below --
2024-11-17 10:46 Helge Kreutzmann
2024-11-17 14:47 ` Alejandro Colomar
2024-11-17 15:07   ` Helge Kreutzmann
2024-11-17 15:16     ` Alejandro Colomar
2023-11-01 14:02 Helge Kreutzmann
2023-11-01 14:02 Helge Kreutzmann
2023-11-01 14:02 Helge Kreutzmann
2023-11-01 14:02 Helge Kreutzmann
2023-03-11 17:14 Helge Kreutzmann
2023-03-11 17:14 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 23:26 ` Alejandro Colomar
2023-03-12  5:08   ` Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 23:27 ` Alejandro Colomar
2023-03-12  5:14   ` Helge Kreutzmann
2023-03-12 11:28     ` Alejandro Colomar
2023-01-22 19:31 Helge Kreutzmann
2023-01-29 16:45 ` Stefan Puiu
2023-01-29 18:35   ` Alejandro Colomar
2023-01-29 19:20     ` Bernd Petrovitsch
2023-01-29 19:29       ` Alejandro Colomar
2023-01-31 10:56         ` Stefan Puiu
2023-01-22 19:31 Helge Kreutzmann
2023-01-22 19:31 Helge Kreutzmann
2023-02-05 14:28 ` Alejandro Colomar
2023-02-05 14:49   ` Helge Kreutzmann
