public inbox for linux-man@vger.kernel.org
* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  2023-11-11 19:38 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 4873 → ISO/IEC 4873

"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
"the high order bit set.  In particular, B<\\[ha]N> and B<\\[ha]O> are not "
"used anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, "
"ESC + xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

* Issue in man page charsets.7
@ 2024-11-17 10:46 Helge Kreutzmann
  2024-11-17 14:47 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2024-11-17 10:46 UTC
  To: alx; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    (it) Is \\[aq]/\\[aq]s correct?  (The final s is an English plural s.)

"Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other byte is "
"the head of a code.  Note that the only way ASCII bytes occur in a UTF-8 "
"stream, is as themselves.  In particular, there are no embedded NULs "
"(\\[aq]\\[rs]0\\[aq]) or \\[aq]/\\[aq]s that form part of some larger code."
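
As a rough illustration of the two properties the quoted paragraph describes, here is a minimal Python sketch. The helper names `is_tail` and `resync` are invented for this example and do not come from the man page; the sketch assumes well-formed UTF-8 input and does no validation.

```python
def is_tail(byte: int) -> bool:
    """True for a continuation ("tail") byte, i.e. one of the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def resync(data: bytes, pos: int) -> int:
    """Back up from an arbitrary offset to the head byte of the code containing it."""
    while pos > 0 and is_tail(data[pos]):
        pos -= 1
    return pos


# 'a', '/', EURO SIGN (3 bytes), NUL, 'e' with acute accent (2 bytes)
sample = "a/€\0é"
data = sample.encode("utf-8")

# Landing in the middle of the euro sign, we can find its head byte again.
head = resync(data, 4)
print(head, data[head:head + 3].decode("utf-8"))

# ASCII bytes occur only as themselves: every byte below 0x80 in the stream
# is a real ASCII character, so counting '/' or NUL bytes is safe.
print(data.count(b"/") == sample.count("/"))
print(data.count(b"\0") == sample.count("\0"))
```

This is why byte-oriented code that searches for '/' or NUL (path handling, C strings) keeps working on UTF-8 data without knowing about the encoding.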

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 646:1991 → ISO/IEC 646:1991-12

"ASCII (American Standard Code For Information Interchange) is the original 7-"
"bit character set, originally designed for American English.  Also known as "
"US-ASCII.  It is currently described by the ISO 646:1991 IRV (International "
"Reference Version) standard."

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 2375 → ISO/IEC 2375

"A 94-character set is designated as GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets.  For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."
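
To make the designation sequences in the quoted paragraph concrete, here is a small Python sketch that builds them as byte strings. It assumes that the spaces in the man page notation ("ESC ( B") are only for readability and are not transmitted; the names `INTERMEDIATE_94` and `designate_94` are hypothetical.

```python
ESC = b"\x1b"

# Intermediate byte selecting which of G0..G3 a 94-character set is
# designated as, following the list in the quoted paragraph.
INTERMEDIATE_94 = {0: b"(", 1: b")", 2: b"*", 3: b"+"}


def designate_94(g: int, xx: str) -> bytes:
    """Build the escape sequence that designates set `xx` as G<g>.

    `xx` is the symbol (or pair of symbols) from the register, e.g. "B"
    for ASCII or "!A" for the Cuban set mentioned in the text.
    """
    if g not in INTERMEDIATE_94:
        raise ValueError("g must be 0, 1, 2 or 3")
    return ESC + INTERMEDIATE_94[g] + xx.encode("ascii")


print(designate_94(0, "B"))    # ASCII as G0
print(designate_94(0, "A"))    # UK set as G0
print(designate_94(0, "!A"))   # Cuban set as G0
```

The sketch only constructs the sequences; whether a given terminal honours them depends on how much of the model it implements.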

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 8859 → ISO/IEC 8859

"For most texts in ISO 8859 character sets, this means that the characters "
"outside of ASCII are now coded with two bytes.  This tends to expand "
"ordinary text files by only one or two percent.  For Russian or Greek texts, "
"this expands ordinary text files by 100%, since text in those languages is "
"mostly outside of ASCII.  For Japanese users this means that the 16-bit "
"codes now in common use will take three bytes.  While there are algorithmic "
"conversions from some character sets (especially ISO 8859-1) to Unicode, "
"general conversion requires carrying around conversion tables, which can be "
"quite large for 16-bit codes."
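
The size claims in the quoted paragraph can be checked with a short Python sketch. The function name `utf8_growth` and the sample strings are made up for illustration; the codecs (`latin-1`, `iso8859_7`, `euc_jp`) are those shipped with Python's standard library. Note that the short German sample is deliberately dense in non-ASCII letters, so it grows far more than the one or two percent typical of ordinary text.

```python
def utf8_growth(text: str, legacy_codec: str) -> float:
    """Relative size increase when `text` moves from a legacy codec to UTF-8."""
    legacy_size = len(text.encode(legacy_codec))
    utf8_size = len(text.encode("utf-8"))
    return utf8_size / legacy_size - 1.0


# Mostly-ASCII Latin-1 text: only the three non-ASCII letters grow to two bytes.
print(utf8_growth("Grüße aus Köln", "latin-1"))

# Greek text: every letter is outside ASCII, so the size doubles.
print(utf8_growth("Καλημέρα", "iso8859_7"))

# Japanese: two-byte legacy codes become three bytes each.
print(utf8_growth("日本", "euc_jp"))

# The "algorithmic conversion" for ISO 8859-1: each byte value is
# numerically equal to its Unicode code point, so no table is needed.
print(all(bytes([b]).decode("latin-1") == chr(b) for b in range(256)))
```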

* Issue in man page charsets.7
@ 2023-11-01 14:02 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-11-01 14:02 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO → ISO/IEC

"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice.  This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1).  Several ISO 2022-based character encodings have been defined, "
"especially for Japanese."

* Issue in man page charsets.7
@ 2023-03-11 17:14 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:14 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 8859 → ISO/IEC 8859

"For most texts in ISO 8859 character sets, this means that the characters "
"outside of ASCII are now coded with two bytes.  This tends to expand "
"ordinary text files by only one or two percent.  For Russian or Greek texts, "
"this expands ordinary text files by 100%, since text in those languages is "
"mostly outside of ASCII.  For Japanese users this means that the 16-bit "
"codes now in common use will take three bytes.  While there are algorithmic "
"conversions from some character sets (especially ISO 8859-1) to Unicode, "
"general conversion requires carrying around conversion tables, which can be "
"quite large for 16-bit codes."

* Issue in man page charsets.7
@ 2023-03-11 17:14 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:14 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 10646 → ISO/IEC 10646

"Unicode (ISO 10646) is a standard which aims to unambiguously represent "
"every character in every human language.  Unicode's structure permits 20.1 "
"bits to encode every character.  Since most computers don't include 20.1-bit "
"integers, Unicode is usually encoded as 32-bit integers internally and "
"either a series of 16-bit integers (UTF-16) (needing two 16-bit integers "
"only when encoding certain rare characters) or a series of 8-bit bytes "
"(UTF-8)."
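
The "20.1 bits" figure in the quoted paragraph appears to correspond to the size of the Unicode code space, U+0000 through U+10FFFF. The Python sketch below works under that assumption and also shows the UTF-16 behaviour the paragraph mentions; `utf16_units` is a hypothetical helper name.

```python
import math

CODE_POINTS = 0x110000  # U+0000 through U+10FFFF, i.e. 17 planes of 65536

# Number of bits needed to distinguish that many values.
bits = math.log2(CODE_POINTS)
print(round(bits, 1))


def utf16_units(ch: str) -> int:
    """Number of 16-bit integers UTF-16 uses for a single character."""
    return len(ch.encode("utf-16-le")) // 2


print(utf16_units("A"))           # Basic Multilingual Plane: one unit
print(utf16_units("€"))           # still one unit
print(utf16_units("\U0001F600"))  # outside the BMP: a surrogate pair
```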

"A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy is "
"assembled into 00000xxx xxyyyyyy.  A byte 1110xxxx is the start of a 3-byte "
"code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled into xxxxyyyy yyzzzzzz.  "
"(When UTF-8 is used to code the 31-bit ISO 10646 then this progression "
"continues up to 6-byte codes.)"
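
The bit assembly described in the quoted paragraph can be written out directly. This is a minimal Python sketch, not a full decoder: the function names are invented, and it does not check that the head and tail bytes actually have the expected 110, 1110 and 10 prefixes.

```python
def assemble_2byte(b0: int, b1: int) -> int:
    """110xxxxx 10yyyyyy -> 00000xxx xxyyyyyy"""
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def assemble_3byte(b0: int, b1: int, b2: int) -> int:
    """1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyy yyzzzzzz"""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


# U+00E9 (e with acute accent) is encoded as C3 A9.
print(hex(assemble_2byte(0xC3, 0xA9)))

# U+20AC (euro sign) is encoded as E2 82 AC.
print(hex(assemble_3byte(0xE2, 0x82, 0xAC)))
```

Both results can be cross-checked against Python's own decoder, for example `ord(bytes([0xC3, 0xA9]).decode("utf-8"))`.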

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 2375 → ISO/IEC 2375

"A 94-character set is designated as GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets.  For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."
msgstr ""
"Ein 94-Zeichen Satz wird durch eine Escape-Sequenz ESC ( xx (für G0), ESC ) "
"xx (für G1), ESC * xx (für G2), ESC + xx (für G3) bezeichnet, wobei xx ein "
"im internationalen Register von kodierten Zeichensätzen in ISO/IEC 2375 "
"gefundenes Symbol oder ein Paar von Symbolen ist. Beispielsweise wählt ESC "
"( @ den ISO-646-Zeichensatz als G0, ESC ( A wählt den UK-Standardzeichensatz "
"(mit Pfundzeichen statt des Nummernzeichens), ESC ( B wählt ASCII (mit "
"Dollarzeichen anstelle des Währungszeichens), ESC ( M wählt einen "
"Zeichensatz für afrikanische Sprachen ESC ( ! A wählt den kubanischen "
"Zeichensatz und so weiter."

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 4873 → ISO/IEC 4873

"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
"the high order bit set.  In particular, B<\\[ha]N> and B<\\[ha]O> are not "
"used anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, "
"ESC + xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO 646:1991 → ISO/IEC 646:1991-12

"ASCII (American Standard Code For Information Interchange) is the original 7-"
"bit character set, originally designed for American English.  Also known as "
"US-ASCII.  It is currently described by the ISO 646:1991 IRV (International "
"Reference Version) standard."

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  2023-03-11 23:26 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    „“ are not old-style; they are the current quotation marks required by German orthography

"Latin-1 covers many European languages such as Albanian, Basque, Danish, "
"English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian, "
"Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch IJ/ij, "
"French œ, and old-style „German“ quotation marks was considered tolerable."

* Issue in man page charsets.7
@ 2023-03-11 17:13 Helge Kreutzmann
  2023-03-11 23:27 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-03-11 17:13 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO → ISO/IEC

"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice.  This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1).  Several ISO 2022-based character encodings have been defined, "
"especially for Japanese."

* Issue in man page charsets.7
@ 2023-01-22 19:31 Helge Kreutzmann
  2023-01-29 16:45 ` Stefan Puiu
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-01-22 19:31 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    „“ are not old-style; they are the current quotation marks required by German orthography

"Latin-1 covers many West European languages such as Albanian, Basque, "
"Danish, English, Faroese, Galician, Icelandic, Irish, Italian, Norwegian, "
"Portuguese, Spanish, and Swedish.  The lack of the ligatures Dutch IJ/ij, "
"French œ, and old-style „German“ quotation marks was considered tolerable."

* Issue in man page charsets.7
@ 2023-01-22 19:31 Helge Kreutzmann
  0 siblings, 0 replies; 35+ messages in thread
From: Helge Kreutzmann @ 2023-01-22 19:31 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    set → character set

"Here are brief descriptions of each set:"

"This set does not exist."

"This set covers many Southeast European languages, and most importantly "
"supports Romanian more completely than Latin-2."

* Issue in man page charsets.7
@ 2023-01-22 19:31 Helge Kreutzmann
  2023-02-05 14:28 ` Alejandro Colomar
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Kreutzmann @ 2023-01-22 19:31 UTC
  To: alx.manpages; +Cc: mario.blaettermann, linux-man

Without further ado, the following was found:

Issue:    ISO → ISO/IEC

"ASCII (American Standard Code For Information Interchange) is the original 7-"
"bit character set, originally designed for American English.  Also known as "
"US-ASCII.  It is currently described by the ISO 646:1991 IRV (International "
"Reference Version) standard."

"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice.  This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1).  Several ISO 2022-based character encodings have been defined, "
"especially for Japanese."

"A 94-character set is designated as GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets.  For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."

"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
"the high order bit set.  In particular, B<\\(haN> and B<\\(haO> are not used "
"anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, ESC + "
"xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

"Unicode (ISO 10646) is a standard which aims to unambiguously represent "
"every character in every human language.  Unicode's structure permits 20.1 "
"bits to encode every character.  Since most computers don't include 20.1-bit "
"integers, Unicode is usually encoded as 32-bit integers internally and "
"either a series of 16-bit integers (UTF-16) (needing two 16-bit integers "
"only when encoding certain rare characters) or a series of 8-bit bytes "
"(UTF-8)."

"A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy is "
"assembled into 00000xxx xxyyyyyy.  A byte 1110xxxx is the start of a 3-byte "
"code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled into xxxxyyyy yyzzzzzz.  "
"(When UTF-8 is used to code the 31-bit ISO 10646 then this progression "
"continues up to 6-byte codes.)"

"For most texts in ISO 8859 character sets, this means that the characters "
"outside of ASCII are now coded with two bytes.  This tends to expand "
"ordinary text files by only one or two percent.  For Russian or Greek texts, "
"this expands ordinary text files by 100%, since text in those languages is "
"mostly outside of ASCII.  For Japanese users this means that the 16-bit "
"codes now in common use will take three bytes.  While there are algorithmic "
"conversions from some character sets (especially ISO 8859-1) to Unicode, "
"general conversion requires carrying around conversion tables, which can be "
"quite large for 16-bit codes."


end of thread, other threads:[~2024-11-17 15:16 UTC | newest]

Thread overview: 35+ messages
2023-11-01 14:02 Issue in man page charsets.7 Helge Kreutzmann
2023-11-11 19:38 ` Alejandro Colomar
2023-11-11 19:45   ` Helge Kreutzmann
2024-01-28 20:22     ` Alejandro Colomar
2024-01-28 20:32       ` Helge Kreutzmann
  -- strict thread matches above, loose matches on Subject: below --
2024-11-17 10:46 Helge Kreutzmann
2024-11-17 14:47 ` Alejandro Colomar
2024-11-17 15:07   ` Helge Kreutzmann
2024-11-17 15:16     ` Alejandro Colomar
2023-11-01 14:02 Helge Kreutzmann
2023-11-01 14:02 Helge Kreutzmann
2023-11-01 14:02 Helge Kreutzmann
2023-11-01 14:02 Helge Kreutzmann
2023-03-11 17:14 Helge Kreutzmann
2023-03-11 17:14 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 23:26 ` Alejandro Colomar
2023-03-12  5:08   ` Helge Kreutzmann
2023-03-11 17:13 Helge Kreutzmann
2023-03-11 23:27 ` Alejandro Colomar
2023-03-12  5:14   ` Helge Kreutzmann
2023-03-12 11:28     ` Alejandro Colomar
2023-01-22 19:31 Helge Kreutzmann
2023-01-29 16:45 ` Stefan Puiu
2023-01-29 18:35   ` Alejandro Colomar
2023-01-29 19:20     ` Bernd Petrovitsch
2023-01-29 19:29       ` Alejandro Colomar
2023-01-31 10:56         ` Stefan Puiu
2023-01-22 19:31 Helge Kreutzmann
2023-01-22 19:31 Helge Kreutzmann
2023-02-05 14:28 ` Alejandro Colomar
2023-02-05 14:49   ` Helge Kreutzmann
