public inbox for linux-man@vger.kernel.org
From: Bruno Haible <bruno@clisp.org>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: linux-man@vger.kernel.org, Reuben Thomas <rrt@sc3d.org>,
	Steffen Nurpmeso <steffen@sdaoden.eu>,
	Martin Sebor <msebor@redhat.com>,
	Alejandro Colomar <alx@kernel.org>
Subject: Re: [PATCH] iconv.3: Clarify the behavior when input is untranslatable
Date: Thu, 25 May 2023 00:07:46 +0200
Message-ID: <14654216.O6BkTfRZtg@nimes>
In-Reply-To: <14c14d88-be1d-94f9-8a1c-3a1128eec9f2@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 278 bytes --]

Alejandro Colomar wrote:
> > Do you have a better wording than "can ... in some cases"?
> 
> If you include the full version in the commit log, to be able to
> understand it in the future, I'm fine with it.

OK. Here is a patch with the details included in the commit message.


[-- Attachment #2: 0001-List-a-fifth-condition-when-iconv-3-may-stop.patch --]
[-- Type: text/x-patch, Size: 3720 bytes --]

From 4cc4ad011b3ffa30159d3a67e262a46da4600cba Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sun, 21 May 2023 13:05:29 +0200
Subject: [PATCH] List a fifth condition when iconv(3) may stop.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The wording regarding transliteration is vague, because this man page is not
the right place to go into the details of transliteration.
Here are the details:
GNU libc and GNU libiconv support transliteration, for example, of "½" to "1/2",
or of "å" to "aa" in a Danish locale. The transliteration maps a multibyte
character of the input encoding to zero or more characters in the output.
There are two kinds of transliteration rules:
  - Those that are valid regardless of locale. Typically this means that the
    original and the transliterated character have similar glyphs, such as
    in the case "½" to "1/2".
    In GNU libc, these are collected in the files
    glibc/localedata/locales/translit_*.
  - Those that are valid in a single locale only. Often such a rule
    reflects similar pronunciation of the original and the transliterated
    characters. Some locales have script-based transliteration, for example
    from the Cyrillic script to the Latin script.
    In GNU libc, these are collected in the file
    glibc/localedata/locales/<locale>.
    In GNU libiconv, transliterations of this kind are not supported.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217059
Reported-by: Steffen Nurpmeso <steffen@sdaoden.eu>
Reported-by: Reuben Thomas <rrt@sc3d.org>
Signed-off-by: Bruno Haible <bruno@clisp.org>
---
 man3/iconv.3 | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..94441f602 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -71,7 +71,7 @@ If the character encoding of the input is stateful, the
 function can also convert a sequence of input bytes
 to an update to the conversion state without producing any output bytes;
 such input is called a \fIshift sequence\fP.
-The conversion can stop for four reasons:
+The conversion can stop for five reasons:
 .IP \[bu] 3
 An invalid multibyte sequence is encountered in the input.
 In this case,
@@ -80,6 +80,39 @@ it sets \fIerrno\fP to \fBEILSEQ\fP and returns
 \fI*inbuf\fP
 is left pointing to the beginning of the invalid multibyte sequence.
 .IP \[bu]
+A multibyte sequence is encountered that is valid but that cannot be
+translated to the character encoding of the output.
+This condition depends on the implementation and on the conversion
+descriptor.
+In the GNU C library and GNU libiconv, if
+.I cd
+was created without the suffix
+.B //TRANSLIT
+or
+.BR //IGNORE ,
+the conversion is strict: lossy conversions produce this condition.
+If the suffix
+.B //TRANSLIT
+was specified, transliteration can avoid this condition in some cases.
+In the musl C library, this condition cannot occur because a conversion to
+.B \[aq]*\[aq]
+is used as a fallback.
+In the FreeBSD, NetBSD, and Solaris implementations of
+.BR iconv (),
+this condition cannot occur either, because a conversion to
+.B \[aq]?\[aq]
+is used as a fallback.
+When this condition is met,
+.BR iconv ()
+sets
+.I errno
+to
+.B EILSEQ
+and returns
+.IR (size_t)\ \-1 .
+.I *inbuf
+is left pointing to the beginning of the unconvertible multibyte sequence.
+.IP \[bu]
 The input byte sequence has been entirely converted,
 that is, \fI*inbytesleft\fP has gone down to 0.
 In this case,
-- 
2.34.1
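
For illustration only (not part of the patch): the condition added above
can be observed with a small C sketch along the following lines. The
struct conv_result and the function convert_once() are hypothetical
helper names made up for this example; only iconv_open(3), iconv(3) and
iconv_close(3) are real API. It makes a single iconv() call and records
the return value, errno, and how far *inbuf and *outbuf advanced.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type for this example; not part of any iconv API. */
struct conv_result {
    size_t rc;        /* return value of iconv(), (size_t)-1 on failure */
    int    err;       /* errno captured right after a failed call, else 0 */
    size_t consumed;  /* input bytes consumed when the call stopped */
    size_t produced;  /* output bytes written when the call stopped */
};

/* Convert the NUL-terminated string 'in' with one iconv() call. */
static struct conv_result
convert_once(const char *tocode, const char *fromcode,
             const char *in, char *out, size_t outsize)
{
    struct conv_result r = { (size_t)-1, 0, 0, 0 };

    assert(tocode && fromcode && in && out);

    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1) {
        r.err = errno;            /* typically EINVAL: unsupported pair */
        return r;
    }

    char *inp = (char *)in;       /* iconv() does not modify input bytes */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize;

    errno = 0;
    r.rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (r.rc == (size_t)-1)
        r.err = errno;            /* save before iconv_close() can change it */

    /* Where did the conversion stop? */
    r.consumed = (size_t)(inp - in);
    r.produced = (size_t)(outp - out);

    iconv_close(cd);
    return r;
}
```

Calling convert_once("ASCII", "UTF-8", "a\xc2\xbd" "b", buf, sizeof buf)
with the GNU C library or GNU libiconv should, per the text in the patch,
fail with EILSEQ and leave consumed == 1, that is, *inbuf pointing at the
beginning of the "½" sequence. With tocode "ASCII//TRANSLIT" the outcome
depends on the implementation and possibly on the locale, so no particular
output is claimed here. On implementations that substitute '*' or '?', the
strict call is expected to succeed instead.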

