* [patch] iconv.3: clarify behavior when input is untranslatable
@ 2023-05-20 11:17 Reuben Thomas
2023-05-20 12:04 ` [PATCH v1b] iconv.3: Clarify the " Alejandro Colomar
0 siblings, 1 reply; 4+ messages in thread
From: Reuben Thomas @ 2023-05-20 11:17 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: Linux man pages mailing list, Bruno Haible
[-- Attachment #1: Type: text/plain, Size: 125 bytes --]
I attach a patch for
https://bugzilla.kernel.org/show_bug.cgi?id=217059 as requested by
Alejandro.
--
https://rrt.sc3d.org
[-- Attachment #2: 0001-iconv.3-clarify-the-behavior-when-input-is-untransla.patch --]
[-- Type: text/x-patch, Size: 2008 bytes --]
From 72b623ee2c32da96a2972a9dce43a554f494c5b8 Mon Sep 17 00:00:00 2001
From: Reuben Thomas <rrt@sc3d.org>
Date: Sat, 20 May 2023 12:10:11 +0100
Subject: [PATCH] iconv.3: clarify the behavior when input is untranslatable
See https://bugzilla.kernel.org/show_bug.cgi?id=217059
The man page does not fully reflect the behaviour of glibc's iconv. The man
page says:
The conversion can stop for four reasons:
1. An invalid multibyte sequence is encountered in the input. In this
case, it sets errno to EILSEQ and returns (size_t) -1. *inbuf is left
pointing to the beginning of the invalid multibyte sequence.
The phrase "An invalid multibyte sequence is encountered in the input" is
confusing, because it suggests that it refers only to the validity of the
input per se (e.g. a non-UTF-8 sequence in input purporting to be UTF-8).
However, according to the original author of the man page, Bruno Haible[1],
it also refers to input that cannot be translated to the desired output
encoding; and indeed, glibc's iconv returns EILSEQ when the input cannot be
translated, even though it is valid.
This patch adds language that reflects the actual behavior.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4
Signed-off-by: Reuben Thomas <rrt@sc3d.org>
Suggested-by: Alejandro Colomar <alx@kernel.org>
---
man3/iconv.3 | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..e8694ca12 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -73,7 +73,8 @@ to an update to the conversion state without producing any output bytes;
such input is called a \fIshift sequence\fP.
The conversion can stop for four reasons:
.IP \[bu] 3
-An invalid multibyte sequence is encountered in the input.
+An multibyte sequence is encountered in the input which is either invalid,
+or cannot be translated to the character encoding of the output.
In this case,
it sets \fIerrno\fP to \fBEILSEQ\fP and returns
.IR (size_t)\ \-1 .
--
2.34.1
* [PATCH v1b] iconv.3: Clarify the behavior when input is untranslatable
2023-05-20 11:17 [patch] iconv.3: clarify behavior when input is untranslatable Reuben Thomas
@ 2023-05-20 12:04 ` Alejandro Colomar
2023-05-21 8:52 ` Reuben Thomas
2023-05-21 19:34 ` Silvan Jegen
0 siblings, 2 replies; 4+ messages in thread
From: Alejandro Colomar @ 2023-05-20 12:04 UTC (permalink / raw)
To: linux-man
Cc: Reuben Thomas, Steffen Nurpmeso, Bruno Haible, Martin Sebor,
Alejandro Colomar
From: Reuben Thomas <rrt@sc3d.org>
The manual page does not fully reflect the behaviour of glibc's
iconv(3). The manual page says:
The conversion can stop for four reasons:
- An invalid multibyte sequence is encountered in the input. In
this case, it sets errno to EILSEQ and returns (size_t) -1.
*inbuf is left pointing to the beginning of the invalid multibyte
sequence.
[...]
The phrase "An invalid multibyte sequence is encountered in the input"
is confusing, because it suggests that it refers only to the validity of
the input per se (e.g. a non-UTF-8 sequence in input purporting to be
UTF-8).
However, according to the original author of the manual page, Bruno
Haible[1], it also refers to input that cannot be translated to the
desired output encoding; and indeed, glibc's iconv returns EILSEQ when
the input cannot be translated, even though it is valid.
This patch adds language that reflects the actual behavior.
Link: [1] <https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4>
Link: <https://bugzilla.kernel.org/show_bug.cgi?id=217059>
Signed-off-by: Reuben Thomas <rrt@sc3d.org>
Cc: Steffen Nurpmeso <steffen@sdaoden.eu>
Cc: Bruno Haible <bruno@clisp.org>
Cc: Martin Sebor <msebor@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
Hi,
I'm resending Reuben's patch inline CCing all interested parties. I'm,
similarly to Steffen, not convinced that invalid input englobes output
errors. So, I think it would be better to split it into 2 different
reasons, so that we have a 5th reason for the error.
I also slightly tweaked the commit log regarding formatting.
What do you think about having a 5th reason?
Cheers,
Alex
man3/iconv.3 | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..e8694ca12 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -73,7 +73,8 @@ .SH DESCRIPTION
such input is called a \fIshift sequence\fP.
The conversion can stop for four reasons:
.IP \[bu] 3
-An invalid multibyte sequence is encountered in the input.
+An multibyte sequence is encountered in the input which is either invalid,
+or cannot be translated to the character encoding of the output.
In this case,
it sets \fIerrno\fP to \fBEILSEQ\fP and returns
.IR (size_t)\ \-1 .
--
2.40.1
* Re: [PATCH v1b] iconv.3: Clarify the behavior when input is untranslatable
2023-05-20 12:04 ` [PATCH v1b] iconv.3: Clarify the " Alejandro Colomar
@ 2023-05-21 8:52 ` Reuben Thomas
2023-05-21 19:34 ` Silvan Jegen
1 sibling, 0 replies; 4+ messages in thread
From: Reuben Thomas @ 2023-05-21 8:52 UTC (permalink / raw)
To: Alejandro Colomar
Cc: linux-man, Steffen Nurpmeso, Bruno Haible, Martin Sebor,
Alejandro Colomar
On Sat, 20 May 2023 at 13:08, Alejandro Colomar <alx.manpages@gmail.com> wrote:
>
> I'm resending Reuben's patch inline CCing all interested parties. I'm,
> similarly to Steffen, not convinced that invalid input englobes output
> errors. So, I think it would be better to split it into 2 different
> reasons, so that we have a 5th reason for the error.
>
> I also slightly tweaked the commit log regarding formatting.
Many thanks!
> What do you think about having a 5th reason?
You're right that it is a different logical condition; my only concern
is that the new wording makes it obvious that this condition results in
EILSEQ, to avoid the confusion that I and others have had over
the years from believing that EILSEQ only results from invalid input
(from reading earlier versions of this man page, and the POSIX
standard).
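
For reference, the behavior under discussion can be reproduced with a
short C sketch. It assumes glibc's iconv(3) and that the "UTF-8" and
"ASCII" converters are installed; try_convert() is a hypothetical helper
written for this illustration, not part of any library, and other libc
implementations may handle untranslatable input differently.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): convert `input` from encoding
 * `from` to encoding `to`.  Returns 0 on success, or the errno value set
 * by iconv_open(3) or iconv(3) on failure.  If `stopped_at` is non-NULL,
 * it receives the byte offset in `input` at which conversion stopped.
 */
static int
try_convert(const char *from, const char *to, const char *input,
            size_t *stopped_at)
{
    char    outbuf[256];
    char    *in = (char *) input;  /* iconv() wants char **, input is not modified */
    char    *out = outbuf;
    size_t  inleft = strlen(input);
    size_t  outleft = sizeof(outbuf);
    int     err = 0;
    iconv_t cd;

    cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return errno;

    /* On failure iconv() returns (size_t) -1 and sets errno. */
    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t) -1)
        err = errno;

    if (stopped_at != NULL)
        *stopped_at = (size_t) (in - input);

    iconv_close(cd);
    return err;
}
```

Under those assumptions, try_convert("UTF-8", "ASCII", "caf\xc3\xa9", &off)
is expected to return EILSEQ with off pointing at the start of the valid
but untranslatable U+00E9, which is the same errno that the invalid input
"caf\xff" produces. That overlap is the source of the confusion above.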
--
https://rrt.sc3d.org
* Re: [PATCH v1b] iconv.3: Clarify the behavior when input is untranslatable
2023-05-20 12:04 ` [PATCH v1b] iconv.3: Clarify the " Alejandro Colomar
2023-05-21 8:52 ` Reuben Thomas
@ 2023-05-21 19:34 ` Silvan Jegen
1 sibling, 0 replies; 4+ messages in thread
From: Silvan Jegen @ 2023-05-21 19:34 UTC (permalink / raw)
To: Alejandro Colomar
Cc: linux-man, Reuben Thomas, Steffen Nurpmeso, Bruno Haible,
Martin Sebor, Alejandro Colomar
Heyhey!
Just one typo I noticed below.
Alejandro Colomar <alx.manpages@gmail.com> wrote:
> From: Reuben Thomas <rrt@sc3d.org>
>
> The manual page does not fully reflect the behaviour of glibc's
> iconv(3). The manual page says:
>
> The conversion can stop for four reasons:
>
> - An invalid multibyte sequence is encountered in the input. In
> this case, it sets errno to EILSEQ and returns (size_t) -1.
> *inbuf is left pointing to the beginning of the invalid multibyte
> sequence.
>
> [...]
>
> The phrase "An invalid multibyte sequence is encountered in the input"
> is confusing, because it suggests that it refers only to the validity of
> the input per se (e.g. a non-UTF-8 sequence in input purporting to be
> UTF-8).
>
> However, according to the original author of the manual page, Bruno
> Haible[1], it also refers to input that cannot be translated to the
> desired output encoding; and indeed, glibc's iconv returns EILSEQ when
> the input cannot be translated, even though it is valid.
>
> This patch adds language that reflects the actual behavior.
>
> Link: [1] <https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4>
> Link: <https://bugzilla.kernel.org/show_bug.cgi?id=217059>
> Signed-off-by: Reuben Thomas <rrt@sc3d.org>
> Cc: Steffen Nurpmeso <steffen@sdaoden.eu>
> Cc: Bruno Haible <bruno@clisp.org>
> Cc: Martin Sebor <msebor@redhat.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>
> Hi,
>
> I'm resending Reuben's patch inline CCing all interested parties. I'm,
> similarly to Steffen, not convinced that invalid input englobes output
> errors. So, I think it would be better to split it into 2 different
> reasons, so that we have a 5th reason for the error.
>
> I also slightly tweaked the commit log regarding formatting.
>
> What do you think about having a 5th reason?
>
> Cheers,
> Alex
>
> man3/iconv.3 | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/man3/iconv.3 b/man3/iconv.3
> index 66f59b8c3..e8694ca12 100644
> --- a/man3/iconv.3
> +++ b/man3/iconv.3
> @@ -73,7 +73,8 @@ .SH DESCRIPTION
> such input is called a \fIshift sequence\fP.
> The conversion can stop for four reasons:
> .IP \[bu] 3
> -An invalid multibyte sequence is encountered in the input.
> +An multibyte sequence is encountered in the input which is either invalid,
s/An/A/
Cheers,
Silvan
> +or cannot be translated to the character encoding of the output.
> In this case,
> it sets \fIerrno\fP to \fBEILSEQ\fP and returns
> .IR (size_t)\ \-1 .