From: Alejandro Colomar
To: linux-man@vger.kernel.org
Cc: Reuben Thomas, Steffen Nurpmeso, Bruno Haible, Martin Sebor, Alejandro Colomar
Subject: [PATCH v1b] iconv.3: Clarify the behavior when input is untranslatable
Date: Sat, 20 May 2023 14:04:59 +0200
Message-Id: <20230520120458.6681-1-alx@kernel.org>
In-Reply-To:
References:

From: Reuben Thomas

The manual page does not fully reflect the behaviour of glibc's iconv(3).

The manual page says:

    The conversion can stop for four reasons:

    - An invalid multibyte sequence is encountered in the input.
      In this case, it sets errno to EILSEQ and returns (size_t) -1.
      *inbuf is left pointing to the beginning of the invalid
      multibyte sequence.

    [...]
The phrase "An invalid multibyte sequence is encountered in the input" is
confusing, because it suggests that it refers only to the validity of the
input per se (e.g. a non-UTF-8 sequence in input purporting to be UTF-8).
However, according to the original author of the manual page, Bruno
Haible[1], it also refers to input that cannot be translated to the
desired output encoding; and indeed, glibc's iconv returns EILSEQ when
the input cannot be translated, even though it is valid.

This patch adds language that reflects the actual behavior.

Link: [1]
Link:
Signed-off-by: Reuben Thomas
Cc: Steffen Nurpmeso
Cc: Bruno Haible
Cc: Martin Sebor
Signed-off-by: Alejandro Colomar
---

Hi,

I'm resending Reuben's patch inline, CCing all interested parties.

Like Steffen, I'm not convinced that invalid input encompasses output
errors.  So I think it would be better to split it into 2 different
reasons, so that we have a 5th reason for the error.

I also slightly tweaked the commit log regarding formatting.

What do you think about having a 5th reason?

Cheers,
Alex

 man3/iconv.3 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..e8694ca12 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -73,7 +73,8 @@ .SH DESCRIPTION
 such input is called a \fIshift sequence\fP.
 The conversion can stop for four reasons:
 .IP \[bu] 3
-An invalid multibyte sequence is encountered in the input.
+A multibyte sequence is encountered in the input which is either invalid,
+or cannot be translated to the character encoding of the output.
 In this case, it sets \fIerrno\fP to \fBEILSEQ\fP and returns
 .IR (size_t)\ \-1 .
-- 
2.40.1