git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* t3900 failure on macOS, iconv(3) broken?
@ 2025-12-08 22:59 René Scharfe
  2025-12-09  3:18 ` Koji Nakamaru
                   ` (8 more replies)
  0 siblings, 9 replies; 45+ messages in thread
From: René Scharfe @ 2025-12-08 22:59 UTC (permalink / raw)
  To: Git List

Hi all,

three tests of t3900 fail on macOS 26.1 for me:

  not ok 17 - ISO-2022-JP should be shown in UTF-8 now
  not ok 25 - ISO-2022-JP should be shown in UTF-8 now
  not ok 38 - commit --fixup into ISO-2022-JP from UTF-8

Here's the verbose output of the first one:

----- snip! -----
expecting success of 3900.17 'ISO-2022-JP should be shown in UTF-8 now':
                compare_with ISO-2022-JP "$TEST_DIRECTORY"/t3900/2-UTF-8.txt

--- /Users/x/src/git/t/t3900/2-UTF-8.txt 2024-10-01 19:43:24.605230684 +0000
+++ current     2025-12-08 21:52:45.786161909 +0000
@@ -1,4 +1,4 @@
 はれひほふ

 しているのが、いるので。
-濱浜ほれぷりぽれまびぐりろへ。
+濱浜ほれぷりぽれまび$0$j$m$X!#
not ok 17 - ISO-2022-JP should be shown in UTF-8 now
#
#                       compare_with ISO-2022-JP "$TEST_DIRECTORY"/t3900/2-UTF-8.txt
#
1..17
----- snap! -----

compare_with runs git show to display a commit message, which in this
case here was encoded using ISO-2022-JP and is supposed to be reencoded
to UTF-8, but git show only does that half-way -- the "$0$j$m$X!#" part
is from the original ISO-2022-JP representation.

That botched conversion is done by utf8.c::reencode_string_iconv().  It
calls iconv(3) to do the actual work, initially with an output buffer of
the same size as the input.  If the output needs more space the function
enlarges the buffer and calls iconv(3) again.

iconv(3) won't tell us how much space it needs, but it will report what
part it already managed to convert, so we can increase the buffer and
continue from there.  ISO-2022-JP has escape codes for switching between
character sets, so it's a stateful encoding.  I guess the iconv(3) on my
machine forgets the state at the end of part one and then messes up part
two.

I only noticed now because I used to compile with NO_ICONV for some
reason.

Is anyone else seeing this breakage as well?

Here's a patch that adds make variable ICONV_BREAKS.  It avoids the
breakage when enabled, by starting over again instead of continuing.

René


---
 Makefile |  6 ++++++
 utf8.c   | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/Makefile b/Makefile
index 6fc322ff88..cf8a0d3ee9 100644
--- a/Makefile
+++ b/Makefile
@@ -181,6 +181,9 @@ include shared.mak
 # byte-order mark (BOM) when writing UTF-16 or UTF-32 and always writes in
 # big-endian format.
 #
+# Define ICONV_BREAKS if your iconv implementation cannot reliably
+# break a string into valid substrings.
+#
 # Define NO_DEFLATE_BOUND if your zlib does not have deflateBound. Define
 # ZLIB_NG if you want to use zlib-ng instead of zlib.
 #
@@ -1836,6 +1839,9 @@ endif
 ifdef ICONV_OMITS_BOM
 	BASIC_CFLAGS += -DICONV_OMITS_BOM
 endif
+ifdef ICONV_BREAKS
+	BASIC_CFLAGS += -DICONV_BREAKS
+endif
 ifdef NEEDS_LIBGEN
 	EXTLIBS += -lgen
 endif
diff --git a/utf8.c b/utf8.c
index 35a0251939..ff0c541fbc 100644
--- a/utf8.c
+++ b/utf8.c
@@ -515,6 +515,19 @@ char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv,
 			out = xrealloc(out, outalloc);
 			outpos = out + sofar;
 			outsz = outalloc - sofar - 1;
+#ifdef ICONV_BREAKS
+			/*
+			 * If iconv(3) messes up piecemeal conversions
+			 * then restore the original pointers, sizes,
+			 * and converter state, then retry converting
+			 * the full string using the reallocated buffer.
+			 */
+			insz += (char *)cp - in;
+			cp = (iconv_ibp)in;
+			outpos = out + bom_len;
+			outsz = outalloc - bom_len - 1;
+			iconv(conv, NULL, NULL, NULL, NULL);
+#endif
 		}
 		else {
 			*outpos = '\0';
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2025-12-24  8:08 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-08 22:59 t3900 failure on macOS, iconv(3) broken? René Scharfe
2025-12-09  3:18 ` Koji Nakamaru
2025-12-09  3:50   ` Yee Cheng Chin
2025-12-09  4:03     ` Collin Funk
2025-12-09 16:33 ` Torsten Bögershausen
2025-12-09 19:35   ` René Scharfe
2025-12-09 21:24     ` Torsten Bögershausen
2025-12-09 22:25       ` René Scharfe
2025-12-09 19:35 ` [PATCH] config.mak.uname: use iconv from Homebrew on macOS René Scharfe
2025-12-09 20:39   ` Yee Cheng Chin
2025-12-09 21:27     ` René Scharfe
2025-12-10 11:17   ` Carlo Marcelo Arenas Belón
2025-12-10 17:56     ` René Scharfe
2025-12-11  2:53       ` Junio C Hamano
2025-12-11 11:17         ` Carlo Marcelo Arenas Belón
2025-12-12  2:20           ` Junio C Hamano
2025-12-12  9:16             ` René Scharfe
2025-12-12 10:02               ` Carlo Marcelo Arenas Belón
2025-12-12 13:04               ` Re* " Junio C Hamano
2025-12-12 13:48                 ` René Scharfe
2025-12-12 23:39                   ` Junio C Hamano
2025-12-10 16:42   ` Torsten Bögershausen
2025-12-10 17:56     ` René Scharfe
2025-12-10 23:10   ` brian m. carlson
2025-12-11  2:36     ` Junio C Hamano
2025-12-11  9:59       ` Junio C Hamano
2025-12-11 14:34         ` René Scharfe
2025-12-12  3:35           ` Junio C Hamano
2025-12-12 10:40 ` t3900 failure on macOS, iconv(3) broken? René Scharfe
2025-12-13 18:42 ` [PATCH v2 1/2] Makefile: add NO_HOMEBREW René Scharfe
2025-12-14  6:45   ` Torsten Bögershausen
2025-12-14  7:13     ` Junio C Hamano
2025-12-14  9:02       ` Torsten Bögershausen
2025-12-14 11:07         ` Junio C Hamano
2025-12-14 11:13       ` René Scharfe
2025-12-14 23:19         ` Junio C Hamano
2025-12-16 18:53           ` René Scharfe
2025-12-13 18:42 ` [PATCH v2 2/2] config.mak.uname: use iconv from Homebrew on macOS René Scharfe
2025-12-16 18:53 ` [PATCH v3 1/2] macOS: make Homebrew use configurable René Scharfe
2025-12-16 19:11   ` René Scharfe
2025-12-16 21:49     ` Torsten Bögershausen
2025-12-16 18:53 ` [PATCH v3 2/2] macOS: use iconv from Homebrew if present René Scharfe
2025-12-24  7:52 ` [PATCH v4 0/2] macOS: use iconv from Homebrew if needed and present René Scharfe
2025-12-24  8:02   ` [PATCH v4 1/2] macOS: make Homebrew use configurable René Scharfe
2025-12-24  8:03   ` [PATCH v4 2/2] macOS: use iconv from Homebrew if needed and present René Scharfe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).