From: Jiang Xin <worldhello.net@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
Git List <git@vger.kernel.org>,
Justin Tobler <jltobler@gmail.com>
Cc: Jiang Xin <worldhello.net@gmail.com>
Subject: [PATCH v2 0/2] Fix misaligned output of git repo structure
Date: Sat, 15 Nov 2025 08:36:09 -0500
Message-ID: <cover.1763213290.git.worldhello.net@gmail.com>
In-Reply-To: <cover.1763098804.git.worldhello.net@gmail.com>
While localizing Git 2.52.0, I noticed that the table printed by "git
repo structure" becomes misaligned when it contains multi-byte UTF-8
characters, such as translated Chinese text. For example:
| 仓库结构 | 值 |
| -------------- | ---- |
The previous implementation padded each column with printf() width
specifiers, which count bytes rather than display columns. A typical CJK
character takes 3 bytes in UTF-8 but occupies only 2 columns on screen,
so each such character leaves its cell one column short and the table
columns drift out of line.
This change modifies stats_table_print_structure() to use
strbuf_utf8_align() instead of plain printf() width specifiers, so the
padding is computed from each string's display width rather than its
byte length. The columns then line up whether the content is ASCII or
multi-byte UTF-8.
Jiang Xin (2):
t/unit-tests: add UTF-8 width tests for CJK chars
builtin/repo: fix table alignment for UTF-8 characters
Makefile | 1 +
builtin/repo.c | 21 ++++--
t/meson.build | 1 +
t/unit-tests/u-utf8-width.c | 134 ++++++++++++++++++++++++++++++++++++
4 files changed, 153 insertions(+), 4 deletions(-)
create mode 100644 t/unit-tests/u-utf8-width.c
## Range-diff vs v1:
1: 53c1e5219b ! 1: 72e73484d2 t/unit-tests: add UTF-8 width tests for CJK chars
@@ Metadata
## Commit message ##
t/unit-tests: add UTF-8 width tests for CJK chars
- This commit adds a new test suite (u-utf8-width.c) to test the UTF-8
- width functions in Git, particularly focusing on multi-byte characters
- from East Asian languages like Chinese, Japanese, and Korean that
- typically require 2 display columns per character.
-
- The test suite includes:
- - Tests for utf8_strnwidth with Chinese strings
- - Tests for utf8_strwidth with Chinese strings
- - Tests for Japanese and Korean characters
- - Edge case tests with invalid UTF-8 sequences
- - Proper test function naming following the Clar framework convention
+ The file "builtin/repo.c" uses utf8_strwidth() to calculate the display
+ width of UTF-8 characters in a table, but the resulting output is still
+ misaligned. Add test cases for both utf8_strwidth and utf8_strnwidth to
+ verify that they correctly compute the display width for UTF-8
+ characters.
Also updated the build configuration in Makefile and meson.build to
include the new test suite in the build process.
- Co-developed-by: Claude <noreply@anthropic.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
## Makefile ##
@@ t/unit-tests/u-utf8-width.c (new)
+ */
+void test_utf8_width__strnwidth_chinese(void)
+{
-+ const char *ansi_test;
+ const char *str;
+
+ /* Test basic ASCII - each character should have width 1 */
-+ cl_assert_equal_i(5, utf8_strnwidth("hello", 5, 0));
-+ cl_assert_equal_i(5, utf8_strnwidth("hello", 5, 1)); /* skip_ansi = 1 */
++ cl_assert_equal_i(5, utf8_strnwidth("Hello", 5, 0));
++ /* skip_ansi = 1 */
++ cl_assert_equal_i(5, utf8_strnwidth("Hello", 5, 1));
+
+ /* Test simple Chinese characters - each should have width 2 */
-+ cl_assert_equal_i(4, utf8_strnwidth("你好", 6, 0)); /* "你好" is 6 bytes (3 bytes per char in UTF-8), 4 display columns */
++ /* "你好" is 6 bytes (3 bytes per char in UTF-8), 4 display columns */
++ cl_assert_equal_i(4, utf8_strnwidth("你好", 6, 0));
+
+ /* Test mixed ASCII and Chinese - ASCII = 1 column, Chinese = 2 columns */
-+ cl_assert_equal_i(6, utf8_strnwidth("hi你好", 8, 0)); /* "h"(1) + "i"(1) + "你"(2) + "好"(2) = 6 */
++ /* "h"(1) + "i"(1) + "你"(2) + "好"(2) = 6 */
++ cl_assert_equal_i(6, utf8_strnwidth("Hi你好", 8, 0));
+
+ /* Test longer Chinese string */
-+ cl_assert_equal_i(10, utf8_strnwidth("你好世界！", 15, 0)); /* 5 Chinese chars = 10 display columns */
-+
-+ /* Test with skip_ansi = 1 to make sure it works with escape sequences */
-+ ansi_test = "\033[31m你好\033[0m";
-+ cl_assert_equal_i(4, utf8_strnwidth(ansi_test, strlen(ansi_test), 1)); /* Skip escape sequences, just count "你好" which should be 4 columns */
++ /* 5 Chinese chars = 10 display columns */
++ cl_assert_equal_i(10, utf8_strnwidth("你好世界！", 15, 0));
+
+ /* Test individual Chinese character width */
-+ cl_assert_equal_i(2, utf8_strnwidth("中", 3, 0)); /* Single Chinese char should be 2 columns */
++ cl_assert_equal_i(2, utf8_strnwidth("中", 3, 0));
+
+ /* Test empty string */
+ cl_assert_equal_i(0, utf8_strnwidth("", 0, 0));
+
+ /* Test length limiting */
+ str = "你好世界";
-+ cl_assert_equal_i(2, utf8_strnwidth(str, 3, 0)); /* Only first char "你"(2 columns) within 3 bytes */
-+ cl_assert_equal_i(4, utf8_strnwidth(str, 6, 0)); /* First two chars "你好"(4 columns) in 6 bytes */
++ /* Only first char "你"(2 columns) within 3 bytes */
++ cl_assert_equal_i(2, utf8_strnwidth(str, 3, 0));
++ /* First two chars "你好"(4 columns) in 6 bytes */
++ cl_assert_equal_i(4, utf8_strnwidth(str, 6, 0));
+}
+
+/*
@@ t/unit-tests/u-utf8-width.c (new)
+void test_utf8_width__strwidth_chinese(void)
+{
+ /* Test basic ASCII */
-+ cl_assert_equal_i(5, utf8_strwidth("hello"));
++ cl_assert_equal_i(5, utf8_strwidth("Hello"));
+
+ /* Test Chinese characters */
-+ cl_assert_equal_i(4, utf8_strwidth("你好")); /* 2 Chinese chars = 4 display columns */
++ /* 2 Chinese chars = 4 display columns */
++ cl_assert_equal_i(4, utf8_strwidth("你好"));
++
++ /* Test longer Chinese string */
++ /* 5 Chinese chars = 10 display columns */
++ cl_assert_equal_i(10, utf8_strwidth("你好世界！"));
+
+ /* Test mixed ASCII and Chinese */
-+ cl_assert_equal_i(9, utf8_strwidth("hello世界")); /* 5 ASCII (5 cols) + 2 Chinese (4 cols) = 9 */
-+ cl_assert_equal_i(7, utf8_strwidth("hi世界!")); /* 2 ASCII (2 cols) + 2 Chinese (4 cols) + 1 ASCII (1 col) = 7 */
++ /* 5 ASCII (5 cols) + 2 Chinese (4 cols) = 9 */
++ cl_assert_equal_i(9, utf8_strwidth("Hello世界"));
++ /* 2 ASCII (2 cols) + 2 Chinese (4 cols) + 1 ASCII (1 col) = 7 */
++ cl_assert_equal_i(7, utf8_strwidth("Hi世界!"));
+}
+
+/*
@@ t/unit-tests/u-utf8-width.c (new)
+void test_utf8_width__strnwidth_japanese_korean(void)
+{
+ /* Japanese characters (should also be 2 columns each) */
-+ cl_assert_equal_i(10, utf8_strnwidth("こんにちは", 15, 0)); /* 5 Japanese chars @ 2 cols each = 10 display columns */
++ /* 5 Japanese chars x 2 cols each = 10 display columns */
++ cl_assert_equal_i(10, utf8_strnwidth("こんにちは", 15, 0));
+
+ /* Korean characters (should also be 2 columns each) */
-+ cl_assert_equal_i(10, utf8_strnwidth("안녕하세요", 15, 0)); /* 5 Korean chars @ 2 cols each = 10 display columns */
++ /* 5 Korean chars x 2 cols each = 10 display columns */
++ cl_assert_equal_i(10, utf8_strnwidth("안녕하세요", 15, 0));
+}
+
+/*
-+ * Test edge cases with partial UTF-8 sequences
++ * Test utf8_strnwidth with CJK strings and ANSI sequences
+ */
-+void test_utf8_width__strnwidth_edge_cases(void)
++void test_utf8_width__strnwidth_cjk_with_ansi(void)
+{
-+ const char *invalid;
-+ unsigned char truncated_bytes[] = {0xe4, 0xbd, 0x00}; /* First 2 bytes of "中" + null */
-+
-+ /* Test invalid UTF-8 - should fall back to byte count */
-+ invalid = "\xff\xfe"; /* Invalid UTF-8 sequence */
-+ cl_assert_equal_i(2, utf8_strnwidth(invalid, 2, 0)); /* Should return length if invalid UTF-8 */
-+
-+ /* Test partial UTF-8 character (truncated) */
-+ cl_assert_equal_i(2, utf8_strnwidth((const char*)truncated_bytes, 2, 0)); /* Invalid UTF-8, returns byte count */
++ /* Test CJK with ANSI sequences */
++ const char *ansi_test = "\033[1m你好\033[0m";
++ int width = utf8_strnwidth(ansi_test, strlen(ansi_test), 1);
++ /* Should skip ANSI sequences and count "你好" as 4 columns */
++ cl_assert_equal_i(4, width);
++
++ /* Test mixed ASCII, CJK, and ANSI */
++ ansi_test = "Hello\033[32m世界\033[0m!";
++ width = utf8_strnwidth(ansi_test, strlen(ansi_test), 1);
++ /* "Hello"(5) + "世界"(4) + "!"(1) = 10 */
++ cl_assert_equal_i(10, width);
+}
2: 65efad527f ! 2: d0975427c9 builtin/repo: fix table alignment for UTF-8 characters
@@ Commit message
| -------------- | ---- |
| * 引用 | |
| * 计数 | 67 |
- | * 分支 | 6 |
- | * 标签 | 30 |
- | * 远程 | 19 |
- | * 其它 | 12 |
- | | |
- | * 可达对象 | |
- | * 计数 | 2217 |
- | * 提交 | 279 |
- | * 树 | 740 |
- | * 数据对象 | 1168 |
- | * 标签 | 30 |
The previous implementation used simple width formatting with printf()
which didn't properly handle multi-byte UTF-8 characters, causing
@@ Commit message
ensures proper column alignment regardless of the character encoding of
the content being displayed.
- Co-developed-by: Gemini <noreply@developers.google.com>
+ Also add test cases for strbuf_utf8_align(), a function newly used
+ in "builtin/repo.c".
+
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
## builtin/repo.c ##
@@ builtin/repo.c: static void stats_table_print_structure(const struct stats_table
+ strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+ strbuf_addstr(&buf, " |");
+ printf("%s\n", buf.buf);
-+ strbuf_reset(&buf);
+
printf("| ");
for (int i = 0; i < name_col_width; i++)
@@ builtin/repo.c: static void stats_table_print_structure(const struct stats_table
}
static void stats_table_clear(struct stats_table *table)
+
+ ## t/unit-tests/u-utf8-width.c ##
+@@ t/unit-tests/u-utf8-width.c: void test_utf8_width__strnwidth_cjk_with_ansi(void)
+ /* "Hello"(5) + "世界"(4) + "!"(1) = 10 */
+ cl_assert_equal_i(10, width);
+ }
++
++/*
++ * Test the strbuf_utf8_align function with CJK characters
++ */
++void test_utf8_width__strbuf_utf8_align(void)
++{
++ struct strbuf buf = STRBUF_INIT;
++
++ /* Test left alignment with CJK */
++ strbuf_utf8_align(&buf, ALIGN_LEFT, 10, "你好");
++ /* Since "你好" is 4 display columns, we need 6 more spaces to reach 10 */
++ cl_assert_equal_s("你好      ", buf.buf);
++ strbuf_reset(&buf);
++
++ /* Test right alignment with CJK */
++ strbuf_utf8_align(&buf, ALIGN_RIGHT, 8, "世界");
++ /* "世界" is 4 display columns, so we need 4 leading spaces */
++ cl_assert_equal_s("    世界", buf.buf);
++ strbuf_reset(&buf);
++
++ /* Test center alignment with CJK */
++ strbuf_utf8_align(&buf, ALIGN_MIDDLE, 10, "中");
++ /* "中" is 2 display columns, so (10-2)/2 = 4 spaces on left, 4 on right */
++ cl_assert_equal_s("    中    ", buf.buf);
++ strbuf_reset(&buf);
++
++ strbuf_utf8_align(&buf, ALIGN_MIDDLE, 5, "中");
++ /* "中" is 2 display columns, so (5-2)/2 = 1 space on left, 2 on right */
++ cl_assert_equal_s(" 中  ", buf.buf);
++ strbuf_reset(&buf);
++
++ /* Test alignment that is smaller than string width */
++ strbuf_utf8_align(&buf, ALIGN_LEFT, 2, "你好");
++ /* Since "你好" is 4 display columns, it should not be truncated */
++ cl_assert_equal_s("你好", buf.buf);
++ strbuf_release(&buf);
++}
--
Jiang Xin
Thread overview: 22+ messages
2025-11-14 5:52 [PATCH 0/2] Fix misaligned output of git repo structure Jiang Xin
2025-11-14 5:52 ` [PATCH 1/2] t/unit-tests: add UTF-8 width tests for CJK chars Jiang Xin
2025-11-14 20:17 ` Junio C Hamano
2025-11-15 12:38 ` Jiang Xin
2025-11-14 5:52 ` [PATCH 2/2] builtin/repo: fix table alignment for UTF-8 characters Jiang Xin
2025-11-14 17:50 ` Justin Tobler
2025-11-15 12:41 ` Jiang Xin
2025-11-14 20:00 ` Junio C Hamano
2025-11-15 12:54 ` Jiang Xin
2025-11-15 16:36 ` Junio C Hamano
2025-11-16 13:32 ` Jiang Xin
2025-11-16 16:51 ` Junio C Hamano
2025-11-14 7:41 ` [PATCH 0/2] Fix misaligned output of git repo structure Kristoffer Haugsbakk
2025-11-14 9:52 ` Jiang Xin
2025-11-14 19:22 ` Junio C Hamano
2025-11-15 12:25 ` Jiang Xin
2025-11-14 16:13 ` Junio C Hamano
2025-11-15 13:36 ` Jiang Xin [this message]
2025-11-15 13:36 ` [PATCH v2 1/2] t/unit-tests: add UTF-8 width tests for CJK chars Jiang Xin
2025-11-15 13:36 ` [PATCH v2 2/2] builtin/repo: fix table alignment for UTF-8 characters Jiang Xin
2025-11-15 15:04 ` Phillip Wood
2025-11-15 16:49 ` Junio C Hamano