From: Junio C Hamano <gitster@pobox.com>
To: Jiang Xin <worldhello.net@gmail.com>
Cc: "Git List" <git@vger.kernel.org>,
"Justin Tobler" <jltobler@gmail.com>,
"Alexander Shopov" <ash@kambanaria.org>,
"Mikel Forcada" <mikel.forcada@gmail.com>,
"Ralf Thielow" <ralf.thielow@gmail.com>,
"Jean-Noël Avila" <jn.avila@free.fr>,
"Bagas Sanjaya" <bagasdotme@gmail.com>,
"Dimitriy Ryazantcev" <DJm00n@mail.ru>,
"Peter Krefting" <peter@softwolves.pp.se>,
"Emir SARI" <bitigchi@me.com>, "Arkadii Yakovets" <ark@cho.red>,
"Vũ Tiến Hưng" <newcomerminecraft@gmail.com>,
"Teng Long" <dyroneteng@gmail.com>,
"Yi-Jyun Pan" <pan93412@gmail.com>
Subject: Re: [PATCH 1/2] t/unit-tests: add UTF-8 width tests for CJK chars
Date: Fri, 14 Nov 2025 12:17:31 -0800
Message-ID: <xmqqzf8ogyhw.fsf@gitster.g>
In-Reply-To: <04ab347ff80e16d49524246a8923cc86cc7355be.1763098804.git.worldhello.net@gmail.com> (Jiang Xin's message of "Fri, 14 Nov 2025 00:52:44 -0500")

Jiang Xin <worldhello.net@gmail.com> writes:

[jc: the same question about the choice of Cc addresses applies]

> This commit adds a new test suite (u-utf8-width.c) to test the UTF-8
> width functions in Git, particularly focusing on multi-byte characters
> from East Asian languages like Chinese, Japanese, and Korean that
> typically require 2 display columns per character.
>
> The test suite includes:
> - Tests for utf8_strnwidth with Chinese strings
> - Tests for utf8_strwidth with Chinese strings
> - Tests for Japanese and Korean characters
> - Edge case tests with invalid UTF-8 sequences
> - Proper test function naming following the Clar framework convention
>
> Also updated the build configuration in Makefile and meson.build to
> include the new test suite in the build process.

The usual way to compose a log message of this project is to

 - Give an observation on how the current system works in the
   present tense (so no need to say "Currently X is Y", or
   "Previously X was Y" to describe the state before your change;
   just "X is Y" is enough), and discuss what you perceive as a
   problem in it.

 - Propose a solution (optional; often, the problem description
   trivially leads to an obvious solution in readers' minds).

 - Give commands to somebody editing the codebase to "make it so",
   instead of saying "This commit does X".

in this order.

> + /* Test length limiting */
> + str = "你好世界";
> + cl_assert_equal_i(2, utf8_strnwidth(str, 3, 0)); /* Only first char "你"(2 columns) within 3 bytes */
> + cl_assert_equal_i(4, utf8_strnwidth(str, 6, 0)); /* First two chars "你好"(4 columns) in 6 bytes */

We should also test utf8_strwidth() on the same string here.

> +/*
> + * Test edge cases with partial UTF-8 sequences
> + */

All tests before these make sense, but I am not sure we want to
hold utf8_strnwidth() to the requirement that it tolerate "len"
ending in the middle of a single character, as such a requirement
by itself does not do applications any good.

A caller may have "你好世界" in str, learn that the first 4 bytes
would only need two display columns to show (i.e., the 3-byte "你"
plus a single garbage byte that would form UTF-8 encoded "好" if the
remaining two bytes were included), and may want to learn how to
show only enough to fill the two display columns. But
utf8_strnwidth() does not give back enough information for such a
caller to figure out that it needs to feed only the first three
bytes (not four) of str to printf() to do so.