git.vger.kernel.org archive mirror
* Truncating file names with Unicode characters
@ 2018-06-08 18:25 Vitaly Potyarkin
  2018-06-09  6:23 ` Jeff King
From: Vitaly Potyarkin @ 2018-06-08 18:25 UTC
  To: git

# Truncating file names with Unicode characters

When shortening file names that contain Unicode characters, git truncates them
byte by byte, without awareness of multi-byte UTF-8 characters. That often splits
a character in half and displays the garbage byte that is left over.
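
To illustrate the effect, here is a minimal Python sketch (not git's actual
code). It keeps the last N bytes of a UTF-8 file name, first naively and then
by skipping any leading continuation bytes so the result starts on a character
boundary. The function names and the 13-byte limit are my own assumptions,
chosen only for the demonstration.

```python
def truncate_tail_bytes(raw: bytes, max_bytes: int) -> bytes:
    """Keep the last max_bytes bytes, ignoring character boundaries."""
    start = max(0, len(raw) - max_bytes)
    return raw[start:]


def truncate_tail_utf8(raw: bytes, max_bytes: int) -> bytes:
    """Keep at most max_bytes trailing bytes, starting on a character boundary."""
    tail = truncate_tail_bytes(raw, max_bytes)
    skip = 0
    # UTF-8 continuation bytes look like 0b10xxxxxx; a tail that starts with
    # one begins in the middle of a character, so drop those bytes.
    while skip < len(tail) and (tail[skip] & 0xC0) == 0x80:
        skip += 1
    return tail[skip:]


# File name taken from the stat output in this report; each Cyrillic letter
# occupies two bytes in UTF-8.
raw_name = "Enum.СтавкиНДС.xml".encode("utf-8")

# Naive cut lands inside a two-byte letter and leaves a stray byte behind.
naive = truncate_tail_bytes(raw_name, 13).decode("utf-8", errors="replace")
# Boundary-aware cut drops the stray byte instead of displaying it.
safe = truncate_tail_utf8(raw_name, 13).decode("utf-8")
```

Here `naive` begins with the replacement character U+FFFD (a terminal using a
legacy code page would show some other garbage glyph), while `safe` is
`иНДС.xml`.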

Unawareness of Unicode also means that file name length is calculated in bytes
rather than in characters, so some output gets misaligned.

I have tested this with git 2.14.1 on Windows and with git 2.11.0 on Linux. My
configuration includes setting `core.quotepath = off` to display Unicode paths.

# Example: `git log --stat`

## Bad output: half-characters and wrong text alignment

The last file name gets truncated in the middle of a character (`ˆ` is
what's left of it). Text alignment is off because string lengths are calculated
in bytes instead of characters.

    Extension/README.md                                |  28 +++++++++
    .../Catalog.Номенклатура.xml           |  32 ++++++++++
    .../Configuration.xml                              |   5 +-
    ...етПереработчика.ObjectModule.txt |  39 ++++++++++++
    ...cument.ОтчетПереработчика.xml |  68 +++++++++++++++++++++
    .../Enum.СтавкиНДС.xml                    |  24 ++++++++
    ...ˆирениеERPПотяркин_2018-06-05.cfe | Bin 0 -> 22018 bytes
    7 files changed, 195 insertions(+), 1 deletion(-)
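
The misalignment can be reproduced with a short Python sketch. This is only an
illustration under my own assumptions, not git's implementation: the helper
names are hypothetical, and `display_width` is a rough approximation that
counts combining marks as zero columns and East Asian wide characters as two.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_by_bytes(name: str, column: int) -> str:
    """Pad as if every byte occupied one column (the behaviour reported here)."""
    return name + " " * max(0, column - len(name.encode("utf-8")))


def pad_by_width(name: str, column: int) -> str:
    """Pad using the approximate display width instead of the byte count."""
    return name + " " * max(0, column - display_width(name))


# Two names from the stat output above, padded to a hypothetical 40 columns.
ascii_name = ".../Configuration.xml"
cyrillic_name = ".../Enum.СтавкиНДС.xml"

by_bytes = [pad_by_bytes(n, 40) + "|" for n in (ascii_name, cyrillic_name)]
by_width = [pad_by_width(n, 40) + "|" for n in (ascii_name, cyrillic_name)]
```

With byte-based padding the `|` after the Cyrillic name lands nine columns too
far left, one for each two-byte letter. With width-based padding both
separators end up in the same column.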

## Good output with ASCII file names

Truncation and alignment are done right because each character is represented
by a single byte.

    .../index.html                                             | 14 ++++++++++++++
    docs/posts/2017/loops-in-power-query-m-language/index.html | 14 ++++++++++++++
    .../index.html                                             |  7 +++++++
    .../temporary-virtual-environment-for-python/index.html    | 14 ++++++++++++++
    .../index.html                                             | 14 ++++++++++++++
    docs/posts/2018/getting-started-with-libpq/index.html      | 14 ++++++++++++++
    .../index.html                                             | 14 ++++++++++++++
    .../2018/unit-testing-in-power-query-m-language/index.html |  7 +++++++
    8 files changed, 98 insertions(+)
