From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Yongmin <revi@omglol.email>
Cc: git@vger.kernel.org
Subject: Re: git format-patch displays weird chars when filename includes non-ascii chars
Date: Tue, 14 May 2024 20:44:14 +0000
Message-ID: <ZkPNHqAUemfdfaFD@tapette.crustytoothpaste.net>
In-Reply-To: <ea41a92d-35df-4b71-be70-a736d620b21f@app.fastmail.com>
On 2024-05-14 at 15:31:43, Yongmin wrote:
> Hi everybody,
>
> When the file name has non-ascii characters, the file name gets mangled somehow. Is this anything from my config side error or something gone weird with git?
>
> Steps to reproduce;
> $ git init
> $ echo 'BlahBlah' > 테스트.txt
> $ git add 테스트.txt
> $ git commit -m 'test commit'
> $ git format-patch --root
> 0001-test-commit.patch
> $ cat 0001-test-commit.patch
>
> From d2aa2b2f5aa290edec6a5fd141318a479ac9de8e Mon Sep 17 00:00:00 2001
> From: Yongmin Hong <revi@omglol.email>
> Date: Tue, 14 May 2024 15:15:52 +0000
> Subject: [PATCH] test commit
>
> ---
> "\355\205\214\354\212\244\355\212\270.txt" | 1 +
> 1 file changed, 1 insertion(+)
> create mode 100644 "\355\205\214\354\212\244\355\212\270.txt"
>
> diff --git "a/\355\205\214\354\212\244\355\212\270.txt" "b/\355\205\214\354\212\244\355\212\270.txt"

In some cases, Git uses escaped strings (often octal[0]) to avoid
problems with encoding when sending patches over email and to produce
unambiguous output. For example, a file name such as "\r\n.txt", written
out raw, would definitely break sending over email.

In addition, while it appears that you're using UTF-8, which is great,
Git does not require file names to be in UTF-8, and it's valid to
specify 0xfe and 0xff (among other byte values) in file names in Git,
even though those two bytes never occur in valid UTF-8.
However, if we wrote those bytes in the body of an email, many users
would be upset when reviewing the patches, since they will usually want
to write their emails in UTF-8, and it's possible the patches might get
mishandled or mangled by a mail server or mail client.

Thus, Git prefers to encode names in a way that is unambiguous and
doesn't lead to mangling. It is inconvenient that legitimate UTF-8 file
names don't get rendered properly, though. I don't _believe_ there's an
option to show the regular UTF-8, but I could be wrong.
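
On that last point, the core.quotePath setting described in
git-config(1) looks relevant: when it is false, bytes higher than 0x80
are no longer treated as "unusual" and quoted. The following is only a
sketch, assuming git is installed and a throwaway directory from
mktemp(1) is acceptable; the identity values and the variable names
(repo, quoted, raw) are placeholders, not anything from the report.

```shell
# Rebuild the reported repository in a scratch directory (assumption:
# git and mktemp are available; nothing outside "$repo" is touched).
repo=$(mktemp -d)
git init -q "$repo"
echo 'BlahBlah' > "$repo/테스트.txt"
git -C "$repo" add 테스트.txt
# Placeholder identity so the commit works without any global config.
git -C "$repo" -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -m 'test commit'

# Default: bytes above 0x80 in the path are written as octal escapes.
quoted=$(git -C "$repo" format-patch --root --stdout)

# core.quotePath=false for this one invocation: the same bytes should be
# passed through unescaped.
raw=$(git -C "$repo" -c core.quotePath=false format-patch --root --stdout)

# Show the diff header line from each variant for comparison.
printf '%s\n' "$quoted" | grep 'diff --git'
printf '%s\n' "$raw" | grep 'diff --git'
```

To make this permanent for a repository, "git config core.quotePath
false" should have the same effect. Whether raw 8-bit names survive every
mail path is a separate question, as discussed above.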

[0] I don't know why we chose octal, and I'd much prefer hexadecimal, but
I wonder if it may have originally been to pipe to printf(1), which
POSIX requires to accept octal, but unfortunately not hexadecimal,
escapes.
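
The printf(1) behaviour mentioned in [0] can be tried directly: POSIX
printf expands \ddd octal escapes in its format operand, so the quoted
name from the patch turns back into the original bytes. A small sketch
(the variable names escaped and decoded are just for illustration, and
passing data as the format operand is only safe here because the name
contains no '%'):

```shell
# Octal-escaped file name as it appears in the format-patch output,
# with the surrounding double quotes removed.
escaped='\355\205\214\354\212\244\355\212\270.txt'

# The format operand is where printf interprets \ddd octal escapes, so
# this turns the escapes back into raw bytes.
decoded=$(printf "$escaped")

# On a UTF-8 terminal this displays the original name.
printf '%s\n' "$decoded"
```

Names that contain '%' or other backslash sequences would need more care
than this one-liner gives them.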
--
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA