From: "Andrey Bienkowski via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Andrey Bienkowski <hexagonrecursion@gmail.com>,
	Andrey Bienkowski <hexagonrecursion@gmail.com>
Subject: [PATCH] doc: clarify the filename encoding in git diff
Date: Mon, 19 Apr 2021 13:27:35 +0000
Message-ID: <pull.996.git.git.1618838856399.gitgitgadget@gmail.com>

From: Andrey Bienkowski <hexagonrecursion@gmail.com>

AFAICT, parsing the output of `git diff --name-only master...feature`
is the intended way of programmatically getting the list of files
modified by a feature branch. Such text cannot be parsed reliably
without knowing its encoding, but the output encoding of
`git diff --name-only` and `git diff --name-status` is not documented.
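Besides the encoding, the default `--name-only` output has a second
wrinkle: unless `-z` is given, paths with "unusual" characters are
quoted as described for `core.quotePath` in git-config(1). With the
default `core.quotePath=true` this covers non-ASCII bytes, so a UTF-8
name such as `café.txt` is printed as `"caf\303\251.txt"`. A minimal
Python sketch of undoing that quoting, assuming one path per line and
well-formed input; `unquote_git_path` is a hypothetical helper, not a
Git API:

```python
# Named escapes used by Git's C-style path quoting (see core.quotePath).
_NAMED_ESCAPES = {
    ord("a"): 0x07, ord("b"): 0x08, ord("t"): 0x09, ord("n"): 0x0A,
    ord("v"): 0x0B, ord("f"): 0x0C, ord("r"): 0x0D,
    ord('"'): 0x22, ord("\\"): 0x5C,
}


def unquote_git_path(line: bytes) -> bytes:
    """Undo Git's C-style quoting of one path from --name-only output.

    Unquoted paths are returned unchanged; quoted ones lose their
    surrounding double quotes and have their escapes decoded to raw
    bytes. The result still has to be decoded (usually as UTF-8).
    """
    if not (len(line) >= 2 and line.startswith(b'"') and line.endswith(b'"')):
        return line
    body = line[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != 0x5C:  # not a backslash: copy the byte through
            out.append(c)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in _NAMED_ESCAPES:
            out.append(_NAMED_ESCAPES[nxt])
            i += 2
        else:
            # Three-digit octal escape, e.g. \303 for the byte 0xC3.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    return bytes(out)
```

For example, `unquote_git_path(b'"caf\\303\\251.txt"').decode("utf-8")`
yields `café.txt`. Passing `-z` avoids the quoting altogether.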

I asked on the mailing list and got this:
https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
> There's some discussion in Documentation/i18n.txt, which is included in
> various manpages (e.g., https://git-scm.com/docs/git-log#_discussion)
> but it doesn't seem to be mentioned in git-diff.
>
> The short answer is: mostly utf8, but historically on platforms that
> don't care (like Linux) you could get away with other encodings.
>
> -Peff

My takeaway was to always parse it as utf8 regardless of platform or
environment.
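Following that takeaway, a minimal Python sketch, assuming Python 3.7+
and a `git` executable on `PATH`. Passing `-z` makes Git print the
paths verbatim and NUL-terminated (no `core.quotePath` quoting), and
the bytes are then decoded as UTF-8. Because other encodings were
historically possible, undecodable bytes are kept as lone surrogates
via Python's `surrogateescape` error handler instead of raising.
`parse_name_only_z` and `changed_files` are hypothetical names, not
part of Git:

```python
import subprocess
from typing import List


def parse_name_only_z(raw: bytes) -> List[str]:
    """Split NUL-terminated `git diff --name-only -z` output into paths.

    Paths are decoded as UTF-8. Bytes that are not valid UTF-8 (possible
    in legacy repositories) become lone surrogates, and encoding the
    string back with errors="surrogateescape" recovers the original bytes.
    """
    return [p.decode("utf-8", errors="surrogateescape")
            for p in raw.split(b"\0") if p]  # drop the empty trailing field


def changed_files(base: str, branch: str) -> List[str]:
    """Files changed on `branch` since it diverged from `base`."""
    raw = subprocess.run(
        ["git", "diff", "--name-only", "-z", f"{base}...{branch}"],
        capture_output=True, check=True,
    ).stdout
    return parse_name_only_z(raw)
```

For example, `changed_files("master", "feature")` runs the same
`git diff --name-only master...feature` command mentioned above, with
`-z` added.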

Signed-off-by: Andrey Bienkowski <hexagonrecursion@gmail.com>
---
    doc: clarify the filename encoding in git diff --name-only and
    --name-status
    
    Changes since v1:
    
     * Replace "always" with "often"
     * Add a link to https://git-scm.com/docs/git-log

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-996%2Fhexagonrecursion%2Futf8-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-996/hexagonrecursion/utf8-v1
Pull-Request: https://github.com/git/git/pull/996

 Documentation/diff-options.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index aa2b5c11f20b..4ce36ef535ba 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,11 +293,14 @@ explained for the configuration variable `core.quotePath` (see
 linkgit:git-config[1]).
 
 --name-only::
-	Show only names of changed files.
+	Show only names of changed files. The file names are usually encoded in UTF-8.
+	For more information see the discussion about encoding in the linkgit:git-log[1]
+	manual page.
 
 --name-status::
 	Show only names and status of changed files. See the description
 	of the `--diff-filter` option on what the status letters mean.
+	Just like `--name-only` the file names are usually encoded in UTF-8.
 
 --submodule[=<format>]::
 	Specify how differences in submodules are shown.  When specifying

base-commit: 48bf2fa8bad054d66bd79c6ba903c89c704201f7
-- 
gitgitgadget

Thread overview: 3+ messages
2021-04-19 13:27 Andrey Bienkowski via GitGitGadget [this message]
2021-04-19 21:33 ` [PATCH] doc: clarify the filename encoding in git diff Junio C Hamano
2021-04-20 11:24 ` [PATCH v2] " Andrey Bienkowski via GitGitGadget