From: Jeff King <peff@peff.net>
To: Paul Hammant <paul@hammant.org>
Cc: git@vger.kernel.org
Subject: Re: [bug] git-check-ignore and file names with unicode chars in name - sys-out filename is corrupted
Date: Tue, 9 Aug 2016 02:38:55 -0400
Message-ID: <20160809063854.GA17777@peff.net>
In-Reply-To: <CA+298UiKf6heNPy-NZSfdx47jyS_aK+C8UX3vh6OB3_XE+pn=g@mail.gmail.com>
On Tue, Aug 09, 2016 at 01:47:18AM -0400, Paul Hammant wrote:
> Reproduction:
>
> $ echo "*.ignoreme" >> .gitignore
> # (and commit)
> $ touch "fooo-€.ignoreme"
> $ find . -print | grep fooo | xargs git check-ignore
> "./fooo-\342\202\254.ignoreme"
>
> You could view that git-check-ignore isn't corrupting anything, it is
> just outputting another form for the file name (octal escaped), but it
> doesn't need to change it at all, and it's causing downstream problems
> in bash scripting.
It's not corrupted; like all git commands, check-ignore by default
prints paths with a reversible quoting mechanism, so that odd filenames
are not syntactically ambiguous (e.g., consider a filename with a
newline in it), and so that you don't get binary spew on your terminal.
For robust scripting, you can either:

  - unquote the filenames in the receiving script (detect the presence
    of quoting by a double-quote as the first character, and then
    apply normal C-style dequoting).

or

  - use "-z" to get NUL-delimited filenames with no quoting. Your
    example above has problems in the find, grep, and xargs
    commands, too. A more careful version is:

      find . -print0 | grep -z fooo | git check-ignore --stdin -z
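
If the receiving script is in Python, the first option (dequoting) could be sketched roughly as below. This is only an illustration, not code from git: the function name unquote_git_path is made up, and it assumes the quoted form uses only the usual C escapes (\a \b \f \n \r \t \v \" \\) plus three-digit octal escapes such as \342.

```python
# Hypothetical helper: undo git's C-style path quoting on one output line.
# Assumes the only escapes are the single-letter C escapes, \" and \\,
# and three-digit octal sequences like \342.

_SIMPLE_ESCAPES = {
    ord("a"): 0x07, ord("b"): 0x08, ord("f"): 0x0C, ord("n"): 0x0A,
    ord("r"): 0x0D, ord("t"): 0x09, ord("v"): 0x0B,
    ord('"'): 0x22, ord("\\"): 0x5C,
}


def unquote_git_path(line: bytes) -> bytes:
    """Return the raw path bytes for one line of git output."""
    # Unquoted paths are printed as-is; only dequote when the line is
    # wrapped in double-quotes.
    if not (len(line) >= 2 and line.startswith(b'"') and line.endswith(b'"')):
        return line

    body = line[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != 0x5C:  # not a backslash: copy the byte through
            out.append(ch)
            i += 1
            continue
        i += 1
        if i >= len(body):
            raise ValueError("dangling backslash in quoted path")
        esc = body[i]
        if esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 1
        elif 0x30 <= esc <= 0x33:  # '0'..'3' starts a three-digit octal byte
            digits = body[i:i + 3]
            if len(digits) != 3 or not all(0x30 <= d <= 0x37 for d in digits):
                raise ValueError("malformed octal escape in quoted path")
            out.append(int(digits.decode("ascii"), 8))
            i += 3
        else:
            raise ValueError("unknown escape in quoted path")
    return bytes(out)
```

Working on bytes rather than str keeps the result correct even when a filename is not valid UTF-8; decode it afterwards only if you know the encoding. Feeding it the line from the report, "./fooo-\342\202\254.ignoreme", should give back the UTF-8 bytes of the original name.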
For human readability, you can do:

  git config core.quotepath false

to avoid quoting binary characters (here and in other tools like "git
diff"), which is convenient if you use UTF-8 filenames. It also will
"unbreak" your scripts in the sense that it will avoid quoting in more
situations. The scripts would still choke on weirder filenames
(e.g., ones with embedded tabs or newlines), but in practice you'd
probably never notice.
-Peff