git.vger.kernel.org archive mirror
* [bug] git-check-ignore and file names with unicode chars in name - sys-out filename is corrupted
From: Paul Hammant @ 2016-08-09  5:47 UTC
  To: git

Reproduction:

  $ echo "*.ignoreme" >> .gitignore
  # (and commit)
  $ touch "fooo-€.ignoreme"
  $ find . -print | grep fooo | xargs git check-ignore
  "./fooo-\342\202\254.ignoreme"

You could take the view that git-check-ignore isn't corrupting anything
and is just outputting another form of the file name (octal-escaped),
but it doesn't need to change it at all, and it's causing downstream
problems in bash scripting.

Of course this may get munged by Gmail or a list manager. In the text
above, you should see the unicode char "euro sign" to the right of a
dash and to the left of .ignoreme.

Git version is 2.9.2

-ph


* Re: [bug] git-check-ignore and file names with unicode chars in name - sys-out filename is corrupted
From: Jeff King @ 2016-08-09  6:38 UTC
  To: Paul Hammant; +Cc: git

On Tue, Aug 09, 2016 at 01:47:18AM -0400, Paul Hammant wrote:

> Reproduction:
> 
>   $ echo "*.ignoreme" >> .gitignore
>   # (and commit)
>   $ touch "fooo-€.ignoreme"
>   $ find . -print | grep fooo | xargs git check-ignore
>   "./fooo-\342\202\254.ignoreme"
> 
> You could take the view that git-check-ignore isn't corrupting anything
> and is just outputting another form of the file name (octal-escaped),
> but it doesn't need to change it at all, and it's causing downstream
> problems in bash scripting.

It's not corrupted; like all git commands, check-ignore by default
prints paths with a reversible quoting mechanism, so that odd filenames
are not syntactically ambiguous (e.g., consider a filename with a
newline in it), and so that you don't get binary spew on your terminal.
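To make the quoting concrete, here is a rough Python sketch of how the
escaped form in the report arises. It only approximates git's default
behaviour and is not git's actual implementation; the helper name
c_quote is made up for this illustration.

```python
def c_quote(path: bytes) -> str:
    """Approximate git's default path quoting (an illustration only)."""
    # Bytes that get a named C-style escape.
    named = {
        0x07: "\\a", 0x08: "\\b", 0x09: "\\t", 0x0A: "\\n",
        0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r",
        0x22: '\\"', 0x5C: "\\\\",
    }
    out = []
    needs_quotes = False
    for b in path:
        if b in named:
            out.append(named[b])
            needs_quotes = True
        elif b < 0x20 or b >= 0x7F:
            # Control bytes and high-bit bytes become 3-digit octal escapes.
            out.append("\\%03o" % b)
            needs_quotes = True
        else:
            out.append(chr(b))
    body = "".join(out)
    # The surrounding double quotes mark the path as escaped.
    return '"' + body + '"' if needs_quotes else body


# "\u20ac" is the euro sign; its UTF-8 encoding is the bytes E2 82 AC.
print(c_quote("./fooo-\u20ac.ignoreme".encode("utf-8")))
# prints "./fooo-\342\202\254.ignoreme"
```

The three octal escapes are the three UTF-8 bytes of the euro sign, so
no information is lost and the original name can be recovered.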

For robust scripting, you can either:

  - unquote the filenames in the receiving script (detect quoting by
    a double-quote as the first character, then apply normal C-style
    dequoting).

or

  - use "-z" to get NUL-delimited filenames with no quoting. Your
    example above has problems in the find, grep, and xargs
    commands, too. A more careful version is:

      find . -print0 | grep -z fooo | git check-ignore --stdin -z
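As a sketch of the first option, a receiving script written in Python
could dequote a line roughly like this. The helper name c_unquote is
hypothetical, and the sketch assumes the escapes are limited to
three-digit octal sequences and the usual single-letter C escapes.

```python
import re

# Single-character escapes and the bytes they stand for.
_SIMPLE = {
    b"a": b"\a", b"b": b"\b", b"t": b"\t", b"n": b"\n",
    b"v": b"\v", b"f": b"\f", b"r": b"\r",
    b'"': b'"', b"\\": b"\\",
}
# A backslash followed by a 3-digit octal value or a known escape letter.
_ESCAPE = re.compile(rb'\\([0-3][0-7]{2}|[abtnvfr"\\])')


def c_unquote(line: bytes) -> bytes:
    """Return the raw path bytes for one line of quoted-or-plain output."""
    # Unquoted paths are passed through unchanged.
    if not (len(line) >= 2 and line.startswith(b'"') and line.endswith(b'"')):
        return line

    def repl(match):
        esc = match.group(1)
        if len(esc) == 3:
            return bytes([int(esc, 8)])  # octal escape -> single byte
        return _SIMPLE[esc]

    # Drop the surrounding quotes, then expand escapes left to right.
    return _ESCAPE.sub(repl, line[1:-1])


quoted = b'"./fooo-\\342\\202\\254.ignoreme"'
print(c_unquote(quoted))
```

The result is a byte string; decode it as UTF-8 only if you know the
filenames are UTF-8.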

For human readability, you can do:

  git config core.quotepath false

to avoid quoting high-bit (non-ASCII) bytes (here and in other tools
like "git diff"), which is convenient if you use UTF-8 filenames. It
will also "unbreak" your scripts in the sense that it will avoid quoting
in more situations. The scripts would still choke on weirder filenames
(e.g., ones with embedded tabs or newlines), but in practice you'd
probably never notice.
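For a script that should survive even those weirder filenames, the "-z"
form can be driven from Python along these lines. This is a sketch
only: split_nul and ignored_paths are hypothetical helper names, and it
assumes git is on the PATH and the script runs inside the repository.

```python
import subprocess
from typing import List


def split_nul(data: bytes) -> List[bytes]:
    """Split NUL-terminated output into a list of raw path bytes."""
    return [part for part in data.split(b"\0") if part]


def ignored_paths(paths: List[bytes]) -> List[bytes]:
    """Ask git which of the given paths are ignored, without quoting."""
    # Each path is NUL-terminated on stdin, so tabs or newlines inside
    # a filename cannot be mistaken for a separator.
    stdin = b"".join(p + b"\0" for p in paths)
    # check-ignore exits with status 1 when no path is ignored, so the
    # exit status is deliberately not treated as an error here.
    proc = subprocess.run(
        ["git", "check-ignore", "--stdin", "-z"],
        input=stdin,
        stdout=subprocess.PIPE,
    )
    return split_nul(proc.stdout)
```

Working with bytes throughout avoids guessing the filename encoding;
decode only when displaying a name to a human.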

-Peff

