* Stupid quoting...
From: David Kastrup @ 2007-06-13 11:30 UTC
To: git
Hi,
what is the point in quoting file names and their characters in
git-diff's output? And what is the recommended way of undoing the
damage?
I have something like

git-diff -M -C --name-status -r master^ master | {
    while read -r flag name
    do
        # Rewrite git's \NNN octal escapes to \0NNN, the form "echo -e" expects
        case "$name" in *\\[0-3][0-7][0-7]*)
            name=$(echo -e $(echo "$name"|sed 's/\\\([0-3][0-7][0-7]\)/\\0\1/g;s/\\\([^0]\)/\\\\\1/g'))
            ;;
        esac
        [...]

in order to get through the worst with utf-8 file names, and it is a
complete nuisance (double quotemarks are treated later).
Is there any utility or pipe or invocation that can take a sequence of
filenames as printed by git and turn them back into what they actually
were in the first place?
--
David Kastrup

* Re: Stupid quoting...
From: Alex Riesen @ 2007-06-13 12:06 UTC
To: David Kastrup; +Cc: git

On 6/13/07, David Kastrup <dak@gnu.org> wrote:
>
> what is the point in quoting file names and their characters in
> git-diff's output? And what is the recommended way of undoing the
> damage?
>

Just use "-z". Everything will be unquoted and separated by \0.
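
A minimal sketch of consuming the "-z" output suggested above, written in Python because NUL-separated records are awkward to handle in plain shell. It assumes that `--name-status -z` emits each record as a status field followed by one path (two paths, source then destination, for rename and copy entries), with every field terminated by NUL. `parse_name_status_z` is a hypothetical helper name, not part of git.

```python
def parse_name_status_z(data: bytes):
    """Split NUL-separated name-status output into (status, [paths]) records."""
    fields = data.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty field after the final NUL terminator
    records = []
    i = 0
    while i < len(fields):
        status = fields[i].decode("ascii")
        # Assumption: rename (R...) and copy (C...) records carry two paths,
        # source then destination; every other status carries one path.
        n_paths = 2 if status[:1] in ("R", "C") else 1
        records.append((status, fields[i + 1 : i + 1 + n_paths]))
        i += 1 + n_paths
    return records


# Hand-written sample input standing in for real git output.
sample = b"M\0ni\xc3\xb1o.txt\0R100\0old name\0new name\0"
for status, paths in parse_name_status_z(sample):
    print(status, [p.decode("utf-8") for p in paths])
```

The input could come from something like `subprocess.run(["git", "diff", "-M", "-C", "--name-status", "-z", "master^", "master"], capture_output=True, check=True).stdout`. Paths are kept as bytes so that no filename encoding has to be guessed.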

* Re: Stupid quoting...
From: Johannes Schindelin @ 2007-06-13 12:21 UTC
To: David Kastrup; +Cc: git

Hi,

On Wed, 13 Jun 2007, David Kastrup wrote:

> what is the point in quoting file names and their characters in
> git-diff's output? And what is the recommended way of undoing the
> damage?

The recommended way is not using spaces to begin with. I mean, does
"David" contain spaces? People seem not to see the problem, and fail to
blame Microsoft for all the damage they have done, introducing that
stupid, stupid concept of filenames containing spaces, and _enforcing_ it.

> I have something like
>
> git-diff -M -C --name-status -r master^ master | {
> while read -r flag name
> do
> case "$name" in *\\[0-3][0-7][0-7]*)
> name=$(echo -e $(echo "$name"|sed 's/\\\([0-3][0-7][0-7]\)/\\0\1/g;s/\\\([^0]\)/\\\\\1/g'))
> esac
> [...]
>
> in order to get through the worst with utf-8 file names, and it is a
> complete nuisance (double quotemarks are treated later).

Please understand that the quotes are not there for you, but for
processing by other programs.

However, I _suspect_ that you want to do something like

    name="$(echo $name)"

because "echo" is exactly one of the programs this quoting was invented
for.

Ciao,
Dscho
[parent not found: <86ejkgvxmb.fsf@lola.quinscape.zz>]

* Re: Stupid quoting...
From: Johannes Schindelin @ 2007-06-14 0:51 UTC
To: David Kastrup; +Cc: git

Hi,

[somehow I got the impression your mail did not make it to the list]

On Wed, 13 Jun 2007, David Kastrup wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > On Wed, 13 Jun 2007, David Kastrup wrote:
> >
> >> what is the point in quoting file names and their characters in
> >> git-diff's output? And what is the recommended way of undoing the
> >> damage?
> >
> > The recommended way is not using spaces to begin with.
>
> Who is talking about spaces?

That is the common reason for quoting. I mean, really, how many files do
you have which contain newlines or backslashes or tabs? Huh?

> > I mean, does "David" contain spaces?
>
> "Günter" contains non-ASCII characters.

And "Guenther" (sorry, have problems with my mailer, so I simulate it in
plain ASCII) does not need quotes, _even_ if containing non-ASCII
characters.

So what exactly was your point again?

> > People seem not to see the problem, and fail to blame Microsoft for
> > all the damage they have done, introducing that stupid, stupid concept
> > of filenames containing spaces, and _enforcing_ it.
>
> The concept of UNIX file names is _any_ byte sequence not containing "/"
> or an ASCII NUL. Microsoft actually prohibits quite a few more
> characters. Filenames with spaces first came into serious use under
> MacOS, the first graphical user interface where no shell and
> metacharacters interfered with the choice of file names.
>
> Blaming Microsoft here is completely ridiculous.

It is completely unridiculous. Before Microsoft -- in its infinite wisdom
-- decided to create folders like "Program Files", and "Documents and
Settings", and made it the _default_ (of all things) to save its
ridiculous Word documents as "New Document", _nobody_ on this planet even
_thought_ about including stupid whitespace in a filename.

You can tell that this is true by looking at now-ancient Unix scripts.

> >> I have something like
> >>
> >> git-diff -M -C --name-status -r master^ master | {
> >> while read -r flag name
> >> do
> >> case "$name" in *\\[0-3][0-7][0-7]*)
> >> name=$(echo -e $(echo "$name"|sed 's/\\\([0-3][0-7][0-7]\)/\\0\1/g;s/\\\([^0]\)/\\\\\1/g'))
> >> esac
> >> [...]
> >>
> >> in order to get through the worst with utf-8 file names, and it is a
> >> complete nuisance (double quotemarks are treated later).
> >
> > Please understand that the quotes are not there for you, but for
> > processing by other programs.
> >
> > However, I _suspect_ that you want to do something like
> >
> >     name="$(echo $name)"
> >
> > because "echo" is exactly one of the programs this quoting was invented
> > for.
>
> Only that it does not work with echo. echo requires \0NNN for octal
> escapes, not \NNN, and then only when "echo -e" is used.

Um. How does that apply here? Git only does quoting so that programs like
echo get it right, when passed the name? No funny \0NNN or \NNN or
whatever?

> You are really haphazard in distributing your blame.
>
> Can you actually name a program that would work with the default
> output of git here?

echo.

Ciao,
Dscho

* Re: Stupid quoting...
From: David Kastrup @ 2007-06-14 6:12 UTC
To: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> [somehow I got the impression your mail did not make it to the list]
>
> On Wed, 13 Jun 2007, David Kastrup wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > On Wed, 13 Jun 2007, David Kastrup wrote:
>> >
>> >> what is the point in quoting file names and their characters in
>> >> git-diff's output? And what is the recommended way of undoing the
>> >> damage?
>> >
>> > The recommended way is not using spaces to begin with.
>>
>> Who is talking about spaces?
>
> That is the common reason for quoting. I mean, really, how many files do
> you have which contain newlines or backslashes or tabs? Huh?

I am talking about non-ASCII characters.

>> > I mean, does "David" contain spaces?
>>
>> "Günter" contains non-ASCII characters.
>
> And "Guenther" (sorry, have problems with my mailer, so I simulate
> it in plain ASCII) does not need quotes, _even_ if containing
> non-ASCII characters.
>
> So what exactly was your point again?

You _are_ aware that git writes out \303\274 (8 characters: 2
backslashes and 6 digits) instead of ü in a file name? And I am
talking about a pure utf-8 locale.

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

My point was that these octal escape sequences are utterly pointless.

>> > People seem not to see the problem, and fail to blame Microsoft for
>> > all the damage they have done, introducing that stupid, stupid concept
>> > of filenames containing spaces, and _enforcing_ it.
>>
>> The concept of UNIX file names is _any_ byte sequence not
>> containing "/" or an ASCII NUL. Microsoft actually prohibits quite
>> a few more characters. Filenames with spaces first came into
>> serious use under MacOS, the first graphical user interface where
>> no shell and metacharacters interfered with the choice of file
>> names.
>>
>> Blaming Microsoft here is completely ridiculous.
>
> It is completely unridiculous. Before Microsoft -- in its infinite
> wisdom -- decided to create folders like "Program Files", and
> "Documents and Settings", and made it the _default_ (of all things)
> to save its ridiculous Word documents as "New Document", _nobody_ on
> this planet even _thought_ about including stupid whitespace in a
> filename.
>
> You can tell that this is true by looking at now-ancient Unix
> scripts.

You are making a spectacle of yourself. Do you even read what you are
replying to? When spaces became commonplace in _MacOS_, _MacOS_ was by
no means Unix-based. Microsoft only followed the trend (with a delay
of several years, by the way) when imitating the MacOS GUI.

>> >> I have something like
>> >>
>> >> git-diff -M -C --name-status -r master^ master | {
>> >> while read -r flag name
>> >> do
>> >> case "$name" in *\\[0-3][0-7][0-7]*)
>> >> name=$(echo -e $(echo "$name"|sed 's/\\\([0-3][0-7][0-7]\)/\\0\1/g;s/\\\([^0]\)/\\\\\1/g'))
>> >> esac
>> >> [...]
>> >>
>> >> in order to get through the worst with utf-8 file names, and it is a
>> >> complete nuisance (double quotemarks are treated later).
>> >
>> > Please understand that the quotes are not there for you, but for
>> > processing by other programs.
>> >
>> > However, I _suspect_ that you want to do something like
>> >
>> >     name="$(echo $name)"
>> >
>> > because "echo" is exactly one of the programs this quoting was invented
>> > for.
>>
>> Only that it does not work with echo. echo requires \0NNN for
>> octal escapes, not \NNN, and then only when "echo -e" is used.
>
> Um. How does that apply here? Git only does quoting so that programs
> like echo get it right, when passed the name? No funny \0NNN or \NNN
> or whatever?

git puts out funny \NNN quotes. That's what I am complaining about.

>> You are really haphazard in distributing your blame.
>>
>> Can you actually name a program that would work with the default
>> output of git here?
>
> echo.

It doesn't, since it does not interpret the \NNN escape sequences that
git chooses to output.

--
David Kastrup

* Re: Stupid quoting...
From: Alex Riesen @ 2007-06-14 7:06 UTC
To: David Kastrup; +Cc: git

On 6/14/07, David Kastrup <dak@gnu.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >> Can you actually name a program that would work with the default
> >> output of git here?
> >
> > echo.
>
> It doesn't, since it does not interpret the \NNN escape sequences that
> git chooses to output.

Have you tried that -z switch yet?
[parent not found: <86hcpb6lr6.fsf@lola.quinscape.zz>]

* Re: Stupid quoting...
From: Alex Riesen @ 2007-06-14 8:51 UTC
To: David Kastrup; +Cc: git

On 6/14/07, David Kastrup <dak@gnu.org> wrote:
> "Alex Riesen" <raa.lkml@gmail.com> writes:
>
> > On 6/14/07, David Kastrup <dak@gnu.org> wrote:
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >> >> Can you actually name a program that would work with the default
> >> >> output of git here?
> >> >
> >> > echo.
> >>
> >> It doesn't, since it does not interpret the \NNN escape sequences that
> >> git chooses to output.
> >
> > Have you tried that -z switch yet?
>
> What has that to do with "the default output of git"?
>
> Yes, in my application I _will_ be using -z

Just checking.

> (in connection with the rather hackish read -d '' name
> command from bash which is not really documented) but that does not
> change the fact that the default output is broken. There is no reason
> whatsoever to use octal quotes for non-ASCII characters. Neither
> programs nor humans are better off by that, and none of the derision
> bestowed upon me changes that.

Well, fix that. How do you think it _should_ be? It's just that up until
now you have only been complaining. No _useful_ idea came from you.

* Re: Stupid quoting...
From: Steven Grimm @ 2007-06-14 1:06 UTC
To: Johannes Schindelin; +Cc: David Kastrup, git

Johannes Schindelin wrote:
> The recommended way is not using spaces to begin with. I mean, does
> "David" contain spaces? People seem not to see the problem, and fail to
> blame Microsoft for all the damage they have done, introducing that
> stupid, stupid concept of filenames containing spaces, and _enforcing_ it.
>

To be fair, Microsoft did not invent the concept of filenames with
spaces. Even UNIX has, I believe, always allowed them, though you risked
running into buggy scripts misbehaving if you used them. And really,
filenames with spaces are only a nuisance in a text-based command line
scripting environment, and only then because someone early on decided to
use space rather than some other metacharacter as the only available
delimiter between command arguments in scripts.

For regular users who aren't writing shell scripts, long-form filenames
mean a document's name is the same as its title, which is hugely helpful
from a user interface point of view: "February Marketing Budget" is much
more self-documenting as a filename than "febmkbgt" or whatever they
would have had to choose in the old days (and remember, UNIX had a
14-character filename size limit in the early days; long filenames
weren't introduced until BSD.)

I view filenames as primarily for human consumption, so they should act
the way humans expect names to act. Computers can just use inode numbers
or file IDs to refer to files (like, say, SHA1 hashes) -- the pretty
names are all for the benefit of us meat-brained entities.

Then again, that means I also think case sensitivity in filenames was a
bad design choice. To use your "people's names" analogy, if you ask any
random person whether "Billy-bob Thornton" and "Billy-Bob Thornton" are
the same name, they'll almost always say yes; most people don't consider
capital letters and lower-case letters to be different letters, just
different forms of the same letters. And I know how popular *that*
opinion is around here...

-Steve

* Re: Stupid quoting...
From: Johannes Schindelin @ 2007-06-14 1:12 UTC
To: Steven Grimm; +Cc: David Kastrup, git

Hi,

On Wed, 13 Jun 2007, Steven Grimm wrote:

> Johannes Schindelin wrote:
> > The recommended way is not using spaces to begin with. I mean, does "David"
> > contain spaces? People seem not to see the problem, and fail to blame
> > Microsoft for all the damage they have done, introducing that stupid,
> > stupid concept of filenames containing spaces, and _enforcing_ it.
> >
>
> To be fair, Microsoft did not invent the concept of filenames with spaces.

I didn't say that, did I? They _forced_ the use onto the world. That's
what I was complaining about.

> Even UNIX has, I believe, always allowed them, though you risked running
> into buggy scripts misbehaving if you used them. And really, filenames
> with spaces are only a nuisance in a text-based command line scripting
> environment, and only then because someone early on decided to use space
> rather than some other metacharacter as the only available delimiter
> between command arguments in scripts.

Okay, Steven Grimm. How do you think _I_ can tell that Steven is your name
from looking at your _full_ name "Steven Grimm"? Huh?

Exactly. I split at the space.

Ciao,
Dscho

* Re: Stupid quoting...
From: Steven Grimm @ 2007-06-14 1:19 UTC
To: Johannes Schindelin; +Cc: David Kastrup, git

Johannes Schindelin wrote:
> Okay, Steven Grimm. How do you think _I_ can tell that Steven is your name
> from looking at your _full_ name "Steven Grimm"? Huh?
>
> Exactly. I split at the space.
>

At the risk of drawing the conversation way off topic: What's Mary Ann
Summers' first name? (Hint: It's not "Mary.")

-Steve

* Re: Stupid quoting...
From: Johannes Schindelin @ 2007-06-14 1:34 UTC
To: Steven Grimm; +Cc: David Kastrup, git

Hi,

On Wed, 13 Jun 2007, Steven Grimm wrote:

> Johannes Schindelin wrote:
> > Okay, Steven Grimm. How do you think _I_ can tell that Steven is your name
> > from looking at your _full_ name "Steven Grimm"? Huh?
> >
> > Exactly. I split at the space.
>
> At the risk of drawing the conversation way off topic: What's Mary Ann
> Summers' first name? (Hint: It's not "Mary.")

The first first name _is_ Mary. Maybe it is not the name you shout when
calling her. But that is irrelevant. The names of Mary Ann Summers are
separated by spaces. Period.

Ciao,
Dscho

* Re: Stupid quoting...
From: Junio C Hamano @ 2007-06-14 8:49 UTC
To: Johannes Schindelin; +Cc: David Kastrup, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Wed, 13 Jun 2007, David Kastrup wrote:
>
>> what is the point in quoting file names and their characters in
>> git-diff's output? And what is the recommended way of undoing the
>> damage?
>
> The recommended way is not using spaces to begin with. I mean, does
> "David" contain spaces? People seem not to see the problem, and fail to
> blame Microsoft for all the damage they have done, introducing that
> stupid, stupid concept of filenames containing spaces, and _enforcing_ it.

Why are you talking about spaces ;-)?

There are a few things to note, but the first thing is that mere
spaces do not trigger quoting. A tab (HT) does, and so do non-ASCII
characters.

The second thing is that we do this quoting for various good
reasons, and it is not likely to change.

As Alex mentions, the safest way for programs to read the output is
to read the -z format. However, even if you are capable of doing so,
it may be inconvenient in some languages (mainstream languages like
C and Perl are not among them).

Not quoting SP is a conscious decision, as SP in filenames is rather
common, more common than non-ASCII and much more common than HT.

The "raw" formats that "ls-files -s", "ls-tree" and "diff --raw"
produce are designed to put names at the end, typically delimited
with a HT, so that "lazy" scripts can use cut (whose default
delimiter is a HT) to pick out pieces from their output. And plumbing
tools reading from the standard input (most notably, "update-index
--stdin") know how to unquote them.

In practice, not many people use non-ASCII in pathnames and expect
them to work sanely for everybody, so loosely written scripts, as
long as they cut at HT to pick out the pathname part, "mostly" work
(I think traditional core git scripts are safe, I suspect some
contributed ones shipped with git core may not be, Cogito used to be
very unsafe but it was audited and became much safer before it got
discontinued).

The pathname quoting rules in textual output were chosen primarily
to make diff output safer, as one of the most important workflows git
supports is e-mailable patches. GNU patch treats HT on "+++ name"/
"--- name" lines as the end of the name (and after HT comes the
timestamp), but the timestamp part is treated as optional, which
introduces ambiguities and confusion. The issue was discussed some
time ago (check the list archive for the discussion among me, Linus
and Paul Eggert -- the GNU diff and patch maintainer) and the quoting
rules we use now are consistent with what GNU diff and patch plan to
use. The update on the GNU side may have already happened, it may not
have.

When a patch appears in an e-mail, you would need to be aware
that not everybody has the luxury of living in a UTF-8 only world.
Your commit message and cover letter may be in one encoding, the
pathnames that appear in diff headers may be in your filesystem
encoding, and the patch text that appears as the diff payload may
be in another document-specific encoding. All three could be
different (worse, a patch that touches more than one file can
carry different encodings in the payload part), and mixing
character sets in a single piece of e-mail confuses people's MUAs
and tends to mangle messages. Quoting non-ASCII characters in
pathnames, even if they are perfectly valid and ordinary UTF-8
strings, is to eliminate one element in the above three as a
possible source of worries.
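
For names that have already been captured in the quoted textual form, the C-style escapes described above can be undone mechanically. The following is a sketch in Python, assuming the quoted form is a double-quoted string that uses only `\"`, `\\`, the single-letter escapes `\a \b \f \n \r \t \v`, and `\NNN` octal escapes of one to three digits. `unquote_path` is a hypothetical name, and this is not git's own unquoting code.

```python
import re

# Single-character escapes assumed to appear in quoted names.
_SIMPLE = {
    b"a": b"\a", b"b": b"\b", b"f": b"\f", b"n": b"\n",
    b"r": b"\r", b"t": b"\t", b"v": b"\v", b'"': b'"', b"\\": b"\\",
}
# A backslash followed by 1-3 octal digits, or by any single byte.
_ESCAPE = re.compile(rb"\\([0-7]{1,3}|.)", re.DOTALL)


def _decode(match):
    esc = match.group(1)
    if esc[:1] in b"01234567":
        return bytes([int(esc, 8) & 0xFF])  # \NNN becomes one raw byte
    if esc in _SIMPLE:
        return _SIMPLE[esc]
    raise ValueError(f"unknown escape: {esc!r}")


def unquote_path(name: bytes) -> bytes:
    """Return the raw bytes of a name; unquoted names pass through unchanged."""
    if len(name) < 2 or not (name.startswith(b'"') and name.endswith(b'"')):
        return name
    # Scanning left to right keeps an escaped backslash (\\) from being
    # mistaken for the start of a following octal escape.
    return _ESCAPE.sub(_decode, name[1:-1])


print(unquote_path(b'"ni\\303\\261o"').decode("utf-8"))
```

Decoding each escape in a single left-to-right pass is the design choice that matters here: a two-step substitution can misread a literal backslash followed by digits as an octal escape.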

* Re: Stupid quoting...
From: Jakub Narebski @ 2007-06-16 21:03 UTC
To: git

David Kastrup wrote:

> what is the point in quoting file names and their characters in
> git-diff's output?

7-bit email.

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Stupid quoting...
From: David Kastrup @ 2007-06-18 8:00 UTC
To: git; +Cc: Jakub Narebski

Jakub Narebski <jnareb@gmail.com> writes:

> David Kastrup wrote:
>
>> what is the point in quoting file names and their characters in
>> git-diff's output?
>
> 7-bit email.

I think it can be reasonably safely assumed that people using 8-bit
characters in file names will not refrain from using them in the files
themselves: file names usually are chosen descriptive of the contents,
and so rarely are in a different language.

So I don't see what quoting such characters in file names is supposed
to buy with regard to diff output in 7-bit email.

--
David Kastrup

* Re: Stupid quoting...
From: Jeff King @ 2007-06-18 16:19 UTC
To: David Kastrup; +Cc: git, Jakub Narebski

On Mon, Jun 18, 2007 at 10:00:31AM +0200, David Kastrup wrote:

> > 7-bit email.
>
> I think it can be reasonably safely assumed that people using 8-bit
> characters in file names will not refrain from using them in the files

Not to mention the commit messages. But more importantly, diffs aren't
necessarily going through mail. When I run 'git-show', this isn't useful
to me:

    diff --git "a/ni\303\261o" "b/ni\303\261o"

I can only imagine how git-show might look to somebody using all-utf8
filenames (such as Japanese).

-Peff

* Re: Stupid quoting...
From: Johannes Schindelin @ 2007-06-19 1:00 UTC
To: David Kastrup; +Cc: git, Jakub Narebski

Hi,

On Mon, 18 Jun 2007, David Kastrup wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
>
> > David Kastrup wrote:
> >
> >> what is the point in quoting file names and their characters in
> >> git-diff's output?
> >
> > 7-bit email.
>
> I think it can be reasonably safely assumed that people using 8-bit
> characters in file names will not refrain from using them in the files
> themselves: [...]

However, please realise that chances are very good that none of these
8-bit unclean things show in the diff.

Besides, the proper fix would probably involve making non-8-bit-clean
diffs binary diffs (for FORMAT_EMAIL only, of course).

> So I don't see what quoting such characters in file names is supposed to
> buy with regard to diff output in 7-bit email.

But isn't that obvious? Even if the diffs are not 7-bit clean, which I
consider an error, quoting the file names is already half of what is
required.

Don't just throw away backwards compatibility, only because it does not
fit your wishes.

Ciao,
Dscho

* Re: Stupid quoting...
From: David Kastrup @ 2007-06-19 7:44 UTC
To: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Mon, 18 Jun 2007, David Kastrup wrote:
>
>> Jakub Narebski <jnareb@gmail.com> writes:
>>
>> > David Kastrup wrote:
>> >
>> >> what is the point in quoting file names and their characters in
>> >> git-diff's output?
>> >
>> > 7-bit email.
>>
>> I think it can be reasonably safely assumed that people using 8-bit
>> characters in file names will not refrain from using them in the files
>> themselves: [...]
>
> However, please realise that chances are very good that none of these
> 8-bit unclean things show in the diff.

Puh-leaze. So you prefer a behavior which makes it harder to notice
problems, on the chance that it may sometimes work by accident?

If you want to process diffs, you need an 8-bit clean (and
space-preserving) channel, period. This is the task of mail
encapsulation, not of the diff utility.

> Besides, the proper fix would probably involve making
> non-8-bit-clean diffs binary diffs (for FORMAT_EMAIL only, of
> course).

This is so utterly absurd for people working on non-English documents
that I get the impression you are pulling people's legs, considering
your Email address.

>> So I don't see what quoting such characters in file names is
>> supposed to buy with regard to diff output in 7-bit email.
>
> But isn't that obvious? Even if the diffs are not 7-bit clean, which
> I consider as an error, quoting the file names is already half what
> is required.

What is required is a reliable mail channel, and there are a lot of
tools for that, from uuencode to various MIME standards and
encapsulation methods. The right tool for the right job. Everything
else is a mistake because it makes life harder for everyone, not just
those using mail, for no good purpose.

> Don't just throw away backwards compatibility, only because it does
> not fit your wishes.

There is no backwards compatibility involved here _at_ _all_. No
current tool can process the quoted mess, not even humans (random
octal escape sequences are not more readable than characters, or we
never would have progressed beyond ASCII). So you are not talking
about backward compatibility, but rather gratuitous forward
_incompatibility_, and nobody is better off by the latter.

There is no point in making life harder for people using non-ASCII
characters when there is absolutely no benefit whatsoever involved for
those restricting themselves to ASCII characters.

--
David Kastrup

* Re: Stupid quoting...
From: Johannes Schindelin @ 2007-06-19 9:50 UTC
To: David Kastrup; +Cc: git

Hi,

On Tue, 19 Jun 2007, David Kastrup wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Don't just throw away backwards compatibility, only because it does
> > not fit your wishes.
>
> There is no backwards compatibility involved here _at_ _all_.

I was not talking about Git here. The specification for SMTP is not going
to change just because you want it. There are still mail servers out there
which speak 7-bit, and the standard requires you to cope with them.

Ciao,
Dscho

* Re: Stupid quoting...
From: Olivier Galibert @ 2007-06-19 20:53 UTC
To: Johannes Schindelin; +Cc: David Kastrup, git

On Tue, Jun 19, 2007 at 10:50:31AM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Tue, 19 Jun 2007, David Kastrup wrote:
>
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >
> > > Don't just throw away backwards compatibility, only because it does
> > > not fit your wishes.
> >
> > There is no backwards compatibility involved here _at_ _all_.
>
> I was not talking about Git here. The specification for SMTP is not going
> to change just because you want it. There are still mail servers out there
> which speak 7-bit, and the standard requires you to cope with them.

There are standards for sending 8-bit content over 7-bit email, and
\xxx is in none of them. And 8-to-7 encoding for email is not git's job
in any case unless git speaks SMTP directly. 8-to-7 is the mail client's
responsibility.

  OG.
[parent not found: <86645kutow.fsf@lola.quinscape.zz>]

* Re: Stupid quoting...
From: Johannes Schindelin @ 2007-06-20 2:19 UTC
To: David Kastrup; +Cc: git

Hi,

[sorry for responding so late, your mail got stuck in the GWB-like spam
filter.]

On Tue, 19 Jun 2007, David Kastrup wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi,
> >
> > On Tue, 19 Jun 2007, David Kastrup wrote:
> >
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >>
> >> > Don't just throw away backwards compatibility, only because it does
> >> > not fit your wishes.
> >>
> >> There is no backwards compatibility involved here _at_ _all_.
> >
> > I was not talking about Git here. The specification for SMTP is not
> > going to change just because you want it. There are still mail
> > servers out there which speak 7-bit, and the standard requires you
> > to cope with them.
>
> Is there a reason you elide all the relevant material before replying?
> I repeat: this is the task of MIME, uuencode or a number of other
> mechanisms.

The problem there, of course, is that you still might want to reply to the
patch, even if the name was chosen as non-ASCII (which is a sin, if you
believe in UNIX). Usually, comments are not done on the filenames, so they
can be as escaped as they want in an email, as long as the commenter still
recognizes their names.

> git is not a mail transport system, and there are far too many other
> problems in unarmored mail (like spaces, wrapping and other stuff) that
> it would make any sense to mangle diffs and other material in a manner
> that makes it quite unprocessable for _both_ human readers as well as
> scripts intended to process them.

There you have a point. If the name is non-ASCII, it uses a specific
encoding. If the human reader has a different encoding set in her display,
is it any better to display garbled characters (possibly leaving the
console in a corrupted state), or to display escaped characters?

And scripts have been known to get encodings all wrong, so I think the
escaping is the best way out, absent a perfect knowledge of what encoding
the file name was meant for.

> Anyway, it has become quite clear from this exchange that you have
> already made the decision not to be convinced by me and will not be
> deterred from that, even though the problem is not the one you initially
> tried deriding me for (spaces in filenames).

I am sorry. No, really, I am sorry that you received it as derision. By
all means, it was _not_ meant as that. The problem was on my side, not
yours: I simply did not get that you were talking about non-ASCII
characters, even if you were talking about them.

> Hopefully some developer with less of an attitude towards non-ASCII
> usage will find himself able to follow the arguments with some more
> objectivity.
>
> I don't see our discourse leading anywhere: the points have been made.

I would really, really, really like to see a solution. Alas, I cannot
think of one, other than _forcing_ the developers to use ASCII-only
filenames.

Note that there is no convention yet in Git to state which encoding your
filenames are supposed to use. And in fact, we already had a fine example
in git.git why this is particularly difficult. MacOSX is too clever to be
true, in that it gladly takes filenames in one encoding, but reads those
filenames out in _another_ encoding. Thus, a "git add <filename>" can well
end up in git-status saying that a file was deleted, and another file
(actually the same, but in a different encoding) is untracked.

Again, I would be _so_ glad if you solved the problem, now that I actually
understand it.

Ciao,
Dscho
* Re: Stupid quoting...
2007-06-20 2:19 ` Johannes Schindelin
@ 2007-06-20 6:19 ` Junio C Hamano
2007-06-20 7:49 ` David Kastrup
2007-06-24 6:50 ` Jan Hudec
0 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2007-06-20 6:19 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: David Kastrup, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> I don't see our discourse leading anywhere: the points have been made.
>
> I would really, really, really like to see a solution. Alas, I cannot
> think of one, other than _forcing_ the developers to use ASCII-only
> filenames.
>
> Note that there is no convention yet in Git to state which encoding your
> filenames are supposed to use. And in fact, we already had a fine example
> in git.git why this is particularly difficult. MacOSX is too clever to be
> true, in that it gladly takes filenames in one encoding, but reads those
> filenames out in _another_ encoding. Thus, a "git add <filename>" can well
> end up in git-status saying that a file was deleted, and another file
> (actually the same, but in a different encoding) is untracked.

By the way, the pathname quoting done by "diff" does not even
attempt to tackle that. I already explained why in the thread
so I would not repeat myself.

Having said that, the absolute minimum that needs to be quoted
are double-quote (because it is used by quoting as agreed with
GNU diff/patch maintainer), backslash (used to introduce C-like
quoting), newline and horizontal tab (makes "patch" confused, as
it would make it ambiguous where the pathname ends), so I am not
opposed to a patch that introduces a new mode, probably on by
default _unless_ we are generating --format=email, that does not
quote high byte values. That would solve "My UTF-8 filenames
are unreadable on my terminal" problem.
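[Editorial sketch, not part of the original message.] The "absolute minimum" quoting described above can be illustrated roughly as follows. This is only a sketch under stated assumptions: `quote_path_minimal` is a hypothetical helper name, not a git function, and it covers just the four bytes named in the message (double quote, backslash, newline, horizontal tab) while passing high bytes through untouched.

```python
def quote_path_minimal(path: bytes) -> bytes:
    """Quote only double quote, backslash, newline and tab (hypothetical helper)."""
    # Assumed escape spellings, following the C-like style the message mentions.
    escapes = {0x22: b'\\"', 0x5C: b"\\\\", 0x0A: b"\\n", 0x09: b"\\t"}
    if not any(b in escapes for b in path):
        return path  # nothing ambiguous: emit the name as-is, high bytes included
    body = b"".join(escapes.get(b, bytes([b])) for b in path)
    return b'"' + body + b'"'


print(quote_path_minimal("fóö".encode("utf-8")))
print(quote_path_minimal(b"a\tb"))
```

Under this scheme a UTF-8 name stays readable on a UTF-8 terminal, while a name containing a tab is still unambiguous for a tool that has to find where the pathname ends.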
* Re: Stupid quoting...
2007-06-20 6:19 ` Junio C Hamano
@ 2007-06-20 7:49 ` David Kastrup
2007-06-20 8:40 ` Jakub Narebski
2007-06-24 6:50 ` Jan Hudec
1 sibling, 1 reply; 35+ messages in thread
From: David Kastrup @ 2007-06-20 7:49 UTC (permalink / raw)
To: git

Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>> I don't see our discourse leading anywhere: the points have been made.
>>
>> I would really, really, really like to see a solution. Alas, I
>> cannot think of one, other than _forcing_ the developers to use
>> ASCII-only filenames.

And ASCII-only files. Just eradicate that dreaded Bit 7 from the
world.

>> Note that there is no convention yet in Git to state which encoding
>> your filenames are supposed to use. And in fact, we already had a
>> fine example in git.git why this is particularly difficult. MacOSX
>> is too clever to be true, in that it gladly takes filenames in one
>> encoding, but reads those filenames out in _another_
>> encoding. Thus, a "git add <filename>" can well end up in
>> git-status saying that a file was deleted, and another file
>> (actually the same, but in a different encoding) is untracked.
>
> Having said that, the absolute minimum that needs to be quoted are
> double-quote (because it is used by quoting as agreed with GNU
> diff/patch maintainer), backslash (used to introduce C-like
> quoting),
> newline and horizontal tab (makes "patch" confused, as it would make
> it ambiguous where the pathname ends), so I am not opposed to a
> patch that introduces a new mode, probably on by default _unless_ we
> are generating --format=email, that does not quote high byte values.

I think it would be ok to quote non-graphic characters with octal
escape sequences. On ASCII-based systems, those are the characters
0x00 to 0x1f. They don't have a visual representation of their own,
anyway. _IF_ they appear in filenames, it is certainly a case
involved with excessive cleverness and/or garbage. I'd leave the rest
alone.

> That would solve "My UTF-8 filenames are unreadable on my terminal"
> problem.

But there is no point if the most primitive of mail readers does a
better job than listing the directory will. 7-Bit terminals are the
wrong thing to use for manipulating 8-bit-encoded files, period. And
the escape sequences for 8-bit terminals are quite certain to start
with characters in the 0x00 to 0x1f range.

--
David Kastrup
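[Editorial sketch, not part of the original message.] The proposal above, octal escapes for the bytes 0x00 to 0x1f and nothing else, might look like the following. `quote_control_bytes` is a hypothetical name; the additional escaping of double quote and backslash is an assumption carried over from the quoted minimum so that the escapes stay unambiguous, and the sketch does not add surrounding double quotes.

```python
def quote_control_bytes(path: bytes) -> bytes:
    """Octal-escape bytes 0x00-0x1f; leave everything else alone (sketch)."""
    out = bytearray()
    for b in path:
        if b in (0x22, 0x5C):
            # Assumption: '"' and '\' are still escaped so the result can be parsed.
            out += b"\\" + bytes([b])
        elif b < 0x20:
            # Three-digit octal escape, e.g. ESC (0x1b) becomes \033.
            out += ("\\%03o" % b).encode("ascii")
        else:
            out.append(b)  # printable ASCII and high bytes pass through
    return bytes(out)


print(quote_control_bytes(b"a\x1b[31mb"))
print(quote_control_bytes("fóö".encode("utf-8")))
```

Because every escape produced for a control byte starts with `\0`, terminal escape sequences embedded in a file name are neutralised while UTF-8 names remain readable.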
* Re: Stupid quoting...
2007-06-20 7:49 ` David Kastrup
@ 2007-06-20 8:40 ` Jakub Narebski
2007-06-20 8:59 ` David Kastrup
0 siblings, 1 reply; 35+ messages in thread
From: Jakub Narebski @ 2007-06-20 8:40 UTC (permalink / raw)
To: git

David Kastrup wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>>>> I don't see our discourse leading anywhere: the points have been made.
>>>
>>> I would really, really, really like to see a solution. Alas, I
>>> cannot think of one, other than _forcing_ the developers to use
>>> ASCII-only filenames.

>>> Note that there is no convention yet in Git to state which encoding
>>> your filenames are supposed to use. And in fact, we already had a
>>> fine example in git.git why this is particularly difficult. MacOSX
>>> is too clever to be true, in that it gladly takes filenames in one
>>> encoding, but reads those filenames out in _another_
>>> encoding. Thus, a "git add <filename>" can well end up in
>>> git-status saying that a file was deleted, and another file
>>> (actually the same, but in a different encoding) is untracked.
>>
>> Having said that, the absolute minimum that needs to be quoted are
>> double-quote (because it is used by quoting as agreed with GNU
>> diff/patch maintainer), backslash (used to introduce C-like
>> quoting),
>> newline and horizontal tab (makes "patch" confused, as it would make
>> it ambiguous where the pathname ends), so I am not opposed to a
>> patch that introduces a new mode, probably on by default _unless_ we
>> are generating --format=email, that does not quote high byte values.
>
> I think it would be ok to quote non-graphic characters with octal
> escape sequences. On ASCII-based systems, those are the characters
> 0x00 to 0x1f. They don't have a visual representation of their own,
> anyway. _IF_ they appear in filenames, it is certainly a case
> involved with excessive cleverness and/or garbage. I'd leave the rest
> alone.

By the way, ls(1) has its --quoting-style=WORD option, why shouldn't
git-diff and friends (including git-format-patch) have the same? And
we could change the default later on...

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Stupid quoting...
2007-06-20 8:40 ` Jakub Narebski
@ 2007-06-20 8:59 ` David Kastrup
0 siblings, 0 replies; 35+ messages in thread
From: David Kastrup @ 2007-06-20 8:59 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski

Jakub Narebski <jnareb@gmail.com> writes:

> By the way, ls(1) has its --quoting-style=WORD option, why shouldn't
> git-diff and friends (including git-format-patch) have the same? And
> we could change the default later on...

Because interpreting a diff means interpreting both file names as well
as contents. It does not make much sense to use different forms of
escaping (\01a and similar) here, though in the diff command line,
some additional quoting might be called for.

It is also worth noting that bash's echo -e can interpret octal
escapes only when they start with \0, and the quoted 3-character forms
of 0x00-0x1f incidentally do start in this manner. There is still
potential for misinterpretation if an escaped character is immediately
followed by a digit. Since octal ASCII digits are in the range 060 to
067, one can get around this problem by continuing to escape
characters until one hits a non-octal-digit.

So there is at least a reasonable builtin way for bash scripts to
translate the three-digit octal escapes for 0x00 to 0x1f uniquely into
the proper corresponding strings.

With regard to escaping: unless used unarmored in Email (a bad idea)
or on a terminal, it might be easiest (for post-processors) to
completely refrain from escaping (in effect ignoring the
non-printability of characters) and just apply a minimal level of
quoting on the file names.

--
David Kastrup
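[Editorial sketch, not part of the original message.] For readers who want to undo the quoting outside of bash, a script can decode the three-digit octal escapes directly instead of relying on `echo -e`. The sketch below assumes the quoted form is a double-quoted string using `\NNN` octal escapes plus the usual C single-letter escapes; `unquote_path` is a hypothetical helper, not something git ships.

```python
import re

# Single-letter C escapes and the byte values they stand for (assumed set).
_SIMPLE = {b"a": 7, b"b": 8, b"f": 12, b"n": 10, b"r": 13,
           b"t": 9, b"v": 11, b'"': 34, b"\\": 92}
# A backslash followed by either exactly three octal digits or one escape letter.
_ESCAPE = re.compile(rb'\\([0-3][0-7]{2}|[abfnrtv"\\])')


def unquote_path(quoted: bytes) -> bytes:
    """Turn a double-quoted, C-style escaped name back into raw bytes (sketch)."""
    if len(quoted) < 2 or not (quoted.startswith(b'"') and quoted.endswith(b'"')):
        return quoted  # not quoted at all: return unchanged

    def decode(match):
        token = match.group(1)
        if len(token) == 3:
            return bytes([int(token, 8)])  # octal escape, e.g. 303 -> 0xc3
        return bytes([_SIMPLE[token]])

    return _ESCAPE.sub(decode, quoted[1:-1])


print(unquote_path(b'"f\\303\\263\\303\\266"').decode("utf-8"))
```

Because the pattern always consumes exactly three octal digits, a digit that follows an escaped byte is left alone, which sidesteps the ambiguity described above for `echo -e`.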
* Re: Stupid quoting...
2007-06-20 6:19 ` Junio C Hamano
2007-06-20 7:49 ` David Kastrup
@ 2007-06-24 6:50 ` Jan Hudec
2007-06-24 11:14 ` Robin Rosenberg
1 sibling, 1 reply; 35+ messages in thread
From: Jan Hudec @ 2007-06-24 6:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, David Kastrup, git

On Tue, Jun 19, 2007 at 23:19:39 -0700, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> I don't see our discourse leading anywhere: the points have been made.
> >
> > I would really, really, really like to see a solution. Alas, I cannot
> > think of one, other than _forcing_ the developers to use ASCII-only
> > filenames.
> >
> > Note that there is no convention yet in Git to state which encoding your
> > filenames are supposed to use. And in fact, we already had a fine example
> > in git.git why this is particularly difficult. MacOSX is too clever to be
> > true, in that it gladly takes filenames in one encoding, but reads those
> > filenames out in _another_ encoding. Thus, a "git add <filename>" can well
> > end up in git-status saying that a file was deleted, and another file
> > (actually the same, but in a different encoding) is untracked.

I saw bazaar folks discussing this MacOSX issue. Basically in MacOSX
filenames are *unicode* strings (just as they are in Windows, btw). Unicode,
for compatibility reasons allows expressing many characters in multiple forms
-- composed and decomposed. For example 'á' can be expressed as '\u00e1'
('\xc3\xa1' in utf-8) or as 'a\u0301' ('a\xcc\x81' in utf-8). MacOSX opts to,
in accord with unicode standard, treat such representations as equal and it
does so by normalizing all filenames to one form. I don't know whether it
uses compatibility normalization and I believe it uses the decomposed form
(which makes the issue immediately obvious, because most programs work in
composed form).

> By the way, the pathname quoting done by "diff" does not even
> attempt to tackle that. I already explained why in the thread
> so I would not repeat myself.
>
> Having said that, the absolute minimum that needs to be quoted
> are double-quote (because it is used by quoting as agreed with
> GNU diff/patch maintainer), backslash (used to introduce C-like
> quoting), newline and horizontal tab (makes "patch" confused, as
> it would make it ambiguous where the pathname ends), so I am not
> opposed to a patch that introduces a new mode, probably on by
> default _unless_ we are generating --format=email, that does not
> quote high byte values. That would solve "My UTF-8 filenames
> are unreadable on my terminal" problem.

IMHO it should be the default even for email format. Most projects that use
non-ascii filenames probably have all members using same locale. And for
such group, it will just work. Also the file names, content and
commit messages will usually be in the same (though project-specific)
encoding, so if charset in content-type is set to that, people with different
locale able to represent the same characters will still see the names
correctly. For other people, the MUA will probably print some escape anyway
(it will not screw up the terminal -- it usually knows what it can safely
pass to it).

--
Jan 'Bulb' Hudec <bulb@ucw.cz>
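[Editorial sketch, not part of the original message.] The composed and decomposed forms of 'á' mentioned above can be reproduced with Python's standard `unicodedata` module. The sketch only demonstrates the two Unicode normalization forms; it makes no claim about which form any particular filesystem stores, and `same_name` is a hypothetical helper for illustration.

```python
import unicodedata

composed = "\u00e1"      # 'á' as a single code point
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

# The two strings render identically but differ code point by code point.
print(composed == decomposed)

nfc = unicodedata.normalize("NFC", decomposed)  # canonical composition
nfd = unicodedata.normalize("NFD", composed)    # canonical decomposition
print(nfc.encode("utf-8"), nfd.encode("utf-8"))


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(composed, decomposed))
```

A tool that compares names byte for byte sees two different files here, which matches the "deleted plus untracked" symptom described earlier in the thread; comparing after normalization treats them as the same name.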
* Re: Stupid quoting...
2007-06-24 6:50 ` Jan Hudec
@ 2007-06-24 11:14 ` Robin Rosenberg
2007-06-24 11:47 ` Junio C Hamano
2007-06-24 16:25 ` Jan Hudec
0 siblings, 2 replies; 35+ messages in thread
From: Robin Rosenberg @ 2007-06-24 11:14 UTC (permalink / raw)
To: Jan Hudec; +Cc: Junio C Hamano, Johannes Schindelin, David Kastrup, git

On Sunday, 24 June 2007, Jan Hudec wrote:
> IMHO it should be the default even for email format. Most projects that use
> non-ascii filenames probably have all members using same locale. And for
> such group, it will just work. Also the file names, content and
> commit messages will usually be in the same (though project-specific)
> encoding, so if charset in content-type is set to that, people with different
> locale able to represent the same characters will still see the names
> correctly. For other people, the MUA will probably print some escape anyway
> (it will not screw up the terminal -- it usually knows what it can safely
> pass to it).

I can't talk about "most" here, only local conditions, i.e. northern Europe
where both the legacy ISO encodings are very common with a steady increase in
UTF-8 usage, in the Linux community. People using OSS in windows almost
exclusively get the windows-1252 (for most practical purposes the same as
ISO-8859-1).

Even with a *very* small set of random people you will wind up with people
having different locales.

-- robin
* Re: Stupid quoting...
2007-06-24 11:14 ` Robin Rosenberg
@ 2007-06-24 11:47 ` Junio C Hamano
2007-06-24 11:58 ` David Kastrup
2007-06-24 16:25 ` Jan Hudec
1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2007-06-24 11:47 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Jan Hudec, Johannes Schindelin, David Kastrup, git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> I can't talk about "most" here, only local conditions, i.e. northern Europe
> where both the legacy ISO encodings are very common with a steady increase in
> UTF-8 usage, in the Linux community. People using OSS in windows almost
> exclusively get the windows-1252 (for most practical purposes the same as
> ISO-8859-1).
>
> Even with a *very* small set of random people you will wind up with people
> having different locales.

More problematic is the case where pathnames and contents are in
different encodings, even for the same language.

For example, my mbox files that store messages I receive from
people in Japan have contents in ISO-2022 as that is the
longstanding standard encoding used for e-mail over there, but
the pathname encoding used by the system I have that mbox file
on is EUC-JP.

If I were to create a patch between two versions of such a file,
the diff header would show the pathname encoded in one, and the
changed contents would be shown in another. As long as you
treat "git diff" output as binary blob, that would work just
fine, but when you have to transmit such a diff in e-mail as an
in-line patch, you would have troubles.
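[Editorial sketch, not part of the original message.] The mismatch described above can be made concrete with Python's built-in `euc_jp` and `iso2022_jp` codecs. The sample text and the `try_decode` helper are illustrative assumptions; the point is only that no single charset label decodes both the header and the body of such a diff.

```python
text = "日本語"  # illustrative sample; any Japanese text would do

as_euc = text.encode("euc_jp")       # e.g. a pathname on an EUC-JP system
as_jis = text.encode("iso2022_jp")   # e.g. mail contents in ISO-2022-JP

print(as_euc)
print(as_jis)


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Each byte string only round-trips under its own encoding.
print(try_decode(as_euc, "euc_jp") == text)
print(try_decode(as_jis, "iso2022_jp") == text)
print(try_decode(as_euc, "iso2022_jp") == text)
```

ISO-2022-JP stays within 7-bit bytes while EUC-JP uses high bytes, so a diff that mixes the two is fine as an opaque byte stream but cannot be labelled with one charset for inline mail.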
* Re: Stupid quoting...
2007-06-24 11:47 ` Junio C Hamano
@ 2007-06-24 11:58 ` David Kastrup
2007-06-24 12:19 ` Junio C Hamano
0 siblings, 1 reply; 35+ messages in thread
From: David Kastrup @ 2007-06-24 11:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Robin Rosenberg, Jan Hudec, Johannes Schindelin, git

Junio C Hamano <gitster@pobox.com> writes:

> If I were to create a patch between two versions of such a file, the
> diff header would show the pathname encoded in one, and the changed
> contents would be shown in another. As long as you treat "git
> diff" output as binary blob, that would work just fine, but when you
> have to transmit such a diff in e-mail as an in-line patch, you
> would have troubles.

ASCII-armoring of what amounts to binary files is the task of the mail
software. Also working with encodings. Escaping characters in the
diff headers but not in the file contents is not going to achieve
anything useful, anyway.

With the proper mailing software, you can get your diff across the
line in a manner where the other side can make use of it. This is not
the case for unarmored mail with ^ escapes in them, since the
receiving side can't distinguish them from "real" ^ characters.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Stupid quoting...
2007-06-24 11:58 ` David Kastrup
@ 2007-06-24 12:19 ` Junio C Hamano
2007-06-24 12:41 ` Jeff King
0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2007-06-24 12:19 UTC (permalink / raw)
To: David Kastrup; +Cc: Robin Rosenberg, Jan Hudec, Johannes Schindelin, git

David Kastrup <dak@gnu.org> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> If I were to create a patch between two versions of such a file, the
>> diff header would show the pathname encoded in one, and the changed
>> contents would be shown in another. As long as you treat "git
>> diff" output as binary blob, that would work just fine, but when you
>> have to transmit such a diff in e-mail as an in-line patch, you
>> would have troubles.
>
> ASCII-armoring of what amounts to binary files is the task of the mail
> software. Also working with encodings. Escaping characters in the
> diff headers but not in the file contents is not going to achieve
> anything useful, anyway.

You misunderstood me. The issue is not about transmitting
without corruption. Armoring would make it impossible to
COMMENT on the patch INLINE.

And that is where the pathname quoting git diff does originally
comes from.
* Re: Stupid quoting...
2007-06-24 12:19 ` Junio C Hamano
@ 2007-06-24 12:41 ` Jeff King
0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2007-06-24 12:41 UTC (permalink / raw)
To: Junio C Hamano
Cc: David Kastrup, Robin Rosenberg, Jan Hudec, Johannes Schindelin, git

On Sun, Jun 24, 2007 at 05:19:12AM -0700, Junio C Hamano wrote:

> > ASCII-armoring of what amounts to binary files is the task of the mail
> > software. Also working with encodings. Escaping characters in the
> > diff headers but not in the file contents is not going to achieve
> > anything useful, anyway.
>
> You misunderstood me. The issue is not about transmitting
> without corruption. Armoring would make it impossible to
> COMMENT on the patch INLINE.
>
> And that is where the pathname quoting git diff does originally
> comes from.

Then how about quoted-printable? The point is that you're _already_
screwed by the fact that there can be up to three different encodings in
a patch (commit message, pathnames, and file contents) but we only know
one of them (the commit message). With the other two, trying to convert
encodings is pointless, since we don't know the starting point. So we
can either output them as-is as binary, or use some sort of quoting
mechanism. The quoting that happens now is:

- sometimes unnecessary, and hurts people who are _not_ sending the
  diff through the mail
- not recognized by any widely-used un-quoter. I can't comment on your
  diff very well if it changes the file "\a/f\303\263\303\266", and
  there's no viewer that will let me read that in a sane way.

I think David's point is that by doing the quoting at the MIME level
(using 8bit, or 7bit with QP), the recipient's MUA can at least show the
binary characters. Sure, that will totally break if you are using a bad
mismatch of encodings, but there's nothing we can do to fix that, not
knowing what the encodings are. At least it _will_ work in the case that
your encodings are the same.

The only argument I see _for_ the current quoting is for parsing by
non-mail programs (like patch or git-apply); in that case, it would seem
necessary only to quote tab, newline, backslash, and double quote. But at
least those retain their human-readability.

-Peff
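[Editorial sketch, not part of the original message.] Quoted-printable, as suggested above, is an armoring that mail software already knows how to undo. Python's standard `quopri` module gives a rough picture of what a 7-bit channel would carry; the wrapper names are hypothetical and the file name is just the example from the message.

```python
import quopri


def to_quoted_printable(raw: bytes) -> bytes:
    """Armor arbitrary bytes for a 7-bit channel (thin wrapper, for illustration)."""
    return quopri.encodestring(raw)


def from_quoted_printable(armored: bytes) -> bytes:
    """Undo the armoring, as a receiving mail client would."""
    return quopri.decodestring(armored)


name = "fóö".encode("utf-8")
armored = to_quoted_printable(name)
print(armored)
print(from_quoted_printable(armored) == name)
```

The armored form is pure ASCII, and decoding restores the original bytes exactly, so whether they display correctly still depends on the recipient knowing the encoding, which is the limitation acknowledged above.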
* Re: Stupid quoting...
2007-06-24 11:14 ` Robin Rosenberg
2007-06-24 11:47 ` Junio C Hamano
@ 2007-06-24 16:25 ` Jan Hudec
2007-06-24 19:39 ` Robin Rosenberg
1 sibling, 1 reply; 35+ messages in thread
From: Jan Hudec @ 2007-06-24 16:25 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Junio C Hamano, Johannes Schindelin, David Kastrup, git

On Sun, Jun 24, 2007 at 13:14:45 +0200, Robin Rosenberg wrote:
> On Sunday, 24 June 2007, Jan Hudec wrote:
> > IMHO it should be the default even for email format. Most projects that use
> > non-ascii filenames probably have all members using same locale. And for
> > such group, it will just work. Also the file names, content and
> > commit messages will usually be in the same (though project-specific)
> > encoding, so if charset in content-type is set to that, people with different
> > locale able to represent the same characters will still see the names
> > correctly. For other people, the MUA will probably print some escape anyway
> > (it will not screw up the terminal -- it usually knows what it can safely
> > pass to it).
>
> I can't talk about "most" here, only local conditions, i.e. northern Europe
> where both the legacy ISO encodings are very common with a steady increase in
> UTF-8 usage, in the Linux community. People using OSS in windows almost
> exclusively get the windows-1252 (for most practical purposes the same as
> ISO-8859-1).
>
> Even with a *very* small set of random people you will wind up with people
> having different locales.

A small set of *random* people will likely have different locales. But
a project that would use non-ascii filenames would probably use some
particular language and thus be run by people that all speak that language --
which means they are not random at all and probably will use the same locale.

--
Jan 'Bulb' Hudec <bulb@ucw.cz>
* Re: Stupid quoting...
2007-06-24 16:25 ` Jan Hudec
@ 2007-06-24 19:39 ` Robin Rosenberg
2007-06-24 19:47 ` David Kastrup
0 siblings, 1 reply; 35+ messages in thread
From: Robin Rosenberg @ 2007-06-24 19:39 UTC (permalink / raw)
To: Jan Hudec; +Cc: Junio C Hamano, Johannes Schindelin, David Kastrup, git

On Sunday, 24 June 2007, Jan Hudec wrote:
> On Sun, Jun 24, 2007 at 13:14:45 +0200, Robin Rosenberg wrote:
> > I can't talk about "most" here, only local conditions, i.e. northern Europe
> > where both the legacy ISO encodings are very common with a steady increase in
> > UTF-8 usage, in the Linux community. People using OSS in windows almost
> > exclusively get the windows-1252 (for most practical purposes the same as
> > ISO-8859-1).
> >
> > Even with a *very* small set of random people you will wind up with people
> > having different locales.
>
> A small set of *random* people will likely have different locales. But
> a project that would use non-ascii filenames would probably use some
> particular language and thus be run by people that all speak that language --
> which means they are not random at all and probably will use the same locale.

I was still in reference to those "local conditions" at that point. It was
not meant as a universal statement. Substitute that for "A small bunch of
swedish speaking people from Stockholm".

-- robin
* Re: Stupid quoting...
2007-06-24 19:39 ` Robin Rosenberg
@ 2007-06-24 19:47 ` David Kastrup
2007-06-24 20:17 ` Robin Rosenberg
0 siblings, 1 reply; 35+ messages in thread
From: David Kastrup @ 2007-06-24 19:47 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Jan Hudec, Junio C Hamano, Johannes Schindelin, git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> I was still in reference to those "local conditions" at that
> point. It was not meant as a universal statement. Substitute that
> for "A small bunch of swedish speaking people from Stockholm".

A wrong encoding is a wrong encoding. Escaping the characters in
transition will not magically make the encodings adapt. Escaping
characters buys us exactly zilch _unless_ the _channel_ is not 8-bit
clean. In which case we should use a normal
mail-armoring/attachment/inline data wrapper.

In fact, when using editors with some heuristics regarding character
sets (like Emacs), leaving 8-bit characters intact gives the editor a
chance to guess the correct character set even if it is not the
default on the receiving end.

Escaping the characters, in contrast, just hides 8-bit usage away in
transition. An escaped character in the wrong encoding will get
reconstituted into the wrong encoding.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Stupid quoting...
2007-06-24 19:47 ` David Kastrup
@ 2007-06-24 20:17 ` Robin Rosenberg
2007-06-24 20:25 ` David Kastrup
0 siblings, 1 reply; 35+ messages in thread
From: Robin Rosenberg @ 2007-06-24 20:17 UTC (permalink / raw)
To: David Kastrup; +Cc: Jan Hudec, Junio C Hamano, Johannes Schindelin, git

On Sunday, 24 June 2007, David Kastrup wrote:
> Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:
>
> > I was still in reference to those "local conditions" at that
> > point. It was not meant as a universal statement. Substitute that
> > for "A small bunch of swedish speaking people from Stockholm".
>
> A wrong encoding is a wrong encoding. Escaping the characters in
> transition will not magically make the encodings adapt. Escaping
> characters buys us exactly zilch _unless_ the _channel_ is not 8-bit
> clean. In which case we should use a normal
> mail-armoring/attachment/inline data wrapper.
>
> In fact, when using editors with some heuristics regarding character
> sets (like Emacs), leaving 8-bit characters intact gives the editor a
> chance to guess the correct character set even if it is not the
> default on the receiving end.
>
> Escaping the characters, in contrast, just hides 8-bit usage away in
> transition. An escaped character in the wrong encoding will get
> reconstituted into the wrong encoding.
>

Please don't quote me when the content is not in reference to me or what
I've written.

-- robin
* Re: Stupid quoting...
2007-06-24 20:17 ` Robin Rosenberg
@ 2007-06-24 20:25 ` David Kastrup
0 siblings, 0 replies; 35+ messages in thread
From: David Kastrup @ 2007-06-24 20:25 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> On Sunday, 24 June 2007, David Kastrup wrote:
>> Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:
>>
>> > I was still in reference to those "local conditions" at that
>> > point. It was not meant as a universal statement. Substitute that
>> > for "A small bunch of swedish speaking people from Stockholm".
>>
>> A wrong encoding is a wrong encoding. Escaping the characters in
>> transition will not magically make the encodings adapt.
>
> Please don't quote me when the content is not in reference to me or what
> I've written.

No idea how that has happened: I actually intended to refer to
something different. Sorry.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
Thread overview: 35+ messages
2007-06-13 11:30 Stupid quoting David Kastrup
2007-06-13 12:06 ` Alex Riesen
2007-06-13 12:21 ` Johannes Schindelin
[not found] ` <86ejkgvxmb.fsf@lola.quinscape.zz>
2007-06-14 0:51 ` Johannes Schindelin
2007-06-14 6:12 ` David Kastrup
2007-06-14 7:06 ` Alex Riesen
[not found] ` <86hcpb6lr6.fsf@lola.quinscape.zz>
2007-06-14 8:51 ` Alex Riesen
2007-06-14 1:06 ` Steven Grimm
2007-06-14 1:12 ` Johannes Schindelin
2007-06-14 1:19 ` Steven Grimm
2007-06-14 1:34 ` Johannes Schindelin
2007-06-14 8:49 ` Junio C Hamano
2007-06-16 21:03 ` Jakub Narebski
2007-06-18 8:00 ` David Kastrup
2007-06-18 16:19 ` Jeff King
2007-06-19 1:00 ` Johannes Schindelin
2007-06-19 7:44 ` David Kastrup
2007-06-19 9:50 ` Johannes Schindelin
2007-06-19 20:53 ` Olivier Galibert
[not found] ` <86645kutow.fsf@lola.quinscape.zz>
2007-06-20 2:19 ` Johannes Schindelin
2007-06-20 6:19 ` Junio C Hamano
2007-06-20 7:49 ` David Kastrup
2007-06-20 8:40 ` Jakub Narebski
2007-06-20 8:59 ` David Kastrup
2007-06-24 6:50 ` Jan Hudec
2007-06-24 11:14 ` Robin Rosenberg
2007-06-24 11:47 ` Junio C Hamano
2007-06-24 11:58 ` David Kastrup
2007-06-24 12:19 ` Junio C Hamano
2007-06-24 12:41 ` Jeff King
2007-06-24 16:25 ` Jan Hudec
2007-06-24 19:39 ` Robin Rosenberg
2007-06-24 19:47 ` David Kastrup
2007-06-24 20:17 ` Robin Rosenberg
2007-06-24 20:25 ` David Kastrup