git.vger.kernel.org archive mirror
* Bug: pull --rebase with é in name
@ 2012-03-05  9:59 René Haber
  2012-03-05 10:26 ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: René Haber @ 2012-03-05  9:59 UTC (permalink / raw)
  To: git

Hello,

I'm having trouble with the following scenario:
My name contains an accented é. I have set
git config --global user.name "René Haber"
and have made several commits with that name in a project.
Now I wanted to pull with --rebase, which fails with:

git pull --rebase
remote: Counting objects: 9, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 5 (delta 4), reused 0 (delta 0)
Unpacking objects: 100% (5/5), done.
From ____.de:repositories/kapa
   173c610..18987db  master     -> origin/master
First, rewinding head to replay your work on top of it...
/sw/lib/git-core/git-am: line 675: Haber: command not found
Patch does not have a valid e-mail address.

The problem lies in .git/rebase-apply/author-script :

GIT_AUTHOR_NAME='Rene'́ Haber
GIT_AUTHOR_EMAIL='rene@habr.de'
GIT_AUTHOR_DATE='@1330931169 +0100'

where the accent ´ is on top of the apostrophe and an apostrophe is missing from the end of the GIT_AUTHOR_NAME line.
This leads to the "Haber: command not found".
As the author name is taken from the rebased commits, changing user.name in .gitconfig does not help.
The only workaround I found was to change my name to "Rene Haber" and first rewrite my local history, up to the point of the rebase, with that name.
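To make the failure mode concrete, here is a minimal sketch that evaluates a line of the same shape as the broken one next to a correctly quoted one. It assumes git-am loads author-script by evaluating it as shell code (the "command not found" from git-am suggests so); try_script is a hypothetical name.

```sh
#!/bin/sh
# Sketch: evaluate an author-script line as a shell would when the file
# is sourced. The broken line splits into two words:
#   GIT_AUTHOR_NAME='Ren'é   (an assignment prefix) and
#   Haber                    (taken as the command to run).
broken="GIT_AUTHOR_NAME='Ren'é Haber"
fixed="GIT_AUTHOR_NAME='René Haber'"

try_script() {
	# Run in a subshell so the caller's environment is untouched;
	# print the resulting name only if evaluation succeeded.
	( eval "$1" && printf '%s\n' "$GIT_AUTHOR_NAME" )
}

try_script "$fixed"                              # René Haber
try_script "$broken" || echo "exit status $?"    # exit status 127
```

The shell first reports that "Haber" was not found; 127 is the POSIX exit status for a command that cannot be found.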

Thanks for your help.
René Haber

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: pull --rebase with é in name
  2012-03-05  9:59 Bug: pull --rebase with é in name René Haber
@ 2012-03-05 10:26 ` Jeff King
  2012-03-05 10:37   ` Thomas Rast
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2012-03-05 10:26 UTC (permalink / raw)
  To: René Haber; +Cc: git

On Mon, Mar 05, 2012 at 10:59:16AM +0100, René Haber wrote:

> I'm having trouble with the following scenario:
> My name contains an é with accent. Having set
> git config --global user.name "René Haber"
> and several commits with that name in a project.

That should work in general, but...

> git pull --rebase
> [...]
> /sw/lib/git-core/git-am: line 675: Haber: command not found
> 
> The problem lies in .git/rebase-apply/author-script :
> 
> GIT_AUTHOR_NAME='Rene'́ Haber
> GIT_AUTHOR_EMAIL='rene@habr.de'
> GIT_AUTHOR_DATE='@1330931169 +0100'

That's definitely not right.

I can't seem to reproduce it here with a simple test (neither with
"René" in the author name, nor with an author name containing
single-quote). What version of git are you using? (It looks like a recent
one, as it has the magic @-date syntax.) Have you set
i18n.commitencoding, or are otherwise using an encoding besides utf8? Is
it possible to share the commits that trigger this bug?

-Peff

* Re: Bug: pull --rebase with é in name
  2012-03-05 10:26 ` Jeff King
@ 2012-03-05 10:37   ` Thomas Rast
  2012-03-05 11:42     ` René Haber
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Rast @ 2012-03-05 10:37 UTC (permalink / raw)
  To: Jeff King; +Cc: René Haber, git

Jeff King <peff@peff.net> writes:

> On Mon, Mar 05, 2012 at 10:59:16AM +0100, René Haber wrote:
>
>> I'm having trouble with the following scenario:
>> My name contains an é with accent. Having set
>> git config --global user.name "René Haber"
>> and several commits with that name in a project.
>
> That should work in general, but...
>
>> git pull --rebase
>> [...]
>> /sw/lib/git-core/git-am: line 675: Haber: command not found
>> 
>> The problem lies in .git/rebase-apply/author-script :
>> 
>> GIT_AUTHOR_NAME='Rene'́ Haber
>> GIT_AUTHOR_EMAIL='rene@habr.de'
>> GIT_AUTHOR_DATE='@1330931169 +0100'
>
> That's definitely not right.
>
> I can't seem to reproduce it here with a simple test (neither with
> "René" in the author name, nor with an author name containing
> single-quote). What version of git are you using (it looks like a recent
> one, as it has the magic @-date syntax). Have you set
> i18n.commitencoding, or are otherwise using an encoding besides utf8? Is
> it possible to share the commits that trigger this bug?

Also, can you post a hex dump of the config that defines user.name (try
'xxd ~/.gitconfig'), so we can see the encoding of René?

I find it pretty odd that Git manages to split the ´ from the e, so I'm
wondering if perhaps you are using UTF-8 in NFD or similar.
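The NFD question can be answered by looking for two byte patterns: precomposed é (NFC, U+00E9) is encoded as c3 a9, while NFD spells it as "e" followed by U+0301 COMBINING ACUTE ACCENT, i.e. 65 cc 81. A small sketch, with hypothetical function names, that classifies a string by which pattern it contains:

```sh
#!/bin/sh
# Sketch: tell precomposed (NFC) "é" (bytes c3 a9) from decomposed (NFD)
# "e" + U+0301 combining acute (bytes 65 cc 81).

hex_bytes() {
	# Dump stdin as " xx xx xx ": one space before each byte, newlines
	# turned into spaces and runs of spaces squeezed, so patterns below
	# always align on byte boundaries.
	od -An -tx1 | tr -s ' \n' '  '
}

classify_e_acute() {
	hex=$(printf '%s' "$1" | hex_bytes)
	case $hex in
	*" c3 a9"*)    echo NFC ;;
	*" 65 cc 81"*) echo NFD ;;
	*)             echo none ;;
	esac
}

classify_e_acute "$(printf 'Ren\303\251')"    # NFC
classify_e_acute "$(printf 'Rene\314\201')"   # NFD
```

Applied to the name line of an `xxd` dump, the c3 a9 sequence indicates the precomposed form.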

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: Bug: pull --rebase with é in name
  2012-03-05 10:37   ` Thomas Rast
@ 2012-03-05 11:42     ` René Haber
  2012-03-05 11:58       ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: René Haber @ 2012-03-05 11:42 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Jeff King, git

[-- Attachment #1: Type: text/plain, Size: 181 bytes --]

I'm running git 1.7.9.2 from the Fink Project on Mac OS X 10.6.
The gitconfig in hex is attached.
I'm not using i18n.commitencoding or a charset different from utf8.

Thanks.
René

[-- Attachment #2: gitconfig.xxd --]
[-- Type: application/octet-stream, Size: 3680 bytes --]

0000000: 5b75 7365 725d 0a09 656d 6169 6c20 3d20  [user]..email = 
0000010: 7265 6e65 4068 6162 722e 6465 0a09 6e61  rene@habr.de..na
0000020: 6d65 203d 2052 656e c3a9 2048 6162 6572  me = Ren.. Haber
0000030: 0a5b 636f 6c6f 725d 0a09 6469 6666 203d  .[color]..diff =
0000040: 2061 7574 6f0a 0973 7461 7475 7320 3d20   auto..status = 
0000050: 6175 746f 0a09 6272 616e 6368 203d 2061  auto..branch = a
0000060: 7574 6f0a 0969 6e74 6572 6163 7469 7665  uto..interactive
0000070: 203d 2061 7574 6f0a 0975 6920 3d20 7472   = auto..ui = tr
0000080: 7565 0a09 7061 6765 7220 3d20 7472 7565  ue..pager = true
0000090: 0a5b 636f 6c6f 7220 2262 7261 6e63 6822  .[color "branch"
00000a0: 5d0a 0963 7572 7265 6e74 203d 2079 656c  ]..current = yel
00000b0: 6c6f 7720 7265 7665 7273 650a 096c 6f63  low reverse..loc
00000c0: 616c 203d 2079 656c 6c6f 770a 0972 656d  al = yellow..rem
00000d0: 6f74 6520 3d20 6772 6565 6e0a 5b63 6f6c  ote = green.[col
00000e0: 6f72 2022 6469 6666 225d 0a09 6d65 7461  or "diff"]..meta
00000f0: 203d 2079 656c 6c6f 7720 626f 6c64 0a09   = yellow bold..
0000100: 6672 6167 203d 206d 6167 656e 7461 2062  frag = magenta b
0000110: 6f6c 640a 096f 6c64 203d 2072 6564 2062  old..old = red b
0000120: 6f6c 640a 096e 6577 203d 2067 7265 656e  old..new = green
0000130: 2062 6f6c 640a 0977 6869 7465 7370 6163   bold..whitespac
0000140: 6520 3d20 7265 6420 7265 7665 7273 650a  e = red reverse.
0000150: 5b63 6f6c 6f72 2022 7374 6174 7573 225d  [color "status"]
0000160: 0a09 6164 6465 6420 3d20 7965 6c6c 6f77  ..added = yellow
0000170: 0a09 6368 616e 6765 6420 3d20 6772 6565  ..changed = gree
0000180: 6e0a 0975 6e74 7261 636b 6564 203d 2063  n..untracked = c
0000190: 7961 6e0a 5b70 6163 6b5d 0a09 7468 7265  yan.[pack]..thre
00001a0: 6164 7320 3d20 300a 5b61 6c69 6173 5d0a  ads = 0.[alias].
00001b0: 0973 7420 3d20 7374 6174 7573 0a09 6369  .st = status..ci
00001c0: 203d 2063 6f6d 6d69 740a 0962 7220 3d20   = commit..br = 
00001d0: 6272 616e 6368 0a09 636f 203d 2063 6865  branch..co = che
00001e0: 636b 6f75 740a 0964 6620 3d20 6469 6666  ckout..df = diff
00001f0: 0a09 6c70 203d 206c 6f67 202d 700a 096c  ..lp = log -p..l
0000200: 6720 3d20 6c6f 6720 2d2d 6772 6170 6820  g = log --graph 
0000210: 2d2d 7072 6574 7479 3d66 6f72 6d61 743a  --pretty=format:
0000220: 2725 4372 6564 2568 2543 7265 7365 7420  '%Cred%h%Creset 
0000230: 2d25 4328 7965 6c6c 6f77 2925 6425 4372  -%C(yellow)%d%Cr
0000240: 6573 6574 2025 7320 2543 6772 6565 6e28  eset %s %Cgreen(
0000250: 2563 7229 2025 4328 626f 6c64 2062 6c75  %cr) %C(bold blu
0000260: 6529 3c25 616e 3e25 4372 6573 6574 2720  e)<%an>%Creset' 
0000270: 2d2d 6162 6272 6576 2d63 6f6d 6d69 7420  --abbrev-commit 
0000280: 2d2d 6461 7465 3d72 656c 6174 6976 650a  --date=relative.
0000290: 0964 6320 3d20 6469 6666 202d 2d63 6163  .dc = diff --cac
00002a0: 6865 6420 2d2d 6e6f 2d65 7874 2d64 6966  hed --no-ext-dif
00002b0: 660a 0977 7466 203d 2021 6769 742d 7774  f..wtf = !git-wt
00002c0: 660a 5b70 7573 685d 0a09 6465 6661 756c  f.[push]..defaul
00002d0: 7420 3d20 6d61 7463 6869 6e67 0a5b 636f  t = matching.[co
00002e0: 7265 5d0a 0977 6869 7465 7370 6163 653d  re]..whitespace=
00002f0: 6669 782c 7472 6169 6c69 6e67 2d73 7061  fix,trailing-spa
0000300: 6365 2c63 722d 6174 2d65 6f6c 0a5b 7265  ce,cr-at-eol.[re
0000310: 7265 7265 5d0a 0965 6e61 626c 6564 203d  rere]..enabled =
0000320: 2074 7275 650a 5b6d 6572 6765 5d0a 0973   true.[merge]..s
0000330: 7461 7420 3d20 7472 7565 0a5b 6469 6666  tat = true.[diff
0000340: 5d0a 096d 6e65 6d6f 6e69 6370 7265 6669  ]..mnemonicprefi
0000350: 7820 3d20 7472 7565 0a09 7265 6e61 6d65  x = true..rename
0000360: 7320 3d20 636f 7069 6573 0a              s = copies.

[-- Attachment #3: Type: text/plain, Size: 1499 bytes --]



Am 05.03.2012 um 11:37 schrieb Thomas Rast:

> Jeff King <peff@peff.net> writes:
> 
>> On Mon, Mar 05, 2012 at 10:59:16AM +0100, René Haber wrote:
>> 
>>> I'm having trouble with the following scenario:
>>> My name contains an é with accent. Having set
>>> git config --global user.name "René Haber"
>>> and several commits with that name in a project.
>> 
>> That should work in general, but...
>> 
>>> git pull --rebase
>>> [...]
>>> /sw/lib/git-core/git-am: line 675: Haber: command not found
>>> 
>>> The problem lies in .git/rebase-apply/author-script :
>>> 
>>> GIT_AUTHOR_NAME='Rene'́ Haber
>>> GIT_AUTHOR_EMAIL='rene@habr.de'
>>> GIT_AUTHOR_DATE='@1330931169 +0100'
>> 
>> That's definitely not right.
>> 
>> I can't seem to reproduce it here with a simple test (neither with
>> "René" in the author name, nor with an author name containing
>> single-quote). What version of git are you using (it looks like a recent
>> one, as it has the magic @-date syntax). Have you set
>> i18n.commitencoding, or are otherwise using an encoding besides utf8? Is
>> it possible to share the commits that trigger this bug?
> 
> Also, can you post a hex dump of the config that defines user.name (try
> 'xxd ~/.gitconfig'), so we can see the encoding of René?
> 
> I find it pretty odd that Git manages to split the ´ from the e, so I'm
> wondering if perhaps you are using UTF-8 in NFD or similar.
> 
> -- 
> Thomas Rast
> trast@{inf,student}.ethz.ch


* Re: Bug: pull --rebase with é in name
  2012-03-05 11:42     ` René Haber
@ 2012-03-05 11:58       ` Jeff King
  2012-03-05 12:36         ` Jakub Narebski
  2012-03-05 12:46         ` René Haber
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2012-03-05 11:58 UTC (permalink / raw)
  To: René Haber; +Cc: Thomas Rast, git

On Mon, Mar 05, 2012 at 12:42:14PM +0100, René Haber wrote:

> I'm running git 1.7.9.2 from Fink Project on MacOS X 10.6.
> The gitconfig in hex is attached.

Hmm, looks like pretty standard utf8:

  0000020: 6d65 203d 2052 656e c3a9 2048 6162 6572  me = Ren.. Haber

and the same thing I used in my tests. I tried repeating the test with
v1.7.9.2 on OS X (although my test box is 10.7), and couldn't replicate
it.

Can you show us the commit that causes the problem, as printed by "git
cat-file commit $commit | xxd"? I just want to double-check that there
are no odd bytes there.

Also, what happens if you do:

  sh -c '
    . /sw/lib/git-core/git-sh-setup
     get_author_ident_from_commit $commit
  '

(my theory is that this is the underlying problem in the rebase, and
should show the bug; by narrowing it down, it should make testing a lot
simpler).

-Peff

* Re: Bug: pull --rebase with é in name
  2012-03-05 11:58       ` Jeff King
@ 2012-03-05 12:36         ` Jakub Narebski
  2012-03-05 12:46         ` René Haber
  1 sibling, 0 replies; 17+ messages in thread
From: Jakub Narebski @ 2012-03-05 12:36 UTC (permalink / raw)
  To: Jeff King; +Cc: René Haber, Thomas Rast, git

Jeff King <peff@peff.net> writes:

> On Mon, Mar 05, 2012 at 12:42:14PM +0100, René Haber wrote:
> 
> > I'm running git 1.7.9.2 from Fink Project on MacOS X 10.6.
> > The gitconfig in hex is attached.
> 
> Hmm, looks like pretty standard utf8:
> 
>   0000020: 6d65 203d 2052 656e c3a9 2048 6162 6572  me = Ren.. Haber
> 
> and the same thing I used in my tests. I tried repeating the test with
> v1.7.9.2 on OS X (although my test box is 10.7), and couldn't replicate
> it.
> 
> Can you show us the commit that causes the problem, as printed by "git
> cat-file commit $commit | xxd"? I just want to double-check that there
> are no odd bytes there.
> 
> Also, what happens if you do:
> 
>   sh -c '
>     . /sw/lib/git-core/git-sh-setup
>      get_author_ident_from_commit $commit
>   '
> 
> (my theory is that this is the underlying problem in the rebase, and
> should show the bug; by narrowing it down, it should make testing a lot
> simpler).

Hmmm... one place where I have read about this strange "René" -> "Rene'"
conversion is when terminal (console) cannot display unicode, and tries
to show it using ASCII:

  http://stackoverflow.com/a/9430419/46058

But that should not matter when we are writing to a file, should it?

-- 
Jakub Narębski

* Re: Bug: pull --rebase with é in name
  2012-03-05 11:58       ` Jeff King
  2012-03-05 12:36         ` Jakub Narebski
@ 2012-03-05 12:46         ` René Haber
  2012-03-05 13:04           ` Thomas Rast
  1 sibling, 1 reply; 17+ messages in thread
From: René Haber @ 2012-03-05 12:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Rast, git

[-- Attachment #1: Type: text/plain, Size: 395 bytes --]

sh -c '                                   
   . /sw/lib/git-core/git-sh-setup
    get_author_ident_from_commit 16b94413cbce12531e8f946286851598449d3913
 '
GIT_AUTHOR_NAME='Ren'é Haber
GIT_AUTHOR_EMAIL='rene@habr.de'
GIT_AUTHOR_DATE='@1329212923 +0100'

Commit attached.

The thing is that this only happens with git pull --rebase.
Doing a git rebase -i HEAD~5 or so works.



[-- Attachment #2: 16b9441.commit --]
[-- Type: application/octet-stream, Size: 1068 bytes --]

0000000: 7472 6565 2032 3338 3339 6430 6161 6635  tree 23839d0aaf5
0000010: 6130 3536 3932 3366 3735 3839 6433 6335  a056923f7589d3c5
0000020: 3063 6661 6337 3830 3632 6661 350a 7061  0cfac78062fa5.pa
0000030: 7265 6e74 2030 6530 6264 3264 6236 3232  rent 0e0bd2db622
0000040: 3565 3433 6463 3565 3239 6139 6161 3034  5e43dc5e29a9aa04
0000050: 3732 3730 3466 3430 6237 3066 380a 6175  72704f40b70f8.au
0000060: 7468 6f72 2052 656e c3a9 2048 6162 6572  thor Ren.. Haber
0000070: 203c 7265 6e65 4068 6162 722e 6465 3e20   <rene@habr.de> 
0000080: 3133 3239 3231 3239 3233 202b 3031 3030  1329212923 +0100
0000090: 0a63 6f6d 6d69 7474 6572 2052 656e c3a9  .committer Ren..
00000a0: 2048 6162 6572 203c 7265 6e65 4068 6162   Haber <rene@hab
00000b0: 722e 6465 3e20 3133 3239 3231 3239 3233  r.de> 1329212923
00000c0: 202b 3031 3030 0a0a 486f 7065 6675 6c6c   +0100..Hopefull
00000d0: 7920 6669 7865 6420 6469 7370 6c61 7920  y fixed display 
00000e0: 6275 6720 696e 2076 6572 616e 7374 616c  bug in veranstal
00000f0: 7475 6e67 656e 2f65 6469 742e            tungen/edit.

[-- Attachment #3: Type: text/plain, Size: 994 bytes --]


Am 05.03.2012 um 12:58 schrieb Jeff King:

> On Mon, Mar 05, 2012 at 12:42:14PM +0100, René Haber wrote:
> 
>> I'm running git 1.7.9.2 from Fink Project on MacOS X 10.6.
>> The gitconfig in hex is attached.
> 
> Hmm, looks like pretty standard utf8:
> 
>  0000020: 6d65 203d 2052 656e c3a9 2048 6162 6572  me = Ren.. Haber
> 
> and the same thing I used in my tests. I tried repeating the test with
> v1.7.9.2 on OS X (although my test box is 10.7), and couldn't replicate
> it.
> 
> Can you show us the commit that causes the problem, as printed by "git
> cat-file commit $commit | xxd"? I just want to double-check that there
> are no odd bytes there.
> 
> Also, what happens if you do:
> 
>  sh -c '
>    . /sw/lib/git-core/git-sh-setup
>     get_author_ident_from_commit $commit
>  '
> 
> (my theory is that this is the underlying problem in the rebase, and
> should show the bug; by narrowing it down, it should make testing a lot
> simpler).
> 
> -Peff


* Re: Bug: pull --rebase with é in name
  2012-03-05 12:46         ` René Haber
@ 2012-03-05 13:04           ` Thomas Rast
  2012-03-05 13:19             ` René Haber
  2012-03-05 13:29             ` Jeff King
  0 siblings, 2 replies; 17+ messages in thread
From: Thomas Rast @ 2012-03-05 13:04 UTC (permalink / raw)
  To: René Haber; +Cc: Jeff King, git, Will Palmer

René Haber <rene@habr.de> writes:

> sh -c '                                   
>    . /sw/lib/git-core/git-sh-setup
>     get_author_ident_from_commit 16b94413cbce12531e8f946286851598449d3913
>  '
> GIT_AUTHOR_NAME='Ren'é Haber
> GIT_AUTHOR_EMAIL='rene@habr.de'
> GIT_AUTHOR_DATE='@1329212923 +0100'

I think this is the same issue that we recently discussed on #git-devel,
where some broken versions of sed will fail to match "any character"
with '.' even under LC_ALL=C.  Will "shruggar" Palmer (cc) had this
issue under OS X with a build of GNU sed that ignored LC_*.

You can verify that this is the problem by looking at

  printf "\370\235\204\236\n" | LC_CTYPE=C sed 's/./x/g' | xxd

It should say

  0000000: 7878 7878 0a                             xxxx.

That is, the garbage (if you try to read it as UTF-8) in the printf
string was matched and replaced byte-by-byte with 'x'.  However,
Will was getting the unreplaced results

  0000000: f89d 849e 0a                             .....

I'm not sure he has followed up on that problem; the only hope may be to
get a better 'sed'.
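The one-liner above can be wrapped into a reusable check. A minimal sketch follows; sed_is_byte_clean is a hypothetical name, and its optional argument lets you point it at a specific sed. It also suggests, as an inference rather than something established in this thread, how the quoting breaks: if '.' refuses to match the bytes of "é", a substitution along the lines of s/.*/GIT_AUTHOR_NAME='&'/ wraps only "Ren" in quotes and leaves "é Haber" outside them, which is the shape of the reported line.

```sh
#!/bin/sh
# Sketch: check whether a sed treats every byte as one character in the
# C locale. The input is deliberately invalid UTF-8 (bytes f8 9d 84 9e).
sed_is_byte_clean() {
	# $1: the sed to test (default "sed"); anything that takes a sed
	# script as its only argument and filters stdin will do.
	out=$(printf '\370\235\204\236\n' | LC_ALL=C "${1:-sed}" 's/./x/g')
	[ "$out" = xxxx ]
}

if sed_is_byte_clean; then
	echo "sed replaced every byte: OK"
else
	echo "sed skipped some bytes: broken in the C locale"
fi
```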

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: Bug: pull --rebase with é in name
  2012-03-05 13:04           ` Thomas Rast
@ 2012-03-05 13:19             ` René Haber
  2012-03-05 13:29             ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: René Haber @ 2012-03-05 13:19 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Jeff King, git, Will Palmer

Am 05.03.2012 um 14:04 schrieb Thomas Rast:

> René Haber <rene@habr.de> writes:
> 
>> sh -c '                                   
>>   . /sw/lib/git-core/git-sh-setup
>>    get_author_ident_from_commit 16b94413cbce12531e8f946286851598449d3913
>> '
>> GIT_AUTHOR_NAME='Ren'é Haber
>> GIT_AUTHOR_EMAIL='rene@habr.de'
>> GIT_AUTHOR_DATE='@1329212923 +0100'
> 
> I think this is the same issue that we recently discussed on #git-devel,
> where some broken versions of sed will fail to match "any character"
> with '.' even under LC_ALL=C.  Will "shruggar" Palmer (cc) had this
> issue under OS X with a build of GNU sed that ignored LC_*.
> 
> You can verify that this is the problem by looking at
> 
>  printf "\370\235\204\236\n" | LC_CTYPE=C sed 's/./x/g' | xxd
> 
> It should say
> 
>  0000000: 7878 7878 0a                             xxxx.
> 
> That is, the garbage (if you try to read it as UTF-8) in the printf
> string was matched and replaced byte-by-byte with 'x'.  However,
> Will was getting the unreplaced results
> 
>  0000000: f89d 849e 0a                             .....
> 
> I'm not sure he has followed up on that problem; the only hope may be to
> get a better 'sed'.
I can confirm this. I get .....
Using the sed from Apple results in xxxx.

Thanks.
René

> -- 
> Thomas Rast
> trast@{inf,student}.ethz.ch

* Re: Bug: pull --rebase with é in name
  2012-03-05 13:04           ` Thomas Rast
  2012-03-05 13:19             ` René Haber
@ 2012-03-05 13:29             ` Jeff King
  2012-03-05 13:40               ` Thomas Rast
  2012-03-05 17:23               ` Junio C Hamano
  1 sibling, 2 replies; 17+ messages in thread
From: Jeff King @ 2012-03-05 13:29 UTC (permalink / raw)
  To: Thomas Rast; +Cc: René Haber, git, Will Palmer

On Mon, Mar 05, 2012 at 02:04:37PM +0100, Thomas Rast wrote:

> René Haber <rene@habr.de> writes:
> 
> > sh -c '                                   
> >    . /sw/lib/git-core/git-sh-setup
> >     get_author_ident_from_commit 16b94413cbce12531e8f946286851598449d3913
> >  '
> > GIT_AUTHOR_NAME='Ren'é Haber
> > GIT_AUTHOR_EMAIL='rene@habr.de'
> > GIT_AUTHOR_DATE='@1329212923 +0100'
> [...]
> That is, the garbage (if you try to read it as UTF-8) in the printf
> string was matched and replaced byte-by-byte with 'x'.  However,
> Will was getting the unreplaced results
> 
>   0000000: f89d 849e 0a                             .....
> 
> I'm not sure he has followed up on that problem; the only hope may be to
> get a better 'sed'.

Long ago, 47c9739e replaced the shell quoting in git-am with "git
rev-parse --sq-quote" (instead of sed). Maybe we can do the same for
get_author_ident_from_commit (though it is a little trickier there, as
we are also parsing values directly out of --pretty=raw).
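The single-quoting rule in question can also be written in plain POSIX sh without sed: wrap the value in single quotes and turn each embedded ' into '\''. The sketch below uses a hypothetical sq_quote function; it is not git's implementation, and unlike git rev-parse --sq-quote it prints no leading space.

```sh
#!/bin/sh
# Sketch: POSIX-sh single-quoting without sed. Each ' in the input
# becomes '\'' and the result is wrapped in single quotes, so
#   eval "x=$(sq_quote "$value")"
# restores $value exactly.
# Note: uses the globals q, rest, head and out (POSIX sh has no "local").
sq_quote() {
	q="'"
	rest=$1
	out=
	while :; do
		case $rest in
		*"$q"*)
			head=${rest%%"$q"*}     # text before the first quote
			rest=${rest#*"$q"}      # text after the first quote
			out=$out$head$q\\$q$q   # append head, then '\''
			;;
		*)
			out=$out$rest
			break
			;;
		esac
	done
	printf '%s\n' "$q$out$q"
}

sq_quote "Thom'as Ràst"    # 'Thom'\''as Ràst'
```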

It would be nice if the --pretty format placeholders had a "shell-quote"
modifier, and we could just do:

  git show --format='GIT_AUTHOR_NAME=%(an:shell)'

or something similar. for-each-ref knows about shell-quoting, but we
can't use it here, because we are looking at arbitrary commits, not just
ones pointed to by refs.

-Peff

* Re: Bug: pull --rebase with é in name
  2012-03-05 13:29             ` Jeff King
@ 2012-03-05 13:40               ` Thomas Rast
  2012-03-05 13:50                 ` Jeff King
  2012-03-05 17:23               ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Rast @ 2012-03-05 13:40 UTC (permalink / raw)
  To: Jeff King; +Cc: René Haber, git, Will Palmer

Jeff King <peff@peff.net> writes:

> On Mon, Mar 05, 2012 at 02:04:37PM +0100, Thomas Rast wrote:
>
>> René Haber <rene@habr.de> writes:
>> 
>> > sh -c '                                   
>> >    . /sw/lib/git-core/git-sh-setup
>> >     get_author_ident_from_commit 16b94413cbce12531e8f946286851598449d3913
>> >  '
>> > GIT_AUTHOR_NAME='Ren'é Haber
>> > GIT_AUTHOR_EMAIL='rene@habr.de'
>> > GIT_AUTHOR_DATE='@1329212923 +0100'
>> [...]
>> That is, the garbage (if you try to read it as UTF-8) in the printf
>> string was matched and replaced byte-by-byte with 'x'.  However,
>> Will was getting the unreplaced results
>> 
>>   0000000: f89d 849e 0a                             .....
>> 
>> I'm not sure he has followed up on that problem; the only hope may be to
>> get a better 'sed'.
[...]
> It would be nice if the --pretty format placeholders had a "shell-quote"
> modifier, and we could just do:
>
>   git show --format='GIT_AUTHOR_NAME=%(an:shell)'
>
> or something similar. for-each-ref knows about shell-quoting, but we
> can't use it here, because we are looking at arbitrary commits, not just
> ones pointed to by refs.

Perhaps by using %an etc., line numbers and --sq-quote:

  $ git rev-list --no-walk --date=raw --format="%an%n%ae%n%ad" --encoding=UTF-8 HEAD |
    while read -r s; do git rev-parse --sq-quote "$s"; done |
    sed -n -e '2s/^ /GIT_AUTHOR_NAME=/p' -e '3s/^ /GIT_AUTHOR_EMAIL=/p' -e '4s/^ /GIT_AUTHOR_DATE=/p'
  GIT_AUTHOR_NAME='Thom'\''as Ràst'
  GIT_AUTHOR_EMAIL='trast@inf.ethz.ch'
  GIT_AUTHOR_DATE='1330935546 +0100'

This is for a commit where I deliberately mangled my author line to make
an interesting example, as in

  $ git cat-file commit HEAD | grep ^author
  author Thom'as Ràst <trast@inf.ethz.ch> 1330935546 +0100

I tried doing the quoting inside sed instead of the while...rev-parse
--sq-quote, but it made my head hurt.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: Bug: pull --rebase with é in name
  2012-03-05 13:40               ` Thomas Rast
@ 2012-03-05 13:50                 ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2012-03-05 13:50 UTC (permalink / raw)
  To: Thomas Rast; +Cc: René Haber, git, Will Palmer

On Mon, Mar 05, 2012 at 02:40:34PM +0100, Thomas Rast wrote:

> > It would be nice if the --pretty format placeholders had a "shell-quote"
> > modifier, and we could just do:
> >
> >   git show --format='GIT_AUTHOR_NAME=%(an:shell)'
> >
> > or something similar. for-each-ref knows about shell-quoting, but we
> > can't use it here, because we are looking at arbitrary commits, not just
> > ones pointed to by refs.
> 
> Perhaps by using %an etc., line numbers and --sq-quote:
> 
>   $ git rev-list --no-walk --date=raw --format="%an%n%ae%n%ad" --encoding=UTF-8 HEAD |
>     while read -r s; do git rev-parse --sq-quote "$s"; done |
>     sed -n -e '2s/^ /GIT_AUTHOR_NAME=/p' -e '3s/^ /GIT_AUTHOR_EMAIL=/p' -e '4s/^ /GIT_AUTHOR_DATE=/p'
>   GIT_AUTHOR_NAME='Thom'\''as Ràst'
>   GIT_AUTHOR_EMAIL='trast@inf.ethz.ch'
>   GIT_AUTHOR_DATE='1330935546 +0100'

Yeah, that works. It's a little harder to read than would be ideal, but
should produce the right results (I was initially hesitant to use "read"
because I was worried about newlines in the input. But of course, that's
a non-issue since author ident by definition cannot have newlines in
it).

I think this is a good direction regardless of the sed issue. We end up
parsing ident lines like this in a lot of different places, and I would
not be surprised if they do not all behave exactly the same. Eliminating
one such parser in favor of the standard one in pretty.c seems like a
good thing.

-Peff

PS If you are going to turn that into a real patch, note that your date
   field accidentally drops the "@" specifier that unambiguously marks
   the number as an epoch timestamp.
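Following the PS, here is a sketch of the last step: turning the three extracted fields into author-script lines with the "@" restored on the date. write_author_script and sq_quote are hypothetical names. The fields are passed in directly rather than read from git rev-list, so the block runs without a repository.

```sh
#!/bin/sh
# Sketch: build author-script lines from already-extracted fields
# (name, email, raw "<epoch> <tz>" date).

sq_quote() {
	# Wrap $1 in single quotes, turning each embedded ' into '\''.
	q="'"; rest=$1; out=
	while :; do
		case $rest in
		*"$q"*)
			head=${rest%%"$q"*}; rest=${rest#*"$q"}
			out=$out$head$q\\$q$q ;;
		*)
			out=$out$rest; break ;;
		esac
	done
	printf '%s\n' "$q$out$q"
}

write_author_script() {
	# $1 = name, $2 = email, $3 = raw date such as "1330935546 +0100"
	printf 'GIT_AUTHOR_NAME=%s\n'  "$(sq_quote "$1")"
	printf 'GIT_AUTHOR_EMAIL=%s\n' "$(sq_quote "$2")"
	# Prefix "@" so the number is unambiguously an epoch timestamp.
	printf 'GIT_AUTHOR_DATE=%s\n'  "$(sq_quote "@$3")"
}

write_author_script "Thom'as Ràst" trast@inf.ethz.ch '1330935546 +0100'
# GIT_AUTHOR_NAME='Thom'\''as Ràst'
# GIT_AUTHOR_EMAIL='trast@inf.ethz.ch'
# GIT_AUTHOR_DATE='@1330935546 +0100'
```

Because every value is fully quoted, sourcing or eval-ing the output restores the three fields exactly.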

* Re: Bug: pull --rebase with é in name
  2012-03-05 13:29             ` Jeff King
  2012-03-05 13:40               ` Thomas Rast
@ 2012-03-05 17:23               ` Junio C Hamano
  2012-03-06  8:23                 ` Jeff King
  2012-03-06  8:36                 ` Thomas Rast
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2012-03-05 17:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Rast, René Haber, git, Will Palmer

Jeff King <peff@peff.net> writes:

> It would be nice if the --pretty format placeholders had a "shell-quote"
> modifier, and we could just do:
>
>   git show --format='GIT_AUTHOR_NAME=%(an:shell)'
>
> or something similar. for-each-ref knows about shell-quoting, but we
> can't use it here, because we are looking at arbitrary commits, not just
> ones pointed to by refs.

You guys seem to have been having a lot of fun overnight. Perhaps I
should live on European time?

I think there were talks about cross pollinating and eventually
unifying the placeholder languages of pretty and for-each-ref, and
if we were to do so, I agree that --pretty definitely should learn
to do --sq. But I do not think we want to teach everything :shell;
following the style of %w(), something more generic that would apply
to any payload would be preferred, perhaps giving an end result like
this:

	git show -s --format='
		GIT_AUTHOR_NAME=%(sq-begin)%an%(sq-end)
                GIT_AUTHOR_EMAIL=%(sq-begin)%ae%(sq-end)
        '

which would be immediately `eval`-able.

Also I wonder if it is time for "git-am" to make more use of direct
knowledge of the $rebasing and the original commit. Perhaps by
teaching commit-tree to take the -c option from commit, we may not
even have to worry about this.

In any case, my reading of the conclusion you guys have already
reached in this thread is that the issue is not even a bug in Git,
but a broken build/installation of sed by a third party.  I am
inclined to suggest that any change to the get_author_ident_from_commit
helper be put on the back burner until we teach --sq to the --pretty machinery.

If the broken sed were the Apple one that came with the platform, my
conclusion might be different, but it seems to me that this is not
something we would urgently have to worry about and patch our code
up with an ugly band-aid workaround.

* Re: Bug: pull --rebase with é in name
  2012-03-05 17:23               ` Junio C Hamano
@ 2012-03-06  8:23                 ` Jeff King
  2012-03-06  8:36                 ` Thomas Rast
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2012-03-06  8:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Rast, René Haber, git, Will Palmer

On Mon, Mar 05, 2012 at 09:23:19AM -0800, Junio C Hamano wrote:

> I think there were talks about cross pollinating and eventually
> unifying the placeholder languages of pretty and for-each-ref, and
> if we were to do so, I agree that --pretty definitely should learn
> to do --sq. But I do not think we want to teach everything :shell;
> following the style of %w(), something more generic that would apply
> to any payload would be preferred, perhaps giving an end result like
> this:
> 
> 	git show -s --format='
> 		GIT_AUTHOR_NAME=%(sq-begin)%an%(sq-end)
>                 GIT_AUTHOR_EMAIL=%(sq-begin)%ae%(sq-end)
>         '
> 
> which would be immediately `eval`-able.

Yeah, that could work. I didn't want to teach everything :shell
individually. I was hoping eventually for a world where
"%(foo:one:two=bar)" was internally parsed into "the foo item, with
attribute one set, and attribute two set to bar". And then the "shell"
attribute would have a particular meaning for everything, whereas
in "%(authordate:format=short)", the "format" attribute would be
specific to that item.

I think that makes for a more readable syntax. However, your proposal
does allow quoting multiple entities at a time, like:

  IDENT=%(sq-begin)%an <%ae>%(sq-end)

which could be useful.
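The attribute syntax described above is hypothetical, and git has no such parser at this point. The shell sketch below (parse_placeholder is an invented name) only illustrates how "%(foo:one:two=bar)" would decompose into an item plus a list of attributes, some carrying values.

```sh
#!/bin/sh
# Sketch of the hypothetical "%(item:attr:attr=value)" syntax: split a
# placeholder into its item and its attributes.
parse_placeholder() {
	spec=${1#"%("}          # drop the leading "%("
	spec=${spec%")"}        # drop the trailing ")"
	printf 'item %s\n' "${spec%%:*}"
	case $spec in
	*:*) attrs=${spec#*:} ;;
	*)   return 0 ;;        # no attributes at all
	esac
	old_ifs=$IFS
	IFS=:
	set -f                  # no globbing while splitting on ":"
	for a in $attrs; do
		case $a in
		*=*) printf 'attr %s = %s\n' "${a%%=*}" "${a#*=}" ;;
		*)   printf 'attr %s\n' "$a" ;;
		esac
	done
	set +f                  # assumes globbing was enabled before
	IFS=$old_ifs
}

parse_placeholder '%(foo:one:two=bar)'
# item foo
# attr one
# attr two = bar
```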

Anyway, there is not much point in discussing hypothetical syntaxes. I
think we agree that some form of this feature would be an ideal way
forward in the long term, but specifics can wait until somebody shows up
with patches.

> In any case, my reading of the conclusion you guys have already
> reached in this thread is that the issue is not even a bug in Git,
> but is a broken build/installation of sed by a third-party.  I am
> inclined to suggest any change to get_author_ident_from_commit
> helper backburnered before we teach --sq to --pretty machinery.

I think that is true. It could be considered a bug in git if we were
relying on an unportable sed construct. But it works everywhere else,
and we already go to the effort to set LANG and LC_ALL, so I am inclined
to say that it is not a portability issue in git, but a crappy sed
implementation, and the right solution is to use a better one.

We could switch the use of sed to perl (even just using 5.005-ish
features, which are pretty portable), but so far, users of
git-sh-setup have not needed to rely on having perl at all.

So I'm fine with leaving it for now and telling people to fix their sed.

-Peff

* Re: Bug: pull --rebase with é in name
  2012-03-05 17:23               ` Junio C Hamano
  2012-03-06  8:23                 ` Jeff King
@ 2012-03-06  8:36                 ` Thomas Rast
  2012-03-06  9:02                   ` Jeff King
  2012-03-06 18:31                   ` Junio C Hamano
  1 sibling, 2 replies; 17+ messages in thread
From: Thomas Rast @ 2012-03-06  8:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, René Haber, git, Will Palmer

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> It would be nice if the --pretty format placeholders had a "shell-quote"
>> modifier, and we could just do:
>>
>>   git show --format='GIT_AUTHOR_NAME=%(an:shell)'
>>
>> or something similar. for-each-ref knows about shell-quoting, but we
>> can't use it here, because we are looking at arbitrary commits, not just
>> ones pointed to by refs.
>
> You guys seem to have been having a lot of fun overnight. Perhaps I
> should live on European time?

IIUC Peff just got up at an unreasonably early time to have fun with us
Europeans?

> I think there were talks about cross-pollinating and eventually
> unifying the placeholder languages of pretty and for-each-ref, and
> if we were to do so, I agree that --pretty definitely should learn
> to do --sq. But I do not think we want to teach everything :shell;
> following the style of %w(), something more generic that would apply
> to any payload would be preferred, perhaps giving an end result like
> this:
>
> 	git show -s --format='
> 		GIT_AUTHOR_NAME=%(sq-begin)%an%(sq-end)
> 		GIT_AUTHOR_EMAIL=%(sq-begin)%ae%(sq-end)
> 	'

How about something along the lines of %Q(%an) instead?  Though at least
implementation-wise, it should be possible to make %'%an%' work, too,
which would be rather cute.
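Whichever spelling wins (%(an:shell), %Q(%an) or %'%an%'), what such a
placeholder would emit is ordinary POSIX shell single-quoting: wrap the
value in '...' and turn each embedded ' into '\''. Below is a minimal
sketch in plain sh; the helper name sq_quote is hypothetical and this is
not git-sh-setup's actual code (git's own C quoting helpers may differ
in details). It needs neither sed nor perl, and because it only splits
on the ASCII byte ' (0x27), which never occurs inside a UTF-8 multibyte
sequence, it does not matter whether the é in a name is precomposed or
decomposed.

```sh
# Hypothetical helper (not git-sh-setup's code): print $1 as a single
# shell word that eval turns back into exactly the original string.
# Plain sh has no "local", so rest/out/head are ordinary variables.
sq_quote () {
	rest=$1 out=
	while :
	do
		case $rest in
		*\'*)
			# copy the text before the first ', then emit '\''
			head=${rest%%\'*}
			rest=${rest#*\'}
			out=$out$head\'\\\'\'
			;;
		*)
			# no quote left: copy the remainder and stop
			out=$out$rest
			break
			;;
		esac
	done
	printf "'%s'\n" "$out"
}

# Round trip through an author-script style line and eval.
name="Zoé O'Neil"   # illustrative name containing both é and '
line="GIT_AUTHOR_NAME=$(sq_quote "$name")"
printf '%s\n' "$line"   # prints: GIT_AUTHOR_NAME='Zoé O'\''Neil'
eval "$line"
[ "$GIT_AUTHOR_NAME" = "$name" ] && echo "round trip ok"
```

The closing quote is always emitted by printf itself, so a value can
never leave a line like GIT_AUTHOR_NAME='Rene'́ Haber unterminated, which
is what broke git-am in the original report.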

> In any case, my reading of the conclusion you guys have already
> reached in this thread is that the issue is not even a bug in Git,
> but a broken third-party build/installation of sed.  I am inclined
> to suggest putting any change to the get_author_ident_from_commit
> helper on the back burner until we teach --sq to the --pretty machinery.

Ok.

This is the second "victim" of this broken install of sed, however.  I
wonder where René and Will got it from?  Perhaps this is "the" common
way of getting GNU sed on OS X, and thus more widespread than we might
think.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: Bug: pull --rebase with é in name
  2012-03-06  8:36                 ` Thomas Rast
@ 2012-03-06  9:02                   ` Jeff King
  2012-03-06 18:31                   ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2012-03-06  9:02 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Junio C Hamano, René Haber, git, Will Palmer

On Tue, Mar 06, 2012 at 09:36:31AM +0100, Thomas Rast wrote:

> > You guys seem to have been having a lot of fun overnight. Perhaps I
> > should live on European time?
> 
> IIUC Peff just got up at an unreasonably early time to have fun with us
> Europeans?

Er...got up? Yeeeeeah, that's what happened. I would never stay up until
6am local time hacking on git. ;)

> This is the second "victim" of this broken install of sed, however.  I
> wonder where René and Will got it from?  Perhaps this is "the" common
> way of getting GNU sed on OS X, and thus more widespread than we might
> think.

That's worth looking into, but the answer may still be "this common sed
is broken, and we should tell the people who are packaging it to unbreak
it". I'm worried that there really isn't a workaround (we are already
trying LC_ALL and LANG; is there something else we can do short of not
using sed at all?).

-Peff


* Re: Bug: pull --rebase with é in name
  2012-03-06  8:36                 ` Thomas Rast
  2012-03-06  9:02                   ` Jeff King
@ 2012-03-06 18:31                   ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2012-03-06 18:31 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Jeff King, René Haber, git, Will Palmer

Thomas Rast <trast@inf.ethz.ch> writes:

>> 	git show -s --format='
>> 		GIT_AUTHOR_NAME=%(sq-begin)%an%(sq-end)
>> 		GIT_AUTHOR_EMAIL=%(sq-begin)%ae%(sq-end)
>> 	'
>
> How about something along the lines of %Q(%an) instead?  Though at least
> implementation-wise, it should be possible to make %'%an%' work, too,
> which would be rather cute.

It would also be less error prone from the end user's point of view if
your closing token were not ")" (as in %Q(%an)) but percent-something,
e.g. %<%an%>, %`%an%', or %'%an%'.  It would then be more obvious how to
quote a string that happens to contain the character used as the closing
token; for example, ID=%Q(%ae (%an%29) is a bit hard to read as a way to
get ID='gitster@pobox.com (J C H)', compared with:

	ID=%'%ae (%an)%'

As I do not expect these things to nest (do we want to be able to
formulate a string that can be eval'ed twice???), using the same
string for both opening and closing token is fine by me.
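Junio's rule (the same token opens and closes a span, and spans never
nest) can be applied with plain parameter expansion. The sketch below is
hypothetical: %'...%' is only a proposal in this thread, both function
names are made up, and it post-processes an already-expanded string,
whereas a real implementation would recognize the token in the format
string itself so that a literal %' inside someone's name could not be
mistaken for a delimiter.

```sh
# Hypothetical sketch: %'...%' is a proposed syntax, not an existing
# git placeholder, and both function names are made up.
tok="%'"

# Print $1 as one shell word: wrap in '...', turning each ' into '\''.
sq_quote () {
	rest=$1 out=
	while :
	do
		case $rest in
		*\'*)
			head=${rest%%\'*}
			rest=${rest#*\'}
			out=$out$head\'\\\'\'
			;;
		*)
			out=$out$rest
			break
			;;
		esac
	done
	printf "'%s'\n" "$out"
}

# Replace each %'...%' span of $1 with the shell-quoted span text.
# The same token opens and closes a span, so spans cannot nest; an
# unmatched trailing %' is copied through unchanged.
expand_sq_spans () {
	src=$1 res=
	while :
	do
		case $src in
		*"$tok"*"$tok"*)
			res=$res${src%%"$tok"*}   # text before the opening token
			src=${src#*"$tok"}        # drop through the opening token
			span=${src%%"$tok"*}      # span runs to the next token
			src=${src#*"$tok"}        # drop through the closing token
			res=$res$(sq_quote "$span")
			;;
		*)
			res=$res$src
			break
			;;
		esac
	done
	printf '%s\n' "$res"
}

# Junio's example, with %ae and %an already filled in.
ae=gitster@pobox.com an='J C H'
expand_sq_spans "ID=%'$ae ($an)%'"
# prints: ID='gitster@pobox.com (J C H)'
```

Because the closing token is found by scanning for the next %', the
parenthesis inside the span needs no escaping, which is the readability
point made above about %Q(...).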



Thread overview: 17+ messages
2012-03-05  9:59 Bug: pull --rebase with é in name René Haber
2012-03-05 10:26 ` Jeff King
2012-03-05 10:37   ` Thomas Rast
2012-03-05 11:42     ` René Haber
2012-03-05 11:58       ` Jeff King
2012-03-05 12:36         ` Jakub Narebski
2012-03-05 12:46         ` René Haber
2012-03-05 13:04           ` Thomas Rast
2012-03-05 13:19             ` René Haber
2012-03-05 13:29             ` Jeff King
2012-03-05 13:40               ` Thomas Rast
2012-03-05 13:50                 ` Jeff King
2012-03-05 17:23               ` Junio C Hamano
2012-03-06  8:23                 ` Jeff King
2012-03-06  8:36                 ` Thomas Rast
2012-03-06  9:02                   ` Jeff King
2012-03-06 18:31                   ` Junio C Hamano
