* Bug: rebase when an author uses accents in name on MacOSx
@ 2012-05-30 22:16 Lanny Ripple
2012-05-30 23:45 ` Junio C Hamano
0 siblings, 1 reply; 21+ messages in thread
From: Lanny Ripple @ 2012-05-30 22:16 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]
Hello,
I've come across behavior I'd say is a bug.
We have a developer with an accent in his name, Rémi Leblond. Very recently we stopped being able to rebase getting the error
lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292
First, rewinding head to replay your work on top of it...
/sw/lib/git-core/git-am: line 692: Leblond: command not found
Patch does not have a valid e-mail address.
lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper>
Versions 1.7.10.2 is (now?) exhibiting this behavior. We've been rebasing fine for several month which is why I wonder if being on MacOSX is involved? (I'm at 10.7.4) Some digging shows the root cause to be in function get_author_ident_from_commit at line 210 of git-core/git-sh-setup, namely the sed with environment overrides LANG=C LC_ALL=C. This causes git-am to incorrectly build the .git/rebase-apply/author-script.
lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper> cat .git/rebase-apply/author-script
GIT_AUTHOR_NAME='R'émi Leblond
GIT_AUTHOR_EMAIL='remi@spotinfluence.com'
GIT_AUTHOR_DATE='@1335301038 -0600'
From the command-line
lanny;~> echo $LANG
en_US.UTF-8
lanny;~> echo $LC_ALL
lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
GIT_AUTHOR_NAME='R'émi Leblond
lanny;~> echo "Rémi Leblond" | sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
GIT_AUTHOR_NAME='Rémi Leblond'
I can work around it easily enough by editing git-sh-setup to remove the locale overrides but thought a bug report might be useful.
Enjoy,
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-30 22:16 Bug: rebase when an author uses accents in name on MacOSx Lanny Ripple
@ 2012-05-30 23:45 ` Junio C Hamano
2012-05-30 23:57 ` Jürgen Kreileder
2012-05-31 1:19 ` Jeff King
0 siblings, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2012-05-30 23:45 UTC (permalink / raw)
To: Lanny Ripple; +Cc: git
Lanny Ripple <lanny@spotinfluence.com> writes:
> lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
> GIT_AUTHOR_NAME='R'émi Leblond
So in C locale where each byte is supposed to be a single character,
that implementation of "sed" refuses to match a byte with high-bit
set when given a pattern '.'?
That is a surprising breakage, I would have to say.
Can anybody with a more vanilla BSD try the above out and report
what happens? I am mostly interested to see if this was inherited
from BSD or something MacOS introduced.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-30 23:45 ` Junio C Hamano
@ 2012-05-30 23:57 ` Jürgen Kreileder
2012-05-31 1:19 ` Jeff King
1 sibling, 0 replies; 21+ messages in thread
From: Jürgen Kreileder @ 2012-05-30 23:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lanny Ripple, git
On Thu, May 31, 2012 at 1:45 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Lanny Ripple <lanny@spotinfluence.com> writes:
>
> > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
> > GIT_AUTHOR_NAME='R'émi Leblond
>
> So in C locale where each byte is supposed to be a single character,
> that implementation of "sed" refuses to match a byte with high-bit
> set when given a pattern '.'?
>
> That is a surprising breakage, I would have to say.
FWIW, the command above correctly returns GIT_AUTHOR_NAME='Rémi
Leblond' for me on OS X 10.8.
Jürgen
--
https://blackdown.de/
http://www.flickr.com/photos/jkreileder/
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-30 23:45 ` Junio C Hamano
2012-05-30 23:57 ` Jürgen Kreileder
@ 2012-05-31 1:19 ` Jeff King
2012-05-31 6:33 ` Junio C Hamano
2012-05-31 9:33 ` Thomas Rast
1 sibling, 2 replies; 21+ messages in thread
From: Jeff King @ 2012-05-31 1:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, Lanny Ripple, git
On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote:
> Lanny Ripple <lanny@spotinfluence.com> writes:
>
> > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
> > GIT_AUTHOR_NAME='R'émi Leblond
>
> So in C locale where each byte is supposed to be a single character,
> that implementation of "sed" refuses to match a byte with high-bit
> set when given a pattern '.'?
>
> That is a surprising breakage, I would have to say.
It should not be too surprising, since we discussed it a few months ago:
http://thread.gmane.org/gmane.comp.version-control.git/192218
Thomas provided a gross but workable solution here:
http://article.gmane.org/gmane.comp.version-control.git/192237
and we also talked about eventually having a shell-quoting mechanism for
pretty placeholders. Then the discussion rambled into "this sed is
horribly broken, and the user should get a better sed" territory. Maybe
we need to revisit that decision, since this is now two bug reports.
-Peff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 1:19 ` Jeff King
@ 2012-05-31 6:33 ` Junio C Hamano
2012-05-31 13:36 ` Lanny Ripple
2012-05-31 9:33 ` Thomas Rast
1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2012-05-31 6:33 UTC (permalink / raw)
To: Jeff King; +Cc: Thomas Rast, Lanny Ripple, git
Jeff King <peff@peff.net> writes:
> On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote:
> ...
>> So in C locale where each byte is supposed to be a single character,
>> that implementation of "sed" refuses to match a byte with high-bit
>> set when given a pattern '.'?
>>
>> That is a surprising breakage, I would have to say.
>
> It should not be too surprising, since we discussed it a few months ago:
>
> http://thread.gmane.org/gmane.comp.version-control.git/192218
Heh, no wonder I do not recall that one, as everything happened and
conclusions reached while I was sleeping ;-)
If it is not a bug in platform-sanctioned sed, but a buggy
third-party build, then I wouldn't worry about it for this cycle,
but pre-release freeze might be a good time to start sketching
Thomas's --format="%'%an <%ae>%'" approach, perhaps?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 1:19 ` Jeff King
2012-05-31 6:33 ` Junio C Hamano
@ 2012-05-31 9:33 ` Thomas Rast
1 sibling, 0 replies; 21+ messages in thread
From: Thomas Rast @ 2012-05-31 9:33 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Lanny Ripple, git, Will Palmer
Jeff King <peff@peff.net> writes:
> On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote:
>
>> Lanny Ripple <lanny@spotinfluence.com> writes:
>>
>> > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
>> > GIT_AUTHOR_NAME='R'émi Leblond
>>
>> So in C locale where each byte is supposed to be a single character,
>> that implementation of "sed" refuses to match a byte with high-bit
>> set when given a pattern '.'?
>>
>> That is a surprising breakage, I would have to say.
>
> It should not be too surprising, since we discussed it a few months ago:
>
> http://thread.gmane.org/gmane.comp.version-control.git/192218
>
> Thomas provided a gross but workable solution here:
>
> http://article.gmane.org/gmane.comp.version-control.git/192237
>
> and we also talked about eventually having a shell-quoting mechanism for
> pretty placeholders. Then the discussion rambled into "this sed is
> horribly broken, and the user should get a better sed" territory. Maybe
> we need to revisit that decision, since this is now two bug reports.
Three actually, also counting Will Palmer (shruggar) who brought this up
on #git-devel shortly before the thread above.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 6:33 ` Junio C Hamano
@ 2012-05-31 13:36 ` Lanny Ripple
2012-05-31 14:28 ` Thomas Rast
2012-05-31 17:34 ` Junio C Hamano
0 siblings, 2 replies; 21+ messages in thread
From: Lanny Ripple @ 2012-05-31 13:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git
[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]
Bingo.
lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
GIT_AUTHOR_NAME='Rémi Leblond'
Just occurred to me that I'm using fink and that git-am doesn't use /usr/bin/sed but just sed. My suggestion is to be explicit on the path in git-am.
So it now stands at two bug-reports and one pebkac.
Thanks!,
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
On May 31, 2012, at 1:33 AM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>> On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote:
>> ...
>>> So in C locale where each byte is supposed to be a single character,
>>> that implementation of "sed" refuses to match a byte with high-bit
>>> set when given a pattern '.'?
>>>
>>> That is a surprising breakage, I would have to say.
>>
>> It should not be too surprising, since we discussed it a few months ago:
>>
>> http://thread.gmane.org/gmane.comp.version-control.git/192218
>
> Heh, no wonder I do not recall that one, as everything happened and
> conclusions reached while I was sleeping ;-)
>
> If it is not a bug in platform-sanctioned sed, but a buggy
> third-party build, then I wouldn't worry about it for this cycle,
> but pre-release freeze might be a good time to start sketching
> Thomas's --format="%'%an <%ae>%'" approach, perhaps?
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 13:36 ` Lanny Ripple
@ 2012-05-31 14:28 ` Thomas Rast
2012-05-31 14:56 ` Lanny Ripple
2012-05-31 17:34 ` Junio C Hamano
1 sibling, 1 reply; 21+ messages in thread
From: Thomas Rast @ 2012-05-31 14:28 UTC (permalink / raw)
To: Lanny Ripple; +Cc: Junio C Hamano, Jeff King, git
Lanny Ripple <lanny@spotinfluence.com> writes:
> Bingo.
>
> lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
> GIT_AUTHOR_NAME='Rémi Leblond'
>
> Just occurred to me that I'm using fink and that git-am doesn't use
> /usr/bin/sed but just sed. My suggestion is to be explicit on the
> path in git-am.
Then how would you work around a platform sed that is broken? You can't
just install a new one in another directory if we use an absolute path.
Which apparently is what people have to do on Solaris, see the
SANE_TOOL_PATH setting for the Makefile.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 14:28 ` Thomas Rast
@ 2012-05-31 14:56 ` Lanny Ripple
0 siblings, 0 replies; 21+ messages in thread
From: Lanny Ripple @ 2012-05-31 14:56 UTC (permalink / raw)
To: Thomas Rast; +Cc: Junio C Hamano, Jeff King, git
[-- Attachment #1: Type: text/plain, Size: 1926 bytes --]
A platform sed that is broken would be worked around by recognizing the platform. If the bulk of platforms are sane then fixing the path for the platform is the easiest solution. If the bulk of platforms don't have a sane sed then not using sed is probably the solution. If you want to make the path variable then a platform config file built automatically figuring out default paths that could be changed by the host's admin comes to mind (much like, oh I don't know, building a list of shell variables named author-script to use during rebases).
You could modify the name, email, and timestamp so they didn't have to be parsed. Separate lines in the header for name, email, and date would work. A length encoding of the three fields at the beginning of the header line or in some other (new) location if backwards compatibility is a concern. There's lots of rather straight-forward technical solutions.
Now if what you are saying is the social cost of applying any of those solutions is too high that's a horse of a different color.
Anyway thanks to everyone for the clue that got me working again,
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
On May 31, 2012, at 9:28 AM, Thomas Rast wrote:
> Lanny Ripple <lanny@spotinfluence.com> writes:
>
>> Bingo.
>>
>> lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
>> GIT_AUTHOR_NAME='Rémi Leblond'
>>
>> Just occurred to me that I'm using fink and that git-am doesn't use
>> /usr/bin/sed but just sed. My suggestion is to be explicit on the
>> path in git-am.
>
> Then how would you work around a platform sed that is broken? You can't
> just install a new one in another directory if we use an absolute path.
> Which apparently is what people have to do on Solaris, see the
> SANE_TOOL_PATH setting for the Makefile.
>
> --
> Thomas Rast
> trast@{inf,student}.ethz.ch
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 13:36 ` Lanny Ripple
2012-05-31 14:28 ` Thomas Rast
@ 2012-05-31 17:34 ` Junio C Hamano
2012-05-31 17:49 ` Lanny Ripple
1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2012-05-31 17:34 UTC (permalink / raw)
To: Lanny Ripple; +Cc: Jeff King, Thomas Rast, git
Lanny Ripple <lanny@spotinfluence.com> writes:
> Bingo.
>
> lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
> GIT_AUTHOR_NAME='Rémi Leblond'
>
> Just occurred to me that I'm using fink and that git-am doesn't use /usr/bin/sed but just sed. My suggestion is to be explicit on the path in git-am.
>
> So it now stands at two bug-reports and one pebkac.
My impression from reading the older thread Peff mentioned is that
the other ones are also broken implementation of sed supplied by
third-party, so it probably is not two Xs and one Y. It is three
something.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 17:34 ` Junio C Hamano
@ 2012-05-31 17:49 ` Lanny Ripple
2012-05-31 18:33 ` Junio C Hamano
0 siblings, 1 reply; 21+ messages in thread
From: Lanny Ripple @ 2012-05-31 17:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git
[-- Attachment #1: Type: text/plain, Size: 1489 bytes --]
Perhaps the error message in git-am could be modified to indicate sed is a suspect?. E.g.,
lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292
First, rewinding head to replay your work on top of it...
/sw/lib/git-core/git-am: line 692: Leblond: command not found
Patch does not have a valid e-mail address. (Used /sw/bin/sed found on PATH).
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper>
Alternately in git-am a comment at the error emit point explaining that third-party seds are often the problem for this error and a suggestion to modify the PATH so the platform sed occurs first.
Regards,
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
On May 31, 2012, at 12:34 PM, Junio C Hamano wrote:
> Lanny Ripple <lanny@spotinfluence.com> writes:
>
>> Bingo.
>>
>> lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p'
>> GIT_AUTHOR_NAME='Rémi Leblond'
>>
>> Just occurred to me that I'm using fink and that git-am doesn't use /usr/bin/sed but just sed. My suggestion is to be explicit on the path in git-am.
>>
>> So it now stands at two bug-reports and one pebkac.
>
> My impression from reading the older thread Peff mentioned is that
> the other ones are also broken implementation of sed supplied by
> third-party, so it probably is not two Xs and one Y. It is three
> something.
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 17:49 ` Lanny Ripple
@ 2012-05-31 18:33 ` Junio C Hamano
2012-05-31 19:21 ` Lanny Ripple
0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2012-05-31 18:33 UTC (permalink / raw)
To: Lanny Ripple; +Cc: Jeff King, Thomas Rast, git
Lanny Ripple <lanny@spotinfluence.com> writes:
> Perhaps the error message in git-am could be modified to indicate
> sed is a suspect?. E.g.,
>
> lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292
> First, rewinding head to replay your work on top of it...
> /sw/lib/git-core/git-am: line 692: Leblond: command not found
> Patch does not have a valid e-mail address. (Used /sw/bin/sed found on PATH).
> ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
> lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper>
Hrm, that does not sound an attractive way going forward.
Do we have to suspect any and all uses of POSIX tools, just in case
somebody installs a broken implementation from random places? Is
sed the only thing that is possibly broken?
By the way, have you filed a bug report to whoever supplied your /sw/bin/sed?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 18:33 ` Junio C Hamano
@ 2012-05-31 19:21 ` Lanny Ripple
2012-06-01 9:30 ` Jeff King
0 siblings, 1 reply; 21+ messages in thread
From: Lanny Ripple @ 2012-05-31 19:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git
[-- Attachment #1: Type: text/plain, Size: 1476 bytes --]
You have three recent instances where people have bumped into this with sed. (And yes on reporting it to the packaging project.) It seems to me leaving a breadcrumb so that folks can figure out what's going on without having to bother the list would be a win for everyone. And yes, if any and all uses of POSIX tools started showing up with some frequency then I think the same breadcrumb win/win logic would apply.
Enjoy,
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
On May 31, 2012, at 1:33 PM, Junio C Hamano wrote:
> Lanny Ripple <lanny@spotinfluence.com> writes:
>
>> Perhaps the error message in git-am could be modified to indicate
>> sed is a suspect?. E.g.,
>>
>> lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292
>> First, rewinding head to replay your work on top of it...
>> /sw/lib/git-core/git-am: line 692: Leblond: command not found
>> Patch does not have a valid e-mail address. (Used /sw/bin/sed found on PATH).
>> ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
>> lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper>
>
> Hrm, that does not sound an attractive way going forward.
>
> Do we have to suspect any and all uses of POSIX tools, just in case
> somebody installs a broken implementation from random places? Is
> sed the only thing that is possibly broken?
>
> By the way, have you filed a bug report to whoever supplied your /sw/bin/sed?
>
>
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-05-31 19:21 ` Lanny Ripple
@ 2012-06-01 9:30 ` Jeff King
2012-06-01 13:56 ` Lanny Ripple
2012-06-01 16:19 ` Junio C Hamano
0 siblings, 2 replies; 21+ messages in thread
From: Jeff King @ 2012-06-01 9:30 UTC (permalink / raw)
To: Lanny Ripple; +Cc: Junio C Hamano, Thomas Rast, git
[Please don't top-post.]
On Thu, May 31, 2012 at 02:21:16PM -0500, Lanny Ripple wrote:
> >> Perhaps the error message in git-am could be modified to indicate
> >> sed is a suspect?. E.g.,
> [...]
> > Hrm, that does not sound an attractive way going forward.
> [...]
> You have three recent instances where people have bumped into this
> with sed. (And yes on reporting it to the packaging project.) It
> seems to me leaving a breadcrumb so that folks can figure out what's
> going on without having to bother the list would be a win for
> everyone.
But you have to keep in mind all of the people who will be led down the
wrong path by your breadcrumb when the failure is caused by a
_different_ problem. What is the probability that it is helpful versus
not helpful? If you are going to give advice that sed might be broken,
you should at least test to see if it is broken and report it.
But really, I'd rather just see the broken sed fixed. Where would the
breadcrumb lead people at this point, anyway? We don't actually have a
solution besides "uninstall this other, crappy sed". Has the sed bug
actually been fixed?
-Peff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-06-01 9:30 ` Jeff King
@ 2012-06-01 13:56 ` Lanny Ripple
2012-06-02 16:09 ` Jeff King
2012-06-01 16:19 ` Junio C Hamano
1 sibling, 1 reply; 21+ messages in thread
From: Lanny Ripple @ 2012-06-01 13:56 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Thomas Rast, git
[-- Attachment #1: Type: text/plain, Size: 2669 bytes --]
I did show that sed was broken and have provided a minimal, reproducible test.
I have reported it to the sed maintainers and they are working on it.
A message or comment in the code that seds not properly handling utf8 characters have been shown to be the cause of the problem and that git selects sed from the PATH would have been 100% effective in at least one case. I don't know the troubleshooting skills of the other two people that bumped into the problem so can't comment. Of the billions of people that have not (if it existed) looked at the breadcrumb and weren't led astray it's (would have) also been 100% effective. Can you in turn posit any reasonable way that get_author_ident_from_commit would improperly build author-script short of a bad sed? I guess you could pull out transient or systematic disk error.
You do, in fact, have several solutions. I won't reiterate since they are in the thread earlier. You also have in many cases the valid concern that the solutions would not be backwards compatible. And yes, this sed will get fixed but what then? The next person that gets a sed they don't expect earlier in their PATH will have to go through the same steps.
I do appreciate the assistance that led to the solution to my problem. Thanks for maintaining and making available such a great piece of software.
Regards,
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
On Jun 1, 2012, at 4:30 AM, Jeff King wrote:
> [Please don't top-post.]
>
> On Thu, May 31, 2012 at 02:21:16PM -0500, Lanny Ripple wrote:
>
>>>> Perhaps the error message in git-am could be modified to indicate
>>>> sed is a suspect?. E.g.,
>> [...]
>>> Hrm, that does not sound an attractive way going forward.
>> [...]
>> You have three recent instances where people have bumped into this
>> with sed. (And yes on reporting it to the packaging project.) It
>> seems to me leaving a breadcrumb so that folks can figure out what's
>> going on without having to bother the list would be a win for
>> everyone.
>
> But you have to keep in mind all of the people who will be led down the
> wrong path by your breadcrumb when the failure is caused by a
> _different_ problem. What is the probability that it is helpful versus
> not helpful? If you are going to give advice that sed might be broken,
> you should at least test to see if it is broken and report it.
>
> But really, I'd rather just see the broken sed fixed. Where would the
> breadcrumb lead people at this point, anyway? We don't actually have a
> solution besides "uninstall this other, crappy sed". Has the sed bug
> actually been fixed?
>
> -Peff
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-06-01 9:30 ` Jeff King
2012-06-01 13:56 ` Lanny Ripple
@ 2012-06-01 16:19 ` Junio C Hamano
2012-06-01 17:05 ` Lanny Ripple
2012-06-02 16:23 ` Jeff King
1 sibling, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2012-06-01 16:19 UTC (permalink / raw)
To: Jeff King; +Cc: Lanny Ripple, Thomas Rast, git
Jeff King <peff@peff.net> writes:
> [Please don't top-post.]
> ...
> But you have to keep in mind all of the people who will be led down the
> wrong path by your breadcrumb when the failure is caused by a
> _different_ problem. What is the probability that it is helpful versus
> not helpful? If you are going to give advice that sed might be broken,
> you should at least test to see if it is broken and report it.
Eek, do that at runtime in the error code path?
Add something like
suspected_sed_breakage () {
xxxxx=$(printf "\370\235\204\236\n" | LC_CTYPE=C sed 's/./x/g')
if test "x$xxxxx" != "xxxxx"
then
die "Your sed is broken; cannot run $1"
fi
}
to git-sh-setup, and do something like:
. "$dotest/author-script" || suspected_sed_breakage "$0"
in git-am?
The problem I see is that at that point where we have to suspect
something fundamental as sed broken on the platform, we cannot even
trust printf, test, or even the shell itself behaving sanely.
So I would say, although it is a fun thought-experiment, such a test
and breadcrumb is not really worth it.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-06-01 16:19 ` Junio C Hamano
@ 2012-06-01 17:05 ` Lanny Ripple
2012-06-01 17:57 ` Junio C Hamano
2012-06-02 16:23 ` Jeff King
1 sibling, 1 reply; 21+ messages in thread
From: Lanny Ripple @ 2012-06-01 17:05 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git
[-- Attachment #1: Type: text/plain, Size: 2072 bytes --]
I still think the best solution is figuring out if the platform sed is sane at build time and using a full path (via config setup if being able to change the sed used is a priority). Short of that something as simple as
(git-am:699+)
if test -z "$GIT_AUTHOR_EMAIL"
then
# Can occur when sed in PATH will not handle UTF8 under LC_ALL=C.
gettextln "Patch does not have a valid e-mail address."
stop_here $this
fi
would give folks trying to troubleshoot the problem a clue to what was going on. From the fink developers' list it seems Darwin and perhaps FreeBSD use US-ASCII for LC_ALL=C or POSIX which is why Gnu sed gets it wrong.
My problem is still fixed whatever is decided. Enjoy,
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
On Jun 1, 2012, at 11:19 AM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>> [Please don't top-post.]
>> ...
>> But you have to keep in mind all of the people who will be led down the
>> wrong path by your breadcrumb when the failure is caused by a
>> _different_ problem. What is the probability that it is helpful versus
>> not helpful? If you are going to give advice that sed might be broken,
>> you should at least test to see if it is broken and report it.
>
> Eek, do that at runtime in the error code path?
>
> Add something like
>
> suspected_sed_breakage () {
> xxxxx=$(printf "\370\235\204\236\n" | LC_CTYPE=C sed 's/./x/g')
> if test "x$xxxxx" != "xxxxx"
> then
> die "Your sed is broken; cannot run $1"
> fi
> }
>
> to git-sh-setup, and do something like:
>
> . "$dotest/author-script" || suspected_sed_breakage "$0"
>
> in git-am?
>
> The problem I see is that at that point where we have to suspect
> something fundamental as sed broken on the platform, we cannot even
> trust printf, test, or even the shell itself behaving sanely.
>
> So I would say, although it is a fun thought-experiment, such a test
> and breadcrumb is not really worth it.
>
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-06-01 17:05 ` Lanny Ripple
@ 2012-06-01 17:57 ` Junio C Hamano
0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2012-06-01 17:57 UTC (permalink / raw)
To: Lanny Ripple; +Cc: Jeff King, Thomas Rast, git
Lanny Ripple <lanny@spotinfluence.com> writes:
> I still think the best solution is figuring out if the platform sed is sane at build time and using a full path (via config setup if being able to change the sed used is a priority). Short of that something as simple as
>
> (git-am:699+)
>
> if test -z "$GIT_AUTHOR_EMAIL"
> then
> # Can occur when sed in PATH will not handle UTF8 under LC_ALL=C.
> gettextln "Patch does not have a valid e-mail address."
> stop_here $this
> fi
That is a real fix to the issue (can also happen to name, no?) and
is worth checking.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-06-01 13:56 ` Lanny Ripple
@ 2012-06-02 16:09 ` Jeff King
2012-06-02 16:37 ` Lanny Ripple
0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2012-06-02 16:09 UTC (permalink / raw)
To: Lanny Ripple; +Cc: Junio C Hamano, Thomas Rast, git
On Fri, Jun 01, 2012 at 08:56:01AM -0500, Lanny Ripple wrote:
> I did show that sed was broken and have provided a minimal, reproducible test.
>
> I have reported it to the sed maintainers and they are working on it.
Great. Do we know yet which versions are affected?
> A message or comment in the code that seds not properly handling utf8
> characters have been shown to be the cause of the problem and that git
> selects sed from the PATH would have been 100% effective in at least
> one case. I don't know the troubleshooting skills of the other two
> people that bumped into the problem so can't comment. Of the billions
> of people that have not (if it existed) looked at the breadcrumb and
> weren't led astray it's (would have) also been 100% effective. Can
> you in turn posit any reasonable way that get_author_ident_from_commit
> would improperly build author-script short of a bad sed? I guess you
> could pull out transient or systematic disk error.
I assume from bogus commit objects. But I admit I am just guessing, and
don't have data.
> You do, in fact, have several solutions. I won't reiterate since they
> are in the thread earlier. You also have in many cases the valid
> concern that the solutions would not be backwards compatible. And
> yes, this sed will get fixed but what then? The next person that gets
> a sed they don't expect earlier in their PATH will have to go through
> the same steps.
When I said:
> > But really, I'd rather just see the broken sed fixed. Where would the
> > breadcrumb lead people at this point, anyway? We don't actually have a
> > solution besides "uninstall this other, crappy sed". Has the sed bug
> > actually been fixed?
I meant that there is not a fix for the _user_ to perform at that point.
The point of a breadcrumb like that is that we are not going to put a
fix into git, so we want to at least give the user a clue that their
system has a problem. But what is their next step after being informed
that their system has a problem?
-Peff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-06-01 16:19 ` Junio C Hamano
2012-06-01 17:05 ` Lanny Ripple
@ 2012-06-02 16:23 ` Jeff King
1 sibling, 0 replies; 21+ messages in thread
From: Jeff King @ 2012-06-02 16:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lanny Ripple, Thomas Rast, git
On Fri, Jun 01, 2012 at 09:19:26AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> > [Please don't top-post.]
> > ...
> > But you have to keep in mind all of the people who will be led down the
> > wrong path by your breadcrumb when the failure is caused by a
> > _different_ problem. What is the probability that it is helpful versus
> > not helpful? If you are going to give advice that sed might be broken,
> > you should at least test to see if it is broken and report it.
>
> Eek, do that at runtime in the error code path?
Yes, which I think is utterly gross, but at least it is on the error
code path, so most people will never run it.
It's slightly less gross to do it at build-time (or even as part of
configure, I guess), but of course it is a run-time dependency, so there
is nothing to stop the broken sed from being installed after git (or
even a user with a different PATH than the builder triggering it).
One gross thing about doing it at run-time is that it only affects
_this_ code path, which happens to have an easy-to-diagnose outcome from
the bug. But how many other code paths are using sed on data which might
contain utf-8 characters? And are they failing silently or otherwise
simply corrupting the output?
One other thing to think about: this particular manifestation of the
bug is to cause our shell-quoting to fail. Are there are security
implications? That is, can I execute arbitrary code by having you run
get_author_ident_from_commit on a specially-crafted commit?
> The problem I see is that at that point where we have to suspect
> something fundamental as sed broken on the platform, we cannot even
> trust printf, test, or even the shell itself behaving sanely.
I don't think that's true. We have to deal with this kind of portability
nonsense all the time. We assume that the tools work until we get a
report that they don't, and then we fix it, work around it, or whatever.
Yes, the "your sed is broken" test would not work if "test" is also
totally broken. But we have not seen such a system in real life, or have
reason to suspect that it exists. Whereas we do know there is a
common[1] platform where the sed is broken in a specific way, but
nothing else is. Dealing with that helps solve a real problem for some
people.
-Peff
[1] I am just guessing about how common it is. I still haven't seen an
indication of how common this version of sed is, or even which
versions are affected.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx
2012-06-02 16:09 ` Jeff King
@ 2012-06-02 16:37 ` Lanny Ripple
0 siblings, 0 replies; 21+ messages in thread
From: Lanny Ripple @ 2012-06-02 16:37 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Thomas Rast, git
[-- Attachment #1: Type: text/plain, Size: 761 bytes --]
At a guess (I haven't tested) every Gnu sed from 4.2.1 down on Darwin and perhaps FreeBSD. The fink developer looking at the sed giving problems (Gnu sed 4.2.1) says that it gets its charset idea from nl_langinfo() which reports US-ASCII for LC_ALL=C or POSIX (on Darwin, derived from FreeBSD). Since US-ASCII is only 7-bit chars anything accented will break.
-ljr
---
Lanny Ripple
lanny@spotinfluence.com
On Jun 2, 2012, at 11:09 AM, Jeff King wrote:
> On Fri, Jun 01, 2012 at 08:56:01AM -0500, Lanny Ripple wrote:
>
>> I did show that sed was broken and have provided a minimal, reproducible test.
>>
>> I have reported it to the sed maintainers and they are working on it.
>
> Great. Do we know yet which versions are affected?
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2012-06-02 16:37 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-30 22:16 Bug: rebase when an author uses accents in name on MacOSx Lanny Ripple
2012-05-30 23:45 ` Junio C Hamano
2012-05-30 23:57 ` Jürgen Kreileder
2012-05-31 1:19 ` Jeff King
2012-05-31 6:33 ` Junio C Hamano
2012-05-31 13:36 ` Lanny Ripple
2012-05-31 14:28 ` Thomas Rast
2012-05-31 14:56 ` Lanny Ripple
2012-05-31 17:34 ` Junio C Hamano
2012-05-31 17:49 ` Lanny Ripple
2012-05-31 18:33 ` Junio C Hamano
2012-05-31 19:21 ` Lanny Ripple
2012-06-01 9:30 ` Jeff King
2012-06-01 13:56 ` Lanny Ripple
2012-06-02 16:09 ` Jeff King
2012-06-02 16:37 ` Lanny Ripple
2012-06-01 16:19 ` Junio C Hamano
2012-06-01 17:05 ` Lanny Ripple
2012-06-01 17:57 ` Junio C Hamano
2012-06-02 16:23 ` Jeff King
2012-05-31 9:33 ` Thomas Rast
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).