* Bug: rebase when an author uses accents in name on MacOSx @ 2012-05-30 22:16 Lanny Ripple 2012-05-30 23:45 ` Junio C Hamano 0 siblings, 1 reply; 21+ messages in thread From: Lanny Ripple @ 2012-05-30 22:16 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1664 bytes --] Hello, I've come across behavior I'd say is a bug. We have a developer with an accent in his name, Rémi Leblond. Very recently we stopped being able to rebase getting the error lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292 First, rewinding head to replay your work on top of it... /sw/lib/git-core/git-am: line 692: Leblond: command not found Patch does not have a valid e-mail address. lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper> Versions 1.7.10.2 is (now?) exhibiting this behavior. We've been rebasing fine for several month which is why I wonder if being on MacOSX is involved? (I'm at 10.7.4) Some digging shows the root cause to be in function get_author_ident_from_commit at line 210 of git-core/git-sh-setup, namely the sed with environment overrides LANG=C LC_ALL=C. This causes git-am to incorrectly build the .git/rebase-apply/author-script. lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper> cat .git/rebase-apply/author-script GIT_AUTHOR_NAME='R'émi Leblond GIT_AUTHOR_EMAIL='remi@spotinfluence.com' GIT_AUTHOR_DATE='@1335301038 -0600' From the command-line lanny;~> echo $LANG en_US.UTF-8 lanny;~> echo $LC_ALL lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' GIT_AUTHOR_NAME='R'émi Leblond lanny;~> echo "Rémi Leblond" | sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' GIT_AUTHOR_NAME='Rémi Leblond' I can work around it easily enough by editing git-sh-setup to remove the locale overrides but thought a bug report might be useful. Enjoy, -ljr --- Lanny Ripple lanny@spotinfluence.com [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-30 22:16 Bug: rebase when an author uses accents in name on MacOSx Lanny Ripple @ 2012-05-30 23:45 ` Junio C Hamano 2012-05-30 23:57 ` Jürgen Kreileder 2012-05-31 1:19 ` Jeff King 0 siblings, 2 replies; 21+ messages in thread From: Junio C Hamano @ 2012-05-30 23:45 UTC (permalink / raw) To: Lanny Ripple; +Cc: git Lanny Ripple <lanny@spotinfluence.com> writes: > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' > GIT_AUTHOR_NAME='R'émi Leblond So in C locale where each byte is supposed to be a single character, that implementation of "sed" refuses to match a byte with high-bit set when given a pattern '.'? That is a surprising breakage, I would have to say. Can anybody with a more vanilla BSD try the above out and report what happens? I am mostly interested to see if this was inherited from BSD or something MacOS introduced. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-30 23:45 ` Junio C Hamano @ 2012-05-30 23:57 ` Jürgen Kreileder 2012-05-31 1:19 ` Jeff King 1 sibling, 0 replies; 21+ messages in thread From: Jürgen Kreileder @ 2012-05-30 23:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: Lanny Ripple, git On Thu, May 31, 2012 at 1:45 AM, Junio C Hamano <gitster@pobox.com> wrote: > > Lanny Ripple <lanny@spotinfluence.com> writes: > > > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' > > GIT_AUTHOR_NAME='R'émi Leblond > > So in C locale where each byte is supposed to be a single character, > that implementation of "sed" refuses to match a byte with high-bit > set when given a pattern '.'? > > That is a surprising breakage, I would have to say. FWIW, the command above correctly returns GIT_AUTHOR_NAME='Rémi Leblond' for me on OS X 10.8. Jürgen -- https://blackdown.de/ http://www.flickr.com/photos/jkreileder/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-30 23:45 ` Junio C Hamano 2012-05-30 23:57 ` Jürgen Kreileder @ 2012-05-31 1:19 ` Jeff King 2012-05-31 6:33 ` Junio C Hamano 2012-05-31 9:33 ` Thomas Rast 1 sibling, 2 replies; 21+ messages in thread From: Jeff King @ 2012-05-31 1:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: Thomas Rast, Lanny Ripple, git On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote: > Lanny Ripple <lanny@spotinfluence.com> writes: > > > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' > > GIT_AUTHOR_NAME='R'émi Leblond > > So in C locale where each byte is supposed to be a single character, > that implementation of "sed" refuses to match a byte with high-bit > set when given a pattern '.'? > > That is a surprising breakage, I would have to say. It should not be too surprising, since we discussed it a few months ago: http://thread.gmane.org/gmane.comp.version-control.git/192218 Thomas provided a gross but workable solution here: http://article.gmane.org/gmane.comp.version-control.git/192237 and we also talked about eventually having a shell-quoting mechanism for pretty placeholders. Then the discussion rambled into "this sed is horribly broken, and the user should get a better sed" territory. Maybe we need to revisit that decision, since this is now two bug reports. -Peff ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 1:19 ` Jeff King @ 2012-05-31 6:33 ` Junio C Hamano 2012-05-31 13:36 ` Lanny Ripple 2012-05-31 9:33 ` Thomas Rast 1 sibling, 1 reply; 21+ messages in thread From: Junio C Hamano @ 2012-05-31 6:33 UTC (permalink / raw) To: Jeff King; +Cc: Thomas Rast, Lanny Ripple, git Jeff King <peff@peff.net> writes: > On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote: > ... >> So in C locale where each byte is supposed to be a single character, >> that implementation of "sed" refuses to match a byte with high-bit >> set when given a pattern '.'? >> >> That is a surprising breakage, I would have to say. > > It should not be too surprising, since we discussed it a few months ago: > > http://thread.gmane.org/gmane.comp.version-control.git/192218 Heh, no wonder I do not recall that one, as everything happened and conclusions reached while I was sleeping ;-) If it is not a bug in platform-sanctioned sed, but a buggy third-party build, then I wouldn't worry about it for this cycle, but pre-release freeze might be a good time to start sketching Thomas's --format="%'%an <%ae>%'" approach, perhaps? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 6:33 ` Junio C Hamano @ 2012-05-31 13:36 ` Lanny Ripple 2012-05-31 14:28 ` Thomas Rast 2012-05-31 17:34 ` Junio C Hamano 0 siblings, 2 replies; 21+ messages in thread From: Lanny Ripple @ 2012-05-31 13:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git [-- Attachment #1: Type: text/plain, Size: 1376 bytes --] Bingo. lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' GIT_AUTHOR_NAME='Rémi Leblond' Just occurred to me that I'm using fink and that git-am doesn't use /usr/bin/sed but just sed. My suggestion is to be explicit on the path in git-am. So it now stands at two bug-reports and one pebkac. Thanks!, -ljr --- Lanny Ripple lanny@spotinfluence.com On May 31, 2012, at 1:33 AM, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > >> On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote: >> ... >>> So in C locale where each byte is supposed to be a single character, >>> that implementation of "sed" refuses to match a byte with high-bit >>> set when given a pattern '.'? >>> >>> That is a surprising breakage, I would have to say. >> >> It should not be too surprising, since we discussed it a few months ago: >> >> http://thread.gmane.org/gmane.comp.version-control.git/192218 > > Heh, no wonder I do not recall that one, as everything happened and > conclusions reached while I was sleeping ;-) > > If it is not a bug in platform-sanctioned sed, but a buggy > third-party build, then I wouldn't worry about it for this cycle, > but pre-release freeze might be a good time to start sketching > Thomas's --format="%'%an <%ae>%'" approach, perhaps? [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 13:36 ` Lanny Ripple @ 2012-05-31 14:28 ` Thomas Rast 2012-05-31 14:56 ` Lanny Ripple 2012-05-31 17:34 ` Junio C Hamano 1 sibling, 1 reply; 21+ messages in thread From: Thomas Rast @ 2012-05-31 14:28 UTC (permalink / raw) To: Lanny Ripple; +Cc: Junio C Hamano, Jeff King, git Lanny Ripple <lanny@spotinfluence.com> writes: > Bingo. > > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' > GIT_AUTHOR_NAME='Rémi Leblond' > > Just occurred to me that I'm using fink and that git-am doesn't use > /usr/bin/sed but just sed. My suggestion is to be explicit on the > path in git-am. Then how would you work around a platform sed that is broken? You can't just install a new one in another directory if we use an absolute path. Which apparently is what people have to do on Solaris, see the SANE_TOOL_PATH setting for the Makefile. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 14:28 ` Thomas Rast @ 2012-05-31 14:56 ` Lanny Ripple 0 siblings, 0 replies; 21+ messages in thread From: Lanny Ripple @ 2012-05-31 14:56 UTC (permalink / raw) To: Thomas Rast; +Cc: Junio C Hamano, Jeff King, git [-- Attachment #1: Type: text/plain, Size: 1926 bytes --] A platform sed that is broken would be worked around by recognizing the platform. If the bulk of platforms are sane then fixing the path for the platform is the easiest solution. If the bulk of platforms don't have a sane sed then not using sed is probably the solution. If you want to make the path variable then a platform config file built automatically figuring out default paths that could be changed by the host's admin comes to mind (much like, oh I don't know, building a list of shell variables named author-script to use during rebases). You could modify the name, email, and timestamp so they didn't have to be parsed. Separate lines in the header for name, email, and date would work. A length encoding of the three fields at the beginning of the header line or in some other (new) location if backwards compatibility is a concern. There's lots of rather straight-forward technical solutions. Now if what you are saying is the social cost of applying any of those solutions is too high that's a horse of a different color. Anyway thanks to everyone for the clue that got me working again, -ljr --- Lanny Ripple lanny@spotinfluence.com On May 31, 2012, at 9:28 AM, Thomas Rast wrote: > Lanny Ripple <lanny@spotinfluence.com> writes: > >> Bingo. >> >> lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' >> GIT_AUTHOR_NAME='Rémi Leblond' >> >> Just occurred to me that I'm using fink and that git-am doesn't use >> /usr/bin/sed but just sed. My suggestion is to be explicit on the >> path in git-am. > > Then how would you work around a platform sed that is broken? You can't > just install a new one in another directory if we use an absolute path. > Which apparently is what people have to do on Solaris, see the > SANE_TOOL_PATH setting for the Makefile. > > -- > Thomas Rast > trast@{inf,student}.ethz.ch [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 13:36 ` Lanny Ripple 2012-05-31 14:28 ` Thomas Rast @ 2012-05-31 17:34 ` Junio C Hamano 2012-05-31 17:49 ` Lanny Ripple 1 sibling, 1 reply; 21+ messages in thread From: Junio C Hamano @ 2012-05-31 17:34 UTC (permalink / raw) To: Lanny Ripple; +Cc: Jeff King, Thomas Rast, git Lanny Ripple <lanny@spotinfluence.com> writes: > Bingo. > > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' > GIT_AUTHOR_NAME='Rémi Leblond' > > Just occurred to me that I'm using fink and that git-am doesn't use /usr/bin/sed but just sed. My suggestion is to be explicit on the path in git-am. > > So it now stands at two bug-reports and one pebkac. My impression from reading the older thread Peff mentioned is that the other ones are also broken implementation of sed supplied by third-party, so it probably is not two Xs and one Y. It is three something. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 17:34 ` Junio C Hamano @ 2012-05-31 17:49 ` Lanny Ripple 2012-05-31 18:33 ` Junio C Hamano 0 siblings, 1 reply; 21+ messages in thread From: Lanny Ripple @ 2012-05-31 17:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git [-- Attachment #1: Type: text/plain, Size: 1489 bytes --] Perhaps the error message in git-am could be modified to indicate sed is a suspect?. E.g., lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292 First, rewinding head to replay your work on top of it... /sw/lib/git-core/git-am: line 692: Leblond: command not found Patch does not have a valid e-mail address. (Used /sw/bin/sed found on PATH). ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper> Alternately in git-am a comment at the error emit point explaining that third-party seds are often the problem for this error and a suggestion to modify the PATH so the platform sed occurs first. Regards, -ljr --- Lanny Ripple lanny@spotinfluence.com On May 31, 2012, at 12:34 PM, Junio C Hamano wrote: > Lanny Ripple <lanny@spotinfluence.com> writes: > >> Bingo. >> >> lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C /usr/bin/sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' >> GIT_AUTHOR_NAME='Rémi Leblond' >> >> Just occurred to me that I'm using fink and that git-am doesn't use /usr/bin/sed but just sed. My suggestion is to be explicit on the path in git-am. >> >> So it now stands at two bug-reports and one pebkac. > > My impression from reading the older thread Peff mentioned is that > the other ones are also broken implementation of sed supplied by > third-party, so it probably is not two Xs and one Y. It is three > something. [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 17:49 ` Lanny Ripple @ 2012-05-31 18:33 ` Junio C Hamano 2012-05-31 19:21 ` Lanny Ripple 0 siblings, 1 reply; 21+ messages in thread From: Junio C Hamano @ 2012-05-31 18:33 UTC (permalink / raw) To: Lanny Ripple; +Cc: Jeff King, Thomas Rast, git Lanny Ripple <lanny@spotinfluence.com> writes: > Perhaps the error message in git-am could be modified to indicate > sed is a suspect?. E.g., > > lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292 > First, rewinding head to replay your work on top of it... > /sw/lib/git-core/git-am: line 692: Leblond: command not found > Patch does not have a valid e-mail address. (Used /sw/bin/sed found on PATH). > ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ > lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper> Hrm, that does not sound an attractive way going forward. Do we have to suspect any and all uses of POSIX tools, just in case somebody installs a broken implementation from random places? Is sed the only thing that is possibly broken? By the way, have you filed a bug report to whoever supplied your /sw/bin/sed? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 18:33 ` Junio C Hamano @ 2012-05-31 19:21 ` Lanny Ripple 2012-06-01 9:30 ` Jeff King 0 siblings, 1 reply; 21+ messages in thread From: Lanny Ripple @ 2012-05-31 19:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git [-- Attachment #1: Type: text/plain, Size: 1476 bytes --] You have three recent instances where people have bumped into this with sed. (And yes on reporting it to the packaging project.) It seems to me leaving a breadcrumb so that folks can figure out what's going on without having to bother the list would be a win for everyone. And yes, if any and all uses of POSIX tools started showing up with some frequency then I think the same breadcrumb win/win logic would apply. Enjoy, -ljr --- Lanny Ripple lanny@spotinfluence.com On May 31, 2012, at 1:33 PM, Junio C Hamano wrote: > Lanny Ripple <lanny@spotinfluence.com> writes: > >> Perhaps the error message in git-am could be modified to indicate >> sed is a suspect?. E.g., >> >> lanny(master);<work/IdeaProjects/Piper> git rebase master rl-clean292 >> First, rewinding head to replay your work on top of it... >> /sw/lib/git-core/git-am: line 692: Leblond: command not found >> Patch does not have a valid e-mail address. (Used /sw/bin/sed found on PATH). >> ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ >> lanny((ae6c220...)|REBASE);<work/IdeaProjects/Piper> > > Hrm, that does not sound an attractive way going forward. > > Do we have to suspect any and all uses of POSIX tools, just in case > somebody installs a broken implementation from random places? Is > sed the only thing that is possibly broken? > > By the way, have you filed a bug report to whoever supplied your /sw/bin/sed? > > [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 19:21 ` Lanny Ripple @ 2012-06-01 9:30 ` Jeff King 2012-06-01 13:56 ` Lanny Ripple 2012-06-01 16:19 ` Junio C Hamano 0 siblings, 2 replies; 21+ messages in thread From: Jeff King @ 2012-06-01 9:30 UTC (permalink / raw) To: Lanny Ripple; +Cc: Junio C Hamano, Thomas Rast, git [Please don't top-post.] On Thu, May 31, 2012 at 02:21:16PM -0500, Lanny Ripple wrote: > >> Perhaps the error message in git-am could be modified to indicate > >> sed is a suspect?. E.g., > [...] > > Hrm, that does not sound an attractive way going forward. > [...] > You have three recent instances where people have bumped into this > with sed. (And yes on reporting it to the packaging project.) It > seems to me leaving a breadcrumb so that folks can figure out what's > going on without having to bother the list would be a win for > everyone. But you have to keep in mind all of the people who will be led down the wrong path by your breadcrumb when the failure is caused by a _different_ problem. What is the probability that it is helpful versus not helpful? If you are going to give advice that sed might be broken, you should at least test to see if it is broken and report it. But really, I'd rather just see the broken sed fixed. Where would the breadcrumb lead people at this point, anyway? We don't actually have a solution besides "uninstall this other, crappy sed". Has the sed bug actually been fixed? -Peff ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-06-01 9:30 ` Jeff King @ 2012-06-01 13:56 ` Lanny Ripple 2012-06-02 16:09 ` Jeff King 2012-06-01 16:19 ` Junio C Hamano 1 sibling, 1 reply; 21+ messages in thread From: Lanny Ripple @ 2012-06-01 13:56 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Thomas Rast, git [-- Attachment #1: Type: text/plain, Size: 2669 bytes --] I did show that sed was broken and have provided a minimal, reproducible test. I have reported it to the sed maintainers and they are working on it. A message or comment in the code that seds not properly handling utf8 characters have been shown to be the cause of the problem and that git selects sed from the PATH would have been 100% effective in at least one case. I don't know the troubleshooting skills of the other two people that bumped into the problem so can't comment. Of the billions of people that have not (if it existed) looked at the breadcrumb and weren't led astray it's (would have) also been 100% effective. Can you in turn posit any reasonable way that get_author_ident_from_commit would improperly build author-script short of a bad sed? I guess you could pull out transient or systematic disk error. You do, in fact, have several solutions. I won't reiterate since they are in the thread earlier. You also have in many cases the valid concern that the solutions would not be backwards compatible. And yes, this sed will get fixed but what then? The next person that gets a sed they don't expect earlier in their PATH will have to go through the same steps. I do appreciate the assistance that led to the solution to my problem. Thanks for maintaining and making available such a great piece of software. Regards, -ljr --- Lanny Ripple lanny@spotinfluence.com On Jun 1, 2012, at 4:30 AM, Jeff King wrote: > [Please don't top-post.] > > On Thu, May 31, 2012 at 02:21:16PM -0500, Lanny Ripple wrote: > >>>> Perhaps the error message in git-am could be modified to indicate >>>> sed is a suspect?. E.g., >> [...] >>> Hrm, that does not sound an attractive way going forward. >> [...] >> You have three recent instances where people have bumped into this >> with sed. (And yes on reporting it to the packaging project.) It >> seems to me leaving a breadcrumb so that folks can figure out what's >> going on without having to bother the list would be a win for >> everyone. > > But you have to keep in mind all of the people who will be led down the > wrong path by your breadcrumb when the failure is caused by a > _different_ problem. What is the probability that it is helpful versus > not helpful? If you are going to give advice that sed might be broken, > you should at least test to see if it is broken and report it. > > But really, I'd rather just see the broken sed fixed. Where would the > breadcrumb lead people at this point, anyway? We don't actually have a > solution besides "uninstall this other, crappy sed". Has the sed bug > actually been fixed? > > -Peff [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-06-01 13:56 ` Lanny Ripple @ 2012-06-02 16:09 ` Jeff King 2012-06-02 16:37 ` Lanny Ripple 0 siblings, 1 reply; 21+ messages in thread From: Jeff King @ 2012-06-02 16:09 UTC (permalink / raw) To: Lanny Ripple; +Cc: Junio C Hamano, Thomas Rast, git On Fri, Jun 01, 2012 at 08:56:01AM -0500, Lanny Ripple wrote: > I did show that sed was broken and have provided a minimal, reproducible test. > > I have reported it to the sed maintainers and they are working on it. Great. Do we know yet which versions are affected? > A message or comment in the code that seds not properly handling utf8 > characters have been shown to be the cause of the problem and that git > selects sed from the PATH would have been 100% effective in at least > one case. I don't know the troubleshooting skills of the other two > people that bumped into the problem so can't comment. Of the billions > of people that have not (if it existed) looked at the breadcrumb and > weren't led astray it's (would have) also been 100% effective. Can > you in turn posit any reasonable way that get_author_ident_from_commit > would improperly build author-script short of a bad sed? I guess you > could pull out transient or systematic disk error. I assume from bogus commit objects. But I admit I am just guessing, and don't have data. > You do, in fact, have several solutions. I won't reiterate since they > are in the thread earlier. You also have in many cases the valid > concern that the solutions would not be backwards compatible. And > yes, this sed will get fixed but what then? The next person that gets > a sed they don't expect earlier in their PATH will have to go through > the same steps. When I said: > > But really, I'd rather just see the broken sed fixed. Where would the > > breadcrumb lead people at this point, anyway? We don't actually have a > > solution besides "uninstall this other, crappy sed". Has the sed bug > > actually been fixed? I meant that there is not a fix for the _user_ to perform at that point. The point of a breadcrumb like that is that we are not going to put a fix into git, so we want to at least give the user a clue that their system has a problem. But what is their next step after being informed that their system has a problem? -Peff ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-06-02 16:09 ` Jeff King @ 2012-06-02 16:37 ` Lanny Ripple 0 siblings, 0 replies; 21+ messages in thread From: Lanny Ripple @ 2012-06-02 16:37 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Thomas Rast, git [-- Attachment #1: Type: text/plain, Size: 761 bytes --] At a guess (I haven't tested) every Gnu sed from 4.2.1 down on Darwin and perhaps FreeBSD. The fink developer looking at the sed giving problems (Gnu sed 4.2.1) says that it gets its charset idea from nl_langinfo() which reports US-ASCII for LC_ALL=C or POSIX (on Darwin, derived from FreeBSD). Since US-ASCII is only 7-bit chars anything accented will break. -ljr --- Lanny Ripple lanny@spotinfluence.com On Jun 2, 2012, at 11:09 AM, Jeff King wrote: > On Fri, Jun 01, 2012 at 08:56:01AM -0500, Lanny Ripple wrote: > >> I did show that sed was broken and have provided a minimal, reproducible test. >> >> I have reported it to the sed maintainers and they are working on it. > > Great. Do we know yet which versions are affected? [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-06-01 9:30 ` Jeff King 2012-06-01 13:56 ` Lanny Ripple @ 2012-06-01 16:19 ` Junio C Hamano 2012-06-01 17:05 ` Lanny Ripple 2012-06-02 16:23 ` Jeff King 1 sibling, 2 replies; 21+ messages in thread From: Junio C Hamano @ 2012-06-01 16:19 UTC (permalink / raw) To: Jeff King; +Cc: Lanny Ripple, Thomas Rast, git Jeff King <peff@peff.net> writes: > [Please don't top-post.] > ... > But you have to keep in mind all of the people who will be led down the > wrong path by your breadcrumb when the failure is caused by a > _different_ problem. What is the probability that it is helpful versus > not helpful? If you are going to give advice that sed might be broken, > you should at least test to see if it is broken and report it. Eek, do that at runtime in the error code path? Add something like suspected_sed_breakage () { xxxxx=$(printf "\370\235\204\236\n" | LC_CTYPE=C sed 's/./x/g') if test "x$xxxxx" != "xxxxx" then die "Your sed is broken; cannot run $1" fi } to git-sh-setup, and do something like: . "$dotest/author-script" || suspected_sed_breakage "$0" in git-am? The problem I see is that at that point where we have to suspect something fundamental as sed broken on the platform, we cannot even trust printf, test, or even the shell itself behaving sanely. So I would say, although it is a fun thought-experiment, such a test and breadcrumb is not really worth it. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-06-01 16:19 ` Junio C Hamano @ 2012-06-01 17:05 ` Lanny Ripple 2012-06-01 17:57 ` Junio C Hamano 2012-06-02 16:23 ` Jeff King 1 sibling, 1 reply; 21+ messages in thread From: Lanny Ripple @ 2012-06-01 17:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Thomas Rast, git [-- Attachment #1: Type: text/plain, Size: 2072 bytes --] I still think the best solution is figuring out if the platform sed is sane at build time and using a full path (via config setup if being able to change the sed used is a priority). Short of that something as simple as (git-am:699+) if test -z "$GIT_AUTHOR_EMAIL" then # Can occur when sed in PATH will not handle UTF8 under LC_ALL=C. gettextln "Patch does not have a valid e-mail address." stop_here $this fi would give folks trying to troubleshoot the problem a clue to what was going on. From the fink developers' list it seems Darwin and perhaps FreeBSD use US-ASCII for LC_ALL=C or POSIX which is why Gnu sed gets it wrong. My problem is still fixed whatever is decided. Enjoy, -ljr --- Lanny Ripple lanny@spotinfluence.com On Jun 1, 2012, at 11:19 AM, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: >> [Please don't top-post.] >> ... >> But you have to keep in mind all of the people who will be led down the >> wrong path by your breadcrumb when the failure is caused by a >> _different_ problem. What is the probability that it is helpful versus >> not helpful? If you are going to give advice that sed might be broken, >> you should at least test to see if it is broken and report it. > > Eek, do that at runtime in the error code path? > > Add something like > > suspected_sed_breakage () { > xxxxx=$(printf "\370\235\204\236\n" | LC_CTYPE=C sed 's/./x/g') > if test "x$xxxxx" != "xxxxx" > then > die "Your sed is broken; cannot run $1" > fi > } > > to git-sh-setup, and do something like: > > . "$dotest/author-script" || suspected_sed_breakage "$0" > > in git-am? > > The problem I see is that at that point where we have to suspect > something fundamental as sed broken on the platform, we cannot even > trust printf, test, or even the shell itself behaving sanely. > > So I would say, although it is a fun thought-experiment, such a test > and breadcrumb is not really worth it. > [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 235 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-06-01 17:05 ` Lanny Ripple @ 2012-06-01 17:57 ` Junio C Hamano 0 siblings, 0 replies; 21+ messages in thread From: Junio C Hamano @ 2012-06-01 17:57 UTC (permalink / raw) To: Lanny Ripple; +Cc: Jeff King, Thomas Rast, git Lanny Ripple <lanny@spotinfluence.com> writes: > I still think the best solution is figuring out if the platform sed is sane at build time and using a full path (via config setup if being able to change the sed used is a priority). Short of that something as simple as > > (git-am:699+) > > if test -z "$GIT_AUTHOR_EMAIL" > then > # Can occur when sed in PATH will not handle UTF8 under LC_ALL=C. > gettextln "Patch does not have a valid e-mail address." > stop_here $this > fi That is a real fix to the issue (can also happen to name, no?) and is worth checking. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-06-01 16:19 ` Junio C Hamano 2012-06-01 17:05 ` Lanny Ripple @ 2012-06-02 16:23 ` Jeff King 1 sibling, 0 replies; 21+ messages in thread From: Jeff King @ 2012-06-02 16:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Lanny Ripple, Thomas Rast, git On Fri, Jun 01, 2012 at 09:19:26AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > [Please don't top-post.] > > ... > > But you have to keep in mind all of the people who will be led down the > > wrong path by your breadcrumb when the failure is caused by a > > _different_ problem. What is the probability that it is helpful versus > > not helpful? If you are going to give advice that sed might be broken, > > you should at least test to see if it is broken and report it. > > Eek, do that at runtime in the error code path? Yes, which I think is utterly gross, but at least it is on the error code path, so most people will never run it. It's slightly less gross to do it at build-time (or even as part of configure, I guess), but of course it is a run-time dependency, so there is nothing to stop the broken sed from being installed after git (or even a user with a different PATH than the builder triggering it). One gross thing about doing it at run-time is that it only affects _this_ code path, which happens to have an easy-to-diagnose outcome from the bug. But how many other code paths are using sed on data which might contain utf-8 characters? And are they failing silently or otherwise simply corrupting the output? One other thing to think about: this particular manifestation of the bug is to cause our shell-quoting to fail. Are there are security implications? That is, can I execute arbitrary code by having you run get_author_ident_from_commit on a specially-crafted commit? > The problem I see is that at that point where we have to suspect > something fundamental as sed broken on the platform, we cannot even > trust printf, test, or even the shell itself behaving sanely. I don't think that's true. We have to deal with this kind of portability nonsense all the time. We assume that the tools work until we get a report that they don't, and then we fix it, work around it, or whatever. Yes, the "your sed is broken" test would not work if "test" is also totally broken. But we have not seen such a system in real life, or have reason to suspect that it exists. Whereas we do know there is a common[1] platform where the sed is broken in a specific way, but nothing else is. Dealing with that helps solve a real problem for some people. -Peff [1] I am just guessing about how common it is. I still haven't seen an indication of how common this version of sed is, or even which versions are affected. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Bug: rebase when an author uses accents in name on MacOSx 2012-05-31 1:19 ` Jeff King 2012-05-31 6:33 ` Junio C Hamano @ 2012-05-31 9:33 ` Thomas Rast 1 sibling, 0 replies; 21+ messages in thread From: Thomas Rast @ 2012-05-31 9:33 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Lanny Ripple, git, Will Palmer Jeff King <peff@peff.net> writes: > On Wed, May 30, 2012 at 04:45:33PM -0700, Junio C Hamano wrote: > >> Lanny Ripple <lanny@spotinfluence.com> writes: >> >> > lanny;~> echo "Rémi Leblond" | LANG=C LC_ALL=C sed -ne 's/.*/GIT_AUTHOR_NAME='\''&'\''/p' >> > GIT_AUTHOR_NAME='R'émi Leblond >> >> So in C locale where each byte is supposed to be a single character, >> that implementation of "sed" refuses to match a byte with high-bit >> set when given a pattern '.'? >> >> That is a surprising breakage, I would have to say. > > It should not be too surprising, since we discussed it a few months ago: > > http://thread.gmane.org/gmane.comp.version-control.git/192218 > > Thomas provided a gross but workable solution here: > > http://article.gmane.org/gmane.comp.version-control.git/192237 > > and we also talked about eventually having a shell-quoting mechanism for > pretty placeholders. Then the discussion rambled into "this sed is > horribly broken, and the user should get a better sed" territory. Maybe > we need to revisit that decision, since this is now two bug reports. Three actually, also counting Will Palmer (shruggar) who brought this up on #git-devel shortly before the thread above. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2012-06-02 16:37 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-05-30 22:16 Bug: rebase when an author uses accents in name on MacOSx Lanny Ripple 2012-05-30 23:45 ` Junio C Hamano 2012-05-30 23:57 ` Jürgen Kreileder 2012-05-31 1:19 ` Jeff King 2012-05-31 6:33 ` Junio C Hamano 2012-05-31 13:36 ` Lanny Ripple 2012-05-31 14:28 ` Thomas Rast 2012-05-31 14:56 ` Lanny Ripple 2012-05-31 17:34 ` Junio C Hamano 2012-05-31 17:49 ` Lanny Ripple 2012-05-31 18:33 ` Junio C Hamano 2012-05-31 19:21 ` Lanny Ripple 2012-06-01 9:30 ` Jeff King 2012-06-01 13:56 ` Lanny Ripple 2012-06-02 16:09 ` Jeff King 2012-06-02 16:37 ` Lanny Ripple 2012-06-01 16:19 ` Junio C Hamano 2012-06-01 17:05 ` Lanny Ripple 2012-06-01 17:57 ` Junio C Hamano 2012-06-02 16:23 ` Jeff King 2012-05-31 9:33 ` Thomas Rast
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).