git.vger.kernel.org archive mirror
* Encoding problem on OSX?
       [not found] <AANLkTikh12guRxCK2Vf=WvshzX8P-fYTyu3qxYWNJ2px@mail.gmail.com>
@ 2010-08-09 13:58 ` İsmail Dönmez
  2010-08-09 23:46   ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: İsmail Dönmez @ 2010-08-09 13:58 UTC (permalink / raw)
  To: git

Hi all;

On master & maint branch, t4201-shortlog.sh test 2 fails with:

expecting success:

git shortlog HEAD >log &&
fuzz log >log.predictable &&
test_cmp expect.template log.predictable

--- expect.template 2010-08-09 13:45:46.000000000 +0000
+++ log.predictable 2010-08-09 13:45:46.000000000 +0000
@@ -1,8 +1,8 @@
 A U Thor (5):
       SUBJECT
       SUBJECT
-      SUBJECT
-      SUBJECT
+      SUBJECT𝄞s 𝄞s a very, very long f𝄞rst l𝄞ne for the comm𝄞t
message to see 𝄞f 𝄞t 𝄞s wrapped correctly
+      SUBJECT????s ????s a very, very long f????rst l????ne for the
comm????t message to see ????f ????t ????s wrapped correctly
       SUBJECT

I am not sure if this is a known problem so I am reporting it.

Regards,
ismail


* Re: Encoding problem on OSX?
  2010-08-09 13:58 ` Encoding problem on OSX? İsmail Dönmez
@ 2010-08-09 23:46   ` Jonathan Nieder
  2010-08-10  5:52     ` İsmail Dönmez
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2010-08-09 23:46 UTC (permalink / raw)
  To: İsmail Dönmez; +Cc: git

İsmail Dönmez wrote:

> git shortlog HEAD >log &&
> fuzz log >log.predictable &&
> test_cmp expect.template log.predictable
> 
> --- expect.template 2010-08-09 13:45:46.000000000 +0000
> +++ log.predictable 2010-08-09 13:45:46.000000000 +0000
> @@ -1,8 +1,8 @@
>  A U Thor (5):
>        SUBJECT
>        SUBJECT
> -      SUBJECT
> -      SUBJECT
> +      SUBJECT𝄞s 𝄞s a very, very long f𝄞rst l𝄞ne for the comm𝄞t
> message to see 𝄞f 𝄞t 𝄞s wrapped correctly
> +      SUBJECT????s ????s a very, very long f????rst l????ne for the
> comm????t message to see ????f ????t ????s wrapped correctly
>        SUBJECT

Very interesting; thanks for the report.

From the definition of fuzz(), it looks like

	sed "
			s/$_x40/OBJECT_NAME/g
			s/$_x05/OBJID/g
			s/^ \{6\}[CTa].*/      SUBJECT/g
			s/^ \{8\}[^ ].*/        CONTINUATION/g
		" <log >log.fuzzy

failed to completely match the fourth and fifth lines of the shortlog:

	A U Thor (5):
	      Test
	      This is a very, very long first[etc]
	      Th𝄞s 𝄞s a very, very long f𝄞rst[etc]
	      Th<malformed treble clef>s <malformed treble clef>s a...

Could you confirm this?  What does

	locale
	printf 'Th\360\235\204\236s\n' | sed 's/.*//g'

print?

Jonathan
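
For readers following along, the SUBJECT rule quoted above can be tried on plain ASCII input first. This is only a minimal sketch: the function name fuzz_subject and the sample lines are made up for illustration, and it covers just one of the four substitutions in fuzz().

```shell
# Illustrative stand-in for the SUBJECT rule quoted from fuzz();
# the function name fuzz_subject is hypothetical.
fuzz_subject() {
	sed 's/^ \{6\}[CTa].*/      SUBJECT/'
}

# A subject line: six spaces, then a leading C, T, or a.
subject_out=$(printf '      This is a very, very long first line\n' | fuzz_subject)

# An author header does not start with six spaces, so it passes through.
header_out=$(printf 'A U Thor (5):\n' | fuzz_subject)

printf '%s\n' "$subject_out" "$header_out"
```

When `.*` stops early, as in the report, whatever follows the unmatched byte stays on the line after SUBJECT.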


* Re: Encoding problem on OSX?
  2010-08-09 23:46   ` Jonathan Nieder
@ 2010-08-10  5:52     ` İsmail Dönmez
  2010-08-11  7:55       ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: İsmail Dönmez @ 2010-08-10  5:52 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

Hi;

On Tue, Aug 10, 2010 at 2:46 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
>  locale
>        printf 'Th\360\235\204\236s\n' | sed 's/.*//g'

[ismail@havana][08:50:45]
[~]>  locale
LANG=
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

[ismail@havana][08:51:00]
[~]> printf 'Th\360\235\204\236s\n' | sed 's/.*//g'

[ismail@havana][08:51:06]
[~]>


* Re: Encoding problem on OSX?
  2010-08-10  5:52     ` İsmail Dönmez
@ 2010-08-11  7:55       ` Jonathan Nieder
  2010-08-11  8:20         ` İsmail Dönmez
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2010-08-11  7:55 UTC (permalink / raw)
  To: İsmail Dönmez; +Cc: git

İsmail Dönmez wrote:

> [~]> printf 'Th\360\235\204\236s\n' | sed 's/.*//g'
> 
> [ismail@havana][08:51:06]
> [~]>

Thanks for checking.  So sed is not completely broken.  Could you try

 sh t4201-shortlog.sh
 cd "trash directory.t4201-shortlog"
 git log
 cat "trash directory.t4201-shortlog/log"

?


* Re: Encoding problem on OSX?
  2010-08-11  7:55       ` Jonathan Nieder
@ 2010-08-11  8:20         ` İsmail Dönmez
  2010-08-11  8:29           ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: İsmail Dönmez @ 2010-08-11  8:20 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

Hi;

On Wed, Aug 11, 2010 at 10:55 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> İsmail Dönmez wrote:
>
>> [~]> printf 'Th\360\235\204\236s\n' | sed 's/.*//g'
>>
>> [ismail@havana][08:51:06]
>> [~]>
>
> Thanks for checking.  So sed is not completely broken.  Could you try
>
>  sh t4201-shortlog.sh
>  cd "trash directory.t4201-shortlog"
>  git log
>  cat "trash directory.t4201-shortlog/log"

First of all note that this is not Mac's default sed but instead GNU sed:

GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.

Now the output of what you requested;

[~/Sources/git/t]>  sh t4201-shortlog.sh
ok 1 - setup
not ok - 2 default output format
#	
#		git shortlog HEAD >log &&
#		fuzz log >log.predictable &&
#		test_cmp expect.template log.predictable
#	
ok 3 - pretty format
ok 4 - --abbrev
ok 5 - output from user-defined format is re-wrapped
ok 6 - shortlog wrapping
ok 7 - shortlog from non-git directory
ok 8 - shortlog encoding
# failed 1 among 8 test(s)
1..8
[ismail@havana][11:18:24]
[~/Sources/git/t]>  cd "trash directory.t4201-shortlog"
[ismail@havana][11:18:33]
[~/Sources/git/t/trash directory.t4201-shortlog]> git log
commit ef6c19b4846d6a3e41f9a3ce746a3bffae653c17
Author: Jöhännës "Dschö" Schindëlin <Johannes.Schindelin@gmx.de>
Date:   Wed Aug 11 08:18:24 2010 +0000

    set a1 to 3 and some non-ASCII chars: áæï

commit d7c0787d081716755e2863f612d171846f503d4f
Author: Jöhännës "Dschö" Schindëlin <Johannes.Schindelin@gmx.de>
Date:   Wed Aug 11 08:18:24 2010 +0000

    set a1 to 2 and some non-ASCII chars: Äßø

commit 7e9687adfe33f5d2050f0fc4ab5004f324d3559f
Author: A U Thor <author@example.com>
Date:   Wed Aug 11 08:18:24 2010 +0000

    Test

[~/Sources/git/t/trash directory.t4201-shortlog]> cat log
commit 5fc75f5794d1cd8575fc3e2e86f9c0e1aa31723e
Author: Someone else <not!me>
Date:   Wed Aug 11 08:18:24 2010 +0000

    Commit by someone else

commit 0f5955f471a9d882b0e869752614b5123af19da3
Author: A U Thor <author@example.com>
Date:   Wed Aug 11 08:18:24 2010 +0000

    a								12	34	56	78

commit 0bb7d083233c266d9051b283913bd83000c9001f
Author: A U Thor <author@example.com>
Date:   Wed Aug 11 08:18:24 2010 +0000

    Th????s ????s a very, very long f????rst l????ne for the comm????t
message to see ????f ????t ????s wrapped correctly

commit 03a5a848c658751c51925127820491bf2a94a752
Author: A U Thor <author@example.com>
Date:   Wed Aug 11 08:18:24 2010 +0000

    Th𝄞s 𝄞s a very, very long f𝄞rst l𝄞ne for the comm𝄞t message
to see 𝄞f 𝄞t 𝄞s wrapped correctly

commit fdfc106190118f705dee70b56930764007353922
Author: A U Thor <author@example.com>
Date:   Wed Aug 11 08:18:24 2010 +0000

    This is a very, very long first line for the commit message to see
if it is wrapped correctly

commit 7e9687adfe33f5d2050f0fc4ab5004f324d3559f
Author: A U Thor <author@example.com>
Date:   Wed Aug 11 08:18:24 2010 +0000

    Test


* Re: Encoding problem on OSX?
  2010-08-11  8:20         ` İsmail Dönmez
@ 2010-08-11  8:29           ` Jonathan Nieder
  2010-08-11  8:33             ` İsmail Dönmez
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2010-08-11  8:29 UTC (permalink / raw)
  To: İsmail Dönmez; +Cc: git

İsmail Dönmez wrote:

> [~/Sources/git/t]>  sh t4201-shortlog.sh
> ok 1 - setup
> not ok - 2 default output format
> #	
> #		git shortlog HEAD >log &&
> #		fuzz log >log.predictable &&
> #		test_cmp expect.template log.predictable
> #	
> ok 3 - pretty format

Oops, my bad.

	sh t4201-shortlog.sh --immediate
	cat "trash directory.t4201-shortlog/log"

is what I meant.  The idea is to get the log that log.predictable
is based on, by fetching the log from immediately after the failing test.

Sorry for the trouble,
Jonathan


* Re: Encoding problem on OSX?
  2010-08-11  8:29           ` Jonathan Nieder
@ 2010-08-11  8:33             ` İsmail Dönmez
  2010-08-11  8:44               ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: İsmail Dönmez @ 2010-08-11  8:33 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Wed, Aug 11, 2010 at 11:29 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> İsmail Dönmez wrote:
>
>> [~/Sources/git/t]>  sh t4201-shortlog.sh
>> ok 1 - setup
>> not ok - 2 default output format
>> #
>> #             git shortlog HEAD >log &&
>> #             fuzz log >log.predictable &&
>> #             test_cmp expect.template log.predictable
>> #
>> ok 3 - pretty format
>
> Oops, my bad.
>
>        sh t4201-shortlog.sh --immediate
>        cat "trash directory.t4201-shortlog/log"
>
> is what I meant.  The idea is to get the log that log.predictable
> is based on, by fetching the log from immediately after the failing test.

Ok here we go;


[~/Sources/git/t]>        sh t4201-shortlog.sh --immediate
ok 1 - setup
not ok - 2 default output format
#	
#		git shortlog HEAD >log &&
#		fuzz log >log.predictable &&
#		test_cmp expect.template log.predictable
#	
[ismail@havana][11:32:29]
[~/Sources/git/t]> cat "trash directory.t4201-shortlog/log"
A U Thor (5):
      Test
      This is a very, very long first line for the commit message to
see if it is wrapped correctly
      Th𝄞s 𝄞s a very, very long f𝄞rst l𝄞ne for the comm𝄞t message
to see 𝄞f 𝄞t 𝄞s wrapped correctly
      Th????s ????s a very, very long f????rst l????ne for the
comm????t message to see ????f ????t ????s wrapped correctly
      a								12	34	56	78

Someone else (1):
      Commit by someone else


* Re: Encoding problem on OSX?
  2010-08-11  8:33             ` İsmail Dönmez
@ 2010-08-11  8:44               ` Jonathan Nieder
  2010-08-11  8:47                 ` İsmail Dönmez
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2010-08-11  8:44 UTC (permalink / raw)
  To: İsmail Dönmez; +Cc: git

İsmail Dönmez wrote:
> On Wed, Aug 11, 2010 at 11:29 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:

>>        sh t4201-shortlog.sh --immediate
>>        cat "trash directory.t4201-shortlog/log"
>>
>> is what I meant.  The idea is to get the log that log.predictable
>> is based on, by fetching the log from immediately after the failing test.
>
> Ok here we go;

Okay, I’m stymied.  It *looks* like a sed bug even if a quick
test did not catch it in the act.

I guess the last thing to try is

	sed "s/^ \{6\}[CTa].*/      SUBJECT/g" <"trash directory.t4201-shortlog/log"

because then you would have a test case to report to your sed
supplier.

Hopefully someone else with Mac OS X can reproduce this.  

Thanks again.
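
A self-contained test case along these lines might look like the sketch below. It is an assumption-laden illustration, not something from the test suite: the helper name subject_is_fuzzed, the temporary file, and the shortened subject text are all hypothetical. It runs sed under LC_ALL=C because the test suite does the same.

```shell
# Hypothetical reproduction helper; all names here are illustrative.
repro_file=$(mktemp)

# \370 cannot begin a UTF-8 sequence, so this subject line is malformed
# when read in a UTF-8 locale.
printf '      Th\370\235\204\236s is a malformed subject\n' >"$repro_file"

subject_is_fuzzed() {
	# Succeeds only when sed replaced the entire line with SUBJECT.
	out=$(LC_ALL=C sed 's/^ \{6\}[CTa].*/      SUBJECT/' <"$1")
	test "$out" = "      SUBJECT"
}

if subject_is_fuzzed "$repro_file"
then
	echo "ok: whole line replaced"
else
	echo "residue left after SUBJECT: candidate sed bug"
fi
```

On a sed that treats input bytewise in the C locale, the first branch is expected. The second branch corresponds to the behaviour reported in this thread.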


* Re: Encoding problem on OSX?
  2010-08-11  8:44               ` Jonathan Nieder
@ 2010-08-11  8:47                 ` İsmail Dönmez
  2010-08-11  9:01                   ` İsmail Dönmez
  0 siblings, 1 reply; 14+ messages in thread
From: İsmail Dönmez @ 2010-08-11  8:47 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Wed, Aug 11, 2010 at 11:44 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>        sed "s/^ \{6\}[CTa].*/      SUBJECT/g" <"trash directory.t4201-shortlog/log"
>

A U Thor (5):
      SUBJECT
      SUBJECT
      SUBJECT
      SUBJECT????s ????s a very, very long f????rst l????ne for the
comm????t message to see ????f ????t ????s wrapped correctly
      SUBJECT

Someone else (1):
      SUBJECT

I will try updating my sed, thanks!

Regards,
ismail


* Re: Encoding problem on OSX?
  2010-08-11  8:47                 ` İsmail Dönmez
@ 2010-08-11  9:01                   ` İsmail Dönmez
  2010-08-11  9:23                     ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: İsmail Dönmez @ 2010-08-11  9:01 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

Hi again;

On Wed, Aug 11, 2010 at 11:47 AM, İsmail Dönmez <ismail@namtrac.org> wrote:
> On Wed, Aug 11, 2010 at 11:44 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>>        sed "s/^ \{6\}[CTa].*/      SUBJECT/g" <"trash directory.t4201-shortlog/log"
>>
>
> A U Thor (5):
>      SUBJECT
>      SUBJECT
>      SUBJECT
>      SUBJECT????s ????s a very, very long f????rst l????ne for the
> comm????t message to see ????f ????t ????s wrapped correctly
>      SUBJECT
>
> Someone else (1):
>      SUBJECT
>
> I will try updating my sed, thanks!

Downgrading my sed to v 4.1.5 fixed the issue. Thanks for your help!

Regards,
ismail


* Re: Encoding problem on OSX?
  2010-08-11  9:01                   ` İsmail Dönmez
@ 2010-08-11  9:23                     ` Jonathan Nieder
  2010-09-27  2:31                       ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2010-08-11  9:23 UTC (permalink / raw)
  To: İsmail Dönmez; +Cc: git

İsmail Dönmez wrote:

> Downgrading my sed to v 4.1.5 fixed the issue. Thanks for your help!

I just read BUGS in the sed distribution.  Strangely enough the above seems to
be correct behavior:

  Another common localization-related problem happens if your input stream
  includes invalid multibyte sequences.  POSIX mandates that such
  sequences are _not_ matched by `.', so that `s/.*//' will not clear
  pattern space as you would expect.  In fact, there is no way to clear
  sed's buffers in the middle of the script in most multibyte locales
  (including UTF-8 locales).  For this reason, GNU sed provides a `z'
  command (for `zap') as an extension.

However there is still a sed bug as far as I can tell, since in the
test suite, LC_ALL is set to C, and using the C locale is the
suggested workaround in the GNU sed docs.  This explains where my
first suggested diagnostic messed up: presumably

 printf 'Th\360\235\204\236s\n' | LC_ALL=C sed "s/.*//"

would print

 <treble clef>s

and

 printf 'Th\370\235\204\236s\n' | sed "s/.*//"

would print

 ????s

with your copy of sed 4.2.1.

Well, I learned something new today.  Still thinking over how to fix
this in the test suite.  Thanks again.
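
The difference between the two printf inputs above is easier to see as bytes. A minimal sketch, assuming od and tr are available and that sed honors LC_ALL=C (which, per this thread, may not hold for the affected build); the helper name hex_bytes is hypothetical:

```shell
# Hypothetical helper: dump stdin as a contiguous lowercase hex string.
hex_bytes() {
	od -An -tx1 | tr -d ' \n'
}

# \360\235\204\236 is the UTF-8 encoding of the treble clef (U+1D11E).
valid_hex=$(printf 'Th\360\235\204\236s' | hex_bytes)

# \370 (0xf8) is never a valid UTF-8 lead byte, so this one is malformed.
malformed_hex=$(printf 'Th\370\235\204\236s' | hex_bytes)

echo "valid:     $valid_hex"
echo "malformed: $malformed_hex"

# In the C locale every byte is a character, so `.*' should consume the
# whole line and leave only the newline.
cleared=$(printf 'Th\370\235\204\236s\n' | LC_ALL=C sed 's/.*//' | hex_bytes)
echo "cleared:   $cleared"
```

If the last line shows anything other than a lone newline byte, sed is not treating the input bytewise despite LC_ALL=C.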


* Re: Encoding problem on OSX?
  2010-08-11  9:23                     ` Jonathan Nieder
@ 2010-09-27  2:31                       ` Jonathan Nieder
  2010-09-27  5:15                         ` Kevin Ballard
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2010-09-27  2:31 UTC (permalink / raw)
  To: İsmail Dönmez, Richard MICHAEL; +Cc: git

Hi again,

İsmail Dönmez wrote:

> Downgrading my sed to v 4.1.5 fixed the issue.

This is nicely explained here:

 https://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=14595

It looks to be a Mac OS libc misfeature.  Could you two lobby Apple to
get this fixed? :)

Thanks again for the reports.


* Re: Encoding problem on OSX?
  2010-09-27  2:31                       ` Jonathan Nieder
@ 2010-09-27  5:15                         ` Kevin Ballard
  2010-09-27  5:18                           ` İsmail Dönmez
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Ballard @ 2010-09-27  5:15 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: İsmail Dönmez, Richard MICHAEL, git

On Sep 26, 2010, at 7:31 PM, Jonathan Nieder wrote:

> Hi again,
> 
> İsmail Dönmez wrote:
> 
>> Downgrading my sed to v 4.1.5 fixed the issue.
> 
> This is nicely explained here:
> 
> https://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=14595
> 
> It looks to be a Mac OS libc misfeature.  Could you two lobby Apple to
> get this fixed? :)

FWIW, /usr/bin/sed on Mac OS X 10.6 doesn't seem to be having a problem. t4201-shortlog.sh passes all tests on my machine.

-Kevin Ballard


* Re: Encoding problem on OSX?
  2010-09-27  5:15                         ` Kevin Ballard
@ 2010-09-27  5:18                           ` İsmail Dönmez
  0 siblings, 0 replies; 14+ messages in thread
From: İsmail Dönmez @ 2010-09-27  5:18 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Jonathan Nieder, Richard MICHAEL, git

On Mon, Sep 27, 2010 at 8:15 AM, Kevin Ballard <kevin@sb.org> wrote:
> On Sep 26, 2010, at 7:31 PM, Jonathan Nieder wrote:
>
>> Hi again,
>>
>> İsmail Dönmez wrote:
>>
>>> Downgrading my sed to v 4.1.5 fixed the issue.
>>
>> This is nicely explained here:
>>
>> https://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=14595
>>
>> It looks to be a Mac OS libc misfeature.  Could you two lobby Apple to
>> get this fixed? :)
>
> FWIW, /usr/bin/sed on Mac OS X 10.6 doesn't seem to be having a problem. t4201-shortlog.sh passes all tests on my machine.

Yes, the problem was with GNU sed on OS X 10.6.

Regards,
ismail
