* Fix UTF Encoding issue
@ 2007-12-03 10:02 Benjamin Close
2007-12-03 10:14 ` Junio C Hamano
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Close @ 2007-12-03 10:02 UTC (permalink / raw)
To: git
From 83042abf3967b455953cddeab43e33c1d59c6f03 Mon Sep 17 00:00:00 2001
From: Benjamin Close <Benjamin.Close@clearchain.com>
Date: Sun, 2 Dec 2007 15:09:00 -0800
Subject: [PATCH] Gitweb: Fix encoding to always translate rather than
sometimes fail
When performing the UTF-8 translation, don't test whether $res is defined.
It appears that it is defined even when the conversion fails. This causes
failures when writing to the output stream, which expects UTF-8.
Instead, return immediately if the conversion is successful; otherwise force
the translation to the fallback encoding.
---
gitweb/gitweb.perl | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 491a3f4..00bbcdf 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -696,12 +696,8 @@ sub validate_refname {
sub to_utf8 {
my $str = shift;
my $res;
- eval { $res = decode_utf8($str, Encode::FB_CROAK); };
- if (defined $res) {
- return $res;
- } else {
- return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
- }
+ eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
+ return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
}
# quote unsafe chars, but keep the slash, even when it's not
--
1.5.3.6
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 10:02 Fix UTF Encoding issue Benjamin Close
@ 2007-12-03 10:14 ` Junio C Hamano
2007-12-03 11:32 ` Ismail Dönmez
0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2007-12-03 10:14 UTC (permalink / raw)
To: Benjamin Close; +Cc: git
Benjamin Close <Benjamin.Close@clearchain.com> writes:
>>From 83042abf3967b455953cddeab43e33c1d59c6f03 Mon Sep 17 00:00:00 2001
> From: Benjamin Close <Benjamin.Close@clearchain.com>
> Date: Sun, 2 Dec 2007 15:09:00 -0800
> Subject: [PATCH] Gitweb: Fix encoding to always translate rather than
> sometimes fail
>
> When performing the utf translation don't test if $res is defined.
> It appears that it is defined even when the conversion fails. This causes
> failures on the writing of the output stream which is expecting UTF.
> @@ -696,12 +696,8 @@ sub validate_refname {
> sub to_utf8 {
> my $str = shift;
> my $res;
> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> - if (defined $res) {
> - return $res;
> - } else {
> - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> - }
> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> }
This is funny.
I thought the standard catch ... throw idiom in Perl was to do the above
like this:
my $res;
eval { $res = decode_utf8($str, Encode::FB_CROAK); };
if ($@) {
return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
}
return $res;
(alternatively, you can assign return value of eval {} to $res).
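The idiom described above can be sketched as a self-contained function. This is only an illustration, not the code that went into gitweb: the 'latin1' value for $fallback_encoding and the Encode::LEAVE_SRC flag (added so that $str is still intact when the fallback runs) are assumptions of the sketch.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Assumption: stands in for gitweb's configurable fallback encoding.
our $fallback_encoding = 'latin1';

sub to_utf8 {
	my $str = shift;
	# eval {} returns the value of its block, or undef if the block died.
	# LEAVE_SRC asks Encode not to modify $str while checking it.
	my $res = eval {
		decode_utf8($str, Encode::FB_CROAK() | Encode::LEAVE_SRC());
	};
	if ($@) {
		# decode_utf8 croaked: treat the input as the fallback encoding.
		return decode($fallback_encoding, $str, Encode::FB_DEFAULT());
	}
	return $res;
}
```

Testing $@ rather than $res means the decision depends on whether decode_utf8 actually died, not on what it happened to leave in $res.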
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 10:14 ` Junio C Hamano
@ 2007-12-03 11:32 ` Ismail Dönmez
2007-12-03 12:06 ` Jakub Narebski
0 siblings, 1 reply; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-03 11:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Benjamin Close, git
Written on Monday 03 December 2007 at 12:14:43:
> Benjamin Close <Benjamin.Close@clearchain.com> writes:
> >>From 83042abf3967b455953cddeab43e33c1d59c6f03 Mon Sep 17 00:00:00 2001
> >
> > From: Benjamin Close <Benjamin.Close@clearchain.com>
> > Date: Sun, 2 Dec 2007 15:09:00 -0800
> > Subject: [PATCH] Gitweb: Fix encoding to always translate rather than
> > sometimes fail
> >
> > When performing the utf translation don't test if $res is defined.
> > It appears that it is defined even when the conversion fails. This causes
> > failures on the writing of the output stream which is expecting UTF.
> > @@ -696,12 +696,8 @@ sub validate_refname {
> > sub to_utf8 {
> > my $str = shift;
> > my $res;
> > - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> > - if (defined $res) {
> > - return $res;
> > - } else {
> > - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> > - }
> > + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
> > + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> > }
>
> This is funny.
>
> I thought the standard catch ... throw idiom in Perl was to do the above
> like this:
>
> my $res;
> eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> if ($@) {
> return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> }
> return $res;
I think this is correct, but the current code in gitweb doesn't look correct
since it checks for $res and not $@.
Regards,
ismail
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 11:32 ` Ismail Dönmez
@ 2007-12-03 12:06 ` Jakub Narebski
2007-12-03 16:38 ` Martin Koegler
0 siblings, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2007-12-03 12:06 UTC (permalink / raw)
To: Ismail Dönmez
Cc: Junio C Hamano, Martin Koegler, Alexandre Julliard,
Benjamin Close, git
Ismail Dönmez <ismail@pardus.org.tr> writes:
> Written on Monday 03 December 2007 at 12:14:43:
>> Benjamin Close <Benjamin.Close@clearchain.com> writes:
>>> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
>>> - if (defined $res) {
>>> - return $res;
>>> - } else {
>>> - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>>> - }
>>> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
>>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>>> }
>>
>> I thought the standard catch ... throw idiom in Perl was to do the above
>> like this:
>>
>> my $res;
>> eval { $res = decode_utf8($str, Encode::FB_CROAK); };
>> if ($@) {
>> return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>> }
>> return $res;
>
> I think this is correct, but the current code in gitweb doesn't look correct
> since it checks for $res and not $@.
The first version of the patch was created by Martin Koegler. I
participated in creating the version which is now in gitweb, but I
have to say that I wrote it based on the decode_utf8
documentation... which doesn't necessarily agree with the facts :-(
I'm all for the "throw idiom" version. Ack.
--
Jakub Narebski
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 12:06 ` Jakub Narebski
@ 2007-12-03 16:38 ` Martin Koegler
2007-12-03 17:02 ` Jakub Narebski
0 siblings, 1 reply; 24+ messages in thread
From: Martin Koegler @ 2007-12-03 16:38 UTC (permalink / raw)
To: Jakub Narebski
Cc: Ismail Dönmez, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git
On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
> Ismail Dönmez <ismail@pardus.org.tr> writes:
> > Written on Monday 03 December 2007 at 12:14:43:
> >> Benjamin Close <Benjamin.Close@clearchain.com> writes:
> >>> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> >>> - if (defined $res) {
> >>> - return $res;
> >>> - } else {
> >>> - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> >>> - }
> >>> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
> >>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> >>> }
This version is broken on Debian sarge and etch. Feeding a UTF-8 and a latin1
encoding of the same character sequence yields different results.
> >>
> >> I thought the standard catch ... throw idiom in Perl was to do the above
> >> like this:
> >>
> >> my $res;
> >> eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> >> if ($@) {
> >> return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> >> }
> >> return $res;
> >
> > I think this is correct, but the current code in gitweb doesn't look correct
> > since it checks for $res and not $@.
>
> First version of the patch was created by Martin Koegler. I have
> participated in creating the version which is now in gitweb, but I
> have to say that I wrote it based on decode_utf8
> documentation... which doesn't necessarily agree with facts :-(
eval { $res = decode_utf8(...); }
if ($@)
return decode(...);
return $res
or
eval { $res = decode_utf8(...); }
if (defined $res)
return $res;
else
return decode(...);
show the same (wrong) behaviour on Debian sarge. They do not always
decode non UTF-8 characters correctly, eg.
#öäü does not work
#äöüä does work
On Debian etch, both versions are working.
> I'm all for the "throw idiom" version. Ack.
mfg Martin Kögler
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 16:38 ` Martin Koegler
@ 2007-12-03 17:02 ` Jakub Narebski
2007-12-03 21:46 ` Benjamin Close
2007-12-04 7:50 ` Martin Koegler
0 siblings, 2 replies; 24+ messages in thread
From: Jakub Narebski @ 2007-12-03 17:02 UTC (permalink / raw)
To: Martin Koegler
Cc: Ismail Dönmez, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Mon, 3 Dec 2007, Martin Koegler wrote:
> On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
>> Ismail Dönmez <ismail@pardus.org.tr> writes:
>>> Written on Monday 03 December 2007 at 12:14:43:
>>>> Benjamin Close <Benjamin.Close@clearchain.com> writes:
>>>>> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
>>>>> - if (defined $res) {
>>>>> - return $res;
>>>>> - } else {
>>>>> - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>>>>> - }
>>>>> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
>>>>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>>>>> }
>
> This version is broken on Debian sarge and etch. Feeding a UTF-8 and a latin1
> encoding of the same character sequence yields to different results.
[...]
> eval { $res = decode_utf8(...); }
> if ($@)
> return decode(...);
> return $res
>
> or
>
> eval { $res = decode_utf8(...); }
> if (defined $res)
> return $res;
> else
> return decode(...);
>
> show the same (wrong) behaviour on Debian sarge. They do not always
> decode non UTF-8 characters correctly, eg.
> #öäü does not work
> #äöüä does work
>
> On Debian etch, both versions are working.
I don't know enough Perl to decide if it is a bug in gitweb's usage
of decode_utf8, if it is a bug in your version of Encode, or if it
is a bug in Encode.
I have sent a copy of this mail to the maintainers of the Encode Perl module.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 17:02 ` Jakub Narebski
@ 2007-12-03 21:46 ` Benjamin Close
2007-12-03 22:20 ` Ismail Dönmez
2007-12-04 8:04 ` Martin Koegler
2007-12-04 7:50 ` Martin Koegler
1 sibling, 2 replies; 24+ messages in thread
From: Benjamin Close @ 2007-12-03 21:46 UTC (permalink / raw)
To: Jakub Narebski
Cc: Martin Koegler, Ismail Dönmez, Junio C Hamano,
Alexandre Julliard, git, Perl Unicode Mailing List, Dan Kogai
Jakub Narebski wrote:
> On Mon, 3 Dec 2007, Martin Koegler wrote:
>
>> On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
>>
>>> Ismail Dönmez <ismail@pardus.org.tr> writes:
>>>
>>>> Written on Monday 03 December 2007 at 12:14:43:
>>>>
>>>>> Benjamin Close <Benjamin.Close@clearchain.com> writes:
>>>>>
>>>>>> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
>>>>>> - if (defined $res) {
>>>>>> - return $res;
>>>>>> - } else {
>>>>>> - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>>>>>> - }
>>>>>> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
>>>>>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>>>>>> }
>>>>>>
>> This version is broken on Debian sarge and etch. Feeding a UTF-8 and a latin1
>> encoding of the same character sequence yields to different results.
>>
>
>
For the record, this was on a debian sid machine.
#perl --version
This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
and the result of not using the original patch was:
<h1>Software error:</h1>
<pre>Cannot decode string with wide characters at /usr/lib/perl/5.8/Encode.pm line 166.
</pre>
I haven't tried the other solutions tested here.
>> eval { $res = decode_utf8(...); }
>> if ($@)
>> return decode(...);
>> return $res
>>
>> or
>>
>> eval { $res = decode_utf8(...); }
>> if (defined $res)
>> return $res;
>> else
>> return decode(...);
>>
>> show the same (wrong) behaviour on Debian sarge. They do not always
>> decode non UTF-8 characters correctly, eg.
>> #öäü does not work
>> #äöüä does work
>>
>> On Debian etch, both versions are working.
>>
>
> I don't know enough Perl to decide if it is a bug in gitweb usage
> of decode_utf8, if it is a bug in your version of Encode, or if it
> is bug in Encode.
>
> Send copy of this mail to maintainers of Encode perl module.
>
Ismail, do you know if sid was also broken?
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 21:46 ` Benjamin Close
@ 2007-12-03 22:20 ` Ismail Dönmez
2007-12-03 23:04 ` Benjamin Close
2007-12-04 8:04 ` Martin Koegler
1 sibling, 1 reply; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-03 22:20 UTC (permalink / raw)
To: Benjamin Close
Cc: Jakub Narebski, Martin Koegler, Junio C Hamano,
Alexandre Julliard, git, Perl Unicode Mailing List, Dan Kogai
[-- Attachment #1: Type: text/plain, Size: 1392 bytes --]
Written on Monday 03 December 2007 at 23:46:24:
> Jakub Narebski wrote:
> > On Mon, 3 Dec 2007, Martin Koegler wrote:
> >> On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
> >>> Ismail Dönmez <ismail@pardus.org.tr> writes:
> >>>> Written on Monday 03 December 2007 at 12:14:43:
> >>>>> Benjamin Close <Benjamin.Close@clearchain.com> writes:
> >>>>>> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> >>>>>> - if (defined $res) {
> >>>>>> - return $res;
> >>>>>> - } else {
> >>>>>> - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> >>>>>> - }
> >>>>>> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
> >>>>>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> >>>>>> }
> >>
> >> This version is broken on Debian sarge and etch. Feeding a UTF-8 and a
> >> latin1 encoding of the same character sequence yields to different
> >> results.
>
> For the record, this was on a debian sid machine.
>
> #perl --version
> This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
>
> and the result of not using the original patch was:
>
> <h1>Software error:</h1>
> <pre>Cannot decode string with wide characters at
> /usr/lib/perl/5.8/Encode.pm line 166. </pre>
Can you try the attached patch?
--
Never learn by your mistakes, if you do you may never dare to try again.
[-- Attachment #2: utf8-username.patch --]
[-- Type: text/x-diff, Size: 313 bytes --]
--- gitweb/gitweb.perl 2007-11-28 11:33:14.000000000 +0200
+++ gitweb/gitweb.perl 2007-11-28 11:33:42.000000000 +0200
@@ -2159,7 +2159,7 @@
}
my $owner = $gcos;
$owner =~ s/[,;].*$//;
- return to_utf8($owner);
+ return $owner;
}
## ......................................................................
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 22:20 ` Ismail Dönmez
@ 2007-12-03 23:04 ` Benjamin Close
2007-12-03 23:37 ` Jakub Narebski
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Close @ 2007-12-03 23:04 UTC (permalink / raw)
To: Ismail Dönmez
Cc: Jakub Narebski, Martin Koegler, Junio C Hamano,
Alexandre Julliard, git, Perl Unicode Mailing List, Dan Kogai
On Tue, Dec 04, 2007 at 12:20:26AM +0200, Ismail Dönmez wrote:
> Written on Monday 03 December 2007 at 23:46:24:
> > Jakub Narebski wrote:
> > > On Mon, 3 Dec 2007, Martin Koegler wrote:
> > >> On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
> > >>> Ismail Dönmez <ismail@pardus.org.tr> writes:
> > >>>> Written on Monday 03 December 2007 at 12:14:43:
> > >>>>> Benjamin Close <Benjamin.Close@clearchain.com> writes:
> > >>>>>> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> > >>>>>> - if (defined $res) {
> > >>>>>> - return $res;
> > >>>>>> - } else {
> > >>>>>> - return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> > >>>>>> - }
> > >>>>>> + eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
> > >>>>>> + return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> > >>>>>> }
> > >>
> > >> This version is broken on Debian sarge and etch. Feeding a UTF-8 and a
> > >> latin1 encoding of the same character sequence yields to different
> > >> results.
> >
> > For the record, this was on a debian sid machine.
> >
> > #perl --version
> > This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
> >
> > and the result of not using the original patch was:
> >
> > <h1>Software error:</h1>
> > <pre>Cannot decode string with wide characters at
> > /usr/lib/perl/5.8/Encode.pm line 166. </pre>
>
> Can you try the attached patch?
I confirm that the patch corrects the problem.
Without it I get the Cannot decode string error. With it gitweb displays
correctly.
Cheers,
Benjamin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 23:04 ` Benjamin Close
@ 2007-12-03 23:37 ` Jakub Narebski
2007-12-04 4:12 ` Ismail Dönmez
0 siblings, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2007-12-03 23:37 UTC (permalink / raw)
To: Benjamin Close
Cc: Ismail Dönmez, Martin Koegler, Junio C Hamano,
Alexandre Julliard, git
On Tue, 4 Dec 2007, Benjamin Close wrote:
> On Tue, Dec 04, 2007 at 12:20:26AM +0200, Ismail Donmez wrote:
> >
> > Can you try the attached patch?
>
> I confirm that the patch corrects the problem.
>
> Without it I get the Cannot decode string error. With it gitweb
> displays correctly.
But the patch _avoids_ the issue (it does not convert owner to utf8), rather
than solving it, if I understand it correctly. What if gecos is in
utf-8?
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 23:37 ` Jakub Narebski
@ 2007-12-04 4:12 ` Ismail Dönmez
0 siblings, 0 replies; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-04 4:12 UTC (permalink / raw)
To: Jakub Narebski
Cc: Benjamin Close, Martin Koegler, Junio C Hamano,
Alexandre Julliard, git
Written on Tuesday 04 December 2007 at 01:37:35:
> On Tue, 4 Dec 2007, Benjamin Close wrote:
> > On Tue, Dec 04, 2007 at 12:20:26AM +0200, Ismail Donmez wrote:
> > > Can you try the attached patch?
> >
> > I confirm that the patch corrects the problem.
> >
> > Without it I get the Cannot decode string error. With it gitweb
> > displays correctly.
>
> But the patch _avoids_ the issue (it does not convert owner to utf8), rather
> than solving it, if I understand it correctly. What if gecos is in
> utf-8?
Indeed, it's a workaround, but a UTF-8 username is displayed correctly in gitweb, so
my understanding was that the gecos field is already UTF-8.
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 17:02 ` Jakub Narebski
2007-12-03 21:46 ` Benjamin Close
@ 2007-12-04 7:50 ` Martin Koegler
2007-12-04 7:55 ` Ismail Dönmez
1 sibling, 1 reply; 24+ messages in thread
From: Martin Koegler @ 2007-12-04 7:50 UTC (permalink / raw)
To: Jakub Narebski
Cc: Ismail Dönmez, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Mon, Dec 03, 2007 at 06:02:54PM +0100, Jakub Narebski wrote:
> On Mon, 3 Dec 2007, Martin Koegler wrote:
> > eval { $res = decode_utf8(...); }
> > if ($@)
> > return decode(...);
> > return $res
> >
> > or
> >
> > eval { $res = decode_utf8(...); }
> > if (defined $res)
> > return $res;
> > else
> > return decode(...);
> >
> > show the same (wrong) behaviour on Debian sarge. They do not always
> > decode non UTF-8 characters correctly, eg.
> > #öäü does not work
> > #äöüä does work
> >
> > On Debian etch, both versions are working.
>
> I don't know enough Perl to decide if it is a bug in gitweb usage
> of decode_utf8, if it is a bug in your version of Encode, or if it
> is bug in Encode.
>
> Send copy of this mail to maintainers of Encode perl module.
The bug affects old versions of perl (Debian sarge = oldstable).
As it works on the newer Debian etch, do you really think that it is
a good idea to report the issue?
How would you handle a bug report that reports a bug for gitweb in GIT
1.4 and tells you that a newer version works?
As Debian sarge has reached its end of life, the distribution will
probably not issue an update either.
mfg Martin Kögler
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 7:50 ` Martin Koegler
@ 2007-12-04 7:55 ` Ismail Dönmez
2007-12-04 8:16 ` Martin Koegler
0 siblings, 1 reply; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-04 7:55 UTC (permalink / raw)
To: Martin Koegler
Cc: Jakub Narebski, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
Written on Tuesday 04 December 2007 at 09:50:28:
> The bug affects old versions of perl (Debian sarge = oldstable).
> As it works on the newer Debian etch, do you really think, that it is
> a good idea to report issue?
Same problem here with v5.8.8, which is the latest stable perl5 release.
Regards,
ismail
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-03 21:46 ` Benjamin Close
2007-12-03 22:20 ` Ismail Dönmez
@ 2007-12-04 8:04 ` Martin Koegler
2007-12-04 8:12 ` Ismail Dönmez
1 sibling, 1 reply; 24+ messages in thread
From: Martin Koegler @ 2007-12-04 8:04 UTC (permalink / raw)
To: Benjamin Close
Cc: Jakub Narebski, Ismail Dönmez, Junio C Hamano,
Alexandre Julliard, git, Perl Unicode Mailing List, Dan Kogai
On Tue, Dec 04, 2007 at 08:16:24AM +1030, Benjamin Close wrote:
> Jakub Narebski wrote:
> >On Mon, 3 Dec 2007, Martin Koegler wrote:
> >>On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
> >>>Ismail Dönmez <ismail@pardus.org.tr> writes:
> >>>>Written on Monday 03 December 2007 at 12:14:43:
> >>>>>Benjamin Close <Benjamin.Close@clearchain.com> writes:
> >>>>>>- eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> >>>>>>- if (defined $res) {
> >>>>>>- return $res;
> >>>>>>- } else {
> >>>>>>- return decode($fallback_encoding, $str,
> >>>>>>Encode::FB_DEFAULT);
> >>>>>>- }
> >>>>>>+ eval { return ($res = decode_utf8($str, Encode::FB_CROAK));
> >>>>>>};
> >>>>>>+ return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> >>>>>> }
> >>>>>>
> >>This version is broken on Debian sarge and etch. Feeding a UTF-8 and a
> >>latin1
> >>encoding of the same character sequence yields to different results.
>
> For the record, this was on a debian sid machine.
>
> #perl --version
> This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
>
> and the result of not using the original patch was:
>
> <h1>Software error:</h1>
> <pre>Cannot decode string with wide characters at
> /usr/lib/perl/5.8/Encode.pm line 166.
> </pre>
>
>
> I haven't tried the other solutions tested here.
Debian etch also has v5.8.8.
My main question is: why is the error not caught?
I'm not a perl programmer, but in your patch the first line is a
NOP. The return in eval seems to return only from the eval block, so
any text is decoded as latin1 by the second statement.
In the original version, decode($fallback_encoding, $str,
Encode::FB_DEFAULT) cannot emit an error, or else it would in your
version too.
In your version, eval is able to suppress the error of
decode_utf8($str, Encode::FB_CROAK);, but not in the original version.
Strange.
mfg Martin Kögler
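The point that a return inside eval { ... } only leaves the eval block, not the enclosing sub, can be checked in isolation. The sub names below are made up for this illustration.

```perl
use strict;
use warnings;

# 'return' inside eval {} ends the eval block only; the sub carries on.
sub return_in_eval {
	eval { return 'from eval' };
	return 'from sub';
}

# The value handed to 'return' becomes the value of the eval expression.
sub value_of_eval {
	my $res = eval { return 'from eval' };
	return $res;
}

print return_in_eval(), "\n";
print value_of_eval(), "\n";
```

This would explain why a one-line eval followed by an unconditional return decodes every input with the fallback encoding: the sub always falls through to its last statement.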
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:04 ` Martin Koegler
@ 2007-12-04 8:12 ` Ismail Dönmez
2007-12-04 8:20 ` Martin Koegler
0 siblings, 1 reply; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-04 8:12 UTC (permalink / raw)
To: Martin Koegler
Cc: Benjamin Close, Jakub Narebski, Junio C Hamano,
Alexandre Julliard, git, Perl Unicode Mailing List, Dan Kogai
On Tuesday 04 December 2007 at 10:04:07, Martin Koegler wrote:
> On Tue, Dec 04, 2007 at 08:16:24AM +1030, Benjamin Close wrote:
> > Jakub Narebski wrote:
> > >On Mon, 3 Dec 2007, Martin Koegler wrote:
> > >>On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
> > >>>Ismail Dönmez <ismail@pardus.org.tr> writes:
> > >>>>Written on Monday 03 December 2007 at 12:14:43:
> > >>>>>Benjamin Close <Benjamin.Close@clearchain.com> writes:
> > >>>>>>- eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> > >>>>>>- if (defined $res) {
> > >>>>>>- return $res;
> > >>>>>>- } else {
> > >>>>>>- return decode($fallback_encoding, $str,
> > >>>>>>Encode::FB_DEFAULT);
> > >>>>>>- }
> > >>>>>>+ eval { return ($res = decode_utf8($str, Encode::FB_CROAK));
> > >>>>>>};
> > >>>>>>+ return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> > >>>>>> }
> > >>
> > >>This version is broken on Debian sarge and etch. Feeding a UTF-8 and a
> > >>latin1
> > >>encoding of the same character sequence yields to different results.
> >
> > For the record, this was on a debian sid machine.
> >
> > #perl --version
> > This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
> >
> > and the result of not using the original patch was:
> >
> > <h1>Software error:</h1>
> > <pre>Cannot decode string with wide characters at
> > /usr/lib/perl/5.8/Encode.pm line 166.
> > </pre>
> >
> >
> > I haven't tried the other solutions tested here.
>
> Debian etch also has v5.8.8.
>
> My main question is: why is the error not caught?
>
> I'm not a perl programmer, but in your patch the first line is a
> NOP. The return in eval seems to return only from the eval block, so
> any text is decoded as latin1 by the second statement.
>
> In the original version, decode($fallback_encoding, $str,
> Encode::FB_DEFAULT) cannot emit an error, or else it would in your
> version too.
>
> In your version, eval is able to suppress the error of
> decode_utf8($str, Encode::FB_CROAK);, but not in the original version.
I think a better method is simply to use (not tested):
if( is_utf8($str) )
{
return decode_utf8($str);
}
else {
return decode($str);
}
Regards,
ismail
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 7:55 ` Ismail Dönmez
@ 2007-12-04 8:16 ` Martin Koegler
2007-12-04 8:28 ` Ismail Dönmez
0 siblings, 1 reply; 24+ messages in thread
From: Martin Koegler @ 2007-12-04 8:16 UTC (permalink / raw)
To: Ismail Dönmez
Cc: Jakub Narebski, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Tue, Dec 04, 2007 at 09:55:04AM +0200, Ismail Dönmez wrote:
> Written on Tuesday 04 December 2007 at 09:50:28:
> > The bug affects old versions of perl (Debian sarge = oldstable).
> > As it works on the newer Debian etch, do you really think, that it is
> > a good idea to report issue?
>
> Same problem here with v5.8.8 which is latest stable perl5 release.
I have put together a small Perl script that tests the various ways
of decoding that have been posted on the list. The first test is
wrong by design. A working decoding method should result in
"#öäü#äöü".
Debian sarge:
#öäü#ÀöÌ
##äöü
##äöü
##äöü
Debian etch, OpenSuSE 10.2, Fedora 7:
#öäü#ÀöÌ
#öäü#äöü
#öäü#äöü
#öäü#äöü
mfg Martin Kögler
#!/usr/bin/perl
use Encode;

sub t {
	my $str = shift;
	my ($res);
	eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
	return decode("latin1", $str, Encode::FB_DEFAULT);
}

sub t1 {
	my $str = shift;
	my ($res);
	eval { ($res = decode_utf8($str, Encode::FB_CROAK)); };
	if ($@) {
		return decode("latin1", $str, Encode::FB_DEFAULT);
	} else {
		return $res;
	}
}

sub t2 {
	my $str = shift;
	my ($res);
	eval { $res = decode_utf8($str, Encode::FB_CROAK); };
	if (defined $res) {
		return $res;
	} else {
		return decode("latin1", $str, Encode::FB_DEFAULT);
	}
}

sub t3 {
	my $str = shift;
	my $res;
	eval { $res = decode_utf8 ($str, 1); };
	return $res || decode('latin1', $str);
}

print t("#öäü");
print t("#ÀöÌ");
print "\n";
print t1("#öäü");
print t1("#ÀöÌ");
print "\n";
print t2("#öäü");
print t2("#ÀöÌ");
print "\n";
print t3("#öäü");
print t3("#ÀöÌ");
print "\n";
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:12 ` Ismail Dönmez
@ 2007-12-04 8:20 ` Martin Koegler
0 siblings, 0 replies; 24+ messages in thread
From: Martin Koegler @ 2007-12-04 8:20 UTC (permalink / raw)
To: Ismail Dönmez
Cc: Benjamin Close, Jakub Narebski, Junio C Hamano,
Alexandre Julliard, git, Perl Unicode Mailing List, Dan Kogai
On Tue, Dec 04, 2007 at 10:12:50AM +0200, Ismail Dönmez wrote:
> I think just a better method is to use (not tested):
>
> if( is_utf8($str) )
> {
> return decode_utf8($str);
> }
> else {
> return decode($str);
> }
I already tried this function. It does not test whether a string is
really UTF-8. It seems to be intended to check whether perl stores
the string internally in a multi-byte encoding.
mfg Martin Kögler.
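The difference between the internal flag and actual UTF-8 validity can be seen with a short script. It is a sketch only: the sample strings are invented for the illustration, and it uses the built-in utf8::is_utf8 and utf8::decode rather than the Encode functions.

```perl
use strict;
use warnings;

my $octets = "caf\xc3\xa9";   # "café" as UTF-8 octets, held as a byte string

# is_utf8 only reports perl's internal flag, which is off for this string
# even though the octets are well-formed UTF-8.
my $flag_before = utf8::is_utf8($octets) ? 1 : 0;

# utf8::decode works in place and returns true only for well-formed UTF-8.
my $chars = $octets;
my $decoded_ok = utf8::decode($chars) ? 1 : 0;

# The same word as latin1 octets is not well-formed UTF-8.
my $latin1 = "caf\xe9 au lait";
my $latin1_ok = utf8::decode($latin1) ? 1 : 0;

print "$flag_before $decoded_ok $latin1_ok ", length($chars), "\n";
```

So it is the return value of utf8::decode, not is_utf8, that tells valid UTF-8 input apart from input in some other encoding.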
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:16 ` Martin Koegler
@ 2007-12-04 8:28 ` Ismail Dönmez
2007-12-04 8:33 ` Ismail Dönmez
0 siblings, 1 reply; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-04 8:28 UTC (permalink / raw)
To: Martin Koegler
Cc: Jakub Narebski, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Tuesday 04 December 2007 at 10:16:34, Martin Koegler wrote:
[...]
> print t("#öäü");
> print t("#ÀöÌ");
> print "\n";
How about this one? It doesn't even use Encode, just the built-in utf8
functions:
[~]> cat test.pl
binmode STDOUT, ':utf8';
my $str = "#öäü";
if (utf8::valid($str))
{
utf8::decode($str);
}
print $str."\n";
[~]> perl test.pl
#öäü
Regards,
ismail
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:28 ` Ismail Dönmez
@ 2007-12-04 8:33 ` Ismail Dönmez
2007-12-04 8:44 ` Martin Koegler
0 siblings, 1 reply; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-04 8:33 UTC (permalink / raw)
To: Martin Koegler
Cc: Jakub Narebski, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Tuesday 04 December 2007 at 10:28:59, Ismail Dönmez wrote:
> On Tuesday 04 December 2007 at 10:16:34, Martin Koegler wrote:
> [...]
>
> > print t("#öäü");
> > print t("#ÀöÌ");
> > print "\n";
>
> How about this one, doesn't even use Encode, uses just built-in utf8
> function :
>
> [~]> cat test.pl
> binmode STDOUT, ':utf8';
>
> my $str = "#öäü";
>
> if (utf8::valid($str))
> {
> utf8::decode($str);
> }
>
> print $str."\n";
>
> [~]> perl test.pl
> #öäü
The following to_utf8 function works for me:
sub to_utf8 {
	my $str = shift;

	if(utf8::valid($str))
	{
		utf8::decode($str);
	}

	return $str;
}
Regards,
ismail
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:33 ` Ismail Dönmez
@ 2007-12-04 8:44 ` Martin Koegler
2007-12-04 8:47 ` Ismail Dönmez
0 siblings, 1 reply; 24+ messages in thread
From: Martin Koegler @ 2007-12-04 8:44 UTC (permalink / raw)
To: Ismail Dönmez
Cc: Jakub Narebski, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Tue, Dec 04, 2007 at 10:33:39AM +0200, Ismail Dönmez wrote:
> Following to_utf8 function works for me :
For me too (Debian sarge+etch).
> sub to_utf8 {
> 	my $str = shift;
>
> 	if(utf8::valid($str))
> 	{
> 		utf8::decode($str);
> 	}
>
> 	return $str;
In the original thread, there was some discussion that some people
might want a different fallback encoding. So maybe you should
keep the second call to decode for the fallback encoding.
> }
mfg Martin Kögler
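A minimal sketch of what keeping the fallback could look like, not the patch from this thread. Two assumptions are made here: $fallback_encoding is set to latin1, and the sub name to_utf8_sketch is hypothetical. Note also that perldoc utf8 describes utf8::valid as an internal consistency check that returns true for any string held as bytes, so this sketch branches on the return value of utf8::decode instead; that substitution is an assumption of the sketch.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: latin1 as the fallback, set here only for illustration.
our $fallback_encoding = 'latin1';

# Hypothetical variant of to_utf8 that keeps the fallback decode.
sub to_utf8_sketch {
    my $str = shift;
    return undef unless defined $str;

    # utf8::decode converts in place and returns false when the octets
    # are not well-formed UTF-8, leaving $str untouched in that case.
    return $str if utf8::decode($str);

    # Not UTF-8: interpret the octets in the fallback encoding instead.
    return Encode::decode($fallback_encoding, $str, Encode::FB_DEFAULT());
}
```

With this shape, both UTF-8 octets and latin1 octets for "#öäü" should come back as the same four-character string, and only the second input goes through Encode.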
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:44 ` Martin Koegler
@ 2007-12-04 8:47 ` Ismail Dönmez
2007-12-04 8:55 ` Ismail Dönmez
0 siblings, 1 reply; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-04 8:47 UTC (permalink / raw)
To: Martin Koegler
Cc: Jakub Narebski, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Tuesday 04 December 2007 10:44:12, Martin Koegler wrote:
> On Tue, Dec 04, 2007 at 10:33:39AM +0200, Ismail Dönmez wrote:
> > Following to_utf8 function works for me :
>
> For me too (Debian sarge+etch).
Thanks for testing.
> > sub to_utf8 {
> >     my $str = shift;
> >
> >     if(utf8::valid($str))
> >     {
> >         utf8::decode($str);
> >     }
> >
> >     return $str;
>
> In the original thread, there was some discussion that some people
> might want a different fallback encoding. So maybe you should
> keep the second call to decode for the fallback encoding.
Probably, I just wanted to fix this damn UTF-8 bug surfacing over and over =)
Regards,
ismail
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:47 ` Ismail Dönmez
@ 2007-12-04 8:55 ` Ismail Dönmez
2007-12-04 9:07 ` Jakub Narebski
2007-12-04 10:11 ` Wincent Colaiuta
0 siblings, 2 replies; 24+ messages in thread
From: Ismail Dönmez @ 2007-12-04 8:55 UTC (permalink / raw)
To: Martin Koegler
Cc: Jakub Narebski, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git, Perl Unicode Mailing List, Dan Kogai
On Tuesday 04 December 2007 10:47:39, Ismail Dönmez wrote:
> On Tuesday 04 December 2007 10:44:12, Martin Koegler wrote:
> > On Tue, Dec 04, 2007 at 10:33:39AM +0200, Ismail Dönmez wrote:
> > > Following to_utf8 function works for me :
> >
> > For me too (Debian sarge+etch).
>
> Thanks for testing.
Use Perl built-in utf8 function for UTF-8 decoding.
Signed-off-by: İsmail Dönmez <ismail@pardus.org.tr>
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index ff5daa7..db255c1 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -695,10 +695,9 @@ sub validate_refname {
# in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning
sub to_utf8 {
my $str = shift;
- my $res;
- eval { $res = decode_utf8($str, Encode::FB_CROAK); };
- if (defined $res) {
- return $res;
+ if (utf8::valid($str)) {
+ utf8::decode($str);
+ return $str;
} else {
return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
}
--
Never learn by your mistakes, if you do you may never dare to try again.
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:55 ` Ismail Dönmez
@ 2007-12-04 9:07 ` Jakub Narebski
2007-12-04 10:11 ` Wincent Colaiuta
1 sibling, 0 replies; 24+ messages in thread
From: Jakub Narebski @ 2007-12-04 9:07 UTC (permalink / raw)
To: Ismail Dönmez
Cc: Martin Koegler, Junio C Hamano, Alexandre Julliard,
Benjamin Close, git
On Tue, 4 Dec 2007, Ismail Dönmez wrote:
> Use Perl built-in utf8 function for UTF-8 decoding.
>
> Signed-off-by: İsmail Dönmez <ismail@pardus.org.tr>
Looks nice. I have not tested it, but if it works: Ack.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Fix UTF Encoding issue
2007-12-04 8:55 ` Ismail Dönmez
2007-12-04 9:07 ` Jakub Narebski
@ 2007-12-04 10:11 ` Wincent Colaiuta
1 sibling, 0 replies; 24+ messages in thread
From: Wincent Colaiuta @ 2007-12-04 10:11 UTC (permalink / raw)
To: Ismail Dönmez
Cc: Martin Koegler, Jakub Narebski, Junio C Hamano,
Alexandre Julliard, Benjamin Close, git,
Perl Unicode Mailing List, Dan Kogai
On 4/12/2007, at 9:55, Ismail Dönmez wrote:
> On Tuesday 04 December 2007 10:47:39, Ismail Dönmez wrote:
>> On Tuesday 04 December 2007 10:44:12, Martin Koegler wrote:
>>> On Tue, Dec 04, 2007 at 10:33:39AM +0200, Ismail Dönmez wrote:
>>>> Following to_utf8 function works for me :
>>>
>>> For me too (Debian sarge+etch).
>>
>> Thanks for testing.
>
> Use Perl built-in utf8 function for UTF-8 decoding.
>
> Signed-off-by: İsmail Dönmez <ismail@pardus.org.tr>
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index ff5daa7..db255c1 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -695,10 +695,9 @@ sub validate_refname {
> # in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning
> sub to_utf8 {
> my $str = shift;
> - my $res;
> - eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> - if (defined $res) {
> - return $res;
> + if (utf8::valid($str)) {
> + utf8::decode($str);
> + return $str;
This is good as it fixes another problem which some may have
encountered. On at least one distro that I use (Red Hat Enterprise
Linux 3) the Encode module is very old (it's 1.83; the latest release
is 2.23), and so gitweb won't even run, dying during compilation with
this:
Too many arguments for Encode::decode_utf8 at gitweb.cgi line 686,
near "Encode::FB_CROAK)"
Of course, the workaround is to install a newer version of the module,
but this patch eliminates that dependency which is IMO a good thing.
Cheers,
Wincent
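For anyone wanting to check whether their own installation is affected, here is a rough probe. It is only a sketch: the helper name decode_utf8_takes_check is hypothetical, and it assumes the failure shows up as a compile-time error like the one quoted above, which a string eval can trap.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of gitweb): returns true if the installed
# Encode compiles and runs decode_utf8 with a second CHECK argument.
sub decode_utf8_takes_check {
    # String eval, so that a compile-time "Too many arguments" error is
    # trapped instead of aborting the whole script.
    my $ok = eval q{
        my $probe = "x";
        Encode::decode_utf8($probe, Encode::FB_CROAK());
        1;
    };
    return $ok ? 1 : 0;
}

printf "Encode %s: decode_utf8 with CHECK %s\n",
    Encode->VERSION,
    decode_utf8_takes_check() ? "works" : "is not supported";
```

The utf8:: functions used in the patch are built into perl itself, so they need no such probe.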
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2007-12-04 10:12 UTC | newest]
Thread overview: 24+ messages
2007-12-03 10:02 Fix UTF Encoding issue Benjamin Close
2007-12-03 10:14 ` Junio C Hamano
2007-12-03 11:32 ` Ismail Dönmez
2007-12-03 12:06 ` Jakub Narebski
2007-12-03 16:38 ` Martin Koegler
2007-12-03 17:02 ` Jakub Narebski
2007-12-03 21:46 ` Benjamin Close
2007-12-03 22:20 ` Ismail Dönmez
2007-12-03 23:04 ` Benjamin Close
2007-12-03 23:37 ` Jakub Narebski
2007-12-04 4:12 ` Ismail Dönmez
2007-12-04 8:04 ` Martin Koegler
2007-12-04 8:12 ` Ismail Dönmez
2007-12-04 8:20 ` Martin Koegler
2007-12-04 7:50 ` Martin Koegler
2007-12-04 7:55 ` Ismail Dönmez
2007-12-04 8:16 ` Martin Koegler
2007-12-04 8:28 ` Ismail Dönmez
2007-12-04 8:33 ` Ismail Dönmez
2007-12-04 8:44 ` Martin Koegler
2007-12-04 8:47 ` Ismail Dönmez
2007-12-04 8:55 ` Ismail Dönmez
2007-12-04 9:07 ` Jakub Narebski
2007-12-04 10:11 ` Wincent Colaiuta