* gitweb - encoding problems
@ 2007-05-21 20:57 Martin Koegler
  2007-05-21 21:09 ` Ismail Dönmez
  2007-05-22  0:33 ` David Woodhouse
  0 siblings, 2 replies; 4+ messages in thread
From: Martin Koegler @ 2007-05-21 20:57 UTC
  To: git; +Cc: Jakub Narebski, Junio C Hamano

I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
this encoding.

On perl v5.8.6, decode_utf8 returns undef for any value that is not valid UTF-8:

$ cat xy
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);

binmode STDOUT, ':utf8';

print decode_utf8('äöü');
$ perl xy
[Mon May 21 22:00:00 2007] xy: Use of uninitialized value in print at xy line 14.

If gitweb encounters, e.g., an umlaut (äöü) in a commit or tag, "Use of
uninitialized value" warnings are generated. In one case,
decode_utf8($long) in format_subject_html returns undef, which results
in an invalid link (the a tag contains a bare title attribute with no
value assigned) and a browser message saying that the HTML is not valid.

The previously installed version of git/gitweb (1.5.0rc3) showed only
small black rhombuses, but did not generate "uninitialized value"
messages or invalid HTML.

So there is a regression between git-1.5.0 and git-1.5.2.

Adding $var = encode_utf8($var) if (!defined decode_utf8($var)) at each
place that triggers an "uninitialized value" message gives the correct
result for me.
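The check above relies on decode_utf8 returning undef for malformed input, which is what perl v5.8.6 does here but not what the v5.8.8 test shows. A minimal sketch of a version-independent alternative, assuming ISO-8859-1 as the fallback encoding: ask Encode to die on malformed UTF-8 (FB_CROAK) and decode as ISO-8859-1 when it does. The helper name decode_with_fallback is hypothetical; it is not an existing gitweb function.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of gitweb): decode a byte string as UTF-8
# when it is valid UTF-8, otherwise decode it with a fallback encoding
# (ISO-8859-1 by default, an assumption made for this sketch).
sub decode_with_fallback {
    my ($octets, $fallback) = @_;
    return undef unless defined $octets;
    $fallback = 'iso-8859-1' unless defined $fallback;

    # FB_CROAK makes decode() die on malformed input, so the result no
    # longer depends on whether this Encode version returns undef or
    # substitutes U+FFFD by default. LEAVE_SRC stops decode() from
    # consuming $octets in place.
    my $chars = eval {
        Encode::decode('UTF-8', $octets,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $chars if defined $chars;

    # With the default ISO-8859-1 fallback every byte maps to a character,
    # so this step cannot fail.
    return Encode::decode($fallback, $octets);
}

# Usage: the ISO-8859-1 and UTF-8 spellings of "äöü" decode to the same string.
my $from_latin1 = decode_with_fallback("\xe4\xf6\xfc");
my $from_utf8   = decode_with_fallback("\xc3\xa4\xc3\xb6\xc3\xbc");
```

Unlike the undef test, this treats valid UTF-8 and ISO-8859-1 input the same way on both perl versions, at the cost of guessing the fallback encoding.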

I wanted to post a patch with these changes, as they solved my locale problem.
But then I tried the same on a different computer with a newer perl (v5.8.8):
$ cat x
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);

binmode STDOUT, ':utf8';

print decode_utf8('äöü');
$ perl x
ï¿½ï¿½ï¿½

Here perl decodes the ISO-8859-1 text into something different: each byte
becomes U+FFFD (the Unicode replacement character, ef bf bd in UTF-8):
00000000  ef bf bd ef bf bd ef bf  bd                       |ï¿½ï¿½ï¿½|

As a result, every umlaut is shown as a small black rhombus in gitweb
(but no invalid HTML is generated).
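To check what a given perl does with these bytes, a small sketch assuming the same ISO-8859-1 input as the test script (bytes e4 f6 fc). It prints how many characters decode_utf8 produced and the UTF-8 bytes that binmode STDOUT, ':utf8' would write; the exact replacement behaviour depends on the installed Encode version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# "äöü" as ISO-8859-1 bytes, i.e. e4 f6 fc as in the test script
my $latin1 = "\xe4\xf6\xfc";

# With the default CHECK, newer Encode versions replace malformed bytes
# with U+FFFD instead of returning undef.
my $decoded = decode_utf8($latin1);

# Encode the result back to octets, as binmode STDOUT, ':utf8' would.
my $octets = encode_utf8($decoded);

printf "%d character(s), output bytes: %s\n",
    length($decoded), join(' ', unpack('(H2)*', $octets));
```

If the output bytes are repeated ef bf bd groups, the umlauts were replaced by U+FFFD before gitweb ever escaped them, which matches the black rhombuses seen in the browser.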

Regards,
Martin Kögler

$ cat x | hexdump -C
00000000  23 21 2f 75 73 72 2f 62  69 6e 2f 70 65 72 6c 0a  |#!/usr/bin/perl.|
00000010  75 73 65 20 73 74 72 69  63 74 3b 0a 75 73 65 20  |use strict;.use |
00000020  77 61 72 6e 69 6e 67 73  3b 0a 75 73 65 20 43 47  |warnings;.use CG|
00000030  49 20 71 77 28 3a 73 74  61 6e 64 61 72 64 20 3a  |I qw(:standard :|
00000040  65 73 63 61 70 65 48 54  4d 4c 20 2d 6e 6f 73 74  |escapeHTML -nost|
00000050  69 63 6b 79 29 3b 0a 75  73 65 20 43 47 49 3a 3a  |icky);.use CGI::|
00000060  55 74 69 6c 20 71 77 28  75 6e 65 73 63 61 70 65  |Util qw(unescape|
00000070  29 3b 0a 75 73 65 20 43  47 49 3a 3a 43 61 72 70  |);.use CGI::Carp|
00000080  20 71 77 28 66 61 74 61  6c 73 54 6f 42 72 6f 77  | qw(fatalsToBrow|
00000090  73 65 72 29 3b 0a 75 73  65 20 45 6e 63 6f 64 65  |ser);.use Encode|
000000a0  3b 0a 75 73 65 20 46 63  6e 74 6c 20 27 3a 6d 6f  |;.use Fcntl ':mo|
000000b0  64 65 27 3b 0a 75 73 65  20 46 69 6c 65 3a 3a 46  |de';.use File::F|
000000c0  69 6e 64 20 71 77 28 29  3b 0a 75 73 65 20 46 69  |ind qw();.use Fi|
000000d0  6c 65 3a 3a 42 61 73 65  6e 61 6d 65 20 71 77 28  |le::Basename qw(|
000000e0  62 61 73 65 6e 61 6d 65  29 3b 0a 0a 62 69 6e 6d  |basename);..binm|
000000f0  6f 64 65 20 53 54 44 4f  55 54 2c 20 27 3a 75 74  |ode STDOUT, ':ut|
00000100  66 38 27 3b 0a 0a 70 72  69 6e 74 20 64 65 63 6f  |f8';..print deco|
00000110  64 65 5f 75 74 66 38 28  27 e4 f6 fc 27 29 3b 0a  |de_utf8('äöü');.|
00000120

Thread overview: 4+ messages
2007-05-21 20:57 gitweb - encoding problems Martin Koegler
2007-05-21 21:09 ` Ismail Dönmez
2007-05-22  0:33 ` David Woodhouse
2007-05-22  7:50   ` Jakub Narebski
