git.vger.kernel.org archive mirror
From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: git@vger.kernel.org
Cc: Jakub Narebski <jnareb@gmail.com>, Junio C Hamano <junkio@cox.net>
Subject: gitweb - encoding problems
Date: Mon, 21 May 2007 22:57:21 +0200
Message-ID: <20070521205721.GA21771@auto.tuwien.ac.at>

I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
this encoding.

On perl v5.8.6, decode_utf8 of any non-UTF-8 value returns undef:

$ cat xy
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);

binmode STDOUT, ':utf8';

print decode_utf8('äöü');
$ perl xy
[Mon May 21 22:00:00 2007] xy: Use of uninitialized value in print at xy line 14.

If gitweb encounters, e.g., an umlaut (äöü) in a commit or tag, "Use of
uninitialized value" messages are generated. In one case,
decode_utf8($long) in format_subject_html is undefined, which results
in an invalid link (the a tag contains only a title attribute without
any value assigned) and a browser message that the HTML is not valid.

The previously installed version of git/gitweb (1.5.0rc3) showed only
small black rhombuses, but didn't generate "uninitialized value"
messages or invalid HTML.

So there is a regression between git-1.5.0 and git-1.5.2.

Adding $var = encode_utf8($var) if (!defined decode_utf8($var)) at the
site of each "uninitialized value" message gives correct output for me.
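
A variation on the workaround above would be to do the check once in a
helper instead of at every call site. The following is only a minimal
sketch under that assumption: to_utf8_or_latin1 is a hypothetical name,
not an existing gitweb function, and ISO-8859-1 is assumed as the
fallback because it is the locale described here. It tries a strict
UTF-8 decode first and falls back to ISO-8859-1 when the bytes are not
valid UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper (not part of gitweb): decode a byte string as strict
# UTF-8; if the bytes are not valid UTF-8, assume they are ISO-8859-1.
sub to_utf8_or_latin1 {
    my ($bytes) = @_;
    return undef unless defined $bytes;

    # decode() with a CHECK argument may modify its input, so use a copy.
    my $copy = $bytes;
    my $text = eval { decode('UTF-8', $copy, FB_CROAK) };
    return $text if defined $text;

    # Every byte sequence is valid ISO-8859-1, so this cannot fail.
    return decode('ISO-8859-1', $bytes);
}

binmode STDOUT, ':utf8';

# The three bytes e4 f6 fc are "äöü" in ISO-8859-1 and invalid as UTF-8.
print to_utf8_or_latin1("\xe4\xf6\xfc"), "\n";
```

Because FB_CROAK makes the failure explicit, the result should not
depend on what a particular perl/Encode version returns by default for
malformed input.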

I wanted to post a patch with these changes, as it solved my locale problem.
But then I tried the same on a different computer with a newer perl (v5.8.8).
$ cat x
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);

binmode STDOUT, ':utf8';

print decode_utf8('äöü');
$ perl x
ᅵᅵï¿

Here perl decodes the ISO-8859-1 text to something differnent:
00000000  ef bf bd ef bf bd ef bf  bd                       |ᅵᅵï¿

The result is that all umlauts are shown as small black rhombuses
in gitweb (and there is no invalid HTML).

Regards, Martin Kögler

$ cat x | hexdump -C
00000000  23 21 2f 75 73 72 2f 62  69 6e 2f 70 65 72 6c 0a  |#!/usr/bin/perl.|
00000010  75 73 65 20 73 74 72 69  63 74 3b 0a 75 73 65 20  |use strict;.use |
00000020  77 61 72 6e 69 6e 67 73  3b 0a 75 73 65 20 43 47  |warnings;.use CG|
00000030  49 20 71 77 28 3a 73 74  61 6e 64 61 72 64 20 3a  |I qw(:standard :|
00000040  65 73 63 61 70 65 48 54  4d 4c 20 2d 6e 6f 73 74  |escapeHTML -nost|
00000050  69 63 6b 79 29 3b 0a 75  73 65 20 43 47 49 3a 3a  |icky);.use CGI::|
00000060  55 74 69 6c 20 71 77 28  75 6e 65 73 63 61 70 65  |Util qw(unescape|
00000070  29 3b 0a 75 73 65 20 43  47 49 3a 3a 43 61 72 70  |);.use CGI::Carp|
00000080  20 71 77 28 66 61 74 61  6c 73 54 6f 42 72 6f 77  | qw(fatalsToBrow|
00000090  73 65 72 29 3b 0a 75 73  65 20 45 6e 63 6f 64 65  |ser);.use Encode|
000000a0  3b 0a 75 73 65 20 46 63  6e 74 6c 20 27 3a 6d 6f  |;.use Fcntl ':mo|
000000b0  64 65 27 3b 0a 75 73 65  20 46 69 6c 65 3a 3a 46  |de';.use File::F|
000000c0  69 6e 64 20 71 77 28 29  3b 0a 75 73 65 20 46 69  |ind qw();.use Fi|
000000d0  6c 65 3a 3a 42 61 73 65  6e 61 6d 65 20 71 77 28  |le::Basename qw(|
000000e0  62 61 73 65 6e 61 6d 65  29 3b 0a 0a 62 69 6e 6d  |basename);..binm|
000000f0  6f 64 65 20 53 54 44 4f  55 54 2c 20 27 3a 75 74  |ode STDOUT, ':ut|
00000100  66 38 27 3b 0a 0a 70 72  69 6e 74 20 64 65 63 6f  |f8';..print deco|
00000110  64 65 5f 75 74 66 38 28  27 e4 f6 fc 27 29 3b 0a  |de_utf8('äöü');.|
00000120
