From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: git@vger.kernel.org
Cc: Jakub Narebski <jnareb@gmail.com>, Junio C Hamano <junkio@cox.net>
Subject: gitweb - encoding problems
Date: Mon, 21 May 2007 22:57:21 +0200
Message-ID: <20070521205721.GA21771@auto.tuwien.ac.at>
I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
this encoding.
On perl v5.8.6, decode_utf8 of any value that is not valid UTF-8 returns undef:
$ cat xy
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);
binmode STDOUT, ':utf8';
print decode_utf8('äöü');
$ perl xy
[Mon May 21 22:00:00 2007] xy: Use of uninitialized value in print at xy line 14.
If gitweb encounters, e.g., an "Umlaut" (äöü) in a commit/tag, "use of
uninitialized value" messages are generated. In one case,
decode_utf8($long) in format_subject_html is undefined, which results
in an invalid link (the a tag contains only title without any value
assigned) and a browser message that the HTML is not valid.
The previously installed version of git/gitweb (1.5.0rc3) showed only
small black rhombuses, but didn't generate "uninitialized value"
messages or invalid HTML.
So there is a regression between git-1.5.0 and git-1.5.2.
Adding $var = encode_utf8($var) if (!defined decode_utf8($var)) at the
site of each "uninitialized value" message gives correct output for me.
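A minimal sketch of this kind of fallback, written as a standalone script rather than as a gitweb patch. The helper name decode_with_fallback and the default fallback encoding are assumptions made for illustration; neither comes from gitweb. It also differs from the one-liner above: instead of re-encoding the value, it attempts a strict UTF-8 decode and, if that fails, decodes the same octets as ISO-8859-1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of gitweb): decode octets as strict UTF-8
# and fall back to a legacy encoding when they are not valid UTF-8.
sub decode_with_fallback {
    my ($octets, $fallback) = @_;
    return unless defined $octets;
    $fallback = 'iso-8859-1' unless defined $fallback;   # assumed default

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # decode() from modifying the octets it was given.
    my $copy = $octets;
    my $text = eval {
        decode('UTF-8', $copy, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text if defined $text;

    # Not valid UTF-8: interpret the octets in the fallback encoding.
    return decode($fallback, $octets);
}

binmode STDOUT, ':utf8';
print decode_with_fallback("\xe4\xf6\xfc"), "\n";              # ISO-8859-1 octets
print decode_with_fallback("\xc3\xa4\xc3\xb6\xc3\xbc"), "\n";  # UTF-8 octets
```

Both calls yield the same three-character string, so the caller never sees an undefined value regardless of which of the two encodings the input used.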
I wanted to post a patch with these changes, as it solved my locale problem.
But then I tried the same on a different computer with a newer perl (v5.8.8).
$ cat x
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);
binmode STDOUT, ':utf8';
print decode_utf8('äöü');
$ perl x
ᅵᅵï¿
Here perl decodes the ISO-8859-1 text to something differnent:
00000000 ef bf bd ef bf bd ef bf bd |ᅵᅵï¿
The result is, that all "Umlaute" are shown as a small black rhombus
in gitweb (and no invalid html).
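For reference, the nine bytes in the dump above are three copies of the UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER), one for each byte that could not be decoded. A short sketch that reproduces the byte sequence; the variable names are only illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Three U+FFFD REPLACEMENT CHARACTERs, one per undecodable input byte.
my $replaced = "\x{FFFD}" x 3;

# Encode the characters to UTF-8 octets and print them as hex pairs.
my $octets = encode('UTF-8', $replaced);
my $hex = join ' ', unpack('(H2)*', $octets);
print "$hex\n";   # prints: ef bf bd ef bf bd ef bf bd
```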
Regards, Martin Kögler
$ cat x | hexdump -C
00000000 23 21 2f 75 73 72 2f 62 69 6e 2f 70 65 72 6c 0a |#!/usr/bin/perl.|
00000010 75 73 65 20 73 74 72 69 63 74 3b 0a 75 73 65 20 |use strict;.use |
00000020 77 61 72 6e 69 6e 67 73 3b 0a 75 73 65 20 43 47 |warnings;.use CG|
00000030 49 20 71 77 28 3a 73 74 61 6e 64 61 72 64 20 3a |I qw(:standard :|
00000040 65 73 63 61 70 65 48 54 4d 4c 20 2d 6e 6f 73 74 |escapeHTML -nost|
00000050 69 63 6b 79 29 3b 0a 75 73 65 20 43 47 49 3a 3a |icky);.use CGI::|
00000060 55 74 69 6c 20 71 77 28 75 6e 65 73 63 61 70 65 |Util qw(unescape|
00000070 29 3b 0a 75 73 65 20 43 47 49 3a 3a 43 61 72 70 |);.use CGI::Carp|
00000080 20 71 77 28 66 61 74 61 6c 73 54 6f 42 72 6f 77 | qw(fatalsToBrow|
00000090 73 65 72 29 3b 0a 75 73 65 20 45 6e 63 6f 64 65 |ser);.use Encode|
000000a0 3b 0a 75 73 65 20 46 63 6e 74 6c 20 27 3a 6d 6f |;.use Fcntl ':mo|
000000b0 64 65 27 3b 0a 75 73 65 20 46 69 6c 65 3a 3a 46 |de';.use File::F|
000000c0 69 6e 64 20 71 77 28 29 3b 0a 75 73 65 20 46 69 |ind qw();.use Fi|
000000d0 6c 65 3a 3a 42 61 73 65 6e 61 6d 65 20 71 77 28 |le::Basename qw(|
000000e0 62 61 73 65 6e 61 6d 65 29 3b 0a 0a 62 69 6e 6d |basename);..binm|
000000f0 6f 64 65 20 53 54 44 4f 55 54 2c 20 27 3a 75 74 |ode STDOUT, ':ut|
00000100 66 38 27 3b 0a 0a 70 72 69 6e 74 20 64 65 63 6f |f8';..print deco|
00000110 64 65 5f 75 74 66 38 28 27 e4 f6 fc 27 29 3b 0a |de_utf8('äöü');.|
00000120