* gitweb - encoding problems
@ 2007-05-21 20:57 Martin Koegler
2007-05-21 21:09 ` Ismail Dönmez
2007-05-22 0:33 ` David Woodhouse
0 siblings, 2 replies; 4+ messages in thread
From: Martin Koegler @ 2007-05-21 20:57 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Junio C Hamano
I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
this encoding.
On perl v5.8.6, decode_utf8 of any non utf-8 value returns undefined:
$ cat xy
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);
binmode STDOUT, ':utf8';
print decode_utf8('äöü');
$ perl xy
[Mon May 21 22:00:00 2007] xy: Use of uninitialized value in print at xy line 14.
If gitweb encounters e.g. an "Umlaut" (äöü) in a commit/tag, "use of
uninitialized value" messages are generated. In one case,
decode_utf8($long) in format_subject_html is undefined, which results
in an invalid link (the a tag contains only title without any value
assignment) and a browser message that the HTML is not valid.
The previously installed version of git/gitweb (1.5.0rc3) showed only
small black rhombuses, but didn't generate "uninitialized value"
messages or invalid HTML.
So there is a regression between git-1.5.0 and git-1.5.2.
Adding $var = encode_utf8($var) if (!defined decode_utf8($var)) for
each "uninitialized value" message gives correct output for me.
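A minimal sketch of this kind of fallback, assuming the bytes that fail to decode are ISO-8859-1. The helper name decode_with_fallback is hypothetical and not part of gitweb; it tries a strict UTF-8 decode first and reinterprets the bytes as ISO-8859-1 only when that fails, so the caller never sees an undefined value for defined input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not a gitweb function): decode a byte string as
# strict UTF-8, falling back to ISO-8859-1 when the bytes are not valid UTF-8.
sub decode_with_fallback {
    my ($bytes) = @_;
    return undef unless defined $bytes;

    # FB_CROAK makes decode() die on malformed input instead of returning
    # undef or substituting characters; LEAVE_SRC keeps $bytes unmodified.
    my $text = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return Encode::decode('ISO-8859-1', $bytes);
}

binmode STDOUT, ':utf8';
print decode_with_fallback("\xE4\xF6\xFC"), "\n";   # ISO-8859-1 bytes for "äöü"
print decode_with_fallback("\xC3\xA4"), "\n";       # UTF-8 bytes for "ä"
```

The byte strings are written with \x escapes so the example does not depend on the encoding of the script file itself.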
I wanted to post a patch with these changes, as it solved my locale problem.
But then I tried the same on a different computer with a newer perl (v5.8.8).
$ cat x
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);
binmode STDOUT, ':utf8';
print decode_utf8('äöü');
$ perl x
ᅵᅵï¿
Here perl decodes the ISO-8859-1 text to something differnent:
00000000 ef bf bd ef bf bd ef bf bd |ᅵᅵï¿
The result is that all "Umlaute" are shown as small black rhombuses
in gitweb (and there is no invalid HTML).
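The bytes ef bf bd are the UTF-8 encoding of U+FFFD, the Unicode replacement character. A small sketch, assuming an Encode version that substitutes U+FFFD for malformed input when no CHECK argument is given (the behaviour seen on v5.8.8 here), shows where the rhombuses come from:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# ISO-8859-1 bytes for "äöü"; none of them form a valid UTF-8 sequence.
my $latin1_bytes = "\xE4\xF6\xFC";

# Without a CHECK argument, malformed bytes are replaced, not rejected.
my $decoded = decode('UTF-8', $latin1_bytes);

# Re-encode and dump as hex to compare with the hexdump output.
my $hex = join ' ', map { sprintf '%02x', ord } split //, encode('UTF-8', $decoded);
print "$hex\n";
```

Once the substitution has happened the original bytes are lost, so any fallback to another encoding has to be tried before or instead of this lenient decode.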
mfg Martin Kögler
cat x |hexdump -C
00000000 23 21 2f 75 73 72 2f 62 69 6e 2f 70 65 72 6c 0a |#!/usr/bin/perl.|
00000010 75 73 65 20 73 74 72 69 63 74 3b 0a 75 73 65 20 |use strict;.use |
00000020 77 61 72 6e 69 6e 67 73 3b 0a 75 73 65 20 43 47 |warnings;.use CG|
00000030 49 20 71 77 28 3a 73 74 61 6e 64 61 72 64 20 3a |I qw(:standard :|
00000040 65 73 63 61 70 65 48 54 4d 4c 20 2d 6e 6f 73 74 |escapeHTML -nost|
00000050 69 63 6b 79 29 3b 0a 75 73 65 20 43 47 49 3a 3a |icky);.use CGI::|
00000060 55 74 69 6c 20 71 77 28 75 6e 65 73 63 61 70 65 |Util qw(unescape|
00000070 29 3b 0a 75 73 65 20 43 47 49 3a 3a 43 61 72 70 |);.use CGI::Carp|
00000080 20 71 77 28 66 61 74 61 6c 73 54 6f 42 72 6f 77 | qw(fatalsToBrow|
00000090 73 65 72 29 3b 0a 75 73 65 20 45 6e 63 6f 64 65 |ser);.use Encode|
000000a0 3b 0a 75 73 65 20 46 63 6e 74 6c 20 27 3a 6d 6f |;.use Fcntl ':mo|
000000b0 64 65 27 3b 0a 75 73 65 20 46 69 6c 65 3a 3a 46 |de';.use File::F|
000000c0 69 6e 64 20 71 77 28 29 3b 0a 75 73 65 20 46 69 |ind qw();.use Fi|
000000d0 6c 65 3a 3a 42 61 73 65 6e 61 6d 65 20 71 77 28 |le::Basename qw(|
000000e0 62 61 73 65 6e 61 6d 65 29 3b 0a 0a 62 69 6e 6d |basename);..binm|
000000f0 6f 64 65 20 53 54 44 4f 55 54 2c 20 27 3a 75 74 |ode STDOUT, ':ut|
00000100 66 38 27 3b 0a 0a 70 72 69 6e 74 20 64 65 63 6f |f8';..print deco|
00000110 64 65 5f 75 74 66 38 28 27 e4 f6 fc 27 29 3b 0a |de_utf8('äöü');.|
00000120
* Re: gitweb - encoding problems
2007-05-21 20:57 gitweb - encoding problems Martin Koegler
@ 2007-05-21 21:09 ` Ismail Dönmez
2007-05-22 0:33 ` David Woodhouse
1 sibling, 0 replies; 4+ messages in thread
From: Ismail Dönmez @ 2007-05-21 21:09 UTC (permalink / raw)
To: Martin Koegler; +Cc: git, Jakub Narebski, Junio C Hamano
On Monday 21 May 2007 23:57:21 you wrote:
> binmode STDOUT, ':utf8';
>
> print decode_utf8('äöü');
[~]> perl test.pl
äöü
[~]> cat test.pl
use Encode;
binmode STDOUT, ':utf8';
print decode_utf8('äöü'),"\n";
[cartman@southpark][00:08:15]
[~]> perl --version
This is perl, v5.8.8 built for i686-linux
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
You have an old version of Encode.
--
Perfect is the enemy of good
* Re: gitweb - encoding problems
2007-05-21 20:57 gitweb - encoding problems Martin Koegler
2007-05-21 21:09 ` Ismail Dönmez
@ 2007-05-22 0:33 ` David Woodhouse
2007-05-22 7:50 ` Jakub Narebski
1 sibling, 1 reply; 4+ messages in thread
From: David Woodhouse @ 2007-05-22 0:33 UTC (permalink / raw)
To: Martin Koegler; +Cc: git, Jakub Narebski, Junio C Hamano
On Mon, 2007-05-21 at 22:57 +0200, Martin Koegler wrote:
> I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
> this encoding.
That's a very strange thing for anyone to do in the 21st century.
Did you configure this archaic thing correctly in .git/config?
Otherwise, gitweb will assume that you're using utf-8 like any normal
person would, and of course you'll have problems when it tries to deal
with your legacy character set as if it were something sensible.
--
dwmw2
* Re: gitweb - encoding problems
2007-05-22 0:33 ` David Woodhouse
@ 2007-05-22 7:50 ` Jakub Narebski
0 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2007-05-22 7:50 UTC (permalink / raw)
To: David Woodhouse, Martin Koegler; +Cc: git, Junio C Hamano
On Tue, 22 May 2007, David Woodhouse wrote:
> On Mon, 2007-05-21 at 22:57 +0200, Martin Koegler wrote:
>> I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
>> this encoding.
>
> That's a very strange thing for anyone to do in the 21st century.
> Did you configure this archaic thing correctly in .git/config?
>
> Otherwise, gitweb will assume that you're using utf-8 like any normal
> person would, and of course you'll have problems when it tries to deal
> with your legacy character set as if it were something sensible.
Actually gitweb does not respect i18n.* configuration variables and
happily assumes that everything is in utf-8, with the exception of
*_plain views, which are sent :raw.
If you decide to implement support for encodings other than utf-8 in
gitweb, please remember that some parts of git (like git-show, git-log)
do the decoding/encoding, but not all (git-rev-list or --pretty=raw do
not). And that git can be compiled without iconv support. And that commits
might be in different encodings, which should be given by the 'encoding'
header, but there is no way to guess the encoding of a blob or of file names.
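As a rough sketch of the per-commit part only: given the raw text of a commit object (as printed with --pretty=raw, which does no re-encoding), the 'encoding' header can be looked up and used to decode the message, defaulting to utf-8 when it is absent. The helper name decode_commit_message and the sample commit text are hypothetical; this is not how gitweb currently works, and it does nothing for blobs or file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: decode the message of a raw commit object according
# to its optional "encoding" header, assuming utf-8 when there is none.
sub decode_commit_message {
    my ($raw) = @_;

    # Headers and message are separated by the first empty line.
    my ($headers, $message) = split /\n\n/, $raw, 2;
    $message = '' unless defined $message;

    my ($name) = $headers =~ /^encoding (\S+)$/m;
    my $enc = defined $name ? find_encoding($name) : undef;

    # Unknown or missing encoding name: fall back to utf-8.
    $enc ||= find_encoding('UTF-8');
    return $enc->decode($message);
}

# Made-up commit text for illustration; the hash and names are placeholders.
my $raw = join "\n",
    'tree 0000000000000000000000000000000000000000',
    'author A U Thor <author@example.com> 1179781020 +0200',
    'committer A U Thor <author@example.com> 1179781020 +0200',
    'encoding ISO-8859-1',
    '',
    "Umlaut test: \xE4\xF6\xFC",
    '';

binmode STDOUT, ':utf8';
print decode_commit_message($raw);
```

Author and committer names in the headers would need the same treatment; they are left out here to keep the sketch short.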
git-commit(1):
i18n.commitEncoding::
Character encoding the commit messages are stored in; git itself
does not care per se, but this information is necessary e.g. when
importing commits from emails or in the gitk graphical history
browser (and possibly at other places in the future or in other
porcelains). See e.g. gitlink:git-mailinfo[1]. Defaults to 'utf-8'.
i18n.logOutputEncoding::
Character encoding the commit messages are converted to when
running `git-log` and friends.
--
Jakub Narebski
Poland