git.vger.kernel.org archive mirror
* gitweb - encoding problems
@ 2007-05-21 20:57 Martin Koegler
  2007-05-21 21:09 ` Ismail Dönmez
  2007-05-22  0:33 ` David Woodhouse
  0 siblings, 2 replies; 4+ messages in thread
From: Martin Koegler @ 2007-05-21 20:57 UTC
  To: git; +Cc: Jakub Narebski, Junio C Hamano

I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
this encoding.

On perl v5.8.6, decode_utf8 of any value that is not valid UTF-8 returns undef:

$ cat xy
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);

binmode STDOUT, ':utf8';

print decode_utf8('äöü');
$ perl xy
[Mon May 21 22:00:00 2007] xy: Use of uninitialized value in print at xy line 14.

If gitweb encounters, e.g., an umlaut (äöü) in a commit or tag,
"Use of uninitialized value" messages are generated. In one case,
decode_utf8($long) in format_subject_html is undefined, which results
in an invalid link (the a tag contains only title, without any value
assigned) and a browser message that the HTML is not valid.

The previously installed version of git/gitweb (1.5.0rc3) showed only
small black rhombuses, but didn't generate "uninitialized value"
messages or invalid HTML.

So there is a regression between git-1.5.0 and git-1.5.2.

Adding $var = encode_utf8($var) if (!defined decode_utf8($var)) for
each "uninitialized value" message gives correct output for me.
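
The workaround above relies on decode_utf8 returning undef for invalid
input, which depends on the perl/Encode version in use. Below is a minimal
sketch of a version-independent alternative: decode strictly and fall back
to ISO-8859-1 when the bytes are not valid UTF-8. The helper name
decode_with_fallback and the choice of ISO-8859-1 as the fallback are
assumptions for illustration; this is not gitweb's code.

```perl
use strict;
use warnings;
use Encode;

# Hypothetical helper (not part of gitweb): decode bytes as strict UTF-8,
# falling back to ISO-8859-1 when they are not valid UTF-8.
sub decode_with_fallback {
    my ($bytes) = @_;
    return undef unless defined $bytes;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # decode() from modifying $bytes while it checks.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $chars if defined $chars;

    # Every byte value is a valid ISO-8859-1 character, so this always succeeds.
    return decode('ISO-8859-1', $bytes);
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_with_fallback("\xe4\xf6\xfc"), "\n";             # ISO-8859-1 bytes for "äöü"
print decode_with_fallback("\xc3\xa4\xc3\xb6\xc3\xbc"), "\n"; # UTF-8 bytes for "äöü"
```

Both calls return the same three characters, so the caller never sees an
undefined value regardless of which of the two encodings the input used.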

I wanted to post a patch with these changes, as it solved my locale problem.
But then I tried the same on a different computer with a newer perl (v5.8.8).
$ cat x
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
use File::Basename qw(basename);

binmode STDOUT, ':utf8';

print decode_utf8('äöü');
$ perl x
ᅵᅵï¿

Here perl decodes the ISO-8859-1 text to something differnent:
00000000  ef bf bd ef bf bd ef bf  bd                       |ᅵᅵï¿

The result is, that all "Umlaute" are shown as a small black rhombus
in gitweb (and no invalid html).
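
The bytes ef bf bd in the dump are the UTF-8 encoding of U+FFFD, the
Unicode replacement character. A small sketch that makes the substitution
visible, assuming an Encode version that replaces malformed input by
default (how many replacement characters come out may differ between
versions):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# ISO-8859-1 bytes for "äöü", as in the hexdump of the test script.
my $latin1_bytes = "\xe4\xf6\xfc";

# With the default CHECK value, malformed input is replaced with
# U+FFFD instead of raising an error.
my $chars = decode('UTF-8', $latin1_bytes);

# Show the decoded code points, then the bytes a UTF-8 layer would write.
print join(' ', map { sprintf 'U+%04X', ord } split //, $chars), "\n";
print join(' ', map { sprintf '%02x', ord } split //, encode('UTF-8', $chars)), "\n";
```

Once the substitution has happened the original bytes are gone, so any
fallback to another encoding has to be tried before a lossy decode.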

Regards, Martin Kögler

cat x |hexdump -C
00000000  23 21 2f 75 73 72 2f 62  69 6e 2f 70 65 72 6c 0a  |#!/usr/bin/perl.|
00000010  75 73 65 20 73 74 72 69  63 74 3b 0a 75 73 65 20  |use strict;.use |
00000020  77 61 72 6e 69 6e 67 73  3b 0a 75 73 65 20 43 47  |warnings;.use CG|
00000030  49 20 71 77 28 3a 73 74  61 6e 64 61 72 64 20 3a  |I qw(:standard :|
00000040  65 73 63 61 70 65 48 54  4d 4c 20 2d 6e 6f 73 74  |escapeHTML -nost|
00000050  69 63 6b 79 29 3b 0a 75  73 65 20 43 47 49 3a 3a  |icky);.use CGI::|
00000060  55 74 69 6c 20 71 77 28  75 6e 65 73 63 61 70 65  |Util qw(unescape|
00000070  29 3b 0a 75 73 65 20 43  47 49 3a 3a 43 61 72 70  |);.use CGI::Carp|
00000080  20 71 77 28 66 61 74 61  6c 73 54 6f 42 72 6f 77  | qw(fatalsToBrow|
00000090  73 65 72 29 3b 0a 75 73  65 20 45 6e 63 6f 64 65  |ser);.use Encode|
000000a0  3b 0a 75 73 65 20 46 63  6e 74 6c 20 27 3a 6d 6f  |;.use Fcntl ':mo|
000000b0  64 65 27 3b 0a 75 73 65  20 46 69 6c 65 3a 3a 46  |de';.use File::F|
000000c0  69 6e 64 20 71 77 28 29  3b 0a 75 73 65 20 46 69  |ind qw();.use Fi|
000000d0  6c 65 3a 3a 42 61 73 65  6e 61 6d 65 20 71 77 28  |le::Basename qw(|
000000e0  62 61 73 65 6e 61 6d 65  29 3b 0a 0a 62 69 6e 6d  |basename);..binm|
000000f0  6f 64 65 20 53 54 44 4f  55 54 2c 20 27 3a 75 74  |ode STDOUT, ':ut|
00000100  66 38 27 3b 0a 0a 70 72  69 6e 74 20 64 65 63 6f  |f8';..print deco|
00000110  64 65 5f 75 74 66 38 28  27 e4 f6 fc 27 29 3b 0a  |de_utf8('äöü');.|
00000120


* Re: gitweb - encoding problems
  2007-05-21 20:57 gitweb - encoding problems Martin Koegler
@ 2007-05-21 21:09 ` Ismail Dönmez
  2007-05-22  0:33 ` David Woodhouse
  1 sibling, 0 replies; 4+ messages in thread
From: Ismail Dönmez @ 2007-05-21 21:09 UTC
  To: Martin Koegler; +Cc: git, Jakub Narebski, Junio C Hamano

On Monday 21 May 2007 23:57:21 you wrote:
> binmode STDOUT, ':utf8';
>
> print decode_utf8('äöü');

[~]> perl test.pl
äöü

[~]> cat test.pl
use Encode;
binmode STDOUT, ':utf8';

print decode_utf8('äöü'),"\n";
[cartman@southpark][00:08:15]
[~]> perl --version

This is perl, v5.8.8 built for i686-linux

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

You got an old Encode.

-- 
Perfect is the enemy of good


* Re: gitweb - encoding problems
  2007-05-21 20:57 gitweb - encoding problems Martin Koegler
  2007-05-21 21:09 ` Ismail Dönmez
@ 2007-05-22  0:33 ` David Woodhouse
  2007-05-22  7:50   ` Jakub Narebski
  1 sibling, 1 reply; 4+ messages in thread
From: David Woodhouse @ 2007-05-22  0:33 UTC
  To: Martin Koegler; +Cc: git, Jakub Narebski, Junio C Hamano

On Mon, 2007-05-21 at 22:57 +0200, Martin Koegler wrote:
> I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
> this encoding. 

That's a very strange thing for anyone to do in the 21st century.
Did you configure this archaic thing correctly in .git/config?

Otherwise, gitweb will assume that you're using utf-8 like any normal
person would, and of course you'll have problems when it tries to deal
with your legacy character set as if it were something sensible.

-- 
dwmw2


* Re: gitweb - encoding problems
  2007-05-22  0:33 ` David Woodhouse
@ 2007-05-22  7:50   ` Jakub Narebski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2007-05-22  7:50 UTC
  To: David Woodhouse, Martin Koegler; +Cc: git, Junio C Hamano

On Tue, 22 May 2007, David Woodhouse wrote:
> On Mon, 2007-05-21 at 22:57 +0200, Martin Koegler wrote:

>> I use ISO-8859-1 as my locale, so my blobs, commits and tags are in
>> this encoding. 
> 
> That's a very strange thing for anyone to do in the 21st century.
> Did you configure this archaic thing correctly in .git/config?
> 
> Otherwise, gitweb will assume that you're using utf-8 like any normal
> person would, and of course you'll have problems when it tries to deal
> with your legacy character set as if it were something sensible.

Actually gitweb does not respect i18n.* configuration variables and
happily assumes that everything is in utf-8, with the exception of
*_plain views, which are sent :raw.

If you decide to implement support for encodings other than utf-8 in
gitweb, please remember that some parts of git (like git-show, git-log)
but not all (like git-rev-list or --pretty=raw) do the decoding/encoding.
And that git can be compiled without iconv support. And that commits might
be in different encodings, which should be given by the 'encoding' header,
but there is no way to guess the encoding for a blob, or for file names.
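
To illustrate the per-commit 'encoding' header, here is a rough sketch
that reads it from a raw commit object and decodes the message with it.
The function name decode_commit_message, the sample commit text, and the
choice to default to UTF-8 when the header is missing or names an unknown
encoding are all assumptions for illustration; this is not gitweb's code.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: decode a commit message according to the commit's
# own 'encoding' header, assuming UTF-8 when no usable header is present.
sub decode_commit_message {
    my ($raw) = @_;    # raw commit object text, as bytes

    # Headers and message are separated by the first empty line.
    my ($headers, $message) = split /\n\n/, $raw, 2;
    $message = '' unless defined $message;

    my ($name) = $headers =~ /^encoding (\S+)$/m;
    $name = 'UTF-8' unless defined $name && find_encoding($name);

    return decode($name, $message);
}

# Made-up sample commit whose message is stored as ISO-8859-1 bytes.
my $raw = join("\n",
    'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904',
    'author A U Thor <author@example.com> 1179781020 +0200',
    'committer A U Thor <author@example.com> 1179781020 +0200',
    'encoding ISO-8859-1',
    '',
    "Fix \xe4\xf6\xfc in README\n",
);

binmode STDOUT, ':encoding(UTF-8)';
print decode_commit_message($raw);
```

This only covers commit messages; as noted above, blobs and file names
carry no such header, so their encoding would still have to be configured
or guessed.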


git-commit(1):
 i18n.commitEncoding::
     Character encoding the commit messages are stored in; git itself
     does not care per se, but this information is necessary e.g. when
     importing commits from emails or in the gitk graphical history
     browser (and possibly at other places in the future or in other
     porcelains). See e.g. gitlink:git-mailinfo[1]. Defaults to 'utf-8'.

 i18n.logOutputEncoding::
     Character encoding the commit messages are converted to when
     running `git-log` and friends.

-- 
Jakub Narebski
Poland
