git.vger.kernel.org archive mirror
* [PATCH] gitweb: use decode_utf8 directly
@ 2007-04-24 14:05 Ismail Dönmez
  2007-04-27  8:55 ` Ismail Dönmez
  0 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-04-24 14:05 UTC
  To: git


[-- Attachment #1.1: Type: text/plain, Size: 1074 bytes --]

Hi,

gitweb currently calls the Encode::decode function through a wrapper like this:

# very thin wrapper for decode("utf8", $str, Encode::FB_DEFAULT);
sub to_utf8 {
       my $str = shift;
       return decode("utf8", $str, Encode::FB_DEFAULT);
}

For me, however, this produces the following error when I try to view the RSS
feed for a local checkout of the Linux kernel git repository:

Cannot decode string with wide characters 
at /usr/lib/perl5/vendor_perl/5.8.8/i686-linux/Encode.pm line 162.

I searched a bit, but could not find any relevant information about this
error. In any case, the wrapper is unnecessary: the Encode module has a
decode_utf8 function, which also fixes the problem I am experiencing and lets
us drop the wrapper.
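
One way to reproduce this message, offered as a sketch rather than a diagnosis
of the gitweb failure: decode() expects octets, and it croaks when handed a
character string that contains code points above 0xFF. The sample strings and
variable names below are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Octets as they might arrive from a pipe or file: UTF-8 bytes for "Dönmez".
my $octets = "D\xC3\xB6nmez";

# Decoding octets yields a character string (6 characters here).
my $text = decode("utf8", $octets, Encode::FB_DEFAULT);

# A character string with a code point above 0xFF cannot be read as octets,
# so decode() croaks; eval captures the error instead of aborting.
my $wide   = "smile \x{263A}";
my $result = eval { decode("utf8", $wide, Encode::FB_DEFAULT) };
my $error  = $@;

printf "decoded length: %d\n", length $text;
print "decode of wide string failed: $error" unless defined $result;
```

If the same error shows up elsewhere, it may be worth checking whether the
string has already been decoded once before it reaches decode().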

A patch against git 1.5.1.2 is attached. Comments are welcome.

P.S.: I am using Encode 2.20 from CPAN, which is the latest stable version
available.

Regards,
ismail

-- 
Life is a game, and if you aren't in it to win,
what the heck are you still doing here?

-- Linus Torvalds (talking about open source development)

[-- Attachment #1.2: decode-utf8.patch --]
[-- Type: text/x-diff, Size: 3063 bytes --]

--- gitweb/gitweb.perl	2007-04-24 16:53:00.000000000 +0300
+++ gitweb/gitweb.perl	2007-04-24 16:54:22.000000000 +0300
@@ -566,12 +566,6 @@
 	return $input;
 }
 
-# very thin wrapper for decode("utf8", $str, Encode::FB_DEFAULT);
-sub to_utf8 {
-	my $str = shift;
-	return decode("utf8", $str, Encode::FB_DEFAULT);
-}
-
 # quote unsafe chars, but keep the slash, even when it's not
 # correct, but quoted slashes look too horrible in bookmarks
 sub esc_param {
@@ -596,7 +590,7 @@
 	my $str = shift;
 	my %opts = @_;
 
-	$str = to_utf8($str);
+	$str = decode_utf8($str);
 	$str = $cgi->escapeHTML($str);
 	if ($opts{'-nbsp'}) {
 		$str =~ s/ /&nbsp;/g;
@@ -610,7 +604,7 @@
 	my $str = shift;
 	my %opts = @_;
 
-	$str = to_utf8($str);
+	$str = decode_utf8($str);
 	$str = $cgi->escapeHTML($str);
 	if ($opts{'-nbsp'}) {
 		$str =~ s/ /&nbsp;/g;
@@ -893,7 +887,7 @@
 
 	if (length($short) < length($long)) {
 		return $cgi->a({-href => $href, -class => "list subject",
-		                -title => to_utf8($long)},
+		                -title => decode_utf8($long)},
 		       esc_html($short) . $extra);
 	} else {
 		return $cgi->a({-href => $href, -class => "list subject"},
@@ -1110,7 +1104,7 @@
 			if (check_export_ok("$projectroot/$path")) {
 				my $pr = {
 					path => $path,
-					owner => to_utf8($owner),
+					owner => decode_utf8($owner),
 				};
 				push @list, $pr
 			}
@@ -1139,7 +1133,7 @@
 			$pr = unescape($pr);
 			$ow = unescape($ow);
 			if ($pr eq $project) {
-				$owner = to_utf8($ow);
+				$owner = decode_utf8($ow);
 				last;
 			}
 		}
@@ -1613,7 +1607,7 @@
 	}
 	my $owner = $gcos;
 	$owner =~ s/[,;].*$//;
-	return to_utf8($owner);
+	return decode_utf8($owner);
 }
 
 ## ......................................................................
@@ -1696,7 +1690,7 @@
 
 	my $title = "$site_name";
 	if (defined $project) {
-		$title .= " - " . to_utf8($project);
+		$title .= " - " . decode_utf8($project);
 		if (defined $action) {
 			$title .= "/$action";
 			if (defined $file_name) {
@@ -1969,7 +1963,7 @@
 
 	print "<div class=\"page_path\">";
 	print $cgi->a({-href => href(action=>"tree", hash_base=>$hb),
-	              -title => 'tree root'}, to_utf8("[$project]"));
+	              -title => 'tree root'}, decode_utf8("[$project]"));
 	print " / ";
 	if (defined $name) {
 		my @dirname = split '/', $name;
@@ -2584,7 +2578,7 @@
 		($pr->{'age'}, $pr->{'age_string'}) = @aa;
 		if (!defined $pr->{'descr'}) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
-			$pr->{'descr_long'} = to_utf8($descr);
+			$pr->{'descr_long'} = decode_utf8($descr);
 			$pr->{'descr'} = chop_str($descr, 25, 5);
 		}
 		if (!defined $pr->{'owner'}) {
@@ -3616,7 +3610,7 @@
 		$hash = git_get_head_hash($project);
 	}
 
-	my $filename = to_utf8(basename($project)) . "-$hash.tar.$suffix";
+	my $filename = decode_utf8(basename($project)) . "-$hash.tar.$suffix";
 
 	print $cgi->header(
 		-type => "application/$ctype",


* Re: [PATCH] gitweb: use decode_utf8 directly
@ 2007-06-01 16:13 Martin Koegler
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Koegler @ 2007-06-01 16:13 UTC
  To: julliard; +Cc: git

Alexandre Julliard wrote:
>Junio C Hamano <junkio@cox.net> writes:
> > I would say that the patch is an improvement from the current
> > code so it should hit 'master'; I was a bit busy lately and then
> > am sick, and also we are post -rc1 freeze now and I was being
> > cautious, just in case some nacks from more informed parties
> > arrive late.
> 
> Sorry for the late nack, but it turns out that this patch breaks diff
> output on the Wine server for files that are not utf-8.
> 
> The cause is apparently that decode_utf8() returns undef for invalid
> sequences instead of substituting a replacement char like
> decode("utf8") does.
> 
> That may be considered an Encode bug since we are running a fairly old
> version (1.99, coming with Debian 3.1), but I'd rather not upgrade
> perl on the server. Could the patch be reverted, or done differently?
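
The difference described in the quoted text can be guarded against with a
small wrapper. The following is only a sketch: decode_lenient is a hypothetical
name, not part of gitweb or Encode, and it is not the workaround patch linked
in this thread. It assumes the input is a string of octets, and that
decode_utf8 may return undef on invalid input, as reported above for
Encode 1.99; newer Encode versions may substitute U+FFFD on their own.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Hypothetical helper: decode UTF-8 octets and always return a defined
# string when given defined input.
sub decode_lenient {
    my ($octets) = @_;
    return undef unless defined $octets;

    # Try the short form first; eval guards against a croak.
    my $text = eval { decode_utf8($octets) };

    # If that produced undef, ask decode() explicitly for the default
    # fallback, which substitutes U+FFFD for malformed sequences.
    $text = decode("utf8", $octets, Encode::FB_DEFAULT)
        unless defined $text;
    return $text;
}

my $valid   = decode_lenient("caf\xC3\xA9");      # well-formed UTF-8
my $invalid = decode_lenient("caf\xE9 au lait");  # Latin-1 byte, not UTF-8

printf "valid: %d characters\n", length $valid;
print "invalid input contains U+FFFD\n" if index($invalid, "\x{FFFD}") >= 0;
```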

I hit the same problem:
http://marc.info/?l=git&m=117978122420441&w=2

On my system, I use this patch as a workaround:
http://marc.info/?l=git&m=118038526531694&w=2

Regards, Martin Kögler


Thread overview: 22+ messages
2007-04-24 14:05 [PATCH] gitweb: use decode_utf8 directly Ismail Dönmez
2007-04-27  8:55 ` Ismail Dönmez
2007-04-27  9:07   ` Junio C Hamano
2007-04-27  9:22     ` Ismail Dönmez
2007-04-27 19:29       ` Junio C Hamano
2007-05-01 21:12         ` Ismail Dönmez
2007-05-01 21:39           ` Junio C Hamano
2007-05-01 21:44             ` Ismail Dönmez
2007-05-01 21:48               ` Ismail Dönmez
2007-05-03 19:22             ` Ismail Dönmez
2007-05-03 19:26               ` Junio C Hamano
2007-06-01 13:45                 ` Alexandre Julliard
2007-06-01 13:50                   ` Ismail Dönmez
2007-06-01 16:51                     ` Alexandre Julliard
2007-06-01 19:44                   ` Junio C Hamano
2007-06-01 19:47                     ` Ismail Dönmez
2007-06-01 20:00                       ` Junio C Hamano
2007-06-01 20:08                         ` Ismail Dönmez
2007-06-03 22:06                           ` Junio C Hamano
2007-06-03 22:13                             ` Ismail Dönmez
2007-06-02  8:22                   ` Jakub Narebski
  -- strict thread matches above, loose matches on Subject: below --
2007-06-01 16:13 Martin Koegler
