git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/4] gitweb: Make feed title valid utf8
@ 2013-04-08 20:09 Jürgen Kreileder
  2013-04-09 15:10 ` Jakub Narębski
  0 siblings, 1 reply; 6+ messages in thread
From: Jürgen Kreileder @ 2013-04-08 20:09 UTC (permalink / raw)
  To: git

Properly encode site and project names for RSS and Atom feeds.

Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
---
 gitweb/gitweb.perl |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 9cfe5b5..09294eb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -8056,7 +8056,7 @@ sub git_feed {
 	return if ($cgi->request_method() eq 'HEAD');
 
 	# header variables
-	my $title = "$site_name - $project/$action";
+	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";
 	my $feed_type = 'log';
 	if (defined $hash) {
 		$title .= " - '$hash'";
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/4] gitweb: Make feed title valid utf8
  2013-04-08 20:09 [PATCH 2/4] gitweb: Make feed title valid utf8 Jürgen Kreileder
@ 2013-04-09 15:10 ` Jakub Narębski
  2013-04-09 17:40   ` Jürgen Kreileder
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narębski @ 2013-04-09 15:10 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git

Jürgen Kreileder wrote:

> Properly encode site and project names for RSS and Atom feeds.
> 
> Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 9cfe5b5..09294eb 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -8056,7 +8056,7 @@ sub git_feed {
>  	return if ($cgi->request_method() eq 'HEAD');
>  
>  	# header variables
> -	my $title = "$site_name - $project/$action";
> +	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";
>  	my $feed_type = 'log';
>  	if (defined $hash) {
>  		$title .= " - '$hash'";
> 

Was this patch triggered by some bug?

Because the above is not necessary, as git_feed() has

	$title = esc_html($title);

a bit later, which does to_utf8() internally.
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/4] gitweb: Make feed title valid utf8
  2013-04-09 15:10 ` Jakub Narębski
@ 2013-04-09 17:40   ` Jürgen Kreileder
  2013-04-09 18:27     ` Jakub Narębski
  0 siblings, 1 reply; 6+ messages in thread
From: Jürgen Kreileder @ 2013-04-09 17:40 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: git

Jakub Narębski <jnareb@gmail.com> writes:

> Jürgen Kreileder wrote:
>
>> Properly encode site and project names for RSS and Atom feeds.
>> 
>> Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
>> ---
>>  gitweb/gitweb.perl |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 9cfe5b5..09294eb 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -8056,7 +8056,7 @@ sub git_feed {
>>  	return if ($cgi->request_method() eq 'HEAD');
>>  
>>  	# header variables
>> -	my $title = "$site_name - $project/$action";
>> +	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";
>>  	my $feed_type = 'log';
>>  	if (defined $hash) {
>>  		$title .= " - '$hash'";
>> 
>
> Was this patch triggered by some bug?

Yes, I actually see broken encoding with the old code, e.g on 
https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss
my first name is messed up in the title tag.

New version: https://git.blackdown.de/?p=contactalbum.git;a=rss

> Because the above is not necessary, as git_feed() has
>
> 	$title = esc_html($title);
>
> a bit later, which does to_utf8() internally.

Good point.  But it doesn't fix the string in question:
It looks like to_utf8("$a $b") != (to_utf8($a) . " " . to_utf($b)).


Juergen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/4] gitweb: Make feed title valid utf8
  2013-04-09 17:40   ` Jürgen Kreileder
@ 2013-04-09 18:27     ` Jakub Narębski
  2013-04-09 19:22       ` Jürgen Kreileder
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narębski @ 2013-04-09 18:27 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git

W dniu 09.04.2013 19:40, Jürgen Kreileder napisał:
> Jakub Narębski <jnareb@gmail.com> writes:
>> Jürgen Kreileder wrote:
>>
>>> Properly encode site and project names for RSS and Atom feeds.

>>> -	my $title = "$site_name - $project/$action";
>>> +	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";

>> Was this patch triggered by some bug?
> 
> Yes, I actually see broken encoding with the old code, e.g on 
> https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss
> my first name is messed up in the title tag.
> 
> New version: https://git.blackdown.de/?p=contactalbum.git;a=rss
> 
>> Because the above is not necessary, as git_feed() has
>>
>> 	$title = esc_html($title);
>>
>> a bit later, which does to_utf8() internally.
> 
> Good point.  But it doesn't fix the string in question:
> It looks like to_utf8("$a $b") != (to_utf8($a) . " " . to_utf8($b)).

Strange.  I wonder if the bug is in our to_utf8() implementation,
or in Encode, or in Perl... and whether this bug can be triggered
anywhere else in gitweb.

What Perl version and Encode module version do you use?
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/4] gitweb: Make feed title valid utf8
  2013-04-09 18:27     ` Jakub Narębski
@ 2013-04-09 19:22       ` Jürgen Kreileder
  2013-04-09 19:58         ` Jakub Narębski
  0 siblings, 1 reply; 6+ messages in thread
From: Jürgen Kreileder @ 2013-04-09 19:22 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: git

Jakub Narębski <jnareb@gmail.com> writes:

> W dniu 09.04.2013 19:40, Jürgen Kreileder napisał:
>> Jakub Narębski <jnareb@gmail.com> writes:
>>> Jürgen Kreileder wrote:
>>>
>>>> Properly encode site and project names for RSS and Atom feeds.
>
>>>> -	my $title = "$site_name - $project/$action";
>>>> +	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";
>
>>> Was this patch triggered by some bug?
>> 
>> Yes, I actually see broken encoding with the old code, e.g on 
>> https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss
>> my first name is messed up in the title tag.
>> 
>> New version: https://git.blackdown.de/?p=contactalbum.git;a=rss
>> 
>>> Because the above is not necessary, as git_feed() has
>>>
>>> 	$title = esc_html($title);
>>>
>>> a bit later, which does to_utf8() internally.
>> 
>> Good point.  But it doesn't fix the string in question:
>> It looks like to_utf8("$a $b") != (to_utf8($a) . " " . to_utf8($b)).
>
> Strange.  I wonder if the bug is in our to_utf8() implementation,
> or in Encode, or in Perl... and whether this bug can be triggered
> anywhere else in gitweb.

I don't think it's a bug, more like a consequence of concatenating utf8
and non-utf8 strings:

    my $a = "ü";
    my $b = "ü";
    my $c = "$a - $b";
    print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # GOOD
    $b = to_utf8($b);
    $c = "$a - $b";
    print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # GOOD

yields (hopefully the broken encoding shows up correctly here):

    ü - ü -> ü - ü: not utf8
    ü - ü -> ü - ü: utf8


In gitweb we have the bad case: 

   my $title = "$site_name - $project/$action";

$project and $action are apparently utf8 already but $site_name isn't.
The resulting string is marked as utf8 - although the encoding of
$site_name was never fixed.  The to_utf8() in esc_html() returns the string
without fixing anything because of that.

> What Perl version and Encode module version do you use?

5.14.2 and 2.42_01 on Ubuntu.  Same results with 5.12.4 and 2.39 on OS X.


       Juergen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/4] gitweb: Make feed title valid utf8
  2013-04-09 19:22       ` Jürgen Kreileder
@ 2013-04-09 19:58         ` Jakub Narębski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Narębski @ 2013-04-09 19:58 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git

W dniu 09.04.2013 21:22, Jürgen Kreileder napisał:
> Jakub Narębski <jnareb@gmail.com> writes: 
>> W dniu 09.04.2013 19:40, Jürgen Kreileder napisał:
>>> Jakub Narębski <jnareb@gmail.com> writes:
>>>> Jürgen Kreileder wrote:
>>>>
>>>>> Properly encode site and project names for RSS and Atom feeds.

>>> Good point.  But it doesn't fix the string in question:
>>> It looks like to_utf8("$a $b") != (to_utf8($a) . " " . to_utf8($b)).
>>
>> Strange.  I wonder if the bug is in our to_utf8() implementation,
>> or in Encode, or in Perl... and whether this bug can be triggered
>> anywhere else in gitweb.
> 
> I don't think it's a bug, more like a consequence of concatenating utf8
> and non-utf8 strings:
> 
>     my $a = "ü";
>     my $b = "ü";
>     my $c = "$a - $b";
>     print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # GOOD
>     $b = to_utf8($b);
>     $c = "$a - $b";
>     print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # GOOD
> 
> yields (hopefully the broken encoding shows up correctly here):
> 
>     ü - ü -> ü - ü: not utf8
>     ü - ü -> ü - ü: utf8

Ah, so it looks like it is misfeature of the way Perl handles Unicode;
concatenating adds 'UTF8' flag if either of concatenates strings has
it to the result.

[Which I have checked using Devel::Peek with
 perl -MDevel::Peek -E '
  my $a = "ż"; my $b = "\x{17c}";
  Dump $a; Dump $b; Dump "$b - $a"'
]

> In gitweb we have the bad case: 
> 
>    my $title = "$site_name - $project/$action";
> 
> $project and $action are apparently utf8 already but $site_name isn't.

$project and $action are taken from URL, and we have to run decode_utf8
(at least for query params) for gitweb to work correctly.

$site_name is usually taken from config file, and gitweb doesn't have
"use utf8" pragma.

> The resulting string is marked as utf8 - although the encoding of
> $site_name was never fixed.  The to_utf8() in esc_html() returns the string
> without fixing anything because of that.

O.K.

_Maybe_ it would be worth adding explanation of this to commit message
(and I see I should audit gitweb for similar problems elsewhere), but anyway

Acked-by: Jakub Narebski <jnareb@gmail.com>

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-04-09 19:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-08 20:09 [PATCH 2/4] gitweb: Make feed title valid utf8 Jürgen Kreileder
2013-04-09 15:10 ` Jakub Narębski
2013-04-09 17:40   ` Jürgen Kreileder
2013-04-09 18:27     ` Jakub Narębski
2013-04-09 19:22       ` Jürgen Kreileder
2013-04-09 19:58         ` Jakub Narębski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).