From: jk@blackdown.de (Jürgen Kreileder)
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/4] gitweb: Make feed title valid utf8
Date: Tue, 09 Apr 2013 21:22:03 +0200
Message-ID: <m2ip3va4ro.fsf@zahir.fritz.box>
In-Reply-To: <51645D99.6000106@gmail.com> ("Jakub Narębski"'s message of "Tue, 09 Apr 2013 20:27:37 +0200")

Jakub Narębski <jnareb@gmail.com> writes:

> On 09.04.2013 19:40, Jürgen Kreileder wrote:
>> Jakub Narębski <jnareb@gmail.com> writes:
>>> Jürgen Kreileder wrote:
>>>
>>>> Properly encode site and project names for RSS and Atom feeds.
>
>>>> -	my $title = "$site_name - $project/$action";
>>>> +	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";
>
>>> Was this patch triggered by some bug?
>> 
>> Yes, I actually see broken encoding with the old code, e.g. on
>> https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss
>> my first name is messed up in the title tag.
>> 
>> New version: https://git.blackdown.de/?p=contactalbum.git;a=rss
>> 
>>> Because the above is not necessary, as git_feed() has
>>>
>>> 	$title = esc_html($title);
>>>
>>> a bit later, which does to_utf8() internally.
>> 
>> Good point.  But it doesn't fix the string in question:
>> It looks like to_utf8("$a $b") != (to_utf8($a) . " " . to_utf8($b)).
>
> Strange.  I wonder if the bug is in our to_utf8() implementation,
> or in Encode, or in Perl... and whether this bug can be triggered
> anywhere else in gitweb.

I don't think it's a bug; it's more a consequence of concatenating strings
that have Perl's internal utf8 flag set with strings that don't:

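    # No "use utf8", so both literals are raw UTF-8 bytes; to_utf8() is gitweb's helper.
    my $a = "ü";
    my $b = "ü";
    my $c = "$a - $b";    # bytes + bytes: utf8 flag stays off
    print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # GOOD
    $b = to_utf8($b);
    $c = "$a - $b";       # bytes + decoded string: flag on, $a upgraded as Latin-1
    print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # BAD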

yields (hopefully the broken encoding shows up correctly here):

    ü - ü -> ü - ü: not utf8
    ü - ü -> ü - ü: utf8
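Since mis-encoded characters may not survive an email archive, here is a
minimal, terminal-independent sketch of the same effect using only core Perl
(no gitweb code; the variable names are illustrative).  It concatenates
undecoded UTF-8 bytes with a decoded, utf8-flagged string, then encodes the
result for output and dumps it as hex:

```perl
use strict;
use warnings;

my $bytes = "\xC3\xBC";   # "ü" as raw UTF-8 octets (utf8 flag off)
my $chars = "\x{FC}";     # "ü" as a single character
utf8::upgrade($chars);    # set the utf8 flag, as decoding would

# $bytes is upgraded byte-by-byte as if it were Latin-1: U+00C3 U+00BC.
my $mixed = "$bytes - $chars";

# Encode for output (what a ':utf8' STDOUT layer does) and dump as hex.
my $out = $mixed;
utf8::encode($out);

printf "flagged: %s\n", utf8::is_utf8($mixed) ? "yes" : "no";
printf "hex:     %s\n", unpack('H*', $out);
```

This prints "flagged: yes" and "hex:     c383c2bc202d20c3bc".  The leading
c3 83 c2 bc is the double-encoded form a browser shows as "Ã¼", while the
correctly decoded "ü" at the end is just c3 bc.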


In gitweb we have the bad case: 

    my $title = "$site_name - $project/$action";

$project and $action apparently already have the utf8 flag set, but
$site_name doesn't.  Interpolating them produces a string flagged as utf8,
and Perl upgrades the undecoded bytes of $site_name as if they were Latin-1,
so their encoding is never fixed.  Because the flag is set, the to_utf8() in
esc_html() returns the string unchanged.
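A minimal sketch contrasting the old title code with the patched one.  The
to_utf8() below is a simplified stand-in modeled on gitweb's helper, and it
assumes the fallback encoding is latin1; the values of $site_name, $project
and $action are hypothetical, chosen to reproduce the situation described
above:

```perl
use strict;
use warnings;
use Encode ();

# Simplified stand-in modeled on gitweb's to_utf8() (assumption: latin1 fallback).
sub to_utf8 {
    my $str = shift;
    return undef unless defined $str;
    if (utf8::is_utf8($str) || utf8::decode($str)) {
        return $str;    # already flagged, or valid UTF-8 bytes now decoded
    }
    return Encode::decode('latin1', $str, Encode::FB_DEFAULT());
}

# Show code points so the result does not depend on the terminal encoding.
sub codepoints { join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $_[0]) }

my $site_name = "\xC3\xBC";           # hypothetical: raw bytes, never decoded
my $project   = to_utf8("\xC3\xBC");  # hypothetical: already decoded, flag on
my $action    = "rss";

# Old code: interpolate first, decode later (as esc_html() does).
my $old = to_utf8("$site_name - $project/$action");
# Patched code: decode each component before concatenating.
my $new = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";

print "old: ", codepoints($old), "\n";
print "new: ", codepoints($new), "\n";
```

Under these assumptions the old title starts with U+00C3 U+00BC (the two raw
bytes reinterpreted as Latin-1, rendered as "Ã¼"), whereas the patched title
starts with U+00FC ("ü").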

> What Perl version and Encode module version do you use?

5.14.2 and 2.42_01 on Ubuntu.  Same results with 5.12.4 and 2.39 on OS X.


       Juergen

Thread overview: 6+ messages
2013-04-08 20:09 [PATCH 2/4] gitweb: Make feed title valid utf8 Jürgen Kreileder
2013-04-09 15:10 ` Jakub Narębski
2013-04-09 17:40   ` Jürgen Kreileder
2013-04-09 18:27     ` Jakub Narębski
2013-04-09 19:22       ` Jürgen Kreileder [this message]
2013-04-09 19:58         ` Jakub Narębski
