From: "Jakub Narębski" <jnareb@gmail.com>
To: "Jürgen Kreileder" <jk@blackdown.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/4] gitweb: Make feed title valid utf8
Date: Tue, 09 Apr 2013 21:58:41 +0200
Message-ID: <516472F1.4060903@gmail.com>
In-Reply-To: <m2ip3va4ro.fsf@zahir.fritz.box>
On 09.04.2013 21:22, Jürgen Kreileder wrote:
> Jakub Narębski <jnareb@gmail.com> writes:
>> On 09.04.2013 19:40, Jürgen Kreileder wrote:
>>> Jakub Narębski <jnareb@gmail.com> writes:
>>>> Jürgen Kreileder wrote:
>>>>
>>>>> Properly encode site and project names for RSS and Atom feeds.
>>> Good point. But it doesn't fix the string in question:
>>> It looks like to_utf8("$a $b") != (to_utf8($a) . " " . to_utf8($b)).
>>
>> Strange. I wonder if the bug is in our to_utf8() implementation,
>> or in Encode, or in Perl... and whether this bug can be triggered
>> anywhere else in gitweb.
>
> I don't think it's a bug, more like a consequence of concatenating utf8
> and non-utf8 strings:
>
> my $a = "ü";
> my $b = "ü";
> my $c = "$a - $b";
> print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # GOOD
> $b = to_utf8($b);
> $c = "$a - $b";
> print "$c -> ". to_utf8($c) . ": " . (utf8::is_utf8($c) ? "utf8" : "not utf8") . "\n"; # BAD
>
> yields (hopefully the broken encoding shows up correctly here):
>
> ü - ü -> ü - ü: not utf8
> ü - ü -> ü - ü: utf8
Ah, so it looks like it is a misfeature of the way Perl handles Unicode:
concatenation sets the 'UTF8' flag on the result if either of the
concatenated strings has it, and upgrades the unflagged one as if it
were Latin-1.
[Which I have checked using Devel::Peek with
perl -MDevel::Peek -E '
my $a = "ż"; my $b = "\x{17c}";
Dump $a; Dump $b; Dump "$b - $a"'
]
> In gitweb we have the bad case:
>
> my $title = "$site_name - $project/$action";
>
> $project and $action are apparently utf8 already but $site_name isn't.
$project and $action are taken from the URL, and we have to run decode_utf8
(at least for query params) for gitweb to work correctly.
$site_name is usually taken from the config file, and gitweb doesn't have
the "use utf8" pragma.
> The resulting string is marked as utf8 - although the encoding of
> $site_name was never fixed. The to_utf8() in esc_html() returns the string
> without fixing anything because of that.
O.K.
_Maybe_ it would be worth adding an explanation of this to the commit message
(and I see I should audit gitweb for similar problems elsewhere), but anyway
Acked-by: Jakub Narebski <jnareb@gmail.com>
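
For reference, a minimal sketch of the problem discussed above and of one
way around it. The to_utf8_sketch() helper is a hypothetical stand-in, not
gitweb's actual to_utf8(); it only assumes the behaviour described in this
thread (return flagged strings untouched, otherwise decode the bytes as
UTF-8). The sample values for $site_name and $project are made up.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for gitweb's to_utf8(): strings that already carry
# the UTF8 flag are returned as-is, byte strings are decoded as UTF-8.
sub to_utf8_sketch {
    my $str = shift;
    return undef unless defined $str;
    return $str if utf8::is_utf8($str);
    return Encode::decode('UTF-8', $str);
}

# Raw UTF-8 bytes, as they would come from a config file without "use utf8".
my $site_name = "Z\xc3\xbcrich";
# A character string with the UTF8 flag set (code point above 255).
my $project   = "\x{17c}ubr.git";

# Concatenate first, convert later: the bytes of $site_name are upgraded
# as Latin-1 during concatenation, and the helper sees a flagged string
# and leaves the damage in place.
my $bad  = to_utf8_sketch("$site_name - $project");

# Convert each piece first, then concatenate.
my $good = to_utf8_sketch($site_name) . " - " . to_utf8_sketch($project);

binmode(STDOUT, ':encoding(UTF-8)');
print "concat then convert: $bad\n";
print "convert then concat: $good\n";
print "strings ", ($bad eq $good ? "match" : "differ"), "\n";
```

The second form only works because every piece is converted before it
meets a flagged string; once the concatenation has happened there is no
way to tell the damaged part from the rest by looking at the flag alone.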
--
Jakub Narębski