From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: Juergen Kreileder <jk@blackdown.de>,
John Hawley <warthog9@kernel.org>,
admin@repo.or.cz
Subject: [PATCH 4/3] gitweb: Fix fallback mode of to_utf8 subroutine
Date: Mon, 19 Dec 2011 01:54:16 +0100
Message-ID: <201112190154.19107.jnareb@gmail.com>
In-Reply-To: <1324113743-21498-1-git-send-email-jnareb@gmail.com>
e5d3de5 (gitweb: use Perl built-in utf8 function for UTF-8 decoding.,
2007-12-04) was meant to make gitweb faster by using Perl's internals
(see subsection "Messing with Perl's Internals" in Encode(3pm) manpage).
A simple benchmark confirms this (old = 00f429a, new = this version);
note that it is a synthetic benchmark of the standalone subroutines, not
of gitweb itself:

        old   new
  old    --  -65%
  new  189%    --
Unfortunately it also broke the fallback mode of to_utf8... except
for the default value 'latin1' of $fallback_encoding ('latin1' is Perl's
native encoding), which is why it went unnoticed for such a long time.
utf8::valid(STRING) is an internal function that tests whether STRING
is in a _consistent state_ regarding UTF-8. It returns true if STRING is
well-formed UTF-8 and has the UTF-8 flag on _*or*_ if STRING is held
as bytes (both these states are 'consistent'). For gitweb the second
case always applied, as output from git commands is read from
filehandles opened without the ':utf8' layer.
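As an illustration of the point above, here is a minimal standalone sketch (not part of the patch) showing that utf8::valid() cannot tell UTF-8 octets from latin1 octets. The sample strings and the %report hash are made up for this example.

```perl
use strict;
use warnings;

# Both strings are plain byte strings (no UTF-8 flag), as if read from
# a filehandle opened without the ':utf8' layer.
my $latin1_bytes = "caf\xE9";       # 0xE9 alone is not well-formed UTF-8
my $utf8_bytes   = "caf\xC3\xA9";   # the same word as UTF-8 octets

my %report;
for my $case ([latin1 => $latin1_bytes], [utf8 => $utf8_bytes]) {
    my ($name, $s) = @$case;
    $report{$name} = {
        flagged => utf8::is_utf8($s) ? 1 : 0,   # is the UTF-8 flag set?
        valid   => utf8::valid($s)   ? 1 : 0,   # "consistent state" check
    };
    printf "%-6s flagged=%d valid=%d\n",
        $name, @{ $report{$name} }{qw(flagged valid)};
}
```

Neither string carries the UTF-8 flag, so both count as 'consistent' and utf8::valid() is true for each; the check never routes anything to the fallback branch.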
What made it work at all for STRING in 'latin1' encoding is the fact
that utf8::decode(STRING) turns on the UTF-8 flag only if the source
string is valid UTF-8 and contains multi-byte UTF-8 characters... and
that a string without the UTF-8 flag set is treated as being in the
native Perl encoding, i.e. 'latin1' / 'iso-8859-1' (unless the native
encoding is EBCDIC ;-)). It was the ':utf8' layer that actually converted
'latin1' (no UTF-8 flag == native == 'latin1') to 'utf8'.
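The behaviour of utf8::decode() described above can be observed with a small standalone sketch (again not part of the patch). The three sample inputs and the %result hash are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my %result;
for my $case (
    [ascii  => "abc"],        # valid UTF-8, but no multi-byte characters
    [utf8   => "\xC4\x85"],   # valid UTF-8, one two-byte character
    [latin1 => "\xE9"],       # not well-formed UTF-8
) {
    my ($name, $s) = @$case;
    my $ok = utf8::decode($s);    # decodes in place, returns true/false
    $result{$name} = {
        ok      => $ok ? 1 : 0,
        flagged => utf8::is_utf8($s) ? 1 : 0,
        length  => length($s),
    };
    printf "%-6s ok=%d flagged=%d length=%d\n",
        $name, @{ $result{$name} }{qw(ok flagged length)};
}
```

Only the multi-byte input ends up with the UTF-8 flag set; the latin1 byte is left untouched and utf8::decode() reports failure for it.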
Let's make use of the fact that utf8::decode(STRING) returns false if
STRING is invalid as UTF-8 to decide whether to enable fallback mode.
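For a quick check outside gitweb, the patched subroutine can be exercised standalone roughly as follows. This is only a sketch: the 'iso-8859-2' value of $fallback_encoding and the sample bytes are assumptions picked so that the fallback result differs from a latin1 interpretation.

```perl
use strict;
use warnings;
use Encode;

# Assumed non-default setting, for illustration only.
our $fallback_encoding = 'iso-8859-2';

# Same logic as the patched to_utf8 in gitweb/gitweb.perl.
sub to_utf8 {
    my $str = shift;
    return undef unless defined $str;
    if (utf8::valid($str) && utf8::decode($str)) {
        return $str;
    } else {
        return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
    }
}

# 0xB1 is U+0105 (a with ogonek) in iso-8859-2, but U+00B1 in latin1.
my $from_fallback = to_utf8("\xB1");
# The same character given as UTF-8 octets.
my $from_utf8     = to_utf8("\xC4\x85");

printf "fallback: U+%04X, utf8: U+%04X\n",
    ord($from_fallback), ord($from_utf8);
```

With the condition as patched, the lone 0xB1 byte fails utf8::decode() and is decoded via $fallback_encoding, so both calls should yield the same character.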
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Excuse me for the overly long commit message...
Resent as part of the to_utf8 fixes for better visibility.
gitweb/gitweb.perl | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d24763b..75b0970 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1443,8 +1443,7 @@ sub validate_refname {
 sub to_utf8 {
 	my $str = shift;
 	return undef unless defined $str;
-	if (utf8::valid($str)) {
-		utf8::decode($str);
+	if (utf8::valid($str) && utf8::decode($str)) {
 		return $str;
 	} else {
 		return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
--
1.7.6