From: Jakub Narebski <jnareb@gmail.com>
To: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
Cc: git@vger.kernel.org
Subject: [PATCH/RFC (version A)] gitweb: use CGI with -utf8 to process Unicode query parameters correctly
Date: Thu, 2 Feb 2012 21:08:50 +0100
Message-ID: <201202022108.51353.jnareb@gmail.com>
In-Reply-To: <m37h05c8c1.fsf@localhost.localdomain>
Gitweb tries hard to process UTF-8 data properly, by marking output
from git commands and contents of files as UTF-8 with the to_utf8()
subroutine. This ensures that gitweb prints UTF-8 correctly,
e.g. in the 'log' and 'commit' views.
Unfortunately it misses another source of potentially Unicode input,
namely query parameters. The result is that one cannot search for a
string containing characters outside US-ASCII. For example, searching
for "Michał Kiedrowicz" (containing the letter 'ł' - LATIN SMALL LETTER L
WITH STROKE, with Unicode codepoint U+0142, represented by the bytes
0xc5 0x82 in UTF-8 and percent-encoded as %C5%82) results in the
following incorrect data in the search field:
MichaÅ Kiedrowicz
This happens because CGI by default treats the bytes 0xc5 0x82 as two
characters in Perl's legacy encoding, latin-1 (iso-8859-1); the 's'
query parameter is never explicitly decoded as a UTF-8 string.
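To illustrate the difference, here is a minimal standalone sketch (not
part of the patch) that decodes the same two bytes both ways using the
core Encode module. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they look after percent-decoding "%C5%82".
my $bytes = "\xc5\x82";

# Interpreted as latin-1, every byte becomes its own character.
my $as_latin1 = decode('iso-8859-1', $bytes);

# Interpreted as UTF-8, the two bytes form one character, U+0142.
my $as_utf8 = decode('UTF-8', $bytes);

printf "latin-1: %d characters\n", length($as_latin1);
printf "UTF-8:   %d character, U+%04X\n", length($as_utf8), ord($as_utf8);
```

The latin-1 interpretation yields two characters (U+00C5 and the
control character U+0082), which is why the search field shows 'Å'
followed by an invisible character instead of 'ł'.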
According to the "Using Unicode in a Perl CGI script" article at
http://www.lemoda.net/cgi/perl-unicode/index.html the simplest
solution is to just import the '-utf8' pragma for the CGI module:
    use CGI '-utf8';
    my $value = param('input');
According to the CGI module documentation, the '-utf8' pragma may cause
problems with POST requests containing binary files... but gitweb
currently does not use POST requests at all, so this should not be a
problem for now.
An alternative solution would be to explicitly decode query parameters
when storing them in %input_params (and perhaps also path_info).
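A rough sketch of what such explicit decoding could look like follows.
It is not taken from gitweb: %raw_params is a hypothetical stand-in for
the byte strings CGI returns without '-utf8', and only the
%input_params name comes from the text above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw values, as byte strings after percent-decoding.
my %raw_params = (
    a => 'search',
    s => "Micha\xc5\x82 Kiedrowicz",
);

# Decode each value from UTF-8 bytes into a Perl character string
# at the point where it is stored.
my %input_params;
for my $name (keys %raw_params) {
    $input_params{$name} = decode('UTF-8', $raw_params{$name});
}

printf "raw: %d bytes, decoded: %d characters\n",
    length($raw_params{s}), length($input_params{s});
```

Decoding at the point of storage would keep the rest of the code
working on character strings, at the cost of touching every place
where parameters are read.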
[jn: reworded / rewritten commit message]
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Jakub Narębski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 9cf7e71..a7441ef 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -10,7 +10,7 @@
use 5.008;
use strict;
use warnings;
-use CGI qw(:standard :escapeHTML -nosticky);
+use CGI qw(:standard :escapeHTML -nosticky -utf8);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser set_message);
use Encode;
--
1.7.6