All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Michal Kiedrowicz <michal.kiedrowicz@gmail.com>
Cc: git@vger.kernel.org
Subject: [PATCH/RFCv2 (version B)] gitweb: Allow UTF-8 encoded CGI query parameters and  path_info
Date: Fri, 3 Feb 2012 13:44:54 +0100	[thread overview]
Message-ID: <201202031344.55750.jnareb@gmail.com> (raw)
In-Reply-To: <20120203083935.5d9d4b18@mkiedrowicz.ivo.pl>

Gitweb tries hard to properly process UTF-8 data, by marking output
from git commands and contents of files as UTF-8 with to_utf8()
subroutine.  This ensures that gitweb would print correctly UTF-8
e.g. in 'log' and 'commit' views.

Unfortunately it misses another source of potentially Unicode input,
namely query parameters.  The result is that one cannot search for a
string containing characters outside US-ASCII.  For example searching
for "Michał Kiedrowicz" (containing letter 'ł' - LATIN SMALL LETTER L
WITH STROKE, with Unicode codepoint U+0142, represented with 0xc5 0x82
bytes in UTF-8 and percent-encoded as %C5%81) result in the following
incorrect data in search field

	Michał Kiedrowicz

This is caused by CGI by default treating '0xc5 0x82' bytes as two
characters in Perl legacy encoding latin-1 (iso-8859-1), because 's'
query parameter is not processed explicitly as UTF-8 encoded string.

The solution used here follows "Using Unicode in a Perl CGI script"
article on http://www.lemoda.net/cgi/perl-unicode/index.html:

	use CGI;
	use Encode 'decode_utf8;
	my $value = params('input');
	$value = decode_utf8($value);

Decoding UTF-8 is done when filling %input_params hash and $path_info
variable; the former required to move from explicit $cgi->param(<label>)
to $input_params{<name>} in a few places, which is a good idea anyway.

Another required change was to add -override=>1 parameter to
$cgi->textfield() invocation (in search form).  Otherwise CGI would
use values from query string if it is present, filling value from
$cgi->param... without decode_utf8().  As we are using value of
appropriate parameter anyway, -override=>1 doesn't change the
situation but makes gitweb fill search field correctly.

Alternate solution would be to simply use the '-utf8' pragma (via
"use CGI '-utf8';"), but according to CGI.pm documentation it may
cause problems with POST requests containing binary files... and
it requires CGI 3.31 (I think), released with perl v5.8.9.

Noticed-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Jakub Narębski <jnareb@gmail.com>
---
On Fri, 3 Feb 2012, Michal Kiedrowicz wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:

> > Is it what you mean by "this doesn't work for me", i.e. working
> > search, garbage in search field?
> 
> I mean "garbage in search field". Search works even without the patch
> (at least on Debian with git-1.7.7.3, perl-5.10.1 and CGI-3.43; I
> don't have my notebook nearby at the moment to check).
[...]

> > Damn.  If we use $cgi->textfield(-name => "s", -value => $searchtext)
> > like in gitweb, CGI.pm would read $cgi->param("s") by itself -
> > without decoding. 
> 
> Makes sense. When I tried calling to_utf8() in the line that defines
> textfield (this was my first approach to this problem), it haven't
> changed anything.

Yes, and it doesn't makes sense in gitweb case - we use value of 
$cgi->param("s") as default value of text field anyway, but in
Unicode-aware way.
 
> > To skip this we need to pass -force=>1  or
> > -override=>1 (i.e. further changes to gitweb).

This patch does this.  

Does it make work for you?

> > -utf8 pragma works with more modern CGI.pm, but does not with 3.10.

-utf8 pragma was added in CVS revision 1.238 of CGI.pm, which I think
is present in CGI 3.31, released with perl v5.8.9.  Theoretically gitweb
maintains backward compatibility with perl v5.8.3 or something 
("use 5.008;" but IIRC 5.8.3 is needed for correct Unicde handling anyway).

 gitweb/gitweb.perl |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 9cf7e71..bd5fff9 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -52,7 +52,7 @@ sub evaluate_uri {
 	# as base URL.
 	# Therefore, if we needed to strip PATH_INFO, then we know that we have
 	# to build the base URL ourselves:
-	our $path_info = $ENV{"PATH_INFO"};
+	our $path_info = decode_utf8($ENV{"PATH_INFO"});
 	if ($path_info) {
 		if ($my_url =~ s,\Q$path_info\E$,, &&
 		    $my_uri =~ s,\Q$path_info\E$,, &&
@@ -816,9 +816,9 @@ sub evaluate_query_params {
 
 	while (my ($name, $symbol) = each %cgi_param_mapping) {
 		if ($symbol eq 'opt') {
-			$input_params{$name} = [ $cgi->param($symbol) ];
+			$input_params{$name} = [ map { decode_utf8($_) } $cgi->param($symbol) ];
 		} else {
-			$input_params{$name} = $cgi->param($symbol);
+			$input_params{$name} = decode_utf8($cgi->param($symbol));
 		}
 	}
 }
@@ -2767,7 +2767,7 @@ sub git_populate_project_tagcloud {
 	}
 
 	my $cloud;
-	my $matched = $cgi->param('by_tag');
+	my $matched = $input_params{'ctag'};
 	if (eval { require HTML::TagCloud; 1; }) {
 		$cloud = HTML::TagCloud->new;
 		foreach my $ctag (sort keys %ctags_lc) {
@@ -3873,7 +3873,7 @@ sub print_search_form {
 	                       -values => ['commit', 'grep', 'author', 'committer', 'pickaxe']) .
 	      $cgi->sup($cgi->a({-href => href(action=>"search_help")}, "?")) .
 	      " search:\n",
-	      $cgi->textfield(-name => "s", -value => $searchtext) . "\n" .
+	      $cgi->textfield(-name => "s", -value => $searchtext, -override => 1) . "\n" .
 	      "<span title=\"Extended regular expression\">" .
 	      $cgi->checkbox(-name => 'sr', -value => 1, -label => 're',
 	                     -checked => $search_use_regexp) .
@@ -5282,7 +5282,7 @@ sub git_project_list_body {
 
 	my $check_forks = gitweb_check_feature('forks');
 	my $show_ctags  = gitweb_check_feature('ctags');
-	my $tagfilter = $show_ctags ? $cgi->param('by_tag') : undef;
+	my $tagfilter = $show_ctags ? $input_params{'ctag'} : undef;
 	$check_forks = undef
 		if ($tagfilter || $searchtext);
 
@@ -5994,7 +5994,7 @@ sub git_project_list {
 	}
 	print $cgi->startform(-method => "get") .
 	      "<p class=\"projsearch\">Search:\n" .
-	      $cgi->textfield(-name => "s", -value => $searchtext) . "\n" .
+	      $cgi->textfield(-name => "s", -value => $searchtext, -override => 1) . "\n" .
 	      "</p>" .
 	      $cgi->end_form() . "\n";
 	git_project_list_body(\@list, $order);
@@ -6197,7 +6197,7 @@ sub git_tag {
 
 sub git_blame_common {
 	my $format = shift || 'porcelain';
-	if ($format eq 'porcelain' && $cgi->param('js')) {
+	if ($format eq 'porcelain' && $input_params{'javascript'}) {
 		$format = 'incremental';
 		$action = 'blame_incremental'; # for page title etc
 	}
-- 
1.7.6

  reply	other threads:[~2012-02-03 12:44 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-01 22:50 [RFC PATCH] gitweb: use CGI with -utf8 Michał Kiedrowicz
2012-02-02 20:01 ` Jakub Narebski
2012-02-02 20:08   ` [PATCH/RFC (version A)] gitweb: use CGI with -utf8 to process Unicode query parameters correctly Jakub Narebski
2012-02-02 20:11     ` Jakub Narebski
2012-02-02 20:43     ` Michał Kiedrowicz
2012-02-02 20:10   ` [PATCH/RFC (version B)] gitweb: Allow UTF-8 encoded CGI query parameters and path_info Jakub Narebski
2012-02-02 20:46     ` Michał Kiedrowicz
2012-02-02 21:07       ` Jakub Narebski
2012-02-02 22:57         ` Jakub Narebski
2012-02-03  7:39           ` Michal Kiedrowicz
2012-02-03 12:44             ` Jakub Narebski [this message]
2012-02-03 17:45               ` [PATCH/RFCv2 " Michał Kiedrowicz
2012-02-03 21:09               ` Junio C Hamano
2012-02-02 20:38   ` [RFC PATCH] gitweb: use CGI with -utf8 Michał Kiedrowicz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201202031344.55750.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=michal.kiedrowicz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.