From: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH/RFC (version B)] gitweb: Allow UTF-8 encoded CGI query parameters and path_info
Date: Thu, 2 Feb 2012 21:46:46 +0100
Message-ID: <20120202214646.1b84f23e@gmail.com>
In-Reply-To: <201202022110.07127.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> wrote:

> Gitweb tries hard to process UTF-8 data properly, by marking output
> from git commands and contents of files as UTF-8 with the to_utf8()
> subroutine.  This ensures that gitweb prints UTF-8 correctly,
> e.g. in 'log' and 'commit' views.
> 
> Unfortunately it misses another source of potentially Unicode input,
> namely query parameters.  The result is that one cannot search for a
> string containing characters outside US-ASCII.  For example, searching
> for "Michał Kiedrowicz" (containing letter 'ł' - LATIN SMALL LETTER L
> WITH STROKE, with Unicode codepoint U+0142, represented with 0xc5 0x82
> bytes in UTF-8 and percent-encoded as %C5%82) results in the following
> incorrect data in the search field
> 
> 	MichaÅ Kiedrowicz
> 
> This is caused by CGI by default treating '0xc5 0x82' bytes as two
> characters in Perl legacy encoding latin-1 (iso-8859-1), because 's'
> query parameter is not processed explicitly as UTF-8 encoded string.
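
The byte-versus-character distinction described above can be checked with
a few lines of Perl. This is only a minimal sketch using the core Encode
module; the variable names are made up for illustration and nothing here
is taken from gitweb itself.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Raw bytes as they arrive after percent-decoding "%C5%82".
my $bytes = "\xC5\x82";

# Without decoding, Perl sees two characters (code points 0xC5 and 0x82),
# i.e. the latin-1 interpretation of the two bytes.
my $undecoded_length = length($bytes);

# After decoding, Perl sees a single character, U+0142.
my $char           = decode_utf8($bytes);
my $decoded_length = length($char);
my $codepoint      = sprintf("U+%04X", ord($char));

print "$undecoded_length $decoded_length $codepoint\n";   # prints "2 1 U+0142"
```
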
> 
> The solution used here follows "Using Unicode in a Perl CGI script"
> article on http://www.lemoda.net/cgi/perl-unicode/index.html:
> 
> 	use CGI;
> 	use Encode 'decode_utf8';
> 	my $cgi = CGI->new;
> 	my $value = $cgi->param('input');
> 	$value = decode_utf8($value);
> 
> This is done when filling the %input_params hash; this required moving
> from explicit $cgi->param(<label>) to $input_params{<name>} in a few
> places.

I'm sorry, but this doesn't work for me. I would be happy to help if you
have any questions about it.

> 
> An alternative solution would be to simply use the '-utf8' pragma (via
> "use CGI '-utf8';"), but according to the CGI.pm documentation it may
> cause problems with POST requests containing binary files... and
> it doesn't work with the old CGI.pm version 3.10 from Perl v5.8.6.
> 
> Noticed-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> Signed-off-by: Jakub Narębski <jnareb@gmail.com>
> ---
>  gitweb/gitweb.perl |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 9cf7e71..55b2c24 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -52,7 +52,7 @@ sub evaluate_uri {
>  	# as base URL.
>  	# Therefore, if we needed to strip PATH_INFO, then we know that we have
>  	# to build the base URL ourselves:
> -	our $path_info = $ENV{"PATH_INFO"};
> +	our $path_info = decode_utf8($ENV{"PATH_INFO"});
>  	if ($path_info) {
>  		if ($my_url =~ s,\Q$path_info\E$,, &&
>  		    $my_uri =~ s,\Q$path_info\E$,, &&
> @@ -816,9 +816,9 @@ sub evaluate_query_params {
>  
>  	while (my ($name, $symbol) = each %cgi_param_mapping) {
>  		if ($symbol eq 'opt') {
> -			$input_params{$name} = [ $cgi->param($symbol) ];
> +			$input_params{$name} = [ map { decode_utf8($_) } $cgi->param($symbol) ];
>  		} else {
> -			$input_params{$name} = $cgi->param($symbol);
> +			$input_params{$name} = decode_utf8($cgi->param($symbol));
>  		}
>  	}
>  }
> @@ -2767,7 +2767,7 @@ sub git_populate_project_tagcloud {
>  	}
>  
>  	my $cloud;
> -	my $matched = $cgi->param('by_tag');
> +	my $matched = $input_params{'ctag'};
>  	if (eval { require HTML::TagCloud; 1; }) {
>  		$cloud = HTML::TagCloud->new;
>  		foreach my $ctag (sort keys %ctags_lc) {
> @@ -5282,7 +5282,7 @@ sub git_project_list_body {
>  
>  	my $check_forks = gitweb_check_feature('forks');
>  	my $show_ctags  = gitweb_check_feature('ctags');
> -	my $tagfilter = $show_ctags ? $cgi->param('by_tag') : undef;
> +	my $tagfilter = $show_ctags ? $input_params{'ctag'} : undef;
>  	$check_forks = undef
>  		if ($tagfilter || $searchtext);
>  
> @@ -6197,7 +6197,7 @@ sub git_tag {
>  
>  sub git_blame_common {
>  	my $format = shift || 'porcelain';
> -	if ($format eq 'porcelain' && $cgi->param('js')) {
> +	if ($format eq 'porcelain' && $input_params{'javascript'}) {
>  		$format = 'incremental';
>  		$action = 'blame_incremental'; # for page title etc
>  	}
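
For reference, the decoding step in the evaluate_query_params hunk can be
tried in isolation. The sketch below is an assumption-laden stand-in, not
gitweb code: %raw is a hypothetical replacement for what $cgi->param()
would return (raw bytes), and the keys 'searchtext' and 'extra_options'
are used here only as illustrative names.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical stand-in for the raw byte strings returned by $cgi->param().
my %raw = (
    s   => "Micha\xC5\x82 Kiedrowicz",
    opt => [ "--author=Micha\xC5\x82", "--no-merges" ],
);

my %input_params;

# Multi-valued parameter: decode each element, as the 'opt' branch does.
$input_params{extra_options} = [ map { decode_utf8($_) } @{ $raw{opt} } ];

# Single-valued parameter: decode the one value.
$input_params{searchtext} = decode_utf8($raw{s});

# The decoded search text is 17 characters long, whereas the raw value
# is 18 bytes, because 'ł' takes two bytes in UTF-8.
printf "%d bytes, %d characters\n",
    length($raw{s}), length($input_params{searchtext});
```

Plain ASCII values such as "--no-merges" pass through unchanged, so
decoding every parameter is harmless for the common case.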


