From: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH/RFC (version A)] gitweb: use CGI with -utf8 to process Unicode query parameters correctly
Date: Thu, 2 Feb 2012 21:43:36 +0100
Message-ID: <20120202214336.0c9daf9f@gmail.com>
In-Reply-To: <201202022108.51353.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> wrote:

> Gitweb tries hard to process UTF-8 data properly, by marking the output
> of git commands and the contents of files as UTF-8 with the to_utf8()
> subroutine.  This ensures that gitweb prints UTF-8 correctly, e.g. in
> the 'log' and 'commit' views.
> 
> Unfortunately it misses another source of potentially Unicode input,
> namely query parameters.  As a result, one cannot search for a string
> containing characters outside US-ASCII.  For example, searching for
> "Michał Kiedrowicz" (containing the letter 'ł', LATIN SMALL LETTER L
> WITH STROKE, Unicode codepoint U+0142, represented by the bytes
> 0xc5 0x82 in UTF-8 and percent-encoded as %C5%82) results in the
> following incorrect data in the search field:
> 
> 	MichaÅ\x82 Kiedrowicz
> 
> This is caused by CGI, which by default treats the '0xc5 0x82' bytes as
> two characters in Perl's legacy encoding latin-1 (iso-8859-1), because
> the 's' query parameter is not explicitly processed as a UTF-8 encoded
> string.
> 
> According to the "Using Unicode in a Perl CGI script" article at
> http://www.lemoda.net/cgi/perl-unicode/index.html, the simplest
> solution is to import the '-utf8' pragma for the CGI module:
> 
> 	use CGI '-utf8';
> 	my $value = param('input');
> 
> According to the CGI module documentation, the '-utf8' pragma may cause
> problems with POST requests containing binary files... but gitweb
> currently does not use POST requests at all, so this should not be a
> problem for now.

This was exactly my feeling when I sent this patch :).

> 
> An alternative solution would be to explicitly decode query parameters
> when storing them in %input_params (and perhaps also path_info).
> 
> [jn: reworded / rewritten commit message]
> 
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>

Thanks, I forgot about that.

> Signed-off-by: Jakub Narębski <jnareb@gmail.com>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 9cf7e71..a7441ef 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -10,7 +10,7 @@
>  use 5.008;
>  use strict;
>  use warnings;
> -use CGI qw(:standard :escapeHTML -nosticky);
> +use CGI qw(:standard :escapeHTML -nosticky -utf8);
>  use CGI::Util qw(unescape);
>  use CGI::Carp qw(fatalsToBrowser set_message);
>  use Encode;
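
For illustration only (not part of the patch): a minimal Perl sketch of the failure described in the commit message and of the explicit-decoding alternative. It assumes a single-valued 's' parameter and uses only the core Encode module; the sample query string, the variable names and the decode_params() helper are hypothetical and are not gitweb code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw value of the 's' query parameter for "Michał Kiedrowicz":
# 'ł' (U+0142) is UTF-8 0xC5 0x82, percent-encoded; '+' is a space.
my $raw = 'Micha%C5%82+Kiedrowicz';

# Undo the query-string encoding ('+' -> space, %XX -> one byte).
# The result is a byte string: Perl sees each byte as one latin-1
# character, so 'ł' becomes "\x{C5}\x{82}" (shown as "Å\x82").
(my $bytes = $raw) =~ tr/+/ /;
$bytes =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
printf "as bytes:      %d chars\n", length $bytes;    # 18 chars

# Decoding the bytes as UTF-8 yields the intended character string.
my $chars = decode('UTF-8', $bytes);
printf "as UTF-8:      %d chars\n", length $chars;    # 17 chars

# Hypothetical helper in the spirit of the alternative solution:
# decode every stored parameter value explicitly instead of relying
# on CGI's -utf8.  Assumes single-valued (scalar) parameters.
sub decode_params {
    my %raw_params = @_;
    my %decoded;
    for my $name (keys %raw_params) {
        $decoded{$name} = decode('UTF-8', $raw_params{$name});
    }
    return %decoded;
}

my %params = decode_params('s' => $bytes);
printf "decode_params: %d chars\n", length $params{'s'};   # 17 chars
```

Passing '-utf8' to CGI, as the patch does, has CGI perform roughly this decoding itself when param() is called. The explicit variant keeps the decoding visible in gitweb, but it has to be applied to every value (and possibly to path_info as well).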

Thread overview: 14+ messages
2012-02-01 22:50 [RFC PATCH] gitweb: use CGI with -utf8 Michał Kiedrowicz
2012-02-02 20:01 ` Jakub Narebski
2012-02-02 20:08   ` [PATCH/RFC (version A)] gitweb: use CGI with -utf8 to process Unicode query parameters correctly Jakub Narebski
2012-02-02 20:11     ` Jakub Narebski
2012-02-02 20:43     ` Michał Kiedrowicz [this message]
2012-02-02 20:10   ` [PATCH/RFC (version B)] gitweb: Allow UTF-8 encoded CGI query parameters and path_info Jakub Narebski
2012-02-02 20:46     ` Michał Kiedrowicz
2012-02-02 21:07       ` Jakub Narebski
2012-02-02 22:57         ` Jakub Narebski
2012-02-03  7:39           ` Michal Kiedrowicz
2012-02-03 12:44             ` [PATCH/RFCv2 " Jakub Narebski
2012-02-03 17:45               ` Michał Kiedrowicz
2012-02-03 21:09               ` Junio C Hamano
2012-02-02 20:38   ` [RFC PATCH] gitweb: use CGI with -utf8 Michał Kiedrowicz
