From: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH/RFCv2 (version B)] gitweb: Allow UTF-8 encoded CGI query parameters and path_info
Date: Fri, 3 Feb 2012 18:45:57 +0100
Message-ID: <20120203184557.59042dec@gmail.com>
In-Reply-To: <201202031344.55750.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> wrote:

> Gitweb tries hard to properly process UTF-8 data by marking output
> from git commands and contents of files as UTF-8 with the to_utf8()
> subroutine.  This ensures that gitweb prints UTF-8 correctly,
> e.g. in the 'log' and 'commit' views.
> 
> Unfortunately it misses another source of potentially Unicode input,
> namely query parameters.  The result is that one cannot search for a
> string containing characters outside US-ASCII.  For example, searching
> for "Michał Kiedrowicz" (containing the letter 'ł' - LATIN SMALL LETTER L
> WITH STROKE, Unicode codepoint U+0142, represented by the bytes 0xc5 0x82
> in UTF-8 and percent-encoded as %C5%82) results in the following
> incorrect data in the search field:
> 
> 	MichaÅ Kiedrowicz
> 
> This is caused by CGI by default treating the bytes 0xc5 0x82 as two
> characters in Perl's legacy encoding latin-1 (iso-8859-1), because the 's'
> query parameter is not processed explicitly as a UTF-8 encoded string.
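
For reference, the misinterpretation described above can be reproduced
outside gitweb with only the core Encode module.  This is a minimal
sketch, not gitweb code; the variable names are made up for
illustration.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Raw bytes as they arrive after percent-decoding "Micha%C5%82".
my $bytes = "Micha\xC5\x82";

# Read as latin-1, each byte becomes a character of its own.
my $as_latin1 = decode('iso-8859-1', $bytes);

# Read as UTF-8, the two bytes combine into the single character U+0142.
my $as_utf8 = decode_utf8($bytes);

printf "latin-1: %d chars, ending in U+%04X U+%04X\n",
    length($as_latin1),
    ord(substr($as_latin1, -2, 1)),
    ord(substr($as_latin1, -1));
printf "UTF-8:   %d chars, ending in U+%04X\n",
    length($as_utf8),
    ord(substr($as_utf8, -1));
```

The latin-1 reading yields 7 characters ending in U+00C5 U+0082, while
the UTF-8 reading yields 6 characters ending in U+0142.
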
> 
> The solution used here follows the "Using Unicode in a Perl CGI script"
> article at http://www.lemoda.net/cgi/perl-unicode/index.html:
> 
> 	use CGI;
> 	use Encode 'decode_utf8';
> 	my $cgi = CGI->new;
> 	my $value = $cgi->param('input');  # raw bytes from the request
> 	$value = decode_utf8($value);      # now a Perl character string
> 
> Decoding UTF-8 is done when filling the %input_params hash and the
> $path_info variable; the former required moving from explicit
> $cgi->param(<label>) to $input_params{<name>} in a few places, which is
> a good idea anyway.
> 
> Another required change was to add the -override=>1 parameter to the
> $cgi->textfield() invocation (in the search form).  Otherwise CGI would
> use the value from the query string if one is present, filling the field
> from $cgi->param... without decode_utf8().  As we are using the value of
> the appropriate parameter anyway, -override=>1 doesn't change the
> situation but makes gitweb fill the search field correctly.
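
The effect of -override=>1 can be seen in isolation with the sketch
below.  It assumes CGI.pm is installed (it is no longer bundled with
recent perls), fakes the request by passing a query string to
CGI->new, and uses $searchtext only as an illustrative name.  It is
not the gitweb code itself.

```perl
use strict;
use warnings;
use CGI;
use Encode qw(decode_utf8);

# Simulate a request whose query string is "s=Micha%C5%82".
my $cgi = CGI->new('s=Micha%C5%82');

# Decode once, in the spirit of what the patch does for %input_params.
my $searchtext = decode_utf8(scalar $cgi->param('s'));

# Sticky default: CGI.pm ignores -value and reuses its own undecoded "s".
my $sticky = $cgi->textfield(-name => 's', -value => $searchtext);

# With -override => 1 the decoded -value is used instead.
my $forced = $cgi->textfield(-name => 's', -value => $searchtext,
                             -override => 1);

binmode STDOUT, ':encoding(UTF-8)';
print "$sticky\n$forced\n";
```

Only the second field should carry the decoded character U+0142; the
first one still holds the two raw bytes.
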
> 
> An alternative solution would be to simply use the '-utf8' pragma (via
> "use CGI '-utf8';"), but according to the CGI.pm documentation it may
> cause problems with POST requests containing binary files... and
> it requires CGI 3.31 (I think), released with perl v5.8.9.
> 
> Noticed-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> Signed-off-by: Jakub Narębski <jnareb@gmail.com>
> ---
> On Fri, 3 Feb 2012, Michal Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> 
> > > Is it what you mean by "this doesn't work for me", i.e. working
> > > search, garbage in search field?
> > 
> > I mean "garbage in search field". Search works even without the patch
> > (at least on Debian with git-1.7.7.3, perl-5.10.1 and CGI-3.43; I
> > don't have my notebook nearby at the moment to check).
> [...]
> 
> > > Damn.  If we use $cgi->textfield(-name => "s", -value => $searchtext)
> > > like in gitweb, CGI.pm would read $cgi->param("s") by itself -
> > > without decoding. 
> > 
> > Makes sense. When I tried calling to_utf8() in the line that defines
> > textfield (this was my first approach to this problem), it didn't
> > change anything.
> 
> Yes, and it doesn't make sense in gitweb's case - we use the value of
> $cgi->param("s") as the default value of the text field anyway, but in
> a Unicode-aware way.
>  
> > > To skip this we need to pass -force=>1  or
> > > -override=>1 (i.e. further changes to gitweb).
> 
> This patch does this.  
> 
> Does it work for you?
> 

Yes, it works for me. Search form properly displays "ł". Thanks!
