From: Michael Wagner <accounts@mwagner.org>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names
Date: Thu, 15 May 2014 07:08:20 +0200
Message-ID: <20140515050820.GA30785@localhost.localdomain>
In-Reply-To: <CANQwDwdh1qQkYi9sB=22wbNnb+g5qv5prCzj2aWhHBbTZhVhdg@mail.gmail.com>
On Thu, May 15, 2014 at 12:25:45AM +0200, Jakub Narębski wrote:
> On Wed, May 14, 2014 at 11:57 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Michael Wagner <accounts@mwagner.org> writes:
> >
> >> Perl has an internal encoding used to store text strings. Currently, trying to
> >> view files with UTF-8 encoded names results in an error (either "404 - Cannot
> >> find file" [blob_plain] or "XML Parsing Error" [blob]). Converting these UTF-8
> >> encoded file names into Perl's internal format resolves these errors.
>
> Could you give us an example? What is important is whether filename
> is passed via path_info or via query string.
>
There is a file named "Gütekriterien.txt" in my repository. Trying to
view this file as "blob_plain" produces a 404 error (the file name is
displayed by an additional print statement):

$ REQUEST_METHOD=GET QUERY_STRING='p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD' ./gitweb.cgi
work/Gütekriterien.txt
Status: 404 Not Found

Decoding the UTF-8 encoded file name (again with an additional print
statement) makes the same request succeed:

$ REQUEST_METHOD=GET QUERY_STRING='p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD' ./gitweb.cgi
work/Gütekriterien.txt
Content-disposition: inline; filename="work/Gütekriterien.txt"

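To illustrate why one decode may not be enough here: the escape
sequence %C3%83%C2%BC in the query string above is the UTF-8 encoding of
the two characters U+00C3 U+00BC, which are themselves the bytes of a
UTF-8 encoded "ü" (U+00FC). The following is only a minimal standalone
sketch, not gitweb code. percent_decode() is a hypothetical helper, and
the explicit re-encoding to ISO-8859-1 is my assumption for making the
second step unambiguous; it is not what the patch does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of gitweb): undo %XX escaping and
# return the resulting byte string.
sub percent_decode {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $str;
}

# File name component exactly as it appears in the QUERY_STRING above.
my $raw = percent_decode('G%C3%83%C2%BCtekriterien.txt');

# First decode: bytes C3 83 C2 BC become the characters U+00C3 U+00BC.
my $once = decode('UTF-8', $raw);

# Turn those characters back into the bytes C3 BC, then decode again:
# this yields the single character U+00FC.
my $twice = decode('UTF-8', encode('ISO-8859-1', $once));

# Print code points instead of raw text to stay independent of the
# terminal encoding.
for my $pair ([ 'one decode ', $once ], [ 'two decodes', $twice ]) {
    my ($label, $text) = @$pair;
    my @cps = map { sprintf 'U+%04X', ord } split //, substr($text, 0, 3);
    printf "%s: %d characters, starts with %s\n",
        $label, length($text), join(' ', @cps);
}
```

If that reading is right, the file name arrives double-encoded, so a
single decode leaves a name that does not match any path in the
repository.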
> Because in evaluate_uri() there is
>
> our $path_info = decode_utf8($ENV{"PATH_INFO"});
>
> and in evaluate_query_params() there is
>
> $input_params{$name} = decode_utf8($cgi->param($symbol));
>
> >> Signed-off-by: Michael Wagner <accounts@mwagner.org>
> >> ---
> >
> > Cc'ing Jakub, who have been the area maintainer, for comments.
> >
> > One thing I wonder is that, if there are some additional calls to
> > encode() necessary before we embed $file_name (which are now decoded
> > to the internal string form, not a byte-sequence that happens to be
> > in utf-8) in the generated pages, if we were to do this change.
The generated pages show the correct file names.
>
> There should be no problem with output encoding. esc_path(), which
> should be used for filenames, includes to_utf8, which in turn uses
> decode($fallback_encoding, $str, Encode::FB_DEFAULT);
>
> >> gitweb/gitweb.perl | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> >> index a9f57d6..6046977 100755
> >> --- a/gitweb/gitweb.perl
> >> +++ b/gitweb/gitweb.perl
> >> @@ -1056,7 +1056,7 @@ sub evaluate_and_validate_params {
> >> }
> >> }
> >>
> >> - our $file_name = $input_params{'file_name'};
> >> + our $file_name = decode("utf-8", $input_params{'file_name'});
> >> if (defined $file_name) {
> >> if (!is_valid_pathname($file_name)) {
> >> die_error(400, "Invalid file parameter");
>
> Hmm... all %input_params should have been properly decoded
> already, how it was missed?
>
> Also, branchname (hash_base etc.), search query, filename in file_parent,
> project name can be UTF-8 too, so it is at best partial fix.
>
> --
> Jakub Narębski