From: "Jakub Narębski" <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Jürgen Kreileder" <jk@blackdown.de>, git@vger.kernel.org
Subject: Re: [PATCH 1/4] gitweb: Fix utf8 encoding for blob_plain, blobdiff_plain, commitdiff_plain, and patch
Date: Tue, 09 Apr 2013 13:31:57 +0200 [thread overview]
Message-ID: <5163FC2D.9050408@gmail.com> (raw)
In-Reply-To: <7vd2u4vg4o.fsf@alter.siamese.dyndns.org>
On 08.04.2013, Junio C Hamano wrote:
> jk@blackdown.de (Jürgen Kreileder) writes:
>
>> Fixes the encoding for several _plain actions and for text/* and */*+xml blobs.
>>
>> Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
I see that this patch does (or tries to do) two _independent_ things,
and should be split into at least two separate commits:
>> ---
>
> Thanks, will queue but not hold until I hear something from Jakub.
>
>> gitweb/gitweb.perl | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 1309196..9cfe5b5 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -3823,7 +3823,7 @@ sub blob_contenttype {
>> my ($fd, $file_name, $type) = @_;
>>
>> $type ||= blob_mimetype($fd, $file_name);
>> - if ($type eq 'text/plain' && defined $default_text_plain_charset) {
>> + if (($type =~ m!^text/\w[-\w]*$! || $type =~ m!^\w[-\w]*/\w[-\w]*\+xml$!) && defined $default_text_plain_charset) {
>> $type .= "; charset=$default_text_plain_charset";
>> }
First, it extends adding "; charset=$default_text_plain_charset" to
other mimetypes for 'blob_plain' view to all 'text/*' and '*/*+xml'
mimetypes (without changing name of variable... though this is more
complicated as it is configuration variable and we would want to
preserve backward compatibility, but at least a comment would be,
I think, needed).
Originally it applied only to 'text/plain' files, which can be
displayed inline by web browser, and which need charset in
'Content-Type:' HTTP header to be displayed correctly, as they
do not include such information inside the file.
'text/html' and 'application/xhtml+xml' can include such information
inside them (meta http-equiv for 'text/html' and <?xml ...> for
'application/xhtml+xml'). I don't know what browser does when there
is conflicting information about charset, i.e. which one wins, but
it is certainly something to consider.
It might be a good change; I don't know what web browser do when
serving 'text/css', 'text/javascript', 'text/xml' to client to
view without media type known.
BTW I have noticed that we do $prevent_xss dance outside
blob_contenttype(), in it's only caller i.e. git_blob_plain()...
which for example means that 'text/html' converted to 'text/plain'
don't get '; charset=...' added. I guess that it *might* be
what prompted this part of change... but if it is so, it needs
to be fixed at source, e.g. by moving $prevent_xss to
blob_contenttype() subroutine.
About implementation:
>> + if (($type =~ m!^text/\w[-\w]*$! || $type =~ m!^\w[-\w]*/\w[-\w]*\+xml$!) && defined $default_text_plain_charset) {
>> $type .= "; charset=$default_text_plain_charset";
would be better with line split
>> + if (($type =~ m!^text/\w[-\w]*$! || $type =~ m!^\w[-\w]*/\w[-\w]*\+xml$!) &&
>> + defined $default_text_plain_charset) {
>> $type .= "; charset=$default_text_plain_charset";
Also: do we need to be strict with '\w[-\w]*', or would '.*' suffice?
>> @@ -7637,7 +7637,9 @@ sub git_blobdiff {
>> last if $line =~ m!^\+\+\+!;
>> }
>> local $/ = undef;
>> + binmode STDOUT, ':raw';
>> print <$fd>;
>> + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>> close $fd;
>> }
>> }
>> @@ -7884,12 +7886,16 @@ sub git_commitdiff {
>>
>> } elsif ($format eq 'plain') {
>> local $/ = undef;
>> + binmode STDOUT, ':raw';
>> print <$fd>;
>> + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>> close $fd
>> or print "Reading git-diff-tree failed\n";
>> } elsif ($format eq 'patch') {
>> local $/ = undef;
>> + binmode STDOUT, ':raw';
>> print <$fd>;
>> + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>> close $fd
>> or print "Reading git-format-patch failed\n";
>> }
Second it changes 'blobdiff_plain', 'commitdiff_plain' (which I think
that should be abandoned in favor of 'patch' view; but that is
a separate issue) and 'patch' views so they use binary-safe output.
Note that in all cases (I think) we use
$cgi->header(
-type => 'text/plain',
-charset => 'utf-8',
...
);
promising web browser that output is as whole in 'utf-8' encoding.
It is not explained in the commit message what is the reason for this
change. Is it better handing of a situation where files being diff-ed
being in other encoding (like for example in commit that changes
encoding of files from legacy encoding such like e.g. iso-8859-2
or cp1250 to utf-8)? But what about -charset => 'utf-8' then?
About implementation: I think after this change common code crosses
threshold for refactoring it into subroutine, for example:
sub dump_fh_raw {
my $fh = shift;
local $/ = undef;
binmode STDOUT, ':raw';
print <$fh>;
binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
return $fh;
}
--
Jakub Narębski
next prev parent reply other threads:[~2013-04-09 11:32 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-08 20:08 [PATCH 1/4] gitweb: Fix utf8 encoding for blob_plain, blobdiff_plain, commitdiff_plain, and patch Jürgen Kreileder
2013-04-08 21:58 ` Junio C Hamano
2013-04-09 11:31 ` Jakub Narębski [this message]
2013-04-09 21:59 ` Jürgen Kreileder
2013-04-10 10:23 ` Jakub Narębski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5163FC2D.9050408@gmail.com \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jk@blackdown.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.