git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: Gerrit Pape <pape@smarden.org>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Recai Oktaş" <roktas@debian.org>
Subject: Re: [PATCH/rfc] gitweb: open files (e.g. indextext.html) in utf8 mode
Date: Wed, 02 Jul 2008 06:37:38 -0700 (PDT)
Message-ID: <m3prpwflus.fsf@localhost.localdomain>
In-Reply-To: <20080702121317.10819.qmail@bca5b84cb0e0a0.315fe32.mid.smarden.org>

Gerrit Pape <pape@smarden.org> writes:

> From: =?utf-8?q?Recai=20Okta=C5=9F?= <roktas@debian.org>

You don't need to use quoted-printable in the 'From:' header embedded in
the mail body.  It should probably read

  From: "Recai Oktaş" <roktas@debian.org>
 
(provided that you can use utf-8 in email).

> gitweb used to use utf8 only in stdout.  As a result, included files
> like indextext.html appeared garbled if they contain utf8 characters.
> Now utf8 is also used when reading files.

It would read better as:

  Gitweb used to use utf8 mode only on STDOUT (actually the ":utf8" output
  layer), relying on to_utf8(...) to convert input data from utf8
  to Perl internal form.  As a result, included files such as $home_text
  (indextext.html in the default build configuration), or a repository's
  README.html, appeared garbled if they contained UTF-8 characters.

  Now utf8 mode is used for all open invocations, including when reading files.

> The patch was submitted through
>  http://bugs.debian.org/487465
> 

This should probably also have

  Reported-by: Recai Oktaş <roktas@debian.org>

> Signed-off-by: Gerrit Pape <pape@smarden.org>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 90cd99b..96cb4e0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -16,7 +16,7 @@ use Encode;
>  use Fcntl ':mode';
>  use File::Find qw();
>  use File::Basename qw(basename);
> -binmode STDOUT, ':utf8';
> +use open qw(:std :utf8);
>  
>  BEGIN {
>  	CGI->compile() if $ENV{'MOD_PERL'};

It would be wonderful if such a simple solution worked.  We would then
be able to remove the to_utf8() subroutine and not worry that we
forgot to convert some string to Perl internal encoding, which could
result in a wide (non US-ASCII) UTF-8 character being cut in
half.  (On the other hand we wouldn't have $fallback_encoding.)
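
For context, the per-string conversion discussed here can be sketched
roughly as below.  This is only an illustration, not the exact gitweb
code: the helper name to_utf8_sketch and the hard-coded 'latin1' value
are assumptions standing in for to_utf8() and $fallback_encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: stands in for gitweb's $fallback_encoding setting.
our $fallback_encoding = 'latin1';

# Hypothetical helper modelled on the idea behind to_utf8():
# try a strict UTF-8 decode first, and fall back to another
# encoding if the bytes are not valid UTF-8.
sub to_utf8_sketch {
	my ($bytes) = @_;
	my $res = eval {
		decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
	};
	return $res if defined $res;
	return decode($fallback_encoding, $bytes);
}

my $utf8_bytes   = "Okta\xC5\x9F";   # "Oktaş" encoded as UTF-8
my $latin1_bytes = "\xC4rger";       # "Ärger" encoded as ISO-8859-1

# Both inputs end up as character strings in Perl internal form.
printf "%d characters\n", length to_utf8_sketch($utf8_bytes);
printf "%d characters\n", length to_utf8_sketch($latin1_bytes);
```

The point of doing this per string is that each value gets its own
chance at the fallback, which a single I/O layer cannot offer.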

Unfortunately there are two problems (or rather a problem and a half)
with this approach.


First, with this patch gitweb no longer passes the gitweb test
t/t9500-gitweb-standalone-no-errors.sh (this is with perl v5.8.6):

*   ok 63: encode(commit): utf8
*   ok 64: encode(commit): iso-8859-1
*   ok 65: encode(log): utf-8 and iso-8859-1
[...]
* FAIL 71: URL: no project URLs, no base URL
        gitweb_run "p=.git;a=summary"
[Wed Jul  2 13:10:15 2008] gitweb.perl: utf8 "\xC4" does not map to Unicode \
at /path/to/git/t/trash directory/../../gitweb/gitweb.perl line 2298, \
<$fd> line 1.
[Wed Jul  2 13:10:15 2008] gitweb.perl: Malformed UTF-8 character \
(unexpected end of string) at [...]/gitweb/gitweb.perl line 2303, \
<$fd> line 1.

The line numbers in those messages refer to this code:

	open my $fd, '-|', git_cmd(), 'for-each-ref',
		($limit ? '--count='.($limit+1) : ()), '--sort=-committerdate',
		'--format=%(objectname) %(refname) %(subject)%00%(committer)',
		'refs/heads'
		or return;
2298:	while (my $line = <$fd>) {
		my %ref_item;

		chomp $line;
		my ($refinfo, $committerinfo) = split(/\0/, $line);
2303:		my ($hash, $name, $title) = split(' ', $refinfo, 3);
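
A likely reading of the failure is that the test repository contains
data in more than one encoding (the earlier tests create commits in
both utf-8 and iso-8859-1), so no single input layer fits every line.
The sketch below illustrates this with two made-up records in the same
for-each-ref format; the record contents and the helper name
is_valid_utf8 are assumptions for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two made-up records: "<hash> <refname> <subject>\0<committer>".
# The first subject is UTF-8 encoded, the second ISO-8859-1 encoded.
my @lines = (
	"1111111 refs/heads/a \xC3\x84nderung\0A U Thor",
	"2222222 refs/heads/b \xC4nderung\0A U Thor",
);

# Hypothetical helper: return 1 if the byte string is well-formed
# UTF-8, 0 otherwise.
sub is_valid_utf8 {
	my ($bytes) = @_;
	return eval {
		decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
		1;
	} ? 1 : 0;
}

for my $line (@lines) {
	# Same splitting as in the gitweb code quoted above.
	my ($refinfo, $committerinfo) = split(/\0/, $line);
	my ($hash, $name, $title) = split(' ', $refinfo, 3);
	printf "%s: %s\n", $name,
		is_valid_utf8($title) ? 'valid UTF-8' : 'not valid UTF-8';
}
```

The second record starts its subject with the byte \xC4, the same byte
the warning above complains about.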


Second, what is the minimal Perl version and Perl configuration (installed
modules) that supports "use open qw(:std :utf8);"?  We do have some
minimal requirements for gitweb, and it would be nice if we didn't add
to them.  But we already require PerlIO, so it probably doesn't matter.
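
One way to check a given installation is a small probe such as the
sketch below.  It only reports what the local perl does; it does not
settle which release first supports the pragma.

```perl
use strict;
use warnings;

# Load the pragma at run time so that a failure can be reported
# instead of aborting compilation of the whole script.
my $ok = eval "use open qw(:std :utf8); 1";

printf "perl version: %vd\n", $^V;
print  "use open qw(:std :utf8): ", ($ok ? "supported" : "failed: $@"), "\n";

# PerlIO::get_layers() lists the I/O layers active on a handle.
my @layers = PerlIO::get_layers(STDOUT);
print "STDOUT layers: @layers\n";
```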

-- 
Jakub Narebski
Poland
ShadeHawk on #git

Thread overview: 3+ messages
2008-07-02 12:13 [PATCH/rfc] gitweb: open files (e.g. indextext.html) in utf8 mode Gerrit Pape
2008-07-02 13:37 ` Jakub Narebski [this message]
2008-07-03  9:39   ` Lea Wiemann