From: Jakub Narebski <jnareb@gmail.com>
To: Gerrit Pape <pape@smarden.org>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Recai Oktaş" <roktas@debian.org>
Subject: Re: [PATCH/rfc] gitweb: open files (e.g. indextext.html) in utf8 mode
Date: Wed, 02 Jul 2008 06:37:38 -0700 (PDT)
Message-ID: <m3prpwflus.fsf@localhost.localdomain>
In-Reply-To: <20080702121317.10819.qmail@bca5b84cb0e0a0.315fe32.mid.smarden.org>

Gerrit Pape <pape@smarden.org> writes:

> From: =?utf-8?q?Recai=20Okta=C5=9F?= <roktas@debian.org>

You don't need to use quoted-printable in the 'From:' header embedded
in the mail body.  It should probably read

  From: "Recai Oktaş" <roktas@debian.org>
 
(provided that you can use utf-8 in email).

> gitweb used to use utf8 only in stdout.  As a result, included files
> like indextext.html appeared garbled if they contain utf8 characters.
> Now utf8 is also used when reading files.

It would read better as:

  Gitweb used to use utf8 mode only on STDOUT (actually the ":utf8"
  output layer), relying on to_utf8(...) to convert input data from utf8
  to Perl internal form.  As a result, included files such as $home_text
  (indextext.html in the default build configuration), or a repository's
  README.html, appeared garbled if they contained UTF-8 characters.

  Now utf8 mode is used for all open invocations, including when reading
  files.

> The patch was submitted through
>  http://bugs.debian.org/487465
> 

This should probably also have

  Reported-by: Recai Oktaş <roktas@debian.org>

> Signed-off-by: Gerrit Pape <pape@smarden.org>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 90cd99b..96cb4e0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -16,7 +16,7 @@ use Encode;
>  use Fcntl ':mode';
>  use File::Find qw();
>  use File::Basename qw(basename);
> -binmode STDOUT, ':utf8';
> +use open qw(:std :utf8);
>  
>  BEGIN {
>  	CGI->compile() if $ENV{'MOD_PERL'};

It would be wonderful if such a simple solution worked.  We would then
be able to remove the to_utf8() subroutine and would not have to worry
about forgetting to convert some string to Perl's internal encoding,
which could result in a wide (non US-ASCII) UTF-8 character being cut
in half.  (On the other hand, we wouldn't have $fallback_encoding.)
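
To illustrate the kind of conversion to_utf8() and $fallback_encoding
provide, here is a minimal sketch.  It is not gitweb's actual code:
decode_with_fallback() is a hypothetical helper, and 'latin1' is only an
assumed fallback value.  It tries a strict UTF-8 decode first and
decodes with the fallback encoding only if that fails.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed stand-in for gitweb's $fallback_encoding setting.
my $fallback_encoding = 'latin1';

# Hypothetical helper: bytes in, Perl character string out.
sub decode_with_fallback {
	my ($bytes) = @_;
	# Work on a copy: with FB_CROAK, decode() may modify its argument.
	my $copy = $bytes;
	my $str = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
	return $str if defined $str;
	# Not valid UTF-8: treat the bytes as the fallback encoding instead.
	return decode($fallback_encoding, $bytes);
}

my $utf8_bytes   = "Okta\xC5\x9F";  # "Oktaş" encoded as UTF-8 (6 bytes)
my $latin1_bytes = "\xC4rger";      # "Ärger" encoded as ISO-8859-1

print length(decode_with_fallback($utf8_bytes)), "\n";    # prints 5
print length(decode_with_fallback($latin1_bytes)), "\n";  # prints 5
```

With a global ":utf8" input layer there is no place left to hook in
such a fallback, which is the trade-off mentioned above.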

Unfortunately there are two problems (or rather a problem and a half)
with this approach.


The first is that with this patch gitweb doesn't pass the gitweb test
t/t9500-gitweb-standalone-no-errors.sh (this is with perl v5.8.6):

*   ok 63: encode(commit): utf8
*   ok 64: encode(commit): iso-8859-1
*   ok 65: encode(log): utf-8 and iso-8859-1
[...]
* FAIL 71: URL: no project URLs, no base URL
        gitweb_run "p=.git;a=summary"
[Wed Jul  2 13:10:15 2008] gitweb.perl: utf8 "\xC4" does not map to Unicode \
at /path/to/git/t/trash directory/../../gitweb/gitweb.perl line 2298, \
<$fd> line 1.
[Wed Jul  2 13:10:15 2008] gitweb.perl: Malformed UTF-8 character \
(unexpected end of string) at [...]/gitweb/gitweb.perl line 2303, \
<$fd> line 1.

which is

	open my $fd, '-|', git_cmd(), 'for-each-ref',
		($limit ? '--count='.($limit+1) : ()), '--sort=-committerdate',
		'--format=%(objectname) %(refname) %(subject)%00%(committer)',
		'refs/heads'
		or return;
2298:	while (my $line = <$fd>) {
		my %ref_item;

		chomp $line;
		my ($refinfo, $committerinfo) = split(/\0/, $line);
2303:		my ($hash, $name, $title) = split(' ', $refinfo, 3);
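
The first warning can be reproduced in isolation, without gitweb or
git.  The sketch below assumes that the offending 0xC4 byte comes from
data that is not UTF-8 at all (for example an ISO-8859-1 name); the
sample line is made up, not taken from the test repository.  It feeds
that line to a strict UTF-8 decoder and captures the resulting error.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample: "Ärger" encoded as ISO-8859-1, so it starts with 0xC4.
my $line = "\xC4rger\n";

# Strict decoding dies on the first byte sequence that is not valid UTF-8.
# Decode a copy, because FB_CROAK allows decode() to modify its argument.
my $copy  = $line;
my $ok    = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
my $error = $ok ? '' : $@;

print $error;
```

Reading the same bytes from a filehandle that has a utf8 input layer
hits the same problem, only reported as a warning rather than a fatal
error.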


Second, what is the minimal Perl version and Perl configuration
(installed modules) that supports "use open qw(:std :utf8);"?  We do
have some minimal requirements for gitweb, and it would be nice if we
didn't add to them.  But we already require PerlIO, so it probably
doesn't matter.
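
One way to answer this for a given installation is to probe at runtime.
The following is only a sketch: it reports whether the running perl was
built with PerlIO and whether the pragma line from the patch compiles.
The variable names are made up for this example.

```perl
use strict;
use warnings;
use Config;

# 1 when this perl was built with PerlIO layers, 0 otherwise.
my $has_perlio = ($Config{useperlio} || '') eq 'define' ? 1 : 0;

# Compile the exact pragma line from the patch inside a string eval, so a
# perl that lacks it reports failure here instead of dying at compile time.
# Note: ':std' switches STDIN/STDOUT/STDERR to :utf8 as a side effect.
my $has_open_utf8 = eval q{ use open qw(:std :utf8); 1 } ? 1 : 0;

printf "perl %vd: PerlIO=%d, 'use open qw(:std :utf8)'=%d\n",
	$^V, $has_perlio, $has_open_utf8;
```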

-- 
Jakub Narebski
Poland
ShadeHawk on #git

