* [PATCH/rfc] gitweb: open files (e.g. indextext.html) in utf8 mode
@ 2008-07-02 12:13 Gerrit Pape
From: Gerrit Pape @ 2008-07-02 12:13 UTC
To: git, Junio C Hamano
From: =?utf-8?q?Recai=20Okta=C5=9F?= <roktas@debian.org>
gitweb used to use utf8 only in stdout. As a result, included files
like indextext.html appeared garbled if they contain utf8 characters.
Now utf8 is also used when reading files.
The patch was submitted through
http://bugs.debian.org/487465
Signed-off-by: Gerrit Pape <pape@smarden.org>
---
gitweb/gitweb.perl | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 90cd99b..96cb4e0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -16,7 +16,7 @@ use Encode;
 use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
-binmode STDOUT, ':utf8';
+use open qw(:std :utf8);
 
 BEGIN {
 	CGI->compile() if $ENV{'MOD_PERL'};
--
1.5.6
* Re: [PATCH/rfc] gitweb: open files (e.g. indextext.html) in utf8 mode
From: Jakub Narebski @ 2008-07-02 13:37 UTC
To: Gerrit Pape; +Cc: git, Junio C Hamano, Recai Oktaş
Gerrit Pape <pape@smarden.org> writes:
> From: =?utf-8?q?Recai=20Okta=C5=9F?= <roktas@debian.org>
You don't need to use quoted-printable in the 'From:' header embedded in
the mail body. It should probably read
From: "Recai Oktaş" <roktas@debian.org>
(provided that you can use utf-8 in email).
> gitweb used to use utf8 only in stdout. As a result, included files
> like indextext.html appeared garbled if they contain utf8 characters.
> Now utf8 is also used when reading files.
It would better read as:
Gitweb used to use utf8 mode only on STDOUT (actually the ":utf8" output
layer), relying on to_utf8(...) to convert input data from utf8
to Perl internal form. As a result, included files such as $home_text
(indextext.html in the default build configuration), or a repository's
README.html, appeared garbled if they contained UTF-8 characters.
Now utf8 mode is used for all open invocations, also when reading files.
> The patch was submitted through
> http://bugs.debian.org/487465
>
Probably should have here
Reported-by: Recai Oktaş <roktas@debian.org>
> Signed-off-by: Gerrit Pape <pape@smarden.org>
> ---
> gitweb/gitweb.perl | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 90cd99b..96cb4e0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -16,7 +16,7 @@ use Encode;
> use Fcntl ':mode';
> use File::Find qw();
> use File::Basename qw(basename);
> -binmode STDOUT, ':utf8';
> +use open qw(:std :utf8);
>
> BEGIN {
> CGI->compile() if $ENV{'MOD_PERL'};
It would be wonderful if such a simple solution worked. We would then
be able to remove the to_utf8() subroutine and not worry that we
forgot to convert some string to Perl internal encoding, which could
result in a wide (non US-ASCII) UTF-8 character being cut in
half. (On the other hand we wouldn't have $fallback_encoding).
Unfortunately there are two problems (or rather a problem and a half)
with this approach.
First is that with this patch gitweb doesn't pass gitweb test
t/t9500-gitweb-standalone-no-errors.sh (this is with perl v5.8.6)
* ok 63: encode(commit): utf8
* ok 64: encode(commit): iso-8859-1
* ok 65: encode(log): utf-8 and iso-8859-1
[...]
* FAIL 71: URL: no project URLs, no base URL
gitweb_run "p=.git;a=summary"
[Wed Jul 2 13:10:15 2008] gitweb.perl: utf8 "\xC4" does not map to Unicode \
at /path/to/git/t/trash directory/../../gitweb/gitweb.perl line 2298, \
<$fd> line 1.
[Wed Jul 2 13:10:15 2008] gitweb.perl: Malformed UTF-8 character \
(unexpected end of string) at [...]/gitweb/gitweb.perl line 2303, \
<$fd> line 1.
which is
       open my $fd, '-|', git_cmd(), 'for-each-ref',
               ($limit ? '--count='.($limit+1) : ()), '--sort=-committerdate',
               '--format=%(objectname) %(refname) %(subject)%00%(committer)',
               'refs/heads'
               or return;
 2298: while (my $line = <$fd>) {
               my %ref_item;

               chomp $line;
               my ($refinfo, $committerinfo) = split(/\0/, $line);
 2303:         my ($hash, $name, $title) = split(' ', $refinfo, 3);
Second, what is minimal Perl version and Perl configuration (installed
modules) that support "use open qw(:std :utf8);"? We do have some
minimal requirements for gitweb, and it would be nice if we didn't add
to them. But we already require PerlIO, so it probably doesn't matter.
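To make the first problem concrete: a strict UTF-8 input layer cannot
decode a lone \xC4 byte coming from ISO-8859-1 data, whereas decoding
explicitly with a fallback can. Below is a minimal sketch of that idea.
decode_with_fallback() is a hypothetical helper written for illustration
(it is not gitweb's to_utf8()), and the 'latin1' default is an assumption
standing in for $fallback_encoding.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not gitweb's to_utf8()): try strict UTF-8 first,
# and decode with a fallback encoding if the bytes are not valid UTF-8.
sub decode_with_fallback {
    my ($bytes, $fallback) = @_;
    $fallback = 'latin1' unless defined $fallback;   # assumed default
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return defined $text ? $text : decode($fallback, $bytes);
}

my $utf8_bytes   = "Okta\xC5\x9F";   # "Oktaş" encoded as UTF-8 (6 bytes)
my $latin1_bytes = "\xC4rger";       # "Ärger" encoded as ISO-8859-1 (5 bytes)

my $from_utf8   = decode_with_fallback($utf8_bytes);
my $from_latin1 = decode_with_fallback($latin1_bytes);

# Print only ASCII so no output layer is needed for this demonstration.
printf "%d chars, last is U+%04X\n",
    length($from_utf8), ord(substr($from_utf8, -1));      # 5 chars, U+015F
printf "%d chars, first is U+%04X\n",
    length($from_latin1), ord($from_latin1);              # 5 chars, U+00C4
```

A global ":utf8" input layer has no such second attempt, which would be
consistent with the "does not map to Unicode" warning in the test output
above.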
--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: [PATCH/rfc] gitweb: open files (e.g. indextext.html) in utf8 mode
From: Lea Wiemann @ 2008-07-03 9:39 UTC
To: Jakub Narebski; +Cc: Gerrit Pape, git, Junio C Hamano, Recai Oktaş
Jakub Narebski wrote:
> Second, what is minimal Perl version and Perl configuration (installed
> modules) that support "use open qw(:std :utf8);"?
open is in core (-> corelist), and "qw(:std :utf8)" works here with Perl
5.8.8. Perl 5.6 doesn't have it, but gitweb doesn't support Perl 5.6
anyway (e.g. "binmode STDOUT, ':utf8';" doesn't work with Perl 5.6). So
it should be fine compatibility-wise.
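For reference, here is a small sketch of what the pragma changes when
reading a file. It assumes a temporary file created with File::Temp, and
the helpers read_default() and read_utf8() are hypothetical names used
only for this illustration. It uses "use open qw(:utf8)" without ":std"
so that the demonstration does not touch the standard handles.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the raw UTF-8 bytes of "Oktaş\n" to a temporary file.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "Okta\xC5\x9F\n";
close $tmp or die "close: $!";

# No pragma in scope: the line is read as bytes.
sub read_default {
    open my $in, '<', $path or die "open: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return $line;
}

# Pragma in scope: open() applies the :utf8 layer by default.
sub read_utf8 {
    use open qw(:utf8);   # lexically scoped to this block
    open my $in, '<', $path or die "open: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return $line;
}

printf "without pragma: %d\n", length(read_default());   # 6 (bytes)
printf "with pragma:    %d\n", length(read_utf8());      # 5 (characters)
```

The pragma is lexically scoped, so only open() calls compiled inside its
scope get the default layer; adding ":std" additionally sets the layer on
STDIN, STDOUT and STDERR.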
-- Lea