git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: "Jürgen Kreileder" <jk@blackdown.de>,
	"John Hawley" <warthog9@kernel.org>, "Jeff King" <peff@peff.net>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [RFD] Handling of non-UTF8 data in gitweb
Date: Fri, 6 Jan 2012 17:35:31 +0100
Message-ID: <201201061735.32908.jnareb@gmail.com>
In-Reply-To: <201112041709.32212.jnareb@gmail.com>

On Sun, 4 Dec 2011, Jakub Narebski wrote:
> 
> Currently gitweb converts data it receives from git commands to Perl 
> internal utf8 representation via to_utf8() subroutine
[...]
> Each part of data must be handled separately.  It is quite an error-prone
> process, as can be seen from the number of patches that fix handling
> of UTF-8 data (latest from Jürgen).
> 
> 
> Much, much simpler would be to force opening of all files (including 
> output pipes from git commands) in ':utf8' mode:
> 
>   use open qw(:std :utf8);
> 
> [Note: perhaps instead of ':utf8' it should be ':encoding(UTF-8)' 
>  there...]
> 
> But doing this would change gitweb behavior.  [...]
[...]
> I don't know if people are relying on the old behavior.  I guess
> it could be emulated by defining our own 'utf-8-with-fallback'
> encoding, or by defining our own PerlIO layer with PerlIO::via.
> But it would no longer be a simple solution (though still automatic).
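For comparison, the per-string conversion that the quoted message describes can be sketched as below. This is a hedged sketch of the described behavior (use the data as UTF-8 if it is valid, otherwise reinterpret the bytes in a fallback encoding); the `$fallback_encoding` variable and its 'latin1' value are assumptions here, not quoted from gitweb.

```perl
use strict;
use warnings;
use Encode qw(decode FB_DEFAULT);

# Assumed fallback for byte strings that are not valid UTF-8.
our $fallback_encoding = 'latin1';

sub to_utf8 {
    my $str = shift;
    return undef unless defined $str;

    # Already-decoded strings pass through; otherwise try to decode the
    # bytes in place as UTF-8 (utf8::decode returns false if malformed).
    if (utf8::is_utf8($str) || utf8::decode($str)) {
        return $str;
    }
    # Not valid UTF-8: reinterpret the original bytes in the fallback encoding.
    return decode($fallback_encoding, $str, FB_DEFAULT);
}
```

Every piece of data read from git has to pass through such a call individually, which is the error-prone part the layer-based approach avoids.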

I have now created simple Encode::UTF8WithFallback module, so that

  use Encode::UTF8WithFallback;
  use open IN => ':encoding(utf8-with-fallback)';

should be able to replace all calls to to_utf8() without any change
in behavior; at least, simple tests show that.
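A minimal sketch of what such an encoding could look like follows. It is not the module described above (its code is not part of this message); it assumes Latin-1 as the fallback and registers the name 'utf8-with-fallback' via the Define() method of Encode::Encoding. Each call decodes the whole input as strict UTF-8 if it is valid, and as Latin-1 otherwise.

```perl
package Encode::UTF8WithFallback;
# Sketch only: strict UTF-8, falling back to Latin-1 (assumed) on failure.
use strict;
use warnings;
use base qw(Encode::Encoding);
use Encode qw(find_encoding FB_CROAK LEAVE_SRC);

__PACKAGE__->Define('utf8-with-fallback');

my $utf8     = find_encoding('UTF-8');   # strict UTF-8 decoder
my $fallback = find_encoding('latin1');  # assumed fallback encoding

sub decode {
    my ($self, $octets, $chk) = @_;
    # Strict decoding with a CHECK value consumes its input, so use a copy.
    my $copy = $octets;
    my $str  = eval { $utf8->decode($copy, FB_CROAK) };
    # Malformed UTF-8: reinterpret the original bytes as Latin-1.
    $str = $fallback->decode($octets) unless defined $str;
    # Encode convention: with a true CHECK (and no LEAVE_SRC), signal that
    # all input was consumed by emptying the caller's buffer.
    $_[1] = '' if $chk && !ref $chk && !($chk & LEAVE_SRC);
    return $str;
}

sub encode {
    my ($self, $string, $chk) = @_;
    $_[1] = '' if $chk && !ref $chk && !($chk & LEAVE_SRC);
    # Output is always written as UTF-8.
    return $utf8->encode($string);
}

1;
```

Saved as Encode/UTF8WithFallback.pm somewhere in @INC, it would be enabled by the two `use` lines quoted above. One caveat of this naive version: as an :encoding() PerlIO layer the decoder sees one buffer at a time, so a multibyte character split across a buffer boundary could be misdecoded as Latin-1; a robust version would have to honor Encode's STOP_AT_PARTIAL flag and leave incomplete trailing bytes in the caller's buffer.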


There are, however, two problems with this solution:

1. Encode::UTF8WithFallback should really be a separate Perl module
   in a separate file (e.g. 'gitweb/lib/Encode/UTF8WithFallback.pm');
   I was not able to make it work without a separate file.

   This means that it depends heavily on the changes that allow splitting
   gitweb into many files, loading extra helper modules, and requiring
   extra non-core modules while providing and installing them with gitweb
   if they are not available.  These changes are ready and can be found in

     'gitweb/split'
   
   branch in my git.git repositories:

     http://repo.or.cz/w/git/jnareb-git.git
     https://github.com/jnareb/git


2. It turned out that the "open" pragma 1.04 from Perl v5.8.6 does not
   work correctly.  We need at least "open" 1.06 (version 1.05 supposedly
   consists only of a documentation change).

   Because "open" is a core Perl module (core pragma), this means that
   in practice gitweb will require at least Perl v5.8.9, raising the
   version requirement from the current v5.8.0.
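Both requirements could be stated at the top of the script roughly as follows. This is a hedged sketch, not code from the gitweb/split branch: the lib/ directory next to the script and the use of FindBin to locate it are assumptions (the FindBin documentation notes it can be unreliable under persistent interpreters such as mod_perl).

```perl
use strict;
use warnings;

# Practical minimum Perl version named above (v5.8.9).
use 5.008009;

# Fail at compile time if the "open" pragma is older than 1.06.
# The empty list skips open->import, so no I/O layers are set here.
use open 1.06 ();

# Assumed layout: bundled helper modules such as
# lib/Encode/UTF8WithFallback.pm live next to the script.
use FindBin;
use lib "$FindBin::Bin/lib";

# The lines from the message would then follow:
#   use Encode::UTF8WithFallback;
#   use open IN => ':encoding(utf8-with-fallback)';
```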
 
-- 
Jakub Narebski
Poland
