From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: "Jürgen Kreileder" <jk@blackdown.de>,
	"John Hawley" <warthog9@kernel.org>, "Jeff King" <peff@peff.net>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [RFD] Handling of non-UTF8 data in gitweb
Date: Fri, 6 Jan 2012 17:35:31 +0100
Message-ID: <201201061735.32908.jnareb@gmail.com>
In-Reply-To: <201112041709.32212.jnareb@gmail.com>

On Sun, 4 Dec 2011, Jakub Narebski wrote:
> 
> Currently gitweb converts data it receives from git commands to Perl 
> internal utf8 representation via to_utf8() subroutine
[...]
> Each part of data must be handled separately.  It is quite an
> error-prone process, as can be seen from the number of patches that
> fix handling of UTF-8 data (latest from Jürgen).
> 
> 
> Much, much simpler would be to force opening of all files (including 
> output pipes from git commands) in ':utf8' mode:
> 
>   use open qw(:std :utf8);
> 
> [Note: perhaps instead of ':utf8' it should be ':encoding(UTF-8)' 
>  there...]
> 
> But doing this would change gitweb behavior.  [...]
[...]
> I don't know if people are relying on the old behavior.  I guess
> it could be emulated by defining our own 'utf-8-with-fallback'
> encoding, or by defining our own PerlIO layer with PerlIO::via.
> But it would no longer be a simple solution (though still automatic).

I have now created a simple Encode::UTF8WithFallback module, so that

  use Encode::UTF8WithFallback;
  use open IN => ':encoding(utf8-with-fallback)';

should be able to replace all calls to to_utf8() without any change
in behavior; at least simple tests show that.
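
To illustrate the idea, here is a minimal sketch of what such an encoding
could look like.  It is not the module from my branch: the 'latin1'
fallback, the $FALLBACK variable and the method bodies are assumptions made
for the example.  It is kept in one file and only exercises Encode::decode()
directly; it does not attempt the incomplete-input handling that use as
a PerlIO ':encoding(...)' layer would need.

```perl
package Encode::UTF8WithFallback;    # illustrative sketch only

use strict;
use warnings;
use Encode ();
use base qw(Encode::Encoding);

# Register the encoding under the name used with ':encoding(...)'.
__PACKAGE__->Define('utf8-with-fallback');

# Assumption: legacy data is Latin-1 when it is not valid UTF-8.
our $FALLBACK = 'latin1';

sub decode {
    my ($self, $octets, $check) = @_;
    return undef unless defined $octets;

    # Strict UTF-8 decode on a copy, since FB_CROAK may modify its input.
    my $copy = $octets;
    my $str  = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };

    # Not valid UTF-8: decode the untouched octets with the fallback.
    $str = Encode::decode($FALLBACK, $octets) unless defined $str;

    $_[1] = '' if $check;    # report the whole input as consumed
    return $str;
}

sub encode {
    my ($self, $string, $check) = @_;
    my $octets = Encode::encode('UTF-8', $string);
    $_[1] = '' if $check;
    return $octets;
}

package main;

# Valid UTF-8 and Latin-1 input decode to the same six characters.
for my $bytes ("J\xc3\xbcrgen", "J\xfcrgen") {
    my $str = Encode::decode('utf8-with-fallback', $bytes);
    print join(' ', map { sprintf 'U+%04X', ord } split //, $str), "\n";
}
```

The fallback is only taken when the whole input fails strict UTF-8
decoding, which is the same all-or-nothing decision to_utf8() makes.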


There are, however, two problems with this solution:

1. Encode::UTF8WithFallback should really be a separate Perl module
   in a separate file (e.g. 'gitweb/lib/Encode/UTF8WithFallback.pm');
   I was not able to make it work without a separate file.

   This means that it very much requires the changes that allow splitting
   gitweb into many files, loading extra helper modules, and/or requiring
   extra non-core modules while providing and installing them with gitweb
   if they are not available.  These changes are ready, and can be found
   in the

     'gitweb/split'
   
   branch in my git.git repositories:

     http://repo.or.cz/w/git/jnareb-git.git
     https://github.com/jnareb/git


2. It turned out that the "open" pragma 1.04 from Perl v5.8.6 does not
   work correctly.  We need at least "open" 1.06 (version 1.05 supposedly
   contains only a documentation change).

   Because "open" is a core Perl module (core pragma), this means that
   gitweb will in practice require at least Perl v5.8.9, raising the
   version requirement from the current v5.8.0.
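
   For illustration, a check for the pragma version itself (rather than
   the Perl version) could look like the sketch below.  The helper name
   open_pragma_at_least() is made up for the example; it relies only on
   the standard VERSION method.

```perl
use strict;
use warnings;
use open ();    # load the pragma without changing any default layers

# Hypothetical helper: true if the installed "open" pragma is at least
# the given version.  UNIVERSAL::VERSION dies when it is older.
sub open_pragma_at_least {
    my ($min) = @_;
    return eval { 'open'->VERSION($min); 1 } ? 1 : 0;
}

printf "open pragma %s: %s\n", $open::VERSION,
    open_pragma_at_least('1.06') ? 'new enough' : 'too old';
```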
 
-- 
Jakub Narebski
Poland
