linux-fsdevel.vger.kernel.org archive mirror
From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Roman Zippel <zippel@linux-m68k.org>
Cc: Pavel Fedin <sonic_amiga@rambler.ru>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] Full NLS support for HFS (classic) filesystem
Date: Tue, 31 May 2005 15:49:18 +0100
Message-ID: <1117550958.8073.30.camel@imp.csi.cam.ac.uk>
In-Reply-To: <Pine.LNX.4.61.0505311550080.3728@scrub.home>

Hi,

On Tue, 2005-05-31 at 15:59 +0200, Roman Zippel wrote:
> On Tue, 31 May 2005, Pavel Fedin wrote:
> > > If the names were translated correctly, HFS would have found them. You need
> > > to give me an example, which should have worked, but failed.
> > 
> >  I can't produce the exact Russian string (I don't remember it), but it was
> > about 50% of all Russian names.
> 
> Without an example I can't reproduce, what you're trying to say here (at 
> least the "HFS doesn't find the file, even though it's correctly 
> translated" part). 

There are lots of characters that cannot be translated.  I have this
problem with NTFS, too.  My solution is to just ignore file names that
cannot be translated.  If a user complains they cannot see some
filenames, I tell them to use utf8 for their encoding which always works
for translation.
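
A minimal sketch of that policy, in Python rather than kernel code.  The
helper name visible_names() and the sample file names are made up for
illustration; only the standard "koi8_r" and "utf-8" codecs are real.
Names that cannot be expressed in the chosen charset are skipped
instead of being mangled into '?':

```python
def visible_names(names, iocharset):
    """Return the names that can be encoded in iocharset, skipping the rest."""
    visible = []
    for name in names:
        try:
            visible.append(name.encode(iocharset))
        except UnicodeEncodeError:
            # No mapping for at least one character: hide the name entirely.
            continue
    return visible


# Hypothetical directory contents, held as Unicode strings.
on_disk = ["readme.txt", "Привет.txt", "日本語.txt"]

koi8_listing = visible_names(on_disk, "koi8_r")  # the Japanese name is dropped
utf8_listing = visible_names(on_disk, "utf-8")   # every name survives
```

With an 8-bit charset some names silently disappear from the listing;
with utf8 nothing is lost, which is why it is the fallback I recommend.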

NLS is fundamentally broken so there is no point in trying to use clever
dynamic tables to do it.  Just ignoring such names is IMO the correct way.

btw. not having mappings is not even the biggest problem.  It gets much
worse and even Pavel's dynamic mappings are not actually going to work.
For example, there are some characters in Asian languages which, when
translated, give a character in the codepage, but when you then reverse
translate that character you end up with something that is not the same
as the starting character.

This is a fundamental flaw with the whole NLS codepages approach because
there are symbols in Unicode which have identical meaning but two
different Unicode values.  You get this for the "exact" ideographs and
the "simplified" ideographs (e.g. CJK and compatibility ideographs - see
Unicode standard and various NLS pages for details).  If you don't
believe me I can dig out the old emails which have concrete examples of
how you convert a CJK ideograph to some codepage and then back and you
end up with a compatibility CJK ideograph instead of the original one.
Of course, if you start with the compatibility CJK ideograph and do the
translation + reverse translation, you end up with the same compatibility
ideograph.  But that doesn't help you when the "real" ideograph is the
one in use, as Windows seems to do: a lot of Asian people have complained
to me about ntfs when used with codepages.  All of them went away happy
when I told them to tell the ntfs driver to use utf8...
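
To make the failure concrete, here is a small sketch in Python.  The two
tables are a hypothetical slice of a code page, not copied from any real
NLS module, and the byte value b"\xfb\x40" is invented.  The code points
are real: U+8C48 is a unified CJK ideograph and U+F900 is its
compatibility twin.  The function names only mirror the uni2char and
char2uni operations discussed in this thread:

```python
# Hypothetical forward table: both ideographs land on the same bytes.
UNI2CHAR = {
    0x8C48: b"\xfb\x40",  # unified ideograph (the "real" one)
    0xF900: b"\xfb\x40",  # compatibility ideograph
}

# The reverse table can hold only one answer per byte sequence.
CHAR2UNI = {
    b"\xfb\x40": 0xF900,
}


def uni2char(code_point):
    """Translate a Unicode code point to code page bytes."""
    return UNI2CHAR[code_point]


def char2uni(raw):
    """Translate code page bytes back to a Unicode code point."""
    return CHAR2UNI[raw]


def round_trip(code_point):
    """Translate to the code page and back again."""
    return char2uni(uni2char(code_point))


def utf8_round_trip(code_point):
    """The same round trip through UTF-8, which is lossless."""
    return ord(chr(code_point).encode("utf-8").decode("utf-8"))
```

Here round_trip(0xF900) gives back 0xF900, but round_trip(0x8C48) also
gives 0xF900, so a name stored with the unified ideograph can no longer
be matched after the round trip.  utf8_round_trip() returns its input
for both.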

So unless you use UTF8, any other conversion using NLS/code pages will
always have failure cases...

(btw. I first suspected bugs in the code pages but I verified them on
the MS website and they were correct...)

> > > Create the tables in a nls module and you can do whatever you want in the
> > > uni2char/char2uni functions.
> > 
> >  Huh...
> >  The problem is: when using 8-bit iocharset and 8-bit codepage char2uni from
> > codepage always gives the result but AFTER THIS uni2char to iocharset does NOT
> > necessarily give the result. There are characters in codepage which have no
> > equivalents in iocharset. They will be lost, you suggest to turn them into
> > '?'. But how to reverse this in order to supply to hfs_strcmp()?
> 
> So create two functions uni2char/char2uni, which provide perfect reverse 
> mapping. Sorry, but I don't understand what your problem is here.
> It seems you're making it more complex than it really is.

That is impossible due to the problems with compatibility characters I
explained above, which is why I would agree with you that such magic
conversions should never happen.  Just put "use utf8 if you have
problems" in the mount man page...

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/