public inbox for linux-kernel@vger.kernel.org
* Re: Fwd: NLS mappings for iso-8859-* encodings
@ 2002-05-07 23:07 Petr Vandrovec
  2002-05-08 18:17 ` Anton Altaparmakov
  0 siblings, 1 reply; 4+ messages in thread
From: Petr Vandrovec @ 2002-05-07 23:07 UTC
  To: Urban Widmark; +Cc: linux-kernel

On  8 May 02 at 0:08, Urban Widmark wrote:
> On Tue, 7 May 2002, Petr Vandrovec wrote:
> 
> ncpfs should perhaps not use iso8859-x to read filenames in some cp*
> encoding. The default nls you can specify is strange, is it the default
> for chars on the filesystem or the default to use for display?
> 
> isofs uses it for display (and has no need for a second nls table).
> smbfs uses it for display and has a second default for the remote chars.
> ncpfs uses it as default for both display and remote.
> vfat also uses it for both on-disk and display.
> 
> I think ncpfs should demand that the user sets two defaults and if that
> isn't done no default translation is made (just do a memcpy in ncp__vol2io
> and ncp__io2vol). That's what smbfs does anyway.

Yes, it looks like a good idea.
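
The fallback agreed on above can be sketched roughly as follows. This is a
minimal user-space illustration, not the ncpfs code: struct toy_nls,
toy_vol2io, toy_latin1 and toy_cp are hypothetical names, and toy_cp maps
only ASCII plus the single byte 0x80 (C with cedilla, U+00C7, as in cp850).
When either table is missing, the bytes are copied unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an NLS table: one byte <-> one 16-bit
 * Unicode value. Both callbacks return 0 on success, -1 if unmapped. */
struct toy_nls {
    int (*char2uni)(unsigned char c, unsigned short *uni);
    int (*uni2char)(unsigned short uni, unsigned char *c);
};

/* Latin-1 style table: byte N <-> U+00NN. */
static int latin1_char2uni(unsigned char c, unsigned short *uni)
{
    *uni = c;
    return 0;
}

static int latin1_uni2char(unsigned short uni, unsigned char *c)
{
    if (uni > 0xFF)
        return -1;
    *c = (unsigned char)uni;
    return 0;
}

/* Tiny codepage-like table: ASCII plus 0x80 <-> U+00C7 only. */
static int cp_char2uni(unsigned char c, unsigned short *uni)
{
    if (c < 0x80) { *uni = c; return 0; }
    if (c == 0x80) { *uni = 0x00C7; return 0; }
    return -1;
}

static int cp_uni2char(unsigned short uni, unsigned char *c)
{
    if (uni < 0x80) { *c = (unsigned char)uni; return 0; }
    if (uni == 0x00C7) { *c = 0x80; return 0; }
    return -1;
}

const struct toy_nls toy_latin1 = { latin1_char2uni, latin1_uni2char };
const struct toy_nls toy_cp = { cp_char2uni, cp_uni2char };

/* Convert a name from the volume encoding to the display encoding.
 * If either table is NULL, no translation is configured and the bytes
 * are copied as-is. Returns 0 on success, -1 if the output buffer is
 * too small or a byte cannot be translated. */
int toy_vol2io(const struct toy_nls *vol, const struct toy_nls *io,
               unsigned char *out, size_t outlen,
               const unsigned char *in, size_t inlen)
{
    size_t i;

    if (inlen > outlen)
        return -1;
    if (vol == NULL || io == NULL) {
        memcpy(out, in, inlen);
        return 0;
    }
    for (i = 0; i < inlen; i++) {
        unsigned short uni;

        if (vol->char2uni(in[i], &uni) < 0)
            return -1;
        if (io->uni2char(uni, &out[i]) < 0)
            return -1;
    }
    return 0;
}
```

With both tables set, the byte 0x80 from the volume comes out as 0xC7 for
display; with either table left unset, it stays 0x80.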

> In unicode the 0x80-0x9F does not contain any printable characters, but
> they are defined. I know one table for iso8859-1 that lists that part as
> being empty/undefined, but it's not an iso document.
> 
> For someone setting their default to iso8859-1 that patch is probably ok,
> but what happens when someone sets it to a variable length encoding? (sjis)

They still have a problem, but they will probably know what to do, since they
had to change the default NLS from iso8859-1 to something else.

> But if you have checked that you are not mapping two values to the same
> thing (which would break the back-and-forth translation that smbfs does) I
> don't see how that patch can harm anything.

Yes, I checked it. After changing the iso* tables, all single-byte encodings
except cp874 contain a unique mapping for every byte value (the cp874
mappings are unique too, but some byte values are unmappable).
                                    Thanks,
                                            Petr Vandrovec
                                            vandrove@vc.cvut.cz
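
A check of the kind described above could look roughly like this. It is a
sketch under assumptions, not the procedure actually used: the table is
taken to be a flat array of 256 Unicode values, TOY_UNMAPPED is a
hypothetical sentinel for "no mapping", and both function names are made up
for illustration.

```c
#include <stddef.h>

/* Assumption: 0xFFFF marks a byte value with no Unicode mapping. */
#define TOY_UNMAPPED 0xFFFFu

/* Fill a table with the identity mapping byte N -> U+00NN. */
void toy_fill_identity(unsigned short table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned short)i;
}

/* Returns 1 if no two mapped byte values share a Unicode value,
 * 0 otherwise. *unmapped receives the number of byte values that
 * have no mapping at all. */
int toy_table_is_unique(const unsigned short table[256], int *unmapped)
{
    int i, j;
    int unique = 1;

    *unmapped = 0;
    for (i = 0; i < 256; i++) {
        if (table[i] == TOY_UNMAPPED) {
            (*unmapped)++;
            continue;
        }
        /* Compare against every earlier entry; 256 entries make the
         * quadratic scan cheap enough. */
        for (j = 0; j < i; j++) {
            if (table[j] == table[i])
                unique = 0;
        }
    }
    return unique;
}
```

A table that reports unique with a nonzero unmapped count corresponds to the
cp874 case: the round trip works for every mapped byte, but some bytes have
no Unicode value to go through.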
                                            

* Fwd: NLS mappings for iso-8859-* encodings
@ 2002-05-07 16:13 Petr Vandrovec
  2002-05-07 22:08 ` Urban Widmark
  0 siblings, 1 reply; 4+ messages in thread
From: Petr Vandrovec @ 2002-05-07 16:13 UTC
  To: linux-kernel

Hi,
  I sent the message below to linux-fsdevel yesterday, but I received no
feedback. Meanwhile I also created a patch which makes the changes proposed
below (map 0x80-0x9F to unicode 0x80-0x9F for ISO encodings).
The patch is available at http://platan.vc.cvut.cz/nls3.patch (39KB).

  If I do not receive any feedback, I plan to send it to Linus soon.
Currently, if you mount an NCP filesystem with accented characters
without proper iocharset/codepage options, you will not see filenames
with accented characters at all, as they will not pass through
char2uni of the default (iso8859-1) NLS (there was a warning printk,
but it was a way to DoS...).

  I do not want to do it the way SMB does (map unknown characters to
a :x## string), as it is not trivial to map them back. But if you
think that it is correct that some NLS tables contain characters
without unicode equivalents...
					Thanks,
						Petr Vandrovec
						vandrove@vc.cvut.cz
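
To illustrate why an escape of the :x## kind is awkward to reverse, here is
a small sketch. It is not the smbfs code: toy_escape is a hypothetical
helper, and the exact escape format (":x" followed by two lowercase hex
digits) and the choice to escape only 0x80-0x9F are assumptions made for
the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Escape every byte in 0x80-0x9F as ":xHH" and copy all other bytes
 * unchanged. The result is NUL-terminated. Returns the length of the
 * result, or -1 if the output buffer is too small. */
int toy_escape(char *out, size_t outlen,
               const unsigned char *in, size_t inlen)
{
    size_t o = 0;
    size_t i;

    if (outlen == 0)
        return -1;
    for (i = 0; i < inlen; i++) {
        if (in[i] >= 0x80 && in[i] <= 0x9F) {
            /* Four characters plus room for the terminator. */
            if (o + 4 >= outlen)
                return -1;
            snprintf(out + o, 5, ":x%02x", (unsigned)in[i]);
            o += 4;
        } else {
            if (o + 1 >= outlen)
                return -1;
            out[o++] = (char)in[i];
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

A name consisting of the two bytes 'a', 0x80 and a name that literally
spells "a:x80" both escape to the same string, so the reverse direction
cannot tell which one the server actually holds without extra rules.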

----- Forwarded (typos cleared) message -----

Resent-Message-Id: <200205071658.RAA26606@zikova.cvut.cz>
From: "Petr Vandrovec" <VANDROVE@vc.cvut.cz>
Organization:  CC CTU Prague
To: linux-fsdevel@vger.kernel.org
Subject:       NLS mappings for iso-8859-* encodings
X-Mailing-List: 	linux-fsdevel@vger.kernel.org

Hi,
  today it was pointed out to me (see Debian bug report #145654,
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=145654) that
all nls_iso8859-* mappings available in the kernel refuse to map
characters in the range 0x80-0x9F to anything reasonable.

  This behavior means that with the NLS default set to any
iso8859-* encoding (including the default iso-8859-1), filesystems
which contain data in the cp850/852/437 codepages will have serious
problems, as the majority of accented characters live in the 0x80-0x9F
range in these codepages.

  Worse, old 2.2.x kernels defaulted to a 1:1 mapping,
so people were used to seeing wrong accented characters, but all filenames.
Now they see nothing :-(

  Is there any reason why the 0x80-0x9F range is not mapped identically
to the 0x80-0x9F unicode range? I believe that unicode is even defined
as having its first 256 characters identical to iso8859-1.
                                                Thanks,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz
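
The difference between the two behaviours can be sketched as below. This is
an illustration for iso8859-1 only, under assumptions: the function names
are hypothetical, and the return convention (1 for one byte consumed,
-EINVAL for an unmappable byte) is modelled on how single-byte char2uni
callbacks are commonly written, not copied from any table in the tree.

```c
#include <errno.h>
#include <stddef.h>

/* Strict variant: refuses bytes 0x80-0x9F, as described above. */
int toy_iso_char2uni_strict(unsigned char c, unsigned short *uni)
{
    if (c >= 0x80 && c <= 0x9F)
        return -EINVAL;
    *uni = c;
    return 1;
}

/* Proposed variant: byte N always maps to U+00NN, including the
 * 0x80-0x9F control range. */
int toy_iso_char2uni_identity(unsigned char c, unsigned short *uni)
{
    *uni = c;
    return 1;
}

/* Returns 1 if every byte of the name can be mapped, which is the
 * condition for the name to show up in a listing, 0 otherwise. */
int toy_name_visible(int (*char2uni)(unsigned char, unsigned short *),
                     const unsigned char *name, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned short uni;

        if (char2uni(name[i], &uni) < 0)
            return 0;
    }
    return 1;
}
```

For a cp850 name such as "caf" followed by byte 0x82 (e with acute in
cp850), the strict variant hides the file, while the identity variant lists
it with a wrong last character, which matches the 2.2.x behaviour described
above.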
                                                    

----- End forwarded message -----

