Git development
* Timur Sufiev: Re: [PATCH I18N filenames v2 3/3] Provide compatibility with MinGW
@ 2009-11-06 10:00 Timur Sufiev
  2009-11-06 14:55 ` Peter Krefting
  0 siblings, 1 reply; 2+ messages in thread
From: Timur Sufiev @ 2009-11-06 10:00 UTC (permalink / raw)
  To: git

[-- Attachment #1: forwarded message --]
[-- Type: message/rfc822, Size: 2205 bytes --]

From: Timur Sufiev <tsufiev@gmail.com>
To: Peter Krefting <peter@softwolves.pp.se>
Subject: Re: [PATCH I18N filenames v2 3/3] Provide compatibility with MinGW
Date: Tue, 03 Nov 2009 19:53:44 +0300

Hello, Peter
> Hi!
> 
> Instead of calling the open_i18n() which converts from UTF-8 to a local 
> 8-bit character set, this should probably call a version that converts from 
> UTF-8 to UTF-16 and uses _wopen().
> 
> Same thing for fopen_i18n() and _wfopen().
> 
> Earlier this year I created a small RFC patch for that, which changed parts 
> of the system - http://kerneltrap.org/mailarchive/git/2009/3/2/5350814
> 
> I did not address readdir() and friends, I'm not sure if they are available 
> in UTF-16 form or if they need to be rewritten using findfirst()/findnext().
> 
> -- 
> \\// Peter - http://www.softwolves.pp.se/

I've decided to stick to the local 8-bit encoding for now, having considered
the following issues:

1. Many git front-ends, e.g. TortoiseGit, use an 8-bit character set, not
UTF-16: they call git plumbing commands and pass filenames on the command
line (in the local 8-bit encoding). So, with the [UTF-8] <-> [UTF-16]
approach, I would have to deal with 3 different encodings: UTF-8, UTF-16 and
the local 8-bit one (CP1251 in my case). Moreover, Windows itself uses both
UTF-16 and CP1251, so one would have to deal with re-encoding between them
(if one plans to support UTF-16). Too much confusion.
 
2. UTF-16 is a proper solution for Windows, but my patch is useful for
other OSes with locales different from UTF-8 (e.g. Linux with KOI8-R
locale).

Still there is a possibility that one day we'll stumble upon some UTF-8
symbol which cannot be correctly mapped into the 8-bit encoding. UTF-16
would be a remedy in this case, but what if we don't have it (see 2)?
-- 
Timur Sufiev

* Re: Timur Sufiev: Re: [PATCH I18N filenames v2 3/3] Provide compatibility with MinGW
  2009-11-06 10:00 Timur Sufiev: Re: [PATCH I18N filenames v2 3/3] Provide compatibility with MinGW Timur Sufiev
@ 2009-11-06 14:55 ` Peter Krefting
  0 siblings, 0 replies; 2+ messages in thread
From: Peter Krefting @ 2009-11-06 14:55 UTC (permalink / raw)
  To: Timur Sufiev; +Cc: git

Timur Sufiev:

> 1. Many git front-ends, e.g. TortoiseGit, use an 8-bit character set, not UTF-16:

All of them do, because the output is eight-bit. That is why the internal 
encoding needs to remain eight-bit, for instance UTF-8.

> they call git plumbing commands and pass filenames on the command line (in 
> the local 8-bit encoding).

Well, yes. On Windows, however, there is the complication that the command 
line is available in two versions: an eight-bit one and a UTF-16 one. Which 
one is constructed from which depends on how the application was launched. 
We can read the UTF-16 version and hope that it contains proper names 
(possibly looking at the eight-bit version as UTF-8 if necessary).
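
As a rough illustration of reading the UTF-16 version and turning it into 
the eight-bit internal form, here is a minimal sketch of a UTF-16 to UTF-8 
conversion. The function names utf8_encode() and utf16_to_utf8() are 
hypothetical and not taken from any patch in this thread. On Windows the 
wide command line itself would come from the Win32 calls GetCommandLineW() 
and CommandLineToArgvW(); that part is left out so the sketch stays 
portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[0..3]; returns bytes written. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}

/*
 * Convert a NUL-terminated UTF-16 string (hypothetical helper) to
 * NUL-terminated UTF-8. Returns the number of bytes written, not counting
 * the terminator, or -1 on an unpaired surrogate or a too-small buffer.
 */
long utf16_to_utf8(const uint16_t *in, char *out, size_t out_len)
{
	size_t n = 0;

	assert(in && out);
	while (*in) {
		uint32_t cp = *in++;
		unsigned char tmp[4];
		size_t len, i;

		if (cp >= 0xD800 && cp <= 0xDBFF) {
			/* A high surrogate must be followed by a low one. */
			if (*in < 0xDC00 || *in > 0xDFFF)
				return -1;
			cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
		} else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			return -1;	/* stray low surrogate */
		}
		len = utf8_encode(cp, tmp);
		if (n + len >= out_len)
			return -1;	/* keep room for the terminator */
		for (i = 0; i < len; i++)
			out[n++] = (char)tmp[i];
	}
	if (n >= out_len)
		return -1;
	out[n] = '\0';
	return (long)n;
}
```

Rejecting unpaired surrogates is a design choice of this sketch; a real 
implementation would have to decide what to do with such names, since 
Windows does not guarantee that file names are well-formed UTF-16.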

> 2. UTF-16 is a proper solution for Windows, but my patch is useful for 
> other OSes with locales different from UTF-8 (e.g. Linux with KOI8-R 
> locale).

Well, your patch re-implements the fopen() calls, converting the file name 
at that point (as well as readdir() and friends). I would do that on Windows 
as well, with the modification that on Windows, I would convert to UTF-16 
and use _wfopen() instead. On systems that have it, you could also make it 
convert to UTF-32 and use their wfopen() (I'm not aware of many other OSes 
having those functions, though).
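
A minimal sketch of that modification, assuming a hypothetical wrapper 
called fopen_utf8() (the name and the helpers utf8_decode() and 
utf8_to_utf16() are made up for this illustration, they are not from the 
patch). The Windows branch uses _wfopen() from the Microsoft C runtime; 
other systems simply fall through to fopen(). The fixed buffer sizes are 
also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#ifdef _WIN32
#include <wchar.h>
#endif

/*
 * Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
 * consumed, or 0 if the sequence is malformed (truncated, overlong,
 * a surrogate, or above U+10FFFF).
 */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
	size_t len, i;
	uint32_t c, min;

	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; c = s[0] & 0x1F; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; c = s[0] & 0x0F; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; c = s[0] & 0x07; min = 0x10000;
	} else {
		return 0;
	}
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)	/* also stops at the terminator */
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;
	*cp = c;
	return len;
}

/*
 * Convert NUL-terminated UTF-8 to NUL-terminated UTF-16 code units.
 * Returns the number of units written, not counting the terminator,
 * or -1 on malformed input or a too-small buffer.
 */
long utf8_to_utf16(const char *in, uint16_t *out, size_t out_len)
{
	const unsigned char *s = (const unsigned char *)in;
	size_t n = 0;

	assert(in && out);
	while (*s) {
		uint32_t cp;
		size_t used = utf8_decode(s, &cp);

		if (!used)
			return -1;
		if (n + (cp >= 0x10000 ? 2 : 1) >= out_len)
			return -1;	/* keep room for the terminator */
		if (cp >= 0x10000) {
			/* Outside the BMP: emit a surrogate pair. */
			cp -= 0x10000;
			out[n++] = (uint16_t)(0xD800 | (cp >> 10));
			out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
		} else {
			out[n++] = (uint16_t)cp;
		}
		s += used;
	}
	if (n >= out_len)
		return -1;
	out[n] = 0;
	return (long)n;
}

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
	/* wchar_t is 16 bits wide on Windows, so it can hold UTF-16 units. */
	wchar_t wpath[260], wmode[16];	/* 260 == MAX_PATH */

	if (utf8_to_utf16(path, (uint16_t *)wpath, 260) < 0 ||
	    utf8_to_utf16(mode, (uint16_t *)wmode, 16) < 0)
		return NULL;
	return _wfopen(wpath, wmode);
#else
	/* Assumption: elsewhere the name is passed through unchanged. */
	return fopen(path, mode);
#endif
}
```

The conversion is written by hand only to keep the sketch self-contained; 
on Windows, MultiByteToWideChar() with CP_UTF8 would be the more usual way 
to do the same step.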

> Still there is a possibility that one day we'll stumble upon some UTF-8 
> symbol which cannot be correctly mapped into the 8-bit encoding. UTF-16 
> would be a remedy in this case, but what if we don't have it (see 2)?

That is of course an issue. There are several approaches to that:

- Fail with an error.
- Convert to a place-holder character.
- Encode the file name as quoted-printable (QP), perhaps?
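
The three approaches can be sketched side by side. This is only an 
illustration under stated assumptions: ISO-8859-1 stands in for the local 
eight-bit encoding because its mapping is the identity for U+0000..U+00FF 
(CP1251 or KOI8-R would need a table or iconv), '_' is an arbitrary 
place-holder, and the names utf8_to_latin1() and enum unmappable_policy are 
made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum unmappable_policy { POLICY_FAIL, POLICY_PLACEHOLDER, POLICY_QP };

/* Decode one UTF-8 sequence; returns bytes consumed, 0 if malformed. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
	size_t len, i;
	uint32_t c, min;

	if (s[0] < 0x80) { *cp = s[0]; return 1; }
	else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
	else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
	else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
	else return 0;

	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;
	*cp = c;
	return len;
}

/*
 * Convert UTF-8 to ISO-8859-1, handling characters outside Latin-1
 * according to policy. Returns bytes written (without the terminator),
 * or -1 on malformed input, a too-small buffer, or (POLICY_FAIL) an
 * unmappable character.
 */
long utf8_to_latin1(const char *in, char *out, size_t out_len,
		    enum unmappable_policy policy)
{
	const unsigned char *s = (const unsigned char *)in;
	size_t n = 0;

	assert(in && out);
	while (*s) {
		uint32_t cp;
		size_t i, used = utf8_decode(s, &cp);

		if (!used)
			return -1;
		/* In QP mode a literal '=' is escaped too, so the result
		 * can be decoded unambiguously. */
		if (cp <= 0xFF && !(policy == POLICY_QP && cp == '=')) {
			if (n + 1 >= out_len)
				return -1;
			out[n++] = (char)cp;
		} else if (policy == POLICY_FAIL) {
			return -1;
		} else if (policy == POLICY_PLACEHOLDER) {
			if (n + 1 >= out_len)
				return -1;
			out[n++] = '_';
		} else {
			/* QP: write each original UTF-8 byte as =XX. */
			if (n + 3 * used >= out_len)
				return -1;
			for (i = 0; i < used; i++) {
				snprintf(out + n, 4, "=%02X", (unsigned)s[i]);
				n += 3;
			}
		}
		s += used;
	}
	if (n >= out_len)
		return -1;
	out[n] = '\0';
	return (long)n;
}
```

Of the three, only the QP variant is reversible; the place-holder variant 
can map two different names onto the same file name.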

-- 
\\// Peter - http://www.softwolves.pp.se/
