From: Johannes Sixt <j.sixt@viscovery.net>
To: Peter Krefting <peter@softwolves.pp.se>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
Date: Mon, 02 Mar 2009 11:30:01 +0100
Message-ID: <49ABB529.1080500@viscovery.net>
In-Reply-To: <alpine.DEB.2.00.0903020941120.17877@perkele.intern.softwolves.pp.se>
Peter Krefting wrote:
> When opening a file through open() or fopen(), the path passed is
> UTF-8 encoded.
I don't think that this assumption is valid. Whenever the Windows API has
to convert between Unicode strings and char* strings, it uses the current
"ANSI code page". As far as I know, the UTF-8 codepage (65001) cannot be
used as the "current ANSI code page". Users will always have some code
page set that is not UTF-8.
For example, if the user specifies a file name on the command line, then
it will not enter git in UTF-8, but in the current "ANSI" or "OEM code
page" encoding. If git prints a file name under the assumption that it is
UTF-8 encoded, then it will be displayed incorrectly because the system
uses a different encoding.
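To make the mismatch concrete, here is a minimal Python sketch. It is not from the original thread: the sample file name is made up, and cp1252 and cp850 are assumed stand-ins for a Western European "ANSI" and "OEM" code page. It only shows that the same name has different byte representations, and what happens when bytes in one encoding are interpreted in another.

```python
# Illustrative only: the file name and the code pages chosen here
# (cp1252 as "ANSI", cp850 as "OEM") are assumptions, not from the thread.
name = "B\u00e4r.txt"  # "Bär.txt"

ansi_bytes = name.encode("cp1252")  # bytes under an assumed ANSI code page
oem_bytes = name.encode("cp850")    # bytes under an assumed OEM code page
utf8_bytes = name.encode("utf-8")   # bytes under the UTF-8 assumption

# The same name has three different byte representations.
print(ansi_bytes, oem_bytes, utf8_bytes)

# UTF-8 bytes displayed by a system that assumes cp1252 come out garbled.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# Going the other way, the cp1252 bytes are not valid UTF-8 at all.
try:
    ansi_bytes.decode("utf-8")
    ansi_is_utf8 = True
except UnicodeDecodeError:
    ansi_is_utf8 = False
print(ansi_is_utf8)
```

The point of the sketch is only that neither direction works without an explicit conversion step at the boundary.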
> Since there is no real file system abstraction beyond using stdio
> (AFAIK), I need to hack it by replacing fopen (and open). Probably
> opendir/readdir as well (might be trickier), and possibly even hack
> around main() to parse the wchar_t command-line instead of the char copy.
I think you are grossly underestimating the venture that you want to
undertake here.
Please come up with a plan for how you are going to deal with the various
issues. File names enter and leave the system through different channels:
- the command line and terminal window
- object database (tree objects)
- opendir/readdir; opening files or directories for reading or writing
And there are probably some more... How do you treat encodings in these
channels? What if the file names are not valid UTF-8? Etc.
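As a rough illustration of the "not valid UTF-8" question, the following Python sketch checks whether a raw file name decodes as strict UTF-8. The helper name `is_valid_utf8` and the sample byte strings are hypothetical; the sketch says nothing about what git should do with a name that fails the check.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file-name bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical names as they might arrive through any of the channels above.
samples = [
    b"README",          # plain ASCII, valid in every encoding discussed here
    b"B\xc3\xa4r.txt",  # UTF-8 encoded name
    b"B\xe4r.txt",      # same name in a single-byte code page
    b"\xff\xfe",        # bytes that can never occur in UTF-8
]

for raw in samples:
    print(raw, is_valid_utf8(raw))
```

Any plan would still have to say what happens to the names for which this returns False, for instance names already stored in existing tree objects.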
The biggest obstacle will be that git does not have a notion of "file name
encoding" - it simply treats a file name as a stream of bytes. There is no
place to write an encoding. If the byte streams are regarded as having an
encoding, then you can have ambiguities, mixed encodings, or invalid
characters. You would have to deal with this in some way.
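The ambiguity can be sketched as follows. This is an illustration under stated assumptions, not a proposal: the candidate encoding list and the helper `plausible_decodings` are made up for the example. It shows that one byte string can be a perfectly valid name under several encodings at once, so the bytes alone do not tell you which one was meant.

```python
# Assumed set of candidate encodings, chosen only for illustration.
CANDIDATES = ("utf-8", "cp1252", "cp850")


def plausible_decodings(raw: bytes) -> dict:
    """Map each candidate encoding to the decoded name, skipping failures."""
    result = {}
    for enc in CANDIDATES:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # not a valid byte sequence in this encoding
    return result


# These bytes are valid in all three candidates, with three different results.
ambiguous = plausible_decodings(b"B\xc3\xa4r.txt")
print(ambiguous)

# These bytes are invalid as UTF-8 but still decode in the single-byte pages.
legacy = plausible_decodings(b"B\xe4r.txt")
print(legacy)

# A tree containing both kinds of names has "mixed encodings".
mixed_tree = [b"B\xc3\xa4r.txt", b"B\xe4r.txt"]
print([sorted(plausible_decodings(n)) for n in mixed_tree])
```

Because nothing in the byte stream records the intended encoding, any choice made here is a guess unless the encoding is stored or configured somewhere.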
> This will lose all chances of Windows 9x compatibility, but I don't know
> if there are any attempts of supporting it anyway?
Windows 9x is already out of the loop. We use GetFileInformationByHandle(),
which is only available since Windows 2000.
-- Hannes