From: Thomas Singer <thomas.singer@syntevo.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: EGit developer discussion <egit-dev@eclipse.org>,
	Marc Strapetz <marc.strapetz@syntevo.com>,
	git@vger.kernel.org
Subject: Re: [egit-dev] Re: jgit problems for file paths with non-ASCII characters
Date: Thu, 26 Nov 2009 14:09:09 +0100
Message-ID: <4B0E7DF5.9040007@syntevo.com>
In-Reply-To: <20091126005423.GM11919@spearce.org>

> But as you said, this still doesn't make the Apple normal form
> any easier.  Though if we know we are on such a strange filesystem
> we might be able to assume the paths in the repository are equally
> damaged.  Or not.

Well, if the git-core folks could standardize on, e.g., composed UTF-8
(rather than just UTF-8) for storing file names in the repository, then
everything should be clear, shouldn't it?
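
To illustrate what "composed UTF-8" would mean in practice, here is a
minimal Python sketch. It is not jgit or git-core code; the helper name
to_composed is made up for this example, and it assumes the decomposed
name reaches us as an ordinary Unicode string. The same visible file name
can be stored as two different byte sequences, and normalizing to NFC
(the composed form) before storing would make them identical.

```python
import unicodedata


def to_composed(name: str) -> str:
    """Return the NFC (canonically composed) form of a file name.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", name)


# 'u' followed by U+0308 COMBINING DIAERESIS (a decomposed spelling)
decomposed = "u\u0308ber.txt"
# the single precomposed code point U+00FC
composed = "\u00fcber.txt"

# The two spellings look the same but differ as UTF-8 byte sequences.
bytes_differ = decomposed.encode("utf-8") != composed.encode("utf-8")

# After NFC normalization both spellings yield the same bytes.
same_after_nfc = (
    to_composed(decomposed).encode("utf-8")
    == to_composed(composed).encode("utf-8")
)
```

Note that this only covers canonical composition; a file system may apply
its own variant of decomposition, so a round trip through it is an
assumption that would need checking on the platform in question.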

--
Best regards,
Thomas Singer
=============
syntevo GmbH
http://www.syntevo.com
http://blog.syntevo.com


Shawn O. Pearce wrote:
> Robin Rosenberg <robin.rosenberg@dewire.com> wrote:
>> On Wednesday 25 November 2009 14:47:25, Marc Strapetz wrote:
>>> I have noticed that jgit converts file paths to UTF-8 when querying the
>>> repository.
> ...
>>> Is this a bug or a misconfiguration of my repository? I'm using jgit
>>> (commit e16af839e8a0cc01c52d3648d2d28e4cb915f80f) on Windows.
>> A bug. 
>>
>> The problem here is that we need to allow multiple encodings since there
>> is no reliable encoding specified anywhere.
> 
> This is a design fault of both Linux and git.  git gets a byte
> sequence from readdir and stores that as-is into the repository.
> We have no way of knowing what that encoding is.  So now everyone
> touching a Git repository is screwed.
> 
>> The approach I advocate is
>> the one we use for handling encoding in general, i.e. if it looks like UTF-8,
>> treat it as such, else fall back. This is expensive, however,
> 
> We should try to work harder with the git-core folks to get character
> set encoding for file names worked out.  We might be able to use a
> configuration setting in the repository to tell us what the proper
> encoding should be, and if not set, assume UTF-8.
> 
>> and then we have
>> all the other issues with case-insensitive names and the funny property that
>> Unicode has when it allows characters to be encoded using multiple sequences
>> of code points, as employed by Apple.
> 
> But as you said, this still doesn't make the Apple normal form
> any easier.  Though if we know we are on such a strange filesystem
> we might be able to assume the paths in the repository are equally
> damaged.  Or not.
> 
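
For reference, the "if it looks like UTF-8, treat it like that, else fall
back" approach quoted above could be sketched roughly as follows in
Python. This is an illustration, not the jgit implementation: the function
name decode_path and the ISO-8859-1 default are assumptions, and the
fallback parameter merely stands in for the kind of per-repository
configuration setting suggested in the quoted message.

```python
def decode_path(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode a path as stored in the repository.

    Try strict UTF-8 first; if the bytes are not valid UTF-8, decode
    them with the fallback encoding instead. Hypothetical helper.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Valid UTF-8 is taken at face value.
utf8_name = decode_path("\u00fcber.txt".encode("utf-8"))

# 0xFC is never valid in UTF-8, so this falls back to ISO-8859-1.
latin1_name = decode_path(b"\xfcber.txt")
```

The heuristic cannot be fully reliable: some byte sequences written in
another encoding also happen to be valid UTF-8, and those would be
decoded as UTF-8 without any error being raised.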

