From: Robin Rosenberg <robin.rosenberg@dewire.com>
To: Marc Strapetz <marc.strapetz@syntevo.com>
Cc: git@vger.kernel.org, egit-dev@eclipse.org
Subject: Re: jgit problems for file paths with non-ASCII characters
Date: Wed, 25 Nov 2009 22:11:54 +0100 [thread overview]
Message-ID: <200911252211.55137.robin.rosenberg@dewire.com> (raw)
In-Reply-To: <4B0D356D.1080709@syntevo.com>
On Wednesday 25 November 2009 14:47:25, Marc Strapetz wrote:
> I have noticed that jgit converts file paths to UTF-8 when querying the
> repository. In particular, the constructor
> org.eclipse.jgit.treewalk.filter.PathFilter#PathFilter performs this
> conversion:
>
> private PathFilter(final String s) {
>     pathStr = s;
>     pathRaw = Constants.encode(pathStr);
> }
>
> Because of this conversion, a TreeWalk fails to find a file whose name
> contains German umlauts. When I use the platform encoding to convert the
> file path to bytes:
>
> private PathFilter(final String s) {
>     pathStr = s;
>     pathRaw = s.getBytes();
> }
>
> the TreeWalk works as expected. In fact, the file path appears to be
> stored in the platform encoding in the repository.
>
> Is this a bug or a misconfiguration of my repository? I'm using jgit
> (commit e16af839e8a0cc01c52d3648d2d28e4cb915f80f) on Windows.
A bug.
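The mismatch can be made concrete with a short sketch. It assumes the Windows platform encoding is a Latin-1 code page. ISO-8859-1 stands in for windows-1252 below, because both encode ü as the single byte 0xFC. The class and method names are hypothetical and are not jgit API. The filter matches on raw bytes (pathRaw), so the UTF-8 form of the path cannot equal a tree entry that was written in the legacy encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PathBytesDemo {
    // UTF-8 bytes, as produced by jgit's Constants.encode() for the filter path.
    static byte[] utf8Bytes(String path) {
        return path.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes in an assumed legacy 8-bit encoding; ISO-8859-1 stands in for
    // windows-1252 here (both map u-umlaut to 0xFC).
    static byte[] legacyBytes(String path) {
        return path.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "M\u00fcller.txt" is "Müller.txt"; the escape keeps this source file ASCII-only.
        String path = "M\u00fcller.txt";
        byte[] utf8 = utf8Bytes(path);
        byte[] legacy = legacyBytes(path);
        // u-umlaut is 0xC3 0xBC in UTF-8 but the single byte 0xFC in the legacy
        // encoding, so a byte-wise comparison of the two paths fails.
        System.out.println("UTF-8:      " + utf8.length + " bytes");
        System.out.println("ISO-8859-1: " + legacy.length + " bytes");
        System.out.println("equal: " + Arrays.equals(utf8, legacy));
    }
}
```

Running it reports 11 bytes for the UTF-8 form and 10 bytes for the legacy form. It also reports that the two arrays are not equal.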
The problem here is that we need to allow multiple encodings, since no
reliable encoding is specified anywhere. The approach I advocate is the one
we use for handling encodings in general: if the bytes look like UTF-8,
treat them as UTF-8; otherwise, fall back to another encoding. This is
expensive, however. We also have all the other issues, such as
case-insensitive names and the odd property of Unicode that the same
character can be encoded as more than one sequence of code points, as
employed by Apple.
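The "looks like UTF-8, else fall back" idea can be sketched as follows. This is not jgit's implementation. It assumes ISO-8859-1 as the fallback encoding, and the class, constant, and method names are hypothetical. The sketch decodes with a strict UTF-8 decoder and falls back when decoding fails. A second helper addresses the Unicode issue: a precomposed ü and a u followed by a combining diaeresis are different code point sequences. The helper uses java.text.Normalizer to compare both names in NFC form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class PathDecoding {
    // Hypothetical fallback; a real implementation might make this configurable.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    // Decode raw path bytes: strict UTF-8 first, the fallback charset if that fails.
    static String decodePath(byte[] raw) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the legacy 8-bit encoding.
            return new String(raw, FALLBACK);
        }
    }

    // Compare two names after normalizing both to NFC, so a precomposed
    // character and its decomposed sequence count as the same name.
    static boolean sameName(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        byte[] utf8 = {'M', (byte) 0xC3, (byte) 0xBC, 'l', 'l', 'e', 'r'};
        byte[] latin1 = {'M', (byte) 0xFC, 'l', 'l', 'e', 'r'};
        System.out.println(decodePath(utf8).equals(decodePath(latin1))); // true
        String precomposed = "M\u00fcller"; // U+00FC
        String decomposed = "Mu\u0308ller"; // u + U+0308 COMBINING DIAERESIS
        System.out.println(precomposed.equals(decomposed));    // false
        System.out.println(sameName(precomposed, decomposed)); // true
    }
}
```

The heuristic can misfire. A legacy byte sequence that happens to be valid UTF-8 is decoded as UTF-8, and every path pays for a decoding attempt. The sketch does not address case-insensitive file systems.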
-- robin