git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robin Rosenberg <robin.rosenberg@dewire.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: git@vger.kernel.org
Subject: Re: [JGIT PATCH 4/6] Add QuotedString class to handle C-style quoting rules
Date: Thu, 11 Dec 2008 01:33:51 +0100	[thread overview]
Message-ID: <200812110133.51614.robin.rosenberg@dewire.com> (raw)
In-Reply-To: <20081210234130.GB32487@spearce.org>

torsdag 11 december 2008 00:41:30 skrev Shawn O. Pearce:
> > > +	public void testQuote_OctalAll() {
> > > +		assertQuote("\1", "\\001");
> > > +		assertQuote("~", "\\176");
> > > +		assertQuote("\u00ff", "\\303\\277"); // \u00ff in UTF-8
> > > +	}
> >
> > What do we do with non-UTF8 names? I think we should
> > follow the logic we use when parsing commits and paths
> > in other places.
> 
> Then we're totally f'd.
> 
> Git has no specific encoding on file names.  If we get a standard
> Java Unicode string and get asked to quote it characters with
> code points above 127 need to be escaped as an octal escape code
> according to the Git style.  Further the Git style only permits
> octal escapes that result in a value <= 255, aka an unsigned char.
> 
> The name needs to be encoded into an 8-bit encoding, and UTF-8 is
> the only encoding that will represent every valid Unicode character.
> Elsewhere we sort of take the attitude that when writing data *out*
> we produce UTF-8, even if we read in ISO-whatever.  Here I'm doing
> the same thing.

So this should pass, right?

	public void testDeQuote_Latin1() {
		assertDequote("\u00c5ngstr\u00f6m", "\\305ngstr\\366m"); // Latin1
	}

	public void testDeQuote_UTF8() {
		assertDequote("\u00c5ngstr\u00f6m", "\\303\\205ngstr\\303\\266m");
	}

And possibly these actuall unquoted names, which can be produced when
core.quotepath is false

	public void testDeQuote_Rawlatin() {
		assertDequote("\u00c5ngstr\u00f6m", "\305ngstr\366m");
	}

	public void testDeQuote_RawUTF8() {
		assertDequote("\u00c5ngstr\u00f6m", "\303\205ngstr\303\266m");
	}

You also reversed the arguments to testQuote. It think we should follow the
"expected"-first conventions here too. The case above works neither way.
Using Constant.encode in the test is kind of dangerous as it does too
many conversions, so you don't know what you're testing anymore. Changing
assertDequote like this makes us able to feed byte sequences as strings
to the test method (which we cannot do if we assume UTF-8 encoding). ISO-
latin-encoding allows any byte sequence to be entered conveniently.

	private static void assertDequote(final String exp, final String in) {
		final byte[] b;
		try {
			b = ('"' + in + '"').getBytes("ISO-8859-1");
		} catch (UnsupportedEncodingException e) {
			throw new RuntimeException(e);
		}
		final String r = C.dequote(b, 0, b.length);
		assertEquals(exp, r);
	}

-- robin

  reply	other threads:[~2008-12-11  0:35 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-10 22:05 [JGIT PATCH 0/6] RawParseUtil improvements Shawn O. Pearce
2008-12-10 22:05 ` [JGIT PATCH 1/6] Simplify RawParseUtils.nextLF invocations Shawn O. Pearce
2008-12-10 22:05   ` [JGIT PATCH 2/6] Simplify RawParseUtils next and nextLF loops Shawn O. Pearce
2008-12-10 22:05     ` [JGIT PATCH 3/6] Correct Javadoc of RawParseUtils next and nextLF methods Shawn O. Pearce
2008-12-10 22:05       ` [JGIT PATCH 4/6] Add QuotedString class to handle C-style quoting rules Shawn O. Pearce
2008-12-10 22:05         ` [JGIT PATCH 5/6] Add Bourne style quoting for TransportGitSsh Shawn O. Pearce
2008-12-10 22:05           ` [JGIT PATCH 6/6] Add ~user friendly " Shawn O. Pearce
2008-12-10 23:22         ` [JGIT PATCH 4/6] Add QuotedString class to handle C-style quoting rules Robin Rosenberg
2008-12-10 23:41           ` Shawn O. Pearce
2008-12-11  0:33             ` Robin Rosenberg [this message]
2008-12-11  0:57               ` [JGIT PATCH 4/6 v3] Add QuotedString class to handle Git path style " Shawn O. Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200812110133.51614.robin.rosenberg@dewire.com \
    --to=robin.rosenberg@dewire.com \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).