git.vger.kernel.org archive mirror
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Robin Rosenberg <robin.rosenberg@dewire.com>
Cc: git@vger.kernel.org
Subject: [EGIT PATCH 03/26] Add Constants.encode as a utility for quick encoding in UTF-8
Date: Mon, 11 Aug 2008 18:07:50 -0700
Message-ID: <1218503293-14057-4-git-send-email-spearce@spearce.org>
In-Reply-To: <1218503293-14057-3-git-send-email-spearce@spearce.org>

We often need to convert a string to UTF-8 so that we can use it
as a path filter in a TreeWalk, or in any other place where we
assume the standard UTF-8 encoding is being used.  As we have
already done the lookup for CHARSET, we can reuse that same
Charset reference during future encoding calls, while allowing
the Charset implementation to cache and reuse the actual encoder
instance.

Whenever possible we avoid copying the result, as most of the
time the returned ByteBuffer's backing array is exactly the
result array we need to return to our caller.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../spearce/jgit/lib/ConstantsEncodingTest.java    |   89 ++++++++++++++++++++
 .../src/org/spearce/jgit/lib/Constants.java        |   25 ++++++
 2 files changed, 114 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/lib/ConstantsEncodingTest.java
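
The copy-avoidance described in the commit message can be tried outside
of jgit with a standalone sketch like the one below.  It mirrors the
method added to Constants.java in this patch, but the class name
EncodeSketch and the local UTF8 constant are hypothetical stand-ins for
Constants and Constants.CHARSET.  It relies on Charset.encode returning
a buffer whose position is zero, so limit() is the encoded length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EncodeSketch {
    // Looked up once and shared; stand-in for Constants.CHARSET.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    public static byte[] encode(final String str) {
        final ByteBuffer bb = UTF8.encode(str);
        final int len = bb.limit();

        // Fast path: the buffer is backed by an accessible array that
        // starts at offset 0 and is exactly as long as the encoded
        // data, so it can be handed back without copying.
        if (bb.hasArray() && bb.arrayOffset() == 0) {
            final byte[] arr = bb.array();
            if (arr.length == len)
                return arr;
        }

        // Slow path: the backing array is oversized, offset, or not
        // accessible, so copy only the encoded bytes.
        final byte[] copy = new byte[len];
        bb.get(copy);
        return copy;
    }

    public static void main(final String[] args) {
        System.out.println(encode("abc").length); // prints 3
    }
}
```

Whether the fast path is taken depends on how the Charset implementation
sizes its output buffer, so callers should only rely on the returned
array's contents, never on its identity.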

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/ConstantsEncodingTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/ConstantsEncodingTest.java
new file mode 100644
index 0000000..7b3e5a0
--- /dev/null
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/ConstantsEncodingTest.java
@@ -0,0 +1,89 @@
+/*
+ * Copyright (C) 2008, Google Inc.
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.lib;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+
+import junit.framework.TestCase;
+
+public class ConstantsEncodingTest extends TestCase {
+	public void testEncodeASCII_SimpleASCII()
+			throws UnsupportedEncodingException {
+		final String src = "abc";
+		final byte[] exp = { 'a', 'b', 'c' };
+		final byte[] res = Constants.encodeASCII(src);
+		assertTrue(Arrays.equals(exp, res));
+		assertEquals(src, new String(res, 0, res.length, "UTF-8"));
+	}
+
+	public void testEncodeASCII_FailOnNonASCII() {
+		final String src = "Ūnĭcōde̽";
+		try {
+			Constants.encodeASCII(src);
+			fail("Incorrectly accepted a Unicode character");
+		} catch (IllegalArgumentException err) {
+			assertEquals("Not ASCII string: " + src, err.getMessage());
+		}
+	}
+
+	public void testEncodeASCII_Number13() {
+		final long src = 13;
+		final byte[] exp = { '1', '3' };
+		final byte[] res = Constants.encodeASCII(src);
+		assertTrue(Arrays.equals(exp, res));
+	}
+
+	public void testEncode_SimpleASCII() throws UnsupportedEncodingException {
+		final String src = "abc";
+		final byte[] exp = { 'a', 'b', 'c' };
+		final byte[] res = Constants.encode(src);
+		assertTrue(Arrays.equals(exp, res));
+		assertEquals(src, new String(res, 0, res.length, "UTF-8"));
+	}
+
+	public void testEncode_Unicode() throws UnsupportedEncodingException {
+		final String src = "Ūnĭcōde̽";
+		final byte[] exp = { (byte) 0xC5, (byte) 0xAA, 0x6E, (byte) 0xC4,
+				(byte) 0xAD, 0x63, (byte) 0xC5, (byte) 0x8D, 0x64, 0x65,
+				(byte) 0xCC, (byte) 0xBD };
+		final byte[] res = Constants.encode(src);
+		assertTrue(Arrays.equals(exp, res));
+		assertEquals(src, new String(res, 0, res.length, "UTF-8"));
+	}
+}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java
index 7c2cef9..23ac3ac 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java
@@ -1,6 +1,7 @@
 /*
  * Copyright (C) 2008, Robin Rosenberg <robin.rosenberg@dewire.com>
  * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
+ * Copyright (C) 2008, Google Inc.
  *
  * All rights reserved.
  *
@@ -38,6 +39,7 @@
 
 package org.spearce.jgit.lib;
 
+import java.nio.ByteBuffer;
 import java.nio.charset.Charset;
 import java.security.MessageDigest;
 import java.security.NoSuchAlgorithmException;
@@ -387,6 +389,29 @@ public final class Constants {
 		return r;
 	}
 
+	/**
+	 * Convert a string to a byte array in the standard character encoding.
+	 * 
+	 * @param str
+	 *            the string to convert. May contain any Unicode characters.
+	 * @return a byte array representing the requested string, encoded using the
+	 *         default character encoding (UTF-8).
+	 * @see #CHARACTER_ENCODING
+	 */
+	public static byte[] encode(final String str) {
+		final ByteBuffer bb = Constants.CHARSET.encode(str);
+		final int len = bb.limit();
+		if (bb.hasArray() && bb.arrayOffset() == 0) {
+			final byte[] arr = bb.array();
+			if (arr.length == len)
+				return arr;
+		}
+
+		final byte[] arr = new byte[len];
+		bb.get(arr);
+		return arr;
+	}
+
 	static {
 		if (OBJECT_ID_LENGTH != newMessageDigest().getDigestLength())
 			throw new LinkageError("Incorrect OBJECT_ID_LENGTH.");
-- 
1.6.0.rc2.22.g71b99
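
The expected bytes in testEncode_Unicode can be checked independently of
jgit.  The sketch below uses only the JDK; the class name Utf8BytesDemo
and the helper toHex are hypothetical, and the test string is written
with \u escapes (assuming the precomposed code points U+016A, U+012D,
U+014D and the combining mark U+033D) so the result does not depend on
the source file encoding.

```java
import java.nio.charset.Charset;

public class Utf8BytesDemo {
    // Format each byte as two upper-case hex digits, space separated.
    public static String toHex(final byte[] bytes) {
        final StringBuilder sb = new StringBuilder();
        for (final byte b : bytes) {
            if (sb.length() > 0)
                sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        // Same text as the test string, spelled with escapes.
        final String src = "\u016An\u012Dc\u014Dde\u033D";
        final byte[] utf8 = src.getBytes(Charset.forName("UTF-8"));
        // prints "12 bytes: C5 AA 6E C4 AD 63 C5 8D 64 65 CC BD"
        System.out.println(utf8.length + " bytes: " + toHex(utf8));
    }
}
```

The four non-ASCII code points each take two bytes and the four ASCII
letters take one each, which gives the 12 bytes the test expects.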
