* [JGIT PATCH] The default encoding for reading commits is UTF-8 rather than system default
@ 2009-10-07 15:44 Constantine Plotnikov
2009-10-08 4:16 ` Shawn O. Pearce
0 siblings, 1 reply; 2+ messages in thread
From: Constantine Plotnikov @ 2009-10-07 15:44 UTC
To: git; +Cc: Constantine Plotnikov
When reading commits, the system default encoding was used if the commit
did not specify an encoding. This patch extends a test to check that the
commit message is decoded correctly (the test fails with the old
implementation if the system encoding is not UTF-8), and fixes the
Commit.decode() method to use UTF-8 if the commit object does not specify
an encoding.
Signed-off-by: Constantine Plotnikov <constantine.plotnikov@gmail.com>
---
See the "DISCUSSION" section of man git-commit for why UTF-8 should be
used. Note that this was already implemented correctly in the
ObjectWriter.writeCommit(...) method, but for some reason Commit.decode()
was not implemented the same way.
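The failure that the new test assertion catches can be reproduced without JGit. The sketch below is illustrative only (the class DefaultCharsetRoundTrip and its roundTrips method are hypothetical, not part of JGit): it encodes the test message as UTF-8, which per the note above is what ObjectWriter.writeCommit(...) already does, and decodes it with a chosen charset. ISO-8859-1 stands in for a non-UTF-8 platform default; since JDK 18 the default charset is UTF-8 unless overridden, so on recent JVMs the old code path may not show the bug at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not JGit code.
final class DefaultCharsetRoundTrip {
    /**
     * Writes the message as UTF-8 (assumed to match ObjectWriter.writeCommit(...)
     * when no encoding is set) and reads it back with readCharset, the way the
     * old Commit.decode() read it back with the platform default.
     */
    static boolean roundTrips(String message, Charset readCharset) {
        byte[] written = message.getBytes(StandardCharsets.UTF_8);
        String read = new String(written, readCharset);
        return read.equals(message);
    }

    public static void main(String[] args) {
        String message = "\u00dcbergeeks"; // the message used by the test
        // Fixed reader: UTF-8 fallback when the commit has no encoding header.
        System.out.println(roundTrips(message, StandardCharsets.UTF_8));      // true
        // Old reader on a Latin-1 platform: U+00DC comes back as U+00C3 U+009C.
        System.out.println(roundTrips(message, StandardCharsets.ISO_8859_1)); // false
        // The charset the old code would have used on this JVM (varies by setup).
        System.out.println(Charset.defaultCharset());
    }
}
```

Pure-ASCII messages round-trip under either charset, which is why the bug only surfaces with non-ASCII text such as the "\u00dcbergeeks" message in test023.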
.../tst/org/spearce/jgit/lib/T0003_Basic.java | 3 +++
.../src/org/spearce/jgit/lib/Commit.java | 18 +++++++-----------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0003_Basic.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0003_Basic.java
index c2b1b91..4702aaf 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0003_Basic.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0003_Basic.java
@@ -348,6 +348,9 @@ public void test023_createCommitNonAnullii() throws IOException {
commit.setMessage("\u00dcbergeeks");
ObjectId cid = new ObjectWriter(db).writeCommit(commit);
assertEquals("4680908112778718f37e686cbebcc912730b3154", cid.name());
+ Commit loadedCommit = db.mapCommit(cid);
+ assertNotSame(loadedCommit, commit);
+ assertEquals(commit.getMessage(), loadedCommit.getMessage());
}
public void test024_createCommitNonAscii() throws IOException {
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Commit.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Commit.java
index 030d4a4..933b929 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/Commit.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Commit.java
@@ -299,17 +299,13 @@ private void decode() {
br.read(readBuf);
int msgstart = readBuf.length != 0 ? ( readBuf[0] == '\n' ? 1 : 0 ) : 0;
- if (encoding != null) {
- // TODO: this isn't reliable so we need to guess the encoding from the actual content
- author = new PersonIdent(new String(rawAuthor.getBytes(),encoding.name()));
- committer = new PersonIdent(new String(rawCommitter.getBytes(),encoding.name()));
- message = new String(readBuf,msgstart, readBuf.length-msgstart, encoding.name());
- } else {
- // TODO: use config setting / platform / ascii / iso-latin
- author = new PersonIdent(new String(rawAuthor.getBytes()));
- committer = new PersonIdent(new String(rawCommitter.getBytes()));
- message = new String(readBuf, msgstart, readBuf.length-msgstart);
- }
+ // If encoding is not specified, the default for commit is UTF-8
+ if (encoding == null) encoding = Constants.CHARSET;
+
+ // TODO: this isn't reliable so we need to guess the encoding from the actual content
+ author = new PersonIdent(new String(rawAuthor.getBytes(),encoding.name()));
+ committer = new PersonIdent(new String(rawCommitter.getBytes(),encoding.name()));
+ message = new String(readBuf,msgstart, readBuf.length-msgstart, encoding.name());
} catch (IOException e) {
e.printStackTrace();
} finally {
--
1.6.1.2
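For readers without the surrounding JGit source, the decode logic the patch changes can be sketched in isolation. This is a minimal sketch under stated assumptions: it assumes the raw commit layout git uses (header lines, one blank line, then the message) with an optional `encoding` header, and RawCommitMessage, headerEncoding and message are hypothetical helpers, not JGit API. Unlike the patched Commit.decode(), it decodes only the message, not the author and committer lines.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not JGit API): decodes the message of a raw commit. */
final class RawCommitMessage {
    private static final String ENCODING_HEADER = "encoding ";

    private RawCommitMessage() {}

    /** Charset named by the "encoding" header, or UTF-8 when it is absent. */
    static Charset headerEncoding(byte[] raw) {
        int pos = 0;
        // Headers end at the first empty line (a '\n' at the start of a line).
        while (pos < raw.length && raw[pos] != '\n') {
            int eol = pos;
            while (eol < raw.length && raw[eol] != '\n') {
                eol++;
            }
            // ISO-8859-1 maps each byte to one char, so the ASCII keyword
            // check works even if the line contains non-ASCII name bytes.
            String line = new String(raw, pos, eol - pos, StandardCharsets.ISO_8859_1);
            if (line.startsWith(ENCODING_HEADER)) {
                // May throw for unknown or illegal charset names.
                return Charset.forName(line.substring(ENCODING_HEADER.length()).trim());
            }
            pos = eol + 1;
        }
        // No encoding header: fall back to UTF-8, as the patch does.
        return StandardCharsets.UTF_8;
    }

    /** Message text after the blank line, decoded with headerEncoding(raw). */
    static String message(byte[] raw) {
        int start = raw.length; // no blank line found: treat the message as empty
        for (int i = 0; i + 1 < raw.length; i++) {
            if (raw[i] == '\n' && raw[i + 1] == '\n') {
                start = i + 2;
                break;
            }
        }
        return new String(raw, start, raw.length - start, headerEncoding(raw));
    }
}
```

Reading the header with ISO-8859-1 is a deliberate choice: it never fails on arbitrary bytes, so the search for the `encoding` keyword works before the real charset is known.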
* Re: [JGIT PATCH] The default encoding for reading commits is UTF-8 rather than system default
From: Shawn O. Pearce @ 2009-10-08 4:16 UTC
To: Constantine Plotnikov; +Cc: git
Constantine Plotnikov <constantine.plotnikov@gmail.com> wrote:
> See man git-commit (the section "DISCUSSION"), for justification why
> UTF-8 should be used. Note that this was already correctly implemented
> in ObjectWriter.writeCommit(...) method. But Commit.decode() was not
> implemented in the same way for some reason.
Commit predates the encoding code in ObjectWriter.writeCommit,
and it looks like we just forgot to go back and fix it. A very old bug.
Since our move to the Eclipse Foundation, we really need to follow
their IP process, which for non-committers means filing a new bug
in their Bugzilla system:
https://bugs.eclipse.org/bugs/enter_bug.cgi?product=EGit
See also:
http://wiki.eclipse.org/EGit/Contributor_Guide#Contributing_Patches
> .../tst/org/spearce/jgit/lib/T0003_Basic.java | 3 +++
> .../src/org/spearce/jgit/lib/Commit.java | 18 +++++++-----------
> 2 files changed, 10 insertions(+), 11 deletions(-)
--
Shawn.