* [EGIT PATCH] Improve end-of-file detection in DirCache
@ 2009-04-08 15:50 Robin Rosenberg
2009-04-09 15:28 ` [JGIT PATCH 1/2] " Shawn O. Pearce
0 siblings, 1 reply; 6+ messages in thread
From: Robin Rosenberg @ 2009-04-08 15:50 UTC (permalink / raw)
To: spearce; +Cc: git, Robin Rosenberg
When reading from an BufferInputStream attached to a
FileInputStream the available() method seems to return
the number of unread bytes in the buffer plus the unread
number of bytes in the file.
There is no guarantee for this behaviour so this quick fix
might not be as stable as we would wish.
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
---
http://code.google.com/p/egit/issues/detail?id=66
---
.../src/org/spearce/jgit/dircache/DirCache.java | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
index 8eb4022..657762e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
@@ -45,7 +45,6 @@
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
-import java.nio.channels.FileChannel;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.util.Comparator;
@@ -344,8 +343,6 @@ public void clear() {
private void readFrom(final FileInputStream inStream) throws IOException,
CorruptObjectException {
- final FileChannel fd = inStream.getChannel();
- final long sizeOnDisk = fd.size();
final BufferedInputStream in = new BufferedInputStream(inStream);
// Read the index header and verify we understand it.
@@ -369,9 +366,16 @@ private void readFrom(final FileInputStream inStream) throws IOException,
sortedEntries[i] = new DirCacheEntry(infos, i * INFO_LEN, in);
lastModified = liveFile.lastModified();
- // After the file entries are index extensions.
- //
- while (fd.position() - in.available() < sizeOnDisk - 20) {
+ /*
+ * InputStream.available() on file input streams seems to return the
+ * rest of the file i.e. buffer size + unread file on disk. There is no
+ * guarantee for this so we need to fix this or we may miss extensions
+ * when reading the index in a few races cases.
+ *
+ * That is currently no disaster though.
+ */
+ while (in.available() > 20) {
+ // After the file entries are index extensions.
NB.readFully(in, hdr, 0, 8);
switch (NB.decodeInt32(hdr, 0)) {
case EXT_TREE: {
--
1.6.2.2.446.gfbdc0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [JGIT PATCH 1/2] Improve end-of-file detection in DirCache
2009-04-08 15:50 [EGIT PATCH] Improve end-of-file detection in DirCache Robin Rosenberg
@ 2009-04-09 15:28 ` Shawn O. Pearce
2009-04-09 15:28 ` [JGIT PATCH 2/2] Validate the DIRC footer during read Shawn O. Pearce
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Shawn O. Pearce @ 2009-04-09 15:28 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: git
When reading from a BufferInputStream attached to a FileInputStream
the available() method seems to return the number of unread bytes
in the buffer plus the unread number of bytes in the file. There is
no guarantee for this behavior, so we can't rely on it.
Instead we read the expected 20 byte SHA1 trailer into a temporary
buffer, and then poll for a remaining byte. If another byte is
still present in the input stream it indicates that there is more
than 20 bytes worth of data remaining, which means an extension
must be available at the current position. If there is in fact
an extension we put back the 21 bytes we read and proceed to do
an 8 byte read for the extension header.
This relies on BufferInputStream's ability to buffer at least 21
bytes of data, and put them all back in the event that we found
an extension. This is a much more common usage of the stream API
and is something we can rely on working correctly all of the time,
on any standard library implementation.
Found-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
I'd rather do this instead. Its accurate 100% of the time and isn't
a hack. So I'm dropping the patch I'm replying to and asking you
to review and apply this 2 patch series (or at least this first
one which should fix the problem).
.../src/org/spearce/jgit/dircache/DirCache.java | 20 +++++++++++++-------
1 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
index 8eb4022..5df0f59 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
@@ -45,7 +45,6 @@
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
-import java.nio.channels.FileChannel;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.util.Comparator;
@@ -344,13 +343,11 @@ public void clear() {
private void readFrom(final FileInputStream inStream) throws IOException,
CorruptObjectException {
- final FileChannel fd = inStream.getChannel();
- final long sizeOnDisk = fd.size();
final BufferedInputStream in = new BufferedInputStream(inStream);
// Read the index header and verify we understand it.
//
- final byte[] hdr = new byte[12];
+ final byte[] hdr = new byte[20];
NB.readFully(in, hdr, 0, 12);
if (!is_DIRC(hdr))
throw new CorruptObjectException("Not a DIRC file.");
@@ -369,13 +366,22 @@ private void readFrom(final FileInputStream inStream) throws IOException,
sortedEntries[i] = new DirCacheEntry(infos, i * INFO_LEN, in);
lastModified = liveFile.lastModified();
- // After the file entries are index extensions.
+ // After the file entries are index extensions, and then a footer.
//
- while (fd.position() - in.available() < sizeOnDisk - 20) {
- NB.readFully(in, hdr, 0, 8);
+ for (;;) {
+ in.mark(21);
+ NB.readFully(in, hdr, 0, 20);
+ if (in.read() < 0) {
+ // No extensions present; the file ended where we expected.
+ //
+ break;
+ }
+ in.reset();
+
switch (NB.decodeInt32(hdr, 0)) {
case EXT_TREE: {
final byte[] raw = new byte[NB.decodeInt32(hdr, 4)];
+ NB.skipFully(in, 8);
NB.readFully(in, raw, 0, raw.length);
tree = new DirCacheTree(raw, new MutableInteger(), null);
break;
--
1.6.2.2.600.g24c24
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [JGIT PATCH 2/2] Validate the DIRC footer during read
2009-04-09 15:28 ` [JGIT PATCH 1/2] " Shawn O. Pearce
@ 2009-04-09 15:28 ` Shawn O. Pearce
2009-04-10 2:35 ` [JGIT PATCH 1/2] Improve end-of-file detection in DirCache Robin Rosenberg
2009-04-13 11:53 ` Robin Rosenberg
2 siblings, 0 replies; 6+ messages in thread
From: Shawn O. Pearce @ 2009-04-09 15:28 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: git
To ensure we didn't read a partial file, or a file that has a corrupt
content (e.g. due power failure on a buggy unjournaled filesystem)
we now validate the entire content of the DIRC file through its final
SHA1 checksum at the end of the stream.
This slightly slows down the read path, as we now have to process a
lot of small buffers (info and path objects) through the digest, but
we're better off ensuring the content is reasonably sane.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
This isn't strictly necessary to fix the bug you identified, but
I was in the neighboorhood and stronger data validation on input
is potentially worthwhile...
.../src/org/spearce/jgit/dircache/DirCache.java | 12 +++++++++++-
.../org/spearce/jgit/dircache/DirCacheEntry.java | 13 +++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
index 5df0f59..58da014 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
@@ -47,6 +47,7 @@
import java.nio.ByteBuffer;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
+import java.util.Arrays;
import java.util.Comparator;
import org.spearce.jgit.errors.CorruptObjectException;
@@ -344,11 +345,13 @@ public void clear() {
private void readFrom(final FileInputStream inStream) throws IOException,
CorruptObjectException {
final BufferedInputStream in = new BufferedInputStream(inStream);
+ final MessageDigest md = Constants.newMessageDigest();
// Read the index header and verify we understand it.
//
final byte[] hdr = new byte[20];
NB.readFully(in, hdr, 0, 12);
+ md.update(hdr, 0, 12);
if (!is_DIRC(hdr))
throw new CorruptObjectException("Not a DIRC file.");
final int ver = NB.decodeInt32(hdr, 4);
@@ -363,7 +366,7 @@ private void readFrom(final FileInputStream inStream) throws IOException,
final byte[] infos = new byte[INFO_LEN * entryCnt];
sortedEntries = new DirCacheEntry[entryCnt];
for (int i = 0; i < entryCnt; i++)
- sortedEntries[i] = new DirCacheEntry(infos, i * INFO_LEN, in);
+ sortedEntries[i] = new DirCacheEntry(infos, i * INFO_LEN, in, md);
lastModified = liveFile.lastModified();
// After the file entries are index extensions, and then a footer.
@@ -381,8 +384,10 @@ private void readFrom(final FileInputStream inStream) throws IOException,
switch (NB.decodeInt32(hdr, 0)) {
case EXT_TREE: {
final byte[] raw = new byte[NB.decodeInt32(hdr, 4)];
+ md.update(hdr, 0, 8);
NB.skipFully(in, 8);
NB.readFully(in, raw, 0, raw.length);
+ md.update(raw, 0, raw.length);
tree = new DirCacheTree(raw, new MutableInteger(), null);
break;
}
@@ -405,6 +410,11 @@ private void readFrom(final FileInputStream inStream) throws IOException,
}
}
}
+
+ final byte[] exp = md.digest();
+ if (!Arrays.equals(exp, hdr)) {
+ throw new CorruptObjectException("DIRC checksum mismatch");
+ }
}
private static boolean is_DIRC(final byte[] hdr) {
diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java
index 6d46648..47b1cc5 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java
@@ -43,6 +43,7 @@
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
+import java.security.MessageDigest;
import java.util.Arrays;
import org.spearce.jgit.lib.AnyObjectId;
@@ -113,17 +114,19 @@
final byte[] path;
DirCacheEntry(final byte[] sharedInfo, final int infoAt,
- final InputStream in) throws IOException {
+ final InputStream in, final MessageDigest md) throws IOException {
info = sharedInfo;
infoOffset = infoAt;
NB.readFully(in, info, infoOffset, INFO_LEN);
+ md.update(info, infoOffset, INFO_LEN);
int pathLen = NB.decodeUInt16(info, infoOffset + P_FLAGS) & NAME_MASK;
int skipped = 0;
if (pathLen < NAME_MASK) {
path = new byte[pathLen];
NB.readFully(in, path, 0, pathLen);
+ md.update(path, 0, pathLen);
} else {
final ByteArrayOutputStream tmp = new ByteArrayOutputStream();
{
@@ -142,6 +145,8 @@ DirCacheEntry(final byte[] sharedInfo, final int infoAt,
path = tmp.toByteArray();
pathLen = path.length;
skipped = 1; // we already skipped 1 '\0' above to break the loop.
+ md.update(path, 0, pathLen);
+ md.update((byte) 0);
}
// Index records are padded out to the next 8 byte alignment
@@ -149,7 +154,11 @@ DirCacheEntry(final byte[] sharedInfo, final int infoAt,
//
final int actLen = INFO_LEN + pathLen;
final int expLen = (actLen + 8) & ~7;
- NB.skipFully(in, expLen - actLen - skipped);
+ final int padLen = expLen - actLen - skipped;
+ if (padLen > 0) {
+ NB.skipFully(in, padLen);
+ md.update(nullpad, 0, padLen);
+ }
}
/**
--
1.6.2.2.600.g24c24
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [JGIT PATCH 1/2] Improve end-of-file detection in DirCache
2009-04-09 15:28 ` [JGIT PATCH 1/2] " Shawn O. Pearce
2009-04-09 15:28 ` [JGIT PATCH 2/2] Validate the DIRC footer during read Shawn O. Pearce
@ 2009-04-10 2:35 ` Robin Rosenberg
2009-04-13 11:53 ` Robin Rosenberg
2 siblings, 0 replies; 6+ messages in thread
From: Robin Rosenberg @ 2009-04-10 2:35 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
torsdag 09 april 2009 17:28:18 skrev "Shawn O. Pearce" <spearce@spearce.org>:
> When reading from a BufferInputStream attached to a FileInputStream
> the available() method seems to return the number of unread bytes
> in the buffer plus the unread number of bytes in the file. There is
> no guarantee for this behavior, so we can't rely on it.
>
> Instead we read the expected 20 byte SHA1 trailer into a temporary
> buffer, and then poll for a remaining byte. If another byte is
> still present in the input stream it indicates that there is more
> than 20 bytes worth of data remaining, which means an extension
> must be available at the current position. If there is in fact
> an extension we put back the 21 bytes we read and proceed to do
> an 8 byte read for the extension header.
>
> This relies on BufferInputStream's ability to buffer at least 21
> bytes of data, and put them all back in the event that we found
> an extension. This is a much more common usage of the stream API
> and is something we can rely on working correctly all of the time,
> on any standard library implementation.
Much better. Here's a regression test, fails before, but not after.
-- robin
>From b6561578a5ff48309f69a8d442cf47ec1b51bc49 Mon Sep 17 00:00:00 2001
From: Robin Rosenberg <robin.rosenberg@dewire.com>
Date: Fri, 10 Apr 2009 04:16:31 +0200
Subject: [EGIT PATCH] Test case for the DirCache reading code.
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
---
.../spearce/jgit/dircache/DirCacheTreeTest.java | 33 ++++++++++++++++++++
1 files changed, 33 insertions(+), 0 deletions(-)
diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheTreeTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheTreeTest.java
index b37095d..aca0b90 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheTreeTest.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheTreeTest.java
@@ -37,6 +37,9 @@
package org.spearce.jgit.dircache;
+import java.io.IOException;
+
+import org.spearce.jgit.errors.CorruptObjectException;
import org.spearce.jgit.lib.RepositoryTestCase;
public class DirCacheTreeTest extends RepositoryTestCase {
@@ -147,4 +150,34 @@ public void testTwoLevelSubtree() throws Exception {
assertEquals(acLast - acFirst + 1, acTree.getEntrySpan());
assertFalse(acTree.isValid());
}
+
+ /**
+ * We had bugs related to buffer size in the DirCache. This test creates an
+ * index larger than the default BufferedInputStream buffer size. This made
+ * the DirCache unable to read the extensions when index size exceeded the
+ * buffer size (in some cases at least).
+ *
+ * @throws CorruptObjectException
+ * @throws IOException
+ */
+ public void testWriteReadTree() throws CorruptObjectException, IOException {
+ final DirCache dc = DirCache.lock(db);
+
+ final String A = String.format("a%2000s", "a");
+ final String B = String.format("b%2000s", "b");
+ final String[] paths = { A + ".", A + "." + B, A + "/" + B, A + "0" + B };
+ final DirCacheEntry[] ents = new DirCacheEntry[paths.length];
+ for (int i = 0; i < paths.length; i++)
+ ents[i] = new DirCacheEntry(paths[i]);
+
+ final DirCacheBuilder b = dc.builder();
+ for (int i = 0; i < ents.length; i++)
+ b.add(ents[i]);
+
+ b.commit();
+ DirCache read = DirCache.read(db);
+
+ assertEquals(paths.length, read.getEntryCount());
+ assertEquals(1, read.getCacheTree(true).getChildCount());
+ }
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [JGIT PATCH 1/2] Improve end-of-file detection in DirCache
2009-04-09 15:28 ` [JGIT PATCH 1/2] " Shawn O. Pearce
2009-04-09 15:28 ` [JGIT PATCH 2/2] Validate the DIRC footer during read Shawn O. Pearce
2009-04-10 2:35 ` [JGIT PATCH 1/2] Improve end-of-file detection in DirCache Robin Rosenberg
@ 2009-04-13 11:53 ` Robin Rosenberg
2009-04-13 15:42 ` Shawn O. Pearce
2 siblings, 1 reply; 6+ messages in thread
From: Robin Rosenberg @ 2009-04-13 11:53 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
As a consequence, I'll reapply the GitMoveDeleteHook reverts we made just
prior to 0.4 and start deprecating GitIndex. Any preference for re-apply or
reverting the reverts?
The reverts are in e7307f14c531d52cf231c39d844841c4adaf5e5a and
2066e55e4740d9e9cfaf455596f832ff694f853a
I think the original patches are valid.
-- robin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [JGIT PATCH 1/2] Improve end-of-file detection in DirCache
2009-04-13 11:53 ` Robin Rosenberg
@ 2009-04-13 15:42 ` Shawn O. Pearce
0 siblings, 0 replies; 6+ messages in thread
From: Shawn O. Pearce @ 2009-04-13 15:42 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: git
Robin Rosenberg <robin.rosenberg.lists@dewire.com> wrote:
>
> As a consequence, I'll reapply the GitMoveDeleteHook reverts we made just
> prior to 0.4 and start deprecating GitIndex. Any preference for re-apply or
> reverting the reverts?
>
> The reverts are in e7307f14c531d52cf231c39d844841c4adaf5e5a and
> 2066e55e4740d9e9cfaf455596f832ff694f853a
>
> I think the original patches are valid.
Hmmph.
I would cherry-pick with --no-commit and then make two notes in
the final commit message, e.g.:
...
Temporary-revert: $REVERTCOMMIT
Originally: $FIRSTCOMMIT
Signed-off-by: ...
so we can "thread" them with hyperlinks in viewers.
--
Shawn.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-04-13 15:43 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-08 15:50 [EGIT PATCH] Improve end-of-file detection in DirCache Robin Rosenberg
2009-04-09 15:28 ` [JGIT PATCH 1/2] " Shawn O. Pearce
2009-04-09 15:28 ` [JGIT PATCH 2/2] Validate the DIRC footer during read Shawn O. Pearce
2009-04-10 2:35 ` [JGIT PATCH 1/2] Improve end-of-file detection in DirCache Robin Rosenberg
2009-04-13 11:53 ` Robin Rosenberg
2009-04-13 15:42 ` Shawn O. Pearce
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).