git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Robin Rosenberg <robin.rosenberg@dewire.com>
Cc: Mark Struberg <struberg@yahoo.de>, git@vger.kernel.org
Subject: [JGIT PATCH] Fix data corruption in DirCacheIterator when EmptyTreeIterator is used
Date: Mon, 11 May 2009 10:52:28 -0700	[thread overview]
Message-ID: <1242064348-13197-1-git-send-email-spearce@spearce.org> (raw)

When a DirCacheIterator was wrapped in an EmptyTreeIterator (such
as during a parallel TreeWalk where the other trees contain a
path that does not appear in the DirCachIterator) we corrupted
the DirCacheEntry path buffers by overwriting part of file name
components with '/' to match the other tree iterators' path length.

This happened because DirCacheIterator violated the iterator
assumption that the path buffer is always mutable.  Instead of
creating a mutable path, DirCacheIterator reused the path buffer
from the DirCacheEntry or the DirCacheTree.  These reused byte
arrays aren't mutable.

By delegating the EmptyTreeIterator creation to each iterator type
we can permit DirCacheIterator to control how it builds the empty
tree for the caller, ensuring that it copies the path buffer before
writing the '/' suffix onto it.

When EmptyTreeIterator has to grow the path buffer to create a new
iterator around itself, we can't just blindly replace every parent
iterator buffer with the larger path buffer.  DirCacheIterators will
be using a different path buffer, and they want to retain their
old path buffer, not the new larger buffer.

Noticed-by: Mark Struberg <struberg@yahoo.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---

 So uhm, test cases...  I can't come up with a simple test case
 which causes the data corruption, but Mark found a very complex
 one in egit.git.

 Now that I understand what the heck was wrong, its obvious we
 need to do this, as DirCacheIterator was breaking the contract.

 I'd love to have a test case, but I just spent an hour trying to
 come up with one and failed.  I'd rather have the data corruption
 fix without tests than not at all, since its obviously wrong as-is.
 But if you want a test, propose one.  I'm just not seeing the
 magic necessary to get the corruption to trigger.

 .../spearce/jgit/dircache/DirCacheIterator.java    |    9 ++++++++
 .../jgit/treewalk/AbstractTreeIterator.java        |   19 +++++++++++++---
 .../spearce/jgit/treewalk/EmptyTreeIterator.java   |   22 ++++++++++++++++++++
 .../src/org/spearce/jgit/treewalk/TreeWalk.java    |    2 +-
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheIterator.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheIterator.java
index 356d735..6fb9510 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheIterator.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheIterator.java
@@ -45,6 +45,7 @@
 import org.spearce.jgit.lib.FileMode;
 import org.spearce.jgit.lib.Repository;
 import org.spearce.jgit.treewalk.AbstractTreeIterator;
+import org.spearce.jgit.treewalk.EmptyTreeIterator;
 
 /**
  * Iterate a {@link DirCache} as part of a <code>TreeWalk</code>.
@@ -126,6 +127,14 @@ public AbstractTreeIterator createSubtreeIterator(final Repository repo)
 	}
 
 	@Override
+	public EmptyTreeIterator createEmptyTreeIterator() {
+		final byte[] n = new byte[Math.max(pathLen + 1, DEFAULT_PATH_SIZE)];
+		System.arraycopy(path, 0, n, 0, pathLen);
+		n[pathLen] = '/';
+		return new EmptyTreeIterator(this, n, pathLen + 1);
+	}
+
+	@Override
 	public byte[] idBuffer() {
 		if (currentSubtree != null)
 			return subtreeId;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
index 2ff3b99..057250e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
@@ -73,7 +73,8 @@
  * @see CanonicalTreeParser
  */
 public abstract class AbstractTreeIterator {
-	static final int DEFAULT_PATH_SIZE = 128;
+	/** Default size for the {@link #path} buffer. */
+	protected static final int DEFAULT_PATH_SIZE = 128;
 
 	/** A dummy object id buffer that matches the zero ObjectId. */
 	protected static final byte[] zeroid = new byte[Constants.OBJECT_ID_LENGTH];
@@ -251,9 +252,10 @@ protected AbstractTreeIterator(final AbstractTreeIterator p,
 	 *            be moved into the larger buffer.
 	 */
 	protected void growPath(final int len) {
-		final byte[] n = new byte[path.length << 1];
-		System.arraycopy(path, 0, n, 0, len);
-		for (AbstractTreeIterator p = this; p != null; p = p.parent)
+		final byte[] o = path;
+		final byte[] n = new byte[o.length << 1];
+		System.arraycopy(o, 0, n, 0, len);
+		for (AbstractTreeIterator p = this; p != null && p.path == o; p = p.parent)
 			p.path = n;
 	}
 
@@ -400,6 +402,15 @@ public abstract AbstractTreeIterator createSubtreeIterator(Repository repo)
 			throws IncorrectObjectTypeException, IOException;
 
 	/**
+	 * Create a new iterator as though the current entry were a subtree.
+	 *
+	 * @return a new empty tree iterator.
+	 */
+	public EmptyTreeIterator createEmptyTreeIterator() {
+		return new EmptyTreeIterator(this);
+	}
+
+	/**
 	 * Create a new iterator for the current entry's subtree.
 	 * <p>
 	 * The parent reference of the iterator must be <code>this</code>, otherwise
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/EmptyTreeIterator.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/EmptyTreeIterator.java
index eaca04e..cc5ea99 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/EmptyTreeIterator.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/EmptyTreeIterator.java
@@ -57,6 +57,28 @@ EmptyTreeIterator(final AbstractTreeIterator p) {
 		pathLen = pathOffset;
 	}
 
+	/**
+	 * Create an iterator for a subtree of an existing iterator.
+	 * <p>
+	 * The caller is responsible for setting up the path of the child iterator.
+	 *
+	 * @param p
+	 *            parent tree iterator.
+	 * @param childPath
+	 *            path array to be used by the child iterator. This path must
+	 *            contain the path from the top of the walk to the first child
+	 *            and must end with a '/'.
+	 * @param childPathOffset
+	 *            position within <code>childPath</code> where the child can
+	 *            insert its data. The value at
+	 *            <code>childPath[childPathOffset-1]</code> must be '/'.
+	 */
+	public EmptyTreeIterator(final AbstractTreeIterator p,
+			final byte[] childPath, final int childPathOffset) {
+		super(p, childPath, childPathOffset);
+		pathLen = childPathOffset - 1;
+	}
+
 	@Override
 	public AbstractTreeIterator createSubtreeIterator(final Repository repo)
 			throws IncorrectObjectTypeException, IOException {
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
index a41ca58..250b213 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
@@ -800,7 +800,7 @@ public void enterSubtree() throws MissingObjectException,
 			if (t.matches == ch && !t.eof() && FileMode.TREE.equals(t.mode))
 				n = t.createSubtreeIterator(db, idBuffer, curs);
 			else
-				n = new EmptyTreeIterator(t);
+				n = t.createEmptyTreeIterator();
 			tmp[i] = n;
 		}
 		depth++;
-- 
1.6.3.48.g99c76

             reply	other threads:[~2009-05-11 17:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-11 17:52 Shawn O. Pearce [this message]
2009-05-11 18:59 ` [JGIT PATCH] Fix data corruption in DirCacheIterator when EmptyTreeIterator is used Robin Rosenberg
2009-05-11 19:44   ` Shawn O. Pearce
  -- strict thread matches above, loose matches on Subject: below --
2009-05-12  0:04 Mark Struberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1242064348-13197-1-git-send-email-spearce@spearce.org \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=robin.rosenberg@dewire.com \
    --cc=struberg@yahoo.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).