* [JGIT PATCH 00/14] TreeWalk D/F conflict detection @ 2008-08-19 22:51 Shawn O. Pearce 2008-08-19 22:51 ` [JGIT PATCH] Support path names longer than 4095 bytes in DirCache Shawn O. Pearce 2008-08-19 22:54 ` [JGIT PATCH 00/14] TreeWalk D/F conflict detection Shawn O. Pearce 0 siblings, 2 replies; 6+ messages in thread From: Shawn O. Pearce @ 2008-08-19 22:51 UTC (permalink / raw) To: Robin Rosenberg; +Cc: git This series is about fixing the "mistake" in Git trees where subtrees sort as through their name is "path/" and not "path". Within a TreeWalk this is a problem because the tree contents: Tree 1 Tree 2 ------------------------ 100644 a 100644 a.c 040000 a 100644 b needs to merge together both "a" paths from tree 1 and tree 2, but these appear at different points in time when we merge-sort the two trees together. We use an infinite look-ahead and look-behind to identify these cases and make the iteration look like this instead: Tree 1 Tree 2 ------------------------ 100644 a 040000 a 100644 a.c 100644 b which allows the application to handle the D/F conflict in a single step, even though the contents of "a" may now be out-of-order within the DirCache. Fortunately DirCacheBuilder can automatically fix this sort of ordering problem. Shawn O. Pearce (14): Detect path names which overflow the name length field in the index Fix NB.decodeUInt16 to correctly handle the high byte Add test cases for NB.encode and NB.decode family of routines Fix DirCache's skip over null byte padding when reading a DIRC file Fix usage of assertEquals in DirCacheIteratorTest Refactor AbstractTreeIterator.pathCompare to force another mode Micro-optimize AbstractTreeIterator.pathCompare Optimize path comparsion within subtrees during TreeWalk Refactor AbstractTreeIterator semantics to start on first entry Make all AbstractTreeIterator implementations bi-directional Expose beginning of iterator indication from AbstractTreeIterator Allow application code to set ObjectIds in DirCacheEntry Create NameConflictTreeWalk to transparently detect D/F conflicts Add test case for NameConflictTreeWalk .../spearce/egit/core/ContainerTreeIterator.java | 9 +- .../jgit/dircache/DirCacheBuilderIteratorTest.java | 2 +- .../jgit/dircache/DirCacheIteratorTest.java | 28 +- .../jgit/treewalk/CanonicalTreeParserTest.java | 261 ++++++++++++++++ .../jgit/treewalk/NameConflictTreeWalkTest.java | 205 ++++++++++++ .../tst/org/spearce/jgit/util/NBTest.java | 328 ++++++++++++++++++++ .../src/org/spearce/jgit/dircache/DirCache.java | 2 +- .../jgit/dircache/DirCacheBuildIterator.java | 9 +- .../org/spearce/jgit/dircache/DirCacheEntry.java | 41 +++- .../spearce/jgit/dircache/DirCacheIterator.java | 84 +++-- .../jgit/treewalk/AbstractTreeIterator.java | 129 +++++--- .../spearce/jgit/treewalk/CanonicalTreeParser.java | 94 +++++- .../spearce/jgit/treewalk/EmptyTreeIterator.java | 12 +- .../spearce/jgit/treewalk/FileTreeIterator.java | 5 +- .../jgit/treewalk/NameConflictTreeWalk.java | 237 ++++++++++++++ .../src/org/spearce/jgit/treewalk/TreeWalk.java | 18 +- .../spearce/jgit/treewalk/WorkingTreeIterator.java | 132 ++++----- org.spearce.jgit/src/org/spearce/jgit/util/NB.java | 29 ++- 18 files changed, 1418 insertions(+), 207 deletions(-) create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/treewalk/CanonicalTreeParserTest.java create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/treewalk/NameConflictTreeWalkTest.java create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/util/NBTest.java create mode 100644 org.spearce.jgit/src/org/spearce/jgit/treewalk/NameConflictTreeWalk.java ^ permalink raw reply [flat|nested] 6+ messages in thread
* [JGIT PATCH] Support path names longer than 4095 bytes in DirCache 2008-08-19 22:51 [JGIT PATCH 00/14] TreeWalk D/F conflict detection Shawn O. Pearce @ 2008-08-19 22:51 ` Shawn O. Pearce 2008-08-19 22:54 ` [JGIT PATCH 00/14] TreeWalk D/F conflict detection Shawn O. Pearce 1 sibling, 0 replies; 6+ messages in thread From: Shawn O. Pearce @ 2008-08-19 22:51 UTC (permalink / raw) To: Robin Rosenberg; +Cc: git C Git supports long paths within the index file by setting the length portion of the flags field to 4095 (0xfff) and then it does a simple C strlen() on the remaining part of the string to locate the first null byte. The trick works because every index record must have between 1 and 8 null bytes trailing the entry as padding. C Git had this rule to make it easy to pass the mmap'd name portion of an entry directly to system calls, and it helped to ensure all members were aligned properly in memory. We now support infinite length paths (well, up to 2GB) by using the same encoding strategy, however paths under the 4095 limit still perform better as they do not require a single byte read loop and a growing/shrinking byte buffer. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- .../jgit/dircache/DirCacheLargePathTest.java | 106 ++++++++++++++++++++ .../org/spearce/jgit/dircache/DirCacheEntry.java | 39 ++++++-- 2 files changed, 136 insertions(+), 9 deletions(-) create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheLargePathTest.java diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheLargePathTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheLargePathTest.java new file mode 100644 index 0000000..4ea286c --- /dev/null +++ b/org.spearce.jgit.test/tst/org/spearce/jgit/dircache/DirCacheLargePathTest.java @@ -0,0 +1,106 @@ +/* + * Copyright (C) 2008, Google Inc. + * + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials provided + * with the distribution. + * + * - Neither the name of the Git Development Community nor the + * names of its contributors may be used to endorse or promote + * products derived from this software without specific prior + * written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND + * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, + * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF + * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +package org.spearce.jgit.dircache; + +import java.io.IOException; + +import org.spearce.jgit.errors.CorruptObjectException; +import org.spearce.jgit.lib.RepositoryTestCase; + +public class DirCacheLargePathTest extends RepositoryTestCase { + public void testPath_4090() throws Exception { + testLongPath(4090); + } + + public void testPath_4094() throws Exception { + testLongPath(4094); + } + + public void testPath_4095() throws Exception { + testLongPath(4095); + } + + public void testPath_4096() throws Exception { + testLongPath(4096); + } + + public void testPath_16384() throws Exception { + testLongPath(16384); + } + + private void testLongPath(final int len) throws CorruptObjectException, + IOException { + final String longPath = makeLongPath(len); + final String shortPath = "~~~ shorter-path"; + + final DirCacheEntry longEnt = new DirCacheEntry(longPath); + final DirCacheEntry shortEnt = new DirCacheEntry(shortPath); + assertEquals(longPath, longEnt.getPathString()); + assertEquals(shortPath, shortEnt.getPathString()); + + { + final DirCache dc1 = DirCache.lock(db); + { + final DirCacheBuilder b = dc1.builder(); + b.add(longEnt); + b.add(shortEnt); + assertTrue(b.commit()); + } + assertEquals(2, dc1.getEntryCount()); + assertSame(longEnt, dc1.getEntry(0)); + assertSame(shortEnt, dc1.getEntry(1)); + } + { + final DirCache dc2 = DirCache.read(db); + assertEquals(2, dc2.getEntryCount()); + + assertNotSame(longEnt, dc2.getEntry(0)); + assertEquals(longPath, dc2.getEntry(0).getPathString()); + + assertNotSame(shortEnt, dc2.getEntry(1)); + assertEquals(shortPath, dc2.getEntry(1).getPathString()); + } + } + + private static String makeLongPath(final int len) { + final StringBuilder r = new StringBuilder(len); + for (int i = 0; i < len; i++) + r.append('a' + (i % 26)); + return r.toString(); + } +} diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java index 011bc16..dbd74b5 100644 --- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java +++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCacheEntry.java @@ -37,6 +37,8 @@ package org.spearce.jgit.dircache; +import java.io.ByteArrayOutputStream; +import java.io.EOFException; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; @@ -105,18 +107,36 @@ DirCacheEntry(final byte[] sharedInfo, final int infoAt, NB.readFully(in, info, infoOffset, INFO_LEN); int pathLen = NB.decodeUInt16(info, infoOffset + P_FLAGS) & NAME_MASK; - if (pathLen == NAME_MASK) - throw new IOException("Path name too long for jgit"); - path = new byte[pathLen]; - NB.readFully(in, path, 0, pathLen); + int skipped = 0; + if (pathLen < NAME_MASK) { + path = new byte[pathLen]; + NB.readFully(in, path, 0, pathLen); + } else { + final ByteArrayOutputStream tmp = new ByteArrayOutputStream(); + { + final byte[] buf = new byte[NAME_MASK]; + NB.readFully(in, buf, 0, NAME_MASK); + tmp.write(buf); + } + for (;;) { + final int c = in.read(); + if (c < 0) + throw new EOFException("Short read of block."); + if (c == 0) + break; + tmp.write(c); + } + path = tmp.toByteArray(); + pathLen = path.length; + skipped = 1; // we already skipped 1 '\0' above to break the loop. + } // Index records are padded out to the next 8 byte alignment // for historical reasons related to how C Git read the files. // final int actLen = INFO_LEN + pathLen; final int expLen = (actLen + 8) & ~7; - if (actLen != expLen) - NB.skipFully(in, expLen - actLen); + NB.skipFully(in, expLen - actLen - skipped); } /** @@ -140,9 +160,10 @@ public DirCacheEntry(final byte[] newPath) { infoOffset = 0; path = newPath; - if (path.length >= NAME_MASK) - throw new IllegalArgumentException("Path name too long for jgit"); - NB.encodeInt16(info, infoOffset + P_FLAGS, path.length); + if (path.length < NAME_MASK) + NB.encodeInt16(info, infoOffset + P_FLAGS, path.length); + else + NB.encodeInt16(info, infoOffset + P_FLAGS, NAME_MASK); } void write(final OutputStream os) throws IOException { -- 1.6.0.96.g2fad1.dirty ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [JGIT PATCH 00/14] TreeWalk D/F conflict detection 2008-08-19 22:51 [JGIT PATCH 00/14] TreeWalk D/F conflict detection Shawn O. Pearce 2008-08-19 22:51 ` [JGIT PATCH] Support path names longer than 4095 bytes in DirCache Shawn O. Pearce @ 2008-08-19 22:54 ` Shawn O. Pearce 1 sibling, 0 replies; 6+ messages in thread From: Shawn O. Pearce @ 2008-08-19 22:54 UTC (permalink / raw) To: Robin Rosenberg; +Cc: git Sorry about that, my git-send-email directory had this cover letter laying around in it from last time. There is no series, I only meant to post the single patch "Support path names longer than 4095...". > Shawn O. Pearce (14): > Detect path names which overflow the name length field in the index > Fix NB.decodeUInt16 to correctly handle the high byte > Add test cases for NB.encode and NB.decode family of routines > Fix DirCache's skip over null byte padding when reading a DIRC file > Fix usage of assertEquals in DirCacheIteratorTest > Refactor AbstractTreeIterator.pathCompare to force another mode > Micro-optimize AbstractTreeIterator.pathCompare > Optimize path comparsion within subtrees during TreeWalk > Refactor AbstractTreeIterator semantics to start on first entry > Make all AbstractTreeIterator implementations bi-directional > Expose beginning of iterator indication from AbstractTreeIterator > Allow application code to set ObjectIds in DirCacheEntry > Create NameConflictTreeWalk to transparently detect D/F conflicts > Add test case for NameConflictTreeWalk ... > 18 files changed, 1418 insertions(+), 207 deletions(-) -- Shawn. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [JGIT PATCH 00/14] TreeWalk D/F conflict detection @ 2008-08-18 23:53 Shawn O. Pearce 2008-08-19 8:52 ` David Woodhouse 0 siblings, 1 reply; 6+ messages in thread From: Shawn O. Pearce @ 2008-08-18 23:53 UTC (permalink / raw) To: Robin Rosenberg; +Cc: git This series is about fixing the "mistake" in Git trees where subtrees sort as through their name is "path/" and not "path". Within a TreeWalk this is a problem because the tree contents: Tree 1 Tree 2 ------------------------ 100644 a 100644 a.c 040000 a 100644 b needs to merge together both "a" paths from tree 1 and tree 2, but these appear at different points in time when we merge-sort the two trees together. We use an infinite look-ahead and look-behind to identify these cases and make the iteration look like this instead: Tree 1 Tree 2 ------------------------ 100644 a 040000 a 100644 a.c 100644 b which allows the application to handle the D/F conflict in a single step, even though the contents of "a" may now be out-of-order within the DirCache. Fortunately DirCacheBuilder can automatically fix this sort of ordering problem. Shawn O. Pearce (14): Detect path names which overflow the name length field in the index Fix NB.decodeUInt16 to correctly handle the high byte Add test cases for NB.encode and NB.decode family of routines Fix DirCache's skip over null byte padding when reading a DIRC file Fix usage of assertEquals in DirCacheIteratorTest Refactor AbstractTreeIterator.pathCompare to force another mode Micro-optimize AbstractTreeIterator.pathCompare Optimize path comparsion within subtrees during TreeWalk Refactor AbstractTreeIterator semantics to start on first entry Make all AbstractTreeIterator implementations bi-directional Expose beginning of iterator indication from AbstractTreeIterator Allow application code to set ObjectIds in DirCacheEntry Create NameConflictTreeWalk to transparently detect D/F conflicts Add test case for NameConflictTreeWalk .../spearce/egit/core/ContainerTreeIterator.java | 9 +- .../jgit/dircache/DirCacheBuilderIteratorTest.java | 2 +- .../jgit/dircache/DirCacheIteratorTest.java | 28 +- .../jgit/treewalk/CanonicalTreeParserTest.java | 261 ++++++++++++++++ .../jgit/treewalk/NameConflictTreeWalkTest.java | 205 ++++++++++++ .../tst/org/spearce/jgit/util/NBTest.java | 328 ++++++++++++++++++++ .../src/org/spearce/jgit/dircache/DirCache.java | 2 +- .../jgit/dircache/DirCacheBuildIterator.java | 9 +- .../org/spearce/jgit/dircache/DirCacheEntry.java | 41 +++- .../spearce/jgit/dircache/DirCacheIterator.java | 84 +++-- .../jgit/treewalk/AbstractTreeIterator.java | 129 +++++--- .../spearce/jgit/treewalk/CanonicalTreeParser.java | 94 +++++- .../spearce/jgit/treewalk/EmptyTreeIterator.java | 12 +- .../spearce/jgit/treewalk/FileTreeIterator.java | 5 +- .../jgit/treewalk/NameConflictTreeWalk.java | 237 ++++++++++++++ .../src/org/spearce/jgit/treewalk/TreeWalk.java | 18 +- .../spearce/jgit/treewalk/WorkingTreeIterator.java | 132 ++++----- org.spearce.jgit/src/org/spearce/jgit/util/NB.java | 29 ++- 18 files changed, 1418 insertions(+), 207 deletions(-) create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/treewalk/CanonicalTreeParserTest.java create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/treewalk/NameConflictTreeWalkTest.java create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/util/NBTest.java create mode 100644 org.spearce.jgit/src/org/spearce/jgit/treewalk/NameConflictTreeWalk.java ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [JGIT PATCH 00/14] TreeWalk D/F conflict detection 2008-08-18 23:53 Shawn O. Pearce @ 2008-08-19 8:52 ` David Woodhouse 2008-08-19 14:24 ` Shawn O. Pearce 0 siblings, 1 reply; 6+ messages in thread From: David Woodhouse @ 2008-08-19 8:52 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Robin Rosenberg, git On Mon, 2008-08-18 at 16:53 -0700, Shawn O. Pearce wrote: > This series is about fixing the "mistake" in Git trees where > subtrees sort as through their name is "path/" and not "path". Er, really? Does that mean this commit is broken, then? http://git.kernel.org/?p=linux/kernel/git/dwmw2/misc-git-hacks.git;a=commitdiff;h=fc1b73da I shouldn't need to know that -- I'd _really_ like a version of 'git-hash-object -t tree' which validates and sorts its input for me. And maybe even takes input in _text_ form, so I don't have to convert the sha1 to binary. -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [JGIT PATCH 00/14] TreeWalk D/F conflict detection 2008-08-19 8:52 ` David Woodhouse @ 2008-08-19 14:24 ` Shawn O. Pearce 0 siblings, 0 replies; 6+ messages in thread From: Shawn O. Pearce @ 2008-08-19 14:24 UTC (permalink / raw) To: David Woodhouse; +Cc: Robin Rosenberg, git David Woodhouse <dwmw2@infradead.org> wrote: > On Mon, 2008-08-18 at 16:53 -0700, Shawn O. Pearce wrote: > > This series is about fixing the "mistake" in Git trees where > > subtrees sort as through their name is "path/" and not "path". > > Er, really? Does that mean this commit is broken, then? > > http://git.kernel.org/?p=linux/kernel/git/dwmw2/misc-git-hacks.git;a=commitdiff;h=fc1b73da Yes, that is broken. > I shouldn't need to know that -- I'd _really_ like a version of > 'git-hash-object -t tree' which validates and sorts its input for me. > And maybe even takes input in _text_ form, so I don't have to convert > the sha1 to binary. Try `git mktree` instead. It uses a text form, and it corrects the sorting for you. Though funny names that contain an LF may be harder to send to mktree. -- Shawn. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-08-19 22:55 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-08-19 22:51 [JGIT PATCH 00/14] TreeWalk D/F conflict detection Shawn O. Pearce 2008-08-19 22:51 ` [JGIT PATCH] Support path names longer than 4095 bytes in DirCache Shawn O. Pearce 2008-08-19 22:54 ` [JGIT PATCH 00/14] TreeWalk D/F conflict detection Shawn O. Pearce -- strict thread matches above, loose matches on Subject: below -- 2008-08-18 23:53 Shawn O. Pearce 2008-08-19 8:52 ` David Woodhouse 2008-08-19 14:24 ` Shawn O. Pearce
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).