Git development


* Re: [JGIT PATCH 4/6] Add QuotedString class to handle C-style quoting rules
From: Robin Rosenberg @ 2008-12-11  0:33 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git
In-Reply-To: <20081210234130.GB32487@spearce.org>

On Thursday 11 December 2008 00:41:30, Shawn O. Pearce wrote:
> > > +	public void testQuote_OctalAll() {
> > > +		assertQuote("\1", "\\001");
> > > +		assertQuote("~", "\\176");
> > > +		assertQuote("\u00ff", "\\303\\277"); // \u00ff in UTF-8
> > > +	}
> >
> > What do we do with non-UTF8 names? I think we should
> > follow the logic we use when parsing commits and paths
> > in other places.
> 
> Then we're totally f'd.
> 
> Git has no specific encoding on file names.  If we get a standard
> Java Unicode string and get asked to quote it, characters with
> code points above 127 need to be escaped as an octal escape code
> according to the Git style.  Further, the Git style only permits
> octal escapes that result in a value <= 255, aka an unsigned char.
> 
> The name needs to be encoded into an 8-bit encoding, and UTF-8 is
> the only encoding that will represent every valid Unicode character.
> Elsewhere we sort of take the attitude that when writing data *out*
> we produce UTF-8, even if we read in ISO-whatever.  Here I'm doing
> the same thing.

So this should pass, right?

	public void testDeQuote_Latin1() {
		assertDequote("\u00c5ngstr\u00f6m", "\\305ngstr\\366m"); // Latin1
	}

	public void testDeQuote_UTF8() {
		assertDequote("\u00c5ngstr\u00f6m", "\\303\\205ngstr\\303\\266m");
	}

And possibly these actual unquoted names, which can be produced when
core.quotepath is false:

	public void testDeQuote_Rawlatin() {
		assertDequote("\u00c5ngstr\u00f6m", "\305ngstr\366m");
	}

	public void testDeQuote_RawUTF8() {
		assertDequote("\u00c5ngstr\u00f6m", "\303\205ngstr\303\266m");
	}

You also reversed the arguments to testQuote. I think we should follow the
"expected"-first convention here too. The case above works neither way.
Using Constants.encode in the test is kind of dangerous, as it does too
many conversions, so you no longer know what you're testing. Changing
assertDequote like this lets us feed byte sequences as strings to the
test method (which we cannot do if we assume UTF-8 encoding). ISO-8859-1
(Latin-1) encoding allows any byte sequence to be entered conveniently.

	private static void assertDequote(final String exp, final String in) {
		final byte[] b;
		try {
			b = ('"' + in + '"').getBytes("ISO-8859-1");
		} catch (UnsupportedEncodingException e) {
			throw new RuntimeException(e);
		}
		final String r = C.dequote(b, 0, b.length);
		assertEquals(exp, r);
	}
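The ISO-8859-1 trick used in assertDequote can be checked in isolation.
The sketch below is not part of the patch; the names Latin1Demo,
toRawBytes and octalEscapeUtf8 are made up for illustration. It shows
that ISO-8859-1 turns each char in the range 0..255 into the byte of the
same value, and that the UTF-8 bytes of U+00FF come out as the octal
escapes expected by testQuote_OctalAll.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Demo {
	// ISO-8859-1 maps chars 0..255 one-to-one onto byte values, so a Java
	// string literal can carry an arbitrary byte sequence into a test.
	static byte[] toRawBytes(String s) {
		return s.getBytes(StandardCharsets.ISO_8859_1);
	}

	// Encode to UTF-8 and write every byte as a three-digit octal escape.
	static String octalEscapeUtf8(String s) {
		StringBuilder r = new StringBuilder();
		for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
			int c = b & 0xff;
			r.append('\\');
			r.append((char) ('0' + ((c >> 6) & 03)));
			r.append((char) ('0' + ((c >> 3) & 07)));
			r.append((char) ('0' + (c & 07)));
		}
		return r.toString();
	}

	public static void main(String[] args) {
		byte[] raw = toRawBytes("\u00c5ngstr\u00f6m");
		System.out.println(raw.length); // 8: one byte per char
		System.out.println(octalEscapeUtf8("\u00ff")); // prints \303\277
	}
}
```

Encoding the same literal with UTF-8 instead would yield ten bytes, which
is why a UTF-8 based helper cannot express the raw Latin-1 test input.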

-- robin


* [PATCH 2/3 (edit v2)] gitweb: Cache $parent_commit info in git_blame()
From: Jakub Narebski @ 2008-12-11  0:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nanako Shiraishi, git, Luben Tuikov
In-Reply-To: <7v7i67zsgj.fsf@gitster.siamese.dyndns.org>

Luben Tuikov changed the 'lineno' link from leading to the commit which
gave the current version of a given block of lines, to leading to the
parent of this commit, in 244a70e (Blame "linenr" link jumps to previous
state at "orig_lineno").  This made data mining using the 'blame' view
possible.

The current implementation calls rev-parse once for each blamed line
to find the parent revision of the blamed commit, even when the same
commit appears more than once, which is inefficient.

This patch mitigates this issue by storing (caching) $parent_commit
info in %metainfo, which makes gitweb call git-rev-parse only once for
each unique commit in the blame output.
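
The caching idea can be sketched independently of gitweb.  The snippet
below is illustrative Java rather than gitweb's Perl; ParentCache,
resolveParent and the fake lookup are invented names, and the lookup
function merely stands in for the "git rev-parse $full_rev^" call.  It
counts how often the expensive lookup runs when the same commit is
blamed for several lines.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ParentCache {
	private final Map<String, String> parents = new HashMap<>();
	private final Function<String, String> lookup;
	int lookups; // how many times the expensive call actually ran

	ParentCache(Function<String, String> lookup) {
		this.lookup = lookup;
	}

	// Return the parent of a commit, running the lookup at most once
	// per distinct commit (assuming the lookup never returns null).
	String resolveParent(String commit) {
		String parent = parents.get(commit);
		if (parent == null) {
			lookups++;
			parent = lookup.apply(commit);
			parents.put(commit, parent);
		}
		return parent;
	}

	public static void main(String[] args) {
		// Stand-in for rev-parse: just append "^" to the commit name.
		ParentCache cache = new ParentCache(c -> c + "^");
		String[] blamedLines = { "aaa", "bbb", "aaa", "aaa", "bbb" };
		for (String commit : blamedLines)
			cache.resolveParent(commit);
		System.out.println(cache.lookups + " lookups for "
				+ blamedLines.length + " lines"); // 2 lookups for 5 lines
	}
}
```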


In the tables below you can see a simple benchmark comparing gitweb
performance before and after this patch:

File               | L[1] | C[2] || Time0[3] | Before[4] | After[4]
====================================================================
blob.h             |   18 |    4 || 0m1.727s |  0m2.545s |  0m2.474s
GIT-VERSION-GEN    |   42 |   13 || 0m2.165s |  0m2.448s |  0m2.071s
README             |   46 |    6 || 0m1.593s |  0m2.727s |  0m2.242s
revision.c         | 1923 |  121 || 0m2.357s | 0m30.365s |  0m7.028s
gitweb/gitweb.perl | 6291 |  428 || 0m8.080s | 1m37.244s | 0m20.627s

File               | L/C  | Before/After
=========================================
blob.h             |  4.5 |         1.03
GIT-VERSION-GEN    |  3.2 |         1.18
README             |  7.7 |         1.22
revision.c         | 15.9 |         4.32
gitweb/gitweb.perl | 14.7 |         4.71

As you can see, the greater the ratio of lines in a file to unique
commits in the blame output, the greater the gain from the new
implementation.

Footnotes:
~~~~~~~~~~
[1] Lines:
    $ wc -l <file>
[2] Individual commits in blame output:
    $ git blame -p <file> | grep author-time | wc -l
[3] Time for running "git blame -p" (user time, single run):
    $ time git blame -p <file> >/dev/null
[4] Time to run gitweb as Perl script from command line:
    $ gitweb-run.sh "p=.git;a=blame;f=<file>" > /dev/null 2>&1

The gitweb-run.sh script includes slightly modified (with adjusted
pathnames) code from gitweb_run() function from the test script
t/t9500-gitweb-standalone-no-errors.sh; gitweb config file
gitweb_config.perl contents (again up to adjusting pathnames; in
particular $projectroot variable should point to top directory of git
repository) can be found in the same place.


Alternate solutions:
~~~~~~~~~~~~~~~~~~~~
An alternate solution would be to open a bidirectional pipe to "git
cat-file --batch-check" (like in Git::Repo in gitweb caching by Lea
Wiemann), feed $long_rev^ to it, and parse its output, which has the
following form:

  926b07e694599d86cec668475071b32147c95034 commit 637

This would mean one call to git-cat-file for the whole 'blame' view,
instead of one call to git-rev-parse per each unique commit in blame
output.
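
Each such output line has three space-separated fields: object name,
type, and size in bytes.  A minimal parsing sketch follows, written in
Java for illustration only (gitweb itself is Perl); BatchCheckLine and
parse are invented names, and handling of the "missing" reply for
unknown objects is left out.

```java
final class BatchCheckLine {
	final String objectName;
	final String type;
	final long size;

	private BatchCheckLine(String objectName, String type, long size) {
		this.objectName = objectName;
		this.type = type;
		this.size = size;
	}

	// Split "<name> <type> <size>" into its three fields.
	static BatchCheckLine parse(String line) {
		String[] f = line.trim().split(" ");
		if (f.length != 3)
			throw new IllegalArgumentException("unexpected line: " + line);
		return new BatchCheckLine(f[0], f[1], Long.parseLong(f[2]));
	}

	public static void main(String[] args) {
		BatchCheckLine l = parse(
				"926b07e694599d86cec668475071b32147c95034 commit 637");
		System.out.println(l.type + " " + l.size); // commit 637
	}
}
```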


Yet another solution would be to change the use of validate_refname()
to validate_revision() when checking script parameters (CGI query or
path_info), with validate_revision being something like the following:

  sub validate_revision {
        my $rev = shift;
        return validate_refname(strip_rev_suffixes($rev));
  }

so we don't need to calculate $long_rev^, but can pass "$long_rev^" as
'hb' parameter.

This solution has the advantage that it can be easily adapted to
future incremental blame output.

Acked-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
On Wed, 10 Dec 2008, Junio C Hamano wrote:

> To recap, I think the commit log for this patch would have been much
> easier to read if it were presented in this order:
> 
>         a paragraph to establish the context;
> 
>         a paragraph to state what problem it tries to solve;
> 
>         a paragraph (or more) to explain the solution; and finally
> 
>         a paragraph to discuss possible future enhancements.

Like this?

Only the commit message has changed.

 gitweb/gitweb.perl |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1b800f4..916396a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4657,11 +4657,17 @@ HTML
 			              esc_html($rev));
 			print "</td>\n";
 		}
-		open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^")
-			or die_error(500, "Open git-rev-parse failed");
-		my $parent_commit = <$dd>;
-		close $dd;
-		chomp($parent_commit);
+		my $parent_commit;
+		if (!exists $meta->{'parent'}) {
+			open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^")
+				or die_error(500, "Open git-rev-parse failed");
+			$parent_commit = <$dd>;
+			close $dd;
+			chomp($parent_commit);
+			$meta->{'parent'} = $parent_commit;
+		} else {
+			$parent_commit = $meta->{'parent'};
+		}
 		my $blamed = href(action => 'blame',
 		                  file_name => $meta->{'filename'},
 		                  hash_base => $parent_commit);
-- 
1.6.0.4


* Re: [RFC/PATCH] Add support for a pdf version of the user manual
From: Miklos Vajna @ 2008-12-11  0:35 UTC (permalink / raw)
  To: Leo Razoumov; +Cc: Junio C Hamano, git
In-Reply-To: <ee2a733e0812101620s5fc2ff27p81826a5ff827e154@mail.gmail.com>


On Wed, Dec 10, 2008 at 07:20:42PM -0500, Leo Razoumov <slonik.az@gmail.com> wrote:
> On 12/10/08, Junio C Hamano <gitster@pobox.com> wrote:
> > "Leo Razoumov" <slonik.az@gmail.com> writes:
> >
> >  > BTW, for those of us without dblatex, is it possible to have pdf
> >  > manual pregenerated the same way html and man pages are pregenerated
> >  > for official releases in the git repo?
> >
> >
> > Those of us includes myself, so...
> 
> Ouch:-) Does it mean that such a useful patch has a low probability of
> being accepted?

First, just like the info pages, I don't think it's a problem at all if
the autogenerated pdf version is not part of git.git. I sent the patch
to provide an easy way to do the pdf generation, not to request the
autobuild of it as well.

Second, I think the autogeneration for pdf should not be done the same
way as for the man/html versions, as the pdf itself is 421K.

However, I'm happy to set up a nightly cron job to build the pdf in case
master is updated and/or there is a new release - in case there is
demand for that.



* Re: [RFC/PATCH] Add support for a pdf version of the user manual
From: Junio C Hamano @ 2008-12-11  0:37 UTC (permalink / raw)
  To: SLONIK.AZ; +Cc: Miklos Vajna, git
In-Reply-To: <ee2a733e0812101620s5fc2ff27p81826a5ff827e154@mail.gmail.com>

"Leo Razoumov" <slonik.az@gmail.com> writes:

> On 12/10/08, Junio C Hamano <gitster@pobox.com> wrote:
>> "Leo Razoumov" <slonik.az@gmail.com> writes:
>>
>>  > BTW, for those of us without dblatex, is it possible to have pdf
>>  > manual pregenerated the same way html and man pages are pregenerated
>>  > for official releases in the git repo?
>>
>>
>> Those of us includes myself, so...
>
> Ouch:-) Does it mean that such a useful patch has a low probability of
> being accepted?

As an optional "make" target, as long as it works for people with the
necessary toolchain, I have no problem with the patch, but I would
complain if the usual "make doc" tries to run a tool I do not want to
run with my regular build.  I didn't check.


* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: Linus Torvalds @ 2008-12-11  0:45 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <1228955062.27061.36.camel@starfruit.local>

On Wed, 10 Dec 2008, R. Tyler Ballance wrote:
>
> The stack size is 8M as you assumed, I'm curious as to how the kernel
> handles a process that exceeds the ulimit(2) stacksize. I know from our
> experience with this repository that when Git runs up against the
> address space (ulimit -v) that an ENOMEM or something similar is
> returned. Is there an E_NOSTACK? :) (figured I'd ask, given your
> apparent knowledge on the subject ;))

Since stack expansion doesn't involve any system calls, and since there is 
no way to recover from it anyway, the kernel has no choice: it just sends 
a SIGSEGV.

An application that wants to _can_ handle this case by installing a signal 
handler, but since signal handling needs some stack-space too, a regular 
"sigaction(SIGSEGV..)" isn't sufficient. You also need to set up a 
separate signal stack ..

Nobody really ever does that, except for some _really_ special programs. 
But it's a way to handle errors in stack allocation if you really need to. 
Git certainly does not do it.

> > (You can do something like
> > 
> > 	git rev-list --first-parent HEAD | wc -l
> 
> tyler@ccnet:~/source/slide/brian_main>  git rev-list --first-parent HEAD | wc -l
> 46751 

Ahh. yes. The 80k number is because the callchain was that deep, but since 
each recursion involves _two_ functions, it really only needed a 40k 
commit depth to the root to get there.

> > But we should definitely fix this braindamage in fsck. Rather than 
> > recursively walk the commits, we should add them to a commit list and just 
> > walk the list iteratively.
> 
> Given that this issue affects our internal (proprietary) repository, I
> can't very well give access to it or publish a clone, but I'm willing to
> help in any way I can. We maintain an internal fork of the Git tree, so
> I can apply any changes you'd like to an internal 1.6.0.4 or 1.6.0.5
> build. For obvious reasons I ran the fsck against an upstream maintained
> (stable) build of Git.

Can you try with a bigger stack? Just do

	ulimit -s 16384

and then re-try the fsck. Just to verify that this is it. If nothing else, 
it will at least give you a working fsck, even if it's obviously not the 
"correct" solution.

		Linus


* Re: [RFC/PATCH 4/3] gitweb: Incremental blame (proof of concept)
From: Junio C Hamano @ 2008-12-11  0:47 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: git, Luben Tuikov, Nanako Shiraishi, Petr Baudis,
	Fredrik Kuivinen
In-Reply-To: <20081210200908.16899.36727.stgit@localhost.localdomain>

Jakub Narebski <jnareb@gmail.com> writes:

> NOTE: This patch is an RFC proof-of-concept patch!  It should be split
> into many smaller patches for easy review (and bug finding) in the
> version meant to be applied.

Hmm, the comments an RFC asks for would certainly be based on reviews
of the patch in question, so if the patch is known to be unsuitable for
reviewing, what would that tell us, I wonder ;-)?

Among the 700 lines added/deleted, 400 lines are from a single new file,
so what may benefit from splitting would be the changes to gitweb.perl but
it does not look so bad (I haven't really read the patch, though).

> Differences between 'blame' and 'blame_incremental' output:

Hmm, are these by design in the sense that "when people are getting
incremental blame output, the normal blame output format is unsuitable for
such and such reasons and that is why there have to be these differences",
or "the code happens to produce slightly different results because it is
implemented differently; the differences are listed here as due
diligence"?

> P.P.S. What is the stance on copyright assessments in the files
> for git code, like the ones in gitweb/gitweb.perl and gitweb/blame.js?

There is no copyright assignment.  Everybody retains the copyright on
their own work.

> P.P.P.S. Should I use Signed-off-by from Pasky and Fredrik if I based
> my code on theirs, and if they all signed their patches?

I think that is in line with what Certificate of Origin asks you to do.


* Re: epic fsck SIGSEGV!
From: Junio C Hamano @ 2008-12-11  0:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: R. Tyler Ballance, Johannes Sixt, git
In-Reply-To: <alpine.LFD.2.00.0812101523570.3340@localhost.localdomain>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But we should definitely fix this braindamage in fsck. Rather than 
> recursively walk the commits, we should add them to a commit list and just 
> walk the list iteratively.
>
> Junio?

I think that is a sensible thing to do.  I may take a look at it myself
later in the week, unless somebody else (wants to do / does) this first.
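
The difference between the two traversal strategies can be sketched
outside of git.  The Java below is only an illustration, not fsck's
code: CommitWalk, Commit, countRecursive and countIterative are made-up
names, and a linear history of single-parent commits stands in for the
repository.  The recursive version needs one stack frame per commit on
the path to the root; the iterative version keeps pending commits in a
worklist on the heap, so history depth does not affect stack use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommitWalk {
	static final class Commit {
		final List<Commit> parents = new ArrayList<>();
	}

	// Recursive walk: recursion depth equals the depth of the history.
	static int countRecursive(Commit c, Set<Commit> seen) {
		if (!seen.add(c))
			return 0;
		int n = 1;
		for (Commit p : c.parents)
			n += countRecursive(p, seen);
		return n;
	}

	// Iterative walk: pending commits live in a worklist on the heap.
	static int countIterative(Commit tip) {
		Set<Commit> seen = new HashSet<>();
		Deque<Commit> pending = new ArrayDeque<>();
		pending.push(tip);
		int n = 0;
		while (!pending.isEmpty()) {
			Commit c = pending.pop();
			if (!seen.add(c))
				continue; // already visited via another child
			n++;
			for (Commit p : c.parents)
				pending.push(p);
		}
		return n;
	}

	// Build a linear history of the given length and return its tip.
	static Commit linearHistory(int length) {
		Commit tip = new Commit();
		for (int i = 1; i < length; i++) {
			Commit child = new Commit();
			child.parents.add(tip);
			tip = child;
		}
		return tip;
	}

	public static void main(String[] args) {
		Commit tip = linearHistory(200_000);
		System.out.println(countIterative(tip)); // 200000
	}
}
```

Calling countRecursive on the same deep history is the case that risks
exhausting the thread stack, which is the analogue of the fsck crash
discussed in this thread.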


* [JGIT PATCH 4/6 v3] Add QuotedString class to handle Git path style quoting rules
From: Shawn O. Pearce @ 2008-12-11  0:57 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git
In-Reply-To: <200812110133.51614.robin.rosenberg@dewire.com>

Git patch files can contain file names which are quoted using roughly
the C language quoting rules.  In order to correctly create or parse
these files we must implement a quoting style that matches those
specific rules.

QuotedString itself is an abstract API so callers can be passed a
quoting style based on the context of where their output will be
used, and multiple styles could be supported.  This may be useful
if jgit ever grows a "git for-each-ref" style of output where Perl,
Python, Tcl and Bourne style quoting might be necessary.

References through the singleton QuotedString.GIT_PATH should be
able to bypass the virtual function table, as the specific type is
mentioned in the field declaration and that type is final.  A good
JIT should be able to remove the abstraction costs when the caller
has hardcoded the quoting style.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
  Robin Rosenberg <robin.rosenberg@dewire.com> wrote:
  > So this should pass, right?

  These tests have been added, and they pass.
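
  One way both the Latin-1 and the UTF-8 inputs can decode to the same
  string is a decoder that tries strict UTF-8 first and falls back to
  ISO-8859-1 when the bytes are not valid UTF-8.  The sketch below only
  illustrates that idea; FallbackDecode is an invented name, and that
  RawParseUtils.decode behaves exactly this way is an assumption, not
  something shown in this patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class FallbackDecode {
	// Decode as UTF-8 if the bytes are valid UTF-8, else as ISO-8859-1.
	static String decode(byte[] raw) {
		CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT);
		try {
			return utf8.decode(ByteBuffer.wrap(raw)).toString();
		} catch (CharacterCodingException notUtf8) {
			// Every byte maps to a char here, so this cannot fail.
			return new String(raw, StandardCharsets.ISO_8859_1);
		}
	}

	public static void main(String[] args) {
		byte[] latin1 = { (byte) 0305, 'n', 'g', 's', 't', 'r',
				(byte) 0366, 'm' };
		byte[] utf8 = { (byte) 0303, (byte) 0205, 'n', 'g', 's', 't', 'r',
				(byte) 0303, (byte) 0266, 'm' };
		System.out.println(decode(latin1).equals(decode(utf8))); // true
	}
}
```

  The fallback is a heuristic: a Latin-1 byte sequence that happens to
  be valid UTF-8 would be decoded as UTF-8.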

  > You also reversed the arguments to testQuote. I think we should follow the
  > "expected"-first convention here too.

  Fixed.

  > Using Constants.encode in the test is kind of dangerous, as it does too
  > many conversions,

  Fixed.

 .../jgit/util/QuotedStringGitPathStyleTest.java    |  172 +++++++++++++
 .../src/org/spearce/jgit/util/QuotedString.java    |  268 ++++++++++++++++++++
 2 files changed, 440 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/util/QuotedStringGitPathStyleTest.java
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/util/QuotedString.java

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/util/QuotedStringGitPathStyleTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/util/QuotedStringGitPathStyleTest.java
new file mode 100644
index 0000000..54fbd31
--- /dev/null
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/util/QuotedStringGitPathStyleTest.java
@@ -0,0 +1,172 @@
+/*
+ * Copyright (C) 2008, Google Inc.
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.util;
+
+import static org.spearce.jgit.util.QuotedString.GIT_PATH;
+
+import java.io.UnsupportedEncodingException;
+
+import junit.framework.TestCase;
+
+import org.spearce.jgit.lib.Constants;
+
+public class QuotedStringGitPathStyleTest extends TestCase {
+	private static void assertQuote(final String exp, final String in) {
+		final String r = GIT_PATH.quote(in);
+		assertNotSame(in, r);
+		assertFalse(in.equals(r));
+		assertEquals('"' + exp + '"', r);
+	}
+
+	private static void assertDequote(final String exp, final String in) {
+		final byte[] b;
+		try {
+			b = ('"' + in + '"').getBytes("ISO-8859-1");
+		} catch (UnsupportedEncodingException e) {
+			throw new RuntimeException(e);
+		}
+		final String r = GIT_PATH.dequote(b, 0, b.length);
+		assertEquals(exp, r);
+	}
+
+	public void testQuote_Empty() {
+		assertEquals("\"\"", GIT_PATH.quote(""));
+	}
+
+	public void testDequote_Empty1() {
+		assertEquals("", GIT_PATH.dequote(new byte[0], 0, 0));
+	}
+
+	public void testDequote_Empty2() {
+		assertEquals("", GIT_PATH.dequote(new byte[] { '"', '"' }, 0, 2));
+	}
+
+	public void testDequote_SoleDq() {
+		assertEquals("\"", GIT_PATH.dequote(new byte[] { '"' }, 0, 1));
+	}
+
+	public void testQuote_BareA() {
+		final String in = "a";
+		assertSame(in, GIT_PATH.quote(in));
+	}
+
+	public void testDequote_BareA() {
+		final String in = "a";
+		final byte[] b = Constants.encode(in);
+		assertEquals(in, GIT_PATH.dequote(b, 0, b.length));
+	}
+
+	public void testDequote_BareABCZ_OnlyBC() {
+		final String in = "abcz";
+		final byte[] b = Constants.encode(in);
+		final int p = in.indexOf('b');
+		assertEquals("bc", GIT_PATH.dequote(b, p, p + 2));
+	}
+
+	public void testDequote_LoneBackslash() {
+		assertDequote("\\", "\\");
+	}
+
+	public void testQuote_NamedEscapes() {
+		assertQuote("\\a", "\u0007");
+		assertQuote("\\b", "\b");
+		assertQuote("\\f", "\f");
+		assertQuote("\\n", "\n");
+		assertQuote("\\r", "\r");
+		assertQuote("\\t", "\t");
+		assertQuote("\\v", "\u000B");
+		assertQuote("\\\\", "\\");
+		assertQuote("\\\"", "\"");
+	}
+
+	public void testDequote_NamedEscapes() {
+		assertDequote("\u0007", "\\a");
+		assertDequote("\b", "\\b");
+		assertDequote("\f", "\\f");
+		assertDequote("\n", "\\n");
+		assertDequote("\r", "\\r");
+		assertDequote("\t", "\\t");
+		assertDequote("\u000B", "\\v");
+		assertDequote("\\", "\\\\");
+		assertDequote("\"", "\\\"");
+	}
+
+	public void testDequote_OctalAll() {
+		for (int i = 0; i < 256; i++) {
+			String s = Integer.toOctalString(i);
+			while (s.length() < 3) {
+				s = "0" + s;
+			}
+			assertDequote("" + (char) i, "\\" + s);
+		}
+	}
+
+	public void testQuote_OctalAll() {
+		assertQuote("\\001", "\1");
+		assertQuote("\\176", "~");
+		assertQuote("\\303\\277", "\u00ff"); // \u00ff in UTF-8
+	}
+
+	public void testDequote_UnknownEscapeQ() {
+		assertDequote("\\q", "\\q");
+	}
+
+	public void testDequote_FooTabBar() {
+		assertDequote("foo\tbar", "foo\\tbar");
+	}
+
+	public void testDequote_Latin1() {
+		assertDequote("\u00c5ngstr\u00f6m", "\\305ngstr\\366m"); // Latin1
+	}
+
+	public void testDequote_UTF8() {
+		assertDequote("\u00c5ngstr\u00f6m", "\\303\\205ngstr\\303\\266m");
+	}
+
+	public void testDequote_RawUTF8() {
+		assertDequote("\u00c5ngstr\u00f6m", "\303\205ngstr\303\266m");
+	}
+
+	public void testDequote_RawLatin1() {
+		assertDequote("\u00c5ngstr\u00f6m", "\305ngstr\366m");
+	}
+
+	public void testQuote_Ang() {
+		assertQuote("\\303\\205ngstr\\303\\266m", "\u00c5ngstr\u00f6m");
+	}
+}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/QuotedString.java b/org.spearce.jgit/src/org/spearce/jgit/util/QuotedString.java
new file mode 100644
index 0000000..279b713
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/QuotedString.java
@@ -0,0 +1,268 @@
+/*
+ * Copyright (C) 2008, Google Inc.
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.util;
+
+import java.util.Arrays;
+
+import org.spearce.jgit.lib.Constants;
+
+/** Utility functions related to quoted string handling. */
+public abstract class QuotedString {
+	/** Quoting style that obeys the rules Git applies to file names */
+	public static final GitPathStyle GIT_PATH = new GitPathStyle();
+
+	/**
+	 * Quote an input string by the quoting rules.
+	 * <p>
+	 * If the input string does not require any quoting, the same String
+	 * reference is returned to the caller.
+	 * <p>
+	 * Otherwise a quoted string is returned, including the opening and closing
+	 * quotation marks at the start and end of the string. If the style does not
+	 * permit raw Unicode characters then the string will first be encoded in
+	 * UTF-8, with unprintable sequences possibly escaped by the rules.
+	 * 
+	 * @param in
+	 *            any non-null Unicode string.
+	 * @return a quoted string. See above for details.
+	 */
+	public abstract String quote(String in);
+
+	/**
+	 * Clean a previously quoted input, decoding the result via UTF-8.
+	 * <p>
+	 * This method must match quote such that:
+	 * 
+	 * <pre>
+	 * a.equals(dequote(quote(a)));
+	 * </pre>
+	 * 
+	 * is true for any <code>a</code>.
+	 * 
+	 * @param in
+	 *            a Unicode string to remove quoting from.
+	 * @return the cleaned string.
+	 * @see #dequote(byte[], int, int)
+	 */
+	public String dequote(final String in) {
+		final byte[] b = Constants.encode(in);
+		return dequote(b, 0, b.length);
+	}
+
+	/**
+	 * Decode a previously quoted input, scanning a UTF-8 encoded buffer.
+	 * <p>
+	 * This method must match quote such that:
+	 * 
+	 * <pre>
+	 * a.equals(dequote(Constants.encode(quote(a))));
+	 * </pre>
+	 * 
+	 * is true for any <code>a</code>.
+	 * <p>
+	 * This method removes any opening/closing quotation marks added by
+	 * {@link #quote(String)}.
+	 * 
+	 * @param in
+	 *            the input buffer to parse.
+	 * @param offset
+	 *            first position within <code>in</code> to scan.
+	 * @param end
+	 *            one position past the last byte in <code>in</code> to scan.
+	 * @return the cleaned string.
+	 */
+	public abstract String dequote(byte[] in, int offset, int end);
+
+	/** Quoting style that obeys the rules Git applies to file names */
+	public static final class GitPathStyle extends QuotedString {
+		private static final byte[] quote;
+		static {
+			quote = new byte[128];
+			Arrays.fill(quote, (byte) -1);
+
+			for (int i = '0'; i <= '9'; i++)
+				quote[i] = 0;
+			for (int i = 'a'; i <= 'z'; i++)
+				quote[i] = 0;
+			for (int i = 'A'; i <= 'Z'; i++)
+				quote[i] = 0;
+			quote[' '] = 0;
+			quote['+'] = 0;
+			quote[','] = 0;
+			quote['-'] = 0;
+			quote['.'] = 0;
+			quote['/'] = 0;
+			quote['='] = 0;
+			quote['_'] = 0;
+			quote['^'] = 0;
+
+			quote['\u0007'] = 'a';
+			quote['\b'] = 'b';
+			quote['\f'] = 'f';
+			quote['\n'] = 'n';
+			quote['\r'] = 'r';
+			quote['\t'] = 't';
+			quote['\u000B'] = 'v';
+			quote['\\'] = '\\';
+			quote['"'] = '"';
+		}
+
+		@Override
+		public String quote(final String instr) {
+			if (instr.length() == 0)
+				return "\"\"";
+			boolean reuse = true;
+			final byte[] in = Constants.encode(instr);
+			final StringBuilder r = new StringBuilder(2 + in.length);
+			r.append('"');
+			for (int i = 0; i < in.length; i++) {
+				final int c = in[i] & 0xff;
+				if (c < quote.length) {
+					final byte style = quote[c];
+					if (style == 0) {
+						r.append((char) c);
+						continue;
+					}
+					if (style > 0) {
+						reuse = false;
+						r.append('\\');
+						r.append((char) style);
+						continue;
+					}
+				}
+
+				reuse = false;
+				r.append('\\');
+				r.append((char) (((c >> 6) & 03) + '0'));
+				r.append((char) (((c >> 3) & 07) + '0'));
+				r.append((char) (((c >> 0) & 07) + '0'));
+			}
+			if (reuse)
+				return instr;
+			r.append('"');
+			return r.toString();
+		}
+
+		@Override
+		public String dequote(final byte[] in, final int inPtr, final int inEnd) {
+			if (2 <= inEnd - inPtr && in[inPtr] == '"' && in[inEnd - 1] == '"')
+				return dq(in, inPtr + 1, inEnd - 1);
+			return RawParseUtils.decode(Constants.CHARSET, in, inPtr, inEnd);
+		}
+
+		private static String dq(final byte[] in, int inPtr, final int inEnd) {
+			final byte[] r = new byte[inEnd - inPtr];
+			int rPtr = 0;
+			while (inPtr < inEnd) {
+				final byte b = in[inPtr++];
+				if (b != '\\') {
+					r[rPtr++] = b;
+					continue;
+				}
+
+				if (inPtr == inEnd) {
+					// Lone trailing backslash. Treat it as a literal.
+					//
+					r[rPtr++] = '\\';
+					break;
+				}
+
+				switch (in[inPtr++]) {
+				case 'a':
+					r[rPtr++] = 0x07 /* \a = BEL */;
+					continue;
+				case 'b':
+					r[rPtr++] = '\b';
+					continue;
+				case 'f':
+					r[rPtr++] = '\f';
+					continue;
+				case 'n':
+					r[rPtr++] = '\n';
+					continue;
+				case 'r':
+					r[rPtr++] = '\r';
+					continue;
+				case 't':
+					r[rPtr++] = '\t';
+					continue;
+				case 'v':
+					r[rPtr++] = 0x0B/* \v = VT */;
+					continue;
+
+				case '\\':
+				case '"':
+					r[rPtr++] = in[inPtr - 1];
+					continue;
+
+				case '0':
+				case '1':
+				case '2':
+				case '3': {
+					int cp = in[inPtr - 1] - '0';
+					for (int n = 1; n < 3 && inPtr < inEnd; n++) {
+						final byte c = in[inPtr];
+						if ('0' <= c && c <= '7') {
+							cp <<= 3;
+							cp |= c - '0';
+							inPtr++;
+						} else {
+							break;
+						}
+					}
+					r[rPtr++] = (byte) cp;
+					continue;
+				}
+
+				default:
+					// Any other code is taken literally.
+					//
+					r[rPtr++] = '\\';
+					r[rPtr++] = in[inPtr - 1];
+					continue;
+				}
+			}
+
+			return RawParseUtils.decode(Constants.CHARSET, r, 0, rPtr);
+		}
+
+		private GitPathStyle() {
+			// Singleton
+		}
+	}
+}
-- 
1.6.1.rc2.299.gead4c


-- 
Shawn.


* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: Boyd Stephen Smith Jr. @ 2008-12-11  1:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: R. Tyler Ballance, Johannes Sixt, Junio C Hamano, git
In-Reply-To: <alpine.LFD.2.00.0812101523570.3340@localhost.localdomain>


On Wednesday 2008 December 10 17:40:28 Linus Torvalds wrote:
>On Wed, 10 Dec 2008, R. Tyler Ballance wrote:
>Anyway, that's a really annoying problem, and it's a bug in git.
>
>That stupid fsck commit walker walks the parents recursively.
>
>And judging by the fact that gdb also SIGSEGV's for you when
>doing the backtrace, it looks like the gdb backtrace tracer is _also_
>recursive, and _also_ hits the same issue ;)
>
>So you have definitely found a real bug.
>
>But we should definitely fix this braindamage in fsck. Rather than
>recursively walk the commits, we should add them to a commit list and just
>walk the list iteratively.

Suppose I fixed this tonight.  Would you need anything other than a patch 
(series) from me?  (E.g. copyright assignment or something else legal [vs. 
technical])
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss03@volumehost.net                      ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.org/                      \_/     


^ permalink raw reply

* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: Shawn O. Pearce @ 2008-12-11  1:16 UTC (permalink / raw)
  To: Boyd Stephen Smith Jr.
  Cc: Linus Torvalds, R. Tyler Ballance, Johannes Sixt, Junio C Hamano,
	git
In-Reply-To: <200812101903.58980.bss03@volumehost.net>

"Boyd Stephen Smith Jr." <bss03@volumehost.net> wrote:
> 
> Suppose I fixed this tonight.  Would you need anything other than a patch 
> (series) from me?  (E.g. copyright assignment or something else legal [vs. 
> technical])

No, just consent under the "Developer's Certificate of Origin 1.1"
in SubmittingPatches.

-- 
Shawn.

^ permalink raw reply

* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: R. Tyler Ballance @ 2008-12-11  1:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <alpine.LFD.2.00.0812101636351.3340@localhost.localdomain>

On Wed, 2008-12-10 at 16:45 -0800, Linus Torvalds wrote:
> 
> On Wed, 10 Dec 2008, R. Tyler Ballance wrote:
> >
> > The stack size is 8M as you assumed, I'm curious as to how the kernel
> > handles a process that exceeds the ulimit(2) stacksize. I know from our
> > experience with this repository that when Git runs up against the
> > address space (ulimit -v) that an ENOMEM or something similar is
> > returned. Is there an E_NOSTACK? :) (figured I'd ask, given your
> > apparent knowledge on the subject ;))
> 
> Since stack expansion doesn't involve any system calls, and since there is 
> no way to recover from it anyway, the kernel has no choice: it just sends 
> a SIGSEGV.
> 
> An application that wants to _can_ handle this case by installing a signal 
> handler, but since signal handling needs some stack-space too, a regular 
> "sigaction(SIGSEGV..)" isn't sufficient. You also need to set up a 
> separate signal stack ..

Interesting, thanks for the explanation :)
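
The setup Linus describes, a SIGSEGV handler that runs on its own signal
stack, might look roughly like the sketch below.  This is only an
illustration and not code from git: install_segv_handler() and on_segv()
are names made up for the example, and the 64 KiB alternate stack size is
an arbitrary assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler; it runs on the alternate stack set up below. */
static void on_segv(int signo)
{
	static const char msg[] = "fatal: SIGSEGV (stack overflow?)\n";

	(void) signo;
	/* Only async-signal-safe calls may be used in a signal handler. */
	if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
		/* nothing useful can be done about a failed write here */
	}
	_exit(128);
}

/* Returns 0 on success, -1 on failure (errno is set by the failing call). */
int install_segv_handler(void)
{
	static char altstack[64 * 1024];	/* arbitrary size for the sketch */
	stack_t ss;
	struct sigaction sa;

	assert((long) sizeof(altstack) >= (long) MINSIGSTKSZ);

	/* Register the separate stack that the handler will run on. */
	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = altstack;
	ss.ss_size = sizeof(altstack);
	ss.ss_flags = 0;
	if (sigaltstack(&ss, NULL) < 0)
		return -1;

	/* SA_ONSTACK asks for delivery on the alternate stack. */
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sa.sa_flags = SA_ONSTACK;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

Without SA_ONSTACK the handler would be started on the already exhausted
stack, which is why a plain sigaction() call is not enough.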

> Can you try with a bigger stack? Just do
> 
> 	ulimit -s 16384

Looks like that'll do it :) Transcript below. I'll lower the limit with
a build with Boyd's impending patch, though I assume you can probably
recreate this with a stacksize that's less than 2x your commit count. 

> 
> and then re-try the fsck. Just to verify that this is it. If nothing else, 
> it will at least give you a working fsck, even if it's obviously not the 
> "correct" solution.

tyler@ccnet:~/source/slide/brian_main> gdb git
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
(gdb) run fsck --full
Starting program: /usr/local/bin/git fsck --full
error: refs/remotes/origin/master-team-test does not point to a valid object!
error: refs/remotes/origin/wip-test does not point to a valid object!
error: refs/tags/cooltag does not point to a valid object!
dangling commit 743db07961c5076511a6d04664536863da91920c
dangling commit 525660ad1268b208d440467cd3c083aa2375ee8f
dangling commit 6587b04ff81aaa43721f32f0e443bb3b0ef2be78
dangling commit 0498400f510c0bf3dc17d533d356e46ce19f0f6b
dangling commit 61af71c12f7e608d0e68b52c4d118d7fe4be9690
dangling commit aae5318412d1ca51912c71dca8f181d605928cfc
dangling commit df6522a65a3e40f50da695303cd146852cb13c3d
dangling commit fe6643635b89700f384f0d28acd247a411d52e1a
dangling commit 7e81536bfc7fb12e1f4576cb64f55e46e7e8042a
dangling commit 92ab13d2779f7a7d46736ce041e0d5a5cde16dfb
dangling commit 00b8a3ea6d294c43140f9277bf48bbe734ded10f
dangling commit 3883d47c6dffb6163989d7c54784edb08c5a8e42
dangling commit e39fb41aa5b7ce327d64938a241d055524d0425b
dangling blob 19ccb407e4f7629880e484d08bcfd805157820ee
dangling commit a135f5ae5c4cf5e3d87e63bb102c1f59c9bf2d98
dangling blob 2fa995576ce9cb7f04a4d302d0defb24468a78da
dangling commit 06d125abdf5dd002664a3b39f372713049495db7
dangling commit 26d50600da2954a71f1e985a24497be6f9ccd9bb
dangling commit 53da563adeefee480e2230bc01fedb703185e659
dangling commit 9c06e7fefb0bdfcf096439549e7f9bba4c1b5f1e
dangling commit 8d4a571f179f04a243367615e6e04a9d7437de8a
dangling commit 734b47a3329618deeb556150e161e040bc055e5e
dangling commit 038e08581164006168b38ae3b3632592ff243346
dangling commit 92f3286443b737fb2787a157479eff93b4ec1949
dangling commit 6700599d0bd0bb20b1eb611e377a9f9628272f93
dangling commit b668393e07e4c0b3cff47484084c6dad0fc6c67c
dangling commit 9777594bd3e5e9e66b22827266ee7c0d672e63d8
dangling commit a84a1a40bfebaced5be4160a37a754841ec6839d
dangling commit 41c29a41daa556b073be46401148b71864122f10
dangling commit e4caeabd7e0bdc28bacc14f5fc3f9b7f00678e9f
dangling commit 33ce1af2009bd9ccff27950af1e4faead0dcbaa9
dangling commit 2ded0a58779e02e7e07c861e541b5d75911b9ef6
dangling commit 2148fb15b79c3bab79859e80bba35ff8e9343e4d
dangling commit cfc3fb2e13a3f7b5e53ce77db26faa4badb42c06
dangling commit e3f8bbd1a0993f080355e297d4204bfd5a079d4c
dangling commit ed11cc08822d005d6f70ea9c059ee1b1ee28b5cb
dangling commit f61e3c6094df2ca7bd421853dad108b6cf0a6be7
dangling commit a730acaa6454e76bf033b3962d13b64fb0b03ca0
dangling commit 25689c64420b7e062931045919e452afa11940bc
dangling commit 8eda5c081ddf1ba5c926f47d8bd1b3c9643d8adf
dangling commit 76dbbca6603e2a630c2cac8b65ed5ec4c9f45abd
dangling commit 41e93c5564f2e06b61baf67d34da8774d84f463d
dangling commit f2038d93f67b95034c97a5895457062a6b0c96c4
dangling commit f54a0d30f3e62234941c80487b9dcbfaa10927ad
dangling commit 02a51d79b8ab6e7f396e8a0ee5f8768bf538d112
dangling commit 80a71dd3cd4b5a301931d44bee5ef4584fa1f2e9
dangling commit 2c05de9b6db8c9f392a0ee90b796efaf862dbcfe
dangling commit b465eebed0cf5538c124393df6b0cb35f98f7d3a
dangling commit 02246f3eb943a5b0868d386e39ed5719ab0d2ca9
dangling commit e6b97f34bc6f27f4ad48041b1eb3a88e18b87f18
dangling commit 7bc80fb7f429219310e5671f7191a4d6476a4bd9

Program exited normally.
(gdb) 

-- 
-R. Tyler Ballance
Slide, Inc.

^ permalink raw reply

* Re: [RFC/PATCH 4/3] gitweb: Incremental blame (proof of concept)
From: Jakub Narebski @ 2008-12-11  1:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Luben Tuikov, Nanako Shiraishi, Petr Baudis,
	Fredrik Kuivinen
In-Reply-To: <7v3agvy1v3.fsf@gitster.siamese.dyndns.org>

On Thu, 11 Dec 2008, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > NOTE: This patch is RFC proof of concept patch!: it should be split
> > onto many smaller patches for easy review (and bug finding) in version
> > meant to be applied.
> 
> Hmm, the comments an RFC requests for would certainly be based on reviews
> of the patch in question, so if the patch is known to be unsuitable for
> reviewing, what would that tell us, I wonder ;-)?

Well, you can apply the patch and test how it works, for example whether
the JavaScript code works in other browsers that have JavaScript and DOM
support (Firefox, IE, Opera, Safari, Google Chrome)... Or what
features or what interface one would like to have...

> Among the 700 lines added/deleted, 400 lines are from a single new file,
> so what may benefit from splitting would be the changes to gitweb.perl but
> it does not look so bad (I haven't really read the patch, though).

There are a few features which could be split into separate commits:
 * there are a few improvements to gitweb.css, independent of the
   incremental blame view, like td.warning -> .warning
 * writing how long it took gitweb to generate the page should be made
   into a separate commit, probably made optional, use better HTML
   style, and have some fallback if there is no Time::HiRes
 * the progress report could be made into a separate commit; I needed it
   to debug the code, to check that it progresses nicely, but it is not
   strictly required (though it is nice to have a visual indicator of
   progress)
 * 3-coloring of blamed lines while adding blame info was added for
   the fun of it, and should probably be in a separate commit
 * adding author initials a la "git gui blame", while nice, could also
   be put in a separate commit, probably adding this feature also to
   the ordinary 'blame' output

[...] 
> > Differences between 'blame' and 'blame_incremental' output:
> 
> Hmm, are these by design in the sense that "when people are getting
> incremental blame output, the normal blame output format is unsuitable for
> such and such reasons and that is why there have to be these differences",
> or "the code happens to produce slightly different results because it is
> implemented differently; the differences are listed here as due
> diligence"?

Actually it is both. Some of the differences are _currently_ not possible
to resolve (parent commit 'lineno' links, split groups of lines blamed
by the same commit), some are coded differently (title attribute for
sha1, rowspan="1", author initials feature), and some are impossible
in incremental blame, at least during generation (zebra table), or do
not make sense in the 'blame' view (progress indicators).

> > P.P.S. What is the stance for copyright assessments in the files
> > for git code, like the ones in gitweb/gitweb.perl and gitweb/blame.js?
> 
> There is no copyright assignment.  Everybody retains the own copyright on
> their own work.

Errr... I'm sorry, I haven't made myself clear. I wanted to ask what
the best practice is for copyright statement lines like

  // Copyright (C) 2007, Fredrik Kuivinen <frekui@gmail.com>

and other results of "git grep Copyright": should one be added for the
initial author, for the main authors... I guess not for all authors.

> > P.P.P.S. Should I use Signed-off-by from Pasky and Fredrik if I based
> > my code on theirs, and if they all signed their patches?
> 
> I think that is in line with what Certificate of Origin asks you to do.
 
I was a bit confused because Petr Baudis in his patch used Cc: and not
Signed-off-by: for Fredrik Kuivinen...
-- 
Jakub Narebski
Poland

^ permalink raw reply

* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: Nicolas Pitre @ 2008-12-11  1:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: R. Tyler Ballance, Johannes Sixt, Junio C Hamano, git
In-Reply-To: <alpine.LFD.2.00.0812101523570.3340@localhost.localdomain>

On Wed, 10 Dec 2008, Linus Torvalds wrote:

> But we should definitely fix this braindamage in fsck. Rather than 
> recursively walk the commits, we should add them to a commit list and just 
> walk the list iteratively.

What about:

	http://marc.info/?l=git&m=122889563424786&w=2

Nicolas

^ permalink raw reply

* Re: epic fsck SIGSEGV!
From: Junio C Hamano @ 2008-12-11  1:52 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, R. Tyler Ballance, Johannes Sixt, git
In-Reply-To: <alpine.LFD.2.00.0812102031440.14328@xanadu.home>

Nicolas Pitre <nico@cam.org> writes:

> On Wed, 10 Dec 2008, Linus Torvalds wrote:
>
>> But we should definitely fix this braindamage in fsck. Rather than 
>> recursively walk the commits, we should add them to a commit list and just 
>> walk the list iteratively.
>
> What about:
>
> 	http://marc.info/?l=git&m=122889563424786&w=2

I have to dig that out of the mail archive (quoting message-id or $gmane
article number would have been easier for me), but should I take it as an
Ack from you?

^ permalink raw reply

* Re: epic fsck SIGSEGV!
From: Nicolas Pitre @ 2008-12-11  2:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, R. Tyler Ballance, Johannes Sixt, git
In-Reply-To: <7vskovwk99.fsf@gitster.siamese.dyndns.org>

On Wed, 10 Dec 2008, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > On Wed, 10 Dec 2008, Linus Torvalds wrote:
> >
> >> But we should definitely fix this braindamage in fsck. Rather than 
> >> recursively walk the commits, we should add them to a commit list and just 
> >> walk the list iteratively.
> >
> > What about:
> >
> > 	http://marc.info/?l=git&m=122889563424786&w=2
> 
> I have to dig that out of the mail archive (quoting message-id or $gmane
> article number would have been easier for me),

Message-ID: <20081210075338.GA7776@auto.tuwien.ac.at>

> but should I take it as an Ack from you?

I was involved in that thread initially, until bisection showed commit 
271b8d25b25e49b367087440e093e755e5f35aa9 as the culprit.  This might be 
the same issue but I have not experienced it myself.

So I'm merely connecting email threads here.

Nicolas

^ permalink raw reply

* Re: git fsck segmentation fault
From: Junio C Hamano @ 2008-12-11  2:33 UTC (permalink / raw)
  To: Martin Koegler; +Cc: Nicolas Pitre, Simon Hausmann, Git Mailing List
In-Reply-To: <20081210075338.GA7776@auto.tuwien.ac.at>

mkoegler@auto.tuwien.ac.at (Martin Koegler) writes:

> Maybe something like this could help:

> From 32be177cbb0825fc019200b172f3d79117b28140 Mon Sep 17 00:00:00 2001
> From: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> Date: Wed, 10 Dec 2008 08:42:08 +0100
> Subject: [PATCH] fsck: use fewer stack
>
> This patch moves the state while traversing the tree
> from the stack to the heap.

Hmm, after the change:

	* mark_object() marks the object as reachable, and pushes the
	  objects to the objectstack;

	* mark_object_reachable() marks the object using mark_object(),
          and repeatedly calls mark_child_object() until the objectstack
          is fully drained;

	* mark_child_object() inspects the object taken from the
          objectstack, calls fsck_walk() on it, with mark_object as the
          callback;

	  * fsck_walk() calls the callback function (i.e. mark_object) on
            the object given, and the objects immediately reachable from
            it;

            * mark_object() does not recurse, so these immediately
              reachable objects are left in the objectstack, without a
              deep recursion.

That seems to be what is going on, and this should be a good fix.
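
The scheme described above, where the marking callback only queues objects
and a separate loop drains the queue, can be sketched in C as follows.
This is a simplified illustration and not the patch under review:
struct node, struct work_stack, mark() and mark_reachable() are names
invented here, and git's real objects and the fsck_walk() callback are
more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object; git's real struct object looks different. */
struct node {
	int reachable;
	size_t nr_children;
	struct node **children;
};

/* Growable array on the heap, used as an explicit work stack. */
struct work_stack {
	struct node **items;
	size_t nr, alloc;
};

static void push(struct work_stack *ws, struct node *n)
{
	assert(ws && n);
	if (ws->nr == ws->alloc) {
		ws->alloc = ws->alloc ? 2 * ws->alloc : 16;
		ws->items = realloc(ws->items, ws->alloc * sizeof(*ws->items));
		if (!ws->items)
			abort();	/* out of memory */
	}
	ws->items[ws->nr++] = n;
}

/* Mark n as reachable and queue it for later; this never recurses. */
static void mark(struct work_stack *ws, struct node *n)
{
	if (!n || n->reachable)
		return;
	n->reachable = 1;
	push(ws, n);
}

/*
 * Mark everything reachable from root and return how many objects were
 * newly marked.  The C stack depth stays constant no matter how deep
 * the object graph is; only the heap-allocated work stack grows.
 */
size_t mark_reachable(struct node *root)
{
	struct work_stack ws = { NULL, 0, 0 };
	size_t marked = 0;

	mark(&ws, root);
	while (ws.nr) {
		struct node *n = ws.items[--ws.nr];
		size_t i;

		marked++;
		for (i = 0; i < n->nr_children; i++)
			mark(&ws, n->children[i]);
	}
	free(ws.items);
	return marked;
}
```

With a linear history of a million commits, the recursive version needs a
million nested stack frames, while this loop needs one frame plus a
work stack that holds at most one entry at a time for such a chain.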

A similar change would be needed for other callers of fsck_walk(), no?
There seems to be one in builtin-unpack-objects.c (check_object calls
fsck_walk with itself as the callback).

Another caller is in index-pack.c (sha1_object() calls fsck_walk with
mark_link as the callback), but I do not think it would recurse for the
depth of the history, so we are safe there.

I initially expected that the fix would be to have this "userspace
work queue" (i.e. your objectstack) maintained on the
fsck.c:fsck_walk() side (perhaps with the actual queue passed as an extra
parameter for reentrancy), rather than making the callee not recurse, though.

^ permalink raw reply

* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: Linus Torvalds @ 2008-12-11  3:28 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: R. Tyler Ballance, Johannes Sixt, Junio C Hamano, git
In-Reply-To: <alpine.LFD.2.00.0812102031440.14328@xanadu.home>

On Wed, 10 Dec 2008, Nicolas Pitre wrote:

> On Wed, 10 Dec 2008, Linus Torvalds wrote:
> 
> > But we should definitely fix this braindamage in fsck. Rather than 
> > recursively walk the commits, we should add them to a commit list and just 
> > walk the list iteratively.
> 
> What about:
> 
> 	http://marc.info/?l=git&m=122889563424786&w=2

Not very pretty. The basic notion is ok, but wouldn't it be nicer to at 
least use a "struct object_array" instead?

Let me try to cook something up.

		Linus

^ permalink raw reply

* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: Linus Torvalds @ 2008-12-11  3:44 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: R. Tyler Ballance, Johannes Sixt, Junio C Hamano,
	Git Mailing List
In-Reply-To: <alpine.LFD.2.00.0812101854230.3340@localhost.localdomain>



On Wed, 10 Dec 2008, Linus Torvalds wrote:
> 
> On Wed, 10 Dec 2008, Nicolas Pitre wrote:
> 
> > On Wed, 10 Dec 2008, Linus Torvalds wrote:
> > 
> > > But we should definitely fix this braindamage in fsck. Rather than 
> > > recursively walk the commits, we should add them to a commit list and just 
> > > walk the list iteratively.
> > 
> > What about:
> > 
> > 	http://marc.info/?l=git&m=122889563424786&w=2
> 
> Not very pretty. The basic notion is ok, but wouldn't it be nicer to at 
> least use a "struct object_array" instead?
> 
> Let me try to cook something up.

I dunno. I like this patch better. It's a bit larger. I think it's a bit 
more clearly separated (ie a "mark_object_reachable()" _literally_ just 
puts the object on a list, and the whole traversal is a whole separate 
phase), but I guess it's a matter of taste.

It has gotten no real testing. Caveat emptor. And I didn't even bother to 
check that it can run with less stack or that it makes any other 
difference.

			Linus

---
 builtin-fsck.c |   38 +++++++++++++++++++++++++++++++-------
 1 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index afded5e..297b2c4 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -64,11 +64,11 @@ static int fsck_error_func(struct object *obj, int type, const char *err, ...)
 	return (type == FSCK_WARN) ? 0 : 1;
 }
 
+static struct object_array pending;
+
 static int mark_object(struct object *obj, int type, void *data)
 {
-	struct tree *tree = NULL;
 	struct object *parent = data;
-	int result;
 
 	if (!obj) {
 		printf("broken link from %7s %s\n",
@@ -96,6 +96,20 @@ static int mark_object(struct object *obj, int type, void *data)
 		return 1;
 	}
 
+	add_object_array(obj, (void *) parent, &pending);
+	return 0;
+}
+
+static void mark_object_reachable(struct object *obj)
+{
+	mark_object(obj, OBJ_ANY, 0);
+}
+
+static int traverse_one_object(struct object *obj, struct object *parent)
+{
+	int result;
+	struct tree *tree = NULL;
+
 	if (obj->type == OBJ_TREE) {
 		obj->parsed = 0;
 		tree = (struct tree *)obj;
@@ -107,15 +121,22 @@ static int mark_object(struct object *obj, int type, void *data)
 		free(tree->buffer);
 		tree->buffer = NULL;
 	}
-	if (result < 0)
-		result = 1;
-
 	return result;
 }
 
-static void mark_object_reachable(struct object *obj)
+static int traverse_reachable(void)
 {
-	mark_object(obj, OBJ_ANY, 0);
+	int result = 0;
+	while (pending.nr) {
+		struct object_array_entry *entry;
+		struct object *obj, *parent;
+
+		entry = pending.objects + --pending.nr;
+		obj = entry->item;
+		parent = (struct object *) entry->name;
+		result |= traverse_one_object(obj, parent);
+	}
+	return !!result;
 }
 
 static int mark_used(struct object *obj, int type, void *data)
@@ -237,6 +258,9 @@ static void check_connectivity(void)
 {
 	int i, max;
 
+	/* Traverse the pending reachable objects */
+	traverse_reachable();
+
 	/* Look up all the requirements, warn about missing objects.. */
 	max = get_max_object_index();
 	if (verbose)

^ permalink raw reply related

* Re: epic fsck SIGSEGV! (was Recovering from epic fail (deleted .git/objects/pack))
From: Boyd Stephen Smith Jr. @ 2008-12-11  4:00 UTC (permalink / raw)
  To: git
  Cc: Linus Torvalds, Nicolas Pitre, R. Tyler Ballance, Johannes Sixt,
	Junio C Hamano
In-Reply-To: <alpine.LFD.2.00.0812101854230.3340@localhost.localdomain>

On Wednesday 2008 December 10 21:28:15 Linus Torvalds wrote:
>On Wed, 10 Dec 2008, Nicolas Pitre wrote:
>> 	http://marc.info/?l=git&m=122889563424786&w=2
>
>Not very pretty. The basic notion is ok, but wouldn't it be nicer to at
>least use a "struct object_array" instead?

As Junio pointed out, we may want to make similar changes to other callers
of fsck_walk that pass themselves as the callback.  It might even make sense
to have an fsck_walk_full that manages the object_array itself.

While we are making changes, there appears to be a copy and paste error from
line 74 to line 76 -- the second "broken link from" should probably be
"              to".

I'd have already submitted a patch for that, but I can't figure out how to 
tell kmail to not do quoted-printable.  :(  [And, if I can beat this client 
into submission I will.]

Linus, sorry about the reply with no snipping or original content.  I 
mis-clicked. :(
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss03@volumehost.net                      ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.org/                      \_/     

^ permalink raw reply

* Re: [PATCH 2/3 (edit v2)] gitweb: Cache $parent_commit info in git_blame()
From: Luben Tuikov @ 2008-12-11  4:08 UTC (permalink / raw)
  To: Junio C Hamano, Jakub Narebski; +Cc: Nanako Shiraishi, git
In-Reply-To: <200812110133.33124.jnareb@gmail.com>


--- On Wed, 12/10/08, Jakub Narebski <jnareb@gmail.com> wrote:
> Acked-by: Luben Tuikov <ltuikov@yahoo.com>
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>

I've always seen "Acked-by:" follow "Signed-off-by:".  Junio, has this
changed?

   Luben

^ permalink raw reply

* Re: [PATCH 2/3 (edit v2)] gitweb: Cache $parent_commit info in git_blame()
From: Junio C Hamano @ 2008-12-11  4:18 UTC (permalink / raw)
  To: ltuikov; +Cc: Jakub Narebski, Nanako Shiraishi, git
In-Reply-To: <506479.53667.qm@web31812.mail.mud.yahoo.com>

Luben Tuikov <ltuikov@yahoo.com> writes:

> --- On Wed, 12/10/08, Jakub Narebski <jnareb@gmail.com> wrote:
>> Acked-by: Luben Tuikov <ltuikov@yahoo.com>
>> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
>
> I've always seen "Acked-by:" follow "Signed-off-by:".  Junio, has this
> changed?

I think the order is supposed to show the order in which things happened.
Jakub signs off the patch, you Ack, and I see the patch and append my
sign-off.

You saw the exact same patch text, said that it looked OK to you, and Jakub
updated the log message to present the change better and signed off the
whole thing again.  You could say that there should be another, original,
sign off by Jakub before your Ack, but I do not think it adds anything of
value.

In any case, the change will be queued to 'pu'.  It is great that it is a
trivial change that gives us a great performance boost, and I wish all our
patches were like that ;-).

^ permalink raw reply

* [JGIT PATCH 0/5] Patch parsing API
From: Shawn O. Pearce @ 2008-12-11  4:58 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This is an API to parse a Git style patch file and extract the
critical metadata from the header lines, including the hunk headers
and what lines they correspond to in the pre and post image files.

It requires the two other series I already sent out today for
QuotedString and AbbreviatedObjectId.

There's TODO markers left where we still need to insert code to
create some sort of warning object, and then hang the warnings off
the Patch class.  Given the size of the code I'm inclined to do that
as yet an additional patch, rather than squash it into this series.

My short-term roadmap related to this part of JGit:

 * Compute and collect warnings from malformed git-style patches
 * Correctly parse "git diff --cc" style output
 * Get it into egit.git

I'm stopping development once I have the diff --cc output parsing
correctly.  My rationale is right now I need the patch metadata
parsing in Gerrit 2, so that's what I'm teaching JGit to do.  Maybe
later in the month or early next I'll add patch application support,
because I also want that in Gerrit 2.  Patch application is not
currently a blocking item for me; but reading the patch metadata is.

Traditional patch support is really stubbed out too; there's a very
small subset of traditional (non-git) style patches this code can
scan the metadata from, but no tests to verify it.  Gerrit 2 gets
all of its data from a "git diff" process, so I only need support
for git diffs right now.  Yes, I'd like to add traditional patch
support too, but it won't be until later in 2009 that I would even
think about working on that myself.

Shawn O. Pearce (5):
  Add toByteArray() to TemporaryBuffer
  Add copy(InputStream) to TemporaryBuffer
  Define FileHeader to parse the header block of a git diff
  Define Patch to parse a sequence of patch FileHeaders
  Add HunkHeader to represent a single hunk of a file within a patch

 .../tst/org/spearce/jgit/patch/FileHeaderTest.java |  395 ++++++++++++++++
 .../tst/org/spearce/jgit/patch/PatchTest.java      |  155 +++++++
 .../patch/testParse_ConfigCaseInsensitive.patch    |   67 +++
 .../src/org/spearce/jgit/patch/FileHeader.java     |  480 ++++++++++++++++++++
 .../src/org/spearce/jgit/patch/HunkHeader.java     |  185 ++++++++
 .../src/org/spearce/jgit/patch/Patch.java          |  267 +++++++++++
 .../src/org/spearce/jgit/util/TemporaryBuffer.java |   51 ++
 7 files changed, 1600 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/patch/FileHeaderTest.java
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/patch/PatchTest.java
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/patch/testParse_ConfigCaseInsensitive.patch
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/patch/Patch.java

^ permalink raw reply

* [JGIT PATCH 1/5] Add toByteArray() to TemporaryBuffer
From: Shawn O. Pearce @ 2008-12-11  4:58 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git
In-Reply-To: <1228971522-28764-1-git-send-email-spearce@spearce.org>

It can be more useful to convert a buffered output stream into
a single byte array, without paying the penalties associated
with ByteArrayOutputStream to do the same action.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/util/TemporaryBuffer.java |   34 ++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
index d597c38..b1ffd6e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
@@ -182,6 +182,40 @@ public long length() {
 	}
 
 	/**
+	 * Convert this buffer's contents into a contiguous byte array.
+	 * <p>
+	 * The buffer is only complete after {@link #close()} has been invoked.
+	 * 
+	 * @return the complete byte array; length matches {@link #length()}.
+	 * @throws IOException
+	 *             an error occurred reading from a local temporary file
+	 * @throws OutOfMemoryError
+	 *             the buffer cannot fit in memory
+	 */
+	public byte[] toByteArray() throws IOException {
+		final long len = length();
+		if (Integer.MAX_VALUE < len)
+			throw new OutOfMemoryError("Length exceeds maximum array size");
+
+		final byte[] out = new byte[(int) len];
+		if (blocks != null) {
+			int outPtr = 0;
+			for (final Block b : blocks) {
+				System.arraycopy(b.buffer, 0, out, outPtr, b.count);
+				outPtr += b.count;
+			}
+		} else {
+			final FileInputStream in = new FileInputStream(onDiskFile);
+			try {
+				NB.readFully(in, out, 0, (int) len);
+			} finally {
+				in.close();
+			}
+		}
+		return out;
+	}
+
+	/**
 	 * Send this buffer to an output stream.
 	 * <p>
 	 * This method may only be invoked after {@link #close()} has completed
-- 
1.6.1.rc2.299.gead4c

^ permalink raw reply related

* [JGIT PATCH 2/5] Add copy(InputStream) to TemporaryBuffer
From: Shawn O. Pearce @ 2008-12-11  4:58 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git
In-Reply-To: <1228971522-28764-2-git-send-email-spearce@spearce.org>

In some places we may find ourselves with an InputStream we
need to copy into a TemporaryBuffer, so we can flatten out the
entire stream to a single byte[].  Putting the copy loop here
is more useful than duplicating it in application level code.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/util/TemporaryBuffer.java |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
index b1ffd6e..8f91246 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
@@ -42,6 +42,7 @@
 import java.io.FileInputStream;
 import java.io.FileOutputStream;
 import java.io.IOException;
+import java.io.InputStream;
 import java.io.OutputStream;
 import java.util.ArrayList;
 
@@ -135,6 +136,22 @@ public void write(final byte[] b, int off, int len) throws IOException {
 			diskOut.write(b, off, len);
 	}
 
+	/**
+	 * Copy all bytes remaining on the input stream into this buffer.
+	 * 
+	 * @param in
+	 *            the stream to read from, until EOF is reached.
+	 * @throws IOException
+	 *             an error occurred reading from the input stream, or while
+	 *             writing to a local temporary file.
+	 */
+	public void copy(final InputStream in) throws IOException {
+		final byte[] b = new byte[2048];
+		int n;
+		while ((n = in.read(b)) > 0)
+			write(b, 0, n);
+	}
+
 	private Block last() {
 		return blocks.get(blocks.size() - 1);
 	}
-- 
1.6.1.rc2.299.gead4c

^ permalink raw reply related

* [JGIT PATCH 3/5] Define FileHeader to parse the header block of a git diff
From: Shawn O. Pearce @ 2008-12-11  4:58 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git
In-Reply-To: <1228971522-28764-3-git-send-email-spearce@spearce.org>

This class parses the top header lines of a git style diff, such as:

  diff --git a/SUBMITTING_PATCHES b/Q
  similarity index 100%
  copy from SUBMITTING_PATCHES
  copy to Q

or:

  diff --git a/Q b/Q
  new file mode 100644
  index 0000000..e4a135e
  --- /dev/null
  +++ b/Q

and makes the information available in an object form.  Unit tests
cover the different styles of headers that are commonly created by
C git, including both rename formats.

The hunk header information is not handled by this class, and it
does not have a public API.  It is my intention to wrap this into
a larger container class that handles multiple FileHeaders at once,
with the base case of course being a single FileHeader describing
a patch that impacts only one file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../tst/org/spearce/jgit/patch/FileHeaderTest.java |  391 ++++++++++++++++++
 .../src/org/spearce/jgit/patch/FileHeader.java     |  430 ++++++++++++++++++++
 2 files changed, 821 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/patch/FileHeaderTest.java
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/patch/FileHeaderTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/patch/FileHeaderTest.java
new file mode 100644
index 0000000..1d87bc0
--- /dev/null
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/patch/FileHeaderTest.java
@@ -0,0 +1,391 @@
+/*
+ * Copyright (C) 2008, Google Inc.
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+package org.spearce.jgit.patch;
+
+import junit.framework.TestCase;
+
+import org.spearce.jgit.lib.Constants;
+import org.spearce.jgit.lib.FileMode;
+import org.spearce.jgit.lib.ObjectId;
+
+public class FileHeaderTest extends TestCase {
+	public void testParseGitFileName_Empty() {
+		assertEquals(-1, data("").parseGitFileName(0));
+	}
+
+	public void testParseGitFileName_NoLF() {
+		assertEquals(-1, data("a/ b/").parseGitFileName(0));
+	}
+
+	public void testParseGitFileName_NoSecondLine() {
+		assertEquals(-1, data("\n").parseGitFileName(0));
+	}
+
+	public void testParseGitFileName_EmptyHeader() {
+		assertEquals(1, data("\n\n").parseGitFileName(0));
+	}
+
+	public void testParseGitFileName_Foo() {
+		final String name = "foo";
+		final FileHeader fh = header(name);
+		assertEquals(gitLine(name).length(), fh.parseGitFileName(0));
+		assertEquals(name, fh.getOldName());
+		assertSame(fh.getOldName(), fh.getNewName());
+	}
+
+	public void testParseGitFileName_FailFooBar() {
+		final FileHeader fh = data("a/foo b/bar\n-");
+		assertTrue(fh.parseGitFileName(0) > 0);
+		assertNull(fh.getOldName());
+		assertNull(fh.getNewName());
+	}
+
+	public void testParseGitFileName_FooSpBar() {
+		final String name = "foo bar";
+		final FileHeader fh = header(name);
+		assertEquals(gitLine(name).length(), fh.parseGitFileName(0));
+		assertEquals(name, fh.getOldName());
+		assertSame(fh.getOldName(), fh.getNewName());
+	}
+
+	public void testParseGitFileName_DqFooTabBar() {
+		final String name = "foo\tbar";
+		final String dqName = "foo\\tbar";
+		final FileHeader fh = dqHeader(dqName);
+		assertEquals(dqGitLine(dqName).length(), fh.parseGitFileName(0));
+		assertEquals(name, fh.getOldName());
+		assertSame(fh.getOldName(), fh.getNewName());
+	}
+
+	public void testParseGitFileName_DqFooSpLfNulBar() {
+		final String name = "foo \n\0bar";
+		final String dqName = "foo \\n\\0bar";
+		final FileHeader fh = dqHeader(dqName);
+		assertEquals(dqGitLine(dqName).length(), fh.parseGitFileName(0));
+		assertEquals(name, fh.getOldName());
+		assertSame(fh.getOldName(), fh.getNewName());
+	}
+
+	public void testParseGitFileName_SrcFooC() {
+		final String name = "src/foo/bar/argh/code.c";
+		final FileHeader fh = header(name);
+		assertEquals(gitLine(name).length(), fh.parseGitFileName(0));
+		assertEquals(name, fh.getOldName());
+		assertSame(fh.getOldName(), fh.getNewName());
+	}
+
+	public void testParseGitFileName_SrcFooCNonStandardPrefix() {
+		final String name = "src/foo/bar/argh/code.c";
+		final String header = "project-v-1.0/" + name + " mydev/" + name + "\n";
+		final FileHeader fh = data(header + "-");
+		assertEquals(header.length(), fh.parseGitFileName(0));
+		assertEquals(name, fh.getOldName());
+		assertSame(fh.getOldName(), fh.getNewName());
+	}
+
+	public void testParseUnicodeName_NewFile() {
+		final FileHeader fh = data("diff --git \"a/\\303\\205ngstr\\303\\266m\" \"b/\\303\\205ngstr\\303\\266m\"\n"
+				+ "new file mode 100644\n"
+				+ "index 0000000..7898192\n"
+				+ "--- /dev/null\n"
+				+ "+++ \"b/\\303\\205ngstr\\303\\266m\"\n"
+				+ "@@ -0,0 +1 @@\n" + "+a\n");
+		assertParse(fh);
+
+		assertEquals("/dev/null", fh.getOldName());
+		assertSame(FileHeader.DEV_NULL, fh.getOldName());
+		assertEquals("\u00c5ngstr\u00f6m", fh.getNewName());
+
+		assertSame(FileHeader.ChangeType.ADD, fh.getChangeType());
+
+		assertNull(fh.getOldMode());
+		assertSame(FileMode.REGULAR_FILE, fh.getNewMode());
+
+		assertEquals("0000000", fh.getOldId().name());
+		assertEquals("7898192", fh.getNewId().name());
+		assertEquals(0, fh.getScore());
+	}
+
+	public void testParseUnicodeName_DeleteFile() {
+		final FileHeader fh = data("diff --git \"a/\\303\\205ngstr\\303\\266m\" \"b/\\303\\205ngstr\\303\\266m\"\n"
+				+ "deleted file mode 100644\n"
+				+ "index 7898192..0000000\n"
+				+ "--- \"a/\\303\\205ngstr\\303\\266m\"\n"
+				+ "+++ /dev/null\n"
+				+ "@@ -1 +0,0 @@\n" + "-a\n");
+		assertParse(fh);
+
+		assertEquals("\u00c5ngstr\u00f6m", fh.getOldName());
+		assertEquals("/dev/null", fh.getNewName());
+		assertSame(FileHeader.DEV_NULL, fh.getNewName());
+
+		assertSame(FileHeader.ChangeType.DELETE, fh.getChangeType());
+
+		assertSame(FileMode.REGULAR_FILE, fh.getOldMode());
+		assertNull(fh.getNewMode());
+
+		assertEquals("7898192", fh.getOldId().name());
+		assertEquals("0000000", fh.getNewId().name());
+		assertEquals(0, fh.getScore());
+	}
+
+	public void testParseModeChange() {
+		final FileHeader fh = data("diff --git a/a b b/a b\n"
+				+ "old mode 100644\n" + "new mode 100755\n");
+		assertParse(fh);
+		assertEquals("a b", fh.getOldName());
+		assertEquals("a b", fh.getNewName());
+
+		assertSame(FileHeader.ChangeType.MODIFY, fh.getChangeType());
+
+		assertNull(fh.getOldId());
+		assertNull(fh.getNewId());
+
+		assertSame(FileMode.REGULAR_FILE, fh.getOldMode());
+		assertSame(FileMode.EXECUTABLE_FILE, fh.getNewMode());
+		assertEquals(0, fh.getScore());
+	}
+
+	public void testParseRename100_NewStyle() {
+		final FileHeader fh = data("diff --git a/a b/ c/\\303\\205ngstr\\303\\266m\n"
+				+ "similarity index 100%\n"
+				+ "rename from a\n"
+				+ "rename to \" c/\\303\\205ngstr\\303\\266m\"\n");
+		int ptr = fh.parseGitFileName(0);
+		assertTrue(ptr > 0);
+		assertNull(fh.getOldName()); // can't parse names on a rename
+		assertNull(fh.getNewName());
+
+		ptr = fh.parseGitHeaders(ptr);
+		assertTrue(ptr > 0);
+
+		assertEquals("a", fh.getOldName());
+		assertEquals(" c/\u00c5ngstr\u00f6m", fh.getNewName());
+
+		assertSame(FileHeader.ChangeType.RENAME, fh.getChangeType());
+
+		assertNull(fh.getOldId());
+		assertNull(fh.getNewId());
+
+		assertNull(fh.getOldMode());
+		assertNull(fh.getNewMode());
+
+		assertEquals(100, fh.getScore());
+	}
+
+	public void testParseRename100_OldStyle() {
+		final FileHeader fh = data("diff --git a/a b/ c/\\303\\205ngstr\\303\\266m\n"
+				+ "similarity index 100%\n"
+				+ "rename old a\n"
+				+ "rename new \" c/\\303\\205ngstr\\303\\266m\"\n");
+		int ptr = fh.parseGitFileName(0);
+		assertTrue(ptr > 0);
+		assertNull(fh.getOldName()); // can't parse names on a rename
+		assertNull(fh.getNewName());
+
+		ptr = fh.parseGitHeaders(ptr);
+		assertTrue(ptr > 0);
+
+		assertEquals("a", fh.getOldName());
+		assertEquals(" c/\u00c5ngstr\u00f6m", fh.getNewName());
+
+		assertSame(FileHeader.ChangeType.RENAME, fh.getChangeType());
+
+		assertNull(fh.getOldId());
+		assertNull(fh.getNewId());
+
+		assertNull(fh.getOldMode());
+		assertNull(fh.getNewMode());
+
+		assertEquals(100, fh.getScore());
+	}
+	public void testParseCopy100() {
+		final FileHeader fh = data("diff --git a/a b/ c/\\303\\205ngstr\\303\\266m\n"
+				+ "similarity index 100%\n"
+				+ "copy from a\n"
+				+ "copy to \" c/\\303\\205ngstr\\303\\266m\"\n");
+		int ptr = fh.parseGitFileName(0);
+		assertTrue(ptr > 0);
+		assertNull(fh.getOldName()); // can't parse names on a copy
+		assertNull(fh.getNewName());
+
+		ptr = fh.parseGitHeaders(ptr);
+		assertTrue(ptr > 0);
+
+		assertEquals("a", fh.getOldName());
+		assertEquals(" c/\u00c5ngstr\u00f6m", fh.getNewName());
+
+		assertSame(FileHeader.ChangeType.COPY, fh.getChangeType());
+
+		assertNull(fh.getOldId());
+		assertNull(fh.getNewId());
+
+		assertNull(fh.getOldMode());
+		assertNull(fh.getNewMode());
+
+		assertEquals(100, fh.getScore());
+	}
+
+	public void testParseFullIndexLine_WithMode() {
+		final String oid = "78981922613b2afb6025042ff6bd878ac1994e85";
+		final String nid = "61780798228d17af2d34fce4cfbdf35556832472";
+		final FileHeader fh = data("diff --git a/a b/a\n" + "index " + oid
+				+ ".." + nid + " 100644\n" + "--- a/a\n" + "+++ b/a\n");
+		assertParse(fh);
+
+		assertEquals("a", fh.getOldName());
+		assertEquals("a", fh.getNewName());
+
+		assertSame(FileMode.REGULAR_FILE, fh.getOldMode());
+		assertSame(FileMode.REGULAR_FILE, fh.getNewMode());
+
+		assertNotNull(fh.getOldId());
+		assertNotNull(fh.getNewId());
+
+		assertTrue(fh.getOldId().isComplete());
+		assertTrue(fh.getNewId().isComplete());
+
+		assertEquals(ObjectId.fromString(oid), fh.getOldId().toObjectId());
+		assertEquals(ObjectId.fromString(nid), fh.getNewId().toObjectId());
+	}
+
+	public void testParseFullIndexLine_NoMode() {
+		final String oid = "78981922613b2afb6025042ff6bd878ac1994e85";
+		final String nid = "61780798228d17af2d34fce4cfbdf35556832472";
+		final FileHeader fh = data("diff --git a/a b/a\n" + "index " + oid
+				+ ".." + nid + "\n" + "--- a/a\n" + "+++ b/a\n");
+		assertParse(fh);
+
+		assertEquals("a", fh.getOldName());
+		assertEquals("a", fh.getNewName());
+
+		assertNull(fh.getOldMode());
+		assertNull(fh.getNewMode());
+
+		assertNotNull(fh.getOldId());
+		assertNotNull(fh.getNewId());
+
+		assertTrue(fh.getOldId().isComplete());
+		assertTrue(fh.getNewId().isComplete());
+
+		assertEquals(ObjectId.fromString(oid), fh.getOldId().toObjectId());
+		assertEquals(ObjectId.fromString(nid), fh.getNewId().toObjectId());
+	}
+
+	public void testParseAbbrIndexLine_WithMode() {
+		final int a = 7;
+		final String oid = "78981922613b2afb6025042ff6bd878ac1994e85";
+		final String nid = "61780798228d17af2d34fce4cfbdf35556832472";
+		final FileHeader fh = data("diff --git a/a b/a\n" + "index "
+				+ oid.substring(0, a - 1) + ".." + nid.substring(0, a - 1)
+				+ " 100644\n" + "--- a/a\n" + "+++ b/a\n");
+		assertParse(fh);
+
+		assertEquals("a", fh.getOldName());
+		assertEquals("a", fh.getNewName());
+
+		assertSame(FileMode.REGULAR_FILE, fh.getOldMode());
+		assertSame(FileMode.REGULAR_FILE, fh.getNewMode());
+
+		assertNotNull(fh.getOldId());
+		assertNotNull(fh.getNewId());
+
+		assertFalse(fh.getOldId().isComplete());
+		assertFalse(fh.getNewId().isComplete());
+
+		assertEquals(oid.substring(0, a - 1), fh.getOldId().name());
+		assertEquals(nid.substring(0, a - 1), fh.getNewId().name());
+
+		assertTrue(ObjectId.fromString(oid).startsWith(fh.getOldId()));
+		assertTrue(ObjectId.fromString(nid).startsWith(fh.getNewId()));
+	}
+
+	public void testParseAbbrIndexLine_NoMode() {
+		final int a = 7;
+		final String oid = "78981922613b2afb6025042ff6bd878ac1994e85";
+		final String nid = "61780798228d17af2d34fce4cfbdf35556832472";
+		final FileHeader fh = data("diff --git a/a b/a\n" + "index "
+				+ oid.substring(0, a - 1) + ".." + nid.substring(0, a - 1)
+				+ "\n" + "--- a/a\n" + "+++ b/a\n");
+		assertParse(fh);
+
+		assertEquals("a", fh.getOldName());
+		assertEquals("a", fh.getNewName());
+
+		assertNull(fh.getOldMode());
+		assertNull(fh.getNewMode());
+
+		assertNotNull(fh.getOldId());
+		assertNotNull(fh.getNewId());
+
+		assertFalse(fh.getOldId().isComplete());
+		assertFalse(fh.getNewId().isComplete());
+
+		assertEquals(oid.substring(0, a - 1), fh.getOldId().name());
+		assertEquals(nid.substring(0, a - 1), fh.getNewId().name());
+
+		assertTrue(ObjectId.fromString(oid).startsWith(fh.getOldId()));
+		assertTrue(ObjectId.fromString(nid).startsWith(fh.getNewId()));
+	}
+
+	private static void assertParse(final FileHeader fh) {
+		int ptr = fh.parseGitFileName(0);
+		assertTrue(ptr > 0);
+		ptr = fh.parseGitHeaders(ptr);
+		assertTrue(ptr > 0);
+	}
+
+	private static FileHeader data(final String in) {
+		return new FileHeader(Constants.encodeASCII(in), 0);
+	}
+
+	private static FileHeader header(final String path) {
+		return data(gitLine(path) + "--- " + path + "\n");
+	}
+
+	private static String gitLine(final String path) {
+		return "a/" + path + " b/" + path + "\n";
+	}
+
+	private static FileHeader dqHeader(final String path) {
+		return data(dqGitLine(path) + "--- " + path + "\n");
+	}
+
+	private static String dqGitLine(final String path) {
+		return "\"a/" + path + "\" \"b/" + path + "\"\n";
+	}
+}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
new file mode 100644
index 0000000..5d1454b
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
@@ -0,0 +1,430 @@
+/*
+ * Copyright (C) 2008, Google Inc.
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.patch;
+
+import static org.spearce.jgit.lib.Constants.encodeASCII;
+import static org.spearce.jgit.util.RawParseUtils.decode;
+import static org.spearce.jgit.util.RawParseUtils.match;
+import static org.spearce.jgit.util.RawParseUtils.nextLF;
+import static org.spearce.jgit.util.RawParseUtils.parseBase10;
+
+import org.spearce.jgit.lib.AbbreviatedObjectId;
+import org.spearce.jgit.lib.Constants;
+import org.spearce.jgit.lib.FileMode;
+import org.spearce.jgit.util.QuotedString;
+
+/** Patch header describing an action for a single file path. */
+public class FileHeader {
+	/** Magical file name used for file adds or deletes. */
+	public static final String DEV_NULL = "/dev/null";
+
+	private static final byte[] OLD_MODE = encodeASCII("old mode ");
+
+	private static final byte[] NEW_MODE = encodeASCII("new mode ");
+
+	private static final byte[] DELETED_FILE_MODE = encodeASCII("deleted file mode ");
+
+	private static final byte[] NEW_FILE_MODE = encodeASCII("new file mode ");
+
+	private static final byte[] COPY_FROM = encodeASCII("copy from ");
+
+	private static final byte[] COPY_TO = encodeASCII("copy to ");
+
+	private static final byte[] RENAME_OLD = encodeASCII("rename old ");
+
+	private static final byte[] RENAME_NEW = encodeASCII("rename new ");
+
+	private static final byte[] RENAME_FROM = encodeASCII("rename from ");
+
+	private static final byte[] RENAME_TO = encodeASCII("rename to ");
+
+	private static final byte[] SIMILARITY_INDEX = encodeASCII("similarity index ");
+
+	private static final byte[] DISSIMILARITY_INDEX = encodeASCII("dissimilarity index ");
+
+	private static final byte[] INDEX = encodeASCII("index ");
+
+	static final byte[] OLD_NAME = encodeASCII("--- ");
+
+	static final byte[] NEW_NAME = encodeASCII("+++ ");
+
+	static final byte[] HUNK_HDR = encodeASCII("@@ -");
+
+	/** General type of change a single file-level patch describes. */
+	public static enum ChangeType {
+		/** Add a new file to the project */
+		ADD,
+
+		/** Modify an existing file in the project (content and/or mode) */
+		MODIFY,
+
+		/** Delete an existing file from the project */
+		DELETE,
+
+		/** Rename an existing file to a new location */
+		RENAME,
+
+		/** Copy an existing file to a new location, keeping the original */
+		COPY;
+	}
+
+	/** Buffer holding the patch data for this file. */
+	final byte[] buf;
+
+	/** Offset within {@link #buf} to the "diff ..." line. */
+	final int startOffset;
+
+	/** Position 1 past the end of this file within {@link #buf}. */
+	int endOffset;
+
+	/** File name of the old (pre-image). */
+	private String oldName;
+
+	/** File name of the new (post-image). */
+	private String newName;
+
+	/** Old mode of the file, if described by the patch, else null. */
+	private FileMode oldMode;
+
+	/** New mode of the file, if described by the patch, else null. */
+	private FileMode newMode;
+
+	/** General type of change indicated by the patch. */
+	private ChangeType changeType;
+
+	/** Similarity score if {@link #changeType} is a copy or rename. */
+	private int score;
+
+	/** ObjectId listed on the index line for the old (pre-image) */
+	private AbbreviatedObjectId oldId;
+
+	/** ObjectId listed on the index line for the new (post-image) */
+	private AbbreviatedObjectId newId;
+
+	FileHeader(final byte[] b, final int offset) {
+		buf = b;
+		startOffset = offset;
+		changeType = ChangeType.MODIFY; // unless otherwise designated
+	}
+
+	/**
+	 * Get the old name associated with this file.
+	 * <p>
+	 * The meaning of the old name can differ depending on the semantic meaning
+	 * of this patch:
+	 * <ul>
+	 * <li><i>file add</i>: always <code>/dev/null</code></li>
+	 * <li><i>file modify</i>: always {@link #getNewName()}</li>
+	 * <li><i>file delete</i>: always the file being deleted</li>
+	 * <li><i>file copy</i>: source file the copy originates from</li>
+	 * <li><i>file rename</i>: source file the rename originates from</li>
+	 * </ul>
+	 * 
+	 * @return old name for this file.
+	 */
+	public String getOldName() {
+		return oldName;
+	}
+
+	/**
+	 * Get the new name associated with this file.
+	 * <p>
+	 * The meaning of the new name can differ depending on the semantic meaning
+	 * of this patch:
+	 * <ul>
+	 * <li><i>file add</i>: always the file being created</li>
+	 * <li><i>file modify</i>: always {@link #getOldName()}</li>
+	 * <li><i>file delete</i>: always <code>/dev/null</code></li>
+	 * <li><i>file copy</i>: destination file the copy ends up at</li>
+	 * <li><i>file rename</i>: destination file the rename ends up at</li>
+	 * </ul>
+	 * 
+	 * @return new name for this file.
+	 */
+	public String getNewName() {
+		return newName;
+	}
+
+	/** @return the old file mode, if described in the patch */
+	public FileMode getOldMode() {
+		return oldMode;
+	}
+
+	/** @return the new file mode, if described in the patch */
+	public FileMode getNewMode() {
+		return newMode;
+	}
+
+	/** @return the type of change this patch makes on {@link #getNewName()} */
+	public ChangeType getChangeType() {
+		return changeType;
+	}
+
+	/**
+	 * @return similarity score between {@link #getOldName()} and
+	 *         {@link #getNewName()} if {@link #getChangeType()} is
+	 *         {@link ChangeType#COPY} or {@link ChangeType#RENAME}.
+	 */
+	public int getScore() {
+		return score;
+	}
+
+	/**
+	 * Get the old object id from the <code>index</code>.
+	 * 
+	 * @return the object id; null if there is no index line
+	 */
+	public AbbreviatedObjectId getOldId() {
+		return oldId;
+	}
+
+	/**
+	 * Get the new object id from the <code>index</code>.
+	 * 
+	 * @return the object id; null if there is no index line
+	 */
+	public AbbreviatedObjectId getNewId() {
+		return newId;
+	}
+
+	/**
+	 * Parse a "diff --git" or "diff --cc" line.
+	 * 
+	 * @param ptr
+	 *            first character after the "diff --git " or "diff --cc " part.
+	 * @return first character after the LF at the end of the line; -1 on error.
+	 */
+	int parseGitFileName(int ptr) {
+		final int eol = nextLF(buf, ptr);
+		final int bol = ptr;
+		if (eol >= buf.length) {
+			return -1;
+		}
+
+		// buffer[ptr..eol] looks like "a/foo b/foo\n". A regex to
+		// match this is "^[^/]+/(.*?) [^/]+/\1\n$". There is only
+		// one way to split the line such that the text to the left
+		// of the space matches the text to the right, excluding the
+		// part before the first slash.
+		//
+
+		final int aStart = nextLF(buf, ptr, '/');
+		if (aStart >= eol)
+			return eol;
+
+		while (ptr < eol) {
+			final int sp = nextLF(buf, ptr, ' ');
+			if (sp >= eol) {
+				// We can't split the header, it isn't valid.
+				// This may be OK if this is a rename patch.
+				//
+				return eol;
+			}
+			final int bStart = nextLF(buf, sp, '/');
+			if (bStart >= eol)
+				return eol;
+
+			// If buffer[aStart..sp - 1] = buffer[bStart..eol - 1]
+			// we have a valid split.
+			//
+			if (eq(aStart, sp - 1, bStart, eol - 1)) {
+				if (buf[bol] == '"') {
+					// We're a double quoted name. The region better end
+					// in a double quote too, and we need to decode the
+					// characters before reading the name.
+					//
+					if (buf[sp - 2] != '"') {
+						return eol;
+					}
+					oldName = QuotedString.GIT_PATH.dequote(buf, bol, sp - 1);
+					oldName = p1(oldName);
+				} else {
+					oldName = decode(Constants.CHARSET, buf, aStart, sp - 1);
+				}
+				newName = oldName;
+				return eol;
+			}
+
+			// This split wasn't correct. Move past the space and try
+			// another split as the space must be part of the file name.
+			//
+			ptr = sp;
+		}
+
+		return eol;
+	}
+
+	int parseGitHeaders(int ptr) {
+		final int sz = buf.length;
+		while (ptr < sz) {
+			final int eol = nextLF(buf, ptr);
+			if (match(buf, ptr, HUNK_HDR) >= 0) {
+				// First hunk header; break out and parse them later.
+				break;
+
+			} else if (match(buf, ptr, OLD_NAME) >= 0) {
+				oldName = p1(parseName(oldName, ptr + OLD_NAME.length, eol));
+				if (oldName == DEV_NULL)
+					changeType = ChangeType.ADD;
+
+			} else if (match(buf, ptr, NEW_NAME) >= 0) {
+				newName = p1(parseName(newName, ptr + NEW_NAME.length, eol));
+				if (newName == DEV_NULL)
+					changeType = ChangeType.DELETE;
+
+			} else if (match(buf, ptr, OLD_MODE) >= 0) {
+				oldMode = parseFileMode(ptr + OLD_MODE.length, eol);
+
+			} else if (match(buf, ptr, NEW_MODE) >= 0) {
+				newMode = parseFileMode(ptr + NEW_MODE.length, eol);
+
+			} else if (match(buf, ptr, DELETED_FILE_MODE) >= 0) {
+				oldMode = parseFileMode(ptr + DELETED_FILE_MODE.length, eol);
+				changeType = ChangeType.DELETE;
+
+			} else if (match(buf, ptr, NEW_FILE_MODE) >= 0) {
+				newMode = parseFileMode(ptr + NEW_FILE_MODE.length, eol);
+				changeType = ChangeType.ADD;
+
+			} else if (match(buf, ptr, COPY_FROM) >= 0) {
+				oldName = parseName(oldName, ptr + COPY_FROM.length, eol);
+				changeType = ChangeType.COPY;
+
+			} else if (match(buf, ptr, COPY_TO) >= 0) {
+				newName = parseName(newName, ptr + COPY_TO.length, eol);
+				changeType = ChangeType.COPY;
+
+			} else if (match(buf, ptr, RENAME_OLD) >= 0) {
+				oldName = parseName(oldName, ptr + RENAME_OLD.length, eol);
+				changeType = ChangeType.RENAME;
+
+			} else if (match(buf, ptr, RENAME_NEW) >= 0) {
+				newName = parseName(newName, ptr + RENAME_NEW.length, eol);
+				changeType = ChangeType.RENAME;
+
+			} else if (match(buf, ptr, RENAME_FROM) >= 0) {
+				oldName = parseName(oldName, ptr + RENAME_FROM.length, eol);
+				changeType = ChangeType.RENAME;
+
+			} else if (match(buf, ptr, RENAME_TO) >= 0) {
+				newName = parseName(newName, ptr + RENAME_TO.length, eol);
+				changeType = ChangeType.RENAME;
+
+			} else if (match(buf, ptr, SIMILARITY_INDEX) >= 0) {
+				score = parseBase10(buf, ptr + SIMILARITY_INDEX.length, null);
+
+			} else if (match(buf, ptr, DISSIMILARITY_INDEX) >= 0) {
+				score = parseBase10(buf, ptr + DISSIMILARITY_INDEX.length, null);
+
+			} else if (match(buf, ptr, INDEX) >= 0) {
+				parseIndexLine(ptr + INDEX.length, eol);
+
+			} else {
+				// Probably an empty patch (stat dirty).
+				break;
+			}
+
+			ptr = eol;
+		}
+		return ptr;
+	}
+
+	private String parseName(final String expect, int ptr, final int end) {
+		if (ptr == end)
+			return expect;
+
+		String r;
+		if (buf[ptr] == '"') {
+			// New style GNU diff format
+			//
+			r = QuotedString.GIT_PATH.dequote(buf, ptr, end - 1);
+		} else {
+			// Older style GNU diff format, an optional tab ends the name.
+			//
+			int tab = end;
+			while (ptr < tab && buf[tab - 1] != '\t')
+				tab--;
+			if (ptr == tab)
+				tab = end;
+			r = decode(Constants.CHARSET, buf, ptr, tab - 1);
+		}
+
+		if (r.equals(DEV_NULL))
+			r = DEV_NULL;
+		return r;
+	}
+
+	private static String p1(final String r) {
+		final int s = r.indexOf('/');
+		return s > 0 ? r.substring(s + 1) : r;
+	}
+
+	private FileMode parseFileMode(int ptr, final int end) {
+		int tmp = 0;
+		while (ptr < end - 1) {
+			tmp <<= 3;
+			tmp += buf[ptr++] - '0';
+		}
+		return FileMode.fromBits(tmp);
+	}
+
+	private void parseIndexLine(int ptr, final int end) {
+		// "index $asha1..$bsha1[ $mode]" where $asha1 and $bsha1
+		// can be unique abbreviations
+		//
+		final int dot2 = nextLF(buf, ptr, '.');
+		final int mode = nextLF(buf, dot2, ' ');
+
+		oldId = AbbreviatedObjectId.fromString(buf, ptr, dot2 - 1);
+		newId = AbbreviatedObjectId.fromString(buf, dot2 + 1, mode - 1);
+
+		if (mode < end)
+			newMode = oldMode = parseFileMode(mode, end);
+	}
+
+	private boolean eq(int aPtr, int aEnd, int bPtr, int bEnd) {
+		if (aEnd - aPtr != bEnd - bPtr) {
+			return false;
+		}
+		while (aPtr < aEnd) {
+			if (buf[aPtr++] != buf[bPtr++])
+				return false;
+		}
+		return true;
+	}
+}
-- 
1.6.1.rc2.299.gead4c
