Git development
* Re: [PATCH] git-cherry: make <upstream> parameter optional
From: Junio C Hamano @ 2009-01-01 21:00 UTC
  To: markus.heidelberg; +Cc: git
In-Reply-To: <200812291845.20500.markus.heidelberg@web.de>

Markus Heidelberg <markus.heidelberg@web.de> writes:

> diff --git a/Documentation/git-cherry.txt b/Documentation/git-cherry.txt
> index 74d14c4..556ea23 100644
> --- a/Documentation/git-cherry.txt
> +++ b/Documentation/git-cherry.txt
> @@ -7,7 +7,7 @@ git-cherry - Find commits not merged upstream
>  
>  SYNOPSIS
>  --------
> -'git cherry' [-v] <upstream> [<head>] [<limit>]
> +'git cherry' [-v] [<upstream>] [<head>] [<limit>]

Shouldn't this be [<upstream> [<head> [<limit>]]]?
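Junio's nested form encodes that the arguments are positional: `<head>` can only be given after `<upstream>`, and `<limit>` only after both. Below is a minimal C sketch of parsing that shape. It assumes a hypothetical `struct cherry_args`. `<head>` falls back to `HEAD`, as git-cherry documents. The default for an omitted `<upstream>` is left as `NULL` because the patch excerpt above does not show it.

```c
#include <stddef.h>

/* Hypothetical holder for git cherry's positional arguments. */
struct cherry_args {
	const char *upstream;	/* NULL: let the caller pick a default */
	const char *head;	/* defaults to "HEAD" */
	const char *limit;	/* NULL: no limit */
};

/*
 * Parse [<upstream> [<head> [<limit>]]]: the n-th argument always fills
 * the n-th slot, so <head> cannot be supplied without <upstream>.
 * Returns 0 on success, -1 if there are too many arguments.
 */
static int parse_cherry_args(int argc, const char **argv,
			     struct cherry_args *out)
{
	if (argc < 0 || argc > 3)
		return -1;
	out->upstream = argc > 0 ? argv[0] : NULL;
	out->head = argc > 1 ? argv[1] : "HEAD";
	out->limit = argc > 2 ? argv[2] : NULL;
	return 0;
}
```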

* Re: [PATCH] Git.pm: let a "false" Directory parameter (such as "0") be used correctly by the constructor
From: Junio C Hamano @ 2009-01-01 21:00 UTC
  To: Philippe Bruhat (BooK); +Cc: git
In-Reply-To: <1230510300-7854-1-git-send-email-book@cpan.org>

"Philippe Bruhat (BooK)" <book@cpan.org> writes:

> ---
>  perl/Git.pm |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)

Lacks sign-off and description but otherwise looks good.  Will queue to
'pu' to leave you a chance to re-send.

    commit b29b1ae7442cd7c1c78e38b7d980905944ec31e0
    Author: Philippe Bruhat (BooK) <book@cpan.org>
    Date:   Mon Dec 29 01:25:00 2008 +0100

        Git.pm: correctly handle directory name that evaluates to "false"

        The repository constructor mistakenly rewrote a Directory parameter that
        Perl happens to evaluate to false (e.g. "0") to ".".


Thanks.
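The bug class here is Perl truthiness. The scalars undef, "" and "0" are all false, so a `||`-style default silently replaces a directory literally named `0`. The actual Git.pm diff is not quoted above. The sketch below therefore only models the two general patterns in C, using the hypothetical helpers `perl_is_true`, `dir_with_or_default` and `dir_with_defined_default`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of Perl truthiness for a scalar that is undef (NULL) or a string:
 * undef, "" and "0" are false; any other string, even "0.0", is true.
 */
static int perl_is_true(const char *s)
{
	return s != NULL && strcmp(s, "") != 0 && strcmp(s, "0") != 0;
}

/* Buggy shape, like `$dir = $opt || '.'`: a directory named "0" is lost. */
static const char *dir_with_or_default(const char *opt)
{
	return perl_is_true(opt) ? opt : ".";
}

/* Fixed shape, like `$dir = defined $opt ? $opt : '.'`. */
static const char *dir_with_defined_default(const char *opt)
{
	return opt != NULL ? opt : ".";
}
```

Only an undefined value falls back to "." in the second shape, which is the behavior the commit message asks for.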

* [PATCH 2/3] unpack-trees: fix path search bug in verify_absent
From: Clemens Buchacher @ 2009-01-01 20:54 UTC
  To: git; +Cc: gitster, Clemens Buchacher
In-Reply-To: <1230843273-11056-2-git-send-email-drizzd@aon.at>

Commit 0cf73755 (unpack-trees.c: assume submodules are clean during
check-out) changed an argument to verify_absent from 'path' to 'ce',
which is however shadowed by a local variable of the same name.

The bug is triggered when verify_absent is used on a tree entry for
which the index contains one or more subsequent directories whose names
have the same length. The affected subdirectories are removed from the
index. The
testcase included in this commit bisects to 55218834 (checkout: do not
lose staged removal), which reveals the bug in this case, but is
otherwise unrelated.
---
 t/t1001-read-tree-m-2way.sh |   27 +++++++++++++++++++++++++++
 unpack-trees.c              |   23 ++++++++++++-----------
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/t/t1001-read-tree-m-2way.sh b/t/t1001-read-tree-m-2way.sh
index 7f6ab31..271bc4e 100755
--- a/t/t1001-read-tree-m-2way.sh
+++ b/t/t1001-read-tree-m-2way.sh
@@ -365,4 +365,31 @@ test_expect_success \
      git ls-files --stage &&
      test -f a/b'
 
+test_expect_success \
+    'a/b vs a, plus c/d case setup.' \
+    'rm -f .git/index &&
+     rm -fr a &&
+     : >a &&
+     mkdir c &&
+     : >c/d &&
+     git update-index --add a c/d &&
+     treeM=`git write-tree` &&
+     echo treeM $treeM &&
+     git ls-tree $treeM &&
+     git ls-files --stage >treeM.out &&
+
+     rm -f a &&
+     mkdir a &&
+     : >a/b &&
+     git update-index --add --remove a a/b &&
+     treeH=`git write-tree` &&
+     echo treeH $treeH &&
+     git ls-tree $treeH'
+
+test_expect_success \
+    'a/b vs a, plus c/d case test.' \
+    'git read-tree -u -m "$treeH" "$treeM" &&
+     git ls-files --stage | tee >treeMcheck.out &&
+     test_cmp treeM.out treeMcheck.out'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index a736947..f8e2484 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -289,7 +289,8 @@ static int unpack_nondirectories(int n, unsigned long mask, unsigned long dirmas
 	return 0;
 }
 
-static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
+static int unpack_callback(int n, unsigned long mask, unsigned long dirmask,
+		struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[5] = { NULL, };
 	struct unpack_trees_options *o = info->data;
@@ -517,22 +518,22 @@ static int verify_clean_subdirectory(struct cache_entry *ce, const char *action,
 	namelen = strlen(ce->name);
 	pos = index_name_pos(o->src_index, ce->name, namelen);
 	if (0 <= pos)
-		return cnt; /* we have it as nondirectory */
+		return 0; /* we have it as nondirectory */
 	pos = -pos - 1;
 	for (i = pos; i < o->src_index->cache_nr; i++) {
-		struct cache_entry *ce = o->src_index->cache[i];
-		int len = ce_namelen(ce);
+		struct cache_entry *ce2 = o->src_index->cache[i];
+		int len = ce_namelen(ce2);
 		if (len < namelen ||
-		    strncmp(ce->name, ce->name, namelen) ||
-		    ce->name[namelen] != '/')
+		    strncmp(ce->name, ce2->name, namelen) ||
+		    ce2->name[namelen] != '/')
 			break;
 		/*
-		 * ce->name is an entry in the subdirectory.
+		 * ce2->name is an entry in the subdirectory.
 		 */
-		if (!ce_stage(ce)) {
-			if (verify_uptodate(ce, o))
+		if (!ce_stage(ce2)) {
+			if (verify_uptodate(ce2, o))
 				return -1;
-			add_entry(o, ce, CE_REMOVE, 0);
+			add_entry(o, ce2, CE_REMOVE, 0);
 		}
 		cnt++;
 	}
@@ -624,7 +625,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 			 * If this removed entries from the index,
 			 * what that means is:
 			 *
-			 * (1) the caller unpack_trees_rec() saw path/foo
+			 * (1) the caller unpack_callback() saw path/foo
 			 * in the index, and it has not removed it because
 			 * it thinks it is handling 'path' as blob with
 			 * D/F conflict;
-- 
1.6.1
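The shadowing is easy to reproduce outside git. Below is a minimal C sketch. A hypothetical `struct entry` stands in for `struct cache_entry`, and a plain sorted array stands in for the index.

In the buggy loop, the inner declaration redeclares `ce`. As a result, `strncmp(ce->name, ce->name, namelen)` compares an entry with itself and always succeeds. Any later entry with a `/` at offset `namelen` (such as `c/d` when checking `a`) is therefore counted as part of the subdirectory. `gcc -Wshadow` flags the inner declaration.

```c
#include <string.h>

/* Hypothetical stand-in for git's struct cache_entry. */
struct entry {
	const char *name;
};

/* Buggy shape: the loop variable shadows the parameter 'ce'. */
static int subdir_count_buggy(const struct entry *ce,
			      const struct entry *idx, int nr)
{
	size_t namelen = strlen(ce->name);	/* still the parameter here */
	int cnt = 0;
	for (int i = 0; i < nr; i++) {
		const struct entry *ce = &idx[i];	/* shadows the parameter */
		if (strlen(ce->name) < namelen ||
		    strncmp(ce->name, ce->name, namelen) ||	/* always 0 */
		    ce->name[namelen] != '/')
			break;
		cnt++;
	}
	return cnt;
}

/* Fixed shape: a distinct name (ce2) keeps both entries visible. */
static int subdir_count_fixed(const struct entry *ce,
			      const struct entry *idx, int nr)
{
	size_t namelen = strlen(ce->name);
	int cnt = 0;
	for (int i = 0; i < nr; i++) {
		const struct entry *ce2 = &idx[i];
		if (strlen(ce2->name) < namelen ||
		    strncmp(ce->name, ce2->name, namelen) ||
		    ce2->name[namelen] != '/')
			break;
		cnt++;
	}
	return cnt;
}
```

With the entries `a/b` and `c/d` after `a`, the buggy count is 2 and the fixed count is 1.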

* [PATCH 1/3] unpack-trees: handle failure in verify_absent
From: Clemens Buchacher @ 2009-01-01 20:54 UTC
  To: git; +Cc: gitster, Clemens Buchacher
In-Reply-To: <1230843273-11056-1-git-send-email-drizzd@aon.at>

Commit 203a2fe1 (Allow callers of unpack_trees() to handle failure)
changed the "die on error" behavior to "return failure code".
verify_absent did not handle errors returned by
verify_clean_subdirectory, however.
---
 t/t1001-read-tree-m-2way.sh |   24 ++++++++++++++++++++++++
 unpack-trees.c              |    8 +++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/t/t1001-read-tree-m-2way.sh b/t/t1001-read-tree-m-2way.sh
index 4b44e13..7f6ab31 100755
--- a/t/t1001-read-tree-m-2way.sh
+++ b/t/t1001-read-tree-m-2way.sh
@@ -341,4 +341,28 @@ test_expect_success \
      check_cache_at DF/DF dirty &&
      :'
 
+test_expect_success \
+    'a/b (untracked) vs a case setup.' \
+    'rm -f .git/index &&
+     : >a &&
+     git update-index --add a &&
+     treeM=`git write-tree` &&
+     echo treeM $treeM &&
+     git ls-tree $treeM &&
+     git ls-files --stage >treeM.out &&
+
+     rm -f a &&
+     git update-index --remove a &&
+     mkdir a &&
+     : >a/b &&
+     treeH=`git write-tree` &&
+     echo treeH $treeH &&
+     git ls-tree $treeH'
+
+test_expect_success \
+    'a/b (untracked) vs a case test.' \
+    '! git read-tree -u -m "$treeH" "$treeM" &&
+     git ls-files --stage &&
+     test -f a/b'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301d..a736947 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -588,7 +588,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-		int cnt;
+		int ret;
 		int dtype = ce_to_dtype(ce);
 		struct cache_entry *result;
 
@@ -616,7 +616,9 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 			 * files that are in "foo/" we would lose
 			 * it.
 			 */
-			cnt = verify_clean_subdirectory(ce, action, o);
+			ret = verify_clean_subdirectory(ce, action, o);
+			if (ret < 0)
+				return ret;
 
 			/*
 			 * If this removed entries from the index,
@@ -635,7 +637,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 			 * We need to increment it by the number of
 			 * deleted entries here.
 			 */
-			o->pos += cnt;
+			o->pos += ret;
 			return 0;
 		}
 
-- 
1.6.1
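The fix separates two meanings that shared one `int`: a count of removed entries and an error code. Below is a minimal C sketch, with a hypothetical `check_subdir()` standing in for `verify_clean_subdirectory()`. In the intermediate state between 203a2fe1 and this fix, a `-1` failure would have been added to the position counter instead of being reported.

```c
/*
 * Hypothetical stand-in for verify_clean_subdirectory(): returns the
 * number of index entries it removed, or -1 if the subdirectory is dirty.
 */
static int check_subdir(int removed, int dirty)
{
	return dirty ? -1 : removed;
}

/* Unfixed shape: the return value is used as a count unconditionally. */
static int advance_unfixed(int *pos, int removed, int dirty)
{
	int cnt = check_subdir(removed, dirty);
	*pos += cnt;	/* a -1 error silently moves pos backwards */
	return 0;
}

/* Fixed shape: propagate failure first, only then treat it as a count. */
static int advance_fixed(int *pos, int removed, int dirty)
{
	int ret = check_subdir(removed, dirty);
	if (ret < 0)
		return ret;
	*pos += ret;
	return 0;
}
```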

* unpack-trees: fix D/F conflict bugs in verify_absent
From: Clemens Buchacher @ 2009-01-01 20:54 UTC
  To: git; +Cc: gitster

I came across a few bugs while investigating the changes I proposed in the
modify/delete conflict thread. The first two are quite obvious. The third I'm
not so sure about. I could not find a testcase where it matters. Junio, do you
recall the original intention of that code?

[PATCH 1/3] unpack-trees: handle failure in verify_absent
[PATCH 2/3] unpack-trees: fix path search bug in verify_absent
[PATCH 3/3] unpack-trees: remove redundant path search in verify_absent

 t/t1001-read-tree-m-2way.sh |   51 +++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.c              |   37 +++++++++++++++----------------
 2 files changed, 69 insertions(+), 19 deletions(-)

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Adeodato Simó @ 2009-01-01 20:46 UTC
  To: Linus Torvalds
  Cc: Johannes Schindelin, Pierre Habouzit, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <alpine.LFD.2.00.0901011134210.5086@localhost.localdomain>

* Linus Torvalds [Thu, 01 Jan 2009 11:45:21 -0800]:

> On Thu, 1 Jan 2009, Johannes Schindelin wrote:

> > Nothing fancy, really, just a straight-forward implementation of the
> > heavily under-documented and under-analyzed patience diff algorithm.

> Exactly because the patience diff is so under-documented, could you 
> perhaps give a few examples of how it differs in the result, and why it's 
> so wonderful? Yes, yes, I can google, and no, no, nothing useful shows up 
> except for *totally* content-free fanboisms. 

> So could we have some actual real data on it?

For me, the cases where I find patience output to be of substantially
higher readability are those involving a rewrite of several consecutive
paragraphs (i.e., lines of code separated by blank lines). Compare:

-8<- git -8<-
@@ -51,29 +51,30 @@ def mbox_update(bug):
             f.close()
     else:
         # make a list of Message-Id we have
-        fp1 = file(path, 'ab+')
-        ids1 = [ x.get('Message-Id') for x in mailbox.UnixMailbox(fp1) ]
+        msgids = { x.get('Message-Id') for x in mailbox.mbox(path) }
 
-        # get remote mbox again
-        fp2 = tempfile.TemporaryFile()
-        retrieve_mbox(bug, fp2)
+        with tempfile.NamedTemporaryFile() as tmpfd:
+            # retrieve the remote mbox again
+            retrieve_mbox(bug, tmpfd)
 
-        # parse its messages
-        fp2.seek(0)
-        parser = email.Parser.Parser()
-        msgs2 = dict((x['Message-Id'], x)
-                    for x in mailbox.UnixMailbox(fp2, parser.parse))
+            # parse its messages
+            parser = email.parser.Parser()
+            new_msgids = { x['Message-Id']: x
+                            for x in mailbox.mbox(tmpfd.name, parser.parse) }
 
-        # now append the new ones
-        for msgid in set(msgs2.keys()) - set(ids1):
-            fp1.write('\n' + msgs2[msgid].as_string(unixfrom=True))
+        with open(path, 'a+') as fd:
+            # now append the new messages
+            for msgid in new_msgids.keys() - msgids:
+                fd.write('\n' + new_msgids[msgid].as_string(unixfrom=True))
 
     return path
->8- git ->8-

with:

-8<- bzr patience -8<-
@@ -51,29 +51,30 @@
             f.close()
     else:
         # make a list of Message-Id we have
-        fp1 = file(path, 'ab+')
-        ids1 = [ x.get('Message-Id') for x in mailbox.UnixMailbox(fp1) ]
-
-        # get remote mbox again
-        fp2 = tempfile.TemporaryFile()
-        retrieve_mbox(bug, fp2)
-
-        # parse its messages
-        fp2.seek(0)
-        parser = email.Parser.Parser()
-        msgs2 = dict((x['Message-Id'], x)
-                    for x in mailbox.UnixMailbox(fp2, parser.parse))
-
-        # now append the new ones
-        for msgid in set(msgs2.keys()) - set(ids1):
-            fp1.write('\n' + msgs2[msgid].as_string(unixfrom=True))
+        msgids = { x.get('Message-Id') for x in mailbox.mbox(path) }
+
+        with tempfile.NamedTemporaryFile() as tmpfd:
+            # retrieve the remote mbox again
+            retrieve_mbox(bug, tmpfd)
+
+            # parse its messages
+            parser = email.parser.Parser()
+            new_msgids = { x['Message-Id']: x
+                            for x in mailbox.mbox(tmpfd.name, parser.parse) }
+
+        with open(path, 'a+') as fd:
+            # now append the new messages
+            for msgid in new_msgids.keys() - msgids:
+                fd.write('\n' + new_msgids[msgid].as_string(unixfrom=True))
 
     return path
->8- bzr patience ->8-

I don't know about you, but I find the latter much easier to read,
because the whole context of each version is always available.

As you can see, in (at least) this case it is just a matter of whether
the blank lines are considered worthy of being presented as common, or not.

I'll note that in this particular case, `git diff` yielded the very same
results with or without --patience. I don't know why that is, Johannes?
I'll also note that /usr/bin/diff produces (in this case) something
closer to patience than to git.
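The blank-line point above is the heart of the algorithm. Patience diff only uses lines that occur exactly once in each file as anchors, so repeated lines such as blank ones can never pin the two versions together.

Below is a minimal C sketch of that anchor-selection step. It assumes fixed-size buffers (at most 256 candidate lines) and uses a hypothetical `patience_anchors()` name. It is not Johannes's implementation. A full patience diff would then recurse between the anchors and fall back to another algorithm where no unique lines exist.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ANCHORS 256	/* sketch limit, not a property of the algorithm */

/* Number of times s occurs in lines[0..n); quadratic, for brevity. */
static size_t count_line(const char **lines, size_t n, const char *s)
{
	size_t c = 0;
	for (size_t i = 0; i < n; i++)
		if (strcmp(lines[i], s) == 0)
			c++;
	return c;
}

/*
 * Anchor selection of patience diff:
 * 1. keep lines that occur exactly once in a and exactly once in b;
 * 2. among those pairs (in a's order), keep the longest run whose
 *    positions in b also increase, found with patience sorting.
 * Writes matched (a, b) line indices to out_a/out_b and returns the count.
 */
static size_t patience_anchors(const char **a, size_t na,
			       const char **b, size_t nb,
			       size_t *out_a, size_t *out_b)
{
	size_t ua[MAX_ANCHORS], ub[MAX_ANCHORS], k = 0;
	size_t top[MAX_ANCHORS], prev[MAX_ANCHORS], piles = 0;

	for (size_t i = 0; i < na && k < MAX_ANCHORS; i++) {
		if (count_line(a, na, a[i]) != 1 || count_line(b, nb, a[i]) != 1)
			continue;	/* repeated lines (e.g. blank ones) never anchor */
		for (size_t j = 0; j < nb; j++) {
			if (strcmp(a[i], b[j]) == 0) {
				ua[k] = i;
				ub[k] = j;
				k++;
				break;
			}
		}
	}

	/* Patience sorting: put each card on the leftmost pile whose top is larger. */
	for (size_t c = 0; c < k; c++) {
		size_t lo = 0, hi = piles;
		while (lo < hi) {
			size_t mid = lo + (hi - lo) / 2;
			if (ub[top[mid]] < ub[c])
				lo = mid + 1;
			else
				hi = mid;
		}
		prev[c] = lo > 0 ? top[lo - 1] : (size_t)-1;
		top[lo] = c;
		if (lo == piles)
			piles++;
	}

	/* Follow back-pointers from the last pile to recover the sequence. */
	size_t cur = piles > 0 ? top[piles - 1] : 0;
	for (size_t n = piles; n > 0; n--) {
		out_a[n - 1] = ua[cur];
		out_b[n - 1] = ub[cur];
		cur = prev[cur];
	}
	return piles;
}
```

In the mbox example above, the blank separator lines repeat in both versions, so this step would never select them as anchors. That is consistent with the patience output keeping each version's block together.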

I'm attaching both versions of the file in case they are useful to
anybody.

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
I promise you. Once I enter into an exclusive relationship, I sleep with
very few people.
                -- Denny Crane

[-- Attachment #2: bdo0 --]
[-- Type: text/plain, Size: 2358 bytes --]

#! /usr/bin/python
## vim: fileencoding=utf-8

"""Open Debian BTS mboxes with Mutt, à la /usr/bin/bts show --mbox.

A cache of mboxes is kept, and changed mboxes will be merged with existing
files instead of replacing them, so that e.g. read-status is preserved for each
message.
"""

import os
import re
import sys
import urllib
import mailbox
import tempfile
import email.Parser

MBOX_DIR = os.path.expanduser('~/.mail/y.bug-cache')

##

def main():
    if len(sys.argv) != 2:
        print >>sys.stderr, 'Usage: %s <bugnumber>' % (sys.argv[0],)
        sys.exit(1)

    bug = re.sub(r'[^0-9]', '', sys.argv[1])
    if not re.match(r'\d{4,}$', bug):
        print >>sys.stderr, \
            'E: %s does not seem a valid number' % (sys.argv[1],)
        sys.exit(1)

    path = mbox_update(bug)
    invoke_mailer(path)

##

def mbox_update(bug):
    """Return a path with an up-to-date copy of the mbox for bug."""
    path = os.path.join(MBOX_DIR, bug + '.mbox')

    if not os.path.exists(path):
        f = file(path, 'wb')
        try:
            retrieve_mbox(bug, f)
        except:
            os.unlink(path)
            raise
        else:
            f.close()
    else:
        # make a list of Message-Id we have
        fp1 = file(path, 'ab+')
        ids1 = [ x.get('Message-Id') for x in mailbox.UnixMailbox(fp1) ]

        # get remote mbox again
        fp2 = tempfile.TemporaryFile()
        retrieve_mbox(bug, fp2)

        # parse its messages
        fp2.seek(0)
        parser = email.Parser.Parser()
        msgs2 = dict((x['Message-Id'], x)
                    for x in mailbox.UnixMailbox(fp2, parser.parse))

        # now append the new ones
        for msgid in set(msgs2.keys()) - set(ids1):
            fp1.write('\n' + msgs2[msgid].as_string(unixfrom=True))

    return path

def retrieve_mbox(bug, fileobj):
    """Retrieve mbox for bug from bugs.debian.org, writing it to fileobj."""
    for line in urllib.urlopen(
            'http://bugs.debian.org/cgi-bin/bugreport.cgi?mboxstatus=yes;mboxmaint=yes;mbox=yes;bug=%s' % (bug,)):
        fileobj.write(line)

def invoke_mailer(path):
    """Exec mutt, opening path."""
    os.execlp('mutt', 'mutt', '-f', path)

##

if __name__ == '__main__':
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        print >>sys.stderr, '\nCancelled.'
        sys.exit(1)

[-- Attachment #3: bdo1 --]
[-- Type: text/plain, Size: 2501 bytes --]

#! /usr/bin/python3

"""Open Debian BTS mboxes with Mutt, à la /usr/bin/bts show --mbox.

A cache of mboxes is kept, and changed mboxes will be merged with existing
files instead of replacing them, so that e.g. read-status is preserved for each
message.
"""

import os
import re
import sys
import mailbox
import tempfile
import email.parser
import urllib.request

MBOX_DIR = os.path.expanduser('~/.mail/y.bug-cache')

##

def main():
    if len(sys.argv) != 2:
        print('Usage: {0} <bugnumber>'.format(sys.argv[0]), file=sys.stderr)
        return 1
    else:
        bug = re.sub(r'[^0-9]', '', sys.argv[1])

    if not re.search(r'^\d{4,}$', bug):
        print('E: {0} does not seem a valid number'.format(sys.argv[1]),
              file=sys.stderr)
        return 1

    path = mbox_update(bug)
    invoke_mailer(path)

##

def mbox_update(bug):
    """Return a path with an up-to-date copy of the mbox for bug."""
    path = os.path.join(MBOX_DIR, bug + '.mbox')

    if not os.path.exists(path):
        f = open(path, 'wb')
        try:
            retrieve_mbox(bug, f)
        except:
            os.unlink(path)
            raise
        else:
            f.close()
    else:
        # make a list of Message-Id we have
        msgids = { x.get('Message-Id') for x in mailbox.mbox(path) }

        with tempfile.NamedTemporaryFile() as tmpfd:
            # retrieve the remote mbox again
            retrieve_mbox(bug, tmpfd)

            # parse its messages
            parser = email.parser.Parser()
            new_msgids = { x['Message-Id']: x
                            for x in mailbox.mbox(tmpfd.name, parser.parse) }

        with open(path, 'a+') as fd:
            # now append the new messages
            for msgid in new_msgids.keys() - msgids:
                fd.write('\n' + new_msgids[msgid].as_string(unixfrom=True))

    return path

def retrieve_mbox(bug, fileobj):
    """Retrieve mbox for bug from bugs.debian.org, writing it to fileobj."""
    url = urllib.request.urlopen(
            'http://bugs.debian.org/cgi-bin/bugreport.cgi?'
            'mboxstatus=yes;mboxmaint=yes;mbox=yes;bug={0}'.format(bug))
    for line in url.fp: # http://bugs.python.org/issue4608
        fileobj.write(line)

def invoke_mailer(path):
    """Exec mutt, opening path."""
    os.execlp('mutt', 'mutt', '-f', path)

##

if __name__ == '__main__':
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        print('\nCancelled.', file=sys.stderr)
        sys.exit(1)

* Re: why still no empty directory support in git
From: Jeff King @ 2009-01-01 20:06 UTC
  To: Asheesh Laroia; +Cc: Git Mailing List
In-Reply-To: <alpine.DEB.2.00.0812300346040.19911@vellum.laroia.net>

On Tue, Dec 30, 2008 at 03:58:46AM -0500, Asheesh Laroia wrote:

> So, let's say I take your suggestion.
>
> $ touch ~/Maildir/new/.exists
> $ git add ~/Maildir/new/.exists && git commit -m "La di da"
>
> Now a spec-compliant Maildir user agent will attempt to deliver this new  
> "email message" of zero bytes into the mail spool and assign it a message  
> UID.  Doing so will remove it from Maildir/new.

No. The maildir spec says:

  A unique name can be anything that doesn't contain a colon (or slash)
  and doesn't start with a dot.
     -- http://cr.yp.to/proto/maildir.html

where a "unique name" is the filename used for a message. In practice,
every maildir implementation I have seen ignores files starting with a
dot. Do you have one that doesn't?
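The rule quoted from the maildir spec is simple to encode. Below is a minimal C sketch using a hypothetical helper name. It checks only the unique-name part. For a file in `cur/`, you would first strip the `:2,...` info suffix that maildir appends after the colon.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Maildir rule quoted above: a unique name may be anything that does not
 * contain ':' or '/' and does not start with '.'.  An empty string is
 * rejected too, since it cannot be a file name.
 */
static bool maildir_valid_unique_name(const char *name)
{
	if (name[0] == '\0' || name[0] == '.')
		return false;
	return strpbrk(name, ":/") == NULL;
}
```

By this rule `.exists` is not a message name, which matches the observation that implementations ignore dot-files.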

-Peff

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-01 20:00 UTC
  To: Johannes Schindelin; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901011134210.5086@localhost.localdomain>



On Thu, 1 Jan 2009, Linus Torvalds wrote:
> 
> So could we have some actual real data on it?

.. and some testing. I tried to get some limited data for the kernel 
myself, by doing

	git log --patience -p v2.6.28.. > ~/patience

but I just got a core-dump instead.

Pinpointing it to a specific commit shows a smaller failure case:

	git show -p --patience 05d564fe00c05bf8ff93948057ca1acb5bc68e10

which might help you debug this.

			Linus

---
#0  0x00000000004cce73 in xdl_get_rec (xdf=0x7fffcb1e08d0, ri=-9, rec=0x7fffcb1e0778) at xdiff/xemit.c:36
#1  0x00000000004cced1 in xdl_emit_record (xdf=0x7fffcb1e08d0, ri=-9, pre=0x4eef79 "-", ecb=0x7fffcb1e0bc0)
    at xdiff/xemit.c:46
#2  0x00000000004cd4e6 in xdl_emit_diff (xe=0x7fffcb1e08d0, xscr=0x1111daf0, ecb=0x7fffcb1e0bc0, 
    xecfg=0x7fffcb1e0b80) at xdiff/xemit.c:179
#3  0x00000000004caa2c in xdl_diff (mf1=0x7fffcb1e0a40, mf2=0x7fffcb1e0a30, xpp=0x7fffcb1e0bd0, 
    xecfg=0x7fffcb1e0b80, ecb=0x7fffcb1e0bc0) at xdiff/xdiffi.c:559
#4  0x00000000004c088d in xdi_diff (mf1=0x7fffcb1e0c00, mf2=0x7fffcb1e0bf0, xpp=0x7fffcb1e0bd0, 
    xecfg=0x7fffcb1e0b80, xecb=0x7fffcb1e0bc0) at xdiff-interface.c:137
#5  0x00000000004c0914 in xdi_diff_outf (mf1=0x7fffcb1e0c00, mf2=0x7fffcb1e0bf0, fn=0x475448 <fn_out_consume>, 
    consume_callback_data=0x7fffcb1e0b40, xpp=0x7fffcb1e0bd0, xecfg=0x7fffcb1e0b80, xecb=0x7fffcb1e0bc0)
    at xdiff-interface.c:154
#6  0x00000000004780dc in builtin_diff (name_a=0x25cf6f0 "fs/nfs/nfs4xdr.c", name_b=0x25cf6f0 "fs/nfs/nfs4xdr.c", 
    one=0x25cf690, two=0x26ae110, xfrm_msg=0xf659900 "index 7dde309..29656c5 100644", o=0x7fffcb1e1088, 
    complete_rewrite=0) at diff.c:1486
#7  0x00000000004796e4 in run_diff_cmd (pgm=0x0, name=0x25cf6f0 "fs/nfs/nfs4xdr.c", other=0x0, 
    attr_path=0x25cf6f0 "fs/nfs/nfs4xdr.c", one=0x25cf690, two=0x26ae110, 
    xfrm_msg=0xf659900 "index 7dde309..29656c5 100644", o=0x7fffcb1e1088, complete_rewrite=0) at diff.c:2024
#8  0x0000000000479e2e in run_diff (p=0xaffece0, o=0x7fffcb1e1088) at diff.c:2158
#9  0x000000000047b959 in diff_flush_patch (p=0xaffece0, o=0x7fffcb1e1088) at diff.c:2743
#10 0x000000000047c942 in diff_flush (options=0x7fffcb1e1088) at diff.c:3184
#11 0x0000000000488b75 in log_tree_diff_flush (opt=0x7fffcb1e0f40) at log-tree.c:451
#12 0x0000000000488d17 in log_tree_diff (opt=0x7fffcb1e0f40, commit=0x2673198, log=0x7fffcb1e0ec0) at log-tree.c:503
#13 0x0000000000488da4 in log_tree_commit (opt=0x7fffcb1e0f40, commit=0x2673198) at log-tree.c:526
#14 0x000000000043218d in cmd_log_walk (rev=0x7fffcb1e0f40) at builtin-log.c:201
#15 0x0000000000432bae in cmd_log (argc=4, argv=0x7fffcb1e14b0, prefix=0x0) at builtin-log.c:423
#16 0x000000000040486b in run_command (p=0x70c7b0, argc=4, argv=0x7fffcb1e14b0) at git.c:243
#17 0x0000000000404a1c in handle_internal_command (argc=4, argv=0x7fffcb1e14b0) at git.c:387
#18 0x0000000000404c6e in main (argc=4, argv=0x7fffcb1e14b0) at git.c:484

* Re: git has modified files after clean checkout
From: Thomas Rast @ 2009-01-01 19:48 UTC
  To: Caleb Cushing; +Cc: David Aguilar, git
In-Reply-To: <81bfc67a0901010048l7a4a8fa1h42f7cd448dfc704@mail.gmail.com>

Caleb Cushing wrote:
> >  The files you mention contain CRLF.  Do you have core.autocrlf set
> >  globally somewhere, perhaps in your ~/.gitconfig?
> 
> yes I have it set to input

Do you have any .gitattributes?  A few days ago, ludde on IRC bumped
into the problem that git-checkout applies the .gitattributes that are
present in the tree *before* the checkout.  Naturally this means that
the .gitattributes do not apply at all during the first checkout at
the end of cloning.  In ludde's case, this caused git-blame to think
the file had all line endings changed compared to the index version.
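With `core.autocrlf=input`, git converts CRLF to LF when content goes into the repository and does nothing on checkout. That is one plausible way to get the mismatch described for git-blame. If the committed blob already contains CRLF, it is checked out unchanged. Once the working file is converted on the way back in, it no longer matches the index version.

Below is a minimal C sketch of that conversion, using the hypothetical name `crlf_to_lf`. It ignores git's binary detection and attribute handling.

```c
#include <stddef.h>

/*
 * CRLF -> LF on the way into the repository, as core.autocrlf=input does
 * (git's binary detection and .gitattributes handling are ignored here).
 * A lone CR is left alone. Converts in place and returns the new length.
 */
static size_t crlf_to_lf(char *buf, size_t len)
{
	size_t out = 0;
	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
			continue;	/* drop the CR; the LF is copied next round */
		buf[out++] = buf[i];
	}
	return out;
}
```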

-- 
Thomas Rast
trast@{inf,student}.ethz.ch



* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-01 19:45 UTC
  To: Johannes Schindelin; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901011730190.30769@pacific.mpi-cbg.de>



On Thu, 1 Jan 2009, Johannes Schindelin wrote:
> 
> Nothing fancy, really, just a straight-forward implementation of the
> heavily under-documented and under-analyzed patience diff algorithm.

Exactly because the patience diff is so under-documented, could you 
perhaps give a few examples of how it differs in the result, and why it's 
so wonderful? Yes, yes, I can google, and no, no, nothing useful shows up 
except for *totally* content-free fanboisms. 

So could we have some actual real data on it?

			Linus

* Re: got wet with make --dry-run
From: Thomas Rast @ 2009-01-01 19:44 UTC
  To: jidanni; +Cc: git
In-Reply-To: <87eizn0xhd.fsf@jidanni.org>

jidanni@jidanni.org wrote:
> Gentlemen, make --dry-run is booby trapped to still execute commands:
> $ (cd Documentation; make --dry-run); find -mtime -1 -type f
> ./Documentation/doc.dep
> ./GIT-VERSION-FILE
> Forgot $(MAKEFLAGS)? (info "(make)Options/Recursion").

A two-minute check of the Makefile shows that the recursion is
implemented via $(MAKE), which is the recommended way to do it.  It's
impossible to "forget" $(MAKEFLAGS), since the docs clearly say that
it is always exported unless explicitly unexported.

The *real* reason why it rebuilds GIT-VERSION-FILE is that the
Makefile says '-include GIT-VERSION-FILE', and uses the version info
to decide some parts of the build process.  (It'll also do a similar
thing with the $CFLAGS detection code.)  Since this influences the
actual commands executed, it seems sensible to run them even under
'make -n'.

> By the way, why would an offline make need
> /bin/sh: curl-config: command not found

From somewhere near the top of Makefile, which is definitely a
recommended read:

# Define NO_CURL if you do not have libcurl installed.  git-http-pull and
# git-http-push are not built, and you cannot use http:// and https://
# transports.


Next time please take the time to investigate at least a little bit
into your "issues" before starting to cry foul.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: [PATCH] Documentation/git-bundle.txt: Dumping contents of any bundle
From: Jeff King @ 2009-01-01 19:21 UTC
  To: jidanni; +Cc: Nicolas Pitre, gitster, mdl123, spearce, git
In-Reply-To: <87prj7mz50.fsf_-_@jidanni.org>

On Thu, Jan 01, 2009 at 12:24:59PM +0800, jidanni@jidanni.org wrote:

> JK> AFAIK, there is no tool to try salvaging strings from an incomplete pack
> JK> (and you can't just run "strings" because the deltas are zlib
> JK> compressed). So if I were in the police forensics department, I think I
> JK> would read Documentation/technical/pack-format.txt and start hacking a
> JK> solution as quickly as possible.
> 
> Hogwash. Patch follows. Maybe even better methods are available.
> [...]
> +$ sed '/^PACK/,$!d' mybundle.bun > mybundle.pack
> +$ git unpack-objects < mybundle.pack
> +$ cd .git/objects
> +$ ls ??/*|tr -d /|git cat-file --batch-check
> +$ ls ??/*|tr -d /|git cat-file --batch

Sorry, no, but your method does not work in the case I described: a thin
pack with deltas. In that case, git unpack-objects cannot unpack the
object since it lacks the delta base, and will skip it. For example:

  # create a bundle with a thin delta blob
  mkdir one && cd one && git init
  cp /usr/share/dict/words . && git add words && git commit -m one
  echo SECRET MESSAGE >>words && git add words && git commit -m two
  git bundle create ../mybundle.bun HEAD^..

  # now try to fetch from it
  mkdir ../two && cd ../two && git init
  git bundle unbundle ../mybundle.bun
  # produces:
  # error: Repository lacks these prerequisite commits:
  # error: b7d1a0ca98ca0e997d4222459d6fc1c9edae6a3f one

  # so try to recover
  sed '/^PACK/,$!d' ../mybundle.bun > mybundle.pack
  git unpack-objects < mybundle.pack
  # Unpacking objects: 100% (3/3), done.
  # fatal: unresolved deltas left after unpacking
  cd .git/objects
  # this will show just two objects: the commit and the tree
  ls ??/* | tr -d /
  # confirm that we don't have the blob or the string of interest
  ls ??/* | tr -d / | git cat-file --batch | grep SECRET

It is nice that unpack-objects continues at all thanks to the recent
improvements by Nicolas, so you may be able to get some of the data out.
But it just skips over any unresolvable deltas, since we can't make a
useful object from them. Maybe it would be worth adding an option to
dump the uncompressed deltas to a file or directory so you could run
"strings" on them to recover some of the data.
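Any such salvage tool starts with the step that the `sed '/^PACK/,$!d'` line performs. It finds where the pack starts inside the bundle and reads the pack's 12-byte header, as described in Documentation/technical/pack-format.txt. That header is the "PACK" signature, followed by a version and an object count, both 4-byte network byte order.

Below is a minimal C sketch of that step. It assumes the pack begins right after the empty line that ends the bundle's text header. `find_pack` and `struct pack_info` are hypothetical names. Decoding the zlib-compressed objects that follow is not attempted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of the 12-byte pack header (pack-format.txt). */
struct pack_info {
	uint32_t version;
	uint32_t nr_objects;
};

/* Read a 4-byte network-byte-order (big-endian) integer. */
static uint32_t get_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Assumes the bundle header (text lines) ends with an empty line and the
 * pack follows immediately. Returns the pack's offset in buf and fills
 * *info, or returns -1 if no well-formed pack header is found there.
 */
static long find_pack(const unsigned char *buf, size_t len,
		      struct pack_info *info)
{
	for (size_t i = 0; i + 1 < len; i++) {
		if (buf[i] != '\n' || buf[i + 1] != '\n')
			continue;
		size_t off = i + 2;	/* first byte after the empty line */
		if (len - off < 12 || memcmp(buf + off, "PACK", 4) != 0)
			return -1;
		info->version = get_be32(buf + off + 4);
		info->nr_objects = get_be32(buf + off + 8);
		return (long)off;
	}
	return -1;
}
```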

-Peff

* got wet with make --dry-run
From: jidanni @ 2009-01-01 17:03 UTC
  To: git

Gentlemen, make --dry-run is booby trapped to still execute commands:
$ (cd Documentation; make --dry-run); find -mtime -1 -type f
./Documentation/doc.dep
./GIT-VERSION-FILE
Forgot $(MAKEFLAGS)? (info "(make)Options/Recursion").
By the way, why would an offline make need
/bin/sh: curl-config: command not found

* [PATCH] Documentation/git-merge: at least one <remote> not two
From: jidanni @ 2009-01-01 18:41 UTC
  To: gitster; +Cc: git

Make SYNOPSIS match usage message

Signed-off-by: jidanni <jidanni@jidanni.org>
---
 Documentation/git-merge.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
index f7be584..a3ac828 100644
--- a/Documentation/git-merge.txt
+++ b/Documentation/git-merge.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git merge' [-n] [--stat] [--no-commit] [--squash] [-s <strategy>]...
-	[-m <msg>] <remote> <remote>...
+	[-m <msg>] <remote>...
 'git merge' <msg> HEAD <remote>...
 
 DESCRIPTION
-- 
1.6.0.6

* [PATCH] Documentation/git-merge: deprecated syntax moved to end
From: jidanni @ 2009-01-01 18:41 UTC
  To: gitster; +Cc: git

Move the deprecated syntax to the end of the document.
Or please at least stamp it *deprecated* in the SYNOPSIS, in case the
user reads no further down the page.

Signed-off-by: jidanni <jidanni@jidanni.org>
---
 Documentation/git-merge.txt |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
index a3ac828..e619c9f 100644
--- a/Documentation/git-merge.txt
+++ b/Documentation/git-merge.txt
@@ -11,18 +11,12 @@ SYNOPSIS
 [verse]
 'git merge' [-n] [--stat] [--no-commit] [--squash] [-s <strategy>]...
 	[-m <msg>] <remote>...
-'git merge' <msg> HEAD <remote>...
 
 DESCRIPTION
 -----------
 This is the top-level interface to the merge machinery
 which drives multiple merge strategy scripts.
 
-The second syntax (<msg> `HEAD` <remote>) is supported for
-historical reasons.  Do not use it from the command line or in
-new scripts.  It is the same as `git merge -m <msg> <remote>`.
-
-
 OPTIONS
 -------
 include::merge-options.txt[]
@@ -211,6 +205,12 @@ You can work through the conflict with a number of tools:
    common ancestor, 'git show :2:filename' shows the HEAD
    version and 'git show :3:filename' shows the remote version.
 
+DEPRECATED SYNTAX
+-----------------
+There is also a `git merge <msg> HEAD <remote>...` syntax supported
+for historical reasons. Do not use it from the command line or in new
+scripts. It is the same as `git merge -m <msg> <remote>`.
+
 SEE ALSO
 --------
 linkgit:git-fmt-merge-msg[1], linkgit:git-pull[1],
-- 
1.6.0.6

^ permalink raw reply related

* Re: Extracting a single commit or object
From: Miklos Vajna @ 2009-01-01 18:08 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: yitzhakbg, git
In-Reply-To: <alpine.DEB.1.00.0901011747580.30769@pacific.mpi-cbg.de>

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

On Thu, Jan 01, 2009 at 05:52:49PM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> If you activated bash completion, you can even complete monsters like 
> this:
> 
> 	git show \
> v1.5.3:v1.5.3:t/t4013/diff.diff-tree_--cc_--patch-with-stat_--summary_master

Wow, that's really a monster. After removing the first leading v1.5.3:,
it works. (If this is a bash completion bug, I can't reproduce it.)

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: git-difftool
From: Matthieu Moy @ 2009-01-01 17:58 UTC (permalink / raw)
  To: David Aguilar; +Cc: git
In-Reply-To: <402731c90812311211p548c49d3p100f79ddee7163b0@mail.gmail.com>

"David Aguilar" <davvid@gmail.com> writes:

> Hmm... in theory, yes, but in practice, no.
> xxdiff is too gimp to handle what 'git diff' hands it =)

As done with "vimdiff" in another message, simply write a one-liner
wrapper script that calls xxdiff $2 $3, and call this wrapper script.

-- 
Matthieu

^ permalink raw reply

* Re: [PATCH] Documentation/git-bundle.txt: Dumping contents of any bundle
From: Johannes Schindelin @ 2009-01-01 17:03 UTC (permalink / raw)
  To: jidanni; +Cc: git
In-Reply-To: <87prj7mz50.fsf_-_@jidanni.org>

Hi,

On Thu, 1 Jan 2009, jidanni@jidanni.org wrote:

> >>>>> "JK" == Jeff King <peff@peff.net> writes:
> 
> JK> AFAIK, there is no tool to try salvaging strings from an incomplete pack
> JK> (and you can't just run "strings" because the deltas are zlib
> JK> compressed). So if I were in the police forensics department, I think I
> JK> would read Documentation/technical/pack-format.txt and start hacking a
> JK> solution as quickly as possible.
> 
> Hogwash. Patch follows. Maybe even better methods are available.
> 
> Signed-off-by: jidanni <jidanni@jidanni.org>
> ---

Just for the record: this is in so many ways not a commit message I want 
to have in git.git.  I hope it is not applied.

Ciao,
Dscho

^ permalink raw reply

* Re: is there an easier way to do this ?
From: Johannes Schindelin @ 2009-01-01 16:55 UTC (permalink / raw)
  To: Zorba; +Cc: git
In-Reply-To: <gjeacg$3r5$4@ger.gmane.org>

Hi,

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

On Tue, 30 Dec 2008, Zorba wrote:

> > "Zorba" <cr@altmore.co.uk> writes:
> >
> >> ok, now I'm in this for real, archiving versions of our website project 
> >> (5k
> >> files approx)
> >>
> >> so here is the workflow:
> >>
> >> - copy version 1 files into GIT dir
> >>
> >> - open git bash
> >>
> >> $ git init
> >>
> >> $ git add .
> >>
> >> $ git commit -m "version1"
> >>
> >> all vanilla ? cool
> >> next job = store version 2 [...]
> >
> > Check out contrib/fast-import/import-tars.perl
>
> thanks Jakub, but I don't mind copying the versions in by hand and 
> running the git commits on them sequentially.

It's not only about how much work you are doing.  It's also about 
preserving as much metadata as possible.

Ciao,
Dscho

^ permalink raw reply

* Re: Extracting a single commit or object
From: Johannes Schindelin @ 2009-01-01 16:52 UTC (permalink / raw)
  To: yitzhakbg; +Cc: git
In-Reply-To: <21223948.post@talk.nabble.com>

Hi,

On Tue, 30 Dec 2008, yitzhakbg wrote:

> How would I extract a single commit from a repository by its SHA1 (or
> any other treeish)?

Your question is not precise enough to answer.  Are you looking for

- the commit message?
- the patch?
- all the files referenced by that commit?
- all the files _and revisions_ referenced by that commit?

The answer depends quite a lot on the question...

> For that matter, how is any one single object extracted? Examples please.

The user-friendly way to look at a tree is

	git show HEAD:Documentation/

or some such.  Likewise, you can inspect single blobs like this:

	git show HEAD:README

If you activated bash completion, you can even complete monsters like 
this:

	git show \
v1.5.3:v1.5.3:t/t4013/diff.diff-tree_--cc_--patch-with-stat_--summary_master

Hth,
Dscho

^ permalink raw reply

* [PATCH 3/3] bash completions: Add the --patience option
From: Johannes Schindelin @ 2009-01-01 16:39 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901011730190.30769@pacific.mpi-cbg.de>


Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/completion/git-completion.bash |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index a046441..b3e1e22 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -777,6 +777,7 @@ _git_diff ()
 			--no-prefix --src-prefix= --dst-prefix=
 			--base --ours --theirs
 			--inter-hunk-context=
+			--patience
 			"
 		return
 		;;
@@ -969,6 +970,7 @@ _git_log ()
 			--parents --children --full-history
 			--merge
 			--inter-hunk-context=
+			--patience
 			"
 		return
 		;;
-- 
1.6.1.rc3.412.ga72b

^ permalink raw reply related

* [PATCH 2/3] Introduce the diff option '--patience'
From: Johannes Schindelin @ 2009-01-01 16:39 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901011730190.30769@pacific.mpi-cbg.de>


This commit teaches Git to produce diff output using the patience diff
algorithm with the diff option '--patience'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	git log --check complains about this patch, for obvious reasons.

 Documentation/diff-options.txt |    3 +
 Makefile                       |    2 +-
 diff.c                         |    2 +
 t/t4033-diff-patience.sh       |  168 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 174 insertions(+), 1 deletions(-)
 create mode 100755 t/t4033-diff-patience.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 671f533..15ef35a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -36,6 +36,9 @@ endif::git-format-patch[]
 --patch-with-raw::
 	Synonym for "-p --raw".
 
+--patience::
+	Generate a diff using the "patience diff" algorithm.
+
 --stat[=width[,name-width]]::
 	Generate a diffstat.  You can override the default
 	output width for 80-column terminal by "--stat=width".
diff --git a/Makefile b/Makefile
index 154cf34..2217873 100644
--- a/Makefile
+++ b/Makefile
@@ -1287,7 +1287,7 @@ $(LIB_FILE): $(LIB_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(LIB_OBJS)
 
 XDIFF_OBJS=xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
-	xdiff/xmerge.o
+	xdiff/xmerge.o xdiff/xpatience.o
 $(XDIFF_OBJS): xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
diff --git a/diff.c b/diff.c
index 56b80f9..67718b7 100644
--- a/diff.c
+++ b/diff.c
@@ -2472,6 +2472,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->xdl_opts |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--ignore-space-at-eol"))
 		options->xdl_opts |= XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--patience"))
+		options->xdl_opts |= XDF_PATIENCE_DIFF;
 
 	/* flags options */
 	else if (!strcmp(arg, "--binary")) {
diff --git a/t/t4033-diff-patience.sh b/t/t4033-diff-patience.sh
new file mode 100755
index 0000000..63c1b00
--- /dev/null
+++ b/t/t4033-diff-patience.sh
@@ -0,0 +1,168 @@
+#!/bin/sh
+
+test_description='patience diff algorithm'
+
+. ./test-lib.sh
+
+cat > file1 << EOF
+#include <stdio.h>
+
+// Frobs foo heartily
+int frobnitz(int foo)
+{
+    int i;
+    for(i = 0; i < 10; i++)
+    {
+        printf("Your answer is: ");
+        printf("%d\n", foo);
+    }
+}
+
+int fact(int n)
+{
+    if(n > 1)
+    {
+        return fact(n-1) * n;
+    }
+    return 1;
+}
+
+int main(int argc, char **argv)
+{
+    frobnitz(fact(10));
+}
+EOF
+
+cat > file2 << EOF
+#include <stdio.h>
+
+int fib(int n)
+{
+    if(n > 2)
+    {
+        return fib(n-1) + fib(n-2);
+    }
+    return 1;
+}
+
+// Frobs foo heartily
+int frobnitz(int foo)
+{
+    int i;
+    for(i = 0; i < 10; i++)
+    {
+        printf("%d\n", foo);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    frobnitz(fib(10));
+}
+EOF
+
+cat > expect << EOF
+diff --git a/file1 b/file2
+index 6faa5a3..e3af329 100644
+--- a/file1
++++ b/file2
+@@ -1,26 +1,25 @@
+ #include <stdio.h>
+ 
++int fib(int n)
++{
++    if(n > 2)
++    {
++        return fib(n-1) + fib(n-2);
++    }
++    return 1;
++}
++
+ // Frobs foo heartily
+ int frobnitz(int foo)
+ {
+     int i;
+     for(i = 0; i < 10; i++)
+     {
+-        printf("Your answer is: ");
+         printf("%d\n", foo);
+     }
+ }
+ 
+-int fact(int n)
+-{
+-    if(n > 1)
+-    {
+-        return fact(n-1) * n;
+-    }
+-    return 1;
+-}
+-
+ int main(int argc, char **argv)
+ {
+-    frobnitz(fact(10));
++    frobnitz(fib(10));
+ }
+EOF
+
+test_expect_success 'patience diff' '
+
+	test_must_fail git diff --no-index --patience file1 file2 > output &&
+	test_cmp expect output
+
+'
+
+test_expect_success 'patience diff output is valid' '
+
+	mv file2 expect &&
+	git apply < output &&
+	test_cmp expect file2
+
+'
+
+cat > uniq1 << EOF
+1
+2
+3
+4
+5
+6
+EOF
+
+cat > uniq2 << EOF
+a
+b
+c
+d
+e
+f
+EOF
+
+cat > expect << EOF
+diff --git a/uniq1 b/uniq2
+index b414108..0fdf397 100644
+--- a/uniq1
++++ b/uniq2
+@@ -1,6 +1,6 @@
+-1
+-2
+-3
+-4
+-5
+-6
++a
++b
++c
++d
++e
++f
+EOF
+
+test_expect_success 'completely different files' '
+
+	test_must_fail git diff --no-index --patience uniq1 uniq2 > output &&
+	test_cmp expect output
+
+'
+
+test_done
-- 
1.6.1.rc3.412.ga72b

^ permalink raw reply related

* [PATCH 1/3] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-01 16:38 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901011730190.30769@pacific.mpi-cbg.de>


The patience diff algorithm produces slightly more intuitive output
than the classic Myers algorithm, as it does not try to minimize the
number of +/- lines first, but tries to preserve the lines that are
unique.

To this end, it first determines lines that are unique in both files,
then the maximal sequence which preserves the order (relative to both
files) is extracted.
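
A minimal sketch of the first step, loosely modeled on the patch's
insert_record(): lines of the first file go into an open-addressing hash
table with linear probing, and a second occurrence in either file marks
the entry as non-unique.  Assumptions not in the original: lines are
plain NUL-terminated strings compared with strcmp(), the hash is a toy
djb2 function (the patch reuses xdiff's record hashes), the table has a
fixed size, NON_UNIQUE is -1 rather than ULONG_MAX, and the names slot,
insert_line() and fill() are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NON_UNIQUE (-1L) /* the patch uses ULONG_MAX for this marker */
#define TABLE_SIZE 64    /* fixed for the sketch; the patch allocates 2 * count1 */

struct slot {
	const char *text;  /* NULL marks an unused slot */
	long line1, line2; /* 1-based; line2 == 0: no match in file 2 yet */
};

/* Toy string hash (djb2); the patch reuses xdiff's per-record hash. */
static unsigned long hash_str(const char *s)
{
	unsigned long h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* pass 1 inserts lines of file 1; pass 2 looks up lines of file 2. */
static void insert_line(struct slot *table, const char *text, long line, int pass)
{
	unsigned long pos = hash_str(text) % TABLE_SIZE;

	while (table[pos].text) {
		if (strcmp(table[pos].text, text)) {
			pos = (pos + 1) % TABLE_SIZE; /* linear probing */
			continue;
		}
		/* A repeat in file 1, or a second hit in file 2, spoils uniqueness. */
		if (pass == 1 || table[pos].line2)
			table[pos].line2 = NON_UNIQUE;
		else
			table[pos].line2 = line;
		return;
	}
	if (pass == 2)
		return; /* the line only exists in file 2: it cannot be common */
	table[pos].text = text;
	table[pos].line1 = line;
}

static void fill(struct slot *table, const char **a, int na,
		const char **b, int nb)
{
	int i;

	assert(na < TABLE_SIZE); /* keeps a free slot, so probing terminates */
	memset(table, 0, TABLE_SIZE * sizeof(*table));
	for (i = 0; i < na; i++)
		insert_line(table, a[i], i + 1, 1);
	for (i = 0; i < nb; i++)
		insert_line(table, b[i], i + 1, 2);
}
```

After fill(), every slot with line2 > 0 holds a line that is unique in
both files, together with its 1-based position in each; those pairs are
the input for the longest-ordered-sequence step.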

Starting from this initial set of common lines, the rest of the lines
is handled recursively, with Myers' algorithm as a fallback when
the patience algorithm fails (due to no common unique lines).
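
The whole recursion can be sketched in a few dozen lines of C.  This is
an illustration under stated assumptions, not the patch's code: lines
are NUL-terminated strings, uniqueness is counted by a quadratic scan
instead of a hash map, identical leading and trailing lines of each
range are peeled off first (roughly what walk_common_sequence() does
around each anchor), and, as a deliberate simplification, a range
without common unique lines is simply marked as changed where the patch
falls back to Myers' algorithm.  The names patience(), occurrences() and
patience_diff_lines() are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINES 256 /* sketch limit; the patch allocates its tables */

/* Number of times s occurs in lines[lo, hi) (0-based, half-open). */
static int occurrences(const char **lines, int lo, int hi, const char *s)
{
	int c = 0;

	for (; lo < hi; lo++)
		c += !strcmp(lines[lo], s);
	return c;
}

/* Set chg1[i] / chg2[j] to 1 for changed lines of a[lo1, hi1) / b[lo2, hi2). */
static void patience(const char **a, int lo1, int hi1,
		const char **b, int lo2, int hi2, char *chg1, char *chg2)
{
	int p1[MAX_LINES], p2[MAX_LINES], tails[MAX_LINES], prev[MAX_LINES];
	int n = 0, longest = 0, i, j;

	assert(hi1 - lo1 <= MAX_LINES);

	/* Grow the common ranges: skip identical leading and trailing lines. */
	while (lo1 < hi1 && lo2 < hi2 && !strcmp(a[lo1], b[lo2]))
		lo1++, lo2++;
	while (lo1 < hi1 && lo2 < hi2 && !strcmp(a[hi1 - 1], b[hi2 - 1]))
		hi1--, hi2--;

	/* Pairs of lines that are unique in both ranges, in file-a order. */
	for (i = lo1; i < hi1; i++) {
		if (occurrences(a, lo1, hi1, a[i]) != 1 ||
		    occurrences(b, lo2, hi2, a[i]) != 1)
			continue;
		for (j = lo2; strcmp(a[i], b[j]); j++)
			; /* the unique partner is guaranteed to exist */
		p1[n] = i;
		p2[n] = j;
		n++;
	}

	if (!n) {
		/* SIMPLIFICATION: the patch falls back to Myers' algorithm here. */
		for (i = lo1; i < hi1; i++)
			chg1[i] = 1;
		for (j = lo2; j < hi2; j++)
			chg2[j] = 1;
		return;
	}

	/* Longest run of pairs whose p2 also increases (patience sorting). */
	for (i = 0; i < n; i++) {
		int left = -1, right = longest;

		while (left + 1 < right) {
			int middle = (left + right) / 2;
			if (p2[tails[middle]] > p2[i])
				right = middle;
			else
				left = middle;
		}
		prev[i] = left < 0 ? -1 : tails[left];
		tails[left + 1] = i;
		if (left + 1 == longest)
			longest++;
	}

	/* Recurse into the gaps between anchors, walking the chain backwards. */
	for (i = tails[longest - 1]; i >= 0; i = prev[i]) {
		patience(a, p1[i] + 1, hi1, b, p2[i] + 1, hi2, chg1, chg2);
		hi1 = p1[i];
		hi2 = p2[i];
	}
	patience(a, lo1, hi1, b, lo2, hi2, chg1, chg2);
}

/* Hypothetical entry point comparing na lines of a with nb lines of b. */
static void patience_diff_lines(const char **a, int na, const char **b, int nb,
		char *chg1, char *chg2)
{
	memset(chg1, 0, na);
	memset(chg2, 0, nb);
	patience(a, 0, na, b, 0, nb, chg1, chg2);
}
```

For example, diffing {"x", "A", "B", "C"} against {"C", "A", "B", "y"}
keeps "A" and "B" as anchors; "C" crosses them, so it is not part of the
longest ordered sequence and shows up as deleted on one side and added
on the other.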

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 xdiff/xdiff.h     |    1 +
 xdiff/xdiffi.c    |    3 +
 xdiff/xdiffi.h    |    2 +
 xdiff/xpatience.c |  374 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 380 insertions(+), 0 deletions(-)
 create mode 100644 xdiff/xpatience.c

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 361f802..4da052a 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,6 +32,7 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
+#define XDF_PATIENCE_DIFF (1 << 5)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
 #define XDL_PATCH_NORMAL '-'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 9d0324a..3e97462 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -329,6 +329,9 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	xdalgoenv_t xenv;
 	diffdata_t dd1, dd2;
 
+	if (xpp->flags & XDF_PATIENCE_DIFF)
+		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
+
 	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
 
 		return -1;
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index 3e099dc..ad033a8 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -55,5 +55,7 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr);
 void xdl_free_script(xdchange_t *xscr);
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
+int xdl_do_patience_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
+		xdfenv_t *env);
 
 #endif /* #if !defined(XDIFFI_H) */
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
new file mode 100644
index 0000000..6687940
--- /dev/null
+++ b/xdiff/xpatience.c
@@ -0,0 +1,374 @@
+/*
+ *  LibXDiff by Davide Libenzi ( File Differential Library )
+ *  Copyright (C) 2003-2009 Davide Libenzi, Johannes E. Schindelin
+ *
+ *  This library is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ *
+ *  This library is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  Lesser General Public License for more details.
+ *
+ *  You should have received a copy of the GNU Lesser General Public
+ *  License along with this library; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *  Davide Libenzi <davidel@xmailserver.org>
+ *
+ */
+#include "xinclude.h"
+#include "xtypes.h"
+#include "xdiff.h"
+
+/*
+ * The basic idea of patience diff is to find lines that are unique in
+ * both files.  These are intuitively the ones that we want to see as
+ * common lines.
+ *
+ * The maximal ordered sequence of such line pairs (where ordered means
+ * that the order in the sequence agrees with the order of the lines in
+ * both files) naturally defines an initial set of common lines.
+ *
+ * Now, the algorithm tries to extend the set of common lines by growing
+ * the line ranges where the files have identical lines.
+ *
+ * Between those common lines, the patience diff algorithm is applied
+ * recursively, until no unique line pairs can be found; these line ranges
+ * are handled by the well-known Myers algorithm.
+ */
+
+#define NON_UNIQUE ULONG_MAX
+
+/*
+ * This is a hash mapping from line hash to line numbers in the first and
+ * second file.
+ */
+struct hashmap {
+	int nr, alloc;
+	struct entry {
+		unsigned long hash;
+		/*
+		 * 0 = unused entry, 1 = first line, 2 = second, etc.
+		 * line2 is NON_UNIQUE if the line is not unique
+		 * in either the first or the second file.
+		 */
+		unsigned long line1, line2;
+		/*
+		 * "next" & "previous" are used for the longest common
+		 * sequence;
+		 * initially, "next" reflects only the order in file1.
+		 */
+		struct entry *next, *previous;
+	} *entries, *first, *last;
+	/* were common records found? */
+	unsigned long has_matches;
+	mmfile_t *file1, *file2;
+	xdfenv_t *env;
+	xpparam_t const *xpp;
+};
+
+/* The argument "pass" is 1 for the first file, 2 for the second. */
+static void insert_record(int line, struct hashmap *map, int pass)
+{
+	xrecord_t **records = pass == 1 ?
+		map->env->xdf1.recs : map->env->xdf2.recs;
+	xrecord_t *record = records[line - 1], *other;
+	/*
+	 * After xdl_prepare_env() (or more precisely, due to
+	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
+	 * is _not_ the hash anymore, but a linearized version of it.  In
+	 * other words, the "ha" member is guaranteed to start with 0 and
+	 * the second record's ha can only be 0 or 1, etc.
+	 *
+	 * So we multiply ha by 2 in the hope that the hashing was
+	 * "unique enough".
+	 */
+	int index = (int)((record->ha << 1) % map->alloc);
+
+	while (map->entries[index].line1) {
+		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
+		if (map->entries[index].hash != record->ha ||
+				!xdl_recmatch(record->ptr, record->size,
+					other->ptr, other->size,
+					map->xpp->flags)) {
+			if (++index >= map->alloc)
+				index = 0;
+			continue;
+		}
+		if (pass == 2)
+			map->has_matches = 1;
+		if (pass == 1 || map->entries[index].line2)
+			map->entries[index].line2 = NON_UNIQUE;
+		else
+			map->entries[index].line2 = line;
+		return;
+	}
+	if (pass == 2)
+		return;
+	map->entries[index].line1 = line;
+	map->entries[index].hash = record->ha;
+	if (!map->first)
+		map->first = map->entries + index;
+	if (map->last) {
+		map->last->next = map->entries + index;
+		map->entries[index].previous = map->last;
+	}
+	map->last = map->entries + index;
+	map->nr++;
+}
+
+/*
+ * This function has to be called for each recursion into the inter-hunk
+ * parts, as previously non-unique lines can become unique when being
+ * restricted to a smaller part of the files.
+ *
+ * It is assumed that env has been prepared using xdl_prepare().
+ */
+static int fill_hashmap(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		struct hashmap *result,
+		int line1, int count1, int line2, int count2)
+{
+	result->file1 = file1;
+	result->file2 = file2;
+	result->xpp = xpp;
+	result->env = env;
+
+	/* We know exactly how large we want the hash map */
+	result->alloc = count1 * 2;
+	result->entries = (struct entry *)
+		xdl_malloc(result->alloc * sizeof(struct entry));
+	if (!result->entries)
+		return -1;
+	memset(result->entries, 0, result->alloc * sizeof(struct entry));
+
+	/* First, fill with entries from the first file */
+	while (count1--)
+		insert_record(line1++, result, 1);
+
+	/* Then search for matches in the second file */
+	while (count2--)
+		insert_record(line2++, result, 2);
+
+	return 0;
+}
+
+/*
+ * Find the longest sequence with a smaller last element (meaning a smaller
+ * line2, as we construct the sequence with entries ordered by line1).
+ */
+static int binary_search(struct entry **sequence, int longest,
+		struct entry *entry)
+{
+	int left = -1, right = longest;
+
+	while (left + 1 < right) {
+		int middle = (left + right) / 2;
+		/* by construction, no two entries can be equal */
+		if (sequence[middle]->line2 > entry->line2)
+			right = middle;
+		else
+			left = middle;
+	}
+	/* return the index in "sequence", _not_ the sequence length */
+	return left;
+}
+
+/*
+ * The idea is to start with the list of common unique lines sorted by
+ * the order in file1.  For each of these pairs, the longest (partial)
+ * sequence whose last element's line2 is smaller is determined.
+ *
+ * For efficiency, the sequences are kept in a list containing exactly one
+ * item per sequence length: the sequence with the smallest last
+ * element (in terms of line2).
+ */
+static struct entry *find_longest_common_sequence(struct hashmap *map)
+{
+	struct entry **sequence = xdl_malloc(map->nr * sizeof(struct entry *));
+	int longest = 0, i;
+	struct entry *entry;
+
+	for (entry = map->first; entry; entry = entry->next) {
+		if (!entry->line2 || entry->line2 == NON_UNIQUE)
+			continue;
+		i = binary_search(sequence, longest, entry);
+		entry->previous = i < 0 ? NULL : sequence[i];
+		sequence[++i] = entry;
+		if (i == longest)
+			longest++;
+	}
+
+	/* No common unique lines were found */
+	if (!longest)
+		return NULL;
+
+	/* Iterate starting at the last element, adjusting the "next" members */
+	entry = sequence[longest - 1];
+	entry->next = NULL;
+	while (entry->previous) {
+		entry->previous->next = entry;
+		entry = entry->previous;
+	}
+	return entry;
+}
+
+static int match(struct hashmap *map, int line1, int line2)
+{
+	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
+	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
+	return xdl_recmatch(record1->ptr, record1->size,
+		record2->ptr, record2->size, map->xpp->flags);
+}
+
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		int line1, int count1, int line2, int count2);
+
+static int walk_common_sequence(struct hashmap *map, struct entry *first,
+		int line1, int count1, int line2, int count2)
+{
+	int end1 = line1 + count1, end2 = line2 + count2;
+	int next1, next2;
+
+	for (;;) {
+		/* Try to grow the line ranges of common lines */
+		if (first) {
+			next1 = first->line1;
+			next2 = first->line2;
+			while (next1 > line1 && next2 > line2 &&
+					match(map, next1 - 1, next2 - 1)) {
+				next1--;
+				next2--;
+			}
+		} else {
+			next1 = end1;
+			next2 = end2;
+		}
+		while (line1 < next1 && line2 < next2 &&
+				match(map, line1, line2)) {
+			line1++;
+			line2++;
+		}
+
+		/* Recurse */
+		if (next1 > line1 || next2 > line2) {
+			struct hashmap submap;
+
+			memset(&submap, 0, sizeof(submap));
+			if (patience_diff(map->file1, map->file2,
+					map->xpp, map->env,
+					line1, next1 - line1,
+					line2, next2 - line2))
+				return -1;
+		}
+
+		if (!first)
+			return 0;
+
+		while (first->next &&
+				first->next->line1 == first->line1 + 1 &&
+				first->next->line2 == first->line2 + 1)
+			first = first->next;
+
+		line1 = first->line1 + 1;
+		line2 = first->line2 + 1;
+
+		first = first->next;
+	}
+}
+
+static int fall_back_to_classic_diff(struct hashmap *map,
+		int line1, int count1, int line2, int count2)
+{
+	/*
+	 * This probably does not work outside Git, since
+	 * we have a very simple mmfile structure.
+	 *
+	 * Note: ideally, we would reuse the prepared environment, but
+	 * the libxdiff interface does not (yet) allow for diffing only
+	 * ranges of lines instead of the whole files.
+	 */
+	mmfile_t subfile1, subfile2;
+	xpparam_t xpp;
+	xdfenv_t env;
+
+	subfile1.ptr = (char *)map->env->xdf1.recs[line1 - 1]->ptr;
+	subfile1.size = map->env->xdf1.recs[line1 + count1 - 2]->ptr +
+		map->env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
+	subfile2.ptr = (char *)map->env->xdf2.recs[line2 - 1]->ptr;
+	subfile2.size = map->env->xdf2.recs[line2 + count2 - 2]->ptr +
+		map->env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
+	xpp.flags = map->xpp->flags & ~XDF_PATIENCE_DIFF;
+	if (xdl_do_diff(&subfile1, &subfile2, &xpp, &env) < 0)
+		return -1;
+
+	memcpy(map->env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
+	memcpy(map->env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
+
+	return 0;
+}
+
+/*
+ * Recursively find the longest common sequence of unique lines,
+ * and if none was found, ask xdl_do_diff() to do the job.
+ *
+ * This function assumes that env was prepared with xdl_prepare_env().
+ */
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		int line1, int count1, int line2, int count2)
+{
+	struct hashmap map;
+	struct entry *first;
+	int result = 0;
+
+	/* trivial case: one side is empty */
+	if (!count1) {
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		return 0;
+	} else if (!count2) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		return 0;
+	}
+
+	memset(&map, 0, sizeof(map));
+	if (fill_hashmap(file1, file2, xpp, env, &map,
+			line1, count1, line2, count2))
+		return -1;
+
+	/* are there any matching lines at all? */
+	if (!map.has_matches) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		return 0;
+	}
+
+	first = find_longest_common_sequence(&map);
+	if (first)
+		result = walk_common_sequence(&map, first,
+			line1, count1, line2, count2);
+	else
+		result = fall_back_to_classic_diff(&map,
+			line1, count1, line2, count2);
+
+	return result;
+}
+
+int xdl_do_patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env)
+{
+	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
+		return -1;
+
+	/* environment is cleaned up in xdl_diff() */
+	return patience_diff(file1, file2, xpp, env,
+			1, env->xdf1.nrec, 1, env->xdf2.nrec);
+}
-- 
1.6.1.rc3.412.ga72b

^ permalink raw reply related

* [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-01 16:38 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: davidel, Francis Galiegue, Git ML
In-Reply-To: <20081104152351.GA21842@artemis.corp>


Nothing fancy, really, just a straightforward implementation of the
heavily under-documented and under-analyzed patience diff algorithm.

One thing is a bit ugly: the libxdiff interface does not allow
calculating diffs of ranges of lines.  So instead of reusing an initialized
environment (with line hashes and all), the simple structure of mmfile_t
is exploited to fake an mmfile_t of a file _part_, reusing the file's
buffer.

And this mmfile_t pair gets a new environment, recalculating hashes and 
all.
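
The trick described above can be sketched as follows: since an mmfile_t
is essentially a pointer plus a size, a range of lines can be presented
as a buffer that starts at the first line's record and ends after the
last line's record, without copying.  The struct buffer and struct
line_rec types below are hypothetical stand-ins for mmfile_t and xdiff's
xrecord_t (only their ptr/size fields are mirrored), and sub_buffer()
follows the arithmetic in fall_back_to_classic_diff().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for mmfile_t: just a pointer and a size. */
struct buffer {
	char *ptr;
	long size;
};

/* Hypothetical stand-in for xdiff's per-line record (start and length). */
struct line_rec {
	const char *ptr;
	long size;
};

/* Split buf into line records, each including its '\n'; returns the count. */
static int split_lines(const char *buf, long len, struct line_rec *recs, int max)
{
	const char *p = buf, *end = buf + len;
	int n = 0;

	while (p < end) {
		const char *nl = memchr(p, '\n', end - p);
		const char *stop = nl ? nl + 1 : end;

		assert(n < max);
		recs[n].ptr = p;
		recs[n].size = stop - p;
		n++;
		p = stop;
	}
	return n;
}

/*
 * Present lines [line, line + count) (1-based) as a buffer that points into
 * the original memory, without copying; same arithmetic as the patch's
 * fall_back_to_classic_diff().
 */
static struct buffer sub_buffer(const struct line_rec *recs, int line, int count)
{
	const struct line_rec *last;
	struct buffer b;

	assert(line >= 1 && count >= 1);
	last = &recs[line + count - 2];
	b.ptr = (char *)recs[line - 1].ptr;
	b.size = last->ptr + last->size - b.ptr;
	return b;
}
```

Because the faked buffer only borrows memory from the original file, it
needs no allocation of its own; the price, as described above, is that
the new environment recomputes the hashes for those lines.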

Davide, I think it would be easier to refactor xdl_do_diff() to take line
ranges, and use that interface both in xpatience.c and in xmerge.c.  
(Although I do not know if you took xmerge.c at all; you seemed a bit 
reluctant about it back when I sent it to you.)

For those interested in studying the code, I suggest starting with the 
short comment at the beginning of xpatience.c and then working yourself up 
from the end (i.e. xdl_do_patience_diff()).

It might be a good idea to think about using this code in our merging code
once it is well reviewed and tested, as it might help resolve a substantial
number of otherwise non-trivial merge conflicts.

Oh, and the bash completions are so trivial I did not even bother to test
them.

Happy new year.

Johannes Schindelin (3):
  Implement the patience diff algorithm
  Introduce the diff option '--patience'
  bash completions: Add the --patience option

 Documentation/diff-options.txt         |    3 +
 Makefile                               |    2 +-
 contrib/completion/git-completion.bash |    2 +
 diff.c                                 |    2 +
 t/t4033-diff-patience.sh               |  168 ++++++++++++++
 xdiff/xdiff.h                          |    1 +
 xdiff/xdiffi.c                         |    3 +
 xdiff/xdiffi.h                         |    2 +
 xdiff/xpatience.c                      |  374 ++++++++++++++++++++++++++++++++
 9 files changed, 556 insertions(+), 1 deletions(-)
 create mode 100755 t/t4033-diff-patience.sh
 create mode 100644 xdiff/xpatience.c

^ permalink raw reply

* Re: [PATCH] Documentation/gitcli.txt: dashed forms not allowed anymore
From: Miklos Vajna @ 2009-01-01 14:40 UTC (permalink / raw)
  To: jidanni; +Cc: git, gitster
In-Reply-To: <87ljtvmygk.fsf@jidanni.org>

[-- Attachment #1: Type: text/plain, Size: 444 bytes --]

On Thu, Jan 01, 2009 at 12:39:39PM +0800, jidanni@jidanni.org wrote:
> - * it's preferred to use the non dashed form of git commands, which means that
> -   you should prefer `"git foo"` to `"git-foo"`.
> + * it's required to use the non dashed form of git commands, which means that
> +   you must use `"git foo"` and not `"git-foo"`. The latter no longer works.

I would append: "unless you add the output of `git --exec-path` to your
PATH."

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

