Git development
* Re: [PATCH 1/3 v2] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901022220380.27818@racer>

Hi,

Here is the interdiff between v1 and v2 of PATCH 1/3.  As you can see, I
also added a cleanup of an intermediate environment (xdfenv_t).

Ciao,
Dscho

 xdiff/xpatience.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index 6687940..d01cbdd 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -309,6 +309,8 @@ static int fall_back_to_classic_diff(struct hashmap *map,
 	memcpy(map->env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
 	memcpy(map->env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
 
+	xdl_free_env(&env);
+
 	return 0;
 }
 
@@ -368,6 +370,15 @@ int xdl_do_patience_diff(mmfile_t *file1, mmfile_t *file2,
 	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
 		return -1;
 
+	/*
+	 * It is a pity that xdl_cleanup_records() not only marks those
+	 * lines as changes that are only present in one file, but also
+	 * lines that have multiple matches and happen to be in a "run
+	 * of discardable lines" that patience diff happens to split
+	 * differently.
+	 */
+	memset(env->xdf1.rchg, 0, env->xdf1.nrec);
+	memset(env->xdf2.rchg, 0, env->xdf2.nrec);
 	/* environment is cleaned up in xdl_diff() */
 	return patience_diff(file1, file2, xpp, env,
 			1, env->xdf1.nrec, 1, env->xdf2.nrec);
-- 
1.6.1.rc3.224.g95ac9

* [PATCH 1/3 v2] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901011151440.5086@localhost.localdomain>


The patience diff algorithm produces slightly more intuitive output
than the classic Myers algorithm, as it does not try to minimize the
number of +/- lines first, but tries to preserve the lines that are
unique.

To this end, it first determines lines that are unique in both files,
then the maximal sequence which preserves the order (relative to both
files) is extracted.

Starting from this initial set of common lines, the rest of the lines
is handled recursively, with Myers' algorithm as a fallback when
the patience algorithm fails (due to no common unique lines).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	I did not realize that xdl_prepare_env() initializes the arrays in
	rchg (which tell which lines are not common).  Unfortunately, there
	are ambiguities, e.g. with empty lines, and my implementation wanted
	to take other common lines, disagreeing with the previous
	initialization.

	Interdiff follows.
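
For readers who want to look at the "maximal sequence which preserves the
order" step in isolation, here is a minimal standalone sketch.  It is not
the code from the patch: longest_increasing() and its array-based interface
are made up for illustration, and it assumes the common unique lines are
given as a plain array of distinct line2 values listed in file1 order.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not part of xdiff: given line2[0..n-1] (distinct
 * values, listed in file1 order), store one longest strictly increasing
 * subsequence in out[] and return its length, or -1 if malloc() fails.
 */
static int longest_increasing(const int *line2, int n, int *out)
{
	int *tail, *prev;
	int longest = 0, i, k;

	if (n <= 0)
		return 0;
	/* tail[k]: index of the smallest last element of a run of length k+1 */
	tail = malloc(n * sizeof(*tail));
	/* prev[i]: index of the predecessor of element i, or -1 */
	prev = malloc(n * sizeof(*prev));
	if (!tail || !prev) {
		free(tail);
		free(prev);
		return -1;
	}

	for (i = 0; i < n; i++) {
		int left = -1, right = longest;

		/* find the longest run whose last value is smaller */
		while (left + 1 < right) {
			int middle = left + (right - left) / 2;
			if (line2[tail[middle]] > line2[i])
				right = middle;
			else
				left = middle;
		}
		prev[i] = left < 0 ? -1 : tail[left];
		tail[left + 1] = i;
		if (left + 1 == longest)
			longest++;
	}

	/* walk back from the last element of the longest run */
	for (i = tail[longest - 1], k = longest - 1; k >= 0; i = prev[i], k--)
		out[k] = line2[i];

	free(tail);
	free(prev);
	return longest;
}
```

The binary search keeps the whole step at O(n log n); the patch does the
same thing on hashmap entries, using their "previous" pointers instead of a
separate prev[] array.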

 xdiff/xdiff.h     |    1 +
 xdiff/xdiffi.c    |    3 +
 xdiff/xdiffi.h    |    2 +
 xdiff/xpatience.c |  385 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 391 insertions(+), 0 deletions(-)
 create mode 100644 xdiff/xpatience.c

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 361f802..4da052a 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,6 +32,7 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
+#define XDF_PATIENCE_DIFF (1 << 5)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
 #define XDL_PATCH_NORMAL '-'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 9d0324a..3e97462 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -329,6 +329,9 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	xdalgoenv_t xenv;
 	diffdata_t dd1, dd2;
 
+	if (xpp->flags & XDF_PATIENCE_DIFF)
+		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
+
 	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
 
 		return -1;
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index 3e099dc..ad033a8 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -55,5 +55,7 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr);
 void xdl_free_script(xdchange_t *xscr);
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
+int xdl_do_patience_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
+		xdfenv_t *env);
 
 #endif /* #if !defined(XDIFFI_H) */
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
new file mode 100644
index 0000000..d01cbdd
--- /dev/null
+++ b/xdiff/xpatience.c
@@ -0,0 +1,385 @@
+/*
+ *  LibXDiff by Davide Libenzi ( File Differential Library )
+ *  Copyright (C) 2003-2009 Davide Libenzi, Johannes E. Schindelin
+ *
+ *  This library is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ *
+ *  This library is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  Lesser General Public License for more details.
+ *
+ *  You should have received a copy of the GNU Lesser General Public
+ *  License along with this library; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *  Davide Libenzi <davidel@xmailserver.org>
+ *
+ */
+#include "xinclude.h"
+#include "xtypes.h"
+#include "xdiff.h"
+
+/*
+ * The basic idea of patience diff is to find lines that are unique in
+ * both files.  These are intuitively the ones that we want to see as
+ * common lines.
+ *
+ * The maximal ordered sequence of such line pairs (where ordered means
+ * that the order in the sequence agrees with the order of the lines in
+ * both files) naturally defines an initial set of common lines.
+ *
+ * Now, the algorithm tries to extend the set of common lines by growing
+ * the line ranges where the files have identical lines.
+ *
+ * Between those common lines, the patience diff algorithm is applied
+ * recursively, until no unique line pairs can be found; these line ranges
+ * are handled by the well-known Myers algorithm.
+ */
+
+#define NON_UNIQUE ULONG_MAX
+
+/*
+ * This is a hash mapping from line hash to line numbers in the first and
+ * second file.
+ */
+struct hashmap {
+	int nr, alloc;
+	struct entry {
+		unsigned long hash;
+		/*
+		 * 0 = unused entry, 1 = first line, 2 = second, etc.
+		 * line2 is NON_UNIQUE if the line is not unique
+		 * in either the first or the second file.
+		 */
+		unsigned long line1, line2;
+		/*
+		 * "next" & "previous" are used for the longest common
+		 * sequence;
+		 * initially, "next" reflects only the order in file1.
+		 */
+		struct entry *next, *previous;
+	} *entries, *first, *last;
+	/* were common records found? */
+	unsigned long has_matches;
+	mmfile_t *file1, *file2;
+	xdfenv_t *env;
+	xpparam_t const *xpp;
+};
+
+/* The argument "pass" is 1 for the first file, 2 for the second. */
+static void insert_record(int line, struct hashmap *map, int pass)
+{
+	xrecord_t **records = pass == 1 ?
+		map->env->xdf1.recs : map->env->xdf2.recs;
+	xrecord_t *record = records[line - 1], *other;
+	/*
+	 * After xdl_prepare_env() (or more precisely, due to
+	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
+	 * is _not_ the hash anymore, but a linearized version of it.  In
+	 * other words, the "ha" member is guaranteed to start with 0 and
+	 * the second record's ha can only be 0 or 1, etc.
+	 *
+	 * So we multiply ha by 2 in the hope that the hashing was
+	 * "unique enough".
+	 */
+	int index = (int)((record->ha << 1) % map->alloc);
+
+	while (map->entries[index].line1) {
+		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
+		if (map->entries[index].hash != record->ha ||
+				!xdl_recmatch(record->ptr, record->size,
+					other->ptr, other->size,
+					map->xpp->flags)) {
+			if (++index >= map->alloc)
+				index = 0;
+			continue;
+		}
+		if (pass == 2)
+			map->has_matches = 1;
+		if (pass == 1 || map->entries[index].line2)
+			map->entries[index].line2 = NON_UNIQUE;
+		else
+			map->entries[index].line2 = line;
+		return;
+	}
+	if (pass == 2)
+		return;
+	map->entries[index].line1 = line;
+	map->entries[index].hash = record->ha;
+	if (!map->first)
+		map->first = map->entries + index;
+	if (map->last) {
+		map->last->next = map->entries + index;
+		map->entries[index].previous = map->last;
+	}
+	map->last = map->entries + index;
+	map->nr++;
+}
+
+/*
+ * This function has to be called for each recursion into the inter-hunk
+ * parts, as previously non-unique lines can become unique when being
+ * restricted to a smaller part of the files.
+ *
+ * It is assumed that env has been prepared using xdl_prepare_env().
+ */
+static int fill_hashmap(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		struct hashmap *result,
+		int line1, int count1, int line2, int count2)
+{
+	result->file1 = file1;
+	result->file2 = file2;
+	result->xpp = xpp;
+	result->env = env;
+
+	/* We know exactly how large we want the hash map */
+	result->alloc = count1 * 2;
+	result->entries = (struct entry *)
+		xdl_malloc(result->alloc * sizeof(struct entry));
+	if (!result->entries)
+		return -1;
+	memset(result->entries, 0, result->alloc * sizeof(struct entry));
+
+	/* First, fill with entries from the first file */
+	while (count1--)
+		insert_record(line1++, result, 1);
+
+	/* Then search for matches in the second file */
+	while (count2--)
+		insert_record(line2++, result, 2);
+
+	return 0;
+}
+
+/*
+ * Find the longest sequence with a smaller last element (meaning a smaller
+ * line2, as we construct the sequence with entries ordered by line1).
+ */
+static int binary_search(struct entry **sequence, int longest,
+		struct entry *entry)
+{
+	int left = -1, right = longest;
+
+	while (left + 1 < right) {
+		int middle = (left + right) / 2;
+		/* by construction, no two entries can be equal */
+		if (sequence[middle]->line2 > entry->line2)
+			right = middle;
+		else
+			left = middle;
+	}
+	/* return the index in "sequence", _not_ the sequence length */
+	return left;
+}
+
+/*
+ * The idea is to start with the list of common unique lines sorted by
+ * the order in file1.  For each of these pairs, the longest (partial)
+ * sequence whose last element's line2 is smaller is determined.
+ *
+ * For efficiency, the sequences are kept in a list containing exactly one
+ * item per sequence length: the sequence with the smallest last
+ * element (in terms of line2).
+ */
+static struct entry *find_longest_common_sequence(struct hashmap *map)
+{
+	struct entry **sequence = xdl_malloc(map->nr * sizeof(struct entry *));
+	int longest = 0, i;
+	struct entry *entry;
+
+	for (entry = map->first; entry; entry = entry->next) {
+		if (!entry->line2 || entry->line2 == NON_UNIQUE)
+			continue;
+		i = binary_search(sequence, longest, entry);
+		entry->previous = i < 0 ? NULL : sequence[i];
+		sequence[++i] = entry;
+		if (i == longest)
+			longest++;
+	}
+
+	/* No common unique lines were found */
+	if (!longest)
+		return NULL;
+
+	/* Iterate starting at the last element, adjusting the "next" members */
+	entry = sequence[longest - 1];
+	entry->next = NULL;
+	while (entry->previous) {
+		entry->previous->next = entry;
+		entry = entry->previous;
+	}
+	return entry;
+}
+
+static int match(struct hashmap *map, int line1, int line2)
+{
+	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
+	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
+	return xdl_recmatch(record1->ptr, record1->size,
+		record2->ptr, record2->size, map->xpp->flags);
+}
+
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		int line1, int count1, int line2, int count2);
+
+static int walk_common_sequence(struct hashmap *map, struct entry *first,
+		int line1, int count1, int line2, int count2)
+{
+	int end1 = line1 + count1, end2 = line2 + count2;
+	int next1, next2;
+
+	for (;;) {
+		/* Try to grow the line ranges of common lines */
+		if (first) {
+			next1 = first->line1;
+			next2 = first->line2;
+			while (next1 > line1 && next2 > line2 &&
+					match(map, next1 - 1, next2 - 1)) {
+				next1--;
+				next2--;
+			}
+		} else {
+			next1 = end1;
+			next2 = end2;
+		}
+		while (line1 < next1 && line2 < next2 &&
+				match(map, line1, line2)) {
+			line1++;
+			line2++;
+		}
+
+		/* Recurse */
+		if (next1 > line1 || next2 > line2) {
+			struct hashmap submap;
+
+			memset(&submap, 0, sizeof(submap));
+			if (patience_diff(map->file1, map->file2,
+					map->xpp, map->env,
+					line1, next1 - line1,
+					line2, next2 - line2))
+				return -1;
+		}
+
+		if (!first)
+			return 0;
+
+		while (first->next &&
+				first->next->line1 == first->line1 + 1 &&
+				first->next->line2 == first->line2 + 1)
+			first = first->next;
+
+		line1 = first->line1 + 1;
+		line2 = first->line2 + 1;
+
+		first = first->next;
+	}
+}
+
+static int fall_back_to_classic_diff(struct hashmap *map,
+		int line1, int count1, int line2, int count2)
+{
+	/*
+	 * This probably does not work outside Git, since
+	 * we have a very simple mmfile structure.
+	 *
+	 * Note: ideally, we would reuse the prepared environment, but
+	 * the libxdiff interface does not (yet) allow for diffing only
+	 * ranges of lines instead of the whole files.
+	 */
+	mmfile_t subfile1, subfile2;
+	xpparam_t xpp;
+	xdfenv_t env;
+
+	subfile1.ptr = (char *)map->env->xdf1.recs[line1 - 1]->ptr;
+	subfile1.size = map->env->xdf1.recs[line1 + count1 - 2]->ptr +
+		map->env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
+	subfile2.ptr = (char *)map->env->xdf2.recs[line2 - 1]->ptr;
+	subfile2.size = map->env->xdf2.recs[line2 + count2 - 2]->ptr +
+		map->env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
+	xpp.flags = map->xpp->flags & ~XDF_PATIENCE_DIFF;
+	if (xdl_do_diff(&subfile1, &subfile2, &xpp, &env) < 0)
+		return -1;
+
+	memcpy(map->env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
+	memcpy(map->env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
+
+	xdl_free_env(&env);
+
+	return 0;
+}
+
+/*
+ * Recursively find the longest common sequence of unique lines,
+ * and if none was found, ask xdl_do_diff() to do the job.
+ *
+ * This function assumes that env was prepared with xdl_prepare_env().
+ */
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		int line1, int count1, int line2, int count2)
+{
+	struct hashmap map;
+	struct entry *first;
+	int result = 0;
+
+	/* trivial case: one side is empty */
+	if (!count1) {
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		return 0;
+	} else if (!count2) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		return 0;
+	}
+
+	memset(&map, 0, sizeof(map));
+	if (fill_hashmap(file1, file2, xpp, env, &map,
+			line1, count1, line2, count2))
+		return -1;
+
+	/* are there any matching lines at all? */
+	if (!map.has_matches) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		return 0;
+	}
+
+	first = find_longest_common_sequence(&map);
+	if (first)
+		result = walk_common_sequence(&map, first,
+			line1, count1, line2, count2);
+	else
+		result = fall_back_to_classic_diff(&map,
+			line1, count1, line2, count2);
+
+	return result;
+}
+
+int xdl_do_patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env)
+{
+	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
+		return -1;
+
+	/*
+	 * It is a pity that xdl_cleanup_records() not only marks those
+	 * lines as changes that are only present in one file, but also
+	 * lines that have multiple matches and happen to be in a "run
+	 * of discardable lines" that patience diff happens to split
+	 * differently.
+	 */
+	memset(env->xdf1.rchg, 0, env->xdf1.nrec);
+	memset(env->xdf2.rchg, 0, env->xdf2.nrec);
+	/* environment is cleaned up in xdl_diff() */
+	return patience_diff(file1, file2, xpp, env,
+			1, env->xdf1.nrec, 1, env->xdf2.nrec);
+}
-- 
1.6.1.rc3.224.g95ac9

* Re: how to track the history of a line in a file
From: david @ 2009-01-02 23:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20090102212655.GA24082@coredump.intra.peff.net>

On Fri, 2 Jan 2009, Jeff King wrote:

> The tricky thing here is what is "this line"? Using the line number
> isn't right, since it will change based on other content coming in and
> out of the file. You can keep drilling down by reblaming parent commits,
> but remember that each time you do that you are manually looking at the
> content and saying "Oh, this is the line I am still interested in." So
> a script would have to correlate the old version and new version of the
> line and realize how to follow the "interesting" thing.
>
> In your case, I think you want to see any commit in Makefile which
> changed a line with SUBLEVEL in it. Which is maybe easiest done as:
>
>  git log -z -p Makefile |
>    perl -0ne 'print if /\n[+-]SUBLEVEL/' |
>    tr '\0' '\n'
>
> and is pretty fast. But obviously we're leveraging some content-specific
> knowledge about what's in the Makefile.

Ok, I hacked together a quick bash script to try this:

#!/bin/bash
# usage: <script> <pattern> <file>

# Print the line number in front of the first ")" of a blame line.
# Unlike "cut -f 6", this also works for multi-word author names.
lineno () {
	echo "$1" | sed 's/^[^)]* \([0-9][0-9]*\)).*/\1/'
}

line=$(git blame -L "/$1/,+1" -M "$2")
COMMIT=${line%% *}
foundline=$(lineno "$line")
echo "$foundline $COMMIT"
echo "$line"
while [ -n "$COMMIT" ]; do
	echo "git blame -L $foundline,+1 -M $2 $COMMIT^"
	line=$(git blame -L "$foundline,+1" -M "$2" "$COMMIT^")
	COMMIT=${line%% *}
	foundline=$(lineno "$line")
	echo "$line"
done


the problem that this has is that line 3 of $COMMIT may not be line 3 of 
$COMMIT^, and if they aren't it ends up hunting down the wrong data

either that or I am not understanding the output of git blame properly 
(also very possible)

David Lang

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 21:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Clemens Buchacher, Adeodato Simó, Pierre Habouzit, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901020833000.5086@localhost.localdomain>

Hi,

On Fri, 2 Jan 2009, Linus Torvalds wrote:

> So I was hoping for something else than a single "in this case patience 
> diff works really well". I was hoping to see what it does in real life. 

Funnily, I think the test case you sent me is a pretty good example.  Look 
at this hunk (without patience diff):

@@ -4205,25 +4205,25 @@ out:
  */
 static int nfs4_xdr_dec_setattr(struct rpc_rqst *rqstp, __be32 *p, struct nfs_se
 {
-        struct xdr_stream xdr;
-        struct compound_hdr hdr;
-        int status;
-
-        xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
-        status = decode_compound_hdr(&xdr, &hdr);
-        if (status)
-                goto out;
-        status = decode_putfh(&xdr);
-        if (status)
-                goto out;
-        status = decode_setattr(&xdr, res);
-        if (status)
-                goto out;
+       struct xdr_stream xdr;
+       struct compound_hdr hdr;
+       int status;
+
+       xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);

... and then it goes on with the whole reindented function.  Compare this 
to the same hunk _with_ patience diff:

@@ -4205,25 +4205,25 @@ out:
  */
 static int nfs4_xdr_dec_setattr(struct rpc_rqst *rqstp, __be32 *p, struct nfs_se
 {
-        struct xdr_stream xdr;
-        struct compound_hdr hdr;
-        int status;
+       struct xdr_stream xdr;
+       struct compound_hdr hdr;
+       int status;

-        xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
-        status = decode_compound_hdr(&xdr, &hdr);
-        if (status)
-                goto out;
-        status = decode_putfh(&xdr);
-        if (status)
-                goto out;
-        status = decode_setattr(&xdr, res);
-        if (status)
-                goto out;
+       xdr_init_decode(&xdr, &rqstp->rq_rcv_buf, p);
+       status = decode_compound_hdr(&xdr, &hdr);

... and again the rest is reindented code.

The difference?  The common empty line.  I actually find it more readable 
to have the separation between the declarations and the code also in the 
diff.

This is just a very feeble example, but you get the idea from there.

Oh, you might object that the empty line is not unique.  But actually it 
is, because the patience diff recurses into ever smaller line ranges until 
it finally comes to such a small range that the empty line _is_ unique.

And in my analysis of the complexity, I stupidly left out that recursion 
part.  So: patience diff is _substantially_ more expensive than Myers'.
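
To make the point about uniqueness concrete, here is a small standalone
sketch.  The helpers count_in_range() and unique_in_both() are made up for
illustration and are not part of xdiff; they compare whole strings, where
the real code works on hashed records.

```c
#include <string.h>

/*
 * Hypothetical helper: count how often "line" occurs among
 * lines[first..first+count-1] (0-based indices).
 */
static int count_in_range(const char **lines, int first, int count,
		const char *line)
{
	int i, n = 0;

	for (i = first; i < first + count; i++)
		if (!strcmp(lines[i], line))
			n++;
	return n;
}

/*
 * A line can only serve as a patience anchor for a pair of ranges if
 * it occurs exactly once in each of them.
 */
static int unique_in_both(const char **a, int first1, int count1,
		const char **b, int first2, int count2, const char *line)
{
	return count_in_range(a, first1, count1, line) == 1 &&
		count_in_range(b, first2, count2, line) == 1;
}
```

With a five-line function body that contains two empty lines, the empty
line is not unique over the full range, but it is once the range shrinks to
the first three lines, which is what the recursion eventually does.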

> But when I tried it on the kernel archive, I get a core dump.

I also got this trying on the git.git repository, with commit 
be3cfa85([PATCH] Diff-tree-helper take two.)  Funnily, I almost got there 
trying the same before sending the first revision, but I got impatient and 
stopped early.  Tsk, tsk.

Now I tried with my complete clone of git.git together with all of my 
topic branches, and it runs through without segmentation fault.  Patch 
follows.

Ciao,
Dscho

* Re: how to track the history of a line in a file
From: david @ 2009-01-02 22:58 UTC (permalink / raw)
  To: Jeff Whiteside; +Cc: Jeff King, git
In-Reply-To: <3ab397d0901021349h4ebae0c1g460a0c8abd4ec072@mail.gmail.com>

On Fri, 2 Jan 2009, Jeff Whiteside wrote:

>> -oldline
>> +newline
>>
>> it's a 1-1 correspondence
>>
>> if it's instead
>> -oldline1
>> -oldline2
>> +newline1
>> +newline2
>
> what a neat idea.  i'm going to start malloc'ing 10,000 lines x 120
> chars for each file i add, and edit them so that no new lines replace
> removed lines unless it's intended that they were the same line.

I don't understand your comment. that isn't necessary to do the tracking 
that I'm needing (you don't have to look at every line, only some of the 
lines that are in the same hunk of the patch as the line(s) you are 
interested in)

in my situation the use case is config files. in them each line is usually 
edited independently of other lines (with stuff being added, usually, but 
not always on the end)

David Lang

* Re: how to track the history of a line in a file
From: Jeff Whiteside @ 2009-01-02 21:49 UTC (permalink / raw)
  To: david; +Cc: Jeff King, git
In-Reply-To: <alpine.DEB.1.10.0901021439080.21567@asgard.lang.hm>

> -oldline
> +newline
>
> it's a 1-1 correspondence
>
> if it's instead
> -oldline1
> -oldline2
> +newline1
> +newline2

what a neat idea.  i'm going to start malloc'ing 10,000 lines x 120
chars for each file i add, and edit them so that no new lines replace
removed lines unless it's intended that they were the same line.

* Re: git checkout does not warn about tags without corresponding commits
From: Junio C Hamano @ 2009-01-02 21:44 UTC (permalink / raw)
  To: Henrik Austad; +Cc: git
In-Reply-To: <200901021325.58049.henrik@austad.us>

Henrik Austad <henrik@austad.us> writes:

> I recently tried to do a checkout of (what I thought was the first) linux 
> kernel in the linux git repo.
>
> git checkout -b 2.6.11 v2.6.11

This should have barfed, and indeed I think it is a regression around
v1.5.5.  v1.5.4 and older git definitely fails to check out a tree object
like that.

* Re: how to track the history of a line in a file
From: david @ 2009-01-02 22:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20090102212655.GA24082@coredump.intra.peff.net>

On Fri, 2 Jan 2009, Jeff King wrote:

> On Fri, Jan 02, 2009 at 02:13:32PM -0800, david@lang.hm wrote:
>
>> I have a need to setup a repository where I'm storing config files, and I
>> need to be able to search the history of a particular line, not just when
>> the last edit of the line was (which is what I see from git blame)
>
> As you figured out, the "manual" way is to just keep reblaming from the
> parent of each blame. Recent versions of "git gui blame" have a "reblame
> from parent" option in the context menu which makes this a lot less
> painful.

unfortunately I am needing to do this from the command line.

>> 57f8f7b6 (Linus Torvalds 2008-10-23 20:06:52 -0700 3) SUBLEVEL = 28
>>
>> what I would want it to show would be a list of the commits that have
>> changed this line.
>
> The tricky thing here is what is "this line"? Using the line number
> isn't right, since it will change based on other content coming in and
> out of the file. You can keep drilling down by reblaming parent commits,
> but remember that each time you do that you are manually looking at the
> content and saying "Oh, this is the line I am still interested in." So
> a script would have to correlate the old version and new version of the
> line and realize how to follow the "interesting" thing.
>
> In your case, I think you want to see any commit in Makefile which
> changed a line with SUBLEVEL in it. Which is maybe easiest done as:
>
>  git log -z -p Makefile |
>    perl -0ne 'print if /\n[+-]SUBLEVEL/' |
>    tr '\0' '\n'
>
> and is pretty fast. But obviously we're leveraging some content-specific
> knowledge about what's in the Makefile.

using the line number shouldn't be _that_ hard because git knows what 
lines came and went from the file, so it can calculate the new line number 
(and does with the -M option)

In my case I would consider 'the same line' to be any lines in the diff 
that were taken out when this line was put in

so in the usual case (for me) of

-oldline
+newline

it's a 1-1 correspondence

if it's instead
-oldline1
-oldline2
+newline1
+newline2

I can't know for sure which oldline corresponds to the newline, but the 
odds are very good that they are related, so if I widen the search to 
cover each of the lines I am probably good.
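
The rule described above can be sketched roughly as follows.  This is only
an illustration under simplifying assumptions: struct change and
map_to_old() are made-up names, the two versions of the file differ in a
single changed region, and that region is described without context lines.

```c
/*
 * Hypothetical description of one changed region (no context lines):
 * old_count lines starting at old_start were replaced by new_count
 * lines starting at new_start.  Line numbers are 1-based.
 */
struct change {
	int old_start, old_count;
	int new_start, new_count;
};

/*
 * Map line "new_line" of the new file back to the old file, assuming
 * "c" is the only change between the two versions.  On return,
 * *first..*last (inclusive) is the range of old lines worth following;
 * the range is empty (*first > *last) if the line was purely added.
 */
static void map_to_old(const struct change *c, int new_line,
		int *first, int *last)
{
	if (new_line < c->new_start) {
		/* before the change: same line number */
		*first = *last = new_line;
	} else if (new_line >= c->new_start + c->new_count) {
		/* after the change: shift by the size difference */
		*first = *last = new_line - c->new_count + c->old_count;
	} else {
		/* inside the change: every removed line is a candidate */
		*first = c->old_start;
		*last = c->old_start + c->old_count - 1;
	}
}
```

For the oldline1/oldline2 case, asking about either new line returns both
old lines, which is the "widen the search" behaviour; a real tool would
have to apply this to every hunk of every commit it walks through.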

David Lang

* Re: why still no empty directory support in git
From: Asheesh Laroia @ 2009-01-02 21:31 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <alpine.DEB.1.00.0901021954410.30769@pacific.mpi-cbg.de>

On Fri, 2 Jan 2009, Johannes Schindelin wrote:

> Hi,

Hi

*wipes the egg off his face*

> On Thu, 1 Jan 2009, Jeff King wrote:
>
>> On Tue, Dec 30, 2008 at 03:58:46AM -0500, Asheesh Laroia wrote:
>>
>>> So, let's say I take your suggestion.
>>>
>>> $ touch ~/Maildir/new/.exists
>>> $ git add ~/Maildir/new/.exists && git commit -m "La di da"
>>>
>>> Now a spec-compliant Maildir user agent will attempt to deliver this new
>>> "email message" of zero bytes into the mail spool and assign it a message
>>> UID.  Doing so will remove it from Maildir/new.
>>
>> No. The maildir spec says:
>>
>>   A unique name can be anything that doesn't contain a colon (or slash)
>>   and doesn't start with a dot.

Oops.  I never actually tried this...

> For the record, I am using Git to manage my mails, and never had any 
> problems after installing a hook which marks new empty directories with 
> .gitignore.

I'll give that a shot, and my apologies for the noise on the list with 
regard to this particular example.

I do still believe that git shouldn't rmdir() empty directories behind the 
user's back, but with this particular use case gone I'm no longer as 
adamant as before.

My apologies for not having tested this earlier; I will test it shortly, 
but there's every reason to think that Johannes and Jeff are right!

-- Asheesh.

-- 
It's interesting to think that many quite distinguished people have
bodies similar to yours.

* Re: how to track the history of a line in a file
From: Jeff King @ 2009-01-02 21:26 UTC (permalink / raw)
  To: david; +Cc: git
In-Reply-To: <alpine.DEB.1.10.0901021405460.21567@asgard.lang.hm>

On Fri, Jan 02, 2009 at 02:13:32PM -0800, david@lang.hm wrote:

> I have a need to setup a repository where I'm storing config files, and I  
> need to be able to search the history of a particular line, not just when  
> the last edit of the line was (which is what I see from git blame)

As you figured out, the "manual" way is to just keep reblaming from the
parent of each blame. Recent versions of "git gui blame" have a "reblame
from parent" option in the context menu which makes this a lot less
painful.

> 57f8f7b6 (Linus Torvalds 2008-10-23 20:06:52 -0700 3) SUBLEVEL = 28
>
> what I would want it to show would be a list of the commits that have  
> changed this line.

The tricky thing here is what is "this line"? Using the line number
isn't right, since it will change based on other content coming in and
out of the file. You can keep drilling down by reblaming parent commits,
but remember that each time you do that you are manually looking at the
content and saying "Oh, this is the line I am still interested in." So
a script would have to correlate the old version and new version of the
line and realize how to follow the "interesting" thing.

In your case, I think you want to see any commit in Makefile which
changed a line with SUBLEVEL in it. Which is maybe easiest done as:

  git log -z -p Makefile |
    perl -0ne 'print if /\n[+-]SUBLEVEL/' |
    tr '\0' '\n'

and is pretty fast. But obviously we're leveraging some content-specific
knowledge about what's in the Makefile.
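
For anyone who would rather not read the regex, the test that the perl
one-liner applies to each NUL-separated record is roughly the following.
This is a sketch only; patch_touches() is a made-up name, and it takes the
text of one commit record as a NUL-terminated string.

```c
#include <string.h>

/*
 * Return 1 if some line of "patch" starts with '+' or '-' immediately
 * followed by "token", and 0 otherwise.  Like /\n[+-]TOKEN/, this only
 * looks at lines that follow a newline, so the very first line of the
 * buffer is never considered.
 */
static int patch_touches(const char *patch, const char *token)
{
	size_t len = strlen(token);
	const char *p = patch;

	while ((p = strchr(p, '\n')) != NULL) {
		p++;	/* first character of the next line */
		if ((*p == '+' || *p == '-') && !strncmp(p + 1, token, len))
			return 1;
	}
	return 0;
}
```

Context lines (which start with a space) that mention the token do not
count, so only commits that actually add or remove such a line match.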

-Peff

* how to track the history of a line in a file
From: david @ 2009-01-02 22:13 UTC (permalink / raw)
  To: git

I have a need to setup a repository where I'm storing config files, and I 
need to be able to search the history of a particular line, not just when 
the last edit of the line was (which is what I see from git blame)

I'm not seeing an obvious way to do this, am I missing something or does it 
need a non-obvious approach?

for example, if I do

git blame -L /SUBLEVEL/,+1 -M Makefile

on the linux kernel it currently shows

57f8f7b6 (Linus Torvalds 2008-10-23 20:06:52 -0700 3) SUBLEVEL = 28

what I would want it to show would be a list of the commits that have 
changed this line.

It looks like I can write a script to do this

git blame -L /SUBLEVEL/,+1 -M Makefile 57f8f7b6^
6e86841d (Linus Torvalds 2008-07-28 19:40:31 -0700 3) SUBLEVEL = 27
git blame -L /SUBLEVEL/,+1 -M Makefile 6e86841d^
2ddcca36 (Linus Torvalds 2008-05-03 11:59:44 -0700 3) SUBLEVEL = 26

etc.

is there a better way to do this?

David Lang

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 20:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Clemens Buchacher, Adeodato Simó,
	Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <20090102195053.GA10876@coredump.intra.peff.net>

On Fri, Jan 02, 2009 at 02:50:53PM -0500, Jeff King wrote:

> For example, f83b9ba209's commit message indicates that it moves the
> "--format-patch" paragraph. Which is what "git diff" shows. Patience
> diff shows it as moving other text _around_ that paragraph.

Here's another interesting one: d592b315. The commit removes dashes
from git commands in test scripts. Git says:

        echo "tag-one-line" >expect &&
-       git-tag -l | grep "^tag-one-line" >actual &&
+       git tag -l | grep "^tag-one-line" >actual &&
        test_cmp expect actual &&
-       git-tag -n0 -l | grep "^tag-one-line" >actual &&
+       git tag -n0 -l | grep "^tag-one-line" >actual &&
        test_cmp expect actual &&
-       git-tag -n0 -l tag-one-line >actual &&
+       git tag -n0 -l tag-one-line >actual &&
        test_cmp expect actual &&

whereas patience says:

        echo "tag-one-line" >expect &&
-       git-tag -l | grep "^tag-one-line" >actual &&
-       test_cmp expect actual &&
-       git-tag -n0 -l | grep "^tag-one-line" >actual &&
-       test_cmp expect actual &&
-       git-tag -n0 -l tag-one-line >actual &&
+       git tag -l | grep "^tag-one-line" >actual &&
+       test_cmp expect actual &&
+       git tag -n0 -l | grep "^tag-one-line" >actual &&
+       test_cmp expect actual &&
+       git tag -n0 -l tag-one-line >actual &&
        test_cmp expect actual &&

which is exactly what patience is advertised to do: it's treating the
non-unique lines as uninteresting markers. But in this case they _are_
interesting, and I think the git output is more readable. And this is a
case where your "weight lines by length instead of uniqueness"
suggestion would perform better, I think.
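
To make the "non-unique lines as uninteresting markers" behaviour
concrete, here is a minimal C sketch of only the first step of patience
diff: picking out the lines that occur exactly once in both files. The
names occurrences() and find_unique_common() are made up for this
illustration and are not taken from xdiff/xpatience.c or from bzr's
patiencediff.py, and the quadratic scan is chosen for brevity, not
speed. The later steps (longest increasing subsequence over the
anchors, then recursing between them) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many times `line` occurs in lines[0..n). */
static size_t occurrences(const char *const *lines, size_t n, const char *line)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (!strcmp(lines[i], line))
			count++;
	return count;
}

/*
 * Hypothetical helper: for every line a[i] that occurs exactly once in a
 * and exactly once in b, store the index of its match in anchor_b[i];
 * every other entry is set to -1.  Returns the number of anchors found.
 */
static size_t find_unique_common(const char *const *a, size_t na,
				 const char *const *b, size_t nb,
				 long *anchor_b)
{
	size_t i, j, found = 0;

	assert(a && b && anchor_b);
	for (i = 0; i < na; i++) {
		anchor_b[i] = -1;
		if (occurrences(a, na, a[i]) != 1 ||
		    occurrences(b, nb, a[i]) != 1)
			continue; /* repeated or missing: not an anchor */
		for (j = 0; j < nb; j++)
			if (!strcmp(a[i], b[j]))
				anchor_b[i] = (long)j;
		found++;
	}
	return found;
}
```

Fed just the seven old and seven new lines of the hunk above, only the
'echo "tag-one-line" >expect &&' line qualifies as an anchor; the
"test_cmp expect actual &&" line appears three times on each side and
is skipped.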

-Peff

* Re: [CLEANUP PATCH RESEND] git wrapper: Make while loop more reader-friendly
From: Boyd Stephen Smith Jr. @ 2009-01-02 20:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, gitster
In-Reply-To: <alpine.DEB.1.00.0901021947170.30769@pacific.mpi-cbg.de>

On Friday 2009 January 02 12:49:53 Johannes Schindelin wrote:
> > > -	do
> > > -		--slash;
> > > -	while (cmd <= slash && !is_dir_sep(*slash));
> > > +	while (cmd <= slash && !is_dir_sep(*slash))
> > > +		slash--;
> >
> > I prefer the one-liner:
> > for (; cmd <= slash && !is_dir_sep(*slash); --slash);
>
> As I mentioned in the commit message: readability is something to be
> cherished and worshipped.

It's also subjective.  I think my one-liner is more readable than your two 
lines, which are only slightly more readable than the original 3 lines.  Or is 
there some objective readability metric of which I'm just not aware?
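
For reference, the three spellings under discussion can be put side by
side.  This is a sketch, not the code from git.c: it uses an index
instead of the `slash` pointer, a local stand-in for is_dir_sep() that
only knows '/', and it assumes the do/while form starts one position
later than the other two (the quoted hunk does not show the
initialisation).  The last_sep_*() names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for git's is_dir_sep(); only '/' counts as a separator here. */
static int is_dir_sep(char c)
{
	return c == '/';
}

/* Original 3-line form: decrement first, then test.  Starts past the end. */
static long last_sep_do_while(const char *cmd)
{
	long i;

	assert(cmd != NULL);
	i = (long)strlen(cmd);
	do
		--i;
	while (0 <= i && !is_dir_sep(cmd[i]));
	return i;
}

/* 2-line form from the patch: test first, so start on the last character. */
static long last_sep_while(const char *cmd)
{
	long i;

	assert(cmd != NULL);
	i = (long)strlen(cmd) - 1;
	while (0 <= i && !is_dir_sep(cmd[i]))
		i--;
	return i;
}

/* 1-line form: same test and same step, folded into a for statement. */
static long last_sep_for(const char *cmd)
{
	long i;

	assert(cmd != NULL);
	i = (long)strlen(cmd) - 1;
	for (; 0 <= i && !is_dir_sep(cmd[i]); --i)
		;
	return i;
}
```

All three return the index of the last separator, or -1 if there is
none, so the choice between them is about reading, not behaviour.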

I also think that the lack of braces around your body on a separate line makes 
it harder to read and easier to break, but I understand that is the git 
coding style.

> For your pleasure, I will not go into details about the motions my bowels
> went through when I looked at those three lines.  Or your single line, for
> that matter.

Please do, although privately if you like.  I really don't see the problem the 
patch is trying to fix.
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

* "git reset --hard" == "git checkout HEAD" == "git reset --hard HEAD" ???
From: chris @ 2009-01-02 19:57 UTC (permalink / raw)
  To: git

Does "git reset --hard" == "git checkout HEAD" == "git reset --hard HEAD" ???

It seems we have 2 ways to blow away work we haven't checked in yet, then, right?

chris

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 19:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Clemens Buchacher, Adeodato Simó,
	Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <20090102193904.GB9129@coredump.intra.peff.net>

On Fri, Jan 02, 2009 at 02:39:04PM -0500, Jeff King wrote:

> If you just want to see the results on some real-world cases (and don't
> care about measuring performance), try installing bzr and using their
> patiencediff test program as a GIT_EXTERNAL_DIFF.
> 
> On Debian, it's:
> 
>   $ sudo apt-get install bzr
>   $ cat >$HOME/patience <<'EOF'
>     #!/bin/sh
>     exec python /usr/share/pyshared/bzrlib/patiencediff.py "$2" "$5"
>     EOF
>   $ chmod 755 $HOME/patience
>   $ GIT_EXTERNAL_DIFF=$HOME/patience git diff

For added fun, try this:

-- >8 --
#!/bin/sh

canonical() {
  # keep only context/added/removed lines, dropping the ---/+++ file headers
  grep '^[ +-]' | grep -E -v '^(---|\+\+\+)'
}

git rev-list --no-merges HEAD | while read -r rev; do
  git diff-tree -p "$rev^" "$rev" | canonical >git.out
  GIT_EXTERNAL_DIFF=$HOME/patience git diff "$rev^" "$rev" | canonical >bzr.out
  echo "$(diff -U0 git.out bzr.out | wc -l) $rev"
done
-- 8< --

I'm running it on git.git now. It looks like both algorithms return the
same patch for most cases. Some of the differences are interesting,
though.

For example, f83b9ba209's commit message indicates that it moves the
"--format-patch" paragraph. Which is what "git diff" shows. Patience
diff shows it as moving other text _around_ that paragraph.

-Peff

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 19:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Clemens Buchacher, Adeodato Simó,
	Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901021050450.5086@localhost.localdomain>

On Fri, Jan 02, 2009 at 11:03:07AM -0800, Linus Torvalds wrote:

> Well, it's also the test-case in the very first hit on google for 
> "patience diff" (with the quotes).
> 
> In fact, it's the _only_ one I ever found ;)

If you just want to see the results on some real-world cases (and don't
care about measuring performance), try installing bzr and using their
patiencediff test program as a GIT_EXTERNAL_DIFF.

On Debian, it's:

  $ sudo apt-get install bzr
  $ cat >$HOME/patience <<'EOF'
    #!/bin/sh
    exec python /usr/share/pyshared/bzrlib/patiencediff.py "$2" "$5"
    EOF
  $ chmod 755 $HOME/patience
  $ GIT_EXTERNAL_DIFF=$HOME/patience git diff

Other distributions presumably install patiencediff.py somewhere
similar (or you can maybe even pull it from the source, but presumably
you have to install some other bzr modules, too).

-Peff

* Re: [PATCH 2/2] gitweb: support hiding projects from user-visible lists
From: Jakub Narebski @ 2009-01-02 19:33 UTC (permalink / raw)
  To: Matt McCutchen; +Cc: git
In-Reply-To: <1230082831.2971.45.camel@localhost>

On Wed, 2008-12-24, Matt McCutchen wrote:
> On Sat, 2008-12-13 at 14:02 -0800, Jakub Narebski wrote:
> >
> > Can't you do this with the new $export_auth_hook gitweb configuration
> > variable, added by Alexander Gavrilov in 
> >    dd7f5f1 (gitweb: Add a per-repository authorization hook.)
> > It is used in the check_export_ok subroutine, and is also checked when
> > getting the list of projects from a file.
> > 
> > From gitweb/INSTALL
[...]
> >     For example, if you use mod_perl to run the script, and have dumb
> >     http protocol authentication configured for your repositories, you
> >     can use the following hook to allow access only if the user is
> >     authorized to read the files:
[...]
 
> $export_auth_hook would work, and it would have the nice (but not
> essential) feature of including private projects in the list shown to
> suitably authenticated users.  The only problem is that my Web host
> doesn't support mod_perl.  Is there a practical way to accomplish the
> same thing as the above example in a CGI script?  I would like to avoid
> reimplementing Apache authentication-checking functionality if at all
> possible.

I know it is written that the example code is for mod_perl, but I
don't think it is mod_perl specific; have you checked if it works
for you? I assume that you use Apache, and have Apache Perl bindings
installed...

-- 
Jakub Narebski
Poland

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 19:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Clemens Buchacher, Adeodato Simó, Pierre Habouzit, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901021050450.5086@localhost.localdomain>

Hi,

On Fri, 2 Jan 2009, Linus Torvalds wrote:

> On Fri, 2 Jan 2009, Johannes Schindelin wrote:
> 
> > And the worst part: one can only _guess_ what motivated patience diff.  
> > I imagine it came from the observation that function headers are 
> > unique, and that you usually want to preserve as much context around 
> > them.
> 
> Well, I do like the notion of giving more weight to unique lines - I think 
> it makes sense. That said, I suspect it would make almost as much sense to 
> give more weight simply to _longer_ lines, and I suspect the standard 
> Myers' algorithm could possibly be simply extended to take line size into 
> account when calculating the weights.

I think that it makes more sense with the common unique lines, because 
they are _unambiguously_ common.  That is, if possible, we would like to 
keep them as common lines.

BTW this also opens the door for more automatic code movement detection.

As for the longer lines: what exactly would you want to put "weight" on?  
The edit distance is the number of pluses and minuses, and this is the 
thing that actually is pretty critical for the performance: the larger the 
distance, the longer it takes.  So if you want to put a different "weight" 
on a line, i.e. something other than a "1", you are risking substantially 
worse performance.

And I am still not convinced that longer lines should get more weight. A 
line starting with "exit:" can be much more important than 8 tabs followed 
by a curly bracket.

> Because the problem with diffs for C doesn't really tend to be as much 
> about non-unique lines as about just _trivial_ lines that are mostly empty 
> or contain just braces etc. Those are quite arguably almost totally 
> worthless for equality testing.

Oh, but then we get very C specific.  Let's not go there.

> And btw, don't get me wrong - I don't think there is anything wrong with 
> the patience diff. I think it's a worthy thing to try, and I'm not at 
> all arguing against it.

I never took you to be opposed.  I myself mainly implemented it because I 
wanted to play with it, and have something more useful than what I found 
in the whole wide web.

> However, I do think that the people arguing for it often do so based on 
> less-than-very-logical arguments, and it's entirely possible that other 
> approaches are better (eg the "weight by size" thing rather than "weight 
> by uniqueness").

Actually, I think it is very possible that merging hunks if there are not 
enough alphanumeric characters between them could make a whole lot of sense.

But IIRC it was shot down and replaced by the not-so-specific-to-C 
algorithm to merge hunks when the output is shorter than having separate 
hunks.

> The thing about unique lines is that there are no guarantees at all that 
> they even exist, so a uniqueness-based thing will always have to fall 
> back on anything else. That, to me, implies that the whole notion is 
> somewhat mis-designed: it's clearly not a generic concept.
> 
> (In contrast, taking the length of the matching lines into account would 
> not have that kind of bad special case)

However, we know that humans like to start from the unique features they 
see, and continue from there.

And while it is quite possible that it is the wrong approach (see all that 
security theater at the airport, and some nice rational analyses about the 
cost/benefit equations there), imitating humans is still the thing that 
will convince most humans.

Ciao,
Dscho

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 19:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901021045290.5086@localhost.localdomain>

Hi,

On Fri, 2 Jan 2009, Linus Torvalds wrote:

> On Fri, 2 Jan 2009, Johannes Schindelin wrote:
> > 
> > BTW the "-p" is not necessary with "show", indeed, you cannot even 
> > switch it off.
> 
> I was just switching back-and-forth between "git log" and "git show" so 
> the -p came from just that, and is not necessary.
> 
> And you _can_ suppress the patch generation - use "-s".

Ah, another thing learnt.

Thanks,
Dscho

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-02 19:03 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Clemens Buchacher, Adeodato Simó, Pierre Habouzit, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901021918100.30769@pacific.mpi-cbg.de>



On Fri, 2 Jan 2009, Johannes Schindelin wrote:
> 
> FWIW it's the test case in the commit introducing the --patience option.

Well, it's also the test-case in the very first hit on google for 
"patience diff" (with the quotes).

In fact, it's the _only_ one I ever found ;)

> And the worst part: one can only _guess_ what motivated patience diff.  I 
> imagine it came from the observation that function headers are unique, and 
> that you usually want to preserve as much context around them.

Well, I do like the notion of giving more weight to unique lines - I think 
it makes sense. That said, I suspect it would make almost as much sense to 
give more weight simply to _longer_ lines, and I suspect the standard 
Myers' algorithm could possibly be simply extended to take line size into 
account when calculating the weights.

Because the problem with diffs for C doesn't really tend to be as much 
about non-unique lines as about just _trivial_ lines that are mostly empty 
or contain just braces etc. Those are quite arguably almost totally 
worthless for equality testing.

And btw, don't get me wrong - I don't think there is anything wrong with 
the patience diff. I think it's a worthy thing to try, and I'm not at all 
arguing against it. However, I do think that the people arguing for it 
often do so based on less-than-very-logical arguments, and it's entirely 
possible that other approaches are better (eg the "weight by size" thing 
rather than "weight by uniqueness").

The thing about unique lines is that there are no guarantees at all that 
they even exist, so a uniqueness-based thing will always have to fall back 
on anything else. That, to me, implies that the whole notion is somewhat 
mis-designed: it's clearly not a generic concept.

(In contrast, taking the length of the matching lines into account would 
not have that kind of bad special case)

			Linus

* Re: [PATCH v2 1/3] rebase: learn to rebase root commit
From: Johannes Schindelin @ 2009-01-02 18:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Rast, git
In-Reply-To: <7v4p0iivwh.fsf@gitster.siamese.dyndns.org>

Hi,

On Thu, 1 Jan 2009, Junio C Hamano wrote:

> Thomas Rast <trast@student.ethz.ch> writes:
> 
> > Teach git-rebase a new option --root, which instructs it to rebase the
> > entire history leading up to <branch>.
> >
> > The main use-case is with git-svn: suppose you start hacking (perhaps
> > offline) on a new project, but later notice you want to commit this
> > work to SVN.  You will have to rebase the entire history, including
> > the root commit, on a (possibly empty) commit coming from git-svn, to
> > establish a history connection.  This previously had to be done by
> > cherry-picking the root commit manually.
> 
> I like what this series tries to do.  Using the --root option is probably
> a more natural way to do what people often do with the "add graft and
> filter-branch the whole history once" procedure.
> 
> But it somewhat feels sad if the "main" use-case for this is to start your
> project in git and then migrate away by feeding your history to subversion
> ;-).

FWIW I had a single case where I could have used something like this 
myself, in my whole life.  It was when I started to write 
git-edit-patch-series.sh in its own repository, only to realize at the end 
that I should have started it in a topic branch in my git.git tree.

Ciao,
Dscho

* Re: why still no empty directory support in git
From: Johannes Schindelin @ 2009-01-02 18:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Asheesh Laroia, Git Mailing List
In-Reply-To: <20090101200651.GB6536@coredump.intra.peff.net>

Hi,

On Thu, 1 Jan 2009, Jeff King wrote:

> On Tue, Dec 30, 2008 at 03:58:46AM -0500, Asheesh Laroia wrote:
> 
> > So, let's say I take your suggestion.
> >
> > $ touch ~/Maildir/new/.exists
> > $ git add ~/Maildir/new/.exists && git commit -m "La di da"
> >
> > Now a spec-compliant Maildir user agent will attempt to deliver this new  
> > "email message" of zero bytes into the mail spool and assign it a message  
> > UID.  Doing so will remove it from Maildir/new.
> 
> No. The maildir spec says:
> 
>   A unique name can be anything that doesn't contain a colon (or slash)
>   and doesn't start with a dot.
>      -- http://cr.yp.to/proto/maildir.html
> 
> where a "unique name" is the filename used for a message. In practice,
> every maildir implementation I have seen ignores files starting with a
> dot. Do you have one that doesn't?

For the record, I am using Git to manage my mails, and never had any 
problems after installing a hook which marks new empty directories with 
.gitignore.

Ciao,
Dscho

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 18:51 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <alpine.DEB.1.00.0901021914420.30769@pacific.mpi-cbg.de>

On Fri, Jan 02, 2009 at 07:17:34PM +0100, Johannes Schindelin wrote:

> BTW the "-p" is not necessary with "show", indeed, you cannot even switch 
> it off.

Half true:

  $ git grep -A1 '"-s"' diff.c
  diff.c: else if (!strcmp(arg, "-s"))
  diff.c-         options->output_format |= DIFF_FORMAT_NO_OUTPUT;

-Peff

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Adeodato Simó @ 2009-01-02 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Pierre Habouzit, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <alpine.LFD.2.00.0901011747010.5086@localhost.localdomain>

* Linus Torvalds [Thu, 01 Jan 2009 17:56:13 -0800]:

> On Thu, 1 Jan 2009, Adeodato Simó wrote:

> > For me, the cases where I find patience output to be of substantially
> > higher readability are those involving a rewrite of several consecutive
> > paragraphs (i.e., lines of code separated by blank lines). Compare:

> I don't think that's a "patience diff" issue.

Ah, I see.

> That's simply an issue of merging consecutive diff fragments together if 
> they are close-by, and that's independent of the actual diff algorithm 
> itself.

> > I'll note that in this particular case, `git diff` yielded the very same
> > results with or without --patience. I don't know why that is, Johannes?
> > I'll also note that /usr/bin/diff produces (in this case) something
> > closer to patience than to git.

> See above - I really don't think this has anything to do with "patience vs 
> non-patience". It's more akin to the things we do for our merge conflict 
> markers: if we have two merge conflicts next to each other, with just a 
> couple of lines in between, we coalesce the merge conflicts into one 
> larger one instead.

> We don't do that for regular diffs - they're always kept minimal (ok, not 
> really minimal, but as close to minimal as the algorithm finds them).

> See commit f407f14deaa14ebddd0d27238523ced8eca74393 for the git merge 
> conflict merging. We _could_ do similar things for regular diffs. It's 
> sometimes useful, sometimes not.

Independently of patience diff, then, I'd very much support changes to
improve the diff I pasted.
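
The coalescing idea itself is independent of the diff algorithm and can
be sketched in a few lines of C.  This is an illustration only, not the
code from commit f407f14 or from xdiff: struct hunk, coalesce_hunks()
and the max_gap threshold are assumptions made for the example, and the
hunks are taken to be sorted, half-open line ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hunk description: lines [start, end) of one file. */
struct hunk {
	long start;
	long end;
};

/*
 * Merge neighbouring hunks in place when at most max_gap unchanged
 * lines separate them.  h[] must be sorted by start.  Returns the new
 * number of hunks.
 */
static size_t coalesce_hunks(struct hunk *h, size_t n, long max_gap)
{
	size_t i, out = 0;

	assert(h != NULL || n == 0);
	if (n == 0)
		return 0;
	for (i = 1; i < n; i++) {
		if (h[i].start - h[out].end <= max_gap) {
			/* close enough: extend the hunk being built */
			if (h[i].end > h[out].end)
				h[out].end = h[i].end;
		} else {
			/* too far apart: start a new output hunk */
			h[++out] = h[i];
		}
	}
	return out + 1;
}
```

With hunks at lines 10-12, 14-15 and 40-42 and a max_gap of 3, the
first two are merged into 10-15 and the third is left alone.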

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
Debugging is twice as hard as writing the code in the first place. Therefore,
if you write the code as cleverly as possible, you are, by definition, not
smart enough to debug it.
                -- Brian W. Kernighan

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-02 18:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901021914420.30769@pacific.mpi-cbg.de>



On Fri, 2 Jan 2009, Johannes Schindelin wrote:
> 
> BTW the "-p" is not necessary with "show", indeed, you cannot even switch 
> it off.

I was just switching back-and-forth between "git log" and "git show" so 
the -p came from just that, and is not necessary.

And you _can_ suppress the patch generation - use "-s".

			Linus
