Git development


* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-07 16:08 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901070743070.3057@localhost.localdomain>

On Wed, 7 Jan 2009, Linus Torvalds wrote:
> 
> And nobody else saw it than this one person, and it was a total mystery to 
> everybody until we realized that he used this one feature that nobody else 
> was using. So as you're on OS X, I assume you don't have CRLF conversion, 
> but maybe you use some other feature that we support but nobody really 
> actually uses. Like keyword expansion or something?
> 
> Oh - that would also explain why you got all those entries in "git status" 
> that went away when you did a "git reset --hard": if you had some keyword 
> expansion (or CRLF) enabled in the original user's "~/.gitconfig", that 
> checkout would have had expansion/CRLF/whatever conversion, but then when 
> you tarred/untarred it on another setup, the expansion would be seen as a 
> difference because it wasn't enabled.

Btw, if you untar it again, and just do a "git diff", that should show any 
such effects. Rather than showing just that something changed, it should 
show _how_ it changed.

		Linus

* Re: git rebase orthodontics
From: Thomas Rast @ 2009-01-07 16:31 UTC (permalink / raw)
  To: jidanni; +Cc: git
In-Reply-To: <87sknvxje8.fsf@jidanni.org>

jidanni@jidanni.org wrote:
> wherein he discovers there are no guard rails

Good thing you learned this before getting to git-reset.

> $ EDITOR=cat git rebase --interactive master
> pick 07aef4a This is a commit with No files, wow. bla.
> # Rebase 3ad166e..07aef4a onto 3ad166e ...
> Successfully rebased and updated refs/heads/jidanni.
> (But it didn't. git show shows no change. ls -l shows
> refs/heads/jidanni was not touched.
> OK, it seems like all I am doing is changing
>               A jidanni
>              /
> D---E---F---G master
> into the same thing, a noop. But shouldn't it warn and quit, instead
> of rewarding me with the success message?

You asked for an interactive rebase of the range master..jidanni,
which consists of A, so it gave you an editor offering 'pick A' and
the chance to change that.

Non-interactive rebase indeed checks if you attempt to rebase, but are
already up to date.  Interactive doesn't; the assumption is that
interactive rebases aren't used "blindly" to update.  (Rebasing
changes committer and commit time, so there is a difference between
not rebasing at all, and merely ending up with the same history.)

> Let's try it the other way
> around:
> $ git checkout master
> $ git rebase --interactive jidanni #Wherein one sees:
> noop
> # Rebase 07aef4a..3ad166e onto 07aef4a
> Successfully rebased and updated refs/heads/master.
> OK, now I have achieved
> D---E---F---G---A master, jidanni
> Observations:
> When I tried a noop, it didn't say noop in the editor.
> When I tried a yesop, it did say noop in the editor.

The 'noop' means that there are no commits in the range you asked to
rebase, which is jidanni..master.  It's telling you that it is going
to update the branch pointer, but not carry over any of the commits.
This can happen even if jidanni..master is nonempty, but all commits
in it are already contained in jidanni.

> In both cases it gave the same success message.

It successfully did what you told it to do.
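
To make the two ranges concrete, here is a minimal C sketch (not git's
implementation) that models the history from the example as a
single-parent chain and counts the commits in "upstream..branch".  The
parent[] table, the range_count() helper and the commit constants are
made up for illustration, and merges are ignored entirely.

```c
#include <assert.h>
#include <string.h>

/* Toy history from the example: D---E---F---G (master), A on top of G. */
enum { D, E, F, G, A, NR_COMMITS };

/* parent[c] is the single parent of commit c, or -1 for the root. */
static const int parent[NR_COMMITS] = { -1, D, E, F, G };

/*
 * Count the commits in "upstream..branch": reachable from branch, but
 * not from upstream.  Only valid for single-parent (linear) histories.
 */
static int range_count(int upstream, int branch)
{
	char seen[NR_COMMITS];
	int c, count = 0;

	assert(upstream >= 0 && upstream < NR_COMMITS);
	assert(branch >= 0 && branch < NR_COMMITS);

	/* Mark everything reachable from upstream. */
	memset(seen, 0, sizeof(seen));
	for (c = upstream; c >= 0; c = parent[c])
		seen[c] = 1;

	/* Walk back from branch until we hit a marked commit. */
	for (c = branch; c >= 0 && !seen[c]; c = parent[c])
		count++;

	return count;
}
```

With master at G and jidanni at A, range_count(G, A) is 1 (the todo
list offers a single 'pick'), while range_count(A, G) is 0, which is
the case where the todo list contains only 'noop'.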

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 17:01 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Linus Torvalds, davidel, Francis Galiegue, Git ML
In-Reply-To: <20090107143926.GB831@artemis.corp>

Hi,

On Wed, 7 Jan 2009, Pierre Habouzit wrote:

> On Tue, Jan 06, 2009 at 07:40:02PM +0000, Johannes Schindelin wrote:
> 
> > Although I would like to see it in once it is fleshed out -- even if 
> > it does not meet our usefulness standard -- because people said Git is 
> > inferior for not providing a patience diff.  If we have --patience, we 
> > can say "but we have it, it's just not useful, check for yourself".
> 
> Well I believe it's useful, but maybe the standard algorithm could be 
> tweaked the way Linus proposes to make the "long" lines weigh more or 
> so.

I think this "weighting idea" is a bit too much of handwaving to start 
anything close to a design; as I pointed out, anything that has something 
different than a 1 for a deleted/added line affects performance 
negatively.

> WRT the leaks, you want to squash the attached patch on the proper
> patches of your series (maybe the xdl_free on map.entries could be put
> in a hasmap_destroy or similar btw, but valgrind reports no more leaks
> in xdiff now).

Thanks!

I also squashed in a patch that avoids calling xdl_cleanup_records() and 
then memset()ing the rchg array to 0 (which worked around the segmentation 
fault).

Patch 1/3 v3 follows,
Dscho

* [PATCH v3 1/3] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 17:04 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Linus Torvalds, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901071610290.7496@intel-tinevez-2-302>


The patience diff algorithm produces slightly more intuitive output
than the classic Myers algorithm, as it does not try to minimize the
number of +/- lines first, but tries to preserve the lines that are
unique.

To this end, it first determines lines that are unique in both files,
then the maximal sequence which preserves the order (relative to both
files) is extracted.

Starting from this initial set of common lines, the rest of the lines
is handled recursively, with Myers' algorithm as a fallback when
the patience algorithm fails (due to no common unique lines).

This patch includes memory leak fixes by Pierre Habouzit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	This comes close to 'next' quality, IMHO.

	Changes vs v2: Pierre's memory leak fixes, and it now avoids 
	calling xdl_cleanup_records() and then throwing away the 
	results for patience diff.

	Pierre, could you do the performance tests again?  I _think_ it 
	might be somewhat faster now because of the xdl_cleanup_records()
	avoidance.
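
The first step named in the commit message, finding the lines that are
unique in both files, can be sketched in isolation as follows.  This is
only an illustration under simplifying assumptions: lines are plain C
strings, the search is quadratic instead of using the hash map from the
patch, and find_unique_pairs() and count_occurrences() are hypothetical
names that do not exist in xdiff.

```c
#include <assert.h>
#include <string.h>

/* Count how often "line" occurs in lines[0..n-1]; remember the last hit. */
static int count_occurrences(const char **lines, int n, const char *line,
		int *last_index)
{
	int i, count = 0;

	for (i = 0; i < n; i++)
		if (!strcmp(lines[i], line)) {
			count++;
			*last_index = i;
		}
	return count;
}

/*
 * For every line of file1 that occurs exactly once in file1 and exactly
 * once in file2, store its 1-based line number in file2 in match[i];
 * all other lines get 0.  Returns the number of unique pairs found.
 *
 * This is quadratic on purpose to stay short; the patch uses a hash map.
 */
static int find_unique_pairs(const char **file1, int n1,
		const char **file2, int n2, int *match)
{
	int i, pos1 = 0, pos2 = 0, pairs = 0;

	assert(file1 && file2 && match);
	for (i = 0; i < n1; i++) {
		match[i] = 0;
		if (count_occurrences(file1, n1, file1[i], &pos1) == 1 &&
		    count_occurrences(file2, n2, file1[i], &pos2) == 1) {
			match[i] = pos2 + 1;
			pairs++;
		}
	}
	return pairs;
}
```

For two files that contain the same two functions in swapped order, only
the function headers come out as unique pairs; the repeated "{" and "}"
lines are ignored, which is what makes the result more intuitive.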

 xdiff/xdiff.h     |    1 +
 xdiff/xdiffi.c    |    3 +
 xdiff/xdiffi.h    |    2 +
 xdiff/xpatience.c |  381 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 xdiff/xprepare.c  |    3 +-
 5 files changed, 389 insertions(+), 1 deletions(-)
 create mode 100644 xdiff/xpatience.c

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 361f802..4da052a 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,6 +32,7 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
+#define XDF_PATIENCE_DIFF (1 << 5)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
 #define XDL_PATCH_NORMAL '-'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 9d0324a..3e97462 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -329,6 +329,9 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	xdalgoenv_t xenv;
 	diffdata_t dd1, dd2;
 
+	if (xpp->flags & XDF_PATIENCE_DIFF)
+		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
+
 	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
 
 		return -1;
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index 3e099dc..ad033a8 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -55,5 +55,7 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr);
 void xdl_free_script(xdchange_t *xscr);
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
+int xdl_do_patience_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
+		xdfenv_t *env);
 
 #endif /* #if !defined(XDIFFI_H) */
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
new file mode 100644
index 0000000..e42c16a
--- /dev/null
+++ b/xdiff/xpatience.c
@@ -0,0 +1,381 @@
+/*
+ *  LibXDiff by Davide Libenzi ( File Differential Library )
+ *  Copyright (C) 2003-2009 Davide Libenzi, Johannes E. Schindelin
+ *
+ *  This library is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ *
+ *  This library is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  Lesser General Public License for more details.
+ *
+ *  You should have received a copy of the GNU Lesser General Public
+ *  License along with this library; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *  Davide Libenzi <davidel@xmailserver.org>
+ *
+ */
+#include "xinclude.h"
+#include "xtypes.h"
+#include "xdiff.h"
+
+/*
+ * The basic idea of patience diff is to find lines that are unique in
+ * both files.  These are intuitively the ones that we want to see as
+ * common lines.
+ *
+ * The maximal ordered sequence of such line pairs (where ordered means
+ * that the order in the sequence agrees with the order of the lines in
+ * both files) naturally defines an initial set of common lines.
+ *
+ * Now, the algorithm tries to extend the set of common lines by growing
+ * the line ranges where the files have identical lines.
+ *
+ * Between those common lines, the patience diff algorithm is applied
+ * recursively, until no unique line pairs can be found; these line ranges
+ * are handled by the well-known Myers algorithm.
+ */
+
+#define NON_UNIQUE ULONG_MAX
+
+/*
+ * This is a hash mapping from line hash to line numbers in the first and
+ * second file.
+ */
+struct hashmap {
+	int nr, alloc;
+	struct entry {
+		unsigned long hash;
+		/*
+		 * 0 = unused entry, 1 = first line, 2 = second, etc.
+		 * line2 is NON_UNIQUE if the line is not unique
+		 * in either the first or the second file.
+		 */
+		unsigned long line1, line2;
+		/*
+		 * "next" & "previous" are used for the longest common
+		 * sequence;
+		 * initially, "next" reflects only the order in file1.
+		 */
+		struct entry *next, *previous;
+	} *entries, *first, *last;
+	/* were common records found? */
+	unsigned long has_matches;
+	mmfile_t *file1, *file2;
+	xdfenv_t *env;
+	xpparam_t const *xpp;
+};
+
+/* The argument "pass" is 1 for the first file, 2 for the second. */
+static void insert_record(int line, struct hashmap *map, int pass)
+{
+	xrecord_t **records = pass == 1 ?
+		map->env->xdf1.recs : map->env->xdf2.recs;
+	xrecord_t *record = records[line - 1], *other;
+	/*
+	 * After xdl_prepare_env() (or more precisely, due to
+	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
+	 * is _not_ the hash anymore, but a linearized version of it.  In
+	 * other words, the "ha" member is guaranteed to start with 0 and
+	 * the second record's ha can only be 0 or 1, etc.
+	 *
+	 * So we multiply ha by 2 in the hope that the hashing was
+	 * "unique enough".
+	 */
+	int index = (int)((record->ha << 1) % map->alloc);
+
+	while (map->entries[index].line1) {
+		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
+		if (map->entries[index].hash != record->ha ||
+				!xdl_recmatch(record->ptr, record->size,
+					other->ptr, other->size,
+					map->xpp->flags)) {
+			if (++index >= map->alloc)
+				index = 0;
+			continue;
+		}
+		if (pass == 2)
+			map->has_matches = 1;
+		if (pass == 1 || map->entries[index].line2)
+			map->entries[index].line2 = NON_UNIQUE;
+		else
+			map->entries[index].line2 = line;
+		return;
+	}
+	if (pass == 2)
+		return;
+	map->entries[index].line1 = line;
+	map->entries[index].hash = record->ha;
+	if (!map->first)
+		map->first = map->entries + index;
+	if (map->last) {
+		map->last->next = map->entries + index;
+		map->entries[index].previous = map->last;
+	}
+	map->last = map->entries + index;
+	map->nr++;
+}
+
+/*
+ * This function has to be called for each recursion into the inter-hunk
+ * parts, as previously non-unique lines can become unique when being
+ * restricted to a smaller part of the files.
+ *
+ * It is assumed that env has been prepared using xdl_prepare_env().
+ */
+static int fill_hashmap(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		struct hashmap *result,
+		int line1, int count1, int line2, int count2)
+{
+	result->file1 = file1;
+	result->file2 = file2;
+	result->xpp = xpp;
+	result->env = env;
+
+	/* We know exactly how large we want the hash map */
+	result->alloc = count1 * 2;
+	result->entries = (struct entry *)
+		xdl_malloc(result->alloc * sizeof(struct entry));
+	if (!result->entries)
+		return -1;
+	memset(result->entries, 0, result->alloc * sizeof(struct entry));
+
+	/* First, fill with entries from the first file */
+	while (count1--)
+		insert_record(line1++, result, 1);
+
+	/* Then search for matches in the second file */
+	while (count2--)
+		insert_record(line2++, result, 2);
+
+	return 0;
+}
+
+/*
+ * Find the longest sequence with a smaller last element (meaning a smaller
+ * line2, as we construct the sequence with entries ordered by line1).
+ */
+static int binary_search(struct entry **sequence, int longest,
+		struct entry *entry)
+{
+	int left = -1, right = longest;
+
+	while (left + 1 < right) {
+		int middle = (left + right) / 2;
+		/* by construction, no two entries can be equal */
+		if (sequence[middle]->line2 > entry->line2)
+			right = middle;
+		else
+			left = middle;
+	}
+	/* return the index in "sequence", _not_ the sequence length */
+	return left;
+}
+
+/*
+ * The idea is to start with the list of common unique lines sorted by
+ * the order in file1.  For each of these pairs, the longest (partial)
+ * sequence whose last element's line2 is smaller is determined.
+ *
+ * For efficiency, the sequences are kept in a list containing exactly one
+ * item per sequence length: the sequence with the smallest last
+ * element (in terms of line2).
+ */
+static struct entry *find_longest_common_sequence(struct hashmap *map)
+{
+	struct entry **sequence = xdl_malloc(map->nr * sizeof(struct entry *));
+	int longest = 0, i;
+	struct entry *entry;
+
+	for (entry = map->first; entry; entry = entry->next) {
+		if (!entry->line2 || entry->line2 == NON_UNIQUE)
+			continue;
+		i = binary_search(sequence, longest, entry);
+		entry->previous = i < 0 ? NULL : sequence[i];
+		sequence[++i] = entry;
+		if (i == longest)
+			longest++;
+	}
+
+	/* No common unique lines were found */
+	if (!longest) {
+		xdl_free(sequence);
+		return NULL;
+	}
+
+	/* Iterate starting at the last element, adjusting the "next" members */
+	entry = sequence[longest - 1];
+	entry->next = NULL;
+	while (entry->previous) {
+		entry->previous->next = entry;
+		entry = entry->previous;
+	}
+	xdl_free(sequence);
+	return entry;
+}
+
+static int match(struct hashmap *map, int line1, int line2)
+{
+	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
+	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
+	return xdl_recmatch(record1->ptr, record1->size,
+		record2->ptr, record2->size, map->xpp->flags);
+}
+
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		int line1, int count1, int line2, int count2);
+
+static int walk_common_sequence(struct hashmap *map, struct entry *first,
+		int line1, int count1, int line2, int count2)
+{
+	int end1 = line1 + count1, end2 = line2 + count2;
+	int next1, next2;
+
+	for (;;) {
+		/* Try to grow the line ranges of common lines */
+		if (first) {
+			next1 = first->line1;
+			next2 = first->line2;
+			while (next1 > line1 && next2 > line2 &&
+					match(map, next1 - 1, next2 - 1)) {
+				next1--;
+				next2--;
+			}
+		} else {
+			next1 = end1;
+			next2 = end2;
+		}
+		while (line1 < next1 && line2 < next2 &&
+				match(map, line1, line2)) {
+			line1++;
+			line2++;
+		}
+
+		/* Recurse */
+		if (next1 > line1 || next2 > line2) {
+			struct hashmap submap;
+
+			memset(&submap, 0, sizeof(submap));
+			if (patience_diff(map->file1, map->file2,
+					map->xpp, map->env,
+					line1, next1 - line1,
+					line2, next2 - line2))
+				return -1;
+		}
+
+		if (!first)
+			return 0;
+
+		while (first->next &&
+				first->next->line1 == first->line1 + 1 &&
+				first->next->line2 == first->line2 + 1)
+			first = first->next;
+
+		line1 = first->line1 + 1;
+		line2 = first->line2 + 1;
+
+		first = first->next;
+	}
+}
+
+static int fall_back_to_classic_diff(struct hashmap *map,
+		int line1, int count1, int line2, int count2)
+{
+	/*
+	 * This probably does not work outside Git, since
+	 * we have a very simple mmfile structure.
+	 *
+	 * Note: ideally, we would reuse the prepared environment, but
+	 * the libxdiff interface does not (yet) allow for diffing only
+	 * ranges of lines instead of the whole files.
+	 */
+	mmfile_t subfile1, subfile2;
+	xpparam_t xpp;
+	xdfenv_t env;
+
+	subfile1.ptr = (char *)map->env->xdf1.recs[line1 - 1]->ptr;
+	subfile1.size = map->env->xdf1.recs[line1 + count1 - 2]->ptr +
+		map->env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
+	subfile2.ptr = (char *)map->env->xdf2.recs[line2 - 1]->ptr;
+	subfile2.size = map->env->xdf2.recs[line2 + count2 - 2]->ptr +
+		map->env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
+	xpp.flags = map->xpp->flags & ~XDF_PATIENCE_DIFF;
+	if (xdl_do_diff(&subfile1, &subfile2, &xpp, &env) < 0)
+		return -1;
+
+	memcpy(map->env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
+	memcpy(map->env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
+
+	xdl_free_env(&env);
+
+	return 0;
+}
+
+/*
+ * Recursively find the longest common sequence of unique lines,
+ * and if none was found, ask xdl_do_diff() to do the job.
+ *
+ * This function assumes that env was prepared with xdl_prepare_env().
+ */
+static int patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env,
+		int line1, int count1, int line2, int count2)
+{
+	struct hashmap map;
+	struct entry *first;
+	int result = 0;
+
+	/* trivial case: one side is empty */
+	if (!count1) {
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		return 0;
+	} else if (!count2) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		return 0;
+	}
+
+	memset(&map, 0, sizeof(map));
+	if (fill_hashmap(file1, file2, xpp, env, &map,
+			line1, count1, line2, count2))
+		return -1;
+
+	/* are there any matching lines at all? */
+	if (!map.has_matches) {
+		while(count1--)
+			env->xdf1.rchg[line1++ - 1] = 1;
+		while(count2--)
+			env->xdf2.rchg[line2++ - 1] = 1;
+		xdl_free(map.entries);
+		return 0;
+	}
+
+	first = find_longest_common_sequence(&map);
+	if (first)
+		result = walk_common_sequence(&map, first,
+			line1, count1, line2, count2);
+	else
+		result = fall_back_to_classic_diff(&map,
+			line1, count1, line2, count2);
+
+	xdl_free(map.entries);
+	return result;
+}
+
+int xdl_do_patience_diff(mmfile_t *file1, mmfile_t *file2,
+		xpparam_t const *xpp, xdfenv_t *env)
+{
+	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
+		return -1;
+
+	/* environment is cleaned up in xdl_diff() */
+	return patience_diff(file1, file2, xpp, env,
+			1, env->xdf1.nrec, 1, env->xdf2.nrec);
+}
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index a43aa72..1689085 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -290,7 +290,8 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 
 	xdl_free_classifier(&cf);
 
-	if (xdl_optimize_ctxs(&xe->xdf1, &xe->xdf2) < 0) {
+	if (!(xpp->flags & XDF_PATIENCE_DIFF) &&
+			xdl_optimize_ctxs(&xe->xdf1, &xe->xdf2) < 0) {
 
 		xdl_free_ctx(&xe->xdf2);
 		xdl_free_ctx(&xe->xdf1);
-- 
1.6.0.2.GIT
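
The "maximal sequence which preserves the order" from the commit message
is a longest increasing subsequence: with the unique pairs sorted by
their line number in the first file, one looks for the longest run whose
line numbers in the second file also increase.  Below is a stand-alone
sketch of that step on a plain int array, using the same idea as
binary_search() and find_longest_common_sequence() in the patch (one
candidate per sequence length plus back pointers).  The function names
and the array-based interface are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return the number of piles whose top value is smaller than "value".
 * tops[] holds indices into vals[], one per pile.
 */
static int find_pile(const int *vals, const int *tops, int piles, int value)
{
	int left = 0, right = piles;

	while (left < right) {
		int middle = left + (right - left) / 2;
		if (vals[tops[middle]] < value)
			left = middle + 1;
		else
			right = middle;
	}
	return left;
}

/*
 * Store one longest strictly increasing subsequence of vals[0..n-1]
 * in out[] (which must have room for n ints) and return its length,
 * or -1 if memory runs out.
 */
static int longest_increasing_subsequence(const int *vals, int n, int *out)
{
	int *tops, *prev;
	int i, piles = 0, length;

	if (n <= 0)
		return 0;
	assert(vals && out);
	tops = malloc(n * sizeof(*tops));
	prev = malloc(n * sizeof(*prev));
	if (!tops || !prev) {
		free(tops);
		free(prev);
		return -1;
	}

	for (i = 0; i < n; i++) {
		int pile = find_pile(vals, tops, piles, vals[i]);

		/* Remember the top of the pile to the left as predecessor. */
		prev[i] = pile ? tops[pile - 1] : -1;
		tops[pile] = i;
		if (pile == piles)
			piles++;
	}

	/* Walk the back pointers from the top of the last pile. */
	length = piles;
	for (i = tops[piles - 1]; i >= 0; i = prev[i])
		out[--piles] = vals[i];

	free(tops);
	free(prev);
	return length;
}
```

For the second-file line numbers 3, 1, 2, 5, 4 this yields the sequence
1, 2, 4 of length 3; those pairs would become the initial common lines,
and the gaps between them are what the algorithm recurses into.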

* Re: Error: unable to unlink ... when using "git gc"
From: Sitaram Chamarty @ 2009-01-07 18:00 UTC (permalink / raw)
  To: git
In-Reply-To: <200901070948.34117.bss@iguanasuicide.net>

On 2009-01-07, Boyd Stephen Smith Jr. <bss@iguanasuicide.net> wrote:

> On Wednesday 2009 January 07 04:55:56 you wrote:
>> So when you say "group", you're saying "0660", and when you
>> say "0660", you're overriding users umask value.

> it could just have been the version of git I was using (1.4.4.4, IIRC) --
> still using that in at least one place, as it is the current version in
> Debian Etch.

1.4.4.4 is 2 years and 2 days old today!  [I've heard
stories about Debian, but never thought it was this
conservative!]

and I think this was fixed in 06cbe85, last April.

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Giuseppe Bilotta @ 2009-01-07 18:03 UTC (permalink / raw)
  To: Joey Hess; +Cc: git
In-Reply-To: <20090107155023.GA16540@gnu.kitenet.net>

On Wed, Jan 7, 2009 at 4:50 PM, Joey Hess <joey@kitenet.net> wrote:
> Giuseppe Bilotta wrote:
>> In this patch you do NOT add titles to the rel=vcs links, which means that
>> everything works fine only if there is a single URL for each project. If a
>> project has different URLs, it's going to appear multiple times as _different_
>> projects to a spec-compliant reader.
>>
>> A possible solution would be to make @git_url_list into a map keyed by the
>> project name and having the description and repo URL(s) as values.
>
> Yes. I considered doing that, but didn't immediately see a way to get the
> project description w/o additional overhead (of looking it up a second
> time).

The solution I have in mind would be something like this: in summary
or projects list view (which are the views in which we put the links,
and also the views in which we look up the repo URL and the
description anyway), you fill up former @git_url_list (now
%project_metadata) looking up the repo description and URLs. You then
use this information both in the link tag and in the appropriate
places for the visible part of the webpage: you don't have a
significant overhead, because you're just moving the project
description retrieval early on.

You probably want to refactor the code by making a
git_get_project_metadata() sub that extends the current URL retrieval
by retrieving description and URLs. The routine can then be used
either for one or for all the projects, as needed.
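
gitweb itself is Perl, so the real thing would be a Perl hash, but the
shape of the proposed data can be shown with a small C sketch: one entry
per project, carrying the description and every URL, so that several
URLs end up under the same project instead of looking like different
projects.  The struct layout, MAX_URLS and find_project() are invented
for this illustration and are not part of gitweb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_URLS 4

/* One entry per project: the description plus all of its clone URLs. */
struct project_metadata {
	const char *name;
	const char *description;
	const char *urls[MAX_URLS];
	int nr_urls;
};

/* Look up a project by name; returns NULL if it is not in the table. */
static const struct project_metadata *find_project(
		const struct project_metadata *table, int nr, const char *name)
{
	int i;

	assert(table && name);
	for (i = 0; i < nr; i++)
		if (!strcmp(table[i].name, name))
			return &table[i];
	return NULL;
}
```

The table would be filled once per request; both the link tags in the
header and the visible project rows could then read from the same entry,
which is why no second description lookup is needed.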

> Thanks for the feedback. There are some changes happening to the
> microformat that should make gitweb's job slightly easier, I'll respin
> the patch soon.

Let me know about this too, I very much like the idea of this microformat.

-- 
Giuseppe "Oblomov" Bilotta

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Davide Libenzi @ 2009-01-07 18:10 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Pierre Habouzit, Linus Torvalds, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901071802190.7496@intel-tinevez-2-302>

On Wed, 7 Jan 2009, Johannes Schindelin wrote:

> 
> The patience diff algorithm produces slightly more intuitive output
> than the classic Myers algorithm, as it does not try to minimize the
> number of +/- lines first, but tries to preserve the lines that are
> unique.

Johannes, sorry I have not had time to follow this one. A couple of minor 
comments that arose just from glancing at the patch.



> +/*
> + *  LibXDiff by Davide Libenzi ( File Differential Library )
> + *  Copyright (C) 2003-2009 Davide Libenzi, Johannes E. Schindelin
> + *
> + *  This library is free software; you can redistribute it and/or
> + *  modify it under the terms of the GNU Lesser General Public
> + *  License as published by the Free Software Foundation; either
> + *  version 2.1 of the License, or (at your option) any later version.
> + *
> + *  This library is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *  Lesser General Public License for more details.
> + *
> + *  You should have received a copy of the GNU Lesser General Public
> + *  License along with this library; if not, write to the Free Software
> + *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + *
> + *  Davide Libenzi <davidel@xmailserver.org>

You do not need to give me credit for something I don't even know how it 
works ;)



> +static int fall_back_to_classic_diff(struct hashmap *map,
> +		int line1, int count1, int line2, int count2)
> +{
> +	/*
> +	 * This probably does not work outside Git, since
> +	 * we have a very simple mmfile structure.
> +	 *
> +	 * Note: ideally, we would reuse the prepared environment, but
> +	 * the libxdiff interface does not (yet) allow for diffing only
> +	 * ranges of lines instead of the whole files.
> +	 */
> +	mmfile_t subfile1, subfile2;
> +	xpparam_t xpp;
> +	xdfenv_t env;
> +
> +	subfile1.ptr = (char *)map->env->xdf1.recs[line1 - 1]->ptr;
> +	subfile1.size = map->env->xdf1.recs[line1 + count1 - 2]->ptr +
> +		map->env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
> +	subfile2.ptr = (char *)map->env->xdf2.recs[line2 - 1]->ptr;
> +	subfile2.size = map->env->xdf2.recs[line2 + count2 - 2]->ptr +
> +		map->env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
> +	xpp.flags = map->xpp->flags & ~XDF_PATIENCE_DIFF;
> +	if (xdl_do_diff(&subfile1, &subfile2, &xpp, &env) < 0)
> +		return -1;

xdiff allows for diffing ranges, and the most efficient method is exactly 
how you did ;) Once you know the lines pointers, there's no need to pass 
it the whole file and have it scan it whole to find the lines range it has 
to diff. Just pass the limited view like you did.


- Davide

* Re: Problems getting rid of large files using git-filter-branch
From: Sitaram Chamarty @ 2009-01-07 18:18 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.DEB.1.00.0901071101480.7496@intel-tinevez-2-302>

On 2009-01-07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> $ git verify-pack -v
>> .git/objects/pack/pack-1e039b82d8ae53ef5ec3614a3021466663cc70a4
>> Terminated
>
> I did
>
> 	$ git grep Terminated
>
> and came up empty :-)

It comes from libc, afaik.

* Google Summer of Code 2009
From: Shawn O. Pearce @ 2009-01-07 18:30 UTC (permalink / raw)
  To: git

Google just pre-announced that Summer of Code 2009 will run.  It's
going to be a bit smaller than last year's program due to the poor
worldwide economic climate, but the open source community is quite
fortunate that Google is footing the bill for yet another summer of
students hacking on open source projects.

Given our success last year, I think we should try applying again
this year as a mentoring organization.  I've started to put the
wiki pages together by cloning last year's stuff:

  Organization application:
    http://git.or.cz/gitwiki/SoC2009Application

  Organization ideas page:
    http://git.or.cz/gitwiki/SoC2009Ideas

FWIW, the folks who organize GSoC thought our ideas page is among
the better ones they've seen year-after-year.  So I'm largely
keeping the same format.  :-)

The application answers really need to be reworked; we need to
address our 2008 results in these parts.

The ideas box is once again open for suggestions.  Please start
proposing student projects, and possible mentors.

Since the program is smaller, there is a chance we won't be accepted
this year due to space constraints.  But I think it's still worthwhile
to prepare everything and hope for the best.  And before you can
ask, no, my employee status with OSPO doesn't improve our odds
for acceptance.  :-)

-- 
Shawn.

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 18:32 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Pierre Habouzit, Linus Torvalds, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.10.0901071001360.16651@alien.or.mcafeemobile.com>

Hi,

On Wed, 7 Jan 2009, Davide Libenzi wrote:

> On Wed, 7 Jan 2009, Johannes Schindelin wrote:
> 
> > The patience diff algorithm produces slightly more intuitive output 
> > than the classic Myers algorithm, as it does not try to minimize the 
> > number of +/- lines first, but tries to preserve the lines that are 
> > unique.
> 
> Johannes, sorry I have not had time to follow this one.

What?  You mean you actually took some time off around Christmas???

:-)

> A couple of minor comments that arose just from glancing at the patch.

Thanks.

> > +/*
> > + *  LibXDiff by Davide Libenzi ( File Differential Library )
> > + *  Copyright (C) 2003-2009 Davide Libenzi, Johannes E. Schindelin
> > + *
> > + *  This library is free software; you can redistribute it and/or
> > + *  modify it under the terms of the GNU Lesser General Public
> > + *  License as published by the Free Software Foundation; either
> > + *  version 2.1 of the License, or (at your option) any later version.
> > + *
> > + *  This library is distributed in the hope that it will be useful,
> > + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + *  Lesser General Public License for more details.
> > + *
> > + *  You should have received a copy of the GNU Lesser General Public
> > + *  License along with this library; if not, write to the Free Software
> > + *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> > + *
> > + *  Davide Libenzi <davidel@xmailserver.org>
> 
> You do not need to give me credit for something I don't even know how it 
> works ;)

Well, I meant to pass the copyright to you, or at least share it.

> > +static int fall_back_to_classic_diff(struct hashmap *map,
> > +		int line1, int count1, int line2, int count2)
> > +{
> > +	/*
> > +	 * This probably does not work outside Git, since
> > +	 * we have a very simple mmfile structure.
> > +	 *
> > +	 * Note: ideally, we would reuse the prepared environment, but
> > +	 * the libxdiff interface does not (yet) allow for diffing only
> > +	 * ranges of lines instead of the whole files.
> > +	 */
> > +	mmfile_t subfile1, subfile2;
> > +	xpparam_t xpp;
> > +	xdfenv_t env;
> > +
> > +	subfile1.ptr = (char *)map->env->xdf1.recs[line1 - 1]->ptr;
> > +	subfile1.size = map->env->xdf1.recs[line1 + count1 - 2]->ptr +
> > +		map->env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
> > +	subfile2.ptr = (char *)map->env->xdf2.recs[line2 - 1]->ptr;
> > +	subfile2.size = map->env->xdf2.recs[line2 + count2 - 2]->ptr +
> > +		map->env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
> > +	xpp.flags = map->xpp->flags & ~XDF_PATIENCE_DIFF;
> > +	if (xdl_do_diff(&subfile1, &subfile2, &xpp, &env) < 0)
> > +		return -1;
> 
> xdiff allows for diffing ranges, and the most efficient method is exactly 
> how you did ;) Once you know the lines pointers, there's no need to pass 
> it the whole file and have it scan it whole to find the lines range it 
> has to diff. Just pass the limited view like you did.

Heh.

Could it be that you misread my patch, and assumed that I faked an 
xdfenv?

I did not, but instead faked two mmfiles, which is only as simple as I did 
it because in git.git, we only have contiguous mmfiles.  (I recall that 
libxdiff allows for ropes instead of arrays.)

The way I did it has one big shortcoming: I need to prepare an xdfenv for 
the subfiles even if I already prepared one for the complete files.  IOW 
the lines are rehashed all over again.
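
The "faked mmfile" trick depends on the lines lying back to back in one
buffer: the view starts at the first line's pointer and ends after the
last line.  A reduced sketch of that pointer arithmetic follows; struct
line_rec and struct buf_view are hypothetical stand-ins for xdiff's
record and mmfile types, and each record's size is assumed to include
the trailing newline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for xdiff's record and mmfile structures. */
struct line_rec {
	const char *ptr;	/* start of the line inside the buffer */
	size_t size;		/* length of the line, including the newline */
};

struct buf_view {
	const char *ptr;
	size_t size;
};

/*
 * Build a view covering "count" lines starting at the 1-based "line".
 * This only works if all lines live back to back in one buffer.
 */
static struct buf_view view_of_lines(const struct line_rec *recs,
		int line, int count)
{
	const struct line_rec *first, *last;
	struct buf_view view;

	assert(recs && line >= 1 && count >= 1);
	first = &recs[line - 1];
	last = &recs[line + count - 2];

	view.ptr = first->ptr;
	view.size = (size_t)(last->ptr + last->size - first->ptr);
	return view;
}
```

With a rope-like file representation the subtraction of two line
pointers would be meaningless, which is the reason for the "probably
does not work outside Git" remark in the patch.  The view also carries
no hashes, so whoever consumes it has to hash the lines again.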

Ciao,
Dscho

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Joey Hess @ 2009-01-07 18:41 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git
In-Reply-To: <cb7bb73a0901071003m77482a99wf6f3988beb5b5e78@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 748 bytes --]

Giuseppe Bilotta wrote:
> > Thanks for the feedback. There are some changes happening to the
> > microformat that should make gitweb's job slightly easier, I'll respin
> > the patch soon.
> 
> Let me know about this too, I very much like the idea of this microformat.

FYI, I've updated the microformat's page with the changes. The
significant one for gitweb is that it can now be applied to <a> links.
So on the project page, the display of the git URL could be converted to
a link using the microformat, and there's no need to get the info
earlier to put it in the header. Unfortunately, the same can't be done to
the project list page, unless it's changed to have "git" links as seen
on vger.kernel.org's gitweb.

-- 
see shy jo

^ permalink raw reply

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Joey Hess @ 2009-01-07 18:45 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git
In-Reply-To: <cb7bb73a0901071003m77482a99wf6f3988beb5b5e78@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 976 bytes --]

Giuseppe Bilotta wrote:
> The solution I have in mind would be something like this: in summary
> or projects list view (which are the views in which we put the links,
> and also the views in which we look up the repo URL and the
> description anyway), you fill up former @git_url_list (now
> %project_metadata) looking up the repo description and URLs. You then
> use this information both in the link tag and in the appropriate
> places for the visible part of the webpage: you don't have a
> significant overhead, because you're just moving the project
> description retrieval early on.
> 
> You probably want to refactor the code by making a
> git_get_project_metadata() sub that extends the current URL retrieval
> by retrieving description and URLs. The routine can then be used
> either for one or for all the projects, as needed.

Another approach would be to just memoize git_get_project_description
and git_get_project_url_list.
 
-- 
see shy jo

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Linus Torvalds @ 2009-01-07 18:59 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Johannes Schindelin, Pierre Habouzit, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.10.0901071001360.16651@alien.or.mcafeemobile.com>

On Wed, 7 Jan 2009, Davide Libenzi wrote:
> 
> xdiff allows for diffing ranges, and the most efficient method is exactly 
> how you did ;)

No, the performance problem is how it needs to re-hash everything. xdiff 
doesn't seem to have any way to use a "subset of the hash", so what the 
patience diff does is to use a subset of the mmfile, and then that will 
force all the rehashing to take place, which is kind of sad.

It would be nice (for patience diff) if it could partition the _hashes_ 
instead of partitioning the _data_. That way it wouldn't need to rehash. 
See?
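
A rough sketch of what partitioning the hashes could look like, under loud assumptions: hash_line is a hypothetical djb2-style hash (xdiff uses its own), and hash_range and sub_range are illustrative names, not part of xdiff. Each line is hashed once up front; a sub-range is then only a pointer into the hash array, so no line text is read again.

```c
#include <stddef.h>

/* Hypothetical per-line hash (djb2 style); xdiff uses its own function. */
unsigned long hash_line(const char *ptr, long size)
{
	unsigned long ha = 5381;
	long i;

	for (i = 0; i < size; i++)
		ha = ha * 33 + (unsigned char)ptr[i];
	return ha;
}

/* A range of already hashed lines; the text itself is not touched again. */
struct hash_range {
	const unsigned long *ha;
	long count;
};

/* Lines line1 .. line1 + count - 1 (1-based) of an already hashed file. */
struct hash_range sub_range(const unsigned long *all, long line1, long count)
{
	struct hash_range r;

	r.ha = all + (line1 - 1);
	r.count = count;
	return r;
}
```

Compare this with handing over a slice of the file data, where the callee only sees bytes and has to recompute every line hash.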

		Linus

^ permalink raw reply

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Joey Hess @ 2009-01-07 19:02 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git
In-Reply-To: <20090107184515.GB31795@gnu.kitenet.net>

[-- Attachment #1: Type: text/plain, Size: 241 bytes --]

Joey Hess wrote:
> Another approach would be to just memoize git_get_project_description
> and git_get_project_url_list.

Especially since git_get_project_description is already called more than
once for some pages.

-- 
see shy jo

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Boyd Stephen Smith Jr. @ 2009-01-07 19:46 UTC (permalink / raw)
  To: Sitaram Chamarty; +Cc: git
In-Reply-To: <slrngm9rdm.gcv.sitaramc@sitaramc.homelinux.net>

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

On Wednesday 2009 January 07 12:00:22 Sitaram Chamarty wrote:
> On 2009-01-07, Boyd Stephen Smith Jr. <bss@iguanasuicide.net> wrote:
> > On Wednesday 2009 January 07 04:55:56 you wrote:
> >> So when you say "group", you're saying "0660", and when you
> >> say "0660", you're overriding users umask value.
> >
> > it could just have been the version of git I was using (1.4.4.4, IIRC)
> > -- still using that in at least one place, as it is the current
> > version in Debian Etch.
>
> 1.4.4.4 is 2 years and 2 days old today!  [I've heard
> stories about Debian, but never thought it was this
> conservative!]

Once a stable is released, no new versions of packages come in, only 
backported bug and security fixes.  The pre-release freeze also limits new 
versions from being considered.  Lenny should be out RSN, so I can move up to 
1.5.6.5. :)

$ apt-cache policy git-core
git-core:
  Installed: 1:1.4.4.4-4
  Candidate: 1:1.4.4.4-4
  Version table:
     1:1.6.0.6-1 0
        300 http://localhost experimental/main Packages
     1:1.5.6.5-2 0
        700 http://localhost testing/main Packages
        500 http://localhost unstable/main Packages
     1:1.5.6.5-1~bpo40+1 0
        800 http://localhost etch-backports/main Packages
 *** 1:1.4.4.4-4 0
        900 http://localhost stable/main Packages
        100 /var/lib/dpkg/status
     1:1.4.4.4-2.1+etch1 0
        900 http://localhost stable/updates/main Packages
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 20:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Davide Libenzi, Pierre Habouzit, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901071056470.3057@localhost.localdomain>

Hi,

On Wed, 7 Jan 2009, Linus Torvalds wrote:

> On Wed, 7 Jan 2009, Davide Libenzi wrote:
> > 
> > xdiff allows for diffing ranges, and the most efficient method is 
> > exactly how you did ;)
> 
> No, the performance problem is how it needs to re-hash everything. xdiff 
> doesn't seem to have any way to use a "subset of the hash", so what the 
> patience diff does is to use a subset of the mmfile, and then that will 
> force all the rehashing to take place, which is kind of sad.
> 
> It would be nice (for patience diff) if it could partition the _hashes_ 
> instead of partitioning the _data_. That way it wouldn't need to rehash. 
> See?

Actually, for libxdiff (non-patience), this is not possible, as 
xdl_cleanup_records() rewrites the hashes so that they are in a contiguous 
interval (0, ..., N-1), where N is the number of distinct lines found.

I am also pretty certain that at least the non-patience part needs the 
maximum of that "hash" to be as small as possible.
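
For readers unfamiliar with that rewriting step, a minimal sketch of the idea follows. It is not libxdiff's code: densify is a hypothetical helper, it compares raw hash values only (a real implementation would also compare line contents on a hash match), and it uses a quadratic scan for brevity. It replaces arbitrary hash values with small class numbers 0, ..., N-1 in order of first appearance.

```c
/*
 * Map arbitrary line hashes to dense class ids 0 .. N-1, assigned in
 * order of first appearance. Returns N, the number of distinct values.
 * Quadratic scan for brevity; a hash table would be used in practice.
 */
long densify(const unsigned long *hashes, long n, long *ids)
{
	long classes = 0;
	long i, j;

	for (i = 0; i < n; i++) {
		/* look for an earlier line with the same hash */
		for (j = 0; j < i; j++)
			if (hashes[j] == hashes[i])
				break;
		ids[i] = (j < i) ? ids[j] : classes++;
	}
	return classes;
}
```

Small dense ids make it cheap to index per-class tables, which fits the remark above that the maximum should be as small as possible.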

Having said that, if Linus likes patience diff enough to want it faster, 
Dscho will improve the speed by avoiding the rehashing for the patience 
diff part (although the lines for which patience has to fall back to Myers 
diff will _still_ rehash, but that does not hurt _that_ much in practice).

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Davide Libenzi @ 2009-01-07 20:09 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Pierre Habouzit, Linus Torvalds, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901071924350.7496@intel-tinevez-2-302>

On Wed, 7 Jan 2009, Johannes Schindelin wrote:

> Heh.
> 
> Could it be that you misread my patch, and assumed that I faked an 
> xdfenv?
> 
> I did not, but instead faked two mmfiles, which is only as simple as I did 
> it because in git.git, we only have contiguous mmfiles.  (I recall that 
> libxdiff allows for ropes instead of arrays.)
> 
> The way I did it has one big shortcoming: I need to prepare an xdfenv for 
> the subfiles even if I already prepared one for the complete files.  IOW 
> the lines are rehashed all over again.

I told you I just glanced at the code :)
In that way, if you guys decide to merge this new algo, you'll need to 
split the prepare from the optimize, and feed it with an already prepared 
env.
Before going that way, have you ever tried to tweak xdl_cleanup_records 
and xdl_clean_mmatch to reduce the level of optimization, and see the 
results you get? It is possible that you won't need two different algos 
inside git.

- Davide

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Davide Libenzi @ 2009-01-07 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Pierre Habouzit, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901071056470.3057@localhost.localdomain>

On Wed, 7 Jan 2009, Linus Torvalds wrote:

> On Wed, 7 Jan 2009, Davide Libenzi wrote:
> > 
> > xdiff allows for diffing ranges, and the most efficient method is exactly 
> > how you did ;)
> 
> No, the performance problem is how it needs to re-hash everything. xdiff 
> doesn't seem to have any way to use a "subset of the hash", so what the 
> patience diff does is to use a subset of the mmfile, and then that will 
> force all the rehashing to take place, which is kind of sad.
> 
> It would be nice (for patience diff) if it could partition the _hashes_ 
> instead of partitioning the _data_. That way it wouldn't need to rehash. 
> See?

Yeah, saw that afterwards ;)



- Davide

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Sam Vilain @ 2009-01-07 20:15 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Pierre Habouzit, Linus Torvalds, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <alpine.DEB.1.00.0901062037250.30769@pacific.mpi-cbg.de>

On Tue, 2009-01-06 at 20:40 +0100, Johannes Schindelin wrote:
> Although I would like to see it in once it is fleshed out -- even if it 
> does not meet our usefulness standard -- because people said Git is 
> inferior for not providing a patience diff.  If we have --patience, we can 
> say "but we have it, it's just not useful, check for yourself".

Whatever happens, the current deterministic diff algorithm needs to stay
for generating patch-id's... those really can't be allowed to change.

Sam.

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 20:19 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Pierre Habouzit, Linus Torvalds, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.10.0901071159060.17115@alien.or.mcafeemobile.com>

Hi,

On Wed, 7 Jan 2009, Davide Libenzi wrote:

> On Wed, 7 Jan 2009, Johannes Schindelin wrote:
> 
> > Could it be that you misread my patch, and assumed that I faked an 
> > xdfenv?
> > 
> > I did not, but instead faked two mmfiles, which is only as simple as I did 
> > it because in git.git, we only have contiguous mmfiles.  (I recall that 
> > libxdiff allows for ropes instead of arrays.)
> > 
> > The way I did it has one big shortcoming: I need to prepare an xdfenv for 
> > the subfiles even if I already prepared one for the complete files.  IOW 
> > the lines are rehashed all over again.
> 
> I told you I just glanced at the code :)
> In that way, if you guys decide to merge this new algo, you'll need to 
> split the prepare from the optimize, and feed it with an already prepared 
> env.

Right.

> Before going that way, have you ever tried to tweak xdl_cleanup_records 
> and xdl_clean_mmatch to reduce the level of optimization, and see the 
> results you get? It is possible that you won't need two different algos 
> inside git.

No, I hadn't thought that libxdiff already determines uniqueness before 
actually running xdl_do_diff().

I also have to admit that I am not as clever as other people, and had 
quite a hard time figuring out as much as I did (for example, that rchg[i] 
== 1 means that this line is to be added/deleted, and that i is in the 
range 0, ..., N - 1 rather than 1, ..., N).

So it is quite possible that something patience-like can be done earlier.

However, I do not see a way to implement the recursion necessary for the 
patience diff.  Remember:

	patience(line range):
		find unique lines
		if no unique lines found:
			resort to classical diff
			return
		extract the longest common sequence of unique common lines
		between those, recurse
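
The "longest common sequence of unique common lines" step can be sketched as a longest increasing subsequence computed by patience sorting. This is an illustration, not the patch's implementation; longest_increasing is a hypothetical name. The assumed input is one entry per unique common line, listed in file 1 order, where seq[i] is that line's position in file 2.

```c
#include <stdlib.h>

/*
 * Longest strictly increasing subsequence of seq[0..n-1], computed with
 * patience sorting. Writes the chosen indices into out (room for n
 * entries) and returns how many were written, or -1 if malloc fails.
 */
long longest_increasing(const long *seq, long n, long *out)
{
	long *tops, *prev, piles = 0, i, k;

	if (n <= 0)
		return 0;
	tops = malloc(sizeof(*tops) * (size_t)n);  /* index of top card per pile */
	prev = malloc(sizeof(*prev) * (size_t)n);  /* top of the pile to the left */
	if (!tops || !prev) {
		free(tops);
		free(prev);
		return -1;
	}
	for (i = 0; i < n; i++) {
		long lo = 0, hi = piles;

		/* binary search: leftmost pile whose top is >= seq[i] */
		while (lo < hi) {
			long mid = lo + (hi - lo) / 2;

			if (seq[tops[mid]] < seq[i])
				lo = mid + 1;
			else
				hi = mid;
		}
		prev[i] = lo > 0 ? tops[lo - 1] : -1;
		tops[lo] = i;
		if (lo == piles)
			piles++;
	}
	/* walk the back pointers from the top of the last pile */
	for (k = piles - 1, i = tops[piles - 1]; k >= 0; k--, i = prev[i])
		out[k] = i;
	free(tops);
	free(prev);
	return piles;
}
```

The indices written to out identify the matched unique lines; the ranges between consecutive matches are what the pseudocode recurses into.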

When recursing, previously non-unique lines can turn unique, of course.  
And I do not see how that recursion could be done before 
xdl_clean_mmatch(), short of redoing the hashing and cleaning records up.

Of course, it might well be possible, but I am already out of my depth 
reading something like "rdis0" and "rpdis1", and being close to despair.

:')

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-07 20:25 UTC (permalink / raw)
  To: Sam Vilain
  Cc: Johannes Schindelin, Pierre Habouzit, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <1231359317.6011.12.camel@maia.lan>

On Thu, 8 Jan 2009, Sam Vilain wrote:
> 
> Whatever happens, the current deterministic diff algorithm needs to stay
> for generating patch-id's... those really can't be allowed to change.

Sure they can.

We never cache patch-id's over a long time. And we _have_ changed xdiff to 
modify the output of the patches before, quite regardless of any patience 
issues: see commit 9b28d55401a529ff08c709f42f66e765c93b0a20, which 
admittedly doesn't affect any _normal_ diffs, but can generate subtly 
different results for some cases.

It's true that we want the diff algorithm to be deterministic in the sense 
that over the run of a _single_ rebase operation, the diff between two 
files should give similar and deterministic results, but that's certainly 
true of patience diff too.

			Linus

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 20:38 UTC (permalink / raw)
  To: Sam Vilain
  Cc: Pierre Habouzit, Linus Torvalds, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <1231359317.6011.12.camel@maia.lan>

Hi,

On Thu, 8 Jan 2009, Sam Vilain wrote:

> On Tue, 2009-01-06 at 20:40 +0100, Johannes Schindelin wrote:
> > Although I would like to see it in once it is fleshed out -- even if it 
> > does not meet our usefulness standard -- because people said Git is 
> > inferior for not providing a patience diff.  If we have --patience, we can 
> > say "but we have it, it's just not useful, check for yourself".
> 
> Whatever happens, the current deterministic diff algorithm needs to stay
> for generating patch-id's... those really can't be allowed to change.

Oh, there is no discussion about replacing the diff algorithm we have 
right now.

That is so even though we never output patch-ids anywhere; we always 
recalculate them, and would therefore be free to replace the diff algorithm 
with something else.

The reason I do not want patience diff to replace the existing code 
is:

- patience diff is slower,
- patience diff is often not worth it (produces the same, maybe even 
  worse output), and
- patience diff needs the existing code as a fallback anyway.

Where it could possibly help to change existing behavior is with merging.

So maybe somebody has some time to play with, and can apply this patch:

-- snip --
diff --git a/ll-merge.c b/ll-merge.c
index fa2ca52..f731026 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -79,6 +79,8 @@ static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,
 	memset(&xpp, 0, sizeof(xpp));
 	if (git_xmerge_style >= 0)
 		style = git_xmerge_style;
+	if (getenv("GIT_PATIENCE_MERGE"))
+		xpp.flags |= XDF_PATIENCE_DIFF;
 	return xdl_merge(orig,
 			 src1, name1,
 			 src2, name2,
-- snap --

After compiling and installing, something like this should be fun to 
watch:

	$ git rev-list --all --parents | \
	  grep " .* " | \
	  while read commit parent1 parent2 otherparents
	  do
		test -z "$otherparents" || continue
		git checkout $parent1 &&
		git merge $parent2 &&
		git diff > without-patience.txt &&
		git reset --hard $parent1 &&
		GIT_PATIENCE_MERGE=1 git merge $parent2 &&
		git diff > with-patience.txt &&
		if ! cmp without-patience.txt with-patience.txt
		then
			echo '==============================='
			echo "differences found in merge $commit"
			echo "without patience: $(wc -l < without-patience.txt)"
			echo "with patience: $(wc -l < with-patience.txt)"
			echo '-------------------------------'
			echo 'without patience:'
			cat without-patience.txt
			echo '-------------------------------'
			echo 'with patience:'
			cat with-patience.txt
		fi ||
		exit
	  done | tee patience-merge.out

Ciao,
Dscho

^ permalink raw reply related

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Junio C Hamano @ 2009-01-07 20:48 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Sam Vilain, Pierre Habouzit, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901072121260.7496@intel-tinevez-2-302>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> After compiling and installing, something like this should be fun to 
> watch:
>
> 	$ git rev-list --all --parents | \
> 	  grep " .* " | \
> 	  while read commit parent1 parent2 otherparents
> 	  do
> 		test -z "$otherparents" || continue
> 		git checkout $parent1 &&
> 		git merge $parent2 &&
> 		git diff > without-patience.txt &&
> ...
> 		if ! cmp without-patience.txt with-patience.txt
> 		then
> 			echo '==============================='
> 			echo "differences found in merge $commit"
> ...
> 			cat with-patience.txt
> 		fi ||
> 		exit
> 	  done | tee patience-merge.out

An even more interesting test would be possible by dropping "&&" from the
two "git merge" invocations.

 - Your sample will exit at the first conflicting merge otherwise.

 - You may find cases where one resolves cleanly while the other leaves
   conflicts.

^ permalink raw reply

* Re: [PATCH] gitweb: don't use pathinfo for global actions
From: Junio C Hamano @ 2009-01-07 21:19 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Giuseppe Bilotta, git, Petr Baudis, Devin Doucette
In-Reply-To: <200901061837.23637.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> writes:

> Therefore it really needs to be in, as df63fb also by Giuseppe
> (gitweb: use href() when generating URLs in OPML) is already in,
> and I think gitweb would generate broken OPML and TXT links without
> this patch.
> ...
> Acked-by: Jakub Narebski <jnareb@gmail.com>

Thanks for reminding me.  Queued.

^ permalink raw reply

* Re: [PATCH] strbuf: instate cleanup rule in case of non-memory errors
From: Junio C Hamano @ 2009-01-07 21:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: Pierre Habouzit, Linus Torvalds, git
In-Reply-To: <4963C1EA.504@lsrfire.ath.cx>

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Make all strbuf functions that can fail free() their memory on error if
> they have allocated it.  They don't shrink buffers that have been grown,
> though.

Thanks; applied.

^ permalink raw reply
