Git development
 help / color / mirror / Atom feed
* Re: Google Summer of Code 2009
From: Shawn O. Pearce @ 2009-01-07 23:14 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git
In-Reply-To: <81b0412b0901071512k64a7d5e2u2c602b903f5233d3@mail.gmail.com>

Alex Riesen <raa.lkml@gmail.com> wrote:
> 2009/1/7 Shawn O. Pearce <spearce@spearce.org>:
> >
> >  Organization ideas page:
> >    http://git.or.cz/gitwiki/SoC2009Ideas
> 
> BTW, what happened to GitTorrent?

I got lazy and didn't copy everything over.  ;-)

GitTorrent and restartable clone both should probably be on the 2009
idea list, though GitTorrent already has a code base from the failed
2008 project that someone might be able to start and pick up from...

-- 
Shawn.

^ permalink raw reply

* Re: Google Summer of Code 2009
From: Alex Riesen @ 2009-01-07 23:12 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git
In-Reply-To: <20090107183033.GB10790@spearce.org>

2009/1/7 Shawn O. Pearce <spearce@spearce.org>:
>
>  Organization ideas page:
>    http://git.or.cz/gitwiki/SoC2009Ideas
>

BTW, what happened to GitTorrent?

^ permalink raw reply

* Re: Google Summer of Code 2009
From: Miklos Vajna @ 2009-01-07 23:11 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git
In-Reply-To: <20090107183033.GB10790@spearce.org>

[-- Attachment #1: Type: text/plain, Size: 297 bytes --]

On Wed, Jan 07, 2009 at 10:30:33AM -0800, "Shawn O. Pearce" <spearce@spearce.org> wrote:
> The ideas box is once again open for suggestions.  Please start
> proposing student projects, and possible mentors.

I think restartable git-clone from last year is still actual, and would
be nice to have.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 23:03 UTC (permalink / raw)
  To: Pierre Habouzit
  Cc: Junio C Hamano, Sam Vilain, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <20090107224504.GA29537@artemis.corp>

Hi,

On Wed, 7 Jan 2009, Pierre Habouzit wrote:

> On Wed, Jan 07, 2009 at 10:00:07PM +0000, Johannes Schindelin wrote:
> > Therefore I counted the lines between conflict markers (actually, a perl 
> > script did).  Of these 66 merges, on average patience merge produced 
> > 4.46774193548387 _fewer_ lines between conflict markers.
> > 
> > Take that with a grain of salt, though: the standard deviation of this 
> > difference is a hefty 121.163046639509 lines.
> > 
> > The worst case for patience diff was the merge 
> > 4698ef555a1706fe322a68a02a21fb1087940ac3, where the --cc diff line counts 
> > are 1300 (without) vs 1301 (with patience merge), but the lines between 
> > conflict markers are 197 vs a ridiculous 826 lines!
> > 
> > But you should take that also with a grain of salt: this merge is a 
> > _subtree_ merge, and my test redid it as a _non-subtree_ merge.
> > 
> > So I restricted the analysis to the non-subtree merges, and now 
> > non-patience merge comes out 6.97297297297297 conflict lines fewer than 
> > patience merge, with a standard deviation of 58.941106657867 (with a total 
> > count of 37 merges).
> > 
> > Note that ~7 lines difference with a standard deviation of ~59 lines is 
> > pretty close to ~0 lines difference.
> > 
> > In the end, the additional expense of patience merge might just not be 
> > worth it.
> 
> Depends, if it can help generating nicer merges, it's good to have.
> 
> We could have an option to git-merge that tries hard to generate the
> smallest conflict possible.

I also showed you examples, in addition to numbers, exactly to point out 
that shorter conflicts do not necessarily mean nicer conflicts.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-07 22:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901070743070.3057@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 4576 bytes --]

On Wed, 2009-01-07 at 08:07 -0800, Linus Torvalds wrote:
> Well, that's not necessarily "unfortunate". It does actually end up 
> showing that the objects themselves were apparently never really corrupt.
> 
> So there is no fundamental data structure corrupttion - because when you 
> copy the repository, it's all good agin!
>  - it could be some _temporary_ git corruption caused internally inside a 
>    git process - ie a wild pointer, or perhaps a race condition (but we 
>    don't really use threading in 1.6.0.4 unless you ask for it, and even 
>    then just for pack-file generation)

I have a feeling it's something like this, one of our operations guys
did some research while I was looking at code and he came across this:

        On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
        git-merge is using too much RAM, and failing to malloc() but
        NOT  
        > reporting it.  This is all sorts of bad:
        > 
        >   A) using an unscalable amount of RAM
        >   B) failing to detect malloc() failure
        >   C) reporting file corruption instead
        > I was able to reproduce this.
        >
        > limit ~1.5GB -> corrupt file
        > limit ~3GB -> magically no longer corrupt.
        >
        > The false fail may be limited to git-merge, but git status also  
        > allocates the same amount of RAM.
        > 
        > To temporarily work around this problem, issue this once you
        log in to  
        > a dev box:
        > 
        > tcsh:
        >         limit vmemoryuse 3000000
        > bash:
        >         ulimit -v 3000000
        > 
        > Be gentle.
        

> And quite frankly, since the corruption seems to be site-specific, I 
> really do suspect the second case. Although it's possible, of course, that 
> it could be some compiler issue that makes _your_ binaries have issues 
> even when nobody else sees it.

I think you're correct insofar that our major site-specific alteration
has come up on the mailing list before (okay maybe two site-specific
things). 
	* Our Git repo is ~7.1GB
	* ulimit -v is set to ~1.5G


I think I know how this could be failing and corrupting things (assuming
it's malloc(2)) related.


What I'm thinking is that in xmalloc() or one of the other x*)_
functions, the malloc(size) is failing because of the ulimits, and then
the potentially somewhere it's silently failing or maybe even
accidentally returning one of those "malloc(1)" pointers?

I've got two new tarred repositories from two developers the issue
happened to today, so I'm flush full of sample repositories to try stuff
on :)


> 
> Hmm. That's actually _normal_ under some circumstances. At least with 
> older git versions, or if your .git/index file couldn't be rewritten for 
> some reason - your existing index file contains all the old stat 
> information, and if git cannot (or, in the case of older git version, just 
> will not) refresh it automatically, it will show all the files as changed, 
> even if it's just the inode number that really changed.
> 
> A _normal_ git install should have auto-refreshed the index, though. 
> Unless the tar archive only contained the ".git" directory, and not the 
> checkout?

I believe the issues I noticed when untarring the repo were a red
herring, I did the `git diff` after untarring and I noticed that only a
certain set of files where changed, I'm willing to go so far as to guess
that they were the files affected in the corrupted packs. Of the 32k
files in our repository, 98 were actually different after untarring
(according to git-diff(1))

> And dobody else saw it than this one person, and it was a total mystery to 
> everybody until we realized that he used this one feature that nobody else 
> was using. So as you're on OS X, I assume you don't have CRLF conversion, 
> but maybe you use some other feature that we support but nobody really 
> actually uses. Like keyword expansion or something?

The two new folks this happened to today had nothing "special" about
them other than the ulimit.


I've got the script(1) output of performing git-ls-files(1) and some
other commands that I tried, nothing they output was particular
informative or interesting, and I don't think it will help if this
really is a memory related issue, that said I'd be more than happy to
send it to a couple of you (Junio, Linus, Nico).


I'm *so* ready for this bug to die >=\


Cheers

-- 
-R. Tyler Ballance
Slide, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Pierre Habouzit @ 2009-01-07 22:45 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Sam Vilain, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901072213570.7496@intel-tinevez-2-302>

[-- Attachment #1: Type: text/plain, Size: 2168 bytes --]

On Wed, Jan 07, 2009 at 10:00:07PM +0000, Johannes Schindelin wrote:
> Therefore I counted the lines between conflict markers (actually, a perl 
> script did).  Of these 66 merges, on average patience merge produced 
> 4.46774193548387 _fewer_ lines between conflict markers.
> 
> Take that with a grain of salt, though: the standard deviation of this 
> difference is a hefty 121.163046639509 lines.
> 
> The worst case for patience diff was the merge 
> 4698ef555a1706fe322a68a02a21fb1087940ac3, where the --cc diff line counts 
> are 1300 (without) vs 1301 (with patience merge), but the lines between 
> conflict markers are 197 vs a ridiculous 826 lines!
> 
> But you should take that also with a grain of salt: this merge is a 
> _subtree_ merge, and my test redid it as a _non-subtree_ merge.
> 
> So I restricted the analysis to the non-subtree merges, and now 
> non-patience merge comes out 6.97297297297297 conflict lines fewer than 
> patience merge, with a standard deviation of 58.941106657867 (with a total 
> count of 37 merges).
> 
> Note that ~7 lines difference with a standard deviation of ~59 lines is 
> pretty close to ~0 lines difference.
> 
> In the end, the additional expense of patience merge might just not be 
> worth it.

Depends, if it can help generating nicer merges, it's good to have.

We could have an option to git-merge that tries hard to generate the
smallest conflict possible. _that_ would really really be worth it. I
mean, I've had really really tricky conflicts to work with where
git-merge genrated ridiculously big conflicts, and where I hard to
resort using UI tools to perform the merge (meld IIRC to name it), and
given how slow and crappy those tools are, I would gladly restart a
merge with a --generate-smallest-conflicts-as-possible if it can save me
from those merge tools.

YMMV though.

PS: I never thought the patience diff is a silver bullet, it's just yet
    another tool in the toolbox.
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [BUG PATCH RFC] mailinfo: correctly handle multiline 'Subject:' header
From: Kirill Smelkov @ 2009-01-07 22:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <1230316721-14339-1-git-send-email-kirr@mns.spb.ru>

On Fri, Dec 26, 2008 at 09:38:41PM +0300, Kirill Smelkov wrote:
> When native language (RU) is in use, subject header usually contains several
> parts, e.g.
> 
> Subject: [Navy-patches] [PATCH]
> 	=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
> 	=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
> 	=?utf-8?b?0YHQsdC+0YDQutC4?=

Which btw should be extracted by git-mailinfo to:

    'Subject: Изменён список пакетов необходимых для сборки'

> This exposes several bugs in builtin-mailinfo.c that I try to fix:
> 
> 
> 1. decode_b_segment: do not append explicit NUL -- explicit NUL was preventing
>    correct header construction on parts concatenation via strbuf_addbuf in
>    decode_header_bq. Fixes:
> 
> -Subject: Изменён список пакетов необходимых для сборки
> +Subject: Изменён список па
> 
> 
> Then
> 
> 2. (hackish) do not emit '\n' after processing of every header segment. It
>    seems we should emit previous part as-is only if it does not end with
>    '=?='. Fixes:
> 
> -Subject: Изменён список пакетов необходимых для сборки
> +Subject: Изменён список па кетов необходимых для сборки
> 
> 
> Sorry for low-quality patch and description. I did what I could and don't have
> energy and time dig more into MIME.
> 
> Please help.
> 
> Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
> 
> ---
>  builtin-mailinfo.c  |   18 ++++++++++++++++-
>  t/t5100-mailinfo.sh |    2 +-
>  t/t5100/info0012    |    5 ++++
>  t/t5100/msg0012     |    7 ++++++
>  t/t5100/patch0012   |   30 +++++++++++++++++++++++++++++
>  t/t5100/sample.mbox |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 112 insertions(+), 2 deletions(-)

Junio, All,

What about this patch?

It at least exposes bug in git-mailinfo wrt handling of multiline
subjects, and in very details documents it and adds a test for it.


Yes, my fixes are of 'low quality', but may I try to attract git
community attention one more time?


Thanks beforehand,
Kirill


P.S. original post with patch:

http://marc.info/?l=git&m=123031899307286&w=2

^ permalink raw reply

* Re: Comments on Presentation Notes Request.
From: Boyd Stephen Smith Jr. @ 2009-01-07 22:40 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git
In-Reply-To: <alpine.LNX.1.00.0901071654530.19665@iabervon.org>

[-- Attachment #1: Type: text/plain, Size: 436 bytes --]

On Wednesday 2009 January 07 16:30:04 Daniel Barkalow wrote:
> Git is clever about finding [...]
> the common ancestor of commits that don't have a common ancestor.

*confused*

Please elaborate.
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: Comments on Presentation Notes Request.
From: Daniel Barkalow @ 2009-01-07 22:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Tim Visher, git
In-Reply-To: <20090107063629.GB22616@coredump.intra.peff.net>

On Wed, 7 Jan 2009, Jeff King wrote:

> On Tue, Jan 06, 2009 at 05:33:02PM -0500, Tim Visher wrote:
> 
> > ** Advantages of SCM
> > *** One Source to Rule Them All.
> > *** Unlimited Undo/Redo.
> > *** Safe Concurrent Editing.
> > *** Diff Debugging
> 
> I would add to this metadata and "software archeology": finding the
> author of a change or piece of code, the motivation behind it, related
> changes (by position within history, by content, or by commit message),
> etc.

If you look at the git source code, the comments in the code are almost 
never sufficient to really understand the code, because a full 
line-by-line explanation would make it hard to find the code under the 
comments. On the other hand, if you take "git blame" in one window and a 
series of "git show"s in another window, and look at the commit messages 
for the commits that introduced each of those lines, you get really 
detailed and in-depth documentation of the subtle changes.

> I think people who have not used an SCM before, and people coming from
> SCMs where it is painful to look at history (like CVS) undervalue this
> because it's not part of their workflow.  But having used git for a few
> years now, it is an integral part of how I develop (especially when
> doing maintenance or bugfixes).
> 
> You touch on this in "Diff Debugging", but I think bisection is just a
> part of it.
> 
> > * SCM Best Practices
> >
> > ** Commit Early, Commit Often
> > ** Don't Commit Broken Code (To the Public Tree)
> 
> People talk a lot about using their SCM on a plane, but I think these
> two seemingly opposite commands highlight the _real_ useful thing about
> a distributed system for most people: commit and publish are two
> separate actions.
> 
> So I think it might be better to say "Commit Early, Commit Often" but
> "Don't _Publish_ Broken Code". Which is what you end up saying in the
> discussion, but I think using that terminology makes clear the important
> distinction between two actions that are convoluted in centralized
> systems.
> 
> > *** Backup Becomes A Separate Process
> > Because there is only a single repository, you need a back-up strategy
> > or else you are exposing yourself to a single point of failure.
> > [...]
> > *** Natural Backup
> > Because every developer has a copy of the repository, every developer
> > you add adds an extra failure point.  The more developers you have,
> > the more backups you have of the repository.
> 
> The "natural backup" thing gets brought out a lot for DVCS. And it is
> sort of true: instead of each developer having a backup of the latest
> version (or some recent version which they checked out), they have a
> backup of the whole history. But they still might not have everything.
> Developers might not clone all branches. They might not be up to date
> with some "master" repository. Useful work might be unpublished in the
> master repo (e.g., I am working on feature X which is 99% complete, but
> not ready for me to merge into master and push).

It is the case that everything in the central repo (including speculative 
stuff) will also be on its author's machine, with the metadata needed to 
identify that it's not in the main history and how everything is supposed 
to be arranged. This is likely to be particularly helpful for the work 
that everybody did between the last backup and the crash.

> So yes, you are much more likely to salvage useful (if not all) data
> from developer repositories in the event of a crash. But I still think
> it's crazy not to have a backup strategy for your DVCS repo.

I think it's very important to have a backup strategy, but it's nice that 
the developers can get work done while the server is still down.

> > ** Excellent Merge algorithms
> > 
> > Git has excellent merge algorithms.  This is widely attributed and
> > doesn't require much explanation.  It was one of Git's original design
> > goals, and it has been proven by Git's implementation.  Merging in Git
> > is _much_ less painful than in other systems.
> 
> Actually, git has a really _stupid_ merge algorithm that has been around
> forever: the 3-way merge. And by stupid I don't mean bad, but just
> simple and predictable. I think the git philosophy is more about making
> it easy to merge often, and about making sure conflicts are simple to
> understand and fix, than it is about being clever.

Git is clever about finding the 3 inputs to the 3-way merge, particularly 
the common ancestor of commits that don't have a common ancestor. I think 
merge-recursive is novel to git, and may not be available anywhere else.

> Which isn't to say there aren't systems with less clever merge
> algorithms. CVS doesn't even do a 3-way merge, since it doesn't bother
> to remember where the last branch intersection was.

CVS did do 3-way merge, but only between your uncommited changes, the 
latest commit, and the common ancestor (the commit that you started 
changing). IIRC, arch actually didn't support 3-way merge at all.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 22:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Sam Vilain, Pierre Habouzit, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <7v63kqall2.fsf@gitster.siamese.dyndns.org>

Hi,

On Wed, 7 Jan 2009, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > After compiling and installing, something like this should be fun to 
> > watch:
> >
> > 	$ git rev-list --all --parents | \
> > 	  grep " .* " | \
> > 	  while read commit parent1 parent2 otherparents
> > 	  do
> > 		test -z "$otherparents" || continue
> > 		git checkout $parent1 &&
> > 		git merge $parent2 &&
> > 		git diff > without-patience.txt &&
> > ...
> > 		if ! cmp without-patience.txt with-patience.txt
> > 		then
> > 			echo '==============================='
> > 			echo "differences found in merge $commit"
> > ...
> > 			cat with-patience.txt
> > 		fi ||
> > 		exit
> > 	  done | tee patience-merge.out
> 
> An even more interesting test would be possible by dropping "&&" from the
> two "git merge" invocations.
> 
>  - Your sample will exit at the first conflicting merge otherwise.
> 
>  - You may find cases where one resolves cleanly while the other leaves
>    conflicts.

Yeah, that's why I always put "like" into that phrase "something like 
this"... :-)

Actually, I had to read something and did not want my box to sit idle 
while I was doing that, so...

The most interesting thing to me was: of the 4072 merges I have in my 
local git.git clone, only 66 show a difference.

The next interesting thing: none -- I repeat, none! -- resulted in only 
one of both methods having conflicts.  In all cases, if patience merge had 
conflicts, so had the classical merge, and vice versa.  I would have 
expected patience merge to handle some conflicts more gracefully.

So let's go on to the next metric: what are the differences in the --cc 
diffs' line counts?

On average, patience merge produced 1.03225806451613 more lines of --cc 
diff, and the standard deviation between the line counts is 
42.9823669772587.

So from the line counts' point of view, the difference is lost in the 
noise.

So let's look at a concrete example.  I take 
41a3e3aa9bdaede9ab7afed206428c1b071060d2, as it is one of the three merges 
with minimal --cc diff line counts (they all have 33 lines) and where 
patience merge makes a difference.

This is the --cc diff without patience merge:

-- snip --
diff --cc git-am.sh
index a391254,5a7695e..0000000
--- a/git-am.sh
+++ b/git-am.sh
@@@ -327,11 -327,20 +327,28 @@@ d
  			echo "Patch is empty.  Was it split wrong?"
  			stop_here $this
  		}
++<<<<<<< HEAD:git-am.sh
 +		SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
 +		case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
 +
 +		(printf '%s\n\n' "$SUBJECT"; cat "$dotest/msg") |
 +			git stripspace > "$dotest/msg-clean"
++=======
+ 		if test -f "$dotest/rebasing" &&
+ 			commit=$(sed -e 's/^From \([0-9a-f]*\) .*/\1/' \
+ 				-e q "$dotest/$msgnum") &&
+ 			test "$(git cat-file -t "$commit")" = commit
+ 		then
+ 			git cat-file commit "$commit" |
+ 			sed -e '1,/^$/d' >"$dotest/msg-clean"
+ 		else
+ 			SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
+ 			case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
+ 
+ 			(echo "$SUBJECT" ; echo ; cat "$dotest/msg") |
+ 				git stripspace > "$dotest/msg-clean"
+ 		fi
++>>>>>>> 5e835cac8622373724235d299f1331ac4cf81ccf:git-am.sh
  		;;
  	esac
  
-- snap --

Compare this with the --cc diff _with_ patience merge:

-- snip --
diff --cc git-am.sh
index a391254,5a7695e..0000000
--- a/git-am.sh
+++ b/git-am.sh
@@@ -327,11 -327,20 +327,25 @@@ d
  			echo "Patch is empty.  Was it split wrong?"
  			stop_here $this
  		}
- 		SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
- 		case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
- 
+ 		if test -f "$dotest/rebasing" &&
+ 			commit=$(sed -e 's/^From \([0-9a-f]*\) .*/\1/' \
+ 				-e q "$dotest/$msgnum") &&
+ 			test "$(git cat-file -t "$commit")" = commit
+ 		then
+ 			git cat-file commit "$commit" |
+ 			sed -e '1,/^$/d' >"$dotest/msg-clean"
+ 		else
+ 			SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
+ 			case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
+ 
++<<<<<<< HEAD:git-am.sh
 +		(printf '%s\n\n' "$SUBJECT"; cat "$dotest/msg") |
 +			git stripspace > "$dotest/msg-clean"
++=======
+ 			(echo "$SUBJECT" ; echo ; cat "$dotest/msg") |
+ 				git stripspace > "$dotest/msg-clean"
+ 		fi
++>>>>>>> 5e835cac8622373724235d299f1331ac4cf81ccf:git-am.sh
  		;;
  	esac
  
-- snap --

So, the patience merge resulted in a much smaller _conflict_.

However, another such merge is 276328ffb87cefdc515bee5f09916aea6e0244ed.  
This is the --cc diff without patience merge:

-- snip --
diff --cc diff.c
index 4e4e439,f91f256..0000000
--- a/diff.c
+++ b/diff.c
@@@ -1498,19 -1464,13 +1498,28 @@@ static void builtin_diff(const char *na
  	char *a_one, *b_two;
  	const char *set = diff_get_color_opt(o, DIFF_METAINFO);
  	const char *reset = diff_get_color_opt(o, DIFF_RESET);
 +	const char *a_prefix, *b_prefix;
 +
 +	diff_set_mnemonic_prefix(o, "a/", "b/");
 +	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 +		a_prefix = o->b_prefix;
 +		b_prefix = o->a_prefix;
 +	} else {
 +		a_prefix = o->a_prefix;
 +		b_prefix = o->b_prefix;
 +	}
  
++<<<<<<< HEAD:diff.c
 +	a_one = quote_two(a_prefix, name_a + (*name_a == '/'));
 +	b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
++=======
+ 	/* Never use a non-valid filename anywhere if at all possible */
+ 	name_a = DIFF_FILE_VALID(one) ? name_a : name_b;
+ 	name_b = DIFF_FILE_VALID(two) ? name_b : name_a;
+ 
+ 	a_one = quote_two(o->a_prefix, name_a + (*name_a == '/'));
+ 	b_two = quote_two(o->b_prefix, name_b + (*name_b == '/'));
++>>>>>>> e261cf94848d31868c21fb11cade51c30dfcdbe7:diff.c
  	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
  	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
  	fprintf(o->file, "%sdiff --git %s %s%s\n", set, a_one, b_two, reset);
-- snap --

And this is _with_ patience merge:

-- snip --
diff --cc diff.c
index 4e4e439,f91f256..0000000
--- a/diff.c
+++ b/diff.c
@@@ -1498,19 -1464,13 +1498,28 @@@ static void builtin_diff(const char *na
  	char *a_one, *b_two;
  	const char *set = diff_get_color_opt(o, DIFF_METAINFO);
  	const char *reset = diff_get_color_opt(o, DIFF_RESET);
 +	const char *a_prefix, *b_prefix;
 +
++<<<<<<< HEAD:diff.c
 +	diff_set_mnemonic_prefix(o, "a/", "b/");
 +	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 +		a_prefix = o->b_prefix;
 +		b_prefix = o->a_prefix;
 +	} else {
 +		a_prefix = o->a_prefix;
 +		b_prefix = o->b_prefix;
 +	}
  
 +	a_one = quote_two(a_prefix, name_a + (*name_a == '/'));
 +	b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
++=======
+ 	/* Never use a non-valid filename anywhere if at all possible */
+ 	name_a = DIFF_FILE_VALID(one) ? name_a : name_b;
+ 	name_b = DIFF_FILE_VALID(two) ? name_b : name_a;
+ 
+ 	a_one = quote_two(o->a_prefix, name_a + (*name_a == '/'));
+ 	b_two = quote_two(o->b_prefix, name_b + (*name_b == '/'));
++>>>>>>> e261cf94848d31868c21fb11cade51c30dfcdbe7:diff.c
  	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
  	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
  	fprintf(o->file, "%sdiff --git %s %s%s\n", set, a_one, b_two, reset);
-- snap --

So again, we have no clear winner.

Therefore I counted the lines between conflict markers (actually, a perl 
script did).  Of these 66 merges, on average patience merge produced 
4.46774193548387 _fewer_ lines between conflict markers.

Take that with a grain of salt, though: the standard deviation of this 
difference is a hefty 121.163046639509 lines.

The worst case for patience diff was the merge 
4698ef555a1706fe322a68a02a21fb1087940ac3, where the --cc diff line counts 
are 1300 (without) vs 1301 (with patience merge), but the lines between 
conflict markers are 197 vs a ridiculous 826 lines!

But you should take that also with a grain of salt: this merge is a 
_subtree_ merge, and my test redid it as a _non-subtree_ merge.

So I restricted the analysis to the non-subtree merges, and now 
non-patience merge comes out 6.97297297297297 conflict lines fewer than 
patience merge, with a standard deviation of 58.941106657867 (with a total 
count of 37 merges).

Note that ~7 lines difference with a standard deviation of ~59 lines is 
pretty close to ~0 lines difference.

In the end, the additional expense of patience merge might just not be 
worth it.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Kirill Smelkov @ 2009-01-07 22:00 UTC (permalink / raw)
  To: Thomas Rast, Bert Wesarg, Pierre Habouzit, Petr Baudis
  Cc: martin f krafft, git
In-Reply-To: <200901071324.57222.trast@student.ethz.ch>

On Wed, Jan 07, 2009 at 01:24:44PM +0100, Thomas Rast wrote:
> Kirill Smelkov wrote:
> > On Tue, Jan 06, 2009 at 09:32:03PM +0100, martin f krafft wrote:
> > > I find this very confusing. Why not simply
> > > 
> > >   TG_PAGER="${GIT_PAGER:-}"
> > >   TG_PAGER="${TG_PAGER:-$PAGER}"
> > > 
> > > ?
> > 
> > I find it confusing too, but this is needed because they usually do
> > something like this
> > 
> >     $ GIT_PAGER='' <some-git-command>
> > 
> > to force it to be pagerless.
> [...]
> > So I think it would be better to preserve the same semantics for `tg
> > patch` callers, though it's a pity that it's hard (maybe I'm wrong ?) to
> > see whether an env var is unset.
> 
> Admittedly I haven't really studied your patch, but the ${} constructs
> can in fact tell empty from unset:
> 
>   $ EMPTY=
>   $ unset UNDEFINED
>   $ echo ${UNDEFINED-foo}
>   foo
>   $ echo ${UNDEFINED:-foo}
>   foo
>   $ echo ${EMPTY-foo}
> 
>   $ echo ${EMPTY:-foo}
>   foo
> 
> 'man bash' indeed says
> 
>   When not performing substring expansion, bash tests for a parameter
>   that is unset or null; omitting the colon results in a test only for
>   a parameter that is unset.
> 
> So I suppose you could use
> 
>   ${GIT_PAGER-${PAGER-less}}
> 
> or similar.

Good eyes, thanks!

I'll rework it.


On Wed, Jan 07, 2009 at 03:24:02PM +0100, Bert Wesarg wrote:
> On Wed, Jan 7, 2009 at 12:27, Kirill Smelkov <kirr@landau.phys.spbu.ru> wrote:
> > Martin, thanks for your review.
> > +       # atexit(close(1); wait pager)
> > +       trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
> I think you need to escape the double quotes.

Good eyes -- corrected and thanks!


On Wed, Jan 07, 2009 at 04:10:00PM +0100, Petr Baudis wrote:
> On Wed, Jan 07, 2009 at 02:27:54PM +0300, Kirill Smelkov wrote:
> > >From 2193b7c703c2d31c8739eec617b8c0e8c1d09b79 Mon Sep 17 00:00:00 2001
> > From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
> > Date: Tue, 6 Jan 2009 17:56:37 +0300
> > Subject: [PATCH (topgit) v2] Implement setup_pager just like in git
> > 
> > setup_pager() spawns a pager process and redirect the rest of our output
> > to it.
> > 
> > This will be needed to fix `tg patch` output in the next commit.
> > 
> > Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
> 
> But you never use it...?

What do you mean?

It is used in the next patch as posted in original series:

http://marc.info/?l=git&m=123125495000600&w=2

For completeness, I'll include both patches in this email.

> > ---
> >  tg.sh |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 54 insertions(+), 0 deletions(-)
> > 
> > diff --git a/tg.sh b/tg.sh
> > index 8c23d26..bf9cf5c 100644
> > --- a/tg.sh
> > +++ b/tg.sh
> > @@ -243,6 +243,60 @@ do_help()
> >  	fi
> >  }
> >  
> > +## Pager stuff
> > +
> > +# isatty FD
> > +isatty()
> > +{
> > +	tty -s 0<&$1
> > +}
> > +
> > +# setup_pager
> > +# Spawn pager process and redirect the rest of our output to it
> > +setup_pager()
> > +{
> > +	isatty 1 || return 0
> > +
> > +	# TG_PAGER = GIT_PAGER | PAGER
> > +	# (but differentiate between GIT_PAGER='' and unset variables)
> > +	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
> > +	case ${GIT_PAGER+XXX} in
> > +	'')
> > +		case ${PAGER+XXX} in
> > +		'')
> 
> I'm pretty sure there's been a nice trick for this, but I can't remember
> it at all now.

Already corrected to ${GIT_PAGER-${PAGER-less}}, thanks to Thomas.

> > +			# both GIT_PAGER & PAGER unset
> > +			TG_PAGER=''
> > +			;;
> > +		*)
> > +			TG_PAGER="$PAGER"
> > +			;;
> > +		esac
> > +		;;
> > +	*)
> > +		TG_PAGER="$GIT_PAGER"
> > +		;;
> > +	esac
> > +
> > +	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
> > +
> > +
> > +	# now spawn pager
> > +	export LESS=${LESS:-FRSX}	# as in pager.c:pager_preexec()
> > +
> > +	_pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
> > +	_pager_fifo="$_pager_fifo_dir/0"
> > +	mkfifo -m 600 "$_pager_fifo"
> > +
> > +	"$TG_PAGER" < "$_pager_fifo" &
> > +	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
> > +
> > +	# this is needed so e.g. `git diff` will still colorize it's output if
> > +	# requested in ~/.gitconfig with color.diff=auto
> > +	export GIT_PAGER_IN_USE=1
> > +
> > +	# atexit(close(1); wait pager)
> > +	trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
> > +}
> 
> Frankly, I would have been just much happier if something like git
> pager--helper would be provided for external tools to use. Seeing how it
> gets reimplemented like this just pains me greatly.

After we settle on implementation, would it make sense to include this
setup_pager into git-sh-setup?

I propose we include this stuff into tg.sh first, so that topgit would
work correctly with previous versions of git.

> On Wed, Jan 07, 2009 at 03:44:32PM +0100, Pierre Habouzit wrote:
> > On Wed, Jan 07, 2009 at 11:27:54AM +0000, Kirill Smelkov wrote:
> > > isatty()
> > > {
> > > 	tty -s 0<&$1
> > > }
> > 
> > why not test -t 0 ? I'm not sure it's POSIX though.
> 
> It's SUS for many issues already it seems.

Pierre and Petr - thanks for the info. Yes, `test -t $fd` looks better.


Here is improved patch:

Changes since v1:

 o Simplify TG_PAGER setup  (thanks to Thomas Rast)
 o Properly escape "        (thanks to Bert Wesarg)
 o Simpler isatty           (thanks to Pierre Habouzit & Petr Baudis)


(interdiff)

diff --git a/tg.sh b/tg.sh
index bf9cf5c..b64fc3a 100644
--- a/tg.sh
+++ b/tg.sh
@@ -248,7 +248,7 @@ do_help()
 # isatty FD
 isatty()
 {
-	tty -s 0<&$1
+	test -t $1
 }
 
 # setup_pager
@@ -257,25 +257,9 @@ setup_pager()
 {
 	isatty 1 || return 0
 
-	# TG_PAGER = GIT_PAGER | PAGER
-	# (but differentiate between GIT_PAGER='' and unset variables)
-	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
-	case ${GIT_PAGER+XXX} in
-	'')
-		case ${PAGER+XXX} in
-		'')
-			# both GIT_PAGER & PAGER unset
-			TG_PAGER=''
-			;;
-		*)
-			TG_PAGER="$PAGER"
-			;;
-		esac
-		;;
-	*)
-		TG_PAGER="$GIT_PAGER"
-		;;
-	esac
+	# TG_PAGER = GIT_PAGER | PAGER | less
+	# NOTE: GIT_PAGER='' is significant
+	TG_PAGER=${GIT_PAGER-${PAGER-less}}
 
 	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
 
@@ -295,7 +279,7 @@ setup_pager()
 	export GIT_PAGER_IN_USE=1
 
 	# atexit(close(1); wait pager)
-	trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
+	trap "exec >&-; rm \"$_pager_fifo\"; rmdir \"$_pager_fifo_dir\"; wait" EXIT
 }
 
 ## Startup


From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
To: Petr Baudis <pasky@suse.cz>
Cc: Git Mailing List <git@vger.kernel.org>
Bcc: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Subject: Implement setup_pager just like in git

setup_pager() spawns a pager process and redirect the rest of our output
to it.

This will be needed to fix `tg patch` output in the next commit.

Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>

---
 tg.sh |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/tg.sh b/tg.sh
index 8c23d26..b64fc3a 100644
--- a/tg.sh
+++ b/tg.sh
@@ -243,6 +243,44 @@ do_help()
 	fi
 }
 
+## Pager stuff
+
+# isatty FD
+isatty()
+{
+	test -t $1
+}
+
+# setup_pager
+# Spawn pager process and redirect the rest of our output to it
+setup_pager()
+{
+	isatty 1 || return 0
+
+	# TG_PAGER = GIT_PAGER | PAGER | less
+	# NOTE: GIT_PAGER='' is significant
+	TG_PAGER=${GIT_PAGER-${PAGER-less}}
+
+	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
+
+
+	# now spawn pager
+	export LESS=${LESS:-FRSX}	# as in pager.c:pager_preexec()
+
+	_pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
+	_pager_fifo="$_pager_fifo_dir/0"
+	mkfifo -m 600 "$_pager_fifo"
+
+	"$TG_PAGER" < "$_pager_fifo" &
+	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
+
+	# this is needed so e.g. `git diff` will still colorize it's output if
+	# requested in ~/.gitconfig with color.diff=auto
+	export GIT_PAGER_IN_USE=1
+
+	# atexit(close(1); wait pager)
+	trap "exec >&-; rm \"$_pager_fifo\"; rmdir \"$_pager_fifo_dir\"; wait" EXIT
+}
 
 ## Startup
 
-- 
tg: (8c77c34..) t/setup-pager (depends on: master)


Second patch which uses setup_pager:


>From 1b723ebf740c58bc25ac97eff0a31b07373d8d1e Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Date: Tue, 6 Jan 2009 18:03:21 +0300
Subject: [TopGit PATCH] tg-patch: fix pagination

Previously, when I was invoking `tg patch` the following used to happen:

1. .topmsg content was sent directly to _terminal_
2. for each file in the patch, its diff was generated with `git diff`
   and sent to *PAGER*
3. trailing 'tg: ...' was sent to terminal again

So the problem is that while `tg patch >file` works as expected, plain
`tg patch` does not -- in pager there is only a part of the whole patch
(first file diff) and header and trailer are ommitted.

I've finally decided to fix this inconvenience, and the way it works is
like in git -- we just hook `setup_pager` function in commands which
need to be paginated.

Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
---
 tg-patch.sh |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tg-patch.sh b/tg-patch.sh
index a704375..dc699d2 100644
--- a/tg-patch.sh
+++ b/tg-patch.sh
@@ -24,6 +24,9 @@ done
 base_rev="$(git rev-parse --short --verify "refs/top-bases/$name" 2>/dev/null)" ||
 	die "not a TopGit-controlled branch"
 
+
+setup_pager
+
 git cat-file blob "$name:.topmsg"
 echo
 [ -n "$(git grep '^[-]--' "$name" -- ".topmsg")" ] || echo '---'
-- 
1.6.1.48.ge9b8


Thanks,
Kirill

^ permalink raw reply related

* Re: [PATCH] gitweb: don't use pathinfo for global actions
From: Giuseppe Bilotta @ 2009-01-07 21:32 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano, Devin Doucette
In-Reply-To: <200901061837.23637.jnareb@gmail.com>

On Tue, Jan 6, 2009 at 6:37 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Fri, 2 Jan 2009, Giuseppe Bilotta wrote:
>> Accepting global actions in use_pathinfo is not a very robust solution
>> due to possible present and future conflicts between project names and
>> global actions, therefore we just refuse to create PATH_INFO URLs when
>> the project is not defined.
>
> I think it is quite robust solution and it makes sense; we use
> shortcuts http://git.example.com for projects_list page, and
> http://git.example.com/path/to/repo.git for overview 'summary'
> action for a project, therefore pathinfo has to look like the
> following: http://git.example.com/repo/action/hash with "action"
> _after_ "project".  And there is also matter of backward compatibility
> of URL (URLs shouldn't break).
>
> Anyway, we have $home_link for default project_list page, which
> is path_info without project, and query without query string...

Today I had this idea: a possible way to have global actions into the
path would be to use an invalid project name, but I'm not sure if
there ARE invalid project names at all. Maybe using something very
abstruse such as _projects_ (underscore "projects" underscore) or even
just _ (underscore).

The only thing I can think of for which global actions in path WOULD
be interesting would be that project tag paths would become something
like http://git.example.com/_/tag/sometagname which can be tagged by
the rel-tag microformat http://microformats.org/wiki/rel-tag ...

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply

* Re: [PATCH] diff --no-index -q: fix endless loop
From: Junio C Hamano @ 2009-01-07 21:30 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git
In-Reply-To: <1231326930-7132-1-git-send-email-trast@student.ethz.ch>

Thanks.

^ permalink raw reply

* Re: [PATCH 4/3] shortlog: handle multi-line subjects like log --pretty=oneline et. al. do
From: Junio C Hamano @ 2009-01-07 21:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: markus.heidelberg, git
In-Reply-To: <4963C1E2.8070906@lsrfire.ath.cx>

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> The commit message parser of git shortlog used to treat only the first
> non-empty line of the commit message as the subject.  Other log commands
> (e.g. --pretty=oneline) show the whole first paragraph instead (unwrapped
> into a single line).
>
> For consistency, this patch borrows format_subject() from pretty.c to
> make shortlog do the same.

Thanks; will queue.

^ permalink raw reply

* Re: [PATCH] strbuf: instate cleanup rule in case of non-memory errors
From: Junio C Hamano @ 2009-01-07 21:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: Pierre Habouzit, Linus Torvalds, git
In-Reply-To: <4963C1EA.504@lsrfire.ath.cx>

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Make all strbuf functions that can fail free() their memory on error if
> they have allocated it.  They don't shrink buffers that have been grown,
> though.

Thanks; applied.

^ permalink raw reply

* Re: [PATCH] gitweb: don't use pathinfo for global actions
From: Junio C Hamano @ 2009-01-07 21:19 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Giuseppe Bilotta, git, Petr Baudis, Devin Doucette
In-Reply-To: <200901061837.23637.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> writes:

> Therefore it really needs to be in, as df63fb also by Giuseppe
> (gitweb: use href() when generating URLs in OPML) is already in,
> and I think gitweb would generate broken OPML and TXT links without
> this patch.
> ...
> Acked-by: Jakub Narebski <jnareb@gmail.com>

Thanks for reminding me.  Queued.

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Junio C Hamano @ 2009-01-07 20:48 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Sam Vilain, Pierre Habouzit, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901072121260.7496@intel-tinevez-2-302>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> After compiling and installing, something like this should be fun to 
> watch:
>
> 	$ git rev-list --all --parents | \
> 	  grep " .* " | \
> 	  while read commit parent1 parent2 otherparents
> 	  do
> 		test -z "$otherparents" || continue
> 		git checkout $parent1 &&
> 		git merge $parent2 &&
> 		git diff > without-patience.txt &&
> ...
> 		if ! cmp without-patience.txt with-patience.txt
> 		then
> 			echo '==============================='
> 			echo "differences found in merge $commit"
> ...
> 			cat with-patience.txt
> 		fi ||
> 		exit
> 	  done | tee patience-merge.out

An even more interesting test would be possible by dropping "&&" from the
two "git merge" invocations.

 - Your sample will exit at the first conflicting merge otherwise.

 - You may find cases where one resolves cleanly while the other leaves
   conflicts.

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 20:38 UTC (permalink / raw)
  To: Sam Vilain
  Cc: Pierre Habouzit, Linus Torvalds, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <1231359317.6011.12.camel@maia.lan>

Hi,

On Thu, 8 Jan 2009, Sam Vilain wrote:

> On Tue, 2009-01-06 at 20:40 +0100, Johannes Schindelin wrote:
> > Although I would like to see it in once it is fleshed out -- even if it 
> > does not meet our usefulness standard -- because people said Git is 
> > inferior for not providing a patience diff.  If we have --patience, we can 
> > say "but we have it, it's just not useful, check for yourself".
> 
> Whatever happens, the current deterministic diff algorithm needs to stay
> for generating patch-id's... those really can't be allowed to change.

Oh, there is no discussion about replacing the diff algorithm we have 
right now.

Even if we never output patch-ids anywhere, so we always recalculate them, 
and therefore would be free to replace the diff algorithm with something 
else.

The thing why I do not want patience diff to replace the existing code 
is:

- patience diff is slower,
- patience diff is often not worth it (produces the same, maybe even 
  worse output), and
- patience diff needs the existing code as a fallback anyway.

Where it could possibly help to change existing behavior is with merging.

So maybe somebody has some time to play with, and can apply this patch:

-- snip --
diff --git a/ll-merge.c b/ll-merge.c
index fa2ca52..f731026 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -79,6 +79,8 @@ static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,
 	memset(&xpp, 0, sizeof(xpp));
 	if (git_xmerge_style >= 0)
 		style = git_xmerge_style;
+	if (getenv("GIT_PATIENCE_MERGE"))
+		xpp.flags |= XDF_PATIENCE_DIFF;
 	return xdl_merge(orig,
 			 src1, name1,
 			 src2, name2,
-- snap --

After compiling and installing, something like this should be fun to 
watch:

	$ git rev-list --all --parents | \
	  grep " .* " | \
	  while read commit parent1 parent2 otherparents
	  do
		test -z "$otherparents" || continue
		git checkout $parent1 &&
		git merge $parent2 &&
		git diff > without-patience.txt &&
		git reset --hard $parent1 &&
		GIT_PATIENCE_MERGE=1 git merge $parent2 &&
		git diff > with-patience.txt &&
		if ! cmp without-patience.txt with-patience.txt
		then
			echo '==============================='
			echo "differences found in merge $commit"
			echo "without patience: $(wc -l < without-patience.txt)"
			echo "with patience: $(wc -l < with-patience.txt)"
			echo '-------------------------------'
			echo 'without patience:'
			cat without-patience.txt
			echo '-------------------------------'
			echo 'with patience:'
			cat with-patience.txt
		fi ||
		exit
	  done | tee patience-merge.out

Ciao,
Dscho

^ permalink raw reply related

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-07 20:25 UTC (permalink / raw)
  To: Sam Vilain
  Cc: Johannes Schindelin, Pierre Habouzit, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <1231359317.6011.12.camel@maia.lan>



On Thu, 8 Jan 2009, Sam Vilain wrote:
> 
> Whatever happens, the current deterministic diff algorithm needs to stay
> for generating patch-id's... those really can't be allowed to change.

Sure they can.

We never cache patch-id's over a long time. And we _have_ changed xdiff to 
modify the output of the patches before, quite regardless of any patience 
issues: see commit 9b28d55401a529ff08c709f42f66e765c93b0a20, which 
admittedly doesn't affect any _normal_ diffs, but can generate subtly 
different results for some cases.

It's true that we want the diff algorithm to be deterministic in the sense 
that over the run of a _single_ rebase operation, the diff between two 
files should give similar and deterministic results, but that's certainly 
true of patience diff too.

			Linus

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 20:19 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Pierre Habouzit, Linus Torvalds, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.10.0901071159060.17115@alien.or.mcafeemobile.com>

Hi,

On Wed, 7 Jan 2009, Davide Libenzi wrote:

> On Wed, 7 Jan 2009, Johannes Schindelin wrote:
> 
> > Could it be that you misread my patch, and assumed that I faked an 
> > xdfenv?
> > 
> > I did not, but instead faked two mmfiles, which is only as simple as I did 
> > it because in git.git, we only have contiguous mmfiles.  (I recall that 
> > libxdiff allows for ropes instead of arrays.)
> > 
> > The way I did it has one big shortcoming: I need to prepare an xdfenv for 
> > the subfiles even if I already prepared one for the complete files.  IOW 
> > the lines are rehashed all over again.
> 
> I told you I just glanced at the code :)
> In that way, if you guys decide to merge this new algo, you'll need to 
> split the prepare from the optimize, and feed it with an already prepared 
> env.

Right.

> Before going that way, have you ever tried to tweak xdl_cleanup_records 
> and xdl_clean_mmatch to reduce the level of optimization, and see the 
> results you get? It is possible that you won't need two different algos 
> inside git.

No, I hadn't thought that libxdiff already determines uniqueness before 
actually running xdl_do_diff().

I also have to admit that I am not as clever as other people, and had 
quite a hard time figuring out as much as I did (for example, that rchg[i] 
== 1 means that this line is to be added/deleted, and that i is in the 
range 0, ..., N - 1 rather than 1, ..., N).

So it is quite possible that something patience-like can be done earlier.

However, I do not see a way to implement the recursion necessary for the 
patience diff.  Remember:

	patience(line range):
		find unique lines
		if no unique lines found:
			resort to classical diff
			return
		extract the longest common sequence of unique common lines
		between those, recurse

When recursing, previously non-unique lines can turn unique, of course.  
And I do not see how that recursion could be done before 
xdl_clean_mmatch(), short of redoing the hashing and cleaning records up.

Of course, it might well be possible, but I am already out of my depth 
reading something like "rdis0" and "rpdis1", and being close to despair.

:')

Ciao,
Dscho

		

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Sam Vilain @ 2009-01-07 20:15 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Pierre Habouzit, Linus Torvalds, davidel, Francis Galiegue,
	Git ML
In-Reply-To: <alpine.DEB.1.00.0901062037250.30769@pacific.mpi-cbg.de>

On Tue, 2009-01-06 at 20:40 +0100, Johannes Schindelin wrote:
> Although I would like to see it in once it is fleshed out -- even if it 
> does not meet our usefulness standard -- because people said Git is 
> inferior for not providing a patience diff.  If we have --patience, we can 
> say "but we have it, it's just not useful, check for yourself".

Whatever happens, the current deterministic diff algorithm needs to stay
for generating patch-id's... those really can't be allowed to change.

Sam.

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Davide Libenzi @ 2009-01-07 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Pierre Habouzit, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901071056470.3057@localhost.localdomain>

On Wed, 7 Jan 2009, Linus Torvalds wrote:

> On Wed, 7 Jan 2009, Davide Libenzi wrote:
> > 
> > xdiff allows for diffing ranges, and the most efficent method is exactly 
> > how you did ;)
> 
> No, the performance problem is how it needs to re-hash everything. xdiff 
> doesn't seem to have any way to use a "subset of the hash", so what the 
> patience diff does is to use a subset of the mmfile, and then that will 
> force all the rehashing to take place, which is kind of sad.
> 
> It would be nice (for patience diff) if it could partition the _hashes_ 
> instead of partitioning the _data_. That way it wouldn't need to rehash. 
> See?

Yeah, saw that afterwards ;)



- Davide

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Davide Libenzi @ 2009-01-07 20:09 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Pierre Habouzit, Linus Torvalds, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901071924350.7496@intel-tinevez-2-302>

On Wed, 7 Jan 2009, Johannes Schindelin wrote:

> Heh.
> 
> Could it be that you misread my patch, and assumed that I faked an 
> xdfenv?
> 
> I did not, but instead faked two mmfiles, which is only as simple as I did 
> it because in git.git, we only have contiguous mmfiles.  (I recall that 
> libxdiff allows for ropes instead of arrays.)
> 
> The way I did it has one big shortcoming: I need to prepare an xdfenv for 
> the subfiles even if I already prepared one for the complete files.  IOW 
> the lines are rehashed all over again.

I told you I just glanced at the code :)
In that way, if you guys decide to merge this new algo, you'll need to 
split the prepare from the optimize, and feed it with an already prepared 
env.
Before going that way, have you ever tried to tweak xdl_cleanup_records 
and xdl_clean_mmatch to reduce the level of optimization, and see the 
results you get? It is possible that you won't need two different algos 
inside git.



- Davide

^ permalink raw reply

* Re: [PATCH v3 1/3] Implement the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 20:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Davide Libenzi, Pierre Habouzit, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901071056470.3057@localhost.localdomain>

Hi,

On Wed, 7 Jan 2009, Linus Torvalds wrote:

> On Wed, 7 Jan 2009, Davide Libenzi wrote:
> > 
> > xdiff allows for diffing ranges, and the most efficent method is 
> > exactly how you did ;)
> 
> No, the performance problem is how it needs to re-hash everything. xdiff 
> doesn't seem to have any way to use a "subset of the hash", so what the 
> patience diff does is to use a subset of the mmfile, and then that will 
> force all the rehashing to take place, which is kind of sad.
> 
> It would be nice (for patience diff) if it could partition the _hashes_ 
> instead of partitioning the _data_. That way it wouldn't need to rehash. 
> See?

Actually, for libxdiff (non-patience), this is not possible, as 
xdl_cleanup_records() rewrites the hashes so that they are in a contiguous 
interval (0, ..., N-1), where N is the number of distinct lines found.

I am also pretty certain that at least the non-patience part needs the 
maximum of that "hash" to be as small as possible.

Having said that, if Linus likes patience diff enough to want it faster, 
Dscho will improve the speed by avoiding the rehashing for the patience 
diff part (although the lines for which patience has to fall back to Myers 
diff will _still_ rehash, but that does not hurt _that_ much in practice).

Ciao,
Dscho

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Boyd Stephen Smith Jr. @ 2009-01-07 19:46 UTC (permalink / raw)
  To: Sitaram Chamarty; +Cc: git
In-Reply-To: <slrngm9rdm.gcv.sitaramc@sitaramc.homelinux.net>

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

On Wednesday 2009 January 07 12:00:22 Sitaram Chamarty wrote:
> On 2009-01-07, Boyd Stephen Smith Jr. <bss@iguanasuicide.net> wrote:
> > On Wednesday 2009 January 07 04:55:56 you wrote:
> >> So when you say "group", you're saying "0660", and when you
> >> say "0660", you're overriding users umask value.
> >
> > it could just have been the version of git I was using (1.4.4.4, IIRC)
> > -- still using that in at least one place, as it is the current
> > version in Debian Etch.
>
> 1.4.4.4 is 2 years and 2 days old today!  [I've heard
> stories about Debian, but never thought it was this
> conservative!]

Once a stable is released, no new versions of packages come in, only 
backported bug and security fixes.  The pre-release freeze also limits new 
versions from being considered.  Lenny should be out RSN, so I can move up to 
1.5.6.5. :)

$ apt-cache policy git-core
git-core:
  Installed: 1:1.4.4.4-4
  Candidate: 1:1.4.4.4-4
  Version table:
     1:1.6.0.6-1 0
        300 http://localhost experimental/main Packages
     1:1.5.6.5-2 0
        700 http://localhost testing/main Packages
        500 http://localhost unstable/main Packages
     1:1.5.6.5-1~bpo40+1 0
        800 http://localhost etch-backports/main Packages
 *** 1:1.4.4.4-4 0
        900 http://localhost stable/main Packages
        100 /var/lib/dpkg/status
     1:1.4.4.4-2.1+etch1 0
        900 http://localhost stable/updates/main Packages
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox