Bizarre missing changes (git bug?)

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Bizarre missing changes (git bug?)
@ 2008-07-21 20:26 Tim Harper
  2008-07-21 20:37 ` Linus Torvalds
  2008-07-21 20:42 ` Alex Riesen
  0 siblings, 2 replies; 58+ messages in thread
From: Tim Harper @ 2008-07-21 20:26 UTC (permalink / raw)
  To: git

I ran into a strange issue that has left me scratching my head.

I have a commit in my history, that does indeed show up in my branch,  
named "sprint"

The following commands yield as follows (I've modified the output  
slightly to conceal any potentially proprietary information):

#########################
git log HEAD -p

commit 8f9effffb0dcdacd514085608e8923fbbe00ff29
Author: Name Concealed <email@email.com>
Date:   Mon Jul 14 16:19:18 2008 -0600

     commit message....

diff --git a/app/controllers/events_controller.rb b/app/controllers/ 
events_controller.rb
index 6905ba4..a0b7dfc 100644
--- a/app/controllers/events_controller.rb
+++ b/app/controllers/events_controller.rb
@@ -238,36 +238,37 @@ class EventsController < ApplicationController
    }.freeze

    RUBY_STUFF = {
-    changes...
-    changes...
-    changes...
+    changes...
+    changes...
+    changes...

diff --git a/spec/fixtures/factors.yml b/spec/fixtures/factors.yml
index 186ed73..3c76e86 100755
--- a/spec/fixtures/factors.yml
+++ b/spec/fixtures/factors.yml
@@ -2483,4 +2483,54 @@ some_branch:
    file:
    contents:
-  reliable_on:
\ No newline at end of file
+  data:
+fixture_name:
+  id: 115
+  file: Event
+  contents: behavior

#########################
git log spec/fixtures/factors.yml

... commit 8f9effff is not listed anywhere

#########################
git log app/controllers/events_controller.rb

... commit 8f9effff shows up

#########################
git branch --contains 8f9effff

   some-task-branch
* sprint

The changes in 8f9effff for app/controllers/events_controller.rb show  
up in the working copy, however the changes for spec/fixtures/ 
factors.yml are nowhere to be seen.  It's as if the history of that  
particular file diverged somehow, but I know that can't be true since  
git doesn't track files.

Anyone run into this before?  Any idea what might have caused it?   
We're a bit concerned about this because if we don't know how to avoid  
this, we no longer can feel certain that when something is committed,  
it will make it out in our release.

Any help or clues VERY much apperciated.  Thanks!

Tim

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-21 20:26 Bizarre missing changes (git bug?) Tim Harper
@ 2008-07-21 20:37 ` Linus Torvalds
  2008-07-21 22:53   ` Tim Harper
                     ` (2 more replies)
  2008-07-21 20:42 ` Alex Riesen
  1 sibling, 3 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-21 20:37 UTC (permalink / raw)
  To: Tim Harper; +Cc: git

On Mon, 21 Jul 2008, Tim Harper wrote:
> 
> Anyone run into this before?  Any idea what might have caused it?  We're a bit
> concerned about this because if we don't know how to avoid this, we no longer
> can feel certain that when something is committed, it will make it out in our
> release.

Read up on '--full-history'.

By default, git simplifies the history for logs that have path 
simplification: only walking down the side of a merge that all the data 
came from (ie the unchanged side). So it only leaves merges around if 
there was changes from _all_ parents.

So without --full-history, if any parent matches the state, it just 
removes the merge and picks that parent that contained all the state. 
Obviously, any changes to that file can be sufficiently explained by 
walking just that limited history, because they must have changed in 
_that_ history too!

That default behaviour leads to a *much* simpler history, and is usually 
what you want - it avoids unnecessary duplication when something was 
changed trivially the same way in both branches - 'git log' will just pick 
the first branch.

So, if you had two (or more) commits that both fixed the same bug in 
different branches, and thus both branches actually ended up with the same 
contents, it does mean that "git log <filename>" will only show _one_ of 
the fixes.

In this case, it apparently showed another version than the one you were 
looking for.

				Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-21 20:26 Bizarre missing changes (git bug?) Tim Harper
  2008-07-21 20:37 ` Linus Torvalds
@ 2008-07-21 20:42 ` Alex Riesen
  1 sibling, 0 replies; 58+ messages in thread
From: Alex Riesen @ 2008-07-21 20:42 UTC (permalink / raw)
  To: Tim Harper; +Cc: git

Tim Harper, Mon, Jul 21, 2008 22:26:06 +0200:
> I ran into a strange issue that has left me scratching my head.
>
> I have a commit in my history, that does indeed show up in my branch,  
> named "sprint"
>
...
>
> Any help or clues VERY much apperciated.  Thanks!
>

Try looking at the history graph in gitk

    $ gitk --all -- app/controllers/events_controller.rb spec/fixtures/factors.yml

It usually shows the history in a very understandable way and
it can help to detect when the histories have diverged.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-21 20:37 ` Linus Torvalds
@ 2008-07-21 22:53   ` Tim Harper
  2008-07-21 22:55     ` Tim Harper
       [not found]   ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com>
  2008-07-26  3:12   ` Roman Zippel
  2 siblings, 1 reply; 58+ messages in thread
From: Tim Harper @ 2008-07-21 22:53 UTC (permalink / raw)
  To: git

On Jul 21, 2008, at 2:37 PM, Linus Torvalds wrote:

>
>
> On Mon, 21 Jul 2008, Tim Harper wrote:
>>
>> Anyone run into this before?  Any idea what might have caused it?   
>> We're a bit
>> concerned about this because if we don't know how to avoid this, we  
>> no longer
>> can feel certain that when something is committed, it will make it  
>> out in our
>> release.
>
> Read up on '--full-history'.
>
> By default, git simplifies the history for logs that have path
> simplification: only walking down the side of a merge that all the  
> data
> came from (ie the unchanged side). So it only leaves merges around if
> there was changes from _all_ parents.
>
> So without --full-history, if any parent matches the state, it just
> removes the merge and picks that parent that contained all the state.
> Obviously, any changes to that file can be sufficiently explained by
> walking just that limited history, because they must have changed in
> _that_ history too!
>
> That default behaviour leads to a *much* simpler history, and is  
> usually
> what you want - it avoids unnecessary duplication when something was
> changed trivially the same way in both branches - 'git log' will  
> just pick
> the first branch.
>

Agreed - this was an insightful decision.

> So, if you had two (or more) commits that both fixed the same bug in
> different branches, and thus both branches actually ended up with  
> the same
> contents, it does mean that "git log <filename>" will only show  
> _one_ of
> the fixes.
>
> In this case, it apparently showed another version than the one you  
> were
> looking for.
>
> 				Linus

Git has made me feel stupid on various occasions.  This is no  
exception as the problem turned out being in the chair, not in git.

After running through git bisect, and ran the command Alex Riesen  
suggested, it made it pretty crystal clear where things went wrong.   
It turned out to be a bad merge that was from a conflict related to  
white-space issues, and the wrong resolution was chosen (a resolution  
that also consequently turned out to be no change).

Another false impression I had is a merge conflict resolution would  
always be displayed in a merge commit.  However, after running over  
the merges again, if you pick the right or left, discarding the one or  
the other, nothing is shown in "git log -p" for the merge commit.  Is  
there a way to see what was chosen for a conflict resolution?  Seeing  
that in the merge commit would have made things a little more clear.

Thank you for articulating git branch's behavior - all is clear as mud  
now :)

Tim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-21 22:53   ` Tim Harper
@ 2008-07-21 22:55     ` Tim Harper
  0 siblings, 0 replies; 58+ messages in thread
From: Tim Harper @ 2008-07-21 22:55 UTC (permalink / raw)
  To: git


On Jul 21, 2008, at 4:53 PM, Tim Harper wrote:

>
> On Jul 21, 2008, at 2:37 PM, Linus Torvalds wrote:
>
>>
>>
>> On Mon, 21 Jul 2008, Tim Harper wrote:
>>>
>>> Anyone run into this before?  Any idea what might have caused it?   
>>> We're a bit
>>> concerned about this because if we don't know how to avoid this,  
>>> we no longer
>>> can feel certain that when something is committed, it will make it  
>>> out in our
>>> release.
>>
>> Read up on '--full-history'.
>>
>> By default, git simplifies the history for logs that have path
>> simplification: only walking down the side of a merge that all the  
>> data
>> came from (ie the unchanged side). So it only leaves merges around if
>> there was changes from _all_ parents.
>>
>> So without --full-history, if any parent matches the state, it just
>> removes the merge and picks that parent that contained all the state.
>> Obviously, any changes to that file can be sufficiently explained by
>> walking just that limited history, because they must have changed in
>> _that_ history too!
>>
>> That default behaviour leads to a *much* simpler history, and is  
>> usually
>> what you want - it avoids unnecessary duplication when something was
>> changed trivially the same way in both branches - 'git log' will  
>> just pick
>> the first branch.
>>
>
> Agreed - this was an insightful decision.
>
>> So, if you had two (or more) commits that both fixed the same bug in
>> different branches, and thus both branches actually ended up with  
>> the same
>> contents, it does mean that "git log <filename>" will only show  
>> _one_ of
>> the fixes.
>>
>> In this case, it apparently showed another version than the one you  
>> were
>> looking for.
>>
>> 				Linus
>
> Git has made me feel stupid on various occasions.  This is no  
> exception as the problem turned out being in the chair, not in git.
>
> After running through git bisect, and ran the command Alex Riesen  
> suggested, it made it pretty crystal clear where things went wrong.   
> It turned out to be a bad merge that was from a conflict related to  
> white-space issues, and the wrong resolution was chosen (a  
> resolution that also consequently turned out to be no change).
>
> Another false impression I had is a merge conflict resolution would  
> always be displayed in a merge commit.  However, after running over  
> the merges again, if you pick the right or left, discarding the one  
> or the other, nothing is shown in "git log -p" for the merge  
> commit.  Is there a way to see what was chosen for a conflict  
> resolution?  Seeing that in the merge commit would have made things  
> a little more clear.
>

Actually, I found it:

http://www.kernel.org/pub/software/scm/git/docs/git-log.html

"git log -p -c" gives me what I was looking for

Tim

> Thank you for articulating git branch's behavior - all is clear as  
> mud now :)
>
> Tim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
       [not found]   ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com>
@ 2008-07-21 22:57     ` Linus Torvalds
  0 siblings, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-21 22:57 UTC (permalink / raw)
  To: Tim Harper; +Cc: Git Mailing List

[ git channel added back to cc, because this is an interesting question in 
  itself ]

On Mon, 21 Jul 2008, Tim Harper wrote:
> 
> Another false impression I had is a merge conflict resolution would always be
> displayed in a merge commit.  However, after running over the merges again, if
> you pick the right or left, discarding the one or the other, nothing is shown
> in "git log -p" for the merge commit.  Is there a way to see what was chosen
> for a conflict resolution?  Seeing that in the merge commit would have made
> things a little more clear.

The default behavior for showing merges is "--cc", which is the condensed 
version that only shows _actual_ conflicts that got resolved differently 
from either of the sources.

But note how this is an "after-the-fact" determination: it doesn't look 
whether the merge _did_ conflict (because doing that would require 
re-running the whole merge!), but it looks whether the end _result_ is 
different from either side.

So you can easily have a merge that conflicts - but then you resolve that 
merge by picking just one side of the merge as the result. And in that 
case the "--cc" diff will not show anything at all - because the end 
result did not conflict with the sources of the merge!

So "--cc" only shows output if: the merge itself actually changed 
something from _all_ parents. This can happen if:

 - there was a non-trivial conflict, and the end result really was a 
   "mixture" of the two. The result wasn't just a select of either side, 
   it was a combination of the two.

   This is obviously one "common" case for a merge resolution.

   But it's equally common that when you merge something you just say "Ok, 
   that other side did it better, I'll just pick that version". And in 
   that case, "--cc" won't show anything at all, because it's not really a 
   conflict any more once you've done that choice.

 - There can also be an "evil merge" that actually adds/changes/deletes 
   code that didn't exist AT ALL in either side. And --cc is very much 
   designed to show that.

   This is actually not always a bad thing (despite the name "evil 
   merge"), because it happens regularly that one branch had changed some 
   infrastructure setup or other, and the other branch had added a totally 
   new use of that infrastructure - and as a result the _merge_ needs to 
   add that setup code that didn't exist in either of the branches (in one 
   because the use wasn't there, in the other because the setup wasn't 
   needed).

Anyway, this all means that yes, "--cc" _often_ shows conflicts, but not 
always - exactly because it doesn't show the ones that the merge had 
committed as a non-conflict.

If you actually want to see the _real_ conflicts that happened as the 
merge was done, you do have to re-do it. 

		Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-21 20:37 ` Linus Torvalds
  2008-07-21 22:53   ` Tim Harper
       [not found]   ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com>
@ 2008-07-26  3:12   ` Roman Zippel
  2008-07-26 19:58     ` Linus Torvalds
  2 siblings, 1 reply; 58+ messages in thread
From: Roman Zippel @ 2008-07-26  3:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git

Hi,

On Monday 21. July 2008, Linus Torvalds wrote:

> Read up on '--full-history'.
>
> By default, git simplifies the history for logs that have path
> simplification: only walking down the side of a merge that all the data
> came from (ie the unchanged side). So it only leaves merges around if
> there was changes from _all_ parents.
>
> So without --full-history, if any parent matches the state, it just
> removes the merge and picks that parent that contained all the state.
> Obviously, any changes to that file can be sufficiently explained by
> walking just that limited history, because they must have changed in
> _that_ history too!

Is that really a good default behaviour? Is there a way to change that 
default?

I'm currently looking into converting the m68k CVS repository and I'd like to 
properly regenerate it as two separate lines of development. The problem is 
if I look at the file history, I often only see one side of the changes when 
things got merged because of this default.
What makes this worse is that graphical front ends may inherit this behaviour. 
I tested this with qgit and it only shows half of the history. giggle 
retrieves the history like --full-history, but it lacks empty merges, so it 
makes it harder to see what got merged when.
For example a history that actually looks this:

linux -+-----import----+-----------import----+------...
m68k   \-commit-commit-\-merge-commit-commit-\-merge...

Without the merges it looks like this:

linux -+-----import----------------import--------------+...
m68k   \-commit-commit---------commit-commit           \...

And without --full-history these "loose ends" aren't visible as in qgit.

When researching historical changes one wants to know when something was 
introduced and when it was merged, but this simplification makes it harder 
than it really has to be.
IMO the default should be to show the complete history, so one doesn't miss 
something by accident that might be important or as the original example 
shows it might be confusing if one sees a change with "git log --stat id.." 
and the change disappears when one looks at "git log path".

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-26  3:12   ` Roman Zippel
@ 2008-07-26 19:58     ` Linus Torvalds
  2008-07-27 17:50       ` Roman Zippel
  0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-07-26 19:58 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Sat, 26 Jul 2008, Roman Zippel wrote:
> >
> > So without --full-history, if any parent matches the state, it just
> > removes the merge and picks that parent that contained all the state.
> > Obviously, any changes to that file can be sufficiently explained by
> > walking just that limited history, because they must have changed in
> > _that_ history too!
> 
> Is that really a good default behaviour?

Yes. It's the only sane default right now. See below.

> Is there a way to change that default?

Use an alias or something.

To see why it's the default, do a few tests. In particular, try it with 
gitk on the kernel. Try it on some fairly simple file that doesn't get a 
lot of churn. Example:

	gitk lib/vsprintf.c

vs

	gitk --full-history lib/vsprintf.c

and if you don't _immediately_ see why --full-history isn't the default, 
there's likely something wrong with you. One is useful. The other is not.

So we absolutely _have_ to simplify merges. There is no question about it.

That said, we currently simplify merges the simple and stupid way, and 
I've hinted several times on this mailing list that there is a better way 
to do it (last time it was the discussion about "filter-branch".

In fact, if you google for 

	filter-branch full-history

you'll find some of the discussion. In order to make --full-history useful 
as a default, we'd need to do an after-the-fact merge cleanup (ie remove 
lines of development that are later found to really be totally 
uninteresting), but that is *hard*.

But if we did that, I'd agree to making --full-history the default (and it 
would be a good thing, no doubt about it - I just cannot see how to do ti 
simply and sanely enough)

		Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-26 19:58     ` Linus Torvalds
@ 2008-07-27 17:50       ` Roman Zippel
  2008-07-27 18:47         ` Linus Torvalds
  0 siblings, 1 reply; 58+ messages in thread
From: Roman Zippel @ 2008-07-27 17:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git

Hi,

On Sat, 26 Jul 2008, Linus Torvalds wrote:

> > Is there a way to change that default?
> 
> Use an alias or something.

This doesn't help with the graphical front ends and they only use what git 
gives them.

> To see why it's the default, do a few tests. In particular, try it with 
> gitk on the kernel. Try it on some fairly simple file that doesn't get a 
> lot of churn. Example:
> 
> 	gitk lib/vsprintf.c
> 
> vs
> 
> 	gitk --full-history lib/vsprintf.c
> 
> and if you don't _immediately_ see why --full-history isn't the default, 
> there's likely something wrong with you. One is useful. The other is not.
> 
> So we absolutely _have_ to simplify merges. There is no question about it.

Well, I don't want that much history.
Let's take a different example. Look at kernel/sched_rt.c with git-log, 
--full-history shows an extra commit of a patch which was committed and 
merged twice, but there is no information how this other patch was merged. 
If you have giggle installed, you'll see that commit as a loose end.

(I have git version 1.5.6.2 installed in case it matters.)

> That said, we currently simplify merges the simple and stupid way, and 
> I've hinted several times on this mailing list that there is a better way 
> to do it (last time it was the discussion about "filter-branch".
> 
> In fact, if you google for 
> 
> 	filter-branch full-history
> 
> you'll find some of the discussion. In order to make --full-history useful 
> as a default, we'd need to do an after-the-fact merge cleanup (ie remove 
> lines of development that are later found to really be totally 
> uninteresting), but that is *hard*.

I played a little with it in the ruby script below, which produces a 
complete connected graph of all content nodes and which have been merged 
into the head, e.g. for sched_rt.c it produces that extra commit merge. 
The script basically eliminates all empty merges. As input to the script I 
used "git log --parents --name-only --full-history kernel/sched_rt.c | 
grep -e ^commit -e ^kernel", which seems to produce the same amount of 
commits as "gitk --full-history ...".

The main function is to check, whether one parent of a commit is an 
ancestor of another parent, so that this path can be eliminated. I tried 
it with other paths and too simple implementations quickly lead to 
exponential behaviour. :) It probably also shouldn't be recursive, I had 
to increase the stack limit, otherwise I got stack exceptions.
Otherwise it seems to work fine, it wasn't that hard :-)

The ruby syntax shouldn't be too hard too read, the nonobvious thing is 
maybe that '$' marks global variables.

bye, Roman


#! /usr/bin/ruby

$parent = Hash.new
$content = Hash.new
$result = Hash.new

commit = nil
head = nil
while l = $stdin.gets
	a = l.split(" ")
	if a[0] == "commit"
		commit = a[1]
		head = commit unless head
		$parent[commit] = a[2..-1]
	else
		$content[commit] = true
	end
end

$parent_check = Hash.new
$parent_cache = Hash.new

def commit_has_parent?(commit, commit2)
	if $parent_check[commit]
		print "parent loop for #{commit} (#{commit2})?\n"
		p $parent_check
		return false
	end
	return $parent_cache[commit] if $parent_cache.has_key?(commit)
	$parent_check[commit] = true
	res = false
	if $content[commit] > $content[commit2]
		$parent[commit].each do |parent|
			if parent == commit2 || commit_has_parent?(parent, commit2)
				res = true
				break;
			end
		end
	end
	$parent_cache[commit] = res
	$parent_check.delete(commit)
	$parent_cache.clear if $parent_check.empty?
	return res
end

def check_commit(commit)
	return $result[commit] if $result.has_key? commit
	a = Array.new
	$parent[commit].each do |parent|
		parent = check_commit(parent)
		if parent
			a.each_index do |i|
				if a[i] == parent || commit_has_parent?(a[i], parent)
					parent = nil
					break
				elsif commit_has_parent?(parent, a[i])
					a[i] = parent
					parent = nil
					break
				end
			end
		end
		a.push(parent) if parent
	end
	$parent[commit] = a
	$content[commit] = true if a.size > 1
	if $content[commit]
		$result[commit] = commit
		max = 1
		a.each do |parent|
			max = $content[parent] + 1 if max <= $content[parent]
		end
		$content[commit] = max
	else
		$result[commit] = a[0]
	end
	return $result[commit]
end

check_commit(head)
#p $result
#p $parent

p $content.keys.size
$content.each_key do |commit|
	p [ commit, $parent[commit] ]
	commit = $parent[commit][0]
end

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-27 17:50       ` Roman Zippel
@ 2008-07-27 18:47         ` Linus Torvalds
  2008-07-27 23:14           ` Roman Zippel
  0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-07-27 18:47 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Sun, 27 Jul 2008, Roman Zippel wrote:

> Hi,
> 
> On Sat, 26 Jul 2008, Linus Torvalds wrote:
> 
> > > Is there a way to change that default?
> > 
> > Use an alias or something.
> 
> This doesn't help with the graphical front ends and they only use what git 
> gives them.

And the graphical front-ends is exactly why --full-history cannot be the 
default.

We _could_ make it the default for non-graphical ones, if we also say 
"--no-merges". But then:

> > To see why it's the default, do a few tests. In particular, try it with 
> > gitk on the kernel. Try it on some fairly simple file that doesn't get a 
> > lot of churn. Example:
> > 
> > 	gitk lib/vsprintf.c
> > 
> > vs
> > 
> > 	gitk --full-history lib/vsprintf.c
> > 
> > and if you don't _immediately_ see why --full-history isn't the default, 
> > there's likely something wrong with you. One is useful. The other is not.
> > 
> > So we absolutely _have_ to simplify merges. There is no question about it.
> 
> Well, I don't want that much history.

Right. Nobody does.

> Let's take a different example.

No, let's not.

Unless you can solve that _one_ example efficiently, nothing else matters. 

The above example is all you ever need. Make that one work right (and 
efficiently), and everything is fine.

And no, some random ruby code doesn't make any difference what-so-ever. 
There are efficiency constraints here that make any ruby solution be 
unrealistic to begin with.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-27 18:47         ` Linus Torvalds
@ 2008-07-27 23:14           ` Roman Zippel
  2008-07-27 23:18             ` Linus Torvalds
  2008-07-27 23:25             ` Martin Langhoff
  0 siblings, 2 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-27 23:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git

Hi,

On Sun, 27 Jul 2008, Linus Torvalds wrote:

> > > > Is there a way to change that default?
> > > 
> > > Use an alias or something.
> > 
> > This doesn't help with the graphical front ends and they only use what git 
> > gives them.
> 
> And the graphical front-ends is exactly why --full-history cannot be the 
> default.

If you mean current gitk style --full-history I agree, the problem is 
still that git is hiding too much history with the simplified history...

> > Let's take a different example.
> 
> No, let's not.
> 
> Unless you can solve that _one_ example efficiently, nothing else matters. 
> 
> The above example is all you ever need. Make that one work right (and 
> efficiently), and everything is fine.
> 
> And no, some random ruby code doesn't make any difference what-so-ever. 
> There are efficiency constraints here that make any ruby solution be 
> unrealistic to begin with.

Why are you dismissing what I wrote without even giving it a second 
thought? I didn't bother with the initial example, because it's so 
simple, that it's no real challenge.
Did I say anywhere it had to be done in ruby? All I tried was to 
demonstrate a possible algorithm to solve the problem. I did time the 
execution and compared to the time it took to extract the history it 
wasn't significant for such a simple script.
What did I do wrong that you rebuff me based on this secondary problem 
(which I'm quite aware of, because it was me who mentioned in first place) 
and giving the primary problem (which is the missing history) no 
attention?

If you had any questions, I'd be happy to answer them. If you think that 
the demonstrated algorithm doesn't work, I'd be glad to know why. If the 
algorithm might work, any hint what could be do done to try it for real, 
would be great. But I don't get any of this. Why?

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-27 23:14           ` Roman Zippel
@ 2008-07-27 23:18             ` Linus Torvalds
  2008-07-28  0:00               ` Roman Zippel
  2008-07-27 23:25             ` Martin Langhoff
  1 sibling, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-07-27 23:18 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Mon, 28 Jul 2008, Roman Zippel wrote:
> 
> Why are you dismissing what I wrote without even giving it a second 
> thought? I didn't bother with the initial example, because it's so 
> simple, that it's no real challenge.

Did you try it? It really shouldn't be any simpler than anything else. And 
I dismissed what you wrote because the example you _did_ state was about 
something else entirely (ie apparently some giggle bug that simplifies 
things incorrectly).

> What did I do wrong that you rebuff me based on this secondary problem 
> (which I'm quite aware of, because it was me who mentioned in first place) 
> and giving the primary problem (which is the missing history) no 
> attention?

It's not missing history. It's all there in --full-history. The default is 
to give a reasonable simplification, and I told you what the 
simplification was, and it's perfectly conceptually fine - AND IT IS MUCH 
MORE EFFICIENT than the alternatives.

So I'm not seeing your point what-so-ever. 

My point is:

 - with full-history, you have it all, but it's useless in practice

 - without full-history, it's useful in practice

You never gave any examples otherwise.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-27 23:14           ` Roman Zippel
  2008-07-27 23:18             ` Linus Torvalds
@ 2008-07-27 23:25             ` Martin Langhoff
  2008-07-28  1:29               ` Roman Zippel
  1 sibling, 1 reply; 58+ messages in thread
From: Martin Langhoff @ 2008-07-27 23:25 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git

On Mon, Jul 28, 2008 at 11:14 AM, Roman Zippel <zippel@linux-m68k.org> wrote:
> Why are you dismissing what I wrote without even giving it a second
> thought? I didn't bother with the initial example, because it's so
> simple, that it's no real challenge.

I can't speak for anyone else, but you do have to keep in mind that a
solution to this has to be rather fast - and I mean fast in git terms,
not in scripting-language-fast terms.

That you can do it Ruby - and I may be able to do it Perl - has little
bearing on what can be done inside the git log machinery with a small
performance penalty.

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-27 23:18             ` Linus Torvalds
@ 2008-07-28  0:00               ` Roman Zippel
  2008-07-28  5:00                 ` Linus Torvalds
  0 siblings, 1 reply; 58+ messages in thread
From: Roman Zippel @ 2008-07-28  0:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git

Hi,

On Sun, 27 Jul 2008, Linus Torvalds wrote:

> > Why are you dismissing what I wrote without even giving it a second 
> > thought? I didn't bother with the initial example, because it's so 
> > simple, that it's no real challenge.
> 
> Did you try it? It really shouldn't be any simpler than anything else. And 

Of course I did:

$ git log --parents --name-only --full-history lib/vsprintf.c | grep -e ^commit | wc -l
5929

This is same amount of commits as for gitk.

$ /usr/bin/time git log --parents --name-only --full-history lib/vsprintf.c | grep -e ^commit -e ^lib | /usr/bin/time ruby ../filter.rb
3.08user 0.14system 0:03.54elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+48230minor)pagefaults 0swaps
20
["72fd4a35a824331d7a0f4168d7576502d95d34b3", ["0a6047eef1c465c38aacfbdab193161b3f0cd144"]]
["06b2a76d25d3cfbd14680021c1d356c91be6904e", ["96e3e18eed3b48f6d4377dee0326a106e8a04569"]]
["1da177e4c3f41524e886b7f1b8a0c1fc7321cac2", []]
["e905914f96e11862b130dd229f73045dad9a34e8", ["f796937a062c7aeb44cd0e75e1586c8543634a7d"]]
["4e57b6817880946a3a78d5d8cad1ace363f7e449", ["8032230694ec56c168a1404c67a54d281536cbed"]]
["4f9d5f4a353440f2265781bfa641587964901861", ["9b706aee7d92d6ac3002547aea12e3eaa0a750ae"]]
["0a6047eef1c465c38aacfbdab193161b3f0cd144", ["e905914f96e11862b130dd229f73045dad9a34e8"]]
["c6b40d16d1cfa1a01158049bb887a9bbe48ef7ba", ["11443ec7d9286dd25663516436a14edfb5f43857"]]
["ea6f3281a145d16ed53e88b0627f78d5cde6068f", ["72fd4a35a824331d7a0f4168d7576502d95d34b3"]]
["11443ec7d9286dd25663516436a14edfb5f43857", ["ea6f3281a145d16ed53e88b0627f78d5cde6068f"]]
["b39a734097d5095d63eb9c709a6aaf965633bb01", ["c6b40d16d1cfa1a01158049bb887a9bbe48ef7ba"]]
["78a8bf69b32980879975f7e31d30386c50bfe851", ["0f9bfa569d46f2346a53a940b2b9e49a38635732"]]
["4d8a743cdd2690c0bc8d1b8cbd02cffb1ead849f", ["78a8bf69b32980879975f7e31d30386c50bfe851"]]
["0fe1ef24f7bd0020f29ffe287dfdb9ead33ca0b2", ["4d8a743cdd2690c0bc8d1b8cbd02cffb1ead849f"]]
["9b706aee7d92d6ac3002547aea12e3eaa0a750ae", ["06b2a76d25d3cfbd14680021c1d356c91be6904e"]]
["0f9bfa569d46f2346a53a940b2b9e49a38635732", ["4f9d5f4a353440f2265781bfa641587964901861"]]
["4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2", ["b39a734097d5095d63eb9c709a6aaf965633bb01"]]
["8032230694ec56c168a1404c67a54d281536cbed", ["1da177e4c3f41524e886b7f1b8a0c1fc7321cac2"]]
["f796937a062c7aeb44cd0e75e1586c8543634a7d", ["4e57b6817880946a3a78d5d8cad1ace363f7e449"]]
["96e3e18eed3b48f6d4377dee0326a106e8a04569", ["4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2"]]
0.12user 0.02system 0:03.64elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7252minor)pagefaults 0swaps

These are the same 20 commits (with parents) from a simple git-log.

> I dismissed what you wrote because the example you _did_ state was about 
> something else entirely (ie apparently some giggle bug that simplifies 
> things incorrectly).

I'm trying to get back things to the topic and it's not just giggle, every 
tool presents a different version of the history.

> > What did I do wrong that you rebuff me based on this secondary problem 
> > (which I'm quite aware of, because it was me who mentioned in first place) 
> > and giving the primary problem (which is the missing history) no 
> > attention?
> 
> It's not missing history. It's all there in --full-history. The default is 
> to give a reasonable simplification, and I told you what the 
> simplification was, and it's perfectly conceptually fine - AND IT IS MUCH 
> MORE EFFICIENT than the alternatives.
> 
> So I'm not seeing your point what-so-ever. 

That's why I gave you an alternative example, where history is missing.

>  - with full-history, you have it all, but it's useless in practice

Could you please specify which full-history you mean, gitk --full-history 
or git-log --full-history?

>  - without full-history, it's useful in practice

>From a VCS I would still expect nevertheless to present a correct 
history not some approximation.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-27 23:25             ` Martin Langhoff
@ 2008-07-28  1:29               ` Roman Zippel
  0 siblings, 0 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-28  1:29 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, Tim Harper, git

Hi,

On Mon, 28 Jul 2008, Martin Langhoff wrote:

> On Mon, Jul 28, 2008 at 11:14 AM, Roman Zippel <zippel@linux-m68k.org> wrote:
> > Why are you dismissing what I wrote without even giving it a second
> > thought? I didn't bother with the initial example, because it's so
> > simple, that it's no real challenge.
> 
> I can't speak for anyone else, but you do have to keep in mind that a
> solution to this has to be rather fast - and I mean fast in git terms,
> not in scripting-language-fast terms.

You also have to keep in mind, that I haven't really hacked git before, so 
I'm just trying to do something with the data I can somehow extract from 
it. I seriously didn't thought that anyone wouldn't understand that the 
code example was just a proof of concept.

> That you can do it Ruby - and I may be able to do it Perl - has little
> bearing on what can be done inside the git log machinery with a small
> performance penalty.

It also has to do with correctness, is performance more important than 
correctness? 
Part of the problem is, what is the correct history, as which it should be 
displayed via the various interfaces by default.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-28  0:00               ` Roman Zippel
@ 2008-07-28  5:00                 ` Linus Torvalds
  2008-07-28  5:30                   ` Linus Torvalds
  2008-07-29  2:59                   ` Roman Zippel
  0 siblings, 2 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-28  5:00 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Mon, 28 Jul 2008, Roman Zippel wrote:
> 
> Of course I did:
> 
> $ git log --parents --name-only --full-history lib/vsprintf.c | grep -e ^commit | wc -l
> 5929
> 
> This is same amount of commits as for gitk.

OF COURSE it's the same numbr of commits. That's what gitk uses. That's 
what you *have* to use.

You don't actually understand how git works, do you?

> These are the same 20 commits (with parents) from a simple git-log.

So what? The point I tried to make is that _any_ algorithm that gets the 
above case right by actually simplifying the commits "post facto" probably 
gets any case right. You tried to find some more interestign case, but you 
missed the whole point - even the "simple" case is quite hard enough. 

IOW, don't look for anythign more difficult, because if you do, you don't 
understand the problem to begin with!

Do you not understand that the problem is that "post facto" isn't actually 
acceptable? Have you looked at all at the revision reading code? Hmm?

The regular merge simplification does the simplification _before_ it has 
gathered the whole history of commits. And that is really really 
important.

And I realize that you don't even seem to understand the difference. But 
to simplify it for you, let me give you a challenge. Try this:

	time sh -c "git log --parents --full-history lib/vsprintf.c | head"

and if it takes noticeably more than a tenth of a second on my machine, 
you lose.

Because that's roughly what it takes right now, and it's what means that 
effectively the normal log is instantaneous. It's why you can start 
looking at the log without waiting for three seconds, even though the 
_full_ log may take three seconds to compute (ok, on my machine it takes 
2.3s, but whatever).

And it's why gitk can start printing out the history _before_ three 
seconds has passed. And that's really really important.

Try it. Really. Just do "gitk lib/vsprintf.c" and look at how it does 
things incrementally. It doesn't wait for a couple of seconds and then 
show things.

Absolutely EVERYTHING in git would be totally trivial if you did it all 
based on generating first the whole history, and then simplifying it from 
there. But git would be unusably slow in real life, and it would scale 
_horribly_ badly with history size.

So _all_ the normal code actually generates the history INCREMENTALLY, and 
it absolutely _has_ to work that way.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-28  5:00                 ` Linus Torvalds
@ 2008-07-28  5:30                   ` Linus Torvalds
  2008-07-29  2:59                   ` Roman Zippel
  1 sibling, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-28  5:30 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Sun, 27 Jul 2008, Linus Torvalds wrote:
> 
> And it's why gitk can start printing out the history _before_ three 
> seconds has passed. And that's really really important.

Btw, the reason it's really really important is that "three seconds" can 
actually easily be "three minutes" - if the project is big, or if you 
simply don't have everything in the cache, so you actually need to do tons 
of IO to generate the whole history.

So every normal operation absolutely _must_ be incremental and not rely on 
any calculation of the whole history in order to then simplify it.

Of course, post-processing is fine for some things. For example, in the 
thread I pointed you to originally (see filter-branch + full-history in 
google, or look in some git archive) I suggested a post-processing of the 
merge history for filter-branch. I suspect it's very acceptable for _that_ 
kind of use to "batch" things up and not do them with partial knowledge.

But this incremental thing is why I for example suggest people should use 
"git gui blame" instead of "git blame" when looking for problems - because 
the latter cannot be done incrementally, and as a result can cause really 
irritating delays (exactly because it basically needs to synchronously 
walk back to the beginning of history).

The kernel repo, btw, is pretty small in this regard. The cases that 
caused much more pain were the insane KDE ones that were something like 
ten times the size. We've optimized things pretty aggressively, but...

Btw, if I sound irritated, it's because we had all these discussions about 
three _years_ ago when git got started. This is not a new issue. It's 
hard.

I've been pushing on people to do things incrementally very hard over the 
last few years because it's such a _huge_ usability issue.

For example, I've pointed you to the incremental nature of "gitk" as an 
example of how things should work, but that's actually fairly recent: it 
wasn't that long ago that "gitk" used to pass in "--topo-order" or 
"--date-order" to the core git revision machinery, and that actually is 
another of those "global" operations that you need the whole history for.

So gitk actually used to pause for three seconds (or ten. or thirty) 
before it would show the results. I'm really happy to report that Paul 
finally did the (trivial) topo-sort in gitk, meaning that he could re-sort 
it as necessary and keep things incremental. It was one of my biggest UI 
gripes for the longest time (and I wasted time adding a special "partial 
output mode" that gitk didn't even then end up using because Paul did 
things the right way).

Btw, from a git log viewer standpoint, the "merge history simplification" 
is all the exact same problem as the "--topo-order" flag is: you could 
use the (incremental and very verbose)

	git log --full-history --parents

output as the base-line, and then you could do the commit simplification 
of things interactively.

But "git log" itself cannot do it by default, since that would mean that 
git log itself would have to wait for the whole history to be generated. 

That's because output to a pipe is fundamentally linear (ie it cannot 
"re-write" the things it has already shown as it finds a simplification: 
there is no incremental way to rewrite things "after the fact").

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-28  5:00                 ` Linus Torvalds
  2008-07-28  5:30                   ` Linus Torvalds
@ 2008-07-29  2:59                   ` Roman Zippel
  2008-07-29  3:15                     ` Martin Langhoff
                                       ` (2 more replies)
  1 sibling, 3 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-29  2:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git

Hi,

On Sun, 27 Jul 2008, Linus Torvalds wrote:

> You don't actually understand how git works, do you?

Probably not and to be honest I don't care, all I want is my history - the 
correct one. All I know it's in there somehow and all I care is how to get 
it out of there.
Right now you're giving me the choice between a crappy incomplete history 
or a crappy history full of useless information. That's it? As long as 
your challenge involves being compared to crappy history, I'm not 
interested. If the solution should involve a switch "--correct-history" 
or I have to wait for the result, I don't care, because it's the correct 
history I want. As long as you're trying to sell me crappy history I'm not 
buying it.

Can we please get past this and look at what is required to produce the 
correct history?

Fact is based on the current git-log output it's simply impossible to 
produce the correct history without reading the whole history first, since 
the very last commit can still be relevant for an earlier merge to connect 
it properly into the graph. This means we need some extra information 
before even starting to scan through the commits. Luckily this information 
can be cached once it has been generated and it also can be updated 
incrementally. E.g. this information could be generated while generating 
the pack, so you only had to scan the commits which haven't been packed 
yet, but it's also possible to update it when merging/pulling new data. 
It's also not much information that is needed, all that is needed is list 
of commits per file (which are are usually only a few, mostly even none), 
which git-log can use, so it knows that these are important while scanning 
the tree.

Technically I don't see a really hard problem to implement this, the 
problem for me is only that I have no idea where to start this within git 
and how to integrate it. The other problem (over which I have absolutely 
no control) is whether anyone actually wants to produce a correct history.
I hope there is someone, otherwise there would be no need for "git-log 
--full-history" (which is also used by git-web), this e.g. produces a nice 
example in kernel/printk.c, where git-web produces two commits (search for 
tty_write_message), for which none of the front ends can tell me usefully 
how it fits into the history (they either don't show it at all or it's 
lost in "gitk --full-history").

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29  2:59                   ` Roman Zippel
@ 2008-07-29  3:15                     ` Martin Langhoff
  2008-07-30  0:16                       ` Roman Zippel
  2008-07-29  3:29                     ` Linus Torvalds
  2008-07-29  5:31                     ` Jeff King
  2 siblings, 1 reply; 58+ messages in thread
From: Martin Langhoff @ 2008-07-29  3:15 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git

On Tue, Jul 29, 2008 at 2:59 PM, Roman Zippel <zippel@linux-m68k.org> wrote:
> Can we please get past this and look at what is required to produce the
> correct history?

Roman - correct is --full-history -- any simplification that makes it
easy on your eyes *is* a simplification. And consumers that want to do
nice user-friendly simplification like gitk does can hang off the data
stream.

> it's also possible to update it when merging/pulling new data.

If that's what you want to do, you can prototype it with a hook on
fetch and commit. That is definitely an area that hasn't been explored
- what nicer (but expensive) views on the history we have can be
afforded by pre-computing things on fetch and commit hooks.

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29  2:59                   ` Roman Zippel
  2008-07-29  3:15                     ` Martin Langhoff
@ 2008-07-29  3:29                     ` Linus Torvalds
  2008-07-29  3:33                       ` Linus Torvalds
  2008-07-29 11:39                       ` Roman Zippel
  2008-07-29  5:31                     ` Jeff King
  2 siblings, 2 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-29  3:29 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Tue, 29 Jul 2008, Roman Zippel wrote:
> 
> Right now you're giving me the choice between a crappy incomplete history 
> or a crappy history full of useless information. That's it? As long as 
> your challenge involves being compared to crappy history, I'm not 
> interested.

No, my challenges have not been about "crappy history" but about 
performance and about getting it right. The reason I pointed you to 
lib/vsprintf.c had nothing to do with crappiness, and had everythign to do 
with just picking a random example of something where you absolutely 
*HAVE* to simplify history.

The fact that it simplifies to a linear one is totally immaterial. You 
continue to miss the point. Over and over AND OVER again!

> Can we please get past this and look at what is required to produce the 
> correct history?

I'm not even going to bother with this argument.

You dismiss all my issues, and then you continue to talk about "correct", 
even though it isn't a correctness thing - it's a difference of opinion. 
Me, I *much* prefer the simplified history. That _is_ the correct one for 
me.

And the sad part is, what you want is there. It's a command line switch 
away. You were told in the very first message what the switch was. If you 
don't want to use "--full-history", you can actually use "git whatchanged" 
instead of "git log", and it implies the switch without you even having to 
type it.

So it's all there. Use it. Just don't bother adding me to the cc to your 
inane flames, because I'm fed up with the fact that you can't actually be 
bothered to read my emails, and just want to flame.

And quite frankly, I've seen that behaviour from you before, when it comes 
to other things. So go away. Write the code. Come back with patches. In 
the meantime, we've told you what to do: use --full-history if you really 
want it.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29  3:29                     ` Linus Torvalds
@ 2008-07-29  3:33                       ` Linus Torvalds
  2008-07-29 11:39                       ` Roman Zippel
  1 sibling, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-29  3:33 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Mon, 28 Jul 2008, Linus Torvalds wrote:
> 
> And quite frankly, I've seen that behaviour from you before, when it comes 
> to other things. So go away. Write the code. Come back with patches. In 
> the meantime, we've told you what to do: use --full-history if you really 
> want it.

Btw, if you really do end up wanting to actually do something about it, I 
can already tell you that trying to do so in "git log" isn't going to be 
useful. Do it in "gitk" instead, and make gitk simplify the --full-history 
output.

I tried to explain to you the why part earlier (go back and look for 
"incremental" and "topo-sort"), but it all seemed to fly right by you, and 
you started repeating your ranting instead.

IOW, put up or shut up.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29  2:59                   ` Roman Zippel
  2008-07-29  3:15                     ` Martin Langhoff
  2008-07-29  3:29                     ` Linus Torvalds
@ 2008-07-29  5:31                     ` Jeff King
  2008-07-29 12:32                       ` Roman Zippel
  2 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2008-07-29  5:31 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git

On Tue, Jul 29, 2008 at 04:59:01AM +0200, Roman Zippel wrote:

> Right now you're giving me the choice between a crappy incomplete history 
> or a crappy history full of useless information. That's it? As long as 
> your challenge involves being compared to crappy history, I'm not 
> interested. If the solution should involve a switch "--correct-history" 
> or I have to wait for the result, I don't care, because it's the correct 
> history I want. As long as you're trying to sell me crappy history I'm not 
> buying it.
> 
> Can we please get past this and look at what is required to produce the 
> correct history?

You seem to be indicating here (and elsewhere in the thread) that there
exists some history graph for which neither "git log" nor "git log
--full-history" produces the output you want, but that there is some
better output (even if it might take more time to compute).

Perhaps I am just slow, but I haven't been able to figure out what that
history is, or what the "correct" output should be. Can you try to state
more clearly what it is you are looking for?

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29  3:29                     ` Linus Torvalds
  2008-07-29  3:33                       ` Linus Torvalds
@ 2008-07-29 11:39                       ` Roman Zippel
  2008-07-29 12:00                         ` David Kastrup
  2008-07-29 15:50                         ` Linus Torvalds
  1 sibling, 2 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-29 11:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git

Hi,

On Mon, 28 Jul 2008, Linus Torvalds wrote:

> You dismiss all my issues, and then you continue to talk about "correct", 
> even though it isn't a correctness thing - it's a difference of opinion. 
> Me, I *much* prefer the simplified history. That _is_ the correct one for 
> me.

I'm not dismissing it, but your focus is on how to get this result. If the 
results were always the same, I wouldn't have a problem at all.
That's why I'm trying to give you an example where the end result differs, 
how are we supposed to get to an agreement on _how_ to get the result, if 
we don't even agree on _what_ the result should be?

> And quite frankly, I've seen that behaviour from you before, when it comes 
> to other things.

What exact behaviour is that? That I dare to disagree with you?

> So go away. Write the code. Come back with patches.

If you knew me that well, you also knew that such threats don't work with 
me.
In this case you know perfectly well, that I don't know the code as well 
as you, so without any help it would require a huge waste of time with the 
risk of rejection.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 11:39                       ` Roman Zippel
@ 2008-07-29 12:00                         ` David Kastrup
  2008-07-29 15:50                         ` Linus Torvalds
  1 sibling, 0 replies; 58+ messages in thread
From: David Kastrup @ 2008-07-29 12:00 UTC (permalink / raw)
  To: git

Roman Zippel <zippel@linux-m68k.org> writes:

> On Mon, 28 Jul 2008, Linus Torvalds wrote:
>
>> So go away. Write the code. Come back with patches.
>
> If you knew me that well, you also knew that such threats don't work
> with me.

I'd like to add to Linus' suggested course of action the task "look up
threat in a dictionary".

> In this case you know perfectly well, that I don't know the code as
> well as you, so without any help it would require a huge waste of time
> with the risk of rejection.

If _you_ need the functionality, you can easily keep it around in _your_
copy.  That's what I do with a few patches that were not accepted here.
So the "risk of rejection" has no real cost for you (apart from an
occasional rebase on origin, which is cheap).  Just for others.  And of
no-one else argues your case, then they probably don't mind.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29  5:31                     ` Jeff King
@ 2008-07-29 12:32                       ` Roman Zippel
  2008-07-29 12:48                         ` Olivier Galibert
  2008-07-29 12:52                         ` Jeff King
  0 siblings, 2 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-29 12:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Tim Harper, git

Hi,

On Tue, 29 Jul 2008, Jeff King wrote:

> > Can we please get past this and look at what is required to produce the 
> > correct history?
> 
> You seem to be indicating here (and elsewhere in the thread) that there
> exists some history graph for which neither "git log" nor "git log
> --full-history" produces the output you want, but that there is some
> better output (even if it might take more time to compute).
> 
> Perhaps I am just slow, but I haven't been able to figure out what that
> history is, or what the "correct" output should be. Can you try to state
> more clearly what it is you are looking for?

Most frequently this involves changes where the same change is merged 
twice. Another interesting example is kernel/printk.c where a change is 
added and later removed again before it's merged.
With "git-log --full-history" you see these changes, but it lacks the 
necessary merges to produce a full graph. As consequence none of the 
graphical front ends produce a useful history graph.

This problem now hits me now more seriously in a repository conversion, 
where it frequently happened, that changes were applied both locally and 
upstream, so that I have relatively many of these empty merges and the 
default git-log output is useless. --full-history is more of a workaround 
than a real solution and again the history graph in _all_ graphical front 
ends is useless.

More generally this means in any kind of situation where you maintain your 
own repository and it takes a while until upstream accepts your changes, 
so that you frequently have duplicated changes (because upstream doesn't 
use git or doesn't pull directly), you have to be careful to get the right 
history out of git.

The point is now that I think the problem is solvable even within Linus' 
constraints, so that git-log produces the right output by default and a 
workaround like --full-history isn't needed anymore.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 12:32                       ` Roman Zippel
@ 2008-07-29 12:48                         ` Olivier Galibert
  2008-07-29 12:52                         ` Jeff King
  1 sibling, 0 replies; 58+ messages in thread
From: Olivier Galibert @ 2008-07-29 12:48 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Jeff King, Linus Torvalds, Tim Harper, git

On Tue, Jul 29, 2008 at 02:32:14PM +0200, Roman Zippel wrote:
> Most frequently this involves changes where the same change is merged 
> twice. [...]

You're not answering the question.  You cite cases when you consider
both the default and the full-history output incorrect from a
usefulness point of view.  The question is, what would the correct
output be, from your point of view?  What should be shown, and what
shouldn't?

  OG.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 12:32                       ` Roman Zippel
  2008-07-29 12:48                         ` Olivier Galibert
@ 2008-07-29 12:52                         ` Jeff King
  2008-07-29 17:25                           ` Linus Torvalds
  2008-07-30  2:48                           ` Roman Zippel
  1 sibling, 2 replies; 58+ messages in thread
From: Jeff King @ 2008-07-29 12:52 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git

On Tue, Jul 29, 2008 at 02:32:14PM +0200, Roman Zippel wrote:

> > Perhaps I am just slow, but I haven't been able to figure out what that
> > history is, or what the "correct" output should be. Can you try to state
> > more clearly what it is you are looking for?
> 
> Most frequently this involves changes where the same change is merged 
> twice. Another interesting example is kernel/printk.c where a change is 
> added and later removed again before it's merged.

I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane.
I was really hoping for you to make your case as something like:

  1. here is an ascii diagram of an actual history graph (or a recipe of
     git commands for making one)
  2. here is what git-log (or gitk) produces for this history by
     default; and here is why it is not optimal (presumably some
     information it fails to convey)
  3. here is what git-log (or gitk) with --full-history produces; and
     here is why it is not optimal (presumably because it is too messy)
  4. here is what output I would like to see. Bonus points for "and here
     is an algorithm that accomplishes it."

> The point is now that I think the problem is solvable even within Linus' 
> constraints, so that git-log produces the right output by default and a 
> workaround like --full-history isn't needed anymore.

I think this is a separate issue. Even if you came up with some great
new history simplification, it likely wouldn't become the _default_
right away anyway. So you need to:

  1. produce a new simplification algorithm that is at least useful in
     _some_ contexts. Then this can be used when desired for those
     contexts. It almost doesn't matter how efficient it is, if it is
     providing results that are otherwise unavailable. A user can choose
     to take the performance hit to get those results.

  2. If that algorithm doesn't provide worse output in any other
     contexts _and_ it has similar performance to the current default,
     then it can be considered for the default.

But I haven't seen convincing evidence leading to step '1', so arguing
about step '2' seems pointless.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 11:39                       ` Roman Zippel
  2008-07-29 12:00                         ` David Kastrup
@ 2008-07-29 15:50                         ` Linus Torvalds
  2008-07-30  1:14                           ` Roman Zippel
  1 sibling, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-07-29 15:50 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Tue, 29 Jul 2008, Roman Zippel wrote:
> 
> I'm not dismissing it, but your focus is on how to get this result.

No, you misunderstand.

My focus is really on one single thing:

 - performance

with a smaller focus on the fact that I simply don't see how it's 
_possible_ to do better than our current all-or-nothing approach of 
simplification (eg either extreme simplification or none at all: nothing 
or --full-history).

So here's my challenge again, which you seem to have TOTALLY MISSED.

Make this be fast:

	time sh -c "git log <filename> | head"

nothing else matters. If you can make that one be fast, I'm happy. 

And that "| head" is really very fundamentally important. The important 
thing from a performance standpoint is not how long the _whole_ "log" 
takes. The important thing is how fast it _feels_, and that is directly 
tied to how fast it starts outputting the data.

Put another way: I _know_ how to simplify things. Trust me, Roman. That's 
not the problem. But doing it incrementally is really really hard, to the 
point that I actually believe that it is impossible to do. 

And doing it after-the-fact is simply not interesting. We could trivially 
(well, _fairly_ trivially) do it when we do the topology sort. But I have 
long long tried to teach people _not_ to do the topo sort inside the core 
git machinery, exactly because it is a horrid thing from an interactivity 
standpoint.

In fact, you can see what I'm talking about by trying --topo-order in the 
above timing test.

Really. Just _try_ it. And if you still don't understand what I'm talking 
about, I don't know what to say.

> > And quite frankly, I've seen that behaviour from you before, when it comes 
> > to other things.
> 
> What exact behaviour is that? That I dare to disagree with you?

No. The fact that you like arguing _pointlessly_, and just being abrasive, 
without actually helping or understanding the big picture. I'm thinking 
back on the whole scheduler thing. You weren't arguing with _me_, but you 
had the same modus operandi.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 12:52                         ` Jeff King
@ 2008-07-29 17:25                           ` Linus Torvalds
  2008-07-30  1:50                             ` Roman Zippel
  2008-07-30  4:26                             ` Jeff King
  2008-07-30  2:48                           ` Roman Zippel
  1 sibling, 2 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-29 17:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Roman Zippel, Tim Harper, git

On Tue, 29 Jul 2008, Jeff King wrote:
> 
> I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane.

Jeff, it _is_ sane. When Roman says it's "incorrect", he is just wrong.

But it's true that "gitk kernel/printk.c" does simplify the history, and 
does so very aggressively. It does so very much by design, and has done it 
since pretty much day one. And it's a good thing - and it is "correct" - 
but it does mean that you may literally be missing things that are part of 
"history" but that didn't matter much.

The most trivial way to show this is actually by making such a simple case 
that it's obvious what is going on. Do this:

	mkdir test-simplification
	cd test-simplification
	git init
	echo "Hi there" > file
	git add file
	git commit -m"Initial file"

	git branch other-branch

	echo "Hello" > file
	git add file
	git commit -m"Modified file"

	git checkout other-branch

	echo "Hello" > file
	git add file
	git commit -m"Another person modified the file identically"

	echo "This is a stupid example" > another-file
	git add another-file
	git commit -m"Add another file"

	git merge master

Now, do these three things

	gitk
	gitk file
	gitk --full-history file

and compare them. They all show _different_ histories.

Which one is "correct"? They all are. It just depends on what you want to 
see.

The "gitk file" history is the simplest one BY FAR, because it has very 
aggressively simplified history to the point where it tried to find the 
_simplest_ history that explains the current contents of 'file'[*]

>From a practical standpoint, and from having used this a long time, I'd 
argue that the simple history is the one that you want 99.9% of all time. 
But not _always_. Sometimes, the things that got simplified away actually 
matter. It's rare, but it happens.

For example, maybe you had a bug-fix that you _know_ you did, and it it 
doesn't show up in the simplified history. That really pisses you off, and 
it apparently really pisses Roman off that it can happen. But the fact is, 
that still doesn't mean that the simple history is "wrong" or even 
"incomplete".

No, it's actually meaningful data in itself. If the bug-fix doesn't show 
in the simplified history, then that simply means that the bug-fix was not 
on a branch that could _possibly_ have mattered for the current contents. 

So once you are _aware_ of history simplification and are mentally able to 
accept it, the fact that history got simplified is actually just another 
tool.

And that's why "-full-history" and "git whatchanged" exist. They are ways 
to start delving deeper - they shouldn't be the _default_ mode, but they 
are ways to show more information when the initial default simple mode 
turns out to show that something didn't even matter for the end result.

And yes, there is a mid-way point between "aggressive simplification" 
(default) and "no simplification at all" (--full-history). It's more 
complex than either, and I do think it would be useful to have. It's what 
Roman wants, but as long as he thinks it's the _only_ correct answer, and 
refuses to face the performance issues, the discussion with Roman is kind 
of pointless.

			Linus

[*] when I say "_simplest_ history", I do want to point out that the 
history simplification is always a "local optimization", and it doesn't 
try to check all possible paths: there can be other histories that are 
even simpler on a global scale.

But in practice it is _one_ history of the file, and it's a history that 
is not "unnecessarily complicated" considering the simple heurstics for 
finding it.

So think "local minima" instead of "global minima", and in practice the 
local one is pretty close to the global one, although there are obviously 
always extreme cases where the two can differ by a whole lot.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29  3:15                     ` Martin Langhoff
@ 2008-07-30  0:16                       ` Roman Zippel
  2008-07-30  0:25                         ` Martin Langhoff
                                           ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-30  0:16 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, Tim Harper, git

Hi,

On Tue, 29 Jul 2008, Martin Langhoff wrote:

> On Tue, Jul 29, 2008 at 2:59 PM, Roman Zippel <zippel@linux-m68k.org> wrote:
> > Can we please get past this and look at what is required to produce the
> > correct history?
> 
> Roman - correct is --full-history -- any simplification that makes it
> easy on your eyes *is* a simplification. And consumers that want to do
> nice user-friendly simplification like gitk does can hang off the data
> stream.

I don't quite understand what you're trying to say.
To avoid further confusion it maybe helps to specify a few of the terms:

- full history graph: produced by "git-log --full-history --parents"
- compact history graph: the full history graph without without any 
  repeated merges, this is what my example script produces.
- full simplified history: output of "git-log --full-history"
- short simplified history: standard output of "git-log"

The important part about the history graphs is that all commits are 
properly connected in it (i.e. all except the head commit have a child), 
This is needed to know if you don't just what want to know what happened, 
but also how it got merged, also any graphical interface needs it to 
produce a useful history graph.

What the short simplified history is more pure laziness, it's fast and 
gets the most common cases right, but in order to do this it has to ignore 
part of the history. The full simplified history at least produces 
produces the full change history, but it lacks part of the merge history 
and it stills takes longer to generate.

The point I'm trying to make is that the compact history graph has the 
potential to completely replace the simplified history. The only problem 
is that it needs a bit of cached extra information, then it can be as fast 
the short simplified history for the common case and it still can produce 
as much information as the full simplified history, thus you can still 
apply as much simplification as you want on top of it.

Keep in mind that e.g. git-web is using the full simplified history, so 
what I'm offering also has the potential to improve git-web performance...

> > it's also possible to update it when merging/pulling new data.
> 
> If that's what you want to do, you can prototype it with a hook on
> fetch and commit. That is definitely an area that hasn't been explored
> - what nicer (but expensive) views on the history we have can be
> afforded by pre-computing things on fetch and commit hooks.

I already did the prototype, I know how to generate that information, the 
problem is to get that information to the various graphical interfaces.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  0:16                       ` Roman Zippel
@ 2008-07-30  0:25                         ` Martin Langhoff
  2008-07-30  0:32                         ` Linus Torvalds
  2008-07-30  8:36                         ` Bizarre missing changes (git bug?) Jakub Narebski
  2 siblings, 0 replies; 58+ messages in thread
From: Martin Langhoff @ 2008-07-30  0:25 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git

On Wed, Jul 30, 2008 at 12:16 PM, Roman Zippel <zippel@linux-m68k.org> wrote:
> I already did the prototype

afaict people around here are only interested if it can be done
without losing the early-output niceness of current git-log. That it
can be worked out in a "put it all in memory and work it in there"
model is _not_ interesting for git.

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  0:16                       ` Roman Zippel
  2008-07-30  0:25                         ` Martin Langhoff
@ 2008-07-30  0:32                         ` Linus Torvalds
  2008-07-30  0:48                           ` Linus Torvalds
  2008-07-30  8:36                         ` Bizarre missing changes (git bug?) Jakub Narebski
  2 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-07-30  0:32 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Martin Langhoff, Tim Harper, git

On Wed, 30 Jul 2008, Roman Zippel wrote:
> 
> What the short simplified history is more pure laziness

No.

Roman, you're an idiot who doesn't even _understand_ what you are talking 
about. Sadly, you then _think_ you are so smart that you then refuse to 
even consider the fact that others disagree, so you don't even read what 
they write.

Go to my previous email in this thread. Do the example. Look at the 
simplified version. Ponder.

It's not "pure lazyness" when you get the simplified version. It's 
actually a MORE USEFUL RESULT! The simplified version shows the minimal 
explanation of how things ended up the way they are, and that is damn 
useful. What you want is extra _clutter_ most of the time.

It's really sad how you cannot get over your own prejudices here. 

So Roman. Go back, read my previous email in this thread. It's message ID 
is

	<alpine.LFD.1.10.0807291006070.3334@nehalem.linux-foundation.org>

in case it helps you find it.

Read it twice, or three times. Read it with the notion that maybe you 
didn't know best after all. Read it with the possibility that maybe there 
are smarter people than you, and people who have actually worked with git 
for several years.

And if you can't do that, at least stop cc'ing me with your idiocy.

To get to the meat of your email:

> The point I'm trying to make is that the compact history graph has the 
> potential to completely replace the simplified history. The only problem 
> is that it needs a bit of cached extra information, then it can be as fast 
> the short simplified history for the common case and it still can produce 
> as much information as the full simplified history, thus you can still 
> apply as much simplification as you want on top of it.

You're simply full of sh*t. You make two huge mistakes, and I'll spend 
another few minutes of my life trying to educate you one final more time, 
even though from every single indication I have so far, you are unable to 
learn simply because you think you already know the answer.

Your two mistakes are:

 - your "only" problem is fundamental.  It's unsolvable. Git history 
   simplification isn't per-file or even per-directory.  It's 
   per-any-random-set-of-pathnames. You can't "cache" the simplified 
   information, and it's not "a bit" of cached extra info. It's 
   fundamentally a metric truckload of info.

   With a cache, you can make the performance of a repeated query go fast, 
   but that's totally uninteresting.

 - But the other huge mistake you make is EVEN MORE STUPID, because it's 
   so ironic. That magical output you want, and claim is so perfect, and 
   point out "thus you can still apply as much simplification as you want 
   on top of it"? You know what? It already _exists_! It's exactly that 
   --full-history case.

   Can you not see that? That's exactly that --full-history --parents 
   cases. It gives you the full information. You can simplify it to what 
   you want, exactly because it did _not_ simplify things for you. I've 
   even told you so, multiple times, when I suggested you try to do that 
   simplification in "gitk".

In other words, git has the two cases you want: the "extreme simplified 
history" (that is nice to see what really _mattered_, with no extra 
unnecessary duplicate history that didn't actually affect the end result), 
and the "full" history (ooh, I know, we could make a command line called 
"--full-history" to get the latter, so that people who wanted to see it 
all and perhaps distill it to something else could do so).

And I've told you over and over what you should look at, and I've told you 
over and over that the default is actually the _useful_ case, and why. But 
you seem to refuse to listen. You just close your ears and repeat your 
mantra, even though people smarter than you have told you why it's done 
the way it's done. 

Stop stuffing your ears. Listen to what people tell you.

		Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  0:32                         ` Linus Torvalds
@ 2008-07-30  0:48                           ` Linus Torvalds
  2008-07-30 23:56                             ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-07-30  0:48 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Martin Langhoff, Tim Harper, git

On Tue, 29 Jul 2008, Linus Torvalds wrote:
> 
>  - But the other huge mistake you make is EVEN MORE STUPID, because it's 
>    so ironic. That magical output you want, and claim is so perfect, and 
>    point out "thus you can still apply as much simplification as you want 
>    on top of it"? You know what? It already _exists_! It's exactly that 
>    --full-history case.

Put in other terms: what you ask for can be fairly trivially done as a 
filter on the _current_ git output (preferably merged into the tool that 
shows it graphically in the first place), with absolutely no downside.

In contrast, if somebody was really so _stupid_ as to go with your output 
format, then yes, he could further simplify it down to the current default 
format, but with a huge performance/interactivity downside.

See? Your preferred format is not actually the "best" format. Not at all. 
Quite the reverse. Your preferred format is much better off being a 
secondary post-processing format exactly because it can be generated from 
one of the primary formats easily enough.

But the reverse isn't true: the current primary formats cannot be 
generated from your preferred format without losing something important 
(performance).

But I'll make you a deal: if you actually write the filter in C form, I 
can pretty much guarantee that we can easily add it as a new flag. It 
really should be pretty easy to integrate it into the revision parsing 
machinery alongside --topo-order, since it's really the same kind of 
operation.

In fact, it's possible that the current --topo-order sorting could 
possibly be made to just do the simplification (conditionally, of course, 
since it has the latency problem). See the function

	void sort_in_topological_order(struct commit_list ** list, int lifo)

in commit.c - that's where it would hook in.

		Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 15:50                         ` Linus Torvalds
@ 2008-07-30  1:14                           ` Roman Zippel
  2008-07-30  1:32                             ` Kevin Ballard
  2008-07-30  1:49                             ` Linus Torvalds
  0 siblings, 2 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-30  1:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git

Hi,

On Tue, 29 Jul 2008, Linus Torvalds wrote:

> On Tue, 29 Jul 2008, Roman Zippel wrote:
> > 
> > I'm not dismissing it, but your focus is on how to get this result.
> 
> No, you misunderstand.
> 
> My focus is really on one single thing:
> 
>  - performance
> 
> with a smaller focus on the fact that I simply don't see how it's 
> _possible_ to do better than our current all-or-nothing approach of 
> simplification (eg either extreme simplification or none at all: nothing 
> or --full-history).

That's exactly what I'm not dismissing as you claim, but I've hit the 
problem where this approach simply produces crap, so I'm foremost 
interested in getting a useful result, only after that I'm interested in 
the performance (which I think is possible).

> So here's my challenge again, which you seem to have TOTALLY MISSED.
> 
> Make this be fast:
> 
> 	time sh -c "git log <filename> | head"
> 
> nothing else matters. If you can make that one be fast, I'm happy. 

I already explained it, but you simply dismissed it. It's possible, but it 
requires a bit of cached information (e.g. as part of the pack file, which 
is needed for decent performance anyway).

> In fact, you can see what I'm talking about by trying --topo-order in the 
> above timing test.

Please give me full example.
gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't 
show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments with 
git-rev-list show no improvement either.

> > > And quite frankly, I've seen that behaviour from you before, when it comes 
> > > to other things.
> > 
> > What exact behaviour is that? That I dare to disagree with you?
> 
> No. The fact that you like arguing _pointlessly_, and just being abrasive, 
> without actually helping or understanding the big picture.

The problem is that your picture doesn't include my specific problem, I'm 
very interested in the big picture, but I'd like to be in it.

> I'm thinking 
> back on the whole scheduler thing. You weren't arguing with _me_, but you 
> had the same modus operandi.

Well, it seems I have talent for finding the special cases, e.g. last time 
I tested the scheduler it was performing twice as bad as the old scheduler 
on m68k. I've also seen cases where it sacrifices throughput for 
interactivity.
Anyway, this is the wrong place for it anyway, the problem I'm hitting is 
these "good enough" solutions, which work in most situations, but fail in 
a few special situations, but nobody is interested to get these right 
unless your name is Linus.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  1:14                           ` Roman Zippel
@ 2008-07-30  1:32                             ` Kevin Ballard
  2008-07-30  1:49                             ` Linus Torvalds
  1 sibling, 0 replies; 58+ messages in thread
From: Kevin Ballard @ 2008-07-30  1:32 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git

On Jul 29, 2008, at 6:14 PM, Roman Zippel wrote:

>> So here's my challenge again, which you seem to have TOTALLY MISSED.
>>
>> Make this be fast:
>>
>> 	time sh -c "git log <filename> | head"
>>
>> nothing else matters. If you can make that one be fast, I'm happy.
>
> I already explained it, but you simply dismissed it. It's possible,  
> but it
> requires a bit of cached information (e.g. as part of the pack file,  
> which
> is needed for decent performance anyway).

As an outside observer, this argument is basically akin to "it's easy  
to fly, you just need some faerie dust". Basically, you're dismissing  
the entire complexity of the problem by saying "oh, that's easy, just  
use some cached data" without any proof that this would work, or any  
sample code, or really any evidence at all. Given that the path  
simplification can be arbitrarily complex (I can pass any set of paths  
I want), I don't believe that you can just use "a bit of cached  
information" for this. If you did rely on cached information, said  
information would probably be orders of magnitude larger than the  
object graph itself (for repos with lots of files).

>> In fact, you can see what I'm talking about by trying --topo-order  
>> in the
>> above timing test.
>
> Please give me full example.
> gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't
> show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments  
> with
> git-rev-list show no improvement either.

He's not saying it changes what commits are shown, he's saying it has  
a performance impact - topo order has to post-process the graph. For a  
quick demonstration, run `time sh -c 'git log | head'` vs `time sh -c  
'git log --topo-order | head'`.

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  1:14                           ` Roman Zippel
  2008-07-30  1:32                             ` Kevin Ballard
@ 2008-07-30  1:49                             ` Linus Torvalds
  1 sibling, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-30  1:49 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Tim Harper, git

On Wed, 30 Jul 2008, Roman Zippel wrote:
> > 
> > 	time sh -c "git log <filename> | head"
> > 
> > nothing else matters. If you can make that one be fast, I'm happy. 
> 
> I already explained it, but you simply dismissed it. It's possible, but it 
> requires a bit of cached information (e.g. as part of the pack file, which 
> is needed for decent performance anyway).

Bzzt. Wrong. Try again.

> > In fact, you can see what I'm talking about by trying --topo-order in the 
> > above timing test.
> 
> Please give me full example.
> gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't 
> show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments with 
> git-rev-list show no improvement either.

Roman, what the f*ck is wrong with you? Let me repeat that thing one more 
time:

	you can see what I'm talking about by trying --topo-order in the
	above timing test.
	      ^^^^^^^^^^^

The fact is, --topo-order is a post-processing thing, exactly the way your 
half-way simplification would be. It requires _all_ commits, and it 
requires them because we cannot guarantee that we output all children 
before the parents when there are multiple threads without a central clock 
(ie any distributed environment).

So for --topo-order, we generate the whole history, and then we sort it. 

As a result, it has horrible interactivity behavior. Try it. Here's some 
random command lines, and the times:

	time git log --topo-order drivers/scsi/scsi_lib.c | head

	real    0m0.688s
	user    0m0.652s
	sys     0m0.036s

and without:

	time git log drivers/scsi/scsi_lib.c | head

	real    0m0.033s
	user    0m0.024s
	sys     0m0.008s

do you see the difference? They happen to output _exactly_ the same ten 
lines, but one of them takes the better part of a second (and that's on 
pretty much the fastest machine you can find right now - on a laptop with 
a slow disk and without things in cache, it would take many many seconds).

The other one is instantaneous.

Now, I realize that 0.033s vs 0.688s doesn't sound like a big deal, even 
though that's a 20x difference, but that 20x difference is a _really_ big 
deal when the machine is slower, or when "old history" isn't in the disk 
cache any more.

For example, try doing the timings after flushing the disk caches to 
simulate cold-cache behavior. Do it with a slow disk. Or do it over NFS. 
Yes, even the "fast" case will actually be painfully slow (well, it is for 
me, people who are used to CVS probably think it's just "normal"). 

And yes, it will depend a lot on the file in question too. Obviously, if 
the first change is far back in history, it will be slow _regardless_, but 
I've at least personally found that in practice, you tend to look at logs 
of _recent_ things much much much more than you look at things that 
haven't changed lately.	

It will also depend a lot on whether you are packed or not. For example, 
if you are well packed, the pack-file IO locality is really really good, 
and the 20x slowdown is much less. I just tested with a laptop with a slow 
disk, and the --topo-order case was "only" 2.5x slower, almost certainly 
because the IO required to bring in the first part of the history ended up 
being a large portion of the total IO, and so the "whole history" case was 
not 20x slower, because there was not 20x more IO due to the good locality 
and the kernel doing readahead etc.

But 2.5x slower is really bad, wouldn't you agree? We're not talking about 
a few percent here, we're talking about more than twice as long. It's very 
noticeable, especially when the end result was --topo-order: 29.8s, no 
topo-order 12.1s

(Yeah, that wasn't a very realistic example, but on that same machine, 
once it's in the cache, it's 0.13s vs 1.6s: one is "instant", the other is 
very much a "wait for it" kind of thing.)

THAT is the kind of performance difference you see.

And trust me, it's a performance difference that you can really notice in 
real life. I'm not kidding you. Just try it:

	git log kernel/sched.c
vs
	git log --topo-order kernel/sched.c

and one is instant, the other one pauses before it starts showing 
something. One feels fast, the other feels slow.

At the same time, if you actually time the _whole_ log, it's all exactly 
the same speed:

	[torvalds@nehalem linux]$ time git log --topo-order kernel/sched.c > /dev/null 
	real	0m0.708s
	user	0m0.684s
	sys	0m0.020s

	[torvalds@nehalem linux]$ time git log kernel/sched.c > /dev/null 
	real	0m0.703s
	user	0m0.672s
	sys	0m0.032s

Notice? The cost of the topological sort itself is basically zero. But 
from an interactivity standpoint, it's _deadly_.

And please note that here "--topo-sort" is just an example of a random 
"global history post-processing" thing. It's not that I want you to use 
the topological sort per se, it's just an example of the whole issue with 
_any_ post-factum operation. The topological sort is not expensive as a 
sort. What is expensive is that it needs to get the whole history to work.

And also please notice that this is a huge scalability issue. "git log" 
should not become slower as a project gets more history. Sure, the full 
log will take longer to generate (because there's _more_ of it), but the 
top commits should always show up immediately.

Again, if you have a filter (where "topological sort" is just an example 
of such a filter) that requires the full history to work, it simply 
_fundamentally_ cannot scale well. If very fundamentally will slow down 
with bigger history.

> The problem is that your picture doesn't include my specific problem, I'm 
> very interested in the big picture, but I'd like to be in it.

Roman, I've been trying to explain this "interactive" thing for _days_ 
now. That's the big picture. The whole "you have to be able to generate 
history incrementally" thing.

First generating the whole global history, and then simplifying it, is 
simply not acceptable. It's too slow, and it doesn't scale.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 17:25                           ` Linus Torvalds
@ 2008-07-30  1:50                             ` Roman Zippel
  2008-07-30  2:05                               ` Linus Torvalds
  2008-07-30  4:26                             ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Roman Zippel @ 2008-07-30  1:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Tim Harper, git

Hi,

On Tue, 29 Jul 2008, Linus Torvalds wrote:

> Now, do these three things
> 
> 	gitk
> 	gitk file
> 	gitk --full-history file
> 
> and compare them. They all show _different_ histories.
> 
> Which one is "correct"? They all are. It just depends on what you want to 
> see.
> 
> The "gitk file" history is the simplest one BY FAR, because it has very 
> aggressively simplified history to the point where it tried to find the 
> _simplest_ history that explains the current contents of 'file'[*]

It's "aggressively simplified" by not even bothering to look for more.
"simplified" implies there is something more complex beforehand, but all 
it does is simple scan through the history as fast possible without 
bothering looking left or right.
"simplified" implies to me it's something intentional, but this is more of 
an accidental optimization which happens to work in most situations and in 
the special cases it just picks a random change and hopes for the best.

"git-log --full-history file" at least produces the full change history, 
but it has an performance impact and it doesn't produce a complete graph 
usable for graphical front ends.

> >From a practical standpoint, and from having used this a long time, I'd 
> argue that the simple history is the one that you want 99.9% of all time. 
> But not _always_. Sometimes, the things that got simplified away actually 
> matter. It's rare, but it happens.
> 
> For example, maybe you had a bug-fix that you _know_ you did, and it it 
> doesn't show up in the simplified history. That really pisses you off, and 
> it apparently really pisses Roman off that it can happen. But the fact is, 
> that still doesn't mean that the simple history is "wrong" or even 
> "incomplete".

I gave more general examples. Tracking upstream source can produce this 
problem frequently. Another example are stable/unstable branches where the 
stable branch is occasionally merged into the unstable branch can produce 
this problem.

> No, it's actually meaningful data in itself. If the bug-fix doesn't show 
> in the simplified history, then that simply means that the bug-fix was not 
> on a branch that could _possibly_ have mattered for the current contents. 
> 
> So once you are _aware_ of history simplification and are mentally able to 
> accept it, the fact that history got simplified is actually just another 
> tool.

This is your _subjective_ interpretion of this problem, because it's not a 
problem for you, nobody else can possibly have this problem (or they just 
crazy).
Even if I know about this limitation it still doesn't solve the problem, 
that _none_ of the graphical interfaces can show me a useful history graph 
of these situations.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  1:50                             ` Roman Zippel
@ 2008-07-30  2:05                               ` Linus Torvalds
  0 siblings, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-30  2:05 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Jeff King, Tim Harper, git

On Wed, 30 Jul 2008, Roman Zippel wrote:
> > 
> > The "gitk file" history is the simplest one BY FAR, because it has very 
> > aggressively simplified history to the point where it tried to find the 
> > _simplest_ history that explains the current contents of 'file'[*]
> 
> It's "aggressively simplified" by not even bothering to look for more.

Yes and no.

It's aggressively simplified because that's the right output with the 
minimal unnecessary irrelevant information. It explains how the file came 
to a particular state, with the simplest possible self-consistent history.

(Again, the caveat about "simplest possible" always beign a local 
minimization, not a global one).

The fact that it also obviously involved less work (so git can do it 
faster, and with fewer disk and memory accesses) is a huge bonus, of 
course.

Are you complaining about the fact that I'm smart, and I get the right 
result I want with less work and with a simpler algorithm?

What's your point?

> "simplified" implies there is something more complex beforehand, but all 
> it does is simple scan through the history as fast possible without 
> bothering looking left or right.

You're just being stupid.

It's not that it's not "bothering" looking left or right. It very much 
*does* bother to look left or right. But once it finds that one or the 
other explains the situation entirely, it then says "screw left, I already 
know that rigth gives me the information I want".

In other words, it's doing the _smart_ thing. 

I don't understand why you complain about intelligence.

It's *not* just looking at one single history. Look at

	gitk kernel/sched.c

and notice that the simplified history is not linear. It tries to make it 
AS LINEAR AS POSSIBLE, BUT NO MORE.

    "Make everything as simple as possible, but not simpler."
			- Albert Einstein

You seem to complain about the fact that it's doing that. That's stupid of 
you.

> "simplified" implies to me it's something intentional, but this is more of 
> an accidental optimization which happens to work in most situations and in 
> the special cases it just picks a random change and hopes for the best.

You're just crazy. There is nothing accidental there what-so-ever.

> "git-log --full-history file" at least produces the full change history, 
> but it has an performance impact and it doesn't produce a complete graph 
> usable for graphical front ends.

Umm. You have to add "--parents" if you want a full graph. Without that, 
you can never re-generate the graph anyway.

And when you do that, it _does_ give all the commits needed to complete 
the picture.

In other words, git (once again) is actually smarter than you, and does 
the right thing, and (once again) you complain about something that you 
just don't understand.

> I gave more general examples. Tracking upstream source can produce this 
> problem frequently. Another example are stable/unstable branches where the 
> stable branch is occasionally merged into the unstable branch can produce 
> this problem.

You call it a "problem", but you don't actually give any reason for 
calling it that. IT IS NOT A PROBLEM. It's very much by design, and it's 
because what you want.

Use --full-history if you want the full history. 

> This is your _subjective_ interpretion of this problem, because it's not a 
> problem for you, nobody else can possibly have this problem (or they just 
> crazy).

No, Roman. You're not crazy because you have some issue that I cannot 
understand. You're crazy because you make the same mistake over and over, 
and don't listen when people tell you what the mistake was.

	"Insanity is doing the same thing over and over again and 
	 expecting different results."
			- Various

Please. People have told you where you go wrong. Many times. So why do you 
keep repeating it?

Take the time to slow down, listen, and realize that you're on the wrong 
track, and that others really _have_ spent time and thought on this.

		Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 12:52                         ` Jeff King
  2008-07-29 17:25                           ` Linus Torvalds
@ 2008-07-30  2:48                           ` Roman Zippel
  2008-07-30  3:20                             ` Kevin Ballard
                                               ` (2 more replies)
  1 sibling, 3 replies; 58+ messages in thread
From: Roman Zippel @ 2008-07-30  2:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Tim Harper, git

Hi,

On Tue, 29 Jul 2008, Jeff King wrote:

> > > Perhaps I am just slow, but I haven't been able to figure out what that
> > > history is, or what the "correct" output should be. Can you try to state
> > > more clearly what it is you are looking for?
> > 
> > Most frequently this involves changes where the same change is merged 
> > twice. Another interesting example is kernel/printk.c where a change is 
> > added and later removed again before it's merged.
> 
> I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane.
> I was really hoping for you to make your case as something like:
> 
>   1. here is an ascii diagram of an actual history graph (or a recipe of
>      git commands for making one)
>   2. here is what git-log (or gitk) produces for this history by
>      default; and here is why it is not optimal (presumably some
>      information it fails to convey)
>   3. here is what git-log (or gitk) with --full-history produces; and
>      here is why it is not optimal (presumably because it is too messy)
>   4. here is what output I would like to see. Bonus points for "and here
>      is an algorithm that accomplishes it."

For printk.c look for commit 02630a12c7f72fa294981c8d86e38038781c25b7 and 
try to find it in the graphical outputs.
Here is a bit better example than Linus gave:

mkdir test
cd test
git init

echo 1 > file1
echo a > file2

git add file1 file2
git commit -m "initial commit"
git tag base

git branch test1 base
git checkout test1
echo 2 > file1
git commit -a -m "duplicate change 1"

git branch test2 base
git checkout test2
echo 2 > file1
git commit -a -m "duplicate change 2"

git branch test3 base
git checkout test3
echo b > file2
git commit -a -m "some other change"

git checkout base

git merge test1
git merge test2
git merge test3

Now compare the output of "git-log file1", "git-log --full-history file1" 
and "git-log --full-history --parents file1". You get either both merge 
commits or none, but only one of it is relevant to file1.

The problem is that in practice "git-log --full-history --parents" 
produces way too much information to be useful right away.

bye, Roman

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  2:48                           ` Roman Zippel
@ 2008-07-30  3:20                             ` Kevin Ballard
  2008-07-30  3:21                             ` Linus Torvalds
  2008-07-30  4:23                             ` Jeff King
  2 siblings, 0 replies; 58+ messages in thread
From: Kevin Ballard @ 2008-07-30  3:20 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Jeff King, Linus Torvalds, Tim Harper, git

On Jul 29, 2008, at 7:48 PM, Roman Zippel wrote:

> For printk.c look for commit  
> 02630a12c7f72fa294981c8d86e38038781c25b7 and
> try to find it in the graphical outputs.
> Here is a bit better example than Linus gave:
>
> [snip]
>
> Now compare the output of "git-log file1", "git-log --full-history  
> file1"
> and "git-log --full-history --parents file1". You get either both  
> merge
> commits or none, but only one of it is relevant to file1.
>
> The problem is that in practice "git-log --full-history --parents"
> produces way too much information to be useful right away.

Output looks correct to me. And of course --full-history --parents  
gives lots of output - that's what it's for. You seem to believe that  
the appropriate output is, what, to display the initial commit, both  
commits that modified file1, and the first merge, yes? Can you please  
clarify the logic that states that the first merge commit should be  
shown but the second should not?

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  2:48                           ` Roman Zippel
  2008-07-30  3:20                             ` Kevin Ballard
@ 2008-07-30  3:21                             ` Linus Torvalds
  2008-07-30  3:35                               ` Linus Torvalds
  2008-07-30  4:23                             ` Jeff King
  2 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-07-30  3:21 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Jeff King, Tim Harper, git

On Wed, 30 Jul 2008, Roman Zippel wrote:
> 
> For printk.c look for commit 02630a12c7f72fa294981c8d86e38038781c25b7 and 
> try to find it in the graphical outputs.

Umm.

Why would you? Yes, it's there, if you ask for --full-history. And no, I 
don't think --full-history is actually useful to humans - it's very much 
there as a "here's all the data" thing where you could have the tools 
post-process it, where often "post-processing" is actually just searching 
for it.

And no, it's not there if you don't use --full-history.

But now, instead of _complaining_ about this, I would suggest you think 
about why it's a _good_ thing, and why it's so useful?

In other words, you're arriving at all your complaints from the wrong 
angle entirely, and because you have convinced yourself that things have 
to work a certain way, and then you're upset when they don't.

But you should _unconvince_ yourself - and look at whether maybe all your 
initial preconceptions were perhaps totally wrong? Because they were.

The reason that commit 02630a12c7f72fa294981c8d86e38038781c25b7 doesn't 
show up in the normal log when looking at kernel/printk.c is that it 
really doesn't exist as a _relevant_ part of history for the current state 
of that file. It exists only as a a side-branch for the GFS2 quota code 
that first adds a line

	+EXPORT_SYMBOL_GPL(tty_write_message);

(in commit b346671fa196a), and then removes the line not long after (in 
that commit 02630a12c7f). And both of them go away (along with the whole 
side-branch), because they didn't end up mattering for the end result: 
they only ever existed in that side branch, and by the time it was merged 
back into the main branch, all changes had been undone.

In other words, that change - in a VERY REAL WAY - never actually mattered 
for the current state of kernel/printk.c. And the history simplification 
sees that, and avoids showing the whole pointless branch.

This is such an obviously _good_ thing that I really am surprised ay how 
you can continue to argue against it. Especially as the examples you give 
"for" your argument are so wonderful examples _against_ it.

And yes, you can actually force gitk to show the state of that commit and 
thus force it to acknowledge that that state was relevant (although you 
won't necessarily force it to acknowledge that the relevance ties together 
with the final end result). You do that by just telling it that you're not 
just interested in HEAD, but in that commit too.

So I would literally suggest that anybody interested in this subject 
really just do

	gitk kernel/printk.c &
	gitk HEAD 02630a12c7f72fa294981c8d86e38038781c25b7 kernel/printk.c &

in the kernel, and now compare the two side-by-side. Notice where they 
differ (hint: look for the commit a0f1ccfd8d37457a6d8a9e01acebeefcdfcc306e 
- "[PATCH] lockdep: do not recurse in printk" - which is in both, and look 
below it).

Now, which graph is the more relevant and understandable one from the 
standpoint of what the current state of kernel/printk.c is?

Honestly now, Roman.

Because if you were actually willing to see this as a _feature_ (which it 
very much is), you'd admit that it's a damn clever and useful one. But I 
suspect you have dug yourself so deep into a hole that you can't admit 
that even to yourself any more.

				Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  3:21                             ` Linus Torvalds
@ 2008-07-30  3:35                               ` Linus Torvalds
  0 siblings, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-30  3:35 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Jeff King, Tim Harper, git

On Tue, 29 Jul 2008, Linus Torvalds wrote:
> 
> In other words, that change - in a VERY REAL WAY - never actually mattered 
> for the current state of kernel/printk.c. And the history simplification 
> sees that, and avoids showing the whole pointless branch.

Btw, Roman, this is a really really important thing for you to realize. 

You need to realize that your "perfect" output really REALLY is totally 
inferior, if what you are actually interested in is "how did things get to 
be the way they are".

It's a _feature_. It's not a bug. And it's a really good one.

If side branches didn't matter for the contents of the file, those side 
branches simply don't matter, and showing them is just a distraction.

Yes, you can ask for the history that doesn't matter for the end result. 
And yes, I acknowledge freely that it would be good to then have a 
separate cleanup phase to make that thing more readable. In fact, in the 
very first reply to you I pointed you to a thread where I said exactly 
that, long before this thread even started.

But no, the current default isn't broken. No, it's not "lazy" either. No, 
it was not an "accident". And no, it's not "incorrect".

And until you can see that (along with all the reasons I've outlined why 
your "fixed" approach is a total piece of sh*t from a performance angle), 
you're just being stupid.

				Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  2:48                           ` Roman Zippel
  2008-07-30  3:20                             ` Kevin Ballard
  2008-07-30  3:21                             ` Linus Torvalds
@ 2008-07-30  4:23                             ` Jeff King
  2 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2008-07-30  4:23 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git

On Wed, Jul 30, 2008 at 04:48:54AM +0200, Roman Zippel wrote:

> Now compare the output of "git-log file1", "git-log --full-history file1" 
> and "git-log --full-history --parents file1". You get either both merge 
> commits or none, but only one of it is relevant to file1.

Ah, I see.

So if I understand you, you wanted to see something like:


A--B
 \  \
  C--D

where

 A = initial commit
 B = duplicate change 1
 C = duplicate change 2
 D = merge branch 'test2' into HEAD

where the simplification isn't as aggressive (you still see the
duplicate commits and the merge), but we can get rid of the later merge
between A and D because A is already an ancestor of D.

So do you have a proposed set of simplification rules that will produce
that output?

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-29 17:25                           ` Linus Torvalds
  2008-07-30  1:50                             ` Roman Zippel
@ 2008-07-30  4:26                             ` Jeff King
  2008-07-30  4:52                               ` Linus Torvalds
  1 sibling, 1 reply; 58+ messages in thread
From: Jeff King @ 2008-07-30  4:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Tim Harper, git

On Tue, Jul 29, 2008 at 10:25:35AM -0700, Linus Torvalds wrote:

> On Tue, 29 Jul 2008, Jeff King wrote:
> > 
> > I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane.
> 
> Jeff, it _is_ sane. When Roman says it's "incorrect", he is just wrong.

I agree with you, btw. It is definitely correct and useful; however, I
am curious if there is some "in between" level of simplification that
might produce an alternate graph that has interesting features. And that
is why I am trying to get Roman to lay out exactly what it is he wants.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  4:26                             ` Jeff King
@ 2008-07-30  4:52                               ` Linus Torvalds
  0 siblings, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-30  4:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Roman Zippel, Tim Harper, git

On Wed, 30 Jul 2008, Jeff King wrote:
> 
> I agree with you, btw. It is definitely correct and useful; however, I
> am curious if there is some "in between" level of simplification that
> might produce an alternate graph that has interesting features. And that
> is why I am trying to get Roman to lay out exactly what it is he wants.

Actually, I know what he wants, since I tried to describe it for the 
filter-branch discussion. It's really not that conceptually complex.

Basically, the stupid model is to just do this:

 - start with --full-history

 - for each merge, look at both parents. If one parent leads directly to 
   a commit that can be reached from the the other, just remove that 
   parent as being redundant. And if that removal leads to a merge now 
   becoming a non-merge, and it has no changes wrt its single remaining 
   parent, remove the commit entirely (rewriting any parenthood to make 
   the rest all stay together, of course)

 - repeat until you cannot do any more simplification (removing one commit 
   can actually cause its children to now become targets for this 
   simplification).

and I suspect that

 (a) the stupid model is probably at least O(n^3) if done stupidly and 
     O(n^2) with some modest amount of smarts (keeping a list of at least 
     potential targets of simplification and expanding it only when 
     actually simplifying), but that
 (b) you can concentrate on just the merges that the current optimizing 
     algorithm would have removed, so 'n' is not the total number of 
     commits, but at most the number of merges, and more likely actually 
     just the number of trivial merges in that file, and finally
 (c) there is likely some smart and efficient graph minimization algorithm 
     that is O(nlogn) or something.

so I don't think it's likely to be hugely more expensive than the 
topo-sort is. All the real expense is in the same thing the topo-sort 
expense, namely in generating the list up-front.

I bet googling for "minimal directed acyclic graph" will give pointers.

And despite the fact that I've argued against Roman's world-view, I 
actually _do_ think it would be nice to have that third mode, the same way 
that we have --topo-order. It wouldn't be good for the _default_ view, but 
then neither is --full-history, so that's not a big argument.

That said, I'd like to (again) repeat the caveat that it's probably best 
done in the tool that actally visualizes the mess - exactly for the same 
reason that I argued for the topological sort being done in gitk. It's 
very painful to have to wait for the first few commits to start appearing 
in the history window.

Admittedly most of my work is actually done on machines that are pretty 
fast, but every once in a while I travel with a laptop. And more 
importantly, not everybody gets new hardware from Intel for testing even 
before the CPU has been released. So others will still appreciate 
incremental history updates, even if my machine might be fast enough (and 
my kernel tree always in the caches) that I myself could live with a 
synchronous version a-la --topo-order.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  0:16                       ` Roman Zippel
  2008-07-30  0:25                         ` Martin Langhoff
  2008-07-30  0:32                         ` Linus Torvalds
@ 2008-07-30  8:36                         ` Jakub Narebski
  2 siblings, 0 replies; 58+ messages in thread
From: Jakub Narebski @ 2008-07-30  8:36 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Martin Langhoff, Linus Torvalds, Tim Harper, git

Roman Zippel <zippel@linux-m68k.org> writes:

> I don't quite understand what you're trying to say.
> To avoid further confusion it maybe helps to specify a few of the terms:
> 
> - full history graph: produced by "git-log --full-history --parents"
> - compact history graph: the full history graph without without any 
>   repeated merges, this is what my example script produces.
> - full simplified history: output of "git-log --full-history"
> - short simplified history: standard output of "git-log"
[...]

> Keep in mind that e.g. git-web is using the full simplified history, so 
> what I'm offering also has the potential to improve git-web performance...

The fact that gitweb is using --full-history for a 'history' view
is a historical reason, backwards compatibility with the view that
was shown before gitweb used "git rev-list [flags] -- <path>", see
commit cdd4037d

    gitweb: optimize per-file history generation
    
    The rev-list command that is recent enough can filter commits
    based on paths they touch, so use it instead of generating the
    full list and limiting it by passing it with diff-tree --stdin.
    
    [jc: The patch originally came from Luben Tuikov but the it was
     corrupt, but it was short enough to be applied by hand.  I
     added the --full-history to make the output compatible with the
     original while doing so.]
    
    Signed-off-by: Junio C Hamano <junkio@cox.net>

Removing '--parents' was put later, to remove unnecessary merges
from a view (there was long discussion on git mailing list about
--full-history with and without --parents), in 208b2dff

    gitweb: We do longer need the --parents flag in rev-list.
    
    We only want to know the direct parents of a given commit object,
    these parents are available in the --header output of rev-list.  If
    --parents is supplied with --full-history the output includes merge
    commits that aren't relevant.
    
    Signed-off-by: Robert Fitzsimons <robfitz@273k.net>
    Signed-off-by: Junio C Hamano <junkio@cox.net>

Besides gitweb currently does not generate graphical history view,
so '--parents' are unnecessary.

But if it was done from the scratch, gitweb should definitely
use simplified history, instead of what you call "full simplified
history", perhaps with an option to use '--full-history' (there
is infractructure in gitweb for adding extra options).


(Nitpick: it is 'gitweb', not 'git-web'.)
-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30  0:48                           ` Linus Torvalds
@ 2008-07-30 23:56                             ` Junio C Hamano
  2008-07-31  0:15                               ` Junio C Hamano
                                                 ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Junio C Hamano @ 2008-07-30 23:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But the reverse isn't true: the current primary formats cannot be 
> generated from your preferred format without losing something important 
> (performance).
>
> But I'll make you a deal: if you actually write the filter in C form, I 
> can pretty much guarantee that we can easily add it as a new flag. It 
> really should be pretty easy to integrate it into the revision parsing 
> machinery alongside --topo-order, since it's really the same kind of 
> operation.

I am not Roman, but so I do not know if I did what Roman wanted to, but
here is a quick hack.  "gitk --post-simplify -- kernel/printk.c" is
slightly more readable than --full-history with this patch.

-- >8 --
Subject: [PATCH] revision traversal: teach --post-simplify

The --full-history traversal keeps all merges and non-merge commits that
touch paths in the given pathspec.  This is useful to view both sides of a
merge in a topology like this:

        A---M---o
       /   /
   ---O---B

when A and B makes identical change to the given paths.  The revision
traversal without --full-history aims to come up with a simplest history
to explain the final state of the tree, and one of the side branches can
be pruned away.

The behaviour to keep all merges however is inconvenient if neither A nor
B touches the paths we are interested in.  --full-history reduces the
topology to:

   ---O---M---o

in such a case, without removing M.

This adds a post processing phase on top of --full-history traversal to
remove needless merges from the resulting history.

The idea is to compute, for each commit in the "full history" result set,
the commit that should replace it in the simplified history.  This
replacement commit is defined as follows:

 * In any case, we first figure out the replacement commits of parents of
   the commit we are looking at.  The commit we are looking at is
   rewritten as if its parents are replacement commits of its original
   parents.

 * If the commit is marked as TREESAME (i.e. it modifies the paths we are
   interested in), then the replacement commit is itself.  IOW, the commit
   is not dropped from the final result.

 * Otherwise, we examine the parents of the commit.

   - If they replace to the same commit, because the commit we are looking
     at itself does not touch the interesting paths, we replace the commit
     we are looking at with the replacement commit of its parents.

   - If some of the parents replace to one commit, and some other parents
     replace to another different commit, the commit we are looking at
     needs to stay as a merge in the final result.

The algorithm outlined above alone does not quite work; the reason is
because "all parents are replaced by the same commit" rule is too strict.
It needs to be relaxed to remove parents that are ancestor of some other
parents, and that is why post_simplify_one() calls a rather expensive
reduce_heads().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 revision.c |  125 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
 revision.h |    1 +
 2 files changed, 106 insertions(+), 20 deletions(-)

diff --git a/revision.c b/revision.c
index 3897fec..a843c42 100644
--- a/revision.c
+++ b/revision.c
@@ -1045,6 +1045,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--topo-order")) {
 		revs->lifo = 1;
 		revs->topo_order = 1;
+	} else if (!strcmp(arg, "--post-simplify")) {
+		revs->post_simplify = 1;
+		revs->topo_order = 1;
+		revs->simplify_history = 0;
 	} else if (!strcmp(arg, "--date-order")) {
 		revs->lifo = 0;
 		revs->topo_order = 1;
@@ -1378,6 +1382,105 @@ static void add_child(struct rev_info *revs, struct commit *parent, struct commi
 	l->next = add_decoration(&revs->children, &parent->object, l);
 }
 
+static int remove_duplicate_parents(struct commit *commit)
+{
+	struct commit_list **pp, *p;
+	int surviving_parents;
+
+	/* Examine existing parents while marking ones we have seen... */
+	pp = &commit->parents;
+	while ((p = *pp) != NULL) {
+		struct commit *parent = p->item;
+		if (parent->object.flags & TMP_MARK) {
+			*pp = p->next;
+			continue;
+		}
+		parent->object.flags |= TMP_MARK;
+		pp = &p->next;
+	}
+	/* count them while clearing the temporary mark */
+	surviving_parents = 0;
+	for (p = commit->parents; p; p = p->next) {
+		p->item->object.flags &= ~TMP_MARK;
+		surviving_parents++;
+	}
+	return surviving_parents;
+}
+
+static int post_simplify_one(struct commit *commit)
+{
+	struct commit_list *p;
+	int num_parents;
+
+	for (p = commit->parents; p; p = p->next)
+		if (!p->item->util)
+			return 0;
+
+	/* All of our parents know what they should be rewritten to */
+	for (p = commit->parents; p; p = p->next)
+		p->item = p->item->util;
+	num_parents = remove_duplicate_parents(commit);
+
+	if (1 < num_parents) {
+		struct commit_list *h = reduce_heads(commit->parents);
+		num_parents = commit_list_count(h);
+		free_commit_list(commit->parents);
+		commit->parents = h;
+	}
+
+	/*
+	 * We stand for ourselves if we are root, if we change the tree,
+	 * or if we are a merge and our parents simplify to different
+	 * commits.  Otherwise we can be replaced by the commit our
+	 * sole parent is replaced by.
+	 */
+	if (!num_parents ||
+	    !(commit->object.flags & TREESAME) ||
+	    (1 < num_parents))
+		commit->util = commit;
+	else
+		commit->util = commit->parents->item->util;
+
+	return 1;
+}
+
+static void post_simplify(struct rev_info *revs)
+{
+	struct commit_list *list;
+	struct commit_list *yet_to_do, **tail;
+
+	/* feed the list reversed */
+	yet_to_do = NULL;
+	for (list = revs->commits; list; list = list->next)
+		commit_list_insert(list->item, &yet_to_do);
+	while (yet_to_do) {
+		list = yet_to_do;
+		yet_to_do = NULL;
+		tail = &yet_to_do;
+		while (list) {
+			struct commit *commit = list->item;
+			struct commit_list *next = list->next;
+			free(list);
+			list = next;
+			if (!post_simplify_one(commit))
+				tail = &commit_list_insert(commit, tail)->next;
+		}
+	}
+
+	/* clean up the result, removing the simplified ones */
+	list = revs->commits;
+	revs->commits = NULL;
+	tail = &revs->commits;
+	while (list) {
+		struct commit *commit = list->item;
+		struct commit_list *next = list->next;
+		free(list);
+		list = next;
+		if (commit->util == commit)
+			tail = &commit_list_insert(commit, tail)->next;
+	}
+}
+
 static void set_children(struct rev_info *revs)
 {
 	struct commit_list *l;
@@ -1418,6 +1521,8 @@ int prepare_revision_walk(struct rev_info *revs)
 			return -1;
 	if (revs->topo_order)
 		sort_in_topological_order(&revs->commits, revs->lifo);
+	if (revs->post_simplify)
+		post_simplify(revs);
 	if (revs->children.name)
 		set_children(revs);
 	return 0;
@@ -1450,26 +1555,6 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp
 	}
 }
 
-static void remove_duplicate_parents(struct commit *commit)
-{
-	struct commit_list **pp, *p;
-
-	/* Examine existing parents while marking ones we have seen... */
-	pp = &commit->parents;
-	while ((p = *pp) != NULL) {
-		struct commit *parent = p->item;
-		if (parent->object.flags & TMP_MARK) {
-			*pp = p->next;
-			continue;
-		}
-		parent->object.flags |= TMP_MARK;
-		pp = &p->next;
-	}
-	/* ... and clear the temporary mark */
-	for (p = commit->parents; p; p = p->next)
-		p->item->object.flags &= ~TMP_MARK;
-}
-
 static int rewrite_parents(struct rev_info *revs, struct commit *commit)
 {
 	struct commit_list **pp = &commit->parents;
diff --git a/revision.h b/revision.h
index f64e8ce..953e69b 100644
--- a/revision.h
+++ b/revision.h
@@ -41,6 +41,7 @@ struct rev_info {
 			simplify_history:1,
 			lifo:1,
 			topo_order:1,
+			post_simplify:1,
 			tag_objects:1,
 			tree_objects:1,
 			blob_objects:1,
-- 
1.6.0.rc1.29.gc4aca

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30 23:56                             ` Junio C Hamano
@ 2008-07-31  0:15                               ` Junio C Hamano
  2008-07-31  0:30                               ` Linus Torvalds
  2008-07-31  8:17                               ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano
  2 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2008-07-31  0:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

Junio C Hamano <gitster@pobox.com> writes:

> The idea is to compute, for each commit in the "full history" result set,
> the commit that should replace it in the simplified history.  This
> replacement commit is defined as follows:
>
>  * In any case, we first figure out the replacement commits of parents of
>    the commit we are looking at.  The commit we are looking at is
>    rewritten as if its parents are replacement commits of its original
>    parents.
>
>  * If the commit is marked as TREESAME (i.e. it modifies the paths we are
>    interested in), then the replacement commit is itself.  IOW, the commit
>    is not dropped from the final result.

A typo here.  This comment should have said !TREESAME (the code is correct).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Bizarre missing changes (git bug?)
  2008-07-30 23:56                             ` Junio C Hamano
  2008-07-31  0:15                               ` Junio C Hamano
@ 2008-07-31  0:30                               ` Linus Torvalds
  2008-07-31  8:17                               ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano
  2 siblings, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-31  0:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Roman Zippel, Martin Langhoff, Tim Harper, Git Mailing List,
	Johannes Sixt

On Wed, 30 Jul 2008, Junio C Hamano wrote:
> 
> I am not Roman, but so I do not know if I did what Roman wanted to, but
> here is a quick hack.  "gitk --post-simplify -- kernel/printk.c" is
> slightly more readable than --full-history with this patch.

.. and if by "slightly", you mean "a lot", then yes.

Patch looks fine to me. I didn't look at the code logic very closely, but 
I suspect that it's actually hard to get the right answer with broken 
code, and the logic doesn't look broken. So Ack.

The filter-branch thing should probably be taught about this at least as 
an option. I think it was Johannes Sixt that worried about that one. Added 
to cc.

		Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2] revision traversal: show full history with merge simplification
  2008-07-30 23:56                             ` Junio C Hamano
  2008-07-31  0:15                               ` Junio C Hamano
  2008-07-31  0:30                               ` Linus Torvalds
@ 2008-07-31  8:17                               ` Junio C Hamano
  2008-07-31  8:18                                 ` Junio C Hamano
  2008-07-31 22:09                                 ` [PATCH v3-wip] " Junio C Hamano
  2 siblings, 2 replies; 58+ messages in thread
From: Junio C Hamano @ 2008-07-31  8:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

The --full-history traversal keeps all merges in addition to non-merge
commits that touch paths in the given pathspec.  This is useful to view
both sides of a merge in a topology like this:

        A---M---o
       /   /
   ---O---B

even when A and B makes identical change to the given paths.  The revision
traversal without --full-history aims to come up with the simplest history
to explain the final state of the tree, and one of the side branches can
be pruned away.

The behaviour to keep all merges however is inconvenient if neither A nor
B touches the paths we are interested in.  --full-history reduces the
topology to:

   ---O---M---o

in such a case, without removing M.

This adds a post processing phase on top of --full-history traversal to
remove needless merges from the resulting history.

The idea is to compute, for each commit in the "full history" result set,
the commit that should replace it in the simplified history.  The commit
to replace it in the final history is determined as follows:

 * In any case, we first figure out the replacement commits of parents of
   the commit we are looking at.  The commit we are looking at is
   rewritten as if the replacement commits of its original parents are its
   parents.  While doing so, we reduce the redundant parents from the
   rewritten parent list by not just removing the identical ones, but also
   removing a parent that is an ancestor of another parent.

 * After the above parent simplification, if the commit is a root commit,
   an UNINTERESTING commit, a merge commit, or modifies the paths we are
   interested in, then the replacement commit of the commit is itself.  In
   other words, such a commit is not dropped from the final result.

The first point above essentially means that the history is rewritten in
the bottom up direction.  We can rewrite the parent list of a commit only
after we know how all of its parents are rewritten.  This means that the
processing needs to happen on the full history (i.e. after limit_list()).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Changes from the "quick hack" are:

   - When the history is bounded at the bottom, the v1 patch did not
     terminate, because it wanted to know the replacement for
     UNINTERESTING parents of commits on revs->commit list, but these
     parents were never processed.  Oops.

   - The option implies rewrite_parents.  I was tempted to make it imply
     "--parents" (which would make it always emit parent information as
     well), but didn't.

   - Toposort is still implied but it is done at the end.

   - The code is more heavily commented.

   - I do not think "--post-simplify" is particulary a good name, but I
     couldn't come up with a good one.  To mark it clearly not ready for
     'master', I changed the name to a meaningless word for now ;-)

 * Timings (best of 5 runs)

   $ git rev-list --parents --full-history --topo-order HEAD -- kernel/printk.c
   3.75user 0.47system 0:04.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

   $ git rev-list --parents --oyoyo HEAD -- kernel/printk.c
   4.31user 0.06system 0:04.37elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

   $ git rev-list --parents --full-history HEAD -- kernel/printk.c | head -n 200
   0.16user 0.02system 0:00.18elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k

 revision.c |  171 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 revision.h |    1 +
 2 files changed, 152 insertions(+), 20 deletions(-)

diff --git a/revision.c b/revision.c
index 3897fec..3b59e02 100644
--- a/revision.c
+++ b/revision.c
@@ -1045,6 +1045,11 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--topo-order")) {
 		revs->lifo = 1;
 		revs->topo_order = 1;
+	} else if (!strcmp(arg, "--oyoyo")) {
+		revs->post_simplify = 1;
+		revs->rewrite_parents = 1;
+		revs->simplify_history = 0;
+		revs->limited = 1;
 	} else if (!strcmp(arg, "--date-order")) {
 		revs->lifo = 0;
 		revs->topo_order = 1;
@@ -1378,6 +1383,150 @@ static void add_child(struct rev_info *revs, struct commit *parent, struct commi
 	l->next = add_decoration(&revs->children, &parent->object, l);
 }
 
+static int remove_duplicate_parents(struct commit *commit)
+{
+	struct commit_list **pp, *p;
+	int surviving_parents;
+
+	/* Examine existing parents while marking ones we have seen... */
+	pp = &commit->parents;
+	while ((p = *pp) != NULL) {
+		struct commit *parent = p->item;
+		if (parent->object.flags & TMP_MARK) {
+			*pp = p->next;
+			continue;
+		}
+		parent->object.flags |= TMP_MARK;
+		pp = &p->next;
+	}
+	/* count them while clearing the temporary mark */
+	surviving_parents = 0;
+	for (p = commit->parents; p; p = p->next) {
+		p->item->object.flags &= ~TMP_MARK;
+		surviving_parents++;
+	}
+	return surviving_parents;
+}
+
+static struct commit_list **post_simplify_one(struct commit *commit, struct commit_list **tail)
+{
+	struct commit_list *p;
+	int cnt;
+
+	/*
+	 * We store which commit each one simplifies to in its util field.
+	 * Have we handled this one?
+	 */
+	if (commit->util)
+		return tail;
+
+	/*
+	 * An UNINTERESTING commit simplifies to itself, so does a
+	 * root commit.  We do not rewrite parents of such commit
+	 * anyway.
+	 */
+	if ((commit->object.flags & UNINTERESTING) || !commit->parents) {
+		commit->util = commit;
+		return tail;
+	}
+
+	/*
+	 * Do we know what commit all of our parents should be rewritten to?
+	 * Otherwise we are not ready to rewrite this one yet.
+	 */
+	for (cnt = 0, p = commit->parents; p; p = p->next) {
+		if (!p->item->util) {
+			tail = &commit_list_insert(p->item, tail)->next;
+			cnt++;
+		}
+	}
+	if (cnt)
+		return tail;
+
+	/*
+	 * Rewrite our list of parents.
+	 */
+	for (p = commit->parents; p; p = p->next)
+		p->item = p->item->util;
+	cnt = remove_duplicate_parents(commit);
+
+	/*
+	 * It is possible that we are a merge and one side branch
+	 * does not have any commit that touches the given paths;
+	 * in such a case, the immediate parents will be rewritten
+	 * to different commits.
+	 *
+	 *      o----X		X: the commit we are looking at;
+	 *     /    /		o: a commit that touches the paths;
+	 * ---o----'
+	 *
+	 * Further reduce the parents by removing redundant parents.
+	 */
+	if (1 < cnt) {
+		struct commit_list *h = reduce_heads(commit->parents);
+		cnt = commit_list_count(h);
+		free_commit_list(commit->parents);
+		commit->parents = h;
+	}
+
+	/*
+	 * A commit simplifies to itself if it is a root, if it is
+	 * UNINTERESTING, if it touches the given paths, or if it is a
+	 * merge and its parents simplifies to more than one commits
+	 * (the first two cases are already handled at the beginning of
+	 * this function).
+	 *
+	 * Otherwise, it simplifies to what its sole parent simplifies to.
+	 */
+	if (!cnt ||
+	    (commit->object.flags & UNINTERESTING) ||
+	    !(commit->object.flags & TREESAME) ||
+	    (1 < cnt))
+		commit->util = commit;
+	else
+		commit->util = commit->parents->item->util;
+	return tail;
+}
+
+static void post_simplify(struct rev_info *revs)
+{
+	struct commit_list *list;
+	struct commit_list *yet_to_do, **tail;
+
+	/* feed the list reversed */
+	yet_to_do = NULL;
+	for (list = revs->commits; list; list = list->next)
+		commit_list_insert(list->item, &yet_to_do);
+	while (yet_to_do) {
+		list = yet_to_do;
+		yet_to_do = NULL;
+		tail = &yet_to_do;
+		while (list) {
+			struct commit *commit = list->item;
+			struct commit_list *next = list->next;
+			free(list);
+			list = next;
+			tail = post_simplify_one(commit, tail);
+		}
+	}
+
+	/* clean up the result, removing the simplified ones */
+	list = revs->commits;
+	revs->commits = NULL;
+	tail = &revs->commits;
+	while (list) {
+		struct commit *commit = list->item;
+		struct commit_list *next = list->next;
+		free(list);
+		list = next;
+		if (commit->util == commit)
+			tail = &commit_list_insert(commit, tail)->next;
+	}
+
+	/* sort topologically at the end */
+	sort_in_topological_order(&revs->commits, revs->lifo);
+}
+
 static void set_children(struct rev_info *revs)
 {
 	struct commit_list *l;
@@ -1418,6 +1567,8 @@ int prepare_revision_walk(struct rev_info *revs)
 			return -1;
 	if (revs->topo_order)
 		sort_in_topological_order(&revs->commits, revs->lifo);
+	if (revs->post_simplify)
+		post_simplify(revs);
 	if (revs->children.name)
 		set_children(revs);
 	return 0;
@@ -1450,26 +1601,6 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp
 	}
 }
 
-static void remove_duplicate_parents(struct commit *commit)
-{
-	struct commit_list **pp, *p;
-
-	/* Examine existing parents while marking ones we have seen... */
-	pp = &commit->parents;
-	while ((p = *pp) != NULL) {
-		struct commit *parent = p->item;
-		if (parent->object.flags & TMP_MARK) {
-			*pp = p->next;
-			continue;
-		}
-		parent->object.flags |= TMP_MARK;
-		pp = &p->next;
-	}
-	/* ... and clear the temporary mark */
-	for (p = commit->parents; p; p = p->next)
-		p->item->object.flags &= ~TMP_MARK;
-}
-
 static int rewrite_parents(struct rev_info *revs, struct commit *commit)
 {
 	struct commit_list **pp = &commit->parents;
diff --git a/revision.h b/revision.h
index f64e8ce..953e69b 100644
--- a/revision.h
+++ b/revision.h
@@ -41,6 +41,7 @@ struct rev_info {
 			simplify_history:1,
 			lifo:1,
 			topo_order:1,
+			post_simplify:1,
 			tag_objects:1,
 			tree_objects:1,
 			blob_objects:1,
-- 
1.6.0.rc1.31.gf448e

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v2] revision traversal: show full history with merge simplification
  2008-07-31  8:17                               ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano
@ 2008-07-31  8:18                                 ` Junio C Hamano
  2008-07-31 22:30                                   ` Linus Torvalds
  2008-07-31 22:09                                 ` [PATCH v3-wip] " Junio C Hamano
  1 sibling, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2008-07-31  8:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

If you look at the output from the "kernel/printk.c" with this patch, you
would notice that there still are somewhat meaningless merges shown in the
history (e.g. scroll down to 185a257f2f73bcd89050ad02da5bedbc28fc43fa).

The mainline side keeps making steady changes to the path, but the side
branch that made tty_write_message available to others with b346671
([PATCH] Export tty_write_message() for GFS2 quota code, 2006-01-16) keeps
many "Merge from master" until it is merged back to the mainline, even
after the earlier change is reverted by 02630a1 ([GFS2] Remove dependance
on tty_write_message(), 2006-07-03).

I wonder if we can do something clever to reduce these pointless (from the
point of view of explaining kernel/printk.c's evolution, at least) merges
from the output.  This might be another example of the reason why it is a
good thing that you keep teaching people: "On your xyzzy topic, you are
doing xyzzy development, not xyzzy development plus random changes ---
don't merge my tree into yours!", and we could dismiss these extra merges
we see in the output as artifacts from a bad practice, but as long as we
are spending extra cycles, it would be better if we could reduce such
clutter.

I am still undecided about the option name.

The existing --full-history is "show history fully without simplifying the
merge at all".  This is "show history fully with merge simplification".
Perhaps --simplify-merges?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v3-wip] revision traversal: show full history with merge simplification
  2008-07-31  8:17                               ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano
  2008-07-31  8:18                                 ` Junio C Hamano
@ 2008-07-31 22:09                                 ` Junio C Hamano
  2008-07-31 22:26                                   ` Linus Torvalds
  1 sibling, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2008-07-31 22:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

This one makes it incremental.  It is not relative to v2 but on top of
'master'.

The idea and the logic to find what parents to rewrite is the same as the
previous one, but this one works incrementally as much as possible.  When
you have this topology, where commits with capital letters are the ones
that change the paths you are interested in:


        A---M---o---C---D---o
       /   /
   ---o---B

(1) we can tell that the rightmost one 'o' is not we want to show, without
    digging any further than D;

(2) we can show D after inspecting C without digging any further.  C is
    the sole parent of D, and C itself is an interesting one, so D's
    parent will stay to be C and not its ancestor.

(3) before showing C, we need to know what the rewritten parent of it
    would be; we need to dig down to M and notice that it has two parents
    that simplify to a different commit (both A and B touch the path we
    are interested in), so M simplifies to itself and it becomes the
    parent of C.  IOW, we need to dig no further than A and B in order to
    show C.

$ time sh -c 'git log --pretty=oneline --abbrev-commit \
	--simplify-merges --parents \
	-- kernel/printk.c | head -n 1'
5dfb66b... 1d9b9f6... c9272c4... Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd

real    0m0.344s
user    0m0.324s
sys     0m0.020s

The same query with 's/| head -n 1/>/dev/null' is more expensive.  In fact
it is much more expensive than the non-incremental one (v2), and about
three times more expensive than non-limiting --full-history for explaining
the history of kernel/printk.c.  There must be opportunity to further
optimize this, but I'd stop here for now, as you keep saying this is hard,
and if I continue thinking about this any longer my head would explode ;-)

---
 revision.c |  106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 revision.h |    1 +
 2 files changed, 103 insertions(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index 3897fec..9554a70 100644
--- a/revision.c
+++ b/revision.c
@@ -1045,6 +1045,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--topo-order")) {
 		revs->lifo = 1;
 		revs->topo_order = 1;
+	} else if (!strcmp(arg, "--simplify-merges")) {
+		revs->simplify_merges = 1;
+		revs->rewrite_parents = 1;
+		revs->simplify_history = 0;
 	} else if (!strcmp(arg, "--date-order")) {
 		revs->lifo = 0;
 		revs->topo_order = 1;
@@ -1450,9 +1454,10 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp
 	}
 }
 
-static void remove_duplicate_parents(struct commit *commit)
+static int remove_duplicate_parents(struct commit *commit)
 {
 	struct commit_list **pp, *p;
+	int surviving_parents;
 
 	/* Examine existing parents while marking ones we have seen... */
 	pp = &commit->parents;
@@ -1465,9 +1470,13 @@ static void remove_duplicate_parents(struct commit *commit)
 		parent->object.flags |= TMP_MARK;
 		pp = &p->next;
 	}
-	/* ... and clear the temporary mark */
-	for (p = commit->parents; p; p = p->next)
+	/* count them while clearing the temporary mark */
+	surviving_parents = 0;
+	for (p = commit->parents; p; p = p->next) {
 		p->item->object.flags &= ~TMP_MARK;
+		surviving_parents++;
+	}
+	return surviving_parents;
 }
 
 static int rewrite_parents(struct rev_info *revs, struct commit *commit)
@@ -1536,6 +1545,89 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
 	return commit_show;
 }
 
+static void simplify_merges(struct rev_info *revs, struct commit *commit)
+{
+	struct commit_list *work = NULL;
+
+	commit_list_insert(commit, &work);
+	while (!commit->util && work) {
+		struct commit *c;
+		struct commit_list *p;
+		int cnt;
+
+		c = pop_commit(&work);
+		if (c->util)
+			continue;
+		if ((c->object.flags & UNINTERESTING) || !c->parents) {
+			c->util = c;
+			continue;
+		}
+
+		/*
+		 * Do we know what commit all of the parents of this
+		 * should be rewritten to?  Otherwise we are not ready
+		 * to rewrite this one yet.
+		 */
+		for (cnt = 0, p = c->parents; p; p = p->next) {
+			if (!p->item->util) {
+				if (!cnt)
+					commit_list_insert(c, &work);
+				commit_list_insert(p->item, &work);
+				cnt++;
+			}
+		}
+		if (cnt)
+			continue;
+
+		/*
+		 * Rewrite the list of parents.
+		 */
+		for (p = c->parents; p; p = p->next)
+			p->item = p->item->util;
+		cnt = remove_duplicate_parents(c);
+
+		/*
+		 * It is possible that this is a merge and one side
+		 * branch does not have any commit that touches the
+		 * given paths; in such a case, the immediate parents
+		 * will be rewritten to different commits if we do not
+		 * reduce such a false merge of fast-forward parents.
+		 *
+		 *      o----X		X: the commit we are looking at;
+		 *     /    /		o: a commit that touches the paths;
+		 * ---o----'
+		 *
+		 * Further reduce the parents by removing redundant
+		 * parents.
+		 */
+		if (1 < cnt) {
+			struct commit_list *h = reduce_heads(c->parents);
+			cnt = commit_list_count(h);
+			free_commit_list(c->parents);
+			c->parents = h;
+		}
+
+		/*
+		 * A commit simplifies to itself if it is a root, if
+		 * it is UNINTERESTING, if it touches the given paths,
+		 * or if it is a merge and its parents simplifies to
+		 * more than one commits (the first two cases are
+		 * already handled at the beginning of this function).
+		 *
+		 * Otherwise, it simplifies to what its sole parent
+		 * simplifies to.
+		 */
+		if (!cnt ||
+		    (c->object.flags & UNINTERESTING) ||
+		    !(c->object.flags & TREESAME) ||
+		    (1 < cnt))
+			c->util = c;
+		else
+			c->util = c->parents->item->util;
+	}
+	free_commit_list(work);
+}
+
 static struct commit *get_revision_1(struct rev_info *revs)
 {
 	if (!revs->commits)
@@ -1570,8 +1662,14 @@ static struct commit *get_revision_1(struct rev_info *revs)
 		case commit_error:
 			return NULL;
 		default:
-			return commit;
+			break;
+		}
+		if (revs->simplify_merges && !commit->util) {
+			simplify_merges(revs, commit);
+			if (commit->util != commit)
+				continue;
 		}
+		return commit;
 	} while (revs->commits);
 	return NULL;
 }
diff --git a/revision.h b/revision.h
index f64e8ce..dfa06b5 100644
--- a/revision.h
+++ b/revision.h
@@ -41,6 +41,7 @@ struct rev_info {
 			simplify_history:1,
 			lifo:1,
 			topo_order:1,
+			simplify_merges:1,
 			tag_objects:1,
 			tree_objects:1,
 			blob_objects:1,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification
  2008-07-31 22:09                                 ` [PATCH v3-wip] " Junio C Hamano
@ 2008-07-31 22:26                                   ` Linus Torvalds
  2008-07-31 22:36                                     ` Junio C Hamano
  2008-08-01  3:00                                     ` Junio C Hamano
  0 siblings, 2 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-31 22:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

On Thu, 31 Jul 2008, Junio C Hamano wrote:
> 
> The same query with 's/| head -n 1/>/dev/null' is more expensive.  In fact
> it is much more expensive than the non-incremental one (v2), and about
> three times more expensive than non-limiting --full-history for explaining
> the history of kernel/printk.c.

Hmm? Why is that, exactly? Does it walk over the same commit over and over 
and over again or something?

Can you combine --simplify-merges and --topo-order to get a fast version 
again (since --topo-order will force a non-incrmental walk)?

I have this suspicion (gut feel only, not anything else to back it up) 
that for any complex global history, you'll always end up having a lot of 
merges "live" and have a hard time getting a lot of early output. 

That may be why you get a fairly big delay before even the first commit:

> $ time sh -c 'git log --pretty=oneline --abbrev-commit \
>        --simplify-merges --parents \
>        -- kernel/printk.c | head -n 1'
> 5dfb66b... 1d9b9f6... c9272c4... Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd
>
> real    0m0.344s
> user    0m0.324s
> sys     0m0.020s

>From your previous email:

   $ git rev-list --parents --full-history --topo-order HEAD -- kernel/printk.c
   3.75user 0.47system 0:04.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

so that's less than 10% of the whole time, but it's still a _lot_ slower 
than the

   $ git rev-list --parents --full-history HEAD -- kernel/printk.c | head -n 200
   0.16user 0.02system 0:00.18elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k

and that was the first 200 commits, not just the first one.  I bet you got 
the first one in about a tenth of that time - so I'm guessing 0.016s (also 
based on my own testing - it's below 0.01s here, but I'm willing to bet my 
machine is faster than yours is).

So getting the first one with "--simplify-merges" was really a _lot_ 
slower.

That said, I'm a huge beliver in the incremental approach - it just looks 
like this is potentially "just barely incremental" in practice.

Of course, with a more linear history than the kernel, your approach 
probably works better.

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2] revision traversal: show full history with merge simplification
  2008-07-31  8:18                                 ` Junio C Hamano
@ 2008-07-31 22:30                                   ` Linus Torvalds
  0 siblings, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2008-07-31 22:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

On Thu, 31 Jul 2008, Junio C Hamano wrote:
>
> If you look at the output from the "kernel/printk.c" with this patch, you
> would notice that there still are somewhat meaningless merges shown in the
> history (e.g. scroll down to 185a257f2f73bcd89050ad02da5bedbc28fc43fa).

They're not really meaningless.

Yes, they are pointless for the end result, but once you start showing 
that whole pointless branch they very much are needed for a complete view 
of the "shape" of history. The merges are real points on that branch where 
printk changed because it got updates from mainlines.

So either you should have the full simplification (which only shows stuff 
that is really meaningful for the end result), or you need for those 
"pointless" merges to remain (because you show the changes that happened 
on side branches).

I obviously believe that the full simplification is what you most often 
want, but the --post-simplify thing does make sense.

(And yes, I agree that the name should be something else, and that 
--simplify-merges makes more sense. The "post-simplify" thing is an 
implementation issue, and doesn't describe what the effect is. And with 
your incremental one, even that isn't true).

			Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification
  2008-07-31 22:26                                   ` Linus Torvalds
@ 2008-07-31 22:36                                     ` Junio C Hamano
  2008-08-01  3:00                                     ` Junio C Hamano
  1 sibling, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2008-07-31 22:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 31 Jul 2008, Junio C Hamano wrote:
>> 
>> The same query with 's/| head -n 1/>/dev/null' is more expensive.  In fact
>> it is much more expensive than the non-incremental one (v2), and about
>> three times more expensive than non-limiting --full-history for explaining
>> the history of kernel/printk.c.
>
> Hmm? Why is that, exactly? Does it walk over the same commit over and over 
> and over again or something?
>
> Can you combine --simplify-merges and --topo-order to get a fast version 
> again (since --topo-order will force a non-incrmental walk)?

Heh, nice try to make my head explode ;-)  Not today, no, really, no...

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification
  2008-07-31 22:26                                   ` Linus Torvalds
  2008-07-31 22:36                                     ` Junio C Hamano
@ 2008-08-01  3:00                                     ` Junio C Hamano
  2008-08-01  3:48                                       ` Linus Torvalds
  1 sibling, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2008-08-01  3:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 31 Jul 2008, Junio C Hamano wrote:
>> 
>> The same query with 's/| head -n 1/>/dev/null' is more expensive.  In fact
>> it is much more expensive than the non-incremental one (v2), and about
>> three times more expensive than non-limiting --full-history for explaining
>> the history of kernel/printk.c.
>
> Hmm? Why is that, exactly? Does it walk over the same commit over and over 
> and over again or something?

It was even worse than that.

The output from v3 is incorrect, as the place the new call is hooked into
knows only that the commit in question is not UNINTERESTING, but hasn't
inspected its parents, but the simplification logic needs to dig into the
parent chain deep enough, which it does not do correctly using the proper
simplification logic (i.e. add_parents_to_list()).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification
  2008-08-01  3:00                                     ` Junio C Hamano
@ 2008-08-01  3:48                                       ` Linus Torvalds
  2008-08-01  7:50                                         ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2008-08-01  3:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

On Thu, 31 Jul 2008, Junio C Hamano wrote:
>
> It was even worse than that.
> 
> The output from v3 is incorrect

Ok. I'm really not surprised. Incrementally is really hard. I'm reminded 
of all the problems we had with just the "trivial" issue of just knowing 
when to consider something uninteresting or not, that ended up depending 
on commit timestamps etc, and had problems with people having their clocks 
set incorrectly.

Doing the ops once you have the full DAG is usually _trivial_ by 
comparison. 

		Linus

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification
  2008-08-01  3:48                                       ` Linus Torvalds
@ 2008-08-01  7:50                                         ` Junio C Hamano
  0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2008-08-01  7:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 31 Jul 2008, Junio C Hamano wrote:
>>
>> It was even worse than that.
>> 
>> The output from v3 is incorrect
>
> Ok. I'm really not surprised. Incrementally is really hard. I'm reminded 
> of all the problems we had with just the "trivial" issue of just knowing 
> when to consider something uninteresting or not, that ended up depending 
> on commit timestamps etc, and had problems with people having their clocks 
> set incorrectly.
>
> Doing the ops once you have the full DAG is usually _trivial_ by 
> comparison. 

Surely.  I wasn't productive tonight anyway, and I'll give up for now and
keep the post-processing version in 'pu', perhaps queued in 'next' during
the 1.6.0-rc period.  Perhaps somebody cleverer than me will feel itchy
enough to make an incremental version someday ;-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2008-08-01  7:51 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-21 20:26 Bizarre missing changes (git bug?) Tim Harper
2008-07-21 20:37 ` Linus Torvalds
2008-07-21 22:53   ` Tim Harper
2008-07-21 22:55     ` Tim Harper
     [not found]   ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com>
2008-07-21 22:57     ` Linus Torvalds
2008-07-26  3:12   ` Roman Zippel
2008-07-26 19:58     ` Linus Torvalds
2008-07-27 17:50       ` Roman Zippel
2008-07-27 18:47         ` Linus Torvalds
2008-07-27 23:14           ` Roman Zippel
2008-07-27 23:18             ` Linus Torvalds
2008-07-28  0:00               ` Roman Zippel
2008-07-28  5:00                 ` Linus Torvalds
2008-07-28  5:30                   ` Linus Torvalds
2008-07-29  2:59                   ` Roman Zippel
2008-07-29  3:15                     ` Martin Langhoff
2008-07-30  0:16                       ` Roman Zippel
2008-07-30  0:25                         ` Martin Langhoff
2008-07-30  0:32                         ` Linus Torvalds
2008-07-30  0:48                           ` Linus Torvalds
2008-07-30 23:56                             ` Junio C Hamano
2008-07-31  0:15                               ` Junio C Hamano
2008-07-31  0:30                               ` Linus Torvalds
2008-07-31  8:17                               ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano
2008-07-31  8:18                                 ` Junio C Hamano
2008-07-31 22:30                                   ` Linus Torvalds
2008-07-31 22:09                                 ` [PATCH v3-wip] " Junio C Hamano
2008-07-31 22:26                                   ` Linus Torvalds
2008-07-31 22:36                                     ` Junio C Hamano
2008-08-01  3:00                                     ` Junio C Hamano
2008-08-01  3:48                                       ` Linus Torvalds
2008-08-01  7:50                                         ` Junio C Hamano
2008-07-30  8:36                         ` Bizarre missing changes (git bug?) Jakub Narebski
2008-07-29  3:29                     ` Linus Torvalds
2008-07-29  3:33                       ` Linus Torvalds
2008-07-29 11:39                       ` Roman Zippel
2008-07-29 12:00                         ` David Kastrup
2008-07-29 15:50                         ` Linus Torvalds
2008-07-30  1:14                           ` Roman Zippel
2008-07-30  1:32                             ` Kevin Ballard
2008-07-30  1:49                             ` Linus Torvalds
2008-07-29  5:31                     ` Jeff King
2008-07-29 12:32                       ` Roman Zippel
2008-07-29 12:48                         ` Olivier Galibert
2008-07-29 12:52                         ` Jeff King
2008-07-29 17:25                           ` Linus Torvalds
2008-07-30  1:50                             ` Roman Zippel
2008-07-30  2:05                               ` Linus Torvalds
2008-07-30  4:26                             ` Jeff King
2008-07-30  4:52                               ` Linus Torvalds
2008-07-30  2:48                           ` Roman Zippel
2008-07-30  3:20                             ` Kevin Ballard
2008-07-30  3:21                             ` Linus Torvalds
2008-07-30  3:35                               ` Linus Torvalds
2008-07-30  4:23                             ` Jeff King
2008-07-27 23:25             ` Martin Langhoff
2008-07-28  1:29               ` Roman Zippel
2008-07-21 20:42 ` Alex Riesen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).