Git development
* Re: Bizarre missing changes (git bug?)
From: Linus Torvalds @ 2008-07-30  4:52 UTC
  To: Jeff King; +Cc: Roman Zippel, Tim Harper, git
In-Reply-To: <20080730042609.GB3350@sigill.intra.peff.net>



On Wed, 30 Jul 2008, Jeff King wrote:
> 
> I agree with you, btw. It is definitely correct and useful; however, I
> am curious if there is some "in between" level of simplification that
> might produce an alternate graph that has interesting features. And that
> is why I am trying to get Roman to lay out exactly what it is he wants.

Actually, I know what he wants, since I tried to describe it for the 
filter-branch discussion. It's really not that conceptually complex.

Basically, the stupid model is to just do this:

 - start with --full-history

 - for each merge, look at both parents. If one parent leads directly to 
   a commit that can be reached from the other, just remove that 
   parent as being redundant. And if that removal leads to a merge now 
   becoming a non-merge, and it has no changes wrt its single remaining 
   parent, remove the commit entirely (rewriting any parenthood to make 
   the rest all stay together, of course)

 - repeat until you cannot do any more simplification (removing one commit 
   can actually cause its children to now become targets for this 
   simplification).

and I suspect that

 (a) the stupid model is probably at least O(n^3) if done stupidly and 
     O(n^2) with some modest amount of smarts (keeping a list of at least 
     potential targets of simplification and expanding it only when 
     actually simplifying), but that
 (b) you can concentrate on just the merges that the current optimizing 
     algorithm would have removed, so 'n' is not the total number of 
     commits, but at most the number of merges, and more likely actually 
     just the number of trivial merges in that file, and finally
 (c) there is likely some smart and efficient graph minimization algorithm 
     that is O(nlogn) or something.

so I don't think it's likely to be hugely more expensive than the 
topo-sort is. All the real expense is the same as the topo-sort 
expense, namely generating the list up-front.
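
The "stupid model" described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not git code: the history is a plain dict mapping each commit to its parent list (roughly what --full-history --parents gives after parent rewriting), `touches` is a hypothetical set of commits that change the file, and reachability is recomputed naively on every pass, which is the "done stupidly" cost from (a).

```python
def ancestors(parents, start):
    """Return every commit reachable from `start` via parent links."""
    seen, stack = set(), list(parents[start])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def simplify(parents, touches):
    """Repeat the merge simplification until nothing changes.

    parents: dict commit -> list of parents (hypothetical input format)
    touches: set of commits that change the file relative to their parent
    """
    parents = {c: list(ps) for c, ps in parents.items()}
    changed = True
    while changed:
        changed = False
        for c in list(parents):
            if c not in parents or len(parents[c]) < 2:
                continue
            ps = parents[c]
            # A parent is redundant if another parent already reaches it.
            keep = [p for p in ps
                    if not any(p in ancestors(parents, q) for q in ps if q != p)]
            if len(keep) == len(ps):
                continue
            parents[c] = keep
            changed = True
            if len(keep) == 1 and c not in touches:
                # The merge became a no-op non-merge: drop it and re-point
                # its children at the single remaining parent.
                del parents[c]
                for child in parents:
                    rewritten = [keep[0] if p == c else p for p in parents[child]]
                    parents[child] = list(dict.fromkeys(rewritten))
    return parents
```

Given a history where a later merge has the initial commit as a redundant second parent, this sketch drops that merge and keeps the earlier merge of the two duplicate changes.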

I bet googling for "minimal directed acyclic graph" will give pointers.

And despite the fact that I've argued against Roman's world-view, I 
actually _do_ think it would be nice to have that third mode, the same way 
that we have --topo-order. It wouldn't be good for the _default_ view, but 
then neither is --full-history, so that's not a big argument.

That said, I'd like to (again) repeat the caveat that it's probably best 
done in the tool that actually visualizes the mess - exactly for the same 
reason that I argued for the topological sort being done in gitk. It's 
very painful to have to wait for the first few commits to start appearing 
in the history window.

Admittedly most of my work is actually done on machines that are pretty 
fast, but every once in a while I travel with a laptop. And more 
importantly, not everybody gets new hardware from Intel for testing even 
before the CPU has been released. So others will still appreciate 
incremental history updates, even if my machine might be fast enough (and 
my kernel tree always in the caches) that I myself could live with a 
synchronous version a-la --topo-order.

			Linus


* Re: q: git-fetch a tad slow?
From: Shawn O. Pearce @ 2008-07-30  4:48 UTC
  To: Ingo Molnar; +Cc: git
In-Reply-To: <20080729090802.GA11373@elte.hu>

Ingo Molnar <mingo@elte.hu> wrote:
> * Shawn O. Pearce <spearce@spearce.org> wrote:
> > Ingo Molnar <mingo@elte.hu> wrote:
> > > 
> > > Setup/background: distributed kernel testing cluster, [...]
> > > 
> > > Problem: i noticed that git-fetch is a tad slow:
> > > 
> > >   titan:~/tip> time git-fetch
> > >   real    0m2.372s
>
> note that titan is a very beefy box, almost 3 GHz Core2Duo:

That isn't going to matter if you have a quadratic algorithm and a
large dataset.  Especially when the inner loops are doing multiple
system calls per item in a long list of items.  :-|   Linux is fast,
but it isn't magic pixie dust.  It cannot fix broken applications.
 
> [...] So if we have a quadratic overhead on number of 
> branches, that's going to be quite a PITA.

Right.

> > I wonder if git-pack-refs + fetching only a single branch will get you 
> > closer to the tip-fetch time.
> 
> should i pack on both repos? I dont explicitly pack anything, but on the 
> server it goes into regular gc runs. (which will pack most stuff, 
> right?)

git-gc automatically runs `git pack-refs --all --prune` like I
recommended, unless you disabled it with config gc.packrefs = false.
So it's probably already packed.

What does `find .git/refs -type f | wc -l` give for the repository
on the central server?  If it's more than a handful (~20) I would
suggest running git-gc before testing again.

But I'm really suspecting that this is just our quadratic matching
algorithm running up against a large number of branches, causing
it to suck.

jgit at least uses an O(N) algorithm here, but since it is written
in Java it's of course slow compared to C Git.  Takes a while to
get that JVM running.
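
To make the complexity point concrete, here is a small sketch contrasting a nested-loop match with a hash-based one. It is not git's or jgit's actual code; the function names and the (name, sha) pair format are assumptions for illustration, and ref names are assumed to be unique.

```python
def match_quadratic(remote_refs, local_refs):
    """Match refs by name with a full scan per remote ref: O(N*M)."""
    matches = []
    for name, remote_sha in remote_refs:
        for local_name, local_sha in local_refs:
            if local_name == name:
                matches.append((name, local_sha, remote_sha))
                break
    return matches


def match_linear(remote_refs, local_refs):
    """Match refs by name through a dict index: O(N + M)."""
    index = dict(local_refs)  # name -> sha, built once
    return [(name, index[name], remote_sha)
            for name, remote_sha in remote_refs
            if name in index]
```

Both functions return the same matches; only the amount of work per ref differs, which is what starts to hurt with a large number of branches.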

I'll try to find some time to reproduce the issue and look at the
bottleneck here.  I'm two days into a new job so my git time has
been really quite short this week.  :-|

-- 
Shawn.


* Re: [RFC/PATCH v3] merge-base: teach "git merge-base" to accept more than 2 arguments
From: Christian Couder @ 2008-07-30  4:52 UTC
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Miklos Vajna, Jakub Narebski
In-Reply-To: <alpine.DEB.1.00.0807281328520.2725@eeepc-johanness>

Hi,

On Monday 28 July 2008, Johannes Schindelin wrote:
> Hi,
>
> On Mon, 28 Jul 2008, Christian Couder wrote:
> > +	rev = xmalloc((argc - 1) * sizeof(*rev));
> > +
> > +	do {
> > +		struct commit *r = get_commit_reference(argv[1]);
> > +		if (!r)
> > +			return 1;
> > +		rev[rev_nr++] = r;
> > +		argc--; argv++;
> > +	} while (argc > 1);
> > +
> > +	return show_merge_base(rev, rev_nr, show_all);
>
> 	rev = xmalloc((argc - 1) * sizeof(*rev));
>
> 	for (rev_nr = 0; rev_nr + 1 < argc; rev_nr++) {
> 		rev[rev_nr] = get_commit_reference(argv[rev_nr + 1]);
> 		if (!rev[rev_nr])
> 			return !!error("Does not refer to a commit: '%s'",
> 				argv[rev_nr + 1]);
> 	}
>
> 	return show_merge_base(rev, rev_nr, show_all);
>
> I do not know about you, but I think this is not only shorter (in spite
> of adding a helpful error message), but also simpler to understand (not
> using convoluted do { } while logic), and therefore superior.

In my last version the loop is reduced to:

+	do {
+		rev[rev_nr++] = get_commit_reference(argv[1]);
+		argc--; argv++;
+	} while (argc > 1);

so it's very simple.

And the stop condition is simpler in my version.

> Your performance argument is weak IMHO, as this is not a big performance
> hit, and command line parameter parsing is definitely not performance
> critical.

It feels a bit sloppy though.

Regards,
Christian.


* Re: git-svn does not seems to work with crlf convertion enabled.
From: Alexander Litvinov @ 2008-07-30  4:37 UTC
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <alpine.DEB.1.00.0807231117290.2830@eeepc-johanness>

> This is a known issue, but since nobody with that itch seems to care
> enough to fix it, I doubt it will ever be fixed.

Hello again.

I have investigated this problem. Short result: git-svn will not work with 
ANY file conversion enabled right now.

In my case the problem is in the SVN::Git::Fetcher::apply_textdelta() 
function or, to be more precise, in its call to SVN::TxDelta::apply(). We 
fetch the previous version of the file from git and then apply svn's delta 
to it. Because we have modified the source file, SVN fails to apply its 
delta. If I amend the last commit and put the original version of the file 
back, everything works.
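
The failure mode can be illustrated with a toy delta format. This is a hedged sketch only: the `apply_delta` function, the op tuples and the checksum field are made up for illustration and are not the real svndiff format or the SVN::TxDelta API. The point is just that a delta is only valid against the exact bytes it was computed from.

```python
import hashlib


def apply_delta(base, delta):
    """Apply a toy delta; it is only valid against the exact base bytes."""
    if hashlib.md5(base).hexdigest() != delta["base_md5"]:
        raise ValueError("base checksum mismatch")
    out = bytearray()
    for op in delta["ops"]:
        if op[0] == "copy":            # ("copy", offset, length) reads from base
            _, offset, length = op
            out += base[offset:offset + length]
        else:                          # ("insert", data) adds new bytes
            out += op[1]
    return bytes(out)


svn_base = b"one\r\ntwo\r\n"           # what the server computed the delta against
delta = {
    "base_md5": hashlib.md5(svn_base).hexdigest(),
    "ops": [("copy", 0, 5), ("insert", b"TWO\r\n")],
}
git_base = svn_base.replace(b"\r\n", b"\n")   # what git stored after conversion
```

Applying `delta` to `svn_base` succeeds, while applying it to `git_base` raises an error, because the converted content no longer matches the base the delta was made for.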

So it seems to me there are two solutions: 
1. Store the original file somehow and use it to construct the new file version;
2. When this error occurs, fetch the full blob with the new (or old) version of 
the file.

I did not find a way to fetch the full file content, nor do I feel ready to 
rewrite git-svn to store the original file somewhere.

Can anybody help or comment on this?


* Re: Bizarre missing changes (git bug?)
From: Jeff King @ 2008-07-30  4:26 UTC
  To: Linus Torvalds; +Cc: Roman Zippel, Tim Harper, git
In-Reply-To: <alpine.LFD.1.10.0807291006070.3334@nehalem.linux-foundation.org>

On Tue, Jul 29, 2008 at 10:25:35AM -0700, Linus Torvalds wrote:

> On Tue, 29 Jul 2008, Jeff King wrote:
> > 
> > I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane.
> 
> Jeff, it _is_ sane. When Roman says it's "incorrect", he is just wrong.

I agree with you, btw. It is definitely correct and useful; however, I
am curious if there is some "in between" level of simplification that
might produce an alternate graph that has interesting features. And that
is why I am trying to get Roman to lay out exactly what it is he wants.

-Peff


* Re: Bizarre missing changes (git bug?)
From: Jeff King @ 2008-07-30  4:23 UTC
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807300430590.6791@localhost.localdomain>

On Wed, Jul 30, 2008 at 04:48:54AM +0200, Roman Zippel wrote:

> Now compare the output of "git-log file1", "git-log --full-history file1" 
> and "git-log --full-history --parents file1". You get either both merge 
> commits or none, but only one of them is relevant to file1.

Ah, I see.

So if I understand you, you wanted to see something like:


A--B
 \  \
  C--D

where

 A = initial commit
 B = duplicate change 1
 C = duplicate change 2
 D = merge branch 'test2' into HEAD

where the simplification isn't as aggressive (you still see the
duplicate commits and the merge), but we can get rid of the later merge
between A and D because A is already an ancestor of D.

So do you have a proposed set of simplification rules that will produce
that output?

-Peff


* Re: Bizarre missing changes (git bug?)
From: Linus Torvalds @ 2008-07-30  3:35 UTC
  To: Roman Zippel; +Cc: Jeff King, Tim Harper, git
In-Reply-To: <alpine.LFD.1.10.0807292002520.3334@nehalem.linux-foundation.org>



On Tue, 29 Jul 2008, Linus Torvalds wrote:
> 
> In other words, that change - in a VERY REAL WAY - never actually mattered 
> for the current state of kernel/printk.c. And the history simplification 
> sees that, and avoids showing the whole pointless branch.

Btw, Roman, this is a really really important thing for you to realize. 

You need to realize that your "perfect" output really REALLY is totally 
inferior, if what you are actually interested in is "how did things get to 
be the way they are".

It's a _feature_. It's not a bug. And it's a really good one.

If side branches didn't matter for the contents of the file, those side 
branches simply don't matter, and showing them is just a distraction.

Yes, you can ask for the history that doesn't matter for the end result. 
And yes, I acknowledge freely that it would be good to then have a 
separate cleanup phase to make that thing more readable. In fact, in the 
very first reply to you I pointed you to a thread where I said exactly 
that, long before this thread even started.

But no, the current default isn't broken. No, it's not "lazy" either. No, 
it was not an "accident". And no, it's not "incorrect".

And until you can see that (along with all the reasons I've outlined why 
your "fixed" approach is a total piece of sh*t from a performance angle), 
you're just being stupid.

				Linus


* Re: Bizarre missing changes (git bug?)
From: Linus Torvalds @ 2008-07-30  3:21 UTC
  To: Roman Zippel; +Cc: Jeff King, Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807300430590.6791@localhost.localdomain>



On Wed, 30 Jul 2008, Roman Zippel wrote:
> 
> For printk.c look for commit 02630a12c7f72fa294981c8d86e38038781c25b7 and 
> try to find it in the graphical outputs.

Umm.

Why would you? Yes, it's there, if you ask for --full-history. And no, I 
don't think --full-history is actually useful to humans - it's very much 
there as a "here's all the data" thing where you could have the tools 
post-process it, where often "post-processing" is actually just searching 
for it.

And no, it's not there if you don't use --full-history.

But now, instead of _complaining_ about this, I would suggest you think 
about why it's a _good_ thing, and why it's so useful?

In other words, you're arriving at all your complaints from the wrong 
angle entirely, and because you have convinced yourself that things have 
to work a certain way, and then you're upset when they don't.

But you should _unconvince_ yourself - and look at whether maybe all your 
initial preconceptions were perhaps totally wrong? Because they were.

The reason that commit 02630a12c7f72fa294981c8d86e38038781c25b7 doesn't 
show up in the normal log when looking at kernel/printk.c is that it 
really doesn't exist as a _relevant_ part of history for the current state 
of that file. It exists only as a side-branch for the GFS2 quota code 
that first adds a line

	+EXPORT_SYMBOL_GPL(tty_write_message);

(in commit b346671fa196a), and then removes the line not long after (in 
that commit 02630a12c7f). And both of them go away (along with the whole 
side-branch), because they didn't end up mattering for the end result: 
they only ever existed in that side branch, and by the time it was merged 
back into the main branch, all changes had been undone.

In other words, that change - in a VERY REAL WAY - never actually mattered 
for the current state of kernel/printk.c. And the history simplification 
sees that, and avoids showing the whole pointless branch.
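
A rough sketch of the default simplification rule being described: at each commit, if the file is identical to one parent, follow only that parent; otherwise show the commit and look at all parents. The dict-based history, the `blob` map of file contents and the commit names are made-up inputs for illustration, and the traversal order is not the one git uses.

```python
def simplified_log(head, parents, blob):
    """Return the commits that explain the file content at `head`.

    parents: dict commit -> list of parents (hypothetical input format)
    blob:    dict commit -> file content at that commit (None if absent)
    """
    shown, seen, stack = [], set(), [head]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        ps = parents[c]
        same = [p for p in ps if blob[p] == blob[c]]
        if same:
            # One parent already explains the content: follow only that
            # one and ignore the other side entirely.
            stack.append(same[0])
            continue
        if ps or blob[c] is not None:
            shown.append(c)
        stack.extend(ps)
    return shown


# Main line R -> X, side branch S1 (adds a line) -> S2 (removes it again),
# merged in M with the file ending up exactly as it was in X.
parents = {"R": [], "X": ["R"], "S1": ["R"], "S2": ["S1"], "M": ["X", "S2"]}
blob = {"R": "v1", "X": "v2", "S1": "v1+export", "S2": "v1", "M": "v2"}
```

With these inputs, `simplified_log("M", parents, blob)` never visits S1 or S2, which mirrors why the add-then-remove commits do not appear in the default log.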

This is such an obviously _good_ thing that I really am surprised at how 
you can continue to argue against it. Especially as the examples you give 
"for" your argument are so wonderful examples _against_ it.

And yes, you can actually force gitk to show the state of that commit and 
thus force it to acknowledge that that state was relevant (although you 
won't necessarily force it to acknowledge that the relevance ties together 
with the final end result). You do that by just telling it that you're not 
just interested in HEAD, but in that commit too.

So I would literally suggest that anybody interested in this subject 
really just do

	gitk kernel/printk.c &
	gitk HEAD 02630a12c7f72fa294981c8d86e38038781c25b7 kernel/printk.c &

in the kernel, and now compare the two side-by-side. Notice where they 
differ (hint: look for the commit a0f1ccfd8d37457a6d8a9e01acebeefcdfcc306e 
- "[PATCH] lockdep: do not recurse in printk" - which is in both, and look 
below it).

Now, which graph is the more relevant and understandable one from the 
standpoint of what the current state of kernel/printk.c is?

Honestly now, Roman.

Because if you were actually willing to see this as a _feature_ (which it 
very much is), you'd admit that it's a damn clever and useful one. But I 
suspect you have dug yourself so deep into a hole that you can't admit 
that even to yourself any more.

				Linus


* Re: Bizarre missing changes (git bug?)
From: Kevin Ballard @ 2008-07-30  3:20 UTC
  To: Roman Zippel; +Cc: Jeff King, Linus Torvalds, Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807300430590.6791@localhost.localdomain>

On Jul 29, 2008, at 7:48 PM, Roman Zippel wrote:

> For printk.c look for commit  
> 02630a12c7f72fa294981c8d86e38038781c25b7 and
> try to find it in the graphical outputs.
> Here is a bit better example than Linus gave:
>
> [snip]
>
> Now compare the output of "git-log file1", "git-log --full-history  
> file1"
> and "git-log --full-history --parents file1". You get either both  
> merge
> commits or none, but only one of them is relevant to file1.
>
> The problem is that in practice "git-log --full-history --parents"
> produces way too much information to be useful right away.

Output looks correct to me. And of course --full-history --parents  
gives lots of output - that's what it's for. You seem to believe that  
the appropriate output is, what, to display the initial commit, both  
commits that modified file1, and the first merge, yes? Can you please  
clarify the logic that states that the first merge commit should be  
shown but the second should not?

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com


* Re: Bizarre missing changes (git bug?)
From: Roman Zippel @ 2008-07-30  2:48 UTC
  To: Jeff King; +Cc: Linus Torvalds, Tim Harper, git
In-Reply-To: <20080729125247.GC12069@sigill.intra.peff.net>

Hi,

On Tue, 29 Jul 2008, Jeff King wrote:

> > > Perhaps I am just slow, but I haven't been able to figure out what that
> > > history is, or what the "correct" output should be. Can you try to state
> > > more clearly what it is you are looking for?
> > 
> > Most frequently this involves changes where the same change is merged 
> > twice. Another interesting example is kernel/printk.c where a change is 
> > added and later removed again before it's merged.
> 
> I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane.
> I was really hoping for you to make your case as something like:
> 
>   1. here is an ascii diagram of an actual history graph (or a recipe of
>      git commands for making one)
>   2. here is what git-log (or gitk) produces for this history by
>      default; and here is why it is not optimal (presumably some
>      information it fails to convey)
>   3. here is what git-log (or gitk) with --full-history produces; and
>      here is why it is not optimal (presumably because it is too messy)
>   4. here is what output I would like to see. Bonus points for "and here
>      is an algorithm that accomplishes it."

For printk.c look for commit 02630a12c7f72fa294981c8d86e38038781c25b7 and 
try to find it in the graphical outputs.
Here is a bit better example than Linus gave:

mkdir test
cd test
git init

echo 1 > file1
echo a > file2

git add file1 file2
git commit -m "initial commit"
git tag base

git branch test1 base
git checkout test1
echo 2 > file1
git commit -a -m "duplicate change 1"

git branch test2 base
git checkout test2
echo 2 > file1
git commit -a -m "duplicate change 2"

git branch test3 base
git checkout test3
echo b > file2
git commit -a -m "some other change"

git checkout base

git merge test1
git merge test2
git merge test3

Now compare the output of "git-log file1", "git-log --full-history file1" 
and "git-log --full-history --parents file1". You get either both merge 
commits or none, but only one of them is relevant to file1.

The problem is that in practice "git-log --full-history --parents" 
produces way too much information to be useful right away.

bye, Roman


* Re: Bizarre missing changes (git bug?)
From: Linus Torvalds @ 2008-07-30  2:05 UTC
  To: Roman Zippel; +Cc: Jeff King, Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807300315280.6791@localhost.localdomain>



On Wed, 30 Jul 2008, Roman Zippel wrote:
> > 
> > The "gitk file" history is the simplest one BY FAR, because it has very 
> > aggressively simplified history to the point where it tried to find the 
> > _simplest_ history that explains the current contents of 'file'[*]
> 
> It's "aggressively simplified" by not even bothering to look for more.

Yes and no.

It's aggressively simplified because that's the right output with the 
minimal unnecessary irrelevant information. It explains how the file came 
to a particular state, with the simplest possible self-consistent history.

(Again, the caveat about "simplest possible" always being a local 
minimization, not a global one).

The fact that it also obviously involved less work (so git can do it 
faster, and with fewer disk and memory accesses) is a huge bonus, of 
course.

Are you complaining about the fact that I'm smart, and I get the right 
result I want with less work and with a simpler algorithm?

What's your point?

> "simplified" implies there is something more complex beforehand, but all 
> it does is simply scan through the history as fast as possible without 
> bothering to look left or right.

You're just being stupid.

It's not that it's not "bothering" looking left or right. It very much 
*does* bother to look left or right. But once it finds that one or the 
other explains the situation entirely, it then says "screw left, I already 
know that right gives me the information I want".

In other words, it's doing the _smart_ thing. 

I don't understand why you complain about intelligence.

It's *not* just looking at one single history. Look at

	gitk kernel/sched.c

and notice that the simplified history is not linear. It tries to make it 
AS LINEAR AS POSSIBLE, BUT NO MORE.

    "Make everything as simple as possible, but not simpler."
			- Albert Einstein

You seem to complain about the fact that it's doing that. That's stupid of 
you.

> "simplified" implies to me it's something intentional, but this is more of 
> an accidental optimization which happens to work in most situations and in 
> the special cases it just picks a random change and hopes for the best.

You're just crazy. There is nothing accidental there what-so-ever.

> "git-log --full-history file" at least produces the full change history, 
> but it has a performance impact and it doesn't produce a complete graph 
> usable for graphical front ends.

Umm. You have to add "--parents" if you want a full graph. Without that, 
you can never re-generate the graph anyway.

And when you do that, it _does_ give all the commits needed to complete 
the picture.
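
For instance, `git rev-list --parents` prints each commit followed by its parents on one line, which is enough for a front end to rebuild the graph. A minimal sketch of a parser for that kind of output follows; the function name and the sample ids are invented for illustration.

```python
def parse_parents(text):
    """Turn 'commit parent1 parent2 ...' lines into a commit -> parents dict."""
    graph = {}
    for line in text.splitlines():
        ids = line.split()
        if ids:
            graph[ids[0]] = ids[1:]
    return graph


# Invented sample in the shape of `git rev-list --parents` output.
sample = """\
d4 b2 c3
c3 a1
b2 a1
a1
"""
graph = parse_parents(sample)
```

Here `graph["d4"]` lists two parents, so a viewer knows it is a merge; without the parent columns only a flat list of commits would be left.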

In other words, git (once again) is actually smarter than you, and does 
the right thing, and (once again) you complain about something that you 
just don't understand.

> I gave more general examples. Tracking upstream source can produce this 
> problem frequently. Another example is stable/unstable branches, where 
> occasionally merging the stable branch into the unstable branch can produce 
> this problem.

You call it a "problem", but you don't actually give any reason for 
calling it that. IT IS NOT A PROBLEM. It's very much by design, and it's 
because it's what you want.

Use --full-history if you want the full history. 

> This is your _subjective_ interpretation of this problem: because it's not a 
> problem for you, nobody else can possibly have this problem (or they're just 
> crazy).

No, Roman. You're not crazy because you have some issue that I cannot 
understand. You're crazy because you make the same mistake over and over, 
and don't listen when people tell you what the mistake was.

	"Insanity is doing the same thing over and over again and 
	 expecting different results."
			- Various

Please. People have told you where you go wrong. Many times. So why do you 
keep repeating it?

Take the time to slow down, listen, and realize that you're on the wrong 
track, and that others really _have_ spent time and thought on this.

		Linus


* Re: Bizarre missing changes (git bug?)
From: Kevin Ballard @ 2008-07-30  1:32 UTC
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807300223010.6791@localhost.localdomain>

On Jul 29, 2008, at 6:14 PM, Roman Zippel wrote:

>> So here's my challenge again, which you seem to have TOTALLY MISSED.
>>
>> Make this be fast:
>>
>> 	time sh -c "git log <filename> | head"
>>
>> nothing else matters. If you can make that one be fast, I'm happy.
>
> I already explained it, but you simply dismissed it. It's possible,  
> but it
> requires a bit of cached information (e.g. as part of the pack file,  
> which
> is needed for decent performance anyway).

As an outside observer, this argument is basically akin to "it's easy  
to fly, you just need some faerie dust". Basically, you're dismissing  
the entire complexity of the problem by saying "oh, that's easy, just  
use some cached data" without any proof that this would work, or any  
sample code, or really any evidence at all. Given that the path  
simplification can be arbitrarily complex (I can pass any set of paths  
I want), I don't believe that you can just use "a bit of cached  
information" for this. If you did rely on cached information, said  
information would probably be orders of magnitude larger than the  
object graph itself (for repos with lots of files).

>> In fact, you can see what I'm talking about by trying --topo-order  
>> in the
>> above timing test.
>
> Please give me a full example.
> gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't
> show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments  
> with
> git-rev-list show no improvement either.

He's not saying it changes what commits are shown, he's saying it has  
a performance impact - topo order has to post-process the graph. For a  
quick demonstration, run `time sh -c 'git log | head'` vs `time sh -c  
'git log --topo-order | head'`.

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com


* Re: Bizarre missing changes (git bug?)
From: Linus Torvalds @ 2008-07-30  1:49 UTC
  To: Roman Zippel; +Cc: Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807300223010.6791@localhost.localdomain>



On Wed, 30 Jul 2008, Roman Zippel wrote:
> > 
> > 	time sh -c "git log <filename> | head"
> > 
> > nothing else matters. If you can make that one be fast, I'm happy. 
> 
> I already explained it, but you simply dismissed it. It's possible, but it 
> requires a bit of cached information (e.g. as part of the pack file, which 
> is needed for decent performance anyway).

Bzzt. Wrong. Try again.

> > In fact, you can see what I'm talking about by trying --topo-order in the 
> > above timing test.
> 
> Please give me a full example.
> gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't 
> show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments with 
> git-rev-list show no improvement either.

Roman, what the f*ck is wrong with you? Let me repeat that thing one more 
time:

	you can see what I'm talking about by trying --topo-order in the
	above timing test.
	      ^^^^^^^^^^^

The fact is, --topo-order is a post-processing thing, exactly the way your 
half-way simplification would be. It requires _all_ commits, and it 
requires them because we cannot guarantee that we output all children 
before the parents when there are multiple threads without a central clock 
(ie any distributed environment).

So for --topo-order, we generate the whole history, and then we sort it. 

As a result, it has horrible interactivity behavior. Try it. Here's some 
random command lines, and the times:

	time git log --topo-order drivers/scsi/scsi_lib.c | head

	real    0m0.688s
	user    0m0.652s
	sys     0m0.036s

and without:

	time git log drivers/scsi/scsi_lib.c | head

	real    0m0.033s
	user    0m0.024s
	sys     0m0.008s

do you see the difference? They happen to output _exactly_ the same ten 
lines, but one of them takes the better part of a second (and that's on 
pretty much the fastest machine you can find right now - on a laptop with 
a slow disk and without things in cache, it would take many many seconds).

The other one is instantaneous.

Now, I realize that 0.033s vs 0.688s doesn't sound like a big deal, even 
though that's a 20x difference, but that 20x difference is a _really_ big 
deal when the machine is slower, or when "old history" isn't in the disk 
cache any more.

For example, try doing the timings after flushing the disk caches to 
simulate cold-cache behavior. Do it with a slow disk. Or do it over NFS. 
Yes, even the "fast" case will actually be painfully slow (well, it is for 
me, people who are used to CVS probably think it's just "normal"). 

And yes, it will depend a lot on the file in question too. Obviously, if 
the first change is far back in history, it will be slow _regardless_, but 
I've at least personally found that in practice, you tend to look at logs 
of _recent_ things much much much more than you look at things that 
haven't changed lately.

It will also depend a lot on whether you are packed or not. For example, 
if you are well packed, the pack-file IO locality is really really good, 
and the 20x slowdown is much less. I just tested with a laptop with a slow 
disk, and the --topo-order case was "only" 2.5x slower, almost certainly 
because the IO required to bring in the first part of the history ended up 
being a large portion of the total IO, and so the "whole history" case was 
not 20x slower, because there was not 20x more IO due to the good locality 
and the kernel doing readahead etc.

But 2.5x slower is really bad, wouldn't you agree? We're not talking about 
a few percent here, we're talking about more than twice as long. It's very 
noticeable, especially when the end result was --topo-order: 29.8s, no 
topo-order 12.1s

(Yeah, that wasn't a very realistic example, but on that same machine, 
once it's in the cache, it's 0.13s vs 1.6s: one is "instant", the other is 
very much a "wait for it" kind of thing.)

THAT is the kind of performance difference you see.

And trust me, it's a performance difference that you can really notice in 
real life. I'm not kidding you. Just try it:

	git log kernel/sched.c
vs
	git log --topo-order kernel/sched.c

and one is instant, the other one pauses before it starts showing 
something. One feels fast, the other feels slow.

At the same time, if you actually time the _whole_ log, it's all exactly 
the same speed:

	[torvalds@nehalem linux]$ time git log --topo-order kernel/sched.c > /dev/null 
	real	0m0.708s
	user	0m0.684s
	sys	0m0.020s

	[torvalds@nehalem linux]$ time git log kernel/sched.c > /dev/null 
	real	0m0.703s
	user	0m0.672s
	sys	0m0.032s

Notice? The cost of the topological sort itself is basically zero. But 
from an interactivity standpoint, it's _deadly_.

And please note that here "--topo-order" is just an example of a random 
"global history post-processing" thing. It's not that I want you to use 
the topological sort per se, it's just an example of the whole issue with 
_any_ post-factum operation. The topological sort is not expensive as a 
sort. What is expensive is that it needs to get the whole history to work.

And also please notice that this is a huge scalability issue. "git log" 
should not become slower as a project gets more history. Sure, the full 
log will take longer to generate (because there's _more_ of it), but the 
top commits should always show up immediately.

Again, if you have a filter (where "topological sort" is just an example 
of such a filter) that requires the full history to work, it simply 
_fundamentally_ cannot scale well. It very fundamentally will slow down 
with bigger history.
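
The interactivity argument can be illustrated with a toy model. The sketch below is an illustration under stated assumptions, not git's implementation: commits are integers, `parents` and `date` are plain dicts, and the `visited` list records how many commits had to be loaded before output could be produced.

```python
import heapq
import itertools


def walk(head, parents, date, visited):
    """Incremental walk, newest first; yields each commit as soon as it is loaded."""
    heap, seen = [(-date[head], head)], {head}
    while heap:
        _, c = heapq.heappop(heap)
        visited.append(c)
        yield c
        for p in parents[c]:
            if p not in seen:
                seen.add(p)
                heapq.heappush(heap, (-date[p], p))


def topo_walk(head, parents, date, visited):
    """Post-processing walk: loads the whole history before emitting anything."""
    commits = list(walk(head, parents, date, visited))
    pending = {c: 0 for c in commits}      # children not yet emitted
    for c in commits:
        for p in parents[c]:
            pending[p] += 1
    ready = [c for c in commits if pending[c] == 0]
    while ready:
        c = ready.pop()
        yield c                            # every child was emitted earlier
        for p in parents[c]:
            pending[p] -= 1
            if pending[p] == 0:
                ready.append(p)


def first(n, gen):
    """Take the first n items, like piping the log into `head`."""
    return list(itertools.islice(gen, n))
```

On a linear history of 1000 commits, taking the first ten results from `walk` loads ten commits, while taking the first ten from `topo_walk` loads all 1000, even though both return the same ten commits.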

> The problem is that your picture doesn't include my specific problem, I'm 
> very interested in the big picture, but I'd like to be in it.

Roman, I've been trying to explain this "interactive" thing for _days_ 
now. That's the big picture. The whole "you have to be able to generate 
history incrementally" thing.

First generating the whole global history, and then simplifying it, is 
simply not acceptable. It's too slow, and it doesn't scale.

			Linus


* Re: Bizarre missing changes (git bug?)
From: Roman Zippel @ 2008-07-30  1:50 UTC
  To: Linus Torvalds; +Cc: Jeff King, Tim Harper, git
In-Reply-To: <alpine.LFD.1.10.0807291006070.3334@nehalem.linux-foundation.org>

Hi,

On Tue, 29 Jul 2008, Linus Torvalds wrote:

> Now, do these three things
> 
> 	gitk
> 	gitk file
> 	gitk --full-history file
> 
> and compare them. They all show _different_ histories.
> 
> Which one is "correct"? They all are. It just depends on what you want to 
> see.
> 
> The "gitk file" history is the simplest one BY FAR, because it has very 
> aggressively simplified history to the point where it tried to find the 
> _simplest_ history that explains the current contents of 'file'[*]

It's "aggressively simplified" by not even bothering to look for more.
"simplified" implies there is something more complex beforehand, but all 
it does is simply scan through the history as fast as possible without 
bothering to look left or right.
"simplified" implies to me it's something intentional, but this is more of 
an accidental optimization which happens to work in most situations and in 
the special cases it just picks a random change and hopes for the best.

"git-log --full-history file" at least produces the full change history, 
but it has a performance impact and it doesn't produce a complete graph 
usable for graphical front ends.

> From a practical standpoint, and from having used this a long time, I'd 
> argue that the simple history is the one that you want 99.9% of all time. 
> But not _always_. Sometimes, the things that got simplified away actually 
> matter. It's rare, but it happens.
> 
> For example, maybe you had a bug-fix that you _know_ you did, and it 
> doesn't show up in the simplified history. That really pisses you off, and 
> it apparently really pisses Roman off that it can happen. But the fact is, 
> that still doesn't mean that the simple history is "wrong" or even 
> "incomplete".

I gave more general examples. Tracking upstream source can produce this 
problem frequently. Another example is stable/unstable branches: when the 
stable branch is occasionally merged into the unstable branch, the same 
problem can appear.

> No, it's actually meaningful data in itself. If the bug-fix doesn't show 
> in the simplified history, then that simply means that the bug-fix was not 
> on a branch that could _possibly_ have mattered for the current contents. 
> 
> So once you are _aware_ of history simplification and are mentally able to 
> accept it, the fact that history got simplified is actually just another 
> tool.

This is your _subjective_ interpretation of this problem: because it's not a 
problem for you, nobody else can possibly have this problem (or they are just 
crazy).
Even if I know about this limitation it still doesn't solve the problem, 
that _none_ of the graphical interfaces can show me a useful history graph 
of these situations.

bye, Roman

* [PATCH v2] Documentation: Remove mentions of git-svnimport.
From: Brian Gernhardt @ 2008-07-30  1:16 UTC (permalink / raw)
  To: Pieter de Bie; +Cc: Git List, Junio C Hamano, Jurko Gospodnetić

git-svnimport is no longer supported, so don't mention it in the
documentation.  This also updates the description, removing the
historical discussion, since it mostly dealt with how it differed from
svnimport.  The new description gives some starting points into the
rest of the documentation.

Noticed by Jurko Gospodnetić <jurko.gospodnetic@docte.hr>

Signed-off-by: Brian Gernhardt <benji@silverinsanity.com>
---

 Replaces the remaining comparison to git-svnimport with pointers
 to the rest of the documentation.

 Documentation/git-svn.txt |   26 ++++++++++++--------------
 1 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index e7c0f1c..f230125 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -12,18 +12,18 @@ SYNOPSIS
 DESCRIPTION
 -----------
 'git-svn' is a simple conduit for changesets between Subversion and git.
-It is not to be confused with linkgit:git-svnimport[1], which is
-read-only.
+It provides a bidirectional flow of changes between a Subversion and a git
+respository.
 
-'git-svn' was originally designed for an individual developer who wants a
-bidirectional flow of changesets between a single branch in Subversion
-and an arbitrary number of branches in git.  Since its inception,
-'git-svn' has gained the ability to track multiple branches in a manner
-similar to 'git-svnimport'.
+'git-svn' can track a single Subversion branch simply by using a
+URL to the branch, follow branches laid out in the Subversion recommended
+method (trunk, branches, tags directories) with the --stdlayout option, or
+follow branches in any layout with the -T/-t/-b options (see options to
+'init' below, and also the 'clone' command).
 
-'git-svn' is especially useful when it comes to tracking repositories
-not organized in the way Subversion developers recommend (trunk,
-branches, tags directories).
+Once tracking a Subversion branch (with any of the above methods), the git
+repository can be updated from Subversion by the 'fetch' command and
+Subversion updated from git by the 'dcommit' command.
 
 COMMANDS
 --------
@@ -218,8 +218,7 @@ Any other arguments are passed directly to 'git-log'
 
 'commit-diff'::
 	Commits the diff of two tree-ish arguments from the
-	command-line.  This command is intended for interoperability with
-	'git-svnimport' and does not rely on being inside an `git-svn
+	command-line.  This command does not rely on being inside an `git-svn
 	init`-ed repository.  This command takes three arguments, (a) the
 	original tree to diff against, (b) the new tree result, (c) the
 	URL of the target Subversion repository.  The final argument
@@ -317,8 +316,7 @@ config key: svn.findcopiesharder
 -A<filename>::
 --authors-file=<filename>::
 
-Syntax is compatible with the files used by 'git-svnimport' and
-'git-cvsimport':
+Syntax is compatible with the file used by 'git-cvsimport':
 
 ------------------------------------------------------------------------
 	loginname = Joe User <user@example.com>
-- 
1.6.0.rc1.154.ge3fc

* Re: Bizarre missing changes (git bug?)
From: Roman Zippel @ 2008-07-30  1:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tim Harper, git
In-Reply-To: <alpine.LFD.1.10.0807290838360.3334@nehalem.linux-foundation.org>

Hi,

On Tue, 29 Jul 2008, Linus Torvalds wrote:

> On Tue, 29 Jul 2008, Roman Zippel wrote:
> > 
> > I'm not dismissing it, but your focus is on how to get this result.
> 
> No, you misunderstand.
> 
> My focus is really on one single thing:
> 
>  - performance
> 
> with a smaller focus on the fact that I simply don't see how it's 
> _possible_ to do better than our current all-or-nothing approach of 
> simplification (eg either extreme simplification or none at all: nothing 
> or --full-history).

That's exactly what I'm not dismissing as you claim, but I've hit the 
problem where this approach simply produces crap, so I'm foremost 
interested in getting a useful result, only after that I'm interested in 
the performance (which I think is possible).

> So here's my challenge again, which you seem to have TOTALLY MISSED.
> 
> Make this be fast:
> 
> 	time sh -c "git log <filename> | head"
> 
> nothing else matters. If you can make that one be fast, I'm happy. 

I already explained it, but you simply dismissed it. It's possible, but it 
requires a bit of cached information (e.g. as part of the pack file, which 
is needed for decent performance anyway).

> In fact, you can see what I'm talking about by trying --topo-order in the 
> above timing test.

Please give me a full example.
gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't 
show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments with 
git-rev-list show no improvement either.

> > > And quite frankly, I've seen that behaviour from you before, when it comes 
> > > to other things.
> > 
> > What exact behaviour is that? That I dare to disagree with you?
> 
> No. The fact that you like arguing _pointlessly_, and just being abrasive, 
> without actually helping or understanding the big picture.

The problem is that your picture doesn't include my specific problem, I'm 
very interested in the big picture, but I'd like to be in it.

> I'm thinking 
> back on the whole scheduler thing. You weren't arguing with _me_, but you 
> had the same modus operandi.

Well, it seems I have a talent for finding the special cases, e.g. last time 
I tested the scheduler it was performing twice as bad as the old scheduler 
on m68k. I've also seen cases where it sacrifices throughput for 
interactivity.
Anyway, this is the wrong place for it. The problem I'm hitting is 
these "good enough" solutions, which work in most situations but fail in 
a few special situations, and nobody is interested in getting these right 
unless your name is Linus.

bye, Roman

* Re: [trivial fast-export PATCH] Fix typo in documentation
From: Nick Andrew @ 2008-07-30  1:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <alpine.DEB.1.00.0807291404390.4631@eeepc-johanness>

On Tue, Jul 29, 2008 at 02:06:13PM +0200, Johannes Schindelin wrote:
> On Tue, 29 Jul 2008, Nick Andrew wrote:
> 
> >  hg-fast-export.txt |   10 +++++-----
> 
> I do not see this in git.git.  Maybe you meant to Cc: Simon Hausmann, 
> Rocco Rutte or Han-Wen?
> 
> Or Chris Lee, who seems to be the owner of fast-export.git.

Thanks; I forwarded the patch to Chris Lee.

Nick.

* Re: [PATCH] Documentation: Remove mentions of git-svnimport.
From: Brian Gernhardt @ 2008-07-30  0:57 UTC (permalink / raw)
  To: Pieter de Bie; +Cc: Git List, Junio C Hamano
In-Reply-To: <F11D5504-5FA1-4FF7-9D53-032D5027EBB9@ai.rug.nl>


On Jul 29, 2008, at 8:44 PM, Pieter de Bie wrote:

> On Jul 30, 2008, at 2:38 AM, Brian Gernhardt wrote:
>
>> +respository.  'git-svn' is especially useful when it comes to  
>> tracking
>> +repositories not organized in the way Subversion developers  
>> recommend
>> +(trunk, branches, tags directories).
>
> This is of course not true. Git svn is especially useful when  
> branches _are_ organized
> that way, but other configurations are somewhat supported. This  
> comment still refers
> to the comparison with svnimport.

The original statement was in a paragraph by itself, so I assumed it  
was not a comparison.  Reading it again, I can see your point.

~~ Brian

* Re: Bizarre missing changes (git bug?)
From: Linus Torvalds @ 2008-07-30  0:48 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Martin Langhoff, Tim Harper, git
In-Reply-To: <alpine.LFD.1.10.0807291716060.3334@nehalem.linux-foundation.org>



On Tue, 29 Jul 2008, Linus Torvalds wrote:
> 
>  - But the other huge mistake you make is EVEN MORE STUPID, because it's 
>    so ironic. That magical output you want, and claim is so perfect, and 
>    point out "thus you can still apply as much simplification as you want 
>    on top of it"? You know what? It already _exists_! It's exactly that 
>    --full-history case.

Put in other terms: what you ask for can be fairly trivially done as a 
filter on the _current_ git output (preferably merged into the tool that 
shows it graphically in the first place), with absolutely no downside.

In contrast, if somebody was really so _stupid_ as to go with your output 
format, then yes, he could further simplify it down to the current default 
format, but with a huge performance/interactivity downside.

See? Your preferred format is not actually the "best" format. Not at all. 
Quite the reverse. Your preferred format is much better off being a 
secondary post-processing format exactly because it can be generated from 
one of the primary formats easily enough.

But the reverse isn't true: the current primary formats cannot be 
generated from your preferred format without losing something important 
(performance).

But I'll make you a deal: if you actually write the filter in C form, I 
can pretty much guarantee that we can easily add it as a new flag. It 
really should be pretty easy to integrate it into the revision parsing 
machinery alongside --topo-order, since it's really the same kind of 
operation.

In fact, it's possible that the current --topo-order sorting could 
possibly be made to just do the simplification (conditionally, of course, 
since it has the latency problem). See the function

	void sort_in_topological_order(struct commit_list ** list, int lifo)

in commit.c - that's where it would hook in.
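
For what such a post-processing filter might look like, here is a rough sketch in Python, not C, and not tied to git's revision machinery. It assumes the input is a parent map of the kind you could build from `--full-history --parents` output, plus a set of commits known to change the file; `simplify`, `ancestors`, and the `changed` set are names invented for this sketch. It drops a merge parent that is already reachable through another parent, splices out a commit left with one parent and no change of its own, and repeats until nothing moves. It is the naive version and needs the whole graph in memory, which is exactly the latency cost mentioned above.

```python
def ancestors(parents, start):
    # All commits reachable from `start` by following parent links.
    seen, stack = set(), list(parents.get(start, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def simplify(parents, changed):
    # parents: {commit: [parent, ...]}; changed: commits that modify the path.
    parents = {c: list(ps) for c, ps in parents.items()}
    progress = True
    while progress:
        progress = False
        for c, ps in list(parents.items()):
            # Drop duplicate parents and parents reachable via another parent.
            uniq = list(dict.fromkeys(ps))
            kept = [p for p in uniq
                    if not any(p in ancestors(parents, q)
                               for q in uniq if q != p)]
            if kept != ps:
                parents[c] = kept
                progress = True
        for c in list(parents):
            ps = parents[c]
            # A one-parent commit that changes nothing is spliced out,
            # and its children are re-pointed at its parent.
            if len(ps) == 1 and c not in changed:
                for k in parents:
                    parents[k] = [ps[0] if p == c else p for p in parents[k]]
                del parents[c]
                progress = True
    return parents
```

For example, with `{"C": ["M"], "M": ["B", "A"], "B": ["A"], "A": []}` and `changed = {"A", "B", "C"}`, the merge `M` loses its redundant parent `A`, becomes a no-change single-parent commit, and is removed, leaving the linear history C, B, A.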

		Linus

* Re: [PATCH] Documentation: Remove mentions of git-svnimport.
From: Pieter de Bie @ 2008-07-30  0:44 UTC (permalink / raw)
  To: Brian Gernhardt; +Cc: Git List, Junio C Hamano
In-Reply-To: <1217378299-733-1-git-send-email-benji@silverinsanity.com>


On Jul 30, 2008, at 2:38 AM, Brian Gernhardt wrote:

> +respository.  'git-svn' is especially useful when it comes to  
> tracking
> +repositories not organized in the way Subversion developers recommend
> +(trunk, branches, tags directories).

This is of course not true. Git svn is especially useful when branches  
_are_ organized
that way, but other configurations are somewhat supported. This  
comment still refers
to the comparison with svnimport.

* Re: git svn documentation
From: Brian Gernhardt @ 2008-07-30  0:39 UTC (permalink / raw)
  To: Jurko Gospodnetić; +Cc: git
In-Reply-To: <g6o6j6$k7$1@ger.gmane.org>


On Jul 29, 2008, at 6:45 PM, Jurko Gospodnetić wrote:

>  git svn documentation should be updated so it no longer references  
> the no longer supported 'git svnimport' command. Currently this  
> causes an invalid link to be added to the git svn html documentation.

Thanks for the report.  I wrote a patch for this and sent it to the  
list, but forgot to send it as a reply to your message.

~~ Brian

* [PATCH] Documentation: Remove mentions of git-svnimport.
From: Brian Gernhardt @ 2008-07-30  0:38 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

git-svnimport is no longer supported, so don't mention it in the
documentation.  This also updates the description, removing the
historical discussion, since it mostly dealt with how it differed from
svnimport.

Noticed by Jurko Gospodnetić <jurko.gospodnetic@docte.hr>

Signed-off-by: Brian Gernhardt <benji@silverinsanity.com>
---
 Documentation/git-svn.txt |   22 ++++++----------------
 1 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index e7c0f1c..4d82a08 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -12,18 +12,10 @@ SYNOPSIS
 DESCRIPTION
 -----------
 'git-svn' is a simple conduit for changesets between Subversion and git.
-It is not to be confused with linkgit:git-svnimport[1], which is
-read-only.
-
-'git-svn' was originally designed for an individual developer who wants a
-bidirectional flow of changesets between a single branch in Subversion
-and an arbitrary number of branches in git.  Since its inception,
-'git-svn' has gained the ability to track multiple branches in a manner
-similar to 'git-svnimport'.
-
-'git-svn' is especially useful when it comes to tracking repositories
-not organized in the way Subversion developers recommend (trunk,
-branches, tags directories).
+It provides a bidirectional flow of changes between a Subversion and a git
+respository.  'git-svn' is especially useful when it comes to tracking
+repositories not organized in the way Subversion developers recommend
+(trunk, branches, tags directories).
 
 COMMANDS
 --------
@@ -218,8 +210,7 @@ Any other arguments are passed directly to 'git-log'
 
 'commit-diff'::
 	Commits the diff of two tree-ish arguments from the
-	command-line.  This command is intended for interoperability with
-	'git-svnimport' and does not rely on being inside an `git-svn
+	command-line.  This command does not rely on being inside an `git-svn
 	init`-ed repository.  This command takes three arguments, (a) the
 	original tree to diff against, (b) the new tree result, (c) the
 	URL of the target Subversion repository.  The final argument
@@ -317,8 +308,7 @@ config key: svn.findcopiesharder
 -A<filename>::
 --authors-file=<filename>::
 
-Syntax is compatible with the files used by 'git-svnimport' and
-'git-cvsimport':
+Syntax is compatible with the file used by 'git-cvsimport':
 
 ------------------------------------------------------------------------
 	loginname = Joe User <user@example.com>
-- 
1.6.0.rc1.154.ge3fc

* Re: Bizarre missing changes (git bug?)
From: Linus Torvalds @ 2008-07-30  0:32 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Martin Langhoff, Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807291433430.6791@localhost.localdomain>



On Wed, 30 Jul 2008, Roman Zippel wrote:
> 
> What the short simplified history does is more pure laziness

No.

Roman, you're an idiot who doesn't even _understand_ what you are talking 
about. Sadly, you then _think_ you are so smart that you then refuse to 
even consider the fact that others disagree, so you don't even read what 
they write.

Go to my previous email in this thread. Do the example. Look at the 
simplified version. Ponder.

It's not "pure laziness" when you get the simplified version. It's 
actually a MORE USEFUL RESULT! The simplified version shows the minimal 
explanation of how things ended up the way they are, and that is damn 
useful. What you want is extra _clutter_ most of the time.

It's really sad how you cannot get over your own prejudices here. 

So Roman. Go back, read my previous email in this thread. Its message ID 
is

	<alpine.LFD.1.10.0807291006070.3334@nehalem.linux-foundation.org>

in case it helps you find it.

Read it twice, or three times. Read it with the notion that maybe you 
didn't know best after all. Read it with the possibility that maybe there 
are smarter people than you, and people who have actually worked with git 
for several years.

And if you can't do that, at least stop cc'ing me with your idiocy.

To get to the meat of your email:

> The point I'm trying to make is that the compact history graph has the 
> potential to completely replace the simplified history. The only problem 
> is that it needs a bit of cached extra information, then it can be as fast 
> as the short simplified history for the common case and it still can produce 
> as much information as the full simplified history, thus you can still 
> apply as much simplification as you want on top of it.

You're simply full of sh*t. You make two huge mistakes, and I'll spend 
another few minutes of my life trying to educate you one final time, 
even though from every single indication I have so far, you are unable to 
learn simply because you think you already know the answer.

Your two mistakes are:

 - your "only" problem is fundamental.  It's unsolvable. Git history 
   simplification isn't per-file or even per-directory.  It's 
   per-any-random-set-of-pathnames. You can't "cache" the simplified 
   information, and it's not "a bit" of cached extra info. It's 
   fundamentally a metric truckload of info.

   With a cache, you can make the performance of a repeated query go fast, 
   but that's totally uninteresting.

 - But the other huge mistake you make is EVEN MORE STUPID, because it's 
   so ironic. That magical output you want, and claim is so perfect, and 
   point out "thus you can still apply as much simplification as you want 
   on top of it"? You know what? It already _exists_! It's exactly that 
   --full-history case.

   Can you not see that? That's exactly that --full-history --parents 
   case. It gives you the full information. You can simplify it to what 
   you want, exactly because it did _not_ simplify things for you. I've 
   even told you so, multiple times, when I suggested you try to do that 
   simplification in "gitk".
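
To put a rough number on the "metric truckload" in the first point: if a cache were keyed by the set of pathnames given on the command line, the count of possible keys grows as 2^n in the number of paths. The small Python sketch below only does that arithmetic; `pathspec_count` and `enumerate_pathspecs` are names invented here, and it ignores directories and glob patterns, which would only add more keys.

```python
from itertools import combinations

def pathspec_count(n_paths):
    # Number of distinct non-empty sets of pathnames out of n_paths files.
    return 2 ** n_paths - 1

def enumerate_pathspecs(paths):
    # Spell out every non-empty set of paths that could serve as a cache key.
    for r in range(1, len(paths) + 1):
        for combo in combinations(paths, r):
            yield frozenset(combo)
```

Three files already give 7 possible path sets, and 20 files give 1048575, so precomputing simplified history per path set is not a realistic option even for a tiny tree.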

In other words, git has the two cases you want: the "extreme simplified 
history" (that is nice to see what really _mattered_, with no extra 
unnecessary duplicate history that didn't actually affect the end result), 
and the "full" history (ooh, I know, we could make a command line called 
"--full-history" to get the latter, so that people who wanted to see it 
all and perhaps distill it to something else could do so).

And I've told you over and over what you should look at, and I've told you 
over and over that the default is actually the _useful_ case, and why. But 
you seem to refuse to listen. You just close your ears and repeat your 
mantra, even though people smarter than you have told you why it's done 
the way it's done. 

Stop stuffing your ears. Listen to what people tell you.

		Linus

* Re: Bizarre missing changes (git bug?)
From: Martin Langhoff @ 2008-07-30  0:25 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git
In-Reply-To: <Pine.LNX.4.64.0807291433430.6791@localhost.localdomain>

On Wed, Jul 30, 2008 at 12:16 PM, Roman Zippel <zippel@linux-m68k.org> wrote:
> I already did the prototype

afaict people around here are only interested if it can be done
without losing the early-output niceness of current git-log. That it
can be worked out in a "put it all in memory and work it in there"
model is _not_ interesting for git.

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

* Re: Bizarre missing changes (git bug?)
From: Roman Zippel @ 2008-07-30  0:16 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, Tim Harper, git
In-Reply-To: <46a038f90807282015m7ce3da10h71dfee221c960332@mail.gmail.com>

Hi,

On Tue, 29 Jul 2008, Martin Langhoff wrote:

> On Tue, Jul 29, 2008 at 2:59 PM, Roman Zippel <zippel@linux-m68k.org> wrote:
> > Can we please get past this and look at what is required to produce the
> > correct history?
> 
> Roman - correct is --full-history -- any simplification that makes it
> easy on your eyes *is* a simplification. And consumers that want to do
> nice user-friendly simplification like gitk does can hang off the data
> stream.

I don't quite understand what you're trying to say.
To avoid further confusion it maybe helps to specify a few of the terms:

- full history graph: produced by "git-log --full-history --parents"
- compact history graph: the full history graph without any 
  repeated merges, this is what my example script produces.
- full simplified history: output of "git-log --full-history"
- short simplified history: standard output of "git-log"

The important part about the history graphs is that all commits are 
properly connected in them (i.e. all except the head commit have a child). 
This is needed if you don't just want to know what happened, 
but also how it got merged; any graphical interface also needs it to 
produce a useful history graph.
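
The "properly connected" condition can be checked mechanically. The Python sketch below assumes input in the style of `git rev-list --parents`, where each line is a commit ID followed by its parent IDs; `parse_parents` and `childless` are names invented for this illustration, and the short letters stand in for real commit IDs.

```python
def parse_parents(text):
    # Each non-empty line is "<commit> <parent> <parent> ...".
    graph = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            graph[fields[0]] = fields[1:]
    return graph

def childless(graph):
    # Commits that no listed commit names as a parent.
    have_child = {p for ps in graph.values() for p in ps}
    return [c for c in graph if c not in have_child]
```

In a connected graph only the head comes back: `childless(parse_parents("d c\nc a b\nb a\na\n"))` gives `["d"]`. If the merge edge to `b` is missing, as in `"d c\nc a\nb a\na\n"`, the result is `["d", "b"]`, and `b` is the kind of dangling commit a graphical front end cannot place.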

What the short simplified history does is more pure laziness: it's fast and 
gets the most common cases right, but in order to do this it has to ignore 
part of the history. The full simplified history at least produces 
the full change history, but it lacks part of the merge history 
and it still takes longer to generate.

The point I'm trying to make is that the compact history graph has the 
potential to completely replace the simplified history. The only problem 
is that it needs a bit of cached extra information, then it can be as fast 
as the short simplified history for the common case and it still can produce 
as much information as the full simplified history, thus you can still 
apply as much simplification as you want on top of it.

Keep in mind that e.g. gitweb is using the full simplified history, so 
what I'm offering also has the potential to improve gitweb performance...

> > it's also possible to update it when merging/pulling new data.
> 
> If that's what you want to do, you can prototype it with a hook on
> fetch and commit. That is definitely an area that hasn't been explored
> - what nicer (but expensive) views on the history we have can be
> afforded by pre-computing things on fetch and commit hooks.

I already did the prototype, I know how to generate that information, the 
problem is to get that information to the various graphical interfaces.

bye, Roman
