New features in gitk

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* New features in gitk
@ 2007-10-28  1:39 Paul Mackerras
  2007-10-28  5:34 ` Linus Torvalds
                   ` (3 more replies)
  0 siblings, 4 replies; 57+ messages in thread
From: Paul Mackerras @ 2007-10-28  1:39 UTC (permalink / raw)
  To: git

I just pulled the dev branch of gitk into the master branch, so the
master branch now has the new features and improvements that I have
been working on, namely:

* The find and highlight functions have been combined into a single
  function, and there is now a button for finding backwards as well as
  a find forwards button.  Thus you can now search for commits that
  modify certain files or directories, or commits that add/remove a
  given string, as well as searching for commits by commit message,
  author, committer or headline.

* Combining the find and highlight functions freed up space that is
  now used for a progress bar and a status window.

* There is now a font chooser accessible from the edit/preferences
  window.

* Gitk now uses a new graph layout algorithm, which means it doesn't
  have to generate the whole layout from top to bottom at startup
  time, making startup faster.  Gitk also uses a new style for drawing
  short diagonal line segments that join an existing vertical line,
  which is visually clearer when the segment crosses another line.

* Gitk caches the topology information used for the previous/next tag
  and branch information, making startup faster.

Tk 8.5 is now in beta, meaning that some distros now have it
packaged.  Gitk looks much nicer in Tk8.5 since it supports
anti-aliased fonts, so I strongly suggest that people install and use
Tk8.5 if possible.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28  1:39 New features in gitk Paul Mackerras
@ 2007-10-28  5:34 ` Linus Torvalds
  2007-10-28  7:11   ` Paul Mackerras
  2007-10-28 18:32 ` Pierre Habouzit
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-10-28  5:34 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

On Sun, 28 Oct 2007, Paul Mackerras wrote:
>
> I just pulled the dev branch of gitk into the master branch, so the
> master branch now has the new features and improvements that I have
> been working on, namely:

*Huge* improvements. It is now really nice to start up gitk even on the 
full kernel history.

However, that crazy green bar chasing back-and-forth int he "reading" 
phase is really quite visually distracting. Maybe it looks better in 
Tk8.5, but it does look pretty annoying in the version I have. Can you 
tone that down a bit? 

But this has both the layout performance improvements and the fixes to 
only show selected files in the diff view by default, so I hope this gets 
merged into standard git soon..

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28  5:34 ` Linus Torvalds
@ 2007-10-28  7:11   ` Paul Mackerras
  2007-10-28  7:36     ` Steffen Prohaska
  2007-10-28 16:50     ` Linus Torvalds
  0 siblings, 2 replies; 57+ messages in thread
From: Paul Mackerras @ 2007-10-28  7:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds writes:

> However, that crazy green bar chasing back-and-forth int he "reading" 
> phase is really quite visually distracting. Maybe it looks better in 
> Tk8.5, but it does look pretty annoying in the version I have. Can you 
> tone that down a bit? 

Yeah.  Actually what I'd like is to know how many commits git log is
going to give me, so that I can do a normal progress bar whose length
is proportional to commits_read / total_commits.  With --topo-order
(or --date-order) it has to get to the last commit before it outputs
the first commit, doesn't it?  So could it print the total number of
commits on a line by itself at the start of its output?  (Presumably
it would need a --commit-count flag to enable that behaviour.)

Other than that, I could slow the progress bar down, or do a bar of
moving diagonal stripes, or something.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28  7:11   ` Paul Mackerras
@ 2007-10-28  7:36     ` Steffen Prohaska
  2007-10-28 16:50     ` Linus Torvalds
  1 sibling, 0 replies; 57+ messages in thread
From: Steffen Prohaska @ 2007-10-28  7:36 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, git

On Oct 28, 2007, at 8:11 AM, Paul Mackerras wrote:

> Linus Torvalds writes:
>
>> However, that crazy green bar chasing back-and-forth int he "reading"
>> phase is really quite visually distracting. Maybe it looks better in
>> Tk8.5, but it does look pretty annoying in the version I have. Can  
>> you
>> tone that down a bit?

I have the same impression.

> Yeah.  Actually what I'd like is to know how many commits git log is
> going to give me, so that I can do a normal progress bar whose length
> is proportional to commits_read / total_commits.

Can you use something like a rotating wheel, if the total size
of the task is unknown.

Or if you know an upper bound on the number of expected commits,
you could use this as total_commits. And adjust the upper
bound if more information becomes available.

Or you just print the number of commits already read and the
user is happy because something is changing.

	Steffen

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28  7:11   ` Paul Mackerras
  2007-10-28  7:36     ` Steffen Prohaska
@ 2007-10-28 16:50     ` Linus Torvalds
  2007-11-01 10:00       ` Paul Mackerras
  2007-11-01 11:37       ` Paul Mackerras
  1 sibling, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-10-28 16:50 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

On Sun, 28 Oct 2007, Paul Mackerras wrote:
>
> Yeah.  Actually what I'd like is to know how many commits git log is
> going to give me

That's not known until later.

> With --topo-order (or --date-order) it has to get to the last commit 
> before it outputs the first commit, doesn't it?

The cost is not generally in outputting the commits. The real cost is in 
traversing them in the first place. 

So yes, we could output the number of commits once we know it, but 
generally, by that time, it's not an interesting number any more! You 
might as well just read the list, because git is going to feed it to you 
as fast as it can (which is plenty fast - you'll probably get hundreds of 
megabytes of SHA1 values per second at that point).

So basically, by the time you start getting SHA1's from --topo-order, the 
best thing you can do is just lay back and think of England. The *last* 
thing you want to do is bother with any graphics and updates, because it's 
just going to slow things down.

It's before you even start getting the SHA1's, _or_ if you don't use 
"--date/topo-order" in the first place, that you want to have a "wait, I'm 
thinking" thing. And at neither time do you know how long it's going to 
be.

(And as mentioned many times earlier - if you can avoid topo-order and 
date-order entirely, you are going to perform a million times better at 
startup for the cold-cache case. Since you seem to be doing the graph 
layout lazily now, maybe you could aim for that some day? It does mean 
that you might - occasionally - end up having to add a commit to 
*before* one you already laid out).

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28  1:39 New features in gitk Paul Mackerras
  2007-10-28  5:34 ` Linus Torvalds
@ 2007-10-28 18:32 ` Pierre Habouzit
  2007-10-28 18:38   ` Mike Hommey
  2007-10-28 23:13   ` Paul Mackerras
  2007-10-29 13:30 ` Han-Wen Nienhuys
  2007-10-29 14:04 ` Michele Ballabio
  3 siblings, 2 replies; 57+ messages in thread
From: Pierre Habouzit @ 2007-10-28 18:32 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2629 bytes --]

On dim, oct 28, 2007 at 01:39:34 +0000, Paul Mackerras wrote:
> I just pulled the dev branch of gitk into the master branch, so the
> master branch now has the new features and improvements that I have
> been working on, namely: [...]

As you seem to be the guy to ask for, I've a couple of requests wrt
gitk.

  * the diff window is quite bad with merge commits, the colorization is
    rather poor, and the last version you just merged isn't especially
    better.

  * the 'sha1' input field is a major pain in the UI: the cut&paste
    interaction is very poor. I don't know why, but it's often very very
    hard to really copy the sha id, probably because it's selected by
    default.

  * hjkl in the history list do very very very curious things, whereas I
    would expect j/k to do the same as (resp) down/up. Note that in
    [Help->Key bindings] it's said it should work that way, but it
    doesn't here at least. A way to customize bindings would be much
    appreciated (I like vi bindings, and I miss ^U/^D, and ^E/^Y e.g.).

  * I really really really miss an option to ignore whitespaces in
    diffs, a small checkbox to view the full blown diff, or the one
    without spaces changes (-w -b) would be _really_ great.

  * the fact that it remembers the position where it was in the WM when
    it was closed is really annoying. the WM is supposed to place the
    window. With at least ion3 and xinerama it often shows up on the
    wrong screen. Remembering the window size though is fine.

  * wrt the layout, when the gitk window is resized, the resizing of the
    three columns (subjects, commiter, date) is really cumbersome. I
    would expect that the subject one would be the sole one to be
    resized.

  * still wrt the layout, the focus is quite cumbersome. Gitk would be
    really really really nice to be used only from the keyboard, but
    because of a very unclear focus policy, it really isn't for me.
    Maybe it's just me, and I know this may not be 100% helpful, but I
    never know which part of gitk will receive my keys (history part,
    diff part, tree, ...).

  * in the diff [lines of context] input, if you hit "down" it
    decrements the number of lines which is okay, but _also_ moves the
    selected history line which is not.


  This list may sound harsh, I hope not, I love gitk, it's one of the
10 git commands I use the most.

Cheers,
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28 18:32 ` Pierre Habouzit
@ 2007-10-28 18:38   ` Mike Hommey
  2007-10-28 23:13   ` Paul Mackerras
  1 sibling, 0 replies; 57+ messages in thread
From: Mike Hommey @ 2007-10-28 18:38 UTC (permalink / raw)
  To: Pierre Habouzit, Paul Mackerras, git

On Sun, Oct 28, 2007 at 07:32:16PM +0100, Pierre Habouzit wrote:
> On dim, oct 28, 2007 at 01:39:34 +0000, Paul Mackerras wrote:
> > I just pulled the dev branch of gitk into the master branch, so the
> > master branch now has the new features and improvements that I have
> > been working on, namely: [...]
> 
> As you seem to be the guy to ask for, I've a couple of requests wrt
> gitk.
(...)
    * When running gitk --all, it would be nice if the current branch
      was selected, instead of the topmost commit.

Mike

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28 18:32 ` Pierre Habouzit
  2007-10-28 18:38   ` Mike Hommey
@ 2007-10-28 23:13   ` Paul Mackerras
  2007-10-29  6:20     ` Pierre Habouzit
  2007-10-29  6:24     ` Pierre Habouzit
  1 sibling, 2 replies; 57+ messages in thread
From: Paul Mackerras @ 2007-10-28 23:13 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: git

Pierre Habouzit writes:

> As you seem to be the guy to ask for, I've a couple of requests wrt
> gitk.
> 
>   * the diff window is quite bad with merge commits, the colorization is
>     rather poor, and the last version you just merged isn't especially
>     better.

That's not a request, that's a grizzle. :)  What would you like it to
look like?

>   * the 'sha1' input field is a major pain in the UI: the cut&paste
>     interaction is very poor. I don't know why, but it's often very very
>     hard to really copy the sha id, probably because it's selected by
>     default.

It's selected so that the contents are in the cut buffer and you can
paste them in an xterm with middle-button.  Possibly I need to check
that control-C (or command-C under macos) is properly bound to copy.

>   * the fact that it remembers the position where it was in the WM when
>     it was closed is really annoying. the WM is supposed to place the
>     window. With at least ion3 and xinerama it often shows up on the
>     wrong screen. Remembering the window size though is fine.

That came in with some changes that make gitk start up correctly under
windows.  I could see about making it set the window position only
under windows.

>   * still wrt the layout, the focus is quite cumbersome. Gitk would be
>     really really really nice to be used only from the keyboard, but
>     because of a very unclear focus policy, it really isn't for me.
>     Maybe it's just me, and I know this may not be 100% helpful, but I
>     never know which part of gitk will receive my keys (history part,
>     diff part, tree, ...).

What focus policy would you like?

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28 23:13   ` Paul Mackerras
@ 2007-10-29  6:20     ` Pierre Habouzit
  2007-10-29  8:31       ` Jonathan del Strother
  2007-10-29  6:24     ` Pierre Habouzit
  1 sibling, 1 reply; 57+ messages in thread
From: Pierre Habouzit @ 2007-10-29  6:20 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3758 bytes --]

On Sun, Oct 28, 2007 at 11:13:43PM +0000, Paul Mackerras wrote:
> Pierre Habouzit writes:
> 
> > As you seem to be the guy to ask for, I've a couple of requests wrt
> > gitk.
> > 
> >   * the diff window is quite bad with merge commits, the colorization is
> >     rather poor, and the last version you just merged isn't especially
> >     better.
> 
> That's not a request, that's a grizzle. :)  What would you like it to
> look like?

  I believe that git show/diff has it right: lines with a + should be in
the "added" color, and lines with a '-' in the "removed" one. gitk only
take the first "column" of +/- into account or sth I find awkward at
best, and I often go to the console to see a merge commit because of
that.

> >   * the 'sha1' input field is a major pain in the UI: the cut&paste
> >     interaction is very poor. I don't know why, but it's often very very
> >     hard to really copy the sha id, probably because it's selected by
> >     default.
>
> It's selected so that the contents are in the cut buffer and you can
> paste them in an xterm with middle-button.  Possibly I need to check
> that control-C (or command-C under macos) is properly bound to copy.

  Well, doing ^C doesn't always copy it (probably a glitch wrt which
input has the focus), and it certainly doesn't synchronize with the cut
buffer for me. And it doesn't work for anyone at work either. I use ion
with the KDE clipboard manager (klipper -- because I never managed to
find a clipboard manager that is as good yet, not depending upon KDE),
and at work most people use KDE, with the same klipper. Maybe it's a bad
interaction, I should try to use it under gnome or so to see if it is.

> >   * the fact that it remembers the position where it was in the WM when
> >     it was closed is really annoying. the WM is supposed to place the
> >     window. With at least ion3 and xinerama it often shows up on the
> >     wrong screen. Remembering the window size though is fine.
> 
> That came in with some changes that make gitk start up correctly under
> windows.  I could see about making it set the window position only
> under windows.

  That'd be really great.

> >   * still wrt the layout, the focus is quite cumbersome. Gitk would be
> >     really really really nice to be used only from the keyboard, but
> >     because of a very unclear focus policy, it really isn't for me.
> >     Maybe it's just me, and I know this may not be 100% helpful, but I
> >     never know which part of gitk will receive my keys (history part,
> >     diff part, tree, ...).
>
> What focus policy would you like?

  Well, what would make sense (to _me_ at least) would be some shortcuts
to move to the history panel (say e.g. using F1), or to the diff view
(using e.g. F2), or in the file list (say F3).
  That would hilight with a black 1px line (like it does for other
inputs fields) to say that this is the primary window part that takes
the keyboard inputs atm. And when doing that, if you press 'down' or
'up' it would scroll the adequate panel. It's really confusing that the
keyboard (or hjkl) right now always make the history change.

  This way you can make the difference between the keyboard shortcuts
that apply to the focused part of the window (up/down, pgup/pgdown are
IMHO of that kind), and the one that the user (or the default gitk) has
associated to a specific part, no matter if it has the focus. E.g. J/K
(or for emacsish people ^N/^P) could always move the history, that would
make sense.

Cheers,
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28 23:13   ` Paul Mackerras
  2007-10-29  6:20     ` Pierre Habouzit
@ 2007-10-29  6:24     ` Pierre Habouzit
  1 sibling, 0 replies; 57+ messages in thread
From: Pierre Habouzit @ 2007-10-29  6:24 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 698 bytes --]

On dim, oct 28, 2007 at 11:13:43 +0000, Paul Mackerras wrote:
> Pierre Habouzit writes:
> 
> > As you seem to be the guy to ask for, I've a couple of requests wrt
> > gitk.
> > 
> >   * the diff window is quite bad with merge commits, the colorization is
> >     rather poor, and the last version you just merged isn't especially
> >     better.
> 
> That's not a request, that's a grizzle. :)

  Right, would have I known a single word of Tcl, I would have provided
patches for that long time ago btw :P

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-29  6:20     ` Pierre Habouzit
@ 2007-10-29  8:31       ` Jonathan del Strother
  0 siblings, 0 replies; 57+ messages in thread
From: Jonathan del Strother @ 2007-10-29  8:31 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Paul Mackerras, git


On 29 Oct 2007, at 06:20, Pierre Habouzit wrote:

> On Sun, Oct 28, 2007 at 11:13:43PM +0000, Paul Mackerras wrote:
>>>   * the 'sha1' input field is a major pain in the UI: the cut&paste
>>>     interaction is very poor. I don't know why, but it's often  
>>> very very
>>>     hard to really copy the sha id, probably because it's  
>>> selected by
>>>     default.
>>
>> It's selected so that the contents are in the cut buffer and you can
>> paste them in an xterm with middle-button.  Possibly I need to check
>> that control-C (or command-C under macos) is properly bound to copy.
>
>   Well, doing ^C doesn't always copy it (probably a glitch wrt which
> input has the focus), and it certainly doesn't synchronize with the  
> cut
> buffer for me. And it doesn't work for anyone at work either. I use  
> ion
> with the KDE clipboard manager (klipper -- because I never managed to
> find a clipboard manager that is as good yet, not depending upon KDE),
> and at work most people use KDE, with the same klipper. Maybe it's  
> a bad
> interaction, I should try to use it under gnome or so to see if it is.
>

FWIW, I have exactly the same problem under OS X.  I've never figured  
out a pattern that gives a guaranteed copy - I'll try playing around  
today and see what I can find.

Actually, while I'm here, gitk semi-regularly ignores ⌘Q, which  
ought to quit on OS X.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28  1:39 New features in gitk Paul Mackerras
  2007-10-28  5:34 ` Linus Torvalds
  2007-10-28 18:32 ` Pierre Habouzit
@ 2007-10-29 13:30 ` Han-Wen Nienhuys
  2007-10-29 14:04 ` Michele Ballabio
  3 siblings, 0 replies; 57+ messages in thread
From: Han-Wen Nienhuys @ 2007-10-29 13:30 UTC (permalink / raw)
  To: git

Paul Mackerras escreveu:
> I just pulled the dev branch of gitk into the master branch, so the
> master branch now has the new features and improvements that I have
> been working on, namely:
> 
> * The find and highlight functions have been combined into a single

sound extremely cool; I didn't know someone was working on it actively.

Can I misuse this thread to bring a ancient bug under your attention? 
It is affecting me regularly; see here for the report:

  http://article.gmane.org/gmane.comp.version-control.git/48789


-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28  1:39 New features in gitk Paul Mackerras
                   ` (2 preceding siblings ...)
  2007-10-29 13:30 ` Han-Wen Nienhuys
@ 2007-10-29 14:04 ` Michele Ballabio
  3 siblings, 0 replies; 57+ messages in thread
From: Michele Ballabio @ 2007-10-29 14:04 UTC (permalink / raw)
  To: git; +Cc: Paul Mackerras

On Sunday 28 October 2007, Paul Mackerras wrote:
> * Gitk now uses a new graph layout algorithm, which means it doesn't
>   have to generate the whole layout from top to bottom at startup
>   time, making startup faster.

This is probably causing gitk to eat my (admittedly old) CPU, sometimes.

For example, a

	gitk --all v1.5.2..v1.5.3

gives me problems when I scroll down to about half of the shown history:
that is when I reach the sequence of hundreds of "merge topic branch
into next" commits, and gitk tries hard to display the graph as best
as it can.

It can become unresponsive for one second at every PgDown.

Otherwise, gitk is way faster in other (non-pathological)
circumstances.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28 16:50     ` Linus Torvalds
@ 2007-11-01 10:00       ` Paul Mackerras
  2007-11-01 15:16         ` Linus Torvalds
  2007-11-01 11:37       ` Paul Mackerras
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mackerras @ 2007-11-01 10:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds writes:

> (And as mentioned many times earlier - if you can avoid topo-order and 
> date-order entirely, you are going to perform a million times better at 
> startup for the cold-cache case. Since you seem to be doing the graph 
> layout lazily now, maybe you could aim for that some day? It does mean 
> that you might - occasionally - end up having to add a commit to 
> *before* one you already laid out).

The other thing --topo-order does is reorder the commits so that
related commits come together.  So far, doing that in Tcl has turned
out to be much slower than having it done in C (within git log) for
the hot-cache case (which I expect is the common case).

I'm now thinking that the best approach would be to have gitk cache
the topology, and on startup only read in the part of the graph that
isn't in the cache.  Mostly that will be small and so git log should
be fast even in the cold-cache case with --topo-order.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-10-28 16:50     ` Linus Torvalds
  2007-11-01 10:00       ` Paul Mackerras
@ 2007-11-01 11:37       ` Paul Mackerras
  2007-11-01 15:47         ` Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mackerras @ 2007-11-01 11:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds writes:

> The cost is not generally in outputting the commits. The real cost is in 
> traversing them in the first place. 

Actually, the biggest cost is still gitk reading in the commits from
git log and doing the processing that gitk needs to do on each commit
(which I have tried to minimize, and is way smaller than it used to
be, but is still significant).

In fact that would go significantly faster if git log could output the
data for each commit in a slightly different format.  What would be
good is to get one header line for each commit in the form:

id flag {parent parent parent...} length

where:

id is the 40-char SHA1 for the commit
flag is normally 1, but is 0 for "boundary" commits, 2 for "left-side"
    commits (with --merge), or 3 for "right-side" commits
length is the number of characters of commit data that follow
    (which may differ from the number of bytes, so there would need
     to be agreement on the encoding)

followed by the body of the commit (with no null or other separator
character between commits).

That would be easier to parse in Tcl, and looks like it would knock
another 1.5 seconds off the gitk startup time (for the kernel
repository on my G5).

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-01 10:00       ` Paul Mackerras
@ 2007-11-01 15:16         ` Linus Torvalds
  2007-11-02 10:19           ` Paul Mackerras
  0 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-01 15:16 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git



On Thu, 1 Nov 2007, Paul Mackerras wrote:
> 
> The other thing --topo-order does is reorder the commits so that
> related commits come together.

If that's the only reason for using it, then please stop, and use 
"--first-parent" instead.

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-01 11:37       ` Paul Mackerras
@ 2007-11-01 15:47         ` Linus Torvalds
  2007-11-01 16:21           ` Linus Torvalds
  0 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-01 15:47 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git



On Thu, 1 Nov 2007, Paul Mackerras wrote:
>
> Linus Torvalds writes: 
> > The cost is not generally in outputting the commits. The real cost is in 
> > traversing them in the first place. 
> 
> Actually, the biggest cost is still gitk reading in the commits from
> git log and doing the processing that gitk needs to do on each commit
> (which I have tried to minimize, and is way smaller than it used to
> be, but is still significant).

Umm. I think you're basing all your timings on hot-cache numbers.

The hot-cache numbers are already pretty damn good. But try this:

	echo 3 > /proc/sys/vm/drop_caches
	gitk

on a big repository, _especially_ one that isn't totally packed, or on a 
machine with a slow laptop disk. Just following the commit history is 
really quite expensive.

THAT is the problem with --topo-order.

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-01 15:47         ` Linus Torvalds
@ 2007-11-01 16:21           ` Linus Torvalds
  0 siblings, 0 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-01 16:21 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

On Thu, 1 Nov 2007, Linus Torvalds wrote:
> 
> The hot-cache numbers are already pretty damn good. But try this:
> 
> 	echo 3 > /proc/sys/vm/drop_caches
> 	gitk

Actually, do the above with a path limiter, to make it more obvious.

So try this on the kernel, and you'll see the difference even with a fast 
disk, and even if it's fully packed:

	echo 3 > /proc/sys/vm/drop_caches
	time git rev-list HEAD drivers/scsi | head -10

and now try it with --topo-order.

I get ten seconds with --topo-order (because it has to traverse the 
*whole* history even to just generate the first ten lines), while the 
non-topo-order case is *three*times* faster.

On my laptop, it's even more noticeable. I don't know quite why, but the 
non-topo-order case is actually faster on my laptop than on my desktop 
(will have to see what's up, but I suspect it's a result of they being at 
different points in history, and just bad luck wrt the top-of-history 
having happened to change drivers/scsi or not):

	[torvalds@t40 linux]$ time git rev-list HEAD drivers/scsi | head -10
	..
	real    0m0.688s

but with --topo-order it's much slower:

	[torvalds@t40 linux]$ time git rev-list --topo-order HEAD drivers/scsi | head -10
	..
	real    0m17.458s

See? You shouldn't care about CPU usage, you should care about IO costs! 
Those are the first-order effects.

In other words, if you can be incremental, we're talking about performance 
differences that are orders-of-magnitude. Not ten percent or something 
piddling like that! And the performance improvemens come for the cases 
that are the _problem_, rather than the cases that already work perfectly 
well.

				Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-01 15:16         ` Linus Torvalds
@ 2007-11-02 10:19           ` Paul Mackerras
  2007-11-02 12:44             ` Marco Costalba
  2007-11-02 15:03             ` Linus Torvalds
  0 siblings, 2 replies; 57+ messages in thread
From: Paul Mackerras @ 2007-11-02 10:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds writes:

> If that's the only reason for using it, then please stop, and use 
> "--first-parent" instead.

How would that help?  That doesn't list about 2/3 of the commits at
all.

In any case, no that's not the only reason.  The main reason is that
it (i.e. --topo-order) spits out the commits in exactly the order that
gitk wants to display them (of which the bit about parents coming
after all their children is a part), and thus reduces the amount of
processing I need to do in Tcl.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-02 10:19           ` Paul Mackerras
@ 2007-11-02 12:44             ` Marco Costalba
  2007-11-02 15:42               ` Linus Torvalds
  2007-11-02 15:03             ` Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: Marco Costalba @ 2007-11-02 12:44 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, git

On 11/2/07, Paul Mackerras <paulus@samba.org> wrote:
>
> In any case, no that's not the only reason.  The main reason is that
> it (i.e. --topo-order) spits out the commits in exactly the order that
> gitk wants to display them (of which the bit about parents coming
> after all their children is a part), and thus reduces the amount of
> processing I need to do in Tcl.
>

I have tried to overcome --topo-order in qgit but I found it very
difficult, too much for me.

Lazily drawing the layout it doesn't mean that you lazy load the data
from git, indeed you load all the git-log output as soon as it
arrives.

And if the revisions arrive "in order", i.e. if revision A arrive
before revision B it means that A is NOT an ancestor of B, this is of
great help.

When drawing the graph assuming that the vector/list of the arrived
sha is already ordered greatly simplify the whole thing, if we relax
this hypothesis then a lot of work should be done before to draw a
graph chunk, essentially the GUI tool needs to walk the _entire_  list
and reorder it by itself _before_ to draw any graph chunk also if very
small.

So at the end you end up transferring the complete revision walk from
git-log to the GUI tool, and (this is the important thing) to be sure
graph is always correct you need to perform the walk _before_ drawing
any stuff.

The only possible _trick_ I was able to find is to optimistically draw
the graph chunk _assuming_ that it is ordered.

Then reorder the list in the background and finally check if the graph
is correct, if not redraw with correct data.

If the out of order revisions are rare you end up mimic a fast correct
drawing. If are not user will see some flickering at the end of the
load.

IMHO the above scheme is very complicated and fragile.

Just my two cents.

Marco

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-02 10:19           ` Paul Mackerras
  2007-11-02 12:44             ` Marco Costalba
@ 2007-11-02 15:03             ` Linus Torvalds
  1 sibling, 0 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-02 15:03 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

On Fri, 2 Nov 2007, Paul Mackerras wrote:
> 
> How would that help?  That doesn't list about 2/3 of the commits at
> all.

Yeah, you'd have to do all the parent processing on your own, I guess that 
would be too slow.

> In any case, no that's not the only reason.  The main reason is that
> it (i.e. --topo-order) spits out the commits in exactly the order that
> gitk wants to display them (of which the bit about parents coming
> after all their children is a part), and thus reduces the amount of
> processing I need to do in Tcl.

The thing is, you shouldn't *care* how long it takes to get 50,000+ 
commits.

You're only visualizing ~20 commits at a time. Ignore the rest.

THAT is the number that is timing-critical. You're optimizing for the 
wrong case - the "whole history" thing doesn't matter as much as the 
"recent history" does.

So I bet from a usability standpoint, you'd be *much* better off with 
something that might take ten times as long for the whole thing, if the 
first thirty lines show up in a third of the time.

In fact, if you want to really optimize parsing and that is a big issue, 
use

	git log --left-right --parents --pretty=format:"%m %H %P %an <%ae> %aD"

which will give you a single line per commit. I bet tk is good at reading 
single lines. Don't even read anythign else - until somebody actually 
*selects* the commit, at which point you do the diff *and* the full thing.

Yes, it will make things slower over-all. And no, the above won't work for 
the "search everywhere", which means that once people start searching for 
everything, you'll be screwed, but with somethign like the above, you'll 
get the first commits immediately and can start showing them.

And yes, if they come in the wrong order, you'll have to recalculate the 
display, but I thought you had something incremental already - ie you can 
always do it for just the current window of 100 commits or whatever.

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-02 12:44             ` Marco Costalba
@ 2007-11-02 15:42               ` Linus Torvalds
  2007-11-02 16:50                 ` Marco Costalba
                                   ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-02 15:42 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, git

On Fri, 2 Nov 2007, Marco Costalba wrote:
> 
> I have tried to overcome --topo-order in qgit but I found it very
> difficult, too much for me.
> 
> Lazily drawing the layout it doesn't mean that you lazy load the data
> from git, indeed you load all the git-log output as soon as it
> arrives.

Would it be more palatable if I tried to write some visualization-specific 
front-end that acted kind of like "git rev-list", but would have some way 
of "resetting" its output?

The thing is, I'm pretty sure I can feed you commits really quickly if I 
don't sort them, and if I don't do the full and careful "oops, this commit 
was reachable from a commit that was marked uninteresting", but while the 
fast-and-stupid approach will work well enough for most things, it will 
occasionally get the wrong answer.

But it will *notice* when it gets the wrong answer, though, and can reset 
and start over!

IOW, I might be able to do something that

 - prints out the commit info per line

 - prepends each line with a line number

 - goes back to an earlier line 'n' when it notices that it needs to 
   output a commit before a previous commit (or when it notices that a 
   commit that it had already output was actually not supposed to show up)

and with something like that, I could make git give you incremental 
output.

The thing is, any revision information that requires "global knowledge" 
simply cannot scale. And "git rev-list --topo-order" may be fast as hell, 
and I can do it in one second on the kernel archive on my machine, but 
that's really only true when it's all cached. 

If it's not cached, it will inevitably have to read in every single commit 
if you want a "final and unchanging ordering". Which inevitably gets you a 
really irritating startup latency. That's just fundamental.

On the other hand, if there is some way to say "oops, restart", I can 
optimistically give you a list that is always properly sorted on a *local* 
scale, but then based on later data I might notice that it wasn't right 
globally and that I need to re-do all or part of it.

But as mentioned, that requires that side-band data of "uhhuh, I screwed 
up, let me go back and fix it".

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-02 15:42               ` Linus Torvalds
@ 2007-11-02 16:50                 ` Marco Costalba
  2007-11-02 18:16                 ` Linus Torvalds
  2007-11-02 18:17                 ` New features in gitk Johannes Schindelin
  2 siblings, 0 replies; 57+ messages in thread
From: Marco Costalba @ 2007-11-02 16:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, git

On 11/2/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> But it will *notice* when it gets the wrong answer, though, and can reset
> and start over!
>
> IOW, I might be able to do something that
>
>  - prints out the commit info per line
>
>  - prepends each line with a line number
>
>  - goes back to an earlier line 'n' when it notices that it needs to
>    output a commit before a previous commit (or when it notices that a
>    commit that it had already output was actually not supposed to show up)
>
> and with something like that, I could make git give you incremental
> output.
>

Yes. That's would be easier to implement. Better yet do not give line
numbers I already push back each revision sha in a vector according to
arrival order. It's a stack like storing.

So would be nice if 'git log --restarting' would work like this:

- Output the normal stream of commits according to git log arguments.
No line numbers, no fancy additional stuff.

- If '--topo-order' or something similar was given git log checks if a
wrong output occurs, as example because it founds a revisions that
should have been already put out say 'n' revisions before the last
outputted one.

- In the above case git log outputs a side-band data of "uhhuh, I screwed
up, I restart from 'n' revisions before the last one outputted".

- Then ouput _again_ the stream starting from 'n' revisions earlier.
Note that not only the new offending revision is trasmitted but *all
the revisions* from the out of order one to the remaining.

Given a vector of 'k' arrived revisions, for me it's far easier simply
to flush the 'n' tail items in the sha vector and restart again then
_insert_ in the vector the new out of order one.

This is because parsing alghoritm is based on an 'append new stuff'
approach, not 'insert in the middle', so better flush all the tail
also if probably the big part of retrasmitted revisions would remain
the same.


Marco

P.S: The out of bound information should be commit data aligned and
could take advantage of the fact that an sha always starts with an
alphanumeric char value [0..9 a..f]

IOW instead of the commit sha this signal could write something like

'Restarting  from -12'

and parsing knows that an sha cannot start with an 'R'. Please note
that 'instead of the commit sha' it means in the _exact_ place where
sha is expected and this is not predefined but depends on the git-log
arguments, so that as example

$git log --with-restart

would output:

commit 6959893b0b65ebc68ce2fb524a8ec15a26ca4972
Merge: 452b800... d279fc1...
Author: Junio C Hamano <gitster@pobox.com>
Date:   Wed Oct 31 23:53:55 2007 -0700

    Merge branch 'sp/mergetool'

    * sp/mergetool:
      mergetool: avoid misleading message "Resetting to default..."
      mergetool: add support for ECMerge
      mergetool: use path to mergetool in config var mergetool.<tool>.path

commit Restart from -7
commit 3e4bb087a18435b12eb82116e93af2887578e816
Merge: 5fb1948... 136e631...
Author: Junio C Hamano <gitster@pobox.com>
Date:   Thu Nov 1 17:09:08 2007 -0700

    Merge branch 'maint'



while

$git log --with-restart --pretty=oneline

would output

6959893b0b65ebc68ce2fb524a8ec15a26ca4972 Merge branch 'sp/mergetool'
Restart from -7
3e4bb087a18435b12eb82116e93af2887578e816 Merge branch 'maint'
5fb19486e6f4b6d31f33f5a1eab970b244fa2d08 Merge branch 'bk/maint-cvsexportcommit'


In this way this side band info became compatible with _any_ git-log
output format as long as the format foreseens the output of the
revision sha.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-02 15:42               ` Linus Torvalds
  2007-11-02 16:50                 ` Marco Costalba
@ 2007-11-02 18:16                 ` Linus Torvalds
  2007-11-02 20:31                   ` [PATCH 0/2] History replay support Linus Torvalds
  2007-11-02 18:17                 ` New features in gitk Johannes Schindelin
  2 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-02 18:16 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, git



On Fri, 2 Nov 2007, Linus Torvalds wrote:
> 
> The thing is, I'm pretty sure I can feed you commits really quickly if I 
> don't sort them, and if I don't do the full and careful "oops, this commit 
> was reachable from a commit that was marked uninteresting", but while the 
> fast-and-stupid approach will work well enough for most things, it will 
> occasionally get the wrong answer.

Ok, I have a trial patch that doesn't do the replay yet, but at least 
notices when it outputs a parent that has already been shown.

In the whole git archive, it happens three times.

In the kernel archive, it happens twelve times.

I'll try to get it into some kind of usable shape, and send out 
experimental patches for people to play with.

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: New features in gitk
  2007-11-02 15:42               ` Linus Torvalds
  2007-11-02 16:50                 ` Marco Costalba
  2007-11-02 18:16                 ` Linus Torvalds
@ 2007-11-02 18:17                 ` Johannes Schindelin
  2 siblings, 0 replies; 57+ messages in thread
From: Johannes Schindelin @ 2007-11-02 18:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, Paul Mackerras, git

Hi,

On Fri, 2 Nov 2007, Linus Torvalds wrote:

> On Fri, 2 Nov 2007, Marco Costalba wrote:
> > 
> > I have tried to overcome --topo-order in qgit but I found it very 
> > difficult, too much for me.
> > 
> > Lazily drawing the layout it doesn't mean that you lazy load the data 
> > from git, indeed you load all the git-log output as soon as it 
> > arrives.
> 
> Would it be more palatable if I tried to write some 
> visualization-specific front-end that acted kind of like "git rev-list", 
> but would have some way of "resetting" its output?

Heh, Shawn and I were discussing this when we met in San Jose earlier this 
month.  The application we had in mind was a common backend for graphical 
representation of the commit graph, which could be used by git gui to show 
(part of) the history.  The ultimate goal was a graphical rebase -i.

I would have _loved_ to implement this.  Alas, as it appears my choice of 
job was less than brilliant, and even when I have some spare moments at 
the end of the day, I watch movies to forget the day, instead of 
implementing this fascinating and useful feature.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 0/2] History replay support
  2007-11-02 18:16                 ` Linus Torvalds
@ 2007-11-02 20:31                   ` Linus Torvalds
  2007-11-02 20:32                     ` [PATCH 1/2] Simplify topo-sort logic Linus Torvalds
                                       ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-02 20:31 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, git

Ok, this is a rough first draft of avoiding the topological sort up-front, 
and instead incrementally sorting only when actually necessary.

The first patch is a pure cleanup. Quite frankly, I suspect the old 
topological sorting code would have worked perfectly fine as-is, but I 
couldn't really bear to watch it or think about debugging it the way it 
was before.

The indirection and other strangeness just blew my tiny little mind, and 
while I bet it had some historical reason, I ended up reworking that 
thing. I tried to keep it as similar as humanly possible to the old code 
(because it's easy to get wrong, and because I really didn't want to worry 
about the toposort itself), but the numbers speak for themselves:

	 4 files changed, 55 insertions(+), 126 deletions(-)

should tell you something. And it means that there are no subtle calling 
issues with preconditions for using the toposort etc - you can use it over 
and over again, and it should all be obvious. Knock wood.

The second diff is the one that actually adds the new feature, and it 
undoes most of the nice statistics of the first one:

	 5 files changed, 70 insertions(+), 12 deletions(-)

but we still end up with more deletions than insertions on the whole, 
*and* a new feature.

The new code is triggered by using the "--replay" flag, which will cause 
certain consistency checks to be done when a commit is shown by the log 
machinery. In particular:

 - if we print out a parent SHA1, and the parent has already been shown, 
   that's a topology violation, and causes a replay.
 - if we turn a commit unintersting, and the commit has already been 
   shown, that's a "uninteresting" violation, and causes a replay.

The second one I didn't test at all. It's probably hard to trigger, and 
I bet there are bugs there, but this is very much a WIP patch, with the 
hope that people other than me can work on it.

When a replay happens, the log code will print out

	Replay <sha1>

for each <sha1> commit that gets invalidated by the replay, and then start 
*that* part of history anew, with just the required part re-sorted.

Should it print just the number of commits? Perhaps. Play around with 
this.

Anyway, to give you some kind of idea of the effect of this, in the 
current git tree as it exists in my repo, I get 15 replay events, and if 
you compare the total log output, you see:

	[torvalds@woody git]$ wc -l t1 t2
	  170191 t1
	  178150 t2

where "t1" is without replays, and "t2" is with replays. The replays 
obviously do add lines (the replayed ones), but at least for git, it's on 
the order of 5% lines replayed.

The big difference is in the latency:

	time git log --parents --topo-order | head
	real    0m0.163s

vs

	time git log --parents --replay | head
	real    0m0.003s

ie you can see how the --replay thing starts outputtig commits 
immediately, because it knows it can just back up and fix any errors that 
happen.

That's the good news.

The bad news is that it doesn't work well in this simplistic form, because 
there is a O(n**2) behaviour when replays *do* happen, ie we end up having 
replays within replays, and rather than getting a 5% increase like for git 
itself, for the kernel archive this gets a roughly 50x increase for the 
replay. So the *latency* still improves dramatically (getting the first 
one hundred commits in 0.006 seconds vs 1.076s), but because of the bad 
behaviour wrt cascading replays, it's not really usable in this form (the 
full log goes from 8 seconds to 22s when writing to /dev/null - and is 
much worse if the receiver actually has to do something about it).

I think the right thing to do wrt this all would be to turn the replay 
logic into a latency-based one, where it would basically batch things up, 
but make sure to output something at least every half a second or so. But 
that's a pretty separate set of logic, so I thought I'd send this out 
as-is for comments..

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 1/2] Simplify topo-sort logic
  2007-11-02 20:31                   ` [PATCH 0/2] History replay support Linus Torvalds
@ 2007-11-02 20:32                     ` Linus Torvalds
  2007-11-02 20:35                     ` [PATCH 2/2] Support "history replay" for git log commands Linus Torvalds
  2007-11-03  1:40                     ` [PATCH 0/2] History replay support Linus Torvalds
  2 siblings, 0 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-02 20:32 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, Git Mailing List


.. by not using quite so much indirection.

This currently grows the "struct commit" a bit, which could be avoided by 
using a union for "util" and "indegree" (the topo-sort used to use "util" 
anyway, so you cannot use them together), but for now the goal of this was 
to simplify, not optimize.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 commit.c   |  150 +++++++++++++++++++++---------------------------------------
 commit.h   |   20 +--------
 revision.c |    7 +--
 revision.h |    4 +-
 4 files changed, 55 insertions(+), 126 deletions(-)

diff --git a/commit.c b/commit.c
index ac24266..c155a49 100644
--- a/commit.c
+++ b/commit.c
@@ -9,22 +9,6 @@
 
 int save_commit_buffer = 1;
 
-struct sort_node
-{
-	/*
-	 * the number of children of the associated commit
-	 * that also occur in the list being sorted.
-	 */
-	unsigned int indegree;
-
-	/*
-	 * reference to original list item that we will re-use
-	 * on output.
-	 */
-	struct commit_list * list_item;
-
-};
-
 const char *commit_type = "commit";
 
 static struct cmt_fmt_map {
@@ -1150,69 +1134,38 @@ struct commit *pop_commit(struct commit_list **stack)
 	return item;
 }
 
-void topo_sort_default_setter(struct commit *c, void *data)
-{
-	c->util = data;
-}
-
-void *topo_sort_default_getter(struct commit *c)
-{
-	return c->util;
-}
-
 /*
  * Performs an in-place topological sort on the list supplied.
  */
 void sort_in_topological_order(struct commit_list ** list, int lifo)
 {
-	sort_in_topological_order_fn(list, lifo, topo_sort_default_setter,
-				     topo_sort_default_getter);
-}
-
-void sort_in_topological_order_fn(struct commit_list ** list, int lifo,
-				  topo_sort_set_fn_t setter,
-				  topo_sort_get_fn_t getter)
-{
-	struct commit_list * next = *list;
-	struct commit_list * work = NULL, **insert;
-	struct commit_list ** pptr = list;
-	struct sort_node * nodes;
-	struct sort_node * next_nodes;
-	int count = 0;
-
-	/* determine the size of the list */
-	while (next) {
-		next = next->next;
-		count++;
-	}
+	struct commit_list *next, *orig = *list;
+	struct commit_list *work, **insert;
+	struct commit_list **pptr;
 
-	if (!count)
+	if (!orig)
 		return;
-	/* allocate an array to help sort the list */
-	nodes = xcalloc(count, sizeof(*nodes));
-	/* link the list to the array */
-	next_nodes = nodes;
-	next=*list;
-	while (next) {
-		next_nodes->list_item = next;
-		setter(next->item, next_nodes);
-		next_nodes++;
-		next = next->next;
+	*list = NULL;
+
+	/* Mark them and clear the indegree */
+	for (next = orig; next; next = next->next) {
+		struct commit *commit = next->item;
+		commit->object.flags |= TOPOSORT;
+		commit->indegree = 0;
 	}
+
 	/* update the indegree */
-	next=*list;
-	while (next) {
+	for (next = orig; next; next = next->next) {
 		struct commit_list * parents = next->item->parents;
 		while (parents) {
-			struct commit * parent=parents->item;
-			struct sort_node * pn = (struct sort_node *) getter(parent);
+			struct commit *parent = parents->item;
 
-			if (pn)
-				pn->indegree++;
-			parents=parents->next;
+			if (parent->object.flags & TOPOSORT)
+				parent->indegree++;
+			parents = parents->next;
 		}
-		next=next->next;
 	}
+
 	/*
 	 * find the tips
 	 *
@@ -1220,55 +1173,56 @@ void sort_in_topological_order_fn(struct commit_list ** list, int lifo,
 	 *
 	 * the tips serve as a starting set for the work queue.
 	 */
-	next=*list;
+	work = NULL;
 	insert = &work;
-	while (next) {
-		struct sort_node * node = (struct sort_node *) getter(next->item);
+	for (next = orig; next; next = next->next) {
+		struct commit *commit = next->item;
 
-		if (node->indegree == 0) {
-			insert = &commit_list_insert(next->item, insert)->next;
-		}
-		next=next->next;
+		if (!commit->indegree)
+			insert = &commit_list_insert(commit, insert)->next;
 	}
 
 	/* process the list in topological order */
 	if (!lifo)
 		sort_by_date(&work);
+
+	pptr = list;
+	*list = NULL;
 	while (work) {
-		struct commit * work_item = pop_commit(&work);
-		struct sort_node * work_node = (struct sort_node *) getter(work_item);
-		struct commit_list * parents = work_item->parents;
+		struct commit *commit;
+		struct commit_list *parents, *work_item;
 
-		while (parents) {
-			struct commit * parent=parents->item;
-			struct sort_node * pn = (struct sort_node *) getter(parent);
-
-			if (pn) {
-				/*
-				 * parents are only enqueued for emission
-				 * when all their children have been emitted thereby
-				 * guaranteeing topological order.
-				 */
-				pn->indegree--;
-				if (!pn->indegree) {
-					if (!lifo)
-						insert_by_date(parent, &work);
-					else
-						commit_list_insert(parent, &work);
-				}
+		work_item = work;
+		work = work_item->next;
+		work_item->next = NULL;
+
+		commit = work_item->item;
+		for (parents = commit->parents; parents ; parents = parents->next) {
+			struct commit *parent=parents->item;
+
+			if (!(parent->object.flags & TOPOSORT))
+				continue;
+
+			/*
+			 * parents are only enqueued for emission
+			 * when all their children have been emitted thereby
+			 * guaranteeing topological order.
+			 */
+			if (!--parent->indegree) {
+				if (!lifo)
+					insert_by_date(parent, &work);
+				else
+					commit_list_insert(parent, &work);
 			}
-			parents=parents->next;
 		}
 		/*
 		 * work_item is a commit all of whose children
 		 * have already been emitted. we can emit it now.
 		 */
-		*pptr = work_node->list_item;
-		pptr = &(*pptr)->next;
-		*pptr = NULL;
-		setter(work_item, NULL);
+		commit->object.flags &= ~TOPOSORT;
+		*pptr = work_item;
+		pptr = &work_item->next;
 	}
-	free(nodes);
 }
 
 /* merge-base stuff */
diff --git a/commit.h b/commit.h
index b661503..7c71471 100644
--- a/commit.h
+++ b/commit.h
@@ -14,6 +14,7 @@ struct commit_list {
 struct commit {
 	struct object object;
 	void *util;
+	unsigned int indegree;
 	unsigned long date;
 	struct commit_list *parents;
 	struct tree *tree;
@@ -82,31 +83,12 @@ void clear_commit_marks(struct commit *commit, unsigned int mark);
 /*
  * Performs an in-place topological sort of list supplied.
  *
- * Pre-conditions for sort_in_topological_order:
- *   all commits in input list and all parents of those
- *   commits must have object.util == NULL
- *
- * Pre-conditions for sort_in_topological_order_fn:
- *   all commits in input list and all parents of those
- *   commits must have getter(commit) == NULL
- *
- * Post-conditions:
  *   invariant of resulting list is:
  *      a reachable from b => ord(b) < ord(a)
  *   in addition, when lifo == 0, commits on parallel tracks are
  *   sorted in the dates order.
  */
-
-typedef void (*topo_sort_set_fn_t)(struct commit*, void *data);
-typedef void* (*topo_sort_get_fn_t)(struct commit*);
-
-void topo_sort_default_setter(struct commit *c, void *data);
-void *topo_sort_default_getter(struct commit *c);
-
 void sort_in_topological_order(struct commit_list ** list, int lifo);
-void sort_in_topological_order_fn(struct commit_list ** list, int lifo,
-				  topo_sort_set_fn_t setter,
-				  topo_sort_get_fn_t getter);
 
 struct commit_graft {
 	unsigned char sha1[20];
diff --git a/revision.c b/revision.c
index e76da0d..e85b4af 100644
--- a/revision.c
+++ b/revision.c
@@ -677,9 +677,6 @@ void init_revisions(struct rev_info *revs, const char *prefix)
 	revs->prune_fn = NULL;
 	revs->prune_data = NULL;
 
-	revs->topo_setter = topo_sort_default_setter;
-	revs->topo_getter = topo_sort_default_getter;
-
 	revs->commit_format = CMIT_FMT_DEFAULT;
 
 	diff_setup(&revs->diffopt);
@@ -1303,9 +1300,7 @@ int prepare_revision_walk(struct rev_info *revs)
 		if (limit_list(revs) < 0)
 			return -1;
 	if (revs->topo_order)
-		sort_in_topological_order_fn(&revs->commits, revs->lifo,
-					     revs->topo_setter,
-					     revs->topo_getter);
+		sort_in_topological_order(&revs->commits, revs->lifo);
 	return 0;
 }
 
diff --git a/revision.h b/revision.h
index 98a0a8f..1f64576 100644
--- a/revision.h
+++ b/revision.h
@@ -10,6 +10,7 @@
 #define CHILD_SHOWN	(1u<<6)
 #define ADDED		(1u<<7)	/* Parents already parsed and added? */
 #define SYMMETRIC_LEFT	(1u<<8)
+#define TOPOSORT	(1u<<9)	/* In the active toposort list.. */
 
 struct rev_info;
 struct log_info;
@@ -96,9 +97,6 @@ struct rev_info {
 	struct diff_options diffopt;
 	struct diff_options pruning;
 
-	topo_sort_set_fn_t topo_setter;
-	topo_sort_get_fn_t topo_getter;
-
 	struct reflog_walk_info *reflog_info;
 };
 

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/2] Support "history replay" for git log commands
  2007-11-02 20:31                   ` [PATCH 0/2] History replay support Linus Torvalds
  2007-11-02 20:32                     ` [PATCH 1/2] Simplify topo-sort logic Linus Torvalds
@ 2007-11-02 20:35                     ` Linus Torvalds
  2007-11-02 21:05                       ` Junio C Hamano
  2007-11-03  1:40                     ` [PATCH 0/2] History replay support Linus Torvalds
  2 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-02 20:35 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, git


This notices if we aren't in topological order, and replays the history. 
Thus avoiding the need to sort history up front.
    
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

See the code and the more complete explanations in [PATCH 0/2]. In 
particular, see the last section there about the downsides of this: the 
50x expansion of output on the kernel is unacceptable, but if somebody can 
make a graphical viewer that can react correctly to the "Replay" thing, 
I'm sure I can make the replays themselves happen much more rarely.

 builtin-blame.c |    2 +-
 builtin-log.c   |   35 +++++++++++++++++++++++++++++++++++
 log-tree.c      |   10 +++++++---
 revision.c      |   29 ++++++++++++++++++++++-------
 revision.h      |    6 +++++-
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 8432b82..7b6af8c 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1502,7 +1502,7 @@ static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)
 		else {
 			commit->object.flags |= UNINTERESTING;
 			if (commit->object.parsed)
-				mark_parents_uninteresting(commit);
+				mark_parents_uninteresting(revs, commit);
 		}
 		/* treat root commit as boundary */
 		if (!commit->parents && !show_root)
diff --git a/builtin-log.c b/builtin-log.c
index e8b982d..10e0821 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -77,6 +77,35 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
 	}
 }
 
+static void replay_history(struct rev_info *revs)
+{
+	struct commit_list *entry;
+
+	revs->trigger_replay = 0;
+	while ((entry = revs->shown) != NULL) {
+		struct commit *commit = entry->item;
+		unsigned flags = commit->object.flags;
+
+		/* Undo the SHOWN and FORCE_REPLAY bits */
+		commit->object.flags = flags & ~(SHOWN | FORCE_REPLAY);
+		commit->indegree = 0;
+
+		/* Remove it from the shown list, put it on the commit list */
+		revs->shown = entry->next;
+		entry->next = revs->commits;
+		revs->commits = entry;
+
+		printf("Replay %s\n", sha1_to_hex(commit->object.sha1));
+
+		/* Was this the one that caused us to replay? */
+		if (flags & FORCE_REPLAY)
+			break;
+	}
+
+	/* Ok, sort the list to be replayed properly now.. */
+	sort_in_topological_order(&revs->commits, revs->lifo);
+}
+
 static int cmd_log_walk(struct rev_info *rev)
 {
 	struct commit *commit;
@@ -84,6 +113,12 @@ static int cmd_log_walk(struct rev_info *rev)
 	prepare_revision_walk(rev);
 	while ((commit = get_revision(rev)) != NULL) {
 		log_tree_commit(rev, commit);
+
+		if (rev->replay_history) {
+			if (rev->trigger_replay)
+				replay_history(rev);
+			continue;
+		}
 		if (!rev->reflog_info) {
 			/* we allow cycles in reflog ancestry */
 			free(commit->buffer);
diff --git a/log-tree.c b/log-tree.c
index 3763ce9..ce9b887 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -6,12 +6,16 @@
 
 struct decoration name_decoration = { "object names" };
 
-static void show_parents(struct commit *commit, int abbrev)
+static void show_parents(struct rev_info *opt, struct commit *commit, int abbrev)
 {
 	struct commit_list *p;
 	for (p = commit->parents; p ; p = p->next) {
 		struct commit *parent = p->item;
 		printf(" %s", diff_unique_abbrev(parent->object.sha1, abbrev));
+		if (parent->object.flags & SHOWN) {
+			opt->trigger_replay = 1;
+			parent->object.flags |= FORCE_REPLAY;
+		}
 	}
 }
 
@@ -147,7 +151,7 @@ void show_log(struct rev_info *opt, const char *sep)
 		}
 		fputs(diff_unique_abbrev(commit->object.sha1, abbrev_commit), stdout);
 		if (opt->parents)
-			show_parents(commit, abbrev_commit);
+			show_parents(opt, commit, abbrev_commit);
 		show_decorations(commit);
 		putchar(opt->diffopt.line_termination);
 		return;
@@ -248,7 +252,7 @@ void show_log(struct rev_info *opt, const char *sep)
 		fputs(diff_unique_abbrev(commit->object.sha1, abbrev_commit),
 		      stdout);
 		if (opt->parents)
-			show_parents(commit, abbrev_commit);
+			show_parents(opt, commit, abbrev_commit);
 		if (parent)
 			printf(" (from %s)",
 			       diff_unique_abbrev(parent->object.sha1,
diff --git a/revision.c b/revision.c
index e85b4af..e40bc1c 100644
--- a/revision.c
+++ b/revision.c
@@ -79,7 +79,7 @@ void mark_tree_uninteresting(struct tree *tree)
 	tree->buffer = NULL;
 }
 
-void mark_parents_uninteresting(struct commit *commit)
+void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit)
 {
 	struct commit_list *parents = commit->parents;
 
@@ -87,6 +87,10 @@ void mark_parents_uninteresting(struct commit *commit)
 		struct commit *commit = parents->item;
 		if (!(commit->object.flags & UNINTERESTING)) {
 			commit->object.flags |= UNINTERESTING;
+			if (commit->object.flags & SHOWN) {
+				revs->trigger_replay = 1;
+				commit->object.flags |= FORCE_REPLAY;
+			}
 
 			/*
 			 * Normally we haven't parsed the parent
@@ -97,7 +101,7 @@ void mark_parents_uninteresting(struct commit *commit)
 			 * to mark its parents recursively too..
 			 */
 			if (commit->parents)
-				mark_parents_uninteresting(commit);
+				mark_parents_uninteresting(revs, commit);
 		}
 
 		/*
@@ -167,8 +171,9 @@ static struct commit *handle_commit(struct rev_info *revs, struct object *object
 			die("unable to parse commit %s", name);
 		if (flags & UNINTERESTING) {
 			commit->object.flags |= UNINTERESTING;
-			mark_parents_uninteresting(commit);
-			revs->limited = 1;
+			mark_parents_uninteresting(revs, commit);
+			if (!revs->replay_history)
+				revs->limited = 1;
 		}
 		return commit;
 	}
@@ -399,7 +404,7 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit, str
 				return -1;
 			p->object.flags |= UNINTERESTING;
 			if (p->parents)
-				mark_parents_uninteresting(p);
+				mark_parents_uninteresting(revs, p);
 			if (p->object.flags & SEEN)
 				continue;
 			p->object.flags |= SEEN;
@@ -542,7 +547,7 @@ static int limit_list(struct rev_info *revs)
 		if (add_parents_to_list(revs, commit, &list) < 0)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
-			mark_parents_uninteresting(commit);
+			mark_parents_uninteresting(revs, commit);
 			if (everybody_uninteresting(list))
 				break;
 			continue;
@@ -995,6 +1000,10 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				revs->parents = 1;
 				continue;
 			}
+			if (!strcmp(arg, "--replay")) {
+				revs->replay_history = 1;
+				continue;
+			}
 			if (!strcmp(arg, "--dense")) {
 				revs->dense = 1;
 				continue;
@@ -1386,7 +1395,13 @@ static struct commit *get_revision_1(struct rev_info *revs)
 		struct commit *commit = entry->item;
 
 		revs->commits = entry->next;
-		free(entry);
+
+		/* Are we going to potentially replay? */
+		if (revs->replay_history) {
+			entry->next = revs->shown;
+			revs->shown = entry;
+		} else
+			free(entry);
 
 		if (revs->reflog_info)
 			fake_reflog_parent(revs->reflog_info, commit);
diff --git a/revision.h b/revision.h
index 1f64576..75b320d 100644
--- a/revision.h
+++ b/revision.h
@@ -11,6 +11,7 @@
 #define ADDED		(1u<<7)	/* Parents already parsed and added? */
 #define SYMMETRIC_LEFT	(1u<<8)
 #define TOPOSORT	(1u<<9)	/* In the active toposort list.. */
+#define FORCE_REPLAY	(1u<<10)	/* This commit was wrong somehow */
 
 struct rev_info;
 struct log_info;
@@ -20,6 +21,7 @@ typedef void (prune_fn_t)(struct rev_info *revs, struct commit *commit);
 struct rev_info {
 	/* Starting list */
 	struct commit_list *commits;
+	struct commit_list *shown;
 	struct object_array pending;
 
 	/* Parents of shown commits */
@@ -36,6 +38,8 @@ struct rev_info {
 			no_walk:1,
 			remove_empty_trees:1,
 			simplify_history:1,
+			replay_history:1,
+			trigger_replay:1,
 			lifo:1,
 			topo_order:1,
 			tag_objects:1,
@@ -113,7 +117,7 @@ extern int handle_revision_arg(const char *arg, struct rev_info *revs,int flags,
 extern int prepare_revision_walk(struct rev_info *revs);
 extern struct commit *get_revision(struct rev_info *revs);
 
-extern void mark_parents_uninteresting(struct commit *commit);
+extern void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
 extern void mark_tree_uninteresting(struct tree *tree);
 
 struct name_path {

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] Support "history replay" for git log commands
  2007-11-02 20:35                     ` [PATCH 2/2] Support "history replay" for git log commands Linus Torvalds
@ 2007-11-02 21:05                       ` Junio C Hamano
  2007-11-02 21:17                         ` Linus Torvalds
  0 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2007-11-02 21:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, Paul Mackerras, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This notices if we aren't in topological order, and replays the history. 
> Thus avoiding the need to sort history up front.
>     
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>
> See the code and the more complete explanations in [PATCH 0/2]. In 
> particular, see the last section there about the downsides of this: the 
> 50x expansion of output on the kernel is unacceptable, but if somebody can 
> make a graphical viewer that can react correctly to the "Replay" thing, 
> I'm sure I can make the replays themselves happen much more rarely.
>
>  builtin-blame.c |    2 +-
>  builtin-log.c   |   35 +++++++++++++++++++++++++++++++++++
>  log-tree.c      |   10 +++++++---
>  revision.c      |   29 ++++++++++++++++++++++-------
>  revision.h      |    6 +++++-
>  5 files changed, 70 insertions(+), 12 deletions(-)
>
> diff --git a/builtin-blame.c b/builtin-blame.c
> index 8432b82..7b6af8c 100644
> --- a/builtin-blame.c
> +++ b/builtin-blame.c
> @@ -1502,7 +1502,7 @@ static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)
>  		else {
>  			commit->object.flags |= UNINTERESTING;
>  			if (commit->object.parsed)
> -				mark_parents_uninteresting(commit);
> +				mark_parents_uninteresting(revs, commit);
>  		}
>  		/* treat root commit as boundary */
>  		if (!commit->parents && !show_root)
> diff --git a/builtin-log.c b/builtin-log.c
> index e8b982d..10e0821 100644
> --- a/builtin-log.c
> +++ b/builtin-log.c
> @@ -77,6 +77,35 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
>  	}
>  }
>  
> +static void replay_history(struct rev_info *revs)
> +{
> +	struct commit_list *entry;
> +
> +	revs->trigger_replay = 0;
> +	while ((entry = revs->shown) != NULL) {
> +		struct commit *commit = entry->item;
> +		unsigned flags = commit->object.flags;
> +
> +		/* Undo the SHOWN and FORCE_REPLAY bits */
> +		commit->object.flags = flags & ~(SHOWN | FORCE_REPLAY);
> +		commit->indegree = 0;
> +
> +		/* Remove it from the shown list, put it on the commit list */
> +		revs->shown = entry->next;
> +		entry->next = revs->commits;
> +		revs->commits = entry;
> +
> +		printf("Replay %s\n", sha1_to_hex(commit->object.sha1));
> +
> +		/* Was this the one that caused us to replay? */
> +		if (flags & FORCE_REPLAY)
> +			break;

Can one iteration of loop in log_tree_commit() smudge more
than one commits with FORCE_REPLAY?  Maybe make the
trigger_replay a counter and count down in this loop until we
find that many commits that have the FORCE_REPLAY flag?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] Support "history replay" for git log commands
  2007-11-02 21:05                       ` Junio C Hamano
@ 2007-11-02 21:17                         ` Linus Torvalds
  0 siblings, 0 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-02 21:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, Paul Mackerras, git



On Fri, 2 Nov 2007, Junio C Hamano wrote:
> 
> Can one iteration of loop in log_tree_commit() smudge more
> than one commits with FORCE_REPLAY?  Maybe make the
> trigger_replay a counter and count down in this loop until we
> find that many commits that have the FORCE_REPLAY flag?

Good point, and yes, that sounds like the right thing to do.

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/2] History replay support
  2007-11-02 20:31                   ` [PATCH 0/2] History replay support Linus Torvalds
  2007-11-02 20:32                     ` [PATCH 1/2] Simplify topo-sort logic Linus Torvalds
  2007-11-02 20:35                     ` [PATCH 2/2] Support "history replay" for git log commands Linus Torvalds
@ 2007-11-03  1:40                     ` Linus Torvalds
  2007-11-03  7:56                       ` Marco Costalba
                                         ` (2 more replies)
  2 siblings, 3 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-03  1:40 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, git



On Fri, 2 Nov 2007, Linus Torvalds wrote:
> 
> The bad news is that it doesn't work well in this simplistic form, because 
> there is a O(n**2) behaviour when replays *do* happen, ie we end up having 
> replays within replays [..]

Gaah. the more I look at this, the more I think the topo sort should be 
done at the visualization side.

It's really quite cheap to do the topo sort, *and* it's really quite cheap 
to do the tests that trigger the topo sort, but what's expensive is to 
re-output all the data again!

The silly thing is, I think I've come up with an "almost optimal" 
solution, but it's so ugly that I'm a bit ashamed of it.

That almost optimal solution is simply:
 - get the first <n> (say: 100) commits, and topo-sort just them. Feed it 
   to the visualizer.
 - the visualizer will now have enough to work with in order to show the 
   starting screen and set the cursor to the hourglass or whatever the 
   "wait for it" thing is.
 - get the rest of the commits at our normal leisurely pace (whether it 
   is one second of 17).
 - output the total number of commits (so that the visualizer can re-size 
   the slider and/or allocate some big array just once), topo-sort it all, 
   and output the full thing.

It's disgusting. But it avoids the unnecessary data transfer - except for 
just the first 100 commits that get sent twice. And it gets what *I* look 
for, namely that "immediate" feel to the initial pop-up of the history.

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/2] History replay support
  2007-11-03  1:40                     ` [PATCH 0/2] History replay support Linus Torvalds
@ 2007-11-03  7:56                       ` Marco Costalba
  2007-11-03 18:11                       ` [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use Linus Torvalds
  2007-11-04  0:32                       ` [PATCH 0/2] History replay support Paul Mackerras
  2 siblings, 0 replies; 57+ messages in thread
From: Marco Costalba @ 2007-11-03  7:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, git

On 11/3/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Fri, 2 Nov 2007, Linus Torvalds wrote:
> >
> > The bad news is that it doesn't work well in this simplistic form, because
> > there is a O(n**2) behaviour when replays *do* happen, ie we end up having
> > replays within replays [..]
>
> Gaah. the more I look at this, the more I think the topo sort should be
> done at the visualization side.
>
> It's really quite cheap to do the topo sort, *and* it's really quite cheap
> to do the tests that trigger the topo sort, but what's expensive is to
> re-output all the data again!
>
> The silly thing is, I think I've come up with an "almost optimal"
> solution, but it's so ugly that I'm a bit ashamed of it.
>
> That almost optimal solution is simply:
>  - get the first <n> (say: 100) commits, and topo-sort just them. Feed it
>    to the visualizer.
>  - the visualizer will now have enough to work with in order to show the
>    starting screen and set the cursor to the hourglass or whatever the
>    "wait for it" thing is.
>  - get the rest of the commits at our normal leisurely pace (whether it
>    is one second of 17).
>  - output the total number of commits (so that the visualizer can re-size
>    the slider and/or allocate some big array just once), topo-sort it all,
>    and output the full thing.
>
> It's disgusting. But it avoids the unnecessary data transfer - except for
> just the first 100 commits that get sent twice. And it gets what *I* look
> for, namely that "immediate" feel to the initial pop-up of the history.
>

It's not disgusting is human perception oriented !

All this stuff is not needed to get the sha faster, but to let think
the user that are faster. It's for strictly human consumption, so I
would say your "ugly" solution is the best for me.

A bunch of revisions, just to let user eyes to re-focus on new stuff
(and some hundredths of milliseconds are already elapsed after this)
while in the background the real, shadowed, work goes on.

It's also easy on the client GUI side, simply discard all and reload
as soon _correct_ data arrives.

So the new option could became:

git log --fast-output 100 500 --topo-order <...whatever...>

where git log outputs as soon as it can 100 commits and feeds it to
the visualizer. If the _normal_ commits are still not ready after 500
ms are elapsed then git log spits out another 100 commits chunk and so
on at 500ms intervals until good commits are ready. Then outputs the
full thing.

It is very user perception oriented, but hey, so is a GUI!

Marco

P.S: A little optimization for small repositories would be that git
log *waits* at maximum 500ms before to output the first 100 commits
chunk, so that in case of small repos (thousands of revisions) or in
case of warmed up cache the commits in output are already the good
ones, no need for fakes!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
  2007-11-03  1:40                     ` [PATCH 0/2] History replay support Linus Torvalds
  2007-11-03  7:56                       ` Marco Costalba
@ 2007-11-03 18:11                       ` Linus Torvalds
  2007-11-03 19:52                         ` Marco Costalba
  2007-11-04  3:06                         ` Paul Mackerras
  2007-11-04  0:32                       ` [PATCH 0/2] History replay support Paul Mackerras
  2 siblings, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-03 18:11 UTC (permalink / raw)
  To: Marco Costalba, Junio C Hamano; +Cc: Paul Mackerras, Git Mailing List

This adds support for "--early-output[=n]" as a flag to the "git log"
family of commands.  This allows GUI programs to state that they want to
get some output early, in order to be able to show at least something
quickly, even if the full output may take longer to generate.

If no count is specified, a default count of a hundred commits will be
used, although the actual numbr of commits output may be smaller
depending on how many commits were actually found in the first tenth of
a second (or if *everything* was found before that, in which case no
early output will be provided, and only the final list is made
available).

When the full list is generated, there will be a "Final output:" string
prepended to it, regardless of whether any early commits were shown or
not, so that the consumer can always know the difference between early
output and the final list.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This replaces my 2/2 patch from yesterday, but still relies on the 
topo-sorting cleanups in 1/2 (which are really totally independent, and 
probably could go in regardless of any of this series).

Instead of having a generic replay mechanism, this one just does *one* 
replay, and discards even that if it can generate all the commits really 
quickly.

The timeout right now is set to a pretty aggressive one-tenth-of-a- 
second, and realistically that could/should probably be longer, but making 
it short makes sure that we actually trigger this in normal use, so I 
think this is the right approach for the initial implementation.

Try it out, with

	git log --early-output=2

and look at what happens: if it cannot generate all the commits in the 
first 0.1 seconds, it will take what it *could* generate, sort it 
topologically (you can add "--date-order" if you want), and show the first 
two commits.

When it then generates the rest, it will add a "Final output:" line, and 
then re-generate it all.

The intent is for a GUI front-end to be able to use the first one-hundred 
or so commits to populate the commit log, and show the window, and not 
have to worry about the fact that the rest of the data (how-ever many 
hundred thousand commits there are) may take much more time to arrive.

So not only is this patch pretty straightforward in itself, I think usage 
should be pretty straightforward too.

 builtin-log.c |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 revision.c    |   22 ++++++++++++++++++
 revision.h    |    4 +++
 3 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index e8b982d..707add2 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -77,11 +77,78 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
 	}
 }

+static void log_show_early(struct rev_info *revs, struct commit_list *list)
+{
+	int i = revs->early_output;
+
+	sort_in_topological_order(&list, revs->lifo);
+	while (list && i) {
+		struct commit *commit = list->item;
+		log_tree_commit(revs, commit);
+		list = list->next;
+		i--;
+	}
+}
+
+static void early_output(int signal)
+{
+	show_early_output = log_show_early;
+}
+
+static void setup_early_output(struct rev_info *rev)
+{
+	struct sigaction sa;
+	struct itimerval v;
+
+	/*
+	 * Set up the signal handler, minimally intrusively:
+	 * we only set a single volatile integer word (not
+	 * using sigatomic_t - trying to avoid unnecessary
+	 * system dependencies and headers), and using
+	 * SA_RESTART.
+	 */
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = early_output;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_RESTART;
+	sigaction(SIGALRM, &sa, NULL);
+
+	/*
+	 * If we can get the whole output in less than a
+	 * tenth of a second, don't even bother doing the
+	 * early-output thing..
+	 *
+	 * This is a one-time-only trigger.
+	 */
+	memset(&v, 0, sizeof(v));
+	v.it_value.tv_sec = 0;
+	v.it_value.tv_usec = 100000;
+	setitimer(ITIMER_REAL, &v, NULL);
+}
+
+static void finish_early_output(struct rev_info *rev)
+{
+	signal(SIGALRM, SIG_IGN);
+	if (rev->shown_one) {
+		rev->shown_one = 0;
+		if (rev->commit_format != CMIT_FMT_ONELINE)
+			putchar(rev->diffopt.line_termination);
+	}
+	printf("Final output:\n");
+}
+
 static int cmd_log_walk(struct rev_info *rev)
 {
 	struct commit *commit;

+	if (rev->early_output)
+		setup_early_output(rev);
+
 	prepare_revision_walk(rev);
+
+	if (rev->early_output)
+		finish_early_output(rev);
+
 	while ((commit = get_revision(rev)) != NULL) {
 		log_tree_commit(rev, commit);
 		if (!rev->reflog_info) {
diff --git a/revision.c b/revision.c
index e85b4af..26610bb 100644
--- a/revision.c
+++ b/revision.c
@@ -10,6 +10,8 @@
 #include "reflog-walk.h"
 #include "patch-ids.h"

+volatile show_early_output_fn_t show_early_output;
+
 static char *path_name(struct name_path *path, const char *name)
 {
 	struct name_path *p;
@@ -533,6 +535,7 @@ static int limit_list(struct rev_info *revs)
 		struct commit_list *entry = list;
 		struct commit *commit = list->item;
 		struct object *obj = &commit->object;
+		show_early_output_fn_t show;

 		list = list->next;
 		free(entry);
@@ -550,6 +553,13 @@ static int limit_list(struct rev_info *revs)
 		if (revs->min_age != -1 && (commit->date > revs->min_age))
 			continue;
 		p = &commit_list_insert(commit, p)->next;
+
+		show = show_early_output;
+		if (!show)
+			continue;
+
+		show(revs, newlist);
+		show_early_output = NULL;
 	}
 	if (revs->cherry_pick)
 		cherry_pick_list(newlist, revs);
@@ -991,6 +1001,18 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				revs->topo_order = 1;
 				continue;
 			}
+			if (!prefixcmp(arg, "--early-output")) {
+				int count = 100;
+				switch (arg[14]) {
+				case '=':
+					count = atoi(arg+15);
+					/* Fallthrough */
+				case 0:
+					revs->topo_order = 1;
+					revs->early_output = count;
+					continue;
+				}
+			}
 			if (!strcmp(arg, "--parents")) {
 				revs->parents = 1;
 				continue;
diff --git a/revision.h b/revision.h
index 1f64576..d8a5a50 100644
--- a/revision.h
+++ b/revision.h
@@ -30,6 +30,8 @@ struct rev_info {
 	void *prune_data;
 	prune_fn_t *prune_fn;

+	unsigned int early_output;
+
 	/* Traversal flags */
 	unsigned int	dense:1,
 			no_merges:1,
@@ -105,6 +107,8 @@ struct rev_info {
 #define REV_TREE_DIFFERENT	2

 /* revision.c */
+typedef void (*show_early_output_fn_t)(struct rev_info *, struct commit_list *);
+volatile show_early_output_fn_t show_early_output;

 extern void init_revisions(struct rev_info *revs, const char *prefix);
 extern int setup_revisions(int argc, const char **argv, struct rev_info *revs, const char *def);

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
  2007-11-03 18:11                       ` [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use Linus Torvalds
@ 2007-11-03 19:52                         ` Marco Costalba
  2007-11-04  3:06                         ` Paul Mackerras
  1 sibling, 0 replies; 57+ messages in thread
From: Marco Costalba @ 2007-11-03 19:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Paul Mackerras, Git Mailing List

On 11/3/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> Try it out, with
>
>         git log --early-output=2
>
> and look at what happens

It works for me! tomorrow I will try to teach qgit to play with this new toy.

BTW there are some warning around that disappear adding

extern void sort_in_topological_order(struct commit_list ** list, int lifo);

somewhere in commit.h

Marco

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/2] History replay support
  2007-11-03  1:40                     ` [PATCH 0/2] History replay support Linus Torvalds
  2007-11-03  7:56                       ` Marco Costalba
  2007-11-03 18:11                       ` [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use Linus Torvalds
@ 2007-11-04  0:32                       ` Paul Mackerras
  2 siblings, 0 replies; 57+ messages in thread
From: Paul Mackerras @ 2007-11-04  0:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, git

Linus Torvalds writes:

> It's disgusting. But it avoids the unnecessary data transfer - except for 
> just the first 100 commits that get sent twice. And it gets what *I* look 
> for, namely that "immediate" feel to the initial pop-up of the history.

Yes.  And it avoids the need for gitk to have to do any re-ordering of
the commits that it gets from git log.  In the case where the first
commits come out in a different order in the final list, gitk can just
truncate its list at the point of difference and re-read from there,
which is a lot less time-consuming than having to make a decision
for every commit about where it should go in the list.

I have actually been trying to come up with a decent way to generate
the list incrementally without using --topo-order for months now.  One
of the problems is that while Tcl can append things to the end of a
long list efficiently, and can index long lists efficiently, inserting
things into the middle of a long list is slow.  That makes any
insertion-sort type of algorithm unbearably slow.  And it's not just
when we get parents before children that I have to insert into the
middle of the list -- reordering a date-order list into one where we
see the string of commits that were merged immediately after a merge
commit requires a lot of insertions.

I do have an approach that I'm thinking about that builds up the list
as a series of "arcs", each of which is a string of commits with
exactly one parent and one child.  Each new commit from git log
(without --topo-order) then either gets appended to an arc (there can
be several arcs that are still growing at any given point), or it is a
merge or a branch point, which means it terminates some arc(s) and/or
starts new arc(s).  The final list of commits is then composed by
concatenating the arcs in a suitable order.  That avoids doing
insertions but does make things more complex for the consumers of the
list (the layout algorithm and the search function), and it
potentially makes it much slower to go from a row number to the commit
displayed on that row.

However, for now I think that using --early-output is a good
compromise solution that makes the startup much faster (or at least
*appear* to be much faster :) without adding much complexity.

Regards,
Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
  2007-11-03 18:11                       ` [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use Linus Torvalds
  2007-11-03 19:52                         ` Marco Costalba
@ 2007-11-04  3:06                         ` Paul Mackerras
  2007-11-04  5:38                           ` Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mackerras @ 2007-11-04  3:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List

Linus Torvalds writes:

> When the full list is generated, there will be a "Final output:" string
> prepended to it, regardless of whether any early commits were shown or
> not, so that the consumer can always know the difference between early
> output and the final list.

How hard would it be to put the total number of commits on that "Final
output" line?  That would be useful for me.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
  2007-11-04  3:06                         ` Paul Mackerras
@ 2007-11-04  5:38                           ` Linus Torvalds
  2007-11-04  7:10                             ` Paul Mackerras
  2007-11-04 18:11                             ` Linus Torvalds
  0 siblings, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-04  5:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List

On Sun, 4 Nov 2007, Paul Mackerras wrote:
>
> Linus Torvalds writes:
> 
> > When the full list is generated, there will be a "Final output:" string
> > prepended to it, regardless of whether any early commits were shown or
> > not, so that the consumer can always know the difference between early
> > output and the final list.
> 
> How hard would it be to put the total number of commits on that "Final
> output" line?  That would be useful for me.

Not hard. I think we basically have it anyway. The reason I didn't do it 
is that there's actually multiple numbers: there's the number of primary 
("interesting") commits, and then there are the "others", ie the edge 
things etc. So the number I'd pick would be the number of actual 
interesting commits, no edges, no nothing. Or what?

One other thing I was thinking of was also to perhaps allow multiple 
partial early-output things, in case we get just 5 commits in the first 
0.1 seconds, then 50 in the first second, and 200 after 2 seconds.. I can 
well imagine getting the full list taking a long time over a network 
filesystem (somebody mentioned samba), and maybe having just a single 
trigger is too inflexible.

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
  2007-11-04  5:38                           ` Linus Torvalds
@ 2007-11-04  7:10                             ` Paul Mackerras
  2007-11-04  7:52                               ` Marco Costalba
  2007-11-04 18:11                             ` Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mackerras @ 2007-11-04  7:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List

Linus Torvalds writes:

> > How hard would it be to put the total number of commits on that "Final
> > output" line?  That would be useful for me.
> 
> Not hard. I think we basically have it anyway. The reason I didn't do it 
> is that there's actually multiple numbers: there's the number of primary 
> ("interesting") commits, and then there are the "others", ie the edge 
> things etc. So the number I'd pick would be the number of actual 
> interesting commits, no edges, no nothing. Or what?

Any of those numbers is probably good enough for a progress bar, but
ideally it would be the total number that you are going to output.
So, with --boundary it would include the edge commits, otherwise it
would just be the interesting commits, I think.

> One other thing I was thinking of was also to perhaps allow multiple 
> partial early-output things, in case we get just 5 commits in the first 
> 0.1 seconds, then 50 in the first second, and 200 after 2 seconds.. I can 
> well imagine getting the full list taking a long time over a network 
> filesystem (somebody mentioned samba), and maybe having just a single 
> trigger is too inflexible.

In fact gitk won't mind if you give it multiple occurrences of "Final
output", as long as you start from the beginning again after each
occurrence.  So having multiple triggers is certainly doable as far as
gitk is concerned.  Later on we could optimize that by having git log
match up how many initial commits are the same in both the new list
and the old list, and have it output that rather than the N commits
that were the same as last time.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
  2007-11-04  7:10                             ` Paul Mackerras
@ 2007-11-04  7:52                               ` Marco Costalba
  0 siblings, 0 replies; 57+ messages in thread
From: Marco Costalba @ 2007-11-04  7:52 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On 11/4/07, Paul Mackerras <paulus@samba.org> wrote:
>
> Later on we could optimize that by having git log
> match up how many initial commits are the same in both the new list
> and the old list, and have it output that rather than the N commits
> that were the same as last time.
>

Partial output of commits after "Final output" line would be more
difficult for me to handle instead or restarting form beginning.

One possible optimization along this line instead would be that git
log match up how many initial commits are the same in both the new
list and the old list, and if the old list is whole included
unmodifies in the new simply git log outputs the new commits (not
already present in the old) without the "Final output" line.

Marco

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
  2007-11-04  5:38                           ` Linus Torvalds
  2007-11-04  7:10                             ` Paul Mackerras
@ 2007-11-04 18:11                             ` Linus Torvalds
  2007-11-04 20:12                               ` [PATCH 3/2] Enhance --early-output format Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-04 18:11 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List

On Sat, 3 Nov 2007, Linus Torvalds wrote:
> > 
> > How hard would it be to put the total number of commits on that "Final
> > output" line?  That would be useful for me.
> 
> Not hard. I think we basically have it anyway.

Actually, I take that back.

It's hard. Not because we don't have the commits, but because while we do 
the top-level shape pruning in the eearly stages, we do *not* do the final 
path-limiting until we actually output the commits.

Which actually makes "--early-output" right now do some rather odd things 
when you use a path limiter: we don't do the "rewrite_parents()" thing 
until later, so the early output will have done the first level of history 
simplification, but it won't have made history *dense* yet.

I'm looking at it now, I'll have to think about this a bit more. It might 
be trivial to fix, but this thing has real potential for being subtle.

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 3/2] Enhance --early-output format
  2007-11-04 18:11                             ` Linus Torvalds
@ 2007-11-04 20:12                               ` Linus Torvalds
  2007-11-05 20:24                                 ` Junio C Hamano
  2007-11-13  4:58                                 ` [PATCH 4/2] Fix parent rewriting in --early-output Linus Torvalds
  0 siblings, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-04 20:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List


This makes --early-output a bit more advanced, and actually makes it 
generate multiple "Final output:" headers as it updates things 
asynchronously. I realize that the "Final output:" line is now illogical, 
since it's not really final until it also says "done", but 

It now _always_ generates a "Final output:" header in front of any commit 
list, and that output header gives you a *guess* at the maximum number of 
commits available. However, it should be noted that the guess can be 
completely off: I do a reasonable job estimating it, but it is not meant 
to be exact. 

So what happens is that you may get output like this:

 - at 0.1 seconds:

	Final output: 2 incomplete
	.. 2 commits listed ..

 - half a second later:

	Final output: 33 incomplete
	.. 33 commits listed ..

 - another half a second after that:	

	Final output: 71 incomplete
	.. 71 commits listed ..

 - another half second later:

	Final output: 136 incomplete
	.. 100 commits listed: we hit the --early-output limit, and
	.. will only output 100 commits, and after this you'll not
	.. see an "incomplete" report any more since you got as much
	.. early output as you asked for!

 - .. and then finally:

	Final output: 73106 done
	.. all the commits ..

The above is a real-life scenario on my current kernel tree after having 
flushed all the caches.

Tested with the experimental gitk patch that Paul sent out, and by looking 
at the actual log output (and verifying that my commit count guesses 
actually match real life fairly well).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

On Sun, 4 Nov 2007, Linus Torvalds wrote:
> 
> I'm looking at it now, I'll have to think about this a bit more. It might 
> be trivial to fix, but this thing has real potential for being subtle.

It wasn't totally trivial, but it doesn't seem to be excessively subtle 
either. About half the patch is moving around some code to look at whether 
the commit is interesting or not and rewriting the parents, so that it can 
be shared with the revision walker.

 builtin-log.c |   88 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 revision.c    |   63 +++++++++++++++++++++++-----------------
 revision.h    |    8 +++++
 3 files changed, 119 insertions(+), 40 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 707add2..268a7af 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -77,17 +77,85 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
 	}
 }
 
+/*
+ * This gives a rough estimate for how many commits we
+ * will print out in the list.
+ */
+static int estimate_commit_count(struct rev_info *rev, struct commit_list *list)
+{
+	int n = 0;
+
+	while (list) {
+		struct commit *commit = list->item;
+		unsigned int flags = commit->object.flags;
+
+		list = list->next;
+		if (flags & UNINTERESTING)
+			continue;
+		if (rev->prune_fn && rev->dense && !(flags & TREECHANGE)) {
+			if (commit->parents && !commit->parents->next)
+				continue;
+		}
+		n++;
+	}
+	return n;
+}
+
+static void show_early_header(struct rev_info *rev, const char *stage, int nr)
+{
+	if (rev->shown_one) {
+		rev->shown_one = 0;
+		if (rev->commit_format != CMIT_FMT_ONELINE)
+			putchar(rev->diffopt.line_termination);
+	}
+	printf("Final output: %d %s\n", nr, stage);
+}
+
+struct itimerval early_output_timer;
+
 static void log_show_early(struct rev_info *revs, struct commit_list *list)
 {
 	int i = revs->early_output;
+	int show_header = 1;
 
 	sort_in_topological_order(&list, revs->lifo);
 	while (list && i) {
 		struct commit *commit = list->item;
-		log_tree_commit(revs, commit);
+		switch (simplify_commit(revs, commit)) {
+		case commit_show:
+			if (show_header) {
+				int n = estimate_commit_count(revs, list);
+				show_early_header(revs, "incomplete", n);
+				show_header = 0;
+			}
+			log_tree_commit(revs, commit);
+			i--;
+			break;
+		case commit_ignore:
+			break;
+		case commit_error:
+			return;
+		}
 		list = list->next;
-		i--;
 	}
+
+	/* Did we already get enough commits for the early output? */
+	if (!i)
+		return;
+
+	/*
+	 * ..if no, then repeat it twice a second until we
+	 * do.
+	 *
+	 * NOTE! We don't use "it_interval", because if the
+	 * reader isn't listening, we want our output to be
+	 * throttled by the writing, and not have the timer
+	 * trigger every second even if we're blocked on a
+	 * reader!
+	 */
+	early_output_timer.it_value.tv_sec = 0;
+	early_output_timer.it_value.tv_usec = 500000;
+	setitimer(ITIMER_REAL, &early_output_timer, NULL);
 }
 
 static void early_output(int signal)
@@ -98,7 +166,6 @@ static void early_output(int signal)
 static void setup_early_output(struct rev_info *rev)
 {
 	struct sigaction sa;
-	struct itimerval v;
 
 	/*
 	 * Set up the signal handler, minimally intrusively:
@@ -120,21 +187,16 @@ static void setup_early_output(struct rev_info *rev)
 	 *
 	 * This is a one-time-only trigger.
 	 */
-	memset(&v, 0, sizeof(v));
-	v.it_value.tv_sec = 0;
-	v.it_value.tv_usec = 100000;
-	setitimer(ITIMER_REAL, &v, NULL);
+	early_output_timer.it_value.tv_sec = 0;
+	early_output_timer.it_value.tv_usec = 100000;
+	setitimer(ITIMER_REAL, &early_output_timer, NULL);
 }
 
 static void finish_early_output(struct rev_info *rev)
 {
+	int n = estimate_commit_count(rev, rev->commits);
 	signal(SIGALRM, SIG_IGN);
-	if (rev->shown_one) {
-		rev->shown_one = 0;
-		if (rev->commit_format != CMIT_FMT_ONELINE)
-			putchar(rev->diffopt.line_termination);
-	}
-	printf("Final output:\n");
+	show_early_header(rev, "done", n);
 }
 
 static int cmd_log_walk(struct rev_info *rev)
diff --git a/revision.c b/revision.c
index 26610bb..5d6f208 100644
--- a/revision.c
+++ b/revision.c
@@ -1398,6 +1398,36 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 			   commit->buffer, strlen(commit->buffer));
 }
 
+enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
+{
+	if (commit->object.flags & SHOWN)
+		return commit_ignore;
+	if (revs->unpacked && has_sha1_pack(commit->object.sha1, revs->ignore_packed))
+		return commit_ignore;
+	if (commit->object.flags & UNINTERESTING)
+		return commit_ignore;
+	if (revs->min_age != -1 && (commit->date > revs->min_age))
+		return commit_ignore;
+	if (revs->no_merges && commit->parents && commit->parents->next)
+		return commit_ignore;
+	if (!commit_match(commit, revs))
+		return commit_ignore;
+	if (revs->prune_fn && revs->dense) {
+		/* Commit without changes? */
+		if (!(commit->object.flags & TREECHANGE)) {
+			/* drop merges unless we want parenthood */
+			if (!revs->parents)
+				return commit_ignore;
+			/* non-merge - always ignore it */
+			if (!commit->parents || !commit->parents->next)
+				return commit_ignore;
+		}
+		if (revs->parents && rewrite_parents(revs, commit) < 0)
+			return commit_error;
+	}
+	return commit_show;
+}
+
 static struct commit *get_revision_1(struct rev_info *revs)
 {
 	if (!revs->commits)
@@ -1425,36 +1455,15 @@ static struct commit *get_revision_1(struct rev_info *revs)
 			if (add_parents_to_list(revs, commit, &revs->commits) < 0)
 				return NULL;
 		}
-		if (commit->object.flags & SHOWN)
-			continue;
-
-		if (revs->unpacked && has_sha1_pack(commit->object.sha1,
-						    revs->ignore_packed))
-		    continue;
 
-		if (commit->object.flags & UNINTERESTING)
-			continue;
-		if (revs->min_age != -1 && (commit->date > revs->min_age))
-			continue;
-		if (revs->no_merges &&
-		    commit->parents && commit->parents->next)
-			continue;
-		if (!commit_match(commit, revs))
+		switch (simplify_commit(revs, commit)) {
+		case commit_ignore:
 			continue;
-		if (revs->prune_fn && revs->dense) {
-			/* Commit without changes? */
-			if (!(commit->object.flags & TREECHANGE)) {
-				/* drop merges unless we want parenthood */
-				if (!revs->parents)
-					continue;
-				/* non-merge - always ignore it */
-				if (!commit->parents || !commit->parents->next)
-					continue;
-			}
-			if (revs->parents && rewrite_parents(revs, commit) < 0)
-				return NULL;
+		case commit_error:
+			return -1;
+		default:
+			return commit;
 		}
-		return commit;
 	} while (revs->commits);
 	return NULL;
 }
diff --git a/revision.h b/revision.h
index d8a5a50..2232247 100644
--- a/revision.h
+++ b/revision.h
@@ -133,4 +133,12 @@ extern void add_object(struct object *obj,
 
 extern void add_pending_object(struct rev_info *revs, struct object *obj, const char *name);
 
+enum commit_action {
+	commit_ignore,
+	commit_show,
+	commit_error
+};
+
+extern enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit);
+
 #endif

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 3/2] Enhance --early-output format
  2007-11-04 20:12                               ` [PATCH 3/2] Enhance --early-output format Linus Torvalds
@ 2007-11-05 20:24                                 ` Junio C Hamano
  2007-11-05 20:47                                   ` Linus Torvalds
  2007-11-13  4:58                                 ` [PATCH 4/2] Fix parent rewriting in --early-output Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2007-11-05 20:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> It wasn't totally trivial, but it doesn't seem to be excessively subtle 
> either. About half the patch is moving around some code to look at whether 
> the commit is interesting or not and rewriting the parents, so that it can 
> be shared with the revision walker.

Very nicely done.

> +	while (list) {
> +		struct commit *commit = list->item;
> +		unsigned int flags = commit->object.flags;
> +
> +		list = list->next;
> +		if (flags & UNINTERESTING)
> +			continue;
> +		if (rev->prune_fn && rev->dense && !(flags & TREECHANGE)) {
> +			if (commit->parents && !commit->parents->next)
> +				continue;
> +		}

When looking at:

	if (A && B && C) {
        	if (D && E)
                	continue;
	}

an uninitiated might say "Huh?  Why use nested 'if'?", but to
somebody who knows how revision traversal works, the above split
is a more logical way to test this condition.  Maybe one liner
comment is in order?

> +static void show_early_header(struct rev_info *rev, const char *stage, int nr)
> +{
> +	if (rev->shown_one) {
> +		rev->shown_one = 0;
> +		if (rev->commit_format != CMIT_FMT_ONELINE)
> +			putchar(rev->diffopt.line_termination);
> +	}
> +	printf("Final output: %d %s\n", nr, stage);
> +}

As you noted, this is more like "Partial output" now.
How about painting the bikeshed pink by saying:

	Partial output: 20
        Partial output: 70
        Final output: 70000

> +	/* Did we already get enough commits for the early output? */
> +	if (!i)
> +		return;
> +
> +	/*
> +	 * ..if no, then repeat it twice a second until we
> +	 * do.
> +	 *
> +	 * NOTE! We don't use "it_interval", because if the
> +	 * reader isn't listening, we want our output to be
> +	 * throttled by the writing, and not have the timer
> +	 * trigger every second even if we're blocked on a
> +	 * reader!
> +	 */

A comment like this is very much appreciated.

> +	early_output_timer.it_value.tv_sec = 0;
> +	early_output_timer.it_value.tv_usec = 500000;
> +	setitimer(ITIMER_REAL, &early_output_timer, NULL);
>  }

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 3/2] Enhance --early-output format
  2007-11-05 20:24                                 ` Junio C Hamano
@ 2007-11-05 20:47                                   ` Linus Torvalds
  2007-11-05 21:22                                     ` Linus Torvalds
  0 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-05 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List

On Mon, 5 Nov 2007, Junio C Hamano wrote:
>
> > +		if (rev->prune_fn && rev->dense && !(flags & TREECHANGE)) {
> > +			if (commit->parents && !commit->parents->next)
> > +				continue;
> > +		}
> 
> When looking at:
> 
> 	if (A && B && C) {
>         	if (D && E)
>                 	continue;
> 	}
> 
> an uninitiated might say "Huh?  Why use nested 'if'?", but to
> somebody who knows how revision traversal works, the above split
> is a more logical way to test this condition.  Maybe one liner
> comment is in order?

I'd almost prefer not to.

If people feel the code is subtle enough that a comment is in order to 
explain *what* something does, I would rather introduce helper functions 
than add comments. I'm not a big believer in comments in the middle of 
code to explain the code, but I *am* a big believer in trying to make the 
code easy to read without them.

(I don't dislike comments per se, but I'd *much* rather have the comments 
explain what is going on at a higher level. Comments that talk about the 
details of the code itself is likely bogus).

So we could add a commit to say what is going on ("ignore commits that 
have only one parent and didn't change the tree"), but I'd not want to 
explain why that particular layout of code.

That said, I've often found the "TREECHANGE" bit annoying. The fact that 
we always have to test it together with testing "rev->prune_fn" and often 
also the "rev->dense" flag is just annoying. I'd almost like to just 
always set the bit if "rev->prune_fn" isn't set. Alternatively, the code 
could be rewritten to just have a few inline helper functions, and it 
could perhaps be written as

	if (!(flags & TREECHANGE) && revs->dense && single_parent(commit))
		continue;

I dunno. I have no really *strong* opinions, although I suspect that 
anybody who looks at that particular piece of code had better understand 
these issues well enough anyway that it doesn't much matter..

> As you noted, this is more like "Partial output" now.
> How about painting the bikeshed pink by saying:
> 
> 	Partial output: 20
>         Partial output: 70
>         Final output: 70000

I'm fine with it. The reason I did it the way I did was that this way I 
didn't need to change "gitk" - I could just use Pauls original patch 
as-is, since that code didn't care *what* came after the "Final output", 
and didn't care how many times it showed up.

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 3/2] Enhance --early-output format
  2007-11-05 20:47                                   ` Linus Torvalds
@ 2007-11-05 21:22                                     ` Linus Torvalds
  2007-11-05 21:35                                       ` Linus Torvalds
  0 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-05 21:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List



On Mon, 5 Nov 2007, Linus Torvalds wrote:
> 
> That said, I've often found the "TREECHANGE" bit annoying. The fact that 
> we always have to test it together with testing "rev->prune_fn" and often 
> also the "rev->dense" flag is just annoying. I'd almost like to just 
> always set the bit if "rev->prune_fn" isn't set. Alternatively, the code 
> could be rewritten to just have a few inline helper functions, and it 
> could perhaps be written as
> 
> 	if (!(flags & TREECHANGE) && revs->dense && single_parent(commit))
> 		continue;
> 
> I dunno.

Here's a possible cleanup patch. It's on top of the enhanced 
--early-output format commit, and in fact fixes a stupid bug in that 
commit ("return -1" vs "return NULL"), but that bug-fix is really an 
independent thing.

It basically removes the unnecessary indirection of "revs->prune_fn", 
since that function is always the same one (or NULL), and there is in fact 
not even an abstraction reason to make it a function (ie its not called 
from some other file and doesn't allow us to keep the function itself 
static or anything like that).

It then just replaces it with a bit that says "prune or not", and if not 
pruning, every commit gets TREECHANGE.

That in turn means that

 - if (!revs->prune_fn || (flags & TREECHANGE))
 - if (revs->prune_fn && !(flags & TREECHANGE))

just become

 - if (flags & TREECHANGE)
 - if (!(flags & TREECHANGE))

respectively.

Together with adding the "single_parent()" helper function, the "complex" 
conditional now becomes

	if (!(flags & TREECHANGE) && rev->dense && single_parent(commit))
		continue;

Anyway, do with this as you will. I think it's ok, but apart from passing 
the test-suite and looking obvious, this has gotten zero testing.

		Linus

---
 builtin-log.c      |    6 ++----
 builtin-rev-list.c |   14 +++++++-------
 commit.h           |    5 +++++
 revision.c         |   20 ++++++++++++--------
 revision.h         |    5 +----
 5 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 981f388..76c84e2 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -92,10 +92,8 @@ static int estimate_commit_count(struct rev_info *rev, struct commit_list *list)
 		list = list->next;
 		if (flags & UNINTERESTING)
 			continue;
-		if (rev->prune_fn && rev->dense && !(flags & TREECHANGE)) {
-			if (commit->parents && !commit->parents->next)
-				continue;
-		}
+		if (!(flags & TREECHANGE) && rev->dense && single_parent(commit))
+			continue;
 		n++;
 	}
 	return n;
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 6970467..2dec887 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -142,7 +142,7 @@ static int count_distance(struct commit_list *entry)
 
 		if (commit->object.flags & (UNINTERESTING | COUNTED))
 			break;
-		if (!revs.prune_fn || (commit->object.flags & TREECHANGE))
+		if (commit->object.flags & TREECHANGE)
 			nr++;
 		commit->object.flags |= COUNTED;
 		p = commit->parents;
@@ -198,7 +198,7 @@ static inline int halfway(struct commit_list *p, int nr)
 	/*
 	 * Don't short-cut something we are not going to return!
 	 */
-	if (revs.prune_fn && !(p->item->object.flags & TREECHANGE))
+	if (!(p->item->object.flags & TREECHANGE))
 		return 0;
 	if (DEBUG_BISECT)
 		return 0;
@@ -268,7 +268,7 @@ static struct commit_list *best_bisection(struct commit_list *list, int nr)
 		int distance;
 		unsigned flags = p->item->object.flags;
 
-		if (revs.prune_fn && !(flags & TREECHANGE))
+		if (!(flags & TREECHANGE))
 			continue;
 		distance = weight(p);
 		if (nr - distance < distance)
@@ -308,7 +308,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 		int distance;
 		unsigned flags = p->item->object.flags;
 
-		if (revs.prune_fn && !(flags & TREECHANGE))
+		if (!(flags & TREECHANGE))
 			continue;
 		distance = weight(p);
 		if (nr - distance < distance)
@@ -362,7 +362,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		p->item->util = &weights[n++];
 		switch (count_interesting_parents(commit)) {
 		case 0:
-			if (!revs.prune_fn || (flags & TREECHANGE)) {
+			if (flags & TREECHANGE) {
 				weight_set(p, 1);
 				counted++;
 				show_list("bisection 2 count one",
@@ -435,7 +435,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 			 * add one for p itself if p is to be counted,
 			 * otherwise inherit it from q directly.
 			 */
-			if (!revs.prune_fn || (flags & TREECHANGE)) {
+			if (flags & TREECHANGE) {
 				weight_set(p, weight(q)+1);
 				counted++;
 				show_list("bisection 2 count one",
@@ -482,7 +482,7 @@ static struct commit_list *find_bisection(struct commit_list *list,
 			continue;
 		p->next = last;
 		last = p;
-		if (!revs.prune_fn || (flags & TREECHANGE))
+		if (flags & TREECHANGE)
 			nr++;
 		on_list++;
 	}
diff --git a/commit.h b/commit.h
index 4ed0c1c..aa67986 100644
--- a/commit.h
+++ b/commit.h
@@ -117,4 +117,9 @@ extern int interactive_add(void);
 extern void add_files_to_cache(int verbose, const char *prefix, const char **files);
 extern int rerere(void);
 
+static inline int single_parent(struct commit *commit)
+{
+	return commit->parents && !commit->parents->next;
+}
+
 #endif /* COMMIT_H */
diff --git a/revision.c b/revision.c
index 5d6f208..7a1ecba 100644
--- a/revision.c
+++ b/revision.c
@@ -308,6 +308,14 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 	struct commit_list **pp, *parent;
 	int tree_changed = 0, tree_same = 0;
 
+	/*
+	 * If we don't do pruning, everything is interesting
+	 */
+	if (!revs->prune) {
+		commit->object.flags |= TREECHANGE;
+		return;
+	}
+
 	if (!commit->tree)
 		return;
 
@@ -415,8 +423,7 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit, str
 	 * simplify the commit history and find the parent
 	 * that has no differences in the path set if one exists.
 	 */
-	if (revs->prune_fn)
-		revs->prune_fn(revs, commit);
+	try_to_simplify_commit(revs, commit);
 
 	if (revs->no_walk)
 		return 0;
@@ -684,9 +691,6 @@ void init_revisions(struct rev_info *revs, const char *prefix)
 	revs->skip_count = -1;
 	revs->max_count = -1;
 
-	revs->prune_fn = NULL;
-	revs->prune_data = NULL;
-
 	revs->commit_format = CMIT_FMT_DEFAULT;
 
 	diff_setup(&revs->diffopt);
@@ -1271,7 +1275,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 		diff_tree_setup_paths(revs->prune_data, &revs->pruning);
 		/* Can't prune commits with rename following: the paths change.. */
 		if (!revs->diffopt.follow_renames)
-			revs->prune_fn = try_to_simplify_commit;
+			revs->prune = 1;
 		if (!revs->full_diff)
 			diff_tree_setup_paths(revs->prune_data, &revs->diffopt);
 	}
@@ -1412,7 +1416,7 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
 		return commit_ignore;
 	if (!commit_match(commit, revs))
 		return commit_ignore;
-	if (revs->prune_fn && revs->dense) {
+	if (revs->prune && revs->dense) {
 		/* Commit without changes? */
 		if (!(commit->object.flags & TREECHANGE)) {
 			/* drop merges unless we want parenthood */
@@ -1460,7 +1464,7 @@ static struct commit *get_revision_1(struct rev_info *revs)
 		case commit_ignore:
 			continue;
 		case commit_error:
-			return -1;
+			return NULL;
 		default:
 			return commit;
 		}
diff --git a/revision.h b/revision.h
index 2232247..a798514 100644
--- a/revision.h
+++ b/revision.h
@@ -15,8 +15,6 @@
 struct rev_info;
 struct log_info;
 
-typedef void (prune_fn_t)(struct rev_info *revs, struct commit *commit);
-
 struct rev_info {
 	/* Starting list */
 	struct commit_list *commits;
@@ -28,12 +26,11 @@ struct rev_info {
 	/* Basic information */
 	const char *prefix;
 	void *prune_data;
-	prune_fn_t *prune_fn;
-
 	unsigned int early_output;
 
 	/* Traversal flags */
 	unsigned int	dense:1,
+			prune:1,
 			no_merges:1,
 			no_walk:1,
 			remove_empty_trees:1,

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 3/2] Enhance --early-output format
  2007-11-05 21:22                                     ` Linus Torvalds
@ 2007-11-05 21:35                                       ` Linus Torvalds
  0 siblings, 0 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-05 21:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List



On Mon, 5 Nov 2007, Linus Torvalds wrote:
> 
> Here's a possible cleanup patch. It's on top of the enhanced 
> --early-output format commit, and in fact fixes a stupid bug in that 
> commit ("return -1" vs "return NULL"), but that bug-fix is really an 
> independent thing.

.. and this extends a bit further on the notion.

It basically means that "rev->dense" can now be ignored outside of 
revision.c, because we'll just set TREECHANGE automatically when 
seeing a non-merge regular commit when --sparse is being used.

So it's not just a simplification, it's a performance optimization too! 

Although since nobody sane would ever use --sparse, I guess nobody really 
cares.

		Linus

---
 builtin-log.c |    8 ++------
 revision.c    |    9 +++++++++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 76c84e2..d6845bc 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -88,13 +88,9 @@ static int estimate_commit_count(struct rev_info *rev, struct commit_list *list)
 	while (list) {
 		struct commit *commit = list->item;
 		unsigned int flags = commit->object.flags;
-
 		list = list->next;
-		if (flags & UNINTERESTING)
-			continue;
-		if (!(flags & TREECHANGE) && rev->dense && single_parent(commit))
-			continue;
-		n++;
+		if ((flags & TREECHANGE) && !(flags & UNINTERESTING))
+			n++;
 	}
 	return n;
 }
diff --git a/revision.c b/revision.c
index 7a1ecba..02e9241 100644
--- a/revision.c
+++ b/revision.c
@@ -325,6 +325,15 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		return;
 	}
 
+	/*
+	 * Normal non-merge commit? If we don't want to make the 
+	 * history dense, we consider it always to be a change..
+	 */
+	if (!revs->dense && !commit->parents->next) {
+		commit->object.flags |= TREECHANGE;
+		return;
+	}
+
 	pp = &commit->parents;
 	while ((parent = *pp) != NULL) {
 		struct commit *p = parent->item;

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-04 20:12                               ` [PATCH 3/2] Enhance --early-output format Linus Torvalds
  2007-11-05 20:24                                 ` Junio C Hamano
@ 2007-11-13  4:58                                 ` Linus Torvalds
  2007-11-13  5:43                                   ` Junio C Hamano
  2007-11-13  9:59                                   ` Paul Mackerras
  1 sibling, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-13  4:58 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List

When we do history simplification in early-output, we end up in the 
interesting situation that the early output may do simplification with a 
partial tree - in particular, there may be parents that simply haven't 
been handled yet, and don't have their parenthood parsed.

The history simplification would get this case totally wrong, and assume 
that the parent list of a parent being NULL meant that it was a root 
commit, and rewrite the whole parent as such.

This would cause unconnected commits in the gitk output.

This fixes it, by saying that if you reach a parent that hasn't been 
parsed yet, history simplification will simply stop and leave it alone: 
later on, when we have the full history, we will *continue* the 
simplification and eventually get the right information.

However, while the parent is now correctly rewritten, it looks like gitk 
is confused by this. Gitk will remember the original parent information, 
even if a replay has given new parenthood information. Since the partial 
early-output information is triggered by timing, this means that gitk will 
show some totally random parent that quite possibly won't even be part of 
the final commit set at all!

On the kernel, at least with my machine, I can trigger this with something 
like

	gitk fs/read_write.c

where currently the log (with --parents) reads like this:

	commit a16877ca9cec211708a161057a7cbfbf2cbc3a53 d96e6e71647846e0dab097efd9b8bf3a3a556dca
	Author: Pavel Emelyanov <xemul@openvz.org>
	Date:   Mon Oct 1 14:41:11 2007 -0700

	    Cleanup macros for distinguishing mandatory locks
	..

	commit d96e6e71647846e0dab097efd9b8bf3a3a556dca d6b29d7cee064f28ca097e906de7453541351095
	Author: Jens Axboe <jens.axboe@oracle.com>
	Date:   Mon Jun 11 12:18:52 2007 +0200

	    Remove remnants of sendfile()
	...

but with early-output (and this fixed patch), I get something like this:

	Final output: 1 incomplete
	commit a16877ca9cec211708a161057a7cbfbf2cbc3a53 31b54f40e12e4d04941762be6615edaf3c6ed811
	Author: Pavel Emelyanov <xemul@openvz.org>
	Date:   Mon Oct 1 14:41:11 2007 -0700

	    Cleanup macros for distinguishing mandatory locks
	...

	Final output: 26 done
	commit a16877ca9cec211708a161057a7cbfbf2cbc3a53 d96e6e71647846e0dab097efd9b8bf3a3a556dca
	Author: Pavel Emelyanov <xemul@openvz.org>
	Date:   Mon Oct 1 14:41:11 2007 -0700

	    Cleanup macros for distinguishing mandatory locks
	..

ie notice how the early-output doesn't have the right parent, since it 
hasn't gotten that far back in history yet. So now the final output will 
have the parenthood rewritten (correctly), but gitk will have cached the 
old random incorrect parenthood, and doesn't react properly to the updated 
and fixed one at replay time.

Anyway, this is a real fix, but gitk remains a bit useless as is.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 revision.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/revision.c b/revision.c
index 931f978..8872a91 100644
--- a/revision.c
+++ b/revision.c
@@ -1352,6 +1352,8 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp
 		if (!revs->limited)
 			if (add_parents_to_list(revs, p, &revs->commits) < 0)
 				return rewrite_one_error;
+		if (!p->object.parsed)
+			return rewrite_one_ok;
 		if (p->parents && p->parents->next)
 			return rewrite_one_ok;
 		if (p->object.flags & (TREECHANGE | UNINTERESTING))

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  4:58                                 ` [PATCH 4/2] Fix parent rewriting in --early-output Linus Torvalds
@ 2007-11-13  5:43                                   ` Junio C Hamano
  2007-11-13  6:46                                     ` Linus Torvalds
  2007-11-13  8:01                                     ` Shawn O. Pearce
  2007-11-13  9:59                                   ` Paul Mackerras
  1 sibling, 2 replies; 57+ messages in thread
From: Junio C Hamano @ 2007-11-13  5:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Anyway, this is a real fix, but gitk remains a bit useless as is.
>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  revision.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/revision.c b/revision.c
> index 931f978..8872a91 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1352,6 +1352,8 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp
>  		if (!revs->limited)
>  			if (add_parents_to_list(revs, p, &revs->commits) < 0)
>  				return rewrite_one_error;
> +		if (!p->object.parsed)
> +			return rewrite_one_ok;
>  		if (p->parents && p->parents->next)
>  			return rewrite_one_ok;
>  		if (p->object.flags & (TREECHANGE | UNINTERESTING))

This is too subtle, or I am missing something.

I have to wonder what would happen if a much higher level caller
caused the objects to get parsed before coming into the revision
walking machinery, e.g. after the command line processing for
A...B walked the ancestry chain until their common ancestors are
found.  So these commits between A and B are parsed, but the
revision limiting machinery hasn't done its operation to set
TREECHANGE and/or UNINTERESTING in add_parents_to_list() on
these commits yet.

I think the fix will not trigger for such parents, but the
processing just goes on.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  5:43                                   ` Junio C Hamano
@ 2007-11-13  6:46                                     ` Linus Torvalds
  2007-11-13  7:16                                       ` Linus Torvalds
  2007-11-13  8:01                                     ` Shawn O. Pearce
  1 sibling, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2007-11-13  6:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List

On Mon, 12 Nov 2007, Junio C Hamano wrote:
> 
> This is too subtle, or I am missing something.

It's subtle. And you're probably right, I need to fix it up some more.

It works, but it works for all the wrong reasons. When thinking about it, 
you need to keep three "generations" of commits in mind. Let's call them 
"c" (commit), "p" (parent of c) and "pp" (parent of p) respectively.

What is going on is:
 - we have calculated TREECHANGE for "c".
 - that, in turn, means that we have parsed "p" (it's done by 
   add_parents_to_list() - either as part of try_to_simplify_commit(), or 
   if that code doesn't trigger, by the later loop over the parents)
 - but we haven't parsed "pp" yet.

Now, when we decide to rewrite "c", we look at whether we can change the 
parent list of c to point from "p" to "pp". But with the added check, we 
now will trigger on the fact that "pp" hasn't even been parsed yet, so we 
won't even try, and we leave the parent list alone.

But I agree, I don't think it's really stable. We could have gotten to 
"pp" through some other chain.

I think the real problem is that "TREECHANGE" has the wrong polarity. It 
should default to always being set, and then we could actively clear it 
when we do the work to say "it's the same tree". Instead, we default it to 
being the same (which triggers parent rewriting), and only later may we 
notice that it wasn't the same.

			Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  6:46                                     ` Linus Torvalds
@ 2007-11-13  7:16                                       ` Linus Torvalds
  2007-11-13  7:53                                         ` Sven Verdoolaege
  2007-11-13  8:48                                         ` Junio C Hamano
  0 siblings, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2007-11-13  7:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List



On Mon, 12 Nov 2007, Linus Torvalds wrote:
> 
> I think the real problem is that "TREECHANGE" has the wrong polarity. It 
> should default to always being set, and then we could actively clear it 
> when we do the work to say "it's the same tree". Instead, we default it to 
> being the same (which triggers parent rewriting), and only later may we 
> notice that it wasn't the same.

So, maybe the proper solution is to say "commits are assumed to be 
different to their parents, but we have an explicit bit saying TREESAME 
when we find them to be the same".

This solves the problem quite naturally, because any tree that hasn't been 
parsed - or even if it *has* been parsed, but just hasn't gone through 
the compare function - will then always be seen as "different" and thus 
interesting.

This fairly straight-forward patch seems to work. It *replaces* the 
pervious "patch 4/2", and yes, Junio, I think you were very right to 
complain about that one.

How does this one feel? It basically says "a commit that has TREESAME set 
is kind-of-UNINTERESTING", but obviously in a different way than an 
outright UNINTERESTING commit.

The diff is pretty straightforward - just change the sense of TREECHANGE, 
and that sometimes removes a line, and sometimes adds one, but most of the 
changes are just TREECHANGE => TREESAME, together with a negation of the 
operation.

		Linus

---
 builtin-fmt-merge-msg.c |    2 +-
 builtin-log.c           |    2 +-
 builtin-rev-list.c      |   16 ++++++++--------
 revision.c              |   22 +++++++++++-----------
 revision.h              |    2 +-
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin-fmt-merge-msg.c b/builtin-fmt-merge-msg.c
index 8a3c962..6163bd4 100644
--- a/builtin-fmt-merge-msg.c
+++ b/builtin-fmt-merge-msg.c
@@ -176,7 +176,7 @@ static void shortlog(const char *name, unsigned char *sha1,
 	struct commit *commit;
 	struct object *branch;
 	struct list subjects = { NULL, NULL, 0, 0 };
-	int flags = UNINTERESTING | TREECHANGE | SEEN | SHOWN | ADDED;
+	int flags = UNINTERESTING | TREESAME | SEEN | SHOWN | ADDED;
 
 	branch = deref_tag(parse_object(sha1), sha1_to_hex(sha1), 40);
 	if (!branch || branch->type != OBJ_COMMIT)
diff --git a/builtin-log.c b/builtin-log.c
index d6845bc..54ddaad 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -89,7 +89,7 @@ static int estimate_commit_count(struct rev_info *rev, struct commit_list *list)
 		struct commit *commit = list->item;
 		unsigned int flags = commit->object.flags;
 		list = list->next;
-		if ((flags & TREECHANGE) && !(flags & UNINTERESTING))
+		if (!(flags & (TREESAME | UNINTERESTING)))
 			n++;
 	}
 	return n;
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 2dec887..0258ec4 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -142,7 +142,7 @@ static int count_distance(struct commit_list *entry)
 
 		if (commit->object.flags & (UNINTERESTING | COUNTED))
 			break;
-		if (commit->object.flags & TREECHANGE)
+		if (!(commit->object.flags & TREESAME))
 			nr++;
 		commit->object.flags |= COUNTED;
 		p = commit->parents;
@@ -198,7 +198,7 @@ static inline int halfway(struct commit_list *p, int nr)
 	/*
 	 * Don't short-cut something we are not going to return!
 	 */
-	if (!(p->item->object.flags & TREECHANGE))
+	if (p->item->object.flags & TREESAME)
 		return 0;
 	if (DEBUG_BISECT)
 		return 0;
@@ -234,7 +234,7 @@ static void show_list(const char *debug, int counted, int nr,
 		char *ep, *sp;
 
 		fprintf(stderr, "%c%c%c ",
-			(flags & TREECHANGE) ? 'T' : ' ',
+			(flags & TREESAME) ? ' ' : 'T',
 			(flags & UNINTERESTING) ? 'U' : ' ',
 			(flags & COUNTED) ? 'C' : ' ');
 		if (commit->util)
@@ -268,7 +268,7 @@ static struct commit_list *best_bisection(struct commit_list *list, int nr)
 		int distance;
 		unsigned flags = p->item->object.flags;
 
-		if (!(flags & TREECHANGE))
+		if (flags & TREESAME)
 			continue;
 		distance = weight(p);
 		if (nr - distance < distance)
@@ -308,7 +308,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 		int distance;
 		unsigned flags = p->item->object.flags;
 
-		if (!(flags & TREECHANGE))
+		if (flags & TREESAME)
 			continue;
 		distance = weight(p);
 		if (nr - distance < distance)
@@ -362,7 +362,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		p->item->util = &weights[n++];
 		switch (count_interesting_parents(commit)) {
 		case 0:
-			if (flags & TREECHANGE) {
+			if (!(flags & TREESAME)) {
 				weight_set(p, 1);
 				counted++;
 				show_list("bisection 2 count one",
@@ -435,7 +435,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 			 * add one for p itself if p is to be counted,
 			 * otherwise inherit it from q directly.
 			 */
-			if (flags & TREECHANGE) {
+			if (!(flags & TREESAME)) {
 				weight_set(p, weight(q)+1);
 				counted++;
 				show_list("bisection 2 count one",
@@ -482,7 +482,7 @@ static struct commit_list *find_bisection(struct commit_list *list,
 			continue;
 		p->next = last;
 		last = p;
-		if (flags & TREECHANGE)
+		if (!(flags & TREESAME))
 			nr++;
 		on_list++;
 	}
diff --git a/revision.c b/revision.c
index 931f978..5796153 100644
--- a/revision.c
+++ b/revision.c
@@ -311,17 +311,15 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 	/*
 	 * If we don't do pruning, everything is interesting
 	 */
-	if (!revs->prune) {
-		commit->object.flags |= TREECHANGE;
+	if (!revs->prune)
 		return;
-	}
 
 	if (!commit->tree)
 		return;
 
 	if (!commit->parents) {
-		if (!rev_same_tree_as_empty(revs, commit->tree))
-			commit->object.flags |= TREECHANGE;
+		if (rev_same_tree_as_empty(revs, commit->tree))
+			commit->object.flags |= TREESAME;
 		return;
 	}
 
@@ -329,10 +327,8 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 	 * Normal non-merge commit? If we don't want to make the
 	 * history dense, we consider it always to be a change..
 	 */
-	if (!revs->dense && !commit->parents->next) {
-		commit->object.flags |= TREECHANGE;
+	if (!revs->dense && !commit->parents->next)
 		return;
-	}
 
 	pp = &commit->parents;
 	while ((parent = *pp) != NULL) {
@@ -357,6 +353,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 			}
 			parent->next = NULL;
 			commit->parents = parent;
+			commit->object.flags |= TREESAME;
 			return;
 
 		case REV_TREE_NEW:
@@ -385,7 +382,8 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		die("bad tree compare for commit %s", sha1_to_hex(commit->object.sha1));
 	}
 	if (tree_changed && !tree_same)
-		commit->object.flags |= TREECHANGE;
+		return;
+	commit->object.flags |= TREESAME;
 }
 
 static int add_parents_to_list(struct rev_info *revs, struct commit *commit, struct commit_list **list)
@@ -1354,7 +1352,9 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp
 				return rewrite_one_error;
 		if (p->parents && p->parents->next)
 			return rewrite_one_ok;
-		if (p->object.flags & (TREECHANGE | UNINTERESTING))
+		if (p->object.flags & UNINTERESTING)
+			return rewrite_one_ok;
+		if (!(p->object.flags & TREESAME))
 			return rewrite_one_ok;
 		if (!p->parents)
 			return rewrite_one_noparents;
@@ -1427,7 +1427,7 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
 		return commit_ignore;
 	if (revs->prune && revs->dense) {
 		/* Commit without changes? */
-		if (!(commit->object.flags & TREECHANGE)) {
+		if (commit->object.flags & TREESAME) {
 			/* drop merges unless we want parenthood */
 			if (!revs->parents)
 				return commit_ignore;
diff --git a/revision.h b/revision.h
index a798514..992e1e9 100644
--- a/revision.h
+++ b/revision.h
@@ -3,7 +3,7 @@
 
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
-#define TREECHANGE	(1u<<2)
+#define TREESAME	(1u<<2)
 #define SHOWN		(1u<<3)
 #define TMP_MARK	(1u<<4) /* for isolated cases; clean after use */
 #define BOUNDARY	(1u<<5)

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  7:16                                       ` Linus Torvalds
@ 2007-11-13  7:53                                         ` Sven Verdoolaege
  2007-11-13  8:48                                         ` Junio C Hamano
  1 sibling, 0 replies; 57+ messages in thread
From: Sven Verdoolaege @ 2007-11-13  7:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Paul Mackerras, Marco Costalba, Git Mailing List

On Mon, Nov 12, 2007 at 11:16:08PM -0800, Linus Torvalds wrote:
> On Mon, 12 Nov 2007, Linus Torvalds wrote:
> > 
> > I think the real problem is that "TREECHANGE" has the wrong polarity. It 
> > should default to always being set, and then we could actively clear it 
> > when we do the work to say "it's the same tree". Instead, we default it to 
> > being the same (which triggers parent rewriting), and only later may we 
> > notice that it wasn't the same.
> 
> So, maybe the proper solution is to say "commits are assumed to be 
> different to their parents, but we have an explicit bit saying TREESAME 
> when we find them to be the same".

FWIW, I like it.

I had basically the same patch in my git-rewrite-commits series
(except that I called it "PRUNED" instead of "TREESAME"), but I
don't think I even sent it to the list, because I was told there
was no use.

skimo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  5:43                                   ` Junio C Hamano
  2007-11-13  6:46                                     ` Linus Torvalds
@ 2007-11-13  8:01                                     ` Shawn O. Pearce
  2007-11-13  8:24                                       ` Junio C Hamano
  1 sibling, 1 reply; 57+ messages in thread
From: Shawn O. Pearce @ 2007-11-13  8:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Paul Mackerras, Marco Costalba, Git Mailing List

Junio C Hamano <gitster@pobox.com> wrote:
> I have to wonder what would happen if a much higher level caller
> caused the objects to get parsed before coming into the revision
> walking machinery, e.g. after the command line processing for
> A...B walked the ancestry chain until their common ancestors are
> found.  So these commits between A and B are parsed, but the
> revision limiting machinery hasn't done its operation to set
> TREECHANGE and/or UNINTERESTING in add_parents_to_list() on
> these commits yet.

That's one of the problems with the way the revision walking
machinery is built.  Its fast, but it can really only be used once.
My series about making the allocators able to free their nodes was
to allow resetting the entire machinary for another user, but as you
pointed out how do we decide when we can do a reset and invalidate
all prior struct commit*?

-- 
Shawn.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  8:01                                     ` Shawn O. Pearce
@ 2007-11-13  8:24                                       ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2007-11-13  8:24 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Linus Torvalds, Paul Mackerras, Marco Costalba, Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>
>> I have to wonder what would happen if a much higher level caller
>> caused the objects to get parsed before coming into the revision
>> walking machinery, e.g. after the command line processing for
>> A...B walked the ancestry chain until their common ancestors are
>> found.  So these commits between A and B are parsed, but the
>> revision limiting machinery hasn't done its operation to set
>> TREECHANGE and/or UNINTERESTING in add_parents_to_list() on
>> these commits yet.
>
> That's one of the problems with the way the revision walking
> machinery is built.  Its fast, but it can really only be used once.

Yes but not quite.  As long as you do not use overlapping set of
flag bits without cleaning, you are almost Ok.

The reason I say "almost" is that I think the true problem with
the first patch by Linus was to load "parsed" bit any semantics
other than "we have read and parsed the data so do not bother
rereading it".  If it used another mechanism (e.g. another flag
bit that means "we have done TREECHANGE and stuff"), returning
rewrite_one_ok to punt when seeing an unprocessed node would
have been safe, as it would not have munged the parent list.
The replacement "TREESAME" patch would work much better without
using such an extra bit, because it allows us to directly check
"if we already decided this does not change from the parent",
and the lack of the bit covers both "we haven't processed it"
and "we processed but we do not want to prune" cases, and not
pruning is safe and easily and correctly re-processible in the
post clean-up phase of the early_output series.

But in general, you're right.  An operation that changes the
ancestry shape is irreversible, and you would need to cause
reparsing of the commit objects after you are done with revision
traversal, and at that point, discarding and re-reading
everything from scratch, although is heavy-handed, is one
plausible approach to tackle the problem.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  7:16                                       ` Linus Torvalds
  2007-11-13  7:53                                         ` Sven Verdoolaege
@ 2007-11-13  8:48                                         ` Junio C Hamano
  1 sibling, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2007-11-13  8:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, Marco Costalba, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This solves the problem quite naturally, because any tree that hasn't been 
> parsed - or even if it *has* been parsed, but just hasn't gone through 
> the compare function - will then always be seen as "different" and thus 
> interesting.
>
> This fairly straight-forward patch seems to work. It *replaces* the 
> pervious "patch 4/2", and yes, Junio, I think you were very right to 
> complain about that one.
>
> How does this one feel?

I think this is very natural and I like it.

I did not complain but just said I was puzzled, but after
thinking about this a bit I probably should have ;-).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  4:58                                 ` [PATCH 4/2] Fix parent rewriting in --early-output Linus Torvalds
  2007-11-13  5:43                                   ` Junio C Hamano
@ 2007-11-13  9:59                                   ` Paul Mackerras
  2007-11-13 18:53                                     ` Junio C Hamano
  2007-11-16  7:30                                     ` Marco Costalba
  1 sibling, 2 replies; 57+ messages in thread
From: Paul Mackerras @ 2007-11-13  9:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List

Linus Torvalds writes:

> However, while the parent is now correctly rewritten, it looks like gitk 
> is confused by this. Gitk will remember the original parent information, 
> even if a replay has given new parenthood information. Since the partial 
> early-output information is triggered by timing, this means that gitk will 
> show some totally random parent that quite possibly won't even be part of 
> the final commit set at all!

Yep.  It will be a little complex to deal with that because there are
bits of state that I set up for the parents, and if they're the wrong
parents, I'll have to go back and undo that.

In fact it would be easier for me if, instead of getting the id of
some random ancestor commit, I got an explicit indication to say
"unknown parent", such as just a "-" in place of the id of the
unknown parent(s).  Would that be doable?  I could then just not do
the processing for any unknown parent, and make sure to do it when I
see the final version of the commit.

Also, I have just about worked out an efficient way to do the commit
reordering incrementally, which would let me not use --topo-order or
--date-order, and display commits as they come in.  I'll have to see
whether that turns out to be better overall than using --early-output.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  9:59                                   ` Paul Mackerras
@ 2007-11-13 18:53                                     ` Junio C Hamano
  2007-11-13 21:55                                       ` Paul Mackerras
  2007-11-16  7:30                                     ` Marco Costalba
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2007-11-13 18:53 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, Marco Costalba, Git Mailing List

Paul Mackerras <paulus@samba.org> writes:

> In fact it would be easier for me if, instead of getting the id of
> some random ancestor commit, I got an explicit indication to say
> "unknown parent", such as just a "-" in place of the id of the
> unknown parent(s).  Would that be doable?  I could then just not do
> the processing for any unknown parent, and make sure to do it when I
> see the final version of the commit.

I suspect that a "-" in place of a commit object name may not be
enough for your purpose, as the _number_ of parents can later
change in the later re-output.

I wonder if the presense of "incomplete" on the "Final output"
line is a good enough indication for that.  That is, until you
see "Final output: $N done", you will treat the parent
information as unreliable.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13 18:53                                     ` Junio C Hamano
@ 2007-11-13 21:55                                       ` Paul Mackerras
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Mackerras @ 2007-11-13 21:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Marco Costalba, Git Mailing List

Junio C Hamano writes:

> I suspect that a "-" in place of a commit object name may not be
> enough for your purpose, as the _number_ of parents can later
> change in the later re-output.

I don't mind if a commit that has "-" as one of its parents later
turns out to have more parents (i.e. the "-" can stand for zero or
more unknown parents).  I would be perturbed if a commit that didn't
have any "-" in its parent list later turned out to have a different
number of parents - but I don't think that's what you're implying, is
it?

> I wonder if the presense of "incomplete" on the "Final output"
> line is a good enough indication for that.  That is, until you
> see "Final output: $N done", you will treat the parent
> information as unreliable.

The easiest way for me to handle an unreliable parent is just to
ignore it.  But I can't ignore all the parents, because then I
wouldn't have a graph at all.

In other words, the presence of "incomplete" doesn't give me any clue
as to which particular parent ids are reliable.  As far as I can see,
git log internally knows when a parent id is unreliable (it's one
where it had to terminate the history simplification early), so it
shouldn't be hard to tell gitk about that.  And it would make my job a
lot easier.

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/2] Fix parent rewriting in --early-output
  2007-11-13  9:59                                   ` Paul Mackerras
  2007-11-13 18:53                                     ` Junio C Hamano
@ 2007-11-16  7:30                                     ` Marco Costalba
  1 sibling, 0 replies; 57+ messages in thread
From: Marco Costalba @ 2007-11-16  7:30 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On 11/13/07, Paul Mackerras <paulus@samba.org> wrote:
> Linus Torvalds writes:
>
> > However, while the parent is now correctly rewritten, it looks like gitk
> > is confused by this. Gitk will remember the original parent information,
> > even if a replay has given new parenthood information. Since the partial
> > early-output information is triggered by timing, this means that gitk will
> > show some totally random parent that quite possibly won't even be part of
> > the final commit set at all!
>
> Yep.  It will be a little complex to deal with that because there are
> bits of state that I set up for the parents, and if they're the wrong
> parents, I'll have to go back and undo that.
>

Sorry to comment on a gitk thread, but the problem of different
parents for the same sha while replaying was hitted by me also with
qgit when tring to implement --early-output

I don't know if i is suitable also for gitk but in qgit I changed the
match algorithm to check also for same parents and not only for same
sha during a replay to detect something has changed, so to catch
different parents cases early on and avoiding "going back" that is
complex.

IOW when git log print outs a replay qgit enter in a state where it
checks all the arrived sha against the already sent ones and at the
first mismatch flushes the tail at the point of mismatch.

The modified algorithm instead of chek just the sha checks also
parents info (because git log is called with --parents option this
ends up comparing the first line of the commit message).

Marco

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2007-11-16  7:31 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-28  1:39 New features in gitk Paul Mackerras
2007-10-28  5:34 ` Linus Torvalds
2007-10-28  7:11   ` Paul Mackerras
2007-10-28  7:36     ` Steffen Prohaska
2007-10-28 16:50     ` Linus Torvalds
2007-11-01 10:00       ` Paul Mackerras
2007-11-01 15:16         ` Linus Torvalds
2007-11-02 10:19           ` Paul Mackerras
2007-11-02 12:44             ` Marco Costalba
2007-11-02 15:42               ` Linus Torvalds
2007-11-02 16:50                 ` Marco Costalba
2007-11-02 18:16                 ` Linus Torvalds
2007-11-02 20:31                   ` [PATCH 0/2] History replay support Linus Torvalds
2007-11-02 20:32                     ` [PATCH 1/2] Simplify topo-sort logic Linus Torvalds
2007-11-02 20:35                     ` [PATCH 2/2] Support "history replay" for git log commands Linus Torvalds
2007-11-02 21:05                       ` Junio C Hamano
2007-11-02 21:17                         ` Linus Torvalds
2007-11-03  1:40                     ` [PATCH 0/2] History replay support Linus Torvalds
2007-11-03  7:56                       ` Marco Costalba
2007-11-03 18:11                       ` [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use Linus Torvalds
2007-11-03 19:52                         ` Marco Costalba
2007-11-04  3:06                         ` Paul Mackerras
2007-11-04  5:38                           ` Linus Torvalds
2007-11-04  7:10                             ` Paul Mackerras
2007-11-04  7:52                               ` Marco Costalba
2007-11-04 18:11                             ` Linus Torvalds
2007-11-04 20:12                               ` [PATCH 3/2] Enhance --early-output format Linus Torvalds
2007-11-05 20:24                                 ` Junio C Hamano
2007-11-05 20:47                                   ` Linus Torvalds
2007-11-05 21:22                                     ` Linus Torvalds
2007-11-05 21:35                                       ` Linus Torvalds
2007-11-13  4:58                                 ` [PATCH 4/2] Fix parent rewriting in --early-output Linus Torvalds
2007-11-13  5:43                                   ` Junio C Hamano
2007-11-13  6:46                                     ` Linus Torvalds
2007-11-13  7:16                                       ` Linus Torvalds
2007-11-13  7:53                                         ` Sven Verdoolaege
2007-11-13  8:48                                         ` Junio C Hamano
2007-11-13  8:01                                     ` Shawn O. Pearce
2007-11-13  8:24                                       ` Junio C Hamano
2007-11-13  9:59                                   ` Paul Mackerras
2007-11-13 18:53                                     ` Junio C Hamano
2007-11-13 21:55                                       ` Paul Mackerras
2007-11-16  7:30                                     ` Marco Costalba
2007-11-04  0:32                       ` [PATCH 0/2] History replay support Paul Mackerras
2007-11-02 18:17                 ` New features in gitk Johannes Schindelin
2007-11-02 15:03             ` Linus Torvalds
2007-11-01 11:37       ` Paul Mackerras
2007-11-01 15:47         ` Linus Torvalds
2007-11-01 16:21           ` Linus Torvalds
2007-10-28 18:32 ` Pierre Habouzit
2007-10-28 18:38   ` Mike Hommey
2007-10-28 23:13   ` Paul Mackerras
2007-10-29  6:20     ` Pierre Habouzit
2007-10-29  8:31       ` Jonathan del Strother
2007-10-29  6:24     ` Pierre Habouzit
2007-10-29 13:30 ` Han-Wen Nienhuys
2007-10-29 14:04 ` Michele Ballabio

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).