Can I have this, pretty please?

git.vger.kernel.org archive mirror

* Can I have this, pretty please?
@ 2007-08-12 13:23 David Kastrup
  2007-08-12 14:21 ` Steven Grimm
  2007-08-12 18:38 ` Linus Torvalds
  0 siblings, 2 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 13:23 UTC
  To: git

Hi,

I have more or less brought my system to a standstill by trying to
visualize branches and histories: the graphical tools really suck
resources.

So I have been thinking how I could use Emacs, and how to cache what
efficiently, and put out information just on-demand and so on.

And then it struck me: Emacs has a very efficient browser for linked
one-line information that can be expanded into complete changesets
with diffs inside.  It is called "Gnus".  A newsreader.

Mapping a repository into newsgroups (one per branch head?), complete
with threads, references, header display, article fetch (by
git-format-patch), Message Ids (=commit id) is much more
straightforward than creating an HTML server.  And it means that
everybody can use his favorite newsreader for navigating a repository.
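A minimal Python sketch of the header mapping proposed above: the commit id becomes the Message-ID and the parent ids become References. The `Commit` record and the `@none` domain suffix are assumptions made for illustration; a real server would fill these fields from git itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    sha: str
    author: str
    email: str
    subject: str
    parents: List[str] = field(default_factory=list)


def message_id(sha: str, domain: str = "none") -> str:
    # Message-IDs need an "@domain" part; "none" is an arbitrary placeholder.
    return f"<{sha}@{domain}>"


def article_headers(commit: Commit) -> Dict[str, str]:
    """Map one commit onto news-style article headers."""
    headers = {
        "Message-ID": message_id(commit.sha),
        "From": f"{commit.author} <{commit.email}>",
        "Subject": commit.subject,
    }
    # Root commits have no parents, so they start a new thread.
    if commit.parents:
        headers["References"] = " ".join(message_id(p) for p in commit.parents)
    return headers
```

A merge commit simply gets several ids in its References header; whether a newsreader displays that sensibly is the open question.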

Even if we are only talking about read-only access, this would be
simply great: it would at once make a whole bunch of existing tools
available that offer much better options in many respects than the
existing git-specific repository browsers for going through commit
histories.

And the possibilities for write access are at least intriguing.

So a lightweight nntp server serving git commits as articles would be
really cool.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Can I have this, pretty please?
  2007-08-12 13:23 Can I have this, pretty please? David Kastrup
@ 2007-08-12 14:21 ` Steven Grimm
  2007-08-12 16:40   ` David Kastrup
  2007-08-12 18:38 ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Steven Grimm @ 2007-08-12 14:21 UTC
  To: David Kastrup; +Cc: git

David Kastrup wrote:
> Mapping a repository into newsgroups (one per branch head?), complete
> with threads, references, header display, article fetch (by
> git-format-patch), Message Ids (=commit id) is much more
> straightforward than creating an HTML server.  And it means that
> everybody can use his favorite newsreader for navigating a repository.
>   

The news data model has one big problem. It is a tree structure (or 
rather, a set of tree structures). But git's ancestry graphs are not 
trees; a commit can have multiple parents as well as multiple children, 
and branches can join each other multiple times (via merges) as well as 
split off indefinitely.

I realize that you can give a list of parent message IDs in a news 
header, but I'm going to go out on a limb and guess that all existing 
newsreaders expect that list to be a linear series of messages going 
back toward the root of the thread (since that's all that ever occurs in 
real netnews), rather than an arbitrary DAG.
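To make the objection concrete, here is a small Python sketch. It assumes the usual Usenet reading of References (RFC 1036 builds a follow-up's References by appending the parent's Message-ID to the parent's own References, so the last entry is the immediate parent) and that parent ids are emitted in git's parent order. The three-commit history is invented.

```python
from typing import Dict, List, Optional


def newsreader_parent(references: List[str]) -> Optional[str]:
    # Usenet convention: References runs from the thread root down to the
    # immediate parent, so only the last entry is treated as the parent.
    return references[-1] if references else None


def dropped_parents(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For every merge commit (more than one parent), list the parents that
    a reader following that convention would not show as direct parents."""
    return {
        sha: ps[:-1]
        for sha, ps in parents.items()
        if len(ps) > 1
    }
```

For a merge `m` with parents `a` and `b`, a tree-oriented reader hangs `m` under `b` and treats `a` as a mere ancestor of `b`, which is exactly the information loss described here.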

Not saying it's a worthless idea, but I bet you will not be able to get 
an accurate display of a repository's history using a news reader 
without modifying it to deal with more complex ancestry structures.

-Steve

* Re: Can I have this, pretty please?
  2007-08-12 14:21 ` Steven Grimm
@ 2007-08-12 16:40   ` David Kastrup
  0 siblings, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 16:40 UTC
  To: Steven Grimm; +Cc: git

Steven Grimm <koreth@midwinter.com> writes:

> David Kastrup wrote:
>> Mapping a repository into newsgroups (one per branch head?), complete
>> with threads, references, header display, article fetch (by
>> git-format-patch), Message Ids (=commit id) is much more
>> straightforward than creating an HTML server.  And it means that
>> everybody can use his favorite newsreader for navigating a repository.
>
> The news data model has one big problem. It is a tree structure (or
> rather, a set of tree structures). But git's ancestry graphs are not
> trees; a commit can have multiple parents as well as multiple
> children, and branches can join each other multiple times (via merges)
> as well as split off indefinitely.
>
> I realize that you can give a list of parent message IDs in a news
> header, but I'm going to go out on a limb and guess that all
> existing newsreaders expect that list to be a linear series of
> messages going back toward the root of the thread (since that's all
> that ever occurs in real netnews), rather than an arbitrary DAG.

Well, I never claimed that the threading display would necessarily be
correct.  But with most readers, you can turn it off.  And you can
even tell git to turn off merges in the revision lists.  After all, it
has to linearize things like "whatchanged", too.

> Not saying it's a worthless idea, but I bet you will not be able to
> get an accurate display of a repository's history using a news
> reader without modifying it to deal with more complex ancestry
> structures.

It would still be lots more convenient for finding one's way around
patch series than the current model.  And putting out every branch
into a group of its own (which makes merges somewhat close to
crosspostings in that the reader will not usually try tracking the
changes in the other group) would help keep the oddities of any
single branch's display limited.
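A rough Python sketch of the one-group-per-branch idea. It assumes a group simply holds every commit reachable from its branch head (one possible policy among several); commits shared by several heads are what would show up as crossposts. Branch names and history are invented.

```python
from collections import defaultdict
from typing import Dict, List, Set


def ancestors(head: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All commits reachable from head, including head itself."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(parents.get(sha, []))
    return seen


def crossposts(heads: Dict[str, str],
               parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Put each branch's history into its own 'group' and report commits
    that end up in more than one group, like a crossposted article."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for branch, head in heads.items():
        for sha in ancestors(head, parents):
            groups[sha].append(branch)
    return {sha: names for sha, names in groups.items() if len(names) > 1}
```

With this naive policy all shared history is "crossposted"; a real design would probably want to limit each group, for example by date.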

At the current point of time, _all_ tools I have available for
browsing the history of a large project like Emacs suck _big_ time
compared to what my newsreader can handle.  They render my system
(256MB, about 1GHz) unusable if they work at all.  Being able to, say,
cherry-pick stuff together by marking articles and piping the bunch
through git-am (which is what I sometimes do with articles on the git
list) would be quite nice.

And the charm over an http server is that an nntp server can basically
just serve git data raw, without the necessity to add any dressing.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 13:23 Can I have this, pretty please? David Kastrup
  2007-08-12 14:21 ` Steven Grimm
@ 2007-08-12 18:38 ` Linus Torvalds
  2007-08-12 18:48   ` Linus Torvalds
  2007-08-12 19:10   ` David Kastrup
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-12 18:38 UTC
  To: David Kastrup; +Cc: git

On Sun, 12 Aug 2007, David Kastrup wrote:
>
> And then it struck me: Emacs has a very efficient browser for linked
> one-line information that can be expanded into complete changesets
> with diffs inside.  It is called "Gnus".  A newsreader.

A newsreader is mis-designed for all the same reasons SVN is misdesigned: 
it sees the messages (commits) as a _tree_.

Anybody who sees development as a tree is totally bogus by definition. It 
sees things forking off, but it doesn't see them merging. That's a 
fundamental and unfixable design bug.

Of course, for news, that's ok (it might be *nice* if you could reply to 
two messages and see it as a merge, but that's not how things work), so 
it wasn't a design mistake for _that_.

But to visualize a history, it's useless. Merges are as important as forks 
(arguably *more* important). "Forgetting" about merges is bad.

			Linus

* Re: Can I have this, pretty please?
  2007-08-12 18:38 ` Linus Torvalds
@ 2007-08-12 18:48   ` Linus Torvalds
  2007-08-12 19:28     ` Jon Smirl
  2007-08-12 19:29     ` David Kastrup
  2007-08-12 19:10   ` David Kastrup
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-12 18:48 UTC
  To: David Kastrup; +Cc: git

On Sun, 12 Aug 2007, Linus Torvalds wrote:
> 
> A newsreader is mis-designed for all the same reasons SVN is misdesigned: 
> it sees the messages (commits) as a _tree_.

Side note: the lack of this bug is what makes showing large histories 
graphically expensive in the first place.

In fact, in git, merges are "first-class" entities, and forking is 
something you have to infer from the history (by finding two commits with 
the same parent), and that's why calculating the graph is actually pretty 
expensive: when you do so, you have to keep all the commit relationships 
in memory, and you basically have to sort it topologically.
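A minimal Python sketch of the work described above, assuming the history is available as a dictionary from commit id to parent ids. Children (and therefore forks) have to be inferred by inverting the parent links, and a topological order needs all of them at once. This uses Kahn's algorithm for illustration; it is not git's or gitk's actual code, and real `--topo-order` output also keeps lines of development together, which this sketch does not attempt.

```python
from collections import deque
from typing import Dict, List


def children_of(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the parent links. A commit with two or more children is a
    fork point, which git never records directly."""
    children: Dict[str, List[str]] = {sha: [] for sha in parents}
    for sha, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(sha)
    return children


def topo_order(parents: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: emit a commit only once all its children have been
    emitted, so every commit precedes its parents. Without the whole parent
    map we could never be sure a commit has no children still unseen."""
    children = children_of(parents)
    waiting = {sha: len(kids) for sha, kids in children.items()}
    ready = deque(sorted(sha for sha, n in waiting.items() if n == 0))
    order: List[str] = []
    while ready:
        sha = ready.popleft()
        order.append(sha)
        for p in parents.get(sha, []):
            waiting[p] -= 1
            if waiting[p] == 0:
                ready.append(p)
    return order
```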

So even if you don't want to show the graph itself (and just add 
references to allow the user to walk to parents/children manually), you'd 
still have to calculate - and keep track of - the commit relationships. 
And I suspect that's what makes gitk and other visualizers take time.

I think one solution is to limit the size of the visualization by date or
number, ie if you want to see history, it's often useful to do things like

	gitk --since=10.weeks.ago

to see just the "recent" commits. That very fundamentally makes the 
problem much cheaper, because you simply have to generate the graph for a 
much smaller set of commits.
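The effect of a cutoff like `--since=10.weeks.ago` can be illustrated with a toy Python function. It is a sketch under the assumption that a date is already known for every commit; it is not how git implements the option, only a picture of why the graph to lay out shrinks.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def limit_by_date(parents: Dict[str, List[str]],
                  dates: Dict[str, datetime],
                  cutoff: datetime) -> Dict[str, List[str]]:
    """Keep only commits at or after cutoff, and drop edges that point
    outside that window, so the graph to lay out is much smaller."""
    kept = {sha for sha, when in dates.items() if when >= cutoff}
    return {sha: [p for p in parents.get(sha, []) if p in kept] for sha in kept}


now = datetime(2007, 8, 12)
ten_weeks_ago = now - timedelta(weeks=10)
```

Edges into the dropped part of history simply disappear, which is also what a date-limited gitk view shows.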

I used to think that we should just default to some reasonable value, but 
then we optimized the hell out of git-rev-list and Paul fixed a number of 
scalability issues in gitk too, so it kind of fell by the wayside because 
it wasn't as important any more. But if you have a huge project with lots 
of history, the right answer may well be to make gitk *default* to using 
something like "show only the last year unless some revision limiting has 
been done explicitly".

IOW, showing the whole history for a big project is simply pretty 
expensive. If you have a hundred thousand commits, just keeping track of 
the tree structure *is* going to take megabytes and megabytes of data. 
Limiting the size of the problem is usually a really good solution, 
especially since most people tend to care about what happened in the last 
few days, not what happened five months ago.

			Linus

* Re: Can I have this, pretty please?
  2007-08-12 18:38 ` Linus Torvalds
  2007-08-12 18:48   ` Linus Torvalds
@ 2007-08-12 19:10   ` David Kastrup
  2007-08-12 19:24     ` Linus Torvalds
  2007-08-12 20:02     ` Jeff King
  1 sibling, 2 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 19:10 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 12 Aug 2007, David Kastrup wrote:
>>
>> And then it struck me: Emacs has a very efficient browser for linked
>> one-line information that can be expanded into complete changesets
>> with diffs inside.  It is called "Gnus".  A newsreader.
>
> A newsreader is mis-designed for all the same reasons SVN is
> misdesigned: it sees the messages (commits) as a _tree_.

In the first place, it sees linked messages.  They usually correspond
to something treeish, but a newsreader that would barf when they don't
would be unusable.  Newsreaders actually have to deal with stupid
things like _loops_ in message referrals without going into a tizzy.
Those things happen in Usenet.

> Anybody who sees development as a tree is totally bogus by
> definition. It sees things forking off, but it doesn't see them
> merging. That's a fundamental and unfixable design bug.

It is not inherent in NNTP.  It depends on the particular newsreader,
and for pretty much all of them, you can turn off threaded display if
it disturbs you.

> But to visualize a history, it's useless.

Not half as useless as existing git-specific tools.  They thrash my
computer to death on serious sized trees.  Putting every branch into a
newsgroup of its own, in contrast, together with the usual header
search and refinement options, would be _much_ _much_ faster for
accessing a particular patch.

I'll probably be able to create a Gnus _backend_ for this sort of
setup (there are even backends for directory browsing: most files
become articles written by their owner that either are plain text, or
that contain their file contents as an attachment -- far crazier
than a git commit tree).  But an nntp server would make the idea
usable for more than just Emacs users, and it would allow a much more
convenient 'what happened on the "next" branch in the last few days'
overview than existing tools.

It does not lend itself well to serving trees and blobs (even
though one could superficially rely on a rigid tree topology there):
newsreaders just don't match the natural way of accessing them (Gnus
offers that for files, but it plainly is not much use compared to a
dedicated directory browser).

But for commits and patches, one group per branch?  That would be
fine.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 19:10   ` David Kastrup
@ 2007-08-12 19:24     ` Linus Torvalds
  2007-08-12 19:46       ` David Kastrup
  2007-08-12 20:02     ` Jeff King
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-12 19:24 UTC
  To: David Kastrup; +Cc: git

On Sun, 12 Aug 2007, David Kastrup wrote:
>
> > But to visualize a history, it's useless.
> 
> Not half as useless as existing git-specific tools.  They thrash my
> computer to death on serious sized trees.

So, use "git log --pretty=oneline" instead, which doesn't have the 
expense.
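A hedged Python sketch of the listing half of such a tool: it runs `git log --pretty=oneline`, which prints one commit per line as the full commit id followed by the subject, and splits each line into an (id, subject) pair that a browser could display and later expand. The function names are made up for illustration.

```python
import subprocess
from typing import List, Tuple


def parse_oneline(text: str) -> List[Tuple[str, str]]:
    """Split "<commit id> <subject>" lines into (id, subject) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, _, subject = line.partition(" ")
        entries.append((sha, subject))
    return entries


def oneline_log(repo: str = ".") -> List[Tuple[str, str]]:
    # Requires a git binary on PATH; raises CalledProcessError on failure.
    result = subprocess.run(
        ["git", "log", "--pretty=oneline"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_oneline(result.stdout)
```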

I don't see why you think that using nntp would help anything. The 
_problem_ is still the same one, of calculating full reachability. It 
didn't go away just because you changed to another intermediate protocol.

Yes, you could perhaps use the nntp caching, but I don't know if you've 
noticed: the reason news servers tend to expire old messages is that a 
news reader and the NNTP protocol won't be able to handle huge histories 
either.

And if you just want the "expire" feature, then you might as well just 
make git date-limit things for you, ie "gitk --since=last.week"

			Linus

* Re: Can I have this, pretty please?
  2007-08-12 18:48   ` Linus Torvalds
@ 2007-08-12 19:28     ` Jon Smirl
  2007-08-12 19:45       ` Linus Torvalds
  2007-08-12 19:48       ` David Kastrup
  2007-08-12 19:29     ` David Kastrup
  1 sibling, 2 replies; 29+ messages in thread
From: Jon Smirl @ 2007-08-12 19:28 UTC
  To: Linus Torvalds; +Cc: David Kastrup, git

On 8/12/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> IOW, showing the whole history for a big project is simply pretty
> expensive. If you have a hundred thousand commits, just keeping track of
> the tree structure *is* going to take megabytes and megabytes of data.
> Limiting the size of the problem is usually a really good solution,
> especially since most people tend to care about what happened in the last
> few days, not what happened five months ago.

Could the topological graph for a packfile be computed at pack time
and stored in the packfile so that gitk doesn't have to keep
recomputing it? Does it work to merge multiple precomputed graphs
retrieved from the pack files?

-- 
Jon Smirl
jonsmirl@gmail.com

* Re: Can I have this, pretty please?
  2007-08-12 18:48   ` Linus Torvalds
  2007-08-12 19:28     ` Jon Smirl
@ 2007-08-12 19:29     ` David Kastrup
  2007-08-12 19:51       ` Uwe Kleine-König
  2007-08-12 19:53       ` Linus Torvalds
  1 sibling, 2 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 19:29 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 12 Aug 2007, Linus Torvalds wrote:
>> 
>> A newsreader is mis-designed for all the same reasons SVN is misdesigned: 
>> it sees the messages (commits) as a _tree_.
>
> Side note: the lack of this bug is what makes showing large
> histories graphically expensive in the first place.

Not really.

dak@lola:/home/tmp/emacs$ time git-rev-list --parents --topo-order --all>/dev/null

real    0m9.042s
user    0m8.801s
sys     0m0.168s

This does not even start to _think_ of swapping.

> So even if you don't want to show the graph itself (and just add
> references to allow the user to walk to parents/children manually),
> you'd still have to calculate - and keep track of - the commit
> relationships.  And I suspect that's what makes gitk and other
> visualizers take time.

It does not bother git-rev-list.  What takes them time is that they
are simply not written with insane amounts of data in mind.

And newsreaders are.

> IOW, showing the whole history for a big project is simply pretty
> expensive. If you have a hundred thousand commits, just keeping
> track of the tree structure *is* going to take megabytes and
> megabytes of data.  Limiting the size of the problem is usually a
> really good solution, especially since most people tend to care
> about what happened in the last few days, not what happened five
> months ago.

And newsreaders, for that reason, have a set of strategies for
limiting the size of the problem (and changing the limits on the fly
as needed) as well as being efficient with handling it.  They have to
be _good_ at dealing with that amount of data, or they would have
fallen by the wayside.

As opposed to gitk and other visualization tools, newsreaders usually
have fast and convenient keyboard navigation and an article window
where serious amounts of text can be viewed with readable fonts.

If you try selecting a more readable font for gitk, you are limited to
selecting between fonts called something starting with the letters "a"
to "c" since the font menu runs off the screen after that.

I find that I can't get much use out of gitweb: like webmail, it is
simply too little hands-on for getting at the right stuff efficiently:
it is all too point-and-clicky instead of offering direct keyboard access.

So at least for my preferred human-computer interface style, an
NNTP-browsable repository would come in quite handy.  I'll probably fudge
something in Gnus (which has the advantage that I _can_ create more
direct links to files and trees), but I expect the usefulness of the
concept would stretch to actual servers as well.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 19:28     ` Jon Smirl
@ 2007-08-12 19:45       ` Linus Torvalds
  2007-08-12 19:48       ` David Kastrup
  1 sibling, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-12 19:45 UTC
  To: Jon Smirl; +Cc: David Kastrup, git

On Sun, 12 Aug 2007, Jon Smirl wrote:
> 
> Could the topological graph for a packfile be computed at pack time
> and stored in the packfile so that gitk doesn't have to keep
> recomputing it?

For a single (full) pack, with no loose objects, sure, you could cache it. 
But then you might as well just cache it all outside git instead.

>		 Does it work to merge multiple precomputed graphs
> retrieved from the pack files?

No. For multiple packs, there aren't even any "precomputed graphs". You 
could probably do it with some fragment thing, and then be really clever 
putting all the fragments together, but I think it's complex as hell.

		Linus

* Re: Can I have this, pretty please?
  2007-08-12 19:24     ` Linus Torvalds
@ 2007-08-12 19:46       ` David Kastrup
  2007-08-12 19:59         ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: David Kastrup @ 2007-08-12 19:46 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 12 Aug 2007, David Kastrup wrote:
>>
>> > But to visualize a history, it's useless.
>> 
>> Not half as useless as existing git-specific tools.  They thrash my
>> computer to death on serious sized trees.
>
> So, use "git log --pretty=oneline" instead, which doesn't have the
> expense.

Yes, like managing a manual with grep is all one needs.  git log
--pretty=oneline provides just the commit headers, but offers no way
to jump into the commits themselves and back easily.

> I don't see why you think that using nntp would help anything. The
> _problem_ is still the same one, of calculating full
> reachability. It didn't go away just because you changed to another
> intermediate protocol.

Newsreaders are designed _not_ to calculate full reachability.  They
would be unusable otherwise.  They have reasonable heuristics for
dealing with partial information and getting more only when needed.

> Yes, you could perhaps use the nntp caching, but I don't know if
> you've noticed: the reason news servers tend to expire old messages
> is that a news reader and the NNTP protocol won't be able to handle
> huge histories either.

It's actually more of a storage problem.  A pretty normal general
news spool with about 2 weeks of storage requires several gigabytes of
disk space already.

> And if you just want the "expire" feature, then you might as well
> just make git date-limit things for you, ie "gitk --since=last.week"

I actually don't want any "expire feature".  Expiry happens at the
server, and git is quite efficient enough at "storing" the articles
that expiry appears pointless (unless one puts all of Sourceforge's
recent commit histories onto an NNTP spool, probably an interesting
experiment).

"Marked as read" could conceivably come in handy for keeping on top of
large projects, but basically I'd already be suited fine with
ephemeral groups which look the same whenever I visit them again.

The thing with newsreaders is that it is easy to say "since last
week", and then just look at a few more earlier articles.  This sort
of functionality has been honed and improved over decades.  If I can
avoid starting fresh, with a new user interface and the same old
problems, that helps.  Nobody wants tools that require to tell them
when you start them just how much information you'll ever want from
them.

That's why pagers are so convenient with real pipes as
compared to temporary files: you can cut off the data generating
process when you decide you don't need more, and you don't need to
wait until the whole data is there.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 19:28     ` Jon Smirl
  2007-08-12 19:45       ` Linus Torvalds
@ 2007-08-12 19:48       ` David Kastrup
  1 sibling, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 19:48 UTC
  To: Jon Smirl; +Cc: Linus Torvalds, git

"Jon Smirl" <jonsmirl@gmail.com> writes:

> On 8/12/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> IOW, showing the whole history for a big project is simply pretty
>> expensive. If you have a hundred thousand commits, just keeping track of
>> the tree structure *is* going to take megabytes and megabytes of data.
>> Limiting the size of the problem is usually a really good solution,
>> especially since most people tend to care about what happened in the last
>> few days, not what happened five months ago.
>
> Could the topological graph for a packfile be computed at pack time
> and stored in the packfile so that gitk doesn't have to keep
> recomputing it? Does it work to merge multiple precomputed graphs
> retrieved from the pack files?

The parent information basically _is_ a bare-bones specification of
the topological graph.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 19:29     ` David Kastrup
@ 2007-08-12 19:51       ` Uwe Kleine-König
  2007-08-12 20:04         ` David Kastrup
  2007-08-12 19:53       ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Uwe Kleine-König @ 2007-08-12 19:51 UTC
  To: David Kastrup; +Cc: Linus Torvalds, git

David Kastrup wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> > On Sun, 12 Aug 2007, Linus Torvalds wrote:
> >> 
> >> A newsreader is mis-designed for all the same reasons SVN is misdesigned: 
> >> it sees the messages (commits) as a _tree_.
> >
> > Side note: the lack of this bug is what makes showing large
> > histories graphically expensive in the first place.
> 
> Not really.
> 
> dak@lola:/home/tmp/emacs$ time git-rev-list --parents --topo-order --all>/dev/null
> 
> real    0m9.042s
> user    0m8.801s
> sys     0m0.168s
> 
> This does not even start to _think_ of swapping.

rev-list doesn't try to draw a line from each commit to its parents.
That's the really intensive part.  So when gitk reads

	d56871cb0e6ceeca8e5435ff95409d78bed014f0 a046fe0cb8697bc97993b2e609688ff5e89e3e9

it must remember this line at least until it sees a line starting with
a046fe0cb8697bc97993b2e609688ff5e89e3e9.
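The bookkeeping described here can be sketched in a few lines of Python. It assumes input in `git rev-list --parents --topo-order` form, where each line is a commit id followed by its parent ids and every commit appears before its parents; the short ids below are invented.

```python
from typing import Iterable, Iterator


def open_lines(rev_list_lines: Iterable[str]) -> Iterator[int]:
    """Read "commit parent1 parent2 ..." lines and yield, after each one,
    how many parent ids are still waiting for their own line to appear,
    i.e. how many graph lines a viewer must keep open at that point."""
    waiting = set()
    for line in rev_list_lines:
        sha, *parents = line.split()
        waiting.discard(sha)
        waiting.update(parents)
        yield len(waiting)
```

For the input lines `m a b`, `b a` and `a`, the number of open lines goes 2, 1, 0.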

Best regards
Uwe

-- 
Uwe Kleine-König

dd if=/proc/self/exe bs=1 skip=1 count=3 2>/dev/null

* Re: Can I have this, pretty please?
  2007-08-12 19:29     ` David Kastrup
  2007-08-12 19:51       ` Uwe Kleine-König
@ 2007-08-12 19:53       ` Linus Torvalds
  2007-08-12 20:10         ` David Kastrup
  2007-08-13  0:22         ` Paul Mackerras
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-12 19:53 UTC
  To: David Kastrup; +Cc: Git Mailing List, Paul Mackerras

On Sun, 12 Aug 2007, David Kastrup wrote:
> 
> dak@lola:/home/tmp/emacs$ time git-rev-list --parents --topo-order --all>/dev/null
> 
> real    0m9.042s
> user    0m8.801s
> sys     0m0.168s
> 
> This does not even start to _think_ of swapping.

Ok, good. That's the part I care about most. Nine seconds is still a long 
time to wait for the window to come up, so I'd still suggest at least 
thinking about limiting it, but..

> It does not bother git-rev-list.  What takes them time is that they
> are simply not written with insane amounts of data in mind.

Well, gitk has certainly had performance problems in the past, but they've 
been fixable. I think this should just be fixed too. And if the rev-list 
is fast enough, then the gitk fix may well be to just not compute the 
*whole* history - ie the solution may be as simple as stopping the 
background job that does all the graph calculations when it is (pick a 
point at random) something like a thousand commits into the graph, and the 
user hasn't scrolled down..
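Stopping after about a thousand commits until the user scrolls maps naturally onto lazy iteration. The Python sketch below is hypothetical (it is not gitk code, which is written in Tcl/Tk) and uses a counter in place of real layout work to show that nothing past the requested page gets processed.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def layout_rows(lines: Iterable[str], stats: Dict[str, int]) -> Iterator[str]:
    # Stand-in for the per-commit graph work; because this is a generator,
    # it only runs when a row is actually requested.
    for line in lines:
        stats["processed"] += 1
        yield line.split()[0]


def paged(lines: Iterable[str],
          page: int = 1000) -> Tuple[Callable[[], List[str]], Dict[str, int]]:
    """Return a function that lays out the next `page` commits on demand,
    e.g. when the user scrolls, plus a counter of commits processed so far."""
    stats = {"processed": 0}
    rows = layout_rows(lines, stats)

    def next_page() -> List[str]:
        return list(islice(rows, page))

    return next_page, stats
```

Each call to `next_page()` resumes the same generator, so work already done is never repeated.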

Gitk is already incremental (ie it shows the top of the graph long before 
it has drawn it all), so that should not be fundamentally hard. Paul has 
been pretty good about these things when we've had problems in the past.

Paul added to Cc. Paul?

> And newsreaders, for that reason, have a set of strategies for
> limiting the size of the problem (and changing the limits on the fly
> as needed) as well as being efficient with handling it.  They have to
> be _good_ at dealing with that amount of data, or they would have
> fallen by the wayside.

The reason I argue against this is that (a) the graph really is very 
useful. It tells you things that you can't reasonably visualize any other 
way. And (b) I think what you suggest wouldn't be trivial at all.

But if you want to make a virtual NNTP server that exposes the 
git-rev-list output, go right ahead.

I don't think it should be needed (ie I think we should be able to handle 
this issue other ways), and I don't think it's as good as the alternatives 
(because I don't think any client will ever be able to show the history 
well), but hey, alternatives are fine.

		Linus

* Re: Can I have this, pretty please?
  2007-08-12 19:46       ` David Kastrup
@ 2007-08-12 19:59         ` Linus Torvalds
  2007-08-12 20:30           ` David Kastrup
  2007-08-12 20:58           ` Govind Salinas
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-12 19:59 UTC
  To: David Kastrup; +Cc: git

On Sun, 12 Aug 2007, David Kastrup wrote:
> >
> > So, use "git log --pretty=oneline" instead, which doesn't have the
> > expense.
> 
> Yes, like managing a manual with grep is all one needs.  git log
> --pretty=oneline provides just the commit headers, but offers no way
> to jump into the commits themselves and back easily.

You misunderstand.

I was suggesting you do a *tool* that bases its listing on 
--pretty=oneline, and then goes from there.

If you don't show the graph anyway, all the complex and expensive things 
that "git-rev-list --topo-order" does are pretty much totally useless. 
You're going to show the commits as a list anyway, and then when you 
*select* one commit for closer inspection, you can then try to do a better 
job at that point of doing the reachability (ie parenthood is trivial, and 
the branch reachability is cheap if it's close to the tip of the tree, 
which it would almost always be).
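A sketch of that on-demand step in Python, assuming a parent map is at hand: looking up a selected commit's parents is a single dictionary access, and deciding which branches contain it is a breadth-first walk from each tip, which stops quickly when the commit is near the tip. Names and data are illustrative only.

```python
from collections import deque
from typing import Dict, List, Optional


def reachable_from(tip: str, target: str, parents: Dict[str, List[str]],
                   max_depth: Optional[int] = None) -> bool:
    """Breadth-first walk from a branch tip towards the roots. Targets near
    the tip are found after visiting only a few commits; max_depth bounds
    the walk if the caller wants to give up early."""
    seen = {tip}
    queue = deque([(tip, 0)])
    while queue:
        sha, depth = queue.popleft()
        if sha == target:
            return True
        if max_depth is not None and depth >= max_depth:
            continue
        for p in parents.get(sha, []):
            if p not in seen:
                seen.add(p)
                queue.append((p, depth + 1))
    return False


def branches_containing(target: str, heads: Dict[str, str],
                        parents: Dict[str, List[str]]) -> List[str]:
    return [name for name, tip in heads.items()
            if reachable_from(tip, target, parents)]
```

Only the selected commit pays for this walk; the list view itself never needs it.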

The real problem with the topological sort is that it requires you to have 
the full history. That not only makes everything pretty big, it also means 
that the startup cost is bad, since you can't do things incrementally.

But if you have a client that is incremental anyway, almost all of that 
goes away.

			Linus

* Re: Can I have this, pretty please?
  2007-08-12 19:10   ` David Kastrup
  2007-08-12 19:24     ` Linus Torvalds
@ 2007-08-12 20:02     ` Jeff King
  2007-08-12 20:09       ` Jeff King
  2007-08-12 21:51       ` David Kastrup
  1 sibling, 2 replies; 29+ messages in thread
From: Jeff King @ 2007-08-12 20:02 UTC
  To: David Kastrup; +Cc: Linus Torvalds, git

On Sun, Aug 12, 2007 at 09:10:24PM +0200, David Kastrup wrote:

> I'll probably be able to create a Gnus _backend_ for this sort of
> setup (there are even backends for directory browsing: most files

You can somewhat prototype this by just dumping the commits to an mbox
(sorry for the long lines):

git-log \
  --pretty=format:'From %H Mon Sep 17 00:00:00 2001%nFrom: %an <%ae>%nDate: %ad%nSubject: %s%nMessage-ID: <%H@none>%nReferences: %P%n%n%b' \
  | perl -pe 's/References: (.*)/"References: " . join(" ", map { "<" . $_ . "\@none>" } split \/ \/, $1)/e' \
  >mbox

Looking at an appreciably large chunk of history means that you will be
very far down in a subthread. mutt, at least, doesn't display this in a
very readable way. But my point is that you are probably better off looking
at a couple of different view strategies just by dumping and tweaking
the references relationships (which really only takes about a second for
me on the git.git repository).

Also, have you tried looking at tig (make sure to try a recent version
and use the 'g' command to turn on the graph display)? I think it is
similar to what you are looking for, and I have found it to be very fast
(both in implementation and in usability).

-Peff

* Re: Can I have this, pretty please?
  2007-08-12 19:51       ` Uwe Kleine-König
@ 2007-08-12 20:04         ` David Kastrup
  2007-08-12 20:21           ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: David Kastrup @ 2007-08-12 20:04 UTC
  To: Uwe Kleine-König; +Cc: Linus Torvalds, git

Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> writes:

> David Kastrup wrote:
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> > On Sun, 12 Aug 2007, Linus Torvalds wrote:
>> >> 
>> >> A newsreader is mis-designed for all the same reasons SVN is misdesigned: 
>> >> it sees the messages (commits) as a _tree_.
>> >
>> > Side note: the lack of this bug is what makes showing large
>> > histories graphically expensive in the first place.
>> 
>> Not really.
>> 
>> dak@lola:/home/tmp/emacs$ time git-rev-list --parents --topo-order --all>/dev/null
>> 
>> real    0m9.042s
>> user    0m8.801s
>> sys     0m0.168s
>> 
>> This does not even start to _think_ of swapping.
> rev-list doesn't try to draw a line from each commit to its parents.

Well, that's what --topo-order is somewhat about, but it might
actually not do much together with --all.

> That's the really intensive part.  So when gitk reads
>
> 	d56871cb0e6ceeca8e5435ff95409d78bed014f0 a046fe0cb8697bc97993b2e609688ff5e89e3e9
>
> it must remember this line at least until it sees a line starting with
> a046fe0cb8697bc97993b2e609688ff5e89e3e9.
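
The bookkeeping Uwe describes can be sketched in a few lines of Python. This is an illustration, not gitk's Tcl code. It reads `git rev-list --parents` style lines (commit id first, then its parent ids) and reports, after each row, how many parent ids are still awaited. That count roughly corresponds to the number of lines a graph has to carry past that row. The sketch counts each awaited parent once, even if several children point at it, and `open_lines` is a hypothetical name.

```python
from typing import Iterable, Iterator, Set, Tuple


def open_lines(rows: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (commit, number of parent ids still awaited) for each row.

    Each row is 'commit parent1 parent2 ...', newest first, as printed by
    `git rev-list --parents`.
    """
    pending: Set[str] = set()  # parent ids announced but not yet seen
    for row in rows:
        fields = row.split()
        if not fields:
            continue
        commit, parents = fields[0], fields[1:]
        pending.discard(commit)  # the line leading into this commit ends here
        pending.update(parents)  # each new parent opens a line downwards
        yield commit, len(pending)
```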

20 bytes of payload for a commit number.  Making a usable hashing data
structure for it adds perhaps another 20 bytes.  Links to all parents
are 4 bytes each.  All in all, we won't need more than 64 bytes per
commit.  Take 100000 of them, and you are at 6.4MB.  And that is not
taking into account that you can let git-name-rev cut the information
retrieval down much much more, and just get the rest of the
information when it is actually moved on-screen.  I don't actually
_want_ to see 50 parallel lines from bottom to top of screen obscuring
my branch display and taking away all the screen real estate: that is
completely useless information.  Pack the branches away into a cable
pipe and let them come out isolated again only when they are actually
involved on the screen.

There is no necessity to prerender/layout 50 yards of graphing.
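
The arithmetic above can be checked with a small Python helper. The per-commit figures (20 bytes of SHA-1, about 20 bytes of hash-table overhead, 4 bytes per parent link) are David's estimates, not measurements, and `graph_bytes` is a made-up name. The `pointer` parameter lets you try 8-byte links, as on a 64-bit machine.

```python
def graph_bytes(commits: int, avg_parents: float = 1.0, sha1: int = 20,
                hash_overhead: int = 20, pointer: int = 4) -> float:
    """Estimated bytes needed to keep just the commit graph in memory."""
    # SHA-1 payload + hash-table bookkeeping + one link per parent.
    per_commit = sha1 + hash_overhead + pointer * avg_parents
    return commits * per_commit


BUDGET_PER_COMMIT = 64  # David's upper bound per commit

print(graph_bytes(100_000))             # 4400000.0 (one parent, 4-byte links)
print(graph_bytes(100_000, pointer=8))  # 4800000.0 (one parent, 8-byte links)
print(100_000 * BUDGET_PER_COMMIT)      # 6400000 (the 6.4MB upper bound)
```

Even a two-parent merge with 8-byte links comes to 56 bytes per commit, which is still inside the 64-byte budget.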

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 20:02     ` Jeff King
@ 2007-08-12 20:09       ` Jeff King
  2007-08-12 21:51       ` David Kastrup
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff King @ 2007-08-12 20:09 UTC (permalink / raw)
  To: David Kastrup; +Cc: Linus Torvalds, git

On Sun, Aug 12, 2007 at 04:02:58PM -0400, Jeff King wrote:

> git-log \
>   --pretty=format:'From %H Mon Sep 17 00:00:00 2001%nFrom: %an <%ae>%nDate: %ad%nSubject: %s%nMessage-ID: <%H@none>%nReferences: %P%n%n%b' \

Er, sorry, that should be '%aD' in the date.

-Peff

* Re: Can I have this, pretty please?
  2007-08-12 19:53       ` Linus Torvalds
@ 2007-08-12 20:10         ` David Kastrup
  2007-08-13  0:22         ` Paul Mackerras
  1 sibling, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 20:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Paul Mackerras

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But if you want to make a virtual NNTP server that exposes the
> git-rev-list output, go right ahead.
>
> I don't think it should be needed (ie I think we should be able to
> handle this issue other ways),

Sure.  But being able to handle it in a way with which I as well as my
tools are already fluent is an advantage, for me.  "One separate
idiosyncratic tool for every job" does not cut it for me with regard
to user interfaces.  Which is part of the reason I am an Emacs user.
And even though you may want to see that breed interned, some of them
do useful things at times.

> and I don't think it's as good as the alternatives (because I don't
> think any client will ever be able to show the history well), but
> hey, alternatives are fine.

Yup.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 20:04         ` David Kastrup
@ 2007-08-12 20:21           ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-12 20:21 UTC (permalink / raw)
  To: David Kastrup; +Cc: Uwe Kleine-König, git

On Sun, 12 Aug 2007, David Kastrup wrote:
> >
> > rev-list doesn't try to draw a line from each commit to its parents.
> 
> Well, that's what --topo-order is somewhat about, but it might
> actually not do much together with --all.

No, --topo-order works with --all too. In fact, to some degree, it's 
*especially* useful with --all, since having multiple tips makes the whole 
topological sort all the more interesting, and also usually makes the end 
result more interesting (ie it's often much more interesting to visualize
two or more branches together, just to see the *relationships* between the
branches, and see what is shared).

And yes, it keeps track of every single commit, and computes the 
relationships between them. So it does indeed "draw the line", except it 
can do so in a rather dense and optimized set of data structures.
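
As a rough illustration of what a topological sort over several tips involves, here is Kahn's algorithm in Python over an in-memory parent map. It only reproduces the guarantee that no commit is listed before all of its children. It is not git's implementation, and it makes no attempt at git's extra effort to avoid intermixing commits from different lines of history. The function name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence


def topo_order(parents: Dict[str, List[str]], tips: Sequence[str]) -> List[str]:
    """List every commit reachable from `tips`, children before parents."""
    # 1. Walk the history from all tips to find the reachable commits.
    reachable, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in reachable:
            reachable.add(commit)
            stack.extend(parents.get(commit, []))
    # 2. Count how many reachable children point at each commit.
    waiting = {commit: 0 for commit in reachable}
    for commit in reachable:
        for parent in parents.get(commit, []):
            waiting[parent] += 1
    # 3. Emit a commit once all of its children are out.  Only tips can
    #    start with a zero count, because every other reachable commit
    #    was reached through at least one child.
    ready = deque(t for t in dict.fromkeys(tips) if waiting[t] == 0)
    order: List[str] = []
    while ready:
        commit = ready.popleft()
        order.append(commit)
        for parent in parents.get(commit, []):
            waiting[parent] -= 1
            if waiting[parent] == 0:
                ready.append(parent)
    return order
```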

(That's one reason I love coding in C: it may be more effort, but you can 
tune your data structures in ways you seldom can in higher-level 
languages, and git-rev-list and the object representation is some of the 
most tuned code in git).

> 20 bytes of payload for a commit number.  Making a usable hashing data
> structure for it adds perhaps another 20 bytes.  Links to all parents
> are 4 bytes each.  All in all, we won't need more than 64 bytes per
> commit.

Yeah, that's the rough ballpark (except for 64-bit architectures, the 
links are all 8 bytes, but we're pretty careful). See "object.h" for most 
of the details.

			Linus

* Re: Can I have this, pretty please?
  2007-08-12 19:59         ` Linus Torvalds
@ 2007-08-12 20:30           ` David Kastrup
  2007-08-12 20:58           ` Govind Salinas
  1 sibling, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 20:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 12 Aug 2007, David Kastrup wrote:
>> >
>> > So, use "git log --pretty=oneline" instead, which doesn't have the
>> > expense.
>> 
>> Yes, like managing a manual with grep is all one needs.  git log
>> --pretty=oneline provides just the commit headers, but offers no way
>> to jump into the commits themselves and back easily.
>
> You misunderstand.
>
> I was suggesting you do a *tool* that bases its listing on 
> --pretty=oneline, and then goes from there.

Full agreement here.  My tool was going to pass those lines off as
article headers.
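
A tool of that kind starts by splitting each `--pretty=oneline` line into the commit id and the subject. The following is a minimal Python sketch that assumes 40-hex SHA-1 ids and no `--decorate` annotations in the output. `Header` and `parse_oneline` are illustrative names.

```python
from typing import NamedTuple, Optional

HEX = set("0123456789abcdef")


class Header(NamedTuple):
    commit: str   # full SHA-1; could double as the article's Message-ID
    subject: str  # first line of the commit message


def parse_oneline(line: str) -> Optional[Header]:
    """Split one `git log --pretty=oneline` line into id and subject."""
    sha1, _, subject = line.rstrip("\n").partition(" ")
    if len(sha1) != 40 or not set(sha1) <= HEX:
        return None  # not a commit line; let the caller skip it
    return Header(sha1, subject)
```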

> If you don't show the graph anyway, all the complex and expensive
> things that "git-rev-list --topo-order" does is pretty much totally
> useless.  You're going to show the commits as a list anyway, and
> then when you *select* one commit for closer inspection, you can
> then try to do a better job at that point of doing the reachability
> (ie parenthood is trivial, and the branch reachability is cheap if
> it's close to the tip of the tree, which it would almost always be).

Quite so.  I was thinking of doing such a tool inside of Emacs (after
all, Emacs is the most extensive junkyard for prototyping editing
solutions that can be had) and was weighing options for what kind of
stuff I would need to be doing to have it work efficiently, offering
access to everything without wasting unnecessary time on those things
that don't interest me at the moment.
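
The on-demand reachability check Linus mentions can be sketched as a breadth-first walk along parent links. The walk stops as soon as the selected commit turns up, so a commit near the branch tip costs only a few steps. This Python version works on a hypothetical in-memory parent map. In current git, `git merge-base --is-ancestor <commit> <branch>` answers the same question from the command line.

```python
from collections import deque
from typing import Dict, List


def reachable_from(tip: str, target: str, parents: Dict[str, List[str]]) -> bool:
    """True if `target` is `tip` itself or one of its ancestors."""
    seen, queue = {tip}, deque([tip])
    while queue:
        commit = queue.popleft()
        if commit == target:
            return True  # found early when target is close to the tip
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False
```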

And what I came up with had far too many similarities to what a
newsreader does...  So my first reaction to that idea was to post to
the Gnus Usenet group proposing some sort of virtual server method for
Gnus (it already has more than a dozen for managing news, mail,
various mail and news spools, diaries, Google, Slashdot, files,
directories, virtual groups...).  But it occurred to me that the
mapping of information to NNTP is actually so straightforward that the
idea seemed exploitable not just by Emacs users.

> But if you have a client that is incremental anyway, almost all of
> that goes away.

Yup.  Omniscience is overrated in computer science.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 19:59         ` Linus Torvalds
  2007-08-12 20:30           ` David Kastrup
@ 2007-08-12 20:58           ` Govind Salinas
  2007-08-12 21:35             ` David Kastrup
  1 sibling, 1 reply; 29+ messages in thread
From: Govind Salinas @ 2007-08-12 20:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Kastrup, git

[-- Attachment #1: Type: text/plain, Size: 2262 bytes --]

Since you all are talking about such things, I thought I would show
you a shot of my git UI.  It does what I think Linus is talking about.
I have a window of x commits which I show in a list and allow the
user to look at each one.  You can click on a commit to see full
details.  There are back/next buttons to browse the entire history and
date/author/etc filters to narrow your results.  The only thing I am
missing is the pretty chart that gitk and others have.  The chart (in
my app) would only show the chart for the current window of commits.
I'll get to that sometime after work gives me enough time to start
working on this again.

Is this something like what you had in mind?

On 8/12/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Sun, 12 Aug 2007, David Kastrup wrote:
> > >
> > > So, use "git log --pretty=oneline" instead, which doesn't have the
> > > expense.
> >
> > Yes, like managing a manual with grep is all one needs.  git log
> > --pretty=oneline provides just the commit headers, but offers no way
> > to jump into the commits themselves and back easily.
>
> You misunderstand.
>
> I was suggesting you do a *tool* that bases its listing on
> --pretty=oneline, and then goes from there.
>
> If you don't show the graph anyway, all the complex and expensive things
> that "git-rev-list --topo-order" does is pretty much totally useless.
> You're going to show the commits as a list anyway, and then when you
> *select* one commit for closer inspection, you can then try to do a better
> job at that point of doing the reachability (ie parenthood is trivial, and
> the branch reachability is cheap if it's close to the tip of the tree,
> which it would almost always be).
>
> The real problem with the topological sort is that it requires you to have
> the full history. That not only makes everything pretty big, it also means
> that the startup cost is bad, since you can't do things incrementally.
>
> But if you have a client that is incremental anyway, almost all of that
> goes away.
>
>                         Linus

[-- Attachment #2: Widgit-commits.PNG --]
[-- Type: image/png, Size: 59122 bytes --]

* Re: Can I have this, pretty please?
  2007-08-12 20:58           ` Govind Salinas
@ 2007-08-12 21:35             ` David Kastrup
  2007-08-12 22:17               ` Martin Langhoff
  0 siblings, 1 reply; 29+ messages in thread
From: David Kastrup @ 2007-08-12 21:35 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Linus Torvalds, git

"Govind Salinas" <govindsalinas@gmail.com> writes:

> Since you all are talking about such things, I thought I would show
> you a shot of my git UI.  It does what I think Linus is talking
> about.  I have a window of x commits which I show in a list and
> allow the user to look at each one.  You can click on a commit to
> see full details.  There are back/next buttons to browse the entire
> history and date/author/etc filters to narrow your results.  The
> only thing I am missing is the pretty chart that gitk and others
> have.  The chart (in my app) would only show the chart for the
> current window of commits.  I'll get to that sometime after work
> gives me enough time to start working on this again.
>
> Is this something like what you had in mind?

Well, what I have in mind boils down to something I can use without
leaving my editor...  Your tool does not look all too different from
gitk, git-gui, giggle, giwhatever.  There is a variety of those
around, and they all don't really blow me away.  Part of the problem
is that my work flow involves editing a lot and I naturally use Emacs.
If those tools used Emacs for all their editing, I'd probably become
more friendly with them (for what it's worth: one can talk with Emacs
through sockets if necessary).  However, Linus might have something
different in mind.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 20:02     ` Jeff King
  2007-08-12 20:09       ` Jeff King
@ 2007-08-12 21:51       ` David Kastrup
  2007-08-12 23:10         ` Jeff King
  1 sibling, 1 reply; 29+ messages in thread
From: David Kastrup @ 2007-08-12 21:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, git

Jeff King <peff@peff.net> writes:

> On Sun, Aug 12, 2007 at 09:10:24PM +0200, David Kastrup wrote:
>
>> I'll probably be able to create a Gnus _backend_ for this sort of
>> setup (there are even backends for directory browsing: most files
>
> You can somewhat prototype this by just dumping the commits to an mbox
> (sorry for the long lines):
>
> git-log \
>   --pretty=format:'From %H Mon Sep 17 00:00:00 2001%nFrom: %an <%ae>%nDate: %ad%nSubject: %s%nMessage-ID: <%H@none>%nReferences: %P%n%n%b' \
>   | perl -pe 's/References: (.*)/"References: " .  %join(" ", map { "<" . $_ . "\@none>" } split \/ \/, $1)/e' \
>   >mbox

One percent too many before join, and the order of the articles is
reversed (--reverse helps here).

It is also a good idea to set gnus-thread-indent to 0 or 1, and
gnus-use-trees seems interesting, though not in a reasonably good
state (the graph layout tries to avoid crossing links and node names,
and that's rather useless).

So actually Gnus would need some kicking into shape before it actually
would present a useful tool.  On the positive side, it takes about 15
seconds sucking up and toposorting the complete group of about 11000
commits from an mbox file (which one would not ever do anyway).  And
that is Elisp.  However, the git history is still rather harmless
considering the commit amounts.

> Also, have you tried looking at tig (make sure to try a recent
> version and use the 'g' command to turn on the graph display)?

There are too many tools around.  Sigh.  Another to try.  Thanks for
the tip.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 21:35             ` David Kastrup
@ 2007-08-12 22:17               ` Martin Langhoff
  2007-08-12 22:54                 ` David Kastrup
  0 siblings, 1 reply; 29+ messages in thread
From: Martin Langhoff @ 2007-08-12 22:17 UTC (permalink / raw)
  To: David Kastrup; +Cc: Govind Salinas, Linus Torvalds, git

On 8/13/07, David Kastrup <dak@gnu.org> wrote:
> Well, what I have in mind boils down to something I can use without
> leaving my editor... (...) and I naturally use Emacs.

heh! As an emacs user, I have to say this might just be a tad too much :-)

The main fix for your immediate woes of having gitk work fast is -
imho - to limit it by time, which I do all the time.

And on that track I'd *love* it if gitk could work as follows:
start-up as if I had said --since=10.days.ago (unless I pass an
explicit --since) and put a "get more history" button at the bottom of
the commit list. And make the default --since settable via git config
as gitk.since or somesuch.

That'd make newcomers to git go -- WOW -- on gitk, and save old hands
some typing ;-)

On the gnus backend - I don't think the nntp backend is good enough,
as it can't deal with merges. But if you can write up a new backend
that can read merges, you'll be golden. You'll definitely want to
limit the number of commits you read initially, too.

Now - both your emacs-gnus-git backend and gitk/qgit would benefit
from having a long-lived git process that you can talk to via a socket
for the stuff that you are bound to be asking a lot of (cat-file,
diff, etc). Something like git-fast-import but for common queries.

I *thought* there was one -- I was just reading gitk to check and not
look like a doofus -- but at least my gitk is exec'ing git cat-file
all over the place. I am sure that it'd speed up gitk and friends
enormously, especially on non-Linux environments where IO isn't as
optimised.

cheers,

m

* Re: Can I have this, pretty please?
  2007-08-12 22:17               ` Martin Langhoff
@ 2007-08-12 22:54                 ` David Kastrup
  0 siblings, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-12 22:54 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Govind Salinas, Linus Torvalds, git

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> On 8/13/07, David Kastrup <dak@gnu.org> wrote:
>> Well, what I have in mind boils down to something I can use without
>> leaving my editor... (...) and I naturally use Emacs.
>
> heh! As an emacs user, I have to say this might just be a tad too
> much :-)
>
> The main fix for your immediate woes of having gitk work fast is -
> imho - to limit it by time, which I do all the time.
>
> And on that track I'd *love* it if gitk could work as follows:
> start-up as if I had said --since=10.days.ago (unless I pass an
> explicit --since) and put a "get more history" button at the bottom
> of the commit list. And make the default --since settable via git
> config as gitk.since or somesuch.
>
> That'd make newcomers to git go -- WOW -- on gitk, and save old
> hands some typing ;-)

Sigh.  Why does one have to limit _anything_?  gitk can just keep
asking git-rev-list -20 --stdin enough questions to fill the screen.
It can get more history if it _needs_ it.
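
The demand-driven approach can be sketched in Python as a view model that pulls commits from a history stream only when the requested rows are not loaded yet. The class name and the batch size of 20 (borrowed from the `-20` above) are illustrative assumptions. In a real tool, `source` would be the lines read from a `git rev-list` pipe.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class LazyHistory:
    """Load commits from `source` only as far as the view has scrolled."""

    def __init__(self, source: Iterable[str], batch: int = 20):
        self._source: Iterator[str] = iter(source)
        self._batch = batch            # minimum number of commits per fetch
        self.loaded: List[str] = []

    def rows(self, first: int, count: int) -> List[str]:
        """Return rows first .. first+count-1, fetching more if needed."""
        need = first + count
        while len(self.loaded) < need:
            want = max(self._batch, need - len(self.loaded))
            chunk = list(islice(self._source, want))
            if not chunk:              # the history stream is exhausted
                break
            self.loaded.extend(chunk)
        return self.loaded[first:need]
```

Opening the view calls `rows(0, screen_height)`, and scrolling calls it again with a larger `first`. Nothing beyond the requested rows, rounded up to the batch size, is ever read from the stream.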

tig actually sucks up the whole of Emacs history (100000 commits per
branch) as fast as git-rev-list can produce it.  Without locking or
swapping.

> On the gnus backend - I don't think the nntp backend is good enough,
> as it can't deal with merges. But if you can write up a new backend
> that can read merges, you'll be golden. You'll definitely want to
> limit the number of commits you read initially, too.
>
> Now - both your emacs-gnus-git backend and gitk/qgit would benefit
> from having a long-lived git process that you can talk to via a
> socket for the stuff that you are bound to be asking a lot of
> (cat-file, diff, etc). Something like git-fast-import but for common
> queries.

Can be pipes.  Pretty common way of talking to utilities from within
Emacs.
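
The following is a minimal Python sketch of the long-lived helper idea. It starts one child process, then exchanges one request line and one reply line per query over its pipes, so the process start-up cost is paid only once. `PipeHelper` is a hypothetical name, and the one-line reply protocol is an assumption. A real git helper such as `git cat-file --batch` (available in later git versions) answers with a header line `<sha1> <type> <size>` followed by the object contents. A client for that helper would therefore read `size` bytes after the header rather than a single line.

```python
import subprocess
from typing import List


class PipeHelper:
    """One long-lived child process answering one line per request."""

    def __init__(self, argv: List[str]):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered text pipes
        )

    def ask(self, request: str) -> str:
        """Send one request line and read back one reply line."""
        assert self.proc.stdin is not None and self.proc.stdout is not None
        self.proc.stdin.write(request + "\n")
        self.proc.stdin.flush()  # make sure the child sees the request now
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> int:
        """Close the child's stdin (EOF) and wait for it to exit."""
        if self.proc.stdin is not None:
            self.proc.stdin.close()
        return self.proc.wait()
```

Any line-oriented program that flushes its output after each reply line can serve as the child.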

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Can I have this, pretty please?
  2007-08-12 21:51       ` David Kastrup
@ 2007-08-12 23:10         ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2007-08-12 23:10 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

On Sun, Aug 12, 2007 at 11:51:24PM +0200, David Kastrup wrote:

> One percent too many before join, and the order of the articles is
> reversed (--reverse helps here).

Sorry, yes, a cut and paste error on the first. For the second, the
order is largely irrelevant if your reader is going to sort them anyway
(and since they are generally all in a single thread, the threading will
define the order).

> So actually Gnus would need some kicking into shape before it actually
> would present a useful tool.  On the positive side, it takes about 15
> seconds sucking up and toposorting the complete group of about 11000
> commits from an mbox file (which one would not ever do anyway).  And

Mutt is much faster (about 3 seconds to read and sort). Of course,
that's in C and the display doesn't look all that useful. :)

-Peff

* Re: Can I have this, pretty please?
  2007-08-12 19:53       ` Linus Torvalds
  2007-08-12 20:10         ` David Kastrup
@ 2007-08-13  0:22         ` Paul Mackerras
  2007-08-13  5:49           ` David Kastrup
  1 sibling, 1 reply; 29+ messages in thread
From: Paul Mackerras @ 2007-08-13  0:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Kastrup, Git Mailing List

Linus Torvalds writes:

> Well, gitk has certainly had performance problems in the past, they've 
> been fixable. I think this should just be fixed too. And if the rev-list 
> is fast enough, then the gitk fix may well be to just not compute the 
> *whole* history - ie the solution may be as simple as stopping the 
> background job that does all the graph calculations when it is (pick a 
> point at random) something like a thousand commits into the graph, and the 
> user hasn't scrolled down.

I have made a "dev" branch in the gitk.git repository that has some
tweaks to the graph layout algorithm which change the appearance a
bit; specifically it doesn't continue the graph lines downwards until
it has to terminate them with an arrow because the graph is getting
too wide.  Instead, it always terminates them if they are going to be
longer than a certain length (about 100 rows).  Also I made some
changes to reduce the incidence of two lines having a corner at the
same point, for visual clarity.

The point of terminating the graph lines early is that it means gitk
won't have to lay out the whole graph, just the visible bits and a
limited number of rows around that.  So I'm interested to know if
people think it looks OK visually.  (I think it's actually better,
myself.)
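
The following toy model illustrates the trade-off. Parent links are given as (child row, parent row) pairs. Only the links that touch the visible rows are drawn, and any link longer than a threshold is replaced by short arrow stubs at both ends. The 100-row default is the figure mentioned above. The function and its representation are illustrative, not gitk's code.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]  # (row of child, row of parent), child above parent


def layout_edges(edges: Iterable[Edge], top: int, bottom: int,
                 max_len: int = 100) -> List[Tuple[str, Edge]]:
    """Classify the parent links that touch the visible rows [top, bottom]."""
    drawn: List[Tuple[str, Edge]] = []
    for child, parent in edges:
        if parent < top or child > bottom:
            continue  # link lies completely outside the window: skip it
        if parent - child <= max_len:
            drawn.append(("line", (child, parent)))   # draw it in full
        else:
            drawn.append(("stubs", (child, parent)))  # short arrows at both ends
    return drawn
```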

The other thing that takes time is reading in the topology for the
previous/next tag computations.  I did a patch that wrote out the
topology to a cache file but I ran into some problems where the cache
includes commits that have gone away since the cache was created.
What I need to do to update the cached information is basically the
equivalent of

	git rev-list --all ^root1 ^root2 ...

where root1, root2, etc. are the commits in the cache that had no
children (and of which all the other commits in the cache are
descendants).  However, git rev-list will barf if those commits no
longer exist.  Currently the only solution I can see is to validate
them one by one with separate invocations of git rev-list or something
(git rev-parse won't do).
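
One way to express that workaround, sketched in Python: build the exclusion arguments from the cached roots, but drop every root that a validation callback rejects. `rev_list_args` and `object_exists` are hypothetical helpers. `object_exists` relies on `git cat-file -e`, which exits with status 0 only when the object exists. It still costs one process per root, which is the overhead described above.

```python
import subprocess
from typing import Callable, Iterable, List


def object_exists(sha1: str) -> bool:
    """Ask git whether an object exists (one process per call)."""
    # `git cat-file -e` exits with status 0 only if the object exists.
    result = subprocess.run(["git", "cat-file", "-e", sha1],
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def rev_list_args(cached_roots: Iterable[str],
                  exists: Callable[[str], bool] = object_exists) -> List[str]:
    """Build 'git rev-list --all ^root1 ^root2 ...', skipping vanished roots."""
    args = ["git", "rev-list", "--all"]
    args += ["^" + root for root in cached_roots if exists(root)]
    return args
```

Running `subprocess.run(rev_list_args(roots), capture_output=True, text=True)` would then list the commits that are not reachable from the surviving cached roots.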

Would it be possible to make git rev-list ignore commits that don't
exist if they have a "^" in front of them, i.e. where we're asking for
them to be excluded anyway?  If we can do that (or something
equivalent) then I can make the cache work reliably.  It does speed up
gitk enormously, and the cache file is only about 3MB for the kernel
tree, so it seems well worth while.

Paul.

* Re: Can I have this, pretty please?
  2007-08-13  0:22         ` Paul Mackerras
@ 2007-08-13  5:49           ` David Kastrup
  0 siblings, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-13  5:49 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, Git Mailing List

Paul Mackerras <paulus@samba.org> writes:

> Linus Torvalds writes:
>
>> Well, gitk has certainly had performance problems in the past, they've 
>> been fixable. I think this should just be fixed too. And if the rev-list 
>> is fast enough, then the gitk fix may well be to just not compute the 
>> *whole* history - ie the solution may be as simple as stopping the 
>> background job that does all the graph calculations when it is (pick a 
>> point at random) something like a thousand commits into the graph, and the 
>> user hasn't scrolled down.
>
> I have made a "dev" branch in the gitk.git repository that has some
> tweaks to the graph layout algorithm which change the appearance a
> bit; specifically it doesn't continue the graph lines downwards until
> it has to terminate them with an arrow because the graph is getting
> too wide.  Instead, it always terminates them if they are going to be
> longer than a certain length (about 100 rows).

How about terminating them when they are going off-screen?  If you
worry about reformatting when scrolling, you can terminate them if
there will be no change for at least one screen more.

More importantly: you can do your layout without having to look at
more than two screens' worth of commit data.

> Also I made some changes to reduce the incidence of two lines having
> a corner at the same point, for visual clarity.
>
> The point of terminating the graph lines early is that it means gitk
> won't have to lay out the whole graph, just the visible bits and a
> limited number of rows around that.

Ok, that was what you were already thinking.

> So I'm interested to know if people think it looks OK visually.  (I
> think it's actually better, myself.)

I'd think so, too, but I will only be able to check later today.

> The other thing that takes time is reading in the topology for the
> previous/next tag computations.

If you can move that out of the busy loop and do it in the
background...

> I did a patch that wrote out the topology to a cache file but I ran
> into some problems where the cache includes commits that have gone
> away since the cache was created.

I think it should be possible to come up with a data structure that
swallows less memory than the current one.  All the info you need is
the SHA1s and their relations: the rest can be requested from git while
one is scrolling, with an LRU buffer of a few hundred commits for
speed.
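
The LRU buffer can be sketched in Python with `collections.OrderedDict`. Recently viewed commits stay cached, and the least recently used entry is dropped once the buffer exceeds its capacity. `CommitCache`, the `fetch` callback (which would ask git for the commit details) and the capacity of 300 are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Callable


class CommitCache:
    """Keep details of recently viewed commits; fetch others on demand."""

    def __init__(self, fetch: Callable[[str], str], capacity: int = 300):
        self._fetch = fetch          # e.g. a function that asks git for a commit
        self._capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, sha1: str) -> str:
        if sha1 in self._items:
            self._items.move_to_end(sha1)     # mark as most recently used
            return self._items[sha1]
        value = self._fetch(sha1)             # cache miss: ask git
        self._items[sha1] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict the least recently used
        return value
```

For a plain function, `functools.lru_cache(maxsize=300)` provides the same policy. The explicit class just makes the eviction visible.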

> Would it be possible to make git rev-list ignore commits that don't
> exist if they have a "^" in front of them, i.e. where we're asking
> for them to be excluded anyway?  If we can do that (or something
> equivalent) then I can make the cache work reliably.  It does speed
> up gitk enormously, and the cache file is only about 3MB for the
> kernel tree, so it seems well worth while.

Cough, cough.  If the cache file is only about 3MB, why wouldn't you
be able to keep it in memory?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

Thread overview: 29+ messages
2007-08-12 13:23 Can I have this, pretty please? David Kastrup
2007-08-12 14:21 ` Steven Grimm
2007-08-12 16:40   ` David Kastrup
2007-08-12 18:38 ` Linus Torvalds
2007-08-12 18:48   ` Linus Torvalds
2007-08-12 19:28     ` Jon Smirl
2007-08-12 19:45       ` Linus Torvalds
2007-08-12 19:48       ` David Kastrup
2007-08-12 19:29     ` David Kastrup
2007-08-12 19:51       ` Uwe Kleine-König
2007-08-12 20:04         ` David Kastrup
2007-08-12 20:21           ` Linus Torvalds
2007-08-12 19:53       ` Linus Torvalds
2007-08-12 20:10         ` David Kastrup
2007-08-13  0:22         ` Paul Mackerras
2007-08-13  5:49           ` David Kastrup
2007-08-12 19:10   ` David Kastrup
2007-08-12 19:24     ` Linus Torvalds
2007-08-12 19:46       ` David Kastrup
2007-08-12 19:59         ` Linus Torvalds
2007-08-12 20:30           ` David Kastrup
2007-08-12 20:58           ` Govind Salinas
2007-08-12 21:35             ` David Kastrup
2007-08-12 22:17               ` Martin Langhoff
2007-08-12 22:54                 ` David Kastrup
2007-08-12 20:02     ` Jeff King
2007-08-12 20:09       ` Jeff King
2007-08-12 21:51       ` David Kastrup
2007-08-12 23:10         ` Jeff King
