From: Linus Torvalds <torvalds@linux-foundation.org>
To: Marco Costalba <mcostalba@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>, git@vger.kernel.org
Subject: [PATCH 0/2] History replay support
Date: Fri, 2 Nov 2007 13:31:36 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.0.999.0711021301200.3342@woody.linux-foundation.org> (raw)
In-Reply-To: <alpine.LFD.0.999.0711021114390.3342@woody.linux-foundation.org>
Ok, this is a rough first draft of avoiding the topological sort up-front,
and instead incrementally sorting only when actually necessary.
The first patch is a pure cleanup. Quite frankly, I suspect the old
topological sorting code would have worked perfectly fine as-is, but I
couldn't really bear to watch it or think about debugging it the way it
was before.
The indirection and other strangeness just blew my tiny little mind, and
while I bet it had some historical reason, I ended up reworking that
thing. I tried to keep it as similar as humanly possible to the old code
(because it's easy to get wrong, and because I really didn't want to worry
about the toposort itself), but the numbers speak for themselves:
4 files changed, 55 insertions(+), 126 deletions(-)
should tell you something. And it means that there are no subtle calling
issues with preconditions for using the toposort etc - you can use it over
and over again, and it should all be obvious. Knock wood.
The second diff is the one that actually adds the new feature, and it
undoes most of the nice statistics of the first one:
5 files changed, 70 insertions(+), 12 deletions(-)
but we still end up with more deletions than insertions on the whole,
*and* a new feature.
The new code is triggered by using the "--replay" flag, which will cause
certain consistency checks to be done when a commit is shown by the log
machinery. In particular:
- if we print out a parent SHA1, and the parent has already been shown,
that's a topology violation, and causes a replay.
- if we turn a commit unintersting, and the commit has already been
shown, that's a "uninteresting" violation, and causes a replay.
The second one I didn't test at all. It's probably hard to trigger, and
I bet there are bugs there, but this is very much a WIP patch, with the
hope that people other than me can work on it.
When a replay happens, the log code will print out
Replay <sha1>
for each <sha1> commit that gets invalidated by the replay, and then start
*that* part of history anew, with just the required part re-sorted.
Should it print just the number of commits? Perhaps. Play around with
this.
Anyway, to give you some kind of idea of the effect of this, in the
current git tree as it exists in my repo, I get 15 replay events, and if
you compare the total log output, you see:
[torvalds@woody git]$ wc -l t1 t2
170191 t1
178150 t2
where "t1" is without replays, and "t2" is with replays. The replays
obviously do add lines (the replayed ones), but at least for git, it's on
the order of 5% lines replayed.
The big difference is in the latency:
time git log --parents --topo-order | head
real 0m0.163s
vs
time git log --parents --replay | head
real 0m0.003s
ie you can see how the --replay thing starts outputtig commits
immediately, because it knows it can just back up and fix any errors that
happen.
That's the good news.
The bad news is that it doesn't work well in this simplistic form, because
there is a O(n**2) behaviour when replays *do* happen, ie we end up having
replays within replays, and rather than getting a 5% increase like for git
itself, for the kernel archive this gets a roughly 50x increase for the
replay. So the *latency* still improves dramatically (getting the first
one hundred commits in 0.006 seconds vs 1.076s), but because of the bad
behaviour wrt cascading replays, it's not really usable in this form (the
full log goes from 8 seconds to 22s when writing to /dev/null - and is
much worse if the receiver actually has to do something about it).
I think the right thing to do wrt this all would be to turn the replay
logic into a latency-based one, where it would basically batch things up,
but make sure to output something at least every half a second or so. But
that's a pretty separate set of logic, so I thought I'd send this out
as-is for comments..
Linus
next prev parent reply other threads:[~2007-11-02 20:31 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-28 1:39 New features in gitk Paul Mackerras
2007-10-28 5:34 ` Linus Torvalds
2007-10-28 7:11 ` Paul Mackerras
2007-10-28 7:36 ` Steffen Prohaska
2007-10-28 16:50 ` Linus Torvalds
2007-11-01 10:00 ` Paul Mackerras
2007-11-01 15:16 ` Linus Torvalds
2007-11-02 10:19 ` Paul Mackerras
2007-11-02 12:44 ` Marco Costalba
2007-11-02 15:42 ` Linus Torvalds
2007-11-02 16:50 ` Marco Costalba
2007-11-02 18:16 ` Linus Torvalds
2007-11-02 20:31 ` Linus Torvalds [this message]
2007-11-02 20:32 ` [PATCH 1/2] Simplify topo-sort logic Linus Torvalds
2007-11-02 20:35 ` [PATCH 2/2] Support "history replay" for git log commands Linus Torvalds
2007-11-02 21:05 ` Junio C Hamano
2007-11-02 21:17 ` Linus Torvalds
2007-11-03 1:40 ` [PATCH 0/2] History replay support Linus Torvalds
2007-11-03 7:56 ` Marco Costalba
2007-11-03 18:11 ` [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use Linus Torvalds
2007-11-03 19:52 ` Marco Costalba
2007-11-04 3:06 ` Paul Mackerras
2007-11-04 5:38 ` Linus Torvalds
2007-11-04 7:10 ` Paul Mackerras
2007-11-04 7:52 ` Marco Costalba
2007-11-04 18:11 ` Linus Torvalds
2007-11-04 20:12 ` [PATCH 3/2] Enhance --early-output format Linus Torvalds
2007-11-05 20:24 ` Junio C Hamano
2007-11-05 20:47 ` Linus Torvalds
2007-11-05 21:22 ` Linus Torvalds
2007-11-05 21:35 ` Linus Torvalds
2007-11-13 4:58 ` [PATCH 4/2] Fix parent rewriting in --early-output Linus Torvalds
2007-11-13 5:43 ` Junio C Hamano
2007-11-13 6:46 ` Linus Torvalds
2007-11-13 7:16 ` Linus Torvalds
2007-11-13 7:53 ` Sven Verdoolaege
2007-11-13 8:48 ` Junio C Hamano
2007-11-13 8:01 ` Shawn O. Pearce
2007-11-13 8:24 ` Junio C Hamano
2007-11-13 9:59 ` Paul Mackerras
2007-11-13 18:53 ` Junio C Hamano
2007-11-13 21:55 ` Paul Mackerras
2007-11-16 7:30 ` Marco Costalba
2007-11-04 0:32 ` [PATCH 0/2] History replay support Paul Mackerras
2007-11-02 18:17 ` New features in gitk Johannes Schindelin
2007-11-02 15:03 ` Linus Torvalds
2007-11-01 11:37 ` Paul Mackerras
2007-11-01 15:47 ` Linus Torvalds
2007-11-01 16:21 ` Linus Torvalds
2007-10-28 18:32 ` Pierre Habouzit
2007-10-28 18:38 ` Mike Hommey
2007-10-28 23:13 ` Paul Mackerras
2007-10-29 6:20 ` Pierre Habouzit
2007-10-29 8:31 ` Jonathan del Strother
2007-10-29 6:24 ` Pierre Habouzit
2007-10-29 13:30 ` Han-Wen Nienhuys
2007-10-29 14:04 ` Michele Ballabio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.0.999.0711021301200.3342@woody.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=git@vger.kernel.org \
--cc=mcostalba@gmail.com \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).