From: David Kastrup <dak@gnu.org>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH 1/2] blame: large-scale performance rewrite
Date: Sat, 26 Apr 2014 09:48:14 +0200
Message-ID: <87wqec8rb5.fsf@fencepost.gnu.org>
In-Reply-To: <CAJo=hJukmej1rJXuVoECwd7AxmSue8Wmv4rBmCHEYcWBWNarSw@mail.gmail.com> (Shawn Pearce's message of "Fri, 25 Apr 2014 17:53:31 -0700")
Shawn Pearce <spearce@spearce.org> writes:
> On Fri, Apr 25, 2014 at 4:56 PM, David Kastrup <dak@gnu.org> wrote:
>> The previous implementation used a single sorted linear list of blame
>> entries for organizing all partial or completed work. Every subtask had
>> to scan the whole list, with most entries not being relevant to the
>> task. The resulting run-time was quadratic to the number of separate
>> chunks.
>>
>> This change gives every subtask its own data to work with. Subtasks are
>> organized into "struct origin" chains hanging off particular commits.
>> Commits are organized into a priority queue, processing them in commit
>> date order in order to keep most of the work affecting a particular blob
>> collated even in the presence of an extensive merge history.
>
> Without reading the code, this sounds like how JGit runs blame.
>
>> For large files with a diversified history, a speedup by a factor of 3
>> or more is not unusual.
>
> And JGit was already usually slower than git-core. Now it will be even
> slower! :-)

If your statement about JGit is accurate, JGit should likely have beaten
Git for large use cases (where the performance improvements matter
most), as O(n) beats O(n^2) in the long run.

At any rate, I see that I ended up posting this patch series at the end
of the week again, which makes for a somewhat lacklustre initial response
from those who code Git for a regular living.

Apropos: shaking the bugs regarding the -M and -C options out of the code
took a large toll because -M can cause the same or overlapping line
regions to be responsible for different target regions, and the original
code implementing the "straightforward" blame blew up on the overlap.
I spent a _lot_ of time tracking down that problem.

As I am lousy at focusing on more than one task, and as I don't get a
regular paycheck anyway, this will have to remain my last contribution
to Git if I am not going to recoup my losses.

Patch 2 of this series tries giving the Git community a serious
chance at picking that option (I mean, there are literally millions of
Git users around, with a sizable number profiting) while not being
obnoxious about it.

My personal guess is that it will fail regarding both objectives. But
then I've been surprised before by other free software communities when
trying to make those particular two ends meet.

At any rate, feedback about the performance of the patch from users
disappointed by regular git blame would be welcome.

Apart from the objective measurement of "total time", the more
subjective impression of interactive/incremental response (as in git
gui blame) might be worth reporting, since the order of results will
differ significantly: the current git-blame --incremental focuses on
getting blames resolved in first-lines-first manner, while the proposed
git-blame works on a newest-commits-first basis, which might better
match typical use cases.
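
To make the ordering difference concrete, here is a minimal Python
sketch of the newest-commits-first idea. It is a toy model, not the code
from the patch: the Commit class, the toy_blame function, and the sample
history are all made up for illustration, and each commit simply records
which final-file line numbers it already contains instead of running
real diffs. Every commit keeps its own pending line set, and a priority
queue keyed on commit date picks the next commit to process.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical stand-in for a real commit object.
    name: str
    date: int            # commit timestamp; larger means newer
    lines: frozenset     # final-file line numbers already present here
    parents: list = field(default_factory=list)


def toy_blame(tip, final_lines):
    """Yield (line, commit name) pairs, resolving newest commits first."""
    pending = {tip.name: set(final_lines)}  # per-commit work sets
    commits = {tip.name: tip}
    heap = [(-tip.date, tip.name)]          # max-heap on date via negation
    while heap:
        _, name = heapq.heappop(heap)
        commit = commits[name]
        todo = pending.pop(name)            # only this commit's own work
        for parent in commit.parents:
            passed = todo & parent.lines    # lines the parent already had
            if not passed:
                continue
            todo -= passed
            if parent.name not in pending:
                pending[parent.name] = set()
                commits[parent.name] = parent
                heapq.heappush(heap, (-parent.date, parent.name))
            pending[parent.name] |= passed
        # Whatever no parent accepts is blamed on this commit.
        for line in sorted(todo):
            yield line, commit.name


# Made-up history with one merge; the final file has lines 1..6.
root = Commit("root", 1, frozenset({1, 2}))
left = Commit("left", 2, frozenset({1, 2, 3}), [root])
right = Commit("right", 3, frozenset({1, 2, 4, 5}), [root])
merge = Commit("merge", 4, frozenset({1, 2, 3, 4, 5}), [left, right])
tip = Commit("tip", 5, frozenset({1, 2, 3, 4, 5, 6}), [merge])

result = list(toy_blame(tip, range(1, 7)))
print(result)
```

On this made-up history the blame for line 6 (from the newest commit)
arrives first and lines 1 and 2 (from the root commit) arrive last,
whereas a first-lines-first scheme would report line 1 first. No step
scans work belonging to other commits, which is the point of giving
every subtask its own data.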
--
David Kastrup