All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Kastrup <dak@gnu.org>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH 1/2] blame: large-scale performance rewrite
Date: Sat, 26 Apr 2014 18:50:01 +0200	[thread overview]
Message-ID: <87ha5g8286.fsf@fencepost.gnu.org> (raw)
In-Reply-To: <CAJo=hJs=ap=Ct_PzOsO=vHmDVMvUF+nvbB7b67bgnmug+Yrohg@mail.gmail.com> (Shawn Pearce's message of "Sat, 26 Apr 2014 09:01:22 -0700")

Shawn Pearce <spearce@spearce.org> writes:

> On Sat, Apr 26, 2014 at 12:48 AM, David Kastrup <dak@gnu.org> wrote:
>> Shawn Pearce <spearce@spearce.org> writes:
>>>
>>> And JGit was already usually slower than git-core. Now it will be
>>> even slower! :-)
>>
>> If your statement about JGit is accurate, it should likely have beat
>> Git for large use cases (where the performance improvements are most
>> important) as O(n) beats O(n^2) in the long run.
>
> Agreed.
>
> In a few cases yes, JGit did beat git-core at blame running time.
> Unfortunately according to my profiling blame performance is still
> dominated by inflation and scanning of commit and tree objects to
> identify unmodified blobs and advance to the next scapegoat ancestor.

Oh, the C version is most certainly significantly impacted by that after
my patch.  One can _significantly_ speed it up by increasing
core.deltaBaseCacheLimit from its rather silly value of 16M.  If you
have a comparable control in JGit and if your benchmarking points to the
unpacking, that's where I'd suggest tweaking first.

>> Apart from the objective measurement of "total time", the more
>> subjective impression of interactive/incremental response (like in
>> git gui blame) where the order of results will significantly differ
>> (current git-blame --incremental focuses on getting blames resolved
>> in first-lines-first manner, the proposed git-blame rather works on a
>> newest-commits-first basis which might better match typical use
>> cases) might be worth reporting.
>
> Seeing this fill during execution was the initial motivation I had for
> writing git gui blame.

It does not look impressively better to me, actually.  Probably because
"git gui blame" is running "git blame" several times.

> I don't think anyone cares about the order it displays in. In fact
> ordering my timestamp may be more what the user wants anyway, as you
> suggest above.

What the user wants anyway is that "git gui blame" notifies "git blame"
of the currently displayed window area whenever that changes, and that
git blame then _first_ deals with all chunks with a visible on-screen
part.

> Thanks for doing this. Unfortunately I can't read the patch itself as
> I am also trying to improve JGit's blame code for $DAY_JOB, and JGit
> is BSD licensed.

Shrug.  The patch is functionally equivalent to the previous behavior,
the arrangement of linear lists on underlying data structures is hardly
copyrightable, and Java implements linear lists differently anyhow.
Merging two sorted linear lists is a straightforward operation,
splitting a linear list into several others also is.

The really tricky/expensive part was realizing that as opposed to target
line number ranges, source line number ranges may overlap and/or be
duplicate when using -M or -C options.  That really messed things up for
a long time and was hard to debug.  Once I figured out what was going
wrong, recoding the respective stuff was straightforward.

I doubt that there is much copyrightable material to transfer as I seem
to remember that Java does not have anything like a pointer.  So the
main stuff, juggling with linear lists, would not likely transfer in a
reasonably recognizable manner.

-- 
David Kastrup

  reply	other threads:[~2014-04-26 16:50 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-25 23:56 [PATCH 1/2] blame: large-scale performance rewrite David Kastrup
2014-04-25 23:56 ` [PATCH 2/2] Mention "git blame" improvements in release notes David Kastrup
2014-04-26 17:28   ` Junio C Hamano
2014-04-26 18:28     ` David Kastrup
     [not found]       ` <xmqqzjj5s8hs.fsf@gitster.dls.corp.google.com>
2014-04-28 17:39         ` David Kastrup
2014-04-28 19:35           ` Junio C Hamano
2014-04-28 19:57             ` David Kastrup
2014-04-28 20:05             ` Ronnie Sahlberg
2014-04-28 20:26               ` David Kastrup
2014-04-26  0:53 ` [PATCH 1/2] blame: large-scale performance rewrite Shawn Pearce
2014-04-26  7:48   ` David Kastrup
2014-04-26 16:01     ` Shawn Pearce
2014-04-26 16:50       ` David Kastrup [this message]
2014-04-26 17:09         ` Shawn Pearce
2014-04-26 17:22           ` David Kastrup
2014-04-26 17:02       ` David Kastrup
2014-04-26 17:30         ` David Kastrup
2014-04-26 17:56           ` Shawn Pearce
2014-04-26 21:39             ` David Kastrup
2014-04-27 17:53               ` Shawn Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ha5g8286.fsf@fencepost.gnu.org \
    --to=dak@gnu.org \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.