From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, David Kastrup <dak@gnu.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] blame.c: don't drop origin blobs as eagerly
Date: Wed, 3 Apr 2019 07:36:05 -0400
Message-ID: <20190403113604.GA2941@sigill.intra.peff.net>
In-Reply-To: <CACsJy8AbkmJ69ucCfGMdXHGvfko89SxH=DKjra6Ltwf7wpy-Og@mail.gmail.com>

On Wed, Apr 03, 2019 at 04:32:30PM +0700, Duy Nguyen wrote:

> That might explain why I could not see significant gain when blaming
> linux.git's MAINTAINERS file (0.5s was shaved out of 13s) even though
> the number of objects read was cut by half (8424 vs 15083).

I did a few timings, too, and managed to come up with similar
improvements (only a small fraction, and only for large files). I think
the main thing is simply that loading the blob from the object database
is a fraction of the total work done. We still have to actually diff the
blobs, which is at least as expensive as loading them from disk.
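
A rough back-of-the-envelope sketch of that point, using only the numbers
quoted above. It assumes (this is not measured) that blob loading time
scales linearly with the number of objects read and that the whole 0.5s
saving comes from the avoided loads. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Estimate how many seconds the original run spent loading objects,
 * given the time saved and the object counts before and after.
 * Assumption: load time is proportional to the number of objects read.
 */
static double estimate_load_seconds(double saved_seconds,
				    long objects_before,
				    long objects_after)
{
	double removed_fraction;

	assert(objects_before > objects_after && objects_after >= 0);
	removed_fraction = (double)(objects_before - objects_after) /
			   (double)objects_before;
	return saved_seconds / removed_fraction;
}
```

Calling estimate_load_seconds(0.5, 15083, 8424) gives roughly 1.1s, i.e.
under a tenth of the 13s total under these assumptions, so even removing
every blob load would leave most of the runtime in place.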

We also have to load commits and trees from disk as we traverse.
Enabling the commit-graph would shrink that portion (and make
improvements in the blob loading proportionally more impressive).

All that said, this seems like an easy and obvious win, and worth doing.
0.5s is still something.

I suspect we could do even better by storing and reusing not just the
original blob between diffs, but the intermediate diff state (i.e., the
hashes produced by xdl_prepare(), which should be usable between
multiple diffs). That's quite a bit more complex, though, and I imagine
would require some surgery to xdiff.
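
To illustrate the idea only, here is a minimal standalone sketch of caching
per-line hashes alongside a blob so that a second diff against the same blob
does not recompute them. Nothing here is taken from xdiff or blame.c: the
struct names, the hash function, and get_line_hashes() are all hypothetical,
and the real preparation step in xdiff keeps considerably more state than a
flat array of hashes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached state: one hash per line of the blob. */
struct line_hashes {
	unsigned long *hash;
	size_t nr;
};

/* Hypothetical blob record that keeps its prepared state around. */
struct cached_blob {
	const char *data;
	size_t size;
	struct line_hashes lines;
	int prepared;
	unsigned prepare_count;	/* how often the hashes were computed */
};

/* Simple djb2-style hash over one line (newline excluded). */
static unsigned long hash_line(const char *p, size_t len)
{
	unsigned long h = 5381;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 33 + (unsigned char)p[i];
	return h;
}

/* Split the blob into lines and hash each one; done at most once. */
static void prepare_lines(struct cached_blob *b)
{
	size_t i, start = 0, nr = 0, n = 0;

	for (i = 0; i < b->size; i++)
		if (b->data[i] == '\n')
			nr++;
	if (b->size && b->data[b->size - 1] != '\n')
		nr++;	/* final line without trailing newline */

	b->lines.hash = malloc(nr ? nr * sizeof(*b->lines.hash) : 1);
	assert(b->lines.hash);

	for (i = 0; i < b->size; i++) {
		if (b->data[i] == '\n') {
			b->lines.hash[n++] = hash_line(b->data + start, i - start);
			start = i + 1;
		}
	}
	if (start < b->size)
		b->lines.hash[n++] = hash_line(b->data + start, b->size - start);

	b->lines.nr = n;
	b->prepared = 1;
	b->prepare_count++;
}

/* Return the cached hashes, computing them on first use only. */
static const struct line_hashes *get_line_hashes(struct cached_blob *b)
{
	if (!b->prepared)
		prepare_lines(b);
	return &b->lines;
}

/* Toy consumer standing in for a diff: count equal leading lines. */
static size_t common_prefix_lines(struct cached_blob *a, struct cached_blob *b)
{
	const struct line_hashes *x = get_line_hashes(a);
	const struct line_hashes *y = get_line_hashes(b);
	size_t i = 0;

	while (i < x->nr && i < y->nr && x->hash[i] == y->hash[i])
		i++;
	return i;
}
```

A caller that compares the same struct cached_blob against two different
blobs hashes its lines once; the second comparison reuses the stored
array. The caller is responsible for freeing lines.hash when the blob is
dropped.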

-Peff