From: Mike Hommey <mh@glandium.org>
To: Jeff King <peff@peff.net>
Cc: Johan Herland <johan@herland.net>, git@vger.kernel.org
Subject: Re: "git notes show" is orders of magnitude slower than doing it manually with ls-tree and cat-file
Date: Wed, 26 Nov 2014 11:25:53 +0900
Message-ID: <20141126022553.GA16199@glandium.org>
In-Reply-To: <20141126012448.GA11183@peff.net>

On Tue, Nov 25, 2014 at 08:24:49PM -0500, Jeff King wrote:
> On Tue, Nov 25, 2014 at 08:00:51PM -0500, Jeff King wrote:
> 
> > On Wed, Nov 26, 2014 at 09:42:42AM +0900, Mike Hommey wrote:
> > 
> > > I have a note tree with a bit more than 200k notes.
> > >
> > > $ time git notes --ref foo show $sha1 > /dev/null
> > > real    0m0.147s
> > > user    0m0.136s
> > > sys     0m0.008s
> > > 
> > > That's a lot of time, especially when you have a script that does that
> > > on a fair amount of sha1s.
> > 
> > IIRC, the notes code populates an in-memory data structure, which gives
> > faster per-commit lookup at the cost of some setup time. Obviously for a
> > single lookup, that's going to be a bad tradeoff (but it does make sense
> > for "git log --notes"). I don't know offhand how difficult it would be
> > to tune the data structure differently (or avoid it altogether) if we
> > know ahead of time we are only going to do a small number of lookups.
> > But Johan (cc'd) might.
> 
> One other question: how were your notes created?
> 
> I tried to replicate your setup by creating one note per commit in
> linux.git (over 400k notes total). I did it with one big mktree,
> creating a single top-level notes tree. Doing a single "git notes show"
> lookup on the tree was something like 800ms.
> 
> However, this is not what trees created by git-notes look like. It
> shards the object sha1s into subtrees (1a/2b/{36}), and I think does so
> dynamically in a way that keeps each individual tree size low. The
> in-memory data structure then only "faults in" tree objects as they are
> needed. So a single lookup should only hit a small part of the total
> tree.
> 
> Doing a single "git notes edit HEAD" in my case caused the notes code to
> write the result using its sharding algorithm. Subsequent "git notes
> show" invocations were only 14ms.
> 
> Did you use something besides git-notes to create the tree? From your
> examples, it looks like you were accounting for the sharding during
> lookup, so maybe this is leading in the wrong direction (but if so, I
> could not reproduce your times at all even with a much larger case).

So... this is interesting. I happen to have recreated the notes tree
"manually", and now each git notes show takes under 10ms.

Now, looking at the notes tree reflog, I see that at some point, some
notes were added at the top level of the tree, without being nested,
which is strange.

And it looks like it's related to how I've been adding them, through
git-fast-import. I was using notemodify commands, and was using a
filemodify command to load the previous notes tree instead of a from
command, because I don't care about keeping the notes history.
So fast-import was actually filling the notes tree as if it were
starting over with whatever new notes were added with notemodify (and,
when there were many of them, it filled them in with one level of
indirection).
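
For reference, a minimal sketch of the path layouts involved, assuming
each fanout level simply peels two hex digits off the annotated object's
name, as in the 1a/2b/{36} form quoted above. The note_path helper and
the sample sha1 are made up for illustration; how git-notes or
fast-import decide on the number of levels is not modelled here.

```python
def note_path(object_name: str, fanout: int) -> str:
    """Return the path of a note in a notes tree with `fanout` directory levels.

    Each level consumes two hex digits of the annotated object's name.
    This only illustrates the layout; it is not git's own code.
    """
    name = object_name.lower()
    if len(name) != 40 or any(c not in "0123456789abcdef" for c in name):
        raise ValueError("expected a 40-character hex SHA-1")
    if not 0 <= fanout < 20:
        raise ValueError("fanout must leave at least one byte for the leaf name")
    dirs = [name[2 * i:2 * i + 2] for i in range(fanout)]
    return "/".join(dirs + [name[2 * fanout:]])


if __name__ == "__main__":
    sha1 = "1a2b" + "c" * 36  # made-up object name
    for level in range(3):
        # level 0: un-nested, at the top level of the tree
        # level 1: one level of indirection
        # level 2: the 1a/2b/{36} layout
        print(level, note_path(sha1, level))
```

A tree that mixes these layouts (some notes at level 0, others nested)
is the kind of thing I was seeing in the reflog.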

I'm not sure this is a case worth fixing in fast-import. I can easily
work around it.

Mike
