From: David Turner <dturner@twopensource.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git mailing list <git@vger.kernel.org>
Subject: Re: [PATCH] unpack-trees: fix accidentally quadratic behavior
Date: Thu, 21 Jan 2016 15:59:44 -0500
Message-ID: <1453409984.16226.46.camel@twopensource.com>
In-Reply-To: <xmqqpowuyall.fsf@gitster.mtv.corp.google.com>
On Thu, 2016-01-21 at 11:51 -0800, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
>
> > On Wed, 2016-01-20 at 20:58 -0800, Junio C Hamano wrote:
> > > David Turner <dturner@twopensource.com> writes:
> > >
> > > > While unpacking trees (e.g. during git checkout), when we hit a
> > > > cache entry that's past and outside our path, we cut off iteration.
> > > >
> > > > This provides about a 45% speedup on git checkout between master
> > > > and master^20000 on Twitter's monorepo. Speedup in general will
> > > > depend on repository structure, number of changes, and packfile
> > > > packing decisions.
> > > >
> > > > Signed-off-by: David Turner <dturner@twopensource.com>
> > > > ---
> > >
> > > I haven't thought things through, but does this get fooled by the
> > > somewhat strange ordering rules of tree entries (i.e. a subtree
> > > sorts as if its name is suffixed with a '/' in a tree object)?
> > >
> > > Other than that, I like this. "We know the list is sorted, and
> > > after seeing this entry we know there is nothing that will match"
> > > is an obvious optimization that we already use elsewhere.
> > >
> > > Thanks.
> >
> > I think this is correct, because we first do the more complicated
> > check (ce_in_traverse_path), and only check the ordering once that
> > has failed.
>
> But the patch does this:
>
> > +		if (info->prev && info->traverse_path) {
> > +			int prefix_cmp = strncmp(ce->name, info->traverse_path, info->pathlen);
> > +			if (prefix_cmp > 0)
> > +				break;
> > +			else if (prefix_cmp == 0 &&
> > +				 ce_namelen(ce) >= info->pathlen &&
> > +				 strcmp(ce->name + info->pathlen,
> > +					info->name.path) > 0) {
> > +				break;
> > +			}
> > +		}
> >  		continue;
>
> The first break is correct, but I am not sure about the "else if"
> part. Shouldn't it be doing something similar to the logic to "keep
> looking" that talks about "t-i", "t" and "t/a" at the end of the
> loop?

Rather than doing more complicated logic, let's just do the first
check; it seems about as fast for our repo, and I think it will usually
be so. Does that seem reasonable to you?

> > The tests all pass, so this should be good.
>
> Please don't ever say that again.

OK.

Thread overview: 6+ messages
2016-01-21 4:05 [PATCH] unpack-trees: fix accidentally quadratic behavior David Turner
2016-01-21 4:58 ` Junio C Hamano
2016-01-21 19:09 ` David Turner
2016-01-21 19:51 ` Junio C Hamano
2016-01-21 20:59 ` David Turner [this message]
2016-01-21 21:06 ` Junio C Hamano