git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: David Turner <dturner@twopensource.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2] unpack-trees: fix accidentally quadratic behavior
Date: Thu, 21 Jan 2016 16:30:56 -0500
Message-ID: <20160121213056.GA6664@sigill.intra.peff.net>
In-Reply-To: <1453410708-23951-1-git-send-email-dturner@twopensource.com>

On Thu, Jan 21, 2016 at 04:11:48PM -0500, David Turner wrote:

> While unpacking trees (e.g. during git checkout), when we hit a cache
> entry that's past and outside our path, we cut off iteration.
> 
> This provides about a 45% speedup on git checkout between master and
> master^20000 on Twitter's monorepo.  Speedup in general will depend on
> repository structure, number of changes, and packfile packing
> decisions.

I feel like I'm missing the explanation of the quadratic part. From
looking at the patch, my guess is:

  1. We're doing a linear walk in a data structure (a "struct
     index_state").

  2. For each element, we look it up in another structure
     ("struct traverse_info") with a linear search.

     That leaves us at O(m*n), and if we assume both are on the same
     order of magnitude, that's quadratic.

  3. The fix works by knowing that once a lookup in (2) fails, it's
     likely to fail for all the remaining entries, so we short-cut that
     case and skip out of (1) completely.
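
To make the guess above concrete, here is a toy model of the pattern in
C. It is only a sketch of my reading of the patch, not the actual
unpack-trees code: the names linear_contains() and count_matches() are
made up for illustration, plain ints stand in for cache entries, and the
"steps" counter exists only to show how much work each variant does.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the lookup in step 2: a linear search,
 * so each call costs O(n). "steps" counts comparisons performed.
 */
static int linear_contains(const int *set, size_t n, int key, size_t *steps)
{
	for (size_t i = 0; i < n; i++) {
		(*steps)++;
		if (set[i] == key)
			return 1;
	}
	return 0;
}

/*
 * Steps 1 and 2: walk every entry and search for it, O(m*n) overall.
 * With stop_on_miss set, this models step 3: leave the outer loop at
 * the first failed lookup. That is only safe under the assumption
 * that every entry after the first miss would also miss.
 */
static size_t count_matches(const int *entries, size_t m,
			    const int *set, size_t n,
			    int stop_on_miss, size_t *steps)
{
	size_t found = 0;

	for (size_t i = 0; i < m; i++) {
		if (linear_contains(set, n, entries[i], steps))
			found++;
		else if (stop_on_miss)
			break;
	}
	return found;
}
```

Note that the early exit only trims the misses. Every lookup that
succeeds still pays for a full linear search.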

But that makes me wonder. Aren't we still quadratic in the case that
ce_in_traverse_path() returns true? If so, would we benefit from either:

  a. Improving the complexity of ce_in_traverse_path, to say O(log n),
     which would give us O(n log n) for the whole operation in all
     cases?

  b. If both lists are already sorted, maybe doing a list-merge to
     compare them in O(m + n) time?
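
Roughly what I have in mind for (a) and (b), again as a toy sketch in C
rather than anything based on the real data structures. Both functions
assume their inputs are sorted ascending, and the names
sorted_contains() and merge_count_common() are invented for this
example. Whether the real lookup can be expressed this way is exactly
the part I am unsure about.

```c
#include <stddef.h>

/*
 * (a) Binary search over a sorted array: O(log n) per lookup, so
 * looking up each of m entries costs O(m log n) in total.
 */
static int sorted_contains(const int *set, size_t n, int key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (set[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n && set[lo] == key;
}

/*
 * (b) List-merge over two sorted arrays: each iteration advances at
 * least one cursor, so the whole walk is O(m + n). Returns how many
 * values appear in both arrays (assuming no duplicates within one).
 */
static size_t merge_count_common(const int *a, size_t m,
				 const int *b, size_t n)
{
	size_t i = 0, j = 0, common = 0;

	while (i < m && j < n) {
		if (a[i] < b[j]) {
			i++;
		} else if (a[i] > b[j]) {
			j++;
		} else {
			common++;
			i++;
			j++;
		}
	}
	return common;
}
```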

I'm fairly ignorant of this part of the code, so there's probably a good
reason why my suggestion is unworkable.

-Peff

Thread overview: 3+ messages
2016-01-21 21:11 [PATCH v2] unpack-trees: fix accidentally quadratic behavior David Turner
2016-01-21 21:30 ` Jeff King [this message]
2016-01-21 23:24   ` David Turner
