git.vger.kernel.org archive mirror
* [PATCH v2] unpack-trees: fix accidentally quadratic behavior
@ 2016-01-21 21:11 David Turner
  2016-01-21 21:30 ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: David Turner @ 2016-01-21 21:11 UTC (permalink / raw)
  To: git; +Cc: David Turner

While unpacking trees (e.g. during git checkout), when we hit a cache
entry that's past and outside our path, we cut off iteration.

This provides about a 45% speedup on git checkout between master and
master^20000 on Twitter's monorepo.  Speedup in general will depend on
repository structure, number of changes, and packfile packing
decisions.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 unpack-trees.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 5f541c2..d8e9685 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -695,8 +695,19 @@ static int find_cache_pos(struct traverse_info *info,
 				++o->cache_bottom;
 			continue;
 		}
-		if (!ce_in_traverse_path(ce, info))
+		if (!ce_in_traverse_path(ce, info)) {
+			/*
+			 * Check if we can skip future cache checks
+			 * (because we're already past all possible
+			 * entries in the traverse path).
+			 */
+			if (info->prev && info->traverse_path) {
+				if (strncmp(ce->name, info->traverse_path,
+					    info->pathlen) > 0)
+					break;
+			}
 			continue;
+		}
 		ce_name = ce->name + pfxlen;
 		ce_slash = strchr(ce_name, '/');
 		if (ce_slash)
-- 
2.4.2.749.g730654d-twtrsrc


* Re: [PATCH v2] unpack-trees: fix accidentally quadratic behavior
  2016-01-21 21:11 [PATCH v2] unpack-trees: fix accidentally quadratic behavior David Turner
@ 2016-01-21 21:30 ` Jeff King
  2016-01-21 23:24   ` David Turner
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2016-01-21 21:30 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Thu, Jan 21, 2016 at 04:11:48PM -0500, David Turner wrote:

> While unpacking trees (e.g. during git checkout), when we hit a cache
> entry that's past and outside our path, we cut off iteration.
> 
> This provides about a 45% speedup on git checkout between master and
> master^20000 on Twitter's monorepo.  Speedup in general will depend on
> repository structure, number of changes, and packfile packing
> decisions.

I feel like I'm missing the explanation of the quadratic part. From
looking at the patch, my guess is:

  1. We're doing a linear walk in a data structure (a "struct
     index_state").

  2. For each element, we look it up in another structure
     ("struct traverse_info") with a linear search.

     That leaves us at O(m*n), but if we assume both are on the same
     order of magnitude, that's quadratic.

  3. The fix works by knowing that once a lookup in (2) fails, it's
     likely to fail for all the remainder, and we short-cut that case
     and skip out of (1) completely.

But that makes me wonder. Aren't we still quadratic in the case that
ce_in_traverse_path() returns true? If so, would we benefit from either:

  a. Improving the complexity of ce_in_traverse_path, to say O(log n),
     which would give us O(n log n) for the whole operation in all
     cases?

  b. If both lists are already sorted, maybe doing a list-merge to
     compare them in O(2n) time?
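
To make (b) concrete, here is a minimal sketch of a list-merge over two
sorted arrays of names. It is not git code: count_common() and the plain
string arrays are made up for illustration, and it assumes both lists
are sorted in strcmp() order. Each step advances at least one cursor, so
the loop runs at most m+n times.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: count the names present in both
 * sorted arrays by walking them in lockstep, as in a merge.
 */
static size_t count_common(const char *const *a, size_t na,
			   const char *const *b, size_t nb)
{
	size_t i = 0, j = 0, common = 0;

	assert(a && b);
	while (i < na && j < nb) {
		int cmp = strcmp(a[i], b[j]);

		if (cmp < 0) {
			i++;		/* a[i] sorts first; it has no match in b */
		} else if (cmp > 0) {
			j++;		/* b[j] sorts first; it has no match in a */
		} else {
			common++;	/* same name in both lists */
			i++;
			j++;
		}
	}
	return common;
}
```

Whether the real code can be restructured like this depends on how the
tree walk and the index are ordered relative to each other, which I have
not checked.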

I'm fairly ignorant of this part of the code, so there's probably a good
reason why my suggestion is unworkable.

-Peff


* Re: [PATCH v2] unpack-trees: fix accidentally quadratic behavior
  2016-01-21 21:30 ` Jeff King
@ 2016-01-21 23:24   ` David Turner
  0 siblings, 0 replies; 3+ messages in thread
From: David Turner @ 2016-01-21 23:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, 2016-01-21 at 16:30 -0500, Jeff King wrote:
> On Thu, Jan 21, 2016 at 04:11:48PM -0500, David Turner wrote:
> 
> > While unpacking trees (e.g. during git checkout), when we hit a cache
> > entry that's past and outside our path, we cut off iteration.
> > 
> > This provides about a 45% speedup on git checkout between master and
> > master^20000 on Twitter's monorepo.  Speedup in general will depend on
> > repository structure, number of changes, and packfile packing
> > decisions.
> 
> I feel like I'm missing the explanation of the quadratic part. From
> looking at the patch, my guess is:
> 
>   1. We're doing a linear walk in a data structure (a "struct
>      index_state").
> 
>   2. For each element, we look it up in another structure
>      ("struct traverse_info") with a linear search.
> 
>      That leaves us at O(m*n), but if we assume both are on the same
>      order of magnitude, that's quadratic.

No, I think it's the opposite order: we're doing a linear walk over
the incoming tree and for each entry, we're calling find_cache_pos.
find_cache_pos was doing a linear walk over struct index_state.  But
the same algorithmic complexity holds.
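
As a rough illustration of why the early break helps, here is a
stand-alone sketch. It is not the unpack-trees code:
count_under_prefix(), the `visited` out-parameter, and the plain array
of names are all hypothetical, and it assumes the names are sorted in
byte order. Once an entry compares greater than the prefix, no later
entry can be inside the path, so the scan stops there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the index scan: count the entries in a sorted
 * array of names that start with `prefix`. `*visited` receives the number
 * of entries examined before the loop stopped.
 */
static size_t count_under_prefix(const char *const *names, size_t n,
				 const char *prefix, size_t *visited)
{
	size_t prefix_len, i, seen = 0, hits = 0;

	assert(names && prefix && visited);
	prefix_len = strlen(prefix);

	for (i = 0; i < n; i++) {
		int cmp = strncmp(names[i], prefix, prefix_len);

		seen++;
		if (cmp > 0)
			break;		/* sorted input: nothing later can match */
		if (cmp == 0)
			hits++;
		/* cmp < 0: entry sorts before the prefix, keep scanning */
	}
	*visited = seen;
	return hits;
}
```

Without the break, every call would walk all n entries, which is where
the per-call linear cost comes from.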

>   3. The fix works by knowing that once a lookup in (2) fails once, it's
>      likely to fail for all the remainder, and we short-cut that case
>      and skip out of (1) completely.
> 
> But that makes me wonder. Aren't we still quadratic in the case that
> ce_in_traverse_path() returns true? 

I think that doesn't happen very often, because it requires that the
paths match up.  

> If so, would we benefit from either:
> 
>   a. Improving the complexity of ce_in_traverse_path, to say O(log n),
>      which would give us O(n log n) for the whole operation in all
>      cases?
> 
>   b. If both lists are already sorted, maybe doing a list-merge to
>      compare them in O(2n) time?

(b) appears to be (roughly) what we're now doing.

> I'm fairly ignorant of this part of the code, so there's probably a good
> reason why my suggestion is unworkable.

I am also quite ignorant of this part of the code; I just looked at
perf and did some simple counting.
