git.vger.kernel.org archive mirror
From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, gitster@pobox.com,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v6 0/3] read-cache: speed up add_index_entry
Date: Fri, 7 Apr 2017 14:27:24 -0400	[thread overview]
Message-ID: <6f31ee65-517e-419c-b0c1-3ccdd3f95b37@jeffhostetler.com> (raw)
In-Reply-To: <20170407044626.ypsqnyxguw43gprm@sigill.intra.peff.net>



On 4/7/2017 12:46 AM, Jeff King wrote:
> On Thu, Apr 06, 2017 at 04:34:39PM +0000, git@jeffhostetler.com wrote:
>
>> Teach add_index_entry_with_check() and has_dir_name()
>> to avoid index lookups if the given path sorts after
>> the last entry in the index.
>>
>> This saves at least 2 binary searches per entry.
>>
>> This improves performance during checkout and read-tree because
>> merge_working_tree() and unpack_trees() process a list of already
>> sorted entries.
>
> Just thinking about this algorithmically for a moment. You're saving the
> binary search when the input is given in sorted order. But in other
> cases you're adding an extra strcmp() before the binary search begins.
> So it's a tradeoff.
>
> How often is the input sorted?  You save O(log n) strcmps for a "hit"
> with your patch, and pay one extra for a "miss". So it's a net win if we expect at
> least 1/log(n) of additions to be sorted (I'm talking about individual
> calls, but it should scale linearly either way over a set of n calls).
>
> I have no clue if that's a reasonable assumption or not.
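The 1/log(n) break-even can be checked with a simple cost model. This is a simplification, not a measurement: assume each index lookup is a binary search costing about c ≈ log2(n) string comparisons, the new last-entry check costs exactly one comparison, and a fraction p of calls add a path that sorts after the last entry (so the check succeeds and the search is skipped). A call that misses pays the extra check and then the full search:

```latex
\begin{aligned}
\text{cost per lookup, without the check} &= c \\
\text{cost per lookup, with the check}    &= p \cdot 1 + (1 - p)(1 + c) = 1 + c - pc \\
\text{net win} \iff 1 + c - pc < c        &\iff pc > 1 \iff p > \frac{1}{c} \approx \frac{1}{\log_2 n}
\end{aligned}
```

For example, with n = 100,000 entries, log2(n) ≈ 16.6, so under this model the check pays off once roughly 6% of calls are appends. The same argument applies to each lookup the patch skips (add_index_entry_with_check() and has_dir_name() each do at least one).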

I was seeing checkout call merge_working_tree() to iterate over the
source index/trees and call add_index_entry() for each entry.  For example,
in a "checkout -b"-like operation where both sides are the same, this
calls keep_entry(), which appends the entry to the new index array.
The append path should always be taken because the iteration is
driven from a sorted list.

I would think calls to add/stage individual files arrive in random
order, so I'm not suggesting replacing the code -- just checking the
end first.

Jeff


Thread overview: 9+ messages
2017-04-06 16:34 [PATCH v6 0/3] read-cache: speed up add_index_entry git
2017-04-06 16:34 ` [PATCH v6 1/3] read-cache: add strcmp_offset function git
2017-04-06 23:07   ` René Scharfe
2017-04-07 18:04     ` Jeff Hostetler
2017-04-06 16:34 ` [PATCH v6 2/3] p0004-read-tree: perf test to time read-tree git
2017-04-06 16:34 ` [PATCH v6 3/3] read-cache: speed up add_index_entry during checkout git
2017-04-07  4:46 ` [PATCH v6 0/3] read-cache: speed up add_index_entry Jeff King
2017-04-07 18:27   ` Jeff Hostetler [this message]
2017-04-08 10:43     ` Jeff King
