From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, gitster@pobox.com,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v6 0/3] read-cache: speed up add_index_entry
Date: Fri, 7 Apr 2017 14:27:24 -0400
Message-ID: <6f31ee65-517e-419c-b0c1-3ccdd3f95b37@jeffhostetler.com>
In-Reply-To: <20170407044626.ypsqnyxguw43gprm@sigill.intra.peff.net>
On 4/7/2017 12:46 AM, Jeff King wrote:
> On Thu, Apr 06, 2017 at 04:34:39PM +0000, git@jeffhostetler.com wrote:
>
>> Teach add_index_entry_with_check() and has_dir_name()
>> to avoid index lookups if the given path sorts after
>> the last entry in the index.
>>
>> This saves at least 2 binary searches per entry.
>>
>> This improves performance during checkout and read-tree because
>> merge_working_tree() and unpack_trees() process a list of already
>> sorted entries.
>
> Just thinking about this algorithmically for a moment. You're saving the
> binary search when the input is given in sorted order. But in other
> cases you're adding an extra strcmp() before the binary search begins.
> So it's a tradeoff.
>
> How often is the input sorted? You save O(log n) strcmps for a "hit"
> with your patch, and spend one extra for a "miss". So it's a net win if we expect at
> least 1/log(n) of additions to be sorted (I'm talking about individual
> calls, but it should scale linearly either way over a set of n calls).
>
> I have no clue if that's a reasonable assumption or not.
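The 1/log(n) break-even point can be checked with a rough cost model. The C sketch below is an illustration only, not code from the patch: it assumes a binary search over n entries costs at most floor(log2(n)) + 1 comparisons, that the new "compare against the last entry" check costs exactly one, and that a fraction p of the inserts sort after the last entry. All function names are hypothetical.

```c
#include <assert.h>

/* Worst-case comparisons for a binary search over n sorted entries:
 * floor(log2(n)) + 1 (the bit length of n), and 0 for an empty array. */
static unsigned bsearch_cost(unsigned long n)
{
	unsigned cost = 0;

	while (n) {
		cost++;
		n >>= 1;
	}
	return cost;
}

/* Expected comparisons per insert with the end check: a "hit" (fraction p)
 * costs 1, a "miss" costs the check plus a full binary search (1 + L).
 * This simplifies to 1 + L - p*L. */
static double cost_with_check(double p, unsigned long n)
{
	double l = bsearch_cost(n);

	assert(p >= 0.0 && p <= 1.0);
	return p * 1.0 + (1.0 - p) * (1.0 + l);
}

/* Without the check, every insert pays for a full binary search. */
static double cost_without_check(unsigned long n)
{
	return bsearch_cost(n);
}

/* 1 + L - p*L < L exactly when p > 1/L; assumes n > 0. */
static double break_even_fraction(unsigned long n)
{
	return 1.0 / bsearch_cost(n);
}
```

With n = 1024 entries a binary search costs up to 11 comparisons, so under these assumptions the check pays off once more than 1 in 11 inserts are appends; at p = 0.5 the expected cost drops from 11 to 6.5 comparisons.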

In checkout, I was seeing merge_working_tree() iterate over the source
index/trees and call add_index_entry() for each entry. For example, in
a "checkout -b"-like operation where both sides are the same, it calls
keep_entry(), which appends the entry to the new index array. The
append path should always be taken there, because the iteration is
driven by an already-sorted list.

I would expect calls that add/stage individual files to arrive in
random order, so I'm not suggesting replacing the binary search; the
patch just checks the end of the index first.

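The "check the end first" idea can be sketched on a toy index. This is a minimal sketch, not git's read-cache.c: it assumes the index is a plain sorted array of path strings compared with strcmp(), and it ignores stages and the directory/file conflict checks that has_dir_name() performs. struct toy_index, toy_add_entry() and the other names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy index: a sorted array of path names. Stages and
 * directory/file conflict checks are deliberately left out. */
struct toy_index {
	char **names;
	size_t nr, alloc;
	size_t cmps;	/* strcmp() calls made, to show the savings */
};

static int toy_cmp(struct toy_index *ix, const char *a, const char *b)
{
	ix->cmps++;
	return strcmp(a, b);
}

/* Binary search: returns the position if found, else -(insert_pos + 1). */
static long toy_pos(struct toy_index *ix, const char *name)
{
	size_t lo = 0, hi = ix->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = toy_cmp(ix, name, ix->names[mid]);

		if (!c)
			return (long)mid;
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -(long)lo - 1;
}

/* Adds name, keeping the array sorted. Returns 0 on success or if the
 * name is already present, -1 on allocation failure. */
static int toy_add_entry(struct toy_index *ix, const char *name)
{
	long pos;
	size_t at, len;
	char *copy;

	/* Fast path: an empty index, or a name that sorts after the last
	 * entry, is appended without any binary search. */
	if (!ix->nr || toy_cmp(ix, name, ix->names[ix->nr - 1]) > 0)
		pos = -(long)ix->nr - 1;
	else
		pos = toy_pos(ix, name);	/* slow path: full search */

	if (pos >= 0)
		return 0;	/* already present; a real index would typically replace it */

	at = (size_t)(-pos - 1);
	assert(at <= ix->nr);

	if (ix->nr == ix->alloc) {
		size_t n = ix->alloc ? 2 * ix->alloc : 16;
		char **grown = realloc(ix->names, n * sizeof(*grown));

		if (!grown)
			return -1;
		ix->names = grown;
		ix->alloc = n;
	}

	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);

	/* Shift the tail right by one slot; moves nothing for an append. */
	memmove(ix->names + at + 1, ix->names + at,
		(ix->nr - at) * sizeof(*ix->names));
	ix->names[at] = copy;
	ix->nr++;
	return 0;
}

static void toy_free(struct toy_index *ix)
{
	for (size_t i = 0; i < ix->nr; i++)
		free(ix->names[i]);
	free(ix->names);
	ix->names = NULL;
	ix->nr = ix->alloc = 0;
}
```

Feeding "a", "b/c", "b/d", "e" in order costs one comparison per insert after the first, while a later out-of-order "b/a" pays the extra end check plus a full binary search (4 comparisons instead of 3), which is the tradeoff described above.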
Jeff