From: Jeff Hostetler <git@jeffhostetler.com>
To: "Jeff King" <peff@peff.net>, "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, gitster@pobox.com,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
Date: Mon, 10 Apr 2017 17:28:13 -0400 [thread overview]
Message-ID: <ed64f8fc-9cec-4c46-b9e6-9e9e4983f95a@jeffhostetler.com> (raw)
In-Reply-To: <20170410205540.wvlb7ch7bodiytmh@sigill.intra.peff.net>
On 4/10/2017 4:55 PM, Jeff King wrote:
> On Sat, Apr 08, 2017 at 04:06:41PM +0200, René Scharfe wrote:
>
>>> + } else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
>>> + /* implicitly borrow buf[i-2] inside tree_desc[i] */
>>> + memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
>>
>> Similar case.
>>
>>> + buf[i] = NULL;
>>
>> What makes the previous two entries special, or differently put: Why not
>> just check *all* previous entries? MAX_UNPACK_TREES is 8; the number of
>> comparisons would just about double in the worst case and stay the same for
>> three trees or less. The order of four trees or more wouldn't matter
>> anymore.
>
> If I understand correctly, we've got N (up to 8) trees, and we want to
> find sets of duplicates. Adjacency doesn't matter. Aren't our options
> basically:
>
> - look through all previous entries naively, O(N^2)
>
> - sort the list, O(N log N); but we need the original order, so we'd
> have to unsort at the end, too
>
> - use a hash table, O(N) but with O(N) extra storage
>
> I know N=8 isn't very big algorithmically speaking, but it would feel
> ugly to introduce something quadratic. Especially the limit of 8 seems
> rather arbitrary. In normal cases we see at most a 3-way merge. Beyond
> that we're in octopus territory, and 8 is way too low there (I think the
> octopus strategy just unpacks one at a time and barfs if there are any
> conflicts).
>
> I assume the rationale behind "check the last 2" is just "this
> optimization will kick in reliably for all sane cases", weird octopus
> unpack-trees be damned (though reading ca885a4fe, it sounds like 4-trees
> can actually happen, but I'm not clear on how).
yeah, i didn't want to pay for the obscure (n >= 4) cases with any
of the above. doing 2 or 3 gets us checkout and merge. these are
the common usages.
Jeff
next prev parent reply other threads:[~2017-04-10 21:28 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-07 15:53 [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout git
2017-04-07 15:53 ` git
2017-04-08 14:06 ` René Scharfe
2017-04-10 20:55 ` Jeff King
2017-04-10 21:28 ` Jeff Hostetler [this message]
2017-04-10 21:26 ` Jeff Hostetler
2017-04-10 23:09 ` René Scharfe
2017-04-11 20:42 ` Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ed64f8fc-9cec-4c46-b9e6-9e9e4983f95a@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=l.s.r@web.de \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).