git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: "Jeff King" <peff@peff.net>, "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, gitster@pobox.com,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
Date: Mon, 10 Apr 2017 17:28:13 -0400	[thread overview]
Message-ID: <ed64f8fc-9cec-4c46-b9e6-9e9e4983f95a@jeffhostetler.com> (raw)
In-Reply-To: <20170410205540.wvlb7ch7bodiytmh@sigill.intra.peff.net>



On 4/10/2017 4:55 PM, Jeff King wrote:
> On Sat, Apr 08, 2017 at 04:06:41PM +0200, René Scharfe wrote:
>
>>> +		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
>>> +			/* implicitly borrow buf[i-2] inside tree_desc[i] */
>>> +			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
>>
>> Similar case.
>>
>>> +			buf[i] = NULL;
>>
>> What makes the previous two entries special, or differently put: Why not
>> just check *all* previous entries?  MAX_UNPACK_TREES is 8; the number of
>> comparisons would just about double in the worst case and stay the same for
>> three trees or less.  The order of four trees or more wouldn't matter
>> anymore.
>
> If I understand correctly, we've got N (up to 8) trees, and we want to
> find sets of duplicates. Adjacency doesn't matter. Aren't our options
> basically:
>
>   - look through all previous entries naively, O(N^2)
>
>   - sort the list, O(N log N); but we need the original order, so we'd
>     have to unsort at the end, too
>
>   - use a hash table, O(N) but with O(N) extra storage
>
> I know N=8 isn't very big algorithmically speaking, but it would feel
> ugly to introduce something quadratic. Especially the limit of 8 seems
> rather arbitrary. In normal cases we see at most a 3-way merge. Beyond
> that we're in octopus territory, and 8 is way too low there (I think the
> octopus strategy just unpacks one at a time and barfs if there are any
> conflicts).
>
> I assume the rationale behind "check the last 2" is just "this
> optimization will kick in reliably for all sane cases", weird octopus
> unpack-trees be damned (though reading ca885a4fe, it sounds like 4-trees
> can actually happen, but I'm not clear on how).

yeah, i didn't want to pay for the obscure (n >= 4) cases with any
of the above.  doing 2 or 3 gets us checkout and merge.  these are
the common usages.

Jeff


  reply	other threads:[~2017-04-10 21:28 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-07 15:53 [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout git
2017-04-07 15:53 ` git
2017-04-08 14:06   ` René Scharfe
2017-04-10 20:55     ` Jeff King
2017-04-10 21:28       ` Jeff Hostetler [this message]
2017-04-10 21:26     ` Jeff Hostetler
2017-04-10 23:09       ` René Scharfe
2017-04-11 20:42         ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ed64f8fc-9cec-4c46-b9e6-9e9e4983f95a@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).