From: Derrick Stolee <derrickstolee@github.com>
To: Jeff King <peff@peff.net>
Cc: Richard Oliver <roliver@roku.com>, Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH] mktree: learn about promised objects
Date: Thu, 16 Jun 2022 09:59:57 -0400
Message-ID: <b559b9a3-b97a-f394-5845-5c810425f8a4@github.com>
In-Reply-To: <YqrIrYHKUP6b/EtN@coredump.intra.peff.net>

On 6/16/2022 2:07 AM, Jeff King wrote:
> On Wed, Jun 15, 2022 at 02:17:58PM -0400, Derrick Stolee wrote:
>
>> On 6/15/2022 1:40 PM, Richard Oliver wrote:
>>> On 15/06/2022 05:00, Jeff King wrote:
>>
>>>> So it is not just lookup, but actual tree walking that is expensive. The
>>>> flip side is that you don't have to store a complete separate list of
>>>> the promised objects. Whether that's a win depends on how many local
>>>> objects you have, versus how many are promised.
>>
>> This is also why blobless (or blob-size filters) are the recommended way
>> to use partial clone. It's just too expensive to have tree misses.
>
> I agree that tree misses are awful, but I'm actually talking about
> something different: traversing the local trees we _do_ have in order to
> find the set of promised objects. Which is worse for blob:none, because
> it means you have more trees locally. :)

Ah, I misread your email. I agree that walking trees is far too
expensive to do just to find an object type.

> Try this with a big repo like linux.git:
>
> git clone --no-local --filter=blob:none linux.git repo
> cd repo
>
> # this is fast; we mark the promisor trees as UNINTERESTING, so we do
> # not look at them as part of the traversal, and never call
> # is_promisor_object().
> time git rev-list --count --objects --all --exclude-promisor-objects
>
> # but imagine we had a fixed mktree[1] that did not fault in the blobs
> # unnecessarily, and we made a new tree that references a promised
> # blob.
> tree=$(git ls-tree HEAD~1000 | grep Makefile | git mktree --missing)
> commit=$(echo foo | git commit-tree -p HEAD $tree)
> git update-ref refs/heads/foo $commit
>
> # this is now slow; even though we only call is_promisor_object()
> # once, we have to open every single tree in the pack to find it!
> time git rev-list --count --objects --all --exclude-promisor-objects
>
> Those rev-lists run in 1.7s and 224s respectively. Ouch!

This is exactly why I expected that asking for the objects
directly would be faster than scanning all the packs. Thanks for giving
concrete numbers that support that assumption.
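A side note on why the type check need not scan anything: `git mktree` reads `git ls-tree`-style lines (mode, type, object ID, path), so the caller already supplies each entry's type. The minimal sketch below is an illustration under stated assumptions, not part of the benchmark above. It uses a throwaway, non-partial repository and a made-up object ID to show that plain `git mktree` rejects an entry whose object is absent, while `git mktree --missing` accepts it and trusts the type given on its input line. It does not exercise promisor remotes or lazy fetching.

```shell
#!/bin/sh
# Sketch (assumption: a scratch, non-partial repository; no promisor remote).
# `git mktree` reads "mode SP type SP oid TAB path" lines, so the entry's
# type comes from its input rather than from opening the object.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# A real blob: plain `git mktree` can find it and check its type.
blob=$(echo hello | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\thello.txt\n' "$blob" | git mktree)
git ls-tree "$tree"

# A made-up object ID of the same length that is not in the object store.
absent=$(echo "$blob" | sed 's/./1/g')

# Without --missing, mktree refuses an entry whose object is unavailable.
if printf '100644 blob %s\tgone.txt\n' "$absent" | git mktree 2>/dev/null; then
	echo "unexpected: mktree accepted an absent object" >&2
	exit 1
fi

# With --missing, it presumes the type stated on the input line is right
# and writes the tree without reading the blob.
missing_tree=$(printf '100644 blob %s\tgone.txt\n' "$absent" | git mktree --missing)
git ls-tree "$missing_tree"
```

In a partial clone, the benchmark above notes that mktree at the time would still fault in such blobs, which is why it assumes a fixed mktree.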
Thanks,
-Stolee