From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/2] fsck: lazily load types under --connectivity-only
Date: Thu, 26 Jan 2017 10:51:00 -0800 [thread overview]
Message-ID: <xmqqziid4puz.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170126041206.5qfv7r7czbwfqvaa@sigill.intra.peff.net> (Jeff King's message of "Wed, 25 Jan 2017 23:12:07 -0500")
Jeff King <peff@peff.net> writes:
> The recent fixes to "fsck --connectivity-only" load all of
> the objects with their correct types. This keeps the
> connectivity-only code path close to the regular one, but it
> also introduces some unnecessary inefficiency. While getting
> the type of an object is cheap compared to actually opening
> and parsing the object (as the non-connectivity-only case
> would do), it's still not free.
>
> For reachable non-blob objects, we end up having to parse
> them later anyway (to see what they point to), making our
> type lookup here redundant.
>
> For unreachable objects, we might never hit them at all in
> the reachability traversal, making the lookup completely
> wasted. And in some cases, we might have quite a few
> unreachable objects (e.g., when alternates are used for
> shared object storage between repositories, it's normal for
> there to be objects reachable from other repositories but
> not the one running fsck).
>
> The comment in mark_object_for_connectivity() claims two
> benefits to getting the type up front:
>
> 1. We need to know the types during fsck_walk(). (And not
> explicitly mentioned, but we also need them when
> printing the types of broken or dangling commits).
>
> We can address this by lazy-loading the types as
> necessary. Most objects never need this lazy-load at
> all, because they fall into one of these categories:
>
> a. Reachable from our tips, and are coerced into the
> correct type as we traverse (e.g., a parent link
> will call lookup_commit(), which converts OBJ_NONE
> to OBJ_COMMIT).
>
> b. Unreachable, but not at the tip of a chunk of
> unreachable history. We only mention the tips as
> "dangling", so an unreachable commit which links
> to hundreds of other objects needs only report the
> type of the tip commit.
>
> 2. It serves as a cross-check that the coercion in (1a) is
> correct (i.e., we'll complain about a parent link that
> points to a blob). But we get most of this for free
> already, because right after coercing, we'll parse any
> non-blob objects. So we'd notice then if we expected a
> commit and got a blob.
>
> The one exception is when we expect a blob, in which
> case we never actually read the object contents.
>
> So this is a slight weakening, but given that the whole
> point of --connectivity-only is to sacrifice some data
> integrity checks for speed, this seems like an
> acceptable tradeoff.
The only weakening is that a non-blob (or a corrupt blob) object
that sits where we expect a blob (because the containing tree or the
tag says so) would not be diagnosed as an error, then? I think that
is in line with the spirit of --connectivity-only and is a good
trade-off.
next prev parent reply other threads:[~2017-01-26 18:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-26 4:10 [PATCH 0/2] (re-)optimizing fsck --connectivity-only Jeff King
2017-01-26 4:11 ` [PATCH 1/2] fsck: move typename() printing to its own function Jeff King
2017-01-26 4:12 ` [PATCH 2/2] fsck: lazily load types under --connectivity-only Jeff King
2017-01-26 12:07 ` Johannes Schindelin
2017-01-26 18:51 ` Junio C Hamano [this message]
2017-01-26 19:12 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqziid4puz.fsf@gitster.mtv.corp.google.com \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox