From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
Date: Fri, 24 Nov 2017 12:32:01 -0500 [thread overview]
Message-ID: <20171124173201.GA29190@sigill> (raw)
In-Reply-To: <xmqqpo89rac6.fsf@gitster.mtv.corp.google.com>
On Thu, Nov 23, 2017 at 11:35:21AM +0900, Junio C Hamano wrote:
> Not limiting us to the caller in the "fetch" codepath, I think the
> expectation by callers of lookup_commit_reference_gently() in the
> ideal world would be:
>
> - It has an object name, and wants to use it as point in the commit
> DAG to define the traversal over the DAG, if it refers to a
> commit known to us.
>
> - It does not know if these object names represent a tag object, a
> commit object, or some other object. It does not know if the
> local repository actually has them (e.g. we received a "have"
> from the other side---missing is expected).
>
> - Hence, it would happily accept a NULL as "we do not have it" and
> "we do have it, but it is not a commit-ish".
>
> And from that point of view, 2, 3a (missing), and 4 (0{40}) to yield
> NULL is perfectly fine. 3b (exists but broken) may be a noteworthy
> event, but for the purpose of the caller, it may want to proceed as
> if the object is missing from our end, so it might deserve warning()
> but not die(), at least as the default behaviour.
Hmm, yeah, I did not differentiate 3a and 3b in my analysis. How we'd
want to handle "missing" would vary from caller to caller, I think.
Certainly for this case in "fetch" where it's the "old" value for a ref,
it would be a corruption not to have it. Just grepping around a few of
the other callers, I see:
- archive.c:parse_treeish_arg: fine not to have it (we'd complain soon
after that it doesn't point to a tree either). But also fine to
complain hard.
- blame.c:dwim_reverse_initial, and checkout_paths and switch_branches
in checkout.c: missing object would mean a broken HEAD
- show-branch.c:append_ref: missing would mean a broken ref
- clear_commit_marks_for_object_array: conceptually OK to have a
missing object, though I suspect practically impossible since we
examined and marked the objects earlier
- ref-filter's --merged, --contains, etc: the low-level code quietly
ignores a missing object or non-commit, but the command-line parser
enforces that we find a commit. So probably impossible to trigger,
but arguably the second call should be a BUG().
So I dunno. That is far from exhaustive, but it does seem like most
cases should assume the presence of the file.
As for your proposed behavior:
> And from that point of view, 2, 3a (missing), and 4 (0{40}) to yield
> NULL is perfectly fine. 3b (exists but broken) may be a noteworthy
> event, but for the purpose of the caller, it may want to proceed as
> if the object is missing from our end, so it might deserve warning()
> but not die(), at least as the default behaviour.
That's actually not far from what happens now, with the only difference
being that we _do_ actually die() on a corruption (really any error
except ENOENT). I forgot that when I wrote my earlier message. You can
see it by updating the "fetch" reproduction I sent earlier to corrupt:
-- >8 --
git init parent
git -C parent commit --allow-empty -m base
git clone parent child
git -C parent commit --allow-empty -m more
cd child
for i in .git/objects/??/*
do
chmod +w $i
echo corrupt >$i
done
git fetch
-- 8< --
which gives something like:
error: inflate: data stream error (incorrect header check)
error: unable to unpack 55c66a401fd4190382f9cd8b70c11f9f5adb044e header
fatal: loose object 55c66a401fd4190382f9cd8b70c11f9f5adb044e (stored in .git/objects/55/c66a401fd4190382f9cd8b70c11f9f5adb044e) is corrupt
So that's good. I do still think that many of the callers of
lookup_commit_reference_gently() probably ought to die() in the
"missing" case rather than continue (and produce subtly wrong answers).
But it may not be that big a deal. For the most part, all bets are off
in a corrupt repo. My main concern is just that we do not want the
corruption to spread or to make it harder for us to recover from (and
that includes allowing us to delete or overwrite other data that would
otherwise be forbidden, which is what's happening in the fetch case).
Most of the callers probably don't fall into that situation, but it
might be nice to err on the side of caution.
-Peff
next prev parent reply other threads:[~2017-11-24 17:32 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-20 20:26 [PATCH 0/5] avoiding pointless pack-directory re-scans Jeff King
2017-11-20 20:26 ` [PATCH 1/5] p5550: factor our nonsense-pack creation Jeff King
2017-11-20 23:55 ` Eric Sunshine
2017-11-21 15:58 ` Jeff King
2017-11-22 0:32 ` Stefan Beller
2017-11-22 22:38 ` Jeff King
2017-11-23 2:41 ` Junio C Hamano
2017-11-23 5:02 ` Jeff King
2017-11-20 20:27 ` [PATCH 2/5] t/perf/lib-pack: use fast-import checkpoint to create packs Jeff King
2017-11-20 20:28 ` [PATCH 3/5] p5551: add a script to test fetch pack-dir rescans Jeff King
2017-11-20 20:29 ` [PATCH 4/5] everything_local: use "quick" object existence check Jeff King
2017-11-20 20:35 ` [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1 Jeff King
2017-11-20 20:47 ` Stefan Beller
2017-11-20 20:58 ` Jeff King
2017-11-21 2:37 ` Junio C Hamano
2017-11-21 22:57 ` Jeff King
2017-11-22 1:42 ` Junio C Hamano
2017-11-22 22:36 ` Jeff King
2017-11-23 2:35 ` Junio C Hamano
2017-11-24 17:32 ` Jeff King [this message]
2017-11-25 3:20 ` Junio C Hamano
2017-11-21 5:20 ` Junio C Hamano
2017-11-21 23:17 ` Jeff King
2017-11-22 1:49 ` Junio C Hamano
2017-11-22 3:17 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171124173201.GA29190@sigill \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).