From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Mathias Rav <m@git.strova.dk>, git@vger.kernel.org, pwagland@gmail.com
Subject: Re: [PATCH] revision: ignore non-existent objects in resolve-undo list
Date: Tue, 18 Oct 2022 16:29:23 -0400
Message-ID: <Y08Mo8AL4DmFhZao@coredump.intra.peff.net>
In-Reply-To: <xmqqbkq9ulum.fsf@gitster.g>

On Tue, Oct 18, 2022 at 09:40:01AM -0700, Junio C Hamano wrote:

> And the patch goes in the right direction.  It is a bit sad that it
> now has to do parse_object() but in the normal case, the object
> referenced should be a blob that exists, for which the cost of
> parsing it would be none (just setting .parsed member to true), so
> it should be OK.

This isn't quite true. parse_object() will still inflate the object
contents to check the sha1. I think has_object_file() is probably the
right thing here. We want to know if the object is missing entirely.
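
To make the cost difference concrete, here is a rough Python sketch of
the two kinds of check against a loose object. It is an illustration and
not git's C code: the helper names (write_loose_object,
has_loose_object, verify_loose_object) are made up for this example, and
it models only loose objects, ignoring packs and alternates.

```python
import hashlib
import os
import tempfile
import zlib


def _loose_path(objects_dir, oid):
    # Loose objects live at <objects_dir>/<first 2 hex chars>/<remaining 38>.
    return os.path.join(objects_dir, oid[:2], oid[2:])


def write_loose_object(objects_dir, payload, kind=b"blob"):
    """Store payload in loose-object form and return its hex object id."""
    # The id is the SHA-1 of "<type> <size>\0<payload>"; the file holds the
    # zlib-deflated form of those same bytes.
    store = kind + b" " + str(len(payload)).encode() + b"\0" + payload
    oid = hashlib.sha1(store).hexdigest()
    path = _loose_path(objects_dir, oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(zlib.compress(store))
    return oid


def has_loose_object(objects_dir, oid):
    """Cheap check: is the object present at all? No inflating, no hashing."""
    return os.path.exists(_loose_path(objects_dir, oid))


def verify_loose_object(objects_dir, oid):
    """Expensive check: inflate the contents and recompute the SHA-1."""
    try:
        with open(_loose_path(objects_dir, oid), "rb") as fh:
            store = zlib.decompress(fh.read())
    except (FileNotFoundError, zlib.error):
        return False
    return hashlib.sha1(store).hexdigest() == oid
```

The existence check answers "is it missing entirely?" with a single
stat-like call, while the verifying check has to read and hash every
byte of the object.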

We'd not notice corrupted bytes, of course, but that is OK. Traversal
does not open blobs we reach via trees, either. For pack-objects, we
rely on either:

  - for repacking to disk, we check the pack crc for already-packed
    objects (which avoids inflating them). For loose objects, we'll
    inflate them later when we convert them to packed form.

  - for packing to stdout for fetch/push, the receiver is expected to
    check the sha1 via index-pack, etc.

So I think just checking "do we have it? If not, gently skip it" is the
right thing here. And in the long run we'd hopefully remove that code,
as the meaning of "we don't have it" shifts from "this was probably gc'd
with an older version of git" to "oops, there is a bug in Git that lost
this object".
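
As a sketch of the "check and gently skip" idea, something like the
following Python would do. The shape of resolve_undo (a path mapped to up
to three per-stage object ids, with None for an absent stage) and the
have_object callback are assumptions for illustration only, not git's
actual data structures or API.

```python
def collect_resolve_undo_blobs(resolve_undo, have_object):
    """Split resolve-undo entries into those we have and those we lack.

    resolve_undo: dict mapping a path to a list of object ids (hypothetical
                  layout; None marks a stage with no entry).
    have_object:  callable returning True if the object exists at all.
                  It does not verify the object's contents.
    """
    kept, skipped = [], []
    for path, oids in resolve_undo.items():
        for oid in oids:
            if oid is None:
                continue  # nothing recorded for this stage
            if have_object(oid):
                kept.append((path, oid))  # mark as reachable
            else:
                skipped.append((path, oid))  # missing: skip, do not die
    return kept, skipped
```

A caller would mark everything in kept as reachable and could, at most,
warn about skipped; once missing entries can only mean a bug, the
skipping branch is the part to delete.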

I notice that 5a5ea141e7 (revision: mark blobs needed for resolve-undo
as reachable, 2022-06-09) uses parse_object() in the fsck code path.
That _might_ be better as lookup_object(), as earlier stages of fsck
would have checked the bytes of each object and created an in-memory
object struct. Though I guess in that sense, it doesn't matter;
parse_object() will hit lookup_object() first and see that in-memory
struct.
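
The lookup-before-parse behavior described here can be pictured as a
small in-memory table. This is a toy Python model: ObjectTable and its
loader callback are invented for the example and only mimic the
relationship between the two calls, not their real signatures.

```python
class ObjectTable:
    """Toy model of an in-memory object table with lookup and parse."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical: reads and checks an object
        self._parsed = {}      # oid -> already-parsed in-memory object

    def lookup(self, oid):
        """Return the in-memory object if one exists; never touch disk."""
        return self._parsed.get(oid)

    def parse(self, oid):
        """Return the object, loading it only if lookup() finds nothing."""
        obj = self.lookup(oid)
        if obj is not None:
            return obj  # earlier stages already created it: no extra work
        obj = self._loader(oid)
        self._parsed[oid] = obj
        return obj
```

If an earlier pass has already parsed every object, parse() and lookup()
return the same struct and the loader never runs again, which is why the
choice matters little in that code path.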

-Peff
