From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 07/13] builtin/index-pack: fix deferred fsck outside repos
Date: Wed, 19 Nov 2025 13:27:51 -0800
Message-ID: <xmqq1pltbtm0.fsf@gitster.g>
In-Reply-To: <20251119-b4-pks-odb-creation-v1-7-2b2ed2612cb6@pks.im> (Patrick Steinhardt's message of "Wed, 19 Nov 2025 08:50:55 +0100")

Patrick Steinhardt <ps@pks.im> writes:

> There's another option though: instead of skipping the final object
> checks, we can die if there are any queued object checks. With this
> change we now die exactly if and only if we would have previously
> segfaulted. Like this we ensure that objects that _may_ fail the
> consistency checks won't be silently skipped, and at the same time we
> give users a much better error message.

A packfile stream may not have the blob objects these tree entries
refer to, in which case index-pack cannot work outside a repository,
but I think that is fine.

> @@ -2110,8 +2110,23 @@ int cmd_index_pack(int argc,
>  	else
>  		close(input_fd);
>  
> -	if (do_fsck_object && fsck_finish(&fsck_options))
> -		die(_("fsck error in pack objects"));
> +	if (do_fsck_object) {
> +		/*
> +		 * We cannot perform queued consistency checks when running
> +		 * outside of a repository because those require us to read
> +		 * from the object database, which is uninitialized.
> +		 *
> +		 * TODO: we may eventually set up an in-memory object database,
> +		 * which would allow us to perform these queued checks.
> +		 */
> +		if (!startup_info->have_repository &&
> +		    fsck_has_queued_checks(&fsck_options))
> +			die(_("cannot perform queued object checks outside "
> +			      "of a repository"));
> +
> +		if (fsck_finish(&fsck_options))
> +			die(_("fsck error in pack objects"));
> +	}

OK.

> +bool fsck_has_queued_checks(struct fsck_options *options)
> +{
> +	return !oidset_equal(&options->gitmodules_found, &options->gitmodules_done) ||
> +	       !oidset_equal(&options->gitattributes_found, &options->gitattributes_done);
> +}

So, if we see a tree entry for these special blobs (and remember
them in the _found oid set) before we see the blobs, fsck_blob()
would notice that it is looking at a blob that is in one of these
_found sets, and add it to the matching _done set while checking
the blob in-core.

A packfile we generate has trees before blobs, so a self-contained
pack stream should still be validatable outside a repository with
this code, but other people's reimplementations of Git may produce
a packfile that has a blob before a tree that refers to the blob.
In other words, we can validate a self-contained pack stream outside
a repository only on a best-effort basis.  And that is perfectly fine.
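
To illustrate the ordering argument, here is a minimal standalone
sketch.  It is not Git's code: "struct idset", "struct obj" and
has_queued_checks_after() are hypothetical stand-ins for the oidset
pair and for fsck_has_queued_checks(), and objects are identified by
plain integers rather than object IDs.  It only models the rule
described above, namely that a blob moves to "done" when it is
already in "found" at the time it is seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Git's oidset: a tiny fixed-size set of ids. */
#define MAX_IDS 16

struct idset {
	int ids[MAX_IDS];
	size_t nr;
};

static bool idset_contains(const struct idset *set, int id)
{
	for (size_t i = 0; i < set->nr; i++)
		if (set->ids[i] == id)
			return true;
	return false;
}

static void idset_insert(struct idset *set, int id)
{
	if (idset_contains(set, id))
		return;
	assert(set->nr < MAX_IDS); /* the sketch has a fixed capacity */
	set->ids[set->nr++] = id;
}

static bool idset_equal(const struct idset *a, const struct idset *b)
{
	if (a->nr != b->nr)
		return false;
	for (size_t i = 0; i < a->nr; i++)
		if (!idset_contains(b, a->ids[i]))
			return false;
	return true;
}

enum obj_kind {
	OBJ_KIND_TREE, /* a tree with an entry pointing at a special blob */
	OBJ_KIND_BLOB, /* a blob as it appears in the pack stream */
};

struct obj {
	enum obj_kind kind;
	int id; /* tree: id of the blob it refers to; blob: its own id */
};

/*
 * Walk the stream in order.  A tree records its special blob in
 * "found"; a blob is moved to "done" only if it is already in
 * "found" when we look at it.  Anything left in "found" but not in
 * "done" at the end is a queued check.
 */
static bool has_queued_checks_after(const struct obj *stream, size_t nr)
{
	struct idset found = { 0 }, done = { 0 };

	for (size_t i = 0; i < nr; i++) {
		if (stream[i].kind == OBJ_KIND_TREE)
			idset_insert(&found, stream[i].id);
		else if (idset_contains(&found, stream[i].id))
			idset_insert(&done, stream[i].id);
	}
	return !idset_equal(&found, &done);
}
```

With a stream of { tree -> 42, blob 42 } the two sets end up equal and
nothing is queued.  With { blob 42, tree -> 42 }, or with the blob
missing from the stream altogether, "found" has an entry that "done"
lacks, which is the case where the patch makes index-pack die outside
a repository.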

