git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 07/13] builtin/index-pack: fix deferred fsck outside repos
Date: Wed, 19 Nov 2025 13:27:51 -0800
Message-ID: <xmqq1pltbtm0.fsf@gitster.g>
In-Reply-To: <20251119-b4-pks-odb-creation-v1-7-2b2ed2612cb6@pks.im> (Patrick Steinhardt's message of "Wed, 19 Nov 2025 08:50:55 +0100")

Patrick Steinhardt <ps@pks.im> writes:

> There's another option though: instead of skipping the final object
> checks, we can die if there are any queued object checks. With this
> change we now die exactly if and only if we would have previously
> segfaulted. Like this we ensure that objects that _may_ fail the
> consistency checks won't be silently skipped, and at the same time we
> give users a much better error message.

A packfile stream may not have the blob objects these tree entries
refer to, in which case index-pack cannot work outside a repository,
but I think that is fine.

> @@ -2110,8 +2110,23 @@ int cmd_index_pack(int argc,
>  	else
>  		close(input_fd);
>  
> -	if (do_fsck_object && fsck_finish(&fsck_options))
> -		die(_("fsck error in pack objects"));
> +	if (do_fsck_object) {
> +		/*
> +		 * We cannot perform queued consistency checks when running
> +		 * outside of a repository because those require us to read
> +		 * from the object database, which is uninitialized.
> +		 *
> +		 * TODO: we may eventually set up an in-memory object database,
> +		 * which would allow us to perform these queued checks.
> +		 */
> +		if (!startup_info->have_repository &&
> +		    fsck_has_queued_checks(&fsck_options))
> +			die(_("cannot perform queued object checks outside "
> +			      "of a repository"));
> +
> +		if (fsck_finish(&fsck_options))
> +			die(_("fsck error in pack objects"));
> +	}

OK.

> +bool fsck_has_queued_checks(struct fsck_options *options)
> +{
> +	return !oidset_equal(&options->gitmodules_found, &options->gitmodules_done) ||
> +	       !oidset_equal(&options->gitattributes_found, &options->gitattributes_done);
> +}

So, if we see a tree entry for one of these special blobs (and
remember it in the _found oidset) before we see the blob itself,
fsck_blob() would notice that it is looking at a blob that is in the
_found set, and add it to the _done set while checking the blob
in-core.

A packfile we generate has trees before blobs, so a self-contained
pack stream should still be validatable outside a repository with
this code, but other people's reimplementations of Git may produce
a packfile that has a blob before a tree that refers to the blob.
In other words, we can validate a self-contained pack stream outside
a repository on a best-effort basis.  And that is perfectly fine.



Thread overview: 21+ messages
2025-11-19  7:50 [PATCH 00/13] Centralize management of object database sources Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 01/13] path: move `enter_repo()` into "setup.c" Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 02/13] setup: convert `set_git_dir()` to have file scope Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 03/13] odb: adopt logic to close object databases Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 04/13] odb: refactor `odb_clear()` to `odb_free()` Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 05/13] odb: move logic to disable ref updates into repo Patrick Steinhardt
2025-11-19 20:51   ` Junio C Hamano
2025-11-21  7:48     ` Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 06/13] oidset: introduce `oidset_equal()` Patrick Steinhardt
2025-11-19 20:59   ` Junio C Hamano
2025-11-19  7:50 ` [PATCH 07/13] builtin/index-pack: fix deferred fsck outside repos Patrick Steinhardt
2025-11-19 21:27   ` Junio C Hamano [this message]
2025-11-21  7:48     ` Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 08/13] t/helper: stop setting up `the_repository` repeatedly Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 09/13] http-push: stop setting up `the_repository` for each reference Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 10/13] odb: handle initialization of sources in `odb_new()` Patrick Steinhardt
2025-11-19  7:50 ` [PATCH 11/13] chdir-notify: add function to unregister listeners Patrick Steinhardt
2025-11-19  7:51 ` [PATCH 12/13] odb: handle changing a repository's commondir Patrick Steinhardt
2025-11-20 22:06   ` Junio C Hamano
2025-11-21  8:12     ` Patrick Steinhardt
2025-11-19  7:51 ` [PATCH 13/13] odb: handle recreation of quarantine directories Patrick Steinhardt
