git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 06/13] fsck: stop using object_info->type_name strbuf
Date: Fri, 16 May 2025 11:52:26 +0200
Message-ID: <aCcK2lfr8048Kh7E@pks.im>
In-Reply-To: <20250516044953.GF22242@coredump.intra.peff.net>

On Fri, May 16, 2025 at 12:49:53AM -0400, Jeff King wrote:
> When fsck-ing a loose object, we use object_info's type_name strbuf to
> record the parsed object type as a string. For most objects this is
> redundant with the object_type enum, but it does let us report the
> string when we encounter an object with an unknown type (for which there
> is no matching enum value).
> 
> There are a few downsides, though:
> 
>   1. The code to report these cases is not actually robust. Since we did
>      not pass a strbuf to unpack_loose_header(), we only retrieved types
>      from headers up to 32 bytes. In longer cases, we'd simply say
>      "object corrupt or missing".
> 
>   2. This is the last caller that uses object_info's type_name strbuf
>      support. It would be nice to refactor it so that we can simplify
>      that code.
> 
>   3. Likewise, we'll check the hash of the object using its unknown type
>      (again, as long as that type is short enough). That depends on the
>      hash_object_file_literally() code, which we'd eventually like to
>      get rid of.

Oh, I'd very much welcome it if this code path went away completely.

> So we can simplify things by bailing immediately in read_loose_object()
> when we encounter an unknown type. This has a few user-visible effects:
> 
>   a. Instead of producing a single line of error output like this:
> 
>        error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
> 
>      we'll now issue two lines (the first from read_loose_object() when
>      we see the unparsable header, and the second from the fsck code,
>      since we couldn't read the object):
> 
>        error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
>        error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
> 
>      This is a little more verbose, but this sort of error should be
>      rare (such objects are almost impossible to work with, and cannot
>      be transferred between repositories as they are not representable
>      in packfiles). And as a bonus, reporting the broken header in full
>      could help with debugging other cases (e.g., a header like "blob
>      xyzzy\0" would fail in parsing the size, but previously we'd not
>      have shown the offending bytes).

Yup, I would claim this is an improvement, as well.
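For reference, the inflated data of a loose object starts with a header of the form `<type> <size>` followed by a NUL byte. The sketch below shows how a parser can tell the two failure modes mentioned above apart: "bogus 4" fails on the type, while "blob xyzzy" fails on the size. It is a simplified illustration with hypothetical names (`parse_header_sketch()`, `type_from_name()`), not git's actual header-parsing code. It assumes the NUL terminator has already been located and omits size-overflow checks.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Same values as git's enum object_type for the standard types;
 * OBJ_BAD (-1) stands for "type name not recognized".
 */
enum object_type {
	OBJ_BAD = -1,
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

/* Hypothetical helper: map the first len bytes of name to a type. */
static enum object_type type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (int i = 0; i < 4; i++)
		if (strlen(names[i]) == len && !memcmp(name, names[i], len))
			return (enum object_type)(i + 1);
	return OBJ_BAD;
}

/*
 * Hypothetical parser for a header "<type> <size>" whose trailing NUL
 * has already been located. Returns 0 on success, -1 if the type is
 * missing or unknown (e.g. "bogus 4"), and -2 if the size is not a plain
 * decimal number (e.g. "blob xyzzy"). Overflow checks are omitted.
 */
static int parse_header_sketch(const char *hdr, enum object_type *type,
			       unsigned long *size)
{
	const char *sp = strchr(hdr, ' ');
	char *end;

	*type = OBJ_BAD;
	if (!sp)
		return -1;
	*type = type_from_name(hdr, (size_t)(sp - hdr));
	if (*type < 0)
		return -1;

	/* Require a leading digit: strtoul() would also accept signs and spaces. */
	if (sp[1] < '0' || sp[1] > '9')
		return -2;
	*size = strtoul(sp + 1, &end, 10);
	if (*end != '\0')
		return -2;
	return 0;
}
```

Under these assumptions, "bogus 4" yields -1 and "blob xyzzy" yields -2. In both cases, echoing the full header in the error message makes the offending bytes visible, which is the debugging benefit described in the commit message.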

>   b. An object with an unknown type will be reported as corrupt, without
>      actually doing a hash check. Again, I think this is unlikely to
>      matter in practice since such objects are totally unusable.

Agreed.

> We'll update one fsck test to match the new error strings. And we can
> remove another test that covered the case of an object with an unknown
> type _and_ a hash corruption. Since we'll skip the hash check now in
> this case, the test is no longer interesting.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/fsck.c  | 13 ++-----------
>  object-file.c   | 12 +++++++++---
>  t/t1450-fsck.sh | 29 +++--------------------------
>  3 files changed, 14 insertions(+), 40 deletions(-)
> 
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 6cac28356c..e7d96a9c8e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -614,12 +614,11 @@ static void get_default_heads(void)
>  struct for_each_loose_cb
>  {
>  	struct progress *progress;
> -	struct strbuf obj_type;
>  };
>  
> -static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> +static int fsck_loose(const struct object_id *oid, const char *path,
> +		      void *data UNUSED)
>  {
> -	struct for_each_loose_cb *cb_data = data;
>  	struct object *obj;
>  	enum object_type type = OBJ_NONE;
>  	unsigned long size;
> @@ -629,8 +628,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	struct object_id real_oid = *null_oid(the_hash_algo);
>  	int err = 0;
>  
> -	strbuf_reset(&cb_data->obj_type);
> -	oi.type_name = &cb_data->obj_type;
>  	oi.sizep = &size;
>  	oi.typep = &type;
>  
> @@ -642,10 +639,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  			err = error(_("%s: object corrupt or missing: %s"),
>  				    oid_to_hex(oid), path);
>  	}
> -	if (type != OBJ_NONE && type < 0)
> -		err = error(_("%s: object is of unknown type '%s': %s"),
> -			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
> -			    path);

This one looked a bit curious at first. But since `read_loose_object()`
now reports this error itself and fails, we don't need to print it over
here anymore.

> diff --git a/object-file.c b/object-file.c
> index 1127e154f6..7a35bde96e 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1662,6 +1662,12 @@ int read_loose_object(const char *path,
>  		goto out_inflate;
>  	}
>  
> +	if (*oi->typep < 0) {
> +		error(_("unable to parse type from header '%s' of %s"),
> +		      hdr, path);
> +		goto out_inflate;
> +	}
> +
>  	if (*oi->typep == OBJ_BLOB &&
>  	    *size > repo_settings_get_big_file_threshold(the_repository)) {
>  		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)

So this is where we report the new error now. Makes sense.
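Taken together, the two halves of the patch split the responsibility as sketched below. This is a simplified illustration rather than git's code. `read_object_sketch()` and `fsck_loose_sketch()` are hypothetical stand-ins for read_loose_object() and fsck_loose(). Headers are passed as already-inflated strings, the type lookup is a plain string comparison, and the hash check is elided. The sketch shows why the removed unknown-type branch in fsck_loose() became unnecessary: the reader reports the raw header and bails out before the hash check, so the caller only ever sees a read failure.

```c
#include <stdio.h>
#include <string.h>

/* Subset of git's enum object_type; OBJ_BAD (-1) means "unknown". */
enum object_type {
	OBJ_BAD = -1,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

/* Hypothetical helper: look up the type word at the start of hdr. */
static enum object_type type_from_header(const char *hdr)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	size_t len = strcspn(hdr, " ");

	for (int i = 0; i < 4; i++)
		if (strlen(names[i]) == len && !strncmp(hdr, names[i], len))
			return (enum object_type)(i + 1);
	return OBJ_BAD;
}

/*
 * Hypothetical stand-in for read_loose_object(): an unknown type is
 * reported together with the raw header, and we bail out before any
 * hash check or big-blob streaming is attempted.
 */
static int read_object_sketch(const char *hdr, const char *path,
			      enum object_type *type)
{
	*type = type_from_header(hdr);
	if (*type < 0) {
		fprintf(stderr,
			"error: unable to parse type from header '%s' of %s\n",
			hdr, path);
		return -1;
	}
	/* Hash verification of the object contents would go here. */
	return 0;
}

/*
 * Hypothetical stand-in for fsck_loose(): any read failure is reported
 * as "corrupt or missing". A type that reaches this point is always
 * valid, so no separate unknown-type branch is needed.
 */
static int fsck_loose_sketch(const char *hex, const char *hdr,
			     const char *path)
{
	enum object_type type;

	if (read_object_sketch(hdr, path, &type) < 0) {
		fprintf(stderr, "error: %s: object corrupt or missing: %s\n",
			hex, path);
		return -1;
	}
	return 0;
}
```

Called with the header "bogus 4", fsck_loose_sketch() writes two lines to stderr. They have the same shape as the new output quoted in the commit message: first the header error from the reader, then the generic "object corrupt or missing" error from the caller.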

Patrick


Thread overview: 35+ messages
2025-05-16  4:49 [PATCH 0/13] dropping support for non-standard object types Jeff King
2025-05-16  4:49 ` [PATCH 01/13] object-file.h: fix typo in variable declaration Jeff King
2025-05-16  4:49 ` [PATCH 02/13] cat-file: make --allow-unknown-type a noop Jeff King
2025-05-16  9:52   ` Patrick Steinhardt
2025-05-19  6:16     ` Jeff King
2025-05-19  7:22       ` Patrick Steinhardt
2025-05-16 16:47   ` Junio C Hamano
2025-05-16  4:49 ` [PATCH 03/13] object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag Jeff King
2025-05-16  4:49 ` [PATCH 04/13] cat-file: use type enum instead of buffer for -t option Jeff King
2025-05-16 16:56   ` Junio C Hamano
2025-05-16  4:49 ` [PATCH 05/13] oid_object_info_convert(): stop using string for object type Jeff King
2025-05-16  4:49 ` [PATCH 06/13] fsck: stop using object_info->type_name strbuf Jeff King
2025-05-16  9:52   ` Patrick Steinhardt [this message]
2025-05-19 14:26   ` Junio C Hamano
2025-05-19 17:00     ` Jeff King
2025-05-16  4:49 ` [PATCH 07/13] oid_object_info(): drop type_name strbuf Jeff King
2025-05-19 14:58   ` Junio C Hamano
2025-05-16  4:49 ` [PATCH 08/13] t/helper: add zlib test-tool Jeff King
2025-05-19 15:03   ` Junio C Hamano
2025-05-19 17:03     ` Jeff King
2025-05-21 13:44       ` Junio C Hamano
2025-05-16  4:50 ` [PATCH 09/13] t: add lib-loose.sh Jeff King
2025-05-16  9:52   ` Patrick Steinhardt
2025-05-19  6:17     ` Jeff King
2025-05-19 15:12   ` Junio C Hamano
2025-05-16  4:50 ` [PATCH 10/13] hash-object: stop allowing unknown types Jeff King
2025-05-19 15:15   ` Junio C Hamano
2025-05-16  4:50 ` [PATCH 11/13] hash-object: merge HASH_* and INDEX_* flags Jeff King
2025-05-16  9:52   ` Patrick Steinhardt
2025-05-16  4:50 ` [PATCH 12/13] hash-object: handle --literally with OPT_NEGBIT Jeff King
2025-05-19 15:30   ` Junio C Hamano
2025-05-16  4:50 ` [PATCH 13/13] object-file: drop support for writing objects with unknown types Jeff King
2025-05-16  9:52   ` Patrick Steinhardt
2025-05-19 15:32   ` Junio C Hamano
2025-05-16 16:36 ` [PATCH 0/13] dropping support for non-standard object types Junio C Hamano
