From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 06/13] fsck: stop using object_info->type_name strbuf
Date: Fri, 16 May 2025 11:52:26 +0200
Message-ID: <aCcK2lfr8048Kh7E@pks.im>
In-Reply-To: <20250516044953.GF22242@coredump.intra.peff.net>
On Fri, May 16, 2025 at 12:49:53AM -0400, Jeff King wrote:
> When fsck-ing a loose object, we use object_info's type_name strbuf to
> record the parsed object type as a string. For most objects this is
> redundant with the object_type enum, but it does let us report the
> string when we encounter an object with an unknown type (for which there
> is no matching enum value).
>
> There are a few downsides, though:
>
> 1. The code to report these cases is not actually robust. Since we did
> not pass a strbuf to unpack_loose_header(), we only retrieved types
> from headers up to 32 bytes. In longer cases, we'd simply say
> "object corrupt or missing".
>
> 2. This is the last caller that uses object_info's type_name strbuf
> support. It would be nice to refactor it so that we can simplify
> that code.
>
> 3. Likewise, we'll check the hash of the object using its unknown type
> (again, as long as that type is short enough). That depends on the
> hash_object_file_literally() code, which we'd eventually like to
> get rid of.
Oh, I'd very much welcome it if this code path went away completely.
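To make point 1 concrete, here is a simplified C model, not git's actual code. It assumes the loose-object header is "<type> <size>" followed by a NUL, and that without a strbuf the header is copied into a fixed 32-byte buffer, as the 32-byte limit above describes. A header whose NUL does not appear within that window cannot be inspected at all, which is why such objects were only reported as "object corrupt or missing". The names MAX_HEADER_LEN and copy_short_header() are used here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the fixed-size header buffer (32 bytes). */
#define MAX_HEADER_LEN 32

/*
 * Copy the header, including its terminating NUL, from the start of an
 * inflated loose object into a fixed buffer. Returns 0 on success, or -1
 * if no NUL appears within MAX_HEADER_LEN bytes (header too long to read).
 */
static int copy_short_header(const char *inflated, size_t len,
			     char out[MAX_HEADER_LEN])
{
	size_t i;

	assert(inflated != NULL && out != NULL);
	for (i = 0; i < len && i < MAX_HEADER_LEN; i++) {
		out[i] = inflated[i];
		if (inflated[i] == '\0')
			return 0;
	}
	return -1; /* caller can only say "corrupt or missing" */
}
```

With this model, "bogus 4" fits and its type name can be reported, while a type name of 30-odd characters makes the copy fail before the type is ever looked at.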
> So we can simplify things by bailing immediately in read_loose_object()
> when we encounter an unknown type. This has a few user-visible effects:
>
> a. Instead of producing a single line of error output like this:
>
> error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
>
> we'll now issue two lines (the first from read_loose_object() when
> we see the unparsable header, and the second from the fsck code,
> since we couldn't read the object):
>
> error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
> error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
>
> This is a little more verbose, but this sort of error should be
> rare (such objects are almost impossible to work with, and cannot
> be transferred between repositories as they are not representable
> in packfiles). And as a bonus, reporting the broken header in full
> could help with debugging other cases (e.g., a header like "blob
> xyzzy\0" would fail in parsing the size, but previously we'd not
> have shown the offending bytes).
Yup, I would claim this is an improvement, as well.
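A minimal sketch of why printing the whole header helps, assuming the header is the NUL-terminated string "<type> <size>" and that only git's four standard type names are valid. parse_header(), type_from_name() and the size error message are hypothetical; only the type error text is taken from the object-file.c hunk in the patch. Both failure cases from the commit message ("bogus 4" and "blob xyzzy") end up echoing the full header.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified copy of git's object type values; negative means "unknown". */
enum obj_type { OBJ_BAD = -1, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

/* Map a type name of length len to its enum value, or OBJ_BAD. */
static enum obj_type type_from_name(const char *name, size_t len)
{
	static const struct {
		const char *name;
		enum obj_type type;
	} known[] = {
		{ "commit", OBJ_COMMIT }, { "tree", OBJ_TREE },
		{ "blob", OBJ_BLOB }, { "tag", OBJ_TAG },
	};
	size_t i;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (strlen(known[i].name) == len && !memcmp(known[i].name, name, len))
			return known[i].type;
	return OBJ_BAD;
}

/*
 * Parse a NUL-terminated "<type> <size>" header. On failure, print the
 * whole header so the offending bytes are visible, and return -1.
 * Overflow of the size is not checked in this sketch.
 */
static int parse_header(const char *hdr, const char *path,
			enum obj_type *type, unsigned long *size)
{
	const char *sp;
	char *end;

	assert(hdr && path && type && size);
	sp = strchr(hdr, ' ');
	*type = sp ? type_from_name(hdr, (size_t)(sp - hdr)) : OBJ_BAD;
	if (*type < 0) {
		fprintf(stderr, "error: unable to parse type from header '%s' of %s\n",
			hdr, path);
		return -1;
	}

	/* Require at least one digit and nothing after the number. */
	if (!isdigit((unsigned char)sp[1]))
		goto bad_size;
	*size = strtoul(sp + 1, &end, 10);
	if (*end != '\0')
		goto bad_size;
	return 0;

bad_size:
	/* Hypothetical message; this text does not come from the patch. */
	fprintf(stderr, "error: unable to parse size from header '%s' of %s\n",
		hdr, path);
	return -1;
}
```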
> b. An object with an unknown type will be reported as corrupt, without
> actually doing a hash check. Again, I think this is unlikely to
> matter in practice since such objects are totally unusable.
Agreed.
> We'll update one fsck test to match the new error strings. And we can
> remove another test that covered the case of an object with an unknown
> type _and_ a hash corruption. Since we'll skip the hash check now in
> this case, the test is no longer interesting.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> builtin/fsck.c | 13 ++-----------
> object-file.c | 12 +++++++++---
> t/t1450-fsck.sh | 29 +++--------------------------
> 3 files changed, 14 insertions(+), 40 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 6cac28356c..e7d96a9c8e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -614,12 +614,11 @@ static void get_default_heads(void)
> struct for_each_loose_cb
> {
> struct progress *progress;
> - struct strbuf obj_type;
> };
>
> -static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> +static int fsck_loose(const struct object_id *oid, const char *path,
> + void *data UNUSED)
> {
> - struct for_each_loose_cb *cb_data = data;
> struct object *obj;
> enum object_type type = OBJ_NONE;
> unsigned long size;
> @@ -629,8 +628,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> struct object_id real_oid = *null_oid(the_hash_algo);
> int err = 0;
>
> - strbuf_reset(&cb_data->obj_type);
> - oi.type_name = &cb_data->obj_type;
> oi.sizep = &size;
> oi.typep = &type;
>
> @@ -642,10 +639,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> err = error(_("%s: object corrupt or missing: %s"),
> oid_to_hex(oid), path);
> }
> - if (type != OBJ_NONE && type < 0)
> - err = error(_("%s: object is of unknown type '%s': %s"),
> - oid_to_hex(&real_oid), cb_data->obj_type.buf,
> - path);
This one looked a bit curious at first. But because `read_loose_object()`
now reports the unparsable type itself before failing, printing another
error over here would only duplicate it.
> diff --git a/object-file.c b/object-file.c
> index 1127e154f6..7a35bde96e 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1662,6 +1662,12 @@ int read_loose_object(const char *path,
> goto out_inflate;
> }
>
> + if (*oi->typep < 0) {
> + error(_("unable to parse type from header '%s' of %s"),
> + hdr, path);
> + goto out_inflate;
> + }
> +
> if (*oi->typep == OBJ_BLOB &&
> *size > repo_settings_get_big_file_threshold(the_repository)) {
> if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
So this is where we report the new error now. Makes sense.
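The resulting control flow can be modeled roughly as below. This is a hypothetical simplification, not git's code: read_loose_object_model() and fsck_loose_model() stand in for read_loose_object() and fsck_loose(), the header type is assumed to be parsed already, and error_msg() mimics git's error() by printing to stderr and returning -1. An unknown type yields exactly two error lines, matching the output quoted in the commit message.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

enum obj_type { OBJ_BAD = -1, OBJ_NONE = 0, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

/* Count of error lines printed, so the two-line output can be checked. */
static int errors_printed;

/* Mimics git's error(): print "error: <msg>" to stderr and return -1. */
static int error_msg(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("error: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	errors_printed++;
	return -1;
}

/* Stand-in for read_loose_object(): bail before any hash check. */
static int read_loose_object_model(const char *hdr, enum obj_type type,
				   const char *path)
{
	if (type < 0)
		return error_msg("unable to parse type from header '%s' of %s",
				 hdr, path);
	/* ... inflate the body and verify its hash here ... */
	return 0;
}

/* Stand-in for fsck_loose(): the caller only knows the read failed. */
static int fsck_loose_model(const char *hex, const char *hdr,
			    enum obj_type type, const char *path)
{
	assert(hex && hdr && path);
	if (read_loose_object_model(hdr, type, path) < 0)
		return error_msg("%s: object corrupt or missing: %s", hex, path);
	return 0;
}
```

Since the callee has already explained what went wrong, the caller only needs its generic message, which is why the "unknown type" branch in builtin/fsck.c can go.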
Patrick
Thread overview: 35+ messages
2025-05-16 4:49 [PATCH 0/13] dropping support for non-standard object types Jeff King
2025-05-16 4:49 ` [PATCH 01/13] object-file.h: fix typo in variable declaration Jeff King
2025-05-16 4:49 ` [PATCH 02/13] cat-file: make --allow-unknown-type a noop Jeff King
2025-05-16 9:52 ` Patrick Steinhardt
2025-05-19 6:16 ` Jeff King
2025-05-19 7:22 ` Patrick Steinhardt
2025-05-16 16:47 ` Junio C Hamano
2025-05-16 4:49 ` [PATCH 03/13] object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag Jeff King
2025-05-16 4:49 ` [PATCH 04/13] cat-file: use type enum instead of buffer for -t option Jeff King
2025-05-16 16:56 ` Junio C Hamano
2025-05-16 4:49 ` [PATCH 05/13] oid_object_info_convert(): stop using string for object type Jeff King
2025-05-16 4:49 ` [PATCH 06/13] fsck: stop using object_info->type_name strbuf Jeff King
2025-05-16 9:52 ` Patrick Steinhardt [this message]
2025-05-19 14:26 ` Junio C Hamano
2025-05-19 17:00 ` Jeff King
2025-05-16 4:49 ` [PATCH 07/13] oid_object_info(): drop type_name strbuf Jeff King
2025-05-19 14:58 ` Junio C Hamano
2025-05-16 4:49 ` [PATCH 08/13] t/helper: add zlib test-tool Jeff King
2025-05-19 15:03 ` Junio C Hamano
2025-05-19 17:03 ` Jeff King
2025-05-21 13:44 ` Junio C Hamano
2025-05-16 4:50 ` [PATCH 09/13] t: add lib-loose.sh Jeff King
2025-05-16 9:52 ` Patrick Steinhardt
2025-05-19 6:17 ` Jeff King
2025-05-19 15:12 ` Junio C Hamano
2025-05-16 4:50 ` [PATCH 10/13] hash-object: stop allowing unknown types Jeff King
2025-05-19 15:15 ` Junio C Hamano
2025-05-16 4:50 ` [PATCH 11/13] hash-object: merge HASH_* and INDEX_* flags Jeff King
2025-05-16 9:52 ` Patrick Steinhardt
2025-05-16 4:50 ` [PATCH 12/13] hash-object: handle --literally with OPT_NEGBIT Jeff King
2025-05-19 15:30 ` Junio C Hamano
2025-05-16 4:50 ` [PATCH 13/13] object-file: drop support for writing objects with unknown types Jeff King
2025-05-16 9:52 ` Patrick Steinhardt
2025-05-19 15:32 ` Junio C Hamano
2025-05-16 16:36 ` [PATCH 0/13] dropping support for non-standard object types Junio C Hamano