From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 06/13] fsck: stop using object_info->type_name strbuf
Date: Fri, 16 May 2025 11:52:26 +0200
Message-ID: <aCcK2lfr8048Kh7E@pks.im>
In-Reply-To: <20250516044953.GF22242@coredump.intra.peff.net>
On Fri, May 16, 2025 at 12:49:53AM -0400, Jeff King wrote:
> When fsck-ing a loose object, we use object_info's type_name strbuf to
> record the parsed object type as a string. For most objects this is
> redundant with the object_type enum, but it does let us report the
> string when we encounter an object with an unknown type (for which there
> is no matching enum value).
>
> There are a few downsides, though:
>
> 1. The code to report these cases is not actually robust. Since we did
> not pass a strbuf to unpack_loose_header(), we only retrieved types
> from headers up to 32 bytes. In longer cases, we'd simply say
> "object corrupt or missing".
>
> 2. This is the last caller that uses object_info's type_name strbuf
> support. It would be nice to refactor it so that we can simplify
> that code.
>
> 3. Likewise, we'll check the hash of the object using its unknown type
> (again, as long as that type is short enough). That depends on the
> hash_object_file_literally() code, which we'd eventually like to
> get rid of.
Oh, I'd very much welcome it if this code path went away completely.
> So we can simplify things by bailing immediately in read_loose_object()
> when we encounter an unknown type. This has a few user-visible effects:
>
> a. Instead of producing a single line of error output like this:
>
> error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
>
> we'll now issue two lines (the first from read_loose_object() when
> we see the unparsable header, and the second from the fsck code,
> since we couldn't read the object):
>
> error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
> error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
>
> This is a little more verbose, but this sort of error should be
> rare (such objects are almost impossible to work with, and cannot
> be transferred between repositories as they are not representable
> in packfiles). And as a bonus, reporting the broken header in full
> could help with debugging other cases (e.g., a header like "blob
> xyzzy\0" would fail in parsing the size, but previously we'd not
> have shown the offending bytes).
Yup, I would claim this is an improvement, as well.
> b. An object with an unknown type will be reported as corrupt, without
> actually doing a hash check. Again, I think this is unlikely to
> matter in practice since such objects are totally unusable.
Agreed.
> We'll update one fsck test to match the new error strings. And we can
> remove another test that covered the case of an object with an unknown
> type _and_ a hash corruption. Since we'll skip the hash check now in
> this case, the test is no longer interesting.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> builtin/fsck.c | 13 ++-----------
> object-file.c | 12 +++++++++---
> t/t1450-fsck.sh | 29 +++--------------------------
> 3 files changed, 14 insertions(+), 40 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 6cac28356c..e7d96a9c8e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -614,12 +614,11 @@ static void get_default_heads(void)
> struct for_each_loose_cb
> {
> struct progress *progress;
> - struct strbuf obj_type;
> };
>
> -static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> +static int fsck_loose(const struct object_id *oid, const char *path,
> + void *data UNUSED)
> {
> - struct for_each_loose_cb *cb_data = data;
> struct object *obj;
> enum object_type type = OBJ_NONE;
> unsigned long size;
> @@ -629,8 +628,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> struct object_id real_oid = *null_oid(the_hash_algo);
> int err = 0;
>
> - strbuf_reset(&cb_data->obj_type);
> - oi.type_name = &cb_data->obj_type;
> oi.sizep = &size;
> oi.typep = &type;
>
> @@ -642,10 +639,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> err = error(_("%s: object corrupt or missing: %s"),
> oid_to_hex(oid), path);
> }
> - if (type != OBJ_NONE && type < 0)
> - err = error(_("%s: object is of unknown type '%s': %s"),
> - oid_to_hex(&real_oid), cb_data->obj_type.buf,
> - path);
This one looks a bit curious at first. But since `read_loose_object()`
has already reported the unknown type by the time we get here, we don't
need to print another error for it over here anymore.
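
To spell out that division of labor, here is a minimal standalone sketch.
It is not the actual git code: `sketch_error()`, `sketch_read_loose()`,
`sketch_fsck_loose()` and the `sketch_errors` counter are all hypothetical
stand-ins, and the message strings are shortened. It only illustrates that
the callee reports the unparsable type and the caller adds the generic
"corrupt or missing" line, so no third "unknown type" branch is needed.

```c
#include <stdio.h>

/* Hypothetical counter of emitted error lines, only used by this sketch. */
static int sketch_errors;

/* Hypothetical stand-in for git's error(): print one line, return -1. */
static int sketch_error(const char *msg, const char *path)
{
	fprintf(stderr, "error: %s: %s\n", msg, path);
	sketch_errors++;
	return -1;
}

/*
 * Stand-in for read_loose_object(). The parsed type is passed in directly
 * (an assumption to keep the sketch short); a negative value means the
 * header named a type we do not know.
 */
static int sketch_read_loose(int parsed_type, const char *path)
{
	if (parsed_type < 0)
		return sketch_error("unable to parse type from header", path);
	return 0;
}

/*
 * Stand-in for fsck_loose(). A failed read already produced the specific
 * error, so the caller only adds the generic line and does not inspect
 * the type again.
 */
static int sketch_fsck_loose(int parsed_type, const char *path)
{
	int err = 0;

	if (sketch_read_loose(parsed_type, path) < 0)
		err = sketch_error("object corrupt or missing", path);
	return err;
}
```

With a negative type the sketch emits two error lines, mirroring the
two-line output described in the commit message; with a known type it
emits none.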
> diff --git a/object-file.c b/object-file.c
> index 1127e154f6..7a35bde96e 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1662,6 +1662,12 @@ int read_loose_object(const char *path,
> goto out_inflate;
> }
>
> + if (*oi->typep < 0) {
> + error(_("unable to parse type from header '%s' of %s"),
> + hdr, path);
> + goto out_inflate;
> + }
> +
> if (*oi->typep == OBJ_BLOB &&
> *size > repo_settings_get_big_file_threshold(the_repository)) {
> if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
So this is where we report the new error now. Makes sense.
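
For readers following along, the condition `*oi->typep < 0` relies on the
header's type name having been mapped to an enum, with a negative value
for names that have no match. A rough standalone sketch of such a mapping
is below. The enum `sketch_type` and the function `sketch_parse_type()`
are hypothetical and only assume the "<type> <size>" header layout shown
in the error message above; git's real parsing lives elsewhere and is not
reproduced here.

```c
#include <string.h>

/* Hypothetical stand-in for git's enum object_type; values are illustrative. */
enum sketch_type {
	SKETCH_BAD = -1,
	SKETCH_NONE = 0,
	SKETCH_COMMIT,
	SKETCH_TREE,
	SKETCH_BLOB,
	SKETCH_TAG
};

/*
 * Map the type name at the start of a "<type> <size>" header to an enum
 * value. Returns SKETCH_BAD (negative) when there is no space separator
 * or when the name matches none of the known types.
 */
static enum sketch_type sketch_parse_type(const char *hdr)
{
	static const struct {
		const char *name;
		enum sketch_type type;
	} known[] = {
		{ "commit", SKETCH_COMMIT },
		{ "tree", SKETCH_TREE },
		{ "blob", SKETCH_BLOB },
		{ "tag", SKETCH_TAG },
	};
	const char *sp = strchr(hdr, ' ');
	size_t len, i;

	if (!sp)
		return SKETCH_BAD;
	len = (size_t)(sp - hdr);

	/* Compare length first so "blobby" does not match "blob". */
	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (strlen(known[i].name) == len &&
		    !memcmp(known[i].name, hdr, len))
			return known[i].type;
	return SKETCH_BAD;
}
```

A header like "bogus 4" yields a negative value here, which is the case
the new early return catches before any hash check is attempted.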
Patrick