public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/4] object-file: adapt `stream_object_signature()` to take a stream
Date: Mon, 23 Feb 2026 05:49:15 -0500	[thread overview]
Message-ID: <20260223104915.GA215364@coredump.intra.peff.net> (raw)
In-Reply-To: <20260223-pks-fsck-fix-v1-2-c29036832b6e@pks.im>

On Mon, Feb 23, 2026 at 10:50:41AM +0100, Patrick Steinhardt wrote:

> The function `stream_object_signature()` is responsible for verifying
> whether the given object ID matches the actual hash of the object's
> contents. In contrast to `check_object_signature()` it does so in a
> streaming fashion so that we don't have to load the full object into
> memory.

Makes sense, and is the first step I expected to see.

The code looks OK, though a few small comments below.

> -int stream_object_signature(struct repository *r, const struct object_id *oid)
> +int stream_object_signature(struct repository *r,
> +			    struct odb_read_stream *st,
> +			    const struct object_id *oid)
>  {
>  	struct object_id real_oid;
> -	struct odb_read_stream *st;
>  	struct git_hash_ctx c;
>  	char hdr[MAX_HEADER_LEN];
>  	int hdrlen;
>  
> -	st = odb_read_stream_open(r->objects, oid, NULL);
> -	if (!st)
> -		return -1;
> -
>  	/* Generate the header */
>  	hdrlen = format_object_header(hdr, sizeof(hdr), st->type, st->size);
>  
> @@ -160,7 +157,6 @@ int stream_object_signature(struct repository *r, const struct object_id *oid)
>  		git_hash_update(&c, buf, readlen);
>  	}
>  	git_hash_final_oid(&real_oid, &c);
> -	odb_read_stream_close(st);
>  	return !oideq(oid, &real_oid) ? -1 : 0;
>  }

The minimal change for callers would be to give a wrapper like
stream_object_signature_from_oid() or similar, that included the
open/close. But it's not too many lines, and of the two callers, one is
the caller we are trying to split apart anyway. So inlining these bits
in the callers makes sense.

> @@ -330,9 +331,17 @@ struct object *parse_object_with_flags(struct repository *r,
>  
>  	if ((!obj || obj->type == OBJ_NONE || obj->type == OBJ_BLOB) &&
>  	    odb_read_object_info(r->objects, oid, NULL) == OBJ_BLOB) {
> -		if (!skip_hash && stream_object_signature(r, repl) < 0) {
> -			error(_("hash mismatch %s"), oid_to_hex(oid));
> -			return NULL;
> +		if (!skip_hash) {
> +			struct odb_read_stream *stream = odb_read_stream_open(r->objects, oid, NULL);
> +			if (!stream || stream_object_signature(r, stream, repl) < 0) {
> +				error(_("hash mismatch %s"), oid_to_hex(oid));
> +				if (stream)
> +					odb_read_stream_close(stream);
> +				return NULL;
> +			}
> +
> +			if (stream)
> +				odb_read_stream_close(stream);
>  		}

This final "if (stream)" is a noop; we'd have exited the function
already if "!stream".

In the earlier conditional:

  if (!stream || stream_object_signature(r, stream, repl) < 0)

we're combining two error checks: opening the stream and actually
hashing it. That matches the existing code (since it all happened in a
single function), but should we take this opportunity to give more
accurate error messages? I.e., to do:

  if (!stream) {
	error(_("unable to open object stream for %s"), oid_to_hex(oid));
	return NULL;
  }
  if (stream_object_signature(r, stream, repl) < 0) {
	error(_("hash mismatch %s"), oid_to_hex(oid));
	odb_read_stream_close(stream);
	return NULL;
  }
  odb_read_stream_close(stream);

I dunno. It should be quite uncommon to see either of these messages,
but that is sometimes the moment when details are most important.

Also, as an aside, I found it curious that we still need to pass the
repository struct to stream_object_signature(). That's because it needs
to know the correct hash_algo. I wondered if the stream struct itself
might know about that, but it doesn't seem to (it doesn't know anything
about where it came from). So it's unavoidable that we'd need to retain
it.

> @@ -104,6 +105,7 @@ static int verify_packfile(struct repository *r,
>  	QSORT(entries, nr_objects, compare_entries);
>  
>  	for (i = 0; i < nr_objects; i++) {
> +		struct odb_read_stream *stream = NULL;
>  		void *data;
>  		struct object_id oid;
>  		enum object_type type;
> @@ -152,7 +154,9 @@ static int verify_packfile(struct repository *r,
>  							type) < 0)
>  			err = error("packed %s from %s is corrupt",
>  				    oid_to_hex(&oid), p->pack_name);
> -		else if (!data && stream_object_signature(r, &oid) < 0)
> +		else if (!data &&
> +			 (!(stream = odb_read_stream_open(r->objects, &oid, NULL)) ||
> +			  stream_object_signature(r, stream, &oid) < 0))
>  			err = error("packed %s from %s is corrupt",
>  				    oid_to_hex(&oid), p->pack_name);
>  		else if (fn) {
> @@ -163,12 +167,14 @@ static int verify_packfile(struct repository *r,
>  		}
>  		if (((base_count + i) & 1023) == 0)
>  			display_progress(progress, base_count + i);
> -		free(data);
>  
> +		if (stream)
> +			odb_read_stream_close(stream);
> +		free(data);
>  	}

OK, and in this case the final "if" is important, because we just set
"err" and continue through the function. This would be much simpler with
a wrapper, but of course this is the very caller we're planning to fix.

-Peff

  reply	other threads:[~2026-02-23 10:49 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23  9:50 [PATCH 0/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 11:13   ` Jeff King
2026-02-23 12:20     ` Patrick Steinhardt
2026-02-23 14:01   ` Eric Sunshine
2026-02-23  9:50 ` [PATCH 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 10:49   ` Jeff King [this message]
2026-02-23 12:21     ` Patrick Steinhardt
2026-02-23 12:59       ` Jeff King
2026-02-23  9:50 ` [PATCH 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 11:07   ` Jeff King
2026-02-23 12:21     ` Patrick Steinhardt
2026-02-23 13:12       ` Jeff King
2026-02-23 15:59         ` Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 4/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23 11:11   ` Jeff King
2026-02-23 11:30     ` Patrick Steinhardt
2026-02-23 12:58       ` Jeff King
2026-02-23 15:48         ` Patrick Steinhardt
2026-02-23 20:35   ` Junio C Hamano
2026-02-24  6:26     ` Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 0/4] " Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 4/4] pack-check: fix verification of large objects Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260223104915.GA215364@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox