public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/4] object-file: adapt `stream_object_signature()` to take a stream
Date: Mon, 23 Feb 2026 13:21:00 +0100	[thread overview]
Message-ID: <aZxGLKycnZcVoXPt@pks.im> (raw)
In-Reply-To: <20260223104915.GA215364@coredump.intra.peff.net>

On Mon, Feb 23, 2026 at 05:49:15AM -0500, Jeff King wrote:
> On Mon, Feb 23, 2026 at 10:50:41AM +0100, Patrick Steinhardt wrote:
> 
> > The function `stream_object_signature()` is responsible for verifying
> > whether the given object ID matches the actual hash of the object's
> > contents. In contrast to `check_object_signature()` it does so in a
> > streaming fashion so that we don't have to load the full object into
> > memory.
> 
> Makes sense, and is the first step I expected to see.
> 
> The code looks OK, though a few small comments below.
> 
> > -int stream_object_signature(struct repository *r, const struct object_id *oid)
> > +int stream_object_signature(struct repository *r,
> > +			    struct odb_read_stream *st,
> > +			    const struct object_id *oid)
> >  {
> >  	struct object_id real_oid;
> > -	struct odb_read_stream *st;
> >  	struct git_hash_ctx c;
> >  	char hdr[MAX_HEADER_LEN];
> >  	int hdrlen;
> >  
> > -	st = odb_read_stream_open(r->objects, oid, NULL);
> > -	if (!st)
> > -		return -1;
> > -
> >  	/* Generate the header */
> >  	hdrlen = format_object_header(hdr, sizeof(hdr), st->type, st->size);
> >  
> > @@ -160,7 +157,6 @@ int stream_object_signature(struct repository *r, const struct object_id *oid)
> >  		git_hash_update(&c, buf, readlen);
> >  	}
> >  	git_hash_final_oid(&real_oid, &c);
> > -	odb_read_stream_close(st);
> >  	return !oideq(oid, &real_oid) ? -1 : 0;
> >  }
> 
> The minimal change for callers would be to give a wrapper like
> stream_object_signature_from_oid() or similar, that included the
> open/close. But it's not too many lines, and of the two callers, one is
> the caller we are trying to split apart anyway. So inlining these bits
> in the callers makes sense.

Yeah. Had there been more callers then I'd have done that, but as you
noted there's only two, and one of them will need to be open coded
anyway.

> > @@ -330,9 +331,17 @@ struct object *parse_object_with_flags(struct repository *r,
> >  
> >  	if ((!obj || obj->type == OBJ_NONE || obj->type == OBJ_BLOB) &&
> >  	    odb_read_object_info(r->objects, oid, NULL) == OBJ_BLOB) {
> > -		if (!skip_hash && stream_object_signature(r, repl) < 0) {
> > -			error(_("hash mismatch %s"), oid_to_hex(oid));
> > -			return NULL;
> > +		if (!skip_hash) {
> > +			struct odb_read_stream *stream = odb_read_stream_open(r->objects, oid, NULL);
> > +			if (!stream || stream_object_signature(r, stream, repl) < 0) {
> > +				error(_("hash mismatch %s"), oid_to_hex(oid));
> > +				if (stream)
> > +					odb_read_stream_close(stream);
> > +				return NULL;
> > +			}
> > +
> > +			if (stream)
> > +				odb_read_stream_close(stream);
> >  		}
> 
> This final "if (stream)" is a noop; we'd have exited the function
> already if "!stream".
> 
> In the earlier conditional:
> 
>   if (!stream || stream_object_signature(r, stream, repl) < 0)
> 
> we're combining two error checks: opening the stream and actually
> hashing it.

Ah, right.

> That matches the existing code (since it all happened in a
> single function), but should we take this opportunity to give more
> accurate error messages? I.e., to do:
> 
>   if (!stream) {
> 	error(_("unable to open object stream for %s"), oid_to_hex(oid));
> 	return NULL;
>   }
>   if (stream_object_signature(r, stream, repl) < 0) {
> 	error(_("hash mismatch %s"), oid_to_hex(oid));
> 	odb_read_stream_close(stream);
> 	return NULL;
>   }
>   odb_read_stream_close(stream);
> 
> I dunno. It should be quite uncommon to see either of these messages,
> but that is sometimes the moment when details are most important.

Agreed, that feels like a sensible change indeed. Also makes the code
flow easier to follow in my opinion.

> Also, as an aside, I found it curious that we still need to pass the
> repository struct to stream_object_signature(). That's because it needs
> to know the correct hash_algo. I wondered if the stream struct itself
> might know about that, but it doesn't seem to (it doesn't know anything
> about where it came from). So it's unavoidable that we'd need to retain
> it.

Yeah, agreed. I wondered whether we should eventually extend `struct
odb_read_stream` to have a pointer to the owning object source, and in
that case we could've avoided the extra repository parameter. But I
decided it was out of scope for this patch series, also because I don't
want to cause conflicts with other stuff I'm working on in this vicinity
:)

Patrick

  reply	other threads:[~2026-02-23 12:21 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23  9:50 [PATCH 0/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 11:13   ` Jeff King
2026-02-23 12:20     ` Patrick Steinhardt
2026-02-23 14:01   ` Eric Sunshine
2026-02-23  9:50 ` [PATCH 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 10:49   ` Jeff King
2026-02-23 12:21     ` Patrick Steinhardt [this message]
2026-02-23 12:59       ` Jeff King
2026-02-23  9:50 ` [PATCH 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 11:07   ` Jeff King
2026-02-23 12:21     ` Patrick Steinhardt
2026-02-23 13:12       ` Jeff King
2026-02-23 15:59         ` Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 4/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23 11:11   ` Jeff King
2026-02-23 11:30     ` Patrick Steinhardt
2026-02-23 12:58       ` Jeff King
2026-02-23 15:48         ` Patrick Steinhardt
2026-02-23 20:35   ` Junio C Hamano
2026-02-24  6:26     ` Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 0/4] " Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 4/4] pack-check: fix verification of large objects Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZxGLKycnZcVoXPt@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox