public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 3/4] packfile: expose function to read object stream for an offset
Date: Mon, 23 Feb 2026 13:21:06 +0100	[thread overview]
Message-ID: <aZxGMrGkVNeAdC1N@pks.im> (raw)
In-Reply-To: <20260223110722.GB215364@coredump.intra.peff.net>

On Mon, Feb 23, 2026 at 06:07:22AM -0500, Jeff King wrote:
> On Mon, Feb 23, 2026 at 10:50:42AM +0100, Patrick Steinhardt wrote:
> > +int packfile_store_read_object_stream(struct odb_read_stream **out,
> > +				      struct packfile_store *store,
> > +				      const struct object_id *oid)
> > +{
> > +	struct pack_entry e;
> > +
> > +	if (!find_pack_entry(store, oid, &e))
> > +		return -1;
> > +
> > +	return packfile_read_object_stream(out, e.p, e.offset);
> > +}
> 
> OK. The original read via packfile_store_read_object_info(), which does
> a bit more work. It called packed_object_info() and if necessary would
> trigger mark_bad_packed_object(). But now that we are leaving it to
> packfile_read_object_stream() to look at the header, we don't need to
> load any object info, and we have no error code to check.
> 
> It does make me wonder, though, if we are missing out on marking bad
> objects here. The idea is that we'd usually do something like:
> 
>   1. some code wants to access $OID
> 
>   2. we find $OID in pack $P
> 
>   3. that turns out to be broken for some reason, so we mark it as bad
> 
>   4. we try again, skipping $P and finding it in some other pack
> 
> But now I wonder if code that tries to stream will skip step 3, and then
> in step 4 we'll find the same broken $P over and over.
> 
> But I suspect if that is possible, it was already true. We were only
> asking for the type and size, so any content-level corruption wouldn't
> be caught here and we'd have the same issue. I think the right thing is
> probably for the streaming code to know about the pack/oid pair it's
> trying to read, and to mark it as bad if it hits an error.
> 
> So your patch here might be making the problem a tiny bit worse, but not
> in a material way. I think we can ignore it for now.

I guess the "tiny bit worse" part is that we don't handle the case
anymore where `unpack_object_header()` returns `OBJ_BAD`. As you say, we
previously didn't fully parse the object anyway, so we couldn't have
detected all kinds of corruptions. But we definitely handled the case
where `unpack_object_header()` failed.

So maybe we should do something like the below patch?

Patrick

diff --git a/packfile.c b/packfile.c
index 9d795a671f..3e61176128 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2554,6 +2554,7 @@ static int close_istream_pack_non_delta(struct odb_read_stream *_st)
 }
 
 int packfile_read_object_stream(struct odb_read_stream **out,
+				const struct object_id *oid,
 				struct packed_git *pack,
 				off_t offset)
 {
@@ -2571,6 +2572,9 @@ int packfile_read_object_stream(struct odb_read_stream **out,
 	switch (in_pack_type) {
 	default:
 		return -1; /* we do not do deltas for now */
+	case OBJ_BAD:
+		mark_bad_packed_object(pack, oid);
+		return -1;
 	case OBJ_COMMIT:
 	case OBJ_TREE:
 	case OBJ_BLOB:
@@ -2601,5 +2605,5 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
 	if (!find_pack_entry(store, oid, &e))
 		return -1;
 
-	return packfile_read_object_stream(out, e.p, e.offset);
+	return packfile_read_object_stream(out, oid, e.p, e.offset);
 }
diff --git a/packfile.h b/packfile.h
index 67d5750140..b9f5f1c18c 100644
--- a/packfile.h
+++ b/packfile.h
@@ -437,6 +437,7 @@ off_t get_delta_base(struct packed_git *p, struct pack_window **w_curs,
 		     off_t delta_obj_offset);
 
 int packfile_read_object_stream(struct odb_read_stream **out,
+				const struct object_id *oid,
 				struct packed_git *pack,
 				off_t offset);
 

  reply	other threads:[~2026-02-23 12:21 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23  9:50 [PATCH 0/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 11:13   ` Jeff King
2026-02-23 12:20     ` Patrick Steinhardt
2026-02-23 14:01   ` Eric Sunshine
2026-02-23  9:50 ` [PATCH 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 10:49   ` Jeff King
2026-02-23 12:21     ` Patrick Steinhardt
2026-02-23 12:59       ` Jeff King
2026-02-23  9:50 ` [PATCH 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 11:07   ` Jeff King
2026-02-23 12:21     ` Patrick Steinhardt [this message]
2026-02-23 13:12       ` Jeff King
2026-02-23 15:59         ` Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 4/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23 11:11   ` Jeff King
2026-02-23 11:30     ` Patrick Steinhardt
2026-02-23 12:58       ` Jeff King
2026-02-23 15:48         ` Patrick Steinhardt
2026-02-23 20:35   ` Junio C Hamano
2026-02-24  6:26     ` Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 0/4] " Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 4/4] pack-check: fix verification of large objects Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZxGMrGkVNeAdC1N@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox