All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 3/4] packfile: expose function to read object stream for an offset
Date: Mon, 23 Feb 2026 13:21:06 +0100	[thread overview]
Message-ID: <aZxGMrGkVNeAdC1N@pks.im> (raw)
In-Reply-To: <20260223110722.GB215364@coredump.intra.peff.net>

On Mon, Feb 23, 2026 at 06:07:22AM -0500, Jeff King wrote:
> On Mon, Feb 23, 2026 at 10:50:42AM +0100, Patrick Steinhardt wrote:
> > +int packfile_store_read_object_stream(struct odb_read_stream **out,
> > +				      struct packfile_store *store,
> > +				      const struct object_id *oid)
> > +{
> > +	struct pack_entry e;
> > +
> > +	if (!find_pack_entry(store, oid, &e))
> > +		return -1;
> > +
> > +	return packfile_read_object_stream(out, e.p, e.offset);
> > +}
> 
> OK. The original read via packfile_store_read_object_info(), which does
> a bit more work. It called packed_object_info() and if necessary would
> trigger mark_bad_packed_object(). But now that we are leaving it to
> packfile_read_object_stream() to look at the header, we don't need to
> load any object info, and we have no error code to check.
> 
> It does make me wonder, though, if we are missing out on marking bad
> objects here. The idea is that we'd usually do something like:
> 
>   1. some code wants to access $OID
> 
>   2. we find $OID in pack $P
> 
>   3. that turns out to be broken for some reason, so we mark it as bad
> 
>   4. we try again, skipping $P and finding it in some other pack
> 
> But now I wonder if code that tries to stream will skip step 3, and then
> in step 4 we'll find the same broken $P over and over.
> 
> But I suspect if that is possible, it was already true. We were only
> asking for the type and size, so any content-level corruption wouldn't
> be caught here and we'd have the same issue. I think the right thing is
> probably for the streaming code to know about the pack/oid pair it's
> trying to read, and to mark it as bad if it hits an error.
> 
> So your patch here might be making the problem a tiny bit worse, but not
> in a material way. I think we can ignore it for now.

I guess the "tiny bit worse" part is that we don't handle the case
anymore where `unpack_object_header()` returns `OBJ_BAD`. As you say, we
previously didn't fully parse the object anyway, so we couldn't have
detected all kinds of corruptions. But we definitely handled the case
where `unpack_object_header()` failed.

So maybe we should do something like the below patch?

Patrick

diff --git a/packfile.c b/packfile.c
index 9d795a671f..3e61176128 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2554,6 +2554,7 @@ static int close_istream_pack_non_delta(struct odb_read_stream *_st)
 }
 
 int packfile_read_object_stream(struct odb_read_stream **out,
+				const struct object_id *oid,
 				struct packed_git *pack,
 				off_t offset)
 {
@@ -2571,6 +2572,9 @@ int packfile_read_object_stream(struct odb_read_stream **out,
 	switch (in_pack_type) {
 	default:
 		return -1; /* we do not do deltas for now */
+	case OBJ_BAD:
+		mark_bad_packed_object(pack, oid);
+		return -1;
 	case OBJ_COMMIT:
 	case OBJ_TREE:
 	case OBJ_BLOB:
@@ -2601,5 +2605,5 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
 	if (!find_pack_entry(store, oid, &e))
 		return -1;
 
-	return packfile_read_object_stream(out, e.p, e.offset);
+	return packfile_read_object_stream(out, oid, e.p, e.offset);
 }
diff --git a/packfile.h b/packfile.h
index 67d5750140..b9f5f1c18c 100644
--- a/packfile.h
+++ b/packfile.h
@@ -437,6 +437,7 @@ off_t get_delta_base(struct packed_git *p, struct pack_window **w_curs,
 		     off_t delta_obj_offset);
 
 int packfile_read_object_stream(struct odb_read_stream **out,
+				const struct object_id *oid,
 				struct packed_git *pack,
 				off_t offset);
 

  reply	other threads:[~2026-02-23 12:21 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23  9:50 [PATCH 0/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 11:13   ` Jeff King
2026-02-23 12:20     ` Patrick Steinhardt
2026-02-23 14:01   ` Eric Sunshine
2026-02-23  9:50 ` [PATCH 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 10:49   ` Jeff King
2026-02-23 12:21     ` Patrick Steinhardt
2026-02-23 12:59       ` Jeff King
2026-02-23  9:50 ` [PATCH 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 11:07   ` Jeff King
2026-02-23 12:21     ` Patrick Steinhardt [this message]
2026-02-23 13:12       ` Jeff King
2026-02-23 15:59         ` Patrick Steinhardt
2026-02-23  9:50 ` [PATCH 4/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23 11:11   ` Jeff King
2026-02-23 11:30     ` Patrick Steinhardt
2026-02-23 12:58       ` Jeff King
2026-02-23 15:48         ` Patrick Steinhardt
2026-02-23 20:35   ` Junio C Hamano
2026-02-24  6:26     ` Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 0/4] " Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 16:00   ` [PATCH v2 4/4] pack-check: fix verification of large objects Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZxGMrGkVNeAdC1N@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.