From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
Justin Tobler <jltobler@gmail.com>
Subject: Re: [PATCH v2 12/19] streaming: rely on object sources to create object stream
Date: Fri, 21 Nov 2025 11:32:38 -0800
Message-ID: <xmqqldjz41wp.fsf@gitster.g>
In-Reply-To: <20251121-b4-pks-odb-read-stream-v2-12-ca8534963150@pks.im> (Patrick Steinhardt's message of "Fri, 21 Nov 2025 08:40:57 +0100")
Patrick Steinhardt <ps@pks.im> writes:
> When creating an object stream we first look up the object info and, if
> it's present, we call into the respective backend that contains the
> object to create a new stream for it.
>
> This has the consequence that, for loose object source, we basically
> iterate through the object sources twice: we first discover that the
> file exists as a loose object in the first place by iterating through
> all sources. And, once we have discovered it, we again walk through all
> sources to try and map the object. The same issue will eventually also
> surface once the packfile store becomes per-object-source.
>
> Furthermore, it feels rather pointless to first look up the object only
> to then try and read it.
>
> Refactor the logic to be centered around sources instead. Instead of
> first reading the object, we immediately ask the source to create the
> object stream for us. If the object exists we get a stream, otherwise
> we'll try the next source.
>
> Like this we only have to iterate through sources once. But even more
> importantly, this change also helps us to make the whole logic
> pluggable. The object read stream subsystem does not need to be aware of
> the different source backends anymore, but eventually it'll only have to
> call the source's callback function.
Very nicely done.
> Note that at the current point in time we aren't fully there yet:
>
> - The packfile store still sits on the object database level and is
> thus agnostic of the sources.
>
> - We still have to call into both the packfile store and the loose
> object source.
>
> But both of these issues will soon be addressed.
;-)
> @@ -463,30 +461,15 @@ static int istream_source(struct odb_read_stream **out,
> struct repository *r,
> const struct object_id *oid)
> {
> - unsigned long size;
> - int status;
> - struct object_info oi = OBJECT_INFO_INIT;
> -
> - oi.sizep = &size;
> - status = odb_read_object_info_extended(r->objects, oid, &oi, 0);
> - if (status < 0)
> - return status;
> + struct odb_source *source;
>
> - switch (oi.whence) {
> - case OI_LOOSE:
> - if (open_istream_loose(out, r, oid) < 0)
> - break;
> - return 0;
> - case OI_PACKED:
> - if (oi.u.packed.is_delta ||
> - repo_settings_get_big_file_threshold(the_repository) >= size ||
> - open_istream_pack_non_delta(out, r, oid, oi.u.packed.pack,
> - oi.u.packed.offset) < 0)
> - break;
> + if (!open_istream_pack_non_delta(out, r->objects, oid))
> return 0;
> - default:
> - break;
> - }
> +
> + odb_prepare_alternates(r->objects);
> + for (source = r->objects->sources; source; source = source->next)
> + if (!open_istream_loose(out, source, oid))
> + return 0;
Hmph.
Earlier we let odb_read_object_info_extended() decide which one of
the duplicated objects (e.g., perhaps a loose object is still there
after packing), and then used the one it picked. I think
odb_read_object_info_extended() encodes a particular order with
solid reasons like "do in-core cached one first", "favor objects in
pack over loose ones".
Now we instead let the first source in the linked list that has the
object win, which may pick a different copy, unless the linked list
is created with the same "why one source needs to be given precedence
over the others" reasoning.
I do not know if/how it matters, but this somewhat changes the
semantics, no?
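To make the concern concrete, here is a toy model of two lookup orders. It is only a sketch and not Git's code: every type and function name in it (toy_source, toy_hit, toy_lookup_packs_first, toy_lookup_first_source) is hypothetical, each source holds at most one packed and one loose object, and neither order is claimed to match what odb_read_object_info_extended() or the patched istream_source() does exactly. It only shows that the chosen copy can differ when an object is duplicated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are not Git's real types. */
enum toy_backend { TOY_NONE, TOY_PACKED, TOY_LOOSE };

struct toy_source {
	const char *name;
	const char *packed_oid; /* ID of the one packed object here, or NULL */
	const char *loose_oid;  /* ID of the one loose object here, or NULL */
	struct toy_source *next;
};

struct toy_hit {
	enum toy_backend backend;
	const struct toy_source *source;
};

static int toy_has(const char *stored, const char *oid)
{
	return stored && !strcmp(stored, oid);
}

/* Order A: a packed copy in any source beats every loose copy. */
static struct toy_hit toy_lookup_packs_first(const struct toy_source *sources,
					     const char *oid)
{
	struct toy_hit hit = { TOY_NONE, NULL };
	const struct toy_source *s;

	assert(oid);
	for (s = sources; s; s = s->next)
		if (toy_has(s->packed_oid, oid)) {
			hit.backend = TOY_PACKED;
			hit.source = s;
			return hit;
		}
	for (s = sources; s; s = s->next)
		if (toy_has(s->loose_oid, oid)) {
			hit.backend = TOY_LOOSE;
			hit.source = s;
			return hit;
		}
	return hit;
}

/* Order B: the first source in the list that has any copy wins. */
static struct toy_hit toy_lookup_first_source(const struct toy_source *sources,
					      const char *oid)
{
	struct toy_hit hit = { TOY_NONE, NULL };
	const struct toy_source *s;

	assert(oid);
	for (s = sources; s; s = s->next) {
		if (toy_has(s->packed_oid, oid)) {
			hit.backend = TOY_PACKED;
			hit.source = s;
			return hit;
		}
		if (toy_has(s->loose_oid, oid)) {
			hit.backend = TOY_LOOSE;
			hit.source = s;
			return hit;
		}
	}
	return hit;
}
```

With a primary source that holds only a loose copy, followed by an alternate that holds a packed copy of the same object, order A returns the alternate's packed copy while order B returns the primary's loose copy. Whether such a difference matters in practice is exactly the open question above.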