From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>, Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/6] odb: track commit graphs via object source
Date: Fri, 5 Sep 2025 14:29:50 -0400
Message-ID: <cf7aeda1-297a-4805-b0ae-e379ce11bbcf@gmail.com>
In-Reply-To: <xmqq5xdx7qx4.fsf@gitster.g>

On 9/4/2025 7:12 PM, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
>> commit graphs are currently stored on the object database level. This
>> doesn't really make much sense conceptually, given that commit graphs
>> are specific to one object source. Furthermore, with the upcoming
>> pluggable object database effort, an object source's backend may not
>> even have a commit graph in the first place but store that information
>> in a different format altogether.
>>
>> This patch series prepares for that by moving the commit graph from
>> `struct object_database` into `struct odb_source`.
>
> Hmph, I am finding the above hard to agree with at the conceptual
> level. In some future, we may use multiple object stores in a
> single repository. Perhaps we would be storing older parts of
> history in semi-online storage while newer parts are stored in
> readily available storage. But the side data structure that allows
> us to quickly learn who the parents of a commit are without having
> to go to the object store in order to parse the actual commit
> object can be stored for the entire history if we wanted to, or more
> recent part of the history but not limited to the "readily available
> storage" part. IOW, where the boundary between the older and the
> newer parts of the history lies and which commits the commit graph
> covers should be pretty much independent.
>
> So moving from object_database (i.e. the whole world) to individual
> odb_source (i.e. where one particular subset of the history is
> stored) feels like totally backwards to me. Surely, a commit graph
> file may be defined over a set of packfiles and remaining loose
> object files, but it is not like an instance of the commit-graph
> file is tied to packfiles in the sense that it uses the index into
> some packfile instead of the actual object names to refer to
> commits, or anything like that (this is quite different from other
> files that are very specific to a single object store, like midx
> that is tied to the packfiles it describes).

This is an interesting aspect of things: the commit-graph file
is a "structured cache" of certain commit information. It happens to
be located within the object stores (either local or in an alternate)
but is conceptually different in a few ways.

The biggest difference is that you can only open one commit-graph
(or chain of commit-graphs). Having multiple files across different
object stores will not accumulate additional context. Instead, we
have a "first one wins" approach.

This does seem to be something that you are attempting to change
by including the ability to load a commit-graph for each odb (and
closing them in sequence as we close a repo).

So in this sense, the commit-graph lives at the repository level,
not at the object store level. When doing I/O to write or read a
graph, we use one specific object store at a time.

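As a rough illustration of the "first one wins" behavior and of the
split between per-store I/O and repository-level ownership, here is a
minimal sketch in C. All of the struct and function names (sketch_graph,
sketch_source, sketch_database, sketch_read_graph, sketch_get_graph) are
made up for this example and are not Git's actual definitions; the real
code has many more details (graph chains, config checks, and so on).

```c
#include <stddef.h>

/* Hypothetical, simplified types for illustration; not Git's real structs. */
struct sketch_graph {
	const char *loaded_from; /* path of the source the graph was read from */
};

struct sketch_source {
	const char *path;           /* location of this object store */
	int has_graph_file;         /* nonzero if a commit-graph exists here */
	struct sketch_source *next; /* local store first, then alternates */
};

struct sketch_database {
	struct sketch_source *sources;
	struct sketch_graph graph; /* the single graph, owned at this level */
	int graph_loaded;
};

/* I/O is per source: try to read the graph from one specific location. */
static int sketch_read_graph(const struct sketch_source *src,
			     struct sketch_graph *out)
{
	if (!src->has_graph_file)
		return -1;
	out->loaded_from = src->path;
	return 0;
}

/*
 * Lookup is per database: callers never name a source. The first source
 * that yields a graph wins; later sources are not consulted or merged.
 */
static struct sketch_graph *sketch_get_graph(struct sketch_database *db)
{
	struct sketch_source *src;

	if (db->graph_loaded)
		return &db->graph;
	for (src = db->sources; src; src = src->next) {
		if (!sketch_read_graph(src, &db->graph)) {
			db->graph_loaded = 1;
			return &db->graph;
		}
	}
	return NULL;
}
```

In this sketch, a graph file in an alternate is only ever used when the
local store has none, and a second graph never adds information to the
first.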
The other direction to consider is what context we have when we
interact with a commit-graph. We are generally parsing commits from
a repository or loading Bloom filter data during file history walks.
In neither case can we predict which object store will "own" the
commit we are inspecting, so it wouldn't make sense to restrict
callers to something like odb_parse_commit() over repo_parse_commit().

With this in mind, I have these big-picture thoughts:

1. Patches 1-5 are great. Nice cleanups.

2. Some of Patch 6 is great, including having the I/O methods use
   an odb_source to help focus the specific location of the files
   being read or written. However, moving the struct into the
   odb_source makes less sense; it should remain at the
   object_database level.

Thanks,
-Stolee