From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Usman Akinyemi <usmanakinyemi202@gmail.com>,
	git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>
Subject: Re: [PATCH v2 12/12] hash: stop depending on `the_repository` in `null_oid()`
Date: Fri, 7 Mar 2025 10:08:47 +0100
Message-ID: <Z8q3nzhl3DHETZgf@pks.im>
In-Reply-To: <xmqqr03ageej.fsf@gitster.g>

On Thu, Mar 06, 2025 at 11:14:28AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index a4431429b7d..2d2e90bc23a 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -234,7 +234,7 @@ static void show_submodule(struct repository *superproject,
> >  {
> >  	struct repository subrepo;
> >  
> > -	if (repo_submodule_init(&subrepo, superproject, path, null_oid()))
> > +	if (repo_submodule_init(&subrepo, superproject, path, null_oid(the_hash_algo)))
> >  		return;
> >  
> >  	if (repo_read_index(&subrepo) < 0)
> 
> This has an obvious semantic interaction with what is done in
> Usman's series <20250306143629.1267358-7-usmanakinyemi202@gmail.com>
> where builtin/ls-files.c claims that it got rid of its dependence on
> the_repository.
> 
> The resulting ls-files still calls null_oid() here, hence it depends
> on the_hash_algo hence indirectly on the_repository.  When these
> topics are merged together, builtin/ls-files.c again needs to be
> marked that it still needs the_repository variable in order to see
> the_hash_algo.

Ah, thanks for catching this!

> I _think_ the subrepo is not allowed to use different hash from the
> superproject, so we can pass superproject->hash_algo instead in this
> series to make it easier on the other topic?

That's actually a very good question -- I have no idea. My gut says that
it probably won't work, but on a conceptual level I think it would be
nice if it did. Reality lies somewhere between. The following commands
work alright:

    $ git init --object-format=sha256 repo
    $ cd repo
    $ git commit --allow-empty --message initial
    $ git submodule add https://github.com/git/git.git
    $ git -C git rev-parse --show-object-format HEAD
    sha1
    a36e024e989f4d35f35987a60e3af8022cac3420

We end up with a SHA1 submodule in a SHA256 repository. So far so good.
What does it look like?

    $ git diff --cached
    diff --git a/.gitmodules b/.gitmodules
    new file mode 100644
    index 0000000..3594794
    --- /dev/null
    +++ b/.gitmodules
    @@ -0,0 +1,3 @@
    +[submodule "git"]
    +	path = git
    +	url = https://github.com/git/git.git
    diff --git a/git b/git
    new file mode 160000
    index 0000000..a36e024
    --- /dev/null
    +++ b/git
    @@ -0,0 +1 @@
    +Subproject commit a36e024e989f4d35f35987a60e3af8022cac3420000000000000000000000000

Well, now it starts to become weird, as the commit hash is obviously
wrong. Can we update the submodule?

    $ git -C git switch next
    $ git diff
    diff --git a/git b/git
    index a36e024..4cd3354 160000
    --- a/git
    +++ b/git
    @@ -1 +1 @@
    -Subproject commit a36e024e989f4d35f35987a60e3af8022cac3420000000000000000000000000
    +Subproject commit 4cd33545ba4fa82324b454aa5bf2748b40a572fb

Well, that diff at least looks somewhat better than expected? Let's
commit it.

    $ git commit -m update
    $ git diff HEAD~
    diff --git a/git b/git
    index a36e024..4cd3354 160000
    --- a/git
    +++ b/git
    @@ -1 +1 @@
    -Subproject commit a36e024e989f4d35f35987a60e3af8022cac3420000000000000000000000000
    +Subproject commit 4cd33545ba4fa82324b454aa5bf2748b40a572fb000000000000000000000000

Oh, well... so, what happens if we ask git-log(1) to recurse into the
submodule?

    $ git log --submodule=diff --patch
    Segmentation fault (core dumped)

Yeah, no, that's not really working.
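
The zero-padded commit IDs in the diffs above are consistent with a
20-byte SHA1 value sitting in a buffer sized for SHA256 and being
hex-encoded with the superproject's hash length. That is only one
plausible reading and is not confirmed anywhere in this thread. A minimal
sketch of the effect, where `toy_oid`, `toy_oid_set()` and
`toy_oid_to_hex()` are hypothetical names and not Git's actual API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_RAWSZ 20
#define SHA256_RAWSZ 32
#define MAX_RAWSZ SHA256_RAWSZ
#define MAX_HEXSZ (2 * MAX_RAWSZ)

/* Hypothetical stand-in for an object ID buffer sized for the largest hash. */
struct toy_oid {
	unsigned char hash[MAX_RAWSZ];
};

/* Store `len` raw bytes and zero the unused tail of the buffer. */
static void toy_oid_set(struct toy_oid *oid, const unsigned char *raw, size_t len)
{
	assert(len <= MAX_RAWSZ);
	memset(oid->hash, 0, sizeof(oid->hash));
	memcpy(oid->hash, raw, len);
}

/* Hex-encode the first `rawsz` bytes; `out` must hold 2 * rawsz + 1 bytes. */
static void toy_oid_to_hex(const struct toy_oid *oid, size_t rawsz, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(rawsz <= MAX_RAWSZ);
	for (i = 0; i < rawsz; i++) {
		out[2 * i] = digits[oid->hash[i] >> 4];
		out[2 * i + 1] = digits[oid->hash[i] & 0xf];
	}
	out[2 * rawsz] = '\0';
}
```

Encoding a value stored with `SHA1_RAWSZ` bytes but printed with
`SHA256_RAWSZ` yields 40 meaningful hex digits followed by 24 zeros, which
matches the shape of the "Subproject commit" lines shown earlier.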

So in theory, I don't really think there is anything that keeps us from
mixing object formats, and I even think that this is a necessary feature
to have once people start to migrate their repositories to SHA256. There
are people out there who use submodules, and it may not be feasible for
them to wait until all of their submodules have been converted to SHA256
to migrate their own repositories.

From my perspective, we should thus treat it as a bug that the above
does not work.

> What do you think?

Anyway, that was a bit of a tangent to your initial question: which hash
should we be passing in the first place? As it turns out, we should be
passing the superproject's null OID indeed. The object ID we pass to
`repo_submodule_init()` does not describe any object in the submodule
itself, but rather indicates which tree-ish the submodule information
should be parsed from in the superproject. As such, it refers to an
object in the superproject itself and should thus use its object hash.

I'll include such a change in my patch series to resolve the semantic
conflict.
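
To illustrate the reasoning, here is a self-contained toy model. All
`toy_*` names are hypothetical and the check in `toy_submodule_init()` is
an assumption made for the sketch; the real `repo_submodule_init()` is not
claimed to behave this way. In Git itself the corresponding call would
presumably pass `null_oid(superproject->hash_algo)`, as suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Git's structures. */
struct toy_hash_algo {
	const char *name;
	size_t rawsz;
};

struct toy_object_id {
	unsigned char hash[32];
	const struct toy_hash_algo *algo;
};

struct toy_repository {
	const struct toy_hash_algo *hash_algo;
};

static const struct toy_hash_algo toy_sha1 = { "sha1", 20 };
static const struct toy_hash_algo toy_sha256 = { "sha256", 32 };

/* One all-zero object ID per hash algorithm. */
static const struct toy_object_id toy_null_sha1 = { { 0 }, &toy_sha1 };
static const struct toy_object_id toy_null_sha256 = { { 0 }, &toy_sha256 };

/* Return the null OID that belongs to the given algorithm. */
static const struct toy_object_id *toy_null_oid(const struct toy_hash_algo *algo)
{
	assert(algo != NULL);
	return algo == &toy_sha256 ? &toy_null_sha256 : &toy_null_sha1;
}

/*
 * The tree-ish names an object in the superproject, so in this toy model
 * it is only accepted when it uses the superproject's hash algorithm.
 * Returns 0 on success and -1 on mismatch.
 */
static int toy_submodule_init(const struct toy_repository *superproject,
			      const struct toy_object_id *treeish)
{
	return treeish->algo == superproject->hash_algo ? 0 : -1;
}
```

With a SHA256 superproject, `toy_null_oid(superproject->hash_algo)` is
accepted regardless of which hash the submodule uses, whereas a null OID
derived from the submodule's SHA1 algorithm is rejected.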

> Perhaps we should have hidden null_oid() as requiring the_repository
> just like the_hash_algo is guarded like so
> 
>         #ifdef USE_THE_REPOSITORY_VARIABLE
>         # include "repository.h"
>         # define the_hash_algo the_repository->hash_algo
>         #endif
> 
> in <hash.h>.
> 
> In other words, I wish we had the following patch already applied,
> before Usman started working on the other topic.

True. `USE_THE_REPOSITORY_VARIABLE` has always been best effort, and it
was clear from the beginning that it won't catch all functions that
implicitly depend on `the_repository`. The good news is that we uncover
and convert more and more of those implicit dependencies, so as time
progresses it becomes more accurate.

> But with this topic getting solidified, it would become a moot point
> to do that in the longer term.  This series removes null_oid() that
> had the implicit dependency anyway.

Indeed.

Patrick
