Git development
 help / color / mirror / Atom feed
* [PATCH v3 0/3] git-credential-netrc: better symbolic port names support
From: Maxim Cournoyer @ 2025-06-24  1:48 UTC (permalink / raw)
  To: git; +Cc: Maxim Cournoyer, Junio C Hamano, Andreas Schwab, brian m. carlson
In-Reply-To: <20250620041239.27839-1-maxim@guixotic.coop>

Most suggestions from Junio have been applied in this revision.

Changes in v3:
 - rename is_port to port_num
 - directly return scalar value from getservbyname in port_num

Thanks,

Maxim Cournoyer (3):
  contrib: use a more portable shebang for git-credential-netrc
  contrib: warn for invalid netrc file ports in git-credential-netrc
  contrib: better support symbolic port names in git-credential-netrc

 contrib/credential/netrc/git-credential-netrc.perl | 14 +++++++++++---
 contrib/credential/netrc/test.pl                   |  8 ++++----
 git-send-email.perl                                | 11 +++++++++++
 perl/Git.pm                                        | 13 +++++++++++++
 t/t9001-send-email.sh                              |  7 +++++++
 5 files changed, 46 insertions(+), 7 deletions(-)

-- 
2.49.0


^ permalink raw reply

* [PATCH v3 1/3] contrib: use a more portable shebang for git-credential-netrc
From: Maxim Cournoyer @ 2025-06-24  1:48 UTC (permalink / raw)
  To: git; +Cc: Maxim Cournoyer
In-Reply-To: <20250620041239.27839-1-maxim@guixotic.coop>

While the installed scripts have their Perl shebang set to PERL_PATH,
it is nevertheless useful to be able to run the uninstalled script for
manual tests while developing. This change makes the shebang more
portable by having the perl command looked from PATH instead of from a
fixed location.

Signed-off-by: Maxim Cournoyer <maxim@guixotic.coop>
---
 contrib/credential/netrc/git-credential-netrc.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/credential/netrc/git-credential-netrc.perl b/contrib/credential/netrc/git-credential-netrc.perl
index 9fb998ae09..514f68d00b 100755
--- a/contrib/credential/netrc/git-credential-netrc.perl
+++ b/contrib/credential/netrc/git-credential-netrc.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl
 
 use strict;
 use warnings;
-- 
2.49.0


^ permalink raw reply related

* What's cooking in git.git (Jun 2025, #08; Mon, 23)
From: Junio C Hamano @ 2025-06-24  0:57 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and are candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen', and
aren't considered "accepted" at all and may be annotated with an URL
to a message that raises issues but they are no means exhaustive.  A
topic without enough support may be discarded after a long period of
no activity (of course they can be resubmit when new interests
arise).

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-scm/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[New Topics]

* jk/test-seq-format (2025-06-23) 2 commits
 - test-lib: teach test_seq the -f option
 - t7422: replace confusing printf with echo

 A test helper "test_seq" function learned the "-f <fmt>" option,
 which allowed us to simplify a lot of test scripts.

 Will merge to 'next'.
 source: <20250623105516.GA654296@coredump.intra.peff.net>


* mc/netrc-service-names (2025-06-22) 3 commits
 - contrib: better support symbolic port names in git-credential-netrc
 - contrib: warn for invalid netrc file ports in git-credential-netrc
 - contrib: use a more portable shebang for git-credential-netrc

 "netrc" credential helper has been improved to understand textual
 service names (like smtp) in addition to the numeric port numbers
 (like 25).

 Comments.
 source: <20250620041239.27839-1-maxim@guixotic.coop>


* ph/fetch-prune-optim (2025-06-23) 2 commits
 - refs: remove old refs_warn_dangling_symref
 - fetch-prune: optimize dangling-ref reporting

 source: <20250623234327.335490-1-phil.hord@gmail.com>

--------------------------------------------------
[Cooking]

* bc/use-sha256-by-default-in-3.0 (2025-06-19) 10 commits
 - Enable SHA-256 by default in breaking changes mode
 - t5300: choose the built-in hash outside of a repo
 - t4042: choose the built-in hash outside of a repo
 - t1007: choose the built-in hash outside of a repo
 - t: default to compile-time default hash if not set
 - setup: use the default algorithm to initialize repo format
 - Use original hash for legacy formats
 - builtin: use default hash when outside a repository
 - hash: add a constant for the original hash algorithm
 - hash: add a constant for the default hash algorithm

 Prepare to flip the default hash function to SHA-256.

 Needs review.
 source: <20250620011943.586596-1-sandals@crustytoothpaste.net>


* jc/cocci-avoid-regexp-constraint (2025-06-18) 1 commit
  (merged to 'next' on 2025-06-23 at 9ca93f0bac)
 + cocci: matching (multiple) identifiers

 Avoid regexp_constraint and instead use comparison_constraint when
 listing functions to exclude from application of coccinelle rules,
 as spatch can be built with different regexp engine X-<.

 Will merge to 'master'.
 source: <xmqqbjqlexzd.fsf@gitster.g>


* jc/cocci-dtype (2025-06-18) 1 commit
 - cocci: do not directly access the .d_type member in struct dirent

 Catch direct access to .d_type member of struct dirent, as some
 non-POSIX compliant systems we support lack it, and rewrite to use
 DTYPE() macro, which is not quite the right thing to do.

 Will discard.
 source: <xmqq4iwcgbzb.fsf@gitster.g>


* jc/coccicheck-fails-make-when-it-fails (2025-06-23) 1 commit
 - coccicheck: fail "make" when it fails

 "make coccicheck" succeeds even when spatch made suggestions, which
 has been updated to fail in such a case.

 Needs review.
 source: <xmqqbjqe77vw.fsf@gitster.g>


* jt/imap-send-message-fix (2025-06-20) 3 commits
 - imap-send: improve error messages with configuration hints
 - imap-send: fix confusing 'store' terminology in error message
 - Merge branch 'ag/imap-send-resurrection' into jt/imap-send-message-fix
 (this branch uses ag/imap-send-resurrection.)

 Update some error messages from "git imap-send".

 Will merge to 'next'.
 source: <20250620155614.901816-1-joerg@thalheim.io>


* lo/repo-info (2025-06-19) 7 commits
 . repo-info: add field layout.shallow
 . repo-info: add field layout.bare
 . repo-info: add the field references.format
 . repo-info: add the --allow-empty flag
 . repo-info: add plaintext as an output format
 . repo-info: add the --format flag
 . repo-info: declare the repo-info command

 A new subcommand "git repo-info" gives users a way to grab various
 repository characteristics.

 Needs review.
 source: <20250619225751.99699-1-lucasseikioshiro@gmail.com>


* cc/fast-import-export-signature-names (2025-06-19) 1 commit
 - fast-(import|export): improve on commit signature output format

 Clean up the way how signature on commit objects are exported to
 and imported from fast-import stream.

 Needs review.
 source: <20250619133630.727274-1-christian.couder@gmail.com>


* ac/deglobal-sparse-variables (2025-06-17) 3 commits
 - environment: remove the global variable 'sparse_expect_files_outside_of_patterns'
 - environment: move access to "core.sparsecheckoutcone" into repo_settings
 - environment: move access to "core.sparsecheckout" into repo_settings

 Two global variables related to sparse checkout have been moved to
 the repository settings structure.

 Waiting for review responses.
 source: <cover.1750157825.git.ayu.chandekar@gmail.com>


* jk/submodule-remote-lookup-cleanup (2025-06-23) 7 commits
 - submodule: look up remotes by URL first
 - submodule: move get_default_remote_submodule()
 - submodule--helper: improve logic for fallback remote name
 - remote: remove the_repository from some functions
 - dir: move starts_with_dot(_dot)_slash to dir.h
 - remote: fix tear down of struct remote
 - remote: remove branch->merge_name and fix branch_release()

 Updating submodules from the upstream did not work well when
 submodule's HEAD is detached, which has been improved.

 Will merge to 'next'.
 source: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>


* ly/run-builtin-use-passed-in-repo (2025-06-15) 1 commit
  (merged to 'next' on 2025-06-18 at ae732fef47)
 + git.c: remove the_repository dependence in run_builtin()

 Code clean-up.

 Will merge to 'master'.
 source: <20250616062233.1589172-1-502024330056@smail.nju.edu.cn>


* rm/t2400-modernize (2025-06-16) 1 commit
  (merged to 'next' on 2025-06-18 at 58c4f9560c)
 + t2400: replace 'test -[efd]' with 'test_path_is_*'

 Test clean-up.

 Will merge to 'master'.
 source: <20250617002939.24478-1-rodmichelassi@gmail.com>


* jc/diff-no-index-with-pathspec-fix (2025-06-18) 1 commit
  (merged to 'next' on 2025-06-20 at 936407d5ef)
 + diff-no-index: do not reference .d_type member of struct dirent

 Recent code added a direct access to the d_type member in "struct
 dirent", but some platforms lack it, which has been corrected.

 Will merge to 'master'.
 source: <xmqqh60ces03.fsf@gitster.g>


* jg/mailinfo-leakfix (2025-06-13) 1 commit
  (merged to 'next' on 2025-06-20 at 3ef4f7f41c)
 + mailinfo.c: fix memory leak in function handle_content_type()

 Leakfix.

 Will merge to 'master'.
 source: <SA1PR22MB399911638F342E1AA20F014AE477A@SA1PR22MB3999.namprd22.prod.outlook.com>


* ow/rebase-verify-insn-fmt-before-initializing-state (2025-06-09) 1 commit
 - rebase: write script before initializing state

 "git rebase -i" with bogus rebase.instructionFormat configuration
 failed to produce the todo file after recording the state files,
 leading to confused "git status"; this has been corrected.

 Comments?
 cf. <20250609221055.136074-1-oystwa@gmail.com>
 source: <20250609221055.136074-1-oystwa@gmail.com>


* ac/preload-index-wo-the-repository (2025-06-10) 2 commits
  (merged to 'next' on 2025-06-18 at fe9378f663)
 + preload-index: stop depending on 'the_repository'
 + environment: remove the global variable 'core_preload_index'

 Code clean-up.

 Will merge to 'master'.
 source: <cover.1749557133.git.ayu.chandekar@gmail.com>


* jc/cg-let-bss-do-its-job (2025-06-11) 1 commit
  (merged to 'next' on 2025-06-18 at 5166eea328)
 + CodingGuidelines: let BSS do its job

 Clarify "do not explicitly initialize to zero" rule in the
 CodingGuidelines document.

 Will merge to 'master'.
 source: <xmqqh60mger9.fsf@gitster.g>


* jc/merge-compact-summary (2025-06-12) 2 commits
 - merge/pull: extend merge.stat configuration variable to cover --compact-summary
 - merge/pull: add the "--compact-summary" option

 "git merge/pull" has been taught the "--compact-summary" option to
 use the compact-summary format, intead of diffstat, when showing
 the summary of the incoming changes.

 Will merge to 'next'.
 source: <20250612222537.2426059-1-gitster@pobox.com>


* ly/prepare-show-merge-leakfix (2025-06-09) 1 commit
  (merged to 'next' on 2025-06-18 at 11ab005b9c)
 + revision: fix memory leak in prepare_show_merge()

 Leakfix.

 Will merge to 'master'.
 source: <pull.1989.v3.git.git.1749515879546.gitgitgadget@gmail.com>


* sa/multi-mailmap-fix (2025-06-13) 1 commit
  (merged to 'next' on 2025-06-18 at c492804406)
 + cat-file: fix mailmap application for different author and committer

 When asking to apply mailmap to both author and committer field
 while showing a commit object, the field that appears later was not
 correctly parsed and replaced, which has been corrected.

 Will merge to 'master'.
 source: <20250613115750.41205-1-siddharthasthana31@gmail.com>


* bs/config-mak-freebsd (2025-06-12) 1 commit
 - config.mak.uname: update settings for FreeBSD

 Drop FreeBSD 4 support and assume we are at least at FreeBSD 6 with
 memmem() supported.

 Expecting a finalized version from Carlo?
 source: <xmqqv7p0bpdl.fsf_-_@gitster.g>


* jc/tag-idempotent-no-op (2025-06-10) 1 commit
 - tag: allow idempotent "git tag" without "--force"

 "git tag T O" when the tag T is already pointing at the object O is
 a no-op; we used to but no longer error out such a request and
 require "--force" and instead turn it into a no-op.

 Comments?
 source: <xmqqzfefodje.fsf@gitster.g>


* ss/compat-bswap-revamp (2025-06-11) 6 commits
 - bswap.h: provide a built-in based version of bswap32/64 if possible
 - bswap.h: remove optimized x86 version of bswap32/64
 - bswap.h: always overwrite ntohl/ntohll macros
 - bswap.h: define GIT_LITTLE_ENDIAN on MSVC as little endian
 - bswap.h: add support for __BYTE_ORDER__
 - Merge branch 'ss/revert-builtin-bswap-stuff' into ss/compat-bswap-revamp

 Clean-up compat/bswap.h mess.

 Comments?
 source: <20250611221444.1567638-1-sebastian@breakpoint.cc>


* ja/doc-git-log-markup (2025-06-08) 9 commits
 - doc: git-log: convert log config to new doc format
 - doc: git-log: convert diff options to new doc format
 - doc: git-log: convert pretty formats to new doc format
 - doc: git-log: convert pretty options to new doc format
 - doc: git-log: convert rev list options to new doc format
 - doc: git-log: convert line range format to new doc format
 - doc: git-log: convert line range options to new doc format
 - doc: git-log convert rev-list-description to new doc format
 - doc: convert git-log to new documentation format

 Doc mark-up updates.

 Review?
 source: <pull.1933.git.1749373787.gitgitgadget@gmail.com>


* kj/stash-onbranch-submodule-fix (2025-06-10) 1 commit
  (merged to 'next' on 2025-06-18 at 1ac7a03f69)
 + stash: fix incorrect branch name in stash message

 "git stash" recorded a wrong branch name when submodules are
 present in the current checkout, which has been corrected.

 Will merge to 'master'.
 source: <20250611014204.24994-1-jayatheerthkulkarni2005@gmail.com>


* ag/imap-send-resurrection (2025-06-19) 10 commits
  (merged to 'next' on 2025-06-23 at 7c2003159b)
 + imap-send: fix minor mistakes in the logs
 + imap-send: display the destination mailbox when sending a message
 + imap-send: display port alongwith host when git credential is invoked
 + imap-send: add ability to list the available folders
 + imap-send: enable specifying the folder using the command line
 + imap-send: add PLAIN authentication method to OpenSSL
 + imap-send: add support for OAuth2.0 authentication
 + imap-send: gracefully fail if CRAM-MD5 authentication is requested without OpenSSL
 + imap-send: fix memory leak in case auth_cram_md5 fails
 + imap-send: fix bug causing cfg->folder being set to NULL
 (this branch is used by jt/imap-send-message-fix.)

 "git imap-send" has been broken for a long time, which has been
 resurrected and then taught to talk OAuth2.0 etc.

 Will merge to 'master'.
 source: <PN3PR01MB9597F9CAD0DA83152E651194B87CA@PN3PR01MB9597.INDPRD01.PROD.OUTLOOK.COM>


* pw/subtree-gpg-sign (2025-06-04) 2 commits
  (merged to 'next' on 2025-06-18 at d3c6435b4f)
 + contrib/subtree: add -S/--gpg-sign
 + contrib/subtree: parse using --stuck-long

 "git subtree" (in contrib/) learns to grok GPG signing its commits.

 Will merge to 'master'.
 source: <pull.1928.v2.git.1749046597.gitgitgadget@gmail.com>


* sk/reftable-clarify-tests (2025-06-05) 10 commits
 - t/unit-tests: finalize migration of reftable-related tests
 - t/unit-tests: convert reftable stack test to use clar
 - t/unit-tests: convert reftable record test to use clar
 - t/unit-tests: convert reftable readwrite test to use clar
 - t/unit-tests: convert reftable table test to use clar
 - t/unit-tests: convert reftable pq test to use clar
 - t/unit-tests: convert reftable merged test to use clar
 - t/unit-tests: convert reftable block test to use clar
 - t/unit-tests: convert reftable basics test to use clar test framework
 - t/unit-tests: implement clar specific reftable test helper functions

 The reftable unit tests are now ported to the "clar" unit testing
 framework.

 Expecting a reroll.
 cf. <xmqqikla86id.fsf@gitster.g>
 source: <20250605140644.239199-1-kuforiji98@gmail.com>


* ag/send-email-edit-threading-fix (2025-06-04) 2 commits
  (merged to 'next' on 2025-06-18 at dfe9cb890c)
 + send-email: show the new message id assigned by outlook in the logs
 + send-email: fix bug resulting in broken threads if a message is edited

 "git send-email" incremented its internal message counter when a
 message was edited, which made logic that treats the first message
 specially misbehave, which has been corrected.

 Will merge to 'master'.
 source: <PN3PR01MB95979AA8114CA26405BE02CFB86CA@PN3PR01MB9597.INDPRD01.PROD.OUTLOOK.COM>


* ly/load-bitmap-leakfix (2025-06-02) 3 commits
 - pack-bitmap: add load corrupt bitmap test
 - pack-bitmap: reword comments in test_bitmap_commits()
 - pack-bitmap: fix memory leak if load_bitmap() failed

 Leakfix with a new and a bit invasive test.

 Comments?
 source: <pull.1962.v5.git.git.1748920444.gitgitgadget@gmail.com>


* ps/maintenance-ref-lock (2025-06-03) 12 commits
  (merged to 'next' on 2025-06-18 at 724574661a)
 + builtin/maintenance: fix locking race when handling "gc" task
 + builtin/gc: avoid global state in `gc_before_repack()`
 + usage: allow dying without writing an error message
 + builtin/maintenance: fix locking race with refs and reflogs tasks
 + builtin/maintenance: split into foreground and background tasks
 + builtin/maintenance: fix typedef for function pointers
 + builtin/maintenance: extract function to run tasks
 + builtin/maintenance: stop modifying global array of tasks
 + builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
 + builtin/maintenance: centralize configuration of explicit tasks
 + builtin/gc: drop redundant local variable
 + builtin/gc: use designated field initializers for maintenance tasks

 "git maintenance" lacked the care "git gc" had to avoid holding
 onto the repository lock for too long during packing refs, which
 has been remedied.

 Will merge to 'master'.
 source: <20250603-b4-pks-maintenance-ref-lock-race-v4-0-52f5cf7b7e99@pks.im>


* tb/prepare-midx-pack-cleanup (2025-05-28) 5 commits
 - midx: return a `packed_git` pointer from `prepare_midx_pack()`
 - midx-write.c: extract inner loop from fill_packs_from_midx()
 - midx-write.c: guard against incremental MIDXs in want_included_pack()
 - midx: access pack names through `nth_midxed_pack_name()`
 - Merge branch 'ps/midx-negative-packfile-cache' into tb/prepare-midx-pack-cleanup

 Improvement on Multi-pack-index API.

 Expecting a reroll?
 cf. <20250530065034.GC1321283@coredump.intra.peff.net>
 source: <cover.1748473122.git.me@ttaylorr.com>


* pw/stash-p-pathspec-fixes (2025-06-07) 2 commits
  (merged to 'next' on 2025-06-18 at 10cfc8865f)
 + stash: allow "git stash [<options>] --patch <pathspec>" to assume push
 + stash: allow "git stash -p <pathspec>" to assume push again

 "git stash -p <pathspec>" improvements.

 Will merge to 'master'.
 source: <cover.1749289514.git.phillip.wood@dunelm.org.uk>


* kn/fetch-push-bulk-ref-update (2025-05-19) 4 commits
 - receive-pack: use batched reference updates
 - send-pack: fix memory leak around duplicate refs
 - fetch: use batched reference updates
 - refs: add function to translate errors to strings
 (this branch is used by kn/fetch-push-bulk-ref-update-fixup.)

 "git push" and "git fetch" are taught to update refs in batches to
 gain performance.

 Tentatively kicked out of 'next' to give its fix-up topic a chance to reboot.
 source: <20250519-501-update-git-fetch-1-to-use-partial-transactions-v3-0-6cdfd4f769b9@gmail.com>


* kn/fetch-push-bulk-ref-update-fixup (2025-06-20) 3 commits
 - receive-pack: handle reference deletions separately
 - refs/files: skip updates with errors in batched updates
 - Merge branch 'kn/fetch-push-bulk-ref-update' into kn/fetch-push-bulk-ref-update-fixup
 (this branch uses kn/fetch-push-bulk-ref-update.)

 Additional fixes to the base topic.

 Comments?
 source: <20250620-6769-address-test-failures-in-the-next-branch-caused-by-batched-reference-updates-v5-0-f35ee6b59a82@gmail.com>


* kj/renamed-submodule (2025-06-07) 2 commits
 - submodule: skip redundant active entries when pattern covers path
 - submodule: prevent overwriting .gitmodules entry on path reuse

 The case where a new submodule takes a path where used to be a
 completely different subproject is now dealt a bit better than
 before.

 Comments?
 source: <20250608032705.11990-1-jayatheerthkulkarni2005@gmail.com>


* bc/stash-export-import (2025-06-11) 4 commits
 - builtin/stash: provide a way to import stashes from a ref
 - builtin/stash: provide a way to export stashes to a ref
 - builtin/stash: factor out revision parsing into a function
 - object-name: make get_oid quietly return an error

 An interchange format for stash entries is defined, and subcommand
 of "git stash" to import/export has been added.

 Will merge to 'next'.
 source: <20250612011221.4158484-1-sandals@crustytoothpaste.net>


* lm/add-p-context (2025-05-10) 4 commits
 - add-patch: add diff.context command line overrides
 - add-patch: respect diff.context configuration
 - test: refactor to use "test_config"
 - test: refactor to use "test_grep"

 "git add/etc -p" now honors diff.context configuration variable,
 and learns to honor -U<n> option.

 Expecting a reroll.
 cf. <CAP9jKjEYgEBBGrPnJ8fkaWuS8RPzBeBqFLE7aTJd5x9PcWu=7Q@mail.gmail.com>
 source: <pull.1915.v2.git.1746884789.gitgitgadget@gmail.com>


* ps/contrib-sweep (2025-05-12) 11 commits
 - contrib: remove some scripts in "stats" directory
 - contrib: remove "git-new-workdir"
 - contrib: remove "emacs" directory
 - contrib: remove "git-resurrect.sh"
 - contrib: remove "persistent-https" remote helper
 - contrib: remove "mw-to-git"
 - contrib: remove "hooks" directory
 - contrib: remove "thunderbird-patch-inline"
 - contrib: remove remote-helper stubs
 - contrib: remove "examples" directory
 - contrib: remove "remotes2config.sh"

 Remove bunch of stuff from contrib/ hierarchy.

 Will merge to 'next'.
 source: <20250512-pks-contrib-spring-cleanup-v3-0-32e151b0bfb0@pks.im>


* ps/object-store (2025-06-04) 17 commits
 - odb: rename `read_object_with_reference()`
 - odb: rename `pretend_object_file()`
 - odb: rename `has_object()`
 - odb: rename `repo_read_object_file()`
 - odb: rename `oid_object_info()`
 - odb: trivial refactorings to get rid of `the_repository`
 - odb: get rid of `the_repository` when handling submodule sources
 - odb: get rid of `the_repository` when handling the primary source
 - odb: get rid of `the_repository` in `for_each()` functions
 - odb: get rid of `the_repository` when handling alternates
 - odb: get rid of `the_repository` in `odb_mkstemp()`
 - odb: get rid of `the_repository` in `assert_oid_type()`
 - odb: get rid of `the_repository` in `find_odb()`
 - odb: introduce parent pointers
 - object-store: rename files to "odb.{c,h}"
 - object-store: rename `object_directory` to `odb_source`
 - object-store: rename `raw_object_store` to `object_database`

 Code clean-up around object access API.

 Comments?
 source: <20250605-pks-object-store-wo-the-repository-v5-0-779d1c28774b@pks.im>


* cc/promisor-remote-capability (2025-06-11) 5 commits
 - promisor-remote: use string constants for 'name' and 'url' too
 - promisor-remote: allow a client to check fields
 - promisor-remote: refactor how we parse advertised fields
 - promisor-remote: allow a server to advertise more fields
 - promisor-remote: refactor to get rid of 'struct strvec'

 The "promisor-remote" capability mechanism has been updated to
 allow the "partialCloneFilter" settings and the "token" value to be
 communicated from the server side.

 Comments?
 source: <20250611134506.2975856-1-christian.couder@gmail.com>


* jc/you-still-use-whatchanged (2025-05-12) 6 commits
  (merged to 'next' on 2025-06-18 at b28af0a02c)
 + whatschanged: list it in BreakingChanges document
 + whatchanged: remove when built with WITH_BREAKING_CHANGES
 + whatchanged: require --i-still-use-this
 + tests: prepare for a world without whatchanged
 + doc: prepare for a world without whatchanged
 + you-still-use-that??: help deprecating commands for removal

 "git whatchanged" that is longer to type than "git log --raw"
 which is its modern rough equivalent has outlived its usefulness
 more than 10 years ago.  Plan to deprecate and remove it.

 Will merge to 'master'.
 source: <20250512190311.1451556-1-gitster@pobox.com>


* sj/string-list-typefix (2025-05-18) 8 commits
 - u-string-list: move "remove duplicates" test to "u-string-list.c"
 - u-string-list: move "filter string" test to "u-string-list.c"
 - u-string-list: move "test_split_in_place" to "u-string-list.c"
 - u-string-list: move "test_split" into "u-string-list.c"
 - string-list: enable sign compare warnings check
 - string-list: return index directly when inserting an existing element
 - string-list: remove unused "insert_at" parameter from add_entry
 - string-list: fix sign compare warnings for loop iterator

 Code and test clean-up around string-list API.

 Comments?
 source: <aCoDB9P5XV1lHMil@ArchLinux>


* tb/midx-avoid-cruft-packs (2025-06-23) 9 commits
 - repack: exclude cruft pack(s) from the MIDX where possible
 - pack-objects: introduce '--stdin-packs=follow'
 - pack-objects: swap 'show_{object,commit}_pack_hint'
 - pack-objects: fix typo in 'show_object_pack_hint()'
 - pack-objects: perform name-hash traversal for unpacked objects
 - pack-objects: declare 'rev_info' for '--stdin-packs' earlier
 - pack-objects: factor out handling '--stdin-packs'
 - pack-objects: limit scope in 'add_object_entry_from_pack()'
 - pack-objects: use standard option incompatibility functions

 "pack-objects" has been taught to avoid pointing into objects in
 cruft packs from midx.

 Ready?
 source: <cover.1750717921.git.me@ttaylorr.com>

^ permalink raw reply

* Re: [RFC PATCH 0/2] fetch --prune performance problem
From: Jacob Keller @ 2025-06-23 23:46 UTC (permalink / raw)
  To: Phil Hord; +Cc: Jeff King, git
In-Reply-To: <CABURp0q-1FGmD+PJeSQ=xvyDN6ZYn1O7Fh8i1OojfD2WQCqgcw@mail.gmail.com>



On 6/23/2025 4:40 PM, Phil Hord wrote:
> On Mon, Jun 23, 2025 at 4:32 PM Jacob Keller <jacob.e.keller@intel.com>
>> On 6/23/2025 4:11 PM, Phil Hord wrote:
>>> I have a new patch that produces this:
>>>
>>>     + git fetch --prune --dry-run
>>>     From /tmp/repo/.
>>>      - [deleted]                   (none)     -> origin/branches
>>>      - [deleted]                   (none)     -> origin/master
>>>      - [deleted]                   (none)     -> origin/other
>>>        origin/HEAD will become dangling after origin/master is deleted
>>>
>>
>>
>> It is a bit weird that this says "will become dangling after <ref> is
>> deleted" because the deletion already happened.
> 
> That's because I used the `--dry-run` switch.  Sorry for the confusion.
> 
>     + git fetch --prune
>     From /tmp/repo/.
>      - [deleted]                   (none)     -> origin/branches
>      - [deleted]                   (none)     -> origin/master
>      - [deleted]                   (none)     -> origin/other
>        origin/HEAD has become dangling after origin/master was deleted
> 

Aha! That is even better that it properly adjusts the text based on
--dry-run.

I like it.

Regards,
Jake

^ permalink raw reply

* [PATCH v2 2/2] refs: remove old refs_warn_dangling_symref
From: Phil Hord @ 2025-06-23 23:43 UTC (permalink / raw)
  To: gitster; +Cc: peff, git, Jacob Keller, Phil Hord
In-Reply-To: <20250623234327.335490-1-phil.hord@gmail.com>

From: Phil Hord <phil.hord@gmail.com>

The dangling warning function that takes a single ref to search for
is no longer used.  Remove it.

Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
 refs.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/refs.c b/refs.c
index e2075a98c844..a9fbb0c8f23c 100644
--- a/refs.c
+++ b/refs.c
@@ -438,7 +438,6 @@ static int for_each_filter_refs(const char *refname, const char *referent,
 struct warn_if_dangling_data {
 	struct ref_store *refs;
 	FILE *fp;
-	const char *refname;
 	const struct string_list *refnames;
 	const char *msg_fmt;
 };
@@ -455,9 +454,7 @@ static int warn_if_dangling_symref(const char *refname, const char *referent UNU
 
 	resolves_to = refs_resolve_ref_unsafe(d->refs, refname, 0, NULL, NULL);
 	if (!resolves_to
-	    || (d->refname
-		? strcmp(resolves_to, d->refname)
-		: !string_list_has_string(d->refnames, resolves_to))) {
+	    || !string_list_has_string(d->refnames, resolves_to)) {
 		return 0;
 	}
 
@@ -468,18 +465,6 @@ static int warn_if_dangling_symref(const char *refname, const char *referent UNU
 	return 0;
 }
 
-void refs_warn_dangling_symref(struct ref_store *refs, FILE *fp,
-			       const char *msg_fmt, const char *refname)
-{
-	struct warn_if_dangling_data data = {
-		.refs = refs,
-		.fp = fp,
-		.refname = refname,
-		.msg_fmt = msg_fmt,
-	};
-	refs_for_each_rawref(refs, warn_if_dangling_symref, &data);
-}
-
 void refs_warn_dangling_symrefs(struct ref_store *refs, FILE *fp,
 				const char *msg_fmt, const struct string_list *refnames)
 {
-- 
2.50.0.84.g5d85fe910b.dirty


^ permalink raw reply related

* [PATCH v2 1/2] fetch-prune: optimize dangling-ref reporting
From: Phil Hord @ 2025-06-23 23:43 UTC (permalink / raw)
  To: gitster; +Cc: peff, git, Jacob Keller, Phil Hord
In-Reply-To: <20250623234327.335490-1-phil.hord@gmail.com>

From: Phil Hord <phil.hord@gmail.com>

When pruning during `git fetch` we check each pruned ref against the
ref_store one at a time to decide whether to report it as dangling.
This causes every local ref to be scanned for each ref being pruned.

If there are N refs in the repo and M refs being pruned, this code is
O(M*N). However, `git remote prune` uses a very similar function that
is only O(N*log(M)).

Remove the wasteful ref scanning for each pruned ref and use the faster
version already available in refs_warn_dangling_symrefs.

In a repo with 126,000 refs, where I was pruning 28,000 refs, this
code made about 3.6 billion calls to strcmp and consumed 410 seconds
of CPU. (Invariably in that time, my remote would timeout and the
fetch would fail anyway.)

After this change, the same operation completes in under a second.

Signed-off-by: Phil Hord <phil.hord@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
---
 builtin/fetch.c  | 20 ++++++++++----------
 builtin/remote.c |  4 ++--
 refs.c           |  4 +++-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 40a0e8d24434..65d606c6de08 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1383,9 +1383,13 @@ static int prune_refs(struct display_state *display_state,
 	int result = 0;
 	struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
 	struct strbuf err = STRBUF_INIT;
+	struct string_list refnames = STRING_LIST_INIT_NODUP;
 	const char *dangling_msg = dry_run
-		? _("   (%s will become dangling)")
-		: _("   (%s has become dangling)");
+		? _("   %s will become dangling after %s is deleted")
+		: _("   %s has become dangling after %s was deleted");
+
+	for (ref = stale_refs; ref; ref = ref->next)
+		string_list_append(&refnames, ref->name);
 
 	if (!dry_run) {
 		if (transaction) {
@@ -1396,15 +1400,9 @@ static int prune_refs(struct display_state *display_state,
 					goto cleanup;
 			}
 		} else {
-			struct string_list refnames = STRING_LIST_INIT_NODUP;
-
-			for (ref = stale_refs; ref; ref = ref->next)
-				string_list_append(&refnames, ref->name);
-
 			result = refs_delete_refs(get_main_ref_store(the_repository),
 						  "fetch: prune", &refnames,
 						  0);
-			string_list_clear(&refnames, 0);
 		}
 	}
 
@@ -1416,12 +1414,14 @@ static int prune_refs(struct display_state *display_state,
 					   _("(none)"), ref->name,
 					   &ref->new_oid, &ref->old_oid,
 					   summary_width);
-			refs_warn_dangling_symref(get_main_ref_store(the_repository),
-						  stderr, dangling_msg, ref->name);
 		}
+		string_list_sort(&refnames);
+		refs_warn_dangling_symrefs(get_main_ref_store(the_repository),
+					   stderr, dangling_msg, &refnames);
 	}
 
 cleanup:
+	string_list_clear(&refnames, 0);
 	strbuf_release(&err);
 	free_refs(stale_refs);
 	return result;
diff --git a/builtin/remote.c b/builtin/remote.c
index 0d6755bcb71e..4de7dd373ae5 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1522,8 +1522,8 @@ static int prune_remote(const char *remote, int dry_run)
 	struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
 	struct string_list_item *item;
 	const char *dangling_msg = dry_run
-		? _(" %s will become dangling!")
-		: _(" %s has become dangling!");
+		? _(" %s will become dangling after %s is deleted!")
+		: _(" %s has become dangling after %s was deleted!");
 
 	get_remote_ref_states(remote, &states, GET_REF_STATES);
 
diff --git a/refs.c b/refs.c
index dce5c49ca2ba..e2075a98c844 100644
--- a/refs.c
+++ b/refs.c
@@ -461,7 +461,9 @@ static int warn_if_dangling_symref(const char *refname, const char *referent UNU
 		return 0;
 	}
 
-	fprintf(d->fp, d->msg_fmt, refname);
+	skip_prefix(refname, "refs/remotes/", &refname);
+	skip_prefix(resolves_to, "refs/remotes/", &resolves_to);
+	fprintf(d->fp, d->msg_fmt, refname, resolves_to);
 	fputc('\n', d->fp);
 	return 0;
 }
-- 
2.50.0.84.g5d85fe910b.dirty


^ permalink raw reply related

* [PATCH v2 0/2] fetch --prune performance problem
From: Phil Hord @ 2025-06-23 23:43 UTC (permalink / raw)
  To: gitster; +Cc: peff, git, Jacob Keller, Phil Hord

From: Phil Hord <phil.hord@gmail.com>

`git fetch --prune` runs in O(N^2) time normally. This happens because the code
iterates over each ref to be pruned to display its status. In a repo with
174,000 refs, where I was pruning 15,000 refs, the current code made 2.6 billion
calls to strcmp and consumed 470 seconds of CPU. After this change, the same
operation completes in under 1 second.

The loop looks like this:

    for p in prune_refs { for ref in all_refs { if p == ref { ... }}}

That loop runs only to check for and report newly dangling refs. A workaround to
avoid this slowness is to run with `-q` to bypass this check.

There is similar check/report functionality in `git remote prune`, but it uses a
more efficient method to check for dangling refs. prune_refs is first sorted, so
it can be searched in O(logN), so this loop is O(N*logN).

    for ref in all_refs { if ref in prune_refs { ... }}

We can use that function instead, with some minor cleanup to the output to deal
with the ordering being changed.

This patch version only adds the deleted branch name to the output of the dangling
sym refs since the ordering has changed. This is only a minor cleanup and was
not actually needed since, for example, `git origin prune` already did not
mind losing track of this information in its output. But now it is improved
to be more explicit.

Phil Hord (2):
  fetch-prune: optimize dangling-ref reporting
  refs: remove old refs_warn_dangling_symref

 builtin/fetch.c  | 20 ++++++++++----------
 builtin/remote.c |  4 ++--
 refs.c           | 21 ++++-----------------
 3 files changed, 16 insertions(+), 29 deletions(-)

-- 
2.50.0.84.g5d85fe910b.dirty


^ permalink raw reply

* Re: [RFC PATCH 0/2] fetch --prune performance problem
From: Junio C Hamano @ 2025-06-23 23:41 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Phil Hord, Jeff King, git
In-Reply-To: <f11bf463-0005-43d2-b642-ede130d1f44c@intel.com>

Jacob Keller <jacob.e.keller@intel.com> writes:

>>> Alternatively, the dangling message could just mention where it the
>>> now-dangling symref points at, something like:
>>>
>>>    - [deleted]         (none)     -> origin/branches
>>>    - [deleted]         (none)     -> origin/main
>>>    - [deleted]         (none)     -> origin/other
>>>      (refs/remotes/origin/HEAD points to the now-deleted origin/main)
>> 
>> I have a new patch that produces this:
>> 
>>     + git fetch --prune --dry-run
>>     From /tmp/repo/.
>>      - [deleted]                   (none)     -> origin/branches
>>      - [deleted]                   (none)     -> origin/master
>>      - [deleted]                   (none)     -> origin/other
>>        origin/HEAD will become dangling after origin/master is deleted
>> 
>
>
> It is a bit weird that this says "will become dangling after <ref> is
> deleted" because the deletion already happened.

But that is with "--dry-run".  Without it, presumably 

    origin/HEAD is now dangling since origin/master was deleted

or something, probably.

^ permalink raw reply

* Re: [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow'
From: Junio C Hamano @ 2025-06-23 23:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King
In-Reply-To: <3699c25337e1c91388bad4c56441b39a9984798b.1750717921.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

>  static void show_object_pack_hint(struct object *object, const char *name,
> -				  void *data UNUSED)
> +				  void *data)
>  {
> -	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
> -	if (!oe)
> -		return;
> +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
> +		if (object->type == OBJ_BLOB &&
> +		    !has_object(the_repository, &object->oid, 0))
> +			return;

So, --stdin-packs opened a pack and is feeding the objects contained
in it to this machinery.  show_commit_pack_hint() calls this
function in the `follow` mode.  How would such an object be missing?
Ah, lazy clones.  OK.

> +		add_object_entry(&object->oid, object->type, name, 0);
> +	} else {

And only up to this point is the new code.  The "else" clause is
just the original indented one-level deeper.

> +static void show_commit_pack_hint(struct commit *commit, void *data)
>  {
> +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> +
> +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
> +		show_object_pack_hint((struct object *)commit, "", data);
> +		return;
> +	}
> +
>  	/* nothing to do; commits don't have a namehash */
> +
>  }

What is this new blank line doing here?



^ permalink raw reply

* Re: [RFC PATCH 0/2] fetch --prune performance problem
From: Jacob Keller @ 2025-06-23 23:32 UTC (permalink / raw)
  To: Phil Hord, Jeff King; +Cc: git
In-Reply-To: <CABURp0p4d0JPg=-cW1OZdFQJ+vNT_0PDd9Rv3oz6toFGqGv5=g@mail.gmail.com>



On 6/23/2025 4:11 PM, Phil Hord wrote:
> On Wed, Jun 18, 2025 at 8:37 PM Jeff King <peff@peff.net> wrote:
>> On Wed, Jun 18, 2025 at 04:15:03PM -0700, Jacob Keller wrote:
>>> On 6/18/2025 2:08 PM, Phil Hord wrote:
>>>> My patch fixes this for fetch, but it affects the command's output
> order.
>>>> Currently the results look like this:
>>>>
>>>>      - [deleted]     (none) -> origin/bar
>>>>        (origin/bar has become dangling)
>>>>      - [deleted]     (none) -> origin/baz
>>>>      - [deleted]     (none) -> origin/foo
>>>>        (origin/foo has become dangling)
>>>>      - [deleted]     (none) -> origin/frotz
>>>>
>>>> After my change, the order will change so the danglers are reported
> at the end.
>>>>
>>>>      - [deleted]     (none) -> origin/bar
>>>>      - [deleted]     (none) -> origin/baz
>>>>      - [deleted]     (none) -> origin/foo
>>>>      - [deleted]     (none) -> origin/frotz
>>>>        (origin/bar has become dangling)
>>>>        (origin/foo has become dangling)
>>>
>>> Personally, I like the later output. I have no idea why anyone would be
>>> specifically scripting something that depends on the ordering being such
>>> that dangling messages are printed immediately.
>>
>> I think the original ordering tells you which deletion caused the ref to
>> become dangling. Phil's example is a little confusing here:
>>
>>     - [deleted]     (none) -> origin/bar
>>       (origin/bar has become dangling)
>>
>> because the name is the same in both cases. A more likely output is that
>> origin/HEAD becomes dangling (since it's the only symref Git ever
>> automatically points at a tracking ref). E.g., in this:
>>
>>   git init repo
>>   cd repo
>>
>>   git commit --allow-empty -m foo
>>   git branch some
>>   git branch other
>>   git branch branches
>>
>>   git clone . child
>>   cur=$(git symbolic-ref --short HEAD)
>>   git checkout some
>>   git branch -d other branches $cur
>>
>>   cd child
>>   git fetch --prune
> 
> Thanks for the helpful demo and clarification of the real output.
> 
>> The final fetch output looks like:
>>
>>    - [deleted]         (none)     -> origin/branches
>>    - [deleted]         (none)     -> origin/main
>>      (refs/remotes/origin/HEAD has become dangling)
>>    - [deleted]         (none)     -> origin/other
>>
>> and we can see that the deletion of "main" is what caused the dangling.
>>
>> That said, I'm not sure I care that much. I didn't even know we had this
>> dangling message, and it's been around for over 15 years!
>>
>> If we did want to preserve the ordering, it could be done by taking two
>> passes (the first to create a reverse map of deletions to danglers, and
>> then the second to print each ref).
>>
>> Alternatively, the dangling message could just mention where it the
>> now-dangling symref points at, something like:
>>
>>    - [deleted]         (none)     -> origin/branches
>>    - [deleted]         (none)     -> origin/main
>>    - [deleted]         (none)     -> origin/other
>>      (refs/remotes/origin/HEAD points to the now-deleted origin/main)
> 
> I have a new patch that produces this:
> 
>     + git fetch --prune --dry-run
>     From /tmp/repo/.
>      - [deleted]                   (none)     -> origin/branches
>      - [deleted]                   (none)     -> origin/master
>      - [deleted]                   (none)     -> origin/other
>        origin/HEAD will become dangling after origin/master is deleted
> 


It is a bit weird that this says "will become dangling after <ref> is
deleted" because the deletion already happened.

^ permalink raw reply

* [PATCH v4 7/7] submodule: look up remotes by URL first
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>

From: Jacob Keller <jacob.keller@gmail.com>

The get_default_remote_submodule() function performs a lookup to find
the appropriate remote to use within a submodule. The function first
checks to see if it can find the remote for the current branch. If this
fails, it then checks to see if there is exactly one remote. It will use
this, before finally falling back to "origin" as the default.

If a user happens to rename their default remote from origin, either
manually or by setting something like clone.defaultRemoteName, this
fallback will not work.

In such cases, the submodule logic will try to use a non-existent
remote. This usually manifests as a failure to trigger the submodule
update.

The parent project already knows and stores the submodule URL in either
.gitmodules or its .git/config.

Add a new repo_remote_from_url() helper which will iterate over all the
remotes in a repository and return the first remote which has a matching
URL.

Refactor get_default_remote_submodule to find the submodule and get its
URL. If a valid URL exists, first try to obtain a remote using the new
repo_remote_from_url(). Fall back to the repo_default_remote()
otherwise.

The fallback logic is kept in case for some reason the user has manually
changed the URL within the submodule. Additionally, we still try to use
a remote rather than directly passing the URL in the
fetch_in_submodule() logic. This ensures that an update will properly
update the remote refs within the submodule as expected, rather than
just fetching into FETCH_HEAD.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.h                    |  1 +
 builtin/submodule--helper.c | 26 +++++++++++++++++++++++++-
 remote.c                    | 15 +++++++++++++++
 t/t7406-submodule-update.sh | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/remote.h b/remote.h
index 8dc5cfa49ef78808348a84c9b3f416b31cd3bbd7..0ca399e1835bf1829054d8937d95e5625c38d881 100644
--- a/remote.h
+++ b/remote.h
@@ -340,6 +340,7 @@ const char *pushremote_for_branch(struct branch *branch, int *explicit);
 char *remote_ref_for_branch(struct branch *branch, int for_push);
 
 const char *repo_default_remote(struct repository *repo);
+const char *repo_remote_from_url(struct repository *repo, const char *url);
 
 /* returns true if the given branch has merge configuration given. */
 int branch_has_merge_config(struct branch *branch);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1aa87435c2000e94f43da94c5ef88a307f6f3f4a..84a96d300d9489706fb16280f03aea02f87f1657 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -72,16 +72,40 @@ static char *resolve_relative_url(const char *rel_url, const char *up_path, int
 
 static int get_default_remote_submodule(const char *module_path, char **default_remote)
 {
+	const struct submodule *sub;
 	struct repository subrepo;
+	const char *remote_name = NULL;
+	char *url = NULL;
+
+	sub = submodule_from_path(the_repository, null_oid(the_hash_algo), module_path);
+	if (sub && sub->url) {
+		url = xstrdup(sub->url);
+
+		/* Possibly a url relative to parent */
+		if (starts_with_dot_dot_slash(url) ||
+		    starts_with_dot_slash(url)) {
+			char *oldurl = url;
+
+			url = resolve_relative_url(oldurl, NULL, 1);
+			free(oldurl);
+		}
+	}
 
 	if (repo_submodule_init(&subrepo, the_repository, module_path,
 				null_oid(the_hash_algo)) < 0)
 		return die_message(_("could not get a repository handle for submodule '%s'"),
 				   module_path);
 
-	*default_remote = xstrdup(repo_default_remote(&subrepo));
+	/* Look up by URL first */
+	if (url)
+		remote_name = repo_remote_from_url(&subrepo, url);
+	if (!remote_name)
+		remote_name = repo_default_remote(&subrepo);
+
+	*default_remote = xstrdup(remote_name);
 
 	repo_clear(&subrepo);
+	free(url);
 
 	return 0;
 }
diff --git a/remote.c b/remote.c
index e35cf7ec61ac946d1e84c5a0ae309e63c66d0335..60b4aec3dee3847b3a7b5be0f1acd9a52c60c71d 100644
--- a/remote.c
+++ b/remote.c
@@ -1801,6 +1801,21 @@ const char *repo_default_remote(struct repository *repo)
 	return remotes_remote_for_branch(repo->remote_state, branch, NULL);
 }
 
+const char *repo_remote_from_url(struct repository *repo, const char *url)
+{
+	read_config(repo, 0);
+
+	for (int i = 0; i < repo->remote_state->remotes_nr; i++) {
+		struct remote *remote = repo->remote_state->remotes[i];
+		if (!remote)
+			continue;
+
+		if (remote_has_url(remote, url))
+			return remote->name;
+	}
+	return NULL;
+}
+
 int branch_has_merge_config(struct branch *branch)
 {
 	return branch && branch->set_merge;
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 748b529745a5121f121768bb4e0cbc11bc833ea4..c09047b5f441a73a02a9fc4197e9a0ea8f39b529 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -1134,6 +1134,38 @@ test_expect_success 'setup clean recursive superproject' '
 	git clone --recurse-submodules top top-clean
 '
 
+test_expect_success 'submodule update with multiple remotes' '
+	test_when_finished "rm -fr top-cloned" &&
+	cp -r top-clean top-cloned &&
+
+	# Create a commit in each repo, starting with bottom
+	test_commit -C bottom multiple_remote_commit &&
+	# Create middle commit
+	git -C middle/bottom fetch &&
+	git -C middle/bottom checkout -f FETCH_HEAD &&
+	git -C middle add bottom &&
+	git -C middle commit -m "multiple_remote_commit" &&
+	# Create top commit
+	git -C top/middle fetch &&
+	git -C top/middle checkout -f FETCH_HEAD &&
+	git -C top add middle &&
+	git -C top commit -m "multiple_remote_commit" &&
+
+	# rename the submodule remote
+	git -C top-cloned/middle remote rename origin upstream &&
+
+	# Add another remote
+	git -C top-cloned/middle remote add other bogus &&
+
+	# Make the update of "middle" a no-op, otherwise we error out
+	# because of its unmerged state
+	test_config -C top-cloned submodule.middle.update !true &&
+	git -C top-cloned submodule update --recursive 2>actual.err &&
+	cat >expect.err <<-\EOF &&
+	EOF
+	test_cmp expect.err actual.err
+'
+
 test_expect_success 'submodule update with renamed remote' '
 	test_when_finished "rm -fr top-cloned" &&
 	cp -r top-clean top-cloned &&

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related

* [PATCH v4 6/7] submodule: move get_default_remote_submodule()
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>

From: Jacob Keller <jacob.keller@gmail.com>

A future refactor got get_default_remote_submodule() is going to depend on
resolve_relative_url(). That function depends on get_default_remote().

Move get_default_remote_submodule() after resolve_relative_url() first
to make the additional functionality easier to review.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 builtin/submodule--helper.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 4aa237033a526fca29cce2926419462179d40ee3..1aa87435c2000e94f43da94c5ef88a307f6f3f4a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -41,22 +41,6 @@
 typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
 				  void *cb_data);
 
-static int get_default_remote_submodule(const char *module_path, char **default_remote)
-{
-	struct repository subrepo;
-
-	if (repo_submodule_init(&subrepo, the_repository, module_path,
-				null_oid(the_hash_algo)) < 0)
-		return die_message(_("could not get a repository handle for submodule '%s'"),
-				   module_path);
-
-	*default_remote = xstrdup(repo_default_remote(&subrepo));
-
-	repo_clear(&subrepo);
-
-	return 0;
-}
-
 static char *get_default_remote(void)
 {
 	return xstrdup(repo_default_remote(the_repository));
@@ -86,6 +70,22 @@ static char *resolve_relative_url(const char *rel_url, const char *up_path, int
 	return resolved_url;
 }
 
+static int get_default_remote_submodule(const char *module_path, char **default_remote)
+{
+	struct repository subrepo;
+
+	if (repo_submodule_init(&subrepo, the_repository, module_path,
+				null_oid(the_hash_algo)) < 0)
+		return die_message(_("could not get a repository handle for submodule '%s'"),
+				   module_path);
+
+	*default_remote = xstrdup(repo_default_remote(&subrepo));
+
+	repo_clear(&subrepo);
+
+	return 0;
+}
+
 /* the result should be freed by the caller. */
 static char *get_submodule_displaypath(const char *path, const char *prefix,
 				       const char *super_prefix)

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related

* [PATCH v4 5/7] submodule--helper: improve logic for fallback remote name
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>

From: Jacob Keller <jacob.keller@gmail.com>

The repo_get_default_remote() function in submodule--helper currently
tries to figure out the proper remote name to use for a submodule based
on a few factors.

First, it tries to find the remote for the currently checked out branch.
This works if the submodule is configured to checkout to a branch
instead of a detached HEAD state.

In the detached HEAD state, the code calls back to using "origin", on
the assumption that this is the default remote name. Some users may
change this, such as by setting clone.defaultRemoteName, or by changing
the remote name manually within the submodule repository.

As a first step to improving this situation, refactor to reuse the logic
from remotes_remote_for_branch(). This function uses the remote from the
branch if it has one. If it doesn't then it checks to see if there is
exactly one remote. It uses this remote first before attempting to fall
back to "origin".

To allow using this helper function, introduce a repo_default_remote()
helper to remote.c which takes a repository structure. This helper will
load the remote configuration and get the "HEAD" branch. Then it will
call remotes_remote_for_branch to find the default remote.

Replace calls of repo_get_default_remote() with the calls to this new
function. To maintain consistency with the existing callers, continue
copying the returned string with xstrdup.

This isn't a perfect solution for users who change remote names, but it
should help in cases where the remote name is changed but users haven't
added any additional remotes.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.h                    |  3 +++
 builtin/submodule--helper.c | 46 +++++----------------------------------------
 remote.c                    | 25 +++++++++++++++++++-----
 t/t7406-submodule-update.sh | 29 ++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/remote.h b/remote.h
index 76d93bf88d1fb8c0e2cbc2bc99558f23a256155c..8dc5cfa49ef78808348a84c9b3f416b31cd3bbd7 100644
--- a/remote.h
+++ b/remote.h
@@ -9,6 +9,7 @@
 
 struct option;
 struct transport_ls_refs_options;
+struct repository;
 
 /**
  * The API gives access to the configuration related to remotes. It handles
@@ -338,6 +339,8 @@ const char *remote_for_branch(struct branch *branch, int *explicit);
 const char *pushremote_for_branch(struct branch *branch, int *explicit);
 char *remote_ref_for_branch(struct branch *branch, int for_push);
 
+const char *repo_default_remote(struct repository *repo);
+
 /* returns true if the given branch has merge configuration given. */
 int branch_has_merge_config(struct branch *branch);
 
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 9e8cdfe1b2a8c2985d9c1b8ad6f1b0d1f9401714..4aa237033a526fca29cce2926419462179d40ee3 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -41,61 +41,25 @@
 typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
 				  void *cb_data);
 
-static int repo_get_default_remote(struct repository *repo, char **default_remote)
-{
-	char *dest = NULL;
-	struct strbuf sb = STRBUF_INIT;
-	struct ref_store *store = get_main_ref_store(repo);
-	const char *refname = refs_resolve_ref_unsafe(store, "HEAD", 0, NULL,
-						      NULL);
-
-	if (!refname)
-		return die_message(_("No such ref: %s"), "HEAD");
-
-	/* detached HEAD */
-	if (!strcmp(refname, "HEAD")) {
-		*default_remote = xstrdup("origin");
-		return 0;
-	}
-
-	if (!skip_prefix(refname, "refs/heads/", &refname))
-		return die_message(_("Expecting a full ref name, got %s"),
-				   refname);
-
-	strbuf_addf(&sb, "branch.%s.remote", refname);
-	if (repo_config_get_string(repo, sb.buf, &dest))
-		*default_remote = xstrdup("origin");
-	else
-		*default_remote = dest;
-
-	strbuf_release(&sb);
-	return 0;
-}
-
 static int get_default_remote_submodule(const char *module_path, char **default_remote)
 {
 	struct repository subrepo;
-	int ret;
 
 	if (repo_submodule_init(&subrepo, the_repository, module_path,
 				null_oid(the_hash_algo)) < 0)
 		return die_message(_("could not get a repository handle for submodule '%s'"),
 				   module_path);
-	ret = repo_get_default_remote(&subrepo, default_remote);
+
+	*default_remote = xstrdup(repo_default_remote(&subrepo));
+
 	repo_clear(&subrepo);
 
-	return ret;
+	return 0;
 }
 
 static char *get_default_remote(void)
 {
-	char *default_remote;
-	int code = repo_get_default_remote(the_repository, &default_remote);
-
-	if (code)
-		exit(code);
-
-	return default_remote;
+	return xstrdup(repo_default_remote(the_repository));
 }
 
 static char *resolve_relative_url(const char *rel_url, const char *up_path, int quiet)
diff --git a/remote.c b/remote.c
index e7ff21dc0340fd81b0c0c21c9d2199c3c0d53946..e35cf7ec61ac946d1e84c5a0ae309e63c66d0335 100644
--- a/remote.c
+++ b/remote.c
@@ -1772,20 +1772,35 @@ static void set_merge(struct repository *repo, struct branch *ret)
 	}
 }
 
-struct branch *branch_get(const char *name)
+static struct branch *repo_branch_get(struct repository *repo, const char *name)
 {
 	struct branch *ret;
 
-	read_config(the_repository, 0);
+	read_config(repo, 0);
 	if (!name || !*name || !strcmp(name, "HEAD"))
-		ret = the_repository->remote_state->current_branch;
+		ret = repo->remote_state->current_branch;
 	else
-		ret = make_branch(the_repository->remote_state, name,
+		ret = make_branch(repo->remote_state, name,
 				  strlen(name));
-	set_merge(the_repository, ret);
+	set_merge(repo, ret);
 	return ret;
 }
 
+struct branch *branch_get(const char *name)
+{
+	return repo_branch_get(the_repository, name);
+}
+
+const char *repo_default_remote(struct repository *repo)
+{
+	struct branch *branch;
+
+	read_config(repo, 0);
+	branch = repo_branch_get(repo, "HEAD");
+
+	return remotes_remote_for_branch(repo->remote_state, branch, NULL);
+}
+
 int branch_has_merge_config(struct branch *branch)
 {
 	return branch && branch->set_merge;
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index c562bad042ab2d4d0f82cb8b57a1eadbe24044d1..748b529745a5121f121768bb4e0cbc11bc833ea4 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -1134,6 +1134,35 @@ test_expect_success 'setup clean recursive superproject' '
 	git clone --recurse-submodules top top-clean
 '
 
+test_expect_success 'submodule update with renamed remote' '
+	test_when_finished "rm -fr top-cloned" &&
+	cp -r top-clean top-cloned &&
+
+	# Create a commit in each repo, starting with bottom
+	test_commit -C bottom rename_commit &&
+	# Create middle commit
+	git -C middle/bottom fetch &&
+	git -C middle/bottom checkout -f FETCH_HEAD &&
+	git -C middle add bottom &&
+	git -C middle commit -m "rename_commit" &&
+	# Create top commit
+	git -C top/middle fetch &&
+	git -C top/middle checkout -f FETCH_HEAD &&
+	git -C top add middle &&
+	git -C top commit -m "rename_commit" &&
+
+	# rename the submodule remote
+	git -C top-cloned/middle remote rename origin upstream &&
+
+	# Make the update of "middle" a no-op, otherwise we error out
+	# because of its unmerged state
+	test_config -C top-cloned submodule.middle.update !true &&
+	git -C top-cloned submodule update --recursive 2>actual.err &&
+	cat >expect.err <<-\EOF &&
+	EOF
+	test_cmp expect.err actual.err
+'
+
 test_expect_success 'submodule update should skip unmerged submodules' '
 	test_when_finished "rm -fr top-cloned" &&
 	cp -r top-clean top-cloned &&

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related

* [PATCH v4 4/7] remote: remove the_repository from some functions
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>

From: Jacob Keller <jacob.keller@gmail.com>

The remotes_remote_get_1 (and its caller, remotes_remote_get, have an
implicit dependency on the_repository due to calling
read_branches_file() and read_remotes_file(), both of which use
the_repository. The branch_get() function calls set_merge() which has an
implicit dependency on the_repository as well.

Because of this use of the_repository, the helper functions cannot be
used in code paths which operate on other repositories. A future
refactor of the submodule--helper will want to make use of some of these
functions.

Refactor to break the dependency by passing struct repository *repo
instead of struct remote_state *remote_state in a few places.

The public callers and many other helper functions still depend on
the_repository. A repo-aware function will be exposed in a following
change for git submodule--helper.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.c | 58 ++++++++++++++++++++++++++++------------------------------
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/remote.c b/remote.c
index 194bb447784ac1f71fb85a9fed3312e7458a9d5d..e7ff21dc0340fd81b0c0c21c9d2199c3c0d53946 100644
--- a/remote.c
+++ b/remote.c
@@ -334,11 +334,10 @@ static void warn_about_deprecated_remote_type(const char *type,
 		type, remote->name, remote->name, remote->name);
 }
 
-static void read_remotes_file(struct remote_state *remote_state,
-			      struct remote *remote)
+static void read_remotes_file(struct repository *repo, struct remote *remote)
 {
 	struct strbuf buf = STRBUF_INIT;
-	FILE *f = fopen_or_warn(repo_git_path_append(the_repository, &buf,
+	FILE *f = fopen_or_warn(repo_git_path_append(repo, &buf,
 						     "remotes/%s", remote->name), "r");
 
 	if (!f)
@@ -354,7 +353,7 @@ static void read_remotes_file(struct remote_state *remote_state,
 		strbuf_rtrim(&buf);
 
 		if (skip_prefix(buf.buf, "URL:", &v))
-			add_url_alias(remote_state, remote,
+			add_url_alias(repo->remote_state, remote,
 				      skip_spaces(v));
 		else if (skip_prefix(buf.buf, "Push:", &v))
 			refspec_append(&remote->push, skip_spaces(v));
@@ -367,12 +366,11 @@ static void read_remotes_file(struct remote_state *remote_state,
 	strbuf_release(&buf);
 }
 
-static void read_branches_file(struct remote_state *remote_state,
-			       struct remote *remote)
+static void read_branches_file(struct repository *repo, struct remote *remote)
 {
 	char *frag, *to_free = NULL;
 	struct strbuf buf = STRBUF_INIT;
-	FILE *f = fopen_or_warn(repo_git_path_append(the_repository, &buf,
+	FILE *f = fopen_or_warn(repo_git_path_append(repo, &buf,
 						     "branches/%s", remote->name), "r");
 
 	if (!f)
@@ -399,9 +397,9 @@ static void read_branches_file(struct remote_state *remote_state,
 	if (frag)
 		*(frag++) = '\0';
 	else
-		frag = to_free = repo_default_branch_name(the_repository, 0);
+		frag = to_free = repo_default_branch_name(repo, 0);
 
-	add_url_alias(remote_state, remote, buf.buf);
+	add_url_alias(repo->remote_state, remote, buf.buf);
 	refspec_appendf(&remote->fetch, "refs/heads/%s:refs/heads/%s",
 			frag, remote->name);
 
@@ -698,7 +696,7 @@ const char *pushremote_for_branch(struct branch *branch, int *explicit)
 					     branch, explicit);
 }
 
-static struct remote *remotes_remote_get(struct remote_state *remote_state,
+static struct remote *remotes_remote_get(struct repository *repo,
 					 const char *name);
 
 char *remote_ref_for_branch(struct branch *branch, int for_push)
@@ -717,7 +715,7 @@ char *remote_ref_for_branch(struct branch *branch, int for_push)
 					the_repository->remote_state, branch,
 					NULL);
 			struct remote *remote = remotes_remote_get(
-				the_repository->remote_state, remote_name);
+				the_repository, remote_name);
 
 			if (remote && remote->push.nr &&
 			    (dst = apply_refspecs(&remote->push,
@@ -774,10 +772,11 @@ static void validate_remote_url(struct remote *remote)
 }
 
 static struct remote *
-remotes_remote_get_1(struct remote_state *remote_state, const char *name,
+remotes_remote_get_1(struct repository *repo, const char *name,
 		     const char *(*get_default)(struct remote_state *,
 						struct branch *, int *))
 {
+	struct remote_state *remote_state = repo->remote_state;
 	struct remote *ret;
 	int name_given = 0;
 
@@ -791,9 +790,9 @@ remotes_remote_get_1(struct remote_state *remote_state, const char *name,
 #ifndef WITH_BREAKING_CHANGES
 	if (valid_remote_nick(name) && have_git_dir()) {
 		if (!valid_remote(ret))
-			read_remotes_file(remote_state, ret);
+			read_remotes_file(repo, ret);
 		if (!valid_remote(ret))
-			read_branches_file(remote_state, ret);
+			read_branches_file(repo, ret);
 	}
 #endif /* WITH_BREAKING_CHANGES */
 	if (name_given && !valid_remote(ret))
@@ -807,35 +806,33 @@ remotes_remote_get_1(struct remote_state *remote_state, const char *name,
 }
 
 static inline struct remote *
-remotes_remote_get(struct remote_state *remote_state, const char *name)
+remotes_remote_get(struct repository *repo, const char *name)
 {
-	return remotes_remote_get_1(remote_state, name,
-				    remotes_remote_for_branch);
+	return remotes_remote_get_1(repo, name, remotes_remote_for_branch);
 }
 
 struct remote *remote_get(const char *name)
 {
 	read_config(the_repository, 0);
-	return remotes_remote_get(the_repository->remote_state, name);
+	return remotes_remote_get(the_repository, name);
 }
 
 struct remote *remote_get_early(const char *name)
 {
 	read_config(the_repository, 1);
-	return remotes_remote_get(the_repository->remote_state, name);
+	return remotes_remote_get(the_repository, name);
 }
 
 static inline struct remote *
-remotes_pushremote_get(struct remote_state *remote_state, const char *name)
+remotes_pushremote_get(struct repository *repo, const char *name)
 {
-	return remotes_remote_get_1(remote_state, name,
-				    remotes_pushremote_for_branch);
+	return remotes_remote_get_1(repo, name, remotes_pushremote_for_branch);
 }
 
 struct remote *pushremote_get(const char *name)
 {
 	read_config(the_repository, 0);
-	return remotes_pushremote_get(the_repository->remote_state, name);
+	return remotes_pushremote_get(the_repository, name);
 }
 
 int remote_is_configured(struct remote *remote, int in_repo)
@@ -1739,7 +1736,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 	}
 }
 
-static void set_merge(struct remote_state *remote_state, struct branch *ret)
+static void set_merge(struct repository *repo, struct branch *ret)
 {
 	struct remote *remote;
 	char *ref;
@@ -1760,13 +1757,13 @@ static void set_merge(struct remote_state *remote_state, struct branch *ret)
 	}
 	ret->set_merge = 1;
 
-	remote = remotes_remote_get(remote_state, ret->remote_name);
+	remote = remotes_remote_get(repo, ret->remote_name);
 
 	for (i = 0; i < ret->merge_nr; i++) {
 		if (!remote_find_tracking(remote, ret->merge[i]) ||
 		    strcmp(ret->remote_name, "."))
 			continue;
-		if (repo_dwim_ref(the_repository, ret->merge[i]->src,
+		if (repo_dwim_ref(repo, ret->merge[i]->src,
 				  strlen(ret->merge[i]->src), &oid, &ref,
 				  0) == 1)
 			ret->merge[i]->dst = ref;
@@ -1785,7 +1782,7 @@ struct branch *branch_get(const char *name)
 	else
 		ret = make_branch(the_repository->remote_state, name,
 				  strlen(name));
-	set_merge(the_repository->remote_state, ret);
+	set_merge(the_repository, ret);
 	return ret;
 }
 
@@ -1856,13 +1853,14 @@ static const char *tracking_for_push_dest(struct remote *remote,
 	return ret;
 }
 
-static const char *branch_get_push_1(struct remote_state *remote_state,
+static const char *branch_get_push_1(struct repository *repo,
 				     struct branch *branch, struct strbuf *err)
 {
+	struct remote_state *remote_state = repo->remote_state;
 	struct remote *remote;
 
 	remote = remotes_remote_get(
-		remote_state,
+		repo,
 		remotes_pushremote_for_branch(remote_state, branch, NULL));
 	if (!remote)
 		return error_buf(err,
@@ -1929,7 +1927,7 @@ const char *branch_get_push(struct branch *branch, struct strbuf *err)
 
 	if (!branch->push_tracking_ref)
 		branch->push_tracking_ref = branch_get_push_1(
-			the_repository->remote_state, branch, err);
+			the_repository, branch, err);
 	return branch->push_tracking_ref;
 }
 

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related

* [PATCH v4 2/7] remote: fix tear down of struct remote
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>

From: Jacob Keller <jacob.keller@gmail.com>

The remote_clear() function failed to free the remote->push and
remote->fetch refspec fields.

This should be caught by the leak sanitizer. However, for callers which
use ``the_repository``, the values never go out of scope and the
sanitizer doesn't complain.

A future change is going to add a caller of read_config() for a
submodule repository structure, which would result in the leak sanitizer
complaining.

Fix remote_clear(), updating it to properly call refspec_clear() for
both the push and fetch members.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/remote.c b/remote.c
index ee95126f3f20080a932b82314e8017e277569cc1..194bb447784ac1f71fb85a9fed3312e7458a9d5d 100644
--- a/remote.c
+++ b/remote.c
@@ -165,6 +165,9 @@ static void remote_clear(struct remote *remote)
 	strvec_clear(&remote->url);
 	strvec_clear(&remote->pushurl);
 
+	refspec_clear(&remote->push);
+	refspec_clear(&remote->fetch);
+
 	free((char *)remote->receivepack);
 	free((char *)remote->uploadpack);
 	FREE_AND_NULL(remote->http_proxy);

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related

* [PATCH v4 3/7] dir: move starts_with_dot(_dot)_slash to dir.h
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>

From: Jacob Keller <jacob.keller@gmail.com>

Both submodule--helper.c and submodule-config.c have an implementation
of starts_with_dot_slash and starts_with_dot_dot_slash. The dir.h header
has starts_with_dot(_dot)_slash_native, which sets PATH_MATCH_NATIVE.

Move the helpers to dir.h as static inlines. I thought about renaming
them to postfix with _platform but that felt too long and ugly. On the
other hand it might be slightly confusing with _native.

This simplifies a submodule refactor which wants to use the helpers
earlier in the submodule--helper.c file.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 dir.h                       | 23 +++++++++++++++++++++++
 builtin/submodule--helper.c | 12 ------------
 submodule-config.c          | 12 ------------
 3 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/dir.h b/dir.h
index d7e71aa8daa7d833e4c05e6875b997bc321c6070..fc9be7b427a134e46bcd66c8df42375db47727fc 100644
--- a/dir.h
+++ b/dir.h
@@ -676,4 +676,27 @@ static inline int starts_with_dot_dot_slash_native(const char *const path)
 	return path_match_flags(path, what | PATH_MATCH_NATIVE);
 }
 
+/**
+ * starts_with_dot_slash: convenience wrapper for
+ * patch_match_flags() with PATH_MATCH_STARTS_WITH_DOT_SLASH and
+ * PATH_MATCH_XPLATFORM.
+ */
+static inline int starts_with_dot_slash(const char *const path)
+{
+	const enum path_match_flags what = PATH_MATCH_STARTS_WITH_DOT_SLASH;
+
+	return path_match_flags(path, what | PATH_MATCH_XPLATFORM);
+}
+
+/**
+ * starts_with_dot_dot_slash: convenience wrapper for
+ * patch_match_flags() with PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH and
+ * PATH_MATCH_XPLATFORM.
+ */
+static inline int starts_with_dot_dot_slash(const char *const path)
+{
+	const enum path_match_flags what = PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH;
+
+	return path_match_flags(path, what | PATH_MATCH_XPLATFORM);
+}
 #endif
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 53da2116ddf576bc565b29f043e8b703b8b1563b..9e8cdfe1b2a8c2985d9c1b8ad6f1b0d1f9401714 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -438,18 +438,6 @@ static int module_foreach(int argc, const char **argv, const char *prefix,
 	return ret;
 }
 
-static int starts_with_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
-static int starts_with_dot_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
 struct init_cb {
 	const char *prefix;
 	const char *super_prefix;
diff --git a/submodule-config.c b/submodule-config.c
index 8630e27947d3943e1980eb7a53bd41a546842503..d64438b2a18ed2123cc5e18f739539209032d3e9 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -235,18 +235,6 @@ int check_submodule_name(const char *name)
 	return 0;
 }
 
-static int starts_with_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
-static int starts_with_dot_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
 static int submodule_url_is_relative(const char *url)
 {
 	return starts_with_dot_slash(url) || starts_with_dot_dot_slash(url);

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related

* [PATCH v4 1/7] remote: remove branch->merge_name and fix branch_release()
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20250623-jk-submodule-helper-use-url-v4-0-133ef3d89569@gmail.com>

From: Jacob Keller <jacob.keller@gmail.com>

The branch structure has both branch->merge_name and branch->merge for
tracking the merge information. The former is allocated by add_merge()
and stores the names read from the configuration file. The latter is
allocated by set_merge() which is called by branch_get() when an
external caller requests a branch.

This leads to the confusing situation where branch->merge_nr tracks both
the size of branch->merge (once its allocated) and branch->merge_name.
The branch_release() function incorrectly assumes that branch->merge is
always set when branch->merge_nr is non-zero, and can potentially crash
if read_config() is called without branch_get() being called on every
branch.

In addition, branch_release() fails to free some of the memory
associated with the structure including:

 * Failure to free the refspec_item containers in branch->merge[i]
 * Failure to free the strings in branch->merge_name[i]
 * Failure to free the branch->merge_name parent array.

The set_merge() function sets branch->merge_nr to 0 when there is no
valid remote_name, to avoid external callers seeing a non-zero merge_nr
but a NULL merge array. This results in failure to release most of the
merge data as well.

These issues could be fixed directly, and indeed I initially proposed
such a change at [1] in the past. While this works, there was some
confusion during review because of the inconsistencies.

Instead, its time to clean up the situation properly. Remove
branch->merge_name entirely. Instead, allocate branch->merge earlier
within add_merge() instead of within set_merge(). Instead of having
set_merge() copy from merge_name[i] to merge[i]->src, just have
add_merge() directly initialize merge[i]->src.

Modify the add_merge() to call xstrdup() itself, instead of having
the caller of add_merge() do so. This makes it more obvious which code
owns the memory.

Update all callers which use branch->merge_name[i] to use
branch->merge[i]->src instead.

Add a merge_clear() function which properly releases all of the
merge-related memory, and which sets branch->merge_nr to zero. Use this
both in branch_release() and in set_merge(), fixing the leak when
set_merge() finds no valid remote_name.

Add a set_merge variable to the branch structure, which indicates
whether set_merge() has been called. This replaces the previous use of a
NULL check against the branch->merge array.

With these changes, the merge array is always allocated when merge_nr is
non-zero.

This use of refspec_item to store the names should be safe. External
callers should be using branch_get() to obtain a pointer to the branch,
which will call set_merge(), and the callers internal to remote.c
already handle the partially initialized refpsec_item structure safely.

This end result is cleaner, and avoids duplicating the merge names
twice.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Link: [1] https://lore.kernel.org/git/20250617-jk-submodule-helper-use-url-v2-1-04cbb003177d@gmail.com/
---
 remote.h       |  4 ++--
 branch.c       |  4 ++--
 builtin/pull.c |  2 +-
 remote.c       | 44 ++++++++++++++++++++++++++++----------------
 4 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/remote.h b/remote.h
index 7e4943ae3a70ecefa3332d211084762ca30b59b6..76d93bf88d1fb8c0e2cbc2bc99558f23a256155c 100644
--- a/remote.h
+++ b/remote.h
@@ -315,8 +315,8 @@ struct branch {
 
 	char *pushremote_name;
 
-	/* An array of the "merge" lines in the configuration. */
-	const char **merge_name;
+	/* True if set_merge() has been called to finalize the merge array */
+	int set_merge;
 
 	/**
 	 * An array of the struct refspecs used for the merge lines. That is,
diff --git a/branch.c b/branch.c
index 6d01d7d6bdb2e4d969429433b1b6bc88446a96c5..93f5b4e8dd9fe53ae4412827c458bade7c341278 100644
--- a/branch.c
+++ b/branch.c
@@ -230,7 +230,7 @@ static int inherit_tracking(struct tracking *tracking, const char *orig_ref)
 		return -1;
 	}
 
-	if (branch->merge_nr < 1 || !branch->merge_name || !branch->merge_name[0]) {
+	if (branch->merge_nr < 1 || !branch->merge || !branch->merge[0] || !branch->merge[0]->src) {
 		warning(_("asked to inherit tracking from '%s', but no merge configuration is set"),
 			bare_ref);
 		return -1;
@@ -238,7 +238,7 @@ static int inherit_tracking(struct tracking *tracking, const char *orig_ref)
 
 	tracking->remote = branch->remote_name;
 	for (i = 0; i < branch->merge_nr; i++)
-		string_list_append(tracking->srcs, branch->merge_name[i]);
+		string_list_append(tracking->srcs, branch->merge[i]->src);
 	return 0;
 }
 
diff --git a/builtin/pull.c b/builtin/pull.c
index a1ebc6ad3328e074b105246f6bf5c41b063c17c9..f4556ae155ce22ea91f9878d772eb9228fe4e604 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -487,7 +487,7 @@ static void NORETURN die_no_merge_candidates(const char *repo, const char **refs
 	} else
 		fprintf_ln(stderr, _("Your configuration specifies to merge with the ref '%s'\n"
 			"from the remote, but no such ref was fetched."),
-			*curr_branch->merge_name);
+			curr_branch->merge[0]->src);
 	exit(1);
 }
 
diff --git a/remote.c b/remote.c
index 4099183cacdc8a607a8b5eaec86e456b2ef46b48..ee95126f3f20080a932b82314e8017e277569cc1 100644
--- a/remote.c
+++ b/remote.c
@@ -174,9 +174,15 @@ static void remote_clear(struct remote *remote)
 
 static void add_merge(struct branch *branch, const char *name)
 {
-	ALLOC_GROW(branch->merge_name, branch->merge_nr + 1,
+	struct refspec_item *merge;
+
+	ALLOC_GROW(branch->merge, branch->merge_nr + 1,
 		   branch->merge_alloc);
-	branch->merge_name[branch->merge_nr++] = name;
+
+	merge = xcalloc(1, sizeof(*merge));
+	merge->src = xstrdup(name);
+
+	branch->merge[branch->merge_nr++] = merge;
 }
 
 struct branches_hash_key {
@@ -247,15 +253,23 @@ static struct branch *make_branch(struct remote_state *remote_state,
 	return ret;
 }
 
+static void merge_clear(struct branch *branch)
+{
+	for (int i = 0; i < branch->merge_nr; i++) {
+		refspec_item_clear(branch->merge[i]);
+		free(branch->merge[i]);
+	}
+	FREE_AND_NULL(branch->merge);
+	branch->merge_nr = 0;
+}
+
 static void branch_release(struct branch *branch)
 {
 	free((char *)branch->name);
 	free((char *)branch->refname);
 	free(branch->remote_name);
 	free(branch->pushremote_name);
-	for (int i = 0; i < branch->merge_nr; i++)
-		refspec_item_clear(branch->merge[i]);
-	free(branch->merge);
+	merge_clear(branch);
 }
 
 static struct rewrite *make_rewrite(struct rewrites *r,
@@ -429,7 +443,7 @@ static int handle_config(const char *key, const char *value,
 		} else if (!strcmp(subkey, "merge")) {
 			if (!value)
 				return config_error_nonbool(key);
-			add_merge(branch, xstrdup(value));
+			add_merge(branch, value);
 		}
 		return 0;
 	}
@@ -692,7 +706,7 @@ char *remote_ref_for_branch(struct branch *branch, int for_push)
 	if (branch) {
 		if (!for_push) {
 			if (branch->merge_nr) {
-				return xstrdup(branch->merge_name[0]);
+				return xstrdup(branch->merge[0]->src);
 			}
 		} else {
 			char *dst;
@@ -1731,32 +1745,30 @@ static void set_merge(struct remote_state *remote_state, struct branch *ret)
 
 	if (!ret)
 		return; /* no branch */
-	if (ret->merge)
+	if (ret->set_merge)
 		return; /* already run */
 	if (!ret->remote_name || !ret->merge_nr) {
 		/*
 		 * no merge config; let's make sure we don't confuse callers
 		 * with a non-zero merge_nr but a NULL merge
 		 */
-		ret->merge_nr = 0;
+		merge_clear(ret);
 		return;
 	}
+	ret->set_merge = 1;
 
 	remote = remotes_remote_get(remote_state, ret->remote_name);
 
-	CALLOC_ARRAY(ret->merge, ret->merge_nr);
 	for (i = 0; i < ret->merge_nr; i++) {
-		ret->merge[i] = xcalloc(1, sizeof(**ret->merge));
-		ret->merge[i]->src = xstrdup(ret->merge_name[i]);
 		if (!remote_find_tracking(remote, ret->merge[i]) ||
 		    strcmp(ret->remote_name, "."))
 			continue;
-		if (repo_dwim_ref(the_repository, ret->merge_name[i],
-				  strlen(ret->merge_name[i]), &oid, &ref,
+		if (repo_dwim_ref(the_repository, ret->merge[i]->src,
+				  strlen(ret->merge[i]->src), &oid, &ref,
 				  0) == 1)
 			ret->merge[i]->dst = ref;
 		else
-			ret->merge[i]->dst = xstrdup(ret->merge_name[i]);
+			ret->merge[i]->dst = xstrdup(ret->merge[i]->src);
 	}
 }
 
@@ -1776,7 +1788,7 @@ struct branch *branch_get(const char *name)
 
 int branch_has_merge_config(struct branch *branch)
 {
-	return branch && !!branch->merge;
+	return branch && branch->set_merge;
 }
 
 int branch_merge_matches(struct branch *branch,

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related

* [PATCH v4 0/7] submodule: improve remote lookup logic
From: Jacob Keller @ 2025-06-23 23:11 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

This series improves the git submodule remote lookup logic implemented in
submodule--helper.

A few cleanups are done first:

* Remove the branch->merge_name array and replace it by directly using
  branch->merge[i]->src immediately. This is simpler and easier to reason
  about. While cleaning this up, also fix the issues with branch_release()
  not tearing down everything properly.

* remote_clear() failed to release the remote->push and remote->fetch
  refspec data. Fix this.

* The starts_with_dot(_dot)_slash helper functions are moved to dir.h for
  re-use, as these are used both within submodule--helper.c and
  submodule-config.c

* Several remote.c helper functions are refactored to take repository
  pointers, enabling use with a submodule repository pointer.

Next, the submodule--helper.c logic replaces the repo_get_default_remote()
function with a repo_default_remote() function in remote.c, which is based
on the more robust configuration reading logic. This helper uses similar
logic but also allows returning the only valid remote in the case where a
repository has exactly one remote. This way we do not fall back to "origin"
if a user has renamed the remote without adding another.

This improved logic is a good first step, but won't handle cases where
there are multiple remotes, with none of them being named "origin".

For the final improvement, notice that the parent project already stores
the URL for the submodule in its .git/config or .gitmodules file. This URL
is what we use to set the remote in the first place when cloning.

Add a repo_remote_from_url() helper which will iterate through the remotes
and find the first remote with that URL. Use this in
get_default_remote_submodule() to first try and find a remote by its URL.
If unsuccessful, we still keep the original fallback logic, in the off
chance that the user has changed the URL from within the submodule.

This method is more robust as it is less likely that the user has manually
changed the submodule URL within the submodule but not also within the
.git/config.

With this change, all commands which need the submodule remote will first
look up by URL before trying to use the fallback logic, and should now be
able to find a suitable remote regardless of now they are renamed.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
Changes in v4:
- Fix branch_has_merge_config to use branch->set_merge
- FREE_AND_NULL branch->merge in merge_clear()
- Link to v3: https://lore.kernel.org/r/20250618-jk-submodule-helper-use-url-v3-0-7c60f2679271@gmail.com

Changes in v3:
- Completely remove branch->merge_name, making the resulting logic much
  easier to understand.
- Link to v2: https://lore.kernel.org/r/20250617-jk-submodule-helper-use-url-v2-0-04cbb003177d@gmail.com

Changes in v2:
- Remove repo_get_default_remote() entirely. The extra checks it does are
  really only necessary if you're doing manual configuration lookup. This
  avoids the confusion of similarly named functions and is less code.
- Fix leaks in branch_release() and remote_clear().
- Add a forward declaration of struct repository.
- Verified tests pass with leak sanitizer now.
- Link to v1: https://lore.kernel.org/r/20250610-jk-submodule-helper-use-url-v1-0-6d14c1504e91@gmail.com

---
Jacob Keller (7):
      remote: remove branch->merge_name and fix branch_release()
      remote: fix tear down of struct remote
      dir: move starts_with_dot(_dot)_slash to dir.h
      remote: remove the_repository from some functions
      submodule--helper: improve logic for fallback remote name
      submodule: move get_default_remote_submodule()
      submodule: look up remotes by URL first

 dir.h                       |  23 +++++++
 remote.h                    |   8 ++-
 branch.c                    |   4 +-
 builtin/pull.c              |   2 +-
 builtin/submodule--helper.c | 106 ++++++++++++-------------------
 remote.c                    | 149 ++++++++++++++++++++++++++++----------------
 submodule-config.c          |  12 ----
 t/t7406-submodule-update.sh |  61 ++++++++++++++++++
 8 files changed, 230 insertions(+), 135 deletions(-)
---
base-commit: 16bd9f20a403117f2e0d9bcda6c6e621d3763e77
change-id: 20250610-jk-submodule-helper-use-url-e55d3c379faf

Best regards,
-- 
Jacob Keller <jacob.keller@gmail.com>


^ permalink raw reply

* Re: [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects
From: Junio C Hamano @ 2025-06-23 23:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King
In-Reply-To: <6b0149a32d300268d4ad870c7cb6597a95e0410b.1750717921.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> Now that the 'rev_info' struct is declared outside of
> `read_packs_list_from_stdin()`, we can pass it to
> `add_objects_in_unpacked_packs()` and add any loose objects as tips to
> the above-mentioned traversal, in theory producing slightly tighter
> packs as a result.

So the idea is to pretend any and all loose commits as if they are
at the tip of branches?  By doing so, we ensure each of the tree and
blob objects contained in them has a reasonable path-from-the-root?

> @@ -4325,6 +4326,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
>  	} else {
>  		add_object_entry(oid, type, "", 0);
>  	}
> +
> +	if (revs && type == OBJ_COMMIT)
> +		add_pending_oid(revs, NULL, oid, 0);
> +
>  	return 0;
>  }

OK.

^ permalink raw reply

* Re: [PATCH v3 1/7] remote: remove branch->merge_name and fix branch_release()
From: Jacob Keller @ 2025-06-23 23:06 UTC (permalink / raw)
  To: Lidong Yan, Junio C Hamano; +Cc: git, Jacob Keller, Patrick Steinhardt
In-Reply-To: <CA42712D-C127-4142-9424-2A512F9488CB@gmail.com>



On 6/20/2025 7:12 PM, Lidong Yan wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>>
>> Jacob Keller <jacob.e.keller@intel.com> writes:
>>
>>> This end result is cleaner, and avoids duplicating the merge names
>>> twice.
>>>
>>> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
>>> Link: [1] https://lore.kernel.org/git/20250617-jk-submodule-helper-use-url-v2-1-04cbb003177d@gmail.com/
>>> ---
>>> remote.h       |  4 ++--
>>> branch.c       |  4 ++--
>>> builtin/pull.c |  2 +-
>>> remote.c       | 42 +++++++++++++++++++++++++++---------------
>>> 4 files changed, 32 insertions(+), 20 deletions(-)
>>
>> This unfortunately makes t5582 segfault.
> 
> I used Clang's Address Sanitizer and found that the segmentation fault
> was caused by the following two null pointer dereferences.
> 
> ==501602==Hint: address points to the zero page.
>     #0 0x72be47ec6ce1 in strcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:465
>     #1 0x5d7e2f246b11 in do_fetch builtin/fetch.c:1732
> 
> ==501614==Hint: address points to the zero page.
>     #0 0x7b9812ac6ce1 in strcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:465
>     #1 0x64e39a76cdc1 in get_ref_map builtin/fetch.c:555
> 
> I believe this is because we didn't update the corresponding
> branch_has_merge_config() function. In the previous implementation,
> if branch->remote_name was a null pointer, branch_has_merge_config()
> would return false. However, PATCH[v3 1/7] broke this convention.
> 

Yep, this is likely the correct fix.

> The solution could be:
>   - Replace !!branch->merge with branch->set_merge in branch_has_merge_config().
>   - Replace free(branch->merge) with FREE_AND_NULL(branch->merge) in merge_release()
>     to prevent double free.
> 
> I test my solution locally by just running t5582 and it passed.
> 

I think the FREE_AND_NULL is a good improvement too.

> ---
> diff --git a/remote.c b/remote.c
> index dff76e4626..ee95126f3f 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -259,7 +259,7 @@ static void merge_clear(struct branch *branch)
>                 refspec_item_clear(branch->merge[i]);
>                 free(branch->merge[i]);
>         }
> -       free(branch->merge);
> +       FREE_AND_NULL(branch->merge);
>         branch->merge_nr = 0;
>  }
>  
> @@ -1788,7 +1788,7 @@ struct branch *branch_get(const char *name)
>  
>  int branch_has_merge_config(struct branch *branch)
>  {
> -       return branch && !!branch->merge;
> +       return branch && branch->set_merge;
>  }
>  
This is probably the real "fix" since branch_has_merge_config should
return false until set_merge is called.

>  int branch_merge_matches(struct branch *branch,
> ---
> 


^ permalink raw reply

* Re: [PATCH v6 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
From: Junio C Hamano @ 2025-06-23 22:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King
In-Reply-To: <c9f874eb9470bf2a5d97614b89304e892c30e129.1750717921.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
> on each of the input packs, we do a reachability traversal to discover
> names for any objects we picked up so we can generate name hash values
> and hopefully get higher quality deltas as a result.
>
> A future commit will change the purpose of this reachability traversal
> to find and pack objects which are reachable from commits in the input
> packs, but are packed in an unknown (not included nor excluded) pack.
>
> Extract the code which initializes and performs the reachability
> traversal to take place in the caller, not the callee, which prepares us
> to share this code for the '--unpacked' case (see the function
> add_unreachable_loose_objects() for more details).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
>  1 file changed, 36 insertions(+), 35 deletions(-)

Makes sense.  

Another forward declaration of add_unreachable_loose_objects(),
after one was already added in the previous step, confused me a bit,
but this step is merely moving that a bit higher, so there is
nothing funny here.  Looking good.


^ permalink raw reply

* Re: [PATCH v6 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
From: Junio C Hamano @ 2025-06-23 22:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King
In-Reply-To: <86fb36d3176198fa350dfaed261e8ae64b49b355.1750717921.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> In add_object_entry_from_pack() we declare 'revs' (given to us through
> the miscellaneous context argument) earlier in the "if (p)" conditional
> than is necessary.  Move it down as far as it can go to reduce its
> scope.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index e7274e0e00..d04a36a6bf 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3725,7 +3725,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  		return 0;
>  
>  	if (p) {
> -		struct rev_info *revs = _data;
>  		struct object_info oi = OBJECT_INFO_INIT;
>  
>  		oi.typep = &type;
> @@ -3733,6 +3732,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  			die(_("could not get type of object %s in pack %s"),
>  			    oid_to_hex(oid), p->pack_name);
>  		} else if (type == OBJ_COMMIT) {
> +			struct rev_info *revs = _data;

Nice.  This block is the only one that needs this variable.  Makes sense.

>  			/*
>  			 * commits in included packs are used as starting points for the
>  			 * subsequent revision walk

^ permalink raw reply

* [PATCH v6 9/9] repack: exclude cruft pack(s) from the MIDX where possible
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
In-Reply-To: <cover.1750717921.git.me@ttaylorr.com>

In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.

To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustIncludeCruft' set
to 'false'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/repack.adoc |   7 ++
 builtin/repack.c                 | 187 +++++++++++++++++++++++++++----
 t/t7704-repack-cruft.sh          | 145 ++++++++++++++++++++++++
 3 files changed, 319 insertions(+), 20 deletions(-)

diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
index c79af6d7b8..e9e78dcb19 100644
--- a/Documentation/config/repack.adoc
+++ b/Documentation/config/repack.adoc
@@ -39,3 +39,10 @@ repack.cruftThreads::
 	a cruft pack and the respective parameters are not given over
 	the command line. See similarly named `pack.*` configuration
 	variables for defaults and meaning.
+
+repack.midxMustContainCruft::
+	When set to true, linkgit:git-repack[1] will unconditionally include
+	cruft pack(s), if any, in the multi-pack index when invoked with
+	`--write-midx`. When false, cruft packs are only included in the MIDX
+	when necessary (e.g., because they might be required to form a
+	reachability closure with MIDX bitmaps). Defaults to true.
diff --git a/builtin/repack.c b/builtin/repack.c
index 5ddc6e7f95..8d1540a0fd 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -39,6 +39,7 @@ static int write_bitmaps = -1;
 static int use_delta_islands;
 static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
+static int midx_must_contain_cruft = 1;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
@@ -108,6 +109,10 @@ static int repack_config(const char *var, const char *value,
 		free(cruft_po_args->threads);
 		return git_config_string(&cruft_po_args->threads, var, value);
 	}
+	if (!strcmp(var, "repack.midxmustcontaincruft")) {
+		midx_must_contain_cruft = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, ctx, cb);
 }
 
@@ -690,6 +695,77 @@ static void free_pack_geometry(struct pack_geometry *geometry)
 	free(geometry->pack);
 }
 
+static int midx_has_unknown_packs(char **midx_pack_names,
+				  size_t midx_pack_names_nr,
+				  struct string_list *include,
+				  struct pack_geometry *geometry,
+				  struct existing_packs *existing)
+{
+	size_t i;
+
+	string_list_sort(include);
+
+	for (i = 0; i < midx_pack_names_nr; i++) {
+		const char *pack_name = midx_pack_names[i];
+
+		/*
+		 * Determine whether or not each MIDX'd pack from the existing
+		 * MIDX (if any) is represented in the new MIDX. For each pack
+		 * in the MIDX, it must either be:
+		 *
+		 *  - In the "include" list of packs to be included in the new
+		 *    MIDX. Note this function is called before the include
+		 *    list is populated with any cruft pack(s).
+		 *
+		 *  - Below the geometric split line (if using pack geometry),
+		 *    indicating that the pack won't be included in the new
+		 *    MIDX, but its contents were rolled up as part of the
+		 *    geometric repack.
+		 *
+		 *  - In the existing non-kept packs list (if not using pack
+		 *    geometry), and marked as non-deleted.
+		 */
+		if (string_list_has_string(include, pack_name)) {
+			continue;
+		} else if (geometry) {
+			struct strbuf buf = STRBUF_INIT;
+			uint32_t j;
+
+			for (j = 0; j < geometry->split; j++) {
+				strbuf_reset(&buf);
+				strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
+				strbuf_strip_suffix(&buf, ".pack");
+				strbuf_addstr(&buf, ".idx");
+
+				if (!strcmp(pack_name, buf.buf)) {
+					strbuf_release(&buf);
+					break;
+				}
+			}
+
+			strbuf_release(&buf);
+
+			if (j < geometry->split)
+				continue;
+		} else {
+			struct string_list_item *item;
+
+			item = string_list_lookup(&existing->non_kept_packs,
+						  pack_name);
+			if (item && !pack_is_marked_for_deletion(item))
+				continue;
+		}
+
+		/*
+		 * If we got to this point, the MIDX includes some pack that we
+		 * don't know about.
+		 */
+		return 1;
+	}
+
+	return 0;
+}
+
 struct midx_snapshot_ref_data {
 	struct tempfile *f;
 	struct oidset seen;
@@ -758,6 +834,8 @@ static void midx_snapshot_refs(struct tempfile *f)
 
 static void midx_included_packs(struct string_list *include,
 				struct existing_packs *existing,
+				char **midx_pack_names,
+				size_t midx_pack_names_nr,
 				struct string_list *names,
 				struct pack_geometry *geometry)
 {
@@ -811,26 +889,56 @@ static void midx_included_packs(struct string_list *include,
 		}
 	}
 
-	for_each_string_list_item(item, &existing->cruft_packs) {
+	if (midx_must_contain_cruft ||
+	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
+				   include, geometry, existing)) {
 		/*
-		 * When doing a --geometric repack, there is no need to check
-		 * for deleted packs, since we're by definition not doing an
-		 * ALL_INTO_ONE repack (hence no packs will be deleted).
-		 * Otherwise we must check for and exclude any packs which are
-		 * enqueued for deletion.
+		 * If there are one or more unknown pack(s) present (see
+		 * midx_has_unknown_packs() for what makes a pack
+		 * "unknown") in the MIDX before the repack, keep them
+		 * as they may be required to form a reachability
+		 * closure if the MIDX is bitmapped.
 		 *
-		 * So we could omit the conditional below in the --geometric
-		 * case, but doing so is unnecessary since no packs are marked
-		 * as pending deletion (since we only call
-		 * `mark_packs_for_deletion()` when doing an all-into-one
-		 * repack).
+		 * For example, a cruft pack can be required to form a
+		 * reachability closure if the MIDX is bitmapped and one
+		 * or more of the bitmap's selected commits reaches a
+		 * once-cruft object that was later made reachable.
 		 */
-		if (pack_is_marked_for_deletion(item))
-			continue;
+		for_each_string_list_item(item, &existing->cruft_packs) {
+			/*
+			 * When doing a --geometric repack, there is no
+			 * need to check for deleted packs, since we're
+			 * by definition not doing an ALL_INTO_ONE
+			 * repack (hence no packs will be deleted).
+			 * Otherwise we must check for and exclude any
+			 * packs which are enqueued for deletion.
+			 *
+			 * So we could omit the conditional below in the
+			 * --geometric case, but doing so is unnecessary
+			 *  since no packs are marked as pending
+			 *  deletion (since we only call
+			 *  `mark_packs_for_deletion()` when doing an
+			 *  all-into-one repack).
+			 */
+			if (pack_is_marked_for_deletion(item))
+				continue;
 
-		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s.idx", item->string);
-		string_list_insert(include, buf.buf);
+			strbuf_reset(&buf);
+			strbuf_addf(&buf, "%s.idx", item->string);
+			string_list_insert(include, buf.buf);
+		}
+	} else {
+		/*
+		 * Modern versions of Git (with the appropriate
+		 * configuration setting) will write new copies of
+		 * once-cruft objects when doing a --geometric repack.
+		 *
+		 * If the MIDX has no cruft pack, new packs written
+		 * during a --geometric repack will not rely on the
+		 * cruft pack to form a reachability closure, so we can
+		 * avoid including them in the MIDX in that case.
+		 */
+		;
 	}
 
 	strbuf_release(&buf);
@@ -1145,6 +1253,8 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	char **midx_pack_names = NULL;
+	size_t midx_pack_names_nr = 0;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1361,7 +1471,10 @@ int cmd_repack(int argc,
 		    !(pack_everything & PACK_CRUFT))
 			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
-		strvec_push(&cmd.args, "--stdin-packs");
+		if (midx_must_contain_cruft)
+			strvec_push(&cmd.args, "--stdin-packs");
+		else
+			strvec_push(&cmd.args, "--stdin-packs=follow");
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
@@ -1401,8 +1514,25 @@ int cmd_repack(int argc,
 	if (ret)
 		goto cleanup;
 
-	if (!names.nr && !po_args.quiet)
-		printf_ln(_("Nothing new to pack."));
+	if (!names.nr) {
+		if (!po_args.quiet)
+			printf_ln(_("Nothing new to pack."));
+		/*
+		 * If we didn't write any new packs, the non-cruft packs
+		 * may refer to once-unreachable objects in the cruft
+		 * pack(s).
+		 *
+		 * If there isn't already a MIDX, the one we write
+		 * must include the cruft pack(s), in case the
+		 * non-cruft pack(s) refer to once-cruft objects.
+		 *
+		 * If there is already a MIDX, we can punt here, since
+		 * midx_has_unknown_packs() will make the decision for
+		 * us.
+		 */
+		if (!get_local_multi_pack_index(the_repository))
+			midx_must_contain_cruft = 1;
+	}
 
 	if (pack_everything & PACK_CRUFT) {
 		const char *pack_prefix = find_pack_prefix(packdir, packtmp);
@@ -1483,6 +1613,19 @@ int cmd_repack(int argc,
 
 	string_list_sort(&names);
 
+	if (get_local_multi_pack_index(the_repository)) {
+		struct multi_pack_index *m =
+			get_local_multi_pack_index(the_repository);
+
+		ALLOC_ARRAY(midx_pack_names,
+			    m->num_packs + m->num_packs_in_base);
+
+		for (; m; m = m->base_midx)
+			for (uint32_t i = 0; i < m->num_packs; i++)
+				midx_pack_names[midx_pack_names_nr++] =
+					xstrdup(m->pack_names[i]);
+	}
+
 	close_object_store(the_repository->objects);
 
 	/*
@@ -1524,7 +1667,8 @@ int cmd_repack(int argc,
 
 	if (write_midx) {
 		struct string_list include = STRING_LIST_INIT_DUP;
-		midx_included_packs(&include, &existing, &names, &geometry);
+		midx_included_packs(&include, &existing, midx_pack_names,
+				    midx_pack_names_nr, &names, &geometry);
 
 		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1575,6 +1719,9 @@ int cmd_repack(int argc,
 	string_list_clear(&names, 1);
 	existing_packs_release(&existing);
 	free_pack_geometry(&geometry);
+	for (size_t i = 0; i < midx_pack_names_nr; i++)
+		free(midx_pack_names[i]);
+	free(midx_pack_names);
 	pack_objects_args_release(&po_args);
 	pack_objects_args_release(&cruft_po_args);
 
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 8aebfb45f5..aa2e2e6ad8 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -724,4 +724,149 @@ test_expect_success 'cruft repack respects --quiet' '
 	)
 '
 
+setup_cruft_exclude_tests() {
+	git init "$1" &&
+	(
+		cd "$1" &&
+
+		git config repack.midxMustContainCruft false &&
+
+		test_commit one &&
+
+		test_commit --no-tag two &&
+		two="$(git rev-parse HEAD)" &&
+		test_commit --no-tag three &&
+		three="$(git rev-parse HEAD)" &&
+		git reset --hard one &&
+		git reflog expire --all --expire=all &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
+
+		git merge $two &&
+		test_commit four
+	)
+}
+
+test_expect_success 'repack --write-midx excludes cruft where possible' '
+	setup_cruft_exclude_tests exclude-cruft-when-possible &&
+	(
+		cd exclude-cruft-when-possible &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git rev-list --all --objects --no-object-names >reachable.raw &&
+		sort reachable.raw >reachable.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp reachable.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when instructed' '
+	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
+	(
+		cd exclude-cruft-when-instructed &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git -c repack.midxMustContainCruft=true repack \
+			-d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects \
+			>all.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp all.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when necessary' '
+	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
+	(
+		cd exclude-cruft-when-necessary &&
+
+		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
+		( cd $packdir && ls pack-*.idx ) | sort >packs.all &&
+		git multi-pack-index write --stdin-packs --bitmap <packs.all &&
+
+		test_commit five &&
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>expect.objects &&
+		test_cmp expect.objects midx.objects &&
+
+		grep "^pack-" midx >midx.packs &&
+		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when already geometric' '
+	git init repack--write-midx-geometric-noop &&
+	(
+		cd repack--write-midx-geometric-noop &&
+
+		git branch -M main &&
+		test_commit A &&
+		test_commit B &&
+
+		git checkout -B side &&
+		test_commit --no-tag C &&
+		C="$(git rev-parse HEAD)" &&
+
+		git checkout main &&
+		git branch -D side &&
+		git reflog expire --all --expire=all &&
+
+		# At this point we have two packs: one containing the
+		# objects belonging to commits A and B, and another
+		# (cruft) pack containing the objects belonging to
+		# commit C.
+		git repack --cruft -d &&
+
+		# Create a third pack which contains a merge commit
+		# making commit C reachable again.
+		#
+		# --no-ff is important here, as it ensures that we
+		# actually write a new object and subsequently a new
+		# pack to contain it.
+		git merge --no-ff $C &&
+		git repack -d &&
+
+		ls $packdir/pack-*.idx | sort >packs.all &&
+		cruft="$(ls $packdir/pack-*.mtimes)" &&
+		cruft="${cruft%.mtimes}.idx" &&
+
+		for idx in $(grep -v $cruft <packs.all)
+		do
+			git show-index <$idx >out &&
+			wc -l <out || return 1
+		done >sizes.raw &&
+
+		# Make sure that there are two non-cruft packs, and
+		# that one of them contains at least twice as many
+		# objects as the other, ensuring that they are already
+		# in a geometric progression.
+		sort -n sizes.raw >sizes &&
+		test_line_count = 2 sizes &&
+		s1=$(head -n 1 sizes) &&
+		s2=$(tail -n 1 sizes) &&
+		test "$s2" -gt "$((2 * $s1))" &&
+
+		git -c repack.midxMustContainCruft=false repack --geometric=2 \
+			--write-midx --write-bitmap-index
+	)
+'
+
 test_done
-- 
2.50.0.61.g1981e40f2d

^ permalink raw reply related

* [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow'
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
In-Reply-To: <cover.1750717921.git.me@ttaylorr.com>

When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.

Note that the traversal here is best-effort, similar to the existing
traversal which provides name-hash hints. This means that the object
traversal may hand us back a blob that does not actually exist. We
*won't* see missing trees/commits with 'ignore_missing_links' because:

 - missing commit parents are discarded at the commit traversal stage by
   revision.c::process_parents()

 - missing tag objects are discarded by revision.c::handle_commit()

 - missing tree objects are discarded by the list-objects code in
   list-objects.c::process_tree()

But we have to handle potentially-missing blobs specially by making a
separate check to ensure they exist in the repository. Failing to do so
would mean that we'd add an object to the packing list which doesn't
actually exist, rendering us unable to write out the pack.

This prepares us for new repacking behavior which will "resurrect"
objects found in cruft or otherwise unspecified packs when generating
new packs. In the context of geometric repacking, this may be used to
maintain a sequence of geometrically-repacked packs, the union of which
is closed under reachability, even in the case described earlier.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc |  10 ++-
 builtin/pack-objects.c              |  86 +++++++++++++++-----
 t/t5331-pack-objects-stdin.sh       | 120 ++++++++++++++++++++++++++++
 3 files changed, 193 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index b1c5aa27da..eba014c406 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -87,13 +87,21 @@ base-name::
 	reference was included in the resulting packfile.  This
 	can be useful to send new tags to native Git clients.
 
---stdin-packs::
+--stdin-packs[=<mode>]::
 	Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
 	from the standard input, instead of object names or revision
 	arguments. The resulting pack contains all objects listed in the
 	included packs (those not beginning with `^`), excluding any
 	objects listed in the excluded packs (beginning with `^`).
 +
+When `mode` is "follow", objects from packs not listed on stdin receive
+special treatment. Objects within unlisted packs will be included if
+those objects are (1) reachable from the included packs, and (2) not
+found in any excluded packs. This mode is useful, for example, to
+resurrect once-unreachable objects found in cruft packs to generate
+packs which are closed under reachability up to the boundary set by the
+excluded packs.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f44447a3f9..4ae52c6a29 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -284,6 +284,12 @@ static struct oidmap configured_exclusions;
 static struct oidset excluded_by_config;
 static int name_hash_version = -1;
 
+enum stdin_packs_mode {
+	STDIN_PACKS_MODE_NONE,
+	STDIN_PACKS_MODE_STANDARD,
+	STDIN_PACKS_MODE_FOLLOW,
+};
+
 /**
  * Check whether the name_hash_version chosen by user input is appropriate,
  * and also validate whether it is compatible with other features.
@@ -3749,31 +3755,47 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 }
 
 static void show_object_pack_hint(struct object *object, const char *name,
-				  void *data UNUSED)
+				  void *data)
 {
-	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
-	if (!oe)
-		return;
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		if (object->type == OBJ_BLOB &&
+		    !has_object(the_repository, &object->oid, 0))
+			return;
+		add_object_entry(&object->oid, object->type, name, 0);
+	} else {
+		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
+		if (!oe)
+			return;
 
-	/*
-	 * Our 'to_pack' list was constructed by iterating all objects packed in
-	 * included packs, and so doesn't have a non-zero hash field that you
-	 * would typically pick up during a reachability traversal.
-	 *
-	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * fields here in order to perhaps improve the delta selection
-	 * process.
-	 */
-	oe->hash = pack_name_hash_fn(name);
-	oe->no_try_delta = name && no_try_delta(name);
+		/*
+		 * Our 'to_pack' list was constructed by iterating all
+		 * objects packed in included packs, and so doesn't have
+		 * a non-zero hash field that you would typically pick
+		 * up during a reachability traversal.
+		 *
+		 * Make a best-effort attempt to fill in the ->hash and
+		 * ->no_try_delta fields here in order to perhaps
+		 * improve the delta selection process.
+		 */
+		oe->hash = pack_name_hash_fn(name);
+		oe->no_try_delta = name && no_try_delta(name);
 
-	stdin_packs_hints_nr++;
+		stdin_packs_hints_nr++;
+	}
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
+static void show_commit_pack_hint(struct commit *commit, void *data)
 {
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		show_object_pack_hint((struct object *)commit, "", data);
+		return;
+	}
+
 	/* nothing to do; commits don't have a namehash */
+
 }
 
 static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3881,7 +3903,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 
 static void add_unreachable_loose_objects(struct rev_info *revs);
 
-static void read_stdin_packs(int rev_list_unpacked)
+static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
 	struct rev_info revs;
 
@@ -3913,7 +3935,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	traverse_commit_list(&revs,
 			     show_commit_pack_hint,
 			     show_object_pack_hint,
-			     NULL);
+			     &mode);
 
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
 			   stdin_packs_found_nr);
@@ -4795,6 +4817,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
 	return is_not_in_promisor_pack_obj((struct object *) commit, data);
 }
 
+static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
+				  int unset)
+{
+	enum stdin_packs_mode *mode = opt->value;
+
+	if (unset)
+		*mode = STDIN_PACKS_MODE_NONE;
+	else if (!arg || !*arg)
+		*mode = STDIN_PACKS_MODE_STANDARD;
+	else if (!strcmp(arg, "follow"))
+		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else
+		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
+
+	return 0;
+}
+
 int cmd_pack_objects(int argc,
 		     const char **argv,
 		     const char *prefix,
@@ -4805,7 +4844,7 @@ int cmd_pack_objects(int argc,
 	struct strvec rp = STRVEC_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
-	int stdin_packs = 0;
+	enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
@@ -4860,6 +4899,9 @@ int cmd_pack_objects(int argc,
 		OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
 			      N_("include objects referred to by the index"),
 			      1, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
+			     N_("read packs from stdin"),
+			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
 		OPT_BOOL(0, "stdin-packs", &stdin_packs,
 			 N_("read packs from stdin")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -5150,7 +5192,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		read_stdin_packs(rev_list_unpacked);
+		read_stdin_packs(stdin_packs, rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 8fd07deb8d..4a8df5a389 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -236,4 +236,124 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
 	test_cmp expected-objects actual-objects
 '
 
+objdir=.git/objects
+packdir=$objdir/pack
+
+objects_in_packs () {
+	for p in "$@"
+	do
+		git show-index <"$packdir/pack-$p.idx" || return 1
+	done >objects.raw &&
+
+	cut -d' ' -f2 objects.raw | sort &&
+	rm -f objects.raw
+}
+
+test_expect_success '--stdin-packs=follow walks into unknown packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+		test_commit E &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$B.pack
+		^pack-$C.pack
+		pack-$D.pack
+		EOF
+
+		# With just --stdin-packs, pack "A" is unknown to us, so
+		# only objects from packs "B" and "D" are included in
+		# the output pack.
+		P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
+		objects_in_packs $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# But with --stdin-packs=follow, objects from both
+		# included packs reach objects from the unknown pack, so
+		# objects from pack "A" is included in the output pack
+		# in addition to the above.
+		P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+		objects_in_packs $A $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# And with --unpacked, we will pick up objects from unknown
+		# packs that are reachable from loose objects. Loose object E
+		# reaches objects in pack A, but there are three excluded packs
+		# in between.
+		#
+		# The resulting pack should include objects reachable from E
+		# that are not present in packs B, C, or D, along with those
+		# present in pack A.
+		cat >in <<-EOF &&
+		^pack-$B.pack
+		^pack-$C.pack
+		^pack-$D.pack
+		EOF
+
+		P=$(git pack-objects --stdin-packs=follow --unpacked \
+			$packdir/pack <in) &&
+
+		{
+			objects_in_packs $A &&
+			git rev-list --objects --no-object-names D..E
+		}>expect.raw &&
+		sort expect.raw >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
+stdin_packs__follow_with_only () {
+	rm -fr stdin_packs__follow_with_only &&
+	git init stdin_packs__follow_with_only &&
+	(
+		cd stdin_packs__follow_with_only &&
+
+		test_commit A &&
+		test_commit B &&
+
+		git rev-parse "$@" >B.objects &&
+
+		echo A | git pack-objects --revs $packdir/pack &&
+		B="$(git pack-objects $packdir/pack <B.objects)" &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects >objs &&
+		for obj in $(cat objs)
+		do
+			rm -f $objdir/$(test_oid_to_path $obj) || return 1
+		done &&
+
+		( cd $packdir && ls pack-*.pack ) >in &&
+		git pack-objects --stdin-packs=follow --stdout >/dev/null <in
+	)
+}
+
+test_expect_success '--stdin-packs=follow tolerates missing blobs' '
+	stdin_packs__follow_with_only HEAD HEAD^{tree}
+'
+
+test_expect_success '--stdin-packs=follow tolerates missing trees' '
+	stdin_packs__follow_with_only HEAD HEAD:B.t
+'
+
+test_expect_success '--stdin-packs=follow tolerates missing commits' '
+	stdin_packs__follow_with_only HEAD HEAD^{tree}
+'
+
 test_done
-- 
2.50.0.61.g1981e40f2d


^ permalink raw reply related

* [PATCH v6 7/9] pack-objects: swap 'show_{object,commit}_pack_hint'
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
In-Reply-To: <cover.1750717921.git.me@ttaylorr.com>

show_commit_pack_hint() has heretofore been a noop, so its position
within its compilation unit only needs to appear before its first use.

But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9580b4ea1a..f44447a3f9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3748,12 +3748,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 	return 0;
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
-{
-	/* nothing to do; commits don't have a namehash */
-}
-
 static void show_object_pack_hint(struct object *object, const char *name,
 				  void *data UNUSED)
 {
@@ -3776,6 +3770,12 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	stdin_packs_hints_nr++;
 }
 
+static void show_commit_pack_hint(struct commit *commit UNUSED,
+				  void *data UNUSED)
+{
+	/* nothing to do; commits don't have a namehash */
+}
+
 static int pack_mtime_cmp(const void *_a, const void *_b)
 {
 	struct packed_git *a = ((const struct string_list_item*)_a)->util;
-- 
2.50.0.61.g1981e40f2d


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox