git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] submodule: improve remote lookup logic
@ 2025-06-17 21:30 Jacob Keller
  2025-06-17 21:30 ` [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote Jacob Keller
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 21:30 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

This series improves the git submodule remote lookup logic implemented in
submodule--helper.

A few cleanups are done first:

* Both the branch_release() function and remote_clear() functions do not
  completely free all memory pointed to by their associated structures. In
  addition, branch_release() potentially dereferences branch->merge even
  when its NULL. Fix these inconsistencies to ensure that the remote logic
  works for a submodule repository pointer.

* The starts_with_dot(_dot)_slash helper functions are moved to dir.h for
  re-use, as these are used both within submodule--helper.c and
  submodule-config.c

* Several remote.c helper functions are refactored to take repository
  pointers, enabling use with a submodule repository pointer.

Next, the submodule--helper.c logic replaces the repo_get_default_remote()
function with a repo_default_remote() function in remote.c, which is based
on the more robust configuration reading logic. This helper uses similar
logic but also allows returning the only valid remote in the case where a
repository has exactly one remote. This way we do not fall back to "origin"
if a user has renamed the remote without adding another.

This improved logic is a good first step, but won't handle cases where
there are multiple remotes, with none of them being named "origin".

For the final improvement, notice that the parent project already stores
the URL for the submodule in its .git/config or .gitmodules file. This URL
is what we use to set the remote in the first place when cloning.

Add a repo_remote_from_url() helper which will iterate through the remotes
and find the first remote with that URL. Use this in
get_default_remote_submodule() to first try and find a remote by its URL.
If unsuccessful, we still keep the original fallback logic, in the off
chance that the user has changed the URL from within the submodule.

This method is more robust as it is less likely that the user has manually
changed the submodule URL within the submodule but not also within the
.git/config.

With this change, all commands which need the submodule remote will first
look up by URL before trying to use the fallback logic, and should now be
able to find a suitable remote regardless of now they are renamed.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
Changes in v2:
- Remove repo_get_default_remote() entirely. The extra checks it does are
  really only necessary if you're doing manual configuration lookup. This
  avoids the confusion of similarly named functions and is less code.
- Fix leaks in branch_release() and remote_clear().
- Add a forward declaration of struct repository.
- Verified tests pass with leak sanitizer now.
- Link to v1: https://lore.kernel.org/r/20250610-jk-submodule-helper-use-url-v1-0-6d14c1504e91@gmail.com

---
Jacob Keller (6):
      remote: fix tear down of struct branch and struct remote
      dir: move starts_with_dot(_dot)_slash to dir.h
      remote: remove the_repository from some functions
      submodule--helper: improve logic for fallback remote name
      submodule: move get_default_remote_submodule()
      submodule: look up remotes by URL first

 dir.h                       |  23 +++++++++
 remote.h                    |   6 ++-
 builtin/submodule--helper.c | 106 ++++++++++++++++------------------------
 remote.c                    | 114 +++++++++++++++++++++++++++++---------------
 submodule-config.c          |  12 -----
 t/t7406-submodule-update.sh |  61 ++++++++++++++++++++++++
 6 files changed, 206 insertions(+), 116 deletions(-)
---
base-commit: 16bd9f20a403117f2e0d9bcda6c6e621d3763e77
change-id: 20250610-jk-submodule-helper-use-url-e55d3c379faf

Best regards,
-- 
Jacob Keller <jacob.keller@gmail.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
@ 2025-06-17 21:30 ` Jacob Keller
  2025-06-17 22:25   ` Junio C Hamano
  2025-06-17 21:30 ` [PATCH v2 2/6] dir: move starts_with_dot(_dot)_slash to dir.h Jacob Keller
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 21:30 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

From: Jacob Keller <jacob.keller@gmail.com>

The branch_release() and remote_clear() functions fail to completely
release all of the memory they point to. This results in leaked memory
if you read the configuration for an arbitrary repository and then
release that repository.

This should be caught by the leak sanitizer. However, for callers which
use ``the_repository``, the values never go out of scope and the
sanitizer won't complain.

A future change is going to add a caller of read_config() for a
submodule repository structure. Doing so reveals one immediate issue due
to a bad NULL pointer access, as well as the mentioned leaks.

 * The branch->merge array is accessed without checking if its NULL.
   Since this array is only setup by calling set_merge, it may in fact
   not be initialized even though merge_nr is non-zero.

 * The remote->push and remote->fetch refspecs are never cleared.

 * The branch->merge_name array is never cleared.

 * The individual elements of branch->merge are not released.

Add a check against branch->merge before accessing it and calling
refspec_item_clear. Update remote_clear() with calls to refspec_clear()
for both the push and fetch refspecs. Add a release of the merge_name
items as well as a final release of the merge_name array.

Freeing merge_name elements results in a warning because we discard the
const qualifier on the parameter name. These values come from a call to
add_merge() in handle_config(), which always copies the names with
xstrdup. This makes ownership of the memory difficult to track in the
code.

Move the call to xstrdup inside add_merge() so that its clear that the
memory is duplicated here and must be released when the merge_name array
is released. Drop the const qualifier on the branch structure to allow
calling free without an explicit cast.

These changes make it safe to call read_config() on a submodule
repository without crashing or leaking any of the memory when the
submodule repository is released.

There is still some confusion with the difference between branch->merge
and branch->merge_name, and the confusion of using branch->merge_nr for
both. That could probably use some future cleanup.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.h |  2 +-
 remote.c | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/remote.h b/remote.h
index 7e4943ae3a70ecefa3332d211084762ca30b59b6..0fca14348fd3d0734c599b13278afef2e00107b9 100644
--- a/remote.h
+++ b/remote.h
@@ -316,7 +316,7 @@ struct branch {
 	char *pushremote_name;
 
 	/* An array of the "merge" lines in the configuration. */
-	const char **merge_name;
+	char **merge_name;
 
 	/**
 	 * An array of the struct refspecs used for the merge lines. That is,
diff --git a/remote.c b/remote.c
index 4099183cacdc8a607a8b5eaec86e456b2ef46b48..538d0d24c832ffeb1cc14684eb166c62c42d719d 100644
--- a/remote.c
+++ b/remote.c
@@ -165,6 +165,9 @@ static void remote_clear(struct remote *remote)
 	strvec_clear(&remote->url);
 	strvec_clear(&remote->pushurl);
 
+	refspec_clear(&remote->push);
+	refspec_clear(&remote->fetch);
+
 	free((char *)remote->receivepack);
 	free((char *)remote->uploadpack);
 	FREE_AND_NULL(remote->http_proxy);
@@ -176,7 +179,7 @@ static void add_merge(struct branch *branch, const char *name)
 {
 	ALLOC_GROW(branch->merge_name, branch->merge_nr + 1,
 		   branch->merge_alloc);
-	branch->merge_name[branch->merge_nr++] = name;
+	branch->merge_name[branch->merge_nr++] = xstrdup(name);
 }
 
 struct branches_hash_key {
@@ -253,9 +256,16 @@ static void branch_release(struct branch *branch)
 	free((char *)branch->refname);
 	free(branch->remote_name);
 	free(branch->pushremote_name);
-	for (int i = 0; i < branch->merge_nr; i++)
-		refspec_item_clear(branch->merge[i]);
+	for (int i = 0; i < branch->merge_nr; i++) {
+		if (branch->merge) {
+			refspec_item_clear(branch->merge[i]);
+			free(branch->merge[i]);
+		}
+		if (branch->merge_name)
+			free(branch->merge_name[i]);
+	}
 	free(branch->merge);
+	free(branch->merge_name);
 }
 
 static struct rewrite *make_rewrite(struct rewrites *r,
@@ -429,7 +439,7 @@ static int handle_config(const char *key, const char *value,
 		} else if (!strcmp(subkey, "merge")) {
 			if (!value)
 				return config_error_nonbool(key);
-			add_merge(branch, xstrdup(value));
+			add_merge(branch, value);
 		}
 		return 0;
 	}

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/6] dir: move starts_with_dot(_dot)_slash to dir.h
  2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
  2025-06-17 21:30 ` [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote Jacob Keller
@ 2025-06-17 21:30 ` Jacob Keller
  2025-06-17 21:30 ` [PATCH v2 3/6] remote: remove the_repository from some functions Jacob Keller
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 21:30 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

From: Jacob Keller <jacob.keller@gmail.com>

Both submodule--helper.c and submodule-config.c have an implementation
of starts_with_dot_slash and starts_with_dot_dot_slash. The dir.h header
has starts_with_dot(_dot)_slash_native, which sets PATH_MATCH_NATIVE.

Move the helpers to dir.h as static inlines. I thought about renaming
them to postfix with _platform but that felt too long and ugly. On the
other hand it might be slightly confusing with _native.

This simplifies a submodule refactor which wants to use the helpers
earlier in the submodule--helper.c file.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 dir.h                       | 23 +++++++++++++++++++++++
 builtin/submodule--helper.c | 12 ------------
 submodule-config.c          | 12 ------------
 3 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/dir.h b/dir.h
index d7e71aa8daa7d833e4c05e6875b997bc321c6070..fc9be7b427a134e46bcd66c8df42375db47727fc 100644
--- a/dir.h
+++ b/dir.h
@@ -676,4 +676,27 @@ static inline int starts_with_dot_dot_slash_native(const char *const path)
 	return path_match_flags(path, what | PATH_MATCH_NATIVE);
 }
 
+/**
+ * starts_with_dot_slash: convenience wrapper for
+ * patch_match_flags() with PATH_MATCH_STARTS_WITH_DOT_SLASH and
+ * PATH_MATCH_XPLATFORM.
+ */
+static inline int starts_with_dot_slash(const char *const path)
+{
+	const enum path_match_flags what = PATH_MATCH_STARTS_WITH_DOT_SLASH;
+
+	return path_match_flags(path, what | PATH_MATCH_XPLATFORM);
+}
+
+/**
+ * starts_with_dot_dot_slash: convenience wrapper for
+ * patch_match_flags() with PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH and
+ * PATH_MATCH_XPLATFORM.
+ */
+static inline int starts_with_dot_dot_slash(const char *const path)
+{
+	const enum path_match_flags what = PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH;
+
+	return path_match_flags(path, what | PATH_MATCH_XPLATFORM);
+}
 #endif
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 53da2116ddf576bc565b29f043e8b703b8b1563b..9e8cdfe1b2a8c2985d9c1b8ad6f1b0d1f9401714 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -438,18 +438,6 @@ static int module_foreach(int argc, const char **argv, const char *prefix,
 	return ret;
 }
 
-static int starts_with_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
-static int starts_with_dot_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
 struct init_cb {
 	const char *prefix;
 	const char *super_prefix;
diff --git a/submodule-config.c b/submodule-config.c
index 8630e27947d3943e1980eb7a53bd41a546842503..d64438b2a18ed2123cc5e18f739539209032d3e9 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -235,18 +235,6 @@ int check_submodule_name(const char *name)
 	return 0;
 }
 
-static int starts_with_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
-static int starts_with_dot_dot_slash(const char *const path)
-{
-	return path_match_flags(path, PATH_MATCH_STARTS_WITH_DOT_DOT_SLASH |
-				PATH_MATCH_XPLATFORM);
-}
-
 static int submodule_url_is_relative(const char *url)
 {
 	return starts_with_dot_slash(url) || starts_with_dot_dot_slash(url);

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/6] remote: remove the_repository from some functions
  2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
  2025-06-17 21:30 ` [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote Jacob Keller
  2025-06-17 21:30 ` [PATCH v2 2/6] dir: move starts_with_dot(_dot)_slash to dir.h Jacob Keller
@ 2025-06-17 21:30 ` Jacob Keller
  2025-06-17 22:46   ` Junio C Hamano
  2025-06-17 21:30 ` [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name Jacob Keller
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 21:30 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

From: Jacob Keller <jacob.keller@gmail.com>

The remotes_remote_get_1 (and its caller, remotes_remote_get, have an
implicit dependency on the_repository due to calling
read_branches_file() and read_remotes_file(), both of which use
the_repository. The branch_get() function calls set_merge() which has an
implicit dependency on the_repository as well.

Because of this use of the_repository, the helper functions cannot be
used in code paths which operate on other repositories. A future
refactor of the submodule--helper will want to make use of some of these
functions.

Refactor to break the dependency by passing struct repository *repo
instead of struct remote_state *remote_state in a few places.

The public callers and many other helper functions still depend on
the_repository. A repo-aware function will be exposed in a following
change for git submodule--helper.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.c | 58 ++++++++++++++++++++++++++++------------------------------
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/remote.c b/remote.c
index 538d0d24c832ffeb1cc14684eb166c62c42d719d..b3a9881a6eacf90bee71d6760858b37d68263502 100644
--- a/remote.c
+++ b/remote.c
@@ -327,11 +327,10 @@ static void warn_about_deprecated_remote_type(const char *type,
 		type, remote->name, remote->name, remote->name);
 }
 
-static void read_remotes_file(struct remote_state *remote_state,
-			      struct remote *remote)
+static void read_remotes_file(struct repository *repo, struct remote *remote)
 {
 	struct strbuf buf = STRBUF_INIT;
-	FILE *f = fopen_or_warn(repo_git_path_append(the_repository, &buf,
+	FILE *f = fopen_or_warn(repo_git_path_append(repo, &buf,
 						     "remotes/%s", remote->name), "r");
 
 	if (!f)
@@ -347,7 +346,7 @@ static void read_remotes_file(struct remote_state *remote_state,
 		strbuf_rtrim(&buf);
 
 		if (skip_prefix(buf.buf, "URL:", &v))
-			add_url_alias(remote_state, remote,
+			add_url_alias(repo->remote_state, remote,
 				      skip_spaces(v));
 		else if (skip_prefix(buf.buf, "Push:", &v))
 			refspec_append(&remote->push, skip_spaces(v));
@@ -360,12 +359,11 @@ static void read_remotes_file(struct remote_state *remote_state,
 	strbuf_release(&buf);
 }
 
-static void read_branches_file(struct remote_state *remote_state,
-			       struct remote *remote)
+static void read_branches_file(struct repository *repo, struct remote *remote)
 {
 	char *frag, *to_free = NULL;
 	struct strbuf buf = STRBUF_INIT;
-	FILE *f = fopen_or_warn(repo_git_path_append(the_repository, &buf,
+	FILE *f = fopen_or_warn(repo_git_path_append(repo, &buf,
 						     "branches/%s", remote->name), "r");
 
 	if (!f)
@@ -392,9 +390,9 @@ static void read_branches_file(struct remote_state *remote_state,
 	if (frag)
 		*(frag++) = '\0';
 	else
-		frag = to_free = repo_default_branch_name(the_repository, 0);
+		frag = to_free = repo_default_branch_name(repo, 0);
 
-	add_url_alias(remote_state, remote, buf.buf);
+	add_url_alias(repo->remote_state, remote, buf.buf);
 	refspec_appendf(&remote->fetch, "refs/heads/%s:refs/heads/%s",
 			frag, remote->name);
 
@@ -691,7 +689,7 @@ const char *pushremote_for_branch(struct branch *branch, int *explicit)
 					     branch, explicit);
 }
 
-static struct remote *remotes_remote_get(struct remote_state *remote_state,
+static struct remote *remotes_remote_get(struct repository *repo,
 					 const char *name);
 
 char *remote_ref_for_branch(struct branch *branch, int for_push)
@@ -710,7 +708,7 @@ char *remote_ref_for_branch(struct branch *branch, int for_push)
 					the_repository->remote_state, branch,
 					NULL);
 			struct remote *remote = remotes_remote_get(
-				the_repository->remote_state, remote_name);
+				the_repository, remote_name);
 
 			if (remote && remote->push.nr &&
 			    (dst = apply_refspecs(&remote->push,
@@ -767,10 +765,11 @@ static void validate_remote_url(struct remote *remote)
 }
 
 static struct remote *
-remotes_remote_get_1(struct remote_state *remote_state, const char *name,
+remotes_remote_get_1(struct repository *repo, const char *name,
 		     const char *(*get_default)(struct remote_state *,
 						struct branch *, int *))
 {
+	struct remote_state *remote_state = repo->remote_state;
 	struct remote *ret;
 	int name_given = 0;
 
@@ -784,9 +783,9 @@ remotes_remote_get_1(struct remote_state *remote_state, const char *name,
 #ifndef WITH_BREAKING_CHANGES
 	if (valid_remote_nick(name) && have_git_dir()) {
 		if (!valid_remote(ret))
-			read_remotes_file(remote_state, ret);
+			read_remotes_file(repo, ret);
 		if (!valid_remote(ret))
-			read_branches_file(remote_state, ret);
+			read_branches_file(repo, ret);
 	}
 #endif /* WITH_BREAKING_CHANGES */
 	if (name_given && !valid_remote(ret))
@@ -800,35 +799,33 @@ remotes_remote_get_1(struct remote_state *remote_state, const char *name,
 }
 
 static inline struct remote *
-remotes_remote_get(struct remote_state *remote_state, const char *name)
+remotes_remote_get(struct repository *repo, const char *name)
 {
-	return remotes_remote_get_1(remote_state, name,
-				    remotes_remote_for_branch);
+	return remotes_remote_get_1(repo, name, remotes_remote_for_branch);
 }
 
 struct remote *remote_get(const char *name)
 {
 	read_config(the_repository, 0);
-	return remotes_remote_get(the_repository->remote_state, name);
+	return remotes_remote_get(the_repository, name);
 }
 
 struct remote *remote_get_early(const char *name)
 {
 	read_config(the_repository, 1);
-	return remotes_remote_get(the_repository->remote_state, name);
+	return remotes_remote_get(the_repository, name);
 }
 
 static inline struct remote *
-remotes_pushremote_get(struct remote_state *remote_state, const char *name)
+remotes_pushremote_get(struct repository *repo, const char *name)
 {
-	return remotes_remote_get_1(remote_state, name,
-				    remotes_pushremote_for_branch);
+	return remotes_remote_get_1(repo, name, remotes_pushremote_for_branch);
 }
 
 struct remote *pushremote_get(const char *name)
 {
 	read_config(the_repository, 0);
-	return remotes_pushremote_get(the_repository->remote_state, name);
+	return remotes_pushremote_get(the_repository, name);
 }
 
 int remote_is_configured(struct remote *remote, int in_repo)
@@ -1732,7 +1729,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 	}
 }
 
-static void set_merge(struct remote_state *remote_state, struct branch *ret)
+static void set_merge(struct repository *repo, struct branch *ret)
 {
 	struct remote *remote;
 	char *ref;
@@ -1752,7 +1749,7 @@ static void set_merge(struct remote_state *remote_state, struct branch *ret)
 		return;
 	}
 
-	remote = remotes_remote_get(remote_state, ret->remote_name);
+	remote = remotes_remote_get(repo, ret->remote_name);
 
 	CALLOC_ARRAY(ret->merge, ret->merge_nr);
 	for (i = 0; i < ret->merge_nr; i++) {
@@ -1761,7 +1758,7 @@ static void set_merge(struct remote_state *remote_state, struct branch *ret)
 		if (!remote_find_tracking(remote, ret->merge[i]) ||
 		    strcmp(ret->remote_name, "."))
 			continue;
-		if (repo_dwim_ref(the_repository, ret->merge_name[i],
+		if (repo_dwim_ref(repo, ret->merge_name[i],
 				  strlen(ret->merge_name[i]), &oid, &ref,
 				  0) == 1)
 			ret->merge[i]->dst = ref;
@@ -1780,7 +1777,7 @@ struct branch *branch_get(const char *name)
 	else
 		ret = make_branch(the_repository->remote_state, name,
 				  strlen(name));
-	set_merge(the_repository->remote_state, ret);
+	set_merge(the_repository, ret);
 	return ret;
 }
 
@@ -1851,13 +1848,14 @@ static const char *tracking_for_push_dest(struct remote *remote,
 	return ret;
 }
 
-static const char *branch_get_push_1(struct remote_state *remote_state,
+static const char *branch_get_push_1(struct repository *repo,
 				     struct branch *branch, struct strbuf *err)
 {
+	struct remote_state *remote_state = repo->remote_state;
 	struct remote *remote;
 
 	remote = remotes_remote_get(
-		remote_state,
+		repo,
 		remotes_pushremote_for_branch(remote_state, branch, NULL));
 	if (!remote)
 		return error_buf(err,
@@ -1924,7 +1922,7 @@ const char *branch_get_push(struct branch *branch, struct strbuf *err)
 
 	if (!branch->push_tracking_ref)
 		branch->push_tracking_ref = branch_get_push_1(
-			the_repository->remote_state, branch, err);
+			the_repository, branch, err);
 	return branch->push_tracking_ref;
 }
 

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name
  2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
                   ` (2 preceding siblings ...)
  2025-06-17 21:30 ` [PATCH v2 3/6] remote: remove the_repository from some functions Jacob Keller
@ 2025-06-17 21:30 ` Jacob Keller
  2025-06-17 23:12   ` Junio C Hamano
  2025-06-17 21:30 ` [PATCH v2 5/6] submodule: move get_default_remote_submodule() Jacob Keller
  2025-06-17 21:30 ` [PATCH v2 6/6] submodule: look up remotes by URL first Jacob Keller
  5 siblings, 1 reply; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 21:30 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

From: Jacob Keller <jacob.keller@gmail.com>

The repo_get_default_remote() function in submodule--helper currently
tries to figure out the proper remote name to use for a submodule based
on a few factors.

First, it tries to find the remote for the currently checked out branch.
This works if the submodule is configured to checkout to a branch
instead of a detached HEAD state.

In the detached HEAD state, the code calls back to using "origin", on
the assumption that this is the default remote name. Some users may
change this, such as by setting clone.defaultRemoteName, or by changing
the remote name manually within the submodule repository.

As a first step to improving this situation, refactor to reuse the logic
from remotes_remote_for_branch(). This function uses the remote from the
branch if it has one. If it doesn't then it checks to see if there is
exactly one remote. It uses this remote first before attempting to fall
back to "origin".

To allow using this helper function, introduce a repo_default_remote()
helper to remote.c which takes a repository structure. This helper will
load the remote configuration and get the "HEAD" branch. Then it will
call remotes_remote_for_branch to find the default remote.

Replace calls of repo_get_default_remote() with the calls to this new
function. To maintain consistency with the existing callers, continue
copying the returned string with xstrdup.

This isn't a perfect solution for users who change remote names, but it
should help in cases where the remote name is changed but users haven't
added any additional remotes.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.h                    |  3 +++
 builtin/submodule--helper.c | 46 +++++----------------------------------------
 remote.c                    | 25 +++++++++++++++++++-----
 t/t7406-submodule-update.sh | 29 ++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/remote.h b/remote.h
index 0fca14348fd3d0734c599b13278afef2e00107b9..672e8e893b8243c09145c340f412aa0eae8d562f 100644
--- a/remote.h
+++ b/remote.h
@@ -9,6 +9,7 @@
 
 struct option;
 struct transport_ls_refs_options;
+struct repository;
 
 /**
  * The API gives access to the configuration related to remotes. It handles
@@ -338,6 +339,8 @@ const char *remote_for_branch(struct branch *branch, int *explicit);
 const char *pushremote_for_branch(struct branch *branch, int *explicit);
 char *remote_ref_for_branch(struct branch *branch, int for_push);
 
+const char *repo_default_remote(struct repository *repo);
+
 /* returns true if the given branch has merge configuration given. */
 int branch_has_merge_config(struct branch *branch);
 
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 9e8cdfe1b2a8c2985d9c1b8ad6f1b0d1f9401714..4aa237033a526fca29cce2926419462179d40ee3 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -41,61 +41,25 @@
 typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
 				  void *cb_data);
 
-static int repo_get_default_remote(struct repository *repo, char **default_remote)
-{
-	char *dest = NULL;
-	struct strbuf sb = STRBUF_INIT;
-	struct ref_store *store = get_main_ref_store(repo);
-	const char *refname = refs_resolve_ref_unsafe(store, "HEAD", 0, NULL,
-						      NULL);
-
-	if (!refname)
-		return die_message(_("No such ref: %s"), "HEAD");
-
-	/* detached HEAD */
-	if (!strcmp(refname, "HEAD")) {
-		*default_remote = xstrdup("origin");
-		return 0;
-	}
-
-	if (!skip_prefix(refname, "refs/heads/", &refname))
-		return die_message(_("Expecting a full ref name, got %s"),
-				   refname);
-
-	strbuf_addf(&sb, "branch.%s.remote", refname);
-	if (repo_config_get_string(repo, sb.buf, &dest))
-		*default_remote = xstrdup("origin");
-	else
-		*default_remote = dest;
-
-	strbuf_release(&sb);
-	return 0;
-}
-
 static int get_default_remote_submodule(const char *module_path, char **default_remote)
 {
 	struct repository subrepo;
-	int ret;
 
 	if (repo_submodule_init(&subrepo, the_repository, module_path,
 				null_oid(the_hash_algo)) < 0)
 		return die_message(_("could not get a repository handle for submodule '%s'"),
 				   module_path);
-	ret = repo_get_default_remote(&subrepo, default_remote);
+
+	*default_remote = xstrdup(repo_default_remote(&subrepo));
+
 	repo_clear(&subrepo);
 
-	return ret;
+	return 0;
 }
 
 static char *get_default_remote(void)
 {
-	char *default_remote;
-	int code = repo_get_default_remote(the_repository, &default_remote);
-
-	if (code)
-		exit(code);
-
-	return default_remote;
+	return xstrdup(repo_default_remote(the_repository));
 }
 
 static char *resolve_relative_url(const char *rel_url, const char *up_path, int quiet)
diff --git a/remote.c b/remote.c
index b3a9881a6eacf90bee71d6760858b37d68263502..94b31f4c23057a247a968fc0ebe2e5170e99614d 100644
--- a/remote.c
+++ b/remote.c
@@ -1767,20 +1767,35 @@ static void set_merge(struct repository *repo, struct branch *ret)
 	}
 }
 
-struct branch *branch_get(const char *name)
+static struct branch *repo_branch_get(struct repository *repo, const char *name)
 {
 	struct branch *ret;
 
-	read_config(the_repository, 0);
+	read_config(repo, 0);
 	if (!name || !*name || !strcmp(name, "HEAD"))
-		ret = the_repository->remote_state->current_branch;
+		ret = repo->remote_state->current_branch;
 	else
-		ret = make_branch(the_repository->remote_state, name,
+		ret = make_branch(repo->remote_state, name,
 				  strlen(name));
-	set_merge(the_repository, ret);
+	set_merge(repo, ret);
 	return ret;
 }
 
+struct branch *branch_get(const char *name)
+{
+	return repo_branch_get(the_repository, name);
+}
+
+const char *repo_default_remote(struct repository *repo)
+{
+	struct branch *branch;
+
+	read_config(repo, 0);
+	branch = repo_branch_get(repo, "HEAD");
+
+	return remotes_remote_for_branch(repo->remote_state, branch, NULL);
+}
+
 int branch_has_merge_config(struct branch *branch)
 {
 	return branch && !!branch->merge;
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index c562bad042ab2d4d0f82cb8b57a1eadbe24044d1..748b529745a5121f121768bb4e0cbc11bc833ea4 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -1134,6 +1134,35 @@ test_expect_success 'setup clean recursive superproject' '
 	git clone --recurse-submodules top top-clean
 '
 
+test_expect_success 'submodule update with renamed remote' '
+	test_when_finished "rm -fr top-cloned" &&
+	cp -r top-clean top-cloned &&
+
+	# Create a commit in each repo, starting with bottom
+	test_commit -C bottom rename_commit &&
+	# Create middle commit
+	git -C middle/bottom fetch &&
+	git -C middle/bottom checkout -f FETCH_HEAD &&
+	git -C middle add bottom &&
+	git -C middle commit -m "rename_commit" &&
+	# Create top commit
+	git -C top/middle fetch &&
+	git -C top/middle checkout -f FETCH_HEAD &&
+	git -C top add middle &&
+	git -C top commit -m "rename_commit" &&
+
+	# rename the submodule remote
+	git -C top-cloned/middle remote rename origin upstream &&
+
+	# Make the update of "middle" a no-op, otherwise we error out
+	# because of its unmerged state
+	test_config -C top-cloned submodule.middle.update !true &&
+	git -C top-cloned submodule update --recursive 2>actual.err &&
+	cat >expect.err <<-\EOF &&
+	EOF
+	test_cmp expect.err actual.err
+'
+
 test_expect_success 'submodule update should skip unmerged submodules' '
 	test_when_finished "rm -fr top-cloned" &&
 	cp -r top-clean top-cloned &&

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 5/6] submodule: move get_default_remote_submodule()
  2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
                   ` (3 preceding siblings ...)
  2025-06-17 21:30 ` [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name Jacob Keller
@ 2025-06-17 21:30 ` Jacob Keller
  2025-06-17 21:30 ` [PATCH v2 6/6] submodule: look up remotes by URL first Jacob Keller
  5 siblings, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 21:30 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

From: Jacob Keller <jacob.keller@gmail.com>

A future refactor got get_default_remote_submodule() is going to depend on
resolve_relative_url(). That function depends on get_default_remote().

Move get_default_remote_submodule() after resolve_relative_url() first
to make the additional functionality easier to review.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 builtin/submodule--helper.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 4aa237033a526fca29cce2926419462179d40ee3..1aa87435c2000e94f43da94c5ef88a307f6f3f4a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -41,22 +41,6 @@
 typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
 				  void *cb_data);
 
-static int get_default_remote_submodule(const char *module_path, char **default_remote)
-{
-	struct repository subrepo;
-
-	if (repo_submodule_init(&subrepo, the_repository, module_path,
-				null_oid(the_hash_algo)) < 0)
-		return die_message(_("could not get a repository handle for submodule '%s'"),
-				   module_path);
-
-	*default_remote = xstrdup(repo_default_remote(&subrepo));
-
-	repo_clear(&subrepo);
-
-	return 0;
-}
-
 static char *get_default_remote(void)
 {
 	return xstrdup(repo_default_remote(the_repository));
@@ -86,6 +70,22 @@ static char *resolve_relative_url(const char *rel_url, const char *up_path, int
 	return resolved_url;
 }
 
+static int get_default_remote_submodule(const char *module_path, char **default_remote)
+{
+	struct repository subrepo;
+
+	if (repo_submodule_init(&subrepo, the_repository, module_path,
+				null_oid(the_hash_algo)) < 0)
+		return die_message(_("could not get a repository handle for submodule '%s'"),
+				   module_path);
+
+	*default_remote = xstrdup(repo_default_remote(&subrepo));
+
+	repo_clear(&subrepo);
+
+	return 0;
+}
+
 /* the result should be freed by the caller. */
 static char *get_submodule_displaypath(const char *path, const char *prefix,
 				       const char *super_prefix)

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 6/6] submodule: look up remotes by URL first
  2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
                   ` (4 preceding siblings ...)
  2025-06-17 21:30 ` [PATCH v2 5/6] submodule: move get_default_remote_submodule() Jacob Keller
@ 2025-06-17 21:30 ` Jacob Keller
  5 siblings, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 21:30 UTC (permalink / raw)
  To: git; +Cc: Jacob Keller, Lidong Yan, Patrick Steinhardt, Junio C Hamano

From: Jacob Keller <jacob.keller@gmail.com>

The get_default_remote_submodule() function performs a lookup to find
the appropriate remote to use within a submodule. The function first
checks to see if it can find the remote for the current branch. If this
fails, it then checks to see if there is exactly one remote. It will use
this, before finally falling back to "origin" as the default.

If a user happens to rename their default remote from origin, either
manually or by setting something like clone.defaultRemoteName, this
fallback will not work.

In such cases, the submodule logic will try to use a non-existent
remote. This usually manifests as a failure to trigger the submodule
update.

The parent project already knows and stores the submodule URL in either
.gitmodules or its .git/config.

Add a new repo_remote_from_url() helper which will iterate over all the
remotes in a repository and return the first remote which has a matching
URL.

Refactor get_default_remote_submodule to find the submodule and get its
URL. If a valid URL exists, first try to obtain a remote using the new
repo_remote_from_url(). Fall back to the repo_default_remote()
otherwise.

The fallback logic is kept in case for some reason the user has manually
changed the URL within the submodule. Additionally, we still try to use
a remote rather than directly passing the URL in the
fetch_in_submodule() logic. This ensures that an update will properly
update the remote refs within the submodule as expected, rather than
just fetching into FETCH_HEAD.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 remote.h                    |  1 +
 builtin/submodule--helper.c | 26 +++++++++++++++++++++++++-
 remote.c                    | 15 +++++++++++++++
 t/t7406-submodule-update.sh | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/remote.h b/remote.h
index 672e8e893b8243c09145c340f412aa0eae8d562f..59033d5d82dd81d79174f265c1d501ee5653a6ec 100644
--- a/remote.h
+++ b/remote.h
@@ -340,6 +340,7 @@ const char *pushremote_for_branch(struct branch *branch, int *explicit);
 char *remote_ref_for_branch(struct branch *branch, int for_push);
 
 const char *repo_default_remote(struct repository *repo);
+const char *repo_remote_from_url(struct repository *repo, const char *url);
 
 /* returns true if the given branch has merge configuration given. */
 int branch_has_merge_config(struct branch *branch);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1aa87435c2000e94f43da94c5ef88a307f6f3f4a..84a96d300d9489706fb16280f03aea02f87f1657 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -72,16 +72,40 @@ static char *resolve_relative_url(const char *rel_url, const char *up_path, int
 
 static int get_default_remote_submodule(const char *module_path, char **default_remote)
 {
+	const struct submodule *sub;
 	struct repository subrepo;
+	const char *remote_name = NULL;
+	char *url = NULL;
+
+	sub = submodule_from_path(the_repository, null_oid(the_hash_algo), module_path);
+	if (sub && sub->url) {
+		url = xstrdup(sub->url);
+
+		/* Possibly a url relative to parent */
+		if (starts_with_dot_dot_slash(url) ||
+		    starts_with_dot_slash(url)) {
+			char *oldurl = url;
+
+			url = resolve_relative_url(oldurl, NULL, 1);
+			free(oldurl);
+		}
+	}
 
 	if (repo_submodule_init(&subrepo, the_repository, module_path,
 				null_oid(the_hash_algo)) < 0)
 		return die_message(_("could not get a repository handle for submodule '%s'"),
 				   module_path);
 
-	*default_remote = xstrdup(repo_default_remote(&subrepo));
+	/* Look up by URL first */
+	if (url)
+		remote_name = repo_remote_from_url(&subrepo, url);
+	if (!remote_name)
+		remote_name = repo_default_remote(&subrepo);
+
+	*default_remote = xstrdup(remote_name);
 
 	repo_clear(&subrepo);
+	free(url);
 
 	return 0;
 }
diff --git a/remote.c b/remote.c
index 94b31f4c23057a247a968fc0ebe2e5170e99614d..706c25af0c27aaa9090197e93e87c2a612497172 100644
--- a/remote.c
+++ b/remote.c
@@ -1796,6 +1796,21 @@ const char *repo_default_remote(struct repository *repo)
 	return remotes_remote_for_branch(repo->remote_state, branch, NULL);
 }
 
+const char *repo_remote_from_url(struct repository *repo, const char *url)
+{
+	read_config(repo, 0);
+
+	for (int i = 0; i < repo->remote_state->remotes_nr; i++) {
+		struct remote *remote = repo->remote_state->remotes[i];
+		if (!remote)
+			continue;
+
+		if (remote_has_url(remote, url))
+			return remote->name;
+	}
+	return NULL;
+}
+
 int branch_has_merge_config(struct branch *branch)
 {
 	return branch && !!branch->merge;
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 748b529745a5121f121768bb4e0cbc11bc833ea4..c09047b5f441a73a02a9fc4197e9a0ea8f39b529 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -1134,6 +1134,38 @@ test_expect_success 'setup clean recursive superproject' '
 	git clone --recurse-submodules top top-clean
 '
 
+test_expect_success 'submodule update with multiple remotes' '
+	test_when_finished "rm -fr top-cloned" &&
+	cp -r top-clean top-cloned &&
+
+	# Create a commit in each repo, starting with bottom
+	test_commit -C bottom multiple_remote_commit &&
+	# Create middle commit
+	git -C middle/bottom fetch &&
+	git -C middle/bottom checkout -f FETCH_HEAD &&
+	git -C middle add bottom &&
+	git -C middle commit -m "multiple_remote_commit" &&
+	# Create top commit
+	git -C top/middle fetch &&
+	git -C top/middle checkout -f FETCH_HEAD &&
+	git -C top add middle &&
+	git -C top commit -m "multiple_remote_commit" &&
+
+	# rename the submodule remote
+	git -C top-cloned/middle remote rename origin upstream &&
+
+	# Add another remote
+	git -C top-cloned/middle remote add other bogus &&
+
+	# Make the update of "middle" a no-op, otherwise we error out
+	# because of its unmerged state
+	test_config -C top-cloned submodule.middle.update !true &&
+	git -C top-cloned submodule update --recursive 2>actual.err &&
+	cat >expect.err <<-\EOF &&
+	EOF
+	test_cmp expect.err actual.err
+'
+
 test_expect_success 'submodule update with renamed remote' '
 	test_when_finished "rm -fr top-cloned" &&
 	cp -r top-clean top-cloned &&

-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-17 21:30 ` [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote Jacob Keller
@ 2025-06-17 22:25   ` Junio C Hamano
  2025-06-17 23:25     ` Jacob Keller
  2025-06-17 23:45     ` Jacob Keller
  0 siblings, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2025-06-17 22:25 UTC (permalink / raw)
  To: Jacob Keller; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt

Jacob Keller <jacob.e.keller@intel.com> writes:

> From: Jacob Keller <jacob.keller@gmail.com>
>
> The branch_release() and remote_clear() functions fail to completely
> release all of the memory they point to. This results in leaked memory
> if you read the configuration for an arbitrary repository and then
> release that repository.
>
> This should be caught by the leak sanitizer. However, for callers which
> use ``the_repository``, the values never go out of scope and the
> sanitizer won't complain.
>
> A future change is going to add a caller of read_config() for a
> submodule repository structure. Doing so reveals one immediate issue due
> to a bad NULL pointer access, as well as the mentioned leaks.
>
>  * The branch->merge array is accessed without checking if its NULL.
>    Since this array is only setup by calling set_merge, it may in fact
>    not be initialized even though merge_nr is non-zero.
>
>  * The remote->push and remote->fetch refspecs are never cleared.
>
>  * The branch->merge_name array is never cleared.
>
>  * The individual elements of branch->merge are not released.
>
> Add a check against branch->merge before accessing it and calling
> refspec_item_clear. Update remote_clear() with calls to refspec_clear()
> for both the push and fetch refspecs. Add a release of the merge_name
> items as well as a final release of the merge_name array.
>
> Freeing merge_name elements results in a warning because we discard the
> const qualifier on the parameter name. These values come from a call to
> add_merge() in handle_config(), which always copies the names with
> xstrdup. This makes ownership of the memory difficult to track in the
> code.
>
> Move the call to xstrdup inside add_merge() so that its clear that the
> memory is duplicated here and must be released when the merge_name array
> is released. Drop the const qualifier on the branch structure to allow
> calling free without an explicit cast.
>
> These changes make it safe to call read_config() on a submodule
> repository without crashing or leaking any of the memory when the
> submodule repository is released.
>
> There is still some confusion with the difference between branch->merge
> and branch->merge_name, and the confusion of using branch->merge_nr for
> both. That could probably use some future cleanup.


Nicely described.  One thing that puzzles me is this part:

> @@ -253,9 +256,16 @@ static void branch_release(struct branch *branch)
>  	free((char *)branch->refname);
>  	free(branch->remote_name);
>  	free(branch->pushremote_name);
> -	for (int i = 0; i < branch->merge_nr; i++)
> -		refspec_item_clear(branch->merge[i]);
> +	for (int i = 0; i < branch->merge_nr; i++) {
> +		if (branch->merge) {
> +			refspec_item_clear(branch->merge[i]);
> +			free(branch->merge[i]);
> +		}
> +		if (branch->merge_name)
> +			free(branch->merge_name[i]);
> +	}
>  	free(branch->merge);
> +	free(branch->merge_name);
>  }

where we iterate over branch->merge[] and branch->merge_name[]
for branch->nr times and each time we check the NULL-ness of these
two pointers.

merge_nr is only incremented inside add_merge() when merge_name[]
array is grown by the same amount.  Do we need to check for the
NULL-ness of branch->merge_name?

Near the beginning of set_merge() we allocate branch->merge[] 
only when branch->merge_nr is non-zero (the assumption seems to be
that we accumulate in the merge_names[] while reading the config,
so as long as branch->merge is initialized properly to NULL, it will
never be NULL if merge_nr is not 0, no?

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/6] remote: remove the_repository from some functions
  2025-06-17 21:30 ` [PATCH v2 3/6] remote: remove the_repository from some functions Jacob Keller
@ 2025-06-17 22:46   ` Junio C Hamano
  2025-06-17 23:19     ` Jacob Keller
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-06-17 22:46 UTC (permalink / raw)
  To: Jacob Keller; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt

Jacob Keller <jacob.e.keller@intel.com> writes:

> From: Jacob Keller <jacob.keller@gmail.com>
>
> The remotes_remote_get_1 (and its caller, remotes_remote_get, have an
> implicit dependency on the_repository due to calling
> read_branches_file() and read_remotes_file(), both of which use
> the_repository. The branch_get() function calls set_merge() which has an
> implicit dependency on the_repository as well.
>
> Because of this use of the_repository, the helper functions cannot be
> used in code paths which operate on other repositories. A future
> refactor of the submodule--helper will want to make use of some of these
> functions.
>
> Refactor to break the dependency by passing struct repository *repo
> instead of struct remote_state *remote_state in a few places.
>
> The public callers and many other helper functions still depend on
> the_repository. A repo-aware function will be exposed in a following
> change for git submodule--helper.
>
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> ---
>  remote.c | 58 ++++++++++++++++++++++++++++------------------------------
>  1 file changed, 28 insertions(+), 30 deletions(-)

As exactly one remote_state instance belongs to each repository ever
since fd3cb050 (remote: move static variables into per-repository
struct, 2021-11-17) defined the former, and remote_state is not
shared across repository, passing the repository instance that owns
a remote_state instance and pick up the .remote_state member out of
it as needed would give us the right remote_state, and gives us
access to the repository instance it owns it.

Makes perfect sense, 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name
  2025-06-17 21:30 ` [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name Jacob Keller
@ 2025-06-17 23:12   ` Junio C Hamano
  2025-06-17 23:21     ` Jacob Keller
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-06-17 23:12 UTC (permalink / raw)
  To: Jacob Keller; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt

Jacob Keller <jacob.e.keller@intel.com> writes:

> From: Jacob Keller <jacob.keller@gmail.com>
>
> The repo_get_default_remote() function in submodule--helper currently
> tries to figure out the proper remote name to use for a submodule based
> on a few factors.
>
> First, it tries to find the remote for the currently checked out branch.
> This works if the submodule is configured to checkout to a branch
> instead of a detached HEAD state.
>
> In the detached HEAD state, the code calls back to using "origin", on

"calls back" -> "falls back".

> the assumption that this is the default remote name. Some users may
> change this, such as by setting clone.defaultRemoteName, or by changing
> the remote name manually within the submodule repository.
>
> As a first step to improving this situation, refactor to reuse the logic
> from remotes_remote_for_branch(). This function uses the remote from the
> branch if it has one. If it doesn't then it checks to see if there is
> exactly one remote. It uses this remote first before attempting to fall
> back to "origin".

Designed wtih good taste.

> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 9e8cdfe1b2a8c2985d9c1b8ad6f1b0d1f9401714..4aa237033a526fca29cce2926419462179d40ee3 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -41,61 +41,25 @@
>  typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
>  				  void *cb_data);
>  
> -static int repo_get_default_remote(struct repository *repo, char **default_remote)
> -{
> -	char *dest = NULL;
> -	struct strbuf sb = STRBUF_INIT;
> -	struct ref_store *store = get_main_ref_store(repo);
> -	const char *refname = refs_resolve_ref_unsafe(store, "HEAD", 0, NULL,
> -						      NULL);
> -
> -	if (!refname)
> -		return die_message(_("No such ref: %s"), "HEAD");
> -
> -	/* detached HEAD */
> -	if (!strcmp(refname, "HEAD")) {
> -		*default_remote = xstrdup("origin");
> -		return 0;
> -	}
> -
> -	if (!skip_prefix(refname, "refs/heads/", &refname))
> -		return die_message(_("Expecting a full ref name, got %s"),
> -				   refname);
> -
> -	strbuf_addf(&sb, "branch.%s.remote", refname);
> -	if (repo_config_get_string(repo, sb.buf, &dest))
> -		*default_remote = xstrdup("origin");
> -	else
> -		*default_remote = dest;
> -
> -	strbuf_release(&sb);
> -	return 0;
> -}

We will lose two callers of this function, so we can safely remove
it.

>  static int get_default_remote_submodule(const char *module_path, char **default_remote)
>  static char *get_default_remote(void)

These callers that used to call the removed helper now call
repo_default_remote() instread.  Good.


> diff --git a/remote.c b/remote.c
> index b3a9881a6eacf90bee71d6760858b37d68263502..94b31f4c23057a247a968fc0ebe2e5170e99614d 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -1767,20 +1767,35 @@ static void set_merge(struct repository *repo, struct branch *ret)
>  	}
>  }
>  
> -struct branch *branch_get(const char *name)
> +static struct branch *repo_branch_get(struct repository *repo, const char *name)
>  {
>  	struct branch *ret;
>  
> -	read_config(the_repository, 0);
> +	read_config(repo, 0);
>  	if (!name || !*name || !strcmp(name, "HEAD"))
> -		ret = the_repository->remote_state->current_branch;
> +		ret = repo->remote_state->current_branch;
>  	else
> -		ret = make_branch(the_repository->remote_state, name,
> +		ret = make_branch(repo->remote_state, name,
>  				  strlen(name));
> -	set_merge(the_repository, ret);
> +	set_merge(repo, ret);
>  	return ret;
>  }
>  
> +struct branch *branch_get(const char *name)
> +{
> +	return repo_branch_get(the_repository, name);
> +}

Nice to see how the dependency to the_repository is lifted for new
callers while retaining the same interface for existing ones.

> +const char *repo_default_remote(struct repository *repo)
> +{
> +	struct branch *branch;
> +
> +	read_config(repo, 0);
> +	branch = repo_branch_get(repo, "HEAD");
> +
> +	return remotes_remote_for_branch(repo->remote_state, branch, NULL);
> +}

OK.  read_config() is a safe no-op if repo has already been
initialized, so this would give us what we want.  Nicely done.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/6] remote: remove the_repository from some functions
  2025-06-17 22:46   ` Junio C Hamano
@ 2025-06-17 23:19     ` Jacob Keller
  0 siblings, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt



On 6/17/2025 3:46 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
> 
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> The remotes_remote_get_1 (and its caller, remotes_remote_get, have an
>> implicit dependency on the_repository due to calling
>> read_branches_file() and read_remotes_file(), both of which use
>> the_repository. The branch_get() function calls set_merge() which has an
>> implicit dependency on the_repository as well.
>>
>> Because of this use of the_repository, the helper functions cannot be
>> used in code paths which operate on other repositories. A future
>> refactor of the submodule--helper will want to make use of some of these
>> functions.
>>
>> Refactor to break the dependency by passing struct repository *repo
>> instead of struct remote_state *remote_state in a few places.
>>
>> The public callers and many other helper functions still depend on
>> the_repository. A repo-aware function will be exposed in a following
>> change for git submodule--helper.
>>
>> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
>> ---
>>  remote.c | 58 ++++++++++++++++++++++++++++------------------------------
>>  1 file changed, 28 insertions(+), 30 deletions(-)
> 
> As exactly one remote_state instance belongs to each repository ever
> since fd3cb050 (remote: move static variables into per-repository
> struct, 2021-11-17) defined the former, and remote_state is not
> shared across repository, passing the repository instance that owns
> a remote_state instance and pick up the .remote_state member out of
> it as needed would give us the right remote_state, and gives us
> access to the repository instance it owns it.
> 
> Makes perfect sense, 

Yep. It looks like most of the use of the repository structure actually
comes from the to-be-removed logic for handling the remotes_from_file
stuff.. but I don't really want to delay these fixes and improvements to
3.0. :(

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name
  2025-06-17 23:12   ` Junio C Hamano
@ 2025-06-17 23:21     ` Jacob Keller
  0 siblings, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt



On 6/17/2025 4:12 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
> 
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> The repo_get_default_remote() function in submodule--helper currently
>> tries to figure out the proper remote name to use for a submodule based
>> on a few factors.
>>
>> First, it tries to find the remote for the currently checked out branch.
>> This works if the submodule is configured to checkout to a branch
>> instead of a detached HEAD state.
>>
>> In the detached HEAD state, the code calls back to using "origin", on
> 
> "calls back" -> "falls back".
> 

Yep.

>> the assumption that this is the default remote name. Some users may
>> change this, such as by setting clone.defaultRemoteName, or by changing
>> the remote name manually within the submodule repository.
>>
>> As a first step to improving this situation, refactor to reuse the logic
>> from remotes_remote_for_branch(). This function uses the remote from the
>> branch if it has one. If it doesn't then it checks to see if there is
>> exactly one remote. It uses this remote first before attempting to fall
>> back to "origin".
> 
> Designed wtih good taste.
> 
>> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
>> index 9e8cdfe1b2a8c2985d9c1b8ad6f1b0d1f9401714..4aa237033a526fca29cce2926419462179d40ee3 100644
>> --- a/builtin/submodule--helper.c
>> +++ b/builtin/submodule--helper.c
>> @@ -41,61 +41,25 @@
>>  typedef void (*each_submodule_fn)(const struct cache_entry *list_item,
>>  				  void *cb_data);
>>  
>> -static int repo_get_default_remote(struct repository *repo, char **default_remote)
>> -{
>> -	char *dest = NULL;
>> -	struct strbuf sb = STRBUF_INIT;
>> -	struct ref_store *store = get_main_ref_store(repo);
>> -	const char *refname = refs_resolve_ref_unsafe(store, "HEAD", 0, NULL,
>> -						      NULL);
>> -
>> -	if (!refname)
>> -		return die_message(_("No such ref: %s"), "HEAD");
>> -
>> -	/* detached HEAD */
>> -	if (!strcmp(refname, "HEAD")) {
>> -		*default_remote = xstrdup("origin");
>> -		return 0;
>> -	}
>> -
>> -	if (!skip_prefix(refname, "refs/heads/", &refname))
>> -		return die_message(_("Expecting a full ref name, got %s"),
>> -				   refname);
>> -
>> -	strbuf_addf(&sb, "branch.%s.remote", refname);
>> -	if (repo_config_get_string(repo, sb.buf, &dest))
>> -		*default_remote = xstrdup("origin");
>> -	else
>> -		*default_remote = dest;
>> -
>> -	strbuf_release(&sb);
>> -	return 0;
>> -}
> 
> We will lose two callers of this function, so we can safely remove
> it.
> 

Yep, we're removing both callers.

>>  static int get_default_remote_submodule(const char *module_path, char **default_remote)
>>  static char *get_default_remote(void)
> 
> These callers that used to call the removed helper now call
> repo_default_remote() instread.  Good.
> 
> 
>> diff --git a/remote.c b/remote.c
>> index b3a9881a6eacf90bee71d6760858b37d68263502..94b31f4c23057a247a968fc0ebe2e5170e99614d 100644
>> --- a/remote.c
>> +++ b/remote.c
>> @@ -1767,20 +1767,35 @@ static void set_merge(struct repository *repo, struct branch *ret)
>>  	}
>>  }
>>  
>> -struct branch *branch_get(const char *name)
>> +static struct branch *repo_branch_get(struct repository *repo, const char *name)
>>  {
>>  	struct branch *ret;
>>  
>> -	read_config(the_repository, 0);
>> +	read_config(repo, 0);
>>  	if (!name || !*name || !strcmp(name, "HEAD"))
>> -		ret = the_repository->remote_state->current_branch;
>> +		ret = repo->remote_state->current_branch;
>>  	else
>> -		ret = make_branch(the_repository->remote_state, name,
>> +		ret = make_branch(repo->remote_state, name,
>>  				  strlen(name));
>> -	set_merge(the_repository, ret);
>> +	set_merge(repo, ret);
>>  	return ret;
>>  }
>>  
>> +struct branch *branch_get(const char *name)
>> +{
>> +	return repo_branch_get(the_repository, name);
>> +}
> 
> Nice to see how the dependency to the_repository is lifted for new
> callers while retaining the same interface for existing ones.
> 

Yea, I figured there's little reason to go poke all the existing callers
right away. Those can be updated if/when they have a repository
parameter passed in.

>> +const char *repo_default_remote(struct repository *repo)
>> +{
>> +	struct branch *branch;
>> +
>> +	read_config(repo, 0);
>> +	branch = repo_branch_get(repo, "HEAD");
>> +
>> +	return remotes_remote_for_branch(repo->remote_state, branch, NULL);
>> +}
> 
> OK.  read_config() is a safe no-op if repo has already been
> initialized, so this would give us what we want.  Nicely done.
> 
> 

Yep, this matches how most of the other remote logic works.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-17 22:25   ` Junio C Hamano
@ 2025-06-17 23:25     ` Jacob Keller
  2025-06-18  1:30       ` Junio C Hamano
  2025-06-18  2:44       ` Lidong Yan
  2025-06-17 23:45     ` Jacob Keller
  1 sibling, 2 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 23:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt



On 6/17/2025 3:25 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
> 
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> The branch_release() and remote_clear() functions fail to completely
>> release all of the memory they point to. This results in leaked memory
>> if you read the configuration for an arbitrary repository and then
>> release that repository.
>>
>> This should be caught by the leak sanitizer. However, for callers which
>> use ``the_repository``, the values never go out of scope and the
>> sanitizer won't complain.
>>
>> A future change is going to add a caller of read_config() for a
>> submodule repository structure. Doing so reveals one immediate issue due
>> to a bad NULL pointer access, as well as the mentioned leaks.
>>
>>  * The branch->merge array is accessed without checking if its NULL.
>>    Since this array is only setup by calling set_merge, it may in fact
>>    not be initialized even though merge_nr is non-zero.
>>
>>  * The remote->push and remote->fetch refspecs are never cleared.
>>
>>  * The branch->merge_name array is never cleared.
>>
>>  * The individual elements of branch->merge are not released.
>>
>> Add a check against branch->merge before accessing it and calling
>> refspec_item_clear. Update remote_clear() with calls to refspec_clear()
>> for both the push and fetch refspecs. Add a release of the merge_name
>> items as well as a final release of the merge_name array.
>>
>> Freeing merge_name elements results in a warning because we discard the
>> const qualifier on the parameter name. These values come from a call to
>> add_merge() in handle_config(), which always copies the names with
>> xstrdup. This makes ownership of the memory difficult to track in the
>> code.
>>
>> Move the call to xstrdup inside add_merge() so that its clear that the
>> memory is duplicated here and must be released when the merge_name array
>> is released. Drop the const qualifier on the branch structure to allow
>> calling free without an explicit cast.
>>
>> These changes make it safe to call read_config() on a submodule
>> repository without crashing or leaking any of the memory when the
>> submodule repository is released.
>>
>> There is still some confusion with the difference between branch->merge
>> and branch->merge_name, and the confusion of using branch->merge_nr for
>> both. That could probably use some future cleanup.
> 
> 
> Nicely described.  One thing that puzzles me is this part:
> 
>> @@ -253,9 +256,16 @@ static void branch_release(struct branch *branch)
>>  	free((char *)branch->refname);
>>  	free(branch->remote_name);
>>  	free(branch->pushremote_name);
>> -	for (int i = 0; i < branch->merge_nr; i++)
>> -		refspec_item_clear(branch->merge[i]);
>> +	for (int i = 0; i < branch->merge_nr; i++) {
>> +		if (branch->merge) {
>> +			refspec_item_clear(branch->merge[i]);
>> +			free(branch->merge[i]);
>> +		}
>> +		if (branch->merge_name)
>> +			free(branch->merge_name[i]);
>> +	}
>>  	free(branch->merge);
>> +	free(branch->merge_name);
>>  }
> 
> where we iterate over branch->merge[] and branch->merge_name[]
> for branch->nr times and each time we check the NULL-ness of these
> two pointers.
> 
> merge_nr is only incremented inside add_merge() when merge_name[]
> array is grown by the same amount.  Do we need to check for the
> NULL-ness of branch->merge_name?
> 

Probably not.

> Near the beginning of set_merge() we allocate branch->merge[] 
> only when branch->merge_nr is non-zero (the assumption seems to be
> that we accumulate in the merge_names[] while reading the config,
> so as long as branch->merge is initialized properly to NULL, it will
> never be NULL if merge_nr is not 0, no?
> 

We initialize branch->merge with set_merge() which is called by
branch_get() and which is the only way for callers external to remote.c
of getting a branch structure.

The issue is that merge_nr can be non-zero because if no caller has done
a branch_get() on the given branch, we still have merge_nr is non-zero
and merge is NULL. I suspect you're right that merge_name will never be
NULL while merge_nr is non-zero.

Really, I think we should find a better way to handle this than having
two separate arrays which interact with merge_nr. I especially just
noticed that set_merge will assign merge_nr to 0 if remote_name is bad,
but it doesn't release the memory, so we'd leak on teardown in that case...

I wonder if it is possible to just get rid of merge_names entirely and
only have the merge array?

> Thanks.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-17 22:25   ` Junio C Hamano
  2025-06-17 23:25     ` Jacob Keller
@ 2025-06-17 23:45     ` Jacob Keller
  1 sibling, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-17 23:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt



On 6/17/2025 3:25 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
> 
>> From: Jacob Keller <jacob.keller@gmail.com>
> 
> where we iterate over branch->merge[] and branch->merge_name[]
> for branch->nr times and each time we check the NULL-ness of these
> two pointers.
> 
> merge_nr is only incremented inside add_merge() when merge_name[]
> array is grown by the same amount.  Do we need to check for the
> NULL-ness of branch->merge_name?
> 
> Near the beginning of set_merge() we allocate branch->merge[] 
> only when branch->merge_nr is non-zero (the assumption seems to be
> that we accumulate in the merge_names[] while reading the config,
> so as long as branch->merge is initialized properly to NULL, it will
> never be NULL if merge_nr is not 0, no?
> 
> Thanks.

Perhaps something like this might be a better solution so that we don't
need to worry about merge being NULL and tracking the difference between
merge_name and merge. Its a bit ugly still with a partially initialized
refspec_item so I am not 100% sure but I think this is overall a bit
better than the current situation:

> diff --git i/remote.h w/remote.h
> index 59033d5d82dd..0ca399e1835b 100644
> --- i/remote.h
> +++ w/remote.h
> @@ -316,8 +316,8 @@ struct branch {
> 
>         char *pushremote_name;
> 
> -       /* An array of the "merge" lines in the configuration. */
> -       char **merge_name;
> +       /* True if set_merge() has been called to finalize the merge array */
> +       int set_merge;
> 
>         /**
>          * An array of the struct refspecs used for the merge lines. That is,
> diff --git i/branch.c w/branch.c
> index 6d01d7d6bdb2..93f5b4e8dd9f 100644
> --- i/branch.c
> +++ w/branch.c
> @@ -230,7 +230,7 @@ static int inherit_tracking(struct tracking *tracking, const char *orig_ref)
>                 return -1;
>         }
> 
> -       if (branch->merge_nr < 1 || !branch->merge_name || !branch->merge_name[0]) {
> +       if (branch->merge_nr < 1 || !branch->merge || !branch->merge[0] || !branch->merge[0]->src) {
>                 warning(_("asked to inherit tracking from '%s', but no merge configuration is set"),
>                         bare_ref);
>                 return -1;
> @@ -238,7 +238,7 @@ static int inherit_tracking(struct tracking *tracking, const char *orig_ref)
> 
>         tracking->remote = branch->remote_name;
>         for (i = 0; i < branch->merge_nr; i++)
> -               string_list_append(tracking->srcs, branch->merge_name[i]);
> +               string_list_append(tracking->srcs, branch->merge[i]->src);
>         return 0;
>  }
> 
> diff --git i/builtin/pull.c w/builtin/pull.c
> index a1ebc6ad3328..f4556ae155ce 100644
> --- i/builtin/pull.c
> +++ w/builtin/pull.c
> @@ -487,7 +487,7 @@ static void NORETURN die_no_merge_candidates(const char *repo, const char **refs
>         } else
>                 fprintf_ln(stderr, _("Your configuration specifies to merge with the ref '%s'\n"
>                         "from the remote, but no such ref was fetched."),
> -                       *curr_branch->merge_name);
> +                       curr_branch->merge[0]->src);
>         exit(1);
>  }
> 
> diff --git i/remote.c w/remote.c
> index 706c25af0c27..1850e8fa4e42 100644
> --- i/remote.c
> +++ w/remote.c
> @@ -177,9 +177,15 @@ static void remote_clear(struct remote *remote)
> 
>  static void add_merge(struct branch *branch, const char *name)
>  {
> -       ALLOC_GROW(branch->merge_name, branch->merge_nr + 1,
> +       struct refspec_item *merge;
> +
> +       ALLOC_GROW(branch->merge, branch->merge_nr + 1,
>                    branch->merge_alloc);
> -       branch->merge_name[branch->merge_nr++] = xstrdup(name);
> +
> +       merge = xcalloc(1, sizeof(*merge));
> +       merge->src = xstrdup(name);
> +
> +       branch->merge[branch->merge_nr++] = merge;
>  }
> 
>  struct branches_hash_key {
> @@ -250,22 +256,23 @@ static struct branch *make_branch(struct remote_state *remote_state,
>         return ret;
>  }
> 
> +static void merge_clear(struct branch *branch)
> +{
> +       for (int i = 0; i < branch->merge_nr; i++) {
> +               refspec_item_clear(branch->merge[i]);
> +               free(branch->merge[i]);
> +       }
> +       free(branch->merge);
> +       branch->merge_nr = 0;
> +}
> +
>  static void branch_release(struct branch *branch)
>  {
>         free((char *)branch->name);
>         free((char *)branch->refname);
>         free(branch->remote_name);
>         free(branch->pushremote_name);
> -       for (int i = 0; i < branch->merge_nr; i++) {
> -               if (branch->merge) {
> -                       refspec_item_clear(branch->merge[i]);
> -                       free(branch->merge[i]);
> -               }
> -               if (branch->merge_name)
> -                       free(branch->merge_name[i]);
> -       }
> -       free(branch->merge);
> -       free(branch->merge_name);
> +       merge_clear(branch);
>  }
> 
>  static struct rewrite *make_rewrite(struct rewrites *r,
> @@ -700,7 +707,7 @@ char *remote_ref_for_branch(struct branch *branch, int for_push)
>         if (branch) {
>                 if (!for_push) {
>                         if (branch->merge_nr) {
> -                               return xstrdup(branch->merge_name[0]);
> +                               return xstrdup(branch->merge[0]->src);
>                         }
>                 } else {
>                         char *dst;
> @@ -1738,32 +1745,30 @@ static void set_merge(struct repository *repo, struct branch *ret)
> 
>         if (!ret)
>                 return; /* no branch */
> -       if (ret->merge)
> +       if (ret->set_merge)
>                 return; /* already run */
>         if (!ret->remote_name || !ret->merge_nr) {
>                 /*
>                  * no merge config; let's make sure we don't confuse callers
>                  * with a non-zero merge_nr but a NULL merge
>                  */
> -               ret->merge_nr = 0;
> +               merge_clear(ret);
>                 return;
>         }
> +       ret->set_merge = 1;
> 
>         remote = remotes_remote_get(repo, ret->remote_name);
> 
> -       CALLOC_ARRAY(ret->merge, ret->merge_nr);
>         for (i = 0; i < ret->merge_nr; i++) {
> -               ret->merge[i] = xcalloc(1, sizeof(**ret->merge));
> -               ret->merge[i]->src = xstrdup(ret->merge_name[i]);
>                 if (!remote_find_tracking(remote, ret->merge[i]) ||
>                     strcmp(ret->remote_name, "."))
>                         continue;
> -               if (repo_dwim_ref(repo, ret->merge_name[i],
> -                                 strlen(ret->merge_name[i]), &oid, &ref,
> +               if (repo_dwim_ref(repo, ret->merge[i]->src,
> +                                 strlen(ret->merge[i]->src), &oid, &ref,
>                                   0) == 1)
>                         ret->merge[i]->dst = ref;
>                 else
> -                       ret->merge[i]->dst = xstrdup(ret->merge_name[i]);
> +                       ret->merge[i]->dst = xstrdup(ret->merge[i]->src);
>         }
>  }
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-17 23:25     ` Jacob Keller
@ 2025-06-18  1:30       ` Junio C Hamano
  2025-06-18 11:20         ` Junio C Hamano
  2025-06-18  2:44       ` Lidong Yan
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-06-18  1:30 UTC (permalink / raw)
  To: Jacob Keller; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt

Jacob Keller <jacob.e.keller@intel.com> writes:

> We initialize branch->merge with set_merge() which is called by
> branch_get() and which is the only way for callers external to remote.c
> of getting a branch structure.
>
> The issue is that merge_nr can be non-zero because if no caller has done
> a branch_get() on the given branch, we still have merge_nr is non-zero
> and merge is NULL.

Meaning merge_nr and merge are both uninitialized and unlikely to be
0 and NULL?  What values do they have, and if they are left
uninitialized, shouldn't we be initializing them to predictable
values?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-17 23:25     ` Jacob Keller
  2025-06-18  1:30       ` Junio C Hamano
@ 2025-06-18  2:44       ` Lidong Yan
  1 sibling, 0 replies; 18+ messages in thread
From: Lidong Yan @ 2025-06-18  2:44 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Junio C Hamano, git, Jacob Keller, Patrick Steinhardt

Jacob Keller <jacob.e.keller@intel.com> writes:
> 
> We initialize branch->merge with set_merge() which is called by
> branch_get() and which is the only way for callers external to remote.c
> of getting a branch structure.
> 
> The issue is that merge_nr can be non-zero because if no caller has done
> a branch_get() on the given branch, we still have merge_nr is non-zero
> and merge is NULL. I suspect you're right that merge_name will never be
> NULL while merge_nr is non-zero.

Maybe we can initialize each branch->merge in read_config(). Then branch->merge
will always as large as branch->merge_nr. Do you think this would be a good idea?

---
diff --git a/remote.c b/remote.c
index 4099183cac..835939c59e 100644
--- a/remote.c
+++ b/remote.c
@@ -596,6 +596,8 @@ static void alias_all_urls(struct remote_state *remote_state)
 
 static void read_config(struct repository *repo, int early)
 {
+       struct hashmap_iter iter;
+       struct branch *b;
        int flag;
 
        if (repo->remote_state->initialized)
@@ -614,6 +616,9 @@ static void read_config(struct repository *repo, int early)
        }
        repo_config(repo, handle_config, repo->remote_state);
        alias_all_urls(repo->remote_state);
+       hashmap_for_each_entry(&repo->remote_state->branches_hash, &item, b, ent) {
+               set_merge(repo->remote_state, b);
+       }
 }
 
 #ifndef WITH_BREAKING_CHANGES
---


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-18  1:30       ` Junio C Hamano
@ 2025-06-18 11:20         ` Junio C Hamano
  2025-06-18 17:41           ` Jacob Keller
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-06-18 11:20 UTC (permalink / raw)
  To: Jacob Keller; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt

Junio C Hamano <gitster@pobox.com> writes:

> Jacob Keller <jacob.e.keller@intel.com> writes:
>
>> We initialize branch->merge with set_merge() which is called by
>> branch_get() and which is the only way for callers external to remote.c
>> of getting a branch structure.
>>
>> The issue is that merge_nr can be non-zero because if no caller has done
>> a branch_get() on the given branch, we still have merge_nr is non-zero
>> and merge is NULL.
>
> Meaning merge_nr and merge are both uninitialized and unlikely to be
> 0 and NULL?  What values do they have, and if they are left
> uninitialized, shouldn't we be initializing them to predictable
> values?

Ah, no, I was just being stupid.

We read configuration files and accumulate things in .merge_name[]
while incrementing .merge_nr but until set_merge() is called, .merge
is NULL.  We need to clear .merge[] only when it is not NULL, of
course.  And for that, it may be more readable if we had two loops.

Thanks.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
  2025-06-18 11:20         ` Junio C Hamano
@ 2025-06-18 17:41           ` Jacob Keller
  0 siblings, 0 replies; 18+ messages in thread
From: Jacob Keller @ 2025-06-18 17:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jacob Keller, Lidong Yan, Patrick Steinhardt



On 6/18/2025 4:20 AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Jacob Keller <jacob.e.keller@intel.com> writes:
>>
>>> We initialize branch->merge with set_merge() which is called by
>>> branch_get() and which is the only way for callers external to remote.c
>>> of getting a branch structure.
>>>
>>> The issue is that merge_nr can be non-zero because if no caller has done
>>> a branch_get() on the given branch, we still have merge_nr is non-zero
>>> and merge is NULL.
>>
>> Meaning merge_nr and merge are both uninitialized and unlikely to be
>> 0 and NULL?  What values do they have, and if they are left
>> uninitialized, shouldn't we be initializing them to predictable
>> values?
> 
> Ah, no, I was just being stupid.
> 
> We read configuration files and accumulate things in .merge_name[]
> while incrementing .merge_nr but until set_merge() is called, .merge
> is NULL.  We need to clear .merge[] only when it is not NULL, of
> course.  And for that, it may be more readable if we had two loops.
> 
> Thanks.
> 
> 
> 

Right. I think I like my solution of just eliminating merge_names
entirely. Unfortunately it looks like my mailer marked that content as
quoted when I pasted it in as an inline diff.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-06-18 17:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
2025-06-17 21:30 ` [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote Jacob Keller
2025-06-17 22:25   ` Junio C Hamano
2025-06-17 23:25     ` Jacob Keller
2025-06-18  1:30       ` Junio C Hamano
2025-06-18 11:20         ` Junio C Hamano
2025-06-18 17:41           ` Jacob Keller
2025-06-18  2:44       ` Lidong Yan
2025-06-17 23:45     ` Jacob Keller
2025-06-17 21:30 ` [PATCH v2 2/6] dir: move starts_with_dot(_dot)_slash to dir.h Jacob Keller
2025-06-17 21:30 ` [PATCH v2 3/6] remote: remove the_repository from some functions Jacob Keller
2025-06-17 22:46   ` Junio C Hamano
2025-06-17 23:19     ` Jacob Keller
2025-06-17 21:30 ` [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name Jacob Keller
2025-06-17 23:12   ` Junio C Hamano
2025-06-17 23:21     ` Jacob Keller
2025-06-17 21:30 ` [PATCH v2 5/6] submodule: move get_default_remote_submodule() Jacob Keller
2025-06-17 21:30 ` [PATCH v2 6/6] submodule: look up remotes by URL first Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).