Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2 2/3] commit-reach: deduplicate queue entries in paint_down_to_common
From: Kristofer Karlsson @ 2026-05-26  6:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kristofer Karlsson via GitGitGadget, git, Derrick Stolee,
	Jeff King
In-Reply-To: <xmqqzf1ncded.fsf@gitster.g>

On Tue, 26 May 2026 at 00:50, Junio C Hamano <gitster@pobox.com> wrote:
> OK.  I guess an obvious alternative design would be to have an
> associated hashtable for deduping, or tweak prio_queue_get() so
> that it notices duplicated entry just before it returns (i.e.,
> peek and discard until queue->array[0].data is different from
> what you are going to return).  Both would not beat the cheap cost
> of using a single bit per object, I guess ;-)

Yes, I think a hashtable or hashset would work here too. I realize that I
have done a lot of local experimentation with alternative approaches but I
forgot to mention the ones I discarded for various reasons - but that
would be useful information for you to have too. Let me rectify that here.

oidset instead of enqueued flag: Works fine, but is ~15-20% slower end-to-end.
Both are O(1) but the overhead is quite significant compared to a flag.

Peek and discard: the problem here is that the commits are not necessarily
ordered. We can have a sequence of A,B,A if we are unlucky. What I did try
however was an alternative to this - just change the fast-exit heuristic to
overshoot until comparison returns > 0 - i.e. consume some
extra commits in the queue. This works and in my example data we typically
would only need to walk ~16 extra commits with this heuristic, so it's not
bad at all. But the extra comparisons we need to run on each iteration make
it ~15-20% slower.

Another thing I tried was simply tracking the minimum generation seen and
terminate as soon as we have gone past it. This is fast and simple and does
not require deduping, but it only works if we have a commit graph and
generation numbers.

The advantage of the approach with deduping via the ENQUEUED flag and then
just tracking the most recently enqueued commit is that it works independently
of ordering guarantees. All it needs to work is the fact that we can prove
that we have reached a point where queue no longer has any non-stale commits
at all.

Summary:
  Approach        Dedup         Works w/o commit-graph?  Speed
  ENQUEUED flag   yes (1 bit)   yes                      fastest
  Hashtable       yes           yes                      15-20% slower
  Peek-discard    -             -                        broken
  Cmp overshoot   no            yes                      15-20% slower
  Gen overshoot   no            no                       same as ENQUEUED

^ permalink raw reply

* [PATCH v2 8/8] setup: construct object database in `apply_repository_format()`
From: Patrick Steinhardt @ 2026-05-26  5:57 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

With the preceding changes we now always construct the repository's
object database before applying the repository format. Remove this
duplication by constructing it in `apply_repository_format()` instead.

Note that we create the object database _after_ having set up the
repository's hash algorithm, but _before_ setting the compat hash
algorithm. This is intentional:

  - Constructing the object database may require knowledge of its
    intended object format.

  - Setting up the compatibility hash requires the object database to be
    initialized already, because we immediately read the loose object
    map.

The first point is sensible, the second maybe a little less so. Ideally,
it should be the responsibility of the object database itself to
initialize any data structures required for the compatibility hash. But
this would require further changes, so this is kept as-is for now.

Further note that this requires us to move handling of the environment
variables GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES into
the repository format, as well. This allows the caller more flexibility
around whether or not those environment variables are being honored, as
we want to respect them in "setup.c", but not in "repository.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repository.c |  4 +---
 setup.c      | 45 +++++++++++++++++++++------------------------
 setup.h      | 10 ++++++++++
 3 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/repository.c b/repository.c
index 61dfbb8be6..187dd471c4 100644
--- a/repository.c
+++ b/repository.c
@@ -291,13 +291,11 @@ int repo_init(struct repository *repo,
 	if (read_repository_format_from_commondir(&format, repo->commondir))
 		goto error;
 
-	if (apply_repository_format(repo, &format, &err) < 0) {
+	if (apply_repository_format(repo, &format, 0, &err) < 0) {
 		warning("%s", err.buf);
 		goto error;
 	}
 
-	repo->objects = odb_new(repo, NULL, NULL);
-
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
diff --git a/setup.c b/setup.c
index 4a8d6230b1..513fc88749 100644
--- a/setup.c
+++ b/setup.c
@@ -1752,12 +1752,22 @@ enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
 
 int apply_repository_format(struct repository *repo,
 			    const struct repository_format *format,
+			    enum apply_repository_format_flags flags,
 			    struct strbuf *err)
 {
+	char *object_directory = NULL, *alternate_object_directories = NULL;
+
 	if (verify_repository_format(format, err) < 0)
 		return -1;
 
+	if (flags & APPLY_REPOSITORY_FORMAT_HONOR_ENV) {
+		object_directory = xstrdup_or_null(getenv(DB_ENVIRONMENT));
+		alternate_object_directories = xstrdup_or_null(getenv(ALTERNATE_DB_ENVIRONMENT));
+	}
+
 	repo_set_hash_algo(repo, format->hash_algo);
+	repo->objects = odb_new(repo, object_directory,
+				alternate_object_directories);
 	repo_set_compat_hash_algo(repo, format->compat_hash_algo);
 	repo_set_ref_storage_format(repo,
 				    format->ref_storage_format,
@@ -1773,6 +1783,8 @@ int apply_repository_format(struct repository *repo,
 	repo->repository_format_precious_objects =
 		format->precious_objects;
 
+	free(alternate_object_directories);
+	free(object_directory);
 	return 0;
 }
 
@@ -1785,7 +1797,8 @@ int apply_repository_format(struct repository *repo,
  * If successful and fmt is not NULL, fill fmt with data.
  */
 static void check_and_apply_repository_format(struct repository *repo,
-					      struct repository_format *fmt)
+					      struct repository_format *fmt,
+					      enum apply_repository_format_flags flags)
 {
 	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 	struct strbuf err = STRBUF_INIT;
@@ -1794,7 +1807,7 @@ static void check_and_apply_repository_format(struct repository *repo,
 		fmt = &repo_fmt;
 
 	check_repository_format_gently(repo_get_git_dir(repo), fmt, NULL);
-	if (apply_repository_format(repo, fmt, &err) < 0)
+	if (apply_repository_format(repo, fmt, flags, &err) < 0)
 		die("%s", err.buf);
 	startup_info->have_repository = 1;
 
@@ -1874,15 +1887,9 @@ const char *enter_repo(struct repository *repo, const char *path, unsigned flags
 	}
 
 	if (is_git_directory(".")) {
-		struct strvec to_free = STRVEC_INIT;
-
 		set_git_dir(repo, ".", 0);
-		repo->objects = odb_new(repo,
-					getenv_safe(&to_free, DB_ENVIRONMENT),
-					getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
-		check_and_apply_repository_format(repo, NULL);
-
-		strvec_clear(&to_free);
+		check_and_apply_repository_format(repo, NULL,
+						  APPLY_REPOSITORY_FORMAT_HONOR_ENV);
 		return path;
 	}
 
@@ -2034,8 +2041,6 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok)
 	    startup_info->have_repository ||
 	    /* GIT_DIR_EXPLICIT */
 	    getenv(GIT_DIR_ENVIRONMENT)) {
-		struct strvec to_free = STRVEC_INIT;
-
 		if (!repo->gitdir) {
 			const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
 			if (!gitdir)
@@ -2046,17 +2051,13 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok)
 		if (startup_info->have_repository) {
 			struct strbuf err = STRBUF_INIT;
 
-			repo->objects = odb_new(repo,
-						getenv_safe(&to_free, DB_ENVIRONMENT),
-						getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
-			if (apply_repository_format(repo, &repo_fmt, &err) < 0)
+			if (apply_repository_format(repo, &repo_fmt,
+						    APPLY_REPOSITORY_FORMAT_HONOR_ENV, &err) < 0)
 				die("%s", err.buf);
 
 			clear_repository_format(&repo_fmt);
 			strbuf_release(&err);
 		}
-
-		strvec_clear(&to_free);
 	}
 	/*
 	 * Since precompose_string_if_needed() needs to look at
@@ -2805,7 +2806,6 @@ int init_db(struct repository *repo,
 	int exist_ok = flags & INIT_DB_EXIST_OK;
 	char *original_git_dir = real_pathdup(git_dir, 1);
 	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
-	struct strvec to_free = STRVEC_INIT;
 
 	if (real_git_dir) {
 		struct stat st;
@@ -2826,16 +2826,14 @@ int init_db(struct repository *repo,
 	}
 	startup_info->have_repository = 1;
 
-	repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT),
-				getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
-
 	/*
 	 * Check to see if the repository version is right.
 	 * Note that a newly created repository does not have
 	 * config file, so this will not fail.  What we are catching
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
-	check_and_apply_repository_format(repo, &repo_fmt);
+	check_and_apply_repository_format(repo, &repo_fmt,
+					  APPLY_REPOSITORY_FORMAT_HONOR_ENV);
 
 	repository_format_configure(repo, &repo_fmt, hash, ref_storage_format);
 
@@ -2892,7 +2890,6 @@ int init_db(struct repository *repo,
 	}
 
 	clear_repository_format(&repo_fmt);
-	strvec_clear(&to_free);
 	free(original_git_dir);
 	return 0;
 }
diff --git a/setup.h b/setup.h
index 5ed92f53fa..821b55aca0 100644
--- a/setup.h
+++ b/setup.h
@@ -221,6 +221,15 @@ void clear_repository_format(struct repository_format *format);
 int verify_repository_format(const struct repository_format *format,
 			     struct strbuf *err);
 
+enum apply_repository_format_flags {
+	/*
+	 * Honor environment variables when applying the repository format to
+	 * the repository. For now, this only covers environment variables that
+	 * relate to the object database.
+	 */
+	APPLY_REPOSITORY_FORMAT_HONOR_ENV = (1 << 0),
+};
+
 /*
  * Apply the given repository format to the repo. This initializes extensions
  * and basic data structures required for normal operation. Returns 0 on
@@ -228,6 +237,7 @@ int verify_repository_format(const struct repository_format *format,
  */
 int apply_repository_format(struct repository *repo,
 			    const struct repository_format *format,
+			    enum apply_repository_format_flags flags,
 			    struct strbuf *err);
 
 const char *get_template_dir(const char *option_template);

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2 7/8] repository: stop reading loose object map twice on repo init
From: Patrick Steinhardt @ 2026-05-26  5:57 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

When initializing a repository via `repo_init()` we end up reading the
loose object map twice:

  - `apply_repository_format()` calls `repo_set_compat_hash_algo()`,
    which in turn calls `repo_read_loose_object_map()` if we have a
    compatibility hash configured.

  - `repo_init()` calls `repo_read_loose_object_map()` directly a second
    time.

Drop the second read of the loose object map in `repo_init()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repository.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/repository.c b/repository.c
index 2c2395105f..61dfbb8be6 100644
--- a/repository.c
+++ b/repository.c
@@ -301,9 +301,6 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
-	if (repo->compat_hash_algo)
-		repo_read_loose_object_map(repo);
-
 	clear_repository_format(&format);
 	strbuf_release(&err);
 	return 0;

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2 6/8] setup: stop initializing object database without repository
From: Patrick Steinhardt @ 2026-05-26  5:57 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

The function `setup_git_directory_gently()` is responsible for
discovering and setting up a Git repository based on various environment
variables and the current working directory. The result is thus a fully
usable Git repository.

One oddity of this function is that we may set up the object database
even in the case where we don't have a repository, namely in the case
where the `GIT_DIR_EXPLICIT` environment variable is set but points to a
non-existent repository. If so, we call `setup_git_env_internal()` with
the value of the environment variable so that the repository's Git
directory is configured, even if it points to a non-existent directory.

Historically though, this function didn't only configure the repository,
but also initialized the object database. We retained this behaviour
from a preceding commit, even though it really doesn't make much sense
in the first place -- there is no repository, so we don't have an object
database either. There seemingly isn't much of a reason to construct the
object database, as we typically won't try to read objects when we don't
have an object database.

There's one exception though: git-index-pack(1) may run outside of a
repository, which can be used to perform consistency checks for a
packfile. The code path is _almost_ working: we already know to call
`parse_object_buffer()`, which can read objects without an object
database being available. And that works for all object types except for
commits, because `parse_commit_buffer()` calls `parse_commit_graph()`,
and that function doesn't handle the case where we don't have an object
database.

Fix this instance to check for the object database instead of checking
for the Git directory having been initialized. With this fixed, we can
now stop constructing an object database completely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 commit-graph.c | 4 ++--
 setup.c        | 7 +++----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 9abe62bd5a..0820cf5fb8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -740,13 +740,13 @@ static struct commit_graph *prepare_commit_graph(struct repository *r)
 	struct odb_source *source;
 
 	/*
-	 * Early return if there is no git dir or if the commit graph is
+	 * Early return if there is no object database or if the commit graph is
 	 * disabled.
 	 *
 	 * This must come before the "already attempted?" check below, because
 	 * we want to disable even an already-loaded graph file.
 	 */
-	if (!r->gitdir || r->commit_graph_disabled)
+	if (!r->objects || r->commit_graph_disabled)
 		return NULL;
 
 	if (r->objects->commit_graph_attempted)
diff --git a/setup.c b/setup.c
index 0dc9fe4565..4a8d6230b1 100644
--- a/setup.c
+++ b/setup.c
@@ -2043,13 +2043,12 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok)
 			setup_git_env_internal(repo, gitdir);
 		}
 
-		repo->objects = odb_new(repo,
-					getenv_safe(&to_free, DB_ENVIRONMENT),
-					getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
-
 		if (startup_info->have_repository) {
 			struct strbuf err = STRBUF_INIT;
 
+			repo->objects = odb_new(repo,
+						getenv_safe(&to_free, DB_ENVIRONMENT),
+						getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
 			if (apply_repository_format(repo, &repo_fmt, &err) < 0)
 				die("%s", err.buf);
 

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2 4/8] repository: stop initializing the object database in `repo_set_gitdir()`
From: Patrick Steinhardt @ 2026-05-26  5:56 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

The function `repo_set_gitdir()` obviously sets the Git directory for a
given repository. Less obviously though, the function also configures a
couple of auxiliary settings.

One such thing is that we create the object database in this function.
This logic only happens conditionally though, as `set_git_dir()` may be
called multiple times during repository setup, and we don't want to
create the object database multiple times. This is somewhat tangled and
hard to follow.

Remove the logic from `repo_set_gitdir()` and instead initialize the
object database outside of it. This leads to some duplication right now,
but that duplication will be removed in a subsequent step where we will
start initializing the object database as part of applying the repo's
format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repository.c | 8 ++------
 repository.h | 3 ---
 setup.c      | 7 ++++---
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/repository.c b/repository.c
index 58a13f7c4f..2c2395105f 100644
--- a/repository.c
+++ b/repository.c
@@ -181,12 +181,6 @@ void repo_set_gitdir(struct repository *repo,
 	free(old_gitdir);
 
 	repo_set_commondir(repo, o->commondir);
-
-	if (!repo->objects)
-		repo->objects = odb_new(repo, o->object_dir, o->alternate_db);
-	else if (!o->skip_initializing_odb)
-		BUG("cannot reinitialize an already-initialized object directory");
-
 	repo->disable_ref_updates = o->disable_ref_updates;
 
 	expand_base_dir(&repo->graft_file, o->graft_file,
@@ -302,6 +296,8 @@ int repo_init(struct repository *repo,
 		goto error;
 	}
 
+	repo->objects = odb_new(repo, NULL, NULL);
+
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
diff --git a/repository.h b/repository.h
index c3ec0f4b79..36e2db2633 100644
--- a/repository.h
+++ b/repository.h
@@ -221,12 +221,9 @@ const char *repo_get_work_tree(struct repository *repo);
  */
 struct set_gitdir_args {
 	const char *commondir;
-	const char *object_dir;
 	const char *graft_file;
 	const char *index_file;
-	const char *alternate_db;
 	bool disable_ref_updates;
-	bool skip_initializing_odb;
 };
 
 void repo_set_gitdir(struct repository *repo, const char *root,
diff --git a/setup.c b/setup.c
index c5015923f1..3bd3f6c592 100644
--- a/setup.c
+++ b/setup.c
@@ -1045,17 +1045,18 @@ static void setup_git_env_internal(struct repository *repo,
 	struct strvec to_free = STRVEC_INIT;
 
 	args.commondir = getenv_safe(&to_free, GIT_COMMON_DIR_ENVIRONMENT);
-	args.object_dir = getenv_safe(&to_free, DB_ENVIRONMENT);
 	args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT);
 	args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT);
-	args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT);
 	if (getenv(GIT_QUARANTINE_ENVIRONMENT))
 		args.disable_ref_updates = true;
-	args.skip_initializing_odb = skip_initializing_odb;
 
 	repo_set_gitdir(repo, git_dir, &args);
 	strvec_clear(&to_free);
 
+	if (!skip_initializing_odb)
+		repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT),
+					getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
+
 	if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT))
 		disable_replace_refs();
 	replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2 5/8] setup: stop creating the object database in `setup_git_env()`
From: Patrick Steinhardt @ 2026-05-26  5:57 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

In the preceding commit we have stopped creating the object database in
`repo_set_gitdir()`. But the logic is still somewhat confusing as we
still end up creating it conditionally in `setup_git_dir()`, which is
called multiple times.

Drop the conditional logic and instead create the object database in all
places where we have discovered and configured a repository.

This leads to even more duplication than we already had in the preceding
commit, but an alert reader may notice that we now (almost) always call
`odb_new()` directly before having called `apply_repository_format()`.
The only exception to this is `setup_git_directory_gently()`, where we
also call the function when _not_ applying the repository format. This
will be fixed in the next commit, and once that's done we can then unify
creation of the object database into `apply_repository_format()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 setup.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/setup.c b/setup.c
index 3bd3f6c592..0dc9fe4565 100644
--- a/setup.c
+++ b/setup.c
@@ -1035,8 +1035,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 }
 
 static void setup_git_env_internal(struct repository *repo,
-				   const char *git_dir,
-				   bool skip_initializing_odb)
+				   const char *git_dir)
 {
 	char *git_replace_ref_base;
 	const char *shallow_file;
@@ -1053,10 +1052,6 @@ static void setup_git_env_internal(struct repository *repo,
 	repo_set_gitdir(repo, git_dir, &args);
 	strvec_clear(&to_free);
 
-	if (!skip_initializing_odb)
-		repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT),
-					getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
-
 	if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT))
 		disable_replace_refs();
 	replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);
@@ -1072,10 +1067,10 @@ static void setup_git_env_internal(struct repository *repo,
 		fetch_if_missing = 0;
 }
 
-static void set_git_dir_1(struct repository *repo, const char *path, bool skip_initializing_odb)
+static void set_git_dir_1(struct repository *repo, const char *path)
 {
 	xsetenv(GIT_DIR_ENVIRONMENT, path, 1);
-	setup_git_env_internal(repo, path, skip_initializing_odb);
+	setup_git_env_internal(repo, path);
 }
 
 static void update_relative_gitdir(const char *name UNUSED,
@@ -1089,7 +1084,7 @@ static void update_relative_gitdir(const char *name UNUSED,
 	trace_printf_key(&trace_setup_key,
 			 "setup: move $GIT_DIR to '%s'",
 			 path);
-	set_git_dir_1(repo, path, true);
+	set_git_dir_1(repo, path);
 	free(path);
 }
 
@@ -1102,7 +1097,7 @@ static void set_git_dir(struct repository *repo, const char *path, int make_real
 		path = realpath.buf;
 	}
 
-	set_git_dir_1(repo, path, false);
+	set_git_dir_1(repo, path);
 	if (!is_absolute_path(path))
 		chdir_notify_register(NULL, update_relative_gitdir, repo);
 
@@ -1879,8 +1874,15 @@ const char *enter_repo(struct repository *repo, const char *path, unsigned flags
 	}
 
 	if (is_git_directory(".")) {
+		struct strvec to_free = STRVEC_INIT;
+
 		set_git_dir(repo, ".", 0);
+		repo->objects = odb_new(repo,
+					getenv_safe(&to_free, DB_ENVIRONMENT),
+					getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
 		check_and_apply_repository_format(repo, NULL);
+
+		strvec_clear(&to_free);
 		return path;
 	}
 
@@ -2032,13 +2034,19 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok)
 	    startup_info->have_repository ||
 	    /* GIT_DIR_EXPLICIT */
 	    getenv(GIT_DIR_ENVIRONMENT)) {
+		struct strvec to_free = STRVEC_INIT;
+
 		if (!repo->gitdir) {
 			const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
 			if (!gitdir)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
-			setup_git_env_internal(repo, gitdir, false);
+			setup_git_env_internal(repo, gitdir);
 		}
 
+		repo->objects = odb_new(repo,
+					getenv_safe(&to_free, DB_ENVIRONMENT),
+					getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
+
 		if (startup_info->have_repository) {
 			struct strbuf err = STRBUF_INIT;
 
@@ -2048,6 +2056,8 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok)
 			clear_repository_format(&repo_fmt);
 			strbuf_release(&err);
 		}
+
+		strvec_clear(&to_free);
 	}
 	/*
 	 * Since precompose_string_if_needed() needs to look at
@@ -2796,6 +2806,7 @@ int init_db(struct repository *repo,
 	int exist_ok = flags & INIT_DB_EXIST_OK;
 	char *original_git_dir = real_pathdup(git_dir, 1);
 	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
+	struct strvec to_free = STRVEC_INIT;
 
 	if (real_git_dir) {
 		struct stat st;
@@ -2816,6 +2827,9 @@ int init_db(struct repository *repo,
 	}
 	startup_info->have_repository = 1;
 
+	repo->objects = odb_new(repo, getenv_safe(&to_free, DB_ENVIRONMENT),
+				getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT));
+
 	/*
 	 * Check to see if the repository version is right.
 	 * Note that a newly created repository does not have
@@ -2879,6 +2893,7 @@ int init_db(struct repository *repo,
 	}
 
 	clear_repository_format(&repo_fmt);
+	strvec_clear(&to_free);
 	free(original_git_dir);
 	return 0;
 }

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2 3/8] setup: deduplicate logic to apply repository format
From: Patrick Steinhardt @ 2026-05-26  5:56 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

After having discovered the repository format we then apply it to the
repository so that it knows to use the proper repository extensions. The
logic to apply the format is duplicated across three callsites, which
makes it rather painfull to add new extensions.

Introduce a new function `apply_repository_format()` that takes a repo
and applies a given format to it and adapt all callsites to use it.
While at it, rename `check_repository_format()` to clarify that it
doesn't only _check_ the format, but that it also applies it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repository.c | 31 +++++++-------------
 setup.c      | 93 ++++++++++++++++++++++++++++++++----------------------------
 setup.h      |  9 ++++++
 3 files changed, 70 insertions(+), 63 deletions(-)

diff --git a/repository.c b/repository.c
index db57b8308b..58a13f7c4f 100644
--- a/repository.c
+++ b/repository.c
@@ -262,8 +262,8 @@ void repo_set_worktree(struct repository *repo, const char *path)
 	trace2_def_repo(repo);
 }
 
-static int read_and_verify_repository_format(struct repository_format *format,
-					     const char *commondir)
+static int read_repository_format_from_commondir(struct repository_format *format,
+						 const char *commondir)
 {
 	int ret = 0;
 	struct strbuf sb = STRBUF_INIT;
@@ -272,11 +272,6 @@ static int read_and_verify_repository_format(struct repository_format *format,
 	read_repository_format(format, sb.buf);
 	strbuf_reset(&sb);
 
-	if (verify_repository_format(format, &sb) < 0) {
-		warning("%s", sb.buf);
-		ret = -1;
-	}
-
 	strbuf_release(&sb);
 	return ret;
 }
@@ -290,6 +285,8 @@ int repo_init(struct repository *repo,
 	      const char *worktree)
 {
 	struct repository_format format = REPOSITORY_FORMAT_INIT;
+	struct strbuf err = STRBUF_INIT;
+
 	memset(repo, 0, sizeof(*repo));
 
 	initialize_repository(repo);
@@ -297,21 +294,13 @@ int repo_init(struct repository *repo,
 	if (repo_init_gitdir(repo, gitdir))
 		goto error;
 
-	if (read_and_verify_repository_format(&format, repo->commondir))
+	if (read_repository_format_from_commondir(&format, repo->commondir))
 		goto error;
 
-	repo_set_hash_algo(repo, format.hash_algo);
-	repo_set_compat_hash_algo(repo, format.compat_hash_algo);
-	repo_set_ref_storage_format(repo, format.ref_storage_format,
-				    format.ref_storage_payload);
-	repo->repository_format_worktree_config = format.worktree_config;
-	repo->repository_format_relative_worktrees = format.relative_worktrees;
-	repo->repository_format_precious_objects = format.precious_objects;
-	repo->repository_format_submodule_path_cfg = format.submodule_path_cfg;
-
-	/* take ownership of format.partial_clone */
-	repo->repository_format_partial_clone = format.partial_clone;
-	format.partial_clone = NULL;
+	if (apply_repository_format(repo, &format, &err) < 0) {
+		warning("%s", err.buf);
+		goto error;
+	}
 
 	if (worktree)
 		repo_set_worktree(repo, worktree);
@@ -320,10 +309,12 @@ int repo_init(struct repository *repo,
 		repo_read_loose_object_map(repo);
 
 	clear_repository_format(&format);
+	strbuf_release(&err);
 	return 0;
 
 error:
 	clear_repository_format(&format);
+	strbuf_release(&err);
 	repo_clear(repo);
 	return -1;
 }
diff --git a/setup.c b/setup.c
index 252b443117..c5015923f1 100644
--- a/setup.c
+++ b/setup.c
@@ -750,8 +750,7 @@ static int check_repo_format(const char *var, const char *value,
 	return read_worktree_config(var, value, ctx, vdata);
 }
 
-static int check_repository_format_gently(struct repository *repo,
-					  const char *gitdir,
+static int check_repository_format_gently(const char *gitdir,
 					  struct repository_format *candidate,
 					  int *nongit_ok)
 {
@@ -765,7 +764,7 @@ static int check_repository_format_gently(struct repository *repo,
 	strbuf_release(&sb);
 
 	/*
-	 * For historical use of check_repository_format() in git-init,
+	 * For historical use of check_and_apply_repository_format() in git-init,
 	 * we treat a missing config as a silent "ok", even when nongit_ok
 	 * is unset.
 	 */
@@ -782,8 +781,6 @@ static int check_repository_format_gently(struct repository *repo,
 		die("%s", err.buf);
 	}
 
-	repo->repository_format_precious_objects = candidate->precious_objects;
-
 	string_list_clear(&candidate->unknown_extensions, 0);
 	string_list_clear(&candidate->v1_only_extensions, 0);
 
@@ -1140,7 +1137,7 @@ static const char *setup_explicit_git_dir(struct repository *repo,
 		die(_("not a git repository: '%s'"), gitdirenv);
 	}
 
-	if (check_repository_format_gently(repo, gitdirenv, repo_fmt, nongit_ok)) {
+	if (check_repository_format_gently(gitdirenv, repo_fmt, nongit_ok)) {
 		free(gitfile);
 		return NULL;
 	}
@@ -1217,7 +1214,7 @@ static const char *setup_discovered_git_dir(struct repository *repo,
 					    struct repository_format *repo_fmt,
 					    int *nongit_ok)
 {
-	if (check_repository_format_gently(repo, gitdir, repo_fmt, nongit_ok))
+	if (check_repository_format_gently(gitdir, repo_fmt, nongit_ok))
 		return NULL;
 
 	/* --work-tree is set without --git-dir; use discovered one */
@@ -1265,7 +1262,7 @@ static const char *setup_bare_git_dir(struct repository *repo,
 {
 	int root_len;
 
-	if (check_repository_format_gently(repo, ".", repo_fmt, nongit_ok))
+	if (check_repository_format_gently(".", repo_fmt, nongit_ok))
 		return NULL;
 
 	setenv(GIT_IMPLICIT_WORK_TREE_ENVIRONMENT, "0", 1);
@@ -1757,6 +1754,32 @@ enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
 	return result;
 }
 
+int apply_repository_format(struct repository *repo,
+			    const struct repository_format *format,
+			    struct strbuf *err)
+{
+	if (verify_repository_format(format, err) < 0)
+		return -1;
+
+	repo_set_hash_algo(repo, format->hash_algo);
+	repo_set_compat_hash_algo(repo, format->compat_hash_algo);
+	repo_set_ref_storage_format(repo,
+				    format->ref_storage_format,
+				    format->ref_storage_payload);
+	repo->repository_format_worktree_config =
+		format->worktree_config;
+	repo->repository_format_submodule_path_cfg =
+		format->submodule_path_cfg;
+	repo->repository_format_relative_worktrees =
+		format->relative_worktrees;
+	repo->repository_format_partial_clone =
+		xstrdup_or_null(format->partial_clone);
+	repo->repository_format_precious_objects =
+		format->precious_objects;
+
+	return 0;
+}
+
 /*
  * Check the repository format version in the path found in repo_get_git_dir(repo),
  * and die if it is a version we don't understand. Generally one would
@@ -1765,26 +1788,20 @@ enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
  *
  * If successful and fmt is not NULL, fill fmt with data.
  */
-static void check_repository_format(struct repository *repo, struct repository_format *fmt)
+static void check_and_apply_repository_format(struct repository *repo,
+					      struct repository_format *fmt)
 {
 	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
+	struct strbuf err = STRBUF_INIT;
+
 	if (!fmt)
 		fmt = &repo_fmt;
-	check_repository_format_gently(repo, repo_get_git_dir(repo), fmt, NULL);
+
+	check_repository_format_gently(repo_get_git_dir(repo), fmt, NULL);
+	if (apply_repository_format(repo, fmt, &err) < 0)
+		die("%s", err.buf);
 	startup_info->have_repository = 1;
-	repo_set_hash_algo(repo, fmt->hash_algo);
-	repo_set_compat_hash_algo(repo, fmt->compat_hash_algo);
-	repo_set_ref_storage_format(repo,
-				    fmt->ref_storage_format,
-				    fmt->ref_storage_payload);
-	repo->repository_format_worktree_config =
-		fmt->worktree_config;
-	repo->repository_format_submodule_path_cfg =
-		fmt->submodule_path_cfg;
-	repo->repository_format_relative_worktrees =
-		fmt->relative_worktrees;
-	repo->repository_format_partial_clone =
-		xstrdup_or_null(fmt->partial_clone);
+
 	clear_repository_format(&repo_fmt);
 }
 
@@ -1862,7 +1879,7 @@ const char *enter_repo(struct repository *repo, const char *path, unsigned flags
 
 	if (is_git_directory(".")) {
 		set_git_dir(repo, ".", 0);
-		check_repository_format(repo, NULL);
+		check_and_apply_repository_format(repo, NULL);
 		return path;
 	}
 
@@ -2020,25 +2037,15 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env_internal(repo, gitdir, false);
 		}
+
 		if (startup_info->have_repository) {
-			repo_set_hash_algo(repo, repo_fmt.hash_algo);
-			repo_set_compat_hash_algo(repo,
-						  repo_fmt.compat_hash_algo);
-			repo_set_ref_storage_format(repo,
-						    repo_fmt.ref_storage_format,
-						    repo_fmt.ref_storage_payload);
-			repo->repository_format_worktree_config =
-				repo_fmt.worktree_config;
-			repo->repository_format_relative_worktrees =
-				repo_fmt.relative_worktrees;
-			repo->repository_format_submodule_path_cfg =
-				repo_fmt.submodule_path_cfg;
-			/* take ownership of repo_fmt.partial_clone */
-			repo->repository_format_partial_clone =
-				repo_fmt.partial_clone;
-			repo_fmt.partial_clone = NULL;
-			repo->repository_format_precious_objects =
-				repo_fmt.precious_objects;
+			struct strbuf err = STRBUF_INIT;
+
+			if (apply_repository_format(repo, &repo_fmt, &err) < 0)
+				die("%s", err.buf);
+
+			clear_repository_format(&repo_fmt);
+			strbuf_release(&err);
 		}
 	}
 	/*
@@ -2814,7 +2821,7 @@ int init_db(struct repository *repo,
 	 * config file, so this will not fail.  What we are catching
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
-	check_repository_format(repo, &repo_fmt);
+	check_and_apply_repository_format(repo, &repo_fmt);
 
 	repository_format_configure(repo, &repo_fmt, hash, ref_storage_format);
 
diff --git a/setup.h b/setup.h
index 9409326fe4..5ed92f53fa 100644
--- a/setup.h
+++ b/setup.h
@@ -221,6 +221,15 @@ void clear_repository_format(struct repository_format *format);
 int verify_repository_format(const struct repository_format *format,
 			     struct strbuf *err);
 
+/*
+ * Apply the given repository format to the repo. This initializes extensions
+ * and basic data structures required for normal operation. Returns 0 on
+ * success, a negative error code otherwise.
+ */
+int apply_repository_format(struct repository *repo,
+			    const struct repository_format *format,
+			    struct strbuf *err);
+
 const char *get_template_dir(const char *option_template);
 
 #define INIT_DB_QUIET      (1 << 0)

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2 2/8] setup: drop `setup_git_env()`
From: Patrick Steinhardt @ 2026-05-26  5:56 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

The `setup_git_env()` function is a trivial wrapper around
`setup_git_env_internal()` and has a single call site only. Drop the
function.

While at it, drop stale documentation in "environment.h" that points to
this function, even though it hasn't been exposed to callers outside of
"setup.c" since 43ad1047a9 (setup: stop using `the_repository` in
`setup_git_env()`, 2026-03-27) anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 environment.h | 8 +-------
 refs.c        | 3 ++-
 setup.c       | 7 +------
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/environment.h b/environment.h
index 9eb97b3869..ccfcf37bfb 100644
--- a/environment.h
+++ b/environment.h
@@ -130,13 +130,6 @@ void repo_config_values_init(struct repo_config_values *cfg);
  * `the_repository`. We should eventually get rid of these and make the
  * dependency on a repository explicit:
  *
- *   - `setup_git_env()` ideally shouldn't exist as it modifies global state,
- *     namely the environment. The current process shouldn't ever access that
- *     state via envvars though, but should instead consult a `struct
- *     repository`. When spawning new processes, we would ideally also pass a
- *     `struct repository` and then set up the environment variables for the
- *     child process, only.
- *
  *   - `have_git_dir()` should not have to exist at all. Instead, we should
  *     decide on whether or not we have a `struct repository`.
  *
@@ -147,6 +140,7 @@ void repo_config_values_init(struct repo_config_values *cfg);
  * Please do not add new global config variables here.
  */
 # ifdef USE_THE_REPOSITORY_VARIABLE
+
 /*
  * Returns true iff we have a configured git repository (either via
  * setup_git_directory, or in the environment via $GIT_DIR).
diff --git a/refs.c b/refs.c
index 0f3355d2ee..e7070eb743 100644
--- a/refs.c
+++ b/refs.c
@@ -126,7 +126,8 @@ struct ref_namespace_info ref_namespace[] = {
 		 * points to the content of another. Unlike the other
 		 * ref namespaces, this one can be changed by the
 		 * GIT_REPLACE_REF_BASE environment variable. This
-		 * .namespace value will be overwritten in setup_git_env().
+		 * .namespace value will be overwritten during repository
+		 * setup.
 		 */
 		.ref = "refs/replace/",
 		.decoration = DECORATION_GRAFTED,
diff --git a/setup.c b/setup.c
index d723306dfe..252b443117 100644
--- a/setup.c
+++ b/setup.c
@@ -1074,11 +1074,6 @@ static void setup_git_env_internal(struct repository *repo,
 		fetch_if_missing = 0;
 }
 
-static void setup_git_env(struct repository *repo, const char *git_dir)
-{
-	setup_git_env_internal(repo, git_dir, false);
-}
-
 static void set_git_dir_1(struct repository *repo, const char *path, bool skip_initializing_odb)
 {
 	xsetenv(GIT_DIR_ENVIRONMENT, path, 1);
@@ -2023,7 +2018,7 @@ const char *setup_git_directory_gently(struct repository *repo, int *nongit_ok)
 			const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
 			if (!gitdir)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
-			setup_git_env(repo, gitdir);
+			setup_git_env_internal(repo, gitdir, false);
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(repo, repo_fmt.hash_algo);

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2 0/8] setup: centralize object database creation
From: Patrick Steinhardt @ 2026-05-26  5:56 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260521-b4-pks-setup-centralize-odb-creation-v1-0-f130d2a7e8ae@pks.im>

Hi,

this small patch series refactors the logic for how we discover and
configure repositories. Most importantly, this involves the following
two steps:

  1. We unify the logic to apply the repository format, which is
     currently open-coded across multiple sites. These sites have
     already diverged, where some repository extensions are not
     consistently applied.

  2. We then centralize creation of the object database to happen at the
     same time we apply the repository format.

The end result is that we apply the repository format exactly once, and
that's also the point in time where we can finalize the setup of the
repo's data structures as we know about all details of the repo at that
time. Ultimately, this makes it trivial to introduce the "objectStorage"
extension, even though that's not part of this patch series.

The series is built on top of aec3f58750 (Sync with 'maint', 2026-05-21)
with ps/setup-wo-the-repository at df69f40c34 (setup: stop using
`the_repository` in `init_db()`, 2026-05-19) merged into it.

Changes in v2:
  - Commit message improvements.
  - Link to v1: https://patch.msgid.link/20260521-b4-pks-setup-centralize-odb-creation-v1-0-f130d2a7e8ae@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (8):
      t0001: plug test gaps for git-init(1) with GIT_OBJECT_DIRECTORY
      setup: drop `setup_git_env()`
      setup: deduplicate logic to apply repository format
      repository: stop initializing the object database in `repo_set_gitdir()`
      setup: stop creating the object database in `setup_git_env()`
      setup: stop initializing object database without repository
      repository: stop reading loose object map twice on repo init
      setup: construct object database in `apply_repository_format()`

 commit-graph.c  |   4 +-
 environment.h   |   8 +---
 refs.c          |   3 +-
 repository.c    |  40 +++++------------
 repository.h    |   3 --
 setup.c         | 130 +++++++++++++++++++++++++++++++-------------------------
 setup.h         |  19 +++++++++
 t/t0001-init.sh |  10 +++++
 8 files changed, 117 insertions(+), 100 deletions(-)

Range-diff versus v1:

1:  dd1fcc7096 ! 1:  14521d16e6 t0001: plug test gaps for git-init(1) with GIT_OBJECT_DIRECTORY
    @@ Commit message
         t0001: plug test gaps for git-init(1) with GIT_OBJECT_DIRECTORY
     
         In subsequent commits we'll rework how we set up the repository. This is
    -    a somewhat intricate and thus fragile sequence, there's many things that
    +    a somewhat intricate and thus fragile sequence; there's many things that
         can go subtly wrong, and there are lots of interesting interactions that
         one can discover.
     
         One such discovered edge case was the interaction between git-init(1)
    -    and the "GIT_OBJECT_DIRECTORY" enviroment variable. When set, the
    +    and the "GIT_OBJECT_DIRECTORY" environment variable. When set, the
         behaviour is that the object directory should be created at the path
         that the variable points to. This behaviour is documented as such in
         its man page:
    @@ Commit message
           directory is used.
     
         Curiously enough though we don't seem to have any tests that exercise
    -    this directly, and thus a subsequent commit inadvertently broke this
    -    expectation.
    +    this directly, and thus a subsequent commit inadvertently would have
    +    broken this expectation.
     
         Plug this test gap.
     
2:  b150ecc19f = 2:  9c099e511b setup: drop `setup_git_env()`
3:  9a638f22e8 = 3:  3bbddc1021 setup: deduplicate logic to apply repository format
4:  60f4647bc2 = 4:  5f531305b2 repository: stop initializing the object database in `repo_set_gitdir()`
5:  6d4013ac62 = 5:  57e1de3a36 setup: stop creating the object database in `setup_git_env()`
6:  f4465744a1 = 6:  944dbaeaf8 setup: stop initializing object database without repository
7:  4b6c61be63 = 7:  2eb5646afc repository: stop reading loose object map twice on repo init
8:  54e3a338e5 ! 8:  5322ce123e setup: construct object database in `apply_repository_format()`
    @@ Commit message
         variables GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES into
         the repository format, as well. This allows the caller more flexibility
         around whether or not those environment variables are being honored, as
    -    we do do want to respect them in "setup.c", but not in "repository.c".
    +    we want to respect them in "setup.c", but not in "repository.c".
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     

---
base-commit: 3398daa441965513c48744305d33bd36404547d6
change-id: 20260519-b4-pks-setup-centralize-odb-creation-3479c53fb11d


^ permalink raw reply

* [PATCH v2 1/8] t0001: plug test gaps for git-init(1) with GIT_OBJECT_DIRECTORY
From: Patrick Steinhardt @ 2026-05-26  5:56 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <20260526-b4-pks-setup-centralize-odb-creation-v2-0-2fa5b385c13e@pks.im>

In subsequent commits we'll rework how we set up the repository. This is
a somewhat intricate and thus fragile sequence; there's many things that
can go subtly wrong, and there are lots of interesting interactions that
one can discover.

One such discovered edge case was the interaction between git-init(1)
and the "GIT_OBJECT_DIRECTORY" environment variable. When set, the
behaviour is that the object directory should be created at the path
that the variable points to. This behaviour is documented as such in
its man page:

  If the object storage directory is specified via the
  GIT_OBJECT_DIRECTORY environment variable then the sha1 directories
  are created underneath; otherwise, the default $GIT_DIR/objects
  directory is used.

Curiously enough though we don't seem to have any tests that exercise
this directly, and thus a subsequent commit inadvertently would have
broken this expectation.

Plug this test gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t0001-init.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index e4d32bb4d2..e89feca544 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -980,4 +980,14 @@ test_expect_success 're-init reads matching includeIf.onbranch' '
 	test_cmp expect err
 '
 
+test_expect_success 'init honors GIT_OBJECT_DIRECTORY' '
+	test_when_finished "rm -rf init-objdir custom-odb" &&
+	mkdir custom-odb &&
+	env GIT_OBJECT_DIRECTORY="$(pwd)/custom-odb" \
+		git init init-objdir &&
+	test_path_is_missing init-objdir/.git/objects/pack &&
+	test_path_is_dir custom-odb/pack &&
+	test_path_is_dir custom-odb/info
+'
+
 test_done

-- 
2.54.0.926.g75ba10bac6.dirty


^ permalink raw reply related

* [PATCH v2] doc: clarify push.default=simple behavior
From: Ivan Baluta via GitGitGadget @ 2026-05-26  3:58 UTC (permalink / raw)
  To: git; +Cc: Ivan Baluta, Ivan Baluta
In-Reply-To: <pull.2115.git.1779433093971.gitgitgadget@gmail.com>

From: Ivan Baluta <ivanbaluta.dev@gmail.com>

The documentation for the 'simple' push mode currently singles out
the centralized workflow, which can cause confusion about its
behavior in other scenarios, such as triangular workflows.

Clarify that 'simple' always pushes the current branch to a branch
of the same name, but only enforces the strict upstream tracking
requirement when pushing back to the same remote being pulled from.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ivan Baluta <ivanbaluta.dev@gmail.com>
---
    doc: clarify push.default=simple in triangular workflows

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2115%2Fivanbaluta%2Fdoc-push-simple-triangular-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2115/ivanbaluta/doc-push-simple-triangular-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2115

Range-diff vs v1:

 1:  37ff842622 ! 1:  3deb7f5b0c doc: clarify push.default=simple in triangular workflows
     @@
       ## Metadata ##
     -Author: ivanbaluta <ivanbaluta.dev@gmail.com>
     +Author: Ivan Baluta <ivanbaluta.dev@gmail.com>
      
       ## Commit message ##
     -    doc: clarify push.default=simple in triangular workflows
     +    doc: clarify push.default=simple behavior
      
     -    The documentation for 'simple' push mode currently focuses on the
     -    centralized workflow. However, the implementation in builtin/push.c
     -    falls back to 'current' behavior when pushing to a remote different
     -    from the upstream (a triangular workflow).
     +    The documentation for the 'simple' push mode currently singles out
     +    the centralized workflow, which can cause confusion about its
     +    behavior in other scenarios, such as triangular workflows.
      
     -    Clarify this in the manual to align the documentation with the
     -    long-standing implementation and prevent user confusion.
     +    Clarify that 'simple' always pushes the current branch to a branch
     +    of the same name, but only enforces the strict upstream tracking
     +    requirement when pushing back to the same remote being pulled from.
      
     -    Signed-off-by: ivanbaluta <ivanbaluta.dev@gmail.com>
     +    Suggested-by: Junio C Hamano <gitster@pobox.com>
     +    Signed-off-by: Ivan Baluta <ivanbaluta.dev@gmail.com>
      
       ## Documentation/config/push.adoc ##
     -@@ Documentation/config/push.adoc: If you are working on a centralized workflow (pushing to the same repository you
     - pull from, which is typically `origin`), then you need to configure an upstream
     - branch with the same name.
     +@@ Documentation/config/push.adoc: this is a deprecated synonym for `upstream`.
     + `simple`;;
     + push the current branch with the same name on the remote.
     + +
     +-If you are working on a centralized workflow (pushing to the same repository you
     +-pull from, which is typically `origin`), then you need to configure an upstream
     +-branch with the same name.
     ++This mode requires that the remote repository to be pushed to is
     ++known.  When pushing back to the same remote you pull from, the
     ++current branch must also have an upstream tracking branch with the
     ++same name.
       +
     -+In a triangular workflow (pushing to a remote different from the upstream),
     -+`simple` behaves like `current`.
     -++
       This mode is the default since Git 2.0, and is the safest option suited for
       beginners.
     - 


 Documentation/config/push.adoc | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/push.adoc b/Documentation/config/push.adoc
index d9112b2260..28132eedfe 100644
--- a/Documentation/config/push.adoc
+++ b/Documentation/config/push.adoc
@@ -41,9 +41,10 @@ this is a deprecated synonym for `upstream`.
 `simple`;;
 push the current branch with the same name on the remote.
 +
-If you are working on a centralized workflow (pushing to the same repository you
-pull from, which is typically `origin`), then you need to configure an upstream
-branch with the same name.
+This mode requires that the remote repository to be pushed to is
+known.  When pushing back to the same remote you pull from, the
+current branch must also have an upstream tracking branch with the
+same name.
 +
 This mode is the default since Git 2.0, and is the safest option suited for
 beginners.

base-commit: 59ff4886a579f4bc91e976fe18590b9ae02c7a08
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH 2/2] restore: avoid sparse index expansion
From: Derrick Stolee @ 2026-05-26  2:54 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git
In-Reply-To: <xmqqtsrwh0hx.fsf@gitster.g>

On 5/24/26 7:05 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:


>> -	if (S_ISDIR(mode))
>> +	if (S_ISDIR(mode)) {
>> +		/*
>> +		 * If this directory exists as a sparse directory entry in
>> +		 * the index, we can handle it at the tree level without
>> +		 * descending into individual files.
>> +		 */
>> +		if (the_repository->index->sparse_index) {
> 
> I wonder if this deep nesting is a sign that the newly added code
> from here to ...
> 
>> +			struct strbuf dirpath = STRBUF_INIT;
>> +
>> +			strbuf_addbuf(&dirpath, base);
>> +			strbuf_addstr(&dirpath, pathname);
>> +			strbuf_addch(&dirpath, '/');
>> +
>> +			pos = index_name_pos_sparse(the_repository->index,
>> +						    dirpath.buf, dirpath.len);
>> +			if (pos >= 0) {
>> +				struct cache_entry *old =
>> +					the_repository->index->cache[pos];
>> +				if (S_ISSPARSEDIR(old->ce_mode)) {
>> +					if (oideq(oid, &old->oid)) {
>> +						strbuf_release(&dirpath);
>> +						return 0;
>> +					}
>> +					if (!overlay_mode) {
>> +						/*
>> +						 * In non-overlay mode (e.g.,
>> +						 * restore --staged), we can
>> +						 * replace the sparse dir OID
>> +						 * directly since files not in
>> +						 * the source tree should be
>> +						 * removed anyway.
>> +						 */
>> +						oidcpy(&old->oid, oid);
>> +						old->ce_flags |= CE_UPDATE;
>> +						strbuf_release(&dirpath);
>> +						return 0;
>> +					}
>> +				}
>> +			}
>> +			strbuf_release(&dirpath);
>> +		}
> 
> ... here may become easier to understand if it is made into a small
> helper function with a descriptive name.

Good idea. I'll try that and send a v2.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process
From: Junio C Hamano @ 2026-05-26  2:26 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget; +Cc: git, Michael Montalbo
In-Reply-To: <c25647c6e571e293fc994e0620ca37709f680f8a.1779733799.git.gitgitgadget@gmail.com>

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Zero hunks with status=success means the tool considers the
> files equivalent.  Git skips diff output for that file.

Is "zero hunk" a common word or some random string you invented?  If
the latter, which is I am assuming it to be, you should define what
it means at/before the first use.  Here in the proposed log message,
and ...

>
> Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
> ---
>  Documentation/config/diff.adoc   |   8 +
>  Documentation/gitattributes.adoc |  40 ++++
>  Makefile                         |   1 +
>  diff-process.c                   | 206 +++++++++++++++++++
>  diff-process.h                   |  28 +++
>  diff.c                           |  23 +++
>  t/.gitattributes                 |   1 +
>  t/t4080-diff-process.sh          | 338 +++++++++++++++++++++++++++++++
>  8 files changed, 645 insertions(+)
>  create mode 100644 diff-process.c
>  create mode 100644 diff-process.h
>  create mode 100755 t/t4080-diff-process.sh
>
> diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
> index 1135a62a0a..4ab5f60df6 100644
> --- a/Documentation/config/diff.adoc
> +++ b/Documentation/config/diff.adoc
> @@ -218,6 +218,14 @@ endif::git-diff[]
>  	Set this option to `true` to make the diff driver cache the text
>  	conversion outputs.  See linkgit:gitattributes[5] for details.
>  
> +`diff.<driver>.process`::
> +	The command to run as a long-running diff process.
> +	The tool communicates via the pkt-line protocol and returns
> +	hunks that are fed into Git's diff and blame pipelines.
> +	If the tool returns zero hunks, the file is treated as
> +	unchanged for both diff output and blame attribution.
> +	See linkgit:gitattributes[5] for details.

... also here.

I do not know if you mean "the tool returns no hunks" (there is no
"hunk <old_start> <old_count> <new_start> <new_count>" line passed
from the tool over the protocol) or "the tool returns zero-hunk"
(there is a special "zero-hunk" message to signal this particular
condition sent over the protocol), and this description does not
quite help disambiguating between the two.

If the former, then avoid "zero hunks" as it sounds like a noun with
special meaning.  Yes, we can say "tool returns one hunk", "tool
returns 31 hunks", etc., so "tool returns zero hunks" may logically
be correct, but "when the tool returns no hunks with status=success"
is much less confusing, I think.

^ permalink raw reply

* Re: [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process
From: Junio C Hamano @ 2026-05-26  1:56 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget; +Cc: git, Michael Montalbo
In-Reply-To: <c25647c6e571e293fc994e0620ca37709f680f8a.1779733799.git.gitgitgadget@gmail.com>

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +struct diff_subprocess {
> +	struct subprocess_entry subprocess;
> +	unsigned int supported_capabilities;
> +};
> +
> +static int subprocess_map_initialized;
> +static struct hashmap subprocess_map;

Can we avoid introducing new global variables like these?  Would
"struct userdiff_driver" or "struct diff_options" be a good place to
hang this hashmap, perhaps?

> +static int send_file_content(int fd, const char *buf, long size)
> +{
> +	int ret;
> +
> +	if (size > 0)
> +		ret = write_packetized_from_buf_no_flush(buf, size, fd);
> +	else
> +		ret = 0;

Shouldn't "size == -24" be flagged as an invalid input?

> +	if (ret)
> +		return ret;
> +	return packet_flush_gently(fd);
> +}

> +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
> +{
> +...
> +}

This gives a silent error diagnosis, which is good for a lower level
helper.

> +int diff_process_get_hunks(struct userdiff_driver *drv,
> +			   const char *path,
> +			   const char *old_buf, long old_size,
> +			   const char *new_buf, long new_size,
> +			   struct xdl_hunk **hunks_out,
> +			   size_t *nr_hunks_out)
> +{
> +	struct diff_subprocess *backend;
> +	struct child_process *process;
> +	int fd_in, fd_out;
> +	struct strbuf status = STRBUF_INIT;
> +	struct xdl_hunk *hunks = NULL;
> +	struct xdl_hunk hunk;
> +	size_t nr_hunks = 0, alloc_hunks = 0;
> +	int len;
> +	char *line;
> +
> +	if (!drv || !drv->process)
> +		return -1;

A driver that does not define process is not an error; it is
perfectly normal in the current world order where nobody has such an
external process and even fi this patch lands, external processes
are optional.  So here "return -1" does not mean an error, and
silent return is perfectly fine.

> +	backend = find_or_start_process(drv->process);
> +	if (!backend)
> +		return -1;

This is probably an error; the user specified drv->process, we
either tried to find or start the process and failed.  Isn't it an
event that deserves to be reported in an error message?

> +	if (!(backend->supported_capabilities & CAP_HUNKS))
> +		return -1;

Backend started, but the "hunks" feature is not supported.  Perhaps
in a year or two, this external process protocol may have become so
popular that it gained more capabilities, possibly making get_hunks
obsolete.  We may be looking at such an external process that uses
other capabilities but not this one.  This is not an error, so
silent return is perfectly fine.

> +	process = subprocess_get_child_process(&backend->subprocess);
> +	fd_in = process->in;
> +	fd_out = process->out;
> +
> +	/* Send request */
> +	if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
> +	    packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
> +	    packet_flush_gently(fd_in))
> +		goto error;
> +
> +	/* Send old file content */
> +	if (send_file_content(fd_in, old_buf, old_size))
> +		goto error;
> +
> +	/* Send new file content */
> +	if (send_file_content(fd_in, new_buf, new_size))
> +		goto error;
> +
> +	/* Read hunks until flush packet */
> +	while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
> +	       line) {
> +		if (parse_hunk_line(line, &hunk) < 0)
> +			goto error;
> +		ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
> +		hunks[nr_hunks++] = hunk;
> +	}
> +	if (len < 0)
> +		goto error;
> +
> +	/* Read status */
> +	if (subprocess_read_status(fd_out, &status))
> +		goto error;
> +
> +	if (strcmp(status.buf, "success")) {
> +		if (!strcmp(status.buf, "abort"))
> +			backend->supported_capabilities &= ~CAP_HUNKS;
> +		goto error;
> +	}
> +
> +	*hunks_out = hunks;
> +	*nr_hunks_out = nr_hunks;
> +	strbuf_release(&status);
> +	return 0;
> +
> +error:

All exceptions that lead here look like events that should be
reported to the end-user.

> +	free(hunks);
> +	strbuf_release(&status);
> +	return -1;
> +}

> +/*
> + * Query a diff process for hunks describing the changes
> + * between old_buf and new_buf.
> + *
> + * The backend is a long-running subprocess configured via
> + * diff.<driver>.process.  It receives file content via
> + * pkt-line and returns hunks with 1-based line numbers.
> + *
> + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
> + * array (caller must free) and returns 0.
> + *
> + * On failure, returns -1.  The caller should fall back to the
> + * builtin diff algorithm.
> + */

I do not agree with this.  If it is a failure, the user should fix
the external process (or disable).  It shouldn't be hidden behind a
fallback.  As I left comments, in this round of implementation,
there are conditions that returns -1 for soemthing that is not an
error (i.e., not configured, or process not supporting the
particular capability) *and* in those cases the caller should fall
back as if nothing happened.  But some error cases, the caller
should't hide them.

^ permalink raw reply

* Re: Expected test suite behavior
From: Junio C Hamano @ 2026-05-26  1:11 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Amogh Dambal, Jeff King, Michael Montalbo, git
In-Reply-To: <ahTsTDhVPkHTEbB_@fruit.crustytoothpaste.net>

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I think I know what the problem is: you're running as root.  I suspect
> `test -x` in the test says that you have permission to execute it
> because you're root and root always ignores permissions.  My guess is
> that most of the tests you're failing have to do with permissions of
> some sort that are being ignored because you're privileged.

I know !SANITY defeats a-rw and lets the tester read or write to the
path, but this is the first time I heard that !SANITY defeats a-x
and lets the tester _execute_ it.

I do not think we drop POSIXPERM prereq when !SANITY automatically,
and I do not think we should, but we should probably have an
automated check to drop POSIXPERM?

    (
	# is an unexecutable look to the user as executable?
	umask 0; >testfile; chmod a-w testfile; test -x testfile;
	status=$?;
	rm -f testfile; exit $status
    )

> I'll just note that if you just want to do Git development, macOS is a
> fully supported platform on which to do that.  I will admit most of the
> major contributors (with the notable exception of the Git for Windows
> maintainer) do use Linux and of course I like and endorse Debian, but
> macOS should build and run just fine if you prefer that.

Hear hear.

We want to encourage developers to do more _native_ development on
their own system, and this is not limited to macOS.  As long as
users on a particular platform rely on working Git there, we are
better off if we have more people actively using that platform to
build, test, develop on, and debug Git for their own use.

Thanks.


^ permalink raw reply

* Re: Expected test suite behavior
From: brian m. carlson @ 2026-05-26  0:41 UTC (permalink / raw)
  To: Amogh Dambal; +Cc: Jeff King, Michael Montalbo, git
In-Reply-To: <4649049a-ded5-4cc6-bc2b-d5f543e6df99@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2679 bytes --]

On 2026-05-25 at 22:25:11, Amogh Dambal wrote:
> > What are the OS and file system on the host?  We tend to see
> > executable bits set when NTFS, FAT, or other Windows-adjacent file
> > systems are used on Linux and you're mounting `$(PWD)` into the
> > container as a volume.
> 
> Ah, this is a smoking gun. I'm not on a Windows-adjacent file system; I'm
> running macOS Sequoia 15.5 on the host. Specifically:
> 
> $ uname -msprsv
> Darwin 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:26 PDT 2025;
> root:xnu-11417.121.6~2/RELEASE_ARM64_T8112 arm64 arm
> 
> But I am mounting $(PWD) into the container as a volume.

I wouldn't expect that to be a problem, then.  macOS uses Unix-style
permissions and I've never seen odd permissions behaviour mounting a
macOS APFS file system into a container.  I will, however, note that I
am using a case-sensitive APFS volume, but I cannot imagine how this
would occur with _any_ macOS APFS volume mounted into a Linux container.

> > Git doesn't use `/tmp` for most files in the tests.  Those are stored
> > under `t/`, so you'd want to create your test directory there.
> 
> ACK, good to know, thanks! I am still seeing the same behavior with a
> `debug` directory under `t/`:
> 
> root@ec94ab1b260e:~/git/t/debug# /root/git/git init plain
> root@ec94ab1b260e:~/git/t/debug# ls -alhrt
> /root/git/t/debug/plain/.git/config
> -rw-r--r-- 1 root root 111 May 25 22:24 /root/git/t/debug/plain/.git/config

I think I know what the problem is: you're running as root.  I suspect
`test -x` in the test says that you have permission to execute it
because you're root and root always ignores permissions.  My guess is
that most of the tests you're failing have to do with permissions of
some sort that are being ignored because you're privileged.

In general, you would not want to run this as root.  Use `adduser` to
create yourself a regular user and then use the `USER` directive in the
Dockerfile to change users.  I don't run the tests as root and I'm sure
none of the other regular contributors do, either.  Running unprivileged
in a container is a best practice anyway.

I'll just note that if you just want to do Git development, macOS is a
fully supported platform on which to do that.  I will admit most of the
major contributors (with the notable exception of the Git for Windows
maintainer) do use Linux and of course I like and endorse Debian, but
macOS should build and run just fine if you prefer that.

But if you do want to use a container, I'd try unprivileged and see if
substantially more of the tests pass.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 325 bytes --]

^ permalink raw reply

* Re: How does git track history overwrites?
From: Jens Tröger @ 2026-05-25 23:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Chris Torek, git
In-Reply-To: <xmqqfr3fnl2n.fsf@gitster.g>

Hello Junio,

> Sorry, I may have been unclear.  I specifically meant the "grafted,
> " part in the message.  I know how "git log" output looks like ;-)

The “grafted” too was part of the git log output; here is the complete
cmd line output:

    /tmp/bla > git log
    commit fda77690955e9b63c6687d8806bafd56a526e45f (grafted, HEAD)
    Author: Adam Johnson <me@adamj.eu>
    Date:   Mon Sep 8 16:31:35 2025 +0100

        Version 1.20.0

On *why* the “grafted” is there in the first place, I suspect that’s got
to do with the fetch --depth=1 and that previous history isn’t available
in the shallow repo clone.

Cheers,
Jens

^ permalink raw reply

* Re: How does git track history overwrites?
From: Junio C Hamano @ 2026-05-25 23:09 UTC (permalink / raw)
  To: Jens Tröger; +Cc: Chris Torek, git
In-Reply-To: <074E783A-027D-4C5B-BC44-CC38C53735D7@light-speed.de>

Jens Tröger <jens.troeger@light-speed.de> writes:

> Thank you Chris and Junio!
>
>
>> [Junio] Where does this line in your discussion page at GitHub (which is
>> omitted from the post to this list) come from?
>> 
>>    commit fda77690955e9b63c6687d8806bafd56a526e45f (grafted, HEAD)
>> 
>> Are you doing anything funky with .git/info/grafts by any chance?
>
> That line is the result of a `git log` after the `git fetch` I mentioned in my initial email.

Sorry, I may have been unclear.  I specifically meant the "grafted,
" part in the message.  I know how "git log" output looks like ;-)


^ permalink raw reply

* Re: [PATCH v2] receive-pack: fix updateInstead with core.worktree
From: Junio C Hamano @ 2026-05-25 22:54 UTC (permalink / raw)
  To: Alyssa Ross
  Cc: git, Ævar Arnfjörð Bjarmason, Kristoffer Haugsbakk
In-Reply-To: <20260525162311.66240-2-hi@alyssa.is>

Alyssa Ross <hi@alyssa.is> writes:

> Previously, only one of push_to_checkout() or push_to_deploy() was
> called.  In a8cc594333 (hooks: fix an obscure TOCTOU "did we just run a
> hook?" race, 2022-03-07), this was changed to always call
> push_to_checkout(), and then to call push_to_deploy() if
> push_to_checkout() didn't run anything.  This change didn't take into
> account that push_to_checkout() had a side effect of modifying env, and
> that modified env broke updating the worktree in push_to_deploy() if
> core.worktree was configured.  To fix this, only mutate the environment
> used inside push_to_commit(), rather than the environment that might
> later be passed to push_to_deploy().
>
> Signed-off-by: Alyssa Ross <hi@alyssa.is>
> ---
> v2: reword commit message in response to feedback

You also fixed an incorrectly indentated line in the new test, which
is very much appreciated.

Will queue.  Thanks.

>
>  builtin/receive-pack.c |  2 +-
>  t/t5516-fetch-push.sh  | 11 +++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index c7b2818f20..7ee157532d 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1460,8 +1460,8 @@ static const char *push_to_checkout(unsigned char *hash,
>  
>  	opt.invoked_hook = invoked_hook;
>  
> -	strvec_pushf(env, "GIT_WORK_TREE=%s", absolute_path(work_tree));
>  	strvec_pushv(&opt.env, env->v);
> +	strvec_pushf(&opt.env, "GIT_WORK_TREE=%s", absolute_path(work_tree));
>  	strvec_push(&opt.args, hash_to_hex(hash));
>  	if (run_hooks_opt(the_repository, push_to_checkout_hook, &opt))
>  		return "push-to-checkout hook declined";
> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index 117cfa051f..db6cc18673 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -1791,6 +1791,17 @@ test_expect_success 'updateInstead with push-to-checkout hook' '
>  	)
>  '
>  
> +test_expect_success 'denyCurrentBranch and core.worktree' '
> +	test_when_finished "rm -fr cloned cloned.git" &&
> +	git clone --separate-git-dir cloned.git . cloned &&
> +	git --git-dir cloned.git config receive.denyCurrentBranch updateInstead &&
> +	git --git-dir cloned.git config core.worktree "$PWD/cloned" &&
> +	test_commit raspberry &&
> +	git push cloned.git HEAD:main &&
> +	test_path_exists cloned/raspberry.t &&
> +	test_must_fail git push --delete cloned.git main
> +'
> +
>  test_expect_success 'denyCurrentBranch and worktrees' '
>  	test_when_finished "rm -fr cloned && git worktree remove --force new-wt" &&
>  	git worktree add new-wt &&
>
> base-commit: aec3f587505a472db67e9462d0702e7d463a449d

^ permalink raw reply

* Re: [PATCH v2 2/3] commit-reach: deduplicate queue entries in paint_down_to_common
From: Junio C Hamano @ 2026-05-25 22:50 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget
  Cc: git, Derrick Stolee, Jeff King, Kristofer Karlsson
In-Reply-To: <fc38c0f856e93b80073ec3f1b9f641b9ab187e4e.1779719286.git.gitgitgadget@gmail.com>

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Kristofer Karlsson <krka@spotify.com>
>
> paint_down_to_common() can enqueue the same commit multiple times
> when it is reached through different parents with different flag
> combinations. Add an ENQUEUED flag to track whether a commit is
> currently in the priority queue, and skip it if already present.
>
> Introduce prio_queue_put_dedup() and prio_queue_get_dedup()
> wrappers that manage the ENQUEUED flag on enqueue and dequeue.

OK.  I guess an obvious alternative design would be to have an
associated hashtable for deduping, or tweak prio_queue_get() so
that it notices duplicated entry just before it returns (i.e.,
peek and discard until queue->array[0].data is different from
what you are going to return).  Both would not beat the cheap cost
of using a single bit per object, I guess ;-)

> This change is performance-neutral on its own: the O(n)
> queue_has_nonstale() scan still dominates the per-iteration cost.
> However, the deduplication guarantee (each commit appears in the
> queue at most once) is a prerequisite for the next commit, which
> replaces that scan with O(1) tracking.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
>  commit-reach.c | 27 ++++++++++++++++++++++-----
>  object.h       |  2 +-
>  2 files changed, 23 insertions(+), 6 deletions(-)

Thanks for the clean-up in the previous step, by the way.

> diff --git a/commit-reach.c b/commit-reach.c
> index 5a52be90a6..85583ae359 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -17,8 +17,9 @@
>  #define PARENT2		(1u<<17)
>  #define STALE		(1u<<18)
>  #define RESULT		(1u<<19)
> +#define ENQUEUED	(1u<<20)
>  
> -static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
> +static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT | ENQUEUED);
>  
>  static int compare_commits_by_gen(const void *_a, const void *_b)
>  {
> @@ -39,6 +40,22 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
>  	return 0;
>  }
>  
> +static void prio_queue_put_dedup(struct prio_queue *queue, struct commit *c)
> +{
> +	if (c->object.flags & ENQUEUED)
> +		return;
> +	c->object.flags |= ENQUEUED;
> +	prio_queue_put(queue, c);
> +}
> +
> +static struct commit *prio_queue_get_dedup(struct prio_queue *queue)
> +{
> +	struct commit *commit = prio_queue_get(queue);
> +	if (commit)
> +		commit->object.flags &= ~ENQUEUED;
> +	return commit;
> +}
> +
>  static int queue_has_nonstale(struct prio_queue *queue)
>  {
>  	for (size_t i = 0; i < queue->nr; i++) {
> @@ -70,15 +87,15 @@ static int paint_down_to_common(struct repository *r,
>  		commit_list_append(one, result);
>  		return 0;
>  	}
> -	prio_queue_put(&queue, one);
> +	prio_queue_put_dedup(&queue, one);
>  
>  	for (i = 0; i < n; i++) {
>  		twos[i]->object.flags |= PARENT2;
> -		prio_queue_put(&queue, twos[i]);
> +		prio_queue_put_dedup(&queue, twos[i]);
>  	}
>  
>  	while (queue_has_nonstale(&queue)) {
> -		struct commit *commit = prio_queue_get(&queue);
> +		struct commit *commit = prio_queue_get_dedup(&queue);
>  		struct commit_list *parents;
>  		int flags;
>  		timestamp_t generation = commit_graph_generation(commit);
> @@ -132,7 +149,7 @@ static int paint_down_to_common(struct repository *r,
>  					     oid_to_hex(&p->object.oid));
>  			}
>  			p->object.flags |= flags;
> -			prio_queue_put(&queue, p);
> +			prio_queue_put_dedup(&queue, p);
>  		}
>  	}
>  
> diff --git a/object.h b/object.h
> index 2b26de3044..8fb03ff90a 100644
> --- a/object.h
> +++ b/object.h
> @@ -75,7 +75,7 @@ void object_array_init(struct object_array *array);
>   * bundle.c:                                        16
>   * http-push.c:                          11-----14
>   * commit-graph.c:                                15
> - * commit-reach.c:                                  16-----19
> + * commit-reach.c:                                  16-------20
>   * builtin/last-modified.c:                         1617
>   * object-name.c:                                            20
>   * list-objects-filter.c:                                      21

^ permalink raw reply

* Re: How does git track history overwrites?
From: Jens Tröger @ 2026-05-25 22:47 UTC (permalink / raw)
  To: Junio C Hamano, Chris Torek; +Cc: git
In-Reply-To: <87se7gasn8.fsf@gitster.g>

Thank you Chris and Junio!


> [Junio] Where does this line in your discussion page at GitHub (which is
> omitted from the post to this list) come from?
> 
>    commit fda77690955e9b63c6687d8806bafd56a526e45f (grafted, HEAD)
> 
> Are you doing anything funky with .git/info/grafts by any chance?

That line is the result of a `git log` after the `git fetch` I mentioned in my initial email.


> [Chris] To really understand this properly, we need to understand
> the root of a seeming contradiction:
> 
> [...]

Thank you for your elaborate explanation, Chris, that all makes a lot of sense. A few follow-up questions:

 •  Is all the object information stored with a repo clone locally as well, or does some/most/all of it stay on the remote server repo?
 •  How exactly does git connect the dots between commit dda8db1 and fda7769, how does it “know” the former was superseded by the latter (i.e. I fetch the former and Git uses the latter for head)?
 •  Based on the previous question, can I manually find such a connection between two commit objects too?

It sounds like reading some internals would be helpful. I’m noodling through https://git-scm.com/book/en/v2/Git-Internals-Git-Objects but perhaps you have some more recommendations?

With many greetings,
Jens


^ permalink raw reply

* Re: Expected test suite behavior
From: Amogh Dambal @ 2026-05-25 22:25 UTC (permalink / raw)
  To: brian m. carlson, Jeff King, Michael Montalbo, git
In-Reply-To: <ahTKq_zCmEDJpoN5@fruit.crustytoothpaste.net>

 > What are the OS and file system on the host?  We tend to see
 > executable bits set when NTFS, FAT, or other Windows-adjacent file
 > systems are used on Linux and you're mounting `$(PWD)` into the
 > container as a volume.

Ah, this is a smoking gun. I'm not on a Windows-adjacent file system; 
I'm running macOS Sequoia 15.5 on the host. Specifically:

$ uname -msprsv
Darwin 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:26 PDT 
2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T8112 arm64 arm

But I am mounting $(PWD) into the container as a volume.


 > Git doesn't use `/tmp` for most files in the tests.  Those are stored
 > under `t/`, so you'd want to create your test directory there.

ACK, good to know, thanks! I am still seeing the same behavior with a 
`debug` directory under `t/`:

root@ec94ab1b260e:~/git/t/debug# /root/git/git init plain
root@ec94ab1b260e:~/git/t/debug# ls -alhrt 
/root/git/t/debug/plain/.git/config
-rw-r--r-- 1 root root 111 May 25 22:24 /root/git/t/debug/plain/.git/config

^ permalink raw reply

* Re: Expected test suite behavior
From: brian m. carlson @ 2026-05-25 22:18 UTC (permalink / raw)
  To: Amogh Dambal; +Cc: Jeff King, Michael Montalbo, git
In-Reply-To: <23221493-ea81-47c3-9647-6c6ac8d03360@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1522 bytes --]

On 2026-05-25 at 22:01:23, Amogh Dambal wrote:
> `GIT_TEST_OPTS=--verbose` was very illuminating. I captured STDOUT/STDERR:
> `make test GIT_TEST_OPTS=--verbose &> git-test-verbose-fail.tmp`, which
> shows that almost every test fails `check_config` because a `git init` is
> creating a `.git/config` file whose executable bit is set:

What are the OS and file system on the host?  We tend to see
executable bits set when NTFS, FAT, or other Windows-adjacent file
systems are used on Linux and you're mounting `$(PWD)` into the
container as a volume.

> --
> Initialized empty Git repository in /root/git/t/trash
> directory.t0001-init/plain/.git/
> plain/.git/config is executable?
> not ok 1 - plain
> 
> [...]
> 
> 
> However, I'm not able to reproduce this, e.g. directly using the local built
> binary seems to work fine:
> 
> mkdir -p /tmp/debug && cd /tmp/debug
> /root/git/git init plain
> ls -alhrt /tmp/debug/plain/.git
> root@ec94ab1b260e:/tmp/debug# ls -alhrt /tmp/debug/plain/.git
> total 24K
> -rw-r--r-- 1 root root   92 May 25 21:26 config
> drwxr-xr-x 3 root root 4.0K May 25 21:26 ..
> drwxr-xr-x 4 root root 4.0K May 25 21:26 refs
> drwxr-xr-x 4 root root 4.0K May 25 21:26 objects
> -rw-r--r-- 1 root root   23 May 25 21:26 HEAD
> drwxr-xr-x 4 root root 4.0K May 25 21:26 .

Git doesn't use `/tmp` for most files in the tests.  Those are stored
under `t/`, so you'd want to create your test directory there.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 325 bytes --]

^ permalink raw reply

* Re: Expected test suite behavior
From: Amogh Dambal @ 2026-05-25 22:01 UTC (permalink / raw)
  To: Jeff King, Michael Montalbo; +Cc: git
In-Reply-To: <20260525072711.GE2737798@coredump.intra.peff.net>

Thanks for the replies, and the pointers!


 >> Hello. If you run `make test GIT_TEST_OPTS=--verbose` or uncomment
 >> L16 of t/Makefile is there more information describing the issue?

 > You can use --verbose-log instead, and
 > then output is in t/test-results/t1234-whatever.out.


`GIT_TEST_OPTS=--verbose` was very illuminating. I captured 
STDOUT/STDERR: `make test GIT_TEST_OPTS=--verbose &> 
git-test-verbose-fail.tmp`, which shows that almost every test fails 
`check_config` because a `git init` is creating a `.git/config` file 
whose executable bit is set:

--
Initialized empty Git repository in /root/git/t/trash 
directory.t0001-init/plain/.git/
plain/.git/config is executable?
not ok 1 - plain

[...]


However, I'm not able to reproduce this, e.g. directly using the local 
built binary seems to work fine:

mkdir -p /tmp/debug && cd /tmp/debug
/root/git/git init plain
ls -alhrt /tmp/debug/plain/.git
root@ec94ab1b260e:/tmp/debug# ls -alhrt /tmp/debug/plain/.git
total 24K
-rw-r--r-- 1 root root   92 May 25 21:26 config
drwxr-xr-x 3 root root 4.0K May 25 21:26 ..
drwxr-xr-x 4 root root 4.0K May 25 21:26 refs
drwxr-xr-x 4 root root 4.0K May 25 21:26 objects
-rw-r--r-- 1 root root   23 May 25 21:26 HEAD
drwxr-xr-x 4 root root 4.0K May 25 21:26 .

In fact, even the specific test directory that I assume is being checked 
reports properly set bits on the `config` file:

root@f695346357b9:~/git# ls -alhrt t/trash\ directory.t0001-init/plain/.git/
total 12K
drwxr-xr-x  3 root root  96 May 25 21:08 ..
drwxr-xr-x  3 root root  96 May 25 21:08 info
-rwxr-xr-x  1 root root  73 May 25 21:08 description
drwxr-xr-x 16 root root 512 May 25 21:08 hooks
-rw-r--r--  1 root root 111 May 25 21:08 config
drwxr-xr-x  4 root root 128 May 25 21:08 refs
-rw-r--r--  1 root root  23 May 25 21:08 HEAD
drwxr-xr-x  9 root root 288 May 25 21:08 .
drwxr-xr-x  4 root root 128 May 25 21:08 objects

I've further checked that the POSIXPERM requirement is being set and 
verified that there are no shell differences (I'm using /bin/bash as my 
main shell in the Docker container, while the test scripts use /bin/sh 
(which on this Docker container is symlinked to `dash`):

root@ec94ab1b260e:~/git/t# ls -alhrt /bin/sh
lrwxrwxrwx 1 root root 4 Feb  4  2025 /bin/sh -> dash

I'm sure that I'm missing something blindingly obvious with respect to 
my setup, but I have not yet been able to figure out what that might be.

^ permalink raw reply

* [PATCH v2 3/3] line-log: allow non-patch diff formats with -L
From: Michael Montalbo via GitGitGadget @ 2026-05-25 19:40 UTC (permalink / raw)
  To: git; +Cc: D. Ben Knoble, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2094.v2.git.1779738059.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Now that -L flows through log_tree_diff_flush() and diff_flush(),
metadata-only diff formats work because they only read filepair
fields (status, mode, path, oid) already set on the pre-computed
pairs.

Expand the allowlist in setup_revisions() to also accept --raw,
--name-only, --name-status, and --summary.  Diff stat formats
(--stat, --numstat, --shortstat, --dirstat) remain blocked because
they call compute_diffstat() on full blob content and would show
whole-file statistics rather than range-scoped ones.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/line-range-options.adoc | 10 +++---
 revision.c                            |  4 ++-
 t/t4211-line-log.sh                   | 47 +++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/Documentation/line-range-options.adoc b/Documentation/line-range-options.adoc
index ecb2c79fb9..72f639b5e7 100644
--- a/Documentation/line-range-options.adoc
+++ b/Documentation/line-range-options.adoc
@@ -8,12 +8,14 @@
 	give zero or one positive revision arguments, and
 	_<start>_ and _<end>_ (or _<funcname>_) must exist in the starting revision.
 	You can specify this option more than once. Implies `--patch`.
-	Patch output can be suppressed using `--no-patch`, but other diff formats
-	(namely `--raw`, `--numstat`, `--shortstat`, `--dirstat`, `--summary`,
-	`--name-only`, `--name-status`, `--check`) are not currently implemented.
+	Patch output can be suppressed using `--no-patch`.
+	Non-patch diff formats `--raw`, `--name-only`, `--name-status`,
+	and `--summary` are supported.  Diff stat formats
+	(`--stat`, `--numstat`, `--shortstat`, `--dirstat`) are not
+	currently implemented.
 +
 Patch formatting options such as `--word-diff`, `--color-moved`,
 `--no-prefix`, and whitespace options (`-w`, `-b`) are supported,
-as are pickaxe options (`-S`, `-G`).
+as are pickaxe options (`-S`, `-G`) and `--diff-filter`.
 +
 include::line-range-format.adoc[]
diff --git a/revision.c b/revision.c
index c903f7a1b4..f26fc1f4d5 100644
--- a/revision.c
+++ b/revision.c
@@ -3181,7 +3181,9 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	if (revs->line_level_traverse &&
 	    (revs->full_diff ||
 	     (revs->diffopt.output_format &
-	      ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT))))
+	      ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT |
+		DIFF_FORMAT_RAW | DIFF_FORMAT_NAME |
+		DIFF_FORMAT_NAME_STATUS | DIFF_FORMAT_SUMMARY))))
 		die(_("-L does not yet support the requested diff format"));
 
 	if (revs->expand_tabs_in_log < 0)
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index e3937138a9..4722ec3e29 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -155,8 +155,45 @@ test_expect_success '-p shows the default patch output' '
 	test_cmp expect actual
 '
 
-test_expect_success '--raw is forbidden' '
-	test_must_fail git log -L1,24:b.c --raw
+test_expect_success '--raw shows mode, oid, status and path' '
+	git log -L1,24:b.c --raw --format= >actual &&
+	test_grep "^:100644 100644 [0-9a-f]\{7\} [0-9a-f]\{7\} M	b.c$" actual &&
+	! test_grep "^diff --git" actual &&
+	! test_grep "^@@" actual
+'
+
+test_expect_success '--name-only shows path' '
+	git log -L1,24:b.c --name-only --format= >actual &&
+	test_grep "^b.c$" actual &&
+	! test_grep "^diff --git" actual &&
+	! test_grep "^@@" actual
+'
+
+test_expect_success '--name-status shows status and path' '
+	git log -L1,24:b.c --name-status --format= >actual &&
+	test_grep "^M	b.c$" actual &&
+	! test_grep "^diff --git" actual &&
+	! test_grep "^@@" actual
+'
+
+test_expect_success '--stat is not yet supported with -L' '
+	test_must_fail git log -L1,24:b.c --stat 2>err &&
+	test_grep "does not yet support" err
+'
+
+test_expect_success '--numstat is not yet supported with -L' '
+	test_must_fail git log -L1,24:b.c --numstat 2>err &&
+	test_grep "does not yet support" err
+'
+
+test_expect_success '--shortstat is not yet supported with -L' '
+	test_must_fail git log -L1,24:b.c --shortstat 2>err &&
+	test_grep "does not yet support" err
+'
+
+test_expect_success '--dirstat is not yet supported with -L' '
+	test_must_fail git log -L1,24:b.c --dirstat 2>err &&
+	test_grep "does not yet support" err
 '
 
 test_expect_success 'setup for checking fancy rename following' '
@@ -738,4 +775,10 @@ test_expect_success '-L --oneline has no extra blank line before diff' '
 	test_grep "^diff --git" line2
 '
 
+test_expect_success '--summary shows new file on root commit' '
+	git checkout parent-oids &&
+	git log -L:func2:file.c --summary --format= >actual &&
+	test_grep "create mode 100644 file.c" actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox