git.vger.kernel.org archive mirror
* [PATCH 0/2] refs: allow setting the reference directory
@ 2025-11-19 21:48 Karthik Nayak
  2025-11-19 21:48 ` [PATCH 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-19 21:48 UTC
  To: git; +Cc: Karthik Nayak

Git lets users select different reference backends but, unlike with
objects, offers no flexibility in selecting the reference directory.
Currently, the reference format is obtained from the repository's config
and the reference directory is always $GIT_DIR.

This patch series adds a new environment variable 'GIT_REF_URI' which
takes the reference backend and path in URI form:

    <reference_backend>://<path>

For example, 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.
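
As a rough illustration of the URI form, the following standalone C sketch
splits '<reference_backend>://<path>' into its two parts. It is not the code
from this series: `struct ref_uri` and `parse_ref_uri()` are hypothetical
names, and the sketch assumes that the full "://" separator must be present
and that neither part may be empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for illustration; not part of Git. */
struct ref_uri {
	char backend[32];  /* backend name, e.g. "files" or "reftable" */
	const char *path;  /* points into the caller's uri string */
};

/*
 * Split "<backend>://<path>" into backend and path.
 * Returns 0 on success, -1 if the input is malformed.
 */
static int parse_ref_uri(const char *uri, struct ref_uri *out)
{
	const char *sep;
	size_t len;

	assert(out != NULL);

	if (!uri || !*uri)
		return -1;  /* empty URI */

	sep = strstr(uri, "://");
	if (!sep)
		return -1;  /* the "://" separator is missing */

	len = (size_t)(sep - uri);
	if (len == 0 || len >= sizeof(out->backend))
		return -1;  /* missing or oversized backend name */

	if (sep[3] == '\0')
		return -1;  /* empty path, e.g. "reftable://" */

	memcpy(out->backend, uri, len);
	out->backend[len] = '\0';
	out->path = sep + 3;
	return 0;
}
```

With this sketch, 'reftable:///foo' yields the backend "reftable" and the
path "/foo", while inputs such as 'reftable://' or 'reftable@/home/reftable'
are rejected. Mapping the backend name to a known format would be a separate
step.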

One use case for this is migration between different backends. On the
server side, migrating from the files backend to the newly introduced
reftable backend can be achieved by running 'git refs migrate'. However,
for large repositories with millions of references, this migration can
take from seconds to minutes.

We could make the migration non-blocking by running the migration in the
background and capturing and replaying updates to both backends. This
would require Git to support writing references to different reference
backends and paths.

The first commit adds the required changes to create a 'ref_store' for a
given path. The second commit parses the URI if available when creating
the main ref store.

This is based on top of 9a2fb147f2 (Git 2.52, 2025-11-17).

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git.adoc |   8 ++++
 environment.h          |   1 +
 refs.c                 |  64 +++++++++++++++++++++++++++--
 t/meson.build          |   1 +
 t/t1423-ref-backend.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 180 insertions(+), 3 deletions(-)

Karthik Nayak (2):
      refs: support obtaining ref_store for given dir
      refs: add GIT_REF_URI to specify reference backend and directory



base-commit: 9a2fb147f2c61d0cab52c883e7e26f5b7948e3ed
change-id: 20251105-kn-alternate-ref-dir-3e572e8cd0ef

Thanks
- Karthik



* [PATCH 1/2] refs: support obtaining ref_store for given dir
  2025-11-19 21:48 [PATCH 0/2] refs: allow setting the reference directory Karthik Nayak
@ 2025-11-19 21:48 ` Karthik Nayak
  2025-11-20 19:05   ` Justin Tobler
  2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Karthik Nayak @ 2025-11-19 21:48 UTC
  To: git; +Cc: Karthik Nayak

The refs subsystem uses `get_main_ref_store()` to obtain the main
ref_store for a given repository. In the upcoming patches we also want
to create a ref_store for any given reference directory, which may exist
at an arbitrary path. To support such behavior, extract the core logic
for creating the ref_store from `get_main_ref_store()` into a new
function `get_ref_store_for_dir()` which can provide the ref_store for a
given (repository, directory, reference format) combination.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 965381367e..23f46867f2 100644
--- a/refs.c
+++ b/refs.c
@@ -2177,6 +2177,15 @@ void ref_store_release(struct ref_store *ref_store)
 	free(ref_store->gitdir);
 }
 
+static struct ref_store *get_ref_store_for_dir(struct repository *r,
+					       char *dir,
+					       enum ref_storage_format format)
+{
+	struct ref_store *ref_store = ref_store_init(r, format, dir,
+						     REF_STORE_ALL_CAPS);
+	return maybe_debug_wrap_ref_store(dir, ref_store);
+}
+
 struct ref_store *get_main_ref_store(struct repository *r)
 {
 	if (r->refs_private)
@@ -2185,9 +2194,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r, r->ref_storage_format,
-					 r->gitdir, REF_STORE_ALL_CAPS);
-	r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private);
+	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
 	return r->refs_private;
 }
 

-- 
2.51.2



* [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-19 21:48 [PATCH 0/2] refs: allow setting the reference directory Karthik Nayak
  2025-11-19 21:48 ` [PATCH 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
@ 2025-11-19 21:48 ` Karthik Nayak
  2025-11-19 22:13   ` Eric Sunshine
                     ` (4 more replies)
  2025-11-23  4:29 ` [PATCH 0/2] refs: allow setting the reference directory Junio C Hamano
                   ` (2 subsequent siblings)
  4 siblings, 5 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-19 21:48 UTC
  To: git; +Cc: Karthik Nayak

Git allows setting a different object directory via
'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
This asymmetry makes it difficult to test different reference backends
or use alternative reference storage locations without modifying the
repository structure.

Add a new environment variable 'GIT_REF_URI' that specifies both the
reference backend and directory path using a URI format:

    <ref_backend>://<path>

When set, this variable is used to obtain the main reference store for
all Git commands. The variable is checked in `get_main_ref_store()`
when lazily assigning `repo->refs_private`. We cannot initialize this
earlier in `repo_set_gitdir()` because the repository's hash algorithm
isn't known at that point, and the reftable backend requires this
information during initialization.

When used with worktrees, the specified directory is treated as the
reference directory for all worktree operations.

Add a new test file 't1423-ref-backend.sh' to test this environment
variable.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git.adoc |   8 ++++
 environment.h          |   1 +
 refs.c                 |  53 +++++++++++++++++++++++-
 t/meson.build          |   1 +
 t/t1423-ref-backend.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.adoc b/Documentation/git.adoc
index ce099e78b8..a1d1078f42 100644
--- a/Documentation/git.adoc
+++ b/Documentation/git.adoc
@@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
 	repositories will be set to this value. The default is "files".
 	See `--ref-format` in linkgit:git-init[1].
 
+`GIT_REF_URI`::
+    Specify which reference backend and path to be used, if not specified the
+    backend is inferred from the configuration and $GIT_DIR is used as the
+    path.
++
+Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
+reference backend and the 'path' specifies the directory used by the backend.
+
 Git Commits
 ~~~~~~~~~~~
 `GIT_AUTHOR_NAME`::
diff --git a/environment.h b/environment.h
index 51898c99cd..9bc380bba4 100644
--- a/environment.h
+++ b/environment.h
@@ -42,6 +42,7 @@
 #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
 #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
 #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
+#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
 
 /*
  * Environment variable used to propagate the --no-advice global option to the
diff --git a/refs.c b/refs.c
index 23f46867f2..0922f08c9f 100644
--- a/refs.c
+++ b/refs.c
@@ -2186,15 +2186,66 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
 	return maybe_debug_wrap_ref_store(dir, ref_store);
 }
 
+static struct ref_store *get_ref_store_from_uri(struct repository *repo,
+						const char *uri)
+{
+	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
+	enum ref_storage_format format;
+	struct ref_store *store = NULL;
+	char *format_string;
+	char *dir;
+
+	if (!uri || !uri[0]) {
+		error("reference backend uri is empty");
+		goto cleanup;
+	}
+
+	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
+		error("invalid reference backend uri format '%s'", uri);
+		goto cleanup;
+	}
+
+	format_string = ref_backend_info.items[0].string;
+	dir = ref_backend_info.items[1].string + 2;
+
+	if (!dir || !dir[0]) {
+		error("invalid path in uri '%s'", uri);
+		goto cleanup;
+	}
+
+	format = ref_storage_format_by_name(format_string);
+	if (format == REF_STORAGE_FORMAT_UNKNOWN) {
+		error("unknown reference backend '%s'", format_string);
+		goto cleanup;
+	}
+
+	store = get_ref_store_for_dir(repo, dir, format);
+
+cleanup:
+	string_list_clear(&ref_backend_info, 0);
+	return store;
+}
+
 struct ref_store *get_main_ref_store(struct repository *r)
 {
+	char *ref_uri;
+
 	if (r->refs_private)
 		return r->refs_private;
 
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
+	ref_uri = getenv(GIT_REF_URI_ENVIRONMENT);
+	if (ref_uri) {
+		r->refs_private = get_ref_store_from_uri(r, ref_uri);
+		if (!r->refs_private)
+			die("failed to initialize ref store from URI: %s", ref_uri);
+
+	} else {
+		r->refs_private = get_ref_store_for_dir(r, r->gitdir,
+							r->ref_storage_format);
+	}
 	return r->refs_private;
 }
 
diff --git a/t/meson.build b/t/meson.build
index a5531df415..a66f8fafff 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -208,6 +208,7 @@ integration_tests = [
   't1420-lost-found.sh',
   't1421-reflog-write.sh',
   't1422-show-ref-exists.sh',
+  't1423-ref-backend.sh',
   't1430-bad-ref-name.sh',
   't1450-fsck.sh',
   't1451-fsck-buffer.sh',
diff --git a/t/t1423-ref-backend.sh b/t/t1423-ref-backend.sh
new file mode 100755
index 0000000000..e271708e02
--- /dev/null
+++ b/t/t1423-ref-backend.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='Test different reference backend URIs'
+
+. ./test-lib.sh
+
+test_expect_success 'empty uri provided' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="" &&
+		export GIT_REF_URI &&
+		! git refs list 2>err &&
+		test_grep "reference backend uri is empty" err
+	)
+'
+
+test_expect_success 'invalid uri provided' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable@/home/reftable" &&
+		export GIT_REF_URI &&
+		! git refs list 2>err &&
+		test_grep "invalid reference backend uri format" err
+	)
+'
+
+test_expect_success 'empty path in uri' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable://" &&
+		export GIT_REF_URI &&
+		! git refs list 2>err &&
+		test_grep "invalid path in uri" err
+	)
+'
+
+test_expect_success 'unknown reference backend' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="db://.git" &&
+		export GIT_REF_URI &&
+		! git refs list 2>err &&
+		test_grep "unknown reference backend" err
+	)
+'
+
+ref_formats="files reftable"
+for from_format in $ref_formats
+do
+	for to_format in $ref_formats
+	do
+		if test "$from_format" = "$to_format"
+		then
+			continue
+		fi
+
+		test_expect_success 'read from other reference backend' '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=files repo &&
+			(
+				cd repo &&
+				test_commit 1 &&
+				test_commit 2 &&
+				test_commit 3 &&
+
+				git refs migrate --dry-run --ref-format=reftable >out &&
+				REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
+				git refs list >expect &&
+				GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >actual &&
+				test_cmp expect actual
+			)
+		'
+
+		test_expect_success 'write to other reference backend' '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=files repo &&
+			(
+				cd repo &&
+				test_commit 1 &&
+				test_commit 2 &&
+				test_commit 3 &&
+
+				git refs migrate --dry-run --ref-format=reftable >out &&
+				git refs list >expect &&
+
+				REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
+				GIT_REF_URI="reftable://$REFTABLE_PATH" git tag -d 1 &&
+
+				git refs list >actual &&
+				test_cmp expect actual &&
+
+				GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >expect &&
+				git refs list >out &&
+				cat out | grep -v "refs/tags/1" >actual &&
+				test_cmp expect actual
+			)
+		'
+	done
+done
+
+test_done

-- 
2.51.2



* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
@ 2025-11-19 22:13   ` Eric Sunshine
  2025-11-19 23:01     ` Karthik Nayak
  2025-11-20 10:00   ` Jean-Noël Avila
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Eric Sunshine @ 2025-11-19 22:13 UTC
  To: Karthik Nayak; +Cc: git

On Wed, Nov 19, 2025 at 4:49 PM Karthik Nayak <karthik.188@gmail.com> wrote:
> Git allows setting a different object directory via
> 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
> This asymmetry makes it difficult to test different reference backends
> or use alternative reference storage locations without modifying the
> repository structure.
>
> Add a new environment variable 'GIT_REF_URI' that specifies both the
> reference backend and directory path using a URI format:
> [...]
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
> diff --git a/t/t1423-ref-backend.sh b/t/t1423-ref-backend.sh
> @@ -0,0 +1,109 @@
> +test_expect_success 'empty uri provided' '
> +       test_when_finished "rm -rf repo" &&
> +       git init --ref-format=files repo &&
> +       (
> +               cd repo &&
> +               GIT_REF_URI="" &&
> +               export GIT_REF_URI &&
> +               ! git refs list 2>err &&

Should this (and all other tests) be using `test_must_fail` rather than `!`?

> +               test_grep "reference backend uri is empty" err
> +       )
> +'
> +ref_formats="files reftable"
> +for from_format in $ref_formats
> +do
> +       for to_format in $ref_formats
> +       do
> +               if test "$from_format" = "$to_format"
> +               then
> +                       continue
> +               fi
> +
> +               test_expect_success 'read from other reference backend' '
> +                       test_when_finished "rm -rf repo" &&
> +                       git init --ref-format=files repo &&
> +                       (
> +                               cd repo &&
> +                               test_commit 1 &&
> +                               test_commit 2 &&
> +                               test_commit 3 &&
> +
> +                               git refs migrate --dry-run --ref-format=reftable >out &&
> +                               REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
> +                               git refs list >expect &&
> +                               GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >actual &&
> +                               test_cmp expect actual
> +                       )
> +               '
> +
> +               test_expect_success 'write to other reference backend' '
> +                       [...]
> +               '
> +       done
> +done

Something seems amiss here. Presumably, this nested loop wants to test
various combinations but the `from_format` and `to_format` variables
are never consulted in the tests; instead the tests just hardcode
specific ref-format values.

Also, if this is indeed meant to be loop-driven, then it would be
helpful for the test titles to include the values of `$from_format`
and `$to_format`.


* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-19 22:13   ` Eric Sunshine
@ 2025-11-19 23:01     ` Karthik Nayak
  0 siblings, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-19 23:01 UTC
  To: Eric Sunshine; +Cc: git


Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Nov 19, 2025 at 4:49 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>> Git allows setting a different object directory via
>> 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
>> This asymmetry makes it difficult to test different reference backends
>> or use alternative reference storage locations without modifying the
>> repository structure.
>>
>> Add a new environment variable 'GIT_REF_URI' that specifies both the
>> reference backend and directory path using a URI format:
>> [...]
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>> diff --git a/t/t1423-ref-backend.sh b/t/t1423-ref-backend.sh
>> @@ -0,0 +1,109 @@
>> +test_expect_success 'empty uri provided' '
>> +       test_when_finished "rm -rf repo" &&
>> +       git init --ref-format=files repo &&
>> +       (
>> +               cd repo &&
>> +               GIT_REF_URI="" &&
>> +               export GIT_REF_URI &&
>> +               ! git refs list 2>err &&
>
> Should this (and all other tests) be using `test_must_fail` rather than `!`?
>

Initially I used 'BUG()' instead of 'error()', which was wrong, but
meant that I couldn't use `test_must_fail`. I've fixed that now, but
this was missed. Thanks.

>> +               test_grep "reference backend uri is empty" err
>> +       )
>> +'
>> +ref_formats="files reftable"
>> +for from_format in $ref_formats
>> +do
>> +       for to_format in $ref_formats
>> +       do
>> +               if test "$from_format" = "$to_format"
>> +               then
>> +                       continue
>> +               fi
>> +
>> +               test_expect_success 'read from other reference backend' '
>> +                       test_when_finished "rm -rf repo" &&
>> +                       git init --ref-format=files repo &&
>> +                       (
>> +                               cd repo &&
>> +                               test_commit 1 &&
>> +                               test_commit 2 &&
>> +                               test_commit 3 &&
>> +
>> +                               git refs migrate --dry-run --ref-format=reftable >out &&
>> +                               REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
>> +                               git refs list >expect &&
>> +                               GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >actual &&
>> +                               test_cmp expect actual
>> +                       )
>> +               '
>> +
>> +               test_expect_success 'write to other reference backend' '
>> +                       [...]
>> +               '
>> +       done
>> +done
>
> Something seems amiss here. Presumably, this nested loop wants to test
> various combinations but the `from_format` and `to_format` variables
> are never consulted in the tests; instead the tests just hardcode
> specific ref-format values.
>
> Also, if this is indeed meant to be loop-driven, then it would be
> helpful for the test titles to include the values of `$from_format`
> and `$to_format`.

Indeed. I was hasty, will fix :)



* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
  2025-11-19 22:13   ` Eric Sunshine
@ 2025-11-20 10:00   ` Jean-Noël Avila
  2025-11-21 11:21     ` Karthik Nayak
  2025-11-20 19:38   ` Justin Tobler
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Jean-Noël Avila @ 2025-11-20 10:00 UTC
  To: Karthik Nayak, git

On 19/11/2025 at 22:48, Karthik Nayak wrote:
> ---
>  Documentation/git.adoc |   8 ++++
>  environment.h          |   1 +
>  refs.c                 |  53 +++++++++++++++++++++++-
>  t/meson.build          |   1 +
>  t/t1423-ref-backend.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 171 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/git.adoc b/Documentation/git.adoc
> index ce099e78b8..a1d1078f42 100644
> --- a/Documentation/git.adoc
> +++ b/Documentation/git.adoc
> @@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
>  	repositories will be set to this value. The default is "files".
>  	See `--ref-format` in linkgit:git-init[1].
>  
> +`GIT_REF_URI`::
> +    Specify which reference backend and path to be used, if not specified the
> +    backend is inferred from the configuration and $GIT_DIR is used as the
> +    path.

Please use backquotes for environment variables: `$GIT_DIR`

> ++
> +Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
> +reference backend and the 'path' specifies the directory used by the backend.

Constant strings and keywords are back-quoted too but placeholders are
underscored:

Expects the format `<ref_backend>://<path>`, where the _<ref_backend>_
specifies the reference backend and the _<path>_ specifies the directory
used by the backend.

I'm only focusing on documentation.

Thanks


* Re: [PATCH 1/2] refs: support obtaining ref_store for given dir
  2025-11-19 21:48 ` [PATCH 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
@ 2025-11-20 19:05   ` Justin Tobler
  2025-11-21 11:18     ` Karthik Nayak
  0 siblings, 1 reply; 33+ messages in thread
From: Justin Tobler @ 2025-11-20 19:05 UTC
  To: Karthik Nayak; +Cc: git

On 25/11/19 10:48PM, Karthik Nayak wrote:
> The refs subsystem uses `get_main_ref_store()` to obtain the main
> ref_store for a given repository. In the upcoming patches we also want
> to create a ref_store for any given reference directory, which may exist
> at an arbitrary path. To support such behavior, extract the core logic
> for creating the ref_store from `get_main_ref_store()` into a new
> function `get_ref_store_for_dir()` which can provide the ref_store for a
> given (repository, directory, reference format) combination.

So when we refer to the "reference directory" in this case, we are not
referring to the "refs/" or "reftable/" directories directly, but one
level above that which is typically just the gitdir itself. This seems a
bit awkward at first, but makes sense since, for the files backend,
there may be symbolic references such as HEAD that exist outside of
"refs/" which must be considered. It might be helpful to clarify this in
the commit message.

Otherwise this patch looks good.

-Justin
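
To make the point about the "reference directory" concrete: with the files
backend, HEAD lives directly in that directory while branches and tags live
under "refs/", and the reftable backend keeps its tables under "reftable/".
The small C sketch below only joins a directory and a name to show that
layout. `ref_dir_join()` is a hypothetical helper written for this
illustration, not a Git function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration only: build "<ref_dir>/<name>" into buf.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int ref_dir_join(char *buf, size_t size,
			const char *ref_dir, const char *name)
{
	int n;

	assert(buf && ref_dir && name);

	n = snprintf(buf, size, "%s/%s", ref_dir, name);
	if (n < 0 || (size_t)n >= size)
		return -1;  /* encoding error or truncated output */
	return 0;
}
```

Given a reference directory of ".git", the helper produces ".git/HEAD" for
the symbolic ref and ".git/refs/heads/main" for a branch, which is why the
directory passed around is one level above "refs/".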


* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
  2025-11-19 22:13   ` Eric Sunshine
  2025-11-20 10:00   ` Jean-Noël Avila
@ 2025-11-20 19:38   ` Justin Tobler
  2025-11-24 13:23     ` Karthik Nayak
  2025-11-21 13:42   ` Toon Claes
  2025-12-01 13:28   ` Patrick Steinhardt
  4 siblings, 1 reply; 33+ messages in thread
From: Justin Tobler @ 2025-11-20 19:38 UTC
  To: Karthik Nayak; +Cc: git

On 25/11/19 10:48PM, Karthik Nayak wrote:
> Git allows setting a different object directory via
> 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
> This asymmetry makes it difficult to test different reference backends
> or use alternative reference storage locations without modifying the
> repository structure.
> 
> Add a new environment variable 'GIT_REF_URI' that specifies both the
> reference backend and directory path using a URI format:
> 
>     <ref_backend>://<path>

Ok, we include the reference format as part of the URI here since it is
possible that the alternative reference store could be using a different
backend than what the repository is currently configured to use. Makes
sense.

> When set, this variable is used to obtain the main reference store for
> all Git commands. The variable is checked in `get_main_ref_store()`
> when lazily assigning `repo->refs_private`. We cannot initialize this
> earlier in `repo_set_gitdir()` because the repository's hash algorithm
> isn't known at that point, and the reftable backend requires this
> information during initialization.
>
> When used with worktrees, the specified directory is treated as the
> reference directory for all worktree operations.
> 
> Add a new test file 't1423-ref-backend.sh' to test this environment
> variable.
> 
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  Documentation/git.adoc |   8 ++++
>  environment.h          |   1 +
>  refs.c                 |  53 +++++++++++++++++++++++-
>  t/meson.build          |   1 +
>  t/t1423-ref-backend.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 171 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/git.adoc b/Documentation/git.adoc
> index ce099e78b8..a1d1078f42 100644
> --- a/Documentation/git.adoc
> +++ b/Documentation/git.adoc
> @@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
>  	repositories will be set to this value. The default is "files".
>  	See `--ref-format` in linkgit:git-init[1].
>  
> +`GIT_REF_URI`::
> +    Specify which reference backend and path to be used, if not specified the
> +    backend is inferred from the configuration and $GIT_DIR is used as the
> +    path.
> ++
> +Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
> +reference backend and the 'path' specifies the directory used by the backend.

I think some users may assume that the path to the reference backend
would be something like ".git/refs" similar to how
`GIT_OBJECT_DIRECTORY` is usually ".git/objects". It might be worth
clarifying this in the docs here.

> +
>  Git Commits
>  ~~~~~~~~~~~
>  `GIT_AUTHOR_NAME`::
> diff --git a/environment.h b/environment.h
> index 51898c99cd..9bc380bba4 100644
> --- a/environment.h
> +++ b/environment.h
> @@ -42,6 +42,7 @@
>  #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
>  #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
>  #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
> +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
>  
>  /*
>   * Environment variable used to propagate the --no-advice global option to the
> diff --git a/refs.c b/refs.c
> index 23f46867f2..0922f08c9f 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2186,15 +2186,66 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
>  	return maybe_debug_wrap_ref_store(dir, ref_store);
>  }
>  
> +static struct ref_store *get_ref_store_from_uri(struct repository *repo,
> +						const char *uri)
> +{
> +	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
> +	enum ref_storage_format format;
> +	struct ref_store *store = NULL;
> +	char *format_string;
> +	char *dir;
> +
> +	if (!uri || !uri[0]) {
> +		error("reference backend uri is empty");
> +		goto cleanup;
> +	}
> +
> +	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
> +		error("invalid reference backend uri format '%s'", uri);
> +		goto cleanup;
> +	}
> +
> +	format_string = ref_backend_info.items[0].string;
> +	dir = ref_backend_info.items[1].string + 2;
> +
> +	if (!dir || !dir[0]) {
> +		error("invalid path in uri '%s'", uri);
> +		goto cleanup;
> +	}
> +
> +	format = ref_storage_format_by_name(format_string);
> +	if (format == REF_STORAGE_FORMAT_UNKNOWN) {
> +		error("unknown reference backend '%s'", format_string);
> +		goto cleanup;
> +	}
> +
> +	store = get_ref_store_for_dir(repo, dir, format);

Since we don't update the reference format stored in repo, if we were to
run:

  $ GIT_REF_URI="reftable://<path>" git repo info references.format

it would still report whatever the repository was originally configured
with. Since only a single reference backend can be used at a time, I
wonder if we should go a bit further and update `r->ref_storage_format`
to be in line with how the repository reference backend is configured via
`GIT_REF_URI`.

-Justin
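
The suggestion above can be sketched in isolation. The C example below uses
stand-in types (`struct repo`, `struct store`, `enum fmt`) and a hypothetical
`main_store()` function; none of them are Git's real definitions. It assumes
the override is passed in as plain arguments rather than read from the
environment, and shows the lazily initialized store recording its format back
onto the repository so that both views agree.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration; Git's real structs differ. */
enum fmt { FMT_FILES, FMT_REFTABLE };

struct store {
	enum fmt format;
	const char *dir;
};

struct repo {
	enum fmt ref_storage_format;  /* what commands would report */
	const char *gitdir;
	struct store refs;
	int refs_ready;               /* lazy-initialization flag */
};

/*
 * Lazily set up the main store. If override_dir is non-NULL, use the
 * override and also record its format on the repo; otherwise fall back
 * to the repo's own gitdir and configured format.
 */
static struct store *main_store(struct repo *r, const char *override_dir,
				enum fmt override_format)
{
	assert(r != NULL);

	if (r->refs_ready)
		return &r->refs;

	if (override_dir) {
		r->refs.dir = override_dir;
		r->refs.format = override_format;
		r->ref_storage_format = override_format;  /* keep in sync */
	} else {
		r->refs.dir = r->gitdir;
		r->refs.format = r->ref_storage_format;
	}
	r->refs_ready = 1;
	return &r->refs;
}
```

Whether the real code should update the repository's format in this way is
exactly the open question raised in the review; the sketch only shows what
"keeping them in line" would mean.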


* Re: [PATCH 1/2] refs: support obtaining ref_store for given dir
  2025-11-20 19:05   ` Justin Tobler
@ 2025-11-21 11:18     ` Karthik Nayak
  0 siblings, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-21 11:18 UTC
  To: Justin Tobler; +Cc: git


Justin Tobler <jltobler@gmail.com> writes:

> On 25/11/19 10:48PM, Karthik Nayak wrote:
>> The refs subsystem uses `get_main_ref_store()` to obtain the main
>> ref_store for a given repository. In the upcoming patches we also want
>> to create a ref_store for any given reference directory, which may exist
>> at an arbitrary path. To support such behavior, extract the core logic
>> for creating the ref_store from `get_main_ref_store()` into a new
>> function `get_ref_store_for_dir()` which can provide the ref_store for a
>> given (repository, directory, reference format) combination.
>
> So when we refer to the "reference directory" in this case, we are not
> referring to the "refs/" or "reftable/" directories directly, but one
> level above that which is typically just the gitdir itself. This seems a
> bit awkward at first, but makes sense since, for the files backend,
> there may be symbolic references such as HEAD that exist outside of
> "refs/" which must be considered. It might be helpful to clarify this in
> the commit message.
>

You're right, for the files and the reftable backend, this happens to be
the $GIT_DIR itself, due to how closely they are integrated with Git.
But if you build an external reference backend, this doesn't have to be
the $GIT_DIR.

I've modified the commit message accordingly.

> Otherwise this patch looks good.
>
> -Justin



* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-20 10:00   ` Jean-Noël Avila
@ 2025-11-21 11:21     ` Karthik Nayak
  0 siblings, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-21 11:21 UTC
  To: Jean-Noël Avila, git


Jean-Noël Avila <jn.avila@free.fr> writes:

> On 19/11/2025 at 22:48, Karthik Nayak wrote:
>> ---
>>  Documentation/git.adoc |   8 ++++
>>  environment.h          |   1 +
>>  refs.c                 |  53 +++++++++++++++++++++++-
>>  t/meson.build          |   1 +
>>  t/t1423-ref-backend.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 171 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/git.adoc b/Documentation/git.adoc
>> index ce099e78b8..a1d1078f42 100644
>> --- a/Documentation/git.adoc
>> +++ b/Documentation/git.adoc
>> @@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
>>  	repositories will be set to this value. The default is "files".
>>  	See `--ref-format` in linkgit:git-init[1].
>>
>> +`GIT_REF_URI`::
>> +    Specify which reference backend and path to be used, if not specified the
>> +    backend is inferred from the configuration and $GIT_DIR is used as the
>> +    path.
>
> Please use backquotes for environment variables: `$GIT_DIR`
>

Will do.

>> ++
>> +Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
>> +reference backend and the 'path' specifies the directory used by the backend.
>
> Constant strings and keywords are back-quoted too but placeholders are
> underscored:
>
> Expects the format `<ref_backend>://<path>`, where the _<ref_backend>_
> specifies the reference backend and the _<path>_ specifies the directory
> used by the backend.
>
> I'm only focusing on documentation.
>
> Thanks

Thanks for your review, will modify accordingly.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-11-20 19:38   ` Justin Tobler
@ 2025-11-21 13:42   ` Toon Claes
  2025-11-21 16:07     ` Junio C Hamano
  2025-11-24 13:26     ` Karthik Nayak
  2025-12-01 13:28   ` Patrick Steinhardt
  4 siblings, 2 replies; 33+ messages in thread
From: Toon Claes @ 2025-11-21 13:42 UTC (permalink / raw)
  To: Karthik Nayak, git; +Cc: Karthik Nayak

Karthik Nayak <karthik.188@gmail.com> writes:

> Git allows setting a different object directory via
> 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
> This asymmetry makes it difficult to test different reference backends
> or use alternative reference storage locations without modifying the
> repository structure.
>
> Add a new environment variable 'GIT_REF_URI' that specifies both the
> reference backend and directory path using a URI format:
>
>     <ref_backend>://<path>

I like this idea. This would allow us in the future to also do something
like:

    reftable+nfs://10.11.12.13/ref-dir

> When set, this variable is used to obtain the main reference store for
> all Git commands. The variable is checked in `get_main_ref_store()`
> when lazily assigning `repo->refs_private`. We cannot initialize this
> earlier in `repo_set_gitdir()` because the repository's hash algorithm
> isn't known at that point, and the reftable backend requires this
> information during initialization.
>
> When used with worktrees, the specified directory is treated as the
> reference directory for all worktree operations.
>
> Add a new test file 't1423-ref-backend.sh' to test this environment
> variable.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  Documentation/git.adoc |   8 ++++
>  environment.h          |   1 +
>  refs.c                 |  53 +++++++++++++++++++++++-
>  t/meson.build          |   1 +
>  t/t1423-ref-backend.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 171 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git.adoc b/Documentation/git.adoc
> index ce099e78b8..a1d1078f42 100644
> --- a/Documentation/git.adoc
> +++ b/Documentation/git.adoc
> @@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
>  	repositories will be set to this value. The default is "files".
>  	See `--ref-format` in linkgit:git-init[1].
>  
> +`GIT_REF_URI`::
> +    Specify which reference backend and path to be used, if not specified the
> +    backend is inferred from the configuration and $GIT_DIR is used as the
> +    path.
> ++
> +Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
> +reference backend and the 'path' specifies the directory used by the backend.
> +
>  Git Commits
>  ~~~~~~~~~~~
>  `GIT_AUTHOR_NAME`::
> diff --git a/environment.h b/environment.h
> index 51898c99cd..9bc380bba4 100644
> --- a/environment.h
> +++ b/environment.h
> @@ -42,6 +42,7 @@
>  #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
>  #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
>  #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
> +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
>  
>  /*
>   * Environment variable used to propagate the --no-advice global option to the
> diff --git a/refs.c b/refs.c
> index 23f46867f2..0922f08c9f 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2186,15 +2186,66 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
>  	return maybe_debug_wrap_ref_store(dir, ref_store);
>  }
>  
> +static struct ref_store *get_ref_store_from_uri(struct repository *repo,
> +						const char *uri)
> +{
> +	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
> +	enum ref_storage_format format;
> +	struct ref_store *store = NULL;
> +	char *format_string;
> +	char *dir;
> +
> +	if (!uri || !uri[0]) {
> +		error("reference backend uri is empty");

I see no localization on any of the error() or die() messages. I think
it's worth making them translatable.

> +		goto cleanup;
> +	}
> +
> +	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
> +		error("invalid reference backend uri format '%s'", uri);
> +		goto cleanup;
> +	}
> +
> +	format_string = ref_backend_info.items[0].string;
> +	dir = ref_backend_info.items[1].string + 2;

A length check before jumping to the third char would be advised. Also, I
think it's worth checking that the first two chars are "//".
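
As a rough illustration of the check suggested above, here is a minimal
standalone sketch. It is not the patch's code: `after_prefix()` and
`path_from_uri_rest()` are hypothetical names, and the first one only
stands in for the kind of prefix helper Git already has. The idea is to
never step past the "//" unless it is actually there, which also covers
the case where fewer than two characters follow the colon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `str` begins with `prefix`, return a pointer
 * just past the prefix, otherwise NULL. strncmp() stops at the NUL of a
 * short `str`, so this never reads beyond the end of the string.
 */
static const char *after_prefix(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	return strncmp(str, prefix, len) ? NULL : str + len;
}

/*
 * `rest` is whatever follows the first ':' in the URI. Returns the path
 * part, or NULL when the "//" is missing or the path is empty.
 */
static const char *path_from_uri_rest(const char *rest)
{
	const char *path;

	assert(rest != NULL);

	path = after_prefix(rest, "//");
	if (!path || !*path)
		return NULL;
	return path;
}
```

With this shape, "reftable:" and "reftable:/" are both rejected instead
of reading past the end of the string, and "reftable:///foo" yields
"/foo".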

-- 
Cheers,
Toon

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-21 13:42   ` Toon Claes
@ 2025-11-21 16:07     ` Junio C Hamano
  2025-11-24 13:25       ` Karthik Nayak
  2025-11-24 13:26     ` Karthik Nayak
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-11-21 16:07 UTC (permalink / raw)
  To: Toon Claes; +Cc: Karthik Nayak, git

Toon Claes <toon@iotcl.com> writes:

>>     <ref_backend>://<path>
>
> I like this idea. This would allow us in the future to also do something
> like:
>
>     reftable+nfs://10.11.12.13/ref-dir

I actually thought from Karthik's definition that what you are
trying to say is spelled more like this:

    reftable://nfs://10.11.12.13/ref-dir

IOW, the underlying URI to "reach the resource" is in the <path>
part (i.e., "nfs://<addr>/<directory>").  And I found it a somewhat
strange syntax, because the "to reach the resource, visit this" URI
may not necessarily look like <path>, and I also wondered if
spelling it like <ref_backend>:<URI-for-resource> is more
appropriate.
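
To make the <ref_backend>:<URI-for-resource> spelling concrete, here is
a small sketch of how such a value could be split. It is an assumption
for illustration only: `split_backend_spec()` is a hypothetical name and
not part of the series. The point is that splitting at the first colon
only leaves any URI in the resource part untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "<ref_backend>:<URI-for-resource>" at the
 * first ':' only. The backend name is copied into `backend` (capacity
 * `cap`); the return value points at the resource part inside `spec`.
 * Returns NULL if there is no colon, if either side is empty, or if the
 * backend name does not fit into `backend`.
 */
static const char *split_backend_spec(const char *spec, char *backend,
				      size_t cap)
{
	const char *colon;
	size_t len;

	assert(spec && backend && cap > 0);

	colon = strchr(spec, ':');
	if (!colon)
		return NULL;

	len = (size_t)(colon - spec);
	if (len == 0 || len >= cap || colon[1] == '\0')
		return NULL;

	memcpy(backend, spec, len);
	backend[len] = '\0';
	return colon + 1;
}
```

Given "reftable:nfs://10.11.12.13/ref-dir", this yields the backend
"reftable" and the resource "nfs://10.11.12.13/ref-dir", so the backend
is free to interpret the resource however it likes.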

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] refs: allow setting the reference directory
  2025-11-19 21:48 [PATCH 0/2] refs: allow setting the reference directory Karthik Nayak
  2025-11-19 21:48 ` [PATCH 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
  2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
@ 2025-11-23  4:29 ` Junio C Hamano
  2025-12-01 13:19   ` Patrick Steinhardt
  2025-11-26 11:11 ` [PATCH v2 " Karthik Nayak
  2025-12-01 11:24 ` [PATCH v3 0/2] refs: allow setting the reference directory Karthik Nayak
  4 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-11-23  4:29 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

Karthik Nayak <karthik.188@gmail.com> writes:

> While Git allows users to select different reference backends, unlike
> with objects, there is no flexibility in selecting the reference
> directory. Currently, the reference format is obtained from the config
> of the repository and the reference directory is set to the $GIT_DIR.

I actually am not sure if I like the proposed environment variable.

The proposal is based on an assumption that any reference backend
should be able to move their backing store anywhere, and they should
be able to express the location of their backing store as a single
string <path>.  For a new backend, "where is your backing store" may
not even be a question that makes much sense (as "somewhere
in the cloud that you do not even have to know" is certainly
possible), and even for a new backend design that does allow such a
question to have a meaningful answer, this "you have to be able to
use a random place specified by this environment variable as your
backing storage" is an additional requirement that its implementors
may not need to satisfy in order to please their user base.

For reftable and files backends, these assumptions may be true, but
then it is not too cumbersome if these stay backend specific,
as there are only two backends.

So I dunno.  In addition, if this is designed to help migration
(which is the impression I am getting from the cover letter
description), don't you need a way to specify more than one (i.e.,
source to migrate from and destination to migrate to)?  With a
single GIT_REF_URI, it would not be obvious what it refers to,
whether it is an additional place to write to, to read from, or
something completely unrelated.  For example ...

> This patch series adds a new ENV variable 'GIT_REF_URI' which takes the
> reference backend and path in a URI form:
>
>     <reference_backend>://<path>
>
> For e.g. 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.
>
> One use case for this is migration between different backends. On the
> server side, migrating from the files backend to the newly introduced
> reftable backend can be achieved by running 'git refs migrate'. However,
> for large repositories with millions of references, this migration can
> take from seconds to minutes.
>
> We could make the migration non-blocking by running the migration in the
> background and capturing and replaying updates to both backends. This
> would require Git to support writing references to different reference
> backends and paths.

... I am reading that the above is saying that the system will write
to whatever reference backend is specified in extensions.refStorage,
plus also where GIT_REF_URI points at, but if that is how the
mechanism works, the variable should be given a name more specific to
what it does, no?  It is not just a random "REF URI"; it is an
additional ref backend that the updates are dumped to.  Maybe there
would be a different use case where you may want to read from two
reference backends, and you'd need to specify the secondary one with
an environment variable, but if the system behaves one specific way
for GIT_REF_URI (say, all updates are also copied to this additional
ref backend at the specified ref backing store), a different
environment variable name needs to be chosen to serve such a
different use case, no?



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-20 19:38   ` Justin Tobler
@ 2025-11-24 13:23     ` Karthik Nayak
  0 siblings, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-24 13:23 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 5620 bytes --]

Justin Tobler <jltobler@gmail.com> writes:

> On 25/11/19 10:48PM, Karthik Nayak wrote:
>> Git allows setting a different object directory via
>> 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
>> This asymmetry makes it difficult to test different reference backends
>> or use alternative reference storage locations without modifying the
>> repository structure.
>>
>> Add a new environment variable 'GIT_REF_URI' that specifies both the
>> reference backend and directory path using a URI format:
>>
>>     <ref_backend>://<path>
>
> Ok, we include the reference format as part of the URI here since it is
> possible that the alternative reference store could be using a different
> backend than what the repository is currently configured to use. Makes
> sense.
>
>> When set, this variable is used to obtain the main reference store for
>> all Git commands. The variable is checked in `get_main_ref_store()`
>> when lazily assigning `repo->refs_private`. We cannot initialize this
>> earlier in `repo_set_gitdir()` because the repository's hash algorithm
>> isn't known at that point, and the reftable backend requires this
>> information during initialization.
>>
>> When used with worktrees, the specified directory is treated as the
>> reference directory for all worktree operations.
>>
>> Add a new test file 't1423-ref-backend.sh' to test this environment
>> variable.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>>  Documentation/git.adoc |   8 ++++
>>  environment.h          |   1 +
>>  refs.c                 |  53 +++++++++++++++++++++++-
>>  t/meson.build          |   1 +
>>  t/t1423-ref-backend.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 171 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/git.adoc b/Documentation/git.adoc
>> index ce099e78b8..a1d1078f42 100644
>> --- a/Documentation/git.adoc
>> +++ b/Documentation/git.adoc
>> @@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
>>  	repositories will be set to this value. The default is "files".
>>  	See `--ref-format` in linkgit:git-init[1].
>>
>> +`GIT_REF_URI`::
>> +    Specify which reference backend and path to be used, if not specified the
>> +    backend is inferred from the configuration and $GIT_DIR is used as the
>> +    path.
>> ++
>> +Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
>> +reference backend and the 'path' specifies the directory used by the backend.
>
> I think some users may assume that the path to the reference backend
> would be something like ".git/refs" similar to how
> `GIT_OBJECT_DIRECTORY` is usually ".git/objects". It might be worth
> clarifying this in the docs here.
>

Fair enough. I'll amend the commit.

>> +
>>  Git Commits
>>  ~~~~~~~~~~~
>>  `GIT_AUTHOR_NAME`::
>> diff --git a/environment.h b/environment.h
>> index 51898c99cd..9bc380bba4 100644
>> --- a/environment.h
>> +++ b/environment.h
>> @@ -42,6 +42,7 @@
>>  #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
>>  #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
>>  #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
>> +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
>>
>>  /*
>>   * Environment variable used to propagate the --no-advice global option to the
>> diff --git a/refs.c b/refs.c
>> index 23f46867f2..0922f08c9f 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -2186,15 +2186,66 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
>>  	return maybe_debug_wrap_ref_store(dir, ref_store);
>>  }
>>
>> +static struct ref_store *get_ref_store_from_uri(struct repository *repo,
>> +						const char *uri)
>> +{
>> +	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
>> +	enum ref_storage_format format;
>> +	struct ref_store *store = NULL;
>> +	char *format_string;
>> +	char *dir;
>> +
>> +	if (!uri || !uri[0]) {
>> +		error("reference backend uri is empty");
>> +		goto cleanup;
>> +	}
>> +
>> +	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
>> +		error("invalid reference backend uri format '%s'", uri);
>> +		goto cleanup;
>> +	}
>> +
>> +	format_string = ref_backend_info.items[0].string;
>> +	dir = ref_backend_info.items[1].string + 2;
>> +
>> +	if (!dir || !dir[0]) {
>> +		error("invalid path in uri '%s'", uri);
>> +		goto cleanup;
>> +	}
>> +
>> +	format = ref_storage_format_by_name(format_string);
>> +	if (format == REF_STORAGE_FORMAT_UNKNOWN) {
>> +		error("unknown reference backend '%s'", format_string);
>> +		goto cleanup;
>> +	}
>> +
>> +	store = get_ref_store_for_dir(repo, dir, format);
>
> Since we don't update the reference format stored in repo, if we were to
> run:
>
>   $ GIT_REF_URI="reftable://<path>" git repo info references.format
>
> it would still report whatever the repository was originally configured
> with. Since only a single reference backend can be used at a time, I
> wonder if we should go a bit further and update `r->ref_storage_format`
> to be in line with how the repository reference backend is configured via
> `GIT_REF_URI`.
>

Updating it here won't work: this flow is lazy and only evaluated when
you actually want to deal with references.

Commands like 'git repo info references.format' will not trigger this
flow and will only read the config. I'm also not sure we should be
modifying it, because the output of such a command is meant to note how
the repository is configured. We are not changing that configuration;
we're simply asking Git to use a different backend when the env
variable is provided. What do you think?
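
To illustrate the distinction, here is a toy model of the lazy flow. It
is only a sketch: none of these types or functions exist in Git, and the
`toy_` names are made up. Reading the configured format never creates
the store, and the override is consulted only when the store is first
requested.

```c
#include <assert.h>
#include <stddef.h>

enum toy_format { TOY_FILES, TOY_REFTABLE };

struct toy_store {
	enum toy_format format;
};

struct toy_repo {
	enum toy_format configured_format; /* what the config says */
	int has_override;                  /* stands in for the env variable */
	enum toy_format override_format;
	struct toy_store *store;           /* NULL until first use */
	struct toy_store storage;
	int store_inits;                   /* how often the store was set up */
};

/* Config-only query: never touches the store. */
static enum toy_format toy_configured_format(const struct toy_repo *r)
{
	assert(r != NULL);
	return r->configured_format;
}

/* Lazy accessor: the override is looked at on first use only. */
static struct toy_store *toy_main_store(struct toy_repo *r)
{
	assert(r != NULL);

	if (r->store)
		return r->store;

	r->storage.format = r->has_override ? r->override_format
					    : r->configured_format;
	r->store = &r->storage;
	r->store_inits++;
	return r->store;
}
```

In this model a command that only asks for the configured format keeps
reporting the on-disk configuration, while a command that touches
references ends up with a store in the overridden format.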

> -Justin

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-21 16:07     ` Junio C Hamano
@ 2025-11-24 13:25       ` Karthik Nayak
  2025-11-26 13:11         ` Toon Claes
  0 siblings, 1 reply; 33+ messages in thread
From: Karthik Nayak @ 2025-11-24 13:25 UTC (permalink / raw)
  To: Junio C Hamano, Toon Claes; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

> Toon Claes <toon@iotcl.com> writes:
>
>>>     <ref_backend>://<path>
>>
>> I like this idea. This would allow us in the future to also do something
>> like:
>>
>>     reftable+nfs://10.11.12.13/ref-dir
>
> I actually thought from Karthik's definition that what you are
> trying to say is spelled more like this:
>
>     reftable://nfs://10.11.12.13/ref-dir
>

You're indeed correct.

> IOW, the underlying URI to "reach the resource" is in the <path>
> part (i.e., "nfs://<addr>/<directory>").  And I found it somewhat a
> strange syntax, because the "to reach the resource, visit this" URI
> may not necessarily look like <path>, and I also wondered if
> spelling it like <ref_backend>:<URI-for-resource> is more
> appropriate.

This is a much better way to state what I was going for. I was
considering the entire <ref_backend>:<path> as the URI since currently
we only deal with FS paths (for files and reftable). But that could
potentially change. As such, it makes sense to state it as
<ref_backend>:<URI-for-resource>. Will modify accordingly.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-21 13:42   ` Toon Claes
  2025-11-21 16:07     ` Junio C Hamano
@ 2025-11-24 13:26     ` Karthik Nayak
  1 sibling, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-24 13:26 UTC (permalink / raw)
  To: Toon Claes, git

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]

Toon Claes <toon@iotcl.com> writes:

[snip]

>> diff --git a/refs.c b/refs.c
>> index 23f46867f2..0922f08c9f 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -2186,15 +2186,66 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
>>  	return maybe_debug_wrap_ref_store(dir, ref_store);
>>  }
>>
>> +static struct ref_store *get_ref_store_from_uri(struct repository *repo,
>> +						const char *uri)
>> +{
>> +	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
>> +	enum ref_storage_format format;
>> +	struct ref_store *store = NULL;
>> +	char *format_string;
>> +	char *dir;
>> +
>> +	if (!uri || !uri[0]) {
>> +		error("reference backend uri is empty");
>
> I see no localization on any of the error() or die() messages. I think
> it's worth making them translatable.
>

Yeah, that makes sense.

>> +		goto cleanup;
>> +	}
>> +
>> +	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
>> +		error("invalid reference backend uri format '%s'", uri);
>> +		goto cleanup;
>> +	}
>> +
>> +	format_string = ref_backend_info.items[0].string;
>> +	dir = ref_backend_info.items[1].string + 2;
>
> A length check before jumping to the third char would be advised. Also, I
> think it's worth checking that the first two chars are "//".
>

This is a good point, will add a test and fix this up.

> --
> Cheers,
> Toon

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 0/2] refs: allow setting the reference directory
  2025-11-19 21:48 [PATCH 0/2] refs: allow setting the reference directory Karthik Nayak
                   ` (2 preceding siblings ...)
  2025-11-23  4:29 ` [PATCH 0/2] refs: allow setting the reference directory Junio C Hamano
@ 2025-11-26 11:11 ` Karthik Nayak
  2025-11-26 11:12   ` [PATCH v2 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
  2025-11-26 11:12   ` [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
  2025-12-01 11:24 ` [PATCH v3 0/2] refs: allow setting the reference directory Karthik Nayak
  4 siblings, 2 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-26 11:11 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, gitster, toon, sunshine,
	Jean-Noël Avila

While Git allows users to select different reference backends, unlike
with objects, there is no flexibility in selecting the reference
directory. Currently, the reference format is obtained from the config
of the repository and the reference directory is set to the $GIT_DIR.

This patch series adds a new ENV variable 'GIT_REF_URI' which takes the
reference backend and path in a URI form:

    <reference_backend>://<URI-for-resource>

For e.g. 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.

One use case for this is migration between different backends. On the
server side, migrating from the files backend to the newly introduced
reftable backend can be achieved by running 'git refs migrate'. However,
for large repositories with millions of references, this migration can
take from seconds to minutes.

For some background, at GitLab, the criterion for our migration was to
reduce the downtime of the migration, ideally to zero. Running 'git refs
migrate --ref-format=reftable' by itself wouldn't work, since it scales
with the number of references and we have repos with millions of
references. We also need to migrate without losing any information. We
came up with the following plan:

  1. Run git-pack-refs(1) and note the timestamp of the generated
     packed-refs file.
  2. Run 'git refs migrate --dry-run'.
  3. If there are no ongoing reference requests (read/write):
     a. Lock the repository by blocking incoming requests (done on a
        layer above git, in Gitaly [1]).
     b. If the timestamp of the packed-refs file has changed, unlock
        the repo and repeat from step 1.
     c. Apply all the loose refs to the dry-run reftable folder (this
        requires support in Git to write refs to an arbitrary folder).
     d. Move the reftable dry-run folder into the GIT_DIR.
     e. Swap the repo config.
     f. Unlock repo access.

Such a route scales much better, since the time for which the
repository is blocked is O(refs written between #1 and #3a) and not
O(refs in repo). But to do so, we need to be able to write to an
arbitrary reference backend + path, in order to add the missing
references to the dry-run reftable folder. This series achieves that.
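
As a rough sketch of the timestamp check in steps 1 and 3b, something
like the following could be used by the layer driving the migration. It
is not code from Git or Gitaly: `note_mtime()` and `changed_since()` are
hypothetical names. It also compares only the whole-second mtime, so a
real implementation would likely want to look at more of the stat data.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Step 1: remember the modification time of `path` (e.g. the
 * packed-refs file). Returns 0 on success, -1 if it cannot be stat'ed.
 */
static int note_mtime(const char *path, time_t *noted)
{
	struct stat st;

	assert(path && noted);

	if (stat(path, &st) < 0)
		return -1;
	*noted = st.st_mtime;
	return 0;
}

/*
 * Step 3b: returns 1 if `path` was modified since `noted` (or can no
 * longer be stat'ed), 0 if the timestamp is unchanged. On 1 the caller
 * would unlock the repository and start over from step 1.
 */
static int changed_since(const char *path, time_t noted)
{
	time_t now;

	if (note_mtime(path, &now) < 0)
		return 1;
	return now != noted;
}
```

Treating a file that has disappeared as "changed" is a deliberate choice
in this sketch, since restarting is always safe whereas skipping the
restart may lose updates.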

The first commit adds the required changes to create a 'ref_store' for a
given path. The second commit parses the URI if available when creating
the main ref store.

This is based on top of 9a2fb147f2 (Git 2.52, 2025-11-17).

[1]: https://gitlab.com/gitlab-org/gitaly

---
Changes in v2:
- Added more clarification and proper intent in the cover message.
- Changed the format from '<ref_backend>://<path>' to
  `<ref_backend>://<URI-for-resource>` as it is much clearer.
- Added logic to check for the '//' in the provided URI and a test for
  the same.
- In the tests:
  - Use test_must_fail() instead of ! git
  - Fix looped tests not using the variables correctly and ensure that
    the test description is correct.
- Link to v1: https://patch.msgid.link/20251119-kn-alternate-ref-dir-v1-0-4cf4a94c8bed@gmail.com

---
 Documentation/git.adoc |   8 ++++
 environment.h          |   1 +
 refs.c                 |  71 +++++++++++++++++++++++++++--
 t/meson.build          |   1 +
 t/t1423-ref-backend.sh | 121 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 199 insertions(+), 3 deletions(-)

Karthik Nayak (2):
      refs: support obtaining ref_store for given dir
      refs: add GIT_REF_URI to specify reference backend and directory

Range-diff versus v1:

1:  f6e8aa37fe ! 1:  c925726efd refs: support obtaining ref_store for given dir
    @@ Commit message
         The refs subsystem uses the `get_main_ref_store()` to obtain the main
         ref_store for a given repository. In the upcoming patches we also want
         to create a ref_store for any given reference directory, which may exist
    -    in arbitrary paths. To support such behavior, extract out the core logic
    -    for creating out the ref_store from `get_main_ref_store()` into a new
    -    function `get_ref_store_for_dir()` which can provide the ref_store for a
    +    in arbitrary paths. For the files backend and the reftable backend, the
    +    reference directory is generally the $GIT_DIR.
    +
    +    To support such behavior, extract out the core logic for creating out
    +    the ref_store from `get_main_ref_store()` into a new function
    +    `get_ref_store_for_dir()` which can provide the ref_store for a
         given (repository, directory, reference format) combination.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
2:  5e30fa334e ! 2:  b859ebad64 refs: add GIT_REF_URI to specify reference backend and directory
    @@ Commit message
         Add a new environment variable 'GIT_REF_URI' that specifies both the
         reference backend and directory path using a URI format:
     
    -        <ref_backend>://<path>
    +        <ref_backend>://<URI-for-resource>
     
         When set, this variable is used to obtain the main reference store for
         all Git commands. The variable is checked in `get_main_ref_store()`
    @@ Commit message
         Add a new test file 't1423-ref-backend.sh' to test this environment
         variable.
     
    +    Helped-by: Jean-Noël Avila <jn.avila@free.fr>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## Documentation/git.adoc ##
    @@ Documentation/git.adoc: double-quotes and respecting backslash escapes. E.g., th
      	See `--ref-format` in linkgit:git-init[1].
      
     +`GIT_REF_URI`::
    -+    Specify which reference backend and path to be used, if not specified the
    -+    backend is inferred from the configuration and $GIT_DIR is used as the
    -+    path.
    ++    Specify which reference backend to be used along with its URI. Reference
    ++    backends like the files, reftable backend use the $GIT_DIR as their URI.
     ++
    -+Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
    -+reference backend and the 'path' specifies the directory used by the backend.
    ++Expects the format `<ref_backend>://<URI-for-resource>`, where the
    ++_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_
    ++specifies the URI used by the backend.
     +
      Git Commits
      ~~~~~~~~~~~
    @@ refs.c: static struct ref_store *get_ref_store_for_dir(struct repository *r,
     +	}
     +
     +	format_string = ref_backend_info.items[0].string;
    ++	if (!starts_with(ref_backend_info.items[1].string, "//")) {
    ++		error("invalid reference backend uri format '%s'", uri);
    ++		goto cleanup;
    ++	}
    ++	dir = ref_backend_info.items[1].string + 2;
    ++
    ++	format_string = ref_backend_info.items[0].string;
     +	dir = ref_backend_info.items[1].string + 2;
     +
     +	if (!dir || !dir[0]) {
    @@ t/t1423-ref-backend.sh (new)
     +		cd repo &&
     +		GIT_REF_URI="" &&
     +		export GIT_REF_URI &&
    -+		! git refs list 2>err &&
    ++		test_must_fail git refs list 2>err &&
     +		test_grep "reference backend uri is empty" err
     +	)
     +'
    @@ t/t1423-ref-backend.sh (new)
     +		cd repo &&
     +		GIT_REF_URI="reftable@/home/reftable" &&
     +		export GIT_REF_URI &&
    -+		! git refs list 2>err &&
    ++		test_must_fail git refs list 2>err &&
     +		test_grep "invalid reference backend uri format" err
     +	)
     +'
    @@ t/t1423-ref-backend.sh (new)
     +		cd repo &&
     +		GIT_REF_URI="reftable://" &&
     +		export GIT_REF_URI &&
    -+		! git refs list 2>err &&
    ++		test_must_fail git refs list 2>err &&
     +		test_grep "invalid path in uri" err
     +	)
     +'
     +
    ++test_expect_success 'uri ends at colon' '
    ++	test_when_finished "rm -rf repo" &&
    ++	git init --ref-format=files repo &&
    ++	(
    ++		cd repo &&
    ++		GIT_REF_URI="reftable:" &&
    ++		export GIT_REF_URI &&
    ++		test_must_fail git refs list 2>err &&
    ++		test_grep "invalid reference backend uri format" err
    ++	)
    ++'
    ++
     +test_expect_success 'unknown reference backend' '
     +	test_when_finished "rm -rf repo" &&
     +	git init --ref-format=files repo &&
    @@ t/t1423-ref-backend.sh (new)
     +		cd repo &&
     +		GIT_REF_URI="db://.git" &&
     +		export GIT_REF_URI &&
    -+		! git refs list 2>err &&
    ++		test_must_fail git refs list 2>err &&
     +		test_grep "unknown reference backend" err
     +	)
     +'
    @@ t/t1423-ref-backend.sh (new)
     +			continue
     +		fi
     +
    -+		test_expect_success 'read from other reference backend' '
    ++		test_expect_success "read from $to_format backend" '
     +			test_when_finished "rm -rf repo" &&
    -+			git init --ref-format=files repo &&
    ++			git init --ref-format=$from_format repo &&
     +			(
     +				cd repo &&
     +				test_commit 1 &&
     +				test_commit 2 &&
     +				test_commit 3 &&
     +
    -+				git refs migrate --dry-run --ref-format=reftable >out &&
    -+				REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
    ++				git refs migrate --dry-run --ref-format=$to_format >out &&
    ++				BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
     +				git refs list >expect &&
    -+				GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >actual &&
    ++				GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >actual &&
     +				test_cmp expect actual
     +			)
     +		'
     +
    -+		test_expect_success 'write to other reference backend' '
    ++		test_expect_success "write to $to_format backend" '
     +			test_when_finished "rm -rf repo" &&
    -+			git init --ref-format=files repo &&
    ++			git init --ref-format=$from_format repo &&
     +			(
     +				cd repo &&
     +				test_commit 1 &&
     +				test_commit 2 &&
     +				test_commit 3 &&
     +
    -+				git refs migrate --dry-run --ref-format=reftable >out &&
    ++				git refs migrate --dry-run --ref-format=$to_format >out &&
     +				git refs list >expect &&
     +
    -+				REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
    -+				GIT_REF_URI="reftable://$REFTABLE_PATH" git tag -d 1 &&
    ++				BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
    ++				GIT_REF_URI="$to_format://$BACKEND_PATH" git tag -d 1 &&
     +
     +				git refs list >actual &&
     +				test_cmp expect actual &&
     +
    -+				GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >expect &&
    ++				GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >expect &&
     +				git refs list >out &&
     +				cat out | grep -v "refs/tags/1" >actual &&
     +				test_cmp expect actual


base-commit: 9a2fb147f2c61d0cab52c883e7e26f5b7948e3ed
change-id: 20251105-kn-alternate-ref-dir-3e572e8cd0ef

Thanks
- Karthik
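
For readers following the range-diff above: the tests derive BACKEND_PATH by keeping only the text between the single quotes at the end of the dry-run message. Below is a minimal sketch of that extraction. The helper name `extract_quoted_path` and the sample line are assumptions made for illustration; the real wording of the dry-run message may differ.

```shell
#!/bin/sh
# Hypothetical helper: print the text between the single quotes at the
# end of a message line, the way the tests derive BACKEND_PATH.
# Lines that do not match the pattern pass through unchanged.
extract_quoted_path () {
	sed "s/.* '\(.*\)'/\1/"
}

# Illustrative input only; not copied from actual Git output.
sample="result can be found at '/tmp/repo/.git/ref_migration.XXXXXX'"
backend_path=$(printf '%s\n' "$sample" | extract_quoted_path)
echo "$backend_path"
```

The extracted value is what the tests then paste after `$to_format://` to build GIT_REF_URI.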


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 1/2] refs: support obtaining ref_store for given dir
  2025-11-26 11:11 ` [PATCH v2 " Karthik Nayak
@ 2025-11-26 11:12   ` Karthik Nayak
  2025-11-26 15:16     ` Junio C Hamano
  2025-11-26 11:12   ` [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
  1 sibling, 1 reply; 33+ messages in thread
From: Karthik Nayak @ 2025-11-26 11:12 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, gitster, toon, sunshine

The refs subsystem uses the `get_main_ref_store()` to obtain the main
ref_store for a given repository. In the upcoming patches we also want
to create a ref_store for any given reference directory, which may exist
in arbitrary paths. For the files backend and the reftable backend, the
reference directory is generally the $GIT_DIR.

To support such behavior, extract the core logic for creating the
ref_store from `get_main_ref_store()` into a new function
`get_ref_store_for_dir()` which can provide the ref_store for a
given (repository, directory, reference format) combination.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 965381367e..23f46867f2 100644
--- a/refs.c
+++ b/refs.c
@@ -2177,6 +2177,15 @@ void ref_store_release(struct ref_store *ref_store)
 	free(ref_store->gitdir);
 }
 
+static struct ref_store *get_ref_store_for_dir(struct repository *r,
+					       char *dir,
+					       enum ref_storage_format format)
+{
+	struct ref_store *ref_store = ref_store_init(r, format, dir,
+						     REF_STORE_ALL_CAPS);
+	return maybe_debug_wrap_ref_store(dir, ref_store);
+}
+
 struct ref_store *get_main_ref_store(struct repository *r)
 {
 	if (r->refs_private)
@@ -2185,9 +2194,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r, r->ref_storage_format,
-					 r->gitdir, REF_STORE_ALL_CAPS);
-	r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private);
+	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
 	return r->refs_private;
 }
 

-- 
2.51.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-26 11:11 ` [PATCH v2 " Karthik Nayak
  2025-11-26 11:12   ` [PATCH v2 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
@ 2025-11-26 11:12   ` Karthik Nayak
  2025-11-26 16:17     ` Junio C Hamano
  1 sibling, 1 reply; 33+ messages in thread
From: Karthik Nayak @ 2025-11-26 11:12 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, gitster, toon, sunshine,
	Jean-Noël Avila

Git allows setting a different object directory via
'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
This asymmetry makes it difficult to test different reference backends
or use alternative reference storage locations without modifying the
repository structure.

Add a new environment variable 'GIT_REF_URI' that specifies both the
reference backend and directory path using a URI format:

    <ref_backend>://<URI-for-resource>

When set, this variable is used to obtain the main reference store for
all Git commands. The variable is checked in `get_main_ref_store()`
when lazily assigning `repo->refs_private`. We cannot initialize this
earlier in `repo_set_gitdir()` because the repository's hash algorithm
isn't known at that point, and the reftable backend requires this
information during initialization.

When used with worktrees, the specified directory is treated as the
reference directory for all worktree operations.

Add a new test file 't1423-ref-backend.sh' to test this environment
variable.

Helped-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git.adoc |   8 ++++
 environment.h          |   1 +
 refs.c                 |  60 +++++++++++++++++++++++-
 t/meson.build          |   1 +
 t/t1423-ref-backend.sh | 121 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.adoc b/Documentation/git.adoc
index ce099e78b8..8c6a3f6042 100644
--- a/Documentation/git.adoc
+++ b/Documentation/git.adoc
@@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
 	repositories will be set to this value. The default is "files".
 	See `--ref-format` in linkgit:git-init[1].
 
+`GIT_REF_URI`::
+    Specify which reference backend to be used along with its URI. Reference
+    backends like the files, reftable backend use the $GIT_DIR as their URI.
++
+Expects the format `<ref_backend>://<URI-for-resource>`, where the
+_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_
+specifies the URI used by the backend.
+
 Git Commits
 ~~~~~~~~~~~
 `GIT_AUTHOR_NAME`::
diff --git a/environment.h b/environment.h
index 51898c99cd..9bc380bba4 100644
--- a/environment.h
+++ b/environment.h
@@ -42,6 +42,7 @@
 #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
 #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
 #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
+#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
 
 /*
  * Environment variable used to propagate the --no-advice global option to the
diff --git a/refs.c b/refs.c
index 23f46867f2..a7af228799 100644
--- a/refs.c
+++ b/refs.c
@@ -2186,15 +2186,73 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
 	return maybe_debug_wrap_ref_store(dir, ref_store);
 }
 
+static struct ref_store *get_ref_store_from_uri(struct repository *repo,
+						const char *uri)
+{
+	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
+	enum ref_storage_format format;
+	struct ref_store *store = NULL;
+	char *format_string;
+	char *dir;
+
+	if (!uri || !uri[0]) {
+		error("reference backend uri is empty");
+		goto cleanup;
+	}
+
+	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
+		error("invalid reference backend uri format '%s'", uri);
+		goto cleanup;
+	}
+
+	format_string = ref_backend_info.items[0].string;
+	if (!starts_with(ref_backend_info.items[1].string, "//")) {
+		error("invalid reference backend uri format '%s'", uri);
+		goto cleanup;
+	}
+	dir = ref_backend_info.items[1].string + 2;
+
+	format_string = ref_backend_info.items[0].string;
+	dir = ref_backend_info.items[1].string + 2;
+
+	if (!dir || !dir[0]) {
+		error("invalid path in uri '%s'", uri);
+		goto cleanup;
+	}
+
+	format = ref_storage_format_by_name(format_string);
+	if (format == REF_STORAGE_FORMAT_UNKNOWN) {
+		error("unknown reference backend '%s'", format_string);
+		goto cleanup;
+	}
+
+	store = get_ref_store_for_dir(repo, dir, format);
+
+cleanup:
+	string_list_clear(&ref_backend_info, 0);
+	return store;
+}
+
 struct ref_store *get_main_ref_store(struct repository *r)
 {
+	char *ref_uri;
+
 	if (r->refs_private)
 		return r->refs_private;
 
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
+	ref_uri = getenv(GIT_REF_URI_ENVIRONMENT);
+	if (ref_uri) {
+		r->refs_private = get_ref_store_from_uri(r, ref_uri);
+		if (!r->refs_private)
+			die("failed to initialize ref store from URI: %s", ref_uri);
+
+	} else {
+		r->refs_private = get_ref_store_for_dir(r, r->gitdir,
+							r->ref_storage_format);
+	}
 	return r->refs_private;
 }
 
diff --git a/t/meson.build b/t/meson.build
index a5531df415..a66f8fafff 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -208,6 +208,7 @@ integration_tests = [
   't1420-lost-found.sh',
   't1421-reflog-write.sh',
   't1422-show-ref-exists.sh',
+  't1423-ref-backend.sh',
   't1430-bad-ref-name.sh',
   't1450-fsck.sh',
   't1451-fsck-buffer.sh',
diff --git a/t/t1423-ref-backend.sh b/t/t1423-ref-backend.sh
new file mode 100755
index 0000000000..f6756bdd2b
--- /dev/null
+++ b/t/t1423-ref-backend.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description='Test different reference backend URIs'
+
+. ./test-lib.sh
+
+test_expect_success 'empty uri provided' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "reference backend uri is empty" err
+	)
+'
+
+test_expect_success 'invalid uri provided' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable@/home/reftable" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "invalid reference backend uri format" err
+	)
+'
+
+test_expect_success 'empty path in uri' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable://" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "invalid path in uri" err
+	)
+'
+
+test_expect_success 'uri ends at colon' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable:" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "invalid reference backend uri format" err
+	)
+'
+
+test_expect_success 'unknown reference backend' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="db://.git" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "unknown reference backend" err
+	)
+'
+
+ref_formats="files reftable"
+for from_format in $ref_formats
+do
+	for to_format in $ref_formats
+	do
+		if test "$from_format" = "$to_format"
+		then
+			continue
+		fi
+
+		test_expect_success "read from $to_format backend" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			(
+				cd repo &&
+				test_commit 1 &&
+				test_commit 2 &&
+				test_commit 3 &&
+
+				git refs migrate --dry-run --ref-format=$to_format >out &&
+				BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
+				git refs list >expect &&
+				GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >actual &&
+				test_cmp expect actual
+			)
+		'
+
+		test_expect_success "write to $to_format backend" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			(
+				cd repo &&
+				test_commit 1 &&
+				test_commit 2 &&
+				test_commit 3 &&
+
+				git refs migrate --dry-run --ref-format=$to_format >out &&
+				git refs list >expect &&
+
+				BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
+				GIT_REF_URI="$to_format://$BACKEND_PATH" git tag -d 1 &&
+
+				git refs list >actual &&
+				test_cmp expect actual &&
+
+				GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >expect &&
+				git refs list >out &&
+				cat out | grep -v "refs/tags/1" >actual &&
+				test_cmp expect actual
+			)
+		'
+	done
+done
+
+test_done

-- 
2.51.2
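
The validation order described in the patch above can be sketched in shell. This is only an illustration that loosely follows the checks in get_ref_store_from_uri(); it is not Git's implementation and may differ from it on edge cases. The helper name `parse_ref_uri` is hypothetical, and the accepted backend names are taken from the test's `ref_formats` list.

```shell
#!/bin/sh
# Hypothetical helper: prints "<backend> <path>" on success, or an
# error on stderr and a non-zero status otherwise.
parse_ref_uri () {
	uri=$1
	if test -z "$uri"
	then
		echo "error: reference backend uri is empty" >&2
		return 1
	fi

	case $uri in
	*:*) ;;
	*)
		echo "error: invalid reference backend uri format '$uri'" >&2
		return 1 ;;
	esac
	backend=${uri%%:*}	# text before the first colon
	rest=${uri#*:}		# text after the first colon

	case $rest in
	//*) ;;
	*)
		echo "error: invalid reference backend uri format '$uri'" >&2
		return 1 ;;
	esac
	path=${rest#//}		# drop the leading double slash

	if test -z "$path"
	then
		echo "error: invalid path in uri '$uri'" >&2
		return 1
	fi

	case $backend in
	files|reftable) ;;
	*)
		echo "error: unknown reference backend '$backend'" >&2
		return 1 ;;
	esac

	printf '%s %s\n' "$backend" "$path"
}

parse_ref_uri "reftable:///foo"
```

The rejected inputs correspond to the five error cases at the top of t1423-ref-backend.sh: an empty value, a missing colon, a colon with nothing after it, an empty path, and an unknown backend name.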


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-24 13:25       ` Karthik Nayak
@ 2025-11-26 13:11         ` Toon Claes
  0 siblings, 0 replies; 33+ messages in thread
From: Toon Claes @ 2025-11-26 13:11 UTC (permalink / raw)
  To: Karthik Nayak, Junio C Hamano; +Cc: git, Patrick Steinhardt

Karthik Nayak <karthik.188@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Toon Claes <toon@iotcl.com> writes:
>>
>>>>     <ref_backend>://<path>
>>>
>>> I like this idea. This would allow us in the future to also do something
>>> like:
>>>
>>>     reftable+nfs://10.11.12.13/ref-dir
>>
>> I actually thought from Karthik's definition that what you are
>> trying to say is spelled more like this:
>>
>>     reftable://nfs://10.11.12.13/ref-dir

LOL, I didn't realize that since 07c7782cc8 (Disown ssh+git and git+ssh,
2016-02-15) 'ssh+git' isn't actually mentioned anymore.

>> IOW, the underlying URI to "reach the resource" is in the <path>
>> part (i.e., "nfs://<addr>/<directory>").  And I found it somewhat a
>> strange syntax, because the "to reach the resource, visit this" URI
>> may not necessarily look like <path>, and I also wondered if
>> spelling it like <ref_backend>:<URI-for-resource> is more
>> appropriate.
>
> This is a much better way to state what I was going for, I was
> considering the entire <ref_backend>:<path> as the URI since currently
> we only deal with FS paths (for files and reftable). But that could
> potentially change. As such, it makes sense to state it as
> <ref_backend>:<URI-for-resource>. Will modify accordingly.

I don't really care about the format, as long as we're consistent. I know
Patrick is working on pluggable object databases and also there he's
suggesting to do `<backend_type>://<URI>` which might also lead to
double `://` use. But I guess that's still subject to change.

That said. Should we change the current proposal of
`<ref_backend>://<path>` to `<ref_backend>:file://<path>`? Or can we
simply assume the latter implied from the former?


-- 
Cheers,
Toon

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/2] refs: support obtaining ref_store for given dir
  2025-11-26 11:12   ` [PATCH v2 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
@ 2025-11-26 15:16     ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-11-26 15:16 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, toon, sunshine

Karthik Nayak <karthik.188@gmail.com> writes:

> The refs subsystem uses the `get_main_ref_store()` to obtain the main
> ref_store for a given repository. In the upcoming patches we also want
> to create a ref_store for any given reference directory, which may exist
> in arbitrary paths. For the files backend and the reftable backend, the
> reference directory is generally the $GIT_DIR.
>
> To support such behavior, extract the core logic for creating the
> ref_store from `get_main_ref_store()` into a new function
> `get_ref_store_for_dir()` which can provide the ref_store for a
> given (repository, directory, reference format) combination.

I am guessing that this is meant to work with the REF_URI thing, and
the <path> part in REF_URI=<backend>:<path> corresponds to the "dir"
parameter here.

Looks like a good no-op split.

> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 965381367e..23f46867f2 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2177,6 +2177,15 @@ void ref_store_release(struct ref_store *ref_store)
>  	free(ref_store->gitdir);
>  }
>  
> +static struct ref_store *get_ref_store_for_dir(struct repository *r,
> +					       char *dir,
> +					       enum ref_storage_format format)
> +{
> +	struct ref_store *ref_store = ref_store_init(r, format, dir,
> +						     REF_STORE_ALL_CAPS);
> +	return maybe_debug_wrap_ref_store(dir, ref_store);
> +}
> +
>  struct ref_store *get_main_ref_store(struct repository *r)
>  {
>  	if (r->refs_private)
> @@ -2185,9 +2194,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
>  	if (!r->gitdir)
>  		BUG("attempting to get main_ref_store outside of repository");
>  
> -	r->refs_private = ref_store_init(r, r->ref_storage_format,
> -					 r->gitdir, REF_STORE_ALL_CAPS);
> -	r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private);
> +	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
>  	return r->refs_private;
>  }

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-26 11:12   ` [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
@ 2025-11-26 16:17     ` Junio C Hamano
  2025-11-27 14:52       ` Karthik Nayak
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-11-26 16:17 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila

Karthik Nayak <karthik.188@gmail.com> writes:

> +`GIT_REF_URI`::
> +    Specify which reference backend to be used along with its URI. Reference
> +    backends like the files, reftable backend use the $GIT_DIR as their URI.
> ++
> +Expects the format `<ref_backend>://<URI-for-resource>`, where the
> +_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_
> +specifies the URI used by the backend.

It is more like "<directory>" that specifies the local directory the
backend is told to use to store its data.  It feels way too broad
for what the initial implementation achieves and what the design can
potentially include, to say "URI-for-resource", I would think.

> diff --git a/environment.h b/environment.h
> index 51898c99cd..9bc380bba4 100644
> --- a/environment.h
> +++ b/environment.h
> @@ -42,6 +42,7 @@
>  #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
>  #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
>  #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
> +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
>  
>  /*
>   * Environment variable used to propagate the --no-advice global option to the
> diff --git a/refs.c b/refs.c
> index 23f46867f2..a7af228799 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2186,15 +2186,73 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
>  	return maybe_debug_wrap_ref_store(dir, ref_store);
>  }
>  
> +static struct ref_store *get_ref_store_from_uri(struct repository *repo,
> +						const char *uri)
> +{
> +	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
> +	enum ref_storage_format format;
> +	struct ref_store *store = NULL;
> +	char *format_string;
> +	char *dir;
> +
> +	if (!uri || !uri[0]) {
> +		error("reference backend uri is empty");
> +		goto cleanup;
> +	}

Equating !uri and !uri[0] and giving the same message would not help
diagnosing an error, and not _("localizing") the message is of dubious
value (after all, the message is not being given to somebody coming
over the network, but meant to be given to the local user, right?).

If we remove the !uri[0] from the check, shouldn't the later check
catch it as "invalid format" anyway, and print it with '%s' to show
clearly enough that what was given was empty?
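
The distinction drawn here, a NULL from getenv() versus an empty string, has a direct shell analogue in the `${VAR+x}` expansion. A small sketch follows; the helper name `ref_uri_state` is hypothetical and exists only to show the three states.

```shell
#!/bin/sh
# Hypothetical helper: report whether GIT_REF_URI is unset, set but
# empty, or set to a value.
ref_uri_state () {
	if test -z "${GIT_REF_URI+x}"
	then
		echo unset
	elif test -z "$GIT_REF_URI"
	then
		echo empty
	else
		echo "set: $GIT_REF_URI"
	fi
}

unset GIT_REF_URI
ref_uri_state
GIT_REF_URI=
ref_uri_state
```

An unset variable is the case where the default reference store applies, while a variable that is set but empty is the case this paragraph suggests could be reported through the generic "invalid format" message.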

> +	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
> +		error("invalid reference backend uri format '%s'", uri);
> +		goto cleanup;
> +	}
> +
> +	format_string = ref_backend_info.items[0].string;
> +	if (!starts_with(ref_backend_info.items[1].string, "//")) {
> +		error("invalid reference backend uri format '%s'", uri);
> +		goto cleanup;
> +	}
> +	dir = ref_backend_info.items[1].string + 2;

Two questions.  (1) do we still want the double-slash after the
colon?  (2) if so, would it make it simpler to string-list-split
using "://" as the separator?

> +	format_string = ref_backend_info.items[0].string;
> +	dir = ref_backend_info.items[1].string + 2;

These two lines are fishy.  Perhaps leftover from an earlier draft
that did not have an error checking before the previous 5 lines were
added?

> +	if (!dir || !dir[0]) {
> +		error("invalid path in uri '%s'", uri);
> +		goto cleanup;
> +	}

At this point it is very unlikely for "dir" to be NULL, no?  Even if
the .string member after splitting were NULL, adding 2 to it would
not leave it NULL.

Being defensive and checking for NULL is good, but then exactly the
same question on "NULL vs an empty string" applies here.

>  struct ref_store *get_main_ref_store(struct repository *r)
>  {
> +	char *ref_uri;
> +
>  	if (r->refs_private)
>  		return r->refs_private;
>  
>  	if (!r->gitdir)
>  		BUG("attempting to get main_ref_store outside of repository");
>  
> -	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
> +	ref_uri = getenv(GIT_REF_URI_ENVIRONMENT);
> +	if (ref_uri) {
> +		r->refs_private = get_ref_store_from_uri(r, ref_uri);
> +		if (!r->refs_private)
> +			die("failed to initialize ref store from URI: %s", ref_uri);
> +
> +	} else {
> +		r->refs_private = get_ref_store_for_dir(r, r->gitdir,
> +							r->ref_storage_format);
> +	}
>  	return r->refs_private;
>  }

If this mechanism is for consumption by "git refs migrate", is it
possible to reduce the blast radius by giving the command a command
line option to do an equivalent of this?  I really am not happy with
this environment variable that can change the behaviour of such a
low level layer from unsuspecting programs that are not ready.

Instead of tweaking the behaviour of this function via environment
that can affect any programs, can't we give these callers like "git
refs migrate" with specific needs a set_main_ref_store() function that
takes a ref_store and a repository?  Then they can call into
get_ref_store_for_dir() to obtain the ref store they need.  "git refs
migrate" already takes a "--ref-format" option, so all it needs is
another "--ref-directory" command line option, right?

If the ability to set the ref backend location for arbitrary programs
proves to be useful, we _could_ give the same --ref-format and
--ref-directory command line options to "git" itself (like "git -C
there" runs any subcommand in the named directory), which does the
get_ref_store_for_dir() plus set_main_ref_store() dance,
modelled after how "git refs migrate" does them.

Hmm?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-26 16:17     ` Junio C Hamano
@ 2025-11-27 14:52       ` Karthik Nayak
  2025-11-27 20:02         ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Karthik Nayak @ 2025-11-27 14:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> +`GIT_REF_URI`::
>> +    Specify which reference backend to be used along with its URI. Reference
>> +    backends like the files, reftable backend use the $GIT_DIR as their URI.
>> ++
>> +Expects the format `<ref_backend>://<URI-for-resource>`, where the
>> +_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_
>> +specifies the URI used by the backend.
>
> It is more like "<directory>" that specifies the local directory the
> backend is told to use to store its data.  It feels way too broad
> for what the initial implementation achieves and what the design can
> potentially include, to say "URI-for-resource", I would think.
>

Well, I'm okay either way. My first version was very specific as it
mentioned '<path>'. I changed it based on the discussion with you and Toon
about how the '<path>' is the URI for the reference backend.

>> diff --git a/environment.h b/environment.h
>> index 51898c99cd..9bc380bba4 100644
>> --- a/environment.h
>> +++ b/environment.h
>> @@ -42,6 +42,7 @@
>>  #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
>>  #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
>>  #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
>> +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
>>
>>  /*
>>   * Environment variable used to propagate the --no-advice global option to the
>> diff --git a/refs.c b/refs.c
>> index 23f46867f2..a7af228799 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -2186,15 +2186,73 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
>>  	return maybe_debug_wrap_ref_store(dir, ref_store);
>>  }
>>
>> +static struct ref_store *get_ref_store_from_uri(struct repository *repo,
>> +						const char *uri)
>> +{
>> +	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
>> +	enum ref_storage_format format;
>> +	struct ref_store *store = NULL;
>> +	char *format_string;
>> +	char *dir;
>> +
>> +	if (!uri || !uri[0]) {
>> +		error("reference backend uri is empty");
>> +		goto cleanup;
>> +	}
>
> Equating !uri and !uri[0] and giving the same message would not help
> diagnosing an error, and not _("localizing") the message is of dubious
> value (after all, the message is not being given to somebody coming
> over the network, but meant to be given to the local user, right?).
>

I think that's fair. I also missed localizing all the errors, I think
someone did point that out too.

> If we remove the !uri[0] from the check, shouldn't the later check
> catch it as "invalid format" anyway, and print it with '%s' to show
> clearly enough that what was given was empty?
>

Yeah, it should. I'll remove the latter and modify the test.

>> +	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
>> +		error("invalid reference backend uri format '%s'", uri);
>> +		goto cleanup;
>> +	}
>> +
>> +	format_string = ref_backend_info.items[0].string;
>> +	if (!starts_with(ref_backend_info.items[1].string, "//")) {
>> +		error("invalid reference backend uri format '%s'", uri);
>> +		goto cleanup;
>> +	}
>> +	dir = ref_backend_info.items[1].string + 2;
>
> Two questions.  (1) do we still want the double-slash after the
> colon?  (2) if so, would it make it simpler to string-list-split
> using "://" as the separator?
>

(1) Yes.
(2) My understanding of `string_list_split()` was that the `delim`
argument is a set of characters to split the string on.

So:
    string_list_split(l, "abc:def/ghi/jkl", "://", -1) -> ["abc", "def", "ghi", "jkl"]
    string_list_split(l, "reftable://foo", "://", -1) -> ["reftable", "", "", "foo"]

But this isn't what we want.
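
The same "delimiter is a set of characters" behaviour can be reproduced with shell field splitting, which may help to see why splitting on "://" does not treat it as one separator. This is an analogy only, not a statement about the C function beyond what is described above; the helper name `split_on_charset` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split $1 on any single character found in $2 and
# print one field per line, keeping the empty fields that appear between
# adjacent delimiters.
split_on_charset () {
	(
		set -f		# no pathname expansion on the fields
		IFS=$2
		set -- $1	# unquoted on purpose: triggers field splitting
		for field
		do
			printf '[%s]\n' "$field"
		done
	)
}

split_on_charset "reftable://foo" ":/"
```

Each of ':' and '/' ends a field on its own, so the two slashes after the colon yield two empty fields between "reftable" and "foo".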

>> +	format_string = ref_backend_info.items[0].string;
>> +	dir = ref_backend_info.items[1].string + 2;
>
> These two lines are fishy.  Perhaps leftover from an earlier draft
> that did not have an error checking before the previous 5 lines were
> added?
>

Yes, will cleanup.

>> +	if (!dir || !dir[0]) {
>> +		error("invalid path in uri '%s'", uri);
>> +		goto cleanup;
>> +	}
>
> At this point it is very unlikely for "dir" to be NULL, no?  Even if
> the .string member after splitting were NULL, adding 2 to it would
> not leave it NULL.
>
> Being defensive and checking for NULL is good, but then exactly the
> same question on "NULL vs an empty string" applies here.
>

Yea, the '!dir[0]' should definitely be enough here.

>>  struct ref_store *get_main_ref_store(struct repository *r)
>>  {
>> +	char *ref_uri;
>> +
>>  	if (r->refs_private)
>>  		return r->refs_private;
>>
>>  	if (!r->gitdir)
>>  		BUG("attempting to get main_ref_store outside of repository");
>>
>> -	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
>> +	ref_uri = getenv(GIT_REF_URI_ENVIRONMENT);
>> +	if (ref_uri) {
>> +		r->refs_private = get_ref_store_from_uri(r, ref_uri);
>> +		if (!r->refs_private)
>> +			die("failed to initialize ref store from URI: %s", ref_uri);
>> +
>> +	} else {
>> +		r->refs_private = get_ref_store_for_dir(r, r->gitdir,
>> +							r->ref_storage_format);
>> +	}
>>  	return r->refs_private;
>>  }
>
> If this mechanism is for consumption by "git refs migrate", is it
> possible to reduce the blast radius by giving the command a command
> line option to do an equivalent of this?  I really am not happy with
> this environment variable that can change the behaviour of such a
> low level layer from unsuspecting programs that are not ready.
>

But the mechanism isn't for 'git refs migrate', but rather we want to
add/update references via 'git update-ref' into the dry-run folder
created by the 'git refs migrate'. In the broader sense, we want to
manipulate references within this dry-run folder as if it is the
reference folder for the underlying repository.

I get the apprehension about the environment variable and am happy to
work on an alternative if we can achieve something similar. The
reason to pick the ENV variable was mostly because this isn't a regular
user flag which we expect users to use. Also, this is very similar to
the already existing GIT_OBJECT_DIRECTORY.

> Instead of tweaking the behaviour of this function via environment
> that can affect any programs, can't we give these callers like "git
> refs migrate" with specific needs a set_main_ref_store() function that
> takes a ref_store and a repository?  Then they can call into
> get_ref_store_for_dir() to obtain the ref store they need.  "git refs
> migrate" already takes a "--ref-format" option, so all it needs is
> another "--ref-directory" command line option, right?
>

Something like this would require us to add these flags to all commands.
Currently I can think of 'git update-ref' and 'git refs', but it could
spread to all reference-oriented commands.

> If the ability to set the ref backend location for arbitrary programs
> proves to be useful, we _could_ give the same --ref-format and
> --ref-directory command line options to "git" itself (like "git -C
> there" runs any subcommand in the named directory), which does the
> get_ref_store_for_dir() plus set_main_ref_store() dance,
> modelled after how "git refs migrate" does them.
>
> Hmm?

This could work indeed, I would instead swap it out for a single
"--ref-uri=<backend>://<uri>" which would make it much simpler for users
and future implementations which might not have a 'directory' like the
current backends do.

Overall the ENV variable seemed the best based on the constraints and
the existing similar variables. Wdyt?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-27 14:52       ` Karthik Nayak
@ 2025-11-27 20:02         ` Junio C Hamano
  2025-11-27 21:45           ` Karthik Nayak
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-11-27 20:02 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila

Karthik Nayak <karthik.188@gmail.com> writes:

> (2) My understanding of `string_list_split()` was that the `delim`
> argument are a set of characters to split the string on.

Ah, silly me.

> But the mechanism isn't for 'git refs migrate', but rather we want to
> add/update references via 'git update-ref' into the dry-run folder
> created by the 'git refs migrate'. In the broader sense, we want to
> manipulate references within this dry-run folder as if it is the
> reference folder for the underlying repository.

OK, I took the cover letter description too literally, it seems.

If we want everybody in a single session to have a temporarily
distorted view of the world, it has been a tried and proven way to
use environment variables that override the default repository
layout, e.g., GIT_DIR, GIT_WORK_TREE, and this "no reference
interactions go there, not the usual place the repository
configuration says" environment variable fits very well in the
context.
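
The scoping that makes an environment variable suitable for this can be sketched without Git at all. In the sketch below a child `sh -c` process stands in for a Git command and only reports what it sees; that stand-in is an assumption made for illustration. The two ways of setting the variable are the ones the tests in t1423 use.

```shell
#!/bin/sh
# A child process stands in for a Git command here; it only reports
# what it sees in its environment.
child='echo "${GIT_REF_URI-unset}"'

unset GIT_REF_URI

# One-shot prefix: only this single command sees the override.
one_shot=$(GIT_REF_URI="reftable:///foo" sh -c "$child")

# Exported inside a subshell: every command in the subshell sees it,
# and the override disappears when the subshell exits.
in_subshell=$(
	GIT_REF_URI="reftable:///foo" &&
	export GIT_REF_URI &&
	sh -c "$child"
)

# Back in the parent shell the variable is still unset.
after=$(sh -c "$child")

printf '%s\n' "$one_shot" "$in_subshell" "$after"
```

Either form keeps the override confined to the commands that are meant to see it, which is the "single session" property described here.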

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-27 20:02         ` Junio C Hamano
@ 2025-11-27 21:45           ` Karthik Nayak
  0 siblings, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-11-27 21:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> (2) My understanding of `string_list_split()` was that the `delim`
>> argument are a set of characters to split the string on.
>
> Ah, silly me.
>
>> But the mechanism isn't for 'git refs migrate', but rather we want to
>> add/update references via 'git update-ref' into the dry-run folder
>> created by the 'git refs migrate'. In the broader sense, we want to
>> manipulate references within this dry-run folder as if it is the
>> reference folder for the underlying repository.
>
> OK, I took the cover letter description too literally, it seems.
>

I did change the cover letter for this version with the plan of how this
would be used. Let me know if you think I could clarify further.

> If we want everybody in a single session to have a temporarily
> distorted view of the world, it has been a tried and proven way to
> use environment variables that override the default repository
> layout, e.g., GIT_DIR, GIT_WORK_TREE, and this "now reference
> interactions go there, not the usual place the repository
> configuration says" environment variable fits very well in the
> context.
>
> Thanks.

Yes! Exactly. Good to see we're on the same page :)

Karthik


* [PATCH v3 0/2] refs: allow setting the reference directory
  2025-11-19 21:48 [PATCH 0/2] refs: allow setting the reference directory Karthik Nayak
                   ` (3 preceding siblings ...)
  2025-11-26 11:11 ` [PATCH v2 " Karthik Nayak
@ 2025-12-01 11:24 ` Karthik Nayak
  2025-12-01 11:24   ` [PATCH v3 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
  2025-12-01 11:24   ` [PATCH v3 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
  4 siblings, 2 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-12-01 11:24 UTC (permalink / raw)
  To: git; +Cc: jltobler, gitster, toon, sunshine, Karthik Nayak,
	Jean-Noël Avila

While Git allows users to select different reference backends, unlike
with objects, there is no flexibility in selecting the reference
directory. Currently, the reference format is obtained from the config
of the repository and the reference directory is set to the $GIT_DIR.

This patch series adds a new ENV variable 'GIT_REF_URI' which takes the
reference backend and path in a URI form:

    <reference_backend>://<URI-for-resource>

For example, 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.
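
To make the expected shape concrete, here is a minimal standalone sketch of
how such a string splits into a backend name and the remainder. It is not
the code from the patch: `parse_ref_uri()` and the `ref_uri_status` codes
are hypothetical names, it splits on the first ':' only, and it hardcodes
the two known backend names where the patch calls
`ref_storage_format_by_name()`. The order of the checks is meant to follow
the one in patch 2/2.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; the patch reports errors via error() instead. */
enum ref_uri_status {
	REF_URI_OK,
	REF_URI_MISSING,
	REF_URI_BAD_FORMAT,
	REF_URI_EMPTY_PATH,
	REF_URI_UNKNOWN_BACKEND
};

/*
 * Split "<backend>://<data>" at the first ':'. On success, copy the
 * backend name into `backend` and point `*data` at the text after "//".
 */
static enum ref_uri_status parse_ref_uri(const char *uri, char *backend,
					 size_t backend_len, const char **data)
{
	const char *colon;
	size_t len;

	if (!uri)
		return REF_URI_MISSING;

	colon = strchr(uri, ':');
	if (!colon)
		return REF_URI_BAD_FORMAT;	/* also covers the empty string */

	if (strncmp(colon + 1, "//", 2))
		return REF_URI_BAD_FORMAT;

	if (!colon[3])
		return REF_URI_EMPTY_PATH;	/* nothing after "//" */

	len = (size_t)(colon - uri);
	if (len >= backend_len)
		return REF_URI_UNKNOWN_BACKEND;
	memcpy(backend, uri, len);
	backend[len] = '\0';

	/* Assumption: only the two backends named in this thread exist. */
	if (strcmp(backend, "files") && strcmp(backend, "reftable"))
		return REF_URI_UNKNOWN_BACKEND;

	*data = colon + 3;
	return REF_URI_OK;
}
```

With this sketch, 'reftable:///foo' yields the backend "reftable" and the
data "/foo", while an empty string is rejected as a format error rather
than as a missing value.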

One use case for this is migration between different backends. On the
server side, migrating from the files backend to the newly introduced
reftable backend can be achieved by running 'git refs migrate'. However,
for large repositories with millions of references, this migration can
take from seconds to minutes.

For some background, at GitLab, the criterion for our migration was to
reduce the downtime of the migration, ideally to zero. Running 'git refs
migrate --ref-format=reftable' by itself wouldn't work, since its runtime
scales with the number of references and we have repos with millions of
references. We need to migrate without losing any information, so we
came up with the following plan:

  1. Run git-pack-refs(1) and note timestamp of the generated packed-refs
     file.
  2. Run git refs migrate --dry-run.
  3. If there are no ongoing reference requests (read/write)
     a. Lock the repository by blocking incoming requests (done on a
        layer above git, in Gitaly [1]).
     b. If the timestamp of the packed-refs file has changed, unlock
        the repo and repeat from step 1.
     c. Apply all the loose refs to the dry-run reftable folder (this
        requires support in Git to write refs to arbitrary folder).
     d. Move the reftable dry-run folder into the GIT_DIR.
     e. Swap the repo config
     f. Unlock repo access

Such a route scales much better, since the time the repository stays
blocked is O(refs written between #1 and #3a) and not O(refs in repo).
But for doing so, we need to be able to write to an arbitrary reference
backend + path, in order to add the missing references to the dry-run
reftable folder. This series achieves that.

The first commit adds the required changes to create a 'ref_store' for a
given path. The second commit parses the URI if available when creating
the main ref store.

This is based on top of 9a2fb147f2 (Git 2.52, 2025-11-17).

[1]: https://gitlab.com/gitlab-org/gitaly

---
Changes in v3:
- Clean up some stale code which wasn't removed.
- Localize strings which will be output to the user.
- Remove additional defensive checks which are not needed.
- Link to v2: https://patch.msgid.link/20251126-kn-alternate-ref-dir-v2-0-8b9f6f18f635@gmail.com

Changes in v2:
- Added more clarification and proper intent in the cover message.
- Changed the format from '<ref_backend>://<path>' to
  `<ref_backend>://<URI-for-resource>` as it is much clearer.
- Added logic to check for the '//' in the provided URI and a test for
  the same.
- In the tests:
  - Use test_must_fail() instead of ! git
  - Fix looped tests not using the variables correctly and ensure that
    the test description is correct.
- Link to v1: https://patch.msgid.link/20251119-kn-alternate-ref-dir-v1-0-4cf4a94c8bed@gmail.com

---
 Documentation/git.adoc |   8 ++++
 environment.h          |   1 +
 refs.c                 |  68 +++++++++++++++++++++++++--
 t/meson.build          |   1 +
 t/t1423-ref-backend.sh | 121 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 196 insertions(+), 3 deletions(-)

Karthik Nayak (2):
      refs: support obtaining ref_store for given dir
      refs: add GIT_REF_URI to specify reference backend and directory

Range-diff versus v2:

1:  5d37b2f0be = 1:  2b65f93e56 refs: support obtaining ref_store for given dir
2:  493c7ca098 ! 2:  d5dbb2f112 refs: add GIT_REF_URI to specify reference backend and directory
    @@ refs.c: static struct ref_store *get_ref_store_for_dir(struct repository *r,
     +	char *format_string;
     +	char *dir;
     +
    -+	if (!uri || !uri[0]) {
    -+		error("reference backend uri is empty");
    ++	if (!uri) {
    ++		error(_("reference backend uri is not provided"));
     +		goto cleanup;
     +	}
     +
     +	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
    -+		error("invalid reference backend uri format '%s'", uri);
    ++		error(_("invalid reference backend uri format '%s'"), uri);
     +		goto cleanup;
     +	}
     +
     +	format_string = ref_backend_info.items[0].string;
     +	if (!starts_with(ref_backend_info.items[1].string, "//")) {
    -+		error("invalid reference backend uri format '%s'", uri);
    ++		error(_("invalid reference backend uri format '%s'"), uri);
     +		goto cleanup;
     +	}
     +	dir = ref_backend_info.items[1].string + 2;
     +
    -+	format_string = ref_backend_info.items[0].string;
    -+	dir = ref_backend_info.items[1].string + 2;
    -+
    -+	if (!dir || !dir[0]) {
    -+		error("invalid path in uri '%s'", uri);
    ++	if (!dir[0]) {
    ++		error(_("invalid path in uri '%s'"), uri);
     +		goto cleanup;
     +	}
     +
     +	format = ref_storage_format_by_name(format_string);
     +	if (format == REF_STORAGE_FORMAT_UNKNOWN) {
    -+		error("unknown reference backend '%s'", format_string);
    ++		error(_("unknown reference backend '%s'"), format_string);
     +		goto cleanup;
     +	}
     +
    @@ t/t1423-ref-backend.sh (new)
     +		GIT_REF_URI="" &&
     +		export GIT_REF_URI &&
     +		test_must_fail git refs list 2>err &&
    -+		test_grep "reference backend uri is empty" err
    ++		test_grep "invalid reference backend uri format" err
     +	)
     +'
     +


base-commit: 9a2fb147f2c61d0cab52c883e7e26f5b7948e3ed
change-id: 20251105-kn-alternate-ref-dir-3e572e8cd0ef

Thanks
- Karthik



* [PATCH v3 1/2] refs: support obtaining ref_store for given dir
  2025-12-01 11:24 ` [PATCH v3 0/2] refs: allow setting the reference directory Karthik Nayak
@ 2025-12-01 11:24   ` Karthik Nayak
  2025-12-01 11:24   ` [PATCH v3 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
  1 sibling, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-12-01 11:24 UTC (permalink / raw)
  To: git; +Cc: jltobler, gitster, toon, sunshine, Karthik Nayak

The refs subsystem uses `get_main_ref_store()` to obtain the main
ref_store for a given repository. In the upcoming patches we also want
to create a ref_store for any given reference directory, which may exist
in arbitrary paths. For the files backend and the reftable backend, the
reference directory is generally the $GIT_DIR.

To support such behavior, extract the core logic for creating the
ref_store from `get_main_ref_store()` into a new function
`get_ref_store_for_dir()` which can provide the ref_store for a
given (repository, directory, reference format) combination.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 965381367e..23f46867f2 100644
--- a/refs.c
+++ b/refs.c
@@ -2177,6 +2177,15 @@ void ref_store_release(struct ref_store *ref_store)
 	free(ref_store->gitdir);
 }
 
+static struct ref_store *get_ref_store_for_dir(struct repository *r,
+					       char *dir,
+					       enum ref_storage_format format)
+{
+	struct ref_store *ref_store = ref_store_init(r, format, dir,
+						     REF_STORE_ALL_CAPS);
+	return maybe_debug_wrap_ref_store(dir, ref_store);
+}
+
 struct ref_store *get_main_ref_store(struct repository *r)
 {
 	if (r->refs_private)
@@ -2185,9 +2194,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r, r->ref_storage_format,
-					 r->gitdir, REF_STORE_ALL_CAPS);
-	r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private);
+	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
 	return r->refs_private;
 }
 

-- 
2.51.2



* [PATCH v3 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-12-01 11:24 ` [PATCH v3 0/2] refs: allow setting the reference directory Karthik Nayak
  2025-12-01 11:24   ` [PATCH v3 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
@ 2025-12-01 11:24   ` Karthik Nayak
  1 sibling, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-12-01 11:24 UTC (permalink / raw)
  To: git; +Cc: jltobler, gitster, toon, sunshine, Jean-Noël Avila,
	Karthik Nayak

Git allows setting a different object directory via
'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
This asymmetry makes it difficult to test different reference backends
or use alternative reference storage locations without modifying the
repository structure.

Add a new environment variable 'GIT_REF_URI' that specifies both the
reference backend and directory path using a URI format:

    <ref_backend>://<URI-for-resource>

When set, this variable is used to obtain the main reference store for
all Git commands. The variable is checked in `get_main_ref_store()`
when lazily assigning `repo->refs_private`. We cannot initialize this
earlier in `repo_set_gitdir()` because the repository's hash algorithm
isn't known at that point, and the reftable backend requires this
information during initialization.

When used with worktrees, the specified directory is treated as the
reference directory for all worktree operations.

Add a new test file 't1423-ref-backend.sh' to test this environment
variable.

Helped-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git.adoc |   8 ++++
 environment.h          |   1 +
 refs.c                 |  57 ++++++++++++++++++++++-
 t/meson.build          |   1 +
 t/t1423-ref-backend.sh | 121 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.adoc b/Documentation/git.adoc
index ce099e78b8..8c6a3f6042 100644
--- a/Documentation/git.adoc
+++ b/Documentation/git.adoc
@@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value
 	repositories will be set to this value. The default is "files".
 	See `--ref-format` in linkgit:git-init[1].
 
+`GIT_REF_URI`::
+    Specify which reference backend is to be used along with its URI. Reference
+    backends like the files and reftable backends use the $GIT_DIR as their URI.
++
+Expects the format `<ref_backend>://<URI-for-resource>`, where the
+_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_
+specifies the URI used by the backend.
+
 Git Commits
 ~~~~~~~~~~~
 `GIT_AUTHOR_NAME`::
diff --git a/environment.h b/environment.h
index 51898c99cd..9bc380bba4 100644
--- a/environment.h
+++ b/environment.h
@@ -42,6 +42,7 @@
 #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
 #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
 #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
+#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI"
 
 /*
  * Environment variable used to propagate the --no-advice global option to the
diff --git a/refs.c b/refs.c
index 23f46867f2..da76e0c54a 100644
--- a/refs.c
+++ b/refs.c
@@ -2186,15 +2186,70 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r,
 	return maybe_debug_wrap_ref_store(dir, ref_store);
 }
 
+static struct ref_store *get_ref_store_from_uri(struct repository *repo,
+						const char *uri)
+{
+	struct string_list ref_backend_info = STRING_LIST_INIT_DUP;
+	enum ref_storage_format format;
+	struct ref_store *store = NULL;
+	char *format_string;
+	char *dir;
+
+	if (!uri) {
+		error(_("reference backend uri is not provided"));
+		goto cleanup;
+	}
+
+	if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) {
+		error(_("invalid reference backend uri format '%s'"), uri);
+		goto cleanup;
+	}
+
+	format_string = ref_backend_info.items[0].string;
+	if (!starts_with(ref_backend_info.items[1].string, "//")) {
+		error(_("invalid reference backend uri format '%s'"), uri);
+		goto cleanup;
+	}
+	dir = ref_backend_info.items[1].string + 2;
+
+	if (!dir[0]) {
+		error(_("invalid path in uri '%s'"), uri);
+		goto cleanup;
+	}
+
+	format = ref_storage_format_by_name(format_string);
+	if (format == REF_STORAGE_FORMAT_UNKNOWN) {
+		error(_("unknown reference backend '%s'"), format_string);
+		goto cleanup;
+	}
+
+	store = get_ref_store_for_dir(repo, dir, format);
+
+cleanup:
+	string_list_clear(&ref_backend_info, 0);
+	return store;
+}
+
 struct ref_store *get_main_ref_store(struct repository *r)
 {
+	char *ref_uri;
+
 	if (r->refs_private)
 		return r->refs_private;
 
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format);
+	ref_uri = getenv(GIT_REF_URI_ENVIRONMENT);
+	if (ref_uri) {
+		r->refs_private = get_ref_store_from_uri(r, ref_uri);
+		if (!r->refs_private)
+			die("failed to initialize ref store from URI: %s", ref_uri);
+
+	} else {
+		r->refs_private = get_ref_store_for_dir(r, r->gitdir,
+							r->ref_storage_format);
+	}
 	return r->refs_private;
 }
 
diff --git a/t/meson.build b/t/meson.build
index a5531df415..a66f8fafff 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -208,6 +208,7 @@ integration_tests = [
   't1420-lost-found.sh',
   't1421-reflog-write.sh',
   't1422-show-ref-exists.sh',
+  't1423-ref-backend.sh',
   't1430-bad-ref-name.sh',
   't1450-fsck.sh',
   't1451-fsck-buffer.sh',
diff --git a/t/t1423-ref-backend.sh b/t/t1423-ref-backend.sh
new file mode 100755
index 0000000000..f36125bf64
--- /dev/null
+++ b/t/t1423-ref-backend.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description='Test different reference backend URIs'
+
+. ./test-lib.sh
+
+test_expect_success 'empty uri provided' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "invalid reference backend uri format" err
+	)
+'
+
+test_expect_success 'invalid uri provided' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable@/home/reftable" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "invalid reference backend uri format" err
+	)
+'
+
+test_expect_success 'empty path in uri' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable://" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "invalid path in uri" err
+	)
+'
+
+test_expect_success 'uri ends at colon' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="reftable:" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "invalid reference backend uri format" err
+	)
+'
+
+test_expect_success 'unknown reference backend' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	(
+		cd repo &&
+		GIT_REF_URI="db://.git" &&
+		export GIT_REF_URI &&
+		test_must_fail git refs list 2>err &&
+		test_grep "unknown reference backend" err
+	)
+'
+
+ref_formats="files reftable"
+for from_format in $ref_formats
+do
+	for to_format in $ref_formats
+	do
+		if test "$from_format" = "$to_format"
+		then
+			continue
+		fi
+
+		test_expect_success "read from $to_format backend" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			(
+				cd repo &&
+				test_commit 1 &&
+				test_commit 2 &&
+				test_commit 3 &&
+
+				git refs migrate --dry-run --ref-format=$to_format >out &&
+				BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
+				git refs list >expect &&
+				GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >actual &&
+				test_cmp expect actual
+			)
+		'
+
+		test_expect_success "write to $to_format backend" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			(
+				cd repo &&
+				test_commit 1 &&
+				test_commit 2 &&
+				test_commit 3 &&
+
+				git refs migrate --dry-run --ref-format=$to_format >out &&
+				git refs list >expect &&
+
+				BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
+				GIT_REF_URI="$to_format://$BACKEND_PATH" git tag -d 1 &&
+
+				git refs list >actual &&
+				test_cmp expect actual &&
+
+				GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >expect &&
+				git refs list >out &&
+				cat out | grep -v "refs/tags/1" >actual &&
+				test_cmp expect actual
+			)
+		'
+	done
+done
+
+test_done

-- 
2.51.2



* Re: [PATCH 0/2] refs: allow setting the reference directory
  2025-11-23  4:29 ` [PATCH 0/2] refs: allow setting the reference directory Junio C Hamano
@ 2025-12-01 13:19   ` Patrick Steinhardt
  2025-12-02 10:25     ` Junio C Hamano
  2025-12-02 15:29     ` Karthik Nayak
  0 siblings, 2 replies; 33+ messages in thread
From: Patrick Steinhardt @ 2025-12-01 13:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, git

On Sat, Nov 22, 2025 at 08:29:22PM -0800, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
> 
> > While Git allows users to select different reference backends, unlike
> > with objects, there is no flexibility in selecting the reference
> > directory. Currently, the reference format is obtained from the config
> > of the repository and the reference directory is set to the $GIT_DIR.
> 
> I actually am not sure if I like the proposed environment variable.
> 
> The proposal is based on an assumption that any reference backend
> should be able to move their backing store anywhere, and they should
> be able to express the location of their backing store as a single
> string <path>.  For a new backend, "where is your backing store" may
> not even be a question that makes much sense (as "somewhere
> in the cloud that you do not even have to know" is certainly
> possible), and even for a new backend design that does allow such a
> question to have a meaningful answer, this "you have to be able to
> use a random place specified by this environment variable as your
> backing storage" is an additional requirement that its implementors
> may not need to satisfy in order to please their user base.
> 
> For reftable and files backends, these assumptions may be true, but
> then it is not too cumbersome if these stay to be backend specific,
> as there are only two backends.

I think it's a reasonable assumption to make that the path _can_ be
represented as a single string. For now, we don't really require any
configuration for the backend in the first place. So all you need to do
is to say:

    [extensions]
    refStorage = reftable

This implicitly identifies the location of the backend, too, as we
derive it from the commondir/gitdir. As you say that's sufficient for
the "files" and "reftable" backends, but it may be insufficient for
other backends.

Suppose, for example, that we have a Postgres database to store data. It's
clearly not sufficient to specify "extensions.refStorage=postgres", as
that wouldn't give you enough information to also know how to connect to
the database.

It's a problem I have been thinking about quite a lot in the context of
pluggable object databases, as well. Ultimately, the solution I arrived
at is to extend the extension format itself. For pluggable ODBs this
would look like this:

    [extensions]
    objectStorage = postgres://127.0.0.1:5432?database=myrepo

This is similar to a normal URI with a scheme: everything before the
"://" identifies the format that is to be used, and everything after is
then passed as-is to the backend itself. I think this should give us
enough flexibility for any future formats and it is easy enough to
configure. The added benefit is that this can also work in contexts like
the GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES
environment variables, even though their naming is off now.

For the reference storage I think we should be moving in a similar
direction. Sure, for the current formats that we know it's sufficient to
only specify their directory. But I think we should treat the directory
as an opaque string and then let the reference backend handle it, same
as with the proposed format for object databases:

    # A scheme-only variable will be treated as if we specified the
    # common directory.
    [extensions]
    refStorage = reftable

    # It's also possible to explicitly specify a different location for
    # the backend.
    [extensions]
    refStorage = reftable:///foo/bar

    # And same as above, we can also specify non-locations.
    [extensions]
    refStorage = postgres://127.0.0.1:5432?database=myrepo

As said, the important thing here is that the reference backends get the
string after the scheme as an opaque blob that they can self-interpret.
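
To illustrate, a rough standalone sketch of such a split could look like
the following. The struct and function names are made up for this mail and
nothing like them exists in the tree; the only point is that a value
without "://" leaves the data unset (meaning "use the common directory"),
and that anything after "://" is handed over without being interpreted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for a parsed refStorage-style value. */
struct ref_backend_spec {
	char scheme[32];	/* e.g. "reftable" */
	const char *data;	/* opaque backend payload, NULL if scheme-only */
};

/* Returns 0 on success, -1 if the scheme part is empty or too long. */
static int parse_ref_backend_spec(const char *value,
				  struct ref_backend_spec *spec)
{
	const char *sep = strstr(value, "://");
	size_t len = sep ? (size_t)(sep - value) : strlen(value);

	if (!len || len >= sizeof(spec->scheme))
		return -1;
	memcpy(spec->scheme, value, len);
	spec->scheme[len] = '\0';

	/* Everything after "://" is passed to the backend untouched. */
	spec->data = sep ? sep + 3 : NULL;
	return 0;
}
```

A caller would then select the backend by the scheme part and let that
backend decide what a NULL or non-NULL data string means for it.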

> So I dunno.  In addition, if this is designed to help migration
> (which is the impression I am getting from the cover letter
> description), don't you need a way to specify more than one (i.e.,
> source to migrate from and destination to migrate to)?  With a
> single GIT_REF_URI, it would not be obvious what it refers to,
> whether it is an additional place to write to, to read from, or
> something completely unrelated.  For example ...

I think we cannot easily retrofit handling of multiple refdbs into Git
at this point in time anymore. The way to drive this would be that we
have two processes:

  - One `git refs list` process in the repository that uses the old
    format.

  - One `git update-ref --stdin` process in the repository that uses the
    new format specified via GIT_REF_URI.

This allows us to do an online migration of data into a separate ref
store.

> > This patch series adds a new ENV variable 'GIT_REF_URI' which takes the
> > reference backend and path in a URI form:
> >
> >     <reference_backend>://<path>
> >
> > For e.g. 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.
> >
> > One use case for this is migration between different backends. On the
> > server side, migrating from the files backend to the newly introduced
> > reftable backend can be achieved by running 'git refs migrate'. However,
> > for large repositories with millions of references, this migration can
> > take from seconds to minutes.
> >
> > We could make the migration non-blocking by running the migration in the
> > background and capturing and replaying updates to both backends. This
> > would require Git to support writing references to different reference
> > backends and paths.
> 
> ... I am reading that the above is saying that the system will write
> to whatever reference backend is specified in extensions.refStorage,
> plus also where GIT_REF_URI points at, but if that is the way how
> the mechanism works, the variable should be named more specific to
> what it does, no?  It is not just a random "REF URI"; it is an
> additional ref backend that the updates are dumped to.  Maybe there
> would be a different use case where you may want to read from two
> reference backends, and you'd need to specify the secondary one with
> an environment variable, but if the system behaves one specific way
> for GIT_REF_URI (say, all updates are also copied to this additional
> ref backend at the specified ref backing store), a different
> environment variable name needs to be chosen to serve such a
> different use case, no?

Truth be told, I'm not really a huge fan of the name, either. But as
said, I don't think we can easily "overlay" multiple refdbs, as it would
lead to various different questions due to our hierarchical layout of
references.

That being said, I personally would prefer `GIT_REFERENCE_BACKEND` as
variable name that accepts exactly the same kind of strings as the
`extensions.refStorage` values I have proposed above.

Thanks!

Patrick


* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
                     ` (3 preceding siblings ...)
  2025-11-21 13:42   ` Toon Claes
@ 2025-12-01 13:28   ` Patrick Steinhardt
  2025-12-02 22:21     ` Karthik Nayak
  4 siblings, 1 reply; 33+ messages in thread
From: Patrick Steinhardt @ 2025-12-01 13:28 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

On Wed, Nov 19, 2025 at 10:48:53PM +0100, Karthik Nayak wrote:
> Git allows setting a different object directory via
> 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
> This asymmetry makes it difficult to test different reference backends
> or use alternative reference storage locations without modifying the
> repository structure.
> 
> Add a new environment variable 'GIT_REF_URI' that specifies both the
> reference backend and directory path using a URI format:
> 
>     <ref_backend>://<path>
> 
> When set, this variable is used to obtain the main reference store for
> all Git commands. The variable is checked in `get_main_ref_store()`
> when lazily assigning `repo->refs_private`. We cannot initialize this
> earlier in `repo_set_gitdir()` because the repository's hash algorithm
> isn't known at that point, and the reftable backend requires this
> information during initialization.
> 
> When used with worktrees, the specified directory is treated as the
> reference directory for all worktree operations.
> 
> Add a new test file 't1423-ref-backend.sh' to test this environment
> variable.

Based on my reply in <aS2V4TKeS4V_oxAb@pks.im> I wonder whether we want
to take a bit of a different approach:

  - We extend the format understood by "extensions.refStorage" to
    understand "scheme://data"-style strings and pass the "data" part
    through to the reference backend.

  - We then use the same mechanism to parse both "extensions.refStorage"
    and the environment variable.

This would have a couple of advantages:

  - We make the ref storage extension more flexible so that you can move
    your reference backends somewhere else entirely.

  - We prepare for a potential future ref format that _needs_ to receive
    data as input.

  - We have consistent behaviour between the environment variable and
    the extension. So basically, the environment variable starts to
    behave as an override to the extension.
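
The override behaviour in the last point could look roughly like the
sketch below. The helper name `effective_ref_backend()` is made up for
illustration, and falling back to "files" assumes the documented default
ref format. A caller would pass in the result of getenv() and the
configured extension value, and feed the returned string to a single
shared parser.

```c
#include <stddef.h>

/*
 * Pick the backend string to parse: a set environment value wins,
 * then the configured extension value, then the built-in default.
 * An environment value that is set but empty still wins, so that it
 * is reported as an error by the parser instead of silently ignored.
 */
static const char *effective_ref_backend(const char *env_value,
					 const char *config_value)
{
	if (env_value)
		return env_value;
	if (config_value)
		return config_value;
	return "files";
}
```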

One issue that we'd then have to solve is how to derive the worktree
references from the backend. Arguably though, I think that the extension
that was specified should also be sufficient to identify the location of
the worktree references.

We'd have to refactor the code base a bit though to properly reflect
that in our tree. One way to do this is to extend `ref_store_init()` so
that it receives the worktree (or NULL) as input. In that case, we would
continue to pass the combination of format and "data" to the init
function, and it would then know to locate the worktree references
itself.

What do you think?

Thanks!

Patrick


* Re: [PATCH 0/2] refs: allow setting the reference directory
  2025-12-01 13:19   ` Patrick Steinhardt
@ 2025-12-02 10:25     ` Junio C Hamano
  2025-12-02 15:29     ` Karthik Nayak
  1 sibling, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-12-02 10:25 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git

Patrick Steinhardt <ps@pks.im> writes:

> For the reference storage I think we should be moving in a similar
> direction. Sure, for the current formats that we know it's sufficient to
> only specify their directory. But I think we should treat the directory
> as an opaque string and then let the reference backend handle it, same
> as with the proposed format for object databases:
>
>     # A scheme-only variable will be treated as if we specified the
>     # common directory.
>     [extensions]
>     refStorage = reftable
>
>     # It's also possible to explicitly specify a different location for
>     # the backend.
>     [extensions]
>     refStorage = reftable:///foo/bar
>
>     # And same as above, we can also specify non-locations.
>     [extensions]
>     refStorage = postgres://127.0.0.1:5432?database=myrepo

Cute.  I kinda like it ;-)




* Re: [PATCH 0/2] refs: allow setting the reference directory
  2025-12-01 13:19   ` Patrick Steinhardt
  2025-12-02 10:25     ` Junio C Hamano
@ 2025-12-02 15:29     ` Karthik Nayak
  1 sibling, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-12-02 15:29 UTC (permalink / raw)
  To: Patrick Steinhardt, Junio C Hamano; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> On Sat, Nov 22, 2025 at 08:29:22PM -0800, Junio C Hamano wrote:
>> Karthik Nayak <karthik.188@gmail.com> writes:
>>
>> > While Git allows users to select different reference backends, unlike
>> > with objects, there is no flexibility in selecting the reference
>> > directory. Currently, the reference format is obtained from the config
>> > of the repository and the reference directory is set to the $GIT_DIR.
>>
>> I actually am not sure if I like the proposed environment variable.
>>
>> The proposal is based on an assumption that any reference backend
>> should be able to move their backing store anywhere, and they should
>> be able to express the location of their backing store as a single
>> string <path>.  For a new backend, "where is your backing store" may
>> not even be a question that makes much sense (as "somewhere
>> in the cloud that you do not even have to know" is certainly
>> possible), and even for a new backend design that does allow such a
>> question to have a meaningful answer, this "you have to be able to
>> use a random place specified by this environment variable as your
>> backing storage" is an additional requirement that its implementors
>> may not need to satisfy in order to please their user base.
>>
>> For reftable and files backends, these assumptions may be true, but
>> then it is not too cumbersome if these stay to be backend specific,
>> as there are only two backends.
>
> I think it's a reasonable assumption to make that the path _can_ be
> represented as a single string. For now, we don't really require any
> configuration for the backend in the first place. So all you need to do
> is to say:
>
>     [extensions]
>     refStorage = reftable
>
> This implicitly identifies the location of the backend, too, as we
> derive it from the commondir/gitdir. As you say that's sufficient for
> the "files" and "reftable" backends, but it may be insufficient for
> other backends.
>
> Suppose that we for example have a Postgres database to store data. It's
> clearly not sufficient to specify "extensions.refStorage=postgres", as
> that wouldn't give you enough information to also know how to connect to
> the database.
>
> It's a problem I have been thinking about quite a lot in the context of
> pluggable object databases, as well. Ultimately, the solution I arrived
> at is to extend the extension format itself. For pluggable ODBs this
> would look like this:
>
>     [extensions]
>     objectStorage = postgres://127.0.0.1:5432?database=myrepo
>
> This is similar to a normal URI with a schema: everything before the
> "://" identifies the format that is to be used, and everything after is
> then passed as-is to the backend itself. I think this should give us
> enough flexibility for any future formats and it is easy enough to
> configure. The added benefit is that this can also work in contexts like
> the GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES
> environment variables, even though their naming is off now.
>
> For the reference storage I think we should be moving into a similar
> direction. Sure, for the current formats that we know it's sufficient to
> only specify their directory. But I think we should treat the directory
> as an opaque string and then let the reference backend handle it, same
> as with the proposed format for object databases:
>
>     # A schema-only variable will be treated as if we specified the
>     # common directory.
>     [extensions]
>     refStorage = reftable
>
>     # It's also possible to explicitly specify a different location for
>     # the backend.
>     [extensions]
>     refStorage = reftable:///foo/bar
>
>     # And same as above, we can also specify non-locations.
>     [extensions]
>     refStorage = postgres://127.0.0.1:5432?database=myrepo
>
> As said, the important thing here is that the reference backends get the
> string after the schema as opaque blobs that they can self-interpret.
>

I think you bring up some good points here. I hadn't thought of
`extensions.refStorage`, and I think we can extend it as you described
while staying backwards compatible.
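
To make the "everything before `://` is the format, everything after is
opaque backend data" rule concrete, here is a minimal shell sketch. The
function name `parse_ref_storage` and its output format are hypothetical
and only illustrate the split; this is not how Git parses the value.

```shell
# Hypothetical helper: split a refStorage-style value into the backend
# name and the opaque data that would be handed to that backend.
# A value without "://" is treated as a backend name with empty data.
parse_ref_storage () {
	prs_value=$1
	case "$prs_value" in
	*://*)
		prs_backend=${prs_value%%://*}	# text before the first "://"
		prs_data=${prs_value#*://}	# text after the first "://"
		;;
	*)
		prs_backend=$prs_value
		prs_data=
		;;
	esac
	printf 'backend=%s data=%s\n' "$prs_backend" "$prs_data"
}

parse_ref_storage 'reftable'
parse_ref_storage 'reftable:///foo/bar'
parse_ref_storage 'postgres://127.0.0.1:5432?database=myrepo'
```

The data part is deliberately not interpreted here: whether it is a
directory or a connection string would be for the backend to decide.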

>> So I dunno.  In addition, if this is designed to help migration
>> (which is the impression I am getting from the cover letter
>> description), don't you need a way to specify more than one (i.e.,
>> source to migrate from and destination to migrate to)?  With a
>> single GIT_REF_URI, it would not be obvious what it refers to,
>> whether it is an additional place to write to, to read from, or
>> something completely unrelated.  For example ...
>
> I think we cannot easily retrofit handling of multiple refdbs into Git
> at this point in time anymore. The way to drive this would be that we
> have two processes:
>
>   - One `git refs list` process in the repository that uses the old
>     format.
>
>   - One `git update-ref --stdin` process in the repository that uses the
>     new format specified via GIT_REF_URI.
>
> This allows us to do an online migration of data into a separate ref
> store.
>

That's exactly the mechanism I was talking about, thanks for explaining.
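
A rough shell sketch of that two-process setup, under stated
assumptions: `refs_to_create_commands` is a hypothetical helper, the
target path is a placeholder, and `GIT_REF_URI` only exists with this
series applied. It handles plain refs only (no symrefs) and does a
one-shot copy; it does not capture or replay concurrent updates.

```shell
# Hypothetical helper: turn "<oid> <refname>" lines on stdin into
# "create" commands understood by `git update-ref --stdin`.
refs_to_create_commands () {
	while read -r oid refname
	do
		printf 'create %s %s\n' "$refname" "$oid"
	done
}

# Intended use (not run here; GIT_REF_URI is the variable proposed in
# this series and /path/to/new-refs is a placeholder):
#
#   git for-each-ref --format='%(objectname) %(refname)' |
#   refs_to_create_commands |
#   GIT_REF_URI='reftable:///path/to/new-refs' git update-ref --stdin
```

The reading side runs against the repository's configured backend, and
only the writing process sees the override, which is why a single
variable is enough for this flow.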

>> > This patch series adds a new ENV variable 'GIT_REF_URI' which takes the
>> > reference backend and path in a URI form:
>> >
>> >     <reference_backend>://<path>
>> >
>> > For example, 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.
>> >
>> > One use case for this is migration between different backends. On the
>> > server side, migrating from the files backend to the newly introduced
>> > reftable backend can be achieved by running 'git refs migrate'. However,
>> > for large repositories with millions of references, this migration can
>> > take from seconds to minutes.
>> >
>> > We could make the migration non-blocking by running the migration in the
>> > background and capturing and replaying updates to both backends. This
>> > would require Git to support writing references to different reference
>> > backends and paths.
>>
>> ... I am reading that the above is saying that the system will write
>> to whatever reference backend specified in the extensions.refStorage,
>> plus also where GIT_REF_URI points at, but if that is the way how
>> the mechanism works, the variable should be named more specific to
>> what it does, no?  It is not just a random "REF URI"; it is an
>> additional ref backend that the updates are dumped to.  Maybe there
>> would be a different use case where you may want to read from two
>> reference backends, and you'd need to specify the secondary one with
>> an environment variable, but if the system behaves one specific way
>> for GIT_REF_URI (say, all updates are also copied to this additional
>> ref backend at the specified ref backing store), a different
>> environment variable name needs to be chosen to serve such a
>> different use case, no?
>
> Truth be told, I'm not really a huge fan of the name, either. But as
> said, I don't think we can easily "overlay" multiple refdbs, as it would
> lead to various different questions due to our hierarchical layout of
> references.
>
> That being said, I personally would prefer `GIT_REFERENCE_BACKEND` as
> variable name that accepts exactly the same kind of strings as the
> `extensions.refStorage` values I have proposed above.
>

Fair enough. Once both the env variable and `extensions.refStorage` take
the same input, it does make sense to rename the env variable to
`GIT_REFERENCE_BACKEND`.

> Thanks!
>
> Patrick

Thanks for your input. I'll make the necessary changes for v4 :)


* Re: [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory
  2025-12-01 13:28   ` Patrick Steinhardt
@ 2025-12-02 22:21     ` Karthik Nayak
  0 siblings, 0 replies; 33+ messages in thread
From: Karthik Nayak @ 2025-12-02 22:21 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Nov 19, 2025 at 10:48:53PM +0100, Karthik Nayak wrote:
>> Git allows setting a different object directory via
>> 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references.
>> This asymmetry makes it difficult to test different reference backends
>> or use alternative reference storage locations without modifying the
>> repository structure.
>>
>> Add a new environment variable 'GIT_REF_URI' that specifies both the
>> reference backend and directory path using a URI format:
>>
>>     <ref_backend>://<path>
>>
>> When set, this variable is used to obtain the main reference store for
>> all Git commands. The variable is checked in `get_main_ref_store()`
>> when lazily assigning `repo->refs_private`. We cannot initialize this
>> earlier in `repo_set_gitdir()` because the repository's hash algorithm
>> isn't known at that point, and the reftable backend requires this
>> information during initialization.
>>
>> When used with worktrees, the specified directory is treated as the
>> reference directory for all worktree operations.
>>
>> Add a new test file 't1423-ref-backend.sh' to test this environment
>> variable.
>
> Based on my reply in <aS2V4TKeS4V_oxAb@pks.im> I wonder whether we want
> to take a bit of a different approach:
>
>   - We extend the format understood by "extensions.refStorage" to
>     understand "schema://data"-style strings and adapt the "data" part
>     to be passed through to the reference backend.
>
>   - We then use the same mechanism to parse both "extensions.refStorage"
>     and the environment variable.
>
> This would have a couple advantages:
>
>   - We make the ref storage extension more flexible so that you can move
>     your reference backends somewhere else entirely.
>
>   - We prepare for a potential future ref format that _needs_ to receive
>     data as input.
>
>   - We have consistent behaviour between the environment variable and
>     the extension. So basically, the environment variable starts to
>     behave as an override to the extension.
>

I did read and respond to your reply there, and I agree with your
suggested approach. An additional advantage is that the ENV variable
becomes more deeply integrated: the backend override it adds would also
show up when running `git repo info`.
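
The "environment variable overrides the extension" behaviour can be
sketched as a simple precedence rule. The function name
`effective_ref_storage` is hypothetical; the final fallback to "files"
reflects the default backend used when `extensions.refStorage` is unset.

```shell
# Hypothetical helper: pick the effective ref storage value.
#   $1: value of the environment variable (may be empty)
#   $2: value of extensions.refStorage from the config (may be empty)
effective_ref_storage () {
	if test -n "$1"
	then
		printf '%s\n' "$1"	# environment variable wins
	elif test -n "$2"
	then
		printf '%s\n' "$2"	# otherwise use the configured extension
	else
		printf 'files\n'	# neither set: default backend
	fi
}

# Possible call site (GIT_REF_URI is the variable proposed in this series):
#   effective_ref_storage "${GIT_REF_URI-}" \
#       "$(git config --get extensions.refStorage)"
```

Because both inputs would use the same `schema://data` syntax, the
result could go through one parser regardless of where it came from.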

> One issue that we'd then have to solve is how to derive the worktree
> references from the backend. Arguably though, I think that the extension
> that was specified should also be sufficient to identify the location of
> the worktree references.
>
> We'd have to refactor the code base a bit though to properly reflect
> that in our tree. One way to do this is to extend `ref_store_init()` so
> that it receives the worktree (or NULL) as input. In that case, we would
> continue to pass the combination of format and "data" to the init
> function, and it would then know to locate the worktree references
> itself.
>

Yeah, I'm considering adding this information to the `repository`
structure, so along with `ref_storage_format`, it would also contain a
`ref_storage_data` which would be passed down to `get_main_ref_store()`
which would in turn call `ref_store_init()`.

In that sense, when in a worktree the $GIT_DIR is set appropriately and
this should all work accordingly.

> What do you think?
>
> Thanks!
>
> Patrick

Sounds great. Thanks for the input


Thread overview: 33+ messages
2025-11-19 21:48 [PATCH 0/2] refs: allow setting the reference directory Karthik Nayak
2025-11-19 21:48 ` [PATCH 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
2025-11-20 19:05   ` Justin Tobler
2025-11-21 11:18     ` Karthik Nayak
2025-11-19 21:48 ` [PATCH 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
2025-11-19 22:13   ` Eric Sunshine
2025-11-19 23:01     ` Karthik Nayak
2025-11-20 10:00   ` Jean-Noël Avila
2025-11-21 11:21     ` Karthik Nayak
2025-11-20 19:38   ` Justin Tobler
2025-11-24 13:23     ` Karthik Nayak
2025-11-21 13:42   ` Toon Claes
2025-11-21 16:07     ` Junio C Hamano
2025-11-24 13:25       ` Karthik Nayak
2025-11-26 13:11         ` Toon Claes
2025-11-24 13:26     ` Karthik Nayak
2025-12-01 13:28   ` Patrick Steinhardt
2025-12-02 22:21     ` Karthik Nayak
2025-11-23  4:29 ` [PATCH 0/2] refs: allow setting the reference directory Junio C Hamano
2025-12-01 13:19   ` Patrick Steinhardt
2025-12-02 10:25     ` Junio C Hamano
2025-12-02 15:29     ` Karthik Nayak
2025-11-26 11:11 ` [PATCH v2 " Karthik Nayak
2025-11-26 11:12   ` [PATCH v2 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
2025-11-26 15:16     ` Junio C Hamano
2025-11-26 11:12   ` [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
2025-11-26 16:17     ` Junio C Hamano
2025-11-27 14:52       ` Karthik Nayak
2025-11-27 20:02         ` Junio C Hamano
2025-11-27 21:45           ` Karthik Nayak
2025-12-01 11:24 ` [PATCH v3 0/2] refs: allow setting the reference directory Karthik Nayak
2025-12-01 11:24   ` [PATCH v3 1/2] refs: support obtaining ref_store for given dir Karthik Nayak
2025-12-01 11:24   ` [PATCH v3 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
