git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Karthik Nayak <karthik.188@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH 0/2] refs: allow setting the reference directory
Date: Mon, 1 Dec 2025 14:19:29 +0100
Message-ID: <aS2V4TKeS4V_oxAb@pks.im>
In-Reply-To: <xmqq34651ie5.fsf@gitster.g>

On Sat, Nov 22, 2025 at 08:29:22PM -0800, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
> 
> > While Git allows users to select different reference backends, unlike
> > with objects, there is no flexibility in selecting the reference
> > directory. Currently, the reference format is obtained from the config
> > of the repository and the reference directory is set to the $GIT_DIR.
> 
> I actually am not sure if I like the proposed environment variable.
> 
> The proposal is based on an assumption that any reference backend
> should be able to move their backing store anywhere, and they should
> be able to express the location of their backing store as a single
> string <path>.  For a new backend, "where is your backing store" may
> not even be a question that makes much sense (as "somewhere
> in the cloud that you do not even have to know" is certainly
> possible), and even for a new backend design that does allow such a
> question to have a meaningful answer, this "you have to be able to
> use a random place specified by this environment variable as your
> backing storage" is an additional requirement that its implementors
> may not need to satisfy in order to please their user base.
> 
> For reftable and files backends, these assumptions may be true, but
> then it is not too cumbersome if these stay backend-specific,
> as there are only two backends.

I think it's a reasonable assumption to make that the path _can_ be
represented as a single string. For now, we don't really require any
configuration for the backend in the first place. So all you need to do
is to say:

    [extensions]
    refStorage = reftable

This implicitly identifies the location of the backend, too, as we
derive it from the commondir/gitdir. As you say, that's sufficient for
the "files" and "reftable" backends, but it may be insufficient for
other backends.

Suppose, for example, that we have a Postgres database to store data.
It's clearly not sufficient to specify "extensions.refStorage=postgres",
as that wouldn't give you enough information to also know how to connect
to the database.

It's a problem I have been thinking about quite a lot in the context of
pluggable object databases, as well. Ultimately, the solution I arrived
at is to extend the extension format itself. For pluggable ODBs this
would look like this:

    [extensions]
    objectStorage = postgres://127.0.0.1:5432?database=myrepo

This is similar to a normal URI with a scheme: everything before the
"://" identifies the format to be used, and everything after it is
passed as-is to the backend itself. I think this should give us
enough flexibility for any future formats while remaining easy to
configure. An added benefit is that the same syntax could also be used
in contexts like the GIT_OBJECT_DIRECTORY and
GIT_ALTERNATE_OBJECT_DIRECTORIES environment variables, even though
their names would then be somewhat misleading.
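To make the split concrete, here is a minimal C sketch of how such a value could be parsed. The `storage_spec` struct and the `parse_storage_spec()`/`free_storage_spec()` helpers are hypothetical names for illustration, not existing Git functions; the sketch assumes the value is split at the first "://" and that a value without "://" names only a format.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch (not Git's actual code): split a storage value such
 * as "postgres://127.0.0.1:5432?database=myrepo" at the first "://".
 * The part before it names the format; everything after it is kept
 * verbatim as an opaque payload for the backend to interpret.
 */
struct storage_spec {
	char *format;  /* e.g. "reftable" or "postgres" */
	char *payload; /* opaque string; empty if no "://" was given */
};

/* Copy len bytes of s into a freshly allocated, NUL-terminated string. */
static char *dup_range(const char *s, size_t len)
{
	char *p = malloc(len + 1);

	if (p) {
		memcpy(p, s, len);
		p[len] = '\0';
	}
	return p;
}

/* Returns 0 on success, -1 on an empty format name or allocation failure. */
static int parse_storage_spec(const char *value, struct storage_spec *out)
{
	const char *sep = strstr(value, "://");
	size_t format_len = sep ? (size_t)(sep - value) : strlen(value);
	const char *payload = sep ? sep + 3 : "";

	out->format = NULL;
	out->payload = NULL;
	if (!format_len)
		return -1; /* a format name is mandatory */

	out->format = dup_range(value, format_len);
	out->payload = dup_range(payload, strlen(payload));
	if (!out->format || !out->payload) {
		free(out->format);
		free(out->payload);
		out->format = out->payload = NULL;
		return -1;
	}
	return 0;
}

static void free_storage_spec(struct storage_spec *spec)
{
	free(spec->format);
	free(spec->payload);
	spec->format = spec->payload = NULL;
}
```

With this sketch, "reftable:///foo/bar" yields the format "reftable" and the payload "/foo/bar", while a plain "reftable" yields an empty payload. Splitting only at the first "://" leaves anything later in the payload untouched, which is what treating it as opaque requires.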

For the reference storage I think we should be moving in a similar
direction. Sure, for the formats we know today it's sufficient to
only specify their directory. But I think we should treat the directory
as an opaque string and then let the reference backend handle it, same
as with the proposed format for object databases:

    # A scheme-only value will be treated as if we had specified the
    # common directory.
    [extensions]
    refStorage = reftable

    # It's also possible to explicitly specify a different location for
    # the backend.
    [extensions]
    refStorage = reftable:///foo/bar

    # And with the same syntax, we can also specify values that are
    # not locations at all.
    [extensions]
    refStorage = postgres://127.0.0.1:5432?database=myrepo

As said, the important thing here is that the reference backends get
the string after the scheme as an opaque blob that they interpret
themselves.
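The dispatch side could look roughly like the following sketch. It assumes the value has already been split into a format name and a payload; `open_ref_backend()`, the backend table and the "open" functions are hypothetical and only produce a description string instead of opening a real store. Directory-based backends fall back to the common directory when the payload is empty, as in the first example above, while a database-style backend rejects an empty payload because it has no sensible default.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: dispatch on the format name and let each backend
 * interpret the opaque payload itself. The "open" functions only format a
 * description of what they would open; none of these names are part of
 * Git's real ref-backend API.
 */
typedef int (*backend_open_fn)(const char *payload, const char *commondir,
			       char *buf, size_t bufsz);

/* files/reftable: the payload is a path; empty means the common dir. */
static int open_dir_backend(const char *payload, const char *commondir,
			    char *buf, size_t bufsz)
{
	const char *dir = *payload ? payload : commondir;
	int n = snprintf(buf, bufsz, "directory %s", dir);

	return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}

/* A database backend has no sensible default, so it needs a payload. */
static int open_postgres_backend(const char *payload, const char *commondir,
				 char *buf, size_t bufsz)
{
	int n;

	(void)commondir; /* unused: the location lives entirely in the payload */
	if (!*payload)
		return -1;
	n = snprintf(buf, bufsz, "connection %s", payload);
	return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}

static const struct {
	const char *format;
	backend_open_fn open;
} backends[] = {
	{ "files", open_dir_backend },
	{ "reftable", open_dir_backend },
	{ "postgres", open_postgres_backend },
};

/* Returns -1 for unknown formats or when the backend rejects the payload. */
static int open_ref_backend(const char *format, const char *payload,
			    const char *commondir, char *buf, size_t bufsz)
{
	for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
		if (!strcmp(backends[i].format, format))
			return backends[i].open(payload, commondir, buf, bufsz);
	return -1;
}
```

Keeping payload interpretation inside each backend means the generic code never has to know whether a payload is a path or a connection string.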

> So I dunno.  In addition, if this is designed to help migration
> (which is the impression I am getting from the cover letter
> description), don't you need a way to specify more than one (i.e.,
> source to migrate from and destination to migrate to)?  With a
> single GIT_REF_URI, it would not be obvious what it refers to,
> whether it is an additional place to write to, to read from, or
> something completely unrelated.  For example ...

I don't think we can easily retrofit handling of multiple refdbs into
Git at this point anymore. Instead, the way to drive a migration would
be to run two processes:

  - One `git refs list` process in the repository that uses the old
    format.

  - One `git update-ref --stdin` process in the repository that uses the
    new format specified via GIT_REF_URI.

This allows us to do an online migration of data into a separate ref
store.
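As an illustration of how the two processes could be glued together, the following sketch converts lines in the default `git for-each-ref` output format ("<oid> SP <type> TAB <refname>") into "create SP <ref> SP <new-oid> LF" commands for `git update-ref --stdin`. It assumes `git refs list` emits the same default format, that the target store starts out empty (so "create" is appropriate), and that lines fit into 4 KiB. Symbolic refs such as HEAD, and updates that land while the copy runs, are not handled. `to_update_ref_cmd()` and `convert_listing()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert one line in the default `git for-each-ref` output format,
 * "<oid> SP <type> TAB <refname>", into a `git update-ref --stdin`
 * command of the form "create SP <ref> SP <new-oid> LF".
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 */
static int to_update_ref_cmd(const char *line, char *out, size_t outsz)
{
	const char *sp = strchr(line, ' ');
	const char *tab = strchr(line, '\t');
	size_t oid_len, ref_len;
	int n;

	if (!sp || !tab || tab < sp)
		return -1;
	oid_len = (size_t)(sp - line);
	ref_len = strcspn(tab + 1, "\r\n"); /* drop the trailing newline */
	if (!oid_len || !ref_len)
		return -1;
	n = snprintf(out, outsz, "create %.*s %.*s\n",
		     (int)ref_len, tab + 1, (int)oid_len, line);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Filter a whole listing; stops at the first malformed line. */
static int convert_listing(FILE *in, FILE *out)
{
	char line[4096], cmd[4200];

	while (fgets(line, sizeof(line), in)) {
		if (to_update_ref_cmd(line, cmd, sizeof(cmd)) < 0)
			return -1;
		fputs(cmd, out);
	}
	return ferror(in) ? -1 : 0;
}
```

Wrapped in a small filter program that calls `convert_listing(stdin, stdout)`, it might be used as `git refs list | ./convert | GIT_REF_URI=reftable:///path/to/new git update-ref --stdin`, where GIT_REF_URI is the variable proposed in this series and is not available in released Git.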

> > This patch series adds a new ENV variable 'GIT_REF_URI' which takes the
> > reference backend and path in a URI form:
> >
> >     <reference_backend>://<path>
> >
> > E.g. 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.
> >
> > One use case for this is migration between different backends. On the
> > server side, migrating from the files backend to the newly introduced
> > reftable backend can be achieved by running 'git refs migrate'. However,
> > for large repositories with millions of references, this migration can
> > take from seconds to minutes.
> >
> > We could make the migration non-blocking by running the migration in the
> > background and capturing and replaying updates to both backends. This
> > would require Git to support writing references to different reference
> > backends and paths.
> 
> ... I am reading that the above is saying that the system will write
> to whatever reference backend is specified in the extensions.refStorage,
> plus also where GIT_REF_URI points at, but if that is the way how
> the mechanism works, the variable should be named more specific to
> what it does, no?  It is not just a random "REF URI"; it is an
> additional ref backend that the updates are dumped to.  Maybe there
> would be a different use case where you may want to read from two
> reference backends, and you'd need to specify the secondary one with
> an environment variable, but if the system behaves one specific way
> for GIT_REF_URI (say, all updates are also copied to this additional
> ref backend at the specified ref backing store), a different
> environment variable name needs to be chosen to serve such a
> different use case, no?

Truth be told, I'm not really a huge fan of the name, either. But as
said, I don't think we can easily "overlay" multiple refdbs, as that
would raise all kinds of questions due to our hierarchical layout of
references.

That being said, I personally would prefer `GIT_REFERENCE_BACKEND` as
the variable name, accepting exactly the same kind of strings as the
`extensions.refStorage` values I have proposed above.
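If `GIT_REFERENCE_BACKEND` accepted the same strings as `extensions.refStorage`, resolving the effective value could be as simple as the sketch below, after which a single parser handles whichever value wins. The precedence (environment over config) is an assumption, not something settled in this thread, and `effective_ref_storage()` is a hypothetical helper; the fallback to "files" reflects Git's current default reference format.

```c
#include <stdlib.h>

/*
 * Hypothetical sketch: GIT_REFERENCE_BACKEND, when set and non-empty,
 * takes precedence over the extensions.refStorage value. Both hold the
 * same "<format>[://<payload>]" kind of string, so the result can be fed
 * to one shared parser. Falling back to "files" mirrors Git's current
 * default ref format.
 */
static const char *effective_ref_storage(const char *config_value)
{
	const char *env = getenv("GIT_REFERENCE_BACKEND");

	if (env && *env)
		return env;          /* environment overrides the config */
	if (config_value && *config_value)
		return config_value; /* value of extensions.refStorage */
	return "files";              /* nothing configured at all */
}
```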

Thanks!

Patrick

