Re: [PATCH] refs: support migration with worktrees

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Patrick Steinhardt <ps@pks.im>
To: Sam Bostock via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Sam Bostock <sam.bostock@shopify.com>
Subject: Re: [PATCH] refs: support migration with worktrees
Date: Tue, 28 Oct 2025 08:28:08 +0100	[thread overview]
Message-ID: <aQBwiE-bhqcaSHG_@pks.im> (raw)
In-Reply-To: <pull.2077.git.git.1761589580028.gitgitgadget@gmail.com>

On Mon, Oct 27, 2025 at 06:26:20PM +0000, Sam Bostock via GitGitGadget wrote:
> From: Sam Bostock <sam.bostock@shopify.com>
> 
> Remove the worktree limitation from `git refs migrate` by implementing
> migration support for repositories with linked worktrees.
> 
> Previously, attempting to migrate a repository with worktrees would fail
> with "migrating repositories with worktrees is not supported yet". This
> limitation existed because each worktree has its own ref storage that
> needed to be migrated separately.
> 
> Migration now uses a multi-phase approach to safely handle multiple
> worktrees:
> 
> 1. Phase 1: Iterate through all worktrees and create temporary new ref
>    storage for each in a staging directory.
> 
> 2. Phase 2: For each worktree, backup the existing ref storage, then
>    move the new storage into place.
> 
> 3. Phase 3: Update the repository format config, clear cached ref stores,
>    and delete all backups. On failure, restore from backups and report
>    where the migrated refs can be found for manual recovery.

Makes sense.

> This approach ensures that if migration fails partway through, the
> repository can be restored to its original state.
> 
> Key implementation details:
> 
> - For files backend: Create a commondir file in temp directories for
>   linked worktrees so the files backend knows where the common git
>   directory is located.

Hm, okay. Not yet sure why we need this, but let's read on.

> - For linked worktrees: Use non-INITIAL transactions to avoid creating
>   packed-refs files (linked worktrees should never have packed-refs).

Hm, this is unfortunate. The reason why we use initial transactions is
twofold:

  - First, we want to avoid creating loose refs, only. This is indeed
    something we must not do with worktrees, as you point out.

  - But second, we also want to skip pointless checks like the conflict
    checks. This results in quite a saving.

Would've been great to retain the second property, but I guess as long
as we only do this for worktrees it's okayish and something we can worry
about at a later point in time. Better to migrate the refs slowish than
not at all.

> - Filter refs during iteration: Linked worktrees only migrate their
>   per-worktree refs (refs/bisect/*, refs/rewritten/*, refs/worktree/*).
>   Shared refs are migrated once in the main worktree.

Makes sense.

> - Write per-worktree refs as loose files: The files backend's
>   transaction_finish_initial() optimization writes most refs to
>   packed-refs, but per-worktree refs must be stored as loose files
>   to maintain proper worktree isolation.

Isn't this roughly the same as the second bullet point? Feels like they
should be merged together.

> diff --git a/Documentation/git-refs.adoc b/Documentation/git-refs.adoc
> index fa33680cc7..f6a3bf4f03 100644
> --- a/Documentation/git-refs.adoc
> +++ b/Documentation/git-refs.adoc
> @@ -30,7 +30,8 @@ COMMANDS
>  --------
>  
>  `migrate`::
> -	Migrate ref store between different formats.
> +	Migrate ref store between different formats. Supports repositories
> +	with worktrees; migration must be run from the main worktree.

It feels a bit weird to single our worktrees specifically. We don't say
that the tool supports bare and non-bare repositories, either, so the
only reason why we'd have the note about worktrees is historic legacy.
How about this instead:

    Migrate ref storage between different formats. Must be run from the
    main worktree in case the repository uses worktrees.

> @@ -95,7 +96,7 @@ KNOWN LIMITATIONS
>  
>  The ref format migration has several known limitations in its current form:
>  
> -* It is not possible to migrate repositories that have worktrees.
> +* Migration must be run from the main worktree.
>  

I'd drop this bullet point entirely, as I don't really see this as a
limitation anymore.

> diff --git a/refs.c b/refs.c
> index 965381367e..0834329e5a 100644
> --- a/refs.c
> +++ b/refs.c

I'm sorry, but this is _extremely_ hard to review as we're changing
almost all of the implementation at once with random changes left and
right. Furthermore, I feel like we're getting way to intimate with the
different backends here -- that shouldn't be the case though, the logic
that is specific to the backends should really live in the backends
themselves.

The way I'd expect a series like this to look like is to have commits
that:

  1. Do preparatory changes, e.g. teach the files backend to not create
     a packed-refs file for worktrees during migration.

  2. Pull out the logic to migrate a single reference backend that we
     already have into a separate function that can be called in a loop.
     The end result should be a function that accepts the old refdb as
     input and that returns the new refdb.

  3. Implement the logic that calls the function we introduced in (2)
     for each worktree. This can be done by iterating through all the
     worktrees, calling `get_worktree_ref_store()` on it and then
     passing the refdb to the new function.

> @@ -2974,8 +2976,296 @@ struct migration_data {
[snip]
> +/*
> + * Create a commondir file in the temporary migration directory for a linked
> + * worktree. The files backend needs this to locate the common git directory.
> + * Returns 0 on success, -1 on failure.
> + */
> +static int create_commondir_file(const char *new_dir, const char *worktree_path,
> +				  struct strbuf *errbuf)
> +{

I still don't get why we need this. We should have access to both the
ref store of the worktree and the repository, and both of these are
handled in the same process. So there shouldn't be a need to propagate
the commondir via a file.

[snip]
> +/*
> + * Returns the list of ref storage items to backup/restore for a worktree.
> + * Main worktrees include packed-refs, linked worktrees do not.
> + */
> +static const char **get_ref_storage_items(int is_main_worktree)
> +{
> +	static const char *main_items[] = {"refs", "logs", "reftable", "packed-refs", NULL};
> +	static const char *linked_items[] = {"refs", "logs", "reftable", NULL};
> +
> +	return is_main_worktree ? main_items : linked_items;
> +}

This here is what I was referring to as "too intimate with the
backends". This logic should be entirely self-contained in the backends,
and it already is for migrating the main worktree. We have
`ref_store_remove_on_disk()` to prune old data, and as the new ref
storage is written into a temporary directory we don't need to enumerate
its contents, but can instead move all of its entries into the gitdir
directly.

I guess the reason why you have this and all of the following functions
is to create the backups. But that logic must not live in "refs.c", but
it really should live in the backends. I could for example see a new
function that moves a ref store to a different directory.

An alternative would be to not do the backups at all. We only start
doing "destructive" operations when all the new backends have already
been created, so the last step would be to rename everything into place.
If this operation fails or gets cancelled we are left with a broken
repository, true. But the data is not lost, as we can in theory continue
to rename the remaining data into place.

So maybe that's good enough? The user would have to manually restore in
either of the cases, so we don't really gain that much by having a
backup in the first place.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8d7007f4aa..6be56274d3 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3210,11 +3210,13 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		string_list_append(&refnames_to_check, update->refname);
>  
>  		/*
> -		 * packed-refs don't support symbolic refs, root refs and reflogs,
> -		 * so we have to queue these references via the loose transaction.
> +		 * packed-refs don't support symbolic refs, root refs, per-worktree
> +		 * refs, and reflogs, so we have to queue these references via the
> +		 * loose transaction.
>  		 */
>  		if (update->new_target ||
>  		    is_root_ref(update->refname) ||
> +		    is_per_worktree_ref(update->refname) ||
>  		    (update->flags & REF_LOG_ONLY)) {
>  			if (!loose_transaction) {
>  				loose_transaction = ref_store_transaction_begin(&refs->base, 0, err);

This is one of these changes where I think it would make sense to split
them out into separate commits so that they can be properly singled out
and explained.

Patrick

next prev parent reply	other threads:[~2025-10-28  7:28 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-27 18:26 [PATCH] refs: support migration with worktrees Sam Bostock via GitGitGadget
2025-10-28  7:28 ` Patrick Steinhardt [this message]
2025-10-28 16:00   ` Junio C Hamano
2025-10-29 10:10     ` Patrick Steinhardt
2025-10-29 11:33       ` Kristoffer Haugsbakk
2025-10-29 16:22         ` Ben Knoble
2025-10-30  6:37         ` Patrick Steinhardt
2025-10-29 14:47       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQBwiE-bhqcaSHG_@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=sam.bostock@shopify.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).