From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org
Subject: Re: Pathological performance with git remote rename and many tracking refs
Date: Thu, 14 Apr 2022 09:12:20 +0200
Message-ID: <220414.8635igdtfn.gmgdl@evledraar.gmail.com>
In-Reply-To: <YldPmUskbU+bOU2n@camp.crustytoothpaste.net>
On Wed, Apr 13 2022, brian m. carlson wrote:
> In my day-to-day work, I have the occasion to use GitHub Codespaces on a
> repository with about 20,000 refs on the server. The environment is set
> up to pre-clone the repository, but I use a different default remote
> name than "origin" ("def", to be particular), and thus, one of the things
> I do when I set up that environment is to run "git remote rename origin
> def".
Aside from how we'd do renames with transactions, do you know about
clone.defaultRemoteName and --origin?
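For reference, a minimal sketch of both options, run against a throwaway bare repository. The paths and directory names are hypothetical, the remote name "def" comes from the original report, and clone.defaultRemoteName needs a reasonably recent Git (2.30 or later, as far as I know). Either option only helps if it is in effect when the clone is made, which may not be the case in a pre-cloned environment.

```shell
#!/bin/sh
set -eu

# Scratch area with an empty bare repository standing in for the server.
tmp=$(mktemp -d)
git init --quiet --bare "$tmp/upstream.git"

# Option 1: name the remote at clone time with --origin.
git clone --quiet --origin def "$tmp/upstream.git" "$tmp/by-flag" 2>/dev/null
by_flag=$(git -C "$tmp/by-flag" remote)

# Option 2: set clone.defaultRemoteName. It is passed with -c here so the
# sketch does not touch any real config file; normally it would be set
# with "git config --global clone.defaultRemoteName def".
git -c clone.defaultRemoteName=def clone --quiet \
    "$tmp/upstream.git" "$tmp/by-config" 2>/dev/null
by_config=$(git -C "$tmp/by-config" remote)

echo "remote via --origin: $by_flag"
echo "remote via config:   $by_config"

rm -rf "$tmp"
```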
> This process takes 35 minutes, which is extremely pathological. I
> believe what's happening is that all of the refs are packed, and
> renaming the ref causes a loose ref to be created and the old ref to be
> deleted (necessitating a rewrite of the packed-refs file). This is
> essentially O(N^2) in the order of refs.
>
> We recently added a --progress option, but I think this performance is
> bad enough that that's not going to suffice here, and we should try to
> do better.
>
> I found that using "git for-each-ref" and "git update-ref --stdin" in a
> pipeline to create and delete the refs as a single transaction takes a
> little over 2 seconds. This is greater than a 99.9% improvement and is
> much more along the line of what I'd expect.
>
> I thought about porting this code to use a ref transaction, but I
> realized that we don't rename reflogs in that situation, which might be
> a problem for some people. In my case, since it's a freshly cloned repo
> and the reflogs aren't interesting, I don't care.
There was a (small) thread as a follow-up to that "rename --progress"
patch at the time. Did you spot/read it?
https://lore.kernel.org/git/220302.865yow6u8a.gmgdl@evledraar.gmail.com/
There are doubtless other previous discussions; I just haven't found
them or don't remember them.
I have (briefly) tried hacking on this myself in the past. As anyone
who pokes at it will no doubt find, "branch rename" and "branch copy"
also do this in a non-ref-transaction way, so they are basically other
callers with the same problem.
Before I go any further I think it's good to know how far down this
particular rabbit hole you already are...
> I think a possible way forward may be to either teach ref transactions
> about ref renames, or simply to add a --no-reflogs option, which omits
> the reflogs in case the user doesn't care. I'm interested to hear ideas
> from others, though, about the best way forward.
More generally, probably:
1. Teach transactions about N operations on the same refname, which
they currently die on; renames are one such case.
2. Be able to "hook in" to them. Updating reflogs is one special case,
but we have the same inherent issue with updating config in lockstep
with transactions.
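To illustrate point 1, here is a small sketch that queues two updates for the same refname in one "git update-ref --stdin" transaction, in a scratch repository. The ref name refs/heads/demo is hypothetical. As far as I know current Git rejects the whole transaction in this situation, so the sketch checks only the exit status and does not rely on the wording of the error.

```shell
#!/bin/sh
set -eu

# Scratch repository with a single commit to point refs at.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
git -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m init
head=$(git rev-parse HEAD)

# Two operations on the same refname within one transaction.
if printf 'update refs/heads/demo %s\nupdate refs/heads/demo %s\n' \
        "$head" "$head" | git update-ref --stdin 2>/dev/null; then
    status=accepted
else
    status=rejected
fi

# If the transaction was rejected, the ref should not exist afterwards.
if git rev-parse --verify --quiet refs/heads/demo >/dev/null; then
    created=yes
else
    created=no
fi
echo "transaction $status, ref created: $created"
```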