From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org
Subject: Re: Pathological performance with git remote rename and many tracking refs
Date: Thu, 14 Apr 2022 09:12:20 +0200
Message-ID: <220414.8635igdtfn.gmgdl@evledraar.gmail.com>
In-Reply-To: <YldPmUskbU+bOU2n@camp.crustytoothpaste.net>


On Wed, Apr 13 2022, brian m. carlson wrote:

> In my day-to-day work, I have the occasion to use GitHub Codespaces on a
> repository with about 20,000 refs on the server.  The environment is set
> up to pre-clone the repository, but I use a different default remote
> name than "origin" ("def", to be particular), and thus, one of the things
> I do when I set up that environment is to run "git remote rename origin
> def".

Aside from how we'd do renames with transactions, do you know about
clone.defaultRemoteName and --origin?
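
For readers unfamiliar with those two knobs, here is a minimal sketch of
both. It assumes you control the clone step yourself (which may not hold
for a pre-cloned environment) and, for the config variable, a Git recent
enough to know clone.defaultRemoteName. The throwaway source repository
and the "-clone1"/"-clone2" directory names are made up for the demo.

```shell
# Throwaway source repository to clone from (stand-in for the real server).
src=$(mktemp -d)
git init -q "$src"
git -C "$src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init

# Option 1: name the remote "def" at clone time.
git clone -q --origin def "$src" "${src}-clone1"

# Option 2: set the default remote name through configuration. Here it is
# passed per-invocation with -c; "git config --global" would make it stick.
git -c clone.defaultRemoteName=def clone -q "$src" "${src}-clone2"

# Both clones should now list "def" instead of "origin".
git -C "${src}-clone1" remote
git -C "${src}-clone2" remote
```

Either way the tracking refs are created under refs/remotes/def/ from the
start, so no rename is needed afterwards.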

> This process takes 35 minutes, which is extremely pathological.  I
> believe what's happening is that all of the refs are packed, and
> renaming the ref causes a loose ref to be created and the old ref to be
> deleted (necessitating a rewrite of the packed-refs file).  This is
> essentially O(N^2) in the number of refs.
>
> We recently added a --progress option, but I think this performance is
> bad enough that that's not going to suffice here, and we should try to
> do better.
>
> I found that using "git for-each-ref" and "git update-ref --stdin" in a
> pipeline to create and delete the refs as a single transaction takes a
> little over 2 seconds.  This is greater than a 99.9% improvement and is
> much more along the line of what I'd expect.
>
> I thought about porting this code to use a ref transaction, but I
> realized that we don't rename reflogs in that situation, which might be
> a problem for some people.  In my case, since it's a freshly cloned repo
> and the reflogs aren't interesting, I don't care.
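
The exact pipeline is not given in the thread, so the following is only a
sketch of what "for-each-ref piped into update-ref --stdin" could look
like. The helper name move_tracking_refs is hypothetical. It skips
symbolic refs such as refs/remotes/<old>/HEAD, does not carry reflogs
over, and does not touch the remote.* or branch.* configuration that
"git remote rename" also updates.

```shell
# Hypothetical helper: move every remote-tracking ref from one remote
# namespace to another in a single ref transaction.
# Usage: move_tracking_refs origin def
move_tracking_refs() {
    old=$1
    new=$2
    git for-each-ref \
        --format='%(refname) %(objectname) %(symref)' \
        "refs/remotes/$old/" |
    while read -r ref oid symref; do
        # Skip symbolic refs (e.g. refs/remotes/<old>/HEAD); they would
        # need separate handling and are left alone in this sketch.
        [ -n "$symref" ] && continue
        suffix=${ref#"refs/remotes/$old/"}
        # Create the new name and delete the old one, both checked
        # against the object ID we just read.
        printf 'create refs/remotes/%s/%s %s\n' "$new" "$suffix" "$oid"
        printf 'delete %s %s\n' "$ref" "$oid"
    done |
    git update-ref --stdin
}
```

Because all commands go through one "update-ref --stdin" invocation, the
refs are updated as one transaction rather than one rename per ref.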

There was a (small) thread as a follow-up to that "rename --progress"
patch at the time. Did you spot/read it?
https://lore.kernel.org/git/220302.865yow6u8a.gmgdl@evledraar.gmail.com/

There are doubtless other previous discussions; I just haven't found
them or don't remember them.

I have (briefly) tried hacking on this myself in the past. As anyone
who pokes at it will no doubt find, the non-ref-transaction code behind
"branch rename" and "branch copy" is basically another set of callers
with the same problem.

Before I go any further I think it's good to know how far down this
particular rabbit hole you already are...

> I think a possible way forward may be to either teach ref transactions
> about ref renames, or simply to add a --no-reflogs option, which omits
> the reflogs in case the user doesn't care.  I'm interested to hear ideas
> from others, though, about the best way forward.

More generally, probably:

 1. Teach transactions about N operations on the same refname, which
    they'll currently die on. Renames are one case.

 2. Be able to "hook in" to them. Updating reflogs is one special case,
    but we have the same inherent issue with updating config in lockstep
    with transactions.
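
To illustrate point 1, here is a small sketch that queues two commands
for the same refname in one "update-ref --stdin" transaction. The helper
name and the ref refs/heads/demo-dup are made up for the demo, and it
assumes it is run inside a repository with at least one commit.

```shell
# Hypothetical demo helper: queue a create and a delete for the same
# refname in one transaction and report whether Git rejected it.
# Returns 0 if the transaction was rejected, 1 if it was accepted.
duplicate_update_rejected() {
    oid=$(git rev-parse HEAD) || return 2
    if printf 'create refs/heads/demo-dup %s\ndelete refs/heads/demo-dup %s\n' \
            "$oid" "$oid" |
        git update-ref --stdin 2>/dev/null
    then
        return 1    # transaction was accepted
    else
        return 0    # transaction was rejected as a whole
    fi
}
```

With the behaviour described above, the whole transaction is refused and
no ref is created, which is why a rename cannot simply be expressed as
several steps on one refname today.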


Thread overview: 6+ messages
2022-04-13 22:32 Pathological performance with git remote rename and many tracking refs brian m. carlson
2022-04-14  7:12 ` Ævar Arnfjörð Bjarmason [this message]
2022-04-15  1:08   ` brian m. carlson
2022-04-15 12:26     ` Ævar Arnfjörð Bjarmason
2022-04-15 17:25       ` Junio C Hamano
2022-04-16 11:23         ` Ævar Arnfjörð Bjarmason
