From: Taylor R Campbell <git@campbell.mumble.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: Synchronous replication on push
Date: Tue, 5 Nov 2024 01:34:32 +0000
Message-ID: <20241105013433.4E52260A64@jupiter.mumble.net>
In-Reply-To: <20241104234705.GA3017597@coredump.intra.peff.net>
> Date: Mon, 4 Nov 2024 18:47:05 -0500
> From: Jeff King <peff@peff.net>
>
> On Sat, Nov 02, 2024 at 02:06:53AM +0000, Taylor R Campbell wrote:
>
> > Whenever I push anything to it, I want the push -- that is, all the
> > objects, and all the ref updates -- to be synchronously replicated to
> > another remote repository, the back end:
>
> This isn't quite how replication works at, say, GitHub. But let me first
> explain some of what you're seeing, and then I'll give some higher level
> comments at the end.

Great, thanks!  I understand GitHub works differently, and I'm not
trying to replicate everything about GitHub's architecture, which I
expect to take substantial novel software engineering effort.  But I
am trying to make sure I understand how the parts fit together well
enough to provide qualitatively similar types of guarantees about
durability when the user's `git push' exits nonzero.

I really have two different goals here, which have similar needs for
relaying pushes but which I'm sure will diverge at some point:

1. provide a synchronous push/pull git frontend to an hg backend with
   git-cinnabar (so to ordinary git clients it looks just like an
   ordinary git remote, without needing git-cinnabar), and

2. provide a git frontend that replicates to one or many git backends
   for better resilience to server loss.

> Instead, you should disable push's attempt to
> update the local tracking refs. There isn't an option to do that, but
> if you don't have a "fetch" config line, then there are no tracking
> refs. I.e., rather than using "clone --mirror", create your frontend
> repo like this:
>
>   git init --bare
>   git config remote.backend.url git@backend.example.com:/repo.git
>   git fetch backend refs/*:refs/*
>
> And then push won't try to update anything in the frontend repo.
Thanks, that hadn't occurred to me as an option.
> Side note: there's a small maybe-bug here that I noticed if the
> backend is on the same local filesystem. In that case
> GIT_QUARANTINE_PATH remains set for the receive-pack process running
> on the backend repo, and will refuse to update refs (where it should
> be safe to do so!). In your example that doesn't happen because
> GIT_QUARANTINE_PATH does not make it across the ssh connection. But
> arguably we should be clearing GIT_QUARANTINE_PATH in local_repo_env
> like we do for GIT_DIR, etc. I don't think you ran into this, but just
> another hiccup I found while trying to reproduce your situation.

(I did actually run into this, so in my test scripts I have been using

    git {clone,config,...} ext::"env -i PATH=$PATH git %s /path/to/backend.git" ...

instead of just

    git {clone,config,...} /path/to/backend.git ...

in order to nix GIT_QUARANTINE_PATH from the environment -- and
anything else I might not have thought of -- while running
git-receive-pack on the backend.  But it didn't seem germane to the
problem at hand, so I didn't want to clutter up my already somewhat
long question with such details unless someone asked me to share my
reproducer!)

> > 3. Same as (1), but the pre-receive hook assembles a command line of
> >
> >     exec git push backend ${new0}:${ref0} ${new1}:${ref1} ...,
> >
> > with all the ref updates passed on stdin (ignoring the old values).
>
> ...yes, this is the correct approach. You're not _quite_ passing all of
> the relevant info, though, because you're ignoring the old value of each
> ref. And ideally you'd make sure you were moving backend's ref0 from
> "old0" to "new0"; otherwise you risk overwriting something that happened
> independently on the backend. Of course that creates new questions,
> like what happens when the frontend and backend get out of sync.
Right -- there will be some combination of --force-with-lease or
pre-receive tests at the other end to handle this. But for now my
focus is on making git push work in pre-receive at all.
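
For concreteness, here is a rough sketch of the lease idea. It turns each
"old new ref" line that pre-receive gets on stdin into a
--force-with-lease=<ref>:<expect> option plus a refspec, so the backend
only moves a ref if it still holds the old value. The helper name
relay_args and the remote name "backend" are illustrative assumptions,
and this has not been tried against a real frontend/backend pair.

```shell
#!/bin/sh
# relay_args (hypothetical helper): read pre-receive lines "old new ref" on
# stdin and print one git-push argument per line.
relay_args() {
    while read -r old new ref; do
        # An all-zero old id means the ref is being created; an empty
        # <expect> tells git push that the ref must not exist yet.
        case $old in
        *[!0]*) expect=$old ;;
        *)      expect= ;;
        esac
        # An all-zero new id means deletion, spelled ":ref" as a refspec.
        case $new in
        *[!0]*) src=$new ;;
        *)      src= ;;
        esac
        printf '%s\n' "--force-with-lease=$ref:$expect" "$src:$ref"
    done
}

# In the hook itself, something like the following (assumption: ref names
# contain no whitespace or glob characters, so word splitting is safe):
#   args=$(relay_args) && exec git push backend $args
```

Note that a lease still lets a non-fast-forward update through as long
as the expected value matches, so any fast-forward policy has to be
checked separately.
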
As long as anything out-of-sync leads to noisy failure, possibly
requiring manual intervention, that's good enough for now (and I'm not
(yet) concerned with .
> > remote: error: update_ref failed for ref 'refs/heads/main': ref updates forbidden inside quarantine environment
> >
> > but somehow the push succeeds in spite of this message, and the
> > primary and replica both get updated.
>
> This is again the quarantine issue updating local tracking branches.
> However, we don't consider that a hard error, as updating them is
> opportunistic (we'd get the new values on the next fetch anyway).
>
> If you drop the refspec as above, you shouldn't see that any more.
Yes, thanks!
> Now back to the main point: is this a good way to do replication? I
> don't think it's _terrible_, but there are two flaws I can see:
These are all good points that I will consider once I get to them now
that I can make progress past the obstacle of local tracking ref
updates in pre-receive git push, thanks.
> 1. You're not kicking off the backend push until the frontend has
> received and processed the whole pack. So you're doubling the
> end-to-end latency of the push. In an ideal world you'd actually
> stream the incoming packfile to the backend, which would be doing its
> own quarantined index-pack[*] on it in real-time. And then when you
> get to the pre-receive hook, all that's left is for all of the
> replicas to agree to commit to the ref update.
Git doesn't currently have any hooks for doing this, right? So
presumably this will require a custom git-receive-pack replacement
that understands the git wire protocol to stream the packfile to
backends (which is what I assume GitHub's spokes proxies do).
> 2. Using "push" isn't a very atomic way of updating refs. The backends
> will either accept the push or not, and then the frontend will try
> to update its refs. What if it fails? What if another push comes in
> simultaneously? Can they overwrite each other or lose pushed data?
> Or get the frontend and backends out of sync?
Right -- there's a lot to work out for the three-phase commit part.
One simplification for now is to reject non-fast-forward pushes (and
ref deletion), and to not worry too much about ordering of independent
ref updates or whether I even want serializable isolation or just
repeatable-read or read-committed for that.
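
The fast-forward-only simplification could be checked in pre-receive
roughly as follows. This is a sketch under assumptions: check_update is
a made-up helper name, and it relies on `git merge-base --is-ancestor`
being able to see the incoming objects from inside the hook.

```shell
#!/bin/sh
# check_update OLD NEW REF (hypothetical helper): succeed only for ref
# creation or a fast-forward update; refuse deletions and rewinds.
check_update() {
    case $2 in
    *[!0]*) ;;
    *) echo "rejecting deletion of $3" >&2; return 1 ;;
    esac
    case $1 in
    *[!0]*) ;;
    *) return 0 ;;   # all-zero old id: new ref, nothing to compare against
    esac
    # Exit status 0 means OLD is an ancestor of NEW, i.e. a fast-forward.
    # Any other status, including errors, is treated as a rejection.
    if git merge-base --is-ancestor "$1" "$2"; then
        return 0
    fi
    echo "rejecting non-fast-forward update of $3" >&2
    return 1
}

# Hook body sketch: reject the whole push if any single update fails.
#   while read -r old new ref; do
#       check_update "$old" "$new" "$ref" || exit 1
#   done
```
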
That said, regarding push atomicity: Suppose users concurrently do

    alice$ git push frontend X Y
    bob$ git push frontend Y X

That is, there are overlapping ref updates, and suppose Alice and Bob
have incompatible referents for X and Y (non-fast-forward, or they're
using --force-with-lease but not --atomic, or whatever).
When are the locks on X and Y taken relative to pre-receive in the
frontend? Can the pre-receive hooks for Alice's push and Bob's push
run concurrently or are they serialized by locks on the common refs X
and Y? This can't deadlock, can it? (I assume the locks on refs are
taken in a consistent order.)
It's unclear to me from the githooks(5), git-push(1), and
git-receive-pack(1) man pages what the ordering of hooks and ref
locking is, or what serialization guarantees hooks have -- if any.