From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Stefan Haller <lists@haller-berlin.de>, git@vger.kernel.org
Subject: Re: Concurrent fetch commands
Date: Wed, 3 Jan 2024 09:11:07 +0100 [thread overview]
Message-ID: <ZZUWmy3rTjpBsH-w@tanuki> (raw)
In-Reply-To: <ZZUNxNciNb_xZveY@tanuki>
[-- Attachment #1: Type: text/plain, Size: 3619 bytes --]
On Wed, Jan 03, 2024 at 08:33:24AM +0100, Patrick Steinhardt wrote:
> On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> > Stefan Haller <lists@haller-berlin.de> writes:
> >
> > > I can reliably reproduce this by doing
> > >
> > > $ git fetch&; sleep 0.1; git pull
> > > [1] 42160
> > > [1] + done git fetch
> > > fatal: Cannot rebase onto multiple branches.
> >
> > I see a bug here.
> >
> > How this _ought_ to work is
> >
> > - The first "git fetch" wants to report what it fetched by writing
> > into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
> > the fetch finishes can consume its contents).
> >
> > - The second "git pull" runs "git fetch" under the hood. Because
> > it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
> > is already somebody writing to the file, it should notice and
> > barf, saying "fatal: a 'git fetch' is already working" or
> > something.
> >
> > But because there is no "Do not overwrite FETCH_HEAD somebody else
> > is using" protection, "git merge" or "git rebase" that is run as the
> > second half of the "git pull" ends up working on the contents of
> > FETCH_HEAD that is undefined, and GIGO result follows.
> >
> > The "bug" that the second "git fetch" does not notice an already
> > running one (who is in possession of FETCH_HEAD) and refrain from
> > starting is not easy to design a fix for---we cannot just abort by
> > opening it with O_CREAT|O_EXCL because it is a normal thing for
> > $GIT_DIR/FETCH_HEAD to exist after the "last" fetch. We truncate
> > its contents before starting to avoid getting affected by contents
> > leftover by the last fetch, but when there is a "git fetch" that is
> > actively running, and it finishes _after_ the second one starts and
> > truncates the file, the second one will end up seeing the contents
> > the first one left. We have the "--no-write-fetch-head" option for
> > users to explicitly tell which invocation of "git fetch" should not
> > write FETCH_HEAD.
>
> While I agree that it's the right thing to use "--no-write-fetch-head"
> in this context, I still wonder whether we want to fix this "bug". It
> would be a rather easy change on our side to start using the lockfile
> API to write to FETCH_HEAD, which has a bunch of benefits:
>
> - We would block concurrent processes of writing to FETCH_HEAD at the
> same time (well, at least for clients aware of the new semantics).
>
> - Consequentially, we do not write a corrupted FETCH_HEAD anymore when
> multiple processes write to it at the same time.
>
> - We're also more robust against corruption in the context of hard
> crashes due to atomic rename semantics and proper flushing.
>
> I don't really see much of a downside except for the fact that we change
> how this special ref is being written to, so other implementations would
> need to adapt accordingly. But even if they didn't, if clients with both
> the new and old behaviour write FETCH_HEAD at the same point in time the
> result would still be a consistent FETCH_HEAD if both writes finish. We
> do have a race now which of both versions of FETCH_HEAD we see, but that
> feels better than corrupted contents to me.
Ah, one thing I didn't think of is parallel fetches. It's expected that
all of the fetches write into FETCH_HEAD at the same point in time
concurrently, so the end result is the collection of updated refs even
though the order isn't guaranteed. That makes things a bit more
complicated indeed.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-01-03 8:11 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-31 13:30 Concurrent fetch commands Stefan Haller
2023-12-31 13:48 ` Dragan Simic
2023-12-31 13:50 ` Konstantin Tokarev
2023-12-31 14:01 ` Dragan Simic
2024-01-01 11:23 ` Stefan Haller
2024-01-01 15:47 ` Federico Kircheis
2023-12-31 17:27 ` Junio C Hamano
2023-12-31 17:41 ` Dragan Simic
2024-01-01 11:30 ` Stefan Haller
2024-01-01 11:42 ` Stefan Haller
2024-01-03 7:33 ` Patrick Steinhardt
2024-01-03 8:11 ` Patrick Steinhardt [this message]
[not found] ` <ZZU1TCyQdLqoLxPw@ugly>
2024-01-03 10:40 ` Patrick Steinhardt
2024-01-03 16:40 ` Taylor Blau
2024-01-03 22:10 ` Junio C Hamano
2024-01-04 12:01 ` Stefan Haller
2024-01-04 20:54 ` Mike Hommey
2024-01-04 22:14 ` Junio C Hamano
2024-01-04 22:25 ` Mike Hommey
2024-01-04 17:34 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZZUWmy3rTjpBsH-w@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=lists@haller-berlin.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).