From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Stefan Haller <lists@haller-berlin.de>, git@vger.kernel.org
Subject: Re: Concurrent fetch commands
Date: Wed, 3 Jan 2024 08:33:24 +0100 [thread overview]
Message-ID: <ZZUNxNciNb_xZveY@tanuki> (raw)
In-Reply-To: <xmqqy1daffk8.fsf@gitster.g>
[-- Attachment #1: Type: text/plain, Size: 3123 bytes --]
On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> Stefan Haller <lists@haller-berlin.de> writes:
>
> > I can reliably reproduce this by doing
> >
> > $ git fetch&; sleep 0.1; git pull
> > [1] 42160
> > [1] + done git fetch
> > fatal: Cannot rebase onto multiple branches.
>
> I see a bug here.
>
> How this _ought_ to work is
>
> - The first "git fetch" wants to report what it fetched by writing
> into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
> the fetch finishes can consume its contents).
>
> - The second "git pull" runs "git fetch" under the hood. Because
> it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
> is already somebody writing to the file, it should notice and
> barf, saying "fatal: a 'git fetch' is already working" or
> something.
>
> But because there is no "Do not overwrite FETCH_HEAD somebody else
> is using" protection, "git merge" or "git rebase" that is run as the
> second half of the "git pull" ends up working on the contents of
> FETCH_HEAD that is undefined, and GIGO result follows.
>
> The "bug" that the second "git fetch" does not notice an already
> running one (who is in possession of FETCH_HEAD) and refrain from
> starting is not easy to design a fix for---we cannot just abort by
> opening it with O_CREAT|O_EXCL because it is a normal thing for
> $GIT_DIR/FETCH_HEAD to exist after the "last" fetch. We truncate
> its contents before starting to avoid getting affected by contents
> leftover by the last fetch, but when there is a "git fetch" that is
> actively running, and it finishes _after_ the second one starts and
> truncates the file, the second one will end up seeing the contents
> the first one left. We have the "--no-write-fetch-head" option for
> users to explicitly tell which invocation of "git fetch" should not
> write FETCH_HEAD.
While I agree that it's the right thing to use "--no-write-fetch-head"
in this context, I still wonder whether we want to fix this "bug". It
would be a rather easy change on our side to start using the lockfile
API to write to FETCH_HEAD, which has a bunch of benefits:
- We would block concurrent processes of writing to FETCH_HEAD at the
same time (well, at least for clients aware of the new semantics).
- Consequentially, we do not write a corrupted FETCH_HEAD anymore when
multiple processes write to it at the same time.
- We're also more robust against corruption in the context of hard
crashes due to atomic rename semantics and proper flushing.
I don't really see much of a downside except for the fact that we change
how this special ref is being written to, so other implementations would
need to adapt accordingly. But even if they didn't, if clients with both
the new and old behaviour write FETCH_HEAD at the same point in time the
result would still be a consistent FETCH_HEAD if both writes finish. We
do have a race now which of both versions of FETCH_HEAD we see, but that
feels better than corrupted contents to me.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-01-03 7:33 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-31 13:30 Concurrent fetch commands Stefan Haller
2023-12-31 13:48 ` Dragan Simic
2023-12-31 13:50 ` Konstantin Tokarev
2023-12-31 14:01 ` Dragan Simic
2024-01-01 11:23 ` Stefan Haller
2024-01-01 15:47 ` Federico Kircheis
2023-12-31 17:27 ` Junio C Hamano
2023-12-31 17:41 ` Dragan Simic
2024-01-01 11:30 ` Stefan Haller
2024-01-01 11:42 ` Stefan Haller
2024-01-03 7:33 ` Patrick Steinhardt [this message]
2024-01-03 8:11 ` Patrick Steinhardt
[not found] ` <ZZU1TCyQdLqoLxPw@ugly>
2024-01-03 10:40 ` Patrick Steinhardt
2024-01-03 16:40 ` Taylor Blau
2024-01-03 22:10 ` Junio C Hamano
2024-01-04 12:01 ` Stefan Haller
2024-01-04 20:54 ` Mike Hommey
2024-01-04 22:14 ` Junio C Hamano
2024-01-04 22:25 ` Mike Hommey
2024-01-04 17:34 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZZUNxNciNb_xZveY@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=lists@haller-berlin.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox