Re: Concurrent fetch commands - Patrick Steinhardt

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Stefan Haller <lists@haller-berlin.de>, git@vger.kernel.org
Subject: Re: Concurrent fetch commands
Date: Wed, 3 Jan 2024 09:11:07 +0100	[thread overview]
Message-ID: <ZZUWmy3rTjpBsH-w@tanuki> (raw)
In-Reply-To: <ZZUNxNciNb_xZveY@tanuki>

[-- Attachment #1: Type: text/plain, Size: 3619 bytes --]

On Wed, Jan 03, 2024 at 08:33:24AM +0100, Patrick Steinhardt wrote:
> On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> > Stefan Haller <lists@haller-berlin.de> writes:
> > 
> > > I can reliably reproduce this by doing
> > >
> > >    $ git fetch&; sleep 0.1; git pull
> > >    [1] 42160
> > >    [1]  + done       git fetch
> > >    fatal: Cannot rebase onto multiple branches.
> > 
> > I see a bug here.
> > 
> > How this _ought_ to work is
> > 
> >  - The first "git fetch" wants to report what it fetched by writing
> >    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
> >    the fetch finishes can consume its contents).
> > 
> >  - The second "git pull" runs "git fetch" under the hood.  Because
> >    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
> >    is already somebody writing to the file, it should notice and
> >    barf, saying "fatal: a 'git fetch' is already working" or
> >    something.
> > 
> > But because there is no "Do not overwrite FETCH_HEAD somebody else
> > is using" protection, "git merge" or "git rebase" that is run as the
> > second half of the "git pull" ends up working on the contents of
> > FETCH_HEAD that is undefined, and GIGO result follows.
> > 
> > The "bug" that the second "git fetch" does not notice an already
> > running one (who is in possession of FETCH_HEAD) and refrain from
> > starting is not easy to design a fix for---we cannot just abort by
> > opening it with O_CREAT|O_EXCL because it is a normal thing for
> > $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> > its contents before starting to avoid getting affected by contents
> > leftover by the last fetch, but when there is a "git fetch" that is
> > actively running, and it finishes _after_ the second one starts and
> > truncates the file, the second one will end up seeing the contents
> > the first one left.  We have the "--no-write-fetch-head" option for
> > users to explicitly tell which invocation of "git fetch" should not
> > write FETCH_HEAD.
> 
> While I agree that it's the right thing to use "--no-write-fetch-head"
> in this context, I still wonder whether we want to fix this "bug". It
> would be a rather easy change on our side to start using the lockfile
> API to write to FETCH_HEAD, which has a bunch of benefits:
> 
>   - We would block concurrent processes of writing to FETCH_HEAD at the
>     same time (well, at least for clients aware of the new semantics).
> 
>   - Consequentially, we do not write a corrupted FETCH_HEAD anymore when
>     multiple processes write to it at the same time.
> 
>   - We're also more robust against corruption in the context of hard
>     crashes due to atomic rename semantics and proper flushing.
> 
> I don't really see much of a downside except for the fact that we change
> how this special ref is being written to, so other implementations would
> need to adapt accordingly. But even if they didn't, if clients with both
> the new and old behaviour write FETCH_HEAD at the same point in time the
> result would still be a consistent FETCH_HEAD if both writes finish. We
> do have a race now which of both versions of FETCH_HEAD we see, but that
> feels better than corrupted contents to me.

Ah, one thing I didn't think of is parallel fetches. It's expected that
all of the fetches write into FETCH_HEAD at the same point in time
concurrently, so the end result is the collection of updated refs even
though the order isn't guaranteed. That makes things a bit more
complicated indeed.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2024-01-03  8:11 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-31 13:30 Concurrent fetch commands Stefan Haller
2023-12-31 13:48 ` Dragan Simic
2023-12-31 13:50 ` Konstantin Tokarev
2023-12-31 14:01   ` Dragan Simic
2024-01-01 11:23   ` Stefan Haller
2024-01-01 15:47     ` Federico Kircheis
2023-12-31 17:27 ` Junio C Hamano
2023-12-31 17:41   ` Dragan Simic
2024-01-01 11:30   ` Stefan Haller
2024-01-01 11:42     ` Stefan Haller
2024-01-03  7:33   ` Patrick Steinhardt
2024-01-03  8:11     ` Patrick Steinhardt [this message]
     [not found]       ` <ZZU1TCyQdLqoLxPw@ugly>
2024-01-03 10:40         ` Patrick Steinhardt
2024-01-03 16:40           ` Taylor Blau
2024-01-03 22:10             ` Junio C Hamano
2024-01-04 12:01               ` Stefan Haller
2024-01-04 20:54                 ` Mike Hommey
2024-01-04 22:14                   ` Junio C Hamano
2024-01-04 22:25                     ` Mike Hommey
2024-01-04 17:34               ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZUWmy3rTjpBsH-w@tanuki \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lists@haller-berlin.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).