git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Tom Miller <jackerran@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 3/3] fetch --prune: Repair branchname DF conflicts
Date: Thu, 19 Dec 2013 06:44:14 -0500	[thread overview]
Message-ID: <20131219114413.GA23298@sigill.intra.peff.net> (raw)
In-Reply-To: <20131219014859.GA32240@gmail.com>

On Wed, Dec 18, 2013 at 07:48:59PM -0600, Tom Miller wrote:

> I did not intend to introduce new lingo. I did some searching through
> history to see if something like this had been worked on before and
> I found a commit by Jeff King that introduced me the the idea of
> "DF conflicts"

I take all the blame. :)

As for the patch itself:

> >> diff --git a/builtin/fetch.c b/builtin/fetch.c
> >> index e50b697..845c687 100644
> >> --- a/builtin/fetch.c
> >> +++ b/builtin/fetch.c
> >> @@ -868,11 +868,6 @@ static int do_fetch(struct transport *transport,
> >>
> >>       if (tags == TAGS_DEFAULT && autotags)
> >>               transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
> >> -     if (fetch_refs(transport, ref_map)) {
> >> -             free_refs(ref_map);
> >> -             retcode = 1;
> >> -             goto cleanup;
> >> -     }
> >>       if (prune) {
> >>               /*
> >>                * We only prune based on refspecs specified
> >> @@ -888,6 +883,11 @@ static int do_fetch(struct transport *transport,
> >>                                  transport->url);
> >>               }
> >>       }
> >> +     if (fetch_refs(transport, ref_map)) {
> >> +             free_refs(ref_map);
> >> +             retcode = 1;
> >> +             goto cleanup;
> >> +     }

I think this is _probably_ a good thing to do, but it does have an
interesting side effect for concurrent operations, and I haven't seen
that mentioned so far in the discussion.

Readers of the ref namespace don't have any sort of transactionally
consistent view of all of the refs. So if a remote has moved a branch
"foo" to "bar" and we "fetch --prune", there will be a moment where a
simultaneous reader will see one of two states that never existed on the
remote (depending on the order the fetch chooses): either both refs
exist, or neither exists.

Right now fetch creates first and deletes after, so a simultaneous
reader may see both refs. After your change, it may see no refs at all.
Even though both are technically wrong, the current behavior is safer.
If the reader is calculating reachability (e.g., for a repack or "git
prune), it is better to have too many references than too few.

I'm not sure to what degree we want to care. This is a race, but it's a
reasonably unlikely one, and the D/F thing bites people in the real
world.

And further confounding this is the fact that even if the writer does
everything correctly, the way we read refs can still cause an odd view
of the whole namespace. For example, consider moving "refs/heads/z/foo"
to "refs/heads/a/foo", while somebody else reads simultaneously. Even
with create-before-delete, we can get the sequence:

  1. Reader reads "refs/heads/a/" and sees it does not contain "foo".

  2. Writer writes "refs/heads/a/foo".

  3. Writer deletes "refs/heads/z/foo".

  4. Reader reads "refs/heads/z", which does not contain "foo".

That race can be closed with a double-read of the ref namespaces, but
that has poor performance. A more reasonable fix, IMHO, would be to have
an alternate ref store that represents transactions atomically (keeping
in mind that this really only matters for busy repos with simultaneous
readers and writers, so it would not even need to be the default ref
store). And once you have such a store, that solves the other problem,
too: you can just treat the delete-create as a transaction anyway. It
also solves a similar problem with refs that rewind.

So even leaving it as-is does not make the problem go away, though the
proposed change does exacerbate it somewhat. I wonder how hard it would
be to do the safer thing in the common case that there is no D/F
conflict. That is, do multiple passes at updating the refs:

  1. Create/update any refs we can. Those with D/F conflicts are put
     aside for the moment.

  2. Delete any refs according to the --prune rules.

  3. Come back to any D/F conflicts and try them again.

I dunno. As far as I know, this is not a race that people see often in
real life (I do not have any confirmed cases of it yet). So it may
simply not be worth worrying about.

-Peff

  parent reply	other threads:[~2013-12-19 11:44 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-18 21:22 [PATCH 1/3] builtin/fetch.c: Add pretty_url() and print_url() Tom Miller
2013-12-18 21:22 ` [PATCH 2/3] fetch --prune: Always print header url Tom Miller
2013-12-18 21:22 ` [PATCH 3/3] fetch --prune: Repair branchname DF conflicts Tom Miller
2013-12-18 21:54   ` Junio C Hamano
2013-12-19  1:48     ` Tom Miller
2013-12-19  6:28       ` Junio C Hamano
2013-12-19 11:44       ` Jeff King [this message]
2013-12-19 19:34       ` Junio C Hamano
2013-12-18 21:47 ` [PATCH 1/3] builtin/fetch.c: Add pretty_url() and print_url() Junio C Hamano
2013-12-19  1:18   ` Tom Miller
2013-12-19 17:41     ` Thomas Miller
2013-12-19 22:57 ` [PATCH V2 1/2] fetch --prune: Always print header url Tom Miller
2013-12-19 22:57   ` [PATCH V2 2/2] fetch --prune: Run prune before fetching Tom Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131219114413.GA23298@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jackerran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).