From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Stefan Beller" <sbeller@google.com>,
"Git Mailing List" <git@vger.kernel.org>,
"Jay Soffian" <jaysoffian@gmail.com>,
"Björn Gustavsson" <bgustavsson@gmail.com>
Subject: Re: Why is "git fetch --prune" so much slower than "git remote prune"?
Date: Fri, 20 Mar 2015 05:51:56 +0100
Message-ID: <550BA76C.80501@alum.mit.edu>
In-Reply-To: <xmqqpp84iye2.fsf@gitster.dls.corp.google.com>

On 03/19/2015 08:24 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>
>>> Now that we have ref_transaction_*, I think if git-fetch fed all of the
>>> deletes (along with the updates) into a single transaction, we would get
>>> the same optimization for free. Maybe that is even part of some of the
>>> pending ref_transaction work from Stefan or Michael (both cc'd). I
>>> haven't kept up very well with what is cooking in pu.
>>
>> I am looking into this now.
>>
>> For pruning, we can't use a ref_transaction as it is currently
>> implemented because it would fail if any of the reference deletions
>> failed. But in this case I think if any deletions fail, we would prefer
>> to emit a warning but keep going.
>
> I am not quite sure what you mean here. I agree with you if you
> meant "we shouldn't fail the fetch only because 'fetch --prune'
> failed to remove only one of the remote-tracking refs that are no
> longer there" but that can easily be solved by splitting the pruning
> phase into a separate transaction. If you meant "we would rather
> remove origin/{a,b} non-atomically when we noticed that origin/{a,b,c}
> are all gone than leave all three intact only because we failed to
> remove origin/c for whatever reason", my knee-jerk reaction is "does
> it make a practical difference to the end user between these two?"
>
> What are the plausible causes of failing to prune unused
> remote-tracking refs?

I wasn't proposing to combine the updates with the prunes in a single
transaction. (Though now that I think about it, when "fetch --atomic" is
specified it wouldn't be so crazy. It would also have the virtue of
allowing the reference transaction code to deal with any D/F
(directory/file) conflicts
between references being added and references being deleted.)
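
To illustrate (an untested sketch; I'm paraphrasing the current
ref_transaction signatures from memory, so the details may be off),
the prunes would simply be queued as deletions in the same transaction
as the updates:

	struct strbuf err = STRBUF_INIT;
	struct ref_transaction *transaction = ref_transaction_begin(&err);

	if (!transaction)
		die("%s", err.buf);

	/* Queue an update for a ref that still exists upstream... */
	if (ref_transaction_update(transaction, "refs/remotes/origin/a",
				   new_sha1, old_sha1, 0, "fetch", &err))
		die("%s", err.buf);

	/*
	 * ...and queue a prune as a deletion in the same transaction.
	 * Passing NULL as old_sha1 means "no pre-check"; we don't care
	 * what the ref currently points at.
	 */
	if (ref_transaction_delete(transaction, "refs/remotes/origin/c",
				   NULL, 0, "fetch: prune", &err))
		die("%s", err.buf);

	if (ref_transaction_commit(transaction, &err))
		die("%s", err.buf);

	ref_transaction_free(transaction);
	strbuf_release(&err);

(new_sha1 and old_sha1 would come from the fetch itself.)
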
What I meant is the second thing you mentioned, namely that currently we
prune each reference on a "best-effort" basis and wouldn't skip *all*
prunes just because one failed.

But you raise an interesting question: what could cause a failure to
prune an unused remote-tracking ref (i.e., if all pruning is done in a
single transaction)?

* I think the most likely reason for a failure would be a lock conflict
with another process. We don't retry lock attempts [1] (see the sketch
below this list), so this would cause the whole transaction to fail.
This could happen, for example, if the user launches two "fetch --prune"
operations at the same time, or runs "pack-refs" while fetching [2].

* It is *not* a failure if another process deletes or updates the
reference before we get to it, because we don't provide an "old" value
of the reference for a pre-check.

* I haven't verified this, but I can imagine it could fail if another
process deletes the reference (say refs/remotes/origin/foo/bar) and adds
another reference that would D/F conflict with it (say
refs/remotes/origin/foo) while we are working, because our attempt to
create refs/remotes/origin/foo/bar.lock would fail.

* A repository problem, for example wrong file permissions
somewhere in the loose refs directory, could prevent us from being able
to create the lockfile or delete the loose reference file.
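
To make the first point concrete: the lockfile API makes a single
attempt only. A simplified sketch (the real call sites are more
involved, and "ref_path" is just a placeholder):

	static struct lock_file lock;

	/*
	 * If another process already holds ref_path's ".lock" file,
	 * hold_lock_file_for_update() fails immediately instead of
	 * retrying, and the whole transaction is rolled back.
	 */
	if (hold_lock_file_for_update(&lock, ref_path, 0) < 0)
		die("unable to lock %s", ref_path);
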
The lock conflict scenario is the only one that really worries me.

But I don't think it is *so* hard to keep the current "best-effort"
behavior. I am working on a function delete_refs() for the reference API
that deletes a bunch of references on a "best effort" basis, reporting
per-reference errors for any deletions that fail. Longer term, we could
add something like a REFS_NON_ATOMIC flag to the
ref_transaction_update() functions, which would allow the rest of the
transaction to proceed even if that particular update fails.
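
As a strawman (all names and signatures below are placeholders, nothing
is final), I'm imagining roughly:

	/*
	 * Delete the given refs on a "best effort" basis: emit an
	 * error for each ref that cannot be deleted, delete the
	 * others anyway, and return the number of failures.
	 */
	int delete_refs(struct string_list *refnames);

	/*
	 * Hypothetical flag: let this one update fail without
	 * aborting the rest of the transaction.
	 */
	ref_transaction_delete(transaction, refname, NULL,
			       REFS_NON_ATOMIC, "fetch: prune", &err);
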
Michael

[1] I hope to submit a patch to make it possible to retry lock
acquisition with a specified timeout. This should help in scenarios like
this.
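
The entry point might look something like this (hypothetical; no such
function exists yet):

	/*
	 * Like hold_lock_file_for_update(), but if somebody else holds
	 * the lock, retry with a short sleep between attempts for up
	 * to timeout_ms milliseconds before giving up.
	 */
	int hold_lock_file_for_update_timeout(struct lock_file *lk,
					      const char *path, int flags,
					      long timeout_ms);
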
[2] Maybe a careful analysis or a slight change to our locking protocol
could prove that the only lock acquisition that can fail is the one on
packed-refs, which would cause all of the deletes to fail anyway. In
that case per-reference failures would cease to be a worry.
--
Michael Haggerty
mhagger@alum.mit.edu