From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git List <git@vger.kernel.org>
Subject: Re: git gc --auto aquires *.lock files that make a subsequent git-fetch error out
Date: Wed, 12 Jul 2017 22:30:25 +0200
Message-ID: <87a849z9cu.fsf@gmail.com>
In-Reply-To: <20170712200054.mxcabiyttijpbkbb@sigill.intra.peff.net>


On Wed, Jul 12 2017, Jeff King jotted:

> On Wed, Jul 12, 2017 at 09:38:46PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> In 131b8fcbfb ("fetch: run gc --auto after fetching", 2013-01-26) first
>> released with v1.8.2 Jeff changed git-fetch to run "git gc --auto"
>> afterwards.
>>
>> This means that if you run two git fetches in a row the second one may
>> fail because it can't acquire the *.lock files on the remote branches you
>> have & which the next git-fetch needs to update.
>
> Is it really "in a row" that's a problem? The second fetch should not
> begin until the first one is done, including until its auto-gc exits.
> And even with background gc, we do the ref-locking operations first, due
> to 62aad1849 (gc --auto: do not lock refs in the background,
> 2014-05-25).
>
>> I happen to run into this on a git.git which has a lot of remotes (most
>> people on-list whose remotes I know about) and fetch them in parallel:
>>
>>     $ git config alias.pfetch
>>     !parallel 'git fetch {}' ::: $(git remote)
>
> Ah, so it's not in a row. It's parallel. Then yes, you may run into
> problems with the gc locks conflicting with real operations. This isn't
> really unique to fetch. Any simultaneous operation can run into problems
> (e.g., on a busy server repo you may see conflicts between pack-refs and
> regular pushes).

This is what I thought at first, and I've only encountered the issue in
this parallel mode (mainly because it's tedious to reproduce). But I
think the traces below show that it would also happen with "git fetch
--all" and "git remote update", so the parallel invocations aren't
needed to trigger it.

I.e. I'd just update my first remote, then git-gc would start in the
background and lock refs for my other remotes, which I'd then fail to
update.

>> And so would 'git fetch --all':
>>
>>     $ GIT_TRACE=1 git fetch --all 2>&1|grep --line-buffered built-in|grep -v rev-list
>>     19:31:26.273577 git.c:328               trace: built-in: git 'fetch' '--all'
>>     19:31:26.278869 git.c:328               trace: built-in: git 'fetch' '--append' 'origin'
>>     19:31:27.993312 git.c:328               trace: built-in: git 'gc' '--auto'
>>     19:31:27.995855 git.c:328               trace: built-in: git 'fetch' '--append' 'avar'
>>     19:31:29.656925 git.c:328               trace: built-in: git 'gc' '--auto'
>>
>> I think those two cases are bugs (but ones which I don't have the
>> inclination to chase myself beyond sending this E-Mail). We should be
>> running the 'git gc --auto' at the very end of the entire program, not
>> after fetching every single remote.
>>
>> Passing some env variable (similar to the config we pass via the env) to
>> subprograms to make them avoid "git gc --auto" so the main process can
>> do it would probably be the most simple solution.
>
> Yes, I agree that's poor. Ideally there would be a command-line option
> to tell the sub-fetches not to run auto-gc. It could be done with:
>
>   git -c gc.auto=0 fetch --append ...
>
> Or we could even take the "--append" as a hint not to run auto-gc.
>
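
To make that concrete, here's a minimal sketch of a wrapper that applies
the "-c gc.auto=0" suggestion to every sub-fetch and then runs a single
"git gc --auto" at the end. The function name fetch_all_then_gc is made
up for illustration, and this is only a shell-level stand-in, not how
"git fetch --all" itself would be fixed:

```shell
#!/bin/sh
# Hypothetical helper (not part of git): fetch every remote with auto-gc
# suppressed, then run a single "git gc --auto" once all fetches are done.
fetch_all_then_gc() {
    status=0
    for remote in $(git remote); do
        # -c gc.auto=0 disables the automatic gc for this invocation only.
        git -c gc.auto=0 fetch "$remote" || status=1
    done
    # One gc at the very end, after the fetches no longer need ref locks.
    git gc --auto
    # Report failure if any of the fetches failed.
    return "$status"
}
```

The same "-c gc.auto=0" could presumably be added to the pfetch alias
(parallel 'git -c gc.auto=0 fetch {}' ::: $(git remote)) and followed by
one "git gc --auto", though that only covers a gc started by the fetches
themselves.
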
>> The more general case (such as with my parallel invocation) is harder to
>> solve.
>
> Yes, I don't think it can be solved. The most general case is two totally
> unrelated processes which know nothing about each other.
>
>> Maybe "git gc --auto" should have a heuristic so it checks whether
>> there's been recent activity on the repo, and waits until there's been
>> say 60 seconds of no activity, or alternatively if it's waited 600
>> seconds and hasn't run gc yet.
>
> That sounds complicated.
>
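
For what it's worth, the decision itself is small. Here's a rough sketch
of the rule I described, assuming something else supplies the two
timings; how "activity" would actually be measured is the hard part and
is left open. The name should_run_gc and both arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical decision function for the heuristic quoted above.
#   $1: seconds since the last activity in the repository
#   $2: seconds this pending gc has already been deferred
# Succeeds (exit status 0) when gc should run now.
should_run_gc() {
    idle=$1
    waited=$2
    quiet_period=60   # run once the repo has been idle this long
    max_wait=600      # or once gc has been put off this long
    [ "$idle" -ge "$quiet_period" ] || [ "$waited" -ge "$max_wait" ]
}
```

The 600 second cap is what keeps a constantly busy repo from postponing
gc forever.
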
>> Ideally a "real" invocation like git-fetch would have a way to simply
>> steal any *.lock a background "git gc --auto" creates, aborting the gc
>> but allowing the "real" invocation to proceed. But that sounds even
>> trickier to implement, and might without an extra heuristic on top
>> postpone gc indefinitely.
>
> The locks are generally due to ref-packing and reflog expiration.  I
> think in the long run, it would be nice to move to a ref store that
> didn't need packing, and that could do reflog expiration more
> atomically.
>
> I think the way "reflog expire" is done holds the locks for a lot longer
> than is strictly necessary, too (it actually computes reachability for
> --expire-unreachable on the fly while holding some locks).
>
> -Peff
