git.vger.kernel.org archive mirror
* Missing Refs after Garbage Collection
@ 2012-12-22  1:41 Earl Gresh
  2012-12-22 22:26 ` Dmitry Potapov
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Earl Gresh @ 2012-12-22  1:41 UTC (permalink / raw)
  To: git

Hi-

I have observed that after running GC, one particular git repository ended up with some missing refs in the refs/changes/* namespace that Gerrit uses for storing patch sets. The refs were valid and should not have been pruned. Concerned about losing data, I have left GC enabled but turned ref packing off. Now the number of refs has grown to the point that it's causing performance problems when cloning the project.

Is anyone familiar with git gc deleting valid references? I'm running git version 1.7.8. Have there been any patches in later git releases that might address this issue ( if it is a git problem )?

Thanks

* Re: Missing Refs after Garbage Collection
From: Dmitry Potapov @ 2012-12-22 22:26 UTC (permalink / raw)
  To: Earl Gresh; +Cc: git

Hi,

On Sat, Dec 22, 2012 at 5:41 AM, Earl Gresh <egresh@codeaurora.org> wrote:
>
> Is anyone familiar with git gc deleting valid references? I'm running
> git version 1.7.8. Have there been any patches in later git releases
> that might address this issue ( if it is a git problem )?

I have not seen any relevant changes in git. I have looked at the code,
and what git-gc runs is "git pack-refs --all --prune", which is very
careful in packing and fsyncing the new file with all packed references
before deleting anything. Only those references that were packed can be
deleted. Also, it does not matter whether a reference is valid or not,
or whether it is stored in refs/changes or in some other place, like
refs/heads. So if references were really lost as you described, I think
other people would notice that by now.

The only plausible explanation that comes to my mind now is that file
creation using O_EXCL is not atomic on your system. In that case the
lock would not work, and one process could overwrite packed references
created by another.
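
The locking scheme referred to above can be sketched as follows. This
is a minimal illustration in Python, not Git's actual code: the helper
names acquire_lock() and commit() are hypothetical, and the
"<file>.lock" naming plus the write, fsync, rename order are
assumptions made for the sketch. The point is that O_CREAT | O_EXCL
lets only one process create the lock file, so a second writer has to
back off until the first one finishes.

```python
import os
import tempfile

def acquire_lock(path):
    """Try to create path + '.lock' exclusively.

    Returns an open file descriptor, or None if another holder exists.
    (Hypothetical helper; the '.lock' suffix is an assumption.)
    """
    try:
        return os.open(path + ".lock",
                       os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return None

def commit(fd, path, data):
    """Write the new contents, flush them to disk, then rename into place."""
    os.write(fd, data)
    os.fsync(fd)                      # data is durable before anything is pruned
    os.close(fd)
    os.replace(path + ".lock", path)  # the rename also releases the lock

with tempfile.TemporaryDirectory() as repo:
    packed_refs = os.path.join(repo, "packed-refs")

    first = acquire_lock(packed_refs)
    second = acquire_lock(packed_refs)       # lock file already exists
    first_acquired = first is not None
    second_acquired = second is not None

    # Made-up object id, for illustration only.
    commit(first, packed_refs, b"1" * 40 + b" refs/heads/master\n")

    third = acquire_lock(packed_refs)        # free again after the rename
    third_acquired = third is not None
    os.close(third)

    with open(packed_refs, "rb") as f:
        contents = f.read()
```

If exclusive creation were not atomic (the failure mode suspected
above), both acquire_lock() calls could succeed, and the later
commit() would silently replace the earlier writer's file.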


Dmitry

* Re: Missing Refs after Garbage Collection
From: Jeff King @ 2012-12-23  1:04 UTC (permalink / raw)
  To: Earl Gresh; +Cc: git

On Fri, Dec 21, 2012 at 05:41:43PM -0800, Earl Gresh wrote:

> I have observed that after running GC, one particular git repository
> ended up with some missing refs in the refs/changes/* namespace that
> Gerrit uses for storing patch sets. The refs were valid and should not
> have been pruned. Concerned about losing data, GC is still enabled
> but ref packing is turned off. Now the number of refs has grown to the
> point that it's causing performance problems when cloning the project.
> 
> Is anyone familiar with git gc deleting valid references? I'm running
> git version 1.7.8. Have there been any patches in later git releases
> that might address this issue ( if it is a git problem )?

I have never seen deletion, but I did recently find a race condition
with ref packing that caused rewinds, where:

  1. Two processes simultaneously repack the refs.

  2. At least one process is using an "old" version of the packed-refs
     file. That is, it cached the packed refs list earlier in the
     process and is now rewriting it based on that cached notion.

  3. The first process takes the lock, packs refs, drops the
     lock, and then deletes the loose versions. The simultaneous packer
     then takes the lock, overwrites the packed-refs file with a stale
     copy from its memory, and then releases the lock. We're left with
     the stale copy in packed-refs, and deleted loose refs.

In my case, it looked like a rewind, because the stale, memory-cached
refs had the old version. But if you have a ref which was not previously
packed, it would appear to have been deleted.
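
The sequence above can be replayed with a toy in-memory model. This is
only a sketch: the dictionaries stand in for the loose ref files and
the packed-refs file, the function names are hypothetical, the ref
values are made up, and the locks are omitted because in this scenario
each writer holds the lock correctly and the damage comes purely from
the stale cache.

```python
def pack_refs(repo):
    """Fresh packer: read packed-refs, fold in loose refs, prune loose."""
    packed = dict(repo["packed"])
    packed.update(repo["loose"])     # loose values take precedence
    repo["packed"] = packed          # write packed-refs, drop the lock
    repo["loose"] = {}               # delete the now-packed loose refs

def rewrite_packed_from_cache(repo, cached_packed, deleted_ref):
    """Stale writer: delete one ref by rewriting packed-refs from memory."""
    repo["packed"] = {name: sha for name, sha in cached_packed.items()
                      if name != deleted_ref}
    repo["loose"].pop(deleted_ref, None)

def resolve(repo, name):
    """Loose refs win over packed ones; None means the ref is missing."""
    if name in repo["loose"]:
        return repo["loose"][name]
    return repo["packed"].get(name)

repo = {
    "packed": {"refs/heads/master": "old", "refs/heads/topic": "t1"},
    "loose": {"refs/heads/master": "new", "refs/changes/01/1/1": "c1"},
}
watched = ("refs/heads/master", "refs/changes/01/1/1")

# Process B reads and caches packed-refs early in its life.
stale_cache = dict(repo["packed"])
before = {name: resolve(repo, name) for name in watched}

# Process A packs everything and prunes the loose refs.
pack_refs(repo)

# Process B now deletes an unrelated ref, using its stale cache.
rewrite_packed_from_cache(repo, stale_cache, "refs/heads/topic")
after = {name: resolve(repo, name) for name in watched}

print(before)
print(after)
```

In this model refs/heads/master falls back from "new" to "old" (the
rewind), while refs/changes/01/1/1, which was never in the stale
cache, resolves to nothing at all.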

The tricky thing about triggering this race is that step (2) needs a
process which has previously read and cached the packed-refs, and then
decided to pack the refs. The "git pack-refs" command does not do this,
because it starts, packs the refs, and exits. But processes which delete
a ref need to rewrite the packed-refs file (omitting the deleted ref),
and depending on the process, may have previously read and cached the
packed refs file. The obvious candidate is "receive-pack".

So this may be your culprit if:

  1. This is a repo people are pushing into via C git.

  2. You simultaneously run "git pack-refs" (or "git gc") while people
     may be pushing.

You mentioned Gerrit, so I wonder if people are actually pushing via C
git (I thought it used JGit entirely). Or perhaps JGit has the same bug.
My fix (which is not yet released in any git version) is here:

  http://article.gmane.org/gmane.comp.version-control.git/211956

-Peff

* Re: Missing Refs after Garbage Collection
From: Martin Fick @ 2013-01-02 22:43 UTC (permalink / raw)
  To: Earl Gresh; +Cc: git

On Friday, December 21, 2012 06:41:43 pm Earl Gresh wrote:
> Hi-
> 
> I have observed that after running GC, one particular git
> repository ended up with some missing refs in the
> refs/changes/* namespace that Gerrit uses for storing
> patch sets. The refs were valid and should not have been
> pruned. Concerned about losing data, GC is still enabled
> but ref packing is turned off. Now the number of refs has
> grown to the point that it's causing performance problems
> when cloning the project.
> 
> Is anyone familiar with git gc deleting valid references?
> I'm running git version 1.7.8. Have there been any
> patches in later git releases that might address this
> issue ( if it is a git problem )?


When Earl was testing ref-packing a few months ago, the 
refs in question were reported invalid by git show-ref:

  git show-ref 2>&1 | grep refs/changes/45/129345/1
  error: refs/changes/45/129345/1 does not point to a valid object!

But we could trace the refs manually to git show-object just 
fine.  But oddly enough, when using git show-ref with the -v, 
the error above would not be spit out.

So, my guess is that something during the repack was 
following the same code path as git show-ref (without the 
-v), determined that the ref was invalid, and therefore was 
not able to add it to the new packfile, yet perhaps it was 
still being added to the prune list and thus getting pruned?  
Is this possible somehow?  

Looking at handle_one_ref() I can't see how.  The fprintf() 
happens before the ref is added to the prune list and is 
unconditional.  I am grasping here, but what if the sha1 
passed into handle_one_ref() somehow gets set incorrectly to 
000...?  Would it then basically get written to the 
packed-refs file as 000... (deleted), but then still get 
added to the prune list?  You might say "but then it wouldn't 
get pruned since the loose ref doesn't match 000...", but if 
the logic which checks this matching makes the same error 
reading the sha1 and thinks it is 000..., might it then get 
pruned?
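
The scenario being guessed at can be laid out as a toy model. This is
not Git's handle_one_ref() or its prune logic: pack_and_prune(), the
two reader functions, and the all-"a" object id are hypothetical, and
the "prune only if the loose value matches what was packed" rule is
the assumption stated in the paragraph above. The sketch only shows
that a misread on the packing side alone is harmless, while the same
misread on both sides loses the ref.

```python
NULL_SHA1 = "0" * 40

def read_ok(loose, name):
    """Reader that returns the real value of the loose ref."""
    return loose[name]

def read_broken(loose, name):
    """Reader that wrongly reports the all-zero id (the supposed bug)."""
    return NULL_SHA1

def pack_and_prune(loose, read_for_pack, read_for_check):
    """Pack every loose ref, pruning it only if the re-read value matches."""
    packed = {}
    for name in list(loose):
        value = read_for_pack(loose, name)
        packed[name] = value                     # written unconditionally
        if read_for_check(loose, name) == value:
            del loose[name]                      # prune the loose copy
    return packed

def surviving_value(loose, packed, name):
    """What a later reader would see; an all-zero entry counts as deleted."""
    value = loose.get(name, packed.get(name))
    return None if value == NULL_SHA1 else value

REF = "refs/changes/45/129345/1"
SHA = "a" * 40   # made-up object id

results = {}
for label, pack_reader, check_reader in (
    ("healthy", read_ok, read_ok),
    ("pack side misreads", read_broken, read_ok),
    ("both sides misread", read_broken, read_broken),
):
    loose = {REF: SHA}
    packed = pack_and_prune(loose, pack_reader, check_reader)
    results[label] = surviving_value(loose, packed, REF)

print(results)
```

Only the "both sides misread" case ends with the ref gone, which is
the combination the question above is asking about.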


-Martin
