git.vger.kernel.org archive mirror
* does a successful 'git gc' imply 'git fsck'
@ 2012-12-02  2:31 Sitaram Chamarty
  2012-12-02  4:28 ` Shawn Pearce
  2012-12-02  9:31 ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Sitaram Chamarty @ 2012-12-02  2:31 UTC
  To: Git Mailing List

Hi,

Background: I have a situation where I have to fix up a few hundred
repos in terms of 'git gc' (the auto gc seems to have failed in many
cases; they have far more than 6700 loose objects).  I also found some
corrupted objects in some cases that prevent the gc from completing.

I am running "git gc" followed by "git fsck".  The majority of the
repos I have worked through so far appear to be fine, but in the
larger repos (upwards of 2-3 GB) the git fsck is taking almost 5 times
longer than the 'gc'.

If I could assume that a successful 'git gc' means an fsck is not
needed, I'd save a lot of time.  Hence my question.
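
The workflow described above ("git gc", then optionally "git fsck", for
each repository) could be sketched roughly as below. This is only an
illustration: the function name maintain_repo, the deep flag and the
injectable run parameter are assumptions made for the example, not
anything from this thread. Only the git subcommands themselves are real.

```python
import subprocess
from pathlib import Path
from typing import Any, Callable


def maintain_repo(repo: Path, deep: bool,
                  run: Callable[..., Any] = subprocess.run) -> list[str]:
    """Run 'git gc' and, if deep is set, 'git fsck' in one repository.

    Returns the names of the steps that exited non-zero (hypothetical
    helper; 'run' is injectable so the logic can be tried without git).
    """
    steps = [["git", "-C", str(repo), "gc", "--quiet"]]
    if deep:
        # The slower check; skipped when the lighter gc pass is enough.
        steps.append(["git", "-C", str(repo), "fsck"])

    failed = []
    for cmd in steps:
        result = run(cmd)
        if result.returncode != 0:
            failed.append(cmd[3])  # "gc" or "fsck"
            break  # do not fsck a repo whose gc already failed
    return failed
```

Calling maintain_repo(path, deep=False) for most repositories and
deep=True only for the ones that matter most is one way to trade time
against thoroughness.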

-- 
Sitaram


* Re: does a successful 'git gc' imply 'git fsck'
  2012-12-02  2:31 does a successful 'git gc' imply 'git fsck' Sitaram Chamarty
@ 2012-12-02  4:28 ` Shawn Pearce
  2012-12-02  8:46   ` Sitaram Chamarty
  2012-12-02  9:31 ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Shawn Pearce @ 2012-12-02  4:28 UTC
  To: Sitaram Chamarty; +Cc: Git Mailing List

On Sat, Dec 1, 2012 at 6:31 PM, Sitaram Chamarty <sitaramc@gmail.com> wrote:
> Background: I have a situation where I have to fix up a few hundred
> repos in terms of 'git gc' (the auto gc seems to have failed in many
> cases; they have far more than 6700 loose objects).  I also found some
> corrupted objects in some cases that prevent the gc from completing.
>
> I am running "git gc" followed by "git fsck".  The majority of the
> repos I have worked through so far appear to be fine, but in the
> larger repos (upwards of 2-3 GB) the git fsck is taking almost 5 times
> longer than the 'gc'.
>
> If I could assume that a successful 'git gc' means an fsck is not
> needed, I'd save a lot of time.  Hence my question.

Not really. For example, fsck verifies that every blob, when
decompressed and fully inflated, matches its SHA-1. gc only checks
connectivity of the commit and tree graph by making sure every object
was accounted for. When creating the output pack, it only verifies
that a CRC-32 was correct when copying the bits from the source to the
destination; it does not verify that the data decompresses and matches
the SHA-1 it should match.

So it depends on what level of check you need to feel safe.
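
The difference between the two levels of checking can be sketched as
below. This is a minimal illustration, not git's actual code: the
function names are made up for the example, and it handles only a
single zlib-compressed object held in memory. It relies on the fact
that an object's name is the SHA-1 of "<type> <size>", a NUL byte, and
the content.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + content, i.e. the object name."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_by_hash(compressed: bytes, expected_id: str) -> bool:
    """fsck-style check: inflate the bytes and compare the SHA-1."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False  # the stream does not even inflate
    return hashlib.sha1(raw).hexdigest() == expected_id


def verify_by_crc(compressed: bytes, expected_crc: int) -> bool:
    """Copy-style check: checksum the compressed bytes without inflating."""
    return zlib.crc32(compressed) == expected_crc
```

An object that was already wrong when it was compressed passes
verify_by_crc (the compressed bytes were copied faithfully) but fails
verify_by_hash, which is the gap described above.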


* Re: does a successful 'git gc' imply 'git fsck'
  2012-12-02  4:28 ` Shawn Pearce
@ 2012-12-02  8:46   ` Sitaram Chamarty
  0 siblings, 0 replies; 7+ messages in thread
From: Sitaram Chamarty @ 2012-12-02  8:46 UTC
  To: Shawn Pearce; +Cc: Git Mailing List

On Sun, Dec 2, 2012 at 9:58 AM, Shawn Pearce <spearce@spearce.org> wrote:
> On Sat, Dec 1, 2012 at 6:31 PM, Sitaram Chamarty <sitaramc@gmail.com> wrote:
>> Background: I have a situation where I have to fix up a few hundred
>> repos in terms of 'git gc' (the auto gc seems to have failed in many
>> cases; they have far more than 6700 loose objects).  I also found some
>> corrupted objects in some cases that prevent the gc from completing.
>>
>> I am running "git gc" followed by "git fsck".  The majority of the
>> repos I have worked through so far appear to be fine, but in the
>> larger repos (upwards of 2-3 GB) the git fsck is taking almost 5 times
>> longer than the 'gc'.
>>
>> If I could assume that a successful 'git gc' means an fsck is not
>> needed, I'd save a lot of time.  Hence my question.
>
> Not really. For example fsck verifies that every blob when
> decompressed and fully inflated matches its SHA-1. gc only checks

OK that makes sense.  After I posted I happened to check using strace
and kinda guessed this from what I saw, but it's nice to have
confirmation.

> connectivity of the commit and tree graph by making sure every object
> was accounted for. But when creating the output pack it only verifies
> a CRC-32 was correct when copying the bits from the source to the
> destination, it does not verify that the data decompresses and matches
> the SHA-1 it should match.
>
> So it depends on what level of check you need to feel safe.

Yup; thanks.

All the repos my internal client manages are mirrored in multiple
places, and they set (or were at least told to set, heh!)
receive.fsckObjects so the lesser check is fine in most cases.
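
Whether a given mirror actually has receive.fsckObjects turned on could
be checked with something like the sketch below. The helper name and
the injectable run parameter are assumptions made for the example. It
looks only at the receive.fsckObjects key, so a repository that relies
on transfer.fsckObjects as a fallback would be reported as not enabled.

```python
import subprocess
from pathlib import Path
from typing import Any, Callable


def fsck_objects_enabled(repo: Path,
                         run: Callable[..., Any] = subprocess.run) -> bool:
    """Return True if receive.fsckObjects is set to a true value.

    Hypothetical helper: 'git config --bool' normalises the value to
    'true' or 'false' and exits non-zero when the key is not set.
    """
    result = run(
        ["git", "-C", str(repo), "config", "--bool", "receive.fsckObjects"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"
```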

-- 
Sitaram


* Re: does a successful 'git gc' imply 'git fsck'
  2012-12-02  2:31 does a successful 'git gc' imply 'git fsck' Sitaram Chamarty
  2012-12-02  4:28 ` Shawn Pearce
@ 2012-12-02  9:31 ` Junio C Hamano
  2012-12-03 13:14   ` Sitaram Chamarty
  2012-12-03 13:46   ` Matthieu Moy
  1 sibling, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2012-12-02  9:31 UTC
  To: Sitaram Chamarty; +Cc: Git Mailing List

Sitaram Chamarty <sitaramc@gmail.com> writes:

> If I could assume that a successful 'git gc' means an fsck is not
> needed, I'd save a lot of time.  Hence my question.

When it does "repack -a", it at least scans the whole history so you
would be sure that all the commits and trees are readable for the
purpose of enumerating the objects referred to by them (and a bit flip
in them will likely be noticed by zlib inflation).

But a "gc" does not necessarily run "repack -a" when it does not see
too many pack files, so it can end up scanning only the surface of
the history to collect the recently created loose objects into a
pack, and stop its traversal without going into existing packfiles.


* Re: does a successful 'git gc' imply 'git fsck'
  2012-12-02  9:31 ` Junio C Hamano
@ 2012-12-03 13:14   ` Sitaram Chamarty
  2012-12-03 13:46   ` Matthieu Moy
  1 sibling, 0 replies; 7+ messages in thread
From: Sitaram Chamarty @ 2012-12-03 13:14 UTC
  To: Junio C Hamano; +Cc: Git Mailing List

On Sun, Dec 2, 2012 at 3:01 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Sitaram Chamarty <sitaramc@gmail.com> writes:
>
>> If I could assume that a successful 'git gc' means an fsck is not
>> needed, I'd save a lot of time.  Hence my question.
>
> When it does "repack -a", it at least scans the whole history so you
> would be sure that all the commits and trees are readable for the
> purpose of enumerating the objects referred by them (and a bit flip
> in them will likely be noticed by zlib inflation).
>
> But a "gc" does not necessarily run "repack -a" when it does not see
> too many pack files, so it can end up scanning only the surface of
> the history to collect the recently created loose objects into a
> pack, and stop its traversal without going into existing packfiles.

Thanks; I'd missed this nuance as well...

-- 
Sitaram


* Re: does a successful 'git gc' imply 'git fsck'
  2012-12-02  9:31 ` Junio C Hamano
  2012-12-03 13:14   ` Sitaram Chamarty
@ 2012-12-03 13:46   ` Matthieu Moy
  2012-12-03 14:06     ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Matthieu Moy @ 2012-12-03 13:46 UTC
  To: Junio C Hamano; +Cc: Sitaram Chamarty, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> But a "gc" does not necessarily run "repack -a" when it does not see
> too many pack files, so it can end up scanning only the surface of
> the history to collect the recently created loose objects into a
> pack, and stop its traversal without going into existing packfiles.

Isn't that the behavior of "git gc --auto", not plain "git gc" ?

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: does a successful 'git gc' imply 'git fsck'
  2012-12-03 13:46   ` Matthieu Moy
@ 2012-12-03 14:06     ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2012-12-03 14:06 UTC
  To: Matthieu Moy; +Cc: Sitaram Chamarty, Git Mailing List

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> But a "gc" does not necessarily run "repack -a" when it does not see
>> too many pack files, so it can end up scanning only the surface of
>> the history to collect the recently created loose objects into a
>> pack, and stop its traversal without going into existing packfiles.
>
> Isn't that the behavior of "git gc --auto", not plain "git gc" ?

True; I missed that Sitaram was running "gc" manually.
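
The "gc --auto" decision discussed in this sub-thread can be sketched
roughly as below. This is a simplified model, not git's code: the
function name and return strings are made up for the example, the
defaults are the documented values of gc.auto (6700) and
gc.autoPackLimit (50), and real git estimates the loose-object count
rather than counting exactly. A manually run "git gc" does not go
through this decision and repacks everything.

```python
def auto_gc_plan(loose_objects: int, pack_files: int,
                 gc_auto: int = 6700, gc_auto_pack_limit: int = 50) -> str:
    """Approximate what 'git gc --auto' would decide to do."""
    if gc_auto <= 0:
        return "nothing"  # gc.auto = 0 disables automatic gc
    if gc_auto_pack_limit > 0 and pack_files > gc_auto_pack_limit:
        # Too many packs: consolidate, which walks the whole history.
        return "repack everything"
    if loose_objects > gc_auto:
        # Only the surface: collect loose objects into a new pack.
        return "repack loose objects only"
    return "nothing"
```

With 10000 loose objects and 3 packs this returns "repack loose objects
only", which is the case where existing packfiles are never read.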

