From: Jeff King <peff@peff.net>
To: Greg Troxel <gdt@ir.bbn.com>
Cc: git@vger.kernel.org
Subject: Re: repo consistency under crashes and power failures?
Date: Fri, 26 Jul 2013 23:10:17 -0400
Message-ID: <20130727031017.GA20207@sigill.intra.peff.net>
In-Reply-To: <rmiy597iujc.fsf@fnord.ir.bbn.com>

On Mon, Jul 15, 2013 at 01:48:23PM -0400, Greg Troxel wrote:

> I am curious if anyone has actual experiences to share, either
> 
>   a report of corruption after a crash (where corruption means that
>   either 1) git fsck reports worse than dangling objects or 2) some ref
>   did not either point to the old place or the new place)
> 
>   experiments intended to provoke corruption, like dropping power during
>   pushes, or forced panics in the kernel due to timers, etc.

I have quite a bit of experience with this, as I investigate all repo
corruption that we see on github.com, and have run experiments to try to
reproduce such corruption.

Our backend git systems are ext3 with journaling and data=ordered. We
run that on top of drbd, with two redundant machines sharing the block
device. If one dies, we fail over to the spare. Writes to the block
device are not considered committed until they are written to both
machines.

Git's scheme is to write objects (both loose and when receiving packs
over the wire) via tempfile, with an atomic link-into-place after close.
We do not fsync loose object files by default, but we do fsync packs.
The missing fsync shouldn't matter as long as your filesystem writes
data before the metadata that refers to it (if it doesn't, you probably
want to turn on core.fsyncobjectfiles). So for our data=ordered
filesystems, that's fine.
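
To make the tempfile-and-link scheme concrete, here is a minimal Python
sketch. It is not git's actual code (which is C); the function name, the
flat `objects_dir` layout, and the `do_fsync` parameter are all made up
for illustration, with `do_fsync` standing in loosely for what object
fsyncing would add.

```python
import os
import tempfile


def write_object_atomically(objects_dir, name, data, do_fsync=False):
    """Write data to objects_dir/name via a tempfile and a hard link.

    Hypothetical helper for illustration only; not git's real code.
    """
    final_path = os.path.join(objects_dir, name)
    # Create the tempfile in the destination directory so the later link
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(prefix="tmp_obj_", dir=objects_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            if do_fsync:
                # Optional: push the data to disk before it becomes visible.
                f.flush()
                os.fsync(f.fileno())
        # The file is closed at this point; now link it into place.
        try:
            os.link(tmp_path, final_path)
        except FileExistsError:
            # In this sketch an existing object is simply left alone.
            pass
    finally:
        # Drop the temporary name; the linked name (if any) remains.
        os.unlink(tmp_path)
    return final_path
```

A reader of `objects_dir` never sees a half-written file under the final
name: the name either does not exist yet or refers to a fully written
and closed file. Whether the contents survive a crash without the fsync
is exactly the data-versus-metadata ordering question above.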

Ref writes have a similar fsync situation to loose object files. We
write the new ref to a tempfile, close, and then rename into place. If
the data and metadata writes are out of order, one could have problems
(but again, not a problem with data=ordered).
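
The write, close, rename sequence for a ref can be sketched the same
way. This is an illustrative Python version only, not git's
implementation; `update_ref_atomically` and its arguments are
hypothetical names, and the sketch deliberately does no fsync, to match
the situation described above.

```python
import os
import tempfile


def update_ref_atomically(ref_path, new_value):
    """Replace the file at ref_path with new_value via tempfile + rename.

    Hypothetical helper for illustration only; not git's real code.
    """
    ref_dir = os.path.dirname(ref_path) or "."
    # The tempfile lives next to the ref so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix="tmp_ref_", dir=ref_dir)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_value + "\n")
        # File is closed; rename over the old ref in a single step.
        os.replace(tmp_path, ref_path)
    except BaseException:
        # On any failure, remove the tempfile and leave the old ref as is.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Anyone reading `ref_path` sees either the old value or the new one,
never a partial write. After a crash on a filesystem that reorders data
and metadata, though, the renamed file could come back empty or
truncated, which is the problem case mentioned above.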

Most of the corruption we have seen at GitHub has been one of:

  1. Buggy non-core-git implementations that do not properly use
     tempfiles to create objects (Grit used to have this problem, but it
     is now fixed).

  2. Race conditions in examining ref state that can cause refs to be
     missed when determining reachability (thus you might prune objects
     that should be left). The worst of these is fixed in the current
     "master" and will be part of git v1.8.4. There are still ways that
     we can prune too much, but they are reasonably unlikely unless you
     are pruning constantly.

We did once experience some lost objects after a server failover. After
much experimentation, we finally found out that the machine in question
had a RAID card with bad memory. After a power failure, the card would
drop some writes that it had already claimed to have committed (so even
fsync did not help).

So for ordered data and metadata writes, in my experience git is quite
solid against power failures and crashes. For systems without that
guarantee, you should turn on core.fsyncobjectfiles, but I suspect you
could still see some ref corruption (and possibly index corruption,
too, as git does not fsync those either).

-Peff
