public inbox for linux-btrfs@vger.kernel.org
From: Chris Mason <clm@fb.com>
To: "ronniesahlberg@gmail.com" <ronniesahlberg@gmail.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"1i5t5.duncan@cox.net" <1i5t5.duncan@cox.net>
Subject: Re: Scrubbing with BTRFS Raid 5
Date: Wed, 22 Jan 2014 21:16:09 +0000
Message-ID: <1390425459.1198.51.camel@ret.masoncoding.com>
In-Reply-To: <CAN05THQP_BZR5r6DC8NVk0XLv8BcoZjuYAuMU=+Z6fsdBBHcyA@mail.gmail.com>

On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
> On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason <clm@fb.com> wrote:
> > On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> >> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
> >>
> >> > Thanks for all the info guys.
> >> >
> >> > I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
> >> > attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
> >> > them.
> >> >
> >> > I copied some data (from /dev/urandom) into two test files and got their
> >> > MD5 sums and saved them to a text file.
> >> >
> >> > I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
> >> > attached to /dev/loop4.
> >> >
> >> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
> >> > added /dev/loop4 to the volume and then deleted the missing device and
> >> > it rebalanced. I had data spread out on all three devices now. MD5 sums
> >> > unchanged on test files.
> >> >
> >> > This, to me, implies BTRFS RAID 5 is working quite well and I can, in
> >> > fact, replace a dead drive.
> >> >
> >> > Am I missing something?
> >>
> >> What you're missing is that device death and replacement rarely happens
> >> as neatly as your test (clean unmounts and all, no middle-of-process
> >> power-loss, etc).  You tested best-case, not real-life or worst-case.
> >>
> >> Try that again, setting up the raid5, setting up a big write to it,
> >> disconnect one device in the middle of that write (I'm not sure if just
> >> dropping the loop works or if the kernel gracefully shuts down the loop
> >> device), then unplugging the system without unmounting... and /then/ see
> >> what sense btrfs can make of the resulting mess.  In theory, with an
> >> atomic write btree filesystem such as btrfs, even that should work fine,
> >> minus perhaps the last few seconds of file-write activity, but the
> >> filesystem should remain consistent on degraded remount and device add,
> >> device remove, and rebalance, even if another power-pull happens in the
> >> middle of /that/.
> >>
> >> But given btrfs' raid5 incompleteness, I don't expect that will work.
> >>
> >
> > raid5/6 deals with IO errors from one or two drives, and it is able to
> > reconstruct the parity from the remaining drives and give you good data.
> >
> > If we hit a crc error, the raid5/6 code will try a parity reconstruction
> > to make good data, and if we find good data from the other copy, it'll
> > return that up to userland.
> >
> > In other words, for those cases it works just like raid1/10.  What it
> > won't do (yet) is write that good data back to the storage.  It'll stay
> > bad until you remove the device or run balance to rewrite everything.
> >
> > Balance will reconstruct parity to get good data as it balances.  This
> > isn't as useful as scrub, but that work is coming.
> >
> 
> That is awesome!
> 
> What about online conversion from non-raid5/6 to raid5/6? What is the
> status of that code? For example, what happens if there is a failure
> or a reboot during the conversion?

The conversion code uses balance, so that works normally.  If there is a
failure during the conversion, you'll end up with some things at raid5/6
and some things at whatever other level you used.

The data will still be there, but you are more prone to enospc
problems ;)
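
As a rough illustration of the point above, here is a toy model of a
chunk-by-chunk conversion that gets interrupted. It is only a sketch, not
btrfs code: the Chunk class, the convert() and profiles() helpers, and the
fail_after parameter are all hypothetical names made up for this example,
and the second pass that finishes the job is an assumption of the model.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk that has exactly one RAID profile."""
    ident: int
    profile: str


def convert(chunks, target, fail_after=None):
    """Rewrite chunks one at a time into `target`.

    If `fail_after` is set, stop after that many rewrites to mimic a
    failure or reboot in the middle of the conversion.
    Returns the number of chunks rewritten in this pass.
    """
    done = 0
    for chunk in chunks:
        if chunk.profile == target:
            continue  # already at the target profile, nothing to rewrite
        if fail_after is not None and done >= fail_after:
            break  # simulated interruption
        chunk.profile = target
        done += 1
    return done


def profiles(chunks):
    """Count how many chunks currently use each profile."""
    counts = {}
    for chunk in chunks:
        counts[chunk.profile] = counts.get(chunk.profile, 0) + 1
    return counts


chunks = [Chunk(i, "raid1") for i in range(6)]
convert(chunks, "raid5", fail_after=2)  # interrupted after two chunks
print(profiles(chunks))                 # a mix of raid5 and raid1 chunks
convert(chunks, "raid5")                # a later pass picks up the rest
print(profiles(chunks))
```

In this model every chunk is always readable at whichever profile it
currently has, which matches "the data will still be there"; the model
does not try to represent the enospc side of things.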

-chris
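
For readers following the quoted discussion of crc errors, the
"reconstruct on read, but don't write back" behaviour can be sketched with
plain XOR parity. This is a minimal sketch under stated assumptions, not
the kernel implementation: it uses zlib.crc32 as a stand-in checksum
(btrfs uses crc32c by default), a single parity block as in raid5, and the
names xor_blocks() and read_block() are hypothetical.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_block(stripe, parity, crcs, idx):
    """Return data block `idx`, rebuilding it from parity on a crc mismatch.

    The stored copy in `stripe` is never modified, mirroring the
    "won't write that good data back (yet)" behaviour described above.
    """
    data = stripe[idx]
    if zlib.crc32(data) == crcs[idx]:
        return data  # checksum matches, use the block as stored
    # XOR of all other data blocks plus parity yields the missing block.
    others = [blk for i, blk in enumerate(stripe) if i != idx]
    rebuilt = xor_blocks(others + [parity])
    if zlib.crc32(rebuilt) != crcs[idx]:
        raise IOError("block %d could not be reconstructed" % idx)
    return rebuilt


stripe = [b"AAAA", b"BBBB"]                 # two data blocks
parity = xor_blocks(stripe)                 # one parity block
crcs = [zlib.crc32(blk) for blk in stripe]  # checksums of the good data
stripe[1] = b"BxBB"                         # simulate silent corruption
print(read_block(stripe, parity, crcs, 1))  # b'BBBB'
print(stripe[1])                            # b'BxBB', still bad on "disk"
```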

