From: "Patrick H." <linux-raid@feystorm.net>
To: linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
Date: Sun, 02 Jan 2011 22:05:06 -0700
Message-ID: <4D215902.9010308@feystorm.net>
In-Reply-To: <20110103155630.565341d0@notabene.brown>
Sent: Sun Jan 02 2011 21:56:30 GMT-0700 (Mountain Standard Time)
From: Neil Brown <neilb@suse.de>
To: Patrick H. <linux-raid@feystorm.net>, linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
> On Sun, 02 Jan 2011 21:06:52 -0700 "Patrick H." <linux-raid@feystorm.net>
> wrote:
>
>
>
>> That makes sense assuming that MD acknowledges the write once the data is
>> written to the data disks but not necessarily the parity disk, which is
>> what I gather you were saying is what happens. Is there any option that
>> can change the behavior so that md won't ack the write until it's been
>> committed to all disks (I'm guessing no since you didn't mention it)?
>> Also does raid6 suffer this problem? Is it smart enough to use both
>> parity disks when calculating replacement, or will it just use one?
>>
>>
>
> md/raid5 doesn't acknowledge the write until both the data and the parity
> have been written. But that doesn't make any difference.
> If you schedule a number of interdependent writes (data and parity) and then
> allow some to complete but not all, then you have inconsistency.
> Recovery from losing a single device requires consistency of parity and data.
>
> RAID6 suffers equally from this problem. Even if it used both parity disks
> to recover (which it doesn't) how would that help? It would then have two
> possible values for the data and no way to know which was correct, and every
> possibility that both are incorrect. This would happen if a single data
> block was successfully written, but neither parity block was.
>
> The only way you can avoid this 'write hole' is by journalling in multiples
> of whole stripes. No current filesystems that I know of can do this as they
> journal in blocks, and the maximum block size is less than the minimum stripe
> size. So you would need journalling integrated with md/raid, or you would
> need a filesystem which was designed to understand this problem and write
> whole stripes at a time, always to an area of the device which did not
> contain live data.
>
> NeilBrown
Ok, thanks for the info.
I think I'll solve it by creating two dedicated hosts to run the array
that don't export any disks themselves. This way, if a master dies, all
the RAID disks are still there and can be picked up by the other master.
-Patrick
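
For readers following the thread, the write hole Neil describes can be
shown with a toy single-parity stripe. This is only a sketch under
simplifying assumptions: the parity is plain XOR over three 4-byte data
blocks, and the names (xor_blocks, d0, d1, d2, parity) are made up for
illustration. It is not how md/raid5 is implemented.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (single-parity, RAID5-style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A toy stripe: three data blocks plus one parity block, all consistent.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# With consistent parity, a lost d1 can be rebuilt exactly.
assert xor_blocks(d0, d2, parity) == d1

# Update d0. The data write reaches its disk, but the crash happens
# before the matching parity write does, so the parity is now stale.
d0_on_disk = b"ZZZZ"
parity_on_disk = parity  # still the parity of the OLD d0

# Now the disk holding d1 fails and d1 is rebuilt from the survivors.
rebuilt_d1 = xor_blocks(d0_on_disk, d2, parity_on_disk)

print("original d1:", d1)
print("rebuilt  d1:", rebuilt_d1)
print("corrupted:  ", rebuilt_d1 != d1)
```

Note that d1 was never part of the interrupted write, yet it is the block
that comes back wrong. The rebuild has nothing to signal that the parity
was stale, which is why recovery after losing a device depends on data
and parity being consistent.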