linux-raid.vger.kernel.org archive mirror
From: "Patrick H." <linux-raid@feystorm.net>
To: linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
Date: Sun, 02 Jan 2011 22:05:06 -0700
Message-ID: <4D215902.9010308@feystorm.net>
In-Reply-To: <20110103155630.565341d0@notabene.brown>

Sent: Sun Jan 02 2011 21:56:30 GMT-0700 (Mountain Standard Time)
From: Neil Brown <neilb@suse.de>
To: Patrick H. <linux-raid@feystorm.net>, linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
> On Sun, 02 Jan 2011 21:06:52 -0700 "Patrick H." <linux-raid@feystorm.net>
> wrote:
>
>> That makes sense assuming that MD acknowledges the write once the data is
>> written to the data disks but not necessarily the parity disk, which is
>> what I gather you were saying is what happens. Is there any option that
>> can change the behavior so that md won't ack the write until it's been
>> committed to all disks (I'm guessing no since you didn't mention it)?
>> Also does raid6 suffer this problem? Is it smart enough to use both
>> parity disks when calculating replacement, or will it just use one?
>
> md/raid5 doesn't acknowledge the write until both the data and the parity
> have been written.  But that doesn't make any difference.
> If you schedule a number of interdependent writes (data and parity) and then
> allow some to complete but not all, then you have inconsistency.
> Recovery from losing a single device requires consistency of parity and data.
>
> RAID6 suffers equally from this problem.  Even if it used both parity disks
> to recover (which it doesn't) how would that help?  It would then have two
> possible values for the data and no way to know which was correct, and every
> possibility that both are incorrect.  This would happen if a single data
> block was successfully written, but neither parity blocks were.
>
> The only way you can avoid this 'write hole' is by journalling in multiples
> of whole stripes.  No current filesystems that I know of can do this as they
> journal in blocks, and the maximum block size is less than the minimum stripe
> size.  So you would need journalling integrated with md/raid, or you would
> need a filesystem which was designed to understand this problem and write
> whole stripes at a time, always to an area of the device which did not
> contain live data.
>
> NeilBrown
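To make the write hole Neil describes concrete, here is a toy Python model of a single stripe. It is a sketch under stated assumptions, not md's code: each "disk" holds one byte, P is plain XOR, and Q uses GF(2^8) with reducing polynomial 0x11d and generator 2 (the construction commonly used for RAID-6). All function names are hypothetical. The new data for disk 1 lands, a crash loses both parity updates, and disk 0, which was never written, is then rebuilt from the stale parity.

```python
# Toy model of one stripe: three data disks plus P and Q, one byte each.
# Assumption: Q uses GF(2^8) with polynomial 0x11d and generator 2.
# Hypothetical helper names; this is not md's implementation.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) reduced by x^8+x^4+x^3+x^2+1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Every nonzero element satisfies a^255 == 1, so a^254 is its inverse.
    return gf_pow(a, 254)


def p_parity(data: list[int]) -> int:
    p = 0
    for d in data:
        p ^= d
    return p


def q_parity(data: list[int]) -> int:
    q = 0
    for i, d in enumerate(data):
        q ^= gf_mul(gf_pow(2, i), d)
    return q


def recover_from_p(data: list[int], p: int, lost: int) -> int:
    """Rebuild data[lost] from P and the surviving blocks (data[lost] is ignored)."""
    value = p
    for i, d in enumerate(data):
        if i != lost:
            value ^= d
    return value


def recover_from_q(data: list[int], q: int, lost: int) -> int:
    """Rebuild data[lost] from Q and the surviving blocks (data[lost] is ignored)."""
    acc = q
    for i, d in enumerate(data):
        if i != lost:
            acc ^= gf_mul(gf_pow(2, i), d)
    return gf_mul(acc, gf_inv(gf_pow(2, lost)))


# A consistent stripe.
data = [0x11, 0x22, 0x33]
p, q = p_parity(data), q_parity(data)

# Crash: disk 1's new block reaches the platter, neither parity update does.
data[1] = 0x99

# Later disk 0, which was never written, fails and is rebuilt from stale parity.
from_p = recover_from_p(data, p, lost=0)
from_q = recover_from_q(data, q, lost=0)
print(hex(from_p), hex(from_q))  # prints: 0xaa 0x7a (the original was 0x11)
```

Rebuilding from P gives the original byte XORed with the change made to disk 1, while Q gives the original XORed with 2 times that change in GF(2^8). Those differ whenever the change is nonzero, so the two parities disagree and neither answer is correct, which matches the RAID6 case described above.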

Ok, thanks for the info.
I think I'll solve it by setting up 2 dedicated hosts that run the array
but don't export any disks themselves. That way, if one master dies, all
the RAID member disks are still there and can be picked up by the other
master.

-Patrick
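Neil's point that the write hole can only be closed by journalling whole stripes can be illustrated with a toy Python model. This is a minimal sketch, not how md or any filesystem implements it: blocks are single bytes, the class and method names are hypothetical, and it assumes a journal append is atomic and durable before any in-place write begins (a real journal would need checksums or a commit marker to detect a torn append). Because each record holds an entire stripe, data and parity together, replaying it after a crash always leaves parity matching data.

```python
# Toy model of whole-stripe write-ahead journalling for a RAID-5-style array.
# Hypothetical names; not md's implementation.
from typing import Optional


class PowerLoss(Exception):
    """Simulated crash partway through the in-place writes."""


def xor_parity(blocks: list[int]) -> int:
    parity = 0
    for b in blocks:
        parity ^= b
    return parity


class JournaledArray:
    def __init__(self, n_data: int, n_stripes: int) -> None:
        # data[disk][stripe] holds one byte per block; parity[stripe] is P.
        self.data = [[0] * n_stripes for _ in range(n_data)]
        self.parity = [0] * n_stripes
        # Each record is a complete stripe: (stripe index, data blocks, parity).
        # Assumption: appending a record is atomic and durable.
        self.journal: list[tuple[int, list[int], int]] = []

    def write_stripe(self, stripe: int, blocks: list[int],
                     crash_after: Optional[int] = None) -> None:
        parity = xor_parity(blocks)
        # 1. Journal the whole stripe before touching it in place.
        record = (stripe, list(blocks), parity)
        self.journal.append(record)
        # 2. Write in place; crash_after=k simulates power loss after k device writes.
        writes = list(enumerate(blocks)) + [(None, parity)]
        for count, (disk, value) in enumerate(writes):
            if crash_after is not None and count == crash_after:
                raise PowerLoss
            if disk is None:
                self.parity[stripe] = value
            else:
                self.data[disk][stripe] = value
        # 3. Every device write landed, so the record can be retired.
        self.journal.remove(record)

    def recover(self) -> None:
        """Replay every journalled stripe in full after a crash."""
        for stripe, blocks, parity in self.journal:
            for disk, value in enumerate(blocks):
                self.data[disk][stripe] = value
            self.parity[stripe] = parity
        self.journal.clear()

    def read_stripe(self, stripe: int) -> list[int]:
        return [disk[stripe] for disk in self.data]

    def is_consistent(self, stripe: int) -> bool:
        return xor_parity(self.read_stripe(stripe)) == self.parity[stripe]


array = JournaledArray(n_data=3, n_stripes=2)
try:
    array.write_stripe(0, [1, 2, 4], crash_after=1)  # only disk 0 gets written
except PowerLoss:
    pass
print(array.is_consistent(0))  # False: disk 0 updated, parity stale
array.recover()
print(array.is_consistent(0), array.read_stripe(0))  # True [1, 2, 4]
```

Without step 1, the interrupted stripe would stay inconsistent until a resync, and a disk failure in that window would rebuild garbage exactly as in the write-hole case.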


Thread overview: 12+ messages
2011-01-03  1:58 filesystem corruption Patrick H.
2011-01-03  3:16 ` Neil Brown
     [not found]   ` <4D214B5C.3010103@feystorm.net>
2011-01-03  4:56     ` Neil Brown
2011-01-03  5:05       ` Patrick H. [this message]
2011-01-04  5:33         ` NeilBrown
2011-01-04  7:50           ` Patrick H.
2011-01-04 17:31             ` Patrick H.
2011-01-05  1:22               ` Patrick H.
2011-01-05  7:02   ` CoolCold
     [not found]   ` <AANLkTinL_nz58f8rSPuhYvVwGY5jdu1XVkNLC1ky5A65@mail.gmail.com>
2011-01-05 14:28     ` Patrick H.
2011-01-05 15:52       ` Spelic
2011-01-05 15:55         ` Patrick H.
