consistency detect

linux-raid.vger.kernel.org archive mirror

* consistency detect
@ 2004-10-11  3:32 Ming Zhang
  2004-10-11  8:54 ` Brad Campbell
  2004-10-11 23:23 ` Neil Brown
  0 siblings, 2 replies; 9+ messages in thread
From: Ming Zhang @ 2004-10-11  3:32 UTC (permalink / raw)
  To: linux-raid

I have a question about RAID error detection. I hope somebody can help
me figure it out. Thanks.

Take RAID1 as an example. If one disk fails, RAID1 can detect that the
data on that disk is compromised and then reconstruct it onto a spare
disk. This is straightforward.

But suppose a request comes to RAID1, RAID1 sends writes to both disks,
and at that moment the system reboots because of a power outage, a
crash, or any other reason. After the system comes back up, how does
RAID1 detect which disk has consistent data? Before the reboot anything
could have happened: the data may be on disk1 but not on disk2, on disk2
but not on disk1, on neither disk, or already on both.

How do RAID1 and the other RAID levels deal with this?

ming

-- 
 --------------------------------------------------
| Ming Zhang, PhD. Student
| Dept. of Electrical & Computer Engineering
| College of Engineering
| University of Rhode Island
| Kingston RI. 02881
| e-mail: mingz at ele.uri.edu
| Tel. (401) 874-2293 
| Fax. (401) 782-6422
| http://www.ele.uri.edu/~mingz/
| http://crab.ele.uri.edu/gallery/albums.php
 --------------------------------------------------

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: consistency detect
  2004-10-11  3:32 consistency detect Ming Zhang
@ 2004-10-11  8:54 ` Brad Campbell
  2004-10-11 10:42   ` Michael Tokarev
  2004-10-11 23:23 ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: Brad Campbell @ 2004-10-11  8:54 UTC (permalink / raw)
  To: mingz; +Cc: linux-raid

Ming Zhang wrote:
> I have a question about RAID error detection. I hope somebody can help
> me figure it out. Thanks.
> 
> Take RAID1 as an example. If one disk fails, RAID1 can detect that the
> data on that disk is compromised and then reconstruct it onto a spare
> disk. This is straightforward.
> 
> But suppose a request comes to RAID1, RAID1 sends writes to both disks,
> and at that moment the system reboots because of a power outage, a
> crash, or any other reason. After the system comes back up, how does
> RAID1 detect which disk has consistent data? Before the reboot anything
> could have happened: the data may be on disk1 but not on disk2, on disk2
> but not on disk1, on neither disk, or already on both.
> 
> How do RAID1 and the other RAID levels deal with this?

In short, it does not deal with it at all. RAID deals with disk failure; it makes no guarantees 
about consistency across power failures, hard lockups or other catastrophic events.

A UPS is cheap insurance against consistency issues in combination with a journalling filesystem.

Brad


* Re: consistency detect
  2004-10-11  8:54 ` Brad Campbell
@ 2004-10-11 10:42   ` Michael Tokarev
  2004-10-11 10:58     ` Brad Campbell
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Tokarev @ 2004-10-11 10:42 UTC (permalink / raw)
  To: Brad Campbell; +Cc: mingz, linux-raid

Brad Campbell wrote:
> Ming Zhang wrote:
> 
>> I have a question about RAID error detection. I hope somebody can help
>> me figure it out. Thanks.
>>
>> Take RAID1 as an example. If one disk fails, RAID1 can detect that the
>> data on that disk is compromised and then reconstruct it onto a spare
>> disk. This is straightforward.
>>
>> But suppose a request comes to RAID1, RAID1 sends writes to both disks,
>> and at that moment the system reboots because of a power outage, a
>> crash, or any other reason. After the system comes back up, how does
>> RAID1 detect which disk has consistent data? Before the reboot anything
>> could have happened: the data may be on disk1 but not on disk2, on disk2
>> but not on disk1, on neither disk, or already on both.
>>
>> How do RAID1 and the other RAID levels deal with this?
> 
> 
> In short, it does not deal with it at all. RAID deals with disk 
> failure; it makes no guarantees about consistency across power 
> failures, hard lockups or other catastrophic events.

This is incorrect.  The in-kernel RAID code keeps track of the state of
the array and its underlying disks during write operations.  On a clean
shutdown, once everything has been written, the RAID superblocks on all
disks get updated to indicate this.  In case of an unclean shutdown, the
RAID code reconstructs the older copies of the data from the most recent
ones (i.e., from the disk with the most recent "events" value in its
superblock).  The same is done for the other RAID levels (4, 5, 6), but
not for RAID0, for obvious reasons (there's no R in RAID0 per se).

> A UPS is cheap insurance against consistency issues in combination with 
> a journalling filesystem.

Well, it is another (albeit very good) layer of protection.

/mjt
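Michael's description of assembly after an unclean shutdown can be modelled roughly as follows. This is a minimal Python sketch under stated assumptions, not md's actual code: the `Superblock` fields (`device`, `events`, `clean`) and the `assemble` helper are hypothetical simplifications. Members whose event counter lags the newest one are treated as stale, and an up-to-date member that was not marked clean signals that a resync is needed.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    device: str   # hypothetical member device name
    events: int   # event counter recorded in this member's superblock
    clean: bool   # True if the array was marked clean at shutdown


def assemble(superblocks):
    """Split members into fresh and stale sets and decide whether to resync."""
    newest = max(sb.events for sb in superblocks)
    fresh = [sb.device for sb in superblocks if sb.events == newest]
    stale = [sb.device for sb in superblocks if sb.events < newest]
    # An up-to-date member that was not marked clean implies an unclean shutdown.
    needs_resync = any(not sb.clean for sb in superblocks if sb.events == newest)
    return fresh, stale, needs_resync


members = [
    Superblock("sda1", events=42, clean=False),
    Superblock("sdb1", events=42, clean=False),
    Superblock("sdc1", events=40, clean=True),
]
print(assemble(members))  # (['sda1', 'sdb1'], ['sdc1'], True)
```

Note that in this model the counter only separates stale members from fresh ones; as Neil explains further down the thread, it says nothing about which of two equally fresh mirrors holds the newer copy of a block that was being written.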


* Re: consistency detect
  2004-10-11 10:42   ` Michael Tokarev
@ 2004-10-11 10:58     ` Brad Campbell
  2004-10-11 21:15       ` Ming Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Brad Campbell @ 2004-10-11 10:58 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: mingz, linux-raid

Michael Tokarev wrote:

>> In short, it does not deal with it at all. RAID deals with disk 
>> failure; it makes no guarantees about consistency across power 
>> failures, hard lockups or other catastrophic events.
> 
> 
> This is incorrect.  The in-kernel RAID code keeps track of the state of
> the array and its underlying disks during write operations.  On a clean
> shutdown, once everything has been written, the RAID superblocks on all
> disks get updated to indicate this.  In case of an unclean shutdown, the
> RAID code reconstructs the older copies of the data from the most recent
> ones (i.e., from the disk with the most recent "events" value in its
> superblock).  The same is done for the other RAID levels (4, 5, 6), but
> not for RAID0, for obvious reasons (there's no R in RAID0 per se).

When does the "events" value in the superblock actually get updated? My understanding was that it only
gets updated on an event, i.e. RAID start, RAID stop, disk add/remove/fail.

I realise the system does an automatic rebuild when started after an unclean shutdown; the question
really is how does it know which disk is the freshest in a RAID-1? In RAID-4/5/6 it's pretty
obvious, as there is really only one copy of the data, but does the code actually ensure that
the data gets written before the updated parity, or does it just flush the lot to disk in whatever
order it thinks is optimal?

The in-kernel tracking becomes pretty moot when the kernel has just blasted a couple of large blocks out
to a couple of disks and the plug has been pulled. It's going to be pretty indeterminate which
disk holds the most accurate image of what was actually sent to it. Hence my comment that there is
really no way of accurately dealing with a catastrophic failure, and RAID is not there to do that
anyway.

I guess if you had a hardware RAID card with battery-backed RAM you'd have a much better
chance, but then you really have a mini-UPS :p

Brad
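Brad's question about the order in which data and parity reach the disks can be illustrated with a small XOR-parity stripe, the scheme RAID-4/5 use for their single parity block (RAID-6's second syndrome is not modelled here). This Python sketch assumes a stripe of two data disks with made-up two-byte blocks. It shows that if new data lands but the matching parity does not, the stripe is inconsistent, and a later reconstruction of a block that was never touched returns the wrong contents. This failure mode is often called the RAID-5 "write hole".

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks, as used for RAID-4/5 parity."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def stripe_consistent(stripe: dict) -> bool:
    """A stripe is consistent when the parity equals the XOR of its data."""
    return xor_blocks(stripe["d0"], stripe["d1"]) == stripe["p"]


# A consistent stripe: two data blocks (hypothetical contents) plus parity.
d0, d1 = b"\x0f\x0f", b"\xf0\x01"
parity = xor_blocks(d0, d1)

# d0 is rewritten; the new data reaches the disk, but power fails
# before the recomputed parity is written, so the old parity remains.
on_disk = {"d0": b"\xff\x00", "d1": d1, "p": parity}
print(stripe_consistent(on_disk))  # False

# If the disk holding d1 later fails, rebuilding it from d0 and parity
# yields the wrong contents even though d1 itself was never written.
rebuilt_d1 = xor_blocks(on_disk["d0"], on_disk["p"])
print(rebuilt_d1 == d1)  # False
```

In this model, recomputing the parity from the data blocks restores consistency, but only if it happens before a member disk fails.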


* Re: consistency detect
  2004-10-11 10:58     ` Brad Campbell
@ 2004-10-11 21:15       ` Ming Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Zhang @ 2004-10-11 21:15 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Michael Tokarev, linux-raid

Thanks so much to all of you for the replies.

On Mon, 2004-10-11 at 06:58, Brad Campbell wrote:
> Michael Tokarev wrote:
> 
> >> In short, it does not deal with it at all. RAID deals with disk 
> >> failure; it makes no guarantees about consistency across power 
> >> failures, hard lockups or other catastrophic events.
> > 
> > 
> > This is incorrect.  The in-kernel RAID code keeps track of the state of
> > the array and its underlying disks during write operations.  On a clean
> > shutdown, once everything has been written, the RAID superblocks on all
> > disks get updated to indicate this.  In case of an unclean shutdown, the
> > RAID code reconstructs the older copies of the data from the most recent
> > ones (i.e., from the disk with the most recent "events" value in its
> > superblock).  The same is done for the other RAID levels (4, 5, 6), but
> > not for RAID0, for obvious reasons (there's no R in RAID0 per se).
> 
> 
> When does the "events" value in the superblock actually get updated? My understanding was that it only
> gets updated on an event, i.e. RAID start, RAID stop, disk add/remove/fail.
> 
Yes, I guess if this information got updated frequently, it would hurt
performance. But if it is not updated that frequently, it is useless
for this situation.


> I realise the system does an automatic rebuild when started after an unclean shutdown; the question
> really is how does it know which disk is the freshest in a RAID-1? In RAID-4/5/6 it's pretty
> obvious, as there is really only one copy of the data, but does the code actually ensure that
> the data gets written before the updated parity, or does it just flush the lot to disk in whatever
> order it thinks is optimal?
> 
> The in-kernel tracking becomes pretty moot when the kernel has just blasted a couple of large blocks out
> to a couple of disks and the plug has been pulled. It's going to be pretty indeterminate which
> disk holds the most accurate image of what was actually sent to it. Hence my comment that there is
> really no way of accurately dealing with a catastrophic failure, and RAID is not there to do that
> anyway.
> 
With such indeterminate results, I do not know how the RAID code can
detect which copy is the latest, or whether it is half and half. In
your previous email you suggested a UPS and a journalling filesystem.
But 1) your system will still crash sometimes even with a UPS, so a UPS
cannot 100% prevent this; 2) a journalling filesystem cannot 100% solve
this either, especially when it only logs metadata.


> I guess if you had a hardware RAID card with battery-backed RAM you'd have a much better
> chance, but then you really have a mini-UPS :p
> 
So NVRAM is the only way to solve this. :P It would also need a
separate CPU running separate code.

> Brad


ming




* Re: consistency detect
  2004-10-11  3:32 consistency detect Ming Zhang
  2004-10-11  8:54 ` Brad Campbell
@ 2004-10-11 23:23 ` Neil Brown
  2004-10-12  0:05   ` Ming Zhang
  1 sibling, 1 reply; 9+ messages in thread
From: Neil Brown @ 2004-10-11 23:23 UTC (permalink / raw)
  To: mingz; +Cc: linux-raid

On Sunday October 10, mingz@ele.uri.edu wrote:
> I have a question about RAID error detection. I hope somebody can help
> me figure it out. Thanks.
> 
> Take RAID1 as an example. If one disk fails, RAID1 can detect that the
> data on that disk is compromised and then reconstruct it onto a spare
> disk. This is straightforward.
> 
> But suppose a request comes to RAID1, RAID1 sends writes to both disks,
> and at that moment the system reboots because of a power outage, a
> crash, or any other reason. After the system comes back up, how does
> RAID1 detect which disk has consistent data? Before the reboot anything
> could have happened: the data may be on disk1 but not on disk2, on disk2
> but not on disk1, on neither disk, or already on both.

When you have a computer with a single drive, and it crashes due to
power outage or similar, and it was in the process of writing data out
to disk, the contents of those blocks that were being written are
undefined.  It might have the old data.  It might have the new data.
If there are multiple blocks being written, some might be "old", some
might be "new".

Exactly the same is true with RAID1.  There is no "right" value for any
block that was in the process of being written.
RAID1 simply chooses a value and makes sure that it is the same on
both (all) drives.  It arbitrarily chooses the "first" drive in the
array and copies that onto the rest.

> 
> How do RAID1 and the other RAID levels deal with this?

This sort of inconsistency is not really something for RAID to deal
with.  It is something for the filesystem or application to deal
with.  Possibly via journalling.  Possibly via 'fsck'.

NeilBrown
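The RAID1 policy Neil describes can be sketched in a few lines of Python. This is a toy model, not the md implementation: each mirror is a hypothetical list of block contents, and `resync_from_first` copies every block of the first mirror onto the others while counting how many actually differed. The usage shows why the choice is arbitrary: if the newer data happened to reach only the second disk, the resync discards it.

```python
def resync_from_first(mirrors):
    """Copy every block of mirrors[0] onto the other mirrors.

    Returns the number of blocks whose contents changed as a result.
    """
    source = mirrors[0]
    changed = 0
    for mirror in mirrors[1:]:
        for i, block in enumerate(source):
            if mirror[i] != block:
                changed += 1
            mirror[i] = block  # full resync: every block is copied
    return changed


# The crash interrupted a write to block 1: only disk1 got the new data.
disk0 = ["A", "old", "C"]
disk1 = ["A", "new", "C"]
print(resync_from_first([disk0, disk1]))  # 1
print(disk1)  # ['A', 'old', 'C']
```

Either outcome is acceptable from RAID1's point of view; what matters is that all mirrors agree afterwards, leaving the filesystem or application to decide whether "old" or "new" is correct.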



* Re: consistency detect
  2004-10-11 23:23 ` Neil Brown
@ 2004-10-12  0:05   ` Ming Zhang
  2004-10-12  0:13     ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Zhang @ 2004-10-12  0:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Mon, 2004-10-11 at 19:23, Neil Brown wrote:
> On Sunday October 10, mingz@ele.uri.edu wrote:
> > I have a question about RAID error detection. I hope somebody can help
> > me figure it out. Thanks.
> > 
> > Take RAID1 as an example. If one disk fails, RAID1 can detect that the
> > data on that disk is compromised and then reconstruct it onto a spare
> > disk. This is straightforward.
> > 
> > But suppose a request comes to RAID1, RAID1 sends writes to both disks,
> > and at that moment the system reboots because of a power outage, a
> > crash, or any other reason. After the system comes back up, how does
> > RAID1 detect which disk has consistent data? Before the reboot anything
> > could have happened: the data may be on disk1 but not on disk2, on disk2
> > but not on disk1, on neither disk, or already on both.
> 
> When you have a computer with a single drive, and it crashes due to
> power outage or similar, and it was in the process of writing data out
> to disk, the contents of those blocks that were being written are
> undefined.  It might have the old data.  It might have the new data.
> If there are multiple blocks being written, some might be "old", some
> might be "new".
> 
> Exactly the same is true with RAID1.  There is no "right" value for any
> block that was in the process of being written.
> RAID1 simply chooses a value and makes sure that it is the same on
> both (all) drives.  It arbitrarily chooses the "first" drive in the
> array and copies that onto the rest.
> 
:) Thanks a lot. So now I know what policy RAID1 uses: it arbitrarily
picks the first drive and syncs it to all the others. That brings up
another question. After a power loss and reboot, RAID1 knows that
something went wrong and that the two drives are potentially out of
sync. Will it check and make sure the two drives are in sync, or will
it just leave them with potentially mismatched blocks?

For example, before the reboot a write to location A was in progress.
After the reboot, RAID1 has no information about that earlier write;
it just knows that something might have happened and that the two
drives may be out of sync. Will it check the whole drive and resync?



> > 
> > How do RAID1 and the other RAID levels deal with this?
> 
> This sort of inconsistency is not really something for RAID to deal
> with.  It is something for the filesystem or application to deal
> with.  Possibly via journalling.  Possibly via 'fsck'.
> 
> NeilBrown
> 




* Re: consistency detect
  2004-10-12  0:05   ` Ming Zhang
@ 2004-10-12  0:13     ` Neil Brown
  2004-10-12  0:43       ` Ming Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2004-10-12  0:13 UTC (permalink / raw)
  To: mingz; +Cc: linux-raid

On Monday October 11, mingz@ele.uri.edu wrote:
> :) Thanks a lot. So now I know what policy RAID1 uses: it arbitrarily
> picks the first drive and syncs it to all the others. That brings up
> another question. After a power loss and reboot, RAID1 knows that
> something went wrong and that the two drives are potentially out of
> sync. Will it check and make sure the two drives are in sync, or will
> it just leave them with potentially mismatched blocks?
> 
> For example, before the reboot a write to location A was in progress.
> After the reboot, RAID1 has no information about that earlier write;
> it just knows that something might have happened and that the two
> drives may be out of sync. Will it check the whole drive and resync?
> 

Yes.  It copies all of the first drive onto all of the other drives.

NeilBrown


* Re: consistency detect
  2004-10-12  0:13     ` Neil Brown
@ 2004-10-12  0:43       ` Ming Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Zhang @ 2004-10-12  0:43 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Mon, 2004-10-11 at 20:13, Neil Brown wrote:
> On Monday October 11, mingz@ele.uri.edu wrote:
> > :) Thanks a lot. So now I know what policy RAID1 uses: it arbitrarily
> > picks the first drive and syncs it to all the others. That brings up
> > another question. After a power loss and reboot, RAID1 knows that
> > something went wrong and that the two drives are potentially out of
> > sync. Will it check and make sure the two drives are in sync, or will
> > it just leave them with potentially mismatched blocks?
> > 
> > For example, before the reboot a write to location A was in progress.
> > After the reboot, RAID1 has no information about that earlier write;
> > it just knows that something might have happened and that the two
> > drives may be out of sync. Will it check the whole drive and resync?
> > 
> 
> Yes.  It copies all of the first drive onto all of the other drives.
A full copy? So if you have 100GB on the first disk, you have to copy
all of it? Even with background reconstruction, this can take a long
time. I never realized the overhead was this high. So I guess if the
system gave you a piece of NVRAM, you could do much better. :P


> 
> NeilBrown
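To put rough numbers on Ming's concern and on his NVRAM suggestion, the Python sketch below estimates a full resync time and contrasts it with a hypothetical dirty-region log kept in NVRAM. The 50 MiB/s resync throughput and the 64 MiB region size are assumptions chosen for illustration, and `DirtyRegionLog` is a made-up helper, not an md interface. The idea is that a region is marked before a mirrored write is issued and unmarked once every mirror has acknowledged it, so after a crash only the marked regions can differ.

```python
from collections import Counter

MIB = 1024 ** 2
GIB = 1024 ** 3


def resync_seconds(nbytes: int, throughput_mib_s: float) -> float:
    """Time to copy nbytes at an assumed sustained throughput."""
    return nbytes / (throughput_mib_s * MIB)


class DirtyRegionLog:
    """Hypothetical per-region write tracking, assumed to live in NVRAM."""

    def __init__(self, region_bytes: int):
        self.region_bytes = region_bytes
        self.in_flight = Counter()  # region index -> outstanding writes

    def _regions(self, offset: int, length: int) -> range:
        first = offset // self.region_bytes
        last = (offset + length - 1) // self.region_bytes
        return range(first, last + 1)

    def start_write(self, offset: int, length: int) -> None:
        # Must be persistent before the write is sent to any mirror.
        for r in self._regions(offset, length):
            self.in_flight[r] += 1

    def finish_write(self, offset: int, length: int) -> None:
        # Called only after every mirror has acknowledged the write.
        for r in self._regions(offset, length):
            self.in_flight[r] -= 1
            if self.in_flight[r] == 0:
                del self.in_flight[r]

    def regions_to_resync(self) -> list:
        # After a crash, only regions with outstanding writes can differ.
        return sorted(self.in_flight)


print(resync_seconds(100 * GIB, 50))  # 2048.0 seconds, about 34 minutes

log = DirtyRegionLog(region_bytes=64 * MIB)
log.start_write(0, 4096)
log.finish_write(0, 4096)             # completed on all mirrors
log.start_write(10 * GIB, 4096)       # still in flight when power fails
print(log.regions_to_resync())        # [160]
print(resync_seconds(64 * MIB, 50))   # 1.28 seconds for that one region
```

The catch is the one Ming raised about updating the superblock frequently: each new region mark costs an extra persistent write before the data write can proceed, so the scheme is only cheap when that storage is fast, as NVRAM is.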




end of thread, other threads:[~2004-10-12  0:43 UTC | newest]

Thread overview: 9+ messages
2004-10-11  3:32 consistency detect Ming Zhang
2004-10-11  8:54 ` Brad Campbell
2004-10-11 10:42   ` Michael Tokarev
2004-10-11 10:58     ` Brad Campbell
2004-10-11 21:15       ` Ming Zhang
2004-10-11 23:23 ` Neil Brown
2004-10-12  0:05   ` Ming Zhang
2004-10-12  0:13     ` Neil Brown
2004-10-12  0:43       ` Ming Zhang
