* consistency detect
From: Ming Zhang @ 2004-10-11 3:32 UTC
To: linux-raid

I have a question about RAID error detection. I hope somebody can help me
figure it out. Thanks.

Take RAID1 as an example. If one disk fails, RAID1 can detect that the data
on that disk is compromised and then reconstruct it using a spare disk. This
is straightforward.

But suppose a request comes to RAID1 and RAID1 sends requests to both
disks, and at that moment the system reboots because of a power outage, a
system crash, or any other reason. After the system reboots, how does RAID1
detect which disk has consistent data? Before the reboot anything can
happen: the data may be on disk1 but not on disk2, on disk2 but not on
disk1, on neither disk, or already on both disks.

How does RAID1, or other RAID code, deal with this?

ming

-- 
--------------------------------------------------
| Ming Zhang, PhD. Student
| Dept. of Electrical & Computer Engineering
| College of Engineering
| University of Rhode Island
| Kingston RI. 02881
| e-mail: mingz at ele.uri.edu
| Tel. (401) 874-2293
| Fax. (401) 782-6422
| http://www.ele.uri.edu/~mingz/
| http://crab.ele.uri.edu/gallery/albums.php
--------------------------------------------------

* Re: consistency detect
From: Brad Campbell @ 2004-10-11 8:54 UTC
To: mingz; +Cc: linux-raid

Ming Zhang wrote:
> I have a question on RAID error detect. hope somebody can help me to
> find it out. thanks.
>
> take raid1 as an example, if one disk fail, raid 1 can detect the data
> on disk is compromised and then reconstruct it using a spare disk. this
> is straight forward.
>
> but if one request comes to raid1 and raid1 sends requests to both
> disks, at this time, system reboots because power outage, system
> crashes, or any other reason. then after system reboots, how raid 1
> detects which disk has consistent data? since before reboot, anything
> can happen, data may in disk1 but not in disk2, or in disk2 but not in
> disk1, or not in both disks, or already on both disks.
>
> how raid1 or other raid code deal with this?

In short, it does not deal with it at all. RAID will deal with a disk
failure; it makes no guarantees about consistency after power failures,
hard lockups or other catastrophic events.

A UPS, in combination with a journalling filesystem, is cheap insurance
against consistency issues.

Brad

* Re: consistency detect
From: Michael Tokarev @ 2004-10-11 10:42 UTC
To: Brad Campbell; +Cc: mingz, linux-raid

Brad Campbell wrote:
> Ming Zhang wrote:
>
>> I have a question on RAID error detect. hope somebody can help me to
>> find it out. thanks.
>>
>> take raid1 as an example, if one disk fail, raid 1 can detect the data
>> on disk is compromised and then reconstruct it using a spare disk. this
>> is straight forward.
>>
>> but if one request comes to raid1 and raid1 sends requests to both
>> disks, at this time, system reboots because power outage, system
>> crashes, or any other reason. then after system reboots, how raid 1
>> detects which disk has consistent data? since before reboot, anything
>> can happen, data may in disk1 but not in disk2, or in disk2 but not in
>> disk1, or not in both disks, or already on both disks.
>>
>> how raid1 or other raid code deal with this?
>
> In short, it does not deal with it at all. RAID will deal with a disk
> failure, it has no guarantees about consistency on power failures, hard
> lockups or other catastrophic events.

This is incorrect. The in-kernel RAID code keeps track of the state of
arrays and underlying disks during write operations. On a clean shutdown,
when everything has been written, the RAID superblocks on all disks get
updated to indicate this. After an unclean shutdown, the RAID code will
reconstruct older copies of the data using the most recent ones (i.e., from
the disk which has the most recent "events" value in its superblock).
The same is done for all other RAID levels (4, 5, 6), but not for
raid0, for obvious reasons (there's no R in raid0 per se).

> A UPS is cheap insurance against consistency issues in combination with
> a journalling filesystem.

Well, it is another (albeit very good) layer of protection.

/mjt

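The superblock comparison Michael describes can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Superblock` class, its `events` and `clean` fields, and the `assess` function are hypothetical simplifications, not the kernel's actual data structures or API. Note that when the counters are equal on all disks, this comparison says nothing about which disk holds the newer contents of an individual block.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    """Hypothetical, simplified stand-in for a per-disk RAID superblock."""
    device: str
    events: int   # counter assumed to be bumped on array state changes
    clean: bool   # assumed True only if the array was shut down cleanly


def assess(members):
    """Return (stale_devices, needs_resync) for a list of superblocks."""
    newest = max(sb.events for sb in members)
    # A member whose counter lags behind has missed a state change.
    stale = [sb.device for sb in members if sb.events < newest]
    # If the up-to-date members were not marked clean, writes may have
    # been in flight when the system went down.
    needs_resync = any(not sb.clean for sb in members if sb.events == newest)
    return stale, needs_resync


if __name__ == "__main__":
    crashed = [Superblock("sda1", 42, False), Superblock("sdb1", 42, False)]
    print(assess(crashed))
```

In the crashed example both counters match, so no disk is flagged as stale, yet a resync is still required.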
* Re: consistency detect
From: Brad Campbell @ 2004-10-11 10:58 UTC
To: Michael Tokarev; +Cc: mingz, linux-raid

Michael Tokarev wrote:
>> In short, it does not deal with it at all. RAID will deal with a disk
>> failure, it has no guarantees about consistency on power failures,
>> hard lockups or other catastrophic events.
>
> This is incorrect. In-kernel raid code keeps track of arrays and
> underlying disk state during write operations. On clean shutdown,
> when everything has been written, raid superblocks on all disks
> gets updated to indicate this. In case of unclean shutdown, raid
> code will reconstruct older copies of data using most recent ones
> (ie, from a disk which has most recent "events" value in superblock).
> The same is done for all other raid levels (4, 5, 6), but not for
> raid0 for obvious reasons (as there's no R in raid0 per se).

When does the "events" value in the superblock actually get updated? I
understood it only got updated on an event, i.e. raid start, raid stop,
disk add/remove/fail.

I realise the system does an auto rebuild when started after an unclean
shutdown; the question really is how does it know which disk is the
freshest in a raid-1? In a raid-4,5,6 it's pretty obvious, as there is
really only one copy of the data, but then does the code actually ensure
that the data gets written before the updated parity? Or does it just
flush the lot to disk in what it thinks is the most optimal fashion?

The in-kernel data becomes pretty moot when the kernel has just blasted a
couple of large blocks out to a couple of disks and the plug has been
pulled. It's going to be pretty indeterminate as to which disk has the
most accurate image of what was actually sent to it. Thus my comment that
there is really no way of accurately dealing with a catastrophic failure,
and RAID is not there to do that anyway.

I guess if you had a hardware RAID card with battery-backed RAM you would
have a much better chance, but then you really have a mini-UPS :p)

Brad

* Re: consistency detect
From: Ming Zhang @ 2004-10-11 21:15 UTC
To: Brad Campbell; +Cc: Michael Tokarev, linux-raid

Thank you all so much for the replies.

On Mon, 2004-10-11 at 06:58, Brad Campbell wrote:
> Michael Tokarev wrote:
>
> >> In short, it does not deal with it at all. RAID will deal with a disk
> >> failure, it has no guarantees about consistency on power failures,
> >> hard lockups or other catastrophic events.
> >
> > This is incorrect. In-kernel raid code keeps track of arrays and
> > underlying disk state during write operations. On clean shutdown,
> > when everything has been written, raid superblocks on all disks
> > gets updated to indicate this. In case of unclean shutdown, raid
> > code will reconstruct older copies of data using most recent ones
> > (ie, from a disk which has most recent "events" value in superblock).
> > The same is done for all other raid levels (4, 5, 6), but not for
> > raid0 for obvious reasons (as there's no R in raid0 per se).
>
> When does the "events" value in the superblock actually get updated? I
> understood it only got updated on an event, ie raid start, raid stop,
> disk add/remove/fail.

Yes. I guess if this information were updated frequently, it would hurt
performance; but if it is not updated that frequently, it is useless in
this situation.

> I realise the system does an auto rebuild when started after an unclean
> shutdown, the question really is how does it know which disk is the
> freshest in a raid-1? In a raid-4,5,6 it's pretty obvious as there is
> really only one copy of the data, but then does the code actually ensure
> that the data gets written before the updated parity? or does it just
> flush the lot to disk in what it thinks is the most optimum fashion?
>
> The In-kernel data becomes pretty moot when the kernel has just blasted a
> couple of large blocks out to a couple of disks and the plug has been
> pulled. It's going to be pretty indeterminate as to which disk has the
> most accurate image of what was actually sent to it. Thus my comment that
> there is really no way of accurately dealing with a catastrophic failure,
> and RAID is not there to do that anyway.

With such indeterminate results, I do not see how the RAID code can detect
which one is the latest copy, or whether each is half new and half old.

In your previous email you suggested a UPS and a journalling filesystem.
But 1) your system will sometimes crash even with a UPS, so a UPS cannot
prevent this 100%; 2) a journalling filesystem cannot solve this 100%
either, especially when it keeps only metadata in its log.

> I guess if you had a hardware RAID card that had a battery backed up RAM
> you have a much better chance but then you really have a mini-ups :p)

So NVRAM is the only way to solve this. :P It also needs a separate CPU
running separate code.

> Brad

ming

* Re: consistency detect
From: Neil Brown @ 2004-10-11 23:23 UTC
To: mingz; +Cc: linux-raid

On Sunday October 10, mingz@ele.uri.edu wrote:
> I have a question on RAID error detect. hope somebody can help me to
> find it out. thanks.
>
> take raid1 as an example, if one disk fail, raid 1 can detect the data
> on disk is compromised and then reconstruct it using a spare disk. this
> is straight forward.
>
> but if one request comes to raid1 and raid1 sends requests to both
> disks, at this time, system reboots because power outage, system
> crashes, or any other reason. then after system reboots, how raid 1
> detects which disk has consistent data? since before reboot, anything
> can happen, data may in disk1 but not in disk2, or in disk2 but not in
> disk1, or not in both disks, or already on both disks.

When you have a computer with a single drive, and it crashes due to
power outage or similar, and it was in the process of writing data out
to disk, the contents of those blocks that were being written are
undefined. A block might have the old data. It might have the new data.
If there are multiple blocks being written, some might be "old", some
might be "new".

Exactly the same is true with RAID1. There is no "right" value for any
block that was in the process of being written.
RAID1 simply chooses a value and makes sure that it is the same on
both (all) drives. It arbitrarily chooses the "first" drive in the
array and copies that onto the rest.

> how raid1 or other raid code deal with this?

This sort of inconsistency is not really something for RAID to deal
with. It is something for the filesystem or application to deal
with. Possibly via journalling. Possibly via 'fsck'.

NeilBrown

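The policy Neil describes (pick the first drive, copy it over all the others) can be illustrated with a toy model. This is a minimal sketch, assuming each drive is just a Python list of block contents; the `resync` function is a hypothetical name and does not correspond to any real kernel routine.

```python
def resync(drives):
    """Make every mirror identical to drives[0].

    Each drive is modelled as a list of block contents (bytes).
    Returns the number of blocks that actually differed, purely for
    illustration; the copy itself is unconditional.
    """
    source = drives[0]
    differed = 0
    for target in drives[1:]:
        for i, block in enumerate(source):
            if target[i] != block:
                differed += 1
            # The first drive always wins, whether its block is the
            # "old" or the "new" version.
            target[i] = block
    return differed


if __name__ == "__main__":
    # Block 0: the new data reached disk1 only.
    # Block 1: the new data reached disk2 only.
    disk1 = [b"A-new", b"B-old"]
    disk2 = [b"A-old", b"B-new"]
    print(resync([disk1, disk2]), disk1, disk2)
```

After the resync both mirrors hold `A-new` and `B-old`: the array is consistent again, but one in-flight write survived and the other was lost, which is why the final clean-up is left to the filesystem or application.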
* Re: consistency detect
From: Ming Zhang @ 2004-10-12 0:05 UTC
To: Neil Brown; +Cc: linux-raid

On Mon, 2004-10-11 at 19:23, Neil Brown wrote:
> On Sunday October 10, mingz@ele.uri.edu wrote:
> > I have a question on RAID error detect. hope somebody can help me to
> > find it out. thanks.
> >
> > take raid1 as an example, if one disk fail, raid 1 can detect the data
> > on disk is compromised and then reconstruct it using a spare disk. this
> > is straight forward.
> >
> > but if one request comes to raid1 and raid1 sends requests to both
> > disks, at this time, system reboots because power outage, system
> > crashes, or any other reason. then after system reboots, how raid 1
> > detects which disk has consistent data? since before reboot, anything
> > can happen, data may in disk1 but not in disk2, or in disk2 but not in
> > disk1, or not in both disks, or already on both disks.
>
> When you have a computer with a single drive, and it crashes due to
> power outage or similar, and it was in the process of writing data out
> to disk, the contents of those blocks that were being written is
> undefined. It might have the old data. It might have the new data.
> If there are multiple blocks being written, some might be "old", some
> might be "new".
>
> Exactly the same is true with RAID1. There is no "right" value for any
> block that was in the process of being written.
> RAID1 simply chooses a value and makes sure that it is the same on
> both (all) drives. It arbitrarily chooses the "first" drive in the
> array and copies that onto the rest.

:) Thanks a lot. So now I know what policy RAID1 uses: it arbitrarily
picks the first drive and syncs it to all the other drives. That brings me
to another question. After a power loss and reboot, RAID1 knows that
something is wrong and that the two drives are potentially out of sync.
Will it check and make sure the two drives are in sync, or will it just
leave the two drives as they are, with potentially mismatched blocks?

For example, before the reboot, a write to location A was in progress.
After the reboot, RAID1 has no information about that write. It just
knows something might have happened and the two drives may be out of
sync. Will it go over the whole drive to do a resync?

> > how raid1 or other raid code deal with this?
>
> This sort of inconsistency is not really something for RAID to deal
> with. It is something for the filesystem or application to deal
> with. Possibly via journalling. Possibly via 'fsck'.
>
> NeilBrown

ming

* Re: consistency detect
From: Neil Brown @ 2004-10-12 0:13 UTC
To: mingz; +Cc: linux-raid

On Monday October 11, mingz@ele.uri.edu wrote:
> :) thanks a lot. so now i know what policy raid1 use. so raid1 will
> randomly choose first drive and sync among all drives. so here comes to
> another question. after a power loss and reboot, raid1 knows that there
> are something wrong and potentially out of sync between two drives. will
> it try to check and make sure two drives are in sync or it just leave
> two drives there with potential unmatched blocks.
>
> for example, before reboot, a write to location A happen in progress.
> then after reboot. raid1 has no idea on previous write information. it
> just knows something might happened and two drives are possible to be
> out of sync. will it check whole drive to do a resync?

Yes. It copies all of the first drive onto all of the other drives.

NeilBrown

* Re: consistency detect
From: Ming Zhang @ 2004-10-12 0:43 UTC
To: Neil Brown; +Cc: linux-raid

On Mon, 2004-10-11 at 20:13, Neil Brown wrote:
> On Monday October 11, mingz@ele.uri.edu wrote:
> > :) thanks a lot. so now i know what policy raid1 use. so raid1 will
> > randomly choose first drive and sync among all drives. so here comes to
> > another question. after a power loss and reboot, raid1 knows that there
> > are something wrong and potentially out of sync between two drives. will
> > it try to check and make sure two drives are in sync or it just leave
> > two drives there with potential unmatched blocks.
> >
> > for example, before reboot, a write to location A happen in progress.
> > then after reboot. raid1 has no idea on previous write information. it
> > just knows something might happened and two drives are possible to be
> > out of sync. will it check whole drive to do a resync?
>
> Yes. It copies all of the first drive onto all of the other drives.

A full copy? So if you have 100GB on the first disk, you have to copy that
much? With background reconstruction, this can take a long time. I never
realized the overhead was this high. So I guess if the system gave you a
piece of NVRAM, you could do much better. :P

> NeilBrown

ming

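One way to read Ming's NVRAM remark is as a log of regions with writes in flight, so that only those regions need copying after a crash. The sketch below is an assumption-laden illustration of that idea, not something the thread describes as implemented: `DirtyRegionLog`, `begin_write`, `end_write` and `resync_after_crash` are hypothetical names, the log is an in-memory set standing in for storage that survives a crash, and drives are modelled as `bytearray` objects.

```python
class DirtyRegionLog:
    """Hypothetical log of regions with writes in flight.

    Assumed to live in storage that survives a crash (e.g. NVRAM).
    Simplification: at most one in-flight write per region.
    """

    def __init__(self, region_size):
        self.region_size = region_size
        self.dirty = set()

    def begin_write(self, offset):
        # Must be recorded before the write is sent to any mirror.
        self.dirty.add(offset // self.region_size)

    def end_write(self, offset):
        # Cleared only after every mirror has completed the write.
        self.dirty.discard(offset // self.region_size)


def resync_after_crash(log, drives):
    """Copy only the dirty regions from drives[0] to the other mirrors."""
    source = drives[0]
    copied = 0
    for region in sorted(log.dirty):
        start = region * log.region_size
        end = start + log.region_size
        for target in drives[1:]:
            target[start:end] = source[start:end]
        copied += 1
    log.dirty.clear()
    return copied


if __name__ == "__main__":
    log = DirtyRegionLog(region_size=4)
    disk1 = bytearray(b"AAAABBBBCCCC")
    disk2 = bytearray(b"AAAABBBBCCCC")
    log.begin_write(5)       # write starts in the second region
    disk1[4:8] = b"XXXX"     # reaches disk1, then the power fails
    print(resync_after_crash(log, [disk1, disk2]), disk2)
```

Here a single 4-byte region is copied instead of the whole drive; the first drive still wins for that region, exactly as in the full-copy policy.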