* Rewrite md raid1 member
From: Chris Dunlop @ 2016-08-18 3:04 UTC (permalink / raw)
To: linux-raid

G'day all,

What options are there to safely rewrite a disk that's part of a live MD
raid1?

Specifically, I have smartctl reporting a Current_Pending_Sector count of
360 on a member of a raid1 set.

A 'check' of the raid comes up clean. I'd like to see if I can clear the
pending sector count by rewriting the sectors. Whilst rewriting just those
sectors would be ideal, I don't know which they are, so it looks like a
whole-disk write is the way to go.

I realise the safest way to fix this is to use a spare disk and do a
replace, allowing me to play with the "pending sector" disk to my heart's
content, but I'm also interested to see if it can be done safely on a live
system...

If the system had a spare hot-swap disk bay, and I had a spare disk, I
could add another disk to the system and do the replace.

If I were happy to lose redundancy during the process, I could remove the
disk from the raid, wipe the superblock, add it again, and let it rebuild
the whole raid.

If it weren't the root filesystem, the filesystem could be taken offline
whilst doing the rebuild above to reduce the chance of the lost redundancy
producing undesirable results, but there's still the risk of problems
cropping up on the "good" disk during the rebuild.

If I were happy to wear the downtime, I could boot into a rescue disk to
do it.

Another option might be to "dd" from the "good" disk:

  dd if=/dev/sda of=/dev/sdb

...except that will put the wrong superblock on there.

Using the same disk for the src and dst might be an option:

  dd if=/dev/sdb of=/dev/sdb

...but the seeking would kill the throughput. Perhaps a large block size
might help, e.g. bs=64K.

Or, there could be some dance of 'dd'ing from the same disk for the
superblock, and 'dd'ing from the other disk for the bulk data, using the
Super Offset and Data Offset from "mdadm -E".

However, using 'dd' allows for a window where dd reads data A from sda:X
(sector X), then the system writes data B to md0:X (i.e. to both sda:X and
sdb:X), then dd writes the stale data A to sdb:X, putting the raid out of
sync.

This could potentially be fixed by doing a 'repair' of the raid, except
that, as both sda and sdb are returning data but not the same data, it's
possible this will preserve the wrong data (i.e. write the old data A from
sdb:X to sda:X instead of writing the new data B from sda:X to sdb:X). In
this circumstance, how does md decide which is the "good" data? Is there a
way of specifying "in the case of discrepancies, trust sda"?

Perhaps setting sdb to "blocked" before writing to it is the right thing
to do? I.e.:

  echo "blocked" > /sys/block/md0/md/dev-sdb1/state
  [ dd stuff per above ]
  echo "-blocked" > /sys/block/md0/md/dev-sdb1/state

Per linux/Documentation/md.txt:

----
Writing "blocked" sets the "blocked" flag.
Writing "-blocked" clears the "blocked" flags and allows writes to
complete and possibly simulates an error.
----

I can't find anything that tells me what this actually does in practice.
I'm guessing setting it to "blocked" will stop md writing to that device
but otherwise allow the md device to function normally, and setting it to
"-blocked" will allow writes to proceed, and the md device will then use
the write-intent bitmap to copy over any writes that were blocked. And
what does "...and possibly simulates an error" imply?

Or is this 'dd' stuff just nuts, a case of "well, that's a novel way of
trashing your data..." and/or "you're welcome to try, but you get to keep
all the pieces and don't come crying to us for help!"?

Thanks for any insights into this!

Cheers,

Chris
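To make the same-disk 'dd' idea concrete, here's a sketch of the
read-and-rewrite-in-place operation, run against a scratch file rather
than a real member device (the device name, offsets and bs=64K above
would all need substituting, and nothing here is claimed to be safe on a
live array):

```shell
#!/bin/sh
# Sketch of "dd if=/dev/sdX of=/dev/sdX" rewriting a device in place,
# demonstrated on a scratch file so it's safe to run anywhere.
IMG=$(mktemp)
dd if=/dev/urandom of="$IMG" bs=64K count=16 2>/dev/null
BEFORE=$(sha1sum "$IMG" | cut -d' ' -f1)

# conv=notrunc is essential: without it dd truncates the output file
# before reading it back.  Truncation doesn't apply to a raw block
# device, but the read-block/write-block-in-place pattern is the same.
dd if="$IMG" of="$IMG" bs=64K conv=notrunc 2>/dev/null

AFTER=$(sha1sum "$IMG" | cut -d' ' -f1)
[ "$BEFORE" = "$AFTER" ] && echo "contents unchanged after rewrite"
rm -f "$IMG"
```

The race described above still applies: between dd's read of block X and
its write-back, a concurrent writer can slip a newer version in, which
the write-back then clobbers.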
* Re: Rewrite md raid1 member
From: Brad Campbell @ 2016-08-18 3:27 UTC (permalink / raw)
To: linux-raid

On 18/08/16 11:04, Chris Dunlop wrote:
> G'day all,
>
> What options are there to safely rewrite a disk that's part of a live MD
> raid1?
>
> Specifically, I have smartctl reporting a Current_Pending_Sector count
> of 360 on a member of a raid1 set.
>
> A 'check' of the raid comes up clean. I'd like to see if I can clear the
> pending sector count by rewriting the sectors. Whilst rewriting just
> those sectors would be ideal, I don't know which they are, so it looks
> like a whole-disk write is the way to go.

A 'smartctl -t long' on the drive will error out at the first problematic
sector and put that LBA in the SMART log, so there's a start.

Another way to determine it is to run dd from the drive: it will abort on
the first error, telling you how many records it managed to copy. With
the default bs of 512, that gives you a sector number.

> Or is this 'dd' stuff just nuts, a case of "well, that's a novel way of
> trashing your data..." and/or "you're welcome to try, but you get to
> keep all the pieces and don't come crying to us for help!"?

Pretty much. If a RAID check is not touching them, then they are likely
in the vacant area around the superblock. Nothing touches that, and
playing with it can lead to tears if you misfire and hit the superblock
or the data.

If the superblock is ok, and the errors are outside of the data area,
I've taken a drive out of the array, used dd_rescue to clone the area of
the drive in question, then written that back to the disk and re-added it
to the array. That re-writes the good data, with zeros where the bad
sectors were.

That is a horrible, horrible procedure that I did on an array I use for
testing, which has no valuable data on it. I would not recommend it if
you care about your array or data.

Brad
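The record-counting trick Brad describes can be scripted. Here's a sketch
using a scratch file in place of a failing drive: with a real drive, dd
stops at the first I/O error rather than at end-of-file, and the "records
in" count is then the first bad LBA when bs=512.

```shell
#!/bin/sh
# Create a small scratch "drive" of 100 sectors.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=512 count=100 2>/dev/null

# Read until dd stops; parse "N+0 records in" from its summary.
RECORDS=$(dd if="$IMG" of=/dev/null bs=512 2>&1 \
          | awk -F+ '/records in/ {print $1}')
echo "read stopped after $RECORDS records; next sector would be LBA $RECORDS"
rm -f "$IMG"
```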
* Re: Rewrite md raid1 member
From: Chris Dunlop @ 2016-08-18 4:01 UTC (permalink / raw)
To: Brad Campbell; +Cc: linux-raid

On Thu, Aug 18, 2016 at 11:27:55AM +0800, Brad Campbell wrote:
> On 18/08/16 11:04, Chris Dunlop wrote:
>> G'day all,
>>
>> What options are there to safely rewrite a disk that's part of a live
>> MD raid1?
>>
>> Specifically, I have smartctl reporting a Current_Pending_Sector count
>> of 360 on a member of a raid1 set.
>>
>> A 'check' of the raid comes up clean. I'd like to see if I can clear
>> the pending sector count by rewriting the sectors. Whilst rewriting
>> just those sectors would be ideal, I don't know which they are, so it
>> looks like a whole-disk write is the way to go.
>
> A 'smartctl -t long' on the drive will error out at the first
> problematic sector and put that LBA in the SMART log, so there's a
> start.

I should have mentioned: a 'smartctl -t long' on the drive came up clean.

> Another way to determine it is to run dd from the drive: it will abort
> on the first error, telling you how many records it managed to copy.
> With the default bs of 512, that gives you a sector number.

A 'dd' read of the whole disk also came up clean.

From what I can gather, a "pending sector" is one that's a bit suspect,
but may actually be ok. It seems mine are ok (at least for reading), but
the pending count won't clear until a write succeeds (or fails, and the
sector is remapped).

>> Or is this 'dd' stuff just nuts, a case of "well, that's a novel way of
>> trashing your data..." and/or "you're welcome to try, but you get to
>> keep all the pieces and don't come crying to us for help!"?
>
> Pretty much. If a RAID check is not touching them, then they are likely
> in the vacant area around the superblock. Nothing touches that, and
> playing with it can lead to tears if you misfire and hit the superblock
> or the data.

Sure - I understand the risks.

> If the superblock is ok, and the errors are outside of the data area,
> I've taken a drive out of the array, used dd_rescue to clone the area of
> the drive in question, then written that back to the disk and re-added
> it to the array. That re-writes the good data, with zeros where the bad
> sectors were.
>
> That is a horrible, horrible procedure that I did on an array I use for
> testing, which has no valuable data on it. I would not recommend it if
> you care about your array or data.

I'm interested to see if there's a way of essentially doing the above on
a live system, assuming appropriate care is taken not to trash any
existing data (including superblocks).

I.e. is it *theoretically* possible to write the same data back to the
whole disk safely? E.g. using 'dd' from/to the same disk is almost there,
but, as described, there's a window of opportunity where you could get
stale data on the disk, and a raid repair could then copy that stale data
to the good disk.

> Brad

Thanks,

Chris
* Re: Rewrite md raid1 member
From: Wols Lists @ 2016-08-19 11:52 UTC (permalink / raw)
To: Chris Dunlop, Brad Campbell; +Cc: linux-raid

On 18/08/16 05:01, Chris Dunlop wrote:
> I'm interested to see if there's a way of essentially doing the above on
> a live system, assuming appropriate care is taken not to trash any
> existing data (including superblocks).
>
> I.e. is it *theoretically* possible to write the same data back to the
> whole disk safely? E.g. using 'dd' from/to the same disk is almost
> there, but, as described, there's a window of opportunity where you
> could get stale data on the disk, and a raid repair could then copy that
> stale data to the good disk.

There is something called "scrub". My superficial knowledge of raid
doesn't let me know exactly what it is, but as far as I can make out it
forces a whole-disk write or some such, explicitly to flush out such
problems. If someone else can tell you how to scrub your disks, I'd try
that.

It's especially recommended, I think, for people with desktop drives in
their array, because it flushes out pending problems, which with desktop
drives typically remove the "R" from "raid".

Cheers,

Wol
* Re: Rewrite md raid1 member
From: Chris Dunlop @ 2016-08-19 12:46 UTC (permalink / raw)
To: Wols Lists; +Cc: Brad Campbell, linux-raid

On Fri, Aug 19, 2016 at 12:52:21PM +0100, Wols Lists wrote:
> On 18/08/16 05:01, Chris Dunlop wrote:
>> I'm interested to see if there's a way of essentially doing the above
>> on a live system, assuming appropriate care is taken not to trash any
>> existing data (including superblocks).
>>
>> I.e. is it *theoretically* possible to write the same data back to the
>> whole disk safely? E.g. using 'dd' from/to the same disk is almost
>> there, but, as described, there's a window of opportunity where you
>> could get stale data on the disk, and a raid repair could then copy
>> that stale data to the good disk.
>
> There is something called "scrub". My superficial knowledge of raid
> doesn't let me know exactly what it is, but as far as I can make out it
> forces a whole-disk write or some such, explicitly to flush out such
> problems. If someone else can tell you how to scrub your disks, I'd try
> that.

A scrub will read the RAID members to check that both sides match
(raid 1, 10), or that the checksum is correct (raid 4, 5, 6).

To initiate a scrub of md0:

  echo repair > /sys/block/md0/md/sync_action

You can watch it using /proc/mdstat, e.g.:

  watch cat /proc/mdstat

It won't write anything if it doesn't detect any errors. In my case, I
want it to write everything.

If I do my 'dd' to write everything as previously described, with the
window of opportunity for stale data to end up on the written disk, one
option would be to run a scrub / repair to check the data is the same -
but if I'm unlucky with my dd and the data isn't the same for some
sector[s], I want to ensure the correct data is copied over the stale
data and not the other way around, e.g. to specify "in the event of a
mismatch, use the data from sda and overwrite the data on sdb".

Unfortunately I don't know how that can be done.

Does anyone know?

Cheers,

Chris
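For what it's worth, the repair progress reported in /proc/mdstat is easy
to script against. Here's a sketch parsing an illustrative mdstat excerpt
(the device names and numbers below are made up; on a real system you'd
read /proc/mdstat itself):

```shell
#!/bin/sh
# Hypothetical /proc/mdstat excerpt while "echo repair > sync_action"
# is running.  Extract the progress percentage from the resync line.
MDSTAT='md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 12.6% (123456789/976630336) finish=120.5min speed=117943K/sec'

PCT=$(printf '%s\n' "$MDSTAT" | sed -n 's/.*resync = \([0-9.]*\)%.*/\1/p')
echo "repair progress: ${PCT}%"
```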
* Re: Rewrite md raid1 member
From: Chris Murphy @ 2016-08-19 16:10 UTC (permalink / raw)
To: Chris Dunlop; +Cc: Wols Lists, Brad Campbell, Linux-RAID

On Fri, Aug 19, 2016 at 6:46 AM, Chris Dunlop <chris@onthe.net.au> wrote:
> On Fri, Aug 19, 2016 at 12:52:21PM +0100, Wols Lists wrote:
>> On 18/08/16 05:01, Chris Dunlop wrote:
>>> I'm interested to see if there's a way of essentially doing the above
>>> on a live system, assuming appropriate care is taken not to trash any
>>> existing data (including superblocks).
>>>
>>> I.e. is it *theoretically* possible to write the same data back to the
>>> whole disk safely? E.g. using 'dd' from/to the same disk is almost
>>> there, but, as described, there's a window of opportunity where you
>>> could get stale data on the disk, and a raid repair could then copy
>>> that stale data to the good disk.
>>
>> There is something called "scrub". My superficial knowledge of raid
>> doesn't let me know exactly what it is, but as far as I can make out it
>> forces a whole-disk write or some such, explicitly to flush out such
>> problems. If someone else can tell you how to scrub your disks, I'd try
>> that.
>
> A scrub will read the RAID members to check that both sides match
> (raid 1, 10), or that the checksum is correct (raid 4, 5, 6).
>
> To initiate a scrub of md0:
>
>   echo repair > /sys/block/md0/md/sync_action
>
> You can watch it using /proc/mdstat, e.g.:
>
>   watch cat /proc/mdstat
>
> It won't write anything if it doesn't detect any errors. In my case, I
> want it to write everything.
>
> If I do my 'dd' to write everything as previously described, with the
> window of opportunity for stale data to end up on the written disk, one
> option would be to run a scrub / repair to check the data is the same -
> but if I'm unlucky with my dd and the data isn't the same for some
> sector[s], I want to ensure the correct data is copied over the stale
> data and not the other way around, e.g. to specify "in the event of a
> mismatch, use the data from sda and overwrite the data on sdb".
>
> Unfortunately I don't know how that can be done.
>
> Does anyone know?

Basically you want what Btrfs balance does, except simpler: rather than
relocating extents into new allocation groups, you just want to read and
rewrite everything as it is.

You definitely can't do this with dd on an md member carrying a mounted
file system: that's inevitably going to result in the file system making
changes after this operation has done a read, and therefore its write
will clobber the file system's modifications. It'll be data loss at a
minimum, and if it's file system metadata, it'll be worse in that it'll
make the file system inconsistent.

Further, it's a problem that you're overwriting good data without
accounting for the possibility of a crash or power failure. You'd really
want this operation to be CoW, so that the good data is effectively
duplicated somewhere else, and only once that operation is on stable
media would it be pointed to, with the original data turned into free
space.

I'm not really understanding the use case of why you'd want to do this.
At a fundamental level it sounds like you don't trust the devices the
data resides on. If that's true, then there are related concerns that
aren't mitigated by this rewrite feature alone.

-- 
Chris Murphy
* Re: Rewrite md raid1 member
From: Chris Dunlop @ 2016-08-20 1:43 UTC (permalink / raw)
To: Chris Murphy; +Cc: Wols Lists, Brad Campbell, Linux-RAID

On Fri, Aug 19, 2016 at 10:10:23AM -0600, Chris Murphy wrote:
> On Fri, Aug 19, 2016 at 6:46 AM, Chris Dunlop <chris@onthe.net.au> wrote:
>> On Fri, Aug 19, 2016 at 12:52:21PM +0100, Wols Lists wrote:
>>> On 18/08/16 05:01, Chris Dunlop wrote:
>>>> I'm interested to see if there's a way of essentially doing the above
>>>> on a live system, assuming appropriate care is taken not to trash any
>>>> existing data (including superblocks).
>>>>
>>>> I.e. is it *theoretically* possible to write the same data back to
>>>> the whole disk safely? E.g. using 'dd' from/to the same disk is
>>>> almost there, but, as described, there's a window of opportunity
>>>> where you could get stale data on the disk, and a raid repair could
>>>> then copy that stale data to the good disk.

[snip]

>> If I do my 'dd' to write everything as previously described, with the
>> window of opportunity for stale data to end up on the written disk, one
>> option would be to run a scrub / repair to check the data is the same -
>> but if I'm unlucky with my dd and the data isn't the same for some
>> sector[s], I want to ensure the correct data is copied over the stale
>> data and not the other way around, e.g. to specify "in the event of a
>> mismatch, use the data from sda and overwrite the data on sdb".
>>
>> Unfortunately I don't know how that can be done.
>>
>> Does anyone know?
>
> Basically you want what Btrfs balance does, except simpler: rather than
> relocating extents into new allocation groups, you just want to read and
> rewrite everything as it is.

Sorry, I'm not familiar with btrfs at that level.

> You definitely can't do this with dd on an md member carrying a mounted
> file system: that's inevitably going to result in the file system making
> changes after this operation has done a read, and therefore its write
> will clobber the file system's modifications. It'll be data loss at a
> minimum, and if it's file system metadata, it'll be worse in that it'll
> make the file system inconsistent.

I'm not convinced it's "inevitable", given the window between reading and
writing can be relatively small, and the filesystem would have to write
to those specific sectors during that window. But, yes, that's the issue:
there's certainly a chance of it happening.

> Further, it's a problem that you're overwriting good data without
> accounting for the possibility of a crash or power failure. You'd really
> want this operation to be CoW, so that the good data is effectively
> duplicated somewhere else, and only once that operation is on stable
> media would it be pointed to, with the original data turned into free
> space.

It's raid-1, so I have good data at all times, on the disk I'm not
dd'ing to (sda). The problem is there may be stale data on the disk
dd'ed to (sdb) due to the window of opportunity described previously,
i.e. dd reads data A from sda:X (sector X), the system writes data B to
md0:X (i.e. to both sda:X and sdb:X), then dd writes stale data A to
sdb:X, putting the disks out of sync.

In fact, the stale data problem is a larger problem than I first thought:
it's not only an issue when doing a repair (i.e. how to tell md to use
the data on the "good" disk in the event of discrepancies), but also
whilst the dd is underway: if you happen to issue a read to a sector
which has good data on one disk but stale data on the other, I don't know
if there's a way to ensure md reads the data on the "good" disk.

So, in fact, I guess the facility I'm looking for is a "write only" flag
for that disk, until a repair can be done (assuming the repair also
honours the "write only" flag).

Oh hey, from linux/Documentation/md.txt:

  state
      A file recording the current state of the device in the array,
      which can be a comma-separated list of:
      ...
      writemostly - device will only be subject to read requests if
      there are no other options. This applies only to raid1 arrays.

I think that's *almost* exactly what I need, but to be safe I think I
really want something like:

  writeonly - no reads will be issued to this drive. If reads can't be
  satisfied from other drives, the array will be failed.

Then again, I guess in the end what I'd really like is to be able to flag
a particular disk to md for "write repair", and tell md to repair. Then
md would read data from unflagged disks to write to the flagged disk
(that could work for parity raids as well as mirrors). This has the
advantage, like "mdadm --replace", that you retain redundancy at all
times whilst still writing to the entire disk. The advantage over
"mdadm --replace" would be that you don't require another disk.

But, in the absence of sufficient time and kernel knowledge to add "write
repair" to md myself, I'm interested to see if it can be done at the user
level.

> I'm not really understanding the use case of why you'd want to do this.
> At a fundamental level it sounds like you don't trust the devices the
> data resides on. If that's true, then there are related concerns that
> aren't mitigated by this rewrite feature alone.

My immediate use case is to try to clear the "pending sector" count by
writing to every sector on the disk. The pending sector count indicates
"something" went wrong at some point: it could be a permanent error (e.g.
the disk surface is dodgy) or a soft error (e.g. a power supply droop
during a write). I.e. it may or may not indicate the disk itself is going
bad. If the count clears (either by confirming the sector is good, or
reallocating it if the sector is really rubbish), I have a confirmed good
disk and life goes on. If something turns up during the write attempt, I
know the disk is bad and I can schedule a replacement.

As stated at the beginning, I know the safest way to do this is to add in
another disk, do a 'mdadm --replace', and then remove the suspect disk
and play with it as much as I like. As a matter of interest I'm looking
to see if there's a safe way of doing it whilst the disk is online and
live. Safe, that is, in that the data is as safe as it would be on a
normally functioning array, *if* everything is done correctly. So it's a
"hey, it would be good if this can be done" issue rather than a "help me,
I'm afraid I might lose some data!" problem.

Cheers,

Chris
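As an aside on the writemostly flag quoted above: per md.txt, the flag
can be toggled at runtime through the same per-device state file, by
writing "writemostly" to set it and "-writemostly" to clear it. A guarded
sketch follows — md0 and sdb1 are placeholders, and the script only
prints the command it would run when no such array exists:

```shell
#!/bin/sh
# Toggle write_mostly on a raid1 member via sysfs, if the array exists;
# otherwise just show the command (so the sketch is safe to run anywhere).
set_writemostly() {
    # $1 = md device (e.g. md0), $2 = member (e.g. sdb1)
    state="/sys/block/$1/md/dev-$2/state"
    if [ -w "$state" ]; then
        echo writemostly > "$state"
        cat "$state"    # e.g. "in_sync,write_mostly"
    else
        echo "would run: echo writemostly > $state"
    fi
}
OUT=$(set_writemostly md0 sdb1)
echo "$OUT"
```

Note this only steers reads away from the device; it is not the "write
only" guarantee described above.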
* Re: Rewrite md raid1 member
From: Wols Lists @ 2016-08-20 10:44 UTC (permalink / raw)
To: Chris Dunlop, Chris Murphy; +Cc: Brad Campbell, Linux-RAID

On 20/08/16 02:43, Chris Dunlop wrote:
> Then again, I guess in the end what I'd really like is to be able to
> flag a particular disk to md for "write repair", and tell md to repair.
> Then md would read data from unflagged disks to write to the flagged
> disk (that could work for parity raids as well as mirrors).

I had that idea. I'm probably better at understanding and documenting
things, hence my interest in the raid wiki, but I'm looking at this exact
thing as a project for my first foray into kernel programming. Is that
wise? :-)

Basically, do a stripe integrity check, and optionally rewrite it?

I don't know to what extent linux raid actually implements a lot of
interesting theoretical abilities, and if I can document it, I can then
identify holes and try to fill them. Especially when you're trying to
recover a broken array, the more options you have, the better...

Unfortunately the raid wiki admin is MIA at the moment, and I really want
to hack on that as a learning exercise before I start messing about with
kernel code.

Cheers,

Wol
* Re: Rewrite md raid1 member
From: NeilBrown @ 2016-08-19 21:26 UTC (permalink / raw)
To: Chris Dunlop, Wols Lists; +Cc: Brad Campbell, linux-raid

On Fri, Aug 19 2016, Chris Dunlop wrote:
> In my case, I want it to write everything.
>
> If I do my 'dd' to write everything as previously described, with the
> window of opportunity for stale data to end up on the written disk, one
> option would be to run a scrub / repair to check the data is the same -
> but if I'm unlucky with my dd and the data isn't the same for some
> sector[s], I want to ensure the correct data is copied over the stale
> data and not the other way around, e.g. to specify "in the event of a
> mismatch, use the data from sda and overwrite the data on sdb".
>
> Unfortunately I don't know how that can be done.
>
> Does anyone know?

If it is the second device in the array (as listed by mdadm --detail),
then you can stop the array and re-assemble with --update=resync.

If it is the first device, I can only suggest that you fail the device
and add it again:

  mdadm /dev/mdXX --fail /dev/sdYY
  mdadm /dev/mdXX --remove /dev/sdYY
  mdadm /dev/mdXX --add /dev/sdYY

If the "good" drive fails during the rewrite it might be a little bit
fiddly getting the array working again, but all the data will certainly
be there on the device you are re-writing, so you won't lose anything.

NeilBrown
* Re: Rewrite md raid1 member
From: Chris Dunlop @ 2016-08-20 1:57 UTC (permalink / raw)
To: NeilBrown; +Cc: Wols Lists, Brad Campbell, linux-raid

Hi Neil,

Nice work on the Bus1 article!

On Sat, Aug 20, 2016 at 07:26:27AM +1000, NeilBrown wrote:
> On Fri, Aug 19 2016, Chris Dunlop wrote:
>> In my case, I want it to write everything.
>>
>> If I do my 'dd' to write everything as previously described, with the
>> window of opportunity for stale data to end up on the written disk, one
>> option would be to run a scrub / repair to check the data is the same -
>> but if I'm unlucky with my dd and the data isn't the same for some
>> sector[s], I want to ensure the correct data is copied over the stale
>> data and not the other way around, e.g. to specify "in the event of a
>> mismatch, use the data from sda and overwrite the data on sdb".
>>
>> Unfortunately I don't know how that can be done.
>>
>> Does anyone know?
>
> If it is the second device in the array (as listed by mdadm --detail),
> then you can stop the array and re-assemble with --update=resync.

That's nearly there - except in this specific case it's my root
filesystem, so I can't stop the array without booting into a recovery
disk etc. Of course I could do that, but the point of the exercise is to
see if it can be done live, safely.

> If it is the first device, I can only suggest that you fail the device
> and add it again:
>
>   mdadm /dev/mdXX --fail /dev/sdYY
>   mdadm /dev/mdXX --remove /dev/sdYY
>   mdadm /dev/mdXX --add /dev/sdYY
>
> If the "good" drive fails during the rewrite it might be a little bit
> fiddly getting the array working again, but all the data will certainly
> be there on the device you are re-writing, so you won't lose anything.

OK, that sounds good. What would the process be if the good drive fails,
either completely, or on a few specific sectors?

Thanks,

Chris
* Re: Rewrite md raid1 member
From: NeilBrown @ 2016-08-20 6:52 UTC (permalink / raw)
To: Chris Dunlop; +Cc: Wols Lists, Brad Campbell, linux-raid

On Sat, Aug 20 2016, Chris Dunlop wrote:
> Hi Neil,
>
> Nice work on the Bus1 article!

Thanks :-)

> On Sat, Aug 20, 2016 at 07:26:27AM +1000, NeilBrown wrote:
>> If it is the second device in the array (as listed by mdadm --detail),
>> then you can stop the array and re-assemble with --update=resync.
>
> That's nearly there - except in this specific case it's my root
> filesystem, so I can't stop the array without booting into a recovery
> disk etc. Of course I could do that, but the point of the exercise is to
> see if it can be done live, safely.

Well... you could

  cd /sys/block/mdXX/md
  echo frozen > sync_action
  echo 0 > resync_start
  echo idle > sync_action

That should start a resync on a live array. Still, it only works for the
non-first device in a RAID1.

>> If it is the first device, I can only suggest that you fail the device
>> and add it again:
>>
>>   mdadm /dev/mdXX --fail /dev/sdYY
>>   mdadm /dev/mdXX --remove /dev/sdYY
>>   mdadm /dev/mdXX --add /dev/sdYY
>>
>> If the "good" drive fails during the rewrite it might be a little bit
>> fiddly getting the array working again, but all the data will certainly
>> be there on the device you are re-writing, so you won't lose anything.
>
> OK, that sounds good. What would the process be if the good drive fails,
> either completely, or on a few specific sectors?

If you think there is a serious risk of that happening, then it's best to
skip this option. You would need to boot from a rescue disk and re-create
the array using just the working device - and make sure the same
data-offset and size are used. Certainly possible, but not at all
straightforward.

Another thing you could do, particularly if you know what region of the
device needs to be over-written, is to write sector numbers to suspend_lo
and suspend_hi. This will suspend all IO through the /dev/mdXX device to
that range of array sectors. Then you could read from / write to the raw
device with dd or whatever.

raid6check.c does this on a raid6 to correct errors that can be detected
with the raid6 syndrome, even while the array is online. A similar thing
could be done to allow individual blocks to be rewritten. Care is needed
to map between array addresses and device addresses.

NeilBrown
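A sketch of the suspend_lo/suspend_hi approach Neil describes, with the
array-sector to member-sector arithmetic made explicit. All numbers and
device names are illustrative, the device-touching steps are shown as
comments only, and the raid1 mapping (member sector = array sector +
Data Offset, assuming both members report the same Data Offset) is my
reading of "mdadm -E" output, not a verified recipe:

```shell
#!/bin/sh
BAD_ARRAY_SECTOR=123456   # assumed: found by an earlier read test
DATA_OFFSET=262144        # assumed: "Data Offset" from mdadm -E /dev/sdb1
NSECT=8                   # how many sectors to rewrite

# For raid1 the member-device address is the array address shifted by
# the member's data offset (other raid levels need a real mapping).
MEMBER_SECTOR=$((BAD_ARRAY_SECTOR + DATA_OFFSET))
echo "array sector $BAD_ARRAY_SECTOR -> member sector $MEMBER_SECTOR"

# The live steps would then be, roughly:
#   echo $BAD_ARRAY_SECTOR             > /sys/block/md0/md/suspend_lo
#   echo $((BAD_ARRAY_SECTOR + NSECT)) > /sys/block/md0/md/suspend_hi
#   dd if=/dev/sda1 of=/dev/sdb1 bs=512 count=$NSECT \
#      skip=$MEMBER_SECTOR seek=$MEMBER_SECTOR conv=notrunc
#   echo 0 > /sys/block/md0/md/suspend_hi    # resume IO
#   echo 0 > /sys/block/md0/md/suspend_lo
```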