Rewrite md raid1 member

linux-raid.vger.kernel.org archive mirror

* Rewrite md raid1 member
@ 2016-08-18  3:04 Chris Dunlop
  2016-08-18  3:27 ` Brad Campbell
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2016-08-18  3:04 UTC
  To: linux-raid

G'day all,

What options are there to safely rewrite a disk that's part of a live MD
raid1?

Specifically, I have smartctl reporting a Current_Pending_Sector of 360 on a
member of a raid1 set.
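For reference, the Current_Pending_Sector figure can be pulled out of `smartctl -A /dev/sdb` output in a script. A minimal Python sketch, assuming the usual ATA attribute table layout in which RAW_VALUE is the tenth column; the helper name `pending_sector_count` and the sample output are hypothetical:

```python
import re
from typing import Optional


def pending_sector_count(smartctl_attrs: str) -> Optional[int]:
    """Return the raw Current_Pending_Sector value from `smartctl -A` text,
    or None if the attribute is not listed."""
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            # Take only the leading integer of RAW_VALUE.
            m = re.match(r"\d+", fields[9])
            return int(m.group()) if m else None
    return None


# Hypothetical sample in the usual `smartctl -A` table format.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       360
"""

print(pending_sector_count(SAMPLE))  # 360
```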

A 'check' of the raid comes up clean. I'd like to see if I can clear the
pending sector count by rewriting the sectors. Whilst rewriting just those
sectors would be ideal, I don't know which they are, so it looks like a
whole disk write is the way to go.

I realise the safest way to fix this is using a spare disk and doing a
replace, allowing me to play with the "pending sector" disk to my heart's
content, but I'm also interested to see if it can be done safely on a live
system...

If the system had a spare hot swap disk bay, and I had a spare disk, I could
add another disk to the system and do the replace.

If I were happy to lose redundancy during the process, I could remove the
disk from the raid, wipe the superblock, add it again, and let it rebuild
the whole raid.

If it weren't the root filesystem, the filesystem could be taken offline
whilst doing the rebuild above to reduce the chance of the lost redundancy
producing undesirable results, but there's still the risk of problems
cropping up on the "good" disk during the rebuild.

If I were happy to wear the downtime, I could boot into a rescue disk to do
it.

Another option might be to "dd" from the "good" disk:

dd if=/dev/sda of=/dev/sdb

...except that will put the wrong superblock on there. Using the same disk
for the src and dst might be an option:

dd if=/dev/sdb of=/dev/sdb

...but the seeking would kill the throughput. Perhaps a large blocksize
might help, e.g. bs=64K. Or, there could be some dance of 'dd'ing from the
same disk for the superblock, and 'dd'ing from the other disk for the bulk
data, using the Super Offset and Data Offset from "mdadm -E".
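The "dance" above can be sketched as a copy plan. This is a hypothetical Python helper, assuming both mirrors report the same Data Offset and Used Dev Size in `mdadm -E` (in 512-byte sectors): everything outside the data area (superblock, bitmap, any unused tail) is copied from the disk itself, and the data area from the other mirror. It only prints commands, and it does nothing about the read/write race described in the next paragraph:

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (source, start_sector, sector_count)


def plan_rewrite(dev_sectors: int, data_offset: int, data_size: int) -> List[Segment]:
    """Split a member into segments: 'peer' for the array data area (copied
    from the other mirror), 'self' for everything else (superblock, bitmap,
    unused tail), which is copied from the disk being rewritten."""
    if data_offset < 0 or data_size <= 0 or data_offset + data_size > dev_sectors:
        raise ValueError("data area does not fit on the device")
    plan: List[Segment] = []
    if data_offset > 0:
        plan.append(("self", 0, data_offset))
    plan.append(("peer", data_offset, data_size))
    end = data_offset + data_size
    if end < dev_sectors:
        plan.append(("self", end, dev_sectors - end))
    return plan


def dd_commands(plan: List[Segment], target: str, peer: str) -> List[str]:
    """Render the plan as dd invocations; bs=512 so skip/seek/count are sectors."""
    cmds = []
    for source, start, count in plan:
        src = target if source == "self" else peer
        cmds.append(f"dd if={src} of={target} bs=512 "
                    f"skip={start} seek={start} count={count}")
    return cmds


for cmd in dd_commands(plan_rewrite(10000, 2048, 7000), "/dev/sdb1", "/dev/sda1"):
    print(cmd)
```

With 0.90 or 1.0 metadata the Data Offset is 0 and the superblock sits at the end, so it simply lands in the trailing 'self' segment.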

However using 'dd' allows for a window where dd reads data A from sda:X
(sector X), then the system writes data B to md0:X (i.e. to both sda:X and
sdb:X), then dd writes data A to sdb:X, putting the raid out of sync.
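The race in the paragraph above can be modelled in a few lines. This is a toy Python model, not md code: each list element stands for one sector, and the concurrent md0 write is injected between dd's read and dd's write of sector X:

```python
from typing import List


def dd_with_concurrent_write(sda: List[str], sdb: List[str],
                             write_at: int, new_data: str) -> List[int]:
    """Copy sda to sdb sector by sector, as dd would, while an md0 write of
    new_data to sector write_at lands on both mirrors between dd's read and
    dd's write of that sector. Returns the sectors where the mirrors differ."""
    for x in range(len(sda)):
        buf = sda[x]              # dd reads sda:X
        if x == write_at:
            sda[x] = new_data     # the system writes data B to md0:X,
            sdb[x] = new_data     # i.e. to both sda:X and sdb:X
        sdb[x] = buf              # dd writes the stale data A to sdb:X
    return [x for x in range(len(sda)) if sda[x] != sdb[x]]


sda = ["a0", "a1", "a2"]
sdb = list(sda)
print(dd_with_concurrent_write(sda, sdb, 1, "B"), sda, sdb)
# [1] ['a0', 'B', 'a2'] ['a0', 'a1', 'a2']
```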

This could potentially be fixed by doing a 'repair' of the raid, except
that, as both sda and sdb are returning data but not the same data, it's
possible this will preserve the wrong data (i.e. write the old data A from
sdb:X to sda:X instead of writing the new data B from sda:X to sdb:X).

In this circumstance, how does md decide which is the "good" data? Is there
a way of specifying "in the case of discrepancies, trust sda"?

Perhaps setting sdb to "blocked" before writing to it is the right thing to
do? I.e.:

echo "blocked" > /sys/block/md0/md/dev-sdb1/state
[ dd stuff per above ]
echo "-blocked" > /sys/block/md0/md/dev-sdb1/state

Per linux/Documentation/md.txt:
----
    Writing "blocked" sets the "blocked" flag.
    Writing "-blocked" clears the "blocked" flags and allows writes
            to complete and possibly simulates an error.
----

I can't find anything that tells me what this actually does in practice. I'm
guessing setting it to "blocked" will stop md writing to that device but
otherwise allow the md device to function normally, and setting it to
"-blocked" will allow writes to proceed and the md device will then use the
write-intent bitmap to copy over any writes that were blocked.

And what does "...and possibly simulates an error" imply?

Or is this 'dd' stuff just nuts, a case of "well that's a novel way of
trashing your data..." and/or "you're welcome to try, but you get to keep
all the pieces and don't come crying to us for help!"?

Thanks for any insights into this!

Cheers,

Chris

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Rewrite md raid1 member
  2016-08-18  3:04 Rewrite md raid1 member Chris Dunlop
@ 2016-08-18  3:27 ` Brad Campbell
  2016-08-18  4:01   ` Chris Dunlop
  0 siblings, 1 reply; 11+ messages in thread
From: Brad Campbell @ 2016-08-18  3:27 UTC
  To: linux-raid

On 18/08/16 11:04, Chris Dunlop wrote:
> G'day all,
>
> What options are there to safely rewrite a disk that's part of a live MD
> raid1?
>
> Specifically, I have smartctl reporting a Current_Pending_Sector of 360 on a
> member of a raid1 set.
>
> A 'check' of the raid comes up clean. I'd like to see if I can clear the
> pending sector count by rewriting the sectors. Whilst rewriting just those
> sectors would be ideal, I don't know which they are, so it looks like a
> whole disk write is the way to go.
>

A smartctl -t long on the drive will error out at the first problematic 
sector and put that LBA in the SMART log, so there's a start.

Another way to determine it is to run dd from the drive; it will abort
on the first error, telling you how many records it managed to copy. With
the default bs of 512, that gives you a sector number.
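The record-count arithmetic as a small Python sketch, assuming dd was run from the start of the drive without `conv=noerror` and that the drive uses 512-byte logical sectors; the helper names are hypothetical. With a larger bs the count only narrows the first unreadable sector down to one record's worth of sectors:

```python
import re
from typing import Tuple


def records_in(dd_stderr: str) -> int:
    """Full records read, from dd's 'N+M records in' line."""
    m = re.search(r"^(\d+)\+(\d+) records in", dd_stderr, re.MULTILINE)
    if m is None:
        raise ValueError("no 'records in' line found")
    return int(m.group(1))


def first_bad_sector_range(full_records: int, bs: int = 512,
                           sector_size: int = 512) -> Tuple[int, int]:
    """Half-open range of sectors that must contain the first unreadable one."""
    if bs % sector_size:
        raise ValueError("bs must be a multiple of the sector size")
    per_record = bs // sector_size
    start = full_records * per_record
    return start, start + per_record


print(first_bad_sector_range(records_in("123456+0 records in\n123456+0 records out\n")))
# (123456, 123457)
```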

> Or is this 'dd' stuff just nuts, a case of "well that's a novel way of
> trashing your data..." and/or "you're welcome to try, but you get to keep
> all the pieces and don't come crying to us for help!"?

Pretty much. If a RAID check is not touching them, then they are likely 
in the vacant area around the superblock. Nothing touches that, and 
playing with it can lead to tears if you misfire and hit the superblock 
or the data.
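To check whether an LBA reported by SMART falls in the data area or outside it, it can be compared with the offsets from `mdadm -E`. A hypothetical Python helper, assuming 1.x metadata with Data Offset, Super Offset and Used Dev Size all in 512-byte sectors, and assuming the superblock occupies 8 sectors (4 KiB); adjust `super_sectors` if your layout differs. SMART LBAs are whole-disk, so the member partition's start sector (e.g. from `/sys/block/sdb/sdb1/start`) has to be subtracted first. Note the space outside the data area may also hold the internal bitmap:

```python
from typing import Optional, Tuple


def classify_lba(lba: int, data_offset: int, data_size: int, super_offset: int,
                 partition_start: int = 0,
                 super_sectors: int = 8) -> Tuple[str, Optional[int]]:
    """Classify a whole-disk LBA against a member's md layout; for the data
    area also return the array sector (RAID1: array sector = rel - Data Offset)."""
    rel = lba - partition_start   # md offsets are relative to the member device
    if rel < 0:
        return "before member", None
    if super_offset <= rel < super_offset + super_sectors:
        return "superblock", None
    if data_offset <= rel < data_offset + data_size:
        return "data", rel - data_offset
    # Neither: vacant space, or the internal bitmap if there is one.
    return "outside", None


# Hypothetical 1.2-style layout: superblock 4 KiB in, data starting at 1 MiB.
for lba in (8, 100, 3000):
    print(lba, classify_lba(lba, data_offset=2048, data_size=1_000_000, super_offset=8))
```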

If the superblock is ok, and the errors are outside of the data area 
I've taken a drive out of the array, used dd_rescue to clone the area of 
the drive in question and then written that back to the disk and 
re-added it to the array. That just rewrites the good data, with zeros
where the bad sectors were.

That is a horrible, horrible procedure that I did on an array I use for 
testing that has no valuable data on it. I would not recommend it if you
care about your array or data.

Brad

* Re: Rewrite md raid1 member
  2016-08-18  3:27 ` Brad Campbell
@ 2016-08-18  4:01   ` Chris Dunlop
  2016-08-19 11:52     ` Wols Lists
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2016-08-18  4:01 UTC
  To: Brad Campbell; +Cc: linux-raid

On Thu, Aug 18, 2016 at 11:27:55AM +0800, Brad Campbell wrote:
> On 18/08/16 11:04, Chris Dunlop wrote:
>> G'day all,
>>
>> What options are there to safely rewrite a disk that's part of a live MD
>> raid1?
>>
>> Specifically, I have smartctl reporting a Current_Pending_Sector of 360 on a
>> member of a raid1 set.
>>
>> A 'check' of the raid comes up clean. I'd like to see if I can clear the
>> pending sector count by rewriting the sectors. Whilst rewriting just those
>> sectors would be ideal, I don't know which they are, so it looks like a
>> whole disk write is the way to go.
> 
> A smartctl -t long on the drive will error out at the first problematic
> sector and put that LBA in the SMART log, so there's a start.

I should have mentioned: a 'smartctl -t long' on the drive came up clean.

> Another way to determine it is to run dd from the drive; it will abort on
> the first error, telling you how many records it managed to copy. With the
> default bs of 512, that gives you a sector number.

A 'dd' read of the whole disk also came up clean.

From what I can gather, a "pending sector" is one that's a bit suspect, but
may actually be ok. It seems mine are ok (at least for reading), but the
pending count won't clear until a write succeeds (or fails, and the sector
is remapped).

>> Or is this 'dd' stuff just nuts, a case of "well that's a novel way of
>> trashing your data..." and/or "you're welcome to try, but you get to keep
>> all the pieces and don't come crying to us for help!"?
> 
> Pretty much. If a RAID check is not touching them, then they are likely in
> the vacant area around the superblock. Nothing touches that, and playing
> with it can lead to tears if you misfire and hit the superblock or the data.

Sure - I understand the risks.

> If the superblock is ok, and the errors are outside of the data area I've
> taken a drive out of the array, used dd_rescue to clone the area of the
> drive in question and then written that back to the disk and re-added it to the
> array. That just rewrites the good data, with zeros where the bad
> sectors were.
> 
> That is a horrible, horrible procedure that I did on an array I use for
> testing that has no valuable data on it. I would not recommend it if you care
> about your array or data.

I'm interested to see if there's a way of essentially doing the above on a
live system, assuming there's appropriate care taken to not trash any
existing data (including superblocks).

I.e. is it *theoretically* possible to write the same data back to the whole
disk safely. E.g. using 'dd' from/to the same disk is almost there, but, as
described, there's a window of opportunity where you could get stale data on
the disk and a raid repair could then copy that stale data to the good disk.

> Brad

Thanks,

Chris

* Re: Rewrite md raid1 member
  2016-08-18  4:01   ` Chris Dunlop
@ 2016-08-19 11:52     ` Wols Lists
  2016-08-19 12:46       ` Chris Dunlop
  0 siblings, 1 reply; 11+ messages in thread
From: Wols Lists @ 2016-08-19 11:52 UTC
  To: Chris Dunlop, Brad Campbell; +Cc: linux-raid

On 18/08/16 05:01, Chris Dunlop wrote:
> I'm interested to see if there's a way of essentially doing the above on a
> live system, assuming there's appropriate care taken to not trash any
> existing data (including superblocks).
> 
> I.e. is it *theoretically* possible to write the same data back to the whole
> disk safely. E.g. using 'dd' from/to the same disk is almost there, but, as
> described, there's a window of opportunity where you could get stale data on
> the disk and a raid repair could then copy that stale data to the good disk.

There is something called "scrub". My superficial knowledge of raid
doesn't let me know what it is, but as far as I can make out it forces a
whole-disk-write or somesuch. Explicitly to flush out such problems. If
someone else can tell you how to scrub your disks, I'd try that.

It's especially recommended, I think, for people with desktop drives in
their array because it flushes out pending problems, which with desktop
drives typically remove the "R" from "raid".

Cheers,
Wol

* Re: Rewrite md raid1 member
  2016-08-19 11:52     ` Wols Lists
@ 2016-08-19 12:46       ` Chris Dunlop
  2016-08-19 16:10         ` Chris Murphy
  2016-08-19 21:26         ` NeilBrown
  0 siblings, 2 replies; 11+ messages in thread
From: Chris Dunlop @ 2016-08-19 12:46 UTC
  To: Wols Lists; +Cc: Brad Campbell, linux-raid

On Fri, Aug 19, 2016 at 12:52:21PM +0100, Wols Lists wrote:
> On 18/08/16 05:01, Chris Dunlop wrote:
>> I'm interested to see if there's a way of essentially doing the above on a
>> live system, assuming there's appropriate care taken to not trash any
>> existing data (including superblocks).
>> 
>> I.e. is it *theoretically* possible to write the same data back to the whole
>> disk safely. E.g. using 'dd' from/to the same disk is almost there, but, as
>> described, there's a window of opportunity where you could get stale data on
>> the disk and a raid repair could then copy that stale data to the good disk.
> 
> There is something called "scrub". My superficial knowledge of raid
> doesn't let me know what it is, but as far as I can make out it forces a
> whole-disk-write or somesuch. Explicitly to flush out such problems. If
> someone else can tell you how to scrub your disks, I'd try that.

A scrub will read the RAID members to check that both sides match (raid 1,
10), or that the parity is correct (raid 4, 5, 6).

To initiate a scrub of md0:

echo repair > /sys/block/md0/md/sync_action

You can watch it using /proc/mdstat, e.g.:

watch cat /proc/mdstat

It won't write anything if it doesn't detect any errors.
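The same sysfs interface can be driven from a script. A minimal Python sketch, assuming the md sysfs directory is passed in (e.g. `Path("/sys/block/md0/md")`); `check` is the read-only variant of `repair`, and `mismatch_cnt` holds the number of sectors the last pass rewrote (or, for check, would have rewritten). The helper names are hypothetical, and the progress regex assumes the usual `/proc/mdstat` format:

```python
import re
from pathlib import Path
from typing import Optional

ACTIONS = {"check", "repair", "idle"}


def start_sync_action(md_dir: Path, action: str) -> None:
    """Equivalent of `echo ACTION > /sys/block/mdX/md/sync_action`."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    (md_dir / "sync_action").write_text(action + "\n")


def mismatch_count(md_dir: Path) -> int:
    """Sectors found inconsistent by the last check/repair pass."""
    return int((md_dir / "mismatch_cnt").read_text().split()[0])


PROGRESS = re.compile(r"\b(check|repair|resync|recovery)\s*=\s*([\d.]+)%")


def scrub_progress(mdstat: str) -> Optional[float]:
    """Percentage from a /proc/mdstat progress line, or None if idle."""
    m = PROGRESS.search(mdstat)
    return float(m.group(2)) if m else None


# e.g. start_sync_action(Path("/sys/block/md0/md"), "repair")
```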

In my case, I want it to write everything.

If I do my 'dd' to write everything as previously described, with the window
of opportunity for stale data to end up on the written disk, one option
would be to run a scrub / repair to check the data is the same - but if I'm
unlucky with my dd and the data isn't the same for some sector[s], I want to
ensure the correct data is copied over the stale data and not the other way
around, e.g. to specify "in the event of a mismatch, use the data from sda
and overwrite the data on sdb".

Unfortunately I don't know how that can be done.

Does anyone know?

Cheers,

Chris

* Re: Rewrite md raid1 member
  2016-08-19 12:46       ` Chris Dunlop
@ 2016-08-19 16:10         ` Chris Murphy
  2016-08-20  1:43           ` Chris Dunlop
  2016-08-19 21:26         ` NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2016-08-19 16:10 UTC
  To: Chris Dunlop; +Cc: Wols Lists, Brad Campbell, Linux-RAID

On Fri, Aug 19, 2016 at 6:46 AM, Chris Dunlop <chris@onthe.net.au> wrote:
> On Fri, Aug 19, 2016 at 12:52:21PM +0100, Wols Lists wrote:
>> On 18/08/16 05:01, Chris Dunlop wrote:
>>> I'm interested to see if there's a way of essentially doing the above on a
>>> live system, assuming there's appropriate care taken to not trash any
>>> existing data (including superblocks).
>>>
>>> I.e. is it *theoretically* possible to write the same data back to the whole
>>> disk safely. E.g. using 'dd' from/to the same disk is almost there, but, as
>>> described, there's a window of opportunity where you could get stale data on
>>> the disk and a raid repair could then copy that stale data to the good disk.
>>
>> There is something called "scrub". My superficial knowledge of raid
>> doesn't let me know what it is, but as far as I can make out it forces a
>> whole-disk-write or somesuch. Explicitly to flush out such problems. If
>> someone else can tell you how to scrub your disks, I'd try that.
>
> A scrub will read the RAID members to check that both sides match (raid 1,
> 10), or that the parity is correct (raid 4, 5, 6).
>
> To initiate a scrub of md0:
>
> echo repair > /sys/block/md0/md/sync_action
>
> You can watch it using /proc/mdstat, e.g.:
>
> watch cat /proc/mdstat
>
> It won't write anything if it doesn't detect any errors.
>
> In my case, I want it to write everything.
>
> If I do my 'dd' to write everything as previously described, with the window
> of opportunity for stale data to end up on the written disk, one option
> would be to run a scrub / repair to check the data is the same - but if I'm
> unlucky with my dd and the data isn't the same for some sector[s], I want to
> ensure the correct data is copied over the stale data and not the other way
> around, e.g. to specify "in the event of a mismatch, use the data from sda
> and overwrite the data on sdb".
>
> Unfortunately I don't know how that can be done.
>
> Does anyone know?

Basically you want what Btrfs balance does, except simpler: rather
than relocating extents into new allocation groups, you just want to
read and rewrite everything as it is.

You definitely can't do this with dd on md with a mounted file system;
that's inevitably going to result in the file system making changes
after this operation has done a read, and therefore its write will
clobber the file system's modifications. It'll be data loss at a
minimum, and if it's file system metadata, it'll be worse in that
it'll make the file system inconsistent. Further, overwriting good data
in place is a problem, as it doesn't account for the possibility of a crash
or power failure. You'd really want this operation to be CoW, so that
the good data is effectively duplicated somewhere else and only once
that operation is on stable media would it be pointed to, and the
original data turned to free space.

I'm not really understanding the use case of why you'd want to do
this. At a fundamental level it sounds like you don't trust the
devices the data resides on. If that's true, then there are related
concerns that aren't mitigated by this rewrite feature alone.

-- 
Chris Murphy

* Re: Rewrite md raid1 member
  2016-08-19 16:10         ` Chris Murphy
@ 2016-08-20  1:43           ` Chris Dunlop
  2016-08-20 10:44             ` Wols Lists
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2016-08-20  1:43 UTC
  To: Chris Murphy; +Cc: Wols Lists, Brad Campbell, Linux-RAID

On Fri, Aug 19, 2016 at 10:10:23AM -0600, Chris Murphy wrote:
> On Fri, Aug 19, 2016 at 6:46 AM, Chris Dunlop <chris@onthe.net.au> wrote:
>> On Fri, Aug 19, 2016 at 12:52:21PM +0100, Wols Lists wrote:
>>> On 18/08/16 05:01, Chris Dunlop wrote:
>>>> I'm interested to see if there's a way of essentially doing the above on a
>>>> live system, assuming there's appropriate care taken to not trash any
>>>> existing data (including superblocks).
>>>>
>>>> I.e. is it *theoretically* possible to write the same data back to the whole
>>>> disk safely. E.g. using 'dd' from/to the same disk is almost there, but, as
>>>> described, there's a window of opportunity where you could get stale data on
>>>> the disk and a raid repair could then copy that stale data to the good disk.
[snip]
>> If I do my 'dd' to write everything as previously described, with the window
>> of opportunity for stale data to end up on the written disk, one option
>> would be to run a scrub / repair to check the data is the same - but if I'm
>> unlucky with my dd and the data isn't the same for some sector[s], I want to
>> ensure the correct data is copied over the stale data and not the other way
>> around, e.g. to specify "in the event of a mismatch, use the data from sda
>> and overwrite the data on sdb".
>>
>> Unfortunately I don't know how that can be done.
>>
>> Does anyone know?
> 
> Basically you want what Btrfs balance does, except simpler: rather
> than relocating extents into new allocation groups, you just want to
> read and rewrite everything as it is.

Sorry, I'm not familiar with btrfs at that level.

> You definitely can't do this with dd on md with a mounted file system;
> that's inevitably going to result in the file system making changes
> after this operation has done a read, and therefore its write will
> clobber the file system's modifications. It'll be data loss at a
> minimum, and if it's file system metadata, it'll be worse in that
> it'll make the file system inconsistent.

I'm not convinced it's "inevitable" given the window between reading and
writing can be relatively small, and the filesystem would have to write
to those specific sectors during that window. But, yes, that's the
issue, there's certainly a chance of it happening.

> Further, overwriting good data in place is a problem, as it doesn't account for the
> possibility of a crash or power failure.  You'd really want this
> operation to be CoW, so that the good data is effectively duplicated
> somewhere else and only once that operation is on stable media would
> it be pointed to, and the original data turned to free space.

It's raid-1, so I have good data at all times, on the disk I'm not
dd'ing to (sda). The problem is there may be stale data on the disk dd'ed
to (sdb) due to the window of opportunity described previously, i.e. dd
reads data A from sda:X (sector X), the system writes data B to md0:X
(i.e. to both sda:X and sdb:X), then dd writes stale data A to sdb:X,
putting the disks out of sync.

In fact, the stale data problem is a larger problem than I first
thought: it's not only an issue when doing a repair (i.e. how to tell md
to use the data on the "good" disk in the event of discrepancies), but
also whilst the dd is underway: if you happen to issue a read to a
sector which has good data on one disk but stale data on the other, I
don't know if there's a way to ensure md reads the data on the "good"
disk.

So, in fact, I guess the facility I'm looking for is a "write only"
flag for that disk until a repair can be done (assuming the repair also
honours the "write only" flag).

Oh hey, from linux/Documentation/md.txt:

  state
    A file recording the current state of the device in the array
    which can be a comma separated list of
      ...
      writemostly - device will only be subject to read requests if
                    there are no other options. This applies only to
                    raid1 arrays.

I think that's *almost* exactly what I need, but to be safe I think I
really want something like:

  writeonly - no reads will be issued to this drive. If reads can't
              be satisfied from other drives, the array will be failed.
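The existing writemostly flag can be toggled through the per-device `state` file quoted above, by writing "writemostly" or "-writemostly". A minimal Python sketch, assuming the md sysfs directory and member name (e.g. `sdb1`) are passed in; the helper names are hypothetical. This only lowers the member's read priority; it does not give the guarantee of the hypothetical writeonly flag:

```python
from pathlib import Path
from typing import Set


def member_flags(md_dir: Path, member: str) -> Set[str]:
    """Parse the comma separated state list of one array member."""
    text = (md_dir / f"dev-{member}" / "state").read_text().strip()
    return set(text.split(",")) if text else set()


def set_writemostly(md_dir: Path, member: str, enable: bool) -> None:
    """Write 'writemostly' or '-writemostly' to the member's state file."""
    value = "writemostly" if enable else "-writemostly"
    (md_dir / f"dev-{member}" / "state").write_text(value + "\n")


# e.g. set_writemostly(Path("/sys/block/md0/md"), "sdb1", True)
```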

Then again, I guess in the end what I'd really like is to be able to
flag a particular disk to md for "write repair", and tell md to repair.
Then md would read data from unflagged disks to write to the flagged
disk (that could work for parity raids as well as mirrors).

This has the advantage, like "mdadm --replace", that you retain
redundancy at all times whilst still writing to the entire disk. The
advantage over "mdadm --replace" would be that you don't require another
disk.

But, in the absence of sufficient time and kernel knowledge to add
"write repair" to md myself, I'm interested to see if it can be done at
the user level.

> I'm not really understanding the use case of why you'd want to do
> this. At a fundamental level it sounds like you don't trust the
> devices the data resides on. If that's true, then there are related
> concerns that aren't mitigated by this rewrite feature alone.

My immediate use case is to try to clear the "pending sector" count by
writing to every sector on the disk. The pending sector count indicates
"something" went wrong at some point: it could be a permanent error
(e.g. disk surface is dodgy) or a soft error (e.g. a power supply droop
during a write). I.e. it may or may not indicate the disk itself is
going bad. If the count clears (either by confirming the sector is
good, or reallocating if the sector is really rubbish), I have a
confirmed good disk and life goes on. If something turns up during the
write attempt, I know the disk is bad and I can schedule a replacement.

As stated at the beginning, I know the safest way to do this is to add
in another disk, do a 'mdadm --replace', and then remove the suspect
disk and play with it as much as I like.

As a matter of interest I'm looking to see if there's a safe way of
doing it whilst the disk is online and live. Safe, that is, in that the
data is as safe as it would be on a normally functioning array, *if*
everything is done correctly.

So it's a "hey, it would be good if this can be done" issue rather than
a "help me, I'm afraid I might lose some data!" problem.

Cheers,

Chris

* Re: Rewrite md raid1 member
  2016-08-20  1:43           ` Chris Dunlop
@ 2016-08-20 10:44             ` Wols Lists
  0 siblings, 0 replies; 11+ messages in thread
From: Wols Lists @ 2016-08-20 10:44 UTC
  To: Chris Dunlop, Chris Murphy; +Cc: Brad Campbell, Linux-RAID

On 20/08/16 02:43, Chris Dunlop wrote:
> Then again, I guess in the end what I'd really like is to be able to
> flag a particular disk to md for "write repair", and tell md to repair.
> Then md would read data from unflagged disks to write to the flagged
> disk (that could work for parity raids as well as mirrors).

I had that idea. I'm probably better at understanding and documenting
things, hence my interest in the raid wiki, but I'm looking at this
exact thing as a project for my first foray into kernel programming. Is
that wise? :-)

Basically, do a stripe integrity check, and optionally rewrite it? I
don't know to what extent linux raid actually implements a lot of interesting
theoretical abilities, and if I can document it, I can then identify
holes and try and fill them. Especially when you're trying to recover a
broken array, the more options you have, the better ...

Unfortunately the raid wiki admin is MIA at the moment, and I really
want to hack that as a learning exercise before I start messing about
with kernel code.

Cheers,
Wol

* Re: Rewrite md raid1 member
  2016-08-19 12:46       ` Chris Dunlop
  2016-08-19 16:10         ` Chris Murphy
@ 2016-08-19 21:26         ` NeilBrown
  2016-08-20  1:57           ` Chris Dunlop
  1 sibling, 1 reply; 11+ messages in thread
From: NeilBrown @ 2016-08-19 21:26 UTC
  To: Chris Dunlop, Wols Lists; +Cc: Brad Campbell, linux-raid

On Fri, Aug 19 2016, Chris Dunlop wrote:

>
> In my case, I want it to write everything.
>
> If I do my 'dd' to write everything as previously described, with the window
> of opportunity for stale data to end up on the written disk, one option
> would be to run a scrub / repair to check the data is the same - but if I'm
> unlucky with my dd and the data isn't the same for some sector[s], I want to
> ensure the correct data is copied over the stale data and not the other way
> around, e.g. to specify "in the event of a mismatch, use the data from sda
> and overwrite the data on sdb".
>
> Unfortunately I don't know how that can be done.
>
> Does anyone know?

If it is the second device in the array (as listed by mdadm --detail)
then you can stop the array and re-assemble with --update=resync.
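The condition above can be checked from `mdadm --detail` output before building the commands. A hypothetical Python sketch, assuming "first device" means the member in RaidDevice slot 0 of the table and that rows have the usual Number / Major / Minor / RaidDevice / State / device layout; it only builds the commands, it does not run them:

```python
import re
from typing import List, Optional, Sequence

# Number  Major  Minor  RaidDevice  State...  device
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+|-)\s+(.*?)\s+(/dev/\S+)\s*$")

SAMPLE_DETAIL = """\
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
"""


def raid_device_slot(detail: str, member: str) -> Optional[int]:
    """RaidDevice slot of `member` in `mdadm --detail` output, if any."""
    for line in detail.splitlines():
        m = ROW.match(line)
        if m and m.group(6) == member and m.group(4) != "-":
            return int(m.group(4))
    return None


def update_resync_cmds(md: str, detail: str, member: str,
                       members: Sequence[str]) -> List[List[str]]:
    """Stop and re-assemble with --update=resync, only if member is not slot 0."""
    slot = raid_device_slot(detail, member)
    if slot is None or slot == 0:
        raise ValueError(f"{member} is not a non-first member; use fail/remove/add")
    return [["mdadm", "--stop", md],
            ["mdadm", "--assemble", md, "--update=resync", *members]]


print(update_resync_cmds("/dev/md0", SAMPLE_DETAIL, "/dev/sdb1",
                         ["/dev/sda1", "/dev/sdb1"]))
```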

If it is the first device I can only suggest that you
fail the device and add it again:

 mdadm /dev/mdXX --fail /dev/sdYY
 mdadm /dev/mdXX --remove /dev/sdYY
 mdadm /dev/mdXX --add /dev/sdYY

If the "good" drive fails during the rewrite it might be a little bit
fiddly getting the array working again, but all the data will certainly
be there on the device you are re-writing, so you won't lose anything.
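The three steps above as argv lists, with a dry-run runner. A minimal Python sketch; the function names are hypothetical. One assumption worth flagging: if the array has a write-intent bitmap, mdadm may turn the `--add` into a re-add that only resyncs dirty regions, so an optional `mdadm --zero-superblock` step (the "wipe the superblock" idea from the start of the thread) is included to make the member look new:

```python
import shlex
import subprocess
from typing import List


def fail_and_readd(md: str, member: str, zero_superblock: bool = False) -> List[List[str]]:
    """Fail, remove and re-add `member`, all against the same md device."""
    cmds = [["mdadm", md, "--fail", member],
            ["mdadm", md, "--remove", member]]
    if zero_superblock:
        # Wipe the old md superblock so the add is treated as a new device (assumed).
        cmds.append(["mdadm", "--zero-superblock", member])
    cmds.append(["mdadm", md, "--add", member])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; only execute when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run(fail_and_readd("/dev/md0", "/dev/sdb1", zero_superblock=True))
```

Redundancy is lost from the `--fail` until recovery completes, which can be followed in /proc/mdstat.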

NeilBrown

* Re: Rewrite md raid1 member
  2016-08-19 21:26         ` NeilBrown
@ 2016-08-20  1:57           ` Chris Dunlop
  2016-08-20  6:52             ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2016-08-20  1:57 UTC
  To: NeilBrown; +Cc: Wols Lists, Brad Campbell, linux-raid

Hi Neil,

Nice work on the Bus1 article!

On Sat, Aug 20, 2016 at 07:26:27AM +1000, NeilBrown wrote:
> On Fri, Aug 19 2016, Chris Dunlop wrote:
>> In my case, I want it to write everything.
>>
>> If I do my 'dd' to write everything as previously described, with the window
>> of opportunity for stale data to end up on the written disk, one option
>> would be to run a scrub / repair to check the data is the same - but if I'm
>> unlucky with my dd and the data isn't the same for some sector[s], I want to
>> ensure the correct data is copied over the stale data and not the other way
>> around, e.g. to specify "in the event of a mismatch, use the data from sda
>> and overwrite the data on sdb".
>>
>> Unfortunately I don't know how that can be done.
>>
>> Does anyone know?
> 
> If it is the second device in the array (as listed by mdadm --detail)
> then you can stop the array and re-assemble with --update=resync.

That's nearly there - except in this specific case it's my root filesystem
so I can't stop the array without booting into a recovery disk etc. Of
course I could do that, but the point of the exercise is to see if it can
be done live, safely.

> If it is the first device I can only suggest that you
> fail the device and add it again:
> 
>  mdadm /dev/mdXX --fail /dev/sdYY
>  mdadm /dev/mdXX --remove /dev/sdYY
>  mdadm /dev/mdXX --add /dev/sdYY
> 
> If the "good" drive fails during the rewrite it might be a little bit
> fiddly getting the array working again, but all the data will certainly
> be there on the device you are re-writing, so you won't lose anything.

OK, that sounds good. What would the process be if the good drive fails,
either completely, or a few specific sectors?

Thanks,

Chris

* Re: Rewrite md raid1 member
  2016-08-20  1:57           ` Chris Dunlop
@ 2016-08-20  6:52             ` NeilBrown
  0 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2016-08-20  6:52 UTC
  To: Chris Dunlop; +Cc: Wols Lists, Brad Campbell, linux-raid

On Sat, Aug 20 2016, Chris Dunlop wrote:

> Hi Neil,
>
> Nice work on the Bus1 article!

Thanks :-)

>
> On Sat, Aug 20, 2016 at 07:26:27AM +1000, NeilBrown wrote:
>> On Fri, Aug 19 2016, Chris Dunlop wrote:
>>> In my case, I want it to write everything.
>>>
>>> If I do my 'dd' to write everything as previously described, with the window
>>> of opportunity for stale data to end up on the written disk, one option
>>> would be to run a scrub / repair to check the data is the same - but if I'm
>>> unlucky with my dd and the data isn't the same for some sector[s], I want to
>>> ensure the correct data is copied over the stale data and not the other way
>>> around, e.g. to specify "in the event of a mismatch, use the data from sda
>>> and overwrite the data on sdb".
>>>
>>> Unfortunately I don't know how that can be done.
>>>
>>> Does anyone know?
>> 
>> If it is the second device in the array (as listed by mdadm --detail)
>> then you can stop the array and re-assemble with --update=resync.
>
> That's nearly there - except in this specific case it's my root filesystem
> so I can't stop the array without booting into a recovery disk etc. Of
> course I could do that, but the point of the exercise is to see if it can
> be done live, safely.

Well... you could
  cd /sys/block/mdXX/md
  echo frozen > sync_action
  echo 0 > resync_start
  echo idle > sync_action

That should start a resync on a live array.
It still only works for a non-first device in RAID1, though.
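The live-resync sequence above as a Python sketch, assuming the md sysfs directory is passed in; the writer function is injectable so the order of writes can be checked without touching a real array. The helper names are hypothetical, and the same restriction applies: the device being rewritten must not be the first one:

```python
from pathlib import Path
from typing import Callable

# (attribute, value) pairs, in the order given in the message above.
STEPS = [("sync_action", "frozen"), ("resync_start", "0"), ("sync_action", "idle")]


def _write_attr(path: Path, value: str) -> None:
    path.write_text(value + "\n")


def trigger_live_resync(md_dir: Path,
                        write: Callable[[Path, str], None] = _write_attr) -> None:
    """Freeze sync, reset resync_start to 0, then let md resume (and resync)."""
    for attr, value in STEPS:
        write(md_dir / attr, value)


# e.g. trigger_live_resync(Path("/sys/block/md0/md"))
```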

>
>> If it is the first device I can only suggest that you
>> fail the device and add it again:
>> 
>>  mdadm /dev/mdXX --fail /dev/sdYY
>>  mdadm /dev/mdXX --remove /dev/sdYY
>>  mdadm /dev/mdXX --add /dev/sdYY
>> 
>> If the "good" drive fails during the rewrite it might be a little bit
>> fiddly getting the array working again, but all the data will certainly
>> be there on the device you are re-writing, so you won't lose anything.
>
> OK, that sounds good. What would the process be if the good drive fails,
> either completely, or a few specific sectors?

If you think there is a serious risk of that happening, then it's best
to skip this option.
You would need to boot from a rescue disk and re-create the array using
just the working device - and make sure the same data-offset and size
are used.  Certainly possible, but not at all straightforward.

Another thing you could do, particularly if you know what region of the
device needs to be over-written, is to write sector numbers to
suspend_lo and suspend_hi.  This will suspend all IO through the
/dev/mdXX device to that range of array sectors.
Then you could read from/write to the raw device with dd or whatever.

raid6check.c does this on a raid6 to correct errors that can be detected
with the raid6 syndrome, even while the array is online.  A similar
thing could be done to allow individual blocks to be rewritten.
Care is needed to map between array addresses and device addresses.
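The suspend_lo/suspend_hi approach, sketched in Python for RAID1, where member sector = array sector + Data Offset. Assumptions to flag loudly: the helper names are hypothetical; 512-byte sectors and 4 KiB alignment are assumed; the release order (suspend_hi back to 0, then suspend_lo) is a guess rather than documented behaviour; and the read and write go through the member device's page cache, so a real tool would more likely use O_DIRECT:

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Tuple

SECTOR = 512  # assumed logical sector size, in bytes
ALIGN = 8     # assumed 4 KiB alignment, in sectors


def align_range(lo: int, hi: int, align: int = ALIGN) -> Tuple[int, int]:
    """Widen [lo, hi) outward to multiples of `align` sectors."""
    return lo - lo % align, -(-hi // align) * align


@contextmanager
def suspended(md_dir: Path, lo: int, hi: int) -> Iterator[None]:
    """Suspend IO through the md device for array sectors [lo, hi)."""
    if not 0 <= lo < hi:
        raise ValueError("need 0 <= lo < hi")
    (md_dir / "suspend_lo").write_text(f"{lo}\n")
    (md_dir / "suspend_hi").write_text(f"{hi}\n")
    try:
        yield
    finally:
        # Collapse the window again; this release order is an assumption.
        (md_dir / "suspend_hi").write_text("0\n")
        (md_dir / "suspend_lo").write_text("0\n")


def rewrite_in_place(dev_path: str, start: int, count: int) -> None:
    """Read `count` sectors at `start` on the raw member and write them back."""
    off, length = start * SECTOR, count * SECTOR
    fd = os.open(dev_path, os.O_RDWR)
    try:
        if hasattr(os, "posix_fadvise"):
            # Drop cached pages so the read is less likely to return stale data.
            os.posix_fadvise(fd, off, length, os.POSIX_FADV_DONTNEED)
        data = os.pread(fd, length, off)
        if len(data) != length:
            raise OSError("short read")
        if os.pwrite(fd, data, off) != length:
            raise OSError("short write")
        os.fsync(fd)
    finally:
        os.close(fd)


def rewrite_array_range(md_dir: Path, member: str, data_offset: int,
                        lo: int, hi: int) -> Tuple[int, int]:
    """RAID1 only: member sector = array sector + Data Offset."""
    lo, hi = align_range(lo, hi)
    with suspended(md_dir, lo, hi):
        rewrite_in_place(member, lo + data_offset, hi - lo)
    return lo, hi


# e.g. rewrite_array_range(Path("/sys/block/md0/md"), "/dev/sdb1", 2048, 1000, 1010)
```

Walking the array in small chunks this way would approximate the whole-disk rewrite discussed earlier, with writes held back for only one small region at a time; areas outside the data area (superblock, bitmap) are not covered.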

NeilBrown
