* Strange RAID-5 rebuild problem
@ 2008-03-01 21:09 Michael Guntsche
From: Michael Guntsche @ 2008-03-01 21:09 UTC (permalink / raw)
To: linux-raid
Hello list,
The first version of this mail got sent to linux-kernel by mistake;
I am now sending it to the correct list.
--------------------------------------------------------------------
Ok, after going through benchmarking for the last few days, I have now
started actually deploying the system.
I created a RAID-5 with a 1.0 superblock and let the RAID resync.
I had to reboot the computer for another update and did not think
about the rebuilding process, since it should continue just fine after
a restart. After the reboot I noticed that the RAID was in the
following state.
/dev/md1:
Version : 01.00.03
Creation Time : Sat Mar 1 21:42:18 2008
Raid Level : raid5
Array Size : 1464982272 (1397.12 GiB 1500.14 GB)
Used Dev Size : 976654848 (465.71 GiB 500.05 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Mar 1 21:50:19 2008
State : clean, degraded
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 256K
Name : gibson:1 (local to host gibson)
UUID : 80b0698e:d7c76d22:e231f03e:d25feba2
Events : 32
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
4 8 50 3 spare rebuilding /dev/sdd2
mdadm -E /dev/sdd2:
/dev/sdd2:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x2
Array UUID : 80b0698e:d7c76d22:e231f03e:d25feba2
Name : gibson:1 (local to host gibson)
Creation Time : Sat Mar 1 21:42:18 2008
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 976655336 (465.71 GiB 500.05 GB)
Array Size : 2929964544 (1397.12 GiB 1500.14 GB)
Used Dev Size : 976654848 (465.71 GiB 500.05 GB)
Super Offset : 976655592 sectors
Recovery Offset : 973312 sectors
State : clean
Device UUID : 5d6d28dd:c1d4d59d:3617c511:267c47f6
Update Time : Sat Mar 1 21:55:44 2008
Checksum : a1ca2174 - correct
Events : 52
Layout : left-symmetric
Chunk Size : 256K
Array Slot : 4 (0, 1, 2, failed, 3)
Array State : uuuU 1 failed
/proc/mdstat showed no progress bar though. Ok, since it seemed stuck,
I thought, why not just mark sdd2 as failed?
I was able to mark it as failed, but looking at the detail output again
I saw "spare rebuilding" and I could not remove it.
Next I tried a fail/remove, which also did not work. I stopped the
array, started it again, and THEN it removed the disk.
I tried this several times, with the same result every time.
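(The fail/remove here is the usual mdadm command pair; roughly, with
this thread's device names:
  mdadm /dev/md1 --fail /dev/sdd2
  mdadm /dev/md1 --remove /dev/sdd2
followed by the --stop and --assemble described above.)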
Just for kicks I created the RAID with a 0.90 superblock and tried
the same thing. Lo and behold after stopping the array and starting
it again the progress bar showed up and everything started rebuilding
where it stopped earlier.
Is this "normal" or is this a superblock 1.0 related problem?
Current kernel is 2.6.24.2, mdadm is v2.6.4 from debian unstable.
I do not need the RAID right now, so I could run some tests on it if
someone wants me to.
Kind regards,
Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-01 22:02 ` Michael Guntsche
From: Michael Guntsche @ 2008-03-01 22:02 UTC (permalink / raw)
To: linux-raid
On Mar 1, 2008, at 22:09, Michael Guntsche wrote:
> Just for kicks I created the RAID with a 0.90 superblock and tried
> the same thing. Lo and behold after stopping the array and starting
> it again the progress bar showed up and everything started
> rebuilding where it stopped earlier.
I am sorry, I should have looked at it a little closer.
Always stopped with: mdadm --stop /dev/...
Always started with: mdadm --assemble --scan --auto=yes --symlink=no
<-- taken from debian's /etc/init.d/mdadm-raid
0.90 superblock:
The device gets assembled with [3/4] devices. The unsynced one is
seen as a spare. As soon as there is activity on the RAID, the rebuild
starts from the BEGINNING, not from where it left off.
1.00 superblock:
The device gets assembled with "mdadm: /dev/md/1 has been started
with 4 drives".
md1 : active(auto-read-only) raid5 sda2[0] sdd2[4] sdc2[2] sdb2[1]
1464982272 blocks super 1.0 level 5, 256k chunk, algorithm 2
[4/3] [UUU_]
But nothing happens. You cannot hot-remove the one in spare-
rebuilding state, since using -f yields a
4 8 50 3 faulty spare rebuilding /dev/sdd2
status.
One more test with a 1.0 superblock:
During the rebuild I mark sdd2 as faulty; the status changes to faulty
spare. After stopping and starting the RAID, it gets created with 3/4
disks and the fourth one (sdd2) is removed.
Calling mdadm --assemble --scan again yields:
mdadm: /dev/md/1 already active, cannot restart it!
mdadm: /dev/md/1 needed for /dev/sdd2...
Kind regards,
Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-01 22:41 ` Michael Guntsche
From: Michael Guntsche @ 2008-03-01 22:41 UTC (permalink / raw)
To: Robin Hill; +Cc: linux-raid
On Mar 1, 2008, at 23:23, Robin Hill wrote:
> It's started up in read-only mode (which is the safe option - it
> prevents any accidental damage). It should switch to write mode (and
> start syncing) when it's mounted. You can also force the mode change
> using 'mdadm -w /dev/md1'.
>
mdadm -w /dev/md1 works if I call it after assembling.
If I do not call it, but mount the array and start writing to it, it
changes to "active", but the rebuild process does not kick in.
The only way it works for me now is:
* Stop the array
* Assemble it again with mdadm --assemble --scan
* Call mdadm -w /dev/md1
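Put together, the sequence that reliably resumes the rebuild is (a
sketch only; /dev/md1 is the array from this thread, and the assemble
options are the ones from Debian's init script):
  # Stop the array, reassemble it, then force it out of auto-read-only
  # so the resync thread resumes where it left off.
  mdadm --stop /dev/md1
  mdadm --assemble --scan --auto=yes --symlink=no
  mdadm -w /dev/md1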
* Re: Strange RAID-5 rebuild problem
@ 2008-03-02 7:38 ` Michael Guntsche
From: Michael Guntsche @ 2008-03-02 7:38 UTC (permalink / raw)
To: Robin Hill; +Cc: linux-raid
On Mar 1, 2008, at 23:41, Michael Guntsche wrote:
>
> mdadm -w /dev/md1 works if I call it after assembling.
> If I do not call it, but mount the array and start writing to it, it
> changes to "active", but the rebuild process does not kick in.
>
> The only way it works for me now is:
> * Stop the array
> * Assemble it again with mdadm --assemble --scan
> * Call mdadm -w /dev/md1
Good morning,
I found out why I am seeing this behaviour, but I still do not know if
it is a bug or not.
The arrays are assembled in an initramfs. Debian's mdadm package has a
script that assembles the arrays.
Now this script has the following entry:
# prevent writes/syncs so that resuming works (#415441). See
http://bugs.debian.org/415441 for more information.
echo 1 > /sys/module/md_mod/parameters/start_ro
Changing this sysfs entry to 0 gives me the expected result if I do
a stop -> assemble with my 1.0 array.
It starts resyncing and everything is ok. According to the replies in
that bug, setting start_ro to 1 should NOT stop the automatic resync
after a write to the array, but this seems to have changed now.
So, is this a change in behaviour or is it a bug? If it is the
former, I probably need to talk with the Debian guys about it; if it
is a bug, I will wait for a fix and comment out the entry in the meantime.
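For anyone wanting to check or reproduce this, the parameter can be
read and changed at runtime; a minimal sketch (the array name is the
one from this thread):
  # 1 = arrays start in soft read-only ("auto-read-only") mode
  cat /sys/module/md_mod/parameters/start_ro
  # Disable it, then restart the array; the resync then continues by itself.
  echo 0 > /sys/module/md_mod/parameters/start_ro
  mdadm --stop /dev/md1
  mdadm --assemble --scan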
Kind regards,
Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-02 8:14 ` Michael Guntsche
From: Michael Guntsche @ 2008-03-02 8:14 UTC (permalink / raw)
To: linux-raid; +Cc: Robin Hill
On Mar 2, 2008, at 8:38, Michael Guntsche wrote:
> Changing this sysfs entry to 0 gives me the expected result if I
> do a stop -> assemble with my 1.0 array.
> It starts resyncing and everything is ok. According to the replies
> in that bug, setting start_ro to 1 should NOT stop the automatic
> resync after a write to the array, but this seems to have changed now.
> So, is this a change in behaviour or is it a bug? If it is the
> former, I probably need to talk with the Debian guys about it; if
> it is a bug, I will wait for a fix and comment out the entry in the meantime.
>
Answering myself here after reading through the manpages: this seems
to be a regression. Furthermore, this may lead to some nasty data
corruption, or at least I think it will.
* --stop the array while resyncing.
* Set start_ro to 1.
* Start the array; automatic reconstruction does not start, even after
data has been written to the ARRAY.
* Stop the array again.
* Set start_ro to 0.
* Start the array again.
The array picks up the resync process where it stopped the first time,
even if data has already been written to it while it was
started with start_ro = 1.
Won't this yield corrupted data, since data BEFORE that offset might
actually have changed in the meantime?
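For reference, here are the steps above as a rough script (a sketch;
the array name is from this thread, and /mnt/test plus an existing
filesystem on the array are assumptions):
  mdadm --stop /dev/md1                            # stop mid-resync
  echo 1 > /sys/module/md_mod/parameters/start_ro
  mdadm --assemble --scan                          # comes up auto-read-only
  mount /dev/md1 /mnt/test
  touch /mnt/test/file                             # array goes active, but no
                                                   # resync is started
  umount /mnt/test
  mdadm --stop /dev/md1
  echo 0 > /sys/module/md_mod/parameters/start_ro
  mdadm --assemble --scan                          # resync resumes at the old
                                                   # recovery offset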
Kind regards,
Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-02 10:09 ` Michael Tokarev
From: Michael Tokarev @ 2008-03-02 10:09 UTC (permalink / raw)
To: Michael Guntsche; +Cc: linux-raid, Robin Hill
Michael Guntsche wrote:
>
[]
> Answering myself here after reading through the manpages: this seems to
> be a regression. Furthermore, this may lead to some nasty data
> corruption, or at least I think it will.
>
> * --stop the array while resyncing.
> * Set start_ro to 1.
> * Start the array; automatic reconstruction does not start, even after
> data has been written to the ARRAY.
You can't write any data to a read-only array.
/mjt
> * Stop the array again.
> * Set start_ro to 0.
> * Start the array again.
> The array picks up the resync process where it stopped the first time,
> even if data has already been written to it while it was started
> with start_ro = 1.
> Won't this yield corrupted data, since data BEFORE that offset might
> actually have changed in the meantime?
>
> Kind regards,
> Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-02 10:20 ` Michael Guntsche
From: Michael Guntsche @ 2008-03-02 10:20 UTC (permalink / raw)
To: Michael Tokarev; +Cc: linux-raid, Robin Hill
On Mar 2, 2008, at 11:09, Michael Tokarev wrote:
> Michael Guntsche wrote:
> []
>> Answering myself here after reading through the manpages: this
>> seems to be a regression. Furthermore, this may lead to some nasty
>> data corruption, or at least I think it will.
>> * --stop the array while resyncing.
>> * Set start_ro to 1.
>> * Start the array; automatic reconstruction does not start, even
>> after data has been written to the ARRAY.
>
> You can't write any data to a read-only array.
>
From the manpage:
md_mod.start_ro=1
    This tells md to start all arrays in read-only mode. This is a
    soft read-only that will automatically switch to read-write on
    the first write request. However, until that write request,
    nothing is written to any device by md, and in particular, no
    resync or recovery operation is started.
So apparently it does switch to read-write on the first write
request, but the rebuild is not continuing.
Kind regards,
Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-02 12:29 ` Robin Hill
From: Robin Hill @ 2008-03-02 12:29 UTC (permalink / raw)
To: linux-raid
On Sun Mar 02, 2008 at 09:14:29AM +0100, Michael Guntsche wrote:
>
> On Mar 2, 2008, at 8:38, Michael Guntsche wrote:
>
>
>> Changing this sysfs entry to 0 gives me the expected result if I do a
>> stop -> assemble with my 1.0 array.
>> It starts resyncing and everything is ok. According to the replies in that
>> bug, setting start_ro to 1 should NOT stop the automatic resync after a
>> write to the array, but this seems to have changed now.
>> So, is this a change in behaviour or is it a bug? If it is the former, I
>> probably need to talk with the Debian guys about it; if it is a bug, I
>> will wait for a fix and comment out the entry in the meantime.
>>
>
> Answering myself here after reading through the manpages: this seems to be
> a regression. Furthermore, this may lead to some nasty data corruption,
> or at least I think it will.
>
> * --stop the array while resyncing.
> * Set start_ro to 1.
> * Start the array; automatic reconstruction does not start, even after
> data has been written to the ARRAY.
> * Stop the array again.
> * Set start_ro to 0.
> * Start the array again.
> The array picks up the resync process where it stopped the first time,
> even if data has already been written to it while it was started with
> start_ro = 1.
> Won't this yield corrupted data, since data BEFORE that offset might
> actually have changed in the meantime?
>
That depends on the behaviour of the RAID system (and I've not dug
through the code to check on this). Realistically this situation is no
different to writing to the array while it's rebuilding - in either case
the safe thing to do is read from the (n-1) known good disks, and write
to all the (n) disks in the array (i.e. never just do any fast XORing
with the parity block). Any data written before the offset will still
be okay, and any data written after the offset will get recalculated
(which wastes a bit of time), but will still be valid.
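To make the parity arithmetic concrete (an illustration of the
reasoning above, not a description of what the md code actually
does): for a four-disk RAID-5 stripe with data blocks $D_0, D_1, D_2$
and parity $P$, the safe reconstruct-write computes

  $P_{new} = D_0 \oplus D_1 \oplus D_2$

from known-good data only, whereas the read-modify-write shortcut computes

  $P_{new} = P_{old} \oplus D_{old} \oplus D_{new}$.

The shortcut is only valid if $P_{old}$ was already consistent with
the on-disk data, which cannot be assumed for stripes past the
recovery offset; that is why the first form is the safe choice there.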
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: Strange RAID-5 rebuild problem
@ 2008-03-02 20:55 ` Michael Guntsche
From: Michael Guntsche @ 2008-03-02 20:55 UTC (permalink / raw)
To: linux-raid
On Mar 2, 2008, at 13:29, Robin Hill wrote:
> That depends on the behaviour of the RAID system (and I've not dug
> through the code to check on this). Realistically this situation is no
> different to writing to the array while it's rebuilding - in either case
> the safe thing to do is read from the (n-1) known good disks, and write
> to all the (n) disks in the array (i.e. never just do any fast XORing
> with the parity block). Any data written before the offset will still
> be okay, and any data written after the offset will get recalculated
> (which wastes a bit of time), but will still be valid.
Yes, you are absolutely right; it was too early in the morning for me,
and I should have given it a little more thought before sending out a
"whoa, doesn't this corrupt your data" message. Sorry for that.
Basically, a non-syncing degraded array is just a very sloooooooow
syncing array. :)
This still does not explain why the automatic resync is not triggered
on the first write if start_ro is set to 1, though.
I had a quick look at the code, but it will take some more time to
find my way around.
The fact is: if start_ro = 1 and the RAID is still in (auto-read-only),
sending mdadm -w /dev/... makes it writeable and triggers the resync.
Just writing to the array sets it to writeable but does not trigger
the resync. Of course, mdadm -w does not work then either, since the
array is already busy and in write mode.
So switching to writeable via -w and via a plain write seem to be
handled differently.
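The difference is easy to demonstrate from userspace; a sketch of the
two paths (the array name is from this thread, /mnt/test is assumed):
  # Path 1: explicit switch to write mode; the resync resumes.
  mdadm --assemble --scan
  mdadm -w /dev/md1
  cat /proc/mdstat            # recovery progress is shown
  # Path 2: implicit switch via a write; the array goes active,
  # but no recovery thread runs.
  mdadm --assemble --scan
  mount /dev/md1 /mnt/test
  touch /mnt/test/file
  cat /proc/mdstat            # degraded, no recovery line
  mdadm -w /dev/md1           # no effect any more; the array is already
                              # busy and in write mode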
Kind regards,
Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-03 0:33 ` Neil Brown
From: Neil Brown @ 2008-03-03 0:33 UTC (permalink / raw)
To: Michael Guntsche; +Cc: Michael Tokarev, linux-raid, Robin Hill
On Sunday March 2, mike@it-loops.com wrote:
>
> So apparently it does switch to read-write on the first write
> request, but the rebuild is not continuing.
Yes.
This bug is fixed by the following patch, which I have just sent off
for inclusion in mainline.
The patch talks about "reshape", but it applies to "resync" too.
The workaround is to "mdadm -w" before the first write.
Thanks,
NeilBrown
Make sure a reshape is started when the device switches to read-write.
A resync/reshape/recovery thread will refuse to progress when the
array is marked read-only. So whenever we mark it not read-only, it
is important to wake up the resync thread.
There is one place we didn't do this.
The problem manifests if the start_ro module parameter is set, and a
raid5 array that is in the middle of a reshape (restripe) is started.
The array will initially be semi-read-only (meaning it acts like it is
read-only until the first write). So the reshape will not proceed.
On the first write, the array will become read-write, but the reshape
will not be started, and there is no event which will ever restart
that thread.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 1 +
1 file changed, 1 insertion(+)
diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2008-02-22 15:46:25.000000000 +1100
+++ ./drivers/md/md.c 2008-02-22 15:46:52.000000000 +1100
@@ -5356,6 +5356,7 @@ void md_write_start(mddev_t *mddev, stru
 		mddev->ro = 0;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		md_wakeup_thread(mddev->thread);
+		md_wakeup_thread(mddev->sync_thread);
 	}
 	atomic_inc(&mddev->writes_pending);
 	if (mddev->in_sync) {
* Re: Strange RAID-5 rebuild problem
@ 2008-03-03 12:38 ` Michael Guntsche
From: Michael Guntsche @ 2008-03-03 12:38 UTC (permalink / raw)
To: linux-raid
On Mon, 3 Mar 2008 11:33:36 +1100, Neil Brown <neilb@suse.de> wrote:
> Yes.
> This bug is fixed by the following patch, which I have just sent off
> for inclusion in mainline.
> The patch talks about "reshape", but it applies to "resync" too.
>
> The workaround is to "mdadm -w" before the first write.
Thank you very much, Neil.
I applied the patch on my system, and now resyncing starts automatically
when I start writing to the RAID.
Kind regards,
Michael
* Re: Strange RAID-5 rebuild problem
@ 2008-03-03 17:53 ` Bill Davidsen
From: Bill Davidsen @ 2008-03-03 17:53 UTC (permalink / raw)
To: Michael Guntsche; +Cc: Robin Hill, linux-raid
Michael Guntsche wrote:
>
> On Mar 1, 2008, at 23:41, Michael Guntsche wrote:
>>
>> mdadm -w /dev/md1 works if I call it after assembling.
>> If I do not call it, but mount the array and start writing to it, it
>> changes to "active", but the rebuild process does not kick in.
>>
>> The only way it works for me now is:
>> * Stop the array
>> * Assemble it again with mdadm --assemble --scan
>> * Call mdadm -w /dev/md1
>
> Good morning,
>
> I found out why I am seeing this behaviour, but I still do not know if
> it is a bug or not.
>
> The arrays are assembled in an initramfs. Debian's mdadm package has a
> script that assembles the arrays.
>
> Now this script has the following entry:
>
> # prevent writes/syncs so that resuming works (#415441). See
> http://bugs.debian.org/415441 for more information.
> echo 1 > /sys/module/md_mod/parameters/start_ro
>
> Changing this sysfs entry to 0 gives me the expected result if I do a
> stop -> assemble with my 1.0 array.
> It starts resyncing and everything is ok. According to the replies in
> that bug, setting start_ro to 1 should NOT stop the automatic resync
> after a write to the array, but this seems to have changed now.
> So, is this a change in behaviour or is it a bug? If it is the former,
> I probably need to talk with the Debian guys about it; if it is a bug,
> I will wait for a fix and comment out the entry in the meantime.
Note that if you don't use start_ro, I don't believe you will be able to
use the resume feature, so using the -w option would seem safer. I
haven't looked at the patch Neil submitted to start rebuild/reshape at
the first write, but the logic seems correct to work with resume:
nothing should start until a user write.
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark