* RAID5 - 4 disk reboot trouble.
From: Guido Moonen @ 2006-05-11 11:46 UTC
To: linux-raid
Hi,
I'm running a RAID5 system, and when I reboot, my RAID seems to be
failing. (One disk is set to spare and the other disks seem to be OK in
the details output, but we get an Input/Output error when trying to
mount it.)
We cannot seem to find the problem in this setup.
If you need more info, please contact me at guido.moonen@axon.tv.
Specs of the system:
- Kernel 2.6.15.6 (with unionfs patch, Marvell driver, vweb (internal
PCI card) driver, libata, IBM kernel debugger)
- 4x 250 GB SATA hard drives (to be used for the RAID)
- mdadm version v2.4.1 - 4 April 2006
- mke2fs version 1.37
Steps to reproduce our problem:
1. Create the RAID array
"mdadm --create -n 4 -l 5 -x 0 /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
/dev/sdd1"
2. Format the array as ext3
"mke2fs -j /dev/md0"
3. Reboot (the hard way, by turning off the power)
4. Reassemble the RAID array
"mdadm --assemble --run --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
/dev/sdd1"
5. Repeat steps 3 and 4 until the system no longer mounts the RAID
correctly. Then it reports:
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 4.
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: added /dev/sdc1 to /dev/md0 as 2
mdadm: no uptodate device for slot 3 of /dev/md0
mdadm: added /dev/sdd1 to /dev/md0 as 4
mdadm: added /dev/sda1 to /dev/md0 as 0
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
The line "mdadm: no uptodate device for slot 3 of /dev/md0" is what I
see on every boot, even when it runs correctly.
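If it helps diagnosis: the per-device superblocks can be compared with
mdadm --examine; a member whose Events count lags the others is the one
mdadm refuses to treat as up to date. A minimal sketch (untested on
this exact box):
  mdadm --examine /dev/sda1 | grep Events
  mdadm --examine /dev/sdb1 | grep Events
  mdadm --examine /dev/sdc1 | grep Events
  mdadm --examine /dev/sdd1 | grep Events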
The RAID array is used to write a constant MPEG stream (512 kbit/s),
and we have a database (PostgreSQL) active on the RAID. Other than
that, there is no read activity on the RAID system.
** mdadm --detail /dev/md0 after step 2 **
/dev/md0:
Version : 00.90.03
Creation Time : Thu May 11 11:29:40 2006
Raid Level : raid5
Array Size : 732419136 (698.49 GiB 750.00 GB)
Device Size : 244139712 (232.83 GiB 250.00 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 11 11:35:08 2006
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 2% complete
UUID : 4d66978f:eab0d6ef:39e6cf38:7a7191ba
Events : 0.3
    Number   Major   Minor   RaidDevice   State
       0       8        1         0       active sync        /dev/sda1
       1       8       17         1       active sync        /dev/sdb1
       2       8       33         2       active sync        /dev/sdc1
       4       8       49         3       spare rebuilding   /dev/sdd1
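Note that the rebuild shown above can be waited out before any reboot
test; a minimal sketch that polls /proc/mdstat until the recovery line
disappears (assuming the rebuild shows up there as "recovery", as it
does for a RAID5 spare rebuild):
  # block until the rebuild of /dev/md0 has finished
  while grep -q recovery /proc/mdstat; do sleep 30; done
  cat /proc/mdstat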
** mdadm --detail /dev/md0 after step 5 **
/dev/md0:
Version : 00.90.03
Creation Time : Thu May 11 11:29:40 2006
Raid Level : raid5
Device Size : 244139712 (232.83 GiB 250.00 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 11 11:43:07 2006
State : active, degraded
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : 4d66978f:eab0d6ef:39e6cf38:7a7191ba
Events : 0.204
    Number   Major   Minor   RaidDevice   State
       0       8        1         0       active sync        /dev/sda1
       1       8       17         1       active sync        /dev/sdb1
       2       8       33         2       active sync        /dev/sdc1
       3       0        0         3       removed
       4       8       49         -       spare              /dev/sdd1
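If the degraded array can still be started on the three good members,
one possible way (untested) to get the spare rebuilding again is to
wipe its stale superblock and re-add it. Note that --zero-superblock
destroys the md metadata on that partition, so it must only be run on
the spare:
  mdadm --assemble --run /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
  mdadm --zero-superblock /dev/sdd1
  mdadm /dev/md0 --add /dev/sdd1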
* Re: RAID5 - 4 disk reboot trouble.
From: Neil Brown @ 2006-05-11 11:52 UTC
To: Guido Moonen; +Cc: linux-raid
On Thursday May 11, guido.moonen@axon.tv wrote:
> Hi,
>
> I'm running a RAID5 system, and when I reboot, my RAID seems to be
> failing. (One disk is set to spare and the other disks seem to be OK in
> the details output, but we get an Input/Output error when trying to
> mount it.)
>
> We cannot seem to find the problem in this setup.
...
> State : clean, degraded, recovering
^^^^^^^^^^
Do you ever let the recovery actually finish? Until you do, you don't
have real redundancy.
NeilBrown
* Re: RAID5 - 4 disk reboot trouble.
From: Guido Moonen @ 2006-05-11 12:00 UTC
To: Neil Brown; +Cc: linux-raid
Hi,
Computers in the field will be able to complete the whole cycle of
recovering and having a redundant array, but this is a situation that
can happen, and we are not sure what is causing this problem. I will
let one complete this recovery and try to reproduce this bug. But when
a customer replaces one of the drives, this process starts again, and
there will be a period where the system is not foolproof.
System use:
This system records a single channel 24/7 and saves the recorded data
(MPEG) on a RAID device. The system must be able to hold 90 days of
recorded material for compliance regulations. When the RAID fails,
users can lose up to 90 days of MPEG, which is not acceptable for
compliance (they must be able to produce the recorded MPEG for 90
days). So we would like to know if this failure can be avoided, or if
there is another configuration that makes it possible to recover from
this state.
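One thing we may also try (untested, and assuming our kernel 2.6.15
and mdadm 2.4.1 support internal write-intent bitmaps) is adding a
bitmap to the array, so that an unclean shutdown only needs a partial
resync instead of a full rebuild:
  mdadm --grow --bitmap=internal /dev/md0
  cat /proc/mdstat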
Guido.
Neil Brown wrote:
>On Thursday May 11, guido.moonen@axon.tv wrote:
>
>
>>Hi,
>>
>>I'm running a RAID5 system, and when I reboot, my RAID seems to be
>>failing. (One disk is set to spare and the other disks seem to be OK in
>>the details output, but we get an Input/Output error when trying to
>>mount it.)
>>
>>We cannot seem to find the problem in this setup.
>>
>>
>...
>
>
>> State : clean, degraded, recovering
>>
>>
> ^^^^^^^^^^
>
>Do you ever let the recovery actually finish? Until you do, you don't
>have real redundancy.
>
>NeilBrown
>
>
>
* Re: RAID5 - 4 disk reboot trouble.
From: Guido Moonen @ 2006-05-11 14:15 UTC
To: linux-raid; +Cc: Neil Brown
After some more tests:
A running system with a correct RAID array does not have any trouble
rebooting and re-assembling, but a system missing one of the disks
also crashes the RAID on a reboot.
I know we should have a fully synchronized 4-disk RAID array, but it
seems to me it should still be possible to assemble the array without
the fourth disk, multiple times. Is there something I should change in
my configuration, or anything I can do to prevent this?
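For instance, would pinning the array in mdadm.conf and assembling by
UUID be more robust than listing the devices by name? A minimal sketch
(untested; the UUID is the one reported by --detail below):
  # /etc/mdadm.conf
  DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  ARRAY /dev/md0 UUID=32c52389:27a260ee:ed154946:5e56f4ed
and then:
  mdadm --assemble --scan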
Guido.
Output of the correctly running system:
[root@localhost ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Thu May 11 12:05:31 2006
Raid Level : raid5
Array Size : 732419136 (698.49 GiB 750.00 GB)
Device Size : 244139712 (232.83 GiB 250.00 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 11 13:36:45 2006
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 32c52389:27a260ee:ed154946:5e56f4ed
Events : 0.4
    Number   Major   Minor   RaidDevice   State
       0       8        1         0       active sync        /dev/sda1
       1       8       17         1       active sync        /dev/sdb1
       2       8       33         2       active sync        /dev/sdc1
       3       8       49         3       active sync        /dev/sdd1
This system does not have the problem.
Output with a drive missing:
/dev/md0:
Version : 00.90.03
Creation Time : Thu May 11 12:05:31 2006
Raid Level : raid5
Device Size : 244139712 (232.83 GiB 250.00 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 11 14:09:09 2006
State : active, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 32c52389:27a260ee:ed154946:5e56f4ed
Events : 0.455
    Number   Major   Minor   RaidDevice   State
       0       8        1         0       active sync        /dev/sda1
       1       8       17         1       active sync        /dev/sdb1
       2       8       33         2       active sync        /dev/sdc1
       3       0        0         3       removed
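When a customer replaces the missing fourth drive, we would re-add it
along these lines (a sketch, assuming the new disk shows up as
/dev/sdd and is given a matching partition):
  mdadm /dev/md0 --add /dev/sdd1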
Guido Moonen wrote:
> Hi,
>
> Computers in the field will be able to complete the whole cycle of
> recovering and having a redundant array, but this is a situation that
> can happen, and we are not sure what is causing this problem. I will
> let one complete this recovery and try to reproduce this bug. But
> when a customer replaces one of the drives, this process starts
> again, and there will be a period where the system is not foolproof.
>
> System use:
> This system records a single channel 24/7 and saves the recorded
> data (MPEG) on a RAID device. The system must be able to hold 90 days
> of recorded material for compliance regulations. When the RAID fails,
> users can lose up to 90 days of MPEG, which is not acceptable for
> compliance (they must be able to produce the recorded MPEG for 90
> days). So we would like to know if this failure can be avoided, or if
> there is another configuration that makes it possible to recover from
> this state.
>
> Guido.
>
> Neil Brown wrote:
>
>> On Thursday May 11, guido.moonen@axon.tv wrote:
>>
>>
>>> Hi,
>>>
>>> I'm running a RAID5 system, and when I reboot, my RAID seems to be
>>> failing. (One disk is set to spare and the other disks seem to be OK
>>> in the details output, but we get an Input/Output error when trying
>>> to mount it.)
>>>
>>> We cannot seem to find the problem in this setup.
>>>
>>
>> ...
>>
>>
>>> State : clean, degraded, recovering
>>>
>>
>> ^^^^^^^^^^
>>
>> Do you ever let the recovery actually finish? Until you do, you don't
>> have real redundancy.
>>
>> NeilBrown
>>
>>
>>
>