linux-raid.vger.kernel.org archive mirror
* Troubles making a raid5 system work.
From: Francisco Zafra @ 2005-05-29 15:59 UTC
  To: linux-raid

Hello,
 
 This is my first message to the list. First of all, apologies for my
horrible English; I will try to do my best.
 I have 8 new 200 GB SATA HDs; my distro is Ubuntu 5.04, with mdadm v1.9.0
and kernel 2.6.11.8.
 I created the array with the following command:

    mdadm --create --verbose /dev/md0 --chunk=512 --level=raid5 \
        --raid-devices=8 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 \
        /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
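 (While it builds, I believe the progress can be followed with something
like this:)

    watch -n 10 cat /proc/mdstat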

 I also set up /etc/mdadm/mdadm.conf:

	root@Torero-2:/usr/src # cat /etc/mdadm/mdadm.conf
	DEVICE /dev/sd[abcdefgh]1
	ARRAY /dev/md0 level=raid5 num-devices=8 UUID=f28b2043:a30358af:1c6d9640:49302af7
	   devices=/dev/sdh1,/dev/sdg1,/dev/sdf1,/dev/sde1,/dev/sdd1,/dev/sdc1,/dev/sdb1,/dev/sda1
	   auto=yes
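 (If I read the mdadm man page right, it can also generate the ARRAY line
by itself, something like:)

	# append an ARRAY line for the assembled array to the config
	mdadm --detail --scan >> /etc/mdadm/mdadm.conf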
 
This starts building the array, which takes several hours until /proc/mdstat
reports the array active and supposedly ready to work. But my problem is the
following: after creating the array it began to do "strange things"...
When the create command finished, /proc/mdstat reported the following:

	root@Torero-2:/usr/src # cat /proc/mdstat
	Personalities : [linear] [raid5]
	md0 : active raid5 sda1[0] sdh1[8] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[9](F) sdb1[1]
	      1367507456 blocks level 5, 256k chunk, algorithm 2 [8/6] [UU_UUUU_]

	unused devices: <none>

Everything seems right here, but I saw activity on the HDD LED, so I ran
mdadm --detail and obtained this:

root@Torero-2:/usr/src # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Tue May 24 20:02:28 2005
     Raid Level : raid5
     Array Size : 1367507456 (1304.16 GiB 1400.33 GB)
    Device Size : 195358208 (186.31 GiB 200.05 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun May 29 17:29:45 2005
          State : clean, degraded
 Active Devices : 6
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

           UUID : f28b2043:a30358af:1c6d9640:49302af7
         Events : 0.48957213

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       0        0        -      removed
       3       8       49        3      active sync   /dev/sdd1
       4       8       65        4      active sync   /dev/sde1
       5       8       81        5      active sync   /dev/sdf1
       6       8       97        6      active sync   /dev/sdg1
       7       0        0        -      removed

       8       8      113        7      spare rebuilding   /dev/sdh1
       9       8       33        -      faulty   /dev/sdc1

Really strange. The CPU load is also high... load average: 1.89, 1.87, 1.82.
And in the system logs I have thousands of messages like this, which were
not being generated during the create command:

May 29 17:34:45 localhost kernel: md: syncing RAID array md0
May 29 17:34:45 localhost kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
May 29 17:34:45 localhost kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
May 29 17:34:45 localhost kernel: md: using 128k window, over a total of 195358208 blocks.
May 29 17:34:45 localhost kernel: md: md0: sync done.
May 29 17:34:45 localhost kernel: md: syncing RAID array md0
May 29 17:34:45 localhost kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
May 29 17:34:45 localhost kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
May 29 17:34:45 localhost kernel: md: using 128k window, over a total of 195358208 blocks.
May 29 17:34:45 localhost kernel: md: md0: sync done.

I really had to stop the array because the log files were getting HUGE. So I
ran mdadm -S /dev/md0, which stopped the array and the flood of messages.
If I try to run the array again, it has no effect...

root@Torero-2:/usr/src # mdadm -R /dev/md0
mdadm: failed to run array /dev/md0: Invalid argument
root@Torero-2:/usr/src # mdadm -A /dev/md0
mdadm: /dev/md0 assembled from 6 drives and 1 spare - not enough to start the array.
root@Torero-2:/usr/src # cat /proc/mdstat
Personalities : [linear] [raid5]
md0 : inactive sda1[0] sdh1[8] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      1562865664 blocks
unused devices: <none>
root@Torero-2:/usr/src # mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
root@Torero-2:/usr/src # mdadm -R /dev/md0
mdadm: failed to run array /dev/md0: Invalid argument

I have tried this several times; I have even erased and checked each drive
with:

	mdadm --zero-superblock /dev/sdd
	dd if=/dev/sdd of=/dev/null bs=1024k
	badblocks -svw /dev/sdd

But everything is OK; the hardware (the HDs) is fine... But when I tried to
set it up again I had the same problems. So it must be a config problem or a
software problem.

Can anyone help me with this RAID? I am a little desperate...

Thanks to all in advance...

Paco Zafra.

PS: I previously sent this mail from an unauthorized mail account. I waited
some minutes and didn't see it on the mailing list, so I am resending it. I
hope the mail doesn't end up duplicated.



* Re: Troubles making a raid5 system work.
From: Molle Bestefich @ 2005-05-30  7:53 UTC
  To: Francisco Zafra; +Cc: linux-raid

Francisco Zafra wrote:
>  I have 8 new 200 GB SATA HDs, mdadm v1.9.0 and kernel 2.6.11.8.

> When the create command finished, /proc/mdstat reported the following:
>         md0 : active raid5 sda1[0] sdh1[8] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[9](F) sdb1[1]
>         1367507456 blocks level 5, 256k chunk, algorithm 2 [8/6] [UU_UUUU_]

Odd that there are two missing disks in [UU_UUUU_], but only one (F)
marker on the line above.

> so I ran mdadm --detail and obtained this:
> /dev/md0:
>         Version : 00.90.01
>   Creation Time : Tue May 24 20:02:28 2005
>      Raid Level : raid5
>      Array Size : 1367507456 (1304.16 GiB 1400.33 GB)
>     Device Size : 195358208 (186.31 GiB 200.05 GB)
>    Raid Devices : 8
>   Total Devices : 8
> Preferred Minor : 0
>     Persistence : Superblock is persistent
> 
>     Update Time : Sun May 29 17:29:45 2005
>           State : clean, degraded
>  Active Devices : 6
> Working Devices : 7
>  Failed Devices : 1
>   Spare Devices : 1

Oh, so that's why there's a missing F.

MD has assigned one of the disks to be a Spare device, even though you
did not specify any spares on the mdadm command line or in the .conf
file.

No clue why, but seems wrong!!

>        8       8      113        7      spare rebuilding   /dev/sdh1

MD's trying to rebuild with the spare.

>        9       8       33        -      faulty   /dev/sdc1

Doesn't look good.

> in the system logs I have thousands of messages like this, which were
> not being generated during the create command:

[snip repeated sync start/done messages]

I had the same problem.
There was a bug in MD that caused this when multiple devices fail while
syncing.
See this thread for details:

http://thread.gmane.org/gmane.linux.raid/7714

(Ignore everything from Patrik Jonsson / "toy array" and onwards; it's
just someone who doesn't know how their mailer works - it shouldn't have
been part of the thread.)

> # mdadm -R /dev/md0
> mdadm: failed to run array /dev/md0: Invalid argument

Hm, could be a bug, or maybe it's just a misleading error message.

I wouldn't expect anyone to be able to figure out what's going on from
the two words "Invalid argument", so if it can be fixed, this should
definitely say something a little more informative.

> I have tried this several times; I have even erased and checked each
> drive with:
> 
>         mdadm --zero-superblock /dev/sdd
>         dd if=/dev/sdd of=/dev/null bs=1024k
>         badblocks -svw /dev/sdd

Perhaps there is a more subtle hardware problem; cable problems, for
example, are common with SATA drives.

If you're using Maxtor disks, you could try testing each disk with
their PowerMax utility, available for download on their web site.

It might be that your problem only occurs when multiple disks are
accessed at the same time.  You could:
 - Try the above dd, but run it in the background with "dd <...> &" for
multiple disks at the same time (see the sketch after this list).
 - Nuke the superblocks and create the array again, but this time run
'tail -f /var/log/messages | grep -v md:' before you start, to check
for any IDE messages you might have missed.
 - Apply the patch that Neil Brown mentions in the thread linked above
and see if things become clearer.
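
A minimal sketch of that parallel read test, assuming your drives are
still named sd[a-h]:

  # hammer all 8 disks at once to provoke controller/cabling problems
  for d in a b c d e f g h; do
      dd if=/dev/sd${d}1 of=/dev/null bs=1024k &
  done
  wait    # wait for all the background dd's to finish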


* RE: Troubles making a raid5 system work.
From: Francisco Zafra @ 2005-06-01 17:51 UTC
  To: 'Molle Bestefich'; +Cc: linux-raid

 
Thanks Molle,

Finally I got the RAID system working fine :-) I followed your steps and it
worked... This is exactly what I did:
- I applied the patch
md-make-raid5-and-raid6-robust-against-failure-during-recovery.patch to my
kernel.
- dd'd over all the hard disks, erasing the superblock info and everything...
- Created the array again from scratch.
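
(In case it helps somebody, I applied the patch roughly like this - the -p
level may differ depending on how the patch was generated:)

	cd /usr/src/linux
	patch -p1 < md-make-raid5-and-raid6-robust-against-failure-during-recovery.patch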

I checked the logs and everything seems to be right.

Thanks again.

By the way... I have two questions.
1. Will this patch be included in new kernel versions, or do I have to
apply it each time I compile a new kernel?
2. Working with big files (700 MB) on the RAID consumes a lot of CPU
resources; is this normal? I have a Pentium 4 at 3 GHz and 1 GB of RAM...

That's all.


[snip full quote of previous message]


* Re: Troubles making a raid5 system work.
From: Molle Bestefich @ 2005-06-02  0:56 UTC
  To: Francisco Zafra; +Cc: linux-raid

Francisco Zafra wrote:
> Finally I made the raid system work fine :-)
Cool beans ;-).

> 1. Will this patch be included in new kernel versions, or do I have to
> apply it each time I compile a new kernel?

It will be included at some point, probably "soon".
I'm not part of that process, so I can't really say for sure when...

> 2. Working with big files (700 MB) on the RAID consumes a lot of CPU
> resources; is this normal? I have a Pentium 4 at 3 GHz and 1 GB of RAM...

Probably depends on how you define "working".
For simple operations, my *guess* would be that your CPU should
outperform your I/O subsystem, at least if you have DMA enabled.  You
can check whether that's the case with 'hdparm /dev/hd<x>'.
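
For example, something like this (assuming IDE-style /dev/hd* naming; I'm
not sure SATA disks behind libata report it the same way):

  hdparm /dev/hda | grep using_dma    # "1 (on)" means DMA is enabled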

I'm not sure how to analyze your CPU problem.  You're probably
interested in knowing whether it's MD doing XOR for raid5 or the
kernel busy-waiting for your IDE disks - unless you are doing other
things, like encryption for example.  Perhaps a tool like 'top' can
help you.  But again, I have no really good idea how to find out.
Perhaps others can be helpful?
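
One crude sketch, my guess at how to read vmstat while you copy a big
file to the array:

  vmstat 1
  # reading the cpu columns:
  #   high 'sy' -> kernel work (e.g. raid5 XOR/parity calculation)
  #   high 'wa' -> mostly waiting on the disks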

With 8 disks and a standard PCI bus, my guess would be that your PCI
bus would be the first thing to get saturated.
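
Back-of-the-envelope, assuming the controllers sit on a classic 32-bit /
33 MHz PCI bus: the bus tops out around 133 MB/s in theory, while eight
drives at even 40-50 MB/s each could stream 300-400 MB/s, so the bus
saturates long before the disks do.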
