* sw_raid5-failed_disc(s)
From: Ronny Plattner @ 2005-03-11 21:40 UTC
To: linux-raid
Hi !
The story:
We are running a software RAID 5 with 4 ATA discs under Debian. A week
ago, one disc (hdi) showed 3 reallocated sectors (which seemed to be
stable), so we decided to send it back to Maxtor, because the other
discs were fine.
After we had removed this disc, another disc (hdm) in the RAID array
filled our logfiles with a lot of seek errors. We were frustrated.
I decided to put the hard disc back into the server; after this
shutdown the RAID worked, and after some troubles [1] the disc was back
in the array. There are two partitions (LVM logical volumes) on the
RAID. I mounted one (/home, ext3) and copied the data to other discs.
No troubles during this operation!
Then I tried to mount the other partition. I was able to read the
directory structure, but when I tried to copy files, I got I/O errors.
While I was trying this, the RAID (mdadm is started by init scripts)
marked hdm as faulty...
The reallocated-sectors counter of hdm showed an increasing number of
reallocated sectors, and the ongoing resync of the RAID filled our logs
(some hundred MB) -> so we sent hdm to Maxtor as well.
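(For reference, a sketch of how such a counter can be watched -
smartctl from smartmontools prints the SMART attribute table:)
-snip-
server:~# smartctl -A /dev/hdm | grep -i reallocated
-snap-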
Now the RAID is running in degraded mode...
-snip-
server:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid1 hdg2[1] hde2[0]
75709248 blocks [2/2] [UU]
md2 : inactive hdk1[1] hdi1[4] hdo1[3]
735334848 blocks
md0 : active raid1 hdg1[1] hde1[0]
2441280 blocks [2/2] [UU]
unused devices: <none>
server:~# mdadm --detail /dev/md2
/dev/md2:
Version : 00.90.01
Creation Time : Mon Jun 14 18:43:20 2004
Raid Level : raid5
Device Size : 245111616 (233.76 GiB 250.99 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Sun Mar 6 16:40:29 2005
State : active, degraded
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : 72c49e3a:de37c4f8:00a6d8a2:e8fddb2c
Events : 0.60470022
   Number   Major   Minor   RaidDevice   State
      0        0       0        -        removed
      1       57       1        1        active sync   /dev/hdk1
      2        0       0        -        removed
      3       89       1        3        active sync   /dev/hdo1
      4       56       1        -        spare         /dev/hdi1
server:~#
-snap-
I tried a raidstart:
-snip-
server:~# raidstart /dev/md2
/dev/md2: Invalid argument
/dev/md2: Invalid argument
/dev/md2: Invalid argument
/dev/md2: Invalid argument
/dev/md2: Invalid argument
server:~#
-snap-
...and the corresponding entry in /var/log/messages:
-snip-
kernel: md: autostart failed!
-snap-
Okay... stopping the RAID:
-snip-
server:~# mdadm -S /dev/md2
server:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid1 hdg2[1] hde2[0]
75709248 blocks [2/2] [UU]
md0 : active raid1 hdg1[1] hde1[0]
2441280 blocks [2/2] [UU]
unused devices: <none>
-snap-
Fine.
Okay... reassembling it:
-snip-
server:~# mdadm --assemble --run --force /dev/md2 /dev/hdk /dev/hdo /dev/hdi
mdadm: no RAID superblock on /dev/hdk
mdadm: /dev/hdk has no superblock - assembly aborted
---> my fault :-)
server:~# mdadm --assemble --run --force /dev/md2 /dev/hdk1 /dev/hdo1
/dev/hdi1
mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
server:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid1 hdg2[1] hde2[0]
75709248 blocks [2/2] [UU]
md2 : inactive hdk1[1] hdi1[4] hdo1[3]
735334848 blocks
md0 : active raid1 hdg1[1] hde1[0]
2441280 blocks [2/2] [UU]
unused devices: <none>
server:~#
-snap-
...and mdadm --detail shows the same output as above.
Does anyone know what is happening here? I do not know how to get at
our data on the second partition (it holds XFS).
I hope one of the gurus here is able to help me (and our data).
Best Regards
Ronny
[1] Some troubles... I solved them with dmsetup; after that I was able
to reassemble the array. German-speaking people can read the whole
postings; non-German speakers can still follow "my way". Here are my
postings (and the raidtab too):
http://www.sbox.tugraz.at/home/p/plattner/raid5_history/
PS: There are also mdadm --examine outputs there.
* Re: sw_raid5-failed_disc(s)-2
From: Ronny Plattner @ 2005-03-12 13:36 UTC
To: linux-raid
Hi !
Today I tried the following:
-snip-
server:~# mdadm --assemble --run --force /dev/md2 /dev/hdi1 /dev/hdk1
missing /dev/hdo1
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted
server:~# cat /var/log/messages
Mar 12 14:27:49 server kernel: md: md2 stopped.
Mar 12 14:28:05 server kernel: md: md2 stopped.
-snap-
Maybe this information is useful.
Regards
Ronny
* RE: sw_raid5-failed_disc(s)-2
From: Guy @ 2005-03-12 17:49 UTC
To: 'Ronny Plattner', linux-raid
It seems like a trick question! :)
You don't use "missing" with assemble; it is a keyword for the create
command.
For assemble, just don't list that device.
mdadm --assemble --run --force /dev/md2 /dev/hdi1 /dev/hdk1 /dev/hdo1
mdadm will know which disk is which.
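A sketch of the distinction, using your device names (do NOT run the
create line against live data - --create rewrites the superblocks):
-snip-
# "missing" is only meaningful with --create, where it reserves an empty slot:
#   mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/hdi1 /dev/hdk1 missing /dev/hdo1
# With --assemble you simply leave the absent device out:
mdadm --assemble --run --force /dev/md2 /dev/hdi1 /dev/hdk1 /dev/hdo1
-snap-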
Guy
* Re: sw_raid5-failed_disc(s)-2
From: Ronny Plattner @ 2005-03-12 18:53 UTC
To: linux-raid
Hi !
Guy wrote:
> It seems like a trick question! :)
> You don't use "missing" with assemble; it is a keyword for the create command.
Oh!
> For assemble, just don't list that device.
> mdadm --assemble --run --force /dev/md2 /dev/hdi1 /dev/hdk1 /dev/hdo1
>
> mdadm will know which disk is which.
Okay, thank you.
-snip-
server:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid1 hdg2[1] hde2[0]
75709248 blocks [2/2] [UU]
md0 : active raid1 hdg1[1] hde1[0]
2441280 blocks [2/2] [UU]
unused devices: <none>
server:~# mdadm --assemble --run --force /dev/md2 /dev/hdi1 /dev/hdk1
/dev/hdo1
mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
server:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid1 hdg2[1] hde2[0]
75709248 blocks [2/2] [UU]
md2 : inactive hdk1[1] hdi1[4] hdo1[3]
735334848 blocks
md0 : active raid1 hdg1[1] hde1[0]
2441280 blocks [2/2] [UU]
unused devices: <none>
-snap-
and
-snip-
server:~# mdadm --detail /dev/md2
/dev/md2:
Version : 00.90.01
Creation Time : Mon Jun 14 18:43:20 2004
Raid Level : raid5
Device Size : 245111616 (233.76 GiB 250.99 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Sun Mar 6 16:40:29 2005
State : active, degraded
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : 72c49e3a:de37c4f8:00a6d8a2:e8fddb2c
Events : 0.60470022
   Number   Major   Minor   RaidDevice   State
      0        0       0        -        removed
      1       57       1        1        active sync   /dev/hdk1
      2        0       0        -        removed
      3       89       1        3        active sync   /dev/hdo1
      4       56       1        -        spare         /dev/hdi1
server:~#
-snap-
Why is hdi1 (the drive which was removed and, after the troubles with
hdm, put back into the server) a spare device?
Why isn't it marked as an active device?
Is it hopeless to get the RAID working?
There is not much information in the logfile (/var/log/messages)...
-snip-
Mar 12 19:33:57 server kernel: md: md2 stopped.
Mar 12 19:33:57 server kernel: md: bind<hdo1>
Mar 12 19:33:57 server kernel: md: bind<hdi1>
Mar 12 19:33:57 server kernel: md: bind<hdk1>
Mar 12 19:33:57 server kernel: raid5: device hdk1 operational as raid disk 1
Mar 12 19:33:57 server kernel: raid5: device hdo1 operational as raid disk 3
Mar 12 19:33:57 server kernel: RAID5 conf printout:
Mar 12 19:33:57 server kernel: --- rd:4 wd:2 fd:2
Mar 12 19:33:57 server kernel: disk 1, o:1, dev:hdk1
Mar 12 19:33:57 server kernel: disk 3, o:1, dev:hdo1
-snap-
So, looks bad (rd:4 wd:2 fd:2 - only two of the four member disks are
working).
-snip-
server:~# mdadm --detail --scan
ARRAY /dev/md1 level=raid1 num-devices=2
UUID=83effa33:2b40093b:84150bda:54575d43
devices=/dev/hde2,/dev/hdg2
ARRAY /dev/md2 level=raid5 num-devices=4 spares=1
UUID=72c49e3a:de37c4f8:00a6d8a2:e8fddb2c
devices=/dev/hdk1,/dev/hdo1,/dev/hdi1
ARRAY /dev/md0 level=raid1 num-devices=2
UUID=7131d3b3:6e026b72:52e66138:1401e760
devices=/dev/hde1,/dev/hdg1
-snap-
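For completeness, a sketch of how the per-device superblocks could be
compared - the full mdadm --examine outputs are on the page linked in
my first posting:
-snip-
server:~# mdadm --examine /dev/hdi1 | grep -i -e events -e state
server:~# mdadm --examine /dev/hdk1 | grep -i -e events -e state
server:~# mdadm --examine /dev/hdo1 | grep -i -e events -e state
-snap-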
Is there a way to get the RAID working?
Regards
Ronny
* RE: sw_raid5-failed_disc(s)-2
From: Guy @ 2005-03-12 22:15 UTC
To: 'Ronny Plattner', linux-raid
I guess I need to know more history. Before your problems, was hdi1 a
spare?
Describe the array before you had problems. Then what went wrong.
Guy
* Re: sw_raid5-failed_disc(s)-2
From: Ronny Plattner @ 2005-03-13 22:15 UTC
To: linux-raid
Hi !
Guy wrote:
> I guess I need to know more history. Before your problems, was hdi1
> a spare?
I created the array with the following raidtab:
-snip-
# now the 4 Maxtor drives in RAID 5
raiddev /dev/md2
raid-level 5
nr-raid-disks 4
# converted from 32 to 64 with raidreconf - ronny, June 2004
chunk-size 64
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
device /dev/hdi1
raid-disk 0
device /dev/hdk1
raid-disk 1
device /dev/hdm1
raid-disk 2
device /dev/hdo1
raid-disk 3
-snap-
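(For comparison, a hedged sketch of what the equivalent mdadm creation
command would look like - we actually used the raidtools back then, so
this is illustrative only:)
-snip-
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=64 \
      --layout=left-symmetric /dev/hdi1 /dev/hdk1 /dev/hdm1 /dev/hdo1
-snap-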
> Describe the array before you had problems. Then what went wrong.
hdi showed 3 reallocated sectors. I decided to send it back to Maxtor,
so we shut down the server, removed the disc from the tower and turned
the server back on. After some hours, people told us that there were
some problems. I saw that hdm was marked as faulty... and smartctl
showed some errors.
We put hdi back into the server and tried to add it back to the array.
But -> troubles... it was not possible (after some power-offs hdm was
"active" in the array and not marked as faulty) to re-add hdi to the
RAID. After a while I solved it (dmsetup status was the key): I removed
hdi (dmsetup remove hdi), and then it was possible to re-add hdi to the
array via mdadm /dev/md2 -a /dev/hdi1.
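(Reconstructed as a sketch from memory - "hdi" here is the
device-mapper name that dmsetup reported:)
-snip-
server:~# dmsetup status
server:~# dmsetup remove hdi
server:~# mdadm /dev/md2 -a /dev/hdi1
-snap-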
I was able to mount one partition (LVM) of the RAID and copied the
data to other hard discs (hde, hdg, ...). I was also able to mount the
second partition of the RAID (data) and to read the directory
structure, but I always got I/O errors when I tried to copy files.
While I was trying to back up the data partition, the rebuild process
of the RAID (which had started after I re-added hdi1 to the array)
filled our logs (several entries per second), so we powered off the
server and removed hdm (and AFAIK it is still on my colleague's desk).
There we are now... you have read the postings, I think.
Do you know how to solve our problem, or is the data lost?
Regards
Ronny