* only 4 spares and no access to my data
@ 2006-07-09 18:59 Karl Voit
  2006-07-09 19:23 ` Molle Bestefich
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Voit @ 2006-07-09 18:59 UTC (permalink / raw)
  To: linux-raid

Hi!

I created a sw-raid md0 and an LVM on top of it with four 250GB Samsung
SATA disks a couple of months ago. I am not a RAID expert, but I thought
I could handle it with a little help from my friends from grml: Andreas
"jimmy" Gredler and Michael "mika" Prokop.

,----
|      md0  <future mds>      (PV:s on partitions or whole disks) 
|        \   /   
|         \ /      
|        datavg             (VG)
|           |     
|           |    
|        datalv           (LV)
|           |                                  
|         ext3         (filesystem) 
`----

HW: Promise FastTrack SATA controller on a P3 board. (A previously
used - and preferred - Dawicontrol DC-150 did not work at all: I could
not access the HDDs.)

Approximately once a month, there was a short timeout that caused a
disk to be removed from the RAID. A SMART check and a resync (hot-add)
had solved the problem every time so far.

,----[ syslog ]
| May  1 23:12:51 ned kernel: ata2: command timeout
| May  1 23:12:51 ned kernel: ata2: translated ATA stat/err 0x25/00\
|  to SCSI SK/ASC/ASCQ 0x4/00/00
| May  1 23:12:51 ned kernel: ata2: status=0x25 { DeviceFault\
|  CorrectedError Error }
| May  1 23:12:51 ned kernel: SCSI error : <1 0 0 0> return code =\
|  0x8000002
| May  1 23:12:51 ned kernel: sdb: Current: sense key: Hardware Error
| May  1 23:12:51 ned kernel: Additional sense: No additional sense\
|  information
| May  1 23:12:51 ned kernel: end_request: I/O error, dev sdb, sector\
|  179281983
| May  1 23:12:51 ned kernel: raid5: Disk failure on sdb1, disabling\
|  device. Operation continuing on 3 devices
| May  1 23:12:51 ned kernel: RAID5 conf printout:
| May  1 23:12:51 ned kernel: --- rd:4 wd:3 fd:1
| May  1 23:12:51 ned kernel: disk 0, o:1, dev:sda1
| May  1 23:12:51 ned kernel: disk 1, o:0, dev:sdb1
| May  1 23:12:51 ned kernel: disk 2, o:1, dev:sdc1
| May  1 23:12:51 ned kernel: disk 3, o:1, dev:sdd1
| May  1 23:12:51 ned kernel: RAID5 conf printout:
| May  1 23:12:51 ned kernel: --- rd:4 wd:3 fd:1
| May  1 23:12:51 ned kernel: disk 0, o:1, dev:sda1
| May  1 23:12:51 ned kernel: disk 2, o:1, dev:sdc1
| May  1 23:12:51 ned kernel: disk 3, o:1, dev:sdd1
`----
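
The recovery each time was roughly the following (a sketch from memory,
so treat it accordingly; smartctl is from the smartmontools package, and
sdb/sdb1 stand for whichever disk got kicked out):

smartctl -a /dev/sdb                # check whether the disk has real defects
mdadm /dev/md0 --remove /dev/sdb1   # remove the failed member
mdadm /dev/md0 --add /dev/sdb1      # hot-add it back; md resyncs it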

But two weeks ago there was another timeout during such a resync, and
that was the beginning of my problem.

Short summary (for the impatient)
=============

sda and sdb were removed, hot-adding did not work out, and I naively
thought that removing and re-adding the drives could solve my problem.
Bad idea.

Now I am not able to get the RAID working: all drives are marked as
spares and they can't be assembled:


root@ned ~ # mdadm --examine /dev/sd[abcd]1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2dfe6 - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8        1        4      spare   /dev/sda1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2dffa - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       17        6      spare   /dev/sdb1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2e008 - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       33        5      spare   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2e01c - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     7       8       49        7      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
root@ned ~ #


root@grml ~ # date;cat /proc/mdstat
Di Jul  4 21:36:15 CEST 2006
Personalities : [linear] [raid0] [raid1] [raid10] [raid5] [raid4]\
 [raid6] [multipath]
unused devices: <none>
root@grml ~ # mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
1 root@grml ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
 /dev/sdc1 /dev/sdd1
mdadm: /dev/md0 assembled from 0 drives and 4 spares - not enough to\
 start the array.
1 root@grml ~ # mdadm --stop /dev/md0
root@grml ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
 /dev/sdc1 /dev/sdd1 --force
mdadm: /dev/md0 assembled from 0 drives and 4 spares - not\
 enough to start the array.
1 root@grml ~ # mdadm --zero-superblock /dev/sda
mdadm: Couldn't open /dev/sda for write - not zeroing
1 root@grml ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
 /dev/sdc1 /dev/sdd1 --run
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
1 root@grml ~ #


Andreas Gredler suggested the following commands as a last resort, but
they risk losing data, which I want to avoid:

mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda
mdadm --zero-superblock /dev/sdb
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1\
 /dev/sdd1 --force
mdadm --create -n 4 -l 5 /dev/md0 missing /dev/sdb1\
 /dev/sdc1 /dev/sdd1

Is there another solution to get to my data?

Thank you!



Background history (the whole story - director's cut)
==================

I published the whole story (as much as I could log during my reboots
and so on) on the web:

              http://paste.debian.net/8779

It is available for 72 hours from now. If you want to read it
afterwards, please write me an email and I will send you the log.

Please feel free to visit that page, and do not hesitate to tell me
what else I can check!


mdadm-version: 1.12.0-1
uname: Linux ned 2.6.13-grml #1 Tue Oct 4 18:24:46 CEST 2005\
       i686 GNU/Linux



* Re: only 4 spares and no access to my data
  2006-07-09 18:59 only 4 spares and no access to my data Karl Voit
@ 2006-07-09 19:23 ` Molle Bestefich
  2006-07-10  7:56   ` Karl Voit
  2006-07-10  8:48   ` Karl Voit
  0 siblings, 2 replies; 19+ messages in thread
From: Molle Bestefich @ 2006-07-09 19:23 UTC (permalink / raw)
  To: Karl Voit; +Cc: linux-raid

Karl Voit wrote:
> I published the whole story (as much as I could log during my reboots
> and so on) on the web:
>
>               http://paste.debian.net/8779

From the paste bin:

> 443: root@ned ~ # mdadm --examine /dev/sd[abcd]

Shows that all 4 devices are ACTIVE SYNC....

Next command:

> 563: root@ned ~ # mdadm --assemble --update=summaries /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
> mdadm: /dev/md0 assembled from 0 drives and 4 spares - not enough to start the array.

Then:

> 568: root@ned ~ # mdadm --examine /dev/sd[abcd]1

Suddenly shows all 4 devices as SPARE?

What the heck happened in between?
Did you do anything evil, or is it an MD bug, or what?


> mdadm-version: 1.12.0-1
> uname: Linux ned 2.6.13-grml

You should probably upgrade at some point; there's always a better
chance that developers will look at your problem if you're running the
version that they're sitting with.


> Andreas Gredler suggested the following commands as a last resort, but
> they risk losing data, which I want to avoid:
>
> mdadm --stop /dev/md0
> mdadm --zero-superblock /dev/sda
> mdadm --zero-superblock /dev/sdb
> mdadm --zero-superblock /dev/sdc
> mdadm --zero-superblock /dev/sdd
> mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1\
>  /dev/sdd1 --force
> mdadm --create -n 4 -l 5 /dev/md0 missing /dev/sdb1\
>  /dev/sdc1 /dev/sdd1

Running zero-superblock on "sd[abcd]" and then assembling the array
from "sd[abcd]_1_" sounds odd to me.


* Re: only 4 spares and no access to my data
  2006-07-09 19:23 ` Molle Bestefich
@ 2006-07-10  7:56   ` Karl Voit
  2006-07-10  8:46     ` Henrik Holst
  2006-07-10  8:48   ` Karl Voit
  1 sibling, 1 reply; 19+ messages in thread
From: Karl Voit @ 2006-07-10  7:56 UTC (permalink / raw)
  To: linux-raid

Molle Bestefich <molle.bestefich <at> gmail.com> writes:

> From the paste bin:
> 
> > 443: root <at> ned ~ # mdadm --examine /dev/sd[abcd]
> 
> Shows that all 4 devices are ACTIVE SYNC....

Please note that there is no "1" behind sda up to sdd!

> Then:
> 
> > 568: root <at> ned ~ # mdadm --examine /dev/sd[abcd]1
> 
> Suddenly shows all 4 devices as SPARE?

Now these are the sda1 up to sdd1 (with "1"!).

Probably the superblocks are damaged or wrong?

> > mdadm-version: 1.12.0-1
> > uname: Linux ned 2.6.13-grml
> 
> You should probably upgrade at some point; there's always a better
> chance that developers will look at your problem if you're running the
> version that they're sitting with.

Good point.
 
> Running zero-superblock on "sd[abcd]" and then assembling the array
> from "sd[abcd]_1_" sounds odd to me.

Well, this is because of the false(?) superblocks on sda-sdd as
compared to those on sda1-sdd1.




* Re: only 4 spares and no access to my data
  2006-07-10  7:56   ` Karl Voit
@ 2006-07-10  8:46     ` Henrik Holst
  2006-07-10  9:27       ` Karl Voit
                         ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Henrik Holst @ 2006-07-10  8:46 UTC (permalink / raw)
  To: Karl Voit; +Cc: linux-raid

Karl Voit wrote:
[snip]
> Well, this is because of the false(?) superblocks on sda-sdd as
> compared to those on sda1-sdd1.

I don't understand this. Do you have more than a single partition on
sda? Is sda1 occupying the entire disk? Since the superblock is the
/last/ "128Kb" (I'm assuming 128*1024 bytes), the superblocks should be
one and the same.

Henrik Holst



* Re: only 4 spares and no access to my data
  2006-07-09 19:23 ` Molle Bestefich
  2006-07-10  7:56   ` Karl Voit
@ 2006-07-10  8:48   ` Karl Voit
  1 sibling, 0 replies; 19+ messages in thread
From: Karl Voit @ 2006-07-10  8:48 UTC (permalink / raw)
  To: linux-raid

Molle Bestefich <molle.bestefich <at> gmail.com> writes:

> You should probably upgrade at some point, there's always a better
> chance that devels will look at your problem if you're running the
> version that they're sitting with..

OK, I upgraded my kernel and mdadm:

"uname -a":
Linux ned 2.6.13-grml #1 Tue Oct 4 18:24:46 CEST 2005 i686 GNU/Linux

"dpkg --list mdadm" --> "2.4.1-6"

Now I get slightly different messages. The problem seems to be
the superblocks. Can I repair them?

root@ned ~ # date;cat /proc/mdstat
Mon Jul 10 10:41:45 CEST 2006
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]\
 [raid6] [raid10]
unused devices: <none>
root@ned ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
 /dev/sdc1 /dev/sdd1
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has no superblock - assembly aborted
root@ned ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
 /dev/sdc1 /dev/sdd1 --force
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has no superblock - assembly aborted
root@ned ~ #

If I omit sda1, the message is repeated, but complaining about the
missing superblock on sdb1.



* Re: only 4 spares and no access to my data
  2006-07-10  8:46     ` Henrik Holst
@ 2006-07-10  9:27       ` Karl Voit
  2006-07-10  9:34       ` Karl Voit
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Karl Voit @ 2006-07-10  9:27 UTC (permalink / raw)
  To: linux-raid

Henrik Holst <henrik.holst <at> idgmail.se> writes:

> Karl Voit wrote:
> [snip]
> > Well, this is because of the false(?) superblocks on sda-sdd as
> > compared to those on sda1-sdd1.
>
> I don't understand this.

Me neither *g*

That hint came from a friend of mine, who is a lot more experienced
with software RAID.

> Do you have more than a single partition on sda?

No.

> Is sda1 occupying the entire disk?

Yes.

root@ned ~ # fdisk -l /dev/sda

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       30395   244147806   fd  Linux raid autodetect
root@ned ~ #

I created the raid5 with sd[abcd]1 and not with sd[abcd].

> Since the superblock is the /last/
> "128Kb" (I'm assuming 128*1024 bytes), the superblocks should be one and
> the same.

Really? Then how come md0 won't start? I also upgraded my system
(kernel and mdadm) and now I get longer messages about the superblocks,
which I already posted here in my previous posting.

root@ned ~ # date;cat /proc/mdstat
Mon Jul 10 10:56:44 CEST 2006
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
unused devices: <none>
root@ned ~ #


TNX so far, I appreciate your help!




* Re: only 4 spares and no access to my data
  2006-07-10  8:46     ` Henrik Holst
  2006-07-10  9:27       ` Karl Voit
@ 2006-07-10  9:34       ` Karl Voit
  2006-07-10 11:16         ` Molle Bestefich
  2006-07-10 11:18       ` only 4 spares and no access to my data Molle Bestefich
  2006-07-18  2:17       ` Neil Brown
  3 siblings, 1 reply; 19+ messages in thread
From: Karl Voit @ 2006-07-10  9:34 UTC (permalink / raw)
  To: linux-raid

Henrik Holst <henrik.holst <at> idgmail.se> writes:

> I don't understand this. Do you have more than a single partition on
> sda? Is sda1 occupying the entire disk? Since the superblock is the
> /last/ "128Kb" (I'm assuming 128*1024 bytes), the superblocks should be
> one and the same.

I should have mentioned that I did not use the whole hard drive space
for sd[abcd]1. I thought that if I ever have to replace one of my
Samsungs with another drive that does not have exactly the same
capacity, I'd better use exactly 250GB partitions and forget the last
approx. 49MB of the drives.

HTH



* Re: only 4 spares and no access to my data
  2006-07-10  9:34       ` Karl Voit
@ 2006-07-10 11:16         ` Molle Bestefich
  2006-07-10 11:42           ` Karl Voit
  0 siblings, 1 reply; 19+ messages in thread
From: Molle Bestefich @ 2006-07-10 11:16 UTC (permalink / raw)
  To: Karl Voit; +Cc: linux-raid

Karl Voit wrote:
> > > 443: root <at> ned ~ # mdadm --examine /dev/sd[abcd]
> >
> > Shows that all 4 devices are ACTIVE SYNC....
>
> Please note that there is no "1" behind sda up to sdd!

Yes, you're right.

Seems you've created an array/superblocks on both sd[abcd] (line 443
onwards), and on sd[abcd]1 (line 66 and onward).

I'm unsure why 'pvscan' says there is an LVM PV on sda1 (line
118/119).  Probably it's a misfeature in LVM, causing it to find the
PV inside the MD volume if the array has not been started (since it
says that the PV is ~700 GB).


> > Running zero-superblock on "sd[abcd]" and then assembling the array
> > from "sd[abcd]_1_" sounds odd to me.
>
> Well, this is because of the false(?) superblocks on sda-sdd as
> compared to those on sda1-sdd1.

Yes, OK, I missed that part of the story.
In that case it sounds sane to zero the superblocks on sd[abcd], seeing
that 'pvscan' and 'lvscan' find live data that you could back up on the
array consisting of sd[abcd]1.
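
By the way, if you want a safety net before zeroing anything, you could
copy the superblock regions away with dd first.  A rough sketch - the
skip offsets are my own calculation from the usual 0.90 placement
(device size rounded down to a 64K multiple, minus 64K, in 1K blocks),
so recompute them for your disks before trusting this:

dd if=/dev/sda of=sda-sb.bin bs=1k skip=244198464 count=64    # whole-disk sb
dd if=/dev/sda1 of=sda1-sb.bin bs=1k skip=244147712 count=64  # partition sb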


> root@ned ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
>  /dev/sdc1 /dev/sdd1
> mdadm: cannot open device /dev/sda1: Device or resource busy
> mdadm: /dev/sda1 has no superblock - assembly aborted

Odd message.  Does "lsof | grep sda" show anything using /dev/sda(1)?
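
I'd also check whether something grabbed the disks at boot, even though
your mdstat looked empty - just a sketch of what I'd try (fuser is from
psmisc):

cat /proc/mdstat        # any auto-assembled array holding sda1?
mdadm --stop /dev/md0   # if so, stop it before assembling by hand
fuser -v /dev/sda1      # anything else holding the device open?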


> I should have mentioned that I did not use the whole hard drive space
> for sd[abcd]1. I thought that if I ever have to replace one of my
> Samsungs with another drive that does not have exactly the same
> capacity, I'd better use exactly 250GB partitions and forget the last
> approx. 49MB of the drives.

Good idea.


> The problem seems to be the superblocks.

Which ones, those on sd[abcd]1 ?
You've probably destroyed them by syncing the array consisting of sd[abcd].


> Can I repair them?

No, but you can recreate them without touching your data.
I think the suggestion from Andreas Gredler sounds sane.

I'm unsure if hot-adding a device will recreate a superblock on it.
Therefore I'd probably run --create on all four devices and use sysfs
to force a repair, instead of (as Andreas suggests) creating the array
with one 'missing' device.

Do remember to zero the superblocks on sd[abcd] first, to prevent mishaps...
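
In other words, something along these lines - an untested sketch, so
verify the device order against your paste bin first; --assume-clean
needs a reasonably recent mdadm, and without it the create kicks off a
full parity resync:

mdadm --zero-superblock /dev/sd[abcd]
mdadm --create /dev/md0 -n 4 -l 5 -c 64 --assume-clean \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
echo repair > /sys/block/md0/md/sync_action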


* Re: only 4 spares and no access to my data
  2006-07-10  8:46     ` Henrik Holst
  2006-07-10  9:27       ` Karl Voit
  2006-07-10  9:34       ` Karl Voit
@ 2006-07-10 11:18       ` Molle Bestefich
  2006-07-18  2:17       ` Neil Brown
  3 siblings, 0 replies; 19+ messages in thread
From: Molle Bestefich @ 2006-07-10 11:18 UTC (permalink / raw)
  To: linux-raid

Henrik Holst wrote:
> Is sda1 occupying the entire disk? Since the superblock is the /last/
> "128Kb" (I'm assuming 128*1024 bytes), the superblocks should be one and
> the same.

Ack, never considered that.

Ugly!!!


* Re: only 4 spares and no access to my data
  2006-07-10 11:16         ` Molle Bestefich
@ 2006-07-10 11:42           ` Karl Voit
  2006-07-10 12:07             ` Molle Bestefich
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Voit @ 2006-07-10 11:42 UTC (permalink / raw)
  To: linux-raid

Molle Bestefich <molle.bestefich <at> gmail.com> writes:

> Karl Voit wrote:
> 
> > root <at> ned ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
> >  /dev/sdc1 /dev/sdd1
> > mdadm: cannot open device /dev/sda1: Device or resource busy
> > mdadm: /dev/sda1 has no superblock - assembly aborted
> 
> Odd message.  Does "lsof | grep sda" show anything using /dev/sda(1)?

Nope.

> > The problem seems to be the superblocks.
> 
> Which ones, those on sd[abcd]1 ?

I guessed so.

> You've probably destroyed them by syncing the array consisting of sd[abcd].

Sh..

> > Can I repair them?
> 
> No, but you can recreate them without touching your data.
> I think the suggestion from Andreas Gredler sounds sane.
> 
> I'm unsure if hot-adding a device will recreate a superblock on it.
> Therefore I'd probably run --create on all four devices and use sysfs
> to force a repair, instead of (as Andreas suggests) creating the array
> with one 'missing' device.
>
> Do remember to zero the superblocks on sd[abcd] first, to prevent mishaps...

Just to make sure that I do not run any dumb commands: is it true that
I should try the following lines?

mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda
mdadm --zero-superblock /dev/sdb
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 --force
(check if it worked - probably not - and if not, try the following line)
mdadm --create -n 4 -l 5 /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
About sysfs: sorry, what should I do with sysfs? I do not know this
interface yet.

root@ned /sys/block/md0 # l
total 0
-r--r--r--  1 root root 4096 2006-07-10 13:38 dev
-r--r--r--  1 root root 4096 2006-07-10 13:38 range
-r--r--r--  1 root root 4096 2006-07-10 13:38 removable
-r--r--r--  1 root root 4096 2006-07-10 13:38 size
-r--r--r--  1 root root 4096 2006-07-10 13:38 stat
root@ned /sys/block/md0 #

Did you mean something like "echo repair > /sys/block/md0/md/sync_action" which
I googled?



* Re: only 4 spares and no access to my data
  2006-07-10 11:42           ` Karl Voit
@ 2006-07-10 12:07             ` Molle Bestefich
  2006-07-10 12:36               ` Karl Voit
  0 siblings, 1 reply; 19+ messages in thread
From: Molle Bestefich @ 2006-07-10 12:07 UTC (permalink / raw)
  To: Karl Voit; +Cc: linux-raid

Karl Voit wrote:
> OK, I upgraded my kernel and mdadm:
>
> "uname -a":
> Linux ned 2.6.13-grml #1 Tue Oct 4 18:24:46 CEST 2005 i686 GNU/Linux

That release is 10 months old.
Newest release is 2.6.17.
You can see changes to MD since 2.6.13 here:
http://www.kernel.org/git/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.17.y.git&a=search&s=md%3A

Anything from 2005-09-09 and further up the list is something that's in 2.6.17 but not in 2.6.13.

For example, your MD does not have sysfs support, it seems...


> "dpkg --list mdadm" --> "2.4.1-6"

Newest release is 2.5.2.
2.4.1 is 3 months old.

> Is it true that I should try the following lines?
> 
> mdadm --stop /dev/md0
> mdadm --zero-superblock /dev/sda
> mdadm --zero-superblock /dev/sdb
> mdadm --zero-superblock /dev/sdc
> mdadm --zero-superblock /dev/sdd
> mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 --force
> (check if it worked - probably not - and if not, try the following line)
> mdadm --create -n 4 -l 5 /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

I don't have a unix box right here, but yes, that looks correct to me.

You can make certain that the ordering of the devices is correct by looking in your paste bin, lines 12-15.
Other RAID parameters (raid level, # of devices, persistence, layout & chunk size) can be seen on lines 212-231.
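
Spelled out with everything pinned down, that would be something like
the following (a sketch - substitute whatever your paste bin says if it
differs):

mdadm --create /dev/md0 --level=5 --raid-devices=4 \
      --chunk=64 --layout=left-symmetric \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1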


> Did you mean something like "echo repair > /sys/block/md0/md/sync_action"

Exactly.
(Gee, I hope someone stops me if I'm giving out bad advice.  Heh ;-).)

You can also assemble the array read-only after recreating the superblocks, and you can use "check" as a sync_action...

But only if your kernel has MD with sysfs support ;-).
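
I mean something like the following (again a sketch; -o/--readonly and
the sync_action file depend on your mdadm and kernel versions):

mdadm --assemble --readonly /dev/md0 /dev/sd[abcd]1   # look, don't write
# or, after a normal read-write assemble:
echo check > /sys/block/md0/md/sync_action            # read-only parity check
cat /proc/mdstat                                      # watch the progress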


* Re: only 4 spares and no access to my data
  2006-07-10 12:07             ` Molle Bestefich
@ 2006-07-10 12:36               ` Karl Voit
  2006-07-10 17:06                 ` Molle Bestefich
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Voit @ 2006-07-10 12:36 UTC (permalink / raw)
  To: linux-raid

Molle Bestefich <molle.bestefich <at> gmail.com> writes:

> Karl Voit wrote:
> > OK, I upgraded my kernel and mdadm:
> >
> > "uname -a":
> > Linux ned 2.6.13-grml #1 Tue Oct 4 18:24:46 CEST 2005 i686 GNU/Linux
> 
> That release is 10 months old.
> Newest release is 2.6.17.

Sorry, my fault. "dpkg -i <kernel> does not boot the new one *g*

root@ned ~ # uname -a
Linux ned 2.6.17-grml #1 PREEMPT Tue Jun 20 19:39:46\
 CEST 2006 i686 GNU/Linux
root@ned ~ #

Now that should be working.

> > "dpkg --list mdadm" --> "2.4.1-6"
> 
> Newest release is 2.5.2.
> 2.4.1 is 3 months old.

Debian's package seems to be old. I downloaded the current version and
replaced the binary:

root@ned ~ # wget http://www.cse.unsw.edu.au/~neilb/source/\
mdadm/mdadm-2.5.2.tgz
[...]
root@ned ~/tmp2del/mdadm-2.5.2 # make
[...]
root@ned ~/tmp2del/mdadm-2.5.2 # mv /sbin/mdadm /sbin/mdadm_v2.4.1
root@ned ~/tmp2del/mdadm-2.5.2 # cp ./mdadm /sbin/mdadm
root@ned ~/tmp2del/mdadm-2.5.2 #

> > Is it true that I should try the following lines?
> > 
> > mdadm --stop /dev/md0
> > mdadm --zero-superblock /dev/sda
> > mdadm --zero-superblock /dev/sdb
> > mdadm --zero-superblock /dev/sdc
> > mdadm --zero-superblock /dev/sdd
> > mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 --force
> > (check if it worked - probably not - and if not, try the following line)
> > mdadm --create -n 4 -l 5 /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
plus "echo repair > /sys/block/md0/md/sync_action"

Before that, I'd like to check again now with the latest kernel and the latest
mdadm:

root@ned ~ # mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@ned ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1\
 /dev/sdd1
mdadm: No suitable drives found for /dev/md0
root@ned ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1\
 /dev/sdd1 --run
mdadm: No suitable drives found for /dev/md0
root@ned ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1\
 /dev/sdd1 --force
mdadm: No suitable drives found for /dev/md0
root@ned ~ # mdadm --zero-superblock /dev/sda
mdadm: Unrecognised md component device - /dev/sda
root@ned ~ #

OK, those are different (newer) messages. Do they change anything, or
should I try the commands above now?



* Re: only 4 spares and no access to my data
  2006-07-10 12:36               ` Karl Voit
@ 2006-07-10 17:06                 ` Molle Bestefich
  2006-07-10 19:26                   ` Karl Voit
  0 siblings, 1 reply; 19+ messages in thread
From: Molle Bestefich @ 2006-07-10 17:06 UTC (permalink / raw)
  To: Karl Voit; +Cc: linux-raid

Karl Voit wrote:
> Before that, I'd like to check again now with
> the latest kernel and the latest mdadm:
>
> # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
> mdadm: No suitable drives found for /dev/md0
> [ ... snip: same message with --run and --force ... ]

No idea what that means, sorry.
Check the MDADM source code; as far as I remember it's well commented.

> root@ned ~ # mdadm --zero-superblock /dev/sda
> mdadm: Unrecognised md component device - /dev/sda

Ack?
No clue, sorry.  Perhaps you've already zeroed that superblock?

Check the source for the meaning of the message.
Or try to get hold of our local target of godly worship, NeilB ;-).

> OK, those are different (newer) messages. Do they change anything, or
> should I try the commands above now?

Depends how careful you are.
Personally, I'd probably try and find out why the above commands fail,
just in case there's something wrong with your new kernel or mdadm.
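
As a shortcut before reading source, strace might already show where it
goes wrong, e.g. (sketch):

strace -f mdadm --assemble /dev/md0 /dev/sd[abcd]1 2>&1 | grep -i open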


* Re: only 4 spares and no access to my data
  2006-07-10 17:06                 ` Molle Bestefich
@ 2006-07-10 19:26                   ` Karl Voit
  2006-07-12 19:35                     ` Molle Bestefich
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Voit @ 2006-07-10 19:26 UTC (permalink / raw)
  To: linux-raid

Molle Bestefich <molle.bestefich <at> gmail.com> writes:

> Karl Voit wrote:
> > Before that, I'd like to check again now with
> > the latest kernel and the latest mdadm:
> >
> > # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
> > mdadm: No suitable drives found for /dev/md0
> > [ ... snip: same message with --run and --force ... ]
> 
> No idea what that means, sorry.
> Check the MDADM source code, as far as I remember it's well commented.

;-)
Unfortunately I am mainly a user and not a good C programmer :-(

if (super == NULL) {
  fprintf(stderr, Name ": No suitable drives found for %s\n", mddev);
[...]

Well, I guess the message will be shown if the superblock is not found.

> > root <at> ned ~ # mdadm --zero-superblock /dev/sda
> > mdadm: Unrecognised md component device - /dev/sda
> 
> Ack?
> No clue, sorry.  Perhaps you've already zeroed that superblock?

Pretty sure that this happened during my tests, yes.

> Check the source for the meaning of the message.

st = guess_super(fd);
  if (st == NULL) {
    if (!quiet)
      fprintf(stderr, Name ": Unrecognised md component device - %s\n", dev);
[...]

Again: this seems to be the case when the superblock is empty.
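
Presumably I could verify that directly with something like this (a
sketch; the skip value is my computed 0.90 superblock offset for a
250059350016-byte disk, in 1K blocks):

dd if=/dev/sda bs=1k skip=244198464 count=4 2>/dev/null | hexdump -C | head
# an intact superblock would start with the md magic a92b4efc
# (byte-swapped on i386)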

> Or try to get hold of our local target of godly worship, NeilB ;-).

OK, then how can a dumb user, who deleted his own raid-superblocks, attract
local gods to help out with surprising sd[abcd]-reassembling-magic? ;-)

> > OK, those are other (newer) messages. Do they change anything or
> > should I try the commands above now?
> 
> Depends how careful you are.

Since my miserable failure I am probably too careful *g*

The problem is also that, without deeper background knowledge, I cannot
predict whether this or that permanently affects the real data on the disks.

Maybe a person like me starts to think that software RAID tools like
mdadm should warn users before permanent changes are executed. If mdadm
is to be used by ordinary users (in addition to RAID geeks like you),
it might be a good idea to prevent data loss. (Meant as a suggestion.)

> Personally, I'd probably try and find out why the above commands fail,
> just in case there's something wrong with your new kernel or mdadm.

OK. I'll try my best.




* Re: only 4 spares and no access to my data
  2006-07-10 19:26                   ` Karl Voit
@ 2006-07-12 19:35                     ` Molle Bestefich
  2006-07-13 12:59                       ` Karl Voit
  2006-07-15 10:31                       ` only 4 spares and no access to my data - solved Karl Voit
  0 siblings, 2 replies; 19+ messages in thread
From: Molle Bestefich @ 2006-07-12 19:35 UTC (permalink / raw)
  To: Karl Voit; +Cc: linux-raid

Karl Voit wrote:
> if (super == NULL) {
>   fprintf(stderr, Name ": No suitable drives found for %s\n", mddev);
> [...]
>
> Well, I guess the message will be shown if the superblock is not found.

Yes.  No clue why; my best guess is that you've already zeroed the superblock.
What does mdadm --query / --examine say about /dev/sd[abcd], are there
superblocks?

> st = guess_super(fd);
>   if (st == NULL) {
>     if (!quiet)
>       fprintf(stderr, Name ": Unrecognised md component device - %s\n",
> dev);
>
> Again: this seems to be the case when the superblock is empty.

Yes, looks like it can't find any usable superblocks.
Maybe you've accidentally zeroed the superblocks on sd[abcd]1 also?

If you fdisk -l /dev/sd[abcd], do the partition tables look like
they should / like they used to?

What does mdadm --query / --examine /dev/sd[abcd]1 tell you, any superblocks?

> Since my miserable failure I am probably too careful *g*
>
> The problem is also that, without deeper background knowledge, I cannot
> predict whether this or that permanently affects the real data on the disks.

My best guess is that it's OK and you won't lose data if you run
--zero-superblock on /dev/sd[abcd] and then create an array on
/dev/sd[abcd]1, but I do find it odd that it suddenly can't find
superblocks on /dev/sd[abcd]1.

> Maybe a person like me starts to think that software RAID tools like
> mdadm should warn users before permanent changes are executed. If mdadm
> is to be used by ordinary users (in addition to RAID geeks like you),
> it might be a good idea to prevent data loss. (Meant as a suggestion.)

Perhaps.  Or perhaps mdadm should just tell you that you're doing
something stupid if you try to manipulate arrays on a block device
which seems to contain a partition table.

It's not like it's even remotely useful to create an MD array spanning
the whole disk rather than spanning a partition which spans the whole
disk, anyway.


* Re: only 4 spares and no access to my data
  2006-07-12 19:35                     ` Molle Bestefich
@ 2006-07-13 12:59                       ` Karl Voit
  2006-07-15 10:31                       ` only 4 spares and no access to my data - solved Karl Voit
  1 sibling, 0 replies; 19+ messages in thread
From: Karl Voit @ 2006-07-13 12:59 UTC (permalink / raw)
  To: linux-raid

Molle Bestefich <molle.bestefich <at> gmail.com> writes:

> Karl Voit wrote:
> > if (super == NULL) {
> >   fprintf(stderr, Name ": No suitable drives found for %s\n", mddev);
> > [...]
> >
> > Well, I guess the message will be shown if the superblock is not found.
> 
> Yes.  No clue why; my best guess is that you've already zeroed the superblock.

I did, yes. This was because the disks were marked as spare disks, and
a friend of mine guessed that zeroing the superblocks might erase those
spare marks so that the disks could be assembled again. This was after
testing a lot of other methods.

> What does mdadm --query / --examine say about /dev/sd[abcd], are there
> superblocks?

root@ned ~ #  mdadm --query /dev/md0 /dev/sd[abcd]
/dev/md0: is an md device which is not active
/dev/sda: is not an md array
/dev/sdb: is not an md array
/dev/sdc: is not an md array
/dev/sdd: is not an md array
root@ned ~ # mdadm --query /dev/md0 /dev/sd[abcd]1
/dev/md0: is an md device which is not active
/dev/sda1: is not an md array
/dev/sda1: device 4 in 4 device undetected raid5 /dev/md0.  Use \
mdadm --examine for more detail.
/dev/sdb1: is not an md array
/dev/sdb1: device 6 in 4 device undetected raid5 /dev/md0.  Use \
mdadm --examine for more detail.
/dev/sdc1: is not an md array
/dev/sdc1: device 5 in 4 device undetected raid5 /dev/md0.  Use \
mdadm --examine for more detail.
/dev/sdd1: is not an md array
/dev/sdd1: device 7 in 4 device undetected raid5 /dev/md0.  Use \
mdadm --examine for more detail.
root@ned ~ #  mdadm --examine /dev/md0 /dev/sd[abcd]
mdadm: No md superblock detected on /dev/md0.
mdadm: No md superblock detected on /dev/sda.
mdadm: No md superblock detected on /dev/sdb.
mdadm: No md superblock detected on /dev/sdc.
mdadm: No md superblock detected on /dev/sdd.
root@ned ~ # mdadm --examine /dev/md0
mdadm: No md superblock detected on /dev/md0.
root@ned ~ # mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
    Device Size : 244147712 (232.84 GiB 250.01 GB)
     Array Size : 732443136 (698.51 GiB 750.02 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2dfe6 - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8        1        4      spare   /dev/sda1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
root@ned ~ # mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
    Device Size : 244147712 (232.84 GiB 250.01 GB)
     Array Size : 732443136 (698.51 GiB 750.02 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2dffa - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       17        6      spare   /dev/sdb1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
root@ned ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
    Device Size : 244147712 (232.84 GiB 250.01 GB)
     Array Size : 732443136 (698.51 GiB 750.02 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2e008 - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       33        5      spare   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
root@ned ~ # mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
    Device Size : 244147712 (232.84 GiB 250.01 GB)
     Array Size : 732443136 (698.51 GiB 750.02 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2e01c - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     7       8       49        7      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
root@ned ~ #

[Unrecognised md component device]
> > Again: this seems to be the case, when the superblock is empty.
> 
> Yes, looks like it can't find any usable superblocks.
> Maybe you've accidentally zeroed the superblocks on sd[abcd]1 also?

Yes. But it was on purpose (again after trying a lot of things 
without success).
 
> If you fdisk -l /dev/sd[abcd], do the partition tables look like
> they should / like they used to?

Yes:

root@ned ~ # fdisk -l /dev/sda

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       30395   244147806   fd  Linux raid autodetect
root@ned ~ # 

... all four of them are exactly the same.
 
> What does mdadm --query / --examine /dev/sd[abcd]1 tell you, any
> superblocks?

See above.
 
> > The problem is also that, without deeper background knowledge, I cannot
> > predict whether this or that permanently affects the real data on the disks.
> 
> My best guess is that it's OK and you won't lose data if you run
> --zero-superblock on /dev/sd[abcd] and then create an array on
> /dev/sd[abcd]1, but I do find it odd that it suddenly can't find
> superblocks on /dev/sd[abcd]1.

My friend said that I should try this line

mdadm --create -n 4 -l 5 /dev/md0 missing /dev/sdb1 /dev/sdc1 /dev/sdd1

instead of this line

mdadm --create -n 4 -l 5 /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

first, because if the second one works, the raid would immediately start
to synchronize, and that might cause problems; with one slot 'missing'
the array only comes up degraded and nothing gets rewritten.

> > Maybe a person like me starts to think that software RAID tools like
> > mdadm should warn users before permanent changes are executed. If mdadm
> > is to be used by ordinary users (in addition to RAID geeks like you),
> > it might be a good idea to prevent data loss. (Meant as a suggestion.)
> 
> Perhaps.  Or perhaps mdadm should just tell you that you're doing
> something stupid if you try to manipulate arrays on a block device
> which seems to contain a partition table.

Additionally, yes.
 
> It's not like it's even remotely useful to create an MD array spanning
> the whole disk rather than spanning a partition which spans the whole
> disk, anyway.

I agree. But building best practices into mdadm is not quite an easy
task, I guess.

TNX (again)



* Re: only 4 spares and no access to my data - solved
  2006-07-12 19:35                     ` Molle Bestefich
  2006-07-13 12:59                       ` Karl Voit
@ 2006-07-15 10:31                       ` Karl Voit
  1 sibling, 0 replies; 19+ messages in thread
From: Karl Voit @ 2006-07-15 10:31 UTC (permalink / raw)
  To: linux-raid

Molle Bestefich <molle.bestefich <at> gmail.com> writes:

> My best guess is that it's OK and you won't lose data if you run
> --zero-superblock on /dev/sd[abcd] and then create an array on
> /dev/sd[abcd]1, but I do find it odd that it suddenly can't find
> superblocks on /dev/sd[abcd]1.

OK, I tried several things, and then I had enough and did the create
step: it worked!

,----[ Creating the raid ]
| root@ned ~ # mdadm --create -n 4 -l 5 /dev/md0 missing /dev/sdb1\
|  /dev/sdc1 /dev/sdd1
| mdadm: /dev/sdb1 appears to be part of a raid array:
|     level=raid5 devices=4 ctime=Sun Jan 29 21:35:05 2006
| mdadm: /dev/sdc1 appears to be part of a raid array:
|     level=raid5 devices=4 ctime=Sun Jan 29 21:35:05 2006
| mdadm: /dev/sdd1 appears to be part of a raid array:
|     level=raid5 devices=4 ctime=Sun Jan 29 21:35:05 2006
| Continue creating array? y
| mdadm: array /dev/md0 started.
| root@ned ~ # date;cat /proc/mdstat
| Fri Jul 14 18:51:48 CEST 2006
| Personalities : [linear] [raid0] [raid1] [raid10] [raid5] [raid4]\
|  [raid6] [multipath]
| md0 : active raid5 sdd1[3] sdc1[2] sdb1[1]
|       732443136 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
|
| unused devices: <none>
| root@ned ~ #
`----

,----
| root@ned ~ # Start lvm
| Setting up LVM Volume Groups...
|   Reading all physical volumes.  This may take a while...
|   Found volume group "datavg1" using metadata type lvm2
|   1 logical volume(s) in volume group "datavg1" now active
| root@ned ~ # mount /dev/datavg1/datalv1 /data -o ro -t xfs
| root@ned ~ # mount
| rootfs on / type rootfs (rw)
| /dev/root on / type ext3 (rw,data=ordered)
| proc on /proc type proc (rw)
| sysfs on /sys type sysfs (rw)
| /dev/root on /dev/.static/dev type ext3 (rw,data=ordered)
| tmpfs on /dev type tmpfs (rw)
| /dev/pts on /dev/pts type devpts (rw)
| tmpfs on /dev/shm type tmpfs (rw)
| /dev/datavg1/datalv1 on /data type xfs (ro,sunit=128,swidth=25165824)
| root@ned ~ #
`----

OK, the thing works degraded. I unmounted it and did some file 
system checking:

,----
| root@ned ~ # xfs_repair /dev/datavg1/datalv1
| [...]
| done
| xfs_repair /dev/datavg1/datalv1  70.19s user 25.42s system\
|  18% cpu 8:25.58 total
| root@ned ~ # mount /dev/datavg1/datalv1 /data -o ro -t xfs
| root@ned ~ #
`----

No problems were found and the read-only mount afterwards worked.

Now I am doing backups *g*

Thank you for your help!


Next steps:
* After backups, I add the 4th disk (roughly as sketched below)
* Praying that the raid runs without problems from now on
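
For the re-add I will presumably do something like this (a sketch, not
executed yet; sda1 still carries the stale superblock of the old array):

mdadm --zero-superblock /dev/sda1   # clear the old 'spare' superblock
mdadm /dev/md0 --add /dev/sda1      # let it resync into the missing slot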




* Re: only 4 spares and no access to my data
  2006-07-10  8:46     ` Henrik Holst
                         ` (2 preceding siblings ...)
  2006-07-10 11:18       ` only 4 spares and no access to my data Molle Bestefich
@ 2006-07-18  2:17       ` Neil Brown
  2006-07-18 23:44         ` Nix
  3 siblings, 1 reply; 19+ messages in thread
From: Neil Brown @ 2006-07-18  2:17 UTC (permalink / raw)
  To: Henrik Holst; +Cc: Karl Voit, linux-raid

On Monday July 10, henrik.holst@idgmail.se wrote:
> Karl Voit wrote:
> [snip]
> > Well, this is because of the false(?) superblocks on sda-sdd as
> > compared to those on sda1-sdd1.
> 
> I don't understand this. Do you have more than a single partition on
> sda? Is sda1 occupying the entire disk? Since the superblock is the
> /last/ "128Kb" (I'm assuming 128*1024 bytes), the superblocks should be
> one and the same.

Not exactly.
The superblock locations for sda and sda1 can only be 'one and the
same' if sda1 is at an offset in sda which is a multiple of 64K, and
if sda1 ends near the end of sda.  This certainly can happen, but it
is by no means certain.
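
Concretely, the 0.90 superblock offset in 1K blocks is
(size & ~63) - 64.  A quick sketch with the sizes from the fdisk output
in this thread (244198584 is 250059350016 bytes / 1024):

echo $(( (244147806 & ~63) - 64 ))   # sda1: 244147712, the Device Size
                                     # that --examine reports
echo $(( (244198584 & ~63) - 64 ))   # sda:  244198464, a different spot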

For this reason, version-1 superblocks record the offset of the
superblock in the device so that if a superblock is written to sda1
and then read from sda, it will look wrong (wrong offset) and so will
be ignored (no valid superblock here).

NeilBrown


* Re: only 4 spares and no access to my data
  2006-07-18  2:17       ` Neil Brown
@ 2006-07-18 23:44         ` Nix
  0 siblings, 0 replies; 19+ messages in thread
From: Nix @ 2006-07-18 23:44 UTC (permalink / raw)
  To: Neil Brown; +Cc: Henrik Holst, Karl Voit, linux-raid

On 18 Jul 2006, Neil Brown moaned:
> The superblock locations for sda and sda1 can only be 'one and the
> same' if sda1 is at an offset in sda which is a multiple of 64K, and
> if sda1 ends near the end of sda.  This certainly can happen, but it
> is by no means certain.
> 
> For this reason, version-1 superblocks record the offset of the
> superblock in the device so that if a superblock is written to sda1
> and then read from sda, it will look wrong (wrong offset) and so will
> be ignored (no valid superblock here).

One case where this can happen is Sun slices (and I think BSD disklabels
too), where /dev/sda and /dev/sda1 start at the *same place*.

(This causes amusing problems with LVM vgscan unless the raw devices
are excluded, too.)

-- 
`We're sysadmins. We deal with the inconceivable so often I can clearly 
 see the need to define levels of inconceivability.' --- Rik Steenwinkel

