* RAID5 not being reassembled correctly after device swap
@ 2007-07-01 21:21 Michael Frotscher
2007-07-01 22:12 ` Neil Brown
2007-07-03 17:16 ` Michael Frotscher
0 siblings, 2 replies; 12+ messages in thread
From: Michael Frotscher @ 2007-07-01 21:21 UTC (permalink / raw)
To: linux-raid
Hello RAID-Experts,
I have three RAID5 arrays, each made up of partitions on the same 3 disks
(Debian stable). The root filesystem runs on one of the md devices (/boot is a
separate non-RAID partition), and everything runs rather nicely. For
convenience I plugged all drives into the first IDE controller, making them
hda, hdb and hdc. So far, so good. The partitions are flagged "fd", i.e. Linux
raid autodetect.
As I have another onboard IDE controller, I'd like to distribute the disks
across both controllers for performance reasons, moving hdb to hde and hdc to
hdg. The arrays would then consist of drives hda, hde and hdg.
This should not be a problem, as the arrays should assemble themselves using
the superblocks on the partitions, shouldn't they?
However, when I move just one drive (hdc), the array starts degraded with only
two drives present, because it is still looking for hdc, which of course is now
hdg. This shouldn't be happening.
Well, I then re-added hdg to the degraded array, which went well, and the
array rebuilt itself. I now had healthy arrays consisting of hda, hdb and hdg.
But after a reboot the array was degraded again and the system wanted its hdc
drive back.
And yes, I edited /boot/grub/device.map and changed hdc to hdg, so that can't
be the reason.
I seem to be missing something here, but what is it?
--
YT,
Michael
* Re: RAID5 not being reassembled correctly after device swap
2007-07-01 21:21 RAID5 not being reassembled correctly after device swap Michael Frotscher
@ 2007-07-01 22:12 ` Neil Brown
2007-07-02 6:35 ` Michael Frotscher
2007-07-03 17:16 ` Michael Frotscher
1 sibling, 1 reply; 12+ messages in thread
From: Neil Brown @ 2007-07-01 22:12 UTC (permalink / raw)
To: Michael Frotscher; +Cc: linux-raid
On Sunday July 1, infomails@tronserver.dyndns.org wrote:
> Hello RAID-Experts,
>
> I have three RAID5 arrays, each made up of partitions on the same 3 disks
> (Debian stable). The root filesystem runs on one of the md devices (/boot is
> a separate non-RAID partition), and everything runs rather nicely. For
> convenience I plugged all drives into the first IDE controller, making them
> hda, hdb and hdc. So far, so good. The partitions are flagged "fd", i.e.
> Linux raid autodetect.
>
> As I have another onboard IDE controller, I'd like to distribute the disks
> across both controllers for performance reasons, moving hdb to hde and hdc
> to hdg. The arrays would then consist of drives hda, hde and hdg.
>
> This should not be a problem, as the arrays should assemble themselves using
> the superblocks on the partitions, shouldn't they?
>
> However, when I move just one drive (hdc), the array starts degraded with
> only two drives present, because it is still looking for hdc, which of
> course is now hdg. This shouldn't be happening.
>
> Well, I then re-added hdg to the degraded array, which went well, and the
> array rebuilt itself. I now had healthy arrays consisting of hda, hdb and
> hdg. But after a reboot the array was degraded again and the system wanted
> its hdc drive back.
>
> And yes, I edited /boot/grub/device.map and changed hdc to hdg, so that can't
> be the reason.
>
> I seem to be missing something here, but what is it?
Kernel logs from the boot would help here.
Maybe /etc/mdadm/mdadm.conf lists "device=...." where it shouldn't.
Maybe the other IDE controller uses a module that is loaded late.
Logs would help.
NeilBrown
* Re: RAID5 not being reassembled correctly after device swap
2007-07-01 22:12 ` Neil Brown
@ 2007-07-02 6:35 ` Michael Frotscher
2007-07-02 6:50 ` Michael Frotscher
0 siblings, 1 reply; 12+ messages in thread
From: Michael Frotscher @ 2007-07-02 6:35 UTC (permalink / raw)
To: linux-raid
On Monday 02 July 2007 00:12:14 Neil Brown wrote:
> Kernel logs from the boot would help here.
> Logs would help.
Sure. The interesting part from dmesg is this:
hdg: max request size: 512KiB
hdg: 398297088 sectors (203928 MB) w/8192KiB Cache, CHS=24792/255/63, UDMA(100)
hdg: cache flushes supported
hdg: hdg1 hdg2 hdg3 hdg4
hda: max request size: 512KiB
hda: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 hda3 hda4
hdb: max request size: 512KiB
hdb: 490234752 sectors (251000 MB) w/7936KiB Cache, CHS=30515/255/63, UDMA(100)
hdb: cache flushes supported
hdb: hdb1 hdb2 hdb3 hdb4
md: md3 stopped.
md: bind<hdb3>
md: bind<hda3>
raid5: device hda3 operational as raid disk 0
raid5: device hdb3 operational as raid disk 1
raid5: allocated 3163kB for md3
raid5: raid level 5 set md3 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
--- rd:3 wd:2 fd:1
disk 0, o:1, dev:hda3
disk 1, o:1, dev:hdb3
What I really don't understand is the output of /proc/mdstat after a reboot:
Personalities : [raid6] [raid5] [raid4]
md4 : active raid5 hdg4[1] hda4[2]
368643328 blocks level 5, 4k chunk, algorithm 2 [3/2] [_UU]
md2 : active raid5 hda2[0] hdg2[2]
1027968 blocks level 5, 4k chunk, algorithm 2 [3/2] [U_U]
md3 : active raid5 hda3[0] hdb3[1]
20980736 blocks level 5, 4k chunk, algorithm 2 [3/2] [UU_]
All arrays are degraded, but different disks are missing. md3 (the root
partition) is missing its hdg member, as the log above shows. md2 and md4 are
missing their hdb partitions instead:
md: md2 stopped.
md: bind<hdg2>
md: bind<hda2>
raid5: device hda2 operational as raid disk 0
raid5: device hdg2 operational as raid disk 2
raid5: allocated 3163kB for md2
raid5: raid level 5 set md2 active with 2 out of 3 devices, algorithm 2
Btw., is it significant that the order is different? In md4, the hdg disk is
raid disk 1, whereas it is raid disk 2 in md2.
> Maybe /etc/mdadm/mdadm.conf lists "device=...." where it shouldn't.
That should be irrelevant, as the root filesystem, where mdadm.conf resides,
is itself on one of the arrays.
> Maybe the other IDE controller uses a module that is loaded late.
Hmm, I'd need to check that after I rebuild the arrays. Maybe the other
IDE-controller is not in the initrd. That wouldn't explain the missing hdb,
though.
--
YT,
Michael
* Re: RAID5 not being reassembled correctly after device swap
2007-07-02 6:35 ` Michael Frotscher
@ 2007-07-02 6:50 ` Michael Frotscher
0 siblings, 0 replies; 12+ messages in thread
From: Michael Frotscher @ 2007-07-02 6:50 UTC (permalink / raw)
To: linux-raid
On Monday 02 July 2007 08:35:18 Michael Frotscher wrote:
> Hmm, I'd need to check that after I rebuild the arrays. Maybe the other
> IDE-controller is not in the initrd.
No, although this sounded like a good idea. The IDE controller is initialized
before the arrays are assembled, and even including its driver explicitly in
the initrd results in an initrd of the same size as before, so it must have
been in there all along.
--
YT,
Michael
* Re: RAID5 not being reassembled correctly after device swap
2007-07-01 21:21 RAID5 not being reassembled correctly after device swap Michael Frotscher
2007-07-01 22:12 ` Neil Brown
@ 2007-07-03 17:16 ` Michael Frotscher
2007-07-03 17:22 ` Patrik Jonsson
2007-07-03 18:43 ` David Greaves
1 sibling, 2 replies; 12+ messages in thread
From: Michael Frotscher @ 2007-07-03 17:16 UTC (permalink / raw)
To: linux-raid
Hello all,
I guess you could say that I'm at my wit's end. I really don't get it. A RAID
array is supposed to recognize its members purely by their UUIDs, isn't it? So
technically, I should be able to remove a drive from one bus and reconnect it
to another, giving it a new device name, and the array should not even need to
sync.
Somehow it doesn't. Somehow the array remembers which devices it is supposed
to be assembled from and boots in degraded mode, oddly not always missing the
swapped drive.
Suppose I lose a whole IDE controller and want to restart the box on a
secondary controller? That one would surely not provide the devices hda through
hdd, and my array would refuse to start.
Does anyone have a suggestion of what I can try? OK, the array runs fine as
long as it is connected to its original bus, but I really don't want to take
chances here.
--
YT,
Michael
* Re: RAID5 not being reassembled correctly after device swap
2007-07-03 17:16 ` Michael Frotscher
@ 2007-07-03 17:22 ` Patrik Jonsson
2007-07-03 18:43 ` David Greaves
1 sibling, 0 replies; 12+ messages in thread
From: Patrik Jonsson @ 2007-07-03 17:22 UTC (permalink / raw)
To: Michael Frotscher; +Cc: linux-raid
Michael Frotscher wrote:
> Hello all,
>
> I guess you could say that I'm at my wit's end. I really don't get it. A
> RAID array is supposed to recognize its members purely by their UUIDs, isn't
> it? So technically, I should be able to remove a drive from one bus and
> reconnect it to another, giving it a new device name, and the array should
> not even need to sync.
>
> Somehow it doesn't. Somehow the array remembers which devices it is supposed
> to be assembled from and boots in degraded mode, oddly not always missing
> the swapped drive.
>
> Suppose I lose a whole IDE controller and want to restart the box on a
> secondary controller? That one would surely not provide the devices hda
> through hdd, and my array would refuse to start.
>
> Does anyone have a suggestion of what I can try? OK, the array runs fine as
> long as it is connected to its original bus, but I really don't want to take
> chances here.
>
Funny, I did just this on my 10-disk raid5. 4 drives were moved from an
onboard sata controller to an Areca raid controller, and the array
didn't care. That was of course while the machine was down. If you fail
out a bunch of drives by having the controller go bad, that's probably
different.
cheers,
/Patrik
* Re: RAID5 not being reassembled correctly after device swap
2007-07-03 17:16 ` Michael Frotscher
2007-07-03 17:22 ` Patrik Jonsson
@ 2007-07-03 18:43 ` David Greaves
2007-07-03 19:20 ` Michael Frotscher
2007-07-03 19:29 ` Michael Frotscher
1 sibling, 2 replies; 12+ messages in thread
From: David Greaves @ 2007-07-03 18:43 UTC (permalink / raw)
To: Michael Frotscher; +Cc: linux-raid
Michael Frotscher wrote:
> Hello all,
>
> I guess you could say that I'm at my wit's end. I really don't get it. A
> RAID array is supposed to recognize its members purely by their UUIDs, isn't
> it? So technically, I should be able to remove a drive from one bus and
> reconnect it to another, giving it a new device name, and the array should
> not even need to sync.
>
> Somehow it doesn't. Somehow the array remembers which devices it is supposed
> to be assembled from and boots in degraded mode, oddly not always missing
> the swapped drive.
>
> Suppose I lose a whole IDE controller and want to restart the box on a
> secondary controller? That one would surely not provide the devices hda
> through hdd, and my array would refuse to start.
>
> Does anyone have a suggestion of what I can try? OK, the array runs fine as
> long as it is connected to its original bus, but I really don't want to take
> chances here.
Do you have an mdadm.conf file that specifies/limits the partitions to search?
David
* Re: RAID5 not being reassembled correctly after device swap
2007-07-03 18:43 ` David Greaves
@ 2007-07-03 19:20 ` Michael Frotscher
2007-07-03 19:29 ` Michael Frotscher
1 sibling, 0 replies; 12+ messages in thread
From: Michael Frotscher @ 2007-07-03 19:20 UTC (permalink / raw)
To: linux-raid
Hi David,
> Do you have an mdadm.conf file that specifies/limits the partitions to search?
Just the usual "DEVICE partitions" line followed by the ARRAY lines. However, I
don't think the problem is in mdadm.conf; I suspect the superblocks. Right now
my main worry is the array which holds the root filesystem. The others I was
able to resurrect (with the disks back at their original IDE ports) using the
--update option when assembling. As I cannot do that with the root array (it
didn't work when I booted off a CD), I'm a bit stuck.
When the system starts, it does not even bother to look for the third array
component but starts the array degraded. I can then "mdadm -a" the third disk
back into the array, it resynchronizes, and everything looks good. Then the
same thing happens at the next boot.
Isn't there an option which updates all superblocks on an assembled array,
saying: you, partitions, are an array and will stay an array until your
superblocks are erased or hell freezes over, whichever happens first. Amen.
--
YT,
Michael
* Re: RAID5 not being reassembled correctly after device swap
2007-07-03 18:43 ` David Greaves
2007-07-03 19:20 ` Michael Frotscher
@ 2007-07-03 19:29 ` Michael Frotscher
2007-07-04 8:45 ` David Greaves
2007-07-04 13:35 ` Bill Davidsen
1 sibling, 2 replies; 12+ messages in thread
From: Michael Frotscher @ 2007-07-03 19:29 UTC (permalink / raw)
To: linux-raid
I forgot, in case it's of any help.
mdadm -D gives after reassembly:
/dev/md3:
Version : 00.90.03
Creation Time : Sun Jan 14 21:17:53 2007
Raid Level : raid5
Array Size : 20980736 (20.01 GiB 21.48 GB)
Device Size : 10490368 (10.00 GiB 10.74 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Tue Jul 3 21:21:53 2007
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 4K
UUID : 36bbe21d:f49e8b5d:f504154c:a6f12a51
Events : 0.6995604
    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1       3       67        1      active sync   /dev/hdb3
       2      22        3        2      active sync   /dev/hdc3
and after the next boot:
/dev/md3:
Version : 00.90.03
Creation Time : Sun Jan 14 21:17:53 2007
Raid Level : raid5
Array Size : 20980736 (20.01 GiB 21.48 GB)
Device Size : 10490368 (10.00 GiB 10.74 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Tue Jul 3 21:27:08 2007
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 4K
UUID : 36bbe21d:f49e8b5d:f504154c:a6f12a51
Events : 0.6995718
    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1       3       67        1      active sync   /dev/hdb3
       2       0        0        2      removed
Any ideas on why the drive keeps being removed?
--
YT,
Michael
* Re: RAID5 not being reassembled correctly after device swap
2007-07-03 19:29 ` Michael Frotscher
@ 2007-07-04 8:45 ` David Greaves
2007-07-04 16:31 ` Michael Frotscher
2007-07-04 13:35 ` Bill Davidsen
1 sibling, 1 reply; 12+ messages in thread
From: David Greaves @ 2007-07-04 8:45 UTC (permalink / raw)
To: Michael Frotscher; +Cc: linux-raid
Michael Frotscher wrote:
> I forgot, in case it's of any help.
Also do
mdadm --examine /dev/hd[abc]3
David
* Re: RAID5 not being reassembled correctly after device swap
2007-07-04 8:45 ` David Greaves
@ 2007-07-04 16:31 ` Michael Frotscher
0 siblings, 0 replies; 12+ messages in thread
From: Michael Frotscher @ 2007-07-04 16:31 UTC (permalink / raw)
To: linux-raid
On Wednesday 04 July 2007 10:45:22 David Greaves wrote:
> mdadm --examine /dev/hd[abc]3
I'll attach that as a file as it's quite lengthy. It is from the reassembled
array - if it helps, I can reboot again and provide the same for the degraded
array that exists after a reboot.
> And could you share an "fdisk -l" output
fdisk shows nothing unusual:
Disk /dev/hdc: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1               1           4       32098+  83  Linux
/dev/hdc2               5          68      514080   fd  Linux raid autodetect
/dev/hdc3              69        1374    10490445   fd  Linux raid autodetect
/dev/hdc4            1375       24321   184321777+  fd  Linux raid autodetect
Yes, the last partition does not extend to the end of the drive; that's
because hda is only a 200GB drive:
Disk /dev/hda: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1           4       32098+  83  Linux
/dev/hda2               5          68      514080   fd  Linux raid autodetect
/dev/hda3              69        1374    10490445   fd  Linux raid autodetect
/dev/hda4            1375       24321   184321777+  fd  Linux raid autodetect
Thanks for the help everyone!
--
YT,
Michael
[-- Attachment #2: md3-array.txt --]
[-- Type: text/plain, Size: 2772 bytes --]
/dev/hda3:
Magic : a92b4efc
Version : 00.90.00
UUID : 36bbe21d:f49e8b5d:f504154c:a6f12a51
Creation Time : Sun Jan 14 21:17:53 2007
Raid Level : raid5
Device Size : 10490368 (10.00 GiB 10.74 GB)
Array Size : 20980736 (20.01 GiB 21.48 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 3
Update Time : Wed Jul 4 18:24:59 2007
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : fe26fd74 - correct
Events : 0.6996556
Layout : left-symmetric
Chunk Size : 4K
      Number   Major   Minor   RaidDevice State
this     0       3        3        0      active sync   /dev/hda3
   0     0       3        3        0      active sync   /dev/hda3
   1     1       3       67        1      active sync   /dev/hdb3
   2     2      22        3        2      active sync   /dev/hdc3
/dev/hdb3:
Magic : a92b4efc
Version : 00.90.00
UUID : 36bbe21d:f49e8b5d:f504154c:a6f12a51
Creation Time : Sun Jan 14 21:17:53 2007
Raid Level : raid5
Device Size : 10490368 (10.00 GiB 10.74 GB)
Array Size : 20980736 (20.01 GiB 21.48 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 3
Update Time : Wed Jul 4 18:25:02 2007
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : fe26fdb9 - correct
Events : 0.6996556
Layout : left-symmetric
Chunk Size : 4K
      Number   Major   Minor   RaidDevice State
this     1       3       67        1      active sync   /dev/hdb3
   0     0       3        3        0      active sync   /dev/hda3
   1     1       3       67        1      active sync   /dev/hdb3
   2     2      22        3        2      active sync   /dev/hdc3
/dev/hdc3:
Magic : a92b4efc
Version : 00.90.00
UUID : 36bbe21d:f49e8b5d:f504154c:a6f12a51
Creation Time : Sun Jan 14 21:17:53 2007
Raid Level : raid5
Device Size : 10490368 (10.00 GiB 10.74 GB)
Array Size : 20980736 (20.01 GiB 21.48 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 3
Update Time : Wed Jul 4 18:25:02 2007
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : fe26fd8e - correct
Events : 0.6996556
Layout : left-symmetric
Chunk Size : 4K
      Number   Major   Minor   RaidDevice State
this     2      22        3        2      active sync   /dev/hdc3
   0     0       3        3        0      active sync   /dev/hda3
   1     1       3       67        1      active sync   /dev/hdb3
   2     2      22        3        2      active sync   /dev/hdc3
* Re: RAID5 not being reassembled correctly after device swap
2007-07-03 19:29 ` Michael Frotscher
2007-07-04 8:45 ` David Greaves
@ 2007-07-04 13:35 ` Bill Davidsen
1 sibling, 0 replies; 12+ messages in thread
From: Bill Davidsen @ 2007-07-04 13:35 UTC (permalink / raw)
To: Michael Frotscher; +Cc: linux-raid
Michael Frotscher wrote:
> I forgot, in case it's of any help.
> mdadm -D gives after reassembly:
>
> [snip]
>
> Any ideas on why the drive keeps being removed?
>
Do you see anything in dmesg which would indicate an error on the drive?
And could you share an "fdisk -l" output so we can see what the kernel
thinks is on the drive? My guess is that for some reason the device is being
considered unready, has the wrong partition type, or was low-level formatted
on a Monday. Okay, that last one is unlikely ;-)
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979