* 16 HDDs too much for RAID6?
@ 2008-03-06  9:01 Lars Täuber
  2008-03-06  9:45 ` Andre Noll
  0 siblings, 1 reply; 13+ messages in thread
From: Lars Täuber @ 2008-03-06  9:01 UTC (permalink / raw)
  To: linux-raid

Hello!

Here we have another problem with our RAID6 and 16 HDDs:

monosan:~ # mdadm -V
mdadm - v2.6.2 - 21st May 2007
monosan:~ # mdadm -C /dev/md4 -l6 -n 16 -x 0 /dev/dm-*
mdadm: /dev/dm-0 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-1 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-10 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-11 appears to contain an ext2fs file system
    size=-2147483648K  mtime=Thu Jan  1 01:00:00 1970
mdadm: /dev/dm-11 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-12 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-13 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-14 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-2 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-3 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-4 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-5 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-6 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
mdadm: /dev/dm-7 appears to be part of a raid array:
    level=raid6 devices=15 ctime=Wed Feb 13 10:38:52 2008
Continue creating array? y
mdadm: array /dev/md4 started.

monosan:~ # mdadm --detail --scan| fgrep md4 >> /etc/mdadm.conf 

monosan:~ # mdadm -S /dev/md4
mdadm: stopped /dev/md4

monosan:~ # mdadm -A /dev/md4
mdadm: WARNING /dev/dm-9 and /dev/dm-8 appear to have very similar superblocks.
      If they are really different, please --zero the superblock on one
      If they are the same or overlap, please remove one from the
      DEVICE list in mdadm.conf.

This happens _always_ when the array is reassembled. The actual devices reported as having duplicated superblocks differ from run to run.

Are 16 drives too much for RAID6?

Thanks
Lars

-- 
                            Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstrasse 22-23                     10117 Berlin
Tel.: +49 30 20370-352           http://www.bbaw.de


* Re: 16 HDDs too much for RAID6?
  2008-03-06  9:01 16 HDDs too much for RAID6? Lars Täuber
@ 2008-03-06  9:45 ` Andre Noll
  2008-03-06 10:55   ` Lars Täuber
  0 siblings, 1 reply; 13+ messages in thread
From: Andre Noll @ 2008-03-06  9:45 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-raid


On 10:01, Lars Täuber wrote:
> Here we have another problem with our RAID6 and 16 HDDs:
> 
> monosan:~ # mdadm -V
> mdadm - v2.6.2 - 21st May 2007
> monosan:~ # mdadm -C /dev/md4 -l6 -n 16 -x 0 /dev/dm-*
> mdadm: /dev/dm-0 appears to be part of a raid array:

Run
	mdadm --zero-superblock /dev/dm-0

before creating the array to get rid of these.
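
For all 16 multipath members, a loop along these lines should do it
(an untested sketch; adjust the glob if your device names differ):

	for DEV in /dev/dm-*; do mdadm --zero-superblock $DEV; done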

> monosan:~ # mdadm --detail --scan| fgrep md4 >> /etc/mdadm.conf 

Please post your /etc/mdadm.conf

> monosan:~ # mdadm -A /dev/md4
> mdadm: WARNING /dev/dm-9 and /dev/dm-8 appear to have very similar superblocks.
>       If they are really different, please --zero the superblock on one
>       If they are the same or overlap, please remove one from the
>       DEVICE list in mdadm.conf.

Are you sure dm-8 and dm-9 are different devices?

	pvdisplay /dev/dm-9; pvdisplay /dev/dm-8

should tell you.

> This happens _always_ when the array is reassembled. The actual
> devices reported as having duplicated superblocks differ from run to run.

If I read the code correctly, it means that the two devices have
identical superblocks with the same event count but different recorded
minor numbers, so mdadm suspects dm-8 and dm-9 are overlapping partitions.

> Are 16 drives too much for RAID6?

No, raid6 supports up to 254 devices (there are other reasons for
not using too many devices for raid6 though).
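
(For instance: rebuilds take longer the more members a stripe spans, small
writes may touch more disks for parity updates, and the chance of hitting a
latent read error on one of the many remaining drives during a rebuild grows
with the device count.)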

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: 16 HDDs too much for RAID6?
  2008-03-06  9:45 ` Andre Noll
@ 2008-03-06 10:55   ` Lars Täuber
  2008-03-06 16:16     ` Andre Noll
  0 siblings, 1 reply; 13+ messages in thread
From: Lars Täuber @ 2008-03-06 10:55 UTC (permalink / raw)
  To: Andre Noll; +Cc: linux-raid

Hi Andre,

> Run
> 	mdadm --zero-superblock /dev/dm-0
> 
> before creating the array to get rid of these.

ok, thanks.

> 
> > monosan:~ # mdadm --detail --scan| fgrep md4 >> /etc/mdadm.conf 
> 
> Please post your /etc/mdadm.conf

monosan:~ # cat /etc/mdadm.conf 
DEVICE partitions
ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
ARRAY /dev/md3 level=raid1 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
ARRAY /dev/md4 level=raid6 num-devices=16 UUID=8d596319:4d21dba3:3871bccf:5b90a66d



> > monosan:~ # mdadm -A /dev/md4
> > mdadm: WARNING /dev/dm-9 and /dev/dm-8 appear to have very similar superblocks.
> >       If they are really different, please --zero the superblock on one
> >       If they are the same or overlap, please remove one from the
> >       DEVICE list in mdadm.conf.
> 
> Are you sure dm-8 and dm-9 are different devices?
> 
> 	pvdisplay /dev/dm-9; pvdisplay /dev/dm-8
> 
> should tell you.

Yes, I'm sure:
monosan:~ # pvdisplay /dev/dm-9; pvdisplay /dev/dm-8
  Failed to read physical volume "/dev/dm-9"
  Failed to read physical volume "/dev/dm-8"

but:
monosan:~ # multipathd -k
multipathd> list topology 
mpath0 (SATA_ST31000340NS_5QJ02QRQ) dm-0 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:32:0 sdah 66:16  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:15:0 sdr  65:16  [active][ready]
mpath1 (SATA_ST31000340NS_5QJ0204G) dm-1 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:31:0 sdag 66:0   [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:14:0 sdq  65:0   [active][ready]
mpath2 (SATA_ST31000340NS_5QJ02TVQ) dm-2 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:30:0 sdaf 65:240 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:13:0 sdp  8:240  [active][ready]
mpath3 (SATA_ST31000340NS_5QJ012AL) dm-3 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:29:0 sdae 65:224 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:12:0 sdo  8:224  [active][ready]
mpath4 (SATA_ST31000340NS_5QJ00PHN) dm-4 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:28:0 sdad 65:208 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:11:0 sdn  8:208  [active][ready]
mpath5 (SATA_ST31000340NS_5QJ01BYF) dm-5 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:27:0 sdac 65:192 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:10:0 sdm  8:192  [active][ready]
mpath6 (SATA_ST31000340NS_5QJ026J1) dm-6 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:26:0 sdab 65:176 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:9:0  sdl  8:176  [active][ready]
mpath7 (SATA_ST31000340NS_5QJ01G09) dm-7 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:25:0 sdaa 65:160 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:8:0  sdk  8:160  [active][ready]
rename: mpath9 (SATA_ST31000340NS_5QJ02461) dm-8 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:24:0 sdz  65:144 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:7:0  sdj  8:144  [active][ready]
reload: mpath10 (SATA_ST31000340NS_5QJ013GW) dm-9 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:23:0 sdy  65:128 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:6:0  sdi  8:128  [active][ready]
mpath11 (SATA_ST31000340NS_5QJ01835) dm-10 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:22:0 sdx  65:112 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:5:0  sdh  8:112  [active][ready]
mpath12 (SATA_ST31000340NS_5QJ01C49) dm-11 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:21:0 sdw  65:96  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:4:0  sdg  8:96   [active][ready]
mpath13 (SATA_ST31000340NS_5QJ02TBZ) dm-12 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:20:0 sdv  65:80  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:3:0  sdf  8:80   [active][ready]
mpath14 (SATA_ST31000340NS_5QJ01JSF) dm-13 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:19:0 sdu  65:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:2:0  sde  8:64   [active][ready]
mpath15 (SATA_ST31000340NS_5QJ02TBK) dm-14 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:18:0 sdt  65:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:1:0  sdd  8:48   [active][ready]
mpath16 (SATA_ST31000340NS_5QJ0185Y) dm-15 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:17:0 sds  65:32  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:0:0  sdc  8:32   [active][ready]
multipathd>

monosan:~ #  for DEV in /dev/sd[c-z] /dev/sda[a-h] ; do echo Serial $DEV ; smartctl -id sat $DEV ; done | fgrep Serial
Serial /dev/sdc
Serial Number:    5QJ0185Y
Serial /dev/sdd
Serial Number:    5QJ02TBK
Serial /dev/sde
Serial Number:    5QJ01JSF
Serial /dev/sdf
Serial Number:    5QJ02TBZ
Serial /dev/sdg
Serial Number:    5QJ01C49
Serial /dev/sdh
Serial Number:    5QJ01835
Serial /dev/sdi
Serial Number:    5QJ013GW
Serial /dev/sdj
Serial Number:    5QJ02461
Serial /dev/sdk
Serial Number:    5QJ01G09
Serial /dev/sdl
Serial Number:    5QJ026J1
Serial /dev/sdm
Serial Number:    5QJ01BYF
Serial /dev/sdn
Serial Number:    5QJ00PHN
Serial /dev/sdo
Serial Number:    5QJ012AL
Serial /dev/sdp
Serial Number:    5QJ02TVQ
Serial /dev/sdq
Serial Number:    5QJ0204G
Serial /dev/sdr
Serial Number:    5QJ02QRQ
Serial /dev/sds
Serial Number:    5QJ0185Y
Serial /dev/sdt
Serial Number:    5QJ02TBK
Serial /dev/sdu
Serial Number:    5QJ01JSF
Serial /dev/sdv
Serial Number:    5QJ02TBZ
Serial /dev/sdw
Serial Number:    5QJ01C49
Serial /dev/sdx
Serial Number:    5QJ01835
Serial /dev/sdy
Serial Number:    5QJ013GW
Serial /dev/sdz
Serial Number:    5QJ02461
Serial /dev/sdaa
Serial Number:    5QJ01G09
Serial /dev/sdab
Serial Number:    5QJ026J1
Serial /dev/sdac
Serial Number:    5QJ01BYF
Serial /dev/sdad
Serial Number:    5QJ00PHN
Serial /dev/sdae
Serial Number:    5QJ012AL
Serial /dev/sdaf
Serial Number:    5QJ02TVQ
Serial /dev/sdag
Serial Number:    5QJ0204G
Serial /dev/sdah
Serial Number:    5QJ02QRQ

> 
> > This happens _always_ when the array is reassembled. The actual
> > devices reported as having duplicated superblocks differ from run to run.
> 
> If I read the code correctly, it means that the two devices have
> identical superblocks with the same event count but different recorded
> minor numbers, so mdadm suspects dm-8 and dm-9 are overlapping partitions.

The multipathed drives are used whole, without a partition table. They don't contain any partitions.

sdz and sdj are the same physical device with serial 5QJ02461 called /dev/dm-8:
rename: mpath9 (SATA_ST31000340NS_5QJ02461) dm-8 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:24:0 sdz  65:144 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:7:0  sdj  8:144  [active][ready]

similar to sdy and sdi with serial 5QJ013GW called /dev/dm-9
reload: mpath10 (SATA_ST31000340NS_5QJ013GW) dm-9 ATA     ,ST31000340NS  
[size=932G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:23:0 sdy  65:128 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:6:0  sdi  8:128  [active][ready]


 
> > Are 16 drives too much for RAID6?
> 
> No, raid6 supports up to 254 devices (there are other reasons for
> not using too many devices for raid6 though).

Is there a way to get more verbose information or to debug this somehow?

Thanks
Lars


* Re: 16 HDDs too much for RAID6?
  2008-03-06 10:55   ` Lars Täuber
@ 2008-03-06 16:16     ` Andre Noll
  2008-03-07  8:41       ` Luca Berra
  0 siblings, 1 reply; 13+ messages in thread
From: Andre Noll @ 2008-03-06 16:16 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-raid


On 11:55, Lars Täuber wrote:
> monosan:~ # cat /etc/mdadm.conf 
> DEVICE partitions
> ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
> ARRAY /dev/md3 level=raid1 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
> ARRAY /dev/md4 level=raid6 num-devices=16 UUID=8d596319:4d21dba3:3871bccf:5b90a66d

Does it help to list only the 16 devices that are used for the array,
i.e. something like

	DEVICE /dev/sd[a-p]
	
> sdz and sdj are the same physical device with serial 5QJ02461 called /dev/dm-8:
> rename: mpath9 (SATA_ST31000340NS_5QJ02461) dm-8 ATA     ,ST31000340NS  
> [size=932G][features=0][hwhandler=0]
> \_ round-robin 0 [prio=0][active]
>  \_ 6:0:24:0 sdz  65:144 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 6:0:7:0  sdj  8:144  [active][ready]
> 
> similar to sdy and sdi with serial 5QJ013GW called /dev/dm-9
> reload: mpath10 (SATA_ST31000340NS_5QJ013GW) dm-9 ATA     ,ST31000340NS  
> [size=932G][features=0][hwhandler=0]
> \_ round-robin 0 [prio=0][active]
>  \_ 6:0:23:0 sdy  65:128 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 6:0:6:0  sdi  8:128  [active][ready]

I think this is what is confusing mdadm. Your "DEVICE partitions"
line instructs mdadm to consider all devices in /proc/partitions,
so it finds both sdy and sdi.
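
You can check what mdadm gets to see with something like (a sketch):

	egrep ' sd[iy]$' /proc/partitions

which should list both paths to the same physical disk.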

> Is there a way to get more verbose information or to debug this somehow?

There are the --detail and --verbose command-line options to mdadm.

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: 16 HDDs too much for RAID6?
  2008-03-06 16:16     ` Andre Noll
@ 2008-03-07  8:41       ` Luca Berra
  2008-03-07 10:33         ` Andre Noll
  2008-03-07 10:45         ` Lars Täuber
  0 siblings, 2 replies; 13+ messages in thread
From: Luca Berra @ 2008-03-07  8:41 UTC (permalink / raw)
  To: linux-raid

On Thu, Mar 06, 2008 at 05:16:21PM +0100, Andre Noll wrote:
>On 11:55, Lars Täuber wrote:
>> monosan:~ # cat /etc/mdadm.conf 
>> DEVICE partitions
>> ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
>> ARRAY /dev/md3 level=raid1 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
>> ARRAY /dev/md4 level=raid6 num-devices=16 UUID=8d596319:4d21dba3:3871bccf:5b90a66d
>
>Does it help to list only the 16 devices that are used for the array,
>i.e. something like
>
>	DEVICE /dev/sd[a-p]

I hope the kernel will prevent you from doing something this stupid, but I am
not that sure.
If you want to check whether the problem is device selection, a more appropriate
line would be
DEVICE /dev/mapper/mpath*
or
DEVICE /dev/dm-[0-9] /dev/dm-1[0-5]

>I think this is what is confusing mdadm. Your "DEVICE partitions"
>line instructs mdadm to consider all devices in /proc/partitions,
>so it finds both sdy and sdi.
In this case I believe the error message would be different.

> > Is there a way to get more verbose information or to debug this somehow?

You could try with the --verbose option and post the results here.

Also, could you check whether the minor numbers of /dev/dm-* are really unique?
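
Something like this should reveal duplicates (a sketch, assuming the usual
ls -l output for block devices, where field 6 is the minor number):

	ls -l /dev/dm-* | awk '{print $6}' | sort -n | uniq -d

No output would mean all minors are unique.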

In case this yields no result, we will have to add some more printf calls
to Assemble.c.

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: 16 HDDs too much for RAID6?
  2008-03-07  8:41       ` Luca Berra
@ 2008-03-07 10:33         ` Andre Noll
  2008-03-07 10:45         ` Lars Täuber
  1 sibling, 0 replies; 13+ messages in thread
From: Andre Noll @ 2008-03-07 10:33 UTC (permalink / raw)
  To: linux-raid


On 09:41, Luca Berra wrote:
> >I think this is what is confusing mdadm. Your "DEVICE partitions"
> >line instructs mdadm to consider all devices in /proc/partitions,
> >so it finds both sdy and sdi.
> In this case I believe the error message would be different.

Well, the code that causes the warning Lars is seeing is

	if (best[i] >=0 &&
	    devices[best[i]].i.events
	    == devices[devcnt].i.events
	    && (devices[best[i]].i.disk.minor
		!= devices[devcnt].i.disk.minor)
	    && st->ss->major == 0
	    && info.array.level != -4) {
		/* two different devices with identical superblock.
		 * Could be a mis-detection caused by overlapping
		 * partitions.  fail-safe.
		 */
		fprintf(stderr, Name ": WARNING %s and %s appear"
			" to have very similar superblocks.\n"
			"      If they are really different, "
			"please --zero the superblock on one\n"
			"      If they are the same or overlap,"
			" please remove one from %s.\n",
			devices[best[i]].devname, devname,
			inargv ? "the list" :
			   "the\n      DEVICE list in mdadm.conf"
			);
		if (must_close) close(mdfd);
		return 1;
	}

IMHO this can be triggered by having two device nodes (sdy and sdi,
whatever) that correspond to the same physical device, no?
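
(Side note: if I read md.h correctly, level -4 is LEVEL_MULTIPATH, so the
info.array.level != -4 test only excludes md's own MULTIPATH personality,
not dm-multipath members like Lars's; and the warning fires precisely when
the two superblocks record different minors, as dm-8 and dm-9 presumably do.)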

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: 16 HDDs too much for RAID6?
  2008-03-07  8:41       ` Luca Berra
  2008-03-07 10:33         ` Andre Noll
@ 2008-03-07 10:45         ` Lars Täuber
  2008-03-28  9:20           ` Reopen: " Lars Täuber
  1 sibling, 1 reply; 13+ messages in thread
From: Lars Täuber @ 2008-03-07 10:45 UTC (permalink / raw)
  To: linux-raid

Hi guys,

Luca Berra <bluca@comedia.it> wrote:
> On Thu, Mar 06, 2008 at 05:16:21PM +0100, Andre Noll wrote:
> >On 11:55, Lars Täuber wrote:
> >> monosan:~ # cat /etc/mdadm.conf 
> >> DEVICE partitions
> >> ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
> >> ARRAY /dev/md3 level=raid1 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
> >> ARRAY /dev/md4 level=raid6 num-devices=16 UUID=8d596319:4d21dba3:3871bccf:5b90a66d
> >
> >Does it help to list only the 16 devices that are used for the array,
> >i.e. something like
> >
> >	DEVICE /dev/sd[a-p]

Because the devices sd[c-z] and sda[a-h] are used by multipathd, they are accessible read-only. For writing, the /dev/dm-* devices are available.

> I hope the kernel will prevent you from doing something this stupid, but I am
> not that sure.
> If you want to check whether the problem is device selection, a more appropriate
> line would be
> DEVICE /dev/mapper/mpath*
> or
> DEVICE /dev/dm-[0-9] /dev/dm-1[0-5]

Correct.
For safety, my mdadm.conf now has this line:
DEVICE /dev/sd[ab][0-9] /dev/dm-*

But this didn't really change anything.

> >I think this is what is confusing mdadm. Your "DEVICE partitions"
> >line instructs mdadm to consider all devices in /proc/partitions,
> >so it finds both sdy and sdi.
> In this case I believe the error message would be different.
> 
> >> Is there a way to get more verbose information or to debug this somehow?
> 
> You could try with the --verbose option and post the results here.
> 
> Also, could you check whether the minor numbers of /dev/dm-* are really unique?
> 
> In case this yields no result, we will have to add some more printf
> calls to Assemble.c.

I zeroed out all physical devices completely:
# for DEV in /dev/sd[c-r]; do dd if=/dev/zero of=$DEV; done
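
(Overkill, but effective: if memory serves, the v0.90 superblock sits in the
last 64-128 KiB of the device, so any wipe that reaches the end of the disk
removes it, just as mdadm --zero-superblock would.)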

Now the problem is gone. I don't know what really caused the problem.
Many thanks for your suggestions.

Lars


* Reopen: 16 HDDs too much for RAID6?
  2008-03-07 10:45         ` Lars Täuber
@ 2008-03-28  9:20           ` Lars Täuber
  2008-03-28 10:14             ` Bernd Schubert
  0 siblings, 1 reply; 13+ messages in thread
From: Lars Täuber @ 2008-03-28  9:20 UTC (permalink / raw)
  To: linux-raid

Hello!

Lars Täuber <taeuber@bbaw.de> wrote:
> I zeroed out all physical devices completely:
> # for DEV in /dev/sd[c-r]; do dd if=/dev/zero of=$DEV; done
> 
> Now the problem is gone. I don't know what really caused the problem.
> Many thanks for your suggestions.

The problem has occurred again.
I'm not sure what the cause was, but the duplicated superblock is there again. The RAID fell apart beforehand, so I suspect this occurs only after the array has been degraded.
The discs are not defective, so I tried to reassemble the array from the original discs again:
monosan:~ # mdadm -A /dev/md4
mdadm: WARNING /dev/dm-9 and /dev/dm-8 appear to have very similar superblocks.
      If they are really different, please --zero the superblock on one
      If they are the same or overlap, please remove one from the
      DEVICE list in mdadm.conf.

How can I extract the superblocks to check whether they are really identical?

Thanks
Lars


* Re: Reopen: 16 HDDs too much for RAID6?
  2008-03-28  9:20           ` Reopen: " Lars Täuber
@ 2008-03-28 10:14             ` Bernd Schubert
  2008-03-28 10:27               ` Lars Täuber
  0 siblings, 1 reply; 13+ messages in thread
From: Bernd Schubert @ 2008-03-28 10:14 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-raid

Hello Lars,

On Friday 28 March 2008 10:20:02 Lars Täuber wrote:
> Hello!
>
> Lars Täuber <taeuber@bbaw.de> wrote:
> > I zeroed out all physical devices completely:
> > # for DEV in /dev/sd[c-r]; do dd if=/dev/zero of=$DEV; done

why not simply "mdadm --zero-superblock $DEV"?

> >
> > Now the problem is gone. I don't know what really caused the problem.
> > Many thanks for your suggestions.
>
> The problem has occurred again.
> I'm not sure what the cause was, but the duplicated superblock is there
> again. The RAID fell apart beforehand, so I suspect this occurs only after
> the array has been degraded. The discs are not defective, so I tried to
> reassemble the array from the original discs again:
> monosan:~ # mdadm -A /dev/md4
> mdadm: WARNING /dev/dm-9 and /dev/dm-8 appear to have very similar superblocks.
>       If they are really different, please --zero the superblock on one
>       If they are the same or overlap, please remove one from the
>       DEVICE list in mdadm.conf.
>
> How can I extract the superblocks to check whether they are really identical?


mdadm --examine /dev/dm-9
mdadm --examine /dev/dm-8
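
To compare the two outputs directly, something like this should show any
fields that differ (a bash sketch using process substitution):

	diff <(mdadm --examine /dev/dm-8) <(mdadm --examine /dev/dm-9)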

Do you have some kind of multipathing that really could cause identical
superblocks on dm-9 and dm-8? Did you specify dm-9 and dm-8 in your
mdadm.conf / assemble script, or the real human-readable lvm / multipath
names?


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: Reopen: 16 HDDs too much for RAID6?
  2008-03-28 10:14             ` Bernd Schubert
@ 2008-03-28 10:27               ` Lars Täuber
  2008-03-28 10:35                 ` Bernd Schubert
  0 siblings, 1 reply; 13+ messages in thread
From: Lars Täuber @ 2008-03-28 10:27 UTC (permalink / raw)
  To: linux-raid

Hello Bernd,

Bernd Schubert <bs@q-leap.de> wrote:
> Hello Lars,
> 
> On Friday 28 March 2008 10:20:02 Lars Täuber wrote:
> > Hello!
> >
> > Lars Täuber <taeuber@bbaw.de> wrote:
> > > I zeroed out all physical devices completely:
> > > # for DEV in /dev/sd[c-r]; do dd if=/dev/zero of=$DEV; done
> 
> why not simply "mdadm --zero-superblock $DEV"?

I just wanted to make sure that nothing could be left anywhere on the disk. I have since learned about this mdadm option.

 
> > >
> > > Now the problem is gone. I don't know what really caused the problem.
> > > Many thanks for your suggestions.
> >
> > The problem has occurred again.
> > I'm not sure what the cause was, but the duplicated superblock is there
> > again. The RAID fell apart beforehand, so I suspect this occurs only after
> > the array has been degraded. The discs are not defective, so I tried to
> > reassemble the array from the original discs again:
> > monosan:~ # mdadm -A /dev/md4
> > mdadm: WARNING /dev/dm-9 and /dev/dm-8 appear to have very similar superblocks.
> >       If they are really different, please --zero the superblock on one
> >       If they are the same or overlap, please remove one from the
> >       DEVICE list in mdadm.conf.
> >
> > How can I extract the superblocks to check whether they are really identical?
> 
> 
> mdadm --examine /dev/dm-9
> mdadm --examine /dev/dm-8

I just reassembled the array for another test. Next time I'll have a deeper look with this.


> Do you have some kind of multipathing, which really could cause a identical 
> superblocks on dm-9 and dm-8? Did you specify dm-9 and dm-8 in you 
> mdadm.conf / assemble script or the real human readable lvm / multipath 
> names?

Here is the conf file:
monosan:~ # cat /etc/mdadm.conf 
DEVICE /dev/sd[ab][0-9] /dev/dm-*
ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
ARRAY /dev/md4 level=raid6 num-devices=15 spares=1 UUID=cfcbe071:f6766d8f:0f1ffefa:892d09c3
ARRAY /dev/md9 level=raid1 num-devices=2 name=9 UUID=db687150:614e76fd:28feefc0:b1aae572

All dm-* devices are really distinct. I could post the /etc/multipath.conf too if you want.

Thanks
Lars


* Re: Reopen: 16 HDDs too much for RAID6?
  2008-03-28 10:27               ` Lars Täuber
@ 2008-03-28 10:35                 ` Bernd Schubert
  2008-03-28 10:55                   ` Lars Täuber
  0 siblings, 1 reply; 13+ messages in thread
From: Bernd Schubert @ 2008-03-28 10:35 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-raid

> Here is the conf file:
> monosan:~ # cat /etc/mdadm.conf
> DEVICE /dev/sd[ab][0-9] /dev/dm-*
> ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
> ARRAY /dev/md3 level=raid1 num-devices=2 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
> ARRAY /dev/md4 level=raid6 num-devices=15 spares=1 UUID=cfcbe071:f6766d8f:0f1ffefa:892d09c3
> ARRAY /dev/md9 level=raid1 num-devices=2 name=9 UUID=db687150:614e76fd:28feefc0:b1aae572
>
> All dm-* devices are really distinct. I could post the
> /etc/multipath.conf too if you want.

I have only very little experience with multipathing, but please send your
config. The fact that you really do use multipathing only confirms my initial
guess that it is a multipath and not an md problem.


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: Reopen: 16 HDDs too much for RAID6?
  2008-03-28 10:35                 ` Bernd Schubert
@ 2008-03-28 10:55                   ` Lars Täuber
  2008-03-28 11:20                     ` Bernd Schubert
  0 siblings, 1 reply; 13+ messages in thread
From: Lars Täuber @ 2008-03-28 10:55 UTC (permalink / raw)
  To: linux-raid

Hi Bernd,

Bernd Schubert <bs@q-leap.de> wrote:
> > Here is the conf file:
> > monosan:~ # cat /etc/mdadm.conf
> > DEVICE /dev/sd[ab][0-9] /dev/dm-*
> > ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
> > ARRAY /dev/md3 level=raid1 num-devices=2 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
> > ARRAY /dev/md4 level=raid6 num-devices=15 spares=1 UUID=cfcbe071:f6766d8f:0f1ffefa:892d09c3
> > ARRAY /dev/md9 level=raid1 num-devices=2 name=9 UUID=db687150:614e76fd:28feefc0:b1aae572
> >
> > All dm-* devices are really distinct. I could post the
> > /etc/multipath.conf too if you want.

here is the file:
monosan:~ # cat /etc/multipath.conf
#
# This configuration file is generated by Yast, do not modify it
# manually please. 
#
defaults {
        polling_interval        "0"
        user_friendly_names     "yes"
#       path_grouping_policy    "multibus"
}

blacklist {
#       devnode "*"
        wwid "SATA_WDC_WD1600YS-01_WD-WCAP02964085"
        wwid "SATA_WDC_WD1600YS-01_WD-WCAP02965435"
}

blacklist_exceptions {
}

multipaths {
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ02TBK"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ0185Y"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ02QRQ"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ0204G"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ02TVQ"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ012AL"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ00PHN"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ01BYF"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ026J1"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ01G09"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ02461"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ013GW"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ01835"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ01C49"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ02TBZ"
        }
        mutlipath {
                wwid "SATA_ST31000340NS_5QJ01JSF"
        }
}

 
> I have only very little experience with multipathing, but please send your
> config. The fact that you really do use multipathing only confirms my initial
> guess that it is a multipath and not an md problem.

But when I assemble the array after a clean shutdown, once it has been initially synced, there is no problem. Only when the array was degraded does it end up with such duplicated superblocks.
How come?

Lars


* Re: Reopen: 16 HDDs too much for RAID6?
  2008-03-28 10:55                   ` Lars Täuber
@ 2008-03-28 11:20                     ` Bernd Schubert
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2008-03-28 11:20 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-raid

On Friday 28 March 2008 11:55:37 Lars Täuber wrote:
> Hi Bernd,
>
> Bernd Schubert <bs@q-leap.de> wrote:
> > > Here is the conf file:
> > > monosan:~ # cat /etc/mdadm.conf
> > > DEVICE /dev/sd[ab][0-9] /dev/dm-*
> > > ARRAY /dev/md2 level=raid1 UUID=d9d31de2:e6dbd3c3:37c7ea09:882a64e5
> > > ARRAY /dev/md3 level=raid1 num-devices=2 UUID=a8687183:a79e514c:ca492c4b:ffd4384f
> > > ARRAY /dev/md4 level=raid6 num-devices=15 spares=1 UUID=cfcbe071:f6766d8f:0f1ffefa:892d09c3
> > > ARRAY /dev/md9 level=raid1 num-devices=2 name=9 UUID=db687150:614e76fd:28feefc0:b1aae572
> > >
> > > All dm-* devices are really distinct. I could post the
> > > /etc/multipath.conf too if you want.
>
> here is the file:
> monosan:~ # cat /etc/multipath.conf
> #
> # This configuration file is generated by Yast, do not modify it
> # manually please.
> #
> defaults {
>         polling_interval        "0"
>         user_friendly_names     "yes"
> #       path_grouping_policy    "multibus"
> }
>
> blacklist {
> #       devnode "*"
>         wwid "SATA_WDC_WD1600YS-01_WD-WCAP02964085"
>         wwid "SATA_WDC_WD1600YS-01_WD-WCAP02965435"
> }

Hmm, maybe you are creating a multipath of a multipath?

Here's something from a config of our systems:

devnode_blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
}
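
(devnode_blacklist is the older spelling of this section; your config already
uses the newer name blacklist, so the devnode line can presumably just be
added to your existing blacklist section.)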



>
> blacklist_exceptions {
> }
>
> multipaths {
>         mutlipath {
>                 wwid "SATA_ST31000340NS_5QJ02TBK"
>         }

I would set user-friendly names using the alias parameter, something like this:

multipaths {
        multipath {
                wwid                    360050cc000203ffc0000000000000019
                alias                   raid1a-ost
        }

> > I have only very little experience with multipathing, but please send
> > your config. The fact that you really do use multipathing only confirms my
> > initial guess that it is a multipath and not an md problem.
>
> But when I assemble the array after a clean shutdown, once it has been
> initially synced, there is no problem. Only when the array was degraded does
> it end up with such duplicated superblocks. How come?

No idea so far, but please do some blacklisting. And if you set more
readable names like "/dev/disk8" instead of "dm-8", it might become much
easier to figure out what is wrong.

Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH

