* Raid-10 mount at startup always has problem
@ 2007-08-27 18:14 Daniel L. Miller
From: Daniel L. Miller @ 2007-08-27 18:14 UTC (permalink / raw)
To: linux-raid
Hi!
I have a four-disk RAID-10 array that I created and mount with mdadm.
On almost every reboot, either the array is not recognized at all, or
one of the disks is not added. Adding the missing disk manually with
mdadm works.
Ubuntu, custom-compiled kernel 2.6.22
mdadm 2.6.2
SATA hard drives on an NVIDIA CK804 controller - NOT using NVIDIA RAID.
--
Daniel
* Re: Raid-10 mount at startup always has problem
From: Daniel L. Miller @ 2007-09-10 1:53 UTC (permalink / raw)
To: linux-raid

Bill Davidsen wrote:
> Daniel L. Miller wrote:
>> Hi!
>>
>> I have a four-disk RAID-10 array that I created and mount with
>> mdadm. On almost every reboot, either the array is not recognized
>> at all, or one of the disks is not added. Adding the missing disk
>> manually with mdadm works.
>
> What superblock version and partition type did you use? mdadm -D please.

Thanks for the reply. I've been wondering why no one answered me - then
discovered your answer in my mailbox! Must have been hiding somewhere...

Anyway - mdadm -D /dev/md0:

/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Oct  3 19:11:53 2006
     Raid Level : raid10
     Array Size : 312581632 (298.10 GiB 320.08 GB)
  Used Dev Size : 156290816 (149.05 GiB 160.04 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Sep  9 18:51:17 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2, far=1
     Chunk Size : 32K

           UUID : 9d94b17b:f5fac31a:577c252b:0d4c4b2a
         Events : 0.10811466

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd

And you didn't ask, but my mdadm.conf:

DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a

Daniel
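One way to keep a hand-edited ARRAY line from going stale is to let mdadm
regenerate it from the superblocks it actually finds on the running array.
A minimal sketch, assuming the array is assembled; the exact output format
varies with the mdadm version, and the config path is a distribution
assumption (Ubuntu uses /etc/mdadm/mdadm.conf, others /etc/mdadm.conf).
Note also that /dev/.static/dev/md0 is a udev compatibility path; /dev/md0
is the conventional device node:

  # print an ARRAY line derived from the live array's superblocks
  mdadm --detail --scan
  # typical output (UUID matching the -D output above):
  #   ARRAY /dev/md0 level=raid10 num-devices=4 \
  #     UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
  # append it, then review the file and remove any duplicate ARRAY lines
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf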
* Re: Raid-10 mount at startup always has problem
From: Richard Scobie @ 2007-09-10 2:04 UTC (permalink / raw)
To: Linux RAID Mailing List

Daniel L. Miller wrote:

> And you didn't ask, but my mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a

Hi Daniel,

Try adding

auto=part

at the end of your mdadm.conf ARRAY line.

Regards,

Richard
* Re: Raid-10 mount at startup always has problem
From: Daniel L. Miller @ 2007-09-10 2:11 UTC (permalink / raw)
To: linux-raid

Richard Scobie wrote:
> Daniel L. Miller wrote:
>
>> And you didn't ask, but my mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
>
> Try adding
>
> auto=part
>
> at the end of your mdadm.conf ARRAY line.

Thanks - will see what happens on my next reboot.

Daniel
* Re: Raid-10 mount at startup always has problem
From: Daniel L. Miller @ 2007-10-24 14:22 UTC (permalink / raw)
To: linux-raid

Daniel L. Miller wrote:
> Richard Scobie wrote:
>> Try adding
>>
>> auto=part
>>
>> at the end of your mdadm.conf ARRAY line.
> Thanks - will see what happens on my next reboot.

Current mdadm.conf:

DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part

I still have the problem where, on boot, one drive is not part of the
array. Is there a log file I can check to find out WHY a drive is not
being added? It's been a while since the reboot, but I did find some
entries in dmesg - I'm appending both the md lines and the physical
disk related lines. The bottom shows one disk not being added (this
time it was sda) - and the disk that gets skipped on each boot seems to
be random; there's no consistent failure:

[...]
md: raid10 personality registered for level 10
[...]
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
[...]
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xffffc20001428480 ctl 0xffffc200014284a0 bmdma 0x0000000000011410 irq 23
ata2: SATA max UDMA/133 cmd 0xffffc20001428580 ctl 0xffffc200014285a0 bmdma 0x0000000000011418 irq 23
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      ST3160811AS      3.AA PQ: 0 ANSI: 5
ata1: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
scsi 1:0:0:0: Direct-Access     ATA      ST3160811AS      3.AA PQ: 0 ANSI: 5
ata2: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LSI1] -> GSI 22 (level, high) -> IRQ 22
sata_nv 0000:00:08.0: Using ADMA mode
PCI: Setting latency timer of device 0000:00:08.0 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0xffffc2000142a480 ctl 0xffffc2000142a4a0 bmdma 0x0000000000011420 irq 22
ata4: SATA max UDMA/133 cmd 0xffffc2000142a580 ctl 0xffffc2000142a5a0 bmdma 0x0000000000011428 irq 22
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access     ATA      ST3160811AS      3.AA PQ: 0 ANSI: 5
ata3: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
scsi 3:0:0:0: Direct-Access     ATA      ST3160811AS      3.AA PQ: 0 ANSI: 5
ata4: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: unknown partition table
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: unknown partition table
sd 1:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: unknown partition table
sd 2:0:0:0: [sdc] Attached SCSI disk
sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors (160042 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors (160042 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdd: unknown partition table
sd 3:0:0:0: [sdd] Attached SCSI disk
[...]
md: md0 stopped.
md: md0 stopped.
md: bind<sdc>
md: bind<sdd>
md: bind<sdb>
md: md0: raid array is not clean -- starting background reconstruction
raid10: raid set md0 active with 3 out of 4 devices
md: couldn't update array info. -22
md: resync of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
md: using 128k window, over a total of 312581632 blocks.
Filesystem "md0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem md0
Starting XFS recovery on filesystem: md0 (logdev: internal)
Ending XFS recovery on filesystem: md0 (logdev: internal)

--
Daniel
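When a member gets dropped like this, the per-device superblock event
counters usually tell the story. A minimal sketch of the inspection,
assuming the device names from the array above:

  cat /proc/mdstat      # shows which members the kernel actually bound
  mdadm --examine /dev/sda /dev/sdb /dev/sdc /dev/sdd | egrep 'Event|State'
  # the member that was left out typically shows an older event count
  # than the other three; dmesg around the "md: bind<...>" lines shows
  # the assembly order

On Ubuntu the initramfs carries its own copy of mdadm.conf, so if the
array is assembled before the root filesystem switch, a stale copy there
can undo edits to /etc/mdadm/mdadm.conf. Refreshing it (an assumption
about this poster's setup, but standard initramfs-tools practice):

  update-initramfs -u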
* Re: Raid-10 mount at startup always has problem
From: Doug Ledford @ 2007-10-24 16:25 UTC (permalink / raw)
To: Daniel L. Miller; +Cc: linux-raid

On Wed, 2007-10-24 at 07:22 -0700, Daniel L. Miller wrote:
> Current mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>
> I still have the problem where, on boot, one drive is not part of the
> array. Is there a log file I can check to find out WHY a drive is not
> being added?

It usually means either the device is busy at the time the raid startup
happened, or the device hadn't yet been created by udev at the time the
startup happened. Is it failing to start the array properly in the
initrd, or is this happening after you've switched to the rootfs and are
running the startup scripts?

> md: md0 stopped.
> md: md0 stopped.
> md: bind<sdc>
> md: bind<sdd>
> md: bind<sdb>

Whole disk raid devices == bad. Lots of stuff can go wrong with that
setup.

> md: md0: raid array is not clean -- starting background reconstruction
> raid10: raid set md0 active with 3 out of 4 devices
> [...]

--
Doug Ledford <dledford@redhat.com>  GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
* Re: Raid-10 mount at startup always has problem
From: Bill Davidsen @ 2007-10-24 20:01 UTC (permalink / raw)
To: Daniel L. Miller; +Cc: linux-raid

Daniel L. Miller wrote:
> Current mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>
> I still have the problem where, on boot, one drive is not part of the
> array. Is there a log file I can check to find out WHY a drive is not
> being added? [...] The bottom shows one disk not being added (this
> time it was sda) - and the disk that gets skipped on each boot seems
> to be random; there's no consistent failure:

I suspect the base problem is that you are using whole disks instead of
partitions, and the partition-table complaints below are probably an
indication that you have something on those drives which looks like a
partition table but isn't. That prevents the drive from being recognized
as a whole drive. You're lucky: if the data had looked enough like a
partition table to be valid, the o/s probably would have tried to do
something with it.

I can't see any easy (or safe) backout on this. You have used the whole
disk, so you can't just drop a drive, partition it, and add the partition
back in place of the drive. And if you have a failure and ever have to
replace a drive, you will have to use a drive or partition at least as
large as what you have now.

Hopefully someone will have a good idea how to gracefully transition to a
safer setup. If random data ever looks like a valid partition table, evil
may occur; and if you ever get this on two drives at once, the system
won't boot. Two time-bomb cases, and they're not mutually exclusive.

This may be the rare case where you really do need to specify the actual
devices to get reliable operation.

> [...]
> sda: unknown partition table
> [...]
> sdb: unknown partition table
> [...]
> sdc: unknown partition table
> [...]
> sdd: unknown partition table
> [...]
> md: bind<sdc>
> md: bind<sdd>
> md: bind<sdb>
> md: md0: raid array is not clean -- starting background reconstruction
> raid10: raid set md0 active with 3 out of 4 devices
> [...]

--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: Raid-10 mount at startup always has problem
From: Daniel L. Miller @ 2007-10-25 5:43 UTC (permalink / raw)
To: linux-raid

Bill Davidsen wrote:
> I suspect the base problem is that you are using whole disks instead
> of partitions [...]
> This may be the rare case where you really do need to specify the
> actual devices to get reliable operation.

OK - I'm officially confused now (I was just unofficially confused
before). WHY is it a problem to use whole drives as RAID components? I
would have thought that building a RAID storage unit with identically
sized drives - and using each drive's full capacity - is exactly how
you're supposed to do it!

I should mention that the boot/system drive is IDE, and NOT part of the
RAID. So I'm not worried about losing the system - but I AM concerned
about the data. I'm using four drives in a RAID-10 configuration, which
I thought would provide a good blend of safety and performance for a
small fileserver.

Because it's RAID-10, I would ASSuME that I can drop one drive (after
all, I keep booting one drive short), partition it if necessary, and add
it back in. But how would splitting these disks into partitions improve
either stability or performance?

--
Daniel
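Daniel's drop-one-drive idea is in fact the usual migration path for a
redundant array. A hedged sketch of converting one member at a time; the
device names and sfdisk invocation are assumptions, the array runs
degraded while each disk is converted, a current backup is prudent, and
this only works if the new partition is still at least "Used Dev Size"
(from mdadm -D) plus superblock space - worth verifying before starting:

  # 1. pull one member out of the still-redundant array
  mdadm /dev/md0 --fail /dev/sda --remove /dev/sda
  # 2. wipe the old whole-disk superblock so it can't be found later
  mdadm --zero-superblock /dev/sda
  # 3. create one full-size partition of type fd (Linux raid autodetect)
  echo ',,fd' | sfdisk /dev/sda
  # 4. add the partition back and let the array rebuild onto it
  mdadm /dev/md0 --add /dev/sda1
  # 5. wait for the resync to finish before touching the next disk
  cat /proc/mdstat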
* Re: Raid-10 mount at startup always has problem
From: Doug Ledford @ 2007-10-25 6:40 UTC (permalink / raw)
To: Daniel L. Miller; +Cc: linux-raid

On Wed, 2007-10-24 at 22:43 -0700, Daniel L. Miller wrote:
> OK - I'm officially confused now (I was just unofficially confused
> before). WHY is it a problem to use whole drives as RAID components?
> I would have thought that building a RAID storage unit with
> identically sized drives - and using each drive's full capacity - is
> exactly how you're supposed to do it!

As much as anything else, this can be summed up as: you are thinking of
how *you* are using the drives, not how unexpected software on your
system might try to use them. Without a partition table, none of the
software on your system can know what to do with the drives except
mdadm, when it finds an md superblock. That doesn't stop other software
from *trying* to find out how to use your drives, though. That includes
the kernel looking for a valid partition table, mount possibly scanning
the drive for a filesystem label, lvm scanning for an lvm superblock,
mtools looking for a dos filesystem, etc. Under normal conditions, the
random data on your drive will never look valid to these other pieces
of software. But once in a great while it will look valid, and that's
when all hell breaks loose.

Or worse, you run a partition program such as fdisk on the device and
it initializes the partition table (something the Fedora/RHEL
installers do to all disks without partition tables... well, the
installer tells you there's no partition table and asks if you want to
initialize it, but if someone is in a hurry and hits yes when they
meant no, bye bye data).

The partition table is the single, (mostly) universally recognized
arbiter of what possible data might be on the disk. Having a partition
table may not make mdadm recognize the md superblock any better, but it
keeps all that other stuff from even trying to access data it has no
need to access, and prevents random luck from turning your day bad. Oh,
and let's not go into what can happen on a dual boot machine, and what
Windows might do to the disk if it doesn't think the disk space is
already spoken for by a linux partition.

And, in particular with mdadm: I once created a full disk md raid array
on a couple of disks, then couldn't get things arranged like I wanted,
so I just partitioned the disks and created new arrays in the
partitions (without first manually zeroing the superblock for the whole
disk array). Since I had used a version 1.0 superblock on the whole
disk array, and then used version 1.1 superblocks in the partitions,
the net result was that when I ran mdadm -Eb, mdadm would find both the
1.1 and the 1.0 superblocks in the last partition on the disk. Confused
both myself and mdadm for a while.

Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of
the disks at all times, and without that it really needs to stake a
claim on the space instead of leaving things to chance, IMO.

--
Doug Ledford <dledford@redhat.com>  GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
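The stale-superblock confusion Doug describes is avoidable by wiping the
old metadata before reusing a device. A minimal sketch, with the device
names assumed; the array must not be running when the superblock is
zeroed (a 1.0 superblock lives at the end of the device, which is why it
also shows up near the end of the disk's last partition):

  mdadm --stop /dev/md0               # the array must be stopped first
  mdadm --zero-superblock /dev/sda    # erase the whole-disk superblock
  # repeat for each member, then partition the disks and create the
  # new arrays inside the partitions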
* Re: Raid-10 mount at startup always has problem
From: Luca Berra @ 2007-10-26 9:15 UTC (permalink / raw)
To: linux-raid

On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
>partition table (something the Fedora/RHEL installers do to all disks
>without partition tables...well, the installer tells you there's no
>partition table and asks if you want to initialize it, but if someone
>is in a hurry and hits yes when they meant no, bye bye data).

Cool feature!!!!

>The partition table is the single, (mostly) universally recognized
>arbiter of what possible data might be on the disk. [...]

On a PC, maybe - but that is a 20-year-old design. The partition table
design is limited because it is still based on C/H/S addressing, which
does not exist anymore. Put a partition table on a big storage array,
say a DMX, and enjoy a 20% performance decrease.

>Oh, and let's not go into what can happen on a dual boot machine, and
>what Windows might do to the disk if it doesn't think the disk space
>is already spoken for by a linux partition.

Why the hell should the existence of Windows limit the ability of Linux
to work properly? If I have a PC that dual-boots Windows, I will take
care to use the common denominator of a partition table; if it is my
big server, I probably will not, since it won't boot anything other
than Linux.

>And, in particular with mdadm: I once created a full disk md raid
>array on a couple of disks [...] Confused both myself and mdadm for a
>while.

Yes, this is fun. On the opposite side: I once inserted an mmc memory
card, which had been initialized in my mobile phone, into the mmc slot
of my laptop, and was faced with a load of errors about mmcblk0 having
an invalid partition table. Obviously it had none - it was a plain fat
filesystem. Is the solution to partition it? I don't think the phone
would agree.

>Anyway, I happen to *like* the idea of using full disk devices, but
>the reality is that the md subsystem doesn't have exclusive ownership
>of the disks at all times, and without that it really needs to stake a
>claim on the space instead of leaving things to chance, IMO.

Start by removing the partition detection code from the blasted kernel
and moving it to userspace - which is already in place, but is not the
default.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
* Re: Raid-10 mount at startup always has problem
From: Gabor Gombas @ 2007-10-26 16:53 UTC (permalink / raw)
To: linux-raid

On Fri, Oct 26, 2007 at 11:15:13AM +0200, Luca Berra wrote:

> On a PC, maybe - but that is a 20-year-old design. The partition
> table design is limited because it is still based on C/H/S
> addressing, which does not exist anymore.

The MS-DOS format is not the only possible partition table layout.
Other formats such as GPT do not have such limitations.

> Put a partition table on a big storage array, say a DMX, and enjoy a
> 20% performance decrease.

I assume your "big storage" uses some kind of RAID. Are your partitions
stripe-aligned? (Btw. that has nothing to do with partitions; LVM can
also suffer if PEs are not aligned.)

>> Oh, and let's not go into what can happen on a dual boot machine,
>> and what Windows might do to the disk if it doesn't think the disk
>> space is already spoken for by a linux partition.
> Why the hell should the existence of Windows limit the ability of
> Linux to work properly?

Well, if you convert a Windows partition to Linux by just changing the
partition type, running mke2fs over it, and filling it with data,
Windows will happily ignore the partition table change and overwrite
your data without any notice on the next boot (this happened to one
colleague; not fun to debug). So much for automatic device type
detection...

> On the opposite side: I once inserted an mmc memory card [...] Is the
> solution to partition it? I don't think the phone would agree.

Well, it said it could not find a valid partition table. That was the
truth. Why is it a problem if the kernel states a fact?

Gabor

--
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
* Re: Raid-10 mount at startup always has problem
From: Luca Berra @ 2007-10-27 7:57 UTC (permalink / raw)
To: linux-raid

On Fri, Oct 26, 2007 at 06:53:40PM +0200, Gabor Gombas wrote:
>I assume your "big storage" uses some kind of RAID. Are your
>partitions stripe-aligned? (Btw. that has nothing to do with
>partitions; LVM can also suffer if PEs are not aligned.)

Mine are. Unfortunately, the default is to start them 32256 bytes into
the device.

>Well, if you convert a Windows partition to Linux by just changing the
>partition type [...]

What I am saying is that a dual boot machine is not the only scenario
we have.

>Well, it said it could not find a valid partition table. That was the
>truth. Why is it a problem if the kernel states a fact?

It is random. Reformatting the card made the kernel message go away. I
wonder if, by chance, something would ever decide it is a valid
partition table...

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
* Re: Raid-10 mount at startup always has problem
From: Doug Ledford @ 2007-10-26 19:26 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid

On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> On a PC, maybe - but that is a 20-year-old design.

So? Unix is a 35+ year old design; I suppose you want to switch to
Vista then?

> The partition table design is limited because it is still based on
> C/H/S addressing, which does not exist anymore. Put a partition table
> on a big storage array, say a DMX, and enjoy a 20% performance
> decrease.

Because you didn't stripe-align the partition - your bad.

> Why the hell should the existence of Windows limit the ability of
> Linux to work properly?

Linux works properly with a partition table, so this is a specious
statement.

> If I have a PC that dual-boots Windows, I will take care to use the
> common denominator of a partition table; if it is my big server, I
> probably will not, since it won't boot anything other than Linux.

Doesn't really gain you anything, but it's your choice. Besides, the
question wasn't "why shouldn't Luca Berra use whole disk devices", it
was why I don't recommend using whole disk devices, and my
recommendation wasn't based in the least bit on a single person's use
scenario.

> On the opposite side: I once inserted an mmc memory card, which had
> been initialized in my mobile phone, into the mmc slot of my laptop,
> and was faced with a load of errors about mmcblk0 having an invalid
> partition table.

So? The messages are just informative; feel free to ignore them.

> Obviously it had none - it was a plain fat filesystem. Is the
> solution to partition it? I don't think the phone would agree.

The phone dictates the format; only a moron would say otherwise. But
then again, the phone doesn't care about interoperability and the many
other issues on memory cards that it thinks it owns, so only a moron
would argue that because a phone doesn't use a partition table, nothing
else in the computer realm needs to either.

> Start by removing the partition detection code from the blasted
> kernel and moving it to userspace - which is already in place, but is
> not the default.

Which just moves where the work is done, not what work needs to be
done. It's a change for no benefit and a waste of time.

--
Doug Ledford <dledford@redhat.com>  GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
* Re: Raid-10 mount at startup always has problem
From: Luca Berra @ 2007-10-27 7:50 UTC (permalink / raw)
To: linux-raid

On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote:
>So? Unix is a 35+ year old design; I suppose you want to switch to
>Vista then?

Unix is a 35+ year old design that evolved over time; some ideas were
kept, some ditched.

>Because you didn't stripe-align the partition - your bad.

:) By default, fdisk misaligns partition tables, and aligning them is
more complex than simply doing without.

>Linux works properly with a partition table, so this is a specious
>statement.

It should also work properly without one.

>Doesn't really gain you anything, but it's your choice. Besides, the
>question wasn't "why shouldn't Luca Berra use whole disk devices" [...]

If I am the only person in the world who believes partition tables
should not be required, then I'll shut up.

>So? The messages are just informative; feel free to ignore them.

But didn't anaconda propose to wipe unpartitioned disks?

>The phone dictates the format; only a moron would say otherwise. [...]

I don't count myself as a moron. What I am trying to say is that
partition tables are one way of organizing disk space, not the only
one.

>Which just moves where the work is done, not what work needs to be
>done.

It also makes it possible to decide whether the work has to be done at
all.

>It's a change for no benefit and a waste of time.

The waste of time was having to put code in mdadm to undo partition
detection on component devices, where partition detection should never
have taken place.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
* Re: Raid-10 mount at startup always has problem
From: Gabor Gombas @ 2007-10-27 15:07 UTC (permalink / raw)
To: linux-raid

On Sat, Oct 27, 2007 at 09:50:55AM +0200, Luca Berra wrote:
>> Because you didn't stripe-align the partition - your bad.
> :) By default, fdisk misaligns partition tables, and aligning them is
> more complex than simply doing without.

Why use fdisk then? Use parted instead. It's not the kernel's fault if
you use tools not suited for a given task...

>> Linux works properly with a partition table, so this is a specious
>> statement.
> It should also work properly without one.

It does:

sd 0:0:2:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdc] 7812333568 512-byte hardware sectors (3999915 MB)
sd 0:0:2:0: [sdc] Write Protect is off
sd 0:0:2:0: [sdc] Mode Sense: 23 00 00 00
sd 0:0:2:0: [sdc] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
sdc: unknown partition table

Works perfectly without any partition table... You seem to be annoyed
that the kernel tells you there is no partition table it recognizes -
but if that bothers you so much, simply stop reading the kernel logs.
My kernel also tells me that it failed to find an AGP bridge - by your
logic, should everyone still using AGP-capable motherboards toss their
systems on the junkyard?!?

Gabor

--
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
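On the alignment point, a hedged sketch of creating a stripe-aligned
partition with parted; the device name and the 1 MiB start offset are
assumptions (any offset that is a multiple of the array's chunk size -
32K for the array in this thread - would do), and the unit syntax is
that of reasonably recent parted versions:

  parted /dev/sdc mklabel gpt
  # start at sector 2048 (1 MiB), a multiple of any common chunk size
  parted /dev/sdc mkpart primary 2048s 100%
  parted /dev/sdc set 1 raid on
  parted /dev/sdc unit s print    # verify the start sector is aligned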
* Re: Raid-10 mount at startup always has problem
From: Doug Ledford @ 2007-10-27 20:47 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid

On Sat, 2007-10-27 at 09:50 +0200, Luca Berra wrote:
> Unix is a 35+ year old design that evolved over time; some ideas were
> kept, some ditched.

BSD disk labels are still in use, SunOS disk labels are still in use,
and partition tables are somewhat on the way out - but only because
they are being replaced by the new EFI disk partitioning method. The
only place where partitionless devices are common is in dedicated raid
boxes where the raid controller is the only thing that will *ever* see
the disk. Sometimes they do it on big SAN/NAS gear because they don't
want to align the partition table to the underlying device's stripe
layout, but even then they do so in a tightly controlled environment
where they know exactly which machines will be allowed to even try to
access the device.

> :) By default, fdisk misaligns partition tables, and aligning them is
> more complex than simply doing without.

So? You really need to take the time to understand the alignment of the
device anyway, because then and only then can you pass options to
mke2fs to align the fs metadata with the stripes as well, buying you
even more performance than just leaving off the partition table
(assuming that's what you use; I don't know if other mkfs programs have
the same options for aligning metadata with stripes). And if you take
the time to understand the underlying stripe layout for the mkfs stuff,
you can use the same information to align the partition table.

> It should also work properly without one.

Most of the time it does. But in those cases where it can fail, the
failure is due to not taking the precautions necessary to prevent it:
aka labeling disk usage via some sort of partition
table/disklabel/etc.

> But didn't anaconda propose to wipe unpartitioned disks?

Did you stick your mmc card in there during the install of the OS?
That's the only time anaconda ever runs, and therefore the only time it
ever checks your devices. It makes sense that during the initial
install, when the OS is only configured to see locally connected
devices, or possibly iSCSI devices you have specifically told it to
probe, it would ask you that question about those devices. Other
network attached or shared devices are generally added after the
initial install.

> I don't count myself as a moron. What I am trying to say is that
> partition tables are one way of organizing disk space, not the only
> one.

Using whole disk devices isn't a means of organizing space. It's a way
to get back a rather minuscule amount of space by *not* organizing the
space.

This whole argument seems to boil down to you wanting to perfectly
optimize your system for your use case, which includes controlling the
environment enough to know it's safe not to partition your disks;
whereas I argue that although this works in controlled environments, it
has known failure modes in other environments, and I would be totally
remiss if I recommended that my customers take a risk you can ignore
because of your controlled environment, since I know a lot of my
customers *don't* have a controlled environment such as yours.

--
Doug Ledford <dledford@redhat.com>  GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
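Doug's mke2fs point, as a hedged sketch. The numbers are derived from the
array in this thread (32K chunk, 4-disk raid10 with near=2, so two
data-bearing disks per stripe); the 4K block size is an assumption, and
the stripe-width option needs a reasonably recent e2fsprogs. Note the
poster's array actually carries XFS, where the analogous knobs are
mkfs.xfs's su/sw:

  # stride       = chunk size / fs block size = 32K / 4K = 8
  # stripe-width = stride * data-bearing disks = 8 * 2 = 16
  mke2fs -b 4096 -E stride=8,stripe-width=16 /dev/md0
  # assumed XFS equivalent: mkfs.xfs -d su=32k,sw=2 /dev/md0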
* Re: Raid-10 mount at startup always has problem 2007-10-27 20:47 ` Doug Ledford @ 2007-10-28 13:37 ` Luca Berra 2007-10-28 17:55 ` Doug Ledford 0 siblings, 1 reply; 42+ messages in thread From: Luca Berra @ 2007-10-28 13:37 UTC (permalink / raw) To: linux-raid On Sat, Oct 27, 2007 at 04:47:30PM -0400, Doug Ledford wrote: >On Sat, 2007-10-27 at 09:50 +0200, Luca Berra wrote: >> On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote: >> >On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote: >> >> On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote: >> >> >The partition table is the single, (mostly) universally recognized >> >> >arbiter of what possible data might be on the disk. Having a partition >> >> >table may not make mdadm recognize the md superblock any better, but it >> >> >keeps all that other stuff from even trying to access data that it >> >> >doesn't have a need to access and prevents random luck from turning your >> >> >day bad. >> >> on a pc maybe, but that is 20 years old design. >> > >> >So? Unix is 35+ year old design, I suppose you want to switch to Vista >> >then? >> unix is a 35+ year old design that evolved in time, some ideas were >> kept, some ditched. > >BSD disk labels are still in use, SunOS disk labels are still in use, i am not a solaris expert, do they still use disk labels under vxvm? oh, by the way, disklabels do not support the partition type attribute. >partition tables are somewhat on the way out, but only because they are >being replaced by the new EFI disk partitioning method. The only place >where partitionless devices is common is in dedicated raid boxes where >the raid controller is the only thing that will *ever* see that disk. well i am more used to other os (HP, AIX) where lvm is the common mean of accessing disk devices .... >> by default fdisk misalignes partition tables >> and aligning them is more complex than just doing without. > >So. You really need to take the time and to understand the alignment of >the device because then and only then can you pass options to mke2fs to yes and i am not the only person in the world doing that. >> >Linux works properly with a partition table, so this is a specious >> >statement. >> It should also work properly without one. > >Most of the time it does. But those times where it can fail, the >failure is due to not taking the precautions necessary to prevent it: >aka labeling disk usage via some sort of partition table/disklabel/etc. I strongly disagree. the failure is badly designed software. >Did you stick your mmc card in there during the install of the OS? My laptop has a built-in mmc slot, so i sometimes leave a card plugged in. But the mmc thing was just an example, it is not that critical. >> i don't count myself as a moron, what i am trying to say is that >> partition tables are one way of organizing disk space, not the only one. > >Using whole disk devices isn't a means of organizing space. It's a way >to get a rather miniscule amount of space back by *not* organizing the >space. if i am using, say lvm to organize disk space, a partition table is unnecessary to the organization, and it is natural not using them. 
>This whole argument seems to boil down to you wanting to perfectly
>optimize your system for your use case, which includes controlling the
>environment enough that you know it's safe to not partition your disks,
>whereas I argue that although this works in controlled environments, it
>is known to have failure modes in other environments, and I would be
>totally remiss if I recommended to my customers that they should take
>the risk that you can ignore because of your controlled environment
>since I know a lot of my customers *don't* have a controlled environment
>such as you do.

The whole argument to me boils down to the fact that not having a partition
table on a device is possible, and software that does not consider this
eventuality is flawed, and recommending to work around flawed software is
just burying your head in the sand.

But i believe i did not convince you one ounce more than you convinced
me, so i'll quit this thread, which has gone on long enough.

Regards,
L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \

^ permalink raw reply	[flat|nested] 42+ messages in thread
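Luca's point that software can "consider this eventuality" is testable: a
version 0.90 superblock sits 64 KiB before the end of the device, rounded
down to a 64 KiB boundary, so a probe only needs to read four bytes there.
A sketch, assuming a little-endian machine and the 0.90 metadata used
elsewhere in this thread:

  SIZE=$(blockdev --getsize64 /dev/sda)
  # 0.90 superblock offset: size rounded down to 64 KiB, minus 64 KiB
  OFFSET=$(( (SIZE & ~65535) - 65536 ))
  dd if=/dev/sda bs=1 skip=$OFFSET count=4 2>/dev/null | od -An -tx1
  # fc 4e 2b a9 is the md magic 0xa92b4efc stored little-endian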
* Re: Raid-10 mount at startup always has problem
  2007-10-28 13:37 ` Luca Berra
@ 2007-10-28 17:55 ` Doug Ledford
  0 siblings, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2007-10-28 17:55 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3249 bytes --]

On Sun, 2007-10-28 at 14:37 +0100, Luca Berra wrote:
> On Sat, Oct 27, 2007 at 04:47:30PM -0400, Doug Ledford wrote:
> >Most of the time it does. But those times when it can fail, the
> >failure is due to not taking the precautions necessary to prevent it:
> >aka labeling disk usage via some sort of partition table/disklabel/etc.
> I strongly disagree.
> The failure is badly designed software.

Then you need to blame Ingo, who made putting the superblock at the end
of the device the standard. If the superblock were always at the
beginning, then this whole argument would be moot. Things would be
reliable the way you want.

> >Using whole disk devices isn't a means of organizing space. It's a way
> >to get a rather minuscule amount of space back by *not* organizing the
> >space.
> if i am using, say, lvm to organize disk space, a partition table is
> unnecessary to the organization, and it is natural not to use one.

If you are using straight lvm then you don't have this problem anyway.
Lvm doesn't allow the underlying physical device to *look* like a valid,
partitioned, single device. Md does when the superblock is at the end.

> >This whole argument seems to boil down to you wanting to perfectly
> >optimize your system for your use case, which includes controlling the
> >environment enough that you know it's safe to not partition your disks,
> >whereas I argue that although this works in controlled environments, it
> >is known to have failure modes in other environments, and I would be
> >totally remiss if I recommended to my customers that they should take
> >the risk that you can ignore because of your controlled environment
> >since I know a lot of my customers *don't* have a controlled environment
> >such as you do.
>
> The whole argument to me boils down to the fact that not having a partition
> table on a device is possible, and software that does not consider this
> eventuality is flawed,

It's simply not possible to differentiate with 100% certainty between a
whole-disk, partitioned md device with the superblock at the end and a
regular device. Period. You can try to be clever, but you can also get
tripped up. The flaw is not with the software, it's with a design that
allowed this to happen.

> and recommending to work around flawed software is
> just burying your head in the sand.

If a design is broken but in place, I have no choice but to work around
it. Anything else is just stupid.

> But i believe i did not convince you one ounce more than you convinced
> me, so i'll quit this thread, which has gone on long enough.
>
> Regards,
> L.
>
> --
> Luca Berra -- bluca@comedia.it
> Communication Media & Services S.r.l.
> /"\ > \ / ASCII RIBBON CAMPAIGN > X AGAINST HTML MAIL > / \ > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
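The "superblock at the beginning" design Doug refers to is what the
version 1.1 and 1.2 metadata formats provide (1.1 at the very start of
the member, 1.2 at 4 KiB in; 0.90 and 1.0 sit at the end). A hedged
sketch of creating such an array with the mdadm of this era; the device
names and chunk size are illustrative, and the command destroys any
existing data on the members:

  mdadm --create /dev/md0 --metadata=1.1 --level=raid10 \
        --raid-devices=4 --chunk=32 /dev/sd[abcd]1

As Doug notes later in the thread, /boot is the exception: bootloaders of
the time could not read a filesystem that starts after a leading
superblock.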
* Re: Raid-10 mount at startup always has problem 2007-10-26 19:26 ` Doug Ledford 2007-10-27 7:50 ` Luca Berra @ 2007-10-29 0:21 ` Bill Davidsen 2007-10-29 7:41 ` Luca Berra 2007-10-29 14:31 ` Doug Ledford 1 sibling, 2 replies; 42+ messages in thread From: Bill Davidsen @ 2007-10-29 0:21 UTC (permalink / raw) To: Doug Ledford; +Cc: Luca Berra, linux-raid Doug Ledford wrote: > On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote: > >> On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote: >> >>> The partition table is the single, (mostly) universally recognized >>> arbiter of what possible data might be on the disk. Having a partition >>> table may not make mdadm recognize the md superblock any better, but it >>> keeps all that other stuff from even trying to access data that it >>> doesn't have a need to access and prevents random luck from turning your >>> day bad. >>> >> on a pc maybe, but that is 20 years old design. >> > > So? Unix is 35+ year old design, I suppose you want to switch to Vista > then? > > >> partition table design is limited because it is still based on C/H/S, >> which do not exist anymore. >> Put a partition table on a big storage, say a DMX, and enjoy a 20% >> performance decrease. >> > > Because you didn't stripe align the partition, your bad. > Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID you're about to create), or ??? I don't notice my FC6 or FC7 install programs using any special partition location to start, I have only run (tried to run) FC8-test3 for the live CD, so I can't say what it might do. CentOS4 didn't do anything obvious, either, so unless I really misunderstand your position at redhat, that would be your bad. ;-) If you mean start a partition on a pseudo-CHS boundary, fdisk seems to use what it thinks are cylinders for that. Please clarify what alignment provides a performance benefit. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem
  2007-10-29 0:21 ` Bill Davidsen
@ 2007-10-29 7:41 ` Luca Berra
  2007-10-29 13:22 ` Bill Davidsen
  2007-10-29 15:54 ` Gabor Gombas
  1 sibling, 2 replies; 42+ messages in thread
From: Luca Berra @ 2007-10-29 7:41 UTC (permalink / raw)
To: linux-raid

On Sun, Oct 28, 2007 at 08:21:34PM -0400, Bill Davidsen wrote:
>>Because you didn't stripe align the partition, your bad.
>>
>Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID
the real stripe (track) size of the storage, you must read the manual
and/or bug technical support for that info.

>you're about to create), or ??? I don't notice my FC6 or FC7 install
>programs using any special partition location to start, I have only run
>(tried to run) FC8-test3 for the live CD, so I can't say what it might
>do. CentOS4 didn't do anything obvious, either, so unless I really
>misunderstand your position at redhat, that would be your bad. ;-)
>
>If you mean start a partition on a pseudo-CHS boundary, fdisk seems to
>use what it thinks are cylinders for that.
Yes, fdisk will create partitions at sector 63 (due to CHS being braindead,
other than fictional: 63 sectors-per-track)
most arrays use 64 or 128 spt, and array caches are aligned accordingly.
So 63 is almost always the wrong choice.

for the default choice you must consider what spt your array uses, iirc
(this is from memory, so double check these figures)
IBM 64 spt (i think)
EMC DMX 64
EMC CX 128???
HDS (and HP XP) except OPEN-V 96
HDS (and HP XP) OPEN-V 128
HP EVA 4/6/8 with XCS 5.x state that no alignment is needed even though
i never found a technical explanation for that.
previous HP EVA versions did (maybe 64).
you might then want to consider how data is laid out on the storage, but
i believe the storage cache is enough to deal with that issue.

Please note that "0" is always well aligned.

Note to people who are now wondering WTH i am talking about.

consider a storage with 64 spt, an io size of 4k and a partition starting
at sector 63.
the first io request will require two ios from the storage (one for sector 63,
and one for sectors 64 to 70)
the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
on the same track
the 8th will again need to be split, and so on.
this causes the storage to do 1 unnecessary io every 8. YMMV.

L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \

^ permalink raw reply	[flat|nested] 42+ messages in thread
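A sketch of acting on this with period tools: fdisk's -u switch works in
sector units instead of fake cylinders, so the starting sector can be
chosen by hand. The device name and the 128-sector start are examples
only:

  fdisk -u /dev/sdb
  # at the first-sector prompt, enter 128 (or whatever matches your
  # array's sectors-per-track) instead of accepting the default 63

As Luca says, 0 is always well aligned, so any start that is a multiple
of the array's spt keeps 4k ios inside a single track.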
* Re: Raid-10 mount at startup always has problem
  2007-10-29 7:41 ` Luca Berra
@ 2007-10-29 13:22 ` Bill Davidsen
  2007-10-29 15:21 ` Doug Ledford
  2007-10-29 15:54 ` Gabor Gombas
  1 sibling, 1 reply; 42+ messages in thread
From: Bill Davidsen @ 2007-10-29 13:22 UTC (permalink / raw)
To: linux-raid

Luca Berra wrote:
> On Sun, Oct 28, 2007 at 08:21:34PM -0400, Bill Davidsen wrote:
>>> Because you didn't stripe align the partition, your bad.
>>>
>> Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID
> the real stripe (track) size of the storage, you must read the manual
> and/or bug technical support for that info.

That's my point, there *is* no "real stripe (track) size of the storage"
because modern drives use zone bit recording, and sectors per track
depends on track, and changes within a partition. See
http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm
http://www.storagereview.com/guide2000/ref/hdd/op/mediaTracks.html

>> you're about to create), or ??? I don't notice my FC6 or FC7 install
>> programs using any special partition location to start, I have only
>> run (tried to run) FC8-test3 for the live CD, so I can't say what it
>> might do. CentOS4 didn't do anything obvious, either, so unless I
>> really misunderstand your position at redhat, that would be your
>> bad. ;-)
>>
>> If you mean start a partition on a pseudo-CHS boundary, fdisk seems
>> to use what it thinks are cylinders for that.
> Yes, fdisk will create partitions at sector 63 (due to CHS being
> braindead, other than fictional: 63 sectors-per-track)
> most arrays use 64 or 128 spt, and array caches are aligned accordingly.
> So 63 is almost always the wrong choice.

As the above links show, there's no right choice.

> for the default choice you must consider what spt your array uses, iirc
> (this is from memory, so double check these figures)
> IBM 64 spt (i think)
> EMC DMX 64
> EMC CX 128???
> HDS (and HP XP) except OPEN-V 96
> HDS (and HP XP) OPEN-V 128
> HP EVA 4/6/8 with XCS 5.x state that no alignment is needed even though
> i never found a technical explanation for that.
> previous HP EVA versions did (maybe 64).
> you might then want to consider how data is laid out on the storage, but
> i believe the storage cache is enough to deal with that issue.
>
> Please note that "0" is always well aligned.
>
> Note to people who are now wondering WTH i am talking about.
>
> consider a storage with 64 spt, an io size of 4k and a partition starting
> at sector 63.
> the first io request will require two ios from the storage (one for sector 63,
> and one for sectors 64 to 70)
> the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
> on the same track
> the 8th will again need to be split, and so on.
> this causes the storage to do 1 unnecessary io every 8. YMMV.

No one makes drives with fixed spt any more. Your assumptions are a
decade out of date.

--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem
  2007-10-29 13:22 ` Bill Davidsen
@ 2007-10-29 15:21 ` Doug Ledford
  0 siblings, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2007-10-29 15:21 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1121 bytes --]

On Mon, 2007-10-29 at 09:22 -0400, Bill Davidsen wrote:
> > consider a storage with 64 spt, an io size of 4k and a partition starting
> > at sector 63.
> > the first io request will require two ios from the storage (one for sector 63,
> > and one for sectors 64 to 70)
> > the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
> > on the same track
> > the 8th will again need to be split, and so on.
> > this causes the storage to do 1 unnecessary io every 8. YMMV.
> No one makes drives with fixed spt any more. Your assumptions are a
> decade out of date.

You're missing the point: it's not about drive tracks, it's about array
tracks, aka chunks. A 64k write, that should write to one and only one
chunk, ends up spanning two. That increases the amount of writing the
array has to do and the number of disks it busies for a typical single
I/O operation.

--
Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread
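The arithmetic behind the chunk-boundary point: the default 63-sector
start puts the partition 32256 bytes into the device, which is not a
multiple of a 64 KiB chunk. A quick check, assuming 512-byte sectors:

  echo $(( (63 * 512) % (64 * 1024) ))    # 32256 -> misaligned
  echo $(( (128 * 512) % (64 * 1024) ))   # 0 -> a 128-sector start is aligned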
* Re: Raid-10 mount at startup always has problem
  2007-10-29 7:41 ` Luca Berra
  2007-10-29 13:22 ` Bill Davidsen
@ 2007-10-29 15:54 ` Gabor Gombas
  1 sibling, 0 replies; 42+ messages in thread
From: Gabor Gombas @ 2007-10-29 15:54 UTC (permalink / raw)
To: linux-raid

On Mon, Oct 29, 2007 at 08:41:39AM +0100, Luca Berra wrote:

> consider a storage with 64 spt, an io size of 4k and a partition starting
> at sector 63.
> the first io request will require two ios from the storage (one for sector 63,
> and one for sectors 64 to 70)
> the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
> on the same track
> the 8th will again need to be split, and so on.
> this causes the storage to do 1 unnecessary io every 8. YMMV.

That's only true for random reads. If the OS does sufficient read-ahead
then sequential reads are affected much less. But the killers are the
misaligned random writes since then (considering RAID5/6 for simplicity)
the stripe has to be read from all component disks before it can be
written back.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem
  2007-10-29 0:21 ` Bill Davidsen
  2007-10-29 7:41 ` Luca Berra
@ 2007-10-29 14:31 ` Doug Ledford
  1 sibling, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2007-10-29 14:31 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Luca Berra, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2633 bytes --]

On Sun, 2007-10-28 at 20:21 -0400, Bill Davidsen wrote:
> Doug Ledford wrote:
> > On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> >
> >> On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
> >>
> >>> The partition table is the single, (mostly) universally recognized
> >>> arbiter of what possible data might be on the disk. Having a partition
> >>> table may not make mdadm recognize the md superblock any better, but it
> >>> keeps all that other stuff from even trying to access data that it
> >>> doesn't have a need to access and prevents random luck from turning your
> >>> day bad.
> >>>
> >> on a pc maybe, but that is 20 years old design.
> >>
> >
> > So? Unix is 35+ year old design, I suppose you want to switch to Vista
> > then?
> >
> >
> >> partition table design is limited because it is still based on C/H/S,
> >> which do not exist anymore.
> >> Put a partition table on a big storage, say a DMX, and enjoy a 20%
> >> performance decrease.
> >>
> >
> > Because you didn't stripe align the partition, your bad.
> >
> Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID
> you're about to create), or ??? I don't notice my FC6 or FC7 install
> programs using any special partition location to start, I have only run
> (tried to run) FC8-test3 for the live CD, so I can't say what it might
> do. CentOS4 didn't do anything obvious, either, so unless I really
> misunderstand your position at redhat, that would be your bad. ;-)
>
> If you mean start a partition on a pseudo-CHS boundary, fdisk seems to
> use what it thinks are cylinders for that.
>
> Please clarify what alignment provides a performance benefit.

Luca was specifically talking about the big multi-terabyte to petabyte
hardware arrays on the market. DMX, DDN, and others. When they export a
volume to the OS, there is an underlying stripe layout to that volume.
If you don't use any partition table at all, you are automatically
aligned with their stripes. However, if you do, then you have to align
your partition on a chunk boundary or else performance drops pretty
dramatically, because more writes than not end up crossing chunk
boundaries unnecessarily. It's only relevant when you are talking about
a raid device that shows the OS a single logical disk made from lots of
other disks.

--
Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem 2007-10-25 6:40 ` Doug Ledford 2007-10-26 9:15 ` Luca Berra @ 2007-10-29 5:59 ` Daniel L. Miller 2007-10-29 8:18 ` Luca Berra ` (2 more replies) 1 sibling, 3 replies; 42+ messages in thread From: Daniel L. Miller @ 2007-10-29 5:59 UTC (permalink / raw) To: linux-raid Doug Ledford wrote: > Anyway, I happen to *like* the idea of using full disk devices, but the > reality is that the md subsystem doesn't have exclusive ownership of the > disks at all times, and without that it really needs to stake a claim on > the space instead of leaving things to chance IMO. > I've been re-reading this post numerous times - trying to ignore the burgeoning flame war :) - and this last sentence finally clicked with me. As I'm a novice Linux user - and not involved in development at all - bear with me if I'm stating something obvious. And if I'm wrong - please be gentle! 1. md devices are not "native" to the kernel - they are created/assembled/activated/whatever by a userspace program. 2. Because md devices are "non-native" devices, and are composed of "native" devices, the kernel may try to use those components directly without going through md. 3. Creating a partition table somehow (I'm still not clear how/why) reduces the chance the kernel will access the drive directly without md. These concepts suddenly have me terrified over my data integrity. Is the md system so delicate that BOOT sequence can corrupt it? How is it more reliable AFTER the completed boot sequence? Nothing in the documentation (that I read - granted I don't always read everything) stated that partitioning prior to md creation was necessary - in fact references were provided on how to use complete disks. Is there an "official" position on, "To Partition, or Not To Partition"? Particularly for my application - dedicated Linux server, RAID-10 configuration, identical drives. And if partitioning is the answer - what do I need to do with my live dataset? Drop one drive, partition, then add the partition as a new drive to the set - and repeat for each drive after the rebuild finishes? -- Daniel ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem
  2007-10-29 5:59 ` Daniel L. Miller
@ 2007-10-29 8:18 ` Luca Berra
  2007-10-29 15:47 ` Doug Ledford
  2007-10-29 17:08 ` Doug Ledford
  2007-10-29 18:56 ` Richard Scobie
  2 siblings, 1 reply; 42+ messages in thread
From: Luca Berra @ 2007-10-29 8:18 UTC (permalink / raw)
To: linux-raid

On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
>Doug Ledford wrote:
>>Anyway, I happen to *like* the idea of using full disk devices, but the
>>reality is that the md subsystem doesn't have exclusive ownership of the
>>disks at all times, and without that it really needs to stake a claim on
>>the space instead of leaving things to chance IMO.
>>
>I've been re-reading this post numerous times - trying to ignore the
>burgeoning flame war :) - and this last sentence finally clicked with me.
>
I am sorry Daniel, when i read Doug and Bill stating that your issue
was not having a partition table, i immediately took the bait and forgot
about your original issue.

I have no reason to believe your problem is due to not having a
partition table on your devices.

.... sda: unknown partition table
.... sdb: unknown partition table
.... sdc: unknown partition table
.... sdd: unknown partition table

the above clearly shows that the kernel does not see a partition table
where there is none, which is what happens in some cases and bit Doug
so hard.
Note, it does not happen at random, it should happen only if you use a
partitioned md device with a superblock at the end.
Or if you configure it wrongly, as Doug did.
(i am not accusing Doug of being stupid at all, it is a fairly common
mistake to make and we should try to prevent this in mdadm as much as
we can)

Again, having the kernel find a partition table where there is none
should not pose a problem at all unless there is some badly designed
software like udev/hal that believes it knows better than you about what
you have on your disks.
but _NEITHER OF THESE IS YOUR PROBLEM_ imho

I am also sorry to say that i fail to identify what the source of your
problem is; we should try harder instead of flaming at each other.

Is it possible to reproduce it on the live system,
e.g. unmount, stop the array, start it again and mount?
I bet it will work flawlessly in this case.
then i would disable starting this array at boot, and start it manually
when the system is up (stracing mdadm, so we can see what it does)

I am also wondering about this:
md: md0: raid array is not clean -- starting background reconstruction
does your system shut down properly?
do you see the message about stopping md at the very end of the
reboot/halt process?

L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \

^ permalink raw reply	[flat|nested] 42+ messages in thread
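For completeness, Luca's reproduction suggestion as a command sequence; a
sketch for a quiet moment on the live system, with illustrative paths:

  umount /dev/md0
  mdadm --stop /dev/md0
  strace -f -o /tmp/mdadm-assemble.trace mdadm -As -v
  mount /dev/md0
  cat /proc/mdstat    # a healthy 4-disk raid10 shows [UUUU]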
* Re: Raid-10 mount at startup always has problem
  2007-10-29 8:18 ` Luca Berra
@ 2007-10-29 15:47 ` Doug Ledford
  2007-10-29 21:29 ` Luca Berra
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2007-10-29 15:47 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4533 bytes --]

On Mon, 2007-10-29 at 09:18 +0100, Luca Berra wrote:
> On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
> >Doug Ledford wrote:
> >>Anyway, I happen to *like* the idea of using full disk devices, but the
> >>reality is that the md subsystem doesn't have exclusive ownership of the
> >>disks at all times, and without that it really needs to stake a claim on
> >>the space instead of leaving things to chance IMO.
> >>
> >I've been re-reading this post numerous times - trying to ignore the
> >burgeoning flame war :) - and this last sentence finally clicked with me.
> >
> I am sorry Daniel, when i read Doug and Bill stating that your issue
> was not having a partition table, i immediately took the bait and forgot
> about your original issue.

I never said *his* issue was a lack of a partition table, I just said I
don't recommend that because it's flaky. The last statement I made
about his issue was to ask about whether the problem was happening
during initrd time or sysinit time to try and identify if it was failing
before or after / was mounted to try and determine where the issue might
lie. Then we got off on the tangent about partitions, and at the same
time Neil started asking about udev, at which point it came out that
he's running ubuntu, and as much as I would like to help, the fact of
the matter is that I've never touched ubuntu and wouldn't have the
faintest clue, so I let Neil handle it. At which point he found that
the udev scripts in ubuntu are being stupid, and from the looks of it
are the cause of the problem. So, I've considered the initial issue
root caused for a bit now.

> like udev/hal that believes it knows better than you about what you have
> on your disks.
> but _NEITHER OF THESE IS YOUR PROBLEM_ imho

Actually, it looks like udev *is* the problem, but not because of
partition tables.

> I am also sorry to say that i fail to identify what the source of your
> problem is; we should try harder instead of flaming at each other.

We can do both, or at least I can :-P

> Is it possible to reproduce it on the live system,
> e.g. unmount, stop the array, start it again and mount?
> I bet it will work flawlessly in this case.
> then i would disable starting this array at boot, and start it manually
> when the system is up (stracing mdadm, so we can see what it does)
>
> I am also wondering about this:
> md: md0: raid array is not clean -- starting background reconstruction
> does your system shut down properly?
> do you see the message about stopping md at the very end of the
> reboot/halt process?

The root cause is that as udev adds his sata devices one at a time, on
each add of the sata device it invokes mdadm to see if there is an array
to start, and it doesn't use incremental mode on mdadm. As a result, as
soon as there are 3 out of the 4 disks present, mdadm starts the array
in degraded mode. It's probably a race between the mdadm started on the
third disk and the mdadm started on the fourth disk that results in the
message about being unable to set the array info.
The one losing the race gets the error, as the other one has already
manipulated the array (for example, the 4th disk mdadm could be trying
to add the first disk to the array, but it's already there, so it gets
this error and bails).

So, as much as you might dislike mkinitrd since 5.0 Luca, it doesn't
have this particular problem ;-) In the initrd we produce, it loads all
the SCSI/SATA/etc drivers first, then calls mkblkdevs which forces all
of the devices to appear in /dev, and only then does it start the
mdadm/lvm configuration.

Daniel, I make no promises what so ever that this will even work at all
as it may fail to load modules or all other sorts of weirdness, but if
you want to test the theory, you can download the latest mkinitrd from
fedoraproject.org, then use it to create an initrd image under some
other name than your default image name, then manually edit your boot
to have an extra stanza that uses the mkinitrd generated initrd image
instead of the ubuntu image, and then just see if it brings the md
device up cleanly instead of in degraded mode. That should be a fairly
quick and easy way to test if Neil's analysis of the udev script was
right.

--
Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread
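For comparison, the incremental-mode udev rule alluded to above would
look roughly like the sketch below. Whether $env{DEVNAME} is available
depends on the udev version, so treat this as an illustration rather
than a drop-in fix:

  SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid*", \
      RUN+="/sbin/mdadm --incremental $env{DEVNAME}"

With --incremental, mdadm adds one member at a time and only starts the
array once all expected devices have appeared, instead of racing a full
assemble on every hotplug event.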
* Re: Raid-10 mount at startup always has problem 2007-10-29 15:47 ` Doug Ledford @ 2007-10-29 21:29 ` Luca Berra 2007-10-29 23:15 ` Doug Ledford 0 siblings, 1 reply; 42+ messages in thread From: Luca Berra @ 2007-10-29 21:29 UTC (permalink / raw) To: linux-raid On Mon, Oct 29, 2007 at 11:47:19AM -0400, Doug Ledford wrote: >On Mon, 2007-10-29 at 09:18 +0100, Luca Berra wrote: >> On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote: >> >Doug Ledford wrote: >> >>Anyway, I happen to *like* the idea of using full disk devices, but the >> >>reality is that the md subsystem doesn't have exclusive ownership of the >> >>disks at all times, and without that it really needs to stake a claim on >> >>the space instead of leaving things to chance IMO. >> >> >> >I've been re-reading this post numerous times - trying to ignore the >> >burgeoning flame war :) - and this last sentence finally clicked with me. >> > >> I am sorry Daniel, when i read Doug and Bill, stating that your issue >> was not having a partition table, i immediately took the bait and forgot >> about your original issue. > >I never said *his* issue was lack of partition table, I just said I >don't recommend that because it's flaky. The last statement I made maybe i misread you but Bill was quite clear. >about his issue was to ask about whether the problem was happening >during initrd time or sysinit time to try and identify if it was failing >before or after / was mounted to try and determine where the issue might >lay. Then we got off on the tangent about partitions, and at the same >time Neil started asking about udev, at which point it came out that >he's running ubuntu, and as much as I would like to help, the fact of >the matter is that I've never touched ubuntu and wouldn't have the >faintest clue, so I let Neil handle it. At which point he found that >the udev scripts in ubuntu are being stupid, and from the looks of it >are the cause of the problem. So, I've considered the initial issue >root caused for a bit now. It seems i made an idiot of myself by missing half of the thread, and i even knew ubuntu was braindead in their use of udev at startup, since a similar discussion came up on the lvm or the dm-devel mailing list (that time iirc it was about lvm over multipath) >> like udev/hal that believes it knows better than you about what you have >> on your disks. >> but _NEITHER OF THESE IS YOUR PROBLEM_ imho > >Actually, it looks like udev *is* the problem, but not because of >partition tables. you are right. L. -- Luca Berra -- bluca@comedia.it Communication Media & Services S.r.l. /"\ \ / ASCII RIBBON CAMPAIGN X AGAINST HTML MAIL / \ ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem 2007-10-29 21:29 ` Luca Berra @ 2007-10-29 23:15 ` Doug Ledford 2007-10-30 0:03 ` Daniel L. Miller 0 siblings, 1 reply; 42+ messages in thread From: Doug Ledford @ 2007-10-29 23:15 UTC (permalink / raw) To: Luca Berra; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 1185 bytes --] On Mon, 2007-10-29 at 22:29 +0100, Luca Berra wrote: > At which point he found that > >the udev scripts in ubuntu are being stupid, and from the looks of it > >are the cause of the problem. So, I've considered the initial issue > >root caused for a bit now. > It seems i made an idiot of myself by missing half of the thread, and i > even knew ubuntu was braindead in their use of udev at startup, since a > similar discussion came up on the lvm or the dm-devel mailing list (that > time iirc it was about lvm over multipath) Nah. Even if we had concluded that udev was to blame here, I'm not entirely certain that we hadn't left Daniel with the impression that we suspected it versus blamed it, so reiterating it doesn't hurt. And I'm sure no one has given him a fix for the problem (although Neil did request a change that will give debug output, but not solve the problem), so not dropping it entirely would seem appropriate as well. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem 2007-10-29 23:15 ` Doug Ledford @ 2007-10-30 0:03 ` Daniel L. Miller 2007-11-01 13:56 ` Bill Davidsen 2007-12-17 14:58 ` Daniel L. Miller 0 siblings, 2 replies; 42+ messages in thread From: Daniel L. Miller @ 2007-10-30 0:03 UTC (permalink / raw) To: linux-raid Doug Ledford wrote: > Nah. Even if we had concluded that udev was to blame here, I'm not > entirely certain that we hadn't left Daniel with the impression that we > suspected it versus blamed it, so reiterating it doesn't hurt. And I'm > sure no one has given him a fix for the problem (although Neil did > request a change that will give debug output, but not solve the > problem), so not dropping it entirely would seem appropriate as well. > I've opened a bug report on Ubuntu's Launchpad.net. Scott James Remnant asked me to cc him on Neil's incremental reference - we'll see what happens from here. Thanks for the help guys. At the moment, I've changed my mdadm.conf to explicitly list the drives, instead of the auto=partition parameter. We'll see what happens on the next reboot. I don't know if it means anything, but I'm using a self-compiled 2.6.22 kernel - with initrd. At least I THINK I'm using initrd - I have an image, but I don't see an initrd line in my grub config. Hmm....I'm going to add a stanza that includes the initrd and see what happens also. -- Daniel ^ permalink raw reply [flat|nested] 42+ messages in thread
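For reference, the explicit-device variant Daniel describes would
presumably look something like this, with the UUID taken from earlier in
the thread and the device names assumed:

  DEVICE /dev/sda /dev/sdb /dev/sdc /dev/sdd
  ARRAY /dev/md0 level=raid10 num-devices=4 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a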
* Re: Raid-10 mount at startup always has problem 2007-10-30 0:03 ` Daniel L. Miller @ 2007-11-01 13:56 ` Bill Davidsen 2007-12-17 14:58 ` Daniel L. Miller 1 sibling, 0 replies; 42+ messages in thread From: Bill Davidsen @ 2007-11-01 13:56 UTC (permalink / raw) To: Daniel L. Miller; +Cc: linux-raid Daniel L. Miller wrote: > Doug Ledford wrote: >> Nah. Even if we had concluded that udev was to blame here, I'm not >> entirely certain that we hadn't left Daniel with the impression that we >> suspected it versus blamed it, so reiterating it doesn't hurt. And I'm >> sure no one has given him a fix for the problem (although Neil did >> request a change that will give debug output, but not solve the >> problem), so not dropping it entirely would seem appropriate as well. >> > I've opened a bug report on Ubuntu's Launchpad.net. Scott James > Remnant asked me to cc him on Neil's incremental reference - we'll see > what happens from here. > > Thanks for the help guys. At the moment, I've changed my mdadm.conf > to explicitly list the drives, instead of the auto=partition > parameter. We'll see what happens on the next reboot. > > I don't know if it means anything, but I'm using a self-compiled > 2.6.22 kernel - with initrd. At least I THINK I'm using initrd - I > have an image, but I don't see an initrd line in my grub config. > Hmm....I'm going to add a stanza that includes the initrd and see what > happens also. > What did that do? -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem
  2007-10-30 0:03 ` Daniel L. Miller
  2007-11-01 13:56 ` Bill Davidsen
@ 2007-12-17 14:58 ` Daniel L. Miller
  1 sibling, 0 replies; 42+ messages in thread
From: Daniel L. Miller @ 2007-12-17 14:58 UTC (permalink / raw)
To: linux-raid

Daniel L. Miller wrote:
> Doug Ledford wrote:
>> Nah. Even if we had concluded that udev was to blame here, I'm not
>> entirely certain that we hadn't left Daniel with the impression that we
>> suspected it versus blamed it, so reiterating it doesn't hurt. And I'm
>> sure no one has given him a fix for the problem (although Neil did
>> request a change that will give debug output, but not solve the
>> problem), so not dropping it entirely would seem appropriate as well.
>>
> I've opened a bug report on Ubuntu's Launchpad.net. Scott James
> Remnant asked me to cc him on Neil's incremental reference - we'll see
> what happens from here.
>
> Thanks for the help guys. At the moment, I've changed my mdadm.conf
> to explicitly list the drives, instead of the auto=partition
> parameter. We'll see what happens on the next reboot.
>
> I don't know if it means anything, but I'm using a self-compiled
> 2.6.22 kernel - with initrd. At least I THINK I'm using initrd - I
> have an image, but I don't see an initrd line in my grub config.
> Hmm....I'm going to add a stanza that includes the initrd and see what
> happens also.
>
Wow. Been a while since I asked about this - I just realized a reboot
or two has come and gone. I checked my md status - everything was
online! Cool. My current dmesg output:

sata_nv 0000:00:07.0: version 3.4
ACPI: PCI Interrupt Link [LTID] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:07.0[A] -> Link [LTID] -> GSI 23 (level, high) -> IRQ 23
sata_nv 0000:00:07.0: Using ADMA mode
PCI: Setting latency timer of device 0000:00:07.0 to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xffffc20001428480 ctl 0xffffc200014284a0 bmdma 0x0000000000011410 irq 23
ata2: SATA max UDMA/133 cmd 0xffffc20001428580 ctl 0xffffc200014285a0 bmdma 0x0000000000011418 irq 23
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata1: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
scsi 1:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata2: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LSI1] -> GSI 22 (level, high) -> IRQ 22
sata_nv 0000:00:08.0: Using ADMA mode
PCI: Setting latency timer of device 0000:00:08.0 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0xffffc2000142a480 ctl 0xffffc2000142a4a0 bmdma 0x0000000000011420 irq 22
ata4: SATA max UDMA/133 cmd 0xffffc2000142a580 ctl 0xffffc2000142a5a0 bmdma 0x0000000000011428 irq 22
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata4.00:
312581808 sectors, multi 16: LBA48 NCQ (depth 31/32) ata4.00: configured for UDMA/133 scsi 2:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5 ata3: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61 scsi 3:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5 ata4: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61 sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: unknown partition table sd 0:0:0:0: [sda] Attached SCSI disk sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB) sd 1:0:0:0: [sdb] Write Protect is off sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00 sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB) sd 1:0:0:0: [sdb] Write Protect is off sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00 sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdb: unknown partition table sd 1:0:0:0: [sdb] Attached SCSI disk sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB) sd 2:0:0:0: [sdc] Write Protect is off sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00 sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB) sd 2:0:0:0: [sdc] Write Protect is off sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00 sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdc: unknown partition table sd 2:0:0:0: [sdc] Attached SCSI disk sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors (160042 MB) sd 3:0:0:0: [sdd] Write Protect is off sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00 sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors (160042 MB) sd 3:0:0:0: [sdd] Write Protect is off sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00 sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdd: unknown partition table sd 3:0:0:0: [sdd] Attached SCSI disk Adding 8000328k swap on /dev/hda5. Priority:-1 extents:1 across:8000328k EXT3 FS on hda1, internal journal device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com md: md0 stopped. md: bind<sdb> md: bind<sdc> md: bind<sdd> md: bind<sda> md: md0: raid array is not clean -- starting background reconstruction raid10: raid set md0 active with 4 out of 4 devices md: resync of RAID array md0 md: minimum _guaranteed_ speed: 1000 KB/sec/disk. md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync. md: using 128k window, over a total of 312581632 blocks. tg3: eth0: Link is up at 1000 Mbps, full duplex. tg3: eth0: Flow control is on for TX and on for RX. 
Filesystem "md0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem md0
Starting XFS recovery on filesystem: md0 (logdev: internal)
Ending XFS recovery on filesystem: md0 (logdev: internal)
XFS mounting filesystem hda2
Starting XFS recovery on filesystem: hda2 (logdev: internal)
Ending XFS recovery on filesystem: hda2 (logdev: internal)
XFS mounting filesystem hda3
Starting XFS recovery on filesystem: hda3 (logdev: internal)
Ending XFS recovery on filesystem: hda3 (logdev: internal)
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
PM: Writing back config space on device 0000:0a:09.1 at offset b (was 164814e4, writing 164414e4)
PM: Writing back config space on device 0000:0a:09.1 at offset 3 (was 804000, writing 804010)
PM: Writing back config space on device 0000:0a:09.1 at offset 2 (was 2000000, writing 2000003)
PM: Writing back config space on device 0000:0a:09.1 at offset 1 (was 2b00000, writing 2b00106)
ADDRCONF(NETDEV_UP): eth1: link is not ready
Bridge firewalling registered
device eth1 entered promiscuous mode
audit(1197159016.060:2): dev=eth1 prom=256 old_prom=0 auid=4294967295
device tap1 entered promiscuous mode
audit(1197159016.060:3): dev=tap1 prom=256 old_prom=0 auid=4294967295
br1: starting userspace STP failed, starting kernel STP
br1: port 2(tap1) entering listening state
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is on for TX and on for RX.
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
br1: port 1(eth1) entering listening state
br1: port 2(tap1) entering learning state
br1: port 1(eth1) entering learning state
eth0: no IPv6 routers present
br1: topology change detected, propagating
br1: port 2(tap1) entering forwarding state
tap1: no IPv6 routers present
br1: no IPv6 routers present
br1: topology change detected, propagating
br1: port 1(eth1) entering forwarding state
eth1: no IPv6 routers present
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
tun0: Disabled Privacy Extensions
parport_pc 00:0c: reported by Plug and Play ACPI
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
lp0: using parport0 (interrupt-driven).
NET: Registered protocol family 17
vmmon: module license 'unspecified' taints kernel.
/dev/vmmon[4622]: VMCI: Driver initialized.
/dev/vmmon[4622]: Module vmmon: registered with major=10 minor=165
/dev/vmmon[4622]: Module vmmon: initialized
/dev/vmnet: open called by PID 4649 (vmnet-bridge)
/dev/vmnet: hub 0 does not exist, allocating memory.
/dev/vmnet: port on hub 0 successfully opened
bridge-br1: enabling the bridge
bridge-br1: up
bridge-br1: already up
bridge-br1: attached
/dev/vmnet: open called by PID 4663 (vmnet-natd)
/dev/vmnet: hub 8 does not exist, allocating memory.
/dev/vmnet: port on hub 8 successfully opened /dev/vmnet: open called by PID 4668 (vmnet-netifup) /dev/vmnet: port on hub 8 successfully opened /dev/vmnet: open called by PID 4679 (vmnet-dhcpd) /dev/vmnet: port on hub 8 successfully opened vmnet8: no IPv6 routers present /dev/vmnet: open called by PID 4798 (vmware-vmx) device br1 entered promiscuous mode audit(1197159105.109:4): dev=br1 prom=256 old_prom=0 auid=4294967295 bridge-br1: enabled promiscuous mode /dev/vmnet: port on hub 0 successfully opened /dev/vmmon[4864]: host clock rate change request 0 -> 19 /dev/vmmon[4864]: host clock rate change request 19 -> 83 device br1 left promiscuous mode audit(1197159183.647:5): dev=br1 prom=0 old_prom=256 auid=4294967295 bridge-br1: disabled promiscuous mode /dev/vmnet: open called by PID 4864 (vmware-vmx) device br1 entered promiscuous mode audit(1197159183.647:6): dev=br1 prom=256 old_prom=0 auid=4294967295 bridge-br1: enabled promiscuous mode /dev/vmnet: port on hub 0 successfully opened /dev/vmnet: open called by PID 4945 (vmware-vmx) /dev/vmnet: port on hub 0 successfully opened /dev/vmnet: open called by PID 4983 (vmware-vmx) /dev/vmnet: port on hub 0 successfully opened md: md0: resync done. RAID10 conf printout: --- wd:4 rd:4 disk 0, wo:0, o:1, dev:sda disk 1, wo:0, o:1, dev:sdb disk 2, wo:0, o:1, dev:sdc disk 3, wo:0, o:1, dev:sdd /dev/vmnet: open called by PID 4983 (vmware-vmx) /dev/vmnet: port on hub 0 successfully opened vmmon: Had to deallocate locked 118026 pages from vm driver ffff810123e5a000 vmmon: Had to deallocate AWE 3437 pages from vm driver ffff810123e5a000 -- Daniel ^ permalink raw reply [flat|nested] 42+ messages in thread
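One line in that log is worth keeping an eye on: "raid array is not
clean -- starting background reconstruction" means the array was not
marked clean at the previous shutdown, which is exactly the question
Luca raised. Two hedged checks:

  # after the resync finishes, the array should report clean:
  mdadm --detail /dev/md0 | grep -i state
  # and after the next orderly reboot, this should come back empty:
  dmesg | grep -i 'not clean'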
* Re: Raid-10 mount at startup always has problem 2007-10-29 5:59 ` Daniel L. Miller 2007-10-29 8:18 ` Luca Berra @ 2007-10-29 17:08 ` Doug Ledford 2007-10-29 18:56 ` Richard Scobie 2 siblings, 0 replies; 42+ messages in thread From: Doug Ledford @ 2007-10-29 17:08 UTC (permalink / raw) To: Daniel L. Miller; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 6078 bytes --] On Sun, 2007-10-28 at 22:59 -0700, Daniel L. Miller wrote: > Doug Ledford wrote: > > Anyway, I happen to *like* the idea of using full disk devices, but the > > reality is that the md subsystem doesn't have exclusive ownership of the > > disks at all times, and without that it really needs to stake a claim on > > the space instead of leaving things to chance IMO. > > > I've been re-reading this post numerous times - trying to ignore the > burgeoning flame war :) - and this last sentence finally clicked with me. > > As I'm a novice Linux user - and not involved in development at all - > bear with me if I'm stating something obvious. And if I'm wrong - > please be gentle! > > 1. md devices are not "native" to the kernel - they are > created/assembled/activated/whatever by a userspace program. My real point was that md doesn't own the disks, meaning that during startup, and at other points in time, software other than the md stack can attempt to use the disk directly. That software may be the linux file system code, linux lvm code, or in some case entirely different OS software. Given that these situations can arise, using a partition table to mark the space as in use by linux is what I meant by staking a claim. It doesn't keep the linux kernel from using it because it thinks it owns it, but it does stop other software from attempting to use it. > 2. Because md devices are "non-native" devices, and are composed of > "native" devices, the kernel may try to use those components directly > without going through md. In the case of superblocks at the end, yes. The kernel may see the underlying file system or lvm disk label even if the md device is not started. > 3. Creating a partition table somehow (I'm still not clear how/why) > reduces the chance the kernel will access the drive directly without md. The partition table is more to tell other software that linux owns the space and to avoid mistakes where someone runs fdisk on a disk accidentally and wipes out your array because they added a partition table on what they thought was a new disk (more likely when you have large arrays of disks attached via fiber channel or such than in a single system). Putting the superblock at the beginning of the md device is the main thing that guarantees the kernel will never try to use what's inside the md device without the md device running. > These concepts suddenly have me terrified over my data integrity. Is > the md system so delicate that BOOT sequence can corrupt it? If you have your superblocks at the end of the devices, then there are certain failure modes that can cause data inconsistencies. Generally speaking they won't harm the array itself, it's just that the different disks in a raid1 array might contain different data. If you don't use partitions, then the majority of failure scenarios involve things like accidental use of fdisk on the unpartitioned device, access of the device by other OSes, that sort of thing. > How is it > more reliable AFTER the completed boot sequence? 
Once the array is up and running, the constituent disks are marked as
busy in the operating system, which prevents other portions of the linux
kernel and other software in general from getting at the md-owned disks.

> Nothing in the documentation (that I read - granted I don't always read
> everything) stated that partitioning prior to md creation was necessary
> - in fact references were provided on how to use complete disks. Is
> there an "official" position on, "To Partition, or Not To Partition"?
> Particularly for my application - dedicated Linux server, RAID-10
> configuration, identical drives.
>
> And if partitioning is the answer - what do I need to do with my live
> dataset? Drop one drive, partition, then add the partition as a new
> drive to the set - and repeat for each drive after the rebuild finishes?

You *probably*, and I emphasize probably, don't need to do anything. I
emphasize it because I don't know enough about your situation to say so
with 100% certainty. If I'm wrong, it's not my fault.

Now, that said, here's the gist of the situation. There are specific
failure cases that can corrupt data in an md raid1 array, mainly related
to superblocks at the end of devices. There are specific failure cases
where an unpartitioned device can be accidentally partitioned, or where
a partitioned md array, in combination with superblocks at the end and a
whole disk device, can be misrecognized as a partitioned normal drive.
There are, on the other hand, cases where it's perfectly safe to use
unpartitioned devices, or superblocks at the end of devices.

My recommendation when someone asks what to do is to use partitions, and
to use superblocks at the beginning of the devices (except for /boot
since that isn't supported at the moment). The reason I give that advice
is that I assume if a person knows enough to know when it's safe to use
unpartitioned devices, like Luca, then they wouldn't be asking me for
advice. So since they *are* asking my advice, and since a lot of the
failure cases have as much to do with human error as they do with
software error, and since human error always seems to find new ways to
err, it's impossible to list all the error cases, and so it's best just
to give the known safe advice.

Just because you heard the advice after creating your arrays is no
reason to panic though. Since the disks are local to your linux server
and not attached via a fiber channel network or something similar, about
2/3rds of the failure cases drop away immediately. And given that you
are using raid10 instead of raid1, the possible silent inconsistency
issue drops away. All in all, you're pretty safe.

--
Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread
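If someone in Daniel's position nonetheless decided to migrate a live
raid10 to partitioned members, the disk-at-a-time procedure he outlined
earlier would go roughly as below. A sketch only: each rebuild must
finish before the next disk is touched, a disk failure mid-migration can
cost data, and md will refuse the partition if it comes out smaller than
the array's used device size:

  mdadm /dev/md0 --fail /dev/sdd --remove /dev/sdd
  fdisk -u /dev/sdd        # one partition, type fd, aligned as discussed
  mdadm /dev/md0 --add /dev/sdd1
  watch cat /proc/mdstat   # wait for the rebuild, then repeat for sdc, sdb, sda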
* Re: Raid-10 mount at startup always has problem
  2007-10-29 5:59 ` Daniel L. Miller
  2007-10-29 8:18 ` Luca Berra
  2007-10-29 17:08 ` Doug Ledford
@ 2007-10-29 18:56 ` Richard Scobie
  2 siblings, 0 replies; 42+ messages in thread
From: Richard Scobie @ 2007-10-29 18:56 UTC (permalink / raw)
To: linux-raid

Daniel L. Miller wrote:

> Nothing in the documentation (that I read - granted I don't always read
> everything) stated that partitioning prior to md creation was necessary
> - in fact references were provided on how to use complete disks. Is
> there an "official" position on, "To Partition, or Not To Partition"?
> Particularly for my application - dedicated Linux server, RAID-10
> configuration, identical drives.

My simplistic reason for always making one partition on md drives,
about 100MB smaller than the full space, has been insurance: it allows
use of a replacement drive from another manufacturer which, while
nominally marked as the same size as the originals, is in fact slightly
smaller.

Regards,

Richard

^ permalink raw reply	[flat|nested] 42+ messages in thread
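Richard's convention can be made concrete with sfdisk in sector units.
Using the 312581808-sector drives from this thread and leaving 204800
sectors (100 MiB) of slack at the end; the numbers are illustrative and
the command overwrites any existing label:

  echo '63,312376945,fd' | sfdisk -uS /dev/sde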
* Re: Raid-10 mount at startup always has problem
  2007-10-24 14:22 ` Daniel L. Miller
  2007-10-24 16:25 ` Doug Ledford
  2007-10-24 20:01 ` Bill Davidsen
@ 2007-10-25 6:12 ` Neil Brown
  2007-10-25 6:51 ` Doug Ledford
  ` (3 more replies)
  2 siblings, 4 replies; 42+ messages in thread
From: Neil Brown @ 2007-10-25 6:12 UTC (permalink / raw)
To: Daniel L. Miller; +Cc: linux-raid

On Wednesday October 24, dmiller@amfes.com wrote:
> Current mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>
> still have the problem where on boot one drive is not part of the
> array. Is there a log file I can check to find out WHY a drive is not
> being added? It's been a while since the reboot, but I did find some
> entries in dmesg - I'm appending both the md lines and the physical disk
> related lines. The bottom shows one disk not being added (this time it
> was sda) - and the disk that gets skipped on each boot seems to be
> random - there's no consistent failure:

Odd.... but interesting.
Does it sometimes fail to start the array altogether?

> md: md0 stopped.
> md: md0 stopped.
> md: bind<sdc>
> md: bind<sdd>
> md: bind<sdb>
> md: md0: raid array is not clean -- starting background reconstruction
> raid10: raid set md0 active with 3 out of 4 devices
> md: couldn't update array info. -22
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is the most surprising line, and hence the one most likely to
convey helpful information.

This message is generated when a process calls "SET_ARRAY_INFO" on an
array that is already running, and the changes implied by the new
"array_info" are not supportable.

The only way I can see this happening is if two copies of "mdadm" are
running at exactly the same time and both are trying to assemble the
same array. The first calls SET_ARRAY_INFO and assembles the (partial)
array. The second calls SET_ARRAY_INFO and gets this error. Not all
devices are included because when one mdadm went to look at a device,
the other had it locked, and so the first just ignored it.

I just tried that, and sometimes it worked, but sometimes it assembled
with 3 out of 4 devices. I didn't get the "couldn't update array info"
message, but that doesn't prove I'm wrong.

I cannot imagine how that might be happening (two at once) unless maybe
'udev' had been configured to do something as soon as devices were
discovered.... seems unlikely.

It might be worth finding out where mdadm is being run in the init
scripts and add a "-v" flag, and redirecting stdout/stderr to some log
file. e.g.
mdadm -As -v > /var/log/mdadm-$$ 2>&1

And see if that leaves something useful in the log file.

BTW, I don't think your problem has anything to do with the fact that
you are using whole disks. While it is debatable whether that is a good
idea or not (I like the idea, but Doug doesn't and I respect his
opinion) I doubt it would contribute to the current problem.

Your description makes me nearly certain that there is some sort of
race going on (that is the easiest way to explain randomly differing
behaviours). The race is probably between different code 'locking'
(opening with O_EXCL) the various devices. Given the above error
message, two different 'mdadm's seem most likely, but an mdadm and a
mount-by-label scan could probably do it too.

NeilBrown

^ permalink raw reply	[flat|nested] 42+ messages in thread
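Neil's "I just tried that" can be reproduced deliberately on a scratch
machine; a rough sketch of racing two assemblies, not something to run
on a box whose data matters:

  mdadm --stop /dev/md0
  mdadm -As -v > /tmp/mdadm-a.log 2>&1 &
  mdadm -As -v > /tmp/mdadm-b.log 2>&1
  wait
  cat /proc/mdstat   # sometimes 4 of 4 devices, sometimes 3, per Neil's test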
* Re: Raid-10 mount at startup always has problem
  2007-10-25  6:12       ` Neil Brown
@ 2007-10-25  6:51         ` Doug Ledford
  2007-10-25 13:33         ` Daniel L. Miller
  ` (1 subsequent sibling)
  2 siblings, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2007-10-25 6:51 UTC (permalink / raw)
To: Neil Brown; +Cc: Daniel L. Miller, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]

On Thu, 2007-10-25 at 16:12 +1000, Neil Brown wrote:
> > md: md0 stopped.
> > md: md0 stopped.
> > md: bind<sdc>
> > md: bind<sdd>
> > md: bind<sdb>
> > md: md0: raid array is not clean -- starting background reconstruction
> > raid10: raid set md0 active with 3 out of 4 devices
> > md: couldn't update array info. -22
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is the most surprising line, and hence the one most likely to
> convey helpful information.
>
> This message is generated when a process calls "SET_ARRAY_INFO" on an
> array that is already running, and the changes implied by the new
> "array_info" are not supportable.
>
> The only way I can see this happening is if two copies of "mdadm" are
> running at exactly the same time and both are trying to assemble the
> same array.  The first calls SET_ARRAY_INFO and assembles the
> (partial) array.  The second calls SET_ARRAY_INFO and gets this error.
> Not all devices are included because while one mdadm went to look at a
> device, the other had it locked, so the first just ignored it.

If mdadm copy A gets three of the devices, I wouldn't think mdadm copy B
would have been able to get enough devices to decide to even try to
assemble the array (assuming that once copy A locked the devices during
open, it then held them until it was time to assemble the array).

--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem
  2007-10-25  6:12       ` Neil Brown
  2007-10-25  6:51         ` Doug Ledford
@ 2007-10-25 13:33         ` Daniel L. Miller
  2007-10-26  6:12           ` Neil Brown
  2 siblings, 1 reply; 42+ messages in thread
From: Daniel L. Miller @ 2007-10-25 13:33 UTC (permalink / raw)
To: linux-raid

Neil Brown wrote:
> It might be worth finding out where mdadm is being run in the init
> scripts and adding a "-v" flag, redirecting stdout/stderr to some log
> file, e.g.
>    mdadm -As -v > /var/log/mdadm-$$ 2>&1
>
> And see if that leaves something useful in the log file.
>
I haven't rebooted yet, but here's my /etc/udev/rules.d/70-mdadm.rules
file (BTW - running on Ubuntu 7.10 Gutsy):

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm /sbin/mdadm -As -v > /var/log/mdadm-$$ 2>&1"

# This next line (only) is put into the initramfs,
# where we run a strange script to activate only some of the arrays
# as configured, instead of mdadm -As:
#initramfs# SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm /scripts/local-top/mdadm from-udev"

Could that initramfs line be causing the problem?
--
Daniel

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Raid-10 mount at startup always has problem
  2007-10-25 13:33         ` Daniel L. Miller
@ 2007-10-26  6:12           ` Neil Brown
  0 siblings, 0 replies; 42+ messages in thread
From: Neil Brown @ 2007-10-26 6:12 UTC (permalink / raw)
To: Daniel L. Miller; +Cc: linux-raid

On Thursday October 25, dmiller@amfes.com wrote:
> Neil Brown wrote:
> > It might be worth finding out where mdadm is being run in the init
> > scripts and adding a "-v" flag, redirecting stdout/stderr to some log
> > file, e.g.
> >    mdadm -As -v > /var/log/mdadm-$$ 2>&1
> >
> > And see if that leaves something useful in the log file.
> >
> I haven't rebooted yet, but here's my /etc/udev/rules.d/70-mdadm.rules
> file (BTW - running on Ubuntu 7.10 Gutsy):
>
> SUBSYSTEM=="block", ACTION=="add|change",
> ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm
> /sbin/mdadm -As -v > /var/log/mdadm-$$ 2>&1"

Yes, that would do exactly what you are experiencing.

Every time a component of a raid array is discovered, that rule tries to
assemble all known arrays.  So one drive appears, and it tries to
assemble the array, but there aren't enough members, so it gives up.
Then two drives - chances are there still aren't enough, so it gives up
again.  Then when there are three drives it will successfully assemble
the array - degraded.  Then when the fourth drive appears, it is too
late.

I cannot see why that would lead to the "couldn't update array info"
error, but it certainly explains the rest.

That is really bad stuff to have in udev.  The "--incremental" mode was
written precisely for use in udev.  I wonder why they didn't use it....

Maybe you should log a bug report with Ubuntu and suggest they discuss
their udev scripts with the developer of mdadm (that would be me, I
guess).

NeilBrown

^ permalink raw reply [flat|nested] 42+ messages in thread
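For reference, a rule along the lines Neil suggests might look roughly
like this - a sketch only, not Ubuntu's packaged rule, so check mdadm's
documentation for --incremental/-I before relying on it:

   # hand each newly-appeared component to mdadm one at a time; -I slots
   # it into the right array and starts the array only once enough
   # members are present
   SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid*", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"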
* Re: Raid-10 mount at startup always has problem
  2007-10-25  6:12       ` Neil Brown
  ` (1 preceding sibling ...)
  2007-10-25 13:33         ` Daniel L. Miller
@ 2007-10-25 14:46       ` Bill Davidsen
  2007-10-25 16:13         ` Daniel L. Miller
  2007-10-26  5:59         ` Neil Brown
  2 siblings, 2 replies; 42+ messages in thread
From: Bill Davidsen @ 2007-10-25 14:46 UTC (permalink / raw)
To: Neil Brown; +Cc: Daniel L. Miller, linux-raid

Neil Brown wrote:
> On Wednesday October 24, dmiller@amfes.com wrote:
>
>> Current mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>>
>> still have the problem where on boot one drive is not part of the
>> array.  Is there a log file I can check to find out WHY a drive is not
>> being added?  It's been a while since the reboot, but I did find some
>> entries in dmesg - I'm appending both the md lines and the physical disk
>> related lines.  The bottom shows one disk not being added (this time it
>> was sda) - and the disk that gets skipped on each boot seems to be
>> random - there's no consistent failure:
>>
>
> Odd.... but interesting.
> Does it sometimes fail to start the array altogether?
>
>> md: md0 stopped.
>> md: md0 stopped.
>> md: bind<sdc>
>> md: bind<sdd>
>> md: bind<sdb>
>> md: md0: raid array is not clean -- starting background reconstruction
>> raid10: raid set md0 active with 3 out of 4 devices
>> md: couldn't update array info. -22
>>
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is the most surprising line, and hence the one most likely to
> convey helpful information.
>
> This message is generated when a process calls "SET_ARRAY_INFO" on an
> array that is already running, and the changes implied by the new
> "array_info" are not supportable.
>
> The only way I can see this happening is if two copies of "mdadm" are
> running at exactly the same time and both are trying to assemble the
> same array.  The first calls SET_ARRAY_INFO and assembles the
> (partial) array.  The second calls SET_ARRAY_INFO and gets this error.
> Not all devices are included because while one mdadm went to look at a
> device, the other had it locked, so the first just ignored it.
>
> I just tried that, and sometimes it worked, but sometimes it assembled
> with 3 out of 4 devices.  I didn't get the "couldn't update array info"
> message, but that doesn't prove I'm wrong.
>
> I cannot imagine how that might be happening (two at once) unless
> maybe 'udev' had been configured to do something as soon as devices
> were discovered.... seems unlikely.
>
> It might be worth finding out where mdadm is being run in the init
> scripts and adding a "-v" flag, redirecting stdout/stderr to some log
> file, e.g.
>    mdadm -As -v > /var/log/mdadm-$$ 2>&1
>
> And see if that leaves something useful in the log file.
>
> BTW, I don't think your problem has anything to do with the fact that
> you are using whole drives.
>
You don't think the "unknown partition table" on sdd is related?  Because
I read that as a sure indication that the system isn't considering the
drive as one without a partition table, and therefore isn't looking for
the superblock on the whole device.  And as Doug pointed out, once you
decide that there is a partition table, lots of things might try to use
it.

> While it is debatable whether that is a good idea or not (I like the
> idea, but Doug doesn't and I respect his opinion) I doubt it would
> contribute to the current problem.
>
> Your description makes me nearly certain that there is some sort of
> race going on (that is the easiest way to explain randomly differing
> behaviours).  The race is probably between different code 'locking'
> (opening with O_EXCL) the various devices.  Given the above error
> message, two different 'mdadm's seems most likely, but an mdadm and a
> mount-by-label scan could probably do it too.
>

--
bill davidsen <davidsen@tmr.com>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

^ permalink raw reply [flat|nested] 42+ messages in thread
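A generic way to check what is actually on such a disk (sdd here is just
the example device from the thread; any member would do):

   # dump the tail of the first sector; a real DOS partition table would
   # end with the 55 aa boot signature
   dd if=/dev/sdd bs=512 count=1 2>/dev/null | hexdump -C | tail -3
   # ask mdadm whether the whole device carries a raid superblock
   mdadm --examine /dev/sdd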
* Re: Raid-10 mount at startup always has problem
  2007-10-25 14:46       ` Bill Davidsen
@ 2007-10-25 16:13         ` Daniel L. Miller
  2007-10-26  5:59         ` Neil Brown
  1 sibling, 0 replies; 42+ messages in thread
From: Daniel L. Miller @ 2007-10-25 16:13 UTC (permalink / raw)
To: linux-raid

Bill Davidsen wrote:
> You don't think the "unknown partition table" on sdd is related?
> Because I read that as a sure indication that the system isn't
> considering the drive as one without a partition table, and therefore
> isn't looking for the superblock on the whole device.  And as Doug
> pointed out, once you decide that there is a partition table, lots of
> things might try to use it.

Now, would the drive "letters" (sd[a-d]) change from reboot to reboot?
Because it's not consistent - so far I've seen each of the four drives
fail during boot at one time or another.

I've added the verbose logging to the udev mdadm rule, and I've also
manually specified the drives in mdadm.conf instead of leaving it on
auto.  Curious what the next boot will bring.
--
Daniel

^ permalink raw reply [flat|nested] 42+ messages in thread
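Whichever letters the drives come up with, the superblocks identify the
members, so a loop like this (a sketch; the grep pattern matches mdadm's
0.90 --examine output) shows which physical drive currently holds which
slot, independent of naming:

   for d in /dev/sd[abcd]; do
       echo "== $d"
       mdadm --examine $d | grep -E 'UUID|this'
   done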
* Re: Raid-10 mount at startup always has problem
  2007-10-25 14:46       ` Bill Davidsen
  2007-10-25 16:13         ` Daniel L. Miller
@ 2007-10-26  5:59         ` Neil Brown
  1 sibling, 0 replies; 42+ messages in thread
From: Neil Brown @ 2007-10-26 5:59 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Daniel L. Miller, linux-raid

On Thursday October 25, davidsen@tmr.com wrote:
> Neil Brown wrote:
> >
> > BTW, I don't think your problem has anything to do with the fact that
> > you are using whole drives.
> >
> You don't think the "unknown partition table" on sdd is related?  Because
> I read that as a sure indication that the system isn't considering the
> drive as one without a partition table, and therefore isn't looking for
> the superblock on the whole device.  And as Doug pointed out, once you
> decide that there is a partition table, lots of things might try to use
> it.

"unknown partition table" is what I would expect when using a whole
drive.  It just means "the first block doesn't look like a partition
table", and if you have some early block of an ext3 (or other)
filesystem in the first block (as you would in this case), you wouldn't
expect it to look like a partition table.

I don't understand what you are trying to say with your second
sentence.

NeilBrown

^ permalink raw reply [flat|nested] 42+ messages in thread
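The start of the disk is free for filesystem data because the 0.90
superblock lives near the end of the device - 64KB-aligned, within the
last 128KB.  A sketch for finding it (sector arithmetic per the 0.90
format; on x86 the magic a92b4efc shows up byte-swapped as fc 4e 2b a9):

   sz=$(blockdev --getsz /dev/sdd)   # device size in 512-byte sectors
   off=$(( (sz & ~127) - 128 ))      # round down to 64KB, step back 64KB
   dd if=/dev/sdd skip=$off count=8 2>/dev/null | hexdump -C | head -2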
Thread overview: 42+ messages
2007-08-27 18:14 Raid-10 mount at startup always has problem Daniel L. Miller
[not found] ` <46D49F1A.7030409@tmr.com>
2007-09-10 1:53 ` Daniel L. Miller
2007-09-10 2:04 ` Richard Scobie
[not found] ` <46E4A5F0.9090407@sauce.co.nz>
2007-09-10 2:11 ` Daniel L. Miller
2007-10-24 14:22 ` Daniel L. Miller
2007-10-24 16:25 ` Doug Ledford
2007-10-24 20:01 ` Bill Davidsen
2007-10-25 5:43 ` Daniel L. Miller
2007-10-25 6:40 ` Doug Ledford
2007-10-26 9:15 ` Luca Berra
2007-10-26 16:53 ` Gabor Gombas
2007-10-27 7:57 ` Luca Berra
2007-10-26 19:26 ` Doug Ledford
2007-10-27 7:50 ` Luca Berra
2007-10-27 15:07 ` Gabor Gombas
2007-10-27 20:47 ` Doug Ledford
2007-10-28 13:37 ` Luca Berra
2007-10-28 17:55 ` Doug Ledford
2007-10-29 0:21 ` Bill Davidsen
2007-10-29 7:41 ` Luca Berra
2007-10-29 13:22 ` Bill Davidsen
2007-10-29 15:21 ` Doug Ledford
2007-10-29 15:54 ` Gabor Gombas
2007-10-29 14:31 ` Doug Ledford
2007-10-29 5:59 ` Daniel L. Miller
2007-10-29 8:18 ` Luca Berra
2007-10-29 15:47 ` Doug Ledford
2007-10-29 21:29 ` Luca Berra
2007-10-29 23:15 ` Doug Ledford
2007-10-30 0:03 ` Daniel L. Miller
2007-11-01 13:56 ` Bill Davidsen
2007-12-17 14:58 ` Daniel L. Miller
2007-10-29 17:08 ` Doug Ledford
2007-10-29 18:56 ` Richard Scobie
2007-10-25 6:12 ` Neil Brown
2007-10-25 6:51 ` Doug Ledford
2007-10-25 13:33 ` Daniel L. Miller
2007-10-26 6:12 ` Neil Brown
2007-10-25 14:46 ` Bill Davidsen
2007-10-25 16:13 ` Daniel L. Miller
2007-10-26 5:59 ` Neil Brown