Linux RAID subsystem development


* Re: Storage device enumeration script
From: John Robinson @ 2011-05-27  9:15 UTC
  To: Phil Turmel; +Cc: linux-raid
In-Reply-To: <4DDE249C.7080004@anonymous.org.uk>

On 26/05/2011 10:59, John Robinson wrote:
[...]
> [root@beast lsdrv]# python2.6 lsdrv
> PCI [pata_marvell] 03:00.0 IDE interface: Marvell Technology Group Ltd.
> 88SE6121 SATA II Controller (rev b2)
> └─scsi 0:0:0:0 HL-DT-ST DVD-RAM GH22NP20
> └─sr0: Empty/Unknown 1.00g
> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10
> Family) SATA AHCI Controller
> ├─scsi 2:0:0:0 ATA Hitachi HDS72101
> │ └─sda: Empty/Unknown 931.51g
> Traceback (most recent call last):
> File "lsdrv", line 387, in <module>
> show_blocks(" %s " % branch[0], [phy.block])
> File "lsdrv", line 339, in show_blocks
> show_blocks("%s %s " % (indent, branch[0]), [blockbyname[x] for x in subs])
> KeyError: 'sda1'
>
> Now, something's not getting picked up about sda. Looking at Mathias'
> "sweet" output, it's not coping with the (DOS) partition table. Another
> variation on my kernel's /sys or still too old a Python or ...?

Still seeing this. I added a print command so I can see that 
dev.partitions is being populated successfully. I'm not entirely sure 
where dev.ID_ etc are supposed to be coming from, but if it's that 
`blkid -p -o udev /dev/block/8:0` then I'm afraid CentOS 5's blkid 
doesn't understand the -p or -o udev options, it doesn't produce any 
output for whole drives with partition tables, and there isn't a 
/dev/block directory. It's blkid 1.0.0 from e2fsprogs 1.39-23.el5_5.1.

If that knocks CentOS 5 support on the head then so be it...

Cheers,

John.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Storage device enumeration script
From: John Robinson @ 2011-05-27  9:44 UTC
  To: Phil Turmel; +Cc: linux-raid
In-Reply-To: <4DDF6B99.20602@anonymous.org.uk>

On 27/05/2011 10:15, John Robinson wrote:
[...]
> I'm not entirely sure where dev.ID_ etc are supposed to be coming
> from, but if it's that `blkid -p -o udev /dev/block/8:0` then I'm
> afraid CentOS 5's blkid doesn't understand the -p or -o udev options,
> it doesn't produce any output for whole drives with partition tables,
> and there isn't a /dev/block directory. It's blkid 1.0.0 from
> e2fsprogs 1.39-23.el5_5.1.
>
> If that knocks CentOS 5 support on the head then so be it...

Hmm, udevinfo might be of some use. Still doesn't say it's found a DOS
partition table, but it does get you e.g. ID_FS_TYPE=linux_raid_member
and perhaps `file -s` will tell you there's a DOS partition table (sort of).
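
As a rough sketch of how KEY=VALUE properties such as the ID_FS_TYPE
line above could be collected into a dict, whichever tool printed them:
the helper name parse_udev_props and the sample text are made up for
illustration, and none of this is taken from lsdrv itself.

```python
def parse_udev_props(text):
    """Parse KEY=VALUE lines (udev-style environment output) into a dict.

    Lines without '=' are ignored, so empty output simply yields {}.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key] = value
    return props


# Hypothetical sample, shaped like the property mentioned above.
sample = "ID_FS_TYPE=linux_raid_member\nID_FS_VERSION=0.90.0\n"
props = parse_udev_props(sample)

# Looking keys up with a default avoids a KeyError when a tool prints
# nothing at all for a device.
fs_type = props.get("ID_FS_TYPE", "Empty/Unknown")
```

With a lookup like this, a tool that prints nothing for a whole disk
just leaves the device labelled Empty/Unknown instead of raising.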

Cheers,

John.


* Re: Storage device enumeration script
From: Gordon Henderson @ 2011-05-27 10:45 UTC
  To: linux-raid
In-Reply-To: <20110527095840.427d90be@natsu>

On Fri, 27 May 2011, Roman Mamedov wrote:

> On Fri, 27 May 2011 08:16:07 +0800
> Brad Campbell <brad@fnarfbargle.com> wrote:
>
>> On 27/05/11 08:13, Leslie Rhorer wrote:
>>>>
>>> 	I can't speak to Ubuntu, but Debian evidently does not.  I don't
>>> know what "pvs" is, but neither bash nor Python recognize it as a file
>>> anywhere in the path.
>> apt-get install lvm2.
>
> Sure, but lsdrv shouldn't assume lvm2 is installed or require it to be
> installed. Not everyone uses LVM, and simply installing it automatically adds
> things to initramfs (PV/VG/LV detection?), which can slow down boot-up process.

As well as not using LVM, some of us don't even use udev... Or kernels 
with modules...

Can someone post the output of this utility so I can see what it's 
actually doing?

(I don't have python on all my servers either)

Cheers,

Gordon


* Re: Storage device enumeration script
From: Phil Turmel @ 2011-05-27 11:23 UTC
  To: John Robinson; +Cc: linux-raid
In-Reply-To: <4DDF7296.6050706@anonymous.org.uk>

On 05/27/2011 05:44 AM, John Robinson wrote:
> On 27/05/2011 10:15, John Robinson wrote:
> [...]
>> I'm not entirely sure where dev.ID_ etc are supposed to be coming
>> from, but if it's that `blkid -p -o udev /dev/block/8:0` then I'm
>> afraid CentOS 5's blkid doesn't understand the -p or -o udev options,
>> it doesn't produce any output for whole drives with partition tables,
>> and there isn't a /dev/block directory. It's blkid 1.0.0 from
>> e2fsprogs 1.39-23.el5_5.1.
>>
>> If that knocks CentOS 5 support on the head then so be it...
> 
> Hmm, udevinfo might be of some use. Still doesn't say it's found a DOS
> partition table, but it does get you e.g. ID_FS_TYPE=linux_raid_member
> and perhaps `file -s` will tell you there's a DOS partition table (sort of).

I'll look into this when I have a new CentOS 5 VM installed on my laptop.  I do want lsdrv to work with all of the CentOS 5 releases.

Phil


* Re: Storage device enumeration script
From: Torbjørn Skagestad @ 2011-05-27 11:26 UTC
  To: Gordon Henderson; +Cc: linux-raid
In-Reply-To: <alpine.DEB.2.00.1105271142110.25096@unicorn.drogon.net>


Hi,

Some info can be found here https://github.com/pturmel/lsdrv

The output will typically be like this:

Controller platform [None]
 └─platform floppy.0
    └─fd0: Empty/Unknown 4.00k
PCI [pata_jmicron] 03:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
 ├─scsi 0:x:x:x [Empty]
 └─scsi 2:x:x:x [Empty]
PCI [ahci] 03:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
 ├─scsi 12:0:0:0 ATA THROTTLE
 │  └─sdu: Empty/Unknown 7.51g
 │     ├─sdu1: Empty/Unknown 243.00m
 │     │  └─Mounted as /dev/sdu1 @ /boot
 │     ├─sdu2: Empty/Unknown 1.00k
 │     └─sdu5: Empty/Unknown 7.27g
 │        ├─dm-0: Empty/Unknown 6.90g
 │        │  └─Mounted as /dev/mapper/server-root @ /
 │        └─dm-1: Empty/Unknown 368.00m
 └─scsi 13:0:0:0 ATA WDC WD10EACS-00Z
    └─sdv: Empty/Unknown 931.51g
       └─md1: Empty/Unknown 3.64t
          └─dm-3: Empty/Unknown 3.64t
             └─Mounted as /dev/mapper/big1 @ /mnt/big1
PCI [pcieport] 00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
 ├─scsi 15:0:0:0 ATA WDC WD20EARS-00J
 │  └─sdy: Empty/Unknown 1.82t
 │     └─sdy1: Empty/Unknown 1.82t
 │        └─md3: Empty/Unknown 7.28t
 │           └─dm-5: Empty/Unknown 7.28t
 │              └─Mounted as /dev/mapper/big3 @ /mnt/big3
 ├─scsi 15:0:1:0 ATA WDC WD20EARS-00J
 │  └─sdz: Empty/Unknown 1.82t
 │     └─sdz1: Empty/Unknown 1.82t
 ├─scsi 15:0:2:0 ATA WDC WD20EARS-00J
 │  └─sda: Empty/Unknown 1.82t
 │     └─sda1: Empty/Unknown 1.82t
 ├─scsi 15:0:3:0 ATA WDC WD20EARS-00J
 │  └─sdw: Empty/Unknown 1.82t
 │     └─sdw1: Empty/Unknown 1.82t
 └─scsi 15:0:4:0 ATA WDC WD20EARS-00J
    └─sdae: Empty/Unknown 1.82t
       └─sdae1: Empty/Unknown 1.82t
---snip---



-- 
Torbjørn Skagestad
Idé Til Produkt AS
torborn@itpas.no


* Re: Storage device enumeration script
From: Gordon Henderson @ 2011-05-27 11:42 UTC
  To: linux-raid
In-Reply-To: <1306495608.9437.140.camel@torbjorn>


On Fri, 27 May 2011, Torbjørn Skagestad wrote:

> Hi,
>
> Some info can be found here https://github.com/pturmel/lsdrv
>
> The output will typically be like this:

Ok. Interesting, thanks.

I generally know what's inside my servers, but I suppose visualising it is
good, or when having to work on someone else's server...

Gordon



* Re: Storage device enumeration script
From: Phil Turmel @ 2011-05-27 12:06 UTC
  To: Roman Mamedov
  Cc: Brad Campbell, lrhorer, 'Mathias Burén', linux-raid,
	Torbjørn Skagestad
In-Reply-To: <20110527095840.427d90be@natsu>

On 05/26/2011 11:58 PM, Roman Mamedov wrote:
> On Fri, 27 May 2011 08:16:07 +0800
> Brad Campbell <brad@fnarfbargle.com> wrote:
> 
>> On 27/05/11 08:13, Leslie Rhorer wrote:
>>>>
>>> 	I can't speak to Ubuntu, but Debian evidently does not.  I don't
>>> know what "pvs" is, but neither bash nor Python recognize it as a file
>>> anywhere in the path.
>> apt-get install lvm2.
> 
> Sure, but lsdrv shouldn't assume lvm2 is installed or require it to be
> installed. Not everyone uses LVM, and simply installing it automatically adds
> things to initramfs (PV/VG/LV detection?), which can slow down boot-up process.

With the latest contribution from Torbjørn, lsdrv will now report the missing utility, then continue.  If one of the devices is in fact an LVM element, the block device recursion code should still show the basic relationships, without the volume group details.  Testing that case would be appreciated.
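
The report-then-continue behaviour described above might look roughly
like the sketch below. It is not the code in the repository:
run_optional is a hypothetical helper, and it is written for a current
Python 3 rather than the 2.6 mentioned earlier in the thread.

```python
import subprocess
import sys


def run_optional(argv):
    """Run an external tool and return its stdout as text.

    Returns None (after a warning on stderr) when the tool cannot be
    started, so the caller can carry on without the extra details.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # Covers a missing binary (FileNotFoundError) as well as
        # permission problems; both are subclasses of OSError.
        sys.stderr.write("%s: not available, skipping\n" % argv[0])
        return None
    return result.stdout.decode("utf-8", "replace")
```

A caller would treat None as "no LVM details" and still walk the block
device tree, which is the behaviour the paragraph above asks people to
test.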

https://github.com/pturmel/lsdrv

If you find more bugs, or have other suggestions, using the issue tracker and the wiki on github would minimize the chatter on linux-raid.

Phil

* race condition in md creation?
From: Stijn Hoop @ 2011-05-27 13:20 UTC
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1329 bytes --]

Hello,

while creating a test suite for internal purposes I ran into a race
condition where a (very small) raid array that was just created cannot
be stopped.

mdadm --create succeeds, but the subsequent mdadm --stop reports
'Device or resource busy'.

Please see the attached script for reproduction purposes, partial output
from a run on my system (Fedora 14, kernel 2.6.35.13-91.fc14.x86_64,
mdadm-3.1.3-0.git20100804.2.fc14.x86_64):

5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 0.0166663 s, 315 MB/s
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 0.0197533 s, 265 MB/s
mdadm: array /dev/md0 started.
mdadm: failed to stop array /dev/md0: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
failed to stop /dev/md0, sleep 1 sec then retrying one more time
mdadm: stopped /dev/md0

I know that this might be an artificial bug, for with real raid arrays
people will not stop their just-created raid systems, but I figured
somebody might be interested to find out what was actually going on. As
I have no kernel expertise (yet! :) and I need to move on, I am only
posting my results...

BTW, I'm posting here only because I failed to google a bug tracker for
linux-raid. If there is one, my apologies, I will gladly create a bug
instead.

HTH,

--Stijn

[-- Attachment #2: mdadm-bug.sh --]
[-- Type: application/x-shellscript, Size: 500 bytes --]


* RE: Mdadm re-add fails
From: Schmidt, Annemarie @ 2011-05-27 21:16 UTC
  To: NeilBrown; +Cc: linux-raid
In-Reply-To: <5AA430FFE4486C448003201AC83BC85E01B0353E@EXHQ.corp.stratus.com>

Hi Neil,

I've unfortunately run into a problem with the patch to the enough_fd code.  It does not appear to work in all cases.  

mdadm --detail /dev/md21
    Number   Major   Minor   RaidDevice State
       3      65       18        0      active sync   /dev/sdc2
       2      65       50        1      active sync   /dev/sdk2


Here it works when I remove /dev/sdk2

>> mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> mdadm /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2

But when I try to remove the other disk, /dev/sdc2, it doesn't:

>> mdadm /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21

>> mdadm /dev/md21 -a /dev/sdc2
mdadm: /dev/sdc2 reports being an active member for /dev/md21, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdc2 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdc2" first.


I could get it all to work when I removed this line from the patch:

+		array.raid_disks--;

>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2


>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21

>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdc2
mdadm: re-added /dev/sdc2

So can this line simply be removed or does the patch need to be reworked?

Thanks & regards,
Annemarie Schmidt


-----Original Message-----
From: Schmidt, Annemarie 
Sent: Friday, May 20, 2011 1:16 PM
To: 'NeilBrown'
Cc: linux-raid@vger.kernel.org; Dailey, Nate
Subject: RE: Mdadm re-add fails

Neil,

Yes, that worked:

>> [root@typhon ~]# mdadm --detail /dev/md24
/dev/md24:
   Version : 1.2
  Creation Time : Fri May 20 11:42:17 2011
  Raid Level : raid1
  Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent

  Intent Bitmap : Internal

  Update Time : Fri May 20 12:47:09 2011
  State : active
  Active Devices : 2
 Working Devices : 2
 Failed Devices : 0
 Spare Devices : 0

           Name : typhon.mno.stratus.com:24  (local to host typhon.mno.stratus.com)
           UUID : 562323d9:9a7b2979:a734abf0:b3fb8f0b
           Events : 155

    Number   Major   Minor   RaidDevice State
       3      65       22        0      active sync   /dev/sdc6
       2      65       54        1      active sync   /dev/sdk6

>> [root@typhon sbin]# mdadm /dev/md24 -f /dev/sdk6 -r /dev/sdk6
mdadm: set /dev/sdk6 faulty in /dev/md24
mdadm: hot removed /dev/sdk6 from /dev/md24

Without the fix:
---------------------
>> [root@typhon sbin]# mdadm /dev/md24 -a /dev/sdk6
mdadm: /dev/sdk6 reports being an active member for /dev/md24, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk6" first.

With the fix:
-----------------
>>  [root@typhon ~]# ./mdadm /dev/md24 -a /dev/sdk6                                 
mdadm: re-added /dev/sdk6

Thanks very much for the assistance.

Regards,
Annemarie


-----Original Message-----
From: NeilBrown [mailto:neilb@suse.de] 
Sent: Thursday, May 19, 2011 7:52 PM
To: Schmidt, Annemarie
Cc: linux-raid@vger.kernel.org
Subject: Re: Mdadm re-add fails

On Wed, 18 May 2011 10:43:47 -0400 "Schmidt, Annemarie"
<Annemarie.Schmidt@stratus.com> wrote:

> Hi!
> 
> I have a 2 disk raid1 data array. As a result of other testing, the device info
> in the superblock for one of the partners, /dev/sdc2, ended up being in slot 3
> of the device info array: 
> 
> [root@typhon ~]# mdadm --detail /dev/md21
> /dev/md21:
>   Version : 1.2
>   Creation Time : Mon May  9 11:19:43 2011
>   Raid Level : raid1
>   Array Size : 5241844 (5.00 GiB 5.37 GB)
>   Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
>   Raid Devices : 2
>   Total Devices : 2
>   Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>   Update Time : Thu May 12 15:51:50 2011
>   State : active
>   Active Devices : 2
>   Working Devices : 2
>   Failed Devices : 0
>   Spare Devices : 0
> 
>            Name : typhon.mno.stratus.com:21  (local to host typhon.mno.stratus.com)
>            UUID : 996d993f:baac367a:8b154ba9:43e56cff
>           Events : 687
> 
>     Number   Major   Minor   RaidDevice State
> -->    3      65       34        0      active sync   /dev/sdc2
>         2      65       82        1      active sync   /dev/sdk2
> 
> When I remove /dev/sdk2 and then re-add it, the re-add fails:
> 
> >> [root@typhon ~]# mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
> mdadm: set /dev/sdk2 faulty in /dev/md21
> mdadm: hot removed /dev/sdk2 from /dev/md21
> 
> >> [root@typhon ~]# mdadm /dev/md21 -a /dev/sdk2
> mdadm: /dev/sdk2 reports being an active member for /dev/md21, but a --re-add
> fails.
> mdadm: not performing --add as that would convert /dev/sdk2 in to a spare.
> mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk2" first.
> 
> I believe the re-add fails because the enough_fd function (util.c) is not searching deep enough into the
> dev_info array with this line of code:
>    for (i=0; i<array.raid_disks + array.nr_disks; i++)
> 
> array.raid_disks = 2 and array.nr_disks = 1, so for this particular md device it only looks at slots 0-2.
> I believe the code needs to be changed to look at all possible dev_info array slots, taking into account the
> version of the superblock, as the Detail function (Detail.c) does.
> 
> Do folks agree?
>

I do - largely.  I think there might be a better more general way to control
the loop though.
Could you try this please?

Thanks,
NeilBrown


diff --git a/util.c b/util.c
index 1056ae4..d005e0a 100644
--- a/util.c
+++ b/util.c
@@ -370,10 +370,14 @@ int enough_fd(int fd)
 	    array.raid_disks <= 0)
 		return 0;
 	avail = calloc(array.raid_disks, 1);
-	for (i=0; i<array.raid_disks + array.nr_disks; i++) {
+	for (i=0; i < 1024 && array.raid_disks > 0; i++) {
 		disk.number = i;
 		if (ioctl(fd, GET_DISK_INFO, &disk) != 0)
 			continue;
+		if (disk.major == 0 && disk.minor == 0)
+			continue;
+		array.raid_disks--;
+
 		if (! (disk.state & (1<<MD_DISK_SYNC)))
 			continue;
 		if (disk.raid_disk < 0 || disk.raid_disk >= array.raid_disks)

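To see how the loop bounds interact with the slot numbers reported in
this thread, here is a toy model of the scan in enough_fd(). It is a
sketch, not mdadm code: the function name, the dict-based slot table
and the assumption that an empty slot is simply absent (standing in for
the major == 0 and minor == 0 check) are all simplifications made for
illustration.

```python
def count_avail(slots, raid_disks, max_slot, decrement):
    """Simplified model of the slot scan in enough_fd().

    slots maps a device number to (raid_disk, in_sync); missing keys
    stand for empty slots. Returns how many in-sync members the scan
    would count.
    """
    avail = 0
    for number in range(max_slot):
        if raid_disks <= 0:
            break
        if number not in slots:
            continue  # empty slot, nothing to look at
        raid_disk, in_sync = slots[number]
        if decrement:
            raid_disks -= 1  # models "array.raid_disks--" in the patch
        if not in_sync:
            continue
        if raid_disk < 0 or raid_disk >= raid_disks:
            continue  # role outside the (possibly reduced) bound
        avail += 1
    return avail


# md21 after /dev/sdk2 is removed: number 3 holds RaidDevice 0.
only_sdc2 = {3: (0, True)}
# md21 after /dev/sdc2 is removed: number 2 holds RaidDevice 1.
only_sdk2 = {2: (1, True)}

old_loop = count_avail(only_sdc2, 2, 2 + 1, decrement=False)
patched_sdc2_left = count_avail(only_sdc2, 2, 1024, decrement=True)
patched_sdk2_left = count_avail(only_sdk2, 2, 1024, decrement=True)
no_decrement = count_avail(only_sdk2, 2, 1024, decrement=False)
```

In this model the original bound never reaches slot 3, the patched loop
finds it, and the decrement shrinks the bound used by the raid_disk
range check so that the member in RaidDevice 1 is skipped. Without the
decrement the raid_disks > 0 test never ends the loop early, so the
model always scans all 1024 slots.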


* Re: race condition in md creation?
From: NeilBrown @ 2011-05-27 21:24 UTC
  To: Stijn Hoop; +Cc: linux-raid
In-Reply-To: <20110527152057.47ca7176@pclin250.win.tue.nl>

On Fri, 27 May 2011 15:20:57 +0200 Stijn Hoop <stijn@sandcat.nl> wrote:

> Hello,
> 
> while creating a test suite for internal purposes I ran into a race
> condition where a (very small) raid array that was just created cannot
> be stopped.
> 
> mdadm --create succeeds, but the subsequent mdadm --stop reports
> 'Device or resource busy'.
> 
> Please see the attached script for reproduction purposes, partial output
> from a run on my system (Fedora 14, kernel 2.6.35.13-91.fc14.x86_64,
> mdadm-3.1.3-0.git20100804.2.fc14.x86_64):
> 
> 
> 5+0 records in
> 5+0 records out
> 5242880 bytes (5.2 MB) copied, 0.0166663 s, 315 MB/s
> 5+0 records in
> 5+0 records out
> 5242880 bytes (5.2 MB) copied, 0.0197533 s, 265 MB/s
> mdadm: array /dev/md0 started.
> mdadm: failed to stop array /dev/md0: Device or resource busy
> Perhaps a running process, mounted filesystem or active volume group?
> failed to stop /dev/md0, sleep 1 sec then retrying one more time
> mdadm: stopped /dev/md0

When a new device appears (such as a new md array), udev springs into
action and examines it to see if it should do something with it.
While udev (or some tool that it ran) is examining the md array, the array
looks busy, so an attempt to stop it will fail.

My test scripts tend to have
   udevadm settle
before
   mdadm --stop

for exactly this reason.
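
For a test suite driven from Python, the same ordering could be wrapped
as in the sketch below. The helper name stop_array and the injectable
`run` parameter are assumptions made for illustration; only the two
commands themselves come from the text above.

```python
import subprocess


def stop_array(device, run=subprocess.call):
    """Wait for udev to finish pending events, then stop the array.

    `run` is injectable so the sequence can be checked without root or
    a real md device; it takes an argv list and returns an exit status.
    """
    commands = [
        ["udevadm", "settle"],
        ["mdadm", "--stop", device],
    ]
    for argv in commands:
        status = run(argv)
        if status != 0:
            # Give up at the first failing step and report its status.
            return status
    return 0
```

Stopping at the first non-zero status is a design choice of this
sketch; a script may prefer to attempt the stop even if the settle step
times out.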

> 
> 
> I know that this might be an artificial bug, for with real raid arrays
> people will not stop their just-created raid systems, but I figured
> somebody might be interested to find out what was actually going on. As
> I have no kernel expertise (yet! :) and I need to move on, I am only
> posting my results...
> 
> BTW, I'm posting here only because I failed to google a bug tracker for
> linux-raid. If there is one, my apologies, I will gladly create a bug
> instead.
> 

This email list *is* the bug tracker (I'm not a big fan of bug trackers
myself).

Thanks for the report,
NeilBrown



> HTH,
> 
> --Stijn


* Re: Question on md126 / md127 issues
From: NeilBrown @ 2011-05-27 21:36 UTC
  To: Dylan Distasio; +Cc: linux-raid
In-Reply-To: <BANLkTinePbV0qfs5XXy9cZQ2a7bQfO=FVQ@mail.gmail.com>

On Thu, 26 May 2011 00:10:35 -0400 Dylan Distasio <interzone@gmail.com> wrote:

> Hi all-
> 
> I recently created a RAID1 2 disk mdadm array /dev/md1 with 1.2
> metadata on a Ubuntu system that has 3 other mdadm arrays running on
> it.  The power went out at my house last night, and I rebooted the
> system when it came back up.
> 
> When it came back up, my new array was in two pieces /dev/md126 and
> /dev/md127 (with incorrect members, showing 1 active drive, 1 spare in
> each).   I rebooted again, and had what appeared to be my working
> array, but showing up under /dev/md127.  I could stop and do a --scan
> to assemble it correctly as /dev/md1, but when I rebooted again I got
> the same results with 126 and 127.  My mdadm.conf was correct.

Very weird.  It sounds like mdadm in the initrd is running before all devices
have been discovered, but even that shouldn't create two arrays....

Maybe the second device gets discovered after the switch to a real root and
something gets lost..


> 
> I did some searching on my archives of this list, and found a solution
> as follows:
> -----------
> How to fix the '125/126/127' mdadm issue.
> 
> The array has '125' stored as the 'preferred minor' in the metadata.

1.2 metadata doesn't have a 'preferred minor' - only 0.90 has that.

1.2 has a 'name' which has a vaguely similar purpose but I don't think it
would cause this sort of issue.  I would be very surprised if '125' ever got
stored there unless you explicitly asked for it.

> You can change this by assembling with --update=super-minor.
> e.g.
> 
>  mdadm -S /dev/md125
>  mdadm -A /dev/md1 --update=super-minor

This command will not affect a 1.2 array at all.  It will just assemble it.

> 
> it should get details of which devices to include from /etc/mdadm.conf.
> 
> However it is possible that mdadm.conf in your initrd also has the name
> as /dev/md125.
> So once you have performed the above, run mkinitrd again, reboot, and report
> what happens.
> ----------------

Running mkinitrd when you have boot problems is always a good idea.  Maybe
that was all it took to fix your problem ??


> 
> I had to run the above commands, and then make sure I ran
> update-initramfs -v -u for it to stick after reboot.
> 
> My issue is solved, but I would like to understand what the root cause
> is, and why the above solution worked.  Can someone elaborate on what
> super-minor is?  This is a home system and I had backups so I was
> comfortable trying the above, but I don't typically like running
> commands on faith I don't understand fully, especially in Linux.
> 
> Can anyone shed some light on this?  I can provide further OS and
> array details if necessary, but it sounds like this issue has occurred
> for others in the past.

Now that the problem is fixed it is very hard to figure out what was happening
before.  My best guess is that something was wrong with mdadm.conf in the
initrd, but I don't know what.

NeilBrown



* Re: Upgrading from metadata 0.9
From: NeilBrown @ 2011-05-27 21:38 UTC
  To: Phillip Susi; +Cc: linux-raid
In-Reply-To: <4DDEA8AB.4090709@cfl.rr.com>

On Thu, 26 May 2011 15:23:23 -0400 Phillip Susi <psusi@cfl.rr.com> wrote:

> Is there a way to upgrade an existing raid array using metadata format 
> 0.9 to a 1.x format without losing data?

Not really.  It is on my list of nice-to-have features for mdadm, but it is
not near the top.

As 1.x uses less space for metadata you could simply recreate the array as
  --metadata=1.0
All the data would still be there.
You should be as explicit as possible in the --create command.

mdadm --create /dev/md0 --level=XX --raid-disks=YY --chunk=ZZ --size=QQ
   --layout=LL  --assume-clean --metadata=1.0

You can only convert to 1.0 metadata, as 1.1 and 1.2 put the metadata on top
of where the data currently is.
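
Being explicit is easier if the command line is assembled from recorded
values rather than typed from memory. The sketch below only builds the
argument list and never runs it. The function name, the trailing list
of member devices and the example values are assumptions made for
illustration; the real values have to come from the existing array, and
this sketch assumes the members are listed in their original order.

```python
def recreate_argv(md_dev, level, raid_disks, chunk, size, layout, members):
    """Build (but do not run) an mdadm --create command for 1.0 metadata."""
    return [
        "mdadm", "--create", md_dev,
        "--level=%s" % level,
        "--raid-disks=%d" % raid_disks,
        "--chunk=%s" % chunk,
        "--size=%s" % size,
        "--layout=%s" % layout,
        "--assume-clean",
        "--metadata=1.0",
    ] + list(members)


# Hypothetical values; take the real ones from the existing array.
argv = recreate_argv(
    "/dev/md0", "raid5", 3, 64, 5241844, "left-symmetric",
    ["/dev/sda1", "/dev/sdb1", "/dev/sdc1"],
)
command_line = " ".join(argv)
```

Printing command_line and reading it back against the old array's
details before running anything is the point of building it this way.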


NeilBrown



* Re: race condition in md creation?
From: Stijn Hoop @ 2011-05-28 11:19 UTC
  To: NeilBrown; +Cc: linux-raid
In-Reply-To: <20110528072440.5bc27c1d@notabene.brown>

Hi,

On Sat, 28 May 2011 07:24:40 +1000
NeilBrown <neilb@suse.de> wrote:
> On Fri, 27 May 2011 15:20:57 +0200 Stijn Hoop <stijn@sandcat.nl>
> wrote:
> 
> > Hello,
> > 
> > while creating a test suite for internal purposes I ran into a race
> > condition where a (very small) raid array that was just created
> > cannot be stopped.
> > 
> > mdadm --create succeeds, but the subsequent mdadm --stop reports
> > 'Device or resource busy'.
> > 
> > Please see the attached script for reproduction purposes, partial
> > output from a run on my system (Fedora 14, kernel
> > 2.6.35.13-91.fc14.x86_64, mdadm-3.1.3-0.git20100804.2.fc14.x86_64):
> > 
> > 
> > 5+0 records in
> > 5+0 records out
> > 5242880 bytes (5.2 MB) copied, 0.0166663 s, 315 MB/s
> > 5+0 records in
> > 5+0 records out
> > 5242880 bytes (5.2 MB) copied, 0.0197533 s, 265 MB/s
> > mdadm: array /dev/md0 started.
> > mdadm: failed to stop array /dev/md0: Device or resource busy
> > Perhaps a running process, mounted filesystem or active volume group?
> > failed to stop /dev/md0, sleep 1 sec then retrying one more time
> > mdadm: stopped /dev/md0
> 
> When a new device appears (such as a new md array), udev springs into
> action and examines it to see if it should do something with it.
> While udev (or some tool that it ran) is examining the md array it
> looks like it is busy so an attempt to stop it will fail.
> 
> My test scripts tend to have
>    udevadm settle
> before
>    mdadm --stop
> 
> for exactly this reason.

Ah, that makes perfect sense. Thanks for the explanation!

--Stijn


* Re: creating degraded raid1 with imsm metadata
From: FDi @ 2011-05-28 17:13 UTC
  To: Jiang, Dave; +Cc: linux-raid@vger.kernel.org
In-Reply-To: <D010E79907AF0D4E90B603DE907837D50490856E0F@azsmsx504.amr.corp.intel.com>

On Thu, May 26, 2011 at 09:40:16AM -0700, Jiang, Dave wrote:
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of FDi
> > Sent: Thursday, May 26, 2011 12:43 AM
> > To: linux-raid@vger.kernel.org
> > Subject: creating degraded raid1 with imsm metadata
> > 
> > Hello *,
> > 
> > Since Intel's Matrix Storage Manager option ROM doesn't support creating of
> > degraded arrays I was wondering if I could use mdadm to make one? I had a
> > very hard time finding documentation about how mdadm is supposed to
> > work with imsm.
> > 
> > The plan is to make a 2x1TB raid1 with one device missing and then later add
> > the other disk in once all the data has been copied to the degraded array. So
> > a typical raid1 migration scenario, which Intel oddly enough doesn't seem to
> > support with their option ROM.
> 
> Not sure if that's possible but have you looked at the Linux RAID wiki on IMSM information?
> https://raid.wiki.kernel.org/index.php/RAID_setup#External_Metadata
I wasn't able to figure out how to do what I wanted based on the wiki,
but after lots of googling I found the exact commands:

mdadm --create --force -v -e imsm --level=container -n 1 /dev/md/imsm /dev/sdb

mdadm --create -v --level raid1 -n 2 /dev/md/myraid /dev/sdb missing

However I also learned that these commands have to be run on the target
machine while it's running with RAID mode selected in the BIOS. Otherwise
you will get this warning:

mdadm: imsm unable to enumerate platform support
    array may not be compatible with hardware/firmware
	 Continue creating array?

And indeed, if that warning is displayed during the create, Intel's
option ROM won't see a working array on the device. I'm curious why
exactly that is. What kind of information does mdadm use from the
controller running in RAID mode?

When I created my array on the target machine using the commands
above, it worked correctly: Intel's option ROM saw the array and was
able to boot from the MBR I installed on the array as a test. I haven't
tested rebuilding yet.

^ permalink raw reply

* [TRIVIAL PATCH next 00/15] treewide: Convert vmalloc/memset to vzalloc
From: Joe Perches @ 2011-05-28 17:36 UTC (permalink / raw)
  To: linux-atm-general, netdev, drbd-user, dm-devel, linux-raid,
	linux-mtd
  Cc: linux-s390, linux-kernel, linux-media, devel, xfs

Resubmittal of patches from November 2010 and a few new ones.

Joe Perches (15):
  s390: Convert vmalloc/memset to vzalloc
  x86: Convert vmalloc/memset to vzalloc
  atm: Convert vmalloc/memset to vzalloc
  drbd: Convert vmalloc/memset to vzalloc
  char: Convert vmalloc/memset to vzalloc
  isdn: Convert vmalloc/memset to vzalloc
  md: Convert vmalloc/memset to vzalloc
  media: Convert vmalloc/memset to vzalloc
  mtd: Convert vmalloc/memset to vzalloc
  scsi: Convert vmalloc/memset to vzalloc
  staging: Convert vmalloc/memset to vzalloc
  video: Convert vmalloc/memset to vzalloc
  fs: Convert vmalloc/memset to vzalloc
  mm: Convert vmalloc/memset to vzalloc
  net: Convert vmalloc/memset to vzalloc

 arch/s390/hypfs/hypfs_diag.c           |    3 +--
 arch/x86/mm/pageattr-test.c            |    3 +--
 drivers/atm/idt77252.c                 |   11 ++++++-----
 drivers/atm/lanai.c                    |    3 +--
 drivers/block/drbd/drbd_bitmap.c       |    5 ++---
 drivers/char/agp/backend.c             |    3 +--
 drivers/char/raw.c                     |    3 +--
 drivers/isdn/i4l/isdn_common.c         |    4 ++--
 drivers/isdn/mISDN/dsp_core.c          |    3 +--
 drivers/isdn/mISDN/l1oip_codec.c       |    6 ++----
 drivers/md/dm-log.c                    |    3 +--
 drivers/md/dm-snap-persistent.c        |    3 +--
 drivers/md/dm-table.c                  |    4 +---
 drivers/media/video/videobuf2-dma-sg.c |    8 ++------
 drivers/mtd/mtdswap.c                  |    3 +--
 drivers/s390/cio/blacklist.c           |    3 +--
 drivers/scsi/bfa/bfad.c                |    3 +--
 drivers/scsi/bfa/bfad_debugfs.c        |    8 ++------
 drivers/scsi/cxgbi/libcxgbi.h          |    6 ++----
 drivers/scsi/qla2xxx/qla_attr.c        |    6 ++----
 drivers/scsi/qla2xxx/qla_bsg.c         |    3 +--
 drivers/scsi/scsi_debug.c              |    7 ++-----
 drivers/staging/rts_pstor/ms.c         |    3 +--
 drivers/staging/rts_pstor/rtsx_chip.c  |    6 ++----
 drivers/video/arcfb.c                  |    5 ++---
 drivers/video/broadsheetfb.c           |    4 +---
 drivers/video/hecubafb.c               |    5 ++---
 drivers/video/metronomefb.c            |    4 +---
 drivers/video/xen-fbfront.c            |    3 +--
 fs/coda/coda_linux.h                   |    5 ++---
 fs/reiserfs/journal.c                  |    9 +++------
 fs/reiserfs/resize.c                   |    4 +---
 fs/xfs/linux-2.6/kmem.h                |    7 +------
 mm/page_cgroup.c                       |    3 +--
 net/netfilter/x_tables.c               |    5 ++---
 net/rds/ib_cm.c                        |    6 ++----
 36 files changed, 57 insertions(+), 113 deletions(-)

-- 
1.7.5.rc3.dirty


^ permalink raw reply

* [TRIVIAL PATCH next 07/15] md: Convert vmalloc/memset to vzalloc
From: Joe Perches @ 2011-05-28 17:36 UTC (permalink / raw)
  To: Neil Brown, Jiri Kosina; +Cc: dm-devel, linux-raid, linux-kernel
In-Reply-To: <cover.1306603968.git.joe@perches.com>

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/md/dm-log.c             |    3 +--
 drivers/md/dm-snap-persistent.c |    3 +--
 drivers/md/dm-table.c           |    4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-log.c b/drivers/md/dm-log.c
index a1f3218..b6c2b71 100644
--- a/drivers/md/dm-log.c
+++ b/drivers/md/dm-log.c
@@ -487,7 +487,7 @@ static int create_log_context(struct dm_dirty_log *log, struct dm_target *ti,
 	memset(lc->sync_bits, (sync == NOSYNC) ? -1 : 0, bitset_size);
 	lc->sync_count = (sync == NOSYNC) ? region_count : 0;
 
-	lc->recovering_bits = vmalloc(bitset_size);
+	lc->recovering_bits = vzalloc(bitset_size);
 	if (!lc->recovering_bits) {
 		DMWARN("couldn't allocate sync bitset");
 		vfree(lc->sync_bits);
@@ -499,7 +499,6 @@ static int create_log_context(struct dm_dirty_log *log, struct dm_target *ti,
 		kfree(lc);
 		return -ENOMEM;
 	}
-	memset(lc->recovering_bits, 0, bitset_size);
 	lc->sync_search = 0;
 	log->context = lc;
 
diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
index 95891df..be100d4 100644
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -174,10 +174,9 @@ static int alloc_area(struct pstore *ps)
 	if (!ps->area)
 		goto err_area;
 
-	ps->zero_area = vmalloc(len);
+	ps->zero_area = vzalloc(len);
 	if (!ps->zero_area)
 		goto err_zero_area;
-	memset(ps->zero_area, 0, len);
 
 	ps->header_area = vmalloc(len);
 	if (!ps->header_area)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index cb8380c..5850497 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -154,9 +154,7 @@ void *dm_vcalloc(unsigned long nmemb, unsigned long elem_size)
 		return NULL;
 
 	size = nmemb * elem_size;
-	addr = vmalloc(size);
-	if (addr)
-		memset(addr, 0, size);
+	addr = vzalloc(size);
 
 	return addr;
 }
-- 
1.7.5.rc3.dirty

^ permalink raw reply related

* Re: Question on md126 / md127 issues
From: Dylan Distasio @ 2011-05-28 22:15 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <20110528073602.128cebc0@notabene.brown>

Thanks, Neil.

I think running mkinitrd was probably the only thing required for a
fix after reading your response.  I had an older array on the same box
that was completely removed, but maybe something was leftover in
initrd.

My detailed understanding of the initrd process is fairly limited.  I
didn't realize there was a separate mdadm.conf that was used when
booting that is separate from the one in /etc.


> Running mkinitrd when you have boot problems is always a good idea.  Maybe
> that was all it took to fix your problem ??
>
>
>>
>> I had to run the above commands, and then make sure I ran
>> update-initramfs -v -u for it to stick after reboot.
>>
>> My issue is solved, but I would like to understand what the root cause
>> is, and why the above solution worked.  Can someone elaborate on what
>> super-minor is?  This is a home system and I had backups so I was
>> comfortable trying the above, but I don't typically like running
>> commands on faith I don't understand fully, especially in Linux.
>>
>> Can anyone shed some light on this?  I can provide further OS and
>> array details if necessary, but it sounds like this issue has occurred
>> for others in the past.
>
> Now that the problem is fixed it is very hard to figure out what was happening
> before.  My best guess is that something was wrong with mdadm.conf in the
> initrd, but I don't know what.
>
> NeilBrown
>
>

^ permalink raw reply

* Re: Question on md126 / md127 issues
From: Phil Turmel @ 2011-05-28 22:47 UTC (permalink / raw)
  To: Dylan Distasio; +Cc: linux-raid
In-Reply-To: <BANLkTimirBv2O_s91wKjf56OAt4m-iTnWg@mail.gmail.com>

Hi Dylan,

On 05/28/2011 06:15 PM, Dylan Distasio wrote:
> Thanks, Neil.
> 
> I think running mkinitrd was probably the only thing required for a
> fix after reading your response.  I had an older array on the same box
> that was completely removed, but maybe something was leftover in
> initrd.
> 
> My detailed understanding of the initrd process is fairly limited.  I
> didn't realize there was a separate mdadm.conf that was used when
> booting that is separate from the one in /etc.

Many people miss this.  Modern Linux distributions, with few exceptions, use a three-stage boot process: 1) kernel, 2) initramfs, then 3) real root FS.  If there is no mdadm.conf in an initramfs at all, but the initramfs has RAID support, mdadm will assemble everything it finds.  It will assign the first array to md127 and count backwards from there.
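
To make the numbering concrete, here is a toy model of "start at md127 and count backwards". It only illustrates the behaviour described above; it is not mdadm's actual code, and the function names are made up for this sketch.

```python
def next_auto_minor(in_use, start=127):
    """Return the highest free md minor at or below `start`."""
    minor = start
    while minor >= 0 and minor in in_use:
        minor -= 1
    if minor < 0:
        raise ValueError("no free md minor at or below %d" % start)
    return minor


def assign_auto_names(array_count):
    """Name `array_count` arrays the way an initramfs without mdadm.conf would."""
    in_use = set()
    names = []
    for _ in range(array_count):
        minor = next_auto_minor(in_use)
        in_use.add(minor)
        names.append("md%d" % minor)
    return names
```

Under this model, three arrays found with no mdadm.conf in the initramfs end up as md127, md126 and md125, which matches the names people report seeing.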

You might like this description of the process from the kernel docs:

http://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt

The money quote:

"An initramfs archive is a complete self-contained root filesystem for Linux."

If you change anything on your system that might impact the boot process, you're probably going to need to run "update-initramfs", or your distribution's equivalent.

HTH,

Phil

^ permalink raw reply

* filesystem-level tool to validate array
From: Michael Stumpf @ 2011-05-29 20:17 UTC (permalink / raw)
  To: linux-raid

I'm looking for a filesystem-level tool to perform something similar
to what badblocks does at the drive level.  I can certainly write it
on my own (I'd build it as a Perl or Python script), but if someone's
already invented this..

(The intended purpose is to validate that there are no quirks/bugs in
the overall fs.)

^ permalink raw reply

* mdadm not creating symlinks for partitioned arrays
From: linbloke @ 2011-05-30  2:23 UTC (permalink / raw)
  To: Linux-RAID

Hi mdadm-ers,

I'm pretty sure I have a complete /etc/mdadm.conf and I've regenerated 
the initramfs, but the symlinks from /dev/md/array0p1 to /dev/md_123p1 
are not being created, neither on boot nor with a manual assembly. My 
read of the mdadm man page suggests that mdadm should create these 
symlinks when the array name ends in a number, the array device is 
specified in /dev/md/, and the CREATE auto line exists in 
/etc/mdadm.conf. I've got config for other services that specifies these 
array partitions as targets and when they are not created, the services 
fail to start (naturally :-) I chose these names as they should be 
stationary targets, ie persistent names across reboots, whereas the 
/dev/md_d127 names seem to be dynamically assigned, based on order of 
raid array discovery and position in the mdadm.conf file.

The partitioned arrays are started ok and the partitions detected, I 
just can't get mdadm(/udev ??) to create the appropriate symlinks. Does 
anyone know how to get these to be created for a partitioned array?

I have:
/dev:
brw-rw---- 1 root disk 254, 8128 2011-05-30 11:39 /dev/md_d127
brw-rw---- 1 root disk 254, 8129 2011-05-30 11:39 /dev/md_d127p1

/dev/md:
total 0
lrwxrwxrwx 1 root root 10 2011-05-30 11:39 h001r003 -> ../md_d127


I need:
/dev/md:
lrwxrwxrwx 1 root root 10 2011-05-30 11:39 h001r003p1 -> ../md_d127p1
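
As a diagnostic aid, the set of partition symlinks one would expect under /dev/md can be derived from the array symlinks and the device nodes that do exist. This is a sketch only: expected_partition_links is a hypothetical helper, it works on plain names rather than on /dev itself, and it does not explain why mdadm or udev is not creating the links.

```python
import re


def expected_partition_links(array_links, dev_nodes):
    """Map each wanted /dev/md/<name>pN link to its relative target.

    array_links: {link name in /dev/md: device node name in /dev}
    dev_nodes:   names of the block device nodes present in /dev
    """
    wanted = {}
    for name, node in array_links.items():
        # Partition nodes look like <node>p<number>, e.g. md_d127p1.
        pattern = re.compile(r"^%s(p\d+)$" % re.escape(node))
        for dev in dev_nodes:
            match = pattern.match(dev)
            if match:
                wanted[name + match.group(1)] = "../" + dev
    return wanted
```

For the listing above, passing {"h001r003": "md_d127"} and the nodes md_d127 and md_d127p1 gives the single link h001r003p1 -> ../md_d127p1.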


Thanks kindly,

Josh



wynyard:~ # cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1

wynyard:~ # mdadm -V
mdadm - v3.0.3 - 22nd October 2009

wynyard:~ # uname -a
Linux wynyard 2.6.32.36-0.5-xen #1 SMP 2011-04-14 10:12:31 +0200 x86_64 x86_64 x86_64 GNU/Linux


wynyard:~ # cat /proc/mdstat
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4] [linear]
md_d127 : active (auto-read-only) raid10 sdd[0] sdg[3] sdf[2] sde[1]
      3907027968 blocks super 1.2 256K chunks 2 far-copies [4/4] [UUUU]
      
md_d125 : active (auto-read-only) raid1 sdj[0] sdk[1]
      293036048 blocks super 1.2 [2/2] [UU]
      
md_d126 : active (auto-read-only) raid1 sdh[0] sdi[1]
      1953514448 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1]
      506004 blocks super 1.0 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 32KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      160328628 blocks super 1.0 [2/2] [UU]
      bitmap: 2/153 pages [8KB], 512KB chunk

unused devices: <none>


wynyard:~ # cat /etc/mdadm.conf
DEVICE containers partitions

CREATE auto=part8

ARRAY /dev/md0 UUID=62862472:2a1986f3:3aaf03f8:98f91297
ARRAY /dev/md1 UUID=b74f4f13:637b1874:9681b4f8:789d572c
ARRAY /dev/md/h001r003 auto=part8 metadata=1.02 name=wynyard:h001r003 UUID=9ca01512:203443f7:a2173168:5f44986e
ARRAY /dev/md/h001r004 auto=part8 metadata=1.02 name=wynyard:h001r004 UUID=5b4f3aa8:f8de5a24:7933c52c:18096707
ARRAY /dev/md/h001r005 auto=part8 metadata=1.02 name=wynyard:h001r005 UUID=b25a5f84:57861501:efe9943a:a804fc03



wynyard:~ # ls -l /dev/md*
brw-rw---- 1 root disk   9,    0 2011-05-11 01:28 /dev/md0
brw-rw---- 1 root disk   9,    1 2011-05-11 01:28 /dev/md1
brw-r----- 1 root disk   9,   10 2011-05-11 01:28 /dev/md10
brw-r----- 1 root disk   9,   11 2011-05-11 01:28 /dev/md11
brw-r----- 1 root disk   9,   12 2011-05-11 01:28 /dev/md12
brw-r----- 1 root disk   9,   13 2011-05-11 01:28 /dev/md13
brw-r----- 1 root disk   9,   14 2011-05-11 01:28 /dev/md14
brw-r----- 1 root disk   9,   15 2011-05-11 01:28 /dev/md15
brw-r----- 1 root disk   9,   16 2011-05-11 01:28 /dev/md16
brw-r----- 1 root disk   9,   17 2011-05-11 01:28 /dev/md17
brw-r----- 1 root disk   9,   18 2011-05-11 01:28 /dev/md18
brw-r----- 1 root disk   9,   19 2011-05-11 01:28 /dev/md19
brw-r----- 1 root disk   9,    2 2011-05-11 01:28 /dev/md2
brw-r----- 1 root disk   9,   20 2011-05-11 01:28 /dev/md20
brw-r----- 1 root disk   9,   21 2011-05-11 01:28 /dev/md21
brw-r----- 1 root disk   9,   22 2011-05-11 01:28 /dev/md22
brw-r----- 1 root disk   9,   23 2011-05-11 01:28 /dev/md23
brw-r----- 1 root disk   9,   24 2011-05-11 01:28 /dev/md24
brw-r----- 1 root disk   9,   25 2011-05-11 01:28 /dev/md25
brw-r----- 1 root disk   9,   26 2011-05-11 01:28 /dev/md26
brw-r----- 1 root disk   9,   27 2011-05-11 01:28 /dev/md27
brw-r----- 1 root disk   9,   28 2011-05-11 01:28 /dev/md28
brw-r----- 1 root disk   9,   29 2011-05-11 01:28 /dev/md29
brw-r----- 1 root disk   9,    3 2011-05-11 01:28 /dev/md3
brw-r----- 1 root disk   9,   30 2011-05-11 01:28 /dev/md30
brw-r----- 1 root disk   9,   31 2011-05-11 01:28 /dev/md31
brw-r----- 1 root disk   9,    4 2011-05-11 01:28 /dev/md4
brw-r----- 1 root disk   9,    5 2011-05-11 01:28 /dev/md5
brw-r----- 1 root disk   9,    6 2011-05-11 01:28 /dev/md6
brw-r----- 1 root disk   9,    7 2011-05-11 01:28 /dev/md7
brw-r----- 1 root disk   9,    8 2011-05-11 01:28 /dev/md8
brw-r----- 1 root disk   9,    9 2011-05-11 01:28 /dev/md9
brw-rw---- 1 root disk 254, 8000 2011-05-30 11:36 /dev/md_d125
brw-rw---- 1 root disk 254, 8001 2011-05-30 11:36 /dev/md_d125p1
brw-rw---- 1 root disk 254, 8002 2011-05-30 11:36 /dev/md_d125p2
brw-rw---- 1 root disk 254, 8064 2011-05-30 11:36 /dev/md_d126
brw-rw---- 1 root disk 254, 8065 2011-05-30 11:36 /dev/md_d126p1
brw-rw---- 1 root disk 254, 8066 2011-05-30 11:36 /dev/md_d126p2
brw-rw---- 1 root disk 254, 8128 2011-05-30 11:39 /dev/md_d127
brw-rw---- 1 root disk 254, 8129 2011-05-30 11:39 /dev/md_d127p1

/dev/md:
total 0
lrwxrwxrwx 1 root root 10 2011-05-30 11:39 h001r003 -> ../md_d127
lrwxrwxrwx 1 root root 10 2011-05-30 11:36 h001r004 -> ../md_d126
lrwxrwxrwx 1 root root 10 2011-05-30 11:36 h001r005 -> ../md_d125

dmesg:
[  263.689751] md: bind<sde>
[  263.689834] md: bind<sdf>
[  263.689914] md: bind<sdg>
[  263.689998] md: bind<sdd>
[  263.691438] raid10: raid set md_d127 active with 4 out of 4 devices
[  263.691454] md_d127: detected capacity change from 0 to 4000796639232
[  263.692855]  md_d127: p1

wynyard:~ # mdadm -vD /dev/md/h001r003
/dev/md/h001r003:
        Version : 1.02
  Creation Time : Sat May 21 15:26:26 2011
     Raid Level : raid10
     Array Size : 3907027968 (3726.03 GiB 4000.80 GB)
  Used Dev Size : 1953513984 (1863.02 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon May 30 12:20:31 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : far=2
     Chunk Size : 256K

           Name : wynyard:h001r003  (local to host wynyard)
           UUID : 9ca01512:203443f7:a2173168:5f44986e
         Events : 40

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       80        2      active sync   /dev/sdf
       3       8       96        3      active sync   /dev/sdg


wynyard:~ # mdadm -vE /dev/sd[d-g]
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9ca01512:203443f7:a2173168:5f44986e
           Name : wynyard:h001r003  (local to host wynyard)
  Creation Time : Sat May 21 15:26:26 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 3907028896 (1863.02 GiB 2000.40 GB)
     Array Size : 7814055936 (3726.03 GiB 4000.80 GB)
  Used Dev Size : 3907027968 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a1f7a0dc:7112a2bc:e11d6cd5:930ec677

    Update Time : Mon May 30 12:22:36 2011
       Checksum : f57561d9 - correct
         Events : 40

         Layout : far=2
     Chunk Size : 256K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9ca01512:203443f7:a2173168:5f44986e
           Name : wynyard:h001r003  (local to host wynyard)
  Creation Time : Sat May 21 15:26:26 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 3907028896 (1863.02 GiB 2000.40 GB)
     Array Size : 7814055936 (3726.03 GiB 4000.80 GB)
  Used Dev Size : 3907027968 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b9904f82:b7aa0ded:fe828141:f556af7d

    Update Time : Mon May 30 12:22:36 2011
       Checksum : 3d8e40b7 - correct
         Events : 40

         Layout : far=2
     Chunk Size : 256K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9ca01512:203443f7:a2173168:5f44986e
           Name : wynyard:h001r003  (local to host wynyard)
  Creation Time : Sat May 21 15:26:26 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 3907028896 (1863.02 GiB 2000.40 GB)
     Array Size : 7814055936 (3726.03 GiB 4000.80 GB)
  Used Dev Size : 3907027968 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 82ea41f3:8d897dcd:6c0cfe16:b8b2a35a

    Update Time : Mon May 30 12:22:36 2011
       Checksum : 41615e88 - correct
         Events : 40

         Layout : far=2
     Chunk Size : 256K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdg:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9ca01512:203443f7:a2173168:5f44986e
           Name : wynyard:h001r003  (local to host wynyard)
  Creation Time : Sat May 21 15:26:26 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 3907028896 (1863.02 GiB 2000.40 GB)
     Array Size : 7814055936 (3726.03 GiB 4000.80 GB)
  Used Dev Size : 3907027968 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a6f5741b:bbe0092a:2568b4fd:5dff4755

    Update Time : Mon May 30 12:22:36 2011
       Checksum : a77b6938 - correct
         Events : 40

         Layout : far=2
     Chunk Size : 256K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing)




wynyard:~ # fdisk -l /dev/sd[d-g]

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000a1c3c

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0005e157

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000592c0

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdg: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0009242e

   Device Boot      Start         End      Blocks   Id  System



^ permalink raw reply

* Re: disable raid autodetect at boot
From: Nikolay Kichukov @ 2011-05-30  7:05 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: Michael Tokarev, linux-raid
In-Reply-To: <BANLkTik=qRE_c5XWs6kjbT90otnmdGdaog@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,
you should most likely disable the assembly of the arrays in the initramfs pre/post startup subsystem scripts.

I have it configured such as when initramfs image is loaded it automagically detects all my raid drives, various
versions: 0.9 and 1.2.

For example:

$ gunzip < /boot/initrd.img-2.6.32-5-vserver-amd64 | cpio -i --make-directories
62745 blocks
$ ls
bin  conf  etc  init  lib  lib64  sbin  scripts


You can take it from here...
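
One way to take it from there is to search the unpacked tree for anything that mentions mdadm. The sketch below assumes the initramfs has been extracted into a directory as shown above; find_mdadm_references is a name made up for this example.

```python
import os


def find_mdadm_references(root, needle=b"mdadm"):
    """Return sorted (relative path, line number) pairs for lines containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symlinks and special files; only regular files are read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    for lineno, line in enumerate(handle, 1):
                        if needle in line:
                            hits.append((os.path.relpath(path, root), lineno))
            except OSError:
                continue
    return sorted(hits)
```

Files are read as bytes, so binaries such as the mdadm executable itself will show up in the result alongside the scripts that call it.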

Cheers,
- -Nik

On 05/25/2011 06:15 PM, Alexander Lyakas wrote:
> Michael,
> thank you for your advice.
> 
> The only place where I saw mdadm is being called on boot is via
> /etc/init.d/mdadm, which starts mdadm with --monitor --scan options.
> Still I disabled the daemon start (in /etc/default/mdadm), but
> inactive raid keeps reappearing on boot.
> 
> How can I find out who else might be calling mdadm on boot?
> 
> Thanks,
>    Alex.
> 
> 
> On Mon, May 23, 2011 at 4:21 PM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>> 23.05.2011 16:50, Alexander Lyakas wrote:
>>> Michael,
>>> can you pls explain what do I need to look at to disable this.
>>
>>> On Mon, May 23, 2011 at 1:35 PM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>>
>>>> This is not kernel autodetection, this is your initramfs/initrd
>>>> and mdadm.  Or maybe mdadm in the regular root filesystem.
>>
>> You need to find out where and how mdadm is called
>> on your system during bootup, and fix that place.
>>
>> /mjt
>>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN40HGAAoJEDFLYVOGGjgXG/cH/R6IqX2rTr9Wjt9jxF6ad2GG
2r9AZpK7AV4m9ckDeqUYTrojjDA/rvOMVneMuqG22GgAt/JQ5M5AIZFlH39i8VfN
DL/DNd27hbjh+6rDd9I6LEDxCtO4UyAOtkI313fou8Yn8LdzOJvc9K58hsLUumhl
O+w/AmcOGFYnoOdVajobfDDQT1vb8TXYCA0rM6+w9+f94BQm1HPgxqJNfkBpry3P
N9qB0dngw7rnt1pXgYoi6zpiGSnqD/5tnnF9cYvq9HfRvK1/YwHAbwkZFDtuMwj1
lZOb6nC4MhPMmBFYW63FneGYW6YojUYDgSmJbWJHfwBRt2ScfgmfkR+sQ6R7AqM=
=MM9u
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: filesystem-level tool to validate array
From: Nikolay Kichukov @ 2011-05-30  7:13 UTC (permalink / raw)
  To: Michael Stumpf; +Cc: linux-raid
In-Reply-To: <BANLkTi=9mp0v4kn+3ZZP7CtBxE88Ox3_rA@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,
perhaps a complete fs check via fsck ?

Cheers,
- -Nik



On 05/29/2011 11:17 PM, Michael Stumpf wrote:
> I'm looking for a filesystem-level tool to perform something similar
> to what badblocks does at the drive level.  I can certainly write it
> on my own (I'd build it as a Perl or Python script), but if someone's
> already invented this..
> 
> (The intended purpose is to validate that there are no quirks/bugs in
> the overall fs.)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN40OwAAoJEDFLYVOGGjgXEfMH/3EuiJMEIH25WAFbfdv4lgl/
vDvFvb7BpPugnws4qffyJ3cWZu/g39OjrvvAf00xFZXy+OYv0mTtdnNiDBZh/3Tp
lT+p4b1zNsb9yujEFM0OhjaDjq89FytRNHIIPXl9OtvAWi/z+JdGY9ucyFqjl04L
EUaur3ofW0ys1rCLFHvk7Lg3F74E0wk/5v7Ivlop4ECYbgbzM56sLIZ1b980GWCF
kLlz1OdakoZw3Zl9LOdGVqOtmd2PJrhUzYjILSgUIPTVjAFLbKt9oEE1Y5uBJOhm
e5bY9K7YuYVnQ52Vmbz7Y4nxdIAGHp7NDne2jQoGhObKlbzoth4k/IOE6SoR3R4=
=r5BC
-----END PGP SIGNATURE-----

^ permalink raw reply

* Optimizing small IO with md RAID
From: fibreraid @ 2011-05-30  7:14 UTC (permalink / raw)
  To: linux-raid, fibre raid

Hi all,

I am looking to optimize md RAID performance as much as possible.

I've managed to get some rather strong large 4M IOps performance, but
small 4K IOps are still rather subpar, given the hardware.

CPU: 2 x Intel Westmere 6-core 2.4GHz
RAM: 24GB DDR3 1066
SAS controllers: 3 x LSI SAS2008 (6 Gbps SAS)
Drives: 24 x SSD's
Kernel: 2.6.38 x64 kernel (home-grown)
Benchmarking Tool: fio 1.54

Here are the results. I used the following commands to perform these benchmarks:

4K READ: fio --bs=4k --direct=1 --rw=read --ioengine=libaio
--iodepth=512 --runtime=60 --name=/dev/md0
4K WRITE: fio --bs=4k --direct=1 --rw=write --ioengine=libaio
--iodepth=512 --runtime=60 --name=/dev/md0
4M READ: fio --bs=4m --direct=1 --rw=read --ioengine=libaio
--iodepth=64 --runtime=60 --name=/dev/md0
4M WRITE: fio --bs=4m --direct=1 --rw=write --ioengine=libaio
--iodepth=64 --runtime=60 --name=/dev/md0

In each case below, the md chunk size was 64K. In RAID 5 and RAID 6,
one hot-spare was specified.

           raid0 24 x SSD   raid5 23 x SSD   raid6 23 x SSD   raid0 (2 * (raid5 x 11 SSD))
4K read    179,923 IO/s     93,503 IO/s      116,866 IO/s     75,782 IO/s
4K write   168,027 IO/s     108,408 IO/s     120,477 IO/s     90,954 IO/s
4M read    4,576.7 MB/s     4,406.7 MB/s     4,052.2 MB/s     3,566.6 MB/s
4M write   3,146.8 MB/s     1,337.2 MB/s     1,259.9 MB/s     1,856.4 MB/s

Note that each individual SSD tests out as follows:

4k read: 56,342 IO/s
4k write: 33,792 IO/s
4M read: 231 MB/s
4M write: 130 MB/s

My concerns:

1. Given the above individual SSD performance, 24 SSD's in an md array
is at best getting 4K read/write performance of 2-3 drives, which
seems very low. I would expect significantly better linear scaling.
2. On the other hand, 4M read/write are performing more like 10-15
drives, which is much better, though still seems like it could get
better.
3. 4K read/write looks good for RAID 0, but drops off by over 40% with
RAID 5. While somewhat understandable on writes, why such a
significant hit on reads?
4. RAID 5 4M writes take a big hit compared to RAID 0, from 3146 MB/s
to 1337 MB/s. Despite the RAID 5 overhead, that still seems huge given
the CPU's at hand. Why?
5. Using a RAID 0 across two 11-SSD RAID 5's gives better RAID 5 4M
write performance, but worse in reads and significantly worse in 4K
reads/writes. Why?
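
The "how many drives' worth" comparison in points 1 and 2 can be worked out directly from the figures above. This is a simple ratio of array result to single-SSD result, which assumes ideal linear scaling as the baseline; the function and dictionary names are made up for this sketch.

```python
# Single-SSD and 24-drive raid0 results as posted (IO/s for 4K, MB/s for 4M).
SINGLE_SSD = {"4K read": 56342.0, "4K write": 33792.0,
              "4M read": 231.0, "4M write": 130.0}
RAID0_24 = {"4K read": 179923.0, "4K write": 168027.0,
            "4M read": 4576.7, "4M write": 3146.8}


def equivalent_drives(array_result, single_result):
    """Array throughput expressed as a number of single drives."""
    return {name: array_result[name] / single_result[name]
            for name in single_result}


def scaling_efficiency(array_result, single_result, drive_count):
    """Fraction of ideal linear scaling achieved for each workload."""
    equivalent = equivalent_drives(array_result, single_result)
    return {name: equivalent[name] / drive_count for name in equivalent}
```

For the 24-drive raid0, this ratio comes out at roughly 3 drives' worth on 4K reads and 5 on 4K writes, against roughly 20 on 4M reads and 24 on 4M writes, so by this measure the small-IO case is where nearly all of the scaling is lost.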

Any thoughts would be greatly appreciated, especially patch ideas for
tweaking options. Thanks!

Best,
Tommy

^ permalink raw reply

* Re: filesystem-level tool to validate array
From: Bernd Schubert @ 2011-05-30  7:28 UTC (permalink / raw)
  To: Michael Stumpf; +Cc: linux-raid
In-Reply-To: <BANLkTi=9mp0v4kn+3ZZP7CtBxE88Ox3_rA@mail.gmail.com>

On 05/29/2011 10:17 PM, Michael Stumpf wrote:
> I'm looking for a filesystem-level tool to perform something similar
> to what badblocks does at the drive level.  I can certainly write it
> on my own (I'd build it as a Perl or Python script), but if someone's
> already invented this..
> 
> (The intended purpose is to validate that there are no quirks/bugs in
> the overall fs.)

Well, that is a hard task, and I don't know of a single tool. Here is
something similar to badblocks:
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ql-fstest/

And here is a posix test suite:
http://www.tuxera.com/community/posix-test-suite/



Cheers,
Bernd

^ permalink raw reply

* Re: filesystem-level tool to validate array
From: Gordon Henderson @ 2011-05-30  8:20 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <BANLkTi=9mp0v4kn+3ZZP7CtBxE88Ox3_rA@mail.gmail.com>

On Sun, 29 May 2011, Michael Stumpf wrote:

> I'm looking for a filesystem-level tool to perform something similar
> to what badblocks does at the drive level.  I can certainly write it
> on my own (I'd build it as a Perl or Python script), but if someone's
> already invented this..
>
> (The intended purpose is to validate that there are no quirks/bugs in
> the overall fs.)

If you can take the partition offline, then fsck -fC might work, although 
it'll depend on the filesystem type... And fsck doesn't actually read the 
file blocks (that I'm aware of).

For something crude, you can use find to descend a hierarchy then copy the 
file, or maybe even something like

   cd /top-level/dir/
   fgrep -r "wumpus" .

that'll perform a read of every file - well, mostly as some might be in 
the filesystem cache.

But if you want to make sure every file block belongs to a file, and the 
structure (directory) integrity is there, then fsck is probably the best 
bet...

Another way might be to recursively compute md5 checksums for all files - 
then do it again and compare.. (at a later date?)

You might want to look at something like tripwire to automate this though.

(Obviously won't work if you get the same error at the same place every 
time though!)
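
A rough Python version of the checksum-twice idea might look like the following. It is a sketch, not a replacement for tripwire: the function names are made up here, and symlinks and special files are skipped.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each regular file under root (by relative path) to its MD5."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.isfile(path) and not os.path.islink(path):
                sums[os.path.relpath(path, root)] = md5_of_file(path)
    return sums


def changed_files(before, after):
    """Paths present in both runs whose checksum differs."""
    return sorted(p for p in before if p in after and before[p] != after[p])
```

Run checksum_tree once, keep the result, run it again later, and pass both results to changed_files. As with the fgrep approach, the second pass may be served partly from the filesystem cache.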

One of the burn-in tools I have is a script that writes a file of random 
numbers - md5's it. Then copies this file to n+1, then copies n+1 to n+2, 
then n+2 to n+3 and so on, then md5's the final file. The file-sizes are 
typically double RAM size to negate the effects of cache (same idea as 
bonnie)... However if there's a failure, then it's not clear where 
the issue is - memory, PCI bus, SATA cable, disk platter?
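
The copy-chain test described above could be sketched in Python roughly as follows. The file names and the burn_in function are invented for this example, and the caller picks the size (for a real run, about twice RAM as noted above).

```python
import hashlib
import os
import shutil


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def burn_in(directory, size_bytes, copies):
    """Write random data, copy it along a chain, and compare first and last MD5."""
    first = os.path.join(directory, "burnin.0")
    with open(first, "wb") as handle:
        remaining = size_bytes
        while remaining > 0:
            block = os.urandom(min(remaining, 1 << 20))
            handle.write(block)
            remaining -= len(block)
    expected = file_md5(first)

    # Copy file n to n+1, then n+1 to n+2, and so on.
    previous = first
    for index in range(1, copies + 1):
        current = os.path.join(directory, "burnin.%d" % index)
        shutil.copyfile(previous, current)
        previous = current

    # True means the last copy still matches the original checksum.
    return file_md5(previous) == expected
```

A False result only says that corruption crept in somewhere along the chain; as noted above, it does not say which component caused it.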

Of course in a RAID array, looking at it from the filesystem level (or 
even the block level) isn't going to read all platters of all disks - you 
need to use the /sys/block/mdX/md/sync_action mechanism.
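
For reference, a minimal sketch of driving that mechanism from Python. It assumes the md sysfs attributes sync_action and mismatch_cnt under /sys/block/mdX/md/, and that the caller is root; the function names and the sysfs_root parameter are made up for this example.

```python
import os


def md_attr_path(md_name, attr, sysfs_root="/sys/block"):
    """Build the path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return os.path.join(sysfs_root, md_name, "md", attr)


def request_check(md_name, sysfs_root="/sys/block"):
    """Ask md to read and compare all members by writing 'check' to sync_action."""
    with open(md_attr_path(md_name, "sync_action", sysfs_root), "w") as handle:
        handle.write("check\n")


def read_md_attr(md_name, attr, sysfs_root="/sys/block"):
    """Read an md sysfs attribute and strip the trailing newline."""
    with open(md_attr_path(md_name, attr, sysfs_root)) as handle:
        return handle.read().strip()
```

After request_check("md0"), polling read_md_attr("md0", "sync_action") until it reports idle and then reading mismatch_cnt shows whether the members disagreed anywhere; md0 is a placeholder name.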

Gordon

^ permalink raw reply
