Linux RAID subsystem development
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-12 21:13 UTC
  To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji35kySP4Q8cUpYCXR2P9DBwPcYeWjr_HX==TNX9CPW9NA@mail.gmail.com>

Apologies for the verbosity; just adding some more info, which is now
making me lose hope. Using parted -l instead of fdisk gives me this:

[root@lamachine ~]# parted -l
Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  31.5GB  31.5GB  primary                raid
 2      31.5GB  294GB   262GB   primary   ext4         raid
 3      294GB   500GB   207GB   extended
 5      294GB   326GB   32.2GB  logical
 6      336GB   339GB   3644MB  logical                raid


Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sdb: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start  End    Size    Type      File system     Flags
 2      210MB  262GB  262GB   primary                   raid
 3      262GB  271GB  8389MB  primary   linux-swap(v1)
 4      271GB  500GB  229GB   extended
 5      271GB  303GB  32.2GB  logical
 6      313GB  317GB  3644MB  logical                   raid


Error: Invalid argument during seek for read on /dev/sdc
Retry/Ignore/Cancel? R
Error: Invalid argument during seek for read on /dev/sdc
Retry/Ignore/Cancel? I
Error: The backup GPT table is corrupt, but the primary appears OK, so
that will be used.
OK/Cancel? O
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdc: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:

Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdd: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2199GB  2199GB                     raid
 2      2199GB  2736GB  537GB


Error: Invalid argument during seek for read on /dev/sde
Retry/Ignore/Cancel? C
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:

Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sdf: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system  Flags
 1      1049kB  210MB   209MB   primary   ext4         boot
 2      210MB   31.7GB  31.5GB  primary                raid
 3      31.7GB  294GB   262GB   primary   ext4         raid
 4      294GB   500GB   206GB   extended
 5      294GB   326GB   32.2GB  logical
 6      336GB   340GB   3644MB  logical                raid


Model: Linux Software RAID Array (md)
Disk /dev/md2: 524GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End    Size   File system  Flags
 1      0.00B  524GB  524GB  ext4


Error: /dev/md126: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md126: 31.5GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Error: /dev/md127: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md127: 96.6GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
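
For what it's worth, the seek errors on sdc and sde seem to line up with the sector counts fdisk reports in the quoted message below. A rough sketch in Python, plain arithmetic on those numbers and nothing that touches the disks. The big assumption (not confirmed anywhere in this thread) is that sdc and sde originally exposed as many sectors as sdd does now, since all three are the same WD30EZRX model. GPT keeps its backup header in the last sector of the disk, so if a disk now reports fewer sectors, that sector would be out of reach:

```python
# Sector counts copied from the fdisk -l output quoted below.
SECTORS = {
    "sdc": 5860531055,
    "sdd": 5860533168,
    "sde": 5860531055,
}

# ASSUMPTION: all three drives originally exposed as many sectors as sdd does now.
assumed_full_size = SECTORS["sdd"]

# GPT stores its backup header in the last addressable LBA of the disk.
backup_header_lba = assumed_full_size - 1


def backup_header_reachable(reported_sectors: int) -> bool:
    """True if the assumed backup header LBA lies inside the reported size."""
    return backup_header_lba < reported_sectors


for name, count in sorted(SECTORS.items()):
    missing = assumed_full_size - count
    print(f"{name}: {missing} sectors short of sdd, "
          f"backup header reachable: {backup_header_reachable(count)}")
```

If the assumption holds, sdc and sde are each 2113 sectors short of sdd, which would be consistent with "Invalid argument during seek" and the complaint about the backup GPT table. It is only a guess from the numbers, not a diagnosis.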



On 12 September 2016 at 20:41, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> ok, I just adjusted system time so that I can start tracking logs.
>
> what I'm noticing however is that fdisk -l is not giving me the expected
> partitions (I was expecting at least 2 partitions on every 2.7 TiB disk,
> similar to what I have on sdd):
>
> [root@lamachine lamachine_220315]# fdisk -l /dev/{sdc,sdd,sde}
> Disk /dev/sdc: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00000000
>
> Device     Boot Start        End    Sectors Size Id Type
> /dev/sdc1           1 4294967295 4294967295   2T ee GPT
>
> Partition 1 does not start on physical sector boundary.
> Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: gpt
> Disk identifier: D3233810-F552-4126-8281-7F71A4938DF9
>
> Device          Start        End    Sectors  Size Type
> /dev/sdd1        2048 4294969343 4294967296    2T Linux RAID
> /dev/sdd2  4294969344 5343545343 1048576000  500G Linux filesystem
> Disk /dev/sde: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00000000
>
> Device     Boot Start        End    Sectors Size Id Type
> /dev/sde1           1 4294967295 4294967295   2T ee GPT
>
> Partition 1 does not start on physical sector boundary.
> [root@lamachine lamachine_220315]#
>
> what could've happened here? any ideas why the partition tables ended
> up like that?
>
> From previous information I have an idea of what md128 and md129
> are supposed to look like (I also noticed that the device names
> changed):
>
> # md128 and md129 details from an old command output
> /dev/md128:
>         Version : 1.2
>   Creation Time : Fri Oct 24 15:24:38 2014
>      Raid Level : raid5
>      Array Size : 4294705152 (4095.75 GiB 4397.78 GB)
>   Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB)
>    Raid Devices : 3
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Sun Mar 22 06:20:08 2015
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>            Name : lamachine:128  (local to host lamachine)
>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>          Events : 4041
>
>     Number   Major   Minor   RaidDevice State
>        0       8       49        0      active sync   /dev/sdd1
>        1       8       65        1      active sync   /dev/sde1
>        3       8       81        2      active sync   /dev/sdf1
> /dev/md129:
>         Version : 1.2
>   Creation Time : Mon Nov 10 16:28:11 2014
>      Raid Level : raid0
>      Array Size : 1572470784 (1499.63 GiB 1610.21 GB)
>    Raid Devices : 3
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Nov 10 16:28:11 2014
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 512K
>
>            Name : lamachine:129  (local to host lamachine)
>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>          Events : 0
>
>     Number   Major   Minor   RaidDevice State
>        0       8       50        0      active sync   /dev/sdd2
>        1       8       66        1      active sync   /dev/sde2
>        2       8       82        2      active sync   /dev/sdf2
>
> Is there any way to recover the contents of these two arrays ? :(
>
> On 11 September 2016 at 21:06, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> However I'm noticing that the details with this new MB are somewhat different:
>>
>> [root@lamachine ~]# cat /etc/mdadm.conf
>> # mdadm.conf written out by anaconda
>> MAILADDR root
>> AUTO +imsm +1.x -all
>> ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022
>> ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>> ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>> [root@lamachine ~]# mdadm --detail /dev/md1*
>> /dev/md126:
>>         Version : 0.90
>>   Creation Time : Thu Dec  3 22:12:12 2009
>>      Raid Level : raid10
>>      Array Size : 30719936 (29.30 GiB 31.46 GB)
>>   Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>    Raid Devices : 2
>>   Total Devices : 2
>> Preferred Minor : 126
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Tue Jan 12 04:03:41 2016
>>           State : clean
>>  Active Devices : 2
>> Working Devices : 2
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : near=2
>>      Chunk Size : 64K
>>
>>            UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>          Events : 0.264152
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       82        0      active sync set-A   /dev/sdf2
>>        1       8        1        1      active sync set-B   /dev/sda1
>> /dev/md127:
>>         Version : 1.2
>>   Creation Time : Tue Jul 26 19:00:28 2011
>>      Raid Level : raid0
>>      Array Size : 94367232 (90.00 GiB 96.63 GB)
>>    Raid Devices : 3
>>   Total Devices : 3
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Tue Jul 26 19:00:28 2011
>>           State : clean
>>  Active Devices : 3
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>      Chunk Size : 512K
>>
>>            Name : reading.homeunix.com:3
>>            UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>          Events : 0
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       85        0      active sync   /dev/sdf5
>>        1       8       21        1      active sync   /dev/sdb5
>>        2       8        5        2      active sync   /dev/sda5
>> /dev/md128:
>>         Version : 1.2
>>      Raid Level : raid0
>>   Total Devices : 1
>>     Persistence : Superblock is persistent
>>
>>           State : inactive
>>
>>            Name : lamachine:128  (local to host lamachine)
>>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>          Events : 4154
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       49        -        /dev/sdd1
>> /dev/md129:
>>         Version : 1.2
>>      Raid Level : raid0
>>   Total Devices : 1
>>     Persistence : Superblock is persistent
>>
>>           State : inactive
>>
>>            Name : lamachine:129  (local to host lamachine)
>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>          Events : 0
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       50        -        /dev/sdd2
>> [root@lamachine ~]# mdadm --detail /dev/md2*
>> /dev/md2:
>>         Version : 0.90
>>   Creation Time : Mon Feb 11 07:54:36 2013
>>      Raid Level : raid5
>>      Array Size : 511999872 (488.28 GiB 524.29 GB)
>>   Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>    Raid Devices : 3
>>   Total Devices : 3
>> Preferred Minor : 2
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Tue Jan 12 02:31:50 2016
>>           State : clean
>>  Active Devices : 3
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>            UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>          Events : 0.611
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       83        0      active sync   /dev/sdf3
>>        1       8       18        1      active sync   /dev/sdb2
>>        2       8        2        2      active sync   /dev/sda2
>> [root@lamachine ~]# cat /proc/mdstat
>> Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
>> md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
>>       511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>> md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
>>       94367232 blocks super 1.2 512k chunks
>>
>> md129 : inactive sdd2[2](S)
>>       524156928 blocks super 1.2
>>
>> md128 : inactive sdd1[3](S)
>>       2147352576 blocks super 1.2
>>
>> md126 : active raid10 sdf2[0] sda1[1]
>>       30719936 blocks 2 near-copies [2/2] [UU]
>>
>> unused devices: <none>
>> [root@lamachine ~]#
>>
>> On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>> ok, system up and running after MB was replaced however the arrays
>>> remain inactive.
>>>
>>> mdadm version is:
>>> mdadm - v3.3.4 - 3rd August 2015
>>>
>>> Here's the output from Phil's lsdrv:
>>>
>>> [root@lamachine ~]# ./lsdrv
>>> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series
>>> chipset 6-Port SATA AHCI Controller (rev 06)
>>> ├scsi 0:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASZ0505379}
>>> │└sda 465.76g [8:0] Partitioned (dos)
>>> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync
>>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>>> │ │ │                    PV LVM2_member 28.03g used, 1.26g free
>>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>>> │ │ └VG vg_bigblackbox 29.29g 1.26g free
>>> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
>>> │ │  ├dm-2 7.81g [253:2] LV LogVol_opt ext4
>>> {b08d7f5e-f15f-4241-804e-edccecab6003}
>>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
>>> │ │  ├dm-0 9.77g [253:0] LV LogVol_root ext4
>>> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
>>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
>>> │ │  ├dm-3 1.95g [253:3] LV LogVol_tmp ext4
>>> {f6b46363-170b-4038-83bd-2c5f9f6a1973}
>>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
>>> │ │  └dm-1 8.50g [253:1] LV LogVol_var ext4
>>> {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
>>> │ │   └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
>>> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │ │                 ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ │ └Mounted as /dev/md2 @ /home
>>> │ ├sda3 1.00k [8:3] Partitioned (dos)
>>> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │ │                    PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
>>> │ │  ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
>>> │ │  ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
>>> │ │  ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
>>> │ │  ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
>>> │ │  ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
>>> │ │  ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
>>> │ │  └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
>>> │ └sda6 3.39g [8:6] Empty/Unknown
>>> ├scsi 1:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASY7694185}
>>> │└sdb 465.76g [8:16] Partitioned (dos)
>>> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
>>> │ ├sdb4 1.00k [8:20] Partitioned (dos)
>>> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │                      PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ └sdb6 3.39g [8:22] Empty/Unknown
>>> ├scsi 2:x:x:x [Empty]
>>> ├scsi 3:x:x:x [Empty]
>>> ├scsi 4:x:x:x [Empty]
>>> └scsi 5:x:x:x [Empty]
>>> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd.
>>> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
>>> ├scsi 6:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
>>> │└sdc 2.73t [8:32] Partitioned (PMBR)
>>> ├scsi 7:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
>>> │└sdd 2.73t [8:48] Partitioned (gpt)
>>> │ ├sdd1 2.00t [8:49] MD  (none/) spare 'lamachine:128'
>>> {f2372cb9-d381-6fd6-ce86-d826882ec82e}
>>> │ │└md128 0.00k [9:128] MD v1.2  () inactive, None (None) None
>>> {f2372cb9:d3816fd6:ce86d826:882ec82e}
>>> │ │                     Empty/Unknown
>>> │ └sdd2 500.00g [8:50] MD  (none/) spare 'lamachine:129'
>>> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
>>> │  └md129 0.00k [9:129] MD v1.2  () inactive, None (None) None
>>> {895dae98:d1a496de:4f590b8b:cb8ac12a}
>>> │                       Empty/Unknown
>>> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N1294906}
>>> │└sde 2.73t [8:64] Partitioned (PMBR)
>>> ├scsi 9:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WMAWF0085724}
>>> │└sdf 465.76g [8:80] Partitioned (dos)
>>> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
>>> │ │└Mounted as /dev/sdf1 @ /boot
>>> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync
>>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>>> │ │                      PV LVM2_member 28.03g used, 1.26g free
>>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>>> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ ├sdf4 1.00k [8:84] Partitioned (dos)
>>> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │                      PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ └sdf6 3.39g [8:86] Empty/Unknown
>>> ├scsi 10:x:x:x [Empty]
>>> ├scsi 11:x:x:x [Empty]
>>> └scsi 12:x:x:x [Empty]
>>> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation
>>> C602 chipset 4-Port SATA Storage Control Unit (rev 06)
>>> └scsi 14:x:x:x [Empty]
>>> [root@lamachine ~]#
>>>
>>> Thanks in advance for any recommendations on what steps to take in
>>> order to bring these arrays back online.
>>>
>>> Regards,
>>>
>>> Daniel
>>>
>>>
>>> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>>> Thanks very much for the response Wol.
>>>>
>>>> It looks like the PSU is dead (server automatically powers off a few
>>>> seconds after power on).
>>>>
>>>> I'm planning to order a PSU replacement to resume troubleshooting so
>>>> please bear with me; maybe the PSU was degraded and couldn't power
>>>> some of the drives?
>>>>
>>>> Cheers,
>>>>
>>>> Daniel
>>>>
>>>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>>>>> Just a quick first response. I see md128 and md129 are both down, and
>>>>> are both listed as one drive, raid0. Bit odd, that ...
>>>>>
>>>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>>>>> that would split an array in two. Is it possible that you should have
>>>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>>>>
>>>>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>>>>
>>>>> How much do you know about what the setup should be, and why it was set
>>>>> up that way?
>>>>>
>>>>> Download lsdrv by Phil Turmel (it requires python2.7; if your machine is
>>>>> python3, a quick fix to the shebang at the start should get it to work).
>>>>> Post the output from that here.
>>>>>
>>>>> Cheers,
>>>>> Wol
>>>>>
>>>>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I have a box that I believe was not powered down correctly and after
>>>>>> transporting it to a different location it doesn't boot anymore
>>>>>> stopping at BIOS check "Verifying DMI Pool Data".
>>>>>>
>>>>>> The box has 6 drives and after instructing the BIOS to boot from the
>>>>>> first drive I managed to boot the OS (Fedora 23) after commenting out
>>>>>> 2 /etc/fstab entries; output for "uname -a; cat /etc/fstab" follows:
>>>>>>
>>>>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>
>>>>>> #
>>>>>> # /etc/fstab
>>>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>>>>> #
>>>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>>>>> #
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_root /                       ext4    defaults        1 1
>>>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot                   ext4    defaults        1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt                    ext4    defaults        1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp                    ext4    defaults        1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var                    ext4    defaults        1 2
>>>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap                    swap    defaults        0 0
>>>>>> /dev/md2 /home          ext4    defaults        1 2
>>>>>> #/dev/vg_media/lv_media  /mnt/media      ext4    defaults        1 2
>>>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>>>>> [root@lamachine ~]#
>>>>>>
>>>>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>>>>> inactive, but not sure how to safely activate these so looking for
>>>>>> some knowledgeable advice on how to proceed here.
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> Below some more relevant outputs:
>>>>>>
>>>>>> [root@lamachine ~]# cat /proc/mdstat
>>>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>>>>>       94367232 blocks super 1.2 512k chunks
>>>>>>
>>>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>>>>>       511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>>>>
>>>>>> md128 : inactive sdf1[3](S)
>>>>>>       2147352576 blocks super 1.2
>>>>>>
>>>>>> md129 : inactive sdf2[2](S)
>>>>>>       524156928 blocks super 1.2
>>>>>>
>>>>>> md126 : active raid10 sda2[0] sdc1[1]
>>>>>>       30719936 blocks 2 near-copies [2/2] [UU]
>>>>>>
>>>>>> unused devices: <none>
>>>>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>>>>> # mdadm.conf written out by anaconda
>>>>>> MAILADDR root
>>>>>> AUTO +imsm +1.x -all
>>>>>> ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>>>>> ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>>>>> ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>>>>> /dev/md126:
>>>>>>         Version : 0.90
>>>>>>   Creation Time : Thu Dec  3 22:12:12 2009
>>>>>>      Raid Level : raid10
>>>>>>      Array Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>>   Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>>    Raid Devices : 2
>>>>>>   Total Devices : 2
>>>>>> Preferred Minor : 126
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>     Update Time : Tue Aug  2 07:46:39 2016
>>>>>>           State : clean
>>>>>>  Active Devices : 2
>>>>>> Working Devices : 2
>>>>>>  Failed Devices : 0
>>>>>>   Spare Devices : 0
>>>>>>
>>>>>>          Layout : near=2
>>>>>>      Chunk Size : 64K
>>>>>>
>>>>>>            UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>>>>>          Events : 0.264152
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice State
>>>>>>        0       8        2        0      active sync set-A   /dev/sda2
>>>>>>        1       8       33        1      active sync set-B   /dev/sdc1
>>>>>> /dev/md127:
>>>>>>         Version : 1.2
>>>>>>   Creation Time : Tue Jul 26 19:00:28 2011
>>>>>>      Raid Level : raid0
>>>>>>      Array Size : 94367232 (90.00 GiB 96.63 GB)
>>>>>>    Raid Devices : 3
>>>>>>   Total Devices : 3
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>     Update Time : Tue Jul 26 19:00:28 2011
>>>>>>           State : clean
>>>>>>  Active Devices : 3
>>>>>> Working Devices : 3
>>>>>>  Failed Devices : 0
>>>>>>   Spare Devices : 0
>>>>>>
>>>>>>      Chunk Size : 512K
>>>>>>
>>>>>>            Name : reading.homeunix.com:3
>>>>>>            UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>>>>>          Events : 0
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice State
>>>>>>        0       8        5        0      active sync   /dev/sda5
>>>>>>        1       8       21        1      active sync   /dev/sdb5
>>>>>>        2       8       37        2      active sync   /dev/sdc5
>>>>>> /dev/md128:
>>>>>>         Version : 1.2
>>>>>>      Raid Level : raid0
>>>>>>   Total Devices : 1
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>           State : inactive
>>>>>>
>>>>>>            Name : lamachine:128  (local to host lamachine)
>>>>>>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>>>          Events : 4154
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice
>>>>>>
>>>>>>        -       8       81        -        /dev/sdf1
>>>>>> /dev/md129:
>>>>>>         Version : 1.2
>>>>>>      Raid Level : raid0
>>>>>>   Total Devices : 1
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>           State : inactive
>>>>>>
>>>>>>            Name : lamachine:129  (local to host lamachine)
>>>>>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>>>          Events : 0
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice
>>>>>>
>>>>>>        -       8       82        -        /dev/sdf2
>>>>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>>>>> /dev/md2:
>>>>>>         Version : 0.90
>>>>>>   Creation Time : Mon Feb 11 07:54:36 2013
>>>>>>      Raid Level : raid5
>>>>>>      Array Size : 511999872 (488.28 GiB 524.29 GB)
>>>>>>   Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>>>>>    Raid Devices : 3
>>>>>>   Total Devices : 3
>>>>>> Preferred Minor : 2
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>     Update Time : Mon Aug  1 20:24:23 2016
>>>>>>           State : clean
>>>>>>  Active Devices : 3
>>>>>> Working Devices : 3
>>>>>>  Failed Devices : 0
>>>>>>   Spare Devices : 0
>>>>>>
>>>>>>          Layout : left-symmetric
>>>>>>      Chunk Size : 64K
>>>>>>
>>>>>>            UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>>>>>          Events : 0.611
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice State
>>>>>>        0       8        3        0      active sync   /dev/sda3
>>>>>>        1       8       18        1      active sync   /dev/sdb2
>>>>>>        2       8       34        2      active sync   /dev/sdc2
>>>>>> [root@lamachine ~]#
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-12 19:41 UTC
  To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji3HdHaJAvwtYNU=Ykc_qohBkfFfrbP0M=FNhuFMK+d-Jg@mail.gmail.com>

ok, I just adjusted system time so that I can start tracking logs.

what I'm noticing however is that fdisk -l is not giving me the expected
partitions (I was expecting at least 2 partitions on every 2.7 TiB disk,
similar to what I have on sdd):

[root@lamachine lamachine_220315]# fdisk -l /dev/{sdc,sdd,sde}
Disk /dev/sdc: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start        End    Sectors Size Id Type
/dev/sdc1           1 4294967295 4294967295   2T ee GPT

Partition 1 does not start on physical sector boundary.
Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D3233810-F552-4126-8281-7F71A4938DF9

Device          Start        End    Sectors  Size Type
/dev/sdd1        2048 4294969343 4294967296    2T Linux RAID
/dev/sdd2  4294969344 5343545343 1048576000  500G Linux filesystem
Disk /dev/sde: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start        End    Sectors Size Id Type
/dev/sde1           1 4294967295 4294967295   2T ee GPT

Partition 1 does not start on physical sector boundary.
[root@lamachine lamachine_220315]#

what could've happened here? any ideas why the partition tables ended
up like that?
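
One thing the numbers above do show: the single "ee" entry fdisk lists on sdc and sde is what a protective MBR looks like, which suggests fdisk is reading the legacy MBR on those two disks rather than a GPT. A small sketch of the arithmetic (Python, nothing here touches the disks; the variable names are just for illustration). An MBR entry stores its sector count in a 32-bit field, so 4294967295 is simply the largest value that field can hold:

```python
SECTOR_BYTES = 512           # logical sector size reported by fdisk
PHYSICAL_SECTOR_BYTES = 4096  # physical sector size reported by fdisk
MBR_TYPE_GPT_PROTECTIVE = 0xEE  # the "ee GPT" Id in the fdisk listing

# Values copied from the fdisk line for /dev/sdc1 (sde1 is identical).
start_lba = 1
sector_count = 4294967295

# The MBR sector-count field is 32 bits wide, so this is its ceiling.
is_capped = sector_count == 2**32 - 1

# Size that fdisk rounds to "2T".
size_tib = sector_count * SECTOR_BYTES / 2**40

# Starting at LBA 1 puts the entry 512 bytes into a 4096-byte physical sector,
# which is what triggers the "does not start on physical sector boundary" note.
aligned = (start_lba * SECTOR_BYTES) % PHYSICAL_SECTOR_BYTES == 0

print(f"type 0x{MBR_TYPE_GPT_PROTECTIVE:02x}, capped at 32-bit max: {is_capped}")
print(f"size {size_tib:.2f} TiB, aligned to physical sectors: {aligned}")
```

So the "2T" partition and the alignment warning say nothing about the real partitions; they describe the placeholder entry only.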

From previous information I have an idea of what md128 and md129
are supposed to look like (I also noticed that the device names
changed):

# md128 and md129 details from an old command output
/dev/md128:
        Version : 1.2
  Creation Time : Fri Oct 24 15:24:38 2014
     Raid Level : raid5
     Array Size : 4294705152 (4095.75 GiB 4397.78 GB)
  Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Mar 22 06:20:08 2015
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : lamachine:128  (local to host lamachine)
           UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
         Events : 4041

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1
       3       8       81        2      active sync   /dev/sdf1
/dev/md129:
        Version : 1.2
  Creation Time : Mon Nov 10 16:28:11 2014
     Raid Level : raid0
     Array Size : 1572470784 (1499.63 GiB 1610.21 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Nov 10 16:28:11 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : lamachine:129  (local to host lamachine)
           UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       50        0      active sync   /dev/sdd2
       1       8       66        1      active sync   /dev/sde2
       2       8       82        2      active sync   /dev/sdf2
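
A quick cross-check of the sizes in this old listing, as a sketch in Python. It assumes equal-sized members, that raid5 gives up one member's worth of space to parity, and that raid0 capacity is the plain sum of the members; the function name is made up for illustration. The md129 per-member figure is the 524156928 blocks that /proc/mdstat reports for sdd2 elsewhere in this thread.

```python
def expected_array_size_kib(level: str, raid_devices: int, per_device_kib: int) -> int:
    """Usable size in KiB for the two RAID levels that appear in this listing."""
    if level == "raid0":
        # Striping only: every member contributes its full size.
        return raid_devices * per_device_kib
    if level == "raid5":
        # One member's worth of space is consumed by parity.
        return (raid_devices - 1) * per_device_kib
    raise ValueError(f"level not covered by this sketch: {level}")


# md128: raid5, 3 devices, "Used Dev Size : 2147352576"
md128_kib = expected_array_size_kib("raid5", 3, 2147352576)

# md129: raid0, 3 devices, 524156928 blocks per member (from /proc/mdstat)
md129_kib = expected_array_size_kib("raid0", 3, 524156928)

print("md128:", md128_kib, "KiB, matches listing:", md128_kib == 4294705152)
print("md129:", md129_kib, "KiB, matches listing:", md129_kib == 1572470784)
```

Both results agree with the "Array Size" lines above, so the surviving sdd1 and sdd2 members at least report sizes consistent with the old three-device layout.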

Is there any way to recover the contents of these two arrays ? :(

On 11 September 2016 at 21:06, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> However I'm noticing that the details with this new MB are somewhat different:
>
> [root@lamachine ~]# cat /etc/mdadm.conf
> # mdadm.conf written out by anaconda
> MAILADDR root
> AUTO +imsm +1.x -all
> ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022
> ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04
> ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5
> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
> [root@lamachine ~]# mdadm --detail /dev/md1*
> /dev/md126:
>         Version : 0.90
>   Creation Time : Thu Dec  3 22:12:12 2009
>      Raid Level : raid10
>      Array Size : 30719936 (29.30 GiB 31.46 GB)
>   Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 126
>     Persistence : Superblock is persistent
>
>     Update Time : Tue Jan 12 04:03:41 2016
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : near=2
>      Chunk Size : 64K
>
>            UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>          Events : 0.264152
>
>     Number   Major   Minor   RaidDevice State
>        0       8       82        0      active sync set-A   /dev/sdf2
>        1       8        1        1      active sync set-B   /dev/sda1
> /dev/md127:
>         Version : 1.2
>   Creation Time : Tue Jul 26 19:00:28 2011
>      Raid Level : raid0
>      Array Size : 94367232 (90.00 GiB 96.63 GB)
>    Raid Devices : 3
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Tue Jul 26 19:00:28 2011
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 512K
>
>            Name : reading.homeunix.com:3
>            UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>          Events : 0
>
>     Number   Major   Minor   RaidDevice State
>        0       8       85        0      active sync   /dev/sdf5
>        1       8       21        1      active sync   /dev/sdb5
>        2       8        5        2      active sync   /dev/sda5
> /dev/md128:
>         Version : 1.2
>      Raid Level : raid0
>   Total Devices : 1
>     Persistence : Superblock is persistent
>
>           State : inactive
>
>            Name : lamachine:128  (local to host lamachine)
>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>          Events : 4154
>
>     Number   Major   Minor   RaidDevice
>
>        -       8       49        -        /dev/sdd1
> /dev/md129:
>         Version : 1.2
>      Raid Level : raid0
>   Total Devices : 1
>     Persistence : Superblock is persistent
>
>           State : inactive
>
>            Name : lamachine:129  (local to host lamachine)
>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>          Events : 0
>
>     Number   Major   Minor   RaidDevice
>
>        -       8       50        -        /dev/sdd2
> [root@lamachine ~]# mdadm --detail /dev/md2*
> /dev/md2:
>         Version : 0.90
>   Creation Time : Mon Feb 11 07:54:36 2013
>      Raid Level : raid5
>      Array Size : 511999872 (488.28 GiB 524.29 GB)
>   Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>    Raid Devices : 3
>   Total Devices : 3
> Preferred Minor : 2
>     Persistence : Superblock is persistent
>
>     Update Time : Tue Jan 12 02:31:50 2016
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>          Events : 0.611
>
>     Number   Major   Minor   RaidDevice State
>        0       8       83        0      active sync   /dev/sdf3
>        1       8       18        1      active sync   /dev/sdb2
>        2       8        2        2      active sync   /dev/sda2
> [root@lamachine ~]# cat /proc/mdstat
> Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
> md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
>       511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
>       94367232 blocks super 1.2 512k chunks
>
> md129 : inactive sdd2[2](S)
>       524156928 blocks super 1.2
>
> md128 : inactive sdd1[3](S)
>       2147352576 blocks super 1.2
>
> md126 : active raid10 sdf2[0] sda1[1]
>       30719936 blocks 2 near-copies [2/2] [UU]
>
> unused devices: <none>
> [root@lamachine ~]#
>
> On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> ok, system up and running after MB was replaced however the arrays
>> remain inactive.
>>
>> mdadm version is:
>> mdadm - v3.3.4 - 3rd August 2015
>>
>> Here's the output from Phil's lsdrv:
>>
>> [root@lamachine ~]# ./lsdrv
>> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series
>> chipset 6-Port SATA AHCI Controller (rev 06)
>> ├scsi 0:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASZ0505379}
>> │└sda 465.76g [8:0] Partitioned (dos)
>> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync
>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>> │ │ │                    PV LVM2_member 28.03g used, 1.26g free
>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>> │ │ └VG vg_bigblackbox 29.29g 1.26g free
>> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
>> │ │  ├dm-2 7.81g [253:2] LV LogVol_opt ext4
>> {b08d7f5e-f15f-4241-804e-edccecab6003}
>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
>> │ │  ├dm-0 9.77g [253:0] LV LogVol_root ext4
>> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
>> │ │  ├dm-3 1.95g [253:3] LV LogVol_tmp ext4
>> {f6b46363-170b-4038-83bd-2c5f9f6a1973}
>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
>> │ │  └dm-1 8.50g [253:1] LV LogVol_var ext4
>> {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
>> │ │   └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
>> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync
>> {2cff15d1-e411-447b-fd5d-472103e44022}
>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>> {2cff15d1:e411447b:fd5d4721:03e44022}
>> │ │ │                 ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>> │ │ └Mounted as /dev/md2 @ /home
>> │ ├sda3 1.00k [8:3] Partitioned (dos)
>> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync
>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>> │ │ │                    PV LVM2_member 86.00g used, 3.99g free
>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
>> │ │  ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
>> │ │  ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
>> │ │  ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
>> │ │  ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
>> │ │  ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
>> │ │  ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
>> │ │  └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
>> │ └sda6 3.39g [8:6] Empty/Unknown
>> ├scsi 1:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASY7694185}
>> │└sdb 465.76g [8:16] Partitioned (dos)
>> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync
>> {2cff15d1-e411-447b-fd5d-472103e44022}
>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>> {2cff15d1:e411447b:fd5d4721:03e44022}
>> │ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>> │ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
>> │ ├sdb4 1.00k [8:20] Partitioned (dos)
>> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync
>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>> │ │                      PV LVM2_member 86.00g used, 3.99g free
>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>> │ └sdb6 3.39g [8:22] Empty/Unknown
>> ├scsi 2:x:x:x [Empty]
>> ├scsi 3:x:x:x [Empty]
>> ├scsi 4:x:x:x [Empty]
>> └scsi 5:x:x:x [Empty]
>> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd.
>> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
>> ├scsi 6:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
>> │└sdc 2.73t [8:32] Partitioned (PMBR)
>> ├scsi 7:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
>> │└sdd 2.73t [8:48] Partitioned (gpt)
>> │ ├sdd1 2.00t [8:49] MD  (none/) spare 'lamachine:128'
>> {f2372cb9-d381-6fd6-ce86-d826882ec82e}
>> │ │└md128 0.00k [9:128] MD v1.2  () inactive, None (None) None
>> {f2372cb9:d3816fd6:ce86d826:882ec82e}
>> │ │                     Empty/Unknown
>> │ └sdd2 500.00g [8:50] MD  (none/) spare 'lamachine:129'
>> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
>> │  └md129 0.00k [9:129] MD v1.2  () inactive, None (None) None
>> {895dae98:d1a496de:4f590b8b:cb8ac12a}
>> │                       Empty/Unknown
>> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N1294906}
>> │└sde 2.73t [8:64] Partitioned (PMBR)
>> ├scsi 9:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WMAWF0085724}
>> │└sdf 465.76g [8:80] Partitioned (dos)
>> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
>> │ │└Mounted as /dev/sdf1 @ /boot
>> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync
>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>> │ │                      PV LVM2_member 28.03g used, 1.26g free
>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync
>> {2cff15d1-e411-447b-fd5d-472103e44022}
>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>> {2cff15d1:e411447b:fd5d4721:03e44022}
>> │ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>> │ ├sdf4 1.00k [8:84] Partitioned (dos)
>> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync
>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>> │ │                      PV LVM2_member 86.00g used, 3.99g free
>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>> │ └sdf6 3.39g [8:86] Empty/Unknown
>> ├scsi 10:x:x:x [Empty]
>> ├scsi 11:x:x:x [Empty]
>> └scsi 12:x:x:x [Empty]
>> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation
>> C602 chipset 4-Port SATA Storage Control Unit (rev 06)
>> └scsi 14:x:x:x [Empty]
>> [root@lamachine ~]#
>>
>> Thanks in advance for any recommendations on what steps to take in
>> order to bring these arrays back online.
>>
>> Regards,
>>
>> Daniel
>>
>>
>> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>> Thanks very much for the response Wol.
>>>
>>> It looks like the PSU is dead (server automatically powers off a few
>>> seconds after power on).
>>>
>>> I'm planning to order a PSU replacement to resume troubleshooting so
>>> please bear with me;  maybe the PSU was degraded and couldn't power
>>> some of the drives?
>>>
>>> Cheers,
>>>
>>> Daniel
>>>
>>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>>>> Just a quick first response. I see md128 and md129 are both down, and
>>>> are both listed as one drive, raid0. Bit odd, that ...
>>>>
>>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>>>> that would split an array in two. Is it possible that you should have
>>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>>>
>>>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>>>
>>>> How much do you know about what the setup should be, and why it was set
>>>> up that way?
>>>>
>>>> Download lsdrv by Phil Turmel (it requires python2.7, if your machine is
>>>> python3 a quick fix to the shebang at the start should get it to work).
>>>> Post the output from that here.
>>>>
>>>> Cheers,
>>>> Wol
>>>>
>>>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>>>> Hi All,
>>>>>
>>>>> I have a box that I believe was not powered down correctly and after
>>>>> transporting it to a different location it doesn't boot anymore
>>>>> stopping at BIOS check "Verifying DMI Pool Data".
>>>>>
>>>>> The box has 6 drives and after instructing the BIOS to boot from the
>>>>> first drive I managed to boot the OS (Fedora 23) after commenting out
>>>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows:
>>>>>
>>>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC
>>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> #
>>>>> # /etc/fstab
>>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>>>> #
>>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>>>> #
>>>>> /dev/mapper/vg_bigblackbox-LogVol_root /                       ext4
>>>>> defaults        1 1
>>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot                   ext4
>>>>>    defaults        1 2
>>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt                    ext4
>>>>> defaults        1 2
>>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp                    ext4
>>>>> defaults        1 2
>>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var                    ext4
>>>>> defaults        1 2
>>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap                    swap
>>>>>    defaults        0 0
>>>>> /dev/md2 /home          ext4    defaults        1 2
>>>>> #/dev/vg_media/lv_media  /mnt/media      ext4    defaults        1 2
>>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>>>> [root@lamachine ~]#
>>>>>
>>>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>>>> inactive, but not sure how to safely activate these so looking for
>>>>> some knowledgeable advice on how to proceed here.
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Daniel
>>>>>
>>>>> Below some more relevant outputs:
>>>>>
>>>>> [root@lamachine ~]# cat /proc/mdstat
>>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>>>>       94367232 blocks super 1.2 512k chunks
>>>>>
>>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>>>>       511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>>>
>>>>> md128 : inactive sdf1[3](S)
>>>>>       2147352576 blocks super 1.2
>>>>>
>>>>> md129 : inactive sdf2[2](S)
>>>>>       524156928 blocks super 1.2
>>>>>
>>>>> md126 : active raid10 sda2[0] sdc1[1]
>>>>>       30719936 blocks 2 near-copies [2/2] [UU]
>>>>>
>>>>> unused devices: <none>
>>>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>>>> # mdadm.conf written out by anaconda
>>>>> MAILADDR root
>>>>> AUTO +imsm +1.x -all
>>>>> ARRAY /dev/md2 level=raid5 num-devices=3
>>>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>>>> ARRAY /dev/md126 level=raid10 num-devices=2
>>>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>>>> ARRAY /dev/md127 level=raid0 num-devices=3
>>>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>>>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>>>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>>>> /dev/md126:
>>>>>         Version : 0.90
>>>>>   Creation Time : Thu Dec  3 22:12:12 2009
>>>>>      Raid Level : raid10
>>>>>      Array Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>   Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>    Raid Devices : 2
>>>>>   Total Devices : 2
>>>>> Preferred Minor : 126
>>>>>     Persistence : Superblock is persistent
>>>>>
>>>>>     Update Time : Tue Aug  2 07:46:39 2016
>>>>>           State : clean
>>>>>  Active Devices : 2
>>>>> Working Devices : 2
>>>>>  Failed Devices : 0
>>>>>   Spare Devices : 0
>>>>>
>>>>>          Layout : near=2
>>>>>      Chunk Size : 64K
>>>>>
>>>>>            UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>>>>          Events : 0.264152
>>>>>
>>>>>     Number   Major   Minor   RaidDevice State
>>>>>        0       8        2        0      active sync set-A   /dev/sda2
>>>>>        1       8       33        1      active sync set-B   /dev/sdc1
>>>>> /dev/md127:
>>>>>         Version : 1.2
>>>>>   Creation Time : Tue Jul 26 19:00:28 2011
>>>>>      Raid Level : raid0
>>>>>      Array Size : 94367232 (90.00 GiB 96.63 GB)
>>>>>    Raid Devices : 3
>>>>>   Total Devices : 3
>>>>>     Persistence : Superblock is persistent
>>>>>
>>>>>     Update Time : Tue Jul 26 19:00:28 2011
>>>>>           State : clean
>>>>>  Active Devices : 3
>>>>> Working Devices : 3
>>>>>  Failed Devices : 0
>>>>>   Spare Devices : 0
>>>>>
>>>>>      Chunk Size : 512K
>>>>>
>>>>>            Name : reading.homeunix.com:3
>>>>>            UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>>>>          Events : 0
>>>>>
>>>>>     Number   Major   Minor   RaidDevice State
>>>>>        0       8        5        0      active sync   /dev/sda5
>>>>>        1       8       21        1      active sync   /dev/sdb5
>>>>>        2       8       37        2      active sync   /dev/sdc5
>>>>> /dev/md128:
>>>>>         Version : 1.2
>>>>>      Raid Level : raid0
>>>>>   Total Devices : 1
>>>>>     Persistence : Superblock is persistent
>>>>>
>>>>>           State : inactive
>>>>>
>>>>>            Name : lamachine:128  (local to host lamachine)
>>>>>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>>          Events : 4154
>>>>>
>>>>>     Number   Major   Minor   RaidDevice
>>>>>
>>>>>        -       8       81        -        /dev/sdf1
>>>>> /dev/md129:
>>>>>         Version : 1.2
>>>>>      Raid Level : raid0
>>>>>   Total Devices : 1
>>>>>     Persistence : Superblock is persistent
>>>>>
>>>>>           State : inactive
>>>>>
>>>>>            Name : lamachine:129  (local to host lamachine)
>>>>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>>          Events : 0
>>>>>
>>>>>     Number   Major   Minor   RaidDevice
>>>>>
>>>>>        -       8       82        -        /dev/sdf2
>>>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>>>> /dev/md2:
>>>>>         Version : 0.90
>>>>>   Creation Time : Mon Feb 11 07:54:36 2013
>>>>>      Raid Level : raid5
>>>>>      Array Size : 511999872 (488.28 GiB 524.29 GB)
>>>>>   Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>>>>    Raid Devices : 3
>>>>>   Total Devices : 3
>>>>> Preferred Minor : 2
>>>>>     Persistence : Superblock is persistent
>>>>>
>>>>>     Update Time : Mon Aug  1 20:24:23 2016
>>>>>           State : clean
>>>>>  Active Devices : 3
>>>>> Working Devices : 3
>>>>>  Failed Devices : 0
>>>>>   Spare Devices : 0
>>>>>
>>>>>          Layout : left-symmetric
>>>>>      Chunk Size : 64K
>>>>>
>>>>>            UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>>>>          Events : 0.611
>>>>>
>>>>>     Number   Major   Minor   RaidDevice State
>>>>>        0       8        3        0      active sync   /dev/sda3
>>>>>        1       8       18        1      active sync   /dev/sdb2
>>>>>        2       8       34        2      active sync   /dev/sdc2
>>>>> [root@lamachine ~]#
>>>>

^ permalink raw reply

* Question about commit f9a67b1182e5 ("md/bitmap: clear bitmap if bitmap_create failed").
From: Christophe JAILLET @ 2016-09-12 19:09 UTC (permalink / raw)
  To: shli, linux-raid, linux-kernel

Hi,

I'm puzzled by commit f9a67b1182e5 ("md/bitmap: clear bitmap if 
bitmap_create failed").

Part of the commit is:

@@ -1865,8 +1866,10 @@ int bitmap_copy_from_slot(struct mddev *mddev, int slot,
      struct bitmap_counts *counts;
      struct bitmap *bitmap = bitmap_create(mddev, slot);

-    if (IS_ERR(bitmap))
+    if (IS_ERR(bitmap)) {
+        bitmap_free(bitmap);
          return PTR_ERR(bitmap);
+    }

but if 'bitmap' is an error, I think that bad things will happen in 
'bitmap_free()' when, at the beginning of the function, we will execute:

     if (bitmap->sysfs_can_clear) <-----------------
         sysfs_put(bitmap->sysfs_can_clear);
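
To illustrate the concern, here is a minimal user-space sketch. It assumes simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers; struct demo_bitmap and the demo_* functions are hypothetical and are not the md code. It only shows why an error-encoded pointer has to be filtered out before any field of it is read.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical miniature of struct bitmap: only the field read first. */
struct demo_bitmap {
	void *sysfs_can_clear;
};

/* Hypothetical constructor that can fail by returning an encoded errno. */
static struct demo_bitmap *demo_bitmap_create(int fail)
{
	static struct demo_bitmap ok;

	if (fail)
		return ERR_PTR(-ENOMEM);
	return &ok;
}

/*
 * Guarded cleanup: an ERR_PTR value is not a valid address, so it is
 * rejected before bitmap->sysfs_can_clear is touched.
 * Returns 1 if a real object was cleaned up, 0 otherwise.
 */
static int demo_bitmap_free(struct demo_bitmap *bitmap)
{
	if (!bitmap || IS_ERR(bitmap))
		return 0;
	bitmap->sysfs_can_clear = NULL;
	return 1;
}

/* Caller in the style of the pre-commit code: return the error as-is. */
static long demo_copy_from_slot(int fail)
{
	struct demo_bitmap *bitmap = demo_bitmap_create(fail);

	if (IS_ERR(bitmap))
		return PTR_ERR(bitmap);
	demo_bitmap_free(bitmap);
	return 0;
}
```

In this sketch demo_copy_from_slot(1) simply returns -ENOMEM without touching the pointer. Without the IS_ERR() check in demo_bitmap_free(), passing it the same value would read through an address near the top of the address space.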


However, the commit log message is really explicit and adding this call 
to 'bitmap_free' has really been done on purpose. ("If bitmap_create
returns an error, we need to call either bitmap_destroy or bitmap_free 
to do clean up, ...")


It is also not consistent with the comment before function bitmap_create():

     * if this returns an error, bitmap_destroy must be called to do clean up
     * once mddev->bitmap is set


I may have missed something, but I don't see what.

Is this commit correct?


Best regards,
CJ


^ permalink raw reply

* Re: [PATCH v3] mdadm: fix a buffer overflow
From: Jes Sorensen @ 2016-09-12 16:51 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, shli
In-Reply-To: <1473358867-4114379-1-git-send-email-songliubraving@fb.com>

Song Liu <songliubraving@fb.com> writes:
> struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
> is 33B long. So we need strncpy instead of strcpy to avoid buffer
> overflow.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  super1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied thanks!

Note there is at least one place with a str operation hardcoding the
length of set_name to 32. Would you mind fixing that too?
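
One way to avoid a literal 32 is to derive the bound from the field itself. The following is only a sketch: struct demo_sb and struct demo_info are hypothetical stand-ins for struct mdp_superblock_1 and struct mdinfo, reduced to the two fields discussed here, and demo_copy_name() is not a function in mdadm.

```c
#include <string.h>

/* Hypothetical stand-ins, sized as described in the patch (32B and 33B). */
struct demo_sb {
	char set_name[32];	/* fixed-width, not necessarily NUL-terminated */
};

struct demo_info {
	char name[33];		/* one extra byte for the terminator */
};

/* Fail the build if the destination ever loses room for the terminator. */
_Static_assert(sizeof(((struct demo_info *)0)->name) >
	       sizeof(((struct demo_sb *)0)->set_name),
	       "name must be larger than set_name");

/*
 * Copy the fixed-width set_name into the C string name. The bound comes
 * from sizeof() rather than a hardcoded 32, and the result is always
 * NUL-terminated even when set_name uses all of its bytes.
 */
static void demo_copy_name(struct demo_info *info, const struct demo_sb *sb)
{
	strncpy(info->name, sb->set_name, sizeof(sb->set_name));
	info->name[sizeof(sb->set_name)] = '\0';
}
```

A plain strcpy() would keep reading past set_name when all 32 bytes are in use, because no terminator is stored inside the field in that case.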

Cheers,
Jes

^ permalink raw reply

* Re: dm: Return correct value in retry loop
From: Mike Snitzer @ 2016-09-12 15:01 UTC (permalink / raw)
  To: Minfei Huang; +Cc: agk, shli, linux-raid, dm-devel, linux-kernel
In-Reply-To: <20160912013906.GA24268@MinfeideMacBook-Pro.local>

Thanks for the patch, I've picked it up as a stable@ fix for either
4.8-rc7 or when the 4.9 merge window opens (I'm leaning toward the latter
since this issue has been around since 3.19 was released and there
aren't any known problems/reports related to this oversight).

Please see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=7735f936a13d79bbaead1723e4532e65d4c4cf01
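
For readers following along, the oversight can be modelled in user space roughly as follows. This is a simplified sketch, not the dm code: struct demo_dev, demo_wait() and demo_resume() are hypothetical, and the scenario (the device is no longer suspended by the time the wait finishes) is an assumption chosen to show the stale return value.

```c
#include <errno.h>

/* Hypothetical model of the device state, not struct mapped_device. */
struct demo_dev {
	int suspended;			/* models dm_suspended_md() */
	int internally_suspended;	/* models dm_suspended_internally_md() */
};

/*
 * Hypothetical wait: the internal suspend ends and, in this scenario,
 * the device has also been resumed in the meantime. Returns 0 the way
 * a successful wait would.
 */
static int demo_wait(struct demo_dev *d)
{
	d->internally_suspended = 0;
	d->suspended = 0;
	return 0;
}

/*
 * Resume with a retry loop. When reset_on_retry is 0, r keeps the 0
 * left behind by demo_wait(), so the "not suspended" exit reports
 * success. When it is 1, r is reset on every pass, as in the patch.
 */
static int demo_resume(struct demo_dev *d, int reset_on_retry)
{
	int r = -EINVAL;

retry:
	if (reset_on_retry)
		r = -EINVAL;

	if (!d->suspended)
		goto out;

	if (d->internally_suspended) {
		r = demo_wait(d);
		if (r)
			return r;
		goto retry;
	}

	d->suspended = 0;
	r = 0;
out:
	return r;
}
```

Starting from a device that is both suspended and internally suspended, demo_resume(&dev, 0) returns 0 although nothing was resumed by the call, while demo_resume(&dev, 1) returns -EINVAL. Unlike the patch, the sketch keeps an explicit r = 0 on the success path because it has no counterpart to __dm_resume() setting r.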

On Sun, Sep 11 2016 at  9:39pm -0400,
Minfei Huang <mnghuan@gmail.com> wrote:

> Ping.
> 
> Any comments are appreciated.
> 
> Thanks
> Minfei
> 
> On 09/06/16 at 04:00P, Minfei Huang wrote:
> > dm_resume will return silently in retry loop's failure. Assign a correct
> > return value in the failed loop.
> > 
> > Remove a useless assignment as well.
> > 
> > Signed-off-by: Minfei Huang <mnghuan@gmail.com>
> > ---
> >  drivers/md/dm.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index fa9b1cb..c935cc8 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -2249,10 +2249,11 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
> >  
> >  int dm_resume(struct mapped_device *md)
> >  {
> > -	int r = -EINVAL;
> > +	int r;
> >  	struct dm_table *map = NULL;
> >  
> >  retry:
> > +	r = -EINVAL;
> >  	mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
> >  
> >  	if (!dm_suspended_md(md))
> > @@ -2277,10 +2278,8 @@ retry:
> >  
> >  	clear_bit(DMF_SUSPENDED, &md->flags);
> >  
> > -	r = 0;
> >  out:
> >  	mutex_unlock(&md->suspend_lock);
> > -
> >  	return r;
> >  }
> >  
> > -- 
> > 2.7.4 (Apple Git-66)
> > 
> 

^ permalink raw reply

* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Artur Paszkiewicz @ 2016-09-12 10:58 UTC (permalink / raw)
  To: Yi Zhang, Shaohua Li; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <a543dd7a-b582-98fa-3ba8-67500396c766@redhat.com>

On 09/12/2016 10:03 AM, Yi Zhang wrote:
> Hello Artur
> With your patch, no "md: export_rdev(sde)" messages are printed after creating the raid10.
> 
> I found another problem and am not sure whether it is expected; could you help confirm it? Thanks.
> When I create a container with 4 disks [1] and then create a raid10 with 3 disks (sd[b-d]) + 1 missing [2], it ends up binding the fourth disk, sde [3].
> 
> [1] mdadm -CR /dev/md0 /dev/sd[b-e] -n4 -e imsm
> [2] mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing --size=500M
> [3] # cat /proc/mdstat
> Personalities : [raid10]
> md127 : active raid10 sde[4] sdd[2] sdc[1] sdb[0]
>       1024000 blocks super external:/md0/0 128K chunks 2 near-copies [4/4] [UUUU]
> 
> md0 : inactive sde[3](S) sdd[2](S) sdc[1](S) sdb[0](S)
>       4420 blocks super external:imsm
> 
> unused devices: <none>

I think that this is correct behavior. Because there is a spare disk
available in the container, it is used for rebuilding the volume. This
is equivalent to:

mdadm -CR /dev/md0 /dev/sd[b-d] -n3 -e imsm
mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing --size=500M
mdadm -a /dev/md0 /dev/sde


^ permalink raw reply

* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Yi Zhang @ 2016-09-12  8:03 UTC (permalink / raw)
  To: Artur Paszkiewicz, Shaohua Li; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <7910bc85-f9c4-1ea3-76a6-40b819738537@intel.com>



On 09/09/2016 08:56 PM, Artur Paszkiewicz wrote:
> On 09/09/2016 12:56 AM, Shaohua Li wrote:
>> On Wed, Sep 07, 2016 at 02:43:41AM -0400, Yi Zhang wrote:
>>> Hello
>>>
>>> I tried to create an IMSM RAID10 with a missing device and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?
>>>
>>> Steps I used:
>>> mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
>>> mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing
>>>
>>> Version:
>>> 4.8.0-rc5
>>> mdadm - v3.4-84-gbd1fd72 - 25th August 2016
>> can't reproduce with old mdadm but can with upstream mdadm. Looks like mdadm
>> keeps writing the new_dev sysfs entry.
>>
>> Jes, any idea?
>>
>> Thanks,
>> Shaohua
>>> Log:
>>> http://pastebin.com/FJJwvgg6
>>>
>>> <6>[  301.102007] md: bind<sdb>
>>> <6>[  301.102095] md: bind<sdc>
>>> <6>[  301.102159] md: bind<sdd>
>>> <6>[  301.102215] md: bind<sde>
>>> <6>[  301.102291] md: bind<sdf>
>>> <6>[  301.103010] ata3.00: Enabling discard_zeroes_data
>>> <6>[  311.714344] ata3.00: Enabling discard_zeroes_data
>>> <6>[  311.721866] md: bind<sdb>
>>> <6>[  311.721965] md: bind<sdc>
>>> <6>[  311.722029] md: bind<sdd>
>>> <5>[  311.733165] md/raid10:md127: not clean -- starting background reconstruction
>>> <6>[  311.733167] md/raid10:md127: active with 3 out of 4 devices
>>> <6>[  311.733186] md127: detected capacity change from 0 to 240060989440
>>> <6>[  311.774027] md: bind<sde>
>>> <6>[  311.810664] md: md127 switched to read-write mode.
>>> <6>[  311.819885] md: resync of RAID array md127
>>> <6>[  311.819886] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>>> <6>[  311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
>>> <6>[  311.819891] md: using 128k window, over a total of 234435328k.
>>> <6>[  316.606073] ata3.00: Enabling discard_zeroes_data
>>> <6>[  343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
>>> <6>[ 1482.314944] md: md127: resync done.
>>> <7>[ 1482.315086] RAID10 conf printout:
>>> <7>[ 1482.315087]  --- wd:3 rd:4
>>> <7>[ 1482.315089]  disk 0, wo:0, o:1, dev:sdb
>>> <7>[ 1482.315089]  disk 1, wo:0, o:1, dev:sdc
>>> <7>[ 1482.315090]  disk 2, wo:0, o:1, dev:sdd
>>> <7>[ 1482.315099] RAID10 conf printout:
>>> <7>[ 1482.315099]  --- wd:3 rd:4
>>> <7>[ 1482.315100]  disk 0, wo:0, o:1, dev:sdb
>>> <7>[ 1482.315100]  disk 1, wo:0, o:1, dev:sdc
>>> <7>[ 1482.315101]  disk 2, wo:0, o:1, dev:sdd
>>> <7>[ 1482.315101]  disk 3, wo:1, o:1, dev:sde
>>> <6>[ 1482.315220] md: recovery of RAID array md127
>>> <6>[ 1482.315221] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>>> <6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
>>> <6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
>>> <6>[ 2697.184217] md: md127: recovery done.
>>> <7>[ 2697.524143] RAID10 conf printout:
>>> <7>[ 2697.524144]  --- wd:4 rd:4
>>> <7>[ 2697.524146]  disk 0, wo:0, o:1, dev:sdb
>>> <7>[ 2697.524146]  disk 1, wo:0, o:1, dev:sdc
>>> <7>[ 2697.524147]  disk 2, wo:0, o:1, dev:sdd
>>> <7>[ 2697.524148]  disk 3, wo:0, o:1, dev:sde
>>> <6>[ 2697.524632] md: export_rdev(sde)
>>> <6>[ 2697.549452] md: export_rdev(sde)
>>> <6>[ 2697.568763] md: export_rdev(sde)
>>> <6>[ 2697.587938] md: export_rdev(sde)
>>> <6>[ 2697.607271] md: export_rdev(sde)
>>> <6>[ 2697.626321] md: export_rdev(sde)
>>> <6>[ 2697.645676] md: export_rdev(sde)
>>> <6>[ 2697.663211] md: export_rdev(sde)
>>> <6>[ 2697.681603] md: export_rdev(sde)
>>> <6>[ 2697.699117] md: export_rdev(sde)
>>> <6>[ 2697.716510] md: export_rdev(sde)
>>>
>>> Best Regards,
>>>    Yi Zhang
> Can you check if this fix works for you? If it does I'll send a proper
> patch for this.
Hello Artur
With your patch, no "md: export_rdev(sde)" messages are printed after creating the raid10.

I found another problem and am not sure whether it is expected; could you
help confirm it? Thanks.
When I create a container with 4 disks [1] and then create a raid10 with
3 disks (sd[b-d]) + 1 missing [2], it ends up binding the fourth disk,
sde [3].

[1] mdadm -CR /dev/md0 /dev/sd[b-e] -n4 -e imsm
[2] mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing --size=500M
[3] # cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 sde[4] sdd[2] sdc[1] sdb[0]
       1024000 blocks super external:/md0/0 128K chunks 2 near-copies [4/4] [UUUU]

md0 : inactive sde[3](S) sdd[2](S) sdc[1](S) sdb[0](S)
       4420 blocks super external:imsm

unused devices: <none>
> Thanks,
> Artur
>
> diff --git a/super-intel.c b/super-intel.c
> index 92817e9..ffa71f6 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -7789,6 +7789,9 @@ static struct mdinfo *imsm_activate_spare(struct active_array *a,
>   			IMSM_T_STATE_DEGRADED)
>   		return NULL;
>   
> +	if (get_imsm_map(dev, MAP_0)->map_state == IMSM_T_STATE_UNINITIALIZED)
> +		return NULL;
> +
>   	/*
>   	 * If there are any failed disks check state of the other volume.
>   	 * Block rebuild if the another one is failed until failed disks


^ permalink raw reply

* Re: [PATCH] dm: Return correct value in retry loop
From: Minfei Huang @ 2016-09-12  1:39 UTC (permalink / raw)
  To: agk, snitzer, shli; +Cc: dm-devel, linux-raid, linux-kernel
In-Reply-To: <1473148829-3317-1-git-send-email-mnghuan@gmail.com>

Ping.

Any comments are appreciated.

Thanks
Minfei

On 09/06/16 at 04:00P, Minfei Huang wrote:
> dm_resume will return silently in retry loop's failure. Assign a correct
> return value in the failed loop.
> 
> Remove a useless assignment as well.
> 
> Signed-off-by: Minfei Huang <mnghuan@gmail.com>
> ---
>  drivers/md/dm.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index fa9b1cb..c935cc8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2249,10 +2249,11 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
>  
>  int dm_resume(struct mapped_device *md)
>  {
> -	int r = -EINVAL;
> +	int r;
>  	struct dm_table *map = NULL;
>  
>  retry:
> +	r = -EINVAL;
>  	mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
>  
>  	if (!dm_suspended_md(md))
> @@ -2277,10 +2278,8 @@ retry:
>  
>  	clear_bit(DMF_SUSPENDED, &md->flags);
>  
> -	r = 0;
>  out:
>  	mutex_unlock(&md->suspend_lock);
> -
>  	return r;
>  }
>  
> -- 
> 2.7.4 (Apple Git-66)
> 

^ permalink raw reply

* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-11 20:06 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji3C4Cygs=xh4d4PtREp9mGSBjNS0o7SatW_QotzYShA_Q@mail.gmail.com>

However, I notice that with the new MB the details are somewhat different:

[root@lamachine ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022
ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04
ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5
ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
[root@lamachine ~]# mdadm --detail /dev/md1*
/dev/md126:
        Version : 0.90
  Creation Time : Thu Dec  3 22:12:12 2009
     Raid Level : raid10
     Array Size : 30719936 (29.30 GiB 31.46 GB)
  Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126
    Persistence : Superblock is persistent

    Update Time : Tue Jan 12 04:03:41 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 64K

           UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
         Events : 0.264152

    Number   Major   Minor   RaidDevice State
       0       8       82        0      active sync set-A   /dev/sdf2
       1       8        1        1      active sync set-B   /dev/sda1
/dev/md127:
        Version : 1.2
  Creation Time : Tue Jul 26 19:00:28 2011
     Raid Level : raid0
     Array Size : 94367232 (90.00 GiB 96.63 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Jul 26 19:00:28 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : reading.homeunix.com:3
           UUID : acd5374f:72628c93:6a906c4b:5f675ce5
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       85        0      active sync   /dev/sdf5
       1       8       21        1      active sync   /dev/sdb5
       2       8        5        2      active sync   /dev/sda5
/dev/md128:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:128  (local to host lamachine)
           UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
         Events : 4154

    Number   Major   Minor   RaidDevice

       -       8       49        -        /dev/sdd1
/dev/md129:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:129  (local to host lamachine)
           UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
         Events : 0

    Number   Major   Minor   RaidDevice

       -       8       50        -        /dev/sdd2
[root@lamachine ~]# mdadm --detail /dev/md2*
/dev/md2:
        Version : 0.90
  Creation Time : Mon Feb 11 07:54:36 2013
     Raid Level : raid5
     Array Size : 511999872 (488.28 GiB 524.29 GB)
  Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Tue Jan 12 02:31:50 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
         Events : 0.611

    Number   Major   Minor   RaidDevice State
       0       8       83        0      active sync   /dev/sdf3
       1       8       18        1      active sync   /dev/sdb2
       2       8        2        2      active sync   /dev/sda2
[root@lamachine ~]# cat /proc/mdstat
Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
      511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
      94367232 blocks super 1.2 512k chunks

md129 : inactive sdd2[2](S)
      524156928 blocks super 1.2

md128 : inactive sdd1[3](S)
      2147352576 blocks super 1.2

md126 : active raid10 sdf2[0] sda1[1]
      30719936 blocks 2 near-copies [2/2] [UU]

unused devices: <none>
[root@lamachine ~]#


^ permalink raw reply

* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-11 18:48 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji0BQuyKQbLYZ9Ah16hHday_seTaajm8LOn6+HRFqinTyQ@mail.gmail.com>

OK, the system is up and running after the MB was replaced; however, the
arrays remain inactive.

mdadm version is:
mdadm - v3.3.4 - 3rd August 2015

Here's the output from Phil's lsdrv:

[root@lamachine ~]# ./lsdrv
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 06)
├scsi 0:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASZ0505379}
│└sda 465.76g [8:0] Partitioned (dos)
│ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync {9af006ca-8845-bbd3-bfe7-8010bc810f04}
│ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk {9af006ca:8845bbd3:bfe78010:bc810f04}
│ │ │                    PV LVM2_member 28.03g used, 1.26g free {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
│ │ └VG vg_bigblackbox 29.29g 1.26g free {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
│ │  ├dm-2 7.81g [253:2] LV LogVol_opt ext4 {b08d7f5e-f15f-4241-804e-edccecab6003}
│ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
│ │  ├dm-0 9.77g [253:0] LV LogVol_root ext4 {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
│ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
│ │  ├dm-3 1.95g [253:3] LV LogVol_tmp ext4 {f6b46363-170b-4038-83bd-2c5f9f6a1973}
│ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
│ │  └dm-1 8.50g [253:1] LV LogVol_var ext4 {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
│ │   └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
│ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync {2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk {2cff15d1:e411447b:fd5d4721:03e44022}
│ │ │                 ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ │ └Mounted as /dev/md2 @ /home
│ ├sda3 1.00k [8:3] Partitioned (dos)
│ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │ │                    PV LVM2_member 86.00g used, 3.99g free {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
│ │  ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
│ │  ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
│ │  ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
│ │  ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
│ │  ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
│ │  ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
│ │  └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
│ └sda6 3.39g [8:6] Empty/Unknown
├scsi 1:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASY7694185}
│└sdb 465.76g [8:16] Partitioned (dos)
│ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync {2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk {2cff15d1:e411447b:fd5d4721:03e44022}
│ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
│ ├sdb4 1.00k [8:20] Partitioned (dos)
│ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │                      PV LVM2_member 86.00g used, 3.99g free {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ └sdb6 3.39g [8:22] Empty/Unknown
├scsi 2:x:x:x [Empty]
├scsi 3:x:x:x [Empty]
├scsi 4:x:x:x [Empty]
└scsi 5:x:x:x [Empty]
PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
├scsi 6:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
│└sdc 2.73t [8:32] Partitioned (PMBR)
├scsi 7:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
│└sdd 2.73t [8:48] Partitioned (gpt)
│ ├sdd1 2.00t [8:49] MD  (none/) spare 'lamachine:128' {f2372cb9-d381-6fd6-ce86-d826882ec82e}
│ │└md128 0.00k [9:128] MD v1.2  () inactive, None (None) None {f2372cb9:d3816fd6:ce86d826:882ec82e}
│ │                     Empty/Unknown
│ └sdd2 500.00g [8:50] MD  (none/) spare 'lamachine:129' {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
│  └md129 0.00k [9:129] MD v1.2  () inactive, None (None) None {895dae98:d1a496de:4f590b8b:cb8ac12a}
│                       Empty/Unknown
├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N1294906}
│└sde 2.73t [8:64] Partitioned (PMBR)
├scsi 9:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WMAWF0085724}
│└sdf 465.76g [8:80] Partitioned (dos)
│ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
│ │└Mounted as /dev/sdf1 @ /boot
│ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync {9af006ca-8845-bbd3-bfe7-8010bc810f04}
│ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk {9af006ca:8845bbd3:bfe78010:bc810f04}
│ │                      PV LVM2_member 28.03g used, 1.26g free {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
│ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync {2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk {2cff15d1:e411447b:fd5d4721:03e44022}
│ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ ├sdf4 1.00k [8:84] Partitioned (dos)
│ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │                      PV LVM2_member 86.00g used, 3.99g free {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ └sdf6 3.39g [8:86] Empty/Unknown
├scsi 10:x:x:x [Empty]
├scsi 11:x:x:x [Empty]
└scsi 12:x:x:x [Empty]
PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 06)
└scsi 14:x:x:x [Empty]
[root@lamachine ~]#

Thanks in advance for any recommendations on what steps to take in
order to bring these arrays back online.

Regards,

Daniel


On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> Thanks very much for the response Wol.
>
> It looks like the PSU is dead (server automatically powers off a few
> seconds after power on).
>
> I'm planning to order a replacement PSU to resume troubleshooting, so
> please bear with me. Maybe the PSU was degraded and couldn't power
> some of the drives?
>
> Cheers,
>
> Daniel
>
> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>> Just a quick first response. I see md128 and md129 are both down, and
>> are both listed as one drive, raid0. Bit odd, that ...
>>
>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>> that would split an array in two. Is it possible that you should have
>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>
>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>
>> How much do you know about what the setup should be, and why it was set
>> up that way?
>>
>> Download lsdrv by Phil Turmel (it requires python2.7; if your machine is
>> python3, a quick fix to the shebang at the start should get it to work).
>> Post the output from that here.
>>
>> Cheers,
>> Wol
>>
>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>> Hi All,
>>>
>>> I have a box that I believe was not powered down correctly, and after
>>> transporting it to a different location it no longer boots, stopping
>>> at the BIOS check "Verifying DMI Pool Data".
>>>
>>> The box has 6 drives. After instructing the BIOS to boot from the
>>> first drive, I managed to boot the OS (Fedora 23) after commenting out
>>> 2 /etc/fstab entries. Output for "uname -a; cat /etc/fstab" follows:
>>>
>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> #
>>> # /etc/fstab
>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>> #
>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>> #
>>> /dev/mapper/vg_bigblackbox-LogVol_root /                       ext4    defaults        1 1
>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot                   ext4    defaults        1 2
>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt                    ext4    defaults        1 2
>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp                    ext4    defaults        1 2
>>> /dev/mapper/vg_bigblackbox-LogVol_var /var                    ext4    defaults        1 2
>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap                    swap    defaults        0 0
>>> /dev/md2 /home          ext4    defaults        1 2
>>> #/dev/vg_media/lv_media  /mnt/media      ext4    defaults        1 2
>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>> [root@lamachine ~]#
>>>
>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>> inactive, but I'm not sure how to safely activate them, so I'm looking
>>> for some knowledgeable advice on how to proceed here.
>>>
>>> Thanks in advance,
>>>
>>> Daniel
>>>
>>> Below are some more relevant outputs:
>>>
>>> [root@lamachine ~]# cat /proc/mdstat
>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>>       94367232 blocks super 1.2 512k chunks
>>>
>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>>       511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>
>>> md128 : inactive sdf1[3](S)
>>>       2147352576 blocks super 1.2
>>>
>>> md129 : inactive sdf2[2](S)
>>>       524156928 blocks super 1.2
>>>
>>> md126 : active raid10 sda2[0] sdc1[1]
>>>       30719936 blocks 2 near-copies [2/2] [UU]
>>>
>>> unused devices: <none>
>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>> # mdadm.conf written out by anaconda
>>> MAILADDR root
>>> AUTO +imsm +1.x -all
>>> ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>> ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>> ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>> /dev/md126:
>>>         Version : 0.90
>>>   Creation Time : Thu Dec  3 22:12:12 2009
>>>      Raid Level : raid10
>>>      Array Size : 30719936 (29.30 GiB 31.46 GB)
>>>   Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>>    Raid Devices : 2
>>>   Total Devices : 2
>>> Preferred Minor : 126
>>>     Persistence : Superblock is persistent
>>>
>>>     Update Time : Tue Aug  2 07:46:39 2016
>>>           State : clean
>>>  Active Devices : 2
>>> Working Devices : 2
>>>  Failed Devices : 0
>>>   Spare Devices : 0
>>>
>>>          Layout : near=2
>>>      Chunk Size : 64K
>>>
>>>            UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>>          Events : 0.264152
>>>
>>>     Number   Major   Minor   RaidDevice State
>>>        0       8        2        0      active sync set-A   /dev/sda2
>>>        1       8       33        1      active sync set-B   /dev/sdc1
>>> /dev/md127:
>>>         Version : 1.2
>>>   Creation Time : Tue Jul 26 19:00:28 2011
>>>      Raid Level : raid0
>>>      Array Size : 94367232 (90.00 GiB 96.63 GB)
>>>    Raid Devices : 3
>>>   Total Devices : 3
>>>     Persistence : Superblock is persistent
>>>
>>>     Update Time : Tue Jul 26 19:00:28 2011
>>>           State : clean
>>>  Active Devices : 3
>>> Working Devices : 3
>>>  Failed Devices : 0
>>>   Spare Devices : 0
>>>
>>>      Chunk Size : 512K
>>>
>>>            Name : reading.homeunix.com:3
>>>            UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>>          Events : 0
>>>
>>>     Number   Major   Minor   RaidDevice State
>>>        0       8        5        0      active sync   /dev/sda5
>>>        1       8       21        1      active sync   /dev/sdb5
>>>        2       8       37        2      active sync   /dev/sdc5
>>> /dev/md128:
>>>         Version : 1.2
>>>      Raid Level : raid0
>>>   Total Devices : 1
>>>     Persistence : Superblock is persistent
>>>
>>>           State : inactive
>>>
>>>            Name : lamachine:128  (local to host lamachine)
>>>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>>          Events : 4154
>>>
>>>     Number   Major   Minor   RaidDevice
>>>
>>>        -       8       81        -        /dev/sdf1
>>> /dev/md129:
>>>         Version : 1.2
>>>      Raid Level : raid0
>>>   Total Devices : 1
>>>     Persistence : Superblock is persistent
>>>
>>>           State : inactive
>>>
>>>            Name : lamachine:129  (local to host lamachine)
>>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>          Events : 0
>>>
>>>     Number   Major   Minor   RaidDevice
>>>
>>>        -       8       82        -        /dev/sdf2
>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>> /dev/md2:
>>>         Version : 0.90
>>>   Creation Time : Mon Feb 11 07:54:36 2013
>>>      Raid Level : raid5
>>>      Array Size : 511999872 (488.28 GiB 524.29 GB)
>>>   Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>>    Raid Devices : 3
>>>   Total Devices : 3
>>> Preferred Minor : 2
>>>     Persistence : Superblock is persistent
>>>
>>>     Update Time : Mon Aug  1 20:24:23 2016
>>>           State : clean
>>>  Active Devices : 3
>>> Working Devices : 3
>>>  Failed Devices : 0
>>>   Spare Devices : 0
>>>
>>>          Layout : left-symmetric
>>>      Chunk Size : 64K
>>>
>>>            UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>>          Events : 0.611
>>>
>>>     Number   Major   Minor   RaidDevice State
>>>        0       8        3        0      active sync   /dev/sda3
>>>        1       8       18        1      active sync   /dev/sdb2
>>>        2       8       34        2      active sync   /dev/sdc2
>>> [root@lamachine ~]#
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>

^ permalink raw reply

* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Artur Paszkiewicz @ 2016-09-09 12:56 UTC (permalink / raw)
  To: Shaohua Li, Yi Zhang; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <20160908225607.GA66921@kernel.org>

On 09/09/2016 12:56 AM, Shaohua Li wrote:
> On Wed, Sep 07, 2016 at 02:43:41AM -0400, Yi Zhang wrote:
>> Hello
>>
>> I tried to create an IMSM RAID10 with a missing device and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?
>>
>> Steps I used:
>> mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
>> mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing
>>
>> Version:
>> 4.8.0-rc5
>> mdadm - v3.4-84-gbd1fd72 - 25th August 2016
> 
> I can't reproduce this with old mdadm, but I can with upstream mdadm. It looks
> like mdadm keeps writing to the new_dev sysfs entry.
> 
> Jes, any idea?
> 
> Thanks,
> Shaohua 
>> Log: 
>> http://pastebin.com/FJJwvgg6
>>
>> <6>[  301.102007] md: bind<sdb>
>> <6>[  301.102095] md: bind<sdc>
>> <6>[  301.102159] md: bind<sdd>
>> <6>[  301.102215] md: bind<sde>
>> <6>[  301.102291] md: bind<sdf>
>> <6>[  301.103010] ata3.00: Enabling discard_zeroes_data
>> <6>[  311.714344] ata3.00: Enabling discard_zeroes_data
>> <6>[  311.721866] md: bind<sdb>
>> <6>[  311.721965] md: bind<sdc>
>> <6>[  311.722029] md: bind<sdd>
>> <5>[  311.733165] md/raid10:md127: not clean -- starting background reconstruction
>> <6>[  311.733167] md/raid10:md127: active with 3 out of 4 devices
>> <6>[  311.733186] md127: detected capacity change from 0 to 240060989440
>> <6>[  311.774027] md: bind<sde>
>> <6>[  311.810664] md: md127 switched to read-write mode.
>> <6>[  311.819885] md: resync of RAID array md127
>> <6>[  311.819886] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> <6>[  311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
>> <6>[  311.819891] md: using 128k window, over a total of 234435328k.
>> <6>[  316.606073] ata3.00: Enabling discard_zeroes_data
>> <6>[  343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
>> <6>[ 1482.314944] md: md127: resync done.
>> <7>[ 1482.315086] RAID10 conf printout:
>> <7>[ 1482.315087]  --- wd:3 rd:4
>> <7>[ 1482.315089]  disk 0, wo:0, o:1, dev:sdb
>> <7>[ 1482.315089]  disk 1, wo:0, o:1, dev:sdc
>> <7>[ 1482.315090]  disk 2, wo:0, o:1, dev:sdd
>> <7>[ 1482.315099] RAID10 conf printout:
>> <7>[ 1482.315099]  --- wd:3 rd:4
>> <7>[ 1482.315100]  disk 0, wo:0, o:1, dev:sdb
>> <7>[ 1482.315100]  disk 1, wo:0, o:1, dev:sdc
>> <7>[ 1482.315101]  disk 2, wo:0, o:1, dev:sdd
>> <7>[ 1482.315101]  disk 3, wo:1, o:1, dev:sde
>> <6>[ 1482.315220] md: recovery of RAID array md127
>> <6>[ 1482.315221] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> <6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
>> <6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
>> <6>[ 2697.184217] md: md127: recovery done.
>> <7>[ 2697.524143] RAID10 conf printout:
>> <7>[ 2697.524144]  --- wd:4 rd:4
>> <7>[ 2697.524146]  disk 0, wo:0, o:1, dev:sdb
>> <7>[ 2697.524146]  disk 1, wo:0, o:1, dev:sdc
>> <7>[ 2697.524147]  disk 2, wo:0, o:1, dev:sdd
>> <7>[ 2697.524148]  disk 3, wo:0, o:1, dev:sde
>> <6>[ 2697.524632] md: export_rdev(sde)
>> <6>[ 2697.549452] md: export_rdev(sde)
>> <6>[ 2697.568763] md: export_rdev(sde)
>> <6>[ 2697.587938] md: export_rdev(sde)
>> <6>[ 2697.607271] md: export_rdev(sde)
>> <6>[ 2697.626321] md: export_rdev(sde)
>> <6>[ 2697.645676] md: export_rdev(sde)
>> <6>[ 2697.663211] md: export_rdev(sde)
>> <6>[ 2697.681603] md: export_rdev(sde)
>> <6>[ 2697.699117] md: export_rdev(sde)
>> <6>[ 2697.716510] md: export_rdev(sde)
>>
>> Best Regards,
>>   Yi Zhang

Can you check if this fix works for you? If it does I'll send a proper
patch for this.

Thanks,
Artur

diff --git a/super-intel.c b/super-intel.c
index 92817e9..ffa71f6 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7789,6 +7789,9 @@ static struct mdinfo *imsm_activate_spare(struct active_array *a,
 			IMSM_T_STATE_DEGRADED)
 		return NULL;
 
+	if (get_imsm_map(dev, MAP_0)->map_state == IMSM_T_STATE_UNINITIALIZED)
+		return NULL;
+
 	/*
 	 * If there are any failed disks check state of the other volume.
 	 * Block rebuild if the another one is failed until failed disks
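
The intent of the added check can be modelled in isolation. This is only a sketch: the enum and may_activate_spare() are hypothetical simplifications rather than the super-intel.c definitions, the "!= degraded" condition is an assumption based on the diff context, and the real function examines far more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplification of the IMSM map states. */
enum demo_state { DEMO_NORMAL, DEMO_UNINITIALIZED, DEMO_DEGRADED, DEMO_FAILED };

/*
 * A spare is considered only when the volume is degraded (assumed from the
 * diff context) and, with the proposed check, only when the first map is
 * not still uninitialized.
 */
static bool may_activate_spare(enum demo_state volume_state,
			       enum demo_state map0_state)
{
	if (volume_state != DEMO_DEGRADED)
		return false;
	if (map0_state == DEMO_UNINITIALIZED)
		return false;
	return true;
}
```

In this model, a degraded volume whose first map is still uninitialized is never offered a spare, which is the case the proposed hunk appears to target.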

^ permalink raw reply related

* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Shaohua Li @ 2016-09-08 22:56 UTC (permalink / raw)
  To: Yi Zhang; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <1648084319.7702644.1473230621059.JavaMail.zimbra@redhat.com>

On Wed, Sep 07, 2016 at 02:43:41AM -0400, Yi Zhang wrote:
> Hello
> 
> I tried to create an IMSM RAID10 with a missing device and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?
> 
> Steps I used:
> mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
> mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing
> 
> Version:
> 4.8.0-rc5
> mdadm - v3.4-84-gbd1fd72 - 25th August 2016

I can't reproduce this with old mdadm, but I can with upstream mdadm. It looks
like mdadm keeps writing to the new_dev sysfs entry.

Jes, any idea?

Thanks,
Shaohua 
> Log: 
> http://pastebin.com/FJJwvgg6
> 
> <6>[  301.102007] md: bind<sdb>
> <6>[  301.102095] md: bind<sdc>
> <6>[  301.102159] md: bind<sdd>
> <6>[  301.102215] md: bind<sde>
> <6>[  301.102291] md: bind<sdf>
> <6>[  301.103010] ata3.00: Enabling discard_zeroes_data
> <6>[  311.714344] ata3.00: Enabling discard_zeroes_data
> <6>[  311.721866] md: bind<sdb>
> <6>[  311.721965] md: bind<sdc>
> <6>[  311.722029] md: bind<sdd>
> <5>[  311.733165] md/raid10:md127: not clean -- starting background reconstruction
> <6>[  311.733167] md/raid10:md127: active with 3 out of 4 devices
> <6>[  311.733186] md127: detected capacity change from 0 to 240060989440
> <6>[  311.774027] md: bind<sde>
> <6>[  311.810664] md: md127 switched to read-write mode.
> <6>[  311.819885] md: resync of RAID array md127
> <6>[  311.819886] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> <6>[  311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> <6>[  311.819891] md: using 128k window, over a total of 234435328k.
> <6>[  316.606073] ata3.00: Enabling discard_zeroes_data
> <6>[  343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
> <6>[ 1482.314944] md: md127: resync done.
> <7>[ 1482.315086] RAID10 conf printout:
> <7>[ 1482.315087]  --- wd:3 rd:4
> <7>[ 1482.315089]  disk 0, wo:0, o:1, dev:sdb
> <7>[ 1482.315089]  disk 1, wo:0, o:1, dev:sdc
> <7>[ 1482.315090]  disk 2, wo:0, o:1, dev:sdd
> <7>[ 1482.315099] RAID10 conf printout:
> <7>[ 1482.315099]  --- wd:3 rd:4
> <7>[ 1482.315100]  disk 0, wo:0, o:1, dev:sdb
> <7>[ 1482.315100]  disk 1, wo:0, o:1, dev:sdc
> <7>[ 1482.315101]  disk 2, wo:0, o:1, dev:sdd
> <7>[ 1482.315101]  disk 3, wo:1, o:1, dev:sde
> <6>[ 1482.315220] md: recovery of RAID array md127
> <6>[ 1482.315221] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> <6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> <6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
> <6>[ 2697.184217] md: md127: recovery done.
> <7>[ 2697.524143] RAID10 conf printout:
> <7>[ 2697.524144]  --- wd:4 rd:4
> <7>[ 2697.524146]  disk 0, wo:0, o:1, dev:sdb
> <7>[ 2697.524146]  disk 1, wo:0, o:1, dev:sdc
> <7>[ 2697.524147]  disk 2, wo:0, o:1, dev:sdd
> <7>[ 2697.524148]  disk 3, wo:0, o:1, dev:sde
> <6>[ 2697.524632] md: export_rdev(sde)
> <6>[ 2697.549452] md: export_rdev(sde)
> <6>[ 2697.568763] md: export_rdev(sde)
> <6>[ 2697.587938] md: export_rdev(sde)
> <6>[ 2697.607271] md: export_rdev(sde)
> <6>[ 2697.626321] md: export_rdev(sde)
> <6>[ 2697.645676] md: export_rdev(sde)
> <6>[ 2697.663211] md: export_rdev(sde)
> <6>[ 2697.681603] md: export_rdev(sde)
> <6>[ 2697.699117] md: export_rdev(sde)
> <6>[ 2697.716510] md: export_rdev(sde)
> 
> Best Regards,
>   Yi Zhang

^ permalink raw reply

* [PATCH v3] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08 18:21 UTC (permalink / raw)
  To: linux-raid; +Cc: Jes.Sorensen, shli, Song Liu

struct mdp_superblock_1.set_name is 32 bytes long, but struct mdinfo.name
is 33 bytes long, so we need strncpy instead of strcpy to avoid a buffer
overflow.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 super1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/super1.c b/super1.c
index f3e4023..9f62d23 100644
--- a/super1.c
+++ b/super1.c
@@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 			strcat(sb->set_name, ":");
 			strcat(sb->set_name, info->name);
 		} else
-			strcpy(sb->set_name, info->name);
+			strncpy(sb->set_name, info->name, sizeof(sb->set_name));
 	} else if (strcmp(update, "devicesize") == 0 &&
 	    __le64_to_cpu(sb->super_offset) <
 	    __le64_to_cpu(sb->data_offset)) {
@@ -1444,7 +1444,7 @@ static int init_super1(struct supertype *st, mdu_array_info_t *info,
 		strcat(sb->set_name, ":");
 		strcat(sb->set_name, name);
 	} else
-		strcpy(sb->set_name, name);
+		strncpy(sb->set_name, name, sizeof(sb->set_name));
 
 	sb->ctime = __cpu_to_le64((unsigned long long)time(0));
 	sb->level = __cpu_to_le32(info->level);
-- 
2.8.0.rc2
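
For readers following the thread, a minimal userspace sketch of the problem may help. It assumes a 32-byte destination and a 33-byte source, as the commit message states; demo_sb, demo_info, copy_set_name() and set_name_len() are hypothetical names, not mdadm's. One caveat worth noting: strncpy() bounded by the destination size does not write a terminating NUL when the source is 32 characters long, so anything reading the field has to bound its read by the field size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two structs named in the commit message. */
struct demo_sb   { char set_name[32]; char next_field[8]; };
struct demo_info { char name[33]; };

/* Bounded copy, as in the patch: never writes past set_name. */
static void copy_set_name(struct demo_sb *sb, const struct demo_info *info)
{
	strncpy(sb->set_name, info->name, sizeof(sb->set_name));
}

/* Length of set_name without assuming a terminating NUL is present. */
static size_t set_name_len(const struct demo_sb *sb)
{
	const char *end = memchr(sb->set_name, '\0', sizeof(sb->set_name));

	return end ? (size_t)(end - sb->set_name) : sizeof(sb->set_name);
}
```

With a 32-character name the neighbouring field stays intact, whereas strcpy() would have written the 33rd byte (the NUL) into it.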


^ permalink raw reply related

* Re: [PATCH v2] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08 18:20 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-raid@vger.kernel.org, Jes.Sorensen@redhat.com, Shaohua Li
In-Reply-To: <20160908175636.GA21973@kernel.org>

Sounds good. Let me resend. 
 
Thanks,
Song


>> On 9/8/16, 10:56 AM, "Shaohua Li" <shli@kernel.org> wrote:

    On Wed, Sep 07, 2016 at 05:43:35PM -0700, Song Liu wrote:
    > struct mdp_superblock_1.set_name is 32 bytes long, but struct mdinfo.name
    > is 33 bytes long, so we need strncpy instead of strcpy to avoid a buffer
    > overflow.
    > 
    > Signed-off-by: Song Liu <songliubraving@fb.com>
    > ---
    >  super1.c | 4 ++--
    >  1 file changed, 2 insertions(+), 2 deletions(-)
    > 
    > diff --git a/super1.c b/super1.c
    > index f3e4023..46fed54 100644
    > --- a/super1.c
    > +++ b/super1.c
    > @@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
    >  			strcat(sb->set_name, ":");
    >  			strcat(sb->set_name, info->name);
    >  		} else
    > -			strcpy(sb->set_name, info->name);
    > +			strncpy(sb->set_name, info->name, 32);
    
    strncpy(sb->set_name, info->name, sizeof(sb->set_name)); ?
    
    



^ permalink raw reply

* Re: [PATCH V3] md-cluster: make md-cluster also can work when compiled into kernel
From: Shaohua Li @ 2016-09-08 18:03 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, v4.1+, NeilBrown
In-Reply-To: <1473041848-28009-1-git-send-email-gqjiang@suse.com>

On Sun, Sep 04, 2016 at 10:17:28PM -0400, Guoqing Jiang wrote:
> md-cluster is compiled as a module by default.
> If it is built into the kernel instead, then
> md-cluster does not work.
> 
> [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors
> [64782.630528] md-cluster module not found.
> [64782.630530] md127: Could not setup cluster service (-2)
> 
> Fixes: edb39c9 ("Introduce md_cluster_operations to handle cluster functions")
> Cc: stable@vger.kernel.org (v4.1+)
> Cc: NeilBrown <neilb@suse.com>
> Reported-by: Marc Smith <marc.smith@mcc.edu>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
> V3 changes:
> 1. add the "!md_cluster_ops" test back
> 2. fix wrong mail info of stable kernel
> 
> V2 changes:
> 1. call try_module_get if md_cluster_ops is already set,
>    otherwise try_module_get/module_put are unbalanced.

applied, thanks!
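
The V2 note about balanced references can be illustrated with a small userspace model. Everything here is hypothetical (demo_module, demo_get(), demo_put(), demo_setup(), demo_teardown()); it only mimics the get/put pairing described in the changelog and is not the kernel's module API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a module with a reference count. */
struct demo_module { int refcount; bool live; };

/* Take a reference; fails if there is no module or it is going away. */
static bool demo_get(struct demo_module *m)
{
	if (!m || !m->live)
		return false;
	m->refcount++;
	return true;
}

static void demo_put(struct demo_module *m)
{
	m->refcount--;
}

/*
 * Setup takes a reference whether or not the ops were registered earlier,
 * so that every later teardown has a matching get.
 */
static int demo_setup(struct demo_module *ops_owner)
{
	if (!demo_get(ops_owner))
		return -1;
	return 0;
}

static void demo_teardown(struct demo_module *ops_owner)
{
	demo_put(ops_owner);
}
```

If setup skipped the get when the ops were already present, the second teardown in a setup/setup/teardown/teardown sequence would drop the count below zero, which is the imbalance the changelog mentions.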

^ permalink raw reply

* Re: [PATCH v2] mdadm: fix a buffer overflow
From: Shaohua Li @ 2016-09-08 17:56 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, Jes.Sorensen, shli
In-Reply-To: <1473295415-1859888-1-git-send-email-songliubraving@fb.com>

On Wed, Sep 07, 2016 at 05:43:35PM -0700, Song Liu wrote:
> struct mdp_superblock_1.set_name is 32 bytes long, but struct mdinfo.name
> is 33 bytes long, so we need strncpy instead of strcpy to avoid a buffer
> overflow.
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  super1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/super1.c b/super1.c
> index f3e4023..46fed54 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
>  			strcat(sb->set_name, ":");
>  			strcat(sb->set_name, info->name);
>  		} else
> -			strcpy(sb->set_name, info->name);
> +			strncpy(sb->set_name, info->name, 32);

strncpy(sb->set_name, info->name, sizeof(sb->set_name)); ?


^ permalink raw reply

* [PATCH] raid5: allow arbitrary max_hw_sectors
From: Shaohua Li @ 2016-09-08 17:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team

raid5 splits bios to the proper size internally, so there is no point in
using the underlying disks' max_hw_sectors. In my qemu system, without this
change, raid5 only receives 128k bios, which reduces the chance of merging
bios sent to the underlying disks.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/raid5.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b95c54c..fc0b600 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7066,6 +7066,8 @@ static int raid5_run(struct mddev *mddev)
 		else
 			queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD,
 						mddev->queue);
+
+		blk_queue_max_hw_sectors(mddev->queue, UINT_MAX);
 	}
 
 	if (journal_dev) {
-- 
2.8.0.rc2
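
As a rough illustration of the commit message, assuming the 128k limit seen in the qemu setup, a large write reaches raid5 as many small bios rather than one large one. The helper below is a back-of-the-envelope calculation, not kernel code; bios_needed() is an invented name, 512-byte sectors are assumed, and real splitting also depends on other queue limits.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512u

/*
 * Number of bios needed for io_bytes when a single bio may carry at most
 * max_hw_sectors sectors (ceiling division in both steps).
 */
static uint64_t bios_needed(uint64_t io_bytes, uint64_t max_hw_sectors)
{
	uint64_t io_sectors = (io_bytes + DEMO_SECTOR_SIZE - 1) / DEMO_SECTOR_SIZE;

	return (io_sectors + max_hw_sectors - 1) / max_hw_sectors;
}
```

Under this simplified model, an 8 MiB write with a 128k cap (256 sectors) is split into 64 bios, while with the cap raised to UINT_MAX it can arrive as a single bio that raid5 then splits itself.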


^ permalink raw reply related

* [PATCH v2] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08  0:43 UTC (permalink / raw)
  To: linux-raid; +Cc: Jes.Sorensen, shli, Song Liu

struct mdp_superblock_1.set_name is 32 bytes long, but struct mdinfo.name
is 33 bytes long, so we need strncpy instead of strcpy to avoid a buffer
overflow.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 super1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/super1.c b/super1.c
index f3e4023..46fed54 100644
--- a/super1.c
+++ b/super1.c
@@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 			strcat(sb->set_name, ":");
 			strcat(sb->set_name, info->name);
 		} else
-			strcpy(sb->set_name, info->name);
+			strncpy(sb->set_name, info->name, 32);
 	} else if (strcmp(update, "devicesize") == 0 &&
 	    __le64_to_cpu(sb->super_offset) <
 	    __le64_to_cpu(sb->data_offset)) {
@@ -1444,7 +1444,7 @@ static int init_super1(struct supertype *st, mdu_array_info_t *info,
 		strcat(sb->set_name, ":");
 		strcat(sb->set_name, name);
 	} else
-		strcpy(sb->set_name, name);
+		strncpy(sb->set_name, name, 32);
 
 	sb->ctime = __cpu_to_le64((unsigned long long)time(0));
 	sb->level = __cpu_to_le32(info->level);
-- 
2.8.0.rc2


^ permalink raw reply related

* Re: [PATCH] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08  0:39 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org; +Cc: Jes.Sorensen@redhat.com, Shaohua Li
In-Reply-To: <1473294509-1828100-1-git-send-email-songliubraving@fb.com>

Actually, there is more similar code. Let me resend a patch that fixes all of it together.
 
Thanks,
Song


>> On 9/7/16, 5:28 PM, "Song Liu" <songliubraving@fb.com> wrote:

    struct mdp_superblock_1.set_name is 32 bytes long, but struct mdinfo.name
    is 33 bytes long, so we need strncpy instead of strcpy to avoid a buffer
    overflow.
    
    Signed-off-by: Song Liu <songliubraving@fb.com>
    ---
     super1.c | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    
    diff --git a/super1.c b/super1.c
    index f3e4023..942f0d2 100644
    --- a/super1.c
    +++ b/super1.c
    @@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
     			strcat(sb->set_name, ":");
     			strcat(sb->set_name, info->name);
     		} else
    -			strcpy(sb->set_name, info->name);
    +			strncpy(sb->set_name, info->name, 32);
     	} else if (strcmp(update, "devicesize") == 0 &&
     	    __le64_to_cpu(sb->super_offset) <
     	    __le64_to_cpu(sb->data_offset)) {
    -- 
    2.8.0.rc2
    
    



^ permalink raw reply

* [PATCH] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08  0:28 UTC (permalink / raw)
  To: linux-raid; +Cc: Jes.Sorensen, shli, Song Liu

struct mdp_superblock_1.set_name is 32 bytes long, but struct mdinfo.name
is 33 bytes long, so we need strncpy instead of strcpy to avoid a buffer
overflow.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 super1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/super1.c b/super1.c
index f3e4023..942f0d2 100644
--- a/super1.c
+++ b/super1.c
@@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 			strcat(sb->set_name, ":");
 			strcat(sb->set_name, info->name);
 		} else
-			strcpy(sb->set_name, info->name);
+			strncpy(sb->set_name, info->name, 32);
 	} else if (strcmp(update, "devicesize") == 0 &&
 	    __le64_to_cpu(sb->super_offset) <
 	    __le64_to_cpu(sb->data_offset)) {
-- 
2.8.0.rc2


^ permalink raw reply related

* lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Yi Zhang @ 2016-09-07  6:43 UTC (permalink / raw)
  To: linux-raid; +Cc: shli
In-Reply-To: <338941973.7699634.1473230038475.JavaMail.zimbra@redhat.com>

Hello

I tried to create an IMSM RAID10 with a missing device and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?

Steps I used:
mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing

Version:
4.8.0-rc5
mdadm - v3.4-84-gbd1fd72 - 25th August 2016

Log: 
http://pastebin.com/FJJwvgg6

<6>[  301.102007] md: bind<sdb>
<6>[  301.102095] md: bind<sdc>
<6>[  301.102159] md: bind<sdd>
<6>[  301.102215] md: bind<sde>
<6>[  301.102291] md: bind<sdf>
<6>[  301.103010] ata3.00: Enabling discard_zeroes_data
<6>[  311.714344] ata3.00: Enabling discard_zeroes_data
<6>[  311.721866] md: bind<sdb>
<6>[  311.721965] md: bind<sdc>
<6>[  311.722029] md: bind<sdd>
<5>[  311.733165] md/raid10:md127: not clean -- starting background reconstruction
<6>[  311.733167] md/raid10:md127: active with 3 out of 4 devices
<6>[  311.733186] md127: detected capacity change from 0 to 240060989440
<6>[  311.774027] md: bind<sde>
<6>[  311.810664] md: md127 switched to read-write mode.
<6>[  311.819885] md: resync of RAID array md127
<6>[  311.819886] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
<6>[  311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
<6>[  311.819891] md: using 128k window, over a total of 234435328k.
<6>[  316.606073] ata3.00: Enabling discard_zeroes_data
<6>[  343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
<6>[ 1482.314944] md: md127: resync done.
<7>[ 1482.315086] RAID10 conf printout:
<7>[ 1482.315087]  --- wd:3 rd:4
<7>[ 1482.315089]  disk 0, wo:0, o:1, dev:sdb
<7>[ 1482.315089]  disk 1, wo:0, o:1, dev:sdc
<7>[ 1482.315090]  disk 2, wo:0, o:1, dev:sdd
<7>[ 1482.315099] RAID10 conf printout:
<7>[ 1482.315099]  --- wd:3 rd:4
<7>[ 1482.315100]  disk 0, wo:0, o:1, dev:sdb
<7>[ 1482.315100]  disk 1, wo:0, o:1, dev:sdc
<7>[ 1482.315101]  disk 2, wo:0, o:1, dev:sdd
<7>[ 1482.315101]  disk 3, wo:1, o:1, dev:sde
<6>[ 1482.315220] md: recovery of RAID array md127
<6>[ 1482.315221] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
<6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
<6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
<6>[ 2697.184217] md: md127: recovery done.
<7>[ 2697.524143] RAID10 conf printout:
<7>[ 2697.524144]  --- wd:4 rd:4
<7>[ 2697.524146]  disk 0, wo:0, o:1, dev:sdb
<7>[ 2697.524146]  disk 1, wo:0, o:1, dev:sdc
<7>[ 2697.524147]  disk 2, wo:0, o:1, dev:sdd
<7>[ 2697.524148]  disk 3, wo:0, o:1, dev:sde
<6>[ 2697.524632] md: export_rdev(sde)
<6>[ 2697.549452] md: export_rdev(sde)
<6>[ 2697.568763] md: export_rdev(sde)
<6>[ 2697.587938] md: export_rdev(sde)
<6>[ 2697.607271] md: export_rdev(sde)
<6>[ 2697.626321] md: export_rdev(sde)
<6>[ 2697.645676] md: export_rdev(sde)
<6>[ 2697.663211] md: export_rdev(sde)
<6>[ 2697.681603] md: export_rdev(sde)
<6>[ 2697.699117] md: export_rdev(sde)
<6>[ 2697.716510] md: export_rdev(sde)

Best Regards,
  Yi Zhang



^ permalink raw reply

* Re: a hard lockup in md raid5 sequential write (v4.7-rc7)
From: Coly Li @ 2016-09-06 16:46 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid
In-Reply-To: <20160719233556.GC79792@kernel.org>

On 16/7/20 at 7:35 AM, Shaohua Li wrote:
> On Mon, Jul 18, 2016 at 04:55:04PM +0800, Coly Li wrote:
>> Hi,
>>
>> These days I observe a hard lockup in md raid5. This issue can be easily
>> reproduced in kernel v4.7-rc7 (up to commit:
>> 47ef4ad2684d380dd6d596140fb79395115c3950) by this fio job file:
>>
>> [global]
>> direct=1
>> thread=1
>> [job]
>> filename=/dev/md0
>> blocksize=8m
>> rw=write
>> name=raid5
>> lockmem=1
>> numjobs=40
>> write_bw_log=example
>> group_reporting=1
>> norandommap=1
>> log_avg_msec=0
>> runtime=600.0
>> iodepth=64
>> write_lat_log=example
>>
>> Here md0 is a raid5 target assembled from 3 Memblaze (PBlaze3) PCIe SSDs.
>> The test runs on a Dell T7910 machine with dual 10-core processors.
>>
>> From the crash dump, the dmesg output of the panic on NMI watchdog timeout is:
>>
>> [ 2330.544036] NMI watchdog: Watchdog detected hard LOCKUP on cpu
>> 18.dModules linked in: raid456 async_raid6_recov async_memcpy libcrc32c
>> async_pq async_xor async_tx joydev st memdisk(O) memcon(O) af_packet
>> iscsi_ibft iscsi_boot_sysfs msr snd_hda_codec_hdmi intel_rapl sb_edac
>> raid1 edac_core x86_pkg_temp_thermal intel_powerclamp coretemp raid0
>> md_mod snd_hda_codec_realtek snd_hda_codec_generic kvm_intel kvm
>> snd_hda_intel irqbypass snd_hda_codec crct10dif_pclmul snd_hda_core
>> crc32_pclmul ghash_clmulni_intel snd_hwdep dm_mod aesni_intel aes_x86_64
>> snd_pcm mei_wdt e1000e igb iTCO_wdt lrw dcdbas iTCO_vendor_support
>> snd_timer gf128mul mei_me dell_smm_hwmon glue_helper serio_raw
>> ablk_helper cryptd snd lpc_ich pcspkr ptp i2c_i801 mei mptctl dca
>> mfd_core pps_core soundcore mptbase shpchp fjes tpm_tis tpm btrfs xor
>> raid6_pq hid_generic usbhid crc32c_intel nouveau video mxm_wmi
>> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci
>> fb_sys_fops ehci_pci xhci_hcd ehci_hcd sr_mod ttm cd
>> [ 2330.544036] CPU: 18 PID: 30308 Comm: kworker/u42:4 Tainted: G
>>   O    4.7.0-rc7-vanilla #1
>> [ 2330.544036] Hardware name: Dell Inc. Precision Tower 7910/0215PR,
>> BIOS A07 04/14/2015
>> [ 2330.544036] Workqueue: raid5wq raid5_do_work [raid456]
>> [ 2330.544036]  0000000000000000 ffff88103f405bb0 ffffffff813a6eea
>> 0000000000000000
>> [ 2330.544036]  0000000000000000 ffff88103f405bc8 ffffffff8113c3e8
>> ffff8808dc7d8800
>> [ 2330.544036]  ffff88103f405c00 ffffffff81180f8c 0000000000000001
>> ffff88103f40a440
>> [ 2330.544036] Call Trace:
>> [ 2330.544036]  <NMI>  [<ffffffff813a6eea>] dump_stack+0x63/0x89
>> [ 2330.544036]  [<ffffffff8113c3e8>] watchdog_overflow_callback+0xc8/0xf0
>> [ 2330.544036]  [<ffffffff81180f8c>] __perf_event_overflow+0x7c/0x1b0
>> [ 2330.544036]  [<ffffffff8118b644>] perf_event_overflow+0x14/0x20
>> [ 2330.544036]  [<ffffffff8100bf57>] intel_pmu_handle_irq+0x1c7/0x460
>> [ 2330.544036]  [<ffffffff810053ad>] perf_event_nmi_handler+0x2d/0x50
>> [ 2330.544036]  [<ffffffff810312e1>] nmi_handle+0x61/0x140
>> [ 2330.544036]  [<ffffffff81031888>] default_do_nmi+0x48/0x130
>> [ 2330.544036]  [<ffffffff81031a5b>] do_nmi+0xeb/0x160
>> [ 2330.544036]  [<ffffffff816e5c71>] end_repeat_nmi+0x1a/0x1e
>> [ 2330.544036]  [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036]  [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036]  [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036]  <<EOE>>  [<ffffffff81193bbf>]
>> queued_spin_lock_slowpath+0xb/0xf
>> [ 2330.544036]  [<ffffffff816e31ff>] _raw_spin_lock_irq+0x2f/0x40
>> [ 2330.544036]  [<ffffffffa084c5d8>]
>> handle_active_stripes.isra.51+0x378/0x4f0 [raid456]
>> [ 2330.544036]  [<ffffffffa083f1a6>] ?
>> raid5_wakeup_stripe_thread+0x96/0x1b0 [raid456]
>> [ 2330.544036]  [<ffffffffa084cf1d>] raid5_do_work+0x8d/0x120 [raid456]
>> [ 2330.544036]  [<ffffffff8109b5bb>] process_one_work+0x14b/0x450
>> [ 2330.544036]  [<ffffffff8109b9eb>] worker_thread+0x12b/0x490
>> [ 2330.544036]  [<ffffffff8109b8c0>] ? process_one_work+0x450/0x450
>> [ 2330.544036]  [<ffffffff810a1599>] kthread+0xc9/0xe0
>> [ 2330.544036]  [<ffffffff816e3a9f>] ret_from_fork+0x1f/0x40
>> [ 2330.544036]  [<ffffffff810a14d0>] ? kthread_create_on_node+0x180/0x180
>> [ 2330.544036] Kernel panic - not syncing: Hard LOCKUP
>> [ 2330.544036] CPU: 18 PID: 30308 Comm: kworker/u42:4 Tainted: G
>>   O    4.7.0-rc7-vanilla #1
>> [ 2330.544036] Hardware name: Dell Inc. Precision Tower 7910/0215PR,
>> BIOS A07 04/14/2015
>> [ 2330.544036] Workqueue: raid5wq raid5_do_work [raid456]
>> [ 2330.544036]  0000000000000000 ffff88103f405b28 ffffffff813a6eea
>> ffffffff81a45241
>> [ 2330.544036]  0000000000000000 ffff88103f405ba0 ffffffff81193642
>> 0000000000000010
>> [ 2330.544036]  ffff88103f405bb0 ffff88103f405b50 0000000000000086
>> ffffffff81a2a2e2
>> [ 2330.544036] Call Trace:
>> [ 2330.544036]  <NMI>  [<ffffffff813a6eea>] dump_stack+0x63/0x89
>> [ 2330.544036]  [<ffffffff81193642>] panic+0xd2/0x223
>> [ 2330.544036]  [<ffffffff810823af>] nmi_panic+0x3f/0x40
>> [ 2330.544036]  [<ffffffff8113c401>] watchdog_overflow_callback+0xe1/0xf0
>> [ 2330.544036]  [<ffffffff81180f8c>] __perf_event_overflow+0x7c/0x1b0
>> [ 2330.544036]  [<ffffffff8118b644>] perf_event_overflow+0x14/0x20
>> [ 2330.544036]  [<ffffffff8100bf57>] intel_pmu_handle_irq+0x1c7/0x460
>> [ 2330.544036]  [<ffffffff810053ad>] perf_event_nmi_handler+0x2d/0x50
>> [ 2330.544036]  [<ffffffff810312e1>] nmi_handle+0x61/0x140
>> [ 2330.544036]  [<ffffffff81031888>] default_do_nmi+0x48/0x130
>> [ 2330.544036]  [<ffffffff81031a5b>] do_nmi+0xeb/0x160
>> [ 2330.544036]  [<ffffffff816e5c71>] end_repeat_nmi+0x1a/0x1e
>> [ 2330.544036]  [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036]  [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036]  [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036]  <<EOE>>  [<ffffffff81193bbf>]
>> queued_spin_lock_slowpath+0xb/0xf
>> [ 2330.544036]  [<ffffffff816e31ff>] _raw_spin_lock_irq+0x2f/0x40
>> [ 2330.544036]  [<ffffffffa084c5d8>]
>> handle_active_stripes.isra.51+0x378/0x4f0 [raid456]
>> [ 2330.544036]  [<ffffffffa083f1a6>] ?
>> raid5_wakeup_stripe_thread+0x96/0x1b0 [raid456]
>> [ 2330.544036]  [<ffffffffa084cf1d>] raid5_do_work+0x8d/0x120 [raid456]
>> [ 2330.544036]  [<ffffffff8109b5bb>] process_one_work+0x14b/0x450
>> [ 2330.544036]  [<ffffffff8109b9eb>] worker_thread+0x12b/0x490
>> [ 2330.544036]  [<ffffffff8109b8c0>] ? process_one_work+0x450/0x450
>> [ 2330.544036]  [<ffffffff810a1599>] kthread+0xc9/0xe0
>> [ 2330.544036]  [<ffffffff816e3a9f>] ret_from_fork+0x1f/0x40
>> [ 2330.544036]  [<ffffffff810a14d0>] ? kthread_create_on_node+0x180/0x180
>>
>> The crash dump file is quite big (124MB), so I need to find a way to
>> share it. If any of you want it, please let me know.
>>
>> IMHO, this hard lockup seems related to bitmap allocation, because it
>> can be easily reproduced on a newly created md raid5 target, with 40+
>> processes doing large (8MB+) writes.
> 
> Hi,
> 
> Sounds like a deadlock. Can you enable lockdep and run the test again and see
> if lockdep gives any hint?

Hi Shaohua,

I reproduced the hard lockup on 4.8-rc5. This time I enabled lockdep, but the
information is very limited. Here is the panic output:

[  616.690899] NMI watchdog: Watchdog detected hard LOCKUP on cpu
16.dModules linked in: af_packet iscsi_ibft iscsi_boot_sysfs msr
ipmi_ssif intel_rapl cdc_ether usbnet mii edac_core x86_pkg_temp_thermal
intel_powerclamp coretemp raid456 async_raid6_recov async_memcpy
libcrc32c async_pq async_xor async_tx kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64
ipmi_devintf lrw md_mod glue_helper ablk_helper cryptd pcspkr tg3
iTCO_wdt ptp iTCO_vendor_support pps_core i2c_i801 libphy i2c_smbus
mei_me lpc_ich mxm_wmi mfd_core mei shpchp ipmi_si ipmi_msghandler fjes
wmi tpm_tis tpm_tis_core acpi_pad button tpm hid_generic usbhid btrfs
xor zlib_deflate raid6_pq crc32c_intel megaraid_sas xhci_pci xhci_hcd
mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm drm ehci_pci ehci_hcd nvme usbcore usb_common nvme_core
sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
[  616.690899] irq event stamp: 27052
[  616.690900] hardirqs last  enabled at (27051): [<ffffffff81104a15>]
vprintk_emit+0x1d5/0x550
[  616.690900] hardirqs last disabled at (27052): [<ffffffff817cbb5f>]
_raw_spin_lock_irq+0x1f/0x90
[  616.690901] softirqs last  enabled at (26668): [<ffffffff817cf857>]
__do_softirq+0x1f7/0x4b7
[  616.690901] softirqs last disabled at (26659): [<ffffffff8109803b>]
irq_exit+0xab/0xc0
[  616.690901] CPU: 16 PID: 11681 Comm: fio Not tainted 4.8.0-rc4-vanilla #2
[  616.690902] Hardware name: LENOVO System x3650 M5
-[546225Z]-/XXXXXXX, BIOS -[TCE123M-2.10]- 06/23/2016
[  616.690902]  0000000000000000 ffff880256c05ba8 ffffffff81438dec
0000000000000000
[  616.690903]  0000000000000010 ffff880256c05bc8 ffffffff8117ea7f
ffff88017dc4e800
[  616.690903]  0000000000000000 ffff880256c05c00 ffffffff811c6b5b
0000000000000001
[  616.690903] Call Trace:
[  616.690904]  <NMI>  [<ffffffff81438dec>] dump_stack+0x85/0xc9
[  616.690904]  [<ffffffff8117ea7f>] watchdog_overflow_callback+0x13f/0x160
[  616.690904]  [<ffffffff811c6b5b>] __perf_event_overflow+0x8b/0x1d0
[  616.690905]  [<ffffffff811d3f94>] perf_event_overflow+0x14/0x20
[  616.690905]  [<ffffffff8100c631>] intel_pmu_handle_irq+0x1d1/0x4a0
[  616.690906]  [<ffffffff8100582d>] perf_event_nmi_handler+0x2d/0x50
[  616.690906]  [<ffffffff810391ae>] nmi_handle+0x9e/0x2d0
[  616.690906]  [<ffffffff81039115>] ? nmi_handle+0x5/0x2d0
[  616.690906]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690907]  [<ffffffff81039631>] default_do_nmi+0x71/0x1b0
[  616.690907]  [<ffffffff8103988c>] do_nmi+0x11c/0x190
[  616.690907]  [<ffffffff817cdf91>] end_repeat_nmi+0x1a/0x1e
[  616.690908]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690908]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690909]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690909]  <<EOE>>  [<ffffffff81458d87>]
debug_smp_processor_id+0x17/0x20
[  616.690909]  [<ffffffff814465f2>] delay_tsc+0x22/0xc0
[  616.690910]  [<ffffffff8144650f>] __delay+0xf/0x20
[  616.690910]  [<ffffffff810f5626>] do_raw_spin_lock+0x86/0x130
[  616.690910]  [<ffffffff817cbbac>] _raw_spin_lock_irq+0x6c/0x90
[  616.690911]  [<ffffffffa083623b>] ?
raid5_get_active_stripe+0x6b/0x8c0 [raid456]
[  616.690911]  [<ffffffffa083623b>] raid5_get_active_stripe+0x6b/0x8c0
[raid456]
[  616.690911]  [<ffffffff817cbcaa>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[  616.690912]  [<ffffffff810e1352>] ? prepare_to_wait+0x62/0x90
[  616.690912]  [<ffffffffa0836c6a>] raid5_make_request+0x1da/0xb50
[raid456]
[  616.690912]  [<ffffffffa05082aa>] ? md_make_request+0x1fa/0x4f0 [md_mod]
[  616.690913]  [<ffffffff810e1810>] ? prepare_to_wait_event+0x100/0x100
[  616.690913]  [<ffffffffa05082aa>] md_make_request+0x1fa/0x4f0 [md_mod]
[  616.690914]  [<ffffffffa0508107>] ? md_make_request+0x57/0x4f0 [md_mod]
[  616.690914]  [<ffffffff814043f9>] generic_make_request+0x159/0x2c0
[  616.690914]  [<ffffffff810ea809>] ? get_lock_stats+0x19/0x60
[  616.690915]  [<ffffffff814045cd>] submit_bio+0x6d/0x150
[  616.690915]  [<ffffffff810ed69d>] ? trace_hardirqs_on+0xd/0x10
[  616.690915]  [<ffffffff812c28aa>] do_blockdev_direct_IO+0x163a/0x2630
[  616.690916]  [<ffffffff812bd780>] ? I_BDEV+0x20/0x20
[  616.690916]  [<ffffffff812c38da>] __blockdev_direct_IO+0x3a/0x40
[  616.690916]  [<ffffffff812bde7c>] blkdev_direct_IO+0x4c/0x70
[  616.690917]  [<ffffffff811e39a7>] generic_file_direct_write+0xa7/0x160
[  616.690917]  [<ffffffff811e3b1d>] __generic_file_write_iter+0xbd/0x1e0
[  616.690917]  [<ffffffff812be7b0>] ? bd_acquire+0xb0/0xb0
[  616.690918]  [<ffffffff812be822>] blkdev_write_iter+0x72/0xd0
[  616.690918]  [<ffffffff813dd648>] ? apparmor_file_permission+0x18/0x20
[  616.690918]  [<ffffffff813a38ed>] ? security_file_permission+0x3d/0xc0
[  616.690919]  [<ffffffff812d4549>] aio_run_iocb+0x239/0x2c0
[  616.690919]  [<ffffffff812d5703>] do_io_submit+0x233/0x860
[  616.690919]  [<ffffffff812d58aa>] ? do_io_submit+0x3da/0x860
[  616.690920]  [<ffffffff812d5d40>] SyS_io_submit+0x10/0x20
[  616.690920]  [<ffffffff817cc6c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  616.690920]  [<ffffffff81458da3>] ? __this_cpu_preempt_check+0x13/0x20
[  616.690921] Kernel panic - not syncing: Hard LOCKUP
[  616.690921] CPU: 16 PID: 11681 Comm: fio Not tainted 4.8.0-rc4-vanilla #2
[  616.690921] Hardware name: LENOVO System x3650 M5
-[546225Z]-/XXXXXXX, BIOS -[TCE123M-2.10]- 06/23/2016
[  616.690922]  0000000000000000 ffff880256c05b20 ffffffff81438dec
0000000000000000
[  616.690922]  ffffffff81a69c3f ffff880256c05b98 ffffffff811dd399
0000000000000010
[  616.690922]  ffff880256c05ba8 ffff880256c05b48 0000000000000086
ffffffff81a4bcbe
[  616.690923] Call Trace:
[  616.690923]  <NMI>  [<ffffffff81438dec>] dump_stack+0x85/0xc9
[  616.690923]  [<ffffffff811dd399>] panic+0xe0/0x22c
[  616.690924]  [<ffffffff810905af>] nmi_panic+0x3f/0x40
[  616.690924]  [<ffffffff8117ea91>] watchdog_overflow_callback+0x151/0x160
[  616.690924]  [<ffffffff811c6b5b>] __perf_event_overflow+0x8b/0x1d0
[  616.690925]  [<ffffffff811d3f94>] perf_event_overflow+0x14/0x20
[  616.690925]  [<ffffffff8100c631>] intel_pmu_handle_irq+0x1d1/0x4a0
[  616.690925]  [<ffffffff8100582d>] perf_event_nmi_handler+0x2d/0x50
[  616.690926]  [<ffffffff810391ae>] nmi_handle+0x9e/0x2d0
[  616.690926]  [<ffffffff81039115>] ? nmi_handle+0x5/0x2d0
[  616.690926]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690927]  [<ffffffff81039631>] default_do_nmi+0x71/0x1b0
[  616.690927]  [<ffffffff8103988c>] do_nmi+0x11c/0x190
[  616.690927]  [<ffffffff817cdf91>] end_repeat_nmi+0x1a/0x1e
[  616.690928]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690928]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690929]  [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[  616.690929]  <<EOE>>  [<ffffffff81458d87>]
debug_smp_processor_id+0x17/0x20
[  616.690929]  [<ffffffff814465f2>] delay_tsc+0x22/0xc0
[  616.690929]  [<ffffffff8144650f>] __delay+0xf/0x20
[  616.690930]  [<ffffffff810f5626>] do_raw_spin_lock+0x86/0x130
[  616.690930]  [<ffffffff817cbbac>] _raw_spin_lock_irq+0x6c/0x90
[  616.690931]  [<ffffffffa083623b>] ?
raid5_get_active_stripe+0x6b/0x8c0 [raid456]
[  616.690931]  [<ffffffffa083623b>] raid5_get_active_stripe+0x6b/0x8c0
[raid456]
[  616.690931]  [<ffffffff817cbcaa>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[  616.690932]  [<ffffffff810e1352>] ? prepare_to_wait+0x62/0x90
[  616.690932]  [<ffffffffa0836c6a>] raid5_make_request+0x1da/0xb50
[raid456]
[  616.690932]  [<ffffffffa05082aa>] ? md_make_request+0x1fa/0x4f0 [md_mod]
[  616.690933]  [<ffffffff810e1810>] ? prepare_to_wait_event+0x100/0x100
[  616.690933]  [<ffffffffa05082aa>] md_make_request+0x1fa/0x4f0 [md_mod]
[  616.690934]  [<ffffffffa0508107>] ? md_make_request+0x57/0x4f0 [md_mod]
[  616.690934]  [<ffffffff814043f9>] generic_make_request+0x159/0x2c0
[  616.690934]  [<ffffffff810ea809>] ? get_lock_stats+0x19/0x60
[  616.690935]  [<ffffffff814045cd>] submit_bio+0x6d/0x150
[  616.690935]  [<ffffffff810ed69d>] ? trace_hardirqs_on+0xd/0x10
[  616.690935]  [<ffffffff812c28aa>] do_blockdev_direct_IO+0x163a/0x2630
[  616.690936]  [<ffffffff812bd780>] ? I_BDEV+0x20/0x20
[  616.690936]  [<ffffffff812c38da>] __blockdev_direct_IO+0x3a/0x40
[  616.690936]  [<ffffffff812bde7c>] blkdev_direct_IO+0x4c/0x70
[  616.690937]  [<ffffffff811e39a7>] generic_file_direct_write+0xa7/0x160
[  616.690937]  [<ffffffff811e3b1d>] __generic_file_write_iter+0xbd/0x1e0
[  616.690937]  [<ffffffff812be7b0>] ? bd_acquire+0xb0/0xb0
[  616.690938]  [<ffffffff812be822>] blkdev_write_iter+0x72/0xd0
[  616.690938]  [<ffffffff813dd648>] ? apparmor_file_permission+0x18/0x20
[  616.690938]  [<ffffffff813a38ed>] ? security_file_permission+0x3d/0xc0
[  616.690939]  [<ffffffff812d4549>] aio_run_iocb+0x239/0x2c0
[  616.690939]  [<ffffffff812d5703>] do_io_submit+0x233/0x860
[  616.690939]  [<ffffffff812d58aa>] ? do_io_submit+0x3da/0x860
[  616.690940]  [<ffffffff812d5d40>] SyS_io_submit+0x10/0x20
[  616.690940]  [<ffffffff817cc6c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  616.690940]  [<ffffffff81458da3>] ? __this_cpu_preempt_check+0x13/0x20

Only these 4 lines about irq stamps can be found:
[  616.690899] irq event stamp: 27052
[  616.690900] hardirqs last  enabled at (27051): [<ffffffff81104a15>]
vprintk_emit+0x1d5/0x550
[  616.690900] hardirqs last disabled at (27052): [<ffffffff817cbb5f>]
_raw_spin_lock_irq+0x1f/0x90
[  616.690901] softirqs last  enabled at (26668): [<ffffffff817cf857>]
__do_softirq+0x1f7/0x4b7
[  616.690901] softirqs last disabled at (26659): [<ffffffff8109803b>]
irq_exit+0xab/0xc0

I suspect this panic is introduced by r5conf->hash_locks[]. I have a
kdump image and did some analysis on it; here is what I found.
In conf->hash_locks[], both hash_locks[0] and hash_locks[7] have
rlock.raw_lock.val.counter equal to 1. Here is the crash output for each
spin lock:

hash_locks[0]:
{
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 1
            }
          },
          magic = 3735899821,
          owner_cpu = 13,
          owner = 0xffff880244ff80c0,
          dep_map = {
            key = 0xffffffffa0845bd0 <__key.47065>,
            class_cache = {0xffffffff828e6e70 <lock_classes+640336>, 0x0},
            name = 0xffffffffa083ec5f "&(conf->hash_locks)->rlock",
            cpu = 13,
            ip = 18446744072107549243
          }
        },
        {
          __padding =
"\001\000\000\000\255N\255\336\r\000\000\000\000\000\000\000\300\200\377D\002\210\377\377",

          dep_map = {
            key = 0xffffffffa0845bd0 <__key.47065>,
            class_cache = {0xffffffff828e6e70 <lock_classes+640336>, 0x0},
            name = 0xffffffffa083ec5f "&(conf->hash_locks)->rlock",
            cpu = 13,
            ip = 18446744072107549243
          }
        }
      }
    }


hash_locks[7]:
{
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 1
            }
          },
          magic = 3735899821,
          owner_cpu = 19,
          owner = 0xffff88024716c340,
          dep_map = {
            key = 0xffffffffa0845bc8 <__key.47066>,
            class_cache = {0xffffffff828e7440 <lock_classes+641824>, 0x0},
            name = 0xffffffffa083f800 "&(conf->hash_locks + i)->rlock",
            cpu = 19,
            ip = 18446744072107549243
          }
        },
        {
          __padding =
"\001\000\000\000\255N\255\336\023\000\000\000\000\000\000\000@\303\026G\002\210\377\377",

          dep_map = {
            key = 0xffffffffa0845bc8 <__key.47066>,
            class_cache = {0xffffffff828e7440 <lock_classes+641824>, 0x0},
            name = 0xffffffffa083f800 "&(conf->hash_locks + i)->rlock",
            cpu = 19,
            ip = 18446744072107549243
          }
        }
      }
    }
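
As a cross-check of the two dumps above, the __padding bytes can be
decoded by hand. The sketch below is only an illustration: the field
offsets (counter at 0, magic at 4, owner_cpu at 8, owner at 16) are
inferred from the dump itself, and the helper names are made up for
this example. The magic value 3735899821 is 0xdead4ead, which is
SPINLOCK_MAGIC.

```c
#include <stdint.h>

/* Little-endian loads from a raw byte buffer (the dump is from x86_64). */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

uint64_t load_le64(const unsigned char *p)
{
    return (uint64_t)load_le32(p) | (uint64_t)load_le32(p + 4) << 32;
}

/* Fields recovered from __padding; the offsets are assumptions
 * inferred from the crash output, not taken from kernel headers. */
struct decoded_lock {
    uint32_t counter;
    uint32_t magic;
    uint32_t owner_cpu;
    uint64_t owner;
};

struct decoded_lock decode_padding(const unsigned char *pad)
{
    struct decoded_lock d;

    d.counter   = load_le32(pad + 0);
    d.magic     = load_le32(pad + 4);
    d.owner_cpu = load_le32(pad + 8);
    d.owner     = load_le64(pad + 16);  /* bytes 12..15 are alignment padding */
    return d;
}

/* __padding of hash_locks[0], copied from the crash output above. */
const unsigned char hash_lock0_padding[] =
    "\001\000\000\000\255N\255\336\r\000\000\000"
    "\000\000\000\000\300\200\377D\002\210\377\377";
```

Calling decode_padding(hash_lock0_padding) should give back the same
counter, magic, owner_cpu and owner that crash prints for the rlock
member, which suggests the union contents are consistent.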

From the above information, both locks have acquire_ip
18446744072107549243 (0xffffffffa083623b, which is
raid5_get_active_stripe+0x6b in the call trace above). I then checked
all 'fio' processes that acquired a lock at this address and found 12
threads acquiring a spin lock there, spread over 2 lock instances.
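
crash prints the ip fields in decimal, so it helps to convert them
before comparing against the call trace. A minimal sketch follows; the
helper names are made up for this example, and the only inputs are the
decimal values from the dumps and the raid5_get_active_stripe+0x6b
entry from the trace.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse an address printed in decimal by crash (e.g. an "ip" field). */
uint64_t parse_decimal_ip(const char *s)
{
    return (uint64_t)strtoull(s, NULL, 10);
}

/* Start of a function, given one known address inside it and that
 * address's "+0x.." offset as shown in the call trace. */
uint64_t func_start(uint64_t known_ip, uint64_t known_off)
{
    return known_ip - known_off;
}

/* Offset of ip from the function start, comparable to the trace. */
uint64_t func_offset(uint64_t ip, uint64_t start)
{
    return ip - start;
}
```

With start = func_start(0xffffffffa083623b, 0x6b), the value
18446744072107549243 comes out at offset 0x6b, and the one differing
acquire_ip in the lists below, 18446744072107549776, comes out at
offset 0x280, which is still inside raid5_get_active_stripe (size
0x8c0 according to the trace).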

For conf->hash_locks[0], the lockdep_map instance address is
0xffff880246f60820; its content is:
struct lockdep_map {
  key = 0xffffffffa0845bd0 <__key.47065>,
  class_cache = {0xffffffff828e6e70 <lock_classes+640336>, 0x0},
  name = 0xffffffffa083ec5f "&(conf->hash_locks)->rlock",
  cpu = 13,
  ip = 18446744072107549243
}
There are 7 'fio' threads that have this lock in their
task_struct->held_locks list:
 PID: 11629  TASK: ffff880251ba4f80  CPU: 3   COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60820,
 PID: 11643  TASK: ffff88022d331300  CPU: 10  COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60820,
 PID: 11648  TASK: ffff88024c3a1440  CPU: 14  COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60820,
 PID: 11652  TASK: ffff88024e1b5540  CPU: 5   COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60820,
 PID: 11653  TASK: ffff880229a70000  CPU: 12  COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60820,
 PID: 11656  TASK: ffff880244ff80c0  CPU: 13  COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60820,
 PID: 11657  TASK: ffff880243220100  CPU: 7   COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60820,

For conf->hash_locks[7], the lockdep_map instance address is
0xffff880246f60a18; its content is:
struct lockdep_map {
  key = 0xffffffffa0845bc8 <__key.47066>,
  class_cache = {0xffffffff828e7440 <lock_classes+641824>, 0x0},
  name = 0xffffffffa083f800 "&(conf->hash_locks + i)->rlock",
  cpu = 19,
  ip = 18446744072107549243
}
There are 5 'fio' threads that have this lock in their
task_struct->held_locks list:
 PID: 11663  TASK: ffff880242bb0280  CPU: 18  COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60a18,

 PID: 11666  TASK: ffff88024716c340  CPU: 19  COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60a18,

 PID: 11671  TASK: ffff880252504480  CPU: 1   COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60a18,

 PID: 11678  TASK: ffff88023735c640  CPU: 2   COMMAND: "fio"
       acquire_ip = 18446744072107549776,
       instance = 0xffff880246f60a18,

 PID: 11681  TASK: ffff880241a40700  CPU: 16  COMMAND: "fio"
       acquire_ip = 18446744072107549243,
       instance = 0xffff880246f60a18,

Unfortunately, I don't see the panic thread ID 11681 from the above 12
threads.

My current guess is that this hard lockup is triggered by a waiting task
that holds a spin lock, so I checked all the wait queues that r5conf has:
    wait_queue_head_t wait_for_quiescent;
    wait_queue_head_t wait_for_stripe;
    wait_queue_head_t wait_for_overlap;
From the crash output:
- conf->wait_for_quiescent is empty.
- conf->wait_for_stripe has 6 threads on it,
  PID: 11641  TASK: ffff880219a89280  CPU: 0   COMMAND: "fio"
  PID: 11685  TASK: ffff880228f6c800  CPU: 2   COMMAND: "fio"
  PID: 11628  TASK: ffff88017e0d8f40  CPU: 1   COMMAND: "fio"
  PID: 11672  TASK: ffff8802525fc4c0  CPU: 10  COMMAND: "fio"
  PID: 11638  TASK: ffff8801d27911c0  CPU: 1   COMMAND: "fio"
  PID: 11632  TASK: ffff88024cfc1040  CPU: 10  COMMAND: "fio"
- conf->wait_for_overlap has 5 threads on it,
  PID: 11657  TASK: ffff880243220100  CPU: 7   COMMAND: "fio"
  PID: 11629  TASK: ffff880251ba4f80  CPU: 3   COMMAND: "fio"
  PID: 11644  TASK: ffff88017e219340  CPU: 3   COMMAND: "fio"
  PID: 11636  TASK: ffff8802199e9140  CPU: 3   COMMAND: "fio"
  PID: 11630  TASK: ffff880250724fc0  CPU: 17  COMMAND: "fio"

One thing that might be interesting: 2 threads (PID 11657 and PID 11629)
are on the conf->wait_for_overlap list, yet they also have
conf->hash_locks[0] in their task->held_locks list.
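
The overlap can be checked mechanically by intersecting the PID lists
quoted above. This is just a sketch of that cross-check; the array and
function names are made up for this example, and the PIDs are copied
from the crash output in this mail.

```c
#include <stddef.h>

/* PIDs with hash_locks[0] / hash_locks[7] in held_locks (from above). */
const int holders_lock0[] = { 11629, 11643, 11648, 11652, 11653, 11656, 11657 };
const int holders_lock7[] = { 11663, 11666, 11671, 11678, 11681 };

/* PIDs sitting on the two non-empty wait queues (from above). */
const int wait_for_stripe[]  = { 11641, 11685, 11628, 11672, 11638, 11632 };
const int wait_for_overlap[] = { 11657, 11629, 11644, 11636, 11630 };

/* Copy every PID found in both a[] and b[] into out[], in a[] order.
 * out[] must have room for na entries. Returns the number copied. */
size_t pid_intersection(const int *a, size_t na,
                        const int *b, size_t nb, int *out)
{
    size_t n = 0;

    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                break;
            }
        }
    }
    return n;
}
```

Intersecting holders_lock0 with wait_for_overlap gives exactly the two
PIDs mentioned above; none of the other holder/wait-queue combinations
share a PID.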

I am not sure whether this is the reason that IRQs stay disabled for too long.

This hard lockup is very easy to reproduce on fast storage devices
(e.g. NVMe SSDs); what I have are Memblaze Pblaze3 PCIe SSDs. I am glad
to provide any debug information or do any testing. I have been looking
at this issue for a while, but have made little progress.

Thanks in advance for taking a look at it.

Coly

* [PATCH] dm: Return correct value in retry loop
From: Minfei Huang @ 2016-09-06  8:00 UTC (permalink / raw)
  To: agk, snitzer, shli; +Cc: dm-devel, linux-raid, linux-kernel, Minfei Huang

dm_resume will return silently when the retry loop fails. Assign a correct
return value in the failed loop.

Remove a useless assignment as well.

Signed-off-by: Minfei Huang <mnghuan@gmail.com>
---
 drivers/md/dm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fa9b1cb..c935cc8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2249,10 +2249,11 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
 
 int dm_resume(struct mapped_device *md)
 {
-	int r = -EINVAL;
+	int r;
 	struct dm_table *map = NULL;
 
 retry:
+	r = -EINVAL;
 	mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
 
 	if (!dm_suspended_md(md))
@@ -2277,10 +2278,8 @@ retry:
 
 	clear_bit(DMF_SUSPENDED, &md->flags);
 
-	r = 0;
 out:
 	mutex_unlock(&md->suspend_lock);
-
 	return r;
 }
 
-- 
2.7.4 (Apple Git-66)
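
To illustrate the kind of stale return value this patch is about, here
is a simplified user-space model. It is only a sketch: toy_md,
toy_wait() and the toy_resume_*() functions are hypothetical, and it
assumes that some step before "goto retry" stores a successful (zero)
result in r, which the quoted hunks do not show.

```c
#include <errno.h>

/* Hypothetical stand-in for the device state that dm_resume() checks. */
struct toy_md {
    int suspended;            /* models dm_suspended_md() */
    int internally_suspended; /* models an internal suspend in progress */
};

/* Assumed step: a wait that succeeds (returns 0). By the time it
 * returns, the device has been fully resumed by someone else. */
int toy_wait(struct toy_md *md)
{
    md->internally_suspended = 0;
    md->suspended = 0;
    return 0;
}

/* Models __dm_resume() succeeding. */
int toy_do_resume(struct toy_md *md)
{
    (void)md;
    return 0;
}

/* Before the patch: r is initialised once, outside the retry loop. */
int toy_resume_old(struct toy_md *md)
{
    int r = -EINVAL;

retry:
    if (!md->suspended)
        goto out;             /* may return a stale r from the last pass */
    if (md->internally_suspended) {
        r = toy_wait(md);
        if (r)
            return r;
        goto retry;
    }
    r = toy_do_resume(md);
    if (r)
        goto out;
    md->suspended = 0;
    r = 0;
out:
    return r;
}

/* After the patch: r is reset at the top of every pass. */
int toy_resume_new(struct toy_md *md)
{
    int r;

retry:
    r = -EINVAL;
    if (!md->suspended)
        goto out;
    if (md->internally_suspended) {
        r = toy_wait(md);
        if (r)
            return r;
        goto retry;
    }
    r = toy_do_resume(md);
    if (r)
        goto out;
    md->suspended = 0;
out:
    return r;
}
```

Starting from { suspended = 1, internally_suspended = 1 }, the old
variant returns 0 on the second pass even though nothing was resumed
by this call, while the new variant returns -EINVAL. For a plainly
suspended device both variants return 0.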


* RE: Checkarray doesn't seem to do anything
From: Mikael Abrahamsson @ 2016-09-05  8:15 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <20160302125329.0a91db92e0d9ae1e5e3fde0508678719.6af0e5bebc.wbe@email03.secureserver.net>


I just wanted to say I ran into this on Ubuntu 16.04 just now, and it's 
still not fixed.

There seem to be multiple bug IDs:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950#10
https://bugs.launchpad.net/debian/+source/mdadm/+bug/1550823

But this is still not fixed in Ubuntu 16.04, potentially in Debian as 
well.

So it might be good for everybody to know that, if you upgrade to
Ubuntu 16.04 at present, your arrays most likely aren't being
periodically checked.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: [PATCH V3] md-cluster: make md-cluster also can work when compiled into kernel
From: NeilBrown @ 2016-09-05  3:10 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang, v4.1+
In-Reply-To: <1473041848-28009-1-git-send-email-gqjiang@suse.com>

On Mon, Sep 05 2016, Guoqing Jiang wrote:

> The md-cluster is compiled as module by default,
> if it is compiled by built-in way, then we can't
> make md-cluster works.
>
> [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors
> [64782.630528] md-cluster module not found.
> [64782.630530] md127: Could not setup cluster service (-2)
>
> Fixes: edb39c9 ("Introduce md_cluster_operations to handle cluster functions")
> Cc: stable@vger.kernel.org (v4.1+)
> Cc: NeilBrown <neilb@suse.com>
> Reported-by: Marc Smith <marc.smith@mcc.edu>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
> V3 changes:
> 1. add the "!md_cluster_ops" test back
> 2. fix wrong mail info of stable kernel
>
> V2 changes:
> 1. call try_module_get if md_cluster_ops is already set,
>    otherwise try_module_get/module_put are unbalanced.
>
>  drivers/md/md.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 67642ba..915e84d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7610,16 +7610,12 @@ EXPORT_SYMBOL(unregister_md_cluster_operations);
>  
>  int md_setup_cluster(struct mddev *mddev, int nodes)
>  {
> -	int err;
> -
> -	err = request_module("md-cluster");
> -	if (err) {
> -		pr_err("md-cluster module not found.\n");
> -		return -ENOENT;
> -	}
> -
> +	if (!md_cluster_ops)
> +		request_module("md-cluster");
>  	spin_lock(&pers_lock);
> +	/* ensure module won't be unloaded */
>  	if (!md_cluster_ops || !try_module_get(md_cluster_mod)) {
> +		pr_err("can't find md-cluster module or get it's reference.\n");
>  		spin_unlock(&pers_lock);
>  		return -ENOENT;
>  	}
> -- 
> 2.6.6

Reviewed-by: NeilBrown <neilb@suse.com>

Thanks,
NeilBrown
